LABScon Replay | Chasing Shadows | The Rise of a Prolific Espionage Actor

By: LABScon
20 February 2024 at 21:12

In an engaging exploration at LABScon, Kris McConkey unveils the evolution and significance of a cyber espionage actor dubbed a “superpower” in the digital espionage arena. This actor, initially engaged in phishing campaigns, has matured into one of the most technically sophisticated and deeply entrenched entities in cyber espionage.

Evolution

Tracing back over a decade, public and private intelligence reports have consistently highlighted the actor’s growing sophistication. From early stages marked by widespread malware distribution, such as PlugX and ShadowPad, to a more controlled dissemination of advanced tools like Crosswalk and Sidewalk, the actor has demonstrated a strategic tightening of their operational framework.

Technical Sophistication

The actor’s technical prowess is evident through the use of ShadowPad, a tool that first emerged around 2015, with SentinelOne offering a comprehensive analysis of its evolution. Notably, ShadowPad has been adopted by at least 13 distinct threat actors, showcasing its wide influence. The introduction of the ScatterBee loader in 2020 marked a significant technical leap, with advanced obfuscation techniques that complicate malware analysis efforts.

Operational Tactics

The presentation delves into the operational intricacies of the espionage actor, including their unique approach to malware loading and execution. A notable shift was observed in August 2022, with the discovery of a new ShadowPad variant that employed a novel execution mechanism, further emphasizing the actor’s ongoing innovation and adaptation.

Global Reach and Sector Focus

The actor’s operational scope is global, impacting over 35 countries across various sectors. This widespread engagement underscores the actor’s strategic intent and capability to infiltrate various targets, from governmental bodies to the telecommunications sector. Their focus extends to high-value targets, leveraging tailored malware like FunnySwitch and Spider for specific operations.

Infrastructure and Techniques

An in-depth analysis of the actor’s infrastructure reveals a multi-layered approach, involving relay networks and virtual private servers to obfuscate their activities. This infrastructure supports various capabilities, from direct victim access to sophisticated tunneling techniques, enabling the actor to maintain a persistent presence.

Insights Based on Numbers

  • The actor has evolved over ten years, highlighting their long-term presence and impact.
  • ShadowPad has been utilized by 13 distinct threat actors, indicating its widespread adoption.
  • The espionage network has targeted over 35 countries, demonstrating its global reach.

In conclusion, the rise of this espionage actor from modest beginnings to becoming a formidable force in cyber espionage illustrates a significant shift in the cyber threat landscape. Their ability to innovate, adapt, and execute sophisticated cyber operations underscores the need for advanced defensive strategies and international cooperation to counteract their pervasive influence.

Watch the full presentation:

About the Presenter

Kris leads PwC’s Global Cyber Threat Intelligence practice, which tracks a wide variety of targeted threat actors operating from more than 25 countries.

Kris also leads the EMEA Cyber Threat Operations practice – a front line technical services group responsible for a portfolio of defensive and offensive cyber security services to help clients detect and respond to cyber security threats and incidents. He has spent the past 17 years at PwC delivering cyber incident response, threat hunting and threat research services to global clients.

About LABScon 2023

This presentation was featured live at LABScon, an immersive 3-day conference bringing together the world’s top cybersecurity minds, hosted by SentinelOne’s research arm, SentinelLabs.

LABScon Replay | Send Lawyers, ‘Garchs, and Money

By: LABScon
18 January 2024 at 16:43

Allegations of oligarch election meddling and influence are old news in 2024, but while prosecutors focus on the money trail in building threat intelligence-based cases for indictment, don’t overlook oligarch-funded lawyers with creative delay-and-distract defense tactics.

From twisting data privacy laws to using funds for SLAPP (Strategic Lawsuits Against Public Participation) libel cases to leaking legal discovery, Elizabeth Wharton dissects a series of US and UK cases citing the Mueller report and the Steele (Orbis) Dossier as examples where oligarchs have thrown lawyers and money as curveballs to exert influence and thwart cybercrime prosecutions. Liz explores the chilling effects that strategic lawsuits can have on researchers when their findings are buried or discredited in a lengthy and expensive legal process.

Liz also discusses ways to further leverage these cases as opportunities for closing policy gaps, extending anti-SLAPP legislation, and improving open source intelligence data gathering.

Watch the full, fascinating talk below!

About the Presenter

Elizabeth (Liz) leverages almost two decades of legal, public policy, and business experience to advise researchers and to build and scale cybersecurity and threat intelligence focused companies. In addition to having led operations at two adversary research focused startups, her recent prior experience includes serving as the Senior Assistant City Attorney on Atlanta’s ransomware incident immediate response team. Liz was recognized as the 2022 “Cybersecurity or Privacy Woman Law Professional of the Year” by the United Cybersecurity Alliance.

About LABScon 2023

This presentation was featured live at LABScon 2023, an immersive 3-day conference bringing together the world’s top cybersecurity minds, hosted by SentinelOne’s research arm, SentinelLabs.

LABScon Replay | Spectre Strikes Again: Introducing the Firmware Edition

By: LABScon
28 December 2023 at 16:00

The excitement surrounding speculative execution attacks may have subsided, but sadly, such threats remain. Binarly Research has discovered a vast attack surface still vulnerable to known issues like Spectre v1 and v2 on AMD silicon. Ineffective mitigations and the complexity of validation negatively impact the AMD device ecosystem.

While the industry is currently concentrating on constructing confidential computing infrastructure, foundational design problems reveal a lack of basic security at the hardware level. This discovery was made possible due to the asynchronous nature of firmware and hardware security fixes development.

Throughout their lifecycle, devices are susceptible to security issues due to the asynchronous nature of firmware security fixes delivery from multiple parties and the asynchronous nature of the supply chain. The lack of transparency in vendor security advisories results in an opaque channel for informing customers about the criticality of released security fixes and leads to varying approaches to patching widespread vulnerabilities with industry-wide implications.

Even major silicon vendors develop mitigations for side-channel attacks differently. This situation presents an opportunity for potential threat actors to exploit known speculative attacks like the 5-year-old Spectre or the 1-year-old Retbleed. A new perspective is needed to construct an attack vector that utilizes speculative attacks to target UEFI-specific firmware vulnerabilities.

In this presentation, we discuss our research into the potential use of speculative attacks against the System Management Mode (SMM) on AMD-based devices and outline the methodologies we employed throughout our research investigation.

About the Presenter

Alex Matrosov is CEO and Founder of Binarly Inc. where he builds an AI-powered platform to protect devices against emerging firmware threats. He is the author of numerous research papers and the book Rootkits and Bootkits: Reversing Modern Malware and Next Generation Threats. He is a frequently invited speaker at security conferences, such as REcon, Black Hat, Offensivecon, WOOT, DEF CON, and many others.

About LABScon 2023

This presentation was featured live at LABScon 2023, an immersive 3-day conference bringing together the world’s top cybersecurity minds, hosted by SentinelOne’s research arm, SentinelLabs.

LABScon Replay | Intellexa and Cytrox: From Fixer-Upper to Intel Agency Grade Spyware

By: LABScon
26 December 2023 at 17:00

In this enlightening LABScon Replay session, Vitor Ventura, senior security researcher at Cisco Talos, alongside Michael Gentile, delves into the intriguing evolution of Intellexa and Cytrox in the spyware domain.

The Developmental Saga of Intellexa and Cytrox

Mercenary spyware companies need to evolve their spyware capabilities just like software from any other commercial company. This presentation details an account and timeline of one such mercenary organization, from almost bankrupt to having fully working spyware targeting iOS and Android with a one-click zero-day exploit.

Ventura and Gentile explore the journey of Intellexa, which emerged from the amalgamation of Nexa Technologies, WiSpear, and Cytrox, focusing on Android spyware. The talk sheds light on the critical developments that marked Intellexa’s ascension as a formidable entity in the spyware industry, adept in targeting both iOS and Android platforms.

A Deep Dive into Spyware Development

Ventura and Gentile comprehensively analyze ALIEN/PREDATOR, Intellexa’s flagship spyware suite. Through a combination of code analysis and Open Source Intelligence (OSINT), they chart the evolutionary path of this advanced spyware, revealing its sophisticated capabilities.

The presentation dissects the pivotal moments in the development cycle of the ALIEN/PREDATOR spyware suite, offering the audience valuable insights into spyware research methodologies.

Analyzing the Intricacies of Spyware Components

An important part of the talk is dedicated to the technical breakdown of the spyware’s components. The presenters discuss the distinctions and similarities between the ALIEN/PREDATOR suite and the standalone PREDATOR for iOS, providing a clear understanding of the platform-specific nuances.

This session is a recommended watch for those interested in the complexities of spyware development and its broader implications in cybersecurity. Ventura and Gentile impart a thorough understanding of the nuanced world of digital espionage and the dynamic cyber threat landscape.

Watch the Full Talk Below

About the Presenters

Vitor Ventura is a Cisco Talos security researcher and manager of the EMEA and Asia Outreach team. As a researcher, he has investigated and published various articles on emerging threats. Vitor has spoken at conferences such as VirusBulletin, NorthSec, and Defcon’s Crypto and Privacy Village, among others. Prior to that, he was IBM X-Force IRIS European manager, where he was the lead responder for several high-profile organizations affected by the WannaCry and NotPetya infections.

Mike Gentile is a Senior Security Researcher at Cisco Talos.

About LABScon 2023

This presentation was featured live at LABScon 2023, an immersive 3-day conference bringing together the world’s top cybersecurity minds, hosted by SentinelOne’s research arm, SentinelLabs.

LABScon Replay | The Cyber Arm of China’s Soft Power: Reshaping a Continent

By: LABScon
6 December 2023 at 13:04

In his keynote at LABScon23, SentinelLabs’ Principal Threat Researcher Tom Hegel addressed a crucial but often overlooked aspect of global cybersecurity: cyber threat activity in less-monitored regions, particularly Africa.

Focusing on China’s strategic use of soft power across the African continent, Hegel provides a compelling analysis of how technology and investments are wielded as tools of influence and control.

Highlighting its significant investments in key sectors, Hegel explores how China has established strategic influence in African telecommunications, finance, and surveillance sectors and the implications this has for cybersecurity.

While noting that such investments are attractive to African countries for their undoubted benefits, the talk raises concerns about the trade-offs. In the realm of telecommunications, Chinese firms like Huawei and ZTE can be linked to potential cases of surveillance and control, evidenced by actions like internet clampdowns in Zimbabwe during politically sensitive times. In finance, an intricate web of financial engagements provides worrying opportunities for cyber espionage. Initiatives like the Safe City projects bring technological advancements but at the potential price of civil and political surveillance.

Hegel concludes with a call to action for the cybersecurity community. Collaborative efforts in monitoring and understanding the cyber activities in these regions are essential not only for the direct protection of entities in undermonitored areas but also for a broader understanding of the global cyber threat landscape.

Connecting the dots between regional cybersecurity issues in Africa and their global repercussions, this talk advocates for a more inclusive view of global cyber threats, highlighting the need for a unified and informed response from the cybersecurity community.

Watch below to see the full talk. Read the accompanying research paper for an even deeper dive.

About the Presenter

Tom Hegel is a Principal Threat Researcher with SentinelOne. He comes from a background of detection and analysis of malicious actors, malware, and global events with an application to the cyber domain. His past research has focused on threats impacting individuals and organizations across the world, primarily targeted attackers.

About LABScon

This presentation was featured live at LABScon 2023, an immersive 3-day conference bringing together the world’s top cybersecurity minds, hosted by SentinelOne’s research arm, SentinelLabs.

LABScon Replay | Quiver – Using Cutting Edge ML to Detect Interesting Command Lines for Hunters

By: LABScon
26 June 2023 at 13:16

What do GPT3, DALL-E2, and Copilot have in common? By grasping the structure and nature of language, these projects can generate text, images, and code that provide added value to a user. Now, they even understand command lines!

Quiver – QUick Verifier for Threat HuntER – is an application aimed at understanding command lines and performing tasks like Attribution, Classification, Anomaly Detection, and many others.

DALL-E2 is known to take an input prompt in human language and draw a stunning image with impressive matching results; GPT3 and similar projects can create an infinite amount of text seemingly written by a real person, while GitHub’s Copilot can generate entire functions from a comment string.

Command lines are a language in themselves and can be taught and learned the same way other languages can. And the application can be as versatile as we want. Imagine giving a command line to an input prompt and getting the probability of it being a reverse shell, used by an Iranian actor, or maybe used for cybercrime. A single prompt on its own may not help so much, but with the power of language model algorithms, the threat hunter can have millions of answers in a matter of minutes, shedding light on the most important or urgent activities within the network.

In this session, Dean and Gal demonstrate how they developed such a model, along with real-world examples of how the model is used in applications like anomaly detection, attribution, and classification.

Quiver – Using Cutting Edge ML to detect interesting command lines for Hunters: Audio automatically transcribed by Sonix

This transcript may contain errors.

Dean Langsam:
So first of all, I need to say that our code is in Jupyter Notebooks and PyTorch. So if any one of you wants to see the code, just use wheels, exploits and we'll be good. Okay, so this is Quiver. We did it, Gal and I. Let's begin. Those three logos are logos for three fairly new tools, although they're pretty famous. The first one is DALL-E 2. The second one is GPT-3, and the third one is GitHub Copilot. And let's start with some examples.

Dean Langsam:
So DALL-E 2 can create an image from text. In that example, we can see a cybersecurity researcher sitting on a beanbag in front of a pool in the desert in a fancy hotel trying to reverse engineer a nation state malware, working on a presentation in a realistic style. So that's you guys. If you can connect with that one, maybe this is you guys. As you can see, it's not very good with text, but you are all cybersecurity researchers.

Dean Langsam:
GPT-3 is a model that can generate text. This one is about its applications in cybersecurity. You don't really need to read that. What you need to know is that I've written only the gray part and GPT-3 created the rest.

Dean Langsam:
In the same manner, GitHub Copilot. This is code that I actually use, just some authentication stuff. When I wrote that, I was just starting to use GitHub Copilot, and only the gray parts are the parts that I actually typed in. GitHub Copilot did the rest for me. You can see that even in the function where I made a typo, I called it anonymized password, it understood that I meant to anonymize the password.

Dean Langsam:
Okay, so what's common to all those models? All those models understand language. They share common language features between users or between applications. And part of the learning process is unsupervised, a term that we'll speak about later. The question is, can we do the same for the language of command lines? And the answer is yes, but well, no. So currently you're thinking, what am I doing here? I came to a cybersecurity conference and we're here to talk about deep learning. Gal and I are not, first and foremost, cybersecurity people. We come from the field of machine learning and deep learning, and we tried to get a free trip to Phoenix. So we managed to.

Dean Langsam:
We're going to talk about the problems we had with command lines before, then about what changed that made this one possible, then about our package Quiver, for which, as you've seen, the acronym came first. And eventually we'll show the big show of what we've got. This is Gal.

Gal Braun:
So I'm Gal. Staff data scientist at SentinelOne for the last six years. A father of two. And Breaking Bad is the best show ever.

Dean Langsam:
And we are mostly the same person. I'm Dean. I've been a Staff data scientist at SentinelOne for three years. Actually, Gal got me into the company. I'm a father of one, and Breaking Bad is the best show ever. Except maybe The Wire.

Dean Langsam:
So because we're not in a deep learning conference, let's do like a few minute intro to machine learning and deep learning. What you see here are cats and dogs, and those are called samples. We want to create an algorithm that can distinguish between cats and dogs.

Dean Langsam:
One way they tried to do this before was with algorithms that people were trying to write by hand. Maybe if the ears are that way and the tail is that way, maybe it's a cat, maybe it's a dog. And it was a very hard problem. Even a person couldn't tell you why they are seeing a cat or a dog in this picture. It's just, like, when you know, you know.

Dean Langsam:
So we try to do this with deep learning. We just show the computer, the algorithm, many examples of cats and dogs. This is called tagging or labeling. You can go into Google and just type, give me pictures of dogs. Those would be the green ones. And then, give me pictures of cats. Those will be the red ones. Then you show the algorithm enough samples and it will create an algorithm using what we call training.

Dean Langsam:
Then when you give it a new sample, the gray one, you don't tell the algorithm which one it is. You put it in the algorithm and the algorithm spits out, well, this is a cat. In the same fashion, it says, this is a dog. Now, that was a pretty easy problem because you could search that on Google, like, give me cats, give me dogs. Enough people tagged cats and dogs in the history of time.

Dean Langsam:
Um, but as my friend John Naisbitt, I know he's not actually my friend, but he's a very famous person, said: "We are drowning in information, but we are starved for knowledge". All of us have a lot of stuff, like pictures of things, command lines, language, many things. So what we have is many command lines in SentinelOne. The thing we don't have is tagged data or labeled data. Tagging means saying whether a command line is good or bad. The green ones are good. The red ones are bad. Most of the people that can actually label the data for us are in this room.

Dean Langsam:
So I could ask you guys, instead of listening to the talk, to give me ten minutes of your time and start tagging data for me. But that is a very manual process and it would not scale up.

Dean Langsam:
So what changed? Well, in the old times, meet Mimi. Mimi Katz. She's Jewish like us. And she has a task: she gets many papers and we tell her to separate those papers between stuff about cybersecurity and stuff about machine learning. Even if she doesn't know the two concepts, maybe she can try to distinguish between the two. The problem is that the papers are in Hebrew and she doesn't know Hebrew. If you give her thousands of examples, maybe she can try to understand the hieroglyphs of Hebrew and work out which hieroglyphs are machine learning and which hieroglyphs are cybersecurity. But that, again, would not scale up.

So instead we can introduce a baby. This is Wanna, or Wanna Cry. Wanna also doesn't speak Hebrew. He doesn't speak any language. He's a baby. But what he does have is time, because he's a baby and people are speaking Hebrew and English next to him all the time. Where does it meet us? Well, this is the old way.

Dean Langsam:
We used to do things like this: the first one is task one. Give the student a task to distinguish between two things, then give another student a task to distinguish between two other things. A baby can do something else. We can give it books: first, understand language, understand what Hebrew is, understand the relationships between words. Just understand the language. Then, when we give it tasks, we can give it a lot less data to learn each task, instead of giving it the whole history of data for each different task. And you're probably starting to understand where we're going with this.

Dean Langsam:
This is, again, Quiver, and Quiver is the baby. Again, in SentinelOne we don't have a lot of labeled data about command lines, but we have a lot of command lines. So we can just ask Quiver, well, start reading those command lines and start to understand the language of command lines. Of course, this is not that simple. We have many command line languages and stuff like that, but basically you can just tell it, start reading command lines.

Dean Langsam:
Um, the way we do this is with a thing we call the masked language model. Basically, we give it a sentence, then we hide one of the words or a few of the words, and then we tell it: based on that sentence with the hidden word, try to predict that word. That's the way the model learns. This is how we virtually create labeled data for the task of learning the language.
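
The masking step described above can be illustrated with a small sketch. It only shows how unlabeled command lines turn into (input, target) pairs. The `[MASK]` placeholder, the 15% masking rate, the whitespace split, and the function name are assumptions made for illustration, not details given in the talk.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token

def make_masked_example(tokens, mask_prob=0.15, rng=None):
    """Hide some tokens and return (masked_tokens, labels).

    labels[i] holds the original token wherever a mask was applied and
    None elsewhere, so only hidden positions would contribute to training.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    # Make sure at least one token is hidden so every example is useful.
    if tokens and all(label is None for label in labels):
        i = rng.randrange(len(tokens))
        labels[i] = tokens[i]
        masked[i] = MASK
    return masked, labels

tokens = "powershell.exe -NoProfile -ExecutionPolicy Bypass -File run.ps1".split()
masked, labels = make_masked_example(tokens, rng=random.Random(0))
for shown, hidden in zip(masked, labels):
    print(shown, "<-", hidden)
```

The "labels" come from the text itself, which is why no human tagging is needed for this stage.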

Dean Langsam:
Now, once we've learned the language, we can deploy it into different tasks, such as classifying between different executables. We can do anomaly detection. We can of course try to distinguish between malicious and benign command lines, and so on and so forth.

Dean Langsam:
Of course, we have a saying in the data community that given infinite time and infinite data, the model will learn everything, but unfortunately we don't have infinite time or data. So we try to help our models. In our specific case, we try to take the command line wisdom and deploy some regex rules on it. So you can see that we are trying to mask different directory paths. We can understand when we are seeing a local IP or a public IP, we can see when we have base64 strings, and all those kinds of rules that we've created to help our model.
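
A minimal sketch of this kind of regex pre-processing follows. The patterns, the placeholder names (`<DIR>`, `<LOCAL_IP>`, `<PUBLIC_IP>`, `<BASE64>`), and the 24-character base64 threshold are all assumptions chosen for illustration; the actual Quiver rules are not shown in the talk. "Local" is interpreted here as a private address according to Python's `ipaddress` module.

```python
import ipaddress
import re

# Drive letter plus zero or more "dir\" segments (no spaces or quotes inside).
WINDOWS_DIR = re.compile(r'[A-Za-z]:\\(?:[^\\\s"]+\\)*')
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Long runs of base64 alphabet characters, optionally padded with '='.
BASE64 = re.compile(r"\b[A-Za-z0-9+/]{24,}={0,2}")

def _ip_placeholder(match):
    try:
        ip = ipaddress.ip_address(match.group(0))
    except ValueError:
        return match.group(0)  # not a valid address (e.g. 999.1.1.1): keep it
    return "<LOCAL_IP>" if ip.is_private else "<PUBLIC_IP>"

def normalize(cmdline):
    """Replace high-variance substrings with stable placeholder tokens."""
    cmdline = WINDOWS_DIR.sub(lambda m: "<DIR>\\", cmdline)
    cmdline = IPV4.sub(_ip_placeholder, cmdline)
    cmdline = BASE64.sub("<BASE64>", cmdline)
    return cmdline

cmd = (r"powershell -enc SQBFAFgAIAAoAE4AZQB3AC0ATwBiAGoAZQBjAHQAKQA= "
       r"-File C:\Users\bob\AppData\run.ps1 10.0.0.5 8.8.8.8")
print(normalize(cmd))
```

This sketch has known gaps: unquoted paths containing spaces are only partly masked, and dotted version numbers such as 1.2.3.4 would be treated as IP addresses.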

Gal Braun:
So given that we have this data set of command lines that we pre-processed, we want to feed it to the model. Eventually, as we mentioned before, the model receives numbers, so it needs to somehow translate these strings into a vector of numbers that it can process. The building blocks of language are, in our domain, called tokens. Let's see how we can extract them.

Gal Braun:
So there are several approaches, and the main one would be to dissect these strings into words by using several separators like slashes or whitespaces, which is great if you want to keep the high level entities. For example, argument names: you see that the argument name is still intact. But it makes our lives a little bit difficult when we tackle new strings. For example, if we see a new command line with a new argument name, we need to handle it somehow because we don't see it in our vocabulary.
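
The word-level approach and its unknown-token problem can be sketched as follows. The separator set, the `<UNK>` marker, and the function names are assumptions for illustration only.

```python
import re

UNK = "<UNK>"  # hypothetical marker for words missing from the vocabulary

def word_tokenize(cmdline):
    """Split on whitespace; keep slashes, backslashes and '=' as tokens."""
    parts = re.split(r"(\s+|[\\/=])", cmdline)
    return [p for p in parts if p and not p.isspace()]

def build_vocab(cmdlines):
    return {tok for line in cmdlines for tok in word_tokenize(line)}

def encode(cmdline, vocab):
    """Map every word not seen during training to UNK."""
    return [tok if tok in vocab else UNK for tok in word_tokenize(cmdline)]

vocab = build_vocab(["powershell.exe -NoProfile -File a.ps1"])
# -NoLogo was never seen, so all information about it is lost.
print(encode("powershell.exe -NoLogo -File a.ps1", vocab))
```

Known argument names stay intact as single tokens, but any new argument collapses into the same unknown marker.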

Gal Braun:
So a different approach would be just to split the whole command line into single characters, single chunks, which is the smallest unit possible. It mitigates the issue of unknown data that we tackle, but it makes it more difficult to understand the higher level entities. And it will take the model a lot more time to learn.

Gal Braun:
So there is a middle ground, a cool concept that popped up several years ago, called subwords. I won't get too much into the details of how it works, but it allows us to dissect the text into generic blocks.

Gal Braun:
You can see these double hashtags in some of the tokens, which mean it's an end of a word or a start of a word. And it gives us the good parts of both worlds.
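
One common way to produce subword tokens is greedy longest-match splitting against a fixed vocabulary, sketched below. The talk does not name the algorithm used, so this is an assumption; the sketch follows the WordPiece convention in which a `##` prefix marks a piece that continues a word, and the toy vocabulary is invented.

```python
def subword_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match split of one word into vocabulary pieces.

    Pieces after the first carry a '##' prefix (WordPiece-style convention,
    assumed here). If some part cannot be matched, the word maps to unk.
    """
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]
        pieces.append(piece)
        start = end
    return pieces

# Toy vocabulary, made up for the example.
vocab = {"-No", "##Profile", "##Logo", "##Exit", "script", "##.vbs", "##.js"}
for word in ["-NoProfile", "-NoExit", "script.vbs", "script.js"]:
    print(word, "->", subword_tokenize(word, vocab))
```

A previously unseen argument such as -NoExit still decomposes into known pieces, while frequent fragments keep their own token.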

Gal Braun:
So what we get as output, something we can extract from these models by feeding them text, for example a single token or a whole command line, is a vector of numbers that we can use for different tasks. As mentioned before, we take these command lines and feed them to a model, which learns the general semantics of command lines, and then we fine-tune it to specific tasks. During this learning phase it's optimizing what are called weights, some numbers inside the model, which will be different for each kind of task, so we can extract command line representations based on the specific tasks that we are interested in.

Gal Braun:
Okay. This was an intro to the core concepts of this model and how it works. Let's see some examples of the results that we got. So here's a nice blob. We took millions of command lines, fed them to a model, and let it just learn the semantics of command lines. Each one of these dots that you see here is a single token from the text that the model extracted.

Gal Braun:
Now we can take a look inside these tokens and see if the model understands some semantics about the command lines. Each one of the dots is a vector, and this is a two-dimensional reduction of the results. So for example, here you can see a -NoProfile token, which is a known PowerShell argument. On the left side, you can see a zoom-in to the specific location of -NoProfile inside the space of token representations. On the right, you can see the -NoProfile token, and the green ones are the ones that were mathematically closest to it. And on the right, in the small table, are the five closest tokens to the specific token.
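
Finding the tokens "mathematically closest" to a given token can be sketched as a nearest-neighbour lookup over embedding vectors. The talk does not say which distance was used, so cosine similarity is an assumption here, and the three-dimensional vectors are invented toy values rather than real model output.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_tokens(query, embeddings, k=5):
    """Return the k tokens whose vectors are most similar to the query's."""
    q = embeddings[query]
    scored = [(cosine(q, vec), tok) for tok, vec in embeddings.items() if tok != query]
    return [tok for _, tok in sorted(scored, reverse=True)[:k]]

# Toy embeddings, made up for the example.
embeddings = {
    "-NoProfile": [0.9, 0.1, 0.0],
    "-NonInteractive": [0.8, 0.2, 0.1],
    "-ExecutionPolicy": [0.85, 0.15, 0.05],
    "-jar": [0.4, 0.6, 0.1],
    "winword.exe": [0.0, 0.1, 0.9],
}
print(nearest_tokens("-NoProfile", embeddings, k=3))
```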

Gal Braun:
As you can see, the top three, which were the closest ones, are different PowerShell arguments or syntax, which is awesome because it means the model really understands something about tokens from PowerShell command lines. The bottom two are not directly related to PowerShell, but they are different arguments. For example, the second from the bottom is a Java argument, which again shows that it learns something about arguments to executables, which is nice.

Gal Braun:
A second example is a different token, which is double hashtag .vbs quote, meaning the end of a file path inside an argument value. In a similar way, you can see that the top three are different VBS tokens, but the rest of them follow exactly the same pattern with different file extensions.

Gal Braun:
So it's .js, .bat, .pl, .jar and so on. It really understands that these patterns, these tokens, are related inside the same space, and gives them similar vectors, which eventually led us to the conclusion: okay, we have something, it's not totally random, and we can try to take this model and fine-tune it to some task that we want.

Gal Braun:
So the most obvious thing that we could think about was trying to teach the model whether a specific command line is malicious or benign. We have this baseline language model that learned the general semantics, but we want to fine-tune it to this specific task. So firstly, we need some labels. SentinelOne has an MDR service called Vigilance, which basically goes through different cases, different threats happening on our customers' computers, and decides whether a specific case is malicious or benign. We use these cases to extract command lines that we know to be malicious, and vice versa.

Gal Braun:
So here you can see a PowerShell command line from a specific malicious threat, and the model actually classified it as malicious, which is cool. But these kinds of models let you extract something even more fruitful. You can try to extract, for each one of the tokens, how much it contributed to the decision of whether a command line was malicious or benign.

Gal Braun:
So, for example, you can see here the different parts that led the model to decide on this classification. Here you can see that the Invoke-WebRequest inside this PowerShell command and some parts of the URL caused it to think this command line is malicious.
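
One simple way to estimate how much each token contributes to a score is occlusion: remove one token at a time and measure how the score changes. The talk does not say which attribution method Quiver uses, so this is a swapped-in technique for illustration, and `toy_score` with its hand-picked weights is a hypothetical stand-in for the fine-tuned classifier.

```python
def occlusion_attribution(tokens, score_fn):
    """Score drop when each token is removed; positive means 'more malicious'."""
    base = score_fn(tokens)
    return [
        (tok, base - score_fn(tokens[:i] + tokens[i + 1:]))
        for i, tok in enumerate(tokens)
    ]

# Hypothetical stand-in for a trained model: hand-picked weights, not learned.
TOY_WEIGHTS = {"Invoke-WebRequest": 0.5, "-NonInteractive": 0.2, "winword.exe": -0.3}

def toy_score(tokens):
    return sum(TOY_WEIGHTS.get(tok, 0.0) for tok in tokens)

tokens = ["powershell.exe", "-NonInteractive", "Invoke-WebRequest", "http://example.test/a"]
for tok, contribution in occlusion_attribution(tokens, toy_score):
    print(f"{tok:25s} {contribution:+.2f}")
```

With a real model the contributions would not simply equal fixed weights, since removing a token changes the context of every other token.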

Gal Braun:
In a similar way, here are another two examples. The middle one is another malicious PowerShell command line that the model decided was malicious, and you can see the areas it is focusing on, for example the -NonInteractive token, or, it's a little bit faded, the sleep function at the end of the PowerShell command line. It learned from the data that we fed it what is malicious and might cause a command line to be malicious.

Gal Braun:
And the third example is an entirely benign command line. It's just a winword.exe executable that is given some file path. And the model thinks it's very, sorry, I didn't explain that the red parts say it's more malicious and the green ones led it to think it's more benign. You see that the fact that winword is the name of the executable, and some string parts in the file name, cause it to think it's a benign command line.

Gal Braun:
So what can we do with this model besides just predicting on a single command line? First, even if it's not 100% accurate, we can take this model and just throw every command line from a customer environment through it. It might make mistakes, but it can help us as hunters find our blind spots and reduce the areas that we might miss, because there's a bunch of threats and a lot of information going through our customers' environments.

Gal Braun:
And we have to focus somehow. So this tool can help hunters focus on the areas they might be missing. From another aspect, these kinds of explanations of what causes command lines to be more malicious or more benign can help us understand our customers' information and draw conclusions. For example, we can try to write a YARA rule that specifically fits the kinds of patterns we see in malicious command lines, or in command lines that the model usually thinks are more malicious.

Gal Braun:
So this was one example. The second one we wanted to talk about was executable classification. What we did is take our millions of command lines and split them into arguments and executable. Then we fine-tuned the model so that, given a set of arguments, it tells me which executable it is.
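A minimal Python sketch of the split described here, producing an (executable, arguments) pair that could serve as a label and an input for such a fine-tuning task. The quoting rule is a simplifying assumption; real Windows command-line parsing has more edge cases, and the talk does not describe the speakers' own preprocessing.

```python
import ntpath
from typing import Tuple


def split_command_line(cmdline: str) -> Tuple[str, str]:
    """Return (executable name, argument string) for a Windows command line.

    Assumption: the executable is either a double-quoted path or everything
    up to the first space.
    """
    cmdline = cmdline.strip()
    if cmdline.startswith('"'):
        end = cmdline.find('"', 1)
        if end == -1:  # unbalanced quote: treat the rest as the executable
            exe, args = cmdline[1:], ""
        else:
            exe, args = cmdline[1:end], cmdline[end + 1:]
    else:
        exe, _, args = cmdline.partition(" ")
    # ntpath handles backslash-separated paths on any host OS.
    return ntpath.basename(exe).lower(), args.strip()


if __name__ == "__main__":
    print(split_command_line(r'"C:\Program Files\Microsoft Office\WINWORD.EXE" /n "C:\docs\report.docx"'))
    print(split_command_line("cmd.exe /c java -jar app.jar"))
```

The executable name becomes the class label and the argument string is what the model sees, so a `cmd.exe` that launches Java can still land near the Java cluster.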

Gal Braun:
So, another piece of art on the right side. Each one of these dots is a dimensionality reduction of a set of arguments, and the color is the executable. As you can see, this representation is actually very, very good. Most of the clusters are very uniform, which means the model actually learned something about which arguments are relevant to which executable. Even more interestingly, there are clusters that are not uniform, which makes us think: what are these clusters, and what are these interesting command lines that look like different executables?

Gal Braun:
So here, just to have some more practical examples, you can see some of the clusters for main executables, like CMD or VPC. And as a cool byproduct, you can see at the top three different browsers that ended up in different clusters but around the same area in this n-dimensional space. You can also try to extract some cool information from these clusters, for example some intent. Here, for example, is a cluster made up mostly of communication executables, and here you can see a cluster where most of the arguments inside were Java arguments, plus one cmd. And if you print this cmd command line, it was actually an execution of Java, which makes sense. So this tool can be used to tag and understand the intent of a specific command line without even looking at it. When a new command line falls inside one of these clusters, you can use the model to predict: okay, this cmd.exe did something that we know is maybe executing Java.

Gal Braun:
And the last example here: you can see this big giant cluster is full of different PDF readers. At the bottom you can see two examples, CMD and MSEDGE, that also opened PDF files. So again, we can understand the representations in this cluster, tag it with some nice intent, and try to predict that intent for a specific command line.

Gal Braun:
So I'm sure there is at least one person in this audience who thinks, I can solve this thing with regex, sit down and write sophisticated patterns. But the awesome part of this model is that you just feed it a whole load of data. You don't really need to fine-tune it specifically for the task that you want. And as was mentioned, I think on the first day, there are more and more attack vectors through third-party executables. If you keep feeding this thing more and more data, it will understand the semantics of command lines better and can easily be fine-tuned to the task that we want. And if the results aren't good, we still have a saved spot in art school. And that's it. Thank you. Any questions?

Speaker3:
Yeah. Have you found any openly available databases, systems with tons and tons of data points relevant to this community, that we could use for our own play on machine learning?

Gal Braun:
Do you mean, given these representations that were created, whether we found something that we can publish to the community to use?

More like, say I don't have the entire database of SentinelOne data to work against, but I do want something to put it up against as a threat researcher. Is there any direction you would push me?

Dean Langsam:
Yeah. So this is currently only in the research phase. But in the same way, you can use DALL-E 2 although you're not an artist. Probably, we've never met. You're not an artist, you're not a poet, but you can use GPT-3 and you can use DALL-E 2. Once we have a working model, it should understand even new stuff in that domain. So if we trained it well and you give it a new command line, it could say the things that we've taught it to say. If we prove it successful and actually good, then yeah, of course we can do it.

Dean Langsam:
And one of the things that is fairly new in our world is that DALL-E 2 is one specific implementation of a bigger academic paper that's called CLIP. Basically, the most special thing that DALL-E 2 had is the data itself. But it gives you the data. Now if you say, I have more data, I can start from that model. The model itself is open source. You can start from that model and train it on your own. It will probably take you a lot of time, and you need many GPUs, but it's available to you. It's just a question of time and money, and not proprietary stuff and things like that. Yeah.

Gal Braun:
So it depends on what exactly you want to achieve. Overfitting sounds like the worst nightmare for every data scientist, but it might be good for you if you specifically want to find abnormal activity at a specific customer, if you want the model to be fine-tuned for a specific customer and extract information. It depends on the application. But yes, exactly.

I think one of the reasons we thought about, for example, normalizing paths or local IPs or base64 was to ease the training. But it was also so that we don't fine-tune to a specific IP or specific directory names. So the road is still long before we get to something very mature that we can publish publicly. But yes, it's something that needs to be thought about. And beyond that there's PII, for example. Let's not give some attacker the option to type "my IP is" something and have the model complete it to some DNS server or whatever, something that's important to the customer. But yeah, things to think about.
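The normalization mentioned here (paths, local IPs, base64) could look roughly like the Python sketch below. The regular expressions and the placeholder tokens `<USER>`, `<IP>`, and `<B64>` are assumptions made for illustration; the speakers do not describe their actual rules.

```python
import re

# Illustrative patterns only; a production normalizer would need more care.
_USER_DIR = re.compile(r'(?i)([a-z]:\\users\\)[^\\\s"]+')
_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
_B64 = re.compile(r"\b[A-Za-z0-9+/]{24,}={0,2}")


def normalize(cmdline: str) -> str:
    """Replace customer-specific values with generic placeholder tokens."""
    out = _USER_DIR.sub(r"\1<USER>", cmdline)  # C:\Users\alice -> C:\Users\<USER>
    out = _IPV4.sub("<IP>", out)               # dotted-quad addresses
    out = _B64.sub("<B64>", out)               # long base64-looking blobs
    return out


if __name__ == "__main__":
    print(normalize(r"notepad C:\Users\alice\notes.txt"))
    print(normalize("powershell -enc SQBuAHYAbwBrAGUALQBXAGUAYgBSAGUAcQB1AGUAcwB0AA== "
                    "-Uri http://10.1.2.3/a"))
```

Collapsing these values keeps the model from memorizing one customer's directory names or addresses, which also helps with the PII concern raised above.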

Dean Langsam:
Uh, we're not product people. So once we show it to the PMs, if they like it... As was shown, the part with the green and red parts is very cool to us. Will customers find it useful? That's not on us, I think. I think it will be cool to show it, but again, the PMs will decide.

Thank you, guys.

About the Presenters

Gal Braun is a data scientist at SentinelOne, working on Data Science & Machine learning focused on explainability, representation learning, and visualizations.

Dean Langsam is a data scientist at SentinelOne, working on the intersection of data science, machine learning, deep learning, language models, Python scientific programming, data visualizations, and Bayesian modeling.

About LABScon

This presentation was featured live at LABScon 2022, an immersive 3-day conference bringing together the world’s top cybersecurity minds, hosted by SentinelOne’s research arm, SentinelLabs.

LABScon Replay | Star-Gazing: Using a Full Galaxy of YARA Methods to Pursue an Apex Actor

By: LABScon
12 June 2023 at 14:16

This must-see talk discusses a highly-regarded but rarely publicly investigated threat actor, malware similarity, and YARA. Publicly available data yields just a generic AV signature with the actor’s name, leaving a void for malware analysts looking to understand the overlaps between different malware families attributed to the same actor.

Greg Lesnewich explores how analysts can use YARA as an analyzer with the console output, leveraging some simple Python scripting, to develop a malware similarity methodology. With a little – but not too much! – effort, analysts can easily build their own custom malware analysis toolkits using nothing other than freely available open source projects.

Greg’s presentation highlights just how well YARA can be used to pursue an apex predator and contains plenty of examples and links to all the tools used in the talk. Greg also shares the custom tooling he built as he analyzed a notorious threat actor, which can easily be adopted or adapted by other analysts to suit their own purposes.

Star-Gazing | Using a Full Galaxy of YARA Methods to Pursue an Apex Actor | By Greg Lesnewich (Proofpoint): Audio automatically transcribed by Sonix

Star-Gazing | Using a Full Galaxy of YARA Methods to Pursue an Apex Actor | By Greg Lesnewich (Proofpoint): this mp4 audio file was automatically transcribed by Sonix with the best speech-to-text algorithms. This transcript may contain errors.

Greg Lesnewich:
Hello, everyone. Thank you to the LABScon organizers, to Ryan, to JAGS, everyone at S1, and all the event staff for this amazing event. I think I’m not the only one that’s been enjoying the week here so far.

Greg Lesnewich:
My name is Greg. I work at an email company called Proofpoint. That is, my job is primarily doing what Victor does following me, chasing the L word out of our email data. And today’s talk is nothing about that. So before we start, this talk does discuss a bit of a taboo actor, which I track as Bright Constellation. But there are a litany of disclaimers that my wife and our company mandated that I say: I do not discuss incident responses. We do not actively pursue this actor. There are no leaked documents herein; this is personal research. And although the actor is something that was a little bit attention grabbing previously, it was mostly an interesting piece of data for exploring how to develop a malware similarity methodology via YARA.

Greg Lesnewich:
So there are going to be some musical references scattered throughout here that link to the naming of the malware families themselves. If you can figure them out, you can take a shot after the talk with me. So first, I think that YARA and a lot of parts of this conference only happen through learning from one another, and people being open and willing to share and teach others. The list of humans and robots that have helped me to really learn and understand and develop some better detection ideas far exceeds this slide.

Greg Lesnewich:
A few that I want to call out today are Connor McLaughlin, Ariel and Costin from Kaspersky, xorhex, and of course, our pal Steve Miller. And so, getting to the elephant in the room, our subject today is the Lamberts. I think everybody here probably knows who they are because Juan knows who they are. And at least in my time in the threat intel space, they have been maybe the highest regarded actor that I’m aware of. Juan has talked on and on and on about their amazing multi-framework toolkit and their incredible operational security and their awesome tradecraft. And so I’m building on a lot of the work that Symantec and Kaspersky and previously FireEye had published. But my interest in them is basically only because I knew that if I submitted about them, Juan was likely to accept my talk.

Greg Lesnewich:
So the Lamberts present a little bit of an interesting problem for us as an industry. Kaspersky and Symantec really had this amazing series of very interesting actors with white papers getting published about them, like Equation, like Project Sauron, like Stuxnet, Duqu, name one. And they had all these really rich papers discussing the malware and doing all these sorts of deep technical analysis, so that you could walk away with an understanding of what was happening.

Greg Lesnewich:
And Kaspersky has no reason to, like, I’m not throwing shade on them, but their paper about the Lamberts was noticeably shorter. There weren’t a lot of hashes published with it, but they did have this cool chart showing the constellation of the Lamberts toolkit. There wasn’t a white paper to support the linkages or highlight what was going on there, which to me presented a pretty interesting opportunity, because if you go on VirusTotal, there is a detection across ESET and Kaspersky that just says Lamberts, but it unfortunately is not linked to any of the colors listed there. So it presented kind of a fun black box for us to play with.

Greg Lesnewich:
And so I think like most other threat intel analysts, this is a familiar sight. After another vendor publishes a report, you have a list of files that if they didn’t publish a YARA rule or some other form of detection, you just sort of have to figure out detection in your own environment. And so, yeah, this is our starting point, I think like a lot of other investigations.

Greg Lesnewich:
And so, the initial methodology. We’re going to walk through a few different steps that I took that I thought were decently valuable. I’m going to take a macro view of all of the 50 samples that were available on VirusTotal at the time that I started this.

Greg Lesnewich:
And what we’re going to do is rely really heavily on a couple of tools like YARA, particularly its console module. For those of you that aren’t familiar with it, it’s a console like anything else, like Python, whatever else. There’s a script that Steve Miller built to wrap the console module called Ronnie, which is a Ronnie Coleman reference that I think one person in this room gets. And then we’re going to use another tool called Binary Refinery to show the evidence of some of the data that we’re working with. And knowing the crowd here at LABScon, I’m going to use that as an excuse to roll really quickly through the first section of the content.

Greg Lesnewich:
So initially, like most other analysts, you’re looking at samples in bulk. We’re going to look for overlaps across the import hash, hashes of the sections, the resources, and then more developer fingerprints like the PDB path and the DLL name, and then look at the general geometry of all these files. And if you take this initial surface area, even for this elite, highly apex actor, we can already start to see some overlaps with these DLL names up here at the top and then some import hashes mixed with DLL .dll there at the bottom.

Greg Lesnewich:
And so one of the things that I want to really highlight in this talk is the codification of what you can do with a local YARA instance, like on an analyst machine and just plug your ideas into console output rules.

Greg Lesnewich:
And so you can have it burp out things like, say, the rich header hash and then use sort and uniq to burp out overlaps. And as you work through this, at least for this actor in particular, and I think this applies to a lot of them, you can start to see a number of weird overlaps, like these DLLs mixed with the PDB paths. And if you iterate and iterate and iterate and you look at things like the resource and section hashes, ignoring that very obvious empty hash there at the top, you do eventually get to start clustering some of the families. Notably, the PDB path, the export names, and more general hashing were really good for us to start to cluster some of these families together. And we actually have our first linkage across the malware families, with Rationalist and Cutting Ties sharing this weird smartcard helper string resource. Still don’t know what it means, but it’s sort of a weak link to point these families together. So after this first round of methods and techniques that I think we’re all familiar with, we have 13 families clustered, but we still had 30 to 40 files outside of those folders. So we still had more to do.
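The sort-and-uniq step over console output can be approximated in Python as below. This sketch assumes each sample contributed one line of the hypothetical form `feature: value`; the exact output of the console rules used in the talk is not shown in the transcript.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def shared_values(lines: Iterable[str], min_count: int = 2) -> List[Tuple[str, int]]:
    """Rough equivalent of `sort | uniq -c | sort -rn`, keeping repeated values.

    A value that appears for two or more samples is a candidate overlap.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    return [(value, n) for value, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    # Invented example lines, one per sample per feature.
    output = [
        "imphash: aaa",
        "imphash: bbb",
        "imphash: aaa",
        "dll_name: x.dll",
        "dll_name: x.dll",
        "dll_name: x.dll",
    ]
    for value, n in shared_values(output):
        print(n, value)
```

Values that are shared only because they are empty or generic, like the empty hash mentioned above, still need to be filtered out by eye.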

Greg Lesnewich:
And it sort of becomes immediately obvious that as you’re doing these things in bulk, using just features doesn’t like really highlight how the samples are related to each other and it’s pretty brittle. So an import hash can change. They can decide to change the name of an export. And so we want to do something a little bit more resilient. And so one of the themes of this talk is going to be, can we do more? Can we do better? And so let’s keep digging in and try and answer that.

Greg Lesnewich:
And I think the golden goose of all of YARA stuff is finding shared code, not just shared features. And the benefit that we have is that we can use YARA’s console output without necessarily needing to use something like a disassembler or a hex editor for every single file, especially as more traditional threat intel analysts will get the files in bulk, not one by one by one. And so if we want to hone in on where code is, the PE file format dictates where it is, so we can look, just as a first example, for sections that are marked as containing code or as memory executable by the file format itself. And then instead of hashing the full thing where there might be padding, there might be differences in data at the end of it.

Greg Lesnewich:
What if we just hash the first 100 hex bytes and call that a sector hash? Throwing this at the wall, there was already an easy win there, with the eight sector hashes marked at the top compared to the seven numbered section hashes. So immediately we had something stick. And surprisingly, you know, we can see the data that gets hashed here. There are a lot of these three instructions that might not be the most interesting or unique code, but their position and clustering together ends up being unique to a family that we track as Rationalist.
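The talk does this with YARA console rules. As a rough stand-alone equivalent, the Python sketch below parses the PE section table with the standard library and hashes the first 0x100 bytes of every section flagged as code or executable. MD5 and the function name `sector_hashes` are assumptions made for illustration.

```python
import hashlib
import struct
from typing import List, Tuple

IMAGE_SCN_CNT_CODE = 0x00000020      # section contains code
IMAGE_SCN_MEM_EXECUTE = 0x20000000   # section is executable in memory


def sector_hashes(data: bytes, length: int = 0x100) -> List[Tuple[str, str]]:
    """Return (section name, MD5 of its first `length` raw bytes) for code sections."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (num_sections,) = struct.unpack_from("<H", data, e_lfanew + 6)
    (opt_size,) = struct.unpack_from("<H", data, e_lfanew + 20)
    table = e_lfanew + 24 + opt_size  # section table follows the optional header
    hashes = []
    for i in range(num_sections):
        (name, _vsize, _vaddr, raw_size, raw_ptr,
         _prel, _plin, _nrel, _nlin, flags) = struct.unpack_from(
            "<8sIIIIIIHHI", data, table + 40 * i)
        if flags & (IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE):
            chunk = data[raw_ptr:raw_ptr + min(length, raw_size)]
            label = name.rstrip(b"\0").decode("ascii", "replace")
            hashes.append((label, hashlib.md5(chunk).hexdigest()))
    return hashes
```

Run over a folder of samples, the digests can then be counted for repeats in the same way as any other feature.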

Greg Lesnewich:
But can we do better than just blindly hashing data at the start of a section? I hope that the answer is yes. And so there are a couple of other places where the PE tells us there is code, mostly the entry point and the export functions. And so what if we did something silly like using a console rule to hash the first 20 bytes from the entry point on forward? And the other thing that you can end up doing with the console, instead of just putting in a string, is putting in almost the entirety of a YARA rule. Whenever you’re having, you know, maybe your AC failed over the summer and you’re having some weird ideas about how to find malware similarity, you can codify that in the moment that you’re thinking about it and have it live on your analyst machine forever.

Greg Lesnewich:
And so it can sort of be YARA automation, maybe not perfectly, but in a way that you control. And this ends up actually working and allowed us to cluster a few new families. In this instance, a family we tracked as Marianas Trench. You can see that hashing that first 20 bytes catches a lot of conditional jumps and decrement instructions, which it turns out was really useful because the export name changed over a bunch of the samples. But hashing those first 20 bytes with those particular instructions was unique not only among the other Lamberts and Bright Constellation samples, but across all of VT and my own very small malware repository. So we had some wins from that, and we were able to cluster some additional families using some of these sector hashing, entry point, and export hashing methods, namely Invisible Enemy, Bloodletter, and Existence. But there are a lot more functions inside of these PE files, as many of you know. And so, coming back to the question, can we do better? Most of those export and entry point functions do call other functions. So how do we get to those?

Greg Lesnewich:
This actually ends up becoming a little bit of a math problem, which took me an embarrassingly long time to sort of figure out. But YARA can loop over a certain set of bytes inside of a file. And so if you pass it, something to look for the entry point and the first 25 bytes after it and look for any relative call instruction, you can modify the bytes that come after that and sort of follow that into the next function and then hash that. So you get a little bit of this idea of provenance, of something getting called from an export or an entry point and then the code that is inside of it.

And so in this example, this allowed us to cluster a family that we tracked as Escape Artist, where YARA iterates over the first 25 bytes of this export, follows both of those functions, and hashes them to see if they match that hash. And for the second one, they do. What that data ends up being is the first 14 bytes, down to that push 200 instruction. Again, maybe a little surprisingly, this was a completely unique feature to just Escape Artist. It is code. It might not be perfect code overlap, but it only clustered among these three samples of Escape Artist and nothing else out there on VT or in my malware collection. So once it becomes a math problem, you can sort of get into this idea of tertiary, or whatever comes after tertiary, function hashing, which, you know.
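The "math problem" is the relative call arithmetic: an `E8` opcode is followed by a signed 32-bit displacement, and the callee sits at the address of the next instruction plus that displacement. The Python sketch below applies this to a raw byte buffer. It assumes caller and callee live in the same section, so file offsets behave like addresses, and it scans bytes naively instead of disassembling, so an `E8` inside another instruction's operand would be a false positive. The function name, MD5, and the defaults of 25 and 14 bytes (taken from the example above) are illustrative choices.

```python
import hashlib
import struct
from typing import List, Tuple


def called_function_hashes(
    data: bytes, start: int, window: int = 25, hash_len: int = 14
) -> List[Tuple[int, int, str]]:
    """Follow relative calls found in `window` bytes after `start`.

    Returns (call offset, callee offset, MD5 of the callee's first
    `hash_len` bytes) for each E8 call whose target lands inside the buffer.
    """
    results = []
    end = min(start + window, len(data) - 4)  # need 4 operand bytes after E8
    for off in range(start, end):
        if data[off] != 0xE8:  # opcode for CALL rel32
            continue
        (rel,) = struct.unpack_from("<i", data, off + 1)
        target = off + 5 + rel  # next instruction + signed displacement
        if 0 <= target <= len(data) - hash_len:
            digest = hashlib.md5(data[target:target + hash_len]).hexdigest()
            results.append((off, target, digest))
    return results
```

Applying the same function again at each callee offset gives the tertiary hashing mentioned above, with the false-positive risk growing at every hop.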

Greg Lesnewich:
If YARA is cooking, some people like VT have a vested interest in keeping their restaurant running smoothly. Doing stuff like this is like brewing beer in your basement as a personal experiment. So don’t write rules like this and put them on VT or in your own internal tooling. I think that it can be useful for really exploring your own knowledge of where things are coming from and where to find overlaps across a very small set of files, but it may be more of a last-resort thing. And trying other things like conditional jumps or absolute calls was pretty useless. But it did get us another family to cluster in.

Greg Lesnewich:
And by this point we have all these families clustered together. There are two that stand out here, Level and Impairment, that only have one file in them each. They didn’t fit into any other buckets, so they sort of got deemed to be their own family. But we don’t really have any idea how they relate to each other. And so we’ve sort of reached the limit, in my opinion, of what we can do with just the console. And so we have to expand our tooling and go look in a different direction. And like Philippe, I have us at a “staring at the abyss” titled slide. And so it comes down to the fact that we have to disassemble.

Greg Lesnewich:
And in that disassembly, disassembling meaning going down to the function level of the PE, we also have to account for changes in the file, like the addresses that get called, so that you can wildcard them out and avoid them. But using those functions is kind of a pain in the ass. In the previous Escape Artist example, there are 678 functions inside of it. How do you pick among those which to focus on? Do you take those that have a high cyclomatic complexity? Do you pick those that have a ton of cross references? Making thresholds for those is really difficult because you don’t necessarily have the best idea across all the files of what a large number of cross references is. And so how do you pick which functions to hone in on? Thinking about this over the summer, the answer was gifted to me in the form of a guy called Willi Ballenthin and a tool called FLOSS that I think some of you would be familiar with. It’s a Mandiant tool, so thank you to Willi and Moritz for building it. It does a lot of cool stuff that doesn’t really get talked about. It uses this engine that visi built called vivisect to emulate, which is how it follows functions like the ones shown here in the screenshot and then burps out the decoded strings.

Greg Lesnewich:
It turns out that if you use the X flag in the previous version of FLOSS or the V flag in the current one, you can get not only the offsets of those strings to write a rule on, but also the likely decoding function. So those end up being, at least in my opinion, decently high fidelity. And where Willi Ballenthin stepped in over the summer was that they upgraded FLOSS to version 2.0, which exposed it as a Python library. And you can write a function here like Willi kindly did for me when I asked him a question. His solution was to build a tool for me instead of just answering the question. And we can use those as a feeder for disassembly.

Greg Lesnewich:
And so, I don’t have an IDA license, and Rizin was a really good option for us to walk through and disassemble all the files, particularly because it does this zignature masking, which allows you to basically mask out the address and wildcard it instead of just taking the bytes out of each function individually. And so what you get is the golden goose of a decently interesting code-based YARA rule. We’re open sourcing this today. The link is going to be up here in a little bit. We’re calling it Floss2YAR because we are not very creative, but this was my solution to looking at the Lamberts toolkit and figuring out how to link the different disparate families together.
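To illustrate the masking idea only, not how Rizin or Floss2YAR implement it, here is a small Python sketch that turns a function's bytes plus a set of offsets to mask (for example, the operand bytes of a call) into a YARA hex string with `??` wildcards. The helper names and the example rule name are hypothetical.

```python
from typing import Iterable


def to_yara_hex(code: bytes, masked_offsets: Iterable[int]) -> str:
    """Render bytes as a YARA hex string, wildcarding the masked offsets."""
    masked = set(masked_offsets)
    return " ".join("??" if i in masked else f"{b:02X}"
                    for i, b in enumerate(code))


def build_rule(name: str, code: bytes, masked_offsets: Iterable[int]) -> str:
    """Wrap the masked bytes in a minimal single-string YARA rule."""
    hex_string = to_yara_hex(code, masked_offsets)
    return (
        f"rule {name}\n"
        "{\n"
        "    strings:\n"
        f"        $func = {{ {hex_string} }}\n"
        "    condition:\n"
        "        $func\n"
        "}\n"
    )


if __name__ == "__main__":
    # CALL rel32 followed by XOR EAX, EAX; the four displacement bytes change
    # whenever the callee moves, so they are masked out.
    func = b"\xe8\x19\x00\x00\x00\x33\xc0"
    print(build_rule("example_masked_function", func, range(1, 5)))
```

Deciding which offsets to mask is the part a disassembler does for you; here they are supplied by hand.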

Greg Lesnewich:
Like anything else, it has limitations, but we put it out for free. And so if it sucks, I wrote it. So you get what you pay for. There are a bunch of other tools out there that do this too, but I didn’t have a great understanding of how they were doing things. YARA-Signator and Binlex are awesome, but there wasn’t really a direct answer of, okay, you know, this is rare, but what’s it doing? If you’re going to go through the time of disassembling something, you might as well have an idea of what it’s doing. And so the benefit of honing in on likely decoding functions is that you get things like this. This is a slide that I blatantly stole from Costin that linked a bunch of these different Lambert families together. And so looking for decoding functions gets you things like this, with all of these weird XOR and move instructions.

Greg Lesnewich:
And so what happens if you keep running this over batches and batches of these files, very quickly failing and finding things that are just generic Windows functions? You can start to link these nodes not only with the idea that the code is similar, but with the understanding that the functions are actually shared.

Greg Lesnewich:
And so you can iterate and iterate and iterate, also occasionally running on export functions, and you end up landing on this constellation of Lamberts tools, which looks a little bit cooler if you color in what I suspect the families actually are. Ariel in the back is going to grade my effort here at the end because Kaspersky knows way more about it than I do. But this was my best guess for what these families were, whether some were updated versions, and how they map to their color coding.

Greg Lesnewich:
So looking at how we did using this method, we were able to link 14 out of the 21 families. There are six families that we left out to dry. So if you subscribe to “Ds or Cs get degrees,” you could call it a win. I do. So I am calling it a win. And you know, looking at all of these files in aggregate, a couple of things do end up standing out, like they really like running as Windows services. There’s a lot of interesting functions that build out Windows services with string names that sort of spoof advertising corporations, and similar things in their C2s. And it became apparent as I was exploring them that they’re really keen on hiding from a systems administrator that knows what they’re doing, and from the Windows operating system’s general logging and telemetry.

Greg Lesnewich:
There wasn’t a lot of like user evasion where none of their files had like a PDF icon to entice someone to click, nor was there any sort of like direct AV evasion, at least in my analysis of like I guess, 80 files at the end of the day after retro hunting.

Greg Lesnewich:
There are some shortcomings with this. I’m probably missing a whole litany of files that Kaspersky has discussed in open source, as well as Juan, who can’t give a conference talk without mentioning them. So there are a lot of gaps to be filled in. I also didn’t really reverse any of these. The hashes and all the rules are going to get shared, so if you want to dive in and contribute to filling in some of the gaps, that would be really cool.

Greg Lesnewich:
And in assessing what we did, I think that the type of tooling that is getting found could definitely create a bias in the data set, in that if something is running in plain text in memory, that’s much more likely to get clipped and thrown into VirusTotal than something that is maybe encrypting big chunks of itself in memory and only decoding them at specific call time. I also could be completely overthinking this, and all of those collections and connections that I had in that previous slide could all just be modules of two families, and I could be completely overblowing what’s happening. And without really doing the incident response or knowing how the samples interact, we’re at a limit of how we can link them to each other.

Greg Lesnewich:
So I’ll leave this slide up. The tool is here. I think that the link works on GitHub. All of the rules, and in comments the hashes, are on Pastebin: the hashes and the rules, both the wonky ones as well as the code-based ones, are up there, and the console rule set for automatically burping out things like the import hash or the tertiary called function hash is all there. And I’m very willing to share the slide deck with people because there are a lot of slides in the appendix about what the families actually look like and some oddities in them. But you have to be a real human and come talk to me to be able to do that. I’m not just going to tweet it out because I don’t want to get disappeared.

Greg Lesnewich:
And so I think that the main takeaway that I had from doing this research is that if YARA is good enough and flexible enough to track Bright Constellation or the Lamberts, it’s probably good enough for a lot of the other actors that we’re facing.

Greg Lesnewich:
I will say there is an additional bias in there, in that these samples were definitely not bloated. They’re not written in Delphi, so there isn’t a ton of additional data in there. They’re not stuffing full libraries like OpenSSL or zlib in there either. So iterating over them was a little bit of an easier job. But really the thing that I learned most from doing this research is that if you’re an analyst and you have an idea, it is worth your time to learn enough Python or enough Go or whatever language you want to subject yourself to in order to build it, because no one is going to have the same vision that you have and no one is going to know the outcome that you’re going to want.

Greg Lesnewich:
And so the, I don’t know, two and a half years it took me to really feel comfortable writing Python sort of enabled this to happen. So if you’re an analyst, the idea is worth it. We’re a better community if you put it out into the world. So yeah, if it doesn’t exist yet, build it. And with that, I’m disappointed that Juan and Kim missed a whole talk about the Lamberts, but I will be taking questions. Thank you. There was no new information. There weren’t any docs. You didn’t miss anything. Ariel. How did I do?

Speaker2:
Awesome. Thank you.

Greg Lesnewich:
Cool. Thank you, everyone. Sabrina Yeah.

Speaker3:
All right.

Speaker2:
Throwing up applause for Greg. Awesome.

About the Presenter

Greg Lesnewich is senior threat researcher at Proofpoint, working on tracking malicious activity linked to the DPRK (North Korea). Greg has a background in threat intelligence, incident response, and managed detection, and previously built a threat intelligence program for a Fortune 50 financial organization.

About LABScon

This presentation was featured live at LABScon 2022, an immersive 3-day conference bringing together the world’s top cybersecurity minds, hosted by SentinelOne’s research arm, SentinelLabs.

Want to join us for LABScon 2023? The Call for Papers is now open!

LABScon Replay | Does This Look Infected 2 (APT41)

By: LABScon
18 May 2023 at 13:25

In March of 2022, Mandiant released new research detailing APT41’s persistent campaign leveraging novel exploits, malware, and techniques to compromise U.S. State Government networks. APT41 continued to demonstrate their tempo by exploiting a zero-day in an animal health management application before quickly shifting to operationalize the then fresh Log4j vulnerability.

At the time, APT41’s goals were unclear. The “Double Dragon’s” name is derived from APT41’s well documented dual espionage and cybercrime operation. Were they hitting U.S. State Governments to support greater intelligence collection initiatives, or for financial gain?

Mandiant researchers Van Ta and Rufus Brown take us on a journey of discovery into the mysteries of a long tail, persistent compromise of U.S. Government networks and offer a unique insight into one of the world’s most sophisticated threat actors.

Does This Look Infected 2: Audio automatically transcribed by Sonix

Does This Look Infected 2: this mp4 audio file was automatically transcribed by Sonix with the best speech-to-text algorithms. This transcript may contain errors.

Van Ta:
All right. Thank you, everyone. Thank you for attending. We also wanted to extend a thank you to the LABScon organizers for a great inaugural event so far. So let’s give them a round of applause before we get started. So my name is Van Ta. This is my colleague Rufus Brown, and we’re both part of Mandiant’s Advanced Practices Team. We’re really excited to be here today to expand on a story that we began telling in March of this year. And so, without further ado, this is Does This Look Infected? First, I must disclaim you.

Van Ta:
All right. So in March of this year, we published research on a persistent, months-long APT41 campaign to gain access to state government networks. Between May 2021 and February 2022, APT41 compromised at least six state government victims, primarily through exploitation of deserialization vulnerabilities in Internet-facing web applications.

Van Ta:
Now, throughout the roughly ten month time frame, APT41 used two different zero days. The first was in an animal health management application known as USAHerds, which at the time of our analysis was used by 18 different states. Now the nature of this vulnerability was in a static machine key that was present in all default installations of the USAHerds application. And so APT41, in possession of this key, could then compromise any server on the internet running this specific application. Now, in December of 2021, APT41 quickly shifted gears to operationalize the then fresh zero day in Log4j. Now in the months prior, our research on APT41 revealed a number of net new malware variants, and that remained the same with Log4j.
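To make the static key problem concrete, here is a minimal sketch of why one key shared by every default installation is so damaging. It is an assumption-laden illustration, not the USAHerds or ASP.NET format: the key value, the `sign` and `verify` helpers, and the payload are all hypothetical, and a plain HMAC-SHA256 tag stands in for whatever integrity check the real application performs.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical stand-in for a key that ships unchanged in every default install.
DEFAULT_INSTALL_KEY = b"same-key-in-every-default-install"
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def sign(payload: bytes, key: bytes) -> bytes:
    """Append an HMAC-SHA256 tag so a server can check the payload's origin."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def verify(blob: bytes, key: bytes) -> Optional[bytes]:
    """Return the payload if its trailing tag matches, otherwise None."""
    payload, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return payload if hmac.compare_digest(tag, expected) else None


# Whoever holds the shared key can mint a blob that any default install trusts.
forged = sign(b"attacker-chosen serialized object", DEFAULT_INSTALL_KEY)
default_install_accepts = verify(forged, DEFAULT_INSTALL_KEY) is not None

# An install that generated its own key rejects the very same blob.
unique_install_accepts = verify(forged, b"key-generated-per-install") is not None
```

In this sketch the forged blob passes verification on any server still using the default key and fails on a server with its own key, which is the reason per-installation key generation matters.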

Van Ta:
What we were observing was APT41 exploiting victims with Log4j to then deploy the Linux variant of a backdoor that we call KEYPLUG. Now this is notable for a number of reasons. Number one, this was the first time we had observed a Linux port of this backdoor, for a piece of malware that’s been around since at least 2019. And number two, the Windows version of this backdoor was heavily used during the government intrusions in the months prior. So not only are they able to shift gears, switch up and operationalize a new zero day, but they’re able to deploy a new malware capability while still simultaneously operating in state government networks. So a lot of tenacity there.

Van Ta:
Now, throughout all this, it was pretty clear that APT41 put the P in APT. Right. It was frequent that we would begin response at one state government agency only to find APT41 was active in a separate, unrelated agency in the same state. And not only that, but upon eradication APT41 would quickly recompromise their targets. And that’s something that we observed five different times.

Van Ta:
And so with this research, we were able to unveil quite a bit. But one burning question that we still had that we couldn’t really answer was “Why?”. And that will be the focus of our conversation today.

Van Ta:
So at the time, there were a couple of safe conclusions that we could make. These are state governments. There are treasures within these networks that would be valuable to any adversary. And the evidence of a deliberate, adamant campaign, based on the evidence that I talked about in the previous slide, supported some level of a targeted collection mission. But even then, although we had evidence to support these things, we still don’t really have an answer to why.

Van Ta:
Now, at the time we had a couple of hunches, but nothing really conclusive. But let’s take a look at what that really looked like. So at one state victim, APT41 had deployed the passive version of a backdoor that we call LOWKEY on a server responsible for the state’s financial benefits application. Now being a passive backdoor, it was configured to listen for traffic to specific URLs, and in this case it was configured to listen for traffic to a URL in which one of the strings matched that specific benefits server application.

Van Ta:
Now APT41 matching their configurations to kind of blend in with the environment, blend in traffic with these different applications, that’s not something that’s net new. But it did show that APT41 wanted to maintain access to this server and this part of the network. Now, upon seeing something like this, one of the first questions that we would ask is, okay, how many states use this particular application? Do you guys like my breadsticks? Right there. Okay. And so to get a quick and dirty answer, we turned to scan data, looking specifically for servers that would elicit a similar response to this particular benefits application. Now, while Rufus was poking around, one server stood out: one, because it was the only server not in the United States, and two, it was located in China. And so being nosy like we are, we wanted to inspect it a little bit further. So let’s see what we found.
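The scan-data pivot described here can be sketched as a simple response fingerprint match. This is only an illustration under stated assumptions: the talk does not say which scan service or which response traits were used, so the record layout, the `fingerprint` and `similar_servers` helpers, and the choice of status code, `Server` header, and body hash are all hypothetical. The addresses come from reserved documentation ranges.

```python
import hashlib


def fingerprint(status, headers, body):
    """Hash response traits that tend to stay constant for one application."""
    traits = [str(status), headers.get("Server", ""), hashlib.sha256(body).hexdigest()]
    return hashlib.sha256("|".join(traits).encode()).hexdigest()


def similar_servers(reference, scan_records):
    """Return scan records whose response fingerprint equals the reference's."""
    wanted = fingerprint(reference["status"], reference["headers"], reference["body"])
    return [
        rec for rec in scan_records
        if fingerprint(rec["status"], rec["headers"], rec["body"]) == wanted
    ]


# Hypothetical scan records for illustration only.
LOGIN_PAGE = b"<html><title>Benefits Portal</title></html>"
records = [
    {"ip": "192.0.2.10", "country": "US", "status": 200,
     "headers": {"Server": "ExampleHTTPD"}, "body": LOGIN_PAGE},
    {"ip": "198.51.100.7", "country": "US", "status": 200,
     "headers": {"Server": "ExampleHTTPD"}, "body": b"<html>unrelated</html>"},
    {"ip": "203.0.113.44", "country": "CN", "status": 200,
     "headers": {"Server": "ExampleHTTPD"}, "body": LOGIN_PAGE},
]

matches = similar_servers(records[0], records)
# Servers that look like the application but sit outside the expected country.
outliers = [rec["ip"] for rec in matches if rec["country"] != "US"]
```

An exact hash match is the bluntest possible comparison; real hunting usually tolerates small differences in the response, but the pivot from "one known server" to "everything that answers the same way" is the same idea.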

Van Ta:
So we found what appeared to be some sort of custom web app running on an ephemeral port that was leaking PII data for US citizens belonging to one particular state. And digging a little bit further, we found something else that was pretty interesting. We found what appeared to be a custom Baidu map with custom pins located somewhere in China. And so again, being very nosy, we zoomed in a little bit further and we could see that all of the pins were located in Chengdu and in particular were four kindergartens in that area. Do you all remember Chengdu 404? That was the front company that was detailed in the September 2020 indictments of APT41 members.

Van Ta:
Now, at this point, we have some loose ties to operations at state government victims. But because we did not directly observe this server in relation to that particular operation, we couldn’t attribute this to APT41. And so at that time, although we had some hunches, we were still back at square one, not really knowing the answer to why. It wasn’t until we completed investigations at two additional victims that we were able to collect the evidence to get us closer to that answer.

Rufus Brown:
All right. Thank you, Van. So for the rest of the presentation, I want to try and focus on these two new state government victims. So specifically, new data we haven’t talked about that came from these two new state government victims. So starting out around last summer, June 2021, this is where we saw APT41 first gain initial access at State D. So this was through a proprietary Internet-facing web application, which no other state had. Shortly after, in August, this is where we saw APT41 gain initial access at the second state. Similar thing, proprietary web application, but this time it was an ASP.NET application.

Rufus Brown:
Starting out around August, this is where we first saw the group conduct lateral movement and reconnaissance activities for around 4 to 5 months. So this is a really long time for a technically capable actor such as APT41 to remain active in an environment and also really gain a better understanding of the network architecture, as well as gain a stronger foothold on many systems across the network.

Rufus Brown:
At the beginning of the year, this is where we saw them first laterally move to the state benefit servers and also really conduct some hands-on activity. So they started tampering with different software on the server. It really showed that they wanted to stay on these servers. So after an eradication event, about one month after, we saw them recompromise via a similar technique, Internet-facing web application exploitation. They quickly escalated privileges and got a foothold on over 50 systems in a very short amount of time. So really emphasizing that this group is very technically capable. They’re going to find web applications on your DMZ or Internet-facing that are vulnerable.

Rufus Brown:
They have the capability to do that. So the last time we saw any sort of interaction or our last observance of the US state government campaign was around March and then one month after in April is when we saw them turn their focus to other geographic regions and organization verticals.

Rufus Brown:
So what helped us put the pieces of the puzzle together, and really what were our big finds? So out of around three dozen systems in a 3 to 4 month time frame, 47% of those systems, which were DEADEYE-infected endpoints, were associated with the state benefits architecture. Right. That’s a pretty significant number for really showing what APT41 was interested in while in the environment.

Rufus Brown:
Secondly, while we started to investigate the state benefit system servers, we noticed that there was a peculiar malware that was running in memory on the server. This is what we track as FASTPACE, and one of the main capabilities of FASTPACE is to allow for unauthorized potential database modification.

Rufus Brown:
So if you’re not too familiar with FASTPACE: FASTPACE, which is also known as skip-2.0, was initially discovered and reported by ESET in late 2019. So pretty much this backdoor targets only Microsoft SQL Server (MSSQL) for in-memory database manipulation. The particular backdoor that they discovered and reported on in the initial blog targeted SQL Server versions 11 and 12, while the backdoor malware we identified at the state government victim targeted version 13. This really indicates that APT41 is likely continuing to use FASTPACE in their toolkit and is continuing to update it for different iterations of SQL Server as they come out.

Rufus Brown:
So the way it works, pretty much this backdoor gets injected into the SQL Server process and then it looks for specific byte pattern sequences. So these byte pattern sequences are associated with code functions in native SQL modules such as sqllang.dll and sqldk.dll. Basically, these targeted functions are related to credential validation, user authentication, event logging, SQL modification logs, things like that. So basically this pretty much covers up any sort of trace or track of what APT41 was doing on these database servers. So really, really difficult to keep track.
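Searching memory for byte pattern sequences is the same technique analysts use in signature scanning, and a small sketch shows how it works. The `parse_pattern` and `find_pattern` helpers and the bytes below are made up for illustration; they are not the patterns used by FASTPACE or taken from any real SQL Server module.

```python
def parse_pattern(text):
    """Convert 'AA ?? BB' into [0xAA, None, 0xBB]; None matches any byte."""
    return [None if token == "??" else int(token, 16) for token in text.split()]


def find_pattern(data, pattern):
    """Yield every offset in data where the wildcard pattern matches."""
    size = len(pattern)
    for offset in range(len(data) - size + 1):
        window = data[offset:offset + size]
        if all(want is None or got == want for got, want in zip(window, pattern)):
            yield offset


# Made-up bytes standing in for a loaded module; not a real signature.
module_bytes = bytes.fromhex("9090488bc45541549090")
signature = parse_pattern("48 8B ?? 55")

offsets = list(find_pattern(module_bytes, signature))  # [2]
```

Wildcards are what let one pattern survive small differences between builds, which is also why a tool keyed to such patterns needs updating when a new SQL Server version changes the targeted functions.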

Rufus Brown:
I think it’s important to note, too, that out of all Mandiant-investigated APT41 intrusions, this was the first time we saw FASTPACE in use by APT41. So they had been active since, I think, 2014, so like seven, eight years, and this is the first time we’ve seen this malware. And it was particularly at a state government victim, which is pretty interesting.

Rufus Brown:
So lastly, for State D after the eradication event, they went straight back to targeting state benefit servers, really just showing and indicating that they wanted to continue their mission, gather whatever data that they are apparently going after.

Rufus Brown:
Again, similar to state D, but for state E. They both targeted state benefit servers very heavily in both of these environments. So if we recall back to what Van mentioned in one of the beginning slides, so the log4j exploitation event, this is where we first saw the first iteration of the Linux backdoor for KEYPLUG.

Rufus Brown:
So about one month after we saw that backdoor dropped, we saw the passive version of this backdoor dropped at the state government victim. I think it’s important to note as well that this KEYPLUG passive version was only dropped on state benefit servers. Nowhere else in the environment.

Rufus Brown:
Lastly, so as we continue to investigate these servers in this environment, we saw them begin to tamper with the DNS configuration on the host. So this was a very pivotal point in our investigation and really helped us understand what types of data they were going after.

Rufus Brown:
So initially they began targeting these servers, laterally moved and gained access. Secondly, they deployed malware on these servers. It was just the KEYPLUG Linux passive version that I mentioned. The way it works is basically, once it gets injected into memory, it listens on an interface and looks for a packet that contains a magic byte sequence. This magic byte sequence is generated based on the host name of the infected server.
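For analysts, the practical consequence of a magic value tied to the host name is that the trigger bytes differ on every victim, so a single static network signature will not match across hosts. The sketch below shows the idea only. The talk does not describe how KEYPLUG derives its sequence, so the truncated SHA-256 scheme, the `MAGIC_LEN` value, and the host name are all assumptions chosen for illustration.

```python
import hashlib

MAGIC_LEN = 4  # assumed length; the real value is not given in the talk


def magic_for_host(hostname):
    """Derive a per-host magic value (assumed scheme: truncated SHA-256)."""
    return hashlib.sha256(hostname.lower().encode()).digest()[:MAGIC_LEN]


def packet_has_magic(payload, hostname):
    """Return True if the payload contains the magic expected for this host."""
    return magic_for_host(hostname) in payload


# Hypothetical host name and payload for illustration.
host = "benefits-app01"
payload = b"\x00\x01" + magic_for_host(host) + b"rest-of-packet"
triggered = packet_has_magic(payload, host)
```

Once the real derivation is recovered from a sample, the same approach lets a responder compute the expected sequence for each affected host and search packet captures for it.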

Rufus Brown:
Pretty similarly to how they target Windows operating systems during this campaign, they attempt to masquerade their files as legitimate binaries from vendors such as Microsoft, Fortinet and, I believe, VMware. So as we can see here, one was deployed as a shared object file and the other one as an executable, particularly masquerading as VMware Tools.

Rufus Brown:
So after they did that, they immediately went to target the DNS configuration on the host. So specifically the hosts file. So we acquired this file, took a look at it, and of all the entries in this file, there was only one IP address that was a remote IP address.
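The triage step described here, spotting the one remote address in a hosts file, is easy to automate. The following is a minimal sketch rather than the investigators' tooling: the `remote_entries` helper, the list of networks treated as local, and the sample file contents are assumptions, and the flagged address comes from a reserved documentation range.

```python
import ipaddress

# Networks treated as local for this sketch (loopback, private, link-local).
LOCAL_NETS = [
    ipaddress.ip_network(net)
    for net in ("127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
                "169.254.0.0/16", "::1/128", "fe80::/10", "fc00::/7")
]


def remote_entries(hosts_text):
    """Return (ip, names) pairs from a hosts file whose address is not local."""
    findings = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        try:
            ip = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not a valid address, ignore the line
        if len(fields) > 1 and not any(ip in net for net in LOCAL_NETS):
            findings.append((str(ip), fields[1:]))
    return findings


# Hypothetical hosts file contents.
SAMPLE = """\
127.0.0.1     localhost
::1           localhost ip6-localhost
10.20.0.15    db01.internal
203.0.113.10  verify.api.example   # the entry worth a closer look
"""

suspicious = remote_entries(SAMPLE)
```

On a server where every legitimate entry points at loopback or internal addresses, any line this returns is worth comparing against what the name resolves to in public DNS.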

Rufus Brown:
So we took a look at this, and this remote IP address was mapped to a domain. Particularly, this API domain was for an independent user verification service that was related to the state benefit system. So now potentially APT41 is allowing for this user verification traffic to get redirected to their C2.

Rufus Brown:
So potentially what could happen: let’s say a user logs into the state benefits application, right? They’re going to enter their username, password, maybe MFA. Once they do that, likely this back end application is going to generate an API request to this remote domain, likely containing user verification info. So now all that data, all that user verification info, is likely getting redirected to APT41’s C2 server.

Rufus Brown:
So we took a look at the server, we started profiling it, and we noticed that on one of the ports there was a self-signed X.509 certificate. Particularly, the self-signed X.509 certificate masqueraded as the verification service company’s country, state, locality, and organization name, as well as the domain and common name. So really just showing that they wanted to blend in with this traffic and really try to masquerade in order to evade detection.
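A certificate that copies a real company's subject fields but is self-signed can be flagged with a simple comparison. The sketch below works on an already-parsed certificate in the nested-tuple layout that Python's `ssl.SSLSocket.getpeercert()` returns for validated certificates. The helper names and all field values are hypothetical. Matching subject and issuer is only a heuristic for "self-signed"; a strict check would verify the signature, and for unvalidated certificates the raw DER would need to be parsed with a separate library.

```python
def flatten_name(rdns):
    """Turn ((('commonName', 'x'),), ...) into {'commonName': 'x', ...}."""
    return {key: value for rdn in rdns for key, value in rdn}


def looks_self_signed(cert):
    """Heuristic: a certificate whose issuer equals its subject."""
    return flatten_name(cert["subject"]) == flatten_name(cert["issuer"])


def impersonates(cert, expected_org, expected_cn):
    """Flag self-signed certificates that reuse a known organization's identity."""
    subject = flatten_name(cert["subject"])
    return (
        looks_self_signed(cert)
        and subject.get("organizationName") == expected_org
        and subject.get("commonName") == expected_cn
    )


# Hypothetical parsed certificates.
IDENTITY = (
    (("countryName", "US"),),
    (("organizationName", "Example Verification Co"),),
    (("commonName", "api.verify.example"),),
)
observed = {"subject": IDENTITY, "issuer": IDENTITY}
ca_issued = {
    "subject": IDENTITY,
    "issuer": ((("organizationName", "Example Public CA"),),
               (("commonName", "Example CA R1"),)),
}

flag_observed = impersonates(observed, "Example Verification Co", "api.verify.example")
flag_ca_issued = impersonates(ca_issued, "Example Verification Co", "api.verify.example")
```

Here the lookalike certificate is flagged and the CA-issued one is not, which mirrors the distinction the speakers drew between the genuine service and the server imitating it.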

Rufus Brown:
So unfortunately, this is where our investigation ended. Just our scope didn’t include any more of investigating the database servers or web application logs. So this is where it stopped.

Van Ta:
And so we started our story today with a couple of hunches. And with that, we added evidence collected from victims that now, in totality, paints a convincing argument that what APT41 was after was specifically our states’ financial benefits data.

Van Ta:
And although we’ve progressed significantly from where we were before, I think still, ultimately, after all of this, we really still just want to know why. Now, although what we don’t know has been the focus of our presentation today, as we wrap up, I want to talk about the things that we do know.

Van Ta:
So, number one, based on APT41’s operations on the state benefits server and based on our understanding of the data that would be exposed to them, it’s very possible that APT41 has the ingredients to take this in a financial gain direction. And similarly, we know that historically APT41 has the capability to run both financial gain and espionage operations concurrently.

Van Ta:
But even with that, the data exposed is highly sensitive and still could support some sort of collection mission.

Van Ta:
Now, number two, based on APT41 just being everywhere as we’re responding to this over a ten month time frame, and their willingness to exploit anything available to immediately get back in and retarget these servers, we are confident that the real answer to why does exist out there.

Van Ta:
And lastly, and arguably most importantly, the one thing that we know about this is that APT41 continues to remain undeterred after their September 2020 indictments.

Van Ta:
And so with that, I hope you all enjoyed this story of essentially Rufus telling me I told you so and thank you all. I will now open it up for questions.

Van Ta:
Yes. Yes. Great question. So for a lot of the exploitation, before Log4j, they were crafting a majority of ysoserial payloads to exploit deserialization vulnerabilities against a diverse set of applications at these different governments. I don’t know if you want to add anything else. Yeah.

Speaker2:
YSoSerial.Net. They’ve been using that for a while.

Van Ta:
Yes, sir. Gentleman in the back.

Speaker3:
He.

Van Ta:
That’s a great suggestion. Thank you for that. We coordinated so closely with law enforcement during this, but didn’t specifically go down that direction. But this is kind of why we like this format as well, of a talk with a lot of researchers in the crowd, so we can discuss this and, in a way, crowdsource potentially that answer to why. We’re able to get almost there, but not necessarily across the finish line. But I appreciate that. Yes.

Speaker3:
Are.

Van Ta:
We. We tried to. We tried to. I’ll say that. Yeah. Anything you want. Anything else you want to add?

Rufus Brown:
No, it was just that that particular map was on another running port on that server. And we still have questions on what exactly that server is. And it looked like almost maybe it’s something that’s compromised infrastructure. But yeah, don’t know, 100%.

Van Ta:
And I still think that there is a potential that we did stumble upon some sort of operator box. And based on the information that we have here, we have tried to work with partners that would have a deeper level of visibility into that server itself. Because again, we’re mainly dealing with scan data to identify and further investigate something like that. So yeah. Any other questions? All right. Thank you all so much.

About the Presenters

Van Ta is a Principal Threat Analyst on Mandiant’s Advanced Practices Team, where he leads historical research into the most impactful adversaries facing Mandiant’s customers. His research on various named threat actors, including FIN11, FIN12, FIN13, and APT41, has been referenced by both private and public organizations.

Rufus Brown is a Senior Threat Analyst on Mandiant’s Advanced Practices Team specializing in attribution and malware tradecraft. His joint research into APT41 was covered by national media outlets.

About LABScon

This presentation was featured live at LABScon 2022, an immersive 3-day conference bringing together the world’s top cybersecurity minds, hosted by SentinelOne’s research arm, SentinelLabs.

LABScon Replay | Malshare: 10 Years of Running a Public Malware Repository

By: LABScon
16 May 2023 at 13:43

Since March 2013, alongside a handful of volunteers, Silas has run a fully public, never-for-profit malware repository named MalShare. The site allows anyone to register and immediately have access to our entire collection of malware samples.

When MalShare first launched, the idea of openly sharing malware was highly controversial; Silas was told the site would never survive against existing commercial options and that it would only serve to give threat actors deeper insight into defender visibility. Now ten years later, Malshare is still online. What started out as a handful of open web directories has grown into a service used by thousands of researchers and integrated into numerous tools.

In this talk, Silas shares his experience of some of the challenges and rewards of running a free, public malware repository for the benefit of the research community. Along the way, he describes his greatest fear, discusses rival services like VirusTotal and vx-underground, and explains why he doesn’t worry about people trying to hack the site.

Malshare | 10 years of running a public malware repository: Audio automatically transcribed by Sonix

Malshare | 10 years of running a public malware repository: this mp4 audio file was automatically transcribed by Sonix with the best speech-to-text algorithms. This transcript may contain errors.

Silas Cutler:
Thanks for having me. My name is Silas Cutler. And today I’m going to be talking about a really important project to me. For those of you who don’t know me, as I said, my name is Silas. I wear many hats. I’ve worked quite a few places. But the hat that is the most important to me is one that started almost ten years ago. It’ll be ten years in a couple of weeks.

Silas Cutler:
So I run a public malware repository called Malshare with several other people, several of whom are here. Malshare is a public repository. We don’t have any paid services. We will never offer any paid services. The entire project is focused around making malware sample access easier. Malshare started on 28th March 2013, and it was really interesting listening to Thomas Rid’s talk yesterday, when he looked back at the pre-Shadow Brokers era, before the mass proliferation of a lot of the nation state actors that we’ve seen.

Silas Cutler:
Back then, sample sharing was complicated. You couldn’t just openly share malware. People told me that if I was to start this that I was going to be sued into the ground, that no hosting provider would ever talk to me again and that I would essentially be helping the attackers along the way.

Silas Cutler:
Funny how the world has changed. Yeah. And I was told also that the abuse reports and just the takedown requests from people accidentally uploading samples would consume all my time and the entire ten years that this has been running, I’ve had one sample removal request. That’s it. And it’s funny, and this is a really amazing conference to be doing this talk at, because the entire project started at a conference.

Silas Cutler:
It started with talking to another analyst, figuring out how we could exchange samples, because I was in the process of leaving a job and I was terrified of losing access to VirusTotal. And that’s the true reason that the entire project started: because I knew that I needed data. I needed to be able to play with stuff. And I figured if I was going to be building something and trying to build my own way to feed myself samples and do research, then there’s no point in keeping it private, and it’s something that can be shared with everyone.

Silas Cutler:
So I will preface all of this with: I am not a graphic designer. This is what the first version of our site looked like. And fun Easter eggs: you’ll notice the wonderful Mt. Gox logo icon right at the bottom to donate 0.1 BTC to keep the server going. Back then that was about $20. Now the price ranges from 2000 to something hourly. I think we actually lost like $30 in Bitcoin when Mt. Gox got tanked.

Silas Cutler:
So in the original structure, the way that we were sharing files was we would tar up a batch of files every day and post them on the website. And this seemed like a really easy way to do it. It made things accessible. That process lasted about a week, and then somebody included one of the bulk sample sets as part of a dropper and started trying to use me as a cheap deployment place.

Silas Cutler:
And it was horrifying, because everything that people had said about how it could become a resource for attackers became absolutely true and smacked me in the face. So we had to get better. And it also shaped a lot of the way that I look at the project and taught me the very important lesson that we can do better. And when things like this happen, there is an onus on platforms like this to try and help as much as we can.

Silas Cutler:
So fundamentally, I see Malshare as new researchers’ and old researchers’ first and last repository. We do not have the sample collection that VirusTotal does. We don’t have the features of many of the other ones. But when you have no budget and are building a program, we will always be there.

Silas Cutler:
This talk is not about the tech stack of Malshare or the back end details. There’s a lot of things that make Malshare the most mediocre malware repository on the Internet, but that is the point as well. The number one thing and the most important thing that I want to say in this talk, and I’ve rewritten this talk about four times this week, but the most important thing I want to say, actually, is thank you, because Malshare is not mine. It belongs to everyone who has uploaded samples, who has used files from it, who’s messaged me on Twitter to say, hey, the site’s down. It’s a community resource that belongs to us all. So thank you to everyone who uploads. Thank you if you’re on the advisory board or have committed code. And thank you for letting me be part of your research over the years. I hope to continue helping and, going forward, doing that for everyone.

Silas Cutler:
As with most things, very bluntly and real talk, I don’t always know where I’m going with projects. I see a path that looks fun and I run at it, and along the way it has been amazing to be able to watch the people and learn and see how the project has grown. Yes. So what I want to kind of talk about for the next part of the talk is who I see Malshare as and what I see it as, as one of the administrators of this project.

Silas Cutler:
So this is now where we are. We have users now all across the planet. We are up to 27,049 users as of this morning. And it’s been unbelievably incredible to see and talk to people and hear about how they’re using the files. When you register on the site, by default you’re allowed 2000 downloads a day, or queries. So searches, downloads. If you want more, just email. It doesn’t matter. We don’t charge for anything. If you don’t want to email, you can just make more keys too. Our users do that, and I’ll talk about that in a few minutes too.
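A per-key daily allowance like the 2000 queries mentioned here can be sketched in a few lines. This is not Malshare's code (the site is written in PHP, and its implementation is not described in the talk); the `DailyQuota` class and the key names are hypothetical and only illustrate why creating a second key sidesteps the limit.

```python
from collections import defaultdict
from datetime import date

DAILY_LIMIT = 2000  # the default allowance mentioned in the talk


class DailyQuota:
    """Count requests per (API key, calendar day) and refuse past the limit."""

    def __init__(self, limit=DAILY_LIMIT):
        self.limit = limit
        self._used = defaultdict(int)  # (api_key, day) -> requests served

    def allow(self, api_key, today):
        slot = (api_key, today)
        if self._used[slot] >= self.limit:
            return False
        self._used[slot] += 1
        return True


# Small limit so the behaviour is easy to see.
quota = DailyQuota(limit=3)
day_one = date(2023, 3, 28)
same_key = [quota.allow("key-a", day_one) for _ in range(4)]
second_key = quota.allow("key-b", day_one)        # a fresh key starts at zero
next_day = quota.allow("key-a", date(2023, 3, 29))  # the count resets per day
```

Because the counter is keyed on the API key alone, registering another key gives a fresh allowance, which matches the behaviour the speaker says he sees and chooses not to police.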

Silas Cutler:
But it’s been amazing as well watching over the years where people come from to register for the site and the projects they work on. We’re heavily rooted in places across the Middle East and across China, and many of them are students who are in university who want to get into malware analysis. And it’s not always accessible. Unfortunately, there’s one country that I’m really upset I have not managed to get users in, but some of the more northern blips, maybe.

You know, one of those Chinese lives?

Silas Cutler:
Yeah, we do. So, Malshare is a community resource, as I’ve said. Almost everything on the site is open source. We didn’t start out that way. We actually became open source because an employer of mine years ago tried to say that it was improperly disclosed as part of my onboarding and my prior inventions, and that the ownership defaulted to them. So a git push later, it belonged to everyone.

Silas Cutler:
There’s a couple pieces that are not yet open source. The reason they’re not open source is because the code is really bad, and I’m a little embarrassed for people to see it. Mind you, the site is written in PHP, so that’s saying a lot. With the site being open source, there’s no secrets. Everything we do is visible in the code, and that makes it accessible for people and usable to bend and to use however it meets the needs of people. The site itself is even usable internally, outside of the public instance, and there’s a couple of groups that have started forking it and creating local instances at universities. And even a couple of student clubs have their own instances running in order to share samples that they’re collecting. One of them is doing it as part of a honeypot project, which is really cool.

Silas Cutler:
Over the years, the space of malware repositories has significantly increased, and some of them have done absolutely amazing things and some of them have kind of faded off. Oh, I didn’t include the ones that vanished over the years.

Silas Cutler:
But anyways, it’s been really interesting also to watch each one of them take their own different approach to how they look at creating a usable service to help people hunt through malware sets. And I’ll call out vx-underground specifically because they’re feisty ones, aren’t they? Yeah. Yeah. The password is “infected.” I’ll save you the DMs, because Smelly gets really upset about it. But unlike Malshare, which was designed very much to focus on the API, to allow people to automate into it and to build things that go beyond what the service can do, vx-underground took a fascinating route, because they went with an almost encyclopedia-like design, where people almost look to them now as a resource for defining what a set looks like. And there’s been arguments on social media about what’s a Pegasus sample and what’s not.

Silas Cutler:
But with each of these different approaches, the admins of these sites all face different interesting challenges and problems along the way. For Malshare, I don’t have to worry about the problem that vx-underground does in terms of building a library and a curated collection, because people are not looking for assessments from the site. It’s also because I don’t have a lawyer to protect me if I accidentally slander someone by saying their legitimate software is malicious. Right.

Silas Cutler:
One of the things that has made it really special for me over the years is your hacks actually make me really happy.

Silas Cutler:
So I said before as well, we limit people to 2000 API queries a day. We see people creating duplicate keys regularly, and I’m really privileged to be able to say that I don’t give a fuck, because what I care about is something I’ll touch on more at the end. As long as you’re not interrupting service to others, as long as you’re not trying to dump the user database, why worry?

Silas Cutler:
It’s been fascinating and exceptionally cool to watch the ways that people look at the site, use the site, exceed the site and what we can do, and build cooler hacks and things that go beyond. So I pulled some data yesterday as well, trying to look at some of this API key reuse, and it’s fun as an admin seeing some of the things.

Silas Cutler:
So for example, there’s this odd pattern there where about ten of the duplicate API keys came from 43 IP addresses. Someone’s got a little proxy network or is using NordVPN to pull samples. Not a problem, but just a curiosity to see how people are trying to harvest things. Another piece of the sort of service abuse that I’ve seen over the years: there’s actually another malware repository that I listed on the previous slide that had this setup where they would pull my feed every day. It would go through a Discord bot that would post it to a channel. They would then upload the sample to VirusTotal so they could get download quota on VirusTotal to download different samples.
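
The kind of key-reuse check described here can be sketched in a few lines. This is only an illustration, not Malshare's actual code: the `(api_key, ip)` pair shape, the function names and the sample values are all hypothetical stand-ins for whatever the real access logs contain.

```python
from collections import defaultdict

def ips_per_key(requests):
    """Map each API key to the set of distinct client IPs that used it.

    `requests` is an iterable of (api_key, ip) pairs; this shape is an
    assumption for illustration, not a real log format.
    """
    seen = defaultdict(set)
    for api_key, ip in requests:
        seen[api_key].add(ip)
    return dict(seen)

def shared_ips(requests, min_keys=2):
    """Return the IPs that presented at least `min_keys` different API keys."""
    keys_by_ip = defaultdict(set)
    for api_key, ip in requests:
        keys_by_ip[ip].add(api_key)
    return {ip: keys for ip, keys in keys_by_ip.items() if len(keys) >= min_keys}

# Made-up sample data (documentation-range IPs, fake keys).
sample = [
    ("key-a", "192.0.2.10"),
    ("key-b", "192.0.2.10"),
    ("key-a", "192.0.2.11"),
    ("key-c", "198.51.100.7"),
]
```

Calling `shared_ips(sample)` on the made-up data flags the one address that used two different keys, which is the sort of curiosity an admin might look at without treating it as a problem.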

Silas Cutler:
I couldn’t be happier to see this because it’s finding creative solutions to what are really dumb problems that don’t need to be there. And I get it. It can be really awkward to send an email. Sometimes there’s people I owe email responses to and it’s been several days. I’m sorry, social anxiety is a thing. So as I said, why worry? In the end, people building creative solutions is what the project is about. There’s a price point that I can get away with continuing to run the service at, and because I want this talk to be as open and transparent as possible, it’s about 125 bucks a month. As long as I can keep it running at that price, I’m fine with however much abuse happens on that.

Silas Cutler:
And in a few minutes, I’ll tell you about the abuse that I don’t like and what happens when people fuck around and need to find out. But as a brief aside, something that came up on a Glasshouse call that I did a few weeks ago: one of the odd things in the industry that I’ve noticed is that if you want to get into pen testing and offensive security, there are numerous pathways to do it, and each is a series of steps where it’s very easy to hop over the ones you don’t like.

Silas Cutler:
So VulnHub, Hack The Box, Hack This Site, all these different resources to go from someone who is curious to someone who knows the skills and knows the techniques. But on the defensive side, especially for things like malware analysis, we still often are dependent on training series written by forum users on UnknownCheats and sketchy forums from the nineties to learn how to do some of the deep technical analysis that produces some of the cases that we’ve seen this week.

Silas Cutler:
Credit, though, to OALabs, which is a group that does Twitch streaming on reverse engineering. They are legit and they’re having a really huge impact. So fundamentally, though, by Malshare not being a commercial service, we don’t have to worry about things like service abuse. What we do worry about, though, at the end of the day, is ensuring that the things that happen on the site don’t pose a risk to other users.

Silas Cutler:
When things happen that affect or could potentially affect other users, I care a lot. So for the example of this that I wanted to call out, unfortunately, I had to redact the name of the person. In July 2018, I got an email from someone. I recognized the email immediately: they’re another researcher who I’ve known for the better part of a decade now, asking for a couple of samples. It was a little odd that they introduced themselves by saying they’re an independent security researcher, but I didn’t think too much of it.

Silas Cutler:
But I got this email, so I immediately responded back with the samples. We’re not perfect when it comes to phishing. We all make mistakes. A couple of days went by. I followed up with them directly via Slack and they said, Oh yeah, I didn’t email you. I just downloaded them myself. So I immediately followed back up with this suspicious emailer asking if there was anything more they needed. Because if this is already someone impersonating another researcher, I want to see how far this goes.

Silas Cutler:
So what it turned out was that there had been a long-running campaign in which someone was going around registering on sites as this famous security researcher, trying to get things like extra quota and special access. And when you go back to things like Apache logs to dig through, when people are doing stupid stuff, they’re not great about hiding where they’re from. So long-running campaign targeting a researcher from Iran, and they’re still active to this day. They haven’t registered on the site, and I do watch now for any time they do this. If you see people trying to impersonate or do bad things through Malshare, please let me know, because at the end of the day, I want to make sure people are protected. And something like Apache logs to me are not what Malshare considers proprietary or sensitive data. So if there are things that we can provide, we absolutely will. Think about when I want to. Right.

Silas Cutler:
So the other thing that has caused impact in the past is DDoS attacks. Over the past several years, we’ve had three major attacks that have actually disrupted service. Only one of them actually was someone maliciously intending to disrupt the site. The other two were from researchers with poor Python scripts that continued to request the same sample thousands and thousands of times, which is also a really bad way for me to find out that you’re using multiple keys, which I don’t care about, but do care about when it affects the users.

Silas Cutler:
As the briefest aside, talking about the tech stack. Fundamentally, Malshare is pretty simple conceptually. There’s a MySQL database to track everything, an Apache web server, and a file structure on disk for the sample repository. As a well-thought-out, web-scale enterprise, we took these three pillars of success and we put them in a box. And I mean, we put them on one server. So the site still continues to run on one server.

Silas Cutler:
So the point with this is: over the past ten years, it has been an incredible privilege to do this, and I want to continue to do this. And I want to also make sure that this service lives on past just me as the single point of failure. And I bring up the fact that it’s still a single server, not because it’s a problem, but because as services like this go, and having watched other ones fail in the past, I remember something an old project manager told me, which is: one is none and two is one.

Silas Cutler:
And so unless there’s redundancy, things do fall down. So over the next ten years, where I’m trying to take the site is to build it into something that can outlive and move past a single point of failure or a single server into something that can continue to be a resource for people until how we share samples and how we think about malware is no longer relevant. Over time, things do fade away and become less relevant, and Malshare is always a continuous reminder.

Silas Cutler:
And the other thing that stood out so importantly over the past ten years, and I’ve joked that Malshare is a mediocre malware repository, but the other thing that it has done so well is it defines the bottom of the barrel. If your vendor feed is worse than Malshare, which is free, you’re getting taken for a ride, because you’re not getting the services that should be available from something free. This, as a free service and a community resource, says everything above is where it should be, and that’s a really important role that we don’t focus on enough because it ensures a baseline and helps us move forward.

Silas Cutler:
So with that, I’d like to say one final thank you and open it up for any further questions that people may have. Yes, Brad. So it’s been a long time. Yeah. Since we’re doing this. I’ve had the privilege of watching it grow.

Speaker2:
Over the years, and.

Silas Cutler:
I didn’t want to. I didn’t want to dox you as one of the folks.

Speaker2:
See one of the see some of the terrors of What do you think has been the biggest success? What is the biggest thing that surprised you?

Silas Cutler:
The biggest thing that surprised me is actually when people say like, Oh, Malshare only has a bunch of HTML pages, or criticize the quality of the feed. I don’t know how many people have actually pulled like an hourly batch of VirusTotal and gone file by file. I have really bad insomnia and it really helps sometimes, but VT has a lot of junk too. They also have so much that nobody’s picking through it at a granular rate. It’s been surprising that that isn’t always obvious.

Silas Cutler:
I think the other thing that’s incredibly surprising is the integrations that I see. And to everyone who’s written an integration that I will never see and don’t know about, like, thank you, and please feel free to let me know if there’s things we can do better. But for example, Synapse has a plugin for Malshare to pull data and consume the feed, and it’s amazing to see all of these integrations and where the service is being used. Mandiant has one as well that I found when trying to find listings of them. It’s been truly amazing just seeing all those. It has also been surprising seeing people who are resistant to me trying to give them free malware as a feed, which I get. There are already hesitations about people trying to give others malware. But yeah. Yes.

You mentioned there was one sample that you had a request to remove. Yep. You give any context on that?

Silas Cutler:
It was a PDF document for a company. I think it was meeting notes that somebody accidentally uploaded. I really don’t want to throw stones in glass houses, but I’m going to for just a moment. To your question also, Brandon, I’m going to throw a real hard stone on this, which is that the biggest fear that I’ve had with Malshare actually is CSAM. I am deathly afraid of it.

Silas Cutler:
The surprising thing also has been how many people have commercialized that as a service and reaching out to some of the big players who offer services to help watch for it and have hash lists of it. It is a little tone deaf when they tell me the price is $120,000 a year. That has been surprising too. So anyways, any further questions?

Silas Cutler:
Awesome. Thank you again. And again, if there’s anything we can ever do for Malshare to help, we’re always happy and here to help. Cheers.

About the Presenter

Silas Cutler is Senior Director for Cyber Threat Research and Analysis at the Institute for Security and Technology and Resident Hacker at Stairwell. He specializes in hunting advanced threat actors and malware developers, nation states and organized cybercrime groups. Prior to Stairwell, Silas was a threat intelligence practitioner at CrowdStrike, Google, Chronicle and Dell Secureworks.

About LABScon

This presentation was featured live at LABScon 2022, an immersive 3-day conference bringing together the world’s top cybersecurity minds, hosted by SentinelOne’s research arm, SentinelLabs.

Want to join us for LABScon 2023? The Call for Papers is now open!

LABScon Replay | Blasting Event-Driven Cornucopia: WMI-based User-Space Attacks Blind SIEMs and EDRs

By: LABScon
11 January 2023 at 14:33

Security solutions engineers always find new ways to monitor OS events to mitigate threats on endpoints. These approaches typically reuse different built-in Windows mechanisms that were never designed with security first in mind.

WMI provides rich information about the computing environment, which allows monitoring via event filters, consumers, and bindings to get notifications about important OS events. These features make WMI critical for solutions such as EDRs, AVs, SIEMs. The bad news: Malware countermeasures can disable WMI, making these defense solutions useless.

In this talk, Binarly’s Claudiu Teodorescu provides an analysis of the WMI architecture by reversing user-mode variables and functions from DLLs to demonstrate several new user-mode attacks.

WMI-based user-space attacks impact all versions of Windows. The core vulnerability of WMI is that the DLLs loaded into the WMI core process (WinMgmt), leverage “flags” to perform WMI operations. Attackers can block the access to WMI – receiving new OS events, installing new WMI filters – by modifying these flags. There are no built-in features to block these attacks or repair WMI.

WMI-based attacks can be detected by inspecting the memory of WMI core service, which can disclose other attacks on Windows OS components including privilege escalation, token hijacking, and ETW blinding.

Blasting Event-Driven Cornucopia: WMI-based User-Space Attacks Blind SIEMs and EDRs: Audio automatically transcribed by Sonix

Blasting Event-Driven Cornucopia: WMI-based User-Space Attacks Blind SIEMs and EDRs: this mp4 audio file was automatically transcribed by Sonix with the best speech-to-text algorithms. This transcript may contain errors.

Claudiu Teodorescu:
So, Claudiu Teodorescu. I’m representing Binarly and very happy to be here. Good morning, everybody. It will be a short presentation on WMI. I’ll go over some of the information that I presented at Black Hat 2022 and then add two new attacks presented only at this conference.

Claudiu Teodorescu:
So who is Binarly? Binarly is a startup in LA focused on device security and on monitoring threats below the operating system, to see how they’re moving up the stack into the operating system kernel and user land and then deploy their next-level components. Unfortunately, Andrei and Igor, who contributed to this research, could not make the trip to LABScon. But I’ll take the credit for them and then maybe have a drink when we first meet in person.

So let’s go a little bit into the agenda. I’ll briefly touch on the architecture and features of WMI, just a primer for people that are not familiar with WMI, and also show some forensic artifacts that I was deeply involved with while reversing the format five, six, seven years ago. Then I’ll show how WMI is leveraged for firmware policy orchestration. Next we move into the meat of the discussion on attacks on WMI: how WMI is leveraged for evil, and then I’ll present a WMI threat model and the different attack vectors. And then we’ll move to present the two new attacks that I already advertised.

Claudiu Teodorescu:
So WMI architecture, it’s a pretty comprehensive image. But in short, WMI is the Windows implementation of two standards: WBEM, which is Web-Based Enterprise Management, and CIM, which is the Common Information Model. It’s available system-wide in all operating systems and doesn’t have to be installed, and it offers a standardized framework to produce and consume events that are represented as WMI objects.

And in short, the architecture consists of the WMI producers, which produce device telemetry as WMI objects, and clients that consume those events to get device telemetry. One good example is PowerShell, which can be used to gather this type of telemetry both remotely and locally. Next is the CIM standard, the Common Information Model standard, which consists of the repository itself that stores the class definitions and namespace definitions as well as persistent WMI objects. Then we have MOF, which is an object-oriented language used to specify WMI artifacts to extend the standard. Next we have the query language, which is WQL. It’s a SQL-like query language to filter events. DCOM and WinRM are used to remotely connect, to transmit and receive data. And last but not least is the WMI service, which is implemented as an svchost DLL service in the netsvcs group.

WMI providers. We already touched on this. Just a little reminder: a provider is an instance of __Win32Provider, which is a standard class. Providers are implemented as COM-based DLLs. They are identifiable by class ID, and all the information on the COM interfaces is in the registry. On Windows 11 there are more than 4000 built-in WMI providers.

So one of the main abilities of WMI is to act on events that cover pretty much any operating system event. And another ability that is used in the wild by attackers is the ability to register permanent event subscriptions, which survive system reboots. We’ll see an example of that.

There are two types of events: intrinsic events and extrinsic events. I left here the definition of both for future reference. Now, event filters specify which events should be transmitted to the bound consumer to act on. The main properties of an event filter are the namespace in which it operates, the query language that is used, mostly WQL, and last but not least the query itself, which specifies how to filter the events and send them to the bound consumer. And here we have the general syntax of WQL for filtering, and then two examples. The first is the intrinsic event example that triggers whenever notepad.exe is launched. The other one is an example of an extrinsic event, which monitors the Run registry key for malware persistence.
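
The slide's exact queries are not in the transcript, so the following is a sketch of what the two kinds of filter query commonly look like, written as small Python string builders. The function names, the 5-second polling interval and the default registry path are illustrative assumptions; `__InstanceCreationEvent`, `Win32_Process` and `RegistryKeyChangeEvent` are standard WMI class names.

```python
def escape_backslashes(value: str) -> str:
    """Double backslashes so the value can sit inside a quoted WQL string.

    Only backslashes are handled here; quoting of other characters is
    deliberately left out of this sketch.
    """
    return value.replace("\\", "\\\\")

def process_start_filter(image_name: str, poll_seconds: int = 5) -> str:
    """Intrinsic event: a Win32_Process instance with this name was created."""
    return (
        "SELECT * FROM __InstanceCreationEvent "
        f"WITHIN {poll_seconds} "
        "WHERE TargetInstance ISA 'Win32_Process' "
        f"AND TargetInstance.Name = '{escape_backslashes(image_name)}'"
    )

def run_key_filter(
    hive: str = "HKEY_LOCAL_MACHINE",
    key_path: str = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Run",
) -> str:
    """Extrinsic event: the given registry key changed."""
    return (
        "SELECT * FROM RegistryKeyChangeEvent "
        f"WHERE Hive = '{escape_backslashes(hive)}' "
        f"AND KeyPath = '{escape_backslashes(key_path)}'"
    )

notepad_query = process_start_filter("notepad.exe")
run_key_query = run_key_filter()
```

The intrinsic query polls the repository for new instances (hence `WITHIN`), while the extrinsic query relies on a provider raising the event, which is why it needs no polling interval.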

And now we talk about the consumers. The consumer specifies the action that should be taken when an event filter triggers. And as we can see, the standard already defines five or six default consumers to log to files, to log events, to run scripts or command lines, or to send notifications. And let’s go a little bit more into detail. I mentioned the permanent event subscription, which is the method actually used for persistence and code execution. How you do it is you define the filter, which specifies which event triggers the action that is defined by the event consumer, and then bind them in an instance of __FilterToConsumerBinding.
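
The three-part structure of a permanent event subscription can be modelled in plain Python. This is a toy data model, not code that registers anything with WMI: the dataclass names mirror the WMI classes `__EventFilter`, `CommandLineEventConsumer` and `__FilterToConsumerBinding`, while the `dispatch` helper and all instance values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EventFilter:
    """Stands in for __EventFilter: which events to select."""
    name: str
    event_namespace: str
    query: str
    query_language: str = "WQL"

@dataclass(frozen=True)
class CommandLineConsumer:
    """Stands in for CommandLineEventConsumer: what to run."""
    name: str
    command_line_template: str

@dataclass(frozen=True)
class FilterToConsumerBinding:
    """Stands in for __FilterToConsumerBinding: ties the two together."""
    filter: EventFilter
    consumer: CommandLineConsumer

def dispatch(bindings, fired_filter_name):
    """Return the command lines bound to the filter that just triggered."""
    return [
        b.consumer.command_line_template
        for b in bindings
        if b.filter.name == fired_filter_name
    ]

# Illustrative instances; names and command line are made up.
flt = EventFilter(
    name="DemoFilter",
    event_namespace=r"root\cimv2",
    query="SELECT * FROM __InstanceCreationEvent WITHIN 5 "
          "WHERE TargetInstance ISA 'Win32_Process'",
)
con = CommandLineConsumer(name="DemoConsumer", command_line_template="demo.exe /log")
bindings = [FilterToConsumerBinding(filter=flt, consumer=con)]
```

Without the binding, neither the filter nor the consumer does anything on its own, which is why forensic tooling usually starts from the binding instances and follows both references.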

So now let’s talk a little bit about the WMI repository. Its path can be found in the registry, and it consists of three types of files. First is INDEX.BTR, which is the index file for the WMI repository, implemented as a B-tree on disk, and stores the search path strings in pages. Then there is OBJECTS.DATA, which contains the namespace definitions, class definitions and persistent WMI objects, again stored in pages and consisting of records.

Claudiu Teodorescu:
And then last but not least are the three historic versions of the mapping file, which contains the mapping records. Because in WMI there is an abstraction from logical page number to physical page number, the mappings are used to translate the logical page number to its physical correspondent.

So let’s look in practice at how this should work. We have INDEX.BTR. First we need to create the search path string for the WMI object that we’re looking for. The way it’s done, you get the identifiers for the namespace, class, and instance name, you concatenate them using the prefixes mentioned there, and then the index is searched for that path string. And then the index record is identified. What’s important in that record: the first number is the logical page number of the record we are looking for in the OBJECTS.DATA file. Then the other one is the record identifier, and the third one is the size of the record. So in order to do the translation, to find out where the physical offset is, we have to use the mapping file, which consists of two arrays of DWORDs, one for OBJECTS.DATA, the other one for INDEX.BTR. How it works: the logical page number represents the index in the array. The value at that index is the corresponding physical page number.
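
The lookup just described can be sketched as follows. Several details are assumptions made for illustration and should be checked against real repository files: the page size constant, the dotted layout of the three trailing numbers in an index record, and every value in the example. The mapping array is modelled as a plain Python list rather than parsed from disk.

```python
PAGE_SIZE = 0x2000  # assumed repository page size; verify against your own files

def physical_offset(mapping, logical_page, page_size=PAGE_SIZE):
    """Translate a logical page number into a byte offset in the data file.

    `mapping` plays the role of one DWORD array from the mapping file:
    the logical page number is the index, and the stored value is the
    physical page number.
    """
    if not 0 <= logical_page < len(mapping):
        raise IndexError(f"logical page {logical_page} is not mapped")
    return mapping[logical_page] * page_size

def parse_index_record(record):
    """Split off the trailing '.<logical page>.<record id>.<size>' fields.

    The dotted textual layout is an assumption made for this sketch.
    """
    path, page, record_id, size = record.rsplit(".", 3)
    return path, int(page), int(record_id), int(size)

# Hypothetical values, not taken from a real repository.
objects_mapping = [3, 0, 7, 1]
path, page, record_id, size = parse_index_record(r"NS_X\CI_Y\IL_Z.2.1303275.1530")
offset = physical_offset(objects_mapping, page)
```

With the made-up mapping, logical page 2 resolves to physical page 7, so the record would be looked up by its identifier inside the page starting at `7 * PAGE_SIZE`.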

The same algorithm can be used for parsing INDEX.BTR. It’s a B-tree on disk, as I said, so the format uses logical page numbers for the pointers to the next nodes in the tree.

And now, knowing how we can parse the database, looking on a Lenovo system, in the WMI namespace, we found some interesting classes with the Lenovo_ prefix. One of them is Lenovo_BiosSetting. The other one is Lenovo_SetBiosSetting, and the third one is Lenovo_SetBiosPassword. And if we’re looking at the definition of Lenovo_BiosSetting, we see that there is a property called CurrentSetting, of type string, that looks interesting to us.

Doing a search on the repository for an instance of this class, we come up with empty results, which means those instances are not persistent. So they’re generated on the fly by a WMI provider, which in theory should be provided by the BIOS vendor. Using PowerShell as a WMI client, we are looking for non-empty instances of this class. And when we find a non-empty instance, we display the CurrentSetting property, and we have some information here.

By magnifying on that we get some interesting BIOS configurations about trusted execution, BIOS update, TPM and so forth. So what does it mean? We talked about getting and setting class instances, and there is a WMI provider that provides that information. So how does this work? In this case, the way Lenovo implemented it, they provided an interface that is implemented below the operating system. The WMI provider calls on that interface to be able to read and alter this type of information. So pretty much from user land, you have access to these settings that are very important for your computer.

And from the management perspective, or for firmware policy orchestration, this is a great feature because you can manage everything locally or remotely through WMI. But it is dangerous, because in the last update from Lenovo, we have two CVEs that identify two vulnerabilities exactly in the interface that was provided for the WMI provider. And it’s exactly for SetBIOSPassword: the SMI handler SetBIOSPassword is used to actually complete the instance for the WMI SetPassword class. So the WMI provider calls on that SMI handler to get the information and create the instance for the WMI SetPassword. And as you can see here, these are industry-wide vulnerabilities.

The WMI standard also provides a basic class to get some information about the BIOS version, the vendor of your BIOS and the device configuration as part of the standard: Win32_BIOS.

So now let’s move to how WMI is used, mostly by attackers but also by defenders. For defenders it offers great flexibility in terms of providing device telemetry to endpoint security solutions, but for attackers it offers a great living-off-the-land infrastructure to do evil. And some of the ways WMI is leveraged are reconnaissance, AV detection, persistence, code execution and so on.

That was from yesterday’s presentation on Metador. Forgive my lacking skills at taking selfies. I’m not used to doing that, but I think I saw here the WMI event subscription. So, getting some more information from the talk, I devised a scenario of how this can be implemented in WMI, which is very, very simple. You create a trigger that tells WMI to execute the consumer whenever the system reboots, and to wait a little bit for the boot sequence to finish. And then in the consumer, you just call cdb.exe to debug the defrag.exe file. And to make the permanent event subscription complete, you just need to bind the evil filter that specifies the trigger to the evil consumer that specifies the action to be taken.

Claudiu Teodorescu:
Attacks on WMI. So this is the threat model. The threat model consists of a couple of components. One is the WMI service, which communicates with WMI providers and consumers via ALPC (Advanced Local Procedure Call) channels. Then we have the WMI files on disk: the repository, the DLLs for the providers, MOF files and so on, and the configuration in the registry. And then there is the data inside the WMI service. On the right there are the types of attack vectors that can be used on this threat model. So we have attacks on data inside the WMI process, attacks on ALPC pipe connections, attacks on files and the registry, and sandboxing of WMI using a user-land attack and then using a kernel driver. At Black Hat, I already talked about these attack vectors. But today we’ll focus on one more attack on data inside the WMI process, and actually showcase the attack on ALPC pipe connections.

So most of the attacks on the data inside the WMI service process are done using this type of template, which is pretty simple. You have a global flag that is set in the initialization phase to its init value, and then when a new client or a new request comes in, some dispatch routine is called and that flag is checked. If the flag is set to the required value, then the event is processed and success is returned. If it’s not set correctly, then it drops the event and returns an error code.
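
A toy Python model makes the template easier to see. Nothing here touches WMI: the class, the flag name and the choice of return codes are placeholders for illustration (the two constants are generic HRESULT values, not the codes any particular WMI flag produces).

```python
S_OK = 0x00000000    # generic success HRESULT, used here as a placeholder
E_FAIL = 0x80004005  # generic failure HRESULT, used here as a placeholder

class ToyService:
    """Simulates a service whose dispatch routine is gated by one flag."""

    def __init__(self):
        # Initialization phase: the global flag gets its init value.
        self.ready_flag = True
        self.processed = []

    def dispatch(self, event):
        """Process the event only when the flag holds the required value."""
        if not self.ready_flag:
            # Flag was tampered with: drop the event and report an error.
            return E_FAIL
        self.processed.append(event)
        return S_OK

svc = ToyService()
before = svc.dispatch("process-start-1")

# The "attack": flip the flag from outside the normal code path.
svc.ready_flag = False
after = svc.dispatch("process-start-2")
```

The point of the model is that the service keeps running and keeps answering requests after the flag is flipped; it simply stops doing useful work, which is what makes this class of attack quiet.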

And we have here a list of flags that can be attacked to actually disable WMI. Almost all of them I covered during my Black Hat talk. The m_pEseSession is the one I’ll cover today. One thing to mention: attacking those flags using the template provided before will disable WMI, but it will return different error codes.

So let’s look at m_pEseSession. In the init function, a pointer to the interface IWmiDbSession is set into that variable, which is a member of the CRepository class. To note, there is a global object of the CRepository class in wbemcore.dll, so it’s very easy to find this member. And then, as we can see, in the shutdown method the pointer is released. Then when a new WMI connection comes in, we have this call stack, and at the end we have getDefaultSession. When getDefaultSession is called, that pointer is checked against null. If it’s null, then a critical error is returned. If not, the ref count is increased on the pointer and no error is returned. So the attack is pretty simple: set this member variable to null so that getDefaultSession returns the critical error.

Now the WMI attacks on ALPC channels. The processes involved in this attack are services.exe, the WMI service and the WMI consumers that interact with the WMI service. On the right we have the kernel structures that are involved.

So how does it work? From the services.exe perspective, a named connection port called ntsvcs is created by services.exe, which waits for the request from the WMI service to connect. Once such a request comes in, services.exe allows the connection, and then the WMI service receives a handle to the client communication port and services.exe receives a handle to the server communication port. Thus the communication channel between services.exe, the server side, and the WMI service, the client side, is established. The same mechanism happens with the WMI service and its consumers: each WMI consumer will have a channel to the WMI service to actually receive events that are produced by WMI providers.

So the first attack is pretty obvious. We cut the cord by closing the handle on the client side. This way, the targeted WMI client is not going to receive any WMI events. The client can try to reestablish the connection or restart itself, but the attacker app can act as a watchdog, monitor the reconnection, and then close the client communication port handle again.
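
The cat-and-mouse between a reconnecting client and a watchdog attacker can be illustrated with a small simulation. This is a toy model only: it uses no ALPC or Windows APIs, and every class and function name is hypothetical.

```python
class ToyChannel:
    """Stands in for one client communication port handle."""

    def __init__(self):
        self.handle_open = True

class ToyClient:
    """A consumer that only receives events while its handle is open."""

    def __init__(self):
        self.channel = ToyChannel()
        self.received = []

    def reconnect(self):
        """Re-establish the connection, which yields a fresh handle."""
        self.channel = ToyChannel()

    def deliver(self, event):
        """Record the event if the channel is usable; report whether it arrived."""
        if self.channel.handle_open:
            self.received.append(event)
            return True
        return False

def watchdog_pass(client):
    """One watchdog iteration: close the client's handle if it is open.

    Returns True when a handle was closed during this pass.
    """
    if client.channel.handle_open:
        client.channel.handle_open = False
        return True
    return False

client = ToyClient()
client.deliver("event-1")          # arrives: channel is intact
watchdog_pass(client)              # attacker cuts the cord
lost = client.deliver("event-2")   # dropped
client.reconnect()                 # client recovers ...
client.deliver("event-3")          # ... and sees events again
watchdog_pass(client)              # ... until the watchdog runs again
```

In the simulation the client is never told that anything is wrong; it just stops seeing events, so recovery depends on it noticing the silence on its own.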

So time for the first demo. We’re launching cmd.exe. We validate that we have the latest version. We go to our main folder. We have two applications. One is the WMI consumer, Receive WMI Events. The other one is the attacker app. Receive WMI Events is just monitoring for new processes that are launched.

So as you can see, multiple calls, multiple notepads, and everything is logged in the WMI client. And now we run the attack on the consumer, which is the client side. No events are generated. The WMI client is disabled, and now we relaunch it. And because it reestablished the ALPC connection with the WMI service, it can get events back.

I’ll disable it again. So from this perspective, this is a different client, even if it’s the same application. Yeah. So now again it works. So now we have attacked the client side. The same attack can happen on the server side. So what we’re going to do is close the server communication port in the WMI service. Again, we’re checking that we have the latest version.

We’re running the WMI tester, which is a way to interrogate WMI. So it is running. Now we do our attack. This time on the WMI service.

No more events are received. Now we try to restart. We cannot restart because the server side of the pipe has been closed. Cool. Going back.

And now conclusions. WMI was created for performance monitoring and telemetry gathering, without security first in mind. That’s one of the points that we want to make with this talk. We want to make sure that people are aware of the risks that they can have if they rely on this telemetry. Yes, you can use WMI as a data point, but you should either be able to detect this type of attack, if you rely only on WMI, or use different data points and correlate them to get an idea of what’s happening in your system. And as we saw, the attacks are very simple because WMI has been designed for a different purpose.

And one last thing that I want to leave everybody with here: all these attacks can originate in the firmware, which right now is a big blind spot for the industry in terms of detection. Thank you very much, everybody.

About the Presenter

Claudiu Teodorescu is CTO at firmware security firm Binarly. He has an extensive background in Computer Forensics, Cryptography, Reverse Engineering, and Program Analysis. While at Cylance, he focused on program analysis to augment the ML model feature space with code-specific artifacts. Claudiu is the author of the WMI-parser tool to help IR teams forensically identify malware persistence.

