
Hunting for beacons

By: Fox IT
15 January 2020 at 11:29

Author: Ruud van Luijk

Attackers need a form of communication with their victim machines, also known as Command and Control (C2) [1]. This can be a continuous connection, or the attacker can connect to the victim machine directly. However, it's often more convenient to have the victim machine connect to you. In other words: it has to communicate back. This blog describes a method to detect one technique utilized by many popular attack frameworks, based solely on connection metadata and statistics, which in turn enables the method to be used on multiple log sources.

Many attack frameworks use beaconing

Frameworks like Cobalt Strike, PoshC2, and Empire, but also some run-of-the-mill malware, frequently check in at the C2 server to retrieve commands or to communicate results back. In Cobalt Strike this is called a beacon, but the concept is similar for many contemporary frameworks. In this blog the term 'beaconing' is used as a general term for the call-backs of malware. Previous fingerprinting research [2] shows that more than a thousand Cobalt Strike servers are online in a given month and actively used by several threat actors, making this an important behaviour to focus on.

While the underlying code differs slightly from tool to tool, the tools often consist of two components that set up the pattern for a connection: a sleep and a jitter. The sleep component indicates how long the beacon has to sleep before checking in again, and the jitter modifies the sleep time so that a random pattern emerges. For example: 60 seconds of sleep with 10% jitter results in a uniformly random sleep between 54 and 66 seconds (PoshC2 [3], Empire [4]) or a uniformly random sleep between 54 and 60 seconds (Cobalt Strike [5]). Note the slight difference in calculation.
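To make the difference concrete, here is a minimal sketch of the two jitter styles (our own illustration, not code taken from the frameworks themselves):

```python
import random

def symmetric_jitter_sleep(sleep_s: float, jitter: float) -> float:
    """PoshC2/Empire style: jitter widens the interval on both sides.
    60 s with 10% jitter -> uniform in [54, 66]."""
    return random.uniform(sleep_s * (1 - jitter), sleep_s * (1 + jitter))

def subtractive_jitter_sleep(sleep_s: float, jitter: float) -> float:
    """Cobalt Strike style: jitter only shortens the interval.
    60 s with 10% jitter -> uniform in [54, 60]."""
    return random.uniform(sleep_s * (1 - jitter), sleep_s)
```

In both cases the sleep times are drawn from a uniform distribution around (or just below) the configured sleep, which is exactly the regularity the detection below exploits.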

This jitter weakens the pattern but does not dissolve it entirely. Moreover, because the sleep function draws from a uniform distribution, the jitter is symmetrical. This works to our advantage when detecting this behaviour!

Detecting the beacon

While static signatures are often sufficient for detecting attacks, this is not the case for beaconing. Most frameworks are highly customizable to an operator's needs and preferences, which makes it hard to write correct and reliable signatures. Yet the underlying pattern does not change that much. Therefore, our objective is to find a beaconing pattern in seemingly patternless connections in real time, using a more anomaly-based method. We encourage other blue teams/defenders to do the same.

Since the average and median of the time between connections is more or less constant, we can look for connections where the times between consecutive connections consistently stay within a certain range. Regular traffic should not follow such a pattern: a user typically makes a few fast consecutive connections, then pauses for a longer time, and then interacts again. Using a wider range will detect beacons with a lot of jitter, but more legitimate traffic will also fall within that wider range. There is a clear trade-off between false positives and accounting for more jitter.

In order to track the pattern of connections, we create connection pairs. For example, an IP that connects to a certain host can be expressed as '10.0.0.1 -> somerandomhost.com'. This is done for all connection pairs in the network. Below, we deep-dive into one connection pair.

In the image above, a beacon is simulated for the pair '10.0.0.1 -> somerandomhost.com' with a sleep of 1 second and a jitter of 20%, i.e. a range between 0.8 and 1.2 seconds, while the model is set to detect a maximum of 25% jitter. Our model follows the expected timing of the beacon, as all connections remain within the lower and upper bound. In general, the more connections reside within this bandwidth, the more likely it is that there is some sort of beaconing. Even when a beacon has a jitter of 50% and our model uses a bandwidth of 25%, we still expect half of the check-ins to fall within the specified bandwidth.
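A minimal sketch of such a bandwidth check, assuming you already collected per-pair connection timestamps (the function name, the 10-connection minimum, and the default 25% bandwidth are our illustrative choices):

```python
from statistics import median

def beacon_score(timestamps: list[float], bandwidth: float = 0.25) -> float:
    """Fraction of inter-connection gaps for one pair (e.g.
    '10.0.0.1 -> somerandomhost.com') that stay within +/- `bandwidth`
    of the running median gap. Scores close to 1.0 suggest beaconing."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if len(gaps) < 10:  # too few observations to judge a pattern
        return 0.0
    hits = 0
    for i, gap in enumerate(gaps[1:], start=1):
        expected = median(gaps[:i])  # running estimate of the sleep setting
        if expected * (1 - bandwidth) <= gap <= expected * (1 + bandwidth):
            hits += 1
    return hits / (len(gaps) - 1)
```

Because the running estimate is recomputed as connections come in, a score like this also recovers after a configuration change, as illustrated next.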

Even when the configuration of the beacon changes, this method catches up. The figure above illustrates a change from one to two seconds of sleep whilst maintaining a 10% jitter. There is a small period after the change where the connections break through the bandwidth, but after several connections the model catches up again.

This method works with any connection pair you want to track. Possibilities include IPs, HTTP(S) hosts, DNS requests, etc. Since it works on metadata only, it will also help you hunt for domain-fronted beacons (keeping your baseline in mind).

Keep in mind the false positives

Although most regular traffic does not follow a constant pattern, this method will most likely result in several false positives. Every connection that runs on a timer will produce exactly the same pattern as beaconing. Examples of such connections are Windows telemetry, software updates, and custom update scripts. Therefore, some baselining is necessary before using this method for alerting. Still, hunting is always possible without baselining!

Conclusion

Hunting for C2 beacons proves to be a worthwhile exercise. Real-world scenarios confirm the effectiveness of this approach. Due to the simplicity of the method, and depending on the size of the network logs, it can plough through a month of logs within an hour. Even when a hunting exercise does not yield malicious results, there are often other applications that act on specific time intervals and are also worth investigating, removing, or altering. This method will not work when an adversary uses a jitter of 100%, but keep in mind that such a setting will probably annoy your adversary, so it's still a win!

References:

[1]. https://attack.mitre.org/tactics/TA0011/

[2]. https://blog.fox-it.com/2019/02/26/identifying-cobalt-strike-team-servers-in-the-wild/

[3]. https://github.com/nettitude/PoshC2/blob/master/C2-Server.ps1 and https://github.com/nettitude/PoshC2_Python/blob/4aea6f957f4aec00ba1f766b5ecc6f3d015da506/Files/Implant-Core.ps1

[4]. https://github.com/EmpireProject/Empire/blob/master/data/agent/agent.ps1

[5]. https://www.cobaltstrike.com/help-beacon

Detecting random filenames using (un)supervised machine learning

By: Fox IT
16 October 2019 at 11:00

Combining n-gram and random forest models to detect malicious activity.

Author: Haroen Bashir

An essential part of Managed Detection and Response at Fox-IT is the Security Operations Center. This is our frontline for detecting and analyzing possible threats. Our Security Operations Center brings together the best in human and machine analysis and we continually strive to improve both. For instance, we develop machine learning techniques for detecting malicious content such as DGA domains or unusual SMB traffic. In this blog entry we describe a possible method for random filename detection.

Malicious actors often need to move through a network to reach their primary objective, which is popularly known as lateral movement [1]. During traffic analysis of lateral movement we sometimes recognize random filenames, indicating possible malicious activity or content.

There is a variety of routes for adversaries to perform lateral movement. Attackers can use penetration testing frameworks such as Metasploit [3], or the Microsoft Sysinternals application PsExec, which enables remote command execution over the SMB protocol [4].

Because of its malicious nature, we would like to detect lateral movement as quickly as possible. In this blog post we build on our previous entry [2] and describe how machine learning can be applied to detect random filenames in SMB traffic.

Supervised versus unsupervised detection models 

Machine learning can be applied in various domains. It is widely used for prediction and classification models, which suits our purpose perfectly. We investigated two possible machine learning architectures for random filename detection.

The first detection method for random filenames is set up by creating bigrams of filenames; you can find more information about this in our previous post [2]. This method is based on unsupervised learning: after the model has learned a baseline of common filenames, it can detect filenames that do not fit that baseline.
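As a rough sketch of the idea (our simplified illustration, not the production model from [2]), a baseline of bigram frequencies can score how 'expected' a new filename is:

```python
from collections import Counter
from math import log

def bigrams(name: str) -> list[str]:
    return [name[i:i + 2] for i in range(len(name) - 1)]

class BigramBaseline:
    """Learn bigram frequencies from benign filenames; a low average
    log-probability over a new name's bigrams flags it as anomalous."""

    def __init__(self) -> None:
        self.counts = Counter()
        self.total = 0

    def fit(self, filenames: list[str]) -> None:
        for name in filenames:
            grams = bigrams(name.lower())
            self.counts.update(grams)
            self.total += len(grams)

    def score(self, name: str) -> float:
        grams = bigrams(name.lower())
        # Add-one smoothing so unseen bigrams don't zero out the score.
        probs = [(self.counts[g] + 1) / (self.total + len(self.counts) + 1)
                 for g in grams]
        return sum(log(p) for p in probs) / max(len(probs), 1)
```

A filename is then flagged as random when its score falls below a threshold derived from the benign baseline.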

This model has a drawback: it requires a lot of data. A solution can be found in supervised machine learning models. With supervised machine learning we feed the model data whilst simultaneously providing the label of that data. In our case, we label data as either random or not random.

A powerful supervised machine learning model is the random forest. We picked this architecture as it is widely used for predictive models in both classification and regression problems; for an introduction to the technique, see [5]. A random forest is based on multiple decision trees, which increases the stability of the detection model. The following diagram illustrates the architecture of the detection model we built.

Similar to the first model, we create bigrams of the filenames. The model cannot train on bigrams directly, however, so we have to map the bigrams to numerical vectors. After training and testing the model, we focus on fine-tuning hyperparameters, which is essential for increasing the stability of the model. An important hyperparameter of the random forest is its depth: a greater depth creates more decision splits in the forest, which can easily cause overfitting. It is therefore highly desirable to keep the depth as low as possible whilst maintaining high precision rates.

Results

Proper data is one of the most essential parts of machine learning. We gathered our data by scraping nearly 180,000 filenames from SMB logs of our own network. In addition, we generated 1,000 random filenames ourselves. To make sure that the models don't develop a bias towards, for example, the extension ".exe", we stripped the extensions from the filenames.
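A minimal sketch of such a supervised pipeline, assuming scikit-learn (the toy filenames, vectorizer settings, and forest hyperparameters are illustrative, not our production configuration):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

def strip_extension(filename: str) -> str:
    """Drop the extension so the model can't key on e.g. '.exe'."""
    return filename.rsplit(".", 1)[0]

# Character bigrams mapped to count vectors, followed by a random
# forest kept deliberately shallow (max_depth) to limit overfitting.
model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(2, 2), lowercase=True),
    RandomForestClassifier(n_estimators=100, max_depth=10, random_state=0),
)

# Toy stand-ins for the scraped benign names and generated random names.
benign = ["report_q3.docx", "setup.exe", "invoice_2019.pdf"]
generated = ["xk9qzt2w.exe", "qwfp7u1s.dll", "zr4nv8ym.tmp"]
filenames = [strip_extension(f) for f in benign + generated]
labels = [0] * len(benign) + [1] * len(generated)  # 1 = random

model.fit(filenames, labels)
```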

As stated earlier, the bigrams model is based on our previously published DGA detection model. This model was trained on 90% of the filenames and then tested on the remaining filenames plus 100% of the random filenames.

The random forest was trained and tested in multiple folds, a cross-validation technique [6]. We evaluate our predictions in a joint confusion matrix, illustrated below.
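Continuing from the sketch above, out-of-fold predictions can be collected and summarised in one joint confusion matrix (the number of folds here is our assumption; the blog does not state it):

```python
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import cross_val_predict

# Every sample is predicted exactly once by a model that never saw it
# during training; all out-of-fold predictions land in a single matrix.
predictions = cross_val_predict(model, filenames, labels, cv=3)
print(confusion_matrix(labels, predictions))
print(f1_score(labels, predictions))
```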

True positives are shown in the upper right of the matrix: the bigrams model detected 71% of random filenames and the random forest detected 81%. As you can see, both models produce low false positive rates: in both cases ~0% of not-random filenames were incorrectly classified as random. This is great for use in our Security Operations Center, as it keeps the workload on the analysts consistent.

The F1-scores are 0.83 and 0.89 respectively. Because we focus on adding detection with low false positive rates, it is not our priority to reduce the false negative rates. In future work we will take a better look at the false negative rates of the models.

We were quite interested in the differences between the two detection models. Looking at the visualization below, we can observe that both models detect the same 572 random filenames, while they separately detect another 236 and 141 random filenames respectively. The bigrams model might miss more random filenames due to its unsupervised architecture: it is possible that it requires more data to create its baseline and therefore doesn't perform as well as the supervised random forest.

The overlap between the models and the low false positive rates gave us the idea to run both models cooperatively for detection of random filenames. It doesn't cost much processing and we would gain a lot! In a practical setting this means that if a random filename slips past one detection model, it is still possible for the other model to detect it. In theory, we detect 90% of random filenames! The low false positive rates and the complementary aspects of the detection models indicate that this setup could be really useful for detection in our Security Operations Center.
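Running the models cooperatively then amounts to a simple union of their verdicts; a sketch using the hypothetical helpers above (the score threshold is an assumed value that would in practice be tuned on the benign baseline):

```python
baseline = BigramBaseline()
baseline.fit([strip_extension(f) for f in benign])

def is_random(filename: str, threshold: float = -6.0) -> bool:
    """Flag a filename if either model fires."""
    name = strip_extension(filename)
    return (baseline.score(name) < threshold
            or model.predict([name])[0] == 1)
```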

Conclusion

During traffic analysis in our Security Operations Center we sometimes recognize random filenames, indicating possible lateral movement. Malicious actors can use penetration testing frameworks (e.g. Metasploit) and Microsoft applications (e.g. PsExec) for lateral movement. If adversaries are able to do this, they can easily compromise a (sub)network of a target. Needless to say, we want to detect this behavior as quickly as possible.

In this blog entry we described how we applied machine learning to detect these random filenames. We showed two models for detection: a bigrams model and a random forest. Both models yield good results in the testing stage, indicated by the low false positive rates. We also looked at the overlap in predictions, from which we concluded that we can detect 90% of random filenames in SMB traffic! This gave us the idea to run both detection models cooperatively in our Security Operations Center.

For future work we would like to research the usability of these models on endpoint data, as our current research is focused solely on detection in network traffic. For instance, a lot of malware writes randomly named files to the local machine. This is just one of many possibilities we could investigate further.

All in all, we can confidently conclude that machine learning methods are one of many efficient ways to keep up with adversaries and improve our security operations!


References

[1] – https://attack.mitre.org/tactics/TA0008/

[2] – https://blog.fox-it.com/2019/06/11/using-anomaly-detection-to-find-malicious-domains/

[3] – https://www.offensive-security.com/metasploit-unleashed/pivoting/

[4] – https://www.mindpointgroup.com/blog/lateral-movement-with-psexec/

[5] – https://medium.com/@williamkoehrsen/random-forest-simple-explanation-377895a60d2d

[6] – https://towardsdatascience.com/why-and-how-to-cross-validate-a-model-d6424b45261f


Office 365: prone to security breaches?

By: Fox IT
11 September 2019 at 11:30

Author: Willem Zeeman

“Office 365 again?” At the Forensics and Incident Response department of Fox-IT, this is heard often: Office 365 breach investigations are common at our department.
You'll find that this blog post actually doesn't make a case for Office 365 being inherently insecure. Rather, it discusses some of the predictability of Office 365 that adversaries might use, and mistakes that organisations make. The final part of this blog describes a quick check for signs that you are already a victim of an Office 365 compromise. Extended details about securing and investigating your Office 365 environment will be covered in blogs to come.

Office 365 is predictable
A lot of adversaries seem to have a financial motivation for trying to breach an email environment. A typical adversary doesn't want to waste too much time searching for the right way to access the email system, even though it is often enough to browse to an address like https://webmail.companyname.tld. But why would the adversary risk encountering a custom or extra-secure web page? Why would the adversary accept the uncertainty of having to deal with whatever email protocol the particular organisation uses? Why guess the URL? It's much easier to use the "Cloud approach".

In this approach, an adversary first collects a list of valid credentials (email address and password), most frequently gathered with the help of a successful phishing campaign. Once credentials have been captured, the adversary simply browses to https://office.com and tries them. If no second type of authentication is required, they are in. That's it. The adversary is now in paradise, because after gaining access they also know what to expect: not some fancy or outdated email system, but an Office 365 environment just like all the others. There's a good chance that the compromised account owns an Exchange Online mailbox too.

In predictable environments like Office 365, it's also much easier to automate the process of evil intentions. The adversary may create a script or use some tooling, feed it the gathered list of credentials and sit back. Of course, an adversary may also target a specific on-premises system configuration, but from an opportunistic point of view, why would they? According to Microsoft, more than 180 million people are using their popular cloud-based solution. It's far more effective to try another set of credentials and enter another predictable environment than to spend time figuring out where information might be available and how the environment is configured.

Office 365 is… secure?
Well, yes, Office 365 is a secure platform. The truth is that it has a lot more easy-to-deploy security capabilities than most common on-premises solutions. The issue is that organisations don't always seem to realise what they could and should do to secure Office 365.

Best practices for securing your Office 365 environment will be covered in a later blog, but here's a sneak preview: more than 90% of the Office 365 breaches investigated by Fox-IT would not have happened if the organisation had had multi-factor authentication in place. No, implementation doesn't need to be a hassle. Yes, it's a free-to-use option. Other security measures, like receiving automatic alerts on suspicious activity detected by built-in Office 365 processes, are free as well, but often neglected.

Simple preventive solutions like these are not even commonly available in on-premises environments. It almost seems that many companies assume they get perfect security right out of the box, rather than configuring the platform to their needs. This may be why organisations do not even bother to configure Office 365 in a more secure way. That's a pity, especially when securing your environment is often just a few cloud-clicks away. Office 365 may not be less secure than an on-premises solution, but it might be more prone to being compromised, thanks to a lack of involved expertise and to adversaries who know how to take advantage of this. Microsoft already offers multi-factor authentication to reduce the impact of attacks like phishing; getting more organisations to actually adopt it remains the real challenge.

A lot of organisations are already compromised. Are you?
At our department we often see that it may take months(!) for an organisation to realise that it has been compromised. In Office 365 breaches, the adversary is often detected due to an action that causes so much noise that it's no longer possible for them to hide. When the adversary thinks it's no longer beneficial to persist, the next step is to try to gain a foothold in another organisation. In our investigations, we see that by the time this happens, the adversary has already pursued a financial goal, often by successfully committing payment-related fraud in which an employee's internal email account is used to mislead someone. Eventually, to advance into another organisation, the adversary sends a phishing email to a large part of the organisation's address list. In the end, somebody will likely take the bait and leave their credentials on a malevolent, adversary-controlled website. If a victim does, the story starts over again at the other organisation. For the adversary, it's just a matter of repeating the steps.

The step of gaining a foothold in another organisation is also the moment that a lot of (phishing) email messages flow out of the organisation. Thanks to Office 365 intelligence, these are automatically blocked if the number of messages surpasses a given limit based on the user's normal email behaviour. This is commonly the moment when the victim gets in touch with their system administrator, asking why they can't send any email anymore. Ideally, the system administrator will quickly notice the email messages containing malicious content and report the incident to the security team.

For now, let's assume you do not have the basic precautions set up, and you want to know if somebody is lurking in your Office 365 environment. You could hire experts to forensically scrutinize your environment, and that would be a correct answer. But there is also a relatively easy way to check whether Microsoft's security intelligence has already detected some bad stuff. In this blog we zoom in on one of these methods; please keep in mind that a full discussion of the range of available methods is beyond the scope of this blog post. The method described here is, from our perspective, the one that gives quick insight when checking (after the fact) for signs of a breach. The not-so-obvious part is that you will find the output in Microsoft Azure rather than in Office 365. A big part of the Office 365 environment is actually based on Microsoft Azure, and so is its authentication. This is why it's usually[1] possible to log in at the Azure portal and check for Risk events.

The steps:

  1. Go to https://portal.azure.com and sign-in with your Office 365 admin account[2]
  2. At the left pane, click Azure Active Directory
  3. Scroll down to the part that says Security and click Risk events
  4. If there are any risky events, these will be listed here. For example, impossible travel events are among the more interesting ones to pay attention to. These may look like this:

This risk event type identifies two sign-ins from the same account, originating from geographically distant locations, within a time period in which the geographical distance could not have been covered. Other unusual sign-ins are also flagged by machine learning algorithms. Impossible travel is usually a good indicator that an adversary was able to successfully sign in. However, false positives may occur when a user is travelling, using a new device, or using a VPN.
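If you prefer to check these events programmatically rather than through the portal, risk detections are also exposed via the Microsoft Graph Identity Protection API. A minimal sketch in Python, assuming you already obtained an access token for an app registration with the IdentityRiskEvent.Read.All permission (the helper name is ours; consult Microsoft's documentation for the exact permissions and riskEventType values):

```python
import requests

GRAPH_URL = "https://graph.microsoft.com/v1.0/identityProtection/riskDetections"

def list_risk_detections(access_token: str) -> list:
    """Fetch risk detections; inspect each item's 'riskEventType' field
    for travel-related and other unusual sign-in detections."""
    response = requests.get(
        GRAPH_URL,
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("value", [])
```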

Apart from the impossible travel registrations, Azure also has a lot of other automated checks which might be listed in the Risk events section. If you have any doubts about these, or if a compromise seems likely: please get in contact with your security team as fast as possible. If your security team needs help in the investigation or mitigation, contact the FoxCERT team. FoxCERT is available 24/7 by phone on +31 (0)800 FOXCERT (+31 (0)800-3692378).

[1] Disregarding more complex federated setups, and assuming the licensing model permits.

[2] The risky sign-ins reports are available to users in the following roles: Security Administrator, Global Administrator, Security Reader. Source: https://docs.microsoft.com/en-us/azure/active-directory/reports-monitoring/concept-risky-sign-ins
