
More macOS Installer Flaws

3 June 2021 at 13:02

Back in December, we wrote about attacking macOS installers. Over the last couple of months, as my team looked into other targets, we kept an eye on the installers of applications we were using and interacting with regularly. During our research, we noticed yet another of the aforementioned flaws in the Microsoft Teams installer and, in the process of auditing it, discovered another generalized flaw with macOS package installers.

Frustrated by the prevalence of these issues, we decided to write them up and make separate reports to both Apple and Microsoft. We wrote to Apple to recommend implementing a fix similar to what they did for CVE-2020–9817 and explained the additional LPE mechanism discovered. We wrote to Microsoft to recommend a fix for the flaw in their installer.

Both companies have rejected these submissions and suggestions. Below you will find full explanations of these flaws as well as proofs-of-concept that can be integrated into your existing post-exploitation arsenals.

Attack Surface

To recap from the previous blog, macOS installers have a variety of convenience features that allow developers to customize the installation process for their applications. Most notable of these features are preinstall and postinstall scripts. These are scripts that run before and after the actual application files are copied to their final destination on a given system.

If the installer itself requires elevated privileges for any reason, such as setting up a system-level Launch Daemon for an auto-updater service, the installer will prompt the user for permission to elevate privileges to root. There is also the case of unattended installations automatically doing this, but we will not be covering that in this post.

The primary issue being discussed here occurs when these scripts — running as root — read from and write to locations that a normal, lower-privileged user has control over.

Issue 1: Usage of Insecure Directories During Elevated Installations

In July 2020, NCC Group posted their advisory for CVE-2020–9817. In this advisory, they discuss an issue where files extracted to Installer Sandbox directories retained the permissions of a lower-privileged user, even when the installer itself was running with root privileges. This means that any local attacker (local for code execution, not necessarily physical access) could modify these files and potentially escalate to root privileges during the installation process.

NCC Group conceded that these issues could be mitigated by individual developers, but chose to report the issue to Apple to suggest a more holistic solution. Apple appears to have agreed, provided a fix in HT211170, and assigned a CVE identifier.

Apple’s solution was simple: files extracted to an installer sandbox now receive the permissions of the user the installer is currently running as. This means that lower-privileged users are not able to modify these files during the installation process and influence actions performed by root.

Similar to the sandbox issue, as noted in our previous blog post, it isn’t uncommon for developers to use other less-secure directories during the installation process. The most common directories we’ve come across that fit this bill are /tmp and /Applications, which both have read/write access for standard users.

Let’s use Microsoft Teams as yet another example of this. During the installation process for Teams, the application contents are moved to /Applications as normal. The postinstall script creates a system-level Launch Daemon that points to the TeamsUpdaterDaemon application (/Applications/Microsoft Teams.app/Contents/TeamsUpdaterDaemon.xpc/Contents/MacOS/TeamsUpdaterDaemon), which will run with root permissions. The issue is that if a local attacker is able to create the /Applications/Microsoft Teams directory tree prior to installation, they can overwrite the TeamsUpdaterDaemon application with their own custom payload during the installation process, which will be run as a Launch Daemon, and thus give the attacker root permissions. This is possible because while the installation scripts do indeed change the write permissions on this file to root-only, creating this directory tree in advance thwarts this permission change because of the open nature of /Applications.
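
For context, registering a system-level Launch Daemon boils down to dropping a property list into /Library/LaunchDaemons that tells launchd what to run as root. A simplified example of what such a definition might look like (the label and keys here are illustrative, not the literal Teams plist):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <!-- Illustrative label; not the actual Teams daemon label -->
    <key>Label</key>
    <string>com.microsoft.teams.updaterdaemon</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Applications/Microsoft Teams.app/Contents/TeamsUpdaterDaemon.xpc/Contents/MacOS/TeamsUpdaterDaemon</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
</dict>
</plist>

Whoever controls the binary at that ProgramArguments path controls code that launchd will run as root.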

The following demonstrates a quick proof of concept:

# Prep Steps Before Installing
/tmp ❯❯❯ mkdir -p "/Applications/Microsoft Teams.app/Contents/TeamsUpdaterDaemon.xpc/Contents/MacOS/"
# Just before installing, have this running. Inelegant, but it works for demonstration purposes.
# Payload can be whatever. It won’t spawn a GUI, though, so a custom dropper or other application would be necessary.
/tmp ❯❯❯ while true; do
ln -f -F -s /tmp/payload "/Applications/Microsoft Teams.app/Contents/TeamsUpdaterDaemon.xpc/Contents/MacOS/TeamsUpdaterDaemon";
done
# Run installer. Wait for the TeamsUpdaterDaemon to be called.

The above creates a symlink to an arbitrary payload at the file path used in the postinstall script to create the Launch Daemon. During the installation process, this directory is owned by the lower-privileged user, meaning they can modify the files placed here for a short period of time before the installation scripts change the permissions to allow only root to modify them.

In our report to Microsoft, we recommended verifying the integrity of the TeamsUpdaterDaemon prior to creating the Launch Daemon entry or using the preinstall script to verify permissions on the /Applications/Microsoft Teams directory.
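
As a rough illustration of the kind of check we suggested (a sketch only, written in Python for readability; a real preinstall or postinstall script would more likely shell out to these tools directly, and the path is the Teams daemon from above):

#!/usr/bin/env python3
# Sketch of an install-time safety check: refuse to register the Launch Daemon
# if the daemon binary is user-controlled or fails code signature validation.
import os
import subprocess
import sys

DAEMON = "/Applications/Microsoft Teams.app/Contents/TeamsUpdaterDaemon.xpc/Contents/MacOS/TeamsUpdaterDaemon"

st = os.lstat(DAEMON)  # lstat so a symlink can't masquerade as the real file
if os.path.islink(DAEMON) or st.st_uid != 0:
    sys.exit("daemon binary is user-controlled; aborting install")

# codesign returns non-zero if the signature is missing, broken, or altered
if subprocess.run(["codesign", "--verify", "--deep", "--strict", DAEMON]).returncode != 0:
    sys.exit("daemon binary failed signature validation; aborting install")

# ...safe to create the Launch Daemon entry here...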

The Microsoft Teams vulnerability triage team has faced criticism over its handling of vulnerability disclosures over the last couple of years. We’d hoped that Teams’ recent inclusion in Pwn2Own signaled vast improvements in this area, but unfortunately, their communications in this disclosure, as well as in other disclosures we’ve recently made regarding their products, demonstrate that this is not the case.

Full thread: https://mobile.twitter.com/EyalItkin/status/1395278749805985792
Full thread: https://twitter.com/mattaustin/status/1200891624298954752
Full thread: https://twitter.com/MalwareTechBlog/status/1254752591931535360

In response to our disclosure report, Microsoft stated that this was a non-issue because /Applications requires root privileges to write to. We pointed out that this was not true and that if it was, it would mean the installation of any application would require elevated privileges, which is clearly not the case.

We received a response stating that they would review the information again. A few days later our ticket was closed with no reason or response given. After some prodding, the triage team finally stated that they were still unable to confirm that /Applications could be written to without root privileges. Microsoft has since stated that they have no plans to release any immediate fix for this issue.

Apple’s response was different. They stated that they did not consider this a security concern and that mitigations for this sort of issue were best left up to individual developers. While this is a totally valid response and we understand their position, we requested information regarding the difference in treatment from CVE-2020–9817. Apple did not provide a reason or explanation.

Issue 2: Bypassing Gatekeeper and Code Signing Requirements

During our research, we also discovered a way to bypass Gatekeeper and code signing requirements for package installers.

According to Gatekeeper documentation, packages downloaded from the internet or created from other possibly untrusted sources are supposed to have their signatures validated and a prompt is supposed to appear to authorize the opening of the installer. See the following quote for Apple’s explanation:

When a user downloads and opens an app, a plug-in, or an installer package from outside the App Store, Gatekeeper verifies that the software is from an identified developer, is notarized by Apple to be free of known malicious content, and hasn’t been altered. Gatekeeper also requests user approval before opening downloaded software for the first time to make sure the user hasn’t been tricked into running executable code they believed to simply be a data file.

In the case of downloading a package from the internet, we can observe that modifying the package will trigger an alert to the user upon opening it claiming that it has failed signature validation due to being modified or corrupted.

Failed signature validation for a modified package

If we duplicate the package, however, we can modify the contained files at will and repackage it sans signature. Most users will not notice that the installer is no longer signed (the lock symbol in the upper right-hand corner of the installer dialog will be missing) since the remainder of the assets used in the installer will look as expected. This newly modified package will also run without being caught or validated by Gatekeeper (Note: The applications installed will still be checked by Gatekeeper when they are run post-installation. The issue presented here regards the scripts run by the installer.) and could allow malware or some other malicious actor to achieve privilege escalation to root. Additionally, this process can be completely automated by monitoring for .pkg downloads and abusing the fact that all .pkg files follow the same general format and structure.

The below instructions can be used to demonstrate this process using the Microsoft Teams installer. Please note that this issue is not specific to this installer/product and can be generalized and automated to work with any arbitrary installer.

To start, download the Microsoft Teams installation package here: https://www.microsoft.com/en-us/microsoft-teams/download-app#desktopAppDownloadregion

When downloaded, the binary should appear in the user’s Downloads folder (~/Downloads). Before running the installer, open a Terminal session and run the following commands:

# Rename the package
yes | mv ~/Downloads/Teams_osx.pkg ~/Downloads/old.pkg
# Extract package contents
pkgutil --expand ~/Downloads/old.pkg ~/Downloads/extract
# Modify the post installation script used by the installer
mv ~/Downloads/extract/Teams_osx_app.pkg/Scripts/postinstall ~/Downloads/extract/Teams_osx_app.pkg/Scripts/postinstall.bak
echo "#!/usr/bin/env sh\nid > ~/Downloads/exploit\n$(cat ~/Downloads/extract/Teams_osx_app.pkg/Scripts/postinstall.bak)" > ~/Downloads/extract/Teams_osx_app.pkg/Scripts/postinstall
rm -f ~/Downloads/extract/Teams_osx_app.pkg/Scripts/postinstall.bak
chmod +x ~/Downloads/extract/Teams_osx_app.pkg/Scripts/postinstall
# Repackage and rename the installer as expected
pkgutil -f --flatten ~/Downloads/extract ~/Downloads/Teams_osx.pkg

When a user runs this newly created package, it will operate exactly as expected from the perspective of the end-user. Post-installation, however, we can see that the postinstall script run during installation has created a new file at ~/Downloads/exploit that contains the output of the id command as run by the root user, demonstrating successful privilege escalation.

Demo of above proof of concept

When we reported the above to Apple, this was the response we received:

Based on the steps provided, it appears you are reporting Gatekeeper does not apply to a package created locally. This is expected behavior.

We confirmed that this is indeed what we were reporting and requested additional information based on the Gatekeeper documentation available.

Apple explained that their initial explanation was faulty, but maintained that Gatekeeper acted as expected in the provided scenario.

Essentially, they state that locally created packages are not checked for malicious content by Gatekeeper, nor are they required to be signed. This means that even packages that require root privileges to run can be copied, modified, and recreated locally in order to bypass security mechanisms. This allows an attacker with local access to man-in-the-middle package downloads and escalate privileges to root when a package that requires elevated privileges is executed.

Conclusion and Mitigations

So, are these flaws actually a big deal? From a realistic risk standpoint, no, not really. This is just another tool in an already stuffed post-exploitation toolbox. It should be noted, though, that similar installer-based attack vectors are actively being exploited, as is the case in the recent SolarWinds news.

From a triage standpoint, however, this is absolutely a big deal for a couple of reasons:

  1. Apple has put so much effort over the last few iterations of macOS into baseline security measures that it seems counterproductive to their development goals to ignore basic issues such as these (especially issues they’ve already implemented similar fixes for).
  2. It demonstrates how much emphasis some vendors place on making issues go away rather than solving them.

We understand that vulnerability triage teams are absolutely bombarded with half-baked vulnerability reports, but becoming unresponsive during the disclosure response, overusing canned messaging, or simply giving incorrect reasons should not be the norm and highlights many of the frustrations researchers experience when interacting with these larger organizations.

We want to point out that we do not blame any single organization or individual here and acknowledge that there may be bigger things going on behind the scenes that we are not privy to. It’s also totally possible that our reports or explanations were hot garbage and our points were not clearly made. In either case, though, communications from the vendors should have been better about what information was needed to clarify the issues before they were simply discarded.

Circling back to the issues at hand, what can users do to protect themselves? It’s impractical for everyone to manually audit each and every installer they interact with. The occasional spot check with Suspicious Package, which shows all scripts executed when an installer package is run, never hurts. In general, though, paying attention to proper code signatures (look for the lock in the upper righthand corner of the installer) goes a long way.

For developers, pay special attention to the directories and files being used during the installation process when creating distribution packages. In general, it’s best practice to use an installer sandbox whenever possible. When that isn’t possible, verifying the integrity of files as well as enforcing proper permissions on the directories and files being operated on is enough to mitigate these issues.

Further details on these discoveries can be found in TRA-2021–19, TRA-2021–20, and TRA-2021–21.


More macOS Installer Flaws was originally published in Tenable TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Optimizing 700 CPUs Away With Rust

By: Alan Ning
6 May 2021 at 15:31

In Tenable.io, we are heavy users of Datadog custom metrics. Millions of metrics are sent through Dogstatsd, providing deep insights into the complex platform. As the platform grew, we found that a significant number of metrics sent by legacy apps were obsolete. We tried to hunt down these obsolete metrics in the codebase, but modifying legacy applications was extremely time consuming and risky.

To address this, we deployed a StatsD filter as a Datadog agent sidecar to filter out unnecessary metrics. The filter is a simple UDP datagram forwarder written in Node.js (sample, not actual code). We chose Node.js because, in our environment, its network performance outstripped that of other languages with a comparable speed to production. We were able to implement, test and deploy this change across all of the T.io platform within a week.
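
Conceptually, the filter does something like the following minimal Python sketch (ports, prefixes, and buffer sizes here are made up for illustration; the linked repositories contain the real Node.js sample and Rust implementation):

import socket

# Illustrative values only; the real filter takes its ports and filter list from configuration.
LISTEN_ADDR = ("127.0.0.1", 8125)    # where applications send their StatsD metrics
FORWARD_ADDR = ("127.0.0.1", 8135)   # where the real DogStatsD listener runs
BLOCKED_PREFIXES = (b"legacy_app.", b"obsolete.")

inbound = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
inbound.bind(LISTEN_ADDR)
outbound = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

while True:
    datagram, _ = inbound.recvfrom(65535)
    # A single datagram may carry several newline-separated metrics ("name:value|type");
    # keep only the ones whose names do not start with a blocked prefix.
    kept = [m for m in datagram.split(b"\n")
            if m and not m.split(b":", 1)[0].startswith(BLOCKED_PREFIXES)]
    if kept:
        outbound.sendto(b"\n".join(kept), FORWARD_ADDR)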

statsd-filter-proxy is deployed as a sidecar to datadog-agent, filtering all StatsD traffic prior to DogstatsD.

While this worked for many months, performance issues began to crop up. As the platform continued to scale up, we were sending more and more metrics through the filter. During the first quarter of 2021, we added over 1.4 million new metrics in an effort to improve our observability. The filters needed more CPU resources to keep up with the new metrics. At this scale, even a minor inefficiency can lead to large waste. Over time, we were consuming over 1000 CPUs and 400GB of memory on these filters. The overhead had become unacceptable.

We analyzed the performance metrics and decided to rewrite the filter in a more efficient language. We chose Rust for its high performance and safety characteristics. (See our other post on Rust evaluations) The source code of the new Rust-based filter is available here.

The Rust-based filter is much more efficient than the original implementation. Because Rust lets us fully manage heap allocations, the memory allocated to handle each datagram is kept to a minimum. This means that the Rust-based filter only needs a few MB of memory to operate. As a result, we saw a 75% reduction in CPU usage and a 95% reduction in memory usage in production.

Per pod, average CPU reduced from 800m to 200m core
Per pod, average memory reduced from 70MB to 5MB.

In addition to reclaiming compute resources, the latency per packet has also dropped by over 50%. While latency isn’t a key performance indicator for this application, it is rewarding to see that we are running twice as fast for a fraction of the resources.

With this small change, we were able to optimize away over 700 CPUs and 300GB of memory. This was all implemented, tested and deployed in a single sprint (two weeks). Once the new filter was deployed, we were able to confirm the resource reduction in Datadog metrics.

CPU / Memory usage dropped drastically following the deployment

Source: https://github.com/tenable/statsd-filter-proxy-rs

TL;DR:

  • Replaced the JS-based StatsD filter with Rust and saw a huge performance improvement.
  • At scale, even a small optimization can have a huge impact.

Optimizing 700 CPUs Away With Rust was originally published in Tenable TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Scaling Tenable.io — From Site to Cell

By: Alan Ning
4 March 2021 at 15:46

Scaling Tenable.io — From Site to Cell

Since the inception of Tenable.io, keeping up with data pressure has been a continuous challenge. This data pressure comes from two dimensions: the growth of the customer base and the growth of usage from each customer. This challenge has been most notable in Elasticsearch, since it is one of the most important stages in our petabyte-scale SaaS pipeline.

When customers run vulnerability scans, the Nessus scanners upload the scan data to Tenable.io. There, the data is broken down into documents detailing vulnerability information, including data such as asset information and cyber exposure details. These documents are then aggregated into an Elasticsearch index. However, when the cluster grew to hundreds of nodes, the team discovered that further horizontal scaling would affect overall stability. We would encounter more hot-shard problems, leading to uneven load across the index and a degraded user experience. This post will detail the re-architecture that both solved this scaling problem and achieved massive performance improvements for our customers.

Incremental Scaling from Site to Cell

Tenable.io scales by sites, where a site handles requests for a group of customers.

Each point of presence, called a site, contains a multi-tenant Elasticsearch instance to be used by geographically similar customers. As data pressure increases, however, horizontal scaling will cause instability, which will in turn cause instability at the site level.

To overcome this challenge, our overall strategy was to break out the site-wide (monolith) Elasticsearch cluster into multiple smaller, more manageable clusters. We call these smaller clusters cells. The rule is simple: If a customer has over 100 million documents, they will be isolated into their own cell. Smaller customers will be moved to one of the general population (GP) clusters. We came up with a technique to achieve zero downtime migration with massive performance gains.

Migrate from site to cell, where customers with large datasets may get their own Elasticsearch cluster.

Request Routing and Backfill

To achieve a zero downtime migration, we implemented two key pieces of software:

  1. An Elasticsearch proxy that can:
  • Transparently proxy any Elasticsearch request to any Elasticsearch cluster
  • Intelligently tee any write request (e.g. Index, Bulk) to one or more clusters

  2. A Spark job that can:

  • Query specific customer data
  • Read data from the monolith cluster into Spark dataframes using parallelized scrolls
  • Write those dataframes directly to the cell-based cluster

To start, we reconfigured micro-services to communicate with Elasticsearch through the proxy. Based on the targeted customer (more on this later), the proxy performed dual write to the old monolith cluster and the new cell-based cluster. Once the dual write began, all new documents started flowing to the new cell cluster. For all older documents, we ran a Spark job to pull old data from the monolith cluster to the new cell cluster. Finally, after the Spark job completed, we cut all new queries over to the new cell cluster.

A zero downtime backfill process. Step 1: start dual write. Step 2: Backfill old data. Step 3: Cutover reads.

Elasticsearch Proxy

With the cell architecture, we see a future where migrating customers from one Elasticsearch cluster to another is a common event. Customers in a multi-tenant cluster can easily outgrow the cluster’s capacity over time and require migration to other clusters. In addition, we need to reindex the data from time to time to adjust immutable settings (e.g. shard count). With this in mind, we want to make sure this type of migration is completely transparent to all the micro services. This is why we built a proxy to encapsulate all customer routing logic such that all data allocation is completely transparent to client services.

The Elasticsearch proxy encapsulates all customer routing logic, hiding it from all other services.

For the proxy to be able to route requests to the correct Elasticsearch clusters, it needs the customer ID to be sent along with each request. To achieve this, we injected an X-CUSTOMER-ID HTTP header into each search and index request. The proxy inspected the X-CUSTOMER-ID header in each request, looked up the customer-to-cluster mapping, and forwarded the request on to the correct cluster.
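
A minimal sketch of that routing logic (the web framework, mapping source, and cluster addresses below are illustrative; the real proxy also handles dual writes, bulk splitting, and error handling):

from flask import Flask, request, Response
import requests

app = Flask(__name__)

# In production this mapping would live in a datastore, not in code.
CUSTOMER_TO_CLUSTER = {
    "customer-a": "http://cell-1.es.internal:9200",
    "customer-b": "http://gp-1.es.internal:9200",
}
DEFAULT_CLUSTER = "http://monolith.es.internal:9200"

@app.route("/<path:es_path>", methods=["GET", "POST", "PUT", "DELETE"])
def route(es_path):
    # Look up the customer's cluster from the injected header and forward the request.
    customer_id = request.headers.get("X-CUSTOMER-ID", "")
    cluster = CUSTOMER_TO_CLUSTER.get(customer_id, DEFAULT_CLUSTER)
    upstream = requests.request(
        request.method,
        f"{cluster}/{es_path}",
        params=request.args,
        data=request.get_data(),
        headers={"Content-Type": request.headers.get("Content-Type", "application/json")},
    )
    return Response(upstream.content, status=upstream.status_code)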

While search and index requests always target a single customer, a bulk request contains a large number of documents for numerous customers. A single X-CUSTOMER-ID HTTP header would not provide sufficient routing information for the request. To overcome this, we found an interesting hack in Elasticsearch.

A bulk request body is encoded in a newline-delimited JSON (NDJSON) structure. Each action line is an operation to be performed on a document. This is an example directly copied from Elasticsearch documentation:
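
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }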

We found that within an action line, you can append any amount of metadata to the line as long as it is outside the action body. Elasticsearch seems to accept the request and ignore the extra content with no side effects (verified with ES2 to ES7). With this technique, we modified all clients of the Summary index to append customer IDs to every action.
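
For illustration, an action line with the appended metadata might look something like the following (the exact shape of what we append is simplified here; Elasticsearch parses the leading action object and ignores the trailing content, while the proxy reads it):

{ "index" : { "_index" : "summary", "_id" : "doc-1" } } { "customer_id" : "customer-a" }
{ "field1" : "value1" }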

With this modification, the proxy has enough information to break down a bulk request into subrequests for each customer.
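
A sketch of how the proxy-side splitting could work (assuming, as in the illustration above, that clients append a small JSON object containing the customer ID after each action, and that every action is followed by a document line):

import json

def split_bulk_by_customer(body: bytes):
    """Group a bulk request's (action, document) pairs by the customer ID that
    clients append after each action's JSON. Sketch only: assumes every action
    carries a document line and that the appended metadata is a JSON object."""
    decoder = json.JSONDecoder()
    lines = [l for l in body.split(b"\n") if l]
    per_customer = {}
    i = 0
    while i < len(lines):
        action_line = lines[i].decode()
        # raw_decode stops at the end of the action object; anything after it is
        # the metadata the clients appended (ignored by Elasticsearch itself).
        action, end = decoder.raw_decode(action_line)
        metadata, _ = decoder.raw_decode(action_line[end:].lstrip() or "{}")
        customer_id = metadata.get("customer_id", "unknown")
        doc_line = lines[i + 1]
        per_customer.setdefault(customer_id, []).extend(
            [json.dumps(action).encode(), doc_line])
        i += 2
    # Each value is an NDJSON body ready to forward to that customer's cluster.
    return {cid: b"\n".join(chunk) + b"\n" for cid, chunk in per_customer.items()}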

Spark Backfill

To backfill old data after dual writes were enabled, we used AWS EMR with the elasticsearch-hadoop SDK to perform parallel scrolls against every shard from the source index. As Spark retrieves the data in the Resilient Distributed Dataset (RDD) format, the same RDD can be written directly to the destination index. Since we’re backfilling old data, we want to make sure we don’t overwrite anything that’s already been written. To accomplish this, we set es.write.operation to “create”. (Look for an upcoming blog post about how Tenable uses Kotlin with EMR and Spark!)

Here’s some high-level sample code illustrating the approach (a rough PySpark sketch; hosts, index names, and the customer filter are placeholders rather than the actual job):
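
from pyspark.sql import SparkSession

# Rough sketch only: endpoints, index names, and field names are placeholders.
spark = SparkSession.builder.appName("es-cell-backfill").getOrCreate()

# Read the customer's documents from the monolith cluster. elasticsearch-hadoop
# runs a parallel scroll against each shard of the source index under the hood.
source_df = (spark.read.format("org.elasticsearch.spark.sql")
             .option("es.nodes", "monolith-es.internal")
             .option("es.query", '{"query": {"term": {"customer_id": "<CUSTOMER>"}}}')
             .load("vuln_summary"))

# Write the same dataframe to the cell cluster. "create" ensures we never
# overwrite documents that the dual-write path has already written.
(source_df.write.format("org.elasticsearch.spark.sql")
 .option("es.nodes", "cell-es.internal")
 .option("es.write.operation", "create")
 .option("es.mapping.id", "document_id")  # hypothetical ID field, kept for idempotent writes
 .save("vuln_summary"))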

To optimize the backfill performance, we performed steps similar to the ones taken by Soundcloud. Specifically, we found the following settings the most impactful:

  • Setting the index replica to 0
  • Setting the refresh interval to 5 minutes

However, since we are migrating data using a live production system, our primary goal is to minimize performance impact. In the end, we settled on indexing 9000 documents per second as the sweet spot. At this rate, migrating a large customer takes 10–20 hours, which is fast enough for this effort.

Performance Improvement

Since we started this effort, we have noticed drastic performance improvements. Elasticsearch scroll speed saw up to a 15X improvement, and query latency dropped by up to several orders of magnitude.

The chart below shows a large scroll request that goes through millions of vulnerabilities. Prior to the cell migration, it could take over 24 hours to run the full scroll. On the monolith cluster, the scroll suffered from frequent resource contention with other customers and was further slowed by our fairness algorithm’s throttling. After the customer was migrated to the cell cluster, the same scroll request completed in just over 1.5 hours. Not only is this a large improvement for this customer, but other customers also reap the benefits of the decreased contention.

In Summary

Our change in scaling strategy has resulted in large performance improvements for the Tenable.io platform. The new request routing layer and backfill process gave us powerful new tools for sharding customer data. Resharding is now a streamlined, safe, zero-downtime operation.

Overall, the team is thrilled with the end result. It took a lot of ingenuity, dedication, and teamwork to execute a zero downtime migration of this scale.

tl;dr:

  • Exponential customer growth on the Tenable.io platform led to a huge increase in the data stored in a monolithic Elasticsearch cluster to the point where it was becoming a challenge to scale further with the existing architecture.
  • We broke the monolithic site cluster down into cell clusters to improve performance.
  • We migrated customer data through a custom proxy and Spark job, all with zero downtime.
  • Scroll performance improved by 15x, and query latency dropped by several orders of magnitude.

Brought to you by the Sharders team:
Alan Ning, Alex Barbour, Ciaran Gaffney, Jagan Kondapalli, Johnny Mao, Shannon Prickett, Ted O’Meara, Tristan Burch

Special thanks to Jack Matheson and Vincent Gilcreest for all the help with editing.


Scaling Tenable.io — From Site to Cell was originally published in Tenable TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.
