❌

Normal view

There are new articles available, click to refresh the page.
Before yesterdayMalware

DGAs – Generating domains dynamically

5 November 2020 at 09:49

A domain generation algorithm is a routine/program that generates a domain dynamically. Think of the following example:

An actor registers the domain evil.com. The corresponding backdoor has this domain hardcoded into its code. Once the attacker infects a target with this malware, it will start contacting its C2 server.

As soon as a security company obtains the malware, it might blacklist the registered domain evil.com. This will hinder any attempts of the malware to receive commands from the original C2.

If a domain generation algorithm would have been used, the domain will be generated based on a seed. The current date for example is a popular seed amongst malware authors. A simple domain blacklisting would not solve the problem. The security company will have to resort to different methods.

By generating domains dynamically, it is harder for defenders to hinder the malware from contacting its C2 server. It will be necessary to understand the algorithm.

Example implementation of a DGA

A quick & dirty implementation(loosely based on Wikipedia)[1] of such algorithm could look like this:

"""Example implementation of a domain generation algorithm."""

import sys
import time
import random


def gen_domain(month, day, hour, minute):
    """Generate the domain based on time. Return domain"""
    print(
        f"[+] Gen domain based on month={month} day={day} hour={hour} min={minute}")
    domain = ""
    for i in range(8):
        month = (((month * 8) ^ 0xF))
        day = (((day * 8) ^ 0xF))
        hour = (((hour * 8) ^ 0xF))
        minute = (((minute * 8) ^ 0xF))
        domain += chr(((month * day * hour * minute) % 25) + 0x61)
    return domain


try:
    while True:
        d = gen_domain(random.randint(1, 12), random.randint(1, 30),
                       random.randint(0, 24), random.randint(0, 60))
        print(f"[+] Generated domain = {d}")
        time.sleep(5)
except KeyboardInterrupt:
    sys.exit()

Our DGA algorithm would use the current date and time as a seed. Each parameter is multiplied with 8 and XOR’d with 0xF. Finally all four values are multiplied with each other. The final operations are used to make sure that we generate a character in small caps. The output of this program looks like this:

[+] Gen domain based on month=12 day=2 hour=4 min=4
[+] Generated domain = taavtaab.com
[+] Gen domain based on month=3 day=10 hour=11 min=36
[+] Generated domain = kugxfkvx.com
[+] Gen domain based on month=2 day=27 hour=4 min=1
[+] Generated domain = kaasuapn.com

Seed or Dictionary based

There are different main approaches when implementing a domain generation algorithm. For the sake of keeping this simple, we will not focus on the hybrid approach.

Different kinds of approaches

Seed based Approach

We already introduced the first one. Our implementation is an algorithm based on a seed, which is served as an input. Another example I can provide, is how APT34 used such seed based algorithm in a campaign targeting a government organisation in the Middle East. The campaign was discovered by FireEye[2].

The mentioned APT group used domain generation algorithms in one of their downloaders. The Downloader was named BONDUPDATER by FireEye and is implemented in the Powershell Scripting Language.

BONDLOADER DGA algorithm

The first 12 chars of the UUID is extracted. Next the program runs into a loop. Each iteration a new random number is generated and the domain is generated by concatenating hardcoded, as well as generated values. GetHostAddresses will try to resolve the generated domain. If it fails, a new iteration starts. Once a registered domain is generated and resolved, it will break the loop.

Depending on the resolved ip address, the script will trigger different actions.

Dictionary based Approach

The second approach is to create a dictionary based domain generation algorithm. Instead of focusing on a seed, a list of words could be provided. The algorithm randomly selects words from these lists, concatenates them and generates a new domain. Suppobox[3] is a malware, which implemented the dictionary based approach[4].

Defeating Domain Generation Algorithms

The straight forward way to counter these algorithms is to reverse engineer the routine and to predict future domains. One famous case of predicting future domains is the takedown of the Necurs Botnet by Microsoft[5]. By understanding the DGA, they were able to predict the domains for the next 25 months.

I am not a ML magician. However, just a quick google research shows that there is a lot research going on. Machine Learning based approaches to counter DGAs seems to be promising too.

Deobfuscating DanaBot’s API Hashing

12 July 2020 at 13:27

You probably already guessed it from the title’s name, API Hashing is used to obfuscate a binary in order to hide API names from static analysis tools, hindering a reverse engineer to understand the malware’s functionality.
A first approach to get an idea of an executable’s functionalities is to more or less dive through the functions and look out for API calls. If, for example a CreateFileW function is called in a specific subroutine, it probably means that cross references or the routine itself implement some file handling functionalities. This won’t be possible if API Hashing is used.

Instead of calling the function directly, each API call has a corresponding checksum/hash. A hardcoded hash value might be retrieved and for each library function a checksum is computed. If the computed value matches the hash value we compare it against, we found our target.

API Hashing used by DanaBot

In this case a reverse engineer needs to choose a different path to analyse the binary or deobfuscate it. This blog article will cover how the DanaBot banking trojan implements API Hashing and possibly the easiest way on how this can be defeated. The SHA256of the binary I am dissecting here is added at the end of this blog post.

Deep diving into DanaBot

DanaBot itself is a banking trojan and has been around since atleast 2018 and was first discovered by ESET[1]. It is worth mentioning that it implements most of its functionalities in plugins, which are downloaded from the C2 server. I will focus on deobfuscating API Hashing in the first stage of DanaBot, a DLL which is dropped and persisted on the system, used to download further plugins.

Reversing the ResolvFuncHash routine

At the beginning of the function, the EAX register stores a pointer to the DOS header of the Dynamic Linked Library which, contains the function the binary wants to call. The corresponding hash of the yet unknown API function is stored in the EDX register. The routine also contains a pile of junk instructions, obfuscating the actual use case for this function.

The hash is computed solely from the function name, so the first step is to get a pointer to all function names of the target library. Each DLL contains a table with all exported functions, which are loaded into memory. This Export Directory is always the first entry in the Data Directory array. The PE file format and its headers contain enough information to reach this mentioned directory by parsing header structures:

Cycling through the PE headers to obtain the ExportDirectory and AddressOfNames

In the picture below, you can see an example of the mentioned junk instructions, as well as the critical block, which compares the computed hash with the checksum of the function we want to call. The routine iterates through all function names in the Export Directory and calculates the hash.
The loop breaks once the computed hash matches the value that is stored in the EDX register since the beginning of this routine.

Graph overview of obfuscated API Hashing function

Reversing the hashing algorithm

The hashing algorithm is fairly simple and nothing too complicated. Junk instructions and opaque predicates complicate the process of reversing this routine.

The algorithm takes the nth and the stringLength-n-1th char of the function name and stores them, as well as capitalised versions into memory, resulting in a total of 4 characters. Each one of those characters is XOR'd with the string length. Finally they are multiplied and the values ​​are added up each time the loop is run and result in the hash value.

def get_hash(funcname):
    """Calculate the hash value for function name. Return hash value as integer"""
    strlen = len(funcname)
    # if the length is even, we encounter a different behaviour
    i = 0
    hashv = 0x0
    while i < strlen:
        if i == (strlen - 1):
            ch1 = funcname[0]
        else:
            ch1 = funcname[strlen - 2 - i]
        # init first character and capitalize it
        ch = funcname[i]
        uc_ch = ch.capitalize()
        # Capitalize the second character
        uc_ch1 = ch1.capitalize()
        # Calculate all XOR values
        xor_ch = ord(ch) ^ strlen
        xor_uc_ch = ord(uc_ch) ^ strlen
        xor_ch1 = ord(ch1) ^ strlen
        xor_uc_ch1 = ord(uc_ch1) ^ strlen
        # do the multiplication and XOR again with upper case character1
        hashv += ((xor_ch * xor_ch1) * xor_uc_ch)
        hashv = hashv ^ xor_uc_ch1
        i += 1
    return hashv

A python script for calculating the hash for a given function name is also uploaded on my github page[2] and free for everyone to use. I’ve also uploaded a text file with hashes for exported functions of commonly used DLLs.

Deobfuscation by Commenting

So now that we cracked the algorithm, we want to update our disassembly to know which hash value represents which function. As I’ve already mentioned, we want to focus on simplicity. The easiest way is to compute hash values for exported functions of commonly used DLLs and write them into a file.

Generated hashes

With this file, we can write an IdaPython script to comment the library function name next to the Api Hashing call. Luckily the Api Hashing function is always called with the same pattern:

  • Move the wanted hash value into the EDX register
  • Move a DWORD into EAX register

First we retrieve all XRefs of the Api Hashing function. Each XRef will contain an address where the Api Hashing function is called at, which means that in atleast the 5 previous instructions, we will find the mentioned pattern. So we will fetch the previous instruction until we extract the wanted hash value, which is being pushed into EDX. Finally we can use this immediate to extract the corresponding api function from the hash values we have generated before and comment the function name next to the Xref address.

def add_comment(addr, hashv, api_table):
    """Write a comment at addr with the matching api function.Return True if a corresponding api hash was found."""
    # remove the "h" at the end of the string
    hashv = hex(int(hashv[:-1], 16))
    keys = api_table.keys()
    if hashv in keys:
        apifunc = api_table[hashv]
        print "Found ApiFunction = %s. Adding comment." % (apifunc,)
        idc.MakeComm(addr, apifunc)
        comment_added = True
    else:
        print "Api function for hash = %s not found" % (hashv,)
        comment_added = False
    return comment_added


def main():
    """Main"""
    f = open(
        "C:\\Users\\luffy\\Desktop\\Danabot\\05-07-2020\\Utils\\danabot_hash_table.txt", "r")
    lines = f.readlines()
    f.close()
    api_table = get_api_table(lines)
    i = 0
    ii = 0
    for xref in idautils.XrefsTo(0x2f2858):
        i += 1
        currentaddr = xref.frm
        addr_minus = currentaddr - 0x10
        while currentaddr >= addr_minus:
            currentaddr = PrevHead(currentaddr)
            is_mov = GetMnem(currentaddr) == "mov"
            if is_mov:
                dst_is_edx = GetOpnd(currentaddr, 0) == "edx"
                # needs to be edx register to match pattern
                if dst_is_edx:
                    src = GetOpnd(currentaddr, 1)
                    # immediate always ends with 'h' in IDA
                    if src.endswith("h"):
                        add_comment(xref.frm, src, api_table)
                        ii += 1
    print "Total xrefs found %d" % (i,)
    print "Total api hash functions deobfuscated %d" % (ii,)


if __name__ == '__main__':
    main()

Conclusion

As reverse engineers, we will probably continue to encounter Api Hashing in various different ways. I hope I was able to show you some quick & dirty method or give you at least some fundament on how to beat this obfuscation technique. I also hope that, the next time a blue team fellow has to analyse DanaBot, this article might become handy to him and saves him some time reverse engineering this banking trojan.

IoCs

  • Dropper = e444e98ee06dc0e26cae8aa57a0cddab7b050db22d3002bd2b0da47d4fd5d78c
  • DLL = cde01a2eeb558545c57d5c71c75e9a3b70d71ea6bbeda790a0b871fcb1b76f49
❌
❌