WannaCry, two years later: a deep look into its code

Kartone Infosec Blog

By: Kartone

23 May 2019 at 09:17


My own technical analysis of the malware that spread like wildfire in 2017, encrypting thousands of computers by using one of the tools leaked from the National Security Agency by the group known as The Shadow Brokers.

Almost two years have passed since that weekend of May 2017, when the crypto-worm WannaCry infested the net thanks to the EternalBlue exploit. In roughly two days, WannaCry spread all over the world, infecting almost 230,000 computers in over 150 countries.

At that time I was working as an Information Security Officer and, together with my colleagues, especially the guys from the IT Infrastructure department, worked hard to keep the entire Company perimeter safe. Luckily for us, we were not hit by the ransomware, but a lot of effort was spent explaining to the rest of the Company what had happened.

Flash forward to 2019

Since this January, I've been running my own Dionaea honeypot, which keeps catching a huge number of WannaCry samples. Just to give you some numbers: within two months, port 445 was hit almost half a million times and I was able to collect roughly 18,000 samples, at a rate of almost 300 samples per day.

Judging by the file size, all these samples look identical, and every one of them is a WannaCry sample, delivered straight to port 445 as a DLL.

Just to make a contribution to the WannaCry story, though small and useless, I thought it would be fun to analyze the internals of this malware, as I wasn't able to do it back in the day. I will concentrate the analysis on its various layers and on the most important parts of the code that make this malware unique.
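An equal file size is only a hint that the samples are identical; comparing cryptographic hashes would confirm it. The following is a minimal sketch of that check, not something taken from the honeypot itself: the function name and the directory passed to it are hypothetical, and it only assumes the captured files sit in a single folder.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_sha256(directory):
    """Map each SHA-256 digest to the names of the files that share it."""
    groups = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path.name)
    return dict(groups)
```

If every captured file really is the same DLL, the returned dictionary has a single key.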

Peeling the onion

A first look at one of these samples confirms that we're dealing with a malicious DLL, and its compilation timestamp is worth noting. Let's call it launcher.dll, because of a string found inside the code.

Luckily for us, this sample is not packed. We can check its Import and Export Address Table to get an idea of what this sample is able to do.

Just by checking the imported APIs, we can assume that the malware uses something in its resource section and presumably creates a file and runs a process. DLL malware commonly exposes its functionality to the outside via its Export Address Table. Here we can see only one exported function, called PlayGame.

As noted above, the malware imports some specific APIs to manage its resource section, like FindResourceA and LoadResource. We can easily recognize the magic numbers of a Portable Executable file - a Windows executable file - stored inside this section. We can dump it easily with tools like ResourceHacker.

But before analyzing it, we need to get rid of some bytes in the header; we'll come back to these bytes later.

So now we can open it and check its sections, like we just did with the aforementioned DLL. Interestingly, this newly dumped executable seems 7 years older than the first one: its compile timestamp is dated November 2010. Be aware, though, that this date can easily be faked.
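For readers who want to check the compile timestamp without a PE viewer, here is a small sketch that reads the TimeDateStamp field straight from the headers. It relies only on the documented PE layout (the e_lfanew field at offset 0x3C and the COFF header after the PE signature); the function name is my own, and the value it returns is exactly as trustworthy as the field itself, which anyone can overwrite.

```python
import struct
from datetime import datetime, timezone


def pe_compile_time(data):
    """Return the TimeDateStamp of a PE image as a UTC datetime."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C) holds the file offset of the 'PE\0\0' signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF header: Machine (2 bytes), NumberOfSections (2 bytes), TimeDateStamp (4 bytes).
    (stamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(stamp, tz=timezone.utc)
```

Calling it on the bytes of the dumped executable, read with open(path, "rb"), should show the same date the PE viewer reports.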

We can get an idea of what its purpose is by checking out the imported libraries:

We have to expect much more complexity in this stage than in the DLL. We have a bunch of standard libraries like KERNEL32.dll or WININET.dll, plus iphlpapi.dll. This last DLL was unknown to me, so I looked it up on MSDN:

Purpose
The Internet Protocol Helper (IP Helper) API enables the retrieval and modification of network configuration settings for the local computer.
The IP Helper API is applicable in any computing environment where programmatically manipulating network and TCP/IP configuration is useful. Typical applications include IP routing protocols and Simple Network Management Protocol (SNMP) agents.

A quick look suggests that this executable works with the Windows services configuration, manages files and resources, and also has network capabilities.

The Plan

My plan is to take a deep look inside all the stages that the malware extracts during its execution, analyzing its code and how it interacts with internal Windows subsystems.

For this reason, we're now stepping back to analyze and understand how the DLL extracts this executable in the first place. Then we'll take a look inside the debugger to see how things happen in real time, and finally we'll analyze and try to understand what this executable does once it infects the system.

Analysis of the first layer: launcher.dll

The purpose of this DLL is exactly what we supposed from the analysis of the imported libraries. The only exported function, PlayGame, is easily disassembled by IDA Pro.

The first call to sprintf composes the Dest string as C:\WINDOWS\mssecsvc.exe. Then it calls two functions: sub_10001016 extracts, from the resource section, the executable we dumped before and saves it into a new file named after the Dest string; after that, sub_100010AB runs the file. Notice that we have just gained our first host-based indicator for detecting this malware: C:\WINDOWS\MSSECSVC.EXE.

Function `sub_10001016` aka `ExtractAndCreate`

To make this function easier to read and understand, we can rename it ExtractAndCreate and split it into two parts: the extract part and the create-file part.

During this phase, the malware uses four API calls, all fully documented on MSDN.

FindResourceA: Determines the location of a resource with the specified type and name in the specified module.
LoadResource: Retrieves a handle that can be used to obtain a pointer to the first byte of the specified resource in memory.
LockResource: Retrieves a pointer to the specified resource in memory.
SizeofResource: Retrieves the size, in bytes, of the specified resource.

That being said, we can now analyze these four simple blocks of code step by step. The first function prototype is:

HRSRC FindResourceA(
  HMODULE hModule,
  LPCSTR  lpName,
  LPCSTR  lpType
);

We have three function parameters that, as per the stdcall calling convention used by the Windows API, must be pushed onto the stack in reverse order, so:

push    offset Type ; "W"
push    65h ; lpName
push    hModule ; hModule
call    ds:FindResourceA

The hModule parameter is populated inside the DllMain function and is equal to the variable hinstDLL.

hinstDLL: A handle to the DLL module. The value is the base address of the DLL. The HINSTANCE of a DLL is the same as the HMODULE of the DLL, so hinstDLL can be used in calls to functions that require a module handle.

lpName: The name of the resource. In this case, the name is 0x65, or 101 in decimal. This is confirmed by opening the DLL with ResourceHacker:

lpType: The resource type. It can also be seen in the screenshot above.

From MSDN: If the function succeeds, the return value is a handle to the specified resource's information block. To obtain a handle to the resource, pass this handle to the LoadResource function. If the function fails, the return value is NULL.

Coming back to the disassembly, this handle is returned in EAX and then moved into EDI, where it is tested to check whether it's null. If it's not, the handle is pushed, as the second argument, to the next API call, LoadResource. Quoting MSDN: it retrieves a handle that can be used to obtain a pointer to the first byte of the specified resource in memory. It also suggests: "...to obtain a pointer to the first byte of the resource data, call the LockResource function; to obtain the size of the resource, call SizeofResource".

HGLOBAL WINAPI LoadResource(
  _In_opt_ HMODULE hModule,
  _In_     HRSRC   hResInfo
);

hModule: A handle to the module whose executable file contains the resource.

hResInfo: A handle to the resource to be loaded.

The same approach applies to the other two API calls: LockResource and SizeofResource. The interesting thing to note here is that the return value from this last call, stored in the EAX register as 500000, won't be used at all:

So now, looking in the debugger, we have:

EAX = 500000
ESI = 10004060

The ESI register contains the pointer to the memory region of the resource section that holds the executable itself. You can spot it thanks to the MZ header in the memory dump. Remember the 4 bytes that we removed with the hex editor before? According to MSDN, this DWORD is the actual size of the raw data inside the resource section of the binary itself. So this value, 0x0038D000, is moved into EBX and then pushed as nNumberOfBytesToWrite to the WriteFile function. The calls here are pretty standard: CreateFileA creates a file with specific attributes. For the dwFlagsAndAttributes parameter, according to MSDN, a value of 0x4 stands for: "The file is part of or used exclusively by an operating system".

After the call to WriteFile, we have our executable saved and ready to run. The interesting parameters for this call are:

lpBuffer: equal to ESI, this is the value returned by the call to LockResource and is a pointer to the buffer to write into the file. Basically, it is a pointer to the binary inside the resource section.
nNumberOfBytesToWrite: as we said earlier, this parameter is the DWORD that ESI points to at the start of the resource. Its value represents the size of the binary data.
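The layout described above, a 4-byte size followed by the executable, can be reproduced offline on the resource dumped with ResourceHacker. This is a sketch under that assumption only (little-endian DWORD first, MZ image right after it); the function name is hypothetical and nothing here comes from the malware's own code.

```python
import struct


def split_resource(blob):
    """Split a resource laid out as <DWORD size><payload> into (size, payload)."""
    # The first 4 bytes are read as a little-endian unsigned 32-bit size.
    (declared_size,) = struct.unpack_from("<I", blob, 0)
    payload = blob[4:4 + declared_size]
    if payload[:2] != b"MZ":
        raise ValueError("payload does not start with an MZ header")
    return declared_size, payload
```

For the sample analyzed here, the declared size would be the 0x0038D000 value seen in EBX, and the payload is what ends up in C:\WINDOWS\mssecsvc.exe.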

So now, we can enable a breakpoint right after the WriteFile call and get the freshly created executable.

Function `sub_100010AB` aka `RunTheFile`

Here we're dealing with a very simple API call to CreateProcessA, nothing fancy to add. I'd rather not dig into all of its parameters, as they are fully documented on MSDN.

Conclusion after the first layer

What I wanted to show here is my own study process: be aware, it can sometimes be very, very time-consuming, but it gives me a broad, complete and deep look inside Windows internals and how malware uses them. For a novice like me, this process helped a lot.

Analysis of the second layer: mssecsvc.exe

This one differs from the DLL file. As we noted initially, this executable is way more complex: we'll deal with various libraries and functionalities. But it all starts with a (Win)main function, right?

Do you remember the kill-switch? Do you remember the story behind? Give it a read, it's very interesting.

In general terms, the main function of a Windows program is named WinMain; this is the first function called when the program starts. We see a very strange URL inside this code. The exact string is http://www.iuqerfsodp9ifjaposdfjhgosurijfaewrwergwea.com and it is referenced through the EDI register. After that, the WinINet subsystem is initialized with a call to InternetOpenA, which returns a valid handle that the application passes to subsequent WinINet functions. Next, there's a call to InternetOpenUrlA, which opens a resource specified by a complete FTP or HTTP URL. After that, the handle is closed and a new function is called: sub_408090, which we'll name ServiceStuff.

In the first block of code, according to MSDN, GetModuleFileNameA retrieves the fully qualified path for the file that contains the specified module. The module must have been loaded by the current process. The first parameter, hModule, is the handle to the loaded module whose path is being requested; if this parameter is NULL, GetModuleFileNameA retrieves the path of the executable file of the current process. Here the value is set to NULL, so it retrieves the name of the executable itself:

We then find a check on the number of arguments: if there are arguments, the TRUE path is taken. Because, in our case, we're debugging without any argument, the FALSE path is taken and a new function, sub_407F20, is called. This is a simple function that calls two others, so let's call it FunctionCaller:
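The kill-switch behaviour, as it was widely reported, boils down to one branch: when the request to that URL gets an answer the program stops, otherwise it carries on to the next function. The sketch below only models that decision in Python for clarity. It is not a translation of the disassembly, the function names are mine, and treating any HTTP response (even an error status) as "the domain answers" is a simplifying assumption.

```python
import urllib.error
import urllib.request

KILL_SWITCH_URL = "http://www.iuqerfsodp9ifjaposdfjhgosurijfaewrwergwea.com"


def domain_answers(url, timeout=5):
    """Return True if an HTTP request to `url` receives any response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied, just with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar.
        return False


def kill_switch_decision(url=KILL_SWITCH_URL, probe=domain_answers):
    """Model the branch: 'exit' when the URL answers, 'continue' otherwise."""
    return "exit" if probe(url) else "continue"
```

The probe argument is injectable, so the branch can be exercised without touching the network, for example kill_switch_decision(probe=lambda url: False).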

Simply enough, sub_407C40 creates a new service and then starts it, so we name it CreateAndStartService. The service runs with the command line mssecsvc.exe -m security; its service name is "mssecsvc2.0" and its display name is "Microsoft Security Center (2.0) Service".

When we move on to sub_407cE0, things start to become fun. For the sake of simplicity, we'll analyze this function in four parts. The first part is easy, because the malware just resolves some APIs dynamically:

Nothing too complicated here: it uses GetProcAddress to populate some variables with the addresses of specific APIs, so it can call them in the following lines of code. After that, the second part handles the resource section, just as we saw in launcher.dll:

This is confirmed in the debugger:

The return value from LockResource, as we know, is the pointer to the resource inside the binary, and we can notice the MZ header in the memory dump. We then reach another interesting piece of code:

Two distinct strings, Dest and NewFileName, are created using the sprintf function. These two strings are further good host-based indicators:

Dest = C:\WINDOWS\tasksche.exe

NewFileName = C:\WINDOWS\qeriuwjhrf

After that, any existing tasksche.exe is moved to the new file qeriuwjhrf and a new tasksche.exe is created. At this point I found myself lost in somewhat obscure code: I understood that WriteFile dumps the R resource into the newly created tasksche.exe and that the file is run at the end. What happens in the middle part remained in the dark for me.

In situations like this, I prefer to view the code inside the debugger, because watching it at runtime can help to shed some light. Indeed, it seems that this part builds the command line for the upcoming CreateProcessA call.

To recap: this function dumps its resource data into a new executable file named tasksche.exe, keeping a copy of any previous one in another file named qeriuwjhrf, and then runs tasksche.exe /i.

Stepping back to the ServiceStuff function, there's the other path to analyze: when the arguments "-m security" are present, it enters service mode. After its initialization, it changes the service configuration:
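The three file paths collected so far make a simple host-based check. Below is a minimal sketch of how they could be tested on a machine; the list contains only the paths named in this analysis, while the function name and the injectable exists parameter are my own additions for illustration.

```python
import os

# Host-based indicators named in this analysis.
INDICATORS = [
    r"C:\WINDOWS\mssecsvc.exe",
    r"C:\WINDOWS\tasksche.exe",
    r"C:\WINDOWS\qeriuwjhrf",
]


def find_indicators(paths=INDICATORS, exists=os.path.exists):
    """Return the indicator paths that are present on this machine."""
    return [path for path in paths if exists(path)]
```

An empty list is not proof of a clean system, of course: it only means these specific files were not found.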

According to MSDN, it changes the config so that failure actions occur if the service exits without entering a SERVICE_STOPPED state. After that, it executes its ServiceFunction:

This function sets up the handles and starts exploiting the MS17-010 vulnerability on the reachable networks. Note that it exits after 24 hours. Here, I renamed this function ExecuteEternalBlue.

This call starts a chain of events that lets the infection happen. First of all, the Winsock subsystem is initialized and a CryptoContext is generated:

Next, the malware loads a DLL into memory - the very same launcher.dll we analyzed before - and then runs it. The network attacks happen inside two new threads. This flow can be easily observed if we decompile this function:

The first thread, involving the function sub_407720, enumerates the local network adapters and generates IP addresses belonging to those networks. For every IP, it tries to connect to port 445 and, if successful, launches the attack. The second thread, involving the function sub_407840, runs 128 times with a 2-second delay (0x7D0 milliseconds) between each run. It generates a random IP address and tries to connect to port 445; if the connection succeeds, the malware launches the EternalBlue attack. It's a pretty big chunk of code, but one interesting block is this:

Basically, with the random IP placed into the Dest string and converted into the proper format, the malware calls sub_407480, aka CreateSocketAndConnect, to try a connection to port 445. If the connection is successful, the real attack is launched within the function sub_407540, aka SMBAttack.
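From a defender's point of view, it can be useful to know which neighbours an infected host would try on its own subnet. The sketch below lists the usable addresses of the network an adapter belongs to, given its address and netmask. It is not the routine in sub_407720, only an illustration of what "addresses belonging to those networks" means; the function name is hypothetical.

```python
import ipaddress


def candidate_hosts(address, netmask):
    """List the usable host addresses of the network an adapter belongs to."""
    # strict=False lets us pass a host address instead of the network address.
    network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
    return [str(host) for host in network.hosts()]
```

For an adapter at 192.168.1.10 with netmask 255.255.255.252, the network is 192.168.1.8/30 and the function returns the two usable addresses 192.168.1.9 and 192.168.1.10.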

Conclusion after the second layer

So, until now, we have a DLL - launcher.dll - that loads and runs a binary stored inside its resource section, mssecsvc.exe. The very first time it runs, a new service is created to achieve persistence; after that, it scans the networks (local and random remote ones), launching the EternalBlue exploit against port 445. In its stand-alone version, it dumps another binary from its resource section and runs it. What's the purpose of this third binary? Let's take a look.

Analysis of the third layer: tasksche.exe

Remember that this executable comes from the resource section of the previous file, mssecsvc.exe: when that file runs without arguments, it locates its resource section and writes it to disk, creating tasksche.exe. When tasksche.exe starts, it first generates a random string based on the computer name, then checks whether there are command line arguments, in particular whether /i is present. We now have two branches to analyze:

If there's the /i argument: it creates specific directories, copies itself there, for example to C:\ProgramData\somerandomstring\tasksche.exe, and runs from that location.

If there's no /i argument: it locates its resource, named XIA, and extracts it to disk. What's interesting to note here is that this resource is a compressed, password-protected archive. Luckily for us, the password is hardcoded in clear text.

Let's give a look inside the archive knowing the password: WNcry@2ol7

We can recognize the magic numbers for a ZIP file that we can dump directly and extract.
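Recognizing a format from its first bytes is easy to automate. Here is a small sketch that checks a buffer against a few well-known magic numbers (ZIP local file header, MZ executable, BMP image); the table and the function name are my own and cover only the formats met in this post.

```python
# Well-known leading byte sequences ("magic numbers") and what they identify.
MAGIC_NUMBERS = {
    b"PK\x03\x04": "ZIP archive",
    b"MZ": "Windows executable (MZ/PE)",
    b"BM": "BMP image",
}


def identify(data):
    """Return a label for the format suggested by the first bytes of `data`."""
    for magic, label in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return label
    return "unknown"
```

Once the resource is identified as a ZIP, Python's zipfile module may be able to extract it by passing the password as bytes to extractall(pwd=...), assuming the archive uses the traditional ZIP encryption that zipfile supports.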

b.wnry is the bitmap image of the ransomware: basically, what you see as wallpaper when the computer is infected.

c.wnry is the configuration file, in clear text; we can see some onion servers and the URL of the archive containing the TOR browser.

r.wnry contains the text of the ransom note.

Inside the msg folder there are the localized ransom notes:

Conclusion after the third layer

This new executable is pretty interesting because it basically manages all the crypto operations of the ransomware. I won't go into this analysis because it's beyond my current skills and also because there are plenty of resources available on the internet, from amazing guys who are way better than me. For example, this technical analysis by FireEye was published only a few days after the outbreak and is complete, deep and detailed. I used it a lot to better understand many pieces of obscure code.

Conclusion

I have learned a lot from this research: I learned how malware uses its resource section to hide, dump and create files; I learned how malware interacts with the Windows service manager, how it loads DLLs in memory, how it scans networks and how EternalBlue actually works. Also, having such complete and detailed technical analyses available for this very specific malware helped me not to lose direction when I went too deep inside the assembly code. It was great fun, and I hope this research will be as helpful to someone else as it was for me.

Project Sodinokibi

Kartone Infosec Blog

By: Kartone

29 October 2020 at 08:59

Learning Python

Python is the language I always wanted to learn. I tried but failed every single time, and I don't know exactly why. This time was different though: I knew it from the first line of code. So, with a little push from a dear friend of mine (thanks Elio!), I tried to investigate how to decode Sodinokibi ransomware configurations for hundreds, maybe thousands, of samples. My goal was to understand, using the powerful insights from the VirusTotal Enterprise API, whether there are relationships between the Threat Actor, as mapped inside the ransomware configuration, and the country visible in the VirusTotal sample submission.
I am perfectly aware that it's not as easy as it seems: the country of the sample submission, as seen on VirusTotal, may not be the country affected by the ransomware itself. But, one way or another, I think there could be a link between the two parameters, maybe from the Incident Response perspective.

Getting the samples

My first step was to get as many samples as I could. My first thought was to use the VirusTotal API: I'm lucky enough to have an Enterprise account, but the results were overwhelming and, since I was experimenting with Python, the risk of running too many requests and consuming my quota was too high. So I opted for another excellent malware sharing platform: Malware Bazaar by Abuse.ch.

All the code is available here

import glob
import ntpath
import os

import requests

# args, SAMPLES_PATH, EXT_TO_CLEAN, get_sample() and housekeeping() are defined elsewhere in the script.
downloaded_samples = []
data = {'query': 'get_taginfo', 'tag': args.tag_sample, 'limit': 1000}
response = requests.post('https://mb-api.abuse.ch/api/v1/', data=data, timeout=10)
maldata = response.json()
print("[+] Retrieving the list of downloaded samples...")
# Local files are named after their SHA-256 hash, so dropping the extension yields the hash.
for file in glob.glob(SAMPLES_PATH + '*'):
    filename = ntpath.basename(os.path.splitext(file)[0])
    downloaded_samples.append(filename)
print("[+] We have a total of %s samples" % len(downloaded_samples))
for entry in maldata["data"]:
    # Guard against a missing or null "tags" field before testing membership.
    if "Decryptor" not in (entry.get("tags") or []):
        value = entry["sha256_hash"]
        if value not in downloaded_samples:
            print("[+] Downloading sample with sha256_hash ->", value)
            if args.get_sample:
                get_sample(value)
            if args.clean_sample:
                housekeeping(EXT_TO_CLEAN)
    else:
        print("[+] Skipping the sample because of Tag: Decryptor")

This block of code essentially builds the request for the back-end API, where the tag to search for comes from a command line parameter; I defaulted it to Sodinokibi. It then creates a list of the samples already present in the ./samples directory, so as not to download them again. Interestingly, because there are many Sodinokibi decryptor executables on the Malware Bazaar platform, I needed some sort of filtering to avoid downloading them. When it finds a sample that is not present in the local directory, it calls the function to download it.

def get_sample(sha256_hash):
    # KEY, SAMPLES_PATH and ZIP_PASSWORD (a bytes object) are defined elsewhere in the script.
    headers = { 'API-KEY': KEY }
    data = { 'query': 'get_file', 'sha256_hash': sha256_hash }
    response = requests.post('https://mb-api.abuse.ch/api/v1/', data=data, timeout=15, headers=headers, allow_redirects=True)
    # Save the password-protected archive returned by the API.
    with open(SAMPLES_PATH+sha256_hash+'.zip', 'wb') as f:
        f.write(response.content)
    print("[+] Sample downloaded successfully")
    # Unpack the AES-encrypted zip next to it.
    with pyzipper.AESZipFile(SAMPLES_PATH+sha256_hash+'.zip') as zf:
        zf.extractall(path=SAMPLES_PATH, pwd=ZIP_PASSWORD)
    print("[+] Sample unpacked successfully")

A straightforward function: it builds the API call, gets the zipped sample, unpacks it, and saves it inside the ./samples directory. Note that the sample filenames are always their SHA-256 hash. To get rid of the zip files after unpacking, I wrote a small housekeeping function.

def housekeeping(ext):
    try:
        for f in glob.glob(SAMPLES_PATH+'*.'+ext):
            os.remove(f)
    except OSError as e:
        print("Error: %s - %s " % (e.filename, e.strerror))

This is what happens when you run the script.

Getting insights on ransomware configuration

Now it's time to analyze these samples to get the pieces of information we need. The plan is to extract the RC4-encrypted configuration stored inside a PE file section, and to save the ActorID, the CampaignID, and the executable hash. With the latter, we then query the VirusTotal API to get insights on the sample submission: the City and the Country from which the sample was submitted, and when the submission happened. As I wanted to plot these pieces of information on a map, I then used the OpenCage API to obtain the coordinates of the submission cities.

The code that builds the API calls and parses the JSON responses is rough, shallow and straightforward, so I won't go through it. I'm sure there are plenty of better ways to do the job, but... it's my first time with Python! So bear with me, please. What I think is interesting is the function that extracts and decrypts the configuration from the ransomware PE file. These are the lines of code that do this task:

import hashlib
import json
import os
import struct

import pefile
from Crypto.Cipher import ARC4  # assumed import: ARC4 as shipped by the PyCryptodome package

excluded_sections = ['.text', '.rdata', '.data', '.reloc', '.rsrc', '.cfg']

def arc4(key, enc_data):
    cipher = ARC4.new(key)
    return cipher.decrypt(enc_data)

def decode_sodinokibi_configuration(f):
    filename = os.path.join('./samples', f)
    filename += '.exe'
    with open(filename, "rb") as file:
        raw = file.read()
    str_hash = hashlib.sha256(raw).hexdigest()
    pe = pefile.PE(filename)
    for section in pe.sections:
        section_name = section.Name.decode().rstrip('\x00')
        # The configuration lives in the one section with a non-standard name.
        if section_name not in excluded_sections:
            data = section.get_data()
            # RC4 key: first 32 bytes; ciphertext length: DWORD at 0x24; ciphertext: from 0x28.
            enc_len = struct.unpack('<I', data[0x24:0x28])[0]
            dec_data = arc4(data[0:32], data[0x28:enc_len + 0x28])
            # Drop the trailing null byte before parsing the JSON configuration.
            parsed = json.loads(dec_data[:-1])
            # parsed['pk'] also holds the attacker's public encryption key.
            return str_hash, parsed['pid'], parsed['sub']

Disclaimer: these lines are, obviously, not mine. I modified the script provided by the guys of BlackBerry ThreatVector. I invite you to read their post, where they explain how the configuration is stored within the section, where the RC4 encryption key is and how to decrypt the configuration.

My version of the script runs on Python 3 and uses a standard library for the RC4 algorithm. It's also worth mentioning that this script fails if the input samples are packed: it expects the particular section holding the encrypted configuration to exist, and fails otherwise. I added some checks to handle the most miserable crashes, but there are still unhandled cases: I'm so new to Python!
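To make the section layout explicit, here is a dependency-free sketch of the same decoding step, with a textbook RC4 written in pure Python. It is not the BlackBerry script nor my modified version of it: it only assumes the offsets used in the code above (32-byte key at the start, length at 0x24, ciphertext from 0x28), the function names are hypothetical, and it raises a ValueError on a too-short section instead of crashing on an index error.

```python
import json
import struct


def rc4(key, data):
    """Textbook RC4: key scheduling, then XOR with the generated keystream."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def parse_config_section(section):
    """Decrypt a section laid out as: key [0:0x20], length [0x24:0x28], ciphertext from 0x28."""
    if len(section) < 0x28:
        raise ValueError("section too small to hold a configuration")
    key = section[:0x20]
    (enc_len,) = struct.unpack_from("<I", section, 0x24)
    plaintext = rc4(key, section[0x28:0x28 + enc_len])
    # The decrypted JSON is null-terminated; strip the terminator before parsing.
    return json.loads(plaintext.rstrip(b"\x00"))
```

Because RC4 is symmetric, the same rc4 function can be used to build a synthetic section for testing, without touching a real sample.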

In the end, we have a dear old CSV file enriched with a bunch of information: Country, City, Latitude, Longitude, ActorID, CampaignID, Hash, Timestamp. We're ready to map it.

Understanding the data

Our data is stored in a data.csv file.

The aid field (ActorID) changed over the months from an integer number, like ActorID: 39, to a hash representation. For now, we have only 174 samples from which we managed to extract the configuration. We can now group the data by the aid field and count the submissions.

From what I see, the samples related to the ThreatActor with ID 39 have nine submissions from the city of Ashburn, US. I still have to understand why this city has so many submissions related to Sodinokibi. I hope that someone who reads this post can help me understand and shed some light.
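The grouping step can be done with the standard library alone. This is a minimal sketch, assuming the CSV has a header row containing the aid column mentioned above; the function name is hypothetical and the notebook may well do this differently.

```python
import csv
from collections import Counter


def count_by_actor(csv_file, field="aid"):
    """Count rows per actor ID in an open CSV file that has a header row."""
    return Counter(row[field] for row in csv.DictReader(csv_file))


# Usage sketch:
# with open("data.csv", newline="") as f:
#     print(count_by_actor(f).most_common(5))
```

Counter.most_common() then gives the actors sorted by number of submissions.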

If we map the ThreatActorID vs the City of the submission, we can easily see the data.

The next step will be to acquire as many samples as I can. The best choice is to use the VirusTotal API to retrieve the samples, and this is what I'm going to do. Hopefully I won't burn my entire Company API limit.

All the scripts used in this post, the data and the Jupyter notebook used to map the data are available here.