Avast Threat Labs

Binary Reuse of VB6 P-Code Functions

19 May 2021 at 10:12

Reusing binary code from malware is one of my favorite topics. Binary re-engineering and being able to bend compiled code to your will is really just an amazing skill. There is also something poetic about taking malware decryption routines and making them serve you.

Over the years this topic has come up again and again. Previous articles have included emit based rips [1], exe to dll conversion [2], emulator based approaches [3], and even converting malware into an IPC based decoder service [4].

The above are all native code manipulations which makes them something you can work with directly. Easy to disassemble, easy to debug, easy to patch. (Easy being a relative term of course :))

Lately I have been working on VB6 P-Code and developing a P-Code debugger. One goal I had was to find a way to call a P-Code function, ripped from malware, with my own arguments. It is very powerful to be able to harness existing code without having to recreate it (including all of its nuances).

Is this even possible with P-Code? As it turns out, it is possible, and I am going to show you how.

The distilled knowledge below is a small slice of what was unraveled during an eight-month research project into the VB6 runtime and P-Code instruction set.

This paper includes 11 code samples which showcase a wide variety of scenarios utilizing this technique [5].

Note on offsets
In several places throughout this paper there may be VB runtime offsets presented. All offsets are to a reference copy with md5: EEBEB73979D0AD3C74B248EBF1B6E770 [6]. Microsoft was kind enough to publish debug symbols for this build including those for the P-Code engine handlers.

Barriers to entry
The VB6 runtime was designed to load executables, dlls, and ocx controls in an undocumented format. This format contains many complex interlinked structures that lay out embedded forms, class structures, dependencies etc. During startup the runtime itself also requires certain initialization steps to occur as it prepares itself for use.

If we wish to execute P-Code buffers out of the context of a VB6 host executable there are several hurdles we must overcome:

VB Runtime Initialization 

Standard runtime initialization for executables takes place through the ThunRTMain export. This is the primary entry point for loading a VB6 executable. This function takes 1 argument that is the address of the top level VB Header structure. This structure contains the full complex hierarchy of everything else within. 

While we can utilize this path for our needs, there are easier ways to go about it. Starting from ThunRTMain can also create some problems on process termination so we will avoid it. 

In 2003 when exploring VB6’s ability to generate standard dlls I found a second path to runtime initialization through the CreateIExprSrvObj export.

This export is simple to call and automatically performs the majority of runtime initialization. Some TLS structure fields however are left out. In testing, most things operate fine. The only errors discovered occur when trying to use native VB file commands, MsgBox or the built in App object.

With a little extra leg work it has been found that the TLS structures can be manually completed to regain access to most of this native functionality. 

Finally, if the P-Code buffer creates COM objects, a manual call to CoInitialize must also be performed.

Replicating basic object structures

Once CreateIExprSrvObj has been executed, we can call into P-Code streams as many times as we want from our loader code. Structure initialization is minimal and only requires the following fields: 

If the P-Code routines utilize global variables then the codeObj.aModulePublic field will also have to be set to a writable block of memory. This has been demonstrated in the globalVar and complex_globals examples. We can even pre-initialize these variables here if we desire. 

In addition to filling out these primary structures, we also have to recreate the constant pool as expected by the specific P-Code. Finally we must also update a structure field in the P-Code to point to our current object Info structure. 

While this may sound complex, there is a generator utility which automatically does all of the work for you in the majority of cases. A more detailed explanation of the following code will be presented in later sections. 

Finding an entrypoint to transition into P-Code execution 

Execution of the VB6 P-Code occurs by calling the ProcCallEngine export of the VB runtime. The stub below is the same mechanism used internally by VB compiled applications to transfer execution between sub functions.

The offset_sub_main argument moved into EDX is the address of the target P-Code function’s trailing structure, which defines attributes of the function. We will discuss this structure in the following sections.

The asm stub above shows the default scenario of calling a P-Code function with no arguments. A video showing this running in a debugger is available [7].

In the decrypt_test example we explore how to call a ripped function with a complex prototype and a Variant return value. This example demonstrates reusing an extracted P-Code decoder from a malware executable. Here we can call the extracted P-Code function passing it our own data:

Understanding P-Code function layout 

P-Code functions in compiled executables are linked by a structure that trails the actual byte code. This structure is called RTMI in the VB runtime symbols, and the reversing community has come to call it ProcDscInfo. A partial excerpt of this structure is shown below:

When we rip a P-Code function from a compiled binary, we must also extract the configured RTMI structure. ProcCallEngine requires this information in order to run a P-Code routine successfully. 

When we relocate the P-Code block outside of the target binary, we must also update the link to our new object Info table.

This is what is being set in the generated code: 

Here the rc4 buffer contains the entire ripped function, starting with the P-Code and then followed by the RTMI structure which starts at offset 0x3e4. We then patch in the address of our manually filled out object Info into the RTMI.pObjTable field. Once this is complete, the P-Code is ready for execution.
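The patch step described above can be sketched in portable C. This is only a minimal illustration and not the VBDec generated code: the helper name patch_obj_table is hypothetical, the position of pObjTable inside RTMI is passed in as a parameter because its offset is an assumption that should be verified against the structure, and the 32-bit address is handled as a plain integer so the sketch also builds outside of a 32-bit Windows loader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write a 32-bit address into a ripped P-Code blob.
 * rtmi_off  - offset of the trailing RTMI structure (0x3e4 in the rc4 sample)
 * field_off - offset of pObjTable inside RTMI (an assumption, verify it)
 * Returns 0 on success, -1 if the write would fall outside the buffer. */
int patch_obj_table(uint8_t *blob, size_t blob_len,
                    size_t rtmi_off, size_t field_off,
                    uint32_t obj_info_addr)
{
    assert(blob != NULL);

    if (rtmi_off > blob_len ||
        field_off > blob_len - rtmi_off ||
        blob_len - rtmi_off - field_off < sizeof obj_info_addr)
        return -1;

    /* memcpy avoids unaligned access; the value is stored in host byte
     * order, which is little-endian on x86. */
    memcpy(blob + rtmi_off + field_off, &obj_info_addr, sizeof obj_info_addr);
    return 0;
}
```

In a real 32-bit loader the last argument would be the address of the manually filled out object Info structure cast to a 32-bit integer.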

Code Generation 

When developing a method such as this, we must start with known quantities. For our purposes we are writing our own test code which is done normally in the VB6 Integrated Development Environment. This code is then extracted using a utility which generates the C or VB6 source necessary to execute it independently.

The generator tool we are using in this paper is the free VBDec [8] P-Code debugger. 

While exploring this technique, the sample code has been optimized to follow several conventions for clarity. For this research all code samples were ripped from functions in a single module. This design was chosen so that all sub function access occurs through the ImpAdCall* opcodes which draw directly against function pointers in the const pool. 

Code taken from instanced form or class compilation units would require support to replicate VTable layouts for the *Vcall opcodes. While this can be done I will leave that as future work for now.

Samples are available that make extensive use of callbacks to integrate tightly with the host code. This is useful for integrating debug output through the C host in a simple manner. 

Callbacks are accessed through the standard VB API Declare syntax which is a core part of the language and is well documented. Below are examples of sending both numeric and string debug info from the P-Code to the host. 

Giving VB direct access to the host functions is as simple as setting their address in the corresponding constant pool slot.

Ripping functions with VBDec is simple. Simply right click on the function in the left hand treeview and choose the Rip menu option. VBDec will generate all of the embedding data for you. Multiple functions can be ripped at once by right clicking on the top level module name. 

A corresponding const pool will also be auto-generated along with stubs to update the object Info pointers and asm stubs to call interlinked sub functions.

Once extraction/generation is complete it is left up to the developer to integrate the data into one of the sample frameworks provided.

A spectrum of samples are provided ranging from very simple, to quite complex. Samples include:

Sample          Description
firstTest       simple addition test
globalVar       global variables test
structs         passing structs from C to P-Code
two_funcs       interlink two P-Code functions
ConstPool       test decoding a binary const pool entry
lateBinding     late bind sapi voice example
earlyBinding    early bind sapi voice example
decrypt_test    P-Code decryptor w/ complex prototype
Variant Data    C host returns variant types from callback to P-Code
benchmark       RC4 benchmarking apps in C/P-Code code and straight C

Understanding the Const Pool 

Each compilation unit such as a module, class, form etc gets its own constant pool which is shared for all of the functions in that file. Pool entries are built up on demand as the file is processed by the compiler from top to bottom.

The constant pool can contain several types of entries such as: 

  • string values (BSTRs specifically) 
  • VB method native call stubs 
  • API import native call stubs 
  • COM GUIDs 
  • COM CLSID / IID pairs held in COMDEF structures 
  • CodeObject base offsets (not applicable to our work here) 
  • internal runtime COM objects filled out at startup (not supported) 

VBDec is capable of automatically deciphering these entries and figuring out what they represent. Once the correct type has been determined, it can generate the C or VB source necessary to fill out the const pool in the host code. The constant pool viewer form allows you to manually view these entries.

In testing it has been performing extremely well outputting complete const pools which require little to no modification. 

For callback integration with the host, if you use “dummy” as the dll name, it will automatically be assumed as a host callback. Otherwise it will be translated literally as a LoadLibrary/GetProcAddress call.

Some const pool entries may show up as Unknown. When you click on a specific entry the raw data at that offset will be loaded into the lower textbox. If this data shows all 00 00 00 00’s then this is a reference to an internal VB runtime COM object that would normally be set to a live instance at initialization.

This has been seen when using the App Object. Normally this would be set @6601802F inside the _TipRegAppObject function of the runtime on initialization. These types of entries are not currently supported by this technique (and would not make sense in our context anyway).

Interlinked sub functions are supported. A corresponding native stub will be generated along with an entry in the const pool for it. 

Early binding and late binding to COM objects are also supported. Late binding is done entirely through strings in the const pool. For early binding you will see a COMDEF structure and CLSID / IID data automatically generated.

The following is taken from the early binding sample which loads the Sapi.SpVoice COM object. 

Generation of this code is generally automatic by VBDec, but there may be times when the tool cannot automatically detect which kind of const pool entry is being specified. In these cases you may have to manually explore the const pool and extract the data yourself.

In the above scenario the file data at the const pool address may look similar to the following:

If we visualize this as a COMDEF structure we can see the values 0, 0x401230, 0x401240, 0. Looking at the file offsets for these virtual addresses we find the GUIDs given above. 

String entries are held as BSTRs, which are length prefixed unicode strings. Since we are in complete control of the const pool, and BSTRs can encapsulate binary data, it is possible to include encrypted strings directly in the const pool using SysAllocStringByteLen. The binary_ConstPool* samples demonstrate this technique. You can also dynamically swap out const pool entries to change functionality as the P-Code runs. An example of this is found in the early bind sample.

Note: It is important to use the SysAlloc* string functions to get real BSTRs for const pool entries. As the strings get used by the runtime, it may try to realloc or release them.

Extended TLS Initialization

The VB6 runtime stores several key structures in Thread Local Storage (TLS). Several functions of the runtime require these structures to be initialized. These structures are critical for VB error handling routines and can also come into play for file access functions.

Below is the code for the rtcGetErl export. This function retrieves the user specified error line number associated with the last exception that occurred.

From this snippet of code we can see that the runtime stores the TLS slot value at offset 66110000. Once the actual memory address is retrieved with TlsGetValue, the structure field at 0x98 is returned as the stored last error line number. In this manner we can begin to understand the meaning of the various structure offsets.

Even without a full analysis of the complete 0xA8 byte structure we can compare the values seen between a fully initialized process with those initialized through the CreateIExprSrvObj export.

Once diffed, two main empty slots are observed which normally point to other allocations.

  • field 0x18 – normally set @ 66015B25 in EbSetContextWorkerThread
  • field 0x48 – normally set @ 66018081 in RegAppObjectOfProject

Field 0x48 is used for access to the internal VB App COM object. This object does not make sense to use in our scenario and does not trigger any exceptions if left blank. If we had to replicate the COM object for compatibility with existing code, we could however insert a dummy object.

The allocation at offset 0x18 is only required if we wish to use built in VB file operation commands or the MsgBox function.

Since ripped code may demand this for compatibility, it was interesting to see whether a manual allocation would allow the runtime to operate properly.

The following code was created to dynamically lookup the TLS slot value, retrieve the tlsEbthread memory offset and then manually link in a new allocation to the missing 0x18 field.

Once the above code was integrated, full access was restored to the native VB file access functions. Again, this extended initialization is not always required.

Debugging integrations

When testing this technique it is best to start with your own code that you control. This way you can get familiar with it and develop a feel for working with (and recognizing) the different function prototypes.

The first step is to write and debug your VB6 code as normal in the VB6 IDE. In preparation for running as a byte buffer, you can then pepper the VB code with progress callbacks to API Declare routines, which normally access C dll exports. You don’t actually have to write the dll, but you can. The calls are identical when hosted internally from a native C loader (or even a VB hosted AddressOf callback routine).

If you are calling into a P-Code function with a specific prototype, this is the trickiest part of the integration. Samples are available which pass in int, structures, references, Variants, bools and byte arrays. You will have to be very aware if arguments are being passed in ByVal, or the default ByRef (pointers).

Also pay attention to the function return types. If no argument/return type is defined, it defaults to a COM Variant. VB functions receive variant return values by pushing an extra empty one onto the stack before calling the function. Simple numeric return values are passed back in EAX as normal.

When interacting with callbacks make sure the callbacks are defined as __stdcall. All of the standard VB6 <–> C development knowledge applies. You can cut your teeth on these rules by working with standard C dlls and debugging in Visual Studio from the dll side while launching a VB6 exe host.

When in doubt you can create simple tests to debug just the function prototypes. For the complex prototype decryptor sample given above, I had the VB6 sub main() code call the rc4 function with expected parameters to test it in its natural environment. I could then debug the VB6 executable to watch the exact stack parameters passed to develop more insight into how to replicate it manually from my C loader.

This can be done with a native debugger by setting a breakpoint @6610664E on the ImpAdCallFPR4 handler in the VB runtime. Here you could examine the stack before entry into the target P-Code function. VBDec’s P-Code debugger is also convenient for this task.

When debugging it is best to have the reference copy of the VB runtime in the same directory as the target executable so that all of your offsets line up with your runtime disassembly with debug symbols. If you use IDA as your debugger, start with the disassembly of the VB runtime and set the target executable in the debugger options. Asm focused debuggers such as Olly or x64dbg are highly recommended over Visual Studio which is primarily based around source code debugging. 


When working on malware analysis it is a common task to have to interoperate with various types of custom decoding routines. There are multiple approaches to this. One can sit down and reverse engineer the entire routine, making sure the reimplementation is 100% compatible, or one can explore rip based techniques.

Ripping decoders is a fairly common task in my personal playbook. While researching the internals of the VB runtime it was a natural inquiry for me to see if the same concept could be applied to P-Code functions. 

With some experimentation, and a suitable generator, this technique has proven stable and relatively easy to implement. These experiments have also deepened my insights into how the various structures are used by the runtime and my appreciation for how tightly VB6 can integrate with C code. 

Hopefully this information will give you a new arrow to add to your quiver, or at least have been an interesting ride. 

[1] Emit based rip
[2] Using an exe as a dll
[3] Running byte blobs in scdbg
[4] Malware IPC decoder service
[5] Code samples
[6] VB6 runtime with symbols
[7] VB6 internals video
[8] VBDec P-Code Debugger

The post Binary Reuse of VB6 P-Code Functions appeared first on Avast Threat Labs.

Writing a VB6 P-Code Debugger

12 May 2021 at 12:46


In this article we are going to discuss how to write a debugger for VB6 P-Code. This is something I have wanted to do ever since I first saw the WKTVBDE P-Code Debugger written by Mr Silver and Mr Snow back in the early 2000s.

There was something kind of magical about that debugger when I first saw it. It was early in my career, I loved programming in VB6, and reversing it was a mysterious dark art.

While on sabbatical I finally found the time to sit down and study the topic in depth. I am now sharing what I discovered along the way.

This article will build heavily on the previous paper titled VB P-Code Disassembly [1]. In that paper we detailed how the run time processes P-Code and transfers execution between the different handlers.

It is this execution flow that we will target to gain control with our debugger.

An example of the debugger architecture detailed in this paper can be found in the free vbdec P-Code disassembler and debugger.


When I started researching this topic I wanted to first examine what a process running within the WKTVBDE P-Code debugger looked like.

A test P-Code executable was placed alongside a copy of the VB runtime with debug symbols[2]. The executable was launched under WKTVBDE and then a native debugger was attached.

Examining the P-Code function pointer tables at 0x66106D14 revealed that all the pointers had been patched to a single function inside WKTVBDE.dll.

This gives us our first hint at how they implemented their debugger. It is also worth noting at this point that the WKTVBDE debugger runs entirely within the process being debugged, GUI and all!

To start the debugger, you run loader.exe and specify your target executable. It will then start the process and inject the WKTVBDE.dll within it. Once loaded WKTVBDE.dll will hook the entire base P-Code handler table with its own function giving it first access to whatever P-Code is about to execute.

The debugger also contains:

  • a P-Code disassembler
  • ability to parse all of the nested VB internal structures
  • ability to list all code objects and control events (like on timer or button click)

This is in addition to the normal debugger UI actions such as data dumping, breakpoint management, stack display etc.

This is A LOT of complex code to run as an injection dll. Debugging all of this would have been quite a lot of work for sure.

With a basic idea of how the debugger operated, I began searching the web to find any other information I could. I was happy to find an old article by Mr Silver on Woodmann that I have mirrored for posterity [3].

In this article Mr Silver lays out the history of their efforts in writing a P-Code debugger and gives a template of the hook function they used. This was a very interesting read and gave me a good place to start.

Design Considerations:

Looking forward there were some design considerations I wanted to change in this architecture.

The first change would be that I would want to move all of the structure parsing, disassembler engine, and user interface code into its own stand alone process. These are complicated tasks and would be very hard to debug as a DLL injection.

To accomplish this task we need an easy to use, stable inter-process communication (IPC) technique that is inherently synchronous. My favorite technique in this category is using Windows Messages which automatically cause the external process to wait until the window procedure has completed before it returns.

I have used this technique extensively to stall malware after it unpacks itself [4]. I have even wired it up to a Javascript engine that interfaces with a remote instance of IDA [5].

This design will give us the opportunity to freely write and debug the file format parsing, disassembly engine, and user interface code completely independent of the debugger core.

At this point debugger integration essentially becomes an add on capability of the disassembler. The injection dll now only has to intercept execution and communicate with the main interface.

For the remainder of this paper we will assume that a fully operational disassembler has already been created and only focus on the debugger specific details.

For discussions on how to implement a disassembler and a reference implementation on structure parsing please refer to the previous paper [1].


With sufficient information now in hand it was time to start experimenting with gaining control over the execution flow.

Our first task is figuring out how to hook the P-Code function pointer table. Before we can hook it, we actually need to be able to find it first! This can be accomplished in several ways. From the WKTVBDE authors’ paper it sounds like they progressed in three main stages. First, they started with a manually patched copy of the VB run time and the modified dll referenced in the import table.

Second, they progressed to a single supported copy of the run time with hard coded offsets to patch, with a loader now injecting the debugger dll into the target process. Finally, they added the ability to dynamically locate and patch the table regardless of run time version.

This is a good experimental progression which they detail in depth. The second stage is readily accessible to anyone who can understand this paper and will work sufficiently well. I will leave the details of injection and hooking as an exercise to the reader.

The basic steps are:

  • set the memory writable
  • copy the original function pointer table
  • replace original handlers with your own hook procedures
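The last two steps can be modeled in portable C as shown below. This is a simplified sketch and not the hook dll itself: the table here is an ordinary array rather than the live runtime table, all names are hypothetical, and the first step (making the memory writable, which on Windows would be a VirtualProtect call) is left out so the example stays platform neutral. The demo handlers exist only so the sketch can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SLOTS 256

typedef void (*handler_fn)(void);

/* Saved copy of the original handlers so a hook can chain to them later. */
static handler_fn g_original[TABLE_SLOTS];

/* Stand-in handlers used only to exercise the sketch. */
int g_hook_hits = 0;
void demo_original(void) { }
void demo_hook(void) { g_hook_hits++; }

/* Copy the whole table, then point slots first..last at the hook.
 * Slots outside that range (for example the lead byte handlers) are
 * left untouched so they can receive their own procedures. */
void install_hooks(handler_fn *table, size_t first, size_t last,
                   handler_fn hook)
{
    assert(table != NULL && hook != NULL);
    assert(first <= last && last < TABLE_SLOTS);

    memcpy(g_original, table, sizeof g_original);
    for (size_t i = first; i <= last; i++)
        table[i] = hook;
}

/* Look up the handler that was in place before hooking. */
handler_fn original_handler(size_t opcode)
{
    assert(opcode < TABLE_SLOTS);
    return g_original[opcode];
}
```

Keeping the saved copy in one place means the hook procedure can always forward to the real handler by indexing with the opcode it intercepted.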

The published sample also made use of self modifying code, which we will seek to avoid. To get around this we will introduce individual hook stubs, 1 per table, to record some additional data.

Before we get into the individual hook stubs, we notice they stored some run time/state information in a global structure. We will expand on this with the following:

From the hooking code you will notice that all of the base opcodes in the first table (excluding lead byte handlers) received the same hook. The Lead_X bytes at the end each received their own procedure.

Below shows samples of the hook handlers for the first two tables. The other 4 follow the same pattern:

The hooks for each individual table configure the global VM structure fields for current lead byte and table base. The real meat of the implementation now starts in the universal hook procedure.

In the main PCodeHookProc you will notice that we call out to another function defined as: void NotifyUI().

It is in this function where we do things like check for breakpoints, handle single stepping etc. This function then uses the synchronous IPC to talk to the out of process debugger user interface.

The debugger UI will receive the step notification and then go into a wait loop until the user gives a step/go/stop command. This has the effect of freezing the debuggee until the SendMessage handler returns. You can find a sample implementation of this in the SysAnalyzer ApiLogger source [6].

The reason we call out to another function from PCodeHookProc is because it is written as a naked function in assembler. Once free from this we can now easily implement more complex logic in C.
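As a rough idea of the kind of logic that can live on the C side, the sketch below decides whether the UI needs to be notified for the current P-Code address. It is not the vbdec implementation: the dbg_state structure, its fields, and both function names are hypothetical, and the blocking IPC call itself is deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BREAKPOINTS 32

/* Hypothetical debugger state kept inside the injected dll. */
typedef struct {
    uint32_t breakpoints[MAX_BREAKPOINTS]; /* P-Code virtual addresses */
    size_t   bp_count;
    bool     single_step;                  /* stop on every instruction */
} dbg_state;

/* Returns false when the breakpoint list is full. */
bool add_breakpoint(dbg_state *s, uint32_t va)
{
    assert(s != NULL);
    if (s->bp_count >= MAX_BREAKPOINTS)
        return false;
    s->breakpoints[s->bp_count++] = va;
    return true;
}

/* True when execution should pause at pcode_va and the UI be told. */
bool should_notify(const dbg_state *s, uint32_t pcode_va)
{
    assert(s != NULL);
    if (s->single_step)
        return true;
    for (size_t i = 0; i < s->bp_count; i++) {
        if (s->breakpoints[i] == pcode_va)
            return true;
    }
    return false;
}
```

When should_notify returns true, the hook would make its synchronous call to the UI and simply not return to the handler until the user issues a command.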

Further steps:

Once all of the hooks are implemented you still need a way to exercise control over the debuggee. When the code is being remotely frozen, the remote GUI is actually still free to send the frozen process new commands over a separate IPC back channel.

In this manner you can manage breakpoints, change step modes, and implement lookup services through runtime exports such as rtcTypeName.

The hook dll can also patch in custom opcodes. The code below adds our own one byte NOP instruction at unused slot 0x01.

As hinted at in the comments, features such as live patching of the current opcode, and “Set New Origin Here” type features are both possible. These are implemented by the debugger doing direct WriteProcessMemory calls to the global VM struct. The address of this structure was disclosed in initialization messages at startup.


Writing a P-Code debugger is a very interesting concept. It is something that I personally wanted to do for the better part of 20 years.

Once you see all the moving parts up close it is not quite as daunting as it may seem at first glance.

Having a working P-Code debugger is also a foundational step to learning how the P-Code instruction set really works. Being able to watch VB6 P-code run live with integrated stack diffing and data viewer tools is very instructive. Single stepping at this level of granularity gives you a much clearer, higher level overview of what is going on.

While the hook code itself is technically challenging, there are substantial tasks required up front just to get you into the game.

Prerequisites for this include:

  • accurate parsing of an undocumented file format
  • a solid disassembly engine for an undocumented P-Code instruction set
  • user interface that allows for easy data display and debugger control

For a reverse engineer, a project such as this is like candy. There are so many aspects to analyze and work on. So many undocumented things to explore. A puzzle with a thousand pieces.

What capabilities can be squeezed out of it? How much more is there to discover?

For me it is a pretty fascinating journey that also brings me closer to the language that I love. Hopefully these articles will inspire others and enable them to explore as well.

[1] – VB P-Code Disassembly
[2] – VB6 runtime with symbols (MD5: EEBEB73979D0AD3C74B248EBF1B6E770)
[3] – VB P-code Information by Mr Silver
[4] – ApiLogger – Breaking into Malware
[5] – IDA JScript
[6] – SysAnalyzer ApiLogger – freeze remote process

The post Writing a VB6 P-Code Debugger appeared first on Avast Threat Labs.

VB6 P-Code Disassembly

5 May 2021 at 05:48

In this article we are going to discuss the inner depths of VB6 P-Code disassembly and the VB6 runtime.

As a malware analyst, VB6 in general, and P-Code in particular, has always been a problem area. It is not well documented and the publicly available tooling did not give me the clarity I really desired.

In several places throughout this paper there may be VB runtime offsets presented. All offsets are to a reference copy with md5: EEBEB73979D0AD3C74B248EBF1B6E770 [1]. Microsoft has been kind enough to provide debug symbols with this version for the .ENGINE P-Code handlers.

To really delve into this topic we are going to have to cover several areas.

The general layout will cover:

  • how the runtime executes a P-Code stream
  • how P-Code handlers are written
  • primer on the P-Code instruction set
  • instruction groupings
  • internal runtime conventions
  • how to debug handlers

Native Opcode Handlers & Code Flow

Let’s start with how a runtime handler interprets the P-Code stream.

In future articles we will detail how the transition is made from native code to P-Code. For our purposes here, we will look at individual opcode handlers once P-Code interpretation has already begun.

For our first example, consider the following P-Code disassembly:

Here we can see two byte codes at virtual address 0x401932. These have been decoded to the instruction LitI2_Byte 255. 0xF4 is the opcode byte. 0xFF is the hardcoded argument passed in the byte stream. 

The opcode handler for this instruction is the following:

While in a handler, the ESI register will always start as the virtual address of the next byte to interpret. In the case above, it would be 0x401933 since the 0xF4 byte has already been processed to get us into this handler.

The first instruction at 0x66105CAB will load a single byte from the P-Code byte stream into the EAX register. This value is then pushed onto the stack. This is the functional operation of this opcode.

EAX is then cleared and the next value from the byte stream is loaded into the lower part of EAX (AL). This will be the opcode byte that takes us to the next native handler.

The byte stream pointer is then incremented by two. This will set ESI past the one byte argument, and past the next opcode which has already been consumed.

Finally, the jmp instruction will transfer execution to the next handler by using the opcode as an array index into a function pointer table.

Now that last sentence is a bit of a mouthful, so let’s include an example. Below are the first few entries from the _tblByteDisp table. This table is an array of 4 byte function pointers.

Each opcode is an index into this table. The *4 in the jump statement is because each function pointer is 4 bytes (32 bit code).

The only way we know the names of each of these P-Code instructions is because Microsoft included the handler names in the debug symbols for a precious few versions of the runtime. 

The snippet above also reveals several characteristics of the opcode layout to be aware of. First, note that there are invalid slots such as opcode 0x01-InvalidExCode. The reason for this is unknown, but it also means we can have some fun with the runtime such as introducing our own opcodes [5].

The second thing to notice is that multiple opcodes can point to the same handlers such as the case with lblEX_Bos. Here we see that opcode 0 leads to the same place as opcode 2. There are actually 5 opcode sequences which point to the BoS (Beginning of Statement) handler.

The next thing to notice is that the opcode names are abbreviated and will require some deciphering to learn how to read them. 

Finally from the LitI2_Byte handler we already analyzed, we can recognize that all of the stubs were hand written in assembler. 

From here, the next question is how many handlers are there? If each opcode is a single byte, there can only be a maximum of 256 handlers right? That would make sense, but is incorrect.

If we look at the last 5 entries in the _tblByteDisp table we find this:

The handler for each of these looks similar to the following:

Here we see EAX zeroed out, the next opcode byte loaded into AL and the byte code pointer (ESI) incremented. Finally it uses that new opcode to jump into an entirely different function pointer table.

This would give us a maximum opcode count of (6*256)-5 or 1531 opcodes.

Now luckily, not all of these opcodes are defined. Remember some slots are invalid, and some are duplicate entries. If we go through and eliminate the noise, we are left with around 822 unique handlers. Still nothing to sneeze at.

So what the above tells us is that not all instructions can be represented as a single opcode. Many instructions will be prefixed with a lead byte that then makes the actual opcode reference a different function pointer table.

Here is a clip from the second tblDispatch pointer table:

To reach lblEX_ImpUI1 we would need to encode 0xFB as the lead byte and 0x01 as the opcode byte.

This would first send execution into the _lblBEX_Lead0 handler, which then loads the 0x01 opcode and uses the tblDispatch table to execute lblEX_ImpUI1.
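The lead byte scheme can be sketched in a few lines of Python. This assumes the five lead bytes are the contiguous range 0xFB through 0xFF (the text confirms 0xFB and that they are the last 5 table entries); the function name and the table numbering are mine.

```python
LEAD_FIRST = 0xFB  # 0xFB is named in the text; 0xFB..0xFF is assumed contiguous

def fetch_opcode(code, pos):
    """Return (table, opcode, next_pos). Table 0 is the primary table."""
    first = code[pos]
    if first >= LEAD_FIRST:
        # Lead byte: the following byte indexes one of the secondary tables.
        table = first - LEAD_FIRST + 1
        return table, code[pos + 1], pos + 2
    return 0, first, pos + 1

# Upper bound on opcodes: six tables of 256 slots, minus the five lead slots.
max_opcodes = 6 * 256 - 5
```

With this model, the bytes FB 01 decode to table 1, opcode 0x01, which is the lblEX_ImpUI1 case described above.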

A little bit confusing, but once you see it in action it becomes quite clear. You can watch it run live for yourself by loading a P-Code executable into a native debugger and setting a breakpoint on the lead* handlers.

Byte stream argument length  

Before we can disassemble a byte stream, we also need to know how many byte code arguments each and every instruction takes. With 822 instructions this can be a big job! Luckily other reversers have already done much of the work for us. The first place I saw this table published was from Mr Silver and Mr Snow in the WKTVBDE help file.

A codified version of this can be found in the Semi-VbDecompiler source [2] which I have used as a reference implementation. The opcode sizes are largely correct in this table, however some errors are still present. As with any reversing task, refinement is a process of trial and error. 

Some instructions, 18 known to date, have variable length byte stream arguments. The actual size of the byte stream to consume before the next opcode is embedded as the two bytes after the opcode. An example of this is the FFreeVar instruction.

In this example we see the first two bytes decode as 0x0008 (little endian format), which here represents 4 stack variables to free.
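A rough sketch of how a disassembler might consume such an instruction follows. The stream contents are invented for illustration, and treating each entry as a 2 byte value is my inference from 8 bytes covering 4 variables.

```python
import struct

def read_variable_args(code, pos):
    """pos points at the first byte after the opcode.
    Returns (argument bytes, position of the next opcode)."""
    (size,) = struct.unpack_from("<H", code, pos)  # little endian size word
    start = pos + 2
    return code[start:start + size], start + size

# Illustrative stream: size word 0x0008 followed by four 2 byte entries.
stream = bytes.fromhex("0800" "68ff" "58ff" "48ff" "38ff")
args, next_pos = read_variable_args(stream, 0)

# Assumption: each entry is a signed 2 byte stack offset.
entries = [struct.unpack_from("<h", args, i)[0] for i in range(0, len(args), 2)]
```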

Opcode Naming Conventions

Before we continue on to opcode arguments, I will give a brief word on naming conventions and opcode groupings.

In the opcode names you will often see a combination of the following abbreviations. The below is my current interpretation of the less intuitive specifiers:

Opcode abbreviation Description
Imp Import
Ad Address
St / Ld Store / Load
I2 Integer/Boolean
I4 Long
UI1 Byte
Lit Literal(ie “Hi”,2,8 )
Cy Currency
R4 Single
R8 Double
Str String
Fn Calls a VBA export function
FPR Floating point register
PR Uses ebp-4C as a general register
Var Variant
Rf Reference
VCall VTable call
LateID Late bound COM object call by method ID
LateNamed Late bound COM Object call by method name

Specifiers are often combined to denote meaning and opcodes often come in groups such as the following:

An opcode search interface such as this is very handy while learning the VB6 instruction set.

Opcode Groups

The following shows an example grouping:

Opcode abbreviation Description
ForUI1 Start For loop with byte as counter type
ForI2 With integer counter, default step = 1
ForI4 Long type as counter
ForStepUI1 For loop with byte counter, user specified step
ForEachCollVar For each loop over collection using variant
ForEachAryVar For each loop over array using variant
ForEachCollObj For each loop over collection using object type

A two part series on the intricacies of how For loops were implemented is available [3] for the curious.

As you can see, the opcode set can collapse down fairly well once you take into account the various groupings. While I have grouped the instructions in the source, I do not have an exact number as the lines between them can still be a bit fuzzy. It is probably around 100 distinct operations once grouped.

Now onto the task of argument decoding. I am not sure why, but most P-Code tools only show you the lead byte, opcode byte, and mnemonic. Resolved arguments are only displayed if the opcode is fully handled.

Everything except Semi-VBDecompiler [6] skips the display of the argument bytes.

The problem arises from the fact that no tool yet decodes all of the arguments correctly for all of the opcodes. If you do not see the argument byte stream, there is no indication other than a subtle jump in virtual address that anything has been hidden from you.

Consider the following displays:

The first version shows you opcode and mnemonic only. You don’t even realize anything is missing. The second version gives you a bigger hint and at least shows you no argument resolution is occurring. The third version decodes the byte stream arguments, and resolves the function call to a usable name.

Obviously the third version is the gold standard we should expect from a disassembler. The second version can be acceptable and shows something is missing. The first version leaves you clueless. If you are not already intimately familiar with the instruction set, you will never know you are missing anything.

Common opcode argument types

In the Semi-VbDecompiler source many opcodes are handled with custom printf type specifiers [4]. Common specifiers include:

Format specifier Description
%a Local argument
%l Jump location
%c Proc / global var address stored in constant pool
%e Pool index as P-Code proc to call
%x Pool index to locate external API call
%s Pool index of string address
%1/2/4 Literal byte, int, or long value
%t Code object from its base offset
%v VTable call
%} End of procedure

Many opcodes only take one or more simple arguments, %a and %s being the most common.

Consider "LitVarStr %a %s" which loads a variant with a literal BSTR string, and then pushes that address to the top of the stack:

The %a decoder will read the first two bytes from the stream and decode it as follows:

Interpreting 0xFF68 as a signed 2 byte number is -0x98. Since it is a negative value, it is a local function variable at ebp-0x98. Positive values denote function arguments. 
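The same %a interpretation can be sketched in Python. The function name is hypothetical, and printing positive offsets as ebp+N is my assumption about how arguments would be displayed.

```python
import struct

def decode_a(code, pos):
    """Decode a %a argument: a signed little endian 16 bit stack offset."""
    (offset,) = struct.unpack_from("<h", code, pos)
    kind = "local" if offset < 0 else "argument"
    sign = "-" if offset < 0 else "+"
    return kind, f"ebp{sign}0x{abs(offset):X}"

local_example = decode_a(bytes.fromhex("68ff"), 0)  # bytes as stored in the stream
```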

Next the %s handler will read the next two bytes which it interprets as a pool index. The value at pool index 0 is the constant 0x40122C. This address contains an embedded BSTR where the address points to the unicode part of the string, and the preceding 4 bytes denoting its length.
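As a small illustration of the BSTR layout, the sketch below reads a string given the address of its character data. The `image` buffer, its base address, and the string content are all made up; only the layout (4 byte length prefix, in bytes, directly before the UTF-16 data) is the standard BSTR convention.

```python
import struct

def read_bstr(image, base, addr):
    """Read a BSTR whose character data lives at virtual address addr."""
    off = addr - base
    (nbytes,) = struct.unpack_from("<I", image, off - 4)  # byte length prefix
    return image[off:off + nbytes].decode("utf-16-le")

# Fake memory image: length prefix at 0x401228, characters at 0x40122C.
BASE = 0x401228
text = "P-Code".encode("utf-16-le")
image = struct.pack("<I", len(text)) + text + b"\x00\x00"

value = read_bstr(image, BASE, 0x40122C)
```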

A closer look at run time data for this instruction is included in the debugging section later on.

Another common specifier is the %l handler used for jump calculations. It  can be seen in the following examples:

In the first unconditional jump the byte stream argument is 0x002C. Jump locations are all referenced from the function start address, not the current instruction address as may be expected. 

0x4014E4 + 0x2C = 0x401510 
0x4014E4 + 0x3A = 0x40151E

Since all jumps are calculated from the beginning of a function, the offsets in the byte stream must be interpreted as unsigned values. Jumps to addresses before the function start are not possible and represent a disassembly error. 
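The %l calculation is simple enough to express directly. This sketch (function name is mine) reproduces the two targets computed above.

```python
import struct

def jump_target(func_start, code, pos):
    """Resolve a %l argument: unsigned 16 bit offset from the function start."""
    (offset,) = struct.unpack_from("<H", code, pos)
    return func_start + offset

first = jump_target(0x4014E4, bytes.fromhex("2c00"), 0)
second = jump_target(0x4014E4, bytes.fromhex("3a00"), 0)
```

Because the offset is read as unsigned, a value such as 0xFFF0 resolves far past the function start rather than before it, which a disassembler can flag as out of range.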

Next let's consider the %x handler as we revisit the "ImpAdCallFPR4 %x" instruction:

The native handler for this is:

Looking at the P-Code disassembly we can see the byte stream of 24001000 is actually two integer values. The first 0x0024 is a constant pool index, and the second 0x0010 is the expected stack adjustment to verify after the call. 
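Splitting that byte stream into its two fields looks like this in Python. The function and field names are mine.

```python
import struct

def decode_x(code, pos):
    """Split a %x argument into (constant pool index, expected stack adjustment)."""
    pool_index, stack_adjust = struct.unpack_from("<HH", code, pos)
    return pool_index, stack_adjust

pool_index, stack_adjust = decode_x(bytes.fromhex("24001000"), 0)
```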

Now we haven’t yet talked about the constant pool or the housekeeping area of the stack that VB6 reserves for state storage. For an abbreviated description, at runtime VB uses the area between ebp and ebp-94h as a kind of scratch pad. The meaning of all of these fields is not yet fully known, however several of the key entries are as follows:

Stack position Description
ebp-58 Current function start address
ebp-54 Constant pool
ebp-50 Current function raw address (RTMI structure)
ebp-4C PR (Pointer Register) used for Object references

In the above disassembly we can see entry 0x24 from the constant pool would be loaded.

A constant pool viewer is a very instructive tool to help decipher these argument byte codes.

It has been found that smart decoding routines can reliably decipher constant pool data independent of analysis of the actual disassembly.

One such implementation is shown below:

If we look at entry 0x0024 we see it holds the value 0x4011CE. If we look at this  address in IDA we find the following native disassembly:

0x40110C is the IAT address of msvbvm60.rtcImmediateIf import. This opcode is how VB runtime imports are called. 
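The chain from pool index to import can be sketched as follows. Two things here are assumptions on my part: that pool entries are 4 byte values indexed directly, and that the native stub is an indirect `jmp dword ptr [addr]` (encoded FF 25 on 32 bit x86). The pool and stub bytes are rebuilt by hand from the values quoted above.

```python
import struct

def pool_entry(pool, index):
    """Fetch a constant pool slot, assuming 4 byte little endian entries."""
    return struct.unpack_from("<I", pool, index * 4)[0]

def stub_iat_slot(stub):
    """Return the IAT slot of an FF 25 indirect jmp stub, else None."""
    if stub[:2] != b"\xff\x25":
        return None
    return struct.unpack_from("<I", stub, 2)[0]

# Hand built pool holding the two entries mentioned in the text.
pool = bytearray(0x25 * 4)
struct.pack_into("<I", pool, 0, 0x40122C)
struct.pack_into("<I", pool, 0x24 * 4, 0x4011CE)

# Assumed stub bytes at 0x4011CE: jmp dword ptr [0x40110C]
stub = bytes.fromhex("ff250c114000")

stub_address = pool_entry(pool, 0x24)
iat_slot = stub_iat_slot(stub)
```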

While beyond the scope of this paper, it is of interest to note that VB6 embeds a series of small native stubs in P-Code executables to splice together the native and P-Code executions. This is done for API calls, call backs, inter-modular calls etc. 

The Constant Pool

The constant pool itself is worth a bit of discussion. Each compilation unit such as a module, class, form etc gets its own constant pool which is shared for all of the functions in that file. 

Pool entries are built up on demand as the file is processed by the compiler from top to bottom. 

The constant pool can contain several types of entries such as: 

  • string values (BSTRs specifically) 
  • VB method native call stubs 
  • API import native call stubs 
  • COM GUIDs 
  • COM CLSID / IID pairs held in COMDEF structures 
  • CodeObject base offsets
  • blank slots which represent internal COM objects filled out at startup by the runtime (ex: App.)

More advanced opcode processors

More complex argument resolutions require a series of opcode post processors.  In the disassembly engine I am working on there are currently 13 post processors which handle around 30 more involved opcodes.

Things start to get much more complex when we deal with COM object calls. Here we have to resolve the COM class ID, interface ID, and discern its complete VTable layout to determine which method is going to be called. This requires access to the COM object’s type library if it’s an external type, and the ability to recreate its function prototype from that information.

For internal types such as user classes, forms and user controls, we also need to understand their VTable layout. For internal types however we do not receive the aid of tlb files. Public methods will have their names embedded in the VB file format structures which can be of help.

Resolution of these types of calls is beyond the scope of what we can cover in an introductory paper, but it is absolutely critical to get right if you are writing a disassembler that people are going to rely upon for business needs.

More on opcode handler inputs

Back to opcode arguments. It is also important to understand that opcodes can take dynamic runtime stack arguments in addition to the hard coded byte stream arguments. This is not something that a disassembler necessarily needs to understand though. This level of knowledge is mainly required to write P-Code assembly or a P-Code decompiler. 

Some special cases however do require the context of the previous disassembly in order to resolve properly. Consider the following multistep operation:

Here the LateIdLdVar resolver needs to know which object is being accessed. Scanning back and locating the VCallAd instruction is required to find the active object stored in PR.

Debugging handlers

When trying to figure out complex opcode handlers, it is often helpful to watch the code run live in a debugger. There are numerous techniques available here. Watching the handler itself run requires a native debugger. 

Typically you will figure out how to generate the opcode with some VB6 source which you compile. You then put the executable in the same directory as your reference copy of the vb runtime and start debugging. 

Some handlers are best viewed in a native debugger, however many can be figured out just by watching it run through a P-Code debugger. 

A P-Code debugger simplifies operations showing you its execution at a higher level. In one step of the debugger you can watch multiple stack arguments disappear, and the stack diff light up with changes to other portions. Higher level tools also allow you to view complex data types on the stack as well as examine TLS memory and keep annotated offsets. 

In some scenarios you may actually find yourself running both a P-Code debugger and a native debugger on the target process at the same time. 

One important thing to keep in mind is that VB6 makes heavy use of COM types. 

Going back to our LitVarStr example:

You would see the following after it executes:

0019FC28 ebp-120 0x0019FCB0 ; ebp-98 - top of stack 
0019FCB0 ebp-98 0x00000008 
0019FCB4 ebp-94 0x00000000  
0019FCB8 ebp-90 0x0040122C

A data viewer would reveal the following when decoding ebp-98 as a variant:

Variant 19FCB0 
VT: 0x8( Bstr ) 
Res1: 0 
Res2: 0 
Res3: 0 
Data: 40122C 
String len: 9 -> never hit
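For reference, decoding those stack values as a variant can be sketched like this. It relies on the standard 16 byte VARIANT layout (a 2 byte type, three reserved words, then the data union); the small type name table and the function name are mine, and only the low 4 bytes of the data field are read.

```python
import struct

# A few well known VARTYPE values.
VT_NAMES = {0: "Empty", 2: "I2", 3: "I4", 8: "Bstr"}

def decode_variant(raw):
    """Decode the header and first data dword of a 16 byte VARIANT."""
    vt, res1, res2, res3 = struct.unpack_from("<4H", raw, 0)
    (data,) = struct.unpack_from("<I", raw, 8)
    return {"vt": vt, "type": VT_NAMES.get(vt, "?"),
            "res": (res1, res2, res3), "data": data}

# The three dwords from the stack dump above, plus one dword of padding.
raw = struct.pack("<4I", 0x00000008, 0x00000000, 0x0040122C, 0)
variant = decode_variant(raw)
```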

Debugging VB6 apps is a whole other ball of wax. I mention it here only in passing to give you a brief introduction to what may be required when deciphering what opcodes are doing. In particular recognizing Variants and SafeArrays in stack data will serve you well when working with VB6 reversing.


In this paper we have laid the necessary groundwork to understand the basics of a VB6 P-Code disassembly engine. The Semi-VbDecompiler source is a good starting point to understand its inner workings.

We have briefly discussed how to find and read native opcode handlers along with some of the conventions necessary for understanding them. We introduced you to how opcodes flow from one to the next, along with how to determine the number of byte stream arguments each one takes, and how to figure out what they represent. 

There is still much work to be done in terms of documenting the instruction set. I have started a project where I catalog:

  • VB6 source code required to generate an opcode
  • byte stream arguments size and meaning
  • stack arguments consumed
  • function outputs

Unfortunately it is still vastly incomplete. This level of documentation is foundational and quite necessary for writing any P-Code analysis tools.

Still to be discussed, is how to find the actual P-Code function blobs within the VB6 compiled executable. This is actually a very involved task that requires understanding a series of complex and nested file structures. Again the Semi-VbDecompiler source can guide you through this maze.

While VB6 is an old technology, it is still commonly used for malware. This research is aimed at reducing gaps in understanding around it and is also quite interesting from a language design standpoint. 

[1] – VB6 runtime with symbols
[2] – Semi-VbDecompiler opcode table Source
[3] – A closer look at the VB6 For Loop implementation
[4] – Semi-VBDecompiler opcode argument decodings 
[5] – Introducing a one byte NOP opcode
[6] – Semi-VBDecompiler

The post VB6 P-Code Disassembly appeared first on Avast Threat Labs.

VB6 P-Code Obfuscation

28 April 2021 at 09:37

Code obfuscation is one of the cornerstones of malware. The harder code is to analyze the longer attackers can fly below the radar and hide the full capabilities of their creations.

Code obfuscation techniques are very old and take many many forms from source code modifications, opcode manipulations, packer layers, virtual machines and more.

Obfuscations are common amongst native code, script languages, .NET IL, and Java byte code.

As a defender, it’s important to be able to recognize these types of tricks, and have tools that are capable of dealing with them. Understanding the capabilities of the medium is paramount to determine what is junk, what is code, and what may simply be a tool error in data display. 

On the attackers side, in order to develop a code obfuscation there are certain prerequisites required. The attacker needs tooling and documentation that allows them to craft and debug the complex code flow. 

For binary implementations such as native code or IL, this would involve specs of the target file format, documentation on the opcode instruction set, disassemblers, assemblers, and a capable debugger.

One of the code formats that has not seen common obfuscation has been the Visual Basic 6 P-Code byte streams. This is a proprietary opcode set, in a complex file format, with limited tooling available to work with it. 

In the course of exploring this instruction set certain questions arose:

  • Can VB6 P-Code be obfuscated at the byte stream layer?
  • Has this occurred in samples in the wild?
  • What would this look like?
  • Do we have tooling capable of handling it?


Before we continue, we will briefly discuss the VB6 P-Code format and the tools available for working with it.

VB6 P-Code is a proprietary, variable length, binary instruction set that is interpreted by the VB6 Virtual Machine (msvbvm60.dll).

In terms of documentation, Microsoft has never published details of the VB6 file format or opcode instruction set. The opcode handler names were gathered by reversers from the debug symbols leaked with only a handful of runtimes. 

At one time there was a reversing community,  vb-decompiler.theautomaters.com, which was dedicated to the VB6 file format and P-Code instruction set. Mirrors of this message board are still available today [1]. 

On the topic of tooling the main disassemblers are p32Disasm, VB-Decompiler, Semi-Vbdecompiler and the WKTVBDE P-Code debugger.

Of these only Semi-Vbdecompiler shows you the full argument byte stream, the rest display only the opcode byte. While several private P-Code debuggers exist, WKTVBDE is the only public tool with debugging capabilities at the P-Code level. 

Opcode meanings are still largely undocumented at this point. Beyond intuition from their names, you would really have to compile your own programs from source, disassemble them, disassemble the opcode handlers, and debug both the native runtime and P-Code to get a firm grasp of what’s going on.

As you can glimpse, there is a great deal of information required to make sense of P-Code disassembly and it is still a pretty dark art for most reversers. 

Do VB6 obfuscators exist?

While doing research for this series of blog posts we started with an initial sample set of 25,000 P-Code binaries which we analyzed using various metrics. 

Common tricks VB6 malware uses to obfuscate their intent include:

  • junk code insertion at source level
  • inclusion of large bodies of open source code to bulk up binary
  • randomized internal object and method names 
    • most commonly done at pre-compilation stage
    • some tools work post compilation.
  • all manner of encoded strings and data hiding
  • native code blobs launched with various tricks such as CallWindowProc

To date, we have not yet documented P-Code level manipulations in the wild. 

Due to the complexity of the vector, P-Code obfuscations could easily have gone undetected to date, which made it an interesting area to research. Hunting for samples will continue.

Can VB P-Code even be obfuscated and what would that look like?

In the course of research, this was a natural question to arise.  We also wanted to make sure we had tooling which could handle it. 

Consider the following VB6 source:

The default P-Code compilation is as follows:

An obfuscated sample may look like the following:

From the above we see multiple opcode obfuscation tricks commonly seen in native code.

It has been verified that this code runs fine and does not cause any problems with the runtime. This mutated file has been made available on Virustotal in order for vendors to test the capabilities of their tooling [2]. 

To single out some of the tricks:

Jump over junk:

 Jumping into argument bytes:

At runtime what executes is:

Do nothing sequences:

 Invalid sequences which may trigger fatal errors in disassembly tools:


The easiest markers of P-Code obfuscation are:

  • jumps into the middle of other instructions
  • unmatched For/Next opcode counts
  • invalid/undefined opcodes
  • unnatural opcode sequences not produced by the compiler
  • errors in argument resolution from randomized data
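The first marker is straightforward to check once a linear disassembly is available. The sketch below is a minimal illustration that assumes a hypothetical disassembler output of (offset, length) pairs plus a list of (source, target) jumps; it is not tied to any existing tool.

```python
def find_mid_instruction_jumps(instructions, jumps):
    """Flag jumps whose target is not the start of a decoded instruction.

    instructions: list of (offset, length) from a linear disassembly
    jumps: list of (source_offset, target_offset)
    """
    starts = {offset for offset, _length in instructions}
    return [(src, tgt) for src, tgt in jumps if tgt not in starts]

# Three instructions at offsets 0, 3 and 8. Offset 5 lies inside the second one.
instructions = [(0, 3), (3, 5), (8, 3)]
jumps = [(0, 8), (3, 5)]

suspicious = find_mid_instruction_jumps(instructions, jumps)
```

A hit is only a hint. The disassembly has to be correct first, otherwise decoding errors will show up as false positives.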

Some junk sequences such as Not Not can show up normally depending on how a routine was coded.

This level of detection will require a competent, error-free, disassembly engine that is aware of the full structures within the VB6 file format. 


Code obfuscation is a fact of life for malware analysts. The more common and well documented the file format, the more likely that obfuscation tools are widespread in the wild.

This reasoning is likely why complex formats such as .NET and Java had many public obfuscators early on.

This research proves that VB6 P-Code obfuscation is equally possible and gives us the opportunity to make sure our tools are capable of handling it before being required in a time constrained incident response. 

The techniques explored here also grant us the insight to hunt for advanced threats which may already have been using this technique and have flown under the radar for years.

We encourage researchers to examine the mutated sample [2] and make sure that their frameworks can handle it without error.


[1] vb-decompiler.theautomaters.com mirror

[2] Mutated P-Code sample SHA256 and VirusTotal link

The post VB6 P-Code Obfuscation appeared first on Avast Threat Labs.

Binary Data Hiding in VB6 Executables

22 April 2021 at 12:47


This is part one in a series of posts that focus on understanding Visual Basic 6.0 (VB6) code, and the tactics and techniques both malware authors and researchers use around it. 


This document is a running tally covering many of the various ways VB6 malware can embed binary data within an executable. 

There are 4 main categories: 

  • string based encodings 
  • data hidden within the actual opcodes of the program 
  • data hidden within parts of the VB6 file format  
  • data in or around normal PE structures 

Originally I was only going to cover data hidden within the file format itself but for the sake of  documentation I decided it is worth covering them all.  

Data held within the file format is a special case which I find the most interesting. This is because it can be interspersed within a complex set of undocumented structures which would require  advanced knowledge and intricate parsing to detect. In this scenario it would be hard to determine where the data is coming from or to even recognize that these buffers exist.  

Resource Data 

The first technique is the standard built into the language itself, namely loading data from the  resource section. VB6 comes with an add-in that allows users to add a .RES file to the project.  This file gets compiled into the resource section of the executable and allows for binary data to be  easily loaded. 

This is a well known and standard technique. 

Appended Data 

This technique is very old and has been used in all manner of programming languages. It is mentioned here for thoroughness and to link to a public implementation [1] that allows for simplified use.

Hex String Buffers 

It is very common for malware to build up a string of hex characters that are later converted back to binary data. Conversion commonly includes various text manipulations such as decryption or  stripping junk character sequences. Extra character sequences are commonly used to prevent  automatic recognition of the data as a hex string by AV.  

In the context of VB6, there are several limitations. The IDE only allows for a total of 1023  characters to be on a single line. VB’s line continuation syntax of &_ is also limited to only 25  lines. For these reasons you will often see large blocks of data embedded in the following format: 

In a compiled binary each string fragment is held as an individual chunk which is easily  identifiable. A faster variant may hold each element in a string array so conglomeration only  occurs once.  

This is a well known and standard technique. It is commonly found in VBA, VB6 and malware  written in many other languages. Line length limitations can not be bypassed through command  line compilation. 
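On the analysis side, recovering such a buffer from extracted string fragments can be sketched as below. The junk marker "#!" and the sample fragments are invented for illustration; real samples use whatever sequence the author picked.

```python
import binascii

def decode_hex_fragments(fragments, junk=""):
    """Join string fragments, strip a known junk sequence, and unhex the result."""
    text = "".join(fragments)
    if junk:
        text = text.replace(junk, "")
    return binascii.unhexlify(text)

# Hypothetical fragments as they might be pulled from a compiled binary.
fragments = ["4D5A#!9000", "0300#!0000"]
blob = decode_hex_fragments(fragments, junk="#!")
```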

Binary Data Within Images 

There are multiple ways to embed lossless data into image formats. The most common will be to  embed the data directly within the structure of a BITMAP image. Bitmaps can be held directly  within VB6 Image and Picture controls. Data embedded in this manner will be held in the .FRX  form resource file before compilation. Once compiled it will be held in a binary property field for  the target form element. Images created like this can be generated with a special tool, and then  embedded directly into the form using the IDE. 

The following is a public sample[2] of data being extracted from such a bitmap 

Extracted images will display as a series of colored blocks and pixels of various colors. Note that this is not steganography.

Many tools understand how to extract embedded images from binary files. Since the image data  still contains the BITMAP header, parsing of the VB6 file format itself is not necessary. This  technique is public and in common use. The data is often decrypted after it is extracted. 

Chr Strings 

Similar to obfuscations found in C malware, strings can be built up at runtime based on individual  byte values. A common example may look like the following: 

At the asm level, this serves to break up each byte value and puts it inline with a bunch of  opcodes preventing automatic detection or display with strings. For native VB6 code it will look  like the following: 

In P-Code it will look like the following: 

This is a well known and standard technique. It is commonly found in VBA as well as VB6  malware. 
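When source or decompiled output is available, these expressions can be folded back into a string mechanically. The following is a rough sketch that only handles literal decimal and &H hex arguments to Chr, Chr$, ChrW and ChrW$; the sample expression is made up.

```python
import re

# Matches Chr(65), Chr$(65), ChrW(&H41) and similar literal calls.
CHR_CALL = re.compile(r"Chr[W$]*\(\s*(&H[0-9A-Fa-f]+|\d+)\s*\)", re.IGNORECASE)

def fold_chr_expression(expr):
    """Rebuild the string produced by a chain of Chr() calls."""
    chars = []
    for match in CHR_CALL.finditer(expr):
        token = match.group(1)
        value = int(token[2:], 16) if token.upper().startswith("&H") else int(token)
        chars.append(chr(value))
    return "".join(chars)

folded = fold_chr_expression("Chr(115) & Chr(101) & Chr(&H63)")
```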

Numeric Arrays 

Numeric arrays are a fairly standard technique in malware that are used to break up the binary data amongst the program’s opcodes. This is similar to the Chr technique but can hold data in a more compact format. The most common data types used for this technique are 4 byte longs and 8 byte currency types. The main advantage of this technique is that the data can be easily manipulated with math to decrypt it on the fly.
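As a minimal sketch of the decoding side, the snippet below turns an array of 4 byte longs back into bytes. The single XOR key is a stand-in for whatever math a given sample actually applies, and the encoded values are generated in the example rather than taken from a real sample.

```python
import struct

def longs_to_bytes(values, xor_key=0):
    """Rebuild a byte buffer from VB6 Long values (signed 32 bit, little endian)."""
    out = bytearray()
    for value in values:
        # Mask to 32 bits so negative Longs pack correctly.
        out += struct.pack("<I", (value ^ xor_key) & 0xFFFFFFFF)
    return bytes(out)

KEY = 0x11223344  # hypothetical key
encoded = [v ^ KEY for v in struct.unpack("<2i", b"payload!")]
decoded = longs_to_bytes(encoded, KEY)
```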





This technique is not as popular as the others, but does have a long history of use. I think the first place I saw it was in Flash ActionScript exploits. 

Form Properties 

Forms and embedded GUI elements can contain compiled in data as part of their properties. The most common attributes used are Form.Caption, Textbox.Text, and any element’s .Tag property.

Since all of these properties are typically entered via the IDE, they are usually found to contain  ASCII only data that is later decoded to binary. 

Developers can however embed binary data directly into these properties using several  techniques.  

While there is a way to hex edit raw data in the .FRX form resource file, this comes with limitations such as not being able to handle embedded nulls. Another solution is inserting the data post compilation. With this technique a large buffer is reserved consisting of ASCII text that has start and end markers. An embedding tool can then be run on the compiled executable to fill in the buffer with true binary data.
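A post compilation embedding step of this kind could look roughly like the sketch below. The marker strings, the null padding, and the choice to overwrite the markers themselves are all my assumptions; the only firm requirement is that the file size and surrounding structures stay untouched.

```python
def fill_marked_buffer(image, start, end, payload):
    """Overwrite the region from the start marker through the end marker."""
    s = image.find(start)
    e = image.find(end, s + len(start)) if s != -1 else -1
    if s == -1 or e == -1:
        raise ValueError("markers not found")
    room = e + len(end) - s
    if len(payload) > room:
        raise ValueError("payload larger than reserved buffer")
    # Pad with nulls so the overall file size does not change.
    return image[:s] + payload.ljust(room, b"\x00") + image[e + len(end):]

# Hypothetical compiled image with a reserved, marker delimited buffer.
image = b"HEAD" + b"<<START" + b"A" * 16 + b"END>>" + b"TAIL"
patched = fill_marked_buffer(image, b"<<START", b"END>>", b"\x00\x01\x02binary")
```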

Using form element properties to house text based data is a common practice and has been seen  in VBA, VB6, and even PDF scripts. Binary data embedded with a post processing step has been observed in the wild. In both P-Code and Native, access to these properties will be through COM object VTable calls.  

From the Semi-VBDecompiler source, each different control type (including ActiveX) has its own  parser for these compiled in property fields. Results will vary based on tool used if they can display the data. Semi-Vbdecompiler has an option to dump property blobs to disk for manual exploration. This may be required to reveal this type of embedded binary data. 

UserControl Properties 

A special case for the above technique occurs with the built in UserControl type. This control is  used for hosting reusable visual elements and in OCX creation. The control has two events which  are passed a PropertyBag object of its internal binary settings. This binary data can be easily set  in the IDE through property pages. This mechanism can be used to store any kind of binary data  including entire file systems. A public example of this technique is available[3]. Embedded data will be held per instance of the UserControl in its properties on the host form. 

Binary Strings 

Compiled VB6 executables store internal strings with a length prefix. Similar to the form properties trick, these entries can be modified post compilation to contain arbitrary binary data. In order to discern these data blobs from other binary data, in depth understanding and complex  parsing of the VB6 file format would have to occur.  

The longest string that can be embedded with this technique is limited by the line length in the  IDE which is 2042 bytes ((1023 bytes – 2 for quotes) *2 for unicode).

VB6 malware can access these strings normally with no special loading procedure. As far as it’s concerned the source was simply str = “binary data”

The IDE can handle a number of unicode characters which can be embedded in the source for compilation. Full binary data can be embedded using a post processing technique. 

Error Line Numbers 

VB6 allows for developers to embed line numbers that can be accessed in the event of an error to  help determine its location. This error line number information is stored in a separate table outside of the byte code stream.  

The error line number can be accessed through the Erl() function. VB6 is limited to 0xFFFF line  numbers per function, and line number values must be in the 0-0xFFFF range. Since the size of  the embedded data is limited with this technique, short strings such as passwords and web  addresses are the most likely use.

When the code below is run, it will output the message “secret” 

Advanced knowledge of the VB6 file format would be required in order to discern this data from  other parts of the file. Embedded data is sequential and readable if not encoded in some other  way. 
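To show how little room is needed, here is a sketch of one possible encoding: two ASCII characters packed into each 16 bit line number. This packing is my assumption for illustration and is not necessarily the scheme used in the sample above.

```python
import struct

def text_to_line_numbers(text):
    """Pack ASCII text into 16 bit values, two characters per line number."""
    raw = text.encode("ascii")
    if len(raw) % 2:
        raw += b"\x00"  # pad to a whole number of 16 bit values
    return list(struct.unpack(f"<{len(raw) // 2}H", raw))

def line_numbers_to_text(numbers):
    """Reverse of text_to_line_numbers."""
    raw = struct.pack(f"<{len(numbers)}H", *numbers)
    return raw.rstrip(b"\x00").decode("ascii")

numbers = text_to_line_numbers("secret")
recovered = line_numbers_to_text(numbers)
```

Under this packing a six character string needs only three line numbers, each within the 0-0xFFFF range.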

Function Bodies 

The AddressOf operator allows VB6 easy runtime access to the address of a public function in  a module. It is possible to include a dummy function that is filled with just placeholder instructions to create a blank buffer within the .text section of the executable. This buffer can be easily loaded  into a byte array with a CopyMemory call. A simple post compilation embedding could be used to  fill in the arbitrary data.

For P-Code compiles, AddressOf returns the offset of a loader stub with a structure offset. P-Code compiles would require several extra steps but would still be possible.  


[1] Embedded files appended to executable – theTrik:

[2] Embedding binary data in Bitmap images – theTrik: 
http://www.vbforums.com/showthread.php?885395-RESOLVED-Store-binary-data-in UserControl&p=5466661&viewfull=1#post5466661 

[3] UserControl binary data embedding – theTrik:

The post Binary Data Hiding in VB6 Executables appeared first on Avast Threat Labs.
