Normal view

There are new articles available, click to refresh the page.

Before yesterdayde engineering

de engineering
A deep dive into Processes, Threads, Fibers and Jobs on Windows.Mr. Rc
3 August 2022 at 05:00

A deep dive into Processes, Threads, Fibers and Jobs on Windows.

3 August 2022 at 05:00

Learning how processes and threads work is a crucial part of understanding any Operating System as they are the building block on top of which almost all of the user-mode mechanisms work. Additionally, Windows offers us an elegant API that enables us to interact with them. Unsurprisingly enough, these topics can be a bit complicated to understand since Microsoft does not provide a clear documentation of them and there are not a lot of resources that cover these topics clearly. Windows also provides us the fiber and job APIs which are built on top of the process and thread APIs to allow the developers to manage processes and threads “easily”.

Table of contents:

Processes

Many people assume that a program and a process are the same. However, a process is not same as a program. A program is simply a file containing code. On the other hand, a process is a container of threads and various resources that are required for the threads inside the process to execute.

Process resources

The resources that are required to run a process might differ for each process according to it’s need but these are the fundamental components that every (almost) process has:
Process Identifier: Process identifier (aka PID or process ID) is a unique identifier for each process on the system. While processes with same name can exist on the system, process with same process IDs can not.

Private Virtual Address Space: A specific amount of virtual addresses that a process can use. This amount is different for different systems. I’ve previously wrote a detailed post about Virtual Memory which can be found here.

Executable Code: This refers to the code that is mapped into the private virtual address space (“stored in process’s memory”) of the process from the program. Processes can and do exist without any executable code for special purposes.

Handle Table: A Handle table contains all the pointer to the actual objects in the kernel that are being used by the process. The handles returned by the APIs are essentially the indexes inside the handle table. This table can not be accessed from the user-mode, since it is stored in the kernel mode. Another thing to note here is that this handle table only consists of handles for kernel objects and not for any other category of object, i.e. GDI and user.

Access Token: Each process also has an access token that defines it’s security context which is used by the system to check identity information such as which process belongs to which user, what are the privileges that it has, etc.

Process Environment Block: PEB is a user-mode per process structure that contains quite a lot of information about a process, such as the arguments provided to this process, if it’s being debugged or not, list of loaded modules, etc.
This is how the PEB looks like:

struct _PEB {
    0x000 BYTE InheritedAddressSpace;
    0x001 BYTE ReadImageFileExecOptions;
    0x002 BYTE BeingDebugged;
    0x003 BYTE SpareBool;
    0x004 void* Mutant;
    0x008 void* ImageBaseAddress;
    0x00c _PEB_LDR_DATA* Ldr;
    0x010 _RTL_USER_PROCESS_PARAMETERS* ProcessParameters;
    0x014 void* SubSystemData;
    0x018 void* ProcessHeap;
    0x01c _RTL_CRITICAL_SECTION* FastPebLock;
    0x020 void* FastPebLockRoutine;
    0x024 void* FastPebUnlockRoutine;
    0x028 DWORD EnvironmentUpdateCount;
    0x02c void* KernelCallbackTable;
    0x030 DWORD SystemReserved[1];
    0x034 DWORD ExecuteOptions:2; // bit offset: 34, len=2
    0x034 DWORD SpareBits:30; // bit offset: 34, len=30
    0x038 _PEB_FREE_BLOCK* FreeList;
    0x03c DWORD TlsExpansionCounter;
    0x040 void* TlsBitmap;
    0x044 DWORD TlsBitmapBits[2];
    0x04c void* ReadOnlySharedMemoryBase;
    0x050 void* ReadOnlySharedMemoryHeap;
    0x054 void** ReadOnlyStaticServerData;
    0x058 void* AnsiCodePageData;
    0x05c void* OemCodePageData;
    0x060 void* UnicodeCaseTableData;
    0x064 DWORD NumberOfProcessors;
    0x068 DWORD NtGlobalFlag;
    0x070 _LARGE_INTEGER CriticalSectionTimeout;
    0x078 DWORD HeapSegmentReserve;
    0x07c DWORD HeapSegmentCommit;
    0x080 DWORD HeapDeCommitTotalFreeThreshold;
    0x084 DWORD HeapDeCommitFreeBlockThreshold;
    0x088 DWORD NumberOfHeaps;
    0x08c DWORD MaximumNumberOfHeaps;
    0x090 void** ProcessHeaps;
    0x094 void* GdiSharedHandleTable;
    0x098 void* ProcessStarterHelper;
    0x09c DWORD GdiDCAttributeList;
    0x0a0 void* LoaderLock;
    0x0a4 DWORD OSMajorVersion;
    0x0a8 DWORD OSMinorVersion;
    0x0ac WORD OSBuildNumber;
    0x0ae WORD OSCSDVersion;
    0x0b0 DWORD OSPlatformId;
    0x0b4 DWORD ImageSubsystem;
    0x0b8 DWORD ImageSubsystemMajorVersion;
    0x0bc DWORD ImageSubsystemMinorVersion;
    0x0c0 DWORD ImageProcessAffinityMask;
    0x0c4 DWORD GdiHandleBuffer[34];
    0x14c void (*PostProcessInitRoutine)();
    0x150 void* TlsExpansionBitmap;
    0x154 DWORD TlsExpansionBitmapBits[32];
    0x1d4 DWORD SessionId;
    0x1d8 _ULARGE_INTEGER AppCompatFlags;
    0x1e0 _ULARGE_INTEGER AppCompatFlagsUser;
    0x1e8 void* pShimData;
    0x1ec void* AppCompatInfo;
    0x1f0 _UNICODE_STRING CSDVersion;
    0x1f8 void* ActivationContextData;
    0x1fc void* ProcessAssemblyStorageMap;
    0x200 void* SystemDefaultActivationContextData;
    0x204 void* SystemAssemblyStorageMap;
    0x208 DWORD MinimumStackCommit;
);

Thread: Thread is the entity inside a process that executes code. Every process starts with at least one thread of execution, this thread is called the primary thread. A process without threads can exist, but again, it’s mostly of times it’s of no use since it is not running any code.

EPROCESS structure: The EPROCESS (Executive Process) data structure is the kernel’s representation of the process object. The structure is huge and it contains every possible bit of information related to a process, such as pointers to other data structure, values of different attributes, etc. This structure is not documented by Microsoft.
The structure is very big in size so I’m not including it but it can be found here

KPROCESS structure: One of the most interesting structure inside the EPROCESS data structure is the KPROCESS (Kernel Process) data structure. This data structure also contains a lot of information about the process, such as pointer to process’s page directory, how much time the threads of the process has consumed in the user and kernel-mode, etc. Just like it EPROCESS, this structure is also not documented.
The structure looks like this:

struct _KPROCESS {
  struct _DISPATCHER_HEADER Header;
  struct _LIST_ENTRY ProfileListHead;
  unsigned int DirectoryTableBase;
  unsigned long Asid;
  struct _LIST_ENTRY ThreadListHead;
  unsigned long ProcessLock;
  unsigned long Spare0;
  unsigned int DeepFreezeStartTime;
  struct _KAFFINITY_EX Affinity;
  struct _LIST_ENTRY ReadyListHead;
  struct _SINGLE_LIST_ENTRY SwapListEntry;
  struct _KAFFINITY_EX ActiveProcessors;
  long AutoAlignment : 1;
  long DisableBoost : 1;
  long DisableQuantum : 1;
  unsigned long DeepFreeze : 1;
  unsigned long TimerVirtualization : 1;
  unsigned long CheckStackExtents : 1;
  unsigned long SpareFlags0 : 2;
  unsigned long ActiveGroupsMask : 20;
  long ReservedFlags : 4;
  long ProcessFlags;
  char BasePriority;
  char QuantumReset;
  unsigned int Visited;
  union _KEXECUTE_OPTIONS Flags;
  unsigned long ThreadSeed[20];
  unsigned int IdealNode[20];
  unsigned int IdealGlobalNode;
  union _KSTACK_COUNT StackCount;
  struct _LIST_ENTRY ProcessListEntry;
  unsigned int CycleTime;
  unsigned int ContextSwitches;
  struct _KSCHEDULING_GROUP *SchedulingGroup;
  unsigned long FreezeCount;
  unsigned long KernelTime;
  unsigned long UserTime;
  void *InstrumentationCallback;
};

This diagram shows the components of a process:

Threads

Threads are the actual entities inside a process that are running code on the CPU. Threads can execute any part of the code. A process provides all the resources that threads require to complete their task. Without threads, a process can’t run any code. A process can have multiple threads and such processes are called multi-threaded processes.

Thread scheduling

When there are multiple threads on the system, the scheduler switches between different threads and creates an illusion that all the threads running in parallel. While what’s really happening is that the scheduler is switching between different threads so quickly that it appears that the threads are running in parallel.
The amount of time for which a thread can run on a CPU before it switches is called the thread’s quantum. This quantum is a value that is set by the scheduler. This is usually set to a value that is a multiple of the processor’s clock speed.
Windows uses priority based thread scheduling model where the scheduler uses the thread’s priority to determine which thread should run next. The priority of a thread is a value that is set by the thread’s creator or by the system.
Because this system is quite complex, I will not go over it in detail here.

Thread resources

While a process provides a fair amount of resources for threads to run, there are still a few things that threads need in order to execute, these include:

Context: Every thread has a context which is a user-mode per thread data structure (managed by kernel) that contains the state of all the registers from the time the thread was last executed on the CPU. This data structure is very important because there can’t be multiple threads running on a CPU, so Windows switches between different threads after a few moments and each time it switches a thread, it stores the current CPU registers’ state in the context. This context is loaded again into as the values of the registers when the thread resumes it’s execution on the CPU. Since this data structure stores information related to registers, it’s processor-specific.
This is the how the data structure looks like for x64 machines:

typedef struct _CONTEXT {
  DWORD64 P1Home;
  DWORD64 P2Home;
  DWORD64 P3Home;
  DWORD64 P4Home;
  DWORD64 P5Home;
  DWORD64 P6Home;
  DWORD   ContextFlags;
  DWORD   MxCsr;
  WORD    SegCs;
  WORD    SegDs;
  WORD    SegEs;
  WORD    SegFs;
  WORD    SegGs;
  WORD    SegSs;
  DWORD   EFlags;
  DWORD64 Dr0;
  DWORD64 Dr1;
  DWORD64 Dr2;
  DWORD64 Dr3;
  DWORD64 Dr6;
  DWORD64 Dr7;
  DWORD64 Rax;
  DWORD64 Rcx;
  DWORD64 Rdx;
  DWORD64 Rbx;
  DWORD64 Rsp;
  DWORD64 Rbp;
  DWORD64 Rsi;
  DWORD64 Rdi;
  DWORD64 R8;
  DWORD64 R9;
  DWORD64 R10;
  DWORD64 R11;
  DWORD64 R12;
  DWORD64 R13;
  DWORD64 R14;
  DWORD64 R15;
  DWORD64 Rip;
  union {
    XMM_SAVE_AREA32 FltSave;
    NEON128         Q[16];
    ULONGLONG       D[32];
    struct {
      M128A Header[2];
      M128A Legacy[8];
      M128A Xmm0;
      M128A Xmm1;
      M128A Xmm2;
      M128A Xmm3;
      M128A Xmm4;
      M128A Xmm5;
      M128A Xmm6;
      M128A Xmm7;
      M128A Xmm8;
      M128A Xmm9;
      M128A Xmm10;
      M128A Xmm11;
      M128A Xmm12;
      M128A Xmm13;
      M128A Xmm14;
      M128A Xmm15;
    } DUMMYSTRUCTNAME;
    DWORD           S[32];
  } DUMMYUNIONNAME;
  M128A   VectorRegister[26];
  DWORD64 VectorControl;
  DWORD64 DebugControl;
  DWORD64 LastBranchToRip;
  DWORD64 LastBranchFromRip;
  DWORD64 LastExceptionToRip;
  DWORD64 LastExceptionFromRip;
} CONTEXT, *PCONTEXT;

Two stacks: Every thread has two stacks, a user-mode stack and a kernel-mode stack. The user-mode stack is used for normal purposes, such as for storing the values of variables. Unsurprisingly, the kernel stack is not accessible from the user-mode and it’s used as security mechanism.
When a threads calls a syscall, all of the arguments provided to that syscall are copied from the thread’s user-mode stack to it’s kernel-mode stack. It is done this way because after a thread uses a syscall, the CPU switches to the kernel mode and then the kernel-mode code validates those arguments to see if all the pointers, structures, etc that are passed are valid or not and since this stack is not accessible from the user-mode, a thread can not manipulate the arguments after they have been validated and this way, having two stacks works as a strong security measure.

Thread Local Storage: The thread local storage is a data structure that is used to store data that is specific to each thread. This data is stored in the thread’s context and is not shared between threads.

Thread ID: Just like every process has a unique identifier, every thread also has a unique identifier called a thread ID (TID).

Thread Environment Block: Like processes, threads also have most of their information stored in a data structure called the Thread Environment Block (TEB). This structure contains information such as pointer to the TLS, the LastErrorValue (this has to be this way because if two threads called GetLastError and one thread gets the LastErrorValue of some other thread then it can lead to total chaos), pointer to PEB, etc. TEB is also not documented by Microsoft.
This structure can be found here

Affinity: Setting affinity for a thread forces Windows to run a thread only on a specific CPU. For example, let’s say your machine has for CPU and you set the affinity of process linux.exe to CPU 3 then that thread will only run on CPU 3 until it finishes execution or it’s affinity is changed.

ETHREAD structure: The ETHREAD structure (Executive Thread) is the kernel representation of the thread object. Similar to EPROCESS, this structure also contains every possible bit of information about a thread, such as a pointer to the PEB, LastErrorValue, if this thread is the initial thread (main thread) of the process or not, etc. This structure is also not documented by Microsoft.
This structure can be found here

KTHREAD structure: The KTHREAD data structure (Kernel Thread) is also one of the important data structure inside ETHREAD data structure. It includes information such as the pointer to the kernel stack, a lot of information about it’s scheduling (when and for how long this thread will run on the CPU), pointer to TEB, how much time the thread has spent in the user-mode, etc. This structure is also not documented by Microsoft.
This structure can be found here

This diagram shows the components of a process:

Using Threads

Using threads is very simple. We just need to create a thread using the CreateThread function. The thread will start executing at the address of the specified function.
The function that we want to run in the thread is called the thread’s entry point. The entry point is the function that is called when the thread is created.

Here’s the signature of the CreateThread function:

HANDLE CreateThread(
  [in, optional]  LPSECURITY_ATTRIBUTES   lpThreadAttributes,
  [in]            SIZE_T                  dwStackSize,
  [in]            LPTHREAD_START_ROUTINE  lpStartAddress,
  [in, optional]  __drv_aliasesMem LPVOID lpParameter,
  [in]            DWORD                   dwCreationFlags,
  [out, optional] LPDWORD                 lpThreadId
);

The first parameter is the security attributes. This is a pointer to a SECURITY_ATTRIBUTES structure that contains information about the security of the thread. This is optional and can be NULL for default security.
The second parameter is the stack size. This is the size of the stack that the thread will use. This is optional and can be 0 for default stack size.
The third parameter is the address of the function that will be executed in the thread. This is the entry point of the thread.
The fourth parameter is the parameter that will be passed to the thread. This is optional and can be NULL for no parameter.
The fifth parameter is the creation flags. This is a set of flags that determines how the thread will be created. This is optional and can be 0 if we want the thread to directly execute after being created.
The sixth parameter is the thread ID. This is a pointer to a variable that will receive thread ID after it’s created. This is optional and can be NULL if we do not want to store the thread’s ID.

Fibers

Fibers are unit of execution that allow us to manually schedule (define our own scheduling algorithm) them rather than being automatically scheduled by the scheduler. Fibers run in the context of the threads that created them. Every thread can have multiple fibers and a thread can run one fiber at a time (we decide which). Fibers are often called lightweight threads.
Fibers are invisible to the kernel as they are implemented in the user-mode in Kernel32.dll.

Using Fibers

The first step when using fiber is to convert our own thread into a fiber. This is done by calling the ConvertThreadToFiber function. This is the signature for the function:

LPVOID ConvertThreadToFiber(
  [in, optional] LPVOID lpParameter
);

This function returns the memory address of the fiber’s context that was created. This address is useful later when performing operations on the fiber. The fiber’s context is similar to than that of a thread but it has a few more elements than just registers, these include:

The value of lpParameter that was passed to ConvertThreadToFiber.
The top and bottom memory addresses of the fiber’s stack.
and more.

After this function is called our thread gets converted into a fiber and it starts running on our thread. This fiber may exit either when it’s done executing or when it calls ExitThread (in this case, the thread and fiber both get terminated).

Now, to create a fiber, we need to call the CreateFiber function. This is the signature for the function:

LPVOID CreateFiber(
  [in]           SIZE_T                dwStackSize,
  [in]           LPFIBER_START_ROUTINE lpStartAddress,
  [in, optional] LPVOID                lpParameter
);

The first argument is used to specify the size of the fiber’s stack, generally, 0 is specified, which uses the default value and creates a stack that can scale grow up to 1 MB. The second argument is the address of the function that will be executed when the fiber is scheduled. The third argument is the parameter that will be passed to the function that will be executed.
This function also returns the memory address of the fiber’s context that was created with this context having one additional element: the address of the function that will be executed.
Remember that calling this function only creates the fiber and doesn’t start it. To start the fiber, we need to call the SwitchToFiber function. This is the signature for the function:

void SwitchToFiber(
  [in] LPVOID lpFiber
);

This function takes only one argument, the address of the fiber’s context that was previously returned by CreateFiber. This function actuall starts the execution of the fiber.

To destroy a fiber, we need to call the DeleteFiber function. This is the signature for the function:

void DeleteFiber(
  [in] LPVOID lpFiber
);

It only takes one argument, the address of the fiber’s context that we want to delete.

CreateProcess internals

Usually, when a thread wants to create another process, it calls the Windows API function CreateProcess and specifies the parameters accordingly to create a process with required attributes. This function is takes a lot of arguments and is quite flexible and can be used in almost all cases.
However, sometimes the capabilities of this functions are not enough so other functions (sometimes just a wrapper of this function) are used, here are some of them:

CreateProcessAsUser allows you to create a process on the behalf another user by allowing you to specify the handle to that user’s primary token.
CreateProcessWithTokenW gives you the same capabilities as the previous function but this one just requires a few different privileges.
CreateProcessWithLogonW allows you to provide the credentials of the user in whose context you want to create a process.
ShellExecute is a very unique function. All the previous functions that we talked about work with any valid Portable Executable (PE) file and they do not care about the file extension of the file that you specified, i.e, you can rename the original notepad.exe to notepad.txt and give it to any of those functions and they would still create a process from it.
However, the ShellExecute and ShellExecuteEx are a bit different. These functions accept any file format and then they look inside the HKLM\SOFTWARE\Classes and HKCU\SOFTWARE\Classes registry keys to find the program which is associated with the file format of the file you gave it as an argument and then they eventually call the CreateProcess function with the appropriate executable name/path along with the file name appended, for example you can provide this function a txt file and it will launch notepad with the filename as an argument (notepad.exe filename.txt).

CreateProcess and CreateProcessAsUser both are exported by Kernel32.dll and both of them eventually call CreateProcessInternal (also exported by Kernel32.dll) which also ends up calling the NtCreateUserProcess function which is exported by ntdll.dll. NtCreateUserProcess is the last part of the user-mode code of all user-mode process creation functions, after this function is done with it’s work, it makes a syscall and transforms into kernel mode. Both CreateProcessInternal and NtCreateUserProcess are officially undocumented by Microsoft at the time of writing this post.
However, the CreateProcessWithTokenW and CreateProcessWithLogonW functions are exported by Advapi32.dll. Both of these functions make a Remote Procedure Call (RPC) to the Secondary Login Service (seclogon.dll hosted in svchost.exe), this service allows processes to be started with different user’s credentials and then Secondary Logon Service executes this call in its SlrCreateProcessWithLogon function which eventually calls CreateProcessAsUser.

Arguments

The arguments for all the CreateProcess* functions are almost completely similar with a only a few differences. The explanation of all the CreateProcess* functions would be tedious to write as well as very boring to read, so here is the brief overview of the description of different arguments:

The first argument for CreateProcessAsUser and CreateProcessWithTokenW are the handle to the token under which the process will be started. However, in the case of CreateProcessWithLogonW, the first arguments include the username, domain and password of the user on whose behalf the process will be started.
The next important argument lpApplicationName, which is the full path to the executable to run. This argument can be left NULL and instead the next argument can be used.
The next argument after lpApplicationName is lpCommandLine. This argument doesn’t require us to put the provide the full path of the executable we want create a process of (we can provide it full path but it’s optional), the reason behind this is that when we provide it an executable’s name without a path in this argument, the function searches through several pre-defined paths in an order to find that file’s path. This is the order defined in msdn:
The next important arguments are lpProcessAttributes and lpThreadAttributes. Both of them take a pointer to SECURITY_ATTRIBUTES structure and both of them can be NULL, and when this argument is specified NULL then the default security attributes are used. we can specify whether we want to make the handle of the process that is about to be created (in lpProcessAttributes) and it’s primary thread (in lpProcessAttributes) inheritable by all the other child processes that the caller of CreateProcess* creates or not in the bInheritHandle member of SECURITY_ATTRIBUTES.
The next important argument is bInheritHandles. This argument specifies whether we want the process that is about to be created to inherit all the inheritable handles from the handle table of the parent process or not.
The next important argument is dwCreationFlags. This argument allow us to specify different flags that affect the creation of the process, such as:
- CREATE_SUSPENDED: The initial thread of the process being created is started in suspended state (paused state, it doesn’t directly run after it’s created). A call to ResumeThread can be used thereafter to resume the execution of the thread.
- DEBUG_PROCESS: The calling process declares itself as a debugger and creates the process under it’s control.
The next argument is lpEnvironment. This argument is optional and is used to provide a pointer to the an environment variables’ block. Since it’s optional, we can specify it NULL and it will inherit it’s environment variables from it’s parent process.
The next argument is lpCurrentDirectory. This argument is also optional and is used if we want the process about to be created will have a different current directory than the parent process. If left NULL, the new process will use the current directory of the parent process.
The next argument is lpStartupInfo. This argument is used to specify a pointer to STARTUPINFO or STARTUPINFOEX structures. The STARTUPINFO structure contains some more configuration related for the new process. STARTUPINFOEX structure has an extra field which is used to specify some more attributes for the new process.
The last argument is lpProcessInformation. This argument is used to specify a pointer to PROCESS_INFORMATION structure. The CreateProcess* functions returned the information of the new process in this structure, this information includes the process id of the new process, the thread id of the primary thread, a handle to the new process, etc.

Classification of Processes

Windows provides some (almost) completely different attributes for processes that require extra security or have a special purpose. These processes are not launched like normal processes and they also have different attributes.

Protected Processes

The concept of protected processes was initially introduced to imply with Digital Rights Management (DRM) requirements which were imposed by the media industry for protection of content such as HD-DVD media.
Normally, threads of any process which as debug privilege (usually processes started by the administrator account) could read or write data and code into the memory of any process running on the system. This behavior is very useful in a lot of cases. However, this behavior violates the DRM requirements and for this reason, Windows uses protected processes.
These process exist with normal Windows process, but they provide with little to no access to other processes on the system (even the one’s running with administrator privileges).
These processes can be created by any application on the system with whatever rights they have, but for an executable to be able to run as a protected process, it must be signed with a special Windows Media Certificate. This certificate is a digital signature that is used to identify the executable as a protected process.
These process also only load DLLs that are signed with a special certificate and the data of these processes are only accessible to either kernel or other protected processes.
Examples of protected process are:

The Audio Graph Device process (Audiodg.exe) that is used by Windows to decode protected DRM audio content.
The Media Foundation Protected Media Path (Mfpmp.exe) process used by Windows to decode DRM video content.
The Windows Error Reporting (WER, Werfaultsecure.exe) for reporting crashes of protected apps. This the protected version of WER is required because the normal WER process executes as a normal process and therefore it can’t access the data inside the crashed protected processes.
The system process.

Protected Processes Light (PPL)

PPL is the extended version of Protected Processes introduced allow third party processes, such as Antivirus programs to have same privileges as protected processes. However, PPLs comes with a slight difference, i.e., how much protected a PPL will be depends upon it’s signature, which results in some PPLs having more or less protection than others.
Most system processes on Windows are PPL protected, such as smss.exe, csrss.exe, services.exe, etc.

Minimal Processes

These are essentially empty processes. These processes have empty user-mode address space, ntdll.dll or other subsystem DLLs are not loaded, no PEB or TEB or any related structure are created, no initial thread is created and no executable image is mapped. There processes are created and managed by the kernel and the kernel provides no way to create such processes from the user-mode since these are not meant to be used by the user, but rather by the system to perform special tasks.
These process can have threads and these threads are called minimal threads. These threads don’t have any Thread Environment Block (TEB) or stack.

An example of this is the memory compression process which stores compressed memory of active processes, this process is used to keep more processes memory without paging them out to the disk (this process is hidden from task manager because since it stores compressed memory, it has a lot of memory usage and average users used to get suspicious about this process). You can view this process in process explorer, if you just sort the processes by their working set (amount of physical memory currently being used), this process should appear on top (it might not, if you have some program eating so much of your ram). This process also has no threads or code.

Pico Processes

Windows introduced the concept of pico processes based on their research called the Project Drawbridge. These are minimal processes with a supporting driver called the pico provider. This driver can manage almost everything related to the execution of the pico process it’s managing and this property of pico providers allow them it can act like a separate kernel for that process without the process having any sense of the original system it’s running on, however, the management of memory, I/O and thread scheduling is still done by the original Windows kernel.
A pico provider is able to intercept all the operations that of the pico process that require any handling by the kernel, this includes things such as system calls, exceptions, etc and respond accordingly.
Pico processes can have pico threads (minimal threads for pico processes) and also normal threads. The pico threads also have a context which is stored in the PicoContext member of ETHREAD structure.

Windows Subsystem for Linux

The Windows Subsystem for Linux (WSL) is built on this idea of pico processes. WSL is able to run whole linux system nearly perfectly on Windows without having a single line of code from the linux kernel. This is made possible by the incredible control that pico providers allow.
The pico providers for WSL are lxss.sys and lxcore.sys. These drivers emulate the behavior of the linux kernel by converting all the linux syscalls made from the WSL pico process to NT APIs or by calling specific components that are implemented from scratch.
This implementation of WSL on Windows is a very interesting topic and complicated topic, I might cover it later in some other blog post!

Trustlets (Secure Processes)

Trustlets are another type of processes that provide strong security. Trustlets can not be directly created by the user. They are created by the Windows kernel when a user-mode application requests to create a secure process.
Trustlets use Virtual Trust Levels provided by Hyper-V Hypervisor to isolate themselves in the system. These levels are used to provide the security of the trustlet. The trustlet can only import DLLs that are signed with a certificate that is trusted by the system and other system trusted DLLs such as C/C++ runtime libraries, Kernelbase, Advapi, RPC runtime, CNG base Crypto, and other mathematical libraries that do no require any syscall to work.
The way truslets work is a bit complex as it requires the understanding of how Hypervisors work so I am not covering that here. However, you can read more about them here on msdn.

Jobs

Jobs are a Windows mechanism to group and manage processes together and have them share the same security context. This can be used to run a bunch of processes that are related to each other, for example if you want to manage multiple processes that are the part of same application.
Jobs are shareable, securable and nameable. Any change to the job will affect all the processes in the job. Jobs are used to impose limits on a set of processes, for example if you want to limit the number of processes of an application that can be running at the same time.
Once a process is assigned to a job, it can not leave that job. Child processes created by the processes inside a job will also be a part of that job unless CREATE_BREAKAWAY_FROM_JOB was specified to CreateProcess and the job itself allows processes to break out from it (a job can deny processes inside it from breaking out of it).

Job limits

Here are few of the limits that we can set in a job:

Max active processes: This is used to limit the amount of processes that can exist in a job. When this limit is reached, no new processes are allowed to assign to that job and creation of child processes is blocked.
Processor Affinity: This is used to limit all the processes inside a job to only on a specific CPU.
Priority Class: This is used to set the priority class for all the members of a job. If the thread of any process that is the member of a job that has it’s priority class set tries to increase it’s priority class, it’s request will be ignored and no error will be returned (to SetThreadPriority).
Virtual Memory Limit: This is used to restrict the maximum amount of virtual memory that can be committed by single processes or the entire job.
Clipboard R/W: This is used to disallow all the members of a job from accessing or writing to the clipboard.

API functions for working with Jobs

The Windows API provides us all the important functions that are required to manage and work with job objects. Here are few of the important functions:

CreateJobObject: Used to create a job object. It can also be used to open a job object.
OpenJobObject: Used to open an already existing job object.
AssignProcessToJobObject: Used to assign a process to a job object.
SetInformationJobObject: Used to set limits for the processes inside a job object.
QueryInformationJobObject: Used to retrieve information about the a job object.
TerminateJobObject: Used to terminate all the processes inside a job object.
IsProcessInJob: Used to check if a process is a member of a job object.

Using Jobs

Working with jobs is also quite simple. You can create a job object, assign processes to it and set limits on the processes inside the job. You can also use the API functions to query and set the limits on the processes inside the job.
To create a job object, you can use the CreateJobObject function. This function returns a handle to the job object. Here is the function signature:

HANDLE CreateJobObjectA(
  [in, optional] LPSECURITY_ATTRIBUTES lpJobAttributes,
  [in, optional] LPCSTR                lpName
);

The lpJobAttributes parameter is a pointer to a SECURITY_ATTRIBUTES structure that can be used to set the security attributes for the job object. It can be NULL if you want the job object to have default security attributes.
The lpName parameter is a pointer to a string that can be used to name the job object. This parameter can also be NULL which will result in the job object being unnamed. If the name matches the name of an existing mutex, file-mapping object or waitable object, the function will fail.

After creating an empty job object, you can assign processes to it. To do this, you can use the AssignProcessToJobObject function. Here is the function signature:

BOOL AssignProcessToJobObject(
  [in] HANDLE hJob,
  [in] HANDLE hProcess
);

The hJob parameter is a handle to the job object.
The hProcess parameter is a handle to the process that you want to assign to the job object.
To get the handle of the current process, you can use the GetCurrentProcess function.

To set the limits on the processes inside the job, you can use the SetInformationJobObject function. Here is the function signature:

BOOL SetInformationJobObject(
  [in] HANDLE             hJob,
  [in] JOBOBJECTINFOCLASS JobObjectInformationClass,
  [in] LPVOID             lpJobObjectInformation,
  [in] DWORD              cbJobObjectInformationLength
);

The hJob parameter is a handle to the job object.
The JobObjectInformationClass parameter is a value that specifies the type of information that you want to set. The next parameter is used to specify the actual information that you want to set.
The lpJobObjectInformation parameter is a pointer to the structure containing information that you want to set.

Code Examples

Now that you know the basics of working with processes, jobs, threads and fibers let’s see some code examples.

Creating a Process

Let’s start by looking at how to create a process.

#include <stdio.h>
#include <windows.h>

int main(){
    STARTUPINFOA si;
    PROCESS_INFORMATION pi;

    ZeroMemory( &si, sizeof(si) );
    si.cb = sizeof(si);
    ZeroMemory( &pi, sizeof(pi) );
    LPSTR lpCommandLine = "notepad.exe";

    // Start the child process. 
    if( !CreateProcessA( NULL,   // No module name (use command line)
        lpCommandLine,        // Command line
        0,           // Process handle not inheritable
        0,           // Thread handle not inheritable
        0,          // Set handle inheritance to FALSE
        0,              // No creation flags
        0,           // Use parent's environment block
        0,           // Use parent's starting directory 
        &si,            // Pointer to STARTUPINFO structure
        &pi )           // Pointer to PROCESS_INFORMATION structure
    ){
        printf( "CreateProcess failed (%d).\n", GetLastError() );
        return -1;
    }
    printf("Process Created!\n");

    // Sleep for 5 seconds
    Sleep(5000);

    // Close process and thread handles. 
    CloseHandle( pi.hProcess );
    CloseHandle( pi.hThread );

    return 0;
}

This code should open notepad.exe and exit after 5 seconds.
Process creation is pretty easy, you just need to know the name of the executable and the command line arguments if you want to pass any.

Creating a Thread

Now that you know how to create a process, let’s look at how to create a thread.

#include <stdio.h>
#include <windows.h>

// Function to create a thread
int EthicalFunction(LPVOID lpParam)
{
    // Print a message
    printf("Thread created\n");
    printf("For educational purposes only*\n");
    // Return success
    return 0;
}

int main()
{
    // Create a thread
    HANDLE hThread = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)EthicalFunction, NULL, 0, NULL);
    // Wait for thread to finish
    WaitForSingleObject(hThread, INFINITE);
    printf("Thread returned\n");
    printf("Exiting...\n");
    // Close thread handle
    CloseHandle(hThread);
    // Return success
    return 0;
}

This code is creating a thread for the EthicalFunction function and waiting for it to finish and then exiting after printing a message.
You can create multiple threads for multiple functions like this, for example if you want to create a thread a background thread that runs in the background and does not block the main thread until it is finished with its work.

Creating a Fiber

Next, let’s look at how to create a fiber.

#include <stdio.h>
#include <windows.h>

// fiber function
void fiber_function(void* lpParam)
{
    // Print a message
    printf("Fiber created\n");
    printf("For educational purposes only*\n");
    // Converting back into the main thread as fiber will not return to the main thread by itself
    SwitchToFiber(lpParam);

}

// main function
int main()
{
    // Converting thread to fiber
    LPVOID Context = ConvertThreadToFiber(NULL);
    // Creating fiber
    LPVOID lpFiber = CreateFiber(0, (LPFIBER_START_ROUTINE)fiber_function, Context);
    // Switching to fiber (executing fiber function)
    SwitchToFiber(lpFiber);
    // Printing a message
    printf("Fiber returned\n");
    // Deleting fiber
    DeleteFiber(lpFiber);
    // Return success
    printf("Exiting...\n");
    return 0;
}

This code will create a fiber and execute it and then switch back to the main thread and the main thread will print a message and delete the fiber.

Creating a Job Object

Let’s look at how to create a job object and assign a processes to it.

#include <stdio.h>
#include <windows.h>

int main()
{
    // Creating a job with default security attributes
    HANDLE hJob = CreateJobObject(NULL, "Unemployed");
    // Setting the job to terminate when all processes in it terminate
    JOBOBJECT_EXTENDED_LIMIT_INFORMATION jeli = {0};
    jeli.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE;
    SetInformationJobObject(hJob, JobObjectExtendedLimitInformation, &jeli, sizeof(jeli));
    // Creating structures for notepad, cmd and powershell
    STARTUPINFOA si = {0};
    si.cb = sizeof(si);
    PROCESS_INFORMATION pi = {0};
    STARTUPINFOA si1 = {0};
    si1.cb = sizeof(si1);
    PROCESS_INFORMATION pi1 = {0};

    // Creating notepad, and Windows media player in suspended state and adding them to the job and checking for errors
    if (!CreateProcessA(NULL, (LPSTR)"notepad.exe", NULL, NULL, FALSE, CREATE_SUSPENDED, NULL, NULL, &si, &pi) || !CreateProcessA(NULL, (LPSTR)"dvdplay.exe", NULL, NULL, FALSE, CREATE_SUSPENDED, NULL, NULL, &si1, &pi1))
    {
        printf("Error creating processes\n");
        printf("Error code: %d\n", GetLastError());
        return 1;
    }
    
    AssignProcessToJobObject(hJob, pi.hProcess);
    AssignProcessToJobObject(hJob, pi1.hProcess);

    // Resuming processes
    ResumeThread(pi.hThread);
    ResumeThread(pi1.hThread);
    printf("Job created and processes added!\n");
    
    // SLeeping for 1 minutes to let the processes run
    Sleep(60000);
    // Terminating the job
    TerminateJobObject(hJob, 0);
    // Closing handles
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    CloseHandle(hJob);

    // Return success
    printf("Exiting...\n");
    return 0;
}

This code will create a job object named Unemployed and add two notepad and Windows Media Player processes to it and then terminate the job after 1 minutes.
To confirm that the processes are running inside the Unemployed job, you can use Process Explorer to view the Properties -> Job of either notepad.exe or wmplayer.exe (not dvdplay.exe as it immediately launches this as the child process, it can also be setup_wm.exe if you do not have your Windows Media Player setup).
Here’s an example image:

Summary

This post covered an overview of the internals of processes, threads, fibers and jobs as well the classification of processes into different types. We also looked at the different components of a process. Later, we looked at how to create a process, thread, fiber and job objects.
I hope you enjoyed this post and found it useful.
Thank you for reading!

Resources

de engineering
Understanding SMT solvers: An Introduction to Z3Mr. Rc
3 August 2022 at 05:00

Understanding SMT solvers: An Introduction to Z3

de engineering

By: Mr. Rc

3 August 2022 at 05:00

Satisfiability Modulo Theories (SMT) solvers are one of the most interesting topics to learn about in Computer Science. They can reduce a big chunk of time that would otherwise be spent on statically or dynamically analyzing the binary. While SMT solvers have their limits, when they work, they work like magic. You might already have heard of or seen someone use a SMT solver like Z3 for solving CTF challenges or Program Analysis. By the end of this blog, you’ll have a good grasp of all the required knowledge to get started with SMT solvers and use Z3.

This post does not use any complicated mathematics to explain these solvers and will deal with only required theory and examples to get started. To go deep into SMT solvers and Program Analysis check out #resources.

Info: If you want to watch a video version, which I made for GuidedHacking (covers 50% of this blog): Watch Here

Table of contents:

SAT Solvers

SMT solvers leverage the powers of another type of solvers called Boolean Satisfiability Problem solvers (SAT solvers). As the name suggests, they have something to do with Boolean variables.

These solvers basically take a Boolean expressions as input and output whether there are possible values for the Boolean variables which result in the expression returning true (or satisfiable). When there are no such variables SAT solver outputs false (or unsatisifable). If the expression is satisfiable then the SAT solver can also output the values for the variables which satisfy the expression.

Relation with SMT solvers

Satisfiability Modulo Theory (SMT) solvers essentially combine the powers of SAT solvers and some other type of solvers but SAT solvers are the primary backend of SMT solvers. SMT solvers like SAT are able to find not only the satisfiability but also the satisfying inputs for Boolean expressions but they are not limited to just Boolean expressions. SMT solvers are capable of other inputs such as integer, bitvector, arrays and arithmetic operations.

Terminologies

There are few terms that you’ll need to know when navigating through the smt solver territory. Concrete values: Concrete values are simply constant values. For example 5 is a concrete value. It’s that simple. Symbolic values: Symbolic values are like the unknown variables which you have when dealing with an algebraic expression. These are have are used to represent values which are not yet known. Example: 3x + 2y = 1, in this expression x and y are symbolic values.

Symbolic Execution

Symbolic Execution is a technique which essentially reduces the conditions inside given program into mathematical equations and tries to solve them using SMT and SAT solvers.

Instead of just theoretical explanation, let us look at an example in order to understand the the essence of Symbolic Execution. Consider the following C program:

int main() {
  int j = getUserInput();
  int k = j * 2;

  if (k > 0) {
    if (k * 2 == 8) {
      printf("Correct input!\n");
    } else {
      exit(-1);
    }
  } else {
    exit(-1)
  }
}

When compiled and executed normally, this program would take an concrete integer input (lets say 7) from the user and evaluate and run into the path which is satisfiable which would result in the program calling the exit function in this case. However, when run with symbolic execution:

j will be assigned a symbolic value (e.g. γ)
k will be assigned γ * 2
At the if statement, the symbolic execution engine will remember the condition as a constraint (i.e. γ * 2 > 0) and execute the if branch and at the same time remember the else branch and add it’s constraint (γ * 2 < 0) and execute that too symbolically with a copy of the program’s current state just like what the if branch has except with a different constraint.
- The path that the if branch takes has another if condition whose constraint (γ * 2 * 2 == 8) is again remembered along with the existing constraint (γ * 2 > 0) and also symbolically executes the else branch at the same time with the opposite constraints.
  - The if branch then proceeds and executes the code which essentially leads the program to exit normally after printing “Correct Input!”, the symbolic execution engine then solves the remembered constraints [ * 2 > 0, γ * 2 * == 8] which results in a concrete value and remembers the paths it leads to.
- The path that the else branch takes simply exits so the concrete value to this path is solved and remembered.
The path that the else branch takes leads to an exit after which the symbolic execution engine solves the constraints and get’s the concrete value which will satisfy the constraints (i.e. γ * 2 < 0) and remembers the path it leads to. After the execution is complete, the symbolic execution engine will output all the possible inputs or the requested inputs and tell where they will lead.

Let us first label all the code paths so it will be easier for us to understand:

int main() {
  int j = getUserInput();
  int k = j * 2;

  if (k > 0) {
    path_if_1:
    if (k * 2 == 8) {
      path_if_2:
      printf("Correct input!\n");
    }
    else {
      path_else_2:
      exit(-1);
    }
  }
  else {
    path_else_1:
    exit(-1)
  }
}

Here’s what we assume the Symbolic Execution engine will tell us:

Constraint	Path	Solution
k > 0	path_if_1	All numbers greater than 0 (n > 0)
k < 0	path_else_1	All numbers are smaller than 0 (n < 0)
[k > 0, k * 2 == 8]	path_if_2	2 (n=2)
[k > 0, k * 2 != 8]	path_if_2	Any number greater than 0 except +2 (n > 0, n != 2)
Consider n to contain all possible inputs.

And now, we know of all the possible inputs and the path they will lead to and now, we can input a specific value and get to the desired path. This desired path is usually a piece of unexplored code region that requires some input that we do not know and as you can see, we can figure that out with the power of symbolic execution.

High IQ Facebook problem

All of us have seen those social media posts where it’s a math puzzle type question which states that 99% people fail at solving them, I’m not sure about the source of this statistic but what I’m sure about is that you’ll be capable of solving those problems in seconds after learning about z3, which is what we’ll do in this part of the blog and learn how this relates with symbolic execution later. This is one such graphic with a question (I redesigned the original problem so it looks nice), if we use symbols, we can represent the problem like this:

square * square + circle = 16
triangle * triangle * triangle = 27
triangle * square = 6

square * circle * triangle = ?

Upon reading the problem question, we know the following things for sure:

There are 3 unknown variables - square, triangle and circle.
There are total 3 known concrete result values of the expressions made of these 3 unknown variables.
All three unknown variables hold integer values.

These three known concrete values of the expressions of these unknown values are essentially the constraints required to reach the required values for square, circle and triangle. If you do not understand this right now, you’ll get it soon.

Example with z3

To get started with z3, install it with the following command:

pip install z3_solver

Now, import everything from z3 to get started:

from z3 import *

Let me bring the problem question here so you don’t have to scroll.

square * square + circle = 16
triangle * triangle * triangle = 27
triangle * square = 6

square * circle * triangle = ?

From our previous analysis, we know that all three unknown variables hold integer values, so we’ll define all three of these as Ints:

from z3 import *

square = Int("square")
circle = Int("circle")
triangle = Int("triangle")

# Alternatively you can define all of them in one line
# square, circle, triangle = Ints("square, circle, triangle")

Now, we’ll have to create a solver object to which we will add all of our constraints:

from z3 import *

square = Int("square")
circle = Int("circle")
triangle = Int("triangle")

solver = Solver()

Let us now define our first constraint, which is square * square + circle = 16:

from z3 import *

...

solver = Solver()
solver.add(square * square + circle == 16) # z3 requires us to use '==' for showing equality.

Simple, right? Now add the rest of the constraints:

from z3 import *

...

solver = Solver()
solver.add(square * square + circle == 16)
solver.add(triangle * triangle * triangle == 27)
solver.add(triangle * square == 6)

Now after defining all the constraints, the next for us is to check whether these set of equations (or constraints) are satisfiable or not, which can be done by calling the check method on the solver object after defining the constraints:

from z3 import *

...

solver.add(square * square + circle == 16)
solver.add(triangle * triangle * triangle == 27)
solver.add(triangle * square == 6)

# sat stands for sastisfiable, meaning that the set of constraints are satisfiable
if solver.check() == sat:
	# do stuff

After calling the check method, we call the model method to retrieve a satisfying model which we can later use to get the values of the unknown variables:

from z3 import *

...

solver.add(square * square + circle == 16)
solver.add(triangle * triangle * triangle == 27)
solver.add(triangle * square == 6)

# sat stands for sastisfiable, meaning that the set of constraints are satisfiable
if solver.check() == sat:
	m = solver.model()

If you want to keep things easier, you can simply print m and it’ll return the values for square, circle and triangle.

from z3 import *

...

# sat stands for sastisfiable, meaning that the set of constraints are satisfiable
if solver.check() == sat:
	m = solver.model()
	print(m)

This will output the values which satisfy our constraint, which are:

[circle = 12, triangle = 3, square = 2]

Now you could manually do solve question with just these values or write code which can do it itself:

manually:

square * circle * triangle = ?
2 * 12 * 3 = 72

The other way is this:

from z3 import *

...

# sat stands for sastisfiable, meaning that the set of constraints are satisfiable
if solver.check() == sat:
	m = solver.model()

	# eval method returns the numbers with the type z3.z3.IntNumRef
	# as_long method is used to convert that type to int
	square_value = m.eval(square).as_long()
	circle_value = m.eval(circle).as_long()
	triangle_value = m.eval(triangle).as_long()

	result = square_value * circle_value * triangle_value
	print("The answer is: ", result)

That’s it, it wasn’t the smallest of the explanation but was meant for people with any level of experience with z3 to understand it. The full code can be found here.

Now, look at this piece of code:

#include <stdio.h>

void win() {
  printf("You win!\n");
  exit(0)
}

void lose() {
  printf("You lose!\n");
  exit(-1)
}

int check(int square, int circle, int triangle) {
  if (square * square + circle == 16) {
    if (triangle * triangle * triangle == 27) {
      if (triangle * square == 6) {
        win();
      }
    }
  }
  lose();
}

int main(void) {
  int square;
  int circle;
  int triangle;

  printf("Enter the value of square: ");
  scanf("%d", & square);

  printf("Enter the value of circle: ");
  scanf("%d", & circle);

  printf("Enter the value of triangle: ");
  scanf("%d", & triangle);

  check(square, circle, triangle);
  return 0;
}

Looks familiar?
Well, this is the same problem but framed as a C program where the objective is to get the program to call the win function. Obviously, we can get the valid inputs for this program from the same script as before. And this is how you’ll write scripts - by first reading the decompiled or source code of the program and then figuring out all the constraints (or conditions) that are needed to be satisfied in order to reach a specific path.

Now that we’ve gone through this one, you can surely try another simple problem that I found today on Twitter: Here
Solution here

Another example

Let’s try another example from a recent challenge from amateurs ctf, it’s name was “volcano”.

Given file: volcano
Description: Inspired by recent “traumatic” events.

Here’s the decompilation of the main function:

__int64 __fastcall main(int a1, char **a2, char **a3)
{
  ...

  v13 = __readfsqword(0x28u);
  setbuf(stdin, 0LL);
  setbuf(stdout, 0LL);
  setbuf(stderr, 0LL);
  printf("Give me a bear: ");
  v7 = 0LL;
  __isoc99_scanf("%llu", &v7);
  if ( (unsigned __int8)sub_0_12BB(v7) != 1 )
  {
    puts("That doesn't look like a bear!");
    return 1LL;
  }
  else
  {
    printf("Give me a volcano: ");
    v8 = 0LL;
    __isoc99_scanf("%llu", &v8);
    if ( (unsigned __int8)sub_0_13D9(v8) != 1 )
    {
      puts("That doesn't look like a volcano!");
      return 1LL;
    }
    else
    {
      printf("Prove to me they are the same: ");
      v9 = 0LL;
      v10 = 4919LL;
      __isoc99_scanf("%llu", &v9);
      if ( (v9 & 1) != 0 && v9 != 1 )
      {
        v4 = sub_0_1209(v8);
        if ( v4 == sub_0_1209(v7)
          && (v5 = sub_0_124D(v8), v5 == sub_0_124D(v7))
          && (v6 = sub_0_1430(v10, v8, v9), v6 == sub_0_1430(v10, v7, v9)) )
        {
          puts("That looks right to me!");
          stream = fopen("flag.txt", "r");
          fgets(s, 128, stream);
          puts(s);
          return 0LL;
        }
		...

So, the program first asks for a integer input (llu stands for long long unsigned) and then calls the sub_0_012BB function to check for something and if the check fails, it prints an error message and exits.
Let’s rename this function to check_input and look into the function to see what it’s doing:

_BOOL8 check_input(unsigned __int64 a1)
{
  if ( (a1 & 1) != 0 )
    return 0LL;
  if ( a1 % 3 != 2 )
    return 0LL;
  if ( a1 % 5 != 1 )
    return 0LL;
  if ( a1 % 7 == 3 )
    return a1 % 0x6D == 55;
  return 0LL;
}

Looks like it’s just checking for some conditions… or constraints? These constraints can be easily defined through z3, so let’s do that, here’s what it’ll result in:

import z3  
  
# 64 bit bitvector (includes printable/non-printable, all characaters)
inp1 = z3.BitVec('inp1', 64)

s = z3.Solver()  
# conditions based on checks  
s.add((inp1 & 1) == 0)  
s.add(inp1 % 3 == 2)  
s.add(inp1 % 5 == 1)  
s.add(inp1 % 7 == 3)  
s.add(inp1 % 0x6D == 55)

Now, let’s see what the code does if the checks are passed and keep updating our script:

...
else
  {
    printf("Give me a volcano: ");
    input2 = 0LL;
    __isoc99_scanf("%llu", &input2);
	// renamed for readability "input2"
    if ( (unsigned __int8)sub_0_13D9(input2) != 1 )
    {
      puts("That doesn't look like a volcano!");
      return 1LL;
    }
    ...

It’s clear that the program takes another such integer input (lets call it inp2) and then calls a function similar to the previous if statement, let’s also look up this function:

_BOOL8 check_input_2(unsigned __int64 a1)
{
  unsigned __int64 v2;

  v2 = 0LL;
  while ( a1 )
  {
    v2 += a1 & 1;
    a1 >>= 1;
  }
  return v2 > 0x10 && v2 <= 0x1A;
}

Nothing quite complex, it seems to be looping a1 times and then doing some some boolean operations - these can be easily reimplemented in Python. Let’s add it to our script:

import z3
...
# the program asks for a "volcano" so we named it after that
def check_volcano(a1):  
	v2 = 0  
	while a1:  
		v2 += a1 & 1 
		# >>== is same as: var = var >> 1 
		a1 = a1 >> 1  

	# just rewrote it more cleanly
	return 0x10 < v2 <= 0x1A

Perfect! Let’s look further to see what the program does when this input also passes through the second function:

...
else{
      printf("Prove to me they are the same: ");
      input3 = 0LL;
      v10 = 0x1337LL;
      __isoc99_scanf("%llu", &input3);
      if ( (input3 & 1) != 0 && input3 != 1 )
      {
		// function cluster
        v4 = sub_0_1209(input2);
        if ( v4 == sub_0_1209(input)
          && (v5 = sub_0_124D(input2), v5 == sub_0_124D(input))
          && (v6 = sub_0_1430(v10, input2, input3), v6 == sub_0_1430(v10, input, input3)) )
        {
          puts("That looks right to me!");
          stream = fopen("flag.txt", "r");
          fgets(s, 128, stream);
          puts(s);
          return 0LL;
        }
		...

Another input is taken (call it inp3) and then it is checked whether the and of this input and 1 is not equal to zero and the number itself is not 1, if that is true the input is then put into a cluster of functions whose output determines whether the input is correct or not. One possible value for input3 would be 3, remember it for later. Alright, let’s have a look into each function one by one:

// function is called with input2 as a parameter
__int64 sub_0_1209(unsigned __int64 a1)
{
  __int64 v3; // [rsp+10h] [rbp-8h]

  v3 = 0LL;
  while ( a1 )
  {
    ++v3;
    a1 /= 0xAuLL;
  }
  return v3;
}

This is another simple function, it’s just counting the number of digits in the input a1. I can easily tell because it’s incrementing v3 for each digit in a1 returning it (no changes made to a1 because the function is using it’s local copy). I’m not reimplementing few functions after this right now, you’ll know why soon.
Now observe this pattern of how this function is called:

if ( (input3 & 1) != 0 && input3 != 1 )
      {
        digits_of_inp1 = count_digits(input2);
        if ( digits_of_inp1 == count_digits(input1)

So, it’s just checking if the number of digits in input1 is equal to the number of digits in input2.
Let’s move forward and look at the function calls after this:

After this, another function is called with both input1 and input2 and then checked in same way:

if ( digits_of_inp1 == count_digits(input1)
          && (v5 = sub_0_124D(input2), v5 == sub_0_124D(input1))
		...

Lets look inside the function:

__int64 __fastcall sub_0_124D(unsigned __int64 a1)
{
  __int64 v3; // [rsp+10h] [rbp-8h]

  v3 = 0LL;
  while ( a1 )
  {
    v3 += a1 % 10;    // abc % 10 = c, gets the last number of a sequence of digits
    a1 /= 10;    // abc // 10 = ab, removes the last digit so it can operate on the next digit
  }
  return v3;
}

The function is simply adding every digit of a1 in reverse and returning it. On every iteration, if the number is let’s say 123, it’ll add get 3 by doing the % 10 operation and then it adds it to v3, then it removes the last digit, which is 3 in this case by the /= 10 operation and continues till there are no digits left in the input. Let’s rename it to sum. Looking at how it’s used, it’s clear that it’s checking the sum of both of these inputs to be the same:

...
&& (v5 = sum(input2), v5 == sum(input1))
...

Let’s now look at the last function in this if statement:

...
&& (v6 = sub_0_1430(v10, input2, input3), v6 == sub_0_1430(v10, input1, input3)) )
...

This last function is now called with three inputs, the variable which holds the constant 0x1337 (v10), input2 and input3, and it’s compared with the call to the same function with just input1 in place of input2: Let’s look inside the function:

__int64 __fastcall sub_0_1430(unsigned __int64 a1, unsigned __int64 a2, unsigned __int64 a3)
{
  unsigned __int64 v5; // [rsp+10h] [rbp-18h]
  __int64 v6; // [rsp+20h] [rbp-8h]

  v6 = 1LL;
  v5 = a1 % a3;
  while ( a2 )
  {
    if ( (a2 & 1) != 0 )
      v6 = v5 * v6 % a3;
    a2 >>= 1;
    v5 = v5 * v5 % a3;
  }
  return v6;
}

This function is essentially implementing (a1^a2) % a3 (^ for exponent, not for xor). I can easily spot this because I’ve seen this pattern before, it doesn’t matter if you don’t understand it completely because we can just reimplement it in python for our z3 script if we need to. If the output of this function with these different outputs remains same, we get the flag:

if ( (input3 & 1) != 0 && input3 != 1 )
      {
        digits_of_inp1 = count_digits(input2);
        if ( digits_of_inp1 == count_digits(input1)
        && (v5 = sum(input2), v5 == sum(input1))
        && (v6 = exponent_modulo(v10, input2, input3), v6 == exponent_modulo(v10, input1, input3)))
        {
          puts("That looks right to me!");
          stream = fopen("flag.txt", "r");
          fgets(s, 128, stream);
          puts(s);
          return 0LL;
        }
		...

After playing a bit with the solution that I originally came up with, I realized that if I just pass the volcano check with the same inputs (input1 == input2), I won’t have to deal with all the other checks in that cluster. Due to this, I did not reimplement any function after volcano_check in my final script to save time, however I included the explanations for the sake of completeness and to teach just how to approach challenges like this. Here’s the final script:

import z3

# 64 bit input
inp1 = z3.BitVec("inp1", 64)

# function translated from decompiled c code
def volcano_check(a1):
    v2 = 0
    while a1:
        v2 += a1 & 1
        a1 = a1 >> 1
    return 0x10 < v2 <= 0x1A

s = z3.Solver()
# conditions based on checks from the first function
s.add((inp1 & 1) == 0)
s.add(inp1 % 3 == 2)
s.add(inp1 % 5 == 1)
s.add(inp1 % 7 == 3)
s.add(inp1 % 0x6D == 55)

# while there are valid solutions
while s.check() == z3.sat:
    inp1_solution = int(s.model()[inp1].as_long())
    # checking if any of the solution that passes the constraints
    # from the first function also passes them for the volcano check func.
    if volcano_check(inp1_solution):
        # input1 and input 2 can be the same
        print("input 1 & 2: ", inp1_solution)

        # i & 1 != 0, remember 3?
        print("input 3: ", 3)
        break

	# if an input is already found but does not pass the check
	# this prevents it  from using it again 
    s.add(inp1 != s.model()[inp1])

Running this script, we get the value for input 1 and 2: 389970145857386 and input 3 is any number whose & with 1 is not zero e.g. 3. Now try executing the binary, give it the input and see :o

And here we go! We’ve solved the challenge using Z3! :D

Problems with SMT solvers

While SMT solvers may seem very powerful, which they imo, they have their own share of flaws. The major one being something known as path explosions. These are not the only limitations of SMT solvers, there are others too but this is the major bottleneck.

Path explosions

As the number of variables and constraints from a program grows, the search space grows exponentially, leading to a “explosion” in the number of possible paths the solver has to explore. This makes it difficult for SMT solvers to scale to large, complex programs which take huge inputs of take inputs in loops. This problem makes SMT solvers quite unusable in real world software analysis scenarios, there are many workarounds and developments in this area for sure but there’s still a lot of work to be done.

Due to this, SMT solvers may not always be the best tool to use for your specific job, they are not yet a one-size-fits-all thing yet.

Summary

This post was an overview of SMT solvers with the practical example of a CTF challenge and we also touched a bit on their limitations. I’m not an expert on the topic, I tried to cover all the introductory knowledge that I could put in without increasing the complexity of the blog. There is indeed far more to learn about and you can do so by checking all the links in the resources section.