Blog post

SteelSeries Overlay Hook

March 20, 2026 By Devirtz
reversing

lets you draw your own ImGui stuff on top of any game, bypassing most of the AntiCheat detection.


OK, so this is basically a DLL you inject into the process that loads SteelSeriesGameOverlay.dll, and it lets you draw your own ImGui stuff on top of any game. SteelSeries already runs a transparent overlay window over your game, so we just hijack that. There's also an injector included in the repo. It's a manual map injector, nothing fancy, but it works great because SteelSeries doesn't have any injection protection lol.


What SteelSeries is Actually Doing

The Window

SteelSeries calls CreateWindowExW to make a transparent popup window with the class name "GameOverlay". I found this by searching for CreateWindowExW in IDA and following the xref to Overlay_CreateWindow at 0x180009620.

The window has a bunch of extended styles on it: WS_EX_LAYERED, WS_EX_TRANSPARENT, WS_EX_NOACTIVATE, and WS_EX_TOPMOST. So by default it's completely click-through and always stays on top of everything.

One thing that's kinda interesting: the width and height come from [rdi+68h] and [rdi+6Ch] inside the overlay object. Those are the tracked game window dimensions, not the full monitor size, so the window starts out matching the game window, not your whole screen.

The Overlay Object

Every overlay instance has a heap-allocated object that holds all of its state. You can get the pointer to it by calling GetWindowLongPtrW(hwnd, GWLP_USERDATA). Here are the important offsets I found in IDA:

+0x00   HWND                the overlay window handle
+0x08   HWND                tracked game window (used for foreground checks)
+0x20   IDCompositionDevice*
+0x28   IDXGISwapChain*     the live swapchain (gets recreated on WM_SIZE)
+0x78   BYTE  visibleFlag   1 = window is shown and timer is running
+0x7A   BYTE  fgFlag        1 = game window is in foreground
+0x7B   BYTE  dirtyFlag     1 = force a full render this tick

I put all these offsets into State.h so the rest of the code can just reference them by name instead of magic numbers:

struct State
{
    // Byte offsets into the SteelSeries overlay object (same values as the table above).
    static constexpr ptrdiff_t kTrackedHwndOffset    = 0x08;
    static constexpr ptrdiff_t kSwapChainOffset      = 0x28;
    static constexpr ptrdiff_t kVisibleFlagOffset    = 0x78;
    static constexpr ptrdiff_t kForegroundFlagOffset = 0x7A;
    static constexpr ptrdiff_t kDirtyFlagOffset      = 0x7B;
};
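
Here's a minimal, portable sketch of the "named offsets into an opaque object" idea. It uses a zeroed buffer as a stand-in for the real overlay object, and the helpers readField / writeField and the FakeOverlayObject type are hypothetical names for illustration, not part of the repo. memcpy is used instead of pointer casts so alignment and aliasing never become a problem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Offsets copied from the offset table above.
constexpr std::ptrdiff_t kVisibleFlagOffset = 0x78;
constexpr std::ptrdiff_t kDirtyFlagOffset   = 0x7B;

// Read a trivially copyable field located `offset` bytes past `base`.
template <typename T>
T readField(const void* base, std::ptrdiff_t offset)
{
    T value{};
    std::memcpy(&value, static_cast<const unsigned char*>(base) + offset, sizeof(T));
    return value;
}

// Write a trivially copyable field located `offset` bytes past `base`.
template <typename T>
void writeField(void* base, std::ptrdiff_t offset, T value)
{
    std::memcpy(static_cast<unsigned char*>(base) + offset, &value, sizeof(T));
}

// Hypothetical stand-in for the overlay object: 0x80 zeroed bytes.
struct FakeOverlayObject
{
    alignas(8) unsigned char bytes[0x80] = {};
};
```

Writing a 1 at kVisibleFlagOffset and reading it back touches exactly byte 0x78 of the buffer and leaves the neighbouring flags alone.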

The Swapchain

Created in Overlay_InitD3D at 0x1800085a0. It's a DirectComposition swapchain with these settings:

Format:      DXGI_FORMAT_B8G8R8A8_UNORM
SwapEffect:  DXGI_SWAP_EFFECT_FLIP_SEQUENTIAL
AlphaMode:   DXGI_ALPHA_MODE_PREMULTIPLIED
BufferCount: 2

The important one here is DXGI_ALPHA_MODE_PREMULTIPLIED. That means each pixel's RGB is already multiplied by its alpha, so a transparent pixel is literally just (0,0,0,0). This is how the overlay shows the game through it. The backbuffer gets cleared to (0,0,0,0) every frame, so anything you don't draw is see-through. Pretty clever honestly.
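
To make the premultiplied-alpha point concrete, here's a small sketch of the standard "over" blend for premultiplied colors, out = src + (1 - src.a) * dst. The Rgba struct and function names are made up for illustration; this is just the math, not what DWM literally runs.

```cpp
#include <cassert>

struct Rgba { float r, g, b, a; };

// Convert a straight (non-premultiplied) color to premultiplied form.
inline Rgba premultiply(Rgba c)
{
    return { c.r * c.a, c.g * c.a, c.b * c.a, c.a };
}

// "Over" operator for premultiplied colors: out = src + (1 - src.a) * dst.
inline Rgba over(Rgba src, Rgba dst)
{
    const float k = 1.0f - src.a;
    return { src.r + k * dst.r, src.g + k * dst.g, src.b + k * dst.b, src.a + k * dst.a };
}
```

Blending a cleared pixel (0,0,0,0) over any game pixel returns the game pixel unchanged, which is why untouched regions of the overlay are see-through.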

The Render Loop

I found this by tracing WM_TIMER in the WndProc at 0x18000a278. Every timer tick it calls Overlay_RenderFrame at 0x180009130. Here's what it does:

1. GetForegroundWindow check (early exit if game is not foreground)
2. IDCompositionDevice::WaitForCommitCompletion()
3. IDCompositionDevice::Commit()
4. IDXGISwapChain::Present(SyncInterval=1, 0)

Step 2 is the crazy one. WaitForCommitCompletion just sits there and waits for the DWM to finish compositing, which takes like 16ms (one frame at 60 Hz). And since the whole message loop runs on one thread, every single message has to wait behind this call. That's why SS is locked to around 60fps no matter what timer rate you set. Pretty bad design tbh.

Hiding Logic

Overlay_UpdateVisibility at 0x18000a040 runs when the tracked game window loses foreground focus. It calls KillTimer(id=1) to stop the render ticks and then ShowWindow(SW_HIDE) to hide the overlay. We need to block both of these to keep our overlay alive.


How We Hook It

Alright this is where the fun starts. The hook DLL is the core of the whole project. Once it gets injected, DllMain fires and spins up a worker thread that does everything:

DWORD WINAPI run(HMODULE module)
{
    // Truncate log file for this inject session.
    {
        HANDLE hf = CreateFileA(logPath(), GENERIC_WRITE, FILE_SHARE_READ,
            nullptr, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (hf != INVALID_HANDLE_VALUE) CloseHandle(hf);
    }
    Log() << "[+] Hook loaded\n";

    HWND hwnd = nullptr;
    for (int i = 0; i < 100 && !hwnd; ++i)
    {
        hwnd = findOverlayWindow();
        Sleep(50);
    }

    if (!hwnd || !waitForSwapChain(hwnd))
    {
        Log() << "[-] GameOverlay not ready\n";
        FreeLibraryAndExitThread(module, 0);
    }

    if (!renderer.install(hwnd))
    {
        Log() << "[-] Install failed\n";
        FreeLibraryAndExitThread(module, 0);
    }

    Log() << "[+] Running  Insert: toggle   F3: eject\n";

    timeBeginPeriod(1);
    bool insertDown = false;
    while (!(GetAsyncKeyState(VK_F3) & 0x8000))
    {
        Sleep(10);

        const bool insert = (GetAsyncKeyState(VK_INSERT) & 0x8000) != 0;
        if (insert && !insertDown)
        {
            window.setMenuVisible(!window.menuVisible());
            window.setInteractive(window.menuVisible());
        }
        insertDown = insert;
    }
    timeEndPeriod(1);
    renderer.uninstall();
    FreeLibraryAndExitThread(module, 0);
    return 0;
}

So the flow goes like this: find the overlay window, wait for the swapchain to show up, install all our hooks, then just sit in a hotkey loop. Insert toggles the menu on and off (and switches whether you can click on it), and F3 ejects the whole DLL cleanly.
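
The insertDown bookkeeping in that loop is plain rising-edge detection. Here's a tiny portable sketch of the same idea with the key state passed in as a bool, so it runs without GetAsyncKeyState. EdgeDetector and countToggles are hypothetical names, not from the repo.

```cpp
#include <cassert>

// Remembers the previous key state and reports only the up-to-down
// transition, so holding the key does not toggle over and over.
class EdgeDetector
{
public:
    bool risingEdge(bool downNow)
    {
        const bool fired = downNow && !m_wasDown;
        m_wasDown = downNow;
        return fired;
    }

private:
    bool m_wasDown = false;
};

// Feed a sequence of polled key states and count how many toggles fire.
inline int countToggles(const bool* samples, int n)
{
    EdgeDetector key;
    int toggles = 0;
    for (int i = 0; i < n; ++i)
        if (key.risingEdge(samples[i]))
            ++toggles;
    return toggles;
}
```

A key that is held for three polls, released, then tapped once produces two toggles, not four.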

Finding the overlay window is pretty simple. We just look for a window with the class name "GameOverlay":

HWND findOverlayWindow()
{
    if (HWND h = FindWindowA("GameOverlay", "GameOverlay"))
        return h;
    return FindWindowA("GameOverlay", nullptr);
}

We also need to wait for the swapchain to actually exist before we can do anything, because it might not be ready right away:

bool waitForSwapChain(HWND hwnd)
{
    for (int i = 0; i < 200; ++i)
    {
        const LONG_PTR obj = GetWindowLongPtrW(hwnd, GWLP_USERDATA);
        if (obj && *reinterpret_cast<void**>(obj + State::kSwapChainOffset))
            return true;
        Sleep(50);
    }
    return false;
}

This checks the overlay object's +0x28 field (the swapchain pointer) up to 200 times, which is about 10 seconds total. Once it's not null, we know D3D is initialized and we're good to go.
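
The same "poll N times with a fixed sleep" pattern, pulled out into a generic helper as a sketch. pollUntil is a hypothetical name, and the predicate stands in for the swapchain null check; the worst-case wait is attempts times interval (200 x 50 ms = 10 s in the code above).

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Calls `ready` up to `attempts` times, sleeping `interval` after each miss.
// Returns true as soon as `ready` returns true, false if it never does.
inline bool pollUntil(const std::function<bool()>& ready, int attempts,
                      std::chrono::milliseconds interval)
{
    for (int i = 0; i < attempts; ++i)
    {
        if (ready())
            return true;
        std::this_thread::sleep_for(interval);
    }
    return false;
}
```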

1. WndProc Subclassing

We replace the window's GWLP_WNDPROC on the "GameOverlay" window using SetWindowLongPtrA. After that, every single message goes through our hook first before reaching the original.

void Window::subclass(HWND hwnd)
{
    if (!hwnd || m_originalWndProc)
        return;

    m_hwnd            = hwnd;
    m_originalWndProc = reinterpret_cast<WNDPROC>(
        SetWindowLongPtrA(hwnd, GWLP_WNDPROC, reinterpret_cast<LONG_PTR>(&wndProcHook)));

    // One-shot: resize this window to full screen on the SS thread.
    PostMessage(hwnd, WM_APP, 0, 0);

    setInteractive(m_menuVisible);
}

Right after subclassing we fire a WM_APP message to resize the overlay to full screen. Remember, SS originally sizes it to match the game window, but we want full screen coverage.

Now here's the full WndProc hook. This is where we intercept and mess with all the messages:

LRESULT CALLBACK Window::wndProcHook(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
{
    // Custom focus message
    if (msg == WM_USER)
    {
        if (window.m_menuVisible)
            SetFocus(hwnd);
        return 0;
    }

    // Eat every timer tick. Prevent Overlay_RenderFrame from running.
    if (msg == WM_TIMER)
    {
        const LONG_PTR overlayObj = GetWindowLongPtrW(hwnd, GWLP_USERDATA);
        if (overlayObj)
        {
            *reinterpret_cast<BYTE*>(overlayObj + State::kVisibleFlagOffset)    = 1;
            *reinterpret_cast<BYTE*>(overlayObj + State::kForegroundFlagOffset) = 1;
            *reinterpret_cast<BYTE*>(overlayObj + State::kDirtyFlagOffset)      = 0;
        }
        return 0;   // swallow it, never reaches the original WndProc
    }

    // One-shot resize to full screen
    if (msg == WM_APP)
    {
        SetWindowPos(hwnd, nullptr, 0, 0,
            GetSystemMetrics(SM_CXSCREEN), GetSystemMetrics(SM_CYSCREEN),
            SWP_NOZORDER | SWP_NOACTIVATE | SWP_NOOWNERZORDER);
        return 0;
    }

    // Block SS from hiding the overlay
    if (msg == WM_SHOWWINDOW && wParam == FALSE)
        return 0;

    // Force full-screen size and prevent hiding via WINDOWPOS
    if (msg == WM_WINDOWPOSCHANGING)
    {
        auto* wp = reinterpret_cast<WINDOWPOS*>(lParam);
        if (wp)
        {
            wp->flags = (wp->flags & ~SWP_HIDEWINDOW) | SWP_SHOWWINDOW;
            if (!(wp->flags & SWP_NOSIZE))
            {
                wp->cx = GetSystemMetrics(SM_CXSCREEN);
                wp->cy = GetSystemMetrics(SM_CYSCREEN);
            }
            if (!(wp->flags & SWP_NOMOVE))
            {
                wp->x = 0;
                wp->y = 0;
            }
        }
    }

    // Feed input to ImGui when menu is open
    if (window.m_menuVisible && ImGui_ImplWin32_WndProcHandler(hwnd, msg, wParam, lParam))
        return TRUE;

    // Allow mouse clicks through to the overlay when menu is open
    if (msg == WM_MOUSEACTIVATE && window.m_menuVisible)
        return MA_ACTIVATE;

    // When menu is closed, the overlay is click-through
    if (msg == WM_NCHITTEST)
        return window.m_menuVisible ? HTCLIENT : HTTRANSPARENT;

    return CallWindowProcA(window.m_originalWndProc, hwnd, msg, wParam, lParam);
}

Here's a quick summary of what we intercept:

Message                 What we do
WM_USER                 Our own message. Gives the overlay keyboard focus when the menu is open.
WM_TIMER                Eat it. Write visibleFlag=1, fgFlag=1, dirtyFlag=0. Never let Overlay_RenderFrame run.
WM_SHOWWINDOW(FALSE)    Block it. SS tries to hide us when the game loses focus. Nope.
WM_WINDOWPOSCHANGING    Strip SWP_HIDEWINDOW and force position to (0,0) and size to full screen every time SS tries to move or resize us.
WM_APP                  Our own message. Resizes the window to full screen on the SS thread.
WM_NCHITTEST            Return HTCLIENT when menu is open, HTTRANSPARENT when it's not.
WM_MOUSEACTIVATE        Return MA_ACTIVATE when menu is open so we can get clicks.

Eating WM_TIMER is the big one. It stops Overlay_RenderFrame from ever running, which means WaitForCommitCompletion never blocks the SS thread again. That's huge.

Toggling Interactivity

When you hit Insert to toggle the menu, we flip the window's extended styles so it either accepts input or lets everything pass through:

void Window::setInteractive(bool interactive)
{
    if (!m_hwnd || !IsWindow(m_hwnd))
        return;

    const LONG_PTR ex = GetWindowLongPtrA(m_hwnd, GWL_EXSTYLE);
    if (interactive)
    {
        SetWindowLongPtrA(m_hwnd, GWL_EXSTYLE,
            ex & ~(WS_EX_TRANSPARENT | WS_EX_NOACTIVATE));
        BringWindowToTop(m_hwnd);
        SetForegroundWindow(m_hwnd);
        SendMessage(m_hwnd, WM_USER, 0, 0);
    }
    else
    {
        SetWindowLongPtrA(m_hwnd, GWL_EXSTYLE,
            ex | WS_EX_TRANSPARENT | WS_EX_NOACTIVATE);
    }
}

Pretty straightforward. When interactive, we strip WS_EX_TRANSPARENT and WS_EX_NOACTIVATE so the overlay can get mouse and keyboard input. When the menu is closed, we add them back so your clicks go straight through to the game.
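
The style flip is just bit masking. Here's a portable sketch of that arithmetic with the WinUser.h flag values written out as constants so it builds without windows.h. styleFor is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Values as defined in WinUser.h, repeated here so the sketch builds anywhere.
constexpr std::uint32_t kWsExTopmost     = 0x00000008u; // WS_EX_TOPMOST
constexpr std::uint32_t kWsExTransparent = 0x00000020u; // WS_EX_TRANSPARENT
constexpr std::uint32_t kWsExLayered     = 0x00080000u; // WS_EX_LAYERED
constexpr std::uint32_t kWsExNoActivate  = 0x08000000u; // WS_EX_NOACTIVATE

// Returns the extended style to apply: clear the two click-through bits when
// interactive, set them otherwise. Every other bit is left alone.
inline std::uint32_t styleFor(std::uint32_t ex, bool interactive)
{
    const std::uint32_t mask = kWsExTransparent | kWsExNoActivate;
    return interactive ? (ex & ~mask) : (ex | mask);
}
```

Starting from the four styles SteelSeries sets, going interactive leaves only WS_EX_LAYERED and WS_EX_TOPMOST, and going back restores the original value.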

2. Vtable Patching

We grab the swapchain pointer from overlayObject + 0x28 and overwrite two slots in its COM vtable. We do this directly in memory using VirtualProtect:

Slot   Original                        Our Version
8      IDXGISwapChain::Present         hookedPresent
13     IDXGISwapChain::ResizeBuffers   hookResizeBuffers

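Now any code that calls Present or ResizeBuffers through this vtable hits our functions instead. Keep in mind the vtable is shared by every swapchain of the same implementation class in the process, not just this one instance.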

So how does vtable patching actually work? A COM object stores all its methods as function pointers in an array (the vtable). We just overwrite the pointer in the right slot with our own function:

uintptr_t Renderer::patchVtable(uintptr_t* vtable, int slot, uintptr_t fn)
{
    const uintptr_t old = vtable[slot];
    DWORD protect = 0;
    VirtualProtect(&vtable[slot], sizeof(uintptr_t), PAGE_EXECUTE_READWRITE, &protect);
    vtable[slot] = fn;
    VirtualProtect(&vtable[slot], sizeof(uintptr_t), protect, &protect);
    return old;
}

Save the original pointer. Make the memory page writable. Write our function address in. Restore the old protection. We save the original pointer so we can call the real method later and also put it back when we uninstall.
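
If the slot-swap idea is new to you, here's a toy version using an ordinary writable array of function pointers instead of a real COM vtable. All names are hypothetical. The real thing needs VirtualProtect because the vtable sits in read-only memory, but the save / overwrite / call-original / restore sequence is the same.

```cpp
#include <cassert>
#include <cstddef>

using Method = int (*)(int);

// Stand-in for the original method living in the table.
int originalDouble(int x) { return x * 2; }

// Where the hook keeps the original pointer so it can forward the call.
Method g_savedOriginal = nullptr;

// Stand-in for the hook: call the original, then do something extra.
int hookedDouble(int x) { return g_savedOriginal(x) + 1; }

// Overwrite one slot in a table of function pointers and return the old entry.
Method patchSlot(Method* table, std::size_t slot, Method replacement)
{
    const Method old = table[slot];
    table[slot] = replacement;
    return old;
}
```

Patching slot 1 changes what callers of slot 1 get while slot 0 is untouched, and writing the saved pointer back undoes it.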

The install() method puts everything together. It reads the swapchain from the overlay object, patches the vtable, subclasses the window, and starts up the render thread:

bool Renderer::install(HWND hwnd)
{
    const LONG_PTR overlayObj = GetWindowLongPtrW(hwnd, GWLP_USERDATA);
    if (!overlayObj) return false;

    auto* sc = *reinterpret_cast<IDXGISwapChain**>(overlayObj + State::kSwapChainOffset);
    if (!sc) return false;

    m_swapChain = sc;
    m_swapChain->AddRef();
    m_vtable = *reinterpret_cast<uintptr_t**>(sc);

    m_originalPresent       = reinterpret_cast<PresentFn>(
        patchVtable(m_vtable, kSlotPresent, reinterpret_cast<uintptr_t>(&hookedPresent)));
    m_originalResizeBuffers = reinterpret_cast<ResizeBuffersFn>(
        patchVtable(m_vtable, kSlotResizeBuffers, reinterpret_cast<uintptr_t>(&hookResizeBuffers)));

    window.subclass(hwnd);

    m_renderRunning = true;
    m_renderThread  = std::thread([]{ renderer.renderLoop(); });

    return true;
}

3. The Hooked Present (Where Rendering Happens)

This is the heart of the whole thing. Every time anything calls Present on the swapchain, it ends up in our function. We lazily set up D3D11 and ImGui on the first call, then render our stuff every frame:

HRESULT __stdcall Renderer::hookedPresent(IDXGISwapChain* sc, UINT syncInterval, UINT flags)
{
    if (sc != renderer.m_swapChain)
        renderer.syncSwapChain(sc);

    if (!renderer.m_imguiReady)
    {
        if (!renderer.m_device)
        {
            if (FAILED(sc->GetDevice(__uuidof(ID3D11Device),
                       reinterpret_cast<void**>(&renderer.m_device))))
                return renderer.m_originalPresent(sc, syncInterval, flags);
            renderer.m_device->GetImmediateContext(&renderer.m_context);
        }
        renderer.createRTV();
        renderer.initImGui();
        if (!renderer.m_imguiReady)
            return renderer.m_originalPresent(sc, syncInterval, flags);
    }

    if (!renderer.m_rtv)
        renderer.createRTV();

    renderer.ensureBackbuffer(sc);

    if (!IsWindowVisible(window.hwnd()))
        ShowWindow(window.hwnd(), SW_SHOWNOACTIVATE);

    ImGui_ImplDX11_NewFrame();
    ImGui_ImplWin32_NewFrame();

    // Manually feed cursor position (we're not the window's owner)
    POINT pt{};
    GetCursorPos(&pt);
    ScreenToClient(window.hwnd(), &pt);
    ImGui::GetIO().MousePos    = ImVec2(static_cast<float>(pt.x), static_cast<float>(pt.y));
    ImGui::GetIO().DisplaySize = ImVec2(
        static_cast<float>(GetSystemMetrics(SM_CXSCREEN)),
        static_cast<float>(GetSystemMetrics(SM_CYSCREEN)));

    ImGui::NewFrame();
    renderer.renderScene();
    ImGui::Render();

    renderer.m_context->OMSetRenderTargets(1, &renderer.m_rtv, nullptr);

    const ImVec2 ds = ImGui::GetIO().DisplaySize;
    const D3D11_VIEWPORT vp{ 0.0f, 0.0f, ds.x, ds.y, 0.0f, 1.0f };
    renderer.m_context->RSSetViewports(1, &vp);

    // Clear to fully transparent, premultiplied alpha
    constexpr float kClear[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    renderer.m_context->ClearRenderTargetView(renderer.m_rtv, kClear);
    ImGui_ImplDX11_RenderDrawData(ImGui::GetDrawData());

    return renderer.m_originalPresent(sc, 0, 0);
}

A few important things going on here:

  • Swapchain sync - if SS recreates the swapchain behind our back, syncSwapChain catches that and rebuilds all our resources.
  • Lazy init - we don't set up ImGui during install(). We do it on the first Present call instead. This way we're already on the render thread and avoid threading issues.
  • Manual mouse position - since we ate WM_TIMER and took over the WndProc, we don't get normal WM_MOUSEMOVE messages. So we just poll GetCursorPos ourselves and convert to client coordinates.
  • Transparent clear - that (0, 0, 0, 0) clear is super important. Premultiplied alpha means that's "fully transparent". Only the pixels ImGui draws will actually show up.

Handling Swapchain Recreation

SteelSeries can recreate the swapchain (like on a window resize). We handle that by checking if the device object changed:

void Renderer::syncSwapChain(IDXGISwapChain* sc)
{
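    ID3D11Device* newDevice = nullptr;
    // Bail out if this swapchain has no D3D11 device; the code below dereferences it.
    if (FAILED(sc->GetDevice(__uuidof(ID3D11Device), reinterpret_cast<void**>(&newDevice))) || !newDevice)
        return;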

    if (newDevice != m_device)
    {
        shutdownImGui();
        if (m_rtv)     { m_rtv->Release();     m_rtv     = nullptr; }
        if (m_context) { m_context->Release();  m_context = nullptr; }
        if (m_device)  { m_device->Release();   m_device  = nullptr; }
        m_device = newDevice;
        m_device->GetImmediateContext(&m_context);
    }
    else
    {
        if (newDevice) newDevice->Release();
        if (m_rtv) { m_rtv->Release(); m_rtv = nullptr; }
    }

    if (m_swapChain) m_swapChain->Release();
    sc->AddRef();
    m_swapChain = sc;
}

If the device changed, we nuke everything and start over. If only the swapchain changed, we just update our reference and drop the old render target view.

We also make sure the backbuffer is always full-screen sized, even if SS made a smaller one:

void Renderer::ensureBackbuffer(IDXGISwapChain* sc)
{
    const int sw = GetSystemMetrics(SM_CXSCREEN);
    const int sh = GetSystemMetrics(SM_CYSCREEN);
    DXGI_SWAP_CHAIN_DESC desc{};
    sc->GetDesc(&desc);
    if ((int)desc.BufferDesc.Width >= sw && (int)desc.BufferDesc.Height >= sh)
        return;

    if (m_rtv) { m_rtv->Release(); m_rtv = nullptr; }
    m_context->OMSetRenderTargets(0, nullptr, nullptr);
    if (m_imguiReady) ImGui_ImplDX11_InvalidateDeviceObjects();
    sc->ResizeBuffers(0, sw, sh, DXGI_FORMAT_UNKNOWN, 0);
    createRTV();
    if (m_imguiReady) ImGui_ImplDX11_CreateDeviceObjects();
}

ResizeBuffers Hook

We also hook ResizeBuffers so we can release our render target view before the real call and rebuild it after:

HRESULT __stdcall Renderer::hookResizeBuffers(
    IDXGISwapChain* sc, UINT count, UINT w, UINT h, DXGI_FORMAT fmt, UINT flags)
{
    if (renderer.m_rtv) { renderer.m_rtv->Release(); renderer.m_rtv = nullptr; }
    if (renderer.m_imguiReady) ImGui_ImplDX11_InvalidateDeviceObjects();

    const HRESULT hr = renderer.m_originalResizeBuffers(sc, count, w, h, fmt, flags);

    renderer.createRTV();
    if (renderer.m_imguiReady) ImGui_ImplDX11_CreateDeviceObjects();
    return hr;
}

Without this hook, ResizeBuffers would straight up fail because our render target view still holds a reference to the old backbuffer.

4. Render Thread

We spin up a background std::thread that loops and calls swapchain->Present(0, 0). That goes through our hook, which renders ImGui and then calls the original Present. The vsync stall from the DWM only blocks this thread. The SS message loop runs freely now.

void Renderer::renderLoop()
{
    using namespace std::chrono;
    constexpr int kTargetFPS = 165;
    constexpr auto kFrameTime = duration_cast<nanoseconds>(duration<double>(1.0 / kTargetFPS));

    while (m_renderRunning)
    {
        const auto frameStart = high_resolution_clock::now();

        // Re-read the live swapchain pointer every frame (SS can recreate it)
        const LONG_PTR overlayObj = GetWindowLongPtrW(window.hwnd(), GWLP_USERDATA);
        IDXGISwapChain* liveSc = overlayObj
            ? *reinterpret_cast<IDXGISwapChain**>(overlayObj + State::kSwapChainOffset)
            : nullptr;

        if (!liveSc) { Sleep(1); continue; }
        liveSc->Present(0, 0);   // triggers hookedPresent -> ImGui render

        const auto elapsed = high_resolution_clock::now() - frameStart;
        const auto remaining = kFrameTime - elapsed;
        if (remaining > nanoseconds(0))
            std::this_thread::sleep_for(remaining);
    }
}

The key design choice here: every frame we re-read the live swapchain pointer from the overlay object (+0x28), not our cached copy. If SS recreates the swapchain, we pick up the new one automatically. The Present(0, 0) call is what triggers hookedPresent, which does all the ImGui rendering.

We cap at 165 FPS on our side, but the real framerate limiter is whatever stall the DWM imposes inside the original Present (WaitForCommitCompletion lives in Overlay_RenderFrame, which never runs anymore). Since that stall only blocks our render thread now (not the SS message loop), everything stays smooth.
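
For the frame cap itself, the arithmetic looks roughly like this. It's a sketch with hypothetical helper names (frameBudget, remainingSleep); the only thing taken from the loop above is the 165 FPS target and the "sleep for whatever is left of the budget" idea.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::nanoseconds;

// Frame budget for a target rate, truncated to whole nanoseconds.
constexpr nanoseconds frameBudget(int fps)
{
    return std::chrono::duration_cast<nanoseconds>(
        std::chrono::duration<double>(1.0 / fps));
}

// Time left to sleep after a frame took `elapsed`; zero if it ran over budget.
constexpr nanoseconds remainingSleep(nanoseconds budget, nanoseconds elapsed)
{
    return elapsed < budget ? budget - elapsed : nanoseconds(0);
}
```

At 165 FPS the budget is about 6.06 ms per frame, so a frame that took 2 ms sleeps for roughly 4.06 ms and a frame that took 9 ms doesn't sleep at all.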

5. The Render Scene

This is where you put your actual overlay content. As a demo I just drew a red circle at screen center (always visible) and an ImGui window with some widgets that shows when the menu is open:

void Renderer::renderScene()
{
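    // Use the real display size instead of assuming a 1920x1080 screen.
    const ImVec2 ds = ImGui::GetIO().DisplaySize;
    ImGui::GetBackgroundDrawList()->AddCircleFilled(
        { ds.x * 0.5f, ds.y * 0.5f }, 30.0f, IM_COL32(255, 50, 50, 200));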

    if (!window.menuVisible())
        return;

    static bool  showDemo = true;
    static float value    = 0.0f;
    static int   clicks   = 0;

    ImGui::Begin("Overlay");
    ImGui::Text("Insert: toggle   F3: eject");
    ImGui::Separator();
    ImGui::Checkbox("ImGui Demo", &showDemo);
    ImGui::SliderFloat("Value", &value, 0.0f, 1.0f);
    if (ImGui::Button("Click")) ++clicks;
    ImGui::SameLine();
    ImGui::Text("count: %d   %.0f fps", clicks, ImGui::GetIO().Framerate);
    ImGui::End();

    if (showDemo)
        ImGui::ShowDemoWindow(&showDemo);
}

You'd swap this out with whatever you actually want to draw. This is just to prove it works.

6. Clean Uninstall

When you press F3, we cleanly tear everything down. Stop the render thread, put the vtable back, shut down ImGui, release all D3D stuff, and restore the original WndProc:

void Renderer::uninstall()
{
    m_renderRunning = false;
    if (m_renderThread.joinable())
        m_renderThread.join();

    if (m_vtable)
    {
        if (m_originalPresent)
            patchVtable(m_vtable, kSlotPresent,
                        reinterpret_cast<uintptr_t>(m_originalPresent));
        if (m_originalResizeBuffers)
            patchVtable(m_vtable, kSlotResizeBuffers,
                        reinterpret_cast<uintptr_t>(m_originalResizeBuffers));
        m_vtable = nullptr;
    }

    shutdownImGui();
    releaseD3D();

    m_originalPresent       = nullptr;
    m_originalResizeBuffers = nullptr;

    window.restore();
}

We write the original function pointers back into the vtable slots, so after ejection SteelSeries keeps running like nothing ever happened. Pretty clean.


The Injector (Manual Mapping)

The injector is a separate console app that manually maps the hook DLL into the SteelSeries process. We do it this way instead of using LoadLibrary because LoadLibrary is easy to detect. Manual mapping doesn't leave the same traces.

Entry Point

The injector can auto-detect the GameOverlay process, or you can tell it which process to target:

int wmain(int argc, wchar_t* argv[])
{
    if (argc < 2 || argc > 3)
    {
        std::wcerr << L"Usage:\n"
                   << L"  Injector <dll_path>\n"
                   << L"  Injector <process_name|pid> <dll_path>\n";
        return 1;
    }

    DWORD        pid = 0;
    std::wstring dllPath;

    if (argc == 2)
    {
        dllPath = argv[1];
        pid = Injector::findSteelSeriesOverlayPid();
        if (!pid)
        {
            std::cerr << "[-] SteelSeries GameOverlay window not found.\n";
            return 1;
        }
        std::cout << "[+] Found GameOverlay PID: " << pid << '\n';
    }
    else
    {
        std::wstring target = argv[1];
        dllPath             = argv[2];

        wchar_t* end = nullptr;
        pid = static_cast<DWORD>(wcstoul(target.c_str(), &end, 10));
        if (*end != L'\0')
            pid = Injector::findPidByName(target);

        if (!pid)
        {
            std::wcerr << L"[-] Process not found: " << target << L'\n';
            return 1;
        }
    }

    Injector inj(pid);
    return inj.inject(dllPath) ? 0 : 1;
}
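
The "is it a PID or a process name?" check from the else branch, isolated as a small sketch. parsePid is a hypothetical helper, not something from the repo; it treats the argument as a PID only if the whole string is a decimal number.

```cpp
#include <cassert>
#include <cwchar>
#include <string>

// Returns true and fills `pidOut` if `text` is entirely a decimal number.
// Returns false otherwise, in which case the caller would treat `text`
// as a process name.
inline bool parsePid(const std::wstring& text, unsigned long& pidOut)
{
    if (text.empty())
        return false;
    wchar_t* end = nullptr;
    const unsigned long value = std::wcstoul(text.c_str(), &end, 10);
    if (*end != L'\0')
        return false;
    pidOut = value;
    return true;
}
```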

Finding the PID works the same way the hook finds the window. Just FindWindowA("GameOverlay", ...) and then get the PID from that:

DWORD Injector::findSteelSeriesOverlayPid()
{
    HWND hw = FindWindowA("GameOverlay", "GameOverlay");
    if (!hw) return 0;
    DWORD pid = 0;
    GetWindowThreadProcessId(hw, &pid);
    return pid;
}

Manual Mapping (The inject() Method)

Ok this is where the real magic happens. We read the DLL file from disk, parse the PE headers, allocate memory inside the remote process, and copy all the sections over:

bool Injector::inject(const std::wstring& dllPath)
{
    auto fileData = readFile(dllPath);
    if (fileData.empty()) { std::cerr << "[-] Cannot read DLL.\n"; return false; }

    BYTE* pSrc    = fileData.data();
    auto* pDosHdr = reinterpret_cast<IMAGE_DOS_HEADER*>(pSrc);
    auto* pNtHdr  = reinterpret_cast<IMAGE_NT_HEADERS*>(pSrc + pDosHdr->e_lfanew);

    if (pDosHdr->e_magic != IMAGE_DOS_SIGNATURE ||
        pNtHdr->Signature != IMAGE_NT_SIGNATURE)
    { std::cerr << "[-] Invalid PE file.\n"; return false; }

    HANDLE hProc = OpenProcess(
        PROCESS_CREATE_THREAD | PROCESS_QUERY_INFORMATION |
        PROCESS_VM_OPERATION  | PROCESS_VM_WRITE | PROCESS_VM_READ,
        FALSE, m_pid);
    if (!hProc) { std::cerr << "[-] OpenProcess failed.\n"; return false; }

    const DWORD imageSize = pNtHdr->OptionalHeader.SizeOfImage;
    BYTE* pBase = reinterpret_cast<BYTE*>(
        VirtualAllocEx(hProc, nullptr, imageSize,
                       MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE));

    // Write PE headers
    WriteProcessMemory(hProc, pBase, pSrc,
                       pNtHdr->OptionalHeader.SizeOfHeaders, nullptr);

    // Copy each section
    auto* pSection = IMAGE_FIRST_SECTION(pNtHdr);
    for (WORD i = 0; i < pNtHdr->FileHeader.NumberOfSections; ++i)
    {
        if (!pSection[i].SizeOfRawData) continue;
        WriteProcessMemory(hProc,
            pBase + pSection[i].VirtualAddress,
            pSrc  + pSection[i].PointerToRawData,
            pSection[i].SizeOfRawData, nullptr);
    }

    // ... (set up shellcode and run it remotely)
}

After writing the image into the remote process, we still need to do relocations and resolve imports. But we can't just call LoadLibrary over there. Instead we inject a shellcode stub that does all this work from inside the target process.
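
For reference, here's the bookkeeping behind base relocations as laid out in the PE format: each block has an 8-byte header followed by 16-bit entries, and each entry packs a 4-bit type and a 12-bit page offset. This sketch only covers the decoding arithmetic on plain integers, with hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

struct RelocEntry
{
    std::uint16_t type;   // high 4 bits, e.g. 10 = IMAGE_REL_BASED_DIR64
    std::uint16_t offset; // low 12 bits, offset within the block's page
};

// Split one packed 16-bit relocation entry into its type and offset.
inline RelocEntry decodeEntry(std::uint16_t raw)
{
    return { static_cast<std::uint16_t>(raw >> 12),
             static_cast<std::uint16_t>(raw & 0x0FFF) };
}

// Number of entries in a block: total size minus the 8-byte header,
// divided by the 2-byte entry size.
inline std::uint32_t entryCount(std::uint32_t sizeOfBlock)
{
    return (sizeOfBlock - 8u) / 2u;
}

// Rebasing adds (actual base - preferred base) to a stored absolute address.
// Unsigned wraparound makes this work in both directions.
inline std::uint64_t rebase(std::uint64_t stored, std::uint64_t preferredBase,
                            std::uint64_t actualBase)
{
    return stored + (actualBase - preferredBase);
}
```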

The Shellcode

The shellcode is a self-contained function that runs inside the target process. It only needs two function pointers from kernel32: LoadLibraryA and GetProcAddress. Those sit at the same address in every process of the same bitness until the next reboot, so the injector can look them up locally. We pack those into a struct:

struct MappingData
{
    f_LoadLibraryA   pLoadLibraryA;
    f_GetProcAddress pGetProcAddress;
    BYTE*            pBase;
    DWORD            lastError;
    BOOL             initialized;   // set to TRUE when DllMain returns
};

The shellcode itself is compiled into its own PE section (".scode") with optimizations disabled. This is important because we don't want the compiler generating any relocations or external function calls. It has to be completely self-contained:

#pragma code_seg(push, "r", ".scode")
#pragma runtime_checks("", off)
#pragma optimize("g", off)

void __stdcall Shellcode(MappingData* pData)
{
    if (!pData) return;

    BYTE* pBase          = pData->pBase;
    auto& LoadLibraryA_   = pData->pLoadLibraryA;
    auto& GetProcAddress_ = pData->pGetProcAddress;

    auto* pDosHdr = reinterpret_cast<IMAGE_DOS_HEADER*>(pBase);
    auto* pNtHdr  = reinterpret_cast<IMAGE_NT_HEADERS*>(pBase + pDosHdr->e_lfanew);
    auto& optHdr  = pNtHdr->OptionalHeader;

    // --- Step 1: Process base relocations ---
    BYTE* delta = pBase - optHdr.ImageBase;
    if (delta && optHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].Size)
    {
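        auto* pReloc = reinterpret_cast<IMAGE_BASE_RELOCATION*>(
            pBase + optHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress);
        auto* pRelocEnd = reinterpret_cast<IMAGE_BASE_RELOCATION*>(
            reinterpret_cast<BYTE*>(pReloc)
            + optHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].Size);

        // Bound the walk by the directory size; a zero terminator block is not guaranteed
        while (pReloc < pRelocEnd && pReloc->SizeOfBlock)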
        {
            UINT  count   = (pReloc->SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION)) / sizeof(WORD);
            WORD* entries = reinterpret_cast<WORD*>(pReloc + 1);

            for (UINT i = 0; i < count; ++i)
            {
                WORD type   = entries[i] >> 12;
                WORD offset = entries[i] & 0x0FFF;

                if (type == IMAGE_REL_BASED_DIR64)
                    *reinterpret_cast<UINT_PTR*>(
                        pBase + pReloc->VirtualAddress + offset)
                        += reinterpret_cast<UINT_PTR>(delta);
                else if (type == IMAGE_REL_BASED_HIGHLOW)
                    *reinterpret_cast<DWORD*>(
                        pBase + pReloc->VirtualAddress + offset)
                        += static_cast<DWORD>(reinterpret_cast<UINT_PTR>(delta));
            }

            pReloc = reinterpret_cast<IMAGE_BASE_RELOCATION*>(
                reinterpret_cast<BYTE*>(pReloc) + pReloc->SizeOfBlock);
        }
    }

    // --- Step 2: Resolve imports ---
    if (optHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].Size)
    {
        auto* pImportDesc = reinterpret_cast<IMAGE_IMPORT_DESCRIPTOR*>(
            pBase + optHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress);

        while (pImportDesc->Name)
        {
            HINSTANCE hMod = LoadLibraryA_(
                reinterpret_cast<char*>(pBase + pImportDesc->Name));

            auto* pThunk    = reinterpret_cast<IMAGE_THUNK_DATA*>(
                pBase + pImportDesc->OriginalFirstThunk);
            auto* pFuncAddr = reinterpret_cast<IMAGE_THUNK_DATA*>(
                pBase + pImportDesc->FirstThunk);

            if (!pImportDesc->OriginalFirstThunk)
                pThunk = pFuncAddr;

            while (pThunk->u1.AddressOfData)
            {
                if (IMAGE_SNAP_BY_ORDINAL(pThunk->u1.Ordinal))
                    pFuncAddr->u1.Function = reinterpret_cast<UINT_PTR>(
                        GetProcAddress_(hMod,
                            reinterpret_cast<char*>(pThunk->u1.Ordinal & 0xFFFF)));
                else
                    pFuncAddr->u1.Function = reinterpret_cast<UINT_PTR>(
                        GetProcAddress_(hMod,
                            reinterpret_cast<IMAGE_IMPORT_BY_NAME*>(
                                pBase + pThunk->u1.AddressOfData)->Name));
                ++pThunk;
                ++pFuncAddr;
            }
            ++pImportDesc;
        }
    }

    // --- Step 3: TLS callbacks ---
    if (optHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_TLS].Size)
    {
        auto* pTls = reinterpret_cast<IMAGE_TLS_DIRECTORY*>(
            pBase + optHdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_TLS].VirtualAddress);
        auto** ppCb = reinterpret_cast<PIMAGE_TLS_CALLBACK*>(pTls->AddressOfCallBacks);
        if (ppCb)
            while (*ppCb) { (*ppCb)(pBase, DLL_PROCESS_ATTACH, nullptr); ++ppCb; }
    }

    // --- Step 4: Call DllMain ---
    if (optHdr.AddressOfEntryPoint)
        reinterpret_cast<f_DllMain>(pBase + optHdr.AddressOfEntryPoint)(
            reinterpret_cast<HINSTANCE>(pBase), DLL_PROCESS_ATTACH, nullptr);

    pData->initialized = TRUE;
}

void __stdcall ShellcodeEnd() {}  // marker for size calculation

#pragma optimize("g", on)
#pragma runtime_checks("", restore)
#pragma code_seg(pop)

So the shellcode does four things in order:

  1. Relocations - adjusts all the hardcoded addresses in the image. VirtualAllocEx picked the base address for us, so the DLL almost never sits at its preferred ImageBase and every absolute pointer has to be shifted by the difference.
  2. Imports - walks the import table, calls LoadLibraryA for each DLL dependency, and fills in all the function pointers with GetProcAddress.
  3. TLS callbacks - fires any TLS (thread-local storage) init callbacks the DLL registered.
  4. DllMain - calls the DLL's entry point with DLL_PROCESS_ATTACH, which kicks off our hook's run() thread.
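
To make step 1 concrete, here is a minimal sketch of the same relocation walk run against a plain in-memory buffer instead of a remote process. It is not the project's code: RelocBlock is a hypothetical mirror of IMAGE_BASE_RELOCATION so the example builds without <windows.h>, applyRelocations is an illustrative name, and only DIR64 entries are handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical mirror of IMAGE_BASE_RELOCATION (same two 32-bit fields).
struct RelocBlock {
    std::uint32_t VirtualAddress; // RVA of the page this block covers
    std::uint32_t SizeOfBlock;    // header plus all 16-bit entries, in bytes
};

constexpr std::uint16_t kRelDir64 = 10; // value of IMAGE_REL_BASED_DIR64

// Walks the relocation blocks in [relocRva, relocRva + relocSize) and adds
// delta to every 64-bit slot marked DIR64. Returns the number of slots patched.
std::size_t applyRelocations(std::vector<std::uint8_t>& image,
                             std::uint32_t relocRva, std::uint32_t relocSize,
                             std::uint64_t delta)
{
    if (relocRva > image.size() || relocSize > image.size() - relocRva)
        return 0; // directory does not fit inside the image

    std::size_t patched = 0;
    std::uint32_t cursor = 0;
    while (cursor + sizeof(RelocBlock) <= relocSize) {
        RelocBlock block;
        std::memcpy(&block, image.data() + relocRva + cursor, sizeof(block));
        if (block.SizeOfBlock < sizeof(RelocBlock) ||
            block.SizeOfBlock > relocSize - cursor)
            break; // terminator or malformed block

        const std::size_t count =
            (block.SizeOfBlock - sizeof(RelocBlock)) / sizeof(std::uint16_t);
        const std::uint8_t* entries =
            image.data() + relocRva + cursor + sizeof(RelocBlock);

        for (std::size_t i = 0; i < count; ++i) {
            std::uint16_t entry;
            std::memcpy(&entry, entries + i * sizeof(entry), sizeof(entry));
            const std::uint16_t type   = entry >> 12;    // top 4 bits
            const std::uint16_t offset = entry & 0x0FFF; // low 12 bits
            if (type != kRelDir64) continue;             // skip padding and other types

            const std::size_t slotRva = std::size_t{block.VirtualAddress} + offset;
            if (slotRva + sizeof(std::uint64_t) > image.size()) continue;

            std::uint64_t value;
            std::memcpy(&value, image.data() + slotRva, sizeof(value));
            value += delta;
            std::memcpy(image.data() + slotRva, &value, sizeof(value));
            ++patched;
        }
        cursor += block.SizeOfBlock;
    }
    return patched;
}
```

For example, with a pointer 0x180001234 stored at RVA 0x1010, a single block covering page 0x1000 whose entries are 0xA010 and a zero padding entry, and a delta of 0x10000, the slot ends up holding 0x180011234 and the function reports one patched slot.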

That ShellcodeEnd function is an empty dummy. We use it as a marker to calculate the shellcode size (ShellcodeEnd - Shellcode), which only works if the linker places ShellcodeEnd directly after Shellcode. This is why you have to disable LTCG (Link-Time Code Generation) when building: LTCG can reorder functions, and that would break the size calculation.

Launching the Shellcode Remotely

Back in inject(), we write the MappingData struct plus the shellcode bytes into the remote process, then kick off a thread there to run it:

    // Prepare mapping data with kernel32 function pointers
    HMODULE hKernel32 = GetModuleHandleA("kernel32.dll");
    MappingData data{};
    data.pLoadLibraryA   = reinterpret_cast<f_LoadLibraryA>(
        GetProcAddress(hKernel32, "LoadLibraryA"));
    data.pGetProcAddress = reinterpret_cast<f_GetProcAddress>(
        GetProcAddress(hKernel32, "GetProcAddress"));
    data.pBase           = pBase;

    // Calculate shellcode size from the two marker functions
    SIZE_T scSize = reinterpret_cast<BYTE*>(ShellcodeEnd)
                  - reinterpret_cast<BYTE*>(Shellcode);

    // Allocate a block for [MappingData | Shellcode] in the remote process
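    BYTE* pBlock = reinterpret_cast<BYTE*>(VirtualAllocEx(hProc, nullptr,
        sizeof(MappingData) + scSize,
        MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE));
    if (!pBlock) { std::cerr << "[-] VirtualAllocEx (shellcode block) failed.\n"; CloseHandle(hProc); return false; }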

    // Write data struct, then shellcode bytes
    WriteProcessMemory(hProc, pBlock, &data, sizeof(data), nullptr);
    WriteProcessMemory(hProc, pBlock + sizeof(MappingData),
                       reinterpret_cast<BYTE*>(Shellcode), scSize, nullptr);

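    // Execute: thread entry = shellcode, argument = &MappingData
    HANDLE hThread = CreateRemoteThread(hProc, nullptr, 0,
        reinterpret_cast<LPTHREAD_START_ROUTINE>(pBlock + sizeof(MappingData)),
        pBlock, 0, nullptr);
    if (!hThread) { std::cerr << "[-] CreateRemoteThread failed.\n"; CloseHandle(hProc); return false; }

    // Poll until the shellcode sets initialized = TRUE (500 tries x 10 ms, about 5 s)
    bool injected = false;
    for (int i = 0; i < 500; ++i)
    {
        MappingData check{};
        ReadProcessMemory(hProc, pBlock, &check, sizeof(check), nullptr);
        if (check.initialized)
        {
            std::cout << "[+] Injected successfully.\n";
            injected = true;
            break;
        }
        Sleep(10);
    }
    if (!injected) std::cerr << "[-] Timed out waiting for the shellcode.\n";
    CloseHandle(hThread);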
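
The wait-for-a-flag pattern above can also be pulled into a small helper that reports a timeout explicitly. This is a portable sketch, not part of the project: pollUntil is a hypothetical name, and the ready callback stands in for the ReadProcessMemory check on the initialized field.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: calls ready() up to maxAttempts times, sleeping
// for interval between tries. Returns true as soon as ready() does,
// or false if every attempt failed.
bool pollUntil(const std::function<bool()>& ready,
               int maxAttempts,
               std::chrono::milliseconds interval)
{
    for (int i = 0; i < maxAttempts; ++i) {
        if (ready()) return true;
        std::this_thread::sleep_for(interval);
    }
    return false;
}
```

In the injector, ready would read the remote MappingData and return check.initialized != 0, with 500 attempts and a 10 ms interval to match the loop above. A false return then gives the caller one obvious place to log the failure and clean up.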

Here's what the memory looks like in the target process:

pBlock +0x00  ┌──────────────────┐
              │   MappingData    │  <- pLoadLibraryA, pGetProcAddress, pBase, ...
              ├──────────────────┤
pBlock +0x20  │   Shellcode()    │  <- thread entry point (sizeof(MappingData) on x64)
              │       ...        │
              │   ShellcodeEnd() │
              └──────────────────┘

CreateRemoteThread starts execution at pBlock + sizeof(MappingData), which is our shellcode, and passes pBlock itself (the MappingData) as the thread argument. The shellcode does relocations, resolves imports, fires TLS callbacks, calls DllMain, and then sets initialized = TRUE. Our injector polls that flag to know when it's done, then frees the shellcode block. The mapped DLL image stays alive, though: that's our running hook now.
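
Since the thread entry point is simply pBlock + sizeof(MappingData), it can help to sanity-check how large that struct actually is. The sketch below uses a hypothetical mirror, MappingDataLayout, built from standard types in place of the Windows typedefs and covering only the five fields listed in this post. A real build may differ if the struct carries extra members or non-default packing.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for MappingData using standard types (no <windows.h>).
struct MappingDataLayout {
    void*         pLoadLibraryA;    // f_LoadLibraryA
    void*         pGetProcAddress;  // f_GetProcAddress
    std::uint8_t* pBase;            // BYTE*
    std::uint32_t lastError;        // DWORD
    std::int32_t  initialized;      // BOOL
};

// Offset of the shellcode inside the remote block: it starts right after the struct.
constexpr std::size_t shellcodeOffset() { return sizeof(MappingDataLayout); }

// With 8-byte pointers: three pointers (24 bytes) plus two 4-byte fields = 32 bytes.
static_assert(sizeof(void*) != 8 || sizeof(MappingDataLayout) == 32,
              "unexpected layout on a 64-bit target");
```

With 8-byte pointers this mirror measures 32 bytes (0x20), so the shellcode would begin at pBlock + 0x20. On a 32-bit target the same fields add up to 20 bytes.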


Project link

https://github.com/devirtz/ss-overlay-hook