WebSocket MITM Support

This document tracks the engineering journey of getting WebSocket traffic (specifically codex CLI traffic to api.openai.com/v1/responses) captured and decoded by greyproxy.


Background

The OpenAI /v1/responses endpoint uses WebSocket (Upgrade: websocket) instead of HTTP chunked streaming. When codex CLI sends requests through the proxy, they pass through SOCKS5, then MITM TLS termination, then HTTP handling. The server's 101 Switching Protocols response then switches the connection to the WebSocket protocol, which greyproxy did not previously handle.

The WebSocket connection also uses the permessage-deflate extension, which compresses each message using raw DEFLATE. This added a second layer of complexity on top of the WebSocket framing.


Issue History

Issue 1: 101 Switching Protocols exits httpRoundTrip before hooks fire

Symptom: No traffic from api.openai.com/v1/responses appeared in the activity log. The proxy was transparently passing the WebSocket tunnel without capturing anything.

Root cause: httpRoundTrip checked resp.StatusCode == 200 before firing OnHTTPRoundTrip. A 101 response caused an early return via handleUpgradeResponse, bypassing the hook entirely.

Fix: Added an explicit 101 branch before the main response path that fires the hook with the upgrade request/response metadata, then hands off to handleUpgradeResponse. This makes the 101 visible in the activity log with full request and response headers.
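
The shape of that branch can be sketched as follows. This is a minimal illustration, not greyproxy's code: roundTripHook, dispatchResponse, and countHookCalls are hypothetical names, and the real hook most likely takes different arguments.

```go
package main

import "net/http"

// roundTripHook is a hypothetical stand-in for the OnHTTPRoundTrip hook.
type roundTripHook func(req *http.Request, resp *http.Response)

// dispatchResponse fires the hook and reports whether the caller should hand
// the connection to the upgrade handler. Name and signature are illustrative.
func dispatchResponse(req *http.Request, resp *http.Response, hook roundTripHook) (upgrade bool) {
	if resp.StatusCode == http.StatusSwitchingProtocols {
		// Record the upgrade before the connection leaves HTTP handling.
		hook(req, resp)
		return true
	}
	// Ordinary responses continue through the regular path.
	hook(req, resp)
	return false
}

// countHookCalls drives dispatchResponse with a synthetic response and
// returns how often the hook fired and whether an upgrade was signalled.
func countHookCalls(status int) (calls int, upgrade bool) {
	req, err := http.NewRequest(http.MethodGet, "https://api.openai.com/v1/responses", nil)
	if err != nil {
		return 0, false
	}
	req.Header.Set("Upgrade", "websocket")
	resp := &http.Response{StatusCode: status, Header: http.Header{}}
	upgrade = dispatchResponse(req, resp, func(*http.Request, *http.Response) { calls++ })
	return calls, upgrade
}
```

Calling countHookCalls(101) reports one hook call plus the upgrade signal, so the 101 is logged before the hand-off rather than skipped.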


Issue 2: WebSocket frames not captured at all

Symptom: Even after the 101 appeared in the UI, no frame-level content was captured. handleUpgradeResponse fell through to a plain io.Copy pipe.

Root cause: sniffing.websocket was not set to true in the service metadata config, so h.Websocket was false and handleUpgradeResponse skipped the sniffingWebsocketFrame path.

Fix: Added sniffing.websocket: true to both HTTP and SOCKS5 service metadata in greyproxy.yml and in the test matrix isolated config.


Issue 3: Captured WebSocket frame payloads were binary (masked, not unmasked)

Symptom: Frames were being captured and stored in the DB, but the request_body column contained binary garbage rather than readable JSON.

Root cause (part 1 — masking): RFC 6455 requires client→server frames to be XOR-masked with a 4-byte key. In copyWebsocketFrame, the in-place XOR was done on buf.Bytes() directly:

payload := buf.Bytes() // slice into buf's backing array, not a copy
for i := range payload {
	payload[i] ^= mask[i%4] // modifies buf in place!
}

Then the forwarding used:

fr.Data = io.MultiReader(bytes.NewReader(buf.Bytes()), fr.Data)

Because buf.Bytes() was modified in-place, the forwarded bytes were already unmasked but the frame header still declared Masked=true with the original key. The server XOR'd the data again, producing garbage. codex detected the corrupted responses and fell back from WebSocket to plain HTTP after 5 attempts.

Fix: Make an independent copy before unmasking, so the captured payload is plaintext but the forwarded wire bytes remain correctly masked:

payload := make([]byte, buf.Len())
copy(payload, buf.Bytes()) // independent copy
for i := range payload {
	payload[i] ^= mask[i%4] // unmask the copy only
}
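
To make the aliasing concrete, here is a small self-contained sketch of the copy-then-unmask approach. The helper names (unmaskCopy, maskedFrame, demoUnmask) and the mask key are made up for illustration; only the copy and XOR logic mirrors the fix above.

```go
package main

import "bytes"

// unmaskCopy returns an unmasked copy of a client frame payload and leaves
// the buffered wire bytes untouched so they can be forwarded unchanged.
func unmaskCopy(buf *bytes.Buffer, mask [4]byte) []byte {
	payload := make([]byte, buf.Len())
	copy(payload, buf.Bytes()) // independent copy, not a view into buf
	for i := range payload {
		payload[i] ^= mask[i%4]
	}
	return payload
}

// maskedFrame builds a masked payload the way a client would (RFC 6455).
func maskedFrame(plain string, mask [4]byte) *bytes.Buffer {
	wire := []byte(plain)
	for i := range wire {
		wire[i] ^= mask[i%4]
	}
	return bytes.NewBuffer(wire)
}

// demoUnmask returns the captured plaintext and whether the wire bytes in
// the buffer are still identical to what the client sent.
func demoUnmask(plain string) (captured string, wireIntact bool) {
	mask := [4]byte{0x12, 0x34, 0x56, 0x78} // arbitrary example key
	buf := maskedFrame(plain, mask)
	before := append([]byte(nil), buf.Bytes()...)
	captured = string(unmaskCopy(buf, mask))
	return captured, bytes.Equal(before, buf.Bytes())
}
```

demoUnmask returns the original plaintext together with true: the hook sees readable JSON while the buffer still holds the masked bytes that match the frame header.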

Issue 4: permessage-deflate frames could not be decompressed (shared context)

Symptom: After the masking fix, the WebSocket connection stayed up and frames were captured with rsv1=true, but decompression failed with unexpected EOF or invalid code lengths set. The server returned Sec-Websocket-Extensions: permessage-deflate (without no_context_takeover).

Root cause: By default, permessage-deflate keeps one shared DEFLATE context (the sliding window) per direction for the whole session. Each frame's compressed bytes are a continuation of the previous frame's DEFLATE stream and can reference data from earlier frames. Decompressing each frame independently (with a fresh flate.NewReader) therefore fails for any frame after the first.
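
The failure can be reproduced in isolation. The sketch below is an assumption-laden model of the sender, not captured traffic: compressShared uses a single flate.Writer for two identical messages, and inflateAlone decodes each frame with a fresh reader. All helper names are hypothetical, and the 9-byte tail is the one described under Issue 5.

```go
package main

import (
	"bytes"
	"compress/flate"
	"io"
	"strings"
)

// tail restores the stripped sync-flush marker and adds a final empty block
// so that a failure here is caused by missing context, not a missing EOF.
const tail = "\x00\x00\xff\xff\x01\x00\x00\xff\xff"

// compressShared compresses every message with ONE flate.Writer, which is
// what a sender with context takeover effectively does.
func compressShared(msgs []string) ([][]byte, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.DefaultCompression)
	if err != nil {
		return nil, err
	}
	var frames [][]byte
	for _, m := range msgs {
		buf.Reset()
		if _, err := io.WriteString(w, m); err != nil {
			return nil, err
		}
		if err := w.Flush(); err != nil { // sync flush, keeps the window
			return nil, err
		}
		b := bytes.TrimSuffix(buf.Bytes(), []byte("\x00\x00\xff\xff"))
		frames = append(frames, append([]byte(nil), b...))
	}
	return frames, nil
}

// inflateAlone decodes one frame with a fresh reader and no history.
func inflateAlone(frame []byte) (string, error) {
	r := flate.NewReader(io.MultiReader(bytes.NewReader(frame), strings.NewReader(tail)))
	defer r.Close()
	out, err := io.ReadAll(r)
	return string(out), err
}

// demoSharedContext reports whether frame 1 decodes alone and frame 2 fails.
func demoSharedContext() (firstOK, secondFailed bool) {
	msg := `{"type":"response.create","model":"example-model","input":"hello hello hello"}`
	frames, err := compressShared([]string{msg, msg})
	if err != nil {
		return false, false
	}
	first, err1 := inflateAlone(frames[0])
	_, err2 := inflateAlone(frames[1])
	return err1 == nil && first == msg, err2 != nil
}
```

The second frame encodes its content as back-references into the first message, so a reader without that history has nothing to resolve them against.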

Fix: Rewrite Sec-Websocket-Extensions in the WebSocket upgrade request (before sending it to the upstream server) to force no_context_takeover in both directions:

if strings.EqualFold(req.Header.Get("Upgrade"), "websocket") {
	ext := req.Header.Get("Sec-Websocket-Extensions")
	if strings.Contains(strings.ToLower(ext), "permessage-deflate") {
		req.Header.Set("Sec-Websocket-Extensions",
			"permessage-deflate; client_no_context_takeover; server_no_context_takeover")
	}
}

OpenAI's server accepted this and responded with permessage-deflate; server_no_context_takeover; client_no_context_takeover. Each frame is now independently decompressible.
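
Since per-frame decompression only works when the server actually agrees, it may be worth checking the 101 response before relying on it. The following is a sketch of such a check; noContextTakeoverAccepted and acceptedFor are hypothetical helpers, not part of greyproxy.

```go
package main

import (
	"net/http"
	"strings"
)

// noContextTakeoverAccepted reports whether a response header accepts
// permessage-deflate with both no_context_takeover parameters.
func noContextTakeoverAccepted(h http.Header) bool {
	for _, v := range h.Values("Sec-Websocket-Extensions") {
		for _, ext := range strings.Split(v, ",") {
			params := strings.Split(ext, ";")
			if !strings.EqualFold(strings.TrimSpace(params[0]), "permessage-deflate") {
				continue
			}
			var client, server bool
			for _, p := range params[1:] {
				switch strings.ToLower(strings.TrimSpace(p)) {
				case "client_no_context_takeover":
					client = true
				case "server_no_context_takeover":
					server = true
				}
			}
			if client && server {
				return true
			}
		}
	}
	return false
}

// acceptedFor wraps a raw header value for quick experiments.
func acceptedFor(value string) bool {
	h := http.Header{}
	h.Set("Sec-WebSocket-Extensions", value)
	return noContextTakeoverAccepted(h)
}
```

With the response quoted above, acceptedFor returns true; with a bare permessage-deflate it returns false, which would be a signal to skip decompression or keep per-connection state instead.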


Issue 5: Go's compress/flate returns unexpected EOF on valid DEFLATE frames

Symptom: Even after forcing no_context_takeover, decompression still failed with unexpected EOF for every frame. Python's zlib.decompressobj(-15) successfully decompressed the same bytes.

Root cause: The permessage-deflate spec requires senders to strip the 4-byte SYNC_FLUSH trailer (\x00\x00\xff\xff) from the end of each frame's compressed payload. Receivers are expected to re-append it before decompressing. The initial decompressWebSocketFrame did this:

payload = append(payload, 0x00, 0x00, 0xff, 0xff)
r := flate.NewReader(bytes.NewReader(payload))

However, the SYNC_FLUSH block has BFINAL=0 (it is not the last block). Go's pure-Go compress/flate implementation requires a BFINAL=1 block to signal end-of-stream and return a clean io.EOF. Python's zlib module (backed by libz) returns the data decoded so far without requiring a final block, so the same bytes decompress there.

Fix (gorilla/websocket technique): Append both the SYNC_FLUSH trailer AND an additional empty BFINAL=1 stored block:

// First 4 bytes: 00 00 ff ff     SYNC_FLUSH trailer (stripped by sender)
// Last 5 bytes:  01 00 00 ff ff  BFINAL=1 empty stored block (Go needs this)
const tail = "\x00\x00\xff\xff\x01\x00\x00\xff\xff"
mr := io.MultiReader(bytes.NewReader(payload), strings.NewReader(tail))
r := flate.NewReader(mr)

\x01\x00\x00\xff\xff = BFINAL=1, BTYPE=00 (non-compressed), LEN=0, NLEN=0xFFFF. This makes Go's flate reader terminate cleanly.
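
The difference between the two tails can be checked with a round trip through compress/flate alone. This is a sketch under the assumption that a flate.Writer followed by Flush and trailer stripping is a fair stand-in for a permessage-deflate sender; inflateWithTail, deflateMessage, and demoTail are illustrative names.

```go
package main

import (
	"bytes"
	"compress/flate"
	"io"
	"strings"
)

const (
	syncFlushTrailer = "\x00\x00\xff\xff"     // stripped by the sender
	finalEmptyBlock  = "\x01\x00\x00\xff\xff" // BFINAL=1, stored, LEN=0
)

// inflateWithTail decompresses a frame payload after appending tail.
func inflateWithTail(payload []byte, tail string) (string, error) {
	r := flate.NewReader(io.MultiReader(bytes.NewReader(payload), strings.NewReader(tail)))
	defer r.Close()
	out, err := io.ReadAll(r)
	return string(out), err
}

// deflateMessage mimics the sender: compress, sync flush, strip the trailer.
func deflateMessage(msg string) ([]byte, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.DefaultCompression)
	if err != nil {
		return nil, err
	}
	if _, err := io.WriteString(w, msg); err != nil {
		return nil, err
	}
	if err := w.Flush(); err != nil {
		return nil, err
	}
	return bytes.TrimSuffix(buf.Bytes(), []byte(syncFlushTrailer)), nil
}

// demoTail reports whether the 4-byte tail alone fails, and returns the
// output obtained with the full 9-byte tail.
func demoTail(msg string) (shortTailFailed bool, fullTailOutput string) {
	payload, err := deflateMessage(msg)
	if err != nil {
		return false, ""
	}
	_, errShort := inflateWithTail(payload, syncFlushTrailer)
	out, errFull := inflateWithTail(payload, syncFlushTrailer+finalEmptyBlock)
	if errFull != nil {
		return errShort != nil, ""
	}
	return errShort != nil, out
}
```

With only the 4-byte trailer, io.ReadAll still returns the decoded bytes but pairs them with an error; the extra 5 bytes are what turn that into a clean end of stream.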


Issue 6: Conversation assembler crashes on WS_REQ/WS_RESP rows (NULL response_content_type)

Symptom: After WebSocket frames were stored as WS_REQ/WS_RESP transactions, the conversation assembler logged 27+ warnings:

WARN assembler: failed to scan transaction row error="sql: Scan error on column index 8,
name \"response_content_type\": converting NULL to string is unsupported"

The assembler skipped every WS transaction, so no WebSocket sessions appeared in the conversation view.

Root cause: WS frame rows have response_content_type = NULL (they have no HTTP response, only a WebSocket payload). The assembler scanned this column into a string variable, which Go's database/sql refuses to do for NULL values.

Fix: Changed the scan variable type from string to *string and added a derefString helper that returns "" for nil pointers.
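
A helper along these lines is enough. The txRow type and its field are hypothetical and only show where the pointer ends up after scanning.

```go
package main

// derefString returns the pointed-to string, or "" when the pointer is nil
// (which is what database/sql produces for a NULL column scanned into *string).
func derefString(s *string) string {
	if s == nil {
		return ""
	}
	return *s
}

// txRow is a hypothetical slice of the assembler's row type.
type txRow struct {
	ResponseContentType *string // NULL for WS_REQ/WS_RESP rows
}

// contentType gives callers a plain string regardless of NULL.
func (r txRow) contentType() string {
	return derefString(r.ResponseContentType)
}
```

database/sql's sql.NullString would work as well; the pointer variant keeps the rest of the assembler working with plain strings.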


Current State

Capability | Status
101 Switching Protocols logged | done
WebSocket frames captured | done
Client→server frames correctly forwarded (masking preserved) | done
permessage-deflate frames decompressed | done
no_context_takeover negotiated transparently | done
WS frames visible in activity log as WS_REQ/WS_RESP | done
Conversation assembler handles WS rows without crashing | done
Conversation assembly from WebSocket sessions | work in progress

Architecture Notes

Frame capture flow

Client (codex)
│ [SOCKS5]

greyproxy SOCKS5 listener
│ [MITM TLS termination]

httpRoundTrip (sniffer.go)
│ GET /v1/responses → 101
│ [rewrites Sec-Websocket-Extensions before req.Write to force no_context_takeover]

handleUpgradeResponse → sniffingWebsocketFrame
│ two goroutines: client→server and server→client

copyWebsocketFrame (per frame)
│ 1. fr.ReadFrom(r) — reads frame header + LimitReader for body
│ 2. io.Copy(buf, fr.Data) — drains body into buffer
│ 3. make+copy → payload — independent copy for hook
│ 4. XOR unmask payload — client frames only (Masked=true)
│ 5. GlobalWebSocketFrameHook — fires with unmasked payload
│ 6. fr.Data = MultiReader(buf.Bytes(), fr.Data) — reassemble masked wire bytes
│ 7. fr.WriteTo(w) — forward original masked frame

program.go hook goroutine
│ if RSV1: decompressWebSocketFrame (append 9-byte tail, flate.NewReader)

greyproxy.CreateHttpTransaction (WS_REQ or WS_RESP)

Compression tail breakdown

Bytes appended before decompression:
00 00 FF FF: SYNC_FLUSH trailer (stripped by the sender per RFC 7692, re-appended by the receiver)
01 00 00 FF FF: BFINAL=1 empty stored block (required by Go's flate)
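
For readers who want to verify the last five bytes by hand, the sketch below decodes them as a byte-aligned DEFLATE stored-block header. storedBlock and parseStoredBlock are illustrative names, and the parser assumes the block starts on a byte boundary, which holds here because the preceding sync-flush block ends on one.

```go
package main

import "encoding/binary"

// storedBlock holds the fields of a byte-aligned DEFLATE stored block header.
type storedBlock struct {
	Final bool   // BFINAL bit
	Type  uint8  // BTYPE, 0 means stored (non-compressed)
	Len   uint16 // LEN, little-endian
	NLen  uint16 // NLEN, one's complement of LEN
}

// parseStoredBlock decodes a 5-byte header and reports whether it is a
// well-formed stored block (BTYPE=00 and NLEN == ^LEN).
func parseStoredBlock(b []byte) (storedBlock, bool) {
	if len(b) != 5 {
		return storedBlock{}, false
	}
	blk := storedBlock{
		Final: b[0]&0x01 == 1,
		Type:  (b[0] >> 1) & 0x03,
		Len:   binary.LittleEndian.Uint16(b[1:3]),
		NLen:  binary.LittleEndian.Uint16(b[3:5]),
	}
	return blk, blk.Type == 0 && blk.NLen == ^blk.Len
}
```

Feeding it 01 00 00 FF FF yields Final=true, Type=0, Len=0, NLen=0xFFFF, matching the breakdown given for Issue 5.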