MITM standalone worker protocol

This document defines the worker-facing protocol that sing-box uses for external MITM workers. It is intentionally command-agnostic: sing-box only requires a connected Unix domain socket stream and never assumes a specific embedded runtime.

Lifecycle

  1. sing-box establishes one Unix domain socket stream to the worker master process.
  2. Stream 0 is reserved for startup negotiation.
  3. sing-box sends HELLO.
  4. The worker replies with HELLO.
  5. sing-box sends CAPS with the selected protocol version and negotiated capability set.
  6. The worker replies with matching CAPS.
  7. Only after both CAPS frames match may live traffic start.
  8. Each intercepted request or HTTP/2 stream uses one logical StreamID on the same socket.

If negotiation fails, sing-box rejects the worker before serving live traffic.

Frame format

Every frame uses the same 9-byte header:

StreamID[4] + Type[1] + Length[4]
  • StreamID: unsigned big-endian 32-bit integer.
  • Type: unsigned 8-bit frame type.
  • Length: unsigned big-endian 32-bit payload length.

StreamID = 0 is reserved for session negotiation only.
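
The header layout above can be sketched in Python with the standard struct module. This is a minimal illustration, not sing-box code; the helper names encode_frame and decode_frame are hypothetical.

```python
import struct

# StreamID[4] + Type[1] + Length[4], all big-endian, no padding: 9 bytes.
HEADER = struct.Struct(">IBI")

def encode_frame(stream_id: int, frame_type: int, payload: bytes) -> bytes:
    """Prefix the payload with the 9-byte header."""
    return HEADER.pack(stream_id, frame_type, len(payload)) + payload

def decode_frame(buf: bytes):
    """Return (stream_id, frame_type, payload, rest), or None if buf is incomplete."""
    if len(buf) < HEADER.size:
        return None
    stream_id, frame_type, length = HEADER.unpack_from(buf)
    end = HEADER.size + length
    if len(buf) < end:
        return None
    return stream_id, frame_type, buf[HEADER.size:end], buf[end:]
```

Returning None for an incomplete buffer lets a caller keep accumulating bytes until a whole frame is available.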

Frame types

Value  Name
0x01   control
0x02   request_headers
0x03   request_body
0x04   request_trailers
0x05   response_headers
0x06   response_body
0x07   response_trailers

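request_body and response_body payloads are raw bytes. Header and trailer payloads are UTF-8 JSON objects with strict unknown-field rejection. control payloads follow the same JSON rules but are prefixed by one subtype byte (see Control frames).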

Control frames

control payloads start with one extra subtype byte, followed by a JSON object.

Value  Subtype  Scope
0x01   HELLO    Session startup on stream 0
0x02   OPEN     Per-stream open / worker decision
0x03   END      Half-close or full-close a logical stream
0x04   ABORT    Abort/reset a logical stream
0x05   ERROR    Structured protocol/runtime error
0x06   WINDOW   Optional flow-control credit update
0x07   CAPS     Session startup capability confirmation

Unknown critical control subtypes are fatal protocol errors.
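
As a rough sketch, a control payload can be split into its subtype byte and JSON body like this. The function names are hypothetical, and this sketch treats every subtype outside the table as fatal, since this document does not say how a non-critical subtype would be recognized.

```python
import json

# Subtype values from the table above.
SUBTYPES = {
    0x01: "HELLO", 0x02: "OPEN", 0x03: "END", 0x04: "ABORT",
    0x05: "ERROR", 0x06: "WINDOW", 0x07: "CAPS",
}

def encode_control(subtype: int, body: dict) -> bytes:
    """One subtype byte followed by a compact UTF-8 JSON object."""
    return bytes([subtype]) + json.dumps(body, separators=(",", ":")).encode("utf-8")

def decode_control(payload: bytes):
    """Return (subtype_name, body). Raises ValueError on any malformed payload."""
    if not payload:
        raise ValueError("empty control payload")
    subtype = payload[0]
    if subtype not in SUBTYPES:
        # Assumption: all unknown subtypes are treated as critical here.
        raise ValueError(f"unknown control subtype 0x{subtype:02x}")
    body = json.loads(payload[1:].decode("utf-8"))
    if not isinstance(body, dict):
        raise ValueError("control body must be a JSON object")
    return SUBTYPES[subtype], body
```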

Version and capability negotiation

HELLO

{
  "min_version": 1,
  "max_version": 1,
  "required_capabilities": 217,
  "supported_capabilities": 511
}
  • min_version / max_version: inclusive supported range.
  • required_capabilities: bits that must be present in the final negotiated set.
  • supported_capabilities: full bitset the peer understands.

sing-box rejects:

  • non-overlapping version ranges,
  • required bits that are not a subset of supported bits,
  • any unknown capability bit.
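
The rejection rules above can be sketched as follows. The checks mirror the three bullets; the selection policy is an assumption, because this document does not state it: the sketch picks the highest common version and uses the intersection of both supported sets as the negotiated set. The names check_hello and negotiate are hypothetical.

```python
KNOWN_CAPS = (1 << 9) - 1  # bits 0..8 are the only ones defined in this document

def check_hello(hello: dict) -> None:
    """Apply the per-HELLO rejection rules."""
    required = hello["required_capabilities"]
    supported = hello["supported_capabilities"]
    if hello["min_version"] > hello["max_version"]:
        raise ValueError("empty version range")
    if (required | supported) & ~KNOWN_CAPS:
        raise ValueError("unknown capability bit")
    if required & ~supported:
        raise ValueError("required bits are not a subset of supported bits")

def negotiate(local: dict, peer: dict) -> dict:
    """Return a CAPS-shaped dict, or raise ValueError if negotiation fails."""
    check_hello(local)
    check_hello(peer)
    low = max(local["min_version"], peer["min_version"])
    high = min(local["max_version"], peer["max_version"])
    if low > high:
        raise ValueError("non-overlapping version ranges")
    # Assumption: negotiated set = intersection of both supported sets.
    caps = local["supported_capabilities"] & peer["supported_capabilities"]
    required = local["required_capabilities"] | peer["required_capabilities"]
    if required & ~caps:
        raise ValueError("a required capability is missing from the negotiated set")
    # Assumption: the highest common version is selected.
    return {"version": high, "capabilities": caps}
```

Under these assumptions, a peer that supports only bits 0 through 7 would negotiate down to 255 against the HELLO example above.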

CAPS

{
  "version": 1,
  "capabilities": 255
}

Both sides must send the same selected version and negotiated capability set. Any mismatch is a fatal startup failure.

Capability bits

Bit     Name               Meaning
1 << 0  request_headers    request-header surface is available
1 << 1  request_body       request body frames are allowed
1 << 2  request_trailers   request trailer frames are allowed
1 << 3  response_headers   response-header surface is available
1 << 4  response_body      response body frames are allowed
1 << 5  response_trailers  response trailer frames are allowed
1 << 6  direct_response    worker may short-circuit with its own response
1 << 7  pass_through       worker may explicitly decline interception
1 << 8  window_updates     WINDOW frames are allowed

Workers must not send a frame that depends on a capability which was not negotiated.
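
One way a worker could guard against sending an un-negotiated frame is to model the bits as flags and check each outgoing frame type. This is a sketch: the mapping from frame type to capability is an assumption based on the matching names, and Cap and frame_allowed are hypothetical names.

```python
from enum import IntFlag

class Cap(IntFlag):
    REQUEST_HEADERS = 1 << 0
    REQUEST_BODY = 1 << 1
    REQUEST_TRAILERS = 1 << 2
    RESPONSE_HEADERS = 1 << 3
    RESPONSE_BODY = 1 << 4
    RESPONSE_TRAILERS = 1 << 5
    DIRECT_RESPONSE = 1 << 6
    PASS_THROUGH = 1 << 7
    WINDOW_UPDATES = 1 << 8

# Assumed mapping: each data frame type depends on the capability of the same name.
FRAME_CAP = {
    0x02: Cap.REQUEST_HEADERS, 0x03: Cap.REQUEST_BODY, 0x04: Cap.REQUEST_TRAILERS,
    0x05: Cap.RESPONSE_HEADERS, 0x06: Cap.RESPONSE_BODY, 0x07: Cap.RESPONSE_TRAILERS,
}

def frame_allowed(frame_type: int, negotiated: int) -> bool:
    """True if the frame type needs no capability or its capability was negotiated."""
    needed = FRAME_CAP.get(frame_type)
    return needed is None or bool(Cap(negotiated) & needed)
```

For example, the required_capabilities value 217 from the HELLO example decodes to request_headers, response_headers, response_body, direct_response, and pass_through.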

Stream opening

sing-box opens each logical stream with OPEN on a non-zero StreamID:

{
  "protocol": "http/1.1",
  "module": "capture",
  "request_body": true,
  "request_trailers": true,
  "response_body": true,
  "response_trailers": true,
  "is_connect": false,
  "is_extended_connect": false
}

The worker must answer with its own OPEN decision:

{"decision":"intercept"}

Allowed values:

  • intercept
  • direct_response
  • pass_through

direct_response requires negotiated direct_response capability. pass_through requires negotiated pass_through capability.
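
A receiver could validate the worker's OPEN reply roughly as follows. The sketch assumes the reply carries only the decision field shown above; check_decision is a hypothetical name.

```python
DIRECT_RESPONSE = 1 << 6
PASS_THROUGH = 1 << 7

# Capability each decision depends on (0 means none).
DECISION_CAP = {
    "intercept": 0,
    "direct_response": DIRECT_RESPONSE,
    "pass_through": PASS_THROUGH,
}

def check_decision(body: dict, negotiated: int) -> str:
    """Return the decision string, or raise ValueError if it is not acceptable."""
    if set(body) != {"decision"}:
        # Assumption: no other fields are defined for the worker's OPEN reply.
        raise ValueError("unexpected or missing fields in OPEN reply")
    decision = body["decision"]
    if not isinstance(decision, str) or decision not in DECISION_CAP:
        raise ValueError(f"unknown decision: {decision!r}")
    needed = DECISION_CAP[decision]
    if needed and not negotiated & needed:
        raise ValueError(f"{decision} was not negotiated")
    return decision
```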

Redaction requirements

Workers should treat all decrypted HTTP content as sensitive.

  • Normal sing-box observability does not emit decrypted body bytes.
  • Normal sing-box observability does not emit secret-style headers such as Authorization, Proxy-Authorization, Cookie, Set-Cookie, token headers, or API keys.
  • Workers should follow the same rule in their own logs unless an operator has explicitly built a separate secure audit path.

Request metadata that appears in structured MITM events is limited to redaction-safe fields such as module tag, protocol, decision, authority/path/query, safe headers, and request body presence or size.
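
A worker that logs headers might apply a filter along these lines before writing anything. This is only a sketch: the exact set of sensitive names and the substring heuristic for token and API-key headers are assumptions, and redact_headers is a hypothetical helper.

```python
# Header names listed explicitly in the redaction rules above.
SENSITIVE_NAMES = {"authorization", "proxy-authorization", "cookie", "set-cookie"}
# Hypothetical heuristic for "token headers" and "API keys".
SENSITIVE_HINTS = ("token", "api-key", "apikey", "secret")

def redact_headers(headers: dict) -> dict:
    """Return a copy of the header map that is safe to log."""
    safe = {}
    for name, values in headers.items():
        lower = name.lower()
        if lower in SENSITIVE_NAMES or any(hint in lower for hint in SENSITIVE_HINTS):
            # Keep the header name and value count, drop the values.
            safe[name] = ["<redacted>"] * len(values)
        else:
            safe[name] = list(values)
    return safe
```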

Header payloads

sing-box → worker request headers

{
  "method": "GET",
  "scheme": "https",
  "authority": "example.com",
  "path": "/v1/data?ok=1",
  "headers": {
    "accept": ["application/json"]
  }
}

worker → sing-box request mutation

{
  "authority": "rewritten.example",
  "path": "/internal/data?ok=1",
  "headers": {
    "x-policy": ["capture"]
  }
}

The worker mutation frame does not include method. Any attempt to send method, outbound, tls, fsm, or other unknown fields is rejected.
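
Strict unknown-field rejection for this frame can be sketched as below. The allowed set is inferred from the example above and may not be the complete list; parse_request_mutation is a hypothetical name.

```python
import json

# Assumption: only the fields shown in the mutation example are allowed.
ALLOWED_MUTATION_FIELDS = {"authority", "path", "headers"}

def parse_request_mutation(payload: bytes) -> dict:
    """Parse a worker request mutation, rejecting any field outside the allowed set."""
    body = json.loads(payload.decode("utf-8"))
    if not isinstance(body, dict):
        raise ValueError("mutation payload must be a JSON object")
    unknown = set(body) - ALLOWED_MUTATION_FIELDS
    if unknown:
        raise ValueError(f"rejected fields: {sorted(unknown)}")
    return body
```

Workers can run the same check on their own output before sending, which surfaces contract violations locally instead of as a protocol failure.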

Response headers

{
  "status_code": 200,
  "headers": {
    "content-type": ["application/json"]
  }
}

For direct_response, the first worker response_headers frame must include status_code.

Trailers

{
  "headers": {
    "grpc-status": ["0"]
  }
}

Allowed worker outputs

The worker may only do the following:

  • mutate request headers, body, and trailers,
  • mutate response headers, body, and trailers,
  • send a direct response,
  • abort/reset a logical stream,
  • send an explicit pass-through decision,
  • send WINDOW updates only when window_updates was negotiated.

Prohibited worker outputs

The worker must not attempt to control sing-box outside the explicit contract.

Rejected examples:

  • request method mutation,
  • outbound selection or outbound tag changes,
  • TLS engine or certificate changes,
  • generic FSM / scheduler / transport control,
  • HTTP pseudo-header injection in normal header maps,
  • Host mutation through the header map instead of authority,
  • unknown capability bits,
  • unknown fields inside JSON payloads,
  • malformed stream transitions such as response_body before response_headers.
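
Two of the rejected cases above, pseudo-header injection and Host mutation through the header map, can be caught with a small check on the header map itself. This is an illustrative sketch; check_header_map is a hypothetical name, and the value-shape check assumes header values are always lists of strings as in the examples in this document.

```python
def check_header_map(headers: dict) -> None:
    """Raise ValueError if a worker header map contains a prohibited entry."""
    for name, values in headers.items():
        if name.startswith(":"):
            raise ValueError(f"pseudo-header not allowed in header map: {name}")
        if name.lower() == "host":
            raise ValueError("change the host through 'authority', not the header map")
        # Assumption: every value is a list of strings, as in the examples.
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            raise ValueError(f"header {name!r} must map to a list of strings")
```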

Stream closure and errors

END

{"target":"request"}

Valid target values:

  • request
  • response
  • stream

ABORT

{
  "target": "stream",
  "code": "policy_denied",
  "message": "blocked by worker policy"
}

ERROR

{
  "code": "bad_frame",
  "message": "unknown critical control subtype"
}

WINDOW

{
  "target": "response",
  "credit": 32768
}

WINDOW is invalid unless window_updates was negotiated.

Failure semantics

  • Negotiation failure is a session-level fatal error. sing-box closes the worker connection and does not accept traffic.
  • Parse failure, unknown critical subtype, forbidden mutation, or impossible state transition is a deterministic protocol failure.
  • For supported MITM flows, sing-box fails closed on worker protocol failures rather than silently bypassing policy.
  • pass_through is explicit. Absence of a valid worker decision is not an implicit pass-through.
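
An impossible state transition can be detected with a small per-direction tracker. Only the "body before headers" case is named in this document; the other rules in this sketch (trailers also require headers, nothing follows trailers) are assumptions, and DirectionOrder is a hypothetical name.

```python
class DirectionOrder:
    """Tracks frame order for one direction (request or response) of one stream."""

    def __init__(self) -> None:
        self.seen_headers = False
        self.seen_trailers = False

    def accept(self, kind: str) -> None:
        """Record a frame of kind 'headers', 'body', or 'trailers', or raise ValueError."""
        if kind not in ("headers", "body", "trailers"):
            raise ValueError(f"unknown frame kind: {kind}")
        if self.seen_trailers:
            # Assumption: trailers end the direction.
            raise ValueError(f"{kind} after trailers")
        if kind != "headers" and not self.seen_headers:
            raise ValueError(f"{kind} before headers")
        if kind == "headers":
            self.seen_headers = True
        elif kind == "trailers":
            self.seen_trailers = True
```

Because the check depends only on the frames already seen, the same input always fails at the same frame, which matches the deterministic failure behavior described above.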

Example transcript: intercepted request with request rewrite

S→W  stream=0  control/HELLO  {"min_version":1,"max_version":1,...}
W→S  stream=0  control/HELLO  {"min_version":1,"max_version":1,...}
S→W  stream=0  control/CAPS   {"version":1,"capabilities":255}
W→S  stream=0  control/CAPS   {"version":1,"capabilities":255}

S→W  stream=1  control/OPEN   {"protocol":"http/1.1","module":"capture","request_body":true}
S→W  stream=1  request_headers {"method":"GET","scheme":"https","authority":"example.com","path":"/v1/data"}
W→S  stream=1  control/OPEN   {"decision":"intercept"}
W→S  stream=1  request_headers {"authority":"rewritten.example","path":"/internal/data"}
S→W  stream=1  request_body    <bytes>
S→W  stream=1  control/END     {"target":"request"}
W→S  stream=1  response_headers {"status_code":200,"headers":{"x-worker":["1"]}}
W→S  stream=1  response_body    <bytes>
W→S  stream=1  control/END      {"target":"response"}

Example transcript: direct response

S→W  stream=7  control/OPEN      {"protocol":"h2","module":"capture"}
S→W  stream=7  request_headers   {"method":"GET","scheme":"https","authority":"api.example","path":"/healthz"}
W→S  stream=7  control/OPEN      {"decision":"direct_response"}
W→S  stream=7  response_headers  {"status_code":503,"headers":{"content-type":["text/plain"]}}
W→S  stream=7  response_body     <"worker unavailable">
W→S  stream=7  control/END       {"target":"response"}

Example transcript: explicit pass-through

S→W  stream=11 control/OPEN  {"protocol":"http/1.1","module":"capture"}
S→W  stream=11 request_headers {"method":"CONNECT","authority":"db.example:443"}
W→S  stream=11 control/OPEN  {"decision":"pass_through"}
W→S  stream=11 control/END   {"target":"stream"}

Implementation notes for standalone workers

  • Treat the UDS stream as an ordered byte stream; do not rely on message boundaries from the socket layer.
  • Use big-endian parsing for both StreamID and Length.
  • Reserve stream 0 for negotiation only.
  • Reject or surface unknown fields in your own parser too; do not silently discard contract changes.
  • Do not reverse-engineer sing-box internals for unsupported control. If the contract does not expose it, the worker may not control it.
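
The first two notes can be sketched as a read loop over a stream socket. A single recv call may return fewer bytes than requested, so the sketch keeps reading until it has exactly the header and then exactly the payload. The names recv_exact and read_frame are hypothetical.

```python
import socket
import struct

HEADER = struct.Struct(">IBI")  # StreamID, Type, Length (big-endian)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, looping over short reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket mid-frame")
        buf += chunk
    return bytes(buf)

def read_frame(sock: socket.socket):
    """Return (stream_id, frame_type, payload) for the next frame on the socket."""
    stream_id, frame_type, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return stream_id, frame_type, recv_exact(sock, length)
```

A production worker would likely also cap Length before allocating; this document does not define a maximum payload size, so none is enforced here.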