On April 22, 2026, OpenAI published Speeding up agentic workflows with WebSockets in the Responses API, describing WebSocket transport for the same Responses API surface developers already use—aimed especially at agent-style workloads with many model–tool round trips (e.g. coding agents like Codex).

What OpenAI announced

From the announcement and the WebSocket mode platform guide:

  • Persistent WebSocket — clients keep a connection to /v1/responses and continue turns with new input items plus previous_response_id, rather than repeating a full HTTP handshake on each hop.
  • Familiar request shape — OpenAI states that the shipped WebSocket mode keeps using response.create with the same payload shape as HTTP; previous_response_id chains context from the prior response’s state, and connection-scoped caching on the server avoids rebuilding full history on every turn.
  • Motivation — the post explains that as inference sped up (e.g. toward high tokens-per-second on newer models), API-side overhead on long agent rollouts became a larger share of latency; WebSockets and related optimizations target that bottleneck.
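The request shape described above can be sketched concretely. Only `response.create` and `previous_response_id` are named in the cited docs; the helper, the model name, and the input-item shapes below are illustrative assumptions, not the documented wire format.

```python
import json

def make_turn(model, input_items, previous_response_id=None):
    """Build a hypothetical response.create event for one turn.

    Only `response.create` and `previous_response_id` come from the docs
    the article cites; every other field here is an illustrative guess.
    """
    payload = {"type": "response.create", "model": model, "input": input_items}
    if previous_response_id is not None:
        # Chain context from the prior turn instead of resending history.
        payload["previous_response_id"] = previous_response_id
    return json.dumps(payload)

# First turn carries the full input; follow-up turns send only the new
# items plus the chain ID (model name and IDs are placeholders).
first = make_turn("some-model", [{"role": "user", "content": "List the repo files."}])
follow_up = make_turn(
    "some-model",
    [{"type": "function_call_output", "output": "src/ tests/"}],
    previous_response_id="resp_abc123",
)
```

The point of the shape is that a follow-up turn stays small: new items plus one ID, with the server's connection-scoped cache supplying the rest.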

Why long-lived transport matters for agents

OpenAI frames typical agent loops as model action → local or hosted tool execution → tool output back to the API → repeat. On plain HTTP, each follow-up request can repeat work tied to conversation state; over a WebSocket, the design reuses prior-response state cached on the connection when continuing with previous_response_id, which the docs tie to lower continuation overhead on long chains.
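That loop can be sketched as follows. Here `send_turn` and `run_tool` are stand-ins (assumptions, not part of any documented API) for sending a turn over the open socket and for executing a tool locally; the shape of the response object is likewise invented for illustration.

```python
def agent_loop(send_turn, run_tool, task, max_turns=20):
    """One agent rollout: model action -> tool execution -> output back -> repeat.

    `send_turn(input_items, previous_response_id)` stands in for sending a
    response.create event on the socket and returning the finished response;
    it must return an object with .id, .tool_call (or None), and .text.
    """
    input_items = [{"role": "user", "content": task}]
    prev_id = None
    for _ in range(max_turns):
        response = send_turn(input_items, prev_id)
        prev_id = response.id  # next turn chains via previous_response_id
        if response.tool_call is None:
            return response.text  # model produced a final answer
        # Run the requested tool locally and feed back only the new output;
        # cached state on the connection supplies the earlier history.
        output = run_tool(response.tool_call)
        input_items = [{"type": "function_call_output", "output": output}]
    raise RuntimeError("agent did not finish within max_turns")
```

Note what the transport buys in this sketch: after the first turn, each iteration sends one tool output and one ID rather than the whole conversation.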

Who should evaluate WebSocket mode

  • Teams building tool-heavy agents (coding, orchestration) on the Responses API.
  • Platforms routing agent traffic that can work with sequential turn processing (the docs note one in-flight response per connection today).
  • Deployments using store=false or Zero Data Retention — the guide states WebSocket mode can remain compatible, with specific rules when cached IDs are missing.

Production checklist (per official docs)

  • Read Connection behavior and limits: e.g. ~60-minute connection cap, sequential responses, no multiplexing on one socket—plan extra connections if you need parallelism.
  • Implement the Reconnect and recover paths: open a new socket after a close or the time limit; handle previous_response_not_found when continuing without valid cached or persisted state.
  • Handle documented error types (including websocket_connection_limit_reached) instead of assuming silent retries.
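The checklist above can be condensed into a small recovery policy. The error codes previous_response_not_found and websocket_connection_limit_reached are the ones named in the docs cited here; the classifier function and the action strings it returns are this sketch's own vocabulary, not API values.

```python
def recovery_action(error_code):
    """Map a documented error code to a recovery step (illustrative policy).

    Returned actions are invented labels for this sketch:
      - "rebuild_history": the cached chain is gone; resend full context.
      - "backoff_or_reuse": close idle sockets or queue until one frees up.
      - "reconnect": open a fresh socket and resume from persisted IDs.
    """
    if error_code == "previous_response_not_found":
        # Cached or persisted state for the chained ID is missing, e.g.
        # after a reconnect with store=false: fall back to full context.
        return "rebuild_history"
    if error_code == "websocket_connection_limit_reached":
        return "backoff_or_reuse"
    # Anything else gets an explicit reconnect, never a silent retry.
    return "reconnect"
```

A real client would attach this to its socket close and error handlers, plus a fresh-connection path for the ~60-minute cap noted above.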

Frequently asked questions

What is WebSocket mode for the Responses API?

An optional WebSocket transport to /v1/responses where each turn sends response.create (payload aligned with the HTTP create body, aside from transport-only fields), continuing with previous_response_id and incremental input, as described in OpenAI’s guide.

Is it a different API product?

No. It is another way to reach the Responses API; OpenAI emphasizes keeping the integration model familiar while reducing per-turn continuation cost on the active socket.

Where are the canonical references?

The April 22, 2026 article Speeding up agentic workflows with WebSockets in the Responses API covers motivation and architecture; the platform page WebSocket mode documents connection URLs, events, limits, compaction, and error codes.


Primary source: OpenAI — Speeding up agentic workflows with WebSockets in the Responses API (technical details: WebSocket mode).