On April 22, 2026, OpenAI published Speeding up agentic workflows with WebSockets in the Responses API, describing WebSocket transport for the same Responses API surface developers already use—aimed especially at agent-style workloads with many model–tool round trips (e.g. coding agents like Codex).

What OpenAI announced

From the announcement and the WebSocket mode platform guide:

  • Persistent WebSocket — clients keep a connection to /v1/responses and continue turns with new input items plus previous_response_id, rather than repeating a full HTTP handshake on each hop.
  • Familiar request shape — OpenAI states that the shipped WebSocket mode keeps using response.create with the same payload shape as HTTP; previous_response_id chains context from the prior response’s state, and connection-scoped caching on the server avoids rebuilding full history on every turn.
  • Motivation — the post explains that as inference sped up (e.g. toward high tokens-per-second on newer models), API-side overhead on long agent rollouts became a larger share of latency; WebSockets and related optimizations target that bottleneck.
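The request shape described above can be sketched concretely. Only `response.create` and `previous_response_id` are named in the cited docs; the helper, the model name, and the input-item shapes below are illustrative assumptions, not the documented wire format.

```python
import json

def make_turn(model, input_items, previous_response_id=None):
    """Build a hypothetical response.create event for one turn.

    Only `response.create` and `previous_response_id` come from the docs
    the article cites; every other field here is an illustrative guess.
    """
    payload = {"type": "response.create", "model": model, "input": input_items}
    if previous_response_id is not None:
        # Chain context from the prior turn instead of resending history.
        payload["previous_response_id"] = previous_response_id
    return json.dumps(payload)

# First turn carries the full input; follow-up turns send only the new
# items plus the chain ID (model name and IDs are placeholders).
first = make_turn("some-model", [{"role": "user", "content": "List the repo files."}])
follow_up = make_turn(
    "some-model",
    [{"type": "function_call_output", "output": "src/ tests/"}],
    previous_response_id="resp_abc123",
)
```

The point of the shape is that a follow-up turn stays small: new items plus one ID, with the server's connection-scoped cache supplying the rest.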

Why long-lived transport matters for agents

OpenAI frames typical agent loops as model action → local or hosted tool execution → tool output back to the API → repeat. On plain HTTP, each follow-up request can repeat work tied to conversation state; over a WebSocket, the design reuses prior-response state cached on the connection when continuing with previous_response_id, which the docs tie to lower continuation overhead on long chains.
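That loop can be sketched as follows. Here `send_turn` and `run_tool` are stand-ins (assumptions, not part of any documented API) for sending a turn over the open socket and for executing a tool locally; the shape of the response object is likewise invented for illustration.

```python
def agent_loop(send_turn, run_tool, task, max_turns=20):
    """One agent rollout: model action -> tool execution -> output back -> repeat.

    `send_turn(input_items, previous_response_id)` stands in for sending a
    response.create event on the socket and returning the finished response;
    it must return an object with .id, .tool_call (or None), and .text.
    """
    input_items = [{"role": "user", "content": task}]
    prev_id = None
    for _ in range(max_turns):
        response = send_turn(input_items, prev_id)
        prev_id = response.id  # next turn chains via previous_response_id
        if response.tool_call is None:
            return response.text  # model produced a final answer
        # Run the requested tool locally and feed back only the new output;
        # cached state on the connection supplies the earlier history.
        output = run_tool(response.tool_call)
        input_items = [{"type": "function_call_output", "output": output}]
    raise RuntimeError("agent did not finish within max_turns")
```

Note what the transport buys in this sketch: after the first turn, each iteration sends one tool output and one ID rather than the whole conversation.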

Who should evaluate WebSocket mode

  • Teams building tool-heavy agents (coding, orchestration) on the Responses API.
  • Platforms routing agent traffic that can work with sequential turn processing (the docs note one in-flight response per connection today).
  • Deployments using store=false or Zero Data Retention — the guide states WebSocket mode can remain compatible, with specific rules when cached IDs are missing.

Production checklist (per official docs)

  • Read Connection behavior and limits: e.g. ~60-minute connection cap, sequential responses, no multiplexing on one socket—plan extra connections if you need parallelism.
  • Implement the Reconnect and recover paths: open a new socket after a close or the time limit; handle previous_response_not_found when continuing without valid cached or persisted state.
  • Handle documented error types (including websocket_connection_limit_reached) instead of assuming silent retries.
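The checklist above can be condensed into a small recovery policy. The error codes previous_response_not_found and websocket_connection_limit_reached are the ones named in the docs cited here; the classifier function and the action strings it returns are this sketch's own vocabulary, not API values.

```python
def recovery_action(error_code):
    """Map a documented error code to a recovery step (illustrative policy).

    Returned actions are invented labels for this sketch:
      - "rebuild_history": the cached chain is gone; resend full context.
      - "backoff_or_reuse": close idle sockets or queue until one frees up.
      - "reconnect": open a fresh socket and resume from persisted IDs.
    """
    if error_code == "previous_response_not_found":
        # Cached or persisted state for the chained ID is missing, e.g.
        # after a reconnect with store=false: fall back to full context.
        return "rebuild_history"
    if error_code == "websocket_connection_limit_reached":
        return "backoff_or_reuse"
    # Anything else gets an explicit reconnect, never a silent retry.
    return "reconnect"
```

A real client would attach this to its socket close and error handlers, plus a fresh-connection path for the ~60-minute cap noted above.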

Frequently asked questions

What is WebSocket mode for the Responses API?

An optional WebSocket transport to /v1/responses where each turn sends response.create (payload aligned with the HTTP create body, aside from transport-only fields), continuing with previous_response_id and incremental input, as described in OpenAI’s guide.

Is it a different API product?

No. It is another way to reach the Responses API; OpenAI emphasizes keeping the integration model familiar while reducing per-turn continuation cost on the active socket.

Where are the canonical references?

The April 22, 2026 article Speeding up agentic workflows with WebSockets in the Responses API covers motivation and architecture; the platform page WebSocket mode documents connection URLs, events, limits, compaction, and error codes.


Primary source: OpenAI — Speeding up agentic workflows with WebSockets in the Responses API (technical details: WebSocket mode).