OpenAI Speeds Agent Workflows with WebSockets

2026-04-24 · openai

OpenAI published an engineering deep dive on reducing latency for agentic workflows in the Responses API. The post explains how per-request overhead became the bottleneck once model inference itself got much faster. By moving key parts of the agent loop onto WebSockets and optimizing request handling, OpenAI reports major end-to-end speed gains for Codex-style runs that rely heavily on tool calls.

Key Features or Updates

The update focuses on persistent WebSocket connections for the agent loop, replacing the repeated overhead of stateless requests. OpenAI also highlights connection-scoped caching and protocol-level optimizations that reduce per-step latency. The result is a materially faster loop across tool execution, context updates, and follow-up model actions.
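To make the idea of connection-scoped caching concrete, here is a minimal Python sketch. It is a toy model, not OpenAI's implementation: `StatelessClient`, `PersistentSession`, and `run_loop` are hypothetical names, and the assumption that a persistent connection only needs to transmit new conversation items is an illustration of the general technique rather than a detail confirmed by the post.

```python
from dataclasses import dataclass, field


@dataclass
class StatelessClient:
    """Hypothetical client: every step resends the full conversation history."""
    items_sent: int = 0

    def step(self, history: list[str]) -> None:
        # Nothing is remembered between requests, so the whole history goes out.
        self.items_sent += len(history)


@dataclass
class PersistentSession:
    """Hypothetical session: state is cached for the lifetime of one connection."""
    cached: list[str] = field(default_factory=list)
    items_sent: int = 0

    def step(self, history: list[str]) -> None:
        # Only items this connection has not seen yet are transmitted.
        new_items = history[len(self.cached):]
        self.items_sent += len(new_items)
        self.cached.extend(new_items)


def run_loop(client, steps: int) -> int:
    """Simulate an agent loop that appends one tool result per step."""
    history: list[str] = []
    for i in range(steps):
        history.append(f"tool_result_{i}")
        client.step(history)
    return client.items_sent


stateless_total = run_loop(StatelessClient(), 10)
persistent_total = run_loop(PersistentSession(), 10)
print(f"stateless: {stateless_total} items sent, persistent: {persistent_total}")
```

In this toy model the stateless loop transmits a number of items that grows quadratically with the step count, while the connection-scoped cache keeps it linear. Real savings depend on what the server actually caches per connection.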

Impact on Developers

Teams building coding and automation agents get quicker feedback cycles, especially on multi-tool tasks. Lower orchestration overhead improves responsiveness without requiring a major app-level redesign. This matters most for long-running workflows, where small per-step delays accumulate into most of the user's wait time.
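A back-of-the-envelope model shows why per-step overhead matters more as runs get longer. The Python sketch below is illustrative only: the function names are hypothetical and the millisecond figures are made-up inputs, not measurements from OpenAI's post.

```python
def total_wait_ms(steps: int, inference_ms: float, overhead_ms: float,
                  setup_ms: float = 0.0) -> float:
    """Total wait: one-time connection setup plus per-step inference and overhead."""
    return setup_ms + steps * (inference_ms + overhead_ms)


def overhead_share(steps: int, inference_ms: float, overhead_ms: float,
                   setup_ms: float = 0.0) -> float:
    """Fraction of the total wait that is not model inference."""
    total = total_wait_ms(steps, inference_ms, overhead_ms, setup_ms)
    return (setup_ms + steps * overhead_ms) / total


# Illustrative inputs (assumptions): a 40-step run with 200 ms of inference per step.
# Stateless: every step pays 150 ms of request overhead.
per_request = total_wait_ms(steps=40, inference_ms=200.0, overhead_ms=150.0)
# Persistent: 150 ms of setup is paid once, then 10 ms of overhead per step.
persistent = total_wait_ms(steps=40, inference_ms=200.0, overhead_ms=10.0,
                           setup_ms=150.0)

print(f"per-request: {per_request:.0f} ms, persistent: {persistent:.0f} ms")
print(f"overhead share, per-request: {overhead_share(40, 200.0, 150.0):.0%}")
```

The same model also shows why faster inference exposes the problem: shrinking `inference_ms` while `overhead_ms` stays fixed pushes the overhead share toward 100%.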

How to Use It

Adopt the Responses API patterns described in the post for agent loops that repeatedly call tools. Move from request-by-request polling to a persistent connection model where possible. Benchmark loop timing before and after the change to validate the gains on your own workflows.
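For the benchmarking step, a small timing harness is usually enough. The Python sketch below uses only the standard library. `time_loop`, `baseline_step`, and `candidate_step` are hypothetical names, and the two step functions are stand-ins that simply sleep; in practice you would swap in your existing per-request loop body and your persistent-connection loop body.

```python
import statistics
import time
from typing import Callable


def time_loop(step: Callable[[int], None], steps: int) -> dict[str, float]:
    """Run `step` once per index and return timing statistics in milliseconds."""
    durations: list[float] = []
    for i in range(steps):
        start = time.perf_counter()
        step(i)
        durations.append((time.perf_counter() - start) * 1000)
    return {
        "total_ms": sum(durations),
        "median_ms": statistics.median(durations),
        "max_ms": max(durations),
    }


# Stand-in steps (assumptions): replace with real agent-loop iterations.
def baseline_step(i: int) -> None:
    time.sleep(0.020)  # simulates one request with full per-request overhead


def candidate_step(i: int) -> None:
    time.sleep(0.002)  # simulates one step over an already-open connection


before = time_loop(baseline_step, 10)
after = time_loop(candidate_step, 10)
print(f"before: {before['total_ms']:.1f} ms total, {before['median_ms']:.1f} ms median")
print(f"after:  {after['total_ms']:.1f} ms total, {after['median_ms']:.1f} ms median")
print(f"speedup: {before['total_ms'] / after['total_ms']:.2f}x")
```

Reporting the median and maximum alongside the total helps separate a consistent per-step improvement from one that is driven by a few slow outliers. Run the same task set several times for both variants, since network conditions and model response times vary between runs.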
