The update focuses on persistent WebSocket connections for the agent loop instead of repeated stateless request overhead. OpenAI also highlights connection-scoped caching and protocol-level optimizations to reduce per-step latency. The result is a materially faster loop for tool execution, context updates, and follow-up model actions.
Teams building coding and automation agents get quicker feedback cycles, especially on multi-tool tasks. Lower orchestration overhead improves responsiveness without requiring major app-level redesign. This is particularly relevant for long-running workflows where cumulative latency usually dominates user wait time.
Adopt the Responses API patterns described in the post for agent loops that repeatedly call tools. Move from request-by-request polling to a persistent connection model where possible. Benchmark your own before/after loop timing to validate gains on real workflows.
Read Original Post →