What happened

OpenAI’s Realtime API made speech-to-speech with streaming audio and tool calls practical in a single session. Teams are now asking plain operations questions: how to store transcripts safely, when to hand off to a human, and how to test voice paths the way they test REST APIs. The “wow” phase is fading; compliance and uptime concerns are not.
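Testing a voice path like a REST API means feeding a known input and asserting on structured output. A minimal sketch of that pattern, where `transcribe` and `route_intent` are invented stand-ins for a real pipeline (not any vendor API) and the audio fixture is a placeholder for a recorded clip:

```python
def transcribe(audio_bytes: bytes) -> str:
    # Stand-in for a speech-to-text call; a real test would inject a
    # recorded audio fixture and a stubbed or replayed model response.
    fixtures = {b"fixture-refund-request": "I want a refund for order 1234"}
    return fixtures.get(audio_bytes, "")

def route_intent(transcript: str) -> dict:
    # Toy intent router; the real system would call the model here.
    if "refund" in transcript.lower():
        return {"intent": "refund"}
    return {"intent": "unknown"}

def test_refund_path():
    # Same shape as a REST API test: fixed input in, assertion on output.
    transcript = transcribe(b"fixture-refund-request")
    result = route_intent(transcript)
    assert result["intent"] == "refund"
```

The point is the shape, not the stubs: once the voice path is behind deterministic fixtures, it slots into the same CI harness as any HTTP endpoint.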

Why it matters

Voice agents tend to fail on dull issues: vague tool permissions, missing rate limits, and no clear owner of the pager when something breaks. Shops that already run Slack, browser MCPs, and structured debugging treat voice as another surface on the same stack, not a separate demo.
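The first two failure modes above are cheap to close. A minimal sketch of an explicit allow-list plus a sliding-window rate cap per session; `ToolPolicy` and `SessionGuard` are hypothetical names, not part of any real SDK:

```python
from __future__ import annotations
import time
from dataclasses import dataclass, field

@dataclass
class ToolPolicy:
    allowed: set          # explicit allow-list: no tool is callable by default
    max_calls_per_minute: int = 10

@dataclass
class SessionGuard:
    policy: ToolPolicy
    _calls: list = field(default_factory=list)  # timestamps of granted calls

    def authorize(self, tool_name: str, now: float | None = None) -> bool:
        """Grant a tool call only if it is allow-listed and under the rate cap."""
        now = time.monotonic() if now is None else now
        if tool_name not in self.policy.allowed:
            return False
        # keep only timestamps inside the trailing 60-second window
        self._calls = [t for t in self._calls if now - t < 60]
        if len(self._calls) >= self.policy.max_calls_per_minute:
            return False
        self._calls.append(now)
        return True
```

Usage is a one-liner at the tool-dispatch boundary: check `guard.authorize(name)` before executing, and page a human instead of silently dropping the call when it returns `False`.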

Directory impact

This touches ChatGPT-class assistants, lightweight UI tools like v0 when you need a web shell, and MCPs that ground agents in real systems. Debugging and incident-response skills matter more when voice latency turns small mistakes into customer-visible outages.

What to watch next

Debates over session-recording rules, regional limits on where audio may be stored, and standard eval harnesses for speech will get louder. Clear per-session tool budgets from vendors would speed enterprise adoption; without them, every team invents its own policy dialect.