Streaming speech-to-text APIs with Nova-3 for live captioning and Flux for voice agents

Deepgram documents speech-to-text at developers.deepgram.com with WebSocket streaming on `/v1/listen` for general real-time transcription (Nova-3 model, diarization, and search features per API reference) and `/v2/listen` for conversational Flux models with integrated end-of-turn detection (StartOfTurn, EndOfTurn, EagerEndOfTurn events). Official SDKs expose `deepgram.listen.v1.connect` and `deepgram.listen.v2.connect` for binary audio streams. Docs contrast Flux—optimized for voice agents with lower turn-detection latency—against Nova-3 for meetings, IVR, and agent-assist workloads, and describe latency measurement guides targeting sub-300 ms streaming for Nova-3. Self-hosted deployments can run Flux on dedicated Engine nodes with `/v2/listen` enabled per self-hosted configuration guides.

Category Developer Tools

Pricing Pay-as-you-go per audio minute; enterprise plans (see deepgram.com/pricing)

Platforms Web / API / JavaScript / Python

speech-to-textstreamingvoice-agents

Use cases

Live meeting captioning and call analytics with Nova-3 on `/v1/listen`
Voice-agent stacks that need built-in end-of-turn instead of separate VAD pipelines
Compare Flux vs Nova-3 latency before choosing an agent architecture
Measure streaming STT latency in production with documented benchmarking tools
Deploy Flux self-hosted when cloud API routing must stay on-prem

Key features

WebSocket `/v1/listen` streaming with Nova-3 and optional diarization per Listen v1 reference
WebSocket `/v2/listen` Flux turn-based events for conversational agents
Official SDK `listen.v1.connect` and `listen.v2.connect` helpers for binary audio
Latency documentation and tooling (for example stt_stream_file) for EOT measurement
Self-hosted Flux on dedicated Engine nodes with separate `/v2/listen` routing per deployment docs

Who Is It For?

Voice-agent engineers building real-time conversational products
Developers adding streaming STT to video or telephony pipelines
Platform teams evaluating self-hosted vs Deepgram-managed STT

Frequently Asked Questions

When should I use `/v2/listen` instead of `/v1/listen`?: Deepgram docs position Flux on `/v2/listen` for conversational voice agents with turn detection; Nova-3 on `/v1/listen` for general streaming transcription.
Does Flux replace Nova-3 everywhere?: No—the comparison guide lists different endpoints and feature sets; pick per workload (agents vs meetings/IVR).
How is audio sent on the WebSocket?: API references require binary audio frames on the streaming connection with encoding parameters documented per Listen v1/v2 schemas.

3 Indexed items

AssemblyAI

Developer ToolsPay-as-you-go per aud…

AssemblyAI documents Voice AI APIs at assemblyai.com/docs where developers transcribe and analyze audio via REST at `https://api.assemblyai.com` and real-time WebSockets at `wss://streaming.assemblyai.com` (EU pre-recorded host `api.eu.assemblyai.com` per cloud residency docs). Pre-recorded transcription requires an explicit `speech_models` array on every `POST /v2/transcript` request—docs recommend `universal-3-pro` with `universal-2` fallback for 99-language coverage. The platform also publishes a Voice Agent API for speech-to-speech agents, Speech Understanding features (diarization, sentiment, summarization), Guardrails, and an LLM Gateway to run frontier models on transcripts.

Chroma

Developer ToolsOpen source

Chroma documents an open-source embedding database at docs.trychroma.com for storing and querying vectors, metadata, and full-text fields in Python and JavaScript clients. Official guides cover ephemeral in-memory collections, persistent local storage, self-hosted server deployments, and Chroma Cloud at trychroma.com with authentication tokens. The docs describe collection CRUD, `add`/`query`/`get`/`update`/`delete` APIs, embedding functions (default and third-party), hybrid search, and multitenancy patterns for RAG and agent memory workloads per the documentation index.

Groq Cloud API

Developer ToolsFree + Paid

GroqCloud exposes hosted language, speech, and compound workloads through Groq’s HTTP APIs. Documentation highlights compatibility with OpenAI client libraries when you point `base_url` at Groq’s OpenAI-compatible endpoint and supply a Groq API key, alongside first-party Groq SDKs for Python and JavaScript. Pricing pages publish per-model token rates (USD) for on-demand inference.

Deepgram