AI Tool

Deepgram

Streaming speech-to-text APIs with Nova-3 for live captioning and Flux for voice agents

Deepgram documents its speech-to-text API at developers.deepgram.com. WebSocket streaming on `/v1/listen` covers general real-time transcription (Nova-3 model, diarization, and search features, per the API reference), while `/v2/listen` serves the conversational Flux models with integrated end-of-turn detection (StartOfTurn, EndOfTurn, and EagerEndOfTurn events). Official SDKs expose `deepgram.listen.v1.connect` and `deepgram.listen.v2.connect` for binary audio streams. The docs contrast Flux, which is optimized for voice agents and offers lower turn-detection latency, with Nova-3 for meetings, IVR, and agent-assist workloads, and they include latency measurement guides that target sub-300 ms streaming for Nova-3. Self-hosted deployments can run Flux on dedicated Engine nodes with `/v2/listen` enabled, per the self-hosted configuration guides.
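As a rough illustration of the endpoint split described above, the sketch below builds a WebSocket URL for either workload. The host `api.deepgram.com`, the query parameter names (`model`, `encoding`, `sample_rate`, `diarize`), the model identifier `flux-general-en`, and the helper name `build_listen_url` are assumptions made for this example and should be checked against the current Listen v1/v2 references.

```python
from urllib.parse import urlencode

# Assumption: hosted API host. A self-hosted deployment would use its own host.
DEEPGRAM_HOST = "api.deepgram.com"


def build_listen_url(workload: str, *, encoding: str = "linear16",
                     sample_rate: int = 16000, diarize: bool = False,
                     host: str = DEEPGRAM_HOST) -> str:
    """Return a WebSocket URL for the endpoint that matches the workload."""
    if workload == "voice-agent":
        # Flux lives on /v2/listen; the model identifier here is an assumption.
        path, params = "/v2/listen", {"model": "flux-general-en"}
    elif workload in ("meeting", "ivr", "agent-assist"):
        # Nova-3 lives on /v1/listen; diarization is optional.
        path, params = "/v1/listen", {"model": "nova-3"}
        if diarize:
            params["diarize"] = "true"
    else:
        raise ValueError(f"unknown workload: {workload!r}")
    # Encoding parameters describe the binary audio that will be streamed.
    params["encoding"] = encoding
    params["sample_rate"] = str(sample_rate)
    return f"wss://{host}{path}?{urlencode(params)}"
```

A meeting-captioning client would call `build_listen_url("meeting", diarize=True)`, while a voice agent would call `build_listen_url("voice-agent")` and then open the connection with a WebSocket client or the official SDK.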

Category Developer Tools
Pricing Pay-as-you-go per audio minute; enterprise plans (see deepgram.com/pricing)
Platforms Web / API / JavaScript / Python
Tags speech-to-text, streaming, voice-agents

Use cases

  • Live meeting captioning and call analytics with Nova-3 on `/v1/listen`
  • Voice-agent stacks that need built-in end-of-turn instead of separate VAD pipelines
  • Compare Flux vs Nova-3 latency before choosing an agent architecture
  • Measure streaming STT latency in production with documented benchmarking tools
  • Deploy Flux self-hosted when cloud API routing must stay on-prem
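To make the built-in end-of-turn use case more concrete, here is a minimal sketch of a client-side tracker that reacts to the three Flux event names listed in the description. The message shape (a JSON object with `event` and `transcript` fields) and the class name `TurnTracker` are assumptions for illustration; the actual payload is defined by the Listen v2 schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TurnTracker:
    """Tracks conversational turns from Flux-style event messages."""
    speaking: bool = False
    drafts: list = field(default_factory=list)
    completed_turns: list = field(default_factory=list)

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        event = msg.get("event")          # assumed field name
        text = msg.get("transcript", "")  # assumed field name
        if event == "StartOfTurn":
            self.speaking = True
        elif event == "EagerEndOfTurn":
            # Early signal: an agent could start preparing a reply speculatively.
            self.drafts.append(text)
        elif event == "EndOfTurn":
            # Turn is final: hand the transcript to the rest of the agent.
            self.speaking = False
            self.completed_turns.append(text)
        # Any other message is ignored in this sketch.
```

Because the server emits the turn boundaries, the client needs no separate VAD stage; it only decides what to do on each event.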

Key features

  • WebSocket `/v1/listen` streaming with Nova-3 and optional diarization per Listen v1 reference
  • WebSocket `/v2/listen` Flux turn-based events for conversational agents
  • Official SDK `listen.v1.connect` and `listen.v2.connect` helpers for binary audio
  • Latency documentation and tooling (for example stt_stream_file) for EOT measurement
  • Self-hosted Flux on dedicated Engine nodes with separate `/v2/listen` routing per deployment docs
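For the latency feature, one simple way to summarize streaming lag is to compare how much audio has been sent with how far the latest transcript reaches. The sketch below is not Deepgram's tooling and does not reproduce `stt_stream_file`; the function name `summarize_lag`, the sample format, and the 0.3 s budget (taken from the sub-300 ms target mentioned in the description) are assumptions for this example.

```python
import math
from statistics import median


def summarize_lag(samples, budget_s: float = 0.3) -> dict:
    """Summarize transcript lag from (audio_sent_s, transcript_end_s) pairs.

    audio_sent_s: seconds of audio sent when a result arrived.
    transcript_end_s: end time (in audio seconds) covered by that result.
    """
    lags = sorted(max(0.0, sent - end) for sent, end in samples)
    if not lags:
        raise ValueError("no samples")
    # Nearest-rank 95th percentile.
    rank = max(1, math.ceil(0.95 * len(lags)))
    mid = median(lags)
    return {
        "min": lags[0],
        "median": mid,
        "p95": lags[rank - 1],
        "max": lags[-1],
        "median_within_budget": mid <= budget_s,
    }
```

Collecting one pair per received result during a test stream gives a quick check on whether a deployment stays near its latency target.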

Who Is It For?

  • Voice-agent engineers building real-time conversational products
  • Developers adding streaming STT to video or telephony pipelines
  • Platform teams evaluating self-hosted vs Deepgram-managed STT

Frequently Asked Questions

When should I use `/v2/listen` instead of `/v1/listen`?
Deepgram docs position Flux on `/v2/listen` for conversational voice agents with turn detection, and Nova-3 on `/v1/listen` for general streaming transcription.

Does Flux replace Nova-3 everywhere?
No. The comparison guide lists different endpoints and feature sets; pick per workload (agents vs meetings/IVR).

How is audio sent on the WebSocket?
API references require binary audio frames on the streaming connection, with encoding parameters documented per the Listen v1/v2 schemas.
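As a sketch of what sending binary audio frames can look like on the client side, the helper below slices raw PCM into fixed-duration chunks. The function name `pcm_frames`, the 100 ms frame size, and the 16 kHz mono 16-bit defaults are assumptions for illustration, not values required by the API.

```python
def pcm_frames(pcm: bytes, *, sample_rate: int = 16000, channels: int = 1,
               bytes_per_sample: int = 2, frame_ms: int = 100):
    """Yield fixed-duration slices of raw PCM audio.

    Each slice is intended to be sent as one binary WebSocket message.
    """
    samples_per_frame = sample_rate * frame_ms // 1000
    # Keep frames aligned to whole samples across all channels.
    frame_bytes = samples_per_frame * channels * bytes_per_sample
    if frame_bytes <= 0:
        raise ValueError("frame size must be positive")
    for offset in range(0, len(pcm), frame_bytes):
        yield pcm[offset:offset + frame_bytes]
```

With the defaults, each frame is 3200 bytes (1600 samples of 2 bytes each). The sample rate and encoding declared on the connection should match the audio being chunked.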

Related

Groq Cloud API

Category Developer Tools
Pricing Free tier + Pay-as-you-go (published USD rates)

GroqCloud exposes hosted language, speech, and compound workloads through Groq’s HTTP APIs. Documentation highlights compatibility with OpenAI client libraries when you point `base_url` at Groq’s OpenAI-compatible endpoint and supply a Groq API key, alongside first-party Groq SDKs for Python and JavaScript. Pricing pages publish per-model token rates (USD) for on-demand inference.

Fireworks AI

Category Developer Tools
Pricing Serverless per-token pricing on fireworks.ai/pricing; dedicated deployments billed per GPU-second

Fireworks AI documents a REST platform at docs.fireworks.ai where developers call language, image, and embedding models with Bearer API keys from the dashboard or `firectl api-key create`. Models use globally unique IDs such as `accounts/<account>/models/<model-id>` and can be served via serverless inference for popular open weights (for example Llama 3.1 70B listed on fireworks.ai/models) or private dedicated GPU deployments for custom base models and LoRA addons. Official docs distinguish serverless per-token billing with best-effort uptime from dedicated deployments billed per GPU-second with private capacity, and state that prompts and generated outputs are not logged except for documented exceptions such as the FireFunction model or opt-in advanced features.

LangSmith

Category Developer Tools
Pricing Free developer tier plus paid Team/Enterprise plans (see LangSmith pricing docs)

LangSmith is LangChain's hosted and self-hostable platform for tracing, monitoring, and improving LLM applications. Official documentation at docs.langchain.com describes instrumenting apps via environment variables, framework integrations (OpenAI, Anthropic, CrewAI, Vercel AI SDK, Pydantic AI, and others listed on the integrations page), or the LangSmith SDK. Teams can then inspect multi-step runs, compare prompt versions, build datasets, run offline and online evaluations, configure automations, and collect feedback queues without assembling bespoke analytics for agent loops.