AI Tool

AssemblyAI

Speech-to-text, streaming transcription, Voice Agent API, and LLM Gateway for voice workflows

AssemblyAI documents its Voice AI APIs at assemblyai.com/docs. Developers transcribe and analyze audio over REST at `https://api.assemblyai.com` and over real-time WebSockets at `wss://streaming.assemblyai.com`; the cloud residency docs list `api.eu.assemblyai.com` as the EU host for pre-recorded audio. Pre-recorded transcription requires an explicit `speech_models` array on every `POST /v2/transcript` request, and the docs recommend `universal-3-pro` with `universal-2` as a fallback for 99-language coverage. The platform also publishes a Voice Agent API for speech-to-speech agents, Speech Understanding features (diarization, sentiment, summarization), Guardrails, and an LLM Gateway for running frontier models on transcripts.
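As a rough illustration of the pre-recorded request described above, the sketch below builds (but does not send) a `POST /v2/transcript` request with an explicit `speech_models` array. The helper name `build_transcript_request`, the `audio_url` body field, the placeholder key, and sending the key as the bare `Authorization` header value are assumptions made for this example; check the AssemblyAI docs before relying on them.

```python
import json
import urllib.request

# Default host from the listing; the EU pre-recorded host is api.eu.assemblyai.com.
API_BASE = "https://api.assemblyai.com"


def build_transcript_request(api_key, audio_url,
                             speech_models=("universal-3-pro", "universal-2"),
                             base=API_BASE):
    """Build, without sending, a POST /v2/transcript request (hypothetical helper)."""
    # The listing says there is no default model, so refuse an empty list locally.
    if not speech_models:
        raise ValueError("speech_models must list at least one model")
    payload = {
        "audio_url": audio_url,                 # assumed field name for the audio source
        "speech_models": list(speech_models),   # required on every request per the listing
    }
    return urllib.request.Request(
        url=f"{base}/v2/transcript",
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the API key is passed as the raw Authorization header value.
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_transcript_request("YOUR_API_KEY", "https://example.com/audio.mp3")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it; swapping `base` for `https://api.eu.assemblyai.com` would target the EU host mentioned above.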

Category: Developer Tools
Pricing: Pay-as-you-go per audio hour; enterprise plans (see assemblyai.com/pricing)
Platforms: Web / API / JavaScript / Python
speech-to-text, streaming, voice-agents

Use cases

  • Batch transcribe podcasts or calls with Universal-3 Pro and store transcript IDs
  • Build live captions or agent-assist with streaming STT
  • Run voice agents that need both STT and downstream LLM summarization
  • Redact PII or moderate content with Guardrails on transcript pipelines
  • Serve EU customers via `api.eu.assemblyai.com` residency endpoint
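For the batch use case above, where transcript IDs are stored and checked later, a polling loop is one common pattern. The sketch below is a minimal, transport-agnostic version: `wait_for_transcript` and the injected `fetch` callable are hypothetical names, and the `status` values `completed` and `error` are assumptions about the response shape rather than details taken from this listing.

```python
import time
from typing import Callable, Dict


def wait_for_transcript(transcript_id: str,
                        fetch: Callable[[str], Dict],
                        interval_s: float = 3.0,
                        max_polls: int = 100,
                        sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll `fetch(transcript_id)` until the job finishes (hypothetical helper).

    `fetch` is expected to return the decoded JSON for one transcript,
    for example from a GET on the stored transcript ID.
    """
    for _ in range(max_polls):
        result = fetch(transcript_id)
        status = result.get("status")  # assumed field name
        if status == "completed":
            return result
        if status == "error":
            raise RuntimeError(result.get("error", "transcription failed"))
        sleep(interval_s)  # still queued or processing; wait before the next poll
    raise TimeoutError(f"transcript {transcript_id} not finished after {max_polls} polls")


# Usage with a fake fetcher so the sketch runs without network access.
_responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "text": "hello world"},
])
done = wait_for_transcript("stored-id", lambda _id: next(_responses), sleep=lambda s: None)
print(done["text"])
```

Injecting `fetch` and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to test.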

Key features

  • Pre-recorded STT via `POST /v2/transcript` with required `speech_models` parameter
  • Streaming STT WebSocket at `wss://streaming.assemblyai.com` with API-key auth
  • Voice Agent API for speech-to-speech agents over a single WebSocket
  • Speech Understanding suite (diarization, sentiment, topics, auto chapters)
  • LLM Gateway to apply Anthropic, OpenAI, Google, and other models to transcripts

Who Is It For?

  • Developers shipping voice-enabled SaaS products
  • Teams needing both async and real-time transcription in one vendor
  • ML engineers applying LLMs to spoken-data workflows

Frequently Asked Questions

Is there a default speech model?
No. The AssemblyAI docs state that every pre-recorded request must include `speech_models`; omitting it fails the request.

How do I authenticate API calls?
Pass your API key in the `Authorization` header for REST and LLM Gateway calls. Per the docs, streaming accepts the key either as a query parameter or in the initial WebSocket message.

What is the recommended model stack?
The docs recommend `universal-3-pro` for accuracy and suggest `['universal-3-pro','universal-2']` when you need automatic fallback for languages the first model does not support.

Related

3 Indexed items

Deepgram

Developer Tools
Pay-as-you-go per audio minute; enterprise plans (see deepgram.com/pricing)

Deepgram documents speech-to-text at developers.deepgram.com with WebSocket streaming on `/v1/listen` for general real-time transcription (Nova-3 model, diarization, and search features per API reference) and `/v2/listen` for conversational Flux models with integrated end-of-turn detection (StartOfTurn, EndOfTurn, EagerEndOfTurn events). Official SDKs expose `deepgram.listen.v1.connect` and `deepgram.listen.v2.connect` for binary audio streams. Docs contrast Flux—optimized for voice agents with lower turn-detection latency—against Nova-3 for meetings, IVR, and agent-assist workloads, and describe latency measurement guides targeting sub-300 ms streaming for Nova-3. Self-hosted deployments can run Flux on dedicated Engine nodes with `/v2/listen` enabled per self-hosted configuration guides.

Fireworks AI

Developer Tools
Serverless per-token pricing on fireworks.ai/pricing; dedicated deployments billed per GPU-second

Fireworks AI documents a REST platform at docs.fireworks.ai where developers call language, image, and embedding models with Bearer API keys from the dashboard or `firectl api-key create`. Models use globally unique IDs such as `accounts/<account>/models/<model-id>` and can be served via serverless inference for popular open weights (for example Llama 3.1 70B listed on fireworks.ai/models) or private dedicated GPU deployments for custom base models and LoRA addons. Official docs distinguish serverless per-token billing with best-effort uptime from dedicated deployments billed per GPU-second with private capacity, and state that prompts and generated outputs are not logged except for documented exceptions such as the FireFunction model or opt-in advanced features.

Baseten

Developer Tools
Usage-based inference and training; see baseten.co/pricing

Baseten documents a training and inference platform at docs.baseten.co where teams deploy models via the open-source Truss framework or call hosted Model APIs without standing up infrastructure. Config-only Truss deployments point at Hugging Face checkpoints, select GPU resources, and engines such as TensorRT-LLM; `truss push` builds optimized containers and exposes OpenAI-compatible sync endpoints like `https://model-{model_id}.api.baseten.co/environments/production/sync/v1`. Custom architectures use a Truss `Model` class with `load` and `predict` in `model.py`. Model APIs provide immediate OpenAI-SDK-style access to catalog models (DeepSeek, Qwen, GLM, and others listed in docs) using `BASETEN_API_KEY`.