Hosted open-weight models via REST and official Python / TypeScript SDKs
Together AI operates a developer platform for running prominent open-source and vendor-released models on Together-hosted GPUs. Documentation centers on issuing API keys, installing the Together Python (`together`) or npm (`together-ai`) SDKs, or calling HTTPS endpoints such as `https://api.together.ai/v1/chat/completions` with Bearer authentication. Guides cover streaming chat completions, function calling, structured outputs, model catalog browsing, GPU reservations for steady traffic, and fine-tuning and dedicated cluster offerings published in the broader docs hierarchy.
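The REST surface described above can be exercised without any SDK. The sketch below builds a Bearer-authenticated POST to the documented `/v1/chat/completions` endpoint using only the Python standard library; the request body shape follows the OpenAI-style JSON payloads the quickstart illustrates, and the model ID is the one from the quickstart sample.

```python
import json
import os
import urllib.request

# Chat-completions endpoint from the Together docs
API_URL = "https://api.together.ai/v1/chat/completions"

def build_request(prompt: str, model: str) -> urllib.request.Request:
    """Assemble a Bearer-authenticated chat-completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_request("Hello!", "openai/gpt-oss-20b")
    # Actually sending requires a valid TOGETHER_API_KEY:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
```

This is the same request the docs' curl examples make; the typed SDKs wrap it with retries and telemetry hooks on top.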
Use cases
- Shipping chat or agent backends on hosted open-weight checkpoints without owning GPU fleets
- Prototyping tool-calling pipelines using Together-supported models listed in docs
- Moving from curl-only proofs to typed SDK integrations for retries and telemetry
- Exploring Together fine-tuning or dedicated endpoint SKUs once baseline usage patterns are validated
- Teaching teams an OpenAI-shaped HTTP interface while swapping model IDs to Together catalog entries
Key features
- Official SDK quickstart flows for Python and TypeScript with environment-based API keys (`TOGETHER_API_KEY`)
- REST chat-completions endpoints compatible with common OpenAI-style JSON payloads illustrated in Together quickstart docs
- Streaming completions demonstrated with `stream=True` in Python and async iterators in TypeScript samples
- Product surface area spanning model catalogs, GPU clusters, LoRA/full fine-tuning, and reservations per Together documentation index
- Documented linkage between keys, billing projects (`api.together.ai` console), and per-model selection
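The streaming mode listed above delivers tokens incrementally. Assuming the wire format follows the common OpenAI-style server-sent events (`data: {...}` lines with a `choices[0].delta.content` field, terminated by `data: [DONE]`), accumulating the streamed text looks like this; the sample chunks are mocked for illustration:

```python
import json

def collect_stream_text(sse_lines):
    """Accumulate assistant text from OpenAI-style SSE chunks.

    Assumes each event is a `data: {...}` line carrying a
    `choices[0].delta.content` fragment, ending with `data: [DONE]`.
    """
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        body = line[len("data: "):]
        if body == "[DONE]":
            break
        delta = json.loads(body)["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)

# Mocked chunks standing in for a live stream:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # → Hello
```

The SDKs hide this parsing: Python's `stream=True` yields chunk objects and the TypeScript client exposes an async iterator over the same events.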
Who is it for?
- Backend engineers prototyping LLM-heavy services
- ML engineers evaluating hosted inference for open-weight checkpoints
- Developer advocates standardizing onboarding material with official SDK snippets
Frequently Asked Questions
- Do I need the SDK or can I use curl?
- Together documents both: the quickstart installs `together` (Python) or `together-ai` (npm), while curl examples POST directly to `/v1/chat/completions` with your API key in the `Authorization: Bearer` header.
- Where are API keys created?
- Docs direct users to the Together console API keys workflow under their active project (`api.together.ai/settings/projects/~current/api-keys` per quickstart).
- Which model does the introductory sample call?
- The quickstart streams `openai/gpt-oss-20b` as of the Together quickstart reproduction in this corpus.
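Putting the FAQ answers together, a minimal SDK-based call looks roughly like the following. This is a sketch, not the quickstart verbatim: the `Together` client and `chat.completions.create` call shape mirror the documented pattern, but exact attribute names may differ across SDK versions, and the import is deferred because the example assumes `pip install together` has been run.

```python
import os

QUICKSTART_MODEL = "openai/gpt-oss-20b"  # model used in the quickstart sample

def stream_reply(prompt: str, model: str = QUICKSTART_MODEL):
    """Stream a chat completion via the `together` SDK.

    Sketch of the quickstart flow; requires TOGETHER_API_KEY in the
    environment and the `together` package installed.
    """
    from together import Together  # deferred: requires `pip install together`

    client = Together(api_key=os.environ["TOGETHER_API_KEY"])
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
```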
Related
Groq Cloud API
GroqCloud exposes hosted language, speech, and compound workloads through Groq’s HTTP APIs. Documentation highlights compatibility with OpenAI client libraries when you point `base_url` at Groq’s OpenAI-compatible endpoint and supply a Groq API key, alongside first-party Groq SDKs for Python and JavaScript. Pricing pages publish per-model token rates (USD) for on-demand inference.
Langfuse
Langfuse is an open-source product for LLM application observability: it ingests traces and spans from your stack, supports datasets and prompt/version workflows, and offers optional Langfuse Cloud or self-hosted deployment. It integrates with popular Python/JS SDKs and frameworks that emit OpenTelemetry-compatible telemetry, so teams can debug agent loops, compare prompt iterations, and monitor production quality metrics without building a custom analytics pipeline from scratch.
OpenRouter
OpenRouter is a model gateway that exposes many third-party AI models through one OpenAI-compatible API. Teams can compare providers, set routing preferences, and switch models without rewriting core client logic for each vendor SDK. The service publishes per-model pricing and supports pay-as-you-go usage.