Hosted open-weight models via REST and official Python / TypeScript SDKs
Together AI operates a developer platform for running prominent open-source and vendor-released models on Together-hosted GPUs. Documentation centers on issuing API keys, installing the Together Python (`together`) or npm (`together-ai`) SDKs, or calling HTTPS endpoints such as `https://api.together.ai/v1/chat/completions` with Bearer authentication. Guides cover streaming chat completions, function calling, structured outputs, model catalog browsing, GPU reservations for steady traffic, and fine-tuning and dedicated cluster offerings published in the broader docs hierarchy.
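The REST surface described above can be exercised without any SDK. The sketch below builds a Bearer-authenticated POST to the documented `/v1/chat/completions` endpoint using only the Python standard library; the request body shape follows the OpenAI-style JSON payloads the quickstart illustrates, and the model ID is the one from the quickstart sample.

```python
import json
import os
import urllib.request

# Chat-completions endpoint from the Together docs
API_URL = "https://api.together.ai/v1/chat/completions"

def build_request(prompt: str, model: str) -> urllib.request.Request:
    """Assemble a Bearer-authenticated chat-completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_request("Hello!", "openai/gpt-oss-20b")
    # Actually sending requires a valid TOGETHER_API_KEY:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
```

This is the same request the docs' curl examples make; the typed SDKs wrap it with retries and telemetry hooks on top.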
Use cases
- Shipping chat or agent backends on hosted open-weight checkpoints without owning GPU fleets
- Prototyping tool-calling pipelines using Together-supported models listed in docs
- Moving from curl-only proofs to typed SDK integrations for retries and telemetry
- Exploring Together fine-tuning or dedicated endpoint SKUs once baseline usage patterns are validated
- Teaching teams an OpenAI-shaped HTTP interface while swapping model IDs to Together catalog entries
Key features
- Official SDK quickstart flows for Python and TypeScript with environment-based API keys (`TOGETHER_API_KEY`)
- REST chat-completions endpoints compatible with common OpenAI-style JSON payloads illustrated in Together quickstart docs
- Streaming completions demonstrated with `stream=True` in Python and async iterators in TypeScript samples
- Product surface area spanning model catalogs, GPU clusters, LoRA/full fine-tuning, and reservations per Together documentation index
- Documented linkage between keys, billing projects (`api.together.ai` console), and per-model selection
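The streaming mode listed above delivers tokens incrementally. Assuming the wire format follows the common OpenAI-style server-sent events (`data: {...}` lines with a `choices[0].delta.content` field, terminated by `data: [DONE]`), accumulating the streamed text looks like this; the sample chunks are mocked for illustration:

```python
import json

def collect_stream_text(sse_lines):
    """Accumulate assistant text from OpenAI-style SSE chunks.

    Assumes each event is a `data: {...}` line carrying a
    `choices[0].delta.content` fragment, ending with `data: [DONE]`.
    """
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        body = line[len("data: "):]
        if body == "[DONE]":
            break
        delta = json.loads(body)["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)

# Mocked chunks standing in for a live stream:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # → Hello
```

The SDKs hide this parsing: Python's `stream=True` yields chunk objects and the TypeScript client exposes an async iterator over the same events.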
Who is it for?
- Backend engineers prototyping LLM-heavy services
- ML engineers evaluating hosted inference for open-weight checkpoints
- Developer advocates standardizing onboarding material with official SDK snippets
Frequently Asked Questions
- Do I need the SDK or can I use curl?
- Together documents both: the quickstart installs `together` (Python) or `together-ai` (npm), while curl examples POST directly to `/v1/chat/completions` with your API key in the `Authorization: Bearer` header.
- Where are API keys created?
- Docs direct users to the Together console API keys workflow under their active project (`api.together.ai/settings/projects/~current/api-keys` per quickstart).
- Which model does the introductory sample call?
- The quickstart streams `openai/gpt-oss-20b` as of the Together quickstart reproduction in this corpus.
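Putting the FAQ answers together, a minimal SDK-based call looks roughly like the following. This is a sketch, not the quickstart verbatim: the `Together` client and `chat.completions.create` call shape mirror the documented pattern, but exact attribute names may differ across SDK versions, and the import is deferred because the example assumes `pip install together` has been run.

```python
import os

QUICKSTART_MODEL = "openai/gpt-oss-20b"  # model used in the quickstart sample

def stream_reply(prompt: str, model: str = QUICKSTART_MODEL):
    """Stream a chat completion via the `together` SDK.

    Sketch of the quickstart flow; requires TOGETHER_API_KEY in the
    environment and the `together` package installed.
    """
    from together import Together  # deferred: requires `pip install together`

    client = Together(api_key=os.environ["TOGETHER_API_KEY"])
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
```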
Related
Groq Cloud API
GroqCloud exposes hosted language, speech, and compound workloads through Groq’s HTTP APIs. Documentation highlights compatibility with OpenAI client libraries when you point `base_url` at Groq’s OpenAI-compatible endpoint and supply a Groq API key, alongside first-party Groq SDKs for Python and JavaScript. Pricing pages publish per-model token rates (USD) for on-demand inference.
Langfuse
Langfuse is an open-source product for LLM application observability: it ingests traces and spans from your stack, supports datasets and prompt/version workflows, and offers optional Langfuse Cloud or self-hosted deployment. It integrates with popular Python/JS SDKs and frameworks that emit OpenTelemetry-compatible telemetry, so teams can debug agent loops, compare prompt iterations, and monitor production quality metrics without building a custom analytics pipeline from scratch.
OpenRouter
OpenRouter is a model gateway that exposes many third-party AI models through one OpenAI-compatible API. Teams can compare providers, set routing preferences, and switch models without rewriting core client logic for each vendor SDK. The service publishes per-model pricing and supports pay-as-you-go usage.