AI Tool

Replicate

Run and fine-tune ML models via a hosted prediction API

Replicate is a hosted platform for executing third-party and custom machine-learning models over HTTP without provisioning GPUs yourself. Official documentation explains how to authenticate with API tokens, create asynchronous predictions, stream outputs, retrieve model metadata, wire webhooks for completion events, and optionally deploy or fine-tune checkpoints (for example FLUX image workflows) published to the Replicate catalog.
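
As a quick orientation, here is a minimal sketch using the official `replicate` Python client; it assumes `REPLICATE_API_TOKEN` is set in the environment, and the model identifier and prompt are illustrative placeholders rather than recommendations:

```python
# Minimal sketch with the official `replicate` client (pip install replicate).
# Assumes REPLICATE_API_TOKEN is exported; model id and prompt are placeholders.
import replicate

# replicate.run() creates a prediction, waits for it, and returns the output.
output = replicate.run(
    "black-forest-labs/flux-schnell",          # example catalog model
    input={"prompt": "an astronaut riding a horse"},
)
print(output)
```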

Category Developer Tools
Pricing Pay-per-prediction billing + prepaid credits (see Replicate billing docs)
Platforms Web / API / Python / Node.js / Colab
Tags inference / api / image-models

Use cases

  • Prototyping generative media workloads (image, audio, video) while abstracting GPU provisioning
  • Embedding remote inference behind product APIs with webhook-driven completion handling
  • Training or fine-tuning image models when following Replicate-published fine-tune guides
  • Teaching ML integration patterns with short official sample notebooks or Node samples
  • Comparing model outputs side by side using catalog search plus prediction metadata

Key features

  • HTTP prediction APIs with documented lifecycles (`predictions.create`, `predictions.get`) and optional streaming output modes; see the lifecycle sketch after this list
  • First-party quickstarts for Python, Node.js, and Google Colab plus links to client libraries and OpenAPI schemas
  • Model catalog mixing community and official checkpoints with metadata endpoints for discovery workflows
  • Guides for webhook delivery, verification, and testing when orchestrating long-running jobs
  • Build-and-deploy paths for pushing custom models or fine-tuned variants (Transformers, Diffusers, ComfyUI integrations, etc.)
  • Operational topics such as rate limits, safety checking, data-retention notes, and prepaid billing credits published in the billing section
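
As referenced in the features list, a sketch of the asynchronous create-then-poll lifecycle with the Python client; the version hash and input are placeholders you would take from a model page in the catalog:

```python
# Sketch of the async prediction lifecycle: create, then poll with get.
# The version hash and input values are placeholders.
import time
import replicate

prediction = replicate.predictions.create(
    version="MODEL_VERSION_HASH",                  # placeholder version id
    input={"prompt": "a watercolor harbor at dusk"},
)

# Poll until the prediction reaches a terminal state.
while prediction.status not in ("succeeded", "failed", "canceled"):
    time.sleep(2)
    prediction = replicate.predictions.get(prediction.id)

print(prediction.status, prediction.output)
```

For long-running jobs, the webhook pattern covered in the FAQ below avoids polling entirely.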

Who is it for?

  • Product engineers shipping inference APIs without dedicated ML ops teams
  • ML practitioners needing hosted GPUs for intermittent experimentation
  • Developer educators demonstrating hosted model invocation patterns

Frequently Asked Questions

How does authentication work?
Replicate uses API tokens, created and managed under account settings, as bearer credentials for HTTPS calls.
Do I manage infrastructure?
The platform abstracts the hardware: the docs describe selecting models and invoking predictions remotely rather than reserving raw GPU nodes, unless you pursue the advanced deployment flows.
Can outputs notify my backend?
Yes—webhooks are covered in dedicated topics explaining setup, payload handling, and verification steps.
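
As a sketch of that setup, the prediction-creation call can carry a webhook URL that Replicate will POST status events to; the endpoint URL here is a placeholder, and signature verification is covered in the dedicated webhook docs:

```python
# Sketch: attach a webhook so Replicate POSTs completion events to your backend.
# The endpoint URL and version hash are placeholders.
import replicate

prediction = replicate.predictions.create(
    version="MODEL_VERSION_HASH",                    # placeholder version id
    input={"prompt": "isometric city block at night"},
    webhook="https://example.com/replicate-hook",    # your HTTPS endpoint
    webhook_events_filter=["completed"],             # deliver only final events
)
```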

Related

Together AI

Developer Tools · Usage-based inference + optional dedicated endpoints / fine-tuning (see Together pricing docs)

Together AI operates a developer platform for running prominent open-source and open-weight models on Together-hosted GPUs. Documentation centers on issuing API keys, installing the Together Python (`together`) or npm (`together-ai`) SDKs, or calling HTTPS endpoints such as `https://api.together.ai/v1/chat/completions` with Bearer authentication. Guides cover streaming chat completions, function calling, structured outputs, model catalog browsing, GPU reservations for steady traffic, and the fine-tuning and dedicated-cluster offerings published in the broader docs hierarchy.
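
A minimal sketch of the raw HTTPS path using `requests` against the endpoint quoted above; the model name is an illustrative placeholder and `TOGETHER_API_KEY` is assumed to be set:

```python
# Sketch: call Together's documented chat-completions endpoint directly.
# Assumes TOGETHER_API_KEY is exported; the model name is a placeholder.
import os
import requests

resp = requests.post(
    "https://api.together.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # placeholder model id
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```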

Groq Cloud API

Developer Tools · Free tier + Pay-as-you-go (published USD rates)

GroqCloud exposes hosted language, speech, and compound workloads through Groq’s HTTP APIs. Documentation highlights compatibility with OpenAI client libraries when you point `base_url` at Groq’s OpenAI-compatible endpoint and supply a Groq API key, alongside first-party Groq SDKs for Python and JavaScript. Pricing pages publish per-model token rates (USD) for on-demand inference.
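
A sketch of that compatibility path, pointing the `openai` Python client at Groq; the base URL follows Groq's published docs and the model id is a placeholder:

```python
# Sketch: reuse the openai client by pointing base_url at Groq.
# Assumes GROQ_API_KEY is exported; the model id is a placeholder.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",   # Groq's OpenAI-compatible endpoint
    api_key=os.environ["GROQ_API_KEY"],
)

chat = client.chat.completions.create(
    model="llama-3.1-8b-instant",                # placeholder model id
    messages=[{"role": "user", "content": "One line on why latency matters."}],
)
print(chat.choices[0].message.content)
```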

OpenRouter

Developer Tools · Free tier + Pay-as-you-go

OpenRouter is a model gateway that exposes many third-party AI models through one OpenAI-compatible API. Teams can compare providers, set routing preferences, and switch models without rewriting core client logic for each vendor SDK. The service publishes per-model pricing and supports pay-as-you-go usage.
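
The same client-reuse pattern applies here; a sketch assuming an OpenRouter key, with the base URL and vendor-prefixed model id taken as assumptions from OpenRouter's public docs rather than from this listing:

```python
# Sketch: OpenRouter's OpenAI-compatible gateway via the openai client.
# Assumes OPENROUTER_API_KEY is exported; base URL and model id are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

chat = client.chat.completions.create(
    model="openai/gpt-4o-mini",   # placeholder vendor-prefixed model id
    messages=[{"role": "user", "content": "Summarize model routing in one sentence."}],
)
print(chat.choices[0].message.content)
```

Switching providers here is a one-line change to `model`, which is the gateway's main design point.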