Unified OpenAI-format API for 100+ LLM providers plus optional self-hosted gateway
LiteLLM is an open-source Python library and proxy stack documented at docs.litellm.ai that exposes a single `completion()` interface across providers such as OpenAI, Anthropic, Vertex AI, Bedrock, and Ollama using OpenAI-compatible request and response shapes. The project documents a Router with retry, fallback, and load-balancing across deployments, optional observability callbacks (Langfuse, MLflow, Helicone, and others listed in observability guides), and a self-hosted LiteLLM Proxy (LLM Gateway) with virtual keys, spend tracking, guardrails, and an admin UI. Recent documentation also describes an MCP Gateway that centralizes MCP tool access with per-key, per-team, and per-organization permissions.
Use cases
- Swap model vendors behind one SDK without rewriting client code for each provider's quirks
- Operate an internal LLM gateway with per-team budgets and centralized logging
- Configure automatic fallbacks when a primary deployment hits rate limits or outages
- Attach Langfuse or MLflow callbacks through documented success_callback hooks
- Front multiple MCP servers through LiteLLM's proxy MCP feature for IDE agents
Key features
- Provider-agnostic `completion()`, embeddings, and related APIs with OpenAI-style `ModelResponse` objects per docs
- Router documented for retries, fallbacks, and load balancing across model deployments
- Proxy server quickstarts (CLI and Docker) exposing an OpenAI-compatible base URL on port 4000 in examples
- Virtual keys, cost tracking, and guardrails sections in proxy documentation
- MCP Gateway overview listing Streamable HTTP, SSE, and stdio transports with key/team access controls
Who Is It For?
- Platform engineers standardizing LLM access across microservices
- Teams running multi-provider inference with cost controls
- Developers who want OpenAI-client compatibility against non-OpenAI backends
Frequently Asked Questions
- Is LiteLLM only a Python library?
- Docs position the Python SDK and the optional LiteLLM Proxy as complementary paths—install `litellm` for in-app use or `litellm[proxy]` for the gateway.
- How do exceptions behave across providers?
- Official guides map provider failures to OpenAI exception types such as AuthenticationError and RateLimitError for consistent handling.
- Does the proxy replace observability tools?
- No—it integrates with them via callbacks; you still choose Langfuse, MLflow, or other vendors listed in observability docs.
Related
Related
3 Indexed items
Helicone
Helicone documents an AI Gateway at ai-gateway.helicone.ai that lets teams call 100+ models from OpenAI, Anthropic, Google, Groq, and other vendors through an OpenAI-compatible base URL while logging every request to the Helicone dashboard. Official quickstart guides show signing up at helicone.ai, creating API keys in the US control plane, and pointing standard OpenAI SDK clients at the gateway with automatic observability. Helicone states credits carry 0% markup versus provider list prices, support automatic fallbacks when a provider is down, and allow bringing your own provider keys instead of using Helicone-managed credentials.
Portkey
Portkey documents an AI gateway at docs.portkey.ai that unifies access to more than 250 models through a Portkey SDK or OpenAI-compatible base URL (`PORTKEY_GATEWAY_URL`) with provider routing headers. Official quickstarts show three-line Python or TypeScript integrations that start monitoring LLM requests for resilience, security, and performance. Portkey states the open-source gateway is free to self-host while the managed service includes a free tier of 10k requests per month, edge-hosted workers adding roughly 20–40ms latency versus direct API calls, ISO 27001 and SOC 2 certifications, and optional configurations that skip storing request/response bodies.
OpenRouter
OpenRouter is a model gateway that exposes many third-party AI models through one OpenAI-compatible API. Teams can compare providers, set routing preferences, and switch models without rewriting core client logic for each vendor SDK. The service publishes per-model pricing and supports pay-as-you-go usage.