Unified OpenAI-format API for 100+ LLM providers plus optional self-hosted gateway
LiteLLM is an open-source Python library and proxy stack documented at docs.litellm.ai that exposes a single `completion()` interface across providers such as OpenAI, Anthropic, Vertex AI, Bedrock, and Ollama using OpenAI-compatible request and response shapes. The project documents a Router with retry, fallback, and load-balancing across deployments, optional observability callbacks (Langfuse, MLflow, Helicone, and others listed in observability guides), and a self-hosted LiteLLM Proxy (LLM Gateway) with virtual keys, spend tracking, guardrails, and an admin UI. Recent documentation also describes an MCP Gateway that centralizes MCP tool access with per-key, per-team, and per-organization permissions.
Use cases
- Swap model vendors behind one SDK without rewriting client code for each provider's quirks
- Operate an internal LLM gateway with per-team budgets and centralized logging
- Configure automatic fallbacks when a primary deployment hits rate limits or outages
- Attach Langfuse or MLflow callbacks through documented success_callback hooks
- Front multiple MCP servers through LiteLLM's proxy MCP feature for IDE agents
Key features
- Provider-agnostic `completion()`, embeddings, and related APIs with OpenAI-style `ModelResponse` objects per docs
- Router documented for retries, fallbacks, and load balancing across model deployments
- Proxy server quickstarts (CLI and Docker) exposing an OpenAI-compatible base URL on port 4000 in examples
- Virtual keys, cost tracking, and guardrails sections in proxy documentation
- MCP Gateway overview listing Streamable HTTP, SSE, and stdio transports with key/team access controls
Who Is It For?
- Platform engineers standardizing LLM access across microservices
- Teams running multi-provider inference with cost controls
- Developers who want OpenAI-client compatibility against non-OpenAI backends
Frequently Asked Questions
- Is LiteLLM only a Python library?
- Docs position the Python SDK and the optional LiteLLM Proxy as complementary paths—install `litellm` for in-app use or `litellm[proxy]` for the gateway.
- How do exceptions behave across providers?
- Official guides map provider failures to OpenAI exception types such as AuthenticationError and RateLimitError for consistent handling.
- Does the proxy replace observability tools?
- No—it integrates with them via callbacks; you still choose Langfuse, MLflow, or other vendors listed in observability docs.
Related
Related
3 Indexed items
OpenRouter
OpenRouter is a model gateway that exposes many third-party AI models through one OpenAI-compatible API. Teams can compare providers, set routing preferences, and switch models without rewriting core client logic for each vendor SDK. The service publishes per-model pricing and supports pay-as-you-go usage.
Langfuse
Langfuse is an open-source product for LLM application observability: it ingests traces and spans from your stack, supports datasets and prompt/version workflows, and offers optional Langfuse Cloud or self-hosted deployment. It integrates with popular Python/JS SDKs and frameworks that emit OpenTelemetry-compatible telemetry, so teams can debug agent loops, compare prompt iterations, and monitor production quality metrics without building a custom analytics pipeline from scratch.
LangSmith
LangSmith is LangChain's hosted and self-hostable platform for tracing, monitoring, and improving LLM applications. Official documentation at docs.langchain.com describes instrumenting apps via environment variables, framework integrations (OpenAI, Anthropic, CrewAI, Vercel AI SDK, Pydantic AI, and others listed on the integrations page), or the LangSmith SDK so teams can inspect multi-step runs, compare prompt versions, build datasets, run offline and online evaluations, configure automations, and collect feedback queues—without assembling bespoke analytics for agent loops.