Translates LiteLLM routing documentation into a pre-flight checklist to run before promoting multi-deployment LLM routes to production. Teams verify that the Router configuration covers primary and fallback model lists, retry policies, and the load-balancing strategy documented at docs.litellm.ai/docs/routing. If traffic flows through LiteLLM Proxy, they also confirm proxy virtual keys and spend limits. Finally, they rehearse provider outage drills using the OpenAI-style exception types that LiteLLM maps provider errors to (AuthenticationError, RateLimitError, APIError). The skill also directs operators to enable `store_model_in_db` when MCP tools must persist alongside router definitions, and to validate that MCP server names comply with the SEP-986 guidance referenced in the LiteLLM v1.80.18 release notes.
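The configuration checks described above can be sketched as a small pre-flight function. This is a minimal sketch, not LiteLLM's own validation: the dict shapes are assumed to mirror the `model_list` and `fallbacks` arguments described in the routing docs, and `preflight`, `REQUIRED_SETTINGS`, the group names, and the sample model strings are hypothetical values introduced for illustration.

```python
from typing import Any

# Settings the checklist wants stated explicitly rather than left to defaults.
# The key names are assumptions modeled on LiteLLM Router arguments.
REQUIRED_SETTINGS = ("num_retries", "timeout", "routing_strategy")


def preflight(config: dict[str, Any]) -> list[str]:
    """Return human-readable findings; an empty list means the check passed."""
    findings: list[str] = []
    groups = {d["model_name"] for d in config.get("model_list", [])}

    # Merge fallback entries shaped like [{"primary": ["fallback-a", ...]}, ...]
    fallbacks: dict[str, list[str]] = {}
    for entry in config.get("fallbacks", []):
        for primary, targets in entry.items():
            fallbacks.setdefault(primary, []).extend(targets)
    fallback_targets = {t for targets in fallbacks.values() for t in targets}

    # Every name used in a fallback rule must exist in the inventory.
    for primary, targets in fallbacks.items():
        if primary not in groups:
            findings.append(f"fallback source '{primary}' is not in model_list")
        for target in targets:
            if target not in groups:
                findings.append(f"fallback target '{target}' is not in model_list")

    # A group with no fallbacks that is not itself a fallback is unprotected.
    for name in sorted(groups):
        if name not in fallbacks and name not in fallback_targets:
            findings.append(f"model group '{name}' has no fallback")

    for key in REQUIRED_SETTINGS:
        if key not in config:
            findings.append(f"'{key}' is not set explicitly")
    return findings


# Hypothetical config; model strings are placeholders, not recommendations.
example_config = {
    "model_list": [
        {"model_name": "assistant-primary", "litellm_params": {"model": "openai/gpt-4o"}},
        {"model_name": "assistant-fallback", "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20240620"}},
    ],
    "fallbacks": [{"assistant-primary": ["assistant-fallback"]}],
    "num_retries": 2,
    "timeout": 30,
}

for finding in preflight(example_config):
    print(finding)
```

Returning findings as a list, rather than raising on the first problem, lets the reviewer see every gap in one pass before the sign-off meeting.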
Use cases
- Launching a new customer-facing assistant that must survive primary vendor rate limits
- Migrating from a single OpenAI deployment to a Router with Anthropic or Bedrock fallbacks
- Platform review before enabling LiteLLM Proxy MCP Gateway for IDE agents
- Quarterly disaster-recovery exercise for LLM dependencies
- Cost-optimization project that adds cheaper secondary models behind the same API surface
Key features
- Inventory deployments: list each `model_name`, upstream provider, region, and whether it is primary or fallback in Router config.
- Document retry counts, timeout budgets, and cooldown behavior exactly as set in the LiteLLM routing YAML or SDK Router objects; do not rely on undocumented defaults.
- Run a controlled failure test (disable the primary API key or block the primary deployment) and confirm that traffic shifts to the documented fallback, with logs that show the switch.
- If using LiteLLM Proxy, verify virtual-key budgets, guardrails, and spend-tracking dashboards reflect the drill traffic.
- When MCP servers are in scope, confirm database storage flags and SEP-986-compliant server names per MCP gateway docs before granting teams access.
- Capture outcomes in a sign-off table: test date, failed deployment, observed fallback model, latency delta, and open risks.
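The failure drill and sign-off record from the list above can be rehearsed offline before touching real deployments. The sketch below uses stand-in callables instead of real provider calls; `SimulatedRateLimitError`, `call_with_fallback`, `DrillRecord`, and the deployment names are hypothetical, and the fallback loop only illustrates the behavior under test, not LiteLLM's internal routing logic.

```python
import time
from dataclasses import dataclass
from typing import Callable


class SimulatedRateLimitError(Exception):
    """Stand-in for a provider rate-limit error raised during the drill."""


@dataclass
class DrillRecord:
    """One row of the sign-off table."""
    test_date: str
    failed_deployment: str
    observed_fallback: str
    latency_delta_s: float
    open_risks: str


def call_with_fallback(order: list[tuple[str, Callable[[], str]]]) -> tuple[str, list[str]]:
    """Try deployments in order; return the name that answered and the names that failed."""
    failed: list[str] = []
    for name, call in order:
        try:
            call()
            return name, failed
        except SimulatedRateLimitError:
            failed.append(name)
    raise RuntimeError(f"all deployments failed: {failed}")


def blocked() -> str:
    raise SimulatedRateLimitError("primary blocked for drill")


def healthy() -> str:
    return "ok"


# Baseline: a healthy call with no failover.
start = time.perf_counter()
healthy()
baseline = time.perf_counter() - start

# Drill: the primary fails first, then the fallback answers.
start = time.perf_counter()
served_by, failed = call_with_fallback(
    [("assistant-primary", blocked), ("assistant-fallback", healthy)]
)
drill = time.perf_counter() - start

record = DrillRecord(
    test_date="YYYY-MM-DD",  # fill in the actual drill date
    failed_deployment=failed[0],
    observed_fallback=served_by,
    latency_delta_s=drill - baseline,
    open_risks="none recorded",
)
print(record)
```

In a real drill the two callables would be replaced by requests against the staged Router or Proxy, and the record would be filled from observed logs rather than from the harness itself.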
When to Use This Skill
- Before any production cutover that introduces Router-based fallbacks
- After adding a new provider deployment to an existing LiteLLM Proxy cluster
- When auditors ask for evidence of LLM dependency resilience beyond a single vendor SLA
Expected Output
A signed routing readiness memo listing deployments, fallback order, test evidence, and residual risks tied to LiteLLM configuration artifacts.
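One way to assemble such a memo is to render the four required sections as plain text and flag any that are empty. This is a sketch under stated assumptions: `render_memo`, the section titles, and the sample entries are hypothetical, and the layout is not a format prescribed by LiteLLM.

```python
def render_memo(
    deployments: list[str],
    fallback_order: list[str],
    evidence: list[str],
    residual_risks: list[str],
) -> str:
    """Render the four memo sections as plain text, flagging empty ones."""
    sections = [
        ("Deployments", deployments),
        ("Fallback order", fallback_order),
        ("Test evidence", evidence),
        ("Residual risks", residual_risks),
    ]
    lines = ["Routing readiness memo"]
    for title, items in sections:
        lines.append(f"{title}:")
        if items:
            lines.extend(f"  - {item}" for item in items)
        else:
            # An empty section blocks sign-off, so make it impossible to miss.
            lines.append("  - MISSING")
    return "\n".join(lines)


memo = render_memo(
    deployments=["assistant-primary", "assistant-fallback"],
    fallback_order=["assistant-primary -> assistant-fallback"],
    evidence=[],  # no drill has been recorded yet in this example
    residual_risks=["fallback latency not yet measured under load"],
)
print(memo)
```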
Frequently Asked Questions
- Is this only for the Proxy server?
  No. The checklist applies to in-process LiteLLM Routers as well; add proxy-specific steps only when traffic terminates at the gateway.
- Do we need MCP enabled to review routing?
  Only if your architecture routes MCP tools through LiteLLM Proxy; otherwise focus on completion routing sections.
- Can we skip live failure tests?
  Documentation-driven reviews help, but a controlled primary-outage drill is the only way to prove fallbacks actually fire.
Related
Multi-region LLM provider readiness review
Converts export-control and multi-vendor routing guidance into a planning checklist for teams that cannot assume a single geography or chip supplier will stay available. Practitioners document primary and contingency model routes (including gateways such as Helicone or LiteLLM Router configs), quantify revenue or latency exposure if a region is blocked, and set investor/customer messaging when leadership advises to "expect nothing" from a market—as publicly reported when semiconductor vendors discuss China licensing uncertainty. The skill cross-checks legal/compliance sign-off, drills failover to alternate regions or domestic stacks, and records evidence before production launches tied to geopolitically sensitive deployments.
Agentic coding vendor readiness review
Turns platform reliability and multi-vendor coding-agent guidance into a checklist before standardizing on a single AI coding stack. Teams inventory host-platform SLAs (for example GitHub availability incidents documented on githubstatus.com), compare primary and backup agents (GitHub Copilot, Cursor, Claude Code, Codex, etc.), verify observability hooks through Braintrust or similar tracing, and rehearse workflows when the code host or agent API is degraded. The skill cites public status pages and vendor billing changes—such as usage-based Copilot pricing announced on github.blog—so procurement and engineering sign off with eyes open about downtime, leadership churn, and feature parity gaps reported in trade media.
AI economic benefit distribution readiness review
Converts public-policy and labor-relations guidance around AI-driven wealth into a planning checklist for organizations operating in semiconductor-heavy economies. Teams document how AI productivity gains translate—or fail to translate—into worker bonuses, public dividends, or reinvestment; assess concentration risk when chipmakers dominate equity indices; and prepare dialogue frameworks for recurring labor-management disputes as agentic automation scales. The skill cites CNBC reporting on South Korea's deputy prime minister urging that AI benefits reach the public amid Samsung strike negotiations, Kospi gains led by Samsung and SK Hynix, and debates over distributing AI-sector tax windfalls—without prescribing specific tax policies beyond verifying stakeholder messaging against cited facts.