Skill Entry

LiteLLM Router fallback readiness review

Translates the LiteLLM routing documentation into a pre-flight checklist for promoting multi-deployment LLM routes to production. Teams verify that the Router configuration covers the primary and fallback model lists, retry policies, and load-balancing strategy documented at docs.litellm.ai/docs/routing; confirm proxy virtual keys and spend limits if traffic flows through LiteLLM Proxy; and rehearse provider outage drills using the OpenAI-mapped exceptions (AuthenticationError, RateLimitError, APIError). The skill also directs operators to enable `store_model_in_db` when MCP tools must persist alongside router definitions, and to validate that MCP server names comply with the SEP-986 guidance referenced in the LiteLLM v1.80.18 release notes.

Category Operations
Platform LiteLLM
Published 2026-05-19
litellm, routing, fallback

Use cases

  • Launching a new customer-facing assistant that must survive primary vendor rate limits
  • Migrating from a single OpenAI deployment to a Router with Anthropic or Bedrock fallbacks
  • Platform review before enabling LiteLLM Proxy MCP Gateway for IDE agents
  • Quarterly disaster-recovery exercise for LLM dependencies
  • Cost-optimization project that adds cheaper secondary models behind the same API surface

Key features

  • Inventory deployments: list each `model_name`, upstream provider, region, and whether it is primary or fallback in Router config.
  • Document retry counts, timeout budgets, and cooldown behavior exactly as set in the LiteLLM routing YAML or SDK Router objects, so that nothing relies on an undocumented default.
  • Run a controlled failure test (disable the API key or block the primary deployment) and confirm that traffic shifts to the documented fallback with observable logs.
  • If using LiteLLM Proxy, verify virtual-key budgets, guardrails, and spend-tracking dashboards reflect the drill traffic.
  • When MCP servers are in scope, confirm database storage flags and SEP-986-compliant server names per MCP gateway docs before granting teams access.
  • Capture outcomes in a sign-off table: test date, failed deployment, observed fallback model, latency delta, and open risks.
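
The inventory and documentation checks above can be partly automated. The following is a minimal sketch, not LiteLLM's own validation: it assumes the configuration has been loaded into a plain dictionary whose keys mirror the Router arguments (`model_list`, `fallbacks`, `num_retries`, `timeout`, `cooldown_time`). The deployment names, the model strings, and the `review_config` helper are hypothetical and chosen for illustration only.

```python
from typing import Any

# Hypothetical config, shaped like the arguments passed to a LiteLLM Router.
# Deployment names and model strings are placeholders, not recommendations.
ROUTER_CONFIG: dict[str, Any] = {
    "model_list": [
        {"model_name": "assistant-primary",
         "litellm_params": {"model": "openai/gpt-4o"}},
        {"model_name": "assistant-fallback",
         "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20240620"}},
    ],
    "fallbacks": [{"assistant-primary": ["assistant-fallback"]}],
    "num_retries": 2,
    "timeout": 30,
    "cooldown_time": 60,
}

# Settings the checklist wants written down rather than left to defaults.
REQUIRED_EXPLICIT = ("num_retries", "timeout", "cooldown_time")


def review_config(config: dict[str, Any]) -> list[str]:
    """Return human-readable findings; an empty list means no gaps found."""
    findings: list[str] = []
    declared = {d["model_name"] for d in config.get("model_list", [])}

    fallbacks = config.get("fallbacks", [])
    if not fallbacks:
        findings.append("no fallbacks declared")

    # Every name used in a fallback mapping must exist in model_list.
    for mapping in fallbacks:
        for source, targets in mapping.items():
            if source not in declared:
                findings.append(f"fallback source '{source}' is not in model_list")
            for target in targets:
                if target not in declared:
                    findings.append(f"fallback target '{target}' is not in model_list")

    # Flag settings that would otherwise fall back to library defaults.
    for key in REQUIRED_EXPLICIT:
        if key not in config:
            findings.append(f"'{key}' is not set explicitly")

    return findings


if __name__ == "__main__":
    for finding in review_config(ROUTER_CONFIG) or ["no gaps found"]:
        print(finding)
```

A static check like this only confirms that the configuration is internally consistent; it does not replace the controlled failure test.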

When to Use This Skill

  • Before any production cutover that introduces Router-based fallbacks
  • After adding a new provider deployment to an existing LiteLLM Proxy cluster
  • When auditors ask for evidence of LLM dependency resilience beyond a single vendor SLA

Expected Output

A signed routing readiness memo listing deployments, fallback order, test evidence, and residual risks tied to LiteLLM configuration artifacts.
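
One possible way to keep the memo's test evidence consistent is to record each drill as a structured row. The sketch below is an assumption about how a team might do this, not a format prescribed by LiteLLM: the `DrillResult` class, the `render_signoff` helper, and the two latency fields used to derive the delta are hypothetical, and the example values are placeholders rather than measurements.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DrillResult:
    """One row of the sign-off table for a controlled failure test."""
    test_date: date
    failed_deployment: str
    observed_fallback: str
    baseline_latency_ms: float   # latency measured before the drill
    fallback_latency_ms: float   # latency measured while the fallback served
    open_risks: str

    @property
    def latency_delta_ms(self) -> float:
        # Positive values mean the fallback was slower than the baseline.
        return self.fallback_latency_ms - self.baseline_latency_ms


def render_signoff(results: list[DrillResult]) -> str:
    """Render drill results as a pipe-separated plain-text table."""
    header = ["Test date", "Failed deployment", "Observed fallback",
              "Latency delta (ms)", "Open risks"]
    lines = [" | ".join(header)]
    for r in results:
        lines.append(" | ".join([
            r.test_date.isoformat(),
            r.failed_deployment,
            r.observed_fallback,
            f"{r.latency_delta_ms:+.0f}",
            r.open_risks or "none",
        ]))
    return "\n".join(lines)


# Placeholder row for illustration only.
EXAMPLE = DrillResult(date(2026, 1, 15), "assistant-primary",
                      "assistant-fallback", 800.0, 1150.0, "")

if __name__ == "__main__":
    print(render_signoff([EXAMPLE]))
```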

Frequently Asked Questions

Is this only for the Proxy server?
No. The checklist also applies to in-process LiteLLM Routers; add proxy-specific steps only when traffic terminates at the gateway.
Do we need MCP enabled to review routing?
Only if your architecture routes MCP tools through LiteLLM Proxy; otherwise focus on completion routing sections.
Can we skip live failure tests?
Documentation-driven reviews help, but a controlled primary-outage drill is the only way to prove fallbacks actually fire.

Related

Example SLO document authoring

Operations

Operationalizes Appendix A from Google’s SRE workbook by translating the illustrative “Example Game Service” SLO dossier into a checklist teams can mimic: articulate the user-facing workload, nominate rolling measurement windows (the appendix uses four weeks), pair each subsystem with tightly defined SLIs (availability from load balancers excluding 5xx, latency percentile gates, freshness for derived tables, correctness via probers, completeness for pipelines), cite explicit numerator/denominator language, rationalize rounding policies, quantify per-objective error budgets, and cite the sibling error budget policy for enforcement.

Error budget policy drafting

Operations

Translates Google’s worked example error-budget policy into a repeatable playbook for tying release tempo to measured reliability: define goals (protect users from repeated SLO misses while preserving innovation incentives), spell out what happens when the rolling window consumes its budget (freeze changes except urgent defects or security work), codify outage investigation thresholds, and document escalation paths when stakeholders disagree about budget math.

Incident response

Operations

Structured process for handling production incidents from detection to resolution and post-mortem. Covers severity assessment using P0-P3 grading, team coordination with a designated incident commander, communication templates for stakeholders and users, and structured post-mortem requirements to drive organizational learning from every significant outage.