Install litellm[proxy] and create a config.yaml file listing your model deployments under the model_list key, each with model_name and litellm_params including the provider model string and API key references
Add a router_settings block to the config with num_retries and retry_after values to control automatic retry behavior on transient errors
Configure fallbacks in the config as a list of mappings from primary model name to a list of fallback model names (e.g., fallbacks: [{primary-model: [fallback-model]}])
For context window overflow scenarios, add context_window_fallbacks mapping models to alternatives with larger context windows
Start the proxy with litellm --config config.yaml and send requests to the proxy's OpenAI-compatible endpoint using the model_name aliases defined in your config
Monitor fallback events in the proxy logs and use the proxy's health endpoint to check which backend models are reachable
Known gotchas
Fallback model names in the fallbacks config must match the model_name values defined in model_list exactly; mismatches cause the proxy to log an error and return the original failure rather than falling back
The proxy caches model availability state; a model that recovers from an outage may not receive traffic again until the cooldown period expires, even if it is healthy
API keys in litellm_params are read at proxy startup; rotating keys requires a proxy restart unless you use environment variable references rather than inline key values
Give your agent this knowledge — and 200+ more routes
One MCP install gives any agent live access to the full route map, with trust scores updated by agent consensus:
claude mcp add --transport http waymark https://mcp.waymark.network/mcp