dagraph lets you attach a `fallback_chain` to any agent node — an ordered list of alternative models to try before giving up. This guide shows you how to set up fallback chains, which errors trigger them, and how to combine fallbacks with rate limiting to run reliable multi-provider pipelines.
## Model prefix routing

dagraph uses the prefix of a model string to determine which provider to call. No separate provider configuration is needed — the prefix is the routing key:

| Prefix | Provider |
|---|---|
| `anthropic/` | Anthropic Messages API |
| `openai/` | OpenAI API |
| `gemini/` | Google Gemini API |
| `ollama/` | Local Ollama instance |
`claude-haiku-4-5-20251001` (no prefix) uses the executor backend you specify with `--backend`. Add a prefix to explicitly route to that provider regardless of backend.
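For example, a single pipeline can mix providers just by varying the prefix. This is a hypothetical node spec — the exact YAML field names, and the non-Claude model names, are assumptions for illustration:

```yaml
nodes:
  summarize:
    model: anthropic/claude-haiku-4-5-20251001   # routed to the Anthropic Messages API
  classify:
    model: openai/gpt-4o-mini                    # routed to the OpenAI API
  draft:
    model: claude-haiku-4-5-20251001             # no prefix: uses the --backend executor
```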
## Adding a fallback chain
Attach `fallback_chain` to any agent node. The scheduler tries the primary model first, then works down the list in order, stopping at the first successful response:
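A minimal sketch of what that might look like — field names and the fallback model names are assumptions, not dagraph's exact schema:

```yaml
nodes:
  summarize:
    model: anthropic/claude-haiku-4-5-20251001   # primary: tried first
    fallback_chain:
      - openai/gpt-4o-mini        # tried second
      - gemini/gemini-2.0-flash   # tried third
      - ollama/llama3             # last resort: local, no API key needed
```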
Each provider needs its own API key set as an environment variable: `ANTHROPIC_API_KEY` for Anthropic, `OPENAI_API_KEY` for OpenAI, and `GEMINI_API_KEY` for Gemini. Ollama requires a running local instance but no API key.

## Which errors trigger the fallback chain
The scheduler walks the fallback chain when it receives a retriable error from the current provider:

- HTTP 429 — rate limit exceeded
- HTTP 5xx — provider-side server error
- Network errors — connection refused, DNS failure, dropped connection
- Timeouts — request exceeded the node's `timeout_seconds`
## Which errors bypass the chain
The scheduler does not try fallbacks for errors that a different provider cannot fix:

- HTTP 401 / 403 — authentication or authorization failure. A bad API key is still bad at the next provider. Fix your credentials instead.
- HTTP 400 / 422 — bad request or validation error. The problem is in your prompt or parameters, not the provider.
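The two lists above amount to a simple predicate: move to the next model only when a different provider could plausibly succeed. A minimal sketch of that decision — the function name and signature are illustrative, not dagraph's API:

```python
def should_try_fallback(status_code=None, network_error=False, timed_out=False):
    """Decide whether a failed attempt warrants moving to the next
    model in the fallback chain."""
    if network_error or timed_out:
        return True                      # a different provider may be reachable
    if status_code == 429:
        return True                      # rate limited: another provider has its own quota
    if status_code is not None and 500 <= status_code <= 599:
        return True                      # provider-side outage
    if status_code in (400, 401, 403, 422):
        return False                     # our request or credentials are wrong
    return False                         # unknown errors: fail fast by default
```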
## Cost and traces
You are only charged for the successful attempt. If the primary model fails and the first fallback succeeds, you pay only for the fallback call — the failed attempt's tokens are not billed (the request never completed). Every attempt, successful or not, appears in the run trace at `runs/<run_id>/trace.jsonl`. Inspect a run to see which provider actually answered:
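If each line of the trace file is a JSON object describing one attempt, a few lines of Python surface the per-attempt outcome. The `model` and `status` field names here are assumptions about the trace schema, not dagraph's documented format:

```python
import json

def attempts(trace_path):
    """Yield (model, status) for every attempt recorded in a JSONL trace."""
    with open(trace_path) as f:
        for line in f:
            record = json.loads(line)
            yield record["model"], record["status"]

# for model, status in attempts("runs/<run_id>/trace.jsonl"):
#     print(f"{model}: {status}")
```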
## Rate limiting with `--rpm`
When using the `--backend api` option, you can hit Anthropic's (or another provider's) per-minute request limits even without a provider outage. Use `--rpm` to throttle the request rate before you trigger 429s:
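For example — a hypothetical invocation; your entrypoint and pipeline file will differ:

```shell
# Cap the whole executor at 60 requests per minute
dagraph run pipeline.yaml --backend api --rpm 60
```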
`--rpm` applies to the executor as a whole — across all nodes and all providers. It is most useful with large fan-out DAGs where many nodes fire simultaneously.
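Conceptually, a global RPM cap is one shared limiter consulted before every request starts, regardless of which node or provider is making it. A minimal sliding-window sketch of that idea — not dagraph's implementation:

```python
import threading
import time
from collections import deque

class GlobalRateLimiter:
    """One limiter shared by every node in the executor: at most `rpm`
    request starts inside any 60-second window."""

    def __init__(self, rpm, clock=time.monotonic):
        self.rpm = rpm
        self.clock = clock            # injectable for testing
        self.starts = deque()         # timestamps of admitted requests
        self.lock = threading.Lock()

    def next_wait(self):
        """Seconds to wait before the next request may start (0.0 if now)."""
        with self.lock:
            now = self.clock()
            while self.starts and now - self.starts[0] >= 60:
                self.starts.popleft()          # drop starts outside the window
            if len(self.starts) < self.rpm:
                self.starts.append(now)        # admit this request
                return 0.0
            return 60 - (now - self.starts[0]) # wait until the oldest expires
```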
## Fallback chains inside `evaluator_loop`
Generator and evaluator roles inside an `evaluator_loop` node also support `fallback_chain` via the `AgentRole` spec:
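A hypothetical spec — the role field names and fallback model names are assumptions for illustration:

```yaml
nodes:
  refine:
    type: evaluator_loop
    generator:
      model: anthropic/claude-haiku-4-5-20251001
      fallback_chain:
        - openai/gpt-4o-mini
    evaluator:
      model: gemini/gemini-2.0-flash
      fallback_chain:
        - ollama/llama3
```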
- **Parallel agents:** Combine fallback chains with parallel execution for resilient fan-out workflows.
- **Evaluator loops:** Add fallback chains to generator and evaluator roles for reliable iterative refinement.