Production AI workflows fail for reasons outside your control: a provider hits a capacity event, your account reaches its rate limit, or a network timeout drops a request. Rather than failing the whole DAG when one call fails, dagraph lets you attach a fallback_chain to any agent node — an ordered list of alternative models to try before giving up. This guide shows you how to set up fallback chains, which errors trigger them, and how to combine fallbacks with rate limiting to run reliable multi-provider pipelines.

Model prefix routing

dagraph uses the prefix of a model string to determine which provider to call. No separate provider configuration is needed — the prefix is the routing key:
  • anthropic/ → Anthropic Messages API
  • openai/ → OpenAI API
  • gemini/ → Google Gemini API
  • ollama/ → Local Ollama instance
A bare model name like claude-haiku-4-5-20251001 (no prefix) uses the executor backend you specify with --backend. Add a prefix to explicitly route to that provider regardless of backend.
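For example, the two nodes below differ only in the prefix: the first is routed by whatever --backend selects, the second always goes to Anthropic. This is an illustrative sketch; the node ids and prompts are placeholders:
# Illustrative sketch: bare vs. prefixed model names
nodes:
  - id: summarize_local
    type: agent
    model: claude-haiku-4-5-20251001              # no prefix: uses the --backend executor
    prompt: "Summarize {{ topic }} in two sentences."

  - id: summarize_api
    type: agent
    model: anthropic/claude-haiku-4-5-20251001    # prefixed: always routed to the Anthropic Messages API
    prompt: "Summarize {{ topic }} in two sentences."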

Adding a fallback chain

Attach fallback_chain to any agent node. The scheduler tries the primary model first, then works down the list in order, stopping at the first successful response:
# multi_provider_fallback.yaml
name: multi_provider_fallback
inputs:
  topic:
    type: string

nodes:
  - id: research
    type: agent
    model: anthropic/claude-sonnet-4-6      # primary: try this first
    fallback_chain:
      - openai/gpt-4o                        # fallback 1
      - gemini/gemini-2.0-flash              # fallback 2
      - ollama/llama3.2                      # last resort: local, free
    prompt: |
      Research the topic: {{ topic }}.
      Return 3 key findings as bullet points.

  - id: critique
    type: agent
    model: openai/gpt-4o
    fallback_chain:
      - anthropic/claude-sonnet-4-6
    depends_on: [research]
    prompt: |
      Critique this research:

      {{ research }}

      What's missing? What's overstated?
Run it against the API backend so multi-provider routing is active:
agentgraph run multi_provider_fallback.yaml \
  --input topic="battery storage technology" \
  --backend api
Each provider needs its own API key set as an environment variable: ANTHROPIC_API_KEY for Anthropic, OPENAI_API_KEY for OpenAI, and GEMINI_API_KEY for Gemini. Ollama requires a running local instance but no API key.
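For example, export the keys before running (the values shown are placeholders):
export ANTHROPIC_API_KEY="..."   # Anthropic Messages API
export OPENAI_API_KEY="..."      # OpenAI API
export GEMINI_API_KEY="..."      # Google Gemini API
# Ollama needs no key; just make sure the local instance is running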

Which errors trigger the fallback chain

The scheduler walks the fallback chain when it receives a retriable error from the current provider:
  • HTTP 429 — rate limit exceeded
  • HTTP 5xx — provider-side server error
  • Network errors — connection refused, DNS failure, dropped connection
  • Timeouts — request exceeded the node’s timeout_seconds
These errors trigger a retry on the next model in the chain because a different provider might succeed where the first one failed.
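Because a timeout is retriable, pairing a node's timeout_seconds with a fallback chain bounds how long a slow provider can stall that node before the scheduler moves on. A minimal sketch, assuming timeout_seconds is set per node as referenced above (the values are illustrative):
  - id: research
    type: agent
    model: anthropic/claude-sonnet-4-6
    timeout_seconds: 60            # an attempt running past 60s counts as a timeout...
    fallback_chain:
      - openai/gpt-4o              # ...so the scheduler tries this model next
    prompt: |
      Research the topic: {{ topic }}.
      Return 3 key findings as bullet points.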

Which errors bypass the chain

The scheduler does not try fallbacks for errors that a different provider cannot fix:
  • HTTP 401 / 403 — authentication or authorization failure. A bad API key is still bad at the next provider. Fix your credentials instead.
  • HTTP 400 / 422 — bad request or validation error. The problem is in your prompt or parameters, not the provider.

Cost and traces

You are charged only for the successful attempt. If the primary model fails and the first fallback succeeds, you pay for the fallback call alone; the failed attempt never returned a completion, so there is nothing to bill. Every attempt, successful or not, appears in the run trace at runs/<run_id>/trace.jsonl. Inspect a run to see which provider actually answered:
agentgraph inspect <run_id>
The trace entry for a node that used a fallback includes both the failed attempt (with the error) and the successful attempt (with the model that answered).

Rate limiting with --rpm

When using the --backend api option, you can hit Anthropic’s (or another provider’s) per-minute request limits even without a provider outage. Use --rpm to throttle the request rate before you trigger 429s:
agentgraph run multi_provider_fallback.yaml \
  --input topic="climate policy" \
  --backend api \
  --rpm 30 \
  --max-concurrent 5
--rpm applies to the executor as a whole — across all nodes and all providers. It is most useful for large fan-out DAGs where many nodes fire simultaneously.
A good default starting point for the Anthropic API free tier is --rpm 20 --max-concurrent 3. Bump these up as you upgrade your plan or add provider fallbacks that spread the load.

Fallback chains inside evaluator_loop

Generator and evaluator roles inside an evaluator_loop node also support fallback_chain via the AgentRole spec:
- id: haiku
  type: evaluator_loop
  max_iterations: 3
  generator:
    model: anthropic/claude-haiku-4-5-20251001
    fallback_chain:
      - openai/gpt-4o-mini
    prompt: "Write a haiku about {{ topic }}. ..."
  evaluator:
    model: anthropic/claude-sonnet-4-6
    fallback_chain:
      - openai/gpt-4o
    prompt: "Evaluate: {{ candidate }} ..."
Each role’s chain is independent — the generator and evaluator can fail over to different providers.

Parallel agents

Combine fallback chains with parallel execution for resilient fan-out workflows.
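For instance, nodes with no dependency between them can fan out at the same time (see the parallel agents guide), and giving each its own fallback_chain keeps one provider's outage from stalling the whole fan-out. A sketch reusing only fields shown earlier in this guide; the node ids and prompts are illustrative:
# Sketch: independent nodes fan out in parallel, each with its own fallback chain
nodes:
  - id: summarize_news
    type: agent
    model: anthropic/claude-sonnet-4-6
    fallback_chain:
      - openai/gpt-4o
    prompt: "Summarize recent news about {{ topic }}."

  - id: summarize_papers
    type: agent
    model: openai/gpt-4o
    fallback_chain:
      - gemini/gemini-2.0-flash
    prompt: "Summarize recent research papers about {{ topic }}."

  - id: merge
    type: agent
    model: anthropic/claude-sonnet-4-6
    depends_on: [summarize_news, summarize_papers]
    prompt: |
      Combine these into a single briefing:

      {{ summarize_news }}

      {{ summarize_papers }}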

Evaluator loops

Add fallback chains to generator and evaluator roles for reliable iterative refinement.