Run parallel AI agents across your dagraph workflow
Fan out work to multiple AI agents that run simultaneously, then merge their outputs, cutting wall-clock time with wave-based execution.
Parallel execution is the core reason to use a DAG instead of a linear chain. When you have independent subtasks — researching different angles, translating into multiple languages, scoring several candidates — dagraph runs them all at once in the same wave, then feeds their combined outputs to a downstream node only when every dependency has finished. This guide walks you through building fan-out/fan-in workflows, controlling concurrency, and scaling to dynamic lists with the `map` node.
dagraph uses a topological sort (Kahn's algorithm) to group your nodes into waves. Every node whose `depends_on` list is empty fires in wave 1. Once wave 1 is complete, any node whose dependencies are now all satisfied fires in wave 2, and so on. Nodes within the same wave run simultaneously. You can preview the wave plan without spending any tokens.
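As a sketch, assuming your build exposes a `--dry-run` flag (the flag name is an assumption of this sketch, not something this guide documents), a preview looks like:

```bash
# --dry-run is an illustrative flag name; check `agentgraph run --help`
# for the exact spelling in your version. The idea: print which nodes
# fire together in each wave without making any LLM calls.
agentgraph run research.yaml --input topic="large language models" --dry-run
```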
The `research.yaml` example spins up three independent agents in parallel, each researching a different angle of the same topic, then passes all three outputs to a single synthesizer.
```yaml
# research.yaml
name: research
description: Parallel three-angle research on a topic, synthesized by Sonnet.
budget:
  max_tokens: 50000
  max_usd: 2.00
nodes:
  # --- Wave 1: three agents fire simultaneously ---
  - id: research_a
    type: agent
    model: claude-haiku-4-5-20251001
    max_output_tokens: 1500
    prompt: |
      Research the topic "{{ topic }}" from a TECHNICAL perspective.
      Cover: mechanisms, architecture, implementation details.
      Return 5–8 concise bullet points. No fluff.

  - id: research_b
    type: agent
    model: claude-haiku-4-5-20251001
    max_output_tokens: 1500
    prompt: |
      Research the topic "{{ topic }}" from an ECONOMIC perspective.
      Cover: market forces, cost structures, winners, losers.
      Return 5–8 concise bullet points. No fluff.

  - id: research_c
    type: agent
    model: claude-haiku-4-5-20251001
    max_output_tokens: 1500
    prompt: |
      Research the topic "{{ topic }}" from a HUMAN/SOCIAL perspective.
      Cover: behavior, adoption, second-order effects on people.
      Return 5–8 concise bullet points. No fluff.

  # --- Wave 2: synthesizer waits for all three ---
  - id: synthesizer
    type: agent
    model: claude-sonnet-4-6
    max_output_tokens: 3000
    depends_on: [research_a, research_b, research_c]  # blocks until all three complete
    prompt: |
      Three independent researchers analyzed "{{ topic }}".
      Synthesize their findings into a unified report that highlights:
      1. Points of AGREEMENT across perspectives.
      2. Points of CONTRADICTION or tension.
      3. The 3 most important takeaways overall.

      == Technical perspective ==
      {{ research_a }}

      == Economic perspective ==
      {{ research_b }}

      == Human/Social perspective ==
      {{ research_c }}

      Write the report in clear prose, ~400 words.
```
Run it with:
```bash
agentgraph run research.yaml --input topic="large language models"
```
Each node’s output is stored as an artifact and referenced by its `id` in downstream prompts: `{{ research_a }}`, `{{ research_b }}`, `{{ research_c }}`. You never pass raw text between nodes directly; dagraph resolves references from the artifact store.
Add `depends_on` to any node to make it wait for one or more predecessors. Dependencies are additive: a node won’t start until every ID in its list has completed successfully.
```yaml
nodes:
  - id: fetch_data
    type: agent
    model: claude-haiku-4-5-20251001
    prompt: "Fetch the latest stats for {{ topic }}."

  - id: clean_data
    type: agent
    model: claude-haiku-4-5-20251001
    depends_on: [fetch_data]  # waits for fetch_data only
    prompt: "Clean and normalize: {{ fetch_data }}"

  - id: write_report
    type: agent
    model: claude-sonnet-4-6
    depends_on: [clean_data]  # waits for the whole chain
    prompt: "Write a report from: {{ clean_data }}"
```
You can mix sequential and parallel paths in the same DAG. Any two nodes with no dependency path between them are scheduled independently, and they run in parallel whenever they land in the same wave.
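For example, this sketch (node ids and prompts are illustrative, using the same schema as above) fans out after a shared first step, then fans back in:

```yaml
nodes:
  - id: outline                      # wave 1
    type: agent
    model: claude-haiku-4-5-20251001
    prompt: "Outline a short article about {{ topic }}."

  # Both drafts depend only on outline, so they share wave 2
  # and run in parallel with each other.
  - id: draft_intro
    type: agent
    model: claude-haiku-4-5-20251001
    depends_on: [outline]
    prompt: "Draft the introduction from this outline: {{ outline }}"

  - id: draft_body
    type: agent
    model: claude-haiku-4-5-20251001
    depends_on: [outline]
    prompt: "Draft the body sections from this outline: {{ outline }}"

  - id: assemble                     # wave 3: fan-in
    type: agent
    model: claude-sonnet-4-6
    depends_on: [draft_intro, draft_body]
    prompt: |
      Combine the introduction and body into one article.

      {{ draft_intro }}

      {{ draft_body }}
```

Here `outline` to the drafts is the sequential edge, the two drafts occupy wave 2 together, and `assemble` waits for both, exactly like the synthesizer in the research example.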
By default, dagraph allows up to 10 simultaneous in-flight LLM calls. Use `--max-concurrent` to lower that ceiling, for example to stay within provider rate limits or to control costs during development:
```bash
# Allow at most 3 simultaneous LLM calls
agentgraph run research.yaml --input topic="AI safety" --max-concurrent 3
```
Combine it with `--rpm` when using the `--backend api` option to add a requests-per-minute cap:
```bash
agentgraph run research.yaml \
  --input topic="AI safety" \
  --backend api \
  --max-concurrent 5 \
  --rpm 30
```
When you don’t know your list of items at design time, use a `map` node to fan out over a runtime list. Each item in the list gets its own agent call, and the results are collected into a JSON array available to downstream nodes.
```yaml
# map_reduce.yaml
name: map_reduce
description: >
  Fan-out over a list with a `map` node, then reduce
  the results with a synthesizer agent.
nodes:
  # Runs one agent call per item in `topics` — up to 3 at once
  - id: summaries
    type: map
    over: topics   # Jinja expression — can be a DAG input or a dep's output
    as: topic      # binds the current item to {{ topic }} in the prompt
    model: claude-haiku-4-5-20251001
    max_concurrency: 3
    prompt: |
      Write a single sentence about "{{ topic }}".
      No introduction, get straight to the point.

  # Receives summaries as a JSON array
  - id: synthesize
    type: agent
    depends_on: [summaries]
    model: claude-sonnet-4-6
    prompt: |
      Here are one-sentence summaries for each topic:

      {{ summaries }}

      Write a short paragraph (3–4 sentences) connecting the themes.
```
Pass the list as an input:
```bash
agentgraph run map_reduce.yaml \
  --input topics='["transformer models","diffusion models","reinforcement learning"]'
```
`max_concurrency` on a `map` node is independent of `--max-concurrent` on the CLI. The node-level cap is useful when you know one specific fan-out should stay narrow regardless of the global setting.
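For instance, raising the global ceiling does not widen the `summaries` node above; it still runs at most 3 of its items at a time (the topic list here is illustrative):

```bash
# Global ceiling of 10 in-flight calls; the `summaries` map node above
# still processes at most 3 of its items at once (its max_concurrency).
agentgraph run map_reduce.yaml \
  --input topics='["transformer models","diffusion models","reinforcement learning","graph neural networks"]' \
  --max-concurrent 10
```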