The `evaluator_loop` node type solves this by running a generator/evaluator cycle inside a single DAG node: the generator produces a candidate, the evaluator scores it, and if the evaluator rejects it, the generator tries again with the feedback in context. This repeats until the evaluator approves or `max_iterations` is reached.
## How the loop works
Each iteration consists of exactly two LLM calls:

- Generator call: produces a candidate output. On the first iteration it receives only the original prompt and DAG inputs. On subsequent iterations it also receives `previous_output` and `evaluator_feedback`.
- Evaluator call: receives the candidate and must return a JSON object with at least `{"approved": bool, "feedback": str}`. If approved, the loop exits and the final candidate becomes the node’s artifact. If not approved, the cycle repeats.

The loop ends when the evaluator returns `"approved": true` or `max_iterations` attempts have been exhausted. The final artifact is always the last generator output, regardless of whether the evaluator ultimately approved it.
## Complete example: haiku with syllable enforcement
The `haiku_evaluator.yaml` example uses a cheap Haiku model to generate the poem and a more capable Sonnet model to rigorously check the 5-7-5 syllable constraint, a judgment call that’s harder to get right than the generation itself.
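A sketch of what that configuration plausibly looks like; the key layout and model identifiers are assumptions, since only the node type, `max_iterations`, and the template variables below are documented here:

```yaml
# Sketch of a haiku_evaluator.yaml-style config. Key names and model
# identifiers are illustrative, not confirmed dagraph schema.
nodes:
  haiku:
    type: evaluator_loop
    max_iterations: 3
    generator:
      model: claude-haiku          # cheap model drafts the poem
      prompt: |
        Write a haiku about {{ topic }}.
        {% if iteration > 1 %}
        Previous attempt: {{ previous_output }}
        Feedback: {{ evaluator_feedback }}
        {% endif %}
    evaluator:
      model: claude-sonnet         # stronger model counts syllables
      prompt: |
        Count the syllables in each line of this haiku:
        {{ candidate }}
        Approve only if the pattern is exactly 5-7-5.
        Respond with JSON: {"approved": bool, "feedback": str}
```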
## Template variables reference

### Generator prompt variables
| Variable | Available | Description |
|---|---|---|
| `{{ topic }}` (any DAG input) | All iterations | Standard DAG inputs and upstream node outputs |
| `{{ iteration }}` | All iterations | Current iteration number, 1-indexed |
| `{{ previous_output }}` | Iteration 2+ | The generator’s output from the previous iteration |
| `{{ evaluator_feedback }}` | Iteration 2+ | The `feedback` field from the evaluator’s JSON response |
Use `{% if iteration > 1 %}` guards to keep the first iteration’s prompt clean and only inject feedback context on retries.
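For instance, a generator prompt with the guard, shown under the assumed `generator.prompt` key from the sketches above:

```yaml
generator:
  prompt: |
    Write a product description for {{ topic }}.
    {% if iteration > 1 %}
    Your previous attempt was rejected:
    {{ previous_output }}
    Address this feedback before rewriting: {{ evaluator_feedback }}
    {% endif %}
```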
### Evaluator prompt variables
| Variable | Description |
|---|---|
| `{{ candidate }}` | The generator’s current output (the text to evaluate) |
| `{{ iteration }}` | Current iteration number, 1-indexed |
| Any DAG input | Topic, constraints, or other context from the original inputs |
## Evaluator JSON contract
The evaluator must return a JSON object. The minimum required shape is `{"approved": bool, "feedback": str}`. Extra fields such as `score` are allowed; dagraph ignores them, but they appear in traces. If the evaluator returns malformed JSON or non-JSON text, dagraph treats the iteration as not approved and uses the raw text as feedback for the next round.
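For example, a rejection that satisfies the contract; the optional `score` field is exactly the kind of extra key dagraph ignores but records in the trace:

```json
{
  "approved": false,
  "feedback": "Line two has eight syllables; cut one.",
  "score": 0.5
}
```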
## Cost and iteration budget
Each iteration consumes one generator call plus one evaluator call. With `max_iterations: 3` and an approval on the second attempt, you pay for four LLM calls in total (two iterations × two calls each; the third iteration never runs because the second candidate was approved).
Per-node budget caps apply across all iterations combined. Set a budget on the node if you want a hard ceiling on how much a single evaluator loop can spend.
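A minimal sketch, assuming hypothetical `budget` and `max_usd` field names rather than confirmed dagraph syntax:

```yaml
nodes:
  polish_copy:
    type: evaluator_loop
    max_iterations: 5
    # Hypothetical budget block: caps combined spend across every
    # generator and evaluator call this node makes.
    budget:
      max_usd: 0.50
```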
## Building your own evaluator prompt
The evaluator prompt is the most important part of the pattern. Keep these principles in mind (a prompt sketch that applies all three follows the list):

- Be specific about criteria. Vague instructions like “make it good” produce inconsistent `approved` decisions. List numbered, testable criteria.
- Ask for a reason even when approving. The `feedback` field on an approved response still gets stored in the trace and helps you understand what worked.
- Tell the evaluator what to ignore. If you only care about format and not style, say so explicitly to prevent false rejections.
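Here is a sketch that applies all three principles, using the evaluator variables documented above (the `evaluator.prompt` key remains an assumed schema):

```yaml
evaluator:
  prompt: |
    You are reviewing a product description for {{ topic }}
    (iteration {{ iteration }}):
    {{ candidate }}

    Approve only if ALL of the following hold:
    1. It is under 100 words.
    2. It mentions the product name exactly once.
    3. It ends with a call to action.

    Ignore tone and word choice; format is the only concern.
    Respond with JSON {"approved": bool, "feedback": str}, and give a
    one-sentence reason in "feedback" even when you approve.
```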
## Parallel agents

Run independent evaluator loops in parallel across multiple candidates.

## Multi-provider fallback

Keep evaluator loops running even during provider outages with fallback chains.