This guide explains how to create workflows that orchestrate multiple commands into automated pipelines. Read Authoring Commands first — workflows are built from commands.
A workflow is a YAML file that defines a directed acyclic graph (DAG) of commands to execute. Workflows enable:
- Multi-step automation: Chain multiple AI agents together
- Parallel execution: Independent nodes run concurrently
- Conditional branching: Route to different paths based on node output
- Artifact passing: Output from one node becomes input for downstream nodes
- Iterative loops: Loop nodes repeat until a completion signal
```yaml
name: fix-github-issue
description: Investigate and fix a GitHub issue end-to-end
nodes:
  - id: investigate
    command: investigate-issue
  - id: implement
    command: implement-issue
    depends_on: [investigate]
    context: fresh
```
Using defaults as templates: Archon ships default workflows in .archon/workflows/defaults/ (12 bundled into the binary, plus additional ones available on disk in source builds). Browse them for real-world examples, then copy and modify:
Workflows live in .archon/workflows/ relative to the working directory:
```
.archon/
├── workflows/
│   ├── my-workflow.yaml
│   └── review/
│       └── full-review.yaml   # Subdirectories work
└── commands/
    └── [commands used by workflows]
```
Archon discovers workflows recursively - subdirectories are fine. If a workflow file fails to load (syntax error, validation failure), it’s skipped and the error is reported via /workflow list.
Global workflows: For workflows that apply to every project, place them in ~/.archon/.archon/workflows/. Global workflows are overridden by same-named repo workflows. See Global Workflows.
CLI vs Server: The CLI reads workflow files from wherever you run it (sees uncommitted changes). The server reads from the workspace clone at ~/.archon/workspaces/owner/repo/, which only syncs from the remote before worktree creation. If you edit a workflow locally but don’t push, the server won’t see it.
Workflows use DAG-based execution defined by a `nodes:` list. Each node runs a command or inline prompt, declares dependencies, and supports conditional branching:
```yaml
name: classify-and-fix
description: Classify issue type, then run the appropriate fix path
nodes:
  - id: classify
    command: classify-issue
    output_format:
      type: object
      properties:
        type:
          type: string
          enum: [BUG, FEATURE]
      required: [type]
  - id: investigate
    command: investigate-bug
    depends_on: [classify]
    when: "$classify.output.type == 'BUG'"
  - id: plan
    command: plan-feature
    depends_on: [classify]
    when: "$classify.output.type == 'FEATURE'"
  - id: implement
    command: implement-changes
    depends_on: [investigate, plan]
    trigger_rule: none_failed_min_one_success
```
Nodes without depends_on run immediately. Nodes in the same topological layer run concurrently via Promise.allSettled. Skipped nodes (failed when: condition or trigger_rule) propagate their skipped state to dependants.
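The layering can be sketched in TypeScript (illustrative only; `WorkflowNode`, `topologicalLayers`, and `runLayers` are hypothetical names, not Archon's actual dag-executor code):

```typescript
type WorkflowNode = { id: string; depends_on?: string[] };

// Group nodes into topological layers: layer 0 has no dependencies,
// layer N depends only on nodes placed in earlier layers.
function topologicalLayers(nodes: WorkflowNode[]): WorkflowNode[][] {
  const layers: WorkflowNode[][] = [];
  const placed = new Set<string>();
  let remaining = [...nodes];
  while (remaining.length > 0) {
    const layer = remaining.filter(n =>
      (n.depends_on ?? []).every(d => placed.has(d)),
    );
    if (layer.length === 0) throw new Error("cycle detected");
    layer.forEach(n => placed.add(n.id));
    remaining = remaining.filter(n => !placed.has(n.id));
    layers.push(layer);
  }
  return layers;
}

// Nodes in the same layer run concurrently; allSettled lets siblings
// finish even when one fails, so failed/skipped state can be recorded
// per node and propagated to dependants.
async function runLayers(
  nodes: WorkflowNode[],
  run: (n: WorkflowNode) => Promise<void>,
): Promise<void> {
  for (const layer of topologicalLayers(nodes)) {
    await Promise.allSettled(layer.map(run));
  }
}
```

For the classify-and-fix example above, this yields three layers: `[classify]`, then `[investigate, plan]` concurrently, then `[implement]`.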
Note: The steps: (sequential) format has been removed. All workflows use nodes: (DAG) format exclusively.
These fields map directly to Claude Agent SDK options. All are Claude-only — Codex nodes emit a warning and ignore them. They can be set per-node or at the workflow level as defaults (per-node takes precedence). maxBudgetUsd and systemPrompt are per-node only.
effort — reasoning depth:
```yaml
- id: thorough-review
  command: review
  effort: high  # 'low' | 'medium' | 'high' | 'max'
```
thinking — extended thinking mode (string shorthand or object form):
Use output_format to enforce JSON output from an AI node. For Claude, the schema is passed via the SDK’s outputFormat option and structured_output is used directly. For Codex (v0.116.0+), the schema is passed via TurnOptions.outputSchema and the agent’s inline JSON response is used. Both ensure clean JSON for when: conditions and $nodeId.output substitution:
```yaml
nodes:
  - id: classify
    command: classify-issue
    output_format:
      type: object
      properties:
        type:
          type: string
          enum: [BUG, FEATURE]
        severity:
          type: string
          enum: [low, medium, high]
      required: [type]
```
The output is captured as a JSON string and is available via `$classify.output` (full JSON) or `$classify.output.type` (field access).
Use `output_format` when downstream nodes need to branch on specific values via `when:` conditions.
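To make the branching concrete, here is a hypothetical sketch of how a simple `$node.output.field == 'VALUE'` condition could be evaluated against captured outputs (the real `when:` expression language may support more forms than this):

```typescript
// Sketch only: evaluates the `$node.output[.field] == 'VALUE'` form.
function evaluateWhen(
  expr: string,
  outputs: Record<string, string>,
): boolean {
  const m = expr.match(/^\$(\w+)\.output(?:\.(\w+))?\s*==\s*'([^']*)'$/);
  if (!m) throw new Error(`unsupported expression: ${expr}`);
  const [, nodeId, field, expected] = m;
  const raw = outputs[nodeId];
  // Structured outputs arrive as JSON strings; plain bash/text output
  // is compared as-is.
  let parsed: any = raw;
  try {
    parsed = JSON.parse(raw);
  } catch {
    /* not JSON: treat as plain text */
  }
  const actual = field ? parsed[field] : parsed;
  return String(actual) === expected;
}
```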
allowed_tools and denied_tools for Tool Restrictions
Restrict which built-in tools a node can use without relying on prompt instructions. Restrictions are enforced at the Claude SDK level.
```yaml
nodes:
  - id: review
    command: code-review
    allowed_tools: [Read, Grep, Glob]   # whitelist — only these tools available
  - id: implement
    command: implement-feature
    denied_tools: [WebSearch, WebFetch] # blacklist — remove these tools
  - id: mcp-only
    command: mcp-command
    allowed_tools: []                   # empty list = disable all built-in tools
```
- `allowed_tools: []` disables all built-in tools (useful for MCP-only nodes). Use the `mcp` field on a node to attach per-node MCP servers — see Node Fields.
- If both are set, `denied_tools` is applied after `allowed_tools`.
- `undefined` (field absent) and `[]` have different semantics — absent means use the default tool set, `[]` means no tools.
- Claude only — Codex nodes/steps emit a warning and continue (Codex doesn't support per-call tool restrictions).
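Assuming the semantics above, the effective tool set could be resolved like this (a sketch; `resolveTools` is a hypothetical helper, not Archon's implementation):

```typescript
// Resolve a node's effective built-in tool set.
function resolveTools(
  defaults: string[],
  allowed?: string[],
  denied?: string[],
): string[] {
  // Absent allowed_tools means the default set; [] means no tools.
  const base = allowed === undefined ? defaults : allowed;
  // denied_tools is applied after allowed_tools.
  return base.filter(t => !(denied ?? []).includes(t));
}
```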
Every node automatically retries on transient errors (SDK subprocess crashes, rate limits, network timeouts) using a default configuration: 2 retries (3 total attempts), 3 s base delay with exponential backoff. You will see a platform notification before each retry attempt.
To customise, add a retry: block:
```yaml
nodes:
  - id: flaky-node
    command: flaky-command
    retry:
      max_attempts: 3   # 3 retries = 4 total attempts
      delay_ms: 5000
      on_error: transient
  - id: aggressive-retry
    prompt: "Summarise the output"
    retry:
      max_attempts: 4   # 4 retries = 5 total attempts
      on_error: all     # Retry even non-transient errors (use with caution)
```
| Field | Type | Default | Range | Description |
|---|---|---|---|---|
| `max_attempts` | number | 2 | — | Number of retry attempts (not including the initial attempt). `1` = one retry (2 total attempts) |
| `delay_ms` | number | 3000 | 1000–60000 | Base delay in ms before the first retry. Doubles each attempt (exponential backoff) |
| `on_error` | `'transient'` \| `'all'` | `'transient'` | — | Which errors trigger a retry. `'transient'` = SDK crashes, rate limits, network timeouts only. `'all'` = any error including unknown errors (FATAL errors such as auth failures are never retried regardless) |
```
SDK subprocess retry (claude.ts) — 3 total attempts, 2 s base backoff
        ↓ only if all SDK retries exhausted
Node retry (dag-executor) — default 2 retries, 3 s base backoff
        ↓ only if all node retries exhausted
Workflow fails → next invocation auto-resumes, skipping completed nodes
```
This means a single transient crash may trigger up to 3 SDK retries before a single node retry attempt is consumed.
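The doubling backoff schedule described above can be written out as a small helper (illustrative):

```typescript
// Delay before the Nth retry (1-indexed): base delay from delay_ms,
// doubling on each subsequent attempt (exponential backoff).
function retryDelayMs(baseMs: number, attempt: number): number {
  return baseMs * 2 ** (attempt - 1);
}
```

With the default `delay_ms: 3000`, retries wait 3000 ms, then 6000 ms, and so on.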
DAG resume: For nodes: (DAG) workflows, resume is automatic — the next invocation detects the prior failed run and skips already-completed nodes. No --resume flag is needed. See DAG Resume on Failure below.
When a nodes: (DAG) workflow fails (including due to a server restart), the next invocation automatically resumes from where it left off — no --resume flag required.
How it works:
1. On each invocation, Archon checks for a prior failed run of the same workflow at the same working path.
2. If found, it loads the `node_completed` events from that run to determine which nodes finished successfully.
3. Completed nodes are skipped; only failed and not-yet-run nodes are executed.
4. You receive a platform message like: `Resuming workflow — skipping 3 already-completed node(s).`
Server restart: If a server restart leaves runs in running status, they are automatically marked as failed on the next startup (with metadata.failure_reason = 'server_restart'). The next invocation of the same workflow at the same path auto-resumes from completed nodes.
Known limitation: AI session context from prior nodes is not restored. If a downstream node relies on in-context knowledge from a prior run’s session (rather than artifacts), it may need to re-read those artifacts explicitly.
Fresh start: If zero nodes completed in the prior run, Archon starts fresh (no nodes to skip).
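The skip logic amounts to a filter over the prior run's `node_completed` events. A sketch (the event shape here is assumed for illustration, not the actual dag-executor types):

```typescript
type RunEvent = { type: string; node_id: string };

// Return the node IDs still to execute: anything not recorded as
// completed in the prior failed run.
function nodesToRun(allIds: string[], priorEvents: RunEvent[]): string[] {
  const done = new Set(
    priorEvents
      .filter(e => e.type === "node_completed")
      .map(e => e.node_id),
  );
  return allIds.filter(id => !done.has(id));
}
```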
By default, workflows started from the Web UI run in the background — execution is dispatched to an internal worker conversation and results appear only in the workflow run log, not in the chat window.
Set `interactive: true` to run the workflow in the foreground (same as CLI, Slack, Telegram, and GitHub): all AI output and approval gate messages stream directly to the user's chat window.
```yaml
name: my-interactive-workflow
interactive: true  # Web UI: foreground execution (output visible in chat)
nodes:
  - id: plan
    prompt: "Create a plan for $USER_MESSAGE"
  - id: review-gate
    approval:
      message: "Does this plan look good?"
    depends_on: [plan]
  - id: implement
    command: implement
    depends_on: [review-gate]
```
When to use interactive: true:
- Workflows with approval nodes — users must see the AI output and respond inline
- Workflows with interactive loop nodes (`loop.interactive: true`) — the loop gate pause requires foreground execution to deliver the gate message and run ID to the user
- Multi-turn workflows where the user needs to provide feedback at each step
- Any workflow where the response must appear in the user's active chat thread
Platforms: `interactive` only affects the web platform. CLI, Slack, Telegram, and GitHub always run workflows in foreground mode regardless of this setting.
All workflows support variable substitution in prompts and commands. The most commonly used:
| Variable | Description |
|---|---|
| `$ARGUMENTS` / `$USER_MESSAGE` | The user's input message that triggered the workflow |
| `$WORKFLOW_ID` | Unique ID for this workflow run |
| `$ARTIFACTS_DIR` | Pre-created artifacts directory for this workflow run |
| `$BASE_BRANCH` | Base branch (auto-detected or configured) |
| `$DOCS_DIR` | Documentation directory path (default: `docs/`) |
| `$CONTEXT` | GitHub issue/PR context (if available) |
| `$nodeId.output` | Output of a completed upstream node |
| `$nodeId.output.field` | JSON field from a structured upstream node output |
See the Variable Reference for the complete list, including $LOOP_USER_INPUT, $REJECTION_REASON, positional arguments, substitution order, and context variable behavior.
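As an illustration of how these placeholders expand, here is a minimal substitution sketch (a hypothetical helper; Archon's actual substitution order and escaping rules are in the Variable Reference):

```typescript
// Expand $nodeId.output.field, $nodeId.output, then $UPPER_CASE vars.
// Unknown variables are left in place.
function substitute(
  template: string,
  vars: Record<string, string>,
  nodeOutputs: Record<string, string>,
): string {
  return template
    .replace(/\$(\w+)\.output\.(\w+)/g, (_, id, field) =>
      String(JSON.parse(nodeOutputs[id])[field]))
    .replace(/\$(\w+)\.output/g, (_, id) => nodeOutputs[id])
    .replace(/\$([A-Z_]+)/g, (_, name) => vars[name] ?? `$${name}`);
}
```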
Use an approval node to pause for human review before continuing:
```yaml
name: careful-refactor
description: Refactor with human approval gate
nodes:
  - id: propose
    command: propose-refactor
  - id: review-gate
    approval:
      message: "Review the proposed refactor before proceeding. Check the artifacts directory."
    depends_on: [propose]
  - id: execute
    command: execute-approved-refactor
    depends_on: [review-gate]
  - id: pr
    command: create-pr
    depends_on: [execute]
    context: fresh
```
When the workflow reaches review-gate, it pauses and notifies you. Approve or reject via:
- Natural language (recommended): Just type your response in the conversation — the system detects the paused workflow and auto-resumes
- CLI: `bun run cli workflow approve <run-id>` or `bun run cli workflow reject <run-id>`
- Explicit command: `/workflow approve <run-id>` or `/workflow reject <run-id>` (records approval; send a follow-up message to resume)
- Web UI: Click the Approve/Reject buttons on the dashboard card
- API: `POST /api/workflows/runs/<run-id>/approve` or `/reject`
After approval via natural language or CLI, the workflow auto-resumes from the next node. The user’s approval comment is available as $review-gate.output in downstream nodes only when capture_response: true is set on the approval node.
- Without `on_reject`: rejecting cancels the workflow.
- With `on_reject`: rejecting triggers an AI rework prompt and re-pauses for re-review.
See Approval Nodes for full details.
Use a cancel: node to stop a workflow when a precondition fails — preventing wasted compute on downstream branches:
```yaml
nodes:
  - id: check
    bash: "git merge-base --is-ancestor HEAD origin/main && echo ok || echo blocked"
  - id: stop-if-blocked
    cancel: "PR has merge conflicts — cannot proceed with review"
    depends_on: [check]
    when: "$check.output == 'blocked'"
  - id: review
    prompt: "Review the PR..."
    depends_on: [check]
    when: "$check.output == 'ok'"
```
When a cancel: node executes (passes its when: gate), it sets the workflow run to cancelled with the reason string and stops all in-flight nodes. Unlike node failure, cancellation is intentional — the status is cancelled, not failed.
Choosing: Interactive Loop vs Approval with on_reject
Two primitives handle human-in-the-loop iteration. Use the right one for your pattern:
| | Interactive Loop | Approval + `on_reject` |
|---|---|---|
| YAML | `loop.interactive: true` | `approval.on_reject: { prompt }` |
| User input variable | `$LOOP_USER_INPUT` | `$REJECTION_REASON` |
| How it works | Same prompt runs each iteration, user input injected as variable | Specific `on_reject` prompt runs only on rejection |
| Best for | Conversational iteration — explore, refine, review cycles where the AI and human go back and forth | Gate-then-fix — approve to proceed, or reject to trigger a specific corrective action |
| Approval signal | AI detects user intent in its output (`<promise>DONE</promise>`) | User explicitly approves or rejects via button/command |
| Example | PIV loop: explore → user feedback → explore again | Report generation: generate → user rejects → AI revises specific section |
Interactive loop (loop.interactive: true):
```yaml
- id: refine-plan
  loop:
    prompt: |
      User's feedback: $LOOP_USER_INPUT
      Read the plan, apply feedback, present changes.
    until: PLAN_APPROVED
    max_iterations: 10
    interactive: true
    gate_message: "Review the plan. Provide feedback or say 'approved'."
```
The AI runs each iteration, pauses for user input, user’s text feeds into the next iteration via $LOOP_USER_INPUT. The AI decides when to emit the completion signal based on the user’s response.
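Detecting the completion signal amounts to checking the AI's output for the `<promise>` marker described in the comparison table. A minimal sketch (assuming that `<promise>SIGNAL</promise>` convention; the real loop executor may be stricter about placement):

```typescript
// The loop ends when the AI's output contains the configured
// completion signal wrapped in <promise> tags.
function loopComplete(output: string, signal: string): boolean {
  return output.includes(`<promise>${signal}</promise>`);
}
```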
Approval with on_reject (approval.on_reject):
```yaml
- id: review
  approval:
    message: "Review the report. Approve or request changes."
    capture_response: true
    on_reject: { prompt: "Revise based on: $REJECTION_REASON", max_attempts: 5 }
  depends_on: [generate]
```
The workflow pauses at the approval gate. User approves → workflow continues. User rejects with feedback → the `on_reject` prompt runs with `$REJECTION_REASON`, then re-pauses at the same gate.
Rule of thumb: If the human and AI are having a conversation (exploring, refining, iterating), use an interactive loop. If the workflow should proceed unless the human objects, use an approval gate with on_reject.
After a workflow runs, check the artifacts in the $ARTIFACTS_DIR for that run (located at ~/.archon/workspaces/owner/repo/artifacts/runs/{workflow-id}/).