This guide explains how to create workflows that orchestrate multiple commands into automated pipelines. Read Authoring Commands first — workflows are built from commands.
A workflow is a YAML file that defines a directed acyclic graph (DAG) of commands to execute. Workflows enable:
Multi-step automation: Chain multiple AI agents together
Parallel execution: Independent nodes run concurrently
Conditional branching: Route to different paths based on node output
Artifact passing: Output from one node becomes input for downstream nodes
Iterative loops: Loop nodes repeat until a completion signal
name: fix-github-issue
description: Investigate and fix a GitHub issue end-to-end
nodes:
  - id: investigate
    command: investigate-issue
  - id: implement
    command: implement-issue
    depends_on: [investigate]
    context: fresh
Using defaults as templates: Archon ships default workflows in .archon/workflows/defaults/ (12 bundled into the binary, plus additional ones available on disk in source builds). Browse them for real-world examples, then copy and modify:
Workflows live in .archon/workflows/ relative to the working directory:
.archon/
├── workflows/
│   ├── my-workflow.yaml
│   └── review/
│       └── full-review.yaml   # Subdirectories work
└── commands/
    └── [commands used by workflows]
Archon discovers workflows recursively, so subdirectories are fine. If a workflow file fails to load (syntax error or validation failure), it is skipped and the error is reported via /workflow list.
Global workflows: For workflows that apply to every project, place them in ~/.archon/workflows/. Global workflows are overridden by same-named repo workflows. See Global Workflows.
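A minimal TypeScript sketch of the override rule, assuming workflows are keyed by their name: field and that global workflows are loaded before repo ones. The LoadedWorkflow type and mergeWorkflows function are hypothetical illustrations, not Archon's API.

```typescript
// Hypothetical shape of a discovered workflow (not Archon's real type).
interface LoadedWorkflow {
  name: string; // assumed to be the workflow's `name:` field
  source: "global" | "repo";
}

// Sketch of "same-named repo workflows override global ones":
// load globals first, then let repo entries overwrite them by name.
function mergeWorkflows(
  globalWorkflows: LoadedWorkflow[],
  repoWorkflows: LoadedWorkflow[],
): Map<string, LoadedWorkflow> {
  const byName = new Map<string, LoadedWorkflow>();
  for (const wf of globalWorkflows) byName.set(wf.name, wf);
  for (const wf of repoWorkflows) byName.set(wf.name, wf); // repo wins on collision
  return byName;
}

const merged = mergeWorkflows(
  [
    { name: "triage", source: "global" },
    { name: "lint-all", source: "global" },
  ],
  [{ name: "triage", source: "repo" }],
);
const winner = merged.get("triage")?.source; // "repo"
```

Global workflows with no repo counterpart (lint-all here) remain available unchanged.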
CLI vs Server: The CLI reads workflow files from wherever you run it (sees uncommitted changes). The server reads from the workspace clone at ~/.archon/workspaces/owner/repo/, which only syncs from the remote before worktree creation. If you edit a workflow locally but don’t push, the server won’t see it.
Workflows use DAG-based execution with nodes:. Each node runs a command or inline prompt, declares dependencies, and supports conditional branching:
name: classify-and-fix
description: Classify issue type, then run the appropriate fix path
nodes:
  - id: classify
    command: classify-issue
    output_format:
      type: object
      properties:
        type:
          type: string
          enum: [BUG, FEATURE]
      required: [type]
  - id: investigate
    command: investigate-bug
    depends_on: [classify]
    when: "$classify.output.type == 'BUG'"
  - id: plan
    command: plan-feature
    depends_on: [classify]
    when: "$classify.output.type == 'FEATURE'"
  - id: implement
    command: implement-changes
    depends_on: [investigate, plan]
    trigger_rule: none_failed_min_one_success
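Nodes without depends_on run immediately. Nodes in the same topological layer run concurrently via Promise.allSettled. Skipped nodes (whose when: condition is false, or whose trigger_rule is not satisfied) propagate their skipped state to dependants. This is why implement above sets trigger_rule: none_failed_min_one_success: exactly one of investigate and plan is always skipped, and without the rule that skip would propagate to implement.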
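The execution model above can be sketched in TypeScript. This is a simplified illustration, not Archon's executor: DagNode and runDag are hypothetical, when: is modelled as a plain predicate, and nodes within a layer run one after another here, whereas the real executor runs them concurrently with Promise.allSettled. It also assumes that a node without trigger_rule runs only when every dependency completed, which matches the skip-propagation rule.

```typescript
type NodeStatus = "completed" | "failed" | "skipped";
type TriggerRule = "all_success" | "none_failed_min_one_success";

// Hypothetical node shape for this sketch only.
interface DagNode {
  id: string;
  dependsOn?: string[];
  when?: (outputs: Record<string, string>) => boolean; // stands in for a `when:` expression
  triggerRule?: TriggerRule; // assumed default: "all_success"
  run: (outputs: Record<string, string>) => string; // throws to signal failure
}

// Group nodes into topological layers (Kahn's algorithm).
function topoLayers(nodes: DagNode[]): DagNode[][] {
  const remaining = new Map<string, DagNode>(nodes.map((n): [string, DagNode] => [n.id, n]));
  const placed = new Set<string>();
  const layers: DagNode[][] = [];
  while (remaining.size > 0) {
    const layer = Array.from(remaining.values()).filter((n) =>
      (n.dependsOn ?? []).every((d) => placed.has(d)),
    );
    if (layer.length === 0) throw new Error("cycle or unknown dependency");
    for (const n of layer) {
      remaining.delete(n.id);
      placed.add(n.id);
    }
    layers.push(layer);
  }
  return layers;
}

function runDag(nodes: DagNode[]): Record<string, NodeStatus> {
  const status: Record<string, NodeStatus> = {};
  const outputs: Record<string, string> = {};
  for (const layer of topoLayers(nodes)) {
    // Archon runs a layer concurrently; sequential order is enough to show the rules.
    for (const node of layer) {
      const deps = (node.dependsOn ?? []).map((d) => status[d]);
      const eligible =
        deps.length === 0 ||
        (node.triggerRule === "none_failed_min_one_success"
          ? !deps.includes("failed") && deps.includes("completed")
          : deps.every((s) => s === "completed"));
      if (!eligible || (node.when !== undefined && !node.when(outputs))) {
        status[node.id] = "skipped"; // dependants will see a skipped upstream
        continue;
      }
      try {
        outputs[node.id] = node.run(outputs);
        status[node.id] = "completed";
      } catch {
        status[node.id] = "failed";
      }
    }
  }
  return status;
}

// The classify-and-fix workflow, with classify returning a BUG.
const isType = (t: string) => (o: Record<string, string>) => JSON.parse(o["classify"]).type === t;
const classifyAndFix: DagNode[] = [
  { id: "classify", run: () => JSON.stringify({ type: "BUG" }) },
  { id: "investigate", dependsOn: ["classify"], when: isType("BUG"), run: () => "root cause" },
  { id: "plan", dependsOn: ["classify"], when: isType("FEATURE"), run: () => "plan" },
  {
    id: "implement",
    dependsOn: ["investigate", "plan"],
    triggerRule: "none_failed_min_one_success",
    run: () => "patched",
  },
];
const statuses = runDag(classifyAndFix);
// classify, investigate and implement complete; plan is skipped
```

Removing triggerRule from implement in this sketch makes implement skipped as well, because the skipped plan node propagates.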
Note: The steps: (sequential) format has been removed. All workflows use nodes: (DAG) format exclusively.
Node types — exactly one required per node (mutually exclusive):
| Field | Type | Description |
|---|---|---|
| command | string | Command name to load from .archon/commands/ |
| prompt | string | Inline prompt string |
| bash | string | Shell script (no AI). Stdout captured as $nodeId.output. Optional timeout (ms, default 120000) |
| script | string | TypeScript/JavaScript (via bun) or Python (via uv), as inline code or a named reference to .archon/scripts/. Stdout captured as $nodeId.output. Requires runtime: bun or runtime: uv. Optional deps (uv only) and timeout (ms, default 120000). See Script Nodes |
| loop | object | Iterative AI prompt until completion signal. See Loop Nodes |
These fields map directly to Claude Agent SDK options. All are Claude-only — Codex nodes emit a warning and ignore them. They can be set per-node or at the workflow level as defaults (per-node takes precedence). maxBudgetUsd and systemPrompt are per-node only.
effort — reasoning depth:
- id: thorough-review
  command: review
  effort: high  # 'low' | 'medium' | 'high' | 'max'
thinking — extended thinking mode (string shorthand or object form):
Use output_format to enforce JSON output from an AI node. For Claude, the schema is passed via the SDK’s outputFormat option and structured_output is used directly. For Codex (v0.116.0+), the schema is passed via TurnOptions.outputSchema and the agent’s inline JSON response is used. Both ensure clean JSON for when: conditions and $nodeId.output substitution:
nodes:
  - id: classify
    command: classify-issue
    output_format:
      type: object
      properties:
        type:
          type: string
          enum: [BUG, FEATURE]
        severity:
          type: string
          enum: [low, medium, high]
      required: [type]
The output is captured as a JSON string and available via $classify.output (full JSON) or $classify.output.type (field access)
Use output_format when downstream nodes need to branch on specific values via when:
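To make the field-access rule concrete, here is a small TypeScript sketch that resolves $nodeId.output and $nodeId.output.field against captured outputs, and evaluates the simple equality form used in this guide's when: examples. resolveRef and evalWhen are hypothetical helpers; Archon's real condition grammar may support more than the single `<ref> == '<literal>'` form handled here.

```typescript
// Resolve "$node.output" (raw string) or "$node.output.field" (parsed JSON field).
function resolveRef(ref: string, outputs: Record<string, string>): string {
  const m = /^\$([A-Za-z0-9_-]+)\.output(?:\.([A-Za-z0-9_]+))?$/.exec(ref);
  if (m === null) throw new Error(`not an output reference: ${ref}`);
  const raw: string | undefined = outputs[m[1]];
  if (raw === undefined) throw new Error(`no output captured for node ${m[1]}`);
  if (m[2] === undefined) return raw; // full output, e.g. the whole JSON string
  const value = (JSON.parse(raw) as Record<string, unknown>)[m[2]];
  if (value === undefined) throw new Error(`field ${m[2]} missing from ${m[1]} output`);
  return typeof value === "string" ? value : JSON.stringify(value);
}

// Evaluate only the `<ref> == '<literal>'` form shown in this guide.
function evalWhen(expr: string, outputs: Record<string, string>): boolean {
  const m = /^\s*(\$\S+)\s*==\s*'([^']*)'\s*$/.exec(expr);
  if (m === null) throw new Error(`unsupported expression: ${expr}`);
  return resolveRef(m[1], outputs) === m[2];
}

const outputs = { classify: JSON.stringify({ type: "BUG", severity: "high" }) };
const issueType = resolveRef("$classify.output.type", outputs); // "BUG"
const isFeature = evalWhen("$classify.output.type == 'FEATURE'", outputs); // false
```

Without output_format the AI's reply would be free text, and the JSON.parse step here would fail, which is why the schema matters for branching.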
allowed_tools and denied_tools for Tool Restrictions
Restrict which built-in tools a node can use without relying on prompt instructions. Restrictions are enforced at the Claude SDK level.
nodes:
  - id: review
    command: code-review
    allowed_tools: [Read, Grep, Glob]  # whitelist: only these tools available
  - id: implement
    command: implement-feature
    denied_tools: [WebSearch, WebFetch]  # blacklist: remove these tools
  - id: mcp-only
    command: mcp-command
    allowed_tools: []  # empty list = disable all built-in tools
allowed_tools: [] disables all built-in tools (useful for MCP-only nodes). Use the mcp field on a node to attach per-node MCP servers — see Node Fields
If both are set, denied_tools is applied after allowed_tools
undefined (field absent) and [] have different semantics — absent means use default tool set, [] means no tools
Claude only: Codex nodes emit a warning and continue (Codex doesn’t support per-call tool restrictions)
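The precedence rules above fit in a few lines. This TypeScript sketch computes a node's effective built-in tool list; effectiveTools and DEFAULT_TOOLS are hypothetical (the real default set comes from the Claude Agent SDK), and the point is the difference between an absent field and an empty list.

```typescript
// Hypothetical stand-in for the SDK's default built-in tool set.
const DEFAULT_TOOLS = ["Read", "Write", "Edit", "Grep", "Glob", "Bash", "WebSearch", "WebFetch"];

function effectiveTools(allowed?: string[], denied?: string[]): string[] {
  // allowed_tools absent -> default set; [] -> no built-in tools at all.
  const base = allowed ?? DEFAULT_TOOLS;
  // denied_tools is applied after allowed_tools.
  const deny = denied ?? [];
  return base.filter((tool) => !deny.includes(tool));
}

const reviewTools = effectiveTools(["Read", "Grep", "Glob"]); // Read, Grep, Glob
const implementTools = effectiveTools(undefined, ["WebSearch", "WebFetch"]); // defaults minus web tools
const mcpOnlyTools = effectiveTools([]); // no built-in tools
```

Using `??` rather than `||` is what keeps `[]` meaningful: an empty array is not nullish, so it never falls back to the defaults.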
Define Claude sub-agents directly in the workflow YAML, without authoring .claude/agents/*.md files. The main agent can spawn them in parallel via the Task tool — useful for map-reduce patterns where a cheap model (e.g. Haiku) briefs items and a stronger model reduces.
nodes:
  - id: triage
    prompt: |
      Fetch open issues via `gh issue list ...`. For each issue, spawn the
      brief-gen sub-agent in parallel (one message, multiple Task tool calls)
      to produce a 2-3 sentence brief. Then cluster briefs for duplicates.
    model: sonnet
    allowed_tools: [Bash, Read, Write, Task]
    agents:
      brief-gen:
        description: Summarises a single GitHub issue in 2-3 sentences
        prompt: |
          You are concise. Read the issue provided in the caller's prompt.
Agent IDs must be kebab-case (^[a-z0-9]+(-[a-z0-9]+)*$)
Each definition requires description and prompt; model, tools, disallowedTools, skills, and maxTurns are optional
Map is merged with any SDK-level agents and with the internal dag-node-skills wrapper created by skills: — user-defined agents win on ID collision (a warning is logged when this happens)
Claude only. Codex and community providers that don’t support inline agents emit a warning and ignore the field
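A rough TypeScript sketch of those rules (kebab-case IDs, required description and prompt, inline definitions winning on collision with a warning). AgentDef and mergeAgents are hypothetical names, and the types given to the optional fields are assumptions.

```typescript
// Hypothetical shape; optional field types are assumptions.
interface AgentDef {
  description: string;
  prompt: string;
  model?: string;
  tools?: string[];
  disallowedTools?: string[];
  skills?: string[];
  maxTurns?: number;
}

const AGENT_ID = /^[a-z0-9]+(-[a-z0-9]+)*$/; // kebab-case, as documented

function mergeAgents(
  existing: Record<string, AgentDef>, // e.g. SDK-level agents
  inline: Record<string, AgentDef>, // the node's `agents:` map
  warn: (message: string) => void,
): Record<string, AgentDef> {
  const merged: Record<string, AgentDef> = { ...existing };
  for (const [id, def] of Object.entries(inline)) {
    if (!AGENT_ID.test(id)) throw new Error(`agent id must be kebab-case: ${id}`);
    if (!def.description || !def.prompt) {
      throw new Error(`agent ${id} requires description and prompt`);
    }
    if (id in merged) warn(`inline agent "${id}" overrides an existing definition`);
    merged[id] = def; // user-defined (inline) agent wins on ID collision
  }
  return merged;
}

const warnings: string[] = [];
const agents = mergeAgents(
  { "brief-gen": { description: "old", prompt: "old" } },
  { "brief-gen": { description: "Summarises a single GitHub issue", prompt: "You are concise." } },
  (m) => warnings.push(m),
);
// agents["brief-gen"] is the inline definition; one warning was recorded
```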
When to use agents: vs .claude/agents/*.md files:
agents: (inline) — use when the sub-agent is specific to ONE workflow’s needs. Keeps the workflow self-contained in a single YAML file; travels cleanly in PRs and forks.
.claude/agents/*.md (on-disk) — use when the sub-agent is shared across multiple workflows OR the whole project (for example, a triage-agent used by several maintenance workflows). On-disk agents live outside workflow YAMLs and are picked up automatically by the Claude Agent SDK.
Both sources coexist — inline agents and on-disk agents are both available to Task(subagent_type=...) at runtime.
Every node automatically retries on transient errors (SDK subprocess crashes, rate limits, network timeouts) using a default configuration: 2 retries (3 total attempts), 3 s base delay with exponential backoff. You will see a platform notification before each retry attempt.
To customise, add a retry: block:
nodes:
  - id: flaky-node
    command: flaky-command
    retry:
      max_attempts: 3  # 3 retries = 4 total attempts
      delay_ms: 5000
      on_error: transient
  - id: aggressive-retry
    prompt: "Summarise the output"
    retry:
      max_attempts: 4  # 4 retries = 5 total attempts
      on_error: all  # Retry even non-transient errors (use with caution)
| Field | Type | Default | Range | Description |
|---|---|---|---|---|
| max_attempts | number | 2 | | Number of retry attempts (not including the initial attempt). 1 = one retry (2 total attempts) |
| delay_ms | number | 3000 | 1000–60000 | Base delay in ms before the first retry. Doubles each attempt (exponential backoff) |
| on_error | 'transient' or 'all' | 'transient' | n/a | Which errors trigger a retry. 'transient' = SDK crashes, rate limits, network timeouts only. 'all' = any error including unknown errors (FATAL errors such as auth failures are never retried regardless) |
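The backoff schedule implied by the table can be computed directly. This TypeScript sketch assumes plain doubling from delay_ms with no jitter or upper cap (neither is described here); retryDelays is a hypothetical helper.

```typescript
interface RetryConfig {
  max_attempts: number; // retries after the initial attempt
  delay_ms: number; // base delay before the first retry
}

// Delay before retry k (1-based) is delay_ms * 2^(k - 1).
function retryDelays(cfg: RetryConfig): number[] {
  return Array.from({ length: cfg.max_attempts }, (_, i) => cfg.delay_ms * 2 ** i);
}

const defaultDelays = retryDelays({ max_attempts: 2, delay_ms: 3000 });
// [3000, 6000]: 3 total attempts, waiting 3 s then 6 s
const flakyDelays = retryDelays({ max_attempts: 3, delay_ms: 5000 });
// [5000, 10000, 20000]: 4 total attempts
```

The total attempt count is always the number of delays plus one, for the initial attempt.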
SDK subprocess retry (claude.ts) — 3 total attempts, 2 s base backoff
↓ only if all SDK retries exhausted
Node retry (dag-executor) — default 2 retries, 3 s base backoff
↓ only if all node retries exhausted
Workflow fails → user opts in to resume on next invocation
This means a single transient crash may be retried up to twice at the SDK level (3 attempts in total) before a single node retry attempt is consumed.
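A worked count of the worst case, as a TypeScript sketch. It assumes each node attempt gets a fresh SDK retry budget and that the SDK layer's 2 s base backoff also doubles; both are assumptions, and worstCase is a hypothetical helper. The run time of the attempts themselves is ignored.

```typescript
// Sum of baseMs * 2^i for i = 0 .. retries-1 (the waits between attempts).
function backoffTotal(baseMs: number, retries: number): number {
  let total = 0;
  for (let i = 0; i < retries; i++) total += baseMs * 2 ** i;
  return total;
}

function worstCase(sdkAttempts: number, sdkBaseMs: number, nodeRetries: number, nodeBaseMs: number) {
  const nodeAttempts = nodeRetries + 1;
  return {
    // Every node attempt may exhaust the full SDK budget.
    sdkCalls: sdkAttempts * nodeAttempts,
    // SDK waits inside each node attempt, plus node-level waits between node attempts.
    backoffMs:
      backoffTotal(sdkBaseMs, sdkAttempts - 1) * nodeAttempts + backoffTotal(nodeBaseMs, nodeRetries),
  };
}

const worst = worstCase(3, 2000, 2, 3000);
// worst.sdkCalls is 9 (3 SDK attempts x 3 node attempts)
// worst.backoffMs is 27000: (2000 + 4000) x 3 + (3000 + 6000)
```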
DAG resume: For nodes: (DAG) workflows, resume is opt-in — pass --resume to archon workflow run, run archon workflow resume <id>, or use the web UI resume button. Plain archon workflow run <name> always starts a fresh run. See DAG Resume on Failure below.
When a nodes: (DAG) workflow fails, the prior run stays in the database as a candidate for resume. Resume is explicit: you opt in by flag or button.
How to resume:
CLI: archon workflow run <name> --resume resumes the most recent failed run for (workflow_name, cwd). Or archon workflow resume <run-id> to target a specific run.
Chat (web): Approving or rejecting a paused workflow auto-resumes from where it left off (the platform already knows the run id).
Web UI: Resume button on the workflow card.
What happens on resume:
The CLI / orchestrator looks up the resumable run, loads its node_completed events to determine which nodes finished successfully, and transitions the row back to running.
Completed nodes are skipped; only failed and not-yet-run nodes are executed.
You receive a platform message like: Resuming workflow — skipping 3 already-completed node(s).
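The skip decision amounts to a set difference over node_completed events. In this TypeScript sketch, the event shape ({ nodeId }) and planResume are hypothetical; it also covers the case where nothing completed, in which Archon starts fresh.

```typescript
// Hypothetical shape of a stored node_completed event.
interface NodeCompletedEvent {
  nodeId: string;
}

interface ResumePlan {
  skip: string[]; // completed successfully in the prior run
  run: string[]; // failed or never reached
  fresh: boolean; // nothing completed, so this is effectively a fresh start
}

function planResume(nodeIds: string[], events: NodeCompletedEvent[]): ResumePlan {
  const completed = new Set(events.map((e) => e.nodeId));
  const skip = nodeIds.filter((id) => completed.has(id));
  return {
    skip,
    run: nodeIds.filter((id) => !completed.has(id)),
    fresh: skip.length === 0,
  };
}

// A large-migration run that failed at batch-2:
const resume = planResume(
  ["plan", "batch-1", "batch-2", "validate"],
  [{ nodeId: "plan" }, { nodeId: "batch-1" }],
);
// resume.skip is ["plan", "batch-1"]; resume.run is ["batch-2", "validate"]
```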
Why opt-in? Earlier versions silently auto-resumed on plain archon workflow run, which caused state from prior failed runs (e.g. cached node outputs with stale inputs) to bleed into new invocations of the same workflow at the same path. See #1392 for the bug; now resume is always a user-driven decision.
Crashed servers / orphaned runs: Archon does not auto-fail running rows on server startup — that would kill workflows actively executing in another process (CLI, adapter). If a server crash leaves a row stuck as running, it remains visible in the dashboard (the Dashboard nav tab shows a count of running workflows). Transition it to a terminal status explicitly:
Web UI: click the Abandon or Cancel button on the workflow card. Abandon marks the run cancelled and keeps completed-node history. Cancel also terminates any in-flight subprocess.
CLI: archon workflow abandon <run-id> (equivalent to the dashboard Abandon button). Run IDs are listed by archon workflow status.
Once the row reaches a terminal status, you can resume it explicitly via the paths above. Plain archon workflow run never resumes implicitly.
Not to be confused with archon workflow cleanup [days], which deletes old terminal runs (completed/failed/cancelled) from the database for disk hygiene. It does not transition running rows.
Known limitation: AI session context from prior nodes is not restored. If a downstream node relies on in-context knowledge from a prior run’s session (rather than artifacts), it may need to re-read those artifacts explicitly.
Fresh start: If zero nodes completed in the prior run, Archon starts fresh (no nodes to skip).
provider: claude  # Any registered provider (default: from config)
model: sonnet  # Model override (default: from config assistants.claude.model)
Model strings: Whatever you write in model: is forwarded verbatim to the resolved provider’s SDK. Archon doesn’t keep an internal allow-list, because vendor SDKs ship new models faster than this doc can. The provider’s API decides whether the string is valid at request time.
Common shapes you’ll see in practice:
Claude (Anthropic): family aliases (sonnet, opus, haiku), full model IDs (claude-opus-4-7, claude-3-5-sonnet-20241022), context-window suffixed forms (opus[1m], claude-opus-4-7[1m]), or inherit to reuse the previous session’s model.
Codex (OpenAI): any OpenAI model ID — gpt-5.3-codex, gpt-5.2, o5-pro, etc.
Pi (community): <backend>/<model-id> refs, e.g. google/gemini-2.5-pro, openrouter/qwen/qwen3-coder.
If the SDK rejects the string at request time, the node fails loudly with the SDK’s error message — Archon never silently re-routes a model from one provider to another based on the string.
Provider selection is independent of the model string — a model: opus[1m] node with no provider: field will route to your defaultAssistant regardless of the model name. Always pair a provider-specific model string with an explicit provider: on the node.
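In other words, routing and model choice are resolved separately. A TypeScript sketch of that separation, assuming a precedence of node provider:, then workflow provider:, then the configured default; that precedence, the "codex" provider ID, and the names NodeModelConfig and resolveProvider are assumptions for illustration.

```typescript
interface NodeModelConfig {
  provider?: string;
  model?: string; // forwarded verbatim to the provider's SDK, never inspected for routing
}

function resolveProvider(
  node: NodeModelConfig,
  workflowProvider: string | undefined,
  defaultAssistant: string,
): string {
  // node.model is deliberately not consulted here.
  return node.provider ?? workflowProvider ?? defaultAssistant;
}

// With defaultAssistant set to "codex", a Claude-style model string still routes to Codex:
const routed = resolveProvider({ model: "opus[1m]" }, undefined, "codex"); // "codex"
const pinned = resolveProvider({ provider: "claude", model: "opus[1m]" }, undefined, "codex"); // "claude"
```

The first call is the misconfiguration the paragraph warns about; the second is the recommended pairing.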
By default, workflows started from the Web UI run in the background — execution is
dispatched to an internal worker conversation and results appear only in the workflow run
log, not in the chat window.
Set interactive: true to run the workflow in the foreground (same as CLI, Slack,
Telegram, and GitHub): all AI output and approval gate messages stream directly to the
user’s chat window.
name: my-interactive-workflow
interactive: true  # Web UI: foreground execution (output visible in chat)
nodes:
  - id: plan
    prompt: "Create a plan for $USER_MESSAGE"
  - id: review-gate
    approval:
      message: "Does this plan look good?"
    depends_on: [plan]
  - id: implement
    command: implement
    depends_on: [review-gate]
When to use interactive: true:
Workflows with approval nodes — users must see the AI output and respond inline
Workflows with interactive loop nodes (loop.interactive: true) — the loop gate pause requires foreground execution to deliver the gate message and run ID to the user
Multi-turn workflows where the user needs to provide feedback at each step
Any workflow where the response must appear in the user’s active chat thread
Platforms: interactive only affects the web platform. CLI, Slack, Telegram, and
GitHub always run workflows in foreground mode regardless of this setting.
All workflows support variable substitution in prompts and commands. The most commonly used:
| Variable | Description |
|---|---|
| $ARGUMENTS / $USER_MESSAGE | The user’s input message that triggered the workflow |
| $WORKFLOW_ID | Unique ID for this workflow run |
| $ARTIFACTS_DIR | Pre-created artifacts directory for this workflow run |
| $BASE_BRANCH | Base branch (auto-detected or configured) |
| $DOCS_DIR | Documentation directory path (default: docs/) |
| $CONTEXT | GitHub issue/PR context (if available) |
| $nodeId.output | Output of a completed upstream node |
| $nodeId.output.field | JSON field from a structured upstream node output |
See the Variable Reference for the complete list, including $LOOP_USER_INPUT, $REJECTION_REASON, positional arguments, substitution order, and context variable behavior.
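A simplified TypeScript sketch of single-pass substitution for a subset of these variables. RunContext and substitute are hypothetical; unknown variables (such as $LOOP_USER_INPUT here) are left untouched, and the real substitution order and escaping rules are those in the Variable Reference, not this sketch.

```typescript
interface RunContext {
  userMessage: string;
  workflowId: string;
  artifactsDir: string;
  baseBranch: string;
  outputs: Record<string, string>; // captured stdout/AI output per node id
}

function substitute(template: string, ctx: RunContext): string {
  const vars: Record<string, string> = {
    ARGUMENTS: ctx.userMessage,
    USER_MESSAGE: ctx.userMessage,
    WORKFLOW_ID: ctx.workflowId,
    ARTIFACTS_DIR: ctx.artifactsDir,
    BASE_BRANCH: ctx.baseBranch,
  };
  // One regex pass, so substituted values are never re-scanned for variables.
  return template.replace(
    /\$(?:([A-Za-z0-9_-]+)\.output(?:\.([A-Za-z0-9_]+))?|([A-Z_]+))/g,
    (match: string, id?: string, field?: string, name?: string): string => {
      if (name !== undefined) return vars[name] ?? match; // unknown variable: leave as-is
      const raw = id !== undefined ? ctx.outputs[id] : undefined;
      if (raw === undefined) return match; // node has no output (yet)
      if (field === undefined) return raw;
      const value = (JSON.parse(raw) as Record<string, unknown>)[field];
      if (value === undefined) return match;
      return typeof value === "string" ? value : JSON.stringify(value);
    },
  );
}

const ctx: RunContext = {
  userMessage: "fix login bug",
  workflowId: "run-42",
  artifactsDir: "/tmp/artifacts/run-42",
  baseBranch: "main",
  outputs: { classify: JSON.stringify({ type: "BUG" }), "review-gate": "looks good" },
};
const prompt = substitute("Type: $classify.output.type; save to $ARTIFACTS_DIR/plan.md", ctx);
// "Type: BUG; save to /tmp/artifacts/run-42/plan.md"
```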
For long workflows, DAG resume lets you skip already-completed nodes — opt in with --resume:
name: large-migration
description: Multi-file migration with automatic checkpoint recovery
nodes:
  - id: plan
    command: create-migration-plan
  - id: batch-1
    command: migrate-batch-1
    depends_on: [plan]
    context: fresh
  - id: batch-2
    command: migrate-batch-2
    depends_on: [batch-1]
    context: fresh
  - id: validate
    command: validate-migration
    depends_on: [batch-2]
    context: fresh
If the workflow fails at batch-2, run archon workflow run large-migration --resume to skip plan and batch-1. Plain archon workflow run large-migration (without --resume) starts fresh.
Use an approval node to pause for human review before continuing:
name: careful-refactor
description: Refactor with human approval gate
nodes:
  - id: propose
    command: propose-refactor
  - id: review-gate
    approval:
      message: "Review the proposed refactor before proceeding. Check the artifacts directory."
    depends_on: [propose]
  - id: execute
    command: execute-approved-refactor
    depends_on: [review-gate]
  - id: pr
    command: create-pr
    depends_on: [execute]
    context: fresh
When the workflow reaches review-gate, it pauses and notifies you. Approve or reject via:
Natural language (recommended): Just type your response in the conversation — the system detects the paused workflow and auto-resumes
CLI: bun run cli workflow approve <run-id> or bun run cli workflow reject <run-id> — auto-resumes
Explicit command: /workflow approve <run-id> or /workflow reject <run-id> — auto-resumes when issued in the originating conversation
Web UI: Click the Approve/Reject buttons on the dashboard card — auto-resumes for Web-UI-dispatched runs; the Reject dialog includes an optional reason field that flows to $REJECTION_REASON
API: POST /api/workflows/runs/<run-id>/approve or /reject
Each path marked auto-resumes above continues the workflow from the next node. The user’s approval comment is available as $review-gate.output in downstream nodes only when capture_response: true is set on the approval node. Cross-platform caveat: Web-UI approvals on Slack / Telegram / GitHub-dispatched runs record the decision but do not auto-resume; re-run from the originating platform to continue.
Without on_reject: rejecting cancels the workflow.
With on_reject: rejecting triggers an AI rework prompt and re-pauses for re-review.
See Approval Nodes for full details.
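The approve/reject outcomes can be summarised as a small decision function. This TypeScript sketch is hypothetical (ApprovalNode, Decision and handleDecision are illustrative names). What happens once on_reject's max_attempts is exhausted is not described here; the sketch cancels in that case purely as an assumption.

```typescript
type Decision = { kind: "approve"; comment?: string } | { kind: "reject"; reason: string };

interface ApprovalNode {
  captureResponse?: boolean; // capture_response
  onReject?: { prompt: string; maxAttempts: number }; // on_reject
}

type GateResult =
  | { next: "continue"; output?: string } // output becomes $<node>.output
  | { next: "rework"; prompt: string } // run the on_reject prompt, then re-pause
  | { next: "cancel" };

function handleDecision(node: ApprovalNode, decision: Decision, reworksSoFar: number): GateResult {
  if (decision.kind === "approve") {
    // The comment is only exposed downstream when capture_response is true.
    return node.captureResponse ? { next: "continue", output: decision.comment } : { next: "continue" };
  }
  // Without on_reject, rejecting cancels the workflow.
  if (node.onReject === undefined) return { next: "cancel" };
  // Assumption: once max_attempts reworks are used up, the run is cancelled.
  if (reworksSoFar >= node.onReject.maxAttempts) return { next: "cancel" };
  // split/join avoids String.replace treating "$" sequences in the reason specially.
  return { next: "rework", prompt: node.onReject.prompt.split("$REJECTION_REASON").join(decision.reason) };
}

const gate: ApprovalNode = {
  captureResponse: true,
  onReject: { prompt: "Revise based on: $REJECTION_REASON", maxAttempts: 5 },
};
const afterReject = handleDecision(gate, { kind: "reject", reason: "shorten section 2" }, 0);
// { next: "rework", prompt: "Revise based on: shorten section 2" }
```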
Use a cancel: node to stop a workflow when a precondition fails — preventing wasted compute on downstream branches:
nodes:
  - id: check
    bash: "git merge-base --is-ancestor HEAD origin/main && echo ok || echo blocked"
  - id: stop-if-blocked
    cancel: "PR has merge conflicts — cannot proceed with review"
    depends_on: [check]
    when: "$check.output == 'blocked'"
  - id: review
    prompt: "Review the PR..."
    depends_on: [check]
    when: "$check.output == 'ok'"
When a cancel: node executes (passes its when: gate), it sets the workflow run to cancelled with the reason string and stops all in-flight nodes. Unlike node failure, cancellation is intentional — the status is cancelled, not failed.
Choosing: Interactive Loop vs Approval with on_reject
Two primitives handle human-in-the-loop iteration. Use the right one for your pattern:
| | Interactive Loop | Approval + on_reject |
|---|---|---|
| YAML | loop.interactive: true | approval.on_reject: { prompt } |
| User input variable | $LOOP_USER_INPUT | $REJECTION_REASON |
| How it works | Same prompt runs each iteration, user input injected as variable | Specific on_reject prompt runs only on rejection |
| Best for | Conversational iteration: explore, refine, review cycles where the AI and human go back and forth | Gate-then-fix: approve to proceed, or reject to trigger a specific corrective action |
| Approval signal | AI detects user intent in its output (<promise>DONE</promise>) | User explicitly approves or rejects via button/command |
| Example | PIV loop: explore → user feedback → explore again | Report generation: generate → user rejects → AI revises specific section |
Interactive loop (loop.interactive: true):
- id: refine-plan
  loop:
    prompt: |
      User's feedback: $LOOP_USER_INPUT
      Read the plan, apply feedback, present changes.
    until: PLAN_APPROVED
    max_iterations: 10
    interactive: true
    gate_message: "Review the plan. Provide feedback or say 'approved'."
The AI runs each iteration, pauses for user input, user’s text feeds into the next iteration via $LOOP_USER_INPUT. The AI decides when to emit the completion signal based on the user’s response.
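A minimal TypeScript sketch of that loop, with the AI call and the gate pause injected as plain functions. It assumes the completion signal is detected as `<promise>UNTIL</promise>` in the output (following the `<promise>DONE</promise>` convention in the comparison table) and that $LOOP_USER_INPUT is empty on the first iteration; runInteractiveLoop is a hypothetical name.

```typescript
interface LoopConfig {
  until: string; // completion signal, e.g. PLAN_APPROVED
  maxIterations: number;
}

interface LoopResult {
  iterations: number;
  completed: boolean;
}

function runInteractiveLoop(
  cfg: LoopConfig,
  runIteration: (loopUserInput: string) => string, // one AI turn; returns its output
  waitForUser: (aiOutput: string) => string, // the gate pause; returns the user's reply
): LoopResult {
  if (cfg.maxIterations < 1) throw new Error("maxIterations must be at least 1");
  const signal = `<promise>${cfg.until}</promise>`; // assumed signal format
  let loopUserInput = ""; // assumed empty on the first iteration
  for (let i = 1; ; i++) {
    const output = runIteration(loopUserInput);
    if (output.includes(signal)) return { iterations: i, completed: true };
    if (i === cfg.maxIterations) return { iterations: i, completed: false };
    loopUserInput = waitForUser(output); // becomes $LOOP_USER_INPUT next time
  }
}

const replies = ["add error handling", "approved"];
const seenInputs: string[] = [];
const loopResult = runInteractiveLoop(
  { until: "PLAN_APPROVED", maxIterations: 10 },
  (input) => {
    seenInputs.push(input);
    // Simulated AI: emits the signal once the user approves.
    return input === "approved" ? "Final plan. <promise>PLAN_APPROVED</promise>" : `Plan draft ${seenInputs.length}`;
  },
  () => replies.shift() ?? "",
);
// loopResult is { iterations: 3, completed: true }
```

The decision to stop belongs to the AI turn (it chooses to emit the signal); the loop only checks for it and enforces max_iterations.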
Approval with on_reject (approval.on_reject):
- id: review
  approval:
    message: "Review the report. Approve or request changes."
    capture_response: true
    on_reject: { prompt: "Revise based on: $REJECTION_REASON", max_attempts: 5 }
  depends_on: [generate]
The workflow pauses at the approval gate. User approves → workflow continues. User rejects with feedback → the on_reject prompt runs with $REJECTION_REASON, then re-pauses at the same gate.
Rule of thumb: If the human and AI are having a conversation (exploring, refining, iterating), use an interactive loop. If the workflow should proceed unless the human objects, use an approval gate with on_reject.
After a workflow runs, check the artifacts in the $ARTIFACTS_DIR for that run (located at ~/.archon/workspaces/owner/repo/artifacts/runs/{workflow-id}/).