
Review Gate

Post-terminal review of task runs, reviewer routing, persisted verdicts, continuation runs, and the runtime authority boundary.

Audience: Operators running durable agent work
Focus: Autonomy guidance shaped for scanability, day-two clarity, and operator context.

The review gate is a post-terminal quality check on a task run. After a worker terminalizes a run, task.Service may create a review request, the daemon ReviewRouter selects a reviewer, the reviewer submits a typed verdict, and either the terminal outcome is accepted or a new continuation run is enqueued with bounded missing_work and next_round_guidance. The gate never rewrites the reviewed run's history, and channels, bridge messages, skills, notification cursors, and the web UI are never verdict authority.

Authority boundary

Read this section first. Everything else on the page operates inside it.

  • task.Service.RecordRunReview is the only path that records a verdict, updates task review rollups, or creates a rejected-review continuation run. It runs in one BEGIN IMMEDIATE transaction with the verdict, rollup fields, review events, and any continuation row written together.
  • task_runs remains the only durable execution queue and ownership source. Review does not change a terminal run's status. A failed run that is later approved stays failed; an approved completed run stays completed.
  • A reviewer session must be bound to a persisted review request through task.Service.BindRunReviewSession before the native submit_run_review tool is even visible to the session. Review id alone is not authorization.
  • Channels, bridge deliveries, notification cursors, prompt overlays, skills, and web cards never approve, reject, block, or retry a review. They surface, route, or coordinate.
  • TaskExecutionProfile.Review is reviewer-selection input. It chooses who reviews; it never decides the verdict.

Lifecycle

  1. A worker terminalizes a run through the existing token-fenced execution paths (Task Runs and Leases).
  2. The terminal transition writes the review trigger fields on the run row (review_required, review_request_round, review_policy_snapshot) along with the terminal status, summary, and event row.
  3. After the terminal commit, task.Service runs a follow-up transaction. If the captured policy matches the run's terminal status, it inserts an attempt-1 review request, links the new review_id into task_runs.review_request_id, clears review_required, and emits task.run_review_requested. The follow-up is idempotent on (run_id, review_round, attempt = 1); duplicate triggers return the existing row without emitting a second event.
  4. task.Service calls the daemon ReviewRouter callback at the same call site. Wake-up is a nudge only; the router reads persisted state through task service before doing anything.
  5. ReviewRouter resolves the effective reviewer selector from the persisted execution profile (profile.review) and active session/task state. It excludes the original worker by default, then either reuses an eligible active reviewer session or creates a local reviewer session through the daemon composition root.
  6. The daemon binds the reviewer session to the review request through task.Service.BindRunReviewSession before exposing submit_run_review. The bundled agh-task-reviewer skill loads only because the binding is durable.
  7. The reviewer submits exactly one typed verdict. task.Service.RecordRunReview validates the review/run link, actor identity, active binding when the actor is a reviewer session, idempotency, payload bounds, and status transition.
  8. The transaction either accepts the terminal outcome (approved), enqueues exactly one continuation run (rejected), records a blocked review/circuit diagnostic (blocked), or stores an evaluation diagnostic (error, timeout, invalid_output).

Because review request creation is idempotent on (run_id, review_round, attempt = 1), callers can retry the request path without creating duplicate review rows. Reviews do not introduce a second queue; task_runs stays the only durable execution queue.
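
The idempotency rule above can be sketched with a uniqueness constraint on the request key. This is a minimal illustration using an in-memory SQLite database, not the shipped task.Service code: the reviews table layout, the open_db and request_review helpers, and the returned created flag are all assumptions made for the example.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    # isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    # Hypothetical table layout; only the unique key mirrors the documented rule.
    conn.execute(
        """
        CREATE TABLE reviews (
            review_id    INTEGER PRIMARY KEY,
            run_id       TEXT    NOT NULL,
            review_round INTEGER NOT NULL,
            attempt      INTEGER NOT NULL,
            status       TEXT    NOT NULL,
            UNIQUE (run_id, review_round, attempt)
        )
        """
    )
    return conn


def request_review(conn: sqlite3.Connection, run_id: str, review_round: int):
    """Return (review_id, created); created is False for a duplicate trigger."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        cur = conn.execute(
            "INSERT OR IGNORE INTO reviews (run_id, review_round, attempt, status) "
            "VALUES (?, ?, 1, 'requested')",
            (run_id, review_round),
        )
        created = cur.rowcount == 1  # 0 when the unique key already existed
        row = conn.execute(
            "SELECT review_id FROM reviews "
            "WHERE run_id = ? AND review_round = ? AND attempt = 1",
            (run_id, review_round),
        ).fetchone()
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row[0], created
```

In this sketch a caller would emit task.run_review_requested only when created is True, which matches the rule that duplicate triggers return the existing row without a second event.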

Review policy and outcomes

[task.orchestration.review] defines the defaults; each task may override within those bounds. The policy decides whether a terminal run triggers a review:

| Policy | Triggers review on |
| --- | --- |
| none | Never. The default. |
| on_success | Completed runs only. |
| on_failure | Failed or canceled runs only. |
| always | Completed, failed, and canceled runs. |
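
The policy table reduces to a lookup from policy name to the set of terminal statuses that trigger a review. The sketch below assumes the run status strings are spelled completed, failed, and canceled; the TRIGGERS mapping and the review_triggered helper are illustrative names, not part of the runtime.

```python
# Policy name -> terminal run statuses that trigger a review (assumed spellings).
TRIGGERS = {
    "none": frozenset(),
    "on_success": frozenset({"completed"}),
    "on_failure": frozenset({"failed", "canceled"}),
    "always": frozenset({"completed", "failed", "canceled"}),
}


def review_triggered(policy: str, terminal_status: str) -> bool:
    """Return True when the captured policy matches the run's terminal status."""
    if policy not in TRIGGERS:
        raise ValueError(f"unknown review policy: {policy!r}")
    return terminal_status in TRIGGERS[policy]
```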

Verdict outcome values are typed. The terminal review row stores status = "recorded" plus the typed outcome; never store outcomes as review statuses:

| Outcome | Effect |
| --- | --- |
| approved | Accepts the run's terminal outcome. Emits task.run_review_approved. missing_work must be empty. |
| rejected | Records missing work and enqueues exactly one continuation run linked by review_id. Emits task.run_review_retry_enqueued. |
| blocked | Records a blocked review and writes the task review circuit rollup fields with the bounded reason. |
| error | Records a reviewer/tool execution diagnostic. Emits task.run_review_error; it does not create a continuation run. |
| timeout | Records that the review exceeded its expected window. Emits task.run_review_timeout; it does not create a continuation run. |
| invalid_output | Records that the run output could not be evaluated. Emits task.run_review_invalid_output; it does not create a continuation run. |

approved requires empty missing_work. rejected requires at least one missing_work item or non-empty next_round_guidance. Both fields are bounded by missing_work_max_items/missing_work_item_max_bytes and next_round_guidance_max_bytes from [task.orchestration.review]. confidence must be in [0, 1]. Every verdict needs a delivery_id; task.Service.RecordRunReview is idempotent only when review id, run id, actor identity, outcome, and delivery id all match the persisted row.
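
The payload rules above can be read as a small validation function. The sketch below is an illustration only: the Verdict and Limits classes, the validate_verdict helper, the numeric limit values, and the error strings are assumptions; the real bounds come from [task.orchestration.review].

```python
from dataclasses import dataclass, field

OUTCOMES = {"approved", "rejected", "blocked", "error", "timeout", "invalid_output"}


@dataclass(frozen=True)
class Limits:
    # Placeholder values; the real limits are configured, not hard-coded.
    missing_work_max_items: int = 8
    missing_work_item_max_bytes: int = 512
    next_round_guidance_max_bytes: int = 2048


@dataclass
class Verdict:
    outcome: str
    confidence: float
    delivery_id: str
    missing_work: list = field(default_factory=list)
    next_round_guidance: str = ""


def validate_verdict(v: Verdict, limits: Limits = Limits()) -> list:
    """Return a list of rule violations; an empty list means the payload passes."""
    errors = []
    if v.outcome not in OUTCOMES:
        errors.append("unknown outcome")
    if not 0.0 <= v.confidence <= 1.0:
        errors.append("confidence must be in [0, 1]")
    if not v.delivery_id:
        errors.append("delivery_id is required")
    if len(v.missing_work) > limits.missing_work_max_items:
        errors.append("too many missing_work items")
    if any(len(item.encode("utf-8")) > limits.missing_work_item_max_bytes
           for item in v.missing_work):
        errors.append("missing_work item too large")
    if len(v.next_round_guidance.encode("utf-8")) > limits.next_round_guidance_max_bytes:
        errors.append("next_round_guidance too large")
    if v.outcome == "approved" and v.missing_work:
        errors.append("approved requires empty missing_work")
    if v.outcome == "rejected" and not (v.missing_work or v.next_round_guidance.strip()):
        errors.append("rejected requires missing_work or next_round_guidance")
    return errors
```

Collecting every violation instead of stopping at the first one is a choice made for this sketch; it lets a reviewer fix a payload in one pass.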

[task.orchestration.review] also carries bounded policy fields such as max_review_attempts, rapid_terminal_window, rapid_terminal_limit, and failure_policy. They are validated config state and visible to agents, but the current shipped verdict path does not expose a standalone review-circuit reset command or a retry-attempt API.

Reviewer routing and binding

ReviewRouter (internal/daemon) is the runtime composition-root component that turns a persisted review request into a bound reviewer session. It is wake-driven, not event-tail driven.

Selector inputs come from the persisted TaskExecutionProfile.Review block. See Task Execution Profiles for the full overlay model. The router reads the normalized review agent/provider/model, allowed/preferred agent names, allowed/preferred peer ids, allowed/preferred channel ids, and required/preferred capabilities.

Selection order, after resolving the effective selector:

  1. Existing active sessions in the task workspace are scored first. Exact/preferred agent names, preferred peer ids, preferred channel ids, and preferred capabilities increase the score.
  2. If no active session matches, the router creates a local reviewer session for the exact, preferred, or allowed agent that satisfies required capabilities.
  3. If peer selectors require an explicit peer and no active eligible peer exists, the router fails closed rather than creating an unrelated local reviewer.
  4. If nothing matches, the router records a deterministic no-route diagnostic as a typed blocked outcome through task.Service.RecordRunReview with a review-router:no-route: delivery id, so the no-route reason becomes part of review history instead of hidden channel state.

Eligibility rules:

  • Default is allow_original_worker = false. The session, agent, peer, and actor identity that terminalized the run cannot be selected unless the policy explicitly allows it. When the original-worker identity cannot be determined, routing fails closed.
  • Channel membership and peer authorization remain enforced by the network/bridge subsystems. ReviewProfile selectors narrow routing; they do not grant access.
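
The scoring and eligibility rules can be sketched as a filter followed by a ranking. This is a simplified model under stated assumptions: the Session and Selector shapes, the one-point-per-match weights, and the pick_reviewer helper are hypothetical, and the original worker is compared only by session id and agent name rather than by the full session, agent, peer, and actor identity.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Session:
    session_id: str
    agent: str
    peer_id: str = ""
    channel_id: str = ""
    capabilities: frozenset = frozenset()


@dataclass(frozen=True)
class Selector:
    preferred_agents: frozenset = frozenset()
    preferred_peers: frozenset = frozenset()
    preferred_channels: frozenset = frozenset()
    required_capabilities: frozenset = frozenset()
    preferred_capabilities: frozenset = frozenset()
    allow_original_worker: bool = False


def score(s: Session, sel: Selector) -> int:
    # Each preference match adds one point; the weights are an assumption.
    return (
        int(s.agent in sel.preferred_agents)
        + int(s.peer_id in sel.preferred_peers)
        + int(s.channel_id in sel.preferred_channels)
        + len(s.capabilities & sel.preferred_capabilities)
    )


def pick_reviewer(sessions: Sequence[Session], sel: Selector,
                  original_worker: Optional[Session]) -> Optional[Session]:
    """Return the best eligible active session, or None when nothing is eligible."""
    if not sel.allow_original_worker and original_worker is None:
        return None  # fail closed: the worker identity is unknown
    eligible = []
    for s in sessions:
        if not sel.required_capabilities <= s.capabilities:
            continue
        if not sel.allow_original_worker and (
            s.session_id == original_worker.session_id
            or s.agent == original_worker.agent
        ):
            continue
        eligible.append(s)
    if not eligible:
        return None
    # Highest score wins; ties break on session id so the choice is deterministic.
    return min(eligible, key=lambda s: (-score(s, sel), s.session_id))
```

A None result in this sketch stands in for the later steps of the selection order (creating a local reviewer session or recording a no-route diagnostic), which are not modeled here.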

Session binding (task.Service.BindRunReviewSession) is authorization state, not prompt context. It runs in a task-service transaction, verifies the review is still routed or requested, sets reviewer_session_id, started_at, deadline_at, and status = "in_review", and rejects any second active session binding. The native submit_run_review tool calls LookupReviewForSession(session_id) on every invocation; a session without a matching active binding sees ErrToolUnavailable and cannot submit a verdict.
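
The binding check can be pictured as a lookup that runs before every submission. The sketch below keeps state in memory purely for illustration; the ReviewBindings class, its method names, and the ToolUnavailable exception are stand-ins for the persisted task-service state and ErrToolUnavailable, not the actual implementation.

```python
class ToolUnavailable(Exception):
    """Stand-in for the unavailable-tool error seen by unbound sessions."""


class ReviewBindings:
    def __init__(self):
        # review_id -> {"status": ..., "reviewer_session_id": ...}
        self._reviews = {}

    def add_request(self, review_id: str, status: str = "requested") -> None:
        self._reviews[review_id] = {"status": status, "reviewer_session_id": None}

    def bind(self, review_id: str, session_id: str) -> None:
        review = self._reviews[review_id]
        # Only a requested or routed review can be bound, so a second bind
        # attempt fails once the status has moved to in_review.
        if review["status"] not in ("requested", "routed"):
            raise ValueError("review is not bindable in its current status")
        review["status"] = "in_review"
        review["reviewer_session_id"] = session_id

    def lookup_for_session(self, session_id: str):
        for review_id, review in self._reviews.items():
            if (review["status"] == "in_review"
                    and review["reviewer_session_id"] == session_id):
                return review_id
        return None

    def submit(self, session_id: str, review_id: str) -> str:
        # The lookup runs on every call; knowing a review id is not enough.
        bound = self.lookup_for_session(session_id)
        if bound is None or bound != review_id:
            raise ToolUnavailable("no active binding for this session and review")
        return bound
```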

Continuation runs

A rejected verdict creates exactly one continuation run inside the same verdict transaction. The continuation row is a normal task_runs row, not a hidden retry lane:

  • parent_run_id points at the reviewed terminal run.
  • review_id points at the rejected review row.
  • review_round is the next round number.
  • continuation_reason = "review_rejected".
  • missing_work_json and next_round_guidance are bounded copies from the verdict, ready for the next worker's context bundle.

Continuation profile precedence (see Task Execution Profiles):

  1. The task's current TaskExecutionProfile at enqueue time controls worker, reviewer, participant, coordinator-guidance, and sandbox selection for the continuation.
  2. The reviewed run's native coordination/capability fields are copied forward only when the profile leaves the equivalent worker/participant selectors empty.
  3. The continuation run's review columns provide context and lineage. They do not override worker selection or grant permissions.

The next worker reads continuation context through TaskContextBundle.ReviewContinuation and TaskContextBundle.ReviewHistory in the /agent/context task bundle. The implemented API reference is Agent API, and the CLI path is agh me context. Workers must still claim the run through ClaimNextRun and mutate it through session-bound lease lookup. The continuation context is guidance, not permission.

Idempotent rejected-verdict replay returns the existing continuation by task_runs.review_id = review_id. The transaction never enqueues a duplicate when the same delivery_id is replayed.
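
The single-transaction behavior and the replay rule can be sketched against an in-memory SQLite database. This is an illustration under assumptions, not the shipped schema: the column set is trimmed to the fields named on this page, the queued status and the generated continuation run id are invented for the example, and the replay check compares only delivery_id rather than the full set of matching fields.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(
        """
        CREATE TABLE task_runs (
            run_id TEXT PRIMARY KEY, status TEXT NOT NULL,
            parent_run_id TEXT, review_id TEXT UNIQUE,
            review_round INTEGER NOT NULL DEFAULT 0,
            continuation_reason TEXT, missing_work_json TEXT,
            next_round_guidance TEXT
        );
        CREATE TABLE reviews (
            review_id TEXT PRIMARY KEY, run_id TEXT NOT NULL,
            review_round INTEGER NOT NULL, status TEXT NOT NULL,
            outcome TEXT, delivery_id TEXT
        );
        """
    )
    return conn


def record_rejected(conn, review_id, delivery_id, missing_work, guidance):
    """Record a rejected verdict and return the continuation run id."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        run_id, review_round, status, prior_delivery = conn.execute(
            "SELECT run_id, review_round, status, delivery_id "
            "FROM reviews WHERE review_id = ?", (review_id,)
        ).fetchone()
        if status == "recorded":
            if prior_delivery != delivery_id:
                raise ValueError("review already recorded with another delivery id")
            # Replay: return the existing continuation instead of enqueuing again.
            continuation_id = conn.execute(
                "SELECT run_id FROM task_runs WHERE review_id = ?", (review_id,)
            ).fetchone()[0]
        else:
            conn.execute(
                "UPDATE reviews SET status = 'recorded', outcome = 'rejected', "
                "delivery_id = ? WHERE review_id = ?", (delivery_id, review_id)
            )
            continuation_id = f"{run_id}-round-{review_round + 1}"  # hypothetical id
            conn.execute(
                "INSERT INTO task_runs (run_id, status, parent_run_id, review_id, "
                "review_round, continuation_reason, missing_work_json, "
                "next_round_guidance) "
                "VALUES (?, 'queued', ?, ?, ?, 'review_rejected', ?, ?)",
                (continuation_id, run_id, review_id, review_round + 1,
                 json.dumps(missing_work), guidance),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return continuation_id
```

The reviewed run's row is never updated in this sketch, which mirrors the rule that review does not change a terminal run's status.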

Manage reviews from the CLI

The agh task review command group operates over UDS and shares its surface with the matching HTTP endpoints. Every subcommand supports -o json|jsonl|toon for agent consumption.

# Request a review for a terminal run.
agh task review request <run-id> --policy on_success --reason "audit before promotion" -o json

# List reviews for a task or run.
agh task review list --task <task-id> --status recorded -o json
agh task review list --run  <run-id>  --status routed   -o json

# Inspect one review.
agh task review show <review-id> -o json

# Submit the bound verdict (review id is the path argument; --run is required for the run link).
agh task review submit <review-id> \
  --run <run-id> \
  --outcome rejected \
  --confidence 0.4 \
  --reason "missing migration safety check" \
  --missing-work "add ON DELETE behavior" \
  --next-round-guidance "Re-run with explicit cascade analysis" \
  --delivery-id rev-123-attempt-1 \
  -o json


agh task review submit is the operator-facing path. Submitting from the CLI uses server-derived operator actor authorization. It does not surface or call the reviewer-bound native tool, and it cannot be invoked by an unbound session.

Manage reviews through HTTP and UDS

Both transports mount the same shared core handlers, so HTTP and UDS responses are identical and refer to the same authority. Operation IDs come from openapi/agh.json; do not paraphrase the shapes.

| Method | Path | Operation ID | Purpose |
| --- | --- | --- | --- |
| POST | /api/task-runs/{id}/reviews | requestTaskRunReview | Create a review request for a run. |
| GET | /api/task-runs/{id}/reviews | listTaskRunReviews | List reviews scoped to one run; supports status, reviewer_session_id, limit. |
| GET | /api/tasks/{id}/reviews | listTaskReviews | List reviews scoped to one task. |
| GET | /api/task-reviews/{id} | getTaskRunReview | Show one review row. |
| POST | /api/task-reviews/{id}/verdict | submitTaskRunReviewVerdict | Persist the verdict through task.Service.RecordRunReview. |

status filters use the persisted review status enum (requested, routed, in_review, recorded, circuit_opened, canceled). The verdict request body is the typed RunReviewVerdict; outcome uses the verdict-outcome enum from this page (approved, rejected, blocked, error, timeout, invalid_output). Generated TypeScript types for web consumers live in web/src/generated/agh-openapi.d.ts.

Reviewer-bound native tool

In-session reviewers submit verdicts through one model-facing tool. AGH registers the tool internally with the agh__ prefix so toolset routing can hide it when the session is not bound.

| Model-facing name | Internal id | Authority |
| --- | --- | --- |
| submit_run_review | agh__task_run_review_submit | Persists the verdict through task.Service.RecordRunReview. Available only to reviewer-bound sessions. |

Visibility rules enforced by the runtime:

  • The bundled agh-task-reviewer skill must be active. Its metadata.agh.requires_review_request flag is the load trigger; the tool is hidden until the loader sees an active review binding.
  • LookupReviewForSession(session_id) must return a binding that matches the call's review_id. Mismatched ids return an unavailable-tool error and never reach the verdict path.
  • The tool never inspects claim leases or exposes raw claim tokens. Reviewer sessions do not need an active claim.
  • Operator/API/UDS/CLI verdict submissions go through the explicit /api/task-reviews/{id}/verdict HTTP/UDS endpoint or agh task review submit with server-derived operator identity. There is no debug-only native-tool bypass for unbound sessions.

The reviewer-related auxiliary tools — task_run_review_request, task_run_review_list, and task_run_review_show (registered as agh__task_run_review_request, agh__task_run_review_list, and agh__task_run_review_show) — are read/request helpers that delegate to task.Service.RequestRunReview, task.Service.ListRunReviews, and task.Service.GetRunReview. They never bypass review policy validation or write GlobalDB directly.

Inspect from the operator web UI

Open a task and switch to the Orchestration tab (the same tab documented in Task Execution Profiles). The task-level reviews card and the run-detail page's run-level variant are read-only views over the same authority:

  • They render status, outcome, reason, missing_work, next_round_guidance, reviewed_at, delivery_id, and reviewer identity from web/src/generated/agh-openapi.d.ts types — no inferred state.
  • A permanent disclaimer reinforces that operator sessions cannot submit a verdict from the web UI. Verdict authority is the reviewer-bound native tool or the explicit /api/task-reviews/{id}/verdict endpoint.
  • Continuation lineage and rejected-review guidance appear next to the reviewed run so operators can match the continuation to the rejected verdict without parsing event payloads.

The Stream Resume card on the same tab seeds reconnects from latest_event_seq, so review events appear in the timeline without a read-then-stream race. See Notification Cursors for the related delivery behavior.

Review events

task.Service emits typed task events for every review transition. They appear on the task SSE stream after persistence and feed the timeline, hooks, and observe surfaces. Payloads are bounded and never carry raw claim tokens or full reviewer transcripts.

| Event | Emitted when |
| --- | --- |
| task.run_review_requested | Attempt-1 review row is durable and task_runs.review_request_id is set. |
| task.run_review_bound | BindRunReviewSession binds the review to a reviewer session. |
| task.run_review_recorded | Verdict is persisted (any outcome). |
| task.run_review_approved | Recorded verdict outcome is approved. Bridge terminal notifier delivers. |
| task.run_review_rejected | Recorded verdict outcome is rejected. |
| task.run_review_blocked | Recorded verdict outcome is blocked. |
| task.run_review_error | Recorded verdict outcome is error. |
| task.run_review_timeout | Recorded verdict outcome is timeout. |
| task.run_review_invalid_output | Recorded verdict outcome is invalid_output. |
| task.run_review_retry_enqueued | Rejected verdict created a continuation run inside the verdict transaction. |

task.run_review_approved is also the accepted-final terminal event for the bridge terminal notifier on review-gated work. See Notification Cursors.
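
Read together, the event table implies that a recorded verdict produces the generic recorded event, the outcome-specific event, and, for rejected verdicts, the retry event. The sketch below encodes that reading; the events_for_verdict helper and the ordering of the returned list are assumptions, since this page does not specify emission order.

```python
OUTCOMES = ("approved", "rejected", "blocked", "error", "timeout", "invalid_output")


def events_for_verdict(outcome: str) -> list:
    """Return the event names a recorded verdict with this outcome would emit."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown verdict outcome: {outcome!r}")
    # The order here is an assumption made for the example.
    events = ["task.run_review_recorded", f"task.run_review_{outcome}"]
    if outcome == "rejected":
        # Only a rejected verdict enqueues a continuation run.
        events.append("task.run_review_retry_enqueued")
    return events
```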

Bundled skill expectations

agh-task-reviewer is one of the bundled orchestration skills. It is loaded only by reviewer sessions with an active binding (metadata.agh.requires_review_request = true) and is purely instructional:

  • It cannot record verdicts, change task state, alter reviewer routing, or expose claim tokens.
  • It teaches reviewers how to read review packets, prefer typed outcomes, and submit bounded evidence through submit_run_review.

See Bundled Skills — Orchestration for skill load triggers and the orchestration-skill authority boundary.

Config lifecycle

Defaults and bounds for the gate live under [task.orchestration.review]. Read config.toml for the complete field reference and the [task.orchestration.profile] overlay that gates per-task provider/sandbox overrides used by reviewer routing.

Behavior to keep in mind:

  • Workspace overlays (<workspace>/.agh/config.toml) may tighten or relax the review defaults per workspace. Unknown keys still fail validation.
  • Updating [task.orchestration.review] via agh config set or the agh__config_* native tools is allowed because review policy is a runtime configuration value, not a credential. Secret-shaped paths and trust-rooted sections remain off-limits.
  • Per-task review policy and reviewer routing are managed through the task execution profile, not through arbitrary metadata_json fields.
