Production Checklist

Prepare AGH for persistent unattended operation with clear pass and fail checks.

Audience: Operators running durable agent work
Focus: Operations guidance shaped for scanability, day-two clarity, and operator context.

Use this checklist before running AGH as a persistent daemon for real work. It is written for local or self-managed production-like environments where one service user owns one AGH_HOME.

1. Pin the daemon identity and home

| Check | Pass condition |
| --- | --- |
| Service user | A dedicated OS user owns the daemon process. |
| Home directory | AGH_HOME is explicit, stable, and owned by the service user. |
| CLI operations | Operators use the same AGH_HOME when running agh daemon status, agh session list, and related commands. |
| File permissions | The home directory is not world-writable; socket access is limited to the daemon user. |

Example:

sudo install -d -o agh -g agh -m 0750 /var/lib/agh

AGH creates its standard subdirectories with normal directory permissions and sets the live Unix domain socket (UDS) to mode 0600.
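
The ownership and permission checks above can be scripted. The following is a minimal sketch, not part of AGH: the function name `check_agh_home` and its arguments are illustrative, and it only inspects the top-level directory.

```shell
# Sketch: verify that an AGH home directory is owned by the expected user
# and is not world-writable. check_agh_home is a hypothetical helper name.
check_agh_home() {
  home_dir=$1
  expected_user=$2

  [ -d "$home_dir" ] || { echo "FAIL: $home_dir is not a directory"; return 1; }

  # ls -ld prints the owner in the third column.
  owner=$(ls -ld "$home_dir" | awk '{print $3}')
  if [ "$owner" != "$expected_user" ]; then
    echo "FAIL: $home_dir is owned by $owner, expected $expected_user"
    return 1
  fi

  # -prune stops find from descending; -perm -0002 matches when the
  # write bit for "other" is set.
  if [ -n "$(find "$home_dir" -prune -perm -0002)" ]; then
    echo "FAIL: $home_dir is world-writable"
    return 1
  fi

  echo "PASS: $home_dir"
}
```

For the example layout above, a call would look like `check_agh_home /var/lib/agh agh`.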

2. Harden configuration

Review the home config that the daemon loads:

export AGH_HOME="${AGH_HOME:-$HOME/.agh}"
sed -n '1,220p' "$AGH_HOME/config.toml"

After changing it, run agh daemon start --foreground during a maintenance window or in a staging AGH_HOME to surface config validation errors directly.

Use explicit daemon and HTTP settings:

[daemon]
socket = "/var/lib/agh/daemon.sock"

[http]
host = "localhost"
port = 2123

[log]
level = "info"

[limits]
max_sessions = 10
max_concurrent_agents = 20

| Check | Pass condition |
| --- | --- |
| HTTP bind | [http].host is localhost unless AGH is intentionally protected by a reverse proxy or host firewall. |
| UDS path | [daemon].socket is inside a directory owned by the daemon user. |
| Log level | [log].level is info or warn for unattended operation; use debug only for short investigations. |
| Limits | Session and agent concurrency limits match the host capacity. |
| Provider environment | Required provider API keys are set in the service environment, not only in an interactive shell. |
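
The HTTP bind check can be automated with a small script. This sketch assumes the config uses simple `key = "value"` lines as in the example above. `toml_get` and `http_bind_is_loopback` are hypothetical helpers, not AGH commands, and the awk logic is not a full TOML parser. It also treats an unset host as a warning, because the default bind address is not stated here.

```shell
# Print the value of a key inside a TOML table. Handles only simple
# `key = "value"` lines; this is not a full TOML parser.
toml_get() {
  file=$1
  table=$2
  key=$3
  awk -v table="[$table]" -v key="$key" '
    /^[ \t]*\[/ { gsub(/[ \t]/, ""); in_table = ($0 == table); next }
    in_table {
      line = $0
      sub(/#.*/, "", line)              # drop trailing comments
      n = index(line, "=")
      if (n == 0) next
      k = substr(line, 1, n - 1)
      v = substr(line, n + 1)
      gsub(/[ \t]/, "", k)
      gsub(/^[ \t]*"?|"?[ \t]*$/, "", v) # trim whitespace and quotes
      if (k == key) { print v; exit }
    }
  ' "$file"
}

# Succeed only when [http].host is a loopback name or address.
http_bind_is_loopback() {
  host=$(toml_get "$1" http host)
  case $host in
    localhost|127.0.0.1|::1) return 0 ;;
    *) echo "WARN: [http].host is '${host:-unset}', not a loopback address"
       return 1 ;;
  esac
}
```

A call such as `http_bind_is_loopback "$AGH_HOME/config.toml"` returns non-zero when the bind address needs a second look, for example when a reverse proxy is expected to sit in front of it.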

3. Run under a service manager

The service manager should:

  • run agh daemon start --foreground as the service command
  • send SIGTERM during stop
  • restart on unexpected failure
  • provide the provider environment used by agent subprocesses
  • keep stdout and stderr in a known log location

For concrete service files, see Daemon Operations.
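
If the service manager launches AGH through a wrapper script, the wrapper can refuse to start when the provider environment is incomplete. The sketch below is one possible shape. `require_env` is a hypothetical helper and `EXAMPLE_PROVIDER_API_KEY` is a placeholder name, so substitute the variables your providers actually need.

```shell
# Report every variable from the argument list that is unset or empty.
# Returns non-zero if at least one is missing. Names are supplied by the
# script author, so the eval below never sees untrusted input.
require_env() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example wrapper body (placeholder variable name):
#   require_env EXAMPLE_PROVIDER_API_KEY || exit 1
#   exec agh daemon start --foreground
```

Using `exec` replaces the wrapper shell with the daemon process, so the SIGTERM sent by the service manager reaches AGH directly.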

4. Configure log retention

AGH writes structured logs to $AGH_HOME/logs/agh.log. Detached daemon startup also appends child stdout and stderr there.

If your host uses logrotate, use a rule like this and adjust user, group, and path:

/var/lib/agh/logs/agh.log {
  daily
  rotate 14
  compress
  missingok
  notifempty
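  # copytruncate truncates the log in place, so the daemon can keep writing
  # to its open file handle. The create directive has no effect in this
  # mode, so it is omitted.
  copytruncate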
}

| Check | Pass condition |
| --- | --- |
| Retention | Logs rotate before filling the filesystem. |
| Access | Only operators who need runtime logs can read them. |
| Error review | Recent error lines are reviewed during incident response and before upgrades. |
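
For the error review check, a quick filter over the tail of the log is often enough. This sketch assumes only that error lines contain the word "error" in some form. The exact field layout of the structured log is not specified here, so the match is deliberately loose, and `recent_errors` is a hypothetical helper name.

```shell
# Print lines containing "error" (case-insensitive) from the last N lines
# of a log file. N defaults to 1000. Like grep, the function returns
# non-zero when nothing matches.
recent_errors() {
  log_file=$1
  window=${2:-1000}
  tail -n "$window" "$log_file" | grep -i 'error'
}
```

For example, `recent_errors "$AGH_HOME/logs/agh.log" 5000` lists candidate lines to read before an upgrade.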

5. Monitor daemon and runtime health

Check both the daemon status and the observe health report:

agh daemon status --output json
agh observe health --output json

If HTTP is available locally:

curl -fsS http://localhost:2123/api/daemon/status >/dev/null
curl -fsS http://localhost:2123/api/observe/health >/dev/null

Alert on:

| Signal | Failing condition |
| --- | --- |
| Daemon status | Status is not running, or PID is absent. |
| HTTP status | /api/daemon/status or /api/observe/health cannot be reached from the host. |
| Active sessions | Count exceeds the expected operating range. |
| Database size | global_db_size_bytes or session_db_size_bytes grows faster than planned. |
| Logs | Repeated startup, socket, database, or ACP spawn errors. |
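
The reachability signals can be folded into one probe that a cron job or monitoring agent runs. This is a sketch under one loud assumption: that the agh commands exit non-zero when the daemon is unhealthy or unreachable, which is not confirmed here. The function names are illustrative, and the probe does not inspect JSON fields such as session counts or database sizes.

```shell
# Run a named check quietly; print one result line and propagate failure.
run_check() {
  name=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok   $name"
  else
    echo "FAIL $name"
    return 1
  fi
}

# Run every check, then return non-zero if any of them failed.
# Assumption: the agh CLI signals problems through its exit status.
agh_health_probe() {
  base_url=${1:-http://localhost:2123}
  failed=0
  run_check "daemon status (CLI)" agh daemon status --output json || failed=1
  run_check "observe health (CLI)" agh observe health --output json || failed=1
  run_check "daemon status (HTTP)" curl -fsS "$base_url/api/daemon/status" || failed=1
  run_check "observe health (HTTP)" curl -fsS "$base_url/api/observe/health" || failed=1
  return "$failed"
}
```

Running all four checks before returning, instead of stopping at the first failure, shows whether a problem is limited to the HTTP listener or affects the daemon as a whole.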

6. Back up state

Back up at least:

  • $AGH_HOME/agh.db and SQLite sidecars
  • $AGH_HOME/sessions/
  • $AGH_HOME/config.toml
  • $AGH_HOME/agents/
  • $AGH_HOME/skills/
  • $AGH_HOME/memory/

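Use one of the backup procedures in Database Operations. For unattended hosts, prefer a scheduled cold backup when the daemon can be stopped. If it cannot be stopped, use the SQLite .backup command rather than copying the database files directly: a plain copy of a live database can miss changes still held in sidecar files (such as -wal and -shm) and produce an inconsistent backup.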

| Check | Pass condition |
| --- | --- |
| Frequency | Backup frequency matches the amount of session history you can afford to lose. |
| Coverage | Backups include global and per-session databases plus config and content directories. |
| Restore drill | A restore has been tested on a separate AGH_HOME. |
| Retention | Old backups expire according to your storage and compliance needs. |
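
Both backup styles can be sketched in a few lines. These helpers are illustrative and do not replace the procedures in Database Operations. `cold_backup` archives the whole home directory, which is a superset of the list above, and must only run while the daemon is stopped. `online_db_backup` wraps the sqlite3 shell's `.backup` command for a single database file and assumes the sqlite3 CLI is installed.

```shell
# Cold backup: archive the whole AGH home into a timestamped tarball and
# print the archive path. Run only while the daemon is stopped, and keep
# dest_dir outside home_dir so the archive does not include itself.
cold_backup() {
  home_dir=$1
  dest_dir=$2
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  archive="$dest_dir/agh-home-$stamp.tar.gz"
  mkdir -p "$dest_dir" &&
    tar -czf "$archive" -C "$home_dir" . &&
    echo "$archive"
}

# Online backup of one SQLite database while it is in use. The .backup
# command copies a consistent snapshot, unlike a plain file copy.
online_db_backup() {
  src_db=$1
  dest_db=$2
  sqlite3 "$src_db" ".backup '$dest_db'"
}
```

A restore drill can then unpack the archive into a scratch directory with `tar -xzf` and point a test AGH_HOME at it.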

7. Reserve host resources

AGH starts real ACP-compatible agent CLIs as child processes. Size the host for the agent binaries you run, not only the daemon.

| Check | Pass condition |
| --- | --- |
| Disk | AGH_HOME, logs, and session event databases have room to grow. |
| File descriptors | The service limit is high enough for concurrent sessions, sockets, logs, and SQLite handles. |
| Process count | The service user can run the daemon plus expected agent child processes. |
| PATH | Provider commands such as npx, codex, or gemini are available to the service environment. |
| Shutdown | The service manager gives AGH time to stop sessions and close databases before killing it. |

For systemd, set resource limits in the service file when needed:

[Service]
LimitNOFILE=8192
TimeoutStopSec=30
Restart=on-failure
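
The PATH and file descriptor checks are easy to verify from inside the service environment. The helpers below are a sketch with hypothetical names. They only report what the current shell sees, so they need to run in the same environment as the daemon, for example from an ExecStartPre= script.

```shell
# Print each command from the argument list that cannot be resolved on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Succeed when the soft open-file limit of this shell is at least the
# requested minimum (or is unlimited).
nofile_at_least() {
  want=$1
  have=$(ulimit -n)
  [ "$have" = "unlimited" ] || [ "$have" -ge "$want" ]
}
```

For example, `missing_commands npx codex gemini` prints nothing when every provider command resolves, and `nofile_at_least 8192` confirms that the LimitNOFILE value above actually applies to the process.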

8. Upgrade deliberately

Use this flow for binary upgrades:

export AGH_HOME=/var/lib/agh

agh daemon status
agh daemon stop

# Back up AGH_HOME here.
# Install the new agh binary here.

agh daemon start
agh daemon status
agh observe health

Do not rely on old daemon state after replacing the binary. Stop, back up, replace, start, and then confirm status and health.
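
In a scripted upgrade, the step between start and the health check benefits from a bounded wait, because the daemon may need a moment before it answers. The helper below is a minimal sketch. `wait_until` is a hypothetical name, and the usage comment assumes that agh daemon status exits non-zero until the daemon is running, which is not confirmed here.

```shell
# Retry a command once per second until it succeeds or the attempt
# budget is used up. Returns 0 on the first success, 1 otherwise.
wait_until() {
  attempts=$1
  shift
  while [ "$attempts" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    [ "$attempts" -gt 0 ] && sleep 1
  done
  return 1
}

# Example use after installing the new binary:
#   agh daemon start
#   wait_until 30 agh daemon status || { echo "daemon did not come up" >&2; exit 1; }
#   agh observe health
```

Failing the script when the wait times out keeps an unattended upgrade from reporting success while the daemon is still down.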

Final readiness gate

| Area | Ready when |
| --- | --- |
| Daemon lifecycle | agh daemon start, agh daemon status, and agh daemon stop work under the service manager. |
| Socket | The CLI can reach the configured UDS socket as the intended operator user. |
| HTTP | HTTP is bound only where intended and health endpoints are reachable. |
| Logs | Logs rotate and recent errors are actionable. |
| Databases | Backups include agh.db, per-session events.db files, metadata, and sidecars. |
| Sessions | Test session creation, stop, list, and resume work with the production service environment. |
| Recovery | Operators know how to restore a backup into a separate AGH_HOME before touching production state. |
