
Persistence

Every operational question about "what survives a restart / an upgrade / a DR event" lives here. One page for the whole storage story so you don't stitch it together from five others.

If you just want the "what volumes exist" TL;DR, the same table is in Installation → Volumes. This page adds the DR mechanics, the integrity checks, and the bind-mount alternative.

Volume matrix

Eight named Docker volumes cover the stack. All persist across docker compose down + up; all are destroyed by docker compose down -v.

| Volume (compose name / host name) | Container mount | Contents | In argos backup? | If you lose it, how to recover |
|---|---|---|---|---|
| argos_data / argos_panel_data | /data in argos | argos.db, /data/geoip/ | Yes (the DB is the tar.gz's main payload) | Restore a backup tarball. Without a backup: everything is gone (hosts, users, audit log, notifications, manual cert metadata, settings). |
| argos_backups / argos_panel_backups | /data/backups in argos | Local tar.gz backups produced by the scheduler / manual triggers | No (this IS the destination) | Off-site replica if you have one (rclone / borg). If not, local backup history is unrecoverable but the panel keeps running. |
| caddy_data / argos_caddy_data | /data in caddy (read-only into argos for backup capture) | ACME account keys, issued certs, certmagic state | Yes (best-effort; see Backups) | Caddy re-issues every tls_mode=auto cert on the next request. New ACME account auto-created. Brief TLS failure window during re-issue. |
| caddy_config / argos_caddy_config | /config in caddy | Caddy's runtime config cache | No (regenerable) | No-op: argos re-pushes the JSON config on next boot via the admin API. |
| caddy_logs / argos_caddy_logs | /var/log/caddy in caddy (read-only into argos + crowdsec) | access.log + errors.log (rotated at 100 MB × 5 × 7 d) | No (rotates independently) | Rows already ingested into log_entries (inside argos.db) stay. Anything not yet ingested is lost. |
| caddy_manual_certs / argos_caddy_manual_certs | /data/manual-certs (argos RW), /etc/caddy/manual-certs (caddy RO) | Plaintext .crt + .key per manual-mode host | No (the encrypted key lives in host_manual_certs) | Boot reconciler rematerialises files from argos.db on next startup, provided ARGOS_MASTER_KEY is unchanged. See Manual certs → Disaster recovery. |
| crowdsec_data / argos_crowdsec_data | /var/lib/crowdsec/data in crowdsec | LAPI decision DB, machine credentials, bouncer API keys, local scenarios | No | Re-enroll the machine, re-download the community feed. Regenerate the bouncer API key with cscli bouncers add and update the panel's CROWDSEC_BOUNCER_API_KEY. Operator-added manual bans are lost. |
| crowdsec_config / argos_crowdsec_config | /etc/crowdsec in crowdsec | Installed collections, parsers, AppSec config | No | Re-run the boot setup script (crowdsec/setup-appsec.sh) to reinstall the collections from the bundled sources. |

Bind-mount repo files (not volumes, but present on disk)

  • ./Caddyfile → /etc/caddy/Caddyfile (RO) — bootstrap config that hands control to argos on first boot. Tiny fixed file, lives in git.
  • ./crowdsec/* → /etc/crowdsec/* + /setup/* (RO) — acquis definitions, AppSec sources, setup scripts. Lives in git.

These come back on any fresh git clone; they do not need separate backup.

Backup scope

What argos-edge natively backs up

The scheduled + manual backup feature (see Backups) produces tar.gz files under /data/backups/ containing:

  • argos.db — a VACUUM INTO snapshot of the live DB. Fully consistent. Includes host_manual_certs rows with AES-GCM-encrypted keys (encrypted with ARGOS_MASTER_KEY).
  • metadata.json — argos version, git commit, schema version, kind, timestamp UTC, count of caddy files included.
  • caddy/ (optional, best-effort) — a copy of caddy_data as visible to the argos container. Some files may be owned by root and skipped; metadata.json records the actual count.
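A tarball with that layout can be inspected before any restore using plain tar. The sketch below builds a mock archive with the same member names (placeholder contents, purely for illustration) and shows the two commands you would run against a real argos-backup-<ts>.tar.gz:

```shell
# Build a mock backup tarball matching the layout described above
# (placeholder contents -- illustration only).
tmp=$(mktemp -d)
mkdir -p "$tmp/stage/caddy"
printf 'stub' > "$tmp/stage/argos.db"
printf '{"argos_version":"v1.1.0","kind":"manual"}' > "$tmp/stage/metadata.json"
tar -C "$tmp/stage" -czf "$tmp/argos-backup-mock.tar.gz" .

# List members without extracting (what you'd run on a real tarball):
members=$(tar -tzf "$tmp/argos-backup-mock.tar.gz")
echo "$members"

# Read metadata.json straight out of the archive to check the version
# before deciding which git tag to restore on:
tar -xOzf "$tmp/argos-backup-mock.tar.gz" ./metadata.json
rm -rf "$tmp"
```

On a real archive you would swap in the actual filename; the member listing should show argos.db, metadata.json, and (if captured) a caddy/ directory.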

What is NOT in the tarball

  • crowdsec_data — out of scope. CrowdSec's own state lives in its own volume.
  • caddy_logs raw files — rotating log files; rows already ingested into log_entries (in argos.db) are captured, the files themselves are not.
  • caddy_manual_certs plaintext files — NOT backed up directly. The encrypted keys are inside argos.db; the plaintext .crt / .key files get regenerated by the boot reconciler. See Manual certs → Disaster recovery.
  • .env — by design. Secrets live outside the panel.

Off-site replication

argos_backups is its own volume so you can point a rclone / borg / restic sidecar at just that directory. Typical compose pattern:

services:
  rclone-sync:
    image: rclone/rclone:latest
    restart: unless-stopped
    volumes:
      - argos_backups:/src:ro
      - ./rclone.conf:/config/rclone/rclone.conf:ro
    command: ["rclone", "sync", "/src", "remote:argos-backups",
              "--log-level", "INFO"]

Run this as a second stack; it doesn't need access to anything else.

ARGOS_MASTER_KEY is part of your backup

Back this up out of band

ARGOS_MASTER_KEY (from .env) encrypts every secret the panel persists: manual cert private keys, OIDC client secrets, SMTP passwords, Telegram bot tokens, VAPID private keys. If you restore argos.db onto fresh infrastructure but cannot produce the original ARGOS_MASTER_KEY, every encrypted value is unrecoverable. The panel still boots; the encrypted values just stay unreadable until you rotate them (re-upload certs, re-save OIDC credentials, etc.).

Store .env (or at minimum ARGOS_MASTER_KEY and ARGOS_SESSION_SECRET) in a password manager, secrets vault, or encrypted cold storage alongside the backup tarballs.
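One low-ceremony way to do that is to peel just the critical lines out of .env before filing them in your vault. A sketch, assuming the standard KEY=value dotenv format (the scratch .env below is a stand-in; point env_file at your real one):

```shell
# Demonstration on a scratch .env -- replace env_file with the real path.
tmp=$(mktemp -d)
cat > "$tmp/.env" <<'EOF'
ARGOS_MASTER_KEY=example-master-key
ARGOS_SESSION_SECRET=example-session-secret
SOME_OTHER_SETTING=value
EOF
env_file="$tmp/.env"

# Keep only the two values that are unrecoverable if lost:
grep -E '^(ARGOS_MASTER_KEY|ARGOS_SESSION_SECRET)=' "$env_file" \
  > "$tmp/argos-secrets.txt"
chmod 600 "$tmp/argos-secrets.txt"
secrets=$(cat "$tmp/argos-secrets.txt")
echo "$secrets"
rm -rf "$tmp"
```

Move the resulting file into the password manager / vault, then delete the plaintext copy.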

Disaster recovery checklist

The fresh-infra / bare-metal-rebuild scenario. You have:

  • a .env file with the original ARGOS_MASTER_KEY and ARGOS_SESSION_SECRET,
  • a recent argos-backup-*.tar.gz,
  • a new host with Docker Engine.

Steps

  1. Clone the repo + check out the version that produced the backup. The tarball's metadata.json records the argos version; match it to avoid forward/backward migration surprises.

    git clone https://github.com/cmos486/argos-edge.git
    cd argos-edge
    git checkout v1.1.0    # whichever tag the backup was taken on
    
  2. Put .env in place. Copy the saved .env into the repo root. Do NOT regenerate ARGOS_MASTER_KEY — that breaks everything encrypted.

  3. Stage the tarball. First-boot flow needs the backup to be already present on the argos_data volume:

    docker compose up --no-start argos
    docker compose cp ./argos-backup-<ts>.tar.gz argos:/data/backups/
    
  4. Schedule the restore.

    docker compose run --rm argos \
      argos restore --file /data/backups/argos-backup-<ts>.tar.gz --yes
    

    The command writes /data/.restore_pending and exits 0.

  5. Start the stack.

    docker compose up -d
    

    Boot order:

      • The panel sees .restore_pending, extracts the tarball over /data/, and replaces argos.db.
      • Migrations run (idempotent; a matching version means there is nothing to apply).
      • The manual-cert reconciler walks host_manual_certs and materialises missing .crt / .key files to caddy_manual_certs.
      • The argos→caddy reconciler pushes the config. Caddy starts serving manual certs immediately; ACME-mode hosts trigger re-issuance on the next request if caddy_data was also wiped.

  6. Verify.

    # panel alive
    docker compose exec argos wget -qO- http://localhost:8080/healthz
    
    # manual cert files rematerialised
    docker compose exec argos ls -la /data/manual-certs
    
    # reconciler log
    docker compose logs argos --since=5m | grep "manual cert reconcile"
    

    Log in to the panel with the .env admin password. Walk Certificates → Imported; every row from the backup should be present.

If you do NOT have ARGOS_MASTER_KEY

You can still recover the non-encrypted state: hosts, target groups, rules, users (passwords are bcrypt-hashed, independent of the master key), sessions, audit log, notification rules (without channel secrets), appsec state, backup metadata.

You will lose the decryption ability for:

  • Every operator-uploaded manual cert key — must be re-uploaded.
  • OIDC client secrets — must be re-entered.
  • SMTP passwords, Telegram tokens, webhook auth headers, VAPID private keys — must be re-entered.

The panel boots fine. Affected features show clean "not configured" states; nothing crashes.

Volume lifecycle operations

Inspect a volume

docker volume inspect argos_panel_data
# .Mountpoint is the host path (typically /var/lib/docker/volumes/...)

Size a volume

VOL_PATH=$(docker volume inspect argos_panel_data -f '{{.Mountpoint}}')
sudo du -sh "$VOL_PATH"

Move a volume to another host

Two paths; pick based on whether the stack is down.

Stack stopped, tar the host path:

docker compose down
VOL_PATH=$(docker volume inspect argos_panel_data -f '{{.Mountpoint}}')
sudo tar -C "$(dirname "$VOL_PATH")" -czf /tmp/argos_panel_data.tgz "$(basename "$VOL_PATH")"
# On the new host, create the volume empty, untar onto it.

Stack running, use a helper container (no downtime, but note that a live copy of argos.db can be inconsistent mid-write — for the DB itself, prefer the panel's VACUUM INTO backups):

docker run --rm -v argos_panel_data:/src -v /tmp:/dst alpine \
  tar -C /src -czf /dst/argos_panel_data.tgz .

Either produces a tar.gz you can scp to the new host and untar onto a fresh volume of the same name.

Reset a single volume

Rarely needed, but occasionally: caddy_config (regenerable) or caddy_manual_certs after a botched experiment (reconciler rebuilds from DB).

docker compose stop caddy argos
docker volume rm argos_caddy_config
docker compose start caddy argos

Do NOT do this to argos_data or argos_backups — that is equivalent to docker compose down -v on those specifically. Always have a fresh backup first.

Integrity verification

Each backup tarball's checksum is stored in a sha256 column in the backups table, and the panel verifies it automatically. For the other volumes, the operator owns verification.

Baseline + verify pattern

Generate a baseline on a known-good state:

VOL_PATH=$(docker volume inspect argos_panel_data -f '{{.Mountpoint}}')
sudo find "$VOL_PATH" -type f ! -name '*.journal' ! -name '*-wal' ! -name '*-shm' \
  -exec sha256sum {} \; | sort > /var/log/argos-data.sha256

(Excluding -wal / -shm is important — SQLite write-ahead log files legitimately change between checks.)

Verify later:

cd / && sudo sha256sum -c /var/log/argos-data.sha256 | grep -v OK || true

The lines NOT ending in OK are files that changed. For the argos volume this is mostly normal churn (backup files appear, log rows flush); the pattern is most useful on low-churn volumes like caddy_manual_certs, where every change should be operator-initiated.
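The whole baseline-then-verify cycle can be exercised end to end on a scratch directory standing in for a volume mountpoint. A self-contained sketch (file names are invented for the demo):

```shell
# Scratch directory plays the role of the volume mountpoint.
vol=$(mktemp -d)
base=$(mktemp)
echo "cert-a" > "$vol/a.crt"
echo "key-a"  > "$vol/a.key"

# Baseline on the known-good state (relative paths, so the baseline
# is portable across hosts):
(cd "$vol" && find . -type f -exec sha256sum {} \; | sort) > "$base"

# Simulate tampering with one file:
echo "tampered" > "$vol/a.key"

# Verify: the non-OK lines are exactly the changed files.
drift=$( (cd "$vol" && sha256sum -c "$base" 2>/dev/null) | grep -v 'OK$' || true)
echo "$drift"
rm -rf "$vol" "$base"
```

Only the tampered file shows up in the drift output; the untouched one stays silent.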

Cron one-liner for caddy_manual_certs

#!/bin/sh
# /etc/cron.daily/argos-manual-cert-integrity
VOL=$(docker volume inspect argos_caddy_manual_certs -f '{{.Mountpoint}}')
BASELINE=/var/log/argos-manual-certs.sha256
CURRENT=$(mktemp)
find "$VOL" -type f -exec sha256sum {} \; | sort > "$CURRENT"
if [ -f "$BASELINE" ] && ! diff -q "$BASELINE" "$CURRENT" >/dev/null; then
    logger -t argos "manual cert volume drift detected"
    diff "$BASELINE" "$CURRENT" | logger -t argos
fi
cp "$CURRENT" "$BASELINE"
rm "$CURRENT"

Drift here = someone (or something) modified a file argos did not write. Investigate via the audit log; the reconciler is the only legitimate writer other than the explicit upload / delete handlers.

ZFS / Btrfs snapshots

If the host filesystem is ZFS or Btrfs, snapshotting /var/lib/docker/volumes/argos_panel_* is a cheap alternative to tarballing. Scrubs catch silent bit-rot at the filesystem layer. Not a substitute for argos backups (the panel-aware schema matters for restore) but a strong complement.
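As a sketch, a daily snapshot from cron could look like the fragment below. The tank/docker dataset name is hypothetical (adjust to the pool layout that actually holds /var/lib/docker), and % must be escaped in crontab fields:

```
# /etc/cron.d/argos-zfs-snapshot -- hypothetical dataset name
0 3 * * * root zfs snapshot tank/docker@argos-$(date +\%Y-\%m-\%d)
```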

Production deployments with bind mounts

The shipped compose uses Docker named volumes. For production setups where you want host-level backup tooling (restic, borg, duplicity, Proxmox-backed ZFS snapshots, BackupPC, Bacula, etc.) to operate directly on filesystem paths, you can replace any of the named volumes with bind mounts.

When this is useful:

  • You already run a filesystem-level backup tool and want it to see the panel data without Docker abstractions in the way.
  • You take ZFS / Btrfs snapshots and want them to cover argos data alongside the rest of the host.
  • You run syncthing / rsync replication to a standby host.
  • You want to ls the files from the host without docker volume inspect gymnastics.

Bind-mount override example

Set up the target directory FIRST, with permissions that match the container uid (nobody = 65534 inside the argos image):

sudo mkdir -p /srv/argos-edge/{data,backups,caddy-data,caddy-config,caddy-logs,manual-certs,crowdsec-data,crowdsec-config}
sudo chown -R 65534:65534 /srv/argos-edge/data /srv/argos-edge/backups /srv/argos-edge/manual-certs
# Caddy runs as root in the upstream image; its volumes stay
# root-owned. CrowdSec uses GID=1000 per its env -- match it:
sudo chown -R 1000:1000 /srv/argos-edge/crowdsec-data /srv/argos-edge/crowdsec-config

Then drop this alongside docker-compose.yml. Compose merges it automatically (docker-compose.override.yml):

# docker-compose.override.yml -- bind-mount production layout.
# Each named volume is redefined as a local driver with
# type=none / o=bind pointing at a host path.

volumes:
  argos_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/data

  argos_backups:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/backups

  caddy_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/caddy-data

  caddy_config:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/caddy-config

  caddy_logs:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/caddy-logs

  caddy_manual_certs:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/manual-certs

  crowdsec_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/crowdsec-data

  crowdsec_config:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/crowdsec-config

The volume NAMES (both the compose-level argos_data and the host-level name: argos_panel_data) stay identical. Only the underlying storage moves from Docker's managed volume tree to your host path.

Critical gotchas

Bind-mount BEFORE the first compose up

Switching from named volumes to bind mounts on a running stack is a two-step operation: stop the stack, move the data from the Docker volume path to the bind path, then bring the stack back up with the override in place. Starting with bind mounts out of the gate is far simpler. If you already have production data in named volumes, export → move → re-import:

docker compose down
# Copy each volume's content to the new bind path.
for v in argos_panel_data argos_panel_backups argos_caddy_data \
         argos_caddy_config argos_caddy_logs argos_caddy_manual_certs \
         argos_crowdsec_data argos_crowdsec_config; do
    src=$(docker volume inspect "$v" -f '{{.Mountpoint}}')
    dst=/srv/argos-edge/$(echo "$v" | sed 's/argos_panel_//; s/argos_caddy_manual_certs/manual-certs/; s/argos_caddy_/caddy-/; s/argos_crowdsec_/crowdsec-/; s/_/-/g')
    sudo rsync -a "$src/" "$dst/"
done
# Drop in the override, then:
docker compose up -d

Verify everything is intact (log in, list hosts, force a backup) before removing the old Docker volumes with docker volume rm <name>.
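The volume-name to bind-path mapping is the error-prone part of that loop, and it can be sanity-checked in pure shell before any rsync runs, with no docker involved. The sed below is one way to express the intended mapping (an assumption of this sketch, matching the /srv/argos-edge layout used earlier):

```shell
# Intended host-level volume name -> /srv/argos-edge subdirectory.
map_volume() {
  echo "$1" | sed 's/argos_panel_//; s/argos_caddy_manual_certs/manual-certs/; s/argos_caddy_/caddy-/; s/argos_crowdsec_/crowdsec-/; s/_/-/g'
}

for v in argos_panel_data argos_panel_backups argos_caddy_data \
         argos_caddy_config argos_caddy_logs argos_caddy_manual_certs \
         argos_crowdsec_data argos_crowdsec_config; do
  printf '%-28s -> /srv/argos-edge/%s\n' "$v" "$(map_volume "$v")"
done
```

Eyeball the eight printed pairs against the directories you created before trusting the migration loop with real data.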

  • Backup tool permissions — the backup tool reads files owned by nobody (uid 65534). On most distros, running the backup as root handles this trivially; an unprivileged user needs either group membership, ACLs, or sudo.

  • Restore via filesystem-level tools needs chown -R — if your backup tool does not preserve uids (some cloud-bucket syncers flatten ownership), you must re-chown after restore or the containers will see permission-denied on their own data files. chown -R 65534:65534 /srv/argos-edge/{data,backups,manual-certs} fixes it.

  • SELinux (RHEL, Fedora, Rocky, AlmaLinux) — append :Z to the volume mount in the compose file, or the containers will hit Permission denied even with correct uid. Docker sets the container-specific SELinux label:

    volumes:
      - argos_data:/data:Z
    
  • AppArmor / unconfined host — usually no-op. Only relevant if you have a custom profile restricting Docker.

  • Filesystem choice — any POSIX filesystem works. SQLite (the argos DB) does NOT work reliably on NFS unless you have correctly-configured locking; keep argos_data on local storage. argos_backups on NFS is fine (just files).
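Before committing argos_data to a given path, it is worth checking what filesystem actually backs it. A quick sketch using the GNU coreutils form of stat (the path is the hypothetical bind location from the earlier example; BSD stat has a different syntax):

```shell
# Print the filesystem type backing a path (GNU coreutils `stat -f`).
# An NFS mount shows up as "nfs"; keep argos_data off network filesystems.
path=/srv/argos-edge/data   # hypothetical bind path; falls back to / below
fs_type=$(stat -f -c %T "$path" 2>/dev/null || stat -f -c %T /)
echo "$fs_type"
```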

Mix-and-match

Nothing forces you to bind-mount all eight. A common pattern:

  • Bind-mount argos_backups to a filesystem that your existing backup tool already watches.
  • Keep named volumes for the rest; let Docker manage them.

The override is per-volume — leave the ones you don't need out of the override YAML and they stay on Docker-managed storage.
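For that common pattern, the whole override can be this small (the host path is an example, assumed to be somewhere your backup tool already watches):

```yaml
# docker-compose.override.yml -- bind-mount ONLY the backups volume;
# everything else stays on Docker-managed storage.
volumes:
  argos_backups:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/backups/argos   # example path your backup tool watches
```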