Persistence¶
Every operational question about "what survives a restart / an upgrade / a DR event" lives here. One page for the whole storage story so you don't stitch it together from five others.
If you just want the "what volumes exist" TL;DR, the same table is in Installation → Volumes. This page adds the DR mechanics, the integrity checks, and the bind-mount alternative.
Volume matrix¶
Eight named Docker volumes cover the stack. All persist across `docker compose down` + `up`; all are destroyed by `docker compose down -v`.
| Volume (compose name / host name) | Container mount | Contents | In argos backup? | If you lose it, how to recover |
|---|---|---|---|---|
| `argos_data` / `argos_panel_data` | `/data` in argos | `argos.db`, `/data/geoip/` | Yes (the DB is the tar.gz's main payload) | Restore a backup tarball. Without a backup: everything is gone (hosts, users, audit log, notifications, manual cert metadata, settings). |
| `argos_backups` / `argos_panel_backups` | `/data/backups` in argos | Local tar.gz backups produced by the scheduler / manual triggers | No (this IS the destination) | Off-site replica if you have one (rclone / borg). If not, local backup history is unrecoverable but the panel keeps running. |
| `caddy_data` / `argos_caddy_data` | `/data` in caddy (read-only into argos for backup capture) | ACME account keys, issued certs, certmagic state | Yes (best-effort; see Backups) | Caddy re-issues every `tls_mode=auto` cert on the next request. New ACME account auto-created. Brief TLS failure window during re-issue. |
| `caddy_config` / `argos_caddy_config` | `/config` in caddy | Caddy's runtime config cache | No (regenerable) | No-op: argos re-pushes the JSON config on next boot via the admin API. |
| `caddy_logs` / `argos_caddy_logs` | `/var/log/caddy` in caddy (read-only into argos + crowdsec) | `access.log` + `errors.log` (rotated at 100 MB × 5 × 7 d) | No (rotates independently) | Rows already ingested into `log_entries` (inside `argos.db`) stay. Anything not yet ingested is lost. |
| `caddy_manual_certs` / `argos_caddy_manual_certs` | `/data/manual-certs` (argos RW), `/etc/caddy/manual-certs` (caddy RO) | Plaintext `.crt` + `.key` per manual-mode host | No (the encrypted key lives in `host_manual_certs`) | Boot reconciler rematerialises files from `argos.db` on next startup, provided `ARGOS_MASTER_KEY` is unchanged. See Manual certs → Disaster recovery. |
| `crowdsec_data` / `argos_crowdsec_data` | `/var/lib/crowdsec/data` in crowdsec | LAPI decision DB, machine credentials, bouncer API keys, local scenarios | No | Re-enroll the machine, re-download the community feed. Regenerate the bouncer API key with `cscli bouncers add` and update the panel's `CROWDSEC_BOUNCER_API_KEY`. Operator-added manual bans are lost. |
| `crowdsec_config` / `argos_crowdsec_config` | `/etc/crowdsec` in crowdsec | Installed collections, parsers, AppSec config | No | Re-run the boot setup script (`crowdsec/setup-appsec.sh`) to reinstall the collections from the bundled sources. |
Bind-mount repo files (not volumes, but present on disk)¶
- `./Caddyfile` → `/etc/caddy/Caddyfile` (RO) — bootstrap config that hands control to argos on first boot. Tiny fixed file, lives in git.
- `./crowdsec/*` → `/etc/crowdsec/*` + `/setup/*` (RO) — acquis definitions, AppSec sources, setup scripts. Lives in git.
These come back on any fresh git clone; they do not need separate backup.
Backup scope¶
What argos-edge natively backs up¶
The scheduled + manual backup feature (see Backups) produces tar.gz files under /data/backups/ containing:
- `argos.db` — `VACUUM INTO` snapshot of the live DB. Fully consistent. Includes `host_manual_certs` rows with AES-GCM-encrypted keys (encrypted with `ARGOS_MASTER_KEY`).
- `metadata.json` — argos version, git commit, schema version, kind, timestamp UTC, count of caddy files included.
- `caddy/` (optional, best-effort) — a copy of `caddy_data` as visible to the argos container. Some files may be owned by root and skipped; `metadata.json` records the actual count.
What is NOT in the tarball¶
- `crowdsec_data` — out of scope. CrowdSec's own state lives in its own volume.
- `caddy_logs` raw files — rotating log files; rows already ingested into `log_entries` (in `argos.db`) are captured, the files themselves are not.
- `caddy_manual_certs` plaintext files — NOT backed up directly. The encrypted keys are inside `argos.db`; the plaintext `.crt`/`.key` files get regenerated by the boot reconciler. See Manual certs → Disaster recovery.
- `.env` — by design. Secrets live outside the panel.
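Before trusting an off-site copy, the tarball layout described above can be sanity-checked with a few lines of Python. This is our own helper, not a panel API; the metadata key names in the demo are illustrative:

```python
import io
import json
import tarfile

def check_backup(path: str) -> dict:
    """Sanity-check an argos backup tarball: argos.db and metadata.json
    must be present; returns the parsed metadata."""
    with tarfile.open(path, "r:gz") as tar:
        names = tar.getnames()
        if "argos.db" not in names:
            raise ValueError("tarball is missing argos.db")
        if "metadata.json" not in names:
            raise ValueError("tarball is missing metadata.json")
        return json.load(tar.extractfile("metadata.json"))

# Demo with a synthetic tarball (key names here are made up for the demo):
with tarfile.open("/tmp/fake-backup.tar.gz", "w:gz") as tar:
    for name, payload in [
        ("argos.db", b"sqlite-bytes"),
        ("metadata.json", json.dumps({"version": "1.2.3"}).encode()),
    ]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

print(check_backup("/tmp/fake-backup.tar.gz")["version"])  # -> 1.2.3
```

Running this against a real tarball before deleting local copies is cheap insurance that the replication step did not truncate the archive.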
Off-site replication¶
argos_backups is its own volume so you can point a rclone / borg / restic sidecar at just that directory. Typical compose pattern:
```yaml
services:
  rclone-sync:
    image: rclone/rclone:latest
    restart: unless-stopped
    volumes:
      - argos_backups:/src:ro
      - ./rclone.conf:/config/rclone/rclone.conf:ro
    command: ["rclone", "sync", "/src", "remote:argos-backups",
              "--log-level", "INFO"]

volumes:
  argos_backups:
    external: true
    name: argos_panel_backups   # host-level volume name from the matrix above
```
Run this as a second stack; it doesn't need access to anything else.
ARGOS_MASTER_KEY is part of your backup¶
Back this up out of band
ARGOS_MASTER_KEY (from .env) encrypts every secret the panel persists: manual cert private keys, OIDC client secrets, SMTP passwords, Telegram bot tokens, VAPID private keys. If you restore argos.db onto fresh infrastructure but cannot produce the original ARGOS_MASTER_KEY, every encrypted value is unrecoverable. The panel still boots; the encrypted values just stay unreadable until you rotate them (re-upload certs, re-save OIDC credentials, etc.).
Store .env (or at minimum ARGOS_MASTER_KEY and ARGOS_SESSION_SECRET) in a password manager, secrets vault, or encrypted cold storage alongside the backup tarballs.
Disaster recovery checklist¶
The fresh-infra / bare-metal-rebuild scenario. You have:

- a `.env` file with the original `ARGOS_MASTER_KEY` and `ARGOS_SESSION_SECRET`,
- a recent `argos-backup-*.tar.gz`,
- a new host with Docker Engine.
Steps¶
1. **Clone the repo + check out the version that produced the backup.** The tarball's `metadata.json` records the argos version; match it to avoid migration forward/backward surprises.

2. **Put `.env` in place.** Copy the saved `.env` into the repo root. Do NOT regenerate `ARGOS_MASTER_KEY` — that breaks everything encrypted.

3. **Stage the tarball.** The first-boot flow needs the backup to already be present on the `argos_data` volume.

4. **Schedule the restore.** The command writes `/data/.restore_pending` and exits 0.

5. **Start the stack.** Boot order:

    - The panel sees `.restore_pending`, extracts the tarball over `/data/`, replaces `argos.db`.
    - Migrations run (idempotent; matching version means nothing to apply).
    - The manual-cert reconciler walks `host_manual_certs` and materialises missing `.crt`/`.key` files to `caddy_manual_certs`.
    - The argos→caddy reconciler pushes the config. Caddy starts serving manual certs immediately; ACME-mode hosts trigger re-issuance on next request if `caddy_data` was also wiped.

6. **Verify.**

    ```sh
    # panel alive
    docker compose exec argos wget -qO- http://localhost:8080/healthz
    # manual cert files rematerialised
    docker compose exec argos ls -la /data/manual-certs
    # reconciler log
    docker compose logs argos --since=5m | grep "manual cert reconcile"
    ```

    Log in to the panel with the `.env` admin password. Walk Certificates → Imported; every row from the backup should be present.
If you do NOT have ARGOS_MASTER_KEY¶
You can still recover the non-encrypted state: hosts, target groups, rules, users (passwords are bcrypt-hashed, independent of the master key), sessions, audit log, notification rules (without channel secrets), appsec state, backup metadata.
You will lose the decryption ability for:
- Every operator-uploaded manual cert key — must be re-uploaded.
- OIDC client secrets — must be re-entered.
- SMTP passwords, Telegram tokens, webhook auth headers, VAPID private keys — must be re-entered.
The panel boots fine. Affected features show clean "not configured" states; nothing crashes.
Volume lifecycle operations¶
Inspect a volume¶
```sh
docker volume inspect argos_panel_data
# .Mountpoint is the host path (typically /var/lib/docker/volumes/...)
```
Size a volume¶
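Two standard ways to size a volume, both plain Docker rather than anything argos-specific:

```sh
# All volumes at once, with sizes:
docker system df -v | grep argos_

# One volume, exact on-disk size:
sudo du -sh "$(docker volume inspect argos_panel_data -f '{{.Mountpoint}}')"
```

`docker system df -v` is approximate and may lag; `du` against the mountpoint is authoritative.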
Move a volume to another host¶
Two paths; pick based on whether the stack is down.
Stack stopped, tar the host path:
```sh
docker compose down
VOL_PATH=$(docker volume inspect argos_panel_data -f '{{.Mountpoint}}')
sudo tar -C "$(dirname "$VOL_PATH")" -czf /tmp/argos_panel_data.tgz "$(basename "$VOL_PATH")"
# On the new host, create the volume empty, untar onto it.
```
Stack running, use a helper container (does not need downtime):
```sh
docker run --rm -v argos_panel_data:/src -v /tmp:/dst alpine \
    tar -C /src -czf /dst/argos_panel_data.tgz .
```
Either produces a tar.gz you can scp to the new host and untar onto a fresh volume of the same name.
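The receiving side ("create the volume empty, untar onto it") can use the same helper-container trick. A sketch, reusing the file name from the example above:

```sh
# On the new host: create the (empty) named volume, then unpack into it.
docker volume create argos_panel_data
docker run --rm -v argos_panel_data:/dst -v /tmp:/src:ro alpine \
    tar -C /dst -xzf /src/argos_panel_data.tgz
```

Note the tarball made with the stack-stopped method contains the `_data` directory wrapper and needs `--strip-components=1` on extraction; the helper-container tarball extracts as-is.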
Reset a single volume¶
Rarely needed, but occasionally: caddy_config (regenerable) or caddy_manual_certs after a botched experiment (reconciler rebuilds from DB).
```sh
docker compose stop caddy argos
docker volume rm argos_caddy_config
docker compose start caddy argos
```
Do NOT do this to argos_data or argos_backups — that is equivalent to docker compose down -v on those specifically. Always have a fresh backup first.
Integrity verification¶
Each backup tarball has its sha256 recorded in a `sha256` column of the `backups` table, which the panel checks automatically. For the other volumes, the operator owns verification.
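The panel's check can be reproduced off-box, for example to validate a replicated tarball against the sha256 recorded in the backups table. A minimal sketch; the chunked-read helper is ours:

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through sha256 without loading it into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel reads until EOF in 1 MiB chunks.
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

# Compare against the value stored in the panel's backups table:
# sha256_file("/path/to/argos-backup-....tar.gz") == <stored sha256>
```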
Baseline + verify pattern¶
Generate a baseline on a known-good state:
```sh
VOL_PATH=$(docker volume inspect argos_panel_data -f '{{.Mountpoint}}')
sudo find "$VOL_PATH" -type f ! -name '*.journal' ! -name '*-wal' ! -name '*-shm' \
    -exec sha256sum {} \; | sort > /var/log/argos-data.sha256
```
(Excluding -wal / -shm is important — SQLite write-ahead log files legitimately change between checks.)
Verify later:
The lines NOT ending in OK are files that changed. For the argos volume this is mostly normal churn (backup files appear, log rows flush); the useful application is on immutable volumes like caddy_manual_certs where every change should be operator-initiated.
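Assuming the baseline was produced by the `sha256sum | sort` pipeline above (absolute paths, so the check runs from any directory), the verify step is the stock `--check` mode:

```sh
# Re-hash every file in the baseline and print only the ones that changed.
sudo sha256sum --check /var/log/argos-data.sha256 2>/dev/null \
    | grep -v ': OK$'
```

`grep -v` exits non-zero when every line is OK, which here is the healthy outcome; account for that if the check runs under `set -e`.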
Cron one-liner for caddy_manual_certs¶
```sh
#!/bin/sh
# /etc/cron.daily/argos-manual-cert-integrity
VOL=$(docker volume inspect argos_caddy_manual_certs -f '{{.Mountpoint}}')
BASELINE=/var/log/argos-manual-certs.sha256
CURRENT=$(mktemp)
find "$VOL" -type f -exec sha256sum {} \; | sort > "$CURRENT"
if [ -f "$BASELINE" ] && ! diff -q "$BASELINE" "$CURRENT" >/dev/null; then
    logger -t argos "manual cert volume drift detected"
    diff "$BASELINE" "$CURRENT" | logger -t argos
fi
cp "$CURRENT" "$BASELINE"
rm "$CURRENT"
```
Drift here = someone (or something) modified a file argos did not write. Investigate via the audit log; the reconciler is the only legitimate writer other than the explicit upload / delete handlers.
ZFS / Btrfs snapshots¶
If the host filesystem is ZFS or Btrfs, snapshotting /var/lib/docker/volumes/argos_panel_* is a cheap alternative to tarballing. Scrubs catch silent bit-rot at the filesystem layer. Not a substitute for argos backups (the panel-aware schema matters for restore) but a strong complement.
Production deployments with bind mounts¶
The shipped compose uses Docker named volumes. For production setups where you want host-level backup tooling (restic, borg, duplicity, Proxmox-backed ZFS snapshots, BackupPC, Bacula, etc.) to operate directly on filesystem paths, you can replace any of the named volumes with bind mounts.
When this is useful:
- You already run a filesystem-level backup tool and want it to see the panel data without Docker abstractions in the way.
- You take ZFS / Btrfs snapshots and want them to cover argos data alongside the rest of the host.
- You run syncthing / rsync replication to a standby host.
- You want to
lsthe files from the host withoutdocker volume inspectgymnastics.
Bind-mount override example¶
Set up the target directory FIRST, with permissions that match the container uid (nobody = 65534 inside the argos image):
```sh
sudo mkdir -p /srv/argos-edge/{data,backups,caddy-data,caddy-config,caddy-logs,manual-certs,crowdsec-data,crowdsec-config}
sudo chown -R 65534:65534 /srv/argos-edge/data /srv/argos-edge/backups /srv/argos-edge/manual-certs
# Caddy runs as root in the upstream image; its volumes stay
# root-owned. CrowdSec uses GID=1000 per its env -- match it:
sudo chown -R 1000:1000 /srv/argos-edge/crowdsec-data /srv/argos-edge/crowdsec-config
```
Then drop this alongside docker-compose.yml. Compose merges it automatically (docker-compose.override.yml):
```yaml
# docker-compose.override.yml -- bind-mount production layout.
# Each named volume is redefined as a local driver with
# type=none / o=bind pointing at a host path.
volumes:
  argos_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/data
  argos_backups:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/backups
  caddy_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/caddy-data
  caddy_config:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/caddy-config
  caddy_logs:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/caddy-logs
  caddy_manual_certs:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/manual-certs
  crowdsec_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/crowdsec-data
  crowdsec_config:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/crowdsec-config
```
The volume NAMES (both the compose-level `argos_data` and the host-level `name: argos_panel_data`) stay identical. Only the underlying storage moves from Docker's managed volume tree to your host path.
Critical gotchas¶
Bind-mount BEFORE the first compose up
Switching from named volumes to bind mounts on a running stack is a two-step operation: stop the stack, move the data from the Docker volume path to the bind path, then bring the stack back up with the override in place. Starting with bind mounts out of the gate is far simpler. If you already have production data in named volumes, export → move → re-import:
```sh
docker compose down
# Copy each volume's content to the new bind path.
for v in argos_panel_data argos_panel_backups argos_caddy_data \
         argos_caddy_config argos_caddy_logs argos_caddy_manual_certs \
         argos_crowdsec_data argos_crowdsec_config; do
    case "$v" in                       # explicit volume -> directory map
        argos_panel_data)         d=data ;;
        argos_panel_backups)      d=backups ;;
        argos_caddy_data)         d=caddy-data ;;
        argos_caddy_config)       d=caddy-config ;;
        argos_caddy_logs)         d=caddy-logs ;;
        argos_caddy_manual_certs) d=manual-certs ;;
        argos_crowdsec_data)      d=crowdsec-data ;;
        argos_crowdsec_config)    d=crowdsec-config ;;
    esac
    src=$(docker volume inspect "$v" -f '{{.Mountpoint}}')
    sudo rsync -a "$src/" "/srv/argos-edge/$d/"
done
# Drop in the override, then:
docker compose up -d
```
Verify everything is intact (log in, list hosts, force a backup) before removing the old Docker volumes with docker volume rm <name>.
- **Backup tool permissions** — the backup tool reads files owned by `nobody` (uid 65534). On most distros, running the backup as root handles this trivially; an unprivileged user needs group membership, ACLs, or sudo.
- **Restore via filesystem-level tools needs `chown -R`** — if your backup tool does not preserve uids (some cloud-bucket syncers flatten ownership), you must re-chown after restore or the containers will see permission-denied on their own data files. `chown -R 65534:65534 /srv/argos-edge/{data,backups,manual-certs}` fixes it.
- **SELinux (RHEL, Fedora, Rocky, AlmaLinux)** — append `:Z` to the volume mount in the compose file, or the containers will hit `Permission denied` even with the correct uid; Docker then sets the container-specific SELinux label.
- **AppArmor / unconfined host** — usually a no-op. Only relevant if you have a custom profile restricting Docker.
- **Filesystem choice** — any POSIX filesystem works. SQLite (the argos DB) does NOT work reliably on NFS unless you have correctly-configured locking; keep `argos_data` on local storage. `argos_backups` on NFS is fine (just files).
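On the `:Z` point: the flag belongs on a service-level bind mount rather than on the named-volume `driver_opts` form shown earlier, so SELinux hosts may prefer mounting the path directly in the service. A sketch, with the service name `argos` assumed:

```yaml
services:
  argos:
    volumes:
      - /srv/argos-edge/data:/data:Z   # :Z = private, container-specific label
```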
Mix-and-match¶
Nothing forces you to bind-mount all eight. A common pattern:

- Bind-mount `argos_backups` to a filesystem that your existing backup tool already watches.
- Keep named volumes for the rest; let Docker manage them.
The override is per-volume — leave the ones you don't need out of the override YAML and they stay on Docker-managed storage.
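A minimal override matching that pattern, moving only `argos_backups` to a host path (the path itself is an example):

```yaml
# docker-compose.override.yml -- bind-mount only the backups volume.
volumes:
  argos_backups:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/backups/argos   # any path your backup tool already watches
```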
Related¶
- Backups — scheduler config, archive structure, retention.
- Restore from backup — step-by-step restore procedure.
- Manual certificates → Disaster recovery — the specific mechanism that makes `caddy_manual_certs` safe to lose.
- Running multiple instances — why volume names are hardcoded and how to rescope for a second stack.
- Upgrading — which volumes survive which commands.
- Installation → Volumes — same matrix, shorter form.