
Persistence

Every operational question about "what survives a restart / an upgrade / a DR event" lives here. One page for the whole storage story so you don't stitch it together from five others.

If you just want the "what volumes exist" TL;DR, the same table is in Installation → Volumes. This page adds the DR mechanics, the integrity checks, and the bind-mount alternative.

Volume matrix

Eight named Docker volumes cover the stack. All persist across docker compose down + up; all are destroyed by docker compose down -v.

| Volume (compose name / host name) | Container mount | Contents | In argos backup? | If you lose it, how to recover |
|---|---|---|---|---|
| argos_data / argos_panel_data | /data in argos | argos.db, /data/geoip/ | Yes (the DB is the tar.gz's main payload) | Restore a backup tarball. Without a backup: everything is gone (hosts, users, audit log, notifications, manual cert metadata, settings). |
| argos_backups / argos_panel_backups | /data/backups in argos | Local tar.gz backups produced by the scheduler / manual triggers | No (this IS the destination) | Off-site replica if you have one (rclone / borg). If not, local backup history is unrecoverable but the panel keeps running. |
| caddy_data / argos_caddy_data | /data in caddy (read-only into argos for backup capture) | ACME account keys, issued certs, certmagic state | Yes (best-effort; see Backups) | Caddy re-issues every tls_mode=auto cert on the next request. New ACME account auto-created. Brief TLS failure window during re-issue. |
| caddy_config / argos_caddy_config | /config in caddy | Caddy's runtime config cache | No (regenerable) | No-op: argos re-pushes the JSON config on next boot via the admin API. |
| caddy_logs / argos_caddy_logs | /var/log/caddy in caddy (read-only into argos + crowdsec) | access.log + errors.log (rotated at 100 MB × 5 × 7 d) | No (rotates independently) | Rows already ingested into log_entries (inside argos.db) stay. Anything not yet ingested is lost. |
| caddy_manual_certs / argos_caddy_manual_certs | /data/manual-certs (argos RW), /etc/caddy/manual-certs (caddy RO) | Plaintext .crt + .key per manual-mode host | No (the encrypted key lives in host_manual_certs) | Boot reconciler rematerialises files from argos.db on next startup, provided ARGOS_MASTER_KEY is unchanged. See Manual certs → Disaster recovery. |
| crowdsec_data / argos_crowdsec_data | /var/lib/crowdsec/data in crowdsec | LAPI decision DB, machine credentials, bouncer API keys, local scenarios | No | Re-enroll the machine, re-download the community feed. Regenerate the bouncer API key with cscli bouncers add and update the panel's CROWDSEC_BOUNCER_API_KEY. Operator-added manual bans are lost. |
| crowdsec_config / argos_crowdsec_config | /etc/crowdsec in crowdsec | Installed collections, parsers, AppSec config | No | Re-run the boot setup script (crowdsec/setup-appsec.sh) to reinstall the collections from the bundled sources. |

Bind-mount repo files (not volumes, but present on disk)

  • ./Caddyfile → /etc/caddy/Caddyfile (RO) — bootstrap config that hands control to argos on first boot. Tiny fixed file, lives in git.
  • ./crowdsec/* → /etc/crowdsec/* + /setup/* (RO) — acquis definitions, AppSec sources, setup scripts. Lives in git.

These come back on any fresh git clone; they do not need separate backup.

Backup scope

What argos-edge natively backs up

The scheduled + manual backup feature (see Backups) produces tar.gz files under /data/backups/ containing:

  • argos.db — a VACUUM INTO snapshot of the live DB. Fully consistent. Includes host_manual_certs rows with AES-GCM-encrypted keys (encrypted with ARGOS_MASTER_KEY).
  • metadata.json — argos version, git commit, schema version, kind, timestamp UTC, count of caddy files included.
  • caddy/ (optional, best-effort) — a copy of caddy_data as visible to the argos container. Some files may be owned by root and skipped; metadata.json records the actual count.
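A tarball with that layout can be inspected before any restore using plain tar. The sketch below builds a mock archive with the same member names (placeholder contents, purely for illustration) and shows the two commands you would run against a real argos-backup-<ts>.tar.gz:

```shell
# Build a mock backup tarball matching the layout described above
# (placeholder contents -- illustration only).
tmp=$(mktemp -d)
mkdir -p "$tmp/stage/caddy"
printf 'stub' > "$tmp/stage/argos.db"
printf '{"argos_version":"v1.1.0","kind":"manual"}' > "$tmp/stage/metadata.json"
tar -C "$tmp/stage" -czf "$tmp/argos-backup-mock.tar.gz" .

# List members without extracting (what you'd run on a real tarball):
members=$(tar -tzf "$tmp/argos-backup-mock.tar.gz")
echo "$members"

# Read metadata.json straight out of the archive to check the version
# before deciding which git tag to restore on:
tar -xOzf "$tmp/argos-backup-mock.tar.gz" ./metadata.json
rm -rf "$tmp"
```

On a real archive you would swap in the actual filename; the member listing should show argos.db, metadata.json, and (if captured) a caddy/ directory.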

What is NOT in the tarball

  • crowdsec_data — out of scope. CrowdSec's own state lives in its own volume.
  • caddy_logs raw files — rotating log files; rows already ingested into log_entries (in argos.db) are captured, the files themselves are not.
  • caddy_manual_certs plaintext files — NOT backed up directly. The encrypted keys are inside argos.db; the plaintext .crt / .key files get regenerated by the boot reconciler. See Manual certs → Disaster recovery.
  • .env — by design. Secrets live outside the panel.

Off-site replication

argos_backups is its own volume so you can point a rclone / borg / restic sidecar at just that directory. Typical compose pattern:

services:
  rclone-sync:
    image: rclone/rclone:latest
    restart: unless-stopped
    volumes:
      - argos_backups:/src:ro
      - ./rclone.conf:/config/rclone/rclone.conf:ro
    command: ["rclone", "sync", "/src", "remote:argos-backups",
              "--log-level", "INFO"]

Run this as a second stack; it doesn't need access to anything else.

ARGOS_MASTER_KEY is part of your backup

Back this up out of band

ARGOS_MASTER_KEY (from .env) encrypts every secret the panel persists: manual cert private keys, OIDC client secrets, SMTP passwords, Telegram bot tokens, VAPID private keys. If you restore argos.db onto fresh infrastructure but cannot produce the original ARGOS_MASTER_KEY, every encrypted value is unrecoverable. The panel still boots; the encrypted values just stay unreadable until you rotate them (re-upload certs, re-save OIDC credentials, etc.).

Store .env (or at minimum ARGOS_MASTER_KEY and ARGOS_SESSION_SECRET) in a password manager, secrets vault, or encrypted cold storage alongside the backup tarballs.
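One low-ceremony way to do that is to peel just the critical lines out of .env before filing them in your vault. A sketch, assuming the standard KEY=value dotenv format (the scratch .env below is a stand-in; point env_file at your real one):

```shell
# Demonstration on a scratch .env -- replace env_file with the real path.
tmp=$(mktemp -d)
cat > "$tmp/.env" <<'EOF'
ARGOS_MASTER_KEY=example-master-key
ARGOS_SESSION_SECRET=example-session-secret
SOME_OTHER_SETTING=value
EOF
env_file="$tmp/.env"

# Keep only the two values that are unrecoverable if lost:
grep -E '^(ARGOS_MASTER_KEY|ARGOS_SESSION_SECRET)=' "$env_file" \
  > "$tmp/argos-secrets.txt"
chmod 600 "$tmp/argos-secrets.txt"
secrets=$(cat "$tmp/argos-secrets.txt")
echo "$secrets"
rm -rf "$tmp"
```

Move the resulting file into the password manager / vault, then delete the plaintext copy.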

Disaster recovery checklist

The fresh-infra / bare-metal-rebuild scenario. You have:

  • a .env file with the original ARGOS_MASTER_KEY and ARGOS_SESSION_SECRET,
  • a recent argos-backup-*.tar.gz,
  • a new host with Docker Engine.

Steps

  1. Clone the repo + check out the version that produced the backup. The tarball's metadata.json records the argos version; match it to avoid forward/backward migration surprises.

    git clone https://github.com/cmos486/argos-edge.git
    cd argos-edge
    git checkout v1.1.0    # whichever tag the backup was taken on
    
  2. Put .env in place. Copy the saved .env into the repo root. Do NOT regenerate ARGOS_MASTER_KEY — that breaks everything encrypted.

  3. Stage the tarball. First-boot flow needs the backup to be already present on the argos_data volume:

    docker compose up --no-start argos
    docker compose cp ./argos-backup-<ts>.tar.gz argos:/data/backups/
    
  4. Schedule the restore.

    docker compose run --rm argos \
      argos restore --file /data/backups/argos-backup-<ts>.tar.gz --yes
    

    The command writes /data/.restore_pending and exits 0.

  5. Start the stack.

    docker compose up -d
    

    Boot order:

      • The panel sees .restore_pending, extracts the tarball over /data/, and replaces argos.db.
      • Migrations run (idempotent; a matching version means there is nothing to apply).
      • The manual-cert reconciler walks host_manual_certs and materialises missing .crt / .key files to caddy_manual_certs.
      • The argos→caddy reconciler pushes the config. Caddy starts serving manual certs immediately; ACME-mode hosts trigger re-issuance on the next request if caddy_data was also wiped.

  6. Verify.

    # panel alive
    docker compose exec argos wget -qO- http://localhost:8080/healthz
    
    # manual cert files rematerialised
    docker compose exec argos ls -la /data/manual-certs
    
    # reconciler log
    docker compose logs argos --since=5m | grep "manual cert reconcile"
    

    Log in to the panel with the .env admin password. Walk Certificates → Imported; every row from the backup should be present.

If you do NOT have ARGOS_MASTER_KEY

You can still recover the non-encrypted state: hosts, target groups, rules, users (passwords are bcrypt-hashed, independent of the master key), sessions, audit log, notification rules (without channel secrets), appsec state, backup metadata.

You will lose the decryption ability for:

  • Every operator-uploaded manual cert key — must be re-uploaded.
  • OIDC client secrets — must be re-entered.
  • SMTP passwords, Telegram tokens, webhook auth headers, VAPID private keys — must be re-entered.

The panel boots fine. Affected features show clean "not configured" states; nothing crashes.

Volume lifecycle operations

Inspect a volume

docker volume inspect argos_panel_data
# .Mountpoint is the host path (typically /var/lib/docker/volumes/...)

Size a volume

VOL_PATH=$(docker volume inspect argos_panel_data -f '{{.Mountpoint}}')
sudo du -sh "$VOL_PATH"

Move a volume to another host

Two paths; pick based on whether the stack is down.

Stack stopped, tar the host path:

docker compose down
VOL_PATH=$(docker volume inspect argos_panel_data -f '{{.Mountpoint}}')
sudo tar -C "$(dirname "$VOL_PATH")" -czf /tmp/argos_panel_data.tgz "$(basename "$VOL_PATH")"
# On the new host, create the volume empty, untar onto it.

Stack running, use a helper container (no downtime, but note that a live copy of argos.db can be inconsistent mid-write — for the DB itself, prefer the panel's VACUUM INTO backups):

docker run --rm -v argos_panel_data:/src -v /tmp:/dst alpine \
  tar -C /src -czf /dst/argos_panel_data.tgz .

Either produces a tar.gz you can scp to the new host and untar onto a fresh volume of the same name.

Reset a single volume

Rarely needed, but occasionally: caddy_config (regenerable) or caddy_manual_certs after a botched experiment (reconciler rebuilds from DB).

docker compose stop caddy argos
docker volume rm argos_caddy_config
docker compose start caddy argos

Do NOT do this to argos_data or argos_backups — that is equivalent to docker compose down -v on those specifically. Always have a fresh backup first.

Integrity verification

Each backup tarball's checksum is stored in a sha256 column in the backups table, and the panel verifies it automatically. For the other volumes, the operator owns verification.

Baseline + verify pattern

Generate a baseline on a known-good state:

VOL_PATH=$(docker volume inspect argos_panel_data -f '{{.Mountpoint}}')
sudo find "$VOL_PATH" -type f ! -name '*.journal' ! -name '*-wal' ! -name '*-shm' \
  -exec sha256sum {} \; | sort > /var/log/argos-data.sha256

(Excluding -wal / -shm is important — SQLite write-ahead log files legitimately change between checks.)

Verify later:

cd / && sudo sha256sum -c /var/log/argos-data.sha256 | grep -v OK || true

The lines NOT ending in OK are files that changed. For the argos volume this is mostly normal churn (backup files appear, log rows flush); the pattern is most useful on low-churn volumes like caddy_manual_certs, where every change should be operator-initiated.
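The whole baseline-then-verify cycle can be exercised end to end on a scratch directory standing in for a volume mountpoint. A self-contained sketch (file names are invented for the demo):

```shell
# Scratch directory plays the role of the volume mountpoint.
vol=$(mktemp -d)
base=$(mktemp)
echo "cert-a" > "$vol/a.crt"
echo "key-a"  > "$vol/a.key"

# Baseline on the known-good state (relative paths, so the baseline
# is portable across hosts):
(cd "$vol" && find . -type f -exec sha256sum {} \; | sort) > "$base"

# Simulate tampering with one file:
echo "tampered" > "$vol/a.key"

# Verify: the non-OK lines are exactly the changed files.
drift=$( (cd "$vol" && sha256sum -c "$base" 2>/dev/null) | grep -v 'OK$' || true)
echo "$drift"
rm -rf "$vol" "$base"
```

Only the tampered file shows up in the drift output; the untouched one stays silent.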

Cron one-liner for caddy_manual_certs

#!/bin/sh
# /etc/cron.daily/argos-manual-cert-integrity
VOL=$(docker volume inspect argos_caddy_manual_certs -f '{{.Mountpoint}}')
BASELINE=/var/log/argos-manual-certs.sha256
CURRENT=$(mktemp)
find "$VOL" -type f -exec sha256sum {} \; | sort > "$CURRENT"
if [ -f "$BASELINE" ] && ! diff -q "$BASELINE" "$CURRENT" >/dev/null; then
    logger -t argos "manual cert volume drift detected"
    diff "$BASELINE" "$CURRENT" | logger -t argos
fi
cp "$CURRENT" "$BASELINE"
rm "$CURRENT"

Drift here = someone (or something) modified a file argos did not write. Investigate via the audit log; the reconciler is the only legitimate writer other than the explicit upload / delete handlers.

ZFS / Btrfs snapshots

If the host filesystem is ZFS or Btrfs, snapshotting /var/lib/docker/volumes/argos_panel_* is a cheap alternative to tarballing. Scrubs catch silent bit-rot at the filesystem layer. Not a substitute for argos backups (the panel-aware schema matters for restore) but a strong complement.
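As a sketch, a daily snapshot from cron could look like the fragment below. The tank/docker dataset name is hypothetical (adjust to the pool layout that actually holds /var/lib/docker), and % must be escaped in crontab fields:

```
# /etc/cron.d/argos-zfs-snapshot -- hypothetical dataset name
0 3 * * * root zfs snapshot tank/docker@argos-$(date +\%Y-\%m-\%d)
```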

Production deployments with bind mounts

The shipped compose uses Docker named volumes. For production setups where you want host-level backup tooling (restic, borg, duplicity, Proxmox-backed ZFS snapshots, BackupPC, Bacula, etc.) to operate directly on filesystem paths, you can replace any of the named volumes with bind mounts.

When this is useful:

  • You already run a filesystem-level backup tool and want it to see the panel data without Docker abstractions in the way.
  • You take ZFS / Btrfs snapshots and want them to cover argos data alongside the rest of the host.
  • You run syncthing / rsync replication to a standby host.
  • You want to ls the files from the host without docker volume inspect gymnastics.

Bind-mount override example

Set up the target directory FIRST, with permissions that match the container uid (nobody = 65534 inside the argos image):

sudo mkdir -p /srv/argos-edge/{data,backups,caddy-data,caddy-config,caddy-logs,manual-certs,crowdsec-data,crowdsec-config}
sudo chown -R 65534:65534 /srv/argos-edge/data /srv/argos-edge/backups /srv/argos-edge/manual-certs
# Caddy runs as root in the upstream image; its volumes stay
# root-owned. CrowdSec uses GID=1000 per its env -- match it:
sudo chown -R 1000:1000 /srv/argos-edge/crowdsec-data /srv/argos-edge/crowdsec-config

Then drop this alongside docker-compose.yml. Compose merges it automatically (docker-compose.override.yml):

# docker-compose.override.yml -- bind-mount production layout.
# Each named volume is redefined as a local driver with
# type=none / o=bind pointing at a host path.

volumes:
  argos_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/data

  argos_backups:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/backups

  caddy_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/caddy-data

  caddy_config:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/caddy-config

  caddy_logs:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/caddy-logs

  caddy_manual_certs:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/manual-certs

  crowdsec_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/crowdsec-data

  crowdsec_config:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/argos-edge/crowdsec-config

The volume NAMES (both the compose-level argos_data and the host-level name: argos_panel_data) stay identical. Only the underlying storage moves from Docker's managed volume tree to your host path.

Critical gotchas

Bind-mount BEFORE the first compose up

Switching from named volumes to bind mounts on a running stack is a two-step operation: stop the stack, move the data from the Docker volume path to the bind path, then bring the stack back up with the override in place. Starting with bind mounts out of the gate is far simpler. If you already have production data in named volumes, export → move → re-import:

docker compose down
# Copy each volume's content to the new bind path.
for v in argos_panel_data argos_panel_backups argos_caddy_data \
         argos_caddy_config argos_caddy_logs argos_caddy_manual_certs \
         argos_crowdsec_data argos_crowdsec_config; do
    src=$(docker volume inspect "$v" -f '{{.Mountpoint}}')
    dst=/srv/argos-edge/$(echo "$v" | sed 's/argos_panel_//; s/argos_caddy_manual_certs/manual-certs/; s/argos_caddy_/caddy-/; s/argos_crowdsec_/crowdsec-/; s/_/-/g')
    sudo rsync -a "$src/" "$dst/"
done
# Drop in the override, then:
docker compose up -d

Verify everything is intact (log in, list hosts, force a backup) before removing the old Docker volumes with docker volume rm <name>.
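The volume-name to bind-path mapping is the error-prone part of that loop, and it can be sanity-checked in pure shell before any rsync runs, with no docker involved. The sed below is one way to express the intended mapping (an assumption of this sketch, matching the /srv/argos-edge layout used earlier):

```shell
# Intended host-level volume name -> /srv/argos-edge subdirectory.
map_volume() {
  echo "$1" | sed 's/argos_panel_//; s/argos_caddy_manual_certs/manual-certs/; s/argos_caddy_/caddy-/; s/argos_crowdsec_/crowdsec-/; s/_/-/g'
}

for v in argos_panel_data argos_panel_backups argos_caddy_data \
         argos_caddy_config argos_caddy_logs argos_caddy_manual_certs \
         argos_crowdsec_data argos_crowdsec_config; do
  printf '%-28s -> /srv/argos-edge/%s\n' "$v" "$(map_volume "$v")"
done
```

Eyeball the eight printed pairs against the directories you created before trusting the migration loop with real data.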

  • Backup tool permissions — the backup tool reads files owned by nobody (uid 65534). On most distros, running the backup as root handles this trivially; an unprivileged user needs either group membership, ACLs, or sudo.

  • Restore via filesystem-level tools needs chown -R — if your backup tool does not preserve uids (some cloud-bucket syncers flatten ownership), you must re-chown after restore or the containers will see permission-denied on their own data files. chown -R 65534:65534 /srv/argos-edge/{data,backups,manual-certs} fixes it.

  • SELinux (RHEL, Fedora, Rocky, AlmaLinux) — append :Z to the volume mount in the compose file, or the containers will hit Permission denied even with correct uid. Docker sets the container-specific SELinux label:

    volumes:
      - argos_data:/data:Z
    
  • AppArmor / unconfined host — usually no-op. Only relevant if you have a custom profile restricting Docker.

  • Filesystem choice — any POSIX filesystem works. SQLite (the argos DB) does NOT work reliably on NFS unless you have correctly-configured locking; keep argos_data on local storage. argos_backups on NFS is fine (just files).
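Before committing argos_data to a given path, it is worth checking what filesystem actually backs it. A quick sketch using the GNU coreutils form of stat (the path is the hypothetical bind location from the earlier example; BSD stat has a different syntax):

```shell
# Print the filesystem type backing a path (GNU coreutils `stat -f`).
# An NFS mount shows up as "nfs"; keep argos_data off network filesystems.
path=/srv/argos-edge/data   # hypothetical bind path; falls back to / below
fs_type=$(stat -f -c %T "$path" 2>/dev/null || stat -f -c %T /)
echo "$fs_type"
```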

Mix-and-match

Nothing forces you to bind-mount all eight. A common pattern:

  • Bind-mount argos_backups to a filesystem that your existing backup tool already watches.
  • Keep named volumes for the rest; let Docker manage them.

The override is per-volume — leave the ones you don't need out of the override YAML and they stay on Docker-managed storage.
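For that common pattern, the whole override can be this small (the host path is an example, assumed to be somewhere your backup tool already watches):

```yaml
# docker-compose.override.yml -- bind-mount ONLY the backups volume;
# everything else stays on Docker-managed storage.
volumes:
  argos_backups:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /srv/backups/argos   # example path your backup tool watches
```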