Components¶
Three containers, one docker network, one persistent volume per container. Nothing else is required to run argos.
```mermaid
flowchart LR
    subgraph internet[Internet]
        client[Client]
    end
    subgraph host["Host (LXC / VM)"]
        subgraph bridge["docker bridge: argos_net"]
            caddy[Caddy 2<br/>TLS, HTTP/3, WAF, bouncer]
            crowdsec[CrowdSec<br/>LAPI + AppSec]
            argos[argos panel<br/>Go + embedded SPA]
        end
        caddy_data[(caddy_data)]
        crowdsec_data[(crowdsec_data)]
        argos_data[(argos_data<br/>argos.db + backups + geoip)]
    end
    client -->|443 / 80| caddy
    caddy -->|reverse_proxy| upstream[Upstream services]
    caddy -.->|admin API :2019| argos
    caddy -.->|bouncer poll :8081| crowdsec
    caddy -.->|AppSec HTTP :7422/:7423| crowdsec
    argos -.->|LAPI :8081| crowdsec
    caddy --- caddy_data
    crowdsec --- crowdsec_data
    argos --- argos_data
```

Solid arrows are data-plane traffic (public requests + upstreams). Dashed arrows are control-plane hooks between containers.
Caddy¶
Reference version in `docker-compose.yml`. Compiled with:

- `caddy-crowdsec-bouncer` — reads decisions from LAPI every 15 s (tunable), blocks banned IPs before any other handler fires.
- `caddy-appsec` — OPTIONAL module that proxies a sub-request to the CrowdSec AppSec HTTP listener for inline WAF.
- Standard modules: caddy 2, `http`, `reverse_proxy`, TLS + ACME.
Configured entirely through the admin API at `:2019`, reachable only from the argos container over the docker bridge. The panel never writes Caddyfile text; it POSTs JSON config to `/load`.
Bootstrap Caddyfile (used only for the admin API setup) lives in the repo; the reconciler takes over after first boot.
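To make the control flow concrete, here is a minimal, hypothetical sketch of the kind of JSON document a reconciler might POST to `/load` (the host, listener, and upstream values are illustrative, not argos's actual generated config):

```json
{
  "apps": {
    "http": {
      "servers": {
        "srv0": {
          "listen": [":443"],
          "routes": [
            {
              "match": [{ "host": ["example.com"] }],
              "handle": [
                {
                  "handler": "reverse_proxy",
                  "upstreams": [{ "dial": "10.0.0.5:3000" }]
                }
              ]
            }
          ]
        }
      }
    }
  }
}
```

Caddy applies such a document atomically when it is POSTed to the admin API, e.g. `curl -X POST -H "Content-Type: application/json" -d @config.json http://caddy:2019/load`, which is why the panel never needs to touch Caddyfile text after bootstrap.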
CrowdSec¶
Two subsystems in one container:
- LAPI — authoritative decision store + scenario runner + log parser. Reads Caddy's access log (bind-mounted), produces decisions when scenarios match. SQLite-backed.
- AppSec component — separate HTTP listener that accepts sub-requests from Caddy and returns a verdict. Runs Coraza with the OWASP CRS rule set. Two listeners: `:7423` (detect) and `:7422` (block).
The panel talks to LAPI via machine credentials to show / create / delete decisions.
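As a sketch of what "talks to LAPI via machine credentials" looks like on the wire: CrowdSec's LAPI exchanges machine credentials for a JWT (`POST /v1/watchers/login`), and subsequent calls carry that token as a Bearer header. The host, token, and origin values below are illustrative; this is not argos's actual client code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newDecisionsRequest builds a GET /v1/decisions request against the LAPI.
// The endpoint shape follows CrowdSec's LAPI; base, token, and origin
// values are supplied by the caller and are illustrative here.
func newDecisionsRequest(base, token, origin string) (*http.Request, error) {
	u, err := url.Parse(base + "/v1/decisions")
	if err != nil {
		return nil, err
	}
	q := u.Query()
	q.Set("origin", origin) // e.g. filter to country-ban decisions only
	u.RawQuery = q.Encode()

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	// The JWT comes from a prior POST /v1/watchers/login with the
	// machine_id/password pair provisioned for the panel.
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	req, err := newDecisionsRequest("http://crowdsec:8081", "JWT", "argos-country-XX")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.URL.String())
}
```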
argos panel¶
Single Go binary, scratch-based Docker image. Internal subsystems:
```mermaid
flowchart TB
    subgraph argos[argos container]
        sqlite[(SQLite<br/>/data/argos.db)]
        api[HTTP API + embedded SPA<br/>:8080]
        reconciler[Caddy reconciler]
        drift[Drift detector<br/>60s ticker]
        countryRecon[Country reconciler<br/>5min ticker]
        jobRunner[Country JobRunner<br/>single-worker mutex]
        publicip[Public IP detector]
        notif[Notifications worker]
        ingestor[Log ingestor<br/>tails caddy logs]
        scheduler[Backup scheduler]
        geoip[GeoIP downloader]
        retention[Retention cron]
    end
    api -- writes/reads --> sqlite
    api -- triggers --> reconciler
    api -- submits --> jobRunner
    reconciler -- JSON /load --> caddyAdmin[Caddy admin API]
    drift -- read /crowdsec-state mount --> crowdsecConfig[crowdsec config volume]
    drift -- write drift_state --> sqlite
    countryRecon -- LAPI count by origin --> lapi[CrowdSec LAPI]
    countryRecon -- UPDATE state --> sqlite
    jobRunner -- N decisions per call --> lapi
    publicip -- HTTPS --> external_ipify[ipify.org]
    notif -- reads events --> sqlite
    notif -- sends --> external[Webhook / SMTP / Telegram / Web Push]
    ingestor -- tail --> caddyLogs[Caddy log files]
    ingestor -- batch INSERT --> sqlite
    scheduler -- VACUUM INTO + tar --> backupDir["/data/backups"]
    geoip -- download mmdb --> geoipDir["/data/geoip"]
    retention -- DELETE old --> sqlite
```

Each of the background goroutines is bounded:
| Goroutine | Bound | Owner |
|---|---|---|
| Caddy reconciler | one trigger per host change | internal/reconciler/reconciler.go |
| Drift detector (v1.3.27) | every 60 s | internal/security/drift/drift.go |
| Country reconciler (v1.3.33) | every 5 min | internal/security/country/reconciler.go |
| Country JobRunner (v1.3.31) | single-worker mutex; one expansion at a time | internal/security/country/jobs.go |
| Public IP detector (v1.3.23) | every 1 h | internal/security/publicip/publicip.go |
| notifications worker | N workers, chan-buffered queue | internal/notifications/worker.go |
| log ingestor | one tail + one writer, channel-buffered | internal/logs/ingestor.go |
| backup scheduler | one tick per cron match | internal/backup/scheduler.go |
| geoip downloader | one run per month cron + boot | internal/geoip/downloader.go |
| retention cron | every 6 h | internal/logs/retention.go |
| OIDC pending sweeper | every TTL/2 | internal/oidc/flow.go |
| TOTP challenge sweeper | every TTL/2 | internal/totp/challenge.go |
| ForwardAuth cache sweeper | every 30 s | internal/api/forward.go |
Shutdown: every goroutine respects the main context; a SIGTERM gives them up to 10 s to drain before the binary exits.
Reconcilers verify what¶
Three different goroutines each catch a different drift surface:
- **Drift detector** (`/api/security/drift`) compares the panel's `appsec.disabled_scenarios` setting plus `appsec.inbound_threshold`/`outbound_threshold` against the actual filesystem state in `/etc/crowdsec/scenarios/` and `/etc/crowdsec/appsec-rules/argos-tuning.yaml`. Surfaces: the `/security/scenarios` and `/security/appsec` tabs render an amber "drift detected" banner when the panel intent doesn't match the running config.
- **Country reconciler** compares the panel's `country_ban_expansions.cidr_count` against the LAPI's actual decision count for the `argos-country-XX` origin. Flips `state='drifted'` when the divergence exceeds 1%. Defensive layer for the v1.3.31-era flush-cap incident (which v1.3.33's alert-shape fix addressed at the root); covers any future drift cause not already prevented at emit time.
- **Country JobRunner** isolates the worker side of the async expansion endpoint. A single-worker mutex serialises concurrent POSTs (queueing via `state=pending`) and preserves boot-time recovery (any `pending`/`running` row from a prior process transitions to `failed` with `error_message='panel restarted'`).
Smoke verification¶
Each panel sub-service has a corresponding smoke under `scripts/smoke/` that asserts EFFECT against a live stack:
| Sub-service | Smoke | Verifies |
|---|---|---|
| Drift detector | drift-detection.sh | 12-phase: PATCH disable → wait 65s → drift_detected=true; setup-appsec.sh → wait 65s → drift_detected=false |
| Country JobRunner | country-expansion-async.sh | 8-phase async submit → poll → completed; failure path with crowdsec stopped |
| Country alert shape | lapi-flush-cap.sh | NG +1 chunk-alert + IR +3 chunk-alerts; no flush cascade |
| Country reconciler | country-reconciler.sh | Drift detected after manual cscli mutation; recovery to active |
| Reverse-sentinel (scenarios index) | scenario-descriptions.sh | Slimmed file produced; coverage ≥90%; graceful degrade with file removed |
Full matrix: docs/operations/verification-report.md.
The SPA¶
`frontend/` is a React + TypeScript + Tailwind app built with Vite. The build output lands in `backend/static/`, which a `go:embed` directive compiles into the binary. There is no separate frontend container, no CDN, no runtime dependency.
Code-split by route (every top-level page is a React.lazy boundary) + by heavy vendor (charts, map, icons, dnd). Initial bundle is under 220 KiB minified.
Storage layout¶
Inside the argos container, /data is the argos_data volume:
/data/
├── argos.db # single SQLite file, WAL mode
├── argos.db-wal # WAL sidecar
├── argos.db-shm # shared memory
├── backups/ # scheduled + manual tar.gz
│ └── argos-backup-20260420-021500.tar.gz
├── geoip/ # DB-IP Lite mmdb files
│ ├── country.mmdb
│ └── asn.mmdb
└── .restore_pending # present only during a pending restore
caddy_data under the caddy container holds certs + ACME state. crowdsec_data holds the LAPI DB + parsers + scenarios.
What is NOT in the stack¶
- No message broker. Notifications go straight from event to sender inside the worker goroutine.
- No cache server (Redis, memcached). Session lookup hits SQLite with a per-process cache for ForwardAuth (30 s TTL).
- No sidecar for off-site backups. Mirror `/data/backups/` externally.
- No separate metrics endpoint. `/api/system/health` returns JSON; wire it to whatever uptime monitor you run.
Related¶
- Request flow — what happens when a client request hits the edge.
- Storage — SQLite mode, migrations, backup semantics.
- Threat model — attack surface + mitigations.