v1.3.27 -- Drift detection (Mark-as-applied retired)
A focused-feature release. Closes the v1.3.25 elevated-priority queue item: replaces the operator-trust "Mark as applied" model on the Scenarios + AppSec tabs with a real comparison between panel intent and CrowdSec runtime state.
The original v1.3.27 plan bundled a second feature (per-host true_detect_mode enforcement). Pre-implementation upstream verification cut that scope for v1.3.27 -- see "What didn't ship" below + docs/planning/v1.3.28-true-detect-mode.md.
What changed in the UX
Before (v1.3.25 -> v1.3.26):
1. Operator disables a scenario in the panel.
2. Panel writes the sentinel file. Yellow "Pending reload" badge appears.
3. Operator runs docker compose exec crowdsec /setup-appsec.sh.
4. Operator clicks "Mark as applied". Badge clears.

If the operator forgot step 4, the badge stayed yellow even though CrowdSec was synced. If step 3 silently failed (errored mid-run), the badge could clear despite CrowdSec not being synced. The signal was operator-trust only.
After (v1.3.27):
1. Operator disables a scenario in the panel.
2. Panel writes the sentinel file.
3. Within 60s, the drift detector compares the panel's disabled set against /crowdsec-state/scenarios/ (the read-only mount added in v1.3.25). The Scenarios tab gets an amber dot; the top-of-page banner appears: "Configuration drift detected. CrowdSec runtime state does not match the panel intent. Run setup-appsec.sh."
4. Operator runs docker compose exec crowdsec /setup-appsec.sh.
5. The script removes the disabled scenario from the filesystem.
6. Within 60s, the next drift tick sees the scenario gone, the detector clears the drift state, the banner disappears.
No "Mark as applied" button. No operator-trust signal. The runtime state itself is the oracle.
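The scenarios-surface comparison described above boils down to a set intersection: any scenario the panel wants disabled that is still installed counts as drift. The following is a minimal sketch under that assumption, not the shipped detector; the function name stillEnabled and the scenario names are illustrative only.

```go
package main

import (
	"fmt"
	"sort"
)

// stillEnabled returns the scenarios the panel wants disabled that are
// nevertheless present in the installed set, sorted for stable output.
// Hypothetical helper: the real detector may structure this differently.
func stillEnabled(disabled, installed []string) []string {
	present := make(map[string]struct{}, len(installed))
	for _, name := range installed {
		present[name] = struct{}{}
	}
	var drift []string
	for _, name := range disabled {
		if _, ok := present[name]; ok {
			drift = append(drift, name)
		}
	}
	sort.Strings(drift)
	return drift
}

func main() {
	// Example inputs only; in the panel these would come from the
	// settings store and the read-only scenarios mount.
	disabled := []string{"crowdsecurity/http-probing"}
	installed := []string{"crowdsecurity/http-probing", "crowdsecurity/ssh-bf"}

	drift := stillEnabled(disabled, installed)
	fmt.Println("drift detected:", len(drift) > 0, drift)
}
```

An empty result means the runtime state matches panel intent, which is the condition under which the banner clears.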
What ships
Backend
- backend/internal/security/drift package. Filesystem-based detector that reads installed scenarios via the existing scenarios.Reader (no LAPI calls -- the v1.3.25 reality-check established that LAPI v1.7.7 has no hub-state API) and parses the regenerated argos-tuning.yaml for the tx.inbound_anomaly_score_threshold=NN / tx.outbound_anomaly_score_threshold=NN SecAction lines.
- Detector.Start(ctx, interval) mirrors the v1.3.23 publicip.Detector pattern: goroutine + ticker + ctx.Done. First tick runs synchronously so the snapshot is fresh within seconds of boot.
- Settings persistence. Two JSON-blob settings rows, one per surface:
  - appsec.scenarios.drift_state
  - appsec.tuning.drift_state

  Each blob carries drift_detected, the expected vs actual state, and last_check_at.
- GET /api/security/drift endpoint. Reads the cached snapshot; no recompute on the request path.
- Migration 031 drops the deprecated appsec.scenarios.last_applied_at + appsec.tuning.last_applied_at settings rows.
- API surface trimmed. Removed:
  - POST /api/security/scenarios/mark-applied + handler
  - POST /api/security/appsec-tuning/mark-applied + handler
  - last_applied_at + reload_needed fields from ScenariosResponse + AppSecTuningResponse
Frontend
- DriftBanner (top of /security page). Renders only when drift is detected on at least one surface. Lists the specific scenarios still enabled despite panel-disable + the runtime threshold mismatch when relevant.
- DriftDot -- a 1.5px amber circle next to the Scenarios / AppSec tab labels. Mirrors the per-surface drift_detected flag.
- useDrift hook. Polls /api/security/drift every 10s while the page is mounted. The 10s polling cadence is deliberately tighter than the 60s server tick -- it keeps the banner-clear UX snappy after the operator runs the script.
- Mark-as-applied button removed. Both ScenariosTab and AppSecTab dropped the "Mark as applied" button + the markBusy state + the securityScenariosMarkApplied / securityAppSecTuningMarkApplied API client methods.
Smoke
scripts/smoke/drift-detection.sh -- 12-step end-to-end verification. Two phases (scenarios surface + tuning surface); each phase: PATCH -> wait DRIFT_WAIT (default 65s) -> assert drift=true -> docker exec setup-appsec.sh -> wait DRIFT_WAIT -> assert drift=false. Cleanup trap restores pre-test state.
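The tuning-surface phase depends on the detector reading the two anomaly-score thresholds out of the regenerated argos-tuning.yaml and comparing them with the panel's values. A rough sketch of that extraction is below; the regular expression, the Thresholds type, and the sample SecAction lines are assumptions for illustration, and the real file layout may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// thresholdRe matches the two setvar assignments named in these notes.
var thresholdRe = regexp.MustCompile(`tx\.(inbound|outbound)_anomaly_score_threshold=(\d+)`)

// Thresholds holds the anomaly-score thresholds found in a tuning file.
// Hypothetical type for this sketch.
type Thresholds struct {
	Inbound, Outbound int
}

// parseThresholds extracts both thresholds from the file content.
// ok is false if either threshold is missing.
func parseThresholds(content string) (th Thresholds, ok bool) {
	var haveIn, haveOut bool
	for _, m := range thresholdRe.FindAllStringSubmatch(content, -1) {
		n, err := strconv.Atoi(m[2])
		if err != nil {
			continue // number too large to represent; skip the match
		}
		if m[1] == "inbound" {
			th.Inbound, haveIn = n, true
		} else {
			th.Outbound, haveOut = n, true
		}
	}
	return th, haveIn && haveOut
}

func main() {
	// Hypothetical excerpt standing in for the regenerated file.
	sample := `SecAction "id:900110,phase:1,nolog,pass,setvar:tx.inbound_anomaly_score_threshold=5"
SecAction "id:900111,phase:1,nolog,pass,setvar:tx.outbound_anomaly_score_threshold=4"`

	expected := Thresholds{Inbound: 10, Outbound: 4} // what the panel intends
	actual, ok := parseThresholds(sample)

	// Drift on this surface: thresholds unreadable or different from intent.
	fmt.Println("drift detected:", !ok || actual != expected)
}
```

Treating a missing threshold as drift is a design choice made for this sketch; it errs on the side of showing the banner when the file cannot be interpreted.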
What didn't ship (deferred to v1.3.28)
The original v1.3.27 plan also bundled per-host true_detect_mode enforcement: a Caddy template change emitting per-host appsec_url (referencing the existing argos-appsec-detect.yaml config on port 7423 vs argos-appsec-block.yaml on 7422).
Pre-flight verification of the pinned caddy-crowdsec-bouncer plugin (v0.12.1, latest tag at the time) confirmed the per-route HTTP handler http.handlers.appsec is a no-config wrapper with an empty struct -- the appsec URL is read exclusively from the global apps.crowdsec.appsec_url, which is a single string field. There is no upstream support for per-route override.
This is the sixth case in the upstream-behaviour pattern memo; empty struct + ctx.App("crowdsec") Provision = no per-handler config. v1.3.28 has been opened to evaluate two paths:
- Re-spike the profiles.yaml whitelist approach (the original intent encoded in the hosts.true_detect_mode model.go comment) against a live stack.
- Upstream PR to the Caddy plugin adding an optional per-handler appsec_url.
See docs/planning/v1.3.28-true-detect-mode.md for the gate-zero spike plan.
Smoke gate (4/4 PASS)
- make smoke-self -- the v1.3.26 sync-prod gates remain green.
- make sync-prod-dry -- expected post-v1.3.26 drift only (Makefile, deployment.md, etc.) before sync; clean post-sync.
- scripts/smoke/drift-detection.sh -- both surfaces flip drift_detected=true after PATCH + wait, then clear after setup-appsec.sh + wait.
- make deploy-prod -- panel container recreates cleanly with v1.3.27 image; healthz 200; drift detector loop visible in panel logs (drift detector ... slog lines on each tick).
No tag until the real smoke run passes.
Upgrade
cd ~/argos-edge
git pull
make sync-prod # pulls scripts/smoke/drift-detection.sh
# + setup-appsec.sh changes (none) into
# the operational dir
make deploy-prod # builds panel + restarts; the drift
# detector starts polling on first tick
The first /api/security/drift response after upgrade will show scenarios.drift_detected=false (the v1.3.26 settings store had no appsec.disabled_scenarios written by an operator unless they explicitly disabled one in the v1.3.25 UI). If you DO see drift on first poll, that means the v1.3.25-era operator-trust signal masked a real divergence -- run setup-appsec.sh once and the banner clears within ~60s.
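To check the first response programmatically, a client could decode it along these lines. This is a sketch under assumptions: only scenarios.drift_detected and last_check_at are named in these notes; the "tuning", "expected", and "actual" keys and the sample payload are guesses at the shape, not the documented contract.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SurfaceDrift mirrors the per-surface fields named in these notes.
// The expected/actual key names and their raw-JSON types are assumptions.
type SurfaceDrift struct {
	DriftDetected bool            `json:"drift_detected"`
	Expected      json.RawMessage `json:"expected"`
	Actual        json.RawMessage `json:"actual"`
	LastCheckAt   string          `json:"last_check_at"`
}

// DriftResponse is a hypothetical shape for GET /api/security/drift.
// The "tuning" key in particular is assumed.
type DriftResponse struct {
	Scenarios SurfaceDrift `json:"scenarios"`
	Tuning    SurfaceDrift `json:"tuning"`
}

// decodeDrift parses a response body into DriftResponse.
func decodeDrift(body []byte) (DriftResponse, error) {
	var r DriftResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	// Made-up sample payload for illustration.
	body := []byte(`{"scenarios":{"drift_detected":false},"tuning":{"drift_detected":false}}`)

	r, err := decodeDrift(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	if r.Scenarios.DriftDetected || r.Tuning.DriftDetected {
		fmt.Println("drift present: run setup-appsec.sh")
	} else {
		fmt.Println("no drift")
	}
}
```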
Files changed
- backend/migrations/031_drop_last_applied_at_settings.{up,down}.sql (new)
- backend/internal/security/drift/{drift.go,drift_test.go} (new)
- backend/internal/api/security_drift.go (new)
- backend/internal/api/security_scenarios.go (handlers + types trimmed)
- backend/internal/server/server.go (route swap)
- backend/cmd/argos/main.go (Detector wired)
- backend/internal/db/migrate_test.go (rollback chain extended)
- frontend/src/api/client.ts (types + endpoint method)
- frontend/src/pages/Security.tsx (DriftBanner, useDrift, tab dots)
- scripts/smoke/drift-detection.sh (new)
- docs/operations/... -- unchanged
- docs/planning/v1.3.28-true-detect-mode.md (new)
- docs/release-notes/v1.3.27.md (this file)
- CHANGELOG.md, mkdocs.yml, version bump
Not changed
- All v1.3.23 / v1.3.25 /api/security/* read endpoints unchanged except last_applied_at / reload_needed field removal (which is technically a response-shape break, but no downstream consumer outside the panel itself reads them).
- Caddyfile, crowdsec/* configs, docker-compose.yml -- all v1.3.26.
- Migration 030 still latest schema-modifying migration; 031 is data-only.