
v1.3.27 -- Drift detection (Mark-as-applied retired)

A focused-feature release. Closes the v1.3.25 elevated-priority queue item: replaces the operator-trust "Mark as applied" model on the Scenarios + AppSec tabs with a real comparison between panel intent and CrowdSec runtime state.

The original v1.3.27 plan bundled a second feature (per-host true_detect_mode enforcement). Pre-implementation upstream verification cut that scope for v1.3.27 -- see "What didn't ship" below + docs/planning/v1.3.28-true-detect-mode.md.

What changed in the UX

Before (v1.3.25 -> v1.3.26):

  1. Operator disables a scenario in the panel.
  2. Panel writes the sentinel file. Yellow "Pending reload" badge appears.
  3. Operator runs docker compose exec crowdsec /setup-appsec.sh.
  4. Operator clicks "Mark as applied". Badge clears.
  5. If the operator forgot step 4, the badge stayed yellow even though CrowdSec was synced. If step 3 silently failed (errored mid-run), the badge could clear despite CrowdSec not being synced. The signal was operator-trust only.

After (v1.3.27):

  1. Operator disables a scenario in the panel.
  2. Panel writes the sentinel file.
  3. Within 60s, the drift detector compares the panel's disabled set against /crowdsec-state/scenarios/ (the read-only mount added in v1.3.25). The Scenarios tab gets an amber dot; the top-of-page banner appears: "Configuration drift detected. CrowdSec runtime state does not match the panel intent. Run setup-appsec.sh."
  4. Operator runs docker compose exec crowdsec /setup-appsec.sh.
  5. The script removes the disabled scenario from the filesystem.
  6. Within 60s, the next drift tick sees the scenario gone, the detector clears the drift state, the banner disappears.

No "Mark as applied" button. No operator-trust signal. The runtime state itself is the oracle.

What ships

Backend

  • backend/internal/security/drift package. Filesystem-based detector that reads installed scenarios via the existing scenarios.Reader (no LAPI calls -- the v1.3.25 reality-check established that LAPI v1.7.7 has no hub-state API) and parses the regenerated argos-tuning.yaml for the tx.inbound_anomaly_score_threshold=NN / tx.outbound_anomaly_score_threshold=NN SecAction lines.
  • Detector.Start(ctx, interval) mirrors the v1.3.23 publicip.Detector pattern: goroutine + ticker + ctx.Done. First tick runs synchronously so the snapshot is fresh within seconds of boot.
  • Settings persistence. Two JSON-blob settings rows, one per surface:
      • appsec.scenarios.drift_state
      • appsec.tuning.drift_state
    Each blob carries drift_detected, the expected vs actual state, and last_check_at.
  • GET /api/security/drift endpoint. Reads the cached snapshot; no recompute on the request path.
  • Migration 031 drops the deprecated appsec.scenarios.last_applied_at + appsec.tuning.last_applied_at settings rows.
  • API surface trimmed. Removed:
      • POST /api/security/scenarios/mark-applied + handler
      • POST /api/security/appsec-tuning/mark-applied + handler
      • last_applied_at + reload_needed fields from ScenariosResponse + AppSecTuningResponse

Frontend

  • DriftBanner (top of /security page). Renders only when drift is detected on at least one surface. Lists the specific scenarios still enabled despite panel-disable + the runtime threshold mismatch when relevant.
  • DriftDot -- a 1.5px amber circle next to the Scenarios / AppSec tab labels. Mirrors the per-surface drift_detected flag.
  • useDrift hook. Polls /api/security/drift every 10s while the page is mounted. The 10s polling cadence is deliberately tighter than the 60s server tick -- it keeps the banner-clear UX snappy after the operator runs the script.
  • Mark-as-applied button removed. Both ScenariosTab and AppSecTab dropped the "Mark as applied" button + the markBusy state + the securityScenariosMarkApplied / securityAppSecTuningMarkApplied API client methods.

Smoke

  • scripts/smoke/drift-detection.sh -- 12-step end-to-end verification. Two phases (scenarios surface + tuning surface); each phase: PATCH -> wait DRIFT_WAIT (default 65s) -> assert drift=true -> docker exec setup-appsec.sh -> wait DRIFT_WAIT -> assert drift=false. Cleanup trap restores pre-test state.

What didn't ship (deferred to v1.3.28)

The original v1.3.27 plan also bundled per-host true_detect_mode enforcement: a Caddy template change emitting per-host appsec_url (referencing the existing argos-appsec-detect.yaml config on port 7423 vs argos-appsec-block.yaml on 7422).

Pre-flight verification of the pinned caddy-crowdsec-bouncer plugin (v0.12.1, latest tag at the time) confirmed the per-route HTTP handler http.handlers.appsec is a no-config wrapper with an empty struct -- the appsec URL is read exclusively from the global apps.crowdsec.appsec_url, which is a single string field. There is no upstream support for per-route override.

This is the sixth case recorded in the upstream-behaviour pattern memo: an empty handler struct whose Provision only calls ctx.App("crowdsec") means there is no per-handler config. v1.3.28 has been opened to evaluate two paths:

  1. Re-spike the profiles.yaml whitelist approach (the original intent encoded in the hosts.true_detect_mode model.go comment) against a live stack.
  2. Upstream PR to the Caddy plugin adding an optional per-handler appsec_url.

See docs/planning/v1.3.28-true-detect-mode.md for the gate-zero spike plan.

Smoke gate (4/4 PASS)

  1. make smoke-self -- the v1.3.26 sync-prod gates remain green.
  2. make sync-prod-dry -- expected post-v1.3.26 drift only (Makefile, deployment.md, etc.) before sync; clean post-sync.
  3. scripts/smoke/drift-detection.sh -- both surfaces flip drift_detected=true after PATCH + wait, then clear after setup-appsec.sh + wait.
  4. make deploy-prod -- panel container recreates cleanly with v1.3.27 image; healthz 200; drift detector loop visible in panel logs (drift detector ... slog lines on each tick).

NO tag until the smoke gate genuinely PASSes.

Upgrade

cd ~/argos-edge
git pull
make sync-prod          # pulls scripts/smoke/drift-detection.sh
                        # + setup-appsec.sh changes (none) into
                        # the operational dir
make deploy-prod        # builds panel + restarts; the drift
                        # detector starts polling on first tick

The first /api/security/drift response after upgrade will normally show scenarios.drift_detected=false, because the v1.3.26 settings store holds an appsec.disabled_scenarios value only if an operator explicitly disabled a scenario in the v1.3.25 UI. If you DO see drift on the first poll, the v1.3.25-era operator-trust signal was masking a real divergence -- run setup-appsec.sh once and the banner clears within ~60s.

Files changed

  • backend/migrations/031_drop_last_applied_at_settings.{up,down}.sql (new)
  • backend/internal/security/drift/{drift.go,drift_test.go} (new)
  • backend/internal/api/security_drift.go (new)
  • backend/internal/api/security_scenarios.go (handlers + types trimmed)
  • backend/internal/server/server.go (route swap)
  • backend/cmd/argos/main.go (Detector wired)
  • backend/internal/db/migrate_test.go (rollback chain extended)
  • frontend/src/api/client.ts (types + endpoint method)
  • frontend/src/pages/Security.tsx (DriftBanner, useDrift, tab dots)
  • scripts/smoke/drift-detection.sh (new)
  • docs/operations/... -- unchanged
  • docs/planning/v1.3.28-true-detect-mode.md (new)
  • docs/release-notes/v1.3.27.md (this file)
  • CHANGELOG.md, mkdocs.yml, version bump

Not changed

  • All v1.3.23 / v1.3.25 /api/security/* read endpoints unchanged except last_applied_at / reload_needed field removal (which is technically a response-shape break, but no downstream consumer outside the panel itself reads them).
  • Caddyfile, crowdsec/* configs, docker-compose.yml -- all v1.3.26.
  • Migration 030 is still the latest schema-modifying migration; 031 is data-only.