v1.3.6 — 4 bug fixes (3 CrowdSec robustness + 1 modal UX)¶
Four issues surfaced operating the v1.3.5 auto-bootstrap in prod plus one pre-existing UI bug. Bug 5 from the filing (target health badges in the target-group page) is deferred to v1.3.7 — it needs a new API endpoint plus UI work that doesn't fit this release's scope.
Bugs fixed¶
1. Init container collision fallback now actually works¶
Hit in real prod: docker compose up failed with
[crowdsec-init] registering machine argos-panel
Error: cscli machines add: unable to create machine:
user 'argos-panel': user already exist
[exit 1]
The v1.3.5 compose had a pre-check — cscli machines list -o json | grep "\"machineId\":\"argos-panel\"" — but CrowdSec's JSON output formats the field as "machineId": "argos-panel" with a space after the colon. The tight grep pattern never matched, the collision check silently passed, and cscli machines add then failed.
Fix: replaced the pre-check with a simpler "try add, catch, retry with suffix" pattern. Format-independent; works regardless of cscli's JSON layout:
NAME="argos-panel"
if ! cscli machines add "$NAME" --auto -f "$CREDS_FILE" 2>&1; then
NAME="argos-panel-$(date +%s)"
cscli machines add "$NAME" --auto -f "$CREDS_FILE"
fi
2. Panel detects stale machine credentials at boot and purges them¶
Previous behaviour: if the stored credentials became invalid out- of-band (operator ran cscli machines delete argos-panel, or CrowdSec rotated its signing key, or the master key changed and corrupted the ciphertext), the panel kept retrying with the bad password forever. AppSec metrics calls bubbled up as lapi 401: incorrect Username or Password on every request, with no auto-recovery.
Fix: new crowdsec.VerifyMachineCredentials probe runs once at boot. On a 401 response it returns a typed ErrStaleCredentials; main.go purges the stored settings (crowdsec.machine_user, crowdsec.machine_password_encrypted, crowdsec.machine_password) and emits a new crowdsec_creds_stale notification event.
Transient failures (5xx, network, timeout) do NOT trigger the purge — we don't want a LAPI hiccup to nuke working creds. Only a 401 response counts as "stale".
After the purge the panel continues booting normally. The AppSec metrics endpoint falls back to the v1.3.4 degraded payload; the UI banner tells the operator to run the init sidecar.
3. Operator-triggered regenerate endpoint + UI button¶
For the case where the operator wants to verify creds without restarting the panel (or the boot probe succeeded but later out-of-band deletion silently broke things), v1.3.6 ships:
POST /api/crowdsec/regenerate-credentials— verifies current stored creds against LAPI, purges on 401, returns one of four statuses:valid— "credentials are still good"purged— "they were stale; cleared; run the init sidecar"no_credentials— "nothing to regenerate; run the init sidecar"- (502 on transient LAPI failure)
- Verify & regenerate credentials button in the AppSec page's metrics-degraded banner. Invokes the endpoint, surfaces the resulting message as a toast.
The endpoint does NOT call docker compose or the docker socket from the panel — that would be a meaningful privilege escalation. The operator runs docker compose up crowdsec-init manually; the panel picks up the fresh creds on the next reconcile.
4. Add-host modal scrolls when content exceeds viewport¶
Pre-existing bug, not introduced by any recent release. Expanding Advanced + Inline target group + DNS-provider dropdown in the Add-host modal would push the form past the viewport on smaller screens (≤ 768×600). No scrollbar inside the modal; the Save button fell off-screen and the only way to submit was hitting Enter (assuming validation passed).
Fix: Modal.tsx body is now flex-1 overflow-y-auto inside a max-h-[calc(100vh-2rem)] flex flex-col card. The header stays pinned, the body scrolls, Save is always reachable.
What's new¶
docker-compose.yml— init container retries on collision with a timestamp-suffixed machine name.backend/internal/crowdsec/bootstrap.go— new functionsVerifyMachineCredentials,PurgeMachineCredentials; new sentinelErrStaleCredentials.backend/cmd/argos/main.go— boot-time stale-creds check wired in betweenImportMachineCredentialsand the client construction. Only probes when creds came from the DB (env overrides are the operator's explicit choice and are never auto-purged).backend/internal/api/threats.go— new handlerRegenerateCrowdSecCredentialsservingPOST /api/crowdsec/regenerate-credentials.backend/internal/notifications/events.go— new event typecrowdsec_creds_stale(severity warning).frontend/src/pages/AppSec.tsx— degraded-metrics banner gets the Verify & regenerate credentials button + updated copy describing the v1.3.5 auto-bootstrap flow.frontend/src/components/Modal.tsx— flex-column layout with a pinned header + scrollable body.- Docs: two new troubleshooting entries — one for the v1.3.5 init collision symptom (now auto-handled by the retry), one for the
crowdsec_creds_staleevent.
What's deferred¶
- Bug 5 — target health status in the UI. Per the task's own split rule. Needs a new Caddy-admin-API-based endpoint, new TargetGroup detail UI component with polling. Landing in v1.3.7.
Upgrade¶
Drop-in:
The init container retries cleanly on collision. Boot-time stale-creds probe runs once, silently on healthy stacks. Modal UI fix is pure CSS — no data touched.
Not changed¶
- AppSec mode /
appsec.fail_open/appsec_unavailableevent — all unchanged from v1.3.2 / v1.3.3 / v1.3.4 / v1.3.5. - No DB migrations. New settings purge writes empty-string values; no schema change.
- Env-var-sourced machine credentials (
CROWDSEC_PANEL_MACHINE_USER/..._PASSWORD) are never auto-purged. The stale-creds probe only runs against DB-sourced credentials.
Related¶
- v1.3.5 — introduced the auto-bootstrap path that this release hardens.
- AppSec feature page — full setup + operation reference.
- Troubleshooting → CrowdSec — two new entries for v1.3.6 symptoms.