v1.3.35.3 -- Demo: wire crowdsec-init sidecar (machine credentials fix)
A bugfix on top of v1.3.35.2. The demo's init.sh invoked docker compose up -d --no-deps argos crowdsec caddy, which explicitly excluded the crowdsec-init sidecar. That sidecar runs cscli machines add argos-panel and writes credentials to a shared volume the panel imports on first boot; without it, every panel-to-LAPI call returned 403 (country reconciler ticks, threats UI, AppSec metrics endpoint, system health recent_errors).
The crowdsec-init service was always defined in the demo's docker-compose.yml (mirrored from prod via the demo's compose override pattern). The bug was 100% in the script that brings the stack up.
argosVersion and frontend/package.json bumped from 1.3.35.2 to 1.3.35.3. Image rebuild required.
Symptoms (pre-fix)
- /api/system/health returned recent_errors: null (the reconciler couldn't enumerate decisions to populate the field).
- /api/threats/decisions returned "LAPI 403: access forbidden".
- /api/threats/appsec/metrics returned "unavailable, requires machine credentials".
- Panel logs showed lapi 403: {"message":"access forbidden"} on every country reconciler tick (every 5 minutes, per country, ~8 lines/tick = ~96/hour of error noise).
- cscli machines list inside argos-demo-crowdsec showed ONLY the bouncer-internal localhost machine; no argos-panel registration.
Root cause
scripts/demo/init.sh line 78 (pre-fix) ran docker compose up -d --no-deps argos crowdsec caddy.
The --no-deps flag explicitly disables the depends_on chain. The base compose has the chain set up correctly:
argos:
  depends_on:
    crowdsec-init:
      condition: service_completed_successfully
crowdsec-init:
  depends_on:
    crowdsec:
      condition: service_healthy
If we let compose drive ordering, the sidecar runs and exits before the panel even starts. The --no-deps flag short-circuited that.
The original intent of --no-deps argos crowdsec caddy was likely "be explicit about what we want running", a defensive posture. It backfired because the explicit list omitted the init service, and --no-deps kept compose from pulling it in through depends_on.
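To illustrate how an explicit service list can silently drop a dependency, here is a minimal sketch of a guard that compares the declared services against the list passed to `up`. The helper name `missing_services` and the four-service list are assumptions for illustration (the real demo may declare more services); in practice the declared names could come from `docker compose config --services`.

```shell
#!/bin/sh
# Hypothetical guard: print declared services that an explicit `up` list leaves out.
# $1 = newline-separated declared services
# $2 = space-separated services passed to `docker compose up`
missing_services() {
    printf '%s\n' "$1" | while IFS= read -r svc; do
        [ -n "$svc" ] || continue
        case " $2 " in
            *" $svc "*) ;;        # requested explicitly
            *) echo "$svc" ;;     # declared but omitted
        esac
    done
}

# Assumed service list, for illustration only.
declared='argos
crowdsec
crowdsec-init
caddy'

missing_services "$declared" "argos crowdsec caddy"   # prints: crowdsec-init
```

A check like this only makes sense if the explicit list is kept; dropping the list entirely, as the fix does, removes the need for it.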
Fix
scripts/demo/init.sh now runs plain docker compose up -d, which brings up every service the override declares, in the right order, per depends_on.
Three new wait/verify steps in init.sh
- Bumped panel-healthcheck timeout from 60s → 120s. The crowdsec-init step takes 10-30s on a cold start (first-time hub update inside the sidecar's cscli invocation), so the panel's Starting window is longer than before.
- crowdsec-init exit-code check: the script now warns loudly if the sidecar exited non-zero, and dumps the sidecar's last 10 log lines. Exit 0 = credentials written; anything else = the panel will see 403s post-boot and the operator should know.
- Wait-for-argos-panel-machine-registration loop: polls cscli machines list for the argos-panel row. The credential import inside the panel happens on the next reconcile tick (a few seconds after panel boot), so this loop confirms the import landed before init.sh moves on to the seed step.
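The exit-code check and the registration wait loop can be sketched in shell roughly as follows. This is a minimal sketch, not the actual init.sh: the function names, the 60-second default timeout, and the 2-second poll interval are assumptions, while the container names are the ones shown in the evidence section of these notes.

```shell
#!/bin/sh
# Hypothetical helpers; names, timeout and poll interval are illustrative.

# Warn loudly if the init sidecar exited non-zero and dump its last 10 log lines.
check_init_sidecar() {
    code=$(docker inspect argos-demo-crowdsec-init --format '{{.State.ExitCode}}')
    if [ "$code" != "0" ]; then
        echo "WARNING: crowdsec-init exited with code $code; expect LAPI 403s" >&2
        docker logs --tail 10 argos-demo-crowdsec-init >&2
        return 1
    fi
    echo "crowdsec-init exited 0: credentials written"
}

# Poll cscli machines list until the argos-panel row appears or the timeout expires.
wait_for_panel_machine() {
    timeout="${1:-60}"     # seconds (assumed default)
    interval="${2:-2}"     # seconds between polls (assumed default)
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if docker exec argos-demo-crowdsec cscli machines list 2>/dev/null \
            | grep -q 'argos-panel'; then
            echo "argos-panel machine registered after ${elapsed}s"
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "timed out after ${timeout}s waiting for argos-panel" >&2
    return 1
}
```

A caller would run `check_init_sidecar` first and then `wait_for_panel_machine` before the seed step; whether a failure should abort or only warn is a policy choice the sketch leaves to the caller.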
Smoke phase 3c: panel-LAPI integration
scripts/smoke/demo-environment.sh gains a new phase between the existing 3b and 4. Three assertions:
3c-i: cscli machines list contains 'argos-panel'
3c-ii: zero 'lapi 403' lines in panel logs (last 30s window)
3c-iii: credentials sentinel /data/shared/crowdsec-machine-credentials.yaml is absent (consumed by panel import)
All three must PASS for the smoke to proceed; any failure points at a specific stage of the credentials chain (3c-i = init sidecar didn't run; 3c-ii = panel hasn't imported yet or import failed; 3c-iii = import didn't run).
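The three assertions and their failure hints might look roughly like this in shell. This is a sketch under assumptions rather than the actual smoke script: the function names `pass`, `fail`, and `phase_3c` and the output format are hypothetical, while the container names and the sentinel path are the ones used elsewhere in these notes.

```shell
#!/bin/sh
# Hypothetical sketch of smoke phase 3c; function names and messages are illustrative.

CREDS=/data/shared/crowdsec-machine-credentials.yaml

pass() { echo "PASS $1"; }
fail() { echo "FAIL $1: $2"; failed=1; }

phase_3c() {
    failed=0

    # 3c-i: the machines listing contains an argos-panel row.
    if docker exec argos-demo-crowdsec cscli machines list 2>/dev/null \
        | grep -q 'argos-panel'; then
        pass 3c-i
    else
        fail 3c-i "init sidecar didn't run"
    fi

    # 3c-ii: zero 'lapi 403' lines in the last 30s of panel logs.
    n=$(docker logs argos-demo-panel --since 30s 2>&1 | grep -c 'lapi 403' || true)
    if [ "$n" -eq 0 ]; then
        pass 3c-ii
    else
        fail 3c-ii "panel hasn't imported yet or import failed"
    fi

    # 3c-iii: the credentials sentinel is gone (consumed by the panel import).
    if docker exec argos-demo-panel sh -c "test -f $CREDS"; then
        fail 3c-iii "import didn't run"
    else
        pass 3c-iii
    fi

    return "$failed"
}
```

Running all three checks before returning, instead of stopping at the first failure, keeps every stage hint visible in a single smoke run.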
Live evidence (post-fix)
$ docker exec argos-demo-crowdsec cscli machines list
localhost 127.0.0.1 2026-04-28T18:56:05Z ✔️ ...
argos-panel 172.20.0.4 2026-04-28T18:55:18Z ✔️ ...
$ docker inspect argos-demo-crowdsec-init --format \
'{{.State.Status}}: ExitCode={{.State.ExitCode}}'
exited: ExitCode=0
$ docker logs argos-demo-panel | grep -i credential
... INFO crowdsec: machine credentials imported from init sidecar
user=argos-panel
path=/data/shared/crowdsec-machine-credentials.yaml
... INFO crowdsec: client wired url=http://crowdsec:8081
machine_write=true
$ docker exec argos-demo-panel sh -c \
'test -f /data/shared/crowdsec-machine-credentials.yaml \
&& echo present || echo absent'
absent
$ docker logs argos-demo-panel --since 60s | grep -c 'lapi 403'
0
Files changed
- scripts/demo/init.sh: drop --no-deps argos crowdsec caddy; bump panel healthcheck timeout 60s → 120s; add crowdsec-init exit-code check + machine-registered wait loop.
- scripts/smoke/demo-environment.sh: new phase 3c (panel-LAPI integration: machines list + log scan + sentinel-consumed checks).
- backend/cmd/argos/main.go: argosVersion 1.3.35.2 → 1.3.35.3.
- frontend/package.json: version 1.3.35.2 → 1.3.35.3.
- scripts/demo/docker-compose.override.yml: image pin argos-prod-argos:1.3.35.3.
- docs/release-notes/v1.3.35.3.md (this file)
- CHANGELOG.md, mkdocs.yml
Smoke gate
scripts/smoke/demo-environment.sh --yes PASS end-to-end with new phase 3c green. Self-executed against the live host pre-tag for v1.3.35.3.
Upgrade
cd ~/argos-edge
git pull
make sync-prod && make build-prod-image
scripts/demo/init.sh
# panel ready at http://localhost:9181 login: demo / demo1234
# Verify the panel-LAPI integration:
docker exec argos-demo-crowdsec cscli machines list
# expected: a row with 'argos-panel' in addition to 'localhost'
# After login, /system Health card should show 'recent_errors'
# as an empty array (or a populated array of real errors), NOT
# null. Threats tab should render decisions without 403.
# AppSec metrics tab should render counters.
If you have a v1.3.35 or v1.3.35.2 demo stack still up, reset it, volumes included.
The volume reset is required because the panel's settings DB in the existing demo will have leftover state from the broken init; the cleanest path is a full reset.
What this enables
The screenshot capture session can now show fully-functional panels: every surface that depends on panel-to-LAPI calls (threats, AppSec metrics, country reconciler state, system health) will render correctly. v1.3.35.2's per-surface density expansion only paid off for surfaces that read from the panel DB directly; v1.3.35.3 closes the LAPI-integrated surfaces too.