v1.3.2 — AppSec fail-open (bug fix)

Patch release. Fixes an availability regression where a misconfigured or missing AppSec sidecar would cascade into HTTP 500 responses for every host on the panel.

The bug

Pre-v1.3.2, the panel emitted apps.crowdsec.appsec_url in its Caddy config whenever appsec.mode != disabled. The bouncer plugin (hslatman/caddy-crowdsec-bouncer/appsec) defaults to fail-closed behaviour: if the AppSec HTTP endpoint is unreachable, the handler returns an error and Caddy serves HTTP 500.
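
Concretely, the relevant slice of the generated Caddy JSON app config looked roughly like this (illustrative: the appsec_url field name is the plugin's, the sidecar address is made up, and the real emitted block carries more fields):

{
  "apps": {
    "crowdsec": {
      "appsec_url": "http://crowdsec:7423/"
    }
  }
}

No fail-open field emitted, so the plugin's fail-closed default applied.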

That default is dangerous for argos-edge installations because:

  • The CrowdSec image ships with zero AppSec collections by default. Ports :7422 and :7423 refuse connections until setup-appsec.sh is run (a quick check follows this list).
  • AppSec mode defaults to detect at first boot, which means the panel emits the appsec_url from minute zero.
  • A single dead sidecar = every host on the panel 500s. Twenty hosts, one broken CrowdSec, zero working sites.
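
A quick way to confirm the symptom from the panel host, using the sidecar address from the log excerpt below (substitute your own):

# 000 means nothing answered on the AppSec port; any real HTTP status means it is listening
curl -s -o /dev/null -w '%{http_code}\n' http://172.20.0.2:7423/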

Confirmed in the wild on a freshly-deployed v1.3.1:

caddy_error.log →  "crowdsec.appsec" "appsec component unavailable"
                   "dial tcp 172.20.0.2:7423: connect: connection refused"
curl app.example.com → HTTP 500

The fix

The plugin already exposes an appsec_fail_open knob on its app-level config (v0.12.1). v1.3.2 starts emitting it, defaulted to true. Requests now pass through to the backend when the AppSec sidecar is unreachable or errors; the inline WAF round-trip is simply skipped for that request.
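
With the fix, the emitted block gains the new field (same caveats as the sketch above: abbreviated, illustrative address):

{
  "apps": {
    "crowdsec": {
      "appsec_url": "http://crowdsec:7423/",
      "appsec_fail_open": true
    }
  }
}

Strict mode simply emits appsec_fail_open: false instead.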

Operators who actively run AppSec and want strict enforcement can flip the new AppSec → Fail policy card in the panel from "Fail-open (default, recommended)" to "Fail-closed (strict)".

Pieces shipped

  • caddycfg emits appsec_fail_open: true|false inside the apps.crowdsec block whenever appsec_url is set. Absent when AppSec is fully disabled so Caddy doesn't get an orphan field.
  • New setting appsec.fail_open (bool, default true) wired through api/settings.go + the reconciler.
  • New notification event appsec_unavailable — severity warning, fires on the reachable → unreachable transition of a background probe that hits the AppSec URL every 5 minutes. With appsec.fail_open true, traffic keeps flowing; the notification is how the operator learns that inline WAF inspection has silently degraded (see the probe sketch after this list).
  • New UI card AppSec → Fail policy with a two-radio chooser (fail-open vs fail-closed). Only shown when AppSec mode is not disabled.
  • New troubleshooting section in docs/operations/troubleshooting.md covering the "connection refused on :7423" symptom, the setup-appsec.sh runbook, and how to hook up the new notification rule.
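
For the curious, the probe's edge-trigger shape as a minimal Go sketch (probeAppSec, the notify callback, and the plain TCP dial are stand-ins for the real reconciler wiring, not the shipped code):

package main

import (
    "fmt"
    "net"
    "time"
)

// probeAppSec dials the AppSec endpoint on a fixed interval and calls
// notify exactly once per reachable -> unreachable transition, so a
// sidecar that stays down does not page the operator every 5 minutes.
func probeAppSec(addr string, notify func(event string)) {
    reachable := true // treat the endpoint as healthy until a probe fails
    for range time.Tick(5 * time.Minute) {
        conn, err := net.DialTimeout("tcp", addr, 3*time.Second)
        if err == nil {
            conn.Close()
        }
        up := err == nil
        if reachable && !up {
            notify("appsec_unavailable") // edge-triggered: fires on the transition only
        }
        reachable = up
    }
}

func main() {
    probeAppSec("crowdsec:7423", func(ev string) { fmt.Println("event:", ev) })
}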

What's NOT here

  • Automatic setup-appsec.sh execution: the panel does not try to install AppSec collections for you. Fail-open prevents breakage; getting AppSec actually running is still an operator task.
  • Block-mode auto-downgrade: the panel does not flip a host from block to detect when AppSec is down. Fail-open is per-request; the policy is at the bouncer level, not the host level.
  • AppSec health badge in the status card: out of scope for the hotfix. Today the signal is via the notification event + GET /api/appsec/status consumed by the card as before.

Migration

Drop-in. The new setting defaults to "true" without a DB migration (it's a key/value setting, read on demand; absence = default). Existing stacks pick up the fix on the next reconcile after deploying the new panel + caddy images.

cd argos-edge
git pull
docker compose build
docker compose up -d

No data touched. The appsec.fail_open setting surfaces in the UI immediately as "Fail-open (default)".
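
The no-migration claim boils down to a read like this (a sketch; appsecFailOpen and lookup are hypothetical names, the real accessor lives in api/settings.go):

package settings

// appsecFailOpen treats a missing row as the default rather than an
// error, which is why no DB migration is needed: the first explicit
// write only happens when an operator flips the Fail policy card.
func appsecFailOpen(lookup func(key string) (string, bool)) bool {
    if v, ok := lookup("appsec.fail_open"); ok {
        return v == "true"
    }
    return true // absence = fail-open
}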

Rollback

  • git checkout v1.3.1
  • docker compose build && docker compose up -d

The setting row may remain in settings on rollback — v1.3.1's settingWhitelist doesn't know about it, but the row is inert: the older reconciler never reads it. Harmless orphan, gone on the next v1.3.2+ upgrade.