v1.3.36.2 -- Capture timing fix

A bugfix on top of v1.3.36.1 closing the timing regression where the capture spec fired page.screenshot() before the panel's async data fetches had completed. Result: captures showed skeleton/loading states instead of populated data — most visible on dashboards and surfaces with multiple post-mount API calls.

argosVersion and frontend/package.json deliberately stay at 1.3.35.4 (tooling-only). scripts/capture/package.json bumps 1.3.36.1 → 1.3.36.2.

Why

Operator's first capture session against prod (post-v1.3.36.1 auth fix) revealed the dashboard cards were captured while still in their loading-skeleton state. Cause: capture.spec.js had inconsistent per-test waits: some tests had await page.waitForLoadState('networkidle', { timeout: 5_000 }).catch(() => {}) followed by a waitForTimeout(300), others had only the waitForTimeout, and others jumped straight to page.screenshot(). None of those waits were enough for multi-card dashboards or the ~250-row notifications-deliveries list.

Fix

Two-part approach matching the operator's spec (Option C + Option D):

Universal waitForSettled() helper

async function waitForSettled(page, opts = {}) {
  const { timeout = 10_000, fallback = 3_000 } = opts;
  try {
    await page.waitForLoadState('networkidle', { timeout });
  } catch {
    // networkidle never reached (likely polling); fallback
    await page.waitForTimeout(fallback);
  }
}

Default behaviour: try networkidle for 10s; if that times out (e.g. when continuous polling such as the dashboard's 30s health-check interval keeps the network from ever going idle), fall back to a 3s fixed wait so async data has at least some chance to land.

29 waitForSettled() invocations replace the prior inconsistent wait patterns. Coverage: every page.goto() that leads into a data fetch is now followed by waitForSettled. (The gap between 33 gotos and 29 settles is the four gotos that are instead followed by a waitForSelector for a specific element, which is itself a settle signal.)
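For orientation, a typical rewritten test body would now look roughly like this (test title, route, and screenshot path are illustrative, not taken from capture.spec.js; waitForSettled is the helper above):

const { test } = require('@playwright/test');

test('dashboard-overview', async ({ page }) => {
  await page.goto('/');                 // navigate to the surface (route illustrative)
  await waitForSettled(page);           // networkidle, or the 3s fallback under polling
  await page.screenshot({ path: 'dashboard-overview.png', fullPage: true });
});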

Per-surface explicit selectors for the slow listings

For the surfaces hypothesized as slow (the operator's PHASE 0 list; the spec couldn't be run from this environment without prod creds, so the operator's runtime observations were trusted):

Surface                    Extra wait selector
security-banned            table tbody tr                               (CrowdSec decisions)
security-activity          table tbody tr, [role="row"]                 (audit log)
security-scenarios         table tbody tr                               (already had this; preserved)
security-overview          table tbody tr, [role="row"]                 (per-host KPIs)
threats-decisions          table tbody tr                               (CrowdSec LAPI)
notifications-deliveries   table tbody tr                               (~250 rows in demo)
logs-browser               table tbody tr, [role="grid"] [role="row"]
backups-list               table tbody tr
hosts-list-auth-column     table tbody tr                               (multiple hosts)

Each uses .catch(() => {}) after the row-selector wait so empty states ("no banned IPs") still capture cleanly without failing the test: the helper's wait runs first, then the row-selector wait adds up to a 5s top-up, which the catch swallows if no rows ever arrive.
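A sketch of that layered wait for one of these surfaces, assuming the structure described above (route and exact ordering are illustrative):

// security-banned: settle first, then top up by waiting for table rows.
// An empty table ("no banned IPs") just exhausts the 5s and still captures.
await page.goto('/security/banned');    // route illustrative
await waitForSettled(page);
await page.waitForSelector('table tbody tr', { timeout: 5_000 }).catch(() => {});
await page.screenshot({ path: 'security-banned.png', fullPage: true });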

Dashboard-specific extra render time

dashboard-overview.png and dashboard-security.png get an extra 800ms waitForTimeout AFTER waitForSettled() to let chart libraries (sparklines, world map) finish their own render frames. Charts often paint via requestAnimationFrame post-data-arrival, which is invisible to networkidle.

appsec-metrics extra time too

The /appsec Metrics sub-tab uses recharts which has the same post-data render pattern; bumped to 800ms.
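The chart-surface pattern, sketched (placement relative to any selector waits is illustrative):

await waitForSettled(page);
// Charts (sparklines, world map, recharts) paint via requestAnimationFrame
// after data arrives, which networkidle cannot observe; give them a beat.
await page.waitForTimeout(800);
await page.screenshot({ path: 'dashboard-overview.png', fullPage: true });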

Smoke phase 9

scripts/smoke/capture-automation.sh gains five new asserts under phase 9:

9. waitForSettled helper (timing fix):
   - waitForSettled() helper defined
   - uses networkidle as primary
   - has fallback timeout branch
   - waitForSettled invocation count >= 20
     (page.goto: 33; waitForSettled: 29)
   - no leftover pre-v1.3.36.2 inline 'waitForLoadState
     networkidle 5_000 .catch' patterns

Full e2e timing test (mock page with simulated network activity → assert helper resolves at the right moment) would require either a live Playwright browser or a hand-rolled Page mock; the static asserts cover the most common regression vectors (helper missing, helper wrong shape, old inline pattern leaks back in).
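For illustration, a hand-rolled Page mock exercising just the fallback branch could look like this (hypothetical; no such test exists in the repo, and it assumes the waitForSettled helper above is in scope):

// Fake page where networkidle never arrives, forcing the fallback path.
async function checkFallbackBranch() {
  const waits = [];
  const fakePage = {
    waitForLoadState: async () => { throw new Error('Timeout: networkidle'); },
    waitForTimeout: async (ms) => { waits.push(ms); },
  };
  await waitForSettled(fakePage, { timeout: 10, fallback: 50 });
  console.assert(waits.includes(50), 'fallback wait should fire when networkidle times out');
}
checkFallbackBranch();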

Live evidence (post-fix smoke)

phase 1: run.sh refuses without .env...                PASS
phase 2: .env is git check-ignore'd...                 PASS
phase 3: .env.example placeholders only...             PASS
phase 4: safeClick synthetic test...                   PASS (13/13)
phase 5: working tree unchanged by smoke...            PASS
phase 6: storageState wiring (v1.3.36.1)...            PASS (5/5)
phase 7: banner output uses fs.readFileSync...         PASS
phase 8: viewport 1440x1080 + shotFullScroll...        PASS
phase 9: waitForSettled helper (timing fix)...         PASS (5/5)
  page.goto calls: 33; waitForSettled calls: 29
  shotFullScroll calls: 15; shotFull calls: 21

Files changed

  • scripts/capture/capture.spec.js (waitForSettled helper + 29 invocations replacing inconsistent prior waits + per-surface row-selector waits for 9 long-list surfaces + dashboard chart-render extra time)
  • scripts/capture/package.json (1.3.36.1 → 1.3.36.2)
  • scripts/smoke/capture-automation.sh (phase 9 added)
  • docs/release-notes/v1.3.36.2.md (this file)
  • CHANGELOG.md, mkdocs.yml

NOT changed: argosVersion stays at 1.3.35.4, frontend/package.json version stays at 1.3.35.4. No Go code; no frontend code; no panel binary change.

Operator workflow post-fix

cd ~/argos-edge && git pull
scripts/capture/run.sh

# Verify post-fix:
# - dashboard-overview.png: real cards/charts populated, NO
#   loading skeletons
# - security-banned.png + notifications-deliveries.png: row
#   tables fully rendered, no "loading..." placeholders
# - logs-browser.png + security-activity.png: same
# - Total runtime: ~3-5 minutes (vs ~15s pre-fix; the prior
#   sub-second screenshots WERE the bug).

If you re-run and notice a specific surface still captures mid-load, paste its name into the next iteration and we can add a more targeted waitForSelector for that surface's key data element.

Versioning

scripts/capture/package.json 1.3.36.1 → 1.3.36.2. Tag-without-rebuild precedent for tooling-only patches: v1.3.27.1, v1.3.34, v1.3.35.1, v1.3.35.5.