# v1.3.36.2 -- Capture timing fix
A bugfix on top of v1.3.36.1 that closes the timing regression in which the capture spec fired `page.screenshot()` before the panel's async data fetches had completed. As a result, captures showed skeleton/loading states instead of populated data. This was most visible on dashboards and other surfaces with multiple post-mount API calls.
`argosVersion` and `frontend/package.json` deliberately stay at 1.3.35.4 (tooling-only change). `scripts/capture/package.json` bumps 1.3.36.1 → 1.3.36.2.
## Why
The operator's first capture session against prod (after the v1.3.36.1 auth fix) showed the dashboard cards captured while still displaying their loading skeletons. Cause: `capture.spec.js` had inconsistent per-test waits. Some tests ran `await page.waitForLoadState('networkidle', { timeout: 5_000 }).catch(() => {})` followed by `waitForTimeout(300)`, others had only the `waitForTimeout`, and others jumped straight to `page.screenshot()`. None of these waits was long enough for multi-card dashboards or the 250-row notification deliveries list.
## Fix
Two-part approach matching the operator's spec (Option C + Option D):
### Universal `waitForSettled()` helper
```js
async function waitForSettled(page, opts = {}) {
  const { timeout = 10_000, fallback = 3_000 } = opts;
  try {
    await page.waitForLoadState('networkidle', { timeout });
  } catch {
    // networkidle never reached (likely polling); fallback
    await page.waitForTimeout(fallback);
  }
}
```
Default behaviour: wait up to 10s for `networkidle`. If that times out (e.g. because continuous polling, such as the dashboard's 30s health-check interval, keeps the network non-idle), fall back to a further fixed 3s wait so async data has at least some chance to land.
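Both branches can be exercised without a browser by driving the helper with a hand-rolled stand-in for Playwright's `Page`. The sketch below copies the helper verbatim. `makeStubPage` is a hypothetical stub and is not part of `capture.spec.js`. It implements only the two methods the helper calls: it rejects `waitForLoadState()` to simulate a `networkidle` timeout, and it records `waitForTimeout()` delays instead of sleeping.

```js
// Copy of the waitForSettled() helper from capture.spec.js.
async function waitForSettled(page, opts = {}) {
  const { timeout = 10_000, fallback = 3_000 } = opts;
  try {
    await page.waitForLoadState('networkidle', { timeout });
  } catch {
    // networkidle never reached (likely polling); fallback
    await page.waitForTimeout(fallback);
  }
}

// Hypothetical stub standing in for a Playwright Page (illustration only).
// `idle` controls whether networkidle is "reached" before the timeout.
function makeStubPage({ idle }) {
  const calls = [];
  return {
    calls,
    async waitForLoadState(state, { timeout }) {
      calls.push(['waitForLoadState', state, timeout]);
      if (!idle) {
        // Simulate the timeout error raised when polling keeps the network busy.
        throw new Error(`Timeout ${timeout}ms exceeded waiting for ${state}`);
      }
    },
    async waitForTimeout(ms) {
      calls.push(['waitForTimeout', ms]); // record the delay instead of sleeping
    },
  };
}

async function demo() {
  const quiet = makeStubPage({ idle: true });
  await waitForSettled(quiet);
  console.log(quiet.calls); // only the networkidle wait, no fallback

  const polling = makeStubPage({ idle: false });
  await waitForSettled(polling);
  console.log(polling.calls); // networkidle wait, then the 3s fallback
}

demo();
```

Because the fallback starts only after the `networkidle` wait has given up, the worst case per call with the defaults is `timeout + fallback`, roughly 13s.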
29 `waitForSettled()` invocations replace the prior inconsistent wait patterns. Coverage: every `page.goto()` that triggers a data fetch is now followed by `waitForSettled()`. The 4 `page.goto()` calls without a matching settle (33 gotos vs 29 settles) are each followed by a `waitForSelector()` for a specific element, which is itself a settle signal.
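The coverage rule (each `page.goto()` is followed by either `waitForSettled()` or a `waitForSelector()` for a specific element) lends itself to a static check. This is a minimal sketch. It assumes each wait sits on the next non-blank line after its `goto`. `findUnsettledGotos` is a hypothetical function and is not claimed to be part of the smoke script.

```js
// Hypothetical static check: return 1-based line numbers of page.goto() calls
// whose next non-blank line is neither waitForSettled() nor page.waitForSelector().
function findUnsettledGotos(source) {
  const lines = source.split('\n');
  const offenders = [];
  lines.forEach((line, i) => {
    if (!/\bpage\.goto\(/.test(line)) return;
    // Skip blank lines to find the statement that follows the goto.
    let j = i + 1;
    while (j < lines.length && lines[j].trim() === '') j++;
    const next = j < lines.length ? lines[j] : '';
    const settled =
      /\bwaitForSettled\(/.test(next) || /\bpage\.waitForSelector\(/.test(next);
    if (!settled) offenders.push(i + 1);
  });
  return offenders;
}

const sample = [
  "await page.goto('/dashboard');",
  'await waitForSettled(page);',
  "await page.goto('/hosts');",
  "await page.waitForSelector('table tbody tr');",
  "await page.goto('/backups');",
  "await page.screenshot({ path: 'backups-list.png' });",
].join('\n');

console.log(findUnsettledGotos(sample)); // [ 5 ]
```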
### Per-surface explicit selectors for the slow listings
These cover the surfaces hypothesized to be slow, taken from the operator's PHASE 0 list. The spec could not be run from this environment without prod credentials, so the operator's runtime observations were trusted:
| Surface | Extra wait selector |
|---|---|
| `security-banned` | `table tbody tr` (CrowdSec decisions) |
| `security-activity` | `table tbody tr, [role="row"]` (audit log) |
| `security-scenarios` | `table tbody tr` (already had this; preserved) |
| `security-overview` | `table tbody tr, [role="row"]` (per-host KPIs) |
| `threats-decisions` | `table tbody tr` (CrowdSec LAPI) |
| `notifications-deliveries` | `table tbody tr` (~250 rows in demo) |
| `logs-browser` | `table tbody tr, [role="grid"] [role="row"]` |
| `backups-list` | `table tbody tr` |
| `hosts-list-auth-column` | `table tbody tr` (multiple hosts) |
Each row-selector wait ends in `.catch(() => {})`, so empty states ("no banned IPs") still capture cleanly instead of failing the test. The helper's wait runs first. The row-selector wait then gives rows up to 5s more to appear. If none do (an empty state), its timeout is swallowed and the capture proceeds.
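Per surface, the pattern is: settle first, then give the rows a bounded extra window, and swallow the timeout so empty states still get a screenshot. This is a minimal sketch assuming a Playwright `page`. `captureListing` and `LIST_SELECTORS` are hypothetical names. The settle helper is passed in as a parameter so the block stands alone. The real spec uses its own screenshot helpers (`shotFull`/`shotFullScroll`), not a bare `page.screenshot()`.

```js
// Row selectors per surface, transcribed from the table above.
const LIST_SELECTORS = {
  'security-banned': 'table tbody tr',
  'security-activity': 'table tbody tr, [role="row"]',
  'security-scenarios': 'table tbody tr',
  'security-overview': 'table tbody tr, [role="row"]',
  'threats-decisions': 'table tbody tr',
  'notifications-deliveries': 'table tbody tr',
  'logs-browser': 'table tbody tr, [role="grid"] [role="row"]',
  'backups-list': 'table tbody tr',
  'hosts-list-auth-column': 'table tbody tr',
};

// Hypothetical helper: settle, then give rows up to `rowTimeout` ms to appear.
// `settle` is the waitForSettled() helper, injected to keep the sketch standalone.
// e.g. await captureListing(page, 'notifications-deliveries', waitForSettled);
async function captureListing(page, surface, settle, { rowTimeout = 5_000 } = {}) {
  await settle(page);
  const selector = LIST_SELECTORS[surface];
  if (selector) {
    // Empty states never match; swallow the timeout and capture anyway.
    await page.waitForSelector(selector, { timeout: rowTimeout }).catch(() => {});
  }
  await page.screenshot({ path: `${surface}.png` });
}
```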
### Dashboard-specific extra render time
`dashboard-overview.png` and `dashboard-security.png` get an extra 800ms `waitForTimeout` after `waitForSettled()` so chart libraries (sparklines, world map) can finish their own render frames. Charts often paint via `requestAnimationFrame` after the data arrives, which `networkidle` cannot detect.
### `appsec-metrics` extra time too
The `/appsec` Metrics sub-tab uses recharts, which has the same post-data render pattern, so its extra wait is also bumped to 800ms.
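The extra render time is a fixed wait layered on top of the settle and applied only to chart-heavy captures. This is a minimal sketch using the 800ms figure given above. `RENDER_EXTRA_MS`, `captureChart`, and the injected `settle` parameter are hypothetical names, not taken from the spec. A fixed `waitForTimeout()` is used because no network signal marks the end of `requestAnimationFrame`-driven painting.

```js
// Extra post-settle wait for captures whose charts paint after data arrives
// (sparklines, world map, recharts). Values from these release notes.
const RENDER_EXTRA_MS = {
  'dashboard-overview': 800,
  'dashboard-security': 800,
  'appsec-metrics': 800,
};

// Hypothetical wrapper: settle first, then give chart libraries their extra
// render frames before the screenshot. Unlisted captures get no extra wait.
async function captureChart(page, name, settle) {
  await settle(page);
  const extra = RENDER_EXTRA_MS[name] ?? 0;
  if (extra > 0) await page.waitForTimeout(extra);
  await page.screenshot({ path: `${name}.png` });
}
```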
## Smoke phase 9
`scripts/smoke/capture-automation.sh` gains five new asserts under phase 9:
```text
9. waitForSettled helper (timing fix):
   - waitForSettled() helper defined
   - uses networkidle as primary
   - has fallback timeout branch
   - waitForSettled invocation count >= 20
     (page.goto: 33; waitForSettled: 29)
   - no leftover pre-v1.3.36.2 inline 'waitForLoadState
     networkidle 5_000 .catch' patterns
```
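The smoke script is shell, but the five checks amount to plain text matching over `capture.spec.js`. The following rough Node re-creation is for illustration only: `checkPhase9` and its regular expressions are assumptions and need not match the script's actual patterns. Each returned key corresponds to one of the five asserts listed above.

```js
// Hypothetical re-creation of the five phase 9 static checks.
// `src` is the text of scripts/capture/capture.spec.js, e.g.
// checkPhase9(require('fs').readFileSync('scripts/capture/capture.spec.js', 'utf8'))
function checkPhase9(src) {
  const count = (re) => (src.match(re) || []).length;
  // Count call sites only: subtract the helper's own definition.
  const settleCalls =
    count(/\bwaitForSettled\(/g) - count(/\bfunction\s+waitForSettled\(/g);
  return {
    helperDefined: /async\s+function\s+waitForSettled\s*\(/.test(src),
    usesNetworkidle: /waitForLoadState\(\s*'networkidle'/.test(src),
    // A catch block that falls back to waitForTimeout().
    hasFallback: /catch\s*(\([^)]*\))?\s*\{[^}]*waitForTimeout\(/.test(src),
    enoughInvocations: settleCalls >= 20,
    // The pre-v1.3.36.2 inline pattern must not reappear.
    noLegacyPattern:
      !/waitForLoadState\(\s*'networkidle',\s*\{\s*timeout:\s*5_000\s*\}\s*\)\.catch/.test(src),
  };
}
```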
A full end-to-end timing test (a mock page with simulated network activity, asserting that the helper resolves at the right moment) would require either a live Playwright browser or a hand-rolled `Page` mock. The static asserts cover the most common regression vectors: the helper going missing, the helper changing shape, or the old inline pattern leaking back in.
## Live evidence (post-fix smoke)
```text
phase 1: run.sh refuses without .env... PASS
phase 2: .env is git check-ignore'd... PASS
phase 3: .env.example placeholders only... PASS
phase 4: safeClick synthetic test... PASS (13/13)
phase 5: working tree unchanged by smoke... PASS
phase 6: storageState wiring (v1.3.36.1)... PASS (5/5)
phase 7: banner output uses fs.readFileSync... PASS
phase 8: viewport 1440x1080 + shotFullScroll... PASS
phase 9: waitForSettled helper (timing fix)... PASS (5/5)
page.goto calls: 33; waitForSettled calls: 29
shotFullScroll calls: 15; shotFull calls: 21
```
## Files changed
- `scripts/capture/capture.spec.js` (`waitForSettled` helper + 29 invocations replacing the inconsistent prior waits + per-surface row-selector waits for 9 long-list surfaces + dashboard chart-render extra time)
- `scripts/capture/package.json` (1.3.36.1 → 1.3.36.2)
- `scripts/smoke/capture-automation.sh` (phase 9 added)
- `docs/release-notes/v1.3.36.2.md` (this file)
- `CHANGELOG.md`, `mkdocs.yml`
NOT changed: `argosVersion` and the `frontend/package.json` version both stay at 1.3.35.4. No Go code, no frontend code, no panel binary change.
## Operator workflow post-fix
```bash
cd ~/argos-edge && git pull
scripts/capture/run.sh
# Verify post-fix:
# - dashboard-overview.png: real cards/charts populated, NO
#   loading skeletons
# - security-banned.png + notifications-deliveries.png: row
#   tables fully rendered, no "loading..." placeholders
# - logs-browser.png + security-activity.png: same
# - Total runtime: ~3-5 minutes (vs ~15s pre-fix; the prior
#   sub-second screenshots WERE the bug).
```
If you re-run and notice a specific surface still captures mid-load, paste its name into the next iteration and we can add a more targeted waitForSelector for that surface's key data element.
## Versioning
`scripts/capture/package.json`: 1.3.36.1 → 1.3.36.2. This follows the tag-without-rebuild precedent for tooling-only patches: v1.3.27.1, v1.3.34, v1.3.35.1, v1.3.35.5.