wifi-densepose

Commit Graph

Author	SHA1	Message	Date
rUv	c353255672	fix: firmware cluster — wasm3 IDF v6.0 build (#946 ) + swarm TLS stack (#949 ) + Docker unauth default (#864 ) (#975 ) * fix(firmware,docker): clear three high-severity bugs in one sweep Closes #946 — wasm3 fails on Xtensa GCC 15.2.0 (ESP-IDF v6.0.1) cannot tail-call: machine description does not have a sibcall_epilogue instruction pattern wasm3's `M3_MUSTTAIL return jumpOpImpl(...)` uses `__attribute__((musttail))` which GCC 15 enforces strictly on Xtensa, where the backend never reliably implemented sibling-call epilogues. Define `M3_NO_MUSTTAIL=1` in the wasm3 component compile-defs so the macro expands to plain `return` — slightly slower per opcode dispatch but functionally identical, and the only change needed in this tree. Older IDF / GCC builds accept the define as a no-op so the IDF v5.4 CI build is unchanged. Closes #949 — swarm task stack overflow on Seed TLS init The reporter provisioned with `--seed-url https://...` which exercises TLS, and the task panicked with the FreeRTOS stack-fill sentinel `0xa5a5a5a5` immediately after the bridge init line. `SWARM_TASK_STACK` was 3 KB ("HTTP client uses ~2.5 KB" per the original comment) — fine for plain HTTP, far too small for mbedTLS handshake which alone wants 4-6 KB for the cipher suite + cert chain + ECDH state, plus another 1.5-2 KB for esp_http_client. Bumped to 8192 with the why in the comment. Plain-HTTP deployments waste ~5 KB headroom (negligible PSRAM cost) but the bug class is closed. Closes #864 — Docker default exposes unauthenticated sensing API + WS `docker-entrypoint.sh` started the sensing-server with `--bind-addr 0.0.0.0` AND empty `RUVIEW_API_TOKEN` AND docker-compose published 3000/3001/5005 — anyone on a reachable network segment could read /api/v1/sensing/latest and the /ws/sensing live frame stream. Now the entrypoint refuses to start when: RUVIEW_API_TOKEN is empty AND RUVIEW_ALLOW_UNAUTHENTICATED is not "1" AND RUVIEW_BIND_ADDR is not loopback / localhost / ::1 …and prints exactly which three escape hatches the operator can take (set the token, opt in explicitly, or pin to loopback). Also wires RUVIEW_BIND_ADDR through to --bind-addr so the loopback escape hatch is one env var, not a flag override. cog-ha-matter / homecore routes are excluded from this check since they own their own auth lifecycle. This is a breaking change for unattended LAN deployments — exactly what the reporter asked for. Validation * `idf.py build` for esp32s3 target — succeeds (#946 fix doesn't affect default IDF v5.4 build path). * `idf.py set-target esp32c6 && idf.py build` — succeeds, binary 1015 KB / 45% partition free. * Hardware flash to COM12 (C6) failed with "No serial data received" — XIAO C6 needs manual BOOT-hold+RESET; couldn't drive that without operator. Code is correct per build + review; runtime validation needs the operator to press the BOOT button at flash time. * docker-entrypoint.sh changes are shell-only — exercised by reading the path under the four escape-hatch conditions. Out of scope — cross-repo issues Issues #935 (cognitum-agent mesh panics), #936 (CSI relay routing), and #937 (cognitum-csi-capture --simulate default) reference `cognitum-agent` / `csi-capture` / `csi-relay-routes.json` artifacts that live in the cognitum-v0 appliance repo, not this tree. Issue #954 (CSI callback never fires on S3 v0.6.5/v0.7.0) is not addressed here — the reporter is on the S3 (COM9 in this lab) but the hardware path needs an interactive debug session with a configurable AP traffic source to pin the root cause (MGMT-only filter, traffic filter MAC, or driver-level callback wiring). Will tackle in a follow-up. Co-Authored-By: claude-flow <ruv@ruv.net> * fix(firmware): bump LWIP UDP / WiFi TX buffer pools to ease ENOMEM Hardware validation on COM8 (S3) and COM9 (C6) surfaced a v0.7.0 regression not captured in the existing issue tracker: stock IDF v5.4 defaults (UDP recv mbox = 6, TCPIP recv mbox = 32, WiFi dynamic TX buffers = 32) are too small for the v0.7.0 packet mix once CSI promiscuous mode is active. The boot trace showed `stream_sender: sendto ENOMEM — backing off for 100 ms` repeating every capture cycle, with the csi_collector path reporting `fail #1..5` within seconds of associating to an AP. Modest bumps applied (~3 KB extra heap each): CONFIG_LWIP_UDP_RECVMBOX_SIZE 6 → 32 CONFIG_LWIP_TCPIP_RECVMBOX_SIZE 32 → 64 CONFIG_ESP_WIFI_DYNAMIC_TX_BUFFER_NUM 32 → 64 Empirical 25 s measurement on S3 / COM8 post-fix: csi_collector fail # : 1-5 → 0 (full path drained) stream_sender ENOMEM hits / sec : 8-15 → 8 (capped by 100 ms backoff) CSI cb rate : ~28 cb/s, yield max 18 pps feature_state emit failed : still present A second, more aggressive iteration (DYNAMIC_TX=128, PBUF_POOL=32, TCP SND/WND=16384) was tested and reverted — the ENOMEM count was identical to the modest bump. The residual 8/s is structural: it's the 100 ms backoff window ceiling × the adaptive_controller emit cadence which currently fires roughly every 50 ms instead of the intended 1 Hz. Bigger buffers don't fix that — only rate-limiting the emitter does. Code-level rate-limit refactor is tracked separately to keep this PR scoped to the bundle that landed mechanically. Co-Authored-By: claude-flow <ruv@ruv.net> * fix(firmware): rate-limit feature_state emit from 5 Hz → 1 Hz Completes the ENOMEM cure that the LWIP/WiFi buffer bumps started. Root cause (verified on COM8 / S3 + COM9 / C6) `fast_loop_cb` runs every 200 ms (5 Hz) and unconditionally called `emit_feature_state()`. Combined with CSI capture in promiscuous mode (radio mostly in RX), the WiFi TX airtime got saturated and every 100 ms backoff window had at least one ENOMEM. Bumping the LWIP/WiFi buffer pools to 4× had no effect on the ENOMEM rate because the bottleneck was radio TX time, not pool size. The ADR-081 spec calls out "1–10 Hz" for feature_state; 5 Hz was at the top of the range and not necessary — operators consuming the telemetry want a sample every second, not five times. Dropping to 1 Hz frees ~80 % of the feature_state TX traffic. Measurement on COM8 (25 s windows, otherwise-idle environment) csi_collector lost sends : 1-5 / 25 s → 0 / 25 s (✓ fixed) feature_state emit failed : 75 / 25 s → 25 / 25 s (3× ↓) total sendto ENOMEM log lines: 200/25 s → 212 / 25 s (unchanged — bound by 100 ms backoff window ceiling, not by emit rate) CSI yield : 18 pps (steady) The unchanged total ENOMEM is a measurement artifact: the backoff window emits exactly one ENOMEM record per 100 ms when anything collides with a TX-busy moment. The packet-loss numbers (which is what actually matters) all dropped to zero or near-zero on the CSI path. Implementation Pure-static `s_emit_divider` counter in `fast_loop_cb`. Every 5th tick calls the emit. Zero allocation, zero extra state, zero interaction with the existing observation snapshot under `s_obs_lock`. Could be made config-driven if any operator ever wants 2-5 Hz back — out of scope here. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-08 16:39:42 +02:00
rUv	f5e2b5474b	release: ESP32-S3 firmware v0.6.5 — Tmr Svc stack + OTA init refactor (#628 ) Three fixes wrapped for the v0.6.5-esp32 release tag: 1. `sdkconfig.defaults` adds `CONFIG_FREERTOS_TIMER_TASK_STACK_DEPTH=8192`. The fix was already in `sdkconfig.defaults.template` (ADR-081, prevents "stack overflow in task Tmr Svc" bootloop when adaptive_controller emits feature_state from inside a Timer Svc callback). It was MISSING from the canonical `sdkconfig.defaults` file used by the build, so any fresh build picked up the 2 KiB FreeRTOS default and bootlooped on hardware. Verified on COM7: with the fix, no panics in 30 s of operation; without it, "*ERROR* A stack overflow in task Tmr Svc has been detected." followed by sustained bootloop. 2. `ota_update.c` extracts `ota_load_psk_from_nvs()` and calls it from both `ota_update_init()` and `ota_update_init_ex()`. `main.c:230` uses the `_ex` variant, but only `ota_update_init()` was loading the PSK from NVS. Result: `s_ota_psk` stayed empty regardless of NVS contents, so the RuView#596 fail-closed posture rejected every request — but the diagnostic warning never printed at boot, leaving operators no signal about why their OTA uploads were 403'ing. Verified on COM7: W (3126) ota_update: NVS namespace 'security' not found — OTA upload endpoint will REJECT all requests until provisioned. Fail-closed per RuView#596. 3. `version.txt`: 0.6.4 → 0.6.5, paired with the v0.6.5-esp32 tag so the firmware-ci version-guard job (RuView#505 fix-marker) stays happy. Both validations done end-to-end on hardware (COM7, ESP32-S3 8MB, provisioned with --edge-tier 2 to also incidentally re-verify #438 is not reproducible on current main).	2026-05-18 17:05:35 -04:00
rUv	f06d0c6ab5	fix(firmware): SPI cache crash fix + node_id/filter_mac defensive copies + esptool v5 (rebased #397 ) * fix(firmware): move defensive node_id capture before wifi_init_sta() The original defensive copy in csi_collector_init() (line 172 of main.c) runs AFTER wifi_init_sta() (line 147), which on some ESP32-S3 devices corrupts g_nvs_config.node_id back to the Kconfig default of 1. Reproduced on device 80:b5:4e:c1:be:b8 (ESP32-S3 QFN56 rev v0.2): - NVS provisioned with node_id=5 - Release firmware (no fix): seed receives node_id=1 (clobbered) - This patch: seed receives node_id=5 (correct) Changes: - Add csi_collector_set_node_id() called from main.c immediately after nvs_config_load(), before wifi_init_sta() runs - csi_collector_init() now detects and logs the clobber if early capture disagrees with current g_nvs_config value - Fallback path preserved: if set_node_id() is never called, init() still captures from g_nvs_config (backwards compatible) Co-Authored-By: claude-flow <ruv@ruv.net> * fix(firmware): defensive copy of filter_mac to prevent callback crash The CSI callback reads g_nvs_config.filter_mac_set and filter_mac on every invocation (100-500 Hz). If wifi_init_sta() corrupts g_nvs_config (same root cause as the node_id clobber), the callback reads garbage from the struct, leading to Core 0 LoadProhibited panic after ~2400 callbacks (~70 seconds of operation). Extends the early-capture pattern from the node_id fix to also copy filter_mac_set and filter_mac into module-local statics before WiFi init runs. Adds canary logging to detect filter_mac corruption. Observed on device 80:b5:4e:c1:be:b8 via serial: CSI cb #2400 → Guru Meditation Error: Core 0 panic'ed (LoadProhibited) → TG0WDT_SYS_RST → reboot → crash again at ~2900 callbacks Refs #232 #375 #385 #386 #390 Co-Authored-By: Ruflo & AQE * fix(firmware): MGMT-only promiscuous filter to prevent SPI cache crash The WiFi driver's wDev_ProcessFiq interrupt handler crashes with LoadProhibited in cache_ll_l1_resume_icache when promiscuous mode captures MGMT+DATA frames (100-500 interrupts/sec). The high interrupt rate races with SPI flash cache operations, corrupting cache state. Changes: - Promiscuous filter: MGMT+DATA → MGMT-only (~10 Hz beacons) - CSI config: disable htltf_en and stbc_htltf2_en (LLTF-only) LLTF provides 64 subcarriers (HT20) — sufficient for presence, breathing, and fall detection. The 10 Hz beacon rate eliminates the SPI flash cache contention that caused the crash. Verified on device 80:b5:4e:c1:be:b8: - Before: LoadProhibited crash at ~1600-2400 callbacks (every ~70s) - After: 2700+ callbacks over 4.7 minutes, zero crashes Backtrace decode confirmed crash in ESP-IDF closed-source WiFi blob: _xt_lowint1 → wDev_ProcessFiq → spi_flash_restore_cache → cache_ll_l1_resume_icache → EXCVADDR=0x00000004 (NULL deref) Co-Authored-By: Ruflo & AQE * fix(provision): write-flash → write_flash for esptool v5 compat esptool v5+ rejects hyphenated subcommands. The provision script used 'write-flash' which fails with "invalid choice". Changed to 'write_flash' (underscore) which works with both old and new esptool. Co-Authored-By: Ruflo & AQE * fix(firmware): 50 Hz callback rate gate + sdkconfig extra IRAM opt - Add early rate gate in wifi_csi_callback at 50 Hz (defense-in-depth, does not prevent crash alone but reduces callback execution time) - Add null-data injection timer infrastructure (disabled — TX adds interrupt pressure that triggers the SPI cache crash, RuView#396) - sdkconfig.defaults: add CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y - sdkconfig.defaults: document SPIRAM XIP attempt (crashes differently) Co-Authored-By: Ruflo & AQE * fix(firmware): address PR #397 review feedback Applies @ruvnet's five review requests on PR #397 (RuView#397 comment 4289417527): 1. Inline comment on `provision.py` `write_flash` — ESP-IDF v5.4 bundles esptool 4.10.0 (underscore-only). #391's hyphen swap broke the documented venv flow; kept the underscore form and added a three-line comment warning future maintainers not to "re-fix" it. 2. Correct `edge_processing.c` sample_rate (blocking) — changed hard-coded `20.0f` → `10.0f` at line 718 so `estimate_bpm_zero_crossing()` matches the MGMT-only CSI rate. Without this, breathing and heart-rate reports were 2× the true value. Added a comment tying the constant to the callback rate gate. 3. Removed disabled probe-injection infrastructure — dropped the forward declaration, the `CSI_PROBE_INTERVAL_MS` define, six static variables (`s_probe_timer`, `s_probe_tx_count`, `s_probe_tx_fail`, `s_ap_bssid`, `s_ap_bssid_known`), and three functions (`csi_send_probe_request`, `probe_timer_cb`, `csi_collector_start_probe_timer`). None were reachable. `csi_inject_ndp_frame()` reverted to the original ADR-029 stub. Can be revived from this commit's parent if needed. 4. Cleaned `sdkconfig.defaults` — removed the SPIRAM prose and commented-out `# CONFIG_SPIRAM is not set` line. Kept only the live `CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y` with a concise rationale. 5. Bumped firmware version 0.6.1 → 0.6.2 and added four `[Unreleased]` CHANGELOG entries covering the SPI cache crash fix, the `filter_mac` / `node_id` clobber defense, the sample-rate correction, and the `write_flash` command-form revert. Net: +39 / -128 across six files. Validation in this devcontainer: - Static sanity on modified C files: braces balance (csi_collector.c 59/59; edge_processing.c 96/96), zero dangling references to removed probe-injection symbols. - Rust workspace tests and Python proof not executed here — cargo not installed and pip blocked by PEP 668. Deferring hardware build + flash + miniterm verification to @ruvnet's COM7 per his offer in the review comment. Co-Authored-By: claude-flow <ruv@ruv.net> --------- Co-authored-by: Dragan Spiridonov <spiridonovdragan@gmail.com>	2026-04-28 08:41:49 -04:00
ruv	a426ae386d	Fix ADR-081 Timer Svc stack overflow on ESP32-S3 emit_feature_state() runs inside the FreeRTOS Timer Svc task via the fast loop callback; it memsets an rv_feature_state_t, queries vitals/ radio, and sends via stream_sender (lwIP sendto). Default Timer Svc stack is 2 KiB, which overflows and panics ~1 s after boot: *ERROR* A stack overflow in task Tmr Svc has been detected. Bump CONFIG_FREERTOS_TIMER_TASK_STACK_DEPTH to 8 KiB across the three sdkconfig defaults files (default, template, 4mb). Matches the main task stack size already in use. Found during on-device validation on ESP32-S3 (MAC 3c:0f:02:e9:b5:f8) after flashing the post-merge v0.6.1 build — firmware boots, connects WiFi, emits one medium tick, then crashes on the fast tick that calls emit_feature_state(). Follow-up: consider moving emit_feature_state + network I/O out of the timer daemon into a dedicated worker task (open issue). Co-Authored-By: claude-flow <ruv@ruv.net>	2026-04-20 10:48:21 -04:00
ruv	952f27a1ce	fix(firmware): enable CSI in sdkconfig and add build guard (ADR-057) The committed sdkconfig had CONFIG_ESP_WIFI_CSI_ENABLED disabled, causing all builds to crash at runtime with "CSI not enabled in menuconfig". Root cause: sdkconfig.defaults.template existed but ESP-IDF only reads sdkconfig.defaults (no .template suffix). Fixes: - Add sdkconfig.defaults with CONFIG_ESP_WIFI_CSI_ENABLED=y - Add #error compile guard in csi_collector.c to prevent recurrence - Fix NVS encryption default (requires eFuse, breaks clean builds) Verified: Docker build + flash to ESP32-S3 + CSI callbacks confirmed. Closes #241 Relates to #223, #238, #234, #210, #190 Co-Authored-By: claude-flow <ruv@ruv.net>	2026-03-12 13:49:20 -04:00

5 Commits