WITNESS-LOG-110 prior state had byte 19 bit 4 (cross-node sync valid)
only being set from c6_timesync_is_valid() — but c6_timesync is the
802.15.4 path that D1 documented as unfixable in IDF v5.4 (rx=0 across
every soak we've run). The working transport is c6_sync_espnow (§A0.7,
§A0.10: 99.43-99.56% RX cross-board, 104 µs smoothed-offset stdev),
yet frames from sync'd nodes had bit 4 cleared because the ESP-NOW
path didn't OR into the flag.
Fix: also set bit 4 when c6_sync_espnow_is_valid() — the OR semantic
means a node signals sync from whichever transport is healthy. Host
sees bit 4 set, knows to pair the frame against the most recent sync
packet (§A0.12) from this node_id.
Side effect: this also enables S3 boards to set bit 4 (c6_sync_espnow
works on both targets, c6_timesync is C6-only). So a multi-target
mesh of S3+C6 boards now correctly signals cross-node alignment
regardless of which chips are in the fleet.
Build evidence: C6 image 1019 KB (+16 bytes for the new check),
45% slack unchanged.
Co-Authored-By: claude-flow <ruv@ruv.net>
Bundles the iter 8 + iter 9 sync-packet work (§A0.11 + §A0.12) into a
shipped release. v0.6.8 didn't carry the sync emission; v0.6.9 closes
the loop.
What ships:
- csi_collector emits a 32-byte UDP sync packet (magic 0xC511A110)
every CONFIG_C6_SYNC_EVERY_N_FRAMES CSI callbacks (default 20).
- New Kconfig knob lets operators tune cadence from ~0.1 Hz (N=1000)
to ~10 Hz (N=1) without rebuilding — sensible defaults for
mainstream multistatic at ~2 s sync interval.
- Backwards-compatible at the wire level: old aggregators drop the new
magic on existing parser mismatch path.
Build artifacts (both green on IDF v5.4):
- S3 8 MB: 1094 KB, 47% partition slack
- C6 4 MB: 1019 KB, 45% partition slack
The macro define was renamed from SYNC_EVERY_N_FRAMES to
CONFIG_C6_SYNC_EVERY_N_FRAMES so the Kconfig generator wires through.
Header guard preserves the default for builds without the kconfig
applied.
Co-Authored-By: claude-flow <ruv@ruv.net>
SOTA iter 9 — closes the §A0.11 wiring gap with empirical evidence.
Added a diagnostic ESP_LOGI in the sync emit path; flashed both C6
boards; captured 45s parallel serial output.
Sync packet generation confirmed live:
COM12 (leader, ...00:84):
sync-pkt #1 ... node=12 flags=0x03 local_us=28864932 epoch_us=28864939
flags=0x03 = leader+valid, epoch ≈ local (7 µs delta = call-stack
elapsed only — leader has no offset by definition)
COM9 (follower, ...05:3c):
sync-pkt #1 ... node=9 flags=0x06 local_us=28798450 epoch_us=27634885
flags=0x06 = valid+smoothed_used, local-epoch = 1,163,565 µs
Matches §A0.10's measured -1.16 s mesh-aligned offset within 285 µs
(WiFi MAC TX jitter floor between samples).
Cadence: 2.05 s between sync packets — 20 CSI frames at the bench's
observed 10 fps rate = exactly the design intent.
UDP send returns -1 (sr=-1) because the bench boards are intentionally
not associated to a real AP (provisioned to dead SSIDs for the iter
2-8 mesh experiments). No crash, no resource leak in 45s. Once boards
hit a routable network, sr becomes the byte count.
Wiring gap §A0.11 now CLOSED. Multistatic CSI fusion downstream has
a documented protocol to recover mesh-aligned timestamps for every CSI
frame: host pairs (node_id, sequence) across the two packet streams.
Host-side parser is the natural next layer (wifi-densepose-sensing-server).
Build evidence: C6 image 1019 KB (+0.5 KB for the diag log line),
45% partition slack unchanged.
Co-Authored-By: claude-flow <ruv@ruv.net>
Closes WITNESS-LOG-110 §A0.11 wiring gap. Adds a separate 32-byte UDP
packet (magic 0xC511A110, distinct from the CSI frame magic 0xC5110001)
carrying:
[0..3] magic 0xC511A110 (LE u32) — CSI-ADR-110 sync packet
[4] node_id
[5] proto version (0x01)
[6] flags: bit0=is_leader, bit1=is_valid, bit2=smoothed_used
[7] reserved
[8..15] local esp_timer_get_time() (LE u64)
[16..23] mesh-aligned epoch (LE u64) = local + EMA-smoothed offset
[24..27] high-water sequence number (LE u32) for pairing with CSI frames
[28..31] reserved (room for leader_id low32 in a follow-up)
Emitted once per 20 CSI frames (≈ 1 Hz at the 20 Hz send-rate gate).
Same stream_sender UDP socket as CSI frames — host dispatches by first
4 bytes of each datagram.
Backwards compatible: aggregators that don't know about the new magic
ignore it (sync packets won't match the CSI parser's magic check, so
they're dropped harmlessly by existing receivers). New aggregators
pair (node_id, sequence) across the two packet streams to align CSI
frames to mesh time.
Sets us up for downstream ADR-029/030 multistatic CSI fusion: with the
host now able to recover the mesh-aligned epoch from each frame's
sequence number, frames from multiple boards can be ordered + fused
on a common timeline.
Build evidence: C6 image 1019 KB (+1 KB vs v0.6.8 no-sync), 45 %
partition slack unchanged. Host-side parser update is a follow-up.
Co-Authored-By: claude-flow <ruv@ruv.net>
SOTA iter 5 — converted the iter 4 ADR-110 §A0.8 closing recommendation
("host-side Kalman / linear fit on the offset trajectory") into a
firmware-side, fixed-point EMA so every downstream consumer of
c6_sync_espnow_get_epoch_us() gets bounded-jitter timestamps for free.
Implementation:
* α = 1/8 (Q3.3 shift = 3), ≈8-sample effective window at the 10 Hz
beacon rate. Tracks the ≈1.4 ppm crystal drift §A0.8 measured while
averaging out per-beacon WiFi-MAC jitter spikes.
* y[n] = y[n-1] + (raw - y[n-1]) >> 3 — integer arithmetic, two cycles
on the RISC-V LP/HP cores, no float dependency.
* Seeded from the first follower-mode sample so we don't bias toward 0.
* New getter: int64_t c6_sync_espnow_get_offset_us_smoothed(void).
* c6_sync_espnow_get_offset_us() (raw) stays for diagnostics, unchanged.
* c6_sync_espnow_get_epoch_us() now prefers the smoothed offset once
s_smoothed_seeded — meaning every CSI frame timestamp ADR-029/030
consumes is already filtered, no host-side rework required.
Diag log line now prints both:
c6_espnow: tx#N ... offset_us=R smoothed=S
90 s bench verification (witness §A0.9 + iter5-COM9-ema-90s.log) shows
both values tracking. Methodology caveat in §A0.9: short windows don't
let the smoothing benefit emerge over the raw noise floor — the
suppression ratio measurement needs ≥5 min, deferred to a long-soak
iteration.
Binary size cost: ~32 bytes (one int64, one bool, one getter). C6 build
still 45% partition slack.
Co-Authored-By: claude-flow <ruv@ruv.net>
Iter 1 finding from /loop 5m SOTA sprint: two C6 boards now mesh through
the c6_softap_he soft-AP (COM12 hosts ruview-c6-twt, COM9 associates), but
COM9 lands at phymode(0x3, 11bgn), he:0 — the soft-AP doesn't advertise
HE. Confirmed by full grep of components/esp_wifi/include/esp_wifi*.h:
the public API exposes ONLY STA-side iTWT/bTWT. There is no
esp_wifi_ap_set_he_config, no wifi_he_ap_config_t, no wifi_config_t.ap.he_*
field — soft-AP HE/TWT-Responder advertise is not user-controllable on
ESP32-C6 in IDF v5.4.
Consequence: B1/B2 cannot be measured via the two-C6 path on this IDF
release. The c6_softap_he module ships as the in-place hook for any
future IDF release that exposes the API; until then a real 11ax router
or phone hotspot remains the path. Sharpens the open question from "do
we need an 11ax AP?" to "we need either a future IDF AP-side HE config
API, or an external 11ax AP".
WITNESS-LOG-110 §A0.6 records the parallel boot logs from both boards
plus the IDF surface grep evidence.
c6_softap_he.c gains an ESP_LOGW at AP-up time so operators understand
exactly why STAs land at 11bgn (avoids confusion with the v0.6.6 §A8
graceful-TWT-NACK story).
While here: cleared the three -Wunused-variable warnings in swarm_bridge.c
that fired on every build (fw_ver, free_heap, presence in heartbeat block).
fw_ver now feeds an ESP_LOGI so the boot log names the build; free_heap +
heartbeat-presence were dead anyway. Pure ultra-opt: smaller .o files, zero
warning noise.
Co-Authored-By: claude-flow <ruv@ruv.net>
ADR-110 P9 — software-only unblocks for the WITNESS-LOG-110 §B
hardware-blocked items. Two new modules, both default-off so v0.6.6 fleets
see no behavior change.
LP-core (B4 path):
- New firmware/esp32-csi-node/main/lp_core/main.c: real RISC-V LP-core
motion-gate program with debounce + monotonic motion_count counter.
- c6_lp_core.c rewritten to load + run the LP binary via ulp_lp_core_run
when CONFIG_C6_LP_CORE_ENABLE=y; falls back to the v0.6.6 ext1 GPIO-wake
path otherwise (keeps regression surface small).
- ulp_embed_binary() wired in main/CMakeLists.txt, gated on the Kconfig.
- New Kconfig knobs: C6_LP_POLL_PERIOD_US (default 10 ms),
C6_LP_DEBOUNCE_SAMPLES (default 3).
- Exposes c6_lp_core_motion_count() / c6_lp_core_poll_count() for the
witness harness — once an INA/Joulescope is on the bench, B4 is one
capture away from a measured number.
Soft-AP HE (B1/B2 unblock):
- New c6_softap_he.{h,c}: brings up the C6 in AP+STA mode with WPA2-PSK
+ HE advertisement, so a second C6 in STA mode can negotiate real
iTWT against a known-cooperative AP without buying an 11ax router.
- main.c calls c6_softap_he_start() right before esp_wifi_start() when
CONFIG_C6_SOFTAP_HE_ENABLE=y.
- New Kconfig knobs: C6_SOFTAP_HE_{SSID,PSK,CHANNEL} with NVS overrides
via softap_ssid / softap_psk / softap_chan in the ruview namespace.
Build artifacts (IDF v5.4, both green, RC=0):
- S3 8 MB: 1093 KB (47% partition slack)
- C6 4 MB: 1019 KB (45% partition slack)
- SHA-256 sums in dist/firmware-v0.6.7/SHA256SUMS.txt
Doc updates: CHANGELOG wave-3 entry, ADR-110 phase table gets P5
upgrade note + new P9 row, WITNESS-LOG-110 gets new A0 section
recording the v0.6.7 build evidence.
Co-Authored-By: claude-flow <ruv@ruv.net>
After 5 systematic experiments confirmed the 802.15.4 RX path is
unfixable from user code in this IDF v5.4 + C6 combination (D1), add a
parallel sync transport over ESP-NOW. Same TS_BEACON protocol, same
public API (c6_sync_espnow_get_epoch_us / is_valid / is_leader), but
runs on the WiFi MAC layer that ESP-IDF fully supports across every
ESP32 family.
The 802.15.4 code stays in source for when the IDF driver is fixed.
ESP-NOW is the working primary today.
Empirical (single-board COM9 — other 3 boards dropped off USB during
the experiment):
- c6_sync_espnow_init() succeeds: "init done local_id=… leader=
yes(candidate) period=100ms"
- TX path 100% reliable: tx#101 fail=0 over ~15s at 100ms cadence
- RX awaiting cross-board test once USB-enumeration is restored
Trade vs. 802.15.4 design:
- Loses: "frees WiFi airtime for CSI" property
- Gains: known-working RX path, cross-target (S3 and C6 both)
- Same API surface — consumers swap transports without code change
Files:
- main/c6_sync_espnow.{h,c} — new module, ~210 lines
- main/CMakeLists.txt — add to SRCS (always built, used on any target)
- main/main.c — init after WiFi STA up, skip on QEMU mock
- test/capture-3board-experiment.py — surface c6_espnow log lines
- docs/WITNESS-LOG-110.md — new §D-workaround documenting the pivot
Ref: ruvnet/RuView#762 / D1 known-issue / draft PR #764
Co-Authored-By: claude-flow <ruv@ruv.net>
Tried 4th hypothesis for the RX-path bug: maybe the IDF v5.4 receiver
strictly requires dst PAN to match the local set_panid() instead of
honoring the 0xFFFF broadcast PAN per 802.15.4 spec. Changed beacon
dst PAN to 0xCAFE (matching set_panid call) to test.
Result: still negative (tx#241 rx#0/1, magic_match=0). PAN was not the
root cause — but the change is technically more correct per the IDF
behavior and is kept.
Also expanded WITNESS-LOG-110 §D1 to record the 4-experiment matrix
that's now been run:
1. WiFi-on + ch15: tx#381 rx#1 magic_match=0
2. WiFi-on + ch26: identical negative
3. WiFi-off + ch26 + OT off + promiscuous true: tx#601 rx#0 — even
the earlier rx#1 was a noise frame, not protocol traffic
4. Dst PAN 0xCAFE: still negative
Hypothesis space narrowed; needs IDF maintainer trace or working
multi-board reference to fix.
Co-Authored-By: claude-flow <ruv@ruv.net>
After 3 systematic hypotheses tested + rejected (radio coex, OpenThread
shadowing, manual RX re-arm), the 802.15.4 leader-election bug is
narrowed to: TX path works perfectly (~10/s clean, 0 fail), but the RX
path stops after exactly 1 frame. Manual esp_ieee802154_receive() from
either callback bootloops the driver (verified across all 3 boards).
The IDF reference example uses the same handle_done-only pattern as
this code, implying the driver should auto-restart RX — but empirically
doesn't here. Either a half-duplex radio state issue or an IDF v5.4
bug. Tracked as known issue D1 in WITNESS-LOG-110.
Changes shipped:
- c6_twt.c: ESP_ERR_INVALID_ARG added to graceful-fallback list
(empirically: ruv.net AP advertises TWT Responder=0, IDF driver
validates against AP HE capability and rejects with INVALID_ARG)
- c6_timesync.c: diagnostic counters (s_tx_count, s_tx_fail, s_rx_count,
s_rx_magic_match) + per-10-beacon log line preserved so future
investigation has the diagnostic harness ready
- sdkconfig.defaults.esp32c6: 15.4 channel default 15 → 26 (non-overlap
with WiFi 2.4 GHz channels), OpenThread disabled (we use raw 15.4)
- promiscuous=true on the radio (broadcast frames addressed to 0xFFFF)
- WITNESS-LOG-110 §D1 expanded with the full diagnostic trace +
3-hypothesis investigation record
Cross-node sync claim (B3) BLOCKED until either an IDF maintainer
trace or a working multi-board reference is available. The other
three SOTA dimensions (HE-LTF, TWT cadence, 5 µA hibernation) are
also still unverified and need different hardware (11ax AP, INA meter)
— honestly recorded in §B.
Tracking: ruvnet/RuView#762, task #30 closed as known-issue.
Co-Authored-By: claude-flow <ruv@ruv.net>
`firmware/esp32-csi-node` now builds for both `esp32s3` (existing
production) and `esp32c6` (new research / battery-seed target) from
the same source tree. ESP-IDF auto-applies `sdkconfig.defaults.esp32c6`
when the target is set to esp32c6; every C6 module is gated on
CONFIG_IDF_TARGET_ESP32C6 (or the SOC_WIFI_HE_SUPPORT capability) so
the S3 build path is byte-identical to today.
New modules (all #ifdef-gated, no-op stubs on S3):
- c6_twt.{h,c} — iTWT wrapper, graceful AP-NACK fallback
- c6_timesync.{h,c} — 802.15.4 beacon-based mesh time-sync, EUI-64
leader election, c6_timesync_get_epoch_us()
- c6_lp_core.{h,c} — wake-on-motion deep-sleep helper (ext1 path
this cut; real LP-core polling deferred)
ADR-018 frame extension:
- byte 18: PPDU type (0=HT/legacy, 1=HE-SU, 2=HE-MU, 3=HE-TB)
- byte 19: bandwidth + STBC + 802.15.4-sync-valid flags
- Magic 0xC5110001 unchanged — backwards compatible
- Dual-branch encoding handles both struct variants of
wifi_pkt_rx_ctrl_t (legacy S3 / HE C6) per CONFIG_SOC_WIFI_HE_SUPPORT
Critical bug fixed during live witness collection (verified across 3
boards on COM6/COM9/COM12):
- c6_timesync.c read MAC into a 6-byte buffer and ran MAC-48->EUI-64
conversion. But esp_read_mac(ESP_MAC_IEEE802154) returns 8 bytes
already in EUI-64 form on C6 — code was double-inserting FFFE.
Boot log was 206ef1fffefffe17, fix yields 206ef1fffe17278c which
matches esptool's eFuse reading exactly.
Tooling:
- CI workflow (firmware-ci.yml) extended with c6-4mb matrix row +
ADR-110 host-unit-test step
- Host unit tests for pure functions (mac48_to_eui64,
eui64_bytes_to_u64, PPDU encoding both branches) — runs on Ubuntu CI
- Multi-board live-capture harness (test/capture-3board-experiment.py)
- Witness bundle script records SHA-256s for s3-adr110, c6-adr110, and
s3-fair-adr110 (apples-to-apples) binary archives
Honest empirical findings (full report in docs/WITNESS-LOG-110.md):
- Verified live on 3 C6 boards: boot, 802.15.4 init w/ correct EUIs,
WiFi STA reaching assoc->run on ruv.net, TWT setup attempted +
gracefully NACKed (AP is 11n-only, TWT Responder:0), HE-MAC firmware
loaded
- NOT verified (need 11ax AP / second-channel exp / INA meter):
HE-LTF subcarrier expansion, TWT cadence determinism, ±100 µs sync
alignment, 5 µA hibernation
- Bug found: leader election doesn't step down under live WiFi load —
likely 2.4 GHz radio coex preemption (WiFi ch 5 vs 15.4 ch 15);
follow-up task #30
- Apples-to-apples size: S3-no-display = 886 KB, C6 = 1003 KB
(C6 is 13% LARGER for equivalent CSI features; the extra is the
802.15.4 + OpenThread stack that S3 lacks)
Tracking: ruvnet/RuView#762
Co-Authored-By: claude-flow <ruv@ruv.net>
At edge tier>=2 on N16R8 PSRAM boards, `process_frame()` runs
`update_multi_person_vitals()` (4 persons × 256 history samples) plus
`wasm_runtime_on_frame()` back-to-back before returning to `edge_task()`.
The existing `vTaskDelay(1)` in `edge_task()` only fires *after*
`process_frame()` returns — under sustained 30 pps CSI load on PSRAM
boards this leaves IDLE1 on Core 1 starved long enough for the 5-second
Task Watchdog Timer to fire.
Fix: add two `vTaskDelay(1)` calls inside `process_frame()`, both gated
on `s_cfg.tier >= 2`:
1. After `update_multi_person_vitals()` (Step 11)
2. After `wasm_runtime_on_frame()` dispatch (Step 14)
Tier 0/1 paths are unaffected. Validated on COM7 (N16R8 board):
`Edge DSP task started on core 1 (tier=2)`, no WDT panics in 20 s.
Also bump firmware version 0.6.5 → 0.6.6 and refresh all 6 release_bins
with the new build (8MB + 4MB variants, built 2026-05-21).
Fix-marker RuView#683 added to scripts/fix-markers.json.
Co-Authored-By: claude-flow <ruv@ruv.net>
Three fixes wrapped for the v0.6.5-esp32 release tag:
1. **`sdkconfig.defaults` adds `CONFIG_FREERTOS_TIMER_TASK_STACK_DEPTH=8192`**.
The fix was already in `sdkconfig.defaults.template` (ADR-081, prevents
"stack overflow in task Tmr Svc" bootloop when adaptive_controller emits
feature_state from inside a Timer Svc callback). It was MISSING from the
canonical `sdkconfig.defaults` file used by the build, so any fresh
build picked up the 2 KiB FreeRTOS default and bootlooped on hardware.
Verified on COM7: with the fix, no panics in 30 s of operation; without
it, "***ERROR*** A stack overflow in task Tmr Svc has been detected."
followed by sustained bootloop.
2. **`ota_update.c` extracts `ota_load_psk_from_nvs()` and calls it from
both `ota_update_init()` and `ota_update_init_ex()`.** `main.c:230` uses
the `_ex` variant, but only `ota_update_init()` was loading the PSK
from NVS. Result: `s_ota_psk` stayed empty regardless of NVS contents,
so the RuView#596 fail-closed posture rejected every request — but the
diagnostic warning never printed at boot, leaving operators no signal
about why their OTA uploads were 403'ing. Verified on COM7:
W (3126) ota_update: NVS namespace 'security' not found —
OTA upload endpoint will REJECT all requests until provisioned.
Fail-closed per RuView#596.
3. **`version.txt`: 0.6.4 → 0.6.5**, paired with the v0.6.5-esp32 tag so the
firmware-ci version-guard job (RuView#505 fix-marker) stays happy.
Both validations done end-to-end on hardware (COM7, ESP32-S3 8MB,
provisioned with --edge-tier 2 to also incidentally re-verify #438 is not
reproducible on current main).
ota_check_auth() previously returned true when s_ota_psk[0] == '\0'
("permissive for dev"). A freshly-flashed node — or any node where
nobody had provisioned an OTA PSK yet — accepted attacker-controlled
firmware over plain HTTP on port 8032 from any host on the WiFi. No
Secure Boot V2, no signed-image verification, no transport encryption.
Single LAN call could brick or backdoor a node.
This was flagged in the deep security review of PR #596 but was a
PRE-EXISTING bug in main, not new code from that PR — so it stood as
a critical-severity production issue until this commit.
Fix:
- ota_check_auth() now returns false when no PSK is provisioned, with
ESP_LOGW("OTA rejected: no PSK in NVS …") at the call site so the
operator can diagnose the rejection from serial logs
- ota_update_init() ESP_LOGW message updated to surface the new posture
at boot ("upload endpoint will REJECT all requests until provisioned")
- Doc comment on ota_check_auth() rewritten to make the contract
explicit and reference the audit
The OTA HTTP server itself still starts even when no PSK is set. That
lets the operator run `provision.py --ota-psk <hex>` over USB-CDC to
write the NVS key without reflashing the firmware. The upload endpoint
just refuses every request in the meantime.
Breaking change for any deployment that depended on the unauthenticated
OTA path working out of the box. Documented in CHANGELOG under
[Unreleased] / Security so it's visible at the next release cut.
Fix-marker RuView#596-ota-fail-closed (scripts/fix-markers.json)
requires the new behaviour and forbids the old "permissive for dev"
fallback strings, so a future revert fails CI.
* firmware/esp32-csi-node: fix IDF 6 build (PSA SHA-256, explicit REQUIRES)
- rvf_parser: use psa_hash_* / psa_hash_compute; mbedTLS 4 has no public
mbedtls/sha256.h on the IDF include path.
- main/CMakeLists: declare REQUIRES for WiFi, netif, HTTP, OTA, drivers, lwip,
mbedtls per ESP-IDF v6 component dependency checks; optional wasm3 when
CONFIG_WASM_ENABLE.
Signed-off-by: Chaitanya Tata <chaitanya@dotstarconsulting.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
* firmware/esp32-csi-node: fix CSI config for Wi-Fi 6 (ESP32-C6)
When CONFIG_SOC_WIFI_HE_SUPPORT is set, wifi_csi_config_t is the
wifi_csi_acquire_config_t bitfield layout. The legacy bool fields
(lltf_en, htltf_en, ...) only apply to ESP32-S3-class targets.
Initialize acquire fields for HE targets; add MAC v3-only members when
CONFIG_SOC_WIFI_MAC_VERSION_NUM >= 3.
Verified: idf.py build for esp32c6 and esp32s3 (ESP-IDF v6.1).
Signed-off-by: Chaitanya Tata <chaitanya@dotstarconsulting.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
* firmware/esp32-csi-node: pin edge DSP task for unicore (ESP32-C6)
edge_processing_init used xTaskCreatePinnedToCore(..., core 1). ESP32-C6
runs FreeRTOS unicore (portNUM_PROCESSORS == 1), so core 1 trips the
xTaskCreatePinnedToCore range assert right after CSI init.
Use core 1 only when SMP is available; otherwise pin to core 0.
Signed-off-by: Chaitanya Tata <chaitanya@dotstarconsulting.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
* firmware/esp32-csi-node: provision NVS with chip auto-detect
provision.py always passed --chip esp32s3 to esptool, so flashing NVS on
ESP32-C6 failed. Default --chip to auto (esptool v5) and add an explicit
--chip override. Use write-flash instead of deprecated write_flash.
Signed-off-by: Chaitanya Tata <chaitanya@dotstarconsulting.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Signed-off-by: Chaitanya Tata <chaitanya@dotstarconsulting.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
csi_collector_init() never called esp_wifi_set_ps(), leaving the radio on
the ESP-IDF STA default WIFI_PS_MIN_MODEM. The modem then sleeps between
DTIM beacons; combined with the MGMT-only promiscuous filter (#396) the
CSI callback is starved and the per-second yield collapses toward 0 pps,
which is what users on a clean multi-node setup were seeing
(motion=0.00 presence=0.00 yield=0pps).
Force WIFI_PS_NONE before enabling promiscuous mode — the textbook
requirement for reliable CSI capture (every ESP-IDF CSI example does it).
New boot line: "csi_collector: WiFi modem sleep disabled (WIFI_PS_NONE)
for CSI capture". Battery duty-cycling is unaffected: power_mgmt_init()
runs after this and re-enables modem sleep when provision.py is given
--duty-cycle <100.
Builds clean for esp32s3 (idf.py build, 48% flash free).
Closes#521
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(firmware): move defensive node_id capture before wifi_init_sta()
The original defensive copy in csi_collector_init() (line 172 of main.c)
runs AFTER wifi_init_sta() (line 147), which on some ESP32-S3 devices
corrupts g_nvs_config.node_id back to the Kconfig default of 1.
Reproduced on device 80:b5:4e:c1:be:b8 (ESP32-S3 QFN56 rev v0.2):
- NVS provisioned with node_id=5
- Release firmware (no fix): seed receives node_id=1 (clobbered)
- This patch: seed receives node_id=5 (correct)
Changes:
- Add csi_collector_set_node_id() called from main.c immediately
after nvs_config_load(), before wifi_init_sta() runs
- csi_collector_init() now detects and logs the clobber if early
capture disagrees with current g_nvs_config value
- Fallback path preserved: if set_node_id() is never called,
init() still captures from g_nvs_config (backwards compatible)
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(firmware): defensive copy of filter_mac to prevent callback crash
The CSI callback reads g_nvs_config.filter_mac_set and filter_mac on
every invocation (100-500 Hz). If wifi_init_sta() corrupts g_nvs_config
(same root cause as the node_id clobber), the callback reads garbage
from the struct, leading to Core 0 LoadProhibited panic after ~2400
callbacks (~70 seconds of operation).
Extends the early-capture pattern from the node_id fix to also copy
filter_mac_set and filter_mac into module-local statics before WiFi
init runs. Adds canary logging to detect filter_mac corruption.
Observed on device 80:b5:4e:c1:be:b8 via serial:
CSI cb #2400 → Guru Meditation Error: Core 0 panic'ed (LoadProhibited)
→ TG0WDT_SYS_RST → reboot → crash again at ~2900 callbacks
Refs #232#375#385#386#390
Co-Authored-By: Ruflo & AQE
* fix(firmware): MGMT-only promiscuous filter to prevent SPI cache crash
The WiFi driver's wDev_ProcessFiq interrupt handler crashes with
LoadProhibited in cache_ll_l1_resume_icache when promiscuous mode
captures MGMT+DATA frames (100-500 interrupts/sec). The high interrupt
rate races with SPI flash cache operations, corrupting cache state.
Changes:
- Promiscuous filter: MGMT+DATA → MGMT-only (~10 Hz beacons)
- CSI config: disable htltf_en and stbc_htltf2_en (LLTF-only)
LLTF provides 64 subcarriers (HT20) — sufficient for presence,
breathing, and fall detection. The 10 Hz beacon rate eliminates
the SPI flash cache contention that caused the crash.
Verified on device 80:b5:4e:c1:be:b8:
- Before: LoadProhibited crash at ~1600-2400 callbacks (every ~70s)
- After: 2700+ callbacks over 4.7 minutes, zero crashes
Backtrace decode confirmed crash in ESP-IDF closed-source WiFi blob:
_xt_lowint1 → wDev_ProcessFiq → spi_flash_restore_cache
→ cache_ll_l1_resume_icache → EXCVADDR=0x00000004 (NULL deref)
Co-Authored-By: Ruflo & AQE
* fix(provision): write-flash → write_flash for esptool v5 compat
esptool v5+ rejects hyphenated subcommands. The provision script
used 'write-flash' which fails with "invalid choice". Changed to
'write_flash' (underscore) which works with both old and new esptool.
Co-Authored-By: Ruflo & AQE
* fix(firmware): 50 Hz callback rate gate + sdkconfig extra IRAM opt
- Add early rate gate in wifi_csi_callback at 50 Hz (defense-in-depth,
does not prevent crash alone but reduces callback execution time)
- Add null-data injection timer infrastructure (disabled — TX adds
interrupt pressure that triggers the SPI cache crash, RuView#396)
- sdkconfig.defaults: add CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y
- sdkconfig.defaults: document SPIRAM XIP attempt (crashes differently)
Co-Authored-By: Ruflo & AQE
* fix(firmware): address PR #397 review feedback
Applies @ruvnet's five review requests on PR #397 (RuView#397 comment
4289417527):
1. **Inline comment on `provision.py` `write_flash`** — ESP-IDF v5.4
bundles esptool 4.10.0 (underscore-only). #391's hyphen swap broke
the documented venv flow; kept the underscore form and added a
three-line comment warning future maintainers not to "re-fix" it.
2. **Correct `edge_processing.c` sample_rate** (blocking) — changed
hard-coded `20.0f` → `10.0f` at line 718 so
`estimate_bpm_zero_crossing()` matches the MGMT-only CSI rate.
Without this, breathing and heart-rate reports were 2× the true
value. Added a comment tying the constant to the callback rate gate.
3. **Removed disabled probe-injection infrastructure** — dropped the
forward declaration, the `CSI_PROBE_INTERVAL_MS` define, six static
variables (`s_probe_timer`, `s_probe_tx_count`, `s_probe_tx_fail`,
`s_ap_bssid`, `s_ap_bssid_known`), and three functions
(`csi_send_probe_request`, `probe_timer_cb`,
`csi_collector_start_probe_timer`). None were reachable.
`csi_inject_ndp_frame()` reverted to the original ADR-029 stub.
Can be revived from this commit's parent if needed.
4. **Cleaned `sdkconfig.defaults`** — removed the SPIRAM prose and
commented-out `# CONFIG_SPIRAM is not set` line. Kept only the live
`CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y` with a concise rationale.
5. **Bumped firmware version 0.6.1 → 0.6.2** and added four
`[Unreleased]` CHANGELOG entries covering the SPI cache crash fix,
the `filter_mac` / `node_id` clobber defense, the sample-rate
correction, and the `write_flash` command-form revert.
Net: +39 / -128 across six files.
Validation in this devcontainer:
- Static sanity on modified C files: braces balance (csi_collector.c
59/59; edge_processing.c 96/96), zero dangling references to removed
probe-injection symbols.
- Rust workspace tests and Python proof not executed here — cargo not
installed and pip blocked by PEP 668. Deferring hardware build +
flash + miniterm verification to @ruvnet's COM7 per his offer in
the review comment.
Co-Authored-By: claude-flow <ruv@ruv.net>
---------
Co-authored-by: Dragan Spiridonov <spiridonovdragan@gmail.com>
Users on multi-node ESP32 deployments have been reporting for months
that their provisioned `node_id` reverts to the Kconfig default of `1`
in UDP frames and the `csi_collector` init log, despite boot showing:
nvs_config: NVS override: node_id=4
main: ESP32-S3 CSI Node (ADR-018) - Node ID: 4
csi_collector: CSI collection initialized (node_id=1, channel=11)
See #232, #375, #385, #386, #390. The root memory-corruption path for
the `g_nvs_config.node_id` byte has not been definitively isolated
(does not reproduce on my attached ESP32-S3 running current source
and the v0.6.0 release binary), but the UDP frame header can be made
tamper-proof regardless:
1. `csi_collector_init()` now captures `g_nvs_config.node_id` into a
module-local `static uint8_t s_node_id` at init time.
2. `csi_serialize_frame()` reads `buf[4]` from `s_node_id`, not from
the global - so any later corruption of `g_nvs_config` cannot
affect outgoing CSI frames.
3. All other consumers (`edge_processing.c` x3, `wasm_runtime.c`,
`display_ui.c`, `main.c swarm_bridge_init`) now go through a new
`csi_collector_get_node_id()` accessor instead of reading the
global directly.
4. A canary at end-of-init logs `WARN` if `g_nvs_config.node_id`
already diverges from the captured value - this will pinpoint
the corruption path if it happens on a user's device.
Hardware validation on attached ESP32-S3 (COM8):
- NVS loads node_id=2
- Boot log: `main: ... Node ID: 2`
- NEW log: `csi_collector: Captured node_id=2 at init (defensive
copy for #232/#375/#385/#390)`
- Init log: `csi_collector: CSI collection initialized (node_id=2)`
- UDP frame byte[4] = 2 (verified via socket sniffer, 15/15 packets)
This is defense in depth - it shields the UDP frame from whatever
upstream bug is clobbering the struct. When a user hits the original
bug, the canary WARN will help isolate the root cause.
Refs #232#375#385#386#390
Co-Authored-By: claude-flow <ruv@ruv.net>
- Add version.txt (0.6.0) read by CMakeLists.txt so
esp_app_get_description()->version matches the release tag
- Log firmware version on boot: "v0.6.0 — Node ID: X"
- Remove stale Kconfig help text (said default 2.0, actual is 15.0)
Fixes the version mismatch reported in #354 where flashing v0.5.3
binaries showed v0.4.3 because PROJECT_VER was never set.
Co-Authored-By: claude-flow <ruv@ruv.net>
* feat(signal): subcarrier importance weighting via mincut partition (Phase 1)
Adds subcarrier_importance_weights() to ruvector signal crate — converts
mincut partition into per-subcarrier float weights (>1.0 for sensitive,
0.5 for insensitive subcarriers).
Sensing server now uses weighted mean/variance in extract_features_from_frame
instead of treating all 56 subcarriers equally. This emphasizes body-motion-
sensitive subcarriers and reduces noise from static multipath.
Expected: ~26% reduction in keypoint jitter (±15cm → ±11cm RMS).
284 tests pass (191 trainer + 51 lib + 18 vital_signs + 16 dataset + 8 multi_node).
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(firmware): stack overflow risk + tick-rate independence (review findings)
Critical fixes from deep review:
1. **Stack overflow prevention**: Moved BPM scratch buffers (br_buf, hr_buf)
from stack to static storage in both process_frame() and
update_multi_person_vitals(). Combined stack was ~6.5-7.5 KB of 8 KB
limit — now reduced by ~4 KB to safe margins.
2. **Tick-rate independence**: Post-batch yield now uses
pdMS_TO_TICKS(20) with min-1 guard instead of raw vTaskDelay(2).
Previously assumed 100Hz tick rate.
3. **EDGE_BATCH_LIMIT to header**: Moved from local const to
edge_processing.h #define for configurability.
Firmware builds clean at 843 KB.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(server): stale node eviction, remove unsafe pointer (review findings)
Critical fixes from deep review:
1. **Stale node eviction**: node_states HashMap now evicts nodes with no
frame for >60 seconds, every 100 ticks. Prevents unbounded memory
growth and stale smoothing data when nodes are replaced.
2. **Remove unsafe raw pointer**: Replaced the unsafe raw pointer to
adaptive_model (used to break borrow checker deadlock with
node_states) with a safe .clone() before the mutable borrow.
AdaptiveModel derives Clone so this is a clean copy.
284 tests pass, zero failures.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(firmware,server): watchdog crash on busy LANs + no detection from edge vitals (#321, #323)
**Firmware (#321):** edge_dsp task now batch-limits frame processing to 4
frames before a 10ms yield. On corporate LANs with high CSI frame rates,
the previous 1-tick-per-frame yield wasn't enough to prevent IDLE1
starvation and task watchdog triggers.
**Sensing server (#323):** When ESP32 runs the edge DSP pipeline (Tier 2+),
it sends vitals packets (magic 0xC5110002) instead of raw CSI frames.
Previously, the server broadcast these as raw edge_vitals but never
generated a sensing_update, so the UI showed "connected" but "0 persons".
Now synthesizes a full sensing_update from vitals data including
classification, person count, and pose generation.
Closes#321Closes#323
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(firmware): address review findings — idle busy-spin and observability
- Fix pdMS_TO_TICKS(5)==0 at 100Hz causing busy-spin in idle path (use
vTaskDelay(1) instead)
- Post-batch yield now 2 ticks (20ms) for genuinely longer pause
- Add s_ring_drops counter to ring_push for diagnosing frame drops
- Expose drop count in periodic vitals log line
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(server): set breathing_band_power for skeleton animation from vitals
When presence is detected via edge vitals, set breathing_band_power to
0.5 so the UI's torso breathing animation works. Previously hardcoded
to 0.0 which made the skeleton appear static even when breathing rate
was being reported.
Co-Authored-By: claude-flow <ruv@ruv.net>
CONFIG_CSI_NODE_ID (compile-time, always 1) was hardcoded in 6
places: CSI frame serialization, compressed frames, vitals packets,
WASM output packets, and display UI. NVS provisioning wrote the
correct node_id but it was never used at runtime.
Fixed all occurrences to use g_nvs_config.node_id:
- csi_collector.c: frame header + log message
- edge_processing.c: compressed frame + vitals packet
- wasm_runtime.c: WASM output packet
- display_ui.c: system info display
This means --node-id 0/1/2 provisioning now actually works for
multi-node mesh deployments.
Closes#279
Co-Authored-By: claude-flow <ruv@ruv.net>
process_frame() is CPU-intensive (biquad filters, Welford stats,
BPM estimation, multi-person vitals) and can run for several ms.
At priority 5, edge_dsp starves IDLE1 (priority 0) on Core 1,
triggering the task watchdog every 5 seconds.
Fix: vTaskDelay(1) after every frame to let IDLE1 reset the
watchdog. At 20 Hz CSI rate this adds ~1 ms per frame —
negligible for vitals extraction.
Verified on real ESP32-S3 with live WiFi CSI: 0 watchdog
triggers in 60 seconds (was triggering every 5s before fix).
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(firmware): fall detection false positives + 4MB flash support (#263, #265)
Issue #263: Default fall_thresh raised from 2.0 to 15.0 rad/s² — normal
walking produces accelerations of 2.5-5.0 which triggered constant false
"Fall Detected" alerts. Added consecutive-frame requirement (3 frames)
and 5-second cooldown debounce to prevent alert storms.
Issue #265: Added partitions_4mb.csv and sdkconfig.defaults.4mb for
ESP32-S3 boards with 4MB flash (e.g. SuperMini). OTA slots are 1.856MB
each, fitting the ~978KB firmware binary with room to spare.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(ci): repair all 3 QEMU workflow job failures
1. Fuzz Tests: add esp_timer_create_args_t, esp_timer_create(),
esp_timer_start_periodic(), esp_timer_delete() stubs to
esp_stubs.h — csi_collector.c uses these for channel hop timer.
2. QEMU Build: add libgcrypt20-dev to apt dependencies —
Espressif QEMU's esp32_flash_enc.c includes <gcrypt.h>.
Bump cache key v4→v5 to force rebuild with new dep.
3. NVS Matrix: switch to subprocess-first invocation of
nvs_partition_gen to avoid 'str' has no attribute 'size' error
from esp_idf_nvs_partition_gen API change. Falls back to
direct import with both int and hex size args.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(ci): pip3 in IDF container + fix swarm QEMU artifact path
QEMU Test jobs: espressif/idf:v5.4 container has pip3, not pip.
Swarm Test: use /opt/qemu-esp32 (fixed path) instead of
${{ github.workspace }}/qemu-build which resolves incorrectly
inside Docker containers.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(ci): source IDF export.sh before pip install in container
espressif/idf:v5.4 container doesn't have pip/pip3 on PATH — it
lives inside the IDF Python venv which is only activated after
sourcing $IDF_PATH/export.sh.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(ci): pad QEMU flash image to 8MB with --fill-flash-size
QEMU rejects flash images that aren't exactly 2/4/8/16 MB.
esptool merge_bin produces a sparse image (~1.1 MB) by default.
Add --fill-flash-size 8MB to pad with 0xFF to the full 8 MB.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(ci): source IDF export before NVS matrix generation in QEMU tests
The generate_nvs_matrix.py script needs the IDF venv's python
(which has esp_idf_nvs_partition_gen installed) rather than the
system /usr/bin/python3 which doesn't have the package.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(ci): QEMU validation treats WARNs as OK + swarm IDF export
1. validate_qemu_output.py: WARNs exit 0 by default (no real WiFi
hardware in QEMU = no CSI data = expected WARNs for frame/vitals
checks). Add --strict flag to fail on warnings when needed.
2. Swarm Test: source IDF export.sh before running qemu_swarm.py
so pip-installed pyyaml is on the Python path.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(ci): provision.py subprocess-first NVS gen + swarm IDF venv
provision.py had same 'str' has no attribute 'size' bug as the
NVS matrix generator — switch to subprocess-first approach.
Swarm test also needs IDF export for the swarm smoke test step.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(ci): handle missing 'ip' command in QEMU swarm orchestrator
The IDF container doesn't have iproute2 installed, so 'ip' binary
is missing. Add shutil.which() check to can_tap guard and catch
FileNotFoundError in _run_ip() for robustness.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(ci): skip Rust aggregator when cargo not available in swarm test
The IDF container doesn't have Rust installed. Check for cargo
with shutil.which() before attempting to spawn the aggregator,
falling back to aggregator-less mode (QEMU nodes still boot and
exercise the firmware pipeline).
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(ci): treat swarm test WARNs as acceptable in CI
The max_boot_time_s assertion WARNs because QEMU doesn't produce
parseable boot time data. Exit code 1 (WARN) is acceptable in CI
without real hardware; only exit code 2+ (FAIL/FATAL) should fail.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(firmware): Kconfig EDGE_FALL_THRESH default 2000→15000
The nvs_config.c fallback (15.0f) was never reached because
Kconfig always defines CONFIG_EDGE_FALL_THRESH. The Kconfig
default was still 2000 (=2.0 rad/s²), causing false fall alerts
on real WiFi CSI data (7 alerts in 45s).
Fixed to 15000 (=15.0 rad/s²). Verified on real ESP32-S3 hardware
with live WiFi CSI: 0 false fall alerts in 60s / 1300+ frames.
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs: update README, CHANGELOG, user guide for v0.4.3-esp32
- README: add v0.4.3 to release table, 4MB flash instructions,
fix fall-thresh example (5000→15000)
- CHANGELOG: v0.4.3-esp32 entry with all fixes and additions
- User guide: 4MB flash section with esptool commands
Co-Authored-By: claude-flow <ruv@ruv.net>
- provision.py: add --channel (CSI channel override) and --filter-mac
(AA:BB:CC:DD:EE:FF format) arguments with validation
- nvs_config: add csi_channel, filter_mac[6], filter_mac_set fields;
read from NVS on boot
- csi_collector: auto-detect AP channel when no NVS override is set;
filter CSI frames by source MAC when filter_mac is configured
- ADR-060 documents the design and rationale
Fixes#247, fixes#229
The committed sdkconfig had CONFIG_ESP_WIFI_CSI_ENABLED disabled, causing
all builds to crash at runtime with "CSI not enabled in menuconfig".
Root cause: sdkconfig.defaults.template existed but ESP-IDF only reads
sdkconfig.defaults (no .template suffix).
Fixes:
- Add sdkconfig.defaults with CONFIG_ESP_WIFI_CSI_ENABLED=y
- Add #error compile guard in csi_collector.c to prevent recurrence
- Fix NVS encryption default (requires eFuse, breaks clean builds)
Verified: Docker build + flash to ESP32-S3 + CSI callbacks confirmed.
Closes#241
Relates to #223, #238, #234, #210, #190
Co-Authored-By: claude-flow <ruv@ruv.net>
The CSI callback fires for every WiFi frame in promiscuous mode
(100-500+ fps). Each call invoked sendto() synchronously, exhausting
lwIP packet buffers (errno 12 = ENOMEM). The rapid-fire failures
cascaded into a LoadProhibited guru meditation crash.
Two fixes:
1. csi_collector.c: Rate-limit UDP sends to 50 Hz (20ms interval).
CSI frames arriving between sends are dropped — the sensing
pipeline only needs 20-50 Hz.
2. stream_sender.c: When sendto fails with ENOMEM, suppress further
sends for 100ms to let lwIP reclaim buffers. Logs the backoff
event and resumes automatically.
Closes#127
* feat: add MAC address filter for ESP32 CSI collection
In multi-AP environments, CSI frames from different access points get
mixed together, corrupting the sensing signal. Add transmitter MAC
filtering so only frames from a specified AP are processed.
Implementation:
- csi_collector: filter in wifi_csi_callback by comparing info->mac
against configured MAC; log transmitter MAC in periodic debug output
- csi_collector_set_filter_mac(): runtime API to enable/disable filter
- Kconfig: CSI_FILTER_MAC option (format "AA:BB:CC:DD:EE:FF")
- NVS: "filter_mac" 6-byte blob overrides Kconfig at runtime
- nvs_config: parse Kconfig MAC string at boot, load NVS override
- main: apply filter from config after csi_collector_init()
When no filter is configured (default), behavior is unchanged —
all transmitter MACs are accepted for backward compatibility.
Fixes#98
Co-Authored-By: claude-flow <ruv@ruv.net>
* chore: add CLAUDE.local.md to .gitignore
Local machine configuration (ESP-IDF paths, COM port, build
instructions) should not be committed to the repository.
Co-Authored-By: claude-flow <ruv@ruv.net>
The source code was moved to v1/src/ but the Dockerfile still
referenced src/ directly, causing build failures. Updated all
COPY paths, uvicorn module paths, test paths, and bandit scan
paths. Also added missing v1/__init__.py for Python module
resolution.
Fixes#33
Co-Authored-By: claude-flow <ruv@ruv.net>