Three fixes wrapped for the v0.6.5-esp32 release tag:
1. **`sdkconfig.defaults` adds `CONFIG_FREERTOS_TIMER_TASK_STACK_DEPTH=8192`**.
The fix was already in `sdkconfig.defaults.template` (ADR-081, prevents
"stack overflow in task Tmr Svc" bootloop when adaptive_controller emits
feature_state from inside a Timer Svc callback). It was MISSING from the
canonical `sdkconfig.defaults` file used by the build, so any fresh
build picked up the 2 KiB FreeRTOS default and bootlooped on hardware.
Verified on COM7: with the fix, no panics in 30 s of operation; without
it, "***ERROR*** A stack overflow in task Tmr Svc has been detected."
followed by sustained bootloop.
2. **`ota_update.c` extracts `ota_load_psk_from_nvs()` and calls it from
both `ota_update_init()` and `ota_update_init_ex()`.** `main.c:230` uses
the `_ex` variant, but only `ota_update_init()` was loading the PSK
from NVS. Result: `s_ota_psk` stayed empty regardless of NVS contents,
so the RuView#596 fail-closed posture rejected every request — but the
diagnostic warning never printed at boot, leaving operators no signal
about why their OTA uploads were 403'ing. Verified on COM7:
W (3126) ota_update: NVS namespace 'security' not found —
OTA upload endpoint will REJECT all requests until provisioned.
Fail-closed per RuView#596.
3. **`version.txt`: 0.6.4 → 0.6.5**, paired with the v0.6.5-esp32 tag so the
firmware-ci version-guard job (RuView#505 fix-marker) stays happy.
Both validations done end-to-end on hardware (COM7, ESP32-S3 8MB,
provisioned with --edge-tier 2 to also incidentally re-verify #438 is not
reproducible on current main).
* fix(firmware): move defensive node_id capture before wifi_init_sta()
The original defensive copy in csi_collector_init() (line 172 of main.c)
runs AFTER wifi_init_sta() (line 147), which on some ESP32-S3 devices
corrupts g_nvs_config.node_id back to the Kconfig default of 1.
Reproduced on device 80:b5:4e:c1:be:b8 (ESP32-S3 QFN56 rev v0.2):
- NVS provisioned with node_id=5
- Release firmware (no fix): seed receives node_id=1 (clobbered)
- This patch: seed receives node_id=5 (correct)
Changes:
- Add csi_collector_set_node_id() called from main.c immediately
after nvs_config_load(), before wifi_init_sta() runs
- csi_collector_init() now detects and logs the clobber if early
capture disagrees with current g_nvs_config value
- Fallback path preserved: if set_node_id() is never called,
init() still captures from g_nvs_config (backwards compatible)
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(firmware): defensive copy of filter_mac to prevent callback crash
The CSI callback reads g_nvs_config.filter_mac_set and filter_mac on
every invocation (100-500 Hz). If wifi_init_sta() corrupts g_nvs_config
(same root cause as the node_id clobber), the callback reads garbage
from the struct, leading to Core 0 LoadProhibited panic after ~2400
callbacks (~70 seconds of operation).
Extends the early-capture pattern from the node_id fix to also copy
filter_mac_set and filter_mac into module-local statics before WiFi
init runs. Adds canary logging to detect filter_mac corruption.
Observed on device 80:b5:4e:c1:be:b8 via serial:
CSI cb #2400 → Guru Meditation Error: Core 0 panic'ed (LoadProhibited)
→ TG0WDT_SYS_RST → reboot → crash again at ~2900 callbacks
Refs #232#375#385#386#390
Co-Authored-By: Ruflo & AQE
* fix(firmware): MGMT-only promiscuous filter to prevent SPI cache crash
The WiFi driver's wDev_ProcessFiq interrupt handler crashes with
LoadProhibited in cache_ll_l1_resume_icache when promiscuous mode
captures MGMT+DATA frames (100-500 interrupts/sec). The high interrupt
rate races with SPI flash cache operations, corrupting cache state.
Changes:
- Promiscuous filter: MGMT+DATA → MGMT-only (~10 Hz beacons)
- CSI config: disable htltf_en and stbc_htltf2_en (LLTF-only)
LLTF provides 64 subcarriers (HT20) — sufficient for presence,
breathing, and fall detection. The 10 Hz beacon rate eliminates
the SPI flash cache contention that caused the crash.
Verified on device 80:b5:4e:c1:be:b8:
- Before: LoadProhibited crash at ~1600-2400 callbacks (every ~70s)
- After: 2700+ callbacks over 4.7 minutes, zero crashes
Backtrace decode confirmed crash in ESP-IDF closed-source WiFi blob:
_xt_lowint1 → wDev_ProcessFiq → spi_flash_restore_cache
→ cache_ll_l1_resume_icache → EXCVADDR=0x00000004 (NULL deref)
Co-Authored-By: Ruflo & AQE
* fix(provision): write-flash → write_flash for esptool v5 compat
esptool v5+ rejects hyphenated subcommands. The provision script
used 'write-flash' which fails with "invalid choice". Changed to
'write_flash' (underscore) which works with both old and new esptool.
Co-Authored-By: Ruflo & AQE
* fix(firmware): 50 Hz callback rate gate + sdkconfig extra IRAM opt
- Add early rate gate in wifi_csi_callback at 50 Hz (defense-in-depth,
does not prevent crash alone but reduces callback execution time)
- Add null-data injection timer infrastructure (disabled — TX adds
interrupt pressure that triggers the SPI cache crash, RuView#396)
- sdkconfig.defaults: add CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y
- sdkconfig.defaults: document SPIRAM XIP attempt (crashes differently)
Co-Authored-By: Ruflo & AQE
* fix(firmware): address PR #397 review feedback
Applies @ruvnet's five review requests on PR #397 (RuView#397 comment
4289417527):
1. **Inline comment on `provision.py` `write_flash`** — ESP-IDF v5.4
bundles esptool 4.10.0 (underscore-only). #391's hyphen swap broke
the documented venv flow; kept the underscore form and added a
three-line comment warning future maintainers not to "re-fix" it.
2. **Correct `edge_processing.c` sample_rate** (blocking) — changed
hard-coded `20.0f` → `10.0f` at line 718 so
`estimate_bpm_zero_crossing()` matches the MGMT-only CSI rate.
Without this, breathing and heart-rate reports were 2× the true
value. Added a comment tying the constant to the callback rate gate.
3. **Removed disabled probe-injection infrastructure** — dropped the
forward declaration, the `CSI_PROBE_INTERVAL_MS` define, six static
variables (`s_probe_timer`, `s_probe_tx_count`, `s_probe_tx_fail`,
`s_ap_bssid`, `s_ap_bssid_known`), and three functions
(`csi_send_probe_request`, `probe_timer_cb`,
`csi_collector_start_probe_timer`). None were reachable.
`csi_inject_ndp_frame()` reverted to the original ADR-029 stub.
Can be revived from this commit's parent if needed.
4. **Cleaned `sdkconfig.defaults`** — removed the SPIRAM prose and
commented-out `# CONFIG_SPIRAM is not set` line. Kept only the live
`CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y` with a concise rationale.
5. **Bumped firmware version 0.6.1 → 0.6.2** and added four
`[Unreleased]` CHANGELOG entries covering the SPI cache crash fix,
the `filter_mac` / `node_id` clobber defense, the sample-rate
correction, and the `write_flash` command-form revert.
Net: +39 / -128 across six files.
Validation in this devcontainer:
- Static sanity on modified C files: braces balance (csi_collector.c
59/59; edge_processing.c 96/96), zero dangling references to removed
probe-injection symbols.
- Rust workspace tests and Python proof not executed here — cargo not
installed and pip blocked by PEP 668. Deferring hardware build +
flash + miniterm verification to @ruvnet's COM7 per his offer in
the review comment.
Co-Authored-By: claude-flow <ruv@ruv.net>
---------
Co-authored-by: Dragan Spiridonov <spiridonovdragan@gmail.com>
emit_feature_state() runs inside the FreeRTOS Timer Svc task via the
fast loop callback; it memsets an rv_feature_state_t, queries vitals/
radio, and sends via stream_sender (lwIP sendto). Default Timer Svc
stack is 2 KiB, which overflows and panics ~1 s after boot:
***ERROR*** A stack overflow in task Tmr Svc has been detected.
Bump CONFIG_FREERTOS_TIMER_TASK_STACK_DEPTH to 8 KiB across the three
sdkconfig defaults files (default, template, 4mb). Matches the main
task stack size already in use.
Found during on-device validation on ESP32-S3 (MAC 3c:0f:02:e9:b5:f8)
after flashing the post-merge v0.6.1 build — firmware boots, connects
WiFi, emits one medium tick, then crashes on the fast tick that calls
emit_feature_state().
Follow-up: consider moving emit_feature_state + network I/O out of the
timer daemon into a dedicated worker task (open issue).
Co-Authored-By: claude-flow <ruv@ruv.net>
The committed sdkconfig had CONFIG_ESP_WIFI_CSI_ENABLED disabled, causing
all builds to crash at runtime with "CSI not enabled in menuconfig".
Root cause: sdkconfig.defaults.template existed but ESP-IDF only reads
sdkconfig.defaults (no .template suffix).
Fixes:
- Add sdkconfig.defaults with CONFIG_ESP_WIFI_CSI_ENABLED=y
- Add #error compile guard in csi_collector.c to prevent recurrence
- Fix NVS encryption default (requires eFuse, breaks clean builds)
Verified: Docker build + flash to ESP32-S3 + CSI callbacks confirmed.
Closes#241
Relates to #223, #238, #234, #210, #190
Co-Authored-By: claude-flow <ruv@ruv.net>