diff --git a/CHANGELOG.md b/CHANGELOG.md index c754a985..eb52f069 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -167,7 +167,11 @@ firing cleanly, HEALTH mesh packets sent. Kconfig surface added under "Adaptive Controller (ADR-081)". ### Fixed -- **`provision.py` esptool v5 compat** (#391) — Stale `write_flash` (underscore) syntax in the dry-run manual-flash hint now uses `write-flash` (hyphenated) for esptool >= 5.x. The primary flash command was already correct. +- **Firmware: SPI flash cache crash under high CSI callback pressure** (RuView#396, #397) — ESP32-S3 nodes crashed in `cache_ll_l1_resume_icache` / `wDev_ProcessFiq` after ~2400 callbacks when the promiscuous filter admitted DATA frames at 100–500 Hz. Fixed by narrowing the filter mask to `WIFI_PROMIS_FILTER_MASK_MGMT` (~10 Hz beacons), adding a 50 Hz early callback rate gate (`CSI_MIN_PROCESS_INTERVAL_US`) that drops excess callbacks before any processing work, and enabling `CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y` as defense-in-depth. Stability validated with a 4-min-per-node soak. +- **Firmware: `filter_mac` / `node_id` clobber by WiFi driver init** (#232, #375, #385, #386, #390, #397) — `g_nvs_config` can be corrupted during `wifi_init_sta()` on some devices (confirmed on `80:b5:4e:c1:be:b8`), reverting `node_id` to the Kconfig default and producing garbage MAC-filter reads in the CSI callback (100–500 Hz). New `csi_collector_set_node_id()` API called from `app_main()` **before** `wifi_init_sta()` captures both fields into module-local statics (`s_node_id`, `s_filter_mac`, `s_filter_mac_set`). `csi_collector_init()` now runs a canary that distinguishes "early≠g_nvs_config" (corruption confirmed) from a no-op match. All CSI runtime paths use the defensive copies exclusively. +- **Firmware: `edge_processing` sample rate mismatch** (#397) — `estimate_bpm_zero_crossing()` was called with a hard-coded `sample_rate = 20.0f`, but MGMT-only promiscuous delivers ~10 Hz. 
Breathing and heart-rate reports were 2× too high. Corrected to `10.0f` with an explicit comment tying it to the callback rate. +- **`provision.py` esptool command form** (#391, #397) — ESP-IDF v5.4 bundles `esptool 4.10.0`, which only accepts `write_flash` (underscore). Standalone `pip install esptool` v5.x accepts both forms but prefers `write-flash`. #391 switched to `write-flash` which broke the documented ESP-IDF Python venv flow; #397 reverts to `write_flash` (works with both esptool 4.x and 5.x) with an inline comment warning future maintainers not to "re-fix" it. +- **`provision.py` esptool v5 dry-run hint** (#391) — Stale `write_flash` (underscore) syntax in the dry-run manual-flash hint now uses `write-flash` (hyphenated) for esptool >= 5.x. The primary flash command was already correct. - **`provision.py` silent NVS wipe** (#391) — The script replaces the entire `csi_cfg` NVS namespace on every run, so partial invocations were silently erasing WiFi credentials and causing `Retrying WiFi connection (10/10)` in the field. Now refuses to run without `--ssid`, `--password`, and `--target-ip` unless `--force-partial` is passed. `--force-partial` prints a warning listing which keys will be wiped. - **Firmware: defensive `node_id` capture** (#232, #375, #385, #386, #390) — Users on multi-node deployments reported `node_id` reverting to the Kconfig default (`1`) in UDP frames and in the `csi_collector` init log, despite NVS loading the correct value. The root cause (memory corruption of `g_nvs_config`) has not been definitively isolated, but the UDP frame header is now tamper-proof: `csi_collector_init()` captures `g_nvs_config.node_id` into a module-local `s_node_id` once, and `csi_serialize_frame()` plus all other consumers (`edge_processing.c`, `wasm_runtime.c`, `display_ui.c`, `swarm_bridge_init`) read it via the new `csi_collector_get_node_id()` accessor. 
A canary logs `WARN` if `g_nvs_config.node_id` diverges from `s_node_id` at end-of-init, helping isolate the upstream corruption path. Validated on attached ESP32-S3 (COM8): NVS `node_id=2` propagates through boot log, capture log, init log, and byte[4] of every UDP frame. diff --git a/firmware/esp32-csi-node/main/csi_collector.c b/firmware/esp32-csi-node/main/csi_collector.c index b91ef2a0..c8d5eb7d 100644 --- a/firmware/esp32-csi-node/main/csi_collector.c +++ b/firmware/esp32-csi-node/main/csi_collector.c @@ -102,9 +102,6 @@ static uint8_t s_hop_index = 0; /** Handle for the periodic hop timer. NULL when timer is not running. */ static esp_timer_handle_t s_hop_timer = NULL; -/* Forward declaration — probe injection timer (defined after hop timer code) */ -static void csi_collector_start_probe_timer(void); - /** * Serialize CSI data into ADR-018 binary frame format. * @@ -382,10 +379,6 @@ void csi_collector_init(void) ESP_LOGI(TAG, "CSI collection initialized (node_id=%u, channel=%u)", (unsigned)s_node_id, (unsigned)csi_channel); - - /* Probe injection disabled — null-data TX at 10 Hz adds enough WiFi - * interrupt pressure to trigger the SPI cache crash (RuView#396). - * MGMT-only at ~10 Hz is the maximum stable rate on this hardware. */ } /* Accessor for other modules that need the authoritative runtime node_id. */ @@ -534,129 +527,42 @@ void csi_collector_start_hop_timer(void) (unsigned long)s_dwell_ms, (unsigned)s_hop_count); } -/* ---- Active CSI excitation via probe request injection (RuView#396) ---- - * - * MGMT-only promiscuous filter gives ~10 Hz (beacons), but the edge processing - * pipeline is designed for 20 Hz. We boost the CSI rate by sending probe - * requests at a controlled interval. Each visible AP responds with a probe - * response (MGMT frame), which the promiscuous callback captures with CSI. - * - * This gives deterministic rate control without DATA frames that cause the - * wDev_ProcessFiq SPI flash cache crash at 100+ Hz interrupt rates. 
- * - * Rate math: N probe requests/sec → N probe responses/sec per visible AP - * + ~10 Hz beacons = (N * num_APs) + 10 Hz effective CSI rate - * At 10 Hz injection with 1 AP responding: ~20 Hz total (matches edge_proc) - */ +/* ---- ADR-029: NDP frame injection stub ---- */ -#define CSI_PROBE_INTERVAL_MS 100 /* 10 Hz probe injection → ~20 Hz total with beacons */ -static esp_timer_handle_t s_probe_timer = NULL; -static uint32_t s_probe_tx_count = 0; -static uint32_t s_probe_tx_fail = 0; - -static uint8_t s_ap_bssid[6] = {0}; -static bool s_ap_bssid_known = false; - -static void csi_send_probe_request(void) -{ - /* Directed null-data frame to the connected AP. - * - * We send a Null Data frame (not a broadcast probe request) to avoid - * triggering WiFi channel scanning/toggling. The AP responds with an ACK, - * and the exchange generates CSI on both the TX and RX paths. - * Using null-data instead of probe request because: - * - Probe requests to broadcast BSSID trigger channel width negotiation - * (observed: 1912 channel toggles in 2.5 min, disrupting CSI collection) - * - Null-data to the connected AP is the standard WiFi sensing approach - * - The AP always ACKs, giving us a deterministic CSI response - * - * Frame: Type=Data (0x02), Subtype=Null (0x04) → FC=0x0048 - * ToDS=1 (going to AP), FromDS=0 - */ - if (!s_ap_bssid_known) { - wifi_ap_record_t ap_info; - if (esp_wifi_sta_get_ap_info(&ap_info) == ESP_OK) { - memcpy(s_ap_bssid, ap_info.bssid, 6); - s_ap_bssid_known = true; - ESP_LOGI(TAG, "Probe target: AP BSSID %02x:%02x:%02x:%02x:%02x:%02x", - s_ap_bssid[0], s_ap_bssid[1], s_ap_bssid[2], - s_ap_bssid[3], s_ap_bssid[4], s_ap_bssid[5]); - } else { - return; /* Not connected yet — skip this cycle */ - } - } - - uint8_t null_frame[24]; - memset(null_frame, 0, sizeof(null_frame)); - - /* Frame Control: Null Data, ToDS=1 */ - null_frame[0] = 0x48; /* Type=Data, Subtype=Null */ - null_frame[1] = 0x01; /* ToDS=1 */ - - /* Addr1 (receiver = AP BSSID) */ - 
memcpy(&null_frame[4], s_ap_bssid, 6); - - /* Addr2 (transmitter = our MAC — hardware overwrites, but set for clarity) */ - uint8_t mac[6]; - esp_wifi_get_mac(WIFI_IF_STA, mac); - memcpy(&null_frame[10], mac, 6); - - /* Addr3 (BSSID = AP) */ - memcpy(&null_frame[16], s_ap_bssid, 6); - - esp_err_t err = esp_wifi_80211_tx(WIFI_IF_STA, null_frame, sizeof(null_frame), true); - if (err == ESP_OK) { - s_probe_tx_count++; - } else { - s_probe_tx_fail++; - if (s_probe_tx_fail <= 3) { - ESP_LOGW(TAG, "Null-data TX failed: %s (count=%lu)", - esp_err_to_name(err), (unsigned long)s_probe_tx_fail); - } - } -} - -static void probe_timer_cb(void *arg) -{ - (void)arg; - csi_send_probe_request(); -} - -static void csi_collector_start_probe_timer(void) -{ - if (s_probe_timer != NULL) { - ESP_LOGW(TAG, "Probe timer already running"); - return; - } - - esp_timer_create_args_t timer_args = { - .callback = probe_timer_cb, - .arg = NULL, - .name = "csi_probe", - }; - - esp_err_t err = esp_timer_create(&timer_args, &s_probe_timer); - if (err != ESP_OK) { - ESP_LOGE(TAG, "Failed to create probe timer: %s", esp_err_to_name(err)); - return; - } - - uint64_t period_us = (uint64_t)CSI_PROBE_INTERVAL_MS * 1000; - err = esp_timer_start_periodic(s_probe_timer, period_us); - if (err != ESP_OK) { - ESP_LOGE(TAG, "Failed to start probe timer: %s", esp_err_to_name(err)); - esp_timer_delete(s_probe_timer); - s_probe_timer = NULL; - return; - } - - ESP_LOGI(TAG, "Null-data injection timer started: %d ms (~%d Hz + beacons, RuView#396)", - CSI_PROBE_INTERVAL_MS, 1000 / CSI_PROBE_INTERVAL_MS); -} - -/* Legacy NDP injection stub — kept for API compatibility */ esp_err_t csi_inject_ndp_frame(void) { - csi_send_probe_request(); - return ESP_OK; + /* + * TODO: Construct a proper 802.11 Null Data Packet frame. + * + * A real NDP is preamble-only (~24 us airtime, no payload) and is the + * sensing-first TX mechanism described in ADR-029. 
For now we send a + * minimal null-data frame as a placeholder so the API is wired up. + * + * Frame structure (IEEE 802.11 Null Data): + * FC (2) | Duration (2) | Addr1 (6) | Addr2 (6) | Addr3 (6) | SeqCtl (2) + * = 24 bytes total, no body, no FCS (hardware appends FCS). + */ + uint8_t ndp_frame[24]; + memset(ndp_frame, 0, sizeof(ndp_frame)); + + /* Frame Control: Type=Data (0x02), Subtype=Null (0x04) -> 0x0048 */ + ndp_frame[0] = 0x48; + ndp_frame[1] = 0x00; + + /* Duration: 0 (let hardware fill) */ + + /* Addr1 (destination): broadcast */ + memset(&ndp_frame[4], 0xFF, 6); + + /* Addr2 (source): will be overwritten by hardware with own MAC */ + + /* Addr3 (BSSID): broadcast */ + memset(&ndp_frame[16], 0xFF, 6); + + esp_err_t err = esp_wifi_80211_tx(WIFI_IF_STA, ndp_frame, sizeof(ndp_frame), false); + if (err != ESP_OK) { + ESP_LOGW(TAG, "NDP inject failed: %s", esp_err_to_name(err)); + } + + return err; } diff --git a/firmware/esp32-csi-node/main/edge_processing.c b/firmware/esp32-csi-node/main/edge_processing.c index ad5c8795..94680e52 100644 --- a/firmware/esp32-csi-node/main/edge_processing.c +++ b/firmware/esp32-csi-node/main/edge_processing.c @@ -714,8 +714,11 @@ static void process_frame(const edge_ring_slot_t *slot) s_frame_count++; s_latest_rssi = slot->rssi; - /* Assumed CSI sample rate (~20 Hz for typical ESP32 CSI). */ - const float sample_rate = 20.0f; + /* CSI sample rate. MGMT-only promiscuous filter (RuView#396, csi_collector.c) + * yields ~10 Hz from beacons; keep this value aligned with csi_collector's + * effective callback rate or estimate_bpm_zero_crossing() reports the wrong + * BPM (2× rate mismatch → 2× wrong breathing/HR). 
*/ + const float sample_rate = 10.0f; /* --- Step 1-2: Phase extraction + unwrapping per subcarrier --- */ float phases[EDGE_MAX_SUBCARRIERS]; diff --git a/firmware/esp32-csi-node/provision.py b/firmware/esp32-csi-node/provision.py index 2c78dea6..d6a0e2f0 100644 --- a/firmware/esp32-csi-node/provision.py +++ b/firmware/esp32-csi-node/provision.py @@ -155,6 +155,9 @@ def flash_nvs(port, baud, nvs_bin): "--chip", "esp32s3", "--port", port, "--baud", str(baud), + # Keep underscore form — ESP-IDF v5.4 bundles esptool 4.10.0 which only + # accepts "write_flash". pip's esptool >=5.x accepts both (hyphenated + # form preferred) but keeps underscore working. Do not "correct" this. "write_flash", hex(NVS_PARTITION_OFFSET), bin_path, ] diff --git a/firmware/esp32-csi-node/sdkconfig.defaults b/firmware/esp32-csi-node/sdkconfig.defaults index fd8f6c2e..9d2ca761 100644 --- a/firmware/esp32-csi-node/sdkconfig.defaults +++ b/firmware/esp32-csi-node/sdkconfig.defaults @@ -32,10 +32,5 @@ CONFIG_LWIP_SO_RCVBUF=y # FreeRTOS: increase task stack for CSI processing CONFIG_ESP_MAIN_TASK_STACK_SIZE=8192 -# SPIRAM XIP tested but crashes with "Cache disabled but cached memory -# region accessed" — different crash type, not solved. Disabled for now. -# See RuView#396 for details. PSRAM heap-only mode can be enabled later. -# CONFIG_SPIRAM is not set - -# Extra WiFi IRAM placement (defense-in-depth) +# Extra WiFi IRAM placement (defense-in-depth for RuView#396 SPI cache race) CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y
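Reviewer note on the `edge_processing.c` hunk: the 2× error follows directly from how a zero-crossing estimator converts a sample count into seconds. The sketch below illustrates that arithmetic only. `bpm_from_zero_crossings()` and `fill_square_wave()` are hypothetical stand-ins written for this note; the body of the real `estimate_bpm_zero_crossing()` is not shown in this diff and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill buf with a +1/-1 square wave whose period is
 * period_samples samples. At a true 10 Hz capture rate, 40 samples per
 * period corresponds to 0.25 Hz, i.e. 15 breaths per minute. */
void fill_square_wave(float *buf, size_t n, size_t period_samples)
{
    assert(period_samples >= 2);
    for (size_t i = 0; i < n; i++) {
        buf[i] = ((i % period_samples) < period_samples / 2) ? 1.0f : -1.0f;
    }
}

/* Hypothetical zero-crossing rate estimator (not the firmware's code).
 * Counts sign changes, treats two crossings as one cycle, and converts
 * the window length to seconds using the ASSUMED sample rate. */
float bpm_from_zero_crossings(const float *x, size_t n, float sample_rate_hz)
{
    assert(sample_rate_hz > 0.0f);
    if (n < 2) {
        return 0.0f;
    }

    size_t crossings = 0;
    for (size_t i = 1; i < n; i++) {
        if ((x[i - 1] < 0.0f) != (x[i] < 0.0f)) {
            crossings++;
        }
    }

    /* The assumed rate enters only here: doubling it halves the computed
     * window duration and therefore doubles the reported BPM. */
    float duration_s = (float)(n - 1) / sample_rate_hz;
    float cycles = (float)crossings / 2.0f;
    return cycles / duration_s * 60.0f;
}
```

With 600 samples of a 40-sample-period wave (60 s of data captured at a true 10 Hz), passing `10.0f` gives roughly 14.5 BPM, while passing the old `20.0f` gives roughly 29 BPM for the same buffer. This is why the constant has to track the effective callback rate of `csi_collector.c`.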