fix(firmware): address PR #397 review feedback
Applies @ruvnet's five review requests on PR #397 (RuView#397 comment 4289417527): 1. **Inline comment on `provision.py` `write_flash`** — ESP-IDF v5.4 bundles esptool 4.10.0 (underscore-only). #391's hyphen swap broke the documented venv flow; kept the underscore form and added a three-line comment warning future maintainers not to "re-fix" it. 2. **Correct `edge_processing.c` sample_rate** (blocking) — changed hard-coded `20.0f` → `10.0f` at line 718 so `estimate_bpm_zero_crossing()` matches the MGMT-only CSI rate. Without this, breathing and heart-rate reports were 2× the true value. Added a comment tying the constant to the callback rate gate. 3. **Removed disabled probe-injection infrastructure** — dropped the forward declaration, the `CSI_PROBE_INTERVAL_MS` define, six static variables (`s_probe_timer`, `s_probe_tx_count`, `s_probe_tx_fail`, `s_ap_bssid`, `s_ap_bssid_known`), and three functions (`csi_send_probe_request`, `probe_timer_cb`, `csi_collector_start_probe_timer`). None were reachable. `csi_inject_ndp_frame()` reverted to the original ADR-029 stub. Can be revived from this commit's parent if needed. 4. **Cleaned `sdkconfig.defaults`** — removed the SPIRAM prose and commented-out `# CONFIG_SPIRAM is not set` line. Kept only the live `CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y` with a concise rationale. 5. **Bumped firmware version 0.6.1 → 0.6.2** and added four `[Unreleased]` CHANGELOG entries covering the SPI cache crash fix, the `filter_mac` / `node_id` clobber defense, the sample-rate correction, and the `write_flash` command-form revert. Net: +39 / -128 across six files. Validation in this devcontainer: - Static sanity on modified C files: braces balance (csi_collector.c 59/59; edge_processing.c 96/96), zero dangling references to removed probe-injection symbols. - Rust workspace tests and Python proof not executed here — cargo not installed and pip blocked by PEP 668. Deferring hardware build + flash + miniterm verification to @ruvnet's COM7 per his offer in the review comment. Co-Authored-By: claude-flow <ruv@ruv.net>
This commit is contained in:
parent
a14835bb91
commit
ee11a5a766
|
|
@ -167,7 +167,11 @@ firing cleanly, HEALTH mesh packets sent.
|
|||
Kconfig surface added under "Adaptive Controller (ADR-081)".
|
||||
|
||||
### Fixed
|
||||
- **`provision.py` esptool v5 compat** (#391) — Stale `write_flash` (underscore) syntax in the dry-run manual-flash hint now uses `write-flash` (hyphenated) for esptool >= 5.x. The primary flash command was already correct.
|
||||
- **Firmware: SPI flash cache crash under high CSI callback pressure** (RuView#396, #397) — ESP32-S3 nodes crashed in `cache_ll_l1_resume_icache` / `wDev_ProcessFiq` after ~2400 callbacks when the promiscuous filter admitted DATA frames at 100–500 Hz. Fixed by narrowing the filter mask to `WIFI_PROMIS_FILTER_MASK_MGMT` (~10 Hz beacons), adding a 50 Hz early callback rate gate (`CSI_MIN_PROCESS_INTERVAL_US`) that drops excess callbacks before any processing work, and enabling `CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y` as defense-in-depth. Stability validated with a 4-min-per-node soak.
|
||||
- **Firmware: `filter_mac` / `node_id` clobber by WiFi driver init** (#232, #375, #385, #386, #390, #397) — `g_nvs_config` can be corrupted during `wifi_init_sta()` on some devices (confirmed on `80:b5:4e:c1:be:b8`), reverting `node_id` to the Kconfig default and producing garbage MAC-filter reads in the CSI callback (100–500 Hz). New `csi_collector_set_node_id()` API called from `app_main()` **before** `wifi_init_sta()` captures both fields into module-local statics (`s_node_id`, `s_filter_mac`, `s_filter_mac_set`). `csi_collector_init()` now runs a canary that distinguishes "early≠g_nvs_config" (corruption confirmed) from a no-op match. All CSI runtime paths use the defensive copies exclusively.
|
||||
- **Firmware: `edge_processing` sample rate mismatch** (#397) — `estimate_bpm_zero_crossing()` was called with a hard-coded `sample_rate = 20.0f`, but MGMT-only promiscuous delivers ~10 Hz. Breathing and heart-rate reports were 2× too high. Corrected to `10.0f` with an explicit comment tying it to the callback rate.
|
||||
- **`provision.py` esptool command form** (#391, #397) — ESP-IDF v5.4 bundles `esptool 4.10.0`, which only accepts `write_flash` (underscore). Standalone `pip install esptool` v5.x accepts both forms but prefers `write-flash`. #391 switched to `write-flash` which broke the documented ESP-IDF Python venv flow; #397 reverts to `write_flash` (works with both esptool 4.x and 5.x) with an inline comment warning future maintainers not to "re-fix" it.
|
||||
- **`provision.py` esptool v5 dry-run hint** (#391) — Stale `write_flash` (underscore) syntax in the dry-run manual-flash hint now uses `write-flash` (hyphenated) for esptool >= 5.x. The primary flash command was already correct.
|
||||
- **`provision.py` silent NVS wipe** (#391) — The script replaces the entire `csi_cfg` NVS namespace on every run, so partial invocations were silently erasing WiFi credentials and causing `Retrying WiFi connection (10/10)` in the field. Now refuses to run without `--ssid`, `--password`, and `--target-ip` unless `--force-partial` is passed. `--force-partial` prints a warning listing which keys will be wiped.
|
||||
- **Firmware: defensive `node_id` capture** (#232, #375, #385, #386, #390) — Users on multi-node deployments reported `node_id` reverting to the Kconfig default (`1`) in UDP frames and in the `csi_collector` init log, despite NVS loading the correct value. The root cause (memory corruption of `g_nvs_config`) has not been definitively isolated, but the UDP frame header is now tamper-proof: `csi_collector_init()` captures `g_nvs_config.node_id` into a module-local `s_node_id` once, and `csi_serialize_frame()` plus all other consumers (`edge_processing.c`, `wasm_runtime.c`, `display_ui.c`, `swarm_bridge_init`) read it via the new `csi_collector_get_node_id()` accessor. A canary logs `WARN` if `g_nvs_config.node_id` diverges from `s_node_id` at end-of-init, helping isolate the upstream corruption path. Validated on attached ESP32-S3 (COM8): NVS `node_id=2` propagates through boot log, capture log, init log, and byte[4] of every UDP frame.
|
||||
|
||||
|
|
|
|||
|
|
@ -102,9 +102,6 @@ static uint8_t s_hop_index = 0;
|
|||
/** Handle for the periodic hop timer. NULL when timer is not running. */
|
||||
static esp_timer_handle_t s_hop_timer = NULL;
|
||||
|
||||
/* Forward declaration — probe injection timer (defined after hop timer code) */
|
||||
static void csi_collector_start_probe_timer(void);
|
||||
|
||||
/**
|
||||
* Serialize CSI data into ADR-018 binary frame format.
|
||||
*
|
||||
|
|
@ -382,10 +379,6 @@ void csi_collector_init(void)
|
|||
|
||||
ESP_LOGI(TAG, "CSI collection initialized (node_id=%u, channel=%u)",
|
||||
(unsigned)s_node_id, (unsigned)csi_channel);
|
||||
|
||||
/* Probe injection disabled — null-data TX at 10 Hz adds enough WiFi
|
||||
* interrupt pressure to trigger the SPI cache crash (RuView#396).
|
||||
* MGMT-only at ~10 Hz is the maximum stable rate on this hardware. */
|
||||
}
|
||||
|
||||
/* Accessor for other modules that need the authoritative runtime node_id. */
|
||||
|
|
@ -534,129 +527,42 @@ void csi_collector_start_hop_timer(void)
|
|||
(unsigned long)s_dwell_ms, (unsigned)s_hop_count);
|
||||
}
|
||||
|
||||
/* ---- Active CSI excitation via probe request injection (RuView#396) ----
|
||||
*
|
||||
* MGMT-only promiscuous filter gives ~10 Hz (beacons), but the edge processing
|
||||
* pipeline is designed for 20 Hz. We boost the CSI rate by sending probe
|
||||
* requests at a controlled interval. Each visible AP responds with a probe
|
||||
* response (MGMT frame), which the promiscuous callback captures with CSI.
|
||||
*
|
||||
* This gives deterministic rate control without DATA frames that cause the
|
||||
* wDev_ProcessFiq SPI flash cache crash at 100+ Hz interrupt rates.
|
||||
*
|
||||
* Rate math: N probe requests/sec → N probe responses/sec per visible AP
|
||||
* + ~10 Hz beacons = (N * num_APs) + 10 Hz effective CSI rate
|
||||
* At 10 Hz injection with 1 AP responding: ~20 Hz total (matches edge_proc)
|
||||
*/
|
||||
/* ---- ADR-029: NDP frame injection stub ---- */
|
||||
|
||||
#define CSI_PROBE_INTERVAL_MS 100 /* 10 Hz probe injection → ~20 Hz total with beacons */
|
||||
static esp_timer_handle_t s_probe_timer = NULL;
|
||||
static uint32_t s_probe_tx_count = 0;
|
||||
static uint32_t s_probe_tx_fail = 0;
|
||||
|
||||
static uint8_t s_ap_bssid[6] = {0};
|
||||
static bool s_ap_bssid_known = false;
|
||||
|
||||
static void csi_send_probe_request(void)
|
||||
{
|
||||
/* Directed null-data frame to the connected AP.
|
||||
*
|
||||
* We send a Null Data frame (not a broadcast probe request) to avoid
|
||||
* triggering WiFi channel scanning/toggling. The AP responds with an ACK,
|
||||
* and the exchange generates CSI on both the TX and RX paths.
|
||||
* Using null-data instead of probe request because:
|
||||
* - Probe requests to broadcast BSSID trigger channel width negotiation
|
||||
* (observed: 1912 channel toggles in 2.5 min, disrupting CSI collection)
|
||||
* - Null-data to the connected AP is the standard WiFi sensing approach
|
||||
* - The AP always ACKs, giving us a deterministic CSI response
|
||||
*
|
||||
* Frame: Type=Data (0x02), Subtype=Null (0x04) → FC=0x0048
|
||||
* ToDS=1 (going to AP), FromDS=0
|
||||
*/
|
||||
if (!s_ap_bssid_known) {
|
||||
wifi_ap_record_t ap_info;
|
||||
if (esp_wifi_sta_get_ap_info(&ap_info) == ESP_OK) {
|
||||
memcpy(s_ap_bssid, ap_info.bssid, 6);
|
||||
s_ap_bssid_known = true;
|
||||
ESP_LOGI(TAG, "Probe target: AP BSSID %02x:%02x:%02x:%02x:%02x:%02x",
|
||||
s_ap_bssid[0], s_ap_bssid[1], s_ap_bssid[2],
|
||||
s_ap_bssid[3], s_ap_bssid[4], s_ap_bssid[5]);
|
||||
} else {
|
||||
return; /* Not connected yet — skip this cycle */
|
||||
}
|
||||
}
|
||||
|
||||
uint8_t null_frame[24];
|
||||
memset(null_frame, 0, sizeof(null_frame));
|
||||
|
||||
/* Frame Control: Null Data, ToDS=1 */
|
||||
null_frame[0] = 0x48; /* Type=Data, Subtype=Null */
|
||||
null_frame[1] = 0x01; /* ToDS=1 */
|
||||
|
||||
/* Addr1 (receiver = AP BSSID) */
|
||||
memcpy(&null_frame[4], s_ap_bssid, 6);
|
||||
|
||||
/* Addr2 (transmitter = our MAC — hardware overwrites, but set for clarity) */
|
||||
uint8_t mac[6];
|
||||
esp_wifi_get_mac(WIFI_IF_STA, mac);
|
||||
memcpy(&null_frame[10], mac, 6);
|
||||
|
||||
/* Addr3 (BSSID = AP) */
|
||||
memcpy(&null_frame[16], s_ap_bssid, 6);
|
||||
|
||||
esp_err_t err = esp_wifi_80211_tx(WIFI_IF_STA, null_frame, sizeof(null_frame), true);
|
||||
if (err == ESP_OK) {
|
||||
s_probe_tx_count++;
|
||||
} else {
|
||||
s_probe_tx_fail++;
|
||||
if (s_probe_tx_fail <= 3) {
|
||||
ESP_LOGW(TAG, "Null-data TX failed: %s (count=%lu)",
|
||||
esp_err_to_name(err), (unsigned long)s_probe_tx_fail);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
static void probe_timer_cb(void *arg)
|
||||
{
|
||||
(void)arg;
|
||||
csi_send_probe_request();
|
||||
}
|
||||
|
||||
static void csi_collector_start_probe_timer(void)
|
||||
{
|
||||
if (s_probe_timer != NULL) {
|
||||
ESP_LOGW(TAG, "Probe timer already running");
|
||||
return;
|
||||
}
|
||||
|
||||
esp_timer_create_args_t timer_args = {
|
||||
.callback = probe_timer_cb,
|
||||
.arg = NULL,
|
||||
.name = "csi_probe",
|
||||
};
|
||||
|
||||
esp_err_t err = esp_timer_create(&timer_args, &s_probe_timer);
|
||||
if (err != ESP_OK) {
|
||||
ESP_LOGE(TAG, "Failed to create probe timer: %s", esp_err_to_name(err));
|
||||
return;
|
||||
}
|
||||
|
||||
uint64_t period_us = (uint64_t)CSI_PROBE_INTERVAL_MS * 1000;
|
||||
err = esp_timer_start_periodic(s_probe_timer, period_us);
|
||||
if (err != ESP_OK) {
|
||||
ESP_LOGE(TAG, "Failed to start probe timer: %s", esp_err_to_name(err));
|
||||
esp_timer_delete(s_probe_timer);
|
||||
s_probe_timer = NULL;
|
||||
return;
|
||||
}
|
||||
|
||||
ESP_LOGI(TAG, "Null-data injection timer started: %d ms (~%d Hz + beacons, RuView#396)",
|
||||
CSI_PROBE_INTERVAL_MS, 1000 / CSI_PROBE_INTERVAL_MS);
|
||||
}
|
||||
|
||||
/* Legacy NDP injection stub — kept for API compatibility */
|
||||
esp_err_t csi_inject_ndp_frame(void)
|
||||
{
|
||||
csi_send_probe_request();
|
||||
return ESP_OK;
|
||||
/*
|
||||
* TODO: Construct a proper 802.11 Null Data Packet frame.
|
||||
*
|
||||
* A real NDP is preamble-only (~24 us airtime, no payload) and is the
|
||||
* sensing-first TX mechanism described in ADR-029. For now we send a
|
||||
* minimal null-data frame as a placeholder so the API is wired up.
|
||||
*
|
||||
* Frame structure (IEEE 802.11 Null Data):
|
||||
* FC (2) | Duration (2) | Addr1 (6) | Addr2 (6) | Addr3 (6) | SeqCtl (2)
|
||||
* = 24 bytes total, no body, no FCS (hardware appends FCS).
|
||||
*/
|
||||
uint8_t ndp_frame[24];
|
||||
memset(ndp_frame, 0, sizeof(ndp_frame));
|
||||
|
||||
/* Frame Control: Type=Data (0x02), Subtype=Null (0x04) -> 0x0048 */
|
||||
ndp_frame[0] = 0x48;
|
||||
ndp_frame[1] = 0x00;
|
||||
|
||||
/* Duration: 0 (let hardware fill) */
|
||||
|
||||
/* Addr1 (destination): broadcast */
|
||||
memset(&ndp_frame[4], 0xFF, 6);
|
||||
|
||||
/* Addr2 (source): will be overwritten by hardware with own MAC */
|
||||
|
||||
/* Addr3 (BSSID): broadcast */
|
||||
memset(&ndp_frame[16], 0xFF, 6);
|
||||
|
||||
esp_err_t err = esp_wifi_80211_tx(WIFI_IF_STA, ndp_frame, sizeof(ndp_frame), false);
|
||||
if (err != ESP_OK) {
|
||||
ESP_LOGW(TAG, "NDP inject failed: %s", esp_err_to_name(err));
|
||||
}
|
||||
|
||||
return err;
|
||||
}
|
||||
|
|
|
|||
|
|
@ -714,8 +714,11 @@ static void process_frame(const edge_ring_slot_t *slot)
|
|||
s_frame_count++;
|
||||
s_latest_rssi = slot->rssi;
|
||||
|
||||
/* Assumed CSI sample rate (~20 Hz for typical ESP32 CSI). */
|
||||
const float sample_rate = 20.0f;
|
||||
/* CSI sample rate. MGMT-only promiscuous filter (RuView#396, csi_collector.c)
|
||||
* yields ~10 Hz from beacons; keep this value aligned with csi_collector's
|
||||
* effective callback rate or estimate_bpm_zero_crossing() reports the wrong
|
||||
* BPM (2× rate mismatch → 2× wrong breathing/HR). */
|
||||
const float sample_rate = 10.0f;
|
||||
|
||||
/* --- Step 1-2: Phase extraction + unwrapping per subcarrier --- */
|
||||
float phases[EDGE_MAX_SUBCARRIERS];
|
||||
|
|
|
|||
|
|
@ -155,6 +155,9 @@ def flash_nvs(port, baud, nvs_bin):
|
|||
"--chip", "esp32s3",
|
||||
"--port", port,
|
||||
"--baud", str(baud),
|
||||
# Keep underscore form — ESP-IDF v5.4 bundles esptool 4.10.0 which only
|
||||
# accepts "write_flash". pip's esptool >=5.x accepts both (hyphenated
|
||||
# form preferred) but keeps underscore working. Do not "correct" this.
|
||||
"write_flash",
|
||||
hex(NVS_PARTITION_OFFSET), bin_path,
|
||||
]
|
||||
|
|
|
|||
|
|
@ -32,10 +32,5 @@ CONFIG_LWIP_SO_RCVBUF=y
|
|||
# FreeRTOS: increase task stack for CSI processing
|
||||
CONFIG_ESP_MAIN_TASK_STACK_SIZE=8192
|
||||
|
||||
# SPIRAM XIP tested but crashes with "Cache disabled but cached memory
|
||||
# region accessed" — different crash type, not solved. Disabled for now.
|
||||
# See RuView#396 for details. PSRAM heap-only mode can be enabled later.
|
||||
# CONFIG_SPIRAM is not set
|
||||
|
||||
# Extra WiFi IRAM placement (defense-in-depth)
|
||||
# Extra WiFi IRAM placement (defense-in-depth for RuView#396 SPI cache race)
|
||||
CONFIG_ESP_WIFI_EXTRA_IRAM_OPT=y
|
||||
|
|
|
|||
Loading…
Reference in New Issue