From f995f6962251a4b01e6ee1da8d33f7c92c805f93 Mon Sep 17 00:00:00 2001
From: ruv
Date: Tue, 3 Mar 2026 16:14:54 -0500
Subject: [PATCH] docs: update ADRs with ENOMEM crash fix proof (Issue #127)

- ADR-018: Document rate-limiting and ENOMEM backoff safeguards in firmware
- ADR-029: Add note about rate-limiting requirement for channel hopping, mark lwIP pbuf exhaustion risk as resolved
- ADR-039: Add finding #5 documenting the sendto ENOMEM crash and fix (947 KB binary, hardware-verified 200+ callbacks with zero errors)
- CHANGELOG: Add entries for Issue #127 fix and Issue #130 provisioning fix

Co-Authored-By: claude-flow
---
 CHANGELOG.md | 3 +++
 docs/adr/ADR-018-esp32-dev-implementation.md | 7 +++++++
 docs/adr/ADR-029-ruvsense-multistatic-sensing-mode.md | 3 +++
 docs/adr/ADR-039-esp32-edge-intelligence.md | 1 +
 4 files changed, 14 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8d77cb6a..1f59d53a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -14,6 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Training control: `GET /api/v1/train/status`, `POST /api/v1/train/start`, `POST /api/v1/train/stop`
 - Recording writes CSI frames to `.jsonl` files via tokio background task
 - Model/recording directories scanned at startup, state managed via `Arc>`
+- **ADR-044: Provisioning tool enhancements** — 5-phase plan for complete NVS coverage (7 missing keys), JSON config files, mesh presets, read-back/verify, and auto-detect
 - **25 real mobile tests** replacing `it.todo()` placeholders — 205 assertions covering components, services, stores, hooks, screens, and utils
 - **Project MERIDIAN (ADR-027)** — Cross-environment domain generalization for WiFi pose estimation (1,858 lines, 72 tests)
 - `HardwareNormalizer` — Catmull-Rom cubic interpolation resamples any hardware CSI to canonical 56 subcarriers; z-score + phase sanitization
@@ -30,6 +31,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - ADR-025: macOS CoreWLAN WiFi Sensing (ORCA)
 
 ### Fixed
+- **sendto ENOMEM crash (Issue #127)** — CSI callbacks in promiscuous mode exhaust lwIP pbuf pool causing guru meditation crash. Fixed with 50 Hz rate limiter in `csi_collector.c` and 100 ms ENOMEM backoff in `stream_sender.c`. Hardware-verified on ESP32-S3 (200+ callbacks, zero crashes)
+- **Provisioning script missing TDM/edge flags (Issue #130)** — Added `--tdm-slot`, `--tdm-total`, `--edge-tier`, `--pres-thresh`, `--fall-thresh`, `--vital-win`, `--vital-int`, `--subk-count` to `provision.py`
 - **WebSocket "RECONNECTING" on Dashboard/Live Demo** — `sensingService.start()` now called on app init in `app.js` so WebSocket connects immediately instead of waiting for Sensing tab visit
 - **Mobile WebSocket port** — `ws.service.ts` `buildWsUrl()` uses same-origin port instead of hardcoded port 3001
 - **Mobile Jest config** — `testPathIgnorePatterns` no longer silently ignores the entire test directory
diff --git a/docs/adr/ADR-018-esp32-dev-implementation.md b/docs/adr/ADR-018-esp32-dev-implementation.md
index 26a3dd42..6cb70f3d 100644
--- a/docs/adr/ADR-018-esp32-dev-implementation.md
+++ b/docs/adr/ADR-018-esp32-dev-implementation.md
@@ -96,6 +96,13 @@ static void csi_data_callback(void *ctx, wifi_csi_info_t *info) {
 
 **No on-device FFT** (contradicting ADR-012's optional feature extraction path): The Rust aggregator will do feature extraction using the SOTA `wifi-densepose-signal` pipeline. Raw I/Q is cheaper to stream at ESP32 sampling rates (~100 Hz at 56 subcarriers = ~35 KB/s per node).
 
+**Rate-limiting and ENOMEM backoff** (Issue #127 fix):
+
+CSI callbacks fire 100-500+ times/sec in promiscuous mode. Two safeguards prevent lwIP pbuf exhaustion:
+
+1. **50 Hz rate limiter** (`csi_collector.c`): `sendto()` is skipped if less than 20 ms have elapsed since the last successful send. Excess CSI callbacks are dropped silently.
+2. **ENOMEM backoff** (`stream_sender.c`): When `sendto()` returns `ENOMEM` (errno 12), all sends are suppressed for 100 ms to let lwIP reclaim packet buffers. Without this, rapid-fire failed sends cause a guru meditation crash.
+
 **`sdkconfig.defaults`** must enable:
 
 ```
diff --git a/docs/adr/ADR-029-ruvsense-multistatic-sensing-mode.md b/docs/adr/ADR-029-ruvsense-multistatic-sensing-mode.md
index 7cf10e86..45e1c781 100644
--- a/docs/adr/ADR-029-ruvsense-multistatic-sensing-mode.md
+++ b/docs/adr/ADR-029-ruvsense-multistatic-sensing-mode.md
@@ -74,6 +74,8 @@ static uint32_t s_dwell_ms = 50; // 50ms per channel
 
 At 100 Hz raw CSI rate with 50 ms dwell across 3 channels, each channel yields ~33 frames/second. The existing ADR-018 binary frame format already carries `channel_freq_mhz` at offset 8, so no wire format change is needed.
 
+> **Note (Issue #127 fix):** In promiscuous mode, CSI callbacks fire 100-500+ times/sec — far exceeding the channel dwell rate. The firmware now rate-limits UDP sends to 50 Hz and applies a 100 ms ENOMEM backoff if lwIP buffers are exhausted. This is essential for stable channel hopping under load.
+
 **NDP frame injection:** `esp_wifi_80211_tx()` injects deterministic Null Data Packet frames (preamble-only, no payload, ~24 us airtime) at GPIO-triggered intervals. This is sensing-first: the primary RF emission purpose is CSI measurement, not data communication.
 
 ### 2.3 Multi-Band Frame Fusion
@@ -364,6 +366,7 @@ No new workspace dependencies. All ruvector crates are already in the workspace
 | Risk | Probability | Impact | Mitigation |
 |------|-------------|--------|------------|
 | ESP32 channel hop causes CSI gaps | Medium | Reduced effective rate | Measure gap duration; increase dwell if >5ms |
+| CSI callback rate exhausts lwIP pbufs | **Resolved** | Guru meditation crash | 50 Hz rate limiter + 100 ms ENOMEM backoff (Issue #127, PR #132) |
 | 5 GHz CSI unavailable on S3 | High | Lose frequency diversity | Fallback: 3-channel 2.4 GHz still provides 3x BW; ESP32-C6 for dual-band |
 | Model inference >40ms | Medium | Miss 20 Hz target | Run model at 10 Hz; Kalman predict at 20 Hz interpolates |
 | Two-person separation fails at 3 nodes | Low | Identity swaps | AETHER re-ID recovers; increase to 4-6 nodes |
diff --git a/docs/adr/ADR-039-esp32-edge-intelligence.md b/docs/adr/ADR-039-esp32-edge-intelligence.md
index ce9e70be..0eec7604 100644
--- a/docs/adr/ADR-039-esp32-edge-intelligence.md
+++ b/docs/adr/ADR-039-esp32-edge-intelligence.md
@@ -208,3 +208,4 @@ Measured on ESP32-S3 (QFN56 rev v0.2, 8 MB flash, 160 MHz, ESP-IDF v5.2).
 2. **No PSRAM on test board** — WASM arena falls back to internal heap. Boards with PSRAM would support larger modules.
 3. **CSI rate exceeds spec** — measured 28.5 Hz vs. expected ~20 Hz. Performance headroom is better than estimated.
 4. **WiFi-to-Ethernet isolation** — some routers block UDP between WiFi and wired clients. Recommend same-subnet verification in deployment guide.
+5. **sendto ENOMEM crash (Issue #127)** — CSI callbacks in promiscuous mode fire 100-500+ times/sec, exhausting the lwIP pbuf pool and causing a guru meditation crash. Fixed with a dual approach: 50 Hz rate limiter in `csi_collector.c` (20 ms minimum send interval) and a 100 ms ENOMEM backoff in `stream_sender.c`. Binary size with fix: 947 KB. Hardware-verified stable for 200+ CSI callbacks with zero ENOMEM errors.
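
Reviewer note: the two safeguards described in this patch (a 20 ms minimum send interval and a 100 ms backoff after `ENOMEM`) can be illustrated with a small, hardware-independent sketch. This is not the code in `csi_collector.c` or `stream_sender.c`; it folds both checks into one hypothetical gate for readability. The names `send_gate_t`, `gate_try_send`, `send_fn`, and `stub_send` are assumptions made up for this example, and the timestamp is passed in as a parameter (on the device it would plausibly come from `esp_timer_get_time()`, which returns microseconds).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_SEND_INTERVAL_US 20000   /* 20 ms between sends, i.e. a 50 Hz cap */
#define ENOMEM_BACKOFF_US    100000  /* 100 ms pause after an ENOMEM failure */

/* Hypothetical transport hook: returns bytes sent, or -1 with errno set,
 * mirroring the sendto() convention. */
typedef int (*send_fn)(const void *buf, int len);

typedef enum {
    GATE_SENT,          /* frame was handed to the transport */
    GATE_RATE_LIMITED,  /* dropped: last successful send was < 20 ms ago */
    GATE_BACKOFF,       /* dropped: still inside the ENOMEM backoff window */
    GATE_ERROR          /* transport reported a failure */
} gate_result_t;

typedef struct {
    int64_t last_send_us;      /* time of the last successful send */
    int64_t backoff_until_us;  /* sends are suppressed before this time */
    bool    has_sent;          /* false until the first successful send */
} send_gate_t;

static gate_result_t gate_try_send(send_gate_t *g, int64_t now_us,
                                   send_fn send, const void *buf, int len)
{
    /* Safeguard 2: suppress all sends while the backoff window is open. */
    if (now_us < g->backoff_until_us) {
        return GATE_BACKOFF;
    }
    /* Safeguard 1: skip the send if less than 20 ms have elapsed since
     * the last successful one. */
    if (g->has_sent && now_us - g->last_send_us < MIN_SEND_INTERVAL_US) {
        return GATE_RATE_LIMITED;
    }
    if (send(buf, len) < 0) {
        if (errno == ENOMEM) {
            /* Give the network stack time to reclaim packet buffers. */
            g->backoff_until_us = now_us + ENOMEM_BACKOFF_US;
        }
        return GATE_ERROR;
    }
    g->last_send_us = now_us;
    g->has_sent = true;
    return GATE_SENT;
}

/* Stub transport for exercising the gate off-device (illustrative only). */
static bool stub_out_of_buffers = false;

static int stub_send(const void *buf, int len)
{
    (void)buf;
    if (stub_out_of_buffers) {
        errno = ENOMEM;
        return -1;
    }
    return len;
}
```

Passing the clock and the transport in as arguments keeps the gate testable on a host machine. Note that only successful sends advance `last_send_us`, which matches the "since the last successful send" wording in the ADR-018 hunk.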