diff --git a/docs/adr/ADR-108-fw-nvs-persist-gain-lock.md b/docs/adr/ADR-108-fw-nvs-persist-gain-lock.md
index 8279510b..a91f4c4c 100644
--- a/docs/adr/ADR-108-fw-nvs-persist-gain-lock.md
+++ b/docs/adr/ADR-108-fw-nvs-persist-gain-lock.md
@@ -152,16 +152,17 @@ I (4980) csi_collector: gain-lock PERSISTED to NVS (csi_cfg/gl_agc, gl_fft)
 
 ## Open Items
 
-* **REST endpoint to clear gain-lock NVS** — today the operator has
-  to USB-erase the namespace. A FW-side `POST /ota/recalibrate` that
-  clears the two keys + `esp_restart()` would close that loop.
-  ~30 min FW + flash.
-* **Track AP MAC alongside AGC/FFT** — `csi_cfg/gl_ap_mac`. On boot,
-  if current AP MAC ≠ saved → ignore the cached values and re-calibrate.
-  Fully automatic invalidation. ~1 h FW.
 * **Per-channel cache** — `csi_cfg/gl__agc`. If the channel hop
   table (ADR-029) is reactivated, each channel needs its own values.
-  ~1 h FW.
+  ~1 h FW. Deferred — channel hopping is out of scope for the current
+  single-channel deployment.
+
+## Closed
+
+* **REST endpoint to clear gain-lock NVS** — shipped via
+  `POST /ota/recalibrate` in ADR-109.
+* **Track AP MAC alongside AGC/FFT** — shipped via `gl_ap_mac` NVS key
+  + boot-time comparison in ADR-109.
 
 ## References
 
diff --git a/docs/adr/ADR-109-fw-gain-lock-invalidation.md b/docs/adr/ADR-109-fw-gain-lock-invalidation.md
new file mode 100644
index 00000000..9eacd8ba
--- /dev/null
+++ b/docs/adr/ADR-109-fw-gain-lock-invalidation.md
@@ -0,0 +1,145 @@
+# ADR-109 — FW Gain-Lock Invalidation (REST trigger + AP-MAC binding)
+
+**Status**: Accepted
+**Date**: 2026-05-17
+**Scope**: `firmware/esp32-csi-node/main/ota_update.c`,
+`firmware/esp32-csi-node/main/csi_collector.c`. Closes two of the three
+Open Items in ADR-108; the per-channel cache stays deferred.
+
+## Context
+
+ADR-108 persists the FW-side gain-lock (AGC + FFT scale) to NVS so a
+reboot resumes detect mode in ~0.5 s. Two follow-ups remained:
+
+1. **No way to clear the cache without USB.** When an operator moved a
+   sensor or swapped the AP, they had to plug the device in and run
+   `idf.py erase-flash` to force a re-calibration. Defeats the whole
+   point of OTA-only ops.
+2. **No automatic invalidation on AP swap.** Gain-lock is tied to a
+   specific RF path (AP location, distance, multipath). Connecting the
+   same sensor to a different AP and re-using the cached AGC/FFT yields
+   either over-saturated or under-amplified CSI for the whole session
+   until manual intervention.
+
+## Decisions
+
+### D1 — `POST /ota/recalibrate` REST trigger
+
+New HTTP handler registered on the existing port 8032 next to `/ota`
+and `/ota/status`. Same Bearer-token auth path as the firmware upload
+endpoint (reuses `ota_check_auth`).
+
+Behaviour:
+
+1. Open NVS namespace `csi_cfg` RW.
+2. Erase three keys: `gl_agc`, `gl_fft`, `gl_ap_mac` (D2).
+3. `nvs_commit` + close.
+4. Send `200 OK` with a JSON body (`"status":"ok"` plus a message).
+5. `vTaskDelay(1 s)` to flush the response, then `esp_restart()`.
+
+Next boot: `rv_gain_load_from_nvs` returns `ESP_ERR_NVS_NOT_FOUND` →
+the existing 300-packet calibration runs as on a never-calibrated chip.
+
+### D2 — `gl_ap_mac` NVS key (6-byte blob)
+
+Stored alongside `gl_agc` / `gl_fft` whenever the calibration writes
+back. Source: `esp_wifi_sta_get_ap_info(&ap).bssid`. Read at the same
+moment as AGC/FFT during the one-shot NVS short-circuit at the top of
+`rv_gain_lock_process`.
+
+Comparison rule on boot:
+
+| Saved MAC          | Current AP MAC          | Action                                |
+|--------------------|-------------------------|---------------------------------------|
+| all-zero (legacy)  | any                     | Use cached gain-lock (wildcard match) |
+| matches current    | same                    | Use cached gain-lock                  |
+| differs            | any                     | Log warning, fall through to full cal |
+| any                | AP info unavailable     | Defensive: fall through to full cal   |
+
+The all-zero wildcard is the one-time upgrade case: NVS blobs written
+by ADR-108 builds predate ADR-109 and have no MAC. Treating them as
+match-anything avoids forcing every existing deployment to re-calibrate
+on the first ADR-109 boot. The next save (post-re-cal or at the next
+natural calibration trigger) populates the real MAC, after which the
+strict comparison applies.
+
+### D3 — `rv_gain_save_to_nvs` writes MAC too
+
+Signature changes from `(uint8_t agc, int8_t fft)` to
+`(uint8_t agc, int8_t fft, const uint8_t mac[6])`. The caller reads
+`ap.bssid` at save time so the saved MAC reflects the AP the
+calibration actually ran against (not whatever AP the sensor is
+connected to N seconds later, which on a roaming-capable mesh could
+differ).
+
+If the save-time AP MAC is unavailable (extremely rare — the gain-lock
+hook only fires from a CSI callback, and CSI callbacks require an
+established WiFi link), the saved MAC is left as all-zero. The next
+boot then takes the wildcard path, preserving the current behaviour
+rather than failing closed.
+
+### D4 — Recalibrate handler also clears `gl_ap_mac`
+
+Even though removing only AGC/FFT would force a re-cal by virtue of
+the missing keys, also erasing `gl_ap_mac` is cleaner: the next write
+will repopulate it from the current AP, and there's no stale MAC
+sitting in NVS that could be partially restored by a future bug.
+
+## Trade-offs
+
+* **One-time false re-cal on first ADR-109 boot for chips that ever
+  saw an AP swap before this ADR shipped.** Acceptable: gain-lock
+  re-cal takes 6-12 s and produces a brief noise spike, but it's a
+  one-time event and the result is correct from that point onward.
+* **No multi-AP cache.** If a sensor roams between two APs (rare in
+  this deployment: each sensor is parked next to a fixed TP-Link)
+  it will re-calibrate on every AP swap. Multi-AP storage would need
+  per-AP-MAC sub-keys (`gl_agc:`, etc.) — explicitly out of
+  scope; cross-references ADR-108's per-channel cache item which has
+  the same "wait until needed" disposition.
+* **`gl_ap_mac` blob quadruples NVS size of the gain-lock bundle, from
+  2 bytes to 8 bytes.** Negligible — the gain-lock namespace `csi_cfg`
+  already holds SSID/password/IP and a few other keys totalling a few
+  hundred bytes.
+
+## Files Touched
+
+```
+firmware/esp32-csi-node/main/ota_update.c
+  - ota_recalibrate_handler (D1, D4)
+  - register POST /ota/recalibrate
+
+firmware/esp32-csi-node/main/csi_collector.c
+  - RV_GAIN_NVS_K_AP_MAC define (D2)
+  - rv_gain_load_from_nvs: optional MAC out-param + wildcard support
+  - rv_gain_save_to_nvs: MAC in-param + nvs_set_blob (D3)
+  - rv_gain_lock_process: AP-MAC comparison branch (D2)
+  - rv_gain_lock_process: read current bssid before save (D3)
+
+docs/adr/ADR-109-fw-gain-lock-invalidation.md (this)
+```
+
+## Verified Acceptance
+
+1. `idf.py build` clean (only the pre-existing `wifi_promiscuous_cb`
+   unused warning, unchanged by this ADR).
+2. After OTA flash of both nodes:
+   * `curl -X POST http://192.168.0.100:8032/ota/recalibrate`
+   * `curl -X POST http://192.168.0.101:8032/ota/recalibrate`
+   Both return `{"status":"ok","message":"gain-lock NVS cleared; rebooting"}`.
+3. Boot log on next reboot shows `gain-lock APPLIED:` (full cal) +
+   `gain-lock PERSISTED to NVS (AGC=N FFT=M AP=…)` instead of the
+   `gain-lock RESTORED from NVS:` line that fast-path boots produce.
+4. AP-swap path verified by manually flipping the WiFi credentials to
+   a different SSID via `provision.py`, re-flashing, and confirming
+   the boot log shows `gain-lock NVS MISS: saved AP=… → current=…
+   Re-calibrating.` followed by a full cal.
+
+## References
+
+* ADR-108 — NVS persistence of gain-lock. Two of its three Open Items
+  are resolved by this ADR (REST trigger, AP-MAC binding).
+* ADR-050 — OTA Bearer-token auth. Same `ota_check_auth` shared with
+  the new endpoint.
+* `docs/references/ota-pipeline.md` — port 8032 recipe; gains a new
+  bullet for `/ota/recalibrate`.
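
The D2 comparison table in ADR-109 can be sketched as a pure function with no ESP-IDF dependencies. This is a minimal illustration, not the code in `csi_collector.c`: `gl_decision_t`, `mac_is_zero`, and `gl_decide` are hypothetical names, and checking AP-info availability before the all-zero wildcard is an assumed precedence that the table itself does not settle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type; names are illustrative, not from the firmware. */
typedef enum {
    GL_USE_CACHED,   /* restore the cached AGC/FFT values */
    GL_RECALIBRATE   /* fall through to the full calibration */
} gl_decision_t;

/* True when all six bytes are zero, i.e. a legacy blob saved without a MAC. */
static bool mac_is_zero(const uint8_t mac[6])
{
    static const uint8_t zero[6] = {0};
    return memcmp(mac, zero, sizeof zero) == 0;
}

/*
 * Apply the D2 table.
 *   saved   - MAC read from NVS (all-zero for legacy blobs)
 *   current - BSSID of the connected AP, or NULL if AP info is unavailable
 * Assumption: "AP info unavailable" is checked first, so it wins over the
 * wildcard row when both apply.
 */
static gl_decision_t gl_decide(const uint8_t saved[6], const uint8_t *current)
{
    assert(saved != NULL);

    if (current == NULL) {
        return GL_RECALIBRATE;          /* defensive row */
    }
    if (mac_is_zero(saved)) {
        return GL_USE_CACHED;           /* legacy wildcard row */
    }
    if (memcmp(saved, current, 6) == 0) {
        return GL_USE_CACHED;           /* saved MAC matches current AP */
    }
    return GL_RECALIBRATE;              /* AP changed since calibration */
}
```

In the firmware, the caller would presumably pass `ap.bssid` from `esp_wifi_sta_get_ap_info` as `current`, or `NULL` when that call fails, and log the warning described in D2 before re-calibrating.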