docs(adr-109): new ADR + close ADR-108 open items

ADR-109 documents POST /ota/recalibrate + gl_ap_mac NVS binding
and supersedes the two Open Items in ADR-108.

Co-Authored-By: claude-flow <ruv@ruv.net>
arsen 2026-05-17 16:14:51 +07:00
parent f92807cdaf
commit 169a589def
2 changed files with 154 additions and 8 deletions


@ -152,16 +152,17 @@ I (4980) csi_collector: gain-lock PERSISTED to NVS (csi_cfg/gl_agc, gl_fft)
 ## Open Items
-* **REST endpoint to clear gain-lock NVS** — today the operator has
-to USB-erase the namespace. A FW-side `POST /ota/recalibrate` that
-clears the two keys + `esp_restart()` would close that loop.
-~30 min FW + flash.
-* **Track AP MAC alongside AGC/FFT**`csi_cfg/gl_ap_mac`. On boot,
-if current AP MAC ≠ saved → ignore the cached values and re-calibrate.
-Fully automatic invalidation. ~1 h FW.
 * **Per-channel cache**`csi_cfg/gl_<chan>_agc`. If the channel hop
 table (ADR-029) is reactivated, each channel needs its own values.
-~1 h FW.
+~1 h FW. Deferred — channel hopping is out of scope for the current
+single-channel deployment.
+## Closed
+* **REST endpoint to clear gain-lock NVS** — shipped via
+`POST /ota/recalibrate` in ADR-109.
+* **Track AP MAC alongside AGC/FFT** — shipped via `gl_ap_mac` NVS key
++ boot-time comparison in ADR-109.
 ## References


@ -0,0 +1,145 @@
# ADR-109 — FW Gain-Lock Invalidation (REST trigger + AP-MAC binding)
**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `firmware/esp32-csi-node/main/ota_update.c`,
`firmware/esp32-csi-node/main/csi_collector.c`. Closes both Open Items in
ADR-108.
## Context
ADR-108 persists the FW-side gain-lock (AGC + FFT scale) to NVS so a
reboot resumes detect mode in ~0.5 s. Two follow-ups remained:
1. **No way to clear the cache without USB.** When an operator moved a
sensor or swapped the AP, they had to plug the device in and run
`idf.py erase-flash` to force a re-calibration, which defeats the
point of OTA-only operations.
2. **No automatic invalidation on AP swap.** Gain-lock is tied to a
specific RF path (AP location, distance, multipath). Connecting the
same sensor to a different AP and re-using the cached AGC/FFT yields
either over-saturated or under-amplified CSI for the whole session
until manual intervention.
## Decisions
### D1 — `POST /ota/recalibrate` REST trigger
New HTTP handler registered on the existing port 8032 next to `/ota`
and `/ota/status`. Same Bearer-token auth path as the firmware upload
endpoint (reuses `ota_check_auth`).
Behaviour:
1. Open NVS namespace `csi_cfg` RW.
2. Erase three keys: `gl_agc`, `gl_fft`, `gl_ap_mac` (D2).
3. `nvs_commit` + close.
4. Send `200 OK` with the JSON body
   `{"status":"ok","message":"gain-lock NVS cleared; rebooting"}`.
5. `vTaskDelay(1 s)` to flush the response, then `esp_restart()`.
Next boot: `rv_gain_load_from_nvs` returns `ESP_ERR_NVS_NOT_FOUND`, so
the existing 300-packet calibration runs as on a never-calibrated chip.
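The NVS part of the behaviour above (steps 2 and 3, plus D4) can be exercised on a host. This is a minimal sketch that assumes a hypothetical in-memory key store (`kv_ns_t`, `kv_set`, `kv_erase` and `recalibrate_clear_gain_lock` are illustrative names, not ESP-IDF or firmware APIs). Its erase reports a missing key the same way `nvs_erase_key` reports `ESP_ERR_NVS_NOT_FOUND`. The point it shows is that the handler should tolerate a missing key. A namespace written by an ADR-108 build has no `gl_ap_mac`, and a repeated `POST` finds nothing left to erase. On-device, the loop body would instead call `nvs_erase_key` on a handle from `nvs_open("csi_cfg", NVS_READWRITE, &h)`, followed by `nvs_commit`, `nvs_close`, the JSON reply, the 1 s `vTaskDelay` and `esp_restart()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the csi_cfg NVS namespace.
 * Illustration only: this is NOT the ESP-IDF NVS API. */
#define KV_MAX_KEYS 8
#define KV_KEY_LEN  16  /* ESP-IDF NVS keys are at most 15 chars + NUL */

typedef enum { KV_OK = 0, KV_ERR_NOT_FOUND, KV_ERR_FULL } kv_err_t;

typedef struct {
    char key[KV_MAX_KEYS][KV_KEY_LEN];
    bool used[KV_MAX_KEYS];
} kv_ns_t;

/* Returns the slot index holding `key`, or -1 if absent. */
static int kv_find(const kv_ns_t *ns, const char *key) {
    for (int i = 0; i < KV_MAX_KEYS; i++)
        if (ns->used[i] && strcmp(ns->key[i], key) == 0) return i;
    return -1;
}

static bool kv_has(const kv_ns_t *ns, const char *key) {
    return kv_find(ns, key) >= 0;
}

/* Stores `key` (value omitted: only presence matters here). */
static kv_err_t kv_set(kv_ns_t *ns, const char *key) {
    assert(ns != NULL && strlen(key) < KV_KEY_LEN);
    if (kv_has(ns, key)) return KV_OK;
    for (int i = 0; i < KV_MAX_KEYS; i++) {
        if (!ns->used[i]) {
            strcpy(ns->key[i], key);
            ns->used[i] = true;
            return KV_OK;
        }
    }
    return KV_ERR_FULL;
}

/* Mirrors nvs_erase_key(): a missing key reports NOT_FOUND. */
static kv_err_t kv_erase(kv_ns_t *ns, const char *key) {
    int i = kv_find(ns, key);
    if (i < 0) return KV_ERR_NOT_FOUND;
    ns->used[i] = false;
    return KV_OK;
}

/* Erase the three gain-lock keys (D1 step 2, D4). NOT_FOUND is treated
 * as success; any other error is returned to the caller. Keys outside
 * the gain-lock bundle (SSID, password, IP, ...) are left untouched. */
static kv_err_t recalibrate_clear_gain_lock(kv_ns_t *ns) {
    static const char *const keys[] = { "gl_agc", "gl_fft", "gl_ap_mac" };
    for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++) {
        kv_err_t err = kv_erase(ns, keys[i]);
        if (err != KV_OK && err != KV_ERR_NOT_FOUND) return err;
    }
    return KV_OK;
}
```

With a legacy namespace holding only `gl_agc` and `gl_fft`, the clear still succeeds. Calling it a second time is a no-op that also succeeds, so a retried `POST` does not turn into an error response.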
### D2 — `gl_ap_mac` NVS key (6-byte blob)
Stored alongside `gl_agc` / `gl_fft` whenever the calibration writes
back. Source: `esp_wifi_sta_get_ap_info(&ap).bssid`. Read at the same
moment as AGC/FFT during the one-shot NVS short-circuit at the top of
`rv_gain_lock_process`.
Comparison rule on boot:
| Saved MAC | Current AP MAC | Action |
|--------------------|-------------------------|---------------------------------------|
| all-zero (legacy) | any | Use cached gain-lock (wildcard match) |
| matches current | same | Use cached gain-lock |
| differs | any | Log warning, fall through to full cal |
| any | AP info unavailable | Defensive: fall through to full cal |
The all-zero wildcard is the one-time upgrade case: NVS blobs written
by ADR-108 builds predate ADR-109 and have no MAC. Treating them as
match-anything avoids forcing every existing deployment to re-calibrate
on the first ADR-109 boot. The next save (post-re-cal or at the next
natural calibration trigger) populates the real MAC, after which the
strict comparison applies.
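The comparison table and the all-zero wildcard reduce to a small pure function. This is a sketch only: `gl_compare_ap_mac` and `gl_decision_t` are hypothetical names, not the firmware's. `current == NULL` stands for "AP info unavailable". One assumption is labelled here: the table does not say which row wins when the saved MAC is all-zero *and* AP info is unavailable. The sketch checks availability first, following the defensive row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GL_MAC_LEN 6

typedef enum {
    GL_USE_CACHED,  /* apply the saved AGC/FFT values */
    GL_RECALIBRATE  /* ignore the cache, run the full calibration */
} gl_decision_t;

/* Hypothetical helper mirroring the D2 boot-time table.
 * saved:   the 6-byte gl_ap_mac blob (all-zero for ADR-108 legacy blobs)
 * current: the current AP BSSID, or NULL if AP info is unavailable
 * Assumption: "AP info unavailable" takes precedence over the wildcard. */
static gl_decision_t gl_compare_ap_mac(const uint8_t saved[GL_MAC_LEN],
                                       const uint8_t *current) {
    static const uint8_t zero[GL_MAC_LEN] = {0};
    assert(saved != NULL);

    if (current == NULL)                          /* defensive row */
        return GL_RECALIBRATE;
    if (memcmp(saved, zero, GL_MAC_LEN) == 0)     /* legacy wildcard */
        return GL_USE_CACHED;
    if (memcmp(saved, current, GL_MAC_LEN) == 0)  /* same AP */
        return GL_USE_CACHED;
    return GL_RECALIBRATE;                        /* AP swapped */
}
```

Keeping the comparison free of NVS and WiFi calls makes each table row testable on a host. In this sketch's design, the caller in `rv_gain_lock_process` would pass in the loaded blob and the current `bssid`.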
### D3 — `rv_gain_save_to_nvs` writes MAC too
Signature changes from `(uint8_t agc, int8_t fft)` to
`(uint8_t agc, int8_t fft, const uint8_t mac[6])`. The caller reads
`ap.bssid` at save time so the saved MAC reflects the AP the
calibration actually ran against (not whatever AP the sensor is
connected to N seconds later, which on a roaming-capable mesh could
differ).
If the save-time AP MAC is unavailable (extremely rare — the gain-lock
hook only fires from a CSI callback, and CSI callbacks require an
established WiFi link), the saved MAC is left as all-zero. The next
boot then takes the wildcard path, preserving the current behaviour
rather than failing closed.
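The save-time rule can be sketched as a function that builds the record to persist. Assumptions: `gl_record_t` and `gl_make_record` are hypothetical names, and the sketch does not perform the actual NVS writes that `rv_gain_save_to_nvs` does. On-device, `bssid` would be `ap.bssid` from a successful `esp_wifi_sta_get_ap_info(&ap)`, which fills a `wifi_ap_record_t`. `NULL` models the rare case where that call fails.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side view of what gets persisted:
 * gl_agc, gl_fft and the 6-byte gl_ap_mac blob. */
typedef struct {
    uint8_t agc;
    int8_t  fft;
    uint8_t ap_mac[6];
} gl_record_t;

/* Snapshot the AP MAC at save time, i.e. the AP the calibration actually
 * ran against. If bssid is NULL (AP info unavailable), ap_mac stays
 * all-zero, so the next boot takes the wildcard path instead of
 * failing closed. */
static gl_record_t gl_make_record(uint8_t agc, int8_t fft,
                                  const uint8_t *bssid) {
    gl_record_t rec = { .agc = agc, .fft = fft };  /* ap_mac zero-filled */
    if (bssid != NULL)
        memcpy(rec.ap_mac, bssid, sizeof rec.ap_mac);
    return rec;
}
```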
### D4 — Recalibrate handler also clears `gl_ap_mac`
Erasing only AGC/FFT would already force a re-cal, because the load
finds no keys. Erasing `gl_ap_mac` as well is cleaner: the next write
repopulates it from the current AP, and no stale MAC is left in NVS for
a future bug to partially restore.
## Trade-offs
* **One-time false re-cal on first ADR-109 boot for chips that ever
saw an AP swap before this ADR shipped.** Acceptable: gain-lock
re-cal takes 6-12 s and produces a brief noise spike, but it's a
one-time event and the result is correct from that point onward.
* **No multi-AP cache.** If a sensor roams between two APs (rare in
this deployment, where each sensor is parked next to a fixed TP-Link),
it will re-calibrate on every AP swap. Multi-AP storage would need
per-AP-MAC sub-keys (`gl_agc:<bssid>`, etc.) and is explicitly out of
scope; it shares the "wait until needed" disposition of ADR-108's
per-channel cache item.
* **The `gl_ap_mac` blob grows the gain-lock bundle's payload from
2 bytes to 8 bytes.** Negligible: the `csi_cfg` namespace already
holds SSID/password/IP and a few other keys totalling a few hundred
bytes.
## Files Touched
```
firmware/esp32-csi-node/main/ota_update.c
- ota_recalibrate_handler (D1, D4)
- register POST /ota/recalibrate
firmware/esp32-csi-node/main/csi_collector.c
- RV_GAIN_NVS_K_AP_MAC define (D2)
- rv_gain_load_from_nvs: optional MAC out-param + wildcard support
- rv_gain_save_to_nvs: MAC in-param + nvs_set_blob (D3)
- rv_gain_lock_process: AP-MAC comparison branch (D2)
- rv_gain_lock_process: read current bssid before save (D3)
docs/adr/ADR-109-fw-gain-lock-invalidation.md (this)
```
## Verified Acceptance
1. `idf.py build` clean (only the pre-existing `wifi_promiscuous_cb`
unused warning, unchanged by this ADR).
2. After OTA flash of both nodes:
* `curl -X POST http://192.168.0.100:8032/ota/recalibrate`
* `curl -X POST http://192.168.0.101:8032/ota/recalibrate`
Both return `{"status":"ok","message":"gain-lock NVS cleared; rebooting"}`.
3. Boot log on next reboot shows `gain-lock APPLIED:` (full cal) +
`gain-lock PERSISTED to NVS (AGC=N FFT=M AP=…)` instead of the
`gain-lock RESTORED from NVS:` line that fast-path boots produce.
4. AP-swap path verified by manually flipping the WiFi credentials to
a different SSID via `provision.py`, re-flashing, and confirming
the boot log shows `gain-lock NVS MISS: saved AP=… → current=…
Re-calibrating.` followed by a full cal.
## References
* ADR-108 — NVS persistence of gain-lock. Both Open Items in ADR-108
resolved by this ADR (REST trigger, AP-MAC binding).
* ADR-050 — OTA Bearer-token auth. Same `ota_check_auth` shared with
the new endpoint.
* `docs/references/ota-pipeline.md` — port 8032 recipe; gains a new
bullet for `/ota/recalibrate`.