docs: ADR-104 (per-subcarrier drift) + ADR-108 (FW NVS persistence)
ADR-104: documents the off-axis presence channel that fires
present_still when per-subcarrier amplitudes drift ≥10% from the
saved per_subcarrier_mean baseline, plus the NBVI Step 3 FP-rate
validation (K candidate sweep, smallest-FP wins). Implementation
shipped in 6212b17e.
ADR-108: documents the FW NVS persistence of gain-lock values
(csi_cfg/gl_agc + gl_fft), the one-shot load at first packet after
boot, the save after every successful calibration, and the safety
MIN_SAFE_AGC guard on restored values. Implementation shipped in
3779bb76; flashed to both sensors via OTA.
Both ADRs ≤ 200 lines per the project's docs convention. Open items
recorded so future agents can pick up: per-sub drift age check,
phase-domain drift, REST recalibrate endpoint, AP-MAC tied cache.
parent 3779bb7655
commit d7189d9b0f

@@ -0,0 +1,162 @@
# ADR-104 — Per-Subcarrier Drift Presence Channel + NBVI FP-Rate Validation

**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/main.rs`
(`AMP_BASELINE_PER_SUB`, `AMP_DRIFT`, `amp_drift_for_node`,
`amp_drift_max`, `amp_node_level`, `amp_classify_from_latest`,
`nbvi_select_top_k` Step 3), `scripts/record-baseline.py`
(`per_subcarrier_mean` already saved).

## Context

After ADR-103 the classifier triggers `present_still` only when the
**broadband mean** of the NBVI-selected subset drops by ≥ 25 % from
the loaded baseline. This works when the operator's body crosses the
line of sight between AP and sensor — direct-component attenuation
dominates. But:

1. **Off-axis presence**: the operator sitting at a desk to the side
   of the AP-sensor line modulates only a handful of subcarriers
   (the ones whose Fresnel zone happens to brush their body). The
   *broadband* mean barely shifts; ADR-103 says `absent` even though
   someone is clearly in the room.
2. **NBVI Step 3**: Pace's full NBVI pipeline picks top-K by raw NBVI
   score, then **validates** each candidate K by counting false
   positives the motion detector would produce on the calibration
   buffer, and keeps the K with the lowest FP rate. We were taking
   the raw top-12 without validation — fragile if one of the chosen
   subcarriers happens to overlap a noise source.

## Decisions

### D1 — Spectral drift score as a second presence channel

`amp_presence_override` per node now also computes a **spectral
drift score**:

```
drift_k = (current_amp[k] - baseline_amp[k]).abs() / baseline_amp[k]   for baseline_amp[k] > 1.0
drift   = mean(drift_k) across kept subcarriers
```

`current_amp[k]` = mean of the recent `AMP_SHORT_WIN` (90) frames'
amplitude at subcarrier `k`. `baseline_amp[k]` = the
`per_subcarrier_mean` vector saved by ADR-103's recording script.

Per-node drift is stashed in `AMP_DRIFT: HashMap<u8, f64>` so
`amp_node_level` (per-node) and `amp_classify_from_latest` (global)
can use it. Threshold `AMP_DRIFT_PRESENCE_THRESH = 0.10` (10 %
average per-subcarrier deviation) is empirical and consistent with
the broadband-ratio trigger (drop ≥ 25 %, drift ≥ 10 %).

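A minimal Rust sketch of the drift formula above, under stated assumptions: the function name `spectral_drift`, its signature, and the `min_baseline` parameter (standing in for the `> 1.0` floor) are illustrative and not taken from `main.rs`. The real code reads the baseline from `AMP_BASELINE_PER_SUB` and may handle edge cases such as length mismatches differently.

```rust
/// Mean relative per-subcarrier deviation from the baseline (hypothetical helper).
///
/// Subcarriers whose baseline amplitude is not above `min_baseline` are skipped,
/// mirroring the `baseline_amp[k] > 1.0` condition. Returns 0.0 when no
/// subcarrier qualifies, e.g. when no per-subcarrier baseline was loaded.
pub fn spectral_drift(current_amp: &[f64], baseline_amp: &[f64], min_baseline: f64) -> f64 {
    let mut sum = 0.0;
    let mut kept = 0usize;
    // zip() stops at the shorter slice, so a missing baseline yields zero kept entries.
    for (cur, base) in current_amp.iter().zip(baseline_amp.iter()) {
        if *base > min_baseline {
            sum += (cur - base).abs() / base;
            kept += 1;
        }
    }
    if kept == 0 {
        0.0
    } else {
        sum / kept as f64
    }
}
```

As an illustration of the off-axis case: with eight subcarriers at a baseline of 20.0, moving one to 28.0 and another to 12.0 leaves the broadband mean at 20.0, while this sketch returns a drift of about 0.10, enough to reach `AMP_DRIFT_PRESENCE_THRESH`.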
### D2 — Trigger order in classifier

Per node (`amp_node_snapshot`):

```
1. CV ≥ 6× baseline_cv     → active
2. CV ≥ 3× baseline_cv     → present_moving
3. drift ≥ 10 %            → present_still   ← ADR-104 (off-axis)
4. mean / baseline < 0.75  → present_still   ← ADR-101 (in-path)
5. otherwise               → absent
```

Global (`amp_classify_from_latest`) uses MAX CV / MAX drift / ANY
baseline-drop across nodes. Either drop OR drift fires `present_still`.

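The per-node trigger order can be sketched as a single function. This is an illustrative Rust sketch only: `Presence`, `NodeStats`, and `classify_node` are hypothetical names, and the thresholds are the ones listed in the table above (6× and 3× baseline CV, drift ≥ 0.10, mean ratio < 0.75). The cross-node fusion in `amp_classify_from_latest` is not modeled here.

```rust
/// Presence levels, named after the classifier outputs in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Presence {
    Active,
    PresentMoving,
    PresentStill,
    Absent,
}

/// Per-node inputs to the classifier (hypothetical struct for illustration).
#[derive(Debug, Clone, Copy)]
pub struct NodeStats {
    pub cv: f64,            // rolling coefficient of variation
    pub baseline_cv: f64,   // CV recorded with the baseline
    pub drift: f64,         // spectral drift score (D1)
    pub mean: f64,          // current broadband mean
    pub baseline_mean: f64, // broadband mean from the baseline
}

/// Applies the triggers in order; the first match wins.
pub fn classify_node(s: &NodeStats) -> Presence {
    if s.cv >= 6.0 * s.baseline_cv {
        Presence::Active
    } else if s.cv >= 3.0 * s.baseline_cv {
        Presence::PresentMoving
    } else if s.drift >= 0.10 {
        // Off-axis channel added by this ADR.
        Presence::PresentStill
    } else if s.baseline_mean > 0.0 && s.mean / s.baseline_mean < 0.75 {
        // In-path broadband drop.
        Presence::PresentStill
    } else {
        Presence::Absent
    }
}
```

Because the motion triggers come first, a node with both high CV and high drift reports `Active` rather than `PresentStill`; the drift channel only decides between `PresentStill` and `Absent` when the signal is otherwise quiet.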
### D3 — Opportunistic loading

`per_subcarrier_mean` was already being written by
`scripts/record-baseline.py` (line ~132, written as a list of
~56 floats per node) but the server ignored it. Now `load_baseline_file`
parses it and populates `AMP_BASELINE_PER_SUB`. If absent (older
`baseline.json` from before this ADR) → drift stays 0.0 → no behaviour
change. Re-trigger calibration via the ADR-107 REST endpoint or
auto-recalibrate to populate the field and activate the drift channel.

### D4 — NBVI FP-rate validation (Step 3 of Pace's spec)

`nbvi_select_top_k` no longer returns the literal top-K. After
ranking by NBVI score (Steps 1+2), it evaluates each candidate
K ∈ `{6, 8, 10, 12, 16, 20}` clamped to the available subcarrier
pool:

* For each K: compute per-frame broadband mean over the top-K
  subset across the quiet window.
* Slide a sub-window (length `AMP_SHORT_WIN/3 = 30` samples, stride
  `sub_window/2`) and count windows where rolling CV exceeds the
  moving-gate threshold (0.10).
* Pick the K with the **smallest FP count**. Ties broken by smallest
  total NBVI score (less noisy subset wins).

Result: a subset that's stable AND non-FP-producing on the calibration
window. If a top-12 NBVI candidate sneaks in a subcarrier overlapping
a noise source, the FP count surfaces it and a smaller K wins instead.

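The three steps above can be sketched in Rust as follows. This is a sketch under assumptions, not the body of `nbvi_select_top_k`: the names `select_k` and `count_false_positives`, the input layout (`frames[t][subcarrier]` amplitudes for the quiet window, `ranked` as `(subcarrier index, NBVI score)` pairs sorted best-first with lower scores meaning less noisy), and the use of a population CV are all illustrative choices. Every index in `ranked` is assumed to be valid for every frame.

```rust
/// Candidate subset sizes from D4.
pub const K_CANDIDATES: [usize; 6] = [6, 8, 10, 12, 16, 20];

/// Coefficient of variation (population std / mean); 0.0 for a zero mean.
fn cv(xs: &[f64]) -> f64 {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    if mean.abs() < f64::EPSILON {
        return 0.0;
    }
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    var.sqrt() / mean
}

/// Counts sub-windows (stride = sub_window / 2) whose CV exceeds the gate.
pub fn count_false_positives(series: &[f64], sub_window: usize, cv_gate: f64) -> usize {
    if sub_window == 0 || series.len() < sub_window {
        return 0;
    }
    let stride = (sub_window / 2).max(1);
    (0..=series.len() - sub_window)
        .step_by(stride)
        .filter(|&start| cv(&series[start..start + sub_window]) > cv_gate)
        .count()
}

/// Returns the K with the fewest false positives on the quiet window.
/// Ties go to the smaller total NBVI score, then to the earlier candidate.
pub fn select_k(
    frames: &[Vec<f64>],
    ranked: &[(usize, f64)],
    sub_window: usize,
    cv_gate: f64,
) -> Option<usize> {
    let mut best: Option<(usize, f64, usize)> = None; // (fp count, total score, k)
    for &cand in K_CANDIDATES.iter() {
        let k = cand.min(ranked.len()); // clamp to the available pool
        if k == 0 {
            continue;
        }
        let subset = &ranked[..k];
        // Per-frame broadband mean over the top-k subset.
        let series: Vec<f64> = frames
            .iter()
            .map(|f| subset.iter().map(|&(idx, _)| f[idx]).sum::<f64>() / k as f64)
            .collect();
        let fp = count_false_positives(&series, sub_window, cv_gate);
        let total: f64 = subset.iter().map(|&(_, score)| score).sum();
        let better = match best {
            None => true,
            Some((best_fp, best_total, _)) => fp < best_fp || (fp == best_fp && total < best_total),
        };
        if better {
            best = Some((fp, total, k));
        }
    }
    best.map(|(_, _, k)| k)
}
```

With a pool of eight subcarriers where the two worst-ranked ones alternate between quiet and noisy amplitudes, the K=6 subset produces no gate crossings while every larger candidate (clamped to 8) does, so this sketch returns `Some(6)`.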
## Files Touched

```
v2/crates/wifi-densepose-sensing-server/src/main.rs
  - statics: AMP_BASELINE_PER_SUB, AMP_DRIFT
  - helpers: amp_baseline_per_sub_init, amp_drift_init,
             amp_drift_for_node, amp_drift_max
  - load_baseline_file: parse per_subcarrier_mean → AMP_BASELINE_PER_SUB
  - amp_presence_override: drift computation + stash
  - amp_node_level: drift trigger (uses MAX for cross-node)
  - amp_node_snapshot: per-node drift trigger (overrides MAX)
  - amp_classify_from_latest: any-node drift trigger in global fusion
  - nbvi_select_top_k: Step 3 FP-rate validation
docs/adr/ADR-104-per-subcarrier-drift-presence.md (this)
```

Implementation commit: `6212b17e`.

## Verified Acceptance

Server boot log (using existing v1 baseline.json without
`per_subcarrier_mean`):

```
baseline: loaded 2 node overrides from data/baseline.json
          (node1=27.04, node2=14.72; node1_cv=2.62%, node2_cv=3.65%)
```

Without `per_subcarrier_mean` in the file, drift is identically 0
and the classifier behaves exactly as ADR-103. To activate the
drift channel: re-record via the ADR-107 REST endpoint or wait for
auto-recalibrate; the new `baseline.json` carries the
`per_subcarrier_mean` vector and drift becomes live.

NBVI Step 3 validation runs on every refresh tick. In the operator's
deployment K=12 is the "safe" default that always passes (clean,
low-CV window), and smaller Ks cannot improve on FP=0, so the picker
keeps K=12 in steady state. The validation defends against future
drift in channel conditions where a previously clean subcarrier picks
up interference.

## Open Items

* **Per-subcarrier baseline AGE check** — the per-sub vector reflects
  the channel at calibration time. As the channel slowly drifts (other
  WiFi clients on the AP, temperature, etc.) the per-sub baseline ages
  faster than the broadband-mean baseline. Need: if `last_written_sec_ago`
  > N hours AND drift consistently > threshold → flag for
  re-calibration. Defer to a future ADR-109.
* **Per-subcarrier delta in UI** — `raw.html` only shows broadband
  bars + global classification. A small "drift" sparkline per node
  would let the operator see the off-axis channel firing. ~30 min.
* **Phase-domain drift** — currently amplitude-only. Phase delta vs
  baseline phase would catch even subtler movement (chest-wall sub-mm
  motion during breathing). Requires phase baseline in `baseline.json`,
  which the recording script doesn't yet save. ~1 h script + ~30 min
  server.

## References

* ADR-101 — broadband classifier; this ADR adds a parallel channel.
* ADR-102 — NBVI; this ADR adds Step 3 validation per Pace's spec.
* ADR-103 — persistent baseline; `per_subcarrier_mean` already written.
* ADR-107 — REST calibrate endpoint; how the operator refreshes the
  per-sub vector on demand.
* [`docs/references/espectre-techniques.md`](../references/espectre-techniques.md)
  §1.Step 3.

@@ -0,0 +1,176 @@
# ADR-108 — FW NVS Persistence of Gain-Lock Values

**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `firmware/esp32-csi-node/main/csi_collector.c`
(`rv_gain_load_from_nvs`, `rv_gain_save_to_nvs`, NVS hook in
`rv_gain_lock_process`).

## Context

ADR-100 introduced the FW-side gain-lock (AGC + FFT scale) but the
calibration runs on *every* boot:

1. Collect 300 packets (~3 s at 100 pps, but realistically 6-12 s
   in production, where keepalive traffic drives as little as 25 pps).
2. Take the median of AGC and FFT samples.
3. Call `phy_force_rx_gain` / `phy_fft_scale_force` to freeze.

This means that after every reboot (OTA, power blip, watchdog) the chip
goes through 6-12 s where CSI is generated with **unlocked AGC** that
drifts ±20–30 % (the very artefact gain-lock was meant to suppress).
The operator's classifier, ADR-102's NBVI selector, and ADR-103's
baseline comparison all see noisy data during that warm-up.

Pace's ESPectre persists everything calibration-related to NVS so that
post-reboot the sensor is back in detect mode in well under a
second. This ADR ports the gain-lock half of that policy
(NBVI lives server-side in RuView, so that half does not apply).

## Decisions

### D1 — NVS namespace + keys

```c
#define RV_GAIN_NVS_NS    "csi_cfg"
#define RV_GAIN_NVS_K_AGC "gl_agc"   // u8
#define RV_GAIN_NVS_K_FFT "gl_fft"   // i8
```

`csi_cfg` is the same namespace the WiFi creds / collector IP / node_id
live in (so it's already initialised + checked by `nvs_config_load`).
Two single-byte values — minimal NVS footprint.

### D2 — Two thin helpers

```c
static esp_err_t rv_gain_load_from_nvs(uint8_t *agc, int8_t *fft);
static void      rv_gain_save_to_nvs(uint8_t agc, int8_t fft);
```

Both are local to `csi_collector.c`. Load returns `ESP_ERR_NVS_NOT_FOUND`
on a fresh chip. Save returns nothing: if an NVS write fails it logs a
warning and never blocks the boot path.

### D3 — One-shot NVS load at top of `rv_gain_lock_process`

A static `s_nvs_checked` flag triggers exactly **one** load attempt
on the first packet after boot:

```c
if (!s_nvs_checked) {
    s_nvs_checked = true;  // one attempt per boot, whether or not it succeeds
    uint8_t agc; int8_t fft;
    if (rv_gain_load_from_nvs(&agc, &fft) == ESP_OK
        && agc >= RV_GAIN_MIN_SAFE_AGC)
    {
        // Re-apply the stored lock and skip the 300-packet calibration.
        phy_fft_scale_force(true, fft);
        phy_force_rx_gain(1, (int)agc);
        s_gain_locked = true;
        ESP_LOGI(TAG, "gain-lock RESTORED from NVS: AGC=%u FFT=%d", agc, fft);
        return;
    }
    // Load failed or value rejected: fall through to normal calibration.
}
```

The `agc >= RV_GAIN_MIN_SAFE_AGC` guard preserves ADR-100's "skip if
signal too strong" safety: a stale low-AGC value that would freeze
the RX path is rejected even if it's in NVS.

### D4 — Save after every successful lock

The existing `phy_*_force` branch in `rv_gain_lock_process` is wrapped
with a save call:

```c
phy_fft_scale_force(true, s_gain_fft_value);
phy_force_rx_gain(1, (int)s_gain_agc_value);
rv_gain_save_to_nvs(s_gain_agc_value, s_gain_fft_value);
ESP_LOGI(TAG, "gain-lock PERSISTED to NVS (%s/%s, %s)",
         RV_GAIN_NVS_NS, RV_GAIN_NVS_K_AGC, RV_GAIN_NVS_K_FFT);
```

So the first boot ever does the full 300-packet calibration **and**
saves; every subsequent boot loads instantly from D3.

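To make the interplay between D3 and D4 easier to follow, here is a small host-side model of the control flow in Rust. It is not firmware code (the firmware is C and uses ESP-IDF NVS): `FakeNvs`, `BootPath`, `on_first_packet`, and `on_calibration_done` are hypothetical names, the `HashMap`s merely stand in for the `gl_agc` / `gl_fft` keys, and `min_safe_agc` stands in for `RV_GAIN_MIN_SAFE_AGC`, whose actual value is not stated here.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the two NVS keys (model only, not ESP-IDF NVS).
#[derive(Default)]
pub struct FakeNvs {
    u8s: HashMap<&'static str, u8>,
    i8s: HashMap<&'static str, i8>,
}

/// What the first packet after boot leads to.
#[derive(Debug, PartialEq)]
pub enum BootPath {
    /// Stored values passed the guard and are applied immediately (D3).
    Restored { agc: u8, fft: i8 },
    /// No usable values: run the full calibration, then save (D4).
    Calibrate,
}

/// Models the one-shot load: both keys must exist and AGC must pass the guard.
pub fn on_first_packet(nvs: &FakeNvs, min_safe_agc: u8) -> BootPath {
    match (nvs.u8s.get("gl_agc"), nvs.i8s.get("gl_fft")) {
        (Some(&agc), Some(&fft)) if agc >= min_safe_agc => BootPath::Restored { agc, fft },
        _ => BootPath::Calibrate,
    }
}

/// Models the save that follows a successful lock.
pub fn on_calibration_done(nvs: &mut FakeNvs, agc: u8, fft: i8) {
    nvs.u8s.insert("gl_agc", agc);
    nvs.i8s.insert("gl_fft", fft);
}
```

In this model an empty store yields `Calibrate`; after `on_calibration_done(&mut nvs, 44, -33)` the next call to `on_first_packet` yields `Restored { agc: 44, fft: -33 }` for any guard value up to 44, and a stored AGC below the guard falls back to `Calibrate`.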
### D5 — Invalidation policy

Stored values are tied to this sensor's physical location, this AP's
MAC, this channel, and this antenna orientation. If any of those change,
the saved AGC/FFT may be slightly off-optimal, but **not dangerous**:
the WiFi PHY keeps receiving and the CSI is only slightly degraded, so
the host sees higher baseline noise until the operator triggers a
re-calibration.

Today: erase via `idf.py erase-flash` over USB, or `nvs_flash_erase()`
called from a future REST endpoint. There is no automatic invalidation;
the operator decides when a deployment change is significant enough.

## Files Touched

```
firmware/esp32-csi-node/main/csi_collector.c
  - #include "nvs.h" / "nvs_flash.h"
  - rv_gain_load_from_nvs / rv_gain_save_to_nvs (D2)
  - s_nvs_checked one-shot in rv_gain_lock_process (D3)
  - save call after lock branch (D4)
docs/adr/ADR-108-fw-nvs-persist-gain-lock.md (this)
```

Implementation commit: `3779bb76`. Flashed to both sensors via OTA
(no USB) — `python3 scripts/ota-deploy.sh`.

## Verified Acceptance

Test sequence:

1. OTA flash new FW to both nodes (first boot, NVS empty).
2. Wait 15 s for FW to complete first calibration + write to NVS.
3. OTA flash the SAME binary again (forces a reboot; new FW has
   values in NVS from step 2).
4. Sample WS amplitude rate in the first 3 s after the second boot.

Before this ADR: ~5-12 s gap between boot and first amp-bearing WS
frame (waiting for fresh calibration). After this ADR: WS shows
**44 Hz raw CSI in the first 3 s** — instant resume.

Logs from a chip that has values in NVS:

```
I (335) main: boot: reset_reason=SW running_partition=ota_1
I (520) csi_collector: gain-lock RESTORED from NVS: AGC=44 FFT=-33
                       (0-packet calibration; clear NVS to recalibrate)
```

vs first-boot ever:

```
I (335) main: boot: reset_reason=POWERON running_partition=ota_0
I (4980) csi_collector: gain-lock APPLIED: AGC=44 FFT=-33
                        (median of 300 packets)
I (4980) csi_collector: gain-lock PERSISTED to NVS (csi_cfg/gl_agc, gl_fft)
```

## Open Items

* **REST endpoint to clear gain-lock NVS** — today the operator has
  to USB-erase the namespace. A FW-side `POST /ota/recalibrate` that
  clears the two keys + `esp_restart()` would close that loop.
  ~30 min FW + flash.
* **Track AP MAC alongside AGC/FFT** — `csi_cfg/gl_ap_mac`. On boot,
  if current AP MAC ≠ saved → ignore the cached values and re-calibrate.
  Fully automatic invalidation. ~1 h FW.
* **Per-channel cache** — `csi_cfg/gl_<chan>_agc`. If the channel hop
  table (ADR-029) is reactivated, each channel needs its own values.
  ~1 h FW.

## References

* ADR-100 — gain-lock implementation that this ADR persists.
* ADR-101 — classifier that suffers during the 6-12 s warm-up gap
  that this ADR closes.
* `docs/references/ota-pipeline.md` — the WiFi flash flow used to
  deploy this FW change without USB.
* Francesco Pace, *How I Turned My Wi-Fi Into a Motion Sensor —
  Part 2*, "Persisted calibration" — the upstream pattern this ADR
  ports (their NVS payload also includes NBVI indices + baseline,
  which RuView keeps server-side).