# ADR-100 — PHY Gain Lock for Baseline-Stable CSI

**Status**: Accepted
**Date**: 2026-05-17
**Scope**: `firmware/esp32-csi-node/main/csi_collector.c`,
`v2/crates/wifi-densepose-sensing-server/static/raw.html`.

## Context

After ADR-110 deployed the TP-Link WISP AP and the operator captured three
controlled one-minute windows (empty / sit / walk), the RSSI MAD-Δ
classifier failed to separate the three states: the measured `d` values
for all three overlapped within ±0.03 of 0.49, while the spread within
each state was ±0.10. We inspected the live amplitude spectrum on the new
`raw.html` console and saw a slow ±20–30 % broadband drift in the sensor
amplitude even with the room provably empty. The drift was
indistinguishable from body modulation at multi-meter range and dominated
every downstream feature.

Francesco Pace's [ESPectre](https://github.com/francescopace/espectre)
project (GPLv3) traced the same artefact to the ESP32 PHY's automatic
gain control: AGC continuously rebalances the receiver gain per packet
so received frames stay in the optimal decoding range. For CSI sensing
this is a disaster — the same channel state arrives with a different
amplitude every packet because the gain stage shifts under it. Pace
identified two undocumented PHY routines in the IDF blob that freeze
AGC and FFT scaling, plus a calibration recipe (median of the first
300 packets) that is robust to brief startup activity.

## Decisions

### D1 — Port the ESPectre gain-lock to RuView FW

Added a self-contained block to `csi_collector.c`:

* **Overlay struct** `rv_phy_rx_ctrl_t` aliased over `wifi_csi_info_t.rx_ctrl`
  to read the hidden `agc_gain` (u8) and `fft_gain` (signed i8) fields.
* **Extern declarations** for the two PHY routines:
  ```c
  extern void phy_fft_scale_force(bool force_en, int8_t force_value);
  extern void phy_force_rx_gain(int force_en, int force_value);
  ```
* **Two-phase calibration** (`rv_gain_lock_process`):
  - Phase 1 (≤ 300 packets, ~6 s at the rate-gated 50 Hz callback):
    accumulate AGC and FFT samples into static arrays.
  - At the 300th packet: `qsort` both arrays, take the median, and
    call the two PHY routines to freeze gain.
* **Safety branch**: if median AGC < 30, skip the lock and log a
  warning. Forcing a low gain on a strong-signal deployment causes the
  RX path to freeze (empirically documented in ESPectre's
  `gain_controller.h`).
* **Supported targets**: ESP32-S3, ESP32-C3, ESP32-C6 only — older
  parts compile to a no-op stub. RuView ships on S3 so this is the only
  path we care about.
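The two-phase calibration above can be pictured as a small state machine. This is a minimal host-runnable sketch, not the RuView source: `rv_gain_lock_feed`, `rv_gain_lock_reset` and `rv_lock_state_t` are hypothetical names, the two `phy_*` functions are local stubs with the extern signatures shown above (on target they come from the IDF blob), taking the upper median (index 150 of 300 sorted samples) is an assumed choice for an even sample count, and the order of the two PHY calls is also assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define RV_CALIB_PACKETS  300  /* Phase 1 length (D1) */
#define RV_AGC_SAFETY_MIN 30   /* below this median AGC the lock is skipped */

typedef enum {
    RV_LOCK_CALIBRATING,
    RV_LOCK_APPLIED,
    RV_LOCK_SKIPPED,
} rv_lock_state_t;

/* Host-side stand-ins with the same signatures as the blob routines;
 * here they only record the values that would be forced. */
static int    s_forced_agc = -1;
static int8_t s_forced_fft;
static void phy_fft_scale_force(bool force_en, int8_t force_value) {
    if (force_en) s_forced_fft = force_value;
}
static void phy_force_rx_gain(int force_en, int force_value) {
    if (force_en) s_forced_agc = force_value;
}

static uint8_t         s_agc[RV_CALIB_PACKETS];
static int8_t          s_fft[RV_CALIB_PACKETS];
static int             s_count;
static rv_lock_state_t s_state = RV_LOCK_CALIBRATING;

static int cmp_u8(const void *a, const void *b) {
    return (int)*(const uint8_t *)a - (int)*(const uint8_t *)b;
}
static int cmp_i8(const void *a, const void *b) {
    return (int)*(const int8_t *)a - (int)*(const int8_t *)b;
}

/* Hypothetical reset, used only to rerun the sketch. */
static void rv_gain_lock_reset(void) {
    s_count = 0;
    s_state = RV_LOCK_CALIBRATING;
    s_forced_agc = -1;
    s_forced_fft = 0;
}

/* Hypothetical per-packet entry: agc/fft are the hidden rx_ctrl fields of a
 * packet that already passed the rate gate and MAC filter. */
static rv_lock_state_t rv_gain_lock_feed(uint8_t agc, int8_t fft) {
    if (s_state != RV_LOCK_CALIBRATING) {
        return s_state;                     /* locked or skipped: no-op */
    }
    assert(s_count < RV_CALIB_PACKETS);
    s_agc[s_count] = agc;                   /* Phase 1: accumulate */
    s_fft[s_count] = fft;
    if (++s_count < RV_CALIB_PACKETS) {
        return s_state;
    }
    /* 300th packet: sort both arrays and take the upper median. */
    qsort(s_agc, RV_CALIB_PACKETS, sizeof s_agc[0], cmp_u8);
    qsort(s_fft, RV_CALIB_PACKETS, sizeof s_fft[0], cmp_i8);
    uint8_t agc_med = s_agc[RV_CALIB_PACKETS / 2];
    int8_t  fft_med = s_fft[RV_CALIB_PACKETS / 2];

    if (agc_med < RV_AGC_SAFETY_MIN) {      /* safety branch */
        printf("gain lock skipped: median AGC %u < %d\n",
               (unsigned)agc_med, RV_AGC_SAFETY_MIN);
        s_state = RV_LOCK_SKIPPED;
    } else {
        phy_fft_scale_force(true, fft_med);
        phy_force_rx_gain(1, agc_med);
        s_state = RV_LOCK_APPLIED;
    }
    return s_state;
}
```

The median rather than the mean is what makes a few packets disturbed by startup activity harmless: they shift the sorted arrays by a handful of positions but rarely move the middle element.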

The hook is wired immediately after the existing rate-gate and MAC
filter in the CSI callback, so calibration completes within ~6 s of
WiFi association whenever CSI frames reach the hook at the full gated
rate (see *Calibration values observed* below for the idle-channel
case). After that, the hook short-circuits.
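The hook placement can be sketched as a gate in front of the gain-lock step. This is a minimal sketch, assuming a 20 ms minimum interval for the 50 Hz gate and a single allowed transmitter MAC; `rv_csi_admit`, `rv_csi_callback`, `rv_gain_lock_hook` and the placeholder MAC are hypothetical, and running the MAC check before the rate gate is an assumed ordering. On target the MAC would come from `wifi_csi_info_t.mac` and the timestamp from `esp_timer_get_time()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RV_MIN_INTERVAL_US 20000  /* 50 Hz gate: at most one frame per 20 ms */

/* Hypothetical filter target: the AP whose CSI is kept (placeholder MAC). */
static const uint8_t s_filter_mac[6] = {0x02, 0x00, 0x00, 0x00, 0x00, 0x01};

static bool    s_have_last;
static int64_t s_last_accept_us;
static int     s_hook_calls;

/* True if the frame passes the MAC filter and the rate gate. The MAC check
 * runs first so foreign frames never consume a rate-gate slot. */
static bool rv_csi_admit(const uint8_t mac[6], int64_t now_us) {
    assert(mac != NULL);
    if (memcmp(mac, s_filter_mac, sizeof s_filter_mac) != 0) {
        return false;
    }
    if (s_have_last && now_us - s_last_accept_us < RV_MIN_INTERVAL_US) {
        return false;
    }
    s_have_last = true;
    s_last_accept_us = now_us;
    return true;
}

/* Stand-in for the gain-lock step; on target this is the ADR-100 hook. */
static void rv_gain_lock_hook(void) { s_hook_calls++; }

/* Callback shape: gate first, then the gain-lock hook, then the rest. */
static void rv_csi_callback(const uint8_t mac[6], int64_t now_us) {
    if (!rv_csi_admit(mac, now_us)) {
        return;
    }
    rv_gain_lock_hook();
    /* ...existing serialisation and UDP send would follow here... */
}
```

Because the hook sits behind the gate, its calibration window is counted in gated frames, which is why 300 packets map to ~6 s only when frames arrive at or above 50 Hz.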

Tagged as ADR-100 in the source comment for traceability.

### D2 — Use the existing `raw.html` console (ADR-110, D2 reuse) as the verification UI

The console added in ADR-110 already consumes `nodes[].amplitude` from
the existing WebSocket stream. No server-side change was needed. The HTML
displays a per-node bar histogram of all 56 active subcarriers plus
broadband mean amplitude and RSSI traces over the last 30 s. This is
the surface where the operator can watch — without any DSP, without any
classification — whether the gain-lock has actually flattened the
baseline.

### D3 — Geometry matters as much as gain-lock

A controlled three-state capture made on 2026-05-17 with both sensors
positioned so that the line `TP-Link AP → sensor` passes through the
operator (lying on the bed) confirmed both decisions. The summary
table appears under *Verified Acceptance* below. Earlier captures
(ADR-110) failed to separate states partly because the sensors were
placed off-axis from the AP-to-body line; with that geometry the body
never physically obstructs the CSI channel.

## Calibration values observed (real captures, this deployment)

| Node | Boot rate (low traffic) | Boot rate (ping flood) | AGC median | FFT scale median | Lock decision |
|---|---|---|---|---|---|
| room01 (192.168.0.101) | 0.3 fps | 30+ fps | **42–44** | −31 / −33 | **APPLIED** |
| room02 (192.168.0.100) | 0.3 fps | 30+ fps | **44** | −40 / −42 | **APPLIED** |

Both AGC medians are comfortably above the 30 safety threshold. The
calibration completes in ~6 s when there is any host traffic (a single
ping to the sensor at 10 pps is enough); on a totally idle channel,
where only beacons arrive, the rate drops to 0.3 fps and calibration
would take ~17 minutes. In practice there is always some traffic.
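Calibration latency is simply the Phase 1 sample count divided by the CSI rate reaching the hook. A worked check against the rates in the table above, with 50 Hz being the rate-gate ceiling:

```math
\begin{aligned}
t_{\text{cal}} &= N / r, \quad N = 300 \\
t_{\text{cal}}(r = 50\ \text{Hz}) &= 300 / 50 = 6\ \text{s} \\
t_{\text{cal}}(r = 30\ \text{fps}) &= 300 / 30 = 10\ \text{s} \\
t_{\text{cal}}(r = 0.3\ \text{fps}) &= 300 / 0.3 = 1000\ \text{s} \approx 16.7\ \text{min}
\end{aligned}
```

So the ~6 s figure corresponds to frames arriving at the full 50 Hz gate rate; the 30+ fps observed under a ping flood bounds calibration at 10 s or less.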

## Verified Acceptance — three-state separation

Geometry: TP-Link AP on the wall, both sensors at table-level on the
opposite side of the room, operator lying on the bed between AP and
sensors. 30 seconds per state, gain-lock active on both nodes,
`raw.html` open during capture, `target_ip` provisioned to the Mac's
TP-Link-side IP (192.168.0.103) so no upstream NAT is in the path.

| State | node 1 mean amplitude | node 1 CV | node 1 subcarriers with CV < 5 % | node 2 mean amplitude | node 2 CV | node 2 subcarriers with CV < 7 % |
|---|---|---|---|---|---|---|
| **EMPTY** (operator out) | **37.28** | **2.71 %** | **44/44** | 9.52 | 5.22 % | 26/44 |
| **STILL** (operator lying still on bed) | 22.43 | 3.70 % | 30/44 | 9.67 | 5.02 % | 24/44 |
| **WALK** (operator pacing the room) | 31.77 | **12.50 %** | 0/44 | 7.15 | **29.72 %** | 0/44 |

Observations:

* **Node 1 separates all three states** by mean amplitude alone: 37 →
  22 → 32. The body lying still blocks the direct path
  (40 % amplitude drop), then motion adds reflections back. The CV
  ladder 2.71 → 3.70 → 12.50 % is a second independent feature.
* **Node 2 separates STILL+EMPTY from WALK** by CV (5 → 30 %). Its
  geometry doesn't pick up a still body, only motion.
* **Compare to ADR-110**, where empty/sit/walk differed by ±0.02 inside
  ±0.10 noise: we now have inter-state separation ratios (WALK CV ÷
  STILL CV) of **×3.4 on node 1** (12.50 / 3.70) and **×5.9 on node 2**
  (29.72 / 5.02). The signal is no longer dominated by baseline drift.

## Files Touched

```
firmware/esp32-csi-node/main/csi_collector.c       # gain-lock module + hook
v2/crates/wifi-densepose-sensing-server/static/raw.html   # already from ADR-110
docs/adr/ADR-100-gain-lock-baseline-stabilization.md      # this ADR
```

## Open Items

* ✅ **NBVI subcarrier selection** — closed in ADR-102 (server-side
  port with quiet-window finder).
* ✅ **Server-side RSSI parsing** — fixed by a parallel agent in commit
  `3393c1e8` (parse_esp32_frame offset realignment + carrying RSSI
  through feature_state packets).
* ✅ **Calibration latency on an idle channel** — closed in ADR-106
  by the built-in managed-`ping` keepalive (drives sensor RX at
  25 pkt/s/node out of the box).
* ⏳ **NVS target_ip is hardcoded** — still open. Tailscale-target
  option not implemented; sensors still send to the Mac's TP-Link-
  side IP (192.168.0.103). Mac roaming still breaks the CSI stream.

## References

* ADR-039 — Edge intelligence pipeline (host DSP path).
* ADR-098 — Earlier ESP32-S3 deployment fixes.
* ADR-110 — TP-Link WISP deployment + first RSSI-Δ attempt (this ADR
  supersedes the threshold table in ADR-110, D3 — the RSSI MAD-Δ
  detector is left in place but no longer the primary signal).
* Francesco Pace, *How I Turned My Wi-Fi Into a Motion Sensor — Part 2*,
  Dec 2025 — source of the gain-lock recipe.
* `francescopace/espectre`, `components/espectre/gain_controller.{h,cpp}`
  on GitHub — reference implementation (GPLv3).