
ADR-100 — PHY Gain Lock for Baseline-Stable CSI

Status: Accepted
Date: 2026-05-17
Scope: firmware/esp32-csi-node/main/csi_collector.c, v2/crates/wifi-densepose-sensing-server/static/raw.html

Context

After ADR-110 deployed the TP-Link WISP AP and the operator captured three controlled one-minute windows (empty / sit / walk), the RSSI MAD-Δ classifier failed to separate the three states: the measured d values overlapped within ±0.03 of 0.49, while the in-state spread was ±0.10. The live amplitude spectrum on the new raw.html console showed a slow broadband drift of ±20-30 % in the sensor amplitude even with the room provably empty. At multi-meter range this drift was indistinguishable from body modulation, and it dominated every downstream feature.

Francesco Pace's ESPectre project (GPLv3) traced the same artefact to the automatic gain control (AGC) in the ESP32 PHY. AGC readjusts the receiver gain on every packet so that received frames stay in the optimal decoding range. For CSI sensing this is a disaster: the same channel state arrives with a different amplitude on every packet because the gain stage shifts underneath it. Pace identified two undocumented PHY routines in the ESP-IDF PHY blob that freeze AGC and FFT scaling, plus a calibration recipe (take the median of the first 300 packets) that is robust to brief activity during startup.

Decisions

D1 — Port the ESPectre gain-lock to RuView FW

Added a self-contained block to csi_collector.c:

  • Overlay struct rv_phy_rx_ctrl_t, aliased over wifi_csi_info_t.rx_ctrl, to read the hidden agc_gain (uint8_t) and fft_gain (int8_t) fields.
  • Extern declarations for the two PHY routines:
    extern void phy_fft_scale_force(bool force_en, int8_t force_value);
    extern void phy_force_rx_gain(int force_en, int force_value);
    
  • Two-phase calibration (rv_gain_lock_process):
    • Phase 1 (≤ 300 packets, ~6 s at the rate-gated 50 Hz callback): accumulate AGC and FFT samples into static arrays.
    • At the 300th packet: qsort both arrays, take the median, and call the two PHY routines to freeze gain.
  • Safety branch: if the median AGC is below 30, skip the lock and log a warning. A median that low indicates a strong-signal deployment, where forcing a low gain causes the RX path to freeze (empirically documented in ESPectre's gain_controller.h).
  • Supported targets: ESP32-S3, ESP32-C3 and ESP32-C6 only; older parts compile to a no-op stub. RuView ships on the S3, so this is the only path we care about.
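The two-phase calibration above can be sketched as a small state machine. This is a minimal sketch, not the csi_collector.c code: the names rv_gain_cal_t, rv_gain_cal_feed and rv_lock_fn are hypothetical, the upper median (element n/2 of the sorted 300 samples) is an assumed tie-break for the even sample count, and the PHY routines are passed in as a callback so the logic can run off-target.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Values taken from this ADR; all names below are hypothetical. */
#define RV_CAL_PACKETS  300  /* packets collected before the lock decision */
#define RV_AGC_MIN_LOCK  30  /* median AGC below this: skip the lock */

typedef enum { RV_CAL_COLLECTING = 0, RV_CAL_LOCKED, RV_CAL_SKIPPED } rv_cal_state_t;

/* Called once with the medians when the lock is applied. On the device this
   would wrap phy_fft_scale_force() and phy_force_rx_gain(); passing it in
   keeps the sketch runnable off-target. */
typedef void (*rv_lock_fn)(uint8_t agc_median, int8_t fft_median);

typedef struct {
    uint8_t agc[RV_CAL_PACKETS];  /* phase 1 samples of agc_gain */
    int8_t  fft[RV_CAL_PACKETS];  /* phase 1 samples of fft_gain */
    int count;                    /* samples collected so far */
    rv_cal_state_t state;
    uint8_t agc_median;
    int8_t  fft_median;
} rv_gain_cal_t;

static int cmp_u8(const void *a, const void *b) {
    return (int)*(const uint8_t *)a - (int)*(const uint8_t *)b;
}

static int cmp_i8(const void *a, const void *b) {
    return (int)*(const int8_t *)a - (int)*(const int8_t *)b;
}

/* Feed one packet's gain readings; returns the state after this packet. */
rv_cal_state_t rv_gain_cal_feed(rv_gain_cal_t *c, uint8_t agc, int8_t fft, rv_lock_fn lock) {
    if (c->state != RV_CAL_COLLECTING)
        return c->state;                    /* decision already made: short-circuit */

    assert(c->count < RV_CAL_PACKETS);
    c->agc[c->count] = agc;                 /* phase 1: accumulate */
    c->fft[c->count] = fft;
    if (++c->count < RV_CAL_PACKETS)
        return c->state;

    /* 300th packet: sort in place and take the upper median (index n/2). */
    qsort(c->agc, RV_CAL_PACKETS, sizeof c->agc[0], cmp_u8);
    qsort(c->fft, RV_CAL_PACKETS, sizeof c->fft[0], cmp_i8);
    c->agc_median = c->agc[RV_CAL_PACKETS / 2];
    c->fft_median = c->fft[RV_CAL_PACKETS / 2];

    if (c->agc_median < RV_AGC_MIN_LOCK) {
        c->state = RV_CAL_SKIPPED;          /* strong signal: forcing gain risks freezing RX */
    } else {
        if (lock != NULL)
            lock(c->agc_median, c->fft_median);
        c->state = RV_CAL_LOCKED;
    }
    return c->state;
}
```

On the device, the callback would presumably call phy_fft_scale_force(true, fft_median) and phy_force_rx_gain(1, agc_median); that mapping of medians to arguments is an assumption based on the extern declarations, not something this ADR spells out.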

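The hook is wired immediately after the existing rate gate and MAC filter in the CSI callback, so calibration completes within about 6 s of WiFi association whenever there is enough traffic to saturate the 50 Hz gate (on an idle channel it takes far longer; see the calibration values below). Once the lock decision has been made, the hook short-circuits on every subsequent packet.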
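Because the hook sits behind the rate gate, only packets that pass the gate count toward the 300-packet window. The sketch below models that ordering under stated assumptions: the gate is taken to be a minimum 20 ms spacing between accepted packets, the MAC filter is assumed to run before the gate, the calibration itself is reduced to a packet counter, and rv_cb_state_t and rv_csi_accept are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RV_GATE_PERIOD_US 20000  /* 50 Hz rate gate: at most one packet per 20 ms */
#define RV_CAL_PACKETS      300  /* calibration window from this ADR */

/* Hypothetical per-callback state; field names are illustrative only. */
typedef struct {
    uint8_t ap_mac[6];       /* only CSI from this transmitter is kept */
    int64_t last_accept_us;  /* timestamp of the last packet that passed the gate */
    bool    has_last;
    int     cal_packets;     /* packets seen by the calibration hook */
    bool    cal_done;        /* set once the lock decision has been made */
} rv_cb_state_t;

/* Models the callback's early exits: MAC filter, then rate gate, then the
   gain-lock hook, which stops counting once calibration is done.
   Returns true if the packet continues into the normal CSI path. */
bool rv_csi_accept(rv_cb_state_t *s, const uint8_t mac[6], int64_t now_us) {
    if (memcmp(mac, s->ap_mac, sizeof s->ap_mac) != 0)
        return false;                                    /* MAC filter */
    assert(!s->has_last || now_us >= s->last_accept_us); /* timestamps must not go backwards */
    if (s->has_last && now_us - s->last_accept_us < RV_GATE_PERIOD_US)
        return false;                                    /* rate gate */
    s->last_accept_us = now_us;
    s->has_last = true;

    if (!s->cal_done && ++s->cal_packets >= RV_CAL_PACKETS)
        s->cal_done = true;                              /* gain-lock hook; no-op afterwards */
    return true;
}
```

Fed at 100 packets/s from the AP, the gate passes every second packet, so the 300th accepted packet arrives at 299 × 20 ms ≈ 6 s, which matches the ~6 s figure above.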

Tagged as ADR-100 in the source comment for traceability.

D2 — Use the existing raw.html console (ADR-110, D2 reuse) as the verification UI

The console added in ADR-110 already streams nodes[].amplitude over the existing WebSocket, so no server-side change was needed. For each node it shows a bar histogram of all 56 active subcarriers, plus traces of broadband mean amplitude and RSSI over the last 30 s. This lets the operator see, without any DSP or classification, whether the gain lock has actually flattened the baseline.
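The broadband trace reduces each frame to a single number and keeps a rolling 30 s window. A minimal C sketch of that bookkeeping follows; the real console runs in the browser, and the 10 Hz update rate used to size the window, as well as the names rv_broadband_mean, rv_trace_t, rv_trace_push and rv_trace_at, are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define RV_TRACE_HZ      10   /* assumed console update rate (hypothetical) */
#define RV_TRACE_SECONDS 30   /* window shown by the console */
#define RV_TRACE_LEN     (RV_TRACE_HZ * RV_TRACE_SECONDS)

/* Broadband mean: average amplitude across all active subcarriers of one frame. */
float rv_broadband_mean(const float *amp, size_t n) {
    assert(n > 0);
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += amp[i];
    return sum / (float)n;
}

/* Fixed-size ring buffer holding the last 30 s of broadband means. */
typedef struct {
    float  v[RV_TRACE_LEN];
    size_t head;   /* index of the next slot to write */
    size_t count;  /* number of valid samples, saturates at RV_TRACE_LEN */
} rv_trace_t;

void rv_trace_push(rv_trace_t *t, float x) {
    t->v[t->head] = x;
    t->head = (t->head + 1) % RV_TRACE_LEN;
    if (t->count < RV_TRACE_LEN)
        t->count++;
}

/* Oldest-first access: i = 0 is the oldest sample still in the window. */
float rv_trace_at(const rv_trace_t *t, size_t i) {
    assert(i < t->count);
    size_t start = (t->head + RV_TRACE_LEN - t->count) % RV_TRACE_LEN;
    return t->v[(start + i) % RV_TRACE_LEN];
}
```

With the gain lock working, an empty room should produce a nearly flat trace; the ±20-30 % drift described under Context would appear as a slow wander across the window.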

D3 — Geometry matters as much as gain-lock

A controlled three-state capture made on 2026-05-17, with both sensors positioned so that the line from the TP-Link AP to each sensor passes through the operator (lying on the bed), confirmed both decisions. The summary table appears under Verified Acceptance below. Earlier captures (ADR-110) failed to separate states partly because the sensors were placed off-axis from the AP-to-body line; with that geometry the body never physically obstructs the propagation path the CSI measures.

Calibration values observed (real captures, this deployment)

| Node | Boot rate (low traffic) | Boot rate (ping flood) | AGC median | FFT scale median | Lock decision |
|---|---|---|---|---|---|
| room01 (192.168.0.101) | 0.3 fps | 30+ fps | 42 / 44 | 31 / 33 | APPLIED |
| room02 (192.168.0.100) | 0.3 fps | 30+ fps | 44 | 40 / 42 | APPLIED |

Both AGC medians are comfortably above the safety threshold of 30. Calibration completes in about 6 s once there is any host traffic (a single ping to the sensor at 10 pps is enough); on a completely idle channel, beacons alone drive the rate down to 0.3 fps and calibration would take about 17 minutes. In practice there is always some traffic.
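The latency figures follow from dividing the fixed 300-packet window by the packet rate that survives the 50 Hz rate gate. Here r is a symbol introduced for the packet rate reaching the CSI callback and N the calibration window size:

$$
t_{\mathrm{cal}} = \frac{N}{\min(r,\ 50\ \mathrm{Hz})}, \qquad N = 300
$$

$$
r \ge 50\ \mathrm{Hz}:\ t_{\mathrm{cal}} = \tfrac{300}{50} = 6\ \mathrm{s}; \qquad
r = 30\ \mathrm{fps}:\ t_{\mathrm{cal}} = \tfrac{300}{30} = 10\ \mathrm{s}; \qquad
r = 0.3\ \mathrm{fps}:\ t_{\mathrm{cal}} = \tfrac{300}{0.3} = 1000\ \mathrm{s} \approx 16.7\ \mathrm{min}
$$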

Verified Acceptance — three-state separation

Geometry: TP-Link AP on the wall; both sensors at table height on the opposite side of the room; operator lying on the bed between the AP and the sensors. Each state was captured for 30 seconds with the gain lock active on both nodes and raw.html open during capture. Each sensor's target_ip was provisioned to the Mac's TP-Link-side IP (192.168.0.103), so no upstream NAT was in the path.

| State | Node 1 mean amplitude | Node 1 CV | Node 1 subcarriers with CV < 5 % | Node 2 mean amplitude | Node 2 CV | Node 2 subcarriers with CV < 7 % |
|---|---|---|---|---|---|---|
| EMPTY (operator out) | 37.28 | 2.71 % | 44/44 | 9.52 | 5.22 % | 26/44 |
| STILL (operator lying still on bed) | 22.43 | 3.70 % | 30/44 | 9.67 | 5.02 % | 24/44 |
| WALK (operator pacing the room) | 31.77 | 12.50 % | 0/44 | 7.15 | 29.72 % | 0/44 |

Observations:

  • Node 1 separates all three states by mean amplitude alone: 37 → 22 → 32. The body lying still blocks the direct path (a 40 % amplitude drop, 37.28 → 22.43), and motion then adds reflections back. The CV ladder 2.71 → 3.70 → 12.50 % is a second, independent feature.
  • Node 2 separates STILL and EMPTY from WALK by CV (about 5 % versus 30 %). Its geometry does not pick up a still body, only motion.
  • Compare ADR-110, where empty/sit/walk differed by ±0.02 inside ±0.10 noise: we now have inter-state separation ratios of ×3.4 on node 1 and ×5.9 on node 2 (these equal the WALK/STILL CV ratios, 12.50 / 3.70 and 29.72 / 5.02). The signal is no longer dominated by baseline drift.
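The ADR does not spell out how the per-subcarrier column or the separation ratios were computed. The sketch below assumes the subcarrier column counts subcarriers whose population CV (standard deviation over mean) across the state window falls below the threshold, and that the ratio is WALK CV over STILL CV, which reproduces ×3.4 and ×5.9 from the table. All names are hypothetical, and the CV test compares variance against (threshold × mean)² so no square root is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the CV of n samples read at the given stride is below thr
   (e.g. 0.05 for 5 %). Uses var < (thr * mean)^2, equivalent to std/mean < thr. */
static bool rv_cv_below(const double *x, size_t n, size_t stride, double thr) {
    assert(n > 0);
    double mean = 0.0;
    for (size_t i = 0; i < n; i++)
        mean += x[i * stride];
    mean /= (double)n;
    if (mean <= 0.0)
        return false;                       /* CV undefined for non-positive mean */
    double var = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = x[i * stride] - mean;
        var += d * d;
    }
    var /= (double)n;                       /* population variance */
    return var < (thr * mean) * (thr * mean);
}

/* Count subcarriers whose amplitude CV over the window is below thr.
   win is frame-major: win[f * n_sub + s] is subcarrier s in frame f. */
size_t rv_count_stable_subcarriers(const double *win, size_t n_frames, size_t n_sub, double thr) {
    size_t count = 0;
    for (size_t s = 0; s < n_sub; s++)
        if (rv_cv_below(win + s, n_frames, n_sub, thr))
            count++;
    return count;
}

/* Separation ratio between two states' CVs, e.g. WALK CV / STILL CV. */
double rv_separation_ratio(double cv_a, double cv_b) {
    assert(cv_b > 0.0);
    return cv_a / cv_b;
}
```

For example, rv_separation_ratio(12.50, 3.70) gives about 3.38 and rv_separation_ratio(29.72, 5.02) about 5.92, matching the rounded ratios quoted above.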

Files Touched

firmware/esp32-csi-node/main/csi_collector.c       # gain-lock module + hook
v2/crates/wifi-densepose-sensing-server/static/raw.html   # already from ADR-110
docs/adr/ADR-100-gain-lock-baseline-stabilization.md      # this ADR

Open Items

  • NBVI subcarrier selection: closed in ADR-102 (server-side port with a quiet-window finder).
  • Server-side RSSI parsing: fixed by a parallel agent in commit 3393c1e8 (parse_esp32_frame offset realignment, plus RSSI carried through feature_state packets).
  • Calibration latency on an idle channel: closed in ADR-106 by the built-in managed-ping keepalive, which drives sensor RX at 25 pkt/s per node out of the box.
  • NVS target_ip is hardcoded: still open. No Tailscale-target option is implemented, so the sensors still send to the Mac's TP-Link-side IP (192.168.0.103), and the CSI stream breaks whenever the Mac roams.

References

  • ADR-039 — Edge intelligence pipeline (host DSP path).
  • ADR-098 — Earlier ESP32-S3 deployment fixes.
  • ADR-110 — TP-Link WISP deployment + first RSSI-Δ attempt (this ADR supersedes the threshold table in ADR-110, D3 — the RSSI MAD-Δ detector is left in place but no longer the primary signal).
  • Francesco Pace, How I Turned My Wi-Fi Into a Motion Sensor — Part 2, Dec 2025 — source of the gain-lock recipe.
  • francescopace/espectre, components/espectre/gain_controller.{h,cpp} on GitHub — reference implementation (GPLv3).