wifi-densepose/docs/adr/ADR-106-full-complex-csi-ke...

6.8 KiB
Raw Blame History

ADR-106 — Full Complex CSI in WS + Managed-Ping Keepalive

Status: Accepted Date: 2026-05-17 Scope: v2/crates/wifi-densepose-sensing-server/src/main.rs (NodeInfo struct, NodeState, udp_receiver_task, csi_keepalive_task, CLI --csi-keepalive-pps).

Context

The operator's instruction: "work without a model for now, but make sure the sensors give us everything described in the parent repo so the future model — and fine-motion detection right now — has full signal." Two gaps stood between the live deployment and that goal:

  1. WS NodeInfo carried only amplitude. The 56-bin per-subcarrier amplitude vector was exposed, but the equally-important phases vector (radians, atan2(Q, I)) was parsed by parse_esp32_frame and then silently dropped. Vital-signs FFT on phase, MERIDIAN-style hardware normalization, and any future DensePose-class model expect the full complex H[k] = A_k · e^{jφ_k}.
  2. Raw CSI rate depended on an ad-hoc shell ping. With nothing sending unicast traffic to the sensors, beacon-only rate dropped to ~0.3 fps — too slow even for breathing-band FFT. The operator was running ping -i 0.05 192.168.0.101 & by hand; if Mac switched network, it died.

Decisions

D1 — Expose phases + noise_floor + n_antennas + µs timestamp in NodeInfo

Four new fields, each #[serde(skip_serializing_if = empty/zero)] so feature_state ticks (no raw CSI) stay slim:

phases:           Vec<f64>,    // atan2(Q, I), radians
n_antennas:       u8,          // RX antenna count
noise_floor_dbm:  i8,          // RX noise floor
timestamp_us:     u64,         // sensor-side µs timestamp

This is the same data we already parse out of 0xC511_0001 frames in parse_esp32_frame; previously we threw phases away and never even surfaced noise_floor to the WS envelope. Consumers reconstruct the complex CSI with H[k] = amplitude[k] · (cos(phases[k]) + j·sin(phases[k])).

D2 — Per-node stash on NodeState

NodeState gains four new fields: latest_phases: Option<Vec<f64>>, latest_noise_floor: i8, latest_timestamp_us: u64, latest_n_antennas: u8. Populated on every raw-CSI frame in the second raw-CSI path (udp_receiver_task → raw CSI branch). build_node_features and the raw-CSI SensingUpdate builder both read from this stash to populate the new NodeInfo fields uniformly. Avoids carrying a full per-subcarrier phase history buffer — we only need the most recent vector for the UI / classifier; FFT consumers can build their own window.

D3 — Built-in keepalive via managed ping children

csi_keepalive_task async task:

  1. Watches NODE_ADDRS (per-node sender address, populated on every recv_from via a cheap magic-byte peek).
  2. For each known node, spawns one ping -i <interval> <ip> child process (/sbin/ping on macOS, /usr/bin/ping on Linux).
  3. Re-spawns the child if it dies or if the sensor's IP changes (DHCP rotation).
  4. Default rate --csi-keepalive-pps 25-i 0.040 for ping. --csi-keepalive-pps 0 disables.

D4 — Why ICMP, not UDP

We first tried a UDP-based keepalive (sock.send_to(&[0], src_addr) to the sensor's ephemeral source port). On the operator's deployment (ESP32-S3 + TP-Link WISP) it did not drive raw CSI: the sensor's UDP stack rejected the closed-port packet before the CSI callback fired in the WiFi RX path. ICMP echo bypasses user-space port logic entirely — kernel WiFi RX handles it and the CSI callback fires regardless of any listener.

Trade-off accepted: shelling out to /sbin/ping is platform- specific. Linux containers must include iputils-ping; macOS has /sbin/ping built-in. We probe both paths at startup. A pure-Rust raw-socket ICMP would avoid the dependency but needs root / CAP_NET_RAW.

Files Touched

v2/crates/wifi-densepose-sensing-server/src/main.rs
  - struct NodeInfo            (+4 fields, helpers is_zero_*)
  - struct NodeState           (+4 latest_* fields)
  - static NODE_ADDRS          (per-node source address map)
  - fn csi_keepalive_task      (managed ping pool)
  - udp_receiver_task          (NODE_ADDRS populate via magic peek)
  - all NodeInfo {...} sites   (5 — populate new fields)
  - Args { csi_keepalive_pps } (CLI flag, default 25)
docs/adr/ADR-106-full-complex-csi-keepalive.md   (this)

Two implementation commits on the branch:

  • 4daa2c9b — D1 + D2 (WS struct, per-node stash, NodeInfo builders)
  • 8489efe9 — D3 + D4 (keepalive task, NODE_ADDRS, CLI flag)

Verified Acceptance

Live, server fresh-restart, no shell ping running:

boot:    CSI keepalive: 25 ICMP pkt/s/node (interval 0.040s)
boot:    keepalive: learned address for node 1 = 192.168.0.101:60492
boot:    keepalive: learned address for node 2 = 192.168.0.100:51664
+2 s:    keepalive: ping -i 0.040 192.168.0.101 for node 1
+2 s:    keepalive: ping -i 0.040 192.168.0.100 for node 2

WS sample (5 s):
  node 1: 67.6 Hz updates, 55.6 Hz amp-bearing raw CSI
  node 2: 67.6 Hz updates, 55.6 Hz amp-bearing raw CSI

NodeInfo per node now carries amplitude[56], phases[56], rssi_dbm, noise_floor_dbm=-91, n_antennas=1, plus the empty/zero-suppressed timestamp_us (FW doesn't yet emit it — left as a 0 placeholder).

Sampling rate 55 Hz comfortably covers breathing band (0.10.5 Hz) and heart-rate band (0.82 Hz) for FFT; with the phase vector now on the wire, those FFTs can run on phase as well as amplitude, which is more sensitive to chest-wall micrometric motion.

Out of scope / open

  • FW-side µs timestamp — closed in commit b787f40a. FW now appends info->rx_ctrl.timestamp (u32 LE) as 4 trailing bytes after I/Q data; server parses opportunistically (None for older FW). NodeInfo.timestamp_us now carries sensor monotonic µs when available, falls back to server SystemTime otherwise.
  • Per-frame antenna selection when ESP32-S3 reports >1 antenna — current FW hard-codes n_antennas=1 in csi_collector.c. Single- antenna deployments are unaffected.
  • TP-Link queue limits — at 55 Hz × 2 nodes = 110 raw frames/s, plus 25 pings/s × 2 = 50 ICMP/s, all going through one consumer- grade AP. Watching for saturation. Reduce --csi-keepalive-pps if the AP starts dropping.
  • Channel hopping (ADR-029) would give frequency diversity. Single- channel works fine for one room.

References

  • ADR-100 — gain lock (the stability baseline keepalive needs).
  • ADR-101 — classifier (consumes phase via per-node amplitudes; future micro-motion detector will pull phase too).
  • ADR-103 — persistent baseline (loaded at server boot, unaffected by keepalive rate).
  • ADR-105 — no synthetic data (this ADR adds more real data, not more synthetic).
  • docs/references/espectre-gap-analysis.md — phase-aware processing is a prerequisite for several open items.