docs: ADR-106 — full complex CSI in WS + managed-ping keepalive
Records the two-part change that gets the maximum raw signal off the
sensors so the future model — and current fine-motion detection —
has everything the parent project describes:
D1 NodeInfo exposes phases[56], n_antennas, noise_floor_dbm,
timestamp_us in the WS payload (was amplitude-only).
D2 NodeState stashes latest phases/noise/timestamp/antenna count
so build_node_features can populate the new fields uniformly
without a parallel phase_history buffer.
D3 csi_keepalive_task spawns managed `ping` children per
discovered sensor address; replaces the operator's hand-run
`ping -i 0.05 …` workflow. CLI --csi-keepalive-pps controls
rate (default 25), 0 disables.
D4 Why ICMP not UDP: sensor rejects closed-port UDP before its
CSI callback fires; ICMP is handled in WiFi RX path regardless.
Verified: 55.6 Hz raw CSI per node with no shell ping; both
amplitude[56] and phases[56] populated; noise_floor=-91 dBm.
Two impl commits already on the branch: 4daa2c9b, 8489efe9.
This commit is contained in:
parent
8489efe9ae
commit
c6208621b5
|
|
@ -0,0 +1,160 @@
|
|||
# ADR-106 — Full Complex CSI in WS + Managed-Ping Keepalive
|
||||
|
||||
**Status**: Accepted
|
||||
**Date**: 2026-05-17
|
||||
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/main.rs`
|
||||
(`NodeInfo` struct, `NodeState`, `udp_receiver_task`,
|
||||
`csi_keepalive_task`, CLI `--csi-keepalive-pps`).
|
||||
|
||||
## Context
|
||||
|
||||
The operator's instruction: *"work without a model for now, but make
|
||||
sure the sensors give us everything described in the parent repo so
|
||||
the future model — and fine-motion detection right now — has full
|
||||
signal."* Two gaps stood between the live deployment and that goal:
|
||||
|
||||
1. **WS NodeInfo carried only amplitude.** The 56-bin per-subcarrier
|
||||
`amplitude` vector was exposed, but the equally-important
|
||||
`phases` vector (radians, `atan2(Q, I)`) was parsed by
|
||||
`parse_esp32_frame` and then silently dropped. Vital-signs FFT on
|
||||
phase, MERIDIAN-style hardware normalization, and any future
|
||||
DensePose-class model expect the full complex `H[k] = A_k · e^{jφ_k}`.
|
||||
2. **Raw CSI rate depended on an ad-hoc shell `ping`.** With nothing
|
||||
sending unicast traffic to the sensors, beacon-only rate dropped
|
||||
to ~0.3 fps — too slow even for breathing-band FFT. The operator
|
||||
was running `ping -i 0.05 192.168.0.101 &` by hand; if Mac switched
|
||||
network, it died.
|
||||
|
||||
## Decisions
|
||||
|
||||
### D1 — Expose phases + noise_floor + n_antennas + µs timestamp in `NodeInfo`
|
||||
|
||||
Four new fields, each `#[serde(skip_serializing_if = empty/zero)]` so
|
||||
feature_state ticks (no raw CSI) stay slim:
|
||||
|
||||
```rust
|
||||
phases: Vec<f64>, // atan2(Q, I), radians
|
||||
n_antennas: u8, // RX antenna count
|
||||
noise_floor_dbm: i8, // RX noise floor
|
||||
timestamp_us: u64, // sensor-side µs timestamp
|
||||
```
|
||||
|
||||
This is the same data we already parse out of `0xC511_0001` frames
|
||||
in `parse_esp32_frame`; previously we threw `phases` away and never
|
||||
even surfaced `noise_floor` to the WS envelope. Consumers
|
||||
reconstruct the complex CSI with `H[k] = amplitude[k] · (cos(phases[k]) + j·sin(phases[k]))`.
|
||||
|
||||
### D2 — Per-node stash on `NodeState`
|
||||
|
||||
`NodeState` gains four new fields:
|
||||
`latest_phases: Option<Vec<f64>>`, `latest_noise_floor: i8`,
|
||||
`latest_timestamp_us: u64`, `latest_n_antennas: u8`. Populated on
|
||||
every raw-CSI frame in the second raw-CSI path
|
||||
(`udp_receiver_task` → raw CSI branch). `build_node_features` and
|
||||
the raw-CSI SensingUpdate builder both read from this stash to
|
||||
populate the new `NodeInfo` fields uniformly. Avoids carrying a
|
||||
full per-subcarrier phase history buffer — we only need the most
|
||||
recent vector for the UI / classifier; FFT consumers can build their
|
||||
own window.
|
||||
|
||||
### D3 — Built-in keepalive via managed `ping` children
|
||||
|
||||
`csi_keepalive_task` async task:
|
||||
|
||||
1. Watches `NODE_ADDRS` (per-node sender address, populated on every
|
||||
recv_from via a cheap magic-byte peek).
|
||||
2. For each known node, spawns one `ping -i <interval> <ip>` child
|
||||
process (`/sbin/ping` on macOS, `/usr/bin/ping` on Linux).
|
||||
3. Re-spawns the child if it dies or if the sensor's IP changes
|
||||
(DHCP rotation).
|
||||
4. Default rate `--csi-keepalive-pps 25` → `-i 0.040` for `ping`.
|
||||
`--csi-keepalive-pps 0` disables.
|
||||
|
||||
### D4 — Why ICMP, not UDP
|
||||
|
||||
We first tried a UDP-based keepalive (`sock.send_to(&[0], src_addr)`
|
||||
to the sensor's ephemeral source port). On the operator's deployment
|
||||
(ESP32-S3 + TP-Link WISP) it did **not** drive raw CSI: the sensor's
|
||||
UDP stack rejected the closed-port packet before the CSI callback
|
||||
fired in the WiFi RX path. ICMP echo bypasses user-space port logic
|
||||
entirely — kernel WiFi RX handles it and the CSI callback fires
|
||||
regardless of any listener.
|
||||
|
||||
Trade-off accepted: shelling out to `/sbin/ping` is platform-
|
||||
specific. Linux containers must include `iputils-ping`; macOS has
|
||||
`/sbin/ping` built-in. We probe both paths at startup. A pure-Rust
|
||||
raw-socket ICMP would avoid the dependency but needs root /
|
||||
`CAP_NET_RAW`.
|
||||
|
||||
## Files Touched
|
||||
|
||||
```
|
||||
v2/crates/wifi-densepose-sensing-server/src/main.rs
|
||||
- struct NodeInfo (+4 fields, helpers is_zero_*)
|
||||
- struct NodeState (+4 latest_* fields)
|
||||
- static NODE_ADDRS (per-node source address map)
|
||||
- fn csi_keepalive_task (managed ping pool)
|
||||
- udp_receiver_task (NODE_ADDRS populate via magic peek)
|
||||
- all NodeInfo {...} sites (5 — populate new fields)
|
||||
- Args { csi_keepalive_pps } (CLI flag, default 25)
|
||||
docs/adr/ADR-106-full-complex-csi-keepalive.md (this)
|
||||
```
|
||||
|
||||
Two implementation commits on the branch:
|
||||
|
||||
* `4daa2c9b` — D1 + D2 (WS struct, per-node stash, NodeInfo builders)
|
||||
* `8489efe9` — D3 + D4 (keepalive task, NODE_ADDRS, CLI flag)
|
||||
|
||||
## Verified Acceptance
|
||||
|
||||
Live, server fresh-restart, no shell `ping` running:
|
||||
|
||||
```
|
||||
boot: CSI keepalive: 25 ICMP pkt/s/node (interval 0.040s)
|
||||
boot: keepalive: learned address for node 1 = 192.168.0.101:60492
|
||||
boot: keepalive: learned address for node 2 = 192.168.0.100:51664
|
||||
+2 s: keepalive: ping -i 0.040 192.168.0.101 for node 1
|
||||
+2 s: keepalive: ping -i 0.040 192.168.0.100 for node 2
|
||||
|
||||
WS sample (5 s):
|
||||
node 1: 67.6 Hz updates, 55.6 Hz amp-bearing raw CSI
|
||||
node 2: 67.6 Hz updates, 55.6 Hz amp-bearing raw CSI
|
||||
```
|
||||
|
||||
NodeInfo per node now carries `amplitude[56]`, `phases[56]`,
|
||||
`rssi_dbm`, `noise_floor_dbm=-91`, `n_antennas=1`, plus the
|
||||
empty/zero-suppressed `timestamp_us` (FW doesn't yet emit it —
|
||||
left as a 0 placeholder).
|
||||
|
||||
Sampling rate 55 Hz comfortably covers breathing band (0.1–0.5 Hz)
|
||||
and heart-rate band (0.8–2 Hz) for FFT; with the phase vector now
|
||||
on the wire, those FFTs can run on phase as well as amplitude,
|
||||
which is more sensitive to chest-wall micrometric motion.
|
||||
|
||||
## Out of scope / open
|
||||
|
||||
* **FW-side µs timestamp** — `info->rx_ctrl.timestamp` (u32, µs) is
|
||||
in `wifi_pkt_rx_ctrl_t` per ESP-IDF docs. Not yet propagated through
|
||||
the 0xC511_0001 binary header (reserved bytes 18..19 are available
|
||||
but unused). Future ADR-107 will reshape the header to include it.
|
||||
* **Per-frame antenna selection** when ESP32-S3 reports >1 antenna —
|
||||
current FW hard-codes `n_antennas=1` in `csi_collector.c`. Single-
|
||||
antenna deployments are unaffected.
|
||||
* **TP-Link queue limits** — at 55 Hz × 2 nodes = 110 raw frames/s,
|
||||
plus 25 pings/s × 2 = 50 ICMP/s, all going through one consumer-
|
||||
grade AP. Watching for saturation. Reduce `--csi-keepalive-pps` if
|
||||
the AP starts dropping.
|
||||
* **Channel hopping** (ADR-029) would give frequency diversity. Single-
|
||||
channel works fine for one room.
|
||||
|
||||
## References
|
||||
|
||||
* ADR-100 — gain lock (the stability baseline keepalive needs).
|
||||
* ADR-101 — classifier (consumes phase via per-node amplitudes; future
|
||||
micro-motion detector will pull phase too).
|
||||
* ADR-103 — persistent baseline (loaded at server boot, unaffected
|
||||
by keepalive rate).
|
||||
* ADR-105 — no synthetic data (this ADR adds *more* real data, not
|
||||
more synthetic).
|
||||
* [`docs/references/espectre-gap-analysis.md`](../references/espectre-gap-analysis.md)
|
||||
— phase-aware processing is a prerequisite for several open items.
|
||||
Loading…
Reference in New Issue