9.7 KiB
ADR-116 — WiFlow-v1 Supervised Pose Loader (Rust)
Status: Accepted (integration), needs fine-tune (output quality)
Date: 2026-05-17
Scope: v2/crates/wifi-densepose-sensing-server/src/wiflow_v1.rs (new,
~430 lines incl. tests), src/main.rs (CLI flag + load + 5 tick-site hooks +
pose_current keypoint path), src/lib.rs (module export).
Context
Until this ADR /api/v1/pose/* always returned an empty persons array
(ADR-105 — no synthetic fallback when no real model is loaded). HuggingFace
ruv/ruview/wiflow-v1/wiflow-v1.json is the project's official supervised
pose model (Apache-2.0, 974 KB, 92.9 % PCK@20 on its training set). It just
sat on disk because there was no Rust loader — the only reference impl is
scripts/train-wiflow-supervised.js (JS, training script, not deployment).
This ADR ports the JS inference path to Rust so sensing-server can serve real 17-keypoint COCO skeletons in production.
What was wrong in the model file (and how this ADR works around it)
The HuggingFace JSON has an architecture field that lies:
"architecture": {
"tcnChannels": [35, 256, 256, 192, 128],
"tcnKernel": 7,
"tcnDilations": [1, 2, 4, 8],
"fcDims": [2560, 2048, 34]
}
That's the full scale (~7.7 M params). The file is actually the lite
scale (186,946 params — confirmed by totalParams field). The exporter at
train-wiflow-supervised.js:1599 hardcodes the full-scale dict for every
scale. The loader trusts totalParams and ignores architecture.
Lite topology (recovered from SCALE.lite at train-wiflow-supervised.js:135
and verified by exact param count = 186,946):
- 2 TCN blocks (NOT 4), kernel = 3 (NOT 7), dilations [1, 2] (NOT [1,2,4,8])
- TCN channels: 35 → 32 → 32
- Per block: causal_conv → BN → ReLU → causal_conv → BN + residual → ReLU (1×1 projection on residual when in_ch ≠ out_ch, only block 0)
- Flatten 32 × 20 = 640 → fc1 (640→256) → ReLU → fc2 (256→34)
- Sigmoid on final 34-dim → 17 (x, y) keypoints in [0, 1]
Decisions
D1 — Pure-Rust forward pass, no new crates
wiflow_v1.rs is self-contained: Vec math by hand, inline base64
decoder (50 LoC), no ndarray, no candle, no base64 crate added. The
inference is small enough (~250 K flops/forward) that hand-written Vec
loops are clearer than pulling a tensor framework for one model.
D2 — Weight stream order matches collectParams() in the JS trainer
for each TCN block:
conv1.weight (in_ch * k * out_ch f32s)
conv1.bias (out_ch)
bn1.gamma (out_ch)
bn1.beta (out_ch)
conv2.weight, conv2.bias, bn2.gamma, bn2.beta
(if in_ch != out_ch: res.weight, res.bias)
fc1.weight, fc1.bias, fc2.weight, fc2.bias
Loader asserts the stream is fully consumed (Cursor::remaining() == 0)
after fc2 — catches silent topology mismatches. Param count check
(totalParams == 186_946) catches scale mismatch before unpacking.
D3 — BatchNorm uses per-window mean/var (matches JS impl)
train-wiflow-supervised.js:770 computes mean/var across the T axis at
inference time, ignoring runMean/runVar accumulated during training.
Loader skips running stats entirely (only 2 params per channel stored:
gamma + beta). This is unusual but consistent — the network was trained
this way, so we infer this way.
D4 — Input prep: top-35 subcarriers by NBVI, raw amplitudes
build_input_from_history (in wiflow_v1.rs):
- Take last 20 frames from any node's
AmpState.nbvi_history(Vec<Vec>). - Rank subcarriers by NBVI score (
α·σ/μ² + (1−α)·σ/μ, α = 0.5) — same formula the classifier uses, but pick K = 35 (model input), not K = 12 (classifier). - Apply 25th-percentile dead-zone gate to skip guard tones / null bins.
- Build flat
[35 * 20]row-major tensor of raw amplitudes (no z-score — training data wasn't normalised either, BN handles it).
If fewer than 20 frames or all subcarriers gated out → return None,
inference skipped this tick, pose_keypoints: None in SensingUpdate.
D5 — Per-tick inference, longest-history node
run_wiflow_inference() at every broadcast_tick_task step (5 sites total
in main.rs):
- Picks the node with longest
nbvi_history(ties broken by smallest node_id — deterministic). - Cost: ~250 K flops on the lite scale (BN + 2 small convs + 2 FCs). Measured 0.4 ms on the Mac M1 — well under the 100 ms tick budget.
- Returns
Vec<[f64; 4]>of length 17 ([x, y, z=0, conf=1]).
D6 — pose_current reads pose_keypoints directly
Pre-ADR: /api/v1/pose/current read latest_update.persons. The tracker
populated persons from derive_pose_from_sensing (signal-derived,
synthetic) regardless of model_loaded. Loader-output pose_keypoints
was only read by the WS broadcaster.
This ADR makes pose_current prefer pose_keypoints when 17-len and
present, building a single PersonDetection with COCO joint names. Falls
back to tracker persons only when pose_keypoints is None (cold
start). Keeps the ADR-105 honesty gate: empty array if model_loaded = false.
D7 — Honest about output quality
The loaded model produces 17 keypoints, but the numerical values are saturated (most x/y near 0 or 1) — sigmoid extremes meaning the network has no learned response to our specific deployment's CSI distribution. This is expected: the model was trained on a different ESP32 setup, different room, different person, with camera ground truth we don't have here. The integration is correct; the model needs deployment-specific fine-tune to produce useful keypoints.
Two paths to usable output, left as follow-ups (Pack E):
- Apply
node-1.json/node-2.jsonLoRA adapters (ADR-117 candidate) — they're shipped alongsidewiflow-v1.jsonin the same HuggingFace repo, rank=8, alpha=16, target the encoder + task heads. Loader stub + forward fold ~2 h. - Re-train via
scripts/train-wiflow-supervised.jswith new ground- truth capture (~30 min capture + 19 min training per the model card). Operator-side work.
Files Touched
v2/crates/wifi-densepose-sensing-server/src/wiflow_v1.rs (new, ~430 LoC)
v2/crates/wifi-densepose-sensing-server/src/lib.rs (+ pub mod)
v2/crates/wifi-densepose-sensing-server/src/main.rs:
+ use wiflow_v1::{self, WiflowModel}
+ Args.wiflow_model: Option<PathBuf>
+ static WIFLOW_MODEL: OnceLock<Option<WiflowModel>>
+ main() — load before existing --model/--load-rvf path
+ fn run_wiflow_inference() -> Option<Vec<[f64;4]>> (right after csi_keepalive_task)
+ 5 × `pose_keypoints: run_wiflow_inference()` at SensingUpdate sites
+ pose_current — prefer pose_keypoints when 17-len; fall back to persons
docs/adr/ADR-116-wiflow-v1-supervised-pose-loader.md (this)
Binary size delta: 3.0 MB → 3.1 MB.
Verified Acceptance
Live test on the operator's TP-Link deployment (.103, both nodes 192.168.0.100/.101):
$ ./target/release/sensing-server --source esp32 --csi-keepalive-pps 25 \
--wiflow-model data/models/ruview/wiflow-v1/wiflow-v1.json
...
ADR-116 wiflow-v1 loaded from data/models/ruview/wiflow-v1/wiflow-v1.json
(lite scale, 186946 params)
keepalive: learned address for node 2 = 192.168.0.100:63940
keepalive: learned address for node 1 = 192.168.0.101:63844
$ curl :8080/api/v1/info → "pose_estimation": true
$ curl :8080/api/v1/pose/stats → "model_loaded": true, frames_processed: 2699
$ curl :8080/api/v1/pose/current
{ persons: [{id: 1, keypoints: [17 × {name, x, y, z, confidence}], ...}],
total_persons: 1, model_loaded: true }
End-to-end: model on disk → loader → forward pass → 17 keypoints → REST & WS payload. UI's pose canvas (un-gated by ADR-105 D4) now draws what the model emits.
Cargo tests
wiflow_v1 ships 3 unit tests covering the most-likely-to-rot bits:
base64_round_trip_alphabet— alphabet, padding, whitespace tolerancesigmoid_bounds— numerical stability at ±10 inputsbuild_input_zero_history— empty-history early return
cargo test -p wifi-densepose-sensing-server wiflow_v1 → 3 passed.
Open Items
- Pack E.1 — LoRA adapter loader.
node-1.json/node-2.jsonrank-8 adapters from the same HF repo, ~21 KB each. The trainer encodes them in the same custom format aswiflow-v1.json(differentformattag), so the loader plumbing is small. ~2 h. - Pack E.2 — Camera-supervised retraining for this room. Run
scripts/collect-ground-truth.pyagainst this Mac's webcam + TP-Link/.100/.101 CSI for 5 min, thenscripts/train-wiflow- supervised.js --scale lite. Should drop sigmoid saturation and produce spatially-coherent keypoints. ~1 h operator + 19 min train. - Inference rate-limiting. Currently runs every tick (10 fps). If multiple WS clients connect, each tick computes once and the result is reused — fine. If model size grows to small/medium scale (~200K/800K params), should cache the result per tick instead of computing per-client.
- Per-node pose tracks. Right now a single virtual person is emitted;
the broadcaster places it in
zone_1with a fixed bbox. If/when LoRA adapters disambiguate per-node viewpoints, fan out to onePersonDetectionper node (left/right of the room).
References
scripts/train-wiflow-supervised.js— JS reference implementation- HuggingFace
ruv/ruview— model file + LoRA adapters (Apache-2.0) - ADR-079 — camera ground-truth training pipeline (the trainer this loader was built against)
- ADR-105 — "no synthetic data in production runtime"; this ADR keeps the gate but feeds it real model output
- ADR-115 —
/ota/set-target(the prerequisite that got the CSI stream flowing again so this loader has data to consume)