ADR-116 — WiFlow-v1 Supervised Pose Loader (Rust)

Status: Accepted (integration), needs fine-tune (output quality)
Date: 2026-05-17
Scope: v2/crates/wifi-densepose-sensing-server/src/wiflow_v1.rs (new, ~430 lines incl. tests), src/main.rs (CLI flag + load + 5 tick-site hooks + pose_current keypoint path), src/lib.rs (module export).

Context

Until this ADR, /api/v1/pose/* always returned an empty persons array (ADR-105: no synthetic fallback when no real model is loaded). HuggingFace ruv/ruview/wiflow-v1/wiflow-v1.json is the project's official supervised pose model (Apache-2.0, 974 KB, 92.9 % PCK@20 on its training set). It sat unused on disk because there was no Rust loader; the only reference implementation is scripts/train-wiflow-supervised.js, a JS training script that is not meant for deployment.

This ADR ports the JS inference path to Rust so sensing-server can serve real 17-keypoint COCO skeletons in production.

What was wrong in the model file (and how this ADR works around it)

The HuggingFace JSON has an architecture field that lies:

"architecture": {
  "tcnChannels": [35, 256, 256, 192, 128],
  "tcnKernel": 7,
  "tcnDilations": [1, 2, 4, 8],
  "fcDims": [2560, 2048, 34]
}

That describes the full scale (~7.7 M params), but the file actually holds the lite scale (186,946 params, confirmed by the totalParams field). The exporter at train-wiflow-supervised.js:1599 hardcodes the full-scale dict for every scale, so the loader trusts totalParams and ignores architecture.

Lite topology (recovered from SCALE.lite at train-wiflow-supervised.js:135 and verified by exact param count = 186,946):

  • 2 TCN blocks (NOT 4), kernel = 3 (NOT 7), dilations [1, 2] (NOT [1,2,4,8])
  • TCN channels: 35 → 32 → 32
  • Per block: causal_conv → BN → ReLU → causal_conv → BN + residual → ReLU (1×1 projection on residual when in_ch ≠ out_ch, only block 0)
  • Flatten 32 × 20 = 640 → fc1 (640→256) → ReLU → fc2 (256→34)
  • Sigmoid on final 34-dim → 17 (x, y) keypoints in [0, 1]
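The parameter count can be re-derived from the lite topology listed above. The following is a minimal sketch, not code from wiflow_v1.rs: the function names are hypothetical, and it assumes each convolution and FC layer stores weight + bias and each BatchNorm stores gamma + beta, as described in D2 and D3.

```rust
/// Parameters in one TCN block: two convolutions (weight + bias),
/// two BatchNorms (gamma + beta), and a 1x1 residual projection
/// only when the channel count changes.
fn tcn_block_params(in_ch: usize, out_ch: usize, kernel: usize) -> usize {
    let conv1 = in_ch * kernel * out_ch + out_ch;
    let conv2 = out_ch * kernel * out_ch + out_ch;
    let bn = 2 * (2 * out_ch); // gamma + beta for bn1 and bn2
    let residual = if in_ch != out_ch { in_ch * out_ch + out_ch } else { 0 };
    conv1 + conv2 + bn + residual
}

/// Parameters in one fully connected layer (weight + bias).
fn fc_params(in_dim: usize, out_dim: usize) -> usize {
    in_dim * out_dim + out_dim
}

/// Total for the lite scale: channels 35 -> 32 -> 32, kernel 3,
/// 20 time steps, then fc1 (640 -> 256) and fc2 (256 -> 34).
fn lite_total_params() -> usize {
    let channels = [35usize, 32, 32];
    let kernel = 3;
    let time_steps = 20;
    let tcn: usize = channels
        .windows(2)
        .map(|w| tcn_block_params(w[0], w[1], kernel))
        .sum();
    let flat = channels[channels.len() - 1] * time_steps; // 32 * 20 = 640
    tcn + fc_params(flat, 256) + fc_params(256, 34)
}
```

Block 0 contributes 7,776 parameters (including the residual projection), block 1 contributes 6,336, fc1 164,096 and fc2 8,738, which sums to 186,946.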

Decisions

D1 — Pure-Rust forward pass, no new crates

wiflow_v1.rs is self-contained: Vec math by hand, inline base64 decoder (50 LoC), no ndarray, no candle, no base64 crate added. The inference is small enough (~250 K flops/forward) that hand-written Vec loops are clearer than pulling a tensor framework for one model.
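To illustrate what "Vec math by hand" can look like, here is a minimal sketch of a causal dilated 1-D convolution over plain Vecs. It is not the code in wiflow_v1.rs: the function name and the nested [out_ch][in_ch][k] weight indexing are assumptions chosen for readability, and the real loader's flat weight layout may differ.

```rust
/// Causal dilated 1-D convolution over a [channel][time] signal.
/// ASSUMPTION: weights are indexed [out_ch][in_ch][k]; this is an
/// illustrative layout, not necessarily the one used by the loader.
fn causal_conv1d(
    input: &[Vec<f32>],
    weight: &[Vec<Vec<f32>>],
    bias: &[f32],
    dilation: usize,
) -> Vec<Vec<f32>> {
    let t_len = input.first().map_or(0, |row| row.len());
    let mut out = vec![vec![0.0f32; t_len]; weight.len()];
    for (o, w_o) in weight.iter().enumerate() {
        for t in 0..t_len {
            let mut acc = bias[o];
            for (i, w_oi) in w_o.iter().enumerate() {
                let k_len = w_oi.len();
                for (k, &w) in w_oi.iter().enumerate() {
                    // Tap k looks back (k_len - 1 - k) * dilation steps.
                    // Taps that fall before t = 0 act as left zero padding,
                    // so output[t] never depends on input after time t.
                    let back = (k_len - 1 - k) * dilation;
                    if t >= back {
                        acc += w * input[i][t - back];
                    }
                }
            }
            out[o][t] = acc;
        }
    }
    out
}
```

With an all-ones kernel of size 3 and input [1, 2, 3, 4], dilation 1 yields a running sum over the last three samples, while dilation 2 sums samples at t, t − 2 and t − 4.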

D2 — Weight stream order matches collectParams() in the JS trainer

for each TCN block:
  conv1.weight (in_ch * k * out_ch f32s)
  conv1.bias   (out_ch)
  bn1.gamma    (out_ch)
  bn1.beta     (out_ch)
  conv2.weight, conv2.bias, bn2.gamma, bn2.beta
  (if in_ch != out_ch: res.weight, res.bias)
fc1.weight, fc1.bias, fc2.weight, fc2.bias

Loader asserts the stream is fully consumed (Cursor::remaining() == 0) after fc2 — catches silent topology mismatches. Param count check (totalParams == 186_946) catches scale mismatch before unpacking.
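The "fully consumed" check can be sketched as follows. Only Cursor::remaining() is named in this ADR; the take method, the FcLayer type and load_single_fc are hypothetical helpers used here to show the pattern on a single FC layer rather than the whole network.

```rust
/// Read-only cursor over a decoded f32 weight stream.
struct Cursor<'a> {
    data: &'a [f32],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn new(data: &'a [f32]) -> Self {
        Cursor { data, pos: 0 }
    }

    /// Number of f32 values not yet consumed.
    fn remaining(&self) -> usize {
        self.data.len() - self.pos
    }

    /// Hypothetical helper: take the next `n` values or fail if the stream is too short.
    fn take(&mut self, n: usize) -> Result<&'a [f32], String> {
        if n > self.remaining() {
            return Err(format!(
                "weight stream too short: need {} f32s, have {}",
                n,
                self.remaining()
            ));
        }
        let data: &'a [f32] = self.data;
        let slice = &data[self.pos..self.pos + n];
        self.pos += n;
        Ok(slice)
    }
}

#[derive(Debug)]
struct FcLayer {
    weight: Vec<f32>,
    bias: Vec<f32>,
}

/// Unpack one FC layer and reject streams with leftover values,
/// which would indicate a topology mismatch.
fn load_single_fc(stream: &[f32], in_dim: usize, out_dim: usize) -> Result<FcLayer, String> {
    let mut cur = Cursor::new(stream);
    let weight = cur.take(in_dim * out_dim)?.to_vec();
    let bias = cur.take(out_dim)?.to_vec();
    if cur.remaining() != 0 {
        return Err(format!("{} unread f32s after last layer", cur.remaining()));
    }
    Ok(FcLayer { weight, bias })
}
```

A stream that is one value too long or too short is rejected in both cases, so a wrong kernel size or block count cannot silently load.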

D3 — BatchNorm uses per-window mean/var (matches JS impl)

train-wiflow-supervised.js:770 computes mean/var across the T axis at inference time, ignoring runMean/runVar accumulated during training. Loader skips running stats entirely (only 2 params per channel stored: gamma + beta). This is unusual but consistent — the network was trained this way, so we infer this way.
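A minimal sketch of per-window BatchNorm, assuming a [channel][time] layout, a biased (divide-by-T) variance and a caller-supplied epsilon. The function name, the variance convention and the epsilon value are assumptions; the ADR only states that mean/var are taken across the T axis and that gamma and beta are the only stored parameters.

```rust
/// Normalise each channel with the mean/variance of the current window,
/// then apply the learned scale (gamma) and shift (beta).
/// ASSUMPTIONS: biased variance (divide by T) and an explicit `eps`.
fn batch_norm_per_window(x: &mut [Vec<f32>], gamma: &[f32], beta: &[f32], eps: f32) {
    for (c, channel) in x.iter_mut().enumerate() {
        if channel.is_empty() {
            continue;
        }
        let t = channel.len() as f32;
        let mean = channel.iter().sum::<f32>() / t;
        let var = channel.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / t;
        let inv_std = 1.0 / (var + eps).sqrt();
        for v in channel.iter_mut() {
            *v = gamma[c] * (*v - mean) * inv_std + beta[c];
        }
    }
}
```

One consequence of this scheme is that a channel that is constant within the window collapses to beta, regardless of its absolute level.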

D4 — Input prep: top-35 subcarriers by NBVI, raw amplitudes

build_input_from_history (in wiflow_v1.rs):

  1. Take last 20 frames from any node's AmpState.nbvi_history (Vec<Vec>).
  2. Rank subcarriers by NBVI score (α·σ/μ² + (1 − α)·σ/μ, α = 0.5), the same formula the classifier uses, but pick K = 35 (model input), not K = 12 (classifier).
  3. Apply 25th-percentile dead-zone gate to skip guard tones / null bins.
  4. Build flat [35 * 20] row-major tensor of raw amplitudes (no z-score — training data wasn't normalised either, BN handles it).

If fewer than 20 frames are available or every subcarrier is gated out, build_input_from_history returns None: inference is skipped for this tick and SensingUpdate carries pose_keypoints: None.
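The scoring and selection steps can be sketched as below. Several details are assumptions not stated in this ADR: that lower NBVI ranks first, that the dead-zone gate drops subcarriers whose mean amplitude is below the 25th percentile of all means, that history is a rectangular [frame][subcarrier] matrix, and both function names.

```rust
/// NBVI score for one subcarrier's amplitude series:
/// alpha * sigma / mu^2 + (1 - alpha) * sigma / mu.
/// Returns None for an empty series or a non-positive mean.
fn nbvi_score(series: &[f64], alpha: f64) -> Option<f64> {
    if series.is_empty() {
        return None;
    }
    let n = series.len() as f64;
    let mu = series.iter().sum::<f64>() / n;
    if mu <= 0.0 {
        return None;
    }
    let sigma = (series.iter().map(|a| (a - mu).powi(2)).sum::<f64>() / n).sqrt();
    Some(alpha * sigma / (mu * mu) + (1.0 - alpha) * sigma / mu)
}

/// Pick up to `k` subcarrier indices from a [frame][subcarrier] history.
/// ASSUMPTIONS: gate on mean amplitude at the 25th percentile,
/// then rank ascending by NBVI (lower score ranks first).
fn select_subcarriers(history: &[Vec<f64>], k: usize, alpha: f64) -> Vec<usize> {
    let n_sub = history.first().map_or(0, |frame| frame.len());
    let mut stats: Vec<(usize, f64, f64)> = Vec::new(); // (index, mean, score)
    for sc in 0..n_sub {
        let series: Vec<f64> = history.iter().map(|frame| frame[sc]).collect();
        let mean = series.iter().sum::<f64>() / series.len() as f64;
        if let Some(score) = nbvi_score(&series, alpha) {
            stats.push((sc, mean, score));
        }
    }
    if stats.is_empty() {
        return Vec::new();
    }
    let mut means: Vec<f64> = stats.iter().map(|s| s.1).collect();
    means.sort_by(f64::total_cmp);
    let threshold = means[means.len() / 4]; // 25th-percentile mean
    stats.retain(|s| s.1 >= threshold);
    stats.sort_by(|a, b| a.2.total_cmp(&b.2));
    stats.into_iter().take(k).map(|s| s.0).collect()
}
```

In the real loader the selected indices would then be used to copy raw amplitudes into the flat [35 * 20] input; an empty selection corresponds to the None return described above.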

D5 — Per-tick inference, longest-history node

run_wiflow_inference() is called at every broadcast_tick_task step (5 call sites in main.rs):

  • Picks the node with longest nbvi_history (ties broken by smallest node_id — deterministic).
  • Cost: ~250 K flops on the lite scale (BN + 2 small convs + 2 FCs). Measured 0.4 ms on the Mac M1 — well under the 100 ms tick budget.
  • Returns Vec<[f64; 4]> of length 17 ([x, y, z=0, conf=1]).
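The node choice and output shaping described above can be sketched as follows. The function names, the (node_id, history_len) tuple and the interleaved (x0, y0, x1, y1, ...) ordering of the 34 outputs are assumptions made for illustration.

```rust
use std::cmp::Reverse;

/// Pick the node with the longest history; ties go to the smallest node_id.
/// Input is a list of (node_id, history_len) pairs.
fn pick_node(nodes: &[(u8, usize)]) -> Option<u8> {
    nodes
        .iter()
        .max_by_key(|&&(id, len)| (len, Reverse(id)))
        .map(|&(id, _)| id)
}

/// Turn 34 sigmoid outputs into 17 rows of [x, y, z = 0, conf = 1].
/// ASSUMPTION: outputs are interleaved as x0, y0, x1, y1, ...
fn to_pose_rows(flat: &[f32]) -> Option<Vec<[f64; 4]>> {
    if flat.len() != 34 {
        return None;
    }
    Some(
        flat.chunks_exact(2)
            .map(|xy| [xy[0] as f64, xy[1] as f64, 0.0, 1.0])
            .collect(),
    )
}
```

Using Reverse(id) in the key makes the tie-break explicit, so the result does not depend on the iteration order of the node map.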

D6 — pose_current reads pose_keypoints directly

Pre-ADR: /api/v1/pose/current read latest_update.persons. The tracker populated persons from derive_pose_from_sensing (signal-derived, synthetic) regardless of model_loaded. Loader-output pose_keypoints was only read by the WS broadcaster.

This ADR makes pose_current prefer pose_keypoints when 17-len and present, building a single PersonDetection with COCO joint names. Falls back to tracker persons only when pose_keypoints is None (cold start). Keeps the ADR-105 honesty gate: empty array if model_loaded = false.
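The preference order can be sketched with simplified stand-in types. Person and select_persons are hypothetical and much smaller than the real PersonDetection and handler; the joint names follow the standard COCO 17-keypoint order.

```rust
/// Standard COCO 17-keypoint order.
const COCO_KEYPOINTS: [&str; 17] = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
];

/// Hypothetical, simplified stand-in for PersonDetection.
#[derive(Debug, Clone, PartialEq)]
struct Person {
    keypoints: Vec<(&'static str, [f64; 4])>,
}

/// Decide what pose_current returns:
/// 1. no model loaded        -> empty (ADR-105 gate)
/// 2. 17 model keypoints     -> one person built from them
/// 3. otherwise (cold start) -> tracker persons
fn select_persons(
    model_loaded: bool,
    pose_keypoints: Option<&[[f64; 4]]>,
    tracker_persons: &[Person],
) -> Vec<Person> {
    if !model_loaded {
        return Vec::new();
    }
    match pose_keypoints {
        Some(kps) if kps.len() == COCO_KEYPOINTS.len() => vec![Person {
            keypoints: COCO_KEYPOINTS.iter().copied().zip(kps.iter().copied()).collect(),
        }],
        _ => tracker_persons.to_vec(),
    }
}
```

Checking model_loaded first keeps the honesty gate independent of whichever source would otherwise supply the persons.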

D7 — Honest about output quality

The loaded model emits 17 keypoints, but values saturate near 0/1 (sigmoid extremes) — the network was trained on a different ESP32 deployment and has no learned response to ours. Integration is correct; production-grade output needs deployment-specific fine-tune.

Follow-ups (Pack E): apply node-1/node-2 LoRA adapters from the same HuggingFace repo (~2h), or re-train via scripts/train-wiflow-supervised.js against fresh camera ground-truth (~30 min capture + 19 min training).

Files Touched

v2/crates/wifi-densepose-sensing-server/src/wiflow_v1.rs   (new, ~430 LoC)
v2/crates/wifi-densepose-sensing-server/src/lib.rs         (+ pub mod)
v2/crates/wifi-densepose-sensing-server/src/main.rs:
  + use wiflow_v1::{self, WiflowModel}
  + Args.wiflow_model: Option<PathBuf>
  + static WIFLOW_MODEL: OnceLock<Option<WiflowModel>>
  + main()  — load before existing --model/--load-rvf path
  + fn run_wiflow_inference() -> Option<Vec<[f64;4]>>  (right after csi_keepalive_task)
  + 5 × `pose_keypoints: run_wiflow_inference()` at SensingUpdate sites
  + pose_current — prefer pose_keypoints when 17-len; fall back to persons
docs/adr/ADR-116-wiflow-v1-supervised-pose-loader.md  (this)

Binary size delta: 3.0 MB → 3.1 MB.

Verified Acceptance

Live on the operator's TP-Link deployment (Mac .103, nodes .100/.101): sensing-server log shows ADR-116 wiflow-v1 loaded ... (lite scale, 186946 params) + keepalive: learned address for node 2/1; curl /api/v1/info returns "pose_estimation": true; curl /api/v1/pose/current returns 17 named COCO keypoints under one persons[0]. End-to-end: model on disk → loader → forward pass → 17 keypoints → REST + WS payload.

Cargo tests

wiflow_v1 ships 3 unit tests covering the most-likely-to-rot bits:

  • base64_round_trip_alphabet — alphabet, padding, whitespace tolerance
  • sigmoid_bounds — numerical stability at ±10 inputs
  • build_input_zero_history — empty-history early return

cargo test -p wifi-densepose-sensing-server wiflow_v1 → 3 passed.

Open Items

  • Pack E.1 — LoRA adapter loader. Apply node-1/node-2 rank-8 adapters from the same HF repo (~2 h).
  • Pack E.2 — Camera-supervised retrain for this room. scripts/collect-ground-truth.py + scripts/train-wiflow-supervised.js --scale lite — should drop sigmoid saturation (~1 h + 19 min train).
  • Inference rate-limit / per-node pose tracks — currently single virtual person emitted with fixed zone_1 bbox; future LoRA-per-node could fan out to one PersonDetection per sensor viewpoint.

References

  • scripts/train-wiflow-supervised.js — JS reference implementation
  • HuggingFace ruv/ruview — model file + LoRA adapters (Apache-2.0)
  • ADR-079 — camera ground-truth training pipeline (the trainer this loader was built against)
  • ADR-105 — "no synthetic data in production runtime"; this ADR keeps the gate but feeds it real model output
  • ADR-115 — /ota/set-target (the prerequisite that got the CSI stream flowing again so this loader has data to consume)