# ADR-116 — WiFlow-v1 Supervised Pose Loader (Rust)

**Status**: Accepted (integration), needs fine-tune (output quality)
**Date**: 2026-05-17
**Scope**: `v2/crates/wifi-densepose-sensing-server/src/wiflow_v1.rs` (new,
~430 lines incl. tests), `src/main.rs` (CLI flag + load + 5 tick-site hooks +
`pose_current` keypoint path), `src/lib.rs` (module export).

## Context

Until this ADR, `/api/v1/pose/*` always returned an empty `persons` array
(ADR-105: no synthetic fallback when no real model is loaded). HuggingFace
`ruv/ruview/wiflow-v1/wiflow-v1.json` is the project's official supervised
pose model (Apache-2.0, 974 KB, 92.9 % PCK@20 on its training set). It sat
unused on disk because there was no Rust loader; the only reference
implementation is `scripts/train-wiflow-supervised.js`, a JS training script
rather than a deployment path.

This ADR ports the JS inference path to Rust so sensing-server can serve
real 17-keypoint COCO skeletons in production.

## What was wrong in the model file (and how this ADR works around it)

The HuggingFace JSON has an `architecture` field that **lies**:

```json
"architecture": {
  "tcnChannels": [35, 256, 256, 192, 128],
  "tcnKernel": 7,
  "tcnDilations": [1, 2, 4, 8],
  "fcDims": [2560, 2048, 34]
}
```

That's the `full` scale (~7.7 M params). The file is actually the **lite**
scale (186,946 params — confirmed by `totalParams` field). The exporter at
`train-wiflow-supervised.js:1599` hardcodes the full-scale dict for every
scale. The loader trusts `totalParams` and ignores `architecture`.

Lite topology (recovered from `SCALE.lite` at `train-wiflow-supervised.js:135`
and verified by exact param count = 186,946):

* 2 TCN blocks (NOT 4), kernel = 3 (NOT 7), dilations [1, 2] (NOT [1,2,4,8])
* TCN channels: 35 → 32 → 32
* Per block: causal_conv → BN → ReLU → causal_conv → BN + residual → ReLU
  (1×1 projection on residual when in_ch ≠ out_ch, only block 0)
* Flatten 32 × 20 = 640 → fc1 (640→256) → ReLU → fc2 (256→34)
* Sigmoid on final 34-dim → 17 (x, y) keypoints in [0, 1]
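
The 186,946 figure can be reproduced from the topology above. The following is a minimal sketch, assuming each conv and dense layer stores weight + bias and each BatchNorm stores only gamma + beta; the function names are illustrative and not taken from `wiflow_v1.rs`:

```rust
/// Parameters in one TCN block: two causal convs (weight + bias), two
/// BatchNorms (gamma + beta), plus a 1x1 residual projection when the
/// channel counts differ.
fn tcn_block_params(in_ch: usize, out_ch: usize, kernel: usize) -> usize {
    let conv1 = in_ch * kernel * out_ch + out_ch;
    let conv2 = out_ch * kernel * out_ch + out_ch;
    let bn = 2 * out_ch; // gamma + beta only, no running stats
    let residual = if in_ch != out_ch {
        in_ch * out_ch + out_ch
    } else {
        0
    };
    conv1 + bn + conv2 + bn + residual
}

/// Total parameter count for a TCN stack followed by two dense layers.
/// `channels` lists the input width followed by each block's output width.
fn total_params(
    channels: &[usize],
    kernel: usize,
    window: usize,
    hidden: usize,
    out: usize,
) -> usize {
    let tcn: usize = channels
        .windows(2)
        .map(|pair| tcn_block_params(pair[0], pair[1], kernel))
        .sum();
    let flat = channels.last().copied().unwrap_or(0) * window;
    let fc1 = flat * hidden + hidden;
    let fc2 = hidden * out + out;
    tcn + fc1 + fc2
}
```

With the lite values, `total_params(&[35, 32, 32], 3, 20, 256, 34)` gives 186,946: 7,776 for block 0, 6,336 for block 1, 164,096 for fc1, and 8,738 for fc2.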

## Decisions

### D1 — Pure-Rust forward pass, no new crates

`wiflow_v1.rs` is self-contained: `Vec<f32>` math by hand, an inline base64
decoder (50 LoC), and no `ndarray`, `candle`, or `base64` crate added. The
inference is small enough (~250 K flops/forward) that hand-written `Vec<f32>`
loops are clearer than pulling in a tensor framework for one model.
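
To give a sense of how little code an inline decoder needs, here is a minimal sketch of a whitespace-tolerant base64 decoder for the standard alphabet. It is not the code in `wiflow_v1.rs`; the names `sextet` and `decode_base64` are hypothetical, and treating `=` as end-of-data is an assumption of this sketch:

```rust
/// Map one byte of the standard base64 alphabet to its 6-bit value.
fn sextet(byte: u8) -> Option<u8> {
    match byte {
        b'A'..=b'Z' => Some(byte - b'A'),
        b'a'..=b'z' => Some(byte - b'a' + 26),
        b'0'..=b'9' => Some(byte - b'0' + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Decode standard base64, skipping ASCII whitespace and stopping at the
/// first `=` padding byte. Returns `None` on any other unexpected byte.
fn decode_base64(text: &str) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(text.len() / 4 * 3);
    let mut acc: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of valid bits currently in `acc`
    for byte in text.bytes() {
        if byte.is_ascii_whitespace() {
            continue;
        }
        if byte == b'=' {
            break;
        }
        acc = (acc << 6) | u32::from(sextet(byte)?);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the leftover low bits
        }
    }
    Some(out)
}
```

For example, `decode_base64("aGVsbG8=")` returns the bytes of `hello`, and the same input with embedded newlines or spaces decodes identically.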

### D2 — Weight stream order matches `collectParams()` in the JS trainer

```
for each TCN block:
  conv1.weight (in_ch * k * out_ch f32s)
  conv1.bias   (out_ch)
  bn1.gamma    (out_ch)
  bn1.beta     (out_ch)
  conv2.weight, conv2.bias, bn2.gamma, bn2.beta
  (if in_ch != out_ch: res.weight, res.bias)
fc1.weight, fc1.bias, fc2.weight, fc2.bias
```

The loader asserts that the stream is fully consumed (`Cursor::remaining() == 0`)
after fc2, which catches silent topology mismatches. A param count check
(`totalParams == 186_946`) catches a scale mismatch before unpacking.
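
The consume-then-check pattern can be sketched as follows. Only `Cursor::remaining()` is named in this ADR; the `take` method, the error strings, and the `read_dense_only` helper are assumptions made for illustration, and a single dense layer stands in for the full stream:

```rust
/// Read-only cursor over the flat f32 weight stream.
struct Cursor<'a> {
    data: &'a [f32],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn new(data: &'a [f32]) -> Self {
        Self { data, pos: 0 }
    }

    /// Number of values not yet consumed.
    fn remaining(&self) -> usize {
        self.data.len() - self.pos
    }

    /// Take the next `n` values, or fail if the stream is too short.
    fn take(&mut self, n: usize) -> Result<&'a [f32], String> {
        if n > self.remaining() {
            return Err(format!(
                "weight stream too short: need {} values, {} left",
                n,
                self.remaining()
            ));
        }
        let data: &'a [f32] = self.data;
        let slice = &data[self.pos..self.pos + n];
        self.pos += n;
        Ok(slice)
    }
}

/// Read one dense layer (weight, then bias) and reject trailing values.
fn read_dense_only(
    stream: &[f32],
    inputs: usize,
    outputs: usize,
) -> Result<(Vec<f32>, Vec<f32>), String> {
    let mut cur = Cursor::new(stream);
    let weight = cur.take(inputs * outputs)?.to_vec();
    let bias = cur.take(outputs)?.to_vec();
    if cur.remaining() != 0 {
        return Err(format!(
            "{} trailing values: topology mismatch",
            cur.remaining()
        ));
    }
    Ok((weight, bias))
}
```

A stream that is one value too long or too short fails here instead of silently shifting every later tensor, which is the failure mode the full-consumption assert guards against.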

### D3 — BatchNorm uses per-window mean/var (matches JS impl)

`train-wiflow-supervised.js:770` computes mean/var across the T axis at
inference time, ignoring `runMean/runVar` accumulated during training.
Loader skips running stats entirely (only 2 params per channel stored:
gamma + beta). This is unusual but consistent — the network was trained
this way, so we infer this way.
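
A minimal sketch of per-window normalisation over the T axis, assuming a row-major `[channels][t]` layout, population variance (divide by T), and an epsilon of `1e-5`. None of these three details is stated in this ADR, and the function name is hypothetical:

```rust
/// Assumed numerical stabiliser; the trainer's actual value is not stated here.
const EPS: f32 = 1e-5;

/// Normalise each channel over the time axis using statistics from the
/// current window only, then apply the learned scale (gamma) and shift (beta).
/// `x` is row-major `[channels][t]` and is updated in place.
fn batch_norm_window(
    x: &mut [f32],
    channels: usize,
    t: usize,
    gamma: &[f32],
    beta: &[f32],
) {
    assert!(t > 0, "window must hold at least one frame");
    assert_eq!(x.len(), channels * t);
    assert_eq!(gamma.len(), channels);
    assert_eq!(beta.len(), channels);
    for c in 0..channels {
        let row = &mut x[c * t..(c + 1) * t];
        let mean = row.iter().sum::<f32>() / t as f32;
        let var = row.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / t as f32;
        let inv_std = 1.0 / (var + EPS).sqrt();
        for v in row.iter_mut() {
            *v = gamma[c] * (*v - mean) * inv_std + beta[c];
        }
    }
}
```

Because the statistics come from the window itself, a channel that is constant over the window collapses to `beta` regardless of its amplitude.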

### D4 — Input prep: top-35 subcarriers by NBVI, raw amplitudes

`build_input_from_history` (in `wiflow_v1.rs`):

1. Take the last 20 frames from any node's `AmpState.nbvi_history` (`Vec<Vec<f64>>`).
2. Rank subcarriers by NBVI score (`α·σ/μ² + (1−α)·σ/μ`, α = 0.5) — same
   formula the classifier uses, but pick K = 35 (model input), not K = 12
   (classifier).
3. Apply 25th-percentile dead-zone gate to skip guard tones / null bins.
4. Build flat `[35 * 20]` row-major tensor of raw amplitudes (no z-score —
   training data wasn't normalised either, BN handles it).

If there are fewer than 20 frames, or every subcarrier is gated out, the
function returns `None`: inference is skipped for this tick and
`SensingUpdate` carries `pose_keypoints: None`.
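
The selection steps can be sketched as below. This is not `build_input_from_history` itself. Several details are assumptions of the sketch: the gate is applied to per-subcarrier mean amplitude, lower NBVI ranks first, ties go to the lower index, and fewer than `k` survivors are returned as-is. The function names are hypothetical:

```rust
const ALPHA: f64 = 0.5;

/// NBVI score from a subcarrier's mean and standard deviation.
fn nbvi(mean: f64, std: f64) -> f64 {
    ALPHA * std / (mean * mean) + (1.0 - ALPHA) * std / mean
}

/// Select up to `k` subcarrier indices from the last `window` frames.
fn select_subcarriers(history: &[Vec<f64>], window: usize, k: usize) -> Option<Vec<usize>> {
    if window == 0 || history.len() < window {
        return None;
    }
    let frames = &history[history.len() - window..];
    let n_sub = frames.iter().map(|f| f.len()).min()?;
    // Per-subcarrier mean and standard deviation over the window.
    let stats: Vec<(f64, f64)> = (0..n_sub)
        .map(|s| {
            let mean = frames.iter().map(|f| f[s]).sum::<f64>() / window as f64;
            let var = frames.iter().map(|f| (f[s] - mean).powi(2)).sum::<f64>() / window as f64;
            (mean, var.sqrt())
        })
        .collect();
    // Assumed gate: drop subcarriers whose mean is below the 25th percentile.
    let mut means: Vec<f64> = stats.iter().map(|&(m, _)| m).collect();
    means.sort_by(|a, b| a.total_cmp(b));
    let gate = *means.get(means.len() / 4)?;
    let mut scored: Vec<(usize, f64)> = stats
        .iter()
        .enumerate()
        .filter(|&(_, &(mean, _))| mean > 0.0 && mean >= gate)
        .map(|(i, &(mean, std))| (i, nbvi(mean, std)))
        .collect();
    if scored.is_empty() {
        return None;
    }
    // Assumed direction: lower NBVI (steadier, stronger signal) ranks first.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1).then(a.0.cmp(&b.0)));
    Some(scored.into_iter().take(k).map(|(i, _)| i).collect())
}

/// Build the flat row-major `[selected.len() * window]` tensor of raw
/// amplitudes: one row per selected subcarrier, one column per frame.
fn build_input(history: &[Vec<f64>], window: usize, selected: &[usize]) -> Vec<f32> {
    let frames = &history[history.len() - window..];
    let mut out = Vec::with_capacity(selected.len() * window);
    for &s in selected {
        for frame in frames {
            out.push(frame[s] as f32);
        }
    }
    out
}
```

In the loader's terms this would be called with `window = 20` and `k = 35`; no z-score step appears because the input is deliberately left as raw amplitudes.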

### D5 — Per-tick inference, longest-history node

`run_wiflow_inference()` runs at every `broadcast_tick_task` step (5 call
sites in `main.rs`):

* Picks the node with longest `nbvi_history` (ties broken by smallest
  node_id — deterministic).
* Cost: ~250 K flops on the lite scale (2 small TCN blocks with BN + 2 FCs).
  Measured 0.4 ms on the Mac M1 — well under the 100 ms tick budget.
* Returns `Vec<[f64; 4]>` of length 17 (`[x, y, z=0, conf=1]`).
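
Node selection and output packing can be sketched as follows. The `u8` node id type, the interleaved `x, y` ordering of the 34 outputs, and both function names are assumptions of this sketch rather than details taken from `main.rs`:

```rust
use std::collections::HashMap;

/// Pick the node with the longest history; ties go to the smallest node id
/// so the choice does not depend on map iteration order.
fn pick_node(history_len_by_node: &HashMap<u8, usize>) -> Option<u8> {
    history_len_by_node
        .iter()
        // Longer history wins; on equal length the smaller id compares as greater.
        .max_by(|a, b| a.1.cmp(b.1).then(b.0.cmp(a.0)))
        .map(|(&id, _)| id)
}

/// Pack 34 sigmoid outputs into 17 `[x, y, z = 0, conf = 1]` entries.
/// Assumes the outputs are interleaved as x0, y0, x1, y1, ...
fn to_keypoints(output: &[f32]) -> Option<Vec<[f64; 4]>> {
    if output.len() != 34 {
        return None;
    }
    Some(
        output
            .chunks_exact(2)
            .map(|xy| [f64::from(xy[0]), f64::from(xy[1]), 0.0, 1.0])
            .collect(),
    )
}
```

The explicit tie-break matters because `HashMap` iteration order is unspecified; without it, two nodes with equal history could alternate between ticks.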

### D6 — `pose_current` reads `pose_keypoints` directly

Pre-ADR: `/api/v1/pose/current` read `latest_update.persons`. The tracker
populated `persons` from `derive_pose_from_sensing` (signal-derived,
synthetic) regardless of `model_loaded`. Loader-output `pose_keypoints`
was only read by the WS broadcaster.

This ADR makes `pose_current` prefer `pose_keypoints` when it is present and
holds exactly 17 entries, building a single `PersonDetection` with COCO joint
names. It falls back to the tracker's `persons` only when `pose_keypoints` is
`None` (cold start). The ADR-105 honesty gate stays in place: the response is
an empty array if `model_loaded = false`.
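
The preference order can be sketched as a small decision function. The types `NamedKeypoint` and `PoseSource` and the function `choose_pose` are hypothetical stand-ins, not the server's `PersonDetection` type; the joint names follow the standard COCO 17-keypoint order:

```rust
/// Standard COCO 17-keypoint order.
const COCO_JOINTS: [&str; 17] = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
];

/// Hypothetical stand-in for one named joint in the response.
#[derive(Debug, Clone, PartialEq)]
struct NamedKeypoint {
    name: &'static str,
    x: f64,
    y: f64,
    confidence: f64,
}

/// Hypothetical description of where the response should come from.
#[derive(Debug, PartialEq)]
enum PoseSource {
    /// No model loaded: respond with an empty array.
    Empty,
    /// Model output available: one person built from 17 keypoints.
    Model(Vec<NamedKeypoint>),
    /// Keypoints missing or wrong length: use the tracker's persons.
    Tracker,
}

fn choose_pose(model_loaded: bool, pose_keypoints: Option<&[[f64; 4]]>) -> PoseSource {
    if !model_loaded {
        return PoseSource::Empty;
    }
    match pose_keypoints {
        Some(kps) if kps.len() == COCO_JOINTS.len() => PoseSource::Model(
            kps.iter()
                .zip(COCO_JOINTS)
                .map(|(kp, name)| NamedKeypoint {
                    name,
                    x: kp[0],
                    y: kp[1],
                    confidence: kp[3],
                })
                .collect(),
        ),
        _ => PoseSource::Tracker,
    }
}
```

Checking `model_loaded` first keeps the honesty gate ahead of every other branch, so neither model output nor tracker output can leak through when no model is loaded.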

### D7 — Honest about output quality

The loaded model emits 17 keypoints, but the values saturate near 0 or 1
(sigmoid extremes): the network was trained on a different ESP32
deployment and has no learned response to ours. The integration is correct;
production-grade output needs a deployment-specific fine-tune.

Follow-ups (Pack E): apply the `node-1`/`node-2` LoRA adapters from the
same HuggingFace repo (~2 h), or retrain via
`scripts/train-wiflow-supervised.js` against fresh camera ground truth
(~30 min capture + 19 min training).

## Files Touched

```
v2/crates/wifi-densepose-sensing-server/src/wiflow_v1.rs   (new, ~430 LoC)
v2/crates/wifi-densepose-sensing-server/src/lib.rs         (+ pub mod)
v2/crates/wifi-densepose-sensing-server/src/main.rs:
  + use wiflow_v1::{self, WiflowModel}
  + Args.wiflow_model: Option<PathBuf>
  + static WIFLOW_MODEL: OnceLock<Option<WiflowModel>>
  + main()  — load before existing --model/--load-rvf path
  + fn run_wiflow_inference() -> Option<Vec<[f64;4]>>  (right after csi_keepalive_task)
  + 5 × `pose_keypoints: run_wiflow_inference()` at SensingUpdate sites
  + pose_current — prefer pose_keypoints when 17-len; fall back to persons
docs/adr/ADR-116-wiflow-v1-supervised-pose-loader.md  (this)
```

Binary size delta: 3.0 MB → 3.1 MB.

## Verified Acceptance

Live on the operator's TP-Link deployment (Mac .103, nodes .100/.101):
sensing-server log shows `ADR-116 wiflow-v1 loaded ... (lite scale,
186946 params)` + `keepalive: learned address for node 2/1`; `curl
/api/v1/info` returns `"pose_estimation": true`; `curl /api/v1/pose/current`
returns 17 named COCO keypoints under one `persons[0]`. End-to-end:
model on disk → loader → forward pass → 17 keypoints → REST + WS payload.

## Cargo tests

`wiflow_v1` ships 3 unit tests covering the most-likely-to-rot bits:

* `base64_round_trip_alphabet` — alphabet, padding, whitespace tolerance
* `sigmoid_bounds` — numerical stability at ±10 inputs
* `build_input_zero_history` — empty-history early return

`cargo test -p wifi-densepose-sensing-server wiflow_v1` → 3 passed.
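
As an illustration of what a bounds check like `sigmoid_bounds` guards, here is a minimal sketch of a branch-based sigmoid that avoids overflow in `exp` for large-magnitude inputs. Whether `wiflow_v1.rs` uses this exact formulation is not stated in this ADR, so treat it as one plausible implementation:

```rust
/// Logistic sigmoid that only ever exponentiates a non-positive value,
/// so `exp` cannot overflow for large |x|.
fn sigmoid(x: f32) -> f32 {
    if x >= 0.0 {
        1.0 / (1.0 + (-x).exp())
    } else {
        let e = x.exp();
        e / (1.0 + e)
    }
}
```

With this form, `sigmoid(0.0)` is exactly 0.5, and inputs of ±10 stay strictly inside (0, 1) while already sitting within 1e-4 of the bounds.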

## Open Items

* **Pack E.1 — LoRA adapter loader.** Apply `node-1`/`node-2` rank-8
  adapters from the same HF repo (~2 h).
* **Pack E.2 — Camera-supervised retrain for this room.**
  `scripts/collect-ground-truth.py` + `scripts/train-wiflow-supervised.js
  --scale lite` — should drop sigmoid saturation (~1 h + 19 min train).
* **Inference rate-limit / per-node pose tracks.** Currently a single
  virtual person is emitted with a fixed `zone_1` bbox; a future
  LoRA-per-node setup could fan out to one `PersonDetection` per sensor
  viewpoint.

## References

* `scripts/train-wiflow-supervised.js` — JS reference implementation
* HuggingFace `ruv/ruview` — model file + LoRA adapters (Apache-2.0)
* ADR-079 — camera ground-truth training pipeline (the trainer this
  loader was built against)
* ADR-105 — "no synthetic data in production runtime"; this ADR keeps
  the gate but feeds it real model output
* ADR-115 — `/ota/set-target` (the prerequisite that got the CSI stream
  flowing again so this loader has data to consume)