wifi-densepose/v2/crates/wifi-densepose-train/tests
rUv 90a88ada9a
feat(train): metric-locked PCK/MPJPE accuracy harness + ADR-173 (resolve PCK-definition ambiguity) (#1092)
* feat(train): metric-locked PCK/MPJPE accuracy harness — resolve PCK-definition ambiguity

The SOTA brief (docs/research/sota-nn-train-benchmark-brief.md §1/§3.1/§4)
identifies metric ambiguity as the single biggest threat to any beyond-SOTA
claim: three PCK@20 numbers (96.09% WiFlow-STD image-normalized, 81.63%
AetherArena torso-PCK, 61.1% GraphPose-Fi standard PCK) cannot be lined up
because each silently uses a different normalization. The project was retracted
twice over this (a withdrawn 92.9% used absolute pixels, not torso).

New src/accuracy.rs makes the normalizer explicit, selectable, and carried with
every reported number:
- PckNormalization enum: TorsoDiameter (standard MM-Fi/GraphPose-Fi hip↔hip),
  BoundingBoxDiagonal (looser WiFlow-STD image-normalized), AbsolutePixels(t)
  (retracted convention, reproducible + clearly non-comparable).
- pck_at(pred, gt, vis, k, normalization) — one canonical PCK reusing the
  metrics_core geometric primitives (no duplicate kernel).
- mpjpe(pred, gt, vis) — 2D/3D, mm.
- PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints,
  n_frames } via accuracy_report(frames, ks, normalization) — an unlabeled PCK
  number is structurally impossible.

17 hand-computed deterministic tests (no GPU, no datasets) prove the harness
arithmetic, including the key proof that identical predictions score
0.50 / 1.00 / 0.75 under the three normalizations, plus graceful degenerate
handling (zero torso, empty frames, NaN coords — no panic, never false-perfect).

This is measurement infrastructure, NOT an accuracy claim. Public API worth an
ADR — needs ADR slot 173 (parent to write).

wifi-densepose-train lib 191→206, test_metrics 12→14, 0 failed; full workspace
green (exit 0); Python deterministic proof unchanged
(f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a).

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr): ADR-173 — metric-locked PCK/MPJPE accuracy harness

Documents the accuracy harness (committed 3a8b2ed13) that resolves the
PCK-definition ambiguity flagged as the #1 beyond-SOTA risk in the SOTA brief
(#1090): three historical numbers (96/81.6/61) used three unstated
normalizations. The harness makes normalization explicit + selectable
(PckNormalization enum) and every reported number carries its definition.
Key proof: identical predictions → 0.50/1.00/0.75 under torso/bbox/abs.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-15 00:41:02 -04:00
..
test_config.rs fix(security): audit — fix RUSTSEC vulns, clippy warnings, dead code (#769) 2026-05-23 05:36:13 -04:00
test_dataset.rs fix(security): audit — fix RUSTSEC vulns, clippy warnings, dead code (#769) 2026-05-23 05:36:13 -04:00
test_losses.rs fix(security): audit — fix RUSTSEC vulns, clippy warnings, dead code (#769) 2026-05-23 05:36:13 -04:00
test_mae.rs ADR-152: WiFi-Pose SOTA 2026 intake — WiFlow-STD benchmark, Rust integrations, ADR-153 802.11bf layer, efficiency frontier (#1008) 2026-06-11 17:02:23 -04:00
test_metrics.rs feat(train): metric-locked PCK/MPJPE accuracy harness + ADR-173 (resolve PCK-definition ambiguity) (#1092) 2026-06-15 00:41:02 -04:00
test_proof.rs fix(security): audit — fix RUSTSEC vulns, clippy warnings, dead code (#769) 2026-05-23 05:36:13 -04:00
test_real_loader.rs fix(security): audit — fix RUSTSEC vulns, clippy warnings, dead code (#769) 2026-05-23 05:36:13 -04:00
test_subcarrier.rs fix(security): audit — fix RUSTSEC vulns, clippy warnings, dead code (#769) 2026-05-23 05:36:13 -04:00
test_wiflow_std_parity.rs ADR-152: WiFi-Pose SOTA 2026 intake — WiFlow-STD benchmark, Rust integrations, ADR-153 802.11bf layer, efficiency frontier (#1008) 2026-06-11 17:02:23 -04:00