* feat(train): metric-locked PCK/MPJPE accuracy harness — resolve PCK-definition ambiguity
The SOTA brief (docs/research/sota-nn-train-benchmark-brief.md §1/§3.1/§4)
identifies metric ambiguity as the single biggest threat to any beyond-SOTA
claim: three PCK@20 numbers (96.09% WiFlow-STD image-normalized, 81.63%
AetherArena torso-PCK, 61.1% GraphPose-Fi standard PCK) cannot be lined up
because each silently uses a different normalization. The project was retracted
twice over this (a withdrawn 92.9% used absolute pixels, not torso).
New src/accuracy.rs makes the normalizer explicit, selectable, and carried with
every reported number:
- PckNormalization enum: TorsoDiameter (standard MM-Fi/GraphPose-Fi hip↔hip),
BoundingBoxDiagonal (looser WiFlow-STD image-normalized), AbsolutePixels(t)
(retracted convention, reproducible + clearly non-comparable).
- pck_at(pred, gt, vis, k, normalization) — one canonical PCK reusing the
metrics_core geometric primitives (no duplicate kernel).
- mpjpe(pred, gt, vis) — 2D/3D, mm.
- PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints,
n_frames } via accuracy_report(frames, ks, normalization) — an unlabeled PCK
number is structurally impossible.
17 hand-computed deterministic tests (no GPU, no datasets) prove the harness
arithmetic, including the key proof that identical predictions score
0.50 / 1.00 / 0.75 under the three normalizations, plus graceful degenerate
handling (zero torso, empty frames, NaN coords — no panic, never false-perfect).
This is measurement infrastructure, NOT an accuracy claim. Public API worth an
ADR — needs ADR slot 173 (parent to write).
wifi-densepose-train lib 191→206, test_metrics 12→14, 0 failed; full workspace
green (exit 0); Python deterministic proof unchanged
(f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a).
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(adr): ADR-173 — metric-locked PCK/MPJPE accuracy harness
Documents the accuracy harness (committed 3a8b2ed13) that resolves the
PCK-definition ambiguity flagged as the #1 beyond-SOTA risk in the SOTA brief
(#1090): three historical numbers (96/81.6/61) used three unstated
normalizations. The harness makes normalization explicit + selectable
(PckNormalization enum) and every reported number carries its definition.
Key proof: identical predictions → 0.50/1.00/0.75 under torso/bbox/abs.
Co-Authored-By: claude-flow <ruv@ruv.net>