feat(train): metric-locked PCK/MPJPE accuracy harness + ADR-173 (resolve PCK-definition ambiguity) (#1092)

* feat(train): metric-locked PCK/MPJPE accuracy harness — resolve PCK-definition ambiguity

The SOTA brief (docs/research/sota-nn-train-benchmark-brief.md §1/§3.1/§4) identifies metric ambiguity as the single biggest threat to any beyond-SOTA claim: three PCK@20 numbers (96.09% WiFlow-STD image-normalized, 81.63% AetherArena torso-PCK, 61.1% GraphPose-Fi standard PCK) cannot be lined up because each silently uses a different normalization. The project was retracted twice over this (a withdrawn 92.9% used absolute pixels, not torso).

New src/accuracy.rs makes the normalizer explicit, selectable, and carried with every reported number:

- PckNormalization enum: TorsoDiameter (standard MM-Fi/GraphPose-Fi hip↔hip), BoundingBoxDiagonal (looser WiFlow-STD image-normalized), AbsolutePixels(t) (retracted convention, reproducible + clearly non-comparable).
- pck_at(pred, gt, vis, k, normalization) — one canonical PCK reusing the metrics_core geometric primitives (no duplicate kernel).
- mpjpe(pred, gt, vis) — 2D/3D, mm.
- PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints, n_frames } via accuracy_report(frames, ks, normalization) — an unlabeled PCK number is structurally impossible.

17 hand-computed deterministic tests (no GPU, no datasets) prove the harness arithmetic, including the key proof that identical predictions score 0.50 / 1.00 / 0.75 under the three normalizations, plus graceful degenerate handling (zero torso, empty frames, NaN coords — no panic, never false-perfect).

This is measurement infrastructure, NOT an accuracy claim. Public API worth an ADR — needs ADR slot 173 (parent to write).

wifi-densepose-train lib 191→206, test_metrics 12→14, 0 failed; full workspace green (exit 0); Python deterministic proof unchanged (f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a).

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr): ADR-173 — metric-locked PCK/MPJPE accuracy harness

Documents the accuracy harness (committed 3a8b2ed13) that resolves the PCK-definition ambiguity flagged as the #1 beyond-SOTA risk in the SOTA brief (#1090): three historical numbers (96/81.6/61) used three unstated normalizations. The harness makes normalization explicit + selectable (PckNormalization enum) and every reported number carries its definition. Key proof: identical predictions → 0.50/1.00/0.75 under torso/bbox/abs.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-15 00:41:02 -04:00 · 90a88ada9a
parent cfd0ad76cf
commit 90a88ada9a
5 changed files with 902 additions and 0 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -14,6 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **`homecore-recorder` security review (ADR-132 surfaces) — two real bounding fixes; SQL-injection & NaN-index dimensions confirmed clean with evidence.** Beyond-SOTA review of the HA-compat state recorder (DB persistence + history + ruvector semantic search), the crux being its DB-backed SQL-injection surface. **Findings + fixes:** (1) **Memory-DoS — unbounded `get_state_history`.** The history query carried no `LIMIT`, so a wide `[since, until]` window over a high-frequency entity (a per-second sensor ≈ 86k rows/day) would load an unbounded row set into a single in-memory `Vec`. Added a hard `LIMIT MAX_HISTORY_ROWS` (1,000,000 — generous enough never to truncate a realistic history graph, bounded enough to cap the worst case); the sibling search paths were already `k`-bounded. (2) **Disk-DoS / documented-but-missing `purge`.** The README + HA-compat table advertised `Recorder::purge(older_than)` as a capability, but **no such method existed** — i.e. no retention path at all → unbounded disk growth. Implemented a **transactional** `purge` that deletes `states` + `events` strictly **older than** the cutoff (**exclusive** boundary — idempotent, no off-by-one; a row at the cutoff instant is kept) and **garbage-collects** orphaned `state_attributes` blobs (a dedup-shared blob is dropped only once its last referencing state is gone); all three deletes run in one transaction so a mid-purge failure rolls back cleanly (no states-deleted-but-events-kept corruption). **Confirmed clean with evidence:** SQL injection — **every** query in `db.rs` uses bound `?` parameters (no `format!`/string-concat of user data into SQL); the lone `format!` builds the LIKE *pattern*, which is itself bound as a parameter with `ESCAPE '\\'` and metacharacter escaping. Pinned: a state value `'; DROP TABLE states; --` is stored/queried **literally** (table survives), and a `%`/`_` in a search query matches **literally**, not as a wildcard. 
NaN-index poisoning (the calibration/vitals/geo class) — **structurally impossible** here: embeddings are SHA-256 → `i32` → `f32` (an `i32` cast to `f32` is always finite, never NaN/Inf), with an all-zero-digest norm guard; probed empty-index search, empty-string query, and `k=0` — all return `Ok(0)`, **no panic**. Fail-closed write path — a removal event yields `Ok(None)`, semantic-index failure is logged not propagated (best-effort, never blocks the durable SQLite write), and `EntityId` parsing failures fall back rather than panic. **6 new pinning tests** (SQL-injection literal-storage, LIKE-metacharacter literalness, history `LIMIT`, purge exclusive-boundary, purge attribute-GC-keeps-shared, purge old-events): `homecore-recorder` **19 → 25** (`--no-default-features`) / **25 → 31** (`--features ruvector`), 0 failed; the purge-boundary test is a true pin (fails deleting 2 rows under an inclusive cutoff, passes deleting 1 under the exclusive cutoff). Behaviour otherwise unchanged; Python deterministic proof unchanged (recorder is off the signal proof path).

 ### Added
+- **Metric-locked PCK/MPJPE accuracy harness — resolves the PCK-definition ambiguity (`wifi-densepose-train`, needs ADR slot 173).** The SOTA brief (`docs/research/sota-nn-train-benchmark-brief.md` §1, §3.1, §4) found the single biggest threat to any "beyond-SOTA" claim is **metric ambiguity**: three PCK@20 figures (96.09% WiFlow-STD image-normalized, 81.63% AetherArena torso-PCK, 61.1% GraphPose-Fi standard PCK) cannot be lined up because each silently uses a different normalization — the project was retracted twice over this (a withdrawn "92.9%" used *absolute* pixels, not torso). New `src/accuracy.rs` makes the normalizer **explicit, selectable, and carried with every reported number**: a `PckNormalization` enum (`TorsoDiameter` = standard MM-Fi/GraphPose-Fi hip↔hip; `BoundingBoxDiagonal` = looser WiFlow-STD image-normalized; `AbsolutePixels(threshold)` = the retracted convention, included so historical numbers are reproducible and clearly labeled non-comparable); one canonical `pck_at(pred, gt, vis, k, normalization)` reusing the `metrics_core` geometric primitives (hip distance, bbox diagonal — no duplicate kernel); `mpjpe(pred, gt, vis)` (2D/3D, mm); and a self-describing `PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints, n_frames }` returned by `accuracy_report(frames, ks, normalization)` so an **unlabeled PCK number is structurally impossible**. **17 hand-computed deterministic tests** (no GPU, no datasets) prove the harness arithmetic: perfect→PCK=1.0/MPJPE=0; all-just-outside→0.0; half-in-half-out→0.5; the **key proof** that identical predictions score 0.50 (torso) / 1.00 (bbox) / 0.75 (abs) under the three normalizations (the ambiguity is real and the definitions are distinct); MPJPE 2D/3D fixtures; and graceful degenerate handling (zero torso, empty frames, NaN coords — no panic, never a false-perfect). 
**This is measurement infrastructure, not an accuracy claim** — the tests prove the harness is correct, not that any model is good. `wifi-densepose-train` lib 191→206, `test_metrics` 12→14, 0 failed. Python deterministic proof unchanged (off the signal proof path).
 - **RuField `rufield-viewer` live-ingest mode — closes the RuView↔RuField visual loop (ADR-262 surfaces).** The dashboard gains `--source live --upstream <RuView-URL>`: it consumes RuView's `/ws/field` SSE (falling back to polling `/api/field`), **verifies every event's ed25519 provenance receipt on ingest** (`is_fusable`) — forged/tampered events are flagged ✗ and **never fused** into trusted inferences — and renders real RuView `FieldEvent`s through the same room-state/privacy-badge/fusion-graph/receipt path the synthetic mode uses (wire-compatible by construction: both sides use `rufield_core::FieldEvent` serde). **Strict banner honesty:** a single `BannerState` shows `SYNTHETIC` / `LIVE — <upstream>` / `DISCONNECTED — <upstream> unreachable`, mutually exclusive — never SYNTHETIC while showing live data or vice versa; live mode returns **409** on `/api/run` rather than fabricate a synthetic run, and starts DISCONNECTED until first verified contact. Default stays synthetic. 26 tests / 0 failed. `ruvnet/rufield` `crates/rufield-viewer`; `vendor/rufield` submodule bumped.
 - **ADR-262 P3 — live RuField surface: RuView's running sensing-server now speaks RuField on `/api/field` + `/ws/field`.** Wires the P1 `wifi-densepose-rufield` bridge into the live `wifi-densepose-sensing-server` (the bridge is the only added coupling, ADR-262 §5.4). A new `src/rufield_surface.rs` module (kept out of the 8k-line `main.rs`) holds a `FieldSurface` with a **dedicated ed25519 `Signer`**, a bounded ring buffer of recent signed events (`FIELD_RING_CAPACITY = 64`), and the `/ws/field` broadcast topic; it exposes `GET /api/field` (latest signed `FieldEvent`s + signer pubkey + a `dev_signing_key` flag) and `GET /ws/field` (per-cycle stream, mirroring `/ws/sensing`), plus a standalone `router()` for isolated testing. **Tap:** at the ESP32 governed-trust cycle (`main.rs` `observe_cycle` ~`:5886` / `SensingUpdate` build ~`:5938`), `emit_rufield_event` joins the cycle's real `SensingUpdate` (features/classification/signal_field) with the engine's recorded `effective_class`/`demoted` trust state into a `SensingSnapshot` and surfaces a signed `FieldEvent` — **existing endpoints (`/ws/sensing` etc.) are unchanged; this is purely additive.** **Signer (defers the P2 key decision, §8 Q1):** a **standalone dev/sensing key** from `WDP_RUFIELD_SIGNING_SEED` (64-hex or ≥32-byte value), else a deterministic dev default with a logged `WARN` — reusing the `cog-ha-matter` Ed25519 key is the deferred P2 call, so P3 does not pre-empt it. **Egress privacy (fail-closed):** `network_egress_allowed` is *stricter* than `DefaultPrivacyGuard` for an unattended live surface — only **P1/P2** leave the box; P0 (raw) and P3/P4/P5 are held edge-local, so a `Derived → P4/P5` cycle **never** surfaces; no-presence cycles emit **no phantom event**. 
**P3 acceptance gates (`tests/rufield_surface_test.rs`, 4 integration via `tower::oneshot` + 4 module unit, 0 failed):** a well-formed **signed** event (`Modality::WifiCsi`, P2 not P1, `is_fusable` ed25519-verified, real timestamp); empty cycle → no phantom; **privacy-safety** — an injected `Derived` trust never surfaces; a mixed stream surfaces only egress-safe events. **Honest scope (ADR-262 §0/§6):** real plumbing on a **live endpoint**, **NOT accuracy** — single-link CSI with its existing caveats (no validated room-coordinate accuracy — `field_localize`), a dedicated dev signing key pending the P2 ownership decision, no accuracy claim. The win is narrowly: "RuView's live sensing now speaks RuField on `/ws/field`."
 - **ADR-262 P1 — `wifi-densepose-rufield` anti-corruption bridge: RuView WiFi-CSI sensing → signed RuField `FieldEvent`s.** A new v2 workspace member (the *single coupling point* between RuView and the standalone RuField MFS spec, ADR-262 §5.4) that **path-deps** the `vendor/rufield` submodule crates (`rufield-core`/`-provenance`/`-privacy`/`-fusion` — pure-Rust, `--no-default-features`-buildable: serde/sha2/ed25519/toml only, no tch/openblas/ndarray/candle) and **no** RuView internal crate. The bridge takes owned primitives — `SensingSnapshot` mirrors the `/ws/sensing` `SensingUpdate` (features + classification + signal_field) joined with the `TrustedOutput` trust state (`trust_class`/`demoted`/`identity_bound`) — and `snapshot_to_field_event()` emits one **signed** `FieldEvent` (`Modality::WifiCsi`, axis `[Frequency]`): a real `FieldTensor` from the feature scalars with the real `timestamp_ns`; an `Observation` whose `range_m`/`motion_vector`/`space_cell` are derived from the strongest **signal-field peak** when present (else `None` — coordinates are **never fabricated**, per the `field_localize` caveat) and `confidence` from the classification; a real `ProvenanceRef` (sha256 over the tensor bytes, `synthetic=false`) **ed25519-signed** so `rufield_provenance::is_fusable` passes. **The §3.3 privacy mapping is the critical correctness item**, implemented as `map_privacy()` mapping RuView's class onto RuField P0–P5 **by information content, NEVER by byte value** and **fail-closed**: RuView `Derived` (byte `1`, which sorts *below* `Anonymous` byte `2`) carries an identity embedding → maps to **P4** (or **P5** if identity-bound), **never P1** (the single most dangerous mapping mistake); `Raw → P0`, `Anonymous → P2`, `Restricted → P2`; a governed-engine `demoted` cycle floors the egress class to ≥ P2 with raw suppressed. 
**P1 acceptance gates (15 tests / 0 failed — 5 unit + 9 integration + 1 doc):** round-trip (`SensingSnapshot → FieldEvent →` serde `→` equal), `is_fusable` (verified ed25519 receipt), `RuFieldFusion::ingest` accept + `infer()` runs, **privacy-safety** (`gate_privacy_safety_derived_never_maps_to_low_privacy` — `Derived → P4/P5`, never P1; a table test over every RuView class; fail-closed demotion), and determinism (same snapshot + same signer seed → byte-identical event). **Honest scope:** this is **P1 plumbing** — a tested conversion + a safe privacy mapping. It is **not** wired into the live server (that is P3) and makes **no accuracy claim** (RuField v0.1 is synthetic; RuView's single-link CSI carries its own caveats). CI: the `rust-tests` workflow checkout gains `submodules: recursive` so the path-deps resolve. Python deterministic proof unchanged (off the signal proof path).
--- a/docs/adr/ADR-173-metric-locked-pck-mpjpe-accuracy-harness.md
+++ b/docs/adr/ADR-173-metric-locked-pck-mpjpe-accuracy-harness.md
@@ -0,0 +1,123 @@
+# ADR-173: Metric-Locked PCK/MPJPE Accuracy Harness
+
+| Field | Value |
+|-------|-------|
+| **Status** | Accepted — implemented, deterministically tested |
+| **Date** | 2026-06-15 |
+| **Deciders** | ruv |
+| **Codename** | **METRIC-LOCK** |
+| **Amends** | ADR-155 (generalizes the torso-only `metrics_core::pck_canonical` to a selectable normalization) |
+| **Motivated by** | `docs/research/sota-nn-train-benchmark-brief.md` (PR #1090) |
+
+## Context
+
+The SOTA-research brief (PR #1090) identified the single biggest
+threat to any "beyond-SOTA" accuracy claim this project makes: **metric
+ambiguity**. Three PCK@20 numbers circulate, computed under three *different and
+unstated* normalizations, so they cannot be compared:
+
+- **96.09–96.61%** — WiFlow-STD reproduction, **image/bounding-box-normalized** PCK (the looser convention).
+- **81.63%** — an internal MM-Fi number reported as **"torso-PCK"** (tighter).
+- **61.1%** — GraphPose-Fi (arXiv 2511.19105), **standard torso-diameter** PCK on the MM-Fi random split (the academic frontier).
+
+The project has been burned by this twice: a previously-published 92.9% was
+retracted because it used **absolute-pixel** normalization, not torso. Until
+there is *one canonical, documented, tested* PCK definition — and every reported
+number carries the definition it was computed under — no accuracy comparison is
+credible, and the "prove everything" bar cannot be met for the benchmark half of
+the work.
+
+This is measurement infrastructure, not an accuracy claim. The deliverable's job
+is to make the metric **unambiguous and reproducible**, so future numbers are
+comparable and an unlabeled PCK is structurally impossible.
+
+## Decision
+
+Add a metric-locked accuracy harness as a new module
+`v2/crates/wifi-densepose-train/src/accuracy.rs` (404 non-test lines; inline
+deterministic tests bring the file to 708), re-exported at the crate root. It
+**extends, not duplicates** — it reuses `metrics_core`'s geometric primitives
+(`bounding_box_diagonal`, canonical hip indices `CANON_LEFT_HIP/RIGHT_HIP`), so
+there remains exactly one implementation of each geometric reference; the
+existing ADR-155 `pck_canonical` (torso-only) is unchanged and this generalizes
+it.
+
+### Public API
+
+- `enum PckNormalization { TorsoDiameter, BoundingBoxDiagonal, AbsolutePixels(f32) }`
+  — the three conventions the three historical numbers used, now **explicit and
+  selectable**. `.label()` / `.tolerance(...)`.
+- `pck_at(pred, gt, vis, k, norm) -> (correct, total, pck)` — PCK@k =
+  fraction of *visible* keypoints whose predicted-vs-GT distance ≤ the tolerance,
+  where tolerance = `k%` of the chosen normalizer (or an absolute threshold for
+  `AbsolutePixels`).
+- `mpjpe(pred, gt, vis) -> f32` — mean per-joint position error (2D/3D, coordinate
+  units; mm for mm inputs). Re-exported crate-root as `pck_mpjpe` to avoid
+  colliding with the existing `eval::mpjpe`.
+- `struct PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints, n_frames }`
+  — **a reported number always carries its `normalization`**; an unlabeled PCK is
+  structurally impossible to produce through this surface.
+- `struct PoseFrame { pred, gt, visibility }` + `accuracy_report(frames, ks, norm) -> PoseAccuracy`
+  (micro-averaged over keypoints).
+
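The "a reported number always carries its `normalization`" rule can be illustrated with a small standalone sketch. It uses only the standard library and does not call the crate. The names `Norm`, `LabeledPck`, and `delta` are hypothetical stand-ins chosen for this illustration, not the harness API. The point is only that a comparison between two PCK values can be refused when their definitions differ.

```rust
// Standalone sketch (std only). `Norm`, `LabeledPck`, and `delta` are
// hypothetical names for illustration; they are not the crate's API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Norm {
    TorsoDiameter,
    BoundingBoxDiagonal,
    AbsolutePixels(f32),
}

/// A PCK value that cannot be built without the definition that produced it.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LabeledPck {
    value: f32,
    norm: Norm,
}

/// Difference `a - b`, or `None` when the two values were computed under
/// different normalizations and are therefore not comparable.
fn delta(a: LabeledPck, b: LabeledPck) -> Option<f32> {
    if a.norm == b.norm {
        Some(a.value - b.value)
    } else {
        None
    }
}

fn main() {
    let torso = LabeledPck { value: 0.50, norm: Norm::TorsoDiameter };
    let bbox = LabeledPck { value: 1.00, norm: Norm::BoundingBoxDiagonal };
    let abs = LabeledPck { value: 0.75, norm: Norm::AbsolutePixels(0.08) };
    let baseline = LabeledPck { value: 0.25, norm: Norm::TorsoDiameter };

    // Mixed definitions: the comparison is refused.
    assert_eq!(delta(torso, bbox), None);
    assert_eq!(delta(abs, torso), None);
    // Same definition: the difference is meaningful.
    assert_eq!(delta(torso, baseline), Some(0.25));
    println!("torso vs baseline: {:?}", delta(torso, baseline));
}
```

In the harness itself the same information lives in the `normalization` field of `PoseAccuracy`; whether callers enforce the check this way is a design choice left to them.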
+### Correctness is proven by hand-computed deterministic tests (no GPU, no data)
+
+The tests construct synthetic keypoint sets whose PCK/MPJPE can be computed by
+hand, and assert the harness matches. Highlights (all pass):
+
+| Test | Construction | Expected |
+|------|--------------|----------|
+| perfect_prediction | pred==gt | PCK=1.0 (all 3 norms), MPJPE=0 |
+| all_just_outside | every error just past τ@20 | PCK=0.0 |
+| half_in_half_out | 2 exact, 2 just outside | PCK=0.5 |
+| **three_normalizations (KEY PROOF)** | identical pred; nose err .06, shoulder .10, hips exact | torso=**0.50**, bbox=**1.00**, abs(.08)=**0.75** |
+| mpjpe_2d / mpjpe_3d | (3,4)→5 / (1,2,2)→3 | 2.5 / 3.0 |
+| mpjpe_excludes_invisible | invisible joint err 100 ignored | 5.0 |
+| zero_torso_unscoreable | coincident hips | `(0,0,0.0)`, **not** false-perfect |
+| no_visible_keypoints | vis=∅ | `(0,0,0.0)` |
+| nan_coords | one NaN pred coord | counted wrong, **no panic** |
+| empty report | no frames | 0.0, **not** NaN |
+| bbox≥torso ordering | same frames | bbox-PCK ≥ torso-PCK |
+
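The MPJPE rows of the table can be checked by hand with a short standalone sketch. It uses plain slices instead of the crate's `ndarray` inputs, and `mpjpe_sketch` is a hypothetical helper written for this illustration. The fixtures are assumptions consistent with the table (one joint off by (3, 4) plus one exact joint for the 2D case), not copies of the real test data.

```rust
// Standalone sketch (std only); `mpjpe_sketch` is a hypothetical helper that
// mirrors the documented rule: mean Euclidean error over visible joints.
fn mpjpe_sketch<const D: usize>(pred: &[[f32; D]], gt: &[[f32; D]], vis: &[f32]) -> f32 {
    let n = pred.len().min(gt.len()).min(vis.len());
    let mut sum = 0.0f32;
    let mut count = 0usize;
    for j in 0..n {
        // Joints below the 0.5 visibility cutoff are excluded.
        if vis[j] < 0.5 {
            continue;
        }
        let sq: f32 = (0..D)
            .map(|c| {
                let diff = pred[j][c] - gt[j][c];
                diff * diff
            })
            .sum();
        sum += sq.sqrt();
        count += 1;
    }
    // No visible joint means no evidence: report 0.0 rather than NaN.
    if count > 0 {
        sum / count as f32
    } else {
        0.0
    }
}

fn main() {
    // 2D: errors 5 and 0 over two visible joints, mean 2.5.
    let pred2 = [[3.0f32, 4.0], [1.0, 1.0]];
    let gt2 = [[0.0f32, 0.0], [1.0, 1.0]];
    assert_eq!(mpjpe_sketch(&pred2, &gt2, &[1.0, 1.0]), 2.5);

    // 3D: a single joint off by (1, 2, 2), error 3.
    let pred3 = [[1.0f32, 2.0, 2.0]];
    let gt3 = [[0.0f32, 0.0, 0.0]];
    assert_eq!(mpjpe_sketch(&pred3, &gt3, &[1.0]), 3.0);

    // The invisible joint (error 100) is ignored, leaving the error of 5.
    let pred_inv = [[3.0f32, 4.0], [100.0, 0.0]];
    let gt_inv = [[0.0f32, 0.0], [0.0, 0.0]];
    assert_eq!(mpjpe_sketch(&pred_inv, &gt_inv, &[1.0, 0.0]), 5.0);
    println!("all MPJPE fixtures match");
}
```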
+### The key proof (the ambiguity is real and quantified)
+
+Identical predictions, three declared normalizations → **0.50 / 1.00 / 0.75**.
+Mechanism: the bbox diagonal `√(0.20² + 0.80²) = 0.825` is ~4× the hip-span torso
+`0.20`, so τ@20 is 0.165 (bbox) vs 0.040 (torso) — the looser image-normalized
+convention passes joints the strict torso convention rejects. This is *exactly*
+why 96% / 81.6% / 61% cannot be lined up without declaring the enum, demonstrated
+in-code.
+
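The 0.50 / 1.00 / 0.75 result can be reproduced with a minimal standalone sketch. The coordinates below are assumptions picked to match the stated geometry (hip span 0.20, bbox 0.20 × 0.80, nose error 0.06, shoulder error 0.10); they are not the fixture from the crate's tests. `Norm`, `tolerance`, and `pck` are hypothetical names. Visibility and degenerate-scale handling are left out to keep the arithmetic visible.

```rust
// Standalone sketch (std only); names and coordinates are illustrative
// assumptions, not the crate's API or its test fixture.
#[derive(Clone, Copy)]
enum Norm {
    Torso,
    Bbox,
    Abs(f32),
}

type P = [f32; 2];

fn dist(a: P, b: P) -> f32 {
    ((a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2)).sqrt()
}

/// Tolerance for PCK@k: k% of the reference length, or the raw threshold.
fn tolerance(gt: &[P], hips: (usize, usize), k: u8, norm: Norm) -> f32 {
    match norm {
        Norm::Torso => k as f32 / 100.0 * dist(gt[hips.0], gt[hips.1]),
        Norm::Bbox => {
            // Axis-aligned bounding box of the ground-truth keypoints.
            let (mut lo, mut hi) = ([f32::MAX; 2], [f32::MIN; 2]);
            for p in gt {
                for c in 0..2 {
                    lo[c] = lo[c].min(p[c]);
                    hi[c] = hi[c].max(p[c]);
                }
            }
            k as f32 / 100.0 * dist(lo, hi)
        }
        Norm::Abs(t) => t,
    }
}

/// Fraction of keypoints whose error is within the tolerance.
fn pck(pred: &[P], gt: &[P], hips: (usize, usize), k: u8, norm: Norm) -> f32 {
    let tol = tolerance(gt, hips, k, norm);
    let correct = pred.iter().zip(gt).filter(|(p, g)| dist(**p, **g) <= tol).count();
    correct as f32 / gt.len() as f32
}

fn main() {
    // Order: nose, shoulder, left hip, right hip.
    let gt: [P; 4] = [[0.50, 0.10], [0.45, 0.30], [0.40, 0.90], [0.60, 0.90]];
    let pred: [P; 4] = [[0.56, 0.10], [0.45, 0.40], [0.40, 0.90], [0.60, 0.90]];
    let hips = (2, 3);

    // Torso tolerance is about 0.040: only the two exact hips pass.
    assert_eq!(pck(&pred, &gt, hips, 20, Norm::Torso), 0.50);
    // Bbox tolerance is about 0.165: all four keypoints pass.
    assert_eq!(pck(&pred, &gt, hips, 20, Norm::Bbox), 1.00);
    // Absolute threshold 0.08: the nose passes, the shoulder does not.
    assert_eq!(pck(&pred, &gt, hips, 20, Norm::Abs(0.08)), 0.75);
    println!("torso / bbox / abs = 0.50 / 1.00 / 0.75");
}
```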
+## Validation
+
+- `cargo test -p wifi-densepose-train --no-default-features` → lib **191 → 206**
+  (+15), `test_metrics` **12 → 14** (+2), doc-tests 8 — **0 failed**.
+- `cargo test --workspace --no-default-features` → **exit 0**, 0 failed.
+- `python archive/v1/data/proof/verify.py` → **VERDICT: PASS**, hash
+  `f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a` **unchanged**
+  (off the signal proof path — confirms no pipeline alteration).
+
+## Consequences
+
+### Positive
+- The three historical PCK numbers can now be **recomputed under one declared
+  definition** and compared honestly. The retracted-number class of error
+  (silent normalization mismatch) is structurally prevented going forward.
+- Establishes the measurement substrate for the beyond-SOTA target: GraphPose-Fi
+  cross-environment **PCK@20 = 12.9%** (standard torso PCK) is now a number this
+  harness can produce comparably.
+
+### Negative
+- None functional. The harness is additive; no existing metric path changed.
+
+### Neutral
+- Producing actual model numbers under this harness requires the trained models
+  and datasets (MM-Fi, plus the cross-domain splits). That work is the next
+  sub-deliverable of the benchmark/optimization milestone and is out of scope
+  here (this ADR is the *instrument*, not the *reading*).
+
+## Links
+- ADR-155 — metric core (`pck_canonical`, torso-only) — generalized here
+- ADR-152 — WiFi-Pose SOTA 2026 intake / WiFlow-STD benchmark
+- `docs/research/sota-nn-train-benchmark-brief.md` — the motivating gap analysis
+- GraphPose-Fi — arXiv 2511.19105 (verified cross-env PCK@20 = 12.9% anchor)
--- a/v2/crates/wifi-densepose-train/src/accuracy.rs
+++ b/v2/crates/wifi-densepose-train/src/accuracy.rs
@@ -0,0 +1,708 @@
+//! Metric-locked pose-accuracy harness (ADR-155 §Tier-1.2; needs ADR slot 173).
+//!
+//! # Why this module exists
+//!
+//! Three PCK\@20 numbers float around this project and **cannot be lined up**
+//! because each silently uses a *different* PCK definition:
+//!
+//! | Number | Source | PCK normalization |
+//! |--------|--------|-------------------|
+//! | 96.09 %  | WiFlow-STD reproduction | image / bounding-box normalized (looser) |
+//! | 81.63 %  | AetherArena MM-Fi (ADR-150) | torso-diameter (standard MM-Fi / GraphPose-Fi) |
+//! | 61.1 %   | GraphPose-Fi (preprint) | torso-diameter, 3D, mm-scale (harder) |
+//!
+//! The project was burned **twice** by metric ambiguity (a now-retracted "92.9 %
+//! PCK\@20" used *absolute* pixel thresholds, not torso normalization). The fix
+//! is to make the normalizer **explicit, selectable, and carried with every
+//! reported number** so an unlabeled PCK figure is structurally impossible.
+//!
+//! [`metrics_core`](crate::metrics_core) already pins the *canonical*
+//! torso-normalized PCK ([`pck_canonical`](crate::metrics_core::pck_canonical)).
+//! This module generalizes it to a [`PckNormalization`] enum covering all three
+//! conventions the SOTA brief names, adds [`mpjpe`] (mm), and bundles results
+//! into a self-describing [`PoseAccuracy`] struct. It **reuses** the
+//! `metrics_core` primitives (hip distance, bounding-box diagonal) — there is
+//! still exactly one implementation of each geometric reference.
+//!
+//! # This is measurement infrastructure, not an accuracy claim
+//!
+//! Nothing here asserts any project model is good. The unit tests prove the
+//! *harness* is arithmetically correct against hand-computed fixtures (no GPU,
+//! no datasets), including the key demonstration that the **same predictions
+//! score different PCK under the three normalizations** — proof the ambiguity is
+//! real and the definitions are genuinely distinct.
+//!
+//! # Literature
+//!
+//! - Torso-diameter PCK is the MM-Fi / GraphPose-Fi convention (Yang et al.,
+//!   *GraphPose-Fi*, arXiv:2511.19105): a keypoint is correct iff its error is
+//!   within `k · d_torso`, with `d_torso` the hip↔hip (or shoulder↔hip) span.
+//! - Bounding-box / image-normalized PCK is the WiFlow-STD-style looser
+//!   convention (arXiv:2602.08661) — normalize by the GT pose bbox diagonal.
+//! - MPJPE (mean per-joint position error, mm) is reported by GraphPose-Fi and
+//!   Person-in-WiFi-3D (Yan et al., CVPR 2024).
+
+use std::collections::BTreeMap;
+
+use ndarray::{Array1, Array2};
+
+use crate::metrics_core::{
+    bounding_box_diagonal, CANON_LEFT_HIP, CANON_RIGHT_HIP,
+};
+
+/// Visibility cutoff: a keypoint counts as *visible* iff `visibility[j] >= 0.5`
+/// (COCO convention; matches [`crate::metrics_core`]).
+const VISIBILITY_THRESHOLD: f32 = 0.5;
+
+/// Minimum positive normalizer extent. Below this the reference scale is
+/// considered degenerate (zero torso, collapsed bbox) and the frame is reported
+/// unscoreable rather than dividing by ≈0.
+const MIN_REFERENCE_EXTENT: f32 = 1e-6;
+
+// ===========================================================================
+// PCK normalization — the explicit, selectable definition
+// ===========================================================================
+
+/// The PCK normalization basis — **the single knob that made three project
+/// numbers non-comparable**, now explicit and carried with every result.
+///
+/// A keypoint `j` (with `visibility[j] >= 0.5`) is *correct* iff
+/// `‖pred_j − gt_j‖₂ ≤ τ`, where the **distance tolerance `τ`** is derived from
+/// the chosen normalization and the PCK threshold `k` (given as a percentage,
+/// e.g. `20` for PCK\@20):
+///
+/// | Variant | `τ` (tolerance in coordinate units) |
+/// |---------|--------------------------------------|
+/// | [`TorsoDiameter`](Self::TorsoDiameter)        | `(k/100) · d_torso` |
+/// | [`BoundingBoxDiagonal`](Self::BoundingBoxDiagonal) | `(k/100) · d_bbox`  |
+/// | [`AbsolutePixels`](Self::AbsolutePixels)      | `threshold` (k ignored) |
+///
+/// `d_torso` is the hip↔hip span (COCO joints 11↔12). It falls back to the bbox
+/// diagonal unless both hips are visible and the span is non-degenerate (the
+/// same rule as [`crate::metrics_core::canonical_torso_size`]). `d_bbox` is the
+/// diagonal of the axis-aligned bounding box of all visible GT keypoints.
+///
+/// These give **different tolerances**, and so can give different PCK, on the
+/// *same* predictions whenever `d_torso ≠ d_bbox` (always true for a real pose:
+/// the bbox is larger than the hip span), which is exactly why the 96 / 81.6 /
+/// 61 numbers cannot be lined up without declaring this enum.
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub enum PckNormalization {
+    /// **Torso-diameter** (hip↔hip span). The standard MM-Fi / GraphPose-Fi
+    /// convention and the *stricter* of the two relative normalizers. This is
+    /// the canonical default ([`crate::metrics_core::pck_canonical`]).
+    TorsoDiameter,
+    /// **Bounding-box diagonal** (a.k.a. image-normalized). The looser
+    /// WiFlow-STD-style convention: normalize by the GT pose bbox diagonal,
+    /// which is larger than the torso span ⇒ a more forgiving threshold ⇒ a
+    /// higher PCK on identical predictions.
+    BoundingBoxDiagonal,
+    /// **Absolute pixel/coordinate threshold** — no pose-relative
+    /// normalization. The PCK `k` percentage is ignored; the held `threshold`
+    /// is the raw distance tolerance directly. Included so historical
+    /// retracted-style numbers are reproducible, and **clearly labeled as
+    /// non-comparable** to the relative variants (it does not scale with body
+    /// size or camera distance).
+    AbsolutePixels(f32),
+}
+
+impl PckNormalization {
+    /// Human-readable, *self-documenting* label for a reported number — so a
+    /// `PoseAccuracy` printed anywhere always carries its definition.
+    pub fn label(&self) -> String {
+        match self {
+            PckNormalization::TorsoDiameter => "torso-diameter".to_string(),
+            PckNormalization::BoundingBoxDiagonal => "bbox-diagonal".to_string(),
+            PckNormalization::AbsolutePixels(t) => format!("absolute-px({t})"),
+        }
+    }
+
+    /// Compute the per-frame distance tolerance `τ` for PCK threshold `k`
+    /// (percentage). Returns `None` when the frame cannot be scored: the
+    /// relative normalizer is degenerate, or the `AbsolutePixels` threshold is
+    /// not strictly positive (this includes NaN).
+    ///
+    /// `gt_kpts` is `[n, 2]` (or `[n, ≥2]`, only x/y used); `visibility` is `[n]`.
+    fn tolerance(&self, gt_kpts: &Array2<f32>, visibility: &Array1<f32>, k: u8) -> Option<f32> {
+        let n = gt_kpts.shape()[0].min(visibility.len());
+        match self {
+            PckNormalization::AbsolutePixels(threshold) => {
+                // Raw tolerance, independent of pose scale and of `k`.
+                if *threshold > 0.0 {
+                    Some(*threshold)
+                } else {
+                    None
+                }
+            }
+            PckNormalization::TorsoDiameter => {
+                let d = torso_diameter(gt_kpts, visibility, n)?;
+                Some((k as f32 / 100.0) * d)
+            }
+            PckNormalization::BoundingBoxDiagonal => {
+                let d = bounding_box_diagonal(gt_kpts, visibility, n);
+                if d > MIN_REFERENCE_EXTENT {
+                    Some((k as f32 / 100.0) * d)
+                } else {
+                    None
+                }
+            }
+        }
+    }
+}
+
+/// Hip↔hip torso diameter with a bbox-diagonal fallback — the relative
+/// normalizer shared by `TorsoDiameter` PCK and
+/// [`crate::metrics_core::canonical_torso_size`]. Returns `None` when no
+/// positive-extent reference exists.
+fn torso_diameter(gt_kpts: &Array2<f32>, visibility: &Array1<f32>, n: usize) -> Option<f32> {
+    if CANON_LEFT_HIP < n
+        && CANON_RIGHT_HIP < n
+        && visibility[CANON_LEFT_HIP] >= VISIBILITY_THRESHOLD
+        && visibility[CANON_RIGHT_HIP] >= VISIBILITY_THRESHOLD
+    {
+        let dx = gt_kpts[[CANON_LEFT_HIP, 0]] - gt_kpts[[CANON_RIGHT_HIP, 0]];
+        let dy = gt_kpts[[CANON_LEFT_HIP, 1]] - gt_kpts[[CANON_RIGHT_HIP, 1]];
+        let torso = (dx * dx + dy * dy).sqrt();
+        if torso > MIN_REFERENCE_EXTENT {
+            return Some(torso);
+        }
+    }
+    let diag = bounding_box_diagonal(gt_kpts, visibility, n);
+    if diag > MIN_REFERENCE_EXTENT {
+        Some(diag)
+    } else {
+        None
+    }
+}
+
+// ===========================================================================
+// Single-frame PCK / MPJPE
+// ===========================================================================
+
+/// Per-frame **PCK\@`k`** under the selected `normalization`.
+///
+/// A keypoint `j` with `visibility[j] >= 0.5` is correct iff
+/// `‖pred_j − gt_j‖₂ ≤ τ`, with `τ` from
+/// [`PckNormalization::tolerance`]. Only x/y are used (2D PCK is the standard
+/// keypoint-PCK definition; pass 2-column arrays).
+///
+/// # Returns
+/// `(correct, total, pck)` with `pck ∈ [0,1]`. **`(0, 0, 0.0)`** when no
+/// keypoint is visible, or when no tolerance can be derived (degenerate
+/// reference scale for the relative normalizers, non-positive threshold for
+/// `AbsolutePixels`): a frame with no measurable evidence scores 0, never 1.
+/// NaN-valued coordinates make a keypoint *incorrect* (the `<=` comparison is
+/// false for NaN) rather than panicking.
+pub fn pck_at(
+    pred_kpts: &Array2<f32>,
+    gt_kpts: &Array2<f32>,
+    visibility: &Array1<f32>,
+    k: u8,
+    normalization: PckNormalization,
+) -> (usize, usize, f32) {
+    let n = pred_kpts.shape()[0]
+        .min(gt_kpts.shape()[0])
+        .min(visibility.len());
+    let tol = match normalization.tolerance(gt_kpts, visibility, k) {
+        Some(t) => t,
+        None => return (0, 0, 0.0),
+    };
+
+    let mut correct = 0usize;
+    let mut total = 0usize;
+    for j in 0..n {
+        if visibility[j] < VISIBILITY_THRESHOLD {
+            continue;
+        }
+        total += 1;
+        let dx = pred_kpts[[j, 0]] - gt_kpts[[j, 0]];
+        let dy = pred_kpts[[j, 1]] - gt_kpts[[j, 1]];
+        let dist = (dx * dx + dy * dy).sqrt();
+        // NaN-safe: `NaN <= tol` is false, so a NaN coordinate counts as wrong.
+        if dist <= tol {
+            correct += 1;
+        }
+    }
+    let pck = if total > 0 {
+        correct as f32 / total as f32
+    } else {
+        0.0
+    };
+    (correct, total, pck)
+}
+
+/// Per-frame **MPJPE** (mean per-joint position error) over visible keypoints,
+/// in the coordinate units of the inputs (report as mm when inputs are mm).
+///
+/// `pred`/`gt` are `[n, D]` with `D ∈ {2, 3}` (2D or 3D pose); all `D` columns
+/// are used. Joints with `visibility[j] < 0.5` are excluded.
+///
+/// Returns `0.0` when no keypoint is visible (no evidence). A NaN coordinate
+/// propagates into the returned mean (callers filter NaN frames upstream); it
+/// does not panic.
+pub fn mpjpe(pred: &Array2<f32>, gt: &Array2<f32>, visibility: &Array1<f32>) -> f32 {
+    let n = pred.shape()[0].min(gt.shape()[0]).min(visibility.len());
+    let d = pred.shape()[1].min(gt.shape()[1]);
+    let mut sum = 0.0f32;
+    let mut count = 0usize;
+    for j in 0..n {
+        if visibility[j] < VISIBILITY_THRESHOLD {
+            continue;
+        }
+        let mut sq = 0.0f32;
+        for c in 0..d {
+            let diff = pred[[j, c]] - gt[[j, c]];
+            sq += diff * diff;
+        }
+        sum += sq.sqrt();
+        count += 1;
+    }
+    if count > 0 {
+        sum / count as f32
+    } else {
+        0.0
+    }
+}
+
+// ===========================================================================
+// Self-describing result struct + batch report
+// ===========================================================================
+
+/// A pose-accuracy result that **always carries the definition it was computed
+/// under** — making an unlabeled PCK number structurally impossible.
+///
+/// Built by [`accuracy_report`] over a set of frames. `pck_at` maps each
+/// requested threshold `k` (percentage, e.g. `20`) to its PCK in `[0,1]`. The
+/// `normalization` field records *which* PCK definition produced those numbers,
+/// so two `PoseAccuracy` values can only be compared when their `normalization`
+/// matches (the comparability check the project lacked).
+#[derive(Debug, Clone, PartialEq)]
+pub struct PoseAccuracy {
+    /// PCK\@k for each requested threshold percentage `k`, in `[0,1]`.
+    pub pck_at: BTreeMap<u8, f32>,
+    /// Mean per-joint position error in coordinate units (mm for mm inputs).
+    pub mpjpe: f32,
+    /// The normalization basis under which `pck_at` was computed — the label a
+    /// reported number must always carry.
+    pub normalization: PckNormalization,
+    /// Number of keypoints per frame (the pose convention, e.g. 17 for COCO).
+    pub n_keypoints: usize,
+    /// Number of frames aggregated into this result.
+    pub n_frames: usize,
+}
+
+impl PoseAccuracy {
+    /// Convenience accessor for a single threshold, returning `None` when that
+    /// `k` was not requested.
+    pub fn pck(&self, k: u8) -> Option<f32> {
+        self.pck_at.get(&k).copied()
+    }
+
+    /// A one-line, self-documenting summary suitable for logs / RESULTS.md, e.g.
+    /// `PCK@20=0.750 (torso-diameter, 17kp, 1 frames) MPJPE=0.0300`.
+    pub fn summary(&self) -> String {
+        let pcks: Vec<String> = self
+            .pck_at
+            .iter()
+            .map(|(k, v)| format!("PCK@{k}={v:.3}"))
+            .collect();
+        format!(
+            "{} ({}, {}kp, {} frames) MPJPE={:.4}",
+            pcks.join(" "),
+            self.normalization.label(),
+            self.n_keypoints,
+            self.n_frames,
+            self.mpjpe
+        )
+    }
+}
+
+/// One frame's prediction + ground truth + visibility for batch scoring.
+///
+/// All three arrays share row count `n_keypoints`; `pred`/`gt` are `[n, D]`
+/// (`D ∈ {2,3}`), `visibility` is `[n]`.
+#[derive(Debug, Clone)]
+pub struct PoseFrame {
+    /// Predicted keypoints `[n, D]`.
+    pub pred: Array2<f32>,
+    /// Ground-truth keypoints `[n, D]`.
+    pub gt: Array2<f32>,
+    /// Per-keypoint visibility `[n]` (`>= 0.5` ⇒ visible).
+    pub visibility: Array1<f32>,
+}
+
+/// Aggregate [`PoseAccuracy`] over a batch of frames under **one** explicit
+/// `normalization`, for the requested PCK thresholds `ks` (percentages).
+///
+/// PCK is micro-averaged over keypoints (sum of correct ÷ sum of visible across
+/// all frames, the standard keypoint-PCK aggregation), so frames with more
+/// visible joints contribute proportionally. MPJPE is likewise micro-averaged
+/// over visible joints. For PCK, unscoreable frames (no visible joints, or a
+/// degenerate relative normalizer) contribute `(0, 0)` and so are excluded from
+/// the denominator rather than scored as perfect; MPJPE does not depend on the
+/// normalizer, so such frames still contribute their visible joints to it.
+///
+/// An **empty** `frames` slice yields all-zero PCK and `0.0` MPJPE — never a
+/// panic or NaN.
+pub fn accuracy_report(
+    frames: &[PoseFrame],
+    ks: &[u8],
+    normalization: PckNormalization,
+) -> PoseAccuracy {
+    let n_keypoints = frames.first().map(|f| f.gt.shape()[0]).unwrap_or(0);
+
+    // PCK: per-threshold (correct, total) accumulators across frames.
+    let mut pck_acc: BTreeMap<u8, (usize, usize)> = ks.iter().map(|&k| (k, (0, 0))).collect();
+    // MPJPE: sum of per-joint distances and visible-joint count.
+    let mut mpjpe_sum = 0.0f32;
+    let mut mpjpe_count = 0usize;
+
+    for frame in frames {
+        for &k in ks {
+            let (c, t, _) = pck_at(&frame.pred, &frame.gt, &frame.visibility, k, normalization);
+            let entry = pck_acc.entry(k).or_insert((0, 0));
+            entry.0 += c;
+            entry.1 += t;
+        }
+        // Per-frame MPJPE re-derived as a (sum, count) contribution so the
+        // batch value is a true micro-average over joints.
+        let n = frame.pred.shape()[0].min(frame.gt.shape()[0]).min(frame.visibility.len());
+        let d = frame.pred.shape()[1].min(frame.gt.shape()[1]);
+        for j in 0..n {
+            if frame.visibility[j] < VISIBILITY_THRESHOLD {
+                continue;
+            }
+            let mut sq = 0.0f32;
+            for c in 0..d {
+                let diff = frame.pred[[j, c]] - frame.gt[[j, c]];
+                sq += diff * diff;
+            }
+            mpjpe_sum += sq.sqrt();
+            mpjpe_count += 1;
+        }
+    }
+
+    let pck_at: BTreeMap<u8, f32> = pck_acc
+        .into_iter()
+        .map(|(k, (c, t))| {
+            let v = if t > 0 { c as f32 / t as f32 } else { 0.0 };
+            (k, v)
+        })
+        .collect();
+
+    let mpjpe = if mpjpe_count > 0 {
+        mpjpe_sum / mpjpe_count as f32
+    } else {
+        0.0
+    };
+
+    PoseAccuracy {
+        pck_at,
+        mpjpe,
+        normalization,
+        n_keypoints,
+        n_frames: frames.len(),
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    /// Build a 17-joint `[17, 2]` pose from `(joint, x, y)` triples.
+    fn pose17(joints: &[(usize, f32, f32)]) -> Array2<f32> {
+        let mut a = Array2::<f32>::zeros((17, 2));
+        for &(j, x, y) in joints {
+            a[[j, 0]] = x;
+            a[[j, 1]] = y;
+        }
+        a
+    }
+
+    fn vis17(visible: &[usize]) -> Array1<f32> {
+        let mut v = Array1::<f32>::zeros(17);
+        for &j in visible {
+            v[j] = 2.0;
+        }
+        v
+    }
+
+    // -------- consts pinned (no silent metric drift) --------
+    #[test]
+    fn accuracy_consts_unchanged() {
+        assert_eq!(VISIBILITY_THRESHOLD, 0.5_f32);
+        assert_eq!(MIN_REFERENCE_EXTENT, 1e-6_f32);
+    }
+
+    // -------- perfect prediction ⇒ PCK = 1.0, MPJPE = 0 --------
+    #[test]
+    fn perfect_prediction_pck_one_mpjpe_zero() {
+        let gt = pose17(&[
+            (5, 0.35, 0.35),
+            (CANON_LEFT_HIP, 0.40, 0.50),
+            (CANON_RIGHT_HIP, 0.60, 0.50),
+        ]);
+        let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
+        for norm in [
+            PckNormalization::TorsoDiameter,
+            PckNormalization::BoundingBoxDiagonal,
+            PckNormalization::AbsolutePixels(0.01),
+        ] {
+            let (c, t, pck) = pck_at(&gt, &gt, &vis, 20, norm);
+            assert_eq!((c, t), (3, 3), "{norm:?}");
+            assert!((pck - 1.0).abs() < 1e-6, "{norm:?} perfect PCK must be 1.0");
+        }
+        assert_eq!(mpjpe(&gt, &gt, &vis), 0.0);
+    }
+
+    // -------- all keypoints just OUTSIDE threshold ⇒ PCK = 0.0 --------
+    //
+    // Hand calc (torso): hips at (0.40,0.50)/(0.60,0.50) ⇒ torso = 0.20.
+    // threshold k=20 ⇒ τ = 0.20·0.20 = 0.04. Push every scored joint to an
+    // error of 0.05 (> 0.04) ⇒ all wrong. The hips are scored joints too, so
+    // their predictions are displaced as well; the torso is measured on GT
+    // only, so it stays 0.20.
+    #[test]
+    fn all_just_outside_threshold_pck_zero() {
+        let gt = pose17(&[
+            (5, 0.50, 0.50),
+            (CANON_LEFT_HIP, 0.40, 0.50),
+            (CANON_RIGHT_HIP, 0.60, 0.50),
+        ]);
+        // GT torso = 0.20, τ@20 = 0.04. Displace each scored joint by dx=0.05.
+        let pred = pose17(&[
+            (5, 0.55, 0.50),
+            (CANON_LEFT_HIP, 0.45, 0.50),
+            (CANON_RIGHT_HIP, 0.65, 0.50),
+        ]);
+        let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
+        let (c, t, pck) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
+        assert_eq!(t, 3);
+        assert_eq!(c, 0, "all errors 0.05 > τ 0.04 ⇒ none correct");
+        assert_eq!(pck, 0.0);
+    }
+
+    // -------- half-in / half-out ⇒ PCK = 0.5 --------
+    //
+    // Hand calc (torso): torso = 0.20, τ@20 = 0.04. Four visible joints; two
+    // exact (dist 0 ≤ 0.04, correct), two displaced 0.05 (> 0.04, wrong)
+    // ⇒ 2/4 = 0.5.
+    #[test]
+    fn half_in_half_out_pck_half() {
+        let gt = pose17(&[
+            (0, 0.50, 0.20),
+            (5, 0.50, 0.50),
+            (CANON_LEFT_HIP, 0.40, 0.50),
+            (CANON_RIGHT_HIP, 0.60, 0.50),
+        ]);
+        let pred = pose17(&[
+            (0, 0.50, 0.20),          // exact ⇒ correct
+            (5, 0.55, 0.50),          // err 0.05 ⇒ wrong
+            (CANON_LEFT_HIP, 0.40, 0.50),  // exact ⇒ correct
+            (CANON_RIGHT_HIP, 0.65, 0.50), // err 0.05 ⇒ wrong
+        ]);
+        let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
+        let (c, t, pck) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
+        assert_eq!((c, t), (2, 4));
+        assert!((pck - 0.5).abs() < 1e-6, "expected 0.5, got {pck}");
+    }
+
+    // -------- THE KEY PROOF: same predictions, three normalizations, three PCK --------
+    //
+    // One construction scored three ways. Hand calc:
+    //   GT: nose(0)=(0.50,0.10), l_sh(5)=(0.50,0.30),
+    //       l_hip(11)=(0.40,0.90), r_hip(12)=(0.60,0.90).
+    //   Visible = {0,5,11,12}, all four.
+    //   torso  = |0.60-0.40| = 0.20  (hips, y equal).
+    //   bbox: x∈[0.40,0.60] (w=0.20), y∈[0.10,0.90] (h=0.80)
+    //         ⇒ diag = sqrt(0.20² + 0.80²) = sqrt(0.04+0.64)=sqrt(0.68)=0.8246…
+    //
+    //   Pred errors (pure dx): nose 0.06, l_sh 0.10, l_hip 0.00, r_hip 0.00.
+    //
+    //   k = 20:
+    //   • Torso  τ = 0.20·0.20 = 0.040 → nose 0.06 and l_sh 0.10 both > 0.040
+    //       ⇒ 2 correct / 4 = 0.50
+    //   • Bbox   τ = 0.20·0.8246 = 0.16492 → nose and l_sh both ≤ 0.16492
+    //       ⇒ 4 correct / 4 = 1.00
+    //   • Abs(0.08) τ = 0.08 → nose 0.06 ≤ 0.08, l_sh 0.10 > 0.08
+    //       ⇒ 3 correct / 4 = 0.75
+    //
+    //   Why two displaced joints: with only l_sh displaced (0.10), torso and
+    //   any absolute τ < 0.10 both score 0.75. The nose error of 0.06 sits
+    //   between the torso τ (0.04) and the absolute τ (0.08), which separates
+    //   those two cases.
+    #[test]
+    fn three_normalizations_give_different_pck_on_identical_input() {
+        let gt = pose17(&[
+            (0, 0.50, 0.10),  // nose
+            (5, 0.50, 0.30),  // left_shoulder
+            (CANON_LEFT_HIP, 0.40, 0.90),
+            (CANON_RIGHT_HIP, 0.60, 0.90),
+        ]);
+        // nose displaced 0.06, shoulder displaced 0.10, hips exact.
+        let pred = pose17(&[
+            (0, 0.56, 0.10),  // err 0.06
+            (5, 0.60, 0.30),  // err 0.10
+            (CANON_LEFT_HIP, 0.40, 0.90),  // exact
+            (CANON_RIGHT_HIP, 0.60, 0.90), // exact
+        ]);
+        let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
+
+        // Torso τ@20 = 0.04: nose 0.06>0.04 wrong, sh 0.10>0.04 wrong,
+        //   hips exact ⇒ 2/4 = 0.5.
+        let (_, _, torso) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
+        // Bbox diag = sqrt(0.68)=0.82462; τ@20 = 0.164924:
+        //   nose 0.06 ≤ τ correct, sh 0.10 ≤ τ correct, hips exact ⇒ 4/4 = 1.0.
+        let (_, _, bbox) = pck_at(&pred, &gt, &vis, 20, PckNormalization::BoundingBoxDiagonal);
+        // Abs(0.08): nose 0.06 ≤ 0.08 correct, sh 0.10 > 0.08 wrong, hips exact
+        //   ⇒ 3/4 = 0.75.
+        let (_, _, abs) = pck_at(&pred, &gt, &vis, 20, PckNormalization::AbsolutePixels(0.08));
+
+        assert!((torso - 0.5).abs() < 1e-6, "torso PCK expected 0.5, got {torso}");
+        assert!((bbox - 1.0).abs() < 1e-6, "bbox PCK expected 1.0, got {bbox}");
+        assert!((abs - 0.75).abs() < 1e-6, "abs(0.08) PCK expected 0.75, got {abs}");
+
+        // The whole point: identical predictions, three DISTINCT PCK values.
+        assert!(torso != bbox && bbox != abs && torso != abs,
+            "normalizations must give distinct PCK: torso={torso}, bbox={bbox}, abs={abs}");
+    }
+
+    // -------- AbsolutePixels ignores k (raw threshold) --------
+    #[test]
+    fn absolute_pixels_ignores_threshold_percentage() {
+        let gt = pose17(&[(5, 0.50, 0.50), (CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
+        let pred = pose17(&[(5, 0.53, 0.50), (CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
+        let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
+        // τ = 0.05 raw; joint5 err 0.03 ≤ 0.05 correct. k=5 and k=99 must agree.
+        let (_, _, p5) = pck_at(&pred, &gt, &vis, 5, PckNormalization::AbsolutePixels(0.05));
+        let (_, _, p99) = pck_at(&pred, &gt, &vis, 99, PckNormalization::AbsolutePixels(0.05));
+        assert_eq!(p5, p99, "AbsolutePixels must ignore the k percentage");
+        assert!((p5 - 1.0).abs() < 1e-6, "all three within 0.05, got {p5}");
+    }
+
+    // -------- MPJPE hand-computed (2D and 3D) --------
+    #[test]
+    fn mpjpe_hand_computed_2d() {
+        // joint0 err (3,4)->5, joint1 exact->0 ⇒ mean (5+0)/2 = 2.5.
+        let gt = Array2::from_shape_vec((2, 2), vec![0.0, 0.0, 1.0, 1.0]).unwrap();
+        let pred = Array2::from_shape_vec((2, 2), vec![3.0, 4.0, 1.0, 1.0]).unwrap();
+        let vis = Array1::from(vec![2.0, 2.0]);
+        assert!((mpjpe(&pred, &gt, &vis) - 2.5).abs() < 1e-6);
+    }
+
+    #[test]
+    fn mpjpe_hand_computed_3d() {
+        // single joint err (1,2,2) -> sqrt(1+4+4)=3.0.
+        let gt = Array2::from_shape_vec((1, 3), vec![0.0, 0.0, 0.0]).unwrap();
+        let pred = Array2::from_shape_vec((1, 3), vec![1.0, 2.0, 2.0]).unwrap();
+        let vis = Array1::from(vec![2.0]);
+        assert!((mpjpe(&pred, &gt, &vis) - 3.0).abs() < 1e-6);
+    }
+
+    #[test]
+    fn mpjpe_excludes_invisible_joints() {
+        // joint0 visible err 5, joint1 INVISIBLE err 100 ⇒ mean = 5 (joint1 dropped).
+        let gt = Array2::from_shape_vec((2, 2), vec![0.0, 0.0, 0.0, 0.0]).unwrap();
+        let pred = Array2::from_shape_vec((2, 2), vec![3.0, 4.0, 100.0, 0.0]).unwrap();
+        let vis = Array1::from(vec![2.0, 0.0]);
+        assert!((mpjpe(&pred, &gt, &vis) - 5.0).abs() < 1e-6);
+    }
+
+    // -------- degenerate inputs: no panic --------
+    #[test]
+    fn zero_torso_is_unscoreable_not_perfect() {
+        // Both hips coincident ⇒ torso ≈ 0; bbox also collapses ⇒ None.
+        let gt = pose17(&[(CANON_LEFT_HIP, 0.5, 0.5), (CANON_RIGHT_HIP, 0.5, 0.5)]);
+        let vis = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]);
+        assert_eq!(pck_at(&gt, &gt, &vis, 20, PckNormalization::TorsoDiameter), (0, 0, 0.0));
+        assert_eq!(pck_at(&gt, &gt, &vis, 20, PckNormalization::BoundingBoxDiagonal), (0, 0, 0.0));
+    }
+
+    #[test]
+    fn no_visible_keypoints_scores_zero() {
+        let gt = pose17(&[(CANON_LEFT_HIP, 0.4, 0.5), (CANON_RIGHT_HIP, 0.6, 0.5)]);
+        let vis = vis17(&[]); // nothing visible
+        let (c, t, pck) = pck_at(&gt, &gt, &vis, 20, PckNormalization::TorsoDiameter);
+        assert_eq!((c, t, pck), (0, 0, 0.0));
+        assert_eq!(mpjpe(&gt, &gt, &vis), 0.0);
+    }
+
+    #[test]
+    fn nan_coords_do_not_panic_and_count_wrong() {
+        let gt = pose17(&[(5, 0.5, 0.5), (CANON_LEFT_HIP, 0.4, 0.5), (CANON_RIGHT_HIP, 0.6, 0.5)]);
+        let mut pred = gt.clone();
+        pred[[5, 0]] = f32::NAN; // joint 5 prediction is NaN
+        let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
+        let (c, t, pck) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
+        assert_eq!(t, 3);
+        assert_eq!(c, 2, "NaN joint must count as wrong, hips correct ⇒ 2/3");
+        assert!((pck - 2.0 / 3.0).abs() < 1e-6);
+        // mpjpe with a NaN joint yields NaN (caller filters) but must not panic.
+        assert!(mpjpe(&pred, &gt, &vis).is_nan());
+    }
+
+    // -------- batch report: micro-average + self-describing struct --------
+    #[test]
+    fn accuracy_report_micro_averages_and_carries_definition() {
+        // Frame A: 2 visible, both correct (2/2). Frame B: 2 visible, both wrong (0/2).
+        // Micro-average over joints: 2 correct / 4 visible = 0.5. Both frames have
+        // the same visible count, so the mean of per-frame PCK, (1.0+0.0)/2 = 0.5,
+        // coincides here; this test pins the value, not the micro/macro distinction.
+        let gt = pose17(&[(CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
+        let vis = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]);
+        let frame_a = PoseFrame { pred: gt.clone(), gt: gt.clone(), visibility: vis.clone() };
+        // Frame B: displace both hips by 0.05 (> τ 0.04) ⇒ both wrong.
+        let pred_b = pose17(&[(CANON_LEFT_HIP, 0.45, 0.50), (CANON_RIGHT_HIP, 0.65, 0.50)]);
+        let frame_b = PoseFrame { pred: pred_b, gt: gt.clone(), visibility: vis.clone() };
+
+        let report = accuracy_report(
+            &[frame_a, frame_b],
+            &[20, 50],
+            PckNormalization::TorsoDiameter,
+        );
+        assert_eq!(report.n_frames, 2);
+        assert_eq!(report.n_keypoints, 17);
+        assert_eq!(report.normalization, PckNormalization::TorsoDiameter);
+        // PCK@20: 2 correct / 4 visible = 0.5.
+        assert!((report.pck(20).unwrap() - 0.5).abs() < 1e-6);
+        // PCK@50: τ = 0.5·0.20 = 0.10, frame B err 0.05 ≤ 0.10 ⇒ all correct
+        //   ⇒ 4/4 = 1.0.
+        assert!((report.pck(50).unwrap() - 1.0).abs() < 1e-6);
+        // A reported number always carries its definition in the summary.
+        assert!(report.summary().contains("torso-diameter"));
+    }
+
+    #[test]
+    fn accuracy_report_empty_is_zero_not_nan() {
+        let report = accuracy_report(&[], &[20], PckNormalization::BoundingBoxDiagonal);
+        assert_eq!(report.n_frames, 0);
+        assert_eq!(report.pck(20), Some(0.0));
+        assert_eq!(report.mpjpe, 0.0);
+        assert!(!report.mpjpe.is_nan());
+    }
+
+    // -------- bbox-norm is looser than torso-norm (sanity, on a batch) --------
+    #[test]
+    fn bbox_norm_scores_at_least_torso_norm() {
+        // With both hips among the visible keypoints, the bbox encloses them, so
+        // bbox diagonal >= torso span and, for the SAME frames, bbox-PCK >=
+        // torso-PCK at the same k. Pin this ordering.
+        let gt = pose17(&[
+            (0, 0.50, 0.10),
+            (5, 0.50, 0.40),
+            (CANON_LEFT_HIP, 0.40, 0.90),
+            (CANON_RIGHT_HIP, 0.60, 0.90),
+        ]);
+        let pred = pose17(&[
+            (0, 0.55, 0.10),
+            (5, 0.58, 0.40),
+            (CANON_LEFT_HIP, 0.42, 0.90),
+            (CANON_RIGHT_HIP, 0.62, 0.90),
+        ]);
+        let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
+        let frame = PoseFrame { pred, gt, visibility: vis };
+        let torso = accuracy_report(std::slice::from_ref(&frame), &[20], PckNormalization::TorsoDiameter);
+        let bbox = accuracy_report(std::slice::from_ref(&frame), &[20], PckNormalization::BoundingBoxDiagonal);
+        assert!(
+            bbox.pck(20).unwrap() >= torso.pck(20).unwrap(),
+            "bbox-norm (looser) must be >= torso-norm: bbox={:?} torso={:?}",
+            bbox.pck(20), torso.pck(20)
+        );
+    }
+}
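
Reviewer note: the hand calculation behind the key-proof test can be reproduced outside the crate. The sketch below is a minimal, std-only illustration and not the crate's implementation: `Norm`, `tolerance`, and `pck` are hypothetical stand-ins for `PckNormalization` and `pck_at`, every joint is treated as visible, and it assumes that the torso is the hip-to-hip distance, that the bounding box is taken over the ground-truth keypoints being scored, and that τ = (k/100) · reference. Those assumptions mirror the test comments; `PckNormalization::tolerance` itself is not shown in this part of the diff.

```rust
// Illustrative sketch only. `Norm`, `tolerance`, and `pck` are hypothetical
// names, not the crate's API; all joints are assumed visible.

#[derive(Clone, Copy, Debug)]
enum Norm {
    Torso,    // reference = hip-to-hip distance on ground truth
    BboxDiag, // reference = diagonal of the ground-truth bounding box
    Abs(f32), // raw tolerance; the percentage k is ignored
}

type Pt = [f32; 2];

fn dist(a: Pt, b: Pt) -> f32 {
    ((a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2)).sqrt()
}

/// Tolerance for threshold percentage `k`, or `None` when the reference
/// scale is degenerate (assumed cutoff: 1e-6).
fn tolerance(gt: &[Pt], l_hip: usize, r_hip: usize, k: u8, norm: Norm) -> Option<f32> {
    let reference = match norm {
        Norm::Abs(t) => return Some(t),
        Norm::Torso => dist(gt[l_hip], gt[r_hip]),
        Norm::BboxDiag => {
            let (mut lo, mut hi) = ([f32::MAX; 2], [f32::MIN; 2]);
            for p in gt {
                for c in 0..2 {
                    lo[c] = lo[c].min(p[c]);
                    hi[c] = hi[c].max(p[c]);
                }
            }
            dist(lo, hi)
        }
    };
    if reference > 1e-6 {
        Some(reference * f32::from(k) / 100.0)
    } else {
        None
    }
}

/// Fraction of joints whose error is within tolerance; 0.0 when unscoreable.
fn pck(pred: &[Pt], gt: &[Pt], l_hip: usize, r_hip: usize, k: u8, norm: Norm) -> f32 {
    if gt.is_empty() {
        return 0.0;
    }
    let Some(tol) = tolerance(gt, l_hip, r_hip, k, norm) else {
        return 0.0;
    };
    let correct = pred
        .iter()
        .zip(gt)
        .filter(|(p, g)| dist(**p, **g) <= tol)
        .count();
    correct as f32 / gt.len() as f32
}

fn main() {
    // Order: nose, left shoulder, left hip, right hip (hips at indices 2 and 3).
    let gt: [Pt; 4] = [[0.50, 0.10], [0.50, 0.30], [0.40, 0.90], [0.60, 0.90]];
    let pred: [Pt; 4] = [[0.56, 0.10], [0.60, 0.30], [0.40, 0.90], [0.60, 0.90]];
    for norm in [Norm::Torso, Norm::BboxDiag, Norm::Abs(0.08)] {
        println!("{norm:?}: PCK@20 = {:.2}", pck(&pred, &gt, 2, 3, 20, norm));
    }
}
```

Under these assumptions the same four predictions score 0.50 (torso), 1.00 (bbox diagonal), and 0.75 (absolute 0.08), matching the values the tests assert.
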
--- a/v2/crates/wifi-densepose-train/src/lib.rs
+++ b/v2/crates/wifi-densepose-train/src/lib.rs
@@ -43,6 +43,11 @@
 // All *this* crate's code is written without unsafe blocks.
 #![warn(missing_docs)]

+/// Metric-locked pose-accuracy harness (ADR-155 §Tier-1.2; ADR-173):
+/// selectable `PckNormalization` (torso / bbox-diagonal / absolute), `mpjpe`,
+/// and a self-describing `PoseAccuracy` result so a reported PCK number always
+/// carries the definition it was computed under.
+pub mod accuracy;
 pub mod config;
 pub mod dataset;
 pub mod domain;
@@ -89,6 +94,11 @@ pub use metrics_core::{
    canonical_torso_size, oks_canonical, pck_canonical, CANON_LEFT_HIP, CANON_RIGHT_HIP,
    COCO_KP_SIGMAS,
 };
+// ADR-155 §Tier-1.2 — metric-locked accuracy harness (selectable PCK
+// normalization + MPJPE + self-describing result).
+pub use accuracy::{
+    accuracy_report, mpjpe as pck_mpjpe, pck_at, PckNormalization, PoseAccuracy, PoseFrame,
+};
 pub use config::TrainingConfig;
 pub use dataset::{
    CsiDataset, CsiSample, DataLoader, MmFiDataset, SyntheticConfig, SyntheticCsiDataset,
--- a/v2/crates/wifi-densepose-train/tests/test_metrics.rs
+++ b/v2/crates/wifi-densepose-train/tests/test_metrics.rs
@@ -29,6 +29,66 @@

 use ndarray::{Array1, Array2};
 use wifi_densepose_train::{oks_canonical, pck_canonical, CANON_LEFT_HIP, CANON_RIGHT_HIP};
+// ADR-155 §Tier-1.2 — metric-locked accuracy harness public surface.
+use wifi_densepose_train::{accuracy_report, pck_at, PckNormalization, PoseFrame};
+
+// ---------------------------------------------------------------------------
+// Metric-locked accuracy harness: the three PCK normalizations are reachable
+// from the crate root and give DIFFERENT PCK on identical predictions, which
+// is why the 96 / 81.6 / 61 figures (each under a different normalization)
+// were non-comparable. Validated here as a downstream consumer would call it.
+// ---------------------------------------------------------------------------
+
+/// Identical predictions, three declared normalizations ⇒ three distinct PCK.
+/// Hand calc (all coords in `[0,1]`):
+/// * GT: nose(0)=(0.50,0.10), l_sh(5)=(0.50,0.30), hips=(0.40,0.90)/(0.60,0.90).
+/// * Pred: nose err 0.06, shoulder err 0.10, hips exact.
+/// * torso = 0.20 ⇒ τ@20 = 0.04 ⇒ only hips correct ⇒ 2/4 = **0.50**.
+/// * bbox  = √(0.20²+0.80²)=0.82462 ⇒ τ@20 = 0.16492 ⇒ all correct ⇒ **1.00**.
+/// * abs(0.08): nose 0.06≤0.08 ok, shoulder 0.10>0.08 wrong ⇒ 3/4 = **0.75**.
+#[test]
+fn harness_three_normalizations_differ_from_crate_root() {
+    let gt = pose17(&[
+        (0, 0.50, 0.10),
+        (5, 0.50, 0.30),
+        (CANON_LEFT_HIP, 0.40, 0.90),
+        (CANON_RIGHT_HIP, 0.60, 0.90),
+    ]);
+    let pred = pose17(&[
+        (0, 0.56, 0.10),
+        (5, 0.60, 0.30),
+        (CANON_LEFT_HIP, 0.40, 0.90),
+        (CANON_RIGHT_HIP, 0.60, 0.90),
+    ]);
+    let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
+
+    let (_, _, torso) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
+    let (_, _, bbox) = pck_at(&pred, &gt, &vis, 20, PckNormalization::BoundingBoxDiagonal);
+    let (_, _, abs) = pck_at(&pred, &gt, &vis, 20, PckNormalization::AbsolutePixels(0.08));
+
+    assert!((torso - 0.50).abs() < 1e-6, "torso PCK 0.50, got {torso}");
+    assert!((bbox - 1.00).abs() < 1e-6, "bbox PCK 1.00, got {bbox}");
+    assert!((abs - 0.75).abs() < 1e-6, "abs(0.08) PCK 0.75, got {abs}");
+    assert!(
+        torso != bbox && bbox != abs && torso != abs,
+        "three normalizations must be distinct: {torso} / {bbox} / {abs}"
+    );
+}
+
+/// `accuracy_report` returns a self-describing result carrying its normalization,
+/// so an unlabeled PCK number is structurally impossible at the API boundary.
+#[test]
+fn harness_report_carries_normalization_label() {
+    let gt = pose17(&[(CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
+    let vis = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]);
+    let frame = PoseFrame { pred: gt.clone(), gt: gt.clone(), visibility: vis };
+    let report = accuracy_report(&[frame], &[20], PckNormalization::BoundingBoxDiagonal);
+    assert_eq!(report.normalization, PckNormalization::BoundingBoxDiagonal);
+    assert_eq!(report.n_keypoints, 17);
+    assert_eq!(report.n_frames, 1);
+    assert!((report.pck(20).unwrap() - 1.0).abs() < 1e-6);
+    assert!(report.summary().contains("bbox-diagonal"));
+}

 // ---------------------------------------------------------------------------
 // Tests that use `EvalMetrics` (requires tch-backend because the metrics