feat(train): metric-locked PCK/MPJPE accuracy harness — resolve PCK-definition ambiguity

The SOTA brief (docs/research/sota-nn-train-benchmark-brief.md §1/§3.1/§4)
identifies metric ambiguity as the single biggest threat to any beyond-SOTA
claim: three PCK@20 numbers (96.09% WiFlow-STD image-normalized, 81.63%
AetherArena torso-PCK, 61.1% GraphPose-Fi standard PCK) cannot be lined up
because each silently uses a different normalization. The project was burned
twice by this (a withdrawn 92.9% used absolute-pixel thresholds, not torso
normalization).

New src/accuracy.rs makes the normalizer explicit, selectable, and carried with
every reported number:
- PckNormalization enum: TorsoDiameter (standard MM-Fi/GraphPose-Fi hip↔hip),
  BoundingBoxDiagonal (looser WiFlow-STD image-normalized), AbsolutePixels(t)
  (retracted convention, reproducible + clearly non-comparable).
- pck_at(pred, gt, vis, k, normalization) — one canonical PCK reusing the
  metrics_core geometric primitives (no duplicate kernel).
- mpjpe(pred, gt, vis) — 2D/3D, mm.
- PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints,
  n_frames } via accuracy_report(frames, ks, normalization) — an unlabeled PCK
  number is structurally impossible.

17 hand-computed deterministic tests (no GPU, no datasets) prove the harness
arithmetic, including the key proof that identical predictions score
0.50 / 1.00 / 0.75 under the three normalizations, plus graceful degenerate
handling (zero torso, empty frames, NaN coords — no panic, never false-perfect).
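
The key proof can be reproduced by hand with a few lines of dependency-free
Rust. The sketch below is a stand-in under stated assumptions, not the harness
itself: it uses plain `[f32; 2]` points instead of `ndarray`, treats every
listed joint as visible, skips all degenerate-case handling, and the names
`Norm`, `tolerance`, `pck`, and `fixture` are illustrative. The coordinates are
the ones used by the key test.

```rust
/// Hypothetical stand-in for the crate's `PckNormalization` (not its API).
#[derive(Debug, Clone, Copy)]
pub enum Norm {
    TorsoDiameter,
    BoundingBoxDiagonal,
    AbsolutePixels(f32),
}

pub type Pt = [f32; 2];

fn dist(a: Pt, b: Pt) -> f32 {
    ((a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2)).sqrt()
}

/// Diagonal of the axis-aligned bounding box of the ground-truth points.
fn bbox_diagonal(gt: &[Pt]) -> f32 {
    let (mut min_x, mut max_x) = (f32::INFINITY, f32::NEG_INFINITY);
    let (mut min_y, mut max_y) = (f32::INFINITY, f32::NEG_INFINITY);
    for p in gt {
        min_x = min_x.min(p[0]);
        max_x = max_x.max(p[0]);
        min_y = min_y.min(p[1]);
        max_y = max_y.max(p[1]);
    }
    dist([min_x, min_y], [max_x, max_y])
}

/// Distance tolerance for PCK@k; `hips` holds the two hip indices in `gt`.
pub fn tolerance(norm: Norm, gt: &[Pt], hips: (usize, usize), k: u8) -> f32 {
    let frac = k as f32 / 100.0;
    match norm {
        Norm::TorsoDiameter => frac * dist(gt[hips.0], gt[hips.1]),
        Norm::BoundingBoxDiagonal => frac * bbox_diagonal(gt),
        Norm::AbsolutePixels(t) => t, // k is ignored
    }
}

/// Fraction of joints whose error is within the tolerance.
/// Assumes a non-empty pose with every joint visible.
pub fn pck(pred: &[Pt], gt: &[Pt], hips: (usize, usize), k: u8, norm: Norm) -> f32 {
    let tol = tolerance(norm, gt, hips, k);
    let correct = pred
        .iter()
        .zip(gt)
        .filter(|(p, g)| dist(**p, **g) <= tol)
        .count();
    correct as f32 / gt.len() as f32
}

/// Returns (pred, gt) in the order nose, left shoulder, left hip, right hip.
pub fn fixture() -> (Vec<Pt>, Vec<Pt>) {
    let gt = vec![[0.50, 0.10], [0.50, 0.30], [0.40, 0.90], [0.60, 0.90]];
    // Nose displaced by 0.06, shoulder by 0.10, hips exact.
    let pred = vec![[0.56, 0.10], [0.60, 0.30], [0.40, 0.90], [0.60, 0.90]];
    (pred, gt)
}
```

Scoring `fixture()` at k = 20 with hips at indices `(2, 3)` gives tolerances of
0.04 (torso), about 0.165 (bbox), and 0.08 (absolute), so the same predictions
land at 2/4, 4/4, and 3/4 correct respectively.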

This is measurement infrastructure, NOT an accuracy claim. Public API worth an
ADR — needs ADR slot 173 (parent to write).

wifi-densepose-train lib 191→206, test_metrics 12→14, 0 failed; full workspace
green (exit 0); Python deterministic proof unchanged
(f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a).

Co-Authored-By: claude-flow <ruv@ruv.net>
ruv 2026-06-15 00:20:37 -04:00
parent cfd0ad76cf
commit 3a8b2ed134
4 changed files with 779 additions and 0 deletions


@@ -14,6 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **`homecore-recorder` security review (ADR-132 surfaces) — two real bounding fixes; SQL-injection & NaN-index dimensions confirmed clean with evidence.** Beyond-SOTA review of the HA-compat state recorder (DB persistence + history + ruvector semantic search), the crux being its DB-backed SQL-injection surface. **Findings + fixes:** (1) **Memory-DoS — unbounded `get_state_history`.** The history query carried no `LIMIT`, so a wide `[since, until]` window over a high-frequency entity (a per-second sensor ≈ 86k rows/day) would load an unbounded row set into a single in-memory `Vec`. Added a hard `LIMIT MAX_HISTORY_ROWS` (1,000,000 — generous enough never to truncate a realistic history graph, bounded enough to cap the worst case); the sibling search paths were already `k`-bounded. (2) **Disk-DoS / documented-but-missing `purge`.** The README + HA-compat table advertised `Recorder::purge(older_than)` as a capability, but **no such method existed** — i.e. no retention path at all → unbounded disk growth. Implemented a **transactional** `purge` that deletes `states` + `events` strictly **older than** the cutoff (**exclusive** boundary — idempotent, no off-by-one; a row at the cutoff instant is kept) and **garbage-collects** orphaned `state_attributes` blobs (a dedup-shared blob is dropped only once its last referencing state is gone); all three deletes run in one transaction so a mid-purge failure rolls back cleanly (no states-deleted-but-events-kept corruption). **Confirmed clean with evidence:** SQL injection — **every** query in `db.rs` uses bound `?` parameters (no `format!`/string-concat of user data into SQL); the lone `format!` builds the LIKE *pattern*, which is itself bound as a parameter with `ESCAPE '\\'` and metacharacter escaping. Pinned: a state value `'; DROP TABLE states; --` is stored/queried **literally** (table survives), and a `%`/`_` in a search query matches **literally**, not as a wildcard. 
NaN-index poisoning (the calibration/vitals/geo class) is **structurally impossible** here: embeddings are SHA-256 → `i32` → `f32` (an `i32` cast to `f32` is always finite, never NaN/Inf), with an all-zero-digest norm guard; probed empty-index search, empty-string query, and `k=0`: all return `Ok(0)`, **no panic**. Fail-closed write path: a removal event yields `Ok(None)`, semantic-index failure is logged not propagated (best-effort, never blocks the durable SQLite write), and `EntityId` parsing failures fall back rather than panic. **6 new pinning tests** (SQL-injection literal-storage, LIKE-metacharacter literalness, history `LIMIT`, purge exclusive-boundary, purge attribute-GC-keeps-shared, purge old-events): `homecore-recorder` **19 → 25** (`--no-default-features`) / **25 → 31** (`--features ruvector`), 0 failed; the purge-boundary test is a true pin (fails deleting 2 rows under an inclusive cutoff, passes deleting 1 under the exclusive cutoff). Behaviour otherwise unchanged; Python deterministic proof unchanged (recorder is off the signal proof path).
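
The exclusive-boundary purge and attribute garbage collection described in the entry above can be sketched without a database. This is a hedged, in-memory illustration only: `StateRow`, `Store`, and this `purge` signature are hypothetical stand-ins (the entry describes SQL deletes inside one transaction), and the `events` table is omitted for brevity.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a `states` row (not the recorder's schema).
#[derive(Debug, Clone, PartialEq)]
pub struct StateRow {
    /// Timestamp of the state change.
    pub ts: i64,
    /// Id of the shared (deduplicated) attributes blob this state references.
    pub attr_id: u32,
}

/// Hypothetical in-memory store: state rows plus the set of stored blob ids.
#[derive(Debug, Clone, Default)]
pub struct Store {
    pub states: Vec<StateRow>,
    pub attr_blobs: HashSet<u32>,
}

/// Delete states strictly older than `cutoff`, then drop orphaned blobs.
/// Returns the number of deleted state rows.
pub fn purge(store: &mut Store, cutoff: i64) -> usize {
    // Work on a copy and swap it in at the end, mimicking one transaction.
    let mut next = store.clone();
    let before = next.states.len();

    // Exclusive boundary: a row exactly at `cutoff` is kept.
    next.states.retain(|row| row.ts >= cutoff);

    // A blob survives while at least one remaining state references it.
    let live: HashSet<u32> = next.states.iter().map(|row| row.attr_id).collect();
    next.attr_blobs.retain(|id| live.contains(id));

    let deleted = before - next.states.len();
    *store = next; // "commit"
    deleted
}
```

Because the boundary is exclusive, calling `purge` a second time with the same cutoff deletes nothing, which is the idempotence property the purge-boundary test pins.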
### Added
- **Metric-locked PCK/MPJPE accuracy harness: resolves the PCK-definition ambiguity (`wifi-densepose-train`, needs ADR slot 173).** The SOTA brief (`docs/research/sota-nn-train-benchmark-brief.md` §1, §3.1, §4) found the single biggest threat to any "beyond-SOTA" claim is **metric ambiguity**: three PCK@20 figures (96.09% WiFlow-STD image-normalized, 81.63% AetherArena torso-PCK, 61.1% GraphPose-Fi standard PCK) cannot be lined up because each silently uses a different normalization; the project was burned twice by this (a withdrawn "92.9%" used *absolute* pixels, not torso). New `src/accuracy.rs` makes the normalizer **explicit, selectable, and carried with every reported number**: a `PckNormalization` enum (`TorsoDiameter` = standard MM-Fi/GraphPose-Fi hip↔hip; `BoundingBoxDiagonal` = looser WiFlow-STD image-normalized; `AbsolutePixels(threshold)` = the retracted convention, included so historical numbers are reproducible and clearly labeled non-comparable); one canonical `pck_at(pred, gt, vis, k, normalization)` reusing the `metrics_core` geometric primitives (hip distance, bbox diagonal; no duplicate kernel); `mpjpe(pred, gt, vis)` (2D/3D, mm); and a self-describing `PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints, n_frames }` returned by `accuracy_report(frames, ks, normalization)` so an **unlabeled PCK number is structurally impossible**. **17 hand-computed deterministic tests** (no GPU, no datasets) prove the harness arithmetic: perfect→PCK=1.0/MPJPE=0; all-just-outside→0.0; half-in-half-out→0.5; the **key proof** that identical predictions score 0.50 (torso) / 1.00 (bbox) / 0.75 (abs) under the three normalizations (the ambiguity is real and the definitions are distinct); MPJPE 2D/3D fixtures; and graceful degenerate handling (zero torso, empty frames, NaN coords: no panic, never a false-perfect). 
**This is measurement infrastructure, not an accuracy claim** — the tests prove the harness is correct, not that any model is good. `wifi-densepose-train` lib 191→206, `test_metrics` 12→14, 0 failed. Python deterministic proof unchanged (off the signal proof path).
- **RuField `rufield-viewer` live-ingest mode — closes the RuView↔RuField visual loop (ADR-262 surfaces).** The dashboard gains `--source live --upstream <RuView-URL>`: it consumes RuView's `/ws/field` SSE (falling back to polling `/api/field`), **verifies every event's ed25519 provenance receipt on ingest** (`is_fusable`) — forged/tampered events are flagged ✗ and **never fused** into trusted inferences — and renders real RuView `FieldEvent`s through the same room-state/privacy-badge/fusion-graph/receipt path the synthetic mode uses (wire-compatible by construction: both sides use `rufield_core::FieldEvent` serde). **Strict banner honesty:** a single `BannerState` shows `SYNTHETIC` / `LIVE — <upstream>` / `DISCONNECTED — <upstream> unreachable`, mutually exclusive — never SYNTHETIC while showing live data or vice versa; live mode returns **409** on `/api/run` rather than fabricate a synthetic run, and starts DISCONNECTED until first verified contact. Default stays synthetic. 26 tests / 0 failed. `ruvnet/rufield` `crates/rufield-viewer`; `vendor/rufield` submodule bumped.
- **ADR-262 P3 (live RuField surface): RuView's running sensing-server now speaks RuField on `/api/field` + `/ws/field`.** Wires the P1 `wifi-densepose-rufield` bridge into the live `wifi-densepose-sensing-server` (the bridge is the only added coupling, ADR-262 §5.4). A new `src/rufield_surface.rs` module (kept out of the 8k-line `main.rs`) holds a `FieldSurface` with a **dedicated ed25519 `Signer`**, a bounded ring buffer of recent signed events (`FIELD_RING_CAPACITY = 64`), and the `/ws/field` broadcast topic; it exposes `GET /api/field` (latest signed `FieldEvent`s + signer pubkey + a `dev_signing_key` flag) and `GET /ws/field` (per-cycle stream, mirroring `/ws/sensing`), plus a standalone `router()` for isolated testing. **Tap:** at the ESP32 governed-trust cycle (`main.rs` `observe_cycle` ~`:5886` / `SensingUpdate` build ~`:5938`), `emit_rufield_event` joins the cycle's real `SensingUpdate` (features/classification/signal_field) with the engine's recorded `effective_class`/`demoted` trust state into a `SensingSnapshot` and surfaces a signed `FieldEvent`; **existing endpoints (`/ws/sensing` etc.) are unchanged; this is purely additive.** **Signer (defers the P2 key decision, §8 Q1):** a **standalone dev/sensing key** from `WDP_RUFIELD_SIGNING_SEED` (64-hex or ≥32-byte value), else a deterministic dev default with a logged `WARN`; reusing the `cog-ha-matter` Ed25519 key is the deferred P2 call, so P3 does not pre-empt it. **Egress privacy (fail-closed):** `network_egress_allowed` is *stricter* than `DefaultPrivacyGuard` for an unattended live surface: only **P1/P2** leave the box; P0 (raw) and P3/P4/P5 are held edge-local, so a `Derived → P4/P5` cycle **never** surfaces; no-presence cycles emit **no phantom event**. 
**P3 acceptance gates (`tests/rufield_surface_test.rs`, 4 integration via `tower::oneshot` + 4 module unit, 0 failed):** a well-formed **signed** event (`Modality::WifiCsi`, P2 not P1, `is_fusable` ed25519-verified, real timestamp); empty cycle → no phantom; **privacy-safety** — an injected `Derived` trust never surfaces; a mixed stream surfaces only egress-safe events. **Honest scope (ADR-262 §0/§6):** real plumbing on a **live endpoint**, **NOT accuracy** — single-link CSI with its existing caveats (no validated room-coordinate accuracy — `field_localize`), a dedicated dev signing key pending the P2 ownership decision, no accuracy claim. The win is narrowly: "RuView's live sensing now speaks RuField on `/ws/field`."
- **ADR-262 P1 (`wifi-densepose-rufield` anti-corruption bridge): RuView WiFi-CSI sensing → signed RuField `FieldEvent`s.** A new v2 workspace member (the *single coupling point* between RuView and the standalone RuField MFS spec, ADR-262 §5.4) that **path-deps** the `vendor/rufield` submodule crates (`rufield-core`/`-provenance`/`-privacy`/`-fusion`; pure-Rust, `--no-default-features`-buildable: serde/sha2/ed25519/toml only, no tch/openblas/ndarray/candle) and **no** RuView internal crate. The bridge takes owned primitives (`SensingSnapshot` mirrors the `/ws/sensing` `SensingUpdate` (features + classification + signal_field) joined with the `TrustedOutput` trust state (`trust_class`/`demoted`/`identity_bound`)), and `snapshot_to_field_event()` emits one **signed** `FieldEvent` (`Modality::WifiCsi`, axis `[Frequency]`): a real `FieldTensor` from the feature scalars with the real `timestamp_ns`; an `Observation` whose `range_m`/`motion_vector`/`space_cell` are derived from the strongest **signal-field peak** when present (else `None`: coordinates are **never fabricated**, per the `field_localize` caveat) and `confidence` from the classification; a real `ProvenanceRef` (sha256 over the tensor bytes, `synthetic=false`) **ed25519-signed** so `rufield_provenance::is_fusable` passes. **The §3.3 privacy mapping is the critical correctness item**, implemented as `map_privacy()` mapping RuView's class onto RuField P0–P5 **by information content, NEVER by byte value** and **fail-closed**: RuView `Derived` (byte `1`, which sorts *below* `Anonymous` byte `2`) carries an identity embedding → maps to **P4** (or **P5** if identity-bound), **never P1** (the single most dangerous mapping mistake); `Raw → P0`, `Anonymous → P2`, `Restricted → P2`; a governed-engine `demoted` cycle floors the egress class to ≥ P2 with raw suppressed. 
**P1 acceptance gates (15 tests / 0 failed — 5 unit + 9 integration + 1 doc):** round-trip (`SensingSnapshot → FieldEvent →` serde `→` equal), `is_fusable` (verified ed25519 receipt), `RuFieldFusion::ingest` accept + `infer()` runs, **privacy-safety** (`gate_privacy_safety_derived_never_maps_to_low_privacy` — `Derived → P4/P5`, never P1; a table test over every RuView class; fail-closed demotion), and determinism (same snapshot + same signer seed → byte-identical event). **Honest scope:** this is **P1 plumbing** — a tested conversion + a safe privacy mapping. It is **not** wired into the live server (that is P3) and makes **no accuracy claim** (RuField v0.1 is synthetic; RuView's single-link CSI carries its own caveats). CI: the `rust-tests` workflow checkout gains `submodules: recursive` so the path-deps resolve. Python deterministic proof unchanged (off the signal proof path).
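
The "by information content, never by byte value" mapping and the stricter egress gate from the two ADR-262 entries above can be sketched as exhaustive matches. This is an illustration under assumptions, not the bridge's code: `RuViewClass` and `PrivacyLevel` are hypothetical mirrors of the real types, the function signatures are guesses built around the names `map_privacy` and `network_egress_allowed` quoted above, and the `demoted` floor is left out.

```rust
/// Hypothetical mirror of RuView's trust classes (not the bridge's real type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuViewClass {
    Raw,
    Derived,
    Anonymous,
    Restricted,
}

/// Hypothetical mirror of the RuField privacy levels P0 to P5.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrivacyLevel {
    P0,
    P1,
    P2,
    P3,
    P4,
    P5,
}

/// Map by information content: every class is matched explicitly, so the
/// result never depends on how the classes happen to be numbered.
pub fn map_privacy(class: RuViewClass, identity_bound: bool) -> PrivacyLevel {
    match class {
        RuViewClass::Raw => PrivacyLevel::P0,
        // Carries an identity embedding, so it must never map to P1.
        RuViewClass::Derived => {
            if identity_bound {
                PrivacyLevel::P5
            } else {
                PrivacyLevel::P4
            }
        }
        RuViewClass::Anonymous | RuViewClass::Restricted => PrivacyLevel::P2,
    }
}

/// Egress gate as described for the live surface: only P1 and P2 may leave.
pub fn network_egress_allowed(level: PrivacyLevel) -> bool {
    match level {
        PrivacyLevel::P1 | PrivacyLevel::P2 => true,
        PrivacyLevel::P0 | PrivacyLevel::P3 | PrivacyLevel::P4 | PrivacyLevel::P5 => false,
    }
}
```

Matching every variant by name, with no wildcard arm, means a newly added class or level fails to compile until someone decides where it belongs, which is one way to get fail-closed behaviour at build time.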


@@ -0,0 +1,708 @@
//! Metric-locked pose-accuracy harness (ADR-155 §Tier-1.2; needs ADR slot 173).
//!
//! # Why this module exists
//!
//! Three PCK\@20 numbers float around this project and **cannot be lined up**
//! because each silently uses a *different* PCK definition:
//!
//! | Number | Source | PCK normalization |
//! |--------|--------|-------------------|
//! | 96.09 % | WiFlow-STD reproduction | image / bounding-box normalized (looser) |
//! | 81.63 % | AetherArena MM-Fi (ADR-150) | torso-diameter (standard MM-Fi / GraphPose-Fi) |
//! | 61.1 % | GraphPose-Fi (preprint) | torso-diameter, 3D, mm-scale (harder) |
//!
//! The project was burned **twice** by metric ambiguity (a now-retracted "92.9 %
//! PCK\@20" used *absolute* pixel thresholds, not torso normalization). The fix
//! is to make the normalizer **explicit, selectable, and carried with every
//! reported number** so an unlabeled PCK figure is structurally impossible.
//!
//! [`metrics_core`](crate::metrics_core) already pins the *canonical*
//! torso-normalized PCK ([`pck_canonical`](crate::metrics_core::pck_canonical)).
//! This module generalizes it to a [`PckNormalization`] enum covering all three
//! conventions the SOTA brief names, adds [`mpjpe`] (mm), and bundles results
//! into a self-describing [`PoseAccuracy`] struct. It **reuses** the
//! `metrics_core` primitives (hip distance, bounding-box diagonal) — there is
//! still exactly one implementation of each geometric reference.
//!
//! # This is measurement infrastructure, not an accuracy claim
//!
//! Nothing here asserts any project model is good. The unit tests prove the
//! *harness* is arithmetically correct against hand-computed fixtures (no GPU,
//! no datasets), including the key demonstration that the **same predictions
//! score different PCK under the three normalizations** — proof the ambiguity is
//! real and the definitions are genuinely distinct.
//!
//! # Literature
//!
//! - Torso-diameter PCK is the MM-Fi / GraphPose-Fi convention (Yang et al.,
//!   *GraphPose-Fi*, arXiv:2511.19105): a keypoint is correct iff its error is
//!   within `k · d_torso`, with `d_torso` the hip↔hip (or shoulder↔hip) span.
//! - Bounding-box / image-normalized PCK is the WiFlow-STD-style looser
//!   convention (arXiv:2602.08661): normalize by the GT pose bbox diagonal.
//! - MPJPE (mean per-joint position error, mm) is reported by GraphPose-Fi and
//!   Person-in-WiFi-3D (Yan et al., CVPR 2024).

use std::collections::BTreeMap;

use ndarray::{Array1, Array2};

use crate::metrics_core::{
    bounding_box_diagonal, CANON_LEFT_HIP, CANON_RIGHT_HIP,
};

/// Visibility cutoff: a keypoint counts as *visible* iff `visibility[j] >= 0.5`
/// (COCO convention; matches [`crate::metrics_core`]).
const VISIBILITY_THRESHOLD: f32 = 0.5;
/// Minimum positive normalizer extent. Below this the reference scale is
/// considered degenerate (zero torso, collapsed bbox) and the frame is reported
/// unscoreable rather than dividing by ≈0.
const MIN_REFERENCE_EXTENT: f32 = 1e-6;
// ===========================================================================
// PCK normalization — the explicit, selectable definition
// ===========================================================================
/// The PCK normalization basis — **the single knob that made three project
/// numbers non-comparable**, now explicit and carried with every result.
///
/// A keypoint `j` (with `visibility[j] >= 0.5`) is *correct* iff
/// `‖pred_j − gt_j‖₂ ≤ τ`, where the **distance tolerance `τ`** is derived from
/// the chosen normalization and the PCK threshold `k` (given as a percentage,
/// e.g. `20` for PCK\@20):
///
/// | Variant | `τ` (tolerance in coordinate units) |
/// |---------|--------------------------------------|
/// | [`TorsoDiameter`](Self::TorsoDiameter) | `(k/100) · d_torso` |
/// | [`BoundingBoxDiagonal`](Self::BoundingBoxDiagonal) | `(k/100) · d_bbox` |
/// | [`AbsolutePixels`](Self::AbsolutePixels) | `threshold` (k ignored) |
///
/// `d_torso` is the hip↔hip span (COCO joints 11↔12); it falls back to the bbox
/// diagonal when either hip is not visible or the hip span is degenerate,
/// identical to [`crate::metrics_core::canonical_torso_size`]. `d_bbox` is the
/// diagonal of the axis-aligned bounding box of all visible GT keypoints.
///
/// The two relative tolerances differ whenever `d_torso ≠ d_bbox` (in practice
/// always true for a real pose: the bbox is larger than the hip span), so the
/// *same* predictions can score **different** PCK under each variant. That is
/// exactly why the 96 / 81.6 / 61 numbers cannot be lined up without declaring
/// this enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PckNormalization {
    /// **Torso-diameter** (hip↔hip span). The standard MM-Fi / GraphPose-Fi
    /// convention and the *stricter* of the two relative normalizers. This is
    /// the canonical default ([`crate::metrics_core::pck_canonical`]).
    TorsoDiameter,
    /// **Bounding-box diagonal** (a.k.a. image-normalized). The looser
    /// WiFlow-STD-style convention: normalize by the GT pose bbox diagonal,
    /// which is larger than the torso span ⇒ a more forgiving threshold ⇒ a
    /// higher PCK on identical predictions.
    BoundingBoxDiagonal,
    /// **Absolute pixel/coordinate threshold**: no pose-relative
    /// normalization. The PCK `k` percentage is ignored; the held `threshold`
    /// is the raw distance tolerance directly. Included so historical
    /// retracted-style numbers are reproducible, and **clearly labeled as
    /// non-comparable** to the relative variants (it does not scale with body
    /// size or camera distance).
    AbsolutePixels(f32),
}

impl PckNormalization {
    /// Human-readable, *self-documenting* label for a reported number, so a
    /// `PoseAccuracy` printed anywhere always carries its definition.
    pub fn label(&self) -> String {
        match self {
            PckNormalization::TorsoDiameter => "torso-diameter".to_string(),
            PckNormalization::BoundingBoxDiagonal => "bbox-diagonal".to_string(),
            PckNormalization::AbsolutePixels(t) => format!("absolute-px({t})"),
        }
    }

    /// Compute the per-frame distance tolerance `τ` for PCK threshold `k`
    /// (percentage). Returns `None` when the normalizer is degenerate (a
    /// zero-extent relative reference, or a non-positive `AbsolutePixels`
    /// threshold): the frame cannot be scored.
    ///
    /// `gt_kpts` is `[n, 2]` (or `[n, ≥2]`, only x/y used); `visibility` is `[n]`.
    fn tolerance(&self, gt_kpts: &Array2<f32>, visibility: &Array1<f32>, k: u8) -> Option<f32> {
        let n = gt_kpts.shape()[0].min(visibility.len());
        match self {
            PckNormalization::AbsolutePixels(threshold) => {
                // Raw tolerance, independent of pose scale and of `k`.
                if *threshold > 0.0 {
                    Some(*threshold)
                } else {
                    None
                }
            }
            PckNormalization::TorsoDiameter => {
                let d = torso_diameter(gt_kpts, visibility, n)?;
                Some((k as f32 / 100.0) * d)
            }
            PckNormalization::BoundingBoxDiagonal => {
                let d = bounding_box_diagonal(gt_kpts, visibility, n);
                if d > MIN_REFERENCE_EXTENT {
                    Some((k as f32 / 100.0) * d)
                } else {
                    None
                }
            }
        }
    }
}

/// Hip↔hip torso diameter with a bbox-diagonal fallback — the relative
/// normalizer shared by `TorsoDiameter` PCK and
/// [`crate::metrics_core::canonical_torso_size`]. Returns `None` when no
/// positive-extent reference exists.
fn torso_diameter(gt_kpts: &Array2<f32>, visibility: &Array1<f32>, n: usize) -> Option<f32> {
    if CANON_LEFT_HIP < n
        && CANON_RIGHT_HIP < n
        && visibility[CANON_LEFT_HIP] >= VISIBILITY_THRESHOLD
        && visibility[CANON_RIGHT_HIP] >= VISIBILITY_THRESHOLD
    {
        let dx = gt_kpts[[CANON_LEFT_HIP, 0]] - gt_kpts[[CANON_RIGHT_HIP, 0]];
        let dy = gt_kpts[[CANON_LEFT_HIP, 1]] - gt_kpts[[CANON_RIGHT_HIP, 1]];
        let torso = (dx * dx + dy * dy).sqrt();
        if torso > MIN_REFERENCE_EXTENT {
            return Some(torso);
        }
    }
    let diag = bounding_box_diagonal(gt_kpts, visibility, n);
    if diag > MIN_REFERENCE_EXTENT {
        Some(diag)
    } else {
        None
    }
}

// ===========================================================================
// Single-frame PCK / MPJPE
// ===========================================================================
/// Per-frame **PCK\@`k`** under the selected `normalization`.
///
/// A keypoint `j` with `visibility[j] >= 0.5` is correct iff
/// `‖pred_j − gt_j‖₂ ≤ τ`, with `τ` from
/// [`PckNormalization::tolerance`]. Only x/y are used (2D PCK is the standard
/// keypoint-PCK definition; pass 2-column arrays).
///
/// # Returns
/// `(correct, total, pck)` with `pck ∈ [0,1]`. **`(0, 0, 0.0)`** when no
/// keypoint is visible or the tolerance is undefined (a degenerate reference
/// scale for the relative normalizers, or a non-positive `AbsolutePixels`
/// threshold): a frame with no measurable evidence scores 0, never 1.
/// NaN-valued coordinates make a keypoint *incorrect* (the `<=` comparison is
/// false for NaN) rather than panicking.
pub fn pck_at(
    pred_kpts: &Array2<f32>,
    gt_kpts: &Array2<f32>,
    visibility: &Array1<f32>,
    k: u8,
    normalization: PckNormalization,
) -> (usize, usize, f32) {
    let n = pred_kpts.shape()[0]
        .min(gt_kpts.shape()[0])
        .min(visibility.len());
    let tol = match normalization.tolerance(gt_kpts, visibility, k) {
        Some(t) => t,
        None => return (0, 0, 0.0),
    };

    let mut correct = 0usize;
    let mut total = 0usize;
    for j in 0..n {
        if visibility[j] < VISIBILITY_THRESHOLD {
            continue;
        }
        total += 1;
        let dx = pred_kpts[[j, 0]] - gt_kpts[[j, 0]];
        let dy = pred_kpts[[j, 1]] - gt_kpts[[j, 1]];
        let dist = (dx * dx + dy * dy).sqrt();
        // NaN-safe: `NaN <= tol` is false, so a NaN coordinate counts as wrong.
        if dist <= tol {
            correct += 1;
        }
    }

    let pck = if total > 0 {
        correct as f32 / total as f32
    } else {
        0.0
    };
    (correct, total, pck)
}

/// Per-frame **MPJPE** (mean per-joint position error) over visible keypoints,
/// in the coordinate units of the inputs (report as mm when inputs are mm).
///
/// `pred`/`gt` are `[n, D]` with `D ∈ {2, 3}` (2D or 3D pose); all `D` columns
/// are used. Joints with `visibility[j] < 0.5` are excluded.
///
/// Returns `0.0` when no keypoint is visible (no evidence). A NaN coordinate
/// propagates into the returned mean (callers filter NaN frames upstream); it
/// does not panic.
pub fn mpjpe(pred: &Array2<f32>, gt: &Array2<f32>, visibility: &Array1<f32>) -> f32 {
    let n = pred.shape()[0].min(gt.shape()[0]).min(visibility.len());
    let d = pred.shape()[1].min(gt.shape()[1]);

    let mut sum = 0.0f32;
    let mut count = 0usize;
    for j in 0..n {
        if visibility[j] < VISIBILITY_THRESHOLD {
            continue;
        }
        let mut sq = 0.0f32;
        for c in 0..d {
            let diff = pred[[j, c]] - gt[[j, c]];
            sq += diff * diff;
        }
        sum += sq.sqrt();
        count += 1;
    }

    if count > 0 {
        sum / count as f32
    } else {
        0.0
    }
}

// ===========================================================================
// Self-describing result struct + batch report
// ===========================================================================
/// A pose-accuracy result that **always carries the definition it was computed
/// under** — making an unlabeled PCK number structurally impossible.
///
/// Built by [`accuracy_report`] over a set of frames. `pck_at` maps each
/// requested threshold `k` (percentage, e.g. `20`) to its PCK in `[0,1]`. The
/// `normalization` field records *which* PCK definition produced those numbers,
/// so two `PoseAccuracy` values can only be compared when their `normalization`
/// matches (the comparability check the project lacked).
#[derive(Debug, Clone, PartialEq)]
pub struct PoseAccuracy {
    /// PCK\@k for each requested threshold percentage `k`, in `[0,1]`.
    pub pck_at: BTreeMap<u8, f32>,
    /// Mean per-joint position error in coordinate units (mm for mm inputs).
    pub mpjpe: f32,
    /// The normalization basis under which `pck_at` was computed: the label a
    /// reported number must always carry.
    pub normalization: PckNormalization,
    /// Number of keypoints per frame (the pose convention, e.g. 17 for COCO).
    pub n_keypoints: usize,
    /// Number of frames aggregated into this result.
    pub n_frames: usize,
}

impl PoseAccuracy {
    /// Convenience accessor for a single threshold, returning `None` when that
    /// `k` was not requested.
    pub fn pck(&self, k: u8) -> Option<f32> {
        self.pck_at.get(&k).copied()
    }

    /// A one-line, self-documenting summary suitable for logs / RESULTS.md, e.g.
    /// `PCK@20=0.750 (torso-diameter, 17kp, 1 frames) MPJPE=0.0300`.
    pub fn summary(&self) -> String {
        let pcks: Vec<String> = self
            .pck_at
            .iter()
            .map(|(k, v)| format!("PCK@{k}={v:.3}"))
            .collect();
        format!(
            "{} ({}, {}kp, {} frames) MPJPE={:.4}",
            pcks.join(" "),
            self.normalization.label(),
            self.n_keypoints,
            self.n_frames,
            self.mpjpe
        )
    }
}

/// One frame's prediction + ground truth + visibility for batch scoring.
///
/// All three arrays share row count `n_keypoints`; `pred`/`gt` are `[n, D]`
/// (`D ∈ {2,3}`), `visibility` is `[n]`.
#[derive(Debug, Clone)]
pub struct PoseFrame {
    /// Predicted keypoints `[n, D]`.
    pub pred: Array2<f32>,
    /// Ground-truth keypoints `[n, D]`.
    pub gt: Array2<f32>,
    /// Per-keypoint visibility `[n]` (`>= 0.5` ⇒ visible).
    pub visibility: Array1<f32>,
}

/// Aggregate [`PoseAccuracy`] over a batch of frames under **one** explicit
/// `normalization`, for the requested PCK thresholds `ks` (percentages).
///
/// PCK is micro-averaged over keypoints (sum of correct ÷ sum of visible across
/// all frames — the standard keypoint-PCK aggregation), so frames with more
/// visible joints contribute proportionally. MPJPE is micro-averaged over
/// visible joints likewise. Unscoreable frames (no visible joints, degenerate
/// relative normalizer) contribute `(0, 0)` and so are excluded from the
/// denominator rather than scored as perfect.
///
/// An **empty** `frames` slice yields all-zero PCK and `0.0` MPJPE — never a
/// panic or NaN.
pub fn accuracy_report(
    frames: &[PoseFrame],
    ks: &[u8],
    normalization: PckNormalization,
) -> PoseAccuracy {
    let n_keypoints = frames.first().map(|f| f.gt.shape()[0]).unwrap_or(0);

    // PCK: per-threshold (correct, total) accumulators across frames.
    let mut pck_acc: BTreeMap<u8, (usize, usize)> = ks.iter().map(|&k| (k, (0, 0))).collect();
    // MPJPE: sum of per-joint distances and visible-joint count.
    let mut mpjpe_sum = 0.0f32;
    let mut mpjpe_count = 0usize;

    for frame in frames {
        for &k in ks {
            let (c, t, _) = pck_at(&frame.pred, &frame.gt, &frame.visibility, k, normalization);
            let entry = pck_acc.entry(k).or_insert((0, 0));
            entry.0 += c;
            entry.1 += t;
        }

        // Per-frame MPJPE re-derived as a (sum, count) contribution so the
        // batch value is a true micro-average over joints.
        let n = frame.pred.shape()[0].min(frame.gt.shape()[0]).min(frame.visibility.len());
        let d = frame.pred.shape()[1].min(frame.gt.shape()[1]);
        for j in 0..n {
            if frame.visibility[j] < VISIBILITY_THRESHOLD {
                continue;
            }
            let mut sq = 0.0f32;
            for c in 0..d {
                let diff = frame.pred[[j, c]] - frame.gt[[j, c]];
                sq += diff * diff;
            }
            mpjpe_sum += sq.sqrt();
            mpjpe_count += 1;
        }
    }

    let pck_at: BTreeMap<u8, f32> = pck_acc
        .into_iter()
        .map(|(k, (c, t))| {
            let v = if t > 0 { c as f32 / t as f32 } else { 0.0 };
            (k, v)
        })
        .collect();
    let mpjpe = if mpjpe_count > 0 {
        mpjpe_sum / mpjpe_count as f32
    } else {
        0.0
    };

    PoseAccuracy {
        pck_at,
        mpjpe,
        normalization,
        n_keypoints,
        n_frames: frames.len(),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    /// Build a 17-joint `[17, 2]` pose from `(joint, x, y)` triples.
    fn pose17(joints: &[(usize, f32, f32)]) -> Array2<f32> {
        let mut a = Array2::<f32>::zeros((17, 2));
        for &(j, x, y) in joints {
            a[[j, 0]] = x;
            a[[j, 1]] = y;
        }
        a
    }

    /// Build a 17-joint visibility vector with the listed joints marked
    /// visible (`2.0`, the COCO "labeled and visible" flag); all others `0.0`.
    fn vis17(visible: &[usize]) -> Array1<f32> {
        let mut v = Array1::<f32>::zeros(17);
        for &j in visible {
            v[j] = 2.0;
        }
        v
    }

    // -------- consts pinned (no silent metric drift) --------
    #[test]
    fn accuracy_consts_unchanged() {
        assert_eq!(VISIBILITY_THRESHOLD, 0.5_f32);
        assert_eq!(MIN_REFERENCE_EXTENT, 1e-6_f32);
    }

    // -------- perfect prediction ⇒ PCK = 1.0, MPJPE = 0 --------
    #[test]
    fn perfect_prediction_pck_one_mpjpe_zero() {
        let gt = pose17(&[
            (5, 0.35, 0.35),
            (CANON_LEFT_HIP, 0.40, 0.50),
            (CANON_RIGHT_HIP, 0.60, 0.50),
        ]);
        let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
        for norm in [
            PckNormalization::TorsoDiameter,
            PckNormalization::BoundingBoxDiagonal,
            PckNormalization::AbsolutePixels(0.01),
        ] {
            let (c, t, pck) = pck_at(&gt, &gt, &vis, 20, norm);
            assert_eq!((c, t), (3, 3), "{norm:?}");
            assert!((pck - 1.0).abs() < 1e-6, "{norm:?} perfect PCK must be 1.0");
        }
        assert_eq!(mpjpe(&gt, &gt, &vis), 0.0);
    }

    // -------- all keypoints just OUTSIDE threshold ⇒ PCK = 0.0 --------
    //
    // Hand calc (torso): hips at (0.40,0.50)/(0.60,0.50) ⇒ torso = 0.20.
    // threshold k=20 ⇒ τ = 0.20·0.20 = 0.04. Push every scored joint to an
    // error of 0.05 (> 0.04) ⇒ all wrong. The hips are displaced too, so they
    // are not scored "correct"; the torso reference is taken from the GT hips,
    // so displacing the predicted hips leaves it unchanged.
    #[test]
    fn all_just_outside_threshold_pck_zero() {
        let gt = pose17(&[
            (5, 0.50, 0.50),
            (CANON_LEFT_HIP, 0.40, 0.50),
            (CANON_RIGHT_HIP, 0.60, 0.50),
        ]);
        // GT torso = 0.20, τ@20 = 0.04. Displace each scored joint by dx=0.05.
        let pred = pose17(&[
            (5, 0.55, 0.50),
            (CANON_LEFT_HIP, 0.45, 0.50),
            (CANON_RIGHT_HIP, 0.65, 0.50),
        ]);
        let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
        let (c, t, pck) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
        assert_eq!(t, 3);
        assert_eq!(c, 0, "all errors 0.05 > τ 0.04 ⇒ none correct");
        assert_eq!(pck, 0.0);
    }

    // -------- half-in / half-out ⇒ PCK = 0.5 --------
    //
    // Hand calc (torso): torso = 0.20, τ@20 = 0.04. Four visible joints; two
    // exact (dist 0 ≤ 0.04, correct), two displaced 0.05 (> 0.04, wrong)
    // ⇒ 2/4 = 0.5.
    #[test]
    fn half_in_half_out_pck_half() {
        let gt = pose17(&[
            (0, 0.50, 0.20),
            (5, 0.50, 0.50),
            (CANON_LEFT_HIP, 0.40, 0.50),
            (CANON_RIGHT_HIP, 0.60, 0.50),
        ]);
        let pred = pose17(&[
            (0, 0.50, 0.20),               // exact ⇒ correct
            (5, 0.55, 0.50),               // err 0.05 ⇒ wrong
            (CANON_LEFT_HIP, 0.40, 0.50),  // exact ⇒ correct
            (CANON_RIGHT_HIP, 0.65, 0.50), // err 0.05 ⇒ wrong
        ]);
        let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
        let (c, t, pck) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
        assert_eq!((c, t), (2, 4));
        assert!((pck - 0.5).abs() < 1e-6, "expected 0.5, got {pck}");
    }

// -------- THE KEY PROOF: same predictions, three normalizations, three PCK --------
//
// One construction scored three ways. Hand calc:
// GT: nose(0)=(0.50,0.10), l_sh(5)=(0.50,0.30),
// l_hip(11)=(0.40,0.90), r_hip(12)=(0.60,0.90).
// Visible = {0,5,11,12}, all four.
// torso = |0.60-0.40| = 0.20 (hips, y equal).
// bbox: x∈[0.40,0.60] (w=0.20), y∈[0.10,0.90] (h=0.80)
// ⇒ diag = sqrt(0.20² + 0.80²) = sqrt(0.04+0.64)=sqrt(0.68)=0.8246…
//
// Pred errors (pure dx): nose 0.06, l_sh 0.10, l_hip 0.00, r_hip 0.00.
//
// k = 20:
// • Torso τ = 0.20·0.20 = 0.040 → nose 0.06 > 0.040, l_sh 0.10 > 0.040 ⇒ both WRONG
// ⇒ 2 correct / 4 = 0.50
// • Bbox τ = 0.20·0.8246 = 0.16492 → nose 0.06 and l_sh 0.10 both ≤ 0.16492 ⇒ CORRECT
// ⇒ 4 correct / 4 = 1.00
// • Abs(0.08) τ = 0.08 → nose 0.06 ≤ 0.08 ⇒ CORRECT, l_sh 0.10 > 0.08 ⇒ WRONG
// ⇒ 3 correct / 4 = 0.75
//
// Why two displacements: with only l_sh displaced (by 0.10), torso and every
// absolute threshold below 0.10 tie at 0.75. Displacing the nose by 0.06 drops
// torso to 0.50, and the absolute threshold must then exceed 0.06 to stay
// distinct: Abs(0.05) would also score 0.50, so the test uses Abs(0.08).
#[test]
fn three_normalizations_give_different_pck_on_identical_input() {
let gt = pose17(&[
(0, 0.50, 0.10), // nose
(5, 0.50, 0.30), // left_shoulder
(CANON_LEFT_HIP, 0.40, 0.90),
(CANON_RIGHT_HIP, 0.60, 0.90),
]);
// nose displaced 0.06, shoulder displaced 0.10, hips exact.
let pred = pose17(&[
(0, 0.56, 0.10), // err 0.06
(5, 0.60, 0.30), // err 0.10
(CANON_LEFT_HIP, 0.40, 0.90), // exact
(CANON_RIGHT_HIP, 0.60, 0.90), // exact
]);
let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
// Torso τ@20 = 0.04: nose 0.06>0.04 wrong, sh 0.10>0.04 wrong,
// hips exact ⇒ 2/4 = 0.5.
let (_, _, torso) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
// Bbox diag = sqrt(0.68)=0.82462; τ@20 = 0.164924:
// nose 0.06 ≤ τ correct, sh 0.10 ≤ τ correct, hips exact ⇒ 4/4 = 1.0.
let (_, _, bbox) = pck_at(&pred, &gt, &vis, 20, PckNormalization::BoundingBoxDiagonal);
// Abs(0.08): nose 0.06 ≤ 0.08 correct, sh 0.10 > 0.08 wrong, hips exact
// ⇒ 3/4 = 0.75.
let (_, _, abs) = pck_at(&pred, &gt, &vis, 20, PckNormalization::AbsolutePixels(0.08));
assert!((torso - 0.5).abs() < 1e-6, "torso PCK expected 0.5, got {torso}");
assert!((bbox - 1.0).abs() < 1e-6, "bbox PCK expected 1.0, got {bbox}");
assert!((abs - 0.75).abs() < 1e-6, "abs(0.08) PCK expected 0.75, got {abs}");
// The whole point: identical predictions, three DISTINCT PCK values.
assert!(torso != bbox && bbox != abs && torso != abs,
"normalizations must give distinct PCK: torso={torso}, bbox={bbox}, abs={abs}");
}
// -------- AbsolutePixels ignores k (raw threshold) --------
#[test]
fn absolute_pixels_ignores_threshold_percentage() {
let gt = pose17(&[(5, 0.50, 0.50), (CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
let pred = pose17(&[(5, 0.53, 0.50), (CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
// τ = 0.05 raw; joint5 err 0.03 ≤ 0.05 correct. k=5 and k=99 must agree.
let (_, _, p5) = pck_at(&pred, &gt, &vis, 5, PckNormalization::AbsolutePixels(0.05));
let (_, _, p99) = pck_at(&pred, &gt, &vis, 99, PckNormalization::AbsolutePixels(0.05));
assert_eq!(p5, p99, "AbsolutePixels must ignore the k percentage");
assert!((p5 - 1.0).abs() < 1e-6, "all three within 0.05, got {p5}");
}
// -------- MPJPE hand-computed (2D and 3D) --------
#[test]
fn mpjpe_hand_computed_2d() {
// joint0 err (3,4)->5, joint1 exact->0 ⇒ mean (5+0)/2 = 2.5.
let gt = Array2::from_shape_vec((2, 2), vec![0.0, 0.0, 1.0, 1.0]).unwrap();
let pred = Array2::from_shape_vec((2, 2), vec![3.0, 4.0, 1.0, 1.0]).unwrap();
let vis = Array1::from(vec![2.0, 2.0]);
assert!((mpjpe(&pred, &gt, &vis) - 2.5).abs() < 1e-6);
}
#[test]
fn mpjpe_hand_computed_3d() {
// single joint err (1,2,2) -> sqrt(1+4+4)=3.0.
let gt = Array2::from_shape_vec((1, 3), vec![0.0, 0.0, 0.0]).unwrap();
let pred = Array2::from_shape_vec((1, 3), vec![1.0, 2.0, 2.0]).unwrap();
let vis = Array1::from(vec![2.0]);
assert!((mpjpe(&pred, &gt, &vis) - 3.0).abs() < 1e-6);
}
#[test]
fn mpjpe_excludes_invisible_joints() {
// joint0 visible err 5, joint1 INVISIBLE err 100 ⇒ mean = 5 (joint1 dropped).
let gt = Array2::from_shape_vec((2, 2), vec![0.0, 0.0, 0.0, 0.0]).unwrap();
let pred = Array2::from_shape_vec((2, 2), vec![3.0, 4.0, 100.0, 0.0]).unwrap();
let vis = Array1::from(vec![2.0, 0.0]);
assert!((mpjpe(&pred, &gt, &vis) - 5.0).abs() < 1e-6);
}
// -------- degenerate inputs: no panic --------
#[test]
fn zero_torso_is_unscoreable_not_perfect() {
// Both hips coincident ⇒ torso ≈ 0, and the bbox collapses too ⇒ no usable
// normalizer, so the frame is unscoreable: (0, 0, 0.0), never a false-perfect 1.0.
let gt = pose17(&[(CANON_LEFT_HIP, 0.5, 0.5), (CANON_RIGHT_HIP, 0.5, 0.5)]);
let vis = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]);
assert_eq!(pck_at(&gt, &gt, &vis, 20, PckNormalization::TorsoDiameter), (0, 0, 0.0));
assert_eq!(pck_at(&gt, &gt, &vis, 20, PckNormalization::BoundingBoxDiagonal), (0, 0, 0.0));
}
#[test]
fn no_visible_keypoints_scores_zero() {
let gt = pose17(&[(CANON_LEFT_HIP, 0.4, 0.5), (CANON_RIGHT_HIP, 0.6, 0.5)]);
let vis = vis17(&[]); // nothing visible
let (c, t, pck) = pck_at(&gt, &gt, &vis, 20, PckNormalization::TorsoDiameter);
assert_eq!((c, t, pck), (0, 0, 0.0));
assert_eq!(mpjpe(&gt, &gt, &vis), 0.0);
}
#[test]
fn nan_coords_do_not_panic_and_count_wrong() {
let gt = pose17(&[(5, 0.5, 0.5), (CANON_LEFT_HIP, 0.4, 0.5), (CANON_RIGHT_HIP, 0.6, 0.5)]);
let mut pred = gt.clone();
pred[[5, 0]] = f32::NAN; // joint 5 prediction is NaN
let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
let (c, t, pck) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
assert_eq!(t, 3);
assert_eq!(c, 2, "NaN joint must count as wrong, hips correct ⇒ 2/3");
assert!((pck - 2.0 / 3.0).abs() < 1e-6);
// mpjpe with a NaN joint yields NaN (caller filters) but must not panic.
assert!(mpjpe(&pred, &gt, &vis).is_nan());
}
// -------- batch report: micro-average + self-describing struct --------
#[test]
fn accuracy_report_micro_averages_and_carries_definition() {
// Frame A: 2 visible, both correct (2/2). Frame B: 2 visible, both wrong (0/2).
// Micro-average over joints: 2 correct / 4 = 0.5. Both frames have the same
// number of visible joints, so mean-of-frame-PCK, (1.0+0.0)/2 = 0.5, gives the
// same value here; this test pins the number, not micro versus macro.
let gt = pose17(&[(CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
let vis = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]);
let frame_a = PoseFrame { pred: gt.clone(), gt: gt.clone(), visibility: vis.clone() };
// Frame B: displace both hips by 0.05 (> τ 0.04) ⇒ both wrong.
let pred_b = pose17(&[(CANON_LEFT_HIP, 0.45, 0.50), (CANON_RIGHT_HIP, 0.65, 0.50)]);
let frame_b = PoseFrame { pred: pred_b, gt: gt.clone(), visibility: vis.clone() };
let report = accuracy_report(
&[frame_a, frame_b],
&[20, 50],
PckNormalization::TorsoDiameter,
);
assert_eq!(report.n_frames, 2);
assert_eq!(report.n_keypoints, 17);
assert_eq!(report.normalization, PckNormalization::TorsoDiameter);
// PCK@20: 2 correct / 4 visible = 0.5.
assert!((report.pck(20).unwrap() - 0.5).abs() < 1e-6);
// PCK@50: τ = 0.5·0.20 = 0.10, frame B err 0.05 ≤ 0.10 ⇒ all correct
// ⇒ 4/4 = 1.0.
assert!((report.pck(50).unwrap() - 1.0).abs() < 1e-6);
// A reported number always carries its definition in the summary.
assert!(report.summary().contains("torso-diameter"));
}
#[test]
fn accuracy_report_empty_is_zero_not_nan() {
let report = accuracy_report(&[], &[20], PckNormalization::BoundingBoxDiagonal);
assert_eq!(report.n_frames, 0);
assert_eq!(report.pck(20), Some(0.0));
assert_eq!(report.mpjpe, 0.0);
assert!(!report.mpjpe.is_nan());
}
// -------- bbox-norm is looser than torso-norm (sanity, on a batch) --------
#[test]
fn bbox_norm_scores_at_least_torso_norm() {
// bbox diagonal >= torso span whenever both hips are visible (the bbox
// encloses them), so for the SAME frames the bbox threshold is never tighter
// and bbox-PCK >= torso-PCK at the same k. Pin this ordering.
let gt = pose17(&[
(0, 0.50, 0.10),
(5, 0.50, 0.40),
(CANON_LEFT_HIP, 0.40, 0.90),
(CANON_RIGHT_HIP, 0.60, 0.90),
]);
let pred = pose17(&[
(0, 0.55, 0.10),
(5, 0.58, 0.40),
(CANON_LEFT_HIP, 0.42, 0.90),
(CANON_RIGHT_HIP, 0.62, 0.90),
]);
let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
let frame = PoseFrame { pred, gt, visibility: vis };
let torso = accuracy_report(std::slice::from_ref(&frame), &[20], PckNormalization::TorsoDiameter);
let bbox = accuracy_report(std::slice::from_ref(&frame), &[20], PckNormalization::BoundingBoxDiagonal);
assert!(
bbox.pck(20).unwrap() >= torso.pck(20).unwrap(),
"bbox-norm (looser) must be >= torso-norm: bbox={:?} torso={:?}",
bbox.pck(20), torso.pck(20)
);
}
}
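
Note on the key proof above: the following is a minimal, dependency-free sketch of the same hand calculation. It is not part of this commit and is not the crate's `pck_at`. The names `Norm`, `threshold`, `pck`, `GT`, `PRED`, and the 4-joint layout with hips at indices 2 and 3 are hypothetical, chosen only for illustration; all joints are assumed visible.

```rust
// Illustrative sketch only. Hypothetical names; NOT the crate's API.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Norm {
    TorsoDiameter,
    BoundingBoxDiagonal,
    AbsolutePixels(f32),
}

// Toy 4-joint pose: nose, left shoulder, left hip, right hip (all visible).
const L_HIP: usize = 2;
const R_HIP: usize = 3;
const GT: [[f32; 2]; 4] = [[0.50, 0.10], [0.50, 0.30], [0.40, 0.90], [0.60, 0.90]];
const PRED: [[f32; 2]; 4] = [[0.56, 0.10], [0.60, 0.30], [0.40, 0.90], [0.60, 0.90]];

fn dist(a: [f32; 2], b: [f32; 2]) -> f32 {
    ((a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2)).sqrt()
}

/// Distance threshold for PCK@k, or `None` when the normalizer is degenerate.
fn threshold(gt: &[[f32; 2]], k: u8, norm: Norm) -> Option<f32> {
    if gt.len() <= R_HIP {
        return None; // toy pose too short to contain both hips
    }
    let scale = match norm {
        // The raw threshold is used as-is; k is ignored.
        Norm::AbsolutePixels(t) => return Some(t),
        Norm::TorsoDiameter => dist(gt[L_HIP], gt[R_HIP]),
        Norm::BoundingBoxDiagonal => {
            let mut lo = [f32::INFINITY; 2];
            let mut hi = [f32::NEG_INFINITY; 2];
            for p in gt {
                for d in 0..2 {
                    lo[d] = lo[d].min(p[d]);
                    hi[d] = hi[d].max(p[d]);
                }
            }
            dist(lo, hi)
        }
    };
    if scale.is_finite() && scale > 1e-6 {
        Some(f32::from(k) / 100.0 * scale)
    } else {
        None
    }
}

/// Returns (correct, total, pck); a degenerate normalizer yields (0, 0, 0.0).
fn pck(pred: &[[f32; 2]], gt: &[[f32; 2]], k: u8, norm: Norm) -> (usize, usize, f32) {
    let tau = match threshold(gt, k, norm) {
        Some(t) => t,
        None => return (0, 0, 0.0),
    };
    // `d <= tau` is false for NaN, so a NaN prediction counts as wrong.
    let correct = pred
        .iter()
        .zip(gt)
        .filter(|(p, g)| dist(**p, **g) <= tau)
        .count();
    (correct, gt.len(), correct as f32 / gt.len() as f32)
}
```

Under these assumptions, `pck(&PRED, &GT, 20, norm)` gives `(2, 4, 0.5)` for `Norm::TorsoDiameter`, `(4, 4, 1.0)` for `Norm::BoundingBoxDiagonal`, and `(3, 4, 0.75)` for `Norm::AbsolutePixels(0.08)`, matching the 0.50 / 1.00 / 0.75 figures in the hand calculation.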

@@ -43,6 +43,11 @@
// All *this* crate's code is written without unsafe blocks.
#![warn(missing_docs)]
/// Metric-locked pose-accuracy harness (ADR-155 §Tier-1.2; needs ADR slot 173)
/// — selectable `PckNormalization` (torso / bbox-diagonal / absolute), `mpjpe`,
/// and a self-describing `PoseAccuracy` result so a reported PCK number always
/// carries the definition it was computed under.
pub mod accuracy;
pub mod config;
pub mod dataset;
pub mod domain;
@@ -89,6 +94,11 @@ pub use metrics_core::{
canonical_torso_size, oks_canonical, pck_canonical, CANON_LEFT_HIP, CANON_RIGHT_HIP,
COCO_KP_SIGMAS,
};
// ADR-155 §Tier-1.2 — metric-locked accuracy harness (selectable PCK
// normalization + MPJPE + self-describing result).
pub use accuracy::{
accuracy_report, mpjpe as pck_mpjpe, pck_at, PckNormalization, PoseAccuracy, PoseFrame,
};
pub use config::TrainingConfig;
pub use dataset::{
CsiDataset, CsiSample, DataLoader, MmFiDataset, SyntheticConfig, SyntheticCsiDataset,

@@ -29,6 +29,66 @@
use ndarray::{Array1, Array2};
use wifi_densepose_train::{oks_canonical, pck_canonical, CANON_LEFT_HIP, CANON_RIGHT_HIP};
// ADR-155 §Tier-1.2 — metric-locked accuracy harness public surface.
use wifi_densepose_train::{accuracy_report, pck_at, PckNormalization, PoseFrame};
// ---------------------------------------------------------------------------
// Metric-locked accuracy harness: the three PCK normalizations are reachable
// from the crate root and give DIFFERENT PCK on identical predictions — the
// proof that the 96 / 81.6 / 61 figures were non-comparable (validated here as
// a downstream consumer would call it).
// ---------------------------------------------------------------------------
/// Identical predictions, three declared normalizations ⇒ three distinct PCK.
/// Hand calc (all coords in `[0,1]`):
/// * GT: nose(0)=(0.50,0.10), l_sh(5)=(0.50,0.30), hips=(0.40,0.90)/(0.60,0.90).
/// * Pred: nose err 0.06, shoulder err 0.10, hips exact.
/// * torso = 0.20 ⇒ τ@20 = 0.04 ⇒ only hips correct ⇒ 2/4 = **0.50**.
/// * bbox = √(0.20²+0.80²)=0.82462 ⇒ τ@20 = 0.16492 ⇒ all correct ⇒ **1.00**.
/// * abs(0.08): nose 0.06≤0.08 ok, shoulder 0.10>0.08 wrong ⇒ 3/4 = **0.75**.
#[test]
fn harness_three_normalizations_differ_from_crate_root() {
let gt = pose17(&[
(0, 0.50, 0.10),
(5, 0.50, 0.30),
(CANON_LEFT_HIP, 0.40, 0.90),
(CANON_RIGHT_HIP, 0.60, 0.90),
]);
let pred = pose17(&[
(0, 0.56, 0.10),
(5, 0.60, 0.30),
(CANON_LEFT_HIP, 0.40, 0.90),
(CANON_RIGHT_HIP, 0.60, 0.90),
]);
let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
let (_, _, torso) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
let (_, _, bbox) = pck_at(&pred, &gt, &vis, 20, PckNormalization::BoundingBoxDiagonal);
let (_, _, abs) = pck_at(&pred, &gt, &vis, 20, PckNormalization::AbsolutePixels(0.08));
assert!((torso - 0.50).abs() < 1e-6, "torso PCK 0.50, got {torso}");
assert!((bbox - 1.00).abs() < 1e-6, "bbox PCK 1.00, got {bbox}");
assert!((abs - 0.75).abs() < 1e-6, "abs(0.08) PCK 0.75, got {abs}");
assert!(
torso != bbox && bbox != abs && torso != abs,
"three normalizations must be distinct: {torso} / {bbox} / {abs}"
);
}
/// `accuracy_report` returns a self-describing result carrying its normalization,
/// so an unlabeled PCK number is structurally impossible at the API boundary.
#[test]
fn harness_report_carries_normalization_label() {
let gt = pose17(&[(CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
let vis = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]);
let frame = PoseFrame { pred: gt.clone(), gt: gt.clone(), visibility: vis };
let report = accuracy_report(&[frame], &[20], PckNormalization::BoundingBoxDiagonal);
assert_eq!(report.normalization, PckNormalization::BoundingBoxDiagonal);
assert_eq!(report.n_keypoints, 17);
assert_eq!(report.n_frames, 1);
assert!((report.pck(20).unwrap() - 1.0).abs() < 1e-6);
assert!(report.summary().contains("bbox-diagonal"));
}
// ---------------------------------------------------------------------------
// Tests that use `EvalMetrics` (requires tch-backend because the metrics