feat(train): metric-locked PCK/MPJPE accuracy harness + ADR-173 (resolve PCK-definition ambiguity) (#1092)

* feat(train): metric-locked PCK/MPJPE accuracy harness — resolve PCK-definition ambiguity

The SOTA brief (docs/research/sota-nn-train-benchmark-brief.md §1/§3.1/§4)
identifies metric ambiguity as the single biggest threat to any beyond-SOTA
claim: three PCK@20 numbers (96.09% WiFlow-STD image-normalized, 81.63%
AetherArena torso-PCK, 61.1% GraphPose-Fi standard PCK) cannot be lined up
because each silently uses a different normalization. The project has been
burned twice by this (a withdrawn 92.9% used absolute pixels, not torso).

New src/accuracy.rs makes the normalizer explicit, selectable, and carried with
every reported number:
- PckNormalization enum: TorsoDiameter (standard MM-Fi/GraphPose-Fi hip↔hip),
  BoundingBoxDiagonal (looser WiFlow-STD image-normalized), AbsolutePixels(t)
  (retracted convention, reproducible + clearly non-comparable).
- pck_at(pred, gt, vis, k, normalization) — one canonical PCK reusing the
  metrics_core geometric primitives (no duplicate kernel).
- mpjpe(pred, gt, vis) — 2D/3D, mm.
- PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints,
  n_frames } via accuracy_report(frames, ks, normalization) — an unlabeled PCK
  number is structurally impossible.

17 hand-computed deterministic tests (no GPU, no datasets) prove the harness
arithmetic, including the key proof that identical predictions score
0.50 / 1.00 / 0.75 under the three normalizations, plus graceful degenerate
handling (zero torso, empty frames, NaN coords — no panic, never false-perfect).

This is measurement infrastructure, NOT an accuracy claim. Public API worth an
ADR — needs ADR slot 173 (parent to write).

wifi-densepose-train lib 191→206, test_metrics 12→14, 0 failed; full workspace
green (exit 0); Python deterministic proof unchanged
(f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a).

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr): ADR-173 — metric-locked PCK/MPJPE accuracy harness

Documents the accuracy harness (committed 3a8b2ed13) that resolves the
PCK-definition ambiguity flagged as the #1 beyond-SOTA risk in the SOTA brief
(#1090): three historical numbers (96/81.6/61) used three unstated
normalizations. The harness makes normalization explicit + selectable
(PckNormalization enum) and every reported number carries its definition.
Key proof: identical predictions → 0.50/1.00/0.75 under torso/bbox/abs.

Co-Authored-By: claude-flow <ruv@ruv.net>
rUv 2026-06-15 00:41:02 -04:00 committed by GitHub
parent cfd0ad76cf
commit 90a88ada9a
GPG Key ID: B5690EEEBB952194
5 changed files with 902 additions and 0 deletions


@ -14,6 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **`homecore-recorder` security review (ADR-132 surfaces) — two real bounding fixes; SQL-injection & NaN-index dimensions confirmed clean with evidence.** Beyond-SOTA review of the HA-compat state recorder (DB persistence + history + ruvector semantic search), the crux being its DB-backed SQL-injection surface. **Findings + fixes:** (1) **Memory-DoS — unbounded `get_state_history`.** The history query carried no `LIMIT`, so a wide `[since, until]` window over a high-frequency entity (a per-second sensor ≈ 86k rows/day) would load an unbounded row set into a single in-memory `Vec`. Added a hard `LIMIT MAX_HISTORY_ROWS` (1,000,000 — generous enough never to truncate a realistic history graph, bounded enough to cap the worst case); the sibling search paths were already `k`-bounded. (2) **Disk-DoS / documented-but-missing `purge`.** The README + HA-compat table advertised `Recorder::purge(older_than)` as a capability, but **no such method existed** — i.e. no retention path at all → unbounded disk growth. Implemented a **transactional** `purge` that deletes `states` + `events` strictly **older than** the cutoff (**exclusive** boundary — idempotent, no off-by-one; a row at the cutoff instant is kept) and **garbage-collects** orphaned `state_attributes` blobs (a dedup-shared blob is dropped only once its last referencing state is gone); all three deletes run in one transaction so a mid-purge failure rolls back cleanly (no states-deleted-but-events-kept corruption). **Confirmed clean with evidence:** SQL injection — **every** query in `db.rs` uses bound `?` parameters (no `format!`/string-concat of user data into SQL); the lone `format!` builds the LIKE *pattern*, which is itself bound as a parameter with `ESCAPE '\\'` and metacharacter escaping. Pinned: a state value `'; DROP TABLE states; --` is stored/queried **literally** (table survives), and a `%`/`_` in a search query matches **literally**, not as a wildcard. 
NaN-index poisoning (the calibration/vitals/geo class) — **structurally impossible** here: embeddings are SHA-256 → `i32` → `f32` (an `i32` cast to `f32` is always finite, never NaN/Inf), with an all-zero-digest norm guard; probed empty-index search, empty-string query, and `k=0` — all return `Ok(0)`, **no panic**. Fail-closed write path — a removal event yields `Ok(None)`, semantic-index failure is logged not propagated (best-effort, never blocks the durable SQLite write), and `EntityId` parsing failures fall back rather than panic. **6 new pinning tests** (SQL-injection literal-storage, LIKE-metacharacter literalness, history `LIMIT`, purge exclusive-boundary, purge attribute-GC-keeps-shared, purge old-events): `homecore-recorder` **19 → 25** (`--no-default-features`) / **25 → 31** (`--features ruvector`), 0 failed; the purge-boundary test is a true pin (fails deleting 2 rows under an inclusive cutoff, passes deleting 1 under the exclusive cutoff). Behaviour otherwise unchanged; Python deterministic proof unchanged (recorder is off the signal proof path).
### Added
- **Metric-locked PCK/MPJPE accuracy harness — resolves the PCK-definition ambiguity (`wifi-densepose-train`, needs ADR slot 173).** The SOTA brief (`docs/research/sota-nn-train-benchmark-brief.md` §1, §3.1, §4) found the single biggest threat to any "beyond-SOTA" claim is **metric ambiguity**: three PCK@20 figures (96.09% WiFlow-STD image-normalized, 81.63% AetherArena torso-PCK, 61.1% GraphPose-Fi standard PCK) cannot be lined up because each silently uses a different normalization — the project has been burned twice by this (a withdrawn "92.9%" used *absolute* pixels, not torso). New `src/accuracy.rs` makes the normalizer **explicit, selectable, and carried with every reported number**: a `PckNormalization` enum (`TorsoDiameter` = standard MM-Fi/GraphPose-Fi hip↔hip; `BoundingBoxDiagonal` = looser WiFlow-STD image-normalized; `AbsolutePixels(threshold)` = the retracted convention, included so historical numbers are reproducible and clearly labeled non-comparable); one canonical `pck_at(pred, gt, vis, k, normalization)` reusing the `metrics_core` geometric primitives (hip distance, bbox diagonal — no duplicate kernel); `mpjpe(pred, gt, vis)` (2D/3D, mm); and a self-describing `PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints, n_frames }` returned by `accuracy_report(frames, ks, normalization)` so an **unlabeled PCK number is structurally impossible**. **17 hand-computed deterministic tests** (no GPU, no datasets) prove the harness arithmetic: perfect→PCK=1.0/MPJPE=0; all-just-outside→0.0; half-in-half-out→0.5; the **key proof** that identical predictions score 0.50 (torso) / 1.00 (bbox) / 0.75 (abs) under the three normalizations (the ambiguity is real and the definitions are distinct); MPJPE 2D/3D fixtures; and graceful degenerate handling (zero torso, empty frames, NaN coords — no panic, never a false-perfect).
**This is measurement infrastructure, not an accuracy claim** — the tests prove the harness is correct, not that any model is good. `wifi-densepose-train` lib 191→206, `test_metrics` 12→14, 0 failed. Python deterministic proof unchanged (off the signal proof path).
- **RuField `rufield-viewer` live-ingest mode — closes the RuView↔RuField visual loop (ADR-262 surfaces).** The dashboard gains `--source live --upstream <RuView-URL>`: it consumes RuView's `/ws/field` SSE (falling back to polling `/api/field`), **verifies every event's ed25519 provenance receipt on ingest** (`is_fusable`) — forged/tampered events are flagged ✗ and **never fused** into trusted inferences — and renders real RuView `FieldEvent`s through the same room-state/privacy-badge/fusion-graph/receipt path the synthetic mode uses (wire-compatible by construction: both sides use `rufield_core::FieldEvent` serde). **Strict banner honesty:** a single `BannerState` shows `SYNTHETIC` / `LIVE — <upstream>` / `DISCONNECTED — <upstream> unreachable`, mutually exclusive — never SYNTHETIC while showing live data or vice versa; live mode returns **409** on `/api/run` rather than fabricate a synthetic run, and starts DISCONNECTED until first verified contact. Default stays synthetic. 26 tests / 0 failed. `ruvnet/rufield` `crates/rufield-viewer`; `vendor/rufield` submodule bumped.
- **ADR-262 P3 — live RuField surface: RuView's running sensing-server now speaks RuField on `/api/field` + `/ws/field`.** Wires the P1 `wifi-densepose-rufield` bridge into the live `wifi-densepose-sensing-server` (the bridge is the only added coupling, ADR-262 §5.4). A new `src/rufield_surface.rs` module (kept out of the 8k-line `main.rs`) holds a `FieldSurface` with a **dedicated ed25519 `Signer`**, a bounded ring buffer of recent signed events (`FIELD_RING_CAPACITY = 64`), and the `/ws/field` broadcast topic; it exposes `GET /api/field` (latest signed `FieldEvent`s + signer pubkey + a `dev_signing_key` flag) and `GET /ws/field` (per-cycle stream, mirroring `/ws/sensing`), plus a standalone `router()` for isolated testing. **Tap:** at the ESP32 governed-trust cycle (`main.rs` `observe_cycle` ~`:5886` / `SensingUpdate` build ~`:5938`), `emit_rufield_event` joins the cycle's real `SensingUpdate` (features/classification/signal_field) with the engine's recorded `effective_class`/`demoted` trust state into a `SensingSnapshot` and surfaces a signed `FieldEvent`; **existing endpoints (`/ws/sensing` etc.) are unchanged; this is purely additive.** **Signer (defers the P2 key decision, §8 Q1):** a **standalone dev/sensing key** from `WDP_RUFIELD_SIGNING_SEED` (64-hex or ≥32-byte value), else a deterministic dev default with a logged `WARN` — reusing the `cog-ha-matter` Ed25519 key is the deferred P2 call, so P3 does not pre-empt it. **Egress privacy (fail-closed):** `network_egress_allowed` is *stricter* than `DefaultPrivacyGuard` for an unattended live surface — only **P1/P2** leave the box; P0 (raw) and P3/P4/P5 are held edge-local, so a `Derived → P4/P5` cycle **never** surfaces; no-presence cycles emit **no phantom event**.
**P3 acceptance gates (`tests/rufield_surface_test.rs`, 4 integration via `tower::oneshot` + 4 module unit, 0 failed):** a well-formed **signed** event (`Modality::WifiCsi`, P2 not P1, `is_fusable` ed25519-verified, real timestamp); empty cycle → no phantom; **privacy-safety** — an injected `Derived` trust never surfaces; a mixed stream surfaces only egress-safe events. **Honest scope (ADR-262 §0/§6):** real plumbing on a **live endpoint**, **NOT accuracy** — single-link CSI with its existing caveats (no validated room-coordinate accuracy — `field_localize`), a dedicated dev signing key pending the P2 ownership decision, no accuracy claim. The win is narrowly: "RuView's live sensing now speaks RuField on `/ws/field`."
- **ADR-262 P1 — `wifi-densepose-rufield` anti-corruption bridge: RuView WiFi-CSI sensing → signed RuField `FieldEvent`s.** A new v2 workspace member (the *single coupling point* between RuView and the standalone RuField MFS spec, ADR-262 §5.4) that **path-deps** the `vendor/rufield` submodule crates (`rufield-core`/`-provenance`/`-privacy`/`-fusion` — pure-Rust, `--no-default-features`-buildable: serde/sha2/ed25519/toml only, no tch/openblas/ndarray/candle) and **no** RuView internal crate. The bridge takes owned primitives — `SensingSnapshot` mirrors the `/ws/sensing` `SensingUpdate` (features + classification + signal_field) joined with the `TrustedOutput` trust state (`trust_class`/`demoted`/`identity_bound`) — and `snapshot_to_field_event()` emits one **signed** `FieldEvent` (`Modality::WifiCsi`, axis `[Frequency]`): a real `FieldTensor` from the feature scalars with the real `timestamp_ns`; an `Observation` whose `range_m`/`motion_vector`/`space_cell` are derived from the strongest **signal-field peak** when present (else `None` — coordinates are **never fabricated**, per the `field_localize` caveat) and `confidence` from the classification; a real `ProvenanceRef` (sha256 over the tensor bytes, `synthetic=false`) **ed25519-signed** so `rufield_provenance::is_fusable` passes. **The §3.3 privacy mapping is the critical correctness item**, implemented as `map_privacy()` mapping RuView's class onto RuField P0–P5 **by information content, NEVER by byte value** and **fail-closed**: RuView `Derived` (byte `1`, which sorts *below* `Anonymous` byte `2`) carries an identity embedding → maps to **P4** (or **P5** if identity-bound), **never P1** (the single most dangerous mapping mistake); `Raw → P0`, `Anonymous → P2`, `Restricted → P2`; a governed-engine `demoted` cycle floors the egress class to ≥ P2 with raw suppressed.
**P1 acceptance gates (15 tests / 0 failed — 5 unit + 9 integration + 1 doc):** round-trip (`SensingSnapshot → FieldEvent →` serde `→` equal), `is_fusable` (verified ed25519 receipt), `RuFieldFusion::ingest` accept + `infer()` runs, **privacy-safety** (`gate_privacy_safety_derived_never_maps_to_low_privacy` — `Derived → P4/P5`, never P1; a table test over every RuView class; fail-closed demotion), and determinism (same snapshot + same signer seed → byte-identical event). **Honest scope:** this is **P1 plumbing** — a tested conversion + a safe privacy mapping. It is **not** wired into the live server (that is P3) and makes **no accuracy claim** (RuField v0.1 is synthetic; RuView's single-link CSI carries its own caveats). CI: the `rust-tests` workflow checkout gains `submodules: recursive` so the path-deps resolve. Python deterministic proof unchanged (off the signal proof path).


@ -0,0 +1,123 @@
# ADR-173: Metric-Locked PCK/MPJPE Accuracy Harness
| Field | Value |
|-------|-------|
| **Status** | Accepted — implemented, deterministically tested |
| **Date** | 2026-06-15 |
| **Deciders** | ruv |
| **Codename** | **METRIC-LOCK** |
| **Amends** | ADR-155 (generalizes the torso-only `metrics_core::pck_canonical` to a selectable normalization) |
| **Motivated by** | `docs/research/sota-nn-train-benchmark-brief.md` (PR #1090) |
## Context
The SOTA research brief (PR #1090) identified the single biggest
threat to any "beyond-SOTA" accuracy claim this project makes: **metric
ambiguity**. Three PCK@20 numbers circulate, computed under three *different and
unstated* normalizations, so they cannot be compared:
- **96.09–96.61%** — WiFlow-STD reproduction, **image/bounding-box-normalized** PCK (the looser convention).
- **81.63%** — an internal MM-Fi number reported as **"torso-PCK"** (tighter).
- **61.1%** — GraphPose-Fi (arXiv 2511.19105), **standard torso-diameter** PCK on the MM-Fi random split (the academic frontier).

The project has been burned by this twice: a previously-published 92.9% was
retracted because it used **absolute-pixel** normalization, not torso. Until
there is *one canonical, documented, tested* PCK definition — and every reported
number carries the definition it was computed under — no accuracy comparison is
credible, and the "prove everything" bar cannot be met for the benchmark half of
the work.
This is measurement infrastructure, not an accuracy claim. The deliverable's job
is to make the metric **unambiguous and reproducible**, so future numbers are
comparable and an unlabeled PCK is structurally impossible.
## Decision
Add a metric-locked accuracy harness as a new module
`v2/crates/wifi-densepose-train/src/accuracy.rs` (404 non-test lines; inline
deterministic tests bring the file to 708), re-exported at the crate root. It
**extends, not duplicates** — it reuses `metrics_core`'s geometric primitives
(`bounding_box_diagonal`, canonical hip indices `CANON_LEFT_HIP/RIGHT_HIP`), so
there remains exactly one implementation of each geometric reference; the
existing ADR-155 `pck_canonical` (torso-only) is unchanged and this generalizes
it.
### Public API
- `enum PckNormalization { TorsoDiameter, BoundingBoxDiagonal, AbsolutePixels(f32) }`
— the three conventions the three historical numbers used, now **explicit and
selectable**. `.label()` / `.tolerance(...)`.
- `pck_at(pred, gt, vis, k, norm) -> (correct, total, pck)` — PCK@k =
fraction of *visible* keypoints whose predicted-vs-GT distance ≤ the tolerance,
where tolerance = `k%` of the chosen normalizer (or an absolute threshold for
`AbsolutePixels`).
- `mpjpe(pred, gt, vis) -> f32` — mean per-joint position error (2D/3D, coordinate
units; mm for mm inputs). Re-exported crate-root as `pck_mpjpe` to avoid
colliding with the existing `eval::mpjpe`.
- `struct PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints, n_frames }`:
**a reported number always carries its `normalization`**; an unlabeled PCK is
structurally impossible to produce through this surface.
- `struct PoseFrame { pred, gt, visibility }` + `accuracy_report(frames, ks, norm) -> PoseAccuracy`
(micro-averaged over keypoints).
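
The micro-averaging and the "label travels with the number" idea can be sketched with the standard library alone. This is a minimal illustration, not the crate's code: `Frame`, `Report`, and `report` are hypothetical stand-ins that take precomputed per-joint errors and a reference extent instead of `ndarray` keypoint arrays, and counting unscoreable frames in `n_frames` is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one frame: per-visible-joint errors plus the
/// reference extent (`None` when the normalizer is degenerate).
struct Frame {
    errors: Vec<f32>,
    reference: Option<f32>,
}

/// Hypothetical stand-in for `PoseAccuracy`: the label is part of the value.
#[derive(Debug)]
struct Report {
    pck_at: BTreeMap<u8, f32>,
    normalization: &'static str,
    n_frames: usize,
}

fn report(frames: &[Frame], ks: &[u8], normalization: &'static str) -> Report {
    let mut pck_at = BTreeMap::new();
    for &k in ks {
        let (mut correct, mut total) = (0usize, 0usize);
        for f in frames {
            // An unscoreable frame contributes (0, 0): it leaves the
            // denominator untouched instead of counting as perfect.
            let Some(reference) = f.reference else { continue };
            let tol = (k as f32 / 100.0) * reference;
            total += f.errors.len();
            correct += f.errors.iter().filter(|&&e| e <= tol).count();
        }
        // Empty or fully unscoreable input yields 0.0, never NaN.
        let pck = if total > 0 { correct as f32 / total as f32 } else { 0.0 };
        pck_at.insert(k, pck);
    }
    // Assumption: n_frames counts every frame passed in, scoreable or not.
    Report { pck_at, normalization, n_frames: frames.len() }
}

fn main() {
    let frames = vec![
        Frame { errors: vec![0.0, 0.0, 0.5, 0.5], reference: Some(1.0) },
        Frame { errors: vec![0.0, 0.0], reference: Some(1.0) },
        Frame { errors: vec![0.0], reference: None }, // degenerate, skipped
    ];
    let r = report(&frames, &[20, 50], "torso-diameter");
    println!("{r:?}");
}
```

Because the sums run over joints rather than frames, the four-joint frame weighs twice as much as the two-joint frame, which is the micro-averaging behavior described above.
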
### Correctness is proven by hand-computed deterministic tests (no GPU, no data)
The tests construct synthetic keypoint sets whose PCK/MPJPE can be computed by
hand, and assert the harness matches. Highlights (all pass):
| Test | Construction | Expected |
|------|--------------|----------|
| perfect_prediction | pred==gt | PCK=1.0 (all 3 norms), MPJPE=0 |
| all_just_outside | every error just past τ@20 | PCK=0.0 |
| half_in_half_out | 2 exact, 2 just outside | PCK=0.5 |
| **three_normalizations (KEY PROOF)** | identical pred; nose err .06, shoulder .10, hips exact | torso=**0.50**, bbox=**1.00**, abs(.08)=**0.75** |
| mpjpe_2d / mpjpe_3d | (3,4)→5 / (1,2,2)→3 | 2.5 / 3.0 |
| mpjpe_excludes_invisible | invisible joint err 100 ignored | 5.0 |
| zero_torso_unscoreable | coincident hips | `(0,0,0.0)`, **not** false-perfect |
| no_visible_keypoints | vis=∅ | `(0,0,0.0)` |
| nan_coords | one NaN pred coord | counted wrong, **no panic** |
| empty report | no frames | 0.0, **not** NaN |
| bbox≥torso ordering | same frames | bbox-PCK ≥ torso-PCK |
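
The MPJPE rows reduce to simple arithmetic. The sketch below assumes plain fixed-size arrays rather than the crate's `ndarray` types. The fixtures are one plausible construction that yields the table's numbers (the test coordinates are not listed here), and this `mpjpe` is an illustrative stand-in, not the crate's function.

```rust
/// Mean per-joint position error over visible joints (visibility >= 0.5),
/// for D-dimensional points. Illustrative stand-in using plain arrays.
fn mpjpe<const D: usize>(pred: &[[f32; D]], gt: &[[f32; D]], vis: &[f32]) -> f32 {
    let mut sum = 0.0f32;
    let mut count = 0usize;
    for ((p, g), &v) in pred.iter().zip(gt).zip(vis) {
        if v < 0.5 {
            continue; // invisible joints are excluded from the mean
        }
        let sq: f32 = p.iter().zip(g).map(|(a, b)| (a - b) * (a - b)).sum();
        sum += sq.sqrt();
        count += 1;
    }
    if count > 0 { sum / count as f32 } else { 0.0 }
}

fn main() {
    // 2D: one joint off by (3, 4) has error 5, the other is exact: mean 2.5.
    let gt2: [[f32; 2]; 2] = [[0.0, 0.0], [1.0, 1.0]];
    let pred2: [[f32; 2]; 2] = [[3.0, 4.0], [1.0, 1.0]];
    assert_eq!(mpjpe::<2>(&pred2, &gt2, &[1.0, 1.0]), 2.5);

    // 3D: a single joint off by (1, 2, 2) has error 3.
    assert_eq!(mpjpe::<3>(&[[1.0, 2.0, 2.0]], &[[0.0, 0.0, 0.0]], &[1.0]), 3.0);

    // An invisible joint with error 100 is ignored, leaving only the error 5.
    let pred_inv: [[f32; 2]; 2] = [[3.0, 4.0], [100.0, 0.0]];
    let gt_inv: [[f32; 2]; 2] = [[0.0, 0.0], [0.0, 0.0]];
    assert_eq!(mpjpe::<2>(&pred_inv, &gt_inv, &[1.0, 0.0]), 5.0);

    // No visible joints: 0.0 rather than a division by zero.
    assert_eq!(mpjpe::<2>(&pred2, &gt2, &[0.0, 0.0]), 0.0);

    // A NaN coordinate propagates into the mean; it does not panic.
    assert!(mpjpe::<2>(&[[f32::NAN, 0.0]], &[[0.0, 0.0]], &[1.0]).is_nan());
}
```
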
### The key proof (the ambiguity is real and quantified)
Identical predictions, three declared normalizations → **0.50 / 1.00 / 0.75**.
Mechanism: the bbox diagonal `√(0.20² + 0.80²) = 0.825` is ~4× the hip-span torso
`0.20`, so τ@20 is 0.165 (bbox) vs 0.040 (torso) — the looser image-normalized
convention passes joints the strict torso convention rejects. This is *exactly*
why 96% / 81.6% / 61% cannot be lined up without declaring the enum, demonstrated
in-code.
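
The arithmetic behind 0.50 / 1.00 / 0.75 can be reproduced in a few lines. The following is a sketch under stated assumptions: `Norm`, `pck`, and the four-joint layout (nose, shoulder, two hips) are hypothetical, the coordinates are chosen only to match the 0.20 hip span, the 0.20 × 0.80 bounding box, and the 0.06 / 0.10 errors quoted above, and visibility and degenerate-scale handling are left out.

```rust
/// Hypothetical mirror of the three normalization conventions.
#[derive(Clone, Copy)]
enum Norm {
    Torso,
    Bbox,
    Abs(f32),
}

type P = [f32; 2];

fn dist(a: P, b: P) -> f32 {
    ((a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2)).sqrt()
}

/// Diagonal of the axis-aligned bounding box of the ground-truth joints.
fn bbox_diag(gt: &[P]) -> f32 {
    let (mut x0, mut x1, mut y0, mut y1) = (f32::MAX, f32::MIN, f32::MAX, f32::MIN);
    for p in gt {
        x0 = x0.min(p[0]);
        x1 = x1.max(p[0]);
        y0 = y0.min(p[1]);
        y1 = y1.max(p[1]);
    }
    dist([x0, y0], [x1, y1])
}

/// PCK@k assuming every joint is visible and `gt` is non-empty.
fn pck(pred: &[P], gt: &[P], hips: (usize, usize), k: u8, norm: Norm) -> f32 {
    let tol = match norm {
        Norm::Torso => (k as f32 / 100.0) * dist(gt[hips.0], gt[hips.1]),
        Norm::Bbox => (k as f32 / 100.0) * bbox_diag(gt),
        Norm::Abs(t) => t, // k is ignored for the absolute convention
    };
    let correct = pred.iter().zip(gt).filter(|(p, g)| dist(**p, **g) <= tol).count();
    correct as f32 / gt.len() as f32
}

fn main() {
    // nose, shoulder, left hip, right hip: bbox 0.20 x 0.80, hip span 0.20.
    let gt: [P; 4] = [[0.50, 0.10], [0.45, 0.30], [0.40, 0.90], [0.60, 0.90]];
    // Nose off by 0.06, shoulder off by 0.10, hips exact.
    let pred: [P; 4] = [[0.56, 0.10], [0.45, 0.40], [0.40, 0.90], [0.60, 0.90]];
    let hips = (2, 3);

    // Torso tolerance 0.04: only the hips pass.
    assert_eq!(pck(&pred, &gt, hips, 20, Norm::Torso), 0.50);
    // Bbox tolerance about 0.165: all four joints pass.
    assert_eq!(pck(&pred, &gt, hips, 20, Norm::Bbox), 1.00);
    // Absolute tolerance 0.08: the shoulder fails.
    assert_eq!(pck(&pred, &gt, hips, 20, Norm::Abs(0.08)), 0.75);
}
```

The predictions never change between the three calls; only the declared normalizer does, which is the whole point of carrying it with the result.
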
## Validation
- `cargo test -p wifi-densepose-train --no-default-features` → lib **191 → 206**
(+15), `test_metrics` **12 → 14** (+2), doc-tests 8 — **0 failed**.
- `cargo test --workspace --no-default-features` → **exit 0**, 0 failed.
- `python archive/v1/data/proof/verify.py` → **VERDICT: PASS**, hash
`f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a` **unchanged**
(off the signal proof path — confirms no pipeline alteration).
## Consequences
### Positive
- The three historical PCK numbers can now be **recomputed under one declared
definition** and compared honestly. The retracted-number class of error
(silent normalization mismatch) is structurally prevented going forward.
- Establishes the measurement substrate for the beyond-SOTA target: GraphPose-Fi
cross-environment **PCK@20 = 12.9%** (standard torso PCK) is now a number this
harness can produce comparably.
### Negative
- None functional. The harness is additive; no existing metric path changed.
### Neutral
- Producing actual model numbers under this harness requires the trained models +
datasets (MM-Fi) and, for cross-domain splits, is the next sub-deliverable of
the benchmark/optimization milestone — out of scope here (this ADR is the
*instrument*, not the *reading*).
## Links
- ADR-155 — metric core (`pck_canonical`, torso-only) — generalized here
- ADR-152 — WiFi-Pose SOTA 2026 intake / WiFlow-STD benchmark
- `docs/research/sota-nn-train-benchmark-brief.md` — the motivating gap analysis
- GraphPose-Fi — arXiv 2511.19105 (verified cross-env PCK@20 = 12.9% anchor)


@ -0,0 +1,708 @@
//! Metric-locked pose-accuracy harness (ADR-155 §Tier-1.2; needs ADR slot 173).
//!
//! # Why this module exists
//!
//! Three PCK\@20 numbers float around this project and **cannot be lined up**
//! because each silently uses a *different* PCK definition:
//!
//! | Number | Source | PCK normalization |
//! |--------|--------|-------------------|
//! | 96.09 % | WiFlow-STD reproduction | image / bounding-box normalized (looser) |
//! | 81.63 % | AetherArena MM-Fi (ADR-150) | torso-diameter (standard MM-Fi / GraphPose-Fi) |
//! | 61.1 % | GraphPose-Fi (preprint) | torso-diameter, 3D, mm-scale (harder) |
//!
//! The project was burned **twice** by metric ambiguity (a now-retracted "92.9 %
//! PCK\@20" used *absolute* pixel thresholds, not torso normalization). The fix
//! is to make the normalizer **explicit, selectable, and carried with every
//! reported number** so an unlabeled PCK figure is structurally impossible.
//!
//! [`metrics_core`](crate::metrics_core) already pins the *canonical*
//! torso-normalized PCK ([`pck_canonical`](crate::metrics_core::pck_canonical)).
//! This module generalizes it to a [`PckNormalization`] enum covering all three
//! conventions the SOTA brief names, adds [`mpjpe`] (mm), and bundles results
//! into a self-describing [`PoseAccuracy`] struct. It **reuses** the
//! `metrics_core` primitives (hip distance, bounding-box diagonal) — there is
//! still exactly one implementation of each geometric reference.
//!
//! # This is measurement infrastructure, not an accuracy claim
//!
//! Nothing here asserts any project model is good. The unit tests prove the
//! *harness* is arithmetically correct against hand-computed fixtures (no GPU,
//! no datasets), including the key demonstration that the **same predictions
//! score different PCK under the three normalizations** — proof the ambiguity is
//! real and the definitions are genuinely distinct.
//!
//! # Literature
//!
//! - Torso-diameter PCK is the MM-Fi / GraphPose-Fi convention (Yang et al.,
//! *GraphPose-Fi*, arXiv:2511.19105): a keypoint is correct iff its error is
//! within `k · d_torso`, with `d_torso` the hip↔hip (or shoulder↔hip) span.
//! - Bounding-box / image-normalized PCK is the WiFlow-STD-style looser
//! convention (arXiv:2602.08661) — normalize by the GT pose bbox diagonal.
//! - MPJPE (mean per-joint position error, mm) is reported by GraphPose-Fi and
//! Person-in-WiFi-3D (Yan et al., CVPR 2024).
use std::collections::BTreeMap;
use ndarray::{Array1, Array2};
use crate::metrics_core::{
    bounding_box_diagonal, CANON_LEFT_HIP, CANON_RIGHT_HIP,
};
/// Visibility cutoff: a keypoint counts as *visible* iff `visibility[j] >= 0.5`
/// (COCO convention; matches [`crate::metrics_core`]).
const VISIBILITY_THRESHOLD: f32 = 0.5;
/// Minimum positive normalizer extent. Below this the reference scale is
/// considered degenerate (zero torso, collapsed bbox) and the frame is reported
/// unscoreable rather than dividing by ≈0.
const MIN_REFERENCE_EXTENT: f32 = 1e-6;
// ===========================================================================
// PCK normalization — the explicit, selectable definition
// ===========================================================================
/// The PCK normalization basis — **the single knob that made three project
/// numbers non-comparable**, now explicit and carried with every result.
///
/// A keypoint `j` (with `visibility[j] >= 0.5`) is *correct* iff
/// `‖pred_j − gt_j‖₂ ≤ τ`, where the **distance tolerance `τ`** is derived from
/// the chosen normalization and the PCK threshold `k` (given as a percentage,
/// e.g. `20` for PCK\@20):
///
/// | Variant | `τ` (tolerance in coordinate units) |
/// |---------|--------------------------------------|
/// | [`TorsoDiameter`](Self::TorsoDiameter) | `(k/100) · d_torso` |
/// | [`BoundingBoxDiagonal`](Self::BoundingBoxDiagonal) | `(k/100) · d_bbox` |
/// | [`AbsolutePixels`](Self::AbsolutePixels) | `threshold` (k ignored) |
///
/// `d_torso` is the hip↔hip span (COCO joints 11↔12), falling back to the bbox
/// diagonal unless both hips are visible with a positive span — identical to
/// [`crate::metrics_core::canonical_torso_size`]. `d_bbox` is the diagonal of
/// the axis-aligned bounding box of all visible GT keypoints.
///
/// These yield **different** PCK on the *same* predictions whenever
/// `d_torso ≠ d_bbox` (always true for a real pose: the bbox is larger than the
/// hip span), which is exactly why the 96 / 81.6 / 61 numbers cannot be lined
/// up without declaring this enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PckNormalization {
    /// **Torso-diameter** (hip↔hip span). The standard MM-Fi / GraphPose-Fi
    /// convention and the *stricter* of the two relative normalizers. This is
    /// the canonical default ([`crate::metrics_core::pck_canonical`]).
    TorsoDiameter,
    /// **Bounding-box diagonal** (a.k.a. image-normalized). The looser
    /// WiFlow-STD-style convention: normalize by the GT pose bbox diagonal,
    /// which is larger than the torso span ⇒ a more forgiving threshold ⇒ a
    /// higher PCK on identical predictions.
    BoundingBoxDiagonal,
    /// **Absolute pixel/coordinate threshold** — no pose-relative
    /// normalization. The PCK `k` percentage is ignored; the held `threshold`
    /// is the raw distance tolerance directly. Included so historical
    /// retracted-style numbers are reproducible, and **clearly labeled as
    /// non-comparable** to the relative variants (it does not scale with body
    /// size or camera distance).
    AbsolutePixels(f32),
}
impl PckNormalization {
    /// Human-readable, *self-documenting* label for a reported number — so a
    /// `PoseAccuracy` printed anywhere always carries its definition.
    pub fn label(&self) -> String {
        match self {
            PckNormalization::TorsoDiameter => "torso-diameter".to_string(),
            PckNormalization::BoundingBoxDiagonal => "bbox-diagonal".to_string(),
            PckNormalization::AbsolutePixels(t) => format!("absolute-px({t})"),
        }
    }
    /// Compute the per-frame distance tolerance `τ` for PCK threshold `k`
    /// (percentage). Returns `None` when the (relative) normalizer is degenerate
    /// — the frame cannot be scored.
    ///
    /// `gt_kpts` is `[n, 2]` (or `[n, ≥2]`, only x/y used); `visibility` is `[n]`.
    fn tolerance(&self, gt_kpts: &Array2<f32>, visibility: &Array1<f32>, k: u8) -> Option<f32> {
        let n = gt_kpts.shape()[0].min(visibility.len());
        match self {
            PckNormalization::AbsolutePixels(threshold) => {
                // Raw tolerance, independent of pose scale and of `k`.
                if *threshold > 0.0 {
                    Some(*threshold)
                } else {
                    None
                }
            }
            PckNormalization::TorsoDiameter => {
                let d = torso_diameter(gt_kpts, visibility, n)?;
                Some((k as f32 / 100.0) * d)
            }
            PckNormalization::BoundingBoxDiagonal => {
                let d = bounding_box_diagonal(gt_kpts, visibility, n);
                if d > MIN_REFERENCE_EXTENT {
                    Some((k as f32 / 100.0) * d)
                } else {
                    None
                }
            }
        }
    }
}
/// Hip↔hip torso diameter with a bbox-diagonal fallback — the relative
/// normalizer shared by `TorsoDiameter` PCK and
/// [`crate::metrics_core::canonical_torso_size`]. Returns `None` when no
/// positive-extent reference exists.
fn torso_diameter(gt_kpts: &Array2<f32>, visibility: &Array1<f32>, n: usize) -> Option<f32> {
    if CANON_LEFT_HIP < n
        && CANON_RIGHT_HIP < n
        && visibility[CANON_LEFT_HIP] >= VISIBILITY_THRESHOLD
        && visibility[CANON_RIGHT_HIP] >= VISIBILITY_THRESHOLD
    {
        let dx = gt_kpts[[CANON_LEFT_HIP, 0]] - gt_kpts[[CANON_RIGHT_HIP, 0]];
        let dy = gt_kpts[[CANON_LEFT_HIP, 1]] - gt_kpts[[CANON_RIGHT_HIP, 1]];
        let torso = (dx * dx + dy * dy).sqrt();
        if torso > MIN_REFERENCE_EXTENT {
            return Some(torso);
        }
    }
    let diag = bounding_box_diagonal(gt_kpts, visibility, n);
    if diag > MIN_REFERENCE_EXTENT {
        Some(diag)
    } else {
        None
    }
}
// ===========================================================================
// Single-frame PCK / MPJPE
// ===========================================================================
/// Per-frame **PCK\@`k`** under the selected `normalization`.
///
/// A keypoint `j` with `visibility[j] >= 0.5` is correct iff
/// `‖pred_j − gt_j‖₂ ≤ τ`, with `τ` from
/// [`PckNormalization::tolerance`]. Only x/y are used (2D PCK is the standard
/// keypoint-PCK definition; pass 2-column arrays).
///
/// # Returns
/// `(correct, total, pck)` with `pck ∈ [0,1]`. **`(0, 0, 0.0)`** when no
/// keypoint is visible, when (for the relative normalizers) the reference scale
/// is degenerate, or when an `AbsolutePixels` threshold is not positive — a
/// frame with no measurable evidence scores 0, never 1.
/// NaN-valued coordinates make a keypoint *incorrect* (the `<=` comparison is
/// false for NaN) rather than panicking.
pub fn pck_at(
    pred_kpts: &Array2<f32>,
    gt_kpts: &Array2<f32>,
    visibility: &Array1<f32>,
    k: u8,
    normalization: PckNormalization,
) -> (usize, usize, f32) {
    let n = pred_kpts.shape()[0]
        .min(gt_kpts.shape()[0])
        .min(visibility.len());
    let tol = match normalization.tolerance(gt_kpts, visibility, k) {
        Some(t) => t,
        None => return (0, 0, 0.0),
    };
    let mut correct = 0usize;
    let mut total = 0usize;
    for j in 0..n {
        if visibility[j] < VISIBILITY_THRESHOLD {
            continue;
        }
        total += 1;
        let dx = pred_kpts[[j, 0]] - gt_kpts[[j, 0]];
        let dy = pred_kpts[[j, 1]] - gt_kpts[[j, 1]];
        let dist = (dx * dx + dy * dy).sqrt();
        // NaN-safe: `NaN <= tol` is false, so a NaN coordinate counts as wrong.
        if dist <= tol {
            correct += 1;
        }
    }
    let pck = if total > 0 {
        correct as f32 / total as f32
    } else {
        0.0
    };
    (correct, total, pck)
}
/// Per-frame **MPJPE** (mean per-joint position error) over visible keypoints,
/// in the coordinate units of the inputs (report as mm when inputs are mm).
///
/// `pred`/`gt` are `[n, D]` with `D ∈ {2, 3}` (2D or 3D pose); all `D` columns
/// are used. Joints with `visibility[j] < 0.5` are excluded.
///
/// Returns `0.0` when no keypoint is visible (no evidence). A NaN coordinate
/// propagates into the returned mean (callers filter NaN frames upstream); it
/// does not panic.
pub fn mpjpe(pred: &Array2<f32>, gt: &Array2<f32>, visibility: &Array1<f32>) -> f32 {
    let n = pred.shape()[0].min(gt.shape()[0]).min(visibility.len());
    let d = pred.shape()[1].min(gt.shape()[1]);
    let mut sum = 0.0f32;
    let mut count = 0usize;
    for j in 0..n {
        if visibility[j] < VISIBILITY_THRESHOLD {
            continue;
        }
        let mut sq = 0.0f32;
        for c in 0..d {
            let diff = pred[[j, c]] - gt[[j, c]];
            sq += diff * diff;
        }
        sum += sq.sqrt();
        count += 1;
    }
    if count > 0 {
        sum / count as f32
    } else {
        0.0
    }
}
// ===========================================================================
// Self-describing result struct + batch report
// ===========================================================================
/// A pose-accuracy result that **always carries the definition it was computed
/// under** — making an unlabeled PCK number structurally impossible.
///
/// Built by [`accuracy_report`] over a set of frames. `pck_at` maps each
/// requested threshold `k` (percentage, e.g. `20`) to its PCK in `[0,1]`. The
/// `normalization` field records *which* PCK definition produced those numbers,
/// so two `PoseAccuracy` values can only be compared when their `normalization`
/// matches (the comparability check the project lacked).
#[derive(Debug, Clone, PartialEq)]
pub struct PoseAccuracy {
    /// PCK\@k for each requested threshold percentage `k`, in `[0,1]`.
    pub pck_at: BTreeMap<u8, f32>,
    /// Mean per-joint position error in coordinate units (mm for mm inputs).
    pub mpjpe: f32,
    /// The normalization basis under which `pck_at` was computed — the label a
    /// reported number must always carry.
    pub normalization: PckNormalization,
    /// Number of keypoints per frame (the pose convention, e.g. 17 for COCO).
    pub n_keypoints: usize,
    /// Number of frames aggregated into this result.
    pub n_frames: usize,
}
impl PoseAccuracy {
/// Convenience accessor for a single threshold, returning `None` when that
/// `k` was not requested.
pub fn pck(&self, k: u8) -> Option<f32> {
self.pck_at.get(&k).copied()
}
/// A one-line, self-documenting summary suitable for logs / RESULTS.md, e.g.
/// `PCK@20=0.750 (torso-diameter, 17kp, 1 frames) MPJPE=0.0300`.
pub fn summary(&self) -> String {
let pcks: Vec<String> = self
.pck_at
.iter()
.map(|(k, v)| format!("PCK@{k}={v:.3}"))
.collect();
format!(
"{} ({}, {}kp, {} frames) MPJPE={:.4}",
pcks.join(" "),
self.normalization.label(),
self.n_keypoints,
self.n_frames,
self.mpjpe
)
}
}
/// One frame's prediction + ground truth + visibility for batch scoring.
///
/// All three arrays share row count `n_keypoints`; `pred`/`gt` are `[n, D]`
/// (`D ∈ {2,3}`), `visibility` is `[n]`.
#[derive(Debug, Clone)]
pub struct PoseFrame {
/// Predicted keypoints `[n, D]`.
pub pred: Array2<f32>,
/// Ground-truth keypoints `[n, D]`.
pub gt: Array2<f32>,
/// Per-keypoint visibility `[n]` (`>= 0.5` ⇒ visible).
pub visibility: Array1<f32>,
}
/// Aggregate [`PoseAccuracy`] over a batch of frames under **one** explicit
/// `normalization`, for the requested PCK thresholds `ks` (percentages).
///
/// PCK is micro-averaged over keypoints (sum of correct ÷ sum of visible across
/// all frames, the standard keypoint-PCK aggregation), so frames with more
/// visible joints contribute proportionally. MPJPE is likewise micro-averaged
/// over visible joints. Frames that PCK cannot score (no visible joints, or a
/// degenerate relative normalizer) contribute `(0, 0)` to the PCK accumulators
/// and so are excluded from the denominator rather than scored as perfect;
/// their visible joints still count toward MPJPE, which needs no normalizer.
///
/// `n_keypoints` is read from the first frame's `gt`. An **empty** `frames`
/// slice yields all-zero PCK and `0.0` MPJPE, never a panic or NaN; a NaN
/// coordinate on a visible joint does propagate into the batch MPJPE, as in
/// [`mpjpe`].
pub fn accuracy_report(
frames: &[PoseFrame],
ks: &[u8],
normalization: PckNormalization,
) -> PoseAccuracy {
let n_keypoints = frames.first().map(|f| f.gt.shape()[0]).unwrap_or(0);
// PCK: per-threshold (correct, total) accumulators across frames.
let mut pck_acc: BTreeMap<u8, (usize, usize)> = ks.iter().map(|&k| (k, (0, 0))).collect();
// MPJPE: sum of per-joint distances and visible-joint count.
let mut mpjpe_sum = 0.0f32;
let mut mpjpe_count = 0usize;
for frame in frames {
for &k in ks {
let (c, t, _) = pck_at(&frame.pred, &frame.gt, &frame.visibility, k, normalization);
let entry = pck_acc.entry(k).or_insert((0, 0));
entry.0 += c;
entry.1 += t;
}
// Per-frame MPJPE re-derived as a (sum, count) contribution so the
// batch value is a true micro-average over joints.
let n = frame.pred.shape()[0].min(frame.gt.shape()[0]).min(frame.visibility.len());
let d = frame.pred.shape()[1].min(frame.gt.shape()[1]);
for j in 0..n {
if frame.visibility[j] < VISIBILITY_THRESHOLD {
continue;
}
let mut sq = 0.0f32;
for c in 0..d {
let diff = frame.pred[[j, c]] - frame.gt[[j, c]];
sq += diff * diff;
}
mpjpe_sum += sq.sqrt();
mpjpe_count += 1;
}
}
let pck_at: BTreeMap<u8, f32> = pck_acc
.into_iter()
.map(|(k, (c, t))| {
let v = if t > 0 { c as f32 / t as f32 } else { 0.0 };
(k, v)
})
.collect();
let mpjpe = if mpjpe_count > 0 {
mpjpe_sum / mpjpe_count as f32
} else {
0.0
};
PoseAccuracy {
pck_at,
mpjpe,
normalization,
n_keypoints,
n_frames: frames.len(),
}
}
#[cfg(test)]
mod tests {
use super::*;
/// Build a 17-joint `[17, 2]` pose from `(joint, x, y)` triples.
fn pose17(joints: &[(usize, f32, f32)]) -> Array2<f32> {
let mut a = Array2::<f32>::zeros((17, 2));
for &(j, x, y) in joints {
a[[j, 0]] = x;
a[[j, 1]] = y;
}
a
}
/// Build a 17-entry visibility vector: listed joints get `2.0` (visible), the
/// rest stay `0.0` (invisible).
fn vis17(visible: &[usize]) -> Array1<f32> {
let mut v = Array1::<f32>::zeros(17);
for &j in visible {
v[j] = 2.0;
}
v
}
// -------- consts pinned (no silent metric drift) --------
#[test]
fn accuracy_consts_unchanged() {
assert_eq!(VISIBILITY_THRESHOLD, 0.5_f32);
assert_eq!(MIN_REFERENCE_EXTENT, 1e-6_f32);
}
// -------- perfect prediction ⇒ PCK = 1.0, MPJPE = 0 --------
#[test]
fn perfect_prediction_pck_one_mpjpe_zero() {
let gt = pose17(&[
(5, 0.35, 0.35),
(CANON_LEFT_HIP, 0.40, 0.50),
(CANON_RIGHT_HIP, 0.60, 0.50),
]);
let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
for norm in [
PckNormalization::TorsoDiameter,
PckNormalization::BoundingBoxDiagonal,
PckNormalization::AbsolutePixels(0.01),
] {
let (c, t, pck) = pck_at(&gt, &gt, &vis, 20, norm);
assert_eq!((c, t), (3, 3), "{norm:?}");
assert!((pck - 1.0).abs() < 1e-6, "{norm:?} perfect PCK must be 1.0");
}
assert_eq!(mpjpe(&gt, &gt, &vis), 0.0);
}
// -------- all keypoints just OUTSIDE threshold ⇒ PCK = 0.0 --------
//
// Hand calc (torso): hips at (0.40,0.50)/(0.60,0.50) ⇒ torso = 0.20.
// threshold k=20 ⇒ τ = 0.20·0.20 = 0.04. Push every scored joint to an
// error of 0.05 (> 0.04) ⇒ all wrong. The hips are scored joints too, so the
// predicted hips are displaced as well; the torso is measured on the GT hips,
// so τ is unchanged.
#[test]
fn all_just_outside_threshold_pck_zero() {
let gt = pose17(&[
(5, 0.50, 0.50),
(CANON_LEFT_HIP, 0.40, 0.50),
(CANON_RIGHT_HIP, 0.60, 0.50),
]);
// GT torso = 0.20, τ@20 = 0.04. Displace each scored joint by dx=0.05.
let pred = pose17(&[
(5, 0.55, 0.50),
(CANON_LEFT_HIP, 0.45, 0.50),
(CANON_RIGHT_HIP, 0.65, 0.50),
]);
let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
let (c, t, pck) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
assert_eq!(t, 3);
assert_eq!(c, 0, "all errors 0.05 > τ 0.04 ⇒ none correct");
assert_eq!(pck, 0.0);
}
// -------- half-in / half-out ⇒ PCK = 0.5 --------
//
// Hand calc (torso): torso = 0.20, τ@20 = 0.04. Four visible joints; two
// exact (dist 0 ≤ 0.04, correct), two displaced 0.05 (> 0.04, wrong)
// ⇒ 2/4 = 0.5.
#[test]
fn half_in_half_out_pck_half() {
let gt = pose17(&[
(0, 0.50, 0.20),
(5, 0.50, 0.50),
(CANON_LEFT_HIP, 0.40, 0.50),
(CANON_RIGHT_HIP, 0.60, 0.50),
]);
let pred = pose17(&[
(0, 0.50, 0.20), // exact ⇒ correct
(5, 0.55, 0.50), // err 0.05 ⇒ wrong
(CANON_LEFT_HIP, 0.40, 0.50), // exact ⇒ correct
(CANON_RIGHT_HIP, 0.65, 0.50), // err 0.05 ⇒ wrong
]);
let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
let (c, t, pck) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
assert_eq!((c, t), (2, 4));
assert!((pck - 0.5).abs() < 1e-6, "expected 0.5, got {pck}");
}
// -------- THE KEY PROOF: same predictions, three normalizations, three PCK --------
//
// One construction scored three ways. Hand calc:
// GT: nose(0)=(0.50,0.10), l_sh(5)=(0.50,0.30),
// l_hip(11)=(0.40,0.90), r_hip(12)=(0.60,0.90).
// Visible = {0,5,11,12}, all four.
// torso = |0.60-0.40| = 0.20 (hips, y equal).
// bbox: x∈[0.40,0.60] (w=0.20), y∈[0.10,0.90] (h=0.80)
// ⇒ diag = sqrt(0.20² + 0.80²) = sqrt(0.04+0.64) = sqrt(0.68) = 0.8246…
//
// Pred errors (pure dx): nose 0.06, l_sh 0.10, l_hip 0.00, r_hip 0.00.
//
// k = 20:
// • Torso τ = 0.20·0.20 = 0.040 → nose 0.06 and l_sh 0.10 both > τ ⇒ WRONG
// ⇒ 2 correct / 4 = 0.50
// • Bbox τ = 0.20·0.8246 = 0.16492 → nose 0.06 and l_sh 0.10 both ≤ τ ⇒ CORRECT
// ⇒ 4 correct / 4 = 1.00
// • Abs(0.08) τ = 0.08 → nose 0.06 ≤ τ CORRECT, l_sh 0.10 > τ WRONG
// ⇒ 3 correct / 4 = 0.75
//
// The 0.06 nose error is what separates torso from Abs(0.08): with only l_sh
// displaced, both would score 3/4 = 0.75 and the three values would not all
// differ.
#[test]
fn three_normalizations_give_different_pck_on_identical_input() {
let gt = pose17(&[
(0, 0.50, 0.10), // nose
(5, 0.50, 0.30), // left_shoulder
(CANON_LEFT_HIP, 0.40, 0.90),
(CANON_RIGHT_HIP, 0.60, 0.90),
]);
// nose displaced 0.06, shoulder displaced 0.10, hips exact.
let pred = pose17(&[
(0, 0.56, 0.10), // err 0.06
(5, 0.60, 0.30), // err 0.10
(CANON_LEFT_HIP, 0.40, 0.90), // exact
(CANON_RIGHT_HIP, 0.60, 0.90), // exact
]);
let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
// Torso τ@20 = 0.04: nose 0.06>0.04 wrong, sh 0.10>0.04 wrong,
// hips exact ⇒ 2/4 = 0.5.
let (_, _, torso) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
// Bbox diag = sqrt(0.68)=0.82462; τ@20 = 0.164924:
// nose 0.06 ≤ τ correct, sh 0.10 ≤ τ correct, hips exact ⇒ 4/4 = 1.0.
let (_, _, bbox) = pck_at(&pred, &gt, &vis, 20, PckNormalization::BoundingBoxDiagonal);
// Abs(0.08): nose 0.06 ≤ 0.08 correct, sh 0.10 > 0.08 wrong, hips exact
// ⇒ 3/4 = 0.75.
let (_, _, abs) = pck_at(&pred, &gt, &vis, 20, PckNormalization::AbsolutePixels(0.08));
assert!((torso - 0.5).abs() < 1e-6, "torso PCK expected 0.5, got {torso}");
assert!((bbox - 1.0).abs() < 1e-6, "bbox PCK expected 1.0, got {bbox}");
assert!((abs - 0.75).abs() < 1e-6, "abs(0.08) PCK expected 0.75, got {abs}");
// The whole point: identical predictions, three DISTINCT PCK values.
assert!(torso != bbox && bbox != abs && torso != abs,
"normalizations must give distinct PCK: torso={torso}, bbox={bbox}, abs={abs}");
}
// -------- AbsolutePixels ignores k (raw threshold) --------
#[test]
fn absolute_pixels_ignores_threshold_percentage() {
let gt = pose17(&[(5, 0.50, 0.50), (CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
let pred = pose17(&[(5, 0.53, 0.50), (CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
// τ = 0.05 raw; joint5 err 0.03 ≤ 0.05 correct. k=5 and k=99 must agree.
let (_, _, p5) = pck_at(&pred, &gt, &vis, 5, PckNormalization::AbsolutePixels(0.05));
let (_, _, p99) = pck_at(&pred, &gt, &vis, 99, PckNormalization::AbsolutePixels(0.05));
assert_eq!(p5, p99, "AbsolutePixels must ignore the k percentage");
assert!((p5 - 1.0).abs() < 1e-6, "all three within 0.05, got {p5}");
}
// -------- MPJPE hand-computed (2D and 3D) --------
#[test]
fn mpjpe_hand_computed_2d() {
// joint0 err (3,4)->5, joint1 exact->0 ⇒ mean (5+0)/2 = 2.5.
let gt = Array2::from_shape_vec((2, 2), vec![0.0, 0.0, 1.0, 1.0]).unwrap();
let pred = Array2::from_shape_vec((2, 2), vec![3.0, 4.0, 1.0, 1.0]).unwrap();
let vis = Array1::from(vec![2.0, 2.0]);
assert!((mpjpe(&pred, &gt, &vis) - 2.5).abs() < 1e-6);
}
#[test]
fn mpjpe_hand_computed_3d() {
// single joint err (1,2,2) -> sqrt(1+4+4)=3.0.
let gt = Array2::from_shape_vec((1, 3), vec![0.0, 0.0, 0.0]).unwrap();
let pred = Array2::from_shape_vec((1, 3), vec![1.0, 2.0, 2.0]).unwrap();
let vis = Array1::from(vec![2.0]);
assert!((mpjpe(&pred, &gt, &vis) - 3.0).abs() < 1e-6);
}
#[test]
fn mpjpe_excludes_invisible_joints() {
// joint0 visible err 5, joint1 INVISIBLE err 100 ⇒ mean = 5 (joint1 dropped).
let gt = Array2::from_shape_vec((2, 2), vec![0.0, 0.0, 0.0, 0.0]).unwrap();
let pred = Array2::from_shape_vec((2, 2), vec![3.0, 4.0, 100.0, 0.0]).unwrap();
let vis = Array1::from(vec![2.0, 0.0]);
assert!((mpjpe(&pred, &gt, &vis) - 5.0).abs() < 1e-6);
}
// -------- degenerate inputs: no panic --------
#[test]
fn zero_torso_is_unscoreable_not_perfect() {
// Both hips coincident ⇒ torso ≈ 0; the bbox collapses too ⇒ both are
// unscoreable and return (0, 0, 0.0) rather than a false-perfect 1.0.
let gt = pose17(&[(CANON_LEFT_HIP, 0.5, 0.5), (CANON_RIGHT_HIP, 0.5, 0.5)]);
let vis = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]);
assert_eq!(pck_at(&gt, &gt, &vis, 20, PckNormalization::TorsoDiameter), (0, 0, 0.0));
assert_eq!(pck_at(&gt, &gt, &vis, 20, PckNormalization::BoundingBoxDiagonal), (0, 0, 0.0));
}
#[test]
fn no_visible_keypoints_scores_zero() {
let gt = pose17(&[(CANON_LEFT_HIP, 0.4, 0.5), (CANON_RIGHT_HIP, 0.6, 0.5)]);
let vis = vis17(&[]); // nothing visible
let (c, t, pck) = pck_at(&gt, &gt, &vis, 20, PckNormalization::TorsoDiameter);
assert_eq!((c, t, pck), (0, 0, 0.0));
assert_eq!(mpjpe(&gt, &gt, &vis), 0.0);
}
#[test]
fn nan_coords_do_not_panic_and_count_wrong() {
let gt = pose17(&[(5, 0.5, 0.5), (CANON_LEFT_HIP, 0.4, 0.5), (CANON_RIGHT_HIP, 0.6, 0.5)]);
let mut pred = gt.clone();
pred[[5, 0]] = f32::NAN; // joint 5 prediction is NaN
let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
let (c, t, pck) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
assert_eq!(t, 3);
assert_eq!(c, 2, "NaN joint must count as wrong, hips correct ⇒ 2/3");
assert!((pck - 2.0 / 3.0).abs() < 1e-6);
// mpjpe with a NaN joint yields NaN (caller filters) but must not panic.
assert!(mpjpe(&pred, &gt, &vis).is_nan());
}
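// Sketch (an addition, not from the original change set): the same NaN frame
// pushed through `accuracy_report`. Reading the accumulator code above, a NaN
// distance on a visible joint is added to `mpjpe_sum`, so the batch MPJPE is
// expected to be NaN, while PCK stays finite because `pck_at` counts the NaN
// joint as wrong (pinned by the preceding test).
#[test]
fn accuracy_report_nan_joint_gives_nan_mpjpe_and_finite_pck() {
    let gt = pose17(&[(5, 0.5, 0.5), (CANON_LEFT_HIP, 0.4, 0.5), (CANON_RIGHT_HIP, 0.6, 0.5)]);
    let mut pred = gt.clone();
    pred[[5, 0]] = f32::NAN; // joint 5 prediction is NaN
    let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
    let frame = PoseFrame { pred, gt, visibility: vis };
    let report = accuracy_report(&[frame], &[20], PckNormalization::TorsoDiameter);
    let pck20 = report.pck(20).unwrap();
    // NaN joint wrong, both hips exact ⇒ 2/3.
    assert!((pck20 - 2.0 / 3.0).abs() < 1e-6, "expected 2/3, got {pck20}");
    // The batch MPJPE inherits the NaN; callers filter NaN frames upstream.
    assert!(report.mpjpe.is_nan());
}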
// -------- batch report: micro-average + self-describing struct --------
#[test]
fn accuracy_report_micro_averages_and_carries_definition() {
// Frame A: 2 visible, both correct (2/2). Frame B: 2 visible, both wrong (0/2).
// Micro-average over joints: 2 correct / 4 visible = 0.5. With equal visible
// counts the mean of per-frame PCK, (1.0+0.0)/2 = 0.5, coincides, so this
// test pins the joint-level accumulator's value but cannot by itself tell
// the two aggregations apart.
let gt = pose17(&[(CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
let vis = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]);
let frame_a = PoseFrame { pred: gt.clone(), gt: gt.clone(), visibility: vis.clone() };
// Frame B: displace both hips by 0.05 (> τ 0.04) ⇒ both wrong.
let pred_b = pose17(&[(CANON_LEFT_HIP, 0.45, 0.50), (CANON_RIGHT_HIP, 0.65, 0.50)]);
let frame_b = PoseFrame { pred: pred_b, gt: gt.clone(), visibility: vis.clone() };
let report = accuracy_report(
&[frame_a, frame_b],
&[20, 50],
PckNormalization::TorsoDiameter,
);
assert_eq!(report.n_frames, 2);
assert_eq!(report.n_keypoints, 17);
assert_eq!(report.normalization, PckNormalization::TorsoDiameter);
// PCK@20: 2 correct / 4 visible = 0.5.
assert!((report.pck(20).unwrap() - 0.5).abs() < 1e-6);
// PCK@50: τ = 0.5·0.20 = 0.10, frame B err 0.05 ≤ 0.10 ⇒ all correct
// ⇒ 4/4 = 1.0.
assert!((report.pck(50).unwrap() - 1.0).abs() < 1e-6);
// A reported number always carries its definition in the summary.
assert!(report.summary().contains("torso-diameter"));
}
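// Sketch (an addition, not from the original change set): a companion case
// with UNEQUAL visible counts, where micro- and macro-averaging disagree. It
// reuses the `pose17`/`vis17` helpers above and assumes, as the tests above
// do, that an exact prediction scores correct and a 0.05 error exceeds
// τ@20 = 0.04.
//
// Hand calc: Frame A has 4 visible joints, all exact (4/4, error 0 each).
// Frame B has 2 visible joints, both off by 0.05 (0/2).
// micro PCK@20 = (4+0)/(4+2) = 0.6667; mean of per-frame PCK = 0.5.
// micro MPJPE = (0.05+0.05)/6 = 0.016667; mean of per-frame MPJPE = 0.025.
#[test]
fn accuracy_report_micro_average_differs_from_frame_mean() {
    let gt_a = pose17(&[
        (0, 0.50, 0.20),
        (5, 0.50, 0.50),
        (CANON_LEFT_HIP, 0.40, 0.50),
        (CANON_RIGHT_HIP, 0.60, 0.50),
    ]);
    let vis_a = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
    let frame_a = PoseFrame { pred: gt_a.clone(), gt: gt_a, visibility: vis_a };

    let gt_b = pose17(&[(CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
    let pred_b = pose17(&[(CANON_LEFT_HIP, 0.45, 0.50), (CANON_RIGHT_HIP, 0.65, 0.50)]);
    let vis_b = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]);
    let frame_b = PoseFrame { pred: pred_b, gt: gt_b, visibility: vis_b };

    let report = accuracy_report(&[frame_a, frame_b], &[20], PckNormalization::TorsoDiameter);
    let pck20 = report.pck(20).unwrap();
    assert!((pck20 - 4.0 / 6.0).abs() < 1e-6, "micro PCK expected 0.6667, got {pck20}");
    assert!((pck20 - 0.5).abs() > 0.1, "must differ from the per-frame mean 0.5");
    assert!(
        (report.mpjpe - 0.1 / 6.0).abs() < 1e-5,
        "micro MPJPE expected 0.016667, got {}",
        report.mpjpe
    );
}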
#[test]
fn accuracy_report_empty_is_zero_not_nan() {
let report = accuracy_report(&[], &[20], PckNormalization::BoundingBoxDiagonal);
assert_eq!(report.n_frames, 0);
assert_eq!(report.pck(20), Some(0.0));
assert_eq!(report.mpjpe, 0.0);
assert!(!report.mpjpe.is_nan());
}
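// Sketch (an addition, not from the original change set): pin the exact
// `summary()` layout. The expected string is derived from the format strings
// in `summary()` above (PCK to 3 decimals, MPJPE to 4, thresholds in ascending
// `BTreeMap` key order regardless of request order). The label text is read
// back from `label()` rather than hard-coded, since its exact wording is not
// shown in this section.
#[test]
fn summary_layout_is_stable() {
    let gt = pose17(&[(CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
    let vis = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]);
    let frame = PoseFrame { pred: gt.clone(), gt, visibility: vis };
    // Thresholds requested out of order on purpose.
    let report = accuracy_report(&[frame], &[50, 20], PckNormalization::TorsoDiameter);
    let expected = format!(
        "PCK@20=1.000 PCK@50=1.000 ({}, 17kp, 1 frames) MPJPE=0.0000",
        report.normalization.label()
    );
    assert_eq!(report.summary(), expected);
}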
// -------- bbox-norm is looser than torso-norm (sanity, on a batch) --------
#[test]
fn bbox_norm_scores_at_least_torso_norm() {
// With both hips visible the bbox encloses them, so bbox diagonal >= torso
// span and, for the SAME frames, bbox-PCK >= torso-PCK at the same k. Pin
// this ordering.
let gt = pose17(&[
(0, 0.50, 0.10),
(5, 0.50, 0.40),
(CANON_LEFT_HIP, 0.40, 0.90),
(CANON_RIGHT_HIP, 0.60, 0.90),
]);
let pred = pose17(&[
(0, 0.55, 0.10),
(5, 0.58, 0.40),
(CANON_LEFT_HIP, 0.42, 0.90),
(CANON_RIGHT_HIP, 0.62, 0.90),
]);
let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
let frame = PoseFrame { pred, gt, visibility: vis };
let torso = accuracy_report(std::slice::from_ref(&frame), &[20], PckNormalization::TorsoDiameter);
let bbox = accuracy_report(std::slice::from_ref(&frame), &[20], PckNormalization::BoundingBoxDiagonal);
assert!(
bbox.pck(20).unwrap() >= torso.pck(20).unwrap(),
"bbox-norm (looser) must be >= torso-norm: bbox={:?} torso={:?}",
bbox.pck(20), torso.pck(20)
);
}
}

@ -43,6 +43,11 @@
// All *this* crate's code is written without unsafe blocks.
#![warn(missing_docs)]
/// Metric-locked pose-accuracy harness (ADR-155 §Tier-1.2; needs ADR slot 173)
/// — selectable `PckNormalization` (torso / bbox-diagonal / absolute), `mpjpe`,
/// and a self-describing `PoseAccuracy` result so a reported PCK number always
/// carries the definition it was computed under.
pub mod accuracy;
pub mod config;
pub mod dataset;
pub mod domain;
@ -89,6 +94,11 @@ pub use metrics_core::{
canonical_torso_size, oks_canonical, pck_canonical, CANON_LEFT_HIP, CANON_RIGHT_HIP,
COCO_KP_SIGMAS,
};
// ADR-155 §Tier-1.2 — metric-locked accuracy harness (selectable PCK
// normalization + MPJPE + self-describing result).
pub use accuracy::{
accuracy_report, mpjpe as pck_mpjpe, pck_at, PckNormalization, PoseAccuracy, PoseFrame,
};
pub use config::TrainingConfig;
pub use dataset::{
CsiDataset, CsiSample, DataLoader, MmFiDataset, SyntheticConfig, SyntheticCsiDataset,

@ -29,6 +29,66 @@
use ndarray::{Array1, Array2};
use wifi_densepose_train::{oks_canonical, pck_canonical, CANON_LEFT_HIP, CANON_RIGHT_HIP};
// ADR-155 §Tier-1.2 — metric-locked accuracy harness public surface.
use wifi_densepose_train::{accuracy_report, pck_at, PckNormalization, PoseFrame};
// ---------------------------------------------------------------------------
// Metric-locked accuracy harness: the three PCK normalizations are reachable
// from the crate root and give DIFFERENT PCK on identical predictions — the
// proof that the 96 / 81.6 / 61 figures were non-comparable (validated here as
// a downstream consumer would call it).
// ---------------------------------------------------------------------------
/// Identical predictions, three declared normalizations ⇒ three distinct PCK.
/// Hand calc (all coords in `[0,1]`):
/// * GT: nose(0)=(0.50,0.10), l_sh(5)=(0.50,0.30), hips=(0.40,0.90)/(0.60,0.90).
/// * Pred: nose err 0.06, shoulder err 0.10, hips exact.
/// * torso = 0.20 ⇒ τ@20 = 0.04 ⇒ only hips correct ⇒ 2/4 = **0.50**.
/// * bbox = √(0.20²+0.80²)=0.82462 ⇒ τ@20 = 0.16492 ⇒ all correct ⇒ **1.00**.
/// * abs(0.08): nose 0.06≤0.08 ok, shoulder 0.10>0.08 wrong ⇒ 3/4 = **0.75**.
#[test]
fn harness_three_normalizations_differ_from_crate_root() {
let gt = pose17(&[
(0, 0.50, 0.10),
(5, 0.50, 0.30),
(CANON_LEFT_HIP, 0.40, 0.90),
(CANON_RIGHT_HIP, 0.60, 0.90),
]);
let pred = pose17(&[
(0, 0.56, 0.10),
(5, 0.60, 0.30),
(CANON_LEFT_HIP, 0.40, 0.90),
(CANON_RIGHT_HIP, 0.60, 0.90),
]);
let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
let (_, _, torso) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
let (_, _, bbox) = pck_at(&pred, &gt, &vis, 20, PckNormalization::BoundingBoxDiagonal);
let (_, _, abs) = pck_at(&pred, &gt, &vis, 20, PckNormalization::AbsolutePixels(0.08));
assert!((torso - 0.50).abs() < 1e-6, "torso PCK 0.50, got {torso}");
assert!((bbox - 1.00).abs() < 1e-6, "bbox PCK 1.00, got {bbox}");
assert!((abs - 0.75).abs() < 1e-6, "abs(0.08) PCK 0.75, got {abs}");
assert!(
torso != bbox && bbox != abs && torso != abs,
"three normalizations must be distinct: {torso} / {bbox} / {abs}"
);
}
/// `accuracy_report` returns a self-describing result carrying its normalization,
/// so an unlabeled PCK number is structurally impossible at the API boundary.
#[test]
fn harness_report_carries_normalization_label() {
let gt = pose17(&[(CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
let vis = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]);
let frame = PoseFrame { pred: gt.clone(), gt: gt.clone(), visibility: vis };
let report = accuracy_report(&[frame], &[20], PckNormalization::BoundingBoxDiagonal);
assert_eq!(report.normalization, PckNormalization::BoundingBoxDiagonal);
assert_eq!(report.n_keypoints, 17);
assert_eq!(report.n_frames, 1);
assert!((report.pck(20).unwrap() - 1.0).abs() < 1e-6);
assert!(report.summary().contains("bbox-diagonal"));
}
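// Sketch (an addition, not from the original change set): `mpjpe` is
// re-exported from the crate root under the alias `pck_mpjpe`, so a downstream
// consumer would reach it as below. Hand calc: joint0 error (3,4) → 5,
// joint1 exact → 0 ⇒ mean = (5+0)/2 = 2.5.
#[test]
fn harness_mpjpe_reachable_from_crate_root_as_pck_mpjpe() {
    let gt = Array2::from_shape_vec((2, 2), vec![0.0_f32, 0.0, 1.0, 1.0]).unwrap();
    let pred = Array2::from_shape_vec((2, 2), vec![3.0_f32, 4.0, 1.0, 1.0]).unwrap();
    let vis = Array1::from(vec![2.0_f32, 2.0]);
    let err = wifi_densepose_train::pck_mpjpe(&pred, &gt, &vis);
    assert!((err - 2.5).abs() < 1e-6, "expected 2.5, got {err}");
}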
// ---------------------------------------------------------------------------
// Tests that use `EvalMetrics` (requires tch-backend because the metrics