diff --git a/CHANGELOG.md b/CHANGELOG.md
index b1fd80ce..4dc3b4e0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -14,6 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **`homecore-recorder` security review (ADR-132 surfaces) — two real bounding fixes; SQL-injection & NaN-index dimensions confirmed clean with evidence.** Beyond-SOTA review of the HA-compat state recorder (DB persistence + history + ruvector semantic search), the crux being its DB-backed SQL-injection surface. **Findings + fixes:** (1) **Memory-DoS — unbounded `get_state_history`.** The history query carried no `LIMIT`, so a wide `[since, until]` window over a high-frequency entity (a per-second sensor ≈ 86k rows/day) would load an unbounded row set into a single in-memory `Vec`. Added a hard `LIMIT MAX_HISTORY_ROWS` (1,000,000 — generous enough never to truncate a realistic history graph, bounded enough to cap the worst case); the sibling search paths were already `k`-bounded. (2) **Disk-DoS / documented-but-missing `purge`.** The README + HA-compat table advertised `Recorder::purge(older_than)` as a capability, but **no such method existed** — i.e. no retention path at all → unbounded disk growth. Implemented a **transactional** `purge` that deletes `states` + `events` strictly **older than** the cutoff (**exclusive** boundary — idempotent, no off-by-one; a row at the cutoff instant is kept) and **garbage-collects** orphaned `state_attributes` blobs (a dedup-shared blob is dropped only once its last referencing state is gone); all three deletes run in one transaction so a mid-purge failure rolls back cleanly (no states-deleted-but-events-kept corruption). **Confirmed clean with evidence:** SQL injection — **every** query in `db.rs` uses bound `?` parameters (no `format!`/string-concat of user data into SQL); the lone `format!` builds the LIKE *pattern*, which is itself bound as a parameter with `ESCAPE '\\'` and metacharacter escaping.
   Pinned: a state value `'; DROP TABLE states; --` is stored/queried **literally** (table survives), and a `%`/`_` in a search query matches **literally**, not as a wildcard. NaN-index poisoning (the calibration/vitals/geo class) — **structurally impossible** here: embeddings are SHA-256 → `i32` → `f32` (an `i32` cast to `f32` is always finite, never NaN/Inf), with an all-zero-digest norm guard; probed empty-index search, empty-string query, and `k=0` — all return `Ok(0)`, **no panic**. Fail-closed write path — a removal event yields `Ok(None)`, semantic-index failure is logged not propagated (best-effort, never blocks the durable SQLite write), and `EntityId` parsing failures fall back rather than panic. **6 new pinning tests** (SQL-injection literal-storage, LIKE-metacharacter literalness, history `LIMIT`, purge exclusive-boundary, purge attribute-GC-keeps-shared, purge old-events): `homecore-recorder` **19 → 25** (`--no-default-features`) / **25 → 31** (`--features ruvector`), 0 failed; the purge-boundary test is a true pin (fails deleting 2 rows under an inclusive cutoff, passes deleting 1 under the exclusive cutoff). Behaviour otherwise unchanged; Python deterministic proof unchanged (recorder is off the signal proof path).
 ### Added
+- **Metric-locked PCK/MPJPE accuracy harness — resolves the PCK-definition ambiguity (`wifi-densepose-train`, needs ADR slot 173).** The SOTA brief (`docs/research/sota-nn-train-benchmark-brief.md` §1, §3.1, §4) found that the single biggest threat to any "beyond-SOTA" claim is **metric ambiguity**: three PCK@20 figures (96.09% WiFlow-STD image-normalized, 81.63% AetherArena torso-PCK, 61.1% GraphPose-Fi standard PCK) cannot be lined up because each silently uses a different normalization — the project has been burned twice by this (a withdrawn "92.9%" used *absolute* pixels, not torso).
New `src/accuracy.rs` makes the normalizer **explicit, selectable, and carried with every reported number**: a `PckNormalization` enum (`TorsoDiameter` = standard MM-Fi/GraphPose-Fi hip↔hip; `BoundingBoxDiagonal` = looser WiFlow-STD image-normalized; `AbsolutePixels(threshold)` = the retracted convention, included so historical numbers are reproducible and clearly labeled non-comparable); one canonical `pck_at(pred, gt, vis, k, normalization)` reusing the `metrics_core` geometric primitives (hip distance, bbox diagonal — no duplicate kernel); `mpjpe(pred, gt, vis)` (2D/3D, mm); and a self-describing `PoseAccuracy { pck_at: BTreeMap, mpjpe, normalization, n_keypoints, n_frames }` returned by `accuracy_report(frames, ks, normalization)` so an **unlabeled PCK number is structurally impossible**. **17 hand-computed deterministic tests** (no GPU, no datasets) prove the harness arithmetic: perfect→PCK=1.0/MPJPE=0; all-just-outside→0.0; half-in-half-out→0.5; the **key proof** that identical predictions score 0.50 (torso) / 1.00 (bbox) / 0.75 (abs) under the three normalizations (the ambiguity is real and the definitions are distinct); MPJPE 2D/3D fixtures; and graceful degenerate handling (zero torso, empty frames, NaN coords — no panic, never a false-perfect). **This is measurement infrastructure, not an accuracy claim** — the tests prove the harness is correct, not that any model is good. `wifi-densepose-train` lib 191→206, `test_metrics` 12→14, 0 failed. Python deterministic proof unchanged (off the signal proof path). 
- **RuField `rufield-viewer` live-ingest mode — closes the RuView↔RuField visual loop (ADR-262 surfaces).** The dashboard gains `--source live --upstream `: it consumes RuView's `/ws/field` SSE (falling back to polling `/api/field`), **verifies every event's ed25519 provenance receipt on ingest** (`is_fusable`) — forged/tampered events are flagged ✗ and **never fused** into trusted inferences — and renders real RuView `FieldEvent`s through the same room-state/privacy-badge/fusion-graph/receipt path the synthetic mode uses (wire-compatible by construction: both sides use `rufield_core::FieldEvent` serde). **Strict banner honesty:** a single `BannerState` shows `SYNTHETIC` / `LIVE — ` / `DISCONNECTED — unreachable`, mutually exclusive — never SYNTHETIC while showing live data or vice versa; live mode returns **409** on `/api/run` rather than fabricate a synthetic run, and starts DISCONNECTED until first verified contact. Default stays synthetic. 26 tests / 0 failed. `ruvnet/rufield` `crates/rufield-viewer`; `vendor/rufield` submodule bumped. - **ADR-262 P3 — live RuField surface: RuView's running sensing-server now speaks RuField on `/api/field` + `/ws/field`.** Wires the P1 `wifi-densepose-rufield` bridge into the live `wifi-densepose-sensing-server` (the bridge is the only added coupling, ADR-262 §5.4). A new `src/rufield_surface.rs` module (kept out of the 8k-line `main.rs`) holds a `FieldSurface` with a **dedicated ed25519 `Signer`**, a bounded ring buffer of recent signed events (`FIELD_RING_CAPACITY = 64`), and the `/ws/field` broadcast topic; it exposes `GET /api/field` (latest signed `FieldEvent`s + signer pubkey + a `dev_signing_key` flag) and `GET /ws/field` (per-cycle stream, mirroring `/ws/sensing`), plus a standalone `router()` for isolated testing. 
**Tap:** at the ESP32 governed-trust cycle (`main.rs` `observe_cycle` ~`:5886` / `SensingUpdate` build ~`:5938`), `emit_rufield_event` joins the cycle's real `SensingUpdate` (features/classification/signal_field) with the engine's recorded `effective_class`/`demoted` trust state into a `SensingSnapshot` and surfaces a signed `FieldEvent` — **existing endpoints (`/ws/sensing` etc.) are unchanged; this is purely additive.** **Signer (defers the P2 key decision, §8 Q1):** a **standalone dev/sensing key** from `WDP_RUFIELD_SIGNING_SEED` (64-hex or ≥32-byte value), else a deterministic dev default with a logged `WARN` — reusing the `cog-ha-matter` Ed25519 key is the deferred P2 call, so P3 does not pre-empt it. **Egress privacy (fail-closed):** `network_egress_allowed` is *stricter* than `DefaultPrivacyGuard` for an unattended live surface — only **P1/P2** leave the box; P0 (raw) and P3/P4/P5 are held edge-local, so a `Derived → P4/P5` cycle **never** surfaces; no-presence cycles emit **no phantom event**. **P3 acceptance gates (`tests/rufield_surface_test.rs`, 4 integration via `tower::oneshot` + 4 module unit, 0 failed):** a well-formed **signed** event (`Modality::WifiCsi`, P2 not P1, `is_fusable` ed25519-verified, real timestamp); empty cycle → no phantom; **privacy-safety** — an injected `Derived` trust never surfaces; a mixed stream surfaces only egress-safe events. **Honest scope (ADR-262 §0/§6):** real plumbing on a **live endpoint**, **NOT accuracy** — single-link CSI with its existing caveats (no validated room-coordinate accuracy — `field_localize`), a dedicated dev signing key pending the P2 ownership decision, no accuracy claim. The win is narrowly: "RuView's live sensing now speaks RuField on `/ws/field`." 
- **ADR-262 P1 — `wifi-densepose-rufield` anti-corruption bridge: RuView WiFi-CSI sensing → signed RuField `FieldEvent`s.** A new v2 workspace member (the *single coupling point* between RuView and the standalone RuField MFS spec, ADR-262 §5.4) that **path-deps** the `vendor/rufield` submodule crates (`rufield-core`/`-provenance`/`-privacy`/`-fusion` — pure-Rust, `--no-default-features`-buildable: serde/sha2/ed25519/toml only, no tch/openblas/ndarray/candle) and **no** RuView internal crate. The bridge takes owned primitives — `SensingSnapshot` mirrors the `/ws/sensing` `SensingUpdate` (features + classification + signal_field) joined with the `TrustedOutput` trust state (`trust_class`/`demoted`/`identity_bound`) — and `snapshot_to_field_event()` emits one **signed** `FieldEvent` (`Modality::WifiCsi`, axis `[Frequency]`): a real `FieldTensor` from the feature scalars with the real `timestamp_ns`; an `Observation` whose `range_m`/`motion_vector`/`space_cell` are derived from the strongest **signal-field peak** when present (else `None` — coordinates are **never fabricated**, per the `field_localize` caveat) and `confidence` from the classification; a real `ProvenanceRef` (sha256 over the tensor bytes, `synthetic=false`) **ed25519-signed** so `rufield_provenance::is_fusable` passes. **The §3.3 privacy mapping is the critical correctness item**, implemented as `map_privacy()` mapping RuView's class onto RuField P0–P5 **by information content, NEVER by byte value** and **fail-closed**: RuView `Derived` (byte `1`, which sorts *below* `Anonymous` byte `2`) carries an identity embedding → maps to **P4** (or **P5** if identity-bound), **never P1** (the single most dangerous mapping mistake); `Raw → P0`, `Anonymous → P2`, `Restricted → P2`; a governed-engine `demoted` cycle floors the egress class to ≥ P2 with raw suppressed. 
**P1 acceptance gates (15 tests / 0 failed — 5 unit + 9 integration + 1 doc):** round-trip (`SensingSnapshot → FieldEvent →` serde `→` equal), `is_fusable` (verified ed25519 receipt), `RuFieldFusion::ingest` accept + `infer()` runs, **privacy-safety** (`gate_privacy_safety_derived_never_maps_to_low_privacy` — `Derived → P4/P5`, never P1; a table test over every RuView class; fail-closed demotion), and determinism (same snapshot + same signer seed → byte-identical event). **Honest scope:** this is **P1 plumbing** — a tested conversion + a safe privacy mapping. It is **not** wired into the live server (that is P3) and makes **no accuracy claim** (RuField v0.1 is synthetic; RuView's single-link CSI carries its own caveats). CI: the `rust-tests` workflow checkout gains `submodules: recursive` so the path-deps resolve. Python deterministic proof unchanged (off the signal proof path).
diff --git a/docs/adr/ADR-173-metric-locked-pck-mpjpe-accuracy-harness.md b/docs/adr/ADR-173-metric-locked-pck-mpjpe-accuracy-harness.md
new file mode 100644
index 00000000..8de81dd1
--- /dev/null
+++ b/docs/adr/ADR-173-metric-locked-pck-mpjpe-accuracy-harness.md
@@ -0,0 +1,123 @@
+# ADR-173: Metric-Locked PCK/MPJPE Accuracy Harness
+
+| Field | Value |
+|-------|-------|
+| **Status** | Accepted — implemented, deterministically tested |
+| **Date** | 2026-06-15 |
+| **Deciders** | ruv |
+| **Codename** | **METRIC-LOCK** |
+| **Amends** | ADR-155 (generalizes the torso-only `metrics_core::pck_canonical` to a selectable normalization) |
+| **Motivated by** | `docs/research/sota-nn-train-benchmark-brief.md` (PR #1090) |
+
+## Context
+
+The SOTA-research brief (PR #1090) identified the single biggest
+threat to any "beyond-SOTA" accuracy claim this project makes: **metric
+ambiguity**.
+Three PCK@20 numbers circulate, computed under three *different and
+unstated* normalizations, so they cannot be compared:
+
+- **96.09–96.61%** — WiFlow-STD reproduction, **image/bounding-box-normalized** PCK (the looser convention).
+- **81.63%** — an internal MM-Fi number reported as **"torso-PCK"** (tighter).
+- **61.1%** — GraphPose-Fi (arXiv 2511.19105), **standard torso-diameter** PCK on the MM-Fi random split (the academic frontier).
+
+The project has been burned by this twice: a previously-published 92.9% was
+retracted because it used **absolute-pixel** normalization, not torso. Until
+there is *one canonical, documented, tested* PCK definition — and every reported
+number carries the definition it was computed under — no accuracy comparison is
+credible, and the "prove everything" bar cannot be met for the benchmark half of
+the work.
+
+This is measurement infrastructure, not an accuracy claim. The deliverable's job
+is to make the metric **unambiguous and reproducible**, so future numbers are
+comparable and an unlabeled PCK is structurally impossible.
+
+## Decision
+
+Add a metric-locked accuracy harness as a new module
+`v2/crates/wifi-densepose-train/src/accuracy.rs` (404 non-test lines; inline
+deterministic tests bring the file to 708), re-exported at the crate root. It
+**extends, not duplicates** — it reuses `metrics_core`'s geometric primitives
+(`bounding_box_diagonal`, canonical hip indices `CANON_LEFT_HIP/RIGHT_HIP`), so
+there remains exactly one implementation of each geometric reference; the
+existing ADR-155 `pck_canonical` (torso-only) is unchanged and this generalizes
+it.
+
+### Public API
+
+- `enum PckNormalization { TorsoDiameter, BoundingBoxDiagonal, AbsolutePixels(f32) }`
+  — the three conventions the three historical numbers used, now **explicit and
+  selectable**. `.label()` / `.tolerance(...)`.
+- `pck_at(pred, gt, vis, k, norm) -> (correct, total, pck)` — PCK@k =
+  fraction of *visible* keypoints whose predicted-vs-GT distance ≤ the tolerance,
+  where tolerance = `k%` of the chosen normalizer (or an absolute threshold for
+  `AbsolutePixels`).
+- `mpjpe(pred, gt, vis) -> f32` — mean per-joint position error (2D/3D, coordinate
+  units; mm for mm inputs). Re-exported crate-root as `pck_mpjpe` to avoid
+  colliding with the existing `eval::mpjpe`.
+- `struct PoseAccuracy { pck_at: BTreeMap<u8, f32>, mpjpe, normalization, n_keypoints, n_frames }`
+  — **a reported number always carries its `normalization`**; an unlabeled PCK is
+  structurally impossible to produce through this surface.
+- `struct PoseFrame { pred, gt, visibility }` + `accuracy_report(frames, ks, norm) -> PoseAccuracy`
+  (micro-averaged over keypoints).
+
+### Correctness is proven by hand-computed deterministic tests (no GPU, no data)
+
+The tests construct synthetic keypoint sets whose PCK/MPJPE can be computed by
+hand, and assert the harness matches.
+Highlights (all pass):
+
+| Test | Construction | Expected |
+|------|--------------|----------|
+| perfect_prediction | pred==gt | PCK=1.0 (all 3 norms), MPJPE=0 |
+| all_just_outside | every error just past τ@20 | PCK=0.0 |
+| half_in_half_out | 2 exact, 2 just outside | PCK=0.5 |
+| **three_normalizations (KEY PROOF)** | identical pred; nose err .06, shoulder .10, hips exact | torso=**0.50**, bbox=**1.00**, abs(.08)=**0.75** |
+| mpjpe_2d / mpjpe_3d | (3,4)→5 / (1,2,2)→3 | 2.5 / 3.0 |
+| mpjpe_excludes_invisible | invisible joint err 100 ignored | 5.0 |
+| zero_torso_unscoreable | coincident hips | `(0,0,0.0)`, **not** false-perfect |
+| no_visible_keypoints | vis=∅ | `(0,0,0.0)` |
+| nan_coords | one NaN pred coord | counted wrong, **no panic** |
+| empty report | no frames | 0.0, **not** NaN |
+| bbox≥torso ordering | same frames | bbox-PCK ≥ torso-PCK |
+
+### The key proof (the ambiguity is real and quantified)
+
+Identical predictions, three declared normalizations → **0.50 / 1.00 / 0.75**.
+Mechanism: the bbox diagonal `√(0.20² + 0.80²) = 0.825` is ~4× the hip-span torso
+`0.20`, so τ@20 is 0.165 (bbox) vs 0.040 (torso) — the looser image-normalized
+convention passes joints the strict torso convention rejects. This is *exactly*
+why 96% / 81.6% / 61% cannot be lined up without declaring the enum, demonstrated
+in-code.
+
+## Validation
+
+- `cargo test -p wifi-densepose-train --no-default-features` → lib **191 → 206**
+  (+15), `test_metrics` **12 → 14** (+2), doc-tests 8 — **0 failed**.
+- `cargo test --workspace --no-default-features` → **exit 0**, 0 failed.
+- `python archive/v1/data/proof/verify.py` → **VERDICT: PASS**, hash
+  `f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a` **unchanged**
+  (off the signal proof path — confirms no pipeline alteration).
+
+## Consequences
+
+### Positive
+- The three historical PCK numbers can now be **recomputed under one declared
+  definition** and compared honestly.
+  The retracted-number class of error
+  (silent normalization mismatch) is structurally prevented going forward.
+- Establishes the measurement substrate for the beyond-SOTA target: GraphPose-Fi
+  cross-environment **PCK@20 = 12.9%** (standard torso PCK) is now a number this
+  harness can produce comparably.
+
+### Negative
+- None functional. The harness is additive; no existing metric path changed.
+
+### Neutral
+- Producing actual model numbers under this harness requires the trained models +
+  datasets (MM-Fi) and, for cross-domain splits, is the next sub-deliverable of
+  the benchmark/optimization milestone — out of scope here (this ADR is the
+  *instrument*, not the *reading*).
+
+## Links
+- ADR-155 — metric core (`pck_canonical`, torso-only) — generalized here
+- ADR-152 — WiFi-Pose SOTA 2026 intake / WiFlow-STD benchmark
+- `docs/research/sota-nn-train-benchmark-brief.md` — the motivating gap analysis
+- GraphPose-Fi — arXiv 2511.19105 (verified cross-env PCK@20 = 12.9% anchor)
diff --git a/v2/crates/wifi-densepose-train/src/accuracy.rs b/v2/crates/wifi-densepose-train/src/accuracy.rs
new file mode 100644
index 00000000..b074b24f
--- /dev/null
+++ b/v2/crates/wifi-densepose-train/src/accuracy.rs
@@ -0,0 +1,708 @@
+//! Metric-locked pose-accuracy harness (ADR-155 §Tier-1.2; needs ADR slot 173).
+//!
+//! # Why this module exists
+//!
+//! Three PCK\@20 numbers float around this project and **cannot be lined up**
+//! because each silently uses a *different* PCK definition:
+//!
+//! | Number | Source | PCK normalization |
+//! |--------|--------|-------------------|
+//! | 96.09 % | WiFlow-STD reproduction | image / bounding-box normalized (looser) |
+//! | 81.63 % | AetherArena MM-Fi (ADR-150) | torso-diameter (standard MM-Fi / GraphPose-Fi) |
+//! | 61.1 % | GraphPose-Fi (preprint) | torso-diameter, 3D, mm-scale (harder) |
+//!
+//! The project was burned **twice** by metric ambiguity (a now-retracted "92.9 %
+//! PCK\@20" used *absolute* pixel thresholds, not torso normalization). The fix
+//! is to make the normalizer **explicit, selectable, and carried with every
+//! reported number** so an unlabeled PCK figure is structurally impossible.
+//!
+//! [`metrics_core`](crate::metrics_core) already pins the *canonical*
+//! torso-normalized PCK ([`pck_canonical`](crate::metrics_core::pck_canonical)).
+//! This module generalizes it to a [`PckNormalization`] enum covering all three
+//! conventions the SOTA brief names, adds [`mpjpe`] (mm), and bundles results
+//! into a self-describing [`PoseAccuracy`] struct. It **reuses** the
+//! `metrics_core` primitives (hip distance, bounding-box diagonal) — there is
+//! still exactly one implementation of each geometric reference.
+//!
+//! # This is measurement infrastructure, not an accuracy claim
+//!
+//! Nothing here asserts any project model is good. The unit tests prove the
+//! *harness* is arithmetically correct against hand-computed fixtures (no GPU,
+//! no datasets), including the key demonstration that the **same predictions
+//! score different PCK under the three normalizations** — proof the ambiguity is
+//! real and the definitions are genuinely distinct.
+//!
+//! # Literature
+//!
+//! - Torso-diameter PCK is the MM-Fi / GraphPose-Fi convention (Yang et al.,
+//!   *GraphPose-Fi*, arXiv:2511.19105): a keypoint is correct iff its error is
+//!   within `k · d_torso`, with `d_torso` the hip↔hip (or shoulder↔hip) span.
+//! - Bounding-box / image-normalized PCK is the WiFlow-STD-style looser
+//!   convention (arXiv:2602.08661) — normalize by the GT pose bbox diagonal.
+//! - MPJPE (mean per-joint position error, mm) is reported by GraphPose-Fi and
+//!   Person-in-WiFi-3D (Yan et al., CVPR 2024).
+
+use std::collections::BTreeMap;
+
+use ndarray::{Array1, Array2};
+
+use crate::metrics_core::{
+    bounding_box_diagonal, CANON_LEFT_HIP, CANON_RIGHT_HIP,
+};
+
+/// Visibility cutoff: a keypoint counts as *visible* iff `visibility[j] >= 0.5`
+/// (COCO convention; matches [`crate::metrics_core`]).
+const VISIBILITY_THRESHOLD: f32 = 0.5;
+
+/// Minimum positive normalizer extent. Below this the reference scale is
+/// considered degenerate (zero torso, collapsed bbox) and the frame is reported
+/// unscoreable rather than dividing by ≈0.
+const MIN_REFERENCE_EXTENT: f32 = 1e-6;
+
+// ===========================================================================
+// PCK normalization — the explicit, selectable definition
+// ===========================================================================
+
+/// The PCK normalization basis — **the single knob that made three project
+/// numbers non-comparable**, now explicit and carried with every result.
+///
+/// A keypoint `j` (with `visibility[j] >= 0.5`) is *correct* iff
+/// `‖pred_j − gt_j‖₂ ≤ τ`, where the **distance tolerance `τ`** is derived from
+/// the chosen normalization and the PCK threshold `k` (given as a percentage,
+/// e.g. `20` for PCK\@20):
+///
+/// | Variant | `τ` (tolerance in coordinate units) |
+/// |---------|--------------------------------------|
+/// | [`TorsoDiameter`](Self::TorsoDiameter) | `(k/100) · d_torso` |
+/// | [`BoundingBoxDiagonal`](Self::BoundingBoxDiagonal) | `(k/100) · d_bbox` |
+/// | [`AbsolutePixels`](Self::AbsolutePixels) | `threshold` (k ignored) |
+///
+/// `d_torso` is the hip↔hip span (COCO joints 11↔12), falling back to the bbox
+/// diagonal when both hips are not visible — identical to
+/// [`crate::metrics_core::canonical_torso_size`]. `d_bbox` is the diagonal of
+/// the axis-aligned bounding box of all visible GT keypoints.
+///
+/// These yield **different** PCK on the *same* predictions whenever
+/// `d_torso ≠ d_bbox` (always true for a real pose: the bbox is larger than the
+/// hip span), which is exactly why the 96 / 81.6 / 61 numbers cannot be lined
+/// up without declaring this enum.
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub enum PckNormalization {
+    /// **Torso-diameter** (hip↔hip span). The standard MM-Fi / GraphPose-Fi
+    /// convention and the *stricter* of the two relative normalizers. This is
+    /// the canonical default ([`crate::metrics_core::pck_canonical`]).
+    TorsoDiameter,
+    /// **Bounding-box diagonal** (a.k.a. image-normalized). The looser
+    /// WiFlow-STD-style convention: normalize by the GT pose bbox diagonal,
+    /// which is larger than the torso span ⇒ a more forgiving threshold ⇒ a
+    /// higher PCK on identical predictions.
+    BoundingBoxDiagonal,
+    /// **Absolute pixel/coordinate threshold** — no pose-relative
+    /// normalization. The PCK `k` percentage is ignored; the held `threshold`
+    /// is the raw distance tolerance directly. Included so historical
+    /// retracted-style numbers are reproducible, and **clearly labeled as
+    /// non-comparable** to the relative variants (it does not scale with body
+    /// size or camera distance).
+    AbsolutePixels(f32),
+}
+
+impl PckNormalization {
+    /// Human-readable, *self-documenting* label for a reported number — so a
+    /// `PoseAccuracy` printed anywhere always carries its definition.
+    pub fn label(&self) -> String {
+        match self {
+            PckNormalization::TorsoDiameter => "torso-diameter".to_string(),
+            PckNormalization::BoundingBoxDiagonal => "bbox-diagonal".to_string(),
+            PckNormalization::AbsolutePixels(t) => format!("absolute-px({t})"),
+        }
+    }
+
+    /// Compute the per-frame distance tolerance `τ` for PCK threshold `k`
+    /// (percentage). Returns `None` when the (relative) normalizer is degenerate
+    /// — the frame cannot be scored.
+    ///
+    /// `gt_kpts` is `[n, 2]` (or `[n, ≥2]`, only x/y used); `visibility` is `[n]`.
+    fn tolerance(&self, gt_kpts: &Array2<f32>, visibility: &Array1<f32>, k: u8) -> Option<f32> {
+        let n = gt_kpts.shape()[0].min(visibility.len());
+        match self {
+            PckNormalization::AbsolutePixels(threshold) => {
+                // Raw tolerance, independent of pose scale and of `k`.
+                if *threshold > 0.0 {
+                    Some(*threshold)
+                } else {
+                    None
+                }
+            }
+            PckNormalization::TorsoDiameter => {
+                let d = torso_diameter(gt_kpts, visibility, n)?;
+                Some((k as f32 / 100.0) * d)
+            }
+            PckNormalization::BoundingBoxDiagonal => {
+                let d = bounding_box_diagonal(gt_kpts, visibility, n);
+                if d > MIN_REFERENCE_EXTENT {
+                    Some((k as f32 / 100.0) * d)
+                } else {
+                    None
+                }
+            }
+        }
+    }
+}
+
+/// Hip↔hip torso diameter with a bbox-diagonal fallback — the relative
+/// normalizer shared by `TorsoDiameter` PCK and
+/// [`crate::metrics_core::canonical_torso_size`]. Returns `None` when no
+/// positive-extent reference exists.
+fn torso_diameter(gt_kpts: &Array2<f32>, visibility: &Array1<f32>, n: usize) -> Option<f32> {
+    if CANON_LEFT_HIP < n
+        && CANON_RIGHT_HIP < n
+        && visibility[CANON_LEFT_HIP] >= VISIBILITY_THRESHOLD
+        && visibility[CANON_RIGHT_HIP] >= VISIBILITY_THRESHOLD
+    {
+        let dx = gt_kpts[[CANON_LEFT_HIP, 0]] - gt_kpts[[CANON_RIGHT_HIP, 0]];
+        let dy = gt_kpts[[CANON_LEFT_HIP, 1]] - gt_kpts[[CANON_RIGHT_HIP, 1]];
+        let torso = (dx * dx + dy * dy).sqrt();
+        if torso > MIN_REFERENCE_EXTENT {
+            return Some(torso);
+        }
+    }
+    let diag = bounding_box_diagonal(gt_kpts, visibility, n);
+    if diag > MIN_REFERENCE_EXTENT {
+        Some(diag)
+    } else {
+        None
+    }
+}
+
+// ===========================================================================
+// Single-frame PCK / MPJPE
+// ===========================================================================
+
+/// Per-frame **PCK\@`k`** under the selected `normalization`.
+///
+/// A keypoint `j` with `visibility[j] >= 0.5` is correct iff
+/// `‖pred_j − gt_j‖₂ ≤ τ`, with `τ` from
+/// [`PckNormalization::tolerance`]. Only x/y are used (2D PCK is the standard
+/// keypoint-PCK definition; pass 2-column arrays).
+///
+/// # Returns
+/// `(correct, total, pck)` with `pck ∈ [0,1]`. **`(0, 0, 0.0)`** when no
+/// keypoint is visible, or (for the relative normalizers) the reference scale is
+/// degenerate — a frame with no measurable evidence scores 0, never 1.
+/// NaN-valued coordinates make a keypoint *incorrect* (the `<=` comparison is
+/// false for NaN) rather than panicking.
+pub fn pck_at(
+    pred_kpts: &Array2<f32>,
+    gt_kpts: &Array2<f32>,
+    visibility: &Array1<f32>,
+    k: u8,
+    normalization: PckNormalization,
+) -> (usize, usize, f32) {
+    let n = pred_kpts.shape()[0]
+        .min(gt_kpts.shape()[0])
+        .min(visibility.len());
+    let tol = match normalization.tolerance(gt_kpts, visibility, k) {
+        Some(t) => t,
+        None => return (0, 0, 0.0),
+    };
+
+    let mut correct = 0usize;
+    let mut total = 0usize;
+    for j in 0..n {
+        if visibility[j] < VISIBILITY_THRESHOLD {
+            continue;
+        }
+        total += 1;
+        let dx = pred_kpts[[j, 0]] - gt_kpts[[j, 0]];
+        let dy = pred_kpts[[j, 1]] - gt_kpts[[j, 1]];
+        let dist = (dx * dx + dy * dy).sqrt();
+        // NaN-safe: `NaN <= tol` is false, so a NaN coordinate counts as wrong.
+        if dist <= tol {
+            correct += 1;
+        }
+    }
+    let pck = if total > 0 {
+        correct as f32 / total as f32
+    } else {
+        0.0
+    };
+    (correct, total, pck)
+}
+
+/// Per-frame **MPJPE** (mean per-joint position error) over visible keypoints,
+/// in the coordinate units of the inputs (report as mm when inputs are mm).
+///
+/// `pred`/`gt` are `[n, D]` with `D ∈ {2, 3}` (2D or 3D pose); all `D` columns
+/// are used. Joints with `visibility[j] < 0.5` are excluded.
+///
+/// Returns `0.0` when no keypoint is visible (no evidence). A NaN coordinate
+/// propagates into the returned mean (callers filter NaN frames upstream); it
+/// does not panic.
+pub fn mpjpe(pred: &Array2<f32>, gt: &Array2<f32>, visibility: &Array1<f32>) -> f32 {
+    let n = pred.shape()[0].min(gt.shape()[0]).min(visibility.len());
+    let d = pred.shape()[1].min(gt.shape()[1]);
+    let mut sum = 0.0f32;
+    let mut count = 0usize;
+    for j in 0..n {
+        if visibility[j] < VISIBILITY_THRESHOLD {
+            continue;
+        }
+        let mut sq = 0.0f32;
+        for c in 0..d {
+            let diff = pred[[j, c]] - gt[[j, c]];
+            sq += diff * diff;
+        }
+        sum += sq.sqrt();
+        count += 1;
+    }
+    if count > 0 {
+        sum / count as f32
+    } else {
+        0.0
+    }
+}
+
+// ===========================================================================
+// Self-describing result struct + batch report
+// ===========================================================================
+
+/// A pose-accuracy result that **always carries the definition it was computed
+/// under** — making an unlabeled PCK number structurally impossible.
+///
+/// Built by [`accuracy_report`] over a set of frames. `pck_at` maps each
+/// requested threshold `k` (percentage, e.g. `20`) to its PCK in `[0,1]`. The
+/// `normalization` field records *which* PCK definition produced those numbers,
+/// so two `PoseAccuracy` values can only be compared when their `normalization`
+/// matches (the comparability check the project lacked).
+#[derive(Debug, Clone, PartialEq)]
+pub struct PoseAccuracy {
+    /// PCK\@k for each requested threshold percentage `k`, in `[0,1]`.
+    pub pck_at: BTreeMap<u8, f32>,
+    /// Mean per-joint position error in coordinate units (mm for mm inputs).
+    pub mpjpe: f32,
+    /// The normalization basis under which `pck_at` was computed — the label a
+    /// reported number must always carry.
+    pub normalization: PckNormalization,
+    /// Number of keypoints per frame (the pose convention, e.g. 17 for COCO).
+    pub n_keypoints: usize,
+    /// Number of frames aggregated into this result.
+    pub n_frames: usize,
+}
+
+impl PoseAccuracy {
+    /// Convenience accessor for a single threshold, returning `None` when that
+    /// `k` was not requested.
+    pub fn pck(&self, k: u8) -> Option<f32> {
+        self.pck_at.get(&k).copied()
+    }
+
+    /// A one-line, self-documenting summary suitable for logs / RESULTS.md, e.g.
+    /// `PCK@20=0.750 (torso-diameter, 17kp, 1 frames) MPJPE=0.0300`.
+    pub fn summary(&self) -> String {
+        let pcks: Vec<String> = self
+            .pck_at
+            .iter()
+            .map(|(k, v)| format!("PCK@{k}={v:.3}"))
+            .collect();
+        format!(
+            "{} ({}, {}kp, {} frames) MPJPE={:.4}",
+            pcks.join(" "),
+            self.normalization.label(),
+            self.n_keypoints,
+            self.n_frames,
+            self.mpjpe
+        )
+    }
+}
+
+/// One frame's prediction + ground truth + visibility for batch scoring.
+///
+/// All three arrays share row count `n_keypoints`; `pred`/`gt` are `[n, D]`
+/// (`D ∈ {2,3}`), `visibility` is `[n]`.
+#[derive(Debug, Clone)]
+pub struct PoseFrame {
+    /// Predicted keypoints `[n, D]`.
+    pub pred: Array2<f32>,
+    /// Ground-truth keypoints `[n, D]`.
+    pub gt: Array2<f32>,
+    /// Per-keypoint visibility `[n]` (`>= 0.5` ⇒ visible).
+    pub visibility: Array1<f32>,
+}
+
+/// Aggregate [`PoseAccuracy`] over a batch of frames under **one** explicit
+/// `normalization`, for the requested PCK thresholds `ks` (percentages).
+///
+/// PCK is micro-averaged over keypoints (sum of correct ÷ sum of visible across
+/// all frames — the standard keypoint-PCK aggregation), so frames with more
+/// visible joints contribute proportionally. MPJPE is micro-averaged over
+/// visible joints likewise. Unscoreable frames (no visible joints, degenerate
+/// relative normalizer) contribute `(0, 0)` and so are excluded from the
+/// denominator rather than scored as perfect.
+///
+/// An **empty** `frames` slice yields all-zero PCK and `0.0` MPJPE — never a
+/// panic or NaN.
+pub fn accuracy_report(
+    frames: &[PoseFrame],
+    ks: &[u8],
+    normalization: PckNormalization,
+) -> PoseAccuracy {
+    let n_keypoints = frames.first().map(|f| f.gt.shape()[0]).unwrap_or(0);
+
+    // PCK: per-threshold (correct, total) accumulators across frames.
+ let mut pck_acc: BTreeMap<u8, (usize, usize)> = ks.iter().map(|&k| (k, (0, 0))).collect(); + // MPJPE: sum of per-joint distances and visible-joint count. + let mut mpjpe_sum = 0.0f32; + let mut mpjpe_count = 0usize; + + for frame in frames { + for &k in ks { + let (c, t, _) = pck_at(&frame.pred, &frame.gt, &frame.visibility, k, normalization); + let entry = pck_acc.entry(k).or_insert((0, 0)); + entry.0 += c; + entry.1 += t; + } + // Per-frame MPJPE re-derived as a (sum, count) contribution so the + // batch value is a true micro-average over joints. + let n = frame.pred.shape()[0].min(frame.gt.shape()[0]).min(frame.visibility.len()); + let d = frame.pred.shape()[1].min(frame.gt.shape()[1]); + for j in 0..n { + if frame.visibility[j] < VISIBILITY_THRESHOLD { + continue; + } + let mut sq = 0.0f32; + for c in 0..d { + let diff = frame.pred[[j, c]] - frame.gt[[j, c]]; + sq += diff * diff; + } + mpjpe_sum += sq.sqrt(); + mpjpe_count += 1; + } + } + + let pck_at: BTreeMap<u8, f32> = pck_acc + .into_iter() + .map(|(k, (c, t))| { + let v = if t > 0 { c as f32 / t as f32 } else { 0.0 }; + (k, v) + }) + .collect(); + + let mpjpe = if mpjpe_count > 0 { + mpjpe_sum / mpjpe_count as f32 + } else { + 0.0 + }; + + PoseAccuracy { + pck_at, + mpjpe, + normalization, + n_keypoints, + n_frames: frames.len(), + } +} + +#[cfg(test)] +mod tests { + use super::*; + + /// Build a 17-joint `[17, 2]` pose from `(joint, x, y)` triples. 
+ fn pose17(joints: &[(usize, f32, f32)]) -> Array2<f32> { + let mut a = Array2::<f32>::zeros((17, 2)); + for &(j, x, y) in joints { + a[[j, 0]] = x; + a[[j, 1]] = y; + } + a + } + + fn vis17(visible: &[usize]) -> Array1<f32> { + let mut v = Array1::<f32>::zeros(17); + for &j in visible { + v[j] = 2.0; + } + v + } + + // -------- consts pinned (no silent metric drift) -------- + #[test] + fn accuracy_consts_unchanged() { + assert_eq!(VISIBILITY_THRESHOLD, 0.5_f32); + assert_eq!(MIN_REFERENCE_EXTENT, 1e-6_f32); + } + + // -------- perfect prediction ⇒ PCK = 1.0, MPJPE = 0 -------- + #[test] + fn perfect_prediction_pck_one_mpjpe_zero() { + let gt = pose17(&[ + (5, 0.35, 0.35), + (CANON_LEFT_HIP, 0.40, 0.50), + (CANON_RIGHT_HIP, 0.60, 0.50), + ]); + let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]); + for norm in [ + PckNormalization::TorsoDiameter, + PckNormalization::BoundingBoxDiagonal, + PckNormalization::AbsolutePixels(0.01), + ] { + let (c, t, pck) = pck_at(&gt, &gt, &vis, 20, norm); + assert_eq!((c, t), (3, 3), "{norm:?}"); + assert!((pck - 1.0).abs() < 1e-6, "{norm:?} perfect PCK must be 1.0"); + } + assert_eq!(mpjpe(&gt, &gt, &vis), 0.0); + } + + // -------- all keypoints just OUTSIDE threshold ⇒ PCK = 0.0 -------- + // + // Hand calc (torso): hips at (0.40,0.50)/(0.60,0.50) ⇒ torso = 0.20. + // threshold k=20 ⇒ τ = 0.20·0.20 = 0.04. Push every scored joint to an + // error of 0.05 (> 0.04) ⇒ all wrong. To avoid the hips themselves being + // "correct", we displace the hips too (their displaced positions still + // define the torso from GT, which is unchanged). + #[test] + fn all_just_outside_threshold_pck_zero() { + let gt = pose17(&[ + (5, 0.50, 0.50), + (CANON_LEFT_HIP, 0.40, 0.50), + (CANON_RIGHT_HIP, 0.60, 0.50), + ]); + // GT torso = 0.20, τ@20 = 0.04. Displace each scored joint by dx=0.05. 
+ let pred = pose17(&[ + (5, 0.55, 0.50), + (CANON_LEFT_HIP, 0.45, 0.50), + (CANON_RIGHT_HIP, 0.65, 0.50), + ]); + let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]); + let (c, t, pck) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter); + assert_eq!(t, 3); + assert_eq!(c, 0, "all errors 0.05 > τ 0.04 ⇒ none correct"); + assert_eq!(pck, 0.0); + } + + // -------- half-in / half-out ⇒ PCK = 0.5 -------- + // + // Hand calc (torso): torso = 0.20, τ@20 = 0.04. Four visible joints; two + // exact (dist 0 ≤ 0.04, correct), two displaced 0.05 (> 0.04, wrong) + // ⇒ 2/4 = 0.5. + #[test] + fn half_in_half_out_pck_half() { + let gt = pose17(&[ + (0, 0.50, 0.20), + (5, 0.50, 0.50), + (CANON_LEFT_HIP, 0.40, 0.50), + (CANON_RIGHT_HIP, 0.60, 0.50), + ]); + let pred = pose17(&[ + (0, 0.50, 0.20), // exact ⇒ correct + (5, 0.55, 0.50), // err 0.05 ⇒ wrong + (CANON_LEFT_HIP, 0.40, 0.50), // exact ⇒ correct + (CANON_RIGHT_HIP, 0.65, 0.50), // err 0.05 ⇒ wrong + ]); + let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]); + let (c, t, pck) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter); + assert_eq!((c, t), (2, 4)); + assert!((pck - 0.5).abs() < 1e-6, "expected 0.5, got {pck}"); + } + + // -------- THE KEY PROOF: same predictions, three normalizations, three PCK -------- + // + // One construction scored three ways. Hand calc: + // GT: nose(0)=(0.50,0.10), l_sh(5)=(0.50,0.30), + // l_hip(11)=(0.40,0.90), r_hip(12)=(0.60,0.90). + // Visible = {0,5,11,12}, all four. + // torso = |0.60-0.40| = 0.20 (hips, y equal). + // bbox: x∈[0.40,0.60] (w=0.20), y∈[0.10,0.90] (h=0.80) + // ⇒ diag = sqrt(0.20² + 0.80²) = sqrt(0.04+0.64)=sqrt(0.68)=0.8246… + // + // Pred errors (pure dx): nose 0.00, l_sh 0.10, l_hip 0.00, r_hip 0.00. + // (Only joint 5 is displaced, by 0.10.) 
+ // + // k = 20: + // • Torso τ = 0.20·0.20 = 0.040 → joint5 err 0.10 > 0.040 ⇒ WRONG + // ⇒ 3 correct / 4 = 0.75 + // • Bbox τ = 0.20·0.8246 = 0.16492 → joint5 err 0.10 ≤ 0.16492 ⇒ CORRECT + // ⇒ 4 correct / 4 = 1.00 + // • Abs(0.05) τ = 0.05 → joint5 err 0.10 > 0.05 ⇒ WRONG + // ⇒ 3 correct / 4 = 0.75 (same count as torso HERE by coincidence) + // + // To make ALL THREE differ, also test Abs(0.08): τ=0.08, joint5 0.10>0.08 + // ⇒ still 0.75. So we additionally displace nose by 0.06 (between 0.05 and + // 0.08) to separate the two absolute thresholds — see below. + #[test] + fn three_normalizations_give_different_pck_on_identical_input() { + let gt = pose17(&[ + (0, 0.50, 0.10), // nose + (5, 0.50, 0.30), // left_shoulder + (CANON_LEFT_HIP, 0.40, 0.90), + (CANON_RIGHT_HIP, 0.60, 0.90), + ]); + // nose displaced 0.06, shoulder displaced 0.10, hips exact. + let pred = pose17(&[ + (0, 0.56, 0.10), // err 0.06 + (5, 0.60, 0.30), // err 0.10 + (CANON_LEFT_HIP, 0.40, 0.90), // exact + (CANON_RIGHT_HIP, 0.60, 0.90), // exact + ]); + let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]); + + // Torso τ@20 = 0.04: nose 0.06>0.04 wrong, sh 0.10>0.04 wrong, + // hips exact ⇒ 2/4 = 0.5. + let (_, _, torso) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter); + // Bbox diag = sqrt(0.68)=0.82462; τ@20 = 0.164924: + // nose 0.06 ≤ τ correct, sh 0.10 ≤ τ correct, hips exact ⇒ 4/4 = 1.0. + let (_, _, bbox) = pck_at(&pred, &gt, &vis, 20, PckNormalization::BoundingBoxDiagonal); + // Abs(0.08): nose 0.06 ≤ 0.08 correct, sh 0.10 > 0.08 wrong, hips exact + // ⇒ 3/4 = 0.75. + let (_, _, abs) = pck_at(&pred, &gt, &vis, 20, PckNormalization::AbsolutePixels(0.08)); + + assert!((torso - 0.5).abs() < 1e-6, "torso PCK expected 0.5, got {torso}"); + assert!((bbox - 1.0).abs() < 1e-6, "bbox PCK expected 1.0, got {bbox}"); + assert!((abs - 0.75).abs() < 1e-6, "abs(0.08) PCK expected 0.75, got {abs}"); + + // The whole point: identical predictions, three DISTINCT PCK values. 
+ assert!(torso != bbox && bbox != abs && torso != abs, + "normalizations must give distinct PCK: torso={torso}, bbox={bbox}, abs={abs}"); + } + + // -------- AbsolutePixels ignores k (raw threshold) -------- + #[test] + fn absolute_pixels_ignores_threshold_percentage() { + let gt = pose17(&[(5, 0.50, 0.50), (CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]); + let pred = pose17(&[(5, 0.53, 0.50), (CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]); + let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]); + // τ = 0.05 raw; joint5 err 0.03 ≤ 0.05 correct. k=5 and k=99 must agree. + let (_, _, p5) = pck_at(&pred, &gt, &vis, 5, PckNormalization::AbsolutePixels(0.05)); + let (_, _, p99) = pck_at(&pred, &gt, &vis, 99, PckNormalization::AbsolutePixels(0.05)); + assert_eq!(p5, p99, "AbsolutePixels must ignore the k percentage"); + assert!((p5 - 1.0).abs() < 1e-6, "all three within 0.05, got {p5}"); + } + + // -------- MPJPE hand-computed (2D and 3D) -------- + #[test] + fn mpjpe_hand_computed_2d() { + // joint0 err (3,4)->5, joint1 exact->0 ⇒ mean (5+0)/2 = 2.5. + let gt = Array2::from_shape_vec((2, 2), vec![0.0, 0.0, 1.0, 1.0]).unwrap(); + let pred = Array2::from_shape_vec((2, 2), vec![3.0, 4.0, 1.0, 1.0]).unwrap(); + let vis = Array1::from(vec![2.0, 2.0]); + assert!((mpjpe(&pred, &gt, &vis) - 2.5).abs() < 1e-6); + } + + #[test] + fn mpjpe_hand_computed_3d() { + // single joint err (1,2,2) -> sqrt(1+4+4)=3.0. + let gt = Array2::from_shape_vec((1, 3), vec![0.0, 0.0, 0.0]).unwrap(); + let pred = Array2::from_shape_vec((1, 3), vec![1.0, 2.0, 2.0]).unwrap(); + let vis = Array1::from(vec![2.0]); + assert!((mpjpe(&pred, &gt, &vis) - 3.0).abs() < 1e-6); + } + + #[test] + fn mpjpe_excludes_invisible_joints() { + // joint0 visible err 5, joint1 INVISIBLE err 100 ⇒ mean = 5 (joint1 dropped). 
+ let gt = Array2::from_shape_vec((2, 2), vec![0.0, 0.0, 0.0, 0.0]).unwrap(); + let pred = Array2::from_shape_vec((2, 2), vec![3.0, 4.0, 100.0, 0.0]).unwrap(); + let vis = Array1::from(vec![2.0, 0.0]); + assert!((mpjpe(&pred, &gt, &vis) - 5.0).abs() < 1e-6); + } + + // -------- degenerate inputs: no panic -------- + #[test] + fn zero_torso_is_unscoreable_not_perfect() { + // Both hips coincident ⇒ torso ≈ 0; bbox also collapses ⇒ None. + let gt = pose17(&[(CANON_LEFT_HIP, 0.5, 0.5), (CANON_RIGHT_HIP, 0.5, 0.5)]); + let vis = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]); + assert_eq!(pck_at(&gt, &gt, &vis, 20, PckNormalization::TorsoDiameter), (0, 0, 0.0)); + assert_eq!(pck_at(&gt, &gt, &vis, 20, PckNormalization::BoundingBoxDiagonal), (0, 0, 0.0)); + } + + #[test] + fn no_visible_keypoints_scores_zero() { + let gt = pose17(&[(CANON_LEFT_HIP, 0.4, 0.5), (CANON_RIGHT_HIP, 0.6, 0.5)]); + let vis = vis17(&[]); // nothing visible + let (c, t, pck) = pck_at(&gt, &gt, &vis, 20, PckNormalization::TorsoDiameter); + assert_eq!((c, t, pck), (0, 0, 0.0)); + assert_eq!(mpjpe(&gt, &gt, &vis), 0.0); + } + + #[test] + fn nan_coords_do_not_panic_and_count_wrong() { + let gt = pose17(&[(5, 0.5, 0.5), (CANON_LEFT_HIP, 0.4, 0.5), (CANON_RIGHT_HIP, 0.6, 0.5)]); + let mut pred = gt.clone(); + pred[[5, 0]] = f32::NAN; // joint 5 prediction is NaN + let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]); + let (c, t, pck) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter); + assert_eq!(t, 3); + assert_eq!(c, 2, "NaN joint must count as wrong, hips correct ⇒ 2/3"); + assert!((pck - 2.0 / 3.0).abs() < 1e-6); + // mpjpe with a NaN joint yields NaN (caller filters) but must not panic. + assert!(mpjpe(&pred, &gt, &vis).is_nan()); + } + + // -------- batch report: micro-average + self-describing struct -------- + #[test] + fn accuracy_report_micro_averages_and_carries_definition() { + // Frame A: 2 visible, both correct (2/2). Frame B: 2 visible, both wrong (0/2). 
+ // Micro-average over joints: 2 correct / 4 = 0.5 (NOT mean-of-frame-PCK, + // which would be (1.0+0.0)/2 = 0.5 here too, but the accumulator is the + // joint-level one). + let gt = pose17(&[(CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]); + let vis = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]); + let frame_a = PoseFrame { pred: gt.clone(), gt: gt.clone(), visibility: vis.clone() }; + // Frame B: displace both hips by 0.05 (> τ 0.04) ⇒ both wrong. + let pred_b = pose17(&[(CANON_LEFT_HIP, 0.45, 0.50), (CANON_RIGHT_HIP, 0.65, 0.50)]); + let frame_b = PoseFrame { pred: pred_b, gt: gt.clone(), visibility: vis.clone() }; + + let report = accuracy_report( + &[frame_a, frame_b], + &[20, 50], + PckNormalization::TorsoDiameter, + ); + assert_eq!(report.n_frames, 2); + assert_eq!(report.n_keypoints, 17); + assert_eq!(report.normalization, PckNormalization::TorsoDiameter); + // PCK@20: 2 correct / 4 visible = 0.5. + assert!((report.pck(20).unwrap() - 0.5).abs() < 1e-6); + // PCK@50: τ = 0.5·0.20 = 0.10, frame B err 0.05 ≤ 0.10 ⇒ all correct + // ⇒ 4/4 = 1.0. + assert!((report.pck(50).unwrap() - 1.0).abs() < 1e-6); + // A reported number always carries its definition in the summary. + assert!(report.summary().contains("torso-diameter")); + } + + #[test] + fn accuracy_report_empty_is_zero_not_nan() { + let report = accuracy_report(&[], &[20], PckNormalization::BoundingBoxDiagonal); + assert_eq!(report.n_frames, 0); + assert_eq!(report.pck(20), Some(0.0)); + assert_eq!(report.mpjpe, 0.0); + assert!(!report.mpjpe.is_nan()); + } + + // -------- bbox-norm is looser than torso-norm (sanity, on a batch) -------- + #[test] + fn bbox_norm_scores_at_least_torso_norm() { + // bbox diagonal >= torso span always (bbox encloses the hips), so for the + // SAME frames bbox-PCK >= torso-PCK at the same k. Pin this ordering. 
+ let gt = pose17(&[ + (0, 0.50, 0.10), + (5, 0.50, 0.40), + (CANON_LEFT_HIP, 0.40, 0.90), + (CANON_RIGHT_HIP, 0.60, 0.90), + ]); + let pred = pose17(&[ + (0, 0.55, 0.10), + (5, 0.58, 0.40), + (CANON_LEFT_HIP, 0.42, 0.90), + (CANON_RIGHT_HIP, 0.62, 0.90), + ]); + let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]); + let frame = PoseFrame { pred, gt, visibility: vis }; + let torso = accuracy_report(std::slice::from_ref(&frame), &[20], PckNormalization::TorsoDiameter); + let bbox = accuracy_report(std::slice::from_ref(&frame), &[20], PckNormalization::BoundingBoxDiagonal); + assert!( + bbox.pck(20).unwrap() >= torso.pck(20).unwrap(), + "bbox-norm (looser) must be >= torso-norm: bbox={:?} torso={:?}", + bbox.pck(20), torso.pck(20) + ); + } +} diff --git a/v2/crates/wifi-densepose-train/src/lib.rs b/v2/crates/wifi-densepose-train/src/lib.rs index 712a1966..31745f85 100644 --- a/v2/crates/wifi-densepose-train/src/lib.rs +++ b/v2/crates/wifi-densepose-train/src/lib.rs @@ -43,6 +43,11 @@ // All *this* crate's code is written without unsafe blocks. #![warn(missing_docs)] +/// Metric-locked pose-accuracy harness (ADR-155 §Tier-1.2; needs ADR slot 173) +/// — selectable `PckNormalization` (torso / bbox-diagonal / absolute), `mpjpe`, +/// and a self-describing `PoseAccuracy` result so a reported PCK number always +/// carries the definition it was computed under. +pub mod accuracy; pub mod config; pub mod dataset; pub mod domain; @@ -89,6 +94,11 @@ pub use metrics_core::{ canonical_torso_size, oks_canonical, pck_canonical, CANON_LEFT_HIP, CANON_RIGHT_HIP, COCO_KP_SIGMAS, }; +// ADR-155 §Tier-1.2 — metric-locked accuracy harness (selectable PCK +// normalization + MPJPE + self-describing result). 
+pub use accuracy::{ + accuracy_report, mpjpe as pck_mpjpe, pck_at, PckNormalization, PoseAccuracy, PoseFrame, +}; pub use config::TrainingConfig; pub use dataset::{ CsiDataset, CsiSample, DataLoader, MmFiDataset, SyntheticConfig, SyntheticCsiDataset, diff --git a/v2/crates/wifi-densepose-train/tests/test_metrics.rs b/v2/crates/wifi-densepose-train/tests/test_metrics.rs index f3f48646..90239121 100644 --- a/v2/crates/wifi-densepose-train/tests/test_metrics.rs +++ b/v2/crates/wifi-densepose-train/tests/test_metrics.rs @@ -29,6 +29,66 @@ use ndarray::{Array1, Array2}; use wifi_densepose_train::{oks_canonical, pck_canonical, CANON_LEFT_HIP, CANON_RIGHT_HIP}; +// ADR-155 §Tier-1.2 — metric-locked accuracy harness public surface. +use wifi_densepose_train::{accuracy_report, pck_at, PckNormalization, PoseFrame}; + +// --------------------------------------------------------------------------- +// Metric-locked accuracy harness: the three PCK normalizations are reachable +// from the crate root and give DIFFERENT PCK on identical predictions — the +// proof that the 96 / 81.6 / 61 figures were non-comparable (validated here as +// a downstream consumer would call it). +// --------------------------------------------------------------------------- + +/// Identical predictions, three declared normalizations ⇒ three distinct PCK. +/// Hand calc (all coords in `[0,1]`): +/// * GT: nose(0)=(0.50,0.10), l_sh(5)=(0.50,0.30), hips=(0.40,0.90)/(0.60,0.90). +/// * Pred: nose err 0.06, shoulder err 0.10, hips exact. +/// * torso = 0.20 ⇒ τ@20 = 0.04 ⇒ only hips correct ⇒ 2/4 = **0.50**. +/// * bbox = √(0.20²+0.80²)=0.82462 ⇒ τ@20 = 0.16492 ⇒ all correct ⇒ **1.00**. +/// * abs(0.08): nose 0.06≤0.08 ok, shoulder 0.10>0.08 wrong ⇒ 3/4 = **0.75**. 
+#[test] +fn harness_three_normalizations_differ_from_crate_root() { + let gt = pose17(&[ + (0, 0.50, 0.10), + (5, 0.50, 0.30), + (CANON_LEFT_HIP, 0.40, 0.90), + (CANON_RIGHT_HIP, 0.60, 0.90), + ]); + let pred = pose17(&[ + (0, 0.56, 0.10), + (5, 0.60, 0.30), + (CANON_LEFT_HIP, 0.40, 0.90), + (CANON_RIGHT_HIP, 0.60, 0.90), + ]); + let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]); + + let (_, _, torso) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter); + let (_, _, bbox) = pck_at(&pred, &gt, &vis, 20, PckNormalization::BoundingBoxDiagonal); + let (_, _, abs) = pck_at(&pred, &gt, &vis, 20, PckNormalization::AbsolutePixels(0.08)); + + assert!((torso - 0.50).abs() < 1e-6, "torso PCK 0.50, got {torso}"); + assert!((bbox - 1.00).abs() < 1e-6, "bbox PCK 1.00, got {bbox}"); + assert!((abs - 0.75).abs() < 1e-6, "abs(0.08) PCK 0.75, got {abs}"); + assert!( + torso != bbox && bbox != abs && torso != abs, + "three normalizations must be distinct: {torso} / {bbox} / {abs}" + ); +} + +/// `accuracy_report` returns a self-describing result carrying its normalization, +/// so an unlabeled PCK number is structurally impossible at the API boundary. +#[test] +fn harness_report_carries_normalization_label() { + let gt = pose17(&[(CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]); + let vis = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]); + let frame = PoseFrame { pred: gt.clone(), gt: gt.clone(), visibility: vis }; + let report = accuracy_report(&[frame], &[20], PckNormalization::BoundingBoxDiagonal); + assert_eq!(report.normalization, PckNormalization::BoundingBoxDiagonal); + assert_eq!(report.n_keypoints, 17); + assert_eq!(report.n_frames, 1); + assert!((report.pck(20).unwrap() - 1.0).abs() < 1e-6); + assert!(report.summary().contains("bbox-diagonal")); +} // --------------------------------------------------------------------------- // Tests that use `EvalMetrics` (requires tch-backend because the metrics