feat(train): metric-locked PCK/MPJPE accuracy harness + ADR-173 (resolve PCK-definition ambiguity) (#1092)
* feat(train): metric-locked PCK/MPJPE accuracy harness — resolve PCK-definition ambiguity
The SOTA brief (docs/research/sota-nn-train-benchmark-brief.md §1/§3.1/§4)
identifies metric ambiguity as the single biggest threat to any beyond-SOTA
claim: three PCK@20 numbers (96.09% WiFlow-STD image-normalized, 81.63%
AetherArena torso-PCK, 61.1% GraphPose-Fi standard PCK) cannot be lined up
because each silently uses a different normalization. The project has had to
retract numbers twice over this (a withdrawn 92.9% used absolute pixels, not torso).
New src/accuracy.rs makes the normalizer explicit, selectable, and carried with
every reported number:
- PckNormalization enum: TorsoDiameter (standard MM-Fi/GraphPose-Fi hip↔hip),
BoundingBoxDiagonal (looser WiFlow-STD image-normalized), AbsolutePixels(t)
(retracted convention, reproducible + clearly non-comparable).
- pck_at(pred, gt, vis, k, normalization) — one canonical PCK reusing the
metrics_core geometric primitives (no duplicate kernel).
- mpjpe(pred, gt, vis) — 2D/3D, mm.
- PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints,
n_frames } via accuracy_report(frames, ks, normalization) — an unlabeled PCK
number is structurally impossible.
17 hand-computed deterministic tests (no GPU, no datasets) prove the harness
arithmetic, including the key proof that identical predictions score
0.50 / 1.00 / 0.75 under the three normalizations, plus graceful degenerate
handling (zero torso, empty frames, NaN coords — no panic, never false-perfect).
This is measurement infrastructure, NOT an accuracy claim. Public API worth an
ADR — needs ADR slot 173 (parent to write).
wifi-densepose-train lib 191→206, test_metrics 12→14, 0 failed; full workspace
green (exit 0); Python deterministic proof unchanged
(f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a).
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(adr): ADR-173 — metric-locked PCK/MPJPE accuracy harness
Documents the accuracy harness (committed 3a8b2ed13) that resolves the
PCK-definition ambiguity flagged as the #1 beyond-SOTA risk in the SOTA brief
(#1090): three historical numbers (96/81.6/61) used three unstated
normalizations. The harness makes normalization explicit + selectable
(PckNormalization enum) and every reported number carries its definition.
Key proof: identical predictions → 0.50/1.00/0.75 under torso/bbox/abs.
Co-Authored-By: claude-flow <ruv@ruv.net>
parent cfd0ad76cf
commit 90a88ada9a
@ -14,6 +14,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **`homecore-recorder` security review (ADR-132 surfaces) — two real bounding fixes; SQL-injection & NaN-index dimensions confirmed clean with evidence.** Beyond-SOTA review of the HA-compat state recorder (DB persistence + history + ruvector semantic search), the crux being its DB-backed SQL-injection surface. **Findings + fixes:** (1) **Memory-DoS — unbounded `get_state_history`.** The history query carried no `LIMIT`, so a wide `[since, until]` window over a high-frequency entity (a per-second sensor ≈ 86k rows/day) would load an unbounded row set into a single in-memory `Vec`. Added a hard `LIMIT MAX_HISTORY_ROWS` (1,000,000 — generous enough never to truncate a realistic history graph, bounded enough to cap the worst case); the sibling search paths were already `k`-bounded. (2) **Disk-DoS / documented-but-missing `purge`.** The README + HA-compat table advertised `Recorder::purge(older_than)` as a capability, but **no such method existed** — i.e. no retention path at all → unbounded disk growth. Implemented a **transactional** `purge` that deletes `states` + `events` strictly **older than** the cutoff (**exclusive** boundary — idempotent, no off-by-one; a row at the cutoff instant is kept) and **garbage-collects** orphaned `state_attributes` blobs (a dedup-shared blob is dropped only once its last referencing state is gone); all three deletes run in one transaction so a mid-purge failure rolls back cleanly (no states-deleted-but-events-kept corruption). **Confirmed clean with evidence:** SQL injection — **every** query in `db.rs` uses bound `?` parameters (no `format!`/string-concat of user data into SQL); the lone `format!` builds the LIKE *pattern*, which is itself bound as a parameter with `ESCAPE '\\'` and metacharacter escaping. Pinned: a state value `'; DROP TABLE states; --` is stored/queried **literally** (table survives), and a `%`/`_` in a search query matches **literally**, not as a wildcard. 
NaN-index poisoning (the calibration/vitals/geo class) — **structurally impossible** here: embeddings are SHA-256 → `i32` → `f32` (an `i32` cast to `f32` is always finite, never NaN/Inf), with an all-zero-digest norm guard; probed empty-index search, empty-string query, and `k=0` — all return `Ok(0)`, **no panic**. Fail-closed write path — a removal event yields `Ok(None)`, semantic-index failure is logged not propagated (best-effort, never blocks the durable SQLite write), and `EntityId` parsing failures fall back rather than panic. **6 new pinning tests** (SQL-injection literal-storage, LIKE-metacharacter literalness, history `LIMIT`, purge exclusive-boundary, purge attribute-GC-keeps-shared, purge old-events): `homecore-recorder` **19 → 25** (`--no-default-features`) / **25 → 31** (`--features ruvector`), 0 failed; the purge-boundary test is a true pin (fails deleting 2 rows under an inclusive cutoff, passes deleting 1 under the exclusive cutoff). Behaviour otherwise unchanged; Python deterministic proof unchanged (recorder is off the signal proof path).
### Added
- **Metric-locked PCK/MPJPE accuracy harness — resolves the PCK-definition ambiguity (`wifi-densepose-train`, needs ADR slot 173).** The SOTA brief (`docs/research/sota-nn-train-benchmark-brief.md` §1, §3.1, §4) found the single biggest threat to any "beyond-SOTA" claim is **metric ambiguity**: three PCK@20 figures (96.09% WiFlow-STD image-normalized, 81.63% AetherArena torso-PCK, 61.1% GraphPose-Fi standard PCK) cannot be lined up because each silently uses a different normalization — the project was retracted twice over this (a withdrawn "92.9%" used *absolute* pixels, not torso). New `src/accuracy.rs` makes the normalizer **explicit, selectable, and carried with every reported number**: a `PckNormalization` enum (`TorsoDiameter` = standard MM-Fi/GraphPose-Fi hip↔hip; `BoundingBoxDiagonal` = looser WiFlow-STD image-normalized; `AbsolutePixels(threshold)` = the retracted convention, included so historical numbers are reproducible and clearly labeled non-comparable); one canonical `pck_at(pred, gt, vis, k, normalization)` reusing the `metrics_core` geometric primitives (hip distance, bbox diagonal — no duplicate kernel); `mpjpe(pred, gt, vis)` (2D/3D, mm); and a self-describing `PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints, n_frames }` returned by `accuracy_report(frames, ks, normalization)` so an **unlabeled PCK number is structurally impossible**. **17 hand-computed deterministic tests** (no GPU, no datasets) prove the harness arithmetic: perfect→PCK=1.0/MPJPE=0; all-just-outside→0.0; half-in-half-out→0.5; the **key proof** that identical predictions score 0.50 (torso) / 1.00 (bbox) / 0.75 (abs) under the three normalizations (the ambiguity is real and the definitions are distinct); MPJPE 2D/3D fixtures; and graceful degenerate handling (zero torso, empty frames, NaN coords — no panic, never a false-perfect). 
**This is measurement infrastructure, not an accuracy claim** — the tests prove the harness is correct, not that any model is good. `wifi-densepose-train` lib 191→206, `test_metrics` 12→14, 0 failed. Python deterministic proof unchanged (off the signal proof path).
- **RuField `rufield-viewer` live-ingest mode — closes the RuView↔RuField visual loop (ADR-262 surfaces).** The dashboard gains `--source live --upstream <RuView-URL>`: it consumes RuView's `/ws/field` SSE (falling back to polling `/api/field`), **verifies every event's ed25519 provenance receipt on ingest** (`is_fusable`) — forged/tampered events are flagged ✗ and **never fused** into trusted inferences — and renders real RuView `FieldEvent`s through the same room-state/privacy-badge/fusion-graph/receipt path the synthetic mode uses (wire-compatible by construction: both sides use `rufield_core::FieldEvent` serde). **Strict banner honesty:** a single `BannerState` shows `SYNTHETIC` / `LIVE — <upstream>` / `DISCONNECTED — <upstream> unreachable`, mutually exclusive — never SYNTHETIC while showing live data or vice versa; live mode returns **409** on `/api/run` rather than fabricate a synthetic run, and starts DISCONNECTED until first verified contact. Default stays synthetic. 26 tests / 0 failed. `ruvnet/rufield` `crates/rufield-viewer`; `vendor/rufield` submodule bumped.
- **ADR-262 P3 — live RuField surface: RuView's running sensing-server now speaks RuField on `/api/field` + `/ws/field`.** Wires the P1 `wifi-densepose-rufield` bridge into the live `wifi-densepose-sensing-server` (the bridge is the only added coupling, ADR-262 §5.4). A new `src/rufield_surface.rs` module (kept out of the 8k-line `main.rs`) holds a `FieldSurface` with a **dedicated ed25519 `Signer`**, a bounded ring buffer of recent signed events (`FIELD_RING_CAPACITY = 64`), and the `/ws/field` broadcast topic; it exposes `GET /api/field` (latest signed `FieldEvent`s + signer pubkey + a `dev_signing_key` flag) and `GET /ws/field` (per-cycle stream, mirroring `/ws/sensing`), plus a standalone `router()` for isolated testing. **Tap:** at the ESP32 governed-trust cycle (`main.rs` `observe_cycle` ~`:5886` / `SensingUpdate` build ~`:5938`), `emit_rufield_event` joins the cycle's real `SensingUpdate` (features/classification/signal_field) with the engine's recorded `effective_class`/`demoted` trust state into a `SensingSnapshot` and surfaces a signed `FieldEvent` — **existing endpoints (`/ws/sensing` etc.) are unchanged; this is purely additive.** **Signer (defers the P2 key decision, §8 Q1):** a **standalone dev/sensing key** from `WDP_RUFIELD_SIGNING_SEED` (64-hex or ≥32-byte value), else a deterministic dev default with a logged `WARN` — reusing the `cog-ha-matter` Ed25519 key is the deferred P2 call, so P3 does not pre-empt it. **Egress privacy (fail-closed):** `network_egress_allowed` is *stricter* than `DefaultPrivacyGuard` for an unattended live surface — only **P1/P2** leave the box; P0 (raw) and P3/P4/P5 are held edge-local, so a `Derived → P4/P5` cycle **never** surfaces; no-presence cycles emit **no phantom event**. 
**P3 acceptance gates (`tests/rufield_surface_test.rs`, 4 integration via `tower::oneshot` + 4 module unit, 0 failed):** a well-formed **signed** event (`Modality::WifiCsi`, P2 not P1, `is_fusable` ed25519-verified, real timestamp); empty cycle → no phantom; **privacy-safety** — an injected `Derived` trust never surfaces; a mixed stream surfaces only egress-safe events. **Honest scope (ADR-262 §0/§6):** real plumbing on a **live endpoint**, **NOT accuracy** — single-link CSI with its existing caveats (no validated room-coordinate accuracy — `field_localize`), a dedicated dev signing key pending the P2 ownership decision, no accuracy claim. The win is narrowly: "RuView's live sensing now speaks RuField on `/ws/field`."
- **ADR-262 P1 — `wifi-densepose-rufield` anti-corruption bridge: RuView WiFi-CSI sensing → signed RuField `FieldEvent`s.** A new v2 workspace member (the *single coupling point* between RuView and the standalone RuField MFS spec, ADR-262 §5.4) that **path-deps** the `vendor/rufield` submodule crates (`rufield-core`/`-provenance`/`-privacy`/`-fusion` — pure-Rust, `--no-default-features`-buildable: serde/sha2/ed25519/toml only, no tch/openblas/ndarray/candle) and **no** RuView internal crate. The bridge takes owned primitives — `SensingSnapshot` mirrors the `/ws/sensing` `SensingUpdate` (features + classification + signal_field) joined with the `TrustedOutput` trust state (`trust_class`/`demoted`/`identity_bound`) — and `snapshot_to_field_event()` emits one **signed** `FieldEvent` (`Modality::WifiCsi`, axis `[Frequency]`): a real `FieldTensor` from the feature scalars with the real `timestamp_ns`; an `Observation` whose `range_m`/`motion_vector`/`space_cell` are derived from the strongest **signal-field peak** when present (else `None` — coordinates are **never fabricated**, per the `field_localize` caveat) and `confidence` from the classification; a real `ProvenanceRef` (sha256 over the tensor bytes, `synthetic=false`) **ed25519-signed** so `rufield_provenance::is_fusable` passes. **The §3.3 privacy mapping is the critical correctness item**, implemented as `map_privacy()` mapping RuView's class onto RuField P0–P5 **by information content, NEVER by byte value** and **fail-closed**: RuView `Derived` (byte `1`, which sorts *below* `Anonymous` byte `2`) carries an identity embedding → maps to **P4** (or **P5** if identity-bound), **never P1** (the single most dangerous mapping mistake); `Raw → P0`, `Anonymous → P2`, `Restricted → P2`; a governed-engine `demoted` cycle floors the egress class to ≥ P2 with raw suppressed. 
**P1 acceptance gates (15 tests / 0 failed — 5 unit + 9 integration + 1 doc):** round-trip (`SensingSnapshot → FieldEvent →` serde `→` equal), `is_fusable` (verified ed25519 receipt), `RuFieldFusion::ingest` accept + `infer()` runs, **privacy-safety** (`gate_privacy_safety_derived_never_maps_to_low_privacy` — `Derived → P4/P5`, never P1; a table test over every RuView class; fail-closed demotion), and determinism (same snapshot + same signer seed → byte-identical event). **Honest scope:** this is **P1 plumbing** — a tested conversion + a safe privacy mapping. It is **not** wired into the live server (that is P3) and makes **no accuracy claim** (RuField v0.1 is synthetic; RuView's single-link CSI carries its own caveats). CI: the `rust-tests` workflow checkout gains `submodules: recursive` so the path-deps resolve. Python deterministic proof unchanged (off the signal proof path).
@ -0,0 +1,123 @@
# ADR-173: Metric-Locked PCK/MPJPE Accuracy Harness

| Field | Value |
|-------|-------|
| **Status** | Accepted (implemented, deterministically tested) |
| **Date** | 2026-06-15 |
| **Deciders** | ruv |
| **Codename** | **METRIC-LOCK** |
| **Amends** | ADR-155 (generalizes the torso-only `metrics_core::pck_canonical` to a selectable normalization) |
| **Motivated by** | `docs/research/sota-nn-train-benchmark-brief.md` (PR #1090) |

## Context

The SOTA research brief (PR #1090) identified the single biggest threat to any
"beyond-SOTA" accuracy claim this project makes: **metric ambiguity**. Three
PCK@20 numbers circulate, computed under three *different and unstated*
normalizations, so they cannot be compared:

- **96.09–96.61%**: WiFlow-STD reproduction, **image/bounding-box-normalized** PCK (the looser convention).
- **81.63%**: an internal MM-Fi number reported as **"torso-PCK"** (tighter).
- **61.1%**: GraphPose-Fi (arXiv 2511.19105), **standard torso-diameter** PCK on the MM-Fi random split (the academic frontier).

The project has been burned by this twice: a previously published 92.9% was
retracted because it used an **absolute-pixel** threshold rather than torso
normalization. Until there is *one canonical, documented, tested* PCK definition,
and every reported number carries the definition it was computed under, no
accuracy comparison is credible, and the "prove everything" bar cannot be met for
the benchmark half of the work.

This is measurement infrastructure, not an accuracy claim. The deliverable's job
is to make the metric **unambiguous and reproducible**, so future numbers are
comparable and an unlabeled PCK is structurally impossible.

## Decision

Add a metric-locked accuracy harness as a new module
`v2/crates/wifi-densepose-train/src/accuracy.rs` (404 non-test lines; inline
deterministic tests bring the file to 708), re-exported at the crate root. It
**extends rather than duplicates**: it reuses `metrics_core`'s geometric
primitives (`bounding_box_diagonal`, canonical hip indices
`CANON_LEFT_HIP/RIGHT_HIP`), so there remains exactly one implementation of each
geometric reference. The existing ADR-155 `pck_canonical` (torso-only) is
unchanged; this module generalizes it.

### Public API

- `enum PckNormalization { TorsoDiameter, BoundingBoxDiagonal, AbsolutePixels(f32) }`:
  the three conventions the three historical numbers used, now **explicit and
  selectable**. `.label()` is public; `.tolerance(...)` is a module-private helper.
- `pck_at(pred, gt, vis, k, norm) -> (correct, total, pck)`: PCK@k is the
  fraction of *visible* keypoints whose predicted-vs-GT distance is ≤ the tolerance,
  where tolerance = `k%` of the chosen normalizer (or an absolute threshold for
  `AbsolutePixels`, in which case `k` is ignored).
- `mpjpe(pred, gt, vis) -> f32`: mean per-joint position error (2D/3D, coordinate
  units; mm for mm inputs). Re-exported crate-root as `pck_mpjpe` to avoid
  colliding with the existing `eval::mpjpe`.
- `struct PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints, n_frames }`:
  **a reported number always carries its `normalization`**; an unlabeled PCK is
  structurally impossible to produce through this surface.
- `struct PoseFrame { pred, gt, visibility }` + `accuracy_report(frames, ks, norm) -> PoseAccuracy`
  (micro-averaged over keypoints).

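
To make the "micro-averaged" and "always labeled" points concrete, here is a minimal std-only sketch. `Norm`, `LabeledPck`, `micro_pck`, `macro_pck`, and `report` are hypothetical stand-ins for illustration, not the crate's API; the real `accuracy_report` takes `PoseFrame`s with ndarray inputs. Pooling correct/total counts across frames is an assumption about what "micro-averaged over keypoints" means here.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the ADR's `PckNormalization` (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Norm {
    TorsoDiameter,
    BoundingBoxDiagonal,
    AbsolutePixels(f32),
}

/// Per-frame tally at one threshold: (correct, total) over visible keypoints.
pub type FrameTally = (usize, usize);

/// Hypothetical stand-in for `PoseAccuracy`: the label is a required field,
/// so a PCK value cannot be constructed without stating its normalization.
#[derive(Debug, Clone, PartialEq)]
pub struct LabeledPck {
    pub pck_at: BTreeMap<u8, f32>,
    pub normalization: Norm,
    pub n_frames: usize,
}

/// Micro-average: pool correct/total across all frames, then divide once.
pub fn micro_pck(tallies: &[FrameTally]) -> f32 {
    let correct: usize = tallies.iter().map(|t| t.0).sum();
    let total: usize = tallies.iter().map(|t| t.1).sum();
    if total > 0 { correct as f32 / total as f32 } else { 0.0 }
}

/// Macro-average, shown only for contrast: mean of the per-frame ratios.
pub fn macro_pck(tallies: &[FrameTally]) -> f32 {
    let ratios: Vec<f32> = tallies
        .iter()
        .filter(|t| t.1 > 0)
        .map(|t| t.0 as f32 / t.1 as f32)
        .collect();
    if ratios.is_empty() { 0.0 } else { ratios.iter().sum::<f32>() / ratios.len() as f32 }
}

/// Build a labeled result for a single threshold `k`.
pub fn report(k: u8, tallies: &[FrameTally], normalization: Norm) -> LabeledPck {
    let mut pck_at = BTreeMap::new();
    pck_at.insert(k, micro_pck(tallies));
    LabeledPck { pck_at, normalization, n_frames: tallies.len() }
}
```

With one frame scoring 1 of 1 and another scoring 1 of 3, the micro-average is 2/4 = 0.5 while the macro-average is about 0.667, so the averaging rule is itself part of the metric definition and worth stating alongside the normalization.
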
### Correctness is proven by hand-computed deterministic tests (no GPU, no data)

The tests construct synthetic keypoint sets whose PCK/MPJPE can be computed by
hand, and assert the harness matches. Highlights (all pass):

| Test | Construction | Expected |
|------|--------------|----------|
| perfect_prediction | pred==gt | PCK=1.0 (all 3 norms), MPJPE=0 |
| all_just_outside | every error just past τ@20 | PCK=0.0 |
| half_in_half_out | 2 exact, 2 just outside | PCK=0.5 |
| **three_normalizations (KEY PROOF)** | identical pred; nose err .06, shoulder .10, hips exact | torso=**0.50**, bbox=**1.00**, abs(.08)=**0.75** |
| mpjpe_2d / mpjpe_3d | (3,4)→5 / (1,2,2)→3 | 2.5 / 3.0 |
| mpjpe_excludes_invisible | invisible joint err 100 ignored | 5.0 |
| zero_torso_unscoreable | coincident hips | `(0,0,0.0)`, **not** false-perfect |
| no_visible_keypoints | vis=∅ | `(0,0,0.0)` |
| nan_coords | one NaN pred coord | counted wrong, **no panic** |
| empty report | no frames | 0.0, **not** NaN |
| bbox≥torso ordering | same frames | bbox-PCK ≥ torso-PCK |

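
The MPJPE and NaN rows can be reproduced by hand with a small std-only sketch. `mpjpe_sketch` and `is_correct` are hypothetical names operating on plain arrays (the real module takes ndarray inputs), and the fixtures below are assumed constructions consistent with the table, not the actual test data.

```rust
/// Mean per-joint position error over visible joints (visibility >= 0.5).
/// Sketch over fixed-width rows; `D` is 2 for 2D poses and 3 for 3D poses.
pub fn mpjpe_sketch<const D: usize>(pred: &[[f32; D]], gt: &[[f32; D]], vis: &[f32]) -> f32 {
    let n = pred.len().min(gt.len()).min(vis.len());
    let mut sum = 0.0f32;
    let mut count = 0usize;
    for j in 0..n {
        if vis[j] < 0.5 {
            continue; // invisible joints contribute nothing
        }
        let sq: f32 = (0..D)
            .map(|c| {
                let diff = pred[j][c] - gt[j][c];
                diff * diff
            })
            .sum();
        sum += sq.sqrt();
        count += 1;
    }
    // No visible joint means no evidence: report 0.0 rather than NaN.
    if count > 0 { sum / count as f32 } else { 0.0 }
}

/// PCK correctness check. `NaN <= tol` is false in IEEE 754 comparisons,
/// so a NaN distance is counted as wrong instead of panicking.
pub fn is_correct(dist: f32, tol: f32) -> bool {
    dist <= tol
}
```

One joint off by (3, 4) plus one exact joint averages to (5 + 0) / 2 = 2.5; a single 3D joint off by (1, 2, 2) gives 3.0; masking a joint with error 100 leaves the mean at 5.0.
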
### The key proof (the ambiguity is real and quantified)

Identical predictions, three declared normalizations → **0.50 / 1.00 / 0.75**.
Mechanism: the bbox diagonal `√(0.20² + 0.80²) ≈ 0.825` is ~4× the hip-span torso
`0.20`, so τ@20 is 0.165 (bbox) vs 0.040 (torso): the looser image-normalized
convention passes joints the strict torso convention rejects. The absolute 0.08
threshold sits between the two, accepting the nose (error 0.06) but rejecting the
shoulder (error 0.10), hence 0.75. This is *exactly* why 96% / 81.6% / 61% cannot
be lined up without declaring the enum, demonstrated in-code.

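
The arithmetic above can be checked with a std-only sketch. Everything named here (`Norm`, `tolerance`, `pck`, `fixture`, the four-keypoint layout, and the coordinates) is a hypothetical reconstruction chosen to match the stated extents (bbox 0.20 × 0.80, hip span 0.20) and errors (nose 0.06, shoulder 0.10); it is not the crate's code or its test fixture. For brevity it treats every keypoint as visible and omits the torso-to-bbox fallback.

```rust
/// Hypothetical stand-in for the ADR's `PckNormalization`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Norm {
    TorsoDiameter,
    BoundingBoxDiagonal,
    AbsolutePixels(f32),
}

// Assumed fixture layout: nose, shoulder, left hip, right hip.
const LEFT_HIP: usize = 2;
const RIGHT_HIP: usize = 3;

fn dist(a: [f32; 2], b: [f32; 2]) -> f32 {
    let (dx, dy) = (a[0] - b[0], a[1] - b[1]);
    (dx * dx + dy * dy).sqrt()
}

/// Distance tolerance for PCK@k, or `None` when no usable reference exists.
pub fn tolerance(gt: &[[f32; 2]], k: u8, norm: Norm) -> Option<f32> {
    let reference = match norm {
        // Absolute threshold: used directly, `k` is ignored.
        Norm::AbsolutePixels(t) => return if t > 0.0 { Some(t) } else { None },
        Norm::TorsoDiameter => dist(*gt.get(LEFT_HIP)?, *gt.get(RIGHT_HIP)?),
        Norm::BoundingBoxDiagonal => {
            let first = *gt.first()?;
            let (mut lo, mut hi) = (first, first);
            for p in gt {
                lo = [lo[0].min(p[0]), lo[1].min(p[1])];
                hi = [hi[0].max(p[0]), hi[1].max(p[1])];
            }
            dist(lo, hi)
        }
    };
    if reference > 1e-6 { Some(k as f32 / 100.0 * reference) } else { None }
}

/// Fraction of keypoints within tolerance; 0.0 when the frame is unscoreable.
pub fn pck(pred: &[[f32; 2]], gt: &[[f32; 2]], k: u8, norm: Norm) -> f32 {
    let n = pred.len().min(gt.len());
    let tol = match tolerance(gt, k, norm) {
        Some(t) if n > 0 => t,
        _ => return 0.0,
    };
    let correct = (0..n).filter(|&j| dist(pred[j], gt[j]) <= tol).count();
    correct as f32 / n as f32
}

/// Returns `(pred, gt)`: nose off by 0.06, shoulder off by 0.10, hips exact.
pub fn fixture() -> (Vec<[f32; 2]>, Vec<[f32; 2]>) {
    let gt = vec![[0.5, 0.1], [0.5, 0.3], [0.4, 0.9], [0.6, 0.9]];
    let pred = vec![[0.56, 0.1], [0.6, 0.3], [0.4, 0.9], [0.6, 0.9]];
    (pred, gt)
}
```

Calling `pck` on this fixture at `k = 20` gives 2/4 under `TorsoDiameter`, 4/4 under `BoundingBoxDiagonal`, and 3/4 under `AbsolutePixels(0.08)`: the same predictions, three different scores.
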
## Validation

- `cargo test -p wifi-densepose-train --no-default-features` → lib **191 → 206**
  (+15), `test_metrics` **12 → 14** (+2), doc-tests 8: **0 failed**.
- `cargo test --workspace --no-default-features` → **exit 0**, 0 failed.
- `python archive/v1/data/proof/verify.py` → **VERDICT: PASS**, hash
  `f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a` **unchanged**
  (off the signal proof path; confirms no pipeline alteration).

## Consequences

### Positive

- The three historical PCK numbers can now be **recomputed under one declared
  definition** and compared honestly. The retracted-number class of error
  (silent normalization mismatch) is structurally prevented going forward.
- Establishes the measurement substrate for the beyond-SOTA target: GraphPose-Fi
  cross-environment **PCK@20 = 12.9%** (standard torso PCK) is now a number this
  harness can produce comparably.

### Negative

- None functional. The harness is additive; no existing metric path changed.

### Neutral

- Producing actual model numbers under this harness requires the trained models
  and datasets (MM-Fi), including for cross-domain splits. That work is the next
  sub-deliverable of the benchmark/optimization milestone and is out of scope
  here (this ADR is the *instrument*, not the *reading*).

## Links

- ADR-155: metric core (`pck_canonical`, torso-only), generalized here
- ADR-152: WiFi-Pose SOTA 2026 intake / WiFlow-STD benchmark
- `docs/research/sota-nn-train-benchmark-brief.md`: the motivating gap analysis
- GraphPose-Fi: arXiv 2511.19105 (verified cross-env PCK@20 = 12.9% anchor)

@ -0,0 +1,708 @@
|
|||
//! Metric-locked pose-accuracy harness (ADR-155 §Tier-1.2; needs ADR slot 173).
|
||||
//!
|
||||
//! # Why this module exists
|
||||
//!
|
||||
//! Three PCK\@20 numbers float around this project and **cannot be lined up**
|
||||
//! because each silently uses a *different* PCK definition:
|
||||
//!
|
||||
//! | Number | Source | PCK normalization |
|
||||
//! |--------|--------|-------------------|
|
||||
//! | 96.09 % | WiFlow-STD reproduction | image / bounding-box normalized (looser) |
|
||||
//! | 81.63 % | AetherArena MM-Fi (ADR-150) | torso-diameter (standard MM-Fi / GraphPose-Fi) |
|
||||
//! | 61.1 % | GraphPose-Fi (preprint) | torso-diameter, 3D, mm-scale (harder) |
|
||||
//!
|
||||
//! The project was burned **twice** by metric ambiguity (a now-retracted "92.9 %
|
||||
//! PCK\@20" used *absolute* pixel thresholds, not torso normalization). The fix
|
||||
//! is to make the normalizer **explicit, selectable, and carried with every
|
||||
//! reported number** so an unlabeled PCK figure is structurally impossible.
|
||||
//!
|
||||
//! [`metrics_core`](crate::metrics_core) already pins the *canonical*
|
||||
//! torso-normalized PCK ([`pck_canonical`](crate::metrics_core::pck_canonical)).
|
||||
//! This module generalizes it to a [`PckNormalization`] enum covering all three
|
||||
//! conventions the SOTA brief names, adds [`mpjpe`] (mm), and bundles results
|
||||
//! into a self-describing [`PoseAccuracy`] struct. It **reuses** the
|
||||
//! `metrics_core` primitives (hip distance, bounding-box diagonal) — there is
|
||||
//! still exactly one implementation of each geometric reference.
|
||||
//!
|
||||
//! # This is measurement infrastructure, not an accuracy claim
|
||||
//!
|
||||
//! Nothing here asserts any project model is good. The unit tests prove the
|
||||
//! *harness* is arithmetically correct against hand-computed fixtures (no GPU,
|
||||
//! no datasets), including the key demonstration that the **same predictions
|
||||
//! score different PCK under the three normalizations** — proof the ambiguity is
|
||||
//! real and the definitions are genuinely distinct.
|
||||
//!
|
||||
//! # Literature
|
||||
//!
|
||||
//! - Torso-diameter PCK is the MM-Fi / GraphPose-Fi convention (Yang et al.,
|
||||
//! *GraphPose-Fi*, arXiv:2511.19105): a keypoint is correct iff its error is
|
||||
//! within `k · d_torso`, with `d_torso` the hip↔hip (or shoulder↔hip) span.
|
||||
//! - Bounding-box / image-normalized PCK is the WiFlow-STD-style looser
|
||||
//! convention (arXiv:2602.08661) — normalize by the GT pose bbox diagonal.
|
||||
//! - MPJPE (mean per-joint position error, mm) is reported by GraphPose-Fi and
|
||||
//! Person-in-WiFi-3D (Yan et al., CVPR 2024).
|
||||
|
||||
use std::collections::BTreeMap;
|
||||
|
||||
use ndarray::{Array1, Array2};
|
||||
|
||||
use crate::metrics_core::{
|
||||
bounding_box_diagonal, CANON_LEFT_HIP, CANON_RIGHT_HIP,
|
||||
};
|
||||
|
||||
/// Visibility cutoff: a keypoint counts as *visible* iff `visibility[j] >= 0.5`
|
||||
/// (COCO convention; matches [`crate::metrics_core`]).
|
||||
const VISIBILITY_THRESHOLD: f32 = 0.5;
|
||||
|
||||
/// Minimum positive normalizer extent. Below this the reference scale is
|
||||
/// considered degenerate (zero torso, collapsed bbox) and the frame is reported
|
||||
/// unscoreable rather than dividing by ≈0.
|
||||
const MIN_REFERENCE_EXTENT: f32 = 1e-6;
|
||||
|
||||
// ===========================================================================
|
||||
// PCK normalization — the explicit, selectable definition
|
||||
// ===========================================================================
|
||||
|
||||
/// The PCK normalization basis — **the single knob that made three project
|
||||
/// numbers non-comparable**, now explicit and carried with every result.
|
||||
///
|
||||
/// A keypoint `j` (with `visibility[j] >= 0.5`) is *correct* iff
|
||||
/// `‖pred_j − gt_j‖₂ ≤ τ`, where the **distance tolerance `τ`** is derived from
|
||||
/// the chosen normalization and the PCK threshold `k` (given as a percentage,
|
||||
/// e.g. `20` for PCK\@20):
|
||||
///
|
||||
/// | Variant | `τ` (tolerance in coordinate units) |
|
||||
/// |---------|--------------------------------------|
|
||||
/// | [`TorsoDiameter`](Self::TorsoDiameter) | `(k/100) · d_torso` |
|
||||
/// | [`BoundingBoxDiagonal`](Self::BoundingBoxDiagonal) | `(k/100) · d_bbox` |
|
||||
/// | [`AbsolutePixels`](Self::AbsolutePixels) | `threshold` (k ignored) |
|
||||
///
|
||||
/// `d_torso` is the hip↔hip span (COCO joints 11↔12), falling back to the bbox
/// diagonal when either hip is not visible or the hip span is degenerate;
/// identical to [`crate::metrics_core::canonical_torso_size`]. `d_bbox` is the
/// diagonal of the axis-aligned bounding box of all visible GT keypoints.
|
||||
///
|
||||
/// These yield **different** tolerances on the *same* predictions whenever
/// `d_torso ≠ d_bbox` (true for any realistic pose: the bbox diagonal exceeds
/// the hip span), and therefore can yield different PCK, which is exactly why
/// the 96 / 81.6 / 61 numbers cannot be lined up without declaring this enum.
|
||||
#[derive(Debug, Clone, Copy, PartialEq)]
|
||||
pub enum PckNormalization {
|
||||
/// **Torso-diameter** (hip↔hip span). The standard MM-Fi / GraphPose-Fi
|
||||
/// convention and the *stricter* of the two relative normalizers. This is
|
||||
/// the canonical default ([`crate::metrics_core::pck_canonical`]).
|
||||
TorsoDiameter,
|
||||
/// **Bounding-box diagonal** (a.k.a. image-normalized). The looser
|
||||
/// WiFlow-STD-style convention: normalize by the GT pose bbox diagonal,
|
||||
/// which is larger than the torso span ⇒ a more forgiving threshold ⇒ a
|
||||
/// higher PCK on identical predictions.
|
||||
BoundingBoxDiagonal,
|
||||
/// **Absolute pixel/coordinate threshold** — no pose-relative
|
||||
/// normalization. The PCK `k` percentage is ignored; the held `threshold`
|
||||
/// is the raw distance tolerance directly. Included so historical
|
||||
/// retracted-style numbers are reproducible, and **clearly labeled as
|
||||
/// non-comparable** to the relative variants (it does not scale with body
|
||||
/// size or camera distance).
|
||||
AbsolutePixels(f32),
|
||||
}
|
||||
|
||||
impl PckNormalization {
|
||||
/// Human-readable, *self-documenting* label for a reported number — so a
|
||||
/// `PoseAccuracy` printed anywhere always carries its definition.
|
||||
pub fn label(&self) -> String {
|
||||
match self {
|
||||
PckNormalization::TorsoDiameter => "torso-diameter".to_string(),
|
||||
PckNormalization::BoundingBoxDiagonal => "bbox-diagonal".to_string(),
|
||||
PckNormalization::AbsolutePixels(t) => format!("absolute-px({t})"),
|
||||
}
|
||||
}
|
||||
|
||||
/// Compute the per-frame distance tolerance `τ` for PCK threshold `k`
/// (percentage). Returns `None` when no usable tolerance exists: a degenerate
/// relative normalizer, or a non-positive (or NaN) `AbsolutePixels` threshold.
/// The frame then cannot be scored.
|
||||
///
|
||||
/// `gt_kpts` is `[n, 2]` (or `[n, ≥2]`, only x/y used); `visibility` is `[n]`.
|
||||
fn tolerance(&self, gt_kpts: &Array2<f32>, visibility: &Array1<f32>, k: u8) -> Option<f32> {
|
||||
let n = gt_kpts.shape()[0].min(visibility.len());
|
||||
match self {
|
||||
PckNormalization::AbsolutePixels(threshold) => {
|
||||
// Raw tolerance, independent of pose scale and of `k`.
|
||||
if *threshold > 0.0 {
|
||||
Some(*threshold)
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
PckNormalization::TorsoDiameter => {
|
||||
let d = torso_diameter(gt_kpts, visibility, n)?;
|
||||
Some((k as f32 / 100.0) * d)
|
||||
}
|
||||
PckNormalization::BoundingBoxDiagonal => {
|
||||
let d = bounding_box_diagonal(gt_kpts, visibility, n);
|
||||
if d > MIN_REFERENCE_EXTENT {
|
||||
Some((k as f32 / 100.0) * d)
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Hip↔hip torso diameter with a bbox-diagonal fallback — the relative
|
||||
/// normalizer shared by `TorsoDiameter` PCK and
|
||||
/// [`crate::metrics_core::canonical_torso_size`]. Returns `None` when no
|
||||
/// positive-extent reference exists.
|
||||
fn torso_diameter(gt_kpts: &Array2<f32>, visibility: &Array1<f32>, n: usize) -> Option<f32> {
|
||||
if CANON_LEFT_HIP < n
|
||||
&& CANON_RIGHT_HIP < n
|
||||
&& visibility[CANON_LEFT_HIP] >= VISIBILITY_THRESHOLD
|
||||
&& visibility[CANON_RIGHT_HIP] >= VISIBILITY_THRESHOLD
|
||||
{
|
||||
let dx = gt_kpts[[CANON_LEFT_HIP, 0]] - gt_kpts[[CANON_RIGHT_HIP, 0]];
|
||||
let dy = gt_kpts[[CANON_LEFT_HIP, 1]] - gt_kpts[[CANON_RIGHT_HIP, 1]];
|
||||
let torso = (dx * dx + dy * dy).sqrt();
|
||||
if torso > MIN_REFERENCE_EXTENT {
|
||||
return Some(torso);
|
||||
}
|
||||
}
|
||||
let diag = bounding_box_diagonal(gt_kpts, visibility, n);
|
||||
if diag > MIN_REFERENCE_EXTENT {
|
||||
Some(diag)
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
|
||||
// ===========================================================================
|
||||
// Single-frame PCK / MPJPE
|
||||
// ===========================================================================
|
||||
|
||||
/// Per-frame **PCK\@`k`** under the selected `normalization`.
|
||||
///
|
||||
/// A keypoint `j` with `visibility[j] >= 0.5` is correct iff
|
||||
/// `‖pred_j − gt_j‖₂ ≤ τ`, with `τ` from
|
||||
/// [`PckNormalization::tolerance`]. Only x/y are used (2D PCK is the standard
|
||||
/// keypoint-PCK definition; pass 2-column arrays).
|
||||
///
|
||||
/// # Returns
|
||||
/// `(correct, total, pck)` with `pck ∈ [0,1]`. **`(0, 0, 0.0)`** when no
/// keypoint is visible or no usable tolerance exists (degenerate reference
/// scale for the relative normalizers, non-positive or NaN threshold for
/// `AbsolutePixels`): a frame with no measurable evidence scores 0, never 1.
/// NaN-valued coordinates make a keypoint *incorrect* (the `<=` comparison is
/// false for NaN) rather than panicking.
|
||||
pub fn pck_at(
|
||||
pred_kpts: &Array2<f32>,
|
||||
gt_kpts: &Array2<f32>,
|
||||
visibility: &Array1<f32>,
|
||||
k: u8,
|
||||
normalization: PckNormalization,
|
||||
) -> (usize, usize, f32) {
|
||||
let n = pred_kpts.shape()[0]
|
||||
.min(gt_kpts.shape()[0])
|
||||
.min(visibility.len());
|
||||
let tol = match normalization.tolerance(gt_kpts, visibility, k) {
|
||||
Some(t) => t,
|
||||
None => return (0, 0, 0.0),
|
||||
};
|
||||
|
||||
let mut correct = 0usize;
|
||||
let mut total = 0usize;
|
||||
for j in 0..n {
|
||||
if visibility[j] < VISIBILITY_THRESHOLD {
|
||||
continue;
|
||||
}
|
||||
total += 1;
|
||||
let dx = pred_kpts[[j, 0]] - gt_kpts[[j, 0]];
|
||||
let dy = pred_kpts[[j, 1]] - gt_kpts[[j, 1]];
|
||||
let dist = (dx * dx + dy * dy).sqrt();
|
||||
// NaN-safe: `NaN <= tol` is false, so a NaN coordinate counts as wrong.
|
||||
if dist <= tol {
|
||||
correct += 1;
|
||||
}
|
||||
}
|
||||
let pck = if total > 0 {
|
||||
correct as f32 / total as f32
|
||||
} else {
|
||||
0.0
|
||||
};
|
||||
(correct, total, pck)
|
||||
}
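
Reviewer sketch, not part of the diff: the `pck_at` contract above can be reproduced with the standard library alone. The names `pck_sketch` and `relative_tolerance` are hypothetical, and the rule "τ = k percent of the reference extent" is an assumption inferred from the hand calculations in the tests (torso 0.20, k = 20, τ = 0.04); the real `PckNormalization::tolerance` body is not shown in this hunk.

```rust
/// Hypothetical stand-in for a relative normalizer's tolerance:
/// `k` percent of the reference extent, `None` when the extent is degenerate.
fn relative_tolerance(reference: f32, k: u8) -> Option<f32> {
    if reference > 1e-6 {
        Some(reference * f32::from(k) / 100.0)
    } else {
        None
    }
}

/// Hypothetical std-only mirror of the `pck_at` loop over `[x, y]` keypoints.
/// Returns `(correct, total, pck)`; an unscoreable frame yields `(0, 0, 0.0)`.
fn pck_sketch(
    pred: &[[f32; 2]],
    truth: &[[f32; 2]],
    vis: &[f32],
    tol: Option<f32>,
) -> (usize, usize, f32) {
    let tol = match tol {
        Some(t) => t,
        None => return (0, 0, 0.0),
    };
    let n = pred.len().min(truth.len()).min(vis.len());
    let mut correct = 0usize;
    let mut total = 0usize;
    for j in 0..n {
        if vis[j] < 0.5 {
            continue; // invisible joints are not scored
        }
        total += 1;
        let dx = pred[j][0] - truth[j][0];
        let dy = pred[j][1] - truth[j][1];
        // A NaN distance fails `<=`, so the joint is counted as wrong.
        if (dx * dx + dy * dy).sqrt() <= tol {
            correct += 1;
        }
    }
    let pck = if total > 0 { correct as f32 / total as f32 } else { 0.0 };
    (correct, total, pck)
}
```

Passing the tolerance as an `Option` keeps the "degenerate reference scores 0, never 1" rule in one place, which is the same shape as the early return in `pck_at`.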

/// Per-frame **MPJPE** (mean per-joint position error) over visible keypoints,
/// in the coordinate units of the inputs (report as mm when inputs are mm).
///
/// `pred`/`gt` are `[n, D]` with `D ∈ {2, 3}` (2D or 3D pose); all `D` columns
/// are used. Joints with `visibility[j] < 0.5` are excluded.
///
/// Returns `0.0` when no keypoint is visible (no evidence). A NaN coordinate
/// propagates into the returned mean (callers filter NaN frames upstream); it
/// does not panic.
pub fn mpjpe(pred: &Array2<f32>, gt: &Array2<f32>, visibility: &Array1<f32>) -> f32 {
    let n = pred.shape()[0].min(gt.shape()[0]).min(visibility.len());
    let d = pred.shape()[1].min(gt.shape()[1]);
    let mut sum = 0.0f32;
    let mut count = 0usize;
    for j in 0..n {
        if visibility[j] < VISIBILITY_THRESHOLD {
            continue;
        }
        let mut sq = 0.0f32;
        for c in 0..d {
            let diff = pred[[j, c]] - gt[[j, c]];
            sq += diff * diff;
        }
        sum += sq.sqrt();
        count += 1;
    }
    if count > 0 {
        sum / count as f32
    } else {
        0.0
    }
}
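
Reviewer sketch, not part of the diff: the doc comment above says a NaN coordinate propagates into the mean and that callers filter NaN frames upstream. A minimal std-only illustration of both halves follows. `mpjpe_sketch` mirrors the loop above on plain vectors, and `mpjpe_checked` is a hypothetical caller-side guard; neither name exists in the crate.

```rust
/// Hypothetical std-only mirror of `mpjpe`: mean Euclidean error over
/// visible joints, using every coordinate column (2D or 3D).
fn mpjpe_sketch(pred: &[Vec<f32>], truth: &[Vec<f32>], vis: &[f32]) -> f32 {
    let n = pred.len().min(truth.len()).min(vis.len());
    let mut sum = 0.0f32;
    let mut count = 0usize;
    for j in 0..n {
        if vis[j] < 0.5 {
            continue; // invisible joints are excluded from the mean
        }
        // `zip` stops at the shorter row, like the `min` over column counts.
        let sq: f32 = pred[j]
            .iter()
            .zip(&truth[j])
            .map(|(p, t)| (p - t) * (p - t))
            .sum();
        sum += sq.sqrt();
        count += 1;
    }
    if count > 0 { sum / count as f32 } else { 0.0 }
}

/// Hypothetical upstream guard: reject a frame whose MPJPE is not finite
/// instead of letting NaN flow into a reported number.
fn mpjpe_checked(pred: &[Vec<f32>], truth: &[Vec<f32>], vis: &[f32]) -> Option<f32> {
    let value = mpjpe_sketch(pred, truth, vis);
    if value.is_finite() { Some(value) } else { None }
}
```

The guard is deliberately separate from the metric so the metric itself stays a pure mean, which matches the choice made in `mpjpe`.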

// ===========================================================================
// Self-describing result struct + batch report
// ===========================================================================

/// A pose-accuracy result that **always carries the definition it was computed
/// under** — making an unlabeled PCK number structurally impossible.
///
/// Built by [`accuracy_report`] over a set of frames. `pck_at` maps each
/// requested threshold `k` (percentage, e.g. `20`) to its PCK in `[0,1]`. The
/// `normalization` field records *which* PCK definition produced those numbers,
/// so two `PoseAccuracy` values can only be compared when their `normalization`
/// matches (the comparability check the project lacked).
#[derive(Debug, Clone, PartialEq)]
pub struct PoseAccuracy {
    /// PCK\@k for each requested threshold percentage `k`, in `[0,1]`.
    pub pck_at: BTreeMap<u8, f32>,
    /// Mean per-joint position error in coordinate units (mm for mm inputs).
    pub mpjpe: f32,
    /// The normalization basis under which `pck_at` was computed — the label a
    /// reported number must always carry.
    pub normalization: PckNormalization,
    /// Number of keypoints per frame (the pose convention, e.g. 17 for COCO).
    pub n_keypoints: usize,
    /// Number of frames aggregated into this result.
    pub n_frames: usize,
}

impl PoseAccuracy {
    /// Convenience accessor for a single threshold, returning `None` when that
    /// `k` was not requested.
    pub fn pck(&self, k: u8) -> Option<f32> {
        self.pck_at.get(&k).copied()
    }

    /// A one-line, self-documenting summary suitable for logs / RESULTS.md, e.g.
    /// `PCK@20=0.750 (torso-diameter, 17kp, 1 frames) MPJPE=0.0300`
    /// (PCK is printed with 3 decimals, MPJPE with 4).
    pub fn summary(&self) -> String {
        let pcks: Vec<String> = self
            .pck_at
            .iter()
            .map(|(k, v)| format!("PCK@{k}={v:.3}"))
            .collect();
        format!(
            "{} ({}, {}kp, {} frames) MPJPE={:.4}",
            pcks.join(" "),
            self.normalization.label(),
            self.n_keypoints,
            self.n_frames,
            self.mpjpe
        )
    }
}
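
Reviewer sketch, not part of the diff: the struct doc says two results can only be compared when their `normalization` matches. One way a consumer might enforce that is sketched below with stand-in types. `Norm`, `Report`, and `pck_delta` are hypothetical names for illustration; the PR itself exposes the field and leaves the check to the caller.

```rust
use std::collections::BTreeMap;

/// Stand-in for `PckNormalization` (assumed to be `Copy + PartialEq`).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Norm {
    TorsoDiameter,
    BoundingBoxDiagonal,
    AbsolutePixels(f32),
}

/// Stand-in for the two `PoseAccuracy` fields the check needs.
struct Report {
    normalization: Norm,
    pck_at: BTreeMap<u8, f32>,
}

/// Hypothetical guard: PCK@k difference `a - b`, available only when both
/// reports use the same normalization and both requested threshold `k`.
fn pck_delta(a: &Report, b: &Report, k: u8) -> Option<f32> {
    if a.normalization != b.normalization {
        return None; // different definitions: the numbers are not comparable
    }
    Some(a.pck_at.get(&k)? - b.pck_at.get(&k)?)
}
```

Note that `AbsolutePixels(0.05)` and `AbsolutePixels(0.08)` compare unequal under the derived `PartialEq`, so two absolute-threshold reports with different thresholds are also rejected.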

/// One frame's prediction + ground truth + visibility for batch scoring.
///
/// All three arrays share row count `n_keypoints`; `pred`/`gt` are `[n, D]`
/// (`D ∈ {2,3}`), `visibility` is `[n]`.
#[derive(Debug, Clone)]
pub struct PoseFrame {
    /// Predicted keypoints `[n, D]`.
    pub pred: Array2<f32>,
    /// Ground-truth keypoints `[n, D]`.
    pub gt: Array2<f32>,
    /// Per-keypoint visibility `[n]` (`>= 0.5` ⇒ visible).
    pub visibility: Array1<f32>,
}

/// Aggregate [`PoseAccuracy`] over a batch of frames under **one** explicit
/// `normalization`, for the requested PCK thresholds `ks` (percentages).
///
/// PCK is micro-averaged over keypoints (sum of correct ÷ sum of visible across
/// all frames — the standard keypoint-PCK aggregation), so frames with more
/// visible joints contribute proportionally. MPJPE is micro-averaged over
/// visible joints likewise. Unscoreable frames (no visible joints, degenerate
/// relative normalizer) contribute `(0, 0)` and so are excluded from the
/// denominator rather than scored as perfect.
///
/// An **empty** `frames` slice yields all-zero PCK and `0.0` MPJPE — never a
/// panic or NaN.
pub fn accuracy_report(
    frames: &[PoseFrame],
    ks: &[u8],
    normalization: PckNormalization,
) -> PoseAccuracy {
    let n_keypoints = frames.first().map(|f| f.gt.shape()[0]).unwrap_or(0);

    // PCK: per-threshold (correct, total) accumulators across frames.
    let mut pck_acc: BTreeMap<u8, (usize, usize)> = ks.iter().map(|&k| (k, (0, 0))).collect();
    // MPJPE: sum of per-joint distances and visible-joint count.
    let mut mpjpe_sum = 0.0f32;
    let mut mpjpe_count = 0usize;

    for frame in frames {
        for &k in ks {
            let (c, t, _) = pck_at(&frame.pred, &frame.gt, &frame.visibility, k, normalization);
            let entry = pck_acc.entry(k).or_insert((0, 0));
            entry.0 += c;
            entry.1 += t;
        }
        // Per-frame MPJPE re-derived as a (sum, count) contribution so the
        // batch value is a true micro-average over joints.
        let n = frame.pred.shape()[0].min(frame.gt.shape()[0]).min(frame.visibility.len());
        let d = frame.pred.shape()[1].min(frame.gt.shape()[1]);
        for j in 0..n {
            if frame.visibility[j] < VISIBILITY_THRESHOLD {
                continue;
            }
            let mut sq = 0.0f32;
            for c in 0..d {
                let diff = frame.pred[[j, c]] - frame.gt[[j, c]];
                sq += diff * diff;
            }
            mpjpe_sum += sq.sqrt();
            mpjpe_count += 1;
        }
    }

    let pck_at: BTreeMap<u8, f32> = pck_acc
        .into_iter()
        .map(|(k, (c, t))| {
            let v = if t > 0 { c as f32 / t as f32 } else { 0.0 };
            (k, v)
        })
        .collect();

    let mpjpe = if mpjpe_count > 0 {
        mpjpe_sum / mpjpe_count as f32
    } else {
        0.0
    };

    PoseAccuracy {
        pck_at,
        mpjpe,
        normalization,
        n_keypoints,
        n_frames: frames.len(),
    }
}
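
Reviewer sketch, not part of the diff: `accuracy_report` micro-averages PCK over joints rather than averaging per-frame PCK values. The two only differ when frames have different visible-joint counts, which the batch test in this file does not exercise (both frames there have two visible joints). A minimal std-only illustration on per-frame `(correct, visible)` counts follows; `micro_pck` and `macro_pck` are hypothetical names.

```rust
/// Micro-average: total correct joints over total visible joints.
/// Frames that contribute `(0, 0)` simply add nothing to either sum.
fn micro_pck(frames: &[(usize, usize)]) -> f32 {
    let (c, t) = frames
        .iter()
        .fold((0usize, 0usize), |(c, t), &(fc, ft)| (c + fc, t + ft));
    if t > 0 { c as f32 / t as f32 } else { 0.0 }
}

/// Macro-average (shown for contrast only): mean of per-frame PCK over
/// the frames that have at least one visible joint.
fn macro_pck(frames: &[(usize, usize)]) -> f32 {
    let scored: Vec<f32> = frames
        .iter()
        .filter(|&&(_, t)| t > 0)
        .map(|&(c, t)| c as f32 / t as f32)
        .collect();
    if scored.is_empty() {
        0.0
    } else {
        scored.iter().sum::<f32>() / scored.len() as f32
    }
}
```

With one frame scoring 1 of 1 and another scoring 0 of 3, the micro-average is 1/4 while the macro-average is 1/2, so the aggregation rule is itself part of the metric definition.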

#[cfg(test)]
mod tests {
    use super::*;

    /// Build a 17-joint `[17, 2]` pose from `(joint, x, y)` triples.
    fn pose17(joints: &[(usize, f32, f32)]) -> Array2<f32> {
        let mut a = Array2::<f32>::zeros((17, 2));
        for &(j, x, y) in joints {
            a[[j, 0]] = x;
            a[[j, 1]] = y;
        }
        a
    }

    fn vis17(visible: &[usize]) -> Array1<f32> {
        let mut v = Array1::<f32>::zeros(17);
        for &j in visible {
            v[j] = 2.0;
        }
        v
    }

    // -------- consts pinned (no silent metric drift) --------
    #[test]
    fn accuracy_consts_unchanged() {
        assert_eq!(VISIBILITY_THRESHOLD, 0.5_f32);
        assert_eq!(MIN_REFERENCE_EXTENT, 1e-6_f32);
    }

    // -------- perfect prediction ⇒ PCK = 1.0, MPJPE = 0 --------
    #[test]
    fn perfect_prediction_pck_one_mpjpe_zero() {
        let gt = pose17(&[
            (5, 0.35, 0.35),
            (CANON_LEFT_HIP, 0.40, 0.50),
            (CANON_RIGHT_HIP, 0.60, 0.50),
        ]);
        let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
        for norm in [
            PckNormalization::TorsoDiameter,
            PckNormalization::BoundingBoxDiagonal,
            PckNormalization::AbsolutePixels(0.01),
        ] {
            let (c, t, pck) = pck_at(&gt, &gt, &vis, 20, norm);
            assert_eq!((c, t), (3, 3), "{norm:?}");
            assert!((pck - 1.0).abs() < 1e-6, "{norm:?} perfect PCK must be 1.0");
        }
        assert_eq!(mpjpe(&gt, &gt, &vis), 0.0);
    }

    // -------- all keypoints just OUTSIDE threshold ⇒ PCK = 0.0 --------
    //
    // Hand calc (torso): hips at (0.40,0.50)/(0.60,0.50) ⇒ torso = 0.20.
    // threshold k=20 ⇒ τ = 0.20·0.20 = 0.04. Push every scored joint to an
    // error of 0.05 (> 0.04) ⇒ all wrong. To avoid the hips themselves being
    // "correct", we displace the hips too (their displaced positions still
    // define the torso from GT, which is unchanged).
    #[test]
    fn all_just_outside_threshold_pck_zero() {
        let gt = pose17(&[
            (5, 0.50, 0.50),
            (CANON_LEFT_HIP, 0.40, 0.50),
            (CANON_RIGHT_HIP, 0.60, 0.50),
        ]);
        // GT torso = 0.20, τ@20 = 0.04. Displace each scored joint by dx=0.05.
        let pred = pose17(&[
            (5, 0.55, 0.50),
            (CANON_LEFT_HIP, 0.45, 0.50),
            (CANON_RIGHT_HIP, 0.65, 0.50),
        ]);
        let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
        let (c, t, pck) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
        assert_eq!(t, 3);
        assert_eq!(c, 0, "all errors 0.05 > τ 0.04 ⇒ none correct");
        assert_eq!(pck, 0.0);
    }

    // -------- half-in / half-out ⇒ PCK = 0.5 --------
    //
    // Hand calc (torso): torso = 0.20, τ@20 = 0.04. Four visible joints; two
    // exact (dist 0 ≤ 0.04, correct), two displaced 0.05 (> 0.04, wrong)
    // ⇒ 2/4 = 0.5.
    #[test]
    fn half_in_half_out_pck_half() {
        let gt = pose17(&[
            (0, 0.50, 0.20),
            (5, 0.50, 0.50),
            (CANON_LEFT_HIP, 0.40, 0.50),
            (CANON_RIGHT_HIP, 0.60, 0.50),
        ]);
        let pred = pose17(&[
            (0, 0.50, 0.20), // exact ⇒ correct
            (5, 0.55, 0.50), // err 0.05 ⇒ wrong
            (CANON_LEFT_HIP, 0.40, 0.50), // exact ⇒ correct
            (CANON_RIGHT_HIP, 0.65, 0.50), // err 0.05 ⇒ wrong
        ]);
        let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
        let (c, t, pck) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
        assert_eq!((c, t), (2, 4));
        assert!((pck - 0.5).abs() < 1e-6, "expected 0.5, got {pck}");
    }

    // -------- THE KEY PROOF: same predictions, three normalizations, three PCK --------
    //
    // One construction scored three ways. Hand calc:
    // GT: nose(0)=(0.50,0.10), l_sh(5)=(0.50,0.30),
    // l_hip(11)=(0.40,0.90), r_hip(12)=(0.60,0.90).
    // Visible = {0,5,11,12}, all four.
    // torso = |0.60-0.40| = 0.20 (hips, y equal).
    // bbox: x∈[0.40,0.60] (w=0.20), y∈[0.10,0.90] (h=0.80)
    // ⇒ diag = sqrt(0.20² + 0.80²) = sqrt(0.04+0.64)=sqrt(0.68)=0.8246…
    //
    // Pred errors (pure dx): nose 0.00, l_sh 0.10, l_hip 0.00, r_hip 0.00.
    // (Only joint 5 is displaced, by 0.10.)
    //
    // k = 20:
    // • Torso τ = 0.20·0.20 = 0.040 → joint5 err 0.10 > 0.040 ⇒ WRONG
    // ⇒ 3 correct / 4 = 0.75
    // • Bbox τ = 0.20·0.8246 = 0.16492 → joint5 err 0.10 ≤ 0.16492 ⇒ CORRECT
    // ⇒ 4 correct / 4 = 1.00
    // • Abs(0.05) τ = 0.05 → joint5 err 0.10 > 0.05 ⇒ WRONG
    // ⇒ 3 correct / 4 = 0.75 (same count as torso HERE by coincidence)
    //
    // To make ALL THREE differ, also test Abs(0.08): τ=0.08, joint5 0.10>0.08
    // ⇒ still 0.75. So we additionally displace nose by 0.06 (between 0.05 and
    // 0.08) to separate the two absolute thresholds — see below.
    #[test]
    fn three_normalizations_give_different_pck_on_identical_input() {
        let gt = pose17(&[
            (0, 0.50, 0.10), // nose
            (5, 0.50, 0.30), // left_shoulder
            (CANON_LEFT_HIP, 0.40, 0.90),
            (CANON_RIGHT_HIP, 0.60, 0.90),
        ]);
        // nose displaced 0.06, shoulder displaced 0.10, hips exact.
        let pred = pose17(&[
            (0, 0.56, 0.10), // err 0.06
            (5, 0.60, 0.30), // err 0.10
            (CANON_LEFT_HIP, 0.40, 0.90), // exact
            (CANON_RIGHT_HIP, 0.60, 0.90), // exact
        ]);
        let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);

        // Torso τ@20 = 0.04: nose 0.06>0.04 wrong, sh 0.10>0.04 wrong,
        // hips exact ⇒ 2/4 = 0.5.
        let (_, _, torso) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
        // Bbox diag = sqrt(0.68)=0.82462; τ@20 = 0.164924:
        // nose 0.06 ≤ τ correct, sh 0.10 ≤ τ correct, hips exact ⇒ 4/4 = 1.0.
        let (_, _, bbox) = pck_at(&pred, &gt, &vis, 20, PckNormalization::BoundingBoxDiagonal);
        // Abs(0.08): nose 0.06 ≤ 0.08 correct, sh 0.10 > 0.08 wrong, hips exact
        // ⇒ 3/4 = 0.75.
        let (_, _, abs) = pck_at(&pred, &gt, &vis, 20, PckNormalization::AbsolutePixels(0.08));

        assert!((torso - 0.5).abs() < 1e-6, "torso PCK expected 0.5, got {torso}");
        assert!((bbox - 1.0).abs() < 1e-6, "bbox PCK expected 1.0, got {bbox}");
        assert!((abs - 0.75).abs() < 1e-6, "abs(0.08) PCK expected 0.75, got {abs}");

        // The whole point: identical predictions, three DISTINCT PCK values.
        assert!(torso != bbox && bbox != abs && torso != abs,
            "normalizations must give distinct PCK: torso={torso}, bbox={bbox}, abs={abs}");
    }

    // -------- AbsolutePixels ignores k (raw threshold) --------
    #[test]
    fn absolute_pixels_ignores_threshold_percentage() {
        let gt = pose17(&[(5, 0.50, 0.50), (CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
        let pred = pose17(&[(5, 0.53, 0.50), (CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
        let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
        // τ = 0.05 raw; joint5 err 0.03 ≤ 0.05 correct. k=5 and k=99 must agree.
        let (_, _, p5) = pck_at(&pred, &gt, &vis, 5, PckNormalization::AbsolutePixels(0.05));
        let (_, _, p99) = pck_at(&pred, &gt, &vis, 99, PckNormalization::AbsolutePixels(0.05));
        assert_eq!(p5, p99, "AbsolutePixels must ignore the k percentage");
        assert!((p5 - 1.0).abs() < 1e-6, "all three within 0.05, got {p5}");
    }

    // -------- MPJPE hand-computed (2D and 3D) --------
    #[test]
    fn mpjpe_hand_computed_2d() {
        // joint0 err (3,4)->5, joint1 exact->0 ⇒ mean (5+0)/2 = 2.5.
        let gt = Array2::from_shape_vec((2, 2), vec![0.0, 0.0, 1.0, 1.0]).unwrap();
        let pred = Array2::from_shape_vec((2, 2), vec![3.0, 4.0, 1.0, 1.0]).unwrap();
        let vis = Array1::from(vec![2.0, 2.0]);
        assert!((mpjpe(&pred, &gt, &vis) - 2.5).abs() < 1e-6);
    }

    #[test]
    fn mpjpe_hand_computed_3d() {
        // single joint err (1,2,2) -> sqrt(1+4+4)=3.0.
        let gt = Array2::from_shape_vec((1, 3), vec![0.0, 0.0, 0.0]).unwrap();
        let pred = Array2::from_shape_vec((1, 3), vec![1.0, 2.0, 2.0]).unwrap();
        let vis = Array1::from(vec![2.0]);
        assert!((mpjpe(&pred, &gt, &vis) - 3.0).abs() < 1e-6);
    }

    #[test]
    fn mpjpe_excludes_invisible_joints() {
        // joint0 visible err 5, joint1 INVISIBLE err 100 ⇒ mean = 5 (joint1 dropped).
        let gt = Array2::from_shape_vec((2, 2), vec![0.0, 0.0, 0.0, 0.0]).unwrap();
        let pred = Array2::from_shape_vec((2, 2), vec![3.0, 4.0, 100.0, 0.0]).unwrap();
        let vis = Array1::from(vec![2.0, 0.0]);
        assert!((mpjpe(&pred, &gt, &vis) - 5.0).abs() < 1e-6);
    }

    // -------- degenerate inputs: no panic --------
    #[test]
    fn zero_torso_is_unscoreable_not_perfect() {
        // Both hips coincident ⇒ torso ≈ 0; bbox also collapses ⇒ None.
        let gt = pose17(&[(CANON_LEFT_HIP, 0.5, 0.5), (CANON_RIGHT_HIP, 0.5, 0.5)]);
        let vis = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]);
        assert_eq!(pck_at(&gt, &gt, &vis, 20, PckNormalization::TorsoDiameter), (0, 0, 0.0));
        assert_eq!(pck_at(&gt, &gt, &vis, 20, PckNormalization::BoundingBoxDiagonal), (0, 0, 0.0));
    }

    #[test]
    fn no_visible_keypoints_scores_zero() {
        let gt = pose17(&[(CANON_LEFT_HIP, 0.4, 0.5), (CANON_RIGHT_HIP, 0.6, 0.5)]);
        let vis = vis17(&[]); // nothing visible
        let (c, t, pck) = pck_at(&gt, &gt, &vis, 20, PckNormalization::TorsoDiameter);
        assert_eq!((c, t, pck), (0, 0, 0.0));
        assert_eq!(mpjpe(&gt, &gt, &vis), 0.0);
    }

    #[test]
    fn nan_coords_do_not_panic_and_count_wrong() {
        let gt = pose17(&[(5, 0.5, 0.5), (CANON_LEFT_HIP, 0.4, 0.5), (CANON_RIGHT_HIP, 0.6, 0.5)]);
        let mut pred = gt.clone();
        pred[[5, 0]] = f32::NAN; // joint 5 prediction is NaN
        let vis = vis17(&[5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
        let (c, t, pck) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
        assert_eq!(t, 3);
        assert_eq!(c, 2, "NaN joint must count as wrong, hips correct ⇒ 2/3");
        assert!((pck - 2.0 / 3.0).abs() < 1e-6);
        // mpjpe with a NaN joint yields NaN (caller filters) but must not panic.
        assert!(mpjpe(&pred, &gt, &vis).is_nan());
    }

    // -------- batch report: micro-average + self-describing struct --------
    #[test]
    fn accuracy_report_micro_averages_and_carries_definition() {
        // Frame A: 2 visible, both correct (2/2). Frame B: 2 visible, both wrong (0/2).
        // Micro-average over joints: 2 correct / 4 = 0.5 (NOT mean-of-frame-PCK,
        // which would be (1.0+0.0)/2 = 0.5 here too, but the accumulator is the
        // joint-level one).
        let gt = pose17(&[(CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
        let vis = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]);
        let frame_a = PoseFrame { pred: gt.clone(), gt: gt.clone(), visibility: vis.clone() };
        // Frame B: displace both hips by 0.05 (> τ 0.04) ⇒ both wrong.
        let pred_b = pose17(&[(CANON_LEFT_HIP, 0.45, 0.50), (CANON_RIGHT_HIP, 0.65, 0.50)]);
        let frame_b = PoseFrame { pred: pred_b, gt: gt.clone(), visibility: vis.clone() };

        let report = accuracy_report(
            &[frame_a, frame_b],
            &[20, 50],
            PckNormalization::TorsoDiameter,
        );
        assert_eq!(report.n_frames, 2);
        assert_eq!(report.n_keypoints, 17);
        assert_eq!(report.normalization, PckNormalization::TorsoDiameter);
        // PCK@20: 2 correct / 4 visible = 0.5.
        assert!((report.pck(20).unwrap() - 0.5).abs() < 1e-6);
        // PCK@50: τ = 0.5·0.20 = 0.10, frame B err 0.05 ≤ 0.10 ⇒ all correct
        // ⇒ 4/4 = 1.0.
        assert!((report.pck(50).unwrap() - 1.0).abs() < 1e-6);
        // A reported number always carries its definition in the summary.
        assert!(report.summary().contains("torso-diameter"));
    }

    #[test]
    fn accuracy_report_empty_is_zero_not_nan() {
        let report = accuracy_report(&[], &[20], PckNormalization::BoundingBoxDiagonal);
        assert_eq!(report.n_frames, 0);
        assert_eq!(report.pck(20), Some(0.0));
        assert_eq!(report.mpjpe, 0.0);
        assert!(!report.mpjpe.is_nan());
    }

    // -------- bbox-norm is looser than torso-norm (sanity, on a batch) --------
    #[test]
    fn bbox_norm_scores_at_least_torso_norm() {
        // bbox diagonal >= torso span always (bbox encloses the hips), so for the
        // SAME frames bbox-PCK >= torso-PCK at the same k. Pin this ordering.
        let gt = pose17(&[
            (0, 0.50, 0.10),
            (5, 0.50, 0.40),
            (CANON_LEFT_HIP, 0.40, 0.90),
            (CANON_RIGHT_HIP, 0.60, 0.90),
        ]);
        let pred = pose17(&[
            (0, 0.55, 0.10),
            (5, 0.58, 0.40),
            (CANON_LEFT_HIP, 0.42, 0.90),
            (CANON_RIGHT_HIP, 0.62, 0.90),
        ]);
        let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);
        let frame = PoseFrame { pred, gt, visibility: vis };
        let torso = accuracy_report(std::slice::from_ref(&frame), &[20], PckNormalization::TorsoDiameter);
        let bbox = accuracy_report(std::slice::from_ref(&frame), &[20], PckNormalization::BoundingBoxDiagonal);
        assert!(
            bbox.pck(20).unwrap() >= torso.pck(20).unwrap(),
            "bbox-norm (looser) must be >= torso-norm: bbox={:?} torso={:?}",
            bbox.pck(20), torso.pck(20)
        );
    }
}

@@ -43,6 +43,11 @@
// All *this* crate's code is written without unsafe blocks.
#![warn(missing_docs)]

/// Metric-locked pose-accuracy harness (ADR-155 §Tier-1.2; needs ADR slot 173)
/// — selectable `PckNormalization` (torso / bbox-diagonal / absolute), `mpjpe`,
/// and a self-describing `PoseAccuracy` result so a reported PCK number always
/// carries the definition it was computed under.
pub mod accuracy;
pub mod config;
pub mod dataset;
pub mod domain;

@@ -89,6 +94,11 @@ pub use metrics_core::{
    canonical_torso_size, oks_canonical, pck_canonical, CANON_LEFT_HIP, CANON_RIGHT_HIP,
    COCO_KP_SIGMAS,
};
// ADR-155 §Tier-1.2 — metric-locked accuracy harness (selectable PCK
// normalization + MPJPE + self-describing result).
pub use accuracy::{
    accuracy_report, mpjpe as pck_mpjpe, pck_at, PckNormalization, PoseAccuracy, PoseFrame,
};
pub use config::TrainingConfig;
pub use dataset::{
    CsiDataset, CsiSample, DataLoader, MmFiDataset, SyntheticConfig, SyntheticCsiDataset,

@@ -29,6 +29,66 @@

use ndarray::{Array1, Array2};
use wifi_densepose_train::{oks_canonical, pck_canonical, CANON_LEFT_HIP, CANON_RIGHT_HIP};
// ADR-155 §Tier-1.2 — metric-locked accuracy harness public surface.
use wifi_densepose_train::{accuracy_report, pck_at, PckNormalization, PoseFrame};

// ---------------------------------------------------------------------------
// Metric-locked accuracy harness: the three PCK normalizations are reachable
// from the crate root and give DIFFERENT PCK on identical predictions — the
// proof that the 96 / 81.6 / 61 figures were non-comparable (validated here as
// a downstream consumer would call it).
// ---------------------------------------------------------------------------

/// Identical predictions, three declared normalizations ⇒ three distinct PCK.
/// Hand calc (all coords in `[0,1]`):
/// * GT: nose(0)=(0.50,0.10), l_sh(5)=(0.50,0.30), hips=(0.40,0.90)/(0.60,0.90).
/// * Pred: nose err 0.06, shoulder err 0.10, hips exact.
/// * torso = 0.20 ⇒ τ@20 = 0.04 ⇒ only hips correct ⇒ 2/4 = **0.50**.
/// * bbox = √(0.20²+0.80²)=0.82462 ⇒ τ@20 = 0.16492 ⇒ all correct ⇒ **1.00**.
/// * abs(0.08): nose 0.06≤0.08 ok, shoulder 0.10>0.08 wrong ⇒ 3/4 = **0.75**.
#[test]
fn harness_three_normalizations_differ_from_crate_root() {
    let gt = pose17(&[
        (0, 0.50, 0.10),
        (5, 0.50, 0.30),
        (CANON_LEFT_HIP, 0.40, 0.90),
        (CANON_RIGHT_HIP, 0.60, 0.90),
    ]);
    let pred = pose17(&[
        (0, 0.56, 0.10),
        (5, 0.60, 0.30),
        (CANON_LEFT_HIP, 0.40, 0.90),
        (CANON_RIGHT_HIP, 0.60, 0.90),
    ]);
    let vis = vis17(&[0, 5, CANON_LEFT_HIP, CANON_RIGHT_HIP]);

    let (_, _, torso) = pck_at(&pred, &gt, &vis, 20, PckNormalization::TorsoDiameter);
    let (_, _, bbox) = pck_at(&pred, &gt, &vis, 20, PckNormalization::BoundingBoxDiagonal);
    let (_, _, abs) = pck_at(&pred, &gt, &vis, 20, PckNormalization::AbsolutePixels(0.08));

    assert!((torso - 0.50).abs() < 1e-6, "torso PCK 0.50, got {torso}");
    assert!((bbox - 1.00).abs() < 1e-6, "bbox PCK 1.00, got {bbox}");
    assert!((abs - 0.75).abs() < 1e-6, "abs(0.08) PCK 0.75, got {abs}");
    assert!(
        torso != bbox && bbox != abs && torso != abs,
        "three normalizations must be distinct: {torso} / {bbox} / {abs}"
    );
}
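
Reviewer sketch, not part of the diff: the hand calculation in the doc comment above can be checked without `ndarray` or the crate. The helpers `bbox_diagonal` and `fraction_within` are hypothetical; the joint errors and the torso width are taken directly from the comment, and the `k`/100 scaling of the relative tolerances is the same assumption the comment makes (τ@20 = 0.20 × reference).

```rust
/// Diagonal of the axis-aligned bounding box around `[x, y]` points
/// (0.0 for an empty slice).
fn bbox_diagonal(points: &[[f32; 2]]) -> f32 {
    if points.is_empty() {
        return 0.0;
    }
    let (mut min_x, mut max_x) = (f32::INFINITY, f32::NEG_INFINITY);
    let (mut min_y, mut max_y) = (f32::INFINITY, f32::NEG_INFINITY);
    for p in points {
        min_x = min_x.min(p[0]);
        max_x = max_x.max(p[0]);
        min_y = min_y.min(p[1]);
        max_y = max_y.max(p[1]);
    }
    ((max_x - min_x).powi(2) + (max_y - min_y).powi(2)).sqrt()
}

/// Fraction of per-joint errors that fall within the tolerance `tol`.
fn fraction_within(errors: &[f32], tol: f32) -> f32 {
    if errors.is_empty() {
        return 0.0;
    }
    let hits = errors.iter().filter(|&&e| e <= tol).count();
    hits as f32 / errors.len() as f32
}
```

Feeding the four ground-truth points and the errors `[0.06, 0.10, 0.0, 0.0]` through these helpers with tolerances `0.20 * 0.20`, `0.20 * diag`, and `0.08` gives the three PCK values the test asserts.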

/// `accuracy_report` returns a self-describing result carrying its normalization,
/// so an unlabeled PCK number is structurally impossible at the API boundary.
#[test]
fn harness_report_carries_normalization_label() {
    let gt = pose17(&[(CANON_LEFT_HIP, 0.40, 0.50), (CANON_RIGHT_HIP, 0.60, 0.50)]);
    let vis = vis17(&[CANON_LEFT_HIP, CANON_RIGHT_HIP]);
    let frame = PoseFrame { pred: gt.clone(), gt: gt.clone(), visibility: vis };
    let report = accuracy_report(&[frame], &[20], PckNormalization::BoundingBoxDiagonal);
    assert_eq!(report.normalization, PckNormalization::BoundingBoxDiagonal);
    assert_eq!(report.n_keypoints, 17);
    assert_eq!(report.n_frames, 1);
    assert!((report.pck(20).unwrap() - 1.0).abs() < 1e-6);
    assert!(report.summary().contains("bbox-diagonal"));
}

// ---------------------------------------------------------------------------
// Tests that use `EvalMetrics` (requires tch-backend because the metrics