* feat(train): metric-locked PCK/MPJPE accuracy harness — resolve PCK-definition ambiguity
The SOTA brief (docs/research/sota-nn-train-benchmark-brief.md §1/§3.1/§4)
identifies metric ambiguity as the single biggest threat to any beyond-SOTA
claim: three PCK@20 numbers (96.09% WiFlow-STD image-normalized, 81.63%
AetherArena torso-PCK, 61.1% GraphPose-Fi standard PCK) cannot be lined up
because each silently uses a different normalization. The project was retracted
twice over this (a withdrawn 92.9% used absolute pixels, not torso).
New src/accuracy.rs makes the normalizer explicit, selectable, and carried with
every reported number:
- PckNormalization enum: TorsoDiameter (standard MM-Fi/GraphPose-Fi hip↔hip),
BoundingBoxDiagonal (looser WiFlow-STD image-normalized), AbsolutePixels(t)
(retracted convention, reproducible + clearly non-comparable).
- pck_at(pred, gt, vis, k, normalization) — one canonical PCK reusing the
metrics_core geometric primitives (no duplicate kernel).
- mpjpe(pred, gt, vis) — 2D/3D, mm.
- PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints,
n_frames } via accuracy_report(frames, ks, normalization) — an unlabeled PCK
number is structurally impossible.
17 hand-computed deterministic tests (no GPU, no datasets) prove the harness
arithmetic, including the key proof that identical predictions score
0.50 / 1.00 / 0.75 under the three normalizations, plus graceful degenerate
handling (zero torso, empty frames, NaN coords — no panic, never false-perfect).
This is measurement infrastructure, NOT an accuracy claim. Public API worth an
ADR — needs ADR slot 173 (parent to write).
wifi-densepose-train lib 191→206, test_metrics 12→14, 0 failed; full workspace
green (exit 0); Python deterministic proof unchanged
(f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a).
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(adr): ADR-173 — metric-locked PCK/MPJPE accuracy harness
Documents the accuracy harness (committed 3a8b2ed13) that resolves the
PCK-definition ambiguity flagged as the #1 beyond-SOTA risk in the SOTA brief
(#1090): three historical numbers (96/81.6/61) used three unstated
normalizations. The harness makes normalization explicit + selectable
(PckNormalization enum) and every reported number carries its definition.
Key proof: identical predictions → 0.50/1.00/0.75 under torso/bbox/abs.
Co-Authored-By: claude-flow <ruv@ruv.net>
* refactor(train): hoist canonical PCK/OKS to un-gated metrics_core; fold test_metrics onto production (ADR-155 M1 §8)
ADR-155 §8 deferred item: test_metrics.rs reference kernels validated
production against their OWN reimplementation — a test that cannot catch a
canonical-impl bug (both could be wrong the same way).
- Extract canonical_torso_size / pck_canonical / oks_canonical / sigmas /
bounding_box_diagonal into a new NON-tch-gated `metrics_core` module, so
the single metric definition is reachable under
`cargo test --no-default-features` (the `metrics` module is tch-gated).
`metrics` re-exports every item → still exactly ONE implementation.
- Rewrite tests/test_metrics.rs to assert the PRODUCTION pck_canonical /
oks_canonical equal hand-computed fixtures (not a reimplementation):
canonical_pck_matches_hand_computed_fixture (corr=3/total=4/pck=0.75),
hip↔hip normalizer pin, zero-visible⇒0.0, OKS perfect⇒1.0, fake-Gold pin.
- Keep an INDEPENDENT raw-threshold reference kernel only as a differential
cross-check: test_kernel_agrees_with_canonical asserts it AGREES with
canonical where torso==1.0 (genuine cross-check, not duplication).
Grade: MEASURED. test_metrics 10→12 tests, 0 failed.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(sensing-server): relabel divergent live PCK/OKS so they're never conflated with canonical (ADR-155 M1 §2.1/§8 Goal C)
Goal C named training_api.rs:804 (torso-HEIGHT PCK). Auditing it surfaced
TWO findings the ADR-155 §1 table missed:
1. training_api.rs is an ORPHAN file — not declared `mod` in lib.rs OR main.rs,
so it does NOT compile into the crate. It does not drive the live server.
2. The REAL live `best_pck`/`best_oks` (main.rs training path → RVF metadata
JSON read by model_manager.rs) come from trainer.rs:
- `pck_at_threshold` = RAW-threshold PCK, NO torso normalization (the most
divergent kind), printed/serialized as bare "PCK@0.2".
- `oks_map` calls `oks_single(area=1.0)` = the EXACT fake-Gold pattern
ADR-155 §2.1 claimed closed elsewhere — still live here, inflating best_oks.
Resolution = RELABEL (torso/raw math is load-bearing on different data; the
pub fns can't be renamed without breaking API; sensing-server has no train/
ndarray dep). Honest unify is a tracked §8 backlog item.
- training_api.rs: `compute_pck` → `compute_pck_torso_height` + divergence doc;
val_pck/best_pck/val_oks struct fields documented as torso-HEIGHT proxies;
logs say `pck_torso_h@0.2`. Test torso_pck_is_labelled_distinctly_from_canonical.
- trainer.rs (LIVE): `pck_at_threshold` documented raw-unnormalized; `oks_map`
area=1.0 flagged fake-Gold; test pck_at_threshold_is_raw_unnormalized_not_canonical.
- main.rs: live print relabelled `pck_raw@0.2` / `oks_map(area=1.0 proxy)`.
No wire-format field renames (back-compat); no pub-API rename (no silent break).
Grade: MEASURED (relabel + divergence pinned). sensing-server 450→451 lib tests, 0 failed.
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(adr-155): mark §8 metric items RESOLVED + audit map + honest §1 under-count correction (M1b Goals A/D)
- §8.1: full PCK/OKS audit map (every def: file:line, basis, canonical/
legacy/distinct), the two §8 items marked RESOLVED with resolution+why.
- Honest finding: §1's "seven divergent metrics" was an UNDER-count —
sensing-server's LIVE trainer.rs has a raw-unnormalized PCK and an
area=1.0 fake-Gold OKS the table omitted, and the file §8 named
(training_api.rs) is orphaned dead code. §9 honest-limits updated.
- Goal D: metrics.rs *_v2 variants confirmed caller-less + deprecated;
noted for future cleanup, NOT deleted (public API, tch-gated).
- CHANGELOG [Unreleased] Fixed entry.
Co-Authored-By: claude-flow <ruv@ruv.net>
* feat(ruvector): RaBitQ Pass-2 randomized rotation + topk bugfix (ADR-156 §8)
Implements the deferred "Multi-bit / Extended RaBitQ Pass 2" backlog item
from ADR-156 §8: a deterministic randomized orthogonal rotation applied
before sign-quantization, the published RaBitQ construction (Gao & Long,
SIGMOD 2024).
Rotation construction: Fast Hadamard Transform + seeded ±1 sign flips
("HD" / randomized Hadamard), O(d log d) time and O(d) memory — a dense
d×d rotation is O(d²) and infeasible at the 65,535-d the wire format
provisions for. Pads to the next power of two; SplitMix64 seeds the sign
stream so index-time and query-time rotations are bit-identical.
API is additive and backward-compatible: Pass 1 (`from_embedding`) is
untouched; Pass 2 is opt-in via `Sketch::from_embedding_rotated` and
`SketchBank::with_rotation` (+ `insert_embedding` / `topk_embedding` /
`novelty_embedding` helpers that rotate consistently). Default behaviour
is unchanged.
While building the Pass-2 coverage harness, found and fixed a PRE-EXISTING
correctness bug in `SketchBank::topk`: the n>k heap path used
`BinaryHeap<Reverse<(d,id)>>` (a min-heap) but treated its peek as the
max, so it returned the k FARTHEST sketches as "nearest". The shipped unit
tests only exercised the n≤k fast path, so it went unnoticed. Fixed to a
plain max-heap; pinned by `topk_heap_path_returns_nearest` and
`tight_clusters_give_high_coverage_with_overfetch` (the latter measured
0.072 on the old code).
New tests (+17, 100→117 in the crate): rotation determinism/norm-preservation
(`rotation_is_deterministic_for_seed`, `rotation_preserves_norm`), Pass-2
shape-compatibility, `pass2_coverage_not_worse_than_pass1`, and a
deterministic coverage report.
MEASURED top-K coverage (anisotropic planted-cluster fixture, cosine ground
truth; dim=128 N=2048 K=8 64 clusters noise=0.35 128 queries):
candidate_k=K=8 : Pass1 36.13% -> Pass2 46.39% (both << 90% bar)
candidate_k=24 : Pass1 83.89% -> Pass2 91.60% (Pass2 clears 90%)
candidate_k=32 : Pass1/Pass2 100%
Honest result: rotation consistently helps (+10pp at strict K), but neither
pass clears the ADR-084 90% bar at candidate_k==K on this distribution.
Pass 2 reaches 90% only with ~3x over-fetch (the ADR-084 "candidate set"
deployment pattern). Multi-bit Pass 3 evaluated separately.
Co-Authored-By: claude-flow <ruv@ruv.net>
* feat(ruvector): multi-bit Pass-3 experiment + ADR-156/084 measured results
Adds the multi-bit half of the ADR-156 §8 "Multi-bit / Extended RaBitQ"
item as a MEASURED experiment (coverage::measure_multibit): rotate, then
b-bit uniform scalar-quantize each coord, rank by L1 over codes — the
natural multi-bit generalization of hamming. Measures the bit/coverage
tradeoff the backlog item asked for.
MEASURED at the strict bar (candidate_k=K=8, anisotropic planted-cluster
fixture, cosine ground truth):
Pass1 (1-bit, no rot) 36.13% 16 B/vec
Pass2 (1-bit, rot) 46.39% 16 B/vec
Pass3 (rot, 2-bit) 54.39% 32 B/vec
Pass3 (rot, 3-bit) 66.70% 48 B/vec
Pass3 (rot, 4-bit) 74.22% 64 B/vec
Honest: multi-bit monotonically helps but even 4-bit (4x memory) reaches
only 74% at the strict bar — neither rotation nor <=4-bit multi-bit clears
the strict-K 90% bar on this distribution. The bar is met via over-fetch
(Pass2 @ candidate_k=24). Tests: multibit_tradeoff_report,
multibit_1bit_matches_pass2_approx (+ sanity that 1-bit ~= Pass-2).
Docs:
- ADR-156 §8 item #2 marked RESOLVED-PARTIAL; §5 #2 grade CLAIMED ->
MEASURED-on-our-hardware; new §10 with full measured tables, the topk
bugfix disclosure, and graded deferred sub-items.
- ADR-084: "Pass 2" section answering the rotation open-question with
measured numbers + the topk bug note.
- CHANGELOG [Unreleased]: Added (Pass-2 milestone) + Fixed (topk heap).
Co-Authored-By: claude-flow <ruv@ruv.net>
Closes the remaining doable items from the 2026-05-11 training-pipeline audit:
#6 (CSI format default = 56-sc / 1 NIC) + #7 (multi-band 168-sc mesh not in
config): new `TrainingConfig::for_subcarriers(native, target)` plus named
presets `mmfi()` (114→56), `ht40_192()` (≈192-sc ESP32 HT40 → 56) and
`multiband_168()` (168-sc ADR-078 multi-band mesh → 56). Non-MM-Fi CSI shapes
are now first-class instead of requiring manual `native_subcarriers` /
`num_subcarriers` overrides; the field docs list the supported source counts
and the multi-NIC mapping (a 2–3-node mesh currently rides on `n_rx` until a
dedicated node dimension lands). Model input width stays `num_subcarriers`; the
presets only vary the resampling input.
#4 (proof.rs uses synthetic data): reframed — a deterministic proof *must* use
a reproducible source, so `verify-training` correctly stays on
`SyntheticCsiDataset`. The real gap was that nothing exercised the on-disk
`MmFiDataset` path. New `tests/test_real_loader.rs` writes synthetic CSI to
`.npy` files in the `MmFiDataset::discover` layout, loads it back, and checks
the resulting `CsiSample` — covering the no-interp case, the
subcarrier-interpolation branch, and the empty-root case. Adds `ndarray` /
`ndarray-npy` as dev-deps for the fixture writing.
cargo check + cargo test -p wifi-densepose-train --no-default-features: clean,
all existing tests green, 3 new loader tests + the updated config doctest pass.
Purely additive — no model-shape change, no tch-module change.
Addresses three findings from the 2026-05-11 training-pipeline audit:
#1/#2 — `wifi-densepose-signal` was a phantom dependency of `wifi-densepose-train`
(listed in Cargo.toml, never imported), and vitals/CSI signal features were
absent from the pipeline. New module `wifi_densepose_train::signal_features`:
`extract_signal_features(&Array4<f32>, &Array4<f32>) -> Array1<f32>` (and the
convenience method `CsiSample::signal_features()`) runs a windowed observation's
centre frame through `wifi_densepose_signal::features::FeatureExtractor`,
producing a fixed-length (FEATURE_LEN=12) amplitude / phase-coherence / PSD
feature vector — the hook for a future vitals / multi-task supervision head
(breathing- and heart-rate-band power are read off the PSD summary). The vector
is produced on demand and is not yet fed back into the loss; wiring it as a
training target is the documented follow-up. `wifi-densepose-signal` is now an
actually-used dependency. 5 new tests (2 unit in signal_features.rs, 3
integration in tests/test_dataset.rs); existing wifi-densepose-train tests
unchanged and green.
#3 — `docs/huggingface/MODEL_CARD.md` presented PIR/BME280 environmental-sensor
weak-label fine-tuning as a current capability; there is no env-sensor
ingestion in the training pipeline. Marked that path as planned/not-implemented
in the training-steps list and the data-provenance section.
(#5 — README's "92.9% PCK@20" overclaim — fixed separately in PR #535.)
CHANGELOG updated.
The Rust port lived two directories deep (rust-port/wifi-densepose-rs/)
without any sibling under rust-port/ that warranted the extra level.
Move the whole workspace up to v2/ to match v1/ (Python) at the same
depth and shorten every cd / build command across the repo.
git mv preserves history for all tracked files. 60 files updated for
path references (CI workflows, ADRs, docs, scripts, READMEs, internal
.claude-flow state). Two manual fixes for relative-cd paths in
CLAUDE.md and ADR-043 that became wrong after the depth change
(cd ../.. → cd ..).
Validated:
- cargo check --workspace --no-default-features → clean (after target/
nuke; the gitignored target/ was carried by the OS rename and had
hard-coded old paths in build scripts)
- cargo test --workspace --no-default-features → 1,539 passed, 0 failed,
8 ignored (same totals as pre-rename)
- ESP32-S3 on COM7 → still streaming live CSI (cb #40300, RSSI -64 dBm)
After-merge follow-up: contributors should `rm -rf v2/target` once and
let cargo regenerate from the new path.