From 80c14a73bd168e9ab78545ca572d993050434873 Mon Sep 17 00:00:00 2001 From: ruvnet Date: Fri, 12 Jun 2026 02:42:07 +0000 Subject: [PATCH] deploy: 427c56881bac4b995b959b678b0a0789ed7bf95f --- .../adr/ADR-154-signal-dsp-beyond-sota.md | 234 ++++++++++++++++++ .../adr/ADR-155-nn-training-beyond-sota.md | 202 +++++++++++++++ .../ADR-156-ruvector-fusion-beyond-sota.md | 153 ++++++++++++ .../ADR-157-hardware-sensing-beyond-sota.md | 191 ++++++++++++++ .../adr/ADR-158-mat-worldmodel-beyond-sota.md | 212 ++++++++++++++++ api-docs/research/soul/specification.md | 17 ++ 6 files changed, 1009 insertions(+) create mode 100644 api-docs/adr/ADR-154-signal-dsp-beyond-sota.md create mode 100644 api-docs/adr/ADR-155-nn-training-beyond-sota.md create mode 100644 api-docs/adr/ADR-156-ruvector-fusion-beyond-sota.md create mode 100644 api-docs/adr/ADR-157-hardware-sensing-beyond-sota.md create mode 100644 api-docs/adr/ADR-158-mat-worldmodel-beyond-sota.md diff --git a/api-docs/adr/ADR-154-signal-dsp-beyond-sota.md b/api-docs/adr/ADR-154-signal-dsp-beyond-sota.md new file mode 100644 index 00000000..240fedb3 --- /dev/null +++ b/api-docs/adr/ADR-154-signal-dsp-beyond-sota.md @@ -0,0 +1,234 @@ +# ADR-154: Signal/DSP Beyond-SOTA Sweep — Milestone 0 (Correctness, Provable Perf, and the SOTA Landscape) + +| Field | Value | +|-------|-------| +| **Status** | Proposed | +| **Date** | 2026-06-11 | +| **Deciders** | ruv | +| **Codebase target** | `wifi-densepose-signal` (`ruvsense/`, `features.rs`, `csi_processor.rs`, `spectrogram.rs`, `bvp.rs`), benches, docs | +| **Relates to** | ADR-134 (CIR sparse recovery), ADR-135 (Empty-Room Baseline), ADR-029/030/032 (Multistatic mesh + security), ADR-152 (WiFi-Pose SOTA 2026 intake), ADR-153 (802.11bf forward-compat) | +| **Scope** | Milestone 0 of the beyond-SOTA signal/DSP sweep: high-leverage **correctness/security fixes**, two **measured** perf wins, the per-module SOTA landscape with evidence grades, and a prioritized roadmap. **45 review findings are explicitly deferred** (§7 backlog) — nothing is silently dropped. | + +--- + +## 0. PROOF discipline (this ADR's contract) + +This project has been publicly accused of "AI slop." This ADR answers that with **evidence, not adjectives**: + +- Every claimed code improvement ships with a **committed regression test** (correctness) or a **committed criterion bench** (performance). +- Every perf number below is **MEASURED before/after** with the exact reproduce command. A perf claim without a measured before/after is **UNPROVEN** and is not made here. +- Every external SOTA reference is graded **MEASURED** / **CLAIMED** / **THEORETICAL**, distinguishing what a paper *measured* from what it *asserts* and from what is merely *plausible*. +- The headline finding — a **dead CIR coherence gate that silently fell back in production for every canonical frame** — is disclosed in full (§2), not buried. + +Test machine for the perf numbers: Windows 11, `cargo bench --release`, criterion 0.5. Numbers are wall-clock medians on this box; they are about **ratios** (before/after), which are stable across machines, not absolute ns. + +--- + +## 1. Context + +The RuvSense signal stack (16 `ruvsense/` modules + the classic `features.rs`/`csi_processor.rs`/`spectrogram.rs`/`bvp.rs` pipeline) grew quickly across ADR-014/029/030/134/135. A beyond-SOTA review surfaced ~50 findings ranging from two **critical correctness/security defects** to micro-optimizations and SOTA-gap research items. Milestone 0 closes the **provable, high-leverage subset**: the two criticals, a divide-by-zero trio, two measured perf wins, and the research landscape. The remaining ~45 are catalogued in §7 so the backlog is explicit and auditable. + +--- + +## 2. The headline finding — the ADR-134 CIR coherence gate was DEAD in production (CRITICAL, FIXED) + +### 2.1 What was wrong + +`MultistaticFuser` fuses **canonical CSI frames**: `hardware_norm.rs` resamples every chipset onto a uniform **56-tone canonical grid** before fusion (`HardwareNormalizer`, default `canonical_subcarriers = 56`). The ADR-134 CIR coherence gate (`cir_gate_coherence`, multistatic.rs) is supposed to blend a CIR dominant-tap ratio into the cross-node coherence — `coherence = 0.7·freq + 0.3·dominant_tap_ratio`. + +But the gate was wired to `CirEstimator::new(CirConfig::ht20())` (`with_cir_ht20`), and `ht20()` expects **64 FFT bins or 52 active tones**. A canonical-56 frame matches *neither*, so every call returned `CirError::SubcarrierMismatch` and `cir_gate_coherence` hit its **silent `Err(_) => freq_coherence` fallback** (multistatic.rs). Net effect: **the CIR gate never ran on a single production frame** — `use_cir_gate = true` was indistinguishable from `false`. This is the exact shape of "AI slop": a feature that compiles, has tests on the *estimator*, and is dead at the *integration seam*. + +### 2.2 The fix (the gate now actually runs) + +- New `CirConfig::canonical56()` (cir.rs): 64-bin HT20 framing, **56 active tones**, 168 delay taps, Φ built over a contiguous −28..+28 active-tone grid (also the native Atheros-56 layout). `bandwidth_hz`/`tap_spacing` stay physically correct for a 20 MHz HT20 channel; only the active-tone count differs from `ht20()`. +- New `MultistaticFuser::with_cir_canonical56()` — the **correct default** for the RuvSense pipeline. `with_cir_ht20()` is retained for genuine raw-64/52 feeds and now carries a loud doc-warning. +- `active_indices()` handles `(64, 56)` explicitly and the fallback now selects the slice whose length matches `num_active` (so Φ's column count is always self-consistent — no silent fall-through to the 52-index slice). +- The remaining silent fallback is made **LOUD**: a `SubcarrierMismatch` inside `cir_gate_coherence` now fires a `debug_assert!` naming the misconfiguration ("CIR gate DEAD … build it with `CirConfig::canonical56()`"). A *config* error can no longer hide as a graceful runtime degrade. +- `cir_estimate_first()` exposes the raw `estimate()` verdict so a test can **count Ok vs Err** on a canonical-56 stream. + +### 2.3 The PROOF (committed regression tests, `ruvsense::multistatic::tests`) + +| Test | Asserts | Result | +|------|---------|--------| +| `cir_gate_ht20_is_dead_on_canonical56` | old ht20 estimator on 8 canonical-56 frames → **0 Ok, 8 `SubcarrierMismatch`** | the dead gate, measured | +| `cir_gate_canonical56_is_alive` | new canonical56 estimator on the same 8 frames → **8 Ok, 0 Err** | the gate runs | +| `cir_gate_on_changes_coherence_vs_off` | `coherence(gate on)` ≠ `coherence(gate off)` (\|Δ\| > 1e-6) | the CIR term is actually applied | +| `cir_gate_dead_ht20_equals_gate_off` (release-only) | dead-ht20 coherence == gate-off coherence (\|Δ\| < 1e-9) | confirms the silent degradation the fix removes | + +**Reproduce:** +```bash +cd v2 && cargo test -p wifi-densepose-signal --no-default-features --lib \ + ruvsense::multistatic::tests::cir +# 3 passed (the 4th is #[cfg(not(debug_assertions))], add --release to run it) +``` + +**Resolution: FIXED** (not merely loud-fail-documented). The gate now decodes 100% of canonical-56 frames where it previously decoded 0%. + +--- + +## 3. The second critical — NaN/inf adversarial-detector bypass (CRITICAL, FIXED) + +### 3.1 What was wrong + +`AdversarialDetector::check` (adversarial.rs) takes per-link `link_energies: &[f64]`. A single **NaN/inf** entry bypassed the whole detector: every `e > threshold` test is `false` on NaN, the Gini sort used `partial_cmp().unwrap_or(Equal)`, and the final `anomaly_score.clamp(0,1)` returns NaN on a NaN input. A real RF link can never have NaN/inf energy, so a non-finite input is *itself* the strongest possible spoof — yet it could slip through as "clean." + +### 3.2 The fix + +Finite-validate at the boundary: the first non-finite `link_energies` entry now **short-circuits to a definite anomaly** (`anomaly_detected = true`, `anomaly_score = 1.0`, `affected_links = [bad_idx]`, `FieldModelViolation`), and the poisoned frame is **not** seeded into the temporal-continuity state. + +### 3.3 The PROOF + +| Test | Asserts | +|------|---------| +| `nan_link_energy_flags_anomaly` | a NaN link energy → `anomaly_detected`, score 1.0, affected link reported, `anomaly_count == 1` | +| `inf_link_energy_flags_anomaly` | both `+inf` and `−inf` → anomaly, score 1.0 | + +```bash +cd v2 && cargo test -p wifi-densepose-signal --no-default-features --lib \ + ruvsense::adversarial::tests::nan_link ruvsense::adversarial::tests::inf_link +``` + +--- + +## 4. Divide-by-(n−1) window trio (CORRECTNESS, FIXED) + +Three windowing helpers divided by `(n − 1)` with no small-`n` guard: + +| Site | Bug | Fix | +|------|-----|-----| +| `csi_processor.rs` `CsiPreprocessor::hamming_window(n)` | `n=0` underflowed `0usize − 1`; `n=1` divided by 0 → all-NaN window | `match n { 0 => [], 1 => [1.0], _ => … }` | +| `bvp.rs` Hann window | `window_size=1` divided by 0 → NaN BVP | length-1 guard → constant `[1.0]` | +| `spectrogram.rs` `make_window` | `size=1` divided by 0 for Hann/Hamming/Blackman | `size <= 1` short-circuit → `vec![1.0; size]` | + +The standard convention for a length-1 window is the constant `1.0`; length-0 is empty. + +**PROOF:** `test_hamming_window_degenerate_sizes` (csi_processor), `bvp_window_size_one_is_finite` (bvp), `make_window_size_0_and_1_are_safe` (spectrogram) — each asserts finiteness at sizes 0/1/2. + +The Python deterministic proof (`archive/v1/data/proof/verify.py`) still prints **VERDICT: PASS** with the **same** pipeline hash `f8e76f21…46f7a` — the reference path uses `n ≥ 2`, so the guard is bit-transparent there. + +--- + +## 5. Measured performance wins (MEASURED before/after; benches committed) + +Both changes are **bit-equivalent** (asserted by a committed test) — they only remove wasted work. New criterion benches in `benches/features_bench.rs` (registered in `Cargo.toml`). + +**Reproduce both:** +```bash +cd v2 && cargo bench -p wifi-densepose-signal --no-default-features --bench features_bench +# compile-only: append --no-run +``` + +### 5.1 FFT-planner caching for PSD (features.rs) + +`PowerSpectralDensity::from_csi_data` constructed a fresh `FftPlanner` and re-planned the FFT **on every frame** — and `FeatureExtractor::extract` calls it per frame on the hot path. New `from_csi_data_with_fft(csi, fft_size, &Arc)` reuses a plan cached in `FeatureExtractor` (built once in `new()`). Output is **bit-identical** (`psd_cached_fft_bit_identical_to_fresh` compares `f64::to_bits` of values + all summary stats across 6 FFT sizes). + +Bench group `psd_fft_planner` — `fresh_planner` (before) vs `cached_planner` (after), per frame: + +| fft_size | before (fresh plan), median | after (cached), median | speedup | +|----------|------------------------------|-------------------------|---------| +| 64 | 5.84 µs/frame | 1.89 µs/frame | **3.09×** | +| 128 | 9.31 µs/frame | 3.61 µs/frame | **2.58×** | +| 256 | 13.77 µs/frame | 6.73 µs/frame | **2.04×** | + +Medians from criterion (warm-up 1 s, 20 samples). Raw three-point estimates (low/median/high), per frame: +`fresh/64 [5.27, 5.84, 6.34] µs` vs `cached/64 [1.76, 1.89, 2.03] µs`; +`fresh/256 [13.29, 13.77, 14.32] µs` vs `cached/256 [6.26, 6.73, 7.43] µs`. +The win is the re-planned `FftPlanner` construction the cache hoists out of the per-frame loop; it grows in *relative* terms at small FFTs (planning is a larger fraction of a cheap transform) and stays a flat ~2× at 256. + +### 5.2 DTW Sakoe-Chiba band honored (gesture.rs) + +`dtw_distance` computed the band bounds `j_start/j_end` but still iterated the **full** `1..=m` row, `continue`-ing on out-of-band cells — so the band constrained the *path* but not the *work* (still O(n·m)). The fix iterates only `j_start..=j_end` (O(n·band)), resetting just the two boundary-guard cells the recurrence can read, and computes the endpoint reachability (`|n−m| ≤ band`) at the return site. Result is **bit-identical** to the full-row version across 12 shapes × 8 band widths (`dtw_banded_bit_identical_to_fullrow`). + +Bench group `dtw_sakoe_chiba` — `full_row` (before) vs `banded` (after): + +| case | before (full row), median | after (banded), median | speedup | +|------|-----------------------------|--------------------------|---------| +| n=m=100, band=5 | 33.45 µs | 13.77 µs | **2.43×** | +| n=m=200, band=5 | 122.32 µs | 29.55 µs | **4.14×** | +| n=m=200, band=10 | 159.98 µs | 60.19 µs | **2.66×** | + +Medians from criterion (warm-up 1 s, 20 samples). Raw (low/median/high): +`full_row n200_band5 [107.6, 122.3, 146.5] µs` vs `banded n200_band5 [26.4, 29.5, 33.1] µs`. +The speedup tracks the inner-loop cell-count ratio `m / (2·band+1)` — n=m=200, band=5 → 200/11 ≈ 18× fewer cells, but euclidean-distance cost and loop overhead dominate at these sizes so the wall-clock win is ~4× (still the **largest at the longest sequence / narrowest band**, exactly as the algorithm predicts). It shrinks toward 1× as the band widens to cover the whole matrix (band=10 → 2.66×), and grows with sequence length (band=5: 2.43× at n=100 → 4.14× at n=200). + +> **Note on the other re-plan sites.** `spectrogram.rs`/`bvp.rs` plan their FFT **once per call** and reuse it across all frames/subcarriers (already amortized), so caching there is marginal — deferred (§7). The PSD site was the only one re-planning *per frame*. + +--- + +## 6. Per-module SOTA landscape (evidence-graded) + +Grades: **MEASURED** (the source measured it, ideally with public method/code), **CLAIMED** (asserted, no reproducible artifact), **THEORETICAL** (plausible, no published target). + +### 6.1 CSI → CIR (cir.rs — our ISTA/L1 sparse recovery) + +- **Deep-unfolded ISTA / LISTA for CSI→CIR — MEASURED.** Learned ISTA unrolling reports ~**3 dB NMSE** improvement over classical OMP/FISTA for channel/CIR estimation (arXiv [2211.15440](https://arxiv.org/abs/2211.15440); survey [2502.05952](https://arxiv.org/abs/2502.05952)). Public methods; numbers measured in-paper. **This is our #1 future item (§7) — our `cir.rs` already builds the sub-DFT Φ that LISTA would make trainable.** +- **Diffusion CIR prior — MEASURED (artifact).** [github.com/benediktfesl/Diffusion_channel_est](https://github.com/benediktfesl/Diffusion_channel_est) ships **public weights** for a diffusion-model channel-estimation prior. Heavier than our edge budget; tracked, not adopted. +- **Coherence gating (the §2 gate) — THEORETICAL.** Our 0.7/0.3 freq/CIR blend is an engineering heuristic with no published accuracy target; now that it *runs*, it can finally be A/B-measured. + +### 6.2 Adversarial robustness (adversarial.rs) + +- **Adversarial-robustness eval for WiFi sensing — MEASURED.** arXiv [2511.20456](https://arxiv.org/abs/2511.20456) + the **Wi-Spoof** benchmark provide a measured evaluation protocol for spoofed/injected CSI. Our detector's physical-plausibility checks (consistency/Gini/temporal/energy) are in the same spirit; adopting Wi-Spoof as an external benchmark is a §7 item. (The §3 NaN fix is a precondition: a detector that NaN-bypasses can't be benchmarked honestly.) + +### 6.3 Multi-AP / multistatic fusion (multistatic.rs) + +- **Bayesian multi-AP fusion — CLAIMED.** arXiv [2512.02462](https://arxiv.org/abs/2512.02462) proposes a Bayesian fusion across APs; **no code released**, numbers self-reported. Our attention-weighted fusion is a different (cheaper) mechanism; tracked as a comparison target, not adopted. + +### 6.4 RF intention-lead / pre-movement (intention.rs) — THEORETICAL + +The 200–500 ms pre-movement "lead signal" framing has **no published commodity-WiFi target** we can grade. Honestly THEORETICAL; no work item. + +--- + +## 7. Decision, roadmap, and the deferred-findings backlog + +### 7.1 Accepted now (this milestone) + +The §2–§5 fixes are **ACCEPTED and committed**: dead CIR gate fixed, NaN bypass fixed, window trio fixed, calibration dead-branch de-misled, two measured perf wins. All `cargo test -p wifi-densepose-signal --no-default-features` (and `--features cir`) green; Python proof PASS. + +### 7.2 Top accepted-future item — LISTA-for-CIR (NOT implemented here) + +**Unroll the existing ISTA in `cir.rs` into trainable layers (LISTA).** Effort: **M**. The sensing matrix Φ and the ISTA recurrence already exist; LISTA replaces the fixed step size / threshold with per-layer learned parameters over a fixed unroll depth. Measured target to beat: **~3 dB NMSE over OMP/FISTA** (arXiv 2211.15440 — MEASURED). Proposed, not built in Milestone 0. + +### 7.3 Other graded-future items + +- Adopt **Wi-Spoof** (arXiv 2511.20456, MEASURED) as the external adversarial benchmark for `adversarial.rs`. +- Evaluate the **diffusion CIR prior** (public weights, MEASURED) as an offline quality ceiling — *not* an edge target. +- Bayesian multi-AP fusion (2512.02462, CLAIMED) — comparison only, pending released code. + +### 7.4 Deferred Milestone-0 review findings (the ~45 not fixed here — explicit backlog) + +Catalogued so nothing is silently dropped. Priority: **P1** correctness-adjacent, **P2** perf, **P3** clarity/style. + +| # | Module | Finding | Pri | Why deferred | +|---|--------|---------|-----|--------------| +| 1 | cir.rs ~937 | `phase_variance` uses **linear** variance on **wrapped** angles (doc says "variance of phase angles") — spuriously inflates near ±π | P1 | Used as the `> TAU` ghost-tap *guard*; a correct circular variance is bounded [0,1] and would need the threshold re-derived. Semantic change — defer with a real recalibration, don't risk a silent gate regression in a perf/correctness pass. | +| 2 | calibration.rs ~311 | `subtract_in_place` had a vacuous `if active_input {ki} else {ki}` branch implying a full-FFT→bin remap that didn't exist | P3 | **Resolved here** (branch removed, sequential-convention documented to match the sibling `extract_first_stream`). Listed for visibility — behavior unchanged. | +| 3 | spectrogram.rs / bvp.rs | FFT planner built once-per-call (already amortized across frames) | P2 | Marginal vs the per-frame PSD site; cache if these become hot. | +| 4 | features.rs ~347 | Doppler FFT planner planned once per call, reused across subcarriers | P2 | Already amortized within the call. | +| 5 | multistatic.rs | `node_attention_weights` recomputes consensus/softmax each call; no SIMD | P2 | Needs a bench before touching; not obviously hot. | +| 6 | tomography.rs | ISTA L1 solver re-allocates voxel buffers per solve | P2 | Bench first. | +| 7 | pose_tracker.rs | Kalman gain matrices reallocated per update | P2 | Bench first. | +| 8 | field_model.rs | SVD recomputed on every perturbation extract | P2 | Incremental SVD is a real project, not a micro-fix. | +| 9 | coherence.rs / coherence_gate.rs | Z-score thresholds are magic constants, untested at boundaries | P1 | Needs labelled data to set defensible thresholds. | +| 10 | longitudinal.rs | Welford update not numerically guarded for n=0 | P1 | Add `n>=1` guard + test (same family as §4). | +| 11 | cross_room.rs | Fingerprint hash collisions unhandled | P2 | Low collision prob; needs design. | +| 12 | gesture.rs | `euclidean_distance` no length-mismatch guard | P3 | Caller-enforced; add `debug_assert`. | +| 13 | adversarial.rs | Gini/consistency thresholds are magic constants | P1 | Same labelled-data dependency as #9. | +| 14 | cir.rs | `fft_operator` path changes the witness hash (documented) — no test that it's *numerically close* to dense | P2 | Add a tolerance test. | +| 15 | multistatic.rs | `cir_gate_coherence` only estimates the **first** node/channel; multi-node CIR consensus unused | P2 | Design item (which node's CIR is authoritative?). | +| 16 | phase_align.rs | Iterative LO offset estimation has no convergence cap test | P2 | Add iteration-cap test. | +| 17 | hampel.rs | Window edge handling at series boundaries | P3 | Cosmetic. | +| 18 | motion.rs | Threshold constants undocumented | P3 | Doc-only. | +| 19 | csi_ratio.rs | Division guard relies on `1e-12` epsilon; no test | P2 | Add boundary test. | +| 20 | spectrogram.rs | `compute_multi_subcarrier_spectrogram` re-plans per subcarrier via `compute_spectrogram` | P2 | Hoist the planner (relates to #3). | +| 21–45 | (assorted) | Remaining clarity/doc/magic-constant/missing-boundary-test findings across `ruvsense/*`, `features.rs`, `motion.rs` | P3 | Bulk-addressable in a dedicated "test-the-boundaries + de-magic-constant" follow-up; not high-leverage individually. | + +> **Horizon-ledger one-liner.** Milestone-0 DONE: dead CIR gate (FIXED+proved), NaN/inf adversarial bypass (FIXED+proved), divide-by-(n−1) window trio (FIXED+proved), calibration dead-branch (FIXED), PSD FFT-planner cache (MEASURED), DTW band (MEASURED). DEFERRED to follow-up: the ~45 findings in §7.4 (P1: phase_variance circular bug #1, Welford guard #10, threshold magic-constants #9/#13; P2/P3: the rest) — none silently dropped. + +--- + +## 8. Consequences + +- **Positive:** the ADR-134 CIR gate is alive for the first time in production; the adversarial detector can no longer be NaN-bypassed; three latent divide-by-zero NaN sources are gone; the per-frame PSD path and gesture DTW are measurably faster with bit-identical output; the SOTA landscape and a concrete LISTA-for-CIR roadmap are graded and recorded. +- **Negative / honest limits:** `canonical56()` models the canonical grid as a contiguous 56-tone band — a reasonable physical interpretation of a *resampled* grid, but not a literal hardware tone map; the CIR gate still uses only the first node's CIR (#15); the `phase_variance` circular bug (#1) remains until it can be re-thresholded with data. +- **Neutral:** no public API removed; `with_cir_ht20()` kept (warned); files stay scoped; new bench is additive. diff --git a/api-docs/adr/ADR-155-nn-training-beyond-sota.md b/api-docs/adr/ADR-155-nn-training-beyond-sota.md new file mode 100644 index 00000000..a8c65209 --- /dev/null +++ b/api-docs/adr/ADR-155-nn-training-beyond-sota.md @@ -0,0 +1,202 @@ +# ADR-155: NN / Training Beyond-SOTA Sweep — Milestone 1 (Claim Integrity, Honest Validation, the Unified Metric, and the SOTA Landscape) + +| Field | Value | +|-------|-------| +| **Status** | Proposed | +| **Date** | 2026-06-11 | +| **Deciders** | ruv | +| **Codebase target** | `wifi-densepose-train` (`metrics.rs`, `dataset.rs`, `proof.rs`, `rapid_adapt.rs`, `ruview_metrics.rs`, `config.rs`, `ablation.rs`, `subcarrier.rs`, `bin/train.rs`, `bin/verify_training.rs`), `wifi-densepose-nn` (`tensor.rs`, `translator.rs`, `onnx.rs`), benches, docs | +| **Relates to** | ADR-154 (Signal/DSP sweep, Milestone 0), ADR-152 (WiFi-Pose SOTA 2026 intake), ADR-150 (RF Foundation Encoder), ADR-079 (Camera-Supervised Pose), ADR-027 (MERIDIAN), ADR-024 (AETHER) | +| **Scope** | Milestone 1 of the beyond-SOTA NN/training sweep: the **integrity-critical** fixes that let the training/metrics subsystem substantiate a clean accuracy claim (the unified metric, leak-free validation, honest TTA, rigorous proof), a focused set of **correctness/security** fixes, two **measured** perf wins, the NN SOTA landscape with evidence grades, and a prioritized backlog. **~45 review findings are explicitly deferred (§8)** — nothing is silently dropped. | + +--- + +## 0. PROOF discipline (this ADR's contract) + +This project has been publicly accused of "AI slop." Milestone 1 is the **most integrity-critical** of the sweep because a gap review found the training/metrics subsystem **could not substantiate a clean accuracy claim**: there were four divergent PCK implementations and three divergent OKS implementations, a model trained on real data was validated against a *synthetic* set, the dataset had no leak-free split, the test-time-adaptation path descended a *fake* gradient, and the deterministic proof self-certified on any loss decrease (including float noise) with no committed baseline. + +We answer that with **evidence, not adjectives**: + +- Every integrity fix ships with a **committed regression test that would have caught the bug**. +- Every perf number is **MEASURED before/after** with the exact reproduce command. A perf claim without a measured before/after is **UNPROVEN** and is not made here. +- Every external SOTA reference is graded **MEASURED** / **CLAIMED** / **THEORETICAL**. +- We disclose, in full, what the proof does **not** prove and what remains unmeasured. + +### Build/test constraint (disclosed) + +The reportable-metric code (`metrics.rs`, `trainer.rs`, `proof.rs`, `model.rs`, `losses.rs`) is gated behind the `tch-backend` Cargo feature (libtorch FFI). libtorch is **not installed on the development host**, so the project's standard gate is `cargo test --workspace --no-default-features` (no tch). The canonical-metric *logic* is therefore validated two ways: (1) the non-tch reachable surface (`compute_pck`/`compute_oks` free functions, `dataset.rs` split, `rapid_adapt.rs`, `ruview_metrics.rs`) runs under the workspace test suite with new regression tests; (2) the `tch`-gated accumulator/trainer/proof changes are routed through those same canonical functions, so the metric definition is identical whether or not tch is present. This limitation is disclosed rather than hidden. + +--- + +## 1. Context — the seven divergent metric definitions + +The gap review found **four** PCK and **three** OKS implementations that disagreed on normalization, on the zero-visible-joint case, and on the OKS scale: + +| # | Location | Normalizer | Zero-visible PCK | OKS scale | +|---|----------|-----------|------------------|-----------| +| PCK-1 | `metrics.rs` `MetricsAccumulator` (the trainer's) | bbox **diagonal** | **1.0** (false-perfect bug) | normalized-coord diag² | +| PCK-2 | `metrics.rs` `compute_pck` | torso **hip↔shoulder** | 0.0 | — | +| PCK-3 | `metrics.rs` `compute_pck_v2` | torso **hip↔hip** (pixel) | 0.0 | — | +| PCK-4 | `training_bench.rs` | **raw threshold** (no torso) | 0.0 | — | +| OKS-1 | `metrics.rs:443` `compute_oks` | — | — | caller `s` (`1.0` ⇒ fake Gold) | +| OKS-2 | `metrics.rs:994` `compute_oks_v2` | — | — | `sqrt(area)` (could be 0) | +| OKS-3 | `ruview_metrics.rs:642` | — | — | caller `s` (`1.0` ⇒ fake Gold) | + +Two of these are not merely inconsistent, they are **wrong in a claim-inflating direction**: + +- **The `MetricsAccumulator` zero-visible-joint bug** scored a sample with *no visible joints* as PCK = 1.0 ("no errors to measure"). An empty or garbage prediction could thus *inflate* the reported metric. +- **The OKS `s = 1.0`-on-normalized-coordinates bug** ("fake Gold tier"): with keypoints in `[0,1]` and the scale fixed at `1.0`, every squared distance is ≈0 and the exponential kernel returns ≈1.0 for *any* pose. OKS looked near-perfect regardless of prediction quality. + +This is the same metric-bug class ADR-152 flagged. Milestone 1 closes it for real. + +--- + +## 2. Decision — TIER 1: CLAIM INTEGRITY (the "prove everything" core) + +### 2.1 Unify the metrics — ONE canonical definition — ACCEPTED & IMPLEMENTED + +There is now exactly **one** PCK and one OKS that may be used for any *reported* number, in the `canonical` region of `metrics.rs`: + +- **`pck_canonical(pred, gt, vis, k)` — torso-normalized PCK@k.** A keypoint `j` is correct iff `‖pred_j − gt_j‖₂ ≤ k · torso`, where `torso = ‖left_hip(11) − right_hip(12)‖₂` in the keypoint coordinate space, with a **bounding-box-diagonal fallback** when the hips are not both visible. This is the COCO / ADR-152 convention validated in `benchmarks/wiflow-std/RESULTS.md` (the ~96% PCK@20 reproduction — hip↔hip torso, COCO Setting). **Zero visible joints ⇒ `(0, 0, 0.0)`** — a sample with no measurable evidence scores 0, never 1. +- **`oks_canonical(pred, gt, vis)` — COCO OKS.** `s = sqrt(area)` is derived from the **GT pose extent** (the canonical torso size as a robust, always-positive scale proxy), never a fixed `1.0`. There is no escape hatch that makes OKS ≈ 1.0 for any pose; a degenerate (zero-extent) pose returns 0.0. + +**Single source of truth, enforced.** `MetricsAccumulator::update` (the trainer's), `compute_pck`, `compute_per_joint_pck`, `compute_oks`, `aggregate_metrics`, and the deprecated `compute_pck_v2`/`compute_oks_v2`/`MetricsAccumulatorV2` **all route through** `pck_canonical`/`oks_canonical`. So `Trainer::evaluate()` → `MetricsAccumulator` → canonical; the WiFlow-STD bench definition (RESULTS.md) is the reference the canonical *matches*. `eval.rs` reports MPJPE (a distinct, non-divergent error metric, unchanged). The `v2` functions and the `training_bench.rs` raw-threshold kernel are annotated **`#[deprecated]` / "DO NOT USE for reported metrics"**. + +**The two claim-inflating bugs are fixed and pinned by regression tests:** + +- `canonical_pck_zero_visible_is_zero_not_one` — no-visible ⇒ PCK 0.0 (was 1.0). +- `canonical_oks_not_one_for_wrong_pose_on_normalized_coords` — a pose off by 3× the torso on `[0,1]` coords yields OKS < 0.2 (the old `s=1.0` path returned ≈1.0). +- `canonical_pck_uses_hip_to_hip_torso`, `canonical_torso_falls_back_to_bbox_when_hips_hidden` — pin the normalizer. +- `all_invisible_gives_zero_pck` (renamed from `all_invisible_gives_trivial_pck`, comment cites this ADR) — the trainer accumulator now scores no-visible as 0. + +**Legitimately changed test expectations** (each updated with a comment citing this finding): the historical "perfect on an all-coincident pose" fixtures used keypoints at a single point, which is *correctly unscoreable* under canonical (zero extent ⇒ no scale). Test fixtures were given a real ±0.05 hip span so the canonical normalizer is positive; `all_invisible_*` flipped from 1.0 → 0.0. + +### 2.2 Honest validation — leak-free split + synthetic-val disclosure — ACCEPTED & IMPLEMENTED + +**The leak.** MM-Fi windows are extracted with **stride 1** (`MmFiEntry::num_windows = num_frames − window_frames + 1`), so adjacent windows overlap by `window_frames − 1` frames (~99% at the default 100-frame window). And `bin/train.rs` validated a *real* MM-Fi training run against a **synthetic** val set "for pipeline verification" — any PCK it printed was meaningless on two counts. + +**The fix (mirroring the leak-free discipline of `occupancy_bench::EvalSplit`):** + +- `MmFiDataset::subject_disjoint_split(test_subject_fraction, seed) → (train_view, test_view)` partitions **whole subjects** to one side. Because every window of a subject travels with that subject, the two views share **no subject and no window** — leak-free by construction, deterministic per seed. Returns `DatasetError::InvalidSplit` on <2 subjects, bad fraction, or an empty side. +- `assert_split_leak_free(train, test)` independently verifies subject-disjointness **and** window-index-disjointness, and is called inside the split so a leaky split can never be handed out. +- `bin/train.rs` now **prefers the real split**; the synthetic path is reachable only as a labelled fallback (single-subject data) and is routed through a new `run_smoke_test` that prefixes every metric `[SMOKE-TEST] (DO NOT REPORT)`. `--dry-run` is likewise relabelled. A synthetic-val PCK can no longer be mistaken for a measurement. + +**Leak-free proof (tests):** `subject_split_is_subject_and_window_disjoint` (no shared subject, no shared window index, partition covers every window once), `subject_split_is_deterministic_for_seed`, `subject_split_rejects_single_subject`, `subject_split_rejects_bad_fraction`, `assert_leak_free_detects_injected_subject_leak` (the validator catches a deliberately-injected subject overlap — a guard against future partitioner bugs). + +### 2.3 rapid_adapt honesty — real gradients, scoped claim — ACCEPTED & IMPLEMENTED + +`rapid_adapt.rs`'s `contrastive_step`/`entropy_step` wrote a **fake gradient** (`grad += v * 0.01`) unrelated to the stated triplet / entropy objective — so any "TTA improves the metric" was unsupported by the code. + +**Resolution: real gradients (not removal).** The two `*_loss` functions are now **pure evaluators** of the real objective; `RapidAdaptation::adapt` descends them with a **central finite-difference gradient** of that exact loss (`∂L/∂wᵢ ≈ (L(w+εeᵢ) − L(w−εeᵢ))/2ε`). Finite differences genuinely minimize the stated objective (to O(ε²) truncation), so "the adaptation loss decreases" is now a **real, reproducible** measurement rather than an artefact of a hand-tuned step. The returned `final_loss` is the *actual* objective at the produced weights. + +**Honest scope caveat (recorded in the module and here):** this minimizes a *self-supervised proxy* (temporal-contrastive + prediction entropy) over a tiny LoRA bottleneck on raw CSI. It is **NOT** wired to the pose model, and **there is no measured end-to-end PCK gain on WiFi pose from this path.** TTA-on-pose is a future, **not-yet-measured** capability — no PCK improvement may be cited from this module. + +**Tests:** `contrastive_loss_decreases` and `entropy_loss_decreases` (20/30 real gradient steps do not increase the loss vs 0 steps), `reported_loss_is_the_real_objective_not_a_placeholder` (the returned `final_loss` equals an independent recomputation of the objective at the output weights — i.e. it is the real loss, not a fabricated number). + +### 2.4 proof.rs rigor — margin + committed-hash requirement — ACCEPTED & IMPLEMENTED + +The deterministic proof self-certified: `generate_expected_hash` blessed whatever the pipeline emitted, PASS counted *any* loss decrease (including 1e-9 float noise), and a *missing* expected hash defaulted to PASS. + +**Two hardenings:** + +1. **Minimum-decrease margin.** `MIN_LOSS_DECREASE = 1e-4`. A run counts as "learning" only when `initial − final ≥ MIN_LOSS_DECREASE` — well above float noise, far below a real step's decrease. A pipeline that only wanders by noise now **FAILS**. +2. **No-hash is a SKIP, never a PASS.** `ProofResult::is_pass()` requires `hash_matches == Some(true)` (a *committed* `expected_proof.sha256`). An absent baseline yields SKIP (exit 2). The `verify-training` binary additionally **fails fast** on a sub-margin loss *before* the hash comparison, so a missing baseline can never downgrade a non-learning pipeline to SKIP. + +**What this proves — and what it does NOT (disclosed):** the proof certifies **reproducibility and determinism** (same seed ⇒ same weights ⇒ same hash) and that the optimiser *measurably* reduces a loss. It runs on a deterministic *synthetic* dataset by construction, so it does **not** prove the shipped weights came from real MM-Fi data, nor that any accuracy claim is met. Accuracy is substantiated separately (`benchmarks/wiflow-std/RESULTS.md`). There is currently **no committed `expected_proof.sha256` for the Rust proof**, so it is honestly in the SKIP state until a baseline is committed on a libtorch-enabled host — and SKIP is now reported as SKIP, not green. + +**Tests:** `no_committed_hash_is_skip_not_pass`, `submargin_loss_change_fails_even_without_hash`, `committed_matching_hash_with_real_decrease_passes`. + +--- + +## 3. Decision — TIER 2: CORRECTNESS / SECURITY + +Each fix ships a test that would have caught the bug (all in the non-tch, workspace-tested surface). + +| Finding | File | Fix | Test | +|---------|------|-----|------| +| `softmax(axis)` ignored the axis (whole-tensor normalize — breaks densepose per-pixel probs) | `nn/tensor.rs` | softmax along the given axis per lane; out-of-range axis ⇒ `NnError` (no panic) | (tier-2 suite) | +| `apply_attention` identity/uniform stub (any "with attention" ablation == without) | `nn/translator.rs` | **implemented real single-head scaled-dot-product attention** (`softmax(QKᵀ/√d)V` with Q/K/V/output projections); mis-shaped checkpoint projections rejected so a bad checkpoint can't silently become a no-op | `test_attention_is_not_uniform_stub`, `test_attention_rejects_wrong_weight_shape` | +| `config.validate()` had no UPPER bounds (config-OOM class still open) | `train/config.rs` | upper bounds on `window_frames`/subcarriers/`backbone_channels`/`heatmap_size`/keypoints/parts/`batch_size`; reject negative `gpu_device_id` | rejection tests; defaults+presets still validate | +| `subcarrier.rs` panic on non-contiguous input | `train/subcarrier.rs` | graceful path / typed error on strided input | non-contiguous-input test | +| `ablation.rs` `latency_percentiles` `partial_cmp().unwrap()` NaN panic | `train/ablation.rs` | `total_cmp` / NaN-guarded compare | NaN-input no-panic test | +| `onnx.rs` unchecked `-1` dim cast | `nn/onnx.rs` | reject negative/zero output dims with `NnError` | guarded-dim test | +| `ruview_metrics` `compute_single_oks` `s=1.0` fake-Gold + unguarded `[j]<17` | `train/ruview_metrics.rs` | derive scale from GT extent when none supplied; reject `s≤0`; bound the loop to array extents | `oks_rejects_nonpositive_scale`, `oks_does_not_panic_on_short_arrays`, `oks_not_perfect_for_wrong_pose_with_derived_scale` | + +`rf_encoder.rs` was inspected and found to contain **no checkpoint-deserialization assert**: its `assert_eq!`s in `LinearHead::new` / `ContrastiveBatcher::new` are documented construction-time API contracts on *programmer-supplied* vector lengths, not adversarial-input panics — the described bug does not exist there. Any genuine checkpoint-load assert lives in the tch-gated `proof.rs`/`trainer.rs` path and is deferred (§8) as unverifiable without libtorch. Test pass counts: nn `--no-default-features` **35 passed**, nn `--features onnx onnx::tests` **3 passed**, train `--no-default-features` lib **176 passed**. + +--- + +## 4. Decision — TIER 3: MEASURED perf wins (new criterion benches) + +All numbers MEASURED on the Windows dev host with the `onnx` feature (`ort 2.0.0-rc.11`, runtime auto-downloaded), committed in `nn/benches/onnx_bench.rs`. + +### 4.1 Zero-copy ORT input — LANDED, MEASURED + +`onnx.rs` built the ORT input via `arr.iter().cloned().collect::>()` — a full element-wise copy. Replaced with a contiguous fast path (`arr.as_slice() ⇒ single memcpy`, iterator fallback only for strided views). + +- **Reproduce:** `cargo bench -p wifi-densepose-nn --no-default-features --features onnx --bench onnx_bench -- onnx_input_copy` +- **Measured** (input `[1,256,64,64]` = 1.05M f32): **1.972 ms → 1.336 ms (~1.48× faster)**, 532 → 785 Melem/s. Strided fallback unchanged (within noise), correctness preserved. End-to-end real-model inference: ~45.9 µs. + +### 4.2 ONNX per-inference write-lock — DIAGNOSED, NOT LANDABLE (honest) + +`OnnxBackend::run` takes a `parking_lot::RwLock` **write** lock per inference, serializing concurrency. The intended fix was a read-lock. **It is not landable on `ort 2.0.0-rc.11`:** the safe `Session::run` is `&mut self` (verified against the vendored source) — there is no `&self` run path, so a read-lock fails the borrow checker. The underlying C++ `OrtSession::Run` is thread-safe, but exploiting that would require an `unsafe` interior-mutability bypass; we did **not** introduce that soundness risk. The write lock was kept, with a doc comment recording the upgrade path (a future `ort` with `&self` run ⇒ flip to `read()`). + +- **Harness landed anyway**, empirically proving the serialization: `cargo bench -p wifi-densepose-nn --no-default-features --features onnx --bench onnx_bench -- onnx_concurrency` → throughput **drops** with more threads (1 thr 19.4 Kelem/s → 2 thr 16.9K → 4 thr 14.0K → 8 thr 14.3K). When `ort` exposes `&self` run, the one-line lock change will show the speedup on this same bench. + +The native-conv naive-loop rewrite was **deferred** (§8) as out of scope for a measured milestone. + +--- + +## 5. The NN / training SOTA landscape (graded) + +| Candidate | What | Grade | Verdict | +|-----------|------|-------|---------| +| **GraphPose-Fi** (arXiv 2511.19105, code github.com/Cirrick/GraphPose-Fi) | Graph/skeleton pose **decoder** for cross-environment WiFi pose; MM-Fi, 17 joints — matches our setup. ADR-150 §2.2 named a graph decoder but never built it. | **CLAIMED** (preprint; cross-env gains author-reported) | **Top beyond-SOTA candidate. Propose as ACCEPTED-future — NOT built here.** Best fit because the decoder is a drop-in on our 17-joint MM-Fi backbone and directly targets the cross-environment brittleness ADR-150/ADR-027 fight. | +| **ONNX INT4** | Extend our **measured** INT8 ONNX quantization to INT4 for edge. | **THEORETICAL** for our pipeline (INT8 is MEASURED; INT4 untested here) | #2 priority — natural extension of a measured capability. | +| **CSI-JEPA vs MAE A/B** | Joint-embedding predictive pretraining vs the ADR-152 §2.3 MAE recipe. | **CLAIMED** (JEPA strong elsewhere) — **honest caveat: no JEPA *or* MAE result exists on WiFi POSE yet** (ADR-152 F3: UNSW MAE downstream tasks are classification, not pose). | #3 — run as a measured A/B, do not pre-announce a winner. | +| **"Mamba-CSI-pose"** | A state-space-model CSI pose backbone. | — | **Does NOT exist. Do not propose it.** No such artifact in the 2025–2026 literature; naming it would be exactly the kind of unfounded claim this sweep exists to prevent. | + +--- + +## 6. Validation + +- `cargo test --workspace --no-default-features` — green (the metric unification legitimately changed a handful of test expectations; each was updated with a comment citing the finding, and the trainer/eval/proof now all route through the one canonical metric). +- `python archive/v1/data/proof/verify.py` — `VERDICT: PASS` (Python pipeline proof, independent of the Rust changes). +- New criterion benches compile and run under the `onnx` feature. + +--- + +## 7. What changed, file by file + +- `metrics.rs` — `canonical_torso_size`, `pck_canonical`, `oks_canonical` (single source of truth); `MetricsAccumulator`/`compute_pck`/`compute_per_joint_pck`/`compute_oks`/`aggregate_metrics` route through them; `compute_pck_v2`/`compute_oks_v2`/`MetricsAccumulatorV2` deprecated → canonical; zero-visible and `s=1.0` bugs fixed; canonical bug-catching tests. +- `dataset.rs` — `subject_disjoint_split`, `MmFiSplitView`, `assert_split_leak_free`; leak-free split tests. +- `error.rs` — `DatasetError::InvalidSplit`. +- `bin/train.rs` — prefer real subject-disjoint split; synthetic path relabelled `run_smoke_test` ("DO NOT REPORT"). +- `proof.rs` + `bin/verify_training.rs` — `MIN_LOSS_DECREASE` margin; no-hash ⇒ SKIP-not-PASS; sub-margin ⇒ FAIL-not-SKIP; new tests. +- `rapid_adapt.rs` — fake gradient removed; finite-difference gradient of the real objective; honesty docs + tests. +- `ruview_metrics.rs` — OKS scale derived from GT extent (no `s=1.0`); `s≤0` rejected; OKS loop bounded; tests. +- `config.rs` / `ablation.rs` / `subcarrier.rs` / `nn/tensor.rs` / `nn/translator.rs` / `nn/onnx.rs` — Tier-2 fixes (§3) + Tier-3 perf (§4). +- `training_bench.rs`, `sensing-server/training_api.rs` — divergent local PCK kernels annotated "DO NOT USE for reported metrics"; the sensing-server torso-height PCK unification is a **deferred** backlog item (separate service + tch boundary). + +--- + +## 8. Deferred backlog (NOT silently dropped) + +The gap review surfaced ~60 findings; this milestone scoped to the provable integrity-critical subset plus two measured perf wins. The remainder are tracked here for a future ADR-155 milestone: + +- **GraphPose-Fi graph decoder** — build the §5 top candidate (ACCEPTED-future, not built). +- **ONNX INT4** quantization; **CSI-JEPA vs MAE** A/B; the rest of the §5 roadmap. +- **ONNX read-lock concurrency win** — blocked on an `ort` release exposing `&self` `Session::run` (§4.2); harness already committed. +- **native-conv naive-loop** perf rewrite (§4). +- **`rf_encoder.rs` `assert_eq!`-on-checkpoint** and any other **tch-gated** panic-on-input sites — require a libtorch host to compile/verify (`model.rs` `amp_fc1` unbounded alloc is *indirectly* guarded by the new `config.validate()` upper bounds, but a direct guard + test is deferred). +- **`sensing-server/training_api.rs` PCK** — unify the live-server torso-height PCK with `pck_canonical` (crosses the service + tch boundary). +- **`test_metrics.rs` reference kernels** — the integration test's local `compute_pck`/`compute_oks` are independent reference impls (not production); fold them onto the canonical definition. +- The remaining ~40 lower-severity review findings (style, micro-opt, doc) from the NN/training gap review. + +--- + +## 9. Consequences + +**Positive.** The training/metrics subsystem can now substantiate a clean accuracy claim: one documented metric used everywhere, a leak-free split, an honest TTA path, a proof that fails on noise and refuses to bless an unbaselined run, and two of the most claim-inflating bugs (false-perfect PCK, fake-Gold OKS) closed and pinned by regression tests. The unmeasured/unprovable parts are **disclosed**, not hidden. + +**Negative / honest.** The reportable-metric tch-gated code cannot be compiled on the dev host (libtorch absent), so its validation rests on routing through the workspace-tested canonical functions plus review; the Rust deterministic proof is in SKIP until a baseline is committed on a tch host; the ONNX concurrency win is blocked upstream; and ~45 findings are deferred. None of these is presented as done. diff --git a/api-docs/adr/ADR-156-ruvector-fusion-beyond-sota.md b/api-docs/adr/ADR-156-ruvector-fusion-beyond-sota.md new file mode 100644 index 00000000..69ffc49d --- /dev/null +++ b/api-docs/adr/ADR-156-ruvector-fusion-beyond-sota.md @@ -0,0 +1,153 @@ +# ADR-156: RuVector / Cross-Viewpoint Fusion Beyond-SOTA Sweep — Milestone 2 (Correctness Integrity, an Honest GDOP, Crafted-Input Safety, a Measured Hot-Path Win, and the ANN/Fusion SOTA Landscape) + +| Field | Value | +|-------|-------| +| **Status** | Proposed | +| **Date** | 2026-06-11 | +| **Deciders** | ruv | +| **Codebase target** | `wifi-densepose-ruvector` — `viewpoint/` (`attention.rs`, `geometry.rs`, `fusion.rs`, `coherence.rs`), `mat/` (`triangulation.rs`, `heartbeat.rs`), `sketch.rs`, benches, docs | +| **Relates to** | ADR-031 (RuView sensing-first RF mode), ADR-016/017 (RuVector integration), ADR-024 (AETHER re-ID), ADR-027 (MERIDIAN cross-env), ADR-084 (RaBitQ similarity sensor), ADR-138 (ClockQualityGate), ADR-152 (WiFi-Pose SOTA 2026 intake), ADR-154 (Signal/DSP sweep M0), ADR-155 (NN/Training sweep M1) | +| **Scope** | Milestone 2 of the beyond-SOTA sweep: four **correctness/integrity/security** fixes on the cross-viewpoint fusion path (each pinned by a regression test that fails on the old code), one **measured** hot-path perf win + a new criterion bench, the ANN/fusion SOTA landscape graded MEASURED/CLAIMED/data-gated, and a prioritized deferred backlog. **Nothing is silently dropped.** | + +--- + +## 0. PROOF discipline (this ADR's contract) + +This project has been publicly accused of "AI slop." Milestone 2 answers with **evidence, not adjectives** — the same contract as ADR-154/155: + +- Every correctness/integrity fix ships a **committed regression test that fails on the old code and passes on the new**. We verified each by reverting the fix and observing the test fail (recorded in §6). +- Every perf number is **MEASURED before/after** with the exact reproduce command and a committed criterion bench. A perf claim without a measured before/after is **UNPROVEN** and is not made here. +- Every external SOTA reference is graded **MEASURED** / **CLAIMED** / **DATA-GATED**, distinguishing what a paper *measured* from what it *asserts* from what our own prior measurement (ADR-152) says is **not currently the bottleneck**. +- We disclose, in full, the **one staged finding that turned out to be a numeric no-op** (§2.1): the geometric-bias "angular wrap bug" is real as a *contract* violation but, because the bias kernel is `cos()` (even and 2π-periodic), it changes **no output value** under the current kernel. We land the fix anyway (it matches the documented contract and reuses the canonical helper) but we **do not claim a behaviour change** — that would be exactly the kind of inflation this sweep exists to prevent. + +Test machine for the perf numbers: Windows 11, `cargo bench --release`, criterion 0.5. Numbers are wall-clock medians on this box; the **ratio** (before/after) is the claim, not the absolute ns. + +Build/test gate: `cargo test --workspace --no-default-features` (the project's standard gate — no `crv`/GPU features). All fixes in this milestone are on the **default, non-feature-gated surface**, so they are fully exercised by the standard gate. + +--- + +## 1. Context + +The cross-viewpoint fusion stack (`viewpoint/` — ADR-031) combines per-viewpoint AETHER embeddings into one fused embedding via geometric-bias attention, gated by phase coherence, with array-geometry quality scored by a Geometric Diversity Index and a Cramér-Rao bound. The `mat/` survivor-localisation helpers (`triangulation.rs`, `heartbeat.rs`) share the same crate. A beyond-SOTA review surfaced findings spanning a **mislabeled metric**, an **angular-distance contract violation**, **crafted-input panics on a network-reachable path**, and a **redundant clone in the fusion hot path**, plus an ANN/fusion SOTA-research gap. Milestone 2 closes the provable subset and grades the research landscape. + +--- + +## 2. Decision — CORRECTNESS / INTEGRITY FIXES + +Each fix ships a regression test (all on the non-feature-gated, workspace-tested surface). + +### 2.1 GeometricBias angular separation — use the canonical *wrapped* distance — ACCEPTED & IMPLEMENTED (honest: numeric no-op under the current cos kernel) + +**The finding.** `attention::GeometricBias::build_matrix` computed the pairwise angular separation as the **raw** `|azimuth_i − azimuth_j|`. That can exceed π and mis-states the separation across the 0/2π seam (350° and 10° are 20° apart, but raw `|Δ|` = 340°). The module already had a correct wrapped helper, `geometry::angular_distance` (returns `[0, π]`), but it was **private** and `GeometricBias` did not use it. + +**The honest correction (disclosed, not hidden).** The bias kernel is `w_angle·cos(theta_ij)`. Because `cos` is **even and 2π-periodic**, `cos(raw) == cos(wrapped)` for every pair (verified numerically: max abs diff `1.1e-16` across seam-crossing test cases). So under the *current* kernel this "bug" produces **identical bias values** — it is a **contract violation, not a behaviour bug**. We say so plainly rather than dressing a no-op as a fix. + +**Why land it anyway.** (1) It makes the code satisfy its own documented contract (`theta_ij`: "angular separation in radians", which must be `[0, π]`). (2) It reuses the **single canonical** `angular_distance` helper (now made `pub`), eliminating a divergent angle computation — the same single-source-of-truth discipline ADR-155 applied to metrics. (3) It is **correct by construction** for any future non-even angular kernel (e.g. a linear `w_angle·theta_ij` penalty), which the raw-diff form would silently break. + +**Tests:** `geometric_bias_angular_separation_uses_wrapped_distance` (pins that a seam-crossing pair's wrapped distance is 20° while its raw `|Δ|` exceeds π, and that `build_matrix` is symmetric across the seam) and `geometric_bias_linear_angular_kernel_would_catch_raw_diff` (pins the wrapped value ∈ `[0, π]` — the invariant a future linear kernel relies on; the raw-diff form gives 190° where the wrapped form gives 170°). + +### 2.2 Crafted-input panics on the fusion/localisation path — typed `None` instead of panic — ACCEPTED & IMPLEMENTED (the security item) + +**The finding (DoS).** Two functions on a path that can carry **network-sourced multistatic frames** panicked on crafted input: + +- `mat::triangulation::solve_triangulation` indexed `ap_positions[0]` (panics on an empty AP table) and `ap_positions[i]` / `ap_positions[j]` (panics when a TDoA measurement references an **out-of-range AP index**). A remote peer supplying a TDoA tuple `(i=99, …)` with only 3 APs triggers an out-of-bounds panic — a remotely-triggerable denial of service. +- `mat::heartbeat::CompressedHeartbeatSpectrogram::band_power` computed `self.n_freq_bins - 1`, which **underflows** (usize `0 − 1`) for a zero-bin spectrogram — a debug panic / release `usize::MAX` (then an out-of-range index). + +**The fix.** `solve_triangulation` uses `ap_positions.first()?` and `ap_positions.get(i)?` / `.get(j)?` — any empty table or out-of-range index returns `None`, never panics. `band_power` guards `n_freq_bins == 0` up front and **clamps both bounds** into `[0, last]`, returning `0.0` for empty/inverted ranges. No out-of-range index, no subtraction overflow, on any input. + +**Tests:** `triangulation_out_of_range_index_returns_none_no_panic`, `triangulation_empty_ap_positions_returns_none_no_panic`, `heartbeat_band_power_zero_bins_no_panic`, `heartbeat_band_power_out_of_range_bounds_no_panic`. Each **panics on the old code** (verified by reverting — §6) and returns a clean `None`/`0.0` on the new. + +### 2.3 GDOP mislabel — compute a real, dimensionless GDOP — ACCEPTED & IMPLEMENTED + +**The finding.** `geometry::CramerRaoBound` exposed a field named `gdop` ("Geometric Dilution of Precision") that was computed as `(crb_x + crb_y).sqrt()` — **identical to `rmse_lower_bound`**. That is the RMSE (metres, noise-dependent), **not** a GDOP. GDOP is a *dimensionless geometry factor* independent of the noise level; the name was a lie about the quantity. + +**The fix (honest rename was the fallback; real GDOP was cheap, so we computed it).** True GDOP `= sqrt(trace(G⁻¹))` where `G` is the **unit-variance** bearing-geometry matrix (the Fisher matrix with every `1/σ²` set to 1). It depends only on the array/target geometry and relates noise to position error as `rmse ≈ GDOP·σ`. We accumulate `G` alongside the FIM in both `estimate` and `estimate_regularised` (cheap 2×2), and report `INFINITY` (not NaN/panic) for a degenerate collinear geometry. The doc comment now states exactly what the field is and what it used to (wrongly) be. + +**Test:** `gdop_is_dimensionless_and_noise_independent` — scales every sensor's noise by 10× and asserts GDOP is unchanged while RMSE scales ~10×, and that `rmse ≈ GDOP·σ` at both noise levels. The old `gdop = sqrt(crb_x + crb_y)` **fails** this (it scaled with noise, proving it was RMSE) — verified by reverting (§6). + +### 2.4 `fuse()` double-clone in the aggregation hot path — eliminate the redundant clone — ACCEPTED & IMPLEMENTED (MEASURED — §4) + +**The finding.** `MultistaticArray::fuse` (and `fuse_ungated`) cloned every viewpoint embedding **twice** per fusion: once into the `extracted` tuple vector (`v.embedding.clone()`), then **again** when building the attention input (`extracted.iter().map(|(_, e, _, _)| e.clone())`). At the AETHER dimension (128 f32 = 512 B) over up to 8 viewpoints, that is a wholly redundant second heap allocation + memcpy per viewpoint, every TDM cycle. + +**The fix.** Build `extracted` once (the unavoidable clone out of the borrowed `self.viewpoints`), then **consume** `extracted` by value and **move** each embedding into the attention input (`embeddings.push(emb)`), capturing geometry/ids by `Copy` in the same pass. One clone per viewpoint instead of two. Measured win in §4. + +--- + +## 3. Security review (touched files) + +The §2.2 crafted-input panics **are** the security item: a DoS via out-of-range indices / zero-bin underflow on a fusion/localisation path that may be driven by network-sourced multistatic frames. Beyond those, the touched files were swept for further panic-on-untrusted-input / unbounded-alloc sites: + +- `attention.rs` — all indexing is over internally-sized `n × n` / `d` loops bounded by validated input lengths (`DimensionMismatch` is returned for ragged embeddings); softmax denominators are floored with `f32::EPSILON`. No unbounded alloc (sizes derive from caller-supplied vector lengths already validated against `d_in`). **No further action.** +- `geometry.rs` — `det`/`det_g` are floored before division; degenerate geometry yields `None`/`INFINITY`, never NaN-panic. **No further action.** +- `fusion.rs` — embedding dimension is validated in `submit_viewpoint`; the event log is bounded (`max_events`, oldest-half drain). **No further action.** +- `coherence.rs` — circular buffer is fixed-capacity; gate thresholds are clamped. **No further action.** + +No `unsafe`, no `unwrap()` on external input, and no unbounded allocation remain on the touched paths after §2.2. + +--- + +## 4. MEASURED perf win (new criterion bench) + +A new bench, `crates/wifi-densepose-ruvector/benches/fusion_bench.rs`, covers the fusion hot path. It has two groups: `fusion_pipeline` (end-to-end `MultistaticArray::fuse_ungated()` at 2/4/8 viewpoints, dim 128) and an isolated A/B of the §2.4 marshalling step (`embedding_extract/before_double_clone` vs `after_single_clone`). + +- **Reproduce:** `cargo bench -p wifi-densepose-ruvector --bench fusion_bench` +- **Measured (`embedding_extract`, 8 viewpoints × 128-d), medians:** `before_double_clone` **1.0029 µs** → `after_single_clone` **461.6 ns** — **~2.17× faster** on the marshalling step. The result is what theory predicts (two embedding clones collapse to one), confirming the redundant clone was the cost, not noise. +- **End-to-end `fusion_pipeline` (medians):** 2 vp = 56.3 µs, 4 vp = 99.5 µs, 8 vp = 202.1 µs. The marshalling (~0.5–1 µs) is **well under 1%** of total fusion cost (dominated by the `n×n` attention), so the **end-to-end** effect is modest by construction; the `embedding_extract` A/B isolates and proves the clone-elimination itself. We report this honestly rather than attributing the full 2.17× to the pipeline. + +The double-clone elimination is also correctness-neutral: all 100 `viewpoint`/`mat` lib tests pass unchanged. + +--- + +## 5. The ANN / cross-viewpoint-fusion SOTA landscape (graded) + +| # | Candidate | What | Grade | Verdict | +|---|-----------|------|-------|---------| +| **1** | **SymphonyQG** (SIGMOD 2025, public code) | Unified quantization + graph ANN; source reports **3.5–17× QPS over HNSW at equal recall**, pure-CPU / edge-portable. | **CLAIMED** (author-measured; **not reproduced on our hardware** — reproduction is future work) | **Lead beyond-SOTA candidate for the ruvector ANN path.** Propose as ACCEPTED-future; cite honestly as "claimed by source, reproduction pending." Best fit because the ruvector retrieval path (AETHER re-ID, sketch prefilter) is exactly an ANN problem and SymphonyQG is CPU/edge-portable like our deployment. | +| **2** | **Multi-bit / Extended RaBitQ** | Extends our existing **1-bit** `sketch.rs` (ADR-084) to multiple bits per dimension — precisely the "Pass 2" our own `sketch.rs` doc deferred (1-bit sign quantization ships first; rotation/more-bits "later if benchmark-measured top-K coverage drops below the ADR-084 90% threshold"). | **CLAIMED** (RaBitQ family well-characterised; our 1-bit baseline is MEASURED in `sketch_bench`) | **Accepted near-term.** Concrete, in-scope, incremental — extends a MEASURED capability rather than importing a new system. #2 priority. | +| **3** | **GraphPose-Fi-style learned antenna-attention + ChebGConv fusion head** | Would replace the current **untrained identity-projection + mean-pool** "attention" (the `CrossViewpointAttention` default is `ProjectionWeights::identity` — not a *learned* attention) with a learned graph fusion head. | **DATA-GATED** (per ADR-152 measurement (b): architecture is **NOT** the current bottleneck — **data is**) | **ACCEPTED-future, data-gated. Do NOT build now.** ADR-152's measured lesson was that swapping architecture without more/better paired data does not move PCK. Building a learned fusion head before the data exists would repeat the mistake ADR-155 §5 also flagged for GraphPose-Fi. | +| — | **Cramér-Rao / sensor-placement** (`geometry.rs` CRB) | Investigated for a 2026 advance beating the textbook Fisher-information CRB already implemented. | **Investigated — NO ACTION** | **Cleared honestly.** No 2026 method beats the closed-form Fisher-information CRB for this 2-D bearing problem; our implementation is already correct SOTA. (Recording a negative result is a deliberate anti-slop signal.) The only CRB change this milestone is the §2.3 *GDOP* honesty fix, which is a labelling/quantity correction, not an algorithmic one. | + +--- + +## 6. Validation + +- **Bug-catching tests verified to bite.** Each §2.2/§2.3/§2.4-adjacent fix was reverted and the corresponding test observed to **fail on the old code**, then restored: + - `triangulation_out_of_range_index_returns_none_no_panic` / `triangulation_empty_ap_positions_returns_none_no_panic` — **panic** (index out of bounds) on old code. + - `heartbeat_band_power_zero_bins_no_panic` — **panic** ("attempt to subtract with overflow") on old code. + - `gdop_is_dimensionless_and_noise_independent` — **assertion failure** (GDOP scaled with noise) on old code. + - §2.1 (angular wrap) is the **disclosed no-op**: its tests pin the *contract* (wrapped value ∈ `[0, π]`), since the cos kernel makes the bias value numerically identical with or without the fix. We do not claim a behaviour change. +- **`cd v2 && cargo test -p wifi-densepose-ruvector --no-default-features --lib`** — **100 passed / 0 failed** (was 93; +7 new tests). +- **`cd v2 && cargo test --workspace --no-default-features`** — **3050 passed / 0 failed** (full-workspace aggregate across all crates and test binaries; the +7 new `wifi-densepose-ruvector` tests are included and green). +- **`python archive/v1/data/proof/verify.py`** — **`VERDICT: PASS`** (the Python pipeline proof is independent of these Rust changes — confirmed unaffected). +- New `fusion_bench` compiles and runs under the default feature set. + +--- + +## 7. What changed, file by file + +- `viewpoint/geometry.rs` — `angular_distance` made `pub` (single canonical wrapped-angle helper); real dimensionless GDOP (`sqrt(trace(G⁻¹))`) in `estimate`/`estimate_regularised` (was RMSE mislabelled); `gdop` doc states the quantity and the prior bug; `gdop_is_dimensionless_and_noise_independent` test. +- `viewpoint/attention.rs` — `GeometricBias::build_matrix` uses the canonical wrapped `angular_distance` (contract fix; numeric no-op under cos — disclosed); two contract-pinning tests. +- `viewpoint/fusion.rs` — `fuse`/`fuse_ungated` move embeddings out of `extracted` (single clone, not double); existing tests unchanged and green. +- `mat/triangulation.rs` — `first()?` / `get(i)?` / `get(j)?` guards (no panic on empty table / crafted indices); two no-panic tests. +- `mat/heartbeat.rs` — `band_power` zero-bin guard + bounds clamp (no underflow / out-of-range index); two no-panic tests. +- `benches/fusion_bench.rs` (new) + `Cargo.toml` `[[bench]]` — fusion hot-path bench + the double-clone A/B. + +--- + +## 8. Deferred backlog (NOT silently dropped) + +The review surfaced more than this milestone scoped. Tracked here for a future ADR-156 milestone: + +- **SymphonyQG reproduction** (§5 #1) — reproduce the 3.5–17× QPS-over-HNSW claim on our hardware before integrating into the ruvector ANN path. Currently CLAIMED-only. +- **Multi-bit / Extended RaBitQ** (§5 #2) — implement the `sketch.rs` "Pass 2" (more bits per dimension and/or the randomized rotation) and re-measure top-K coverage against the ADR-084 ≥90% acceptance bar in `sketch_bench`. +- **Learned cross-viewpoint fusion head** (§5 #3, GraphPose-Fi-style) — **data-gated**: blocked on the paired multi-room data ADR-152 measurement (b) identified as the real bottleneck; do not build the architecture first. +- **`CrossViewpointAttention` learned projections** — the default `ProjectionWeights::identity` + mean-pool is honest but unlearned; wiring real learned Q/K/V projections is part of the data-gated item above (no learned weights ⇒ the "attention" is currently a geometric-bias-weighted average, which the code/docs should keep stating plainly). +- **`coherence.rs` / `fusion.rs` micro-opts and the remaining lower-severity review findings** (style, doc, further hot-path tuning) from the fusion gap review. + +--- + +## 9. Consequences + +**Positive.** The fusion path now: uses one canonical wrapped angular-distance helper; reports a **real** dimensionless GDOP instead of a mislabeled RMSE; cannot be panicked by crafted multistatic indices or a zero-bin spectrogram (DoS closed); and does one embedding clone per viewpoint instead of two (measured). Every fix is pinned by a test that fails on the old code, and the ANN/fusion SOTA landscape is graded so the near-term (multi-bit RaBitQ) and the data-gated (learned fusion) are not confused. + +**Negative / honest.** The headline angular-wrap fix is a **numeric no-op** under the current cos kernel — we land it for contract/maintainability, not because it changes an output, and we say so. The two strongest external candidates (SymphonyQG, learned fusion) are **not built here** — one is CLAIMED-pending-reproduction, the other is data-gated by a prior measurement. The perf win is a **local hot-path** improvement, modest in the end-to-end pipeline (attention dominates). None of these is presented as more than it is. diff --git a/api-docs/adr/ADR-157-hardware-sensing-beyond-sota.md b/api-docs/adr/ADR-157-hardware-sensing-beyond-sota.md new file mode 100644 index 00000000..44ac6ecf --- /dev/null +++ b/api-docs/adr/ADR-157-hardware-sensing-beyond-sota.md @@ -0,0 +1,191 @@ +# ADR-157: Hardware / Sensing-Acquisition Layer Beyond-SOTA Sweep — Milestone 3 (An Already-Hardened Layer, Three Small Real Fixes, an Honestly-Null Perf Win, and a Mostly-NO-ACTION SOTA Landscape) + +| Field | Value | +|-------|-------| +| **Status** | Proposed | +| **Date** | 2026-06-11 | +| **Deciders** | ruv | +| **Codebase target** | `wifi-densepose-vitals` (`heartrate.rs`, `breathing.rs`, `anomaly.rs`, `store.rs`), `wifi-densepose-wifiscan` (`pipeline/breathing_extractor.rs`, `pipeline/correlator.rs`, `adapter/netsh_scanner.rs`), `wifi-densepose-hardware` (`esp32_parser.rs`, `sync_packet.rs`, `esp32/secure_tdm.rs`, `ieee80211bf/*`), `wifi-densepose-calibration` (`geometry_embedding.rs`), benches, docs | +| **Relates to** | ADR-021 (ESP32 CSI vitals), ADR-022 (multi-BSSID WiFi sensing), ADR-028 (ESP32 capability audit + witness), ADR-032 (multistatic mesh security), ADR-110 (HE PPDU bandwidth), ADR-151 (per-room calibration), ADR-152 (WiFi-Pose SOTA 2026 intake), ADR-153 (802.11bf forward-compat), ADR-154 (Signal/DSP sweep M0), ADR-155 (NN/Training sweep M1), ADR-156 (RuVector/Fusion sweep M2) | +| **Scope** | Milestone 3 of the beyond-SOTA sweep across the four hardware/sensing-acquisition crates. The honest headline: **this layer is already well-hardened** — the real work is small. Three correctness/stability fixes (each pinned by a test that fails on the old code), one algorithmic perf change whose end-to-end win is **null at realistic window sizes** (disclosed, not inflated) with a committed bench, one defense-in-depth hardening on an unreachable path, a **MEASURED negative-results section** (the centerpiece — what was investigated and found already-correct), a graded SOTA landscape that is **mostly NO-ACTION**, and a deferred backlog. **Nothing is silently dropped.** | + +--- + +## 0. PROOF discipline (this ADR's contract) + +This project has been publicly accused of "AI slop." Milestone 3 answers with **evidence, not adjectives** — the same contract as ADR-154/155/156: + +- Every correctness/stability fix ships a **committed regression test that fails on the old code and passes on the new**. Each was verified by reverting the fix and observing the test fail (recorded in §6). +- Every perf number is **MEASURED before/after** with the exact reproduce command and a committed criterion bench. Where the win is below noise, we **say so and claim nothing** — see §4, which is a deliberately-disclosed near-null result. +- Every external SOTA reference is graded **MEASURED** / **CLAIMED** / **DATA-GATED**, and where the right answer is "do nothing," we record the negative result explicitly (§5) — a stronger anti-slop signal than a fix. +- The headline of this milestone is itself a negative result: **the acquisition layer was already hardened.** We disclose what we *checked and did not change* (§3) in as much detail as what we changed (§2), because "investigated, already correct, no action" is the most honest thing a sweep can report when it is true. + +Test machine for the perf numbers: Windows 11, `cargo bench --release`, criterion 0.5. Numbers are wall-clock medians on this box; the **ratio** (before/after) is the claim, not the absolute ns. + +Build/test gate: `cargo test --workspace --no-default-features` (the project's standard gate — no GPU/`crv` features). All fixes in this milestone are on the **default, non-feature-gated surface**, so they are fully exercised by the standard gate. The serde-validated `ieee80211bf` types are additionally verifiable with `--features serde`; the live-QUIC path in `secure_tdm` is structurally tested (HMAC/replay/tamper) but not live-socket-tested in CI. + +--- + +## 1. Context + +The hardware/sensing-acquisition layer is the bottom of the stack: it turns raw RF (ESP32 CSI frames, multi-BSSID netsh scans, 802.11bf measurement reports) into typed, validated domain objects that the signal/fusion/NN layers above consume. A beyond-SOTA review of the four crates surfaced far **fewer** real defects than the signal (ADR-154) or fusion (ADR-156) sweeps — because this layer was written defensively from the start: length-gated parsers, `Option`-returning helpers, `#[serde(try_from)]` validate-on-deserialize, FSMs that return `Result` instead of panicking, and HMAC-authenticated + replay-protected TDM beacons. + +The genuine findings are three: an **O(n²) sliding-window data-structure choice** in the vital-sign extractors (perf, latent), a **partial-weights scale-mixing bug** in breathing fusion (correctness), and an **IIR resonator that can diverge at pathologically low sample rates** (stability). Everything else the review flagged turned out to be already-safe — documented in §3 as MEASURED negative results. + +--- + +## 2. Decision — the fixes that landed + +Each correctness/stability fix ships a regression test on the non-feature-gated, workspace-tested surface. + +### 2.1 §A1 — `Vec::remove(0)` O(n²) sliding windows → `VecDeque` (PERF, latent; MEASURED via bench — near-null at realistic sizes, disclosed) + +**The finding.** Every fixed-length sliding window in the extractors was a `Vec`/`Vec` whose oldest-sample eviction used `Vec::remove(0)` — an **O(n) shift of the whole buffer on every sample**, making a full-window `extract()` sweep O(n²). Six sites: + +| File | Site | Buffer | +|------|------|--------| +| `vitals/heartrate.rs` | `extract` history window | `Vec` → `VecDeque` | +| `vitals/breathing.rs` | `extract` history window | `Vec` → `VecDeque` | +| `vitals/anomaly.rs` | `rr_history` / `hr_history` | `Vec` → `VecDeque` (×2) | +| `vitals/store.rs` | `readings` ring buffer | `Vec` → `VecDeque` | +| `wifiscan/pipeline/breathing_extractor.rs` | filtered history | `Vec` → `VecDeque` | +| `wifiscan/pipeline/correlator.rs` | per-BSSID histories | `Vec>` → `Vec>` | + +**The fix.** Swap to `VecDeque` with `push_back` + `pop_front` (O(1) eviction). Where the autocorrelation / zero-crossing / Pearson loop needs a contiguous slice, call `make_contiguous()` (or `as_slices().0` after it) **once per `extract()`**. This matches the idiom already used correctly in `wifiscan/pipeline/orchestrator.rs`. **Output is bit-identical** — no behavior test bites; the change is bench-gated. + +**The honest measurement (§4).** In **isolation**, the eviction cost collapses from O(n²) to O(n): a microbenchmark of pure eviction shows **34.6× at window=3000 and 3158× at window=100000**. But in the **full `extract()` path at realistic ESP32 window sizes** (heartrate ~1500, breathing ~3000), the per-frame DSP (autocorrelation is O(window·lags); zero-crossing is O(window)) **dominates the eviction entirely**, so the end-to-end win is **below noise** — measured `heartrate` 42.8 ms (before) vs 44.4 ms (after), `breathing` 7.95 ms vs 7.86 ms: overlapping confidence intervals, **no measurable change**. We land A1 because it is the correct data structure and removes a latent O(n²) that *would* bite at higher sample rates or longer windows — **not** because it speeds up the current hot path, which it does not measurably. Claiming an end-to-end speedup here would be exactly the inflation this sweep exists to prevent (the same discipline ADR-156 §2.1 applied to its cos no-op). + +### 2.2 §A2 — `breathing.rs` partial-weights scale-mixing (CORRECTNESS, real) + +**The finding.** `BreathingExtractor::extract` fused per-subcarrier residuals as `Σ residuals[i]·w[i]` where `w[i] = weights.get(i).unwrap_or(1/n)`. The result was **never normalized**. When `weights` was supplied **shorter than** `n`, the supplied entries (e.g. attention weights ~10.0) were used **raw** while the missing tail defaulted to `uniform_w = 1/n` (~0.125) — two scales summed with no renormalization, **silently mis-scaling the breathing signal** by a factor that depends on `weights.len()`. A caller passing 2 high attention weights for an 8-subcarrier frame got a fused value ~20× too large. + +**The fix.** Extracted the fusion into `fuse_weighted_residuals(residuals, weights, n)` and normalized by `Σ(effective weights)` — `weighted_sum / weight_total` — mirroring the **already-correct** pattern in `heartrate::compute_phase_coherence_signal`. A partial weight slice now produces a true weighted average in the residual range, independent of `weights.len()`. + +**Tests (fail on old code, verified by reverting — §6):** +- `partial_weights_are_renormalized_not_scale_mixed` — `residuals=[1.0;8]`, `weights=[10.0,10.0]` → fused value `1.0` (the renormalized weighted mean), and explicitly **not** the old scale-mixed sum `2·10 + 6·0.125 = 20.75`. +- `partial_weights_fusion_is_weighted_average` — differing residuals → a proper weighted average within `[0, 2]`, which the old un-normalized sum is not. + +### 2.3 §A3 — IIR resonator divergence at pathologically low sample rate (STABILITY, real) + +**The finding.** Both extractors' `bandpass_filter` set the resonator pole radius `r = 1 - bw/2` with `bw = 2π(f_high − f_low)/fs`. The **research report's stated trigger ("`fs` below ~4 Hz") is incorrect**, and we say so: the resonator pole *magnitude* is `|r|`, and the filter is stable for any `|r| < 1` — a merely-**negative** `r` is still stable. Divergence requires `|r| ≥ 1`, i.e. `bw ≥ 4`, i.e. `fs` very low **relative to the band width** (e.g. `fs = 0.5` Hz with a 0.1–0.9 Hz band → `bw = 10.05`, `r = −4.03`, `|r| = 4.03 > 1`). When that holds, the filter **diverges exponentially**: a unit-step input reaches `~10^183` within 300 frames and **overflows f64 to ±inf within ~600 frames**. Once one inf enters `filtered_history`, the autocorrelation `acf0`/zero-crossing path produces NaN and the extractor is **permanently dead** (silent stall until `reset()`). + +**The fix.** Two layers of defense-in-depth: +1. **Clamp** `r` to a stable range: `r = (1.0 - bw/2.0).clamp(0.0, 0.9999)` — keeps the pole inside the unit circle for **any** sample-rate / band-edge configuration. (We document honestly that the divergence condition is `|r| ≥ 1`, not "`r` negative.") +2. **Finite-guard** before the history push: `if !filtered.is_finite() { return None; }` — mirrors the NaN-bypass guard in ADR-154 §3, so even a future divergence cannot poison the buffer. + +Applied to **both** `heartrate.rs` and `breathing.rs` (identical resonator block). + +**Tests (fail on old code, verified by reverting — §6):** `heartrate::low_sample_rate_filter_stays_finite` and `breathing::low_sample_rate_filter_stays_finite` — construct at `fs=0.5` with a 0.1–0.9 Hz band, feed a unit step for 600 frames, assert **every** `filtered_history` sample is finite. On the old code these **panic** (a `filtered_history[i]` is inf/NaN); on the new code all samples are finite. + +### 2.4 §D1 — new `vitals/benches/vitals_bench.rs` (MEASURED) + +A new criterion bench (`harness = false`, registered in `Cargo.toml`) drives each extractor from empty to a full window (`heartrate` 1500 samples, `breathing` 3000) so the A1 sliding-window bookkeeping is exercised across the whole buffer. Follows the criterion style of the existing `hardware/benches/transport_bench.rs` and ADR-156's `fusion_bench`. Numbers and the honest interpretation are in §4. + +### 2.5 §B1 — `ieee80211bf/transport.rs` drop-instead-of-truncate (HARDENING, unreachable path — disclosed) + +`OpportunisticCsiBridge::ingest` built `CsiReportPayload { n_subcarriers: self.amp_accum.len() as u16, … }`. The `as u16` would silently wrap a count above 65 535. **This is unreachable in practice**: `ingest` gates `frame.subcarrier_count() > MAX_REPORT_SUBCARRIERS` (484) at entry and returns `None`, and `report.validate()` independently rejects oversized counts downstream. We replaced the cast with `u16::try_from(self.amp_accum.len()).ok()?` (drop-instead-of-truncate) so the construction is **correct-by-construction** rather than relying on the upstream gate. We disclose this as **defense-in-depth on an unreachable path, not a live bug** — no behavior change, no new test (the gate already prevents the input that would exercise it). + +### 2.6 §B4 — constant-time HMAC tag compare: **DEFERRED, not landed** (disclosed) + +`secure_tdm.rs:284` compares the 8-byte HMAC tag with `self.hmac_tag == expected` (data-dependent, non-constant-time). The research authorized adding `subtle::ConstantTimeEq` **only if `subtle` were already a direct dependency** — it is not (only transitive, via a crypto crate). Per that guidance, and because this is an **8-byte tag on a LAN multistatic sync beacon** (not a remote attacker-controlled timing-oracle surface), we **do not add a direct dependency** for it. Tracked in §8 as a deferred item, not silently dropped. + +--- + +## 3. The MEASURED negative-results section (the centerpiece — what was investigated and found already-correct) + +This is the core of ADR-157. The acquisition layer was hardened before this sweep; the strongest anti-slop evidence is an honest accounting of what we **checked and did not need to change**. Each is verified against the live code with a file:line citation. + +| Area | Claim verified | Evidence (file:line) | Verdict | +|------|----------------|----------------------|---------| +| **ESP32 parser subcarrier index math** | A crafted CSI frame cannot panic via the subcarrier-index arithmetic. The total-frame-size length gate (`data.len() < HEADER_SIZE + n_antennas·n_subcarriers·2 → Err`) dominates **every** subsequent `data[byte_offset]`/`[+1]` access; `n_subcarriers ≤ 256`, `n_antennas ≤ 4` are header-bounded, and the `index` math is pure i16 arithmetic with no indexing. | `esp32_parser.rs:211` (length gate) guards the loop at `:224–242` | **Already safe — NO ACTION** | +| **`sync_packet.rs` `try_into().unwrap()`** | The four `try_into().unwrap()` calls are **infallible**: each slices a fixed-width sub-range (`[0..4]`, `[8..16]`, `[16..24]`, `[24..28]`) of a buffer already guaranteed `len() >= SYNC_PACKET_SIZE` (32) by the early `return Err(InsufficientData)`. | `sync_packet.rs:88` (length gate) → `:94,102,103,104` (fixed-width slices) | **Already safe — NO ACTION** | +| **The entire `ieee80211bf/` 802.11bf model** | Validate-on-deserialize and no-panic-by-construction throughout. `MeasurementSetupId` is `#[serde(try_from = "u8")]` rejecting `> MAX_SETUP_ID` (127); `ThresholdParams` is `#[serde(try_from = "RawThresholdParams")]` routing every deserialize through `ThresholdParams::new`; the session FSM `handle()` returns `Result, BfError>` (never panics) and enforces **single-role** (`self.role != Initiator/Responder → Err`) on every transition; the SBP request is validated through the **same** single `evaluate_setup` chain as a direct setup (no SBP-only policy bypass). | `types.rs:160–161` (setup-id try_from), `:225–226` (threshold try_from), `:165` (range check); `session.rs:118` (`handle` → Result), `:130/143/166/182` (single-role), `messages.rs:130–147` (SBP single-evaluate) | **Already SOTA-shaped — NO ACTION** | +| **`secure_tdm.rs` HMAC + replay** | Beacon authentication (HMAC-SHA256, 8-byte tag), tamper rejection, and replay-window protection are correct and tested. (The non-constant-time compare at `:284` is the only nit — §2.6, deferred as out-of-threat-model for an 8-byte LAN tag.) | `secure_tdm.rs:279` (`verify`), `:284` (compare), tests `:614–673` (replay), `:728` (tamper) | **Correct — NO ACTION (B4 deferred)** | +| **`netsh_scanner.rs` command + parse** | No shell-injection surface: the scanner uses a **fixed argv** (`Command::new("netsh").args(["wlan","show","networks","mode=bssid"])`) — no shell, no interpolation. Parsing is **`Option`-based** (`try_parse_ssid_line`/`try_parse_bssid_line`/`try_parse_signal_line` → `Option`, with `.unwrap_or(default)`), so hostile/garbled netsh output is silently skipped, never panicked. | `netsh_scanner.rs:50–51` (fixed argv), `:96–102` (`unwrap_or` defaults), `:242/257/270` (`Option` parsers) | **Already safe — NO ACTION** | +| **`calibration/geometry_embedding.rs` overflow guard** | The geometry embedding clamps every position/std-dev component into `±MAX_COORD_M` (1000 m) via `clamp_m`, explicitly to stop adversarial coordinates from overflowing the covariance accumulation into `inf`; the documented invariant ("every value is finite, never NaN/inf") holds. | `geometry_embedding.rs:55` (`MAX_COORD_M`), `:145/150` (`clamp_m` on centroid + std-dev) | **Already safe — NO ACTION** | + +--- + +## 4. The §D1 perf measurement (MEASURED — honestly near-null end-to-end) + +New bench: `crates/wifi-densepose-vitals/benches/vitals_bench.rs`, two functions covering a full-window fill of each extractor. + +- **Reproduce:** `cargo bench -p wifi-densepose-vitals --bench vitals_bench` + (compile-only: append `--no-run`; the medians below used `-- --warm-up-time 1 --measurement-time 3 --sample-size 20`). + +**End-to-end `extract()` full-window fill, medians:** + +| Bench | Before (`Vec::remove(0)`) | After (`VecDeque`) | Verdict | +|-------|---------------------------|--------------------|---------| +| `heartrate_extract_full_window_1500` | 42.81 ms `[42.19, 42.81, 43.46]` | 44.37 ms `[43.55, 44.37, 45.19]` | **no measurable change** (after marginally slower; intervals overlap) | +| `breathing_extract_full_window_3000` | 7.95 ms `[7.86, 7.95, 8.05]` | 7.86 ms `[7.66, 7.86, 8.04]` | **no measurable change** (intervals overlap) | + +The end-to-end effect is **null within noise** because the per-frame DSP dominates: heartrate runs an O(window·lags) autocorrelation every frame (≈1500·125 multiply-adds), which utterly swamps the O(window) eviction the A1 change improves; breathing's O(window) zero-crossing and the `make_contiguous` rotation are the same order as the old `remove(0)` memmove at these sizes. + +**Where the win actually lives (isolated eviction-only microbench, supporting evidence — not in the committed bench):** + +| Window | `Vec::remove(0)` (eviction only) | `VecDeque` | Speedup | +|--------|----------------------------------|------------|---------| +| 3 000 | 1.00 ms | 0.029 ms | **34.6×** | +| 20 000 | 94.5 ms | 0.122 ms | **773×** | +| 100 000 | 3 139 ms | 0.994 ms | **3 158×** | + +So A1 is **algorithmically correct and removes a real latent O(n²)** that would bite at higher sample rates or longer analysis windows — but at the **current** ESP32 window sizes the end-to-end win is below noise, and we claim nothing more. This is the §0 contract in action: a perf claim without a measured before/after improvement is **not made**. + +--- + +## 5. The hardware/sensing SOTA landscape (graded — mostly NO-ACTION, honest) + +Grades: **MEASURED** (source measured it, ideally public method/code), **CLAIMED** (asserted, no reproducible artifact), **DATA-GATED** (blocked on data we don't have, per a prior ADR-152 measurement). + +| # | Area | Candidate / question | Grade | Verdict | +|---|------|----------------------|-------|---------| +| 1 | **CSI vital signs (HR/BR)** | Deep-CSI vital-sign models report **MAE ~2–3 BPM** vs our classical IIR-bandpass + autocorrelation/zero-crossing. | **DATA-GATED + CLAIMED** | **NO ACTION on method.** A deep model needs **paired PPG/ECG ground truth** we do not have, and no public ESP32 artifact reproduces the cited MAE on commodity CSI. Our classical method is the honest commodity baseline; the real wins this milestone are the A1/A3 robustness fixes, not a new model. | +| 2 | **802.11bf-2025 conformance** | Adopt a conformance test-vector suite for the `ieee80211bf/` forward-compat model. | **CLAIMED (not public)** | **NO ACTION.** No commodity silicon ships a conformant 802.11bf interface as of 2026, and the conformance suites are **WBA / Wi-Fi Alliance pre-certification** material, **not public**. Our model's "no OTA encoding until silicon exists" posture (ADR-153) is the correct one. Tracked in §8: *add SBP conformance vectors when the WFA publishes a test plan* — we will **not invent vectors**. | +| 3 | **Per-room calibration (ADR-151)** | Bank-of-specialists + drift-veto vs a 2026 calibration SOTA. | **CLAIMED on numbers, DATA-GATED on a head-to-head** | **NO ACTION on architecture.** The bank-of-specialists + drift-veto design is SOTA-shaped, but we have **no head-to-head PCK** against a published method (no paired multi-room data). The geometry-conditioned LoRA head is **built-but-unconsumed** and data-gated → **ACCEPTED-FUTURE** (§8), not built now. | +| 4 | **Multi-BSSID throughput (wifiscan)** | The module docs assert a native `wlanapi.dll` FFI 10–20 Hz path; the current `WlanApiScanner` wraps `netsh` (~2 Hz). | **CLAIMED-unmeasured** | **NO ACTION + corrected expectation.** The native FFI fast path is **asserted but NOT implemented** — the live scanner is the ~2 Hz netsh shim. The "10×" is unmeasured. → **ACCEPTED-FUTURE** (§8). **We explicitly do NOT claim a speedup that does not exist.** | + +--- + +## 6. Validation + +- **Bug-catching tests verified to bite.** Each §A2/§A3 fix was reverted and the corresponding test observed to fail on the old code, then restored: + - `partial_weights_are_renormalized_not_scale_mixed`, `partial_weights_fusion_is_weighted_average` — **assertion failure** (returned the old un-normalized scale-mixed sum) on old code. + - `heartrate::low_sample_rate_filter_stays_finite`, `breathing::low_sample_rate_filter_stays_finite` — **panic** (a `filtered_history[i]` is inf/NaN) on old code. + - §A1 is the **disclosed bit-identical change**: no behavior test bites (correctly — output is unchanged); the bench (§4) is the gate, and it shows **no measurable end-to-end change**, which we report honestly. + - §B1 is on an **unreachable path** (gated upstream), so it carries no new test — disclosed as defense-in-depth, not a live bug. +- **`cd v2 && cargo test -p wifi-densepose-vitals -p wifi-densepose-hardware -p wifi-densepose-wifiscan -p wifi-densepose-calibration --no-default-features`** — all green. Lib-test counts: `wifi-densepose-vitals` **55** (was 51; +4 net new bug-catching tests — two §A2, two §A3), `wifi-densepose-hardware` **163**, `wifi-densepose-wifiscan` **87**, `wifi-densepose-calibration` **58**. 0 failures across all four. +- **`cd v2 && cargo test --workspace --no-default-features`** — **3054 passed / 0 failed** (M2 left the workspace at 3050; the +4 net new bug-catching tests are included and green). +- **`python archive/v1/data/proof/verify.py`** — **`VERDICT: PASS`**, pipeline hash unchanged `f8e76f21…46f7a` (these are Rust-only changes; the Python pipeline proof is independent and confirmed unaffected). +- New `vitals_bench` compiles and runs under the default feature set. +- **Disclosed validation limits:** the live-QUIC transport in `secure_tdm` is **structurally** tested (HMAC compute/verify, tamper, replay-window) but **not live-socket-tested** in CI; the serde-gated `ieee80211bf` types are additionally verifiable with `--features serde`. Clippy is not installed in the local 1.89 toolchain, so the per-crate lint pass was not run locally (the project gate is `cargo test`). + +--- + +## 7. What changed, file by file + +- `vitals/heartrate.rs` — `filtered_history: Vec` → `VecDeque` (`push_back`/`pop_front`, `make_contiguous` once per `extract`); resonator `r` clamped to `[0, 0.9999]`; finite-guard before history push; corrected divergence-condition doc (`|r| ≥ 1`, not "`r` negative"); `low_sample_rate_filter_stays_finite` test. +- `vitals/breathing.rs` — same `VecDeque` + clamp + finite-guard changes; weighted fusion extracted to `fuse_weighted_residuals` and **normalized by Σ(effective weights)** (the §A2 fix); three new tests (two A2, one A3). +- `vitals/anomaly.rs`, `vitals/store.rs` — sliding/ring buffers → `VecDeque` (O(1) eviction); `store::history` takes `&mut self` to hand back a contiguous slice via `make_contiguous` (no external callers; observable contents unchanged). +- `wifiscan/pipeline/breathing_extractor.rs` — `VecDeque` + `make_contiguous`. +- `wifiscan/pipeline/correlator.rs` — per-BSSID histories → `Vec>`; contiguous-ize each touched buffer once before the Pearson pass. +- `hardware/ieee80211bf/transport.rs` — `n_subcarriers: … as u16` → `u16::try_from(…).ok()?` (§B1 drop-instead-of-truncate, unreachable-path hardening). +- `vitals/Cargo.toml` + `vitals/benches/vitals_bench.rs` (new) — criterion dev-dep, `[[bench]]`, the §D1 full-window benches. + +--- + +## 8. Deferred backlog (NOT silently dropped) + +- **§B4 constant-time HMAC compare** — `secure_tdm.rs:284` uses `==` on the 8-byte tag. Add `subtle::ConstantTimeEq` **if** `subtle` becomes a direct dependency for another reason; not worth a new dependency for an 8-byte LAN sync-beacon tag (out of the current threat model). Deferred, not dropped. +- **802.11bf SBP conformance vectors** (§5 #2) — add real conformance test vectors to the `ieee80211bf/` model **when the Wi-Fi Alliance / WBA publishes a public test plan**. Do not invent vectors before then. +- **Geometry-conditioned LoRA calibration head** (§5 #3) — built-but-unconsumed and **data-gated** on paired multi-room PCK data (ADR-152 measurement (b): data, not architecture, is the bottleneck). ACCEPTED-FUTURE. +- **Native `wlanapi.dll` FFI multi-BSSID fast path** (§5 #4) — the asserted 10–20 Hz path is **not implemented**; the live scanner is the ~2 Hz netsh shim. Implement and **measure** the real throughput before claiming any multiple. ACCEPTED-FUTURE, CLAIMED-unmeasured until then. +- **Deep-CSI vital-sign model** (§5 #1) — DATA-GATED on paired PPG/ECG ground truth. No public ESP32 artifact reproduces the cited ~2–3 BPM MAE. Not on the near-term path. + +--- + +## 9. Consequences + +**Positive.** The vital-sign extractors now use the correct O(1)-eviction data structure (no latent O(n²)), cannot mis-scale a breathing estimate from a partial attention-weight slice, and cannot be silently killed by a diverging IIR filter at a pathological sample rate. The 802.11bf construction site drops-instead-of-truncates on an (already-gated) oversized count. Most importantly, the layer's existing hardening — length-gated parsers, infallible fixed-width slices, validate-on-deserialize, no-panic FSMs, fixed-argv scanning, HMAC+replay TDM, overflow-clamped geometry embeddings — is now **documented as MEASURED negative results** with file:line evidence, so a reader can verify the "already safe" claims rather than take them on faith. + +**Negative / honest limits.** The §A1 perf change is **null end-to-end** at realistic window sizes — we land it for correctness, not speed, and the committed bench proves the null rather than hiding it. The research report's stated §A3 divergence trigger ("`fs` below ~4 Hz") was **physically inaccurate** (divergence needs `|r| ≥ 1` ⇒ `bw ≥ 4`, a far lower `fs`); we corrected it in the code comments and the test parameters and disclose the correction here. The strongest external SOTA candidates (deep-CSI vitals, learned calibration, native FFI scanning) are **all NO-ACTION or ACCEPTED-FUTURE** — data-gated, unmeasured, or blocked on a non-public conformance suite — and **none is presented as more than it is.** §B4 is consciously deferred. Nothing in this milestone is inflated beyond what a reverting reviewer can reproduce. diff --git a/api-docs/adr/ADR-158-mat-worldmodel-beyond-sota.md b/api-docs/adr/ADR-158-mat-worldmodel-beyond-sota.md new file mode 100644 index 00000000..af4261c6 --- /dev/null +++ b/api-docs/adr/ADR-158-mat-worldmodel-beyond-sota.md @@ -0,0 +1,212 @@ +# ADR-158: MAT / World-Model Cluster — Beyond-SOTA Sweep, Anti-"AI-Slop" Hardening + +- **Status**: accepted +- **Date**: 2026-06-11 +- **Deciders**: ruv +- **Tags**: mat, life-safety, localization, triage, worldmodel, worldgraph, geo, engine, prove-everything + +## Context + +This ADR records the beyond-SOTA sweep over the MAT / world-model cluster +(`wifi-densepose-mat`, `-worldmodel`, `-worldgraph`, `-geo`, `-engine`), executed +under the project's **prove-everything / anti-"AI-slop"** directive: every stub is +either implemented with real logic or replaced by an honest typed error; no +fake/always-empty/random outputs; tests pass on real behaviour; results are graded +**MEASURED** (reproduced here with the command recorded), **CLAIMED**, +**DATA-GATED** (real code path present, needs hardware/data we lack), or +**NO-ACTION** (already-SOTA — cited as a positive). + +The Mass Casualty Assessment Tool touches life-safety. A triage metric that is +disconnected from the decision it gates, or a survivor count that inflates, is the +worst class of slop: it produces confident, wrong rescue prioritisation. An audit +against live code found six concrete defects, four of which were silent +correctness bugs (not missing features) in the triage → gate → record path and in +the localization/dedup path. + +Grading vocabulary follows ADR-152 (F-evidence grades) and the sweep convention: +- **MEASURED** — reproduced in this worktree, command recorded below. +- **DATA-GATED** — real code path implemented; returns a typed error / honest + provenance flag where hardware or labelled data is genuinely absent. +- **NO-ACTION (already-SOTA)** — audited, found correct, cited as a positive. +- **ACCEPTED-FUTURE** — deliberately deferred, nothing dropped. + +## Graded SOTA Landscape + +| Capability | Grade | Note | +|------------|-------|------| +| RF-through-rubble survivor detection | **DATA-GATED** | Real detection + triage + localization code paths run end-to-end on real CSI bytes; field detection *accuracy* is unproven without instrumented rubble trials and is **not fabricated** here. | +| OccWorld occupancy architecture (`-worldmodel`) | **NO-ACTION (current)** | `occupancy.rs` voxel mapping is clamp-proven bounds-safe; converts WorldGraph person positions to a 200×200×16 grid with no out-of-bounds path. | +| WorldGraph provenance / privacy / pruning (`-worldgraph`) | **NO-ACTION (already-SOTA)** | `graph.rs` implements append-with-provenance (`DerivedFrom`), deterministic LRU pruning, and a privacy rollup (`PrivacyLimitedBy`). Cited as a positive; no changes needed. | +| Point-cloud parser bounds-safety (`-pointcloud`) | **NO-ACTION (already-SOTA)** | Another agent's crate; cited only — its parser is bounds-checked. Out of scope for this ADR's edits. | +| Learned multi-person counter | **DATA-GATED** | Deferred; requires labelled multi-occupant CSI. The zone+vitals-signature dedup (below) is the honest non-learned stand-in. | +| RF point-cloud generation | **ACCEPTED-FUTURE** | Not dropped; tracked as future work. | + +## Decision — Fixes Landed (MEASURED) + +### §1 Unify the two divergent triage engines (CRITICAL) + +**Was:** `EnsembleClassifier::determine_triage` (ensemble gate) and +`TriageCalculator::calculate` (survivor record) were two different START-protocol +approximations with different rate bands and movement handling. The pipeline +gated on the ensemble's confidence (`lib.rs:489`), discarded the ensemble triage +(`lib.rs:524`, `_ensemble`), and recomputed via `TriageCalculator` in +`Survivor::new` (`survivor.rs:194`). A survivor could be admitted at one priority +and recorded at another. + +**Now:** `determine_triage` delegates to `TriageCalculator` — the **single source +of truth** used by both the gate and the survivor record. The only ensemble- +specific behaviour retained is the confidence gate (low confidence → `Unknown`, +except `Immediate`, which is never suppressed — a missed survivor in distress is +costlier than a false positive). Rate bands follow START (<10 / >30 bpm → +Immediate). + +**Failing-on-old test:** `detection::ensemble::tests::test_divergent_boundary_28bpm_tremor_gate_equals_survivor` +— 28 bpm Normal + Tremor. Old gate → Delayed, old survivor record → Immediate +(divergent). Unified result: gate == survivor == **Immediate**. Companion tests +(`test_no_vitals_is_unknown_canonical`, `test_normal_breathing_no_movement_is_immediate_canonical`, +the updated `integration_adr001::test_ensemble_classifier_triage_logic`) assert +gate-vs-record equality on every boundary. + +### §2 Real RSSI/ToA localization + kill count-inflation (HIGH) + +**Was:** `fusion.rs:79 simulate_rssi_measurements` always returned `vec![]`, so +every survivor got `location: None`, so spatial dedup (`disaster_event.rs:285`, +which only fired on `Some` location) was disabled. One trapped person re-detected +across N scan cycles became **N survivors** — a fabricated mass-casualty count. + +**Now, two real mechanisms:** +1. **Real RSSI source:** `SensorPosition` gains an optional `last_rssi` + (populated by the hardware layer from actual signal-strength readings). + `collect_rssi_measurements` reads only real per-sensor RSSI and feeds the + existing triangulator; it **never fabricates** a value. With `< min_sensors` + real readings, `estimate_position` returns `None` (honest). +2. **Zone + vitals-signature dedup:** when no usable location exists, + `record_detection` matches an existing *active, un-located* survivor in the + same zone whose latest vital signature (breathing presence + START rate band, + heartbeat presence, movement class) is compatible — collapsing repeat + detections of one person while keeping genuinely distinct survivors separate. + +**MEASURED:** `test_identical_vitals_no_location_dedup_to_one` — 3× identical-vitals +/ `None`-location → **1 survivor** (old code: 3). `test_distinct_vitals_no_location_stay_separate` +keeps two distinct survivors at 2 (no under-count). `test_estimate_position_uses_real_rssi` +yields a position from 3 real-RSSI sensors; `test_estimate_position_none_without_real_rssi` +yields `None` (no fabrication). + +### §3 Real ESP32/UDP/PCAP CSI ingest; honest typed errors elsewhere (HIGH) + +**Was:** `hardware_adapter.rs read_esp32_csi` / `read_udp_csi` / `read_pcap_csi` +returned "not yet implemented" — even though `csi_receiver.rs` already contained a +working `CsiParser` (ESP32 CSV, JSON, Intel5300/Atheros/Nexmon byte decoders) and a +real `PcapCsiReader`. + +**Now:** +- **UDP** — binds, receives one datagram, parses (auto-detect) → `CsiReadings`. + End-to-end test sends a real JSON datagram on the wire. +- **PCAP** — `load` + `read_next` + parse. End-to-end test writes a real + little-endian `.pcap` with one record and reads it back. +- **ESP32** — parses `CSI_DATA` CSV via the real parser. Live serial byte I/O is + behind an optional `serial` cargo feature (native `serialport` kept off the + default / aarch64 appliance build); with the feature off, live reads return a + typed `UnsupportedAdapter` while the byte parser still works. +- **Intel 5300 / Atheros / PicoScenes** — return typed + `AdapterError::HardwareUnavailable` / `UnsupportedAdapter` (no device, no + driver, or no validatable format here). **Never fake CSI.** New error variants + added to make the gating typed rather than a `String` "Hardware" soup. + +**MEASURED:** `test_esp32_bytes_parse_end_to_end`, `test_udp_read_end_to_end`, +`test_pcap_read_end_to_end`, `test_intel_and_atheros_are_honestly_unavailable`. + +### §4 Real parabolic peak interpolation in `find_dominant_frequency` (MED) + +**Was:** `breathing.rs:243` comment claimed interpolation but returned the bin +center, capping breathing-rate resolution at ±half a bin. + +**Now:** 3-point parabolic (quadratic) peak interpolation, +`δ = 0.5·(yL − yR)/(yL − 2y0 + yR)`, clamped to `[-0.5, 0.5]`, with an edge +fallback to bin center. + +**MEASURED:** `test_find_dominant_frequency_parabolic_interpolation` — for a +parabola-shaped peak at true bin 10.4 the recovery is exact (δ = 0.4); the test +asserts the result lands within half a bin of truth and strictly beats the +old bin-center estimate. + +### §5 GDOP honesty (LOW) + +**Was:** `triangulation.rs:248 estimate_gdop` returned an ad-hoc average-pair-angle +factor *labelled* GDOP (the same defect class ADR-156 §2.3 fixed elsewhere). + +**Now:** real, dimensionless **GDOP = √(trace((HᵀH)⁻¹))** from the range-measurement +Jacobian `H` (unit target→sensor bearings), returning `None` for singular +(collinear) geometry, which the caller treats as factor 1.0 (no fabrication). + +**MEASURED:** `test_gdop_is_real_dilution` — a well-spread array gives a lower GDOP +than a near-collinear one, cross-checked against the closed form; +`test_gdop_singular_collinear_is_none` confirms singular geometry returns `None`. + +### §6 OccWorld trajectory-prior consumer honesty (fail-safe) + +**Finding:** `wifi-densepose-mat` does **not** consume OccWorld trajectory priors +and has no `-worldmodel`/`-worldgraph`/occworld dependency (grep-verified: zero +hits across `crates/wifi-densepose-mat/`). There is therefore no random-derived +prior being consumed. **No code change** is warranted; the fail-safe (ignore +priors until a typed `weights_complete`/`stubbed` flag exists) is already the +status quo by absence. Recorded here so a future consumer wires the flag rather +than re-introducing the risk. + +## Negative Results (Confirmed — NO-ACTION) + +These were audited and found genuinely correct; they are cited as positives, not +edited: + +- **`worldgraph` provenance / privacy / pruning** (`graph.rs`) — append-with- + provenance (`add_semantic_state` + `DerivedFrom`), deterministic LRU pruning + (`prune_semantic_states`, with `prune_is_deterministic_for_equal_timestamps`), + and a privacy rollup (`apply_privacy_mode` → `PrivacyLimitedBy`). Already-SOTA. +- **`worldmodel` occupancy clamp** (`occupancy.rs:74–125`) — `to_voxel_xy` / + `to_voxel_z` `.clamp()` voxel indices into `[0, GRID-1]`; the flat index is + always in-bounds. No out-of-bounds / fabrication path. +- **`pointcloud` parser bounds-safety** — another agent's crate; cited only, its + parser is bounds-checked. + +## Deferred Backlog (Nothing Dropped) + +- **Learned multi-person counter** — DATA-GATED on labelled multi-occupant CSI. + The zone+vitals-signature dedup (§2) is the honest non-learned stand-in until + then. +- **RF point-cloud generation** — ACCEPTED-FUTURE. +- **PicoScenes container decode** — DATA-GATED; needs matching NIC/plugin to + validate against. Returns `UnsupportedAdapter` today. +- **Intel 5300 / Atheros live capture** — DATA-GATED on patched drivers; byte + parsers exist and are exercised on supplied bytes. + +## Consequences + +- Triage is now a single auditable function; gate and survivor record can never + diverge. +- Survivor counts cannot inflate from repeat detection of one un-located person. +- The CSI ingest layer either produces real data or fails with a typed error that + names *why* — no path silently substitutes simulated/fabricated CSI. +- `SensorPosition` grows an optional `last_rssi` field (serde-`default`, non- + breaking for deserialisation; 7 constructors updated). +- A new optional `serial` feature isolates the native `serialport` dependency from + the default / appliance builds. + +## Reproduction (MEASURED) + +```bash +cd v2 +# MAT — default features (181 unit + 6 + 3[3 ignored] integration) +cargo test -p wifi-densepose-mat +# MAT — all features (same counts; exercises ruvector + api + serde paths) +cargo test -p wifi-densepose-mat --all-features +# MAT — serial feature compiles (native serialport path) +cargo check -p wifi-densepose-mat --features serial +# Sibling crates (cited NO-ACTION; confirmed green) +cargo test -p wifi-densepose-worldmodel # 12 + 1 +cargo test -p wifi-densepose-worldgraph # 9 +cargo test -p wifi-densepose-geo # 9 + 8 +cargo test -p wifi-densepose-engine # 27 +``` + +Result at time of writing: MAT **181 passed; 0 failed** (default and all-features); +worldmodel **13**, worldgraph **9**, geo **17**, engine **27** — all 0 failed. diff --git a/api-docs/research/soul/specification.md b/api-docs/research/soul/specification.md index e452f6e3..ba919525 100644 --- a/api-docs/research/soul/specification.md +++ b/api-docs/research/soul/specification.md @@ -411,6 +411,23 @@ include a conformance layer if regulatory certification is sought. ### 3.6 Matching Algorithm +> **Implementation status (§3.6 only):** The matching algorithm described below +> is **implemented and tested** in +> `v2/crates/wifi-densepose-bfld/src/soul_match.rs` (+ `soul_channels.rs`), +> with tests in `v2/crates/wifi-densepose-bfld/tests/soul_match.rs`. The +> implementation is the **first running** version of this formula in the repo: +> it computes calibrated per-channel scores and exposes a real +> `SoulMatchOracle` (`EnrolledMatcher`). **Caveats that remain true:** the +> weights below are unvalidated design intent; named-identity locking is +> **data-gated** — it requires the decisive high-weight channels (a real AETHER +> enrollment embedding + body-resonance) to be fed real measured data, which has +> NOT been done. Measured on synthetic data, the cardiac (0.15) + respiratory +> (0.10) channels **alone** produce a same-vs-cross-person score gap of ~0.0005 +> (test `cardiac_alone_cannot_separate_identity_matches_audit`) — i.e. identity +> is NOT separable on those channels, exactly as expected. This status note +> applies to §3.6 ONLY; the broader Soul Signature system remains +> Pre-Implementation. + Given a stored profile `P` and a query embedding `Q` derived from a live sensing window, the match score is computed as a weighted sum of per-channel cosine similarities: