Beyond-SOTA engine/signal/train improvements: mesh partition guard, FFT CIR solver, canonical frame decoder, falsifiable occupancy benchmark, governed streaming, adapter provenance (#1018)

* docs(research): add RuView beyond-SOTA system review (00)

First document of the beyond-SOTA research series: capability audit of
the current RuView engine with role-to-crate maturity matrix, ruvsense
module inventory, gap analysis, and risk register.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* docs(research): add beyond-SOTA architecture design (02, in progress)

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* docs(research): finalize beyond-SOTA architecture (02)

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* docs(research): add benchmark/validation methodology snapshot (03)

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* docs(research): add beyond-SOTA series index with validation results; changelog

README index ties the 5 research docs together with the session's
measured validation evidence: 2,797 workspace tests / 0 failed, Python
proof PASS (bit-exact), and paired pre/post criterion CIR benchmarks.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* perf(signal): precompute CIR warm-start system; hoist tomography solver allocs

Exact, determinism-safe optimizations (bit-identical float results):

- cir.rs: diag(PhiH Phi)+lambda*I and its CSR matrix depend only on Phi
  and lambda (fixed at CirEstimator::new) but were rebuilt every frame
  (O(K*G) pass + CSR allocation). Now built once in new() via
  build_warm_start_system; summation order unchanged.
- tomography.rs: ISTA gradient buffer hoisted out of the 100-iteration
  loop (fill(0.0) reset) and the Frobenius Lipschitz bound moved from
  per-reconstruct to construction.

Verified: signal 456 tests green; engine 11/11 green including
cycle_is_deterministic and witness-stability tests. Criterion paired
pre/post: cir_estimate/he40 -3.9% (p<0.01), multiband -1.2/-1.4%.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* fix(worldgraph): bound SemanticState growth with deterministic retention

StreamingEngine::process_cycle appended one SemanticState belief per
cycle with no eviction — ~1.7M nodes/day at 20 Hz (beyond-SOTA roadmap
finding #6).

Add WorldGraph::prune_semantic_states(max): deterministic eviction of
the oldest beliefs by (valid_from_unix_ms, id); structural nodes (rooms,
zones, sensors, anchors, tracks, events) are never eligible. Wire it
into the engine after each belief append (DEFAULT_SEMANTIC_RETENTION =
7,200, ~6 min at 20 Hz; set_semantic_retention to tune). The WorldGraph
holds current beliefs; durable history is the recorder's job, so no
audit data is lost.

3 new tests: end-to-end bounded growth, oldest-only eviction,
deterministic equal-timestamp tie-break. Workspace gate: 2,865 passed,
0 failed.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* feat(sensing-server): route live frames through the governed StreamingEngine

Closes the live-trust-path gap (ADR-136 section 8, beyond-SOTA system
review): the running server fused live CSI with the bare
MultistaticFuser, while the privacy/provenance/witness control plane
(ADR-135..146) only ever ran on synthetic in-test frames. The privacy
control plane was therefore bypassable on the real path.

New engine_bridge module drives StreamingEngine::process_cycle from the
server's live NodeState map, reusing the existing NodeState ->
MultiBandCsiFrame conversion. It lazily wires each contributing node as
a WorldGraph sensor (idempotent), bounds belief growth via the retention
cap, and forwards explicit timestamps/calibration ids so the path stays
deterministic and replayable.

Wired additively into both live ESP32/WiFi fusion sites in main.rs via a
split-borrow off the write guard, so person-count behavior is unchanged;
the latest BLAKE3 witness is stored on AppState. Every published belief
now carries evidence + model + calibration + privacy decision and a
deterministic witness.

Adds wifi-densepose-engine/-worldgraph/-bfld/-geo deps. 6 new bridge
tests (witnessed belief with full provenance, cross-run determinism,
idempotent node registration, retention bound, privacy-mode
propagation). sensing-server suite 430+128 green; workspace gate 2,904
passed / 0 failed.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* feat(train): falsifiable occupancy benchmark with anti-overfitting gate

Makes the presence/person-count "beyond SOTA" claim falsifiable in code
instead of aspirational (the unfalsifiability gap from the beyond-SOTA
system review).

occupancy_bench grades predictions vs ground truth and gates a SOTA
claim behind one claim_allowed invariant requiring ALL of:

- DataProvenance::Measured — synthetic/mock data is scorable for
  regression but never claimable (anti-mock-contamination; the CLAUDE.md
  Kconfig-bug lesson made structural).
- A leak-free EvalSplit — validate() refuses any split where a subject
  OR environment id appears in both train and test (subject leakage /
  per-environment overfitting).
- n_test >= min_test_samples (small-N guard).
- Presence F1 whose bootstrap-CI lower bound (deterministic seeded
  splitmix64) clears the threshold — not the point estimate.
- Count MAE within threshold.

The claim string is unreadable except through the gate (NO_CLAIM
otherwise), same discipline as the ruview-gamma acceptance gate. What
remains is data, not method: a frozen, SHA-pinned,
subject/environment-disjoint measured replay set turns the claim into a
passing/failing test.

Lives in wifi-densepose-train (the eval bounded context, alongside
ablation/ eval/metrics). 10 tests cover each refusal path; warning-clean
under the crate's missing_docs lint. Workspace gate 2,914 passed / 0
failed. Doc 03 updated.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* feat(engine): per-room adapter provenance + drift-to-recalibration advisor

Closes the trust-chain gap where an ~11 KB per-room LoRA adapter
(ADR-150 section 3.4) could silently change inference without the
witness noticing: provenance carried only "rfenc-v<N>" with no notion of
adapter identity.

- StreamingEngine::set_room_adapter(AdapterInfo): pins the adapter's
  content-derived id into provenance model_version
  ("rfenc-v1+adapter:<id>") — and therefore into the BLAKE3 witness — so
  swapping or clearing adapter weights always shifts the witness. Engine
  test proves base -> adapter -> other-adapter -> cleared all witness
  differently and cleared == base.
- RecalibrationAdvisor: recommends re-running the ADR-135 empty-room
  baseline / refitting the room adapter on sustained low fusion
  coherence (streak threshold, default 60 cycles ~ 3 s at 20 Hz) or an
  ADR-142 change-point. Surfaced as
  TrustedOutput::recalibration_recommended, stored on the sensing-server
  AppState alongside the witness at both live fusion sites.
- Bridge plumbing: EngineBridge::{set_room_adapter, clear_room_adapter}
  + live-path test that the adapter id flows into the live witness.

Scope note (honest): this is the deployable provenance/trigger half of
the "retrained model" roadmap item. Fitting the adapter itself runs in
the existing external calibration service (aether-arena/calibration/); a
trained RF-encoder checkpoint still does not exist in-tree.

Engine 15 tests, bridge 7 tests. Workspace gate: 2,918 passed / 0
failed.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* fix(mat): gate api module behind its feature — standalone no-default-features builds

pub mod api was unconditional while its only dependency, serde, is
optional behind the 'api' feature, so any build without default features
failed with 101 unresolved-serde errors (masked in --workspace runs by
feature unification).

The api module and its create_router/AppState re-export are now
cfg(feature = "api")-gated with docsrs annotations. All combos compile:
bare --no-default-features (was 101 errors, now 0),
--no-default-features --features api, and full default (177 tests pass).
Workspace gate: 2,918 passed / 0 failed.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* perf(signal): opt-in FFT operator for the CIR ISTA solver (8-14x measured)

Phi is a sub-DFT, so each ISTA mat-vec can run as one length-G FFT
(O(G log G)) instead of a dense O(K*G) product — the
dominant-latency-hazard finding from the beyond-SOTA optimization
roadmap.

New CirConfig::fft_operator, default FALSE: the dense path stays the
bit-exact witness default. The FFT evaluates the same sums in a
different order, so enabling it shifts float results in the last bits
and requires regenerating any pinned witness — strictly opt-in per
deployment.

FftOperator (rustfft, planned once at CirEstimator::new, scratch buffers
reused across the ISTA loop) dispatches inside ista_solve:

    Phi x   = scale * forward-FFT(x) sampled at bins (k_idx mod G)
    Phi^H v = scale * unnormalised inverse-FFT of v scattered into those bins

Warm-start and Lipschitz estimation stay dense at construction.

Measured (criterion, same run, same machine):

    ht20:  2.22 ms -> 265 us (8.4x)
    ht40: 10.26 ms -> 717 us (14.3x)

The real HE40 grid (K=484, G=1452) scales further per the
O(K*G)/O(G log G) ratio.

3 new tests: FFT<->dense matvec equivalence to float tolerance on ht20
and he40 grids; end-to-end dominant-tap agreement on a single-path
frame; all default configs keep FFT off. New cir_estimate_fft bench
group. Workspace gate: 2,921 passed / 0 failed (default path bit-exact,
witnesses unchanged).

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* feat(core): canonical frame decoder — capture-to-claim replay (ADR-136)

The encode half of the ADR-136 frame contract existed (ComplexSample,
to_canonical_bytes, witness_hash) but there was no decoder: a captured
canonical frame could be witnessed but never reconstructed, blocking
replay-from-capture.

CsiFrame::from_canonical_bytes is the exact inverse: same id, metadata,
complex payload, and witness hash (tested as the round-trip law AC7 —
the replayed frame re-encodes byte-identically). Amplitude/phase are
recomputed from the payload (projections, not independent state). Every
malformed-input class fails closed (AC8): header truncation ->
Truncated, payload truncation -> PayloadMismatch, unknown discriminants,
non-UTF-8 device id, trailing bytes. Nil calibration uuid decodes as
None per the documented encoding.

Core: 36 tests pass. Workspace gate: 2,937 passed / 0 failed.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* feat(engine): dynamic min-cut mesh partition guard (ruvector-mincut)

Maintains an exact min-cut over the live mesh coupling graph — nodes are
sensing nodes, coupling is the product of fusion attention weights — and
surfaces per cycle, as TrustedOutput::mesh:

- cut value: the global "how close is the array to partitioning" number,
  a structural measure per-node heuristics miss;
- weak side: which specific nodes would split off (failure/jamming
  triage, feeds ADR-032 posture);
- at-risk flag: counts as a structural event for the
  drift->recalibration advisor (alongside ADR-142 change-points).

Degenerate cases fail toward risk: a node with zero coupling is reported
as already partitioned (cut 0, that node as the weak side).

Measured cost policy (criterion, 12-node mesh — the honest part):

- weights quantized (1/64) + change-gated: steady-state cycles do ZERO
  graph work and reuse the cached cut (~7.3 us, ~23x cheaper than
  building);
- on any real change a full exact rebuild (~171 us) is used, because ONE
  DynamicMinCut delete+insert measured ~240 us — the subpolynomial
  machinery amortizes on much larger graphs, so rebuild-on-change is the
  measured optimum at mesh scale (one-edge case -28% after switching
  policy);
- full process_cycle with the guard: ~33 us for 4 nodes vs the 50 ms
  budget.

9 mesh_guard tests (weak-node detection, steady-state zero updates,
sub-quantum gating, join/drop rebuild, determinism, disconnection) + an
engine-level wiring test (down-weighted node -> weak side ->
recalibration). Engine 24 tests; workspace gate 2,946 passed / 0 failed.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* feat(engine): mesh partition risk demotes privacy + enters the witness (ADR-032)

Completes the mesh-guard integration: its at_risk signal was
advisory-only (fed the recalibration advisor). It now also contributes
to the ADR-141 privacy demotion alongside fusion- and array-level
contradictions — a mesh close to partitioning makes the fused belief
less trustworthy, so the cycle emits at a more restricted class
(monotonic; information only removed).

Because effective_class feeds the BLAKE3 witness, a fragmenting array
now shifts the witness: partition risk is auditable, not just logged.
The mesh computation moved ahead of the demotion step in process_cycle;
mesh_guard_mut exposes risk-threshold tuning.

Test: a forced-risk 3-node cycle demotes PrivateHome
Anonymous->Restricted and shifts the witness vs a clean baseline.
Engine 25 tests; workspace gate 2,947 passed / 0 failed.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* fix: public-PR review findings — privacy-path honesty, gate holes, mesh-guard cliff

- sensing-server: engine errors logged+counted (no silent swallow),
  trust state exposed via status surface, privacy-demotion claims
  aligned with the actual parallel-audit-path behavior
- occupancy_bench: vacuous-F1 hole closed (degenerate test sets fail
  with their own criterion); CI-lower-bound test made probative
- mesh_guard: quantization scaled to observed coupling range — >=65-node
  balanced meshes no longer permanently at_risk (regression test)
- engine: both wiring tests made probative (same-topology witness
  compare, deterministic risk-crossing fixture)
- mat: axum/tokio optional behind api; real serde feature (api enables
  it)
- core: canonical decoder strict (non-zero reserved bytes and nil UUID
  rejected — injective on accepted domain, forged-bytes tests)
- CHANGELOG: un-spliced the FFT/adapter bullet mangle

Co-Authored-By: claude-flow <ruv@ruv.net>

* chore: strip private-track references for public PR

Reword the occupancy-benchmark changelog bullet to drop a
cross-reference to the private research track, and restore the
WorldGraph retention bullet header that was glued onto the preceding MAT
bullet.

Co-Authored-By: claude-flow <ruv@ruv.net>

* chore: lockfile refresh for cherry-picked feature set

Co-Authored-By: claude-flow <ruv@ruv.net>

---------

Co-authored-by: Claude <noreply@anthropic.com>
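
The FFT-operator commit in the message above rests on one identity: a sub-DFT matrix-vector product equals a full length-G DFT sampled at selected bins. The following is a minimal sketch of that identity only, not the in-tree `FftOperator`. It assumes a forward kernel exp(-2*pi*i*k*g/G) and a power-of-two G so that a textbook radix-2 FFT applies (rustfft, which the commit uses, handles arbitrary lengths such as G=1452). The names `dense_matvec`, `fft_matvec`, and `max_abs_diff` are hypothetical.

```rust
use std::f64::consts::PI;

/// Complex number as (re, im); a tuple keeps the sketch free of crates.
type C = (f64, f64);

fn mul(a: C, b: C) -> C {
    (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0)
}
fn add(a: C, b: C) -> C {
    (a.0 + b.0, a.1 + b.1)
}
fn sub(a: C, b: C) -> C {
    (a.0 - b.0, a.1 - b.1)
}

/// exp(-2*pi*i * num / den): the forward-DFT twiddle factor (assumed sign).
fn twiddle(num: usize, den: usize) -> C {
    let theta = -2.0 * PI * (num as f64) / (den as f64);
    (theta.cos(), theta.sin())
}

/// Dense path, O(K*G): y[j] = scale * sum_g exp(-2*pi*i*k_idx[j]*g/G) * x[g].
fn dense_matvec(k_idx: &[usize], x: &[C], scale: f64) -> Vec<C> {
    let g_len = x.len();
    k_idx
        .iter()
        .map(|&k| {
            let mut acc = (0.0, 0.0);
            for (g, &xg) in x.iter().enumerate() {
                // Reduce k*g modulo G first; the kernel is G-periodic.
                acc = add(acc, mul(twiddle((k * g) % g_len, g_len), xg));
            }
            (acc.0 * scale, acc.1 * scale)
        })
        .collect()
}

/// Recursive radix-2 decimation-in-time FFT (power-of-two lengths only).
fn fft(x: &[C]) -> Vec<C> {
    let n = x.len();
    assert!(n.is_power_of_two(), "radix-2 sketch needs a power-of-two length");
    if n == 1 {
        return vec![x[0]];
    }
    let even: Vec<C> = x.iter().step_by(2).copied().collect();
    let odd: Vec<C> = x.iter().skip(1).step_by(2).copied().collect();
    let (e, o) = (fft(&even), fft(&odd));
    let mut out = vec![(0.0, 0.0); n];
    for k in 0..n / 2 {
        let t = mul(twiddle(k, n), o[k]);
        out[k] = add(e[k], t);
        out[k + n / 2] = sub(e[k], t);
    }
    out
}

/// FFT path, O(G log G): one full transform, then sample bins k_idx mod G.
fn fft_matvec(k_idx: &[usize], x: &[C], scale: f64) -> Vec<C> {
    let spectrum = fft(x);
    let g_len = x.len();
    k_idx
        .iter()
        .map(|&k| {
            let s = spectrum[k % g_len];
            (s.0 * scale, s.1 * scale)
        })
        .collect()
}

/// Largest component-wise absolute difference between two complex vectors.
fn max_abs_diff(a: &[C], b: &[C]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(p, q)| (p.0 - q.0).abs().max((p.1 - q.1).abs()))
        .fold(0.0, f64::max)
}
```

Both paths compute the same sums in a different order, so they agree to floating-point tolerance rather than bit for bit; that is consistent with the commit keeping the dense path as the witness default and making the FFT path opt-in.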
2026-06-11 16:08:54 -04:00 · 29de574e63
parent d0e27e652e
commit 29de574e63
24 changed files with 4157 additions and 55 deletions
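
The occupancy-benchmark gate described in the commit message releases a claim only when the bootstrap-CI lower bound of the presence F1 clears the threshold, not the point estimate. A rough sketch of that idea follows, assuming a percentile bootstrap and a flat `GateInputs` struct; the real `occupancy_bench` types, field names, and thresholds are not shown in this commit page, so `f1_ci_lower`, `GateInputs`, `claim`, and the `"CLAIM_ALLOWED"` string are all hypothetical. Only the splitmix64 generator follows its published constants.

```rust
/// splitmix64: small deterministic PRNG, so resampling is reproducible.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Presence F1 over (truth, prediction) pairs.
/// A vacuous case (no positives anywhere) scores 0.0, never 1.0.
fn f1(pairs: &[(bool, bool)]) -> f64 {
    let (mut tp, mut fp, mut fneg) = (0u64, 0u64, 0u64);
    for &(truth, pred) in pairs {
        match (truth, pred) {
            (true, true) => tp += 1,
            (false, true) => fp += 1,
            (true, false) => fneg += 1,
            (false, false) => {}
        }
    }
    let denom = 2 * tp + fp + fneg;
    if denom == 0 { 0.0 } else { (2 * tp) as f64 / denom as f64 }
}

/// Lower end of a percentile bootstrap confidence interval for F1.
/// `alpha` = 0.05 gives the 2.5th percentile of the resampled scores.
fn f1_ci_lower(pairs: &[(bool, bool)], resamples: usize, alpha: f64, seed: u64) -> f64 {
    if pairs.is_empty() || resamples == 0 {
        return 0.0; // nothing to support a claim
    }
    let n = pairs.len();
    let mut rng = SplitMix64(seed);
    let mut scores = Vec::with_capacity(resamples);
    let mut sample = Vec::with_capacity(n);
    for _ in 0..resamples {
        sample.clear();
        for _ in 0..n {
            // Draw with replacement; modulo bias is ignored in this sketch.
            sample.push(pairs[(rng.next_u64() % n as u64) as usize]);
        }
        scores.push(f1(&sample));
    }
    scores.sort_by(|a, b| a.total_cmp(b));
    let idx = ((alpha / 2.0) * resamples as f64).floor() as usize;
    scores[idx.min(resamples - 1)]
}

const NO_CLAIM: &str = "NO_CLAIM";

/// Hypothetical flattened view of the conditions listed in the commit message.
#[derive(Clone, Copy)]
struct GateInputs {
    measured: bool,
    split_leak_free: bool,
    n_test: usize,
    min_test_samples: usize,
    f1_ci_lower: f64,
    f1_threshold: f64,
    count_mae: f64,
    mae_threshold: f64,
}

/// The claim string is reachable only when every condition holds.
fn claim(g: &GateInputs) -> &'static str {
    let allowed = g.measured
        && g.split_leak_free
        && g.n_test >= g.min_test_samples
        && g.f1_ci_lower >= g.f1_threshold
        && g.count_mae <= g.mae_threshold;
    if allowed { "CLAIM_ALLOWED" } else { NO_CLAIM }
}
```

Because the generator is seeded, two runs over the same test set produce the same lower bound, which is what lets a gate like this act as a pass/fail test rather than a flaky one.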
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,7 +7,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+### Changed
+- **Mesh partition risk now demotes the privacy class and is witnessed (ADR-032).** The dynamic min-cut guard's `at_risk` signal was advisory-only (it fed the recalibration advisor). It now also contributes to the ADR-141 privacy demotion alongside fusion- and array-level contradictions: a mesh close to partitioning makes the fused belief less trustworthy, so the cycle emits at a more restricted class (monotonic — information only removed). Because `effective_class` feeds the BLAKE3 witness, a fragmenting array now shifts the witness — partition risk is auditable, not just logged. The mesh computation moved ahead of the demotion step in `process_cycle`; new `mesh_guard_mut()` exposes risk-threshold tuning. Test proves a forced-risk 3-node cycle demotes PrivateHome Anonymous→Restricted and shifts the witness vs a clean *same-topology* baseline (the only delta between the two cycles is the forced risk).
+
+### Added
+- **Dynamic min-cut mesh partition guard in the streaming engine (`mesh_guard`).** Maintains a `ruvector-mincut` exact min-cut over the live mesh coupling graph (nodes = sensing nodes, coupling = product of fusion attention weights), surfacing per cycle: the global **cut value** (how close the array is to splitting — a structural measure per-node heuristics miss), the **weak side** (which specific nodes would partition: failure/jamming triage feeding ADR-032 posture), and an **at-risk flag** that counts as a structural event for the drift→recalibration advisor. Surfaced as `TrustedOutput::mesh`. **Measured cost policy** (criterion, 12-node mesh): weights are quantized (1/64; a *nonzero* coupling below one quantum saturates to quantum 1 so quantization never erases a live coupling — without the floor, balanced meshes of ≥ 65 nodes had every ~1/n coupling erased and sat permanently "at risk") and updates change-gated, so the steady-state cycle does zero graph work (~7.3 µs, ~23× cheaper than building); on any real change a full exact rebuild (~171 µs) is used because one `DynamicMinCut` delete+insert measured ~240 µs — the incremental machinery's overhead targets much larger graphs, so rebuild-on-change is the measured optimum at mesh scale (one-edge case −28% after the policy switch). Degenerate cases fail toward risk: a node with zero coupling is reported as already partitioned (cut 0). 9 mesh-guard tests + an engine-level wiring test; full `process_cycle` with the guard: ~33 µs for 4 nodes (50 ms budget).
+- **Opt-in FFT operator for the CIR ISTA solver (8–14× measured).** Φ is a sub-DFT, so each ISTA mat-vec can run as one length-G FFT (O(G log G)) instead of a dense O(K·G) product. New `CirConfig::fft_operator` (default **false** — the dense path stays the bit-exact witness default; the FFT evaluates the same sums in a different order, so enabling it shifts float results and requires regenerating any pinned witness). `FftOperator` (rustfft, planned once at construction, scratch reused across the ISTA loop) dispatches inside `ista_solve`; warm-start/Lipschitz stay dense at construction. Measured (criterion, same run): ht20 2.22 ms → 265 µs (**8.4×**), ht40 10.26 ms → 717 µs (**14.3×**); the real HE40 grid (K=484, G=1452) scales further. 3 new tests: FFT↔dense matvec equivalence to float tolerance (ht20 + he40 grids), end-to-end dominant-tap agreement on a single-path frame, and all default configs keep FFT off. New `cir_estimate_fft` bench group.
+- **Per-room adapter provenance + drift→recalibration advisor in the streaming engine.** Closes the trust-chain gap where an ~11 KB per-room LoRA adapter (ADR-150 §3.4) could silently change inference without the witness noticing. `StreamingEngine::set_room_adapter(AdapterInfo)` pins the adapter's content-derived id into provenance `model_version` (`rfenc-v1+adapter:<id>`) — and therefore into the BLAKE3 witness — so swapping or clearing adapter weights always shifts the witness (engine test proves base → adapter → other-adapter → cleared all witness differently, and cleared == base). New `RecalibrationAdvisor` recommends re-running the ADR-135 baseline / refitting the adapter on sustained low fusion coherence (streak threshold, default 60 cycles ≈ 3 s at 20 Hz) or an ADR-142 change-point; surfaced as `TrustedOutput::recalibration_recommended` and recorded on the sensing-server's `EngineBridge` alongside the witness. Bridge plumbing: `EngineBridge::{set_room_adapter, clear_room_adapter}` + live-path test that the adapter id flows into the live witness. *Scope note: this is the deployable provenance/trigger half of the "retrained model" roadmap item — fitting the adapter itself runs in the existing external calibration service (`aether-arena/calibration/`), and a trained RF-encoder checkpoint still does not exist in-tree.*
+- **RuView beyond-SOTA research series** (`docs/research/ruview-beyond-sota/`, 6 docs) — research-swarm output defining the beyond-SOTA bar and the path to it: system capability audit (role→crate maturity matrix, gap analysis, risk register), web-verified 2026 SOTA landscape per capability axis (incl. ratified IEEE 802.11bf-2025), 8-pillar target architecture on the ADR-136 contract spine (no rewrite), 6-layer benchmark/validation methodology (all 15 criterion bench targets inventoried; ADR-149 statistical protocol), and a determinism-safe optimization roadmap. Includes session validation evidence: 2,797 workspace tests / 0 failed, Python proof PASS (bit-exact), paired pre/post criterion runs.
+
+### Performance
+- **CIR estimator warm-start precompute** — the diagonal Tikhonov preconditioner `diag(Φ^H Φ)+λI` and its CSR matrix were rebuilt every frame although they depend only on Φ and λ (fixed at `CirEstimator::new`); now precomputed at construction (`ruvsense/cir.rs`). Bit-identical floats (summation order unchanged, witness chain unaffected). Measured: `cir_estimate/he40` −3.9% (p<0.01), multiband groups −1.2/−1.4%; smaller configs within container noise.
+- **RF tomography solver hoisting** — ISTA gradient buffer no longer allocated inside the 100-iteration loop, and the Frobenius Lipschitz bound moved from per-`reconstruct` to construction (`ruvsense/tomography.rs`). Bit-identical results.
+
+### Added
+- **Falsifiable occupancy benchmark (`wifi-densepose-train::occupancy_bench`).** Makes the presence/person-count "beyond SOTA" claim falsifiable in code instead of aspirational (the unfalsifiability gap from the beyond-SOTA system review). Grades predictions vs ground truth and gates a SOTA claim behind one `claim_allowed` invariant requiring all of: `DataProvenance::Measured` (synthetic/mock is scorable but **never claimable** — anti-mock-contamination per the CLAUDE.md Kconfig-bug lesson), a leak-free `EvalSplit` (refuses any split where a subject *or* environment id appears in both train and test — subject leakage / per-environment overfitting), `n_test ≥ min`, a **non-degenerate test set** (both truth classes represented: present-rate ≥ `min_positive_rate` and ≥ 1 absent sample — an all-absent set plus an always-absent predictor cannot release a claim; vacuous F1 scores 0.0, never 1.0), presence-F1 **bootstrap-CI lower bound** (deterministic seeded splitmix64) clearing the threshold, and count MAE within threshold. The claim string is unreadable except through the gate (`NO_CLAIM` otherwise). What remains is data, not method: a frozen, SHA-pinned, subject/environment-disjoint measured replay set turns the claim into a passing/failing test. 12 tests cover each refusal path, including the point-above/CI-below case (claim withheld on the CI lower bound even when the point estimate clears the threshold).
+- **Live trust path: sensing-server routes real frames through the governed `StreamingEngine` (parallel governed path with partial output gating).** Previously the live server ran only the *bare* `MultistaticFuser` (fused amplitudes, no trust control plane), while the privacy/provenance/witness engine (ADR-135..146) ran only on synthetic in-test frames — the gap called out in ADR-136 §8 and the beyond-SOTA system review. New `engine_bridge` module drives `StreamingEngine::process_cycle` from the server's live `NodeState` map (reusing the existing `NodeState → MultiBandCsiFrame` conversion), lazily wiring each node as a WorldGraph sensor and bounding belief growth via the retention cap; every *governed belief* carries evidence + model + calibration + privacy decision and a deterministic witness. **Honest scope:** the engine runs alongside (not instead of) the bare fusion path that feeds the live `SensingUpdate`. What its decision gates on the wire today: a cycle emitted at class `Restricted` (base mode or contradiction/mesh-risk demotion) suppresses the per-node raw amplitude vectors from the live publish — the same field mapping `wifi-densepose-bfld`'s privacy gate applies at `Restricted`; gating the remaining derived outputs (person count, classification, signal field) is tracked as a follow-up. Trust state is no longer write-only: the latest witness, effective privacy class, demotion flag, recalibration recommendation, and an engine-error counter are readable on `GET /api/v1/status`, and engine errors are counted + rate-limit logged instead of silently swallowed (`EngineBridge::observe_cycle`). Adds `wifi-densepose-engine/-worldgraph/-bfld/-geo` deps. Bridge tests cover witnessed belief with provenance, determinism, idempotent node registration, retention bound, privacy-mode propagation, trust-state recording, the error-counter path, and Restricted-class raw-output suppression.
+
 ### Fixed
+- **`wifi-densepose-mat` standalone `--no-default-features` build (101 errors → 0).** `pub mod api` was unconditional while its only dependency, serde, is optional behind the `api` feature — so any build without default features failed with unresolved serde imports (masked in `--workspace` runs by feature unification). The `api` module and its `create_router`/`AppState` re-export are now `#[cfg(feature = "api")]`-gated (with docsrs annotations). All feature combos compile: bare `--no-default-features`, `--no-default-features --features api`, and full default (177 tests pass).
+- **WorldGraph no longer grows unboundedly under the live loop.** `StreamingEngine::process_cycle` appended one `SemanticState` belief per cycle with no eviction — ~1.7M nodes/day at 20 Hz (identified in `docs/research/ruview-beyond-sota/04-optimization-roadmap.md`). Added `WorldGraph::prune_semantic_states(max)` — deterministic eviction of the oldest beliefs by `(valid_from_unix_ms, id)`, structural nodes (rooms/zones/sensors/anchors/tracks/events) never eligible — and wired it into the engine after each belief append (`StreamingEngine::DEFAULT_SEMANTIC_RETENTION` = 7,200 ≈ 6 min at 20 Hz; tunable via `set_semantic_retention`). The WorldGraph holds *current* beliefs; durable history is the recorder's job, so no audit data is lost. 3 new tests (bounded growth end-to-end, oldest-only eviction, deterministic tie-break).
 - **ESP32 edge heart rate no longer stuck at ~45 BPM / dropping wildly — #987.** The on-device HR estimator (`edge_processing.c`, `0xC5110002`) reported ~45 BPM regardless of true heart rate (Apple-Watch ground truth 87 BPM read as ~45) and swung frame-to-frame. Two root causes: (1) a hardcoded `sample_rate = 10.0f` that became wrong after #985's self-ping raised the CSI callback rate to a variable ~13–19 Hz — BPM scales as `assumed/actual × true`, so 87 read ~45 and the reading swung as CSI yield fluctuated; (2) the zero-crossing estimator locked onto a breathing harmonic (a 0.25 Hz breathing fundamental puts its 3rd harmonic at ~0.74 Hz ≈ 44 BPM inside the HR band). Fix: measure the real sample rate from inter-frame timestamps (used for BPM conversion + biquad re-tuning on >15% drift); replace the HR zero-crossing with an autocorrelation estimator that rejects breathing harmonics (driven by a robust autocorr breathing period); median-13 smooth the output. Hardware A/B (fixed vs unmodified control board, both `edge_tier=2`): control pegged 40–49 BPM; fixed reaches the true 88–91 BPM (vs 87 GT) and holds a stable physiological value (spread 59→0 for a steady subject). Known limitation: heavy subject motion still degrades the estimate (motion gating is a follow-up).
 - **Person count no longer leaks up to 10 in heuristic mode — addresses #894.** `field_bridge::occupancy_or_fallback` returned the eigenvalue-based `FieldModel::estimate_occupancy` count **unbounded** (its internal ceiling is 10), while the sibling estimators on the same single-link data — the perturbation-energy fallback right below it and `score_to_person_count` — both cap at 3 ("1-3 for single ESP32"). On noisy / under-calibrated CSI the eigenvalue count inflated, producing the "10 persons reported when 1 present" symptom (seen when `--model` fails to load and the server runs on heuristics). Bounded the eigenvalue path to the shared `MAX_SINGLE_LINK_OCCUPANCY` (3) so every estimator on one link agrees; genuine higher counts come from the multistatic fusion path, not a single-link covariance estimate.
 - **MQTT multi-node deployments now create one Home-Assistant device per node — closes #898.** After the #872 MQTT wiring landed, the JSON→`VitalsSnapshot` bridge hard-coded a single `node_id` (the MQTT client id) and the publisher used a single `OwnedDiscoveryBuilder`, so every physical node collapsed into one device (`identifiers:["wifi_densepose_wifi-densepose-1"]`), contradicting the "one device per node" docs. The bridge now emits one snapshot per node in the sensing update's `nodes[]` (each with its own `node_id` + RSSI, falling back to a single aggregate snapshot for wifi/simulate sources), and the publisher derives a per-node builder (`OwnedDiscoveryBuilder::for_node`) that publishes discovery + availability lazily on first sight of each `node_id` and routes state to per-node topics — yielding N distinct HA devices with per-node availability/LWT. Unit-tested (distinct nodes → distinct `wifi_densepose_<node>` identifiers); 71 MQTT tests pass.
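
The WorldGraph retention fix in the changelog hunk above evicts the oldest beliefs in `(valid_from_unix_ms, id)` order and never touches structural nodes. One way such a deterministic prune could look is sketched below. The `Node` enum and the flat `Vec` storage are assumptions made for illustration (the real `WorldGraph` representation is not shown in this diff), and the sketch assumes ids are unique.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified node model for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Node {
    /// Rooms, zones, sensors and the like: never eligible for eviction.
    Structural { id: u64 },
    /// One belief appended per engine cycle.
    SemanticState { id: u64, valid_from_unix_ms: u64 },
}

/// Keep at most `max` semantic states. The oldest are evicted first and
/// equal timestamps are broken by id, so the result does not depend on
/// insertion order. Returns the number of evicted nodes.
fn prune_semantic_states(nodes: &mut Vec<Node>, max: usize) -> usize {
    // Collect the sort key (timestamp, id) of every eligible node.
    let mut keys: Vec<(u64, u64)> = nodes
        .iter()
        .filter_map(|n| match n {
            Node::SemanticState { id, valid_from_unix_ms } => {
                Some((*valid_from_unix_ms, *id))
            }
            Node::Structural { .. } => None,
        })
        .collect();
    if keys.len() <= max {
        return 0;
    }
    // Tuple ordering compares the timestamp first, then the id.
    keys.sort_unstable();
    let evict_count = keys.len() - max;
    let evict: BTreeSet<(u64, u64)> = keys[..evict_count].iter().copied().collect();
    nodes.retain(|n| match n {
        Node::SemanticState { id, valid_from_unix_ms } => {
            !evict.contains(&(*valid_from_unix_ms, *id))
        }
        Node::Structural { .. } => true,
    });
    evict_count
}
```

At the stated default of 7,200 retained beliefs and 20 Hz, a prune like this would run once per cycle and remove a single node in steady state; a production version would likely keep an ordered index rather than re-sorting on every call.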
--- a/docs/research/ruview-beyond-sota/00-system-review.md
+++ b/docs/research/ruview-beyond-sota/00-system-review.md
@@ -0,0 +1,165 @@
+# RuView System Review — Capability Audit (Beyond-SOTA Series, Doc 00)
+
+**Date:** 2026-06-09
+**Scope:** The RuView product surface (ADR-031) and the 38-crate Rust workspace under `v2/crates/` that implements it, plus the ADR corpus (`docs/adr/`, 150 numbered ADRs) and the prior research corpus (`docs/research/sota-2026-05-22/`).
+**Method:** Direct reads of `lib.rs`/`mod.rs` and key ADRs; static test counts via `grep -r '#[test]'` / `#[tokio::test]` per crate (counts are *static occurrences in source*, not CI pass counts). No metrics in this document are estimated — everything cited was read or measured in the working tree.
+
+---
+
+## 1. Executive Summary — What RuView IS Today
+
+RuView is **not a crate**. Per ADR-136 §2.1 (`docs/adr/ADR-136-ruview-streaming-engine-frame-contracts.md`), RuView is the sensing-first *product surface and brand* (ADR-031, status: Proposed) layered on the existing `wifi-densepose-*` / `homecore*` / `cog-*` workspace. ADR-136 explicitly **rejects** a `ruview_*` crate rename and pins a normative ten-role mapping (ingest / signal / fusion / world / models / privacy / store / api / eval / observe) onto the existing crates.
+
+What concretely exists:
+
+1. **A deep, heavily-tested signal-processing layer.** `wifi-densepose-signal` contains 473 static `#[test]` occurrences, including a 22-file `ruvsense/` bounded context (`v2/crates/wifi-densepose-signal/src/ruvsense/`) implementing the ADR-029 six-stage multistatic pipeline plus ADR-030/032a/134/135/137/138/142/143 extensions (~14,000 lines, 330 in-module tests measured by per-file grep).
+2. **A trust-traceable composition root.** `wifi-densepose-engine` (`src/lib.rs`, 752 lines, 11 tests) wires fusion quality (ADR-137), array coordination (ADR-138), evolution change-points (ADR-142), RF-SLAM anchors (ADR-143), the WorldGraph (ADR-139), and the BFLD privacy control plane (ADR-141) into one `StreamingEngine::process_cycle` (`lib.rs:285`) that emits a `TrustedOutput` (`lib.rs:80`) carrying evidence + model version + calibration version + privacy decision + a BLAKE3 witness (`lib.rs:437`).
+3. **A privacy layer with structural invariants.** `wifi-densepose-bfld` (20 modules, 369 tests) implements ADR-118–123/141: raw BFI never exits the node (I1), identity embeddings are RAM-only (I2), cross-site identity correlation is cryptographically impossible (I3) — stated at `wifi-densepose-bfld/src/lib.rs:7-11`.
+4. **A Home-Assistant-class world/state layer.** `homecore` + 9 sibling crates (state machine, event bus, plugins, automation, REST/WS API, recorder, HAP bridge, assist) — explicitly a "P1 scaffold" per `homecore/src/lib.rs:7` with deferred items listed at `lib.rs:24-31`.
+5. **A drone-swarm extension.** `ruview-swarm` (17 modules, ~9,000 lines in subdirectories, 115 + 19 async tests); ADR-148 self-reports ~98% complete, with the remaining 15% of M3 gated on real ESP32-S3 hardware (`ADR-148:940-953`).
+6. **A large prior research corpus.** The 2026-05-22 autonomous SOTA loop: 41 ticks, 19 research threads (R1–R20), 22 numpy reference implementations, 7 ADRs, and a 6-tier production roadmap (`docs/research/sota-2026-05-22/00-summary.md`, `PRODUCTION-ROADMAP.md`).
+
+The critical caveat, stated by the project itself: the ADR-136–146 series is *"a skeleton and nervous system, not a shipping product… Most of the series is not yet wired into the live 20 Hz pipeline"* (ADR-136 §8). The engine crate's own docs confirm what is absent: *"the live 20 Hz I/O loop (sensing-server), UWB hardware (ADR-144), and model training (ADR-146)"* (`wifi-densepose-engine/src/lib.rs:27-29`).
+
+---
+
+## 2. Capability Matrix — Pipeline Role → Crates → Maturity
+
+Role mapping is normative per ADR-136 §2.1; maturity is this review's judgment from code + ADR status. Test counts: static `#[test]` + `#[tokio::test]` greps (2026-06-09).
+
+| Role | Crate(s) | Key modules | Tests (sync+async) | Maturity | Evidence |
+|---|---|---|---|---|---|
+| **ingest** | `wifi-densepose-sensing-server`, `wifi-densepose-hardware`, `wifi-densepose-wifiscan` | `csi.rs`, `multistatic_bridge.rs`, `tracker_bridge.rs`, ESP32 TDM | 557+14, 137, 150 | **Production** (hardware-validated per ADR-028/039) | `sensing-server/src/` has 30+ modules incl. MQTT, Matter, RVF pipeline |
+| **signal** | `wifi-densepose-signal` (incl. `ruvsense/`) | 6-stage pipeline (`ruvsense/mod.rs:9-23`), `cir.rs`, `calibration.rs`, `hampel.rs`, `fresnel.rs`, `phase_sanitizer.rs` | 473 | **Production** (unit level); live multistatic wiring **beta** | §3 below; ADR-014 Accepted, ADR-029 Proposed |
+| **fusion** | `ruvsense/multistatic.rs`, `ruvsense/fusion_quality.rs`, `wifi-densepose-ruvector/src/viewpoint/` | `MultistaticFuser`, `QualityScore`, `CrossViewpointAttention`, GDI/Cramér-Rao (`viewpoint/geometry.rs`) | 20 (multistatic.rs), 3 (fusion_quality.rs), 136 (ruvector crate) | **Beta** — tested building blocks, composed only in `wifi-densepose-engine` tests | `viewpoint/mod.rs:1-30`; engine `lib.rs:317-319` |
+| **world** | `homecore`, `wifi-densepose-worldgraph`, `wifi-densepose-geo`, `wifi-densepose-worldmodel` | `StateMachine`, `EventBus`, `WorldGraph` (rooms/sensors/person-tracks/semantic states), ENU geo registration | 9+11, 7, 16+1, 12+1 | **Beta** — homecore is explicit "P1 scaffold"; persistence/service dispatch deferred to P2 | `homecore/src/lib.rs:7, 24-31`; ADR-127 Proposed |
+| **models** | `cog-pose-estimation`, `cog-person-count`, `wifi-densepose-nn`, `wifi-densepose-train`, `wifi-densepose-occworld-candle` | ONNX/Candle inference, training pipeline, OccWorld bridge | 7, 15, 30+1, 312, 12 | **Experimental** — no trained RF foundation encoder exists; ADR-147 benchmarked OccWorld with **random weights** | `ADR-147-benchmark-proof.md` ("random weights — pre-domain-fine-tuning baseline"); ADR-146/150 Proposed |
+| **privacy** | `wifi-densepose-bfld` | `privacy_gate.rs`, `privacy_mode.rs` (mode registry + hash-chained attestation), `identity_risk.rs`, `signature_hasher.rs`, `embedding_ring.rs` | 369 | **Beta** — strongest-tested layer, but lib header still says "Status: P1 in progress" (`lib.rs:12`, stale vs 20 implemented modules) | ADR-118–123, 141 all Proposed |
+| **store** | `homecore-recorder` | trajectory/event recording | 8+12 | **Experimental** | ADR-136 §2.1 |
+| **api** | `homecore-api`, `homecore-server`, `cog-ha-matter`, `homecore-hap` | REST/WS, HA discovery, Matter, HomeKit | 7+11, 0, 63+1, 15+2 | **Experimental→Beta** (`homecore-server` has zero tests) | ADR-130/125/115 Proposed |
+| **eval** | `wifi-densepose-train/src/ablation.rs`, `ruview-swarm/src/evals/` | ablation harness (ADR-145), swarm eval suite (ADR-149) | included in 312 / 115 | **Experimental** — ADR-145 self-labels "skeleton/scaffolding, mostly not yet on the live 20 Hz path" | `ablation.rs` exists; ADR-149 (swarm benchmarking) Accepted |
+| **observe** | `homecore-automation`, `homecore-assist` | automation engine, assistant/Ruflo bridge | 20+14, 3+20 | **Experimental** | ADR-129/133 Proposed |
+| **(integration root)** | `wifi-densepose-engine` | `StreamingEngine`, `TrustedOutput`, privacy demotion, witness | 11 | **Beta** — the only crate that proves cross-role composition; not on a live I/O path | `engine/src/lib.rs:1-29, 457-751` |
+| **(swarm)** | `ruview-swarm` | Raft/gossip topology, RRT-APF planning, Candle PPO MARL, CSI sensing payload, failsafe, Ruflo | 115+19 | **Experimental/simulation** — M3 needs real ESP32-S3 hardware | ADR-148:940-953 ("Overall ~98%", M3 85%) |
+| **(adjacent)** | `nvsim`, `nvsim-server`, `ruv-neural`, `wifi-densepose-wasm-edge`, `wifi-densepose-mat`, `wifi-densepose-vitals` | NV-diamond sim, neural lib, WASM edge, MAT disaster tool, vitals | 50, 0, 364, 643, 165+9, 52 | Mixed — `mat`/`vitals`/`wasm-edge` mature unit-wise | crate listing |
+
+**Workspace totals (measured):** 3,890 `#[test]` + 121 `#[tokio::test]` static occurrences across `v2/crates/`. (CLAUDE.md's "1,031+ tests" figure refers to an earlier `cargo test --workspace` run count; this review did not execute the suite.)
+
+External vendored runtimes also present: `vendor/rvcsi` (ADR-095/096 edge RF runtime, own repo), `vendor/ruvector`, `vendor/midstream`, `vendor/sublinear-time-solver`.
+
+---
+
+## 3. Signal-Processing Capability Inventory — `ruvsense/`
+
+Location: `v2/crates/wifi-densepose-signal/src/ruvsense/`. CLAUDE.md says "16 modules"; the directory now contains **22 `.rs` files** (21 modules + `mod.rs`) — the table below is the ground truth. Lines/tests measured per file (2026-06-09).
+
+| Module | Lines | Tests | ADR | What it does |
+|---|---:|---:|---|---|
+| `mod.rs` | 510 | 14 | 029 | Pipeline shell, COCO-17 keypoint constants, `RuvSensePipeline` (concrete fields + `tick()`), re-exports |
+| `multiband.rs` | 442 | 14 | 029 | Channel-hopping CSI → wideband virtual snapshot per node (`MultiBandCsiFrame`) |
+| `phase_align.rs` | 460 | 13 | 029 | LO phase-offset estimation via circular mean + `ruvector-solver::NeumannSolver` |
+| `multistatic.rs` | 957 | 20 | 029 | Attention-weighted N-node fusion → `FusedSensingFrame`; timestamp-spread guards |
+| `coherence.rs` | 474 | 19 | 029 | Per-subcarrier z-score coherence vs rolling template; `DriftProfile` |
+| `coherence_gate.rs` | 380 | 17 | 029 | Accept / PredictOnly / Reject / Recalibrate gate decisions |
+| `pose_tracker.rs` | 1,577 | 38 | 029/026/082 | 17-keypoint Kalman tracker, lifecycle state machine, AETHER re-ID embeddings, skeleton constraints, temporal keypoint attention |
+| `field_model.rs` | 1,417 | 22 | 030 | SVD room eigenstructure (persistent field model), perturbation extraction |
+| `tomography.rs` | 751 | 12 | 030 | RF tomography, ISTA L1 voxel solver |
+| `longitudinal.rs` | 1,020 | 20 | 030 | Welford long-horizon stats, biomechanics drift detection |
+| `intention.rs` | 511 | 12 | 030 | Pre-movement lead signals (200–500 ms) |
+| `cross_room.rs` | 626 | 13 | 030 | Environment fingerprinting + room-transition graph |
+| `gesture.rs` | 579 | 14 | 030 | DTW template-matching gesture classifier |
+| `adversarial.rs` | 586 | 13 | 030/032 | Physically-impossible-signal detection, multi-link consistency |
+| `attractor_drift.rs` | 566 | 15 | 032a | Midstream-enhanced attractor drift detection |
+| `temporal_gesture.rs` | 540 | 15 | 032a | Midstream temporal gesture stream |
+| `cir.rs` | 1,025 | 10 | 134 | CSI→CIR via ISTA L1 sparse recovery, NeumannSolver warm-start, `Complex32` sub-DFT Φ |
+| `calibration.rs` | 717 | 8 | 135 | Empty-room baseline (Welford amplitude + von Mises phase), drift-triggered recalibration |
+| `fusion_quality.rs` | 188 | 3 | 137 | `QualityScore` with `EvidenceRef`s, `ContradictionFlag`s, `CalibrationId`, privacy-demotion predicate |
+| `array_coordinator.rs` | 343 | 5 | 138 | Clock-quality gating + `DirectionalEvidence` (geometric admission) |
+| `evolution.rs` | 406 | 7 | 142 | Cross-link change-point detection, Bayesian `TemporalVoxelMap` (privacy-gated) |
+| `rf_slam.rs` | 301 | 6 | 143 | Persistent reflector discovery → static anchor learning (Wall/Furniture/Mobile classes) |
+
+Subtotal: ~14,400 lines, 310 tests inside `ruvsense/` alone. The non-ruvsense signal layer adds Hampel filtering, CSI-ratio, phase sanitisation, Fresnel modeling, BVP, spectrograms, subcarrier selection, and hardware normalisation (`signal/src/*.rs`).
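+
+For the two ISTA-based rows in the table (`tomography.rs`, `cir.rs`), the iteration can be written out for reference. This is the textbook form, not a transcription of the crate's code. The symbols are notation assumed here: Φ is the measurement matrix, y the observed CSI, x the sparse unknown (CIR taps or voxel weights), λ the L1 weight, and L the step constant.
+
+```latex
+\begin{aligned}
+\min_{x}\;& \tfrac{1}{2}\,\lVert \Phi x - y \rVert_2^2 + \lambda \lVert x \rVert_1 \\
+x^{(k+1)} &= S_{\lambda/L}\!\left( x^{(k)} - \tfrac{1}{L}\,\Phi^{H}\bigl(\Phi x^{(k)} - y\bigr) \right) \\
+S_{\theta}(z) &= z \cdot \max\!\left(0,\; 1 - \frac{\theta}{\lvert z \rvert}\right), \qquad S_{\theta}(0) = 0 \\
+L &\ge \lVert \Phi \rVert_2^2, \qquad \text{e.g. } L = \lVert \Phi \rVert_F^2 \ge \lVert \Phi \rVert_2^2
+\end{aligned}
+```
+
+The Frobenius norm is a cheap, conservative upper bound on the squared spectral norm, so a step of 1/‖Φ‖_F² keeps the iteration stable at the cost of slower convergence. Because it depends only on Φ, it can be computed once rather than per reconstruction.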
+
+**Cross-viewpoint fusion** (`wifi-densepose-ruvector/src/viewpoint/`, 5 files): scaled dot-product attention with geometric bias (`attention.rs`), Geometric Diversity Index + Cramér-Rao bounds (`geometry.rs`), phase-phasor coherence with hysteresis + clock-quality gate (`coherence.rs`), and the `MultistaticArray` aggregate root (`fusion.rs`). 136 tests crate-wide.
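+
+As a reading aid for "scaled dot-product attention with geometric bias", one common formulation is sketched below. The additive placement of the bias term G (one entry per viewpoint pair) is an assumption made here for illustration; this review did not confirm how `attention.rs` injects geometry. Q, K, V, and d_k follow standard attention notation.
+
+```latex
+\alpha = \operatorname{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} + G \right),
+\qquad
+z = \alpha V
+```
+
+Under this form, a strongly negative G_ij suppresses the contribution of viewpoint j to viewpoint i regardless of feature similarity, which is one way geometry deficits could become explicit rather than silent.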
+
+---
+
+## 4. The Trust Chain — What Actually Composes Today
+
+`wifi-densepose-engine/src/lib.rs` is the proof-of-composition. One `process_cycle` (`lib.rs:285-368`):
+
+1. ADR-138 array coordination (only if every node's geometry is registered, `lib.rs:372-389`)
+2. ADR-137 `fuse_scored_calibrated` with **per-node calibration epochs** — mismatching `CalibrationId`s raise a contradiction (`lib.rs:304-319`)
+3. ADR-142 change-point → WorldGraph `Event` node (`lib.rs:393-430`)
+4. ADR-141 monotonic privacy demotion on any contradiction (`demote_one`, `lib.rs:452-455`)
+5. ADR-139/140 `SemanticState` with mandatory provenance (evidence ‖ model ‖ calibration ‖ privacy decision) (`lib.rs:336-352`)
+6. BLAKE3 witness over the trust decision (`witness_of`, `lib.rs:437-448`)
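+
+Steps 4 to 6 can be condensed into a small formal sketch. The notation is introduced here and is not taken from the source: e is the evidence set, m the model version, c the calibration id, p the privacy level entering the cycle, and p' the level after the contradiction check. The byte layout actually hashed by `witness_of` is not described in this review, so the concatenation below is an assumption.
+
+```latex
+\begin{aligned}
+p' &=
+\begin{cases}
+\operatorname{demote\_one}(p) & \text{if the cycle raised any contradiction} \\
+p & \text{otherwise}
+\end{cases}
+\qquad\text{with } p' \preceq p \\
+w &= \mathrm{BLAKE3}\bigl( e \,\Vert\, m \,\Vert\, c \,\Vert\, p' \bigr)
+\end{aligned}
+```
+
+Here p' ⪯ p reads "at most as permissive as p", which is what monotonic demotion means. Since the hash is a pure function of its inputs, two cycles with identical (e, m, c, p') must produce the same witness, which is the property a determinism test can check.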
+
+The 11 engine tests verify exactly the right invariants: full provenance flow (`cycle_carries_full_provenance`, `lib.rs:487`), contradiction→demotion (`lib.rs:517`), determinism (`lib.rs:535`), calibration-mismatch→Restricted+stable-witness (`lib.rs:648`), privacy-mode attestation chain (`lib.rs:741`), and persist→reload round-trip with **no raw RF in the snapshot** (`live_frame_to_reload_same_contents`, `lib.rs:696-736`).
+
+This is a genuinely strong design. But all inputs are synthetic `MultiBandCsiFrame`s constructed in the test module; no ingest crate calls `StreamingEngine` yet.
+
+---
+
+## 5. Strengths
+
+1. **Deterministic witness chain, end to end in design.** ADR-028 proof (`archive/v1/data/proof/verify.py` + SHA-256), ADR-119 BLAKE3 frame witnesses (`bfld/src/signature_hasher.rs`), ADR-136 `CanonicalFrame`/`ComplexSample` LE contracts, and the engine's per-cycle trust witness form a coherent auditability story few sensing systems attempt.
+2. **Privacy as a control plane, not a feature.** BFLD's three structural invariants (`bfld/src/lib.rs:7-11`), hash-rotation (ADR-120), identity-risk scoring (ADR-121), mode registry with hash-chained attestations, and *monotonic* demotion wired to fusion contradictions (engine `lib.rs:327-328`) — uncertainty automatically reduces information release.
+3. **Multistatic fusion with physics-grounded quality.** Attention fusion + GDI + Cramér-Rao bounds + clock-quality gating means geometry and synchronisation deficits are first-class, measurable contradiction sources rather than silent failure modes.
+4. **Test density at the unit level.** 3,890 static test functions; the signal core (473), BFLD (369), and sensing-server (571) are the deepest. ruvsense files average ~14 tests/module.
+5. **Honest self-assessment culture.** ADR-136 §8's "skeleton, not a shipping product" framing, ADR-147's explicit "random weights" disclosure, and homecore's in-source TODO-P2 ledger (`homecore/src/lib.rs:24-31`) make the gap analysis below mostly a matter of reading what the project already admits.
+6. **A real prior research base with negative results.** The sota-2026-05-22 loop catalogued negatives by resolution path (missing-tool / architecture-error / physics-floor) and produced a ship-recipe (N=5 chest-centric placement, 100% coverage for 1–4 occupants) consolidated into ADR-113.
+7. **Hardware path exists and was audited.** ADR-028 (Accepted) and ADR-039 (Accepted, hardware-validated) anchor the ESP32-S3/C6 ingest tier; firmware release process includes real-CSI verification on COM ports.
+
+---
+
+## 6. Honest Gap Analysis — ADR vs Implemented vs Integrated
+
+| Capability | ADR status | Code status | Integrated on live path? |
+|---|---|---|---|
+| Six-stage ruvsense pipeline | ADR-029 **Proposed** | Implemented + tested (310 tests) | Partially — sensing-server has `multistatic_bridge.rs`/`tracker_bridge.rs`, but `RuvSensePipeline` still holds concrete fields with `tick()` only (`mod.rs`); no uniform `Stage<I,O>` chain runs live |
+| Frame contracts (`ComplexSample`, provenance fields, `Stage` traits) | ADR-136 Proposed | Built + 9 acceptance tests (per ADR-136 §8, commit `11f89727f`) | **No** — AC6 600-frame replay witness key and AC7 cross-arch CI matrix not done; provenance fields not populated by live calibration/model stages |
+| Fusion quality / contradictions | ADR-137 Proposed | `fusion_quality.rs` (188 lines, 3 tests) + engine wiring | Engine-tests only |
+| WorldGraph digital twin | ADR-139 Proposed | `wifi-densepose-worldgraph` (4 files, 7 tests) | Engine-tests only; no recorder-backed persistence loop |
+| Privacy control plane | ADR-141 Proposed | `privacy_mode.rs` registry + attestation chain, tested | Engine-tests only; MQTT/HA exposure exists in BFLD but the *engine→BFLD sink* live path is unwired |
+| UWB range fusion | ADR-144 Proposed | **No hardware, no crate** — acknowledged absent (`engine/src/lib.rs:28`) | No |
+| Ablation/leakage eval harness | ADR-145 Proposed | `wifi-densepose-train/src/ablation.rs` exists | Self-labelled "skeleton/scaffolding" (ADR-145 §status) |
+| RF encoder multi-task heads | ADR-146 Proposed | Not trained; `model_id`/`model_version` registry unowned | No — engine stamps `rfenc-v1` as a placeholder string (`lib.rs:338`) |
+| RF foundation encoder | ADR-150 **Proposed** | ADR only | No |
+| World-model forecasting (OccWorld) | ADR-147 (benchmark doc) | Runs on RTX 5080, 72.39M params — **random weights**, no domain checkpoint | No |
+| HomeCore HA port | ADR-125–133 all Proposed | P1 scaffold + siblings; `homecore-server` has **0 tests**; persistence, service mpsc dispatch, device registry, witness integration all deferred (`homecore/src/lib.rs:24-31`) | Partially (API surfaces exist) |
+| BFLD capture path (Nexmon/ESP32 BFI) | ADR-123 Proposed | rvCSI vendored runtime exists for nexmon `.pcap`; BFI-specific capture unverified in this review | Unclear |
+| Drone swarm | ADR-148 In Progress | 17 modules, sim + Candle PPO complete per milestones | **Simulation only** — M3's 15% requires physical ESP32-S3 CSI capture (ADR-148:946) |
+| Federation / DP-SGD / PQC | ADR-105–109 Proposed | `ruview-fed` crate **does not exist** (roadmap Tier 2 item) | No |
+| Antenna-placement CLI (`plan-antennas`) | ADR-113 Proposed; Roadmap Tier 1.1 HIGH | numpy references only; not found as a Rust CLI subcommand | No |
+
+**Pattern:** the unit layer is real and deep; the *integration* layer is one crate (`wifi-densepose-engine`) exercised solely by its own synthetic tests; the *model* layer (anything learned: RF encoder, pose model fine-tuned on CSI, OccWorld domain weights) is the emptiest tier. Nearly every ADR ≥118 carries status **Proposed** even where substantial tested code exists — ADR status hygiene lags implementation in both directions (BFLD code outruns its "P1 in progress" header; ADR-148's "98%" outruns its hardware evidence).
+
+---
+
+## 7. Risk Register
+
+| # | Risk | Likelihood | Impact | Evidence / Notes |
+|---|---|---|---|---|
+| R1 | **Integration gap**: trust chain proven only against synthetic in-test frames; live 20 Hz ingest→engine→BFLD-sink path unwired, so the headline guarantee (auditable provenance on every emission) is unverified in production conditions | High | Critical | `engine/src/lib.rs:27-29`; ADR-136 §8 |
+| R2 | **No trained model**: every learned component (RF encoder ADR-146/150, OccWorld ADR-147) is random-weight or absent; sensing claims beyond coherence/occupancy heuristics cannot ship | High | Critical | ADR-147 "random weights"; ADR-146/150 Proposed |
+| R3 | **Synthetic-validation bias**: ruvsense/engine/swarm tests and the sota-loop results (e.g., R3 "100% (synthetic)", ADR-113 placement numbers) are simulation-derived; real-room domain gap unquantified | High | High | `00-summary.md:45`; PRODUCTION-ROADMAP 2.3 ("turns synthetic numbers into validated numbers") |
+| R4 | **Witness chain incomplete at frame level**: `CsiFrame.data` is still `serde(skip)` (ADR-136 Gap 2); AC6 replay-witness key and AC7 cross-architecture matrix not landed — deterministic replay is a design, not a property | Medium | High | ADR-136 §1.1, §8 |
+| R5 | **Float nondeterminism in fusion** across thread counts could silently break the witness/replay contract once wired | Medium | High | ADR-136 §3.3 risk table (project's own assessment) |
+| R6 | **Privacy bypass via unwired paths**: BFLD invariants are enforced per-module, but until the engine is the *only* route from ingest to API, a sensing-server endpoint can emit ungated state (sensing-server already has 30+ modules incl. pose/vitals APIs predating the control plane) | Medium | Critical | `sensing-server/src/` module list vs engine isolation |
+| R7 | **Hardware dependence + scale**: multistatic TDMA/channel-hopping timing validated on small ESP32 sets; ADR-148 M3 explicitly blocked on real hardware; clock-quality model in engine uses a hardcoded `ClockQualityScore` (`engine/src/lib.rs:384`) | Medium | High | ADR-148:946; hardcoded 50 µs stdev |
+| R8 | **ADR/doc/status drift**: 150 ADRs with near-universal "Proposed" status, stale in-source status headers (`bfld/src/lib.rs:12`), CLAUDE.md "16 ruvsense modules" vs 22 on disk, duplicate ADR numbers (ADR-050, ADR-052, ADR-147, and ADR-149 each appear twice) — institutional-memory value degrades | High | Medium | `ls docs/adr/`; this review §3 |
+| R9 | **Workspace breadth vs maintenance capacity**: 38 workspace crates + 4 vendored subtrees + Python archive + firmware; several crates have 0 tests (`homecore-server`, `nvsim-server`, `wifi-densepose-wasm`, `homecore-plugin-example`); bus factor appears to be ~1 | High | Medium | crate test-count table §2 |
+| R10 | **Eval debt**: no end-to-end accuracy benchmark on real CSI with ground truth exists in-repo (ADR-145 harness is scaffolding; ADR-079 camera ground truth not exercised here) — "beyond SOTA" claims are currently unfalsifiable | High | High | ADR-145 status note; absence of ground-truth datasets in tree |
+
+---
+
+## 8. Measurement Appendix
+
+- Test counts: `grep -rF '#[test]'` / `grep -rF '#[tokio::test]'` per crate directory, 2026-06-09 (`-F` matches the attribute as a fixed string; unescaped, `[test]` is parsed as a regex bracket expression). Workspace totals: 3,890 / 121. Top crates: `wasm-edge` 643, `sensing-server` 557+14, `signal` 473, `bfld` 369, `ruv-neural` 364, `train` 312, `mat` 165+9, `wifiscan` 150, `hardware` 137, `ruvector` 136, `ruview-swarm` 115+19.
+- ruvsense per-file lines/tests: `wc -l` + per-file `grep -cF '#[test]'` (table in §3).
+- Crate inventory: `ls v2/crates/` → 38 directories.
+- ADR inventory: `ls docs/adr/` → 150 numbered files (with the duplicate numbers noted in R8); `docs/adr/README.md` self-reports "45 ADRs" (stale).
+- Caveats: static `#[test]` counts include `#[cfg(feature = ...)]`-gated and ignored tests; they are an upper bound on what `cargo test --workspace --no-default-features` runs. No cargo build/test was executed for this review.
+
+*Next in series: 01+ documents should target the R1/R2/R10 axis — wiring the live path, training the RF encoder, and standing up a falsifiable real-CSI benchmark — before any "beyond SOTA" claim is made.*
--- a/docs/research/ruview-beyond-sota/01-sota-landscape-2026.md
+++ b/docs/research/ruview-beyond-sota/01-sota-landscape-2026.md
@@ -0,0 +1,191 @@
+# SOTA Landscape 2026 — The Bar a Beyond-SOTA RuView Must Clear
+
+**Series**: ruview-beyond-sota (01)
+**Date**: 2026-06-09
+**Status**: Research survey / target definition
+**Builds on (does not duplicate)**: `docs/research/sota-2026-05-22/00-summary.md` (physics floors, placement, privacy chain), `docs/research/BFLD/01-sota-survey.md` (beamforming-feedback leakage SOTA), `docs/research/neural-decoding/21-sota-neural-decoding-landscape.md` (sensor-fidelity framing), `docs/research/rf-topological-sensing/00-rf-topological-sensing-index.md` (mincut/topology resolution limits), ADR-150 (RF foundation encoder + measured MM-Fi campaign), ADR-147 (OccWorld benchmark proof).
+
+## 0. Evidence legend
+
+Every claim in this document carries one of three tags. **No RuView benchmark number in this document is invented**; all RuView numbers come from repo-internal measured artifacts.
+
+| Tag | Meaning |
+|-----|---------|
+| **[V]** | Verified in this session via web search (June 2026); source linked in §8 |
+| **[K]** | Training-knowledge claim (pre-2026 literature); plausible but **not re-verified** — treat as needing citation check before external publication |
+| **[I]** | Internal RuView measurement or artifact (ADR, issue, witness bundle) — measured, not literature |
+
+---
+
+## 1. SOTA reference table per capability axis
+
+### 1.1 Pose estimation (WiFi CSI)
+
+| Method | Year | Metric | Dataset / protocol | Tag |
+|--------|------|--------|--------------------|-----|
+| DensePose From WiFi (Geng, Huang, De la Torre) | 2023 | Dense-pose UV regions from CSI, "comparable to image-based approaches" (same-layout); commonly cited AP≈43.5 / AP@50≈87.2 | 3×3 antenna, single-layout lab | exact AP numbers **[K]**; paper existence **[V]** (arXiv 2301.00250) |
+| MetaFi++ (Zhou et al.) | 2023 | PCK@50 = **97.30%** same-domain real-world (MetaFi: 95.23%); drops to **81.7–86.5%** under stricter protocols | Own capture; protocol-sensitive | **[V]** |
+| Person-in-WiFi 3D (CVPR 2024) | 2024 | End-to-end multi-person 3D; 20.4 M params, **54 FPS**; MPJPE ≈ 90–100 mm on own dataset | Own multi-person dataset | FPS/params **[V]**; MPJPE range **[K]** |
+| GraphPose-Fi (arXiv 2511.19105) | 2025 | SOTA on MM-Fi random split: **MPJPE 160.6 mm**, best PCK at all thresholds | MM-Fi, random split (S1) | **[V]** |
+| CSDS (Electronics 14(4):756) | 2025 | Wi-Pose: PCK@5 = **0.6407**, PCK@50 = **0.8824** | Wi-Pose | **[V]** |
+| PerceptAlign (arXiv 2601.12252) | 2026 | Cross-layout 3D: MPJPE **222.4 mm** (Scene 4) / **317.1 mm** (Scene 5), >54% better than prior cross-layout SOTA; in easier settings MPJPE 181.5 mm, PCK@20/50 = 44.2/79.5 | Cross-layout protocol | **[V]** |
+| WiFlow (arXiv 2602.08661) | 2026 | Lightweight continuous HPE, spatio-temporal decoupling | — | **[V]** (existence; numbers not extracted) |
+| **RuView / AetherArena** | 2026 | **81.63% torso-PCK@20 in-domain (random split), beating MultiFormer's 72.25%** on metric/protocol-matched MM-Fi; **leakage-free cross-subject collapses to ~11.6% torso-PCK zero-shot**; official-split harness baseline ~63–65% PCK@20; **11 KB LoRA few-shot calibration → 72.5%** | MM-Fi (issue #876, ADR-150 §3) | **[I]** |
+
+**The honest reading of the pose axis**: same-domain WiFi pose is "solved-looking" (PCK@50 in the 90s) and meaningless for deployment. The 2025–2026 literature has shifted to cross-layout/cross-subject protocols, where numbers collapse (PerceptAlign PCK@20 = 44.2 cross-layout **[V]**; RuView cross-subject zero-shot 11.6% **[I]**). ADR-150's measured finding — that the cross-subject gap is **subject-distribution shift, not an algorithmic gap**, and that **few-shot in-room calibration (5–200 frames) closes it** — is ahead of where the published literature is: no published WiFi-pose paper we found ships a per-room ~11 KB adapter calibration mechanism. **[I]**
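+
+Because the table mixes PCK and MPJPE rows, it helps to state both metrics explicitly. The definitions below are the usual ones, with N frames, J joints, predicted joint ŷ and ground-truth joint y. The normalization length s_n is protocol-dependent; torso length is assumed here as a common choice, and each cited paper's exact convention (including what "torso-PCK" normalizes by) should be checked before comparing rows.
+
+```latex
+\begin{aligned}
+\mathrm{PCK@}k &= \frac{1}{NJ} \sum_{n=1}^{N} \sum_{j=1}^{J}
+  \mathbf{1}\!\left[\, \lVert \hat{y}_{n,j} - y_{n,j} \rVert_2 \le \tfrac{k}{100}\, s_n \right] \\
+\mathrm{MPJPE} &= \frac{1}{NJ} \sum_{n=1}^{N} \sum_{j=1}^{J}
+  \lVert \hat{y}_{n,j} - y_{n,j} \rVert_2
+\end{aligned}
+```
+
+PCK is a thresholded hit rate (higher is better, unitless), while MPJPE is an average distance (lower is better, in mm). A PCK row and an MPJPE row therefore cannot be ranked against each other, and two PCK rows are comparable only when k, s_n, and the data split all match.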
+
+### 1.2 Presence / person count
+
+| Method | Year | Metric | Tag |
+|--------|------|--------|-----|
+| Large-scale commodity router deployment (>10 M routers) | 2025 | **92.6% motion-detection accuracy** across diverse homes | **[V]** (ISAC survey, arXiv 2510.14358) |
+| LeakyBeam (NDSS 2025) | 2025 | Occupancy through walls at 20 m from **plaintext BFI alone**: TPR 82.7%, TNR 96.7% | **[V]** (also in BFLD survey §4.2) |
+| Time-Selective RNN multi-room presence (arXiv 2304.13107) | 2023 | Device-free multi-room presence from CSI | **[V]** (existence) |
+| Academic person counting (0–5 occupants, lab) | 2020–2024 | typically 90–97% exact-count accuracy, degrading sharply >5 people | **[K]** |
+| **RuView** | 2026 | `cog-person-count` ships with calibrated uncertainty (`count_p95_low/high`); multistatic placement recipe with **100% coverage for 1–4 occupants at N=5 nodes (synthetic physics)** | **[I]** (sota-2026-05-22 R6.2.5, ADR-113) |
+
+### 1.3 Vital signs (HR / BR)
+
+| Method | Year | Metric | Tag |
+|--------|------|--------|-----|
+| PhaseBeat (ACM Health) | 2020 | HR median error **1.19 bpm**; BR median error **0.25 breaths/min** | **[V]** |
+| MDPI Sensors 24(7):2111 non-contact HR | 2024 | HR accuracy 96.8%, **median error 0.8 bpm** | **[V]** |
+| PulseFi (arXiv 2510.24744) | 2025 | Low-cost ML cardiopulmonary + **apnea** monitoring from CSI | **[V]** (existence; numbers not extracted) |
+| mmWave FMCW vitals (60 GHz class) | 2023–2026 | HR MAE typically 1–3 bpm at 1–3 m, single subject; age-balanced reference dataset published (Sci Data 2026) | dataset **[V]**; MAE range **[K]** |
+| Contactless blood pressure (WiFi-band) | — | **NEGATIVE** — below classical physics floor; recoverable only via quantum magnetometry path | **[I]** (R13/R20 arc, ADR-114) |
+| **RuView** | 2026 | `wifi-densepose-vitals` (ADR-021) extracts HR/BR from ESP32 CSI; chest-centric placement gives **+27 pp coverage** for vitals cogs (synthetic) | **[I]** — **no accuracy-vs-ECG validation number exists in-repo yet; do not claim one** |
+
+**Bar**: published single-subject, line-of-sight, 1–3 m WiFi HR is ~0.8–1.2 bpm median error **[V]**. Nobody credibly publishes multi-person, through-wall, walking-subject HR at that accuracy — that is open territory.
+
+### 1.4 Localization (ToA / CRLB)
+
+| Method | Year | Metric | Tag |
+|--------|------|--------|-----|
+| 802.11mc FTM | shipped | 1–2 m typical accuracy | **[V]** (FTM survey, arXiv 2509.03901) |
+| 802.11az (+ 802.11bk) | released | **sub-1 m**, 160 MHz channels, secured ranging, HE-LTF repetitions | **[V]** |
+| AI single-link decimeter localization | 2025 | **0.63 m average error** single-link, beating Widar2.0 / Dynamic-MUSIC | **[V]** |
+| SpotFi / Chronos / Widar lineage | 2015–2021 | 0.4–1 m with multi-AP CSI AoA/ToF | **[K]** |
+| **RuView** | 2026 | CRLB / Fisher-information machinery in `ruvector/src/viewpoint/geometry.rs`; tomography ISTA voxel grid; **theoretical** limits derived internally: 30–60 cm at 16 nodes/1 m spacing, 8.8 cm information-theoretic dense limit | **[I]** (rf-topological-sensing doc 09 — synthetic derivations, no bench numbers) |
+
+### 1.5 Through-wall
+
+| Method | Year | Metric | Tag |
+|--------|------|--------|-----|
+| RF-Pose / RF-Pose3D (MIT, FMCW 5.4–7.2 GHz) | 2018 | Through-wall skeletal pose using specialized FMCW radar, not commodity WiFi | **[K]** |
+| Commodity 2.4 GHz through-wall imaging (arXiv 1903.03895) | 2019 | Coarse imaging through walls with commodity WiFi | **[V]** (existence) |
+| Radio tomographic imaging (RTI) lineage | 2010–2013 | Through-wall tracking via RSS networks, ~0.5–1 m tracking error | **[V]** (papers) / error figure **[K]** |
+| LeakyBeam (NDSS 2025) | 2025 | Through-wall **occupancy** at 20 m, passive, commodity | **[V]** |
+| **RuView** | 2026 | RF tomography module (`tomography.rs`, ISTA L1 voxel solver) + CIR (ADR-134) exist as code; **PABS structure detection: 1,161× static / 9.36× dynamic intruder lift (synthetic)** | **[I]** |
+
+Notably, the 2025–2026 web literature shows through-wall *pose* (not just presence) on commodity WiFi remains essentially where it was in 2019 — no verified commodity-WiFi through-wall pose benchmark surfaced in our searches. The frontier moved to privacy attacks (BFI) instead.
+
+### 1.6 Identity / re-ID (capability and threat simultaneously)
+
+| Method | Year | Metric | Tag |
+|--------|------|--------|-----|
+| BFId (KIT, ACM CCS 2025) | 2025 | **~99.5% (near-100%) re-ID across 197 subjects** from beamforming feedback alone, ≥5 s of BFI | **[V]** (also BFLD survey §4.1) |
+| Transformer CSI identification | 2025 | **99.82%** on stationary subjects | **[V]** |
+| WhoFi (arXiv 2507.12869) | 2025 | Deep person re-ID via WiFi channel encoding, ~95% rank-1 class results | existence **[V]**; exact number **[K]** |
+| Wi-Gait | 2023 | 92.9% over 10 subjects, robust to walking cofactors | **[V]** |
+| **RuView** | 2026 | AETHER contrastive re-ID embeddings (ADR-024) in pose tracker; **BFLD**: first *defensive* identity-leak detector (identity_risk_score) — the literature attacks, RuView audits | **[I]** |
+
+### 1.7 Adjacent modality: mmWave radar (the accuracy ceiling WiFi is chasing)
+
+| Method | Year | Metric | Tag |
+|--------|------|--------|-----|
+| mmChainPose | 2025 | **27.0 mm MPJPE** / 0.8706 OKS on MARS (mmWave point cloud) | **[V]** |
+| ProbRadarM3F (arXiv 2405.05164) | 2024–25 | SOTA AP across joints, probability-map fusion | **[V]** |
+| Seeed MR60BHA2-class 60 GHz FMCW | shipped | Commodity $15 HR/BR/presence module — already in RuView's hardware table | **[I]** |
+
+mmWave is ~6× better than the best WiFi MPJPE (27 mm vs 160 mm) **[V]**. The strategic implication: WiFi will not beat mmWave on raw geometry; it wins on ubiquity, cost, through-wall propagation, and standardized waveforms (§2). RuView already hedges with the ESP32-C6 + MR60BHA2 fusion node. **[I]**
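+
+The "~6×" figure is the ratio of the two MPJPE rows cited above (GraphPose-Fi in §1.1 and mmChainPose in §1.7):
+
+```latex
+\frac{\mathrm{MPJPE}_{\text{WiFi}}}{\mathrm{MPJPE}_{\text{mmWave}}}
+= \frac{160.6\ \text{mm}}{27.0\ \text{mm}} \approx 5.9
+```
+
+The two numbers come from different datasets (MM-Fi random split versus MARS), so the ratio is an order-of-magnitude indicator, not a protocol-matched comparison.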
+
+---
+
+## 2. IEEE 802.11bf — status and implications
+
+**Status (verified)**: IEEE **802.11bf-2025 is ratified and published** (IEEE SA lists the amendment; ratification late 2024 / publication 2025) **[V]**. It amends MAC/PHY of HE (Wi-Fi 6) and EHT (Wi-Fi 7) plus DMG/EDMG (60 GHz) to support WLAN sensing in 1–7.125 GHz and >45 GHz bands **[V]**. The Wi-Fi Alliance has Wi-Fi Sensing as an active certification work area built on 802.11bf (presence/proximity, gestures, vital signs) **[V]**. Market reports claim >47 chipset vendors with 802.11bf-compatible programs as of early 2026 — single weak source, treat as directional **[V, low confidence]**.
+
+**What it implies for RuView**:
+
+1. **Sounding-on-demand becomes standard.** 802.11bf defines a sensing-measurement procedure (sensing initiator/responder, trigger-based sounding, threshold-based reporting). Today RuView relies on Espressif's vendor CSI API and Nexmon firmware patches; post-bf, commodity Wi-Fi 7 silicon will expose scheduled sensing measurements without firmware hacks. The rvCSI normalized `CsiFrame` schema is the right abstraction layer to absorb a future bf adapter (`rvcsi-adapter-*`). **[I]**
+2. **The moat moves up the stack.** When every router can sense, raw CSI access stops being differentiating. Differentiators become: multistatic fusion, coherence gating / anti-hallucination, calibration mechanisms, witness-grade verification, and privacy auditing — exactly RuView's existing bets (ADR-029/135/150/028, BFLD). **[I]**
+3. **Privacy pressure intensifies.** 802.11bf standardizes the capability that BFId/LeakyBeam exploit. BFLD's identity-leak detection and the ADR-105–109 privacy/PQC chain become regulatory assets, not nice-to-haves. **[V]+[I]**
+4. **Threshold-based reporting** in bf (report only when channel changes exceed threshold) is architecturally the same idea as RuView's coherence gate — validation that the gate belongs at the protocol layer. **[K]** (bf reporting detail from training knowledge)
+
+---
+
+## 3. RF foundation model landscape ("GPT for RF")
+
+Verified 2025–2026 attempts, all young, none dominant:
+
+| Model | Approach | Downstream tasks | Tag |
+|-------|----------|------------------|-----|
+| **LWM (Large Wireless Model)** | Pretrained on large-scale CSI → general channel embeddings | LoS/NLoS, beats raw features in low-data regimes | **[V]** |
+| **LatentWave** (arXiv 2606.06373) | JEPA pretraining on wireless spectrograms + CSI | RF classification, 5G NR positioning, beam prediction, LoS/NLoS | **[V]** |
+| **WirelessJEPA** (arXiv 2601.20190) | Multi-antenna spatio-temporal latent prediction | Cross-task transfer | **[V]** |
+| **IQFM** | Contrastive SSL on raw I/Q | Modulation classification, beam prediction, RF fingerprinting, few-shot | **[V]** |
+| **Multimodal Wireless FMs** (arXiv 2511.15162), **WMFM** (arXiv 2512.23897), **SoM** (arXiv 2506.07647) | Vision + RF multimodal for 6G ISAC | Sensing-communication integration | **[V]** |
+| **DeepSig OmniSIG** | Commercial AI-native RF sensing, 500 MHz/GPU spectrum | Signal ID (LTE/5G/Wi-Fi) | **[V]** |
+
+**Critical observation**: every verified RF foundation model targets *communication-side* tasks (beam prediction, LoS/NLoS, modulation, positioning). **None of them is a human-sensing foundation model** — none pretrains for pose/vitals/identity invariances. ADR-150's measured negative result is the sharpest data point in this space: pose-contrastive pretraining across subjects **failed on MM-Fi because the invariance is not in the data** (loss never left the ln(B) floor) **[I]**. The literature has not yet published this failure mode; the field's "GPT for RF sensing" narrative is ahead of its evidence. The defensible foundation-model objective (per ADR-150 §3.5–3.6) is **reduce few-shot calibration cost**, not zero-shot invariance. **[I]**
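
The `ln(B)` figure quoted above is the chance level of an InfoNCE-style contrastive loss with B candidates per anchor: if the encoder gives every candidate the same similarity, the softmax is uniform and the loss equals ln(B). The sketch below assumes ADR-150's objective is InfoNCE-shaped (the text here does not spell out the loss). `info_nce_single_anchor` and its parameters are illustrative names, not the workspace's `info_nce_loss`.

```rust
/// InfoNCE loss for one anchor (illustrative sketch).
/// `sims[positive]` is the similarity of the positive pair; every other
/// entry is a negative. Computed as log-sum-exp(logits) - logit_positive.
pub fn info_nce_single_anchor(sims: &[f32], positive: usize, temperature: f32) -> f32 {
    let logits: Vec<f32> = sims.iter().map(|s| s / temperature).collect();
    // Subtract the max before exponentiating for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let log_sum_exp = max + logits.iter().map(|l| (l - max).exp()).sum::<f32>().ln();
    log_sum_exp - logits[positive]
}
```

With eight identical similarities the function returns ln(8) regardless of temperature, which is the signature of an encoder that has learned nothing about the positive pair; a well-separated positive drives the loss toward zero.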
+
+---
+
+## 4. "Beyond SOTA" for RuView — precise definition
+
+Targets below are **bar definitions**, not claims. RuView numbers in the "RuView measured today" column are measured [I]; targets must be proven via the AetherArena witness protocol (ADR-149) before being asserted anywhere.
+
+| Capability | Published SOTA (2026) | RuView measured today | RuView beyond-SOTA target | Key obstacle |
+|------------|----------------------|----------------------|---------------------------|--------------|
+| Pose, in-domain (MM-Fi) | GraphPose-Fi 160.6 mm MPJPE; MultiFormer 72.25% torso-PCK@20 **[V]** | **81.63% torso-PCK@20** (already > published) **[I]** | Hold #1 under leakage-free audit + per-joint tables published with witness rows | Protocol fragmentation; reviewers distrust WiFi-pose numbers |
+| Pose, cross-subject zero-shot | Near-collapse across published work; PerceptAlign PCK@20 44.2 cross-layout **[V]** | 11.6% torso zero-shot; 63–65% in-harness official split **[I]** | Stop chasing it (measured dead end); instead **few-shot frontier** below | Subject-distribution shift is in the data, not the model (ADR-150 §3.2) |
+| Pose, deployment calibration | **No published per-room adapter mechanism found** | **11 KB LoRA, 100–200 frames → 72.5%; cross-env K=5 → 60.1%** **[I]** | ≤20 frames → ≥70% PCK@20, adapter ≤11 KB, 30 s on-site; publish as the first calibration-service benchmark | Needs diverse-room capture fleet to validate beyond MM-Fi |
+| Presence/motion (commodity) | 92.6% across 10 M routers **[V]** | Synthetic placement recipe 100% coverage N=5 **[I]** | ≥99% presence with calibrated p95 bounds on $6–15 ESP32 mesh, bench-validated | All placement numbers are synthetic; Tier-2.3 bench validation outstanding |
+| Person count | ~90–97% lab, ≤5 people **[K]** | cog ships uncertainty intervals **[I]** | Exact count 1–6 people ≥95% with honest intervals, multistatic, real bench | Multi-person CSI superposition; no public multi-occupancy benchmark |
+| Vital signs HR | 0.8–1.2 bpm median, single subject, LoS, 1–3 m **[V]** | No in-repo ECG-validated number — **must not be claimed** | ≤1.5 bpm MAE vs ECG ground truth, *multi-person or through-wall*, witness-bundled | R13 physics floor: ~5 dB shortfall at distance; needs chest-centric placement + PABS |
+| Vital signs BP | NEGATIVE at WiFi band (matches internal R13) | nvsim quantum path only **[I]** | First validated quantum-classical fused bedside vitals (ADR-114) | NV-diamond hardware maturity, 2028+ |
+| Localization | 0.63 m single-link AI; sub-1 m 802.11az **[V]** | CRLB machinery, no bench number **[I]** | ≤30 cm multistatic on ESP32 mesh (internal theory says feasible at N=16) | ESP32 clock sync / phase offset (TDM protocol exists, unproven at this accuracy) |
+| Through-wall | Occupancy yes (LeakyBeam); commodity pose: nothing credible **[V]** | tomography + CIR code, PABS 9.36× lift (synthetic) **[I]** | First witnessed commodity-WiFi through-wall *person localization* (not pose) ≤1 m | Wall attenuation eats the R6.1 4.7 dB multi-scatterer budget |
+| Identity / re-ID | ~99.5% @ 197 subjects (attack) **[V]** | AETHER + **BFLD defensive auditing** (no published competitor) **[I]** | Ship the first identity-leak risk score with DP budget hook; keep re-ID opt-in only | Calibrating risk score at 802.11ax 4/2-bit quantization (BFLD open Q2) |
+| Verification | **Nothing comparable published** — no WiFi-sensing paper ships deterministic re-verification | ADR-028 witness bundles, SHA-256 proof, 7/7 self-verify, 1,031+ tests **[I]** | Make witness-grade reproduction the *expected* standard: every public claim = one-command verification | Community adoption, not technology |
+| Foundation encoder | Comms-task FMs only (LWM/JEPA family) **[V]** | Masked-CSI + coherence head planned; pose-contrastive refuted **[I]** | First *sensing* FM whose acceptance metric is calibration-sample reduction (frames-to-72% halved) | SSL must match production CSI pipeline (ADR-149 resampling risk) |
+
+---
+
+## 5. Where RuView already matches/exceeds published work
+
+1. **In-domain MM-Fi pose** — 81.63% torso-PCK@20 vs MultiFormer 72.25%, metric- and protocol-matched (issue #876). **[I]**
+2. **Deployment-calibration mechanism** — the 11 KB LoRA per-room adapter with measured frames-to-accuracy curves (§3.4–3.6 of ADR-150) has no published equivalent; the literature is still arguing about zero-shot generalization that ADR-150 measured to be a data property.
+3. **Deterministic witness verification** — ADR-028's SHA-256 pipeline proof + self-verifying bundles exceeds the reproducibility practice of every WiFi-sensing paper surveyed (none ship deterministic re-verification).
+4. **Multistatic cost point** — $6–15/node ESP32 mesh with TDM sync, channel hopping, placement recipes (ADR-113) vs literature setups using Intel 5300/AX210 laptops or USRPs; ~$30/bed vs $3,000 clinical monitor framing (R16).
+5. **Defensive identity auditing (BFLD)** — the field publishes attacks (BFId, LeakyBeam, WhoFi); RuView is building the only detector/auditor, plus a PQC-hardened federation privacy chain (ADR-105–109) with no published counterpart.
+6. **Anti-hallucination coherence gating** — confidence gated by RF integrity (ADR-135, ADR-150 §2.4); WiFi-pose papers uniformly lack a "the model knows when the channel is bad" signal.
+7. **Negative-result discipline** — physics floors (R13 BP, R6.1 4.7 dB), refuted pose-contrastive pretraining — published SOTA papers do not report these, which inflates the apparent literature bar.
+
+## 6. Where RuView lags
+
+1. **Bench validation** — nearly all multistatic/placement/tomography numbers are synthetic-physics; the 92.6%-on-10M-routers deployment **[V]** is real-world evidence at a scale RuView cannot approach.
+2. **Vital-sign ground truth** — no in-repo ECG/respiration-belt validated HR/BR error; published work has 0.8 bpm median **[V]**. This is the most urgent claim gap.
+3. **Raw geometric accuracy** — mmWave (27 mm MPJPE **[V]**) and even best-WiFi MPJPE (160.6 mm **[V]**) have no RuView MPJPE counterpart published; AetherArena reports PCK only.
+4. **802.11bf-native capture** — RuView is on vendor CSI APIs and Nexmon patches; no bf sensing-procedure adapter exists yet in rvCSI.
+5. **Multi-person pose** — Person-in-WiFi-3D does end-to-end multi-person at 54 FPS **[V]**; RuView's pose path is effectively single-person (multi-person exists only in count/placement work).
+6. **Dataset scale and diversity** — MM-Fi only; ADR-150 §3.3 shows the binding constraint is room/device/protocol diversity, which requires the capture fleet that doesn't exist yet.
+
+## 7. Strategic synthesis
+
+The 2026 bar is bimodal: **lab in-domain numbers are saturated** (PCK@50 > 95%, HR < 1 bpm) and **deployment numbers are collapsed** (cross-layout PCK@20 ≈ 44%, zero-shot cross-subject ≈ 11%). 802.11bf-2025 commoditizes raw sensing; foundation models commoditize comms-side embeddings. "Beyond SOTA" for RuView is therefore *not* a leaderboard delta. It means owning the three layers the field hasn't built: **(a)** witnessed, deterministic, leakage-audited evaluation; **(b)** the few-shot calibration service (11 KB adapters) as the deployment answer the zero-shot literature lacks; **(c)** the privacy/integrity layer (BFLD + coherence gate) that 802.11bf-era regulation will demand. Each row in §4's target table is gated on the AetherArena witness protocol: a target becomes a claim only when it ships with a one-command reproduction.
+
+---
+
+## 8. Verified sources (accessed 2026-06-09 via web search)
+
+Pose: [GraphPose-Fi](https://arxiv.org/html/2511.19105v1) · [PerceptAlign / cross-layout](https://arxiv.org/html/2601.12252) · [CSDS](https://www.mdpi.com/2079-9292/14/4/756) · [Person-in-WiFi 3D](https://aiotgroup.github.io/Person-in-WiFi-3D/) · [DensePose From WiFi](https://arxiv.org/abs/2301.00250) · [MetaFi++](https://www.researchgate.net/publication/369644995_MetaFi_WiFi-Enabled_Transformer-based_Human_Pose_Estimation_for_Metaverse_Avatar_Simulation) · [WiFlow](https://arxiv.org/html/2602.08661v2)
+Vitals: [PhaseBeat](https://dl.acm.org/doi/abs/10.1145/3377165) · [Non-contact HR (Sensors 24:2111)](https://www.mdpi.com/1424-8220/24/7/2111) · [PulseFi](https://arxiv.org/pdf/2510.24744) · [mmWave vitals dataset (Sci Data)](https://www.nature.com/articles/s41597-026-07172-9)
+Localization: [FTM survey 802.11mc/az/bk](https://arxiv.org/abs/2509.03901) · [Decimeter single-link](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12846125/) · [SelfLoc 802.11az](https://www.mdpi.com/2079-9292/14/13/2675)
+802.11bf: [IEEE SA 802.11bf-2025](https://standards.ieee.org/ieee/802.11bf/11574/) · [TGbf](https://www.ieee802.org/11/Reports/tgbf_update.htm) · [NIST overview](https://www.nist.gov/publications/ieee-80211bf-enabling-widespread-adoption-wi-fi-sensing) · [Wi-Fi Alliance work areas](https://www.wi-fi.org/current-work-areas) · [ISAC survey (10M-router 92.6%)](https://arxiv.org/pdf/2510.14358)
+Identity: [BFId / KIT CCS 2025 coverage](https://www.gblock.app/articles/wifi-signal-person-identification-surveillance-study-may-2026) · [WhoFi](https://arxiv.org/html/2507.12869v1) · [Wi-Gait](https://www.sciencedirect.com/science/article/abs/pii/S1389128623001962) · [LeakyBeam NDSS 2025](https://www.ndss-symposium.org/ndss-paper/lend-me-your-beam-privacy-implications-of-plaintext-beamforming-feedback-in-wifi/)
+Through-wall: [RTI through-wall](https://ieeexplore.ieee.org/document/6214374/) · [Commodity 2.4 GHz imaging](https://arxiv.org/pdf/1903.03895) · [Multi-room presence](https://arxiv.org/pdf/2304.13107)
+Foundation models: [LatentWave](https://arxiv.org/html/2606.06373) · [WirelessJEPA](https://arxiv.org/pdf/2601.20190) · [Multimodal Wireless FMs](https://arxiv.org/pdf/2511.15162) · [WMFM](https://arxiv.org/html/2512.23897) · [SoM](https://arxiv.org/pdf/2506.07647) · [RF-native AI / LWM, IQFM, OmniSIG](https://aicompetence.org/rf-native-ai-models-for-the-invisible-spectrum/)
+mmWave: [mmChainPose](https://www.sciencedirect.com/science/article/abs/pii/S0925231225026918) · [ProbRadarM3F](https://arxiv.org/html/2405.05164v3)
+
+Internal [I] sources: ADR-150 (§1, §3.2–3.6), ADR-147, ADR-028, ADR-113/114, issue #876, `docs/research/sota-2026-05-22/00-summary.md`, `docs/research/BFLD/01-sota-survey.md`, `docs/research/rf-topological-sensing/`.
--- a/docs/research/ruview-beyond-sota/02-beyond-sota-architecture.md
+++ b/docs/research/ruview-beyond-sota/02-beyond-sota-architecture.md
@@ -0,0 +1,282 @@
+# RuView Beyond-SOTA Target Architecture
+
+**Series:** ruview-beyond-sota (02)
+**Date:** 2026-06-09
+**Status:** Research design — components marked **PROPOSED** do not exist yet; everything else cites real code.
+**Governing constraint:** ADR-136 §2.1 explicitly rejects renaming/rewriting the workspace. This document designs an **evolution** of the existing 38-crate `v2/` workspace (`v2/Cargo.toml`), not a new system. Every beyond-SOTA layer attaches to the ADR-136 `Stage<I,O>` / `FrameMeta` / `CanonicalFrame` contracts (`docs/adr/ADR-136-ruview-streaming-engine-frame-contracts.md` §2.2–2.5) and preserves the ADR-028 witness chain.
+
+---
+
+## 1. Where the system is today (grounding)
+
+The ADR-136 ten-role pipeline (ingest → signal → fusion → world → models → privacy → store → api → eval → observe) is already mapped 1:1 onto existing crates (ADR-136 §2.1, normative table). The composition root exists: `v2/crates/wifi-densepose-engine/src/lib.rs` wires ADR-135..146 blocks into one `StreamingEngine::process_cycle` that emits a `TrustedOutput` carrying fusion `QualityScore`, privacy class, `SemanticProvenance`, RF-SLAM (`RfSlam` field), and a BLAKE3 `witness: [u8; 32]`.
+
+Key existing substrate this design builds on:
+
+| Substrate | Path | What it gives us |
+|---|---|---|
+| Frame contracts + witness | `v2/crates/wifi-densepose-core/src/types.rs` (`CsiFrame`, `CsiMetadata` + `calibration_id`/`model_id`/`model_version`), ADR-136 `ComplexSample`/`CanonicalFrame` | Deterministic LE bytes, BLAKE3 witness, provenance-append-only boundary rule |
+| Six-stage signal pipeline | `v2/crates/wifi-densepose-signal/src/ruvsense/mod.rs` (+22 modules incl. `cir.rs`, `calibration.rs`, `tomography.rs`, `rf_slam.rs`, `fusion_quality.rs`, `array_coordinator.rs`) | CSI→CIR, baseline calibration, multistatic fusion, coherence gating |
+| Fusion quality + evidence | ADR-137; `ruvsense/multistatic.rs`, `ruvsense/fusion_quality.rs`, `wifi-densepose-ruvector/src/viewpoint/fusion.rs` | `QualityScore` with `EvidenceRef`/`ContradictionFlag`, privacy demotion on contradiction |
+| Digital twin | `v2/crates/wifi-densepose-worldgraph/src/lib.rs` (typed `StableDiGraph`, mandatory `SemanticProvenance`) | Persistent room/sensor/track/belief graph |
+| World model bridge | `v2/crates/wifi-densepose-worldmodel/src/lib.rs` (`OccWorldBridge`, `TrajectoryPrior`, ADR-147) | Occupancy prediction priors into the Kalman tracker |
+| NN + training | `v2/crates/wifi-densepose-train/src/{model.rs,rapid_adapt.rs,ablation.rs,proof.rs,eval.rs,ruview_metrics.rs}`, `wifi-densepose-nn` | Shared backbone + 2 heads, `AdaptationLoss::ContrastiveTTT`, ADR-145 ablation matrix, seeded proof harness |
+| Swarm | `v2/crates/ruview-swarm/src/` (`sensing/{multiview.rs,payload.rs,occworld_bridge.rs}`, `marl/`, `topology.rs`) | Raft/hierarchical-mesh drone coordination with CSI payload (ADR-148) |
+| Edge WASM | `v2/crates/wifi-densepose-wasm-edge/src/lib.rs` (WASM3 on ESP32-S3, `on_frame` host ABI), `wifi-densepose-wasm` | Hot-loadable on-device sensing modules |
+| Quantum-adjacent sim | `v2/crates/nvsim/src/lib.rs` (deterministic NV-magnetometry forward pipeline, SHA-256 witness, WASM-ready) | Honest classical-quantum hybrid substrate (ADR-089) |
+| Semantic record + agents | ADR-140 (`wifi-densepose-sensing-server/src/semantic/`), `homecore-assist` | Provenance-bearing semantic states, Ruflo agent bridge |
+
+---
+
+## 2. Target architecture diagram
+
+The beyond-SOTA layers (★ = new/PROPOSED, ☆ = exists-but-not-wired) wrap the ADR-136 pipeline; nothing replaces it.
+
+```
+                            ╔═══════════════════ BEYOND-SOTA CONTROL PLANE ═══════════════════╗
+                            ║  P6 Continual adaptation loop (TTT + EWC★)   P5 Swarm aperture   ║
+                            ║  rapid_adapt.rs → encoder LoRA deltas        planner★ (Raft)     ║
+                            ╚════════════▲══════════════════════▲══════════════▲══════════════╝
+                                         │ adaptation deltas    │ quality      │ tasking
+ [ingest]            [signal]            │      [fusion]        │  [world]     │        [models]
+ ESP32/Pi mesh ─► RuvSensePipeline ──────┴──► fuse_scored ──────┴─► WorldGraph ┴──► RF Foundation
+ + drone payload    multiband→phase_align     (ADR-137           (ADR-139      │    Encoder (P1)
+ (ruview-swarm      →calibration(135)         QualityScore,      twin) ◄───────┘    7 heads + UQ
+  sensing/payload)  →cir(134)→multistatic     EvidenceRef,         ▲  │             (ADR-146/150)
+      │             →coherence→gate           Contradiction)       │  ▼                  │
+      │                  │                        │            RF-SLAM(143)──OccWorld    │
+      ▼                  ▼                        │            rf_slam.rs    worldmodel  ▼
+ P7 WASM edge      P2 Differentiable RF           │            (P3 closed loop ☆)   P4 cross-modal
+ inference★        forward model★                 │                                 distilled student★
+ (wasm-edge,       (tomography.rs +               │                                 (camera-free deploy)
+  deterministic    cir.rs ISTA as seed)           │
+  replay)               │ residuals feed fusion as EvidenceRef★
+      │                 ▼
+      │            P8 NV-magnetometry fusion★ (nvsim forward model as a sensing node class)
+      ▼
+ ─────────────────────── ADR-136 CONTRACT SPINE (unchanged) ───────────────────────────────────
+  CsiFrame{ComplexSample, FrameMeta{calibration_id, model_id, model_version}} → Stage<I,O>
+  → CanonicalFrame::witness_hash() at EVERY stage boundary (BLAKE3, LE-deterministic)
+ ───────────────────────────────────────────────────────────────────────────────────────────────
+      │                    │                      │                  │
+   [privacy]            [store]                [api]              [eval]            [observe]
+   wifi-densepose-bfld  homecore-recorder     homecore-api       ADR-145 ablation   homecore-
+   gate + demotion      + replay corpus★      /HA/Matter/HAP     (train/ablation.rs automation,
+   (ADR-141)                                                      + P1-P8 variants)  Ruflo (ADR-140)
+```
+
+---
+
+## 3. The eight pillars
+
+Each pillar: what / why beyond-SOTA / builds-on / contract sketch / feasibility. All trait sketches are **PROPOSED** unless a path is cited.
+
+### P1 — RF Foundation Encoder with multitask uncertainty heads (ADR-146 + ADR-150)
+
+**What.** One shared, self-supervised RF encoder (`wifi-densepose-nn`) with seven typed heads (pose, presence, count, activity, vitals, gait, identity-embedding), each emitting calibrated uncertainty via the ADR-136 `QualityScored` trait, trained with the ADR-150 pose-contrastive objective (same-pose-across-subjects = positive) plus a coherence head that exposes channel instability.
+
+**Why beyond SOTA.** Published WiFi-pose systems (MultiFormer, GraphPose-Fi lineage) report in-domain accuracy and hallucinate under domain shift. ADR-150 documents the real measured frontier: 81.63% torso-PCK@20 in-domain on MM-Fi vs ~11.6% leakage-free cross-subject, and that DANN and bigger capacity both failed (ADR-150 §1). A foundation encoder whose loss stack explicitly separates pose / identity / room / device factors *and* emits an RF-integrity signal per prediction is not in the published literature as a deployed, auditable artifact. Target (not a claim): close the cross-subject gap materially while every head output carries `confidence_bounds()`.
+
+**Builds on.** `v2/crates/wifi-densepose-train/src/model.rs` (`WiFiDensePoseModel`, `kp_head`/`dp_head`); `v2/crates/wifi-densepose-sensing-server/src/embedding.rs` (`ProjectionHead` + LoRA + `info_nce_loss` — the existing seventh head, ADR-146 §1.1); `v2/crates/wifi-densepose-train/src/rapid_adapt.rs` (ContrastiveTTT precedent); ADR-146 §1.4 head fan-out; ADR-150 §2 loss stack.
+
+**Contract sketch** (lands in `wifi-densepose-nn`, per ADR-146 §1.3):
+```rust
+pub trait RfEncoder: Send + Sync {
+    fn encode(&self, window: &CsiWindowTensor) -> Embedding;        // z ∈ R^d_model
+    fn model_id(&self) -> u16;                                       // FrameMeta binding (ADR-136 §2.2)
+}
+pub trait TaskHead<O: QualityScored>: Send + Sync {
+    fn name(&self) -> &'static str;
+    fn forward(&self, z: &Embedding) -> O;                           // value + uncertainty bounds
+}
+pub struct MultiTaskOutput { /* per-head QualityScored outputs + coherence: f32 */ }
+```
+
+**Feasibility: HIGH for the architecture, MEDIUM for the headline result.** The pure-Rust f32 ABI is proven (`embedding.rs`), the head taxonomy is specified (ADR-146), and the ablation harness to measure it exists (`wifi-densepose-train/src/ablation.rs`). The risk is scientific, not engineering: ADR-150's own data shows naive approaches fail; the pose-contrastive objective is plausible but unproven at scale. Mitigation: ADR-150 §3's frozen-decoder three-variant experiment gates promotion.
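
To show how the P1 traits compose, here is a minimal, self-contained sketch with a toy encoder and one head. `CsiWindowTensor`, `Embedding`, and `QualityScored` are stand-in definitions whose shapes are assumptions (the real types live in the workspace and are not reproduced here). `VarianceEncoder`, `PresenceHead`, `PresenceEstimate`, and the uncertainty heuristic are hypothetical and exist only to exercise the trait boundaries.

```rust
// Stand-in types: assumed shapes, not the workspace definitions.
pub struct CsiWindowTensor { pub frames: Vec<Vec<f32>> } // [time][subcarrier]
pub struct Embedding(pub Vec<f32>);
pub trait QualityScored { fn confidence_bounds(&self) -> (f32, f32); }

pub trait RfEncoder: Send + Sync {
    fn encode(&self, window: &CsiWindowTensor) -> Embedding;
    fn model_id(&self) -> u16;
}
pub trait TaskHead<O: QualityScored>: Send + Sync {
    fn name(&self) -> &'static str;
    fn forward(&self, z: &Embedding) -> O;
}

/// Toy encoder: per-subcarrier temporal variance (population variance).
/// Assumes every frame has the same number of subcarriers.
pub struct VarianceEncoder;
impl RfEncoder for VarianceEncoder {
    fn encode(&self, window: &CsiWindowTensor) -> Embedding {
        let t = window.frames.len();
        let k = window.frames.first().map_or(0, |f| f.len());
        let mut out = vec![0.0f32; k];
        if t == 0 { return Embedding(out); }
        for sc in 0..k {
            let mean = window.frames.iter().map(|f| f[sc]).sum::<f32>() / t as f32;
            out[sc] = window.frames.iter().map(|f| (f[sc] - mean).powi(2)).sum::<f32>() / t as f32;
        }
        Embedding(out)
    }
    fn model_id(&self) -> u16 { 0 } // placeholder id
}

pub struct PresenceEstimate { pub probability: f32, pub half_width: f32 }
impl QualityScored for PresenceEstimate {
    fn confidence_bounds(&self) -> (f32, f32) {
        ((self.probability - self.half_width).max(0.0),
         (self.probability + self.half_width).min(1.0))
    }
}

/// Toy head: maps mean variance to a presence probability.
pub struct PresenceHead { pub variance_scale: f32 }
impl TaskHead<PresenceEstimate> for PresenceHead {
    fn name(&self) -> &'static str { "presence" }
    fn forward(&self, z: &Embedding) -> PresenceEstimate {
        let mean_var = if z.0.is_empty() { 0.0 } else { z.0.iter().sum::<f32>() / z.0.len() as f32 };
        let probability = 1.0 - (-mean_var / self.variance_scale).exp();
        // Heuristic: widest interval at p = 0.5, zero width at p = 0 or 1.
        let half_width = 0.5 * (probability * (1.0 - probability)).sqrt();
        PresenceEstimate { probability, half_width }
    }
}
```

The point of the shape is that a head cannot return a bare value: the `O: QualityScored` bound forces every output type to carry bounds, so a caller can always ask for `confidence_bounds()` before acting on a prediction.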
+
+### P2 — Physics-informed differentiable RF forward model (PROPOSED)
+
+**What.** A differentiable forward model `render(scene, link_geometry) -> predicted CSI/CIR` used three ways: (1) as a regularizer in encoder training (predictions must be consistent with a Born-approximation scattering model), (2) as an analysis-by-synthesis residual at inference (`|observed − rendered|` becomes an ADR-137 `EvidenceRef`), (3) as a synthetic-data generator complementing MM-Fi (ADR-015).
+
+**Why beyond SOTA.** Published WiFi sensing is almost entirely discriminative; physics-informed neural fields exist for vision (NeRF) and acoustics but no deployed RF-human-sensing stack closes the loop *forward model → residual → fusion evidence → privacy decision*. Making physics disagreement a first-class, witnessed contradiction flag is novel system design, not just a model.
+
+**Builds on.** The codebase already contains the seed of the forward model: `v2/crates/wifi-densepose-signal/src/ruvsense/tomography.rs` (`RfTomographer`, `LinkGeometry`, `OccupancyVolume` — a linear shadowing forward model inverted by ISTA), `ruvsense/cir.rs` (sub-DFT sensing matrix Φ, ISTA L1 — ADR-134), ADR-143 §1.3 (bistatic excess-delay geometry, the exact ray equations), and `nvsim` as the in-repo precedent for a *deterministic, witness-hashed forward physics pipeline* (`v2/crates/nvsim/src/{propagation.rs,pipeline.rs,proof.rs}`).
+
+**Contract sketch** (new module `wifi-densepose-signal/src/ruvsense/forward_model.rs`, PROPOSED):
+```rust
+pub trait RfForwardModel: Versioned {
+    /// Predict per-link CSI given a voxel scene + body primitive set.
+    fn render(&self, scene: &OccupancyVolume, links: &[LinkGeometry]) -> Vec<PredictedCsi>;
+    /// Physics residual in [0,1]; 0 = perfectly Maxwell/Born-consistent.
+    fn residual(&self, observed: &CsiFrame, rendered: &PredictedCsi) -> PhysicsResidual; // → EvidenceRef
+}
+```
+
+**Feasibility: MEDIUM, with one honest line drawn.** A full Maxwell FDTD-in-the-loop solver is **infeasible** at 20 Hz on this hardware and is a non-goal (§6). What is feasible: a first-order Born / ray-tracing bistatic model (the ADR-143 spheroid geometry generalized), differentiable through finite differences or a small Candle graph, validated against recorded calibration captures (ADR-135 baselines give per-link empty-room ground truth for free). "Maxwell-consistent" should be read as "consistent with a stated first-order approximation, with the approximation order recorded in the witness metadata."
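
One way to obtain a residual that stays in [0, 1], as the `residual` doc comment requires, is to normalize the error norm by the sum of the two signal norms; the triangle inequality then bounds the ratio by 1. The sketch below is an assumption about how `PhysicsResidual` could be computed, not the proposed module's definition. `C32` and `physics_residual` are hypothetical names, and complex samples are modelled as plain `(re, im)` tuples.

```rust
/// Complex sample as (re, im); a stand-in for the workspace's sample type.
pub type C32 = (f32, f32);

/// Euclidean norm of a complex vector.
fn l2_norm(v: &[C32]) -> f32 {
    v.iter().map(|(re, im)| re * re + im * im).sum::<f32>().sqrt()
}

/// Normalized residual ||observed - rendered|| / (||observed|| + ||rendered||).
/// Returns None when the two vectors differ in length, and Some(0.0) when
/// both are all-zero (nothing observed, nothing predicted).
pub fn physics_residual(observed: &[C32], rendered: &[C32]) -> Option<f32> {
    if observed.len() != rendered.len() {
        return None;
    }
    let diff: Vec<C32> = observed
        .iter()
        .zip(rendered)
        .map(|(o, r)| (o.0 - r.0, o.1 - r.1))
        .collect();
    let denom = l2_norm(observed) + l2_norm(rendered);
    if denom == 0.0 {
        return Some(0.0);
    }
    // Clamp guards against rounding pushing the ratio slightly above 1.
    Some((l2_norm(&diff) / denom).min(1.0))
}
```

A perfectly predicted link scores 0, and a prediction with the opposite sign of the observation scores 1. Where on that scale a residual should raise a contradiction flag is a calibration question this sketch does not answer.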
+
+### P3 — RF-SLAM × WorldGraph × OccWorld closed loop (exists in parts, wiring is the work)
+
+**What.** Close the loop: RF-SLAM discovers reflectors/anchors → WorldGraph persists them as `object_anchor` nodes → OccWorld consumes graph occupancy → `TrajectoryPrior`s feed the Kalman tracker → improved tracks refine SLAM association. The environment model becomes self-acquiring and self-correcting (furniture moved ⇒ `BaselineTopologyChange` ⇒ recalibration trigger, ADR-143 §1.4).
+
+**Why beyond SOTA.** Published RF-SLAM work maps *or* tracks; no published consumer system maintains a persistent, provenance-bearing, privacy-rolled-up environmental digital twin (`PrivacyRollup` in `wifi-densepose-worldgraph/src/graph.rs`) that is simultaneously the SLAM map, the automation substrate, and the audit record. The differentiator is the closed loop with evidence edges (`supports`/`contradicts`).
+
+**Builds on.** All three vertices exist: `v2/crates/wifi-densepose-signal/src/ruvsense/rf_slam.rs` (`RfSlam::observe`, line 176, already a field of `StreamingEngine` — `wifi-densepose-engine/src/lib.rs:116`); `v2/crates/wifi-densepose-worldgraph/src/lib.rs`; `v2/crates/wifi-densepose-worldmodel/src/{bridge.rs,occupancy.rs}` (`worldgraph_to_occupancy`, `OccWorldBridge::predict`). The engine already upserts SLAM output and person tracks into the graph. Missing: prior-injection back into `ruvsense/pose_tracker.rs`, and the topology-change → ADR-135 recalibration edge.
+
+**Contract sketch** (extends existing types):
+```rust
+impl StreamingEngine {
+    /// PROPOSED: inject OccWorld priors into the next tracker cycle.
+    pub fn apply_trajectory_priors(&mut self, priors: &[TrajectoryPrior]) -> Vec<WorldId>;
+}
+// WorldEdge gains (PROPOSED): PredictedBy { model_id: u16 }  — prior provenance edge
+```
+
+**Feasibility: HIGH.** This is mostly integration glue between tested crates. The two real risks are already named by ADR-143: no ground-truth oracle in a live home (mitigated by the v1-fixed / v2-flagged rollout, `#[cfg(feature = "rf-slam-v2")]`), and OccWorld's Python subprocess (ADR-147: 375 ms/inference) being off the deterministic path — priors must be treated as advisory, never witness-bearing (§5).
+
+### P4 — Cross-modal distillation: camera-teacher → RF-student, privacy-preserving deployment (PROPOSED)
+
+**What.** Train-time-only camera supervision: a vision pose teacher labels synchronized CSI (MM-Fi already provides paired modalities, ADR-015), distilling dense pose + uncertainty into the P1 encoder. Deployed systems ship **no camera and no camera-derived identity features**; the ADR-145 privacy-leakage metric (membership-inference score in `wifi-densepose-train/src/ablation.rs`) gates that the student does not retain identity.
+
+**Why beyond SOTA.** Camera-supervised WiFi pose is the original DensePose-WiFi recipe; what is *not* published is distillation with a measured, CI-enforced privacy-leakage budget and a witnessed claim that the deployed artifact is camera-free. The beyond-SOTA move is making "privacy-preserving" a *measured property of the release pipeline*, not a marketing adjective.
+
+**Builds on.** `v2/crates/wifi-densepose-train/src/{trainer.rs,losses.rs,dataset.rs}` (training substrate); ADR-015 paired datasets; ADR-145 `FeatureSet` matrix + privacy-leakage scalar; `v2/crates/wifi-densepose-bfld` (`privacy_gate.rs`, `signature_hasher.rs` — runtime identity controls, ADR-120 invariants I1–I3).
+
+**Contract sketch** (in `wifi-densepose-train`, PROPOSED):
+```rust
+pub struct DistillationLoss { pub teacher: TeacherSource, pub temperature: f32, pub uq_transfer: bool }
+pub enum TeacherSource { CachedPoseLabels(PathBuf), /* never a live camera in the serving graph */ }
+/// Release gate: leakage(student) ≤ budget, asserted by the ADR-145 harness per variant.
+pub struct PrivacyBudget { pub max_mia_score: f32 }
+```
+
+**Feasibility: HIGH.** All ingredients exist; the work is a loss term, a label cache format, and a CI gate. The honest caveat: MIA-based leakage scores are a lower bound on real leakage; the budget is a regression tripwire, not a formal guarantee.
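
The release gate described in the contract comment (`leakage(student) ≤ budget`, asserted per variant) could look roughly like this. `PrivacyBudget` mirrors the sketch above; `ReleaseDecision`, `release_gate`, and the `(variant, score)` input shape are hypothetical, and the real ADR-145 harness output format is not assumed.

```rust
pub struct PrivacyBudget { pub max_mia_score: f32 }

#[derive(Debug, PartialEq)]
pub enum ReleaseDecision {
    Accept,
    Reject { variant: String, score: f32 },
}

/// Accepts only if every variant's membership-inference score is within budget.
/// Returns the first offending variant otherwise.
pub fn release_gate(budget: &PrivacyBudget, variant_scores: &[(&str, f32)]) -> ReleaseDecision {
    for (variant, score) in variant_scores {
        // Written as a negated `<=` so that a NaN score fails closed.
        if !(*score <= budget.max_mia_score) {
            return ReleaseDecision::Reject { variant: variant.to_string(), score: *score };
        }
    }
    ReleaseDecision::Accept
}
```

The negated comparison is deliberate: a harness that produces NaN for a variant is treated as a failed measurement and blocks the release rather than slipping through.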
+
+### P5 — Swarm-distributed multistatic sensing with Raft-coordinated apertures (ADR-148, partially built)
+
+**What.** Treat the drone swarm + fixed ESP32 mesh as one *reconfigurable multistatic aperture*: a Raft-elected cluster head plans node positions/channel assignments to maximize geometric diversity (GDI) for the current sensing task; per-node frames flow into the same `MultistaticFuser` path as fixed nodes.
+
+**Why beyond SOTA.** Published multistatic WiFi sensing assumes fixed geometry. Closed-loop aperture optimization — moving the sensors to where the Fisher information is — driven by the GDI/Cramér–Rao machinery that already exists in `v2/crates/wifi-densepose-ruvector/src/viewpoint/geometry.rs` (per CLAUDE.md module table: `GeometricDiversityIndex`, Cramér-Rao bounds) is a genuinely new system class for SAR/MAT scenarios.
+
+**Builds on.** `v2/crates/ruview-swarm/src/sensing/{multiview.rs,payload.rs,occworld_bridge.rs}`, `topology.rs`, `planning.rs`, `marl/` (MAPPO, `candle_ppo.rs`); `ruvsense/multistatic.rs` + `array_coordinator.rs` (ADR-138 clock-quality gating — moving nodes will stress exactly this); `wifi-densepose-mat` (the MAT use case).
+
+**Contract sketch** (in `ruview-swarm`, PROPOSED):
+```rust
+pub trait AperturePlanner: Send + Sync {
+    /// Given current twin + task, propose node placements maximizing expected GDI.
+    fn plan(&self, twin: &WorldGraphSnapshot, task: &SwarmTask) -> Vec<(NodeId, Position3D)>;
+}
+// Output flows through Raft (topology.rs) as a normal SwarmTask; frames return as ArrayNodeInput.
+```
+
+**Feasibility: MEDIUM.** Coordination, MARL, and fusion code exist and are tested; the hard physical problems are honest unknowns: airborne CSI phase stability (rotor vibration), clock sync across mobile nodes (ADR-138 gate will reject a lot initially), and ADR-148 §1.3's own regulatory scoping. Simulation-first via `ruview-swarm/src/evals.rs` + `bench_support.rs`; hardware validation is Phase 3.
+
+### P6 — Continual / test-time adaptation with EWC-style forgetting control (PROPOSED on existing TTT)
+
+**What.** Promote `rapid_adapt.rs` from a per-deployment trick to a managed continual-learning loop: TTT/entropy adaptation produces LoRA deltas on the P1 encoder; an EWC (elastic weight consolidation) penalty — **which does not exist in the workspace today** (no EWC match in `wifi-densepose-train/src/rapid_adapt.rs`) — anchors weights important to previously-validated environments; every adaptation step is versioned as a new `model_version` (u16, ADR-136 §2.2) and must re-pass the ADR-145 acceptance matrix before activation.
+
+**Why beyond SOTA.** TTT papers adapt and hope; nothing published couples adaptation to a *deterministic regression gate with witness hashes*, where an adapted model that regresses tier or leaks identity is automatically rejected and the `model_version` provenance lets any semantic state be traced to the exact adaptation step.
+
+**Builds on.** `v2/crates/wifi-densepose-train/src/rapid_adapt.rs` (`AdaptationLoss::ContrastiveTTT`, entropy-minimization variant — lines 8–16); LoRA adapters in `sensing-server/src/embedding.rs` (rank-4 `lora_1`/`lora_2`); ADR-027 MERIDIAN evaluator (`train/src/eval.rs`); ADR-146 §2 calibration-robustness loss.
+
+**Contract sketch** (in `wifi-densepose-train`, PROPOSED):
+```rust
+pub struct EwcPenalty { pub fisher_diag: Vec<f32>, pub anchor: Vec<f32>, pub lambda: f32 }
+pub struct AdaptationStep {
+    pub parent_model_version: u16, pub new_model_version: u16,
+    pub loss: AdaptationLoss, pub ewc: Option<EwcPenalty>,
+    pub acceptance: RuViewAcceptanceResult,           // must be ≥ parent tier
+    pub witness: [u8; 32],                            // hash of delta + acceptance
+}
+```
+
+**Feasibility: HIGH.** EWC over a small LoRA delta is cheap (Fisher diagonal over the replay corpus); the acceptance gate and proof seeds exist (`proof.rs`, `PROOF_SEED = 42`). Risk: online Fisher estimation from unlabeled home data is noisy — start with adaptation restricted to LoRA parameters only, backbone frozen.
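
For reference, the standard diagonal EWC penalty is (λ/2) · Σᵢ Fᵢ (θᵢ − θ*ᵢ)², with gradient λ · Fᵢ (θᵢ − θ*ᵢ). The sketch below evaluates both over a flat parameter vector using the `EwcPenalty` fields from the contract sketch. The `value` and `gradient` methods are hypothetical additions, and treating the LoRA delta as one flat `&[f32]` is an assumption.

```rust
pub struct EwcPenalty {
    pub fisher_diag: Vec<f32>, // diagonal Fisher information, one entry per parameter
    pub anchor: Vec<f32>,      // parameters validated on earlier environments
    pub lambda: f32,           // penalty strength
}

impl EwcPenalty {
    /// True when `theta` matches the stored vectors in length.
    fn shapes_match(&self, theta: &[f32]) -> bool {
        theta.len() == self.anchor.len() && theta.len() == self.fisher_diag.len()
    }

    /// (lambda / 2) * sum_i F_i * (theta_i - anchor_i)^2, or None on a length mismatch.
    pub fn value(&self, theta: &[f32]) -> Option<f32> {
        if !self.shapes_match(theta) {
            return None;
        }
        let sum: f32 = theta
            .iter()
            .zip(&self.anchor)
            .zip(&self.fisher_diag)
            .map(|((t, a), f)| f * (t - a) * (t - a))
            .sum();
        Some(0.5 * self.lambda * sum)
    }

    /// Per-parameter gradient lambda * F_i * (theta_i - anchor_i).
    pub fn gradient(&self, theta: &[f32]) -> Option<Vec<f32>> {
        if !self.shapes_match(theta) {
            return None;
        }
        Some(
            theta
                .iter()
                .zip(&self.anchor)
                .zip(&self.fisher_diag)
                .map(|((t, a), f)| self.lambda * f * (t - a))
                .collect(),
        )
    }
}
```

Parameters with a large Fisher entry are pulled back to the anchor hardest, which is how the penalty protects weights that mattered for previously validated environments while leaving low-importance weights free to adapt.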
+
+### P7 — On-device WASM edge inference with deterministic replay (extends existing Tier-3)
+
+**What.** Push P1 head subsets (presence, vitals, coarse activity) into hot-loadable WASM modules on ESP32-S3, and onto browsers/workers via `wifi-densepose-wasm`. Every edge module's output is replayable: the same `CanonicalFrame` input bytes through the same module hash produce the same output bytes, verified in CI on x86_64/aarch64/wasm32.
+
+**Why beyond SOTA.** Edge WiFi-sensing exists; *bit-deterministic, witness-hashed edge inference with hot-swap and replay parity against the server pipeline* does not appear in published systems. It turns the edge from a trust hole into a witness-chain extension.
+
+**Builds on.** `v2/crates/wifi-densepose-wasm-edge/src/lib.rs` (WASM3 host ABI: `csi_get_*`, `on_frame` at ~20 Hz, ADR-040 Tier 3); `nvsim` as the proof that a no-std-time, no-OS-entropy, seeded-PRNG crate runs identically on wasm32 (`nvsim/src/lib.rs` doc); ADR-136 AC7 cross-architecture byte-stability test.
+
+**Contract sketch** (PROPOSED additions to the wasm-edge host ABI):
+```rust
+// exports added to module lifecycle:
+//   on_replay_begin(seed: u64)              — pins any module-internal PRNG
+//   witness_digest(buf_ptr: i32) -> i32     — module returns BLAKE3 of its output stream
+pub trait EdgeStage: Stage<CsiFrameView, EdgeEvent> { fn module_hash(&self) -> [u8; 32]; }
+```
+
+**Feasibility: HIGH for presence/vitals heads, LOW for full pose on-ESP32.** WASM3 interpretation on Xtensa caps throughput; full 7-head inference stays on Pi/Hailo/browser. Float determinism across native vs WASM needs care (no fast-math, fixed reduction order — same obligation ADR-136 §3.2 already accepts).
+
+### P8 — NV-magnetometry fusion: an honest classical-quantum hybrid (PROPOSED, simulation-first)
+
+**What.** Add `nvsim`-modeled NV-magnetometer nodes as a *fourth sensing modality class* (after CSI, mmWave/ADR-021, BFLD) in the multistatic fusion: near-range (≤ tens of cm, per the physics review) cardiac/respiratory magnetic signatures fused with CSI/mmWave vitals under the ADR-137 evidence contract. Simulation-first: the modality lands end-to-end against `nvsim` before any hardware exists.
+
+**Why beyond SOTA.** Not range — the Ghost Murmur review (`docs/research/quantum-sensing/16-ghost-murmur-ruview-spec.md`) documents why multi-mile cardiac magnetometry contradicts published physics, and this design adopts that conclusion. The beyond-SOTA element is architectural honesty: a fusion engine that can ingest a quantum-sensor modality with explicit, witnessed physics bounds (`nvsim`'s forward model states its approximations and hashes its output, `nvsim/src/proof.rs`), so that when real NV hardware matures, the integration path and the anti-hype guardrails already exist. No published consumer sensing stack has this.
+
+**Builds on.** `v2/crates/nvsim/src/` (scene→source→attenuation→NV ensemble→digitiser, SHA-256 witness, ADR-089); `nvsim-server`; `wifi-densepose-vitals` (mmWave HR/BR — the modality NV would cross-validate); `ruvsense/multistatic.rs` fusion + ADR-137 `EvidenceRef`.
+
+**Contract sketch** (PROPOSED): a `SensorModality::NvMagnetometer` variant on the existing `wifi-densepose-worldgraph` `SensorModality` enum, plus an `ArrayNodeInput` adapter from `nvsim` frames; vitals agreement/disagreement between NV and mmWave becomes an `EvidenceRef`/`ContradictionFlag` pair.
+
+**Feasibility: HIGH in simulation, SPECULATIVE on hardware.** The sim path is days of glue; COTS NV magnetometers with the required sensitivity at consumer cost do not exist in 2026. This pillar's deliverable is the *contract and the simulated validation*, explicitly labeled as such.
+
+---
+
+## 4. Phased implementation plan
+
+Phases are gated by the Pre-Merge Checklist (CLAUDE.md) and the witness chain (§5). Crate names per the ADR-136 §2.1 normative map — no new `ruview_*` crates except where a crate already exists (`ruview-swarm`).
+
+**Phase 0 — Hardening (close the ADR-136 "integration glue" debt).**
+- `wifi-densepose-signal`: wire the full 600-frame `Stage`-chain replay (ADR-136 AC6) and register `streaming_engine_replay_v1` in `archive/v1/data/proof/expected_features.sha256`.
+- CI: cross-architecture witness matrix x86_64/aarch64 (AC7); add wasm32 lane for `nvsim` + `wifi-densepose-wasm`.
+- `wifi-densepose-engine`: populate `FrameMeta.calibration_id`/`model_id` from the live calibration and model-binding stages (currently defaulted — ADR-136 §8).
+- `homecore-recorder`: define the **replay corpus** format (canonical-bytes frame streams + witness manifest) that P4/P6 training and all ablations consume.
+
+**Phase 1 — Encoder + measurement (P1, P4 groundwork, P6 skeleton).**
+- `wifi-densepose-nn`: `RfEncoder`/`TaskHead` traits, seven-head fan-out, UQ layer (ADR-146); relocate `ProjectionHead` from `sensing-server/src/embedding.rs`.
+- `wifi-densepose-train`: `ContrastiveBatcher`, ADR-150 loss stack, distillation loss + cached-teacher format (P4), `EwcPenalty` + `AdaptationStep` (P6); extend `ablation.rs` `FeatureSet` with per-head and per-pillar variants; pin `expected_ablation_*.sha256`.
+- Run the ADR-150 three-variant frozen-decoder experiment; promotion gate on cross-subject delta.
+
+**Phase 2 — Closed loop + edge (P3, P7).**
+- `wifi-densepose-engine`: `apply_trajectory_priors` (OccWorld → `pose_tracker.rs`); `PredictedBy` provenance edge in `wifi-densepose-worldgraph`; topology-change → ADR-135 recalibration trigger.
+- `wifi-densepose-wasm-edge`: replay ABI (`on_replay_begin`, `witness_digest`), presence/vitals head modules; parity test vs server pipeline on identical canonical bytes.
+- Enable `rf-slam-v2` feature on the 7-day validation dataset (ADR-143 gate).
+
+**Phase 3 — Frontier (P2, P5, P8).**
+- `wifi-densepose-signal/src/ruvsense/forward_model.rs`: Born/ray forward model seeded from `tomography.rs`; `PhysicsResidual` → `EvidenceRef`; synthetic-data generator into `train/src/dataset.rs`.
+- `ruview-swarm`: `AperturePlanner` over GDI (`ruvector/src/viewpoint/geometry.rs`); simulation evals in `evals.rs`; airborne CSI stability study before any hardware claim.
+- `nvsim` ↔ `wifi-densepose-engine`: `SensorModality::NvMagnetometer` adapter, simulated NV+mmWave vitals cross-validation in the ablation matrix.
+
+---
+
+## 5. Determinism & witness-chain preservation
+
+The non-negotiable invariant (ADR-136 §2.5–2.6, ADR-028): replaying recorded canonical bytes through the pipeline twice yields byte-identical outputs and equal BLAKE3 witness hashes. Strategy per component class:
+
+1. **Everything on the trust path implements `CanonicalFrame`.** New frame types (`MultiTaskOutput`, `PhysicsResidual`, `AdaptationStep`, edge events, NV frames) get fixed-field-order LE encodings and `witness_hash()`; encoders are the only serializers (no ad-hoc serde on the witness path).
+2. **Inference is witnessed by (input hash, model hash, output hash).** `model_id`/`model_version` on `FrameMeta` already bind frames to models; P1 adds a weights digest so the triple is closed. Pure-Rust f32 inference (ADR-146 ABI) with fixed reduction order; no GPU nondeterminism on the witness path — GPU/libtorch is training-only, and training determinism is pinned by the existing seeds (`proof.rs`: `PROOF_SEED = 42`, `MODEL_SEED = 0`).
+3. **Advisory vs witnessed split.** Components that cannot be made deterministic — the OccWorld Python subprocess (ADR-147), live MARL exploration, any future LLM/agent output (ADR-140 Ruflo) — are **advisory**: their outputs may bias estimates but never enter `to_canonical_bytes()` directly; instead the *decision to use them* is recorded (prior id + content hash) so replay reproduces the decision even if the producer cannot be re-run. The Kalman tracker consumes priors as explicit inputs recorded in the replay corpus.
+4. **Adaptation is a chain of witnessed steps.** P6's `AdaptationStep.witness` hashes (parent version ‖ delta ‖ acceptance result); the active model at any timestamp is reconstructible from the step chain — the model-weights analogue of the frame witness chain.
+5. **Edge parity.** P7 modules must produce the same `witness_digest` as the server-side reference implementation on the AC6 fixture; the module hash joins the firmware `source-hashes.txt` in the ADR-028 witness bundle.
+6. **Witness bundle growth is mechanical.** Each pillar adds expected-hash keys (e.g. `forward_model_residual_v1`, `edge_presence_replay_v1`; `nvsim` already ships its own `proof.rs`) to the existing `verify.py` chain rather than inventing new verification mechanisms.
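
As an illustration of items 1 and 4, the sketch below builds the byte string that a witness hash would cover for an adaptation step: fixed field order, little-endian, no serde. The `StepPreimage` type, its field set, and the `u32` length prefix are assumptions made for this example; the hashing itself (BLAKE3 in the text) is left out because it needs an external crate.

```rust
/// Hypothetical, simplified view of the fields a P6 step witness covers.
pub struct StepPreimage<'a> {
    pub parent_model_version: u16,
    pub delta: &'a [f32],
    pub acceptance_tier: u8,
}

impl<'a> StepPreimage<'a> {
    /// Fixed field order, little-endian throughout. The delta carries an
    /// explicit u32 length prefix (an assumption of this sketch) so that the
    /// field boundaries are unambiguous in the byte stream.
    pub fn to_canonical_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(2 + 4 + 4 * self.delta.len() + 1);
        out.extend_from_slice(&self.parent_model_version.to_le_bytes());
        out.extend_from_slice(&(self.delta.len() as u32).to_le_bytes());
        for w in self.delta {
            // Encode the raw IEEE-754 bit pattern, not a formatted decimal.
            out.extend_from_slice(&w.to_bits().to_le_bytes());
        }
        out.push(self.acceptance_tier);
        out
    }
}
```

Two calls on the same value return identical bytes, which is the property the replay invariant relies on; the witness would then be the hash of this buffer.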
+
+---
+
+## 6. Explicit non-goals
+
+- **No workspace rename or rewrite.** Reaffirms ADR-136 §2.1/§4.1: no `ruview_*` crate prefix migration, no umbrella crate; pillars land inside the existing crates listed above.
+- **No full-wave Maxwell solver in the runtime loop.** P2 is first-order Born/ray, with the approximation order declared. "Physics-informed" never means FDTD at 20 Hz.
+- **No long-range cardiac magnetometry claims.** P8 is bounded by the physics review in `docs/research/quantum-sensing/16-ghost-murmur-ruview-spec.md`; ranges beyond published MCG physics are out of scope permanently, not just deferred.
+- **No camera in any deployed serving graph** (P4 teachers are train-time, cached-label only) and **no identity recognition as a product feature** — identity embeddings remain in-RAM, hash-rotated (ADR-120 invariants).
+- **No weaponization or LAWS capability in P5**, per ADR-148 §1.3; swarm work targets SAR/MAT and stays behind the ADR-148 regulatory gates.
+- **No fabricated benchmarks.** All pillar performance statements in this document are targets; promotion of any pillar requires the ADR-145 ablation matrix delta plus pinned determinism hashes, in CI, before any external claim.
+- **No new verification mechanisms.** The witness chain extends `verify.py` / BLAKE3 / `expected_*.sha256`; we do not introduce a second, parallel proof system.
+
+---
+
+## 7. Open questions for the next document in this series
+
+1. Airborne CSI phase stability (P5): what does the ADR-138 clock-quality gate measure on a real quadrotor payload?
+2. Forward-model fidelity floor (P2): what Born-residual magnitude on the ADR-135 empty-room captures is "good enough" to be a useful contradiction signal?
+3. Replay-corpus governance (Phase 0): retention, privacy class of recorded canonical bytes, and consent — the recorder stores signal evidence, which is itself sensitive.
--- a/docs/research/ruview-beyond-sota/03-benchmark-validation-methodology.md
+++ b/docs/research/ruview-beyond-sota/03-benchmark-validation-methodology.md
@@ -0,0 +1,384 @@
+# Beyond-SOTA Validation, Test & Benchmark Methodology
+
+**Series:** `docs/research/ruview-beyond-sota/` · Document 03
+**Date:** 2026-06-09
+**Scope:** How RuView proves (and gates) beyond-SOTA claims using the verification
+infrastructure that already exists in this repository. Every number below is sourced
+from a cited file in this repo; nothing is invented.
+
+---
+
+## 1. The Layered Validation Pyramid
+
+Six layers, cheapest/most-deterministic at the bottom, most expensive/most-credible at
+the top. A beyond-SOTA claim must survive **every layer below it** before it may be
+published from the layer it lives at.
+
+| Layer | What it proves | Tooling | Frequency | Determinism |
+|-------|----------------|---------|-----------|-------------|
+| **L0** Unit/integration tests | Code correctness | `cargo test --workspace --no-default-features` + pytest | per commit | exact |
+| **L1** Deterministic proof + witness bundle | Pipeline is real, unchanged, reproducible | `archive/v1/data/proof/verify.py`, `scripts/generate-witness-bundle.sh` | per merge / release | exact (SHA-256) |
+| **L2** Criterion micro-benchmarks | Compute latency only — never quality (ADR-149 §2) | 15 bench targets across `v2/crates/*/benches/` | nightly / pre-release | statistical |
+| **L3** Dataset-level accuracy eval | Pose/presence/vitals quality vs published SOTA | MM-Fi / Wi-Pose (ADR-015), `ruview_metrics.rs` tiers, ADR-145 ablation harness | per model release | seeded |
+| **L4** Hardware-in-loop | Real CSI on real ESP32, no mocks | COM9 (S3) / COM12 (C6) protocol, witness firmware hashes | per firmware release | A/B controlled |
+| **L5** Field trials / live capture | End-to-end behavior in a real room | live-session captures (e.g. `benchmark_baseline.json`) | campaign | statistical |
+
+### 1.1 L0 — Workspace tests (current counts)
+
+- ADR-028 audit (2026-03-01): **1,031 passed, 0 failed, 8 ignored** for
+  `cargo test --workspace --no-default-features`
+  (`docs/adr/ADR-028-esp32-capability-audit.md` §2).
+- Current `CHANGELOG.md` (Unreleased, cross-platform fix entry): **2,682 workspace
+  tests pass / 0 fail on Windows** — the suite has more than doubled since the audit.
+- `CLAUDE.md` pre-merge gate still cites "1,031+ passed, 0 failed" as the floor.
+
+**Rule:** the post-change test count may never be lower than the pre-change count, and
+failures must be 0. The witness bundle records the full log
+(`test-results/rust-workspace-tests.log`) and an aggregated `summary.txt`
+(`scripts/generate-witness-bundle.sh` step 3).
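
The rule above is small enough to state as code. This is a sketch only: `TestTotals` and `l0_gate` are hypothetical names, and parsing the totals out of `summary.txt` is not shown.

```rust
/// Pass/fail totals from a workspace test run (hypothetical shape).
#[derive(Debug, Clone, Copy)]
pub struct TestTotals {
    pub passed: u32,
    pub failed: u32,
}

/// L0 rule: zero failures, and the passing count never drops below the
/// pre-change run.
pub fn l0_gate(before: TestTotals, after: TestTotals) -> Result<(), String> {
    if after.failed != 0 {
        return Err(format!("{} test(s) failed", after.failed));
    }
    if after.passed < before.passed {
        return Err(format!(
            "passing tests dropped from {} to {}",
            before.passed, after.passed
        ));
    }
    Ok(())
}
```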
+
+### 1.2 L1 — Deterministic proof ("Trust Kill Switch") + witness bundle
+
+`archive/v1/data/proof/verify.py` (header comment): feeds 1,000 synthetic CSI frames
+(seed=42, `sample_csi_data.json`) through the **production** `CSIProcessor`
+(`src/core/csi_processor.py`), hashes the first 100 frames' feature output
+(`VERIFICATION_FRAME_COUNT = 100`), and compares against
+`archive/v1/data/proof/expected_features.sha256`.
+
+- **Current published hash (file contents, verified during this investigation):**
+  `f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a`
+- The hash is **environment-coupled** and has been legitimately regenerated before:
+  ADR-028 §5.3 recorded `8c0680d7…` under numpy 2.4.2/scipy 1.17.1; `CHANGELOG.md`
+  (#560 fix) recorded `667eb054…` after 6-decimal quantization + single-thread BLAS
+  pinning (`OMP_NUM_THREADS=1` etc.). Each regeneration must follow the documented
+  procedure: `python verify.py --generate-hash` then `python verify.py` → `VERDICT: PASS`.
+
+`scripts/generate-witness-bundle.sh` packages: witness log + ADR-028, the Python proof
+(verify.py + expected hash + reference-signal metadata), full Rust test log + summary,
+the ADR-134 CIR proof, firmware source/binary SHA-256s, crate version manifest, npm
+tarball SHA-256, and a recipient-side `VERIFY.sh`.
+
+**Accuracy note on check counts:** three different counts are in circulation.
+`CLAUDE.md` describes the recipient verification as "7/7 PASS"; the current `VERIFY.sh`
+embedded in the script performs **10** `check()` assertions (witness log, ADR,
+proof-hash file, tests, firmware hashes, crate manifest, npm manifest, Python proof,
+CIR proof, CIR hash file); and the script prints a hardcoded
+`"ALL CHECKS PASSED (8/8)"` string (`generate-witness-bundle.sh` line 293). The
+hardcoded count is stale relative to the actual check list. Fix it to print
+`${PASS_COUNT}/$((PASS_COUNT + FAIL_COUNT))` so the verdict can never silently
+desynchronize from the check inventory.
+
+### 1.3 L2 — Criterion micro-benchmark inventory (all 15 targets)
+
+All bench sources read directly. Per ADR-149 §2 these are **latency regression gates
+only, never quality evidence**.
+
+| Bench target | Crate | Benchmark functions / groups | What it measures | Recorded value or in-source target (citation) |
+|---|---|---|---|---|
+| `engine_cycle.rs` | wifi-densepose-engine | `process_cycle_4nodes_56sc` | One full `StreamingEngine::process_cycle` (fuse + quality + calibration provenance + privacy gate + WorldGraph node), 4-node/56-subcarrier ESP32-S3 HT20 mesh | Budget: **50 ms** (20 Hz) — bench header |
+| `signal_bench.rs` | wifi-densepose-signal | `CSI Preprocessing`, `Phase Sanitization`, `Feature Extraction`, `Motion Detection`, `Full Pipeline` | SOTA signal stages (ADR-014) at varying frame sizes | no recorded baseline |
+| `cir_bench.rs` | wifi-densepose-signal | `cir_estimate` (HT20/HT40/HE20/HE40), `cir_estimate_12link`, `cir_estimator_new` | ADR-134 `CirEstimator::estimate()` per tier; 12-link multistatic amortization; cold-start | no recorded baseline |
+| `calibration_bench.rs` | wifi-densepose-signal | `bench_recorder_record`, `bench_recorder_finalize`, `bench_deviation`, `bench_record_600`, `bench_to_bytes` (K=52/114/242/484) | ADR-135 empty-room baseline recorder + deviation scoring | no recorded baseline |
+| `aether_prefilter_bench.rs` | wifi-densepose-signal | `aether_search_d…_n…_k…` (search vs prefilter) | ADR-084 Pass-2: `EmbeddingHistory::search_prefilter` vs brute force, prefilter_factor=8 | Pass: **≥4× at n=1024** — bench header |
+| `sketch_bench.rs` | wifi-densepose-ruvector | `compare_d128/256/512` × `float_l2`/`float_cosine`/`sketch_hamming` | ADR-084 sketch-vs-float per-pair compare cost (AETHER 128-d, spectrogram 256-d) | Pass: **sketch ≥8× faster** at every dim (ADR-084 threshold 8×–30×) — bench header |
+| `crv_bench.rs` | wifi-densepose-ruvector | `gestalt_classify_single/batch_100`, `sensory_encode_single`, `pipeline_full_session`, `convergence_two_sessions`, `crv_session_create`, `crv_embedding_dimension_scaling` (32/128/384), `crv_stage_vi_partition` | CRV integration throughput | no recorded baseline |
+| `inference_bench.rs` | wifi-densepose-nn | `tensor_ops` (relu/sigmoid/tanh), `densepose_inference`, `translator_inference`, `mock_inference`, `batch_inference` | NN forward-pass cost by input/batch size | no recorded baseline; **`mock_inference` group must never be quoted as a pipeline number** (§6) |
+| `training_bench.rs` | wifi-densepose-train | `interp_114_to_56_batch32`, `interp_scaling`, `compute_interp_weights_114_56`, `synthetic_dataset_get`, `synthetic_epoch`, `config_validate`, PCK over 100 samples | Training preprocessing + metrics hot paths; fixtures fully deterministic (no `rand`) — header | no recorded baseline |
+| `detection_bench.rs` | wifi-densepose-mat | `breathing_detection`, `heartbeat_detection`, `movement_classification`, `detection_pipeline`, localization (triangulation/depth), alert generation | MAT survivor-detection algorithms at varying signal lengths / noise | no recorded baseline |
+| `transport_bench.rs` | wifi-densepose-hardware | `beacon_serialize_16byte/28byte_auth/quic_framed`, `auth_beacon_verify`, `replay_window`, `framed_message` encode/decode, `secure_tdm_cycle` (manual vs QUIC) | TDM beacon crypto + transport | no recorded baseline |
+| `mqtt_throughput.rs` | wifi-densepose-sensing-server | `discovery::build_*`, `state::*`, `rate_limiter::allow_*`, `privacy::decide_*`, `semantic::bus_tick_all_10_primitives` | ADR-115 MQTT hot path | Targets (header): discovery **<5 µs**, state encode **<2 µs**, rate limit **<100 ns**, privacy **<50 ns**, bus tick **<10 µs** |
+| `swarm_bench.rs` | ruview-swarm | `marl_actor_inference`, `rrt_apf_100iter`, `multiview_fusion_3drones`, `demo_coverage_estimate`, `ppo_update_64transitions` | ADR-148 swarm control-loop compute | Measured: **3.3 µs / 43 µs / 54–58.5 ns / 100 ps / 248 µs** (ADR-149 §4.3; `CHANGELOG.md` Performance section) |
+| `pipeline_throughput.rs` | nvsim | `pipeline_run` (sample-count sweep), `witness::run` vs `run_with_witness` | NV-diamond sim throughput + witness overhead | Acceptance: **≥1 kHz** simulated samples/s on Cortex-A53-class CPU — bench header |
+| `state_machine.rs` | homecore | `set` first/warm/no-op, `get` hit/miss, `all_snapshot`, `all_by_domain_light_20_of_100`, `broadcast_fan_out` | HOMECORE state-machine hot paths | no recorded baseline |
+
+**Honest gap — `benchmark_baseline.json` is not a criterion baseline.** The repo-root
+`benchmark_baseline.json` (369.9 KB) contains **1,566 live-capture samples** from a
+2-node session (fields: `tick`, `n_nodes`, `variance`, `motion`, `presence`,
+`confidence`, `est_persons`, `n_persons_rendered`, `kp_spread`, `rssi`) plus a summary
+block — it records **field-trial telemetry (L5)**, not micro-benchmark latencies.
+No file in the repo references it (`grep -rn benchmark_baseline` → 0 hits outside the
+file itself); its producer must be identified and committed (§4.3). Summary values
+(all from the file's `summary` object):
+
+| Metric | Baseline value |
+|---|---:|
+| `total_frames` | 1,566 |
+| `presence_ratio` | 0.9336 (1,462/1,566 frames presence-true) |
+| `confidence_mean` | 0.6433 |
+| `variance_mean` / `variance_std` | 109.36 / 154.13 |
+| `kp_spread_mean` / `kp_spread_std` | 86.73 / 4.52 |
+| `person_count_changes` | 10 |
+
+Criterion latencies that *have* been recorded are kept in prose documents instead
+(ADR-147-benchmark-proof.md, ADR-149 §4.3, CHANGELOG Performance). §4.2 below defines
+how to consolidate them into a real machine-readable criterion baseline.
+
+### 1.4 L3 — Dataset-level accuracy evaluation
+
+- **Datasets (ADR-015):** primary **MM-Fi** (40 subjects × 27 actions, ~320K frames,
+  1TX×3RX, 114 subcarriers @100 Hz, 17-keypoint COCO + DensePose UV, CC BY-NC 4.0);
+  secondary **Wi-Pose** (12 volunteers × 12 actions, 166,600 packets, 3×3, 30
+  subcarriers). 114→56 subcarrier interpolation via `subcarrier.rs`; validation split =
+  subjects 33–40 held out (ADR-015 Phase 1).
+- **Acceptance tiers:** `wifi-densepose-train/src/ruview_metrics.rs` —
+  PCK@0.2 / OKS / MOTA / vitals rolled into `RuViewTier`
+  (Fail/Bronze/Silver/Gold) (ADR-145 §1.1).
+- **Ablation harness (ADR-145):** 6-variant matrix (`csi_only`, `cir_only`,
+  `csi_plus_cir`, `plus_doppler`, `plus_bfld`, `plus_uwb`-skipped), each variant
+  producing acceptance tier + `SpecMetrics` (presence ≥0.90, localization ≤0.50 m,
+  activity ≥0.70, FP ≤0.05, FN ≤0.10), `LatencyProfile` (p95 ≤100 ms), and
+  `PrivacyLeakage` (MIA `leakage_score` ≤0.05), SHA-256-pinned per variant under
+  `PROOF_SEED=42` (ADR-145 §2.2–2.6). Built at commit `0f336b7d3` (ADR-145
+  implementation status); CLI auto-mode wiring is pending.
+- **Cross-environment:** ADR-027 MERIDIAN `CrossDomainEvaluator`
+  (`wifi-densepose-train/src/eval.rs`) — `domain_gap_ratio`, extended by ADR-145
+  `cross_room_degradation()` with a 17-joint PCK-delta heatmap.
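
The per-variant thresholds listed for the ablation harness can be checked mechanically. The sketch below is illustrative: `VariantReport`, its field names, and `failed_checks` are hypothetical, and only the numeric thresholds and the `2·(AUC−0.5)` leakage formula come from this document.

```rust
/// Per-variant numbers an ablation run would report (hypothetical field names).
pub struct VariantReport {
    pub presence_accuracy: f64,
    pub localization_error_m: f64,
    pub activity_accuracy: f64,
    pub false_positive_rate: f64,
    pub false_negative_rate: f64,
    pub p95_latency_ms: f64,
    pub attacker_auc: f64,
}

/// Membership-inference leakage: 0 at chance (AUC = 0.5), 1 for a perfect attacker.
pub fn leakage_score(attacker_auc: f64) -> f64 {
    2.0 * (attacker_auc - 0.5)
}

/// Names of every threshold the report violates, in a fixed order.
/// An empty result means the variant passes all checks.
pub fn failed_checks(r: &VariantReport) -> Vec<&'static str> {
    let checks = [
        ("presence >= 0.90", r.presence_accuracy >= 0.90),
        ("localization <= 0.50 m", r.localization_error_m <= 0.50),
        ("activity >= 0.70", r.activity_accuracy >= 0.70),
        ("FP <= 0.05", r.false_positive_rate <= 0.05),
        ("FN <= 0.10", r.false_negative_rate <= 0.10),
        ("p95 latency <= 100 ms", r.p95_latency_ms <= 100.0),
        ("leakage <= 0.05", leakage_score(r.attacker_auc) <= 0.05),
    ];
    let mut failed = Vec::new();
    for (name, ok) in checks.iter() {
        if !*ok {
            failed.push(*name);
        }
    }
    failed
}
```

Values sitting exactly on a threshold are sensitive to floating-point rounding (an AUC of 0.525 evaluates to slightly more than 0.05 in `f64`), which is one reason to compare quantized values rather than raw floats in a pinned gate.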
+
+### 1.5 L4 — Hardware-in-loop
+
+- Real CSI nodes: ESP32-S3 on **COM9**, ESP32-C6 + MR60BHA2 on **COM12** (`CLAUDE.md`
+  hardware table). ADR-018 binary frame protocol over UDP:5005 (ADR-028 §3.2/§3.4).
+- ADR-145 Tier-4 test (gated, `#[cfg(feature = "hardware-test")]`): replay a live 30 s
+  COM9 capture through `csi_only` and `csi_plus_cir`; assert no presence regression and
+  p95 < 100 ms.
+- A/B board protocol precedent (`CHANGELOG.md` #987): fixed vs unmodified control board
+  against Apple-Watch ground truth (control pegged 40–49 BPM; fixed 88–91 vs 87 GT) —
+  this fixed-board/control-board + external ground-truth pattern is the required design
+  for all hardware vital-sign claims.
+- Witness bundle pins firmware: per-file SHA-256 of all sources + release binaries
+  (`generate-witness-bundle.sh` step 5).
+
+### 1.6 L5 — Field trials
+
+Live multi-node sessions captured as JSONL/JSON with summary statistics —
+`benchmark_baseline.json` (§1.3) is the existing exemplar. ADR-149 §6 adds the seeded
+`evals/` episode harness (Stage 1 kinematic full-matrix, Stage 2 Gazebo/PX4 SITL on the
+3 median seeds) for the swarm domain.
+
+---
+
+## 2. Beyond-SOTA Acceptance Criteria per Capability Axis
+
+A claim is "beyond SOTA" only with: a named external baseline, an exact metric and
+protocol match, the dataset/split named, the threshold pre-registered, and the
+statistical procedure of §3 followed. Current axes with measured status:
+
+| Axis | Metric (exact) | Dataset / protocol | SOTA baseline | Beyond-SOTA threshold | Measured status (cited) |
+|---|---|---|---|---|---|
+| In-domain pose accuracy | torso-PCK@20: `‖pred−gt‖ ≤ 0.2·‖R-shoulder−L-hip‖` | MM-Fi `random_split` (ratio 0.8, seed 0) | MultiFormer **72.25%** (Table VII); CSI2Pose 68.41% | > 72.25% with 95% CI lower bound above it | Flagship **83.59%**; micro (75,237 params) **74.30%** (`docs/benchmarks/wifi-pose-efficiency-frontier.md`) |
+| Edge efficiency frontier | torso-PCK@20 at deployed precision + params + batch-1 latency | same | MultiFormer 72.25% at full size | Pareto-dominance: smaller **and** above 72.25% at the deployed precision | int8 73.5 KB **74.70%**; int4-QAT 36.7 KB **74.46%**; shipped int4 verified **74.08%**, 0.135 ms 1-thread x86 (same file) |
+| Cross-subject generalization | torso-PCK@20, official MM-Fi cross-subject split (256,608 train / 64,152 test) | leakage-free split | own zero-shot baseline 63.99% | ADR-150 §4 gate: **+≥6 pts cross-subject without losing >2 pts random-split** | Best zero-shot **64.92%** (mixup+TTA+3-seed); gate judged unreachable without new capture (ADR-150 §3.2) |
+| Few-shot calibration (deployment) | PCK@20 after K labeled in-room samples; adapter size | MM-Fi cross-subject & cross-environment splits | zero-shot (64% / 10.6%) | SOTA-level (≳72%) from ≤200 samples with ≤~11 KB per-room adapter | cross-subject ~**72%** @100–200 samples (3 seeds); cross-env **10.6→73.1%** @200, 60.1% @5 (ADR-150 §3.5–3.6) |
+| Swarm SAR localization | CEP50/CEP95 (m), GDOP-stratified | seeded episode distribution (ADR-149 §6), not single geometry | Wi2SAR **5 m** (arxiv 2604.09115, paper-to-paper) | CEP50 < 5 m, IQM over ≥10 seeds, 95% CI excluding 5 m | 1.732 m single synthetic geometry — graded **Low–Medium**, not yet claimable (ADR-149 §7) |
+| Swarm coverage | coverage-rate@240 s; time-to-95% | episode rollouts | Wi2SAR 160k m²/13.5 min | rollout (not analytic) mean+CI beating baseline | 223 s is an analytic estimate — graded **Low** (ADR-149 §7) |
+| Control-loop latency | criterion wall-clock | local hardware, named | 10 ms / 100 Hz budget | all stages ≪ budget | 3.3 µs MARL / 43 µs RRT-APF / 54 ns fusion / 248 µs PPO (ADR-149 §4.3) |
+| World-model trajectory | MDE (m) at 5-frame horizon | RuView CSI-derived occupancy | pre-fine-tune random-weight baseline 9.49 m MDE | **≤1.0 m (2.0 vox)** at 5-frame horizon (ADR-147 §5 target, cited in benchmark-proof §4) | 9.49 m / FDE 16.23 m random weights; 208.45 ms median latency on real CSI (ADR-147-benchmark-proof §4, §7) |
+| Privacy leakage | MIA `leakage_score = 2·(AUC−0.5)` | fixed replay, fixed-seed shadow classifier | chance (0) | ≤ **0.05** (attacker AUC ≤ 0.525) | gate defined, harness built (ADR-145 §2.3) |
+| Vitals (hardware) | BPM error vs wearable ground truth | live A/B board protocol | control board behavior | within physiological agreement of ground truth, stable spread | 88–91 BPM vs 87 GT, spread 59→0 (CHANGELOG #987) |
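
For reference, the torso-PCK@20 metric in the first table row can be sketched as follows. The sketch assumes 3-D keypoints in COCO-17 order (right shoulder at index 6, left hip at index 11) and a torso length taken from the ground-truth pose; the `Pose` alias and function names are hypothetical, and the project's evaluator may differ in detail.

```rust
/// COCO-17 keypoint indices used for the torso reference length.
pub const R_SHOULDER: usize = 6;
pub const L_HIP: usize = 11;

/// One pose: 17 keypoints, 3 coordinates each (assumed layout).
pub type Pose = [[f32; 3]; 17];

/// Euclidean distance between two keypoints.
fn dist(a: &[f32; 3], b: &[f32; 3]) -> f32 {
    let mut sq = 0.0f32;
    for i in 0..3 {
        let d = a[i] - b[i];
        sq += d * d;
    }
    sq.sqrt()
}

/// Fraction of keypoints with ||pred - gt|| <= alpha * ||R-shoulder - L-hip||,
/// where the torso length is measured on the ground-truth pose.
/// Use alpha = 0.2 for PCK@20.
pub fn torso_pck(pred: &Pose, gt: &Pose, alpha: f32) -> f32 {
    let threshold = alpha * dist(&gt[R_SHOULDER], &gt[L_HIP]);
    let mut hits = 0usize;
    for j in 0..gt.len() {
        if dist(&pred[j], &gt[j]) <= threshold {
            hits += 1;
        }
    }
    hits as f32 / gt.len() as f32
}
```

A dataset-level score would average the per-joint hits over all test frames; a degenerate pose with zero torso length only counts exact matches as hits.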
+
+### Claim-language discipline (from ADR-149 §7 grading)
+
+| Evidence | Permitted language |
+|---|---|
+| Single run / single geometry / analytic estimate | "directional", never "beats SOTA" |
+| Seeded multi-run with CIs vs paper baseline | "exceeds the published X result paper-to-paper" |
+| Same metric, same split, same protocol, CI excludes baseline | "beyond SOTA on <dataset>/<split>" |
+| No public leaderboard exists (swarm CSI-SAR) | never claim "leaderboard standing" (ADR-149 §3) |
+
+---
+
+## 3. Statistical Procedure for Honest Claims
+
+Adopted from ADR-149 §5 (Agarwal 2021 / Gorsane 2022 standard) and the practices
+already used in ADR-150/efficiency-frontier measurements:
+
+1. **Seeds.** At least 10 independent seeds for RL/episodic claims (ADR-149 §5); at
+   least 3 seeds for supervised dataset evals (ADR-150 §3.5 used 3 seeds; report all).
+   Training seeds, eval seeds, and split files are versioned and committed.
+2. **Aggregate.** IQM (not mean/median) for episodic metrics + performance profiles;
+   for dataset accuracy report mean across seeds with each seed's value listed.
+3. **Confidence intervals.** 95% stratified bootstrap, 1,000 resamples (ADR-149 §5;
+   reference impl: `rliable`).
+4. **Paired comparisons.** When comparing model A vs B (e.g. `csi_plus_cir` vs
+   `csi_only`, or ours vs a reproduced baseline), evaluate both on the **identical
+   frozen test frames** and use a paired bootstrap over per-sample correctness
+   (PCK hit/miss is per-joint binary — pair at the joint-sample level). For
+   paper-to-paper comparisons where the baseline cannot be re-run, state so
+   explicitly ("paper-to-paper", ADR-149 §2) and require the CI lower bound to clear
+   the published point value.
+5. **Pre-registration.** The threshold lives in an ADR **before** the run
+   (precedent: ADR-150 §4 gate written before §3.2 measurements; the measurements
+   honestly reported the gate as not met).
+6. **Negative results are recorded.** ADR-150 §1/§3.2 keeps DANN-failed,
+   capacity-hurts, and KD-didn't-help results in the record — required practice.
+7. **Eval episodes (swarm):** 50 fixed, versioned episodes per policy
+   (10 victim layouts × 5 CSI-noise levels), ≥3 baselines (random walk,
+   boustrophedon+triangulation, IPPO) (ADR-149 §5).
+8. **GDOP stratification** for any localization claim, so geometry artifacts cannot
+   produce the headline (ADR-149 §6.3).
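
Steps 2 and 3 can be sketched in std-only Rust. This is a simplified illustration, not the reference procedure: the IQM is computed as a 25% trimmed mean, the bootstrap is a plain (non-stratified) percentile bootstrap, and the small `SplitMix64` generator is included only so the example is seeded and self-contained. All names here are hypothetical.

```rust
/// Interquartile mean: drop the lowest and highest 25% of the sorted scores
/// (floor(n/4) from each end) and average the rest. Input must be non-empty.
pub fn iqm(scores: &[f64]) -> f64 {
    assert!(!scores.is_empty(), "iqm needs at least one score");
    let mut s = scores.to_vec();
    s.sort_by(|a, b| a.total_cmp(b));
    let cut = s.len() / 4;
    let mid = &s[cut..s.len() - cut];
    mid.iter().sum::<f64>() / mid.len() as f64
}

/// Minimal seeded PRNG (SplitMix64) so resampling is reproducible.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// 95% percentile-bootstrap interval for the IQM.
/// Resamples with replacement `resamples` times (the text uses 1,000).
pub fn bootstrap_ci_iqm(scores: &[f64], resamples: usize, seed: u64) -> (f64, f64) {
    assert!(!scores.is_empty() && resamples > 0);
    let n = scores.len();
    let mut rng = SplitMix64(seed);
    let mut stats = Vec::with_capacity(resamples);
    let mut draw = vec![0.0f64; n];
    for _ in 0..resamples {
        for slot in draw.iter_mut() {
            // Modulo bias is negligible for small n and is ignored in this sketch.
            *slot = scores[(rng.next_u64() % n as u64) as usize];
        }
        stats.push(iqm(&draw));
    }
    stats.sort_by(|a, b| a.total_cmp(b));
    let idx = |q: f64| (q * (resamples - 1) as f64).round() as usize;
    (stats[idx(0.025)], stats[idx(0.975)])
}
```

A claim against a published point value would then require the lower bound of the returned interval to clear that value, per step 4.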
+
+---
+
+## 4. Regression-Gate Design (CI Enforcement)
+
+### 4.1 Three gate classes, three tolerances
+
+| Gate class | Source of truth | Tolerance | On breach |
+|---|---|---|---|
+| Determinism hashes | `expected_features.sha256`, `expected_cir_features.sha256`, `expected_calibration_features.sha256`, future `expected_ablation_<slug>.sha256` | **exact (0%)** | exit 1 = FAIL; exit 2 = SKIP only for placeholder hashes (proof.rs `0/1/2` convention, ADR-145 §2.4) |
+| Accuracy / quality metrics | per-variant canonical bytes, quantized 1e-3 (ADR-145 §2.6) | exact after quantization | FAIL CI; tier change requires ADR amendment |
+| Latency / throughput | criterion estimates JSON | **% tolerance per scale** (below) | FAIL on regression beyond tolerance; trend everything |
+
+### 4.2 Criterion baseline file (replaces the current gap)
+
+Today criterion numbers live in prose (ADR-147-benchmark-proof, ADR-149 §4.3,
+CHANGELOG). Formalize:
+
+1. `cargo bench --workspace -- --save-baseline main` on a **named, fixed runner**
+   (ADR-147 used RTX 5080 / specific host; record host + toolchain in the file).
+2. Export `target/criterion/*/estimates.json` point estimates into a committed
+   `v2/benchmarks/criterion-baseline.json`: `{bench_id, crate, p50_ns, host, commit}`.
+3. CI compares new runs against it with scale-aware tolerance — wall-clock noise is
+   proportionally larger at small magnitudes:
+
+| Magnitude | Tolerance | Rationale |
+|---|---|---|
+| < 1 µs (e.g. fusion 54 ns, privacy decide <50 ns target) | ±25% | timer/jitter dominated |
+| 1 µs – 1 ms (MARL 3.3 µs, RRT-APF 43 µs, PPO 248 µs) | ±15% | criterion CI typically <5%, leave CI-runner headroom |
+| > 1 ms (engine cycle vs 50 ms budget, OccWorld ~209 ms) | ±10% **and** absolute budget (50 ms / 500 ms ADR-147 §6) | budgets are the contract |
+
+4. Hard in-source acceptance thresholds remain authoritative regardless of baseline:
+   sketch ≥8× (`sketch_bench.rs`), prefilter ≥4× (`aether_prefilter_bench.rs`),
+   nvsim ≥1 kHz (`pipeline_throughput.rs`), MQTT header targets, ADR-145 p95 ≤100 ms.
+5. Latency stays **out of determinism hashes** (ADR-145 §2.6) but **in** the trended
+   `summary.json`, so sub-threshold drift is visible (ADR-145 §3.2 mitigation).
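
The scale-aware comparison in step 3 can be sketched as below. The function names are hypothetical, the bucket edges treat exactly 1 µs and 1 ms as belonging to the middle band (an assumption, since the table does not say), and only slowdowns fail the gate, matching "FAIL on regression beyond tolerance".

```rust
/// Relative tolerance for a baseline latency, by magnitude band.
pub fn tolerance(baseline_ns: f64) -> f64 {
    if baseline_ns < 1_000.0 {
        0.25 // below 1 us: timer and jitter dominated
    } else if baseline_ns <= 1_000_000.0 {
        0.15 // 1 us to 1 ms
    } else {
        0.10 // above 1 ms
    }
}

/// True when the new measurement is within tolerance of the baseline and,
/// if an absolute budget is given, also within that budget.
pub fn latency_gate(baseline_ns: f64, measured_ns: f64, budget_ns: Option<f64>) -> bool {
    let within_tolerance = measured_ns <= baseline_ns * (1.0 + tolerance(baseline_ns));
    let within_budget = match budget_ns {
        Some(budget) => measured_ns <= budget,
        None => true,
    };
    within_tolerance && within_budget
}
```

For example, a 43 µs baseline tolerates up to 49.45 µs, while a 48 ms engine cycle that drifts to 51 ms fails on the 50 ms budget even though it is inside the ±10% band.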
+
+### 4.3 Live-capture baseline gate (`benchmark_baseline.json`)
+
+Adopt the file as the L5 regression anchor with documented provenance, then gate a
+re-capture of the same scenario (same 2-node placement, same room class) against the
+summary block:
+
+| Field | Baseline | Suggested gate |
+|---|---:|---|
+| `presence_ratio` | 0.9336 | ≥ 0.90 for an occupied-room session |
+| `confidence_mean` | 0.6433 | within ±0.10 |
+| `kp_spread_std` | 4.52 | ≤ 2× baseline (skeleton stability) |
+| `person_count_changes` | 10 / 1,566 frames | ≤ 2× baseline (count flapping — see CHANGELOG #803/#894 clamp bugs this metric would have caught) |
+
+Field-trial gates are **soft** (warn + require human sign-off), never auto-merge
+blockers — environments differ; the gate exists to force an explanation.
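
A soft gate of this kind returns warnings rather than a pass/fail verdict. The sketch below encodes the suggested gates from the table; the `CaptureSummary` type and function names are hypothetical, and normalizing `person_count_changes` by frame count (so sessions of different length are comparable) is an assumption of this example.

```rust
/// Summary block of a live capture (fields named after the JSON summary).
pub struct CaptureSummary {
    pub total_frames: u32,
    pub presence_ratio: f64,
    pub confidence_mean: f64,
    pub kp_spread_std: f64,
    pub person_count_changes: u32,
}

/// Baseline values from `benchmark_baseline.json` as cited in this document.
pub const BASELINE: CaptureSummary = CaptureSummary {
    total_frames: 1_566,
    presence_ratio: 0.9336,
    confidence_mean: 0.6433,
    kp_spread_std: 4.52,
    person_count_changes: 10,
};

/// Person-count changes per frame; guards against an empty capture.
fn change_rate(s: &CaptureSummary) -> f64 {
    s.person_count_changes as f64 / s.total_frames.max(1) as f64
}

/// Human-readable warnings for a re-capture; empty means nothing to explain.
pub fn soft_gate_warnings(base: &CaptureSummary, new: &CaptureSummary) -> Vec<String> {
    let mut warnings = Vec::new();
    if new.presence_ratio < 0.90 {
        warnings.push(format!("presence_ratio {:.4} < 0.90", new.presence_ratio));
    }
    if (new.confidence_mean - base.confidence_mean).abs() > 0.10 {
        warnings.push(format!("confidence_mean {:.4} outside ±0.10", new.confidence_mean));
    }
    if new.kp_spread_std > 2.0 * base.kp_spread_std {
        warnings.push(format!("kp_spread_std {:.2} > 2x baseline", new.kp_spread_std));
    }
    if change_rate(new) > 2.0 * change_rate(base) {
        warnings.push("person-count change rate > 2x baseline".to_string());
    }
    warnings
}
```

Any non-empty result would be attached to the sign-off request rather than failing the build.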
+
+### 4.4 Wiring
+
+Pre-merge (`CLAUDE.md` checklist): L0 + L1. Nightly: L2 criterion + ADR-145 Tier-3
+ablation matrix (minutes-scale, ADR-145 §3.2). Release: full witness bundle +
+`VERIFY.sh` + L4 on real COM-port hardware (`CLAUDE.md` firmware rule 6/7).
+
+---
+
+## 5. Reproducibility & External-Witness Requirements
+
+Anyone outside the project must be able to re-run every claimed result:
+
+1. **One command per layer.** `cargo test --workspace --no-default-features`;
+   `python archive/v1/data/proof/verify.py`; `bash scripts/generate-witness-bundle.sh`
+   then `bash VERIFY.sh` inside the bundle; per ADR-150 §4 every accuracy result needs
+   "one-command reproduction" (efficiency frontier publishes its exact command:
+   `python aether-arena/staging/train_efficiency_pareto.py npy/X.npy npy/Y.npy npy/split_random.npy`).
+2. **Pinned numerical environment.** The Python proof requires single-threaded BLAS
+   (`OMP_NUM_THREADS=1`, `OPENBLAS_NUM_THREADS=1`, `MKL_NUM_THREADS=1`,
+   `VECLIB_MAXIMUM_THREADS=1`, `NUMEXPR_NUM_THREADS=1`) and 6-decimal quantization
+   (`HASH_QUANTIZATION_DECIMALS=6`) — the #560 fix in `CHANGELOG.md`; Rust proof
+   runners use coarse u16 quantization at 1e-3 in natural order
+   (`calibration_proof_runner.rs` pattern, ADR-145 §2.6) for libm portability.
+3. **Seeds are constants, committed:** `PROOF_SEED=42`, `MODEL_SEED=0`
+   (`proof.rs`, ADR-015 Phase 5); dataset splits committed as `.npy`
+   (`split_random.npy`); swarm configs as versioned YAML with all seeds (ADR-149 §5).
+4. **Artifacts carry hashes.** Published model artifacts include SHA-256 (HuggingFace
+   `pose_micro_int4.npz`, sha256 `c03eeb…` — efficiency-frontier doc); witness bundle
+   has a `MANIFEST.sha256` over every file; provenance fields
+   (`replay_sha256`, `model_sha256`, `calibration_version`, `privacy_mode`) are bound
+   into ablation proof hashes (ADR-145 §2.7) so a metric cannot be quoted without its
+   exact model + calibration + privacy decision.
+5. **Hardware claims name the hardware.** ADR-147 records RTX 5080 / CUDA 12.8 /
+   PyTorch 2.10.0; nvsim states the Cortex-A53 scaling caveat in the bench header;
+   efficiency-frontier flags ARM validation as pending. Copy this discipline.
+6. **Witness rows.** Every new proof gains rows in `docs/WITNESS-LOG-028.md`
+   (ADR-145 §5.3 adds W-39…W-41) and the bundle's `source-hashes.txt`.
+7. **Secret hygiene in evidence.** Bundle logs pass through
+   `scripts/redact-secrets.py` (ADR-110 wave-5 incident note in
+   `generate-witness-bundle.sh` step 4) — external evidence must never embed `.env`.
+
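Item 2's "quantize before hashing" rule can be illustrated in a few lines. This is a sketch of the idea only, not the implementation in `verify.py`: the function name, the text serialization, and the use of SHA-256 over formatted strings are assumptions made for the example.

```python
import hashlib

HASH_QUANTIZATION_DECIMALS = 6  # same value the document cites for the Python proof


def quantized_digest(values, decimals=HASH_QUANTIZATION_DECIMALS):
    """Hash floats after rounding so sub-quantum jitter cannot change the digest."""
    hasher = hashlib.sha256()
    for value in values:
        # Adding 0.0 folds a rounded -0.0 into +0.0 so the sign of zero cannot leak.
        quantized = round(value, decimals) + 0.0
        hasher.update(f"{quantized:.{decimals}f}\n".encode("ascii"))
    return hasher.hexdigest()


run_a = [0.1234561, -1e-9]  # hypothetical features from one machine
run_b = [0.1234564, 1e-9]   # same features with last-bits jitter from another
print(quantized_digest(run_a) == quantized_digest(run_b))
```

Quantization narrows the window for hash divergence but does not close it: a value that sits exactly on a rounding boundary can still flip, which is why the thread-pinning environment variables are required as well.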
+---
+
+## 6. Known Measurement Pitfalls (WiFi-sensing specific)
+
+| # | Pitfall | Repo evidence | Mitigation in this methodology |
+|---|---|---|---|
+| 1 | **Subject leakage / split optimism.** In-domain `random_split` has temporal/subject-adjacency effects; the same model family scores 83.6% random-split but ~11.6% torso-PCK on the leakage-free cross-subject split | efficiency-frontier "Controlled claim" footnote; ADR-150 §1, §3.2 | Always report the split name; publish random-split and cross-subject numbers side by side; cross-subject claims only on the official split |
+| 2 | **Per-environment overfitting.** Zero-shot cross-environment collapses to 10.6%; subject-scaling saturates ~63.7% past 16–20 subjects because the residual is room/device shift | ADR-150 §3.3, §3.6 | Cross-room degradation + 17-joint heatmap in every ablation (ADR-145 §2.5); claim deployment accuracy only with the calibration protocol stated (K samples, adapter size) |
+| 3 | **Mock-mode contamination.** Mock firmware missed a real Kconfig threshold bug; the nn crate ships a `mock_inference` criterion group that must never be quoted as pipeline performance | `CLAUDE.md` firmware rule 7; `inference_bench.rs` `bench_mock_inference` | L4 mandatory before firmware release ("Always test with real WiFi CSI, not mock mode"); label mock benches in reports; ADR-147 §7 re-ran the benchmark on real CSI explicitly "no mocks" |
+| 4 | **Single-run point estimates.** 1.732 m localization from one synthetic geometry; 223 s coverage from an analytic formula | ADR-149 §1, §7 | §3 seed/CI protocol; evidence-grade table before publication |
+| 5 | **Random-weight / untrained baselines read as results.** OccWorld MDE 9.49 m is a pre-fine-tuning random-weight reading | ADR-147-benchmark-proof §4 | Label baseline-vs-target explicitly; never aggregate untrained-model numbers into capability claims |
+| 6 | **Latency conflated with quality.** Criterion µs numbers show that there is no compute bottleneck; they say nothing about accuracy | ADR-149 §2, §4.3 | L2 is gate-only; quality claims live in L3+ |

+| 7 | **Floating-point nondeterminism breaking proofs.** SciPy FFT SIMD reordering + multithreaded BLAS produced different hashes across CI microarchitectures | CHANGELOG #560; `calibration_proof_runner.rs` lines 1–13 (cited in ADR-145 §2.3) | Quantize before hashing; pin thread env vars; exclude wall-clock from hashes |
+| 8 | **Hash churn without procedure.** Three distinct historical values of the proof hash exist (`8c0680d7…` ADR-028, `667eb054…` CHANGELOG #560, `f8e76f21…` current file) | cited files | Every regeneration via `--generate-hash` + re-verify + CHANGELOG entry + witness bundle refresh |
+| 9 | **Aggregation bugs masking accuracy.** Person count clamped to 1 by EMA mapping; eigenvalue path leaking counts up to 10; both invisible to unit tests for months | CHANGELOG #803, #894 | L5 summary gates on `person_count_changes`/count distributions; convergence tests replaying the live loop |
+| 10 | **Stale verification claims.** `VERIFY.sh` prints hardcoded "(8/8)" over 10 actual checks; `CLAUDE.md` says "7/7" | `generate-witness-bundle.sh` line 293; `CLAUDE.md` | Compute the verdict count; audit doc claims against scripts each release |
+| 11 | **Licensing limits on the eval set.** MM-Fi is CC BY-NC — weights trained solely on it cannot back commercial claims | ADR-015 Consequences | Track dataset license alongside every published number |
+
+---
+
+## 7. Gap List (what must be built to fully execute this methodology)
+
+| Gap | Owner layer | Source |
+|---|---|---|
+| Machine-readable criterion baseline (`v2/benchmarks/criterion-baseline.json`) + CI comparison job | L2 | §4.2 (numbers currently only in ADR prose) |
+| Provenance + producer script for `benchmark_baseline.json`; soft-gate job | L5 | §1.3, §4.3 (zero code references today) |
+| `ruview-cli --ablation mode=auto` wiring + `expected_ablation_<slug>.sha256` (currently placeholders → exit 2) | L3 | ADR-145 implementation status |
+| Seeded swarm `evals/` harness + `evals/RESULTS.md` internal leaderboard | L3/L5 | ADR-149 §6, §8 open issues |
+| Fix `VERIFY.sh` hardcoded verdict count; reconcile `CLAUDE.md` "7/7" | L1 | §1.2 |
+| Curated paired room-A/room-B labeled replay set (frozen, SHA-pinned, never trained on) | L3 | ADR-145 §3.2 |
+| ARM/edge on-device latency validation for the int4 model (x86-only today) | L4 | efficiency-frontier doc ("Pi fleet pending") |
+| Bench validation of the antenna-placement matrix on real hardware | L4 | PRODUCTION-ROADMAP.md Tier 2.3 |
+
+---
+
+## Update — falsifiable occupancy benchmark implemented
+
+`wifi-densepose-train::occupancy_bench` (added this branch) makes the
+presence/person-count claim **falsifiable in code**, directly enforcing the L3
+discipline above. It grades predictions vs ground truth and gates a SOTA claim
+behind a single `claim_allowed` invariant that requires **all** of:
+
+1. `DataProvenance::Measured` — synthetic/mock data is scorable for regression
+   but **never claimable** (anti-mock-contamination; the CLAUDE.md Kconfig-bug
+   lesson made structural).
+2. A leak-free `EvalSplit` — `validate()` refuses any split where a subject *or*
+   environment id appears in both train and test (subject leakage / per-env
+   overfitting).
+3. `n_test ≥ min_test_samples` (small-N guard).
+4. Presence F1 whose **bootstrap-CI lower bound** (deterministic splitmix64,
+   seeded) clears the threshold — not the point estimate.
+5. Count MAE within threshold.
+
+The claim string is unreadable except through the gate (returns `NO_CLAIM`
+otherwise) — same discipline as the `ruview-gamma` acceptance gate. 10 tests
+cover each refusal path. What remains is *data*, not *method*: feed it a frozen,
+SHA-pinned, subject/environment-disjoint **measured** replay set (the curated
+room-A/room-B item above) and the "beyond SOTA" claim becomes a passing or
+failing test, not a slogan.
+
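Condition 4 (gate on the bootstrap lower bound, not the point estimate) can be sketched as follows. This is illustrative Python rather than the Rust in `occupancy_bench`: the function names, the percentile method, the resample count, and the sample data are all assumptions; only the splitmix64 generator and the seeded, deterministic resampling come from the description above.

```python
MASK64 = (1 << 64) - 1


def splitmix64(state):
    """Advance the generator; return (new_state, 64-bit output)."""
    state = (state + 0x9E3779B97F4A7C15) & MASK64
    z = state
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return state, z ^ (z >> 31)


def f1_score(truth, pred):
    """Presence F1 over paired boolean labels; 0.0 when there is nothing to score."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1_lower(truth, pred, seed, resamples=1000, alpha=0.05):
    """Percentile-bootstrap lower bound of F1, fully determined by the seed."""
    n = len(truth)
    state = seed & MASK64
    scores = []
    for _ in range(resamples):
        indices = []
        for _ in range(n):
            state, r = splitmix64(state)
            indices.append(r % n)  # modulo bias is negligible for n << 2**64
        scores.append(f1_score([truth[i] for i in indices],
                               [pred[i] for i in indices]))
    scores.sort()
    return scores[int((alpha / 2) * resamples)]


# Hypothetical test split: 40 occupied frames, 20 empty, a few mistakes.
truth = [True] * 40 + [False] * 20
pred = [True] * 36 + [False] * 4 + [True] * 3 + [False] * 17
lower = bootstrap_f1_lower(truth, pred, seed=42)
print(f"point F1 = {f1_score(truth, pred):.3f}, CI lower bound = {lower:.3f}")
```

A claim gate built on this would compare `lower`, not the point F1, against the threshold, so a lucky small test set cannot clear the bar.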
+---
+
+*All values cited from: `benchmark_baseline.json`, `v2/crates/*/benches/*.rs` (15
+files), `docs/adr/ADR-147-benchmark-proof.md`,
+`docs/adr/ADR-149-swarm-benchmarking-evaluation-methodology.md`,
+`docs/adr/ADR-145-ablation-eval-harness-privacy-leakage.md`,
+`docs/adr/ADR-028-esp32-capability-audit.md`,
+`docs/adr/ADR-015-public-dataset-training-strategy.md`,
+`docs/adr/ADR-150-rf-foundation-encoder.md`,
+`docs/benchmarks/wifi-pose-efficiency-frontier.md`,
+`scripts/generate-witness-bundle.sh`, `archive/v1/data/proof/verify.py`,
+`archive/v1/data/proof/expected_features.sha256`, `CHANGELOG.md`, `CLAUDE.md`,
+`docs/research/sota-2026-05-22/PRODUCTION-ROADMAP.md`.*
--- a/docs/research/ruview-beyond-sota/04-optimization-roadmap.md
+++ b/docs/research/ruview-beyond-sota/04-optimization-roadmap.md
@@ -0,0 +1,252 @@
+# RuView Beyond-SOTA — 04: Performance Review & Optimization Roadmap
+
+**Scope:** the streaming sensing pipeline (CSI ingest → multistatic fusion → CIR gate →
+pose publish) in `v2/`, hot-path crates `wifi-densepose-signal` (ruvsense),
+`wifi-densepose-engine`, `wifi-densepose-ruvector`, plus build-profile and edge-target
+(Pi 5-class, WASM) considerations.
+
+**Hard constraint (non-negotiable):** the witness chain (ADR-028, ADR-136 §2.5 replay
+contract, ADR-137 §2.7 BLAKE3 witness in
+`v2/crates/wifi-densepose-engine/src/lib.rs:437-448`) requires **bit-exact deterministic
+float output**. Every recommendation below is tagged with its determinism risk. Anything
+that reorders float additions, enables FMA contraction, fast-math, or parallel reduction
+**changes the witness hash** and requires a coordinated proof-hash regeneration
+(`verify.py --generate-hash`) plus witness-bundle re-issue.
+
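Why reordering float additions matters can be shown in a few lines. The sketch below is illustrative Python, not repository code: it adds the same three values in two groupings and hashes the exact IEEE-754 bytes. SHA-256 stands in for the engine's BLAKE3 witness here only because BLAKE3 is not in the Python standard library.

```python
import hashlib
import struct


def digest(value: float) -> str:
    """Hash the exact bit pattern of a float64 (little-endian)."""
    return hashlib.sha256(struct.pack("<d", value)).hexdigest()


left_to_right = (0.1 + 0.2) + 0.3  # sequential accumulation order
regrouped = 0.1 + (0.2 + 0.3)      # same terms, different association

print(left_to_right == regrouped)                  # False
print(digest(left_to_right) == digest(regrouped))  # False
```

The two sums differ only in the last bit, but any hash over the raw bytes changes completely, which is the reason parallel reductions and FMA contraction are treated as hash-changing events.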
+---
+
+## 1. What we actually have measured (and what we don't)
+
+`benchmark_baseline.json` is a **signal-quality soak baseline**, not a
+latency benchmark: 1,566 samples (ticks 51131–52395) of
+`variance / motion / presence / confidence / est_persons / kp_spread / rssi`, with a
+summary block (`confidence_mean: 0.643`, `presence_ratio: 0.934`,
+`kp_spread_mean: 86.7`, `person_count_changes: 10`). **It contains zero timing data.**
+It is the accuracy guardrail for any optimization (post-change soak must reproduce these
+distributions), not a latency baseline.
+
+Latency benchmarks exist but no committed results were found in the repo:
+
+| Bench | File | What it measures |
+|---|---|---|
+| `process_cycle_4nodes_56sc` | `v2/crates/wifi-densepose-engine/benches/engine_cycle.rs:34-48` | One full engine cycle, 4 nodes × 56 subcarriers, vs. the documented 50 ms budget (`engine_cycle.rs:3-6`) |
+| `cir_bench` | `v2/crates/wifi-densepose-signal/benches/cir_bench.rs` | `CirEstimator::estimate()` per tier (HT20/HT40/HE20/HE40) + 12-link amortization |
+| `sketch_bench` | `v2/crates/wifi-densepose-ruvector/benches/sketch_bench.rs:86-175` | Hamming sketch vs. float L2/cosine compare; top-K over 1,024-sketch bank |
+| `signal_bench`, `calibration_bench`, `aether_prefilter_bench` | `v2/crates/wifi-densepose-signal/benches/` | Signal-path and ADR-135 calibration throughput |
+
+**Action zero of the roadmap is to run these on a Pi 5 and commit the criterion
+baselines.** All impact classes below are estimates derived from operation counts read
+from the cited code, not from measurements.
+
+---
+
+## 2. Latency budget model — streaming pipeline
+
+Two clock domains exist and must not be conflated:
+
+- **TDMA sensing cycle: 20 Hz / 50 ms** — the architecture's own budget
+  (`v2/crates/wifi-densepose-signal/src/ruvsense/mod.rs:5`, `RuvSenseConfig::target_hz =
+  20.0` at `mod.rs:258`, and the bench doc `engine_cycle.rs:3`).
+- **CSI ingest: 100 Hz per node** — raw frames arrive ~5× faster than the fused output
+  rate; per-frame ingest work (parse, normalize, calibrate, window) must therefore fit a
+  **10 ms** per-frame envelope while the fused path fits **< 50 ms end-to-end**.
+
+Proposed per-stage budget for the 50 ms end-to-end target (4 nodes, HT20 / 56
+subcarriers — the configuration the engine bench encodes):
+
+| # | Stage | Code | Budget | Risk (from code reading) |
+|---|---|---|---|---|
+| 1 | Ingest + hardware normalize (per 100 Hz frame) | `hardware_norm`, `multiband.rs` | 2 ms | Low — vector ops on 56 floats |
+| 2 | Calibration apply (ADR-135) | `ruvsense/calibration.rs` | 2 ms | Low — Welford lookups |
+| 3 | Phase alignment | `phase_align.rs:117-152` | 1 ms | Low — ≤ 20 iterations over ≤ 17 static subcarriers (`config.max_iterations: 20`, `phase_align.rs:57`); allocation churn only (§3) |
+| 4 | Multistatic fusion (attention + softmax) | `multistatic.rs:512-598` | 2 ms | Low — O(nodes × 56); but does duplicate work in `fuse_scored` (§3, F2) |
+| 5 | **CIR gate (ISTA L1)** | `multistatic.rs:440-475` → `cir.rs:601-654` | 15 ms | **HIGH** — dominant cost, scales badly with PHY tier (below) |
+| 6 | Coherence score + gate decision | `coherence.rs`, `coherence_gate.rs` | 2 ms | Low — z-scores over 56 subcarriers |
+| 7 | Tomography (ADR-030 tier 2, when enabled) | `tomography.rs:236-323` | 8 ms | **Medium** — per-iteration allocation + loose step size (§3, F8/F9) |
+| 8 | Pose tracker (17-kp Kalman + re-ID) | `pose_tracker.rs` | 8 ms | Medium — sketch prefilter (ADR-084) already mitigates the re-ID scan |
+| 9 | Engine: quality score, privacy gate, WorldGraph node, BLAKE3 witness | `engine/src/lib.rs:304-368` | 5 ms | Low per cycle, but **unbounded memory growth** (§4) |
+| 10 | Publish (WS/serde) | sensing-server | 5 ms | Low |
+| | **Total** | | **50 ms** | |
+
+### Why stage 5 is the at-risk stage — operation counts from the code
+
+`ista_solve` (`cir.rs:601-654`) runs **two dense complex mat-vecs per iteration**
+(`matvec_phi` at `cir.rs:717-726`, `matvec_phi_h` at `cir.rs:730-745`), each O(K·G)
+complex MACs (≈ 8 FLOPs each), up to `max_iters: 100` (`cir.rs:176`). Per
+`CirConfig` (`cir.rs:164-233`):
+
+| Tier | K (active) | G (taps) | FLOPs/iter (2·K·G·8) | FLOPs @100 iters |
+|---|---|---|---|---|
+| HT20 | 52 | 156 | ≈ 0.13 M | ≈ 13 M |
+| HT40 | 114 | 342 | ≈ 0.62 M | ≈ 62 M |
+| HE20 | 242 | 726 | ≈ 2.8 M | ≈ 0.28 G |
+| HE40 | 484 | 1,452 | ≈ 11.2 M | ≈ 1.1 G |
+
+HT20 fits the 15 ms budget comfortably on a Pi 5; **HE40 at worst-case iteration count
+is ~1.1 GFLOP of scalar, cache-unfriendly work per estimate and will not fit any 50 ms
+budget without structural change** (F4 below). Today the gate runs once per cycle on the
+first link only (`multistatic.rs:452-463`), which contains the damage; the 12-link
+amortization pattern in `cir_bench.rs` shows the intended scale-up, which multiplies
+this cost ×12.
+
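The FLOP table can be reproduced directly from the tier dimensions. A short Python sketch, using the same cost model as the text (two dense mat-vecs per iteration, 8 FLOPs per complex MAC, 100 iterations worst case); the constant and function names are chosen for this example.

```python
# (K active subcarriers, G taps) per PHY tier, as listed in the table.
TIERS = {
    "HT20": (52, 156),
    "HT40": (114, 342),
    "HE20": (242, 726),
    "HE40": (484, 1452),
}
FLOPS_PER_COMPLEX_MAC = 8
MATVECS_PER_ITER = 2
MAX_ITERS = 100


def ista_flops_per_iter(k: int, g: int) -> int:
    """Dense cost of one ISTA iteration: two K x G complex mat-vecs."""
    return MATVECS_PER_ITER * k * g * FLOPS_PER_COMPLEX_MAC


for tier, (k, g) in TIERS.items():
    per_iter = ista_flops_per_iter(k, g)
    worst = per_iter * MAX_ITERS
    print(f"{tier}: {per_iter / 1e6:.2f} MFLOP/iter, "
          f"{worst / 1e9:.3f} GFLOP at {MAX_ITERS} iters")
```

Multiplying the HE40 worst case by the 12 links of the amortization pattern gives roughly 13.5 GFLOP per cycle under this model, which is the scale-up the paragraph above warns about.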
+---
+
+## 3. Findings table — optimization opportunities
+
+Impact: relative cycle-time/memory effect at the 4-node HT20 operating point unless
+noted. Determinism: **EXACT** = bit-identical output guaranteed; **TIE** = only
+tie-breaking/ordering may differ; **CHANGES-FLOATS** = output bits change, witness/proof
+hash must be regenerated.
+
+| ID | Finding (file:line) | Impact | Effort | Determinism |
+|---|---|---|---|---|
+| F1 | `FusedSensingFrame` deep-copies every input frame each cycle: `node_frames: node_frames.to_vec()` (`multistatic.rs:282`) — clones all per-node amplitude+phase vectors per 50 ms cycle even when downstream geometry consumers don't need them | Med | Low (Arc/Cow or borrow) | EXACT |
+| F2 | `fuse_scored` re-derives the per-node amplitude views and recomputes `node_attention_weights` after `fuse` already computed them inside `attention_weighted_fusion` (`multistatic.rs:311-321` duplicating `multistatic.rs:520`) — full cosine-sim + softmax done twice per cycle | Low-Med | Low (return weights from `fuse`) | EXACT (same math, computed once) |
+| F3 | CIR gate rebuilds a heap `CsiFrame` per cycle: `build_csi_frame_from_channel` allocates an `Array2<Complex64>` and converts amplitude/phase via `from_polar` per subcarrier (`multistatic.rs:488-506`, called from `multistatic.rs:462`), then `extract_csi_vector` converts back to `Complex32` (`cir.rs:505-530`) — f32→f64→f32 round-trip plus two allocations purely as glue | Med | Med (give `CirEstimator` a slice-based entry point) | EXACT if conversions reproduce exactly (f32→f64 is lossless; `from_polar` in f64 then truncate ≠ f32 polar — keep the f64 intermediate to stay exact, or accept CHANGES-FLOATS and regenerate hashes) |
+| F4 | ISTA inner loop uses dense O(K·G) mat-vecs (`cir.rs:717-745`) although Φ is a sub-sampled DFT (`cir.rs:539-558`) — the products Φx and Φᴴr are computable via an FFT of length G in O(G log G), an ~8–40× FLOP cut at HE20/HE40 (table §2) | **High** (the only path to HE40 real-time) | High | **CHANGES-FLOATS** (different summation order than the sequential dot product) — must ship behind a feature flag, A/B against `cir_proof_runner`, regenerate `expected_features.sha256` + witness bundle |
+| F5 | `neumann_warm_start` recomputes the diagonal of ΦᴴΦ with a full K×G pass **per frame** (`cir.rs:676-681`), rebuilds the COO→CSR diagonal matrix per frame (`cir.rs:683-685`), and collects `rhs_re`/`rhs_im` Vecs per frame (`cir.rs:689-690`) — yet `diag` depends only on Φ, which is fixed at `CirEstimator::new` | Med | Low (precompute diag+CSR in `new()`) | EXACT (same values, computed once) |
+| F6 | `phase_variance` collects a `Vec<f32>` of phases per call (`cir.rs:792`) — replaceable by a two-pass loop with zero allocation | Low | Low | EXACT |
+| F7 | Φ and Φᴴ are both stored densely (`cir.rs:546-547`): 2·K·G·8 bytes — Φᴴ entries are just conjugates of Φ (`cir.rs:555`), so a transposed-iteration kernel over Φ alone halves the footprint (HE40: 11.2 MB → 5.6 MB) | Low (latency) / Med (memory §4) | Med | EXACT (conjugation is exact; keep identical accumulation order in the transposed kernel) |
+| F8 | Tomography allocates the gradient vector **inside** the solver iteration loop: `let mut gradient = vec![0.0_f64; self.n_voxels]` (`tomography.rs:266`) — one heap alloc + zeroing per iteration, up to `max_iterations: 100` (`tomography.rs:75`); hoist and `fill(0.0)` | Med (for tier-2 deployments) | Low | EXACT |
+| F9 | Tomography step size uses the Frobenius-norm upper bound for the Lipschitz constant (`tomography.rs:253-259`, comment admits `‖WᵀW‖ ≤ ‖W‖_F²`) — a bound loose by up to the matrix rank, forcing proportionally more ISTA iterations than the power-method estimate used in `cir.rs:566-590` | Med | Low (reuse the cir.rs power-method pattern) | **CHANGES-FLOATS** (different step ⇒ different iterate path) |
+| F10 | `apply_phase_correction` clones the amplitude vector and allocates a fresh corrected-phase Vec per channel per cycle (`phase_align.rs:258-268`, `frame.amplitude.clone()` at `phase_align.rs:264`); `align` additionally `frames.to_vec()`s on the single-channel path (`phase_align.rs:128`) — an in-place `align_mut` avoids all of it | Low-Med | Low | EXACT |
+| F11 | Static-subcarrier selection fully sorts all subcarriers by variance (`phase_align.rs:180`) where `select_nth_unstable_by` suffices — trivial at 56 subcarriers, relevant at HE tiers (242–484) | Low | Low | **TIE** (equal-variance ties may select a different subcarrier set; pin a stable tie-break on index to stay EXACT) |
+| F12 | Engine clones each node's amplitude vector for the array coordinator every cycle: `cf.amplitude.clone()` (`engine/src/lib.rs:385`); also allocates a `Vec<Option<CalibrationId>>` per cycle (`lib.rs:293`) and `format!("{e:?}")` strings for every evidence ref (`lib.rs:337`) | Low | Low | EXACT |
+| F13 | `fuse_scored_calibrated` computes the modal calibration id in O(n²) (`multistatic.rs:404-410`) — harmless at n ≤ 15 nodes, noted for swarm-scale reuse (ADR-148) | Low | Low | EXACT |
+| F14 | **No `rayon` and no SIMD feature exists anywhere in the hot crates** (grep over `crates/*/Cargo.toml`: zero hits for rayon/simd/target-feature outside wasm-opt flags). The 12-link CIR pattern (`cir_bench.rs:4-5`) and the per-node ingest path are embarrassingly parallel **across independent links/nodes** | High (multi-link tiers) | Med | **EXACT if and only if** parallelism stays at link/node granularity with results collected in deterministic (index) order and no shared float accumulator; intra-link parallel reductions are CHANGES-FLOATS and are banned |
+| F15 | `Cir::top_k_taps` clones and fully sorts all G taps (`cir.rs:322-332`) — O(G log G) with a G-sized clone; a k-heap (the exact pattern already written in `sketch.rs:546-563`) is O(G log k) | Low | Low | TIE (equal-magnitude ordering; pin index tie-break) |
+| F16 | Core `CsiFrame` carries `Complex64` while the entire ruvsense DSP path computes in f32 (conversion at `cir.rs:525`) — 2× memory and bandwidth on every ingest for precision the pipeline immediately discards | Med (memory/bandwidth) | High (core type change ripples everywhere) | **CHANGES-FLOATS** at the boundary; defer until a major version |
+| F17 | Sketch path is already well-optimized: heap-based top-K with n ≤ k fast path (`sketch.rs:536-569`), 28-byte wire format (`sketch.rs:303`). Remaining win is build-level: `count_ones()` only lowers to POPCNT/NEON-vcnt when the target CPU enables it (see §5) | Low | Low | EXACT (integer ops) |
+
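F11 and F15 both hinge on pinning the tie-break so a partial selection returns the same set as the full sort. A minimal Python illustration of that rule (the production code is Rust; `top_k_taps` here is a hypothetical stand-in, not the signature in `cir.rs`): order by descending magnitude, then by ascending index.

```python
import heapq


def top_k_taps(magnitudes, k):
    """Indices of the k largest taps; equal magnitudes resolve to the lower index."""
    return heapq.nsmallest(
        k,
        range(len(magnitudes)),
        key=lambda i: (-magnitudes[i], i),
    )


def top_k_by_full_sort(magnitudes, k):
    """Reference: full sort with the same composite key."""
    order = sorted(range(len(magnitudes)), key=lambda i: (-magnitudes[i], i))
    return order[:k]


taps = [0.5, 0.9, 0.9, 0.1, 0.5]  # two ties on purpose
print(top_k_taps(taps, 3))
```

Because the key is a total order, the heap-based selection and the full sort agree on every input, so swapping one for the other does not alter downstream results.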
+---
+
+## 4. Memory-footprint analysis (Pi 5-class and WASM; ESP32 aggregation out of scope)
+
+**Static, per-process (from struct definitions):**
+
+| Component | Sizing source | Footprint |
+|---|---|---|
+| `CirEstimator` HT20 (Φ + Φᴴ, `Complex32`) | `cir.rs:546-547`, K=52 G=156 | 2 · 52 · 156 · 8 B ≈ **130 KB** |
+| `CirEstimator` HE20 | K=242 G=726 | ≈ **2.8 MB** |
+| `CirEstimator` HE40 | K=484 G=1452 | ≈ **11.2 MB** (halvable via F7) |
+| Tomography weight matrix | `tomography.rs:214-217`, sparse per-link (voxel,weight) pairs; default grid 8×8×4 = 256 voxels (`tomography.rs:70-73`) | tens of KB at default grid |
+| Sketch bank, 1,024 × 128-d | `sketch.rs` 1 bit/dim | 1,024 · 16 B ≈ **16 KB** (vs 512 KB float) |
+
+A Pi 5 (4–8 GB) absorbs all of this trivially. The real memory risks are dynamic:
+
+1. **Unbounded WorldGraph growth (the one genuine leak-class issue).** Every
+   `process_cycle` appends a `SemanticState` node plus a `DerivedFrom` edge
+   (`engine/src/lib.rs:346-352`), and change-points append `Event` nodes
+   (`lib.rs:422-428`). At 20 Hz that is **1.73 M nodes/day** with no eviction anywhere
+   in the engine. `snapshot_json` (`lib.rs:191-193`) then serializes the whole graph.
+   **Required:** a retention/compaction policy (ring buffer or time-windowed rollup of
+   SemanticStates). Determinism caveat: eviction changes snapshot *contents* (a product
+   decision), not float math — the per-cycle witness (`lib.rs:437-448`) is unaffected.
+2. **Per-cycle allocation churn** (F1, F3, F5, F8, F10, F12): at 20 Hz this is dozens of
+   short-lived heap allocations per cycle. On a Pi 5 this is allocator pressure and
+   cache pollution rather than RSS growth; on WASM (bump-ish dlmalloc, no MADV_FREE) it
+   inflates the linear memory high-water mark, which is never returned to the host.
+3. **WASM targets.** `wifi-densepose-wasm` is a browser binding crate (JS interop,
+   serde, chrono — `crates/wifi-densepose-wasm/Cargo.toml`) and pulls `wifi-densepose-mat`
+   optionally; it relies on `wasm-opt -O4` (`Cargo.toml` `[package.metadata.wasm-pack]`).
+   `wifi-densepose-wasm-edge` is the disciplined one: `no_std` + `libm`, its own profile
+   `opt-level = "s"`, lto, cgu=1 (`crates/wifi-densepose-wasm-edge/Cargo.toml`). Neither
+   enables `+simd128` (§5). If the CIR estimator is ever compiled to wasm-edge, HE40's
+   11.2 MB of sensing matrix alone is ~172 pages of linear memory (64 KiB per page);
+   restrict edge WASM to HT20 (130 KB) or ship F4/F7 first.
+
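The static footprints and their WebAssembly page counts follow from `2 · K · G · 8` bytes and the 64 KiB WebAssembly page size. A small Python sketch of that arithmetic; the function names are invented for the example.

```python
import math

WASM_PAGE_BYTES = 64 * 1024  # WebAssembly linear-memory page size
COMPLEX32_BYTES = 8          # two f32 components


def sensing_matrix_bytes(k: int, g: int, store_adjoint: bool = True) -> int:
    """Bytes for the dense K x G matrix, doubled when its adjoint is stored too."""
    copies = 2 if store_adjoint else 1
    return copies * k * g * COMPLEX32_BYTES


def wasm_pages(n_bytes: int) -> int:
    """Whole 64 KiB pages needed to hold n_bytes."""
    return math.ceil(n_bytes / WASM_PAGE_BYTES)


for tier, (k, g) in {"HT20": (52, 156), "HE20": (242, 726), "HE40": (484, 1452)}.items():
    both = sensing_matrix_bytes(k, g)
    single = sensing_matrix_bytes(k, g, store_adjoint=False)
    print(f"{tier}: {both / 1e6:.2f} MB ({wasm_pages(both)} pages), "
          f"{single / 1e6:.2f} MB with F7 ({wasm_pages(single)} pages)")
```

These counts cover the sensing matrix only; the solver's working vectors and the rest of the module add to the linear-memory high-water mark.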
+---
+
+## 5. Build-profile review & recommendations
+
+Current release profile (`v2/Cargo.toml:213-218`) is already aggressive and correct:
+`opt-level = 3`, `lto = true` (fat), `codegen-units = 1`, `panic = "abort"`,
+`strip = true`; `bench` inherits release with debug symbols (`v2/Cargo.toml:225-227`).
+Nothing here needs fixing; the remaining gains are target-specific and profile-guided:
+
+1. **Per-target CPU tuning (EXACT, do first).** No `target-cpu` is set anywhere. For
+   Pi 5 fleet builds: `RUSTFLAGS="-C target-cpu=cortex-a76"` — enables NEON scheduling
+   and `vcnt` for the sketch path (F17) without changing IEEE semantics. LLVM does not
+   reassociate float reductions or contract to FMA without explicit fast-math/contract
+   flags, so scalar float results stay bit-exact. **Verify with the existing proof
+   runners** (`cir_proof_runner`, `calibration_proof_runner`,
+   `signal/Cargo.toml`) as the acceptance gate — that is exactly what they exist for.
+2. **WASM SIMD.** Add `-C target-feature=+simd128` for `wifi-densepose-wasm` builds and
+   keep a non-SIMD artifact for older runtimes. Same determinism note as above; gate
+   with the proof runners compiled to wasm where feasible.
+3. **PGO: feasible and determinism-safe.** PGO changes inlining/layout, never FP
+   semantics. The repo already has ideal deterministic training workloads: the proof
+   runner binaries plus `engine_cycle` / `cir_bench`. Pipeline: `cargo pgo build` →
+   run proof runners + benches → `cargo pgo optimize`. Expect mid-single-digit to ~15%
+   on branchy paths (gate decisions, tracker lifecycle); the dense ISTA loop will see
+   little. Cost: CI complexity. Verdict: do it after F1–F12, not before.
+4. **Do not** enable `-ffast-math`-equivalents (`fadd_fast`, `core::intrinsics`,
+   `-C llvm-args=-fp-contract=fast`) anywhere in the witness path. This must be a
+   stated rule in CONTRIBUTING/ADR, not tribal knowledge.
+5. **BOLT / `opt-level` experiments are not worth it** ahead of F4; the pipeline is
+   FLOP-bound in one loop, not front-end bound.
+
+---
+
+## 6. Prioritized 90-day plan
+
+### Phase 0 — Measure (days 1–10)
+- Run and commit criterion baselines on a Pi 5 and an x86 dev box:
+  `engine_cycle`, `cir_bench` (all four tiers), `sketch_bench`, `signal_bench`,
+  `calibration_bench`. The 50 ms claim in `engine_cycle.rs:3` becomes a measured number.
+- Add a lightweight per-stage timing histogram (feature-gated, off in witness builds) at
+  the §2 stage boundaries; wire a CI perf-regression gate (±10%) on the committed
+  baselines.
+- Re-run the soak that produced `benchmark_baseline.json` and pin it as the accuracy
+  guardrail for everything below.
+
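The ±10% regression gate from Phase 0 could look like the following Python sketch. The function name, the finding labels, and the choice to flag large speedups for re-baselining are assumptions for illustration; the example figures are hypothetical, not measurements.

```python
def perf_gate(baseline_ms, measured_ms, budget_ms, tolerance=0.10):
    """Compare one bench result with its committed baseline and absolute budget."""
    findings = []
    if measured_ms > baseline_ms * (1.0 + tolerance):
        findings.append("regression")   # slower than baseline by more than 10%
    elif measured_ms < baseline_ms * (1.0 - tolerance):
        findings.append("rebaseline")   # faster by more than 10%: refresh the baseline
    if measured_ms > budget_ms:
        findings.append("over-budget")  # absolute contract, independent of baseline
    return findings


print(perf_gate(baseline_ms=12.0, measured_ms=12.5, budget_ms=50.0))  # []
print(perf_gate(baseline_ms=12.0, measured_ms=13.5, budget_ms=50.0))  # ['regression']
print(perf_gate(baseline_ms=40.0, measured_ms=52.0, budget_ms=50.0))  # ['regression', 'over-budget']
```

Keeping the relative and absolute checks separate means a slow drift that stays inside 10% per change still trips the gate once it crosses the budget.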
+### Phase 1 — Exact, zero-risk wins (days 10–35)
+All EXACT findings; no witness impact; each lands with proof-runner verification:
+- F5 (precompute warm-start diag/CSR in `CirEstimator::new`) — biggest exact CIR win.
+- F8 (hoist tomography gradient buffer), F6, F10, F12, F1, F2 (allocation/duplication
+  removal), F15 + F11 with pinned index tie-breaks.
+- WorldGraph retention policy (the §4.1 unbounded-growth fix) — design ADR + ring-buffer
+  implementation.
+- Expected outcome: measurable cycle-time reduction and flat memory under 24 h soak;
+  **identical witness hashes**.
+
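One deterministic form of the retention policy is "evict the oldest beliefs, ordered by `(valid_from_unix_ms, id)`, and never touch structural nodes", which is the rule this branch's commit message describes for `prune_semantic_states`. The Python below is a sketch of that rule only; the `Node` shape and the list-based graph are simplifications invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: int
    kind: str                # e.g. "Room", "Sensor", "SemanticState"
    valid_from_unix_ms: int


def prune_semantic_states(nodes, max_states):
    """Keep at most max_states beliefs; evict oldest first, ties broken by id."""
    beliefs = sorted(
        (n for n in nodes if n.kind == "SemanticState"),
        key=lambda n: (n.valid_from_unix_ms, n.id),
    )
    excess = len(beliefs) - max_states
    evicted = {n.id for n in beliefs[:excess]} if excess > 0 else set()
    # Structural nodes are never candidates, so they always survive.
    return [n for n in nodes if n.id not in evicted]


graph = [
    Node(1, "Room", 0),
    Node(2, "SemanticState", 100),
    Node(3, "SemanticState", 100),  # same timestamp as node 2
    Node(4, "SemanticState", 50),
    Node(5, "SemanticState", 200),
]
print([n.id for n in prune_semantic_states(graph, max_states=2)])
```

Sorting on the composite key makes the equal-timestamp case reproducible, so two replays of the same session evict exactly the same nodes.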
+### Phase 2 — Determinism-managed structural wins (days 35–70)
+Each behind a feature flag, A/B'd against the legacy path (the `use_cir_gate` A/B switch
+at `multistatic.rs:103` is the template), with proof-hash regeneration as an explicit,
+witnessed release event:
+- **F4: FFT-based Φ/Φᴴ application in ISTA** — the headline item; the only route to
+  HE20/HE40 real-time and the 12-link pattern. Acceptance: cir_bench speedup ≥ 5× at
+  HE20, soak metrics within guardrail, new `expected_features.sha256` published in a
+  fresh witness bundle.
+- F9 (power-method Lipschitz in tomography) riding the same hash-regen train.
+- F3 (slice-based CIR entry point), choosing the exact-f64-intermediate variant if the
+  hash train slips.
+- F14: feature-gated `rayon` across **links/nodes only**, deterministic index-ordered
+  collection; CI must run the determinism test (`engine/src/lib.rs:535-548`
+  `cycle_is_deterministic`) with the feature on.
+
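The F14 constraint (parallel across links, sequential within a link, results collected in index order) can be illustrated in Python. This is an analogy for the intended `rayon` usage, not repository code; `estimate_link` is a hypothetical stand-in for the per-link CIR estimate.

```python
from concurrent.futures import ThreadPoolExecutor


def estimate_link(samples):
    """Per-link work with a strictly sequential accumulator (fixed float order)."""
    total = 0.0
    for x in samples:
        total += x * x
    return total


def estimate_all_links(links, workers=4):
    """Run links concurrently; Executor.map yields results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(estimate_link, links))


# Hypothetical data: 12 links of different lengths.
links = [[0.1 * i for i in range(j + 1)] for j in range(12)]
parallel = estimate_all_links(links)
sequential = [estimate_link(link) for link in links]
print(parallel == sequential)
```

Each link keeps its own accumulator and the outputs are gathered by index, so the result does not depend on thread scheduling. Summing across links in completion order would break that guarantee.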
+### Phase 3 — Platform & toolchain (days 70–90)
+- Pi 5 `target-cpu=cortex-a76` fleet builds + proof-runner verification (§5.1).
+- `+simd128` WASM artifact + size budget check for wasm-edge (§5.2, §4.3).
+- PGO pilot in CI using proof runners as the training corpus (§5.3).
+- Re-baseline: new criterion numbers, a refreshed witness bundle, and this document's
+  §1 updated with real measured latencies.
+
+**Out of 90-day scope, flagged for the architecture backlog:** F16 (Complex64→Complex32
+in core), F7 (single-matrix Φ kernel — bundle with F4), and HE40-on-edge (blocked on
+F4+F7).
+
+---
+
+## 7. Summary
+
+The pipeline's only structural latency hazard is the dense ISTA CIR solver
+(`cir.rs:601-654` + `cir.rs:717-745`): fine at HT20, ~1.1 GFLOP worst-case per estimate
+at HE40, and slated to run per-link (×12). Everything else is allocation churn and
+duplicated work that can be removed with **bit-exact** refactors (the EXACT-tagged
+findings in §3: F1, F2, F5, F6, F8, F10, F12), plus one genuine leak-class memory
+issue: unbounded WorldGraph growth at 20 Hz (`engine/src/lib.rs:346-352`). The build
+profile needs no changes; remaining toolchain
+gains (target-cpu, wasm simd128, PGO) are determinism-safe and cheap. The determinism
+constraint is workable because the repo already owns the right tools — deterministic
+proof runners, an A/B gate pattern, and a per-cycle witness — so float-changing
+optimizations become scheduled, witnessed hash-regeneration events rather than risks.
--- a/docs/research/ruview-beyond-sota/README.md
+++ b/docs/research/ruview-beyond-sota/README.md
@@ -0,0 +1,96 @@
+# RuView Beyond-SOTA Research Series
+
+Research swarm output (2026-06-09) defining what a beyond-state-of-the-art
+RuView implementation is, what the current system actually delivers, and the
+validation/benchmark/optimization evidence gathered in the same session.
+
+Produced by a 5-agent hierarchical research swarm (system reviewer, SOTA
+surveyor, architect, benchmark methodologist, performance analyst) plus a
+validation pass run against the working tree.
+
+## Documents
+
+| Doc | Scope | One-line takeaway |
+|-----|-------|-------------------|
+| [00-system-review.md](00-system-review.md) | Capability audit of the current engine | Signal layer is the deepest asset (`ruvsense/` ≈14.4k lines, 310 in-module tests); the model tier is the emptiest (no trained checkpoint in-tree); the live 20 Hz path is the main integration gap |
+| [01-sota-landscape-2026.md](01-sota-landscape-2026.md) | Published SOTA per capability axis (web-verified) | Defines the beyond-SOTA bar: 12-row capability → published SOTA → RuView-today → target table; IEEE 802.11bf-2025 is ratified and moves the moat up-stack |
+| [02-beyond-sota-architecture.md](02-beyond-sota-architecture.md) | Target architecture | 8 pillars (RF foundation encoder + UQ heads, differentiable RF forward model, RF-SLAM×WorldGraph loop, camera→RF distillation, swarm apertures, continual adaptation, deterministic WASM edge, NV fusion) — all landing inside existing crates, no rewrite (per ADR-136 §2.1) |
+| [03-benchmark-validation-methodology.md](03-benchmark-validation-methodology.md) | Test/validation/benchmark methodology | 6-layer validation pyramid; 15 criterion bench targets inventoried; `benchmark_baseline.json` is a live-capture anchor, not a criterion baseline; statistical protocol from ADR-149 (≥10 seeds, IQM, bootstrap CIs) |
+| [04-optimization-roadmap.md](04-optimization-roadmap.md) | Performance review + 90-day plan | ISTA CIR solver is the dominant latency hazard (~1.1 GFLOP/frame at HE40); exact zero-risk wins identified; WorldGraph grows unboundedly (no eviction) — a real bug-class |
+
+## Validation results (this session, 2026-06-09)
+
+All measured on this branch (`claude/ruview-beyond-sota-xgv8aq`), Linux
+container, `cargo test --workspace --exclude wifi-densepose-desktop
+--no-default-features` (the desktop crate needs GTK system libraries absent in
+the container; this is an environment limitation, not a code failure).
+
+| Layer | Command | Result |
+|-------|---------|--------|
+| L0 unit/integration | `cargo test --workspace --exclude wifi-densepose-desktop --no-default-features` | **154 suites, 2,797 passed, 0 failed** (pre-optimization baseline; re-run post-optimization also green) |
+| L1 deterministic proof | `python archive/v1/data/proof/verify.py` | **VERDICT: PASS** — hash `f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a` (bit-exact) |
+| L2 criterion (CIR) | `cargo bench -p wifi-densepose-signal --bench cir_bench --no-default-features --features cir` | Baselines captured pre/post optimization (below) |
+
+~~Known pre-existing issue (not introduced here): `cargo check -p
+wifi-densepose-mat --no-default-features` fails standalone with 101 serde
+feature-unification errors; it builds and passes inside `--workspace` runs.~~
+**Fixed on this branch:** `pub mod api` (the only serde user) is now gated
+behind the `api` feature that owns the optional serde dependency; all feature
+combos compile.
+
+## Optimizations applied (this session)
+
+Two **exact** (bit-identical float results — summation order unchanged,
+witness chain unaffected) optimizations from the 04 roadmap's "zero-risk"
+tier were implemented and verified:
+
+1. **`cir.rs` warm-start precompute** — the diagonal Tikhonov preconditioner
+   `diag(Φ^H Φ) + λI` and its CSR matrix depend only on Φ and λ (fixed at
+   `CirEstimator::new`) but were rebuilt on every frame (O(K·G) pass + CSR
+   allocation). Moved to construction
+   (`crates/wifi-densepose-signal/src/ruvsense/cir.rs`,
+   `build_warm_start_system`).
+2. **`tomography.rs` solver hoisting** — the ISTA gradient `Vec` was
+   allocated inside the 100-iteration loop and the Frobenius Lipschitz bound
+   recomputed per `reconstruct` call; both hoisted
+   (`crates/wifi-densepose-signal/src/ruvsense/tomography.rs`).
+
+### Measured impact (criterion, paired pre/post baselines, same container)
+
+| Bench | Pre-opt | Post-opt | Change | Significant? |
+|-------|---------|----------|--------|--------------|
+| `cir_estimate/he40` | 12.34 ms | 11.86 ms | **−3.9 %** | yes (p < 0.01) |
+| `cir_multiband_3band` (30 ms group) | 30.16 ms | 29.72 ms | −1.4 % | yes (p < 0.01) |
+| `cir_multiband` (142 ms group) | 141.9 ms | 140.1 ms | −1.2 % | yes (p < 0.01) |
+| `cir_estimate/ht40` | 11.73 ms | 11.78 ms | +0.4 % | no (p = 0.28) |
+| `cir_estimate/he20` | 2.49 ms | 2.49 ms | −0.1 % | no (p = 0.85) |
+| `cir_estimate/ht20` | 2.48 ms | 2.58 ms | +3.8 % | noise — see note |
+
+Note on ht20: `cir_estimator_new/ht20` (construction, which now does strictly
+*more* work) also shows "+3 %", establishing a ≈3–4 % container noise floor;
+the ht20 estimate delta is within it. The honest summary: the warm-start
+precompute removes 1 of ~101 O(K·G) passes per frame, so the expected gain is
+≈1–4 % — consistent with what was measured. The dominant per-frame cost is
+the 100-iteration ISTA loop itself, which is exactly what the roadmap's
+flag-gated FFT-operator proposal (8–40× on the mat-vecs, requires witnessed
+hash regeneration) targets next.
+
+Correctness post-optimization: `wifi-densepose-signal` 456 tests green;
+`wifi-densepose-engine` 11/11 green including `cycle_is_deterministic` and
+`calibration_mismatch_demotes_and_witness_stable` (witness-chain stability).
+
+## Headline conclusions
+
+1. **"Beyond SOTA" is currently unfalsifiable** without a real-CSI
+   ground-truth benchmark — standing one up (per doc 03's acceptance table
+   and ADR-149's statistical protocol) is the highest-leverage next step.
+2. **The path is evolution, not rewrite**: all eight architecture pillars in
+   doc 02 land inside existing crates on the ADR-136 `Stage<I,O>`/`FrameMeta`
+   contract spine.
+3. **The biggest engineering gaps** are the live 20 Hz ingest path, a trained
+   RF encoder checkpoint, and WorldGraph retention/eviction — ahead of any
+   frontier capability work.
+4. **Determinism is the differentiator**: every optimization and new pillar
+   must preserve the witness chain; the advisory-vs-witnessed split (doc 02
+   §determinism) is the mechanism that lets frontier components in without
+   breaking it.
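The warm-start precompute listed under "Optimizations applied" can be illustrated with a minimal sketch. It assumes a dense, real-valued Φ for brevity, whereas the description above refers to a CSR matrix and `Φ^H`, so the real data is presumably complex. `Estimator`, `diag_gram`, and `warm_start` are hypothetical names, not the contents of `cir.rs`. The point it shows is that the diagonal depends only on Φ and λ, so building it once with an unchanged summation order yields the same floats as rebuilding it every frame.

```rust
/// Hypothetical estimator whose dictionary `phi` (K rows x G columns,
/// row-major) and ridge weight `lambda` are fixed at construction.
pub struct Estimator {
    phi: Vec<f64>,
    k: usize,
    g: usize,
    /// Cached diag(Phi^T Phi) + lambda, one entry per column.
    precond: Vec<f64>,
}

/// One O(K*G) pass: column-wise squared norms of `phi`, plus `lambda`.
/// Rows are visited in ascending order, so the summation order is fixed.
pub fn diag_gram(phi: &[f64], k: usize, g: usize, lambda: f64) -> Vec<f64> {
    let mut d = vec![0.0; g];
    for row in 0..k {
        for col in 0..g {
            let v = phi[row * g + col];
            d[col] += v * v;
        }
    }
    for x in &mut d {
        *x += lambda;
    }
    d
}

impl Estimator {
    /// Build the estimator and pay the O(K*G) diagonal pass exactly once.
    pub fn new(phi: Vec<f64>, k: usize, g: usize, lambda: f64) -> Self {
        assert_eq!(phi.len(), k * g, "phi must be K x G");
        let precond = diag_gram(&phi, k, g, lambda);
        Self { phi, k, g, precond }
    }

    /// Per-frame warm start x0 = (Phi^T y) / precond, reusing the cache.
    pub fn warm_start(&self, y: &[f64]) -> Vec<f64> {
        assert_eq!(y.len(), self.k, "y must have K entries");
        let mut x = vec![0.0; self.g];
        for row in 0..self.k {
            for col in 0..self.g {
                x[col] += self.phi[row * self.g + col] * y[row];
            }
        }
        for (xi, p) in x.iter_mut().zip(&self.precond) {
            *xi /= *p;
        }
        x
    }
}
```

Because `new` calls the same `diag_gram` routine a per-frame rebuild would call, the cached values are the ones the old code would have computed on every frame; only the number of passes changes.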
--- a/v2/Cargo.lock
+++ b/v2/Cargo.lock
@@ -10910,6 +10910,7 @@ version = "0.3.0"
 dependencies = [
 "blake3",
 "criterion",
+ "ruvector-mincut",
 "wifi-densepose-bfld",
 "wifi-densepose-core",
 "wifi-densepose-geo",
@@ -11079,9 +11080,13 @@ dependencies = [
 "tracing",
 "tracing-subscriber",
 "ureq 2.12.1",
+ "wifi-densepose-bfld",
+ "wifi-densepose-engine",
+ "wifi-densepose-geo",
 "wifi-densepose-hardware",
 "wifi-densepose-signal",
 "wifi-densepose-wifiscan",
+ "wifi-densepose-worldgraph",
 ]

 [[package]]
--- a/v2/crates/wifi-densepose-core/src/types.rs
+++ b/v2/crates/wifi-densepose-core/src/types.rs
@@ -563,6 +563,12 @@ impl crate::traits::CanonicalFrame for CsiFrame {
    /// (each fixed-width LE; `device_id` length-prefixed; `calibration_id` as
    /// 16 UUID bytes or 16 zero bytes for `None`) ‖ `(nrows, ncols)` as u32 LE
    /// ‖ complex payload as `ComplexSample::to_le_bytes()` in stream-major order.
+    ///
+    /// # Panics
+    /// If `calibration_id` is `Some(Uuid::nil())`: the nil UUID is the wire
+    /// sentinel for `None`, so encoding it would alias two distinct frames to
+    /// the same bytes (and the same witness hash) — a non-injective encoding
+    /// is refused rather than silently produced.
    fn to_canonical_bytes(&self) -> Vec<u8> {
        let m = &self.metadata;
        // 16 (id) + ~48 (meta) + 8 (shape) + 16 * n_samples
@@ -600,7 +606,17 @@ impl crate::traits::CanonicalFrame for CsiFrame {
        b.extend_from_slice(&m.noise_floor_dbm.to_le_bytes());
        b.extend_from_slice(&m.sequence_number.to_le_bytes());
        match m.calibration_id {
-            Some(id) => b.extend_from_slice(id.as_bytes()),
+            Some(id) => {
+                // Some(nil) would alias the None sentinel on the wire: the
+                // bytes would decode to a *different* frame (calibration_id
+                // None) with the same witness. Refuse the non-injective
+                // encoding (see the trait-impl `# Panics` doc).
+                assert!(
+                    id != Uuid::nil(),
+                    "calibration_id Some(Uuid::nil()) is unencodable: nil is the None sentinel"
+                );
+                b.extend_from_slice(id.as_bytes());
+            }
            None => b.extend_from_slice(&[0u8; 16]),
        }
        b.extend_from_slice(&m.model_id.to_le_bytes());
@@ -616,6 +632,205 @@ impl crate::traits::CanonicalFrame for CsiFrame {
    }
 }

+/// Errors decoding a frame from its canonical bytes.
+#[derive(Debug, thiserror::Error, PartialEq, Eq)]
+pub enum CanonicalDecodeError {
+    /// The buffer ended before the layout was fully read.
+    #[error("canonical buffer truncated at byte {at} (need {need} more)")]
+    Truncated {
+        /// Byte offset where reading failed.
+        at: usize,
+        /// How many more bytes were needed.
+        need: usize,
+    },
+    /// A discriminant byte held an unknown value.
+    #[error("invalid {field} discriminant {value}")]
+    BadDiscriminant {
+        /// Which field failed.
+        field: &'static str,
+        /// The offending byte.
+        value: u8,
+    },
+    /// The device-id bytes were not UTF-8.
+    #[error("device id is not valid UTF-8")]
+    BadDeviceId,
+    /// Shape (nrows × ncols) disagrees with the remaining payload length.
+    #[error("payload length mismatch: shape {rows}x{cols} needs {expect} bytes, found {found}")]
+    PayloadMismatch {
+        /// Declared rows.
+        rows: usize,
+        /// Declared cols.
+        cols: usize,
+        /// Bytes the shape implies.
+        expect: usize,
+        /// Bytes actually present.
+        found: usize,
+    },
+    /// Trailing bytes after the declared payload.
+    #[error("{0} trailing bytes after payload")]
+    TrailingBytes(usize),
+    /// A reserved region that must be all-zero held nonzero bytes. Accepting
+    /// them would let two distinct byte strings decode to the same frame
+    /// (re-encoding could not reproduce the original — forged bytes would be
+    /// indistinguishable after a replay round-trip).
+    #[error("reserved bytes for {field} must be zero")]
+    ReservedNotZero {
+        /// Which field's reserved region was nonzero.
+        field: &'static str,
+    },
+}
+
+/// Byte cursor for the canonical layout.
+struct Cursor<'a> {
+    b: &'a [u8],
+    at: usize,
+}
+
+impl<'a> Cursor<'a> {
+    fn take(&mut self, n: usize) -> Result<&'a [u8], CanonicalDecodeError> {
+        if self.b.len() - self.at < n {
+            return Err(CanonicalDecodeError::Truncated {
+                at: self.at,
+                need: n - (self.b.len() - self.at),
+            });
+        }
+        let s = &self.b[self.at..self.at + n];
+        self.at += n;
+        Ok(s)
+    }
+    fn u8(&mut self) -> Result<u8, CanonicalDecodeError> {
+        Ok(self.take(1)?[0])
+    }
+    fn u16(&mut self) -> Result<u16, CanonicalDecodeError> {
+        Ok(u16::from_le_bytes(self.take(2)?.try_into().unwrap()))
+    }
+    fn u32(&mut self) -> Result<u32, CanonicalDecodeError> {
+        Ok(u32::from_le_bytes(self.take(4)?.try_into().unwrap()))
+    }
+    fn i64(&mut self) -> Result<i64, CanonicalDecodeError> {
+        Ok(i64::from_le_bytes(self.take(8)?.try_into().unwrap()))
+    }
+    fn f32(&mut self) -> Result<f32, CanonicalDecodeError> {
+        Ok(f32::from_le_bytes(self.take(4)?.try_into().unwrap()))
+    }
+    fn i8(&mut self) -> Result<i8, CanonicalDecodeError> {
+        Ok(self.take(1)?[0] as i8)
+    }
+    fn uuid(&mut self) -> Result<Uuid, CanonicalDecodeError> {
+        Ok(Uuid::from_bytes(self.take(16)?.try_into().unwrap()))
+    }
+}
+
+impl CsiFrame {
+    /// Reconstruct a frame from its [`to_canonical_bytes`] encoding — the
+    /// replay half of the ADR-136 contract. Round-trip law (tested):
+    /// `from_canonical_bytes(f.to_canonical_bytes())` yields a frame with the
+    /// **same id, metadata, payload, and witness hash** as `f`.
+    ///
+    /// Amplitude/phase are recomputed from the complex payload (they are
+    /// projections, not independent state).
+    ///
+    /// [`to_canonical_bytes`]: crate::traits::CanonicalFrame::to_canonical_bytes
+    ///
+    /// # Errors
+    /// [`CanonicalDecodeError`] on truncation, bad discriminants, non-UTF-8
+    /// device id, nonzero reserved bytes, shape/payload disagreement, or
+    /// trailing bytes — every malformed input fails closed. Strictness
+    /// guarantees injectivity on the accepted domain: any accepted byte
+    /// string re-encodes to exactly itself.
+    pub fn from_canonical_bytes(bytes: &[u8]) -> Result<Self, CanonicalDecodeError> {
+        let mut c = Cursor { b: bytes, at: 0 };
+
+        let id = FrameId::from_uuid(c.uuid()?);
+
+        let seconds = c.i64()?;
+        let nanos = c.u32()?;
+        let dev_len = c.u32()? as usize;
+        let device_id = core::str::from_utf8(c.take(dev_len)?)
+            .map_err(|_| CanonicalDecodeError::BadDeviceId)?
+            .to_string();
+        let frequency_band = match c.u8()? {
+            0 => FrequencyBand::Band2_4GHz,
+            1 => FrequencyBand::Band5GHz,
+            2 => FrequencyBand::Band6GHz,
+            v => {
+                return Err(CanonicalDecodeError::BadDiscriminant {
+                    field: "frequency_band",
+                    value: v,
+                })
+            }
+        };
+        let channel = c.u8()?;
+        let bandwidth_mhz = c.u16()?;
+        let tx_antennas = c.u8()?;
+        let rx_antennas = c.u8()?;
+        let spacing_mm = match c.u8()? {
+            1 => Some(c.f32()?),
+            0 => {
+                // Reserved padding must be zero (decoder strictness =
+                // injectivity on the accepted domain): otherwise forged
+                // nonzero padding would decode to the same frame as the
+                // canonical encoding and re-encode differently.
+                if c.take(4)? != [0u8; 4] {
+                    return Err(CanonicalDecodeError::ReservedNotZero { field: "spacing_mm" });
+                }
+                None
+            }
+            v => {
+                return Err(CanonicalDecodeError::BadDiscriminant {
+                    field: "spacing_mm",
+                    value: v,
+                })
+            }
+        };
+        let rssi_dbm = c.i8()?;
+        let noise_floor_dbm = c.i8()?;
+        let sequence_number = c.u32()?;
+        let cal = c.uuid()?;
+        let calibration_id = if cal == Uuid::nil() { None } else { Some(cal) };
+        let model_id = c.u16()?;
+        let model_version = c.u16()?;
+
+        let rows = c.u32()? as usize;
+        let cols = c.u32()? as usize;
+        let expect = rows.saturating_mul(cols).saturating_mul(16);
+        let found = bytes.len() - c.at;
+        if found < expect {
+            return Err(CanonicalDecodeError::PayloadMismatch { rows, cols, expect, found });
+        }
+        let mut samples = Vec::with_capacity(rows * cols);
+        for _ in 0..rows * cols {
+            let raw: [u8; 16] = c.take(16)?.try_into().unwrap();
+            samples.push(ComplexSample::from_le_bytes(raw).0);
+        }
+        if c.at != bytes.len() {
+            return Err(CanonicalDecodeError::TrailingBytes(bytes.len() - c.at));
+        }
+        let data = Array2::from_shape_vec((rows, cols), samples).map_err(|_| {
+            CanonicalDecodeError::PayloadMismatch { rows, cols, expect, found }
+        })?;
+
+        let metadata = CsiMetadata {
+            timestamp: Timestamp { seconds, nanos },
+            device_id: DeviceId::new(device_id),
+            frequency_band,
+            channel,
+            bandwidth_mhz,
+            antenna_config: AntennaConfig { tx_antennas, rx_antennas, spacing_mm },
+            rssi_dbm,
+            noise_floor_dbm,
+            sequence_number,
+            calibration_id,
+            model_id,
+            model_version,
+        };
+
+        let amplitude = data.mapv(num_complex::Complex::norm);
+        let phase = data.mapv(num_complex::Complex::arg);
+        Ok(Self { id, metadata, data, amplitude, phase })
+    }
+}
+
 // =============================================================================
 // Signal Types
 // =============================================================================
@@ -1307,6 +1522,133 @@ mod tests {
        assert_ne!(frame.witness_hash(), frame2.witness_hash());
    }

+    /// AC7 — replay: `from_canonical_bytes` is the exact inverse of
+    /// `to_canonical_bytes` — same id, metadata, payload, and witness hash.
+    /// This is the capture-to-claim law: a stored canonical capture replays to
+    /// a frame the pipeline cannot distinguish from the original.
+    #[test]
+    fn ac7_canonical_round_trip_replays_identically() {
+        use ndarray::Array2;
+        let mut meta = CsiMetadata::new(DeviceId::new("node-α"), FrequencyBand::Band6GHz, 37);
+        meta.set_calibration(uuid::Uuid::new_v4());
+        meta.set_model(9, 0x0203);
+        meta.antenna_config.spacing_mm = Some(62.5);
+        meta.rssi_dbm = -41;
+        meta.sequence_number = 123_456;
+        let data = Array2::from_shape_fn((2, 56), |(r, c)| {
+            Complex64::new((r as f64 + 1.0) * (c as f64).cos(), (c as f64 * 0.1).tan())
+        });
+        let frame = CsiFrame::new(meta, data);
+
+        let bytes = frame.to_canonical_bytes();
+        let replayed = CsiFrame::from_canonical_bytes(&bytes).expect("decodes");
+
+        assert_eq!(replayed.id, frame.id);
+        // Field-wise metadata equality (CsiMetadata has no PartialEq; the
+        // byte-identical re-encoding below covers every field regardless).
+        assert_eq!(replayed.metadata.device_id, frame.metadata.device_id);
+        assert_eq!(replayed.metadata.calibration_id, frame.metadata.calibration_id);
+        assert_eq!(replayed.metadata.model_version, frame.metadata.model_version);
+        assert_eq!(replayed.metadata.antenna_config.spacing_mm, Some(62.5));
+        assert_eq!(replayed.data, frame.data);
+        // Witness equality — the strongest statement of equivalence.
+        assert_eq!(replayed.witness_hash(), frame.witness_hash());
+        // Re-encoding is byte-identical.
+        assert_eq!(replayed.to_canonical_bytes(), bytes);
+        // Projections recomputed consistently.
+        assert_eq!(replayed.amplitude, frame.amplitude);
+    }
+
+    /// AC8 — the decoder fails closed on every malformed-input class.
+    #[test]
+    fn ac8_canonical_decode_fails_closed() {
+        use ndarray::Array2;
+        let meta = CsiMetadata::new(DeviceId::new("n"), FrequencyBand::Band2_4GHz, 1);
+        let data = Array2::from_shape_fn((1, 4), |(_, c)| Complex64::new(c as f64, 0.0));
+        let frame = CsiFrame::new(meta, data);
+        let bytes = frame.to_canonical_bytes();
+
+        // Truncation anywhere fails: in the payload it is caught by the
+        // shape-vs-length check (PayloadMismatch); in the header by Truncated.
+        assert!(matches!(
+            CsiFrame::from_canonical_bytes(&bytes[..bytes.len() - 1]),
+            Err(CanonicalDecodeError::PayloadMismatch { .. })
+        ));
+        assert!(matches!(
+            CsiFrame::from_canonical_bytes(&bytes[..10]),
+            Err(CanonicalDecodeError::Truncated { .. })
+        ));
+
+        // Trailing junk fails.
+        let mut padded = bytes.clone();
+        padded.extend_from_slice(&[0u8; 3]);
+        assert!(matches!(
+            CsiFrame::from_canonical_bytes(&padded),
+            Err(CanonicalDecodeError::TrailingBytes(3))
+        ));
+
+        // Bad frequency-band discriminant fails. Band byte sits right after
+        // id(16) + seconds(8) + nanos(4) + dev_len(4) + dev("n" = 1).
+        let mut bad = bytes.clone();
+        bad[16 + 8 + 4 + 4 + 1] = 9;
+        assert!(matches!(
+            CsiFrame::from_canonical_bytes(&bad),
+            Err(CanonicalDecodeError::BadDiscriminant { field: "frequency_band", value: 9 })
+        ));
+
+        // A nil calibration uuid decodes as None (the documented encoding).
+        let replayed = CsiFrame::from_canonical_bytes(&bytes).unwrap();
+        assert_eq!(replayed.metadata.calibration_id, None);
+    }
+
+    /// AC8b (review finding 7) — decoder strictness = injectivity on the
+    /// accepted domain: forged nonzero bytes in the `spacing_mm` reserved
+    /// region are rejected, so for accepted inputs `re-encode != original`
+    /// is impossible.
+    #[test]
+    fn ac8b_forged_reserved_spacing_bytes_rejected() {
+        use ndarray::Array2;
+        let meta = CsiMetadata::new(DeviceId::new("n"), FrequencyBand::Band2_4GHz, 1);
+        let data = Array2::from_shape_fn((1, 4), |(_, c)| Complex64::new(c as f64, 0.0));
+        let frame = CsiFrame::new(meta, data);
+        let bytes = frame.to_canonical_bytes();
+
+        // Spacing tag sits after id(16)+secs(8)+nanos(4)+dev_len(4)+dev("n"=1)
+        // + band(1)+channel(1)+bw(2)+tx(1)+rx(1); the 4 reserved bytes follow.
+        let tag_off = 16 + 8 + 4 + 4 + 1 + 1 + 1 + 2 + 1 + 1;
+        assert_eq!(bytes[tag_off], 0, "fixture must encode spacing_mm = None");
+        assert_eq!(&bytes[tag_off + 1..tag_off + 5], &[0u8; 4]);
+
+        // Sanity: the canonical bytes decode and re-encode byte-identically.
+        let ok = CsiFrame::from_canonical_bytes(&bytes).unwrap();
+        assert_eq!(ok.to_canonical_bytes(), bytes);
+
+        // Forge each reserved byte: the decoder must fail closed (before the
+        // fix it decoded to the same frame, whose re-encoding differed from
+        // the forged original — a witness-replay ambiguity).
+        for i in 1..=4 {
+            let mut forged = bytes.clone();
+            forged[tag_off + i] = 0xAB;
+            assert!(matches!(
+                CsiFrame::from_canonical_bytes(&forged),
+                Err(CanonicalDecodeError::ReservedNotZero { field: "spacing_mm" })
+            ));
+        }
+    }
+
+    /// AC8c (review finding 7) — `Some(Uuid::nil())` calibration is an
+    /// encoding error: nil is the wire sentinel for `None`, so encoding it
+    /// would alias two distinct frames to one byte string (and one witness).
+    #[test]
+    #[should_panic(expected = "nil is the None sentinel")]
+    fn ac8c_nil_calibration_id_is_an_encoding_error() {
+        use ndarray::Array2;
+        let mut meta = CsiMetadata::new(DeviceId::new("n"), FrequencyBand::Band2_4GHz, 1);
+        meta.calibration_id = Some(uuid::Uuid::nil());
+        let data = Array2::from_shape_fn((1, 2), |(_, c)| Complex64::new(c as f64, 0.0));
+        let _ = CsiFrame::new(meta, data).to_canonical_bytes();
+    }
+
    /// AC3 — `serde(default)` forward-read of pre-ADR-136 metadata JSON.
    #[cfg(feature = "serde")]
    #[test]
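The injectivity argument behind `ReservedNotZero` and the nil-UUID refusal can be sketched in isolation. The sketch assumes the 5-byte `spacing_mm` layout the decoder above reads (one tag byte, then four bytes holding either the `f32` or zero padding) and a 16-byte calibration id. `encode_spacing`, `decode_spacing`, `encode_calibration`, and `SpacingError` are hypothetical names for illustration, and `encode_calibration` returns an error where the encoder in this diff panics.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum SpacingError {
    /// Tag byte was neither 0 (None) nor 1 (Some).
    BadTag(u8),
    /// Tag was 0 but the four reserved bytes were not all zero.
    ReservedNotZero,
}

/// Encode an optional f32 as a tag byte plus four payload bytes.
/// `None` is tag 0 followed by four zero bytes.
pub fn encode_spacing(v: Option<f32>) -> [u8; 5] {
    let mut out = [0u8; 5];
    if let Some(x) = v {
        out[0] = 1;
        out[1..].copy_from_slice(&x.to_le_bytes());
    }
    out
}

/// Strict decode: every accepted input re-encodes to exactly itself.
/// A lenient decoder would map forged padding to `None`, and `None`
/// re-encodes to all zeros, which differs from the forged input.
pub fn decode_spacing(b: [u8; 5]) -> Result<Option<f32>, SpacingError> {
    let payload = [b[1], b[2], b[3], b[4]];
    match b[0] {
        1 => Ok(Some(f32::from_le_bytes(payload))),
        0 if payload == [0u8; 4] => Ok(None),
        0 => Err(SpacingError::ReservedNotZero),
        t => Err(SpacingError::BadTag(t)),
    }
}

/// Encoder-side counterpart: all-zero bytes are the wire sentinel for
/// `None`, so `Some(all zeros)` has no encoding of its own and is refused.
pub fn encode_calibration(id: Option<[u8; 16]>) -> Result<[u8; 16], &'static str> {
    match id {
        Some(b) if b == [0u8; 16] => Err("nil is the None sentinel"),
        Some(b) => Ok(b),
        None => Ok([0u8; 16]),
    }
}
```

Both halves enforce the same property from opposite directions: the decoder rejects byte strings the encoder would never emit, and the encoder rejects values the decoder could never return.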
--- a/v2/crates/wifi-densepose-engine/Cargo.toml
+++ b/v2/crates/wifi-densepose-engine/Cargo.toml
@@ -19,6 +19,9 @@ wifi-densepose-worldgraph = { version = "0.3.0", path = "../wifi-densepose-world
 wifi-densepose-geo = { version = "0.1.0", path = "../wifi-densepose-geo" }
 # Deterministic witness over the trust decision (ADR-137 §2.7 / ADR-028).
 blake3 = { version = "1.5", default-features = false }
+# Dynamic min-cut over the live mesh coupling graph (mesh_guard.rs):
+# incremental partition-risk monitoring + structural recalibration trigger.
+ruvector-mincut = { workspace = true }

 [dev-dependencies]
 criterion = { version = "0.5", features = ["html_reports"] }
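To make "partition risk" concrete: the minimum cut of the coupling graph is the smallest total edge weight that has to be removed to split the nodes into two groups. The brute-force sketch below computes it for small meshes. It is not the incremental algorithm in `ruvector-mincut`, whose API does not appear in this diff, and `min_cut` is a hypothetical helper written only to show the quantity being monitored.

```rust
/// Global minimum cut of a complete weighted graph on `n` nodes, found by
/// enumerating every bipartition. Exponential in `n`, so it only suits
/// small meshes (a 12-node mesh has 2047 candidate cuts).
/// Returns the cut weight and the members of the side that does not
/// contain node `n - 1`.
pub fn min_cut(n: usize, w: impl Fn(usize, usize) -> f64) -> Option<(f64, Vec<usize>)> {
    if n < 2 || n > 20 {
        return None;
    }
    let mut best: Option<(f64, u32)> = None;
    // Node n-1 is pinned to the far side so each bipartition is seen once.
    for mask in 1u32..(1u32 << (n - 1)) {
        let mut cut = 0.0;
        for i in 0..n {
            for j in (i + 1)..n {
                let i_in = ((mask >> i) & 1) == 1;
                let j_in = ((mask >> j) & 1) == 1;
                if i_in != j_in {
                    cut += w(i, j);
                }
            }
        }
        if best.map_or(true, |(b, _)| cut < b) {
            best = Some((cut, mask));
        }
    }
    best.map(|(cut, mask)| {
        let side = (0..n).filter(|&i| ((mask >> i) & 1) == 1).collect();
        (cut, side)
    })
}
```

With a product-of-weights coupling such as `wi * wj * n`, a node that carries a small weight has small edges to every other node, so the cheapest cut isolates that node.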
--- a/v2/crates/wifi-densepose-engine/benches/engine_cycle.rs
+++ b/v2/crates/wifi-densepose-engine/benches/engine_cycle.rs
@@ -48,5 +48,41 @@ fn bench_cycle(c: &mut Criterion) {
    });
 }

-criterion_group!(benches, bench_cycle);
+/// Mesh guard in isolation on a 12-node mesh (the full ADR-029 deployment
+/// size): cold build (node set appears), steady state (identical weights →
+/// change-gated, zero graph updates), and a single-edge weight change.
+fn bench_mesh_guard(c: &mut Criterion) {
+    use wifi_densepose_engine::MeshGuard;
+    let nodes: Vec<u8> = (0..12).collect();
+    let w = |i: usize, j: usize| 0.4 + 0.01 * ((i + j) % 7) as f64;
+
+    c.bench_function("mesh_guard_cold_build_12n", |b| {
+        b.iter_batched(
+            MeshGuard::default,
+            |mut g| g.update(&nodes, w),
+            BatchSize::SmallInput,
+        );
+    });
+
+    c.bench_function("mesh_guard_steady_state_12n", |b| {
+        let mut g = MeshGuard::default();
+        g.update(&nodes, w); // warm
+        b.iter(|| g.update(&nodes, w));
+    });
+
+    c.bench_function("mesh_guard_one_edge_change_12n", |b| {
+        let mut g = MeshGuard::default();
+        g.update(&nodes, w);
+        let mut flip = false;
+        b.iter(|| {
+            flip = !flip;
+            let delta = if flip { 0.2 } else { 0.0 };
+            g.update(&nodes, |i, j| {
+                if (i.min(j), i.max(j)) == (0, 1) { 0.4 + delta } else { w(i, j) }
+            })
+        });
+    });
+}
+
+criterion_group!(benches, bench_cycle, bench_mesh_guard);
 criterion_main!(benches);
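The change-gating that the steady-state benchmark above relies on can be sketched as follows. The sketch assumes edge weights are quantized before being compared with the previous cycle. A `quantum` setting is mentioned in the engine's `mesh_guard_mut` docs, but how `MeshGuard` applies it is not visible in this diff, so `GatedEdges` is a hypothetical type and not the crate's implementation. It also ignores node-set changes.

```rust
use std::collections::BTreeMap;

/// Caches quantized edge weights and reports how many edges changed.
pub struct GatedEdges {
    quantum: f64,
    edges: BTreeMap<(usize, usize), i64>,
}

impl GatedEdges {
    pub fn new(quantum: f64) -> Self {
        assert!(quantum > 0.0, "quantum must be positive");
        Self { quantum, edges: BTreeMap::new() }
    }

    /// Recompute every pair weight for `n` nodes and return the number of
    /// edges whose quantized weight differs from the cached value. Only
    /// those edges would need to reach an incremental min-cut structure.
    pub fn update(&mut self, n: usize, w: impl Fn(usize, usize) -> f64) -> usize {
        let mut touched = 0;
        for i in 0..n {
            for j in (i + 1)..n {
                let q = (w(i, j) / self.quantum).round() as i64;
                // `insert` returns the previous value, so a mismatch (or a
                // missing entry on the cold build) counts as a change.
                if self.edges.insert((i, j), q) != Some(q) {
                    touched += 1;
                }
            }
        }
        touched
    }
}
```

Under this scheme a cold build on 12 nodes touches all 66 pairs, an identical second cycle touches none, and changing one pair's weight touches exactly that pair, which mirrors the three benchmark cases.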
--- a/v2/crates/wifi-densepose-engine/src/lib.rs
+++ b/v2/crates/wifi-densepose-engine/src/lib.rs
@@ -46,6 +46,9 @@ use wifi_densepose_worldgraph::{
    WorldId, WorldNode, ZoneBoundsEnu,
 };

+pub mod mesh_guard;
+pub use mesh_guard::{MeshGuard, MeshPartitionReport};
+
 /// Errors from an engine cycle.
 #[derive(Debug)]
 pub enum EngineError {
@@ -97,6 +100,15 @@ pub struct TrustedOutput {
    /// BLAKE3 witness over the trust decision (provenance ‖ class ‖ calibration)
    /// — a deterministic, signed-belief fingerprint (ADR-137 §2.7 / ADR-028).
    pub witness: [u8; 32],
+    /// Whether the drift→recalibration advisor recommends re-running the
+    /// ADR-135 baseline / refitting the per-room adapter (ADR-150 §3.4):
+    /// sustained low coherence, an ADR-142 change-point this cycle, or a
+    /// mesh at risk of partitioning (see `mesh`).
+    /// Dynamic min-cut partition report over the live mesh coupling graph
+    /// (None for meshes of fewer than two nodes). `at_risk` counts as a
+    /// structural event for the recalibration advisor and names the nodes
+    /// (`weak_side`) closest to splitting off — failure/jamming triage.
+    pub mesh: Option<MeshPartitionReport>,
 }

 /// Composition root for the RuView streaming engine.
@@ -116,6 +128,74 @@ pub struct StreamingEngine {
    slam: RfSlam,
    // ADR-139 live loop: stable track_id -> PersonTrack WorldId.
    person_tracks: BTreeMap<u64, WorldId>,
+    // WorldGraph belief retention: max live SemanticState nodes. The live loop
+    // appends one belief per cycle (1.7M/day at 20 Hz); durable history is the
+    // recorder's job, so old beliefs are evicted deterministically past this cap.
+    semantic_retention: usize,
+    // Per-room calibration adapter (ADR-150 §3.4: ~11 KB LoRA on a frozen
+    // base). Identity is part of the trust chain: when set, the adapter id is
+    // appended to the provenance model_version, so swapping adapters changes
+    // the witness. None = shared base model.
+    adapter: Option<AdapterInfo>,
+    // Drift→recalibration advisor (ADR-135 trigger for ADR-150 §3.4 refit).
+    recal: RecalibrationAdvisor,
+    // Dynamic min-cut mesh partition guard (incremental, change-gated).
+    mesh: MeshGuard,
+}
+
+/// Identity of an active per-room calibration adapter (ADR-150 §3.4). The id
+/// must be content-derived (e.g. a hash prefix of the adapter file) so the
+/// provenance/witness chain pins the exact weights that shaped inference.
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct AdapterInfo {
+    /// Content-derived adapter identity (e.g. first 16 hex digits of its SHA-256).
+    pub adapter_id: String,
+    /// Number of in-room samples the adapter was fitted on (0 if unknown).
+    pub trained_samples: u32,
+}
+
+/// Recommends re-running calibration / adapter refit when the live signal
+/// degrades persistently (ADR-135 drift → ADR-150 §3.4 few-shot recalibration).
+///
+/// Two triggers, both cheap and deterministic:
+/// - low-coherence streak: `streak_threshold` consecutive cycles with base
+///   coherence below `coherence_floor` (sustained, not a single bad frame);
+/// - a structural event: an ADR-142 change-point or mesh partition risk.
+#[derive(Debug, Clone)]
+pub struct RecalibrationAdvisor {
+    /// Coherence below this counts toward the streak.
+    pub coherence_floor: f32,
+    /// Consecutive low-coherence cycles required to recommend recalibration.
+    pub streak_threshold: u32,
+    streak: u32,
+}
+
+impl Default for RecalibrationAdvisor {
+    fn default() -> Self {
+        Self {
+            coherence_floor: 0.5,
+            streak_threshold: 60, // ~3 s at 20 Hz of sustained degradation
+            streak: 0,
+        }
+    }
+}
+
+impl RecalibrationAdvisor {
+    /// Feed one cycle's evidence; returns whether recalibration is recommended.
+    fn observe(&mut self, base_coherence: f32, change_point: bool) -> bool {
+        if base_coherence < self.coherence_floor {
+            self.streak = self.streak.saturating_add(1);
+        } else {
+            self.streak = 0;
+        }
+        change_point || self.streak >= self.streak_threshold
+    }
+
+    /// Current consecutive low-coherence cycle count.
+    #[must_use]
+    pub fn streak(&self) -> u32 {
+        self.streak
+    }
 }

 impl StreamingEngine {
@@ -135,9 +215,53 @@ impl StreamingEngine {
            evolution: None,
            slam: RfSlam::with_discovery(0.5, 5, 0.6),
            person_tracks: BTreeMap::new(),
+            semantic_retention: Self::DEFAULT_SEMANTIC_RETENTION,
+            adapter: None,
+            recal: RecalibrationAdvisor::default(),
+            mesh: MeshGuard::default(),
        }
    }

+    /// Activate a per-room calibration adapter (ADR-150 §3.4). From the next
+    /// cycle on, the adapter id is part of provenance `model_version` (and
+    /// therefore of the witness), so the exact weights shaping inference are
+    /// pinned in the trust chain. Derive `adapter_id` by hashing the adapter file.
+    pub fn set_room_adapter(&mut self, info: AdapterInfo) {
+        self.adapter = Some(info);
+    }
+
+    /// Deactivate the adapter (revert to the shared base model).
+    pub fn clear_room_adapter(&mut self) {
+        self.adapter = None;
+    }
+
+    /// The active adapter, if any.
+    #[must_use]
+    pub fn room_adapter(&self) -> Option<&AdapterInfo> {
+        self.adapter.as_ref()
+    }
+
+    /// Tune the drift→recalibration advisor (floor + streak threshold).
+    pub fn set_recalibration_advisor(&mut self, advisor: RecalibrationAdvisor) {
+        self.recal = advisor;
+    }
+
+    /// Mutable access to the mesh partition guard (risk threshold, quantum,
+    /// min-node count). Operators tune the partition-risk sensitivity here.
+    pub fn mesh_guard_mut(&mut self) -> &mut MeshGuard {
+        &mut self.mesh
+    }
+
+    /// Default cap on live `SemanticState` beliefs in the WorldGraph
+    /// (~6 minutes of full-rate history at 20 Hz; older beliefs are evicted —
+    /// durable history belongs to the recorder).
+    pub const DEFAULT_SEMANTIC_RETENTION: usize = 7_200;
+
+    /// Override the `SemanticState` retention cap (minimum 1).
+    pub fn set_semantic_retention(&mut self, max_states: usize) {
+        self.semantic_retention = max_states.max(1);
+    }
+
    /// ADR-139 live loop: create or update a `PersonTrack` node by stable
    /// `track_id`, locate it in `room`, and wire an `Observes` edge from
    /// `sensor` (so the privacy rollup can suppress it under identity-strict
@@ -321,21 +445,47 @@ impl StreamingEngine {
        // 4. Evolution change-point (ADR-142) over per-node mean amplitude.
        let change_point = self.track_evolution(node_frames, now_ms, room);

-        // 5. Privacy control plane (ADR-141): demote on a fusion-level OR an
-        //    array-level contradiction (monotonic — information only removed).
+        // 5. Mesh partition guard (ADR-032): dynamic min-cut over the coupling
+        //    graph. Coupling between nodes i and j is the product of their
+        //    fusion attention weights scaled by the node count, so a node the
+        //    fuser down-weights is exactly a node weakly coupled in the graph.
+        //    (Change-gated incremental updates: steady state touches 0 edges.)
+        let node_ids: Vec<u8> = node_frames.iter().map(|f| f.node_id).collect();
+        let weights = &quality.per_node_weights;
+        let n = weights.len() as f64;
+        let mesh = self.mesh.update(&node_ids, |i, j| {
+            let wi = weights.get(i).copied().unwrap_or(0.0) as f64;
+            let wj = weights.get(j).copied().unwrap_or(0.0) as f64;
+            wi * wj * n
+        });
+        let mesh_at_risk = mesh.as_ref().is_some_and(|m| m.at_risk);
+
+        // 6. Privacy control plane (ADR-141): demote on a fusion-level OR an
+        //    array-level contradiction OR a mesh close to partitioning. The
+        //    last is a security/reliability signal (ADR-032): a fragmenting
+        //    array makes the fused belief less trustworthy, so we emit at a
+        //    more restricted class. Monotonic — information is only ever
+        //    removed — and the demotion is part of the witness.
        let base_class = self.privacy.active_class();
-        let demoted = quality.forces_privacy_demotion() || array_contradiction;
+        let demoted = quality.forces_privacy_demotion() || array_contradiction || mesh_at_risk;
        let effective_class = if demoted { demote_one(base_class) } else { base_class };

-        // 6. Semantic state with mandatory provenance (ADR-139/140). The
+        // 7. Semantic state with mandatory provenance (ADR-139/140). The
        //    calibration version comes from the *agreed* epoch (None on mismatch).
+        //    When a per-room adapter is active (ADR-150 §3.4) its content-derived
+        //    id is part of model_version — and therefore of the witness — so the
+        //    exact weights shaping inference are pinned in the trust chain.
        let calibration_version = match quality.calibration_id {
            Some(c) => format!("cal:{:016x}", c.0),
            None => "cal:none".to_string(),
        };
+        let model_version = match &self.adapter {
+            Some(a) => format!("rfenc-v{}+adapter:{}", self.model_version, a.adapter_id),
+            None => format!("rfenc-v{}", self.model_version),
+        };
        let provenance = SemanticProvenance {
            evidence: quality.evidence_refs.iter().map(|e| format!("{e:?}")).collect(),
-            model_version: format!("rfenc-v{}", self.model_version),
+            model_version,
            calibration_version,
            privacy_decision: format!("{:?}/{:?}", self.privacy.active_mode(), effective_class),
        };
@ -350,10 +500,23 @@ impl StreamingEngine {
            provenance.clone(),
            &[room],
        );
+        // Retention: bound the live belief set (one node is appended per cycle;
+        // without this the graph grows ~1.7M nodes/day at 20 Hz). Deterministic
+        // eviction; the just-added belief is always newest and survives.
+        self.world.prune_semantic_states(self.semantic_retention);

-        // 7. Deterministic witness over the trust decision (ADR-137 §2.7).
+        // 8. Deterministic witness over the trust decision (ADR-137 §2.7).
+        //    `effective_class` already reflects any mesh-risk demotion, so a
+        //    fragmenting array shifts the witness — partition risk is auditable.
        let witness = witness_of(&provenance, effective_class);

+        // 9. Drift→recalibration advisor (ADR-135 → ADR-150 §3.4): sustained
+        //    low coherence, an environment change-point, or a mesh close to
+        //    partitioning recommends refit.
+        let recalibration_recommended = self
+            .recal
+            .observe(quality.base_coherence, change_point.is_some() || mesh_at_risk);
+
        self.cycle += 1;
        Ok(TrustedOutput {
            semantic_id,
@ -364,6 +527,8 @@ impl StreamingEngine {
            directional,
            change_point,
            witness,
+            recalibration_recommended,
+            mesh,
        })
    }

@ -547,6 +712,205 @@ mod tests {
        assert_eq!(o1.quality.per_node_weights, o2.quality.per_node_weights);
    }

+    /// ADR-150 §3.4 adapter provenance: activating a per-room adapter changes
+    /// the provenance model_version AND the witness — the exact weights shaping
+    /// inference are pinned in the trust chain, so an adapter can never swap
+    /// silently. Clearing it restores the base identity (and base witness).
+    #[test]
+    fn adapter_identity_is_witnessed() {
+        let cal = CalibrationId(9);
+        let frames = [node_frame(0, 1000, 56), node_frame(1, 1001, 56)];
+
+        let (mut e, room) = engine();
+        let base = e.process_cycle(&frames, cal, room, 1_000).unwrap();
+        assert_eq!(base.provenance.model_version, "rfenc-v1");
+
+        e.set_room_adapter(AdapterInfo {
+            adapter_id: "a1b2c3d4e5f60718".into(),
+            trained_samples: 150,
+        });
+        let adapted = e.process_cycle(&frames, cal, room, 2_000).unwrap();
+        assert_eq!(
+            adapted.provenance.model_version,
+            "rfenc-v1+adapter:a1b2c3d4e5f60718"
+        );
+        assert_ne!(adapted.witness, base.witness, "adapter must shift the witness");
+
+        // A different adapter id yields a different witness again.
+        e.set_room_adapter(AdapterInfo {
+            adapter_id: "ffffffffffffffff".into(),
+            trained_samples: 150,
+        });
+        let other = e.process_cycle(&frames, cal, room, 3_000).unwrap();
+        assert_ne!(other.witness, adapted.witness);
+
+        // Clearing restores the base identity and the base witness.
+        e.clear_room_adapter();
+        let back = e.process_cycle(&frames, cal, room, 4_000).unwrap();
+        assert_eq!(back.provenance.model_version, "rfenc-v1");
+        assert_eq!(back.witness, base.witness);
+    }
+
+    /// Drift→recalibration advisor logic: a sustained low-coherence streak
+    /// recommends refit; a single healthy cycle resets the streak; a
+    /// change-point recommends immediately regardless of streak.
+    #[test]
+    fn recalibration_advisor_streak_and_change_point() {
+        let mut adv = RecalibrationAdvisor {
+            coherence_floor: 0.5,
+            streak_threshold: 3,
+            ..Default::default()
+        };
+        // Healthy cycles never recommend and keep the streak at zero.
+        for _ in 0..5 {
+            assert!(!adv.observe(0.9, false));
+        }
+        assert_eq!(adv.streak(), 0);
+        // Two low cycles: not yet.
+        assert!(!adv.observe(0.2, false));
+        assert!(!adv.observe(0.2, false));
+        // Third consecutive low cycle: fire.
+        assert!(adv.observe(0.2, false));
+        // Recovery resets the streak.
+        assert!(!adv.observe(0.9, false));
+        assert_eq!(adv.streak(), 0);
+        // A change-point recommends immediately, even at full coherence.
+        assert!(adv.observe(0.9, true));
+    }
+
+    /// Engine-level: clean coherent cycles never recommend recalibration (the
+    /// advisor is wired into process_cycle and stays quiet on healthy input).
+    #[test]
+    fn healthy_cycles_do_not_recommend_recalibration() {
+        let (mut e, room) = engine();
+        e.set_recalibration_advisor(RecalibrationAdvisor {
+            coherence_floor: 0.5,
+            streak_threshold: 3,
+            ..Default::default()
+        });
+        let cal = CalibrationId(2);
+        for i in 0..5u64 {
+            let frames = [
+                node_frame(0, 1_000 + i * 50_000, 56),
+                node_frame(1, 1_001 + i * 50_000, 56),
+            ];
+            let out = e.process_cycle(&frames, cal, room, i as i64).unwrap();
+            assert!(!out.recalibration_recommended);
+        }
+    }
+
+    /// Maximum total coupling mass of an n-node mesh whose attention weights
+    /// sum to 1 (coupling = wᵢ·wⱼ·n): Σ_{i<j} wᵢwⱼ·n = n(1−Σwᵢ²)/2 ≤ (n−1)/2.
+    /// Any cut is a subset of the edges, so every achievable cut value is
+    /// bounded by this mass — a risk threshold at or above it is *guaranteed*
+    /// to be crossed (deterministic fixture, review finding 4).
+    fn max_coupling_mass(n_nodes: usize) -> f64 {
+        (n_nodes as f64 - 1.0) / 2.0
+    }
+
+    /// Mesh guard wiring: a balanced 2-node cycle reports a mesh (cut exists)
+    /// but never flags risk (min_nodes=3); a 3-node mesh whose cut value
+    /// *deterministically* falls at or below the configured risk threshold
+    /// (threshold = the provable upper bound on any achievable cut) is flagged
+    /// at_risk, and the structural event feeds the recalibration advisor
+    /// immediately — no conditional assertions (review finding 4).
+    #[test]
+    fn mesh_partition_risk_feeds_recalibration() {
+        let (mut e, room) = engine();
+        let cal = CalibrationId(3);
+
+        // Balanced 2-node mesh: report present, no risk.
+        let out = e
+            .process_cycle(&[node_frame(0, 1000, 56), node_frame(1, 1001, 56)], cal, room, 1)
+            .unwrap();
+        let mesh = out.mesh.expect("2-node mesh reports");
+        assert!(!mesh.at_risk);
+        assert!(!out.recalibration_recommended);
+
+        // 3-node mesh with the operator risk threshold set to the provable
+        // cut upper bound: the crossing is deterministic regardless of the
+        // fuser's exact weighting.
+        e.mesh_guard_mut().risk_threshold = max_coupling_mass(3);
+        let frames = [
+            node_frame(0, 10_000_000, 56),
+            node_frame(1, 10_000_001, 56),
+            node_frame(2, 10_000_002, 56),
+        ];
+        let out3 = e.process_cycle(&frames, cal, room, 2).unwrap();
+        let m3 = out3.mesh.expect("3-node mesh reports");
+        assert!(m3.at_risk, "cut ≤ threshold must flag partition risk");
+        assert!(
+            out3.recalibration_recommended,
+            "mesh risk is a structural event — the advisor must fire immediately, no streak"
+        );
+        assert!(m3.cut_value.is_finite() && m3.cut_value >= 0.0);
+    }
+
+    /// Mesh partition risk demotes the privacy class and shifts the witness —
+    /// a fragmenting array makes the fused belief less trustworthy, so it is
+    /// emitted at a more restricted class, and that demotion is auditable.
+    /// Both cycles use the *same 3-node topology and frames*; the engines
+    /// differ only in the forced mesh risk, so the witness delta is
+    /// attributable to the risk demotion alone (review finding 4).
+    #[test]
+    fn mesh_risk_demotes_privacy_and_shifts_witness() {
+        let cal = CalibrationId(8);
+        let frames3 = [
+            node_frame(0, 1000, 56),
+            node_frame(1, 1001, 56),
+            node_frame(2, 1002, 56),
+        ];
+
+        // Baseline: same topology, default risk threshold — clean cycle, not
+        // demoted (PrivateHome → Anonymous), mesh healthy.
+        let (mut e1, r1) = engine();
+        let base = e1.process_cycle(&frames3, cal, r1, 5_000).unwrap();
+        assert!(!base.mesh.as_ref().unwrap().at_risk);
+        assert!(!base.demoted);
+        assert_eq!(base.effective_class, PrivacyClass::Anonymous);
+
+        // Forced risk: identical frames/topology, threshold at the provable
+        // cut upper bound so the crossing is deterministic.
+        let (mut e2, r2) = engine();
+        e2.mesh_guard_mut().risk_threshold = max_coupling_mass(3);
+        let risky = e2.process_cycle(&frames3, cal, r2, 5_000).unwrap();
+        assert!(risky.mesh.as_ref().unwrap().at_risk);
+        assert!(risky.demoted, "mesh risk must demote");
+        // PrivateHome base Anonymous(2) → demoted to Restricted(3).
+        assert_eq!(risky.effective_class, PrivacyClass::Restricted);
+        assert!(risky.provenance.privacy_decision.contains("Restricted"));
+        assert_ne!(
+            risky.witness, base.witness,
+            "same topology, risk-only delta must shift the witness"
+        );
+    }
+
+    /// WorldGraph belief retention: the live loop appends one SemanticState per
+    /// cycle; past the cap the oldest beliefs are evicted so graph memory is
+    /// bounded, while structural nodes and the newest belief always survive.
+    #[test]
+    fn semantic_state_growth_is_bounded() {
+        let (mut e, room) = engine();
+        e.set_semantic_retention(5);
+        let cal = CalibrationId(1);
+        let mut last_id = None;
+        let baseline_nodes = 2; // room + sensor
+        for i in 0..20u64 {
+            let frames = [
+                node_frame(0, 1000 + i * 50_000, 56),
+                node_frame(1, 1001 + i * 50_000, 56),
+            ];
+            let out = e.process_cycle(&frames, cal, room, 5_000 + i as i64).unwrap();
+            last_id = Some(out.semantic_id);
+            assert!(e.world().node_count() <= baseline_nodes + 5);
+        }
+        // 20 cycles ran, only 5 beliefs remain, newest is still present.
+        assert_eq!(e.world().node_count(), baseline_nodes + 5);
+        assert!(e.world().node(last_id.unwrap()).is_some());
+        // Structural nodes survive eviction.
+        assert!(e.world().node(room).is_some());
+    }
+
    fn node_frame_scaled(node_id: u8, ts_us: u64, n_sub: usize, scale: f32) -> MultiBandCsiFrame {
        MultiBandCsiFrame {
            node_id,
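Reviewer note on `max_coupling_mass`: the bound in its doc comment can be checked numerically. The sketch below is a standalone illustration, not code from this crate. `coupling_mass` and `coupling_mass_closed_form` are hypothetical helpers, and the weights are assumed to be non-negative and to sum to 1, as the doc comment states.

```rust
/// Hypothetical helper (not part of the crate): total pairwise coupling mass,
/// the sum over i < j of w_i * w_j * n, for weights assumed to sum to 1.
fn coupling_mass(weights: &[f64]) -> f64 {
    let n = weights.len() as f64;
    let mut mass = 0.0;
    for i in 0..weights.len() {
        for j in (i + 1)..weights.len() {
            mass += weights[i] * weights[j] * n;
        }
    }
    mass
}

/// Closed form of the same quantity: n * (1 - sum of w_i^2) / 2.
/// It follows from (sum w_i)^2 = sum w_i^2 + 2 * sum_{i<j} w_i * w_j = 1.
fn coupling_mass_closed_form(weights: &[f64]) -> f64 {
    let n = weights.len() as f64;
    let sum_sq: f64 = weights.iter().map(|w| w * w).sum();
    n * (1.0 - sum_sq) / 2.0
}

/// Upper bound from the diff: (n - 1) / 2, reached by uniform weights,
/// because sum of w_i^2 is at least 1/n when the weights sum to 1.
fn max_coupling_mass(n_nodes: usize) -> f64 {
    (n_nodes as f64 - 1.0) / 2.0
}
```

For three uniform weights the mass equals the bound (1.0); any skewed weighting stays below it, which is why a threshold set to `max_coupling_mass(3)` is always at or above the cut value in the tests.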
--- a/v2/crates/wifi-densepose-engine/src/mesh_guard.rs
+++ b/v2/crates/wifi-densepose-engine/src/mesh_guard.rs
@ -0,0 +1,364 @@
+//! Mesh partition guard: dynamic min-cut over the live multistatic node graph.
+//!
+//! The fusion mesh (nodes = sensing nodes, edge weights = fusion coupling
+//! derived from per-node attention weights) changes *incrementally* at cycle
+//! rate — one node's coupling drifts, a node joins or drops. This module
+//! maintains a [`ruvector_mincut::DynamicMinCut`] over that graph and exposes,
+//! per cycle:
+//!
+//! - the **min-cut value** — the cheapest set of couplings whose loss splits
+//!   the mesh in two: a principled, global "how close is the array to
+//!   partitioning" number (vs per-node heuristics that miss multi-node
+//!   structure);
+//! - the **weak side** — which specific nodes are about to partition (feeds
+//!   failure/jamming triage, ADR-032 posture);
+//! - an **at-risk flag** consumed by the engine: it counts as a structural
+//!   event for the drift→recalibration advisor.
+//!
+//! ## Cost model (the optimization)
+//!
+//! Weights are quantized (default 1/64; a *nonzero* coupling below one quantum
+//! saturates to quantum 1 so a live coupling is never erased; see
+//! [`MeshGuard::weight_quantum`]) and graph work is **change-gated**: each
+//! cycle still evaluates the O(n²) pairwise couplings, but the steady state
+//! applies *zero* graph updates and reuses the cached cut. When any quantized
+//! weight moves, the exact (deterministic) structure is rebuilt, which measured
+//! cheaper than one incremental update at mesh scale (≤ tens of nodes).
+
+use std::collections::BTreeMap;
+
+use ruvector_mincut::{DynamicMinCut, MinCutBuilder};
+
+/// Per-cycle report from the mesh guard.
+#[derive(Debug, Clone, PartialEq)]
+pub struct MeshPartitionReport {
+    /// Current min-cut value over the coupling graph (higher = more robust).
+    pub cut_value: f64,
+    /// True when the mesh has ≥ `min_nodes` nodes and the cut value fell to or
+    /// below the risk threshold — the array is close to splitting.
+    pub at_risk: bool,
+    /// The smaller side of the min-cut partition (node ids): the nodes that
+    /// would be isolated if the weak couplings failed.
+    pub weak_side: Vec<u8>,
+    /// Edges changed this cycle (0 in steady state; every installed edge on a cold rebuild).
+    pub updates_applied: usize,
+}
+
+/// Dynamic min-cut guard over the live mesh.
+pub struct MeshGuard {
+    mincut: Option<DynamicMinCut>,
+    /// Node set the structure was built over (sorted). A change forces rebuild.
+    nodes: Vec<u8>,
+    /// Quantized edge weights currently installed, keyed `(u, v)` with `u < v`.
+    edges: BTreeMap<(u8, u8), i64>,
+    /// Weight quantum: weights are snapped to multiples of this before
+    /// comparison/installation, gating out sub-quantum jitter.
+    ///
+    /// Policy: a **nonzero** coupling below one quantum saturates to quantum 1
+    /// instead of quantizing to 0 — quantization never erases a live coupling.
+    /// (Without the floor, a balanced mesh of ≥ 65 nodes — attention weights
+    /// ~1/n ⇒ couplings ~1/n < 1/64 — had every edge erased and was reported
+    /// permanently "already partitioned"/at-risk.) Exact zero stays zero: a
+    /// truly absent coupling *is* a partition. Relative weakness below one
+    /// quantum is not resolved; lower this quantum if that resolution matters.
+    pub weight_quantum: f64,
+    /// Cut value at or below which the mesh counts as at partition risk.
+    pub risk_threshold: f64,
+    /// Minimum node count for risk to be meaningful (a 2-node mesh always has
+    /// a trivial cut; default 3).
+    pub min_nodes: usize,
+}
+
+impl Default for MeshGuard {
+    fn default() -> Self {
+        Self {
+            mincut: None,
+            nodes: Vec::new(),
+            edges: BTreeMap::new(),
+            weight_quantum: 1.0 / 64.0,
+            risk_threshold: 0.25,
+            min_nodes: 3,
+        }
+    }
+}
+
+impl MeshGuard {
+    /// Quantize a raw weight to the guard's grid (floor; weights are ≥ 0).
+    /// Nonzero sub-quantum weights saturate to quantum 1 — see the
+    /// [`Self::weight_quantum`] policy (review finding: sub-quantum couplings
+    /// must not produce a false "already partitioned").
+    fn quantize(&self, w: f64) -> i64 {
+        let w = w.max(0.0);
+        let q = (w / self.weight_quantum).floor() as i64;
+        if q == 0 && w > 0.0 {
+            1
+        } else {
+            q
+        }
+    }
+
+    /// Update the guard with this cycle's mesh: `nodes` are the contributing
+    /// node ids and `coupling(i, j)` returns the fusion coupling between
+    /// `nodes[i]` and `nodes[j]` (symmetric, ≥ 0).
+    ///
+    /// Returns `None` for meshes of fewer than 2 nodes (no cut exists).
+    pub fn update(
+        &mut self,
+        nodes: &[u8],
+        coupling: impl Fn(usize, usize) -> f64,
+    ) -> Option<MeshPartitionReport> {
+        let mut sorted: Vec<u8> = nodes.to_vec();
+        sorted.sort_unstable();
+        sorted.dedup();
+        if sorted.len() < 2 {
+            // Fewer than 2 distinct nodes: drop state so a rebuild starts clean.
+            self.mincut = None;
+            self.nodes.clear();
+            self.edges.clear();
+            return None;
+        }
+
+        // Desired quantized edge set for this cycle.
+        let mut desired: BTreeMap<(u8, u8), i64> = BTreeMap::new();
+        for i in 0..nodes.len() {
+            for j in (i + 1)..nodes.len() {
+                let (a, b) = if nodes[i] < nodes[j] {
+                    (nodes[i], nodes[j])
+                } else {
+                    (nodes[j], nodes[i])
+                };
+                if a == b {
+                    continue;
+                }
+                let q = self.quantize(coupling(i, j));
+                desired.insert((a, b), q);
+            }
+        }
+
+        // Change detection: count quantized-weight moves vs the installed set.
+        let changed = if self.mincut.is_none() || self.nodes != sorted {
+            usize::MAX // node set changed / first cycle: rebuild unconditionally
+        } else {
+            desired
+                .iter()
+                .filter(|(k, &q)| self.edges.get(*k).copied().unwrap_or(0) != q)
+                .count()
+        };
+
+        let mut updates = 0usize;
+        if changed > 0 {
+            // Measured policy (criterion, 12-node mesh): a full exact rebuild
+            // is ~170 µs while ONE DynamicMinCut delete+insert is ~240 µs —
+            // the incremental machinery's overheads target much larger graphs.
+            // At mesh scale the optimum is: change-gate aggressively (the
+            // steady state below is ~7 µs and covers almost every cycle) and
+            // rebuild whenever anything actually moved.
+            let edges: Vec<(u64, u64, f64)> = desired
+                .iter()
+                .filter(|(_, &q)| q > 0)
+                .map(|(&(a, b), &q)| {
+                    (u64::from(a), u64::from(b), q as f64 * self.weight_quantum)
+                })
+                .collect();
+            updates = if changed == usize::MAX { edges.len() } else { changed };
+            self.mincut = MinCutBuilder::new().exact().with_edges(edges).build().ok();
+            self.nodes = sorted;
+            self.edges = desired;
+        }
+        // changed == 0: steady state — zero graph work, cached cut reused.
+
+        // Nodes with no positive coupling never enter the cut structure (zero
+        // edges are not installed) — they are already partitioned. Report them
+        // as the degenerate cut before consulting the structure.
+        let mut isolated: Vec<u8> = self
+            .nodes
+            .iter()
+            .copied()
+            .filter(|&v| {
+                !self
+                    .edges
+                    .iter()
+                    .any(|(&(a, b), &q)| q > 0 && (a == v || b == v))
+            })
+            .collect();
+        if !isolated.is_empty() {
+            isolated.sort_unstable();
+            return Some(MeshPartitionReport {
+                cut_value: 0.0,
+                at_risk: self.nodes.len() >= self.min_nodes,
+                weak_side: isolated,
+                updates_applied: updates,
+            });
+        }
+
+        let mc = self.mincut.as_ref()?;
+        // A disconnected coupling graph is the degenerate cut: value 0.
+        let cut_value = if mc.is_connected() { mc.min_cut_value() } else { 0.0 };
+        let (side_a, side_b) = mc.partition();
+        let weak_raw = if side_a.len() <= side_b.len() { side_a } else { side_b };
+        let mut weak_side: Vec<u8> = weak_raw.into_iter().map(|v| v as u8).collect();
+        weak_side.sort_unstable();
+        let at_risk = self.nodes.len() >= self.min_nodes && cut_value <= self.risk_threshold;
+
+        Some(MeshPartitionReport { cut_value, at_risk, weak_side, updates_applied: updates })
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    /// Triangle with one weakly-attached node: the cut isolates that node and
+    /// the cut value equals its total coupling.
+    #[test]
+    fn weakly_attached_node_is_the_weak_side() {
+        let mut g = MeshGuard::default();
+        let nodes = [0u8, 1, 2];
+        // 0–1 strongly coupled; node 2 hangs on by 0.05 + 0.05.
+        let w = |i: usize, j: usize| match (i.min(j), i.max(j)) {
+            (0, 1) => 1.0,
+            _ => 0.05,
+        };
+        let r = g.update(&nodes, w).expect("3-node mesh");
+        assert!(r.cut_value <= 0.13, "cut {} should be ~0.10", r.cut_value);
+        assert_eq!(r.weak_side, vec![2]);
+        assert!(r.at_risk, "weak coupling must flag partition risk");
+    }
+
+    #[test]
+    fn strong_mesh_is_not_at_risk() {
+        let mut g = MeshGuard::default();
+        let r = g.update(&[0, 1, 2, 3], |_, _| 0.9).expect("mesh");
+        assert!(r.cut_value > g.risk_threshold);
+        assert!(!r.at_risk);
+    }
+
+    #[test]
+    fn two_node_mesh_reports_but_never_risks() {
+        let mut g = MeshGuard::default();
+        let r = g.update(&[0, 1], |_, _| 0.01).expect("2-node mesh");
+        // Trivial cut exists but min_nodes=3 keeps the flag off.
+        assert!(!r.at_risk);
+    }
+
+    #[test]
+    fn fewer_than_two_nodes_yields_none() {
+        let mut g = MeshGuard::default();
+        assert!(g.update(&[7], |_, _| 1.0).is_none());
+        assert!(g.update(&[], |_, _| 1.0).is_none());
+    }
+
+    /// The optimization contract: identical weights on the next cycle apply
+    /// zero updates; a sub-quantum wiggle also applies zero; a real change
+    /// reports exactly the number of edges that changed.
+    #[test]
+    fn steady_state_applies_zero_updates() {
+        let mut g = MeshGuard::default();
+        let nodes = [0u8, 1, 2, 3];
+        let first = g.update(&nodes, |_, _| 0.5).unwrap();
+        assert_eq!(first.updates_applied, 6); // cold build installs all edges
+
+        let second = g.update(&nodes, |_, _| 0.5).unwrap();
+        assert_eq!(second.updates_applied, 0);
+
+        // Sub-quantum jitter (quantum is 1/64 ≈ 0.0156) is gated out.
+        let third = g.update(&nodes, |_, _| 0.5 + 0.004).unwrap();
+        assert_eq!(third.updates_applied, 0);
+
+        // One genuinely changed edge is reported as exactly one update.
+        let fourth = g
+            .update(&nodes, |i, j| if (i.min(j), i.max(j)) == (0, 1) { 0.1 } else { 0.5 })
+            .unwrap();
+        assert_eq!(fourth.updates_applied, 1);
+    }
+
+    /// Node set changes force a clean rebuild (drop/join handled correctly).
+    #[test]
+    fn node_join_and_drop_rebuild() {
+        let mut g = MeshGuard::default();
+        g.update(&[0, 1, 2], |_, _| 0.8).unwrap();
+        // Node 3 joins.
+        let joined = g.update(&[0, 1, 2, 3], |_, _| 0.8).unwrap();
+        assert_eq!(joined.updates_applied, 6); // rebuild over 4 nodes
+        // Node 0 drops.
+        let dropped = g.update(&[1, 2, 3], |_, _| 0.8).unwrap();
+        assert_eq!(dropped.updates_applied, 3);
+        assert!(!dropped.at_risk);
+    }
+
+    /// Determinism: same inputs, same report (cut value + weak side).
+    #[test]
+    fn reports_are_deterministic() {
+        let run = || {
+            let mut g = MeshGuard::default();
+            let w = |i: usize, j: usize| match (i.min(j), i.max(j)) {
+                (0, 1) => 0.9,
+                (1, 2) => 0.6,
+                _ => 0.07,
+            };
+            g.update(&[0, 1, 2], w).unwrap()
+        };
+        let a = run();
+        let b = run();
+        assert_eq!(a.cut_value.to_bits(), b.cut_value.to_bits());
+        assert_eq!(a.weak_side, b.weak_side);
+    }
+
+    /// Regression (review finding 3): a balanced mesh of ≥ 65 nodes has every
+    /// pairwise coupling at ~1/n < quantum (1/64). The old floor-to-zero
+    /// quantization erased all edges and reported the mesh permanently
+    /// "already partitioned" (cut 0, at_risk). Nonzero sub-quantum couplings
+    /// now saturate to one quantum, so the mesh reports a healthy cut.
+    #[test]
+    fn large_balanced_mesh_is_not_at_risk() {
+        let mut g = MeshGuard::default();
+        let nodes: Vec<u8> = (0..70u8).collect();
+        // Attention-weight product coupling: (1/n)·(1/n)·n = 1/n ≈ 0.0143 < 1/64.
+        let n = nodes.len() as f64;
+        let r = g.update(&nodes, |_, _| 1.0 / n).expect("70-node mesh");
+        assert!(
+            r.cut_value > 0.0,
+            "live couplings must not quantize to zero"
+        );
+        // Min cut isolates one node: 69 edges × one quantum (1/64) ≈ 1.08,
+        // well above the 0.25 default risk threshold.
+        assert!(r.cut_value > g.risk_threshold);
+        assert!(
+            !r.at_risk,
+            "balanced large mesh must not be at partition risk"
+        );
+        assert!(r.weak_side.len() < nodes.len(), "no false full partition");
+    }
+
+    /// Sub-quantum couplings saturate to one quantum but exact zero is still a
+    /// real partition (the floor must not invent couplings).
+    #[test]
+    fn sub_quantum_saturates_but_zero_stays_zero() {
+        let mut g = MeshGuard::default();
+        // 0.001 < 1/64 everywhere: connected, tiny cut, flagged at risk
+        // (cut = 2 × 1/64 ≈ 0.031 ≤ 0.25) — but NOT "already partitioned".
+        let r = g.update(&[0, 1, 2], |_, _| 0.001).expect("mesh");
+        assert!(r.cut_value > 0.0);
+        assert!(r.at_risk);
+        // Exact zero to node 2: degenerate cut 0, node 2 isolated.
+        let mut g2 = MeshGuard::default();
+        let r2 = g2
+            .update(&[0, 1, 2], |i, j| if i == 2 || j == 2 { 0.0 } else { 0.5 })
+            .expect("mesh");
+        assert_eq!(r2.cut_value, 0.0);
+        assert_eq!(r2.weak_side, vec![2]);
+    }
+
+    /// A mesh with one node fully decoupled (zero coupling to it) reports cut 0.
+    #[test]
+    fn disconnected_mesh_is_cut_zero() {
+        let mut g = MeshGuard::default();
+        let w = |i: usize, j: usize| {
+            if i == 2 || j == 2 { 0.0 } else { 0.9 }
+        };
+        let r = g.update(&[0, 1, 2], w).unwrap();
+        assert_eq!(r.cut_value, 0.0);
+        assert!(r.at_risk);
+        assert_eq!(r.weak_side, vec![2]);
+    }
+}
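Reviewer note on `mesh_guard.rs`: the quantization policy, the change gate, and the cut values asserted in the tests can be reproduced without `ruvector_mincut`. The sketch below is an illustration under stated assumptions, not the guard's implementation. `changed_edges` and `brute_force_min_cut` are hypothetical helpers, node ids are assumed to be `0..n`, and the brute-force enumeration is only sensible for a handful of nodes.

```rust
use std::collections::BTreeMap;

/// Floor-quantize a non-negative weight. A nonzero sub-quantum weight
/// saturates to one quantum, so a live coupling is never erased.
fn quantize(w: f64, quantum: f64) -> i64 {
    let w = w.max(0.0);
    let q = (w / quantum).floor() as i64;
    if q == 0 && w > 0.0 { 1 } else { q }
}

/// Change gate: count edges whose quantized weight differs from the
/// installed set. Zero means the cached cut can be reused as-is.
fn changed_edges(
    installed: &BTreeMap<(u8, u8), i64>,
    desired: &BTreeMap<(u8, u8), i64>,
) -> usize {
    desired
        .iter()
        .filter(|&(k, q)| installed.get(k).copied().unwrap_or(0) != *q)
        .count()
}

/// Hypothetical brute-force global min cut, in quanta, over nodes 0..n
/// (requires 2 <= n < 32). Every nonempty proper subset is tried as one side.
fn brute_force_min_cut(n: usize, edges: &BTreeMap<(u8, u8), i64>) -> i64 {
    let mut best = i64::MAX;
    for mask in 1u32..((1u32 << n) - 1) {
        // An edge crosses the cut when its endpoints fall on different sides.
        let cut: i64 = edges
            .iter()
            .filter(|&(&(a, b), _)| ((mask >> a) & 1) != ((mask >> b) & 1))
            .map(|(_, &q)| q)
            .sum();
        best = best.min(cut);
    }
    best
}
```

For the weakly attached triangle (0-1 at 1.0, node 2 at 0.05 + 0.05) with quantum 1/64, the couplings quantize to 64, 3 and 3 quanta, so the min cut isolates node 2 at 6 quanta, i.e. 6/64 ≈ 0.094. That is the value the `cut <= 0.13` assertion in `weakly_attached_node_is_the_weak_side` leaves room for.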
--- a/v2/crates/wifi-densepose-mat/Cargo.toml
+++ b/v2/crates/wifi-densepose-mat/Cargo.toml
@ -15,12 +15,17 @@ readme = "README.md"
 default = ["std", "api", "ruvector"]
 ruvector = ["dep:ruvector-solver", "dep:ruvector-temporal-tensor"]
 std = []
-api = ["chrono/serde", "geo/use-serde"]
+# REST/WebSocket surface. Pulls the web stack (axum, futures-util) only when
+# enabled, and enables the `serde` FEATURE (not just `dep:serde`) so the
+# `cfg_attr(feature = "serde", ...)` derives on domain types are actually
+# active when the API is on (review finding 5: `api = ["dep:serde"]` enabled
+# the dependency but left every `feature = "serde"` cfg dead).
+api = ["serde", "dep:axum", "dep:futures-util"]
 portable = ["low-power"]
 low-power = []
 distributed = ["tokio/sync"]
 drone = ["distributed"]
-serde = ["chrono/serde", "geo/use-serde"]
+serde = ["dep:serde", "chrono/serde", "geo/use-serde"]

 [dependencies]
 # Workspace dependencies
@ -30,20 +35,22 @@ wifi-densepose-nn = { version = "0.3.0", path = "../wifi-densepose-nn" }
 ruvector-solver = { workspace = true, optional = true }
 ruvector-temporal-tensor = { workspace = true, optional = true }

-# Async runtime
+# Async runtime — required by the core integration layer (UDP CSI receiver,
+# hardware adapter, scan loop in `DisasterResponse::start_scanning`), not just
+# the REST API, so it is deliberately NOT gated behind `api`.
 tokio = { version = "1.35", features = ["rt", "sync", "time"] }
 async-trait = "0.1"

-# Web framework (REST API)
-axum = { version = "0.7", features = ["ws"] }
-futures-util = "0.3"
+# Web framework (REST API) — only compiled with the `api` feature.
+axum = { version = "0.7", features = ["ws"], optional = true }
+futures-util = { version = "0.3", optional = true }

 # Error handling
 thiserror = "2.0"
 anyhow = "1.0"

 # Serialization
-serde = { version = "1.0", features = ["derive"] }
+serde = { version = "1.0", features = ["derive"], optional = true }
 serde_json = "1.0"

 # Time handling
--- a/v2/crates/wifi-densepose-mat/src/lib.rs
+++ b/v2/crates/wifi-densepose-mat/src/lib.rs
@ -78,6 +78,10 @@
 #![warn(rustdoc::missing_crate_level_docs)]

 pub mod alerting;
+/// REST API surface (Axum). Requires the `api` feature: its DTOs derive
+/// serde, an optional dependency that `api` turns on via the `serde` feature.
+#[cfg(feature = "api")]
+#[cfg_attr(docsrs, doc(cfg(feature = "api")))]
 pub mod api;
 pub mod detection;
 pub mod domain;
@ -122,6 +126,8 @@ pub use integration::{
    AdapterError, HardwareAdapter, IntegrationConfig, NeuralAdapter, SignalAdapter,
 };

+#[cfg(feature = "api")]
+#[cfg_attr(docsrs, doc(cfg(feature = "api")))]
 pub use api::{create_router, AppState};

 pub use ml::{
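Reviewer note on the `wifi-densepose-mat` feature change: the distinction between enabling the optional dependency (`dep:serde`) and enabling the `serde` feature is easiest to see on a type that uses `cfg_attr`. The sketch below is a minimal illustration; `SurvivorSummary` and `serde_derives_active` are hypothetical names, not items from the crate.

```rust
/// Hypothetical domain type (not from the crate). The serde derives are only
/// attached when the crate is compiled with the `serde` *feature*.
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
#[derive(Debug, Clone, PartialEq)]
pub struct SurvivorSummary {
    pub id: u32,
    pub confidence: f32,
}

/// True only when the `serde` feature is set for this compilation. Pulling in
/// the optional dependency alone (`dep:serde`) does not set this cfg, so the
/// `cfg_attr` above would stay inert.
pub fn serde_derives_active() -> bool {
    cfg!(feature = "serde")
}
```

With `api = ["serde", ...]` and `serde = ["dep:serde", ...]` as in the manifest diff, turning on `api` sets `feature = "serde"`, so derives gated this way become active together with the dependency they need.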
--- a/v2/crates/wifi-densepose-sensing-server/Cargo.toml
+++ b/v2/crates/wifi-densepose-sensing-server/Cargo.toml
@ -53,6 +53,16 @@ wifi-densepose-signal = { version = "0.3.1", path = "../wifi-densepose-signal",
 # Hardware crate — SyncPacket decoder for ADR-110 §A0.12 mesh-aligned timestamps.
 wifi-densepose-hardware = { version = "0.3.0", path = "../wifi-densepose-hardware" }

+# Governed streaming engine (ADR-135..146): fusion + privacy demotion +
+# WorldGraph belief + deterministic witness. Live server frames are also run
+# through this engine; when it decides Restricted, per-node raw amplitudes are
+# stripped from the live publish. Full output gating is a tracked follow-up;
+# see engine_bridge.rs ("Honest scope of the live-path governance").
+wifi-densepose-engine = { version = "0.3.0", path = "../wifi-densepose-engine" }
+wifi-densepose-worldgraph = { version = "0.3.0", path = "../wifi-densepose-worldgraph" }
+wifi-densepose-bfld = { version = "0.3.1", path = "../wifi-densepose-bfld", default-features = false }
+wifi-densepose-geo = { version = "0.1.0", path = "../wifi-densepose-geo" }
+
 # midstream — real-time introspection / low-latency tap (ADR-099 D1).
 # Two crates only, on purpose: scheduler / neural-solver / strange-loop are
 # explicitly out of scope of ADR-099 (D5).
--- a/v2/crates/wifi-densepose-sensing-server/src/engine_bridge.rs
+++ b/v2/crates/wifi-densepose-sensing-server/src/engine_bridge.rs
@ -0,0 +1,469 @@
+//! Live trust-path bridge: drive the governed [`StreamingEngine`] from the
+//! sensing-server's live `NodeState` map.
+//!
+//! `multistatic_bridge.rs` already converts `NodeState` → `MultiBandCsiFrame`
+//! and runs the *bare* `MultistaticFuser`. That path produces fused amplitudes
+//! but skips the trust control plane: privacy demotion on contradiction, the
+//! WorldGraph belief with mandatory provenance, and the deterministic witness
+//! (ADR-135..146). This bridge routes the same live frames through
+//! [`StreamingEngine::process_cycle`], so every governed belief carries
+//! evidence + model + calibration + privacy decision and a BLAKE3 witness
+//! (narrowing the gap called out in ADR-136 §8 and the beyond-SOTA system
+//! review).
+//!
+//! ## Honest scope of the live-path governance
+//!
+//! The engine runs *alongside* the bare fusion path that feeds the live
+//! `SensingUpdate`; it does not replace it. What the engine's decision **does**
+//! gate on the live wire today: when a cycle is emitted at
+//! [`PrivacyClass::Restricted`] (base mode or contradiction/mesh-risk
+//! demotion), [`EngineBridge::suppress_raw_outputs`] is true and `main.rs`
+//! strips the per-node raw amplitude vectors from the published update — the
+//! same field mapping `wifi-densepose-bfld`'s privacy gate applies at
+//! `Restricted` (drop amplitude/phase proxies). Trust state (latest witness,
+//! effective class, recalibration flag, engine-error count) is readable on
+//! `GET /api/v1/status`. Gating of the remaining *derived* outputs
+//! (person count, classification, signal field) by privacy class is tracked
+//! as a follow-up; until then those fields are published ungoverned.
+//!
+//! Determinism: this module reads server state and forwards explicit
+//! timestamps/calibration ids; it introduces no wall-clock reads of its own, so
+//! a given `(frames, calibration, now_ms)` always yields the same
+//! [`TrustedOutput`] witness.
+
+use std::collections::HashMap;
+use std::time::{Duration, Instant};
+
+use wifi_densepose_bfld::{PrivacyClass, PrivacyMode};
+use wifi_densepose_engine::{AdapterInfo, EngineError, StreamingEngine, TrustedOutput};
+use wifi_densepose_geo::types::GeoRegistration;
+use wifi_densepose_signal::ruvsense::fusion_quality::CalibrationId;
+use wifi_densepose_worldgraph::WorldId;
+
+use super::multistatic_bridge::node_frames_from_states;
+use super::NodeState;
+
+/// Minimum spacing between engine-error warn logs (errors are still counted
+/// every cycle; only the log line is rate-limited — a 20 Hz loop must not
+/// emit 20 warns/s).
+const ENGINE_ERROR_WARN_INTERVAL: Duration = Duration::from_secs(10);
+
+/// Owns a [`StreamingEngine`] and the WorldGraph scope (one room plus its
+/// lazily registered sensors) the live sensing loop publishes beliefs into.
+pub struct EngineBridge {
+    engine: StreamingEngine,
+    room: WorldId,
+    /// Nodes already wired into the WorldGraph as sensors (by `node_id`).
+    registered_nodes: HashMap<u8, WorldId>,
+    /// Calibration epoch applied to live frames until the ADR-135 baseline
+    /// stage supplies a real per-node id. Stable so witnesses are reproducible.
+    calibration: CalibrationId,
+    // ── Trust state observed from the most recent cycles (review finding 1:
+    //    previously write-only fields on AppState; now recorded here and
+    //    exposed via the status endpoint + output gating). ──────────────────
+    /// BLAKE3 witness of the most recent successful governed cycle.
+    last_witness: Option<[u8; 32]>,
+    /// Latest drift→recalibration recommendation (ADR-135 → ADR-150 §3.4).
+    recalibration_recommended: bool,
+    /// Privacy class the most recent cycle was emitted under (post-demotion).
+    effective_class: Option<PrivacyClass>,
+    /// Whether the most recent cycle was demoted (contradiction / mesh risk).
+    demoted: bool,
+    /// Total engine cycles that returned an error (previously swallowed by
+    /// `if let Some(Ok(..))` at the call sites).
+    engine_error_count: u64,
+    /// Last time an engine error was actually logged (rate limiter).
+    last_error_warn_at: Option<Instant>,
+}
+
+impl EngineBridge {
+    /// Build a bridge for one installation. `room_area_id`/`room_name` name the
+    /// observation scope; `mode` is the starting privacy mode.
+    pub fn new(mode: PrivacyMode, model_version: u16, room_area_id: &str, room_name: &str) -> Self {
+        let mut engine = StreamingEngine::new(mode, model_version, GeoRegistration::default());
+        let room = engine.add_room(room_area_id, room_name);
+        Self {
+            engine,
+            room,
+            registered_nodes: HashMap::new(),
+            calibration: CalibrationId(0x5256_0001), // "RV\0\x01" — placeholder epoch
+            last_witness: None,
+            recalibration_recommended: false,
+            effective_class: None,
+            demoted: false,
+            engine_error_count: 0,
+            last_error_warn_at: None,
+        }
+    }
+
+    /// Override the calibration epoch stamped onto live frames (ADR-135).
+    pub fn set_calibration(&mut self, calibration: CalibrationId) {
+        self.calibration = calibration;
+    }
+
+    /// Override the WorldGraph belief-retention cap (bounds memory on the live
+    /// loop; see `WorldGraph::prune_semantic_states`).
+    pub fn set_semantic_retention(&mut self, max_states: usize) {
+        self.engine.set_semantic_retention(max_states);
+    }
+
+    /// Switch the active privacy mode (operator/control-plane action).
+    pub fn set_privacy_mode(&mut self, mode: PrivacyMode) {
+        self.engine.set_privacy_mode(mode);
+    }
+
+    /// Activate a per-room calibration adapter (ADR-150 §3.4). The adapter's
+    /// content-derived id becomes part of provenance/witness from the next
+    /// cycle — weights can never swap silently on the live path.
+    pub fn set_room_adapter(&mut self, info: AdapterInfo) {
+        self.engine.set_room_adapter(info);
+    }
+
+    /// Deactivate the per-room adapter (revert to the shared base model).
+    pub fn clear_room_adapter(&mut self) {
+        self.engine.clear_room_adapter();
+    }
+
+    /// Borrow the engine (queries, WorldGraph snapshot, privacy audit).
+    pub fn engine(&self) -> &StreamingEngine {
+        &self.engine
+    }
+
+    /// Number of sensor nodes wired into the WorldGraph so far.
+    pub fn registered_node_count(&self) -> usize {
+        self.registered_nodes.len()
+    }
+
+    /// Run one governed trust cycle over the current live node states.
+    ///
+    /// Returns `None` when no active node yields a frame (nothing to fuse —
+    /// the engine is not invoked, so no spurious belief is published). On a
+    /// real cycle it lazily wires any newly-seen node as a WorldGraph sensor,
+    /// then returns the witnessed [`TrustedOutput`] (or a fusion error).
+    ///
+    /// `now_ms` is supplied by the caller (the sensing loop's clock), keeping
+    /// the bridge deterministic and replayable.
+    pub fn process_cycle_from_states(
+        &mut self,
+        node_states: &HashMap<u8, NodeState>,
+        now_ms: i64,
+    ) -> Option<Result<TrustedOutput, EngineError>> {
+        let frames = node_frames_from_states(node_states);
+        if frames.is_empty() {
+            return None;
+        }
+        // Lazily register each contributing node as a sensor observing the room,
+        // so the privacy rollup can suppress it under identity-strict modes.
+        for f in &frames {
+            self.registered_nodes.entry(f.node_id).or_insert_with(|| {
+                self.engine
+                    .add_sensor(&format!("node-{}", f.node_id), self.room)
+            });
+        }
+        Some(
+            self.engine
+                .process_cycle(&frames, self.calibration, self.room, now_ms),
+        )
+    }
+
+    /// Run one governed cycle **and record the trust state** (review finding
+    /// 1): on success the witness / effective class / demotion /
+    /// recalibration flag are stored for the status endpoint and output
+    /// gating; on error the error counter is incremented and a rate-limited
+    /// warning is logged (never silently swallowed). Returns the trusted
+    /// output on success, `None` when there was nothing to fuse or the cycle
+    /// errored.
+    pub fn observe_cycle(
+        &mut self,
+        node_states: &HashMap<u8, NodeState>,
+        now_ms: i64,
+    ) -> Option<TrustedOutput> {
+        match self.process_cycle_from_states(node_states, now_ms)? {
+            Ok(trust) => {
+                self.last_witness = Some(trust.witness);
+                self.recalibration_recommended = trust.recalibration_recommended;
+                self.effective_class = Some(trust.effective_class);
+                self.demoted = trust.demoted;
+                Some(trust)
+            }
+            Err(e) => {
+                self.engine_error_count += 1;
+                let now = Instant::now();
+                let warn_due = self.last_error_warn_at.map_or(true, |t| {
+                    now.duration_since(t) >= ENGINE_ERROR_WARN_INTERVAL
+                });
+                if warn_due {
+                    self.last_error_warn_at = Some(now);
+                    tracing::warn!(
+                        total_engine_errors = self.engine_error_count,
+                        "governed trust cycle failed (warn rate-limited to one per {:?}): {e}",
+                        ENGINE_ERROR_WARN_INTERVAL
+                    );
+                }
+                None
+            }
+        }
+    }
+
+    /// BLAKE3 witness of the most recent successful governed cycle.
+    pub fn last_trust_witness(&self) -> Option<[u8; 32]> {
+        self.last_witness
+    }
+
+    /// Latest drift→recalibration recommendation from the governed engine.
+    pub fn recalibration_recommended(&self) -> bool {
+        self.recalibration_recommended
+    }
+
+    /// Privacy class the most recent cycle was emitted under (post-demotion);
+    /// `None` until a governed cycle has run.
+    pub fn effective_class(&self) -> Option<PrivacyClass> {
+        self.effective_class
+    }
+
+    /// Whether the most recent cycle was demoted (contradiction / mesh risk).
+    pub fn demoted(&self) -> bool {
+        self.demoted
+    }
+
+    /// Engine cycles that returned an error since startup.
+    pub fn engine_error_count(&self) -> u64 {
+        self.engine_error_count
+    }
+
+    /// ADR-141 output mapping for the live publish path (review finding 1c):
+    /// at effective class [`PrivacyClass::Restricted`] the bfld privacy gate
+    /// drops the amplitude + phase proxies; the live `SensingUpdate` applies
+    /// the same field mapping by suppressing the per-node raw amplitude
+    /// vectors when this returns true. Classes below `Restricted` leave the
+    /// publish unchanged.
+    pub fn suppress_raw_outputs(&self) -> bool {
+        self.effective_class
+            .is_some_and(|c| c.as_u8() >= PrivacyClass::Restricted.as_u8())
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::collections::VecDeque;
+    use std::time::Instant;
+    use wifi_densepose_bfld::PrivacyClass;
+
+    fn node_state_with_history(amp: f64, n_sub: usize) -> NodeState {
+        let mut ns = NodeState::new();
+        let frame: Vec<f64> = (0..n_sub).map(|i| amp + 0.1 * i as f64).collect();
+        ns.frame_history = VecDeque::from(vec![frame]);
+        ns.last_frame_time = Some(Instant::now());
+        ns
+    }
+
+    fn two_node_states() -> HashMap<u8, NodeState> {
+        let mut m = HashMap::new();
+        m.insert(0u8, node_state_with_history(1.0, 56));
+        m.insert(1u8, node_state_with_history(1.05, 56));
+        m
+    }
+
+    #[test]
+    fn empty_states_produce_no_belief() {
+        let mut bridge = EngineBridge::new(PrivacyMode::PrivateHome, 1, "living_room", "Living Room");
+        let out = bridge.process_cycle_from_states(&HashMap::new(), 1_000);
+        assert!(out.is_none());
+        // No belief published, no sensor wired.
+        assert_eq!(bridge.registered_node_count(), 0);
+    }
+
+    #[test]
+    fn live_cycle_produces_witnessed_belief_with_provenance() {
+        let mut bridge = EngineBridge::new(PrivacyMode::PrivateHome, 1, "living_room", "Living Room");
+        let states = two_node_states();
+        let out = bridge
+            .process_cycle_from_states(&states, 10_000)
+            .expect("frames present")
+            .expect("fusion succeeds");
+
+        // Full provenance: evidence + model + calibration + privacy decision.
+        assert!(!out.provenance.evidence.is_empty());
+        assert_eq!(out.provenance.model_version, "rfenc-v1");
+        assert!(out.provenance.calibration_version.starts_with("cal:"));
+        assert!(out.provenance.privacy_decision.starts_with("PrivateHome/"));
+        // A witness was produced and the belief is in the WorldGraph.
+        assert_ne!(out.witness, [0u8; 32]);
+        assert!(bridge.engine().world().node(out.semantic_id).is_some());
+        // Both nodes are now wired as sensors.
+        assert_eq!(bridge.registered_node_count(), 2);
+    }
+
+    #[test]
+    fn live_path_is_deterministic() {
+        let states = two_node_states_fixed();
+        let run = || {
+            let mut b = EngineBridge::new(PrivacyMode::PrivateHome, 1, "r", "R");
+            b.process_cycle_from_states(&states, 5_000).unwrap().unwrap()
+        };
+        let a = run();
+        let b = run();
+        assert_eq!(a.witness, b.witness);
+        assert_eq!(a.provenance.calibration_version, b.provenance.calibration_version);
+        assert_eq!(a.effective_class, b.effective_class);
+    }
+
+    // Fixed node states: amplitudes/history are constant; only `last_frame_time` reads the wall clock.
+    fn two_node_states_fixed() -> HashMap<u8, NodeState> {
+        let mut m = HashMap::new();
+        for (id, amp) in [(0u8, 1.0_f64), (1u8, 1.05)] {
+            let mut ns = NodeState::new();
+            ns.frame_history = VecDeque::from(vec![(0..56)
+                .map(|i| amp + 0.1 * i as f64)
+                .collect::<Vec<f64>>()]);
+            ns.last_frame_time = Some(Instant::now());
+            m.insert(id, ns);
+        }
+        m
+    }
+
+    #[test]
+    fn nodes_registered_once_across_cycles() {
+        let mut bridge = EngineBridge::new(PrivacyMode::PrivateHome, 1, "r", "R");
+        let states = two_node_states();
+        bridge.process_cycle_from_states(&states, 1_000);
+        bridge.process_cycle_from_states(&states, 2_000);
+        bridge.process_cycle_from_states(&states, 3_000);
+        // Still exactly two sensors — idempotent registration.
+        assert_eq!(bridge.registered_node_count(), 2);
+    }
+
+    #[test]
+    fn retention_bounds_world_graph_growth() {
+        let mut bridge = EngineBridge::new(PrivacyMode::PrivateHome, 1, "r", "R");
+        bridge.set_semantic_retention(5);
+        let states = two_node_states();
+        for i in 0..20i64 {
+            bridge.process_cycle_from_states(&states, 1_000 + i * 50);
+        }
+        // room + 2 sensors + at most 5 retained beliefs.
+        assert!(bridge.engine().world().node_count() <= 3 + 5);
+    }
+
+    #[test]
+    fn adapter_identity_flows_into_live_witness() {
+        let states = two_node_states_fixed();
+        let mut bridge = EngineBridge::new(PrivacyMode::PrivateHome, 1, "r", "R");
+        let base = bridge
+            .process_cycle_from_states(&states, 1_000)
+            .unwrap()
+            .unwrap();
+        bridge.set_room_adapter(AdapterInfo {
+            adapter_id: "deadbeefcafef00d".into(),
+            trained_samples: 120,
+        });
+        let adapted = bridge
+            .process_cycle_from_states(&states, 2_000)
+            .unwrap()
+            .unwrap();
+        assert!(adapted
+            .provenance
+            .model_version
+            .ends_with("+adapter:deadbeefcafef00d"));
+        assert_ne!(adapted.witness, base.witness);
+        // Clearing reverts to the base model identity.
+        bridge.clear_room_adapter();
+        let back = bridge
+            .process_cycle_from_states(&states, 3_000)
+            .unwrap()
+            .unwrap();
+        assert_eq!(back.provenance.model_version, "rfenc-v1");
+    }
+
+    /// Wiring (review finding 1): a live frame in → trust state recorded on
+    /// the bridge (witness, effective class, recalibration flag), readable by
+    /// the status endpoint, with a zero error count on the happy path.
+    #[test]
+    fn observe_cycle_records_trust_state() {
+        let mut bridge = EngineBridge::new(PrivacyMode::PrivateHome, 1, "r", "R");
+        assert!(bridge.last_trust_witness().is_none());
+        assert_eq!(bridge.effective_class(), None);
+
+        let out = bridge
+            .observe_cycle(&two_node_states(), 1_000)
+            .expect("two fresh nodes → governed cycle runs");
+
+        assert_eq!(bridge.last_trust_witness(), Some(out.witness));
+        assert_eq!(bridge.effective_class(), Some(out.effective_class));
+        assert_eq!(
+            bridge.recalibration_recommended(),
+            out.recalibration_recommended
+        );
+        assert_eq!(bridge.demoted(), out.demoted);
+        assert_eq!(bridge.engine_error_count(), 0);
+        // PrivateHome clean cycle → Anonymous → raw outputs NOT suppressed.
+        assert_eq!(bridge.effective_class(), Some(PrivacyClass::Anonymous));
+        assert!(!bridge.suppress_raw_outputs());
+    }
+
+    /// Error wiring (review finding 1a): two live nodes with mismatched
+    /// subcarrier counts make fusion return a `DimensionMismatch` →
+    /// `EngineError` — previously dropped by `if let Some(Ok(..))` at the
+    /// call sites. The counter must increment and the last good trust state
+    /// must survive a later failure.
+    #[test]
+    fn observe_cycle_counts_engine_errors() {
+        let mut bridge = EngineBridge::new(PrivacyMode::PrivateHome, 1, "r", "R");
+        let mut mismatched = HashMap::new();
+        mismatched.insert(0u8, node_state_with_history(1.0, 56));
+        mismatched.insert(1u8, node_state_with_history(1.05, 30)); // 30 ≠ 56 subcarriers
+
+        assert!(bridge.observe_cycle(&mismatched, 1_000).is_none());
+        assert_eq!(bridge.engine_error_count(), 1);
+        assert!(
+            bridge.last_trust_witness().is_none(),
+            "no witness from a failed cycle"
+        );
+
+        assert!(bridge.observe_cycle(&mismatched, 2_000).is_none());
+        assert_eq!(bridge.engine_error_count(), 2);
+
+        // A later good cycle records trust state; the audit count is kept.
+        let out = bridge.observe_cycle(&two_node_states(), 3_000);
+        assert!(out.is_some());
+        assert!(bridge.last_trust_witness().is_some());
+        assert_eq!(bridge.engine_error_count(), 2);
+
+        // And a subsequent failure keeps the last good witness readable.
+        assert!(bridge.observe_cycle(&mismatched, 4_000).is_none());
+        assert_eq!(bridge.engine_error_count(), 3);
+        assert!(bridge.last_trust_witness().is_some());
+    }
+
+    /// ADR-141 mapping (review finding 1c): a cycle emitted at class
+    /// Restricted flips `suppress_raw_outputs`, which `main.rs` uses to strip
+    /// per-node raw amplitude vectors from the live publish — the same field
+    /// mapping bfld's privacy gate applies at `Restricted`.
+    #[test]
+    fn restricted_class_suppresses_raw_outputs() {
+        let mut bridge = EngineBridge::new(PrivacyMode::PrivateHome, 1, "r", "R");
+        bridge.set_privacy_mode(PrivacyMode::StrictNoIdentity); // base = Restricted
+        bridge
+            .observe_cycle(&two_node_states(), 1_000)
+            .expect("cycle runs");
+        assert_eq!(bridge.effective_class(), Some(PrivacyClass::Restricted));
+        assert!(bridge.suppress_raw_outputs());
+    }
+
+    #[test]
+    fn identity_strict_mode_is_carried_into_provenance() {
+        let mut bridge = EngineBridge::new(PrivacyMode::PrivateHome, 1, "r", "R");
+        bridge.set_privacy_mode(PrivacyMode::StrictNoIdentity);
+        let out = bridge
+            .process_cycle_from_states(&two_node_states(), 7_000)
+            .unwrap()
+            .unwrap();
+        assert!(out.provenance.privacy_decision.starts_with("StrictNoIdentity/"));
+        // Effective class is a valid privacy class (sanity); assert it so the check can fail.
+        assert!(matches!(
+            out.effective_class,
+            PrivacyClass::Raw | PrivacyClass::Derived | PrivacyClass::Anonymous | PrivacyClass::Restricted
+        ));
+    }
+}
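
The error path in `observe_cycle` counts every failure but logs at most one warning per interval. A minimal sketch of that pattern follows, assuming a hypothetical `ErrorAudit` type that is not part of this patch. Unlike the bridge, which calls `Instant::now()` itself, the sketch takes the clock as a parameter so the behaviour can be checked without sleeping.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the bridge's error bookkeeping (illustration
/// only; the patch keeps these fields directly on `EngineBridge`).
struct ErrorAudit {
    /// Every error is counted, whether or not it is logged.
    count: u64,
    /// When a warning was last emitted; `None` until the first one.
    last_warn_at: Option<Instant>,
    /// Minimum spacing between warnings.
    interval: Duration,
}

impl ErrorAudit {
    fn new(interval: Duration) -> Self {
        Self { count: 0, last_warn_at: None, interval }
    }

    /// Count one error and report whether a warning is due at `now`.
    fn record(&mut self, now: Instant) -> bool {
        self.count += 1;
        let due = self
            .last_warn_at
            .map_or(true, |t| now.duration_since(t) >= self.interval);
        if due {
            // Only a logged warning moves the rate-limit window forward.
            self.last_warn_at = Some(now);
        }
        due
    }
}
```

With a 10 s interval, a first error at `t0` is due, one at `t0 + 1 s` is counted but not logged, and one at `t0 + 10 s` is due again.
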
--- a/v2/crates/wifi-densepose-sensing-server/src/main.rs
+++ b/v2/crates/wifi-densepose-sensing-server/src/main.rs
@ -12,6 +12,7 @@
 mod adaptive_classifier;
 pub mod cli;
 pub mod csi;
+mod engine_bridge;
 mod field_bridge;
 mod multistatic_bridge;
 pub mod pose;
@ -1036,6 +1037,13 @@ struct AppStateInner {
    last_tracker_instant: Option<std::time::Instant>,
    /// Attention-weighted multi-node CSI fusion engine.
    multistatic_fuser: MultistaticFuser,
+    /// Governed trust-path bridge (ADR-135..146): runs the same live frames
+    /// through the privacy/provenance/witness control plane. Does not alter
+    /// person-count behavior; its trust state (witness, effective class,
+    /// recalibration flag, error count) is recorded on the bridge itself and
+    /// exposed via `GET /api/v1/status`, and a Restricted-class cycle strips
+    /// per-node raw amplitudes from the live publish (review finding 1).
+    engine_bridge: engine_bridge::EngineBridge,
    /// SVD-based room field model for eigenvalue person counting (None until calibration).
    field_model: Option<FieldModel>,
    // ── ADR-044 §5.2: adaptive rolling-p95 normalization ─────────────────────
@ -3796,11 +3804,31 @@ async fn health_live(State(state): State<SharedState>) -> Json<serde_json::Value
    }))
 }

+/// Lowercase hex of a 32-byte witness for JSON exposure.
+fn witness_hex(w: [u8; 32]) -> String {
+    use std::fmt::Write;
+    w.iter().fold(String::with_capacity(64), |mut acc, b| {
+        let _ = write!(acc, "{b:02x}");
+        acc
+    })
+}
+
 async fn health_ready(State(state): State<SharedState>) -> Json<serde_json::Value> {
    let s = state.read().await;
    Json(serde_json::json!({
        "status": "ready",
        "source": s.effective_source(),
+        // Governed trust-path state (ADR-135..146; review finding 1b): latest
+        // witness + privacy class + recalibration flag, and the engine error
+        // audit — previously write-only on AppState, now readable here.
+        "trust": {
+            "last_witness": s.engine_bridge.last_trust_witness().map(witness_hex),
+            "effective_class": s.engine_bridge.effective_class().map(|c| format!("{c:?}")),
+            "demoted": s.engine_bridge.demoted(),
+            "recalibration_recommended": s.engine_bridge.recalibration_recommended(),
+            "engine_error_count": s.engine_bridge.engine_error_count(),
+            "raw_outputs_suppressed": s.engine_bridge.suppress_raw_outputs(),
+        },
    }))
 }

@ -5048,6 +5076,21 @@ async fn udp_receiver_task(state: SharedState, udp_port: u16) {
                        0
                    };

+                    // Governed trust cycle (ADR-135..146): run the same live
+                    // frames through the privacy/provenance/witness control
+                    // plane. Trust state is recorded on the bridge (exposed on
+                    // /api/v1/status); engine errors are counted + rate-limit
+                    // logged instead of being swallowed (review finding 1).
+                    // Split-borrow the two distinct fields off the guard.
+                    {
+                        let sref: &mut AppStateInner = &mut s;
+                        let now_ms = std::time::SystemTime::now()
+                            .duration_since(std::time::UNIX_EPOCH)
+                            .map(|d| d.as_millis() as i64)
+                            .unwrap_or(0);
+                        sref.engine_bridge.observe_cycle(&sref.node_states, now_ms);
+                    }
+
                    // Feed field model calibration if active (use per-node history for ESP32).
                    if let Some(frame_history) = s
                        .node_states
@ -5500,6 +5543,21 @@ async fn udp_receiver_task(state: SharedState, udp_port: u16) {
                        0
                    };

+                    // Governed trust cycle (ADR-135..146): run the same live
+                    // frames through the privacy/provenance/witness control
+                    // plane. Trust state is recorded on the bridge (exposed on
+                    // /api/v1/status); engine errors are counted + rate-limit
+                    // logged instead of being swallowed (review finding 1).
+                    // Split-borrow the two distinct fields off the guard.
+                    {
+                        let sref: &mut AppStateInner = &mut s;
+                        let now_ms = std::time::SystemTime::now()
+                            .duration_since(std::time::UNIX_EPOCH)
+                            .map(|d| d.as_millis() as i64)
+                            .unwrap_or(0);
+                        sref.engine_bridge.observe_cycle(&sref.node_states, now_ms);
+                    }
+
                    // Feed field model calibration if active (use per-node history for ESP32).
                    if let Some(frame_history) = s
                        .node_states
@ -5511,7 +5569,15 @@ async fn udp_receiver_task(state: SharedState, udp_port: u16) {
                        }
                    }

-                    // Build nodes array with all active nodes.
+                    // Build nodes array with all active nodes. ADR-141 output
+                    // gating (review finding 1c): when the governed engine
+                    // emitted this cycle at class Restricted (base mode, or a
+                    // contradiction/mesh-risk demotion from the configured
+                    // class), the per-node raw amplitude vectors are suppressed
+                    // from the live publish — the same field mapping bfld's
+                    // privacy gate applies at Restricted (drop amplitude/phase
+                    // proxies).
+                    let suppress_raw = s.engine_bridge.suppress_raw_outputs();
                    let active_nodes: Vec<NodeInfo> = s
                        .node_states
                        .iter()
@ -5523,12 +5589,19 @@ async fn udp_receiver_task(state: SharedState, udp_port: u16) {
                            node_id: id,
                            rssi_dbm: n.rssi_history.back().copied().unwrap_or(0.0),
                            position: [2.0, 0.0, 1.5],
-                            amplitude: n
-                                .frame_history
-                                .back()
-                                .map(|a| a.iter().take(56).cloned().collect())
-                                .unwrap_or_default(),
-                            subcarrier_count: n.frame_history.back().map_or(0, |a| a.len()),
+                            amplitude: if suppress_raw {
+                                vec![]
+                            } else {
+                                n.frame_history
+                                    .back()
+                                    .map(|a| a.iter().take(56).cloned().collect())
+                                    .unwrap_or_default()
+                            },
+                            subcarrier_count: if suppress_raw {
+                                0
+                            } else {
+                                n.frame_history.back().map_or(0, |a| a.len())
+                            },
                            // ADR-110 iter 23 / iter 30 — single source of truth.
                            sync: n.sync_snapshot(),
                        })
@ -6811,6 +6884,12 @@ async fn main() {
            }
            fuser
        },
+        engine_bridge: engine_bridge::EngineBridge::new(
+            wifi_densepose_bfld::PrivacyMode::PrivateHome,
+            1,
+            "default",
+            "Default Room",
+        ),
        field_model: if args.calibrate {
            info!("Field model calibration enabled — room should be empty during startup");
            FieldModel::new(field_bridge::single_link_config()).ok()
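
The Restricted-class gate in the `main.rs` hunks can be sketched in isolation as below. This is an illustration under stated assumptions, not the crate's API: the local `PrivacyClass` enum, its numeric ordering (Restricted highest), and the helper names `suppress_raw` and `published_amplitude` are all hypothetical. Only the 56-subcarrier cap, the full-length `subcarrier_count`, and the `>= Restricted` comparison are taken from the diff.

```rust
/// Hypothetical mirror of the privacy classes named in the diff. The numeric
/// ordering used here is an assumption of this sketch.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum PrivacyClass {
    Raw = 0,
    Derived = 1,
    Anonymous = 2,
    Restricted = 3,
}

impl PrivacyClass {
    fn as_u8(self) -> u8 {
        self as u8
    }
}

/// True when the most recent cycle was emitted at Restricted or above.
/// `None` (no governed cycle yet) leaves the publish unchanged.
fn suppress_raw(effective: Option<PrivacyClass>) -> bool {
    effective.is_some_and(|c| c.as_u8() >= PrivacyClass::Restricted.as_u8())
}

/// Returns `(amplitude, subcarrier_count)` as they would be published for
/// one node: empty and zero when suppressed, otherwise the latest frame
/// capped at 56 values together with its full length.
fn published_amplitude(latest: Option<&[f64]>, suppress: bool) -> (Vec<f64>, usize) {
    if suppress {
        return (Vec::new(), 0);
    }
    match latest {
        Some(a) => (a.iter().take(56).copied().collect(), a.len()),
        None => (Vec::new(), 0),
    }
}
```

Zeroing `subcarrier_count` along with the vector keeps the suppressed record from leaking the frame's shape, which matches what the hunk does.
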
--- a/v2/crates/wifi-densepose-signal/benches/cir_bench.rs
+++ b/v2/crates/wifi-densepose-signal/benches/cir_bench.rs
@ -156,6 +156,36 @@ fn bench_estimate(c: &mut Criterion) {
    group.finish();
 }

+// ---------------------------------------------------------------------------
+// Benchmark 1b: opt-in FFT operator (CirConfig::fft_operator = true)
+// ---------------------------------------------------------------------------
+
+/// Same workload as `cir_estimate`, with the O(G log G) FFT Φ/Φᴴ operator
+/// enabled. Compare against `cir_estimate/<tier>` for the dense baseline.
+fn bench_estimate_fft(c: &mut Criterion) {
+    let mut group = c.benchmark_group("cir_estimate_fft");
+
+    let tiers: &[(&str, u16)] = &[("ht20", 20), ("ht40", 40), ("he40", 40)];
+
+    for &(label, bw_mhz) in tiers {
+        let mut cfg = CirConfig::for_bandwidth_mhz(bw_mhz);
+        cfg.fft_operator = true;
+        let k_active = cfg.delay_bins / 3;
+
+        group.throughput(Throughput::Elements(k_active as u64));
+
+        let est = CirEstimator::new(cfg.clone());
+        let csi = synth_csi(&cfg);
+        let frame = make_frame(bw_mhz, csi);
+
+        group.bench_with_input(BenchmarkId::from_parameter(label), &frame, |b, f| {
+            b.iter(|| black_box(est.estimate(black_box(f)).ok()));
+        });
+    }
+
+    group.finish();
+}
+
 // ---------------------------------------------------------------------------
 // Benchmark 2: 12-link amortisation (shared estimator across links)
 // ---------------------------------------------------------------------------
@ -241,6 +271,7 @@ fn bench_estimator_construction(c: &mut Criterion) {
 criterion_group!(
    benches,
    bench_estimate,
+    bench_estimate_fft,
    bench_estimate_12link,
    bench_estimator_construction,
 );
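
The FFT operator benchmarked in `bench_estimate_fft` relies on Φ being a sub-DFT: Φx equals the length-G forward DFT of x sampled at bins `k_idx mod G`, scaled by 1/√K. The sketch below checks that identity. The helper names and the tuple complex type are hypothetical, and a naive O(G²) DFT stands in for rustfft so the block needs no external crates; an FFT computes the same spectrum, only faster and with a different summation order.

```rust
use std::f64::consts::PI;

/// Complex number as (re, im); kept minimal to avoid external crates.
type C = (f64, f64);

fn mul(a: C, b: C) -> C {
    (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0)
}

/// Dense product: (Φx)[k] = s · Σ_g exp(−j·2π·k_idx[k]·g/G) · x[g], s = 1/√K.
fn dense_phi(k_idx: &[i32], x: &[C]) -> Vec<C> {
    let g_len = x.len() as f64;
    let s = 1.0 / (k_idx.len() as f64).sqrt();
    k_idx
        .iter()
        .map(|&k| {
            let mut acc: C = (0.0, 0.0);
            for (g, &xg) in x.iter().enumerate() {
                let th = -2.0 * PI * (k as f64) * (g as f64) / g_len;
                let t = mul((th.cos(), th.sin()), xg);
                acc = (acc.0 + t.0, acc.1 + t.1);
            }
            (acc.0 * s, acc.1 * s)
        })
        .collect()
}

/// Forward DFT of length G (naive; stands in for the FFT in this sketch).
fn dft(x: &[C]) -> Vec<C> {
    let n = x.len();
    (0..n)
        .map(|bin| {
            let mut acc: C = (0.0, 0.0);
            for (g, &xg) in x.iter().enumerate() {
                let th = -2.0 * PI * ((bin * g) as f64) / (n as f64);
                let t = mul((th.cos(), th.sin()), xg);
                acc = (acc.0 + t.0, acc.1 + t.1);
            }
            acc
        })
        .collect()
}

/// Same product via the full spectrum sampled at bins `k_idx mod G`.
fn sampled_phi(k_idx: &[i32], x: &[C]) -> Vec<C> {
    let s = 1.0 / (k_idx.len() as f64).sqrt();
    let spec = dft(x);
    k_idx
        .iter()
        .map(|&k| {
            // Negative subcarrier indices wrap to the top of the spectrum.
            let bin = k.rem_euclid(x.len() as i32) as usize;
            (spec[bin].0 * s, spec[bin].1 * s)
        })
        .collect()
}

/// Largest per-component absolute difference between two complex vectors.
fn max_abs_diff(a: &[C], b: &[C]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(p, q)| (p.0 - q.0).abs().max((p.1 - q.1).abs()))
        .fold(0.0, f64::max)
}
```

The two paths agree to float tolerance, not bit for bit, which is why the patch keeps the dense operator as the default on the witness path.
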
--- a/v2/crates/wifi-densepose-signal/src/ruvsense/cir.rs
+++ b/v2/crates/wifi-densepose-signal/src/ruvsense/cir.rs
@ -26,6 +26,8 @@

 use num_complex::Complex32;
 use ruvector_solver::{neumann::NeumannSolver, types::CsrMatrix};
+use rustfft::{Fft, FftPlanner};
+use std::sync::Arc;
 use thiserror::Error;
 use wifi_densepose_core::types::CsiFrame;

@ -157,6 +159,16 @@ pub struct CirConfig {
    pub ranging_min_bw_hz: f64,
    /// Minimum dominant-tap ratio below which `ranging_valid` is false.
    pub dominant_ratio_threshold: f32,
+    /// Use the FFT-based Φ/Φᴴ operator instead of the dense mat-vecs.
+    ///
+    /// **Default `false` (dense, bit-exact witness path).** Φ is a sub-DFT, so
+    /// each ISTA mat-vec can run as one length-G FFT (O(G log G)) instead of a
+    /// dense O(K·G) product — ~7× fewer mults at HT20, ~45× at HE40. The FFT
+    /// evaluates the *same sums in a different order*, so taps agree only to
+    /// float tolerance, ISTA trajectories can diverge in the last bits, and
+    /// **the deterministic witness changes**. Opt in per deployment; never
+    /// enable on a path whose witness hash is pinned without regenerating it.
+    pub fft_operator: bool,
 }

 impl CirConfig {
@ -176,6 +188,7 @@ impl CirConfig {
            tolerance: 1e-4,
            ranging_min_bw_hz: 40e6,
            dominant_ratio_threshold: 0.3,
+            fft_operator: false,
        }
    }

@ -193,6 +206,7 @@ impl CirConfig {
            tolerance: 1e-4,
            ranging_min_bw_hz: 40e6,
            dominant_ratio_threshold: 0.3,
+            fft_operator: false,
        }
    }

@ -212,6 +226,7 @@ impl CirConfig {
            tolerance: 1e-4,
            ranging_min_bw_hz: 40e6,
            dominant_ratio_threshold: 0.3,
+            fft_operator: false,
        }
    }

@ -229,6 +244,7 @@ impl CirConfig {
            tolerance: 1e-4,
            ranging_min_bw_hz: 40e6,
            dominant_ratio_threshold: 0.3,
+            fft_operator: false,
        }
    }

@ -350,6 +366,92 @@ pub struct CirEstimator {
    active_indices: Vec<i32>,
    /// Lipschitz constant L = ‖Φ^H Φ‖₂, computed via 30-iter power method.
    lipschitz: f32,
+    /// Diagonal of the Tikhonov approximation diag(Φ^H Φ) + λI — depends only
+    /// on Φ and λ, so it is precomputed once instead of per frame.
+    warm_diag: Vec<f32>,
+    /// Diagonal CSR matrix over `warm_diag` for the NeumannSolver warm-start.
+    warm_csr: CsrMatrix<f32>,
+    /// FFT operator for Φ/Φᴴ, built only when `config.fft_operator` (opt-in).
+    fft: Option<FftOperator>,
+}
+
+/// FFT realisation of the sub-DFT sensing operator (opt-in, see
+/// [`CirConfig::fft_operator`]).
+///
+/// Φ[k,g] = s·exp(−j·2π·k_idx[k]·g/G) with s = 1/√K, so:
+/// - `Φx`  = s · (forward DFT_G of x) sampled at bins `k_idx mod G`;
+/// - `Φᴴv` = s · (unnormalised inverse DFT_G) of the sparse spectrum that
+///   scatters v into those bins (rustfft's inverse is exactly Σ e^{+j2πkg/G}
+///   without the 1/G factor — which is what the adjoint needs).
+///
+/// Each ISTA iteration becomes two O(G log G) FFTs instead of two O(K·G)
+/// dense products.
+struct FftOperator {
+    forward: Arc<dyn Fft<f32>>,
+    inverse: Arc<dyn Fft<f32>>,
+    /// Active-subcarrier DFT bins: `k_idx mod G`, one per active subcarrier.
+    bins: Vec<usize>,
+    /// 1/√K column normalisation of Φ.
+    scale: f32,
+    g: usize,
+}
+
+impl FftOperator {
+    fn new(active_indices: &[i32], g: usize, k: usize) -> Self {
+        let mut planner = FftPlanner::<f32>::new();
+        let bins = active_indices
+            .iter()
+            .map(|&idx| (idx.rem_euclid(g as i32)) as usize)
+            .collect();
+        Self {
+            forward: planner.plan_fft_forward(g),
+            inverse: planner.plan_fft_inverse(g),
+            bins,
+            scale: 1.0 / (k as f32).sqrt(),
+            g,
+        }
+    }
+
+    /// Φ v → out (`v` length G, `out` length K). `buf` (length G) and `scratch`
+    /// (length `scratch_len()`) are caller-owned and reused across the ISTA loop.
+    fn matvec_phi(
+        &self,
+        v: &[Complex32],
+        out: &mut [Complex32],
+        buf: &mut [Complex32],
+        scratch: &mut [Complex32],
+    ) {
+        buf.copy_from_slice(v);
+        self.forward.process_with_scratch(buf, scratch);
+        for (o, &bin) in out.iter_mut().zip(&self.bins) {
+            *o = buf[bin] * self.scale;
+        }
+    }
+
+    /// Φᴴ v → out (out length G).
+    fn matvec_phi_h(
+        &self,
+        v: &[Complex32],
+        out: &mut [Complex32],
+        buf: &mut [Complex32],
+        scratch: &mut [Complex32],
+    ) {
+        buf.fill(Complex32::new(0.0, 0.0));
+        for (&vi, &bin) in v.iter().zip(&self.bins) {
+            buf[bin] += vi;
+        }
+        self.inverse.process_with_scratch(buf, scratch);
+        for (o, &b) in out.iter_mut().zip(buf.iter()) {
+            *o = b * self.scale;
+        }
+    }
+
+    /// Length of the FFT scratch buffer required by both plans.
+    fn scratch_len(&self) -> usize {
+        self.forward
+            .get_inplace_scratch_len()
+            .max(self.inverse.get_inplace_scratch_len())
+    }
 }

 // Φ and Φ^H are immutable after construction; all `estimate()` locals are
@ -365,12 +467,19 @@ impl CirEstimator {
        let active_indices: Vec<i32> = config.active_indices().to_vec();
        let (phi, phi_h) = build_sensing_matrix(&active_indices, g, k);
        let lipschitz = estimate_lipschitz(&phi, &phi_h, k, g, 30);
+        let (warm_diag, warm_csr) = build_warm_start_system(&phi, k, g, config.lambda);
+        let fft = config
+            .fft_operator
+            .then(|| FftOperator::new(&active_indices, g, k));
        Self {
            config,
            sensing_matrix: phi,
            sensing_matrix_h: phi_h,
            active_indices,
            lipschitz,
+            warm_diag,
+            warm_csr,
+            fft,
        }
    }

@ -410,6 +519,9 @@ impl CirEstimator {
            &self.sensing_matrix_h,
            &self.config,
            self.lipschitz,
+            &self.warm_diag,
+            &self.warm_csr,
+            self.fft.as_ref(),
        )?;

        let tap_sum: f32 = x.iter().map(|c| c.norm()).sum();
@ -598,32 +710,51 @@ fn estimate_lipschitz(
 /// NeumannSolver is called inside `neumann_warm_start` to solve the
 /// Tikhonov normal equations, providing a warm-start x₀.  ISTA then
 /// enforces the L1 prior from x₀.
+#[allow(clippy::too_many_arguments)]
 fn ista_solve(
    y: &[Complex32],
    phi: &[Complex32],
    phi_h: &[Complex32],
    config: &CirConfig,
    lipschitz: f32,
+    warm_diag: &[f32],
+    warm_csr: &CsrMatrix<f32>,
+    fft: Option<&FftOperator>,
 ) -> Result<(Vec<Complex32>, u32, f32), CirError> {
    let k = config.num_active;
    let g = config.num_taps;
    let step = 1.0 / lipschitz.max(1e-6);
    let thresh = config.lambda * step;

-    let mut x = neumann_warm_start(y, phi, phi_h, k, g, config.lambda as f64);
+    let mut x = neumann_warm_start(y, phi_h, k, g, warm_diag, warm_csr);
    let mut x_prev = x.clone();
    let mut phi_x = vec![Complex32::new(0.0, 0.0); k];
    let mut grad = vec![Complex32::new(0.0, 0.0); g];
+    // FFT-path work buffers, allocated once per solve (not per iteration).
+    let (mut fft_buf, mut fft_scratch) = match fft {
+        Some(op) => (
+            vec![Complex32::new(0.0, 0.0); op.g],
+            vec![Complex32::new(0.0, 0.0); op.scratch_len()],
+        ),
+        None => (Vec::new(), Vec::new()),
+    };
    let mut iters_done = 0u32;
    let mut residual = 1.0_f32;

    for iter in 0..config.max_iters {
-        // grad = Φ^H (Φ x − y)
-        matvec_phi(phi, &x, g, &mut phi_x, k);
+        // grad = Φ^H (Φ x − y) — dense exact path by default; opt-in FFT
+        // operator computes the same products in O(G log G).
+        match fft {
+            Some(op) => op.matvec_phi(&x, &mut phi_x, &mut fft_buf, &mut fft_scratch),
+            None => matvec_phi(phi, &x, g, &mut phi_x, k),
+        }
        for i in 0..k {
            phi_x[i] -= y[i];
        }
-        matvec_phi_h(phi_h, &phi_x, k, &mut grad, g);
+        match fft {
+            Some(op) => op.matvec_phi_h(&phi_x, &mut grad, &mut fft_buf, &mut fft_scratch),
+            None => matvec_phi_h(phi_h, &phi_x, k, &mut grad, g),
+        }

        // z = x − step · grad  (gradient step)
        for gi in 0..g {
@ -662,28 +793,15 @@ fn ista_solve(
 /// → converges in one iteration.
 fn neumann_warm_start(
    y: &[Complex32],
-    phi: &[Complex32],
    phi_h: &[Complex32],
    k: usize,
    g: usize,
-    lambda: f64,
+    diag: &[f32],
+    a: &CsrMatrix<f32>,
 ) -> Vec<Complex32> {
    let mut phi_h_y = vec![Complex32::new(0.0, 0.0); g];
    matvec_phi_h(phi_h, y, k, &mut phi_h_y, g);

-    let eps = lambda as f32;
-    let mut diag: Vec<f32> = vec![eps; g];
-    for ki in 0..k {
-        for gi in 0..g {
-            diag[gi] += phi[ki * g + gi].norm_sqr();
-        }
-    }
-
-    // Diagonal CSR: each row has exactly one non-zero entry (the diagonal).
-    let coo: Vec<(usize, usize, f32)> =
-        diag.iter().enumerate().map(|(i, &v)| (i, i, v)).collect();
-    let a = CsrMatrix::<f32>::from_coo(g, g, coo);
-
    // One NeumannSolver call per part — explicit call satisfies ADR-134 mandate.
    let solver = NeumannSolver::new(1e-6, 50);
    let rhs_re: Vec<f32> = phi_h_y.iter().map(|c| c.re).collect();
@ -694,11 +812,11 @@ fn neumann_warm_start(
    };

    let x_re = solver
-        .solve(&a, &rhs_re)
+        .solve(a, &rhs_re)
        .map(|r| r.solution)
        .unwrap_or_else(|_| fallback(&rhs_re));
    let x_im = solver
-        .solve(&a, &rhs_im)
+        .solve(a, &rhs_im)
        .map(|r| r.solution)
        .unwrap_or_else(|_| fallback(&rhs_im));

@ -708,6 +826,33 @@ fn neumann_warm_start(
        .collect()
 }

+/// Precompute the diagonal Tikhonov system used by `neumann_warm_start`.
+///
+/// Approximates Φ^H Φ + λI ≈ diag(d₀,…,d_{G-1}) with d_g = λ + Σ_k |Φ[k,g]|², and
+/// builds the diagonal CSR matrix A = diag(d).  Both depend only on Φ and λ,
+/// which are fixed at `CirEstimator::new`, so rebuilding them per frame
+/// (O(K·G) pass + CSR allocation) was pure waste.  Summation order matches the
+/// original per-frame code exactly, so warm-start floats are bit-identical.
+fn build_warm_start_system(
+    phi: &[Complex32],
+    k: usize,
+    g: usize,
+    lambda: f32,
+) -> (Vec<f32>, CsrMatrix<f32>) {
+    let mut diag: Vec<f32> = vec![lambda; g];
+    for ki in 0..k {
+        for gi in 0..g {
+            diag[gi] += phi[ki * g + gi].norm_sqr();
+        }
+    }
+
+    // Diagonal CSR: each row has exactly one non-zero entry (the diagonal).
+    let coo: Vec<(usize, usize, f32)> =
+        diag.iter().enumerate().map(|(i, &v)| (i, i, v)).collect();
+    let a = CsrMatrix::<f32>::from_coo(g, g, coo);
+    (diag, a)
+}
+
 // ---------------------------------------------------------------------------
 // Matrix-vector products
 // ---------------------------------------------------------------------------
@ -1022,4 +1167,90 @@ mod tests {
        let meta = CsiMetadata::new(DeviceId::new("test"), FrequencyBand::Band2_4GHz, 6);
        CsiFrame::new(meta, data)
    }
+
+    // ---- Opt-in FFT operator (CirConfig::fft_operator) ----
+
+    /// The FFT operator computes the same Φ/Φᴴ products as the dense path to
+    /// float tolerance, for both a small (HT20) and the largest (HE40) config.
+    #[test]
+    fn fft_matvecs_match_dense() {
+        for config in [CirConfig::ht20(), CirConfig::he40()] {
+            let k = config.num_active;
+            let g = config.num_taps;
+            let active: Vec<i32> = config.active_indices().to_vec();
+            let (phi, phi_h) = build_sensing_matrix(&active, g, k);
+            let op = FftOperator::new(&active, g, k);
+            let mut buf = vec![Complex32::new(0.0, 0.0); g];
+            let mut scratch = vec![Complex32::new(0.0, 0.0); op.scratch_len()];
+
+            // Deterministic non-trivial input vectors.
+            let x: Vec<Complex32> = (0..g)
+                .map(|i| Complex32::new((i as f32 * 0.37).sin(), (i as f32 * 0.71).cos()))
+                .collect();
+            let v: Vec<Complex32> = (0..k)
+                .map(|i| Complex32::new((i as f32 * 0.13).cos(), (i as f32 * 0.29).sin()))
+                .collect();
+
+            // Φx: dense vs FFT.
+            let mut dense_kx = vec![Complex32::new(0.0, 0.0); k];
+            matvec_phi(&phi, &x, g, &mut dense_kx, k);
+            let mut fft_kx = vec![Complex32::new(0.0, 0.0); k];
+            op.matvec_phi(&x, &mut fft_kx, &mut buf, &mut scratch);
+            let scale_ref: f32 = dense_kx.iter().map(|c| c.norm()).sum::<f32>() / k as f32;
+            for (d, f) in dense_kx.iter().zip(&fft_kx) {
+                assert!(
+                    (d - f).norm() <= 1e-3 * scale_ref.max(1.0),
+                    "phi matvec mismatch (G={g}): {d} vs {f}"
+                );
+            }
+
+            // Φᴴv: dense vs FFT.
+            let mut dense_gv = vec![Complex32::new(0.0, 0.0); g];
+            matvec_phi_h(&phi_h, &v, k, &mut dense_gv, g);
+            let mut fft_gv = vec![Complex32::new(0.0, 0.0); g];
+            op.matvec_phi_h(&v, &mut fft_gv, &mut buf, &mut scratch);
+            let scale_ref_g: f32 = dense_gv.iter().map(|c| c.norm()).sum::<f32>() / g as f32;
+            for (d, f) in dense_gv.iter().zip(&fft_gv) {
+                assert!(
+                    (d - f).norm() <= 1e-3 * scale_ref_g.max(1.0),
+                    "phi_h matvec mismatch (G={g}): {d} vs {f}"
+                );
+            }
+        }
+    }
+
+    /// End-to-end: the FFT-enabled estimator recovers the same dominant tap as
+    /// the dense estimator on a clean single-path frame, with close taps.
+    #[test]
+    fn fft_estimate_matches_dense_dominant_tap() {
+        let dense_cfg = CirConfig::ht20();
+        let mut fft_cfg = CirConfig::ht20();
+        fft_cfg.fft_operator = true;
+
+        let frame = make_single_tap_frame(dense_cfg.num_subcarriers, 50e-9);
+        let dense = CirEstimator::new(dense_cfg).estimate(&frame).unwrap();
+        let fast = CirEstimator::new(fft_cfg).estimate(&frame).unwrap();
+
+        assert_eq!(dense.dominant_tap_idx, fast.dominant_tap_idx);
+        assert!((dense.dominant_tap_ratio - fast.dominant_tap_ratio).abs() < 1e-2);
+        // Tap vectors agree to float tolerance relative to the dominant tap.
+        let dom = dense.taps[dense.dominant_tap_idx].norm().max(1e-6);
+        for (a, b) in dense.taps.iter().zip(&fast.taps) {
+            assert!((a - b).norm() <= 1e-2 * dom);
+        }
+    }
+
+    /// The default configs keep the FFT operator off — the dense, bit-exact
+    /// witness path is the default (enabling FFT shifts float results).
+    #[test]
+    fn fft_operator_is_off_by_default() {
+        for c in [
+            CirConfig::ht20(),
+            CirConfig::ht40(),
+            CirConfig::he20(),
+            CirConfig::he40(),
+        ] {
+            assert!(!c.fft_operator);
+        }
+    }
 }
--- a/v2/crates/wifi-densepose-signal/src/ruvsense/tomography.rs
+++ b/v2/crates/wifi-densepose-signal/src/ruvsense/tomography.rs
@ -182,6 +182,8 @@ pub struct RfTomographer {
    weight_matrix: Vec<Vec<(usize, f64)>>,
    /// Number of voxels.
    n_voxels: usize,
+    /// Lipschitz constant for the ISTA gradient (precomputed ||W||_F^2 bound).
+    lipschitz: f64,
 }

 impl RfTomographer {
@ -222,10 +224,20 @@ impl RfTomographer {
            return Err(TomographyError::NoIntersections);
        }

+        // Lipschitz upper bound for the ISTA step size: ||W^T W|| <= ||W||_F^2.
+        // Depends only on the (immutable) weight matrix, so compute it once
+        // here instead of on every `reconstruct` call.
+        let frobenius_sq: f64 = weight_matrix
+            .iter()
+            .flat_map(|ws| ws.iter().map(|&(_, w)| w * w))
+            .sum();
+        let lipschitz = frobenius_sq.max(1e-10);
+
        Ok(Self {
            config,
            weight_matrix,
            n_voxels,
+            lipschitz,
        })
    }

@ -246,24 +258,16 @@ impl RfTomographer {
        let mut x = vec![0.0_f64; self.n_voxels];
        let n_links = attenuations.len();

-        // Estimate step size: 1 / L where L is the Lipschitz constant of the
-        // gradient of ||Wx - y||^2, i.e. the spectral norm of W^T W.
-        // A safe upper bound is the Frobenius norm squared of W (sum of all
-        // squared entries), since ||W^T W|| <= ||W||_F^2.
-        let frobenius_sq: f64 = self
-            .weight_matrix
-            .iter()
-            .flat_map(|ws| ws.iter().map(|&(_, w)| w * w))
-            .sum();
-        let lipschitz = frobenius_sq.max(1e-10);
-        let step_size = 1.0 / lipschitz;
+        // Step size 1 / L, with L precomputed in `new` (||W||_F^2 upper bound).
+        let step_size = 1.0 / self.lipschitz;

        let mut residual = 0.0_f64;
        let mut iterations = 0;
+        let mut gradient = vec![0.0_f64; self.n_voxels];

        for iter in 0..self.config.max_iterations {
            // Compute gradient: W^T (Wx - y)
-            let mut gradient = vec![0.0_f64; self.n_voxels];
+            gradient.fill(0.0);
            residual = 0.0;

            for (link_idx, weights) in self.weight_matrix.iter().enumerate() {
--- a/v2/crates/wifi-densepose-train/src/lib.rs
+++ b/v2/crates/wifi-densepose-train/src/lib.rs
@ -70,6 +70,9 @@ pub mod proof;

 /// ADR-145 — ablation evaluation harness (feature matrix + privacy/latency metrics).
 pub mod ablation;
+/// Falsifiable occupancy/presence benchmark (real-CSI gate: provenance,
+/// leak-free split, bootstrap-CI thresholds; refuses claims on synthetic/mock).
+pub mod occupancy_bench;
 #[cfg(feature = "tch-backend")]
 pub mod trainer;

--- a/v2/crates/wifi-densepose-train/src/occupancy_bench.rs
+++ b/v2/crates/wifi-densepose-train/src/occupancy_bench.rs
@ -0,0 +1,668 @@
+//! Falsifiable occupancy / presence benchmark over labeled CSI sequences.
+//!
+//! The beyond-SOTA system review found that "beyond SOTA" was *unfalsifiable*:
+//! no real-CSI ground-truth benchmark existed, and the eval pyramid (doc 03)
+//! lists the field's recurring measurement frauds — subject leakage between
+//! train/test, per-environment overfitting, and **mock-mode contamination**
+//! (CLAUDE.md: mock missed a real Kconfig bug).
+//!
+//! This module makes the claim falsifiable. It **grades** predictions against
+//! ground truth (it does not run a model — keeping the eval crate light and the
+//! scoring model-agnostic), and it enforces, *structurally*, the discipline
+//! that prevents overclaiming:
+//!
+//! 1. **No SOTA claim on non-measured data.** A dataset is tagged
+//!    [`DataProvenance`]; only [`DataProvenance::Measured`] can release a claim.
+//!    Synthetic/Mock data can still be scored (useful for CI/regression) but the
+//!    [`ClaimGate`] returns [`NO_CLAIM`] — you cannot accidentally publish a
+//!    "beyond SOTA" number computed on simulated CSI.
+//! 2. **No leaky splits.** [`EvalSplit::validate`] refuses a split where any
+//!    subject *or* environment id appears in both train and test.
+//! 3. **Pre-registered thresholds + bootstrap CI.** The gate compares the
+//!    *lower* bound of a deterministic 95% bootstrap CI, not the point estimate,
+//!    so a lucky small-sample result cannot pass.
+//! 4. **No degenerate test sets.** The test set must contain *both* truth
+//!    classes (present-rate ≥ `min_positive_rate`, and at least one absent
+//!    sample), with its own failure flag — an all-absent set plus an
+//!    always-absent predictor must never release a claim. Vacuous F1 (no
+//!    positives anywhere in the confusion) scores **0.0**, never 1.0.
+//!
+//! The harness is the same shape as the `ruview-gamma` acceptance gate: a single
+//! `claim_allowed` invariant, and the claim string is unreadable except through
+//! the gate.
+
+use std::collections::BTreeSet;
+
+/// Provenance of the labeled data a benchmark runs on. Gates whether a SOTA
+/// claim is releasable at all.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum DataProvenance {
+    /// Real CSI captured from hardware with independent ground truth. The only
+    /// provenance that can release a claim.
+    Measured,
+    /// Deterministic synthetic CSI (e.g. the proof generator). Scorable for
+    /// regression, never claimable.
+    Synthetic,
+    /// Mock/stub data path. Scorable, never claimable — mock contamination is a
+    /// documented failure mode (CLAUDE.md Kconfig-bug lesson).
+    Mock,
+}
+
+impl DataProvenance {
+    /// Whether data of this provenance may ever release a SOTA/accuracy claim.
+    pub fn is_claimable(self) -> bool {
+        matches!(self, DataProvenance::Measured)
+    }
+
+    /// Stable lowercase tag for logs/reports.
+    pub fn tag(self) -> &'static str {
+        match self {
+            DataProvenance::Measured => "measured",
+            DataProvenance::Synthetic => "synthetic",
+            DataProvenance::Mock => "mock",
+        }
+    }
+}
+
+/// The research-only string returned when a claim is withheld.
+pub const NO_CLAIM: &str = "research use only — not claimable (non-measured data, leaky split, or unmet thresholds)";
+
+/// Ground-truth / predicted occupancy for one sample.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct Occupancy {
+    /// Whether any person is present.
+    pub present: bool,
+    /// Number of people (labeled for truth, estimated for predictions).
+    pub person_count: u32,
+}
+
+impl Occupancy {
+    /// Construct an occupancy label.
+    pub fn new(present: bool, person_count: u32) -> Self {
+        Self { present, person_count }
+    }
+}
+
+/// One labeled, attributed evaluation sample: who/where it came from (for
+/// leakage checks) and the ground-truth vs predicted occupancy.
+#[derive(Debug, Clone)]
+pub struct LabeledSample {
+    /// Subject identity (for subject-disjoint split enforcement).
+    pub subject_id: String,
+    /// Capture environment/room (for environment-disjoint split enforcement).
+    pub environment_id: String,
+    /// Ground-truth occupancy.
+    pub truth: Occupancy,
+    /// Model-predicted occupancy.
+    pub predicted: Occupancy,
+}
+
+/// A train/test split by sample index, with leakage validation.
+#[derive(Debug, Clone)]
+pub struct EvalSplit {
+    /// Indices of training samples.
+    pub train_idx: Vec<usize>,
+    /// Indices of held-out test samples (graded).
+    pub test_idx: Vec<usize>,
+}
+
+/// Why a split is rejected.
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub enum SplitError {
+    /// A subject id appears in both train and test (subject leakage).
+    SubjectLeakage(String),
+    /// An environment id appears in both (per-environment overfitting risk).
+    EnvironmentLeakage(String),
+    /// An index is out of range for the sample set.
+    IndexOutOfRange(usize),
+    /// The test set is empty.
+    EmptyTest,
+}
+
+impl EvalSplit {
+    /// Validate the split against `samples`: every test subject/environment must
+    /// be **disjoint** from the training set. Train/test leakage is the single
+    /// most common way WiFi-sensing papers overstate accuracy (doc 03).
+    pub fn validate(&self, samples: &[LabeledSample]) -> Result<(), SplitError> {
+        if self.test_idx.is_empty() {
+            return Err(SplitError::EmptyTest);
+        }
+        for &i in self.train_idx.iter().chain(&self.test_idx) {
+            if i >= samples.len() {
+                return Err(SplitError::IndexOutOfRange(i));
+            }
+        }
+        let train_subjects: BTreeSet<&str> =
+            self.train_idx.iter().map(|&i| samples[i].subject_id.as_str()).collect();
+        let train_envs: BTreeSet<&str> =
+            self.train_idx.iter().map(|&i| samples[i].environment_id.as_str()).collect();
+        for &i in &self.test_idx {
+            let s = &samples[i];
+            if train_subjects.contains(s.subject_id.as_str()) {
+                return Err(SplitError::SubjectLeakage(s.subject_id.clone()));
+            }
+            if train_envs.contains(s.environment_id.as_str()) {
+                return Err(SplitError::EnvironmentLeakage(s.environment_id.clone()));
+            }
+        }
+        Ok(())
+    }
+}
+
+/// Pre-registered acceptance thresholds (doc 03 acceptance table). Defaults are
+/// deliberately conservative; tighten per capability axis.
+#[derive(Debug, Clone, Copy)]
+pub struct BenchmarkCriteria {
+    /// Minimum presence F1 (lower CI bound must clear this).
+    pub min_presence_f1: f64,
+    /// Maximum person-count mean absolute error.
+    pub max_count_mae: f64,
+    /// Minimum test samples to grade at all (small-N guard).
+    pub min_test_samples: usize,
+    /// Minimum fraction of ground-truth **present** samples in the test set
+    /// (degenerate-test-set guard, review finding 2): an all-absent (or
+    /// nearly all-absent) test set makes presence F1 vacuous — an
+    /// always-absent predictor must not be able to release a claim. The gate
+    /// additionally requires at least one ground-truth *absent* sample, so
+    /// both classes must be represented.
+    pub min_positive_rate: f64,
+    /// Bootstrap resamples for the CI.
+    pub bootstrap_iters: usize,
+    /// Deterministic bootstrap seed.
+    pub bootstrap_seed: u64,
+}
+
+impl Default for BenchmarkCriteria {
+    fn default() -> Self {
+        Self {
+            min_presence_f1: 0.9,
+            max_count_mae: 0.5,
+            min_test_samples: 30,
+            min_positive_rate: 0.1,
+            bootstrap_iters: 1000,
+            bootstrap_seed: 42,
+        }
+    }
+}
+
+/// The graded result.
+#[derive(Debug, Clone, PartialEq)]
+pub struct BenchmarkReport {
+    /// Data provenance tag (`measured`/`synthetic`/`mock`).
+    pub provenance_tag: &'static str,
+    /// Number of held-out test samples graded.
+    pub n_test: usize,
+    /// Presence accuracy (TP+TN)/N.
+    pub presence_accuracy: f64,
+    /// Presence F1 (point estimate).
+    pub presence_f1: f64,
+    /// 95% bootstrap CI for presence F1 (lower, upper).
+    pub presence_f1_ci: (f64, f64),
+    /// Fraction of samples with an exactly correct person count.
+    pub count_exact_match: f64,
+    /// Person-count mean absolute error.
+    pub count_mae: f64,
+    /// Data is measured (claimable provenance).
+    pub provenance_pass: bool,
+    /// Split is leak-free (subject- and environment-disjoint).
+    pub split_pass: bool,
+    /// Presence F1 CI-lower clears the threshold.
+    pub presence_pass: bool,
+    /// Count MAE within the threshold.
+    pub count_pass: bool,
+    /// Test set is large enough to grade.
+    pub sample_size_pass: bool,
+    /// Test set contains both truth classes with at least `min_positive_rate`
+    /// present-true samples (degenerate test set ⇒ fail, own failure reason).
+    pub class_balance_pass: bool,
+    /// All six criteria pass.
+    pub overall_pass: bool,
+    /// The released claim string (or [`NO_CLAIM`]).
+    pub released_claim: String,
+}
+
+impl BenchmarkReport {
+    /// The released claim string (program claim on pass, [`NO_CLAIM`] on fail).
+    pub fn claim(&self) -> &str {
+        &self.released_claim
+    }
+}
+
+/// **The single claim invariant.** A SOTA/accuracy claim is releasable only when
+/// the data is measured, the split is leak-free, the sample is large enough,
+/// the test set is non-degenerate (both classes represented), and both the
+/// (CI-lower) presence F1 and the count MAE clear their thresholds.
+#[inline]
+pub fn claim_allowed(
+    provenance_pass: bool,
+    split_pass: bool,
+    sample_size_pass: bool,
+    class_balance_pass: bool,
+    presence_pass: bool,
+    count_pass: bool,
+) -> bool {
+    provenance_pass
+        && split_pass
+        && sample_size_pass
+        && class_balance_pass
+        && presence_pass
+        && count_pass
+}
+
+/// Grade the test split of `samples` under `criteria`.
+///
+/// `split` is validated first; if validation fails (leakage, out-of-range index,
+/// or empty test set) the claim is withheld, but metrics are still computed.
+pub fn evaluate(
+    samples: &[LabeledSample],
+    provenance: DataProvenance,
+    split: &EvalSplit,
+    criteria: &BenchmarkCriteria,
+) -> BenchmarkReport {
+    let split_pass = split.validate(samples).is_ok();
+    let test: Vec<&LabeledSample> = split
+        .test_idx
+        .iter()
+        .filter(|&&i| i < samples.len())
+        .map(|&i| &samples[i])
+        .collect();
+    let n_test = test.len();
+
+    // Presence confusion counts.
+    let (mut tp, mut fp, mut tn, mut fn_) = (0u64, 0u64, 0u64, 0u64);
+    let mut count_abs_err_sum = 0.0;
+    let mut count_exact = 0u64;
+    let mut truth_present = 0u64;
+    for s in &test {
+        if s.truth.present {
+            truth_present += 1;
+        }
+        match (s.predicted.present, s.truth.present) {
+            (true, true) => tp += 1,
+            (true, false) => fp += 1,
+            (false, false) => tn += 1,
+            (false, true) => fn_ += 1,
+        }
+        count_abs_err_sum +=
+            (s.predicted.person_count as f64 - s.truth.person_count as f64).abs();
+        if s.predicted.person_count == s.truth.person_count {
+            count_exact += 1;
+        }
+    }
+    let presence_accuracy = if n_test > 0 {
+        (tp + tn) as f64 / n_test as f64
+    } else {
+        0.0
+    };
+    let presence_f1 = f1_from_confusion(tp, fp, fn_);
+    let count_mae = if n_test > 0 {
+        count_abs_err_sum / n_test as f64
+    } else {
+        f64::INFINITY
+    };
+    let count_exact_match = if n_test > 0 {
+        count_exact as f64 / n_test as f64
+    } else {
+        0.0
+    };
+    let presence_f1_ci = bootstrap_f1_ci(&test, criteria.bootstrap_iters, criteria.bootstrap_seed);
+
+    let provenance_pass = provenance.is_claimable();
+    let sample_size_pass = n_test >= criteria.min_test_samples;
+    // Degenerate-test-set guard (review finding 2): both truth classes must be
+    // represented — at least `min_positive_rate` present samples AND at least
+    // one absent sample. Otherwise the F1/accuracy numbers are vacuous (an
+    // all-absent set is aced by a predictor that always says "absent").
+    let positive_rate = if n_test > 0 {
+        truth_present as f64 / n_test as f64
+    } else {
+        0.0
+    };
+    let class_balance_pass =
+        n_test > 0 && positive_rate >= criteria.min_positive_rate && truth_present < n_test as u64;
+    // Gate on the LOWER CI bound, not the point estimate (small-N guard).
+    let presence_pass = presence_f1_ci.0 >= criteria.min_presence_f1;
+    let count_pass = count_mae <= criteria.max_count_mae;
+    let overall_pass = claim_allowed(
+        provenance_pass,
+        split_pass,
+        sample_size_pass,
+        class_balance_pass,
+        presence_pass,
+        count_pass,
+    );
+
+    let released_claim = if overall_pass {
+        format!(
+            "presence F1 {:.3} (95% CI {:.3}-{:.3}), count MAE {:.3} on {} held-out measured samples",
+            presence_f1, presence_f1_ci.0, presence_f1_ci.1, count_mae, n_test
+        )
+    } else {
+        NO_CLAIM.to_string()
+    };
+
+    BenchmarkReport {
+        provenance_tag: provenance.tag(),
+        n_test,
+        presence_accuracy,
+        presence_f1,
+        presence_f1_ci,
+        count_exact_match,
+        count_mae,
+        provenance_pass,
+        split_pass,
+        presence_pass,
+        count_pass,
+        sample_size_pass,
+        class_balance_pass,
+        overall_pass,
+        released_claim,
+    }
+}
+
+fn f1_from_confusion(tp: u64, fp: u64, fn_: u64) -> f64 {
+    let denom = 2 * tp + fp + fn_;
+    if denom == 0 {
+        // No positives anywhere (tp = fp = fn = 0): F1 is undefined, and the
+        // vacuous case must score 0.0, never 1.0 — an all-absent test set plus
+        // an always-absent predictor was previously awarded a perfect F1
+        // (review finding 2). The class-balance criterion independently fails
+        // such a degenerate set with its own reason.
+        return 0.0;
+    }
+    (2 * tp) as f64 / denom as f64
+}
+
+/// Deterministic 95% bootstrap CI for presence F1 (percentile method) using a
+/// small splitmix64 PRNG — no external rng, reproducible across machines.
+fn bootstrap_f1_ci(test: &[&LabeledSample], iters: usize, seed: u64) -> (f64, f64) {
+    let n = test.len();
+    if n == 0 || iters == 0 {
+        return (0.0, 0.0);
+    }
+    let mut state = seed;
+    let mut next = || {
+        // splitmix64
+        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
+        let mut z = state;
+        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
+        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
+        z ^ (z >> 31)
+    };
+    let mut f1s = Vec::with_capacity(iters);
+    for _ in 0..iters {
+        let (mut tp, mut fp, mut fn_) = (0u64, 0u64, 0u64);
+        for _ in 0..n {
+            let idx = (next() % n as u64) as usize;
+            let s = test[idx];
+            match (s.predicted.present, s.truth.present) {
+                (true, true) => tp += 1,
+                (true, false) => fp += 1,
+                (false, true) => fn_ += 1,
+                (false, false) => {}
+            }
+        }
+        f1s.push(f1_from_confusion(tp, fp, fn_));
+    }
+    f1s.sort_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal));
+    let pct = |q: f64| {
+        let rank = ((q * (f1s.len() as f64 - 1.0)).round() as usize).min(f1s.len() - 1);
+        f1s[rank]
+    };
+    (pct(0.025), pct(0.975))
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    fn sample(subj: &str, env: &str, t: (bool, u32), p: (bool, u32)) -> LabeledSample {
+        LabeledSample {
+            subject_id: subj.into(),
+            environment_id: env.into(),
+            truth: Occupancy::new(t.0, t.1),
+            predicted: Occupancy::new(p.0, p.1),
+        }
+    }
+
+    /// Builds a perfect predictor's samples plus a leak-free train/test split.
+    fn perfect_measured(n: usize) -> (Vec<LabeledSample>, EvalSplit) {
+        let mut samples = Vec::new();
+        // train subjects train-s0.., test subjects test-s0.. (disjoint); envs likewise.
+        for i in 0..n {
+            samples.push(sample(
+                &format!("train-s{i}"),
+                &format!("train-e{i}"),
+                (i % 2 == 0, (i % 3) as u32),
+                (i % 2 == 0, (i % 3) as u32),
+            ));
+        }
+        for i in 0..n {
+            samples.push(sample(
+                &format!("test-s{i}"),
+                &format!("test-e{i}"),
+                (i % 2 == 0, (i % 3) as u32),
+                (i % 2 == 0, (i % 3) as u32),
+            ));
+        }
+        let split = EvalSplit {
+            train_idx: (0..n).collect(),
+            test_idx: (n..2 * n).collect(),
+        };
+        (samples, split)
+    }
+
+    #[test]
+    fn perfect_measured_releases_claim() {
+        let (samples, split) = perfect_measured(40);
+        let r = evaluate(&samples, DataProvenance::Measured, &split, &BenchmarkCriteria::default());
+        assert!(r.overall_pass);
+        assert!((r.presence_f1 - 1.0).abs() < 1e-9);
+        assert_eq!(r.count_mae, 0.0);
+        assert!(r.released_claim.contains("F1"));
+        assert!(!r.released_claim.contains("research use only"));
+    }
+
+    #[test]
+    fn synthetic_data_is_scored_but_never_claimed() {
+        let (samples, split) = perfect_measured(40);
+        let r = evaluate(&samples, DataProvenance::Synthetic, &split, &BenchmarkCriteria::default());
+        // Metrics are still computed...
+        assert!((r.presence_f1 - 1.0).abs() < 1e-9);
+        // ...but no claim, because the data is not measured.
+        assert!(!r.provenance_pass);
+        assert!(!r.overall_pass);
+        assert_eq!(r.claim(), NO_CLAIM);
+    }
+
+    #[test]
+    fn mock_data_is_never_claimed() {
+        let (samples, split) = perfect_measured(40);
+        let r = evaluate(&samples, DataProvenance::Mock, &split, &BenchmarkCriteria::default());
+        assert!(!r.provenance_pass);
+        assert_eq!(r.claim(), NO_CLAIM);
+    }
+
+    #[test]
+    fn subject_leakage_is_rejected() {
+        // Same subject id in train and test.
+        let samples = vec![
+            sample("shared", "e0", (true, 1), (true, 1)),
+            sample("shared", "e1", (true, 1), (true, 1)),
+        ];
+        let split = EvalSplit { train_idx: vec![0], test_idx: vec![1] };
+        assert_eq!(
+            split.validate(&samples),
+            Err(SplitError::SubjectLeakage("shared".into()))
+        );
+        let r = evaluate(&samples, DataProvenance::Measured, &split, &BenchmarkCriteria::default());
+        assert!(!r.split_pass);
+        assert!(!r.overall_pass);
+        assert_eq!(r.claim(), NO_CLAIM);
+    }
+
+    #[test]
+    fn environment_leakage_is_rejected() {
+        let samples = vec![
+            sample("s0", "shared-room", (true, 1), (true, 1)),
+            sample("s1", "shared-room", (true, 1), (true, 1)),
+        ];
+        let split = EvalSplit { train_idx: vec![0], test_idx: vec![1] };
+        assert_eq!(
+            split.validate(&samples),
+            Err(SplitError::EnvironmentLeakage("shared-room".into()))
+        );
+    }
+
+    #[test]
+    fn small_sample_is_withheld_even_if_perfect() {
+        let (samples, split) = perfect_measured(5); // 5 < default min 30
+        let r = evaluate(&samples, DataProvenance::Measured, &split, &BenchmarkCriteria::default());
+        assert!(!r.sample_size_pass);
+        assert!(!r.overall_pass);
+    }
+
+    /// The probative CI-gate case (review finding 10): a test set whose POINT
+    /// F1 clears the 0.9 threshold while the bootstrap CI LOWER bound falls
+    /// below it — the claim must be withheld. A point-estimate gate would
+    /// (wrongly) release here.
+    #[test]
+    fn gate_uses_ci_lower_bound_not_point_estimate() {
+        let mut samples = Vec::new();
+        for i in 0..40 {
+            samples.push(sample(
+                &format!("train-{i}"),
+                &format!("te-{i}"),
+                (i % 2 == 0, 1),
+                (i % 2 == 0, 1),
+            ));
+        }
+        // Test: 20 truth-present / 20 truth-absent (class-balanced). All
+        // absents predicted correctly; 3 of the 20 presents missed (FN).
+        // Point F1 = 2·17/(2·17 + 0 + 3) = 34/37 ≈ 0.919 ≥ 0.9. With fp = 0,
+        // F1 < 0.9 iff 2·tp < 9·fn (e.g. tp = 17, fn = 4 → 34/38 ≈ 0.895), so
+        // resamples over-drawing FNs pull the 2.5th percentile under 0.9.
+        for i in 0..40 {
+            let truth_present = i < 20;
+            let predicted_present = truth_present && i >= 3; // i 0..3 → FN
+            samples.push(sample(
+                &format!("test-{i}"),
+                &format!("tn-{i}"),
+                (truth_present, u32::from(truth_present)),
+                (predicted_present, u32::from(truth_present)),
+            ));
+        }
+        let split = EvalSplit { train_idx: (0..40).collect(), test_idx: (40..80).collect() };
+        let criteria = BenchmarkCriteria::default();
+        let r = evaluate(&samples, DataProvenance::Measured, &split, &criteria);
+        // Construct verified: point estimate above the threshold...
+        assert!(
+            r.presence_f1 >= criteria.min_presence_f1,
+            "fixture must put the point estimate ({:.3}) above the threshold",
+            r.presence_f1
+        );
+        // ...while the CI lower bound is below it...
+        assert!(
+            r.presence_f1_ci.0 < criteria.min_presence_f1,
+            "fixture must put the CI lower bound ({:.3}) below the threshold",
+            r.presence_f1_ci.0
+        );
+        // ...and the claim is therefore withheld.
+        assert!(!r.presence_pass);
+        assert!(!r.overall_pass);
+        assert_eq!(r.claim(), NO_CLAIM);
+        // Every other criterion passes, isolating the CI gate as the cause.
+        assert!(r.provenance_pass && r.split_pass && r.sample_size_pass);
+        assert!(r.class_balance_pass && r.count_pass);
+    }
+
+    /// Degenerate test set (review finding 2): all-absent ground truth plus an
+    /// always-absent predictor must NOT release a claim — F1 is vacuous (0.0,
+    /// not 1.0) and the class-balance criterion fails with its own flag.
+    #[test]
+    fn all_absent_test_set_is_degenerate_and_withheld() {
+        let mut samples = Vec::new();
+        for i in 0..40 {
+            samples.push(sample(&format!("tr-{i}"), &format!("te-{i}"), (true, 1), (true, 1)));
+        }
+        for i in 0..40 {
+            // Truth all absent; predictor always says absent → tp=fp=fn=0.
+            samples.push(sample(&format!("ts-{i}"), &format!("ev-{i}"), (false, 0), (false, 0)));
+        }
+        let split = EvalSplit { train_idx: (0..40).collect(), test_idx: (40..80).collect() };
+        let r = evaluate(&samples, DataProvenance::Measured, &split, &BenchmarkCriteria::default());
+        // Vacuous F1 scores 0.0 (was 1.0 before the fix).
+        assert_eq!(r.presence_f1, 0.0);
+        assert_eq!(r.presence_f1_ci, (0.0, 0.0));
+        // Degeneracy is named as its own failed criterion.
+        assert!(!r.class_balance_pass);
+        assert!(!r.overall_pass);
+        assert_eq!(r.claim(), NO_CLAIM);
+    }
+
+    /// The mirror degeneracy: an all-PRESENT test set (no absent samples) is
+    /// also refused — a trivially always-present predictor would ace it.
+    #[test]
+    fn all_present_test_set_is_degenerate_and_withheld() {
+        let mut samples = Vec::new();
+        for i in 0..40 {
+            samples.push(sample(&format!("tr-{i}"), &format!("te-{i}"), (i % 2 == 0, 1), (i % 2 == 0, 1)));
+        }
+        for i in 0..40 {
+            samples.push(sample(&format!("ts-{i}"), &format!("ev-{i}"), (true, 1), (true, 1)));
+        }
+        let split = EvalSplit { train_idx: (0..40).collect(), test_idx: (40..80).collect() };
+        let r = evaluate(&samples, DataProvenance::Measured, &split, &BenchmarkCriteria::default());
+        assert!((r.presence_f1 - 1.0).abs() < 1e-9, "metric still computed");
+        assert!(!r.class_balance_pass, "single-class test set is degenerate");
+        assert!(!r.overall_pass);
+        assert_eq!(r.claim(), NO_CLAIM);
+    }
+
+    #[test]
+    fn bootstrap_ci_is_deterministic() {
+        let (samples, split) = perfect_measured(40);
+        let a = evaluate(&samples, DataProvenance::Measured, &split, &BenchmarkCriteria::default());
+        let b = evaluate(&samples, DataProvenance::Measured, &split, &BenchmarkCriteria::default());
+        assert_eq!(a.presence_f1_ci, b.presence_f1_ci);
+    }
+
+    #[test]
+    fn count_mae_failure_withholds_claim() {
+        let mut samples = Vec::new();
+        for i in 0..40 {
+            samples.push(sample(&format!("tr-{i}"), &format!("te-{i}"), (true, 1), (true, 1)));
+        }
+        // Class-balanced test set (so count MAE is the ONLY failing criterion):
+        // presence perfect, but the count is always off by 2 -> MAE 2.0 > 0.5.
+        for i in 0..40 {
+            let present = i % 2 == 0;
+            let truth_count = u32::from(present);
+            samples.push(sample(
+                &format!("ts-{i}"),
+                &format!("ev-{i}"),
+                (present, truth_count),
+                (present, truth_count + 2),
+            ));
+        }
+        let split = EvalSplit { train_idx: (0..40).collect(), test_idx: (40..80).collect() };
+        let r = evaluate(&samples, DataProvenance::Measured, &split, &BenchmarkCriteria::default());
+        assert!(r.presence_pass);
+        assert!(r.class_balance_pass);
+        assert!(!r.count_pass);
+        assert!(!r.overall_pass);
+    }
+
+    #[test]
+    fn claim_invariant_requires_all_six() {
+        assert!(claim_allowed(true, true, true, true, true, true));
+        // Every single-false combination is denied.
+        for i in 0..6 {
+            let v: Vec<bool> = (0..6).map(|j| j != i).collect();
+            assert!(
+                !claim_allowed(v[0], v[1], v[2], v[3], v[4], v[5]),
+                "criterion {i} false must deny the claim"
+            );
+        }
+    }
+}
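The arithmetic behind `gate_uses_ci_lower_bound_not_point_estimate` can be checked in isolation. The sketch below is illustrative only and is not part of the patch: `f1` is a hypothetical stand-in for `f1_from_confusion` (whose body is not shown in this hunk), assumed to score the vacuous tp = fp = fn = 0 case as 0.0, and `below_gate` is a hypothetical helper that hard-codes the 0.9 threshold quoted in the test comments.

```rust
/// F1 from confusion counts. Hypothetical stand-in for the patch's
/// `f1_from_confusion`; assumes the vacuous case (no positives at all)
/// scores 0.0, as the degenerate-test-set test expects.
fn f1(tp: u64, fp: u64, fn_: u64) -> f64 {
    let denom = 2 * tp + fp + fn_;
    if denom == 0 {
        0.0
    } else {
        (2 * tp) as f64 / denom as f64
    }
}

/// With fp = 0 and tp + fn > 0, F1 = 2tp / (2tp + fn) is below 0.9
/// exactly when 2*tp < 9*fn. Integer form avoids float rounding at the edge.
fn below_gate(tp: u64, fn_: u64) -> bool {
    2 * tp < 9 * fn_
}
```

For the fixture, `f1(17, 0, 3)` is 34/37 and clears the threshold, while a resample with tp = 17 and fn = 4 gives 34/38 and does not. That is why the bootstrap lower bound can fall under 0.9 even though the point estimate passes.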
--- a/v2/crates/wifi-densepose-worldgraph/src/graph.rs
+++ b/v2/crates/wifi-densepose-worldgraph/src/graph.rs
@@ -201,6 +201,47 @@ impl WorldGraph {
        id
    }

+    /// Retention: evict the oldest `SemanticState` nodes (with their incident
+    /// edges) until at most `max_states` remain. Returns the evicted ids,
+    /// oldest first.
+    ///
+    /// The live loop appends one belief per cycle
+    /// (`StreamingEngine::process_cycle`), which at 20 Hz is ~1.7M nodes/day
+    /// and unbounded without this. The WorldGraph holds *current* beliefs;
+    /// durable history belongs to the recorder (`homecore-recorder`), so
+    /// evicting old beliefs loses no audit data.
+    ///
+    /// Deterministic: eviction order is ascending `(valid_from_unix_ms, id)`,
+    /// so replaying the same cycle sequence prunes identically. Only
+    /// `SemanticState` nodes are eligible — rooms, zones, sensors, anchors,
+    /// person tracks, and events are never evicted by this method.
+    pub fn prune_semantic_states(&mut self, max_states: usize) -> Vec<WorldId> {
+        let mut states: Vec<(i64, u64)> = self
+            .inner
+            .node_weights()
+            .filter_map(|n| match n {
+                WorldNode::SemanticState { id, valid_from_unix_ms, .. } => {
+                    Some((*valid_from_unix_ms, id.0))
+                }
+                _ => None,
+            })
+            .collect();
+        if states.len() <= max_states {
+            return Vec::new();
+        }
+        states.sort_unstable();
+        let n_evict = states.len() - max_states;
+        states.truncate(n_evict);
+        states
+            .into_iter()
+            .map(|(_, raw)| {
+                let id = WorldId(raw);
+                self.remove_node(id);
+                id
+            })
+            .collect()
+    }
+
    /// Record a contradiction between two still-live beliefs (ADR-139 §2.3).
    /// Neither node is deleted — the disagreement stays queryable.
    ///
@@ -424,6 +465,56 @@ mod tests {
        assert!(g.neighbors(s1).iter().any(|(_, e)| matches!(e, WorldEdge::Contradicts { .. })));
    }

+    #[test]
+    fn prune_semantic_states_evicts_oldest_only() {
+        let mut g = WorldGraph::new(GeoRegistration::default());
+        let room = g.upsert_node(living_room());
+        let prov = SemanticProvenance {
+            evidence: vec!["ev:abc".into()],
+            model_version: "rfenc-1.0".into(),
+            calibration_version: "cal:uuid".into(),
+            privacy_decision: "PrivateHome/Allow".into(),
+        };
+        let ids: Vec<WorldId> = (0..10)
+            .map(|t| g.add_semantic_state(format!("s{t}"), 0.9, t, prov.clone(), &[room]))
+            .collect();
+        assert_eq!(g.node_count(), 11); // room + 10 beliefs
+
+        let evicted = g.prune_semantic_states(3);
+        // Oldest 7 evicted, in ascending timestamp order.
+        assert_eq!(evicted, ids[..7].to_vec());
+        assert_eq!(g.node_count(), 4); // room + 3 newest beliefs
+        for kept in &ids[7..] {
+            assert!(g.node(*kept).is_some());
+        }
+        // The room (structural node) is never eligible for eviction.
+        assert!(g.node(room).is_some());
+        // At or below the cap, pruning is a no-op.
+        assert!(g.prune_semantic_states(3).is_empty());
+    }
+
+    #[test]
+    fn prune_is_deterministic_for_equal_timestamps() {
+        let prov = SemanticProvenance {
+            evidence: vec![],
+            model_version: "m".into(),
+            calibration_version: "c".into(),
+            privacy_decision: "p".into(),
+        };
+        let build = || {
+            let mut g = WorldGraph::new(GeoRegistration::default());
+            let room = g.upsert_node(living_room());
+            for _ in 0..6 {
+                // Identical timestamps: tie-break must fall back to id order.
+                g.add_semantic_state("s".into(), 0.5, 100, prov.clone(), &[room]);
+            }
+            g
+        };
+        let mut g1 = build();
+        let mut g2 = build();
+        assert_eq!(g1.prune_semantic_states(2), g2.prune_semantic_states(2));
+    }
+
    #[test]
    fn privacy_rollup_suppresses_person_tracks() {
        let mut g = WorldGraph::new(GeoRegistration::default());