This commit is contained in:
ruvnet 2026-06-11 20:24:37 +00:00
parent 931d961940
commit 46da04e483
6 changed files with 1370 additions and 0 deletions

View File

@ -0,0 +1,165 @@
# RuView System Review — Capability Audit (Beyond-SOTA Series, Doc 00)
**Date:** 2026-06-09
**Scope:** The RuView product surface (ADR-031) and the 38-crate Rust workspace under `v2/crates/` that implements it, plus the ADR corpus (`docs/adr/`, 150 numbered ADRs) and the prior research corpus (`docs/research/sota-2026-05-22/`).
**Method:** Direct reads of `lib.rs`/`mod.rs` and key ADRs; static test counts via `grep -r '#[test]'` / `#[tokio::test]` per crate (counts are *static occurrences in source*, not CI pass counts). No metrics in this document are estimated — everything cited was read or measured in the working tree.
---
## 1. Executive Summary — What RuView IS Today
RuView is **not a crate**. Per ADR-136 §2.1 (`docs/adr/ADR-136-ruview-streaming-engine-frame-contracts.md`), RuView is the sensing-first *product surface and brand* (ADR-031, status: Proposed) layered on the existing `wifi-densepose-*` / `homecore*` / `cog-*` workspace. ADR-136 explicitly **rejects** a `ruview_*` crate rename and pins a normative ten-role mapping (ingest / signal / fusion / world / models / privacy / store / api / eval / observe) onto the existing crates.
What concretely exists:
1. **A deep, heavily-tested signal-processing layer.** `wifi-densepose-signal` contains 473 static `#[test]` occurrences, including a 22-file `ruvsense/` bounded context (`v2/crates/wifi-densepose-signal/src/ruvsense/`) implementing the ADR-029 six-stage multistatic pipeline plus ADR-030/032a/134/135/137/138/142/143 extensions (~14,000 lines, 330 in-module tests measured by per-file grep).
2. **A trust-traceable composition root.** `wifi-densepose-engine` (`src/lib.rs`, 752 lines, 11 tests) wires fusion quality (ADR-137), array coordination (ADR-138), evolution change-points (ADR-142), RF-SLAM anchors (ADR-143), the WorldGraph (ADR-139), and the BFLD privacy control plane (ADR-141) into one `StreamingEngine::process_cycle` (`lib.rs:285`) that emits a `TrustedOutput` (`lib.rs:80`) carrying evidence + model version + calibration version + privacy decision + a BLAKE3 witness (`lib.rs:437`).
3. **A privacy layer with structural invariants.** `wifi-densepose-bfld` (20 modules, 369 tests) implements ADR-118123/141: raw BFI never exits the node (I1), identity embeddings are RAM-only (I2), cross-site identity correlation is cryptographically impossible (I3) — stated at `wifi-densepose-bfld/src/lib.rs:7-11`.
4. **A Home-Assistant-class world/state layer.** `homecore` + 9 sibling crates (state machine, event bus, plugins, automation, REST/WS API, recorder, HAP bridge, assist) — explicitly a "P1 scaffold" per `homecore/src/lib.rs:7` with deferred items listed at `lib.rs:24-31`.
5. **A drone-swarm extension.** `ruview-swarm` (17 modules, ~9,000 lines in subdirectories, 115 + 19 async tests), ADR-148 self-reports ~98% complete with the remaining 15% of M3 gated on real ESP32-S3 hardware (`ADR-148:940-953`).
6. **A large prior research corpus.** The 2026-05-22 autonomous SOTA loop: 41 ticks, 19 research threads (R1R20), 22 numpy reference implementations, 7 ADRs, and a 6-tier production roadmap (`docs/research/sota-2026-05-22/00-summary.md`, `PRODUCTION-ROADMAP.md`).
The critical caveat, stated by the project itself: the ADR-136146 series is *"a skeleton and nervous system, not a shipping product… Most of the series is not yet wired into the live 20 Hz pipeline"* (ADR-136 §8). The engine crate's own docs confirm what is absent: *"the live 20 Hz I/O loop (sensing-server), UWB hardware (ADR-144), and model training (ADR-146)"* (`wifi-densepose-engine/src/lib.rs:27-29`).
---
## 2. Capability Matrix — Pipeline Role → Crates → Maturity
Role mapping is normative per ADR-136 §2.1; maturity is this review's judgment from code + ADR status. Test counts: static `#[test]` + `#[tokio::test]` greps (2026-06-09).
| Role | Crate(s) | Key modules | Tests (sync+async) | Maturity | Evidence |
|---|---|---|---|---|---|
| **ingest** | `wifi-densepose-sensing-server`, `wifi-densepose-hardware`, `wifi-densepose-wifiscan` | `csi.rs`, `multistatic_bridge.rs`, `tracker_bridge.rs`, ESP32 TDM | 557+14, 137, 150 | **Production** (hardware-validated per ADR-028/039) | `sensing-server/src/` has 30+ modules incl. MQTT, Matter, RVF pipeline |
| **signal** | `wifi-densepose-signal` (incl. `ruvsense/`) | 6-stage pipeline (`ruvsense/mod.rs:9-23`), `cir.rs`, `calibration.rs`, `hampel.rs`, `fresnel.rs`, `phase_sanitizer.rs` | 473 | **Production** (unit level); live multistatic wiring **beta** | §3 below; ADR-014 Accepted, ADR-029 Proposed |
| **fusion** | `ruvsense/multistatic.rs`, `ruvsense/fusion_quality.rs`, `wifi-densepose-ruvector/src/viewpoint/` | `MultistaticFuser`, `QualityScore`, `CrossViewpointAttention`, GDI/Cramér-Rao (`viewpoint/geometry.rs`) | 20 (multistatic.rs), 3 (fusion_quality.rs), 136 (ruvector crate) | **Beta** — tested building blocks, composed only in `wifi-densepose-engine` tests | `viewpoint/mod.rs:1-30`; engine `lib.rs:317-319` |
| **world** | `homecore`, `wifi-densepose-worldgraph`, `wifi-densepose-geo`, `wifi-densepose-worldmodel` | `StateMachine`, `EventBus`, `WorldGraph` (rooms/sensors/person-tracks/semantic states), ENU geo registration | 9+11, 7, 16+1, 12+1 | **Beta** — homecore is explicit "P1 scaffold"; persistence/service dispatch deferred to P2 | `homecore/src/lib.rs:7, 24-31`; ADR-127 Proposed |
| **models** | `cog-pose-estimation`, `cog-person-count`, `wifi-densepose-nn`, `wifi-densepose-train`, `wifi-densepose-occworld-candle` | ONNX/Candle inference, training pipeline, OccWorld bridge | 7, 15, 30+1, 312, 12 | **Experimental** — no trained RF foundation encoder exists; ADR-147 benchmarked OccWorld with **random weights** | `ADR-147-benchmark-proof.md` ("random weights — pre-domain-fine-tuning baseline"); ADR-146/150 Proposed |
| **privacy** | `wifi-densepose-bfld` | `privacy_gate.rs`, `privacy_mode.rs` (mode registry + hash-chained attestation), `identity_risk.rs`, `signature_hasher.rs`, `embedding_ring.rs` | 369 | **Beta** — strongest-tested layer, but lib header still says "Status: P1 in progress" (`lib.rs:12`, stale vs 20 implemented modules) | ADR-118123, 141 all Proposed |
| **store** | `homecore-recorder` | trajectory/event recording | 8+12 | **Experimental** | ADR-136 §2.1 |
| **api** | `homecore-api`, `homecore-server`, `cog-ha-matter`, `homecore-hap` | REST/WS, HA discovery, Matter, HomeKit | 7+11, 0, 63+1, 15+2 | **Experimental→Beta** (`homecore-server` has zero tests) | ADR-130/125/115 Proposed |
| **eval** | `wifi-densepose-train/src/ablation.rs`, `ruview-swarm/src/evals/` | ablation harness (ADR-145), swarm eval suite (ADR-149) | included in 312 / 115 | **Experimental** — ADR-145 self-labels "skeleton/scaffolding, mostly not yet on the live 20 Hz path" | `ablation.rs` exists; ADR-149 (swarm benchmarking) Accepted |
| **observe** | `homecore-automation`, `homecore-assist` | automation engine, assistant/Ruflo bridge | 20+14, 3+20 | **Experimental** | ADR-129/133 Proposed |
| **(integration root)** | `wifi-densepose-engine` | `StreamingEngine`, `TrustedOutput`, privacy demotion, witness | 11 | **Beta** — the only crate that proves cross-role composition; not on a live I/O path | `engine/src/lib.rs:1-29, 457-751` |
| **(swarm)** | `ruview-swarm` | Raft/gossip topology, RRT-APF planning, Candle PPO MARL, CSI sensing payload, failsafe, Ruflo | 115+19 | **Experimental/simulation** — M3 needs real ESP32-S3 hardware | ADR-148:940-953 ("Overall ~98%", M3 85%) |
| **(adjacent)** | `nvsim`, `nvsim-server`, `ruv-neural`, `wifi-densepose-wasm-edge`, `wifi-densepose-mat`, `wifi-densepose-vitals` | NV-diamond sim, neural lib, WASM edge, MAT disaster tool, vitals | 50, 0, 364, 643, 165+9, 52 | Mixed — `mat`/`vitals`/`wasm-edge` mature unit-wise | crate listing |
**Workspace totals (measured):** 3,890 `#[test]` + 121 `#[tokio::test]` static occurrences across `v2/crates/`. (CLAUDE.md's "1,031+ tests" figure refers to an earlier `cargo test --workspace` run count; this review did not execute the suite.)
External vendored runtimes also present: `vendor/rvcsi` (ADR-095/096 edge RF runtime, own repo), `vendor/ruvector`, `vendor/midstream`, `vendor/sublinear-time-solver`.
---
## 3. Signal-Processing Capability Inventory — `ruvsense/`
Location: `v2/crates/wifi-densepose-signal/src/ruvsense/`. CLAUDE.md says "16 modules"; the directory now contains **22 `.rs` files** (21 modules + `mod.rs`) — the table below is the ground truth. Lines/tests measured per file (2026-06-09).
| Module | Lines | Tests | ADR | What it does |
|---|---:|---:|---|---|
| `mod.rs` | 510 | 14 | 029 | Pipeline shell, COCO-17 keypoint constants, `RuvSensePipeline` (concrete fields + `tick()`), re-exports |
| `multiband.rs` | 442 | 14 | 029 | Channel-hopping CSI → wideband virtual snapshot per node (`MultiBandCsiFrame`) |
| `phase_align.rs` | 460 | 13 | 029 | LO phase-offset estimation via circular mean + `ruvector-solver::NeumannSolver` |
| `multistatic.rs` | 957 | 20 | 029 | Attention-weighted N-node fusion → `FusedSensingFrame`; timestamp-spread guards |
| `coherence.rs` | 474 | 19 | 029 | Per-subcarrier z-score coherence vs rolling template; `DriftProfile` |
| `coherence_gate.rs` | 380 | 17 | 029 | Accept / PredictOnly / Reject / Recalibrate gate decisions |
| `pose_tracker.rs` | 1,577 | 38 | 029/026/082 | 17-keypoint Kalman tracker, lifecycle state machine, AETHER re-ID embeddings, skeleton constraints, temporal keypoint attention |
| `field_model.rs` | 1,417 | 22 | 030 | SVD room eigenstructure (persistent field model), perturbation extraction |
| `tomography.rs` | 751 | 12 | 030 | RF tomography, ISTA L1 voxel solver |
| `longitudinal.rs` | 1,020 | 20 | 030 | Welford long-horizon stats, biomechanics drift detection |
| `intention.rs` | 511 | 12 | 030 | Pre-movement lead signals (200500 ms) |
| `cross_room.rs` | 626 | 13 | 030 | Environment fingerprinting + room-transition graph |
| `gesture.rs` | 579 | 14 | 030 | DTW template-matching gesture classifier |
| `adversarial.rs` | 586 | 13 | 030/032 | Physically-impossible-signal detection, multi-link consistency |
| `attractor_drift.rs` | 566 | 15 | 032a | Midstream-enhanced attractor drift detection |
| `temporal_gesture.rs` | 540 | 15 | 032a | Midstream temporal gesture stream |
| `cir.rs` | 1,025 | 10 | 134 | CSI→CIR via ISTA L1 sparse recovery, NeumannSolver warm-start, `Complex32` sub-DFT Φ |
| `calibration.rs` | 717 | 8 | 135 | Empty-room baseline (Welford amplitude + von Mises phase), drift-triggered recalibration |
| `fusion_quality.rs` | 188 | 3 | 137 | `QualityScore` with `EvidenceRef`s, `ContradictionFlag`s, `CalibrationId`, privacy-demotion predicate |
| `array_coordinator.rs` | 343 | 5 | 138 | Clock-quality gating + `DirectionalEvidence` (geometric admission) |
| `evolution.rs` | 406 | 7 | 142 | Cross-link change-point detection, Bayesian `TemporalVoxelMap` (privacy-gated) |
| `rf_slam.rs` | 301 | 6 | 143 | Persistent reflector discovery → static anchor learning (Wall/Furniture/Mobile classes) |
Subtotal: ~14,400 lines, 310 tests inside `ruvsense/` alone. The non-ruvsense signal layer adds Hampel filtering, CSI-ratio, phase sanitisation, Fresnel modeling, BVP, spectrograms, subcarrier selection, and hardware normalisation (`signal/src/*.rs`).
**Cross-viewpoint fusion** (`wifi-densepose-ruvector/src/viewpoint/`, 5 files): scaled dot-product attention with geometric bias (`attention.rs`), Geometric Diversity Index + Cramér-Rao bounds (`geometry.rs`), phase-phasor coherence with hysteresis + clock-quality gate (`coherence.rs`), and the `MultistaticArray` aggregate root (`fusion.rs`). 136 tests crate-wide.
---
## 4. The Trust Chain — What Actually Composes Today
`wifi-densepose-engine/src/lib.rs` is the proof-of-composition. One `process_cycle` (`lib.rs:285-368`):
1. ADR-138 array coordination (only if every node's geometry is registered, `lib.rs:372-389`)
2. ADR-137 `fuse_scored_calibrated` with **per-node calibration epochs** — mismatching `CalibrationId`s raise a contradiction (`lib.rs:304-319`)
3. ADR-142 change-point → WorldGraph `Event` node (`lib.rs:393-430`)
4. ADR-141 monotonic privacy demotion on any contradiction (`demote_one`, `lib.rs:452-455`)
5. ADR-139/140 `SemanticState` with mandatory provenance (evidence ‖ model ‖ calibration ‖ privacy decision) (`lib.rs:336-352`)
6. BLAKE3 witness over the trust decision (`witness_of`, `lib.rs:437-448`)
The 11 engine tests verify exactly the right invariants: full provenance flow (`cycle_carries_full_provenance`, `lib.rs:487`), contradiction→demotion (`lib.rs:517`), determinism (`lib.rs:535`), calibration-mismatch→Restricted+stable-witness (`lib.rs:648`), privacy-mode attestation chain (`lib.rs:741`), and persist→reload round-trip with **no raw RF in the snapshot** (`live_frame_to_reload_same_contents`, `lib.rs:696-736`).
This is genuinely strong design. But all inputs are synthetic `MultiBandCsiFrame`s constructed in the test module; no ingest crate calls `StreamingEngine` yet.
---
## 5. Strengths
1. **Deterministic witness chain, end to end in design.** ADR-028 proof (`archive/v1/data/proof/verify.py` + SHA-256), ADR-119 BLAKE3 frame witnesses (`bfld/src/signature_hasher.rs`), ADR-136 `CanonicalFrame`/`ComplexSample` LE contracts, and the engine's per-cycle trust witness form a coherent auditability story few sensing systems attempt.
2. **Privacy as a control plane, not a feature.** BFLD's three structural invariants (`bfld/src/lib.rs:7-11`), hash-rotation (ADR-120), identity-risk scoring (ADR-121), mode registry with hash-chained attestations, and *monotonic* demotion wired to fusion contradictions (engine `lib.rs:327-328`) — uncertainty automatically reduces information release.
3. **Multistatic fusion with physics-grounded quality.** Attention fusion + GDI + Cramér-Rao bounds + clock-quality gating means geometry and synchronisation deficits are first-class, measurable contradiction sources rather than silent failure modes.
4. **Test density at the unit level.** 3,890 static test functions; the signal core (473), BFLD (369), and sensing-server (571) are the deepest. ruvsense files average ~14 tests/module.
5. **Honest self-assessment culture.** ADR-136 §8's "skeleton, not a shipping product" framing, ADR-147's explicit "random weights" disclosure, and homecore's in-source TODO-P2 ledger (`homecore/src/lib.rs:24-31`) make the gap analysis below mostly a matter of reading what the project already admits.
6. **A real prior research base with negative results.** The sota-2026-05-22 loop catalogued negatives by resolution path (missing-tool / architecture-error / physics-floor) and produced a ship-recipe (N=5 chest-centric placement, 100% coverage for 14 occupants) consolidated into ADR-113.
7. **Hardware path exists and was audited.** ADR-028 (Accepted) and ADR-039 (Accepted, hardware-validated) anchor the ESP32-S3/C6 ingest tier; firmware release process includes real-CSI verification on COM ports.
---
## 6. Honest Gap Analysis — ADR vs Implemented vs Integrated
| Capability | ADR status | Code status | Integrated on live path? |
|---|---|---|---|
| Six-stage ruvsense pipeline | ADR-029 **Proposed** | Implemented + tested (310 tests) | Partially — sensing-server has `multistatic_bridge.rs`/`tracker_bridge.rs`, but `RuvSensePipeline` still holds concrete fields with `tick()` only (`mod.rs`); no uniform `Stage<I,O>` chain runs live |
| Frame contracts (`ComplexSample`, provenance fields, `Stage` traits) | ADR-136 Proposed | Built + 9 acceptance tests (per ADR-136 §8, commit `11f89727f`) | **No** — AC6 600-frame replay witness key and AC7 cross-arch CI matrix not done; provenance fields not populated by live calibration/model stages |
| Fusion quality / contradictions | ADR-137 Proposed | `fusion_quality.rs` (188 lines, 3 tests) + engine wiring | Engine-tests only |
| WorldGraph digital twin | ADR-139 Proposed | `wifi-densepose-worldgraph` (4 files, 7 tests) | Engine-tests only; no recorder-backed persistence loop |
| Privacy control plane | ADR-141 Proposed | `privacy_mode.rs` registry + attestation chain, tested | Engine-tests only; MQTT/HA exposure exists in BFLD but the *engine→BFLD sink* live path is unwired |
| UWB range fusion | ADR-144 Proposed | **No hardware, no crate** — acknowledged absent (`engine/src/lib.rs:28`) | No |
| Ablation/leakage eval harness | ADR-145 Proposed | `wifi-densepose-train/src/ablation.rs` exists | Self-labelled "skeleton/scaffolding" (ADR-145 §status) |
| RF encoder multi-task heads | ADR-146 Proposed | Not trained; `model_id`/`model_version` registry unowned | No — engine stamps `rfenc-v1` as a placeholder string (`lib.rs:338`) |
| RF foundation encoder | ADR-150 **Proposed** | ADR only | No |
| World-model forecasting (OccWorld) | ADR-147 (benchmark doc) | Runs on RTX 5080, 72.39M params — **random weights**, no domain checkpoint | No |
| HomeCore HA port | ADR-125133 all Proposed | P1 scaffold + siblings; `homecore-server` has **0 tests**; persistence, service mpsc dispatch, device registry, witness integration all deferred (`homecore/src/lib.rs:24-31`) | Partially (API surfaces exist) |
| BFLD capture path (Nexmon/ESP32 BFI) | ADR-123 Proposed | rvCSI vendored runtime exists for nexmon `.pcap`; BFI-specific capture unverified in this review | Unclear |
| Drone swarm | ADR-148 In Progress | 17 modules, sim + Candle PPO complete per milestones | **Simulation only** — M3's 15% requires physical ESP32-S3 CSI capture (ADR-148:946) |
| Federation / DP-SGD / PQC | ADR-105109 Proposed | `ruview-fed` crate **does not exist** (roadmap Tier 2 item) | No |
| Antenna-placement CLI (`plan-antennas`) | ADR-113 Proposed; Roadmap Tier 1.1 HIGH | numpy references only; not found as a Rust CLI subcommand | No |
**Pattern:** the unit layer is real and deep; the *integration* layer is one crate (`wifi-densepose-engine`) exercised solely by its own synthetic tests; the *model* layer (anything learned: RF encoder, pose model fine-tuned on CSI, OccWorld domain weights) is the emptiest tier. Nearly every ADR ≥118 carries status **Proposed** even where substantial tested code exists — ADR status hygiene lags implementation in both directions (BFLD code outruns its "P1 in progress" header; ADR-148's "98%" outruns its hardware evidence).
---
## 7. Risk Register
| # | Risk | Likelihood | Impact | Evidence / Notes |
|---|---|---|---|---|
| R1 | **Integration gap**: trust chain proven only against synthetic in-test frames; live 20 Hz ingest→engine→BFLD-sink path unwired, so the headline guarantee (auditable provenance on every emission) is unverified in production conditions | High | Critical | `engine/src/lib.rs:27-29`; ADR-136 §8 |
| R2 | **No trained model**: every learned component (RF encoder ADR-146/150, OccWorld ADR-147) is random-weight or absent; sensing claims beyond coherence/occupancy heuristics cannot ship | High | Critical | ADR-147 "random weights"; ADR-146/150 Proposed |
| R3 | **Synthetic-validation bias**: ruvsense/engine/swarm tests and the sota-loop results (e.g., R3 "100% (synthetic)", ADR-113 placement numbers) are simulation-derived; real-room domain gap unquantified | High | High | `00-summary.md:45`; PRODUCTION-ROADMAP 2.3 ("turns synthetic numbers into validated numbers") |
| R4 | **Witness chain incomplete at frame level**: `CsiFrame.data` is still `serde(skip)` (ADR-136 Gap 2); AC6 replay-witness key and AC7 cross-architecture matrix not landed — deterministic replay is a design, not a property | Medium | High | ADR-136 §1.1, §8 |
| R5 | **Float nondeterminism in fusion** across thread counts could silently break the witness/replay contract once wired | Medium | High | ADR-136 §3.3 risk table (project's own assessment) |
| R6 | **Privacy bypass via unwired paths**: BFLD invariants are enforced per-module, but until the engine is the *only* route from ingest to API, a sensing-server endpoint can emit ungated state (sensing-server already has 30+ modules incl. pose/vitals APIs predating the control plane) | Medium | Critical | `sensing-server/src/` module list vs engine isolation |
| R7 | **Hardware dependence + scale**: multistatic TDMA/channel-hopping timing validated on small ESP32 sets; ADR-148 M3 explicitly blocked on real hardware; clock-quality model in engine uses a hardcoded `ClockQualityScore` (`engine/src/lib.rs:384`) | Medium | High | ADR-148:946; hardcoded 50 µs stdev |
| R8 | **ADR/doc/status drift**: 150 ADRs with near-universal "Proposed" status, stale in-source status headers (`bfld/src/lib.rs:12`), CLAUDE.md "16 ruvsense modules" vs 22 on disk, duplicate ADR numbers (two ADR-050s, two ADR-147s, two ADR-149s, ADR-052 ×2) — institutional-memory value degrades | High | Medium | `ls docs/adr/`; this review §3 |
| R9 | **Workspace breadth vs maintenance capacity**: 38 workspace crates + 4 vendored subtrees + Python archive + firmware; several crates have 0 tests (`homecore-server`, `nvsim-server`, `wifi-densepose-wasm`, `homecore-plugin-example`); bus factor appears to be ~1 | High | Medium | crate test-count table §2 |
| R10 | **Eval debt**: no end-to-end accuracy benchmark on real CSI with ground truth exists in-repo (ADR-145 harness is scaffolding; ADR-079 camera ground truth not exercised here) — "beyond SOTA" claims are currently unfalsifiable | High | High | ADR-145 status note; absence of ground-truth datasets in tree |
---
## 8. Measurement Appendix
- Test counts: `grep -r '#[test]'` / `#[tokio::test]` per crate directory, 2026-06-09. Workspace totals: 3,890 / 121. Top crates: `wasm-edge` 643, `sensing-server` 557+14, `signal` 473, `bfld` 369, `ruv-neural` 364, `train` 312, `mat` 165+9, `wifiscan` 150, `hardware` 137, `ruvector` 136, `ruview-swarm` 115+19.
- ruvsense per-file lines/tests: `wc -l` + per-file `grep -c '#[test]'` (table in §3).
- Crate inventory: `ls v2/crates/` → 38 directories.
- ADR inventory: `ls docs/adr/` → 150 numbered files (with the duplicate numbers noted in R8); `docs/adr/README.md` self-reports "45 ADRs" (stale).
- Caveats: static `#[test]` counts include `#[cfg(feature = ...)]`-gated and ignored tests; they are an upper bound on what `cargo test --workspace --no-default-features` runs. No cargo build/test was executed for this review.
*Next in series: 01+ documents should target the R1/R2/R10 axis — wiring the live path, training the RF encoder, and standing up a falsifiable real-CSI benchmark — before any "beyond SOTA" claim is made.*

View File

@ -0,0 +1,191 @@
# SOTA Landscape 2026 — The Bar a Beyond-SOTA RuView Must Clear
**Series**: ruview-beyond-sota (01)
**Date**: 2026-06-09
**Status**: Research survey / target definition
**Builds on (does not duplicate)**: `docs/research/sota-2026-05-22/00-summary.md` (physics floors, placement, privacy chain), `docs/research/BFLD/01-sota-survey.md` (beamforming-feedback leakage SOTA), `docs/research/neural-decoding/21-sota-neural-decoding-landscape.md` (sensor-fidelity framing), `docs/research/rf-topological-sensing/00-rf-topological-sensing-index.md` (mincut/topology resolution limits), ADR-150 (RF foundation encoder + measured MM-Fi campaign), ADR-147 (OccWorld benchmark proof).
## 0. Evidence legend
Every claim in this document carries one of three tags. **No RuView benchmark number in this document is invented**; all RuView numbers come from repo-internal measured artifacts.
| Tag | Meaning |
|-----|---------|
| **[V]** | Verified in this session via web search (June 2026); source linked in §8 |
| **[K]** | Training-knowledge claim (pre-2026 literature); plausible but **not re-verified** — treat as needing citation check before external publication |
| **[I]** | Internal RuView measurement or artifact (ADR, issue, witness bundle) — measured, not literature |
---
## 1. SOTA reference table per capability axis
### 1.1 Pose estimation (WiFi CSI)
| Method | Year | Metric | Dataset / protocol | Tag |
|--------|------|--------|--------------------|-----|
| DensePose From WiFi (Geng, Huang, De la Torre) | 2023 | Dense-pose UV regions from CSI, "comparable to image-based approaches" (same-layout); commonly cited AP≈43.5 / AP@50≈87.2 | 3×3 antenna, single-layout lab | exact AP numbers **[K]**; paper existence **[V]** (arXiv 2301.00250) |
| MetaFi++ (Zhou et al.) | 2023 | PCK@50 = **97.30%** same-domain real-world (MetaFi: 95.23%); drops to **81.786.5%** under stricter protocols | Own capture; protocol-sensitive | **[V]** |
| Person-in-WiFi 3D (CVPR 2024) | 2024 | End-to-end multi-person 3D; 20.4 M params, **54 FPS**; MPJPE ≈ 90100 mm on own dataset | Own multi-person dataset | FPS/params **[V]**; MPJPE range **[K]** |
| GraphPose-Fi (arXiv 2511.19105) | 2025 | SOTA on MM-Fi random split: **MPJPE 160.6 mm**, best PCK at all thresholds | MM-Fi, random split (S1) | **[V]** |
| CSDS (Electronics 14(4):756) | 2025 | Wi-Pose: PCK@5 = **0.6407**, PCK@50 = **0.8824** | Wi-Pose | **[V]** |
| PerceptAlign (arXiv 2601.12252) | 2026 | Cross-layout 3D: MPJPE **222.4 mm** (Scene 4) / **317.1 mm** (Scene 5), >54% better than prior cross-layout SOTA; in easier settings MPJPE 181.5 mm, PCK@20/50 = 44.2/79.5 | Cross-layout protocol | **[V]** |
| WiFlow (arXiv 2602.08661) | 2026 | Lightweight continuous HPE, spatio-temporal decoupling | — | **[V]** (existence; numbers not extracted) |
| **RuView / AetherArena** | 2026 | **81.63% torso-PCK@20 in-domain (random split), beating MultiFormer's 72.25%** on metric/protocol-matched MM-Fi; **leakage-free cross-subject collapses to ~11.6% torso-PCK zero-shot**; official-split harness baseline ~6365% PCK@20; **11 KB LoRA few-shot calibration → 72.5%** | MM-Fi (issue #876, ADR-150 §3) | **[I]** |
**The honest reading of the pose axis**: same-domain WiFi pose is "solved-looking" (PCK@50 in the 90s) and meaningless for deployment. The 20252026 literature has shifted to cross-layout/cross-subject protocols, where numbers collapse (PerceptAlign PCK@20 = 44.2 cross-layout **[V]**; RuView cross-subject zero-shot 11.6% **[I]**). ADR-150's measured finding — that the cross-subject gap is **subject-distribution shift, not an algorithmic gap**, and that **few-shot in-room calibration (5200 frames) closes it** — is ahead of where the published literature is: no published WiFi-pose paper we found ships a per-room ~11 KB adapter calibration mechanism. **[I]**
### 1.2 Presence / person count
| Method | Year | Metric | Tag |
|--------|------|--------|-----|
| Large-scale commodity router deployment (>10 M routers) | 2025 | **92.6% motion-detection accuracy** across diverse homes | **[V]** (ISAC survey, arXiv 2510.14358) |
| LeakyBeam (NDSS 2025) | 2025 | Occupancy through walls at 20 m from **plaintext BFI alone**: TPR 82.7%, TNR 96.7% | **[V]** (also in BFLD survey §4.2) |
| Time-Selective RNN multi-room presence (arXiv 2304.13107) | 2023 | Device-free multi-room presence from CSI | **[V]** (existence) |
| Academic person counting (05 occupants, lab) | 20202024 | typically 9097% exact-count accuracy, degrading sharply >5 people | **[K]** |
| **RuView** | 2026 | `cog-person-count` ships with calibrated uncertainty (`count_p95_low/high`); multistatic placement recipe with **100% coverage for 14 occupants at N=5 nodes (synthetic physics)** | **[I]** (sota-2026-05-22 R6.2.5, ADR-113) |
### 1.3 Vital signs (HR / BR)
| Method | Year | Metric | Tag |
|--------|------|--------|-----|
| PhaseBeat (ACM Health) | 2020 | HR median error **1.19 bpm**; BR median error **0.25 breaths/min** | **[V]** |
| MDPI Sensors 24(7):2111 non-contact HR | 2024 | HR accuracy 96.8%, **median error 0.8 bpm** | **[V]** |
| PulseFi (arXiv 2510.24744) | 2025 | Low-cost ML cardiopulmonary + **apnea** monitoring from CSI | **[V]** (existence; numbers not extracted) |
| mmWave FMCW vitals (60 GHz class) | 20232026 | HR MAE typically 13 bpm at 13 m, single subject; age-balanced reference dataset published (Sci Data 2026) | dataset **[V]**; MAE range **[K]** |
| Contactless blood pressure (WiFi-band) | — | **NEGATIVE** — below classical physics floor; recoverable only via quantum magnetometry path | **[I]** (R13/R20 arc, ADR-114) |
| **RuView** | 2026 | `wifi-densepose-vitals` (ADR-021) extracts HR/BR from ESP32 CSI; chest-centric placement gives **+27 pp coverage** for vitals cogs (synthetic) | **[I]** — **no accuracy-vs-ECG validation number exists in-repo yet; do not claim one** |
**Bar**: published single-subject, line-of-sight, 13 m WiFi HR is ~0.81.2 bpm median error **[V]**. Nobody credibly publishes multi-person, through-wall, walking-subject HR at that accuracy — that is open territory.
### 1.4 Localization (ToA / CRLB)
| Method | Year | Metric | Tag |
|--------|------|--------|-----|
| 802.11mc FTM | shipped | 12 m typical accuracy | **[V]** (FTM survey, arXiv 2509.03901) |
| 802.11az (+ 802.11bk) | released | **sub-1 m**, 160 MHz channels, secured ranging, HE-LTF repetitions | **[V]** |
| AI single-link decimeter localization | 2025 | **0.63 m average error** single-link, beating Widar2.0 / Dynamic-MUSIC | **[V]** |
| SpotFi / Chronos / Widar lineage | 20152021 | 0.41 m with multi-AP CSI AoA/ToF | **[K]** |
| **RuView** | 2026 | CRLB / Fisher-information machinery in `ruvector/src/viewpoint/geometry.rs`; tomography ISTA voxel grid; **theoretical** limits derived internally: 3060 cm at 16 nodes/1 m spacing, 8.8 cm information-theoretic dense limit | **[I]** (rf-topological-sensing doc 09 — synthetic derivations, no bench numbers) |
### 1.5 Through-wall
| Method | Year | Metric | Tag |
|--------|------|--------|-----|
| RF-Pose / RF-Pose3D (MIT, FMCW 5.47.2 GHz) | 2018 | Through-wall skeletal pose, ~specialized radar not commodity WiFi | **[K]** |
| Commodity 2.4 GHz through-wall imaging (arXiv 1903.03895) | 2019 | Coarse imaging through walls with commodity WiFi | **[V]** (existence) |
| Radio tomographic imaging (RTI) lineage | 20102013 | Through-wall tracking via RSS networks, ~0.51 m tracking error | **[V]** (papers) / error figure **[K]** |
| LeakyBeam (NDSS 2025) | 2025 | Through-wall **occupancy** at 20 m, passive, commodity | **[V]** |
| **RuView** | 2026 | RF tomography module (`tomography.rs`, ISTA L1 voxel solver) + CIR (ADR-134) exist as code; **PABS structure detection: 1,161× static / 9.36× dynamic intruder lift (synthetic)** | **[I]** |
Notably, the 20252026 web literature shows through-wall *pose* (not just presence) on commodity WiFi remains essentially where it was in 2019 — no verified commodity-WiFi through-wall pose benchmark surfaced in our searches. The frontier moved to privacy attacks (BFI) instead.
### 1.6 Identity / re-ID (capability and threat simultaneously)
| Method | Year | Metric | Tag |
|--------|------|--------|-----|
| BFId (KIT, ACM CCS 2025) | 2025 | **~99.5% (near-100%) re-ID across 197 subjects** from beamforming feedback alone, ≥5 s of BFI | **[V]** (also BFLD survey §4.1) |
| Transformer CSI identification | 2025 | **99.82%** on stationary subjects | **[V]** |
| WhoFi (arXiv 2507.12869) | 2025 | Deep person re-ID via WiFi channel encoding, ~95% rank-1 class results | existence **[V]**; exact number **[K]** |
| Wi-Gait | 2023 | 92.9% over 10 subjects, robust to walking cofactors | **[V]** |
| **RuView** | 2026 | AETHER contrastive re-ID embeddings (ADR-024) in pose tracker; **BFLD**: first *defensive* identity-leak detector (identity_risk_score) — the literature attacks, RuView audits | **[I]** |
### 1.7 Adjacent modality: mmWave radar (the accuracy ceiling WiFi is chasing)
| Method | Year | Metric | Tag |
|--------|------|--------|-----|
| mmChainPose | 2025 | **27.0 mm MPJPE** / 0.8706 OKS on MARS (mmWave point cloud) | **[V]** |
| ProbRadarM3F (arXiv 2405.05164) | 202425 | SOTA AP across joints, probability-map fusion | **[V]** |
| Seeed MR60BHA2-class 60 GHz FMCW | shipped | Commodity $15 HR/BR/presence module — already in RuView's hardware table | **[I]** |
mmWave is ~6× better than the best WiFi MPJPE (27 mm vs 160 mm) **[V]**. The strategic implication: WiFi will not beat mmWave on raw geometry; it wins on ubiquity, cost, through-wall propagation, and standardized waveforms (§2). RuView already hedges with the ESP32-C6 + MR60BHA2 fusion node. **[I]**
---
## 2. IEEE 802.11bf — status and implications
**Status (verified)**: IEEE **802.11bf-2025 is ratified and published** (IEEE SA lists the amendment; ratification late 2024 / publication 2025) **[V]**. It amends MAC/PHY of HE (Wi-Fi 6) and EHT (Wi-Fi 7) plus DMG/EDMG (60 GHz) to support WLAN sensing in 17.125 GHz and >45 GHz bands **[V]**. The Wi-Fi Alliance has Wi-Fi Sensing as an active certification work area built on 802.11bf (presence/proximity, gestures, vital signs) **[V]**. Market reports claim >47 chipset vendors with 802.11bf-compatible programs as of early 2026 — single weak source, treat as directional **[V, low confidence]**.
**What it implies for RuView**:
1. **Sounding-on-demand becomes standard.** 802.11bf defines a sensing-measurement procedure (sensing initiator/responder, trigger-based sounding, threshold-based reporting). Today RuView relies on Espressif's vendor CSI API and Nexmon firmware patches; post-bf, commodity Wi-Fi 7 silicon will expose scheduled sensing measurements without firmware hacks. The rvCSI normalized `CsiFrame` schema is the right abstraction layer to absorb a future bf adapter (`rvcsi-adapter-*`). **[I]**
2. **The moat moves up the stack.** When every router can sense, raw CSI access stops being differentiating. Differentiators become: multistatic fusion, coherence gating / anti-hallucination, calibration mechanisms, witness-grade verification, and privacy auditing — exactly RuView's existing bets (ADR-029/135/150/028, BFLD). **[I]**
3. **Privacy pressure intensifies.** 802.11bf standardizes the capability that BFId/LeakyBeam exploit. BFLD's identity-leak detection and the ADR-105109 privacy/PQC chain become regulatory assets, not nice-to-haves. **[V]+[I]**
4. **Threshold-based reporting** in bf (report only when channel changes exceed threshold) is architecturally the same idea as RuView's coherence gate — validation that the gate belongs at the protocol layer. **[K]** (bf reporting detail from training knowledge)
---
## 3. RF foundation model landscape ("GPT for RF")
Verified 20252026 attempts, all young, none dominant:
| Model | Approach | Downstream tasks | Tag |
|-------|----------|------------------|-----|
| **LWM (Large Wireless Model)** | Pretrained on large-scale CSI → general channel embeddings | LoS/NLoS, beats raw features in low-data regimes | **[V]** |
| **LatentWave** (arXiv 2606.06373) | JEPA pretraining on wireless spectrograms + CSI | RF classification, 5G NR positioning, beam prediction, LoS/NLoS | **[V]** |
| **WirelessJEPA** (arXiv 2601.20190) | Multi-antenna spatio-temporal latent prediction | Cross-task transfer | **[V]** |
| **IQFM** | Contrastive SSL on raw I/Q | Modulation classification, beam prediction, RF fingerprinting, few-shot | **[V]** |
| **Multimodal Wireless FMs** (arXiv 2511.15162), **WMFM** (arXiv 2512.23897), **SoM** (arXiv 2506.07647) | Vision + RF multimodal for 6G ISAC | Sensing-communication integration | **[V]** |
| **DeepSig OmniSIG** | Commercial AI-native RF sensing, 500 MHz/GPU spectrum | Signal ID (LTE/5G/Wi-Fi) | **[V]** |
**Critical observation**: every verified RF foundation model targets *communication-side* tasks (beam prediction, LoS/NLoS, modulation, positioning). **None of them is a human-sensing foundation model** — none pretrains for pose/vitals/identity invariances. ADR-150's measured negative result is the sharpest data point in this space: pose-contrastive pretraining across subjects **failed on MM-Fi because the invariance is not in the data** (loss never left the ln(B) floor) **[I]**. The literature has not yet published this failure mode; the field's "GPT for RF sensing" narrative is ahead of its evidence. The defensible foundation-model objective (per ADR-150 §3.53.6) is **reduce few-shot calibration cost**, not zero-shot invariance. **[I]**
---
## 4. "Beyond SOTA" for RuView — precise definition
Targets below are **bar definitions**, not claims. RuView numbers in the "current" column are measured [I]; targets must be proven via the AetherArena witness protocol (ADR-149) before being asserted anywhere.
| Capability | Published SOTA (2026) | RuView measured today | RuView beyond-SOTA target | Key obstacle |
|------------|----------------------|----------------------|---------------------------|--------------|
| Pose, in-domain (MM-Fi) | GraphPose-Fi 160.6 mm MPJPE; MultiFormer 72.25% torso-PCK@20 **[V]** | **81.63% torso-PCK@20** (already > published) **[I]** | Hold #1 under leakage-free audit + per-joint tables published with witness rows | Protocol fragmentation; reviewers distrust WiFi-pose numbers |
| Pose, cross-subject zero-shot | ~collapse everywhere; PerceptAlign PCK@20 44.2 cross-layout **[V]** | 11.6% torso zero-shot; 6365% in-harness official split **[I]** | Stop chasing it (measured dead end); instead **few-shot frontier** below | Subject-distribution shift is in the data, not the model (ADR-150 §3.2) |
| Pose, deployment calibration | **No published per-room adapter mechanism found** | **11 KB LoRA, 100200 frames → 72.5%; cross-env K=5 → 60.1%** **[I]** | ≤20 frames → ≥70% PCK@20, adapter ≤11 KB, 30 s on-site; publish as the first calibration-service benchmark | Needs diverse-room capture fleet to validate beyond MM-Fi |
| Presence/motion (commodity) | 92.6% across 10 M routers **[V]** | Synthetic placement recipe 100% coverage N=5 **[I]** | ≥99% presence with calibrated p95 bounds on $615 ESP32 mesh, bench-validated | All placement numbers are synthetic; Tier-2.3 bench validation outstanding |
| Person count | ~9097% lab, ≤5 people **[K]** | cog ships uncertainty intervals **[I]** | Exact count 16 people ≥95% with honest intervals, multistatic, real bench | Multi-person CSI superposition; no public multi-occupancy benchmark |
| Vital signs HR | 0.81.2 bpm median, single subject, LoS, 13 m **[V]** | No in-repo ECG-validated number — **must not be claimed** | ≤1.5 bpm MAE vs ECG ground truth, *multi-person or through-wall*, witness-bundled | R13 physics floor: ~5 dB shortfall at distance; needs chest-centric placement + PABS |
| Vital signs BP | NEGATIVE at WiFi band (matches internal R13) | nvsim quantum path only **[I]** | First validated quantum-classical fused bedside vitals (ADR-114) | NV-diamond hardware maturity, 2028+ |
| Localization | 0.63 m single-link AI; sub-1 m 802.11az **[V]** | CRLB machinery, no bench number **[I]** | ≤30 cm multistatic on ESP32 mesh (internal theory says feasible at N=16) | ESP32 clock sync / phase offset (TDM protocol exists, unproven at this accuracy) |
| Through-wall | Occupancy yes (LeakyBeam); commodity pose: nothing credible **[V]** | tomography + CIR code, PABS 9.36× lift (synthetic) **[I]** | First witnessed commodity-WiFi through-wall *person localization* (not pose) ≤1 m | Wall attenuation eats the R6.1 4.7 dB multi-scatterer budget |
| Identity / re-ID | ~99.5% @ 197 subjects (attack) **[V]** | AETHER + **BFLD defensive auditing** (no published competitor) **[I]** | Ship the first identity-leak risk score with DP budget hook; keep re-ID opt-in only | Calibrating risk score at 802.11ax 4/2-bit quantization (BFLD open Q2) |
| Verification | **Nothing comparable published** — no WiFi-sensing paper ships deterministic re-verification | ADR-028 witness bundles, SHA-256 proof, 7/7 self-verify, 1,031+ tests **[I]** | Make witness-grade reproduction the *expected* standard: every public claim = one-command verification | Community adoption, not technology |
| Foundation encoder | Comms-task FMs only (LWM/JEPA family) **[V]** | Masked-CSI + coherence head planned; pose-contrastive refuted **[I]** | First *sensing* FM whose acceptance metric is calibration-sample reduction (frames-to-72% halved) | SSL must match production CSI pipeline (ADR-149 resampling risk) |
---
## 5. Where RuView already matches/exceeds published work
1. **In-domain MM-Fi pose** — 81.63% torso-PCK@20 vs MultiFormer 72.25%, metric- and protocol-matched (issue #876). **[I]**
2. **Deployment-calibration mechanism** — the 11 KB LoRA per-room adapter with measured frames-to-accuracy curves (§3.43.6 of ADR-150) has no published equivalent; the literature is still arguing about zero-shot generalization that ADR-150 measured to be a data property.
3. **Deterministic witness verification** — ADR-028's SHA-256 pipeline proof + self-verifying bundles exceeds the reproducibility practice of every WiFi-sensing paper surveyed (none ship deterministic re-verification).
4. **Multistatic cost point** — $615/node ESP32 mesh with TDM sync, channel hopping, placement recipes (ADR-113) vs literature setups using Intel 5300/AX210 laptops or USRPs; ~$30/bed vs $3,000 clinical monitor framing (R16).
5. **Defensive identity auditing (BFLD)** — the field publishes attacks (BFId, LeakyBeam, WhoFi); RuView is building the only detector/auditor, plus a PQC-hardened federation privacy chain (ADR-105109) with no published counterpart.
6. **Anti-hallucination coherence gating** — confidence gated by RF integrity (ADR-135, ADR-150 §2.4); WiFi-pose papers uniformly lack a "the model knows when the channel is bad" signal.
7. **Negative-result discipline** — physics floors (R13 BP, R6.1 4.7 dB), refuted pose-contrastive pretraining — published SOTA papers do not report these, which inflates the apparent literature bar.
## 6. Where RuView lags
1. **Bench validation** — nearly all multistatic/placement/tomography numbers are synthetic-physics; the 92.6%-on-10M-routers deployment **[V]** is real-world evidence at a scale RuView cannot approach.
2. **Vital-sign ground truth** — no in-repo ECG/respiration-belt validated HR/BR error; published work has 0.8 bpm median **[V]**. This is the most urgent claim gap.
3. **Raw geometric accuracy** — mmWave (27 mm MPJPE **[V]**) and even best-WiFi MPJPE (160.6 mm **[V]**) have no RuView MPJPE counterpart published; AetherArena reports PCK only.
4. **802.11bf-native capture** — RuView is on vendor CSI APIs and Nexmon patches; no bf sensing-procedure adapter exists yet in rvCSI.
5. **Multi-person pose** — Person-in-WiFi-3D does end-to-end multi-person at 54 FPS **[V]**; RuView's pose path is effectively single-person (multi-person exists only in count/placement work).
6. **Dataset scale and diversity** — MM-Fi only; ADR-150 §3.3 shows the binding constraint is room/device/protocol diversity, which requires the capture fleet that doesn't exist yet.
## 7. Strategic synthesis
The 2026 bar is bimodal: **lab in-domain numbers are saturated** (PCK@50 > 95%, HR < 1 bpm) and **deployment numbers are collapsed** (cross-layout PCK@20 44, zero-shot cross-subject 11%). 802.11bf-2025 commoditizes raw sensing; foundation models commoditize comms-side embeddings. "Beyond SOTA" for RuView is therefore *not* a leaderboard delta it is owning the three layers the field hasn't built: **(a)** witnessed, deterministic, leakage-audited evaluation; **(b)** the few-shot calibration service (11 KB adapters) as the deployment answer the zero-shot literature lacks; **(c)** the privacy/integrity layer (BFLD + coherence gate) that 802.11bf-era regulation will demand. Each row in §4's target table is gated on the AetherArena witness protocol a target becomes a claim only when it ships with a one-command reproduction.
---
## 8. Verified sources (accessed 2026-06-09 via web search)
Pose: [GraphPose-Fi](https://arxiv.org/html/2511.19105v1) · [PerceptAlign / cross-layout](https://arxiv.org/html/2601.12252) · [CSDS](https://www.mdpi.com/2079-9292/14/4/756) · [Person-in-WiFi 3D](https://aiotgroup.github.io/Person-in-WiFi-3D/) · [DensePose From WiFi](https://arxiv.org/abs/2301.00250) · [MetaFi++](https://www.researchgate.net/publication/369644995_MetaFi_WiFi-Enabled_Transformer-based_Human_Pose_Estimation_for_Metaverse_Avatar_Simulation) · [WiFlow](https://arxiv.org/html/2602.08661v2)
Vitals: [PhaseBeat](https://dl.acm.org/doi/abs/10.1145/3377165) · [Non-contact HR (Sensors 24:2111)](https://www.mdpi.com/1424-8220/24/7/2111) · [PulseFi](https://arxiv.org/pdf/2510.24744) · [mmWave vitals dataset (Sci Data)](https://www.nature.com/articles/s41597-026-07172-9)
Localization: [FTM survey 802.11mc/az/bk](https://arxiv.org/abs/2509.03901) · [Decimeter single-link](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12846125/) · [SelfLoc 802.11az](https://www.mdpi.com/2079-9292/14/13/2675)
802.11bf: [IEEE SA 802.11bf-2025](https://standards.ieee.org/ieee/802.11bf/11574/) · [TGbf](https://www.ieee802.org/11/Reports/tgbf_update.htm) · [NIST overview](https://www.nist.gov/publications/ieee-80211bf-enabling-widespread-adoption-wi-fi-sensing) · [Wi-Fi Alliance work areas](https://www.wi-fi.org/current-work-areas) · [ISAC survey (10M-router 92.6%)](https://arxiv.org/pdf/2510.14358)
Identity: [BFId / KIT CCS 2025 coverage](https://www.gblock.app/articles/wifi-signal-person-identification-surveillance-study-may-2026) · [WhoFi](https://arxiv.org/html/2507.12869v1) · [Wi-Gait](https://www.sciencedirect.com/science/article/abs/pii/S1389128623001962) · [LeakyBeam NDSS 2025](https://www.ndss-symposium.org/ndss-paper/lend-me-your-beam-privacy-implications-of-plaintext-beamforming-feedback-in-wifi/)
Through-wall: [RTI through-wall](https://ieeexplore.ieee.org/document/6214374/) · [Commodity 2.4 GHz imaging](https://arxiv.org/pdf/1903.03895) · [Multi-room presence](https://arxiv.org/pdf/2304.13107)
Foundation models: [LatentWave](https://arxiv.org/html/2606.06373) · [WirelessJEPA](https://arxiv.org/pdf/2601.20190) · [Multimodal Wireless FMs](https://arxiv.org/pdf/2511.15162) · [WMFM](https://arxiv.org/html/2512.23897) · [SoM](https://arxiv.org/pdf/2506.07647) · [RF-native AI / LWM, IQFM, OmniSIG](https://aicompetence.org/rf-native-ai-models-for-the-invisible-spectrum/)
mmWave: [mmChainPose](https://www.sciencedirect.com/science/article/abs/pii/S0925231225026918) · [ProbRadarM3F](https://arxiv.org/html/2405.05164v3)
Internal [I] sources: ADR-150 (§1, §3.23.6), ADR-147, ADR-028, ADR-113/114, issue #876, `docs/research/sota-2026-05-22/00-summary.md`, `docs/research/BFLD/01-sota-survey.md`, `docs/research/rf-topological-sensing/`.

View File

@ -0,0 +1,282 @@
# RuView Beyond-SOTA Target Architecture
**Series:** ruview-beyond-sota (02)
**Date:** 2026-06-09
**Status:** Research design — components marked **PROPOSED** do not exist yet; everything else cites real code.
**Governing constraint:** ADR-136 §2.1 explicitly rejects renaming/rewriting the workspace. This document designs an **evolution** of the existing 38-crate `v2/` workspace (`v2/Cargo.toml`), not a new system. Every beyond-SOTA layer attaches to the ADR-136 `Stage<I,O>` / `FrameMeta` / `CanonicalFrame` contracts (`docs/adr/ADR-136-ruview-streaming-engine-frame-contracts.md` §2.22.5) and preserves the ADR-028 witness chain.
---
## 1. Where the system is today (grounding)
The ADR-136 ten-role pipeline (ingest → signal → fusion → world → models → privacy → store → api → eval → observe) is already mapped 1:1 onto existing crates (ADR-136 §2.1, normative table). The composition root exists: `v2/crates/wifi-densepose-engine/src/lib.rs` wires ADR-135..146 blocks into one `StreamingEngine::process_cycle` that emits a `TrustedOutput` carrying fusion `QualityScore`, privacy class, `SemanticProvenance`, RF-SLAM (`RfSlam` field), and a BLAKE3 `witness: [u8; 32]`.
Key existing substrate this design builds on:
| Substrate | Path | What it gives us |
|---|---|---|
| Frame contracts + witness | `v2/crates/wifi-densepose-core/src/types.rs` (`CsiFrame`, `CsiMetadata` + `calibration_id`/`model_id`/`model_version`), ADR-136 `ComplexSample`/`CanonicalFrame` | Deterministic LE bytes, BLAKE3 witness, provenance-append-only boundary rule |
| Six-stage signal pipeline | `v2/crates/wifi-densepose-signal/src/ruvsense/mod.rs` (+22 modules incl. `cir.rs`, `calibration.rs`, `tomography.rs`, `rf_slam.rs`, `fusion_quality.rs`, `array_coordinator.rs`) | CSI→CIR, baseline calibration, multistatic fusion, coherence gating |
| Fusion quality + evidence | ADR-137; `ruvsense/multistatic.rs`, `ruvsense/fusion_quality.rs`, `wifi-densepose-ruvector/src/viewpoint/fusion.rs` | `QualityScore` with `EvidenceRef`/`ContradictionFlag`, privacy demotion on contradiction |
| Digital twin | `v2/crates/wifi-densepose-worldgraph/src/lib.rs` (typed `StableDiGraph`, mandatory `SemanticProvenance`) | Persistent room/sensor/track/belief graph |
| World model bridge | `v2/crates/wifi-densepose-worldmodel/src/lib.rs` (`OccWorldBridge`, `TrajectoryPrior`, ADR-147) | Occupancy prediction priors into the Kalman tracker |
| NN + training | `v2/crates/wifi-densepose-train/src/{model.rs,rapid_adapt.rs,ablation.rs,proof.rs,eval.rs,ruview_metrics.rs}`, `wifi-densepose-nn` | Shared backbone + 2 heads, `AdaptationLoss::ContrastiveTTT`, ADR-145 ablation matrix, seeded proof harness |
| Swarm | `v2/crates/ruview-swarm/src/` (`sensing/{multiview.rs,payload.rs,occworld_bridge.rs}`, `marl/`, `topology.rs`) | Raft/hierarchical-mesh drone coordination with CSI payload (ADR-148) |
| Edge WASM | `v2/crates/wifi-densepose-wasm-edge/src/lib.rs` (WASM3 on ESP32-S3, `on_frame` host ABI), `wifi-densepose-wasm` | Hot-loadable on-device sensing modules |
| Quantum-adjacent sim | `v2/crates/nvsim/src/lib.rs` (deterministic NV-magnetometry forward pipeline, SHA-256 witness, WASM-ready) | Honest classical-quantum hybrid substrate (ADR-089) |
| Semantic record + agents | ADR-140 (`wifi-densepose-sensing-server/src/semantic/`), `homecore-assist` | Provenance-bearing semantic states, Ruflo agent bridge |
---
## 2. Target architecture diagram
The beyond-SOTA layers (★ = new/PROPOSED, ☆ = exists-but-not-wired) wrap the ADR-136 pipeline; nothing replaces it.
```
╔═══════════════════ BEYOND-SOTA CONTROL PLANE ═══════════════════╗
║ P6 Continual adaptation loop (TTT + EWC★) P5 Swarm aperture ║
║ rapid_adapt.rs → encoder LoRA deltas planner★ (Raft) ║
╚════════════▲══════════════════════▲══════════════▲══════════════╝
│ adaptation deltas │ quality │ tasking
[ingest] [signal] │ [fusion] │ [world] │ [models]
ESP32/Pi mesh ─► RuvSensePipeline ──────┴──► fuse_scored ──────┴─► WorldGraph ┴──► RF Foundation
+ drone payload multiband→phase_align (ADR-137 (ADR-139 │ Encoder (P1)
(ruview-swarm →calibration(135) QualityScore, twin) ◄───────┘ 7 heads + UQ
sensing/payload) →cir(134)→multistatic EvidenceRef, ▲ │ (ADR-146/150)
│ →coherence→gate Contradiction) │ ▼ │
│ │ │ RF-SLAM(143)──OccWorld │
▼ ▼ │ rf_slam.rs worldmodel ▼
P7 WASM edge P2 Differentiable RF │ (P3 closed loop ☆) P4 cross-modal
inference★ forward model★ │ distilled student★
(wasm-edge, (tomography.rs + │ (camera-free deploy)
deterministic cir.rs ISTA as seed) │
replay) │ residuals feed fusion as EvidenceRef★
│ ▼
│ P8 NV-magnetometry fusion★ (nvsim forward model as a sensing node class)
─────────────────────── ADR-136 CONTRACT SPINE (unchanged) ───────────────────────────────────
CsiFrame{ComplexSample, FrameMeta{calibration_id, model_id, model_version}} → Stage<I,O>
→ CanonicalFrame::witness_hash() at EVERY stage boundary (BLAKE3, LE-deterministic)
───────────────────────────────────────────────────────────────────────────────────────────────
│ │ │ │
[privacy] [store] [api] [eval] [observe]
wifi-densepose-bfld homecore-recorder homecore-api ADR-145 ablation homecore-
gate + demotion + replay corpus★ /HA/Matter/HAP (train/ablation.rs automation,
(ADR-141) + P1-P8 variants) Ruflo (ADR-140)
```
---
## 3. The eight pillars
Each pillar: what / why beyond-SOTA / builds-on / contract sketch / feasibility. All trait sketches are **PROPOSED** unless a path is cited.
### P1 — RF Foundation Encoder with multitask uncertainty heads (ADR-146 + ADR-150)
**What.** One shared, self-supervised RF encoder (`wifi-densepose-nn`) with seven typed heads (pose, presence, count, activity, vitals, gait, identity-embedding), each emitting calibrated uncertainty via the ADR-136 `QualityScored` trait, trained with the ADR-150 pose-contrastive objective (same-pose-across-subjects = positive) plus a coherence head that exposes channel instability.
**Why beyond SOTA.** Published WiFi-pose systems (MultiFormer, GraphPose-Fi lineage) report in-domain accuracy and hallucinate under domain shift. ADR-150 documents the real measured frontier: 81.63% torso-PCK@20 in-domain on MM-Fi vs ~11.6% leakage-free cross-subject, and that DANN and bigger capacity both failed (ADR-150 §1). A foundation encoder whose loss stack explicitly separates pose / identity / room / device factors *and* emits an RF-integrity signal per prediction is not in the published literature as a deployed, auditable artifact. Target (not a claim): close the cross-subject gap materially while every head output carries `confidence_bounds()`.
**Builds on.** `v2/crates/wifi-densepose-train/src/model.rs` (`WiFiDensePoseModel`, `kp_head`/`dp_head`); `v2/crates/wifi-densepose-sensing-server/src/embedding.rs` (`ProjectionHead` + LoRA + `info_nce_loss` — the existing seventh head, ADR-146 §1.1); `v2/crates/wifi-densepose-train/src/rapid_adapt.rs` (ContrastiveTTT precedent); ADR-146 §1.4 head fan-out; ADR-150 §2 loss stack.
**Contract sketch** (lands in `wifi-densepose-nn`, per ADR-146 §1.3):
```rust
pub trait RfEncoder: Send + Sync {
fn encode(&self, window: &CsiWindowTensor) -> Embedding; // z ∈ R^d_model
fn model_id(&self) -> u16; // FrameMeta binding (ADR-136 §2.2)
}
pub trait TaskHead<O: QualityScored>: Send + Sync {
fn name(&self) -> &'static str;
fn forward(&self, z: &Embedding) -> O; // value + uncertainty bounds
}
pub struct MultiTaskOutput { /* per-head QualityScored outputs + coherence: f32 */ }
```
**Feasibility: HIGH for the architecture, MEDIUM for the headline result.** The pure-Rust f32 ABI is proven (`embedding.rs`), the head taxonomy is specified (ADR-146), and the ablation harness to measure it exists (`wifi-densepose-train/src/ablation.rs`). The risk is scientific, not engineering: ADR-150's own data shows naive approaches fail; the pose-contrastive objective is plausible but unproven at scale. Mitigation: ADR-150 §3's frozen-decoder three-variant experiment gates promotion.
### P2 — Physics-informed differentiable RF forward model (PROPOSED)
**What.** A differentiable forward model `render(scene, link_geometry) -> predicted CSI/CIR` used three ways: (1) as a regularizer in encoder training (predictions must be consistent with a Born-approximation scattering model), (2) as an analysis-by-synthesis residual at inference (`|observed rendered|` becomes an ADR-137 `EvidenceRef`), (3) as a synthetic-data generator complementing MM-Fi (ADR-015).
**Why beyond SOTA.** Published WiFi sensing is almost entirely discriminative; physics-informed neural fields exist for vision (NeRF) and acoustics but no deployed RF-human-sensing stack closes the loop *forward model → residual → fusion evidence → privacy decision*. Making physics disagreement a first-class, witnessed contradiction flag is novel system design, not just a model.
**Builds on.** The codebase already contains the seed of the forward model: `v2/crates/wifi-densepose-signal/src/ruvsense/tomography.rs` (`RfTomographer`, `LinkGeometry`, `OccupancyVolume` — a linear shadowing forward model inverted by ISTA), `ruvsense/cir.rs` (sub-DFT sensing matrix Φ, ISTA L1 — ADR-134), ADR-143 §1.3 (bistatic excess-delay geometry, the exact ray equations), and `nvsim` as the in-repo precedent for a *deterministic, witness-hashed forward physics pipeline* (`v2/crates/nvsim/src/{propagation.rs,pipeline.rs,proof.rs}`).
**Contract sketch** (new module `wifi-densepose-signal/src/ruvsense/forward_model.rs`, PROPOSED):
```rust
pub trait RfForwardModel: Versioned {
/// Predict per-link CSI given a voxel scene + body primitive set.
fn render(&self, scene: &OccupancyVolume, links: &[LinkGeometry]) -> Vec<PredictedCsi>;
/// Physics residual in [0,1]; 0 = perfectly Maxwell/Born-consistent.
fn residual(&self, observed: &CsiFrame, rendered: &PredictedCsi) -> PhysicsResidual; // → EvidenceRef
}
```
**Feasibility: MEDIUM, with one honest line drawn.** A full Maxwell FDTD-in-the-loop solver is **infeasible** at 20 Hz on this hardware and is a non-goal (§6). What is feasible: a first-order Born / ray-tracing bistatic model (the ADR-143 spheroid geometry generalized), differentiable through finite differences or a small Candle graph, validated against recorded calibration captures (ADR-135 baselines give per-link empty-room ground truth for free). "Maxwell-consistent" should be read as "consistent with a stated first-order approximation, with the approximation order recorded in the witness metadata."
### P3 — RF-SLAM × WorldGraph × OccWorld closed loop (exists in parts, wiring is the work)
**What.** Close the loop: RF-SLAM discovers reflectors/anchors → WorldGraph persists them as `object_anchor` nodes → OccWorld consumes graph occupancy → `TrajectoryPrior`s feed the Kalman tracker → improved tracks refine SLAM association. The environment model becomes self-acquiring and self-correcting (furniture moved ⇒ `BaselineTopologyChange` ⇒ recalibration trigger, ADR-143 §1.4).
**Why beyond SOTA.** Published RF-SLAM work maps *or* tracks; no published consumer system maintains a persistent, provenance-bearing, privacy-rolled-up environmental digital twin (`PrivacyRollup` in `wifi-densepose-worldgraph/src/graph.rs`) that is simultaneously the SLAM map, the automation substrate, and the audit record. The differentiator is the closed loop with evidence edges (`supports`/`contradicts`).
**Builds on.** All three vertices exist: `v2/crates/wifi-densepose-signal/src/ruvsense/rf_slam.rs` (`RfSlam::observe`, line 176, already a field of `StreamingEngine``wifi-densepose-engine/src/lib.rs:116`); `v2/crates/wifi-densepose-worldgraph/src/lib.rs`; `v2/crates/wifi-densepose-worldmodel/src/{bridge.rs,occupancy.rs}` (`worldgraph_to_occupancy`, `OccWorldBridge::predict`). The engine already upserts SLAM output and person tracks into the graph. Missing: prior-injection back into `ruvsense/pose_tracker.rs`, and the topology-change → ADR-135 recalibration edge.
**Contract sketch** (extends existing types):
```rust
impl StreamingEngine {
/// PROPOSED: inject OccWorld priors into the next tracker cycle.
pub fn apply_trajectory_priors(&mut self, priors: &[TrajectoryPrior]) -> Vec<WorldId>;
}
// WorldEdge gains (PROPOSED): PredictedBy { model_id: u16 } — prior provenance edge
```
**Feasibility: HIGH.** This is mostly integration glue between tested crates. The two real risks are already named by ADR-143: no ground-truth oracle in a live home (mitigated by the v1-fixed / v2-flagged rollout, `#[cfg(feature = "rf-slam-v2")]`), and OccWorld's Python subprocess (ADR-147: 375 ms/inference) being off the deterministic path — priors must be treated as advisory, never witness-bearing (§5).
### P4 — Cross-modal distillation: camera-teacher → RF-student, privacy-preserving deployment (PROPOSED)
**What.** Train-time-only camera supervision: a vision pose teacher labels synchronized CSI (MM-Fi already provides paired modalities, ADR-015), distilling dense pose + uncertainty into the P1 encoder. Deployed systems ship **no camera and no camera-derived identity features**; the ADR-145 privacy-leakage metric (membership-inference score in `wifi-densepose-train/src/ablation.rs`) gates that the student does not retain identity.
**Why beyond SOTA.** Camera-supervised WiFi pose is the original DensePose-WiFi recipe; what is *not* published is distillation with a measured, CI-enforced privacy-leakage budget and a witnessed claim that the deployed artifact is camera-free. The beyond-SOTA move is making "privacy-preserving" a *measured property of the release pipeline*, not a marketing adjective.
**Builds on.** `v2/crates/wifi-densepose-train/src/{trainer.rs,losses.rs,dataset.rs}` (training substrate); ADR-015 paired datasets; ADR-145 `FeatureSet` matrix + privacy-leakage scalar; `v2/crates/wifi-densepose-bfld` (`privacy_gate.rs`, `signature_hasher.rs` — runtime identity controls, ADR-120 invariants I1I3).
**Contract sketch** (in `wifi-densepose-train`, PROPOSED):
```rust
pub struct DistillationLoss { pub teacher: TeacherSource, pub temperature: f32, pub uq_transfer: bool }
pub enum TeacherSource { CachedPoseLabels(PathBuf), /* never a live camera in the serving graph */ }
/// Release gate: leakage(student) ≤ budget, asserted by the ADR-145 harness per variant.
pub struct PrivacyBudget { pub max_mia_score: f32 }
```
**Feasibility: HIGH.** All ingredients exist; the work is a loss term, a label cache format, and a CI gate. The honest caveat: MIA-based leakage scores are a lower bound on real leakage; the budget is a regression tripwire, not a formal guarantee.
### P5 — Swarm-distributed multistatic sensing with Raft-coordinated apertures (ADR-148, partially built)
**What.** Treat the drone swarm + fixed ESP32 mesh as one *reconfigurable multistatic aperture*: a Raft-elected cluster head plans node positions/channel assignments to maximize geometric diversity (GDI) for the current sensing task; per-node frames flow into the same `MultistaticFuser` path as fixed nodes.
**Why beyond SOTA.** Published multistatic WiFi sensing assumes fixed geometry. Closed-loop aperture optimization — moving the sensors to where the Fisher information is — driven by the GDI/CramérRao machinery that already exists in `v2/crates/wifi-densepose-ruvector/src/viewpoint/geometry.rs` (per CLAUDE.md module table: `GeometricDiversityIndex`, Cramér-Rao bounds) is a genuinely new system class for SAR/MAT scenarios.
**Builds on.** `v2/crates/ruview-swarm/src/sensing/{multiview.rs,payload.rs,occworld_bridge.rs}`, `topology.rs`, `planning.rs`, `marl/` (MAPPO, `candle_ppo.rs`); `ruvsense/multistatic.rs` + `array_coordinator.rs` (ADR-138 clock-quality gating — moving nodes will stress exactly this); `wifi-densepose-mat` (the MAT use case).
**Contract sketch** (in `ruview-swarm`, PROPOSED):
```rust
pub trait AperturePlanner: Send + Sync {
/// Given current twin + task, propose node placements maximizing expected GDI.
fn plan(&self, twin: &WorldGraphSnapshot, task: &SwarmTask) -> Vec<(NodeId, Position3D)>;
}
// Output flows through Raft (topology.rs) as a normal SwarmTask; frames return as ArrayNodeInput.
```
**Feasibility: MEDIUM.** Coordination, MARL, and fusion code exist and are tested; the hard physical problems are honest unknowns: airborne CSI phase stability (rotor vibration), clock sync across mobile nodes (ADR-138 gate will reject a lot initially), and ADR-148 §1.3's own regulatory scoping. Simulation-first via `ruview-swarm/src/evals.rs` + `bench_support.rs`; hardware validation is Phase 3.
### P6 — Continual / test-time adaptation with EWC-style forgetting control (PROPOSED on existing TTT)
**What.** Promote `rapid_adapt.rs` from a per-deployment trick to a managed continual-learning loop: TTT/entropy adaptation produces LoRA deltas on the P1 encoder; an EWC (elastic weight consolidation) penalty — **which does not exist in the workspace today** (no EWC match in `wifi-densepose-train/src/rapid_adapt.rs`) — anchors weights important to previously-validated environments; every adaptation step is versioned as a new `model_version` (u16, ADR-136 §2.2) and must re-pass the ADR-145 acceptance matrix before activation.
**Why beyond SOTA.** TTT papers adapt and hope; nothing published couples adaptation to a *deterministic regression gate with witness hashes*, where an adapted model that regresses tier or leaks identity is automatically rejected and the `model_version` provenance lets any semantic state be traced to the exact adaptation step.
**Builds on.** `v2/crates/wifi-densepose-train/src/rapid_adapt.rs` (`AdaptationLoss::ContrastiveTTT`, entropy-minimization variant — lines 816); LoRA adapters in `sensing-server/src/embedding.rs` (rank-4 `lora_1`/`lora_2`); ADR-027 MERIDIAN evaluator (`train/src/eval.rs`); ADR-146 §2 calibration-robustness loss.
**Contract sketch** (in `wifi-densepose-train`, PROPOSED):
```rust
pub struct EwcPenalty { pub fisher_diag: Vec<f32>, pub anchor: Vec<f32>, pub lambda: f32 }
pub struct AdaptationStep {
pub parent_model_version: u16, pub new_model_version: u16,
pub loss: AdaptationLoss, pub ewc: Option<EwcPenalty>,
pub acceptance: RuViewAcceptanceResult, // must be ≥ parent tier
pub witness: [u8; 32], // hash of delta + acceptance
}
```
**Feasibility: HIGH.** EWC over a small LoRA delta is cheap (Fisher diagonal over the replay corpus); the acceptance gate and proof seeds exist (`proof.rs`, `PROOF_SEED = 42`). Risk: online Fisher estimation from unlabeled home data is noisy — start with adaptation restricted to LoRA parameters only, backbone frozen.
### P7 — On-device WASM edge inference with deterministic replay (extends existing Tier-3)
**What.** Push P1 head subsets (presence, vitals, coarse activity) into hot-loadable WASM modules on ESP32-S3, and onto browsers/workers via `wifi-densepose-wasm`. Every edge module's output is replayable: the same `CanonicalFrame` input bytes through the same module hash produce the same output bytes, verified in CI on x86_64/aarch64/wasm32.
**Why beyond SOTA.** Edge WiFi-sensing exists; *bit-deterministic, witness-hashed edge inference with hot-swap and replay parity against the server pipeline* does not appear in published systems. It turns the edge from a trust hole into a witness-chain extension.
**Builds on.** `v2/crates/wifi-densepose-wasm-edge/src/lib.rs` (WASM3 host ABI: `csi_get_*`, `on_frame` at ~20 Hz, ADR-040 Tier 3); `nvsim` as the proof that a no-std-time, no-OS-entropy, seeded-PRNG crate runs identically on wasm32 (`nvsim/src/lib.rs` doc); ADR-136 AC7 cross-architecture byte-stability test.
**Contract sketch** (PROPOSED additions to the wasm-edge host ABI):
```rust
// exports added to module lifecycle:
// on_replay_begin(seed: u64) — pins any module-internal PRNG
// witness_digest(buf_ptr: i32) -> i32 — module returns BLAKE3 of its output stream
pub trait EdgeStage: Stage<CsiFrameView, EdgeEvent> { fn module_hash(&self) -> [u8; 32]; }
```
**Feasibility: HIGH for presence/vitals heads, LOW for full pose on-ESP32.** WASM3 interpretation on Xtensa caps throughput; full 7-head inference stays on Pi/Hailo/browser. Float determinism across native vs WASM needs care (no fast-math, fixed reduction order — same obligation ADR-136 §3.2 already accepts).
### P8 — NV-magnetometry fusion: an honest classical-quantum hybrid (PROPOSED, simulation-first)
**What.** Add `nvsim`-modeled NV-magnetometer nodes as a *fourth sensing modality class* (after CSI, mmWave/ADR-021, BFLD) in the multistatic fusion: near-range (≤ tens of cm, per the physics review) cardiac/respiratory magnetic signatures fused with CSI/mmWave vitals under the ADR-137 evidence contract. Simulation-first: the modality lands end-to-end against `nvsim` before any hardware exists.
**Why beyond SOTA.** Not range — the Ghost Murmur review (`docs/research/quantum-sensing/16-ghost-murmur-ruview-spec.md`) documents why multi-mile cardiac magnetometry contradicts published physics, and this design adopts that conclusion. The beyond-SOTA element is architectural honesty: a fusion engine that can ingest a quantum-sensor modality with explicit, witnessed physics bounds (`nvsim`'s forward model states its approximations and hashes its output, `nvsim/src/proof.rs`), so that when real NV hardware matures, the integration path and the anti-hype guardrails already exist. No published consumer sensing stack has this.
**Builds on.** `v2/crates/nvsim/src/` (scene→source→attenuation→NV ensemble→digitiser, SHA-256 witness, ADR-089); `nvsim-server`; `wifi-densepose-vitals` (mmWave HR/BR — the modality NV would cross-validate); `ruvsense/multistatic.rs` fusion + ADR-137 `EvidenceRef`.
**Contract sketch** (PROPOSED): a `SensorModality::NvMagnetometer` variant on the existing `wifi-densepose-worldgraph` `SensorModality` enum, plus an `ArrayNodeInput` adapter from `nvsim` frames; vitals agreement/disagreement between NV and mmWave becomes an `EvidenceRef`/`ContradictionFlag` pair.
**Feasibility: HIGH in simulation, SPECULATIVE on hardware.** The sim path is days of glue; COTS NV magnetometers with the required sensitivity at consumer cost do not exist in 2026. This pillar's deliverable is the *contract and the simulated validation*, explicitly labeled as such.
---
## 4. Phased implementation plan
Phases are gated by the Pre-Merge Checklist (CLAUDE.md) and the witness chain (§5). Crate names per the ADR-136 §2.1 normative map — no new `ruview_*` crates except where a crate already exists (`ruview-swarm`).
**Phase 0 — Hardening (close the ADR-136 "integration glue" debt).**
- `wifi-densepose-signal`: wire the full 600-frame `Stage`-chain replay (ADR-136 AC6) and register `streaming_engine_replay_v1` in `archive/v1/data/proof/expected_features.sha256`.
- CI: cross-architecture witness matrix x86_64/aarch64 (AC7); add wasm32 lane for `nvsim` + `wifi-densepose-wasm`.
- `wifi-densepose-engine`: populate `FrameMeta.calibration_id`/`model_id` from the live calibration and model-binding stages (currently defaulted — ADR-136 §8).
- `homecore-recorder`: define the **replay corpus** format (canonical-bytes frame streams + witness manifest) that P4/P6 training and all ablations consume.
**Phase 1 — Encoder + measurement (P1, P4 groundwork, P6 skeleton).**
- `wifi-densepose-nn`: `RfEncoder`/`TaskHead` traits, seven-head fan-out, UQ layer (ADR-146); relocate `ProjectionHead` from `sensing-server/src/embedding.rs`.
- `wifi-densepose-train`: `ContrastiveBatcher`, ADR-150 loss stack, distillation loss + cached-teacher format (P4), `EwcPenalty` + `AdaptationStep` (P6); extend `ablation.rs` `FeatureSet` with per-head and per-pillar variants; pin `expected_ablation_*.sha256`.
- Run the ADR-150 three-variant frozen-decoder experiment; promotion gate on cross-subject delta.
**Phase 2 — Closed loop + edge (P3, P7).**
- `wifi-densepose-engine`: `apply_trajectory_priors` (OccWorld → `pose_tracker.rs`); `PredictedBy` provenance edge in `wifi-densepose-worldgraph`; topology-change → ADR-135 recalibration trigger.
- `wifi-densepose-wasm-edge`: replay ABI (`on_replay_begin`, `witness_digest`), presence/vitals head modules; parity test vs server pipeline on identical canonical bytes.
- Enable `rf-slam-v2` feature on the 7-day validation dataset (ADR-143 gate).
**Phase 3 — Frontier (P2, P5, P8).**
- `wifi-densepose-signal/src/ruvsense/forward_model.rs`: Born/ray forward model seeded from `tomography.rs`; `PhysicsResidual``EvidenceRef`; synthetic-data generator into `train/src/dataset.rs`.
- `ruview-swarm`: `AperturePlanner` over GDI (`ruvector/src/viewpoint/geometry.rs`); simulation evals in `evals.rs`; airborne CSI stability study before any hardware claim.
- `nvsim``wifi-densepose-engine`: `SensorModality::NvMagnetometer` adapter, simulated NV+mmWave vitals cross-validation in the ablation matrix.
---
## 5. Determinism & witness-chain preservation
The non-negotiable invariant (ADR-136 §2.52.6, ADR-028): replaying recorded canonical bytes through the pipeline twice yields byte-identical outputs and equal BLAKE3 witness hashes. Strategy per component class:
1. **Everything on the trust path implements `CanonicalFrame`.** New frame types (`MultiTaskOutput`, `PhysicsResidual`, `AdaptationStep`, edge events, NV frames) get fixed-field-order LE encodings and `witness_hash()`; encoders are the only serializers (no ad-hoc serde on the witness path).
2. **Inference is witnessed by (input hash, model hash, output hash).** `model_id`/`model_version` on `FrameMeta` already bind frames to models; P1 adds a weights digest so the triple is closed. Pure-Rust f32 inference (ADR-146 ABI) with fixed reduction order; no GPU nondeterminism on the witness path — GPU/libtorch is training-only, and training determinism is pinned by the existing seeds (`proof.rs`: `PROOF_SEED = 42`, `MODEL_SEED = 0`).
3. **Advisory vs witnessed split.** Components that cannot be made deterministic — the OccWorld Python subprocess (ADR-147), live MARL exploration, any future LLM/agent output (ADR-140 Ruflo) — are **advisory**: their outputs may bias estimates but never enter `to_canonical_bytes()` directly; instead the *decision to use them* is recorded (prior id + content hash) so replay reproduces the decision even if the producer cannot be re-run. The Kalman tracker consumes priors as explicit inputs recorded in the replay corpus.
4. **Adaptation is a chain of witnessed steps.** P6's `AdaptationStep.witness` hashes (parent version ‖ delta ‖ acceptance result); the active model at any timestamp is reconstructible from the step chain — the model-weights analogue of the frame witness chain.
5. **Edge parity.** P7 modules must produce the same `witness_digest` as the server-side reference implementation on the AC6 fixture; the module hash joins the firmware `source-hashes.txt` in the ADR-028 witness bundle.
6. **Witness bundle growth is mechanical.** Each pillar adds expected-hash keys (`forward_model_residual_v1`, `edge_presence_replay_v1`, `nvsim` already ships `proof.rs`) to the existing `verify.py` chain rather than inventing new verification mechanisms.
---
## 6. Explicit non-goals
- **No workspace rename or rewrite.** Reaffirms ADR-136 §2.1/§4.1: no `ruview_*` crate prefix migration, no umbrella crate; pillars land inside the existing crates listed above.
- **No full-wave Maxwell solver in the runtime loop.** P2 is first-order Born/ray, with the approximation order declared. "Physics-informed" never means FDTD at 20 Hz.
- **No long-range cardiac magnetometry claims.** P8 is bounded by the physics review in `docs/research/quantum-sensing/16-ghost-murmur-ruview-spec.md`; ranges beyond published MCG physics are out of scope permanently, not just deferred.
- **No camera in any deployed serving graph** (P4 teachers are train-time, cached-label only) and **no identity recognition as a product feature** — identity embeddings remain in-RAM, hash-rotated (ADR-120 invariants).
- **No weaponization or LAWS capability in P5**, per ADR-148 §1.3; swarm work targets SAR/MAT and stays behind the ADR-148 regulatory gates.
- **No fabricated benchmarks.** All pillar performance statements in this document are targets; promotion of any pillar requires the ADR-145 ablation matrix delta plus pinned determinism hashes, in CI, before any external claim.
- **No new verification mechanisms.** The witness chain extends `verify.py` / BLAKE3 / `expected_*.sha256`; we do not introduce a second, parallel proof system.
---
## 7. Open questions for the next document in this series
1. Airborne CSI phase stability (P5): what does the ADR-138 clock-quality gate measure on a real quadrotor payload?
2. Forward-model fidelity floor (P2): what Born-residual magnitude on the ADR-135 empty-room captures is "good enough" to be a useful contradiction signal?
3. Replay-corpus governance (Phase 0): retention, privacy class of recorded canonical bytes, and consent — the recorder stores signal evidence, which is itself sensitive.

View File

@ -0,0 +1,384 @@
# Beyond-SOTA Validation, Test & Benchmark Methodology
**Series:** `docs/research/ruview-beyond-sota/` · Document 03
**Date:** 2026-06-09
**Scope:** How RuView proves (and gates) beyond-SOTA claims using the verification
infrastructure that already exists in this repository. Every number below is sourced
from a cited file in this repo; nothing is invented.
---
## 1. The Layered Validation Pyramid
Six layers, cheapest/most-deterministic at the bottom, most expensive/most-credible at
the top. A beyond-SOTA claim must survive **every layer below it** before it may be
published from the layer it lives at.
| Layer | What it proves | Tooling | Frequency | Determinism |
|-------|----------------|---------|-----------|-------------|
| **L0** Unit/integration tests | Code correctness | `cargo test --workspace --no-default-features` + pytest | per commit | exact |
| **L1** Deterministic proof + witness bundle | Pipeline is real, unchanged, reproducible | `archive/v1/data/proof/verify.py`, `scripts/generate-witness-bundle.sh` | per merge / release | exact (SHA-256) |
| **L2** Criterion micro-benchmarks | Compute latency only — never quality (ADR-149 §2) | 15 bench targets across `v2/crates/*/benches/` | nightly / pre-release | statistical |
| **L3** Dataset-level accuracy eval | Pose/presence/vitals quality vs published SOTA | MM-Fi / Wi-Pose (ADR-015), `ruview_metrics.rs` tiers, ADR-145 ablation harness | per model release | seeded |
| **L4** Hardware-in-loop | Real CSI on real ESP32, no mocks | COM9 (S3) / COM12 (C6) protocol, witness firmware hashes | per firmware release | A/B controlled |
| **L5** Field trials / live capture | End-to-end behavior in a real room | live-session captures (e.g. `benchmark_baseline.json`) | campaign | statistical |
### 1.1 L0 — Workspace tests (current counts)
- ADR-028 audit (2026-03-01): **1,031 passed, 0 failed, 8 ignored** for
`cargo test --workspace --no-default-features`
(`docs/adr/ADR-028-esp32-capability-audit.md` §2).
- Current `CHANGELOG.md` (Unreleased, cross-platform fix entry): **2,682 workspace
tests pass / 0 fail on Windows** — the suite has more than doubled since the audit.
- `CLAUDE.md` pre-merge gate still cites "1,031+ passed, 0 failed" as the floor.
**Rule:** the post-change test count may never be lower than the pre-change count, and
failures must be 0. The witness bundle records the full log
(`test-results/rust-workspace-tests.log`) and an aggregated `summary.txt`
(`scripts/generate-witness-bundle.sh` step 3).
### 1.2 L1 — Deterministic proof ("Trust Kill Switch") + witness bundle
`archive/v1/data/proof/verify.py` (header comment): feeds 1,000 synthetic CSI frames
(seed=42, `sample_csi_data.json`) through the **production** `CSIProcessor`
(`src/core/csi_processor.py`), hashes the first 100 frames' feature output
(`VERIFICATION_FRAME_COUNT = 100`), and compares against
`archive/v1/data/proof/expected_features.sha256`.
- **Current published hash (file contents, verified during this investigation):**
`f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a`
- The hash is **environment-coupled** and has been legitimately regenerated before:
ADR-028 §5.3 recorded `8c0680d7…` under numpy 2.4.2/scipy 1.17.1; `CHANGELOG.md`
(#560 fix) recorded `667eb054…` after 6-decimal quantization + single-thread BLAS
pinning (`OMP_NUM_THREADS=1` etc.). Each regeneration must follow the documented
procedure: `python verify.py --generate-hash` then `python verify.py``VERDICT: PASS`.
`scripts/generate-witness-bundle.sh` packages: witness log + ADR-028, the Python proof
(verify.py + expected hash + reference-signal metadata), full Rust test log + summary,
the ADR-134 CIR proof, firmware source/binary SHA-256s, crate version manifest, npm
tarball SHA-256, and a recipient-side `VERIFY.sh`.
**Accuracy note on check counts:** `CLAUDE.md` describes the recipient verification as
"7/7 PASS"; the current `VERIFY.sh` embedded in the script performs **10** `check()`
assertions (witness log, ADR, proof-hash file, tests, firmware hashes, crate manifest,
npm manifest, Python proof, CIR proof, CIR hash file) but prints a hardcoded
`"ALL CHECKS PASSED (8/8)"` string (`generate-witness-bundle.sh` line 293). The
hardcoded count is stale relative to the actual check list — fix it to print
`${PASS_COUNT}/${PASS_COUNT+FAIL_COUNT}` so the verdict can never silently desynchronize
from the check inventory.
### 1.3 L2 — Criterion micro-benchmark inventory (all 15 targets)
All bench sources read directly. Per ADR-149 §2 these are **latency regression gates
only, never quality evidence**.
| Bench target | Crate | Benchmark functions / groups | What it measures | Recorded value or in-source target (citation) |
|---|---|---|---|---|
| `engine_cycle.rs` | wifi-densepose-engine | `process_cycle_4nodes_56sc` | One full `StreamingEngine::process_cycle` (fuse + quality + calibration provenance + privacy gate + WorldGraph node), 4-node/56-subcarrier ESP32-S3 HT20 mesh | Budget: **50 ms** (20 Hz) — bench header |
| `signal_bench.rs` | wifi-densepose-signal | `CSI Preprocessing`, `Phase Sanitization`, `Feature Extraction`, `Motion Detection`, `Full Pipeline` | SOTA signal stages (ADR-014) at varying frame sizes | no recorded baseline |
| `cir_bench.rs` | wifi-densepose-signal | `cir_estimate` (HT20/HT40/HE20/HE40), `cir_estimate_12link`, `cir_estimator_new` | ADR-134 `CirEstimator::estimate()` per tier; 12-link multistatic amortization; cold-start | no recorded baseline |
| `calibration_bench.rs` | wifi-densepose-signal | `bench_recorder_record`, `bench_recorder_finalize`, `bench_deviation`, `bench_record_600`, `bench_to_bytes` (K=52/114/242/484) | ADR-135 empty-room baseline recorder + deviation scoring | no recorded baseline |
| `aether_prefilter_bench.rs` | wifi-densepose-signal | `aether_search_d…_n…_k…` (search vs prefilter) | ADR-084 Pass-2: `EmbeddingHistory::search_prefilter` vs brute force, prefilter_factor=8 | Pass: **≥4× at n=1024** — bench header |
| `sketch_bench.rs` | wifi-densepose-ruvector | `compare_d128/256/512` × `float_l2`/`float_cosine`/`sketch_hamming` | ADR-084 sketch-vs-float per-pair compare cost (AETHER 128-d, spectrogram 256-d) | Pass: **sketch ≥8× faster** at every dim (ADR-084 threshold 8×30×) — bench header |
| `crv_bench.rs` | wifi-densepose-ruvector | `gestalt_classify_single/batch_100`, `sensory_encode_single`, `pipeline_full_session`, `convergence_two_sessions`, `crv_session_create`, `crv_embedding_dimension_scaling` (32/128/384), `crv_stage_vi_partition` | CRV integration throughput | no recorded baseline |
| `inference_bench.rs` | wifi-densepose-nn | `tensor_ops` (relu/sigmoid/tanh), `densepose_inference`, `translator_inference`, `mock_inference`, `batch_inference` | NN forward-pass cost by input/batch size | no recorded baseline; **`mock_inference` group must never be quoted as a pipeline number** (§6) |
| `training_bench.rs` | wifi-densepose-train | `interp_114_to_56_batch32`, `interp_scaling`, `compute_interp_weights_114_56`, `synthetic_dataset_get`, `synthetic_epoch`, `config_validate`, PCK over 100 samples | Training preprocessing + metrics hot paths; fixtures fully deterministic (no `rand`) — header | no recorded baseline |
| `detection_bench.rs` | wifi-densepose-mat | `breathing_detection`, `heartbeat_detection`, `movement_classification`, `detection_pipeline`, localization (triangulation/depth), alert generation | MAT survivor-detection algorithms at varying signal lengths / noise | no recorded baseline |
| `transport_bench.rs` | wifi-densepose-hardware | `beacon_serialize_16byte/28byte_auth/quic_framed`, `auth_beacon_verify`, `replay_window`, `framed_message` encode/decode, `secure_tdm_cycle` (manual vs QUIC) | TDM beacon crypto + transport | no recorded baseline |
| `mqtt_throughput.rs` | wifi-densepose-sensing-server | `discovery::build_*`, `state::*`, `rate_limiter::allow_*`, `privacy::decide_*`, `semantic::bus_tick_all_10_primitives` | ADR-115 MQTT hot path | Targets (header): discovery **<5 µs**, state encode **<2 µs**, rate limit **<100 ns**, privacy **<50 ns**, bus tick **<10 µs** |
| `swarm_bench.rs` | ruview-swarm | `marl_actor_inference`, `rrt_apf_100iter`, `multiview_fusion_3drones`, `demo_coverage_estimate`, `ppo_update_64transitions` | ADR-148 swarm control-loop compute | Measured: **3.3 µs / 43 µs / 5458.5 ns / 100 ps / 248 µs** (ADR-149 §4.3; `CHANGELOG.md` Performance section) |
| `pipeline_throughput.rs` | nvsim | `pipeline_run` (sample-count sweep), `witness::run` vs `run_with_witness` | NV-diamond sim throughput + witness overhead | Acceptance: **≥1 kHz** simulated samples/s on Cortex-A53-class CPU — bench header |
| `state_machine.rs` | homecore | `set` first/warm/no-op, `get` hit/miss, `all_snapshot`, `all_by_domain_light_20_of_100`, `broadcast_fan_out` | HOMECORE state-machine hot paths | no recorded baseline |
**Honest gap — `benchmark_baseline.json` is not a criterion baseline.** The repo-root
`benchmark_baseline.json` (369.9 KB) contains **1,566 live-capture samples** from a
2-node session (fields: `tick`, `n_nodes`, `variance`, `motion`, `presence`,
`confidence`, `est_persons`, `n_persons_rendered`, `kp_spread`, `rssi`) plus a summary
block — it records **field-trial telemetry (L5)**, not micro-benchmark latencies.
No file in the repo references it (`grep -rn benchmark_baseline` → 0 hits outside the
file itself); its producer must be identified and committed (§5.3). Summary values
(all from the file's `summary` object):
| Metric | Baseline value |
|---|---:|
| `total_frames` | 1,566 |
| `presence_ratio` | 0.9336 (1,462/1,566 frames presence-true) |
| `confidence_mean` | 0.6433 |
| `variance_mean` / `variance_std` | 109.36 / 154.13 |
| `kp_spread_mean` / `kp_spread_std` | 86.73 / 4.52 |
| `person_count_changes` | 10 |
Criterion latencies that *have* been recorded live in ADR documents instead
(ADR-147-benchmark-proof.md, ADR-149 §4.3, CHANGELOG Performance) — §5 below defines
how to consolidate them into a real machine-readable criterion baseline.
### 1.4 L3 — Dataset-level accuracy evaluation
- **Datasets (ADR-015):** primary **MM-Fi** (40 subjects × 27 actions × ~320K frames,
1TX×3RX, 114 subcarriers @100 Hz, 17-keypoint COCO + DensePose UV, CC BY-NC 4.0);
secondary **Wi-Pose** (12 volunteers × 12 actions × 166,600 packets, 3×3, 30
subcarriers). 114→56 subcarrier interpolation via `subcarrier.rs`; validation split =
subjects 3340 held out (ADR-015 Phase 1).
- **Acceptance tiers:** `wifi-densepose-train/src/ruview_metrics.rs`
PCK@0.2 / OKS / MOTA / vitals rolled into `RuViewTier`
(Fail/Bronze/Silver/Gold) (ADR-145 §1.1).
- **Ablation harness (ADR-145):** 6-variant matrix (`csi_only`, `cir_only`,
`csi_plus_cir`, `plus_doppler`, `plus_bfld`, `plus_uwb`-skipped), each variant
producing acceptance tier + `SpecMetrics` (presence ≥0.90, localization ≤0.50 m,
activity ≥0.70, FP ≤0.05, FN ≤0.10), `LatencyProfile` (p95 ≤100 ms), and
`PrivacyLeakage` (MIA `leakage_score` ≤0.05), SHA-256-pinned per variant under
`PROOF_SEED=42` (ADR-145 §2.22.6). Built at commit `0f336b7d3` (ADR-145
implementation status); CLI auto-mode wiring is pending.
- **Cross-environment:** ADR-027 MERIDIAN `CrossDomainEvaluator`
(`wifi-densepose-train/src/eval.rs`) — `domain_gap_ratio`, extended by ADR-145
`cross_room_degradation()` with a 17-joint PCK-delta heatmap.
### 1.5 L4 — Hardware-in-loop
- Real CSI nodes: ESP32-S3 on **COM9**, ESP32-C6 + MR60BHA2 on **COM12** (`CLAUDE.md`
hardware table). ADR-018 binary frame protocol over UDP:5005 (ADR-028 §3.2/§3.4).
- ADR-145 Tier-4 test (gated, `#[cfg(feature = "hardware-test")]`): replay a live 30 s
COM9 capture through `csi_only` and `csi_plus_cir`; assert no presence regression and
p95 < 100 ms.
- A/B board protocol precedent (`CHANGELOG.md` #987): fixed vs unmodified control board
against Apple-Watch ground truth (control pegged 4049 BPM; fixed 8891 vs 87 GT) —
this fixed-board/control-board + external ground-truth pattern is the required design
for all hardware vital-sign claims.
- Witness bundle pins firmware: per-file SHA-256 of all sources + release binaries
(`generate-witness-bundle.sh` step 5).
### 1.6 L5 — Field trials
Live multi-node sessions captured as JSONL/JSON with summary statistics —
`benchmark_baseline.json` (§1.3) is the existing exemplar. ADR-149 §6 adds the seeded
`evals/` episode harness (Stage 1 kinematic full-matrix, Stage 2 Gazebo/PX4 SITL on the
3 median seeds) for the swarm domain.
---
## 2. Beyond-SOTA Acceptance Criteria per Capability Axis
A claim is "beyond SOTA" only with: a named external baseline, an exact metric and
protocol match, the dataset/split named, the threshold pre-registered, and the
statistical procedure of §3 followed. Current axes with measured status:
| Axis | Metric (exact) | Dataset / protocol | SOTA baseline | Beyond-SOTA threshold | Measured status (cited) |
|---|---|---|---|---|---|
| In-domain pose accuracy | torso-PCK@20: `‖predgt‖ ≤ 0.2·‖R-shoulderL-hip‖` | MM-Fi `random_split` (ratio 0.8, seed 0) | MultiFormer **72.25%** (Table VII); CSI2Pose 68.41% | > 72.25% with 95% CI lower bound above it | Flagship **83.59%**; micro (75,237 params) **74.30%** (`docs/benchmarks/wifi-pose-efficiency-frontier.md`) |
| Edge efficiency frontier | torso-PCK@20 at deployed precision + params + batch-1 latency | same | MultiFormer 72.25% at full size | Pareto-dominance: smaller **and** above 72.25% at the deployed precision | int8 73.5 KB **74.70%**; int4-QAT 36.7 KB **74.46%**; shipped int4 verified **74.08%**, 0.135 ms 1-thread x86 (same file) |
| Cross-subject generalization | torso-PCK@20, official MM-Fi cross-subject split (256,608 train / 64,152 test) | leakage-free split | own zero-shot baseline 63.99% | ADR-150 §4 gate: **+≥6 pts cross-subject without losing >2 pts random-split** | Best zero-shot **64.92%** (mixup+TTA+3-seed); gate judged unreachable without new capture (ADR-150 §3.2) |
| Few-shot calibration (deployment) | PCK@20 after K labeled in-room samples; adapter size | MM-Fi cross-subject & cross-environment splits | zero-shot (64% / 10.6%) | SOTA-level (≳72%) from ≤200 samples with ≤~11 KB per-room adapter | cross-subject ~**72%** @100200 samples (3 seeds); cross-env **10.6→73.1%** @200, 60.1% @5 (ADR-150 §3.53.6) |
| Swarm SAR localization | CEP50/CEP95 (m), GDOP-stratified | seeded episode distribution (ADR-149 §6), not single geometry | Wi2SAR **5 m** (arxiv 2604.09115, paper-to-paper) | CEP50 < 5 m, IQM over 10 seeds, 95% CI excluding 5 m | 1.732 m single synthetic geometry graded **LowMedium**, not yet claimable (ADR-149 §7) |
| Swarm coverage | coverage-rate@240 s; time-to-95% | episode rollouts | Wi2SAR 160k m²/13.5 min | rollout (not analytic) mean+CI beating baseline | 223 s is an analytic estimate — graded **Low** (ADR-149 §7) |
| Control-loop latency | criterion wall-clock | local hardware, named | 10 ms / 100 Hz budget | all stages ≪ budget | 3.3 µs MARL / 43 µs RRT-APF / 54 ns fusion / 248 µs PPO (ADR-149 §4.3) |
| World-model trajectory | MDE (m) at 5-frame horizon | RuView CSI-derived occupancy | pre-fine-tune random-weight baseline 9.49 m MDE | **≤1.0 m (2.0 vox)** at 5-frame horizon (ADR-147 §5 target, cited in benchmark-proof §4) | 9.49 m / FDE 16.23 m random weights; 208.45 ms median latency on real CSI (ADR-147-benchmark-proof §4, §7) |
| Privacy leakage | MIA `leakage_score = 2·(AUC0.5)` | fixed replay, fixed-seed shadow classifier | chance (0) | ≤ **0.05** (attacker AUC ≤ 0.525) | gate defined, harness built (ADR-145 §2.3) |
| Vitals (hardware) | BPM error vs wearable ground truth | live A/B board protocol | control board behavior | within physiological agreement of ground truth, stable spread | 8891 BPM vs 87 GT, spread 59→0 (CHANGELOG #987) |
### Claim-language discipline (from ADR-149 §7 grading)
| Evidence | Permitted language |
|---|---|
| Single run / single geometry / analytic estimate | "directional", never "beats SOTA" |
| Seeded multi-run with CIs vs paper baseline | "exceeds the published X result paper-to-paper" |
| Same metric, same split, same protocol, CI excludes baseline | "beyond SOTA on <dataset>/<split>" |
| No public leaderboard exists (swarm CSI-SAR) | never claim "leaderboard standing" (ADR-149 §3) |
---
## 3. Statistical Procedure for Honest Claims
Adopted from ADR-149 §5 (Agarwal 2021 / Gorsane 2022 standard) and the practices
already used in ADR-150/efficiency-frontier measurements:
1. **Seeds.** ≥10 independent seeds for RL/episodic claims (ADR-149 §5); ≥3 seeds
minimum for supervised dataset evals (ADR-150 §3.5 used 3 seeds; report all).
Training seeds, eval seeds, and split files are versioned and committed.
2. **Aggregate.** IQM (not mean/median) for episodic metrics + performance profiles;
for dataset accuracy report mean across seeds with each seed's value listed.
3. **Confidence intervals.** 95% stratified bootstrap, 1,000 resamples (ADR-149 §5;
reference impl: `rliable`).
4. **Paired comparisons.** When comparing model A vs B (e.g. `csi_plus_cir` vs
`csi_only`, or ours vs a reproduced baseline), evaluate both on the **identical
frozen test frames** and use a paired bootstrap over per-sample correctness
(PCK hit/miss is per-joint binary — pair at the joint-sample level). For
paper-to-paper comparisons where the baseline cannot be re-run, state so
explicitly ("paper-to-paper", ADR-149 §2) and require the CI lower bound to clear
the published point value.
5. **Pre-registration.** The threshold lives in an ADR **before** the run
(precedent: ADR-150 §4 gate written before §3.2 measurements; the measurements
honestly reported the gate as not met).
6. **Negative results are recorded.** ADR-150 §1/§3.2 keeps DANN-failed,
capacity-hurts, and KD-didn't-help results in the record — required practice.
7. **Eval episodes (swarm):** 50 fixed, versioned episodes per policy
(10 victim layouts × 5 CSI-noise levels), ≥3 baselines (random walk,
boustrophedon+triangulation, IPPO) (ADR-149 §5).
8. **GDOP stratification** for any localization claim, so geometry artifacts cannot
produce the headline (ADR-149 §6.3).
---
## 4. Regression-Gate Design (CI Enforcement)
### 4.1 Three gate classes, three tolerances
| Gate class | Source of truth | Tolerance | On breach |
|---|---|---|---|
| Determinism hashes | `expected_features.sha256`, `expected_cir_features.sha256`, `expected_calibration_features.sha256`, future `expected_ablation_<slug>.sha256` | **exact (0%)** | exit 1 = FAIL; exit 2 = SKIP only for placeholder hashes (proof.rs `0/1/2` convention, ADR-145 §2.4) |
| Accuracy / quality metrics | per-variant canonical bytes, quantized 1e-3 (ADR-145 §2.6) | exact after quantization | FAIL CI; tier change requires ADR amendment |
| Latency / throughput | criterion estimates JSON | **% tolerance per scale** (below) | FAIL on regression beyond tolerance; trend everything |
### 4.2 Criterion baseline file (replaces the current gap)
Today criterion numbers live in prose (ADR-147-benchmark-proof, ADR-149 §4.3,
CHANGELOG). Formalize:
1. `cargo bench --workspace -- --save-baseline main` on a **named, fixed runner**
(ADR-147 used RTX 5080 / specific host; record host + toolchain in the file).
2. Export `target/criterion/*/estimates.json` point estimates into a committed
`v2/benchmarks/criterion-baseline.json`: `{bench_id, crate, p50_ns, host, commit}`.
3. CI compares new runs against it with scale-aware tolerance — wall-clock noise is
proportionally larger at small magnitudes:
| Magnitude | Tolerance | Rationale |
|---|---|---|
| < 1 µs (e.g. fusion 54 ns, privacy decide <50 ns target) | ±25% | timer/jitter dominated |
| 1 µs 1 ms (MARL 3.3 µs, RRT-APF 43 µs, PPO 248 µs) | ±15% | criterion CI typically <5%, leave CI-runner headroom |
| > 1 ms (engine cycle vs 50 ms budget, OccWorld ~209 ms) | ±10% **and** absolute budget (50 ms / 500 ms ADR-147 §6) | budgets are the contract |
4. Hard in-source acceptance thresholds remain authoritative regardless of baseline:
sketch ≥8× (`sketch_bench.rs`), prefilter ≥4× (`aether_prefilter_bench.rs`),
nvsim ≥1 kHz (`pipeline_throughput.rs`), MQTT header targets, ADR-145 p95 ≤100 ms.
5. Latency stays **out of determinism hashes** (ADR-145 §2.6) but **in** the trended
`summary.json`, so sub-threshold drift is visible (ADR-145 §3.2 mitigation).
### 4.3 Live-capture baseline gate (`benchmark_baseline.json`)
Adopt the file as the L5 regression anchor with documented provenance, then gate a
re-capture of the same scenario (same 2-node placement, same room class) against the
summary block:
| Field | Baseline | Suggested gate |
|---|---:|---|
| `presence_ratio` | 0.9336 | ≥ 0.90 for an occupied-room session |
| `confidence_mean` | 0.6433 | within ±0.10 |
| `kp_spread_std` | 4.52 | ≤ 2× baseline (skeleton stability) |
| `person_count_changes` | 10 / 1,566 frames | ≤ 2× baseline (count flapping — see CHANGELOG #803/#894 clamp bugs this metric would have caught) |
Field-trial gates are **soft** (warn + require human sign-off), never auto-merge
blockers — environments differ; the gate exists to force an explanation.
### 4.4 Wiring
Pre-merge (`CLAUDE.md` checklist): L0 + L1. Nightly: L2 criterion + ADR-145 Tier-3
ablation matrix (minutes-scale, ADR-145 §3.2). Release: full witness bundle +
`VERIFY.sh` + L4 on real COM-port hardware (`CLAUDE.md` firmware rule 6/7).
---
## 5. Reproducibility & External-Witness Requirements
Anyone outside the project must be able to re-run every claimed result:
1. **One command per layer.** `cargo test --workspace --no-default-features`;
`python archive/v1/data/proof/verify.py`; `bash scripts/generate-witness-bundle.sh`
then `bash VERIFY.sh` inside the bundle; per ADR-150 §4 every accuracy result needs
"one-command reproduction" (efficiency frontier publishes its exact command:
`python aether-arena/staging/train_efficiency_pareto.py npy/X.npy npy/Y.npy npy/split_random.npy`).
2. **Pinned numerical environment.** The Python proof requires single-threaded BLAS
(`OMP_NUM_THREADS=1`, `OPENBLAS_NUM_THREADS=1`, `MKL_NUM_THREADS=1`,
`VECLIB_MAXIMUM_THREADS=1`, `NUMEXPR_NUM_THREADS=1`) and 6-decimal quantization
(`HASH_QUANTIZATION_DECIMALS=6`) — the #560 fix in `CHANGELOG.md`; Rust proof
runners use coarse u16 quantization at 1e-3 in natural order
(`calibration_proof_runner.rs` pattern, ADR-145 §2.6) for libm portability.
3. **Seeds are constants, committed:** `PROOF_SEED=42`, `MODEL_SEED=0`
(`proof.rs`, ADR-015 Phase 5); dataset splits committed as `.npy`
(`split_random.npy`); swarm configs as versioned YAML with all seeds (ADR-149 §5).
4. **Artifacts carry hashes.** Published model artifacts include SHA-256 (HuggingFace
`pose_micro_int4.npz`, sha256 `c03eeb…` — efficiency-frontier doc); witness bundle
has a `MANIFEST.sha256` over every file; provenance fields
(`replay_sha256`, `model_sha256`, `calibration_version`, `privacy_mode`) are bound
into ablation proof hashes (ADR-145 §2.7) so a metric cannot be quoted without its
exact model + calibration + privacy decision.
5. **Hardware claims name the hardware.** ADR-147 records RTX 5080 / CUDA 12.8 /
PyTorch 2.10.0; nvsim states the Cortex-A53 scaling caveat in the bench header;
efficiency-frontier flags ARM validation as pending. Copy this discipline.
6. **Witness rows.** Every new proof gains rows in `docs/WITNESS-LOG-028.md`
(ADR-145 §5.3 adds W-39…W-41) and the bundle's `source-hashes.txt`.
7. **Secret hygiene in evidence.** Bundle logs pass through
`scripts/redact-secrets.py` (ADR-110 wave-5 incident note in
`generate-witness-bundle.sh` step 4) — external evidence must never embed `.env`.
---
## 6. Known Measurement Pitfalls (WiFi-sensing specific)
| # | Pitfall | Repo evidence | Mitigation in this methodology |
|---|---|---|---|
| 1 | **Subject leakage / split optimism.** In-domain `random_split` has temporal/subject-adjacency effects; the same model family scores 83.6% random-split but ~11.6% torso-PCK on the leakage-free cross-subject split | efficiency-frontier "Controlled claim" footnote; ADR-150 §1, §3.2 | Always report the split name; publish random-split and cross-subject numbers side by side; cross-subject claims only on the official split |
| 2 | **Per-environment overfitting.** Zero-shot cross-environment collapses to 10.6%; subject-scaling saturates ~63.7% past 1620 subjects because the residual is room/device shift | ADR-150 §3.3, §3.6 | Cross-room degradation + 17-joint heatmap in every ablation (ADR-145 §2.5); claim deployment accuracy only with the calibration protocol stated (K samples, adapter size) |
| 3 | **Mock-mode contamination.** Mock firmware missed a real Kconfig threshold bug; the nn crate ships a `mock_inference` criterion group that must never be quoted as pipeline performance | `CLAUDE.md` firmware rule 7; `inference_bench.rs` `bench_mock_inference` | L4 mandatory before firmware release ("Always test with real WiFi CSI, not mock mode"); label mock benches in reports; ADR-147 §7 re-ran the benchmark on real CSI explicitly "no mocks" |
| 4 | **Single-run point estimates.** 1.732 m localization from one synthetic geometry; 223 s coverage from an analytic formula | ADR-149 §1, §7 | §3 seed/CI protocol; evidence-grade table before publication |
| 5 | **Random-weight / untrained baselines read as results.** OccWorld MDE 9.49 m is a pre-fine-tuning random-weight reading | ADR-147-benchmark-proof §4 | Label baseline-vs-target explicitly; never aggregate untrained-model numbers into capability claims |
| 6 | **Latency conflated with quality.** Criterion µs numbers prove no compute bottleneck, nothing about accuracy | ADR-149 §2, §4.3 | L2 is gate-only; quality claims live in L3+ |
| 7 | **Floating-point nondeterminism breaking proofs.** SciPy FFT SIMD reordering + multithreaded BLAS produced different hashes across CI microarchitectures | CHANGELOG #560; `calibration_proof_runner.rs` lines 113 (cited in ADR-145 §2.3) | Quantize before hashing; pin thread env vars; exclude wall-clock from hashes |
| 8 | **Hash churn without procedure.** Three distinct historical values of the proof hash exist (`8c0680d7…` ADR-028, `667eb054…` CHANGELOG #560, `f8e76f21…` current file) | cited files | Every regeneration via `--generate-hash` + re-verify + CHANGELOG entry + witness bundle refresh |
| 9 | **Aggregation bugs masking accuracy.** Person count clamped to 1 by EMA mapping; eigenvalue path leaking counts up to 10; both invisible to unit tests for months | CHANGELOG #803, #894 | L5 summary gates on `person_count_changes`/count distributions; convergence tests replaying the live loop |
| 10 | **Stale verification claims.** `VERIFY.sh` prints hardcoded "(8/8)" over 10 actual checks; `CLAUDE.md` says "7/7" | `generate-witness-bundle.sh` line 293; `CLAUDE.md` | Compute the verdict count; audit doc claims against scripts each release |
| 11 | **Licensing limits on the eval set.** MM-Fi is CC BY-NC — weights trained solely on it cannot back commercial claims | ADR-015 Consequences | Track dataset license alongside every published number |
---
## 7. Gap List (what must be built to fully execute this methodology)
| Gap | Owner layer | Source |
|---|---|---|
| Machine-readable criterion baseline (`v2/benchmarks/criterion-baseline.json`) + CI comparison job | L2 | §4.2 (numbers currently only in ADR prose) |
| Provenance + producer script for `benchmark_baseline.json`; soft-gate job | L5 | §1.3, §4.3 (zero code references today) |
| `ruview-cli --ablation mode=auto` wiring + `expected_ablation_<slug>.sha256` (currently placeholders → exit 2) | L3 | ADR-145 implementation status |
| Seeded swarm `evals/` harness + `evals/RESULTS.md` internal leaderboard | L3/L5 | ADR-149 §6, §8 open issues |
| Fix `VERIFY.sh` hardcoded verdict count; reconcile `CLAUDE.md` "7/7" | L1 | §1.2 |
| Curated paired room-A/room-B labeled replay set (frozen, SHA-pinned, never trained on) | L3 | ADR-145 §3.2 |
| ARM/edge on-device latency validation for the int4 model (x86-only today) | L4 | efficiency-frontier doc ("Pi fleet pending") |
| Bench validation of the antenna-placement matrix on real hardware | L4 | PRODUCTION-ROADMAP.md Tier 2.3 |
---
## Update — falsifiable occupancy benchmark implemented
`wifi-densepose-train::occupancy_bench` (added this branch) makes the
presence/person-count claim **falsifiable in code**, directly enforcing the L3
discipline above. It grades predictions vs ground truth and gates a SOTA claim
behind a single `claim_allowed` invariant that requires **all** of:
1. `DataProvenance::Measured` — synthetic/mock data is scorable for regression
but **never claimable** (anti-mock-contamination; the CLAUDE.md Kconfig-bug
lesson made structural).
2. A leak-free `EvalSplit``validate()` refuses any split where a subject *or*
environment id appears in both train and test (subject leakage / per-env
overfitting).
3. `n_test ≥ min_test_samples` (small-N guard).
4. Presence F1 whose **bootstrap-CI lower bound** (deterministic splitmix64,
seeded) clears the threshold — not the point estimate.
5. Count MAE within threshold.
The claim string is unreadable except through the gate (returns `NO_CLAIM`
otherwise) — same discipline as the `ruview-gamma` acceptance gate. 10 tests
cover each refusal path. What remains is *data*, not *method*: feed it a frozen,
SHA-pinned, subject/environment-disjoint **measured** replay set (the curated
room-A/room-B item above) and the "beyond SOTA" claim becomes a passing or
failing test, not a slogan.
---
*All values cited from: `benchmark_baseline.json`, `v2/crates/*/benches/*.rs` (15
files), `docs/adr/ADR-147-benchmark-proof.md`,
`docs/adr/ADR-149-swarm-benchmarking-evaluation-methodology.md`,
`docs/adr/ADR-145-ablation-eval-harness-privacy-leakage.md`,
`docs/adr/ADR-028-esp32-capability-audit.md`,
`docs/adr/ADR-015-public-dataset-training-strategy.md`,
`docs/adr/ADR-150-rf-foundation-encoder.md`,
`docs/benchmarks/wifi-pose-efficiency-frontier.md`,
`scripts/generate-witness-bundle.sh`, `archive/v1/data/proof/verify.py`,
`archive/v1/data/proof/expected_features.sha256`, `CHANGELOG.md`, `CLAUDE.md`,
`docs/research/sota-2026-05-22/PRODUCTION-ROADMAP.md`.*

View File

@ -0,0 +1,252 @@
# RuView Beyond-SOTA — 04: Performance Review & Optimization Roadmap
**Scope:** the streaming sensing pipeline (CSI ingest → multistatic fusion → CIR gate →
pose publish) in `v2/`, hot-path crates `wifi-densepose-signal` (ruvsense),
`wifi-densepose-engine`, `wifi-densepose-ruvector`, plus build-profile and edge-target
(Pi 5-class, WASM) considerations.
**Hard constraint (non-negotiable):** the witness chain (ADR-028, ADR-136 §2.5 replay
contract, ADR-137 §2.7 BLAKE3 witness in
`v2/crates/wifi-densepose-engine/src/lib.rs:437-448`) requires **bit-exact deterministic
float output**. Every recommendation below is tagged with its determinism risk. Anything
that reorders float additions, enables FMA contraction, fast-math, or parallel reduction
**changes the witness hash** and requires a coordinated proof-hash regeneration
(`verify.py --generate-hash`) plus witness-bundle re-issue.
---
## 1. What we actually have measured (and what we don't)
`/home/user/RuView/benchmark_baseline.json` is a **signal-quality soak baseline**, not a
latency benchmark: 1,566 samples (ticks 5113152395) of
`variance / motion / presence / confidence / est_persons / kp_spread / rssi`, with a
summary block (`confidence_mean: 0.643`, `presence_ratio: 0.934`,
`kp_spread_mean: 86.7`, `person_count_changes: 10`). **It contains zero timing data.**
It is the accuracy guardrail for any optimization (post-change soak must reproduce these
distributions), not a latency baseline.
Latency benchmarks exist but no committed results were found in the repo:
| Bench | File | What it measures |
|---|---|---|
| `process_cycle_4nodes_56sc` | `v2/crates/wifi-densepose-engine/benches/engine_cycle.rs:34-48` | One full engine cycle, 4 nodes × 56 subcarriers, vs. the documented 50 ms budget (`engine_cycle.rs:3-6`) |
| `cir_bench` | `v2/crates/wifi-densepose-signal/benches/cir_bench.rs` | `CirEstimator::estimate()` per tier (HT20/HT40/HE20/HE40) + 12-link amortization |
| `sketch_bench` | `v2/crates/wifi-densepose-ruvector/benches/sketch_bench.rs:86-175` | Hamming sketch vs. float L2/cosine compare; top-K over 1,024-sketch bank |
| `signal_bench`, `calibration_bench`, `aether_prefilter_bench` | `v2/crates/wifi-densepose-signal/benches/` | Signal-path and ADR-135 calibration throughput |
**Action zero of the roadmap is to run these on a Pi 5 and commit the criterion
baselines.** All impact classes below are derived from operation counts read out of the
code (cited), not invented measurements.
---
## 2. Latency budget model — streaming pipeline
Two clock domains exist and must not be conflated:
- **TDMA sensing cycle: 20 Hz / 50 ms** — the architecture's own budget
(`v2/crates/wifi-densepose-signal/src/ruvsense/mod.rs:5`, `RuvSenseConfig::target_hz =
20.0` at `mod.rs:258`, and the bench doc `engine_cycle.rs:3`).
- **CSI ingest: 100 Hz per node** — raw frames arrive ~5× faster than the fused output
rate; per-frame ingest work (parse, normalize, calibrate, window) must therefore fit a
**10 ms** per-frame envelope while the fused path fits **< 50 ms end-to-end**.
Proposed per-stage budget for the 50 ms end-to-end target (4 nodes, HT20 / 56
subcarriers — the configuration the engine bench encodes):
| # | Stage | Code | Budget | Risk (from code reading) |
|---|---|---|---|---|
| 1 | Ingest + hardware normalize (per 100 Hz frame) | `hardware_norm`, `multiband.rs` | 2 ms | Low — vector ops on 56 floats |
| 2 | Calibration apply (ADR-135) | `ruvsense/calibration.rs` | 2 ms | Low — Welford lookups |
| 3 | Phase alignment | `phase_align.rs:117-152` | 1 ms | Low — ≤ 20 iterations over ≤ 17 static subcarriers (`config.max_iterations: 20`, `phase_align.rs:57`); allocation churn only (§3) |
| 4 | Multistatic fusion (attention + softmax) | `multistatic.rs:512-598` | 2 ms | Low — O(nodes × 56); but does duplicate work in `fuse_scored` (§3, F2) |
| 5 | **CIR gate (ISTA L1)** | `multistatic.rs:440-475``cir.rs:601-654` | 15 ms | **HIGH** — dominant cost, scales badly with PHY tier (below) |
| 6 | Coherence score + gate decision | `coherence.rs`, `coherence_gate.rs` | 2 ms | Low — z-scores over 56 subcarriers |
| 7 | Tomography (ADR-030 tier 2, when enabled) | `tomography.rs:236-323` | 8 ms | **Medium** — per-iteration allocation + loose step size (§3, F8/F9) |
| 8 | Pose tracker (17-kp Kalman + re-ID) | `pose_tracker.rs` | 8 ms | Medium — sketch prefilter (ADR-084) already mitigates the re-ID scan |
| 9 | Engine: quality score, privacy gate, WorldGraph node, BLAKE3 witness | `engine/src/lib.rs:304-368` | 5 ms | Low per cycle, but **unbounded memory growth** (§4) |
| 10 | Publish (WS/serde) | sensing-server | 5 ms | Low |
| | **Total** | | **50 ms** | |
### Why stage 5 is the at-risk stage — operation counts from the code
`ista_solve` (`cir.rs:601-654`) runs **two dense complex mat-vecs per iteration**
(`matvec_phi` at `cir.rs:717-726`, `matvec_phi_h` at `cir.rs:730-745`), each O(K·G)
complex MACs (≈ 8 FLOPs each), up to `max_iters: 100` (`cir.rs:176`). Per
`CirConfig` (`cir.rs:164-233`):
| Tier | K (active) | G (taps) | FLOPs/iter (2·K·G·8) | FLOPs @100 iters |
|---|---|---|---|---|
| HT20 | 52 | 156 | ≈ 0.13 M | ≈ 13 M |
| HT40 | 114 | 342 | ≈ 0.62 M | ≈ 62 M |
| HE20 | 242 | 726 | ≈ 2.8 M | ≈ 0.28 G |
| HE40 | 484 | 1,452 | ≈ 11.2 M | ≈ 1.1 G |
HT20 fits the 15 ms budget comfortably on a Pi 5; **HE40 at worst-case iteration count
is ~1.1 GFLOP of scalar, cache-unfriendly work per estimate and will not fit any 50 ms
budget without structural change** (F4 below). Today the gate runs once per cycle on the
first link only (`multistatic.rs:452-463`), which contains the damage; the 12-link
amortization pattern in `cir_bench.rs` shows the intended scale-up, which multiplies
this cost ×12.
---
## 3. Findings table — optimization opportunities
Impact: relative cycle-time/memory effect at the 4-node HT20 operating point unless
noted. Determinism: **EXACT** = bit-identical output guaranteed; **TIE** = only
tie-breaking/ordering may differ; **CHANGES-FLOATS** = output bits change, witness/proof
hash must be regenerated.
| ID | Finding (file:line) | Impact | Effort | Determinism |
|---|---|---|---|---|
| F1 | `FusedSensingFrame` deep-copies every input frame each cycle: `node_frames: node_frames.to_vec()` (`multistatic.rs:282`) — clones all per-node amplitude+phase vectors per 50 ms cycle even when downstream geometry consumers don't need them | Med | Low (Arc/Cow or borrow) | EXACT |
| F2 | `fuse_scored` re-derives the per-node amplitude views and recomputes `node_attention_weights` after `fuse` already computed them inside `attention_weighted_fusion` (`multistatic.rs:311-321` duplicating `multistatic.rs:520`) — full cosine-sim + softmax done twice per cycle | Low-Med | Low (return weights from `fuse`) | EXACT (same math, computed once) |
| F3 | CIR gate rebuilds a heap `CsiFrame` per cycle: `build_csi_frame_from_channel` allocates an `Array2<Complex64>` and converts amplitude/phase via `from_polar` per subcarrier (`multistatic.rs:488-506`, called from `multistatic.rs:462`), then `extract_csi_vector` converts back to `Complex32` (`cir.rs:505-530`) — f32→f64→f32 round-trip plus two allocations purely as glue | Med | Med (give `CirEstimator` a slice-based entry point) | EXACT if conversions reproduce exactly (f32→f64 is lossless; `from_polar` in f64 then truncate ≠ f32 polar — keep the f64 intermediate to stay exact, or accept CHANGES-FLOATS and regenerate hashes) |
| F4 | ISTA inner loop uses dense O(K·G) mat-vecs (`cir.rs:717-745`) although Φ is a sub-sampled DFT (`cir.rs:539-558`) — the products Φx and Φᴴr are computable via an FFT of length G in O(G log G), an ~840× FLOP cut at HE20/HE40 (table §2) | **High** (the only path to HE40 real-time) | High | **CHANGES-FLOATS** (different summation order than the sequential dot product) — must ship behind a feature flag, A/B against `cir_proof_runner`, regenerate `expected_features.sha256` + witness bundle |
| F5 | `neumann_warm_start` recomputes the diagonal of ΦᴴΦ with a full K×G pass **per frame** (`cir.rs:676-681`), rebuilds the COO→CSR diagonal matrix per frame (`cir.rs:683-685`), and collects `rhs_re`/`rhs_im` Vecs per frame (`cir.rs:689-690`) — yet `diag` depends only on Φ, which is fixed at `CirEstimator::new` | Med | Low (precompute diag+CSR in `new()`) | EXACT (same values, computed once) |
| F6 | `phase_variance` collects a `Vec<f32>` of phases per call (`cir.rs:792`) — replaceable by a two-pass loop with zero allocation | Low | Low | EXACT |
| F7 | Φ and Φᴴ are both stored densely (`cir.rs:546-547`): 2·K·G·8 bytes — Φᴴ entries are just conjugates of Φ (`cir.rs:555`), so a transposed-iteration kernel over Φ alone halves the footprint (HE40: 11.2 MB → 5.6 MB) | Low (latency) / Med (memory §4) | Med | EXACT (conjugation is exact; keep identical accumulation order in the transposed kernel) |
| F8 | Tomography allocates the gradient vector **inside** the solver iteration loop: `let mut gradient = vec![0.0_f64; self.n_voxels]` (`tomography.rs:266`) — one heap alloc + zeroing per iteration, up to `max_iterations: 100` (`tomography.rs:75`); hoist and `fill(0.0)` | Med (for tier-2 deployments) | Low | EXACT |
| F9 | Tomography step size uses the Frobenius-norm upper bound for the Lipschitz constant (`tomography.rs:253-259`, comment admits `‖WᵀW‖ ≤ ‖W‖_F²`) — a bound loose by up to the matrix rank, forcing proportionally more ISTA iterations than the power-method estimate used in `cir.rs:566-590` | Med | Low (reuse the cir.rs power-method pattern) | **CHANGES-FLOATS** (different step ⇒ different iterate path) |
| F10 | `apply_phase_correction` clones the amplitude vector and allocates a fresh corrected-phase Vec per channel per cycle (`phase_align.rs:258-268`, `frame.amplitude.clone()` at `phase_align.rs:264`); `align` additionally `frames.to_vec()`s on the single-channel path (`phase_align.rs:128`) — an in-place `align_mut` avoids all of it | Low-Med | Low | EXACT |
| F11 | Static-subcarrier selection fully sorts all subcarriers by variance (`phase_align.rs:180`) where `select_nth_unstable_by` suffices — trivial at 56 subcarriers, relevant at HE tiers (242484) | Low | Low | **TIE** (equal-variance ties may select a different subcarrier set; pin a stable tie-break on index to stay EXACT) |
| F12 | Engine clones each node's amplitude vector for the array coordinator every cycle: `cf.amplitude.clone()` (`engine/src/lib.rs:385`); also allocates a `Vec<Option<CalibrationId>>` per cycle (`lib.rs:293`) and `format!("{e:?}")` strings for every evidence ref (`lib.rs:337`) | Low | Low | EXACT |
| F13 | `fuse_scored_calibrated` computes the modal calibration id in O(n²) (`multistatic.rs:404-410`) — harmless at n ≤ 15 nodes, noted for swarm-scale reuse (ADR-148) | Low | Low | EXACT |
| F14 | **No `rayon` and no SIMD feature exists anywhere in the hot crates** (grep over `crates/*/Cargo.toml`: zero hits for rayon/simd/target-feature outside wasm-opt flags). The 12-link CIR pattern (`cir_bench.rs:4-5`) and the per-node ingest path are embarrassingly parallel **across independent links/nodes** | High (multi-link tiers) | Med | **EXACT if and only if** parallelism stays at link/node granularity with results collected in deterministic (index) order and no shared float accumulator; intra-link parallel reductions are CHANGES-FLOATS and are banned |
| F15 | `Cir::top_k_taps` clones and fully sorts all G taps (`cir.rs:322-332`) — O(G log G) with a G-sized clone; a k-heap (the exact pattern already written in `sketch.rs:546-563`) is O(G log k) | Low | Low | TIE (equal-magnitude ordering; pin index tie-break) |
| F16 | Core `CsiFrame` carries `Complex64` while the entire ruvsense DSP path computes in f32 (conversion at `cir.rs:525`) — 2× memory and bandwidth on every ingest for precision the pipeline immediately discards | Med (memory/bandwidth) | High (core type change ripples everywhere) | **CHANGES-FLOATS** at the boundary; defer until a major version |
| F17 | Sketch path is already well-optimized: heap-based top-K with n ≤ k fast path (`sketch.rs:536-569`), 28-byte wire format (`sketch.rs:303`). Remaining win is build-level: `count_ones()` only lowers to POPCNT/NEON-vcnt when the target CPU enables it (see §5) | Low | Low | EXACT (integer ops) |
---
## 4. Memory-footprint analysis (Pi 5-class and WASM; ESP32 aggregation out of scope)
**Static, per-process (from struct definitions):**
| Component | Sizing source | Footprint |
|---|---|---|
| `CirEstimator` HT20 (Φ + Φᴴ, `Complex32`) | `cir.rs:546-547`, K=52 G=156 | 2 · 52 · 156 · 8 B ≈ **130 KB** |
| `CirEstimator` HE20 | K=242 G=726 | ≈ **2.8 MB** |
| `CirEstimator` HE40 | K=484 G=1452 | ≈ **11.2 MB** (halvable via F7) |
| Tomography weight matrix | `tomography.rs:214-217`, sparse per-link (voxel,weight) pairs; default grid 8×8×4 = 256 voxels (`tomography.rs:70-73`) | tens of KB at default grid |
| Sketch bank, 1,024 × 128-d | `sketch.rs` 1 bit/dim | 1,024 · 16 B ≈ **16 KB** (vs 512 KB float) |
A Pi 5 (48 GB) absorbs all of this trivially. The real memory risks are dynamic:
1. **Unbounded WorldGraph growth (the one genuine leak-class issue).** Every
`process_cycle` appends a `SemanticState` node plus a `DerivedFrom` edge
(`engine/src/lib.rs:346-352`), and change-points append `Event` nodes
(`lib.rs:422-428`). At 20 Hz that is **1.73 M nodes/day** with no eviction anywhere
in the engine. `snapshot_json` (`lib.rs:191-193`) then serializes the whole graph.
**Required:** a retention/compaction policy (ring buffer or time-windowed rollup of
SemanticStates). Determinism caveat: eviction changes snapshot *contents* (a product
decision), not float math — the per-cycle witness (`lib.rs:437-448`) is unaffected.
2. **Per-cycle allocation churn** (F1, F3, F5, F8, F10, F12): at 20 Hz this is dozens of
short-lived heap allocations per cycle. On a Pi 5 this is allocator pressure and
cache pollution rather than RSS growth; on WASM (bump-ish dlmalloc, no MADV_FREE) it
inflates the linear memory high-water mark, which is never returned to the host.
3. **WASM targets.** `wifi-densepose-wasm` is a browser binding crate (JS interop,
serde, chrono — `crates/wifi-densepose-wasm/Cargo.toml`) and pulls `wifi-densepose-mat`
optionally; it relies on `wasm-opt -O4` (`Cargo.toml` `[package.metadata.wasm-pack]`).
`wifi-densepose-wasm-edge` is the disciplined one: `no_std` + `libm`, its own profile
`opt-level = "s"`, lto, cgu=1 (`crates/wifi-densepose-wasm-edge/Cargo.toml`). Neither
enables `+simd128` (§5). If the CIR estimator is ever compiled to wasm-edge, HE40's
11.2 MB of sensing matrix alone is ~700 pages of linear memory — restrict edge WASM
to HT20 (130 KB) or ship F4/F7 first.
---
## 5. Build-profile review & recommendations
Current release profile (`v2/Cargo.toml:213-218`) is already aggressive and correct:
`opt-level = 3`, `lto = true` (fat), `codegen-units = 1`, `panic = "abort"`,
`strip = true`; `bench` inherits release with debug symbols (`v2/Cargo.toml:225-227`).
There is nothing wrong to fix here — the gains left are target- and feedback-driven:
1. **Per-target CPU tuning (EXACT, do first).** No `target-cpu` is set anywhere. For
Pi 5 fleet builds: `RUSTFLAGS="-C target-cpu=cortex-a76"` — enables NEON scheduling
and `vcnt` for the sketch path (F17) without changing IEEE semantics. LLVM does not
reassociate float reductions or contract to FMA without explicit fast-math/contract
flags, so scalar float results stay bit-exact. **Verify with the existing proof
runners** (`cir_proof_runner`, `calibration_proof_runner`,
`signal/Cargo.toml`) as the acceptance gate — that is exactly what they exist for.
2. **WASM SIMD.** Add `-C target-feature=+simd128` for `wifi-densepose-wasm` builds and
keep a non-SIMD artifact for older runtimes. Same determinism note as above; gate
with the proof runners compiled to wasm where feasible.
3. **PGO: feasible and determinism-safe.** PGO changes inlining/layout, never FP
semantics. The repo already has ideal deterministic training workloads: the proof
runner binaries plus `engine_cycle` / `cir_bench`. Pipeline: `cargo pgo build`
run proof runners + benches → `cargo pgo optimize`. Expect mid-single-digit to ~15%
on branchy paths (gate decisions, tracker lifecycle); the dense ISTA loop will see
little. Cost: CI complexity. Verdict: do it after F1F12, not before.
4. **Do not** enable `-ffast-math`-equivalents (`fadd_fast`, `core::intrinsics`,
`-C llvm-args=-fp-contract=fast`) anywhere in the witness path. This must be a
stated rule in CONTRIBUTING/ADR, not tribal knowledge.
5. **BOLT / `opt-level` experiments are not worth it** ahead of F4; the pipeline is
FLOP-bound in one loop, not front-end bound.
---
## 6. Prioritized 90-day plan
### Phase 0 — Measure (days 110)
- Run and commit criterion baselines on a Pi 5 and an x86 dev box:
`engine_cycle`, `cir_bench` (all four tiers), `sketch_bench`, `signal_bench`,
`calibration_bench`. The 50 ms claim in `engine_cycle.rs:3` becomes a measured number.
- Add a lightweight per-stage timing histogram (feature-gated, off in witness builds) at
the §2 stage boundaries; wire a CI perf-regression gate (±10%) on the committed
baselines.
- Re-run the soak that produced `benchmark_baseline.json` and pin it as the accuracy
guardrail for everything below.
### Phase 1 — Exact, zero-risk wins (days 1035)
All EXACT findings; no witness impact; each lands with proof-runner verification:
- F5 (precompute warm-start diag/CSR in `CirEstimator::new`) — biggest exact CIR win.
- F8 (hoist tomography gradient buffer), F6, F10, F12, F1, F2 (allocation/duplication
removal), F15 + F11 with pinned index tie-breaks.
- WorldGraph retention policy (the §4.1 unbounded-growth fix) — design ADR + ring-buffer
implementation.
- Expected outcome: measurable cycle-time reduction and flat memory under 24 h soak;
**identical witness hashes**.
### Phase 2 — Determinism-managed structural wins (days 3570)
Each behind a feature flag, A/B'd against the legacy path (the `use_cir_gate` A/B switch
at `multistatic.rs:103` is the template), with proof-hash regeneration as an explicit,
witnessed release event:
- **F4: FFT-based Φ/Φᴴ application in ISTA** — the headline item; the only route to
HE20/HE40 real-time and the 12-link pattern. Acceptance: cir_bench speedup ≥ 5× at
HE20, soak metrics within guardrail, new `expected_features.sha256` published in a
fresh witness bundle.
- F9 (power-method Lipschitz in tomography) riding the same hash-regen train.
- F3 (slice-based CIR entry point), choosing the exact-f64-intermediate variant if the
hash train slips.
- F14: feature-gated `rayon` across **links/nodes only**, deterministic index-ordered
collection; CI must run the determinism test (`engine/src/lib.rs:535-548`
`cycle_is_deterministic`) with the feature on.
### Phase 3 — Platform & toolchain (days 7090)
- Pi 5 `target-cpu=cortex-a76` fleet builds + proof-runner verification (§5.1).
- `+simd128` WASM artifact + size budget check for wasm-edge (§5.2, §4.3).
- PGO pilot in CI using proof runners as the training corpus (§5.3).
- Re-baseline: new criterion numbers, refreshed witness bundle, updated this document's
§1 with real measured latencies.
**Out of 90-day scope, flagged for the architecture backlog:** F16 (Complex64→Complex32
in core), F7 (single-matrix Φ kernel — bundle with F4), and HE40-on-edge (blocked on
F4+F7).
---
## 7. Summary
The pipeline's only structural latency hazard is the dense ISTA CIR solver
(`cir.rs:601-654` + `cir.rs:717-745`): fine at HT20, ~1.1 GFLOP worst-case per estimate
at HE40, and slated to run per-link (×12). Everything else is allocation churn and
duplicated work that can be removed with **bit-exact** refactors (F1F12), plus one
genuine memory bug-class issue: unbounded WorldGraph growth at 20 Hz
(`engine/src/lib.rs:346-352`). The build profile is already optimal; remaining toolchain
gains (target-cpu, wasm simd128, PGO) are determinism-safe and cheap. The determinism
constraint is workable because the repo already owns the right tools — deterministic
proof runners, an A/B gate pattern, and a per-cycle witness — so float-changing
optimizations become scheduled, witnessed hash-regeneration events rather than risks.

View File

@ -0,0 +1,96 @@
# RuView Beyond-SOTA Research Series
Research swarm output (2026-06-09) defining what a beyond-state-of-the-art
RuView implementation is, what the current system actually delivers, and the
validation/benchmark/optimization evidence gathered in the same session.
Produced by a 5-agent hierarchical research swarm (system reviewer, SOTA
surveyor, architect, benchmark methodologist, performance analyst) plus a
validation pass run against the working tree.
## Documents
| Doc | Scope | One-line takeaway |
|-----|-------|-------------------|
| [00-system-review.md](00-system-review.md) | Capability audit of the current engine | Signal layer is the deepest asset (`ruvsense/` ≈14.4k lines, 310 in-module tests); the model tier is the emptiest (no trained checkpoint in-tree); the live 20 Hz path is the main integration gap |
| [01-sota-landscape-2026.md](01-sota-landscape-2026.md) | Published SOTA per capability axis (web-verified) | Defines the beyond-SOTA bar: 12-row capability → published SOTA → RuView-today → target table; IEEE 802.11bf-2025 is ratified and moves the moat up-stack |
| [02-beyond-sota-architecture.md](02-beyond-sota-architecture.md) | Target architecture | 8 pillars (RF foundation encoder + UQ heads, differentiable RF forward model, RF-SLAM×WorldGraph loop, camera→RF distillation, swarm apertures, continual adaptation, deterministic WASM edge, NV fusion) — all landing inside existing crates, no rewrite (per ADR-136 §2.1) |
| [03-benchmark-validation-methodology.md](03-benchmark-validation-methodology.md) | Test/validation/benchmark methodology | 6-layer validation pyramid; 15 criterion bench targets inventoried; `benchmark_baseline.json` is a live-capture anchor, not a criterion baseline; statistical protocol from ADR-149 (≥10 seeds, IQM, bootstrap CIs) |
| [04-optimization-roadmap.md](04-optimization-roadmap.md) | Performance review + 90-day plan | ISTA CIR solver is the dominant latency hazard (~1.1 GFLOP/frame at HE40); exact zero-risk wins identified; WorldGraph grows unboundedly (no eviction) — a real bug-class |
## Validation results (this session, 2026-06-09)
All measured on this branch (`claude/ruview-beyond-sota-xgv8aq`), Linux
container, `cargo test --workspace --exclude wifi-densepose-desktop
--no-default-features` (the desktop crate needs GTK system libraries absent in
the container; this is an environment limitation, not a code failure).
| Layer | Command | Result |
|-------|---------|--------|
| L0 unit/integration | `cargo test --workspace --exclude wifi-densepose-desktop --no-default-features` | **154 suites, 2,797 passed, 0 failed** (pre-optimization baseline; re-run post-optimization also green) |
| L1 deterministic proof | `python archive/v1/data/proof/verify.py` | **VERDICT: PASS** — hash `f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a` (bit-exact) |
| L2 criterion (CIR) | `cargo bench -p wifi-densepose-signal --bench cir_bench --no-default-features --features cir` | Baselines captured pre/post optimization (below) |
~~Known pre-existing issue (not introduced here): `cargo check -p
wifi-densepose-mat --no-default-features` fails standalone with 101 serde
feature-unification errors; it builds and passes inside `--workspace` runs.~~
**Fixed on this branch:** `pub mod api` (the only serde user) is now gated
behind the `api` feature that owns the optional serde dependency; all feature
combos compile.
## Optimizations applied (this session)
Two **exact** (bit-identical float results — summation order unchanged,
witness chain unaffected) optimizations from the 04 roadmap's "zero-risk"
tier were implemented and verified:
1. **`cir.rs` warm-start precompute** — the diagonal Tikhonov preconditioner
`diag(Φ^H Φ) + λI` and its CSR matrix depend only on Φ and λ (fixed at
`CirEstimator::new`) but were rebuilt on every frame (O(K·G) pass + CSR
allocation). Moved to construction
(`crates/wifi-densepose-signal/src/ruvsense/cir.rs`,
`build_warm_start_system`).
2. **`tomography.rs` solver hoisting** — the ISTA gradient `Vec` was
allocated inside the 100-iteration loop and the Frobenius Lipschitz bound
recomputed per `reconstruct` call; both hoisted
(`crates/wifi-densepose-signal/src/ruvsense/tomography.rs`).
### Measured impact (criterion, paired pre/post baselines, same container)
| Bench | Pre-opt | Post-opt | Change | Significant? |
|-------|---------|----------|--------|--------------|
| `cir_estimate/he40` | 12.34 ms | 11.86 ms | **3.9 %** | yes (p < 0.01) |
| `cir_multiband_3band` (30 ms group) | 30.16 ms | 29.72 ms | 1.4 % | yes (p < 0.01) |
| `cir_multiband` (142 ms group) | 141.9 ms | 140.1 ms | 1.2 % | yes (p < 0.01) |
| `cir_estimate/ht40` | 11.73 ms | 11.78 ms | +0.4 % | no (p = 0.28) |
| `cir_estimate/he20` | 2.49 ms | 2.49 ms | 0.1 % | no (p = 0.85) |
| `cir_estimate/ht20` | 2.48 ms | 2.58 ms | +3.8 % | noise — see note |
Note on ht20: `cir_estimator_new/ht20` (construction, which now does strictly
*more* work) also shows "+3 %", establishing a ≈34 % container noise floor;
the ht20 estimate delta is within it. The honest summary: the warm-start
precompute removes 1 of ~101 O(K·G) passes per frame, so the expected gain is
≈14 % — consistent with what was measured. The dominant per-frame cost is
the 100-iteration ISTA loop itself, which is exactly what the roadmap's
flag-gated FFT-operator proposal (840× on the mat-vecs, requires witnessed
hash regeneration) targets next.
Correctness post-optimization: `wifi-densepose-signal` 456 tests green;
`wifi-densepose-engine` 11/11 green including `cycle_is_deterministic` and
`calibration_mismatch_demotes_and_witness_stable` (witness-chain stability).
## Headline conclusions
1. **"Beyond SOTA" is currently unfalsifiable** without a real-CSI
ground-truth benchmark — standing one up (per doc 03's acceptance table
and ADR-149's statistical protocol) is the highest-leverage next step.
2. **The path is evolution, not rewrite**: all eight architecture pillars in
doc 02 land inside existing crates on the ADR-136 `Stage<I,O>`/`FrameMeta`
contract spine.
3. **The biggest engineering gaps** are the live 20 Hz ingest path, a trained
RF encoder checkpoint, and WorldGraph retention/eviction — ahead of any
frontier capability work.
4. **Determinism is the differentiator**: every optimization and new pillar
must preserve the witness chain; the advisory-vs-witnessed split (doc 02
§determinism) is the mechanism that lets frontier components in without
breaking it.