wifi-densepose/docs/research/ruview-beyond-sota/README.md

97 lines
6.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# RuView Beyond-SOTA Research Series
Research swarm output (2026-06-09) defining what a beyond-state-of-the-art
RuView implementation is, what the current system actually delivers, and the
validation/benchmark/optimization evidence gathered in the same session.
Produced by a 5-agent hierarchical research swarm (system reviewer, SOTA
surveyor, architect, benchmark methodologist, performance analyst) plus a
validation pass run against the working tree.
## Documents
| Doc | Scope | One-line takeaway |
|-----|-------|-------------------|
| [00-system-review.md](00-system-review.md) | Capability audit of the current engine | Signal layer is the deepest asset (`ruvsense/` ≈14.4k lines, 310 in-module tests); the model tier is the emptiest (no trained checkpoint in-tree); the live 20 Hz path is the main integration gap |
| [01-sota-landscape-2026.md](01-sota-landscape-2026.md) | Published SOTA per capability axis (web-verified) | Defines the beyond-SOTA bar: 12-row capability → published SOTA → RuView-today → target table; IEEE 802.11bf-2025 is ratified and moves the moat up-stack |
| [02-beyond-sota-architecture.md](02-beyond-sota-architecture.md) | Target architecture | 8 pillars (RF foundation encoder + UQ heads, differentiable RF forward model, RF-SLAM×WorldGraph loop, camera→RF distillation, swarm apertures, continual adaptation, deterministic WASM edge, NV fusion) — all landing inside existing crates, no rewrite (per ADR-136 §2.1) |
| [03-benchmark-validation-methodology.md](03-benchmark-validation-methodology.md) | Test/validation/benchmark methodology | 6-layer validation pyramid; 15 criterion bench targets inventoried; `benchmark_baseline.json` is a live-capture anchor, not a criterion baseline; statistical protocol from ADR-171 (≥10 seeds, IQM, bootstrap CIs) |
| [04-optimization-roadmap.md](04-optimization-roadmap.md) | Performance review + 90-day plan | ISTA CIR solver is the dominant latency hazard (~1.1 GFLOP/frame at HE40); exact zero-risk wins identified; WorldGraph grows unboundedly (no eviction) — a real bug-class |
## Validation results (this session, 2026-06-09)
All measured on this branch (`claude/ruview-beyond-sota-xgv8aq`), Linux
container, `cargo test --workspace --exclude wifi-densepose-desktop
--no-default-features` (the desktop crate needs GTK system libraries absent in
the container; this is an environment limitation, not a code failure).
| Layer | Command | Result |
|-------|---------|--------|
| L0 unit/integration | `cargo test --workspace --exclude wifi-densepose-desktop --no-default-features` | **154 suites, 2,797 passed, 0 failed** (pre-optimization baseline; re-run post-optimization also green) |
| L1 deterministic proof | `python archive/v1/data/proof/verify.py` | **VERDICT: PASS** — hash `f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a` (bit-exact) |
| L2 criterion (CIR) | `cargo bench -p wifi-densepose-signal --bench cir_bench --no-default-features --features cir` | Baselines captured pre/post optimization (below) |
~~Known pre-existing issue (not introduced here): `cargo check -p
wifi-densepose-mat --no-default-features` fails standalone with 101 serde
feature-unification errors; it builds and passes inside `--workspace` runs.~~
**Fixed on this branch:** `pub mod api` (the only serde user) is now gated
behind the `api` feature that owns the optional serde dependency; all feature
combos compile.
## Optimizations applied (this session)
Two **exact** (bit-identical float results — summation order unchanged,
witness chain unaffected) optimizations from the 04 roadmap's "zero-risk"
tier were implemented and verified:
1. **`cir.rs` warm-start precompute** — the diagonal Tikhonov preconditioner
`diag(Φ^H Φ) + λI` and its CSR matrix depend only on Φ and λ (fixed at
`CirEstimator::new`) but were rebuilt on every frame (O(K·G) pass + CSR
allocation). Moved to construction
(`crates/wifi-densepose-signal/src/ruvsense/cir.rs`,
`build_warm_start_system`).
2. **`tomography.rs` solver hoisting** — the ISTA gradient `Vec` was
allocated inside the 100-iteration loop and the Frobenius Lipschitz bound
recomputed per `reconstruct` call; both hoisted
(`crates/wifi-densepose-signal/src/ruvsense/tomography.rs`).
### Measured impact (criterion, paired pre/post baselines, same container)
| Bench | Pre-opt | Post-opt | Change | Significant? |
|-------|---------|----------|--------|--------------|
| `cir_estimate/he40` | 12.34 ms | 11.86 ms | **3.9 %** | yes (p < 0.01) |
| `cir_multiband_3band` (30 ms group) | 30.16 ms | 29.72 ms | 1.4 % | yes (p < 0.01) |
| `cir_multiband` (142 ms group) | 141.9 ms | 140.1 ms | 1.2 % | yes (p < 0.01) |
| `cir_estimate/ht40` | 11.73 ms | 11.78 ms | +0.4 % | no (p = 0.28) |
| `cir_estimate/he20` | 2.49 ms | 2.49 ms | 0.1 % | no (p = 0.85) |
| `cir_estimate/ht20` | 2.48 ms | 2.58 ms | +3.8 % | noise see note |
Note on ht20: `cir_estimator_new/ht20` (construction, which now does strictly
*more* work) also shows "+3 %", establishing a 34 % container noise floor;
the ht20 estimate delta is within it. The honest summary: the warm-start
precompute removes 1 of ~101 O(K·G) passes per frame, so the expected gain is
14 % consistent with what was measured. The dominant per-frame cost is
the 100-iteration ISTA loop itself, which is exactly what the roadmap's
flag-gated FFT-operator proposal (840× on the mat-vecs, requires witnessed
hash regeneration) targets next.
Correctness post-optimization: `wifi-densepose-signal` 456 tests green;
`wifi-densepose-engine` 11/11 green including `cycle_is_deterministic` and
`calibration_mismatch_demotes_and_witness_stable` (witness-chain stability).
## Headline conclusions
1. **"Beyond SOTA" is currently unfalsifiable** without a real-CSI
ground-truth benchmark standing one up (per doc 03's acceptance table
and ADR-171's statistical protocol) is the highest-leverage next step.
2. **The path is evolution, not rewrite**: all eight architecture pillars in
doc 02 land inside existing crates on the ADR-136 `Stage<I,O>`/`FrameMeta`
contract spine.
3. **The biggest engineering gaps** are the live 20 Hz ingest path, a trained
RF encoder checkpoint, and WorldGraph retention/eviction ahead of any
frontier capability work.
4. **Determinism is the differentiator**: every optimization and new pillar
must preserve the witness chain; the advisory-vs-witnessed split (doc 02
§determinism) is the mechanism that lets frontier components in without
breaking it.