ADR-154: Signal/DSP Beyond-SOTA Sweep — Milestone 0 (Correctness, Provable Perf, and the SOTA Landscape)

| Field | Value |
|---|---|
| Status | Proposed |
| Date | 2026-06-11 |
| Deciders | ruv |
| Codebase target | wifi-densepose-signal (ruvsense/, features.rs, csi_processor.rs, spectrogram.rs, bvp.rs), benches, docs |
| Relates to | ADR-134 (CIR sparse recovery), ADR-135 (Empty-Room Baseline), ADR-029/030/032 (Multistatic mesh + security), ADR-152 (WiFi-Pose SOTA 2026 intake), ADR-153 (802.11bf forward-compat) |
| Scope | Milestone 0 of the beyond-SOTA signal/DSP sweep: high-leverage correctness/security fixes, two measured perf wins, the per-module SOTA landscape with evidence grades, and a prioritized roadmap. 45 review findings are explicitly deferred (§7 backlog); nothing is silently dropped. |

0. PROOF discipline (this ADR's contract)

This project has been publicly accused of "AI slop." This ADR answers that with evidence, not adjectives:

  • Every claimed code improvement ships with a committed regression test (correctness) or a committed criterion bench (performance).
  • Every perf number below is MEASURED before/after with the exact reproduce command. A perf claim without a measured before/after is UNPROVEN and is not made here.
  • Every external SOTA reference is graded MEASURED / CLAIMED / THEORETICAL, distinguishing what a paper measured from what it asserts and from what is merely plausible.
  • The headline finding — a dead CIR coherence gate that silently fell back in production for every canonical frame — is disclosed in full (§2), not buried.

Test machine for the perf numbers: Windows 11, cargo bench --release, criterion 0.5. Numbers are wall-clock medians on this box; they are about ratios (before/after), which are stable across machines, not absolute ns.


1. Context

The RuvSense signal stack (16 ruvsense/ modules + the classic features.rs/csi_processor.rs/spectrogram.rs/bvp.rs pipeline) grew quickly across ADR-014/029/030/134/135. A beyond-SOTA review surfaced ~50 findings ranging from two critical correctness/security defects to micro-optimizations and SOTA-gap research items. Milestone 0 closes the provable, high-leverage subset: the two criticals, a divide-by-zero trio, two measured perf wins, and the research landscape. The remaining ~45 are catalogued in §7 so the backlog is explicit and auditable.


2. The headline finding — the ADR-134 CIR coherence gate was DEAD in production (CRITICAL, FIXED)

2.1 What was wrong

MultistaticFuser fuses canonical CSI frames: hardware_norm.rs resamples every chipset onto a uniform 56-tone canonical grid before fusion (HardwareNormalizer, default canonical_subcarriers = 56). The ADR-134 CIR coherence gate (cir_gate_coherence, multistatic.rs) is supposed to blend a CIR dominant-tap ratio into the cross-node coherence — coherence = 0.7·freq + 0.3·dominant_tap_ratio.

But the gate was wired to CirEstimator::new(CirConfig::ht20()) (with_cir_ht20), and ht20() expects 64 FFT bins or 52 active tones. A canonical-56 frame matches neither, so every call returned CirError::SubcarrierMismatch and cir_gate_coherence hit its silent Err(_) => freq_coherence fallback (multistatic.rs). Net effect: the CIR gate never ran on a single production frame, and use_cir_gate = true was indistinguishable from false. This is the exact shape of "AI slop": a feature that compiles, has tests on the estimator, and is dead at the integration seam.

2.2 The fix (the gate now actually runs)

  • New CirConfig::canonical56() (cir.rs): 64-bin HT20 framing, 56 active tones, 168 delay taps, Φ built over a contiguous −28..+28 active-tone grid (also the native Atheros-56 layout). bandwidth_hz/tap_spacing stay physically correct for a 20 MHz HT20 channel; only the active-tone count differs from ht20().
  • New MultistaticFuser::with_cir_canonical56() — the correct default for the RuvSense pipeline. with_cir_ht20() is retained for genuine raw-64/52 feeds and now carries a loud doc-warning.
  • active_indices() handles (64, 56) explicitly and the fallback now selects the slice whose length matches num_active (so Φ's column count is always self-consistent — no silent fall-through to the 52-index slice).
  • The remaining silent fallback is made LOUD: a SubcarrierMismatch inside cir_gate_coherence now fires a debug_assert! naming the misconfiguration ("CIR gate DEAD … build it with CirConfig::canonical56()"). A config error can no longer hide as a graceful runtime degrade.
  • cir_estimate_first() exposes the raw estimate() verdict so a test can count Ok vs Err on a canonical-56 stream.
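
The failure mode and the fix can be illustrated with a reduced model. The sketch below is not the crate's code: `SketchCirConfig`, `SketchCirError`, `dominant_tap_ratio`, and `gate_coherence` are hypothetical stand-ins, the estimator is replaced by a fixed placeholder ratio, and the rule "accept an input whose length equals the FFT size or the active-tone count" is an assumption drawn from the description above. Only the 0.7/0.3 blend and the fallback to `freq_coherence` on error come from this ADR.

```rust
/// Hypothetical stand-in for the estimator configuration (not the crate's `CirConfig`).
#[derive(Clone, Copy)]
pub struct SketchCirConfig {
    pub fft_size: usize,
    pub num_active: usize,
}

impl SketchCirConfig {
    /// Mirrors the described `ht20()` expectation: 64 FFT bins or 52 active tones.
    pub fn ht20() -> Self {
        Self { fft_size: 64, num_active: 52 }
    }

    /// Mirrors the described `canonical56()`: 64-bin framing, 56 active tones.
    pub fn canonical56() -> Self {
        Self { fft_size: 64, num_active: 56 }
    }
}

#[derive(Debug, PartialEq)]
pub enum SketchCirError {
    SubcarrierMismatch { expected: usize, got: usize },
}

/// Placeholder estimator: validates the frame length, then returns a fixed
/// dominant-tap ratio. A real estimator would run sparse recovery here.
pub fn dominant_tap_ratio(cfg: SketchCirConfig, frame: &[f64]) -> Result<f64, SketchCirError> {
    let n = frame.len();
    if n == cfg.fft_size || n == cfg.num_active {
        Ok(0.9) // assumed value, for illustration only
    } else {
        Err(SketchCirError::SubcarrierMismatch { expected: cfg.num_active, got: n })
    }
}

/// The pre-fix gate shape: blend on success, fall back silently on error.
pub fn gate_coherence(cfg: SketchCirConfig, frame: &[f64], freq_coherence: f64) -> f64 {
    match dominant_tap_ratio(cfg, frame) {
        Ok(ratio) => 0.7 * freq_coherence + 0.3 * ratio,
        // Silent fallback: with a mismatched config this arm is taken on
        // every frame, so the gate looks enabled but never contributes.
        Err(_) => freq_coherence,
    }
}
```

With a 56-element frame, `gate_coherence(SketchCirConfig::ht20(), &frame, 0.5)` returns the unmodified 0.5, while the `canonical56()` configuration applies the blend. Counting `Ok` against `Err` on a stream of such frames is the same idea as the regression tests in §2.3.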

2.3 The PROOF (committed regression tests, ruvsense::multistatic::tests)

| Test | Asserts | Result |
|---|---|---|
| cir_gate_ht20_is_dead_on_canonical56 | old ht20 estimator on 8 canonical-56 frames → 0 Ok, 8 SubcarrierMismatch | the dead gate, measured |
| cir_gate_canonical56_is_alive | new canonical56 estimator on the same 8 frames → 8 Ok, 0 Err | the gate runs |
| cir_gate_on_changes_coherence_vs_off | coherence(gate on) ≠ coherence(gate off) (\|Δ\| > 1e-6) | the CIR term is actually applied |
| cir_gate_dead_ht20_equals_gate_off (release-only) | dead-ht20 coherence == gate-off coherence (\|Δ\| < 1e-9) | confirms the silent degradation the fix removes |

Reproduce:

cd v2 && cargo test -p wifi-densepose-signal --no-default-features --lib \
  ruvsense::multistatic::tests::cir
# 3 passed (the 4th is #[cfg(not(debug_assertions))], add --release to run it)

Resolution: FIXED (not merely loud-fail-documented). The gate now decodes 100% of canonical-56 frames where it previously decoded 0%.


3. The second critical — NaN/inf adversarial-detector bypass (CRITICAL, FIXED)

3.1 What was wrong

AdversarialDetector::check (adversarial.rs) takes per-link link_energies: &[f64]. A single NaN/inf entry bypassed the whole detector: every e > threshold test is false on NaN, the Gini sort used partial_cmp().unwrap_or(Equal), and the final anomaly_score.clamp(0,1) returns NaN on a NaN input. A real RF link can never have NaN/inf energy, so a non-finite input is itself the strongest possible spoof — yet it could slip through as "clean."

3.2 The fix

Finite-validate at the boundary: the first non-finite link_energies entry now short-circuits to a definite anomaly (anomaly_detected = true, anomaly_score = 1.0, affected_links = [bad_idx], FieldModelViolation), and the poisoned frame is not seeded into the temporal-continuity state.
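
A minimal sketch of boundary validation, assuming a reduced verdict type. `SketchVerdict`, `first_non_finite`, and this `check` are hypothetical names, and the threshold scoring after the guard is a placeholder rather than the detector's real consistency/Gini/temporal logic. The point is only the ordering: the finiteness test runs before any comparison or clamp can see a NaN.

```rust
/// Index of the first non-finite (NaN or ±inf) entry, if any.
pub fn first_non_finite(link_energies: &[f64]) -> Option<usize> {
    link_energies.iter().position(|e| !e.is_finite())
}

/// Hypothetical reduced verdict (the real detector reports more fields).
#[derive(Debug, PartialEq)]
pub struct SketchVerdict {
    pub anomaly_detected: bool,
    pub anomaly_score: f64,
    pub affected_links: Vec<usize>,
}

pub fn check(link_energies: &[f64], threshold: f64) -> SketchVerdict {
    // Guard first: a non-finite energy is itself treated as a definite anomaly.
    if let Some(bad_idx) = first_non_finite(link_energies) {
        return SketchVerdict {
            anomaly_detected: true,
            anomaly_score: 1.0,
            affected_links: vec![bad_idx],
        };
    }

    // Placeholder scoring (assumption): fraction of links above the threshold.
    let affected: Vec<usize> = link_energies
        .iter()
        .enumerate()
        .filter(|(_, e)| **e > threshold)
        .map(|(i, _)| i)
        .collect();
    let score = if link_energies.is_empty() {
        0.0
    } else {
        affected.len() as f64 / link_energies.len() as f64
    };
    SketchVerdict {
        anomaly_detected: !affected.is_empty(),
        anomaly_score: score.clamp(0.0, 1.0),
        affected_links: affected,
    }
}
```

Without the guard, a NaN entry would fail every `e > threshold` comparison and `f64::clamp` would pass the NaN through unchanged, which is the bypass described in §3.1.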

3.3 The PROOF

| Test | Asserts |
|---|---|
| nan_link_energy_flags_anomaly | a NaN link energy → anomaly_detected, score 1.0, affected link reported, anomaly_count == 1 |
| inf_link_energy_flags_anomaly | both +inf and −inf → anomaly, score 1.0 |

Reproduce:

cd v2 && cargo test -p wifi-densepose-signal --no-default-features --lib -- \
  ruvsense::adversarial::tests::nan_link ruvsense::adversarial::tests::inf_link

4. Divide-by-(n−1) window trio (CORRECTNESS, FIXED)

Three windowing helpers divided by (n − 1) with no small-n guard:

| Site | Bug | Fix |
|---|---|---|
| csi_processor.rs CsiPreprocessor::hamming_window(n) | n=0 underflowed 0usize - 1; n=1 divided by 0 → all-NaN window | match n { 0 => [], 1 => [1.0], _ => … } |
| bvp.rs Hann window | window_size=1 divided by 0 → NaN BVP | length-1 guard → constant [1.0] |
| spectrogram.rs make_window | size=1 divided by 0 for Hann/Hamming/Blackman | size <= 1 short-circuit → vec![1.0; size] |

The standard convention for a length-1 window is the constant 1.0; length-0 is empty.
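
The guard pattern can be sketched for the Hamming case as follows. This is a standalone free function written for illustration, not the body of `CsiPreprocessor::hamming_window`; it assumes the usual symmetric Hamming coefficients 0.54 and 0.46.

```rust
use std::f64::consts::PI;

/// Symmetric Hamming window with explicit handling of the degenerate sizes.
pub fn hamming_window(n: usize) -> Vec<f64> {
    match n {
        // Length 0: empty window, and no `n - 1` underflow on usize.
        0 => Vec::new(),
        // Length 1: the constant 1.0, and no division by zero.
        1 => vec![1.0],
        _ => {
            let denom = (n - 1) as f64;
            (0..n)
                .map(|i| 0.54 - 0.46 * (2.0 * PI * i as f64 / denom).cos())
                .collect()
        }
    }
}
```

Handling 0 and 1 in the `match` keeps the general arm free of special cases, so inputs with n ≥ 2 follow exactly the same arithmetic as before the guard was added.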

PROOF: test_hamming_window_degenerate_sizes (csi_processor), bvp_window_size_one_is_finite (bvp), make_window_size_0_and_1_are_safe (spectrogram) — each asserts finiteness at sizes 0/1/2.

The Python deterministic proof (archive/v1/data/proof/verify.py) still prints VERDICT: PASS with the same pipeline hash f8e76f21…46f7a — the reference path uses n ≥ 2, so the guard is bit-transparent there.


5. Measured performance wins (MEASURED before/after; benches committed)

Both changes are bit-equivalent (asserted by a committed test) — they only remove wasted work. New criterion benches in benches/features_bench.rs (registered in Cargo.toml).

Reproduce both:

cd v2 && cargo bench -p wifi-densepose-signal --no-default-features --bench features_bench
# compile-only: append --no-run

5.1 FFT-planner caching for PSD (features.rs)

PowerSpectralDensity::from_csi_data constructed a fresh FftPlanner and re-planned the FFT on every frame — and FeatureExtractor::extract calls it per frame on the hot path. New from_csi_data_with_fft(csi, fft_size, &Arc<dyn Fft>) reuses a plan cached in FeatureExtractor (built once in new()). Output is bit-identical (psd_cached_fft_bit_identical_to_fresh compares f64::to_bits of values + all summary stats across 6 FFT sizes).
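
The hoisting pattern and the bit-identity check can be sketched without any FFT crate. In the sketch below a naive O(n²) DFT with a precomputed twiddle table stands in for the planned FFT; `DftPlan`, `psd_fresh`, and `SketchExtractor` are hypothetical names, and none of this reproduces the real `FftPlanner` or `from_csi_data_with_fft` API. It shows only the shape of the change: build the plan once in `new()`, reuse it per frame, and compare outputs with `f64::to_bits`.

```rust
use std::f64::consts::PI;

/// Stand-in for an FFT plan: a precomputed twiddle table for a naive DFT.
pub struct DftPlan {
    n: usize,
    twiddles: Vec<(f64, f64)>,
}

impl DftPlan {
    /// The "planning" cost that the cached path pays only once.
    pub fn new(n: usize) -> Self {
        let twiddles = (0..n)
            .map(|k| {
                let angle = -2.0 * PI * k as f64 / n as f64;
                (angle.cos(), angle.sin())
            })
            .collect();
        Self { n, twiddles }
    }

    /// Power spectrum |X[k]|^2 of a real frame of length `n`.
    pub fn power_spectrum(&self, frame: &[f64]) -> Vec<f64> {
        assert_eq!(frame.len(), self.n);
        (0..self.n)
            .map(|k| {
                let (mut re, mut im) = (0.0, 0.0);
                for (t, &x) in frame.iter().enumerate() {
                    let (c, s) = self.twiddles[(k * t) % self.n];
                    re += x * c;
                    im += x * s;
                }
                re * re + im * im
            })
            .collect()
    }
}

/// Before: a fresh plan is built for every frame.
pub fn psd_fresh(frames: &[Vec<f64>], n: usize) -> Vec<Vec<f64>> {
    frames.iter().map(|f| DftPlan::new(n).power_spectrum(f)).collect()
}

/// After: the plan is built once and reused on the hot path.
pub struct SketchExtractor {
    plan: DftPlan,
}

impl SketchExtractor {
    pub fn new(n: usize) -> Self {
        Self { plan: DftPlan::new(n) }
    }

    pub fn psd(&self, frames: &[Vec<f64>]) -> Vec<Vec<f64>> {
        frames.iter().map(|f| self.plan.power_spectrum(f)).collect()
    }
}
```

Because plan construction is deterministic, the fresh and cached paths execute the same floating-point operations in the same order, which is why a `to_bits` comparison is a reasonable equivalence test for this kind of change.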

Bench group psd_fft_planner: fresh_planner (before) vs cached_planner (after), per frame:

| fft_size | before (fresh plan), median | after (cached), median | speedup |
|---|---|---|---|
| 64 | 5.84 µs/frame | 1.89 µs/frame | 3.09× |
| 128 | 9.31 µs/frame | 3.61 µs/frame | 2.58× |
| 256 | 13.77 µs/frame | 6.73 µs/frame | 2.04× |

Medians from criterion (warm-up 1 s, 20 samples). Raw three-point estimates (low/median/high), per frame: fresh/64 [5.27, 5.84, 6.34] µs vs cached/64 [1.76, 1.89, 2.03] µs; fresh/256 [13.29, 13.77, 14.32] µs vs cached/256 [6.26, 6.73, 7.43] µs. The saving comes from hoisting FftPlanner construction and planning out of the per-frame loop. The relative gain is largest at small FFT sizes, where planning is a larger fraction of a cheap transform, and settles at roughly 2× at 256.

5.2 DTW Sakoe-Chiba band honored (gesture.rs)

dtw_distance computed the band bounds j_start/j_end but still iterated the full 1..=m row, continue-ing on out-of-band cells, so the band constrained the path but not the work (still O(n·m)). The fix iterates only j_start..=j_end (O(n·band)), resetting just the two boundary-guard cells the recurrence can read, and computes the endpoint reachability (|n − m| ≤ band) at the return site. Result is bit-identical to the full-row version across 12 shapes × 8 band widths (dtw_banded_bit_identical_to_fullrow).
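
The difference between the two loops can be sketched as below. This is a simplified illustration, not gesture.rs: it keeps a full (n+1)×(m+1) cost matrix initialised to infinity instead of the rolling rows with boundary-guard resets described above, uses |a − b| as the local cost, and returns a visited-cell counter purely to make the work difference observable. All function names are hypothetical.

```rust
/// In-band column range for row `i` (1-based), clipped to 1..=m.
fn band_bounds(i: usize, band: usize, m: usize) -> (usize, usize) {
    (i.saturating_sub(band).max(1), (i + band).min(m))
}

/// Shared DTW cell update with |a - b| as the local cost.
fn relax(cost: &mut [Vec<f64>], a: &[f64], b: &[f64], i: usize, j: usize) {
    let d = (a[i - 1] - b[j - 1]).abs();
    let best = cost[i - 1][j].min(cost[i][j - 1]).min(cost[i - 1][j - 1]);
    cost[i][j] = d + best;
}

/// Before: every column is visited, out-of-band cells are skipped one by one.
pub fn dtw_full_row(a: &[f64], b: &[f64], band: usize) -> (f64, usize) {
    let (n, m) = (a.len(), b.len());
    let mut cost = vec![vec![f64::INFINITY; m + 1]; n + 1];
    cost[0][0] = 0.0;
    let mut visited = 0;
    for i in 1..=n {
        let (j_start, j_end) = band_bounds(i, band, m);
        for j in 1..=m {
            visited += 1;
            if j < j_start || j > j_end {
                continue;
            }
            relax(&mut cost, a, b, i, j);
        }
    }
    (cost[n][m], visited)
}

/// After: only in-band columns are visited (an empty range if the band misses the row).
pub fn dtw_banded(a: &[f64], b: &[f64], band: usize) -> (f64, usize) {
    let (n, m) = (a.len(), b.len());
    let mut cost = vec![vec![f64::INFINITY; m + 1]; n + 1];
    cost[0][0] = 0.0;
    let mut visited = 0;
    for i in 1..=n {
        let (j_start, j_end) = band_bounds(i, band, m);
        for j in j_start..=j_end {
            visited += 1;
            relax(&mut cost, a, b, i, j);
        }
    }
    (cost[n][m], visited)
}
```

Both functions update the same cells in the same order, so the returned distances match bit for bit; only the number of loop iterations differs. When |n − m| exceeds the band, the end cell is never reached and both return infinity.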

Bench group dtw_sakoe_chiba: full_row (before) vs banded (after):

| case | before (full row), median | after (banded), median | speedup |
|---|---|---|---|
| n=m=100, band=5 | 33.45 µs | 13.77 µs | 2.43× |
| n=m=200, band=5 | 122.32 µs | 29.55 µs | 4.14× |
| n=m=200, band=10 | 159.98 µs | 60.19 µs | 2.66× |

Medians from criterion (warm-up 1 s, 20 samples). Raw (low/median/high): full_row n200_band5 [107.6, 122.3, 146.5] µs vs banded n200_band5 [26.4, 29.5, 33.1] µs. The speedup tracks the inner-loop cell-count ratio m / (2·band+1) — n=m=200, band=5 → 200/11 ≈ 18× fewer cells, but euclidean-distance cost and loop overhead dominate at these sizes so the wall-clock win is ~4× (still the largest at the longest sequence / narrowest band, exactly as the algorithm predicts). It shrinks toward 1× as the band widens to cover the whole matrix (band=10 → 2.66×), and grows with sequence length (band=5: 2.43× at n=100 → 4.14× at n=200).

Note on the other re-plan sites. spectrogram.rs/bvp.rs plan their FFT once per call and reuse it across all frames/subcarriers (already amortized), so caching there is marginal — deferred (§7). The PSD site was the only one re-planning per frame.


6. Per-module SOTA landscape (evidence-graded)

Grades: MEASURED (the source measured it, ideally with public method/code), CLAIMED (asserted, no reproducible artifact), THEORETICAL (plausible, no published target).

6.1 CSI → CIR (cir.rs — our ISTA/L1 sparse recovery)

  • Deep-unfolded ISTA / LISTA for CSI→CIR — MEASURED. Learned ISTA unrolling reports ~3 dB NMSE improvement over classical OMP/FISTA for channel/CIR estimation (arXiv 2211.15440; survey 2502.05952). Public methods; numbers measured in-paper. This is our #1 future item (§7) — our cir.rs already builds the sub-DFT Φ that LISTA would make trainable.
  • Diffusion CIR prior — MEASURED (artifact). github.com/benediktfesl/Diffusion_channel_est ships public weights for a diffusion-model channel-estimation prior. Heavier than our edge budget; tracked, not adopted.
  • Coherence gating (the §2 gate) — THEORETICAL. Our 0.7/0.3 freq/CIR blend is an engineering heuristic with no published accuracy target; now that it runs, it can finally be A/B-measured.

6.2 Adversarial robustness (adversarial.rs)

  • Adversarial-robustness eval for WiFi sensing — MEASURED. arXiv 2511.20456 + the Wi-Spoof benchmark provide a measured evaluation protocol for spoofed/injected CSI. Our detector's physical-plausibility checks (consistency/Gini/temporal/energy) are in the same spirit; adopting Wi-Spoof as an external benchmark is a §7 item. (The §3 NaN fix is a precondition: a detector that NaN-bypasses can't be benchmarked honestly.)

6.3 Multi-AP / multistatic fusion (multistatic.rs)

  • Bayesian multi-AP fusion — CLAIMED. arXiv 2512.02462 proposes a Bayesian fusion across APs; no code released, numbers self-reported. Our attention-weighted fusion is a different (cheaper) mechanism; tracked as a comparison target, not adopted.

6.4 RF intention-lead / pre-movement (intention.rs) — THEORETICAL

The 200–500 ms pre-movement "lead signal" framing has no published commodity-WiFi target we can grade. Honestly THEORETICAL; no work item.


7. Decision, roadmap, and the deferred-findings backlog

7.1 Accepted now (this milestone)

The §2–§5 fixes are ACCEPTED and committed: dead CIR gate fixed, NaN bypass fixed, window trio fixed, vacuous calibration branch removed (§7.4 #2), two measured perf wins. All cargo test -p wifi-densepose-signal --no-default-features (and --features cir) runs are green; the Python proof prints PASS.

7.2 Top accepted-future item — LISTA-for-CIR (NOT implemented here)

Unroll the existing ISTA in cir.rs into trainable layers (LISTA). Effort: M. The sensing matrix Φ and the ISTA recurrence already exist; LISTA replaces the fixed step size / threshold with per-layer learned parameters over a fixed unroll depth. Measured target to beat: ~3 dB NMSE over OMP/FISTA (arXiv 2211.15440 — MEASURED). Proposed, not built in Milestone 0.
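
To make the proposal concrete, the sketch below shows an ISTA pass written as a fixed number of layers, each with its own step size and threshold. It is an illustration under stated assumptions, not cir.rs: Φ is real-valued and dense here for brevity (a sub-DFT Φ would be complex), `LayerParams` and `unrolled_ista` are hypothetical names, and no training is shown. Classical ISTA corresponds to passing the same pair in every layer; a LISTA variant would learn one pair per layer.

```rust
/// Soft-thresholding operator: sign(v) * max(|v| - t, 0).
pub fn soft_threshold(v: f64, t: f64) -> f64 {
    v.signum() * (v.abs() - t).max(0.0)
}

/// Per-layer parameters: fixed in classical ISTA, learned per layer in LISTA.
#[derive(Clone, Copy)]
pub struct LayerParams {
    pub step: f64,
    pub threshold: f64,
}

/// Unrolled ISTA: per layer, x <- soft(x + step * Phi^T (y - Phi x), threshold).
/// `phi` is a dense real matrix stored as rows; `y` has one entry per row.
pub fn unrolled_ista(phi: &[Vec<f64>], y: &[f64], layers: &[LayerParams]) -> Vec<f64> {
    let cols = phi.first().map_or(0, |row| row.len());
    let mut x = vec![0.0; cols];
    for p in layers {
        // Residual r = y - Phi x, computed once per layer.
        let r: Vec<f64> = phi
            .iter()
            .zip(y)
            .map(|(row, &yi)| yi - row.iter().zip(&x).map(|(a, b)| a * b).sum::<f64>())
            .collect();
        // Gradient step along Phi^T r, followed by shrinkage.
        for j in 0..cols {
            let g: f64 = phi.iter().zip(&r).map(|(row, ri)| row[j] * ri).sum();
            x[j] = soft_threshold(x[j] + p.step * g, p.threshold);
        }
    }
    x
}
```

With Φ equal to the identity, step 1.0 and threshold 0.5, every layer maps y = [3.0, 0.1, −2.0] to [2.5, 0.0, −1.5], which is a quick sanity check that the shrinkage behaves as expected before any learned parameters are introduced.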

7.3 Other graded-future items

  • Adopt Wi-Spoof (arXiv 2511.20456, MEASURED) as the external adversarial benchmark for adversarial.rs.
  • Evaluate the diffusion CIR prior (public weights, MEASURED) as an offline quality ceiling — not an edge target.
  • Bayesian multi-AP fusion (2512.02462, CLAIMED) — comparison only, pending released code.

7.4 Deferred Milestone-0 review findings (explicit backlog)

Catalogued so nothing is silently dropped. Priority: P1 correctness-adjacent, P2 perf, P3 clarity/style.

Milestone-1 update (2026-06-13): the four P1 backlog items (#1, #9, #10, #13) are now cleared. #1 and #10 are RESOLVED (MEASURED); #9 and #13 are RESOLVED-PARTIAL (DATA-GATED: de-magicked + boundary-tested, operating values unchanged). ~41 P2/P3 items remain deferred. Each fix is pinned by a regression test that fails on the old behaviour (commits fd32f094a, 4a9f2bcf4, d672fa602, 5193f6369); workspace --no-default-features green, Python proof unchanged (bit-exact).

| # | Module | Finding | Pri | Why deferred |
|---|---|---|---|---|
| 1 | cir.rs ~937 | phase_variance uses linear variance on wrapped angles (doc says "variance of phase angles"); spuriously inflates near ±π | P1 | RESOLVED (fd32f094a): metric MEASURED, threshold DATA-GATED. Replaced with Mardia's circular variance V = 1 − R̄ ∈ [0,1], invariant to the cluster's position on the circle (branch-cut artefact gone). Guard re-derived against the bounded metric via named const GHOST_TAP_CIRCULAR_VARIANCE_MAX = 0.99 (fires only when R̄ ≤ 0.01, essentially uniform phase). The threshold value is DATA-GATED: a clean single-path ramp also sweeps the circle, so V alone can't separate clean from unsanitized without labelled frames; the default is deliberately conservative (strictly more permissive at the wrap boundary than the buggy linear guard). Fails-on-old: phase_variance_circular_not_fooled_by_branch_cut (old linear variance > TAU on wrap-straddling phases while circular V≈0, guard no longer trips), phase_variance_circular_is_bounded_and_extremal. |
| 2 | calibration.rs ~311 | subtract_in_place had a vacuous if active_input {ki} else {ki} branch implying a full-FFT→bin remap that didn't exist | P3 | Resolved here (branch removed, sequential-convention documented to match the sibling extract_first_stream). Listed for visibility; behavior unchanged. |
| 3 | spectrogram.rs / bvp.rs | FFT planner built once-per-call (already amortized across frames) | P2 | Marginal vs the per-frame PSD site; cache if these become hot. |
| 4 | features.rs ~347 | Doppler FFT planner planned once per call, reused across subcarriers | P2 | Already amortized within the call. |
| 5 | multistatic.rs | node_attention_weights recomputes consensus/softmax each call; no SIMD | P2 | Needs a bench before touching; not obviously hot. |
| 6 | tomography.rs | ISTA L1 solver re-allocates voxel buffers per solve | P2 | Bench first. |
| 7 | pose_tracker.rs | Kalman gain matrices reallocated per update | P2 | Bench first. |
| 8 | field_model.rs | SVD recomputed on every perturbation extract | P2 | Incremental SVD is a real project, not a micro-fix. |
| 9 | coherence.rs / coherence_gate.rs | Z-score thresholds are magic constants, untested at boundaries | P1 | RESOLVED-PARTIAL (5193f6369), DATA-GATED. De-magicked classify_drift (DRIFT_STABLE_SCORE=0.85, DRIFT_STEP_CHANGE_MAX_STALE=10) and the coherence_gate.rs defaults (DEFAULT_ACCEPT_THRESHOLD/…REJECT…/…MAX_STALE_FRAMES/…PREDICT_ONLY_NOISE) into named, documented consts marked EMPIRICAL DEFAULT; added at/just-below/just-above boundary tests (classify_drift_*_boundary) + *_consts_unchanged_from_literals. Operating values explicitly NOT changed; defensible values still need labelled stable/drifting traces. The gate already exposed these via GatePolicyConfig (config seam). |
| 10 | longitudinal.rs | Welford update not numerically guarded for n=0 | P1 | RESOLVED (4a9f2bcf4), MEASURED. The shared WelfordStats (field_model.rs, consumed by longitudinal.rs) count < 2 guards already prevent the n=0 NaN / n=1 div0 / (count − 1) underflow, but the boundary was untested. Added welford_finite_at_n0_and_n1 (finite + documented 0.0 sentinel at n=0/n=1). Fails-on-old proof: removing the sample_variance guard makes the test panic with "attempt to subtract with overflow" at the (count − 1) underflow. |
| 11 | cross_room.rs | Fingerprint hash collisions unhandled | P2 | Low collision prob; needs design. |
| 12 | gesture.rs | euclidean_distance no length-mismatch guard | P3 | Caller-enforced; add debug_assert. |
| 13 | adversarial.rs | Gini/consistency thresholds are magic constants | P1 | RESOLVED-PARTIAL (d672fa602), DATA-GATED. Lifted the bare literals in check/check_consistency (FIELD_MODEL_GINI_VIOLATION=0.8, ENERGY_RATIO_HIGH_VIOLATION=2.0, ENERGY_RATIO_LOW_VIOLATION=0.1, CONSISTENCY_ACTIVE_FRACTION_OF_MEAN=0.1, SCORE_W_*) into named, documented consts marked EMPIRICAL DEFAULT; added at/just-below/just-above boundary tests (energy_ratio_high_boundary, energy_ratio_low_boundary, field_model_gini_boundary, consistency_active_fraction_boundary) + tuning_consts_unchanged_from_literals. Operating values explicitly NOT changed; defensible values still need labelled spoofed/clean CSI (Wi-Spoof, §6.2/§7.3). Bumping a const fails a boundary test (verified). |
| 14 | cir.rs | fft_operator path changes the witness hash (documented); no test that it's numerically close to dense | P2 | Add a tolerance test. |
| 15 | multistatic.rs | cir_gate_coherence only estimates the first node/channel; multi-node CIR consensus unused | P2 | Design item (which node's CIR is authoritative?). |
| 16 | phase_align.rs | Iterative LO offset estimation has no convergence cap test | P2 | Add iteration-cap test. |
| 17 | hampel.rs | Window edge handling at series boundaries | P3 | Cosmetic. |
| 18 | motion.rs | Threshold constants undocumented | P3 | Doc-only. |
| 19 | csi_ratio.rs | Division guard relies on 1e-12 epsilon; no test | P2 | Add boundary test. |
| 20 | spectrogram.rs | compute_multi_subcarrier_spectrogram re-plans per subcarrier via compute_spectrogram | P2 | Hoist the planner (relates to #3). |
| 21–45 | (assorted) | Remaining clarity/doc/magic-constant/missing-boundary-test findings across ruvsense/*, features.rs, motion.rs | P3 | Bulk-addressable in a dedicated "test-the-boundaries + de-magic-constant" follow-up; not high-leverage individually. |
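
Backlog item #1 turns on the difference between linear and circular variance on wrapped phases. The sketch below is a minimal illustration of that difference, not the cir.rs implementation: the function names are hypothetical, the linear variant is a plain population variance, and the 0.0 return for an empty slice is an assumed sentinel. The circular form follows V = 1 − R̄, where R̄ is the length of the mean unit vector.

```rust
/// Plain population variance of the raw angle values (branch-cut sensitive).
pub fn linear_variance(phases: &[f64]) -> f64 {
    if phases.is_empty() {
        return 0.0; // assumed sentinel for the empty case
    }
    let n = phases.len() as f64;
    let mean = phases.iter().sum::<f64>() / n;
    phases.iter().map(|p| (p - mean) * (p - mean)).sum::<f64>() / n
}

/// Circular variance V = 1 - R_bar, where R_bar is the mean resultant length.
pub fn circular_variance(phases: &[f64]) -> f64 {
    if phases.is_empty() {
        return 0.0; // assumed sentinel for the empty case
    }
    let n = phases.len() as f64;
    let c = phases.iter().map(|p| p.cos()).sum::<f64>() / n;
    let s = phases.iter().map(|p| p.sin()).sum::<f64>() / n;
    // Clamp guards against rounding pushing R_bar marginally above 1.
    (1.0 - (c * c + s * s).sqrt()).clamp(0.0, 1.0)
}
```

For a tight cluster that straddles ±π, such as phases within a few hundredths of a radian on either side of the wrap, the linear variance is large because the raw values sit near +π and −π, while the circular variance stays close to 0. For phases spread evenly around the circle, the circular variance approaches 1.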

Horizon-ledger one-liner. Milestone-0 DONE: dead CIR gate (FIXED+proved), NaN/inf adversarial bypass (FIXED+proved), divide-by-(n−1) window trio (FIXED+proved), calibration dead-branch (FIXED), PSD FFT-planner cache (MEASURED), DTW band (MEASURED). Milestone-1 DONE (2026-06-13), all four P1 backlog items cleared: circular phase variance #1 (RESOLVED/MEASURED metric, DATA-GATED threshold), Welford n=0 guard #10 (RESOLVED/MEASURED), threshold magic-constants #9 & #13 (RESOLVED-PARTIAL/DATA-GATED: de-magicked + boundary-tested, values unchanged). DEFERRED to follow-up: the ~41 remaining P2/P3 findings in §7.4; none silently dropped.


8. Consequences

  • Positive: the ADR-134 CIR gate is alive for the first time in production; the adversarial detector can no longer be NaN-bypassed; three latent divide-by-zero NaN sources are gone; the per-frame PSD path and gesture DTW are measurably faster with bit-identical output; the SOTA landscape and a concrete LISTA-for-CIR roadmap are graded and recorded.
  • Negative / honest limits: canonical56() models the canonical grid as a contiguous 56-tone band — a reasonable physical interpretation of a resampled grid, but not a literal hardware tone map; the CIR gate still uses only the first node's CIR (#15). The phase_variance metric is now correct (Mardia circular variance, Milestone-1 #1), so the branch-cut false-trip is gone — but its ghost-tap threshold (GHOST_TAP_CIRCULAR_VARIANCE_MAX = 0.99) is a conservative DATA-GATED default, not a calibrated operating point, and still awaits labelled sanitized/unsanitized frames to tune. Likewise the de-magicked coherence/adversarial thresholds (#9/#13) keep their pre-existing empirical values pending labelled calibration.
  • Neutral: no public API removed; with_cir_ht20() kept (warned); files stay scoped; new bench is additive.