diff --git a/CHANGELOG.md b/CHANGELOG.md index e47603db..c30f9e9c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -31,6 +31,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - **Mesh partition risk now demotes the privacy class and is witnessed (ADR-032).** The dynamic min-cut guard's `at_risk` signal was advisory-only (it fed the recalibration advisor). It now also contributes to the ADR-141 privacy demotion alongside fusion- and array-level contradictions: a mesh close to partitioning makes the fused belief less trustworthy, so the cycle emits at a more restricted class (monotonic — information only removed). Because `effective_class` feeds the BLAKE3 witness, a fragmenting array now shifts the witness — partition risk is auditable, not just logged. The mesh computation moved ahead of the demotion step in `process_cycle`; new `mesh_guard_mut()` exposes risk-threshold tuning. Test proves a forced-risk 3-node cycle demotes PrivateHome Anonymous→Restricted and shifts the witness vs a clean *same-topology* baseline (the only delta between the two cycles is the forced risk). ### Added +- **ADR-154 Milestone-2 — bench-first P2 perf subset + missing boundary tests (`wifi-densepose-signal`, §7.4 #5/#6/#7/#8/#14/#16/#19/#20).** PROOF discipline (ADR-154 §0): every perf item was **benched before being touched** (new committed `benches/dsp_perf_bench.rs`, criterion, this Windows box); only the one item the bench proved hot was optimized, the rest are committed MEASURED-NULLs — a benched null is the proof the micro-opt was unnecessary, the §5.1 "already amortized" pattern. Every behaviour-changing edit is pinned bit-identical (or documented-tolerance). Signal crate lib `--no-default-features`: **447 passed, 0 failed, 1 ignored**; `--features cir`: **447 passed, 0 failed**. + - **#20 MEASURED-HOT, optimized (bit-identical).** `compute_multi_subcarrier_spectrogram` re-planned a fresh `FftPlanner` for *every* subcarrier (via `compute_spectrogram`). Hoisted the plan + window out of the per-subcarrier loop (new `compute_spectrogram_with_plan` core; `compute_spectrogram` delegates, unchanged). **56-subcarrier: 467.88 µs → 254.75 µs = 1.84×** (window 128); **627.27 µs → 448.39 µs = 1.40×** (window 256). Bit-identical via `multi_subcarrier_hoisted_plan_bit_identical` (`f64::to_bits` of every value across all 4 window functions × {power,magnitude}). The §7.4 intro's predicted "most likely real win" — confirmed. + - **#5 / #6 / #7 MEASURED-NULL, left as-is.** `node_attention_weights` 181 ns (2 nodes)…848 ns (8) — sub-µs, no hot-path alloc. `tomography reconstruct` (full 50-iter ISTA, 256 voxels) 47.5 µs (16 links) / 60.4 µs (32) — the 2 voxel buffers are already alloc-once + `.fill`-reused, negligible vs O(iters·links·voxels). `pose_tracker` Kalman cycle 150 ns (17 keypoints) / 2.82 µs (170) — the "gain matrices" are fixed-size **stack** arrays, zero heap to reuse. No rewrite shipped; the committed benches prove each is not hot. + - **#8 MEASUREMENT-ONLY, BLAS-gated (number deferred, not fabricated).** Correction to the finding: `extract_perturbation` does **not** recompute the SVD (it projects against cached `finalize_calibration` modes); the real per-call eigendecomposition is the `eigenvalue`-feature `estimate_occupancy` (`cov.eigh()` on a 56×56 covariance). The `eig` bench is committed but `openblas-src` won't build on this Windows host ("Non-vcpkg builds are not supported on Windows" — the exact reason the project gate runs `--no-default-features`), so its µs cost must come from a Linux/BLAS box. Recorded, not estimated. Incremental SVD stays a sized future item. + - **#14 / #16 / #19 RESOLVED — tests added (no behaviour change).** `fft_operator_within_tolerance_of_dense_canonical56` pins the full `Cir` output of the opt-in FFT path within a documented relative tolerance of the dense path on the production canonical-56 config (τ ∈ {20,50,90} ns) — it changes the witness hash, so it must be provably *close*, not silently divergent. `refinement_terminates_at_iteration_cap_when_not_converging` (+ convergent companion) proves the LO-offset refinement terminates at exactly `max_iterations` on a non-converging input (cap, not convergence, bounds the loop; internal `…_counted` refactor returns the identical offsets). `ratio_finite_at_and_below_1e_12_epsilon` pins that the conjugate-product CSI-ratio (no division → no `1e-12` divide-guard needed) is finite + bit-exact at/below the epsilon boundary and at exact zero (where a naive `H_i/H_j` ratio is ±inf/NaN). - **ADR-156 §8 Milestone-1: RaBitQ Pass-2 randomized rotation + multi-bit experiment — IMPLEMENTED & MEASURED (RESOLVED-PARTIAL).** Closes the §8 "Multi-bit / Extended RaBitQ" backlog item. New `wifi-densepose-ruvector/src/rotation.rs`: a deterministic randomized orthogonal rotation `R = H·D` — **Fast Hadamard Transform** (`O(d log d)`, in-place, `1/√m`-normalized so norm-preserving) + seeded ±1 sign flips (SplitMix64 from a stored `u64` seed; identical at index + query time). Chosen over a dense `d×d` matrix (`O(d²)`, infeasible at the 65,535-d the wire format provisions for); pads to `next_pow2(d)`. Additive, backward-compatible API (`Sketch::from_embedding_rotated`, `SketchBank::with_rotation` + `insert_embedding`/`topk_embedding`/`novelty_embedding`); Pass-1 and the wire format are byte-for-byte unchanged. New `coverage.rs` single-source-of-truth top-K coverage harness (anisotropic planted-cluster fixture, cosine ground truth) backs both a `#[test]` report and the `sketch_bench` coverage table. **MEASURED (dim=128 N=2048 K=8, 64 clusters, noise=0.35, 128 queries, seeded):** at the strict `candidate_k=K` bar, rotation lifts coverage **36.13% → 46.39%**; Pass-2 reaches the **ADR-084 ≥90% bar at candidate_k=24 (~3× over-fetch)**; multi-bit Pass-3 reaches 54%/67%/74% at 2/3/4-bit (strict bar). **Honest verdict: neither rotation nor ≤4-bit multi-bit clears the strict-K 90% bar on this distribution — the bar is met only via the over-fetch "candidate set" pattern ADR-084 specifies.** No benchmark was tuned to manufacture a pass; the strict-bar gap is documented (ADR-156 §10, ADR-084 "Pass 2" section). +19 tests in the crate (100→119), workspace **3,225 / 0 failed**, Python proof VERDICT: PASS (`f8e76f21…`, unchanged — sketch is not on the proof's signal path). - **Beyond-SOTA `v2/crates/` sweep (ADR-154–158) + full stub-implementation push — every claim MEASURED or graded.** A 5-milestone review/optimize/secure/benchmark/validate sweep, then a verified-audit-driven push to replace every production stub with real, tested logic (no labels, no placeholders). Each fix is pinned by a test that fails on the old code; every number ships with a reproduce command. Workspace: **3,122 tests / 0 failed** (`cargo test --workspace --no-default-features`), Python proof **VERDICT: PASS** (bit-exact). - **ADR-154 Signal/DSP** — revived a dead ADR-134 CIR coherence gate (canonical-56 vs ht20 mismatch meant it never ran in production: 8/8 Err → 8/8 Ok); NaN-bypass + window div0 guards; PSD FFT-planner cache (**2.0–3.1×**) + honored DTW band (**2.4–4.1×**). diff --git a/docs/adr/ADR-154-signal-dsp-beyond-sota.md b/docs/adr/ADR-154-signal-dsp-beyond-sota.md index c0cfaa06..f8c6e1bb 100644 --- a/docs/adr/ADR-154-signal-dsp-beyond-sota.md +++ b/docs/adr/ADR-154-signal-dsp-beyond-sota.md @@ -199,7 +199,9 @@ The §2–§5 fixes are **ACCEPTED and committed**: dead CIR gate fixed, NaN byp Catalogued so nothing is silently dropped. Priority: **P1** correctness-adjacent, **P2** perf, **P3** clarity/style. -**Milestone-1 update (2026-06-13):** the **four P1 backlog items** (#1, #9, #10, #13) are now cleared — #1 and #10 **RESOLVED (MEASURED)**, #9 and #13 **RESOLVED-PARTIAL (DATA-GATED:** de-magicked + boundary-tested, operating values unchanged**)**. ~41 P2/P3 items remain deferred. Each fix is pinned by a regression test that fails on the old behaviour (commits `fd32f094a`, `4a9f2bcf4`, `d672fa602`, `5193f6369`); workspace `--no-default-features` green, Python proof unchanged (bit-exact). +**Milestone-1 update (2026-06-13):** the **four P1 backlog items** (#1, #9, #10, #13) are now cleared — #1 and #10 **RESOLVED (MEASURED)**, #9 and #13 **RESOLVED-PARTIAL (DATA-GATED:** de-magicked + boundary-tested, operating values unchanged**)**. Each fix is pinned by a regression test that fails on the old behaviour (commits `fd32f094a`, `4a9f2bcf4`, `d672fa602`, `5193f6369`); workspace `--no-default-features` green, Python proof unchanged (bit-exact). + +**Milestone-2 update (2026-06-13):** the **bench-first P2 perf subset** (#5, #6, #7, #8, #20) and the **three missing boundary tests** (#14, #16, #19) are now cleared — ~36 P2/P3 items remain deferred. PROOF discipline (§0): every perf item was **benched before being touched** — committed in `benches/dsp_perf_bench.rs` (criterion, this Windows box). Only **#20** proved hot and was optimized; **#5/#6/#7** are committed **MEASURED-NULLs** (benched, not hot, left as-is for clarity — exactly the §5.1 "already amortized" pattern); **#8** is **MEASUREMENT-ONLY** but its `eigenvalue`/BLAS backend won't build on this Windows host, so its µs cost must come from a Linux/BLAS box (recorded, not fabricated). Commits `e839fa8f1` (#20 fix), `02e5dd13a` (#14/#16/#19 tests), `aad9464f0` (benches). Workspace `--no-default-features` green; Python proof unchanged (#20 is bit-identical, off the proof path). | # | Module | Finding | Pri | Why deferred | |---|--------|---------|-----|--------------| @@ -207,25 +209,25 @@ Catalogued so nothing is silently dropped. Priority: **P1** correctness-adjacent | 2 | calibration.rs ~311 | `subtract_in_place` had a vacuous `if active_input {ki} else {ki}` branch implying a full-FFT→bin remap that didn't exist | P3 | **Resolved here** (branch removed, sequential-convention documented to match the sibling `extract_first_stream`). Listed for visibility — behavior unchanged. | | 3 | spectrogram.rs / bvp.rs | FFT planner built once-per-call (already amortized across frames) | P2 | Marginal vs the per-frame PSD site; cache if these become hot. | | 4 | features.rs ~347 | Doppler FFT planner planned once per call, reused across subcarriers | P2 | Already amortized within the call. | -| 5 | multistatic.rs | `node_attention_weights` recomputes consensus/softmax each call; no SIMD | P2 | Needs a bench before touching; not obviously hot. | -| 6 | tomography.rs | ISTA L1 solver re-allocates voxel buffers per solve | P2 | Bench first. | -| 7 | pose_tracker.rs | Kalman gain matrices reallocated per update | P2 | Bench first. | -| 8 | field_model.rs | SVD recomputed on every perturbation extract | P2 | Incremental SVD is a real project, not a micro-fix. | +| 5 | multistatic.rs | `node_attention_weights` recomputes consensus/softmax each call; no SIMD | P2 | **MEASURED-NULL (`aad9464f0`) — benched, not hot, left as-is.** `multistatic_attention/weights`: **181 ns** (2 nodes) … **848 ns** (8 nodes) @ 56 subcarriers — sub-µs, no hot-path allocation. A precompute/SIMD rewrite buys nothing measurable at the realistic 2–8 node fan-in; the cosine/softmax cost is dwarfed by the surrounding fusion + per-frame FFT. Bench `multistatic_attention` in `dsp_perf_bench.rs`. | +| 6 | tomography.rs | ISTA L1 solver re-allocates voxel buffers per solve | P2 | **MEASURED-NULL (`aad9464f0`) — benched, not hot, left as-is.** A full 50-iteration `reconstruct` (256 voxels): **47.5 µs** (16 links) / **60.4 µs** (32 links). The two voxel buffers (`x`, `gradient`; ~4 KB) are already allocated *once* per `reconstruct()` and `.fill`-reused across iterations — the per-solve alloc is a negligible fraction of the O(iters·links·voxels) inner product. Reusing scratch across *calls* would force `reconstruct(&self)`→`&mut self` (API break) for no measurable gain. Bench `tomography_reconstruct`. | +| 7 | pose_tracker.rs | Kalman gain matrices reallocated per update | P2 | **MEASURED-NULL (`aad9464f0`) — benched, not hot, left as-is.** A Kalman predict+update cycle: **150 ns** (17 keypoints) / **2.82 µs** (170). The "gain matrices" (`s:[f32;3]`, `k:[[f32;3];6]`) are fixed-size **stack** arrays, *not* heap — there is no per-update allocation to reuse; the compiler keeps them in registers/stack. Bench `pose_kalman_update`. | +| 8 | field_model.rs | SVD recomputed on every perturbation extract | P2 | **MEASUREMENT-ONLY (`aad9464f0`) — BLAS-gated, not measurable on this host.** Correction: `extract_perturbation` does **not** recompute the SVD — it projects against the cached `modes` from `finalize_calibration`. The real per-call eigendecomposition is in the `eigenvalue`-feature `estimate_occupancy` (`cov.eigh()` on a 56×56 covariance, an O(n³)≈175k-flop symmetric eigensolve + O(n²·frames) covariance build, run per call). The bench (`dsp_perf_bench`'s `eig` module) is committed, but `openblas-src` **fails to build on this Windows box** ("Non-vcpkg builds are not supported on Windows" — the very reason the project gate runs `--no-default-features`), so a measured µs number must come from a Linux/BLAS host; **not estimated/fabricated here.** Incremental SVD remains a sized future project, not a micro-fix. | | 9 | coherence.rs / coherence_gate.rs | Z-score thresholds are magic constants, untested at boundaries | P1 | **RESOLVED-PARTIAL (`5193f6369`) — DATA-GATED.** De-magicked `classify_drift` (`DRIFT_STABLE_SCORE=0.85`, `DRIFT_STEP_CHANGE_MAX_STALE=10`) and the `coherence_gate.rs` defaults (`DEFAULT_ACCEPT_THRESHOLD`/`…REJECT…`/`…MAX_STALE_FRAMES`/`…PREDICT_ONLY_NOISE`) into named, documented consts marked EMPIRICAL DEFAULT; added at/just-below/just-above boundary tests (`classify_drift_*_boundary`) + `*_consts_unchanged_from_literals`. **Operating values explicitly NOT changed** — defensible values still need labelled stable/drifting traces. The gate already exposed these via `GatePolicyConfig` (config seam). | | 10 | longitudinal.rs | Welford update not numerically guarded for n=0 | P1 | **RESOLVED (`4a9f2bcf4`) — MEASURED.** The shared `WelfordStats` (`field_model.rs`, consumed by longitudinal.rs) `count < 2` guards already prevent the n=0 NaN / n=1 div0 / `(count−1)` underflow, but the boundary was untested. Added `welford_finite_at_n0_and_n1` (finite + documented 0.0 sentinel at n=0/n=1). Fails-on-old proof: removing the `sample_variance` guard makes the test panic with "attempt to subtract with overflow" at the `(count − 1)` underflow. | | 11 | cross_room.rs | Fingerprint hash collisions unhandled | P2 | Low collision prob; needs design. | | 12 | gesture.rs | `euclidean_distance` no length-mismatch guard | P3 | Caller-enforced; add `debug_assert`. | | 13 | adversarial.rs | Gini/consistency thresholds are magic constants | P1 | **RESOLVED-PARTIAL (`d672fa602`) — DATA-GATED.** Lifted the bare literals in `check`/`check_consistency` (`FIELD_MODEL_GINI_VIOLATION=0.8`, `ENERGY_RATIO_HIGH_VIOLATION=2.0`, `ENERGY_RATIO_LOW_VIOLATION=0.1`, `CONSISTENCY_ACTIVE_FRACTION_OF_MEAN=0.1`, `SCORE_W_*`) into named, documented consts marked EMPIRICAL DEFAULT; added at/just-below/just-above boundary tests (`energy_ratio_high_boundary`, `energy_ratio_low_boundary`, `field_model_gini_boundary`, `consistency_active_fraction_boundary`) + `tuning_consts_unchanged_from_literals`. **Operating values explicitly NOT changed** — defensible values still need labelled spoofed/clean CSI (Wi-Spoof, §6.2/§7.3). Bumping a const fails a boundary test (verified). | -| 14 | cir.rs | `fft_operator` path changes the witness hash (documented) — no test that it's *numerically close* to dense | P2 | Add a tolerance test. | +| 14 | cir.rs | `fft_operator` path changes the witness hash (documented) — no test that it's *numerically close* to dense | P2 | **RESOLVED (`02e5dd13a`) — tolerance test added.** `fft_operator_within_tolerance_of_dense_canonical56` pins the **full `Cir` output** of the FFT path within a *documented* relative tolerance of the dense path on the production **canonical-56** config across τ ∈ {20,50,90} ns: every tap within `1e-2·|dominant|`, identical `dominant_tap_idx`, `active_tap_count`, `ranging_valid`, `dominant_tap_ratio` within `1e-2`, `rms_delay_spread` within `1e-2` rel. A regression that lets the FFT path drift (scaling/Φ-column bug) now fails here instead of silently corrupting a downstream witness. Extends the existing HT20/single-τ `fft_estimate_matches_dense_dominant_tap`. | | 15 | multistatic.rs | `cir_gate_coherence` only estimates the **first** node/channel; multi-node CIR consensus unused | P2 | Design item (which node's CIR is authoritative?). | -| 16 | phase_align.rs | Iterative LO offset estimation has no convergence cap test | P2 | Add iteration-cap test. | +| 16 | phase_align.rs | Iterative LO offset estimation has no convergence cap test | P2 | **RESOLVED (`02e5dd13a`) — cap test added.** `refinement_terminates_at_iteration_cap_when_not_converging` forces non-convergence (`tolerance = 0.0`, unreachable since `max_update ≥ 0`) and asserts the loop runs **exactly `max_iterations`** then returns — proving the cap (not convergence) bounds the loop, so a non-converging input can never spin forever. Companion `refinement_converges_before_cap_on_easy_input` proves the cap is an upper bound, not the only exit. Internal-only refactor: `estimate_phase_offsets` still returns the identical offset vector; a `…_counted` core surfaces the iteration count for the test. | | 17 | hampel.rs | Window edge handling at series boundaries | P3 | Cosmetic. | | 18 | motion.rs | Threshold constants undocumented | P3 | Doc-only. | -| 19 | csi_ratio.rs | Division guard relies on `1e-12` epsilon; no test | P2 | Add boundary test. | -| 20 | spectrogram.rs | `compute_multi_subcarrier_spectrogram` re-plans per subcarrier via `compute_spectrogram` | P2 | Hoist the planner (relates to #3). | +| 19 | csi_ratio.rs | Division guard relies on `1e-12` epsilon; no test | P2 | **RESOLVED (`02e5dd13a`) — boundary test added.** Finding clarification: `csi_ratio.rs` implements the CSI *ratio model* as the **conjugate product** `H_i·conj(H_j)` (SpotFi/IndoTrack) — there is **no division**, hence no literal `1e-12` epsilon; the classic `H_i/H_j` ratio (which a `1e-12` guard protects) is deliberately avoided. `ratio_finite_at_and_below_1e_12_epsilon` pins the property the finding cares about: at and below the `1e-12` target magnitude (and at exact zero — where a division ratio is ±inf/NaN) the conjugate-product output is **finite**, exactly the conjugate product (bit-exact), collapses toward zero (the physically correct "no path" answer), and stays finite through `ratio_to_amplitude_phase`. | +| 20 | spectrogram.rs | `compute_multi_subcarrier_spectrogram` re-plans per subcarrier via `compute_spectrogram` | P2 | **MEASURED-HOT (`e839fa8f1`) — optimized, bit-identical.** Hoisted the FFT plan + window out of the per-subcarrier loop (new `compute_spectrogram_with_plan` core). **56-subcarrier** multi-spectrogram: **467.88 µs → 254.75 µs = 1.84×** (window 128); **627.27 µs → 448.39 µs = 1.40×** (window 256). The removed cost is the per-subcarrier `FftPlanner` re-plan (~1.86 µs/plan @ w128 × 56). Bit-identical (`multi_subcarrier_hoisted_plan_bit_identical`, `f64::to_bits` across all 4 windows × {power,magnitude}). The most likely real win predicted by the §7.4 intro — confirmed. (Relates to #3, which stays deferred: `spectrogram.rs`/`bvp.rs` single-signal callers already plan once-per-call.) | | 21–45 | (assorted) | Remaining clarity/doc/magic-constant/missing-boundary-test findings across `ruvsense/*`, `features.rs`, `motion.rs` | P3 | Bulk-addressable in a dedicated "test-the-boundaries + de-magic-constant" follow-up; not high-leverage individually. | -> **Horizon-ledger one-liner.** Milestone-0 DONE: dead CIR gate (FIXED+proved), NaN/inf adversarial bypass (FIXED+proved), divide-by-(n−1) window trio (FIXED+proved), calibration dead-branch (FIXED), PSD FFT-planner cache (MEASURED), DTW band (MEASURED). **Milestone-1 DONE (2026-06-13): all four P1 backlog items cleared — circular phase variance #1 (RESOLVED/MEASURED metric, DATA-GATED threshold), Welford n=0 guard #10 (RESOLVED/MEASURED), threshold magic-constants #9 & #13 (RESOLVED-PARTIAL/DATA-GATED — de-magicked + boundary-tested, values unchanged).** DEFERRED to follow-up: the ~41 remaining P2/P3 findings in §7.4 — none silently dropped. +> **Horizon-ledger one-liner.** Milestone-0 DONE: dead CIR gate (FIXED+proved), NaN/inf adversarial bypass (FIXED+proved), divide-by-(n−1) window trio (FIXED+proved), calibration dead-branch (FIXED), PSD FFT-planner cache (MEASURED), DTW band (MEASURED). **Milestone-1 DONE (2026-06-13): all four P1 backlog items cleared — circular phase variance #1 (RESOLVED/MEASURED metric, DATA-GATED threshold), Welford n=0 guard #10 (RESOLVED/MEASURED), threshold magic-constants #9 & #13 (RESOLVED-PARTIAL/DATA-GATED — de-magicked + boundary-tested, values unchanged).** **Milestone-2 DONE (2026-06-13): bench-first P2 perf subset + missing boundary tests cleared — spectrogram per-subcarrier FFT re-plan #20 (MEASURED-HOT, 1.40–1.84×, bit-identical); attention/tomography/Kalman #5/#6/#7 (MEASURED-NULL — benched, not hot, left as-is); field_model eigendecompose #8 (MEASUREMENT-ONLY, BLAS un-buildable on this Windows host, number deferred to a BLAS box, NOT fabricated); fft_operator tolerance #14, phase-align convergence-cap #16, csi-ratio epsilon #19 (RESOLVED, tests added).** DEFERRED to follow-up: the ~36 remaining P2/P3 findings in §7.4 — none silently dropped. --- diff --git a/v2/Cargo.lock b/v2/Cargo.lock index 016aa000..f0cdd0e6 100644 --- a/v2/Cargo.lock +++ b/v2/Cargo.lock @@ -10835,7 +10835,7 @@ dependencies = [ [[package]] name = "wifi-densepose-cli" -version = "0.3.0" +version = "0.3.1" dependencies = [ "anyhow", "assert_cmd", @@ -11067,7 +11067,7 @@ dependencies = [ [[package]] name = "wifi-densepose-sensing-server" -version = "0.3.2" +version = "0.3.3" dependencies = [ "axum", "chrono", @@ -11101,7 +11101,7 @@ dependencies = [ [[package]] name = "wifi-densepose-signal" -version = "0.3.3" +version = "0.3.4" dependencies = [ "chrono", "criterion", diff --git a/v2/crates/wifi-densepose-signal/Cargo.toml b/v2/crates/wifi-densepose-signal/Cargo.toml index 74836b2c..1191ff38 100644 --- a/v2/crates/wifi-densepose-signal/Cargo.toml +++ b/v2/crates/wifi-densepose-signal/Cargo.toml @@ -71,6 +71,12 @@ harness = false name = "features_bench" harness = false +## ADR-154 Milestone-2: P2 "bench-first" perf items (§7.4 #5/#6/#7/#8/#20). +## #8 (field_model eigendecompose) is measured only under the eigenvalue feature. +[[bench]] +name = "dsp_perf_bench" +harness = false + ## ADR-134: CIR estimator throughput benchmarks [[bench]] name = "cir_bench" diff --git a/v2/crates/wifi-densepose-signal/benches/dsp_perf_bench.rs b/v2/crates/wifi-densepose-signal/benches/dsp_perf_bench.rs new file mode 100644 index 00000000..9d65980b --- /dev/null +++ b/v2/crates/wifi-densepose-signal/benches/dsp_perf_bench.rs @@ -0,0 +1,353 @@ +//! ADR-154 Milestone-2 perf benchmarks (§7.4 P2 "bench-first" items). +//! +//! PROOF discipline (ADR-154 §0): every P2 item is **benched before touched**. +//! A micro-opt is landed only if the bench proves the path hot; otherwise the +//! committed bench *is* the result — a MEASURED-NULL that proves the rewrite was +//! unnecessary (exactly the §5.x "already amortized" pattern). No speedup is +//! claimed without a before/after number from here. +//! +//! Reproduce (compile-only): +//! cargo bench -p wifi-densepose-signal --no-default-features \ +//! --bench dsp_perf_bench --no-run +//! +//! Reproduce (full run, writes target/criterion/ HTML): +//! cargo bench -p wifi-densepose-signal --no-default-features --bench dsp_perf_bench +//! +//! Groups: +//! * `multistatic_attention` (#5) — `node_attention_weights` at 2..8 nodes × +//! 56 subcarriers. Re-derives consensus/softmax each call; no scratch to +//! reuse → expected MEASURED-NULL. +//! * `tomography_reconstruct` (#6) — full ISTA solve. The two voxel buffers are +//! allocated once per `reconstruct()` (then `.fill`-reused across +//! iterations), so the per-solve alloc is 2×n_voxels vs an +//! O(iters·links·voxels) compute → expected MEASURED-NULL. +//! * `pose_kalman_update` (#7) — Kalman predict+update loop. The "gain +//! matrices" are fixed-size **stack** arrays (`[[f32;3];6]`), not heap — +//! nothing to reuse → expected MEASURED-NULL. +//! * `spectrogram_multi_subcarrier` (#20) — `compute_multi_subcarrier_spectrogram`: +//! fresh-planner-per-subcarrier (BEFORE) vs hoisted-plan (AFTER, shipped). +//! The per-subcarrier FFT re-plan is the likely real win. +//! * `field_model_occupancy` (#8, `eigenvalue` only) — per-call n×n +//! eigendecomposition in `estimate_occupancy`. MEASUREMENT-ONLY: quantifies +//! the recompute cost; incremental SVD is a sized future project, not a +//! micro-fix. + +use criterion::{black_box, criterion_group, criterion_main, BenchmarkId, Criterion, Throughput}; +use ndarray::Array2; +use rustfft::FftPlanner; +use std::f64::consts::PI; +use std::time::Duration; + +use wifi_densepose_signal::ruvsense::multistatic::node_attention_weights; +use wifi_densepose_signal::ruvsense::pose_tracker::KeypointState; +use wifi_densepose_signal::ruvsense::tomography::{ + LinkGeometry, Position3D, RfTomographer, TomographyConfig, +}; +use wifi_densepose_signal::spectrogram::{ + compute_multi_subcarrier_spectrogram, compute_spectrogram, Spectrogram, SpectrogramConfig, + WindowFunction, +}; + +// --------------------------------------------------------------------------- +// #5 multistatic node_attention_weights +// --------------------------------------------------------------------------- + +fn make_node_amplitudes(n_nodes: usize, n_sub: usize) -> Vec> { + (0..n_nodes) + .map(|n| { + (0..n_sub) + .map(|s| { + let phase = (n as f32 * 0.31 + s as f32 * 0.07) % std::f32::consts::TAU; + 0.5 + 0.4 * phase.sin() + }) + .collect() + }) + .collect() +} + +fn bench_multistatic_attention(c: &mut Criterion) { + let mut group = c.benchmark_group("multistatic_attention"); + group.measurement_time(Duration::from_secs(3)); + let n_sub = 56; // canonical-56 grid + + for &n_nodes in &[2usize, 4, 8] { + let owned = make_node_amplitudes(n_nodes, n_sub); + let refs: Vec<&[f32]> = owned.iter().map(|v| v.as_slice()).collect(); + group.throughput(Throughput::Elements(1)); + group.bench_with_input( + BenchmarkId::new("weights", n_nodes), + &refs, + |b, amplitudes| { + b.iter(|| black_box(node_attention_weights(black_box(amplitudes), 1.0))); + }, + ); + } + group.finish(); +} + +// --------------------------------------------------------------------------- +// #6 tomography reconstruct (ISTA L1) +// --------------------------------------------------------------------------- + +fn make_tomographer(n_links: usize) -> (RfTomographer, Vec) { + // A modest 8x8x4 grid (256 voxels), n_links TX/RX pairs around the box. + let config = TomographyConfig { + nx: 8, + ny: 8, + nz: 4, + bounds: [0.0, 0.0, 0.0, 4.0, 4.0, 2.0], + lambda: 0.01, + max_iterations: 50, + tolerance: 1e-6, + min_links: 8, + }; + let mut links = Vec::with_capacity(n_links); + for i in 0..n_links { + let t = i as f64 / n_links as f64; + links.push(LinkGeometry { + tx: Position3D { + x: 4.0 * (t * PI).cos().abs(), + y: 0.0, + z: 1.0, + }, + rx: Position3D { + x: 4.0 * (t * PI).sin().abs(), + y: 4.0, + z: 1.0, + }, + link_id: i, + }); + } + let tomo = RfTomographer::new(config, &links).unwrap(); + // Deterministic attenuations (one occupied region in the middle). + let attenuations: Vec = (0..n_links) + .map(|i| 0.1 + 0.05 * ((i as f64 * 0.3).sin())) + .collect(); + (tomo, attenuations) +} + +fn bench_tomography_reconstruct(c: &mut Criterion) { + let mut group = c.benchmark_group("tomography_reconstruct"); + group.measurement_time(Duration::from_secs(4)); + + for &n_links in &[16usize, 32] { + let (tomo, atten) = make_tomographer(n_links); + group.throughput(Throughput::Elements(1)); + group.bench_with_input( + BenchmarkId::new("solve", n_links), + &(tomo, atten), + |b, (tomo, atten)| { + b.iter(|| black_box(tomo.reconstruct(black_box(atten)).unwrap().occupied_count)); + }, + ); + } + group.finish(); +} + +// --------------------------------------------------------------------------- +// #7 pose tracker Kalman update loop +// --------------------------------------------------------------------------- + +fn bench_pose_kalman_update(c: &mut Criterion) { + let mut group = c.benchmark_group("pose_kalman_update"); + group.measurement_time(Duration::from_secs(3)); + + // 17 keypoints (COCO-17), N predict+update cycles — a realistic frame batch. + for &n_updates in &[17usize, 170] { + group.throughput(Throughput::Elements(n_updates as u64)); + group.bench_with_input(BenchmarkId::new("cycles", n_updates), &n_updates, |b, &n| { + b.iter(|| { + let mut acc = 0.0_f32; + for k in 0..n { + let mut state = KeypointState::new( + (k as f32 * 0.1).sin(), + (k as f32 * 0.2).cos(), + 1.0 + (k as f32 * 0.05), + ); + state.predict(0.05, 0.5); + let meas = [ + (k as f32 * 0.1).sin() + 0.01, + (k as f32 * 0.2).cos() - 0.01, + 1.0 + (k as f32 * 0.05), + ]; + state.update(&meas, 0.1, 1.0); + acc += state.state[0]; + } + black_box(acc) + }); + }); + } + group.finish(); +} + +// --------------------------------------------------------------------------- +// #20 multi-subcarrier spectrogram: fresh-planner vs hoisted plan +// --------------------------------------------------------------------------- + +fn make_csi_temporal(n_samples: usize, n_sc: usize) -> Array2 { + Array2::from_shape_fn((n_samples, n_sc), |(t, sc)| { + let freq = 0.7 + sc as f64 * 0.13; + (2.0 * PI * freq * t as f64 / 100.0).sin() + + 0.3 * (2.0 * PI * (freq * 2.1) * t as f64 / 100.0).cos() + }) +} + +/// BEFORE: re-plan the FFT inside `compute_spectrogram` for every subcarrier. +/// Faithful transcription of the pre-ADR-154-M2 `compute_multi_subcarrier_spectrogram`. +fn multi_fresh_planner( + csi: &Array2, + sample_rate: f64, + config: &SpectrogramConfig, +) -> Vec { + let (_, n_sc) = csi.dim(); + (0..n_sc) + .map(|sc| { + let col: Vec = csi.column(sc).to_vec(); + // compute_spectrogram builds a fresh FftPlanner on every call. + compute_spectrogram(&col, sample_rate, config).unwrap() + }) + .collect() +} + +fn bench_spectrogram_multi_subcarrier(c: &mut Criterion) { + let mut group = c.benchmark_group("spectrogram_multi_subcarrier"); + group.measurement_time(Duration::from_secs(5)); + let sample_rate = 100.0; + + // Realistic: 600 temporal samples (~6 s @ 100 Hz) across 56 subcarriers, + // window 128. n_sc re-plans removed by the hoist. + for &(n_samples, n_sc, window) in &[(600usize, 56usize, 128usize), (600, 56, 256)] { + let csi = make_csi_temporal(n_samples, n_sc); + let config = SpectrogramConfig { + window_size: window, + hop_size: 64, + window_fn: WindowFunction::Hann, + power: true, + }; + group.throughput(Throughput::Elements(n_sc as u64)); + + // BEFORE: fresh planner per subcarrier. + group.bench_with_input( + BenchmarkId::new("fresh_planner", format!("sc{n_sc}_w{window}")), + &config, + |b, cfg| { + b.iter(|| black_box(multi_fresh_planner(black_box(&csi), sample_rate, cfg).len())); + }, + ); + + // AFTER: hoisted plan (the shipped `compute_multi_subcarrier_spectrogram`). + group.bench_with_input( + BenchmarkId::new("hoisted_plan", format!("sc{n_sc}_w{window}")), + &config, + |b, cfg| { + b.iter(|| { + black_box( + compute_multi_subcarrier_spectrogram(black_box(&csi), sample_rate, cfg) + .unwrap() + .len(), + ) + }); + }, + ); + } + group.finish(); +} + +// A standalone FftPlanner sanity micro-bench documenting the cost the hoist +// removes: building+planning a length-N forward FFT once. +fn bench_fft_plan_cost(c: &mut Criterion) { + let mut group = c.benchmark_group("fft_plan_cost"); + group.measurement_time(Duration::from_secs(2)); + for &n in &[128usize, 256] { + group.bench_with_input(BenchmarkId::new("plan_forward", n), &n, |b, &n| { + b.iter(|| { + let mut planner = FftPlanner::::new(); + black_box(planner.plan_fft_forward(black_box(n))) + }); + }); + } + group.finish(); +} + +// --------------------------------------------------------------------------- +// #8 field_model SVD/eigendecomposition recompute (MEASUREMENT-ONLY) +// --------------------------------------------------------------------------- +// `estimate_occupancy` builds an n×n covariance and eigendecomposes it on every +// call (BLAS, `eigenvalue` feature). This bench quantifies that per-call cost so +// ADR-154 §7.4 #8 can record a number; incremental SVD is a sized future item, +// NOT attempted here. +#[cfg(feature = "eigenvalue")] +mod eig { + use super::*; + use wifi_densepose_signal::ruvsense::field_model::{FieldModel, FieldModelConfig}; + + fn calibrated_model(n_sub: usize, n_links: usize) -> FieldModel { + let config = FieldModelConfig { + n_subcarriers: n_sub, + n_links, + n_modes: 3, + min_calibration_frames: 20, + baseline_expiry_s: 86_400.0, + }; + let mut model = FieldModel::new(config).unwrap(); + // Feed deterministic calibration frames: [n_links][n_sub] per observation. + for f in 0..30 { + let obs: Vec> = (0..n_links) + .map(|l| { + (0..n_sub) + .map(|s| { + 0.5 + 0.3 + * ((f as f64 * 0.1 + l as f64 * 0.2 + s as f64 * 0.05).sin()) + }) + .collect() + }) + .collect(); + model.feed_calibration(&obs).unwrap(); + } + model.finalize_calibration(0, 0).unwrap(); + model + } + + pub fn bench_field_model_occupancy(c: &mut Criterion) { + let mut group = c.benchmark_group("field_model_occupancy"); + group.measurement_time(Duration::from_secs(4)); + let n_sub = 56; + let model = calibrated_model(n_sub, 4); + // Sliding window of recent frames (50 ~ 2.5 s @ 20 Hz). + let frames: Vec> = (0..50) + .map(|t| { + (0..n_sub) + .map(|s| 0.5 + 0.3 * ((t as f64 * 0.15 + s as f64 * 0.07).sin())) + .collect() + }) + .collect(); + group.throughput(Throughput::Elements(1)); + group.bench_function(BenchmarkId::new("eigh", n_sub), |b| { + b.iter(|| black_box(model.estimate_occupancy(black_box(&frames)))); + }); + group.finish(); + } +} + +#[cfg(feature = "eigenvalue")] +criterion_group!( + benches, + bench_multistatic_attention, + bench_tomography_reconstruct, + bench_pose_kalman_update, + bench_spectrogram_multi_subcarrier, + bench_fft_plan_cost, + eig::bench_field_model_occupancy, +); + +#[cfg(not(feature = "eigenvalue"))] +criterion_group!( + benches, + bench_multistatic_attention, + bench_tomography_reconstruct, + bench_pose_kalman_update, + bench_spectrogram_multi_subcarrier, + bench_fft_plan_cost, +); + +criterion_main!(benches); diff --git a/v2/crates/wifi-densepose-signal/src/csi_ratio.rs b/v2/crates/wifi-densepose-signal/src/csi_ratio.rs index 000d1dcc..504743c2 100644 --- a/v2/crates/wifi-densepose-signal/src/csi_ratio.rs +++ b/v2/crates/wifi-densepose-signal/src/csi_ratio.rs @@ -197,4 +197,61 @@ mod tests { Err(CsiRatioError::LengthMismatch { .. }) )); } + + // ADR-154 §7.4 #19: the CSI *ratio model*. The classic ratio is + // `H_i[k] / H_j[k]`, which blows up (±inf / NaN) when `H_j[k]` approaches + // zero — the case a `1e-12` division-guard epsilon is meant to protect. This + // module deliberately implements the ratio as the **conjugate product** + // `H_i * conj(H_j)` (SpotFi/IndoTrack), which has *no division* and is + // therefore finite even at and below the `1e-12` magnitude boundary. This + // test pins that property: at the epsilon boundary the output is finite and + // exactly the conjugate product (no silent NaN/inf from a hidden divide). + #[test] + fn ratio_finite_at_and_below_1e_12_epsilon() { + let eps = 1e-12_f64; + // Reference at unit magnitude; target swept across / under the epsilon + // boundary a naive H_i/H_j division would need to guard. + let h_ref = vec![ + Complex64::from_polar(1.0, 0.3), + Complex64::from_polar(1.0, 0.3), + Complex64::from_polar(1.0, 0.3), + Complex64::from_polar(1.0, 0.3), + ]; + let h_target = vec![ + Complex64::new(eps, 0.0), // exactly at the epsilon + Complex64::new(eps * 0.5, 0.0), // below the epsilon + Complex64::new(0.0, eps), // imaginary axis, at epsilon + Complex64::new(0.0, 0.0), // exact zero — div would be inf/NaN + ]; + + let ratio = conjugate_multiply(&h_ref, &h_target).unwrap(); + assert_eq!(ratio.len(), 4); + for (k, r) in ratio.iter().enumerate() { + assert!( + r.re.is_finite() && r.im.is_finite(), + "conjugate-multiply ratio must be finite at boundary k={k}: {r:?}" + ); + } + + // The near-zero / zero target collapses the product toward zero (the + // physically correct "no measurable path" answer), never to inf/NaN. + assert!( + ratio[3].norm() == 0.0, + "exact-zero target → zero product, got {}", + ratio[3].norm() + ); + // The at-epsilon entries equal the exact conjugate product (bit-exact). + let expected0 = h_ref[0] * h_target[0].conj(); + assert_eq!(ratio[0].re.to_bits(), expected0.re.to_bits()); + assert_eq!(ratio[0].im.to_bits(), expected0.im.to_bits()); + + // The full pipeline (amplitude/phase extraction) is also finite here. + let mut m = Array2::::zeros((1, 4)); + for (k, &v) in ratio.iter().enumerate() { + m[[0, k]] = v; + } + let (amp, phase) = ratio_to_amplitude_phase(&m); + assert!(amp.iter().all(|a| a.is_finite())); + assert!(phase.iter().all(|p| p.is_finite())); + } } diff --git a/v2/crates/wifi-densepose-signal/src/ruvsense/cir.rs b/v2/crates/wifi-densepose-signal/src/ruvsense/cir.rs index abf6def0..692a3520 100644 --- a/v2/crates/wifi-densepose-signal/src/ruvsense/cir.rs +++ b/v2/crates/wifi-densepose-signal/src/ruvsense/cir.rs @@ -1458,6 +1458,79 @@ mod tests { } } + /// ADR-154 §7.4 #14: the `fft_operator` path *changes the witness hash* + /// (documented in `CirConfig::fft_operator`), so it must be pinned as + /// numerically **close** to the dense path — not silently divergent. The + /// existing `fft_estimate_matches_dense_dominant_tap` covers HT20 / one tau; + /// this test asserts the **full `Cir` output** (every tap + every scalar + /// field) stays within a documented relative tolerance on the production + /// **canonical-56** config across several realistic delays. A regression + /// that lets the FFT path drift (wrong scaling, off-by-one Φ column, etc.) + /// fails here instead of corrupting a downstream witness unnoticed. + #[test] + fn fft_operator_within_tolerance_of_dense_canonical56() { + // Relative tolerances — documented, not silent. The FFT operator sums the + // same Φ entries in a different order, so taps agree to ~float epsilon + // scaled by the dominant-tap magnitude; ISTA can differ by a few last + // bits over its trajectory, hence 1e-2 (same order as the existing test). + const TAP_REL_TOL: f32 = 1e-2; + const RATIO_ABS_TOL: f32 = 1e-2; + const SPREAD_REL_TOL: f64 = 1e-2; + + for &tau in &[20e-9_f64, 50e-9, 90e-9] { + let dense_cfg = CirConfig::canonical56(); + let mut fft_cfg = CirConfig::canonical56(); + fft_cfg.fft_operator = true; + + let frame = make_single_tap_frame(dense_cfg.num_subcarriers, tau); + let dense = CirEstimator::new(dense_cfg).estimate(&frame).unwrap(); + let fast = CirEstimator::new(fft_cfg).estimate(&frame).unwrap(); + + assert_eq!(dense.taps.len(), fast.taps.len()); + + // Full tap vector close (relative to the dominant tap magnitude). + let dom = dense.taps[dense.dominant_tap_idx].norm().max(1e-6); + let mut max_tap_err = 0.0_f32; + for (a, b) in dense.taps.iter().zip(&fast.taps) { + max_tap_err = max_tap_err.max((a - b).norm()); + } + assert!( + max_tap_err <= TAP_REL_TOL * dom, + "tau={tau:e}: FFT taps diverged from dense — max err {max_tap_err} > {TAP_REL_TOL} * {dom} (NOT numerically close)" + ); + + // The dominant tap and the scalar summary fields must agree too — + // these feed the witness, so a silent divergence here is the bug #14 + // guards against. + assert_eq!( + dense.dominant_tap_idx, fast.dominant_tap_idx, + "tau={tau:e}: dominant tap index moved" + ); + assert!( + (dense.dominant_tap_ratio - fast.dominant_tap_ratio).abs() <= RATIO_ABS_TOL, + "tau={tau:e}: dominant_tap_ratio drift {} vs {}", + dense.dominant_tap_ratio, + fast.dominant_tap_ratio + ); + assert_eq!( + dense.active_tap_count, fast.active_tap_count, + "tau={tau:e}: active_tap_count changed" + ); + assert_eq!( + dense.ranging_valid, fast.ranging_valid, + "tau={tau:e}: ranging_valid flipped" + ); + let spread_ref = dense.rms_delay_spread_s.abs().max(1e-12); + assert!( + (dense.rms_delay_spread_s - fast.rms_delay_spread_s).abs() + <= SPREAD_REL_TOL * spread_ref, + "tau={tau:e}: rms_delay_spread drift {} vs {}", + dense.rms_delay_spread_s, + fast.rms_delay_spread_s + ); + } + } + /// The default configs keep the FFT operator off — the dense, bit-exact /// witness path is the default (enabling FFT shifts float results). #[test] diff --git a/v2/crates/wifi-densepose-signal/src/ruvsense/phase_align.rs b/v2/crates/wifi-densepose-signal/src/ruvsense/phase_align.rs index 9bd54c71..404dc9b2 100644 --- a/v2/crates/wifi-densepose-signal/src/ruvsense/phase_align.rs +++ b/v2/crates/wifi-densepose-signal/src/ruvsense/phase_align.rs @@ -201,12 +201,29 @@ fn find_static_subcarriers( /// Estimate per-channel phase offsets using iterative Neumann-style refinement. /// -/// Channel 0 is the reference (offset = 0). +/// Channel 0 is the reference (offset = 0). Thin wrapper that drops the +/// iteration count; `estimate_phase_offsets_counted` is the instrumented core. fn estimate_phase_offsets( frames: &[CanonicalCsiFrame], static_indices: &[usize], config: &PhaseAlignConfig, ) -> std::result::Result, PhaseAlignError> { + estimate_phase_offsets_counted(frames, static_indices, config).map(|(offsets, _iters)| offsets) +} + +/// Core of [`estimate_phase_offsets`], also returning the number of refinement +/// iterations actually executed. +/// +/// The returned count is bounded by `config.max_iterations` — that bound is the +/// convergence cap that guarantees termination on inputs the damped Neumann +/// update never drives below `config.tolerance` (ADR-154 §7.4 #16). The offset +/// vector is identical to the public `estimate_phase_offsets` path; only the +/// iteration count is surfaced (for the cap test). +fn estimate_phase_offsets_counted( + frames: &[CanonicalCsiFrame], + static_indices: &[usize], + config: &PhaseAlignConfig, +) -> std::result::Result<(Vec, usize), PhaseAlignError> { let n_ch = frames.len(); let mut offsets = vec![0.0_f32; n_ch]; @@ -220,7 +237,7 @@ fn estimate_phase_offsets( } // Iterative refinement (Neumann-style) - for _iter in 0..config.max_iterations { + for iter in 0..config.max_iterations { let mut max_update = 0.0_f32; for c in 1..n_ch { @@ -241,12 +258,13 @@ fn estimate_phase_offsets( } if max_update < config.tolerance { - return Ok(offsets); + return Ok((offsets, iter + 1)); } } - // Even if we do not converge tightly, return best estimate - Ok(offsets) + // Even if we do not converge tightly, return best estimate. The loop ran the + // full cap — termination is guaranteed by `config.max_iterations`. + Ok((offsets, config.max_iterations)) } /// Apply phase correction: subtract offset from each subcarrier phase. @@ -446,6 +464,73 @@ mod tests { assert_eq!(cfg.min_static_subcarriers, 5); } + // ADR-154 §7.4 #16: the iterative LO-offset refinement must TERMINATE at the + // `max_iterations` cap on a non-converging input — no unbounded loop. + // + // We force non-convergence by setting `tolerance` to an unreachable value + // (the damped Neumann update on bounded phase residuals can never drive + // `max_update` below 0.0), so the `max_update < tolerance` early-exit is + // never taken. The instrumented core must then run *exactly* + // `max_iterations` and return — proving the cap, not convergence, is what + // bounds the loop. + #[test] + fn refinement_terminates_at_iteration_cap_when_not_converging() { + let n_sub = 56; + let max_iterations = 7; + let config = PhaseAlignConfig { + max_iterations, + // Unreachable tolerance: `max_update` is always ≥ 0, never < 0.0, + // so the convergence branch can never fire. + tolerance: 0.0, + static_fraction: 0.3, + min_static_subcarriers: 5, + }; + // Two channels with a real, persistent offset so each iteration keeps + // producing a non-zero update. + let f0 = make_frame_with_phase(n_sub, 0.0, 0.0); + let f1 = make_frame_with_phase(n_sub, 0.0, 1.3); + let frames = vec![f0, f1]; + let static_indices = find_static_subcarriers(&frames, &config).unwrap(); + + let (offsets, iters) = + estimate_phase_offsets_counted(&frames, &static_indices, &config).unwrap(); + + // The cap, not convergence, terminated the loop. + assert_eq!( + iters, max_iterations, + "expected the loop to run the full cap ({max_iterations}), got {iters}" + ); + // It still returns a finite best-estimate offset vector. + assert_eq!(offsets.len(), 2); + assert!(offsets.iter().all(|o| o.is_finite())); + // Reference channel offset stays 0. + assert_eq!(offsets[0], 0.0); + } + + // Convergent companion: a near-identical input converges *before* the cap, + // so the cap is an upper bound, not the only exit. + #[test] + fn refinement_converges_before_cap_on_easy_input() { + let n_sub = 56; + let config = PhaseAlignConfig { + max_iterations: 50, + tolerance: 1e-2, // loose: a tiny offset converges in a few iters + static_fraction: 0.3, + min_static_subcarriers: 5, + }; + let f0 = make_frame_with_phase(n_sub, 0.0, 0.0); + let f1 = make_frame_with_phase(n_sub, 0.0, 0.02); + let frames = vec![f0, f1]; + let static_indices = find_static_subcarriers(&frames, &config).unwrap(); + let (_offsets, iters) = + estimate_phase_offsets_counted(&frames, &static_indices, &config).unwrap(); + assert!( + iters < config.max_iterations, + "easy input should converge before the cap, ran {iters}/{}", + config.max_iterations + ); + } + #[test] fn phase_correction_preserves_amplitude() { let mut aligner = PhaseAligner::new(2); diff --git a/v2/crates/wifi-densepose-signal/src/spectrogram.rs b/v2/crates/wifi-densepose-signal/src/spectrogram.rs index 7262e65c..19e85a31 100644 --- a/v2/crates/wifi-densepose-signal/src/spectrogram.rs +++ b/v2/crates/wifi-densepose-signal/src/spectrogram.rs @@ -9,9 +9,10 @@ use ndarray::Array2; use num_complex::Complex64; -use rustfft::FftPlanner; +use rustfft::{Fft, FftPlanner}; use ruvector_attn_mincut::attn_mincut; use std::f64::consts::PI; +use std::sync::Arc; /// Configuration for spectrogram generation. #[derive(Debug, Clone)] @@ -87,12 +88,40 @@ pub fn compute_spectrogram( return Err(SpectrogramError::InvalidWindowSize); } - let n_frames = (signal.len() - config.window_size) / config.hop_size + 1; - let n_freq = config.window_size / 2 + 1; - let window = make_window(config.window_fn, config.window_size); - let mut planner = FftPlanner::new(); let fft = planner.plan_fft_forward(config.window_size); + let window = make_window(config.window_fn, config.window_size); + Ok(compute_spectrogram_with_plan( + signal, + sample_rate, + config, + &fft, + &window, + )) +} + +/// STFT core that runs against a **pre-planned** FFT and pre-built window. +/// +/// ADR-154 §7.4 #20: `compute_spectrogram` re-plans the FFT on every call, so +/// `compute_multi_subcarrier_spectrogram` (which calls it once per subcarrier) +/// re-planned the same length-`window_size` FFT for *every* subcarrier. This +/// helper hoists the plan + window out of the per-subcarrier loop. The numeric +/// body is byte-for-byte the old loop — only the plan/window construction is +/// lifted — so the output is **bit-identical** to the per-call path (asserted by +/// `multi_subcarrier_hoisted_plan_bit_identical`). Callers must pass a plan +/// built for exactly `config.window_size` and a window of that length. +fn compute_spectrogram_with_plan( + signal: &[f64], + sample_rate: f64, + config: &SpectrogramConfig, + fft: &Arc>, + window: &[f64], +) -> Spectrogram { + debug_assert_eq!(window.len(), config.window_size, "window/plan size mismatch"); + debug_assert_eq!(fft.len(), config.window_size, "FFT/window size mismatch"); + + let n_frames = (signal.len() - config.window_size) / config.hop_size + 1; + let n_freq = config.window_size / 2 + 1; let mut data = Array2::zeros((n_freq, n_frames)); @@ -116,13 +145,13 @@ pub fn compute_spectrogram( } } - Ok(Spectrogram { + Spectrogram { data, n_freq, n_time: n_frames, freq_resolution: sample_rate / config.window_size as f64, time_resolution: config.hop_size as f64 / sample_rate, - }) + } } /// Compute spectrogram for each subcarrier from a temporal CSI matrix. @@ -134,12 +163,40 @@ pub fn compute_multi_subcarrier_spectrogram( sample_rate: f64, config: &SpectrogramConfig, ) -> Result, SpectrogramError> { - let (_, n_sc) = csi_temporal.dim(); - let mut spectrograms = Vec::with_capacity(n_sc); + let (n_samples, n_sc) = csi_temporal.dim(); + // ADR-154 §7.4 #20: validate *once* (same checks `compute_spectrogram` + // makes), then plan the FFT + build the window *once* and reuse them across + // every subcarrier instead of re-planning per column. The window length is + // identical for all subcarriers, so this is pure hoisting — output stays + // bit-identical to the per-call path. + if n_samples < config.window_size { + return Err(SpectrogramError::SignalTooShort { + signal_len: n_samples, + window_size: config.window_size, + }); + } + if config.hop_size == 0 { + return Err(SpectrogramError::InvalidHopSize); + } + if config.window_size == 0 { + return Err(SpectrogramError::InvalidWindowSize); + } + + let mut planner = FftPlanner::new(); + let fft = planner.plan_fft_forward(config.window_size); + let window = make_window(config.window_fn, config.window_size); + + let mut spectrograms = Vec::with_capacity(n_sc); for sc in 0..n_sc { let col: Vec = csi_temporal.column(sc).to_vec(); - spectrograms.push(compute_spectrogram(&col, sample_rate, config)?); + spectrograms.push(compute_spectrogram_with_plan( + &col, + sample_rate, + config, + &fft, + &window, + )); } Ok(spectrograms) @@ -372,6 +429,67 @@ mod tests { assert_eq!(spec.n_freq, 65); } } + + // ADR-154 §7.4 #20: the FFT-planner hoist in + // `compute_multi_subcarrier_spectrogram` must produce **bit-identical** + // output to calling `compute_spectrogram` (fresh planner) per subcarrier. + // We compare `f64::to_bits` of every spectrogram value across several + // window functions and a realistic 56-subcarrier CSI matrix — the planner + // change only reorders *when* the (identical) plan is built, never the math. + #[test] + fn multi_subcarrier_hoisted_plan_bit_identical() { + let n_samples = 600; + let n_sc = 56; // canonical-56 grid — the production subcarrier count + let sample_rate = 100.0; + let csi = Array2::from_shape_fn((n_samples, n_sc), |(t, sc)| { + // Deterministic, non-trivial per-subcarrier content. + let freq = 0.7 + sc as f64 * 0.13; + (2.0 * PI * freq * t as f64 / sample_rate).sin() + + 0.3 * (2.0 * PI * (freq * 2.1) * t as f64 / sample_rate).cos() + }); + + for window_fn in [ + WindowFunction::Hann, + WindowFunction::Hamming, + WindowFunction::Blackman, + WindowFunction::Rectangular, + ] { + for &power in &[true, false] { + let config = SpectrogramConfig { + window_size: 128, + hop_size: 37, // non-divisor hop to exercise frame edges + window_fn, + power, + }; + + // AFTER: hoisted-plan path. + let hoisted = + compute_multi_subcarrier_spectrogram(&csi, sample_rate, &config).unwrap(); + + // BEFORE: independent per-subcarrier fresh-planner path. + let reference: Vec = (0..n_sc) + .map(|sc| { + let col: Vec = csi.column(sc).to_vec(); + compute_spectrogram(&col, sample_rate, &config).unwrap() + }) + .collect(); + + assert_eq!(hoisted.len(), reference.len()); + for (sc, (h, r)) in hoisted.iter().zip(reference.iter()).enumerate() { + assert_eq!(h.data.dim(), r.data.dim(), "dim sc={sc} {window_fn:?}"); + for (a, b) in h.data.iter().zip(r.data.iter()) { + assert_eq!( + a.to_bits(), + b.to_bits(), + "bit mismatch sc={sc} {window_fn:?} power={power}: {a} vs {b}" + ); + } + assert_eq!(h.freq_resolution.to_bits(), r.freq_resolution.to_bits()); + assert_eq!(h.time_resolution.to_bits(), r.time_resolution.to_bits()); + } + } + } + } } #[cfg(test)]