wifi-densepose/v2
ruv 247794a2c5 bench(temporal): empirical sparse-vs-dense speedup curve (ADR-096 §3.1, #513)
Validates the central performance claim of ADR-096 with a runnable
benchmark. Timings are single-run wall-clock, pure Rust against pure
Rust, on an x86_64 host: measured numbers, not just an analytic argument.

Results (N=64..1024):

| N      | Dense (ms) | Sparse (ms) | Speedup |
|--------|-----------:|------------:|--------:|
|     64 |      0.262 |       0.141 |   1.86× |
|    128 |      1.120 |       0.335 |   3.34× |
|    256 |      4.129 |       0.711 |   5.81× |
|    512 |     19.230 |       2.356 |   8.16× |
|   1024 |     71.904 |       3.389 |  21.21× |

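Asymptotic check: 64→1024 is 16× more tokens. Dense cost grows 274×
(71.904 / 0.262), close to the 256× (16²) that N² predicts. Sparse
cost grows 24× (3.389 / 0.141), close to the ≈27× that N log N
predicts (16 · log(1024)/log(64)). The complexity claim is
empirically supported, with the caveat that these are single-run
timings.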
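
The arithmetic behind the asymptotic check can be reproduced from the table alone. The sketch below is not part of the commit; the names `RESULTS`, `quadratic_growth`, `nlogn_growth`, and `measured_growth` are made up for illustration, and the timings are copied from the table above.

```rust
/// Measured timings copied from the results table: (N, dense ms, sparse ms).
const RESULTS: [(u32, f64, f64); 5] = [
    (64, 0.262, 0.141),
    (128, 1.120, 0.335),
    (256, 4.129, 0.711),
    (512, 19.230, 2.356),
    (1024, 71.904, 3.389),
];

/// Predicted cost growth from n0 to n1 tokens under O(N^2).
fn quadratic_growth(n0: f64, n1: f64) -> f64 {
    (n1 / n0).powi(2)
}

/// Predicted cost growth from n0 to n1 tokens under O(N log N).
fn nlogn_growth(n0: f64, n1: f64) -> f64 {
    (n1 * n1.ln()) / (n0 * n0.ln())
}

/// Observed (dense, sparse) growth between the first and last table rows.
fn measured_growth() -> (f64, f64) {
    let (_, d0, s0) = RESULTS[0];
    let (_, d1, s1) = RESULTS[RESULTS.len() - 1];
    (d1 / d0, s1 / s0)
}

fn main() {
    let n0 = RESULTS[0].0 as f64;
    let n1 = RESULTS[RESULTS.len() - 1].0 as f64;
    let (dense, sparse) = measured_growth();
    println!("dense:  measured {:.1}x, N^2 predicts {:.1}x", dense, quadratic_growth(n0, n1));
    println!("sparse: measured {:.1}x, N log N predicts {:.1}x", sparse, nlogn_growth(n0, n1));
}
```

Using natural logs rather than log2 makes no difference here, since only the ratio of the two logarithms enters the prediction.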

The ADR-096 §3.1 honest-framing paragraph predicted that N=64 would
be overhead-bound. We measured only 1.86× there, consistent with the
ADR's warning that AETHER's current `window_frames=100` default is
below the inflection point where sparse pays.

What this commit adds:
- examples/bench_speedup.rs: measures dense_attention (upstream
  reference), AetherTemporalHead.forward (this crate's wrapper),
  and SubquadraticSparseAttention.forward (raw, to confirm the
  wrapper isn't introducing overhead; it isn't, since the two are
  within noise).
- benches_results.md: captured table, asymptotic check, and caveats
  (config used, what the benchmark doesn't measure, how to run).
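
For readers without the crate checked out, a single-run wall-clock measurement of this kind might be structured as below. This is a minimal sketch, not the contents of examples/bench_speedup.rs: `naive_dense_attention`, `fill`, and `time_ms` are hypothetical helpers, the head dimension of 32 is an arbitrary assumption, and only a naive single-head dense path is timed, with no sparse counterpart.

```rust
use std::time::Instant;

/// Naive single-head dense attention: out = softmax(Q K^T / sqrt(d)) V.
/// q, k, v are row-major [n, d]; the cost is O(n^2 * d).
fn naive_dense_attention(q: &[f32], k: &[f32], v: &[f32], n: usize, d: usize) -> Vec<f32> {
    let scale = 1.0 / (d as f32).sqrt();
    let mut out = vec![0.0f32; n * d];
    let mut scores = vec![0.0f32; n];
    for i in 0..n {
        let qi = &q[i * d..(i + 1) * d];
        // Score query i against every key, tracking the max for a stable softmax.
        let mut max = f32::NEG_INFINITY;
        for j in 0..n {
            let kj = &k[j * d..(j + 1) * d];
            let s = qi.iter().zip(kj).map(|(a, b)| a * b).sum::<f32>() * scale;
            scores[j] = s;
            max = max.max(s);
        }
        // Exponentiate and accumulate the softmax denominator.
        let mut denom = 0.0f32;
        for s in scores.iter_mut() {
            *s = (*s - max).exp();
            denom += *s;
        }
        // Weighted sum of value rows.
        let oi = &mut out[i * d..(i + 1) * d];
        for j in 0..n {
            let w = scores[j] / denom;
            let vj = &v[j * d..(j + 1) * d];
            for (o, x) in oi.iter_mut().zip(vj) {
                *o += w * x;
            }
        }
    }
    out
}

/// Deterministic pseudo-random values in [-0.5, 0.5) from a small LCG,
/// so the sketch needs no external crates.
fn fill(len: usize, seed: u32) -> Vec<f32> {
    let mut state = seed;
    (0..len)
        .map(|_| {
            state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
            (state >> 8) as f32 / (1u32 << 24) as f32 - 0.5
        })
        .collect()
}

/// Runs `f` once and returns its result with the elapsed wall-clock milliseconds.
fn time_ms<T>(f: impl FnOnce() -> T) -> (T, f64) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed().as_secs_f64() * 1e3)
}

fn main() {
    let d = 32; // hypothetical head dimension
    for &n in &[64usize, 128, 256] {
        let (q, k, v) = (fill(n * d, 1), fill(n * d, 2), fill(n * d, 3));
        let (out, ms) = time_ms(|| naive_dense_attention(&q, &k, &v, n, d));
        // Printing part of the output keeps the computation from being optimized away.
        println!("N={n:>5}  dense={ms:.3} ms  (out[0]={:.4})", out[0]);
    }
}
```

A single timed call like this is sensitive to cache state and scheduler noise, which is why single-run numbers are best read as a trend rather than as precise per-size figures.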

Run it:
  cargo run -p wifi-densepose-temporal --example bench_speedup --release

What's NOT measured here:
- Decode-step latency (already proved correct at last-token, not
  yet timed against a hypothetical O(N²) dense decode — they're
  structurally not comparable anyway).
- Memory footprint of KvCache + FP16 (matters on firmware, not host).
- GQA dispatch — this bench uses MHA shape so dense and sparse
  operate on identical tensors. Real AETHER will want MQA per
  TemporalHeadConfig::default_aether(), which halves KV memory.
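
To make the KV memory point concrete: cache size scales with the number of KV heads, not query heads, so sharing KV heads shrinks it proportionally. The sketch below uses a hypothetical `kv_cache_bytes` helper and made-up shapes (1024 tokens, head dimension 32, FP16); the commit does not state the actual head counts in `TemporalHeadConfig::default_aether()`.

```rust
/// Bytes held by a KV cache: keys plus values, one [tokens, head_dim]
/// slab per KV head. Query-head count does not enter the formula.
fn kv_cache_bytes(kv_heads: usize, tokens: usize, head_dim: usize, bytes_per_elem: usize) -> usize {
    2 * kv_heads * tokens * head_dim * bytes_per_elem
}

fn main() {
    // Hypothetical shape for illustration only; not read from the crate's config.
    let (tokens, head_dim, fp16_bytes) = (1024, 32, 2);

    // MHA with 2 query heads keeps 2 KV heads; MQA shares 1 KV head across both.
    let mha = kv_cache_bytes(2, tokens, head_dim, fp16_bytes);
    let mqa = kv_cache_bytes(1, tokens, head_dim, fp16_bytes);

    println!("MHA (2 KV heads): {} KiB", mha / 1024);
    println!("MQA (1 KV head):  {} KiB", mqa / 1024);
    println!("ratio: {:.1}x", mha as f64 / mqa as f64);
}
```

Under these assumed shapes the saving is exactly a factor of two; with more query heads sharing one KV head, the reduction relative to MHA would be larger.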

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-08 12:02:36 -04:00
.claude-flow chore(repo): rename rust-port/wifi-densepose-rs → v2/ (flatten to one level) (#427) 2026-04-25 21:28:13 -04:00
crates bench(temporal): empirical sparse-vs-dense speedup curve (ADR-096 §3.1, #513) 2026-05-08 12:02:36 -04:00
data chore(repo): rename rust-port/wifi-densepose-rs → v2/ (flatten to one level) (#427) 2026-04-25 21:28:13 -04:00
docs chore(repo): rename rust-port/wifi-densepose-rs → v2/ (flatten to one level) (#427) 2026-04-25 21:28:13 -04:00
examples chore(repo): rename rust-port/wifi-densepose-rs → v2/ (flatten to one level) (#427) 2026-04-25 21:28:13 -04:00
patches/ruvector-crv chore(repo): rename rust-port/wifi-densepose-rs → v2/ (flatten to one level) (#427) 2026-04-25 21:28:13 -04:00
Cargo.lock feat(temporal): scaffold wifi-densepose-temporal crate (ADR-096 Phase 1-3, #513) 2026-05-08 09:26:18 -04:00
Cargo.toml feat(temporal): scaffold wifi-densepose-temporal crate (ADR-096 Phase 1-3, #513) 2026-05-08 09:26:18 -04:00