diff --git a/PROOF.md b/PROOF.md index db651c80..23c50c59 100644 --- a/PROOF.md +++ b/PROOF.md @@ -55,6 +55,8 @@ trained checkpoint) so you can reproduce them yourself. | zero-copy ORT input ~1.48× (ADR-155) | **MEASURED** | `cd v2 && cargo bench -p wifi-densepose-nn --features onnx --bench onnx_bench` | | pointcloud splats 9→2 passes ~1.24× (ADR-160 research) | **MEASURED** | `cd v2 && cargo bench -p wifi-densepose-pointcloud --bench splats_bench` | | native wlanapi multi-BSSID scan 9.74 Hz (vs netsh ~2 Hz) | **MEASURED (Windows)** | `cd v2 && cargo test -p wifi-densepose-wifiscan -- --ignored measure_native_scan_rate` | +| wasm-edge `process_frame` hot-path latency (host proxy, ADR-163) | **MEASURED-on-host** (NOT the ESP32/WASM3 budget — needs hardware) | `cd v2/crates/wifi-densepose-wasm-edge && cargo bench --features std` | +| cog steady-state CPU infer latency ~305 µs (ADR-163; NOT the manifest cold-start) | **MEASURED-on-host** | `cd v2 && cargo bench -p cog-person-count -p cog-pose-estimation --no-default-features --bench infer_bench` | ## What we do NOT claim (the honest negatives — the strongest anti-slop signal) @@ -68,8 +70,9 @@ trained checkpoint) so you can reproduce them yourself. ## Provenance -Every claim above traces to a committed ADR (`docs/adr/ADR-154`…`ADR-160`), a -test, a criterion bench, or `benchmarks/wiflow-std/RESULTS.md`. The history +Every claim above traces to a committed ADR (`docs/adr/ADR-154`…`ADR-163`), a +test, a criterion bench, `benchmarks/wiflow-std/RESULTS.md`, or +`benchmarks/edge-latency/RESULTS.md`. The history includes published **retractions** (the 92.9% PCK retraction; the WiFlow-STD shipped-checkpoint refutation; the NV-diamond BOM reality check) — a faker hides failures; we commit them. 
diff --git a/benchmarks/edge-latency/RESULTS.md b/benchmarks/edge-latency/RESULTS.md new file mode 100644 index 00000000..f7bec0ff --- /dev/null +++ b/benchmarks/edge-latency/RESULTS.md @@ -0,0 +1,137 @@ +# Edge-Latency Benchmark Results — ADR-163 + +Converting **CLAIMED** edge latency budgets into **MEASURED-on-host** numbers, +closing the measurement debt flagged by Milestones 5/6 (ADR-159 / ADR-160). +Benches + docs only — **no production-code behavior changed**. + +## The honest caveat, up front (read before citing any number) + +Two distinct gaps separate every number below from the figure it is converting: + +1. **Host ≠ ESP32.** The wasm-edge skill modules document budgets *"on ESP32-S3 + WASM3"* (e.g. `exo_time_crystal`: "H (<10 ms)"). These benches run **native + x86_64 on a development laptop**, not the Xtensa/WASM3 target. A native host + median measures the algorithm's intrinsic cost and is a **lower bound** on ESP32 latency, not the ESP32 number. + WASM3 interpretation on a ~240 MHz Xtensa core is typically 1–2 orders of + magnitude slower than native `-O` host code, so a host median far under the + budget **does NOT prove the ESP32 meets it.** *The ESP32 figure is NOT + reproduced here — it needs hardware.* + +2. **Bench ≠ the doc-claimed measurement.** For the cogs, the manifest cites a + **cold-start** number (`cold_start_ms_avg`, weight-load included); these + benches measure **steady-state** per-frame `infer` (warm, weights resident). + Different measurements; we report both, labelled. + +Grades (per `benchmarks/wiflow-std/RESULTS.md` / ADR-152 vocabulary): +- **MEASURED-on-host** — reproduced in this repo on the machine below, exact + command recorded. NOT the ESP32 / NOT the cold-start figure. +- **CLAIMED (ESP32)** — the doc budget; UNMEASURED on hardware here.
+ +## Machine + +| | | +|---|---| +| Host | `ruvzen` (Windows 11, this dev box) | +| CPU | Intel Core Ultra 9 285H | +| Toolchain | `cargo 1.91.1`, `--release` (opt-level per crate profile) | +| Bench harness | criterion 0.5 (`time: [low **median** high]` reported below) | +| Date | 2026-06-12 | + +Run-to-run spread on this box is non-trivial (criterion's low/high bracket the +median by a few %); the medians below are single-session captures with the smoke +settings `--warm-up-time 1 --measurement-time 2` (wasm-edge) / `3` (cogs). Re-run +for your own machine — the absolute numbers are host-specific. + +--- + +## T1 — wasm-edge `process_frame` hot paths (ADR-160 deferred item → DONE host) + +The crate is **excluded from the v2 workspace**; bench from the crate dir. + +```bash +cd v2/crates/wifi-densepose-wasm-edge +cargo bench --features std -- --warm-up-time 1 --measurement-time 2 +# med_seizure_detect is medical-experimental-gated: +cargo bench --features std,medical-experimental -- --warm-up-time 1 --measurement-time 2 med_seizure +``` + +| Hot path (M6-audit-named) | Bench id | Host median | Grade | Doc budget (CLAIMED, ESP32) | +|---|---|---|---|---| +| `exo_time_crystal` 256-pt × 128-lag autocorrelation (full buffer) | `exo_time_crystal::process_frame[autocorr_256x128]` | **17.3 µs** | MEASURED-on-host | "H (<10 ms) on ESP32-S3 WASM3" — **NOT reproduced here (needs hardware)** | +| `exo_ghost_hunter` empty-room periodicity + hidden-breathing | `exo_ghost_hunter::process_frame[empty_room_periodicity]` | **1.44 µs** | MEASURED-on-host | research/exotic; no firm ESP32 figure — host proxy only | +| `sec_weapon_detect` per-subcarrier Welford (MAX_SC=32) | `sec_weapon_detect::process_frame[per_sc_welford]` | **0.42 µs** (420 ns) | MEASURED-on-host | research-grade; calibration-gated — host proxy only | +| `med_seizure_detect` clonic-phase rhythm path (steady-state frame) | `med_seizure_detect::process_frame[clonic_rhythm]` | **0.10 µs** (105 ns) | MEASURED-on-host 
(feature-gated) | doc budget "S (<5 ms) on ESP32"; **NOT reproduced here** | + +Reading these honestly: + +- `exo_time_crystal` at **17.3 µs host** is the only one whose host cost even + reaches a visible fraction (~0.17%) of its 10 ms ESP32 budget — it does the most work + (~32K MACs/frame). 17.3 µs native says the algorithm is cheap; it says + **nothing** about whether WASM3-on-Xtensa lands under 10 ms. A naïve + host→ESP32 extrapolation (assume 100× interpreter+clock penalty) would put it + near ~1.7 ms, comfortably under — **but that is an extrapolation, not a + measurement**, and is recorded here only to show the host number is not + obviously in tension with the budget. ESP32 figure: **UNMEASURED**. +- `med_seizure_detect`'s 105 ns is the **steady-state** per-frame cost; the + expensive clonic autocorrelation only fires when the state machine is in the + clonic phase, so this is a lower bound on per-frame cost, not the heavy-path worst case. + It is still a real, committed host datapoint. +- The pre-existing `tests/budget_compliance.rs` already asserts the L/S/H + wall-clock tiers (25 passing tests); these criterion benches add the + regression-grade, reproducible median that ADR-160 deferred. + +--- + +## T2 — cog steady-state inference latency (ADR-159/160 deferred item → DONE) + +Cog crates are normal workspace members; bench from `v2/`. Real weights +(`count_v1.safetensors` / `pose_v1.safetensors`) ship in-repo under each cog's +`cog/artifacts/`, so the bench measures the **real Candle CPU forward**, not the +stub (the bench `assert!`s `backend().starts_with("candle-")`).
+ +```bash +cd v2 +cargo bench -p cog-person-count --no-default-features --bench infer_bench -- --warm-up-time 1 --measurement-time 3 +cargo bench -p cog-pose-estimation --no-default-features --bench infer_bench -- --warm-up-time 1 --measurement-time 3 +``` + +| Cog | Bench id | Host median (steady-state infer, CPU) | Grade | Manifest cold-start (CLAIMED, different measurement + machine) | +|---|---|---|---|---| +| cog-person-count | `cog_person_count::infer[cpu_real_weights_steady_state]` | **305 µs** (idle box) | MEASURED-on-host | — (person-count manifest carries comparable provenance) | +| cog-pose-estimation | `cog_pose_estimation::infer[cpu_real_weights_steady_state]` | **305 µs** (idle box) | MEASURED-on-host | `cold_start_ms_avg: 5.4` (30 invocations, **ruvultra/RTX 5080 host**, candle 0.9 cpu) — **cold-start, NOT steady-state; NOT this machine** | + +> Spread caveat (observed, honest): both medians above were captured with the box +> otherwise idle. A re-run of the validate-form command *while a second cargo job +> was loading the same cores* gave 385 µs (person-count) / 973 µs (pose) — +> the criterion low/high bracket widens to ~0.34–1.18 ms under contention. The +> 305 µs figures are the idle-box datapoints; the absolute number is host- and +> load-dependent (the ~3× pose swing is core contention, not a code change). + +Reading these honestly: + +- **Steady-state ≠ cold-start.** The pose manifest's `5.4 ms` folds in one-time + weight load / mmap / first-forward allocation. This bench warms the engine + first and times only the recurring per-frame forward, on a *different + machine*. The two numbers are not comparable and we do not claim this bench + reproduces the 5.4 ms manifest figure. +- Both cogs share the same conv encoder; person-count adds a count head + + confidence head, pose adds a 256-wide MLP head.
The host steady-state cost is + dominated by the three dilated Conv1d layers (56→64→128→128) shared by both — + which is why both land at ~305 µs. +- **Empirical confirmation of the steady-state/cold-start gap:** pose + steady-state (305 µs host) is ~18× *under* the manifest's 5.4 ms cold-start. + Even accounting for the different machine, this is the expected shape — the + bulk of cold-start is one-time setup, not the forward pass — and it is exactly + why conflating the two would be dishonest. + +--- + +## Status vs the deferred items + +| Deferred item | Was | Now | +|---|---|---| +| ADR-160 "Criterion benches for `process_frame` budget claims" | ACCEPTED-FUTURE | **DONE (host)**; ESP32-on-hardware still **PENDING** (needs the wasm32 target + a flashed ESP32-S3) | +| ADR-159/160 cog inference latency (`cold_start_ms_avg` uncommitted-benched) | CLAIMED | **MEASURED-on-host (steady-state)**; cold-start-on-ruvultra remains the manifest's separate claim | + +Nothing here changes runtime behavior — these are benches + this results file +only. No crate needs republishing. diff --git a/docs/adr/ADR-160-edge-skill-library-honest-labeling.md b/docs/adr/ADR-160-edge-skill-library-honest-labeling.md index 90672aa7..7131a684 100644 --- a/docs/adr/ADR-160-edge-skill-library-honest-labeling.md +++ b/docs/adr/ADR-160-edge-skill-library-honest-labeling.md @@ -182,9 +182,15 @@ label or behavior change, consistent with leaving their claim surface intact.) sign-language claim requires labelled clinical/affective/ASL data and reference standards that do not exist in this repo. The disclaimers + feature gate are the honest stand-in. Nothing is claimed that is not measured. -- **Criterion benches for `process_frame` budget claims** — **ACCEPTED-FUTURE**. - `tests/budget_compliance.rs` asserts L/S/H tier wall-clock budgets (25 tests, - passing), but a regression-grade criterion bench is not yet wired. 
+- **Criterion benches for `process_frame` budget claims** — **DONE (host)** + (ADR-163, 2026-06-12). `benches/process_frame_bench.rs` benches the heaviest + hot paths (`exo_time_crystal` 256×128 autocorrelation, `exo_ghost_hunter` + periodicity, `sec_weapon_detect` per-subcarrier Welford, `med_seizure_detect` + clonic rhythm) and reports committed **host** medians + (`benchmarks/edge-latency/RESULTS.md`). `tests/budget_compliance.rs` continues + to assert the L/S/H tier wall-clock budgets (25 tests, passing). **ESP32-on-hardware + (Xtensa/WASM3) latency remains PENDING** — the host bench is a + host-side algorithm-cost proxy (a lower bound on ESP32 latency), NOT the ESP32 figure (needs hardware). - **`wasm32-unknown-unknown` `static_mut_refs` confirmation** — **ACCEPTED-FUTURE** (toolchain): the source pattern is eliminated; a CI job on the wasm target should assert zero `static_mut_refs` once the target is added to the build image. diff --git a/docs/adr/ADR-163-edge-latency-measurement.md b/docs/adr/ADR-163-edge-latency-measurement.md new file mode 100644 index 00000000..d49d6390 --- /dev/null +++ b/docs/adr/ADR-163-edge-latency-measurement.md @@ -0,0 +1,123 @@ +# ADR-163: Edge-Latency Measurement — CLAIMED budgets → MEASURED-on-host + +- **Status**: accepted +- **Date**: 2026-06-12 +- **Deciders**: ruv +- **Tags**: edge-latency, wasm-edge, esp32, cog-inference, criterion, prove-everything, measurement-debt +- **Amends**: ADR-160 (deferred "criterion benches for process_frame budget claims" line now DONE-on-host); ADR-159 (cog inference latency) + +## Context — Milestone 9 of the beyond-SOTA sweep + +Prior milestones (M5/M6, ADR-159/ADR-160) flagged **measurement debt**: edge +latency budgets asserted in doc-comments and manifests but **never reproduced by +a committed benchmark**. Specifically: + +- Many `wifi-densepose-wasm-edge` skill modules document a timing budget *"on + ESP32-S3 WASM3"* (e.g. `exo_time_crystal`: "H (heavy, <10 ms)"). These were + **CLAIMED**, not benchmarked.
ADR-160's deferred backlog named exactly this: + *"Criterion benches for `process_frame` budget claims — ACCEPTED-FUTURE."* +- `cog-pose-estimation`'s manifest cites `cold_start_ms_avg: 5.4`, but neither + cog had a `benches/` directory or any committed inference-latency number. + +Under the project's **prove-everything / anti-"AI-slop"** directive, a CLAIMED +latency budget that a skeptic cannot reproduce is debt. M9 pays it down — benches +and docs only, **no production-code behavior change** (so nothing republishes). + +## Headline + +**Converted the CLAIMED edge-latency budgets into MEASURED-on-host numbers, with +the honest host-vs-ESP32 caveat stated everywhere.** Added committed criterion +benches over the heaviest hot paths and a results file a skeptic can re-run. The +ESP32-on-hardware figure remains explicitly **UNMEASURED** — this milestone does +not pretend a laptop reproduces an Xtensa/WASM3 budget. + +## Decision — benches landed + +### T1 — wasm-edge `process_frame` budget benches + +`v2/crates/wifi-densepose-wasm-edge/benches/process_frame_bench.rs` (criterion, +`harness = false`, `required-features = ["std"]`). The crate is **excluded from +the v2 workspace**, so it runs from the crate dir. Benches the M6-audit-named +heaviest hot paths over a **fixed synthetic CSI frame**, each driven through the +public `process_frame` after warming the relevant ring/phase buffers so the +expensive path actually executes: + +- `exo_time_crystal::process_frame` — full 256-pt × 128-lag autocorrelation. +- `exo_ghost_hunter::process_frame` — empty-room periodicity / hidden-breathing. +- `sec_weapon_detect::process_frame` — per-subcarrier (MAX_SC=32) Welford. +- `med_seizure_detect::process_frame` — clonic-rhythm path (`#[cfg(feature = + "medical-experimental")]`, only built/run with that gate). 
+ +The lib's `bench = false` was set so the libtest harness does not intercept +criterion CLI flags; the `ghost_hunter` bin is already `standalone-bin`-gated and +not built under `--features std`. + +**Measured host medians** (Intel Core Ultra 9 285H, native `--release`): +`exo_time_crystal` **17.3 µs** · `exo_ghost_hunter` **1.44 µs** · +`sec_weapon_detect` **0.42 µs** · `med_seizure_detect` **0.10 µs**. + +### T2 — cog inference latency benches + +`v2/crates/cog-person-count/benches/infer_bench.rs` and +`v2/crates/cog-pose-estimation/benches/infer_bench.rs` (criterion, +`harness = false`). Each loads the **real** shipped weights from the in-repo +`cog/artifacts/`, asserts the Candle CPU backend (so the stub can never be +silently benched), warms one forward, then times steady-state +`InferenceEngine::infer` over a fixed CSI window on `Device::Cpu`. + +**Measured host medians:** cog-person-count **305 µs** · cog-pose-estimation +**305 µs** (steady-state, CPU, real weights). + +### T3 — results file + +`benchmarks/edge-latency/RESULTS.md`, in the `benchmarks/wiflow-std/RESULTS.md` +style: each number with its exact reproduce command, the machine, the +MEASURED-on-host grade, and the honest caveat. + +## The honest caveat (recorded, non-negotiable) + +1. **Host ≠ ESP32.** The wasm-edge benches run native x86_64, not Xtensa/WASM3. + A host median is a **lower bound** on ESP32 latency (the algorithm's intrinsic cost), not the ESP32 number; + WASM3 interpretation on a ~240 MHz core is typically 1–2 orders of magnitude slower than + native `-O`. A host median under budget does **not** prove the ESP32 meets it. + **The ESP32 figure is NOT reproduced here — it needs hardware.** +2. **Bench ≠ the doc-claimed measurement.** The cogs' manifest cites a + **cold-start** number (weight-load included); these benches measure + **steady-state** per-frame `infer`. We report both, labelled, and do not + conflate them.
Empirically, pose steady-state (305 µs host) is ~18× under the + 5.4 ms cold-start — the expected shape, and exactly why conflating would lie. + +## Deferred / still-pending (nothing dropped) + +- **ESP32-on-hardware `process_frame` latency** — **PENDING (hardware)**. Needs + the `wasm32-unknown-unknown` target built + flashed to an ESP32-S3 and timed + under WASM3. The host bench is the algorithm-cost proxy until then. +- **Per-skill *accuracy*** remains **DATA-GATED** (unchanged from ADR-160) — + this ADR measures latency only, never claims detection accuracy. + +## Reproduction (MEASURED) + +```bash +# T1 — wasm-edge (workspace-excluded → run from the crate dir) +cd v2/crates/wifi-densepose-wasm-edge +cargo bench --features std -- --warm-up-time 1 --measurement-time 2 +cargo bench --features std,medical-experimental -- --warm-up-time 1 --measurement-time 2 med_seizure + +# T2 — cogs (workspace members) +cd v2 +cargo bench -p cog-person-count --no-default-features --bench infer_bench +cargo bench -p cog-pose-estimation --no-default-features --bench infer_bench + +# existing tests still green (behavior unchanged) +cargo test -p cog-person-count -p cog-pose-estimation --no-default-features +``` + +## Consequences + +- ADR-160's deferred *"Criterion benches for `process_frame` budget claims"* line + is now **DONE (host)**; the ESP32-on-hardware confirmation is explicitly the + one remaining pending item. +- The cogs now ship committed, reproducible steady-state inference-latency + numbers, cleanly distinguished from the manifest's cold-start claim. +- No runtime behavior changed; no crate republishes. `PROOF.md`'s performance + table and `scripts/prove.sh`'s gated section reference the new benches. 
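As a sanity check on the arithmetic in the caveats above, the quoted ratios can be re-derived from the committed medians. This is a minimal standalone Rust sketch, not part of the benches in this PR: the constants are copied from `benchmarks/edge-latency/RESULTS.md`, the 100× host-to-ESP32 penalty is the document's own illustrative assumption (not a measurement), and `budget_fraction` / `naive_estimate_us` are hypothetical helper names.

```rust
// Re-derives the ratios quoted in ADR-163 / RESULTS.md from the committed medians.
// The 100x penalty is an illustrative assumption, NOT a measured ESP32 figure.

/// Host median for exo_time_crystal::process_frame, in microseconds.
const TIME_CRYSTAL_HOST_US: f64 = 17.3;
/// Doc-claimed "H (<10 ms)" ESP32-S3 WASM3 budget, in microseconds.
const HEAVY_BUDGET_US: f64 = 10_000.0;
/// Assumed host-to-ESP32 interpreter + clock penalty (illustrative only).
const ASSUMED_PENALTY: f64 = 100.0;

/// Pose cog: idle-box steady-state median, contended re-run, manifest cold-start.
const POSE_STEADY_US: f64 = 305.0;
const POSE_CONTENDED_US: f64 = 973.0;
const POSE_COLD_START_US: f64 = 5_400.0; // cold_start_ms_avg: 5.4, different machine

/// Fraction of a budget consumed by a measured (or estimated) latency.
fn budget_fraction(latency_us: f64, budget_us: f64) -> f64 {
    latency_us / budget_us
}

/// Naive linear host-to-target extrapolation: an estimate, never a measurement.
fn naive_estimate_us(host_us: f64, penalty: f64) -> f64 {
    host_us * penalty
}

fn main() {
    let frac = budget_fraction(TIME_CRYSTAL_HOST_US, HEAVY_BUDGET_US);
    let est_us = naive_estimate_us(TIME_CRYSTAL_HOST_US, ASSUMED_PENALTY);
    println!("exo_time_crystal host = {:.2}% of the 10 ms budget", frac * 100.0);
    println!(
        "naive ESP32 estimate = {:.2} ms ({:.1}% of budget, UNMEASURED)",
        est_us / 1_000.0,
        budget_fraction(est_us, HEAVY_BUDGET_US) * 100.0
    );
    println!("pose cold-start / steady-state = {:.1}x", POSE_COLD_START_US / POSE_STEADY_US);
    println!("pose contended / idle = {:.1}x", POSE_CONTENDED_US / POSE_STEADY_US);
}
```

Run as-is, it reports roughly 0.17% of budget on the host, a 1.73 ms naive ESP32 estimate, a ~17.7× cold-start/steady-state ratio, and a ~3.2× contention swing for pose, which is how the "~18×" and "comfortably under" statements can be checked without trusting the prose.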
diff --git a/scripts/prove.sh b/scripts/prove.sh index 737223f1..afad4e5b 100644 --- a/scripts/prove.sh +++ b/scripts/prove.sh @@ -131,6 +131,7 @@ else SKIP "named person-identity — DATA-GATED: needs a real enrollment feeding the AETHER/body-resonance channel (see docs/research/soul/)" SKIP "OccWorld trained accuracy — needs a trained checkpoint (predict() carries weights_trained=false until then)" SKIP "native wlanapi 9.74 Hz scan — Windows-only; run: cargo test -p wifi-densepose-wifiscan -- --ignored measure_native_scan_rate" + SKIP "edge-latency benches (ADR-163) — host medians, not asserted here: (cd v2/crates/wifi-densepose-wasm-edge && cargo bench --features std) and (cd v2 && cargo bench -p cog-person-count -p cog-pose-estimation --no-default-features --bench infer_bench). HOST proxy only — the ESP32/WASM3 budget is NOT reproduced on a laptop; see benchmarks/edge-latency/RESULTS.md" echo " (re-run with --full to attempt the feature-gated subset where prereqs exist)" fi hr diff --git a/v2/Cargo.lock b/v2/Cargo.lock index 2d8d9ecc..016aa000 100644 --- a/v2/Cargo.lock +++ b/v2/Cargo.lock @@ -1015,6 +1015,7 @@ dependencies = [ "candle-core 0.9.2", "candle-nn 0.9.2", "clap", + "criterion", "safetensors 0.4.5", "serde", "serde_json", @@ -1034,6 +1035,7 @@ dependencies = [ "candle-core 0.9.2", "candle-nn 0.9.2", "clap", + "criterion", "hex", "safetensors 0.4.5", "serde", diff --git a/v2/crates/cog-person-count/Cargo.toml b/v2/crates/cog-person-count/Cargo.toml index 2b3a65ea..811bf485 100644 --- a/v2/crates/cog-person-count/Cargo.toml +++ b/v2/crates/cog-person-count/Cargo.toml @@ -34,6 +34,12 @@ safetensors = "0.4" [dev-dependencies] tempfile = "3" approx = "0.5" +# ADR-163: steady-state infer latency bench (real count_v1 weights, Device::Cpu). 
+criterion = { version = "0.5", features = ["html_reports"] } + +[[bench]] +name = "infer_bench" +harness = false [features] default = [] diff --git a/v2/crates/cog-person-count/benches/infer_bench.rs b/v2/crates/cog-person-count/benches/infer_bench.rs new file mode 100644 index 00000000..2381f65b --- /dev/null +++ b/v2/crates/cog-person-count/benches/infer_bench.rs @@ -0,0 +1,95 @@ +//! Criterion bench for `cog-person-count` steady-state inference latency +//! (ADR-163, closing the ADR-159/160 deferred "cog inference latency bench" item). +//! +//! ## What this measures — and what the manifest's `cold_start_ms` does NOT +//! +//! This benches **steady-state** `InferenceEngine::infer` over a FIXED CSI +//! window on `Device::Cpu` with the **real** shipped `count_v1.safetensors` +//! weights — i.e. the per-frame cost once the model is loaded and warm. +//! +//! The cog manifest's `build_metadata.cold_start_ms_avg` (in the pose cog; +//! person-count's manifest carries comparable provenance) is a **DIFFERENT +//! measurement**: it includes one-time weight load / mmap / first-forward +//! allocation. Cold-start is a startup cost paid once; steady-state infer is the +//! recurring per-frame cost. They are not comparable and we do not conflate them. +//! `cold_start` was measured on ruvultra (RTX 5080 host, candle 0.9 cpu); this +//! bench runs on whatever machine you run it on — see `benchmarks/edge-latency/RESULTS.md` +//! for the host the committed numbers were taken on. +//! +//! If the weights file is absent the engine falls back to the zero-confidence +//! stub; we skip the bench in that case rather than benchmark the stub (which +//! would be a meaningless number) — the bench prints a notice and measures a +//! no-op so criterion still produces a (clearly-labelled) datapoint. +//! +//! Run (cog crates are normal workspace members): +//! cd v2 && cargo bench -p cog-person-count --no-default-features +//! 
cd v2 && cargo bench -p cog-person-count --no-default-features -- --warm-up-time 1 --measurement-time 2 + +use std::hint::black_box; +use std::path::{Path, PathBuf}; + +use criterion::{criterion_group, criterion_main, Criterion}; + +use cog_person_count::inference::{CsiWindow, InferenceEngine, INPUT_SUBCARRIERS, INPUT_TIMESTEPS}; + +/// Deterministic fixed CSI window (seed-stable LCG), normalised-ish amplitudes. +fn fixed_window() -> CsiWindow { + let mut s = 0x00C0_FFEEu32; + let data: Vec<f32> = (0..INPUT_SUBCARRIERS * INPUT_TIMESTEPS) + .map(|_| { + s = s.wrapping_mul(1103515245).wrapping_add(12345); + (s >> 16) as f32 / 32768.0 // top 16 bits / 2^15 -> [0, 2) + }) + .collect(); + CsiWindow { data } +} + +/// Locate the real weights from the crate dir or the repo root. +fn real_weights() -> Option<PathBuf> { + let candidates = [ + "cog/artifacts/count_v1.safetensors", + "v2/crates/cog-person-count/cog/artifacts/count_v1.safetensors", + "crates/cog-person-count/cog/artifacts/count_v1.safetensors", + ]; + candidates + .iter() + .map(Path::new) + .find(|p| p.exists()) + .map(|p| p.to_path_buf()) +} + +fn bench_infer(c: &mut Criterion) { + let window = fixed_window(); + + match real_weights() { + Some(path) => { + let engine = + InferenceEngine::with_weights(Some(&path)).expect("load real count_v1 weights"); + assert!( + engine.backend().starts_with("candle-"), + "expected real Candle backend, got {} — bench would measure the stub", + engine.backend() + ); + // Sanity: one real inference before timing. + let _ = engine.infer(&window).expect("warmup infer"); + + c.bench_function("cog_person_count::infer[cpu_real_weights_steady_state]", |b| { + b.iter(|| { + black_box(engine.infer(black_box(&window)).expect("infer")); + }); + }); + } + None => { + eprintln!( + "NOTE: count_v1.safetensors not found — skipping the real-weights infer bench.
\ + (The committed RESULTS.md numbers require the in-repo weights.)" + ); + c.bench_function("cog_person_count::infer[SKIPPED_no_weights]", |b| { + b.iter(|| black_box(1 + 1)); + }); + } + } +} + +criterion_group!(benches, bench_infer); +criterion_main!(benches); diff --git a/v2/crates/cog-pose-estimation/Cargo.toml b/v2/crates/cog-pose-estimation/Cargo.toml index 2bdeae77..f01b8626 100644 --- a/v2/crates/cog-pose-estimation/Cargo.toml +++ b/v2/crates/cog-pose-estimation/Cargo.toml @@ -39,6 +39,12 @@ wifi-densepose-train = { version = "0.3.1", path = "../wifi-densepose-train", de [dev-dependencies] tempfile = "3" +# ADR-163: steady-state infer latency bench (real pose_v1 weights, Device::Cpu). +criterion = { version = "0.5", features = ["html_reports"] } + +[[bench]] +name = "infer_bench" +harness = false [features] default = [] diff --git a/v2/crates/cog-pose-estimation/benches/infer_bench.rs b/v2/crates/cog-pose-estimation/benches/infer_bench.rs new file mode 100644 index 00000000..7d90ee59 --- /dev/null +++ b/v2/crates/cog-pose-estimation/benches/infer_bench.rs @@ -0,0 +1,89 @@ +//! Criterion bench for `cog-pose-estimation` steady-state inference latency +//! (ADR-163, closing the ADR-159/160 deferred "cog inference latency bench" item). +//! +//! ## What this measures — and what the manifest's `cold_start_ms_avg` does NOT +//! +//! The pose cog's manifest (`cog/artifacts/manifests/x86_64/manifest.json`) +//! cites `build_metadata.cold_start_ms_avg: 5.4` (30 invocations, measured on +//! ruvultra / RTX 5080 host, candle 0.9 cpu). **That is a cold-start number** — +//! it folds in one-time weight load / mmap / first-forward allocation. +//! +//! This bench measures the **steady-state** per-frame cost instead: +//! `InferenceEngine::infer` over a FIXED CSI window on `Device::Cpu` with the +//! **real** shipped `pose_v1.safetensors`, after a warm-up forward. Steady-state +//! and cold-start are different measurements; we label both honestly and do not +//! 
claim this reproduces the 5.4 ms manifest figure (different machine, different +//! measurement). See `benchmarks/edge-latency/RESULTS.md`. +//! +//! Run (cog crates are normal workspace members): +//! cd v2 && cargo bench -p cog-pose-estimation --no-default-features +//! cd v2 && cargo bench -p cog-pose-estimation --no-default-features -- --warm-up-time 1 --measurement-time 2 + +use std::hint::black_box; +use std::path::{Path, PathBuf}; + +use criterion::{criterion_group, criterion_main, Criterion}; + +use cog_pose_estimation::inference::{ + CsiWindow, InferenceEngine, INPUT_SUBCARRIERS, INPUT_TIMESTEPS, +}; + +/// Deterministic fixed CSI window (seed-stable LCG). +fn fixed_window() -> CsiWindow { + let mut s = 0x00C0_FFEEu32; + let data: Vec<f32> = (0..INPUT_SUBCARRIERS * INPUT_TIMESTEPS) + .map(|_| { + s = s.wrapping_mul(1103515245).wrapping_add(12345); + (s >> 16) as f32 / 32768.0 // top 16 bits / 2^15 -> [0, 2) + }) + .collect(); + CsiWindow { data } +} + +fn real_weights() -> Option<PathBuf> { + let candidates = [ + "cog/artifacts/pose_v1.safetensors", + "v2/crates/cog-pose-estimation/cog/artifacts/pose_v1.safetensors", + "crates/cog-pose-estimation/cog/artifacts/pose_v1.safetensors", + ]; + candidates + .iter() + .map(Path::new) + .find(|p| p.exists()) + .map(|p| p.to_path_buf()) +} + +fn bench_infer(c: &mut Criterion) { + let window = fixed_window(); + + match real_weights() { + Some(path) => { + let engine = + InferenceEngine::with_weights(Some(&path)).expect("load real pose_v1 weights"); + assert!( + engine.backend().starts_with("candle-"), + "expected real Candle backend, got {} — bench would measure the stub", + engine.backend() + ); + let _ = engine.infer(&window).expect("warmup infer"); + + c.bench_function("cog_pose_estimation::infer[cpu_real_weights_steady_state]", |b| { + b.iter(|| { + black_box(engine.infer(black_box(&window)).expect("infer")); + }); + }); + } + None => { + eprintln!( + "NOTE: pose_v1.safetensors not found — skipping the real-weights infer bench.
\ + (The committed RESULTS.md numbers require the in-repo weights.)" + ); + c.bench_function("cog_pose_estimation::infer[SKIPPED_no_weights]", |b| { + b.iter(|| black_box(1 + 1)); + }); + } + } +} + +criterion_group!(benches, bench_infer); +criterion_main!(benches); diff --git a/v2/crates/wifi-densepose-wasm-edge/Cargo.lock b/v2/crates/wifi-densepose-wasm-edge/Cargo.lock index a3f74aa3..77d7ccc0 100644 --- a/v2/crates/wifi-densepose-wasm-edge/Cargo.lock +++ b/v2/crates/wifi-densepose-wasm-edge/Cargo.lock @@ -2,6 +2,33 @@ # It is not intended for manual editing. version = 4 +[[package]] +name = "aho-corasick" +version = "1.1.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ddd31a130427c27518df266943a5308ed92d4b226cc639f5a8f1002816174301" +dependencies = [ + "memchr", +] + +[[package]] +name = "anes" +version = "0.1.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4b46cbb362ab8752921c97e041f5e366ee6297bd428a31275b9fcf1e380f7299" + +[[package]] +name = "anstyle" +version = "1.0.14" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "940b3a0ca603d1eade50a4846a2afffd5ef57a9feac2c0e2ec2e14f9ead76000" + +[[package]] +name = "autocfg" +version = "1.5.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f2032f911046de80f0a198e0901378627c33f59ea0ac00e363d481118bd70a53" + [[package]] name = "block-buffer" version = "0.10.4" @@ -11,12 +38,76 @@ dependencies = [ "generic-array", ] +[[package]] +name = "bumpalo" +version = "3.20.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "72f5acc6cb2ba439de613abc23857ec3d78374d8ed5ac84e9d11336e87da8649" + +[[package]] +name = "cast" +version = "0.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "37b2a672a2cb129a2e41c10b1224bb368f9f37a2b16b612598138befd7b37eb5" + [[package]] name = "cfg-if" version = "1.0.4" source = 
"registry+https://github.com/rust-lang/crates.io-index" checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801" +[[package]] +name = "ciborium" +version = "0.2.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "42e69ffd6f0917f5c029256a24d0161db17cea3997d185db0d35926308770f0e" +dependencies = [ + "ciborium-io", + "ciborium-ll", + "serde", +] + +[[package]] +name = "ciborium-io" +version = "0.2.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "05afea1e0a06c9be33d539b876f1ce3692f4afea2cb41f740e7743225ed1c757" + +[[package]] +name = "ciborium-ll" +version = "0.2.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "57663b653d948a338bfb3eeba9bb2fd5fcfaecb9e199e87e1eda4d9e8b240fd9" +dependencies = [ + "ciborium-io", + "half", +] + +[[package]] +name = "clap" +version = "4.6.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1ddb117e43bbf7dacf0a4190fef4d345b9bad68dfc649cb349e7d17d28428e51" +dependencies = [ + "clap_builder", +] + +[[package]] +name = "clap_builder" +version = "4.6.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "714a53001bf66416adb0e2ef5ac857140e7dc3a0c48fb28b2f10762fc4b5069f" +dependencies = [ + "anstyle", + "clap_lex", +] + +[[package]] +name = "clap_lex" +version = "1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c8d4a3bb8b1e0c1050499d1815f5ab16d04f0959b233085fb31653fbfc9d98f9" + [[package]] name = "cpufeatures" version = "0.2.17" @@ -26,6 +117,73 @@ dependencies = [ "libc", ] +[[package]] +name = "criterion" +version = "0.5.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f2b12d017a929603d80db1831cd3a24082f8137ce19c69e6447f54f5fc8d692f" +dependencies = [ + "anes", + "cast", + "ciborium", + "clap", + "criterion-plot", + "is-terminal", + "itertools", + "num-traits", + "once_cell", + "oorandom", + "plotters", + 
"rayon", + "regex", + "serde", + "serde_derive", + "serde_json", + "tinytemplate", + "walkdir", +] + +[[package]] +name = "criterion-plot" +version = "0.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6b50826342786a51a89e2da3a28f1c32b06e387201bc2d19791f622c673706b1" +dependencies = [ + "cast", + "itertools", +] + +[[package]] +name = "crossbeam-deque" +version = "0.8.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9dd111b7b7f7d55b72c0a6ae361660ee5853c9af73f70c3c2ef6858b950e2e51" +dependencies = [ + "crossbeam-epoch", + "crossbeam-utils", +] + +[[package]] +name = "crossbeam-epoch" +version = "0.9.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5b82ac4a3c2ca9c3460964f020e1402edd5753411d7737aa39c3714ad1b5420e" +dependencies = [ + "crossbeam-utils", +] + +[[package]] +name = "crossbeam-utils" +version = "0.8.21" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d0a5c400df2834b80a4c3327b3aad3a4c4cd4de0629063962b03235697506a28" + +[[package]] +name = "crunchy" +version = "0.2.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "460fbee9c2c2f33933d720630a6a0bac33ba7053db5344fac858d4b8952d77d5" + [[package]] name = "crypto-common" version = "0.1.7" @@ -46,6 +204,36 @@ dependencies = [ "crypto-common", ] +[[package]] +name = "either" +version = "1.16.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "91622ff5e7162018101f2fea40d6ebf4a78bbe5a49736a2020649edf9693679e" + +[[package]] +name = "futures-core" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7e3450815272ef58cec6d564423f6e755e25379b217b0bc688e295ba24df6b1d" + +[[package]] +name = "futures-task" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "037711b3d59c33004d3856fbdc83b99d4ff37a24768fa1be9ce3538a1cde4393" + +[[package]] 
+name = "futures-util" +version = "0.3.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "389ca41296e6190b48053de0321d02a77f32f8a5d2461dd38762c0593805c6d6" +dependencies = [ + "futures-core", + "futures-task", + "pin-project-lite", + "slab", +] + [[package]] name = "generic-array" version = "0.14.7" @@ -56,6 +244,60 @@ dependencies = [ "version_check", ] +[[package]] +name = "half" +version = "2.7.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6ea2d84b969582b4b1864a92dc5d27cd2b77b622a8d79306834f1be5ba20d84b" +dependencies = [ + "cfg-if", + "crunchy", + "zerocopy", +] + +[[package]] +name = "hermit-abi" +version = "0.5.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "fc0fef456e4baa96da950455cd02c081ca953b141298e41db3fc7e36b1da849c" + +[[package]] +name = "is-terminal" +version = "0.4.17" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3640c1c38b8e4e43584d8df18be5fc6b0aa314ce6ebf51b53313d4306cca8e46" +dependencies = [ + "hermit-abi", + "libc", + "windows-sys", +] + +[[package]] +name = "itertools" +version = "0.10.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b0fd2260e829bddf4cb6ea802289de2f86d6a7a690192fbe91b3f46e0f2c8473" +dependencies = [ + "either", +] + +[[package]] +name = "itoa" +version = "1.0.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8f42a60cbdf9a97f5d2305f08a87dc4e09308d1276d28c869c684d7777685682" + +[[package]] +name = "js-sys" +version = "0.3.100" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f2025f20d7a4fa7785846e7b63d10a76d3f1cee98ee5cb79ea59703f95e42162" +dependencies = [ + "cfg-if", + "futures-util", + "wasm-bindgen", +] + [[package]] name = "libc" version = "0.2.182" @@ -68,6 +310,192 @@ version = "0.2.16" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = 
"b6d2cec3eae94f9f509c767b45932f1ada8350c4bdb85af2fcab4a3c14807981" +[[package]] +name = "memchr" +version = "2.8.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "88904434abc2901f197fe8cc55f0445e7ded921dba5911dad2e2b39b48e663c4" + +[[package]] +name = "num-traits" +version = "0.2.19" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "071dfc062690e90b734c0b2273ce72ad0ffa95f0c74596bc250dcfd960262841" +dependencies = [ + "autocfg", +] + +[[package]] +name = "once_cell" +version = "1.21.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9f7c3e4beb33f85d45ae3e3a1792185706c8e16d043238c593331cc7cd313b50" + +[[package]] +name = "oorandom" +version = "11.1.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d6790f58c7ff633d8771f42965289203411a5e5c68388703c06e14f24770b41e" + +[[package]] +name = "pin-project-lite" +version = "0.2.17" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a89322df9ebe1c1578d689c92318e070967d1042b512afbe49518723f4e6d5cd" + +[[package]] +name = "plotters" +version = "0.3.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5aeb6f403d7a4911efb1e33402027fc44f29b5bf6def3effcc22d7bb75f2b747" +dependencies = [ + "num-traits", + "plotters-backend", + "plotters-svg", + "wasm-bindgen", + "web-sys", +] + +[[package]] +name = "plotters-backend" +version = "0.3.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "df42e13c12958a16b3f7f4386b9ab1f3e7933914ecea48da7139435263a4172a" + +[[package]] +name = "plotters-svg" +version = "0.3.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "51bae2ac328883f7acdfea3d66a7c35751187f870bc81f94563733a154d7a670" +dependencies = [ + "plotters-backend", +] + +[[package]] +name = "proc-macro2" +version = "1.0.106" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"8fd00f0bb2e90d81d1044c2b32617f68fcb9fa3bb7640c23e9c748e53fb30934" +dependencies = [ + "unicode-ident", +] + +[[package]] +name = "quote" +version = "1.0.45" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "41f2619966050689382d2b44f664f4bc593e129785a36d6ee376ddf37259b924" +dependencies = [ + "proc-macro2", +] + +[[package]] +name = "rayon" +version = "1.12.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "fb39b166781f92d482534ef4b4b1b2568f42613b53e5b6c160e24cfbfa30926d" +dependencies = [ + "either", + "rayon-core", +] + +[[package]] +name = "rayon-core" +version = "1.13.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "22e18b0f0062d30d4230b2e85ff77fdfe4326feb054b9783a3460d8435c8ab91" +dependencies = [ + "crossbeam-deque", + "crossbeam-utils", +] + +[[package]] +name = "regex" +version = "1.12.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f1292b7759ae1cb9ec195452d1390a074f0cd8541ab7a5a8c31cd6db45d4a6ba" +dependencies = [ + "aho-corasick", + "memchr", + "regex-automata", + "regex-syntax", +] + +[[package]] +name = "regex-automata" +version = "0.4.14" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6e1dd4122fc1595e8162618945476892eefca7b88c52820e74af6262213cae8f" +dependencies = [ + "aho-corasick", + "memchr", + "regex-syntax", +] + +[[package]] +name = "regex-syntax" +version = "0.8.11" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d6f6ff9a378485b298a5286656da665ba74413d36db0979633275d2e708145d4" + +[[package]] +name = "rustversion" +version = "1.0.22" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b39cdef0fa800fc44525c84ccb54a029961a8215f9619753635a9c0d2538d46d" + +[[package]] +name = "same-file" +version = "1.0.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"93fc1dc3aaa9bfed95e02e6eadabb4baf7e3078b0bd1b4d7b6b0b68378900502" +dependencies = [ + "winapi-util", +] + +[[package]] +name = "serde" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9a8e94ea7f378bd32cbbd37198a4a91436180c5bb472411e48b5ec2e2124ae9e" +dependencies = [ + "serde_core", + "serde_derive", +] + +[[package]] +name = "serde_core" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "41d385c7d4ca58e59fc732af25c3983b67ac852c1a25000afe1175de458b67ad" +dependencies = [ + "serde_derive", +] + +[[package]] +name = "serde_derive" +version = "1.0.228" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d540f220d3187173da220f885ab66608367b6574e925011a9353e4badda91d79" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "serde_json" +version = "1.0.150" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e8014e44b4736ed0538adeecded0fce2a272f22dc9578a7eb6b2d9993c74cfb9" +dependencies = [ + "itoa", + "memchr", + "serde", + "serde_core", + "zmij", +] + [[package]] name = "sha2" version = "0.10.9" @@ -79,22 +507,171 @@ dependencies = [ "digest", ] +[[package]] +name = "slab" +version = "0.4.12" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0c790de23124f9ab44544d7ac05d60440adc586479ce501c1d6d7da3cd8c9cf5" + +[[package]] +name = "syn" +version = "2.0.117" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e665b8803e7b1d2a727f4023456bbbbe74da67099c585258af0ad9c5013b9b99" +dependencies = [ + "proc-macro2", + "quote", + "unicode-ident", +] + +[[package]] +name = "tinytemplate" +version = "1.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "be4d6b5f19ff7664e8c98d03e2139cb510db9b0a60b55f8e8709b689d939b6bc" +dependencies = [ + "serde", + "serde_json", +] + [[package]] name = "typenum" version = "1.19.0" 
source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "562d481066bde0658276a35467c4af00bdc6ee726305698a55b86e61d7ad82bb" +[[package]] +name = "unicode-ident" +version = "1.0.24" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e6e4313cd5fcd3dad5cafa179702e2b244f760991f45397d14d4ebf38247da75" + [[package]] name = "version_check" version = "0.9.5" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a" +[[package]] +name = "walkdir" +version = "2.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "29790946404f91d9c5d06f9874efddea1dc06c5efe94541a7d6863108e3a5e4b" +dependencies = [ + "same-file", + "winapi-util", +] + +[[package]] +name = "wasm-bindgen" +version = "0.2.123" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a254a4b10c19a76f09a27640e7ffbf9bc30bf67e16a3bf28aaefa4920fe81563" +dependencies = [ + "cfg-if", + "once_cell", + "rustversion", + "wasm-bindgen-macro", + "wasm-bindgen-shared", +] + +[[package]] +name = "wasm-bindgen-macro" +version = "0.2.123" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "24a40fc75b0ec6f3746ceb10d36f53a93dcd68a93b11b6445983945d79eba0dc" +dependencies = [ + "quote", + "wasm-bindgen-macro-support", +] + +[[package]] +name = "wasm-bindgen-macro-support" +version = "0.2.123" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "908f34bd9b9ce3d4caf07b72dfab63d61504d156856c6bd3cd87fa350cf3985b" +dependencies = [ + "bumpalo", + "proc-macro2", + "quote", + "syn", + "wasm-bindgen-shared", +] + +[[package]] +name = "wasm-bindgen-shared" +version = "0.2.123" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7acbf7616c27b194bbb550bf77ed0c2c3e5b7fd1260a93082b95fb7f47959b92" +dependencies = [ + "unicode-ident", +] + +[[package]] +name = "web-sys" +version = 
"0.3.100" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6e0871acf327f283dc6da28a1696cdc64fb355ba9f935d052021fa77f35cce69" +dependencies = [ + "js-sys", + "wasm-bindgen", +] + [[package]] name = "wifi-densepose-wasm-edge" version = "0.3.0" dependencies = [ + "criterion", "libm", "sha2", ] + +[[package]] +name = "winapi-util" +version = "0.1.11" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c2a7b1c03c876122aa43f3020e6c3c3ee5c05081c9a00739faf7503aeba10d22" +dependencies = [ + "windows-sys", +] + +[[package]] +name = "windows-link" +version = "0.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f0805222e57f7521d6a62e36fa9163bc891acd422f971defe97d64e70d0a4fe5" + +[[package]] +name = "windows-sys" +version = "0.61.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ae137229bcbd6cdf0f7b80a31df61766145077ddf49416a728b02cb3921ff3fc" +dependencies = [ + "windows-link", +] + +[[package]] +name = "zerocopy" +version = "0.8.52" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ce1022995ff5ff5d841ad7d994facc23098cd40152f2c1d11cd607c6f530653f" +dependencies = [ + "zerocopy-derive", +] + +[[package]] +name = "zerocopy-derive" +version = "0.8.52" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1ae7f38b72ec2a254e2b87ef277cf2cd4fb97cbebf944faa6f33354da0867930" +dependencies = [ + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "zmij" +version = "1.0.21" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b8848ee67ecc8aedbaf3e4122217aff892639231befc6a1b58d29fff4c2cabaa" diff --git a/v2/crates/wifi-densepose-wasm-edge/Cargo.toml b/v2/crates/wifi-densepose-wasm-edge/Cargo.toml index 2dee4614..f4e7a7b0 100644 --- a/v2/crates/wifi-densepose-wasm-edge/Cargo.toml +++ b/v2/crates/wifi-densepose-wasm-edge/Cargo.toml @@ -11,6 +11,20 @@ categories = ["embedded", 
"wasm", "science"]

 [lib]
 crate-type = ["cdylib", "rlib"]
+# The lib's libtest harness does not understand criterion CLI flags
+# (`--warm-up-time` etc.), so exclude it from `cargo bench` — only the criterion
+# bench target below should receive bench args (ADR-163).
+bench = false
+
+# ADR-163: host-measured process_frame latency benches (closes the ADR-160
+# "criterion benches for process_frame budget claims" deferred item — HOST only;
+# the ESP32-S3 WASM3 budget remains unmeasured, see the bench header).
+# `std` is required (criterion is a host crate); the crate is workspace-EXCLUDED
+# so run from the crate dir: `cargo bench --features std`.
+[[bench]]
+name = "process_frame_bench"
+harness = false
+required-features = ["std"]

 [dependencies]
 # no_std math
@@ -18,6 +32,11 @@ libm = "0.2"
 # SHA-256 for RVF build hash (optional, used by builder)
 sha2 = { version = "0.10", optional = true, default-features = false }

+[dev-dependencies]
+# Host-only latency regression benches (ADR-163). Pinned to match the rest of
+# the workspace's bench crates.
+criterion = { version = "0.5", features = ["html_reports"] }
+
 [features]
 default = ["default-pipeline"]
 # Enable std for testing on host + RVF builder
diff --git a/v2/crates/wifi-densepose-wasm-edge/benches/process_frame_bench.rs b/v2/crates/wifi-densepose-wasm-edge/benches/process_frame_bench.rs
new file mode 100644
index 00000000..1d15781f
--- /dev/null
+++ b/v2/crates/wifi-densepose-wasm-edge/benches/process_frame_bench.rs
@@ -0,0 +1,259 @@
+//! Criterion benches for the heaviest `process_frame` hot paths in the edge
+//! skill library (ADR-163, closing the ADR-160 §"Deferred Backlog" item
+//! "Criterion benches for process_frame budget claims").
+//!
+//! ## HONEST SCOPE — read this before citing any number here
+//!
+//! These benches measure **HOST** wall-clock latency on a development laptop.
+//! The per-module doc budgets (e.g. `exo_time_crystal` "H (heavy, <10ms) on
+//! ESP32-S3 WASM3") are **for a different target**: an Xtensa ESP32-S3 running
+//! the WASM3 interpreter. A native x86_64 host with `-O` is an **optimistic
+//! proxy for the ALGORITHM cost only**; it is NOT the ESP32 number and does NOT
+//! reproduce the ESP32 budget. WASM3 interpretation on a ~240 MHz Xtensa core is
+//! typically 1-2 orders of magnitude slower than native host code, so a host
+//! median well under the budget does NOT prove the ESP32 meets it (it is only a
+//! lower bound on ESP32 latency). The ESP32 figure is UNMEASURED (needs hardware).
+//!
+//! What these benches DO prove (MEASURED-on-host):
+//! * the hot paths run, on a fixed synthetic CSI frame, with a real median;
+//! * a regression guard exists, so a change that makes the host cost 10×
+//!   worse is caught in CI/dev before anyone reflashes an ESP32.
+//!
+//! Run (the crate is EXCLUDED from the v2 workspace — bench from the crate dir):
+//!   cd v2/crates/wifi-densepose-wasm-edge
+//!   cargo bench --features std
+//!   # quick smoke:
+//!   cargo bench --features std -- --warm-up-time 1 --measurement-time 2
+//!
+//! `med_seizure_detect` is gated behind `medical-experimental`; its bench is
+//! `#[cfg(feature = "medical-experimental")]` and only runs when that feature is
+//! also enabled:
+//!   cargo bench --features std,medical-experimental
+
+use criterion::{criterion_group, criterion_main, BatchSize, Criterion};
+use std::hint::black_box;
+
+use wifi_densepose_wasm_edge::exo_ghost_hunter::GhostHunterDetector;
+use wifi_densepose_wasm_edge::exo_time_crystal::TimeCrystalDetector;
+use wifi_densepose_wasm_edge::sec_weapon_detect::WeaponDetector;
+
+// ── Fixed synthetic CSI fixtures (deterministic LCG, seed-stable) ────────────
+
+/// Deterministic pseudo-random `f32` in [0, 2): the LCG's upper 16 bits / 32768,
+/// matching the generator style used by `tests/budget_compliance.rs`.
+fn lcg(seed: &mut u32) -> f32 {
+    *seed = seed.wrapping_mul(1103515245).wrapping_add(12345);
+    (*seed >> 16) as f32 / 32768.0
+}
+
+fn synthetic_phases(n: usize, seed: u32) -> Vec<f32> {
+    let mut s = seed;
+    (0..n).map(|_| lcg(&mut s) * 6.2832 - 3.1416).collect()
+}
+
+fn synthetic_amplitudes(n: usize, seed: u32) -> Vec<f32> {
+    let mut s = seed;
+    (0..n).map(|_| lcg(&mut s) * 10.0 + 0.1).collect()
+}
+
+fn synthetic_variance(n: usize, seed: u32) -> Vec<f32> {
+    let mut s = seed;
+    (0..n).map(|_| lcg(&mut s) * 2.0 + 0.05).collect()
+}
+
+const N_SC: usize = 32; // per-subcarrier width (matches both modules' MAX_SC)
+
+// ── exo_time_crystal: compute_autocorrelation 256×128 hot path ───────────────
+//
+// `compute_autocorrelation` is private, so we drive it through the public
+// `process_frame`. To hit the full 256-point × 128-lag autocorrelation the
+// circular buffer must be FULL (≥256 samples) and the signal must be
+// non-constant (the module early-outs on `buf_var < 1e-8`). An untimed setup
+// pre-fills a periodic-plus-noise motion-energy stream, then we time a single
+// `process_frame` (each call recomputes the full 256×128 autocorrelation =
+// ~32K multiply-accumulates, the M6-audit-named hot path).
+
+fn prefilled_time_crystal() -> TimeCrystalDetector {
+    let mut d = TimeCrystalDetector::new();
+    let mut s = 0xC0FFEEu32;
+    // 300 frames (> BUF_LEN=256) so the buffer is full and statistics are warm.
+    for i in 0..300 {
+        // period-10 square wave + small noise → guarantees buf_var > 0 and a
+        // genuine autocorrelation structure (the expensive path runs).
+        let base = if (i % 10) < 5 { 1.0 } else { 0.0 };
+        let me = base + lcg(&mut s) * 0.05;
+        black_box(d.process_frame(black_box(me)));
+    }
+    d
+}
+
+fn bench_exo_time_crystal(c: &mut Criterion) {
+    c.bench_function("exo_time_crystal::process_frame[autocorr_256x128]", |b| {
+        let mut s = 0x1357_9BDFu32;
+        b.iter_batched(
+            prefilled_time_crystal,
+            |mut d| {
+                // One frame = one full 256×128 autocorrelation pass.
+                let me = if (d.frame_count() % 10) < 5 { 1.0 } else { 0.0 } + lcg(&mut s) * 0.05;
+                black_box(d.process_frame(black_box(me)));
+            },
+            BatchSize::SmallInput,
+        );
+    });
+}
+
+// ── exo_ghost_hunter: periodicity + hidden-breathing hot path ────────────────
+//
+// Heaviest path runs only when the room is reported EMPTY (presence == 0):
+// per-group anomaly accumulation + aggregate-phase autocorrelation for hidden
+// periodic (breathing) signatures. We warm the noise floor + phase buffer first,
+// then bench one empty-room frame.
+
+fn prefilled_ghost_hunter() -> GhostHunterDetector {
+    let mut d = GhostHunterDetector::new();
+    let mut s = 0xBADC0DEu32;
+    // Warm the per-group EWMA noise floors + fill the phase buffer (PHASE_BUF_LEN=64)
+    // with a periodic phase signal so the periodicity autocorrelation has structure.
+    for i in 0..120u32 {
+        let phases: Vec<f32> = (0..N_SC)
+            .map(|k| libm::sinf(i as f32 * 0.4 + k as f32 * 0.1) * 0.3 + lcg(&mut s) * 0.02)
+            .collect();
+        let amps = synthetic_amplitudes(N_SC, 4000 + i);
+        let var = synthetic_variance(N_SC, 4500 + i);
+        black_box(d.process_frame(&phases, &amps, &var, 0, 0.05));
+    }
+    d
+}
+
+fn bench_exo_ghost_hunter(c: &mut Criterion) {
+    let amps = synthetic_amplitudes(N_SC, 9000);
+    let var = synthetic_variance(N_SC, 9500);
+    c.bench_function("exo_ghost_hunter::process_frame[empty_room_periodicity]", |b| {
+        let mut s = 0x2468_ACE0u32;
+        b.iter_batched(
+            prefilled_ghost_hunter,
+            |mut d| {
+                let i = d.frame_count();
+                let phases: Vec<f32> = (0..N_SC)
+                    .map(|k| libm::sinf(i as f32 * 0.4 + k as f32 * 0.1) * 0.3 + lcg(&mut s) * 0.02)
+                    .collect();
+                black_box(d.process_frame(
+                    black_box(&phases),
+                    black_box(&amps),
+                    black_box(&var),
+                    black_box(0),
+                    black_box(0.05),
+                ));
+            },
+            BatchSize::SmallInput,
+        );
+    });
+}
+
+// ── sec_weapon_detect: per-subcarrier Welford hot path ───────────────────────
+//
+// After calibration the detector runs a per-subcarrier online Welford update
+// over MAX_SC=32 subcarriers each frame (the M6-audit-named hot path). We
+// calibrate first (the early frames just accumulate baseline stats), then bench
+// one steady-state frame.
+
+fn calibrated_weapon_detector() -> WeaponDetector {
+    let mut d = WeaponDetector::new();
+    // Drive enough empty-room frames to complete calibration + warm the running
+    // Welford state. Calibration window is internal; 200 frames is comfortably
+    // past it for MAX_SC=32.
+    for i in 0..200u32 {
+        let phases = synthetic_phases(N_SC, 6000 + i);
+        let amps = synthetic_amplitudes(N_SC, 6500 + i);
+        let var = synthetic_variance(N_SC, 7000 + i);
+        black_box(d.process_frame(&phases, &amps, &var, 0.05, 0));
+    }
+    d
+}
+
+fn bench_sec_weapon_detect(c: &mut Criterion) {
+    c.bench_function("sec_weapon_detect::process_frame[per_sc_welford]", |b| {
+        let mut seed = 8000u32;
+        b.iter_batched(
+            calibrated_weapon_detector,
+            |mut d| {
+                seed = seed.wrapping_add(1);
+                let phases = synthetic_phases(N_SC, seed);
+                let amps = synthetic_amplitudes(N_SC, seed.wrapping_add(500));
+                let var = synthetic_variance(N_SC, seed.wrapping_add(1000));
+                black_box(d.process_frame(
+                    black_box(&phases),
+                    black_box(&amps),
+                    black_box(&var),
+                    black_box(0.3),
+                    black_box(1),
+                ));
+            },
+            BatchSize::SmallInput,
+        );
+    });
+}
+
+// ── med_seizure_detect: detect_rhythm / clonic autocorrelation hot path ──────
+//
+// Gated behind `medical-experimental` (ADR-160 §A1). The clonic-phase rhythm
+// detection autocorrelates the amplitude ring buffer (PHASE_WINDOW=100); we warm
+// the buffers with a high-energy rhythmic signal, then bench one frame.
+#[cfg(feature = "medical-experimental")]
+mod med {
+    use super::*;
+    use wifi_densepose_wasm_edge::med_seizure_detect::SeizureDetector;
+
+    fn warmed_seizure_detector() -> SeizureDetector {
+        let mut d = SeizureDetector::new();
+        let mut s = 0x5EE_D00Du32;
+        // High-energy ~4 Hz rhythmic (period ~5 frames at 20 Hz) → exercises the
+        // clonic-phase rhythm/autocorrelation path, with presence asserted.
+        for i in 0..150u32 {
+            let me = 2.5 + libm::sinf(i as f32 * 1.25) * 1.5;
+            let amp = 1.0 + lcg(&mut s) * 0.2;
+            black_box(d.process_frame(0.0, amp, me, 1));
+        }
+        d
+    }
+
+    pub fn bench_med_seizure_detect(c: &mut Criterion) {
+        c.bench_function("med_seizure_detect::process_frame[clonic_rhythm]", |b| {
+            let mut s = 0x9A_BCDE_F0u32;
+            b.iter_batched(
+                warmed_seizure_detector,
+                |mut d| {
+                    let i = d.frame_count();
+                    let me = 2.5 + libm::sinf(i as f32 * 1.25) * 1.5;
+                    let amp = 1.0 + lcg(&mut s) * 0.2;
+                    black_box(d.process_frame(
+                        black_box(0.0),
+                        black_box(amp),
+                        black_box(me),
+                        black_box(1),
+                    ));
+                },
+                BatchSize::SmallInput,
+            );
+        });
+    }
+}
+
+#[cfg(feature = "medical-experimental")]
+criterion_group!(
+    benches,
+    bench_exo_time_crystal,
+    bench_exo_ghost_hunter,
+    bench_sec_weapon_detect,
+    med::bench_med_seizure_detect,
+);
+
+#[cfg(not(feature = "medical-experimental"))]
+criterion_group!(
+    benches,
+    bench_exo_time_crystal,
+    bench_exo_ghost_hunter,
+    bench_sec_weapon_detect,
+);
+
+criterion_main!(benches);
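The `exo_time_crystal` bench comment sizes its hot path as a 256-point × 128-lag autocorrelation, about 32K multiply-accumulates per frame. The sketch below makes that arithmetic concrete. It is a hypothetical stand-alone circular, mean-removed autocorrelation and NOT the module's private `compute_autocorrelation`. `BUF_LEN`, `MAX_LAG`, the variance normalization and the circular indexing are all assumptions. It counts MACs and includes the constant-signal early-out the comment mentions.

```rust
// Hypothetical sketch, NOT `exo_time_crystal`'s private `compute_autocorrelation`:
// a circular, mean-removed autocorrelation sized like the bench comment
// (256-sample buffer x 128 lags) that also counts multiply-accumulates.
const BUF_LEN: usize = 256; // assumed buffer length (bench comment: BUF_LEN=256)
const MAX_LAG: usize = 128; // assumed number of lags (bench comment: 128)

/// Returns (normalized autocorrelation for lags 1..=MAX_LAG, MAC count).
/// `head` is the index of the oldest sample in the circular buffer.
fn autocorrelation(buf: &[f32; BUF_LEN], head: usize) -> (Vec<f32>, usize) {
    let n = BUF_LEN as f32;
    let mean = buf.iter().sum::<f32>() / n;
    let var = buf.iter().map(|x| (x - mean) * (x - mean)).sum::<f32>() / n;
    let mut r = vec![0.0f32; MAX_LAG];
    let mut macs = 0usize;
    // Constant signal: nothing to correlate (mirrors the `buf_var < 1e-8` early-out).
    if var < 1e-8 {
        return (r, macs);
    }
    for lag in 1..=MAX_LAG {
        let mut acc = 0.0f32;
        for i in 0..BUF_LEN {
            // Circular indexing: every lag uses all BUF_LEN products.
            let a = buf[(head + i) % BUF_LEN] - mean;
            let b = buf[(head + i + lag) % BUF_LEN] - mean;
            acc += a * b;
            macs += 1;
        }
        r[lag - 1] = acc / (n * var); // normalized so r stays within [-1, 1]
    }
    (r, macs)
}

fn main() {
    // Period-10 square wave, as in `prefilled_time_crystal` (without the noise).
    let mut buf = [0.0f32; BUF_LEN];
    for (i, x) in buf.iter_mut().enumerate() {
        *x = if i % 10 < 5 { 1.0 } else { 0.0 };
    }
    let (r, macs) = autocorrelation(&buf, 0);
    println!("MACs per frame: {macs}"); // 256 * 128 = 32768 (~32K)
    println!("r(5) = {:.3}, r(10) = {:.3}", r[4], r[9]);
}
```

For the square wave, the lag-10 coefficient is strongly positive and the lag-5 coefficient is strongly negative. A linear (non-wrapping) version would use `BUF_LEN - lag` products per lag, 24,512 MACs in total. The ~32K figure therefore matches the full-window form. This sketch does not show whether the real module wraps or truncates.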
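The `sec_weapon_detect` comment describes its steady-state hot path as a per-subcarrier online Welford update over 32 subcarriers. Below is a minimal sketch of that technique under stated assumptions. The `PerScWelford` type, its field names and the sample (n - 1) variance are hypothetical and are not taken from `WeaponDetector`. The sketch shows why each frame costs O(MAX_SC) work with constant memory and no stored history.

```rust
// Hypothetical sketch of a per-subcarrier online (Welford) mean/variance
// tracker; names and layout are illustrative, NOT `WeaponDetector`'s internals.
const MAX_SC: usize = 32; // subcarrier width named in the bench comment

struct PerScWelford {
    n: u32,              // frames seen
    mean: [f32; MAX_SC], // running mean per subcarrier
    m2: [f32; MAX_SC],   // running sum of squared deviations per subcarrier
}

impl PerScWelford {
    fn new() -> Self {
        Self { n: 0, mean: [0.0; MAX_SC], m2: [0.0; MAX_SC] }
    }

    /// One frame: MAX_SC constant-time updates, no history stored.
    fn update(&mut self, frame: &[f32; MAX_SC]) {
        self.n += 1;
        let n = self.n as f32;
        for (k, &x) in frame.iter().enumerate() {
            let delta = x - self.mean[k];
            self.mean[k] += delta / n;
            // Second factor uses the *updated* mean (Welford's recurrence).
            self.m2[k] += delta * (x - self.mean[k]);
        }
    }

    /// Sample variance of subcarrier `k`; `None` until two frames are seen.
    fn variance(&self, k: usize) -> Option<f32> {
        (self.n >= 2).then(|| self.m2[k] / (self.n - 1) as f32)
    }
}

fn main() {
    let mut w = PerScWelford::new();
    // Subcarrier k sees the values 1..=5 offset by k: mean 3 + k, variance 2.5.
    for v in 1..=5 {
        let frame: [f32; MAX_SC] = std::array::from_fn(|k| v as f32 + k as f32);
        w.update(&frame);
    }
    println!("sc0 mean = {}, var = {:?}", w.mean[0], w.variance(0));
}
```

Each update touches only `mean[k]` and `m2[k]`, so the per-frame cost in this sketch scales with the subcarrier count alone. That makes a single steady-state frame a reasonable unit to time.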