diff --git a/api-docs/adr/ADR-152-wifi-pose-sota-2026-intake.md b/api-docs/adr/ADR-152-wifi-pose-sota-2026-intake.md index 139d7f78..a22eefee 100644 --- a/api-docs/adr/ADR-152-wifi-pose-sota-2026-intake.md +++ b/api-docs/adr/ADR-152-wifi-pose-sota-2026-intake.md @@ -47,13 +47,16 @@ Adopt four changes, ordered by effort-vs-gain: 1. **Record transceiver geometry at enrollment.** `EnrollmentProtocol` gains an optional `NodeGeometry` record per node (position estimate, antenna orientation, inter-node distances where known). Stored alongside the room baseline in the bank; schema-versioned so existing banks remain readable. 2. **Fuse geometry embeddings into specialist training.** Where a specialist head consumes the (future, ADR-150) backbone embedding, concatenate a small learned embedding of `NodeGeometry` — the PerceptAlign mechanism, transplanted to our per-room banks. Statistical specialists (current) ignore it; LoRA heads (ADR-151 P6) consume it. -3. **Adopt the two-checkerboard alignment for the camera-supervised path (ADR-079).** When MediaPipe supervision is used, calibrate camera↔WiFi into one shared 3D frame before regression (<5 min, two checkerboards, a few photos). This is the direct defense against F1 for our 92.9%-PCK@20 pipeline. +3. **Adopt the two-checkerboard alignment for the camera-supervised path (ADR-079).** When MediaPipe supervision is used, calibrate camera↔WiFi into one shared 3D frame before regression (<5 min, two checkerboards, a few photos). This is the direct defense against F1 for our camera-supervised pipeline. ~~92.9%-PCK@20~~ — *that figure was retracted during measurement (b) (2026-06-10): the surviving holdout shows a constant-output model under an absolute (non-torso) threshold on 69 near-static frames; mean predictor scores 100% under the same protocol. The §2.2 no-citation rule now applies to it.* 4. 
**Evaluate on the PerceptAlign cross-domain dataset** (21 subjects / 7 layouts) as the MERIDIAN cross-layout benchmark — *gated on confirming its license and downloadability* (open question; repo per paper: github.com/Trymore-lab/PerceptAlign). + > **Gate resolved (2026-06-10, MEASURED by repo inspection):** repo exists, **MIT license**, dataset downloadable from HuggingFace (5 per-scene repos, raw CSI + separate vision keypoints; Intel 5300, 1TX×3RX×3 ant, 57 subcarriers — same order as ESP32 subcarrier counts; Scene3 ships 3 distinct layouts). Code present, no pretrained weights. Benchmark adoption unblocked; dataset-side license terms inherit HF dataset terms (not separately stated — check at download time). ### 2.2 Benchmark against WiFlow-STD (DY2434) — ACCEPTED Pull the Apache-2.0 weights + 360k-sample dataset; run three measurements: (a) their model on their data (reproduce 97.25% claim), (b) their model fine-tuned on our ESP32 17-keypoint eval set, (c) our internal WiFlow on their dataset (15-keypoint subset mapping). Until (a)–(c) are measured, **no RuView doc may cite 97.25% as a comparable number** — different dataset, subjects, keypoints. +> **Status (2026-06-10, measurement (a) complete — `benchmarks/wiflow-std/RESULTS.md`):** shipped checkpoint REFUTED (0.08% PCK@20 — wrong keypoint normalization, predates published code); released code does not run as published (6 defects, incl. broken package import and an unreachable test phase); released dataset's last 13 files are corrupted (9,072 windows: NaN + float32-max garbage, diverges fp16 training via BatchNorm poisoning). After repairing both, retraining with upstream defaults reproduced **96.09% PCK@20 full-test / 96.61% corruption-free / MPJPE 0.0094–0.0098** (published: 97.25% / 0.007) on an RTX 5080. Accuracy claims graded MEASURED-EQUIVALENT; params (2.23M) and FLOPs (~0.055G) verified. (b)/(c) remain open. 
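The retraction note in §2.1 and the PCK@20 figures in §2.2 both depend on which threshold protocol is used. As a minimal sketch (not the project's `eval-wiflow.js` logic), the following compares an absolute threshold with a torso-normalized one and scores a constant mean-pose predictor under both. The keypoint indices, the 0.2 threshold in raw normalized coordinates, and the toy data are assumptions made for illustration only.

```python
import math
import statistics

# Hypothetical COCO-style indices for the torso diagonal (assumption, not from the ADR).
LEFT_SHOULDER, RIGHT_HIP = 5, 12


def torso_length(pose):
    """Per-frame scale: distance from left shoulder to right hip."""
    return math.dist(pose[LEFT_SHOULDER], pose[RIGHT_HIP])


def pck(preds, gts, alpha=0.2, torso_normalized=True):
    """Fraction of keypoints whose error is within the threshold.

    torso_normalized=True  -> threshold = alpha * torso length of the ground-truth frame
    torso_normalized=False -> threshold = alpha in raw coordinate units (absolute)
    """
    hits = total = 0
    for pred, gt in zip(preds, gts):
        threshold = alpha * torso_length(gt) if torso_normalized else alpha
        for p, g in zip(pred, gt):
            hits += math.dist(p, g) <= threshold
            total += 1
    return hits / total


def mean_pose(gts):
    """Constant predictor: per-keypoint average over all ground-truth frames."""
    n = len(gts)
    return [tuple(sum(frame[k][d] for frame in gts) / n for d in range(2))
            for k in range(len(gts[0]))]


def prediction_spread(preds):
    """Largest per-coordinate std across frames; 0.0 means a constant-output model."""
    return max(statistics.pstdev(frame[k][d] for frame in preds)
               for k in range(len(preds[0])) for d in range(2))


# Toy near-static data: one 17-keypoint pose shifted slightly left and right.
base = [(0.40 + 0.01 * k, 0.20 + 0.02 * k) for k in range(17)]
gts = [[(x + dx, y) for x, y in base] for dx in (-0.05, 0.0, 0.05)]
constant = [mean_pose(gts)] * len(gts)

abs_score = pck(constant, gts, torso_normalized=False)
torso_score = pck(constant, gts, torso_normalized=True)
print(f"absolute PCK@20: {abs_score:.2%}, torso-normalized PCK@20: {torso_score:.2%}")
print(f"prediction spread: {prediction_spread(constant)}")
```

On this toy data the constant predictor passes every keypoint under the absolute threshold but only about a third under the torso-normalized one, and its prediction spread is zero. That is the pattern the retraction describes, which is why a mean-pose baseline and a spread check are worth reporting next to any PCK figure.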
+ ### 2.3 Apply the UNSW recipe to the ADR-150 encoder — ACCEPTED (amends ADR-150 §2.3) - Pretraining corpus: start from the same 14 public datasets (1.3M samples) + our home/MM-Fi frames; data aggregation takes priority over architecture work. @@ -62,7 +65,7 @@ Pull the Apache-2.0 weights + 360k-sample dataset; run three measurements: (a) t ### 2.4 Hardware watch items — ACCEPTED (no code now) -- **802.11bf**: track silicon/certification; revisit when any commodity chipset exposes standardized sensing measurements. Our opportunistic CSI extraction remains the mechanism until then. +- **802.11bf**: track silicon/certification; OTA binding remains deferred until commodity chipsets expose standardized sensing measurements. **Amended by ADR-153** (2026-06-10): implement a pure Rust forward-compatibility protocol layer now — typed procedure models, a deterministic session FSM, a transport abstraction, simulation tests, and an `OpportunisticCsiBridge` that maps today's ESP32 CSI batches into standardized sensing-report shape. - **esp_wifi_sensing**: benchmark our presence pipeline against the vendor FSM (one afternoon; useful external baseline). Do **not** treat as drop-in (refuted claim). - **ZTECSITool AP**: optional high-resolution anchor node for the ADR-029 multistatic mesh — procurement-gated; only pursue if a 160 MHz anchor materially helps tomography. @@ -71,6 +74,29 @@ Pull the Apache-2.0 weights + 360k-sample dataset; run three measurements: (a) t - No pivot toward "wireless foundation model" papers that don't ship WiFi-CSI artifacts (HeterCSI, FMCW pilot, surveys). - No DensePose-UV work item: the field has not demonstrated UV regression from commodity WiFi; keypoints remain our supervised target (F5). 
+### 2.6 RuVector vendor sync + integration opportunities (added 2026-06-10) + +**Vendor sync record.** `vendor/ruvector` moved from pin `e38347601` (2026-05-07) to `a083bd77f` (origin/main, 3 commits past tag `ruvector-v0.2.28`; vendored workspace version 2.2.3). 111 commits in the range, roughly half NAPI-binary/lint chores. Substantive: graph condensation + differentiable min-cut (#547), core HNSW correctness fixes v2.2.3 (#502), RUSTSEC/clippy hardening (#504), ONNX embedder API-contract fix (#523/#525 — npm/TypeScript package only), dead parallel-worker import removal (#532). *Evidence: MEASURED (git range + commit-stat inspection).* + +**Opportunity table.** Workspace policy is crates.io versions only, so unpublished crates are WATCH by definition regardless of fit. + +| Crate | What it offers | wifi-densepose target | crates.io | Verdict | +|---|---|---|---|---| +| `ruvector-graph-condense` (new, #547) | Training-free min-cut graph condensation + **differentiable normalized-cut loss** (`DiffCutCondenser`, analytic MinCutPool-style gradients, gradient-checked tests; provenance-retaining super-nodes) | `subcarrier_selection.rs` (condense 114 subcarriers into cut-preserving regions instead of raw min-cut); auxiliary clustering regularizer for `wifi-densepose-train`; `DynamicPersonMatcher` region structure | **Not published** | **WATCH** — strongest technical fit in the sync; adopt when published. 
README's "no published method uses graph-cut condensation" is CLAIMED; the diffcut implementation + tests are MEASURED | +| `ruvector-attention` 2.1.0 | #304 SOTA modules: MLA, KV-cache, SSM, sparse/MoE, hybrid search, Graph RAG (publish date 2026-03-27 matches the #304 commit — MEASURED) | Supersedes pinned 2.0.4 used by `model.rs` spatial attention + `bvp.rs`; SSM/MLA are candidate pure-Rust edge-inference primitives for the ADR-150 encoder | 2.1.0 (pinned **2.0.4**) | **ADOPT** (minor bump; API-compat check first) | +| `ruvector-gnn` 2.2.0 | panic→`Result` constructors, gradient clipping, MSE/CE/BCE losses, seeded-RNG layer init (#495 is post-2.2.0) | `wifi-densepose-train` GNN path (pinned 2.0.5, `default-features = false`) | 2.2.0 (pinned **2.0.5**) | **ADOPT** (bump) | +| `ruvector-mincut` / `ruvector-solver` 2.0.6 | Patch-level fixes (workspace republish 2026-03-25) | `metrics.rs` DynamicPersonMatcher, subcarrier interpolation, triangulation | 2.0.6 (pinned **2.0.4** each) | **ADOPT** (routine patch bump) | +| `ruvector-core` 2.2.3 (vendor) | HNSW correctness: k=0 guard, sorted results, flat-index fixes, cross-integration helpers (#502 — MEASURED, `index/hnsw.rs` + new integration tests) | `homecore-recorder` `RuvectorSemanticIndex` (real HNSW consumer); `sketch.rs` quantization unaffected | **2.2.0 = latest published**; 2.2.3 unpublished | **WATCH** — bump the moment 2.2.3 publishes | +| `ruvector-cnn` 2.0.6 | Pure-Rust SIMD conv kernels (AVX2/NEON/WASM), MobileNetV3, INT8 quantization, contrastive losses (InfoNCE/triplet, #252) | **Not** the WiFlow-STD training port — `wiflow_std/model.rs` is tch/libtorch (MEASURED). 
Relevant to the *edge inference* path of the trained ~2.2 MB int8 model, and InfoNCE/triplet overlaps AETHER (ADR-024) | 2.0.6 | **EVALUATE** — only if/when we commit to a no-libtorch edge runtime for WiFlow-STD-class models | +| `ruvector-acorn` (new-ish) | ACORN predicate-agnostic filtered HNSW (SIGMOD'24 algorithm; γ·M denser graphs for low-selectivity filters) | Metadata-filtered pattern search over ADR-151 calibration banks — speculative; bank sizes are far below where filtered-ANN recall collapse matters | **Not published** | **WATCH** | +| `ruvector-cluster` 2.0.6 | Distributed sharding, gossip discovery, DAG consensus | No current need; ADR-029 mesh coordination is ESP32-side, not vector-DB-side | 2.0.6 | **WATCH** | +| ONNX embedder fix (#523/#525) | API-contract + packaging fixes in `npm/packages/ruvector` (TypeScript) | None — `wifi-densepose-nn`'s ONNX backend is Rust (ort/tract), untouched by this change (MEASURED: commit touches npm/ only) | n/a | No action | +| `ruvector-perception` (new, #547) | "Physical perception substrate" (hypothesis/topology/witness modules) — agent-perception oriented, not RF | None identified | Not published | WATCH (name-overlap only) | + +**Security note (RUSTSEC #504).** The substantive fixes target `ruvllm`, `ruvector-dag`, `prime-radiant`, `rvagent-*`, and the `ruvector-server` HTTP endpoint (NaN-safe `partial_cmp`, input-validation guards, env-allowlisted exec) — **none of which we pin**. The commit states `cargo audit` returns clean across the workspace. *Evidence: MEASURED (commit message + file list). Conclusion: no pinned version has an outstanding advisory; no urgent bump required.* The NaN-sort hardening is panic-robustness hygiene our pinned 2.0.4-era crates predate, which is one more reason for the routine bumps below. 
+ +**Version-bump recommendations (follow-up PR — no Cargo.toml change in this ADR):** `ruvector-mincut` 2.0.4→2.0.6, `ruvector-solver` 2.0.4→2.0.6, `ruvector-attention` 2.0.4→2.1.0, `ruvector-gnn` 2.0.5→2.2.0. Current: `ruvector-core` 2.2.0, `ruvector-attn-mincut` 2.0.4, `ruvector-temporal-tensor` 2.0.6, `ruvector-crv` 0.1.1 — all at latest published. Nothing in the sync changes §2.1.2 geometry conditioning (our `viewpoint/attention.rs` `GeometricBias` already implements the fusion mechanism) or the ADR-150 MAE recipe (training stays in tch). + ## 3. Consequences **Positive:** the calibration system gains the one mechanism (geometry conditioning) the 2026 literature identifies as the difference between layout-brittle and layout-robust supervised WiFi pose; ADR-150 gets a measured training recipe instead of a guessed one; we acquire two external benchmarks (WiFlow-STD, PerceptAlign dataset) to keep our claims honest. @@ -82,6 +108,7 @@ Pull the Apache-2.0 weights + 360k-sample dataset; run three measurements: (a) t ## 4. Open questions (carried from the research run) 1. Does WiFlow-STD retain accuracy when fine-tuned on ESP32-S3/C6 CSI (fewer subcarriers, lower SNR), scored on our 17-keypoint set? (§2.2 answers this.) + > **Partial answer (MEASURED 2026-06-11, measurement (b) on 2,046 single-room windows — `benchmarks/wiflow-std/RESULTS.md`):** pretrained init shows strong *optimization* transfer (65% PCK@20 vs scratch's 0% collapse under the same budget) but **no feature transfer** (frozen-trunk + linear adapter ≈ 0%). And no run beat the mean-pose baseline (95.9% PCK@20 — single subject, near-static normalized coords), so no CSI→pose capability is citable from this data. A definitive answer needs multi-subject/multi-position data where the mean pose is weak. 2. Is the PerceptAlign dataset downloadable under a usable license, and does the two-checkerboard procedure work with ESP32 transceiver geometry? (§2.1.4 gate.) 3. 
Will esp_wifi_sensing evolve toward 802.11bf compliance, replacing opportunistic CSI extraction? diff --git a/api-docs/adr/ADR-153-ieee-802-11bf-sensing-protocol-layer.md b/api-docs/adr/ADR-153-ieee-802-11bf-sensing-protocol-layer.md new file mode 100644 index 00000000..22a86b98 --- /dev/null +++ b/api-docs/adr/ADR-153-ieee-802-11bf-sensing-protocol-layer.md @@ -0,0 +1,168 @@ +# ADR-153: IEEE 802.11bf-2025 Forward-Compatibility Protocol Model for wifi-densepose-hardware + +- **Status**: accepted +- **Date**: 2026-06-10 +- **Deciders**: ruv +- **Tags**: hardware, protocol, sensing, 802.11bf, forward-compatibility + +## Context + +IEEE 802.11bf-2025 (WLAN Sensing) is an **Active Standard**: board approval +2025-05-28, published 2025-09-26 (verified against the IEEE SA record, +). Its scope modifies the +MAC, HE and EHT PHY service interfaces, plus DMG and EDMG PHYs, for WLAN +sensing in **1–7.125 GHz** and **above 45 GHz** bands, with formal sensing +measurement setup, measurement instance, feedback/reporting, and +sensing-by-proxy (SBP) procedures (ADR-152 F4, evidence grade MEASURED). + +No commodity silicon implements the standard yet — ESP32 parts included. +ADR-152 §2.4 therefore decided "track silicon; no code now", with RuView's +opportunistic CSI extraction remaining the mechanism. That left a gap: when +silicon does land, RuView would have no typed model of the standard's +procedures to bind to, and the integration would start from zero. + +ADR-152 §2.4 originally classified 802.11bf as a hardware watch item with no +implementation work until commodity silicon exposes standardized sensing +measurements. This ADR amends that clause: OTA binding remains deferred, but +a pure Rust protocol model, session FSM, transport seam, and opportunistic +CSI bridge will be implemented now so RuView consumers can target a stable +standardized sensing interface before silicon arrives. 
+ +The user directed (2026-06-10) that this **forward-compatibility protocol +model** — a protocol surface, not a conformance implementation — be built +now. + +## Decision + +Implement an `ieee80211bf` **forward-compatibility protocol model** in +`wifi-densepose-hardware` (pure Rust, no internal deps, simulation-testable, +no OTA path): + +> This module is not a certified 802.11bf implementation. It models the +> public procedure shape needed by RuView and RuvSense, while intentionally +> avoiding OTA frame binding until chipset support and vendor APIs exist. + +1. **`types.rs`** — typed structures for the standard's sensing procedures + (sub-7 GHz focus; DMG stubbed): Sensing Measurement Setup (setup ID, + initiator/responder and transmitter/receiver roles, bandwidth, + periodicity, threshold-based reporting parameters), Sensing Measurement + Instance, Sensing Measurement Report (CSI-variant payload), SBP + request/response, termination. Three future-proofing requirements: + + - **Version gates** — every negotiated surface is tagged with a spec + profile, because vendors will expose partial or renamed capabilities + first: + + ```rust + pub enum SpecProfile { + DraftCompatible, + Ieee80211Bf2025, + VendorExtension(String), + } + ``` + + - **Capability negotiation** — no hardcoded ESP32 assumptions in the + future-silicon path: + + ```rust + pub struct SensingCapabilities { + pub sub_7_ghz: bool, + pub dmg: bool, + pub edmg: bool, + pub csi_report: bool, + pub threshold_reporting: bool, + pub sensing_by_proxy: bool, + pub max_bandwidth_mhz: u16, + pub max_period_ms: u32, + pub max_active_setups: u16, + } + ``` + + - **Privacy and governance fields** — sensing is presence inference, not + just radio telemetry. 
Every `SensingMeasurementSetup` carries policy + metadata (required, not optional), for enterprise, elderly-care, + retail, workplace, and municipal deployments: + + ```rust + pub enum ConsentMode { + LabOnly, + ExplicitConsent, + ManagedEnterprisePolicy, + Disabled, + } + ``` + +2. **`session.rs`** — deterministic event-driven session state machine: + `Idle → SetupNegotiating → Active → Terminating → Idle`, with explicit + rejection paths (unsupported parameters, setup-ID collision) and timeout + handling. +3. **`transport.rs`** — a `SensingTransport` trait abstracting frame + exchange; a `SimTransport` test double; and an `OpportunisticCsiBridge` + adapter mapping today's ESP32 CSI extraction onto the report path + (measurement instances ≈ CSI frame batches), so current hardware sits + behind the standardized interface. **Replaceability benchmark + (acceptance test):** RuvSense must consume either ESP32 opportunistic CSI + or future 802.11bf chipset reports through the same `SensingTransport` + and `SensingMeasurementReport` path, with no consumer-side rewrite — a + future chipset adapter replaces `OpportunisticCsiBridge` without changing + consumers. + +Constraints: input validation at boundaries (typed errors, no panics on +adversarial input), files under 500 lines, all protocol tests runnable +without hardware. 
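The session lifecycle in item 2 can be illustrated with a small transition table. This is a language-neutral sketch written in Python for brevity; the ADR's module is Rust, and the event names, the `InvalidTransition` error, and the choice to send every timeout back to `Idle` are assumptions rather than anything taken from the standard or the planned `session.rs`.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    SETUP_NEGOTIATING = auto()
    ACTIVE = auto()
    TERMINATING = auto()


class Event(Enum):  # hypothetical event names, chosen for this sketch
    SETUP_REQUESTED = auto()
    SETUP_ACCEPTED = auto()
    SETUP_REJECTED = auto()   # e.g. unsupported parameters or setup-ID collision
    TIMEOUT = auto()
    TERMINATE_REQUESTED = auto()
    TERMINATE_CONFIRMED = auto()


class InvalidTransition(Exception):
    """Typed error for an event that is not allowed in the current state."""


# Deterministic table: each (state, event) pair maps to exactly one next state.
TRANSITIONS = {
    (State.IDLE, Event.SETUP_REQUESTED): State.SETUP_NEGOTIATING,
    (State.SETUP_NEGOTIATING, Event.SETUP_ACCEPTED): State.ACTIVE,
    (State.SETUP_NEGOTIATING, Event.SETUP_REJECTED): State.IDLE,
    (State.SETUP_NEGOTIATING, Event.TIMEOUT): State.IDLE,
    (State.ACTIVE, Event.TERMINATE_REQUESTED): State.TERMINATING,
    (State.TERMINATING, Event.TERMINATE_CONFIRMED): State.IDLE,
    (State.TERMINATING, Event.TIMEOUT): State.IDLE,
}


def step(state, event):
    """Return the next state, or raise InvalidTransition without side effects."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise InvalidTransition(f"{event.name} not allowed in {state.name}") from None


def run(events, state=State.IDLE):
    """Fold a sequence of events over the table, starting from Idle."""
    for event in events:
        state = step(state, event)
    return state


happy_path = [Event.SETUP_REQUESTED, Event.SETUP_ACCEPTED,
              Event.TERMINATE_REQUESTED, Event.TERMINATE_CONFIRMED]
print(run(happy_path).name)
```

Keeping the transitions in one table makes the rejection and timeout paths explicit and easy to enumerate in simulation tests, which is the property the constraints above ask for.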
+ +### Acceptance checklist + +| Area | Acceptance test | +| --------------- | -------------------------------------------------------------------- | +| Types | Serde round trip for setup, instance, report, SBP, termination | +| FSM | Idle → setup → active → terminating → idle | +| Rejection | Unsupported bandwidth, invalid period, duplicate setup ID | +| Timeout | Negotiation timeout returns typed error and resets to Idle | +| Threshold | Report emitted only when threshold condition is crossed | +| SBP | Proxy request maps to responder path without direct sensor coupling | +| Bridge | ESP32 CSI batch becomes standardized measurement report | +| Safety | No panics on malformed inputs | +| CI | All protocol tests run without hardware | +| Maintainability | Each file under 500 lines | + +### Non-Goals + +This ADR does not claim IEEE 802.11bf conformance, certification, or OTA +interoperability. It creates a typed protocol compatibility layer so RuView +can consume standardized sensing reports when commodity silicon exposes +them. Vendor-specific frame exchange, firmware hooks, trigger-frame +sounding, and certification test vectors remain future ADRs. + +## Consequences + +### Positive +- RuView can adopt standardized WLAN sensing the day any chipset exposes + 802.11bf measurements — the data model, session FSM, and transport seam + already exist and are tested. +- The `OpportunisticCsiBridge` gives current ESP32 nodes a standardized-shape + interface now, decoupling RuvSense consumers from the extraction mechanism. +- Simulation transport enables protocol-level tests in CI without hardware. +- `SpecProfile` + `SensingCapabilities` give a clean escape hatch for the + partial/renamed vendor capabilities that will certainly arrive first. +- Consent/policy metadata is structural from day one, not retrofitted. 
+ +### Negative +- Code written against a standard with zero silicon risks drift: vendor + implementations may interpret parameters differently; the layer may need + rework at first real binding (drift risk scored 7/10 at acceptance). +- Adds maintenance surface to wifi-densepose-hardware before any + user-visible benefit (maintenance cost scored 3/10 — small without OTA). + +### Neutral +- ADR-152 §2.4's "watch item" remains: revisit when silicon/certification + appears (re-check by 2026-12). This ADR changes only the "no code now" + clause. + +## Links + +- ADR-152 — WiFi-Pose SOTA 2026 Intake (F4, §2.4 — amended by this ADR) +- ADR-028 — ESP32 capability audit (opportunistic CSI extraction baseline) +- ADR-029 — RuvSense multistatic sensing mode (consumer of sensing reports) +- IEEE 802.11bf-2025 — Active Standard, board approval 2025-05-28, published + 2025-09-26: diff --git a/api-docs/readme-details.md b/api-docs/readme-details.md index 281498ba..5b09ade0 100644 --- a/api-docs/readme-details.md +++ b/api-docs/readme-details.md @@ -50,7 +50,7 @@ See [PR #405](https://github.com/ruvnet/RuView/pull/405) for full details. ### What's New in v0.7.0
-Camera Ground-Truth Training — 92.9% PCK@20 +Camera Ground-Truth Training **v0.7.0 adds camera-supervised pose training** using MediaPipe + real ESP32 CSI data: @@ -76,15 +76,20 @@ node scripts/train-wiflow-supervised.js --data data/paired/*.jsonl --scale lite node scripts/eval-wiflow.js --model models/wiflow-real/wiflow-v1.json --data data/paired/*.jsonl ``` -**Result: 92.9% PCK@20** from a 5-minute data collection session with one ESP32-S3 and one webcam. +> **Accuracy retraction (2026-06-10):** the "92.9% PCK@20" figure previously +> shown here is retracted. A forensic recheck of the surviving eval holdout +> (69 samples) found a constant-output model scored with an absolute +> (non-torso-normalized) threshold on nearly-static frames — a protocol under +> which a trivial mean-pose predictor scores 100%. Torso-normalized PCK@20 on +> the same holdout is ~19% (from that degenerate predictor). No measured +> camera-supervised PCK@20 is currently published (CHANGELOG, PR #535). -| Metric | Before (proxy) | After (camera-supervised) | -|--------|----------------|--------------------------| -| PCK@20 | 0% | **92.9%** | -| Eval loss | 0.700 | **0.082** | -| Bone constraint | N/A | **0.008** | -| Training time | N/A | **19 minutes** | -| Model size | N/A | **974 KB** | +| Metric | Camera-supervised run (protocol retracted) | +|--------|--------------------------------------------| +| Eval loss | 0.082 | +| Bone constraint | 0.008 | +| Training time | 19 minutes | +| Model size | 974 KB | Pre-trained model: [HuggingFace ruv/ruview/wiflow-v1](https://huggingface.co/ruv/ruview) @@ -868,7 +873,7 @@ Download a pre-built binary — no build toolchain needed: | Release | What's included | Tag | |---------|-----------------|-----| -| [v0.7.0](https://github.com/ruvnet/RuView/releases/tag/v0.7.0) | **Latest** — Camera-supervised WiFlow model (92.9% PCK@20), ground-truth training pipeline, ruvector optimizations | `v0.7.0` | +| 
[v0.7.0](https://github.com/ruvnet/RuView/releases/tag/v0.7.0) | **Latest** — Camera-supervised WiFlow model (accuracy figure retracted 2026-06-10, see above), ground-truth training pipeline, ruvector optimizations | `v0.7.0` | | [v0.6.0](https://github.com/ruvnet/RuView/releases/tag/v0.6.0-esp32) | [Pre-trained models on HuggingFace](https://huggingface.co/ruv/ruview), 17 sensing apps, 51.6% contrastive improvement, 0.008ms inference | `v0.6.0-esp32` | | [v0.5.5](https://github.com/ruvnet/RuView/releases/tag/v0.5.5-esp32) | SNN + MinCut (#348 fix) + CNN spectrogram + WiFlow + multi-freq mesh + graph transformer | `v0.5.5-esp32` | | [v0.5.4](https://github.com/ruvnet/RuView/releases/tag/v0.5.4-esp32) | Cognitum Seed integration ([ADR-069](docs/adr/ADR-069-cognitum-seed-csi-pipeline.md)), 8-dim feature vectors, RVF store, witness chain, security hardening | `v0.5.4-esp32` | diff --git a/api-docs/user-guide.md b/api-docs/user-guide.md index 1e149d24..52f28e60 100644 --- a/api-docs/user-guide.md +++ b/api-docs/user-guide.md @@ -1747,7 +1747,14 @@ See [ADR-071](adr/ADR-071-ruvllm-training-pipeline.md) and the [pretraining tuto For significantly higher accuracy, use a webcam as a **temporary teacher** during training. The camera captures real 17-keypoint poses via MediaPipe, paired with simultaneous ESP32 CSI data. After training, the camera is no longer needed — the model runs on CSI only. -**Result: 92.9% PCK@20** from a 5-minute collection session. +> **Accuracy note (2026-06-10):** the previously cited "92.9% PCK@20" figure is +> retracted — a forensic recheck of the surviving eval holdout showed it came +> from a constant-output model scored with an absolute (non-torso-normalized) +> threshold on 69 nearly-static frames, a protocol under which a trivial +> mean-pose predictor scores 100%. No measured camera-supervised PCK@20 is +> currently published (see CHANGELOG, PR #535). 
Treat this workflow as a data +> collection mechanism; accuracy claims will follow a ≥35-minute multi-pose +> collection session evaluated with torso-normalized PCK. ### Requirements @@ -1755,50 +1762,103 @@ For significantly higher accuracy, use a webcam as a **temporary teacher** durin - ESP32-S3 node streaming CSI over UDP (port 5005) - A webcam (laptop, USB, or Mac camera via Tailscale) -### Step 1: Capture Camera + CSI Simultaneously +### Step 0: Check your CSI rate and plan the session length + +Window yield is `csi_frames / 20` — **your CSI packet rate sets how long you +must record.** Check it first (10-second probe): + +```bash +python - <<'EOF' +import socket, time +s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM); s.bind(('0.0.0.0', 5005)); s.settimeout(2) +n, t0 = 0, time.time() +while time.time() - t0 < 10: + try: s.recvfrom(4096); n += 1 + except socket.timeout: pass +print(f"{n/10:.1f} Hz -> {n/10*60/20:.0f} windows/min") +EOF +``` + +| CSI rate | Windows/min | Minutes for 2,000 windows (minimum trainable) | +|---|---|---| +| ~13 Hz (idle network) | ~39 | ~52 min | +| ~53 Hz (active self-ping, #985 firmware) | ~160 | ~13 min — record 35–40 min anyway for pose variety | + +A 5-minute session is **not enough to train on** — it produces a few hundred +windows of one pose context, and models trained on it memorize rather than +generalize (this is what invalidated the earlier accuracy figure). + +### Step 1: (Recommended) calibrate camera ↔ room + +The two-checkerboard calibration (ADR-152 §2.1.3) puts labels in a shared 3D +room frame instead of raw camera coordinates, which is the published defense +against layout-brittle "coordinate overfitting" (PerceptAlign, MobiCom'26): + +```bash +python scripts/calibrate-camera-room.py # < 5 min, two checkerboards + a few photos +``` + +Without it, collection still works but labels are camera-frame only and the +trained model will not survive camera/node relocation. 
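The Step 0 planning arithmetic (window yield = `csi_frames / 20`) can be sketched as a small helper. The function names are hypothetical; the 20-frame window and the 2,000-window target come from the table above.

```python
import math

FRAMES_PER_WINDOW = 20      # from "window yield is csi_frames / 20"
MIN_TRAINABLE_WINDOWS = 2000  # "minimum trainable" target from the Step 0 table


def windows_per_minute(csi_hz):
    """Windows produced per minute of recording at a given CSI packet rate."""
    if csi_hz <= 0:
        raise ValueError("CSI rate must be positive; is the node streaming?")
    return csi_hz * 60 / FRAMES_PER_WINDOW


def minutes_needed(csi_hz, target_windows=MIN_TRAINABLE_WINDOWS):
    """Whole minutes of recording required to reach the target window count."""
    return math.ceil(target_windows / windows_per_minute(csi_hz))


for hz in (13, 53):
    print(f"{hz} Hz -> {windows_per_minute(hz):.0f} windows/min, "
          f"{minutes_needed(hz)} min for {MIN_TRAINABLE_WINDOWS} windows")
```

Plugging in the rate printed by the probe gives the minimum recording time; as the table notes, a longer session is still advisable for pose variety even when the window count is reached early.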
+ +### Step 2: Capture Camera + CSI Simultaneously Run both scripts at the same time (in separate terminals): ```bash -# Terminal 1: Record ESP32 CSI -python scripts/record-csi-udp.py --duration 300 +# Terminal 1: Record ESP32 CSI (2400 s = 40 min) +python scripts/record-csi-udp.py --duration 2400 # Terminal 2: Capture camera keypoints -python scripts/collect-ground-truth.py --duration 300 --preview +python scripts/collect-ground-truth.py --duration 2400 --preview \ + --calibration data/calibration/camera-room.json # omit if you skipped Step 1 ``` -Move around naturally in front of the camera for 5 minutes. The `--preview` flag shows a live skeleton overlay. +During capture: keep your **full body in frame** with good lighting (MediaPipe +confidence must stay above 0.5 — low-confidence frames are dropped at +alignment), and **change activity every 1–2 minutes**: walk, raise hands, +squat, hands up, kick, wave, turn, jump, sit, stand still. Pose variety is +what the model learns from; 40 minutes of sitting produces a constant-pose +predictor. 
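Because low-confidence frames are dropped at alignment, it can help to estimate the drop rate before training. The sketch below is not the logic of `align-ground-truth.js`; the record layout and the `confidence` field name are assumptions, and only the 0.5 threshold comes from the capture guidance above.

```python
MIN_CONFIDENCE = 0.5  # threshold quoted in the capture guidance


def drop_low_confidence(frames, min_conf=MIN_CONFIDENCE):
    """Return (kept_frames, drop_rate); keeps frames with confidence above min_conf."""
    kept = [f for f in frames if f.get("confidence", 0.0) > min_conf]
    drop_rate = 1 - len(kept) / len(frames) if frames else 0.0
    return kept, drop_rate


# Toy capture; the "t" and "confidence" field names are hypothetical.
frames = [
    {"t": 0.0, "confidence": 0.91},
    {"t": 0.1, "confidence": 0.42},
    {"t": 0.2, "confidence": 0.50},  # not above the threshold, so dropped
    {"t": 0.3, "confidence": 0.77},
]
kept, drop_rate = drop_low_confidence(frames)
print(f"kept {len(kept)}/{len(frames)} frames, drop rate {drop_rate:.0%}")
```

A high drop rate on a real capture usually points at framing or lighting rather than at the CSI side, so it is worth checking before recording a second session.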
-### Step 2: Align and Train +### Step 3: Align and Train ```bash -# Align camera keypoints with CSI windows +# Align camera keypoints with CSI windows (prints kept/dropped window counts — +# expect roughly csi_frames/20 kept; investigate if far below) node scripts/align-ground-truth.js \ --gt data/ground-truth/*.jsonl \ --csi data/recordings/csi-*.csi.jsonl -# Train (start with lite, scale up as you collect more data) +# Train (pick the preset matching your window count) node scripts/train-wiflow-supervised.js \ --data data/paired/*.jsonl \ - --scale lite \ + --scale small \ --epochs 50 -# Evaluate +# Evaluate — torso-normalized PCK on a TEMPORAL split node scripts/eval-wiflow.js \ --model models/wiflow-supervised/wiflow-v1.json \ --data data/paired/*.jsonl ``` +**Evaluation protocol matters.** Use `eval-wiflow.js` (torso-normalized +PCK@20, the metric comparable to published WiFi-pose results) on a temporal +hold-out, and sanity-check that predictions actually vary across frames +(`pred std > 0`) — a constant-pose model can score deceptively well on +near-static data under weaker protocols. See +`benchmarks/wiflow-std/RESULTS.md` for the forensic case study. + ### Scale Presets | Preset | Params | Training Time | Best For | |--------|--------|---------------|----------| -| `--scale lite` | 189K | ~19 min | < 1,000 samples (5 min capture) | -| `--scale small` | 474K | ~1 hr | 1K-10K samples | -| `--scale medium` | 800K | ~2 hrs | 10K-50K samples | -| `--scale full` | 7.7M | ~8 hrs | 50K+ samples (GPU recommended) | +| `--scale lite` | 189K | ~19 min | sanity runs only (< 2K windows trains poorly) | +| `--scale small` | 474K | ~1 hr | 2K-10K windows (one 40-min session) | +| `--scale medium` | 800K | ~2 hrs | 10K-50K windows (multiple sessions/rooms) | +| `--scale full` | 7.7M | ~8 hrs | 50K+ windows (GPU recommended) | -See [ADR-079](adr/ADR-079-camera-ground-truth-training.md) for the full design and optimization details. 
+See [ADR-079](adr/ADR-079-camera-ground-truth-training.md) for the full design and optimization details, and ADR-152 §2.2 for the external WiFlow-STD benchmark these numbers should be read against. ---