wifi-densepose

Commit Graph

Author	SHA1	Message	Date
ruv	2e4461d64d	release: bump 9 crates changed in the beyond-SOTA sweep for crates.io vitals/wifiscan/hardware/nn 0.3.0->0.3.1, ruvector 0.3.1->0.3.2, signal 0.3.2->0.3.3, train 0.3.1->0.3.2, mat 0.3.0->0.3.1, sensing-server 0.3.1->0.3.2. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-11 22:41:21 -04:00
ruv	aa3a6725a6	fix(train,nn): Tier-2 correctness/security — metric scale, OOM bounds, panics (ADR-155 §Tier-2) Each fix ships a test that would have caught the bug: - ruview_metrics OKS: derive scale from GT extent (no s=1.0 fake-Gold), reject s<=0, bound the loop to array extents (no panic on short/adversarial input). - config.validate(): UPPER bounds on window_frames/subcarriers/backbone_channels/ heatmap_size/keypoints/body_parts/batch_size + reject negative gpu_device_id (closes the config-OOM class); defaults+presets still validate. - subcarrier.rs: graceful fallback instead of panic on non-contiguous input. - ablation.rs latency_percentiles: total_cmp + NaN guard (no partial_cmp unwrap). - tensor.rs softmax(axis): normalize per-lane along the given axis (was whole- tensor), out-of-range axis -> NnError; fixes densepose per-pixel probs. - translator.rs apply_attention: real scaled-dot-product attention (was a uniform 1/seq_len stub that made any "with attention" ablation == without); mis-shaped checkpoint projections rejected. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-11 19:57:32 -04:00
ruv	84e2c920fd	fix(train): proof margin + committed-hash requirement (ADR-155 §Tier-1.4) The deterministic proof self-certified: PASS on any loss decrease (incl. 1e-9 noise) and a missing expected hash defaulted to PASS. - MIN_LOSS_DECREASE=1e-4: a run counts as learning only above float noise; a noise-only pipeline now FAILS. - is_pass() requires hash_matches==Some(true); no-hash -> SKIP (exit 2), never PASS. verify-training fails fast on a sub-margin loss before the hash compare, so a missing baseline cannot mask a non-learning pipeline. Documented honestly: the proof certifies reproducibility/determinism on a synthetic dataset, NOT that real data produced the weights nor that any accuracy claim is met. Tests: no_committed_hash_is_skip_not_pass, submargin_loss_change_fails_even_without_hash, committed_matching_hash_with_real_decrease_passes. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-11 19:57:16 -04:00
ruv	7fb3e33557	fix(train): rapid_adapt real finite-difference gradients, not a fake step (ADR-155 §Tier-1.3) contrastive_step/entropy_step wrote a fake gradient (grad += v0.01) unrelated to the stated objective, so any "TTA improves the metric" was unsupported. The _loss functions are now pure evaluators of the real objective; adapt() descends them with a central finite-difference gradient of that exact loss, so "the adaptation loss decreases" is now a real, reproducible measurement. Honest scope caveat (documented): this minimizes a self-supervised proxy over a LoRA bottleneck on raw CSI; it is NOT wired to the pose model and there is NO measured end-to-end PCK gain on WiFi pose from this path. Tests: contrastive_loss_decreases, entropy_loss_decreases (real gradient steps don't increase the loss), reported_loss_is_the_real_objective_not_a_placeholder. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-11 19:57:15 -04:00
ruv	2a2a2c5b06	fix(train): leak-free subject-disjoint split + synthetic-val disclosure (ADR-155 §Tier-1.2) MM-Fi windows are stride-1 (~99% overlap), so an index-level split leaks; and bin/train.rs validated real training against a SYNTHETIC val set, making any printed PCK meaningless on two counts. - MmFiDataset::subject_disjoint_split partitions whole subjects -> the two views share no subject and no window (leak-free by construction, deterministic per seed). assert_split_leak_free verifies subject- AND window-disjointness and is called inside the split so a leaky split is never handed out. - bin/train.rs now prefers the real split; the synthetic path is a labelled run_smoke_test ("[SMOKE-TEST] DO NOT REPORT") reachable only as a fallback. - New DatasetError::InvalidSplit. Tests prove disjointness, determinism, single-subject/bad-fraction rejection, and that the validator catches an injected subject leak. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-11 19:56:57 -04:00
ruv	50b657459f	fix(train): unify 7 divergent PCK/OKS into one canonical metric (ADR-155 §Tier-1.1) Collapse the four PCK and three OKS implementations into a single source of truth — pck_canonical (torso hip↔hip, COCO/ADR-152 convention validated at ~96% PCK@20 in benchmarks/wiflow-std) and oks_canonical (scale from GT pose extent). MetricsAccumulator, compute_pck/_per_joint/_oks, aggregate_metrics and the deprecated *_v2 path all route through them, so Trainer::evaluate() and the bench definition agree. Fixes two claim-inflating bugs, each pinned by a regression test: - zero-visible-joint PCK was 1.0 (false-perfect) -> now 0.0 - OKS s=1.0 on normalized coords made OKS~=1.0 for any pose ("fake Gold tier") -> scale now derived from the pose; a 3x-torso-wrong pose yields OKS<0.2 Divergent local kernels (training_bench raw-threshold, sensing-server torso-height) annotated "DO NOT USE for reported metrics". Legitimately changed test expectations (all-coincident "perfect" fixtures are correctly unscoreable; all-invisible -> 0.0) updated with comments citing the finding. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-11 19:56:44 -04:00
rUv	17471e93ff	ADR-152: WiFi-Pose SOTA 2026 intake — WiFlow-STD benchmark, Rust integrations, ADR-153 802.11bf layer, efficiency frontier (#1008 ) * feat(calibration): NodeGeometry transceiver-geometry recording (ADR-152 §2.1.1) PerceptAlign-motivated geometry capture at enrollment: per-node optional records (position, antenna orientation, inter-node distances, acquisition method) — recorded when known, never required. Event-sourced via EnrollmentEvent::GeometryRecorded (latest recording wins); persisted on SpecialistBank with serde defaults so pre-ADR-152 bank JSON loads cleanly (fixture-proven, and geometry-free banks serialize byte-shape-identical to the old schema); threaded through MultiNodeMixture as data only — the learned geometry embeddings and algorithmic fusion use are §2.1.2, deliberately deferred until the ADR-151 P6 LoRA heads exist. Geometry recorded from now on means banks captured today remain usable for layout-conditioned training later — you can't retroactively add geometry to data you didn't record. 8 new tests (3 geometry, 2 anchor, 2 bank, 1 multistatic) + full-loop extension (2-node geometry, one tape-measured + one unknown, surviving the bank JSON round-trip the runtime loads from). 50/50 calibration (both feature configs) + 23 CLI tests green. Co-Authored-By: RuFlo <ruv@ruv.net> * feat(training): two-checkerboard camera↔room calibration for ADR-079 labels (ADR-152 §2.1.3) Defends the camera-supervised pipeline against PerceptAlign's "coordinate overfitting": MediaPipe keypoints were emitted in raw camera coordinates with no shared frame and no transceiver-geometry metadata — the exact label shape that memorizes deployment layout and collapses cross-layout. - scripts/calibrate-camera-room.py + calibration_lib.py: OpenCV two-checkerboard calibration → versioned bundle JSON (intrinsics, camera→room extrinsics, checkerboard spec, transceiver geometry, sha256 calibration_id). Intrinsics resolve from file > cache > multi-view computation > loud-warning 2-view fallback. - collect-ground-truth.py --calibration <bundle>: every sample gains keypoints_room (unit bearing rays from the camera center in the room frame — documented projective alignment; raw image coords preserved so training chooses), camera_origin_room, calibration_id, and the transceiver geometry stamp. Without the flag, output is byte-identical to before (tested) + a one-line ADR-152 warning. Design finding (recorded for ADR-152): a single planar checkerboard's corner grid is centrosymmetric — the reversed corner ordering fits a ghost camera pose with IDENTICAL reprojection error, so per-board flip disambiguation is mathematically ill-posed. solve_two_board_extrinsics solves the joint wall+floor set over all 4 flip combinations, where the minimum is unique — an independent reason the TWO-checkerboard method is required, beyond what PerceptAlign states. 15 headless pytest tests green (synthetic corners: extrinsics recovery incl. ghost resolution, bundle round-trip + hash stability, ray transforms w/ distortion + cross-resolution, no-calibration byte identity). Co-Authored-By: RuFlo <ruv@ruv.net> * feat(benchmarks): WiFlow-STD reproduction harness + measurement (a) results (ADR-152 §2.2) Shipped checkpoint REFUTED (0.08% PCK@20, wrong keypoint normalization); 6 reproducibility defects documented (broken imports, corrupted dataset tail with float32-max garbage that NaN-poisons fp16 BatchNorm, unreachable test phase). After repairs, retraining with upstream defaults reproduces 96.09% PCK@20 full-test / 96.61% corruption-free (published 97.25%) on RTX 5080. Claims graded MEASURED-EQUIVALENT; 2.23M params + ~0.055 GFLOPs verified. Third-party code/weights/data stay out of tree (gitignored). Co-Authored-By: claude-flow <ruv@ruv.net> * feat: ADR-152 Rust integrations + ADR-153 802.11bf protocol model - calibration: GeometryEmbedding — 32-slot permutation-invariant NodeGeometry featurization for future LoRA-head conditioning (ADR-152 §2.1.2); derived SpecialistBank::geometry_embedding() accessor; 59 tests - train: MaePretrainConfig + patchify/random-mask with UNSW measured recipe (80% masking, (30,3) patches; ADR-152 §2.3, arXiv 2511.18792); strict no-truncate/no-NaN policy; proptest properties - train: WiFlowStdModel — tch-gated port of the verified ~96%-PCK@20 WiFlow-STD architecture (ADR-152 §2.2 beyond-SOTA); ungated param formula pinned to 2,225,042; 15/17-keypoint support; 239 crate tests - hardware: ieee80211bf forward-compatibility protocol model (ADR-153): SpecProfile gates, SensingCapabilities negotiation, required ConsentMode, session FSM, SensingTransport + SimTransport + OpportunisticCsiBridge; full acceptance checklist covered; 156+4 tests - deps: ruvector bumps per ADR-152 §2.6 survey (mincut/solver 2.0.6, attention 2.1.0, gnn 2.2.0); vendor/ruvector synced to a083bd77f - docs: ADR-153 accepted; ADR-152 §2.2 status, §2.4 amendment, §2.6 added Workspace: 162 test suites green (--no-default-features); Python proof PASS. Known pre-existing flake: homecore-api env_empty_falls_back_to_defaults (unserialized env-var mutation) — untouched, follow-up. Co-Authored-By: claude-flow <ruv@ruv.net> * docs: CHANGELOG + CLAUDE.md entries for ADR-152 integrations and ADR-153 Co-Authored-By: claude-flow <ruv@ruv.net> * fix(train): repair tch-backend bit-rot — gated path compiles and tests run again Mechanical API refresh against current tch: Vec::from(Tensor) -> try_from (+ explicit flatten), numel() usize cast, Rem/div ops -> remainder() / divide_scalar_mode(floor) — the latter fixed a silent true-division bug in heatmap argmax decoding; clamp(1.0, f64::MAX) -> clamp_min (torch 2.x scalar overflow panic); petgraph EdgeRef import; missing EvalMetrics and verify_checkpoint_dir APIs that tests documented. wiflow_std roundtrip test uses safetensors (.pt _save_parameters roundtrip broken in torch 2.11 Windows). Gated: 349 passed (incl. all 20 wiflow_std); ungated: unchanged. Known pre-existing: gaussian-heatmap convention mismatch (2 tests), proof seed race under parallel threads — documented, deliberate follow-ups. Co-Authored-By: claude-flow <ruv@ruv.net> * feat(train): WiFlow-STD PyTorch->tch weight import + numerical parity proof export_to_safetensors.py maps the retrained checkpoint (295 tensors -> 248 mapped, param sum exactly 2,225,042; num_batches_tracked dropped) into a tch-loadable safetensors plus a deterministic parity fixture. Gated #[ignore] integration test loads it strictly and asserts forward-pass agreement: max abs diff 1.192e-7 on the seed-42 fixture. dump_variable_names test makes the tch name layout authoritative. Zero architecture discrepancies found. Co-Authored-By: claude-flow <ruv@ruv.net> * fix: workflow-review findings — BN gamma init, ThresholdParams serde, init docs Concurrent validation workflow (2 review lanes + adversarial verification, 13 agents): 5 confirmed findings, 3 refuted. Fixes: - wiflow_std: pin BatchNorm gamma to 1.0 (tch default draws Uniform(0,1) — silently halves activations in from-scratch training; loaded checkpoints unaffected, parity re-verified after the change) - wiflow_std: document the conv-init divergences vs the reference's effective kaiming_normal(fan_out) re-init (from-scratch dynamics only) - ieee80211bf: ThresholdParams deserialization validates via try_from so the <=100 invariant holds for untrusted payloads (+ rejection test) Benchmarks (release, ruvzen): GeometryEmbedding 1.84us/call (542k/s), MAE tokenization 7.38us/window (135k/s), 802.11bf FSM 8.9M events/s — nothing suspicious. Co-Authored-By: claude-flow <ruv@ruv.net> * docs(adr): ADR-152 §2.1.4 gate resolved — PerceptAlign repo MIT, dataset on HF Co-Authored-By: claude-flow <ruv@ruv.net> * feat(benchmarks): edge optimization measured + measurement (b) blocked + 92.9% retraction Edge optimization (ADR-152 optimize track): ONNX Runtime fp32 is the CPU latency win (3.2 ms/window, ~3.4x faster than torch, parity 2.4e-7); ORT dynamic int8 reaches 2.44 MB (paper's ~2.2 MB claim plausible only via conv-capable toolchains; -0.16pt PCK@20, +18% MPJPE, 2x slower); torch dynamic quant converts 0% of this conv-only model; fp16 halves storage free but is slower on CPU. Measurement (b) BLOCKED-ON-DATA: only 1,077 paired ESP32 windows exist (stop rule <2k). Forensic recheck of the surviving April holdout RETRACTS the ADR-079 '92.9% PCK@20' figure: constant-output model, absolute (not torso) threshold, 69 near-static frames — mean predictor scores 100% under that protocol; torso-PCK@20 is 19.1%. Corroborates PR #535. Stale citations removed from user-guide, readme-details, ADR-152 §2.1.3; no-citation rule extended to ADR-079 accuracy claims. Unblock: >=2k-window multi-pose paired session + torso-PCK re-baseline. Co-Authored-By: claude-flow <ruv@ruv.net> * docs(user-guide): corrected camera-supervised collection tutorial Step 0 CSI-rate check + session-length math (window yield = frames/20 — the May session's 8x under-delivery was a ~12 Hz CSI rate, not an aligner bug); two-checkerboard calibration step (ADR-152 §2.1.3); pose-variety and confidence guidance; torso-normalized PCK + temporal-split + pred-variance eval protocol (lessons from the 92.9% retraction); scale presets re-keyed to realistic window counts. Co-Authored-By: claude-flow <ruv@ruv.net> * feat(benchmarks): static PTQ int8 (calibrated) results + overnight capture script Conv-only static QDQ beats dynamic int8 on accuracy (PCK@20 96.61-96.63% vs 96.52%, MPJPE +10% vs +18% over fp32) at ~equal size/latency; all-ops QDQ strictly worse (int8 activations through attention glue). Entropy calibration verified bit-identical to MinMax on this data. Deployment: ONNX fp32 for speed (3.2ms), static conv-only QDQ for smallest (2.53MB). Also: scripts/overnight-empty-capture.py — segmented UDP CSI recorder for empty-room baselines (no glob collisions, detach-safe). Co-Authored-By: claude-flow <ruv@ruv.net> * feat(benchmarks): measurement (b) MEASURED — optimization transfer only, mean-pose baseline wins WiFlow-STD fine-tuned on 2,046 fresh single-room ESP32 paired windows (temporal 70/15/15, 70->540 adapter, K=17): pretrained-init 65% PCK@20 vs scratch 0% (optimization transfer) but frozen-trunk ~0% (no feature transfer), and NOTHING beats the mean-pose baseline (95.9% PCK@20 — single subject, near-static normalized coords). Honesty gates held: pred std 0.0113 (non-constant model) but mean-baseline dominance means no citable CSI->pose capability from this data. ADR-152 open question 1 answered partially; definitive answer needs multi-subject/position data. Two new aligner findings: heterogeneous csi_shape with silent zero-padding (~20%), and extractCsiMatrix's transposed shape label (frame-major data, [nSc, nFrames] label) — fixes pending. Co-Authored-By: claude-flow <ruv@ruv.net> * feat(benchmarks): efficiency sweep MEASURED — half model dominates full reference Compact WiFlow-STD variants on the same data/split/protocol: half (843,834 params, 0.38x) strictly dominates the 2.23M reference (PCK@20 96.62 vs 96.61, PCK@50 99.47 vs 99.11, MPJPE 0.00898 vs 0.0094) — the published architecture is over-parameterized for its own benchmark. quarter (338k) 96.05%; tiny (56,290 params, 1/39.5) holds 94.11% — a ~220KB fp32 edge candidate. In-domain caveats recorded; cross-domain untested. Co-Authored-By: claude-flow <ruv@ruv.net> * feat(train): compact WiFlow-STD presets in Rust + tiny edge artifact (ADR-152) WiFlowStdConfig gains half()/quarter()/tiny() mirroring the overnight sweep exactly: TcnGroupsMode (Fixed/Gcd/Depthwise), input_pw_groups, derived stride schedule and decoder-mid (all default to upstream behavior; legacy serde JSON unaffected). Param formulas pin to trained ground truth first try: 843,834 / 338,600 / 56,290; default 2,225,042 pin and 1.192e-7 parity unchanged. 248 tests green. Tiny edge artifact (tiny_edge_bench.py): ONNX fp32 = 295 KB, 0.66 ms/win (~1,500/s CPU), 94.11% PCK@20 (matches sweep clean-test exactly; parity 1.49e-7). Static int8 is a bad trade at this scale (-1.43pt, +19% MPJPE, -16% size, slower) — recorded as negative result. Export note: width-16 breaks AdaptiveAvgPool((15,1)) TorchScript export; replaced by exact mean+matmul equivalent, proven by parity. Co-Authored-By: claude-flow <ruv@ruv.net> * fix: resolve all 10 confirmed code-review findings (7-angle review, 20/20 verified) wiflow_std: min_feature_width (default 15) replaces the keypoints->stride coupling — for_keypoints(17) now provably builds the trained [2,2,2,2] graph and pools 15->17, matching the validated Python protocol (pinned by tests); param_count() total on invalid configs; random_mask returns Result and rejects non-finite/out-of-range ratios; trainer checkpoints switched to safetensors (.pt VarStore roundtrip broken on Windows torch 2.11). ieee80211bf: SBP proxy now re-triggers instances and relays reports via Action::RelaySbpReport -> SensingFrame::SbpReport (clients consume via their existing path); missed_instances reset on success = consecutive semantics; SessionTable gains a guarded SBP entry point + unknown-id drop counter; initiator-role sessions reject inbound setup/SBP requests (RejectedNotSupported) closing the idle hijack; StartSetup/StartSbp outside Idle return InvalidStateForCommand; SBP validation unified through evaluate_setup with a 1:1 SetupStatus->SbpStatus mapping. events.rs split out to honor the 500-line cap. calibration/cli: enrollment geometry now actually reaches trained banks — both production call sites attach .with_geometry; --geometry flag on train-room and POST /enroll/geometry + train-body geometry on calibrate-serve give production a recording surface; geometry-free banks log the ADR-152 §2.1.2 note. benchmarks: corruption masks committed as ground truth (unregenerable after in-place cleaning; verified bit-identical regeneration from the pristine copy) + generate_corruption_masks.py producer; _bench_common.py dedups the 5x-copied shim/evaluate/seed/remap (post-refactor PCK@20 re-verified equal to the last digit); remote scripts get the mmap patch; tiny_edge --calib validated multiple-of-64; onnx_bench --help no longer executes (and overwrote) the export — artifact restored byte-exact. Workspace: 2,963 tests passed, 0 failed; Python proof PASS. Co-Authored-By: claude-flow <ruv@ruv.net> * ci: build workspace tests without debuginfo — runner disk exhaustion The combined 38-crate debug target exceeds the GitHub runner's disk ('final link failed: No space left on device'); the same tree measured 151GB locally with full debuginfo. CARGO_PROFILE_{DEV,TEST}_DEBUG=0 shrinks the target ~5-10x; debuginfo serves no purpose in CI test runs. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-06-11 17:02:23 -04:00
rUv	29de574e63	Beyond-SOTA engine/signal/train improvements: mesh partition guard, FFT CIR solver, canonical frame decoder, falsifiable occupancy benchmark, governed streaming, adapter provenance (#1018 ) * docs(research): add RuView beyond-SOTA system review (00) First document of the beyond-SOTA research series: capability audit of the current RuView engine with role-to-crate maturity matrix, ruvsense module inventory, gap analysis, and risk register. https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH * docs(research): add beyond-SOTA architecture design (02, in progress) https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH * docs(research): finalize beyond-SOTA architecture (02) https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH * docs(research): add benchmark/validation methodology snapshot (03) https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH * docs(research): add beyond-SOTA series index with validation results; changelog README index ties the 5 research docs together with the session's measured validation evidence: 2,797 workspace tests / 0 failed, Python proof PASS (bit-exact), and paired pre/post criterion CIR benchmarks. https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH * perf(signal): precompute CIR warm-start system; hoist tomography solver allocs Exact, determinism-safe optimizations (bit-identical float results): - cir.rs: diag(PhiH Phi)+lambdaI and its CSR matrix depend only on Phi and lambda (fixed at CirEstimator::new) but were rebuilt every frame (O(KG) pass + CSR allocation). Now built once in new() via build_warm_start_system; summation order unchanged. - tomography.rs: ISTA gradient buffer hoisted out of the 100-iteration loop (fill(0.0) reset) and the Frobenius Lipschitz bound moved from per-reconstruct to construction. Verified: signal 456 tests green; engine 11/11 green including cycle_is_deterministic and witness-stability tests. Criterion paired pre/post: cir_estimate/he40 -3.9% (p<0.01), multiband -1.2/-1.4%. https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH * fix(worldgraph): bound SemanticState growth with deterministic retention StreamingEngine::process_cycle appended one SemanticState belief per cycle with no eviction — ~1.7M nodes/day at 20 Hz (beyond-SOTA roadmap finding #6). Add WorldGraph::prune_semantic_states(max): deterministic eviction of the oldest beliefs by (valid_from_unix_ms, id); structural nodes (rooms, zones, sensors, anchors, tracks, events) are never eligible. Wire it into the engine after each belief append (DEFAULT_SEMANTIC_RETENTION = 7,200, ~6 min at 20 Hz; set_semantic_retention to tune). The WorldGraph holds current beliefs; durable history is the recorder's job, so no audit data is lost. 3 new tests: end-to-end bounded growth, oldest-only eviction, deterministic equal-timestamp tie-break. Workspace gate: 2,865 passed, 0 failed. https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH * feat(sensing-server): route live frames through the governed StreamingEngine Closes the live-trust-path gap (ADR-136 section 8, beyond-SOTA system review): the running server fused live CSI with the bare MultistaticFuser, while the privacy/provenance/witness control plane (ADR-135..146) only ever ran on synthetic in-test frames. The privacy control plane was therefore bypassable on the real path. New engine_bridge module drives StreamingEngine::process_cycle from the server's live NodeState map, reusing the existing NodeState -> MultiBandCsiFrame conversion. It lazily wires each contributing node as a WorldGraph sensor (idempotent), bounds belief growth via the retention cap, and forwards explicit timestamps/calibration ids so the path stays deterministic and replayable. Wired additively into both live ESP32/WiFi fusion sites in main.rs via a split-borrow off the write guard, so person-count behavior is unchanged; the latest BLAKE3 witness is stored on AppState. Every published belief now carries evidence + model + calibration + privacy decision and a deterministic witness. Adds wifi-densepose-engine/-worldgraph/-bfld/-geo deps. 6 new bridge tests (witnessed belief with full provenance, cross-run determinism, idempotent node registration, retention bound, privacy-mode propagation). sensing-server suite 430+128 green; workspace gate 2,904 passed / 0 failed. https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH * feat(train): falsifiable occupancy benchmark with anti-overfitting gate Makes the presence/person-count "beyond SOTA" claim falsifiable in code instead of aspirational (the unfalsifiability gap from the beyond-SOTA system review). occupancy_bench grades predictions vs ground truth and gates a SOTA claim behind one claim_allowed invariant requiring ALL of: - DataProvenance::Measured — synthetic/mock data is scorable for regression but never claimable (anti-mock-contamination; the CLAUDE.md Kconfig-bug lesson made structural). - A leak-free EvalSplit — validate() refuses any split where a subject OR environment id appears in both train and test (subject leakage / per-environment overfitting). - n_test >= min_test_samples (small-N guard). - Presence F1 whose bootstrap-CI lower bound (deterministic seeded splitmix64) clears the threshold — not the point estimate. - Count MAE within threshold. The claim string is unreadable except through the gate (NO_CLAIM otherwise), same discipline as the ruview-gamma acceptance gate. What remains is data, not method: a frozen, SHA-pinned, subject/environment-disjoint measured replay set turns the claim into a passing/failing test. Lives in wifi-densepose-train (the eval bounded context, alongside ablation/ eval/metrics). 10 tests cover each refusal path; warning-clean under the crate's missing_docs lint. Workspace gate 2,914 passed / 0 failed. Doc 03 updated. https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH * feat(engine): per-room adapter provenance + drift-to-recalibration advisor Closes the trust-chain gap where an ~11 KB per-room LoRA adapter (ADR-150 section 3.4) could silently change inference without the witness noticing: provenance carried only "rfenc-v<N>" with no notion of adapter identity. - StreamingEngine::set_room_adapter(AdapterInfo): pins the adapter's content-derived id into provenance model_version ("rfenc-v1+adapter:<id>") — and therefore into the BLAKE3 witness — so swapping or clearing adapter weights always shifts the witness. Engine test proves base -> adapter -> other-adapter -> cleared all witness differently and cleared == base. - RecalibrationAdvisor: recommends re-running the ADR-135 empty-room baseline / refitting the room adapter on sustained low fusion coherence (streak threshold, default 60 cycles ~ 3 s at 20 Hz) or an ADR-142 change-point. Surfaced as TrustedOutput::recalibration_recommended, stored on the sensing-server AppState alongside the witness at both live fusion sites. - Bridge plumbing: EngineBridge::{set_room_adapter, clear_room_adapter} + live-path test that the adapter id flows into the live witness. Scope note (honest): this is the deployable provenance/trigger half of the "retrained model" roadmap item. Fitting the adapter itself runs in the existing external calibration service (aether-arena/calibration/); a trained RF-encoder checkpoint still does not exist in-tree. Engine 15 tests, bridge 7 tests. Workspace gate: 2,918 passed / 0 failed. https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH * fix(mat): gate api module behind its feature — standalone no-default-features builds pub mod api was unconditional while its only dependency, serde, is optional behind the 'api' feature, so any build without default features failed with 101 unresolved-serde errors (masked in --workspace runs by feature unification). The api module and its create_router/AppState re-export are now cfg(feature = "api")-gated with docsrs annotations. All combos compile: bare --no-default-features (was 101 errors, now 0), --no-default-features --features api, and full default (177 tests pass). Workspace gate: 2,918 passed / 0 failed. https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH * perf(signal): opt-in FFT operator for the CIR ISTA solver (8-14x measured) Phi is a sub-DFT, so each ISTA mat-vec can run as one length-G FFT (O(G log G)) instead of a dense O(KG) product — the dominant-latency-hazard finding from the beyond-SOTA optimization roadmap. New CirConfig::fft_operator, default FALSE: the dense path stays the bit-exact witness default. The FFT evaluates the same sums in a different order, so enabling it shifts float results in the last bits and requires regenerating any pinned witness — strictly opt-in per deployment. FftOperator (rustfft, planned once at CirEstimator::new, scratch buffers reused across the ISTA loop) dispatches inside ista_solve: Phi x = scale forward-FFT(x) sampled at bins (k_idx mod G) Phi^H v = scale * unnormalised inverse-FFT of v scattered into those bins Warm-start and Lipschitz estimation stay dense at construction. Measured (criterion, same run, same machine): ht20: 2.22 ms -> 265 us (8.4x) ht40: 10.26 ms -> 717 us (14.3x) The real HE40 grid (K=484, G=1452) scales further per the O(KG)/O(G log G) ratio. 3 new tests: FFT<->dense matvec equivalence to float tolerance on ht20 and he40 grids; end-to-end dominant-tap agreement on a single-path frame; all default configs keep FFT off. New cir_estimate_fft bench group. Workspace gate: 2,921 passed / 0 failed (default path bit-exact, witnesses unchanged). https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH feat(core): canonical frame decoder — capture-to-claim replay (ADR-136) The encode half of the ADR-136 frame contract existed (ComplexSample, to_canonical_bytes, witness_hash) but there was no decoder: a captured canonical frame could be witnessed but never reconstructed, blocking replay-from-capture. CsiFrame::from_canonical_bytes is the exact inverse: same id, metadata, complex payload, and witness hash (tested as the round-trip law AC7 — the replayed frame re-encodes byte-identically). Amplitude/phase are recomputed from the payload (projections, not independent state). Every malformed-input class fails closed (AC8): header truncation -> Truncated, payload truncation -> PayloadMismatch, unknown discriminants, non-UTF-8 device id, trailing bytes. Nil calibration uuid decodes as None per the documented encoding. Core: 36 tests pass. Workspace gate: 2,937 passed / 0 failed. https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH * feat(engine): dynamic min-cut mesh partition guard (ruvector-mincut) Maintains an exact min-cut over the live mesh coupling graph — nodes are sensing nodes, coupling is the product of fusion attention weights — and surfaces per cycle, as TrustedOutput::mesh: - cut value: the global "how close is the array to partitioning" number, a structural measure per-node heuristics miss; - weak side: which specific nodes would split off (failure/jamming triage, feeds ADR-032 posture); - at-risk flag: counts as a structural event for the drift->recalibration advisor (alongside ADR-142 change-points). Degenerate cases fail toward risk: a node with zero coupling is reported as already partitioned (cut 0, that node as the weak side). Measured cost policy (criterion, 12-node mesh — the honest part): - weights quantized (1/64) + change-gated: steady-state cycles do ZERO graph work and reuse the cached cut (~7.3 us, ~23x cheaper than building); - on any real change a full exact rebuild (~171 us) is used, because ONE DynamicMinCut delete+insert measured ~240 us — the subpolynomial machinery amortizes on much larger graphs, so rebuild-on-change is the measured optimum at mesh scale (one-edge case -28% after switching policy); - full process_cycle with the guard: ~33 us for 4 nodes vs the 50 ms budget. 9 mesh_guard tests (weak-node detection, steady-state zero updates, sub-quantum gating, join/drop rebuild, determinism, disconnection) + an engine-level wiring test (down-weighted node -> weak side -> recalibration). Engine 24 tests; workspace gate 2,946 passed / 0 failed. https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH * feat(engine): mesh partition risk demotes privacy + enters the witness (ADR-032) Completes the mesh-guard integration: its at_risk signal was advisory-only (fed the recalibration advisor). It now also contributes to the ADR-141 privacy demotion alongside fusion- and array-level contradictions — a mesh close to partitioning makes the fused belief less trustworthy, so the cycle emits at a more restricted class (monotonic; information only removed). Because effective_class feeds the BLAKE3 witness, a fragmenting array now shifts the witness: partition risk is auditable, not just logged. The mesh computation moved ahead of the demotion step in process_cycle; mesh_guard_mut exposes risk-threshold tuning. Test: a forced-risk 3-node cycle demotes PrivateHome Anonymous->Restricted and shifts the witness vs a clean baseline. Engine 25 tests; workspace gate 2,947 passed / 0 failed. https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH * fix: public-PR review findings — privacy-path honesty, gate holes, mesh-guard cliff - sensing-server: engine errors logged+counted (no silent swallow), trust state exposed via status surface, privacy-demotion claims aligned with the actual parallel-audit-path behavior - occupancy_bench: vacuous-F1 hole closed (degenerate test sets fail with their own criterion); CI-lower-bound test made probative - mesh_guard: quantization scaled to observed coupling range — >=65-node balanced meshes no longer permanently at_risk (regression test) - engine: both wiring tests made probative (same-topology witness compare, deterministic risk-crossing fixture) - mat: axum/tokio optional behind api; real serde feature (api enables it) - core: canonical decoder strict (non-zero reserved bytes and nil UUID rejected — injective on accepted domain, forged-bytes tests) - CHANGELOG: un-spliced the FFT/adapter bullet mangle Co-Authored-By: claude-flow <ruv@ruv.net> * chore: strip private-track references for public PR Reword the occupancy-benchmark changelog bullet to drop a cross-reference to the private research track, and restore the WorldGraph retention bullet header that was glued onto the preceding MAT bullet. Co-Authored-By: claude-flow <ruv@ruv.net> * chore: lockfile refresh for cherry-picked feature set Co-Authored-By: claude-flow <ruv@ruv.net> --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-06-11 16:08:54 -04:00
ruv	483bfa4660	feat(aether-arena): benchmark-first scorer + witness chain + repeatability (M2/M5/M7) Per direction "remove the initial number, optimize for benchmark first" + "include witness chain capabilities for proof and repeatability analysis": - Empty board, no seeded numbers: ledger seeds to genesis only. Every result is a real scoring-pipeline witness; RuView gets no hand-entered baseline. - Real model scoring: aa_score_runner now loads predictions + an eval split (--split/--pred) and scores them through the real ruview_metrics pose harness — not just a synthetic fixture. Committed public smoke split (fixtures/smoke_*.json). - Witness chain: each score emits a witness = inputs_sha256 (binds it to the exact inputs) + proof_sha256 (cross-platform-stable score hash) + harness_version. - Repeatability analysis: --repeat N runs the harness N× and fails if it ever yields >=2 distinct proof hashes (16/16 identical locally). - Witness ledger: ledger/ledger_tools.py — append-only, hash-chained, tamper- evident (seed/append/verify); editing any past row breaks the chain. - CI gate extended: determinism + repeatability(16) + real-scoring smoke + ledger chain verify on every PR. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-30 16:59:11 -04:00
ruv	a6808568a2	feat(aether-arena): ADR-149 spatial-intelligence benchmark — scorer + CI harness gate (M1-M4) AetherArena ("AA") — the official, project-agnostic Spatial-Intelligence Benchmark (ADR-149, Accepted). Iteration 1 of the long-horizon build: - ADR-149 accepted: name locked (ruvnet/aether-arena), v0 metrics locked (pose/presence/latency/determinism), dataset legality resolved (MM-Fi CC BY-NC only; Wi-Pose excluded). Adds four-part framing, threat model, arena_score formula, submission state machine, neutrality/governance, and the §7 acceptance test. - aa_score_runner: deterministic scorer bin reusing the real ruview_metrics pose harness on a fixed seed=42 fixture → RuViewTier-style verdict + cross-platform SHA-256 proof hash. Builds --no-default-features (no torch/GPU). VERDICT: PASS. - CI harness gate: .github/workflows/aether-arena-harness.yml runs the scorer on every PR — the "PR that runs the harness as part of the build" requirement. - Scaffold: aether-arena/{README,VERIFY,STATUS}.md + schema/aa-submission.toml. - Horizon record persisted (.claude-flow/horizons/aether-arena-aa.json). Infra = the deliverable; model SOTA (MM-Fi PCK@20) is a separate effort blocked on ADR-079 data collection, tracked as a stretch goal, not an infra exit. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-30 16:47:22 -04:00
ruv	0f336b7d36	feat(train): ADR-145 ablation eval harness + privacy-leakage/latency metrics (#849 ) - train/ablation.rs: FeatureSet matrix (CSI/CIR/CSI+CIR/+Doppler/+BFLD/+UWB); AblationMetrics (presence acc, loc err, FP/FN, latency p50/p95, privacy leakage, cross-room degradation) derived deterministically from VariantRun - membership_inference_leakage(): MIA proxy = \|AUC-0.5\|*2 (0 indistinguishable, 1 perfectly separable); latency_percentiles_ms (nearest-rank); confusion_rates - AblationReport.to_markdown() (deterministic), csi_cir_beats_csi_only() acceptance check - 5 tests; workspace 0 errors Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-28 23:38:43 -04:00
rUv	004a63e82d	fix(security): audit — fix RUSTSEC vulns, clippy warnings, dead code (#769 ) - Upgrade openssl to 0.10.78 (CVE-2026-41676), jsonwebtoken to 9.4 - Suppress unmaintained-only/no-CVE advisories in .cargo/audit.toml with per-entry rationale - Fix all `cargo clippy --all-targets -- -D warnings` errors across 35 crates: derivable_impls, needless_range_loop, map_or→is_some_and/ is_none_or, await_holding_lock (drop MutexGuard before .await), ptr_arg (&mut Vec→&mut [T]), useless_conversion, approximate_constant (2.718→E, 3.14→PI), field_reassign_with_default, manual_inspect, useless_vec, lines_filter_map_ok, print_literal, dead_code - Apply `cargo fmt --all` - Pre-existing test failure in wifi-densepose-signal (test_estimate_occupancy_noise_only) is not introduced by this PR	2026-05-23 05:36:13 -04:00
ruv	6f77b37f5e	chore(release): wifi-densepose-train 0.3.0 -> 0.3.1 Publishing the additive changes from PRs #536/#537 to crates.io: - `signal_features` module — wires `wifi-densepose-signal` into the pipeline (audit #1/#2) - `TrainingConfig::for_subcarriers` / `ht40_192()` / `multiband_168()` presets + the real `MmFiDataset` loader integration test (audit #4/#6/#7) No public API removals or changes — additive only, so 0.3.0 -> 0.3.1 is semver-correct. No other workspace crate depends on `wifi-densepose-train`, so this is a standalone bump. Co-Authored-By: claude-flow <ruv@ruv.net>	2026-05-11 23:59:50 -04:00
rUv	c604ca1150	feat(train): TrainingConfig subcarrier-layout presets + real MmFiDataset loader test (#537 ) Closes the remaining doable items from the 2026-05-11 training-pipeline audit: #6 (CSI format default = 56-sc / 1 NIC) + #7 (multi-band 168-sc mesh not in config): new `TrainingConfig::for_subcarriers(native, target)` plus named presets `mmfi()` (114→56), `ht40_192()` (≈192-sc ESP32 HT40 → 56) and `multiband_168()` (168-sc ADR-078 multi-band mesh → 56). Non-MM-Fi CSI shapes are now first-class instead of requiring manual `native_subcarriers` / `num_subcarriers` overrides; the field docs list the supported source counts and the multi-NIC mapping (a 2–3-node mesh currently rides on `n_rx` until a dedicated node dimension lands). Model input width stays `num_subcarriers`; the presets only vary the resampling input. #4 (proof.rs uses synthetic data): reframed — a deterministic proof must use a reproducible source, so `verify-training` correctly stays on `SyntheticCsiDataset`. The real gap was that nothing exercised the on-disk `MmFiDataset` path. New `tests/test_real_loader.rs` writes synthetic CSI to `.npy` files in the `MmFiDataset::discover` layout, loads it back, and checks the resulting `CsiSample` — covering the no-interp case, the subcarrier-interpolation branch, and the empty-root case. Adds `ndarray` / `ndarray-npy` as dev-deps for the fixture writing. cargo check + cargo test -p wifi-densepose-train --no-default-features: clean, all existing tests green, 3 new loader tests + the updated config doctest pass. Purely additive — no model-shape change, no tch-module change.	2026-05-11 23:49:00 -04:00
rUv	eaedfded6f	fix(train): wire wifi-densepose-signal into the pipeline; correct MODEL_CARD env-sensor claim (#536 ) Addresses three findings from the 2026-05-11 training-pipeline audit: #1/#2 — `wifi-densepose-signal` was a phantom dependency of `wifi-densepose-train` (listed in Cargo.toml, never imported), and vitals/CSI signal features were absent from the pipeline. New module `wifi_densepose_train::signal_features`: `extract_signal_features(&Array4<f32>, &Array4<f32>) -> Array1<f32>` (and the convenience method `CsiSample::signal_features()`) runs a windowed observation's centre frame through `wifi_densepose_signal::features::FeatureExtractor`, producing a fixed-length (FEATURE_LEN=12) amplitude / phase-coherence / PSD feature vector — the hook for a future vitals / multi-task supervision head (breathing- and heart-rate-band power are read off the PSD summary). The vector is produced on demand and is not yet fed back into the loss; wiring it as a training target is the documented follow-up. `wifi-densepose-signal` is now an actually-used dependency. 5 new tests (2 unit in signal_features.rs, 3 integration in tests/test_dataset.rs); existing wifi-densepose-train tests unchanged and green. #3 — `docs/huggingface/MODEL_CARD.md` presented PIR/BME280 environmental-sensor weak-label fine-tuning as a current capability; there is no env-sensor ingestion in the training pipeline. Marked that path as planned/not-implemented in the training-steps list and the data-provenance section. (#5 — README's "92.9% PCK@20" overclaim — fixed separately in PR #535.) CHANGELOG updated.	2026-05-11 23:40:55 -04:00
rUv	f49c722764	chore(repo): rename rust-port/wifi-densepose-rs → v2/ (flatten to one level) (#427 ) The Rust port lived two directories deep (rust-port/wifi-densepose-rs/) without any sibling under rust-port/ that warranted the extra level. Move the whole workspace up to v2/ to match v1/ (Python) at the same depth and shorten every cd / build command across the repo. git mv preserves history for all tracked files. 60 files updated for path references (CI workflows, ADRs, docs, scripts, READMEs, internal .claude-flow state). Two manual fixes for relative-cd paths in CLAUDE.md and ADR-043 that became wrong after the depth change (cd ../.. → cd ..). Validated: - cargo check --workspace --no-default-features → clean (after target/ nuke; the gitignored target/ was carried by the OS rename and had hard-coded old paths in build scripts) - cargo test --workspace --no-default-features → 1,539 passed, 0 failed, 8 ignored (same totals as pre-rename) - ESP32-S3 on COM7 → still streaming live CSI (cb #40300, RSSI -64 dBm) After-merge follow-up: contributors should `rm -rf v2/target` once and let cargo regenerate from the new path.	2026-04-25 21:28:13 -04:00

16 Commits