Commit Graph

16 Commits

Author SHA1 Message Date
ruv 2e4461d64d release: bump 9 crates changed in the beyond-SOTA sweep for crates.io
vitals/wifiscan/hardware/nn 0.3.0->0.3.1, ruvector 0.3.1->0.3.2,
signal 0.3.2->0.3.3, train 0.3.1->0.3.2, mat 0.3.0->0.3.1,
sensing-server 0.3.1->0.3.2.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-11 22:41:21 -04:00
ruv aa3a6725a6 fix(train,nn): Tier-2 correctness/security — metric scale, OOM bounds, panics (ADR-155 §Tier-2)
Each fix ships a test that would have caught the bug:
- ruview_metrics OKS: derive scale from GT extent (no s=1.0 fake-Gold), reject
  s<=0, bound the loop to array extents (no panic on short/adversarial input).
- config.validate(): UPPER bounds on window_frames/subcarriers/backbone_channels/
  heatmap_size/keypoints/body_parts/batch_size + reject negative gpu_device_id
  (closes the config-OOM class); defaults+presets still validate.
- subcarrier.rs: graceful fallback instead of panic on non-contiguous input.
- ablation.rs latency_percentiles: total_cmp + NaN guard (no partial_cmp unwrap).
- tensor.rs softmax(axis): normalize per-lane along the given axis (was whole-
  tensor), out-of-range axis -> NnError; fixes densepose per-pixel probs.
- translator.rs apply_attention: real scaled-dot-product attention (was a
  uniform 1/seq_len stub that made any "with attention" ablation == without);
  mis-shaped checkpoint projections rejected.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-11 19:57:32 -04:00
ruv 84e2c920fd fix(train): proof margin + committed-hash requirement (ADR-155 §Tier-1.4)
The deterministic proof self-certified: PASS on any loss decrease (incl. 1e-9
noise) and a missing expected hash defaulted to PASS.

- MIN_LOSS_DECREASE=1e-4: a run counts as learning only above float noise; a
  noise-only pipeline now FAILS.
- is_pass() requires hash_matches==Some(true); no-hash -> SKIP (exit 2), never
  PASS. verify-training fails fast on a sub-margin loss before the hash compare,
  so a missing baseline cannot mask a non-learning pipeline.

Documented honestly: the proof certifies reproducibility/determinism on a
synthetic dataset, NOT that real data produced the weights nor that any accuracy
claim is met. Tests: no_committed_hash_is_skip_not_pass,
submargin_loss_change_fails_even_without_hash,
committed_matching_hash_with_real_decrease_passes.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-11 19:57:16 -04:00
ruv 7fb3e33557 fix(train): rapid_adapt real finite-difference gradients, not a fake step (ADR-155 §Tier-1.3)
contrastive_step/entropy_step wrote a fake gradient (grad += v*0.01) unrelated
to the stated objective, so any "TTA improves the metric" was unsupported. The
*_loss functions are now pure evaluators of the real objective; adapt() descends
them with a central finite-difference gradient of that exact loss, so "the
adaptation loss decreases" is now a real, reproducible measurement.

Honest scope caveat (documented): this minimizes a self-supervised proxy over a
LoRA bottleneck on raw CSI; it is NOT wired to the pose model and there is NO
measured end-to-end PCK gain on WiFi pose from this path.

Tests: contrastive_loss_decreases, entropy_loss_decreases (real gradient steps
don't increase the loss), reported_loss_is_the_real_objective_not_a_placeholder.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-11 19:57:15 -04:00
ruv 2a2a2c5b06 fix(train): leak-free subject-disjoint split + synthetic-val disclosure (ADR-155 §Tier-1.2)
MM-Fi windows are stride-1 (~99% overlap), so an index-level split leaks; and
bin/train.rs validated real training against a SYNTHETIC val set, making any
printed PCK meaningless on two counts.

- MmFiDataset::subject_disjoint_split partitions whole subjects -> the two views
  share no subject and no window (leak-free by construction, deterministic per
  seed). assert_split_leak_free verifies subject- AND window-disjointness and is
  called inside the split so a leaky split is never handed out.
- bin/train.rs now prefers the real split; the synthetic path is a labelled
  run_smoke_test ("[SMOKE-TEST] DO NOT REPORT") reachable only as a fallback.
- New DatasetError::InvalidSplit.

Tests prove disjointness, determinism, single-subject/bad-fraction rejection,
and that the validator catches an injected subject leak.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-11 19:56:57 -04:00
ruv 50b657459f fix(train): unify 7 divergent PCK/OKS into one canonical metric (ADR-155 §Tier-1.1)
Collapse the four PCK and three OKS implementations into a single source of
truth — pck_canonical (torso hip↔hip, COCO/ADR-152 convention validated at
~96% PCK@20 in benchmarks/wiflow-std) and oks_canonical (scale from GT pose
extent). MetricsAccumulator, compute_pck/_per_joint/_oks, aggregate_metrics and
the deprecated *_v2 path all route through them, so Trainer::evaluate() and the
bench definition agree.

Fixes two claim-inflating bugs, each pinned by a regression test:
- zero-visible-joint PCK was 1.0 (false-perfect) -> now 0.0
- OKS s=1.0 on normalized coords made OKS~=1.0 for any pose ("fake Gold tier")
  -> scale now derived from the pose; a 3x-torso-wrong pose yields OKS<0.2

Divergent local kernels (training_bench raw-threshold, sensing-server
torso-height) annotated "DO NOT USE for reported metrics". Legitimately changed
test expectations (all-coincident "perfect" fixtures are correctly unscoreable;
all-invisible -> 0.0) updated with comments citing the finding.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-11 19:56:44 -04:00
rUv 17471e93ff
ADR-152: WiFi-Pose SOTA 2026 intake — WiFlow-STD benchmark, Rust integrations, ADR-153 802.11bf layer, efficiency frontier (#1008)
* feat(calibration): NodeGeometry transceiver-geometry recording (ADR-152 §2.1.1)

PerceptAlign-motivated geometry capture at enrollment: per-node optional
records (position, antenna orientation, inter-node distances, acquisition
method) — recorded when known, never required. Event-sourced via
EnrollmentEvent::GeometryRecorded (latest recording wins); persisted on
SpecialistBank with serde defaults so pre-ADR-152 bank JSON loads cleanly
(fixture-proven, and geometry-free banks serialize byte-shape-identical
to the old schema); threaded through MultiNodeMixture as data only — the
learned geometry embeddings and algorithmic fusion use are §2.1.2,
deliberately deferred until the ADR-151 P6 LoRA heads exist.

Geometry recorded from now on means banks captured today remain usable
for layout-conditioned training later — you can't retroactively add
geometry to data you didn't record.

8 new tests (3 geometry, 2 anchor, 2 bank, 1 multistatic) + full-loop
extension (2-node geometry, one tape-measured + one unknown, surviving
the bank JSON round-trip the runtime loads from). 50/50 calibration
(both feature configs) + 23 CLI tests green.

Co-Authored-By: RuFlo <ruv@ruv.net>

* feat(training): two-checkerboard camera↔room calibration for ADR-079 labels (ADR-152 §2.1.3)

Defends the camera-supervised pipeline against PerceptAlign's
"coordinate overfitting": MediaPipe keypoints were emitted in raw camera
coordinates with no shared frame and no transceiver-geometry metadata —
the exact label shape that memorizes deployment layout and collapses
cross-layout.

- scripts/calibrate-camera-room.py + calibration_lib.py: OpenCV
  two-checkerboard calibration → versioned bundle JSON (intrinsics,
  camera→room extrinsics, checkerboard spec, transceiver geometry,
  sha256 calibration_id). Intrinsics resolve from file > cache >
  multi-view computation > loud-warning 2-view fallback.
- collect-ground-truth.py --calibration <bundle>: every sample gains
  keypoints_room (unit bearing rays from the camera center in the room
  frame — documented projective alignment; raw image coords preserved
  so training chooses), camera_origin_room, calibration_id, and the
  transceiver geometry stamp. Without the flag, output is byte-identical
  to before (tested) + a one-line ADR-152 warning.

Design finding (recorded for ADR-152): a single planar checkerboard's
corner grid is centrosymmetric — the reversed corner ordering fits a
ghost camera pose with IDENTICAL reprojection error, so per-board flip
disambiguation is mathematically ill-posed. solve_two_board_extrinsics
solves the joint wall+floor set over all 4 flip combinations, where the
minimum is unique — an independent reason the TWO-checkerboard method is
required, beyond what PerceptAlign states.

15 headless pytest tests green (synthetic corners: extrinsics recovery
incl. ghost resolution, bundle round-trip + hash stability, ray
transforms w/ distortion + cross-resolution, no-calibration byte
identity).

Co-Authored-By: RuFlo <ruv@ruv.net>

* feat(benchmarks): WiFlow-STD reproduction harness + measurement (a) results (ADR-152 §2.2)

Shipped checkpoint REFUTED (0.08% PCK@20, wrong keypoint normalization);
6 reproducibility defects documented (broken imports, corrupted dataset
tail with float32-max garbage that NaN-poisons fp16 BatchNorm, unreachable
test phase). After repairs, retraining with upstream defaults reproduces
96.09% PCK@20 full-test / 96.61% corruption-free (published 97.25%) on
RTX 5080. Claims graded MEASURED-EQUIVALENT; 2.23M params + ~0.055 GFLOPs
verified. Third-party code/weights/data stay out of tree (gitignored).

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat: ADR-152 Rust integrations + ADR-153 802.11bf protocol model

- calibration: GeometryEmbedding — 32-slot permutation-invariant NodeGeometry
  featurization for future LoRA-head conditioning (ADR-152 §2.1.2); derived
  SpecialistBank::geometry_embedding() accessor; 59 tests
- train: MaePretrainConfig + patchify/random-mask with UNSW measured recipe
  (80% masking, (30,3) patches; ADR-152 §2.3, arXiv 2511.18792); strict
  no-truncate/no-NaN policy; proptest properties
- train: WiFlowStdModel — tch-gated port of the verified ~96%-PCK@20
  WiFlow-STD architecture (ADR-152 §2.2 beyond-SOTA); ungated param formula
  pinned to 2,225,042; 15/17-keypoint support; 239 crate tests
- hardware: ieee80211bf forward-compatibility protocol model (ADR-153):
  SpecProfile gates, SensingCapabilities negotiation, required ConsentMode,
  session FSM, SensingTransport + SimTransport + OpportunisticCsiBridge;
  full acceptance checklist covered; 156+4 tests
- deps: ruvector bumps per ADR-152 §2.6 survey (mincut/solver 2.0.6,
  attention 2.1.0, gnn 2.2.0); vendor/ruvector synced to a083bd77f
- docs: ADR-153 accepted; ADR-152 §2.2 status, §2.4 amendment, §2.6 added

Workspace: 162 test suites green (--no-default-features); Python proof PASS.
Known pre-existing flake: homecore-api env_empty_falls_back_to_defaults
(unserialized env-var mutation) — untouched, follow-up.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs: CHANGELOG + CLAUDE.md entries for ADR-152 integrations and ADR-153

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix(train): repair tch-backend bit-rot — gated path compiles and tests run again

Mechanical API refresh against current tch: Vec::from(Tensor) -> try_from
(+ explicit flatten), numel() usize cast, Rem/div ops -> remainder() /
divide_scalar_mode(floor) — the latter fixed a silent true-division bug in
heatmap argmax decoding; clamp(1.0, f64::MAX) -> clamp_min (torch 2.x scalar
overflow panic); petgraph EdgeRef import; missing EvalMetrics and
verify_checkpoint_dir APIs that tests documented. wiflow_std roundtrip test
uses safetensors (.pt _save_parameters roundtrip broken in torch 2.11
Windows). Gated: 349 passed (incl. all 20 wiflow_std); ungated: unchanged.
Known pre-existing: gaussian-heatmap convention mismatch (2 tests), proof
seed race under parallel threads — documented, deliberate follow-ups.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(train): WiFlow-STD PyTorch->tch weight import + numerical parity proof

export_to_safetensors.py maps the retrained checkpoint (295 tensors -> 248
mapped, param sum exactly 2,225,042; num_batches_tracked dropped) into a
tch-loadable safetensors plus a deterministic parity fixture. Gated #[ignore]
integration test loads it strictly and asserts forward-pass agreement:
max abs diff 1.192e-7 on the seed-42 fixture. dump_variable_names test makes
the tch name layout authoritative. Zero architecture discrepancies found.

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix: workflow-review findings — BN gamma init, ThresholdParams serde, init docs

Concurrent validation workflow (2 review lanes + adversarial verification,
13 agents): 5 confirmed findings, 3 refuted. Fixes:
- wiflow_std: pin BatchNorm gamma to 1.0 (tch default draws Uniform(0,1) —
  silently halves activations in from-scratch training; loaded checkpoints
  unaffected, parity re-verified after the change)
- wiflow_std: document the conv-init divergences vs the reference's
  effective kaiming_normal(fan_out) re-init (from-scratch dynamics only)
- ieee80211bf: ThresholdParams deserialization validates via try_from so
  the <=100 invariant holds for untrusted payloads (+ rejection test)

Benchmarks (release, ruvzen): GeometryEmbedding 1.84us/call (542k/s),
MAE tokenization 7.38us/window (135k/s), 802.11bf FSM 8.9M events/s —
nothing suspicious.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(adr): ADR-152 §2.1.4 gate resolved — PerceptAlign repo MIT, dataset on HF

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(benchmarks): edge optimization measured + measurement (b) blocked + 92.9% retraction

Edge optimization (ADR-152 optimize track): ONNX Runtime fp32 is the CPU
latency win (3.2 ms/window, ~3.4x faster than torch, parity 2.4e-7); ORT
dynamic int8 reaches 2.44 MB (paper's ~2.2 MB claim plausible only via
conv-capable toolchains; -0.16pt PCK@20, +18% MPJPE, 2x slower); torch
dynamic quant converts 0% of this conv-only model; fp16 halves storage free
but is slower on CPU.

Measurement (b) BLOCKED-ON-DATA: only 1,077 paired ESP32 windows exist
(stop rule <2k). Forensic recheck of the surviving April holdout RETRACTS
the ADR-079 '92.9% PCK@20' figure: constant-output model, absolute (not
torso) threshold, 69 near-static frames — mean predictor scores 100% under
that protocol; torso-PCK@20 is 19.1%. Corroborates PR #535. Stale citations
removed from user-guide, readme-details, ADR-152 §2.1.3; no-citation rule
extended to ADR-079 accuracy claims. Unblock: >=2k-window multi-pose paired
session + torso-PCK re-baseline.

Co-Authored-By: claude-flow <ruv@ruv.net>

* docs(user-guide): corrected camera-supervised collection tutorial

Step 0 CSI-rate check + session-length math (window yield = frames/20 —
the May session's 8x under-delivery was a ~12 Hz CSI rate, not an aligner
bug); two-checkerboard calibration step (ADR-152 §2.1.3); pose-variety and
confidence guidance; torso-normalized PCK + temporal-split + pred-variance
eval protocol (lessons from the 92.9% retraction); scale presets re-keyed
to realistic window counts.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(benchmarks): static PTQ int8 (calibrated) results + overnight capture script

Conv-only static QDQ beats dynamic int8 on accuracy (PCK@20 96.61-96.63%
vs 96.52%, MPJPE +10% vs +18% over fp32) at ~equal size/latency; all-ops
QDQ strictly worse (int8 activations through attention glue). Entropy
calibration verified bit-identical to MinMax on this data. Deployment:
ONNX fp32 for speed (3.2ms), static conv-only QDQ for smallest (2.53MB).

Also: scripts/overnight-empty-capture.py — segmented UDP CSI recorder for
empty-room baselines (no glob collisions, detach-safe).

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(benchmarks): measurement (b) MEASURED — optimization transfer only, mean-pose baseline wins

WiFlow-STD fine-tuned on 2,046 fresh single-room ESP32 paired windows
(temporal 70/15/15, 70->540 adapter, K=17): pretrained-init 65% PCK@20 vs
scratch 0% (optimization transfer) but frozen-trunk ~0% (no feature
transfer), and NOTHING beats the mean-pose baseline (95.9% PCK@20 —
single subject, near-static normalized coords). Honesty gates held: pred
std 0.0113 (non-constant model) but mean-baseline dominance means no
citable CSI->pose capability from this data. ADR-152 open question 1
answered partially; definitive answer needs multi-subject/position data.

Two new aligner findings: heterogeneous csi_shape with silent zero-padding
(~20%), and extractCsiMatrix's transposed shape label (frame-major data,
[nSc, nFrames] label) — fixes pending.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(benchmarks): efficiency sweep MEASURED — half model dominates full reference

Compact WiFlow-STD variants on the same data/split/protocol: half (843,834
params, 0.38x) strictly dominates the 2.23M reference (PCK@20 96.62 vs
96.61, PCK@50 99.47 vs 99.11, MPJPE 0.00898 vs 0.0094) — the published
architecture is over-parameterized for its own benchmark. quarter (338k)
96.05%; tiny (56,290 params, 1/39.5) holds 94.11% — a ~220KB fp32 edge
candidate. In-domain caveats recorded; cross-domain untested.

Co-Authored-By: claude-flow <ruv@ruv.net>

* feat(train): compact WiFlow-STD presets in Rust + tiny edge artifact (ADR-152)

WiFlowStdConfig gains half()/quarter()/tiny() mirroring the overnight sweep
exactly: TcnGroupsMode (Fixed/Gcd/Depthwise), input_pw_groups, derived
stride schedule and decoder-mid (all default to upstream behavior; legacy
serde JSON unaffected). Param formulas pin to trained ground truth first
try: 843,834 / 338,600 / 56,290; default 2,225,042 pin and 1.192e-7 parity
unchanged. 248 tests green.

Tiny edge artifact (tiny_edge_bench.py): ONNX fp32 = 295 KB, 0.66 ms/win
(~1,500/s CPU), 94.11% PCK@20 (matches sweep clean-test exactly; parity
1.49e-7). Static int8 is a bad trade at this scale (-1.43pt, +19% MPJPE,
-16% size, slower) — recorded as negative result. Export note: width-16
breaks AdaptiveAvgPool((15,1)) TorchScript export; replaced by exact
mean+matmul equivalent, proven by parity.

Co-Authored-By: claude-flow <ruv@ruv.net>

* fix: resolve all 10 confirmed code-review findings (7-angle review, 20/20 verified)

wiflow_std: min_feature_width (default 15) replaces the keypoints->stride
coupling — for_keypoints(17) now provably builds the trained [2,2,2,2]
graph and pools 15->17, matching the validated Python protocol (pinned by
tests); param_count() total on invalid configs; random_mask returns Result
and rejects non-finite/out-of-range ratios; trainer checkpoints switched
to safetensors (.pt VarStore roundtrip broken on Windows torch 2.11).

ieee80211bf: SBP proxy now re-triggers instances and relays reports via
Action::RelaySbpReport -> SensingFrame::SbpReport (clients consume via
their existing path); missed_instances reset on success = consecutive
semantics; SessionTable gains a guarded SBP entry point + unknown-id drop
counter; initiator-role sessions reject inbound setup/SBP requests
(RejectedNotSupported) closing the idle hijack; StartSetup/StartSbp
outside Idle return InvalidStateForCommand; SBP validation unified
through evaluate_setup with a 1:1 SetupStatus->SbpStatus mapping.
events.rs split out to honor the 500-line cap.

calibration/cli: enrollment geometry now actually reaches trained banks —
both production call sites attach .with_geometry; --geometry flag on
train-room and POST /enroll/geometry + train-body geometry on
calibrate-serve give production a recording surface; geometry-free banks
log the ADR-152 §2.1.2 note.

benchmarks: corruption masks committed as ground truth (unregenerable
after in-place cleaning; verified bit-identical regeneration from the
pristine copy) + generate_corruption_masks.py producer; _bench_common.py
dedups the 5x-copied shim/evaluate/seed/remap (post-refactor PCK@20
re-verified equal to the last digit); remote scripts get the mmap patch;
tiny_edge --calib validated multiple-of-64; onnx_bench --help no longer
executes (and overwrote) the export — artifact restored byte-exact.

Workspace: 2,963 tests passed, 0 failed; Python proof PASS.

Co-Authored-By: claude-flow <ruv@ruv.net>

* ci: build workspace tests without debuginfo — runner disk exhaustion

The combined 38-crate debug target exceeds the GitHub runner's disk
('final link failed: No space left on device'); the same tree measured
151GB locally with full debuginfo. CARGO_PROFILE_{DEV,TEST}_DEBUG=0
shrinks the target ~5-10x; debuginfo serves no purpose in CI test runs.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-11 17:02:23 -04:00
rUv 29de574e63
Beyond-SOTA engine/signal/train improvements: mesh partition guard, FFT CIR solver, canonical frame decoder, falsifiable occupancy benchmark, governed streaming, adapter provenance (#1018)
* docs(research): add RuView beyond-SOTA system review (00)

First document of the beyond-SOTA research series: capability audit of
the current RuView engine with role-to-crate maturity matrix, ruvsense
module inventory, gap analysis, and risk register.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* docs(research): add beyond-SOTA architecture design (02, in progress)

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* docs(research): finalize beyond-SOTA architecture (02)

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* docs(research): add benchmark/validation methodology snapshot (03)

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* docs(research): add beyond-SOTA series index with validation results; changelog

README index ties the 5 research docs together with the session's
measured validation evidence: 2,797 workspace tests / 0 failed, Python
proof PASS (bit-exact), and paired pre/post criterion CIR benchmarks.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* perf(signal): precompute CIR warm-start system; hoist tomography solver allocs

Exact, determinism-safe optimizations (bit-identical float results):

- cir.rs: diag(PhiH Phi)+lambda*I and its CSR matrix depend only on Phi
  and lambda (fixed at CirEstimator::new) but were rebuilt every frame
  (O(K*G) pass + CSR allocation). Now built once in new() via
  build_warm_start_system; summation order unchanged.
- tomography.rs: ISTA gradient buffer hoisted out of the 100-iteration
  loop (fill(0.0) reset) and the Frobenius Lipschitz bound moved from
  per-reconstruct to construction.

Verified: signal 456 tests green; engine 11/11 green including
cycle_is_deterministic and witness-stability tests. Criterion paired
pre/post: cir_estimate/he40 -3.9% (p<0.01), multiband -1.2/-1.4%.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* fix(worldgraph): bound SemanticState growth with deterministic retention

StreamingEngine::process_cycle appended one SemanticState belief per cycle
with no eviction — ~1.7M nodes/day at 20 Hz (beyond-SOTA roadmap finding #6).

Add WorldGraph::prune_semantic_states(max): deterministic eviction of the
oldest beliefs by (valid_from_unix_ms, id); structural nodes (rooms, zones,
sensors, anchors, tracks, events) are never eligible. Wire it into the
engine after each belief append (DEFAULT_SEMANTIC_RETENTION = 7,200, ~6 min
at 20 Hz; set_semantic_retention to tune). The WorldGraph holds current
beliefs; durable history is the recorder's job, so no audit data is lost.

3 new tests: end-to-end bounded growth, oldest-only eviction, deterministic
equal-timestamp tie-break. Workspace gate: 2,865 passed, 0 failed.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* feat(sensing-server): route live frames through the governed StreamingEngine

Closes the live-trust-path gap (ADR-136 section 8, beyond-SOTA system review):
the running server fused live CSI with the bare MultistaticFuser, while the
privacy/provenance/witness control plane (ADR-135..146) only ever ran on
synthetic in-test frames. The privacy control plane was therefore bypassable
on the real path.

New engine_bridge module drives StreamingEngine::process_cycle from the
server's live NodeState map, reusing the existing NodeState -> MultiBandCsiFrame
conversion. It lazily wires each contributing node as a WorldGraph sensor
(idempotent), bounds belief growth via the retention cap, and forwards explicit
timestamps/calibration ids so the path stays deterministic and replayable.

Wired additively into both live ESP32/WiFi fusion sites in main.rs via a
split-borrow off the write guard, so person-count behavior is unchanged; the
latest BLAKE3 witness is stored on AppState. Every published belief now carries
evidence + model + calibration + privacy decision and a deterministic witness.

Adds wifi-densepose-engine/-worldgraph/-bfld/-geo deps. 6 new bridge tests
(witnessed belief with full provenance, cross-run determinism, idempotent node
registration, retention bound, privacy-mode propagation). sensing-server suite
430+128 green; workspace gate 2,904 passed / 0 failed.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* feat(train): falsifiable occupancy benchmark with anti-overfitting gate

Makes the presence/person-count "beyond SOTA" claim falsifiable in code
instead of aspirational (the unfalsifiability gap from the beyond-SOTA system
review). occupancy_bench grades predictions vs ground truth and gates a SOTA
claim behind one claim_allowed invariant requiring ALL of:

- DataProvenance::Measured — synthetic/mock data is scorable for regression
  but never claimable (anti-mock-contamination; the CLAUDE.md Kconfig-bug
  lesson made structural).
- A leak-free EvalSplit — validate() refuses any split where a subject OR
  environment id appears in both train and test (subject leakage /
  per-environment overfitting).
- n_test >= min_test_samples (small-N guard).
- Presence F1 whose bootstrap-CI lower bound (deterministic seeded splitmix64)
  clears the threshold — not the point estimate.
- Count MAE within threshold.

The claim string is unreadable except through the gate (NO_CLAIM otherwise),
same discipline as the ruview-gamma acceptance gate. What remains is data, not
method: a frozen, SHA-pinned, subject/environment-disjoint measured replay set
turns the claim into a passing/failing test.

Lives in wifi-densepose-train (the eval bounded context, alongside ablation/
eval/metrics). 10 tests cover each refusal path; warning-clean under the
crate's missing_docs lint. Workspace gate 2,914 passed / 0 failed. Doc 03
updated.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* feat(engine): per-room adapter provenance + drift-to-recalibration advisor

Closes the trust-chain gap where an ~11 KB per-room LoRA adapter (ADR-150
section 3.4) could silently change inference without the witness noticing:
provenance carried only "rfenc-v<N>" with no notion of adapter identity.

- StreamingEngine::set_room_adapter(AdapterInfo): pins the adapter's
  content-derived id into provenance model_version
  ("rfenc-v1+adapter:<id>") — and therefore into the BLAKE3 witness — so
  swapping or clearing adapter weights always shifts the witness. Engine test
  proves base -> adapter -> other-adapter -> cleared all witness differently
  and cleared == base.
- RecalibrationAdvisor: recommends re-running the ADR-135 empty-room baseline
  / refitting the room adapter on sustained low fusion coherence (streak
  threshold, default 60 cycles ~ 3 s at 20 Hz) or an ADR-142 change-point.
  Surfaced as TrustedOutput::recalibration_recommended, stored on the
  sensing-server AppState alongside the witness at both live fusion sites.
- Bridge plumbing: EngineBridge::{set_room_adapter, clear_room_adapter} +
  live-path test that the adapter id flows into the live witness.

Scope note (honest): this is the deployable provenance/trigger half of the
"retrained model" roadmap item. Fitting the adapter itself runs in the
existing external calibration service (aether-arena/calibration/); a trained
RF-encoder checkpoint still does not exist in-tree.

Engine 15 tests, bridge 7 tests. Workspace gate: 2,918 passed / 0 failed.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* fix(mat): gate api module behind its feature — standalone no-default-features builds

pub mod api was unconditional while its only dependency, serde, is optional
behind the 'api' feature, so any build without default features failed with
101 unresolved-serde errors (masked in --workspace runs by feature
unification). The api module and its create_router/AppState re-export are now
cfg(feature = "api")-gated with docsrs annotations.

All combos compile: bare --no-default-features (was 101 errors, now 0),
--no-default-features --features api, and full default (177 tests pass).
Workspace gate: 2,918 passed / 0 failed.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* perf(signal): opt-in FFT operator for the CIR ISTA solver (8-14x measured)

Phi is a sub-DFT, so each ISTA mat-vec can run as one length-G FFT
(O(G log G)) instead of a dense O(K*G) product — the dominant-latency-hazard
finding from the beyond-SOTA optimization roadmap.

New CirConfig::fft_operator, default FALSE: the dense path stays the
bit-exact witness default. The FFT evaluates the same sums in a different
order, so enabling it shifts float results in the last bits and requires
regenerating any pinned witness — strictly opt-in per deployment.

FftOperator (rustfft, planned once at CirEstimator::new, scratch buffers
reused across the ISTA loop) dispatches inside ista_solve:
  Phi x   = scale * forward-FFT(x) sampled at bins (k_idx mod G)
  Phi^H v = scale * unnormalised inverse-FFT of v scattered into those bins
Warm-start and Lipschitz estimation stay dense at construction.

Measured (criterion, same run, same machine):
  ht20: 2.22 ms -> 265 us  (8.4x)
  ht40: 10.26 ms -> 717 us (14.3x)
The real HE40 grid (K=484, G=1452) scales further per the O(K*G)/O(G log G)
ratio.

3 new tests: FFT<->dense matvec equivalence to float tolerance on ht20 and
he40 grids; end-to-end dominant-tap agreement on a single-path frame; all
default configs keep FFT off. New cir_estimate_fft bench group.

Workspace gate: 2,921 passed / 0 failed (default path bit-exact, witnesses
unchanged).

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* feat(core): canonical frame decoder — capture-to-claim replay (ADR-136)

The encode half of the ADR-136 frame contract existed (ComplexSample,
to_canonical_bytes, witness_hash) but there was no decoder: a captured
canonical frame could be witnessed but never reconstructed, blocking
replay-from-capture.

CsiFrame::from_canonical_bytes is the exact inverse: same id, metadata,
complex payload, and witness hash (tested as the round-trip law AC7 — the
replayed frame re-encodes byte-identically). Amplitude/phase are recomputed
from the payload (projections, not independent state). Every malformed-input
class fails closed (AC8): header truncation -> Truncated, payload truncation
-> PayloadMismatch, unknown discriminants, non-UTF-8 device id, trailing
bytes. Nil calibration uuid decodes as None per the documented encoding.

Core: 36 tests pass. Workspace gate: 2,937 passed / 0 failed.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* feat(engine): dynamic min-cut mesh partition guard (ruvector-mincut)

Maintains an exact min-cut over the live mesh coupling graph — nodes are
sensing nodes, coupling is the product of fusion attention weights — and
surfaces per cycle, as TrustedOutput::mesh:

- cut value: the global "how close is the array to partitioning" number,
  a structural measure per-node heuristics miss;
- weak side: which specific nodes would split off (failure/jamming triage,
  feeds ADR-032 posture);
- at-risk flag: counts as a structural event for the drift->recalibration
  advisor (alongside ADR-142 change-points).

Degenerate cases fail toward risk: a node with zero coupling is reported as
already partitioned (cut 0, that node as the weak side).

Measured cost policy (criterion, 12-node mesh — the honest part):
- weights quantized (1/64) + change-gated: steady-state cycles do ZERO graph
  work and reuse the cached cut (~7.3 us, ~23x cheaper than building);
- on any real change a full exact rebuild (~171 us) is used, because ONE
  DynamicMinCut delete+insert measured ~240 us — the subpolynomial machinery
  amortizes on much larger graphs, so rebuild-on-change is the measured
  optimum at mesh scale (one-edge case -28% after switching policy);
- full process_cycle with the guard: ~33 us for 4 nodes vs the 50 ms budget.

9 mesh_guard tests (weak-node detection, steady-state zero updates,
sub-quantum gating, join/drop rebuild, determinism, disconnection) + an
engine-level wiring test (down-weighted node -> weak side -> recalibration).
Engine 24 tests; workspace gate 2,946 passed / 0 failed.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* feat(engine): mesh partition risk demotes privacy + enters the witness (ADR-032)

Completes the mesh-guard integration: its at_risk signal was advisory-only
(fed the recalibration advisor). It now also contributes to the ADR-141
privacy demotion alongside fusion- and array-level contradictions — a mesh
close to partitioning makes the fused belief less trustworthy, so the cycle
emits at a more restricted class (monotonic; information only removed).

Because effective_class feeds the BLAKE3 witness, a fragmenting array now
shifts the witness: partition risk is auditable, not just logged. The mesh
computation moved ahead of the demotion step in process_cycle; mesh_guard_mut
exposes risk-threshold tuning.

Test: a forced-risk 3-node cycle demotes PrivateHome Anonymous->Restricted
and shifts the witness vs a clean baseline. Engine 25 tests; workspace gate
2,947 passed / 0 failed.

https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH

* fix: public-PR review findings — privacy-path honesty, gate holes, mesh-guard cliff

- sensing-server: engine errors logged+counted (no silent swallow), trust
  state exposed via status surface, privacy-demotion claims aligned with
  the actual parallel-audit-path behavior
- occupancy_bench: vacuous-F1 hole closed (degenerate test sets fail with
  their own criterion); CI-lower-bound test made probative
- mesh_guard: quantization scaled to observed coupling range — >=65-node
  balanced meshes no longer permanently at_risk (regression test)
- engine: both wiring tests made probative (same-topology witness compare,
  deterministic risk-crossing fixture)
- mat: axum/tokio optional behind api; real serde feature (api enables it)
- core: canonical decoder strict (non-zero reserved bytes and nil UUID
  rejected — injective on accepted domain, forged-bytes tests)
- CHANGELOG: un-spliced the FFT/adapter bullet mangle

Co-Authored-By: claude-flow <ruv@ruv.net>

* chore: strip private-track references for public PR

Reword the occupancy-benchmark changelog bullet to drop a cross-reference
to the private research track, and restore the WorldGraph retention bullet
header that was glued onto the preceding MAT bullet.

Co-Authored-By: claude-flow <ruv@ruv.net>

* chore: lockfile refresh for cherry-picked feature set

Co-Authored-By: claude-flow <ruv@ruv.net>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-06-11 16:08:54 -04:00
ruv 483bfa4660 feat(aether-arena): benchmark-first scorer + witness chain + repeatability (M2/M5/M7)
Per direction "remove the initial number, optimize for benchmark first" + "include
witness chain capabilities for proof and repeatability analysis":

- Empty board, no seeded numbers: ledger seeds to genesis only. Every result is a
  real scoring-pipeline witness; RuView gets no hand-entered baseline.
- Real model scoring: aa_score_runner now loads predictions + an eval split
  (--split/--pred) and scores them through the real ruview_metrics pose harness —
  not just a synthetic fixture. Committed public smoke split (fixtures/smoke_*.json).
- Witness chain: each score emits a witness = inputs_sha256 (binds it to the exact
  inputs) + proof_sha256 (cross-platform-stable score hash) + harness_version.
- Repeatability analysis: --repeat N runs the harness N× and fails if it ever
  yields >=2 distinct proof hashes (16/16 identical locally).
- Witness ledger: ledger/ledger_tools.py — append-only, hash-chained, tamper-
  evident (seed/append/verify); editing any past row breaks the chain.
- CI gate extended: determinism + repeatability(16) + real-scoring smoke + ledger
  chain verify on every PR.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-30 16:59:11 -04:00
ruv a6808568a2 feat(aether-arena): ADR-149 spatial-intelligence benchmark — scorer + CI harness gate (M1-M4)
AetherArena ("AA") — the official, project-agnostic Spatial-Intelligence Benchmark
(ADR-149, Accepted). Iteration 1 of the long-horizon build:

- ADR-149 accepted: name locked (ruvnet/aether-arena), v0 metrics locked
  (pose/presence/latency/determinism), dataset legality resolved (MM-Fi CC BY-NC
  only; Wi-Pose excluded). Adds four-part framing, threat model, arena_score
  formula, submission state machine, neutrality/governance, and the §7 acceptance test.
- aa_score_runner: deterministic scorer bin reusing the real ruview_metrics pose
  harness on a fixed seed=42 fixture → RuViewTier-style verdict + cross-platform
  SHA-256 proof hash. Builds --no-default-features (no torch/GPU). VERDICT: PASS.
- CI harness gate: .github/workflows/aether-arena-harness.yml runs the scorer on
  every PR — the "PR that runs the harness as part of the build" requirement.
- Scaffold: aether-arena/{README,VERIFY,STATUS}.md + schema/aa-submission.toml.
- Horizon record persisted (.claude-flow/horizons/aether-arena-aa.json).

Infra = the deliverable; model SOTA (MM-Fi PCK@20) is a separate effort blocked on
ADR-079 data collection, tracked as a stretch goal, not an infra exit.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-30 16:47:22 -04:00
ruv 0f336b7d36 feat(train): ADR-145 ablation eval harness + privacy-leakage/latency metrics (#849)
- train/ablation.rs: FeatureSet matrix (CSI/CIR/CSI+CIR/+Doppler/+BFLD/+UWB);
  AblationMetrics (presence acc, loc err, FP/FN, latency p50/p95, privacy
  leakage, cross-room degradation) derived deterministically from VariantRun
- membership_inference_leakage(): MIA proxy = |AUC-0.5|*2 (0 indistinguishable,
  1 perfectly separable); latency_percentiles_ms (nearest-rank); confusion_rates
- AblationReport.to_markdown() (deterministic), csi_cir_beats_csi_only()
  acceptance check
- 5 tests; workspace 0 errors

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-28 23:38:43 -04:00
rUv 004a63e82d
fix(security): audit — fix RUSTSEC vulns, clippy warnings, dead code (#769)
- Upgrade openssl to 0.10.78 (CVE-2026-41676), jsonwebtoken to 9.4
- Suppress unmaintained-only/no-CVE advisories in .cargo/audit.toml
  with per-entry rationale
- Fix all `cargo clippy --all-targets -- -D warnings` errors across
  35 crates: derivable_impls, needless_range_loop, map_or→is_some_and/
  is_none_or, await_holding_lock (drop MutexGuard before .await),
  ptr_arg (&mut Vec→&mut [T]), useless_conversion, approximate_constant
  (2.718→E, 3.14→PI), field_reassign_with_default, manual_inspect,
  useless_vec, lines_filter_map_ok, print_literal, dead_code
- Apply `cargo fmt --all`
- Pre-existing test failure in wifi-densepose-signal
  (test_estimate_occupancy_noise_only) is not introduced by this PR
2026-05-23 05:36:13 -04:00
ruv 6f77b37f5e chore(release): wifi-densepose-train 0.3.0 -> 0.3.1
Publishing the additive changes from PRs #536/#537 to crates.io:
- `signal_features` module — wires `wifi-densepose-signal` into the pipeline
  (audit #1/#2)
- `TrainingConfig::for_subcarriers` / `ht40_192()` / `multiband_168()` presets
  + the real `MmFiDataset` loader integration test (audit #4/#6/#7)

No public API removals or changes — additive only, so 0.3.0 -> 0.3.1 is
semver-correct. No other workspace crate depends on `wifi-densepose-train`,
so this is a standalone bump.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-11 23:59:50 -04:00
rUv c604ca1150
feat(train): TrainingConfig subcarrier-layout presets + real MmFiDataset loader test (#537)
Closes the remaining doable items from the 2026-05-11 training-pipeline audit:

#6 (CSI format default = 56-sc / 1 NIC) + #7 (multi-band 168-sc mesh not in
config): new `TrainingConfig::for_subcarriers(native, target)` plus named
presets `mmfi()` (114→56), `ht40_192()` (≈192-sc ESP32 HT40 → 56) and
`multiband_168()` (168-sc ADR-078 multi-band mesh → 56). Non-MM-Fi CSI shapes
are now first-class instead of requiring manual `native_subcarriers` /
`num_subcarriers` overrides; the field docs list the supported source counts
and the multi-NIC mapping (a 2–3-node mesh currently rides on `n_rx` until a
dedicated node dimension lands). Model input width stays `num_subcarriers`; the
presets only vary the resampling input.

#4 (proof.rs uses synthetic data): reframed — a deterministic proof *must* use
a reproducible source, so `verify-training` correctly stays on
`SyntheticCsiDataset`. The real gap was that nothing exercised the on-disk
`MmFiDataset` path. New `tests/test_real_loader.rs` writes synthetic CSI to
`.npy` files in the `MmFiDataset::discover` layout, loads it back, and checks
the resulting `CsiSample` — covering the no-interp case, the
subcarrier-interpolation branch, and the empty-root case. Adds `ndarray` /
`ndarray-npy` as dev-deps for the fixture writing.

cargo check + cargo test -p wifi-densepose-train --no-default-features: clean,
all existing tests green, 3 new loader tests + the updated config doctest pass.
Purely additive — no model-shape change, no tch-module change.
2026-05-11 23:49:00 -04:00
rUv eaedfded6f
fix(train): wire wifi-densepose-signal into the pipeline; correct MODEL_CARD env-sensor claim (#536)
Addresses three findings from the 2026-05-11 training-pipeline audit:

#1/#2 — `wifi-densepose-signal` was a phantom dependency of `wifi-densepose-train`
(listed in Cargo.toml, never imported), and vitals/CSI signal features were
absent from the pipeline. New module `wifi_densepose_train::signal_features`:
`extract_signal_features(&Array4<f32>, &Array4<f32>) -> Array1<f32>` (and the
convenience method `CsiSample::signal_features()`) runs a windowed observation's
centre frame through `wifi_densepose_signal::features::FeatureExtractor`,
producing a fixed-length (FEATURE_LEN=12) amplitude / phase-coherence / PSD
feature vector — the hook for a future vitals / multi-task supervision head
(breathing- and heart-rate-band power are read off the PSD summary). The vector
is produced on demand and is not yet fed back into the loss; wiring it as a
training target is the documented follow-up. `wifi-densepose-signal` is now an
actually-used dependency. 5 new tests (2 unit in signal_features.rs, 3
integration in tests/test_dataset.rs); existing wifi-densepose-train tests
unchanged and green.

#3 — `docs/huggingface/MODEL_CARD.md` presented PIR/BME280 environmental-sensor
weak-label fine-tuning as a current capability; there is no env-sensor
ingestion in the training pipeline. Marked that path as planned/not-implemented
in the training-steps list and the data-provenance section.

(#5 — README's "92.9% PCK@20" overclaim — fixed separately in PR #535.)

CHANGELOG updated.
2026-05-11 23:40:55 -04:00
rUv f49c722764
chore(repo): rename rust-port/wifi-densepose-rs → v2/ (flatten to one level) (#427)
The Rust port lived two directories deep (rust-port/wifi-densepose-rs/)
without any sibling under rust-port/ that warranted the extra level.
Move the whole workspace up to v2/ to match v1/ (Python) at the same
depth and shorten every cd / build command across the repo.

git mv preserves history for all tracked files. 60 files updated for
path references (CI workflows, ADRs, docs, scripts, READMEs, internal
.claude-flow state). Two manual fixes for relative-cd paths in
CLAUDE.md and ADR-043 that became wrong after the depth change
(cd ../.. → cd ..).

Validated:
- cargo check --workspace --no-default-features → clean (after target/
  nuke; the gitignored target/ was carried by the OS rename and had
  hard-coded old paths in build scripts)
- cargo test --workspace --no-default-features → 1,539 passed, 0 failed,
  8 ignored (same totals as pre-rename)
- ESP32-S3 on COM7 → still streaming live CSI (cb #40300, RSSI -64 dBm)

After-merge follow-up: contributors should `rm -rf v2/target` once and
let cargo regenerate from the new path.
2026-04-25 21:28:13 -04:00