Sub-deliverable 8.2 of the benchmark/optimization milestone. Quantizes the
843,834-param "half" WiFlow-STD pose model (half_best.pth) to int8 two ways and
MEASURES the accuracy/size trade-off vs fp32 under ONE locked normalization
(ADR-173 torso-diameter PCK, upstream calculate_pck use_torso_norm=True), on the
same seed-42 file-level 70/15/15 test split that produced the fp32 sweep numbers.
MEASURED on ruvultra (RTX 5080, torch 2.11.0+cu128, fbgemm; clean test, torso-PCK):
fp32 96.62% pck@20 99.47% pck@50 0.008981 mpjpe 3.351 MB
int8 PTQ static 40.98% pck@20 94.98% pck@50 0.038262 mpjpe 1.046 MB (-55.64pp)
int8 QAT (3 ep) 67.48% pck@20 98.69% pck@50 0.026548 mpjpe 1.043 MB (-29.15pp)
Verdict (honest no): int8 is NOT a win at the strict PCK@20 edge target. Static
PTQ collapses; QAT recovers a large share but still loses 29 pp @20 for a 3.2x
size win — keep fp32/fp16 on the edge. Disclosed: QAT fake-quant val pck@20 was
83.45% but converted int8 scores 67.48% (~16pp convert_fx gap, reported honestly).
Deliverables:
- v2/crates/wifi-densepose-train/scripts/quantize_half_int8.py (reproducible:
header carries the exact ssh command + run date; QAT primary, static PTQ fallback)
- docs/adr/ADR-175-int8-quantization-half-pose-model-measured.md (MEASURED table,
locked normalization, QAT-vs-PTQ labeling, verdict, reproduction, limitations)
- CHANGELOG [Unreleased] ### Added entry
No production Rust or signal-pipeline change. Python deterministic proof unchanged
(f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a, bit-exact).
* feat(train): metric-locked PCK/MPJPE accuracy harness — resolve PCK-definition ambiguity
The SOTA brief (docs/research/sota-nn-train-benchmark-brief.md §1/§3.1/§4)
identifies metric ambiguity as the single biggest threat to any beyond-SOTA
claim: three PCK@20 numbers (96.09% WiFlow-STD image-normalized, 81.63%
AetherArena torso-PCK, 61.1% GraphPose-Fi standard PCK) cannot be lined up
because each silently uses a different normalization. The project was retracted
twice over this (a withdrawn 92.9% used absolute pixels, not torso).
New src/accuracy.rs makes the normalizer explicit, selectable, and carried with
every reported number:
- PckNormalization enum: TorsoDiameter (standard MM-Fi/GraphPose-Fi hip↔hip),
BoundingBoxDiagonal (looser WiFlow-STD image-normalized), AbsolutePixels(t)
(retracted convention, reproducible + clearly non-comparable).
- pck_at(pred, gt, vis, k, normalization) — one canonical PCK reusing the
metrics_core geometric primitives (no duplicate kernel).
- mpjpe(pred, gt, vis) — 2D/3D, mm.
- PoseAccuracy { pck_at: BTreeMap<u8,f32>, mpjpe, normalization, n_keypoints,
n_frames } via accuracy_report(frames, ks, normalization) — an unlabeled PCK
number is structurally impossible.
17 hand-computed deterministic tests (no GPU, no datasets) prove the harness
arithmetic, including the key proof that identical predictions score
0.50 / 1.00 / 0.75 under the three normalizations, plus graceful degenerate
handling (zero torso, empty frames, NaN coords — no panic, never false-perfect).
This is measurement infrastructure, NOT an accuracy claim. Public API worth an
ADR — needs ADR slot 173 (parent to write).
wifi-densepose-train lib 191→206, test_metrics 12→14, 0 failed; full workspace
green (exit 0); Python deterministic proof unchanged
(f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a).
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(adr): ADR-173 — metric-locked PCK/MPJPE accuracy harness
Documents the accuracy harness (committed 3a8b2ed13) that resolves the
PCK-definition ambiguity flagged as the #1 beyond-SOTA risk in the SOTA brief
(#1090): three historical numbers (96/81.6/61) used three unstated
normalizations. The harness makes normalization explicit + selectable
(PckNormalization enum) and every reported number carries its definition.
Key proof: identical predictions → 0.50/1.00/0.75 under torso/bbox/abs.
Co-Authored-By: claude-flow <ruv@ruv.net>
* refactor(train): hoist canonical PCK/OKS to un-gated metrics_core; fold test_metrics onto production (ADR-155 M1 §8)
ADR-155 §8 deferred item: test_metrics.rs reference kernels validated
production against their OWN reimplementation — a test that cannot catch a
canonical-impl bug (both could be wrong the same way).
- Extract canonical_torso_size / pck_canonical / oks_canonical / sigmas /
bounding_box_diagonal into a new NON-tch-gated `metrics_core` module, so
the single metric definition is reachable under
`cargo test --no-default-features` (the `metrics` module is tch-gated).
`metrics` re-exports every item → still exactly ONE implementation.
- Rewrite tests/test_metrics.rs to assert the PRODUCTION pck_canonical /
oks_canonical equal hand-computed fixtures (not a reimplementation):
canonical_pck_matches_hand_computed_fixture (corr=3/total=4/pck=0.75),
hip↔hip normalizer pin, zero-visible⇒0.0, OKS perfect⇒1.0, fake-Gold pin.
- Keep an INDEPENDENT raw-threshold reference kernel only as a differential
cross-check: test_kernel_agrees_with_canonical asserts it AGREES with
canonical where torso==1.0 (genuine cross-check, not duplication).
Grade: MEASURED. test_metrics 10→12 tests, 0 failed.
Co-Authored-By: claude-flow <ruv@ruv.net>
* fix(sensing-server): relabel divergent live PCK/OKS so they're never conflated with canonical (ADR-155 M1 §2.1/§8 Goal C)
Goal C named training_api.rs:804 (torso-HEIGHT PCK). Auditing it surfaced
TWO findings the ADR-155 §1 table missed:
1. training_api.rs is an ORPHAN file — not declared `mod` in lib.rs OR main.rs,
so it does NOT compile into the crate. It does not drive the live server.
2. The REAL live `best_pck`/`best_oks` (main.rs training path → RVF metadata
JSON read by model_manager.rs) come from trainer.rs:
- `pck_at_threshold` = RAW-threshold PCK, NO torso normalization (the most
divergent kind), printed/serialized as bare "PCK@0.2".
- `oks_map` calls `oks_single(area=1.0)` = the EXACT fake-Gold pattern
ADR-155 §2.1 claimed closed elsewhere — still live here, inflating best_oks.
Resolution = RELABEL (torso/raw math is load-bearing on different data; the
pub fns can't be renamed without breaking API; sensing-server has no train/
ndarray dep). Honest unify is a tracked §8 backlog item.
- training_api.rs: `compute_pck` → `compute_pck_torso_height` + divergence doc;
val_pck/best_pck/val_oks struct fields documented as torso-HEIGHT proxies;
logs say `pck_torso_h@0.2`. Test torso_pck_is_labelled_distinctly_from_canonical.
- trainer.rs (LIVE): `pck_at_threshold` documented raw-unnormalized; `oks_map`
area=1.0 flagged fake-Gold; test pck_at_threshold_is_raw_unnormalized_not_canonical.
- main.rs: live print relabelled `pck_raw@0.2` / `oks_map(area=1.0 proxy)`.
No wire-format field renames (back-compat); no pub-API rename (no silent break).
Grade: MEASURED (relabel + divergence pinned). sensing-server 450→451 lib tests, 0 failed.
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(adr-155): mark §8 metric items RESOLVED + audit map + honest §1 under-count correction (M1b Goals A/D)
- §8.1: full PCK/OKS audit map (every def: file:line, basis, canonical/
legacy/distinct), the two §8 items marked RESOLVED with resolution+why.
- Honest finding: §1's "seven divergent metrics" was an UNDER-count —
sensing-server's LIVE trainer.rs has a raw-unnormalized PCK and an
area=1.0 fake-Gold OKS the table omitted, and the file §8 named
(training_api.rs) is orphaned dead code. §9 honest-limits updated.
- Goal D: metrics.rs *_v2 variants confirmed caller-less + deprecated;
noted for future cleanup, NOT deleted (public API, tch-gated).
- CHANGELOG [Unreleased] Fixed entry.
Co-Authored-By: claude-flow <ruv@ruv.net>
* feat(ruvector): RaBitQ Pass-2 randomized rotation + topk bugfix (ADR-156 §8)
Implements the deferred "Multi-bit / Extended RaBitQ Pass 2" backlog item
from ADR-156 §8: a deterministic randomized orthogonal rotation applied
before sign-quantization, the published RaBitQ construction (Gao & Long,
SIGMOD 2024).
Rotation construction: Fast Hadamard Transform + seeded ±1 sign flips
("HD" / randomized Hadamard), O(d log d) time and O(d) memory — a dense
d×d rotation is O(d²) and infeasible at the 65,535-d the wire format
provisions for. Pads to the next power of two; SplitMix64 seeds the sign
stream so index-time and query-time rotations are bit-identical.
API is additive and backward-compatible: Pass 1 (`from_embedding`) is
untouched; Pass 2 is opt-in via `Sketch::from_embedding_rotated` and
`SketchBank::with_rotation` (+ `insert_embedding` / `topk_embedding` /
`novelty_embedding` helpers that rotate consistently). Default behaviour
is unchanged.
While building the Pass-2 coverage harness, found and fixed a PRE-EXISTING
correctness bug in `SketchBank::topk`: the n>k heap path used
`BinaryHeap<Reverse<(d,id)>>` (a min-heap) but treated its peek as the
max, so it returned the k FARTHEST sketches as "nearest". The shipped unit
tests only exercised the n≤k fast path, so it went unnoticed. Fixed to a
plain max-heap; pinned by `topk_heap_path_returns_nearest` and
`tight_clusters_give_high_coverage_with_overfetch` (the latter measured
0.072 on the old code).
New tests (+17, 100→117 in the crate): rotation determinism/norm-preservation
(`rotation_is_deterministic_for_seed`, `rotation_preserves_norm`), Pass-2
shape-compatibility, `pass2_coverage_not_worse_than_pass1`, and a
deterministic coverage report.
MEASURED top-K coverage (anisotropic planted-cluster fixture, cosine ground
truth; dim=128 N=2048 K=8 64 clusters noise=0.35 128 queries):
candidate_k=K=8 : Pass1 36.13% -> Pass2 46.39% (both << 90% bar)
candidate_k=24 : Pass1 83.89% -> Pass2 91.60% (Pass2 clears 90%)
candidate_k=32 : Pass1/Pass2 100%
Honest result: rotation consistently helps (+10pp at strict K), but neither
pass clears the ADR-084 90% bar at candidate_k==K on this distribution.
Pass 2 reaches 90% only with ~3x over-fetch (the ADR-084 "candidate set"
deployment pattern). Multi-bit Pass 3 evaluated separately.
Co-Authored-By: claude-flow <ruv@ruv.net>
* feat(ruvector): multi-bit Pass-3 experiment + ADR-156/084 measured results
Adds the multi-bit half of the ADR-156 §8 "Multi-bit / Extended RaBitQ"
item as a MEASURED experiment (coverage::measure_multibit): rotate, then
b-bit uniform scalar-quantize each coord, rank by L1 over codes — the
natural multi-bit generalization of hamming. Measures the bit/coverage
tradeoff the backlog item asked for.
MEASURED at the strict bar (candidate_k=K=8, anisotropic planted-cluster
fixture, cosine ground truth):
Pass1 (1-bit, no rot) 36.13% 16 B/vec
Pass2 (1-bit, rot) 46.39% 16 B/vec
Pass3 (rot, 2-bit) 54.39% 32 B/vec
Pass3 (rot, 3-bit) 66.70% 48 B/vec
Pass3 (rot, 4-bit) 74.22% 64 B/vec
Honest: multi-bit monotonically helps but even 4-bit (4x memory) reaches
only 74% at the strict bar — neither rotation nor <=4-bit multi-bit clears
the strict-K 90% bar on this distribution. The bar is met via over-fetch
(Pass2 @ candidate_k=24). Tests: multibit_tradeoff_report,
multibit_1bit_matches_pass2_approx (+ sanity that 1-bit ~= Pass-2).
Docs:
- ADR-156 §8 item #2 marked RESOLVED-PARTIAL; §5 #2 grade CLAIMED ->
MEASURED-on-our-hardware; new §10 with full measured tables, the topk
bugfix disclosure, and graded deferred sub-items.
- ADR-084: "Pass 2" section answering the rotation open-question with
measured numbers + the topk bug note.
- CHANGELOG [Unreleased]: Added (Pass-2 milestone) + Fixed (topk heap).
Co-Authored-By: claude-flow <ruv@ruv.net>
Each fix ships a test that would have caught the bug:
- ruview_metrics OKS: derive scale from GT extent (no s=1.0 fake-Gold), reject
s<=0, bound the loop to array extents (no panic on short/adversarial input).
- config.validate(): UPPER bounds on window_frames/subcarriers/backbone_channels/
heatmap_size/keypoints/body_parts/batch_size + reject negative gpu_device_id
(closes the config-OOM class); defaults+presets still validate.
- subcarrier.rs: graceful fallback instead of panic on non-contiguous input.
- ablation.rs latency_percentiles: total_cmp + NaN guard (no partial_cmp unwrap).
- tensor.rs softmax(axis): normalize per-lane along the given axis (was whole-
tensor), out-of-range axis -> NnError; fixes densepose per-pixel probs.
- translator.rs apply_attention: real scaled-dot-product attention (was a
uniform 1/seq_len stub that made any "with attention" ablation == without);
mis-shaped checkpoint projections rejected.
Co-Authored-By: claude-flow <ruv@ruv.net>
The deterministic proof self-certified: PASS on any loss decrease (incl. 1e-9
noise) and a missing expected hash defaulted to PASS.
- MIN_LOSS_DECREASE=1e-4: a run counts as learning only above float noise; a
noise-only pipeline now FAILS.
- is_pass() requires hash_matches==Some(true); no-hash -> SKIP (exit 2), never
PASS. verify-training fails fast on a sub-margin loss before the hash compare,
so a missing baseline cannot mask a non-learning pipeline.
Documented honestly: the proof certifies reproducibility/determinism on a
synthetic dataset, NOT that real data produced the weights nor that any accuracy
claim is met. Tests: no_committed_hash_is_skip_not_pass,
submargin_loss_change_fails_even_without_hash,
committed_matching_hash_with_real_decrease_passes.
Co-Authored-By: claude-flow <ruv@ruv.net>
contrastive_step/entropy_step wrote a fake gradient (grad += v*0.01) unrelated
to the stated objective, so any "TTA improves the metric" was unsupported. The
*_loss functions are now pure evaluators of the real objective; adapt() descends
them with a central finite-difference gradient of that exact loss, so "the
adaptation loss decreases" is now a real, reproducible measurement.
Honest scope caveat (documented): this minimizes a self-supervised proxy over a
LoRA bottleneck on raw CSI; it is NOT wired to the pose model and there is NO
measured end-to-end PCK gain on WiFi pose from this path.
Tests: contrastive_loss_decreases, entropy_loss_decreases (real gradient steps
don't increase the loss), reported_loss_is_the_real_objective_not_a_placeholder.
Co-Authored-By: claude-flow <ruv@ruv.net>
MM-Fi windows are stride-1 (~99% overlap), so an index-level split leaks; and
bin/train.rs validated real training against a SYNTHETIC val set, making any
printed PCK meaningless on two counts.
- MmFiDataset::subject_disjoint_split partitions whole subjects -> the two views
share no subject and no window (leak-free by construction, deterministic per
seed). assert_split_leak_free verifies subject- AND window-disjointness and is
called inside the split so a leaky split is never handed out.
- bin/train.rs now prefers the real split; the synthetic path is a labelled
run_smoke_test ("[SMOKE-TEST] DO NOT REPORT") reachable only as a fallback.
- New DatasetError::InvalidSplit.
Tests prove disjointness, determinism, single-subject/bad-fraction rejection,
and that the validator catches an injected subject leak.
Co-Authored-By: claude-flow <ruv@ruv.net>
Collapse the four PCK and three OKS implementations into a single source of
truth — pck_canonical (torso hip↔hip, COCO/ADR-152 convention validated at
~96% PCK@20 in benchmarks/wiflow-std) and oks_canonical (scale from GT pose
extent). MetricsAccumulator, compute_pck/_per_joint/_oks, aggregate_metrics and
the deprecated *_v2 path all route through them, so Trainer::evaluate() and the
bench definition agree.
Fixes two claim-inflating bugs, each pinned by a regression test:
- zero-visible-joint PCK was 1.0 (false-perfect) -> now 0.0
- OKS s=1.0 on normalized coords made OKS~=1.0 for any pose ("fake Gold tier")
-> scale now derived from the pose; a 3x-torso-wrong pose yields OKS<0.2
Divergent local kernels (training_bench raw-threshold, sensing-server
torso-height) annotated "DO NOT USE for reported metrics". Legitimately changed
test expectations (all-coincident "perfect" fixtures are correctly unscoreable;
all-invisible -> 0.0) updated with comments citing the finding.
Co-Authored-By: claude-flow <ruv@ruv.net>
* docs(research): add RuView beyond-SOTA system review (00)
First document of the beyond-SOTA research series: capability audit of
the current RuView engine with role-to-crate maturity matrix, ruvsense
module inventory, gap analysis, and risk register.
https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH
* docs(research): add beyond-SOTA architecture design (02, in progress)
https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH
* docs(research): finalize beyond-SOTA architecture (02)
https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH
* docs(research): add benchmark/validation methodology snapshot (03)
https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH
* docs(research): add beyond-SOTA series index with validation results; changelog
README index ties the 5 research docs together with the session's
measured validation evidence: 2,797 workspace tests / 0 failed, Python
proof PASS (bit-exact), and paired pre/post criterion CIR benchmarks.
https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH
* perf(signal): precompute CIR warm-start system; hoist tomography solver allocs
Exact, determinism-safe optimizations (bit-identical float results):
- cir.rs: diag(PhiH Phi)+lambda*I and its CSR matrix depend only on Phi
and lambda (fixed at CirEstimator::new) but were rebuilt every frame
(O(K*G) pass + CSR allocation). Now built once in new() via
build_warm_start_system; summation order unchanged.
- tomography.rs: ISTA gradient buffer hoisted out of the 100-iteration
loop (fill(0.0) reset) and the Frobenius Lipschitz bound moved from
per-reconstruct to construction.
Verified: signal 456 tests green; engine 11/11 green including
cycle_is_deterministic and witness-stability tests. Criterion paired
pre/post: cir_estimate/he40 -3.9% (p<0.01), multiband -1.2/-1.4%.
https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH
* fix(worldgraph): bound SemanticState growth with deterministic retention
StreamingEngine::process_cycle appended one SemanticState belief per cycle
with no eviction — ~1.7M nodes/day at 20 Hz (beyond-SOTA roadmap finding #6).
Add WorldGraph::prune_semantic_states(max): deterministic eviction of the
oldest beliefs by (valid_from_unix_ms, id); structural nodes (rooms, zones,
sensors, anchors, tracks, events) are never eligible. Wire it into the
engine after each belief append (DEFAULT_SEMANTIC_RETENTION = 7,200, ~6 min
at 20 Hz; set_semantic_retention to tune). The WorldGraph holds current
beliefs; durable history is the recorder's job, so no audit data is lost.
3 new tests: end-to-end bounded growth, oldest-only eviction, deterministic
equal-timestamp tie-break. Workspace gate: 2,865 passed, 0 failed.
https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH
* feat(sensing-server): route live frames through the governed StreamingEngine
Closes the live-trust-path gap (ADR-136 section 8, beyond-SOTA system review):
the running server fused live CSI with the bare MultistaticFuser, while the
privacy/provenance/witness control plane (ADR-135..146) only ever ran on
synthetic in-test frames. The privacy control plane was therefore bypassable
on the real path.
New engine_bridge module drives StreamingEngine::process_cycle from the
server's live NodeState map, reusing the existing NodeState -> MultiBandCsiFrame
conversion. It lazily wires each contributing node as a WorldGraph sensor
(idempotent), bounds belief growth via the retention cap, and forwards explicit
timestamps/calibration ids so the path stays deterministic and replayable.
Wired additively into both live ESP32/WiFi fusion sites in main.rs via a
split-borrow off the write guard, so person-count behavior is unchanged; the
latest BLAKE3 witness is stored on AppState. Every published belief now carries
evidence + model + calibration + privacy decision and a deterministic witness.
Adds wifi-densepose-engine/-worldgraph/-bfld/-geo deps. 6 new bridge tests
(witnessed belief with full provenance, cross-run determinism, idempotent node
registration, retention bound, privacy-mode propagation). sensing-server suite
430+128 green; workspace gate 2,904 passed / 0 failed.
https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH
* feat(train): falsifiable occupancy benchmark with anti-overfitting gate
Makes the presence/person-count "beyond SOTA" claim falsifiable in code
instead of aspirational (the unfalsifiability gap from the beyond-SOTA system
review). occupancy_bench grades predictions vs ground truth and gates a SOTA
claim behind one claim_allowed invariant requiring ALL of:
- DataProvenance::Measured — synthetic/mock data is scorable for regression
but never claimable (anti-mock-contamination; the CLAUDE.md Kconfig-bug
lesson made structural).
- A leak-free EvalSplit — validate() refuses any split where a subject OR
environment id appears in both train and test (subject leakage /
per-environment overfitting).
- n_test >= min_test_samples (small-N guard).
- Presence F1 whose bootstrap-CI lower bound (deterministic seeded splitmix64)
clears the threshold — not the point estimate.
- Count MAE within threshold.
The claim string is unreadable except through the gate (NO_CLAIM otherwise),
same discipline as the ruview-gamma acceptance gate. What remains is data, not
method: a frozen, SHA-pinned, subject/environment-disjoint measured replay set
turns the claim into a passing/failing test.
Lives in wifi-densepose-train (the eval bounded context, alongside ablation/
eval/metrics). 10 tests cover each refusal path; warning-clean under the
crate's missing_docs lint. Workspace gate 2,914 passed / 0 failed. Doc 03
updated.
https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH
* feat(engine): per-room adapter provenance + drift-to-recalibration advisor
Closes the trust-chain gap where an ~11 KB per-room LoRA adapter (ADR-150
section 3.4) could silently change inference without the witness noticing:
provenance carried only "rfenc-v<N>" with no notion of adapter identity.
- StreamingEngine::set_room_adapter(AdapterInfo): pins the adapter's
content-derived id into provenance model_version
("rfenc-v1+adapter:<id>") — and therefore into the BLAKE3 witness — so
swapping or clearing adapter weights always shifts the witness. Engine test
proves base -> adapter -> other-adapter -> cleared all witness differently
and cleared == base.
- RecalibrationAdvisor: recommends re-running the ADR-135 empty-room baseline
/ refitting the room adapter on sustained low fusion coherence (streak
threshold, default 60 cycles ~ 3 s at 20 Hz) or an ADR-142 change-point.
Surfaced as TrustedOutput::recalibration_recommended, stored on the
sensing-server AppState alongside the witness at both live fusion sites.
- Bridge plumbing: EngineBridge::{set_room_adapter, clear_room_adapter} +
live-path test that the adapter id flows into the live witness.
Scope note (honest): this is the deployable provenance/trigger half of the
"retrained model" roadmap item. Fitting the adapter itself runs in the
existing external calibration service (aether-arena/calibration/); a trained
RF-encoder checkpoint still does not exist in-tree.
Engine 15 tests, bridge 7 tests. Workspace gate: 2,918 passed / 0 failed.
https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH
* fix(mat): gate api module behind its feature — standalone no-default-features builds
pub mod api was unconditional while its only dependency, serde, is optional
behind the 'api' feature, so any build without default features failed with
101 unresolved-serde errors (masked in --workspace runs by feature
unification). The api module and its create_router/AppState re-export are now
cfg(feature = "api")-gated with docsrs annotations.
All combos compile: bare --no-default-features (was 101 errors, now 0),
--no-default-features --features api, and full default (177 tests pass).
Workspace gate: 2,918 passed / 0 failed.
https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH
* perf(signal): opt-in FFT operator for the CIR ISTA solver (8-14x measured)
Phi is a sub-DFT, so each ISTA mat-vec can run as one length-G FFT
(O(G log G)) instead of a dense O(K*G) product — the dominant-latency-hazard
finding from the beyond-SOTA optimization roadmap.
New CirConfig::fft_operator, default FALSE: the dense path stays the
bit-exact witness default. The FFT evaluates the same sums in a different
order, so enabling it shifts float results in the last bits and requires
regenerating any pinned witness — strictly opt-in per deployment.
FftOperator (rustfft, planned once at CirEstimator::new, scratch buffers
reused across the ISTA loop) dispatches inside ista_solve:
Phi x = scale * forward-FFT(x) sampled at bins (k_idx mod G)
Phi^H v = scale * unnormalised inverse-FFT of v scattered into those bins
Warm-start and Lipschitz estimation stay dense at construction.
Measured (criterion, same run, same machine):
ht20: 2.22 ms -> 265 us (8.4x)
ht40: 10.26 ms -> 717 us (14.3x)
The real HE40 grid (K=484, G=1452) scales further per the O(K*G)/O(G log G)
ratio.
3 new tests: FFT<->dense matvec equivalence to float tolerance on ht20 and
he40 grids; end-to-end dominant-tap agreement on a single-path frame; all
default configs keep FFT off. New cir_estimate_fft bench group.
Workspace gate: 2,921 passed / 0 failed (default path bit-exact, witnesses
unchanged).
https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH
* feat(core): canonical frame decoder — capture-to-claim replay (ADR-136)
The encode half of the ADR-136 frame contract existed (ComplexSample,
to_canonical_bytes, witness_hash) but there was no decoder: a captured
canonical frame could be witnessed but never reconstructed, blocking
replay-from-capture.
CsiFrame::from_canonical_bytes is the exact inverse: same id, metadata,
complex payload, and witness hash (tested as the round-trip law AC7 — the
replayed frame re-encodes byte-identically). Amplitude/phase are recomputed
from the payload (projections, not independent state). Every malformed-input
class fails closed (AC8): header truncation -> Truncated, payload truncation
-> PayloadMismatch, unknown discriminants, non-UTF-8 device id, trailing
bytes. Nil calibration uuid decodes as None per the documented encoding.
Core: 36 tests pass. Workspace gate: 2,937 passed / 0 failed.
https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH
* feat(engine): dynamic min-cut mesh partition guard (ruvector-mincut)
Maintains an exact min-cut over the live mesh coupling graph — nodes are
sensing nodes, coupling is the product of fusion attention weights — and
surfaces per cycle, as TrustedOutput::mesh:
- cut value: the global "how close is the array to partitioning" number,
a structural measure per-node heuristics miss;
- weak side: which specific nodes would split off (failure/jamming triage,
feeds ADR-032 posture);
- at-risk flag: counts as a structural event for the drift->recalibration
advisor (alongside ADR-142 change-points).
Degenerate cases fail toward risk: a node with zero coupling is reported as
already partitioned (cut 0, that node as the weak side).
Measured cost policy (criterion, 12-node mesh — the honest part):
- weights quantized (1/64) + change-gated: steady-state cycles do ZERO graph
work and reuse the cached cut (~7.3 us, ~23x cheaper than building);
- on any real change a full exact rebuild (~171 us) is used, because ONE
DynamicMinCut delete+insert measured ~240 us — the subpolynomial machinery
amortizes on much larger graphs, so rebuild-on-change is the measured
optimum at mesh scale (one-edge case -28% after switching policy);
- full process_cycle with the guard: ~33 us for 4 nodes vs the 50 ms budget.
9 mesh_guard tests (weak-node detection, steady-state zero updates,
sub-quantum gating, join/drop rebuild, determinism, disconnection) + an
engine-level wiring test (down-weighted node -> weak side -> recalibration).
Engine 24 tests; workspace gate 2,946 passed / 0 failed.
https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH
* feat(engine): mesh partition risk demotes privacy + enters the witness (ADR-032)
Completes the mesh-guard integration: its at_risk signal was advisory-only
(fed the recalibration advisor). It now also contributes to the ADR-141
privacy demotion alongside fusion- and array-level contradictions — a mesh
close to partitioning makes the fused belief less trustworthy, so the cycle
emits at a more restricted class (monotonic; information only removed).
Because effective_class feeds the BLAKE3 witness, a fragmenting array now
shifts the witness: partition risk is auditable, not just logged. The mesh
computation moved ahead of the demotion step in process_cycle; mesh_guard_mut
exposes risk-threshold tuning.
Test: a forced-risk 3-node cycle demotes PrivateHome Anonymous->Restricted
and shifts the witness vs a clean baseline. Engine 25 tests; workspace gate
2,947 passed / 0 failed.
https://claude.ai/code/session_01MjBucx95K4BuUxZi8NWwRH
* fix: public-PR review findings — privacy-path honesty, gate holes, mesh-guard cliff
- sensing-server: engine errors logged+counted (no silent swallow), trust
state exposed via status surface, privacy-demotion claims aligned with
the actual parallel-audit-path behavior
- occupancy_bench: vacuous-F1 hole closed (degenerate test sets fail with
their own criterion); CI-lower-bound test made probative
- mesh_guard: quantization scaled to observed coupling range — >=65-node
balanced meshes no longer permanently at_risk (regression test)
- engine: both wiring tests made probative (same-topology witness compare,
deterministic risk-crossing fixture)
- mat: axum/tokio optional behind api; real serde feature (api enables it)
- core: canonical decoder strict (non-zero reserved bytes and nil UUID
rejected — injective on accepted domain, forged-bytes tests)
- CHANGELOG: un-spliced the FFT/adapter bullet mangle
Co-Authored-By: claude-flow <ruv@ruv.net>
* chore: strip private-track references for public PR
Reword the occupancy-benchmark changelog bullet to drop a cross-reference
to the private research track, and restore the WorldGraph retention bullet
header that was glued onto the preceding MAT bullet.
Co-Authored-By: claude-flow <ruv@ruv.net>
* chore: lockfile refresh for cherry-picked feature set
Co-Authored-By: claude-flow <ruv@ruv.net>
---------
Co-authored-by: Claude <noreply@anthropic.com>
Per direction "remove the initial number, optimize for benchmark first" + "include
witness chain capabilities for proof and repeatability analysis":
- Empty board, no seeded numbers: ledger seeds to genesis only. Every result is a
real scoring-pipeline witness; RuView gets no hand-entered baseline.
- Real model scoring: aa_score_runner now loads predictions + an eval split
(--split/--pred) and scores them through the real ruview_metrics pose harness —
not just a synthetic fixture. Committed public smoke split (fixtures/smoke_*.json).
- Witness chain: each score emits a witness = inputs_sha256 (binds it to the exact
inputs) + proof_sha256 (cross-platform-stable score hash) + harness_version.
- Repeatability analysis: --repeat N runs the harness N× and fails if it ever
yields >=2 distinct proof hashes (16/16 identical locally).
- Witness ledger: ledger/ledger_tools.py — append-only, hash-chained, tamper-
evident (seed/append/verify); editing any past row breaks the chain.
- CI gate extended: determinism + repeatability(16) + real-scoring smoke + ledger
chain verify on every PR.
Co-Authored-By: claude-flow <ruv@ruv.net>
AetherArena ("AA") — the official, project-agnostic Spatial-Intelligence Benchmark
(ADR-149, Accepted). Iteration 1 of the long-horizon build:
- ADR-149 accepted: name locked (ruvnet/aether-arena), v0 metrics locked
(pose/presence/latency/determinism), dataset legality resolved (MM-Fi CC BY-NC
only; Wi-Pose excluded). Adds four-part framing, threat model, arena_score
formula, submission state machine, neutrality/governance, and the §7 acceptance test.
- aa_score_runner: deterministic scorer bin reusing the real ruview_metrics pose
harness on a fixed seed=42 fixture → RuViewTier-style verdict + cross-platform
SHA-256 proof hash. Builds --no-default-features (no torch/GPU). VERDICT: PASS.
- CI harness gate: .github/workflows/aether-arena-harness.yml runs the scorer on
every PR — the "PR that runs the harness as part of the build" requirement.
- Scaffold: aether-arena/{README,VERIFY,STATUS}.md + schema/aa-submission.toml.
- Horizon record persisted (.claude-flow/horizons/aether-arena-aa.json).
Infra = the deliverable; model SOTA (MM-Fi PCK@20) is a separate effort blocked on
ADR-079 data collection, tracked as a stretch goal, not an infra exit.
Co-Authored-By: claude-flow <ruv@ruv.net>
Publishing the additive changes from PRs #536/#537 to crates.io:
- `signal_features` module — wires `wifi-densepose-signal` into the pipeline
(audit #1/#2)
- `TrainingConfig::for_subcarriers` / `ht40_192()` / `multiband_168()` presets
+ the real `MmFiDataset` loader integration test (audit #4/#6/#7)
No public API removals or changes — additive only, so 0.3.0 -> 0.3.1 is
semver-correct. No other workspace crate depends on `wifi-densepose-train`,
so this is a standalone bump.
Co-Authored-By: claude-flow <ruv@ruv.net>
Closes the remaining doable items from the 2026-05-11 training-pipeline audit:
#6 (CSI format default = 56-sc / 1 NIC) + #7 (multi-band 168-sc mesh not in
config): new `TrainingConfig::for_subcarriers(native, target)` plus named
presets `mmfi()` (114→56), `ht40_192()` (≈192-sc ESP32 HT40 → 56) and
`multiband_168()` (168-sc ADR-078 multi-band mesh → 56). Non-MM-Fi CSI shapes
are now first-class instead of requiring manual `native_subcarriers` /
`num_subcarriers` overrides; the field docs list the supported source counts
and the multi-NIC mapping (a 2–3-node mesh currently rides on `n_rx` until a
dedicated node dimension lands). Model input width stays `num_subcarriers`; the
presets only vary the resampling input.
#4 (proof.rs uses synthetic data): reframed — a deterministic proof *must* use
a reproducible source, so `verify-training` correctly stays on
`SyntheticCsiDataset`. The real gap was that nothing exercised the on-disk
`MmFiDataset` path. New `tests/test_real_loader.rs` writes synthetic CSI to
`.npy` files in the `MmFiDataset::discover` layout, loads it back, and checks
the resulting `CsiSample` — covering the no-interp case, the
subcarrier-interpolation branch, and the empty-root case. Adds `ndarray` /
`ndarray-npy` as dev-deps for the fixture writing.
cargo check + cargo test -p wifi-densepose-train --no-default-features: clean,
all existing tests green, 3 new loader tests + the updated config doctest pass.
Purely additive — no model-shape change, no tch-module change.
Addresses three findings from the 2026-05-11 training-pipeline audit:
#1/#2 — `wifi-densepose-signal` was a phantom dependency of `wifi-densepose-train`
(listed in Cargo.toml, never imported), and vitals/CSI signal features were
absent from the pipeline. New module `wifi_densepose_train::signal_features`:
`extract_signal_features(&Array4<f32>, &Array4<f32>) -> Array1<f32>` (and the
convenience method `CsiSample::signal_features()`) runs a windowed observation's
centre frame through `wifi_densepose_signal::features::FeatureExtractor`,
producing a fixed-length (FEATURE_LEN=12) amplitude / phase-coherence / PSD
feature vector — the hook for a future vitals / multi-task supervision head
(breathing- and heart-rate-band power are read off the PSD summary). The vector
is produced on demand and is not yet fed back into the loss; wiring it as a
training target is the documented follow-up. `wifi-densepose-signal` is now an
actually-used dependency. 5 new tests (2 unit in signal_features.rs, 3
integration in tests/test_dataset.rs); existing wifi-densepose-train tests
unchanged and green.
#3 — `docs/huggingface/MODEL_CARD.md` presented PIR/BME280 environmental-sensor
weak-label fine-tuning as a current capability; there is no env-sensor
ingestion in the training pipeline. Marked that path as planned/not-implemented
in the training-steps list and the data-provenance section.
(#5 — README's "92.9% PCK@20" overclaim — fixed separately in PR #535.)
CHANGELOG updated.
The Rust port lived two directories deep (rust-port/wifi-densepose-rs/)
without any sibling under rust-port/ that warranted the extra level.
Move the whole workspace up to v2/ to match v1/ (Python) at the same
depth and shorten every cd / build command across the repo.
git mv preserves history for all tracked files. 60 files updated for
path references (CI workflows, ADRs, docs, scripts, READMEs, internal
.claude-flow state). Two manual fixes for relative-cd paths in
CLAUDE.md and ADR-043 that became wrong after the depth change
(cd ../.. → cd ..).
Validated:
- cargo check --workspace --no-default-features → clean (after target/
nuke; the gitignored target/ was carried by the OS rename and had
hard-coded old paths in build scripts)
- cargo test --workspace --no-default-features → 1,539 passed, 0 failed,
8 ignored (same totals as pre-rename)
- ESP32-S3 on COM7 → still streaming live CSI (cb #40300, RSSI -64 dBm)
After-merge follow-up: contributors should `rm -rf v2/target` once and
let cargo regenerate from the new path.