Completes the end-to-end product path: cog-pose-estimation run --config
<cfg> --adapter <room.safetensors> loads the shared base + a per-room LoRA
adapter for calibrated inference. Adds InferenceEngine::with_adapter()
(default weights + adapter) and logs when a calibration adapter is active.
6/6 tests pass.
Co-Authored-By: claude-flow <ruv@ruv.net>
Ports the calibration mechanism (ADR-150 §3.5-3.6, reference impl in
aether-arena/calibration/) into the real product pose engine. The Candle
InferenceEngine now loads an optional per-room adapter safetensors and
applies low-rank deltas (y + (x.A).B) on the fc1/fc2 head at inference.
Architecture-agnostic LoRA; base behaviour unchanged when no adapter.
New API: with_weights_and_adapter(), is_calibrated(). Tested: adapter
detection + output-change integration test (6/6 pass).
Co-Authored-By: claude-flow <ruv@ruv.net>
Operationalizes the campaign's central finding (ADR-150 §3.3-3.6): a frozen
shared base + a ~11KB per-room LoRA adapter from ~100-200 labeled samples
recovers SOTA-level pose in any new room/person. Verified end-to-end:
source-only base zero-shot 3.09% on unseen room -> 74.29% after 200-sample
calibration. Files: model.py (PoseNet+LoRA), calibrate.py, infer.py, README
with measured calibration budget.
Co-Authored-By: claude-flow <ruv@ruv.net>
Decisive capstone: cross-environment (unseen room+people) zero-shot
10.6%, but 5 calibration samples/person -> 60%, 200 -> 73%. The hard
frontier is calibration-soluble, MORE dramatically than cross-subject
(+62.5 vs +12 at K=200). The unsolved-frontier framing was a zero-shot
artifact. Reframes generalization: ship few-shot calibration, not
zero-shot invariance. Recommend accepting ADR-150 re-scoped around the
calibration mechanism.
Co-Authored-By: claude-flow <ruv@ruv.net>
Compared per-room calibration methods at K=200: LoRA rank-8 recovers
63.6->72.5% (SOTA-level) with just 11K params (~11KB), 0.5% the model
size. Validates the ship-base-once + tiny-per-room-adapter mechanism for
the RuView calibration service. Accuracy/size knob documented.
Co-Authored-By: claude-flow <ruv@ruv.net>
Measured cross-subject PCK vs N training subjects: 4->8 = +21pts, but
24->32 = +0.45pt. Saturates ~64%, ~19pt below in-domain. Correction to
'more data': subject-count returns vanish past ~16-20; the residual is
device/room/protocol shift. Re-scope phase-1 capture around DIVERSITY
(rooms/devices/protocols) + few-shot target adaptation, not headcount.
Co-Authored-By: claude-flow <ruv@ruv.net>
Published deployable int4-QAT micro (verified 74.08%, ~20KB) at
ruvnet/wifi-densepose-mmfi-pose/edge. Runs 0.135ms single-thread x86 CPU
(no GPU) - real-time pose without an accelerator. ARM on-device validation
pending fleet availability.
Co-Authored-By: claude-flow <ruv@ruv.net>
Swept model size on MM-Fi random_split: every config from micro (75,237
params, 0.22ms, 74.30%) up beats MultiFormer (72.25%); nano (40K, 0.13ms)
within 0.5pt. Pareto-dominant (smaller AND more accurate than prior SOTA).
Orthogonal to the data-bound accuracy frontier (ADR-150).
Co-Authored-By: claude-flow <ruv@ruv.net>
Measured all near-term levers on the official MM-Fi cross-subject split:
- mixup+TTA+ensemble = best at 64.92% (+0.9 over doc 64.04)
- pose-contrastive foundation pretrain: estimated +5..+12, MEASURED -2.3
(SupCon loss pinned at ln(B) across K/BS/seeds -> same-pose CSI is not
contrastively alignable across subjects)
- instance-norm+SpecAugment -4.6; CORAL/DANN ~0
Conclusion: the 18-pt in-domain<->cross-subject gap is fundamental subject
shift, not algorithmic. Promotes multi-subject data collection to the primary
lever; recommends re-scoping ADR-150 phase 1 around capture.
Co-Authored-By: claude-flow <ruv@ruv.net>
v1 '100% presence accuracy' was on a single-class overnight recording
(6062/6063 'present'). Replaced with v2 encoder's honest label-free
held-out temporal-triplet accuracy (66.4% raw -> 82.3% trained).
Models published to HF; tracking ruvnet/RuView#882.
Co-Authored-By: claude-flow <ruv@ruv.net>
Public face of the benchmark: empty-board leaderboard from the witness ledger,
chain-integrity display, submit/verify/about tabs. Presentation layer per ADR-149
§2.2 (heavy scoring stays in the pinned RuView harness / CI).
Live: https://huggingface.co/spaces/ruvnet/aether-arena
Co-Authored-By: claude-flow <ruv@ruv.net>
Per direction "remove the initial number, optimize for benchmark first" + "include
witness chain capabilities for proof and repeatability analysis":
- Empty board, no seeded numbers: ledger seeds to genesis only. Every result is a
real scoring-pipeline witness; RuView gets no hand-entered baseline.
- Real model scoring: aa_score_runner now loads predictions + an eval split
(--split/--pred) and scores them through the real ruview_metrics pose harness —
not just a synthetic fixture. Committed public smoke split (fixtures/smoke_*.json).
- Witness chain: each score emits a witness = inputs_sha256 (binds it to the exact
inputs) + proof_sha256 (cross-platform-stable score hash) + harness_version.
- Repeatability analysis: --repeat N runs the harness N× and fails if it ever
yields >=2 distinct proof hashes (16/16 identical locally).
- Witness ledger: ledger/ledger_tools.py — append-only, hash-chained, tamper-
evident (seed/append/verify); editing any past row breaks the chain.
- CI gate extended: determinism + repeatability(16) + real-scoring smoke + ledger
chain verify on every PR.
Co-Authored-By: claude-flow <ruv@ruv.net>
AetherArena ("AA") — the official, project-agnostic Spatial-Intelligence Benchmark
(ADR-149, Accepted). Iteration 1 of the long-horizon build:
- ADR-149 accepted: name locked (ruvnet/aether-arena), v0 metrics locked
(pose/presence/latency/determinism), dataset legality resolved (MM-Fi CC BY-NC
only; Wi-Pose excluded). Adds four-part framing, threat model, arena_score
formula, submission state machine, neutrality/governance, and the §7 acceptance test.
- aa_score_runner: deterministic scorer bin reusing the real ruview_metrics pose
harness on a fixed seed=42 fixture → RuViewTier-style verdict + cross-platform
SHA-256 proof hash. Builds --no-default-features (no torch/GPU). VERDICT: PASS.
- CI harness gate: .github/workflows/aether-arena-harness.yml runs the scorer on
every PR — the "PR that runs the harness as part of the build" requirement.
- Scaffold: aether-arena/{README,VERIFY,STATUS}.md + schema/aa-submission.toml.
- Horizon record persisted (.claude-flow/horizons/aether-arena-aa.json).
Infra = the deliverable; model SOTA (MM-Fi PCK@20) is a separate effort blocked on
ADR-079 data collection, tracked as a stretch goal, not an infra exit.
Co-Authored-By: claude-flow <ruv@ruv.net>
- User guide: full retrain workflow (record → vqvae → transformer → serve)
with checkpoint path usage
- README: note fine-tune capability in world model capability row
Co-Authored-By: claude-flow <ruv@ruv.net>
Drives a real SemanticBus: raw snapshot (fall_detected, past warmup) ->
FallRisk primitive -> SemanticStateRecord (provenance) -> single-signal rule
fires / multi-signal agreement rule does NOT (no false escalation) -> expired
record rejected. Proves the ADR-140 credibility path end to end.
Co-Authored-By: claude-flow <ruv@ruv.net>
Weaves the three framing points into every ADR in the series:
- skeleton/scaffolding (data contracts + trust/privacy/audit machinery +
algorithms; real, tested, compiling) that existing sensing code plugs into
- Built (tested building block) vs Integration glue (not yet on the live 20 Hz
path) — per-ADR, with commit + issue references
- trust throughline (traceable evidence, sensor agreement, calibration
provenance, auditable privacy)
ADR-136 §8 carries the full series framing; 137-146 carry per-ADR status.
Co-Authored-By: claude-flow <ruv@ruv.net>