- README: replace retracted "100% presence" claim with honest 82.3%
held-out temporal-triplet; correct stale "pose model not in this
release" (now live at ruvnet/wifi-densepose-mmfi-pose, 82.69%
torso-PCK@20 SOTA); add a Results & proof table (HF models,
AetherArena, benchmark study, deterministic verify.py proof, witness).
- user-guide: same 100%->82.3% correction in two places; add Results &
proof pointers and the SOTA pose model + AetherArena links.
- docs/proof-of-capabilities.md (new): evidence-first rebuttal to the
"fake / misleading" claims. Concedes what was fair (over-stated early
metrics, AI-doc tone), refutes the category errors (simulate-mode
mistaken for fraud; missing weights mistaken for missing pipeline),
and gives copy-paste "prove it yourself" steps (verify.py VERDICT:
PASS + published SHA-256, cargo test, HF model pull, ESP32 CSI).
Emphasizes built-in-public history (git, 96 ADRs, CHANGELOG, issues
incl. #803/#872 bug->fix arcs) as the anti-facade evidence.
- aether-arena/VERIFY.md: cross-link the whole-platform proof doc.
Verified: python archive/v1/data/proof/verify.py -> VERDICT: PASS
(hash ca58956c...9199 matches published expected_features.sha256).
Co-Authored-By: claude-flow <ruv@ruv.net>
I'd shipped the Rust cog-pose --adapter *consumer* (+test) but there was no
*producer* for cog-format adapters, leaving it a half-feature. cog_calibrate.py
fits a rank-r LoRA on the cog conv+MLP head (pose_v1.safetensors, 56x20) from a
labeled in-room capture and writes a safetensors with fc1.a/fc1.b/fc2.a/fc2.b
(scale baked into b) — exactly what the Rust engine loads. Verified against the
in-repo pose_v1.safetensors: correct keys/shapes, reduces fit error, active
adapter, ~2.6KB. Adds test_cog_calibration.py (passes) + README documenting the
two non-interchangeable producers (transformer .npz vs cog safetensors).
Co-Authored-By: claude-flow <ruv@ruv.net>
The committed calibration service (model.py/calibrate.py/infer.py) had no
automated test — only ad-hoc verification. Adds a CPU-only, no-real-checkpoint
test that exercises the CLI end-to-end on synthetic data: build base ->
calibrate.py fits adapter -> infer.py runs base+adapter, asserting adapter
size (<200KB), keypoint shape [N,17,2], finiteness, [0,1] range, and that the
adapter actually changes the output. Passes on Windows CPU (torch 2.11).
Co-Authored-By: claude-flow <ruv@ruv.net>
Operationalizes the campaign's central finding (ADR-150 §3.3-3.6): a frozen
shared base + a ~11KB per-room LoRA adapter from ~100-200 labeled samples
recovers SOTA-level pose in any new room/person. Verified end-to-end:
source-only base zero-shot 3.09% on unseen room -> 74.29% after 200-sample
calibration. Files: model.py (PoseNet+LoRA), calibrate.py, infer.py, README
with measured calibration budget.
Co-Authored-By: claude-flow <ruv@ruv.net>
Public face of the benchmark: empty-board leaderboard from the witness ledger,
chain-integrity display, submit/verify/about tabs. Presentation layer per ADR-149
§2.2 (heavy scoring stays in the pinned RuView harness / CI).
Live: https://huggingface.co/spaces/ruvnet/aether-arena
Co-Authored-By: claude-flow <ruv@ruv.net>
Per direction "remove the initial number, optimize for benchmark first" + "include
witness chain capabilities for proof and repeatability analysis":
- Empty board, no seeded numbers: ledger seeds to genesis only. Every result is a
real scoring-pipeline witness; RuView gets no hand-entered baseline.
- Real model scoring: aa_score_runner now loads predictions + an eval split
(--split/--pred) and scores them through the real ruview_metrics pose harness —
not just a synthetic fixture. Committed public smoke split (fixtures/smoke_*.json).
- Witness chain: each score emits a witness = inputs_sha256 (binds it to the exact
inputs) + proof_sha256 (cross-platform-stable score hash) + harness_version.
- Repeatability analysis: --repeat N runs the harness N× and fails if it ever
yields >=2 distinct proof hashes (16/16 identical locally).
- Witness ledger: ledger/ledger_tools.py — append-only, hash-chained, tamper-
evident (seed/append/verify); editing any past row breaks the chain.
- CI gate extended: determinism + repeatability(16) + real-scoring smoke + ledger
chain verify on every PR.
Co-Authored-By: claude-flow <ruv@ruv.net>
AetherArena ("AA") — the official, project-agnostic Spatial-Intelligence Benchmark
(ADR-149, Accepted). Iteration 1 of the long-horizon build:
- ADR-149 accepted: name locked (ruvnet/aether-arena), v0 metrics locked
(pose/presence/latency/determinism), dataset legality resolved (MM-Fi CC BY-NC
only; Wi-Pose excluded). Adds four-part framing, threat model, arena_score
formula, submission state machine, neutrality/governance, and the §7 acceptance test.
- aa_score_runner: deterministic scorer bin reusing the real ruview_metrics pose
harness on a fixed seed=42 fixture → RuViewTier-style verdict + cross-platform
SHA-256 proof hash. Builds --no-default-features (no torch/GPU). VERDICT: PASS.
- CI harness gate: .github/workflows/aether-arena-harness.yml runs the scorer on
every PR — the "PR that runs the harness as part of the build" requirement.
- Scaffold: aether-arena/{README,VERIFY,STATUS}.md + schema/aa-submission.toml.
- Horizon record persisted (.claude-flow/horizons/aether-arena-aa.json).
Infra = the deliverable; model SOTA (MM-Fi PCK@20) is a separate effort blocked on
ADR-079 data collection, tracked as a stretch goal, not an infra exit.
Co-Authored-By: claude-flow <ruv@ruv.net>