Commit Graph

16 Commits

Author SHA1 Message Date
ruv 0fbdd15955 docs: results+proof links, capabilities-proof rebuttal, fix stale claims
- README: replace retracted "100% presence" claim with honest 82.3%
  held-out temporal-triplet; correct stale "pose model not in this
  release" (now live at ruvnet/wifi-densepose-mmfi-pose, 82.69%
  torso-PCK@20 SOTA); add a Results & proof table (HF models,
  AetherArena, benchmark study, deterministic verify.py proof, witness).
- user-guide: same 100%->82.3% correction in two places; add Results &
  proof pointers and the SOTA pose model + AetherArena links.
- docs/proof-of-capabilities.md (new): evidence-first rebuttal to the
  "fake / misleading" claims. Concedes what was fair (over-stated early
  metrics, AI-doc tone), refutes the category errors (simulate-mode
  mistaken for fraud; missing weights mistaken for missing pipeline),
  and gives copy-paste "prove it yourself" steps (verify.py VERDICT:
  PASS + published SHA-256, cargo test, HF model pull, ESP32 CSI).
  Emphasizes built-in-public history (git, 96 ADRs, CHANGELOG, issues
  incl. #803/#872 bug->fix arcs) as the anti-facade evidence.
- aether-arena/VERIFY.md: cross-link the whole-platform proof doc.

Verified: python archive/v1/data/proof/verify.py -> VERDICT: PASS
(hash ca58956c...9199 matches published expected_features.sha256).

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-31 10:29:28 -04:00
ruv e94f4d8f73 feat(calibration): cog adapter producer — completes the cog --adapter feature
I'd shipped the Rust cog-pose --adapter *consumer* (+test) but there was no
*producer* for cog-format adapters, leaving it a half-feature. cog_calibrate.py
fits a rank-r LoRA on the cog conv+MLP head (pose_v1.safetensors, 56x20) from a
labeled in-room capture and writes a safetensors with fc1.a/fc1.b/fc2.a/fc2.b
(scale baked into b) — exactly what the Rust engine loads. Verified against the
in-repo pose_v1.safetensors: correct keys/shapes, reduces fit error, active
adapter, ~2.6KB. Adds test_cog_calibration.py (passes) + README documenting the
two non-interchangeable producers (transformer .npz vs cog safetensors).

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-31 05:10:07 -04:00
ruv 76cc57294d test(calibration): self-contained end-to-end regression test
The committed calibration service (model.py/calibrate.py/infer.py) had no
automated test — only ad-hoc verification. Adds a CPU-only, no-real-checkpoint
test that exercises the CLI end-to-end on synthetic data: build base ->
calibrate.py fits adapter -> infer.py runs base+adapter, asserting adapter
size (<200KB), keypoint shape [N,17,2], finiteness, [0,1] range, and that the
adapter actually changes the output. Passes on Windows CPU (torch 2.11).

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-31 05:02:24 -04:00
ruv 4db727649a feat(calibration): RuView per-room calibration service (reference impl)
Operationalizes the campaign's central finding (ADR-150 §3.3-3.6): a frozen
shared base + a ~11KB per-room LoRA adapter from ~100-200 labeled samples
recovers SOTA-level pose in any new room/person. Verified end-to-end:
source-only base zero-shot 3.09% on unseen room -> 74.29% after 200-sample
calibration. Files: model.py (PoseNet+LoRA), calibrate.py, infer.py, README
with measured calibration budget.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-31 02:22:10 -04:00
ruv 7bad51aca6 publish: best MM-Fi benchmark set (in-domain 83.59, x-subject 64.0, x-env 17.5 CORAL)
Append best witness rows to ledger (seq 2-4) + update HF Space leaderboard banner.
In-domain 83.59% torso-PCK@20 (graph+ensemble+TTA) supersedes the 81.63 single-model entry,
+11.34 over MultiFormer 72.25. Cross-subject 64.04% (official split). Cross-environment 17.51%
(CORAL domain alignment, the cross-room DG win). Gist + issue #876 updated with frontier map.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-30 22:22:53 -04:00
ruv eb3509e9ab reframe(aether-arena): vendor-neutral industry benchmark, RuView is one entrant 2026-05-30 19:59:10 -04:00
ruv 046b2564b8 feat(aether-arena): publish RuView MM-Fi SOTA result + ADR-150 RF Foundation Encoder
- Ledger witness row (seq 1, Gold): RuView CSI-Transformer 81.63% torso-PCK@20 on
  MM-Fi random_split, exceeding MultiFormer 72.25% (CSI2Pose 68.41%) — protocol- and
  metric-matched, self-corrected from inflated 91.86% bbox. Hash-chained, verifiable.
- HF Space updated with the controlled SOTA claim + caveat (cross-subject is the frontier).
- Proof/replay/witness gist: gist.github.com/ruvnet/af2fbc1c7674dddf09c15509b3c7f785
- Tracking issue #876 (result + Generalization Track roadmap).
- ADR-150: RuView RF Foundation Encoder — pose-preserving, subject/room/device-invariant
  SSL embedding (masked CSI + pose-contrast-across-subjects + coherence head); the
  principled attack on the cross-subject frontier. DANN failed; this is the corrected design.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-30 19:55:58 -04:00
ruv 4f7ab8e4f0 docs(aether-arena): v0 infrastructure complete — Space live, harness gate passing (M8) 2026-05-30 17:15:08 -04:00
ruv de6715d958 fix(aether-arena): move HF Space to gradio 5.9.1 (4.44.1 jinja2 cache bug) 2026-05-30 17:14:21 -04:00
ruv c1c04441e9 fix(aether-arena): Space launch on 0.0.0.0:7860 2026-05-30 17:10:17 -04:00
ruv 5284591770 fix(aether-arena): pin huggingface_hub 0.25.2 for gradio 4.44.1 Space 2026-05-30 17:07:08 -04:00
ruv 3f93fcd4ea fix(aether-arena): pin HF Space to python 3.12 (gradio pydub pyaudioop 3.13 removal) 2026-05-30 17:03:14 -04:00
ruv 644b4ba816 docs(aether-arena): mark M6 HF Space deployed 2026-05-30 17:02:03 -04:00
ruv 9359bf5d04 feat(aether-arena): HF Space (Gradio) v0 — deployed to ruvnet/aether-arena (M6)
Public face of the benchmark: empty-board leaderboard from the witness ledger,
chain-integrity display, submit/verify/about tabs. Presentation layer per ADR-149
§2.2 (heavy scoring stays in the pinned RuView harness / CI).
Live: https://huggingface.co/spaces/ruvnet/aether-arena

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-30 17:01:10 -04:00
ruv 483bfa4660 feat(aether-arena): benchmark-first scorer + witness chain + repeatability (M2/M5/M7)
Per direction "remove the initial number, optimize for benchmark first" + "include
witness chain capabilities for proof and repeatability analysis":

- Empty board, no seeded numbers: ledger seeds to genesis only. Every result is a
  real scoring-pipeline witness; RuView gets no hand-entered baseline.
- Real model scoring: aa_score_runner now loads predictions + an eval split
  (--split/--pred) and scores them through the real ruview_metrics pose harness —
  not just a synthetic fixture. Committed public smoke split (fixtures/smoke_*.json).
- Witness chain: each score emits a witness = inputs_sha256 (binds it to the exact
  inputs) + proof_sha256 (cross-platform-stable score hash) + harness_version.
- Repeatability analysis: --repeat N runs the harness N× and fails if it ever
  yields >=2 distinct proof hashes (16/16 identical locally).
- Witness ledger: ledger/ledger_tools.py — append-only, hash-chained, tamper-
  evident (seed/append/verify); editing any past row breaks the chain.
- CI gate extended: determinism + repeatability(16) + real-scoring smoke + ledger
  chain verify on every PR.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-30 16:59:11 -04:00
ruv a6808568a2 feat(aether-arena): ADR-149 spatial-intelligence benchmark — scorer + CI harness gate (M1-M4)
AetherArena ("AA") — the official, project-agnostic Spatial-Intelligence Benchmark
(ADR-149, Accepted). Iteration 1 of the long-horizon build:

- ADR-149 accepted: name locked (ruvnet/aether-arena), v0 metrics locked
  (pose/presence/latency/determinism), dataset legality resolved (MM-Fi CC BY-NC
  only; Wi-Pose excluded). Adds four-part framing, threat model, arena_score
  formula, submission state machine, neutrality/governance, and the §7 acceptance test.
- aa_score_runner: deterministic scorer bin reusing the real ruview_metrics pose
  harness on a fixed seed=42 fixture → RuViewTier-style verdict + cross-platform
  SHA-256 proof hash. Builds --no-default-features (no torch/GPU). VERDICT: PASS.
- CI harness gate: .github/workflows/aether-arena-harness.yml runs the scorer on
  every PR — the "PR that runs the harness as part of the build" requirement.
- Scaffold: aether-arena/{README,VERIFY,STATUS}.md + schema/aa-submission.toml.
- Horizon record persisted (.claude-flow/horizons/aether-arena-aa.json).

Infra = the deliverable; model SOTA (MM-Fi PCK@20) is a separate effort blocked on
ADR-079 data collection, tracked as a stretch goal, not an infra exit.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-30 16:47:22 -04:00