Commit Graph

3 Commits

Author SHA1 Message Date
ruv 0fbdd15955 docs: results+proof links, capabilities-proof rebuttal, fix stale claims
- README: replace retracted "100% presence" claim with honest 82.3%
  held-out temporal-triplet; correct stale "pose model not in this
  release" (now live at ruvnet/wifi-densepose-mmfi-pose, 82.69%
  torso-PCK@20 SOTA); add a Results & proof table (HF models,
  AetherArena, benchmark study, deterministic verify.py proof, witness).
- user-guide: same 100%->82.3% correction in two places; add Results &
  proof pointers and the SOTA pose model + AetherArena links.
- docs/proof-of-capabilities.md (new): evidence-first rebuttal to the
  "fake / misleading" claims. Concedes what was fair (over-stated early
  metrics, AI-doc tone), refutes the category errors (simulate-mode
  mistaken for fraud; missing weights mistaken for missing pipeline),
  and gives copy-paste "prove it yourself" steps (verify.py VERDICT:
  PASS + published SHA-256, cargo test, HF model pull, ESP32 CSI).
  Emphasizes built-in-public history (git, 96 ADRs, CHANGELOG, issues
  incl. #803/#872 bug->fix arcs) as the anti-facade evidence.
- aether-arena/VERIFY.md: cross-link the whole-platform proof doc.

Verified: python archive/v1/data/proof/verify.py -> VERDICT: PASS
(hash ca58956c...9199 matches published expected_features.sha256).

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-31 10:29:28 -04:00
ruv 483bfa4660 feat(aether-arena): benchmark-first scorer + witness chain + repeatability (M2/M5/M7)
Per direction "remove the initial number, optimize for benchmark first" + "include
witness chain capabilities for proof and repeatability analysis":

- Empty board, no seeded numbers: ledger seeds to genesis only. Every result is a
  real scoring-pipeline witness; RuView gets no hand-entered baseline.
- Real model scoring: aa_score_runner now loads predictions + an eval split
  (--split/--pred) and scores them through the real ruview_metrics pose harness —
  not just a synthetic fixture. Committed public smoke split (fixtures/smoke_*.json).
- Witness chain: each score emits a witness = inputs_sha256 (binds it to the exact
  inputs) + proof_sha256 (cross-platform-stable score hash) + harness_version.
- Repeatability analysis: --repeat N runs the harness N× and fails if it ever
  yields >=2 distinct proof hashes (16/16 identical locally).
- Witness ledger: ledger/ledger_tools.py — append-only, hash-chained, tamper-
  evident (seed/append/verify); editing any past row breaks the chain.
- CI gate extended: determinism + repeatability(16) + real-scoring smoke + ledger
  chain verify on every PR.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-30 16:59:11 -04:00
ruv a6808568a2 feat(aether-arena): ADR-149 spatial-intelligence benchmark — scorer + CI harness gate (M1-M4)
AetherArena ("AA") — the official, project-agnostic Spatial-Intelligence Benchmark
(ADR-149, Accepted). Iteration 1 of the long-horizon build:

- ADR-149 accepted: name locked (ruvnet/aether-arena), v0 metrics locked
  (pose/presence/latency/determinism), dataset legality resolved (MM-Fi CC BY-NC
  only; Wi-Pose excluded). Adds four-part framing, threat model, arena_score
  formula, submission state machine, neutrality/governance, and the §7 acceptance test.
- aa_score_runner: deterministic scorer bin reusing the real ruview_metrics pose
  harness on a fixed seed=42 fixture → RuViewTier-style verdict + cross-platform
  SHA-256 proof hash. Builds --no-default-features (no torch/GPU). VERDICT: PASS.
- CI harness gate: .github/workflows/aether-arena-harness.yml runs the scorer on
  every PR — the "PR that runs the harness as part of the build" requirement.
- Scaffold: aether-arena/{README,VERIFY,STATUS}.md + schema/aa-submission.toml.
- Horizon record persisted (.claude-flow/horizons/aether-arena-aa.json).

Infra = the deliverable; model SOTA (MM-Fi PCK@20) is a separate effort blocked on
ADR-079 data collection, tracked as a stretch goal, not an infra exit.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-30 16:47:22 -04:00