Commit Graph

3 Commits

Author SHA1 Message Date
ruv 7bad51aca6 publish: best MM-Fi benchmark set (in-domain 83.59, x-subject 64.0, x-env 17.5 CORAL)
Append best witness rows to ledger (seq 2-4) + update HF Space leaderboard banner.
In-domain 83.59% torso-PCK@20 (graph+ensemble+TTA) supersedes the 81.63 single-model entry,
+11.34 over MultiFormer 72.25. Cross-subject 64.04% (official split). Cross-environment 17.51%
(CORAL domain alignment, the cross-room DG win). Gist + issue #876 updated with frontier map.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-30 22:22:53 -04:00
ruv 046b2564b8 feat(aether-arena): publish RuView MM-Fi SOTA result + ADR-150 RF Foundation Encoder
- Ledger witness row (seq 1, Gold): RuView CSI-Transformer 81.63% torso-PCK@20 on
  MM-Fi random_split, exceeding MultiFormer 72.25% (CSI2Pose 68.41%) — protocol- and
  metric-matched, self-corrected from inflated 91.86% bbox. Hash-chained, verifiable.
- HF Space updated with the controlled SOTA claim + caveat (cross-subject is the frontier).
- Proof/replay/witness gist: gist.github.com/ruvnet/af2fbc1c7674dddf09c15509b3c7f785
- Tracking issue #876 (result + Generalization Track roadmap).
- ADR-150: RuView RF Foundation Encoder — pose-preserving, subject/room/device-invariant
  SSL embedding (masked CSI + pose-contrast-across-subjects + coherence head); the
  principled attack on the cross-subject frontier. DANN failed; this is the corrected design.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-30 19:55:58 -04:00
ruv 483bfa4660 feat(aether-arena): benchmark-first scorer + witness chain + repeatability (M2/M5/M7)
Per direction "remove the initial number, optimize for benchmark first" + "include
witness chain capabilities for proof and repeatability analysis":

- Empty board, no seeded numbers: ledger seeds to genesis only. Every result is a
  real scoring-pipeline witness; RuView gets no hand-entered baseline.
- Real model scoring: aa_score_runner now loads predictions + an eval split
  (--split/--pred) and scores them through the real ruview_metrics pose harness —
  not just a synthetic fixture. Committed public smoke split (fixtures/smoke_*.json).
- Witness chain: each score emits a witness = inputs_sha256 (binds it to the exact
  inputs) + proof_sha256 (cross-platform-stable score hash) + harness_version.
- Repeatability analysis: --repeat N runs the harness N× and fails if it ever
  yields >=2 distinct proof hashes (16/16 identical locally).
- Witness ledger: ledger/ledger_tools.py — append-only, hash-chained, tamper-
  evident (seed/append/verify); editing any past row breaks the chain.
- CI gate extended: determinism + repeatability(16) + real-scoring smoke + ledger
  chain verify on every PR.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-05-30 16:59:11 -04:00