Commit Graph

3 Commits

Author SHA1 Message Date
ruv 41665d3de9 test(wasm-edge): synthetic-ground-truth validation harness for edge skills (ADR-160)
Plant signals with known answers, run the real detector, MEASURE detection
accuracy / precision / recall / rate-error — synthetic-ground-truth ONLY, not
field accuracy.

MEASURED-on-synthetic (12 tests, all green):
- vital_trend, exo_ghost_hunter(hidden breathing), occupancy, intrusion,
  exo_rain_detect, sig_optimal_transport: acc 1.000
- exo_time_crystal: 1.000 on periodic-vs-aperiodic (its sub-harmonic-vs-clean-
  period claim is NOT separable by autocorrelation — recorded honestly)
- sig_flash_attention: 8/8 peak localization; spt_spiking_tracker: 4/4 zone
  localization (sparse plant); sig_mincut_person_match: 0 id-swaps/40 frames
- lrn_dtw_gesture_learn: enrollment validated (replay-match reported, not asserted)
- sig_sparse_recovery: trigger validated; recovery accuracy reported NEGATIVE
  (-2.2% vs unrecovered baseline) — only its detect/trigger path is validated

DATA-GATED (listed, NOT faked): med_seizure/apnea/cardiac/respiratory/gait,
sec_weapon_detect, exo_emotion/happiness/dream_stage/gesture_language — each
needs real labelled clinical/affect/ASL/metal-object data; no number claimed.

benchmarks/edge-skills/RESULTS.md documents every result + reproduce command and
the explicit honesty boundary. ADR-160 deferred 'per-skill accuracy validation'
item updated to PARTIALLY MEASURED-on-synthetic + DATA-GATED.

Suite: 631 passed default / 669 medical, 0 failed.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-13 00:33:51 -04:00
ruv 1a17cc5b06 docs(ADR-163): edge-latency RESULTS + PROOF/prove.sh wiring (T3)
Adds benchmarks/edge-latency/RESULTS.md (wiflow-std RESULTS style: each
measured number with reproduce command, machine, MEASURED-on-host grade,
and the honest host-vs-ESP32 / steady-state-vs-cold-start caveats) and
ADR-163 (HEADLINE: CLAIMED latency budgets -> MEASURED-on-host, closing
M5/M6 measurement debt; ESP32-on-hardware still pending).

- ADR-160 deferred 'criterion benches for process_frame budget claims'
  line updated to DONE (host) with the ESP32-pending note.
- PROOF.md performance table gains the two edge-latency reproduce rows;
  provenance ADR range extended to ADR-163.
- prove.sh gated section gains the edge-latency bench note (host proxy
  only; not asserted, never claims the ESP32 figure).

Benches/docs only; no crate republishes.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-12 08:02:07 -04:00
ruv 8ad0d0f91c test+docs(wasm-edge): honest-labeling presence tests + ADR-160 (ADR-159 backlog now TRUE)
- tests/honest_labeling.rs: 10 source-presence tests asserting the A1-A5 claim
  invariants (disclaimers present, uncited stat removed, WEAPON_ALERT no longer
  exported, med_* feature-gated, no static-mut event buffers). Each is designed to
  FAIL on the pre-fix source (ADR-159 A5 manifest-roundtrip style).
- ADR-160: records the headline (0 stubs/0 theater, all real DSP -> claim-surface
  honesty debt), the graded A1-A5 fixes, NO-ACTION positives, per-prefix
  classification, and the DATA-GATED deferred backlog (criterion benches,
  per-skill accuracy validation, wasm32 static_mut_refs CI confirmation).
- ADR-159: its deferred-backlog line "wasm-edge ... honestly labelled, not claimed"
  is now actually TRUE.

Validation (all 0 failed, host --features std):
  DEFAULT 615 | MEDICAL (+medical-experimental) 653 | NO-DEFAULT 615; 0 warnings.

Co-Authored-By: claude-flow <ruv@ruv.net>
2026-06-12 00:01:22 -04:00