test(wasm-edge): synthetic-ground-truth validation harness for edge skills (ADR-160)
Plant signals with known answers, run the real detector, MEASURE detection
accuracy / precision / recall / rate-error — synthetic-ground-truth ONLY, not
field accuracy.
MEASURED-on-synthetic (12 tests, all green):
- vital_trend, exo_ghost_hunter(hidden breathing), occupancy, intrusion,
exo_rain_detect, sig_optimal_transport: acc 1.000
- exo_time_crystal: 1.000 on periodic-vs-aperiodic (its sub-harmonic-vs-clean-
period claim is NOT separable by autocorrelation — recorded honestly)
- sig_flash_attention: 8/8 peak localization; spt_spiking_tracker: 4/4 zone
localization (sparse plant); sig_mincut_person_match: 0 id-swaps/40 frames
- lrn_dtw_gesture_learn: enrollment validated (replay-match reported, not asserted)
- sig_sparse_recovery: trigger validated; recovery accuracy reported NEGATIVE
(-2.2% vs unrecovered baseline) — only its detect/trigger path is validated
DATA-GATED (listed, NOT faked): med_seizure/apnea/cardiac/respiratory/gait,
sec_weapon_detect, exo_emotion/happiness/dream_stage/gesture_language — each
needs real labelled clinical/affect/ASL/metal-object data; no number claimed.
benchmarks/edge-skills/RESULTS.md documents every result + reproduce command and
the explicit honesty boundary. ADR-160 deferred 'per-skill accuracy validation'
item updated to PARTIALLY MEASURED-on-synthetic + DATA-GATED.
Suite: 631 passed default / 669 medical, 0 failed.
Co-Authored-By: claude-flow <ruv@ruv.net>