From 5b7e89f73f3846e7f4a119ae374774b43ebc5c8b Mon Sep 17 00:00:00 2001
From: ruvnet
Date: Sat, 13 Jun 2026 04:55:03 +0000
Subject: [PATCH] deploy: 29e937ef526552beb50aee193258964e6339ab0e

---
 ...-160-edge-skill-library-honest-labeling.md | 31 ++++++++++++++++---
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/api-docs/adr/ADR-160-edge-skill-library-honest-labeling.md b/api-docs/adr/ADR-160-edge-skill-library-honest-labeling.md
index 7131a684..6c42956c 100644
--- a/api-docs/adr/ADR-160-edge-skill-library-honest-labeling.md
+++ b/api-docs/adr/ADR-160-edge-skill-library-honest-labeling.md
@@ -178,10 +178,33 @@ label or behavior change, consistent with leaving their claim surface intact.)
 
 ## Deferred Backlog (Nothing Dropped)
 
-- **Per-skill accuracy validation** — **DATA-GATED**. Validating any med_*/affect/
-  sign-language claim requires labelled clinical/affective/ASL data and reference
-  standards that do not exist in this repo. The disclaimers + feature gate are the
-  honest stand-in. Nothing is claimed that is not measured.
+- **Per-skill accuracy validation** — **PARTIALLY MEASURED-on-synthetic**
+  (2026-06-13). For the subset of skills whose detection target is *constructible*
+  with known ground truth, a synthetic-ground-truth harness
+  (`tests/synthetic_validation.rs`, 12 tests) plants signals with known answers,
+  runs the real detector, and **measures** detection accuracy / rate-error:
+  `vital_trend`, `exo_time_crystal` (periodic-vs-aperiodic — its sub-harmonic-vs-
+  clean-period claim is NOT separable, recorded honestly), `exo_ghost_hunter`
+  (hidden breathing), `occupancy`, `intrusion`, `exo_rain_detect`,
+  `sig_flash_attention` (8/8 peak localization), `spt_spiking_tracker` (4/4 zone
+  localization, sparse plant), `sig_optimal_transport`, `sig_mincut_person_match`
+  (0 id-swaps), `lrn_dtw_gesture_learn` (enrollment) — all 1.000 where claimed;
+  `sig_sparse_recovery`'s recovery accuracy is reported **negative** (−2.2% vs
+  unrecovered baseline) — only its trigger path is validated. Full numbers +
+  reproduce commands in `benchmarks/edge-skills/RESULTS.md`.
+  The **med_*/affect/sign-language/weapon** claims remain **DATA-GATED**:
+  validating them requires labelled clinical/affective/ASL/metal-object data and
+  reference standards that do not exist in this repo. Planting a "seizure-/weapon-/
+  happy-like" synthetic signal validates nothing real and is explicitly refused;
+  RESULTS.md lists each with the real data it needs. The disclaimers + feature gate
+  are the honest stand-in. Nothing is claimed that is not measured.
+- **Unified edge pipeline** — **MEASURED** (2026-06-13). `src/pipeline_all.rs`
+  (`EdgePipeline`) + `src/skill_registry.rs` register **every** runtime skill
+  behind one uniform `EdgeSkill` trait and run them all per CSI frame; `med_*` are
+  registered only under `--features medical-experimental` (preserves the §A1 gate).
+  `tests/pipeline_all.rs` (4 tests) proves all 59 default / 64 medical skills run
+  without panic over 300 synthetic frames with a well-formed aggregated event
+  stream. `examples/run_all_skills.rs` is a runnable demo. No skill DSP changed.
 - **Criterion benches for `process_frame` budget claims** — **DONE (host)**
   (ADR-163, 2026-06-12). `benches/process_frame_bench.rs` benches the heaviest hot
   paths (`exo_time_crystal` 256×128 autocorrelation, `exo_ghost_hunter`