deploy: 29e937ef52

2026-06-13 04:55:03 +00:00 · 2026-06-13 04:55:03 +00:00 · 5b7e89f73f
parent d3604cce6d
commit 5b7e89f73f
1 changed files with 27 additions and 4 deletions
--- a/api-docs/adr/ADR-160-edge-skill-library-honest-labeling.md
+++ b/api-docs/adr/ADR-160-edge-skill-library-honest-labeling.md
@ -178,10 +178,33 @@ label or behavior change, consistent with leaving their claim surface intact.)

 ## Deferred Backlog (Nothing Dropped)

- **Per-skill accuracy validation** — **DATA-GATED**. Validating any med_*/affect/
-  sign-language claim requires labelled clinical/affective/ASL data and reference
-  standards that do not exist in this repo. The disclaimers + feature gate are the
-  honest stand-in. Nothing is claimed that is not measured.
+- **Per-skill accuracy validation** — **PARTIALLY MEASURED-on-synthetic**
+  (2026-06-13). For the subset of skills whose detection target is *constructible*
+  with known ground truth, a synthetic-ground-truth harness
+  (`tests/synthetic_validation.rs`, 12 tests) plants signals with known answers,
+  runs the real detector, and **measures** detection accuracy / rate-error:
+  `vital_trend`, `exo_time_crystal` (periodic-vs-aperiodic — its sub-harmonic-vs-
+  clean-period claim is NOT separable, recorded honestly), `exo_ghost_hunter`
+  (hidden breathing), `occupancy`, `intrusion`, `exo_rain_detect`,
+  `sig_flash_attention` (8/8 peak localization), `spt_spiking_tracker` (4/4 zone
+  localization, sparse plant), `sig_optimal_transport`, `sig_mincut_person_match`
+  (0 id-swaps), `lrn_dtw_gesture_learn` (enrollment) — all 1.000 where claimed;
+  `sig_sparse_recovery`'s recovery accuracy is reported **negative** (−2.2% vs
+  unrecovered baseline) — only its trigger path is validated. Full numbers +
+  reproduce commands in `benchmarks/edge-skills/RESULTS.md`.
+  The **med_*/affect/sign-language/weapon** claims remain **DATA-GATED**:
+  validating them requires labelled clinical/affective/ASL/metal-object data and
+  reference standards that do not exist in this repo. Planting a "seizure-/weapon-/
+  happy-like" synthetic signal validates nothing real and is explicitly refused;
+  RESULTS.md lists each with the real data it needs. The disclaimers + feature gate
+  are the honest stand-in. Nothing is claimed that is not measured.
+- **Unified edge pipeline** — **MEASURED** (2026-06-13). `src/pipeline_all.rs`
+  (`EdgePipeline`) + `src/skill_registry.rs` register **every** runtime skill
+  behind one uniform `EdgeSkill` trait and run them all per CSI frame; `med_*` are
+  registered only under `--features medical-experimental` (preserves the §A1 gate).
+  `tests/pipeline_all.rs` (4 tests) proves all 59 default / 64 medical skills run
+  without panic over 300 synthetic frames with a well-formed aggregated event
+  stream. `examples/run_all_skills.rs` is a runnable demo. No skill DSP changed.
 - **Criterion benches for `process_frame` budget claims** — **DONE (host)**
  (ADR-163, 2026-06-12). `benches/process_frame_bench.rs` benches the heaviest
  hot paths (`exo_time_crystal` 256×128 autocorrelation, `exo_ghost_hunter`