wifi-densepose/docs/adr/ADR-158-mat-worldmodel-beyo...

213 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# ADR-158: MAT / World-Model Cluster — Beyond-SOTA Sweep, Anti-"AI-Slop" Hardening
- **Status**: accepted
- **Date**: 2026-06-11
- **Deciders**: ruv
- **Tags**: mat, life-safety, localization, triage, worldmodel, worldgraph, geo, engine, prove-everything
## Context
This ADR records the beyond-SOTA sweep over the MAT / world-model cluster
(`wifi-densepose-mat`, `-worldmodel`, `-worldgraph`, `-geo`, `-engine`), executed
under the project's **prove-everything / anti-"AI-slop"** directive: every stub is
either implemented with real logic or replaced by an honest typed error; no
fake/always-empty/random outputs; tests pass on real behaviour; results are graded
**MEASURED** (reproduced here with the command recorded), **CLAIMED**,
**DATA-GATED** (real code path present, needs hardware/data we lack), or
**NO-ACTION** (already-SOTA — cited as a positive).
The Mass Casualty Assessment Tool touches life-safety. A triage metric that is
disconnected from the decision it gates, or a survivor count that inflates, is the
worst class of slop: it produces confident, wrong rescue prioritisation. An audit
against live code found six concrete defects, four of which were silent
correctness bugs (not missing features) in the triage → gate → record path and in
the localization/dedup path.
Grading vocabulary follows ADR-152 (F-evidence grades) and the sweep convention:
- **MEASURED** — reproduced in this worktree, command recorded below.
- **DATA-GATED** — real code path implemented; returns a typed error / honest
provenance flag where hardware or labelled data is genuinely absent.
- **NO-ACTION (already-SOTA)** — audited, found correct, cited as a positive.
- **ACCEPTED-FUTURE** — deliberately deferred, nothing dropped.
## Graded SOTA Landscape
| Capability | Grade | Note |
|------------|-------|------|
| RF-through-rubble survivor detection | **DATA-GATED** | Real detection + triage + localization code paths run end-to-end on real CSI bytes; field detection *accuracy* is unproven without instrumented rubble trials and is **not fabricated** here. |
| OccWorld occupancy architecture (`-worldmodel`) | **NO-ACTION (current)** | `occupancy.rs` voxel mapping is clamp-proven bounds-safe; converts WorldGraph person positions to a 200×200×16 grid with no out-of-bounds path. |
| WorldGraph provenance / privacy / pruning (`-worldgraph`) | **NO-ACTION (already-SOTA)** | `graph.rs` implements append-with-provenance (`DerivedFrom`), deterministic LRU pruning, and a privacy rollup (`PrivacyLimitedBy`). Cited as a positive; no changes needed. |
| Point-cloud parser bounds-safety (`-pointcloud`) | **NO-ACTION (already-SOTA)** | Another agent's crate; cited only — its parser is bounds-checked. Out of scope for this ADR's edits. |
| Learned multi-person counter | **DATA-GATED** | Deferred; requires labelled multi-occupant CSI. The zone+vitals-signature dedup (below) is the honest non-learned stand-in. |
| RF point-cloud generation | **ACCEPTED-FUTURE** | Not dropped; tracked as future work. |
## Decision — Fixes Landed (MEASURED)
### §1 Unify the two divergent triage engines (CRITICAL)
**Was:** `EnsembleClassifier::determine_triage` (ensemble gate) and
`TriageCalculator::calculate` (survivor record) were two different START-protocol
approximations with different rate bands and movement handling. The pipeline
gated on the ensemble's confidence (`lib.rs:489`), discarded the ensemble triage
(`lib.rs:524`, `_ensemble`), and recomputed via `TriageCalculator` in
`Survivor::new` (`survivor.rs:194`). A survivor could be admitted at one priority
and recorded at another.
**Now:** `determine_triage` delegates to `TriageCalculator` — the **single source
of truth** used by both the gate and the survivor record. The only ensemble-
specific behaviour retained is the confidence gate (low confidence → `Unknown`,
except `Immediate`, which is never suppressed — a missed survivor in distress is
costlier than a false positive). Rate bands follow START (<10 / >30 bpm →
Immediate).
**Failing-on-old test:** `detection::ensemble::tests::test_divergent_boundary_28bpm_tremor_gate_equals_survivor`
— 28 bpm Normal + Tremor. Old gate → Delayed, old survivor record → Immediate
(divergent). Unified result: gate == survivor == **Immediate**. Companion tests
(`test_no_vitals_is_unknown_canonical`, `test_normal_breathing_no_movement_is_immediate_canonical`,
the updated `integration_adr001::test_ensemble_classifier_triage_logic`) assert
gate-vs-record equality on every boundary.
### §2 Real RSSI/ToA localization + kill count-inflation (HIGH)
**Was:** `fusion.rs:79 simulate_rssi_measurements` always returned `vec![]`, so
every survivor got `location: None`, so spatial dedup (`disaster_event.rs:285`,
which only fired on `Some` location) was disabled. One trapped person re-detected
across N scan cycles became **N survivors** — a fabricated mass-casualty count.
**Now, two real mechanisms:**
1. **Real RSSI source:** `SensorPosition` gains an optional `last_rssi`
(populated by the hardware layer from actual signal-strength readings).
`collect_rssi_measurements` reads only real per-sensor RSSI and feeds the
existing triangulator; it **never fabricates** a value. With `< min_sensors`
real readings, `estimate_position` returns `None` (honest).
2. **Zone + vitals-signature dedup:** when no usable location exists,
`record_detection` matches an existing *active, un-located* survivor in the
same zone whose latest vital signature (breathing presence + START rate band,
heartbeat presence, movement class) is compatible — collapsing repeat
detections of one person while keeping genuinely distinct survivors separate.
**MEASURED:** `test_identical_vitals_no_location_dedup_to_one` — 3× identical-vitals
/ `None`-location → **1 survivor** (old code: 3). `test_distinct_vitals_no_location_stay_separate`
keeps two distinct survivors at 2 (no under-count). `test_estimate_position_uses_real_rssi`
yields a position from 3 real-RSSI sensors; `test_estimate_position_none_without_real_rssi`
yields `None` (no fabrication).
### §3 Real ESP32/UDP/PCAP CSI ingest; honest typed errors elsewhere (HIGH)
**Was:** `hardware_adapter.rs read_esp32_csi` / `read_udp_csi` / `read_pcap_csi`
returned "not yet implemented" — even though `csi_receiver.rs` already contained a
working `CsiParser` (ESP32 CSV, JSON, Intel5300/Atheros/Nexmon byte decoders) and a
real `PcapCsiReader`.
**Now:**
- **UDP** — binds, receives one datagram, parses (auto-detect) → `CsiReadings`.
End-to-end test sends a real JSON datagram on the wire.
- **PCAP** — `load` + `read_next` + parse. End-to-end test writes a real
little-endian `.pcap` with one record and reads it back.
- **ESP32** — parses `CSI_DATA` CSV via the real parser. Live serial byte I/O is
behind an optional `serial` cargo feature (native `serialport` kept off the
default / aarch64 appliance build); with the feature off, live reads return a
typed `UnsupportedAdapter` while the byte parser still works.
- **Intel 5300 / Atheros / PicoScenes** — return typed
`AdapterError::HardwareUnavailable` / `UnsupportedAdapter` (no device, no
driver, or no validatable format here). **Never fake CSI.** New error variants
added to make the gating typed rather than a `String` "Hardware" soup.
**MEASURED:** `test_esp32_bytes_parse_end_to_end`, `test_udp_read_end_to_end`,
`test_pcap_read_end_to_end`, `test_intel_and_atheros_are_honestly_unavailable`.
### §4 Real parabolic peak interpolation in `find_dominant_frequency` (MED)
**Was:** `breathing.rs:243` comment claimed interpolation but returned the bin
center, capping breathing-rate resolution at ±half a bin.
**Now:** 3-point parabolic (quadratic) peak interpolation,
`δ = 0.5·(yL yR)/(yL 2y0 + yR)`, clamped to `[-0.5, 0.5]`, with an edge
fallback to bin center.
**MEASURED:** `test_find_dominant_frequency_parabolic_interpolation` — for a
parabola-shaped peak at true bin 10.4 the recovery is exact (δ = 0.4); the test
asserts the result lands within half a bin of truth and strictly beats the
old bin-center estimate.
### §5 GDOP honesty (LOW)
**Was:** `triangulation.rs:248 estimate_gdop` returned an ad-hoc average-pair-angle
factor *labelled* GDOP (the same defect class ADR-156 §2.3 fixed elsewhere).
**Now:** real, dimensionless **GDOP = √(trace((HᵀH)⁻¹))** from the range-measurement
Jacobian `H` (unit target→sensor bearings), returning `None` for singular
(collinear) geometry, which the caller treats as factor 1.0 (no fabrication).
**MEASURED:** `test_gdop_is_real_dilution` — a well-spread array gives a lower GDOP
than a near-collinear one, cross-checked against the closed form;
`test_gdop_singular_collinear_is_none` confirms singular geometry returns `None`.
### §6 OccWorld trajectory-prior consumer honesty (fail-safe)
**Finding:** `wifi-densepose-mat` does **not** consume OccWorld trajectory priors
and has no `-worldmodel`/`-worldgraph`/occworld dependency (grep-verified: zero
hits across `crates/wifi-densepose-mat/`). There is therefore no random-derived
prior being consumed. **No code change** is warranted; the fail-safe (ignore
priors until a typed `weights_complete`/`stubbed` flag exists) is already the
status quo by absence. Recorded here so a future consumer wires the flag rather
than re-introducing the risk.
## Negative Results (Confirmed — NO-ACTION)
These were audited and found genuinely correct; they are cited as positives, not
edited:
- **`worldgraph` provenance / privacy / pruning** (`graph.rs`) — append-with-
provenance (`add_semantic_state` + `DerivedFrom`), deterministic LRU pruning
(`prune_semantic_states`, with `prune_is_deterministic_for_equal_timestamps`),
and a privacy rollup (`apply_privacy_mode` → `PrivacyLimitedBy`). Already-SOTA.
- **`worldmodel` occupancy clamp** (`occupancy.rs:74125`) — `to_voxel_xy` /
`to_voxel_z` `.clamp()` voxel indices into `[0, GRID-1]`; the flat index is
always in-bounds. No out-of-bounds / fabrication path.
- **`pointcloud` parser bounds-safety** — another agent's crate; cited only, its
parser is bounds-checked.
## Deferred Backlog (Nothing Dropped)
- **Learned multi-person counter** — DATA-GATED on labelled multi-occupant CSI.
The zone+vitals-signature dedup (§2) is the honest non-learned stand-in until
then.
- **RF point-cloud generation** — ACCEPTED-FUTURE.
- **PicoScenes container decode** — DATA-GATED; needs matching NIC/plugin to
validate against. Returns `UnsupportedAdapter` today.
- **Intel 5300 / Atheros live capture** — DATA-GATED on patched drivers; byte
parsers exist and are exercised on supplied bytes.
## Consequences
- Triage is now a single auditable function; gate and survivor record can never
diverge.
- Survivor counts cannot inflate from repeat detection of one un-located person.
- The CSI ingest layer either produces real data or fails with a typed error that
names *why* — no path silently substitutes simulated/fabricated CSI.
- `SensorPosition` grows an optional `last_rssi` field (serde-`default`, non-
breaking for deserialisation; 7 constructors updated).
- A new optional `serial` feature isolates the native `serialport` dependency from
the default / appliance builds.
## Reproduction (MEASURED)
```bash
cd v2
# MAT — default features (181 unit + 6 + 3[3 ignored] integration)
cargo test -p wifi-densepose-mat
# MAT — all features (same counts; exercises ruvector + api + serde paths)
cargo test -p wifi-densepose-mat --all-features
# MAT — serial feature compiles (native serialport path)
cargo check -p wifi-densepose-mat --features serial
# Sibling crates (cited NO-ACTION; confirmed green)
cargo test -p wifi-densepose-worldmodel # 12 + 1
cargo test -p wifi-densepose-worldgraph # 9
cargo test -p wifi-densepose-geo # 9 + 8
cargo test -p wifi-densepose-engine # 27
```
Result at time of writing: MAT **181 passed; 0 failed** (default and all-features);
worldmodel **13**, worldgraph **9**, geo **17**, engine **27** — all 0 failed.