# ADR-159: Cognitum Appliance Cluster — Beyond-SOTA Sweep, Anti-"AI-Slop" Hardening
- **Status**: accepted
- **Date**: 2026-06-11
- **Deciders**: ruv
- **Tags**: cognitum, cogs, person-count, pose-estimation, ha-matter, drone-swarm, remote-id, manifest, prove-everything
## Context
This ADR records the beyond-SOTA sweep over the Cognitum appliance cluster
(`cog-person-count`, `cog-pose-estimation`, `cog-ha-matter`, `ruview-swarm`),
executed under the project's **prove-everything / anti-"AI-slop"** directive: the
claim surface every cog presents (manifests, descriptions, runtime events,
broadcast fields) must match what the code and the shipped weights actually do.
### Headline — the "never identified anyone" accusation is REFUTED
A read-only audit raised the worst-class accusation: that these cogs are slop that
"never identified anyone." That accusation is **refuted by byte-level evidence**:
- `cog-pose-estimation` and `cog-person-count` ship **real, trained Candle models**
(`pose_v1.safetensors`, `count_v1.safetensors`), not placeholders. The forward
passes (`PoseNet`, `CountNet`) mirror the training scripts exactly and run on
real CSI bytes.
- The artifacts are **SHA-pinned and Ed25519-signed**: the on-disk
`manifests/x86_64/manifest.json` carries a real `binary_sha256`
(`051614ce…388b3` for person-count, `a434739a…71fa` for pose), a real
`weights_sha256`, and a `binary_signature` over `sig_algo: Ed25519`.
- The manifests are **brutally honest about accuracy**: person-count's
`build_metadata` ships `training_class1_accuracy = 0.343` and a candid
`training_caveat`; pose ships `training_pck20 = 3.0` / `training_pck50 = 18.5`.
Nothing is inflated. That honesty *is* the anti-slop win — the models are weak
in the field, and the manifests say so.
So the cogs **do** run real trained inference and **do** disclose how weak it is.
What the audit correctly found were not fabrications but **claim-surface
overclaims** — four places where the surface said more than the weights deliver.
This ADR tightens those four (A1 to A4), softens one further description label
(A5), and cites the already-correct subsystems as NO-ACTION positives.
Grading vocabulary follows ADR-152 / ADR-158:
- **MEASURED** — reproduced in this worktree, command + failing-on-old test recorded.
- **DATA-GATED** — real code path present; honestly flagged where data/hardware is absent.
- **NO-ACTION (already-SOTA)** — audited, found correct, cited as a positive.
- **ACCEPTED-FUTURE** — deliberately deferred, nothing dropped.
## Graded SOTA Landscape
| Capability | Grade | Note |
|------------|-------|------|
| CSI person counting (`cog-person-count`) | **DATA-GATED** | Real Candle count head + Bayesian fusion; weights trained only on classes 0/1 (presence). Multi-occupant accuracy is genuinely unproven and is **not fabricated** — counts above the trained range are now flagged `low_confidence` and clamped. |
| CSI pose estimation (`cog-pose-estimation`) | **DATA-GATED** | Real Candle encoder + 17-keypoint head; field accuracy honestly weak (PCK@50 = 18.5%, disclosed in the manifest). The default-install gate bug (A1) is fixed so it actually emits frames. |
| Signed cog manifests (Ed25519 + SHA-256) | **NO-ACTION (already-SOTA)** | On-disk manifests are real, signed, SHA-pinned, and honest about accuracy. The CLI now emits them verbatim (A4). |
| HA bridge (`cog-ha-matter`) MQTT + witness | **NO-ACTION (already-SOTA)** | Real Ed25519 hash-chain witness, mDNS, embedded broker. Matter commissioning is honestly deferred to v0.8 (TLS off, LAN-only) — description softened to stop claiming Matter (honest-absence). |
| Drone-swarm MARL (`ruview-swarm`) | **DATA-GATED / honest** | `candle_ppo.rs` is real autodiff PPO; it is **untrained at runtime** (random init) by design — the swarm must be trained before deploy, which the code does not hide. |
| ASTM F3411 Remote ID | **MEASURED (A3)** | Basic ID message is real; the Location/Vector message is honestly *not* implemented (NED metres are no longer mislabelled as WGS84 lat/lon). |
## Decision — Fixes Landed (MEASURED)
### §A1 Pose runtime emitted ZERO frames under default config (HIGH)
**Overclaim (silent correctness bug):** `inference.rs` hardcoded
`confidence: 0.185` for every inference, `config.rs default_min_confidence()`
returned `0.3`, and `runtime.rs` gated emission on `confidence >= min_confidence`.
A default install therefore **never emitted a single `pose.frame`** while
`health` reported healthy — the cog *claimed* to be a running pose estimator but
silently produced nothing.
**Real fix:** `pose_v1` has **no confidence head** (the head emits 34 keypoint
coordinates only), so a real per-frame confidence is genuinely unavailable. We
took the disclosed "ok" path rather than silently lowering the threshold:
- Introduced `inference::MODEL_TYPICAL_CONFIDENCE = 0.185` (the validation PCK@50)
as the single published per-frame confidence, used by both `infer()` and the
config default.
- Pinned `default_min_confidence()` to `MODEL_TYPICAL_CONFIDENCE` so a default
install clears its own gate and emits.
- Documented the trade-off in the config field doc, the JSON schema
(`default` 0.3 → 0.185, with a description), **and** added a `run.started`
warning in `main.rs` that fires when an operator raises `min_confidence` above
the model's typical confidence — so a deliberately-high threshold is loud, not
silent.
**Failing-on-old test:** `cog_pose_estimation` smoke
`default_config_emits_frames_with_real_model` — parses a default config and
asserts `min_confidence <= MODEL_TYPICAL_CONFIDENCE` (and, with the real model
loaded, that `infer().confidence >= min_confidence`). **Proven to fail** on the
old `default_min_confidence()=0.3`:
`default min_confidence 0.3 exceeds model typical confidence 0.185 — a default
install would emit zero pose.frame events`.
**Grade: MEASURED.**
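
The gate interaction described in §A1 can be sketched as follows. This is a minimal illustration, not the cog's actual source: `MODEL_TYPICAL_CONFIDENCE` and `default_min_confidence()` are named in this ADR, while `should_emit` and `threshold_warning` are hypothetical helpers introduced here, and the warning text is illustrative only.

```rust
/// Single published per-frame confidence (the validation PCK@50 cited in §A1).
pub const MODEL_TYPICAL_CONFIDENCE: f32 = 0.185;

/// Default gate, pinned to the model's typical confidence so that a default
/// install clears its own threshold.
pub fn default_min_confidence() -> f32 {
    MODEL_TYPICAL_CONFIDENCE
}

/// Hypothetical helper mirroring the runtime gate: emit a frame only when the
/// inference confidence reaches the configured minimum.
pub fn should_emit(confidence: f32, min_confidence: f32) -> bool {
    confidence >= min_confidence
}

/// Hypothetical helper: returns a warning message when an operator raises the
/// threshold above what the model can ever report, so the dropout is loud.
pub fn threshold_warning(min_confidence: f32) -> Option<String> {
    if min_confidence > MODEL_TYPICAL_CONFIDENCE {
        Some(format!(
            "min_confidence {} exceeds model typical confidence {}; no pose.frame events will be emitted",
            min_confidence, MODEL_TYPICAL_CONFIDENCE
        ))
    } else {
        None
    }
}
```

With the old default of `0.3`, `should_emit(MODEL_TYPICAL_CONFIDENCE, 0.3)` is false for every frame, which is the silent-zero-output bug; with the pinned default the gate passes and no warning is produced.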
### §A2 8-class count head on a 2-class-trained model (MEDIUM)
**Overclaim:** `inference.rs COUNT_CLASSES = 8` with argmax over {0..7}, but
`count_train_results.json` has support only for classes 0 and 1 (`per_class_accuracy`
keys `"0"`/`"1"`). The model is a **presence detector**, not a calibrated
multi-occupant counter; an argmax on classes 2..=7 is out-of-distribution, yet the
cog would emit it as a confident headcount. The Cargo.toml billed it as a
"learned multi-person counter."
**Real fix (no network change — DATA-GATED, accuracy not fabricated):**
- Added `inference::MAX_TRAINED_CLASS = 1`, plus `CountPrediction::is_low_confidence()`
(argmax beyond the trained ceiling) and `clamped_count()` (report clamped to the
trained range, raw argmax kept for audit).
- `person.count` events now carry `low_confidence` + `raw_count`, and downgrade to
`level: "warn"` when out-of-distribution; the reported `count` is clamped so we
never emit a fabricated headcount the weights can't back.
- `run.started` discloses `count_max_trained_class` and `count_classes`.
- Cargo.toml description changed from "learned multi-person counter" to
"presence detector + (data-gated) person count".
**Failing-on-old test:** `cog_person_count` smoke
`untrained_class_argmax_is_flagged_low_confidence` — a prediction whose argmax is
class 5 is asserted `is_low_confidence() == true` and `clamped_count() ==
MAX_TRAINED_CLASS`; a class-1 prediction is asserted *not* flagged. Fails on old
code (no such methods/flag existed).
**Grade: MEASURED (mechanism); multi-occupant accuracy DATA-GATED.**
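
A minimal sketch of the flag-and-clamp mechanism from §A2, assuming a prediction that simply holds one score per class. The names `COUNT_CLASSES`, `MAX_TRAINED_CLASS`, `is_low_confidence()` and `clamped_count()` come from this ADR; the `scores` field, the `raw_count()` and `level()` methods, and the `"info"` level for in-range predictions are assumptions made for illustration.

```rust
pub const COUNT_CLASSES: usize = 8;
/// Highest class with training support (classes 0 and 1 only).
pub const MAX_TRAINED_CLASS: usize = 1;

#[derive(Debug, Clone, PartialEq)]
pub struct CountPrediction {
    /// Hypothetical field: one score per class from the count head.
    pub scores: [f32; COUNT_CLASSES],
}

impl CountPrediction {
    /// Raw argmax over all classes (first index wins ties); kept for audit.
    pub fn raw_count(&self) -> usize {
        let mut best = 0;
        for (i, &s) in self.scores.iter().enumerate() {
            if s > self.scores[best] {
                best = i;
            }
        }
        best
    }

    /// True when the argmax lies beyond the trained ceiling.
    pub fn is_low_confidence(&self) -> bool {
        self.raw_count() > MAX_TRAINED_CLASS
    }

    /// Count to report: the argmax clamped to the trained range.
    pub fn clamped_count(&self) -> usize {
        self.raw_count().min(MAX_TRAINED_CLASS)
    }

    /// Event level: "warn" when out-of-distribution ("info" is an assumption).
    pub fn level(&self) -> &'static str {
        if self.is_low_confidence() { "warn" } else { "info" }
    }
}
```

Keeping `raw_count()` alongside `clamped_count()` preserves the evidence of what the network produced while ensuring the reported headcount never exceeds what the weights can back.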
### §A3 Remote ID broadcast NED metres as WGS84 lat/lon (MEDIUM — safety/compliance)
**Overclaim (compliance hazard):** `security/remote_id.rs update()` stored
`state.position.x/.y` (NED **metres**) into `drone_lat`/`drone_lon`, so the Remote
ID broadcast would carry physically-impossible coordinates (e.g. "latitude =
37.5 m"). The module doc claimed a "Basic ID + Location/Vector message," but only
`encode_basic_id()` exists.
**Real fix (honest naming — never broadcast impossible coordinates):**
- Renamed `drone_lat`/`drone_lon` → `drone_north_m`/`drone_east_m` (NED metres
relative to the operator/takeoff datum), with field docs stating they are *not*
geodetic. `operator_lat`/`operator_lon` remain true WGS84 (from the operator's
GNSS).
- Corrected the module doc to claim **Basic ID only**; the Location/Vector encoder
is explicitly deferred until a datum-anchored NED→WGS84 transform lands
(ACCEPTED-FUTURE), rather than removing a real feature.
**Failing-on-old test:** `security::remote_id::tests::test_ned_offset_stored_as_metres_not_latlon`
— a 37.5 m north / 12.0 m east NED offset is asserted to land in
`drone_north_m`/`drone_east_m`; the operator's real WGS84 fix stays in range. Fails
on old code, where these values were stored into `drone_lat`/`drone_lon`.
**Grade: MEASURED.**
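
The renamed fields in §A3 can be pictured with the sketch below. Only the four field names are taken from this ADR; the struct name `RemoteIdState`, the `update_ned` method, and the `is_valid_wgs84` range check are hypothetical, and the operator coordinates used in any example are placeholders rather than project data.

```rust
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct RemoteIdState {
    /// Metres north of the operator/takeoff datum. NOT geodetic.
    pub drone_north_m: f64,
    /// Metres east of the operator/takeoff datum. NOT geodetic.
    pub drone_east_m: f64,
    /// True WGS84 latitude in degrees, from the operator's GNSS.
    pub operator_lat: f64,
    /// True WGS84 longitude in degrees, from the operator's GNSS.
    pub operator_lon: f64,
}

impl RemoteIdState {
    /// Hypothetical update: store the NED offset in the metre-named fields
    /// and leave the geodetic operator fix untouched.
    pub fn update_ned(&mut self, north_m: f64, east_m: f64) {
        self.drone_north_m = north_m;
        self.drone_east_m = east_m;
    }
}

/// Range check for a geodetic coordinate pair in degrees.
pub fn is_valid_wgs84(lat: f64, lon: f64) -> bool {
    (-90.0..=90.0).contains(&lat) && (-180.0..=180.0).contains(&lon)
}
```

A range check alone would not have caught the original bug, since a 37.5 m offset happens to fall inside the valid latitude range; the unit-bearing field names are what stop a metre value from being read as degrees.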
### §A4 Hollow CLI manifest (LOW)
**Overclaim:** `cog-person-count main.rs cmd_manifest` emitted a null skeleton
(`binary_sha256: null`, no training metadata), making the CLI look unsigned even
though the **real signed manifest** existed at
`cog/artifacts/manifests/x86_64/manifest.json`.
**Real fix:** new `cog_person_count::manifest` module `include_str!`-embeds the
real signed manifests (x86_64 + arm), selected by build target arch.
`cmd_manifest` now parses-then-emits the embedded signed manifest — exactly the
pattern `cog-pose-estimation`'s `manifest_roundtrips` test demonstrates. The CLI
now reports the real `binary_sha256`, `weights_sha256`, Ed25519 signature, and
honest `build_metadata` (`training_class1_accuracy = 0.343`).
**Failing-on-old test:** `manifest::tests::embedded_manifest_has_non_null_binary_sha256`
asserts a 64-hex-char `binary_sha256`; companions assert the embedded manifest is
signed (`sig_algo == Ed25519`) and `id == COG_ID`. End-to-end verified:
`cog-person-count manifest` prints `binary_sha256:
051614ce6ba63df704fae848a67ad095df4bb88862fdff05ef3c0419cc8388b3`.
**Grade: MEASURED.**
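
The 64-hex-character assertion and the per-architecture selection in §A4 might look like the sketch below. Both function names are hypothetical, and the `"arm"` label is an assumption; the real module is described as embedding the manifest files with `include_str!`, which is omitted here so the example stays self-contained.

```rust
/// True when `s` looks like a SHA-256 digest in hex: exactly 64 ASCII hex
/// digits. Note that this accepts upper-case as well as lower-case digits.
pub fn is_sha256_hex(s: &str) -> bool {
    s.len() == 64 && s.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Hypothetical selector for which embedded manifest to emit, decided at
/// compile time from the build target architecture.
pub fn manifest_arch() -> &'static str {
    if cfg!(target_arch = "x86_64") {
        "x86_64"
    } else {
        "arm" // assumed label for the non-x86_64 manifest
    }
}
```

Applied to the digest printed by `cog-person-count manifest`, `is_sha256_hex` returns true, whereas the old null skeleton has no string that could pass it.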
### §A5 cog-ha-matter description claimed Matter before it exists (LOW — honest-labeling)
**Overclaim:** the Cargo.toml description said "Home Assistant + Matter
integration," but Matter commissioning is deferred to v0.8 (`TlsConfig::Off`,
LAN-only, asserted by `runtime.rs tls_defaults_to_off_for_v1_lan_only`).
**Real fix (no code change):** softened the description to "Home Assistant (MQTT)
integration … LAN-only (no TLS); Matter Bridge commissioning is deferred to v0.8
and not yet implemented." Mirrors ADR-158 §6 honest-absence: state what isn't
there rather than implying it is.
**Grade: MEASURED (label).**
## Negative Results (Confirmed — NO-ACTION positives)
Audited and found genuinely correct; cited as positives, not edited:
- **`cog-ha-matter` witness chain** (`witness.rs` / `witness_signing.rs`) — real
Ed25519 hash-chained witness log. Already-SOTA.
- **`cog-person-count` fusion** (`fusion.rs`) — real Bayesian product-of-experts
multi-node fusion (Stoer-Wagner-bounded clip), not a heuristic. Already-SOTA.
- **`ruview-swarm` PPO** (`marl/candle_ppo.rs`) — real Candle autodiff PPO with a
genuine policy-gradient update; its `randn` uses (init, action sampling,
exploration) are all legitimate, not fake-output substitutes. Untrained at
runtime by design (the swarm must be trained before deploy), which the code
does not hide. Already-SOTA / honest.
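
For readers unfamiliar with product-of-experts fusion, the general technique named in the `fusion.rs` item above can be sketched as follows. This is a generic illustration under stated assumptions, not the contents of `fusion.rs`: the function name, the log-space accumulation, and the `FLOOR` constant are choices made here, and the Stoer-Wagner-bounded clip is not modelled.

```rust
/// Fuse per-node class distributions by multiplying them elementwise and
/// renormalising. Returns `None` for empty input or mismatched lengths.
pub fn product_of_experts(node_probs: &[Vec<f64>]) -> Option<Vec<f64>> {
    let first = node_probs.first()?;
    let n = first.len();
    if n == 0 || node_probs.iter().any(|p| p.len() != n) {
        return None;
    }
    // Accumulate log-probabilities so many small factors do not underflow.
    // FLOOR is an assumed guard that keeps ln() finite for zero entries.
    const FLOOR: f64 = 1e-12;
    let mut log_post = vec![0.0f64; n];
    for p in node_probs {
        for (acc, &v) in log_post.iter_mut().zip(p) {
            *acc += v.max(FLOOR).ln();
        }
    }
    // Subtract the maximum before exponentiating for numerical stability.
    let max = log_post.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let unnorm: Vec<f64> = log_post.iter().map(|l| (l - max).exp()).collect();
    let z: f64 = unnorm.iter().sum();
    Some(unnorm.into_iter().map(|u| u / z).collect())
}
```

Two nodes that each lean towards the same class reinforce one another: fusing `[0.6, 0.4]` with `[0.7, 0.3]` yields a distribution proportional to `[0.42, 0.12]`.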
## Deferred Backlog (Nothing Dropped)
- **Multi-occupant count accuracy** — DATA-GATED on labelled multi-occupant CSI.
The `low_confidence` flag + clamp (§A2) is the honest stand-in until then.
- **Remote ID Location/Vector message** — ACCEPTED-FUTURE; requires a
datum-anchored local-tangent-plane NED→WGS84 transform with an operator datum.
Basic ID ships today.
- **Matter Bridge commissioning** — ACCEPTED-FUTURE (v0.8); LAN-only MQTT ships today.
- **Criterion benches** for cog inference latency and `mesh_guard` — ACCEPTED-FUTURE
(cold-start timings are recorded in the manifests' `build_metadata`, not yet a
regression bench).
- **`wasm-edge` skill accuracy** — unvalidated; **now honestly labelled, not
claimed** (done in ADR-160: medical/affect/security/exotic claim surfaces
disclaimed, renamed, and feature-gated; per-skill accuracy remains DATA-GATED).
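
To indicate what the deferred NED→WGS84 transform in the Remote ID item above involves, here is a rough sketch using a spherical-earth, small-offset approximation. It is not the deferred implementation and is not suitable for compliance use: the function name and the mean-radius constant are assumptions, and a production transform would use the WGS84 ellipsoid and a proper local tangent plane.

```rust
/// Mean Earth radius in metres (spherical approximation, an assumption here).
const EARTH_RADIUS_M: f64 = 6_371_000.0;

/// Approximate geodetic position for a small north/east offset from a datum.
/// Inputs and outputs are in degrees and metres. Returns `None` near the
/// poles, where the east-to-longitude scaling breaks down.
pub fn ned_to_wgs84_approx(
    datum_lat_deg: f64,
    datum_lon_deg: f64,
    north_m: f64,
    east_m: f64,
) -> Option<(f64, f64)> {
    let cos_lat0 = datum_lat_deg.to_radians().cos();
    if cos_lat0.abs() < 1e-9 {
        return None;
    }
    // Arc length divided by radius gives the angle in radians.
    let lat = datum_lat_deg + (north_m / EARTH_RADIUS_M).to_degrees();
    // Meridians converge towards the poles, hence the cos(latitude) factor.
    let lon = datum_lon_deg + (east_m / (EARTH_RADIUS_M * cos_lat0)).to_degrees();
    Some((lat, lon))
}
```

The sketch makes the dependency explicit: without a datum latitude and longitude, the metre offsets stored in `drone_north_m`/`drone_east_m` cannot be turned into coordinates at all, which is why the Location/Vector message stays deferred.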
## Consequences
- A default pose-estimation install now actually emits `pose.frame` events;
raising the threshold above the model's reach is a loud `run.started` warning,
not a silent dropout.
- A person-count reading on an untrained class is flagged `low_confidence`,
clamped, and downgraded to `warn` — no fabricated headcounts.
- The Remote ID broadcast can never carry physically-impossible coordinates; NED
metres live in honestly-named metre fields.
- `cog-person-count manifest` now reports the real signed manifest instead of a
hollow null skeleton.
- No cog Cargo.toml description claims a capability (multi-person counting, Matter)
the code/weights don't yet deliver.
## Reproduction (MEASURED)
```bash
cd v2
cargo test -p cog-person-count -p cog-pose-estimation -p cog-ha-matter -p ruview-swarm \
--no-default-features
# ruview-swarm train path compiles (PPO autodiff)
cargo check -p ruview-swarm --features train
# A4 end-to-end — real signed manifest, non-null binary_sha256
cargo run -q -p cog-person-count --no-default-features -- manifest
```
Results at time of writing (0 failures in every crate):
- `cog-person-count`: **19 passed** (lib 10 incl. 3 manifest; smoke 9)
- `cog-pose-estimation`: **8 passed** (smoke)
- `cog-ha-matter`: **64 passed** (unchanged; description-only edit)
- `ruview-swarm`: **117 passed** (default features); `--features train` compiles clean.
Scope was limited to the four named crates. NO-ACTION positives (witness chain,
fusion, PPO + randn audit) were verified by inspection and left untouched.