ADR-159: Cognitum Appliance Cluster — Beyond-SOTA Sweep, Anti-"AI-Slop" Hardening

  • Status: accepted
  • Date: 2026-06-11
  • Deciders: ruv
  • Tags: cognitum, cogs, person-count, pose-estimation, ha-matter, drone-swarm, remote-id, manifest, prove-everything

Context

This ADR records the beyond-SOTA sweep over the Cognitum appliance cluster (cog-person-count, cog-pose-estimation, cog-ha-matter, ruview-swarm), executed under the project's prove-everything / anti-"AI-slop" directive: the claim surface every cog presents (manifests, descriptions, runtime events, broadcast fields) must match what the code and the shipped weights actually do.

Headline — the "never identified anyone" accusation is REFUTED

A read-only audit raised the worst-class accusation: that these cogs are slop that "never identified anyone." That accusation is refuted by byte-level evidence:

  • cog-pose-estimation and cog-person-count ship real, trained Candle models (pose_v1.safetensors, count_v1.safetensors), not placeholders. The forward passes (PoseNet, CountNet) mirror the training scripts exactly and run on real CSI bytes.
  • The artifacts are SHA-pinned and Ed25519-signed: the on-disk manifests/x86_64/manifest.json carries a real binary_sha256 (051614ce…388b3 for person-count, a434739a…71fa for pose), a real weights_sha256, and a binary_signature with sig_algo: Ed25519.
  • The manifests are brutally honest about accuracy: person-count's build_metadata ships training_class1_accuracy = 0.343 and a candid training_caveat; pose ships training_pck20 = 3.0 / training_pck50 = 18.5. Nothing is inflated. That honesty is the anti-slop win — the models are weak in the field, and the manifests say so.

So the cogs do run real trained inference and do disclose how weak it is. What the audit correctly found were not fabrications but claim-surface overclaims: four places where the surface said more than the weights deliver. This ADR tightens those four (A1–A4), softens one crate description that claimed Matter before it exists (A5), and cites the already-correct subsystems as NO-ACTION positives.

Grading vocabulary follows ADR-152 / ADR-158:

  • MEASURED — reproduced in this worktree, command + failing-on-old test recorded.
  • DATA-GATED — real code path present; honestly flagged where data/hardware is absent.
  • NO-ACTION (already-SOTA) — audited, found correct, cited as a positive.
  • ACCEPTED-FUTURE — deliberately deferred, nothing dropped.

Graded SOTA Landscape

| Capability | Grade | Note |
|---|---|---|
| CSI person counting (cog-person-count) | DATA-GATED | Real Candle count head + Bayesian fusion; weights trained only on classes 0/1 (presence). Multi-occupant accuracy is genuinely unproven and is not fabricated: counts above the trained range are now flagged low_confidence and clamped. |
| CSI pose estimation (cog-pose-estimation) | DATA-GATED | Real Candle encoder + 17-keypoint head; field accuracy honestly weak (PCK@50 = 18.5%, disclosed in the manifest). The default-install gate bug (A1) is fixed so it actually emits frames. |
| Signed cog manifests (Ed25519 + SHA-256) | NO-ACTION (already-SOTA) | On-disk manifests are real, signed, SHA-pinned, and honest about accuracy. The CLI now emits them verbatim (A4). |
| HA bridge (cog-ha-matter) MQTT + witness | NO-ACTION (already-SOTA) | Real Ed25519 hash-chain witness, mDNS, embedded broker. Matter commissioning is honestly deferred to v0.8 (TLS off, LAN-only); description softened to stop claiming Matter (honest-absence). |
| Drone-swarm MARL (ruview-swarm) | DATA-GATED / honest | candle_ppo.rs is real autodiff PPO; it is untrained at runtime (random init) by design: the swarm must be trained before deploy, which the code does not hide. |
| ASTM F3411 Remote ID | MEASURED (A3) | Basic ID message is real; the Location/Vector message is honestly not implemented (NED metres are no longer mislabelled as WGS84 lat/lon). |

Decision — Fixes Landed (MEASURED)

§A1 Pose runtime emitted ZERO frames under default config (HIGH)

Overclaim (silent correctness bug): inference.rs hardcoded confidence: 0.185 for every inference, config.rs default_min_confidence() returned 0.3, and runtime.rs gated emission on confidence >= min_confidence. A default install therefore never emitted a single pose.frame while health reported healthy — the cog claimed to be a running pose estimator but silently produced nothing.

Real fix: pose_v1 has no confidence head (the head emits 34 keypoint coordinates only), so a real per-frame confidence is genuinely unavailable. We took the disclosed "ok" path rather than silently lowering the threshold:

  • Introduced inference::MODEL_TYPICAL_CONFIDENCE = 0.185 (the validation PCK@50) as the single published per-frame confidence, used by both infer() and the config default.
  • Pinned default_min_confidence() to MODEL_TYPICAL_CONFIDENCE so a default install clears its own gate and emits.
  • Documented the trade-off in the config field doc, the JSON schema (default 0.3 → 0.185, with a description), and added a run.started warning in main.rs that fires when an operator raises min_confidence above the model's typical confidence — so a deliberately-high threshold is loud, not silent.
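
The gate logic described above can be sketched as follows. This is a minimal illustration rather than the crate's actual code: MODEL_TYPICAL_CONFIDENCE and default_min_confidence() are named in this ADR, while should_emit and startup_warning are hypothetical helper names, and the warning text is invented for the example.

```rust
/// Single published per-frame confidence (value taken from this ADR).
pub const MODEL_TYPICAL_CONFIDENCE: f32 = 0.185;

/// Default gate threshold, pinned to the model's typical confidence so a
/// default install clears its own gate.
pub fn default_min_confidence() -> f32 {
    MODEL_TYPICAL_CONFIDENCE
}

/// Hypothetical stand-in for the runtime gate: a frame is emitted only
/// when its confidence reaches the configured threshold.
pub fn should_emit(confidence: f32, min_confidence: f32) -> bool {
    confidence >= min_confidence
}

/// Hypothetical startup check: returns a warning when the operator's
/// threshold is above anything the model will ever report.
pub fn startup_warning(min_confidence: f32) -> Option<String> {
    if min_confidence > MODEL_TYPICAL_CONFIDENCE {
        Some(format!(
            "min_confidence {} exceeds model typical confidence {}; no pose.frame events will be emitted",
            min_confidence, MODEL_TYPICAL_CONFIDENCE
        ))
    } else {
        None
    }
}
```

With the old default of 0.3, should_emit(0.185, 0.3) is false for every frame, which is the silent dropout this section fixes; with the pinned default the gate passes and no warning is produced.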

Failing-on-old test: cog_pose_estimation smoke default_config_emits_frames_with_real_model parses a default config and asserts min_confidence <= MODEL_TYPICAL_CONFIDENCE (and, with the real model loaded, that infer().confidence >= min_confidence). Proven to fail on the old default_min_confidence() = 0.3 with the message: "default min_confidence 0.3 exceeds model typical confidence 0.185 — a default install would emit zero pose.frame events".

Grade: MEASURED.

§A2 8-class count head on a 2-class-trained model (MEDIUM)

Overclaim: inference.rs COUNT_CLASSES = 8 with argmax over {0..7}, but count_train_results.json has support only for classes 0 and 1 (per_class_accuracy keys "0"/"1"). The model is a presence detector, not a calibrated multi-occupant counter; an argmax on classes 2..=7 is out-of-distribution, yet the cog would emit it as a confident headcount. The Cargo.toml billed it as a "learned multi-person counter."

Real fix (no network change — DATA-GATED, accuracy not fabricated):

  • Added inference::MAX_TRAINED_CLASS = 1, plus CountPrediction::is_low_confidence() (argmax beyond the trained ceiling) and clamped_count() (report clamped to the trained range, raw argmax kept for audit).
  • person.count events now carry low_confidence + raw_count, and downgrade to level: "warn" when out-of-distribution; the reported count is clamped so we never emit a fabricated headcount the weights can't back.
  • run.started discloses count_max_trained_class and count_classes.
  • Cargo.toml description changed from "learned multi-person counter" to "presence detector + (data-gated) person count".
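
A rough sketch of the flag-and-clamp mechanism follows. The constants and the method names is_low_confidence() and clamped_count() come from this ADR; the struct layout (a plain probability array), the raw_count() helper, and the "info" level for in-distribution events are assumptions made for illustration.

```rust
/// Number of output classes on the count head (from this ADR).
pub const COUNT_CLASSES: usize = 8;
/// Highest class with training support (from this ADR).
pub const MAX_TRAINED_CLASS: usize = 1;

/// Hypothetical layout: one probability per class.
pub struct CountPrediction {
    pub probs: [f32; COUNT_CLASSES],
}

impl CountPrediction {
    /// Raw argmax over all classes, kept for audit.
    pub fn raw_count(&self) -> usize {
        let mut best = 0;
        for (i, p) in self.probs.iter().enumerate() {
            if *p > self.probs[best] {
                best = i;
            }
        }
        best
    }

    /// True when the argmax lies beyond the trained ceiling.
    pub fn is_low_confidence(&self) -> bool {
        self.raw_count() > MAX_TRAINED_CLASS
    }

    /// Count that is safe to report: never above the trained range.
    pub fn clamped_count(&self) -> usize {
        self.raw_count().min(MAX_TRAINED_CLASS)
    }

    /// Event level: "warn" when out-of-distribution ("info" is assumed).
    pub fn level(&self) -> &'static str {
        if self.is_low_confidence() { "warn" } else { "info" }
    }
}
```

Keeping the raw argmax alongside the clamped value means an auditor can still see what the head produced, while downstream consumers only ever receive a count the weights can back.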

Failing-on-old test: cog_person_count smoke untrained_class_argmax_is_flagged_low_confidence — a prediction whose argmax is class 5 is asserted is_low_confidence() == true and clamped_count() == MAX_TRAINED_CLASS; a class-1 prediction is asserted not flagged. Fails on old code (no such methods/flag existed).

Grade: MEASURED (mechanism); multi-occupant accuracy DATA-GATED.

§A3 Remote ID broadcast NED metres as WGS84 lat/lon (MEDIUM — safety/compliance)

Overclaim (compliance hazard): security/remote_id.rs update() stored state.position.x/.y (NED metres) into drone_lat/drone_lon, so the Remote ID broadcast would carry physically-impossible coordinates (e.g. "latitude = 37.5 m"). The module doc claimed a "Basic ID + Location/Vector message," but only encode_basic_id() exists.

Real fix (honest naming — never broadcast impossible coordinates):

  • Renamed drone_lat/drone_lon → drone_north_m/drone_east_m (NED metres relative to the operator/takeoff datum), with field docs stating they are not geodetic. operator_lat/operator_lon remain true WGS84 (from the operator's GNSS).
  • Corrected the module doc to claim Basic ID only; the Location/Vector encoder is explicitly deferred until a datum-anchored NED→WGS84 transform lands (ACCEPTED-FUTURE), rather than removing a real feature.
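
The renamed fields might look roughly like the sketch below. The four field names are the ones given in this ADR; the struct name, the update() signature, and the operator_fix_in_range() helper are hypothetical and only illustrate keeping local metres and geodetic degrees in separately named fields.

```rust
/// Hypothetical state struct; field names follow this ADR.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct RemoteIdState {
    /// NED north offset in metres from the datum. NOT geodetic.
    pub drone_north_m: f64,
    /// NED east offset in metres from the datum. NOT geodetic.
    pub drone_east_m: f64,
    /// Operator latitude, WGS84 degrees.
    pub operator_lat: f64,
    /// Operator longitude, WGS84 degrees.
    pub operator_lon: f64,
}

impl RemoteIdState {
    /// Stores a local NED offset in the metre fields only
    /// (assumed signature; the real update() reads state.position).
    pub fn update(&mut self, north_m: f64, east_m: f64) {
        self.drone_north_m = north_m;
        self.drone_east_m = east_m;
    }

    /// True when the operator fix is a physically possible WGS84 position.
    pub fn operator_fix_in_range(&self) -> bool {
        (-90.0..=90.0).contains(&self.operator_lat)
            && (-180.0..=180.0).contains(&self.operator_lon)
    }
}
```

A 37.5 m north / 12.0 m east offset then lands in drone_north_m/drone_east_m and can never be read back as a latitude, because no latitude field for the drone exists.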

Failing-on-old test: security::remote_id::tests::test_ned_offset_stored_as_metres_not_latlon — a 37.5 m north / 12.0 m east NED offset is asserted to land in drone_north_m/drone_east_m; the operator's real WGS84 fix stays in range. Fails on old code, where these values were stored into drone_lat/drone_lon.

Grade: MEASURED.

§A4 Hollow CLI manifest (LOW)

Overclaim: cog-person-count main.rs cmd_manifest emitted a null skeleton (binary_sha256: null, no training metadata), making the CLI look unsigned even though the real signed manifest existed at cog/artifacts/manifests/x86_64/manifest.json.

Real fix: a new cog_person_count::manifest module embeds the real signed manifests (x86_64 + arm) via include_str!, selected by build target arch. cmd_manifest now parses and then emits the embedded signed manifest, the same pattern that cog-pose-estimation's manifest_roundtrips test demonstrates. The CLI now reports the real binary_sha256, weights_sha256, Ed25519 signature, and honest build_metadata (training_class1_accuracy = 0.343).

Failing-on-old test: manifest::tests::embedded_manifest_has_non_null_binary_sha256 asserts a 64-hex-char binary_sha256; companions assert the embedded manifest is signed (sig_algo == Ed25519) and id == COG_ID. End-to-end verified: cog-person-count manifest prints binary_sha256: 051614ce6ba63df704fae848a67ad095df4bb88862fdff05ef3c0419cc8388b3.
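
The two checks involved can be sketched with the standard library alone. Both function names are hypothetical, and the mapping of "aarch64" onto the arm manifest is an assumption; the ADR only says that x86_64 and arm manifests are embedded and selected by build target arch.

```rust
/// True when `s` looks like a SHA-256 digest: exactly 64 hex characters.
pub fn is_sha256_hex(s: &str) -> bool {
    s.len() == 64 && s.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Hypothetical arch-to-manifest selection. Returns the manifest
/// directory name for a target arch, or None when no manifest ships.
pub fn manifest_dir_for_arch(arch: &str) -> Option<&'static str> {
    match arch {
        "x86_64" => Some("x86_64"),
        // Assumption: 32- and 64-bit ARM share the one arm manifest.
        "arm" | "aarch64" => Some("arm"),
        _ => None,
    }
}

/// Manifest directory for the arch this binary was compiled for.
pub fn manifest_dir_for_build_target() -> Option<&'static str> {
    manifest_dir_for_arch(std::env::consts::ARCH)
}
```

A null skeleton fails the first check immediately, since the literal null is neither 64 characters long nor hexadecimal.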

Grade: MEASURED.

§A5 cog-ha-matter description claimed Matter before it exists (LOW — honest-labeling)

Overclaim: the Cargo.toml description said "Home Assistant + Matter integration," but Matter commissioning is deferred to v0.8 (TlsConfig::Off, LAN-only, asserted by runtime.rs tls_defaults_to_off_for_v1_lan_only).

Real fix (no code change): softened the description to "Home Assistant (MQTT) integration … LAN-only (no TLS); Matter Bridge commissioning is deferred to v0.8 and not yet implemented." Mirrors ADR-158 §6 honest-absence: state what isn't there rather than implying it is.

Grade: MEASURED (label).

Negative Results (Confirmed — NO-ACTION positives)

Audited and found genuinely correct; cited as positives, not edited:

  • cog-ha-matter witness chain (witness.rs / witness_signing.rs) — real Ed25519 hash-chained witness log. Already-SOTA.
  • cog-person-count fusion (fusion.rs) — real Bayesian product-of-experts multi-node fusion (Stoer-Wagner-bounded clip), not a heuristic. Already-SOTA.
  • ruview-swarm PPO (marl/candle_ppo.rs) — real Candle autodiff PPO with a genuine policy-gradient update; its randn uses (init, action sampling, exploration) are all legitimate, not fake-output substitutes. Untrained at runtime by design (the swarm must be trained before deploy), which the code does not hide. Already-SOTA / honest.
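
For readers unfamiliar with the fusion technique named in the list above, a generic product-of-experts combination can be sketched as below. It is not the code in fusion.rs: the function name, the log-domain formulation, and the simple probability floor are assumptions, and the floor merely stands in for whatever clipping the crate actually applies.

```rust
/// Fuses per-node class distributions by multiplying them elementwise
/// and renormalising. Returns None for empty, ragged, or mis-floored input.
pub fn product_of_experts(experts: &[Vec<f64>], floor: f64) -> Option<Vec<f64>> {
    let n = experts.first()?.len();
    if n == 0 || floor <= 0.0 || experts.iter().any(|e| e.len() != n) {
        return None;
    }
    // Sum log-probabilities so that many small factors do not underflow.
    let mut log_post = vec![0.0f64; n];
    for e in experts {
        for (lp, p) in log_post.iter_mut().zip(e) {
            // The floor keeps one over-confident node from zeroing a class.
            *lp += p.max(floor).ln();
        }
    }
    // Subtract the maximum before exponentiating, then normalise to sum 1.
    let max = log_post.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let unnorm: Vec<f64> = log_post.iter().map(|lp| (lp - max).exp()).collect();
    let z: f64 = unnorm.iter().sum();
    Some(unnorm.iter().map(|u| u / z).collect())
}
```

Two nodes reporting [0.4, 0.6] and [0.3, 0.7] fuse to the normalised product of [0.12, 0.42], so agreement between nodes sharpens the posterior rather than averaging it.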

Deferred Backlog (Nothing Dropped)

  • Multi-occupant count accuracy — DATA-GATED on labelled multi-occupant CSI. The low_confidence flag + clamp (§A2) is the honest stand-in until then.
  • Remote ID Location/Vector message — ACCEPTED-FUTURE; requires a datum-anchored local-tangent-plane NED→WGS84 transform with an operator datum. Basic ID ships today.
  • Matter Bridge commissioning — ACCEPTED-FUTURE (v0.8); LAN-only MQTT ships today.
  • Criterion benches for cog inference latency and mesh_guard — ACCEPTED-FUTURE (cold-start timings are recorded in the manifests' build_metadata, not yet a regression bench).
  • wasm-edge skill accuracy — unvalidated; now honestly labelled, not claimed (done in ADR-160: medical/affect/security/exotic claim surfaces disclaimed, renamed, and feature-gated; per-skill accuracy remains DATA-GATED).
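
To illustrate what the deferred Remote ID Location/Vector item involves, here is a small-offset NED→WGS84 approximation. It is not the transform the project plans to ship: it treats the Earth as a sphere with the WGS84 semi-major axis as its radius, which is only reasonable for short offsets away from the poles. The function name and signature are hypothetical.

```rust
/// WGS84 semi-major axis in metres, used here as a spherical radius.
const WGS84_SEMI_MAJOR_M: f64 = 6_378_137.0;

/// Approximates the WGS84 (lat, lon) in degrees of a point offset by
/// `north_m`/`east_m` metres from a datum. Returns None for an invalid
/// datum or one too close to a pole for the approximation to hold.
pub fn ned_offset_to_wgs84(
    datum_lat_deg: f64,
    datum_lon_deg: f64,
    north_m: f64,
    east_m: f64,
) -> Option<(f64, f64)> {
    if !(-90.0..=90.0).contains(&datum_lat_deg)
        || !(-180.0..=180.0).contains(&datum_lon_deg)
    {
        return None;
    }
    let cos_lat = datum_lat_deg.to_radians().cos();
    if cos_lat.abs() < 1e-9 {
        return None; // longitude is degenerate at the poles
    }
    // Arc length divided by radius gives the angular offset in radians.
    let dlat_rad = north_m / WGS84_SEMI_MAJOR_M;
    let dlon_rad = east_m / (WGS84_SEMI_MAJOR_M * cos_lat);
    Some((
        datum_lat_deg + dlat_rad.to_degrees(),
        datum_lon_deg + dlon_rad.to_degrees(),
    ))
}
```

The sketch makes the dependency explicit: without a valid operator datum there is nothing to anchor the offset to, which is why Basic ID ships today and Location/Vector does not.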

Consequences

  • A default pose-estimation install now actually emits pose.frame events; raising the threshold above the model's reach is a loud run.started warning, not a silent dropout.
  • A person-count reading on an untrained class is flagged low_confidence, clamped, and downgraded to warn — no fabricated headcounts.
  • The Remote ID broadcast can never carry physically-impossible coordinates; NED metres live in honestly-named metre fields.
  • cog-person-count manifest now reports the real signed manifest instead of a hollow null skeleton.
  • No cog Cargo.toml description claims a capability (multi-person counting, Matter) the code/weights don't yet deliver.

Reproduction (MEASURED)

cd v2
cargo test -p cog-person-count -p cog-pose-estimation -p cog-ha-matter -p ruview-swarm \
  --no-default-features
# ruview-swarm train path compiles (PPO autodiff)
cargo check -p ruview-swarm --features train
# A4 end-to-end — real signed manifest, non-null binary_sha256
cargo run -q -p cog-person-count --no-default-features -- manifest

Result at time of writing (all 0 failed):

  • cog-person-count: 19 passed (lib 10 incl. 3 manifest; smoke 9)
  • cog-pose-estimation: 8 passed (smoke)
  • cog-ha-matter: 64 passed (unchanged; description-only edit)
  • ruview-swarm: 117 passed (default features); --features train compiles clean.

Scope was limited to the four named crates. NO-ACTION positives (witness chain, fusion, PPO + randn audit) were verified by inspection and left untouched.