wifi-densepose/v2/crates/cog-person-count/cog
rUv 6b4994e105
feat(cog-person-count): train count_v1.safetensors — honest v0.0.1 (ADR-103) (#695)
Phase 2 of ADR-103: trained count head on the existing 1,077 paired
samples (the same data that produced pose_v1 yesterday).

Honest result: 65.1% eval accuracy / 100% within ±1 / MAE 0.349 on
the held-out time-window. Per-class: 100% on "empty room" / 0% on
"1 person". The model overfit by epoch 100 (train_acc → 1.0,
eval_loss climbed 0.67 → 7.8) and the "best" checkpoint is the
snapshot that happened to predict the eval window's class
distribution (140/215 = 65.1%, matches eval_acc exactly). Confidence
head Spearman = 0.023 ⇒ uncalibrated. Same data-bound failure mode
as pose_v1 (#645), bounded by single-session training data; same
fix path (multi-room).

What v0.0.1 still validates end-to-end:
* PyTorch → safetensors → Candle Rust loads cleanly on first try.
  `cog-person-count health` reports `backend: candle-cpu` and emits
  real per-frame predictions instead of the stub backend's hard-coded
  {1 person, 0 confidence}. Architecture parity between train-count.py
  and src/inference.rs::CountNet is bit-exact.
* ONNX export bit-clean (16 KB, opset 18, dynamic batch axis).
* Training wall time: 5.6 s for 400 epochs on RTX 5080.
* Binary size unchanged (2.36 MB stripped), model loads via mmap at
  runtime.

This commit ships:

* scripts/align-ground-truth.js: extended to emit n_persons_mode +
  n_persons_max per window so the training pipeline has count
  labels. Backwards-compatible (additive fields).
* scripts/train-count.py: new — mirrors CountNet architecture
  exactly, loads paired.jsonl, trains 400 epochs with
  CE+BCE+Brier loss, exports safetensors + ONNX + per-epoch JSON.
* v2/.../cog/artifacts/{count_v1.safetensors,count_v1.onnx,
  count_train_results.json}: the trained artifacts.
* v2/.../cog/README.md: Status table updated with the v0.0.1 numbers
  + an Honest Caveat section explaining the data-bound result.
* docs/benchmarks/person-count-cog.md: new — full v0.0.1 benchmark
  log mirroring the format docs/benchmarks/pose-estimation-cog.md
  established. Includes comparison to ADR-103 v0.1.0 acceptance
  gates and per-class breakdown.

Still pending:
* `run` subcommand wiring (long-running polling loop, same as pose)
* Cross-compile + sign + GCS upload (mirror of pose cog pipeline)
* Live install on cognitum-v0
* v0.2.0: re-train on multi-room data, LoRA per-room adapters,
  Stoer-Wagner min-cut clip in fusion stage
2026-05-21 18:56:52 -04:00
..
artifacts feat(cog-person-count): train count_v1.safetensors — honest v0.0.1 (ADR-103) (#695) 2026-05-21 18:56:52 -04:00
README.md feat(cog-person-count): train count_v1.safetensors — honest v0.0.1 (ADR-103) (#695) 2026-05-21 18:56:52 -04:00
config.schema.json feat(cog-person-count): v0.0.1 scaffold + tests + fusion math + bench (ADR-103) (#694) 2026-05-21 18:46:57 -04:00
manifest.template.json feat(cog-person-count): v0.0.1 scaffold + tests + fusion math + bench (ADR-103) (#694) 2026-05-21 18:46:57 -04:00

README.md

Person Count Cog

Learned multi-person counter for WiFi CSI — designed in ADR-103, packaged per ADR-100, discoverable through ADR-102.

What it does

Replaces the PR #491 slot heuristic (subcarrier_diversity / dedup_factor) with a Candle network that emits a calibrated count distribution + confidence per CSI window. Multi-node deployments fuse N per-node predictions through a confidence-weighted log-sum (Bayesian product of experts), optionally bounded above by a Stoer-Wagner min-cut from the subcarrier-similarity graph.

Output (per frame)

{
  "ts": 1779210883.444,
  "level": "info",
  "event": "person.count",
  "fields": {
    "tick": 12345,
    "count": 2,
    "confidence": 0.81,
    "count_p95_low": 1,
    "count_p95_high": 3,
    "n_nodes": 3,
    "probs": [0.01, 0.03, 0.81, 0.13, 0.01, 0.005, 0.003, 0.002]
  }
}

Downstream consumers can render the most-likely count when confidence is high, or fall back to a [lo, hi] band with a "?" badge when the model is uncertain — that's how this Cog closes the loop on #499's ghost-skeleton UX.

Status — v0.0.1

Component State
Crate compiles, library API stable
Tests pass (15 total: 8 smoke + 7 fusion)
Four-verb runtime contract (version, manifest, health)
Trained count_v1.safetensors artifact shipped at cog/artifacts/count_v1.safetensors (392 KB)
ONNX export count_v1.onnx (16 KB), bit-compatible architecture
Honest accuracy reporting See docs/benchmarks/person-count-cog.md — 65.1% eval acc on a single-session dataset; confidence head Spearman 0.023 ⇒ uncalibrated for v0.0.1
run subcommand (long-running loop) same shape as cog-pose-estimation::runtime, lands in follow-up
Signed binary on GCS release pipeline
Stoer-Wagner min-cut clip in fusion stage v0.2.0 (hook in fusion::fuse_with_mincut_clip is stubbed)

Honest v0.0.1 caveat

count_v1 was trained on a single 30-minute solo recording. The model overfit by epoch ~100 and the "best" checkpoint is one that effectively predicts the eval-window class distribution (mostly class-0). Class-1 accuracy on the held-out tail = 0%. This v0.0.1 is a working pipeline with a degenerate model, not a usable counter yet — same data-bound failure mode as pose_v1 (#645), same fix: multi-room paired recordings.

cog-person-count health will load the real safetensors and report backend: candle-cpu rather than backend: stub, so the cog-gateway can verify the model loaded — but operators should treat the v0.0.1 count outputs as scaffold-validation rather than production data. The 2.36 MB binary + 392 KB weights + 16 KB ONNX are all real and reusable as soon as more data lands.

Security

The cog has a very small attack surface — by design, it's a pure consumer of CSI data, not a server:

Threat Mitigation
Untrusted model file mmap count_v1.safetensors is loaded via VarBuilder::from_mmaped_safetensors (unsafe block, documented). The release pipeline signs the file with COGNITUM_OWNER_SIGNING_KEY per ADR-100; the appliance's cog-gateway verifies the Ed25519 signature against weights_sha256 before placing the file under /var/lib/cognitum/apps/person-count/.
Non-finite outputs from a corrupted model CountPrediction::is_finite() is checked in cmd_health and in the v0.0.1 run-loop before any person.count event is emitted; non-finite outputs fail-closed.
Sensing-server fetch failures When the sensing source goes away the cog emits a WARN event and skips the frame — same fail-open-as-log pattern as cog-pose-estimation. No crash, no leaked file descriptors, no stuck pid file.
Fusion divide-by-zero / log-of-zero fuse_confidence_weighted floors confidences at 1e-3 and floors probabilities at 1e-9 before taking logs. Empty input returns the stub default rather than NaN-propagating.
Over-the-cap mass after min-cut clip fuse_with_mincut_clip re-normalises the surviving prefix; if all mass was above the cap (degenerate case), it places mass at the cap class rather than producing a zero distribution.
Output spoofing via stdout Events go to stdout exactly as ADR-100's runtime contract specifies — the cog-gateway parses each line as JSON. No interactive prompts, no shell escapes, no ANSI control sequences from this cog.

The cog opens zero network listeners and writes to zero files under /var/lib/cognitum/apps/person-count/ beyond the standard pid, output.log, and error.log that the cog-gateway manages externally.

Performance / optimization

Release build: 2.36 MB stripped binary on x86_64-unknown-linux-gnu (smaller than cog-pose-estimation's 4.5 MB because we don't transitively pull wifi-densepose-train).

Workspace release profile already enables opt-level = 3, lto = "fat", codegen-units = 1, strip = true. No further per-cog optimization knobs needed.

Cold-start latency (30 sequential health invocations, Windows x86_64, candle-cpu backend):

Cog Cold-start
cog-pose-estimation 76.2 ms
cog-person-count 53.3 ms

Long-running run warm inference: sub-millisecond per frame in the stub backend (single softmax over 8 classes is essentially free). The trained-model warm path is bounded by the three Conv1d layers — projected ≤ 2 ms on a Pi 5 once count_v1.safetensors lands, well under the ≤ 5 ms ADR-103 budget.

See also

  • ADR-103 — Design, SOTA comparison, acceptance gates.
  • ADR-100 — Cog packaging spec.
  • PR #491 — The heuristic this Cog replaces.
  • Issue #499 — Original "double skeletons" report that motivated ADR-103.