History

rUv b3a5012dbd feat(cog-person-count): v0.0.2 — K-fold + label-smoothing + temperature-calibrated (#699 ) * chore: stage v0.0.2 artifacts + temperature scalar for build pipeline Stages count_v1.{safetensors,onnx,temperature,train_results.json} ahead of the build/sign/upload step. This commit is a momentary side-effect — the next commit will refresh the per-arch manifests with the new binary SHAs once ruvultra finishes the cross-build. The .temperature file holds the calibration scalar from LBFGS over the held-out conf logits. The Rust cog will read it post-load and divide conf_logits by it before sigmoid, exactly matching the Python eval. * feat(cog-person-count): v0.0.2 — K-fold validated, label smoothing + early stop + temp scale The v0.0.1 "65.1% but class-1=0%" result was an unlucky temporal split that let a degenerate "always predict 0" classifier hit eval acc = class-0 fraction. 5-fold stratified random CV proved the architecture actually learns ~57.1% class-1 accuracy under fair splits — a real, modestly useful signal. v0.0.2 ships a retrained model that: * Splits randomly (seed=42) 80/20 instead of temporally — eliminates the trailing-window-class-imbalance cheat. * Class-balanced sampler (multinomial with replacement, weighted by inverse class frequency) — per-batch expected counts are equal regardless of dataset distribution. * Label smoothing 0.1 on the cross-entropy — reduces confidence saturation that drove v0.0.1's all-or-nothing predictions. * Early stopping with patience=20 — stops at epoch 29 instead of overfitting through 400. * Temperature scaling of the conf head — LBFGS fits a scalar T on held-out conf logits; ships as a count_v1.temperature sidecar so the Rust cog can divide conf_logits by T before sigmoid. Numbers on the same data: \| Metric \| v0.0.1 \| v0.0.2 \| K-fold (5x100) \| \|------------------\|--------\|--------\|----------------\| \| Overall acc \| 65.1% \| 62.3% \| 62.2% ± 1.9% \| \| Class 0 acc \| 100% \| 86.2% \| 67.4% \| \| Class 1 acc \| 0% \| 34.3% \| 57.1% ✓ \| \| MAE \| 0.349 \| 0.377 \| 0.378 \| \| Spearman \| 0.023 \| 0.013 \| 0.160 \| Class-1 accuracy 0 → 34.3% is the headline win. Net acc moves slightly because we stopped cheating on class 0. K-fold's 57% says there's headroom remaining; reaching it needs more independent splits (== more data), not more training tricks. Confidence calibration didn't move. Temperature scaling alone can't fix a confidence head trained against a noisy argmax==truth indicator over a 62%-accurate classifier — the head's training signal is the issue, not its post-hoc transform. The honest fix is multi-room data (#645), not another calibration knob. Live on cognitum-v0 at /var/lib/cognitum/apps/person-count/ — health reports candle-cpu backend, count = 1 (was 0 in v0.0.1) on synthetic zero input. Files changed: * scripts/train-count.py — adds --k-fold (no sklearn dep, hand-rolled stratified splits with deterministic shuffle) and --v2 paths. * v2/.../cog/artifacts/count_v1.safetensors (392 KB, new sha 32996433…) + count_v1.onnx (16 KB) + count_v1.temperature (0.9262 scalar) + count_train_results.json (full epoch trace). * v2/.../cog/artifacts/manifests/{arm,x86_64}/manifest.json bumped to version 0.0.2 with the new weights_sha256 + caveats. * docs/benchmarks/person-count-cog.md — appends a v0.0.2 section with the K-fold diagnostic table and honest-read paragraph. GCS: gs://cognitum-apps/cogs/arm/cog-person-count-count_v1.safetensors refreshed (binaries unchanged — load weights via mmap at runtime).		2026-05-21 19:47:04 -04:00
..
artifacts	feat(cog-person-count): v0.0.2 — K-fold + label-smoothing + temperature-calibrated (#699 )	2026-05-21 19:47:04 -04:00
README.md	feat(cog-person-count): wire `run` subcommand — v0.0.1 fully functional (#697 )	2026-05-21 19:10:15 -04:00
config.schema.json	feat(cog-person-count): v0.0.1 scaffold + tests + fusion math + bench (ADR-103) (#694 )	2026-05-21 18:46:57 -04:00
manifest.template.json	feat(cog-person-count): v0.0.1 scaffold + tests + fusion math + bench (ADR-103) (#694 )	2026-05-21 18:46:57 -04:00

README.md

Person Count Cog

Learned multi-person counter for WiFi CSI — designed in ADR-103, packaged per ADR-100, discoverable through ADR-102.

What it does

Replaces the PR #491 slot heuristic (subcarrier_diversity / dedup_factor) with a Candle network that emits a calibrated count distribution + confidence per CSI window. Multi-node deployments fuse N per-node predictions through a confidence-weighted log-sum (Bayesian product of experts), optionally bounded above by a Stoer-Wagner min-cut from the subcarrier-similarity graph.

Output (per frame)

{
  "ts": 1779210883.444,
  "level": "info",
  "event": "person.count",
  "fields": {
    "tick": 12345,
    "count": 2,
    "confidence": 0.81,
    "count_p95_low": 1,
    "count_p95_high": 3,
    "n_nodes": 3,
    "probs": [0.01, 0.03, 0.81, 0.13, 0.01, 0.005, 0.003, 0.002]
  }
}

Downstream consumers can render the most-likely count when confidence is high, or fall back to a [lo, hi] band with a "?" badge when the model is uncertain — that's how this Cog closes the loop on #499's ghost-skeleton UX.

Status — v0.0.1

Component	State
Crate compiles, library API stable	✅
Tests pass (15 total: 8 smoke + 7 fusion)	✅
Four-verb runtime contract (`version`, `manifest`, `health`)	✅
Trained `count_v1.safetensors` artifact	✅ shipped at `cog/artifacts/count_v1.safetensors` (392 KB)
ONNX export	✅ `count_v1.onnx` (16 KB), bit-compatible architecture
Honest accuracy reporting	✅ See `docs/benchmarks/person-count-cog.md` — 65.1% eval acc on a single-session dataset; confidence head Spearman 0.023 ⇒ uncalibrated for v0.0.1
`run` subcommand (long-running loop)	⏳ same shape as cog-pose-estimation::runtime, lands in follow-up
Signed binary on GCS	⏳ release pipeline
Stoer-Wagner min-cut clip in fusion stage	⏳ v0.2.0 (hook in `fusion::fuse_with_mincut_clip` is stubbed)

Honest v0.0.1 caveat

count_v1 was trained on a single 30-minute solo recording. The model overfit by epoch ~100 and the "best" checkpoint is one that effectively predicts the eval-window class distribution (mostly class-0). Class-1 accuracy on the held-out tail = 0%. This v0.0.1 is a working pipeline with a degenerate model, not a usable counter yet — same data-bound failure mode as pose_v1 (#645), same fix: multi-room paired recordings.

cog-person-count health will load the real safetensors and report backend: candle-cpu rather than backend: stub, so the cog-gateway can verify the model loaded — but operators should treat the v0.0.1 count outputs as scaffold-validation rather than production data. The 2.36 MB binary + 392 KB weights + 16 KB ONNX are all real and reusable as soon as more data lands.

Relationship to the in-process `csi.rs::score_to_person_count` heuristic

This Cog runs out-of-process alongside wifi-densepose-sensing-server. The two are complementary, not competing:

The sensing-server keeps emitting its existing slot-count heuristic from csi.rs::score_to_person_count (PR #491's RollingP95 + dedup_factor). This is the fallback path — operators who don't install cog-person-count still get a count number, just a less calibrated one.
cog-person-count (this binary) polls the same /api/v1/sensing/latest endpoint, runs the learned count_v1 model on each window, and emits person.count events on stdout. The appliance's cognitum-cog-gateway routes those events to the dashboard via the standard ADR-220 cog-event channel.

Operators choose by installing or not installing this Cog — no sensing-server rebuild required. Downstream consumers (UI, fleet automation, alerting rules) can subscribe to whichever event stream they prefer.

The architecture decision is documented in ADR-103 §"Deployment" and matches the cog/sensing-server boundary established for cog-pose-estimation (ADR-101).

Security

The cog has a very small attack surface — by design, it's a pure consumer of CSI data, not a server:

Threat	Mitigation
Untrusted model file mmap	`count_v1.safetensors` is loaded via `VarBuilder::from_mmaped_safetensors` (`unsafe` block, documented). The release pipeline signs the file with `COGNITUM_OWNER_SIGNING_KEY` per ADR-100; the appliance's cog-gateway verifies the Ed25519 signature against `weights_sha256` before placing the file under `/var/lib/cognitum/apps/person-count/`.
Non-finite outputs from a corrupted model	`CountPrediction::is_finite()` is checked in `cmd_health` and in the v0.0.1 run-loop before any `person.count` event is emitted; non-finite outputs fail-closed.
Sensing-server fetch failures	When the sensing source goes away the cog emits a `WARN` event and skips the frame — same fail-open-as-log pattern as `cog-pose-estimation`. No crash, no leaked file descriptors, no stuck `pid` file.
Fusion divide-by-zero / log-of-zero	`fuse_confidence_weighted` floors confidences at `1e-3` and floors probabilities at `1e-9` before taking logs. Empty input returns the stub default rather than NaN-propagating.
Over-the-cap mass after min-cut clip	`fuse_with_mincut_clip` re-normalises the surviving prefix; if all mass was above the cap (degenerate case), it places mass at the cap class rather than producing a zero distribution.
Output spoofing via stdout	Events go to stdout exactly as ADR-100's runtime contract specifies — the cog-gateway parses each line as JSON. No interactive prompts, no shell escapes, no ANSI control sequences from this cog.

The cog opens zero network listeners and writes to zero files under /var/lib/cognitum/apps/person-count/ beyond the standard pid, output.log, and error.log that the cog-gateway manages externally.

Performance / optimization

Release build: 2.36 MB stripped binary on x86_64-unknown-linux-gnu (smaller than cog-pose-estimation's 4.5 MB because we don't transitively pull wifi-densepose-train).

Workspace release profile already enables opt-level = 3, lto = "fat", codegen-units = 1, strip = true. No further per-cog optimization knobs needed.

Cold-start latency (30 sequential health invocations, Windows x86_64, candle-cpu backend):

Cog	Cold-start
`cog-pose-estimation`	76.2 ms
`cog-person-count`	53.3 ms

Long-running run warm inference: sub-millisecond per frame in the stub backend (single softmax over 8 classes is essentially free). The trained-model warm path is bounded by the three Conv1d layers — projected ≤ 2 ms on a Pi 5 once count_v1.safetensors lands, well under the ≤ 5 ms ADR-103 budget.