Documents the K-fold diagnostic (62.2 ± 1.9% / class-1 57.1%) that
justified v0.0.2, the v0.0.2 numbers (class-1 0% → 34.3%), and the
honest read that the gap to the K-fold mean is run-to-run variance
not missing improvement.
Phase 3 of ADR-103. Cross-compiled aarch64 + x86_64 on ruvultra, signed
with COGNITUM_OWNER_SIGNING_KEY (Ed25519), uploaded to GCS, and live-
installed on the cognitum-v0 Pi 5 alongside cog-pose-estimation.
Real-hardware bench on cognitum-v0:
./cog-person-count-arm health
→ backend=candle-cpu, count=0, confidence=0.49, p95=[0,7]
30 sequential health invocations: 0.276 s → 9.2 ms/invocation cold
Compares to cog-pose-estimation's 8.4 ms — count cog is ~10% slower
because the dual-head (count softmax + confidence sigmoid) does ~2x
the work after the shared encoder.
GCS release artifacts (publicly downloadable, SHA-verified):
arm/cog-person-count-arm 2,168,816 B
sha: 36bc0bb0...0d47b507b3c3
sig: R/00xdzHriyr/2r...JK+a6k71NDg== (Ed25519)
x86_64/cog-person-count-x86_64 2,615,528 B
sha: 76cdd1ec...3923 7392b01db
sig: QB+8cnGSMQmu...ZtTNIQ2rDg== (Ed25519)
arm/cog-person-count-count_v1.safetensors 392,088 B
sha: dacb0551...e6e04ff56d15c3a65a9ff
Live install at /var/lib/cognitum/apps/person-count/ on cognitum-v0
matches the layout of every other installed cog (anomaly-detect,
seizure-detect, pose-estimation): cog-person-count-arm binary,
count_v1.safetensors weights, manifest.json, config.json.
Adds:
* v2/.../cog/artifacts/manifests/{arm,x86_64}/manifest.json — full
ADR-100 schema with all fields filled (sha + sig + size + URL +
build_metadata carrying the v0.0.1 honest training caveats).
* docs/benchmarks/person-count-cog.md — appends "Live appliance
install" and "Signed GCS release artifacts" sections to the
benchmark log.
Honest v0.0.1 caveat still applies (class-1 accuracy 0% on the held-
out tail of the single-session training data) — same data-bound
limit as pose_v1. The shipped artifact is the *vehicle*; production-
quality accuracy follows from multi-room paired data per ADR-103's
v0.2.0 plan + #645.
Phase 2 of ADR-103: trained count head on the existing 1,077 paired
samples (the same data that produced pose_v1 yesterday).
Honest result: 65.1% eval accuracy / 100% within ±1 / MAE 0.349 on
the held-out time-window. Per-class: 100% on "empty room" / 0% on
"1 person". The model overfit by epoch 100 (train_acc → 1.0,
eval_loss climbed 0.67 → 7.8) and the "best" checkpoint is the
snapshot that happened to predict the eval window's class
distribution (140/215 = 65.1%, matches eval_acc exactly). Confidence
head Spearman = 0.023 ⇒ uncalibrated. Same data-bound failure mode
as pose_v1 (#645), bounded by single-session training data; same
fix path (multi-room).
What v0.0.1 still validates end-to-end:
* PyTorch → safetensors → Candle Rust loads cleanly on first try.
`cog-person-count health` reports `backend: candle-cpu` and emits
real per-frame predictions instead of the stub backend's hard-coded
{1 person, 0 confidence}. Architecture parity between train-count.py
and src/inference.rs::CountNet is bit-exact.
* ONNX export bit-clean (16 KB, opset 18, dynamic batch axis).
* Training wall time: 5.6 s for 400 epochs on RTX 5080.
* Binary size unchanged (2.36 MB stripped), model loads via mmap at
runtime.
This commit ships:
* scripts/align-ground-truth.js: extended to emit n_persons_mode +
n_persons_max per window so the training pipeline has count
labels. Backwards-compatible (additive fields).
* scripts/train-count.py: new — mirrors CountNet architecture
exactly, loads paired.jsonl, trains 400 epochs with
CE+BCE+Brier loss, exports safetensors + ONNX + per-epoch JSON.
* v2/.../cog/artifacts/{count_v1.safetensors,count_v1.onnx,
count_train_results.json}: the trained artifacts.
* v2/.../cog/README.md: Status table updated with the v0.0.1 numbers
+ an Honest Caveat section explaining the data-bound result.
* docs/benchmarks/person-count-cog.md: new — full v0.0.1 benchmark
log mirroring the format docs/benchmarks/pose-estimation-cog.md
established. Includes comparison to ADR-103 v0.1.0 acceptance
gates and per-class breakdown.
Still pending:
* `run` subcommand wiring (long-running polling loop, same as pose)
* Cross-compile + sign + GCS upload (mirror of pose cog pipeline)
* Live install on cognitum-v0
* v0.2.0: re-train on multi-room data, LoRA per-room adapters,
Stoer-Wagner min-cut clip in fusion stage