Commit Graph

3 Commits

Author SHA1 Message Date
ruv b16d7431bc docs(bench): append v0.0.2 section to person-count benchmark log
Documents the K-fold diagnostic (62.2 ± 1.9% / class-1 57.1%) that
justified v0.0.2, the v0.0.2 numbers (class-1 0% → 34.3%), and the
honest read that the gap to the K-fold mean is run-to-run variance
not missing improvement.
2026-05-21 19:47:55 -04:00
rUv a5e99670f8
feat(cog-person-count): release v0.0.1 — signed binaries on GCS, live on cognitum-v0 (#696)
Phase 3 of ADR-103. Cross-compiled aarch64 + x86_64 on ruvultra, signed
with COGNITUM_OWNER_SIGNING_KEY (Ed25519), uploaded to GCS, and live-
installed on the cognitum-v0 Pi 5 alongside cog-pose-estimation.

Real-hardware bench on cognitum-v0:
  ./cog-person-count-arm health
  → backend=candle-cpu, count=0, confidence=0.49, p95=[0,7]
  30 sequential health invocations: 0.276 s → 9.2 ms/invocation cold

Compares to cog-pose-estimation's 8.4 ms — count cog is ~10% slower
because the dual-head (count softmax + confidence sigmoid) does ~2x
the work after the shared encoder.

GCS release artifacts (publicly downloadable, SHA-verified):
  arm/cog-person-count-arm                          2,168,816 B
    sha:  36bc0bb0...0d47b507b3c3
    sig:  R/00xdzHriyr/2r...JK+a6k71NDg==  (Ed25519)
  x86_64/cog-person-count-x86_64                    2,615,528 B
    sha:  76cdd1ec...3923 7392b01db
    sig:  QB+8cnGSMQmu...ZtTNIQ2rDg==  (Ed25519)
  arm/cog-person-count-count_v1.safetensors           392,088 B
    sha:  dacb0551...e6e04ff56d15c3a65a9ff

Live install at /var/lib/cognitum/apps/person-count/ on cognitum-v0
matches the layout of every other installed cog (anomaly-detect,
seizure-detect, pose-estimation): cog-person-count-arm binary,
count_v1.safetensors weights, manifest.json, config.json.

Adds:
* v2/.../cog/artifacts/manifests/{arm,x86_64}/manifest.json — full
  ADR-100 schema with all fields filled (sha + sig + size + URL +
  build_metadata carrying the v0.0.1 honest training caveats).
* docs/benchmarks/person-count-cog.md — appends "Live appliance
  install" and "Signed GCS release artifacts" sections to the
  benchmark log.

Honest v0.0.1 caveat still applies (class-1 accuracy 0% on the held-
out tail of the single-session training data) — same data-bound
limit as pose_v1. The shipped artifact is the *vehicle*; production-
quality accuracy follows from multi-room paired data per ADR-103's
v0.2.0 plan + #645.
2026-05-21 19:02:26 -04:00
rUv 6b4994e105
feat(cog-person-count): train count_v1.safetensors — honest v0.0.1 (ADR-103) (#695)
Phase 2 of ADR-103: trained count head on the existing 1,077 paired
samples (the same data that produced pose_v1 yesterday).

Honest result: 65.1% eval accuracy / 100% within ±1 / MAE 0.349 on
the held-out time-window. Per-class: 100% on "empty room" / 0% on
"1 person". The model overfit by epoch 100 (train_acc → 1.0,
eval_loss climbed 0.67 → 7.8) and the "best" checkpoint is the
snapshot that happened to predict the eval window's class
distribution (140/215 = 65.1%, matches eval_acc exactly). Confidence
head Spearman = 0.023 ⇒ uncalibrated. Same data-bound failure mode
as pose_v1 (#645), bounded by single-session training data; same
fix path (multi-room).

What v0.0.1 still validates end-to-end:
* PyTorch → safetensors → Candle Rust loads cleanly on first try.
  `cog-person-count health` reports `backend: candle-cpu` and emits
  real per-frame predictions instead of the stub backend's hard-coded
  {1 person, 0 confidence}. Architecture parity between train-count.py
  and src/inference.rs::CountNet is bit-exact.
* ONNX export bit-clean (16 KB, opset 18, dynamic batch axis).
* Training wall time: 5.6 s for 400 epochs on RTX 5080.
* Binary size unchanged (2.36 MB stripped), model loads via mmap at
  runtime.

This commit ships:

* scripts/align-ground-truth.js: extended to emit n_persons_mode +
  n_persons_max per window so the training pipeline has count
  labels. Backwards-compatible (additive fields).
* scripts/train-count.py: new — mirrors CountNet architecture
  exactly, loads paired.jsonl, trains 400 epochs with
  CE+BCE+Brier loss, exports safetensors + ONNX + per-epoch JSON.
* v2/.../cog/artifacts/{count_v1.safetensors,count_v1.onnx,
  count_train_results.json}: the trained artifacts.
* v2/.../cog/README.md: Status table updated with the v0.0.1 numbers
  + an Honest Caveat section explaining the data-bound result.
* docs/benchmarks/person-count-cog.md: new — full v0.0.1 benchmark
  log mirroring the format docs/benchmarks/pose-estimation-cog.md
  established. Includes comparison to ADR-103 v0.1.0 acceptance
  gates and per-class breakdown.

Still pending:
* `run` subcommand wiring (long-running polling loop, same as pose)
* Cross-compile + sign + GCS upload (mirror of pose cog pipeline)
* Live install on cognitum-v0
* v0.2.0: re-train on multi-room data, LoRA per-room adapters,
  Stoer-Wagner min-cut clip in fusion stage
2026-05-21 18:56:52 -04:00