Sub-deliverable 8.2 of the benchmark/optimization milestone. Quantizes the
843,834-param "half" WiFlow-STD pose model (half_best.pth) to int8 two ways and
MEASURES the accuracy/size trade-off vs fp32 under ONE locked normalization
(ADR-173 torso-diameter PCK, upstream calculate_pck use_torso_norm=True), on the
same seed-42 file-level 70/15/15 test split that produced the fp32 sweep numbers.
MEASURED on ruvultra (RTX 5080, torch 2.11.0+cu128, fbgemm; clean test, torso-PCK):
fp32 96.62% pck@20 99.47% pck@50 0.008981 mpjpe 3.351 MB
int8 PTQ static 40.98% pck@20 94.98% pck@50 0.038262 mpjpe 1.046 MB (-55.64pp)
int8 QAT (3 ep) 67.48% pck@20 98.69% pck@50 0.026548 mpjpe 1.043 MB (-29.15pp)
Verdict (honest no): int8 is NOT a win at the strict PCK@20 edge target. Static
PTQ collapses; QAT recovers a large share but still loses 29 pp @20 for a 3.2x
size win — keep fp32/fp16 on the edge. Disclosed: QAT fake-quant val pck@20 was
83.45% but converted int8 scores 67.48% (~16pp convert_fx gap, reported honestly).
Deliverables:
- v2/crates/wifi-densepose-train/scripts/quantize_half_int8.py (reproducible:
header carries the exact ssh command + run date; QAT primary, static PTQ fallback)
- docs/adr/ADR-175-int8-quantization-half-pose-model-measured.md (MEASURED table,
locked normalization, QAT-vs-PTQ labeling, verdict, reproduction, limitations)
- CHANGELOG [Unreleased] ### Added entry
No production Rust or signal-pipeline change. Python deterministic proof unchanged
(f8e76f21a0f9852b70b6d9dd5318239f6b20cbcb4cdd995863263cecdc446f7a, bit-exact).