_ANEDeviceInfo.aneSubType returns "h17" on M5 Max (M4 / base M5 = "h16"), but
peak FP16 (19.27 TFLOPS) and INT8 W8A8 (35.61 TOPS) match M4 within 4%.
Stories110M static 90.0 ms/step, dynamic 73.5 ms/step; Qwen3-0.6B dynamic
320.0 ms/step (1.29× M4 baseline). Training gains over base M5 are CPU-driven
(12 P-cores + Accelerate), not ANE-driven.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Benchmark report now includes the full Stories110M model configuration
(arch, layers, dims, kernels). README updated: 12-layer results replace
the stale single-layer numbers, and the limitations reflect the current state.

Community-submitted results for M1 Pro/Max, M3 Pro, M4 Pro/Max, M5.
Includes training performance, peak throughput, MIL compatibility
matrix, and structured JSON data.