- Add n_tokens <= SEQ+1 check in train_large_ane.m and training_dynamic/train.m
- Prevents underflow in max_pos and possible out-of-bounds reads when n_tokens is too small (matches the check in train_large.m)
- Add M5 MacBook Pro benchmark result and full output for Issue #3
Made-with: Cursor
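
For illustration only, the n_tokens guard described above could look roughly like the sketch below. It is not the code from train_large_ane.m or training_dynamic/train.m. The helper name `compute_max_pos`, the value of `SEQ`, the use of `size_t`, and the formula `max_pos = n_tokens - SEQ - 1` are all assumptions, chosen to show why a missing check would underflow.

```c
#include <stddef.h>

/* Assumed sequence length; the real value is defined in the training sources. */
#define SEQ 256

/*
 * Hypothetical helper (not from the repository).
 * Returns 0 and stores the largest valid window start in *max_pos,
 * or returns -1 when the data is too short to hold one SEQ+1 token window.
 */
static int compute_max_pos(size_t n_tokens, size_t *max_pos) {
    /* Without this guard, the unsigned subtraction below would wrap around
       to a huge value, and later reads at data[pos .. pos+SEQ] would run
       past the end of the buffer. */
    if (n_tokens <= (size_t)SEQ + 1) {
        return -1;
    }
    *max_pos = n_tokens - (size_t)SEQ - 1;
    return 0;
}
```

A caller would typically print an error and exit when the helper returns -1, rather than continue into the training loop.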
Benchmark report now includes full Stories110M model configuration
(arch, layers, dims, kernels). README updated: 12-layer results
replace stale single-layer numbers, and limitations reflect the current state.
Community-submitted results for M1 Pro/Max, M3 Pro, M4 Pro/Max, M5.
Includes training performance, peak throughput, MIL compatibility
matrix, and structured JSON data.