Commit Graph

1 Commit

Author SHA1 Message Date
tom 09e9c996bb Add optimized training variant: 14% lower step time (107→92 ms/step)
New train_opt target with NEON-vectorized Adam, fp16 activation/gradient
caching, concurrent dW dispatch, pre-allocated buffers, and optional
Metal GPU support. Tested on M3 Max with stories110M.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 09:08:12 -04:00