berkus/ANE - ANE

Commit Graph

Author	SHA1	Message	Date
tom	09e9c996bb	Add optimized training variant: 14% speedup (107→92 ms/step) New train_opt target with NEON-vectorized Adam, fp16 activation/gradient caching, concurrent dW dispatch, pre-allocated buffers, and optional Metal GPU support. Tested on M3 Max with stories110M. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 09:08:12 -04:00

1 Commit