Author SHA1 Message Date
mgkcloud 0edafd48ca feat: double-buffered async ANE training
Key discovery: compile and eval can run in parallel via Grand Central Dispatch (GCD).
119 foreground evals completed during a single 26.8 ms background compile.

Architecture:
- Two kernel sets (A/B) alternate between the active and pending roles
- A background GCD thread compiles the pending set while the active set runs
- The two sets swap atomically at the batch boundary
- Eliminates 88% compilation bottleneck
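
The architecture above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the code in train_double_buffer.m: it uses a POSIX thread in place of GCD so it builds outside macOS (on Apple platforms the background step would more likely be submitted with dispatch_async to a global queue), and the names KernelSet, DoubleBuffer, compile_pending, evaluate, db_init and train are all hypothetical. The "compile" and "eval" bodies are placeholders for the real ANE calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one compiled kernel set (A or B). */
typedef struct {
    int version;                 /* weight snapshot this set was compiled for */
} KernelSet;

typedef struct {
    KernelSet sets[2];           /* sets[0] = A, sets[1] = B */
    int active;                  /* index used by foreground evals */
    int next_version;            /* version the background compile will produce */
    atomic_bool pending_ready;   /* set by the background compile when done */
} DoubleBuffer;

static void db_init(DoubleBuffer *db) {
    assert(db != NULL);
    db->sets[0].version = 0;
    db->sets[1].version = 0;
    db->active = 0;
    db->next_version = 0;
    atomic_init(&db->pending_ready, false);
}

/* Placeholder for the slow compile step; it only touches the pending set. */
static void *compile_pending(void *arg) {
    DoubleBuffer *db = arg;
    KernelSet *pending = &db->sets[1 - db->active];
    pending->version = db->next_version;
    /* Release ordering publishes the write above to the foreground thread. */
    atomic_store_explicit(&db->pending_ready, true, memory_order_release);
    return NULL;
}

/* Placeholder for one foreground eval on the active set. */
static int evaluate(const KernelSet *ks) {
    return ks->version;
}

/* Runs the batches and returns the number of A/B swaps, or -1 on thread error. */
static int train(DoubleBuffer *db, int batches, int evals_per_batch,
                 long *evals_done) {
    pthread_t worker;
    bool worker_running = false;
    int swaps = 0;

    for (int b = 0; b < batches; b++) {
        if (!worker_running) {
            /* Start compiling the pending set in the background. */
            db->next_version = db->sets[db->active].version + 1;
            atomic_store(&db->pending_ready, false);
            if (pthread_create(&worker, NULL, compile_pending, db) != 0)
                return -1;
            worker_running = true;
        }

        /* Foreground evals keep running on the active set meanwhile. */
        for (int e = 0; e < evals_per_batch; e++) {
            (void)evaluate(&db->sets[db->active]);
            (*evals_done)++;
        }

        /* Batch boundary: swap only if the background compile has finished. */
        if (atomic_load_explicit(&db->pending_ready, memory_order_acquire)) {
            pthread_join(worker, NULL);
            worker_running = false;
            db->active = 1 - db->active;
            swaps++;
        }
    }

    if (worker_running)
        pthread_join(worker, NULL);
    return swaps;
}
```

In this sketch the foreground never waits on a compile: if the pending set is not ready at a batch boundary, the next batch simply reuses the active set. How many swaps occur therefore depends on timing, but the active set's version always equals the number of swaps performed.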

Includes:
- train_double_buffer.m: train_large.m modified to compile asynchronously
- PROBE_RESULTS.md: full benchmark data from the M4 probe
- Updated Makefile
2026-03-04 00:12:17 +11:00