berkus/ANE - ANE

Commit Graph

Author: Alvaro GPT
SHA1: df885ed3df

perf: reduce compile & IO overhead

- Make ACCUM_STEPS configurable via ANE_ACCUM_STEPS env var (default 10)
  Higher values = fewer exec() restarts, better effective throughput

- Make MAX_COMPILES configurable via ANE_MAX_COMPILES env var (default 100)
  Allows tuning for different hardware/OS versions

- IOSurface pooling: reuse freed surfaces by size instead of creating new
  Avoids repeated IOSurfaceCreate/CFRelease on every recompile cycle
  Pool capacity: 128 surfaces with swap-remove for O(n) lookup
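
The two ideas in this commit message, environment-variable overrides with defaults and a size-keyed free pool with swap-remove, can be sketched as below. This is a minimal illustration and not the code from the commit: `env_int`, `surface_pool`, `pool_put`, and `pool_take` are hypothetical names, and a plain `void *` stands in for an `IOSurfaceRef` so the sketch compiles without Apple frameworks. Only the capacity of 128 and the variable names `ANE_ACCUM_STEPS` and `ANE_MAX_COMPILES` come from the message.

```c
#include <stddef.h>
#include <stdlib.h>

#define POOL_CAP 128  /* pool capacity stated in the commit message */

/* Hypothetical stand-in for an IOSurfaceRef plus its allocation size. */
typedef struct {
    void  *handle;
    size_t bytes;
} pooled_surface;

typedef struct {
    pooled_surface slots[POOL_CAP];
    int count;
} surface_pool;

/* Read a positive integer from the environment; return `fallback`
   if the variable is unset, empty, malformed, or out of range. */
static int env_int(const char *name, int fallback) {
    const char *s = getenv(name);
    if (!s || !*s) return fallback;
    char *end = NULL;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v <= 0 || v > 1000000) return fallback;
    return (int)v;
}

/* Hand a freed surface back to the pool. Returns 0 when the pool is
   full, in which case the caller should release the surface itself. */
static int pool_put(surface_pool *p, void *handle, size_t bytes) {
    if (p->count >= POOL_CAP) return 0;
    p->slots[p->count].handle = handle;
    p->slots[p->count].bytes  = bytes;
    p->count++;
    return 1;
}

/* Linear scan for a surface of exactly `bytes`. On a hit, swap-remove:
   overwrite the hit with the last slot and shrink the pool, so removal
   itself costs O(1) after the O(n) scan. Returns NULL on a miss, in
   which case the caller would create a new surface. */
static void *pool_take(surface_pool *p, size_t bytes) {
    for (int i = 0; i < p->count; i++) {
        if (p->slots[i].bytes == bytes) {
            void *h = p->slots[i].handle;
            p->slots[i] = p->slots[p->count - 1];
            p->count--;
            return h;
        }
    }
    return NULL;
}
```

Under these assumptions, startup code would read `env_int("ANE_ACCUM_STEPS", 10)` and `env_int("ANE_MAX_COMPILES", 100)` once, and each recompile cycle would try `pool_take` before creating a surface and `pool_put` before releasing one. Swap-remove does not preserve slot order, which is acceptable here because the pool is searched by size only.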

Date: 2026-03-03 20:47:29 +01:00

Author: maderix
SHA1: 4d67db1bdb

stories110M: 12-layer ANE training with dashboard, 107ms/step

- Scale to full stories110M (109M params, 12 layers) with real TinyStories data
- vDSP-vectorized cross-entropy (110ms→14ms), NEON fp16 IO, async dW
- TUI dashboard: loss curve, ANE/CPU power, CPU/memory graphs, text generation
- Split into modular headers: config, io, mil, cpu_ops

Date: 2026-03-01 03:14:39 -08:00

2 Commits