Commit Graph

4 Commits

Author SHA1 Message Date
Alvaro Videla 9ff4a57e0a Merge df885ed3df into 20cd236f61 2026-03-10 15:12:42 +05:30
Alvaro GPT df885ed3df perf: reduce compile & IO overhead 2026-03-03 20:47:29 +01:00
- Make ACCUM_STEPS configurable via ANE_ACCUM_STEPS env var (default 10)
  Higher values mean fewer exec() restarts and better effective throughput

- Make MAX_COMPILES configurable via ANE_MAX_COMPILES env var (default 100)
  Allows tuning for different hardware/OS versions

- IOSurface pooling: reuse freed surfaces by size instead of creating new ones
  Avoids repeated IOSurfaceCreate/CFRelease on every recompile cycle
  Pool capacity: 128 surfaces; O(n) linear lookup by size, O(1) swap-remove on reuse
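The commit message describes two mechanisms without showing code. The first is env-var overrides with defaults. The second is a fixed-capacity pool that reuses buffers by size and uses swap-remove. Below is a minimal C sketch of both. It is not the project's implementation. The helpers `parse_positive`, `env_long`, `Surface`, `SurfacePool`, `pool_acquire` and `pool_release` are hypothetical. Plain `malloc` buffers stand in for `IOSurfaceRef` so the sketch runs anywhere. The sketch also assumes "by size" means an exact byte-size match. Only the variable names `ANE_ACCUM_STEPS`/`ANE_MAX_COMPILES`, the defaults 10/100 and the capacity of 128 come from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: parse a positive base-10 integer; return `fallback` for
 * NULL, empty, non-numeric, trailing-garbage, overflowing or non-positive input. */
static long parse_positive(const char *s, long fallback) {
    if (s == NULL || *s == '\0') return fallback;
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v <= 0) return fallback;
    return v;
}

/* Hypothetical: e.g. env_long("ANE_ACCUM_STEPS", 10), env_long("ANE_MAX_COMPILES", 100). */
static long env_long(const char *name, long fallback) {
    return parse_positive(getenv(name), fallback);
}

/* Stand-in for an IOSurface: a heap buffer tagged with its byte size. */
typedef struct { size_t bytes; void *mem; } Surface;

#define POOL_CAP 128

typedef struct { Surface items[POOL_CAP]; int count; } SurfacePool;

/* Reuse a pooled surface of exactly `bytes`, else allocate a fresh one. */
static Surface pool_acquire(SurfacePool *p, size_t bytes) {
    assert(bytes > 0);
    for (int i = 0; i < p->count; i++) {          /* O(n) linear scan by size */
        if (p->items[i].bytes == bytes) {
            Surface s = p->items[i];
            p->items[i] = p->items[--p->count];   /* O(1) swap-remove: last fills the hole */
            return s;
        }
    }
    Surface s = { bytes, malloc(bytes) };
    return s;
}

/* Return a surface to the pool; free it outright when the pool is full. */
static void pool_release(SurfacePool *p, Surface s) {
    if (p->count < POOL_CAP) p->items[p->count++] = s;
    else free(s.mem);
}
```

Swap-remove keeps the free list compact without shifting elements. Because it reorders the pool, lookup has to stay a linear scan. With at most 128 entries, that scan is cheap compared with creating a new surface.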
Alvaro GPT 541bf4ec90 fix: correctness & safety improvements 2026-03-03 20:46:58 +01:00
- Validate all fread() return values in model_load_weights (model.h)
- Check ane_eval() return values in ane_conv_eval (forward.h) and ane_eval_k (tiny_train.m)
- Log error details on ANE eval failure (ane_runtime.h)
- Thread-safe RMSNorm: replace global g_rms_tmp with local allocation (stories_cpu_ops.h)
- Bounds-check token indices in cross_entropy_loss, embed_lookup, embed_backward
- Atomic checkpoint writes via tmp+rename pattern (tiny_train.m)
- Non-destructive recompile: compile new kernels first, swap only on success (model.h)
- Validate fread() in load_checkpoint (tiny_train.m)
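Three of these fixes follow common C patterns: checked `fread()`, tmp+rename checkpoint writes, and token bounds checks. Below is a minimal sketch of each. It is not the project's `tiny_train.m` code. The checkpoint layout (a `size_t` count followed by that many floats), the size cap, and the names `save_checkpoint_sketch`, `load_checkpoint_sketch` and `token_in_range` are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: size_t count, then `count` floats.
 * Write to "<path>.tmp", close it, then rename() over the real path. */
static int save_checkpoint_sketch(const char *path, const float *w, size_t n) {
    assert(path != NULL);
    char tmp[1024];
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp) return -1;
    FILE *f = fopen(tmp, "wb");
    if (!f) return -1;
    int ok = fwrite(&n, sizeof n, 1, f) == 1 && fwrite(w, sizeof *w, n, f) == n;
    if (fclose(f) != 0) ok = 0;
    if (!ok) { remove(tmp); return -1; }     /* old checkpoint left untouched */
    /* On POSIX, rename() atomically replaces an existing file on the same filesystem. */
    if (rename(tmp, path) != 0) { remove(tmp); return -1; }
    return 0;
}

/* Every fread() is checked, so a truncated file yields NULL instead of garbage. */
static float *load_checkpoint_sketch(const char *path, size_t *out_n) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    size_t n;
    /* Hypothetical sanity cap on the element count read from disk. */
    if (fread(&n, sizeof n, 1, f) != 1 || n > ((size_t)1 << 28)) { fclose(f); return NULL; }
    float *w = malloc(n ? n * sizeof *w : 1);
    if (!w || fread(w, sizeof *w, n, f) != n) { free(w); fclose(f); return NULL; }
    fclose(f);
    *out_n = n;
    return w;
}

/* Reject token ids outside [0, vocab) before indexing an embedding or logits row. */
static int token_in_range(int tok, int vocab) {
    return tok >= 0 && tok < vocab;
}
```

The rename guarantees that readers see either the old checkpoint or the new one, never a half-written file. It does not guarantee durability across a power loss. That would also require `fsync()` on the file and its directory, which this sketch leaves out.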
maderix f213c8db68 Initial release 2026-02-28 00:22:06 -08:00