- Make ACCUM_STEPS configurable via the ANE_ACCUM_STEPS env var (default 10)
  Higher values mean fewer exec() restarts and better effective throughput
- Make MAX_COMPILES configurable via the ANE_MAX_COMPILES env var (default 100)
  Allows tuning for different hardware and OS versions
- IOSurface pooling: reuse freed surfaces by size instead of creating new ones
  Avoids repeated IOSurfaceCreate/CFRelease calls on every recompile cycle
  Pool capacity: 128 surfaces; lookup is a linear O(n) scan by size, and
  swap-remove makes taking a surface out of the pool O(1)
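
A minimal sketch of the env-var overrides in C, assuming a hypothetical
`env_int()` helper and `TrainConfig` struct (neither name comes from the
source). Falling back to the default for unset, empty, non-numeric, or
non-positive values is an assumption, not confirmed behavior:

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv/unsetenv in callers */
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a positive int from the environment, or
 * return `fallback` if the variable is unset, empty, or invalid. */
static int env_int(const char *name, int fallback) {
    const char *s = getenv(name);
    if (s == NULL || *s == '\0') return fallback;
    errno = 0;
    char *end = NULL;
    long v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v <= 0 || v > INT_MAX) return fallback;
    return (int)v;
}

/* Hypothetical config struct; defaults match the commit message. */
typedef struct {
    int accum_steps;
    int max_compiles;
} TrainConfig;

static TrainConfig load_config(void) {
    TrainConfig c;
    c.accum_steps  = env_int("ANE_ACCUM_STEPS", 10);
    c.max_compiles = env_int("ANE_MAX_COMPILES", 100);
    return c;
}
```

Reading the values once at startup keeps the training loop free of getenv()
calls; e.g. `ANE_ACCUM_STEPS=32 ./train_large` would raise the step count.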
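
A sketch of the size-keyed pool with swap-remove, assuming exact-size
matching. To stay portable, a plain heap buffer (`Surface`, hypothetical)
stands in for `IOSurfaceRef`; the real code would call IOSurfaceCreate on a
miss and CFRelease when the pool is full:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_CAP 128  /* capacity stated in the commit message */

/* Hypothetical stand-in for an IOSurfaceRef: a sized heap buffer. */
typedef struct { size_t size; void *data; } Surface;

typedef struct {
    Surface *items[POOL_CAP];
    size_t count;
} SurfacePool;

static Surface *surface_create(size_t size) {
    Surface *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->size = size;
    s->data = malloc(size);
    if (!s->data) { free(s); return NULL; }
    return s;
}

static void surface_destroy(Surface *s) {
    if (s) { free(s->data); free(s); }
}

/* Linear O(n) scan for a surface of exactly `size` bytes. On a hit, the
 * last entry is moved into the hole (swap-remove), so removal is O(1). */
static Surface *pool_acquire(SurfacePool *p, size_t size) {
    for (size_t i = 0; i < p->count; i++) {
        if (p->items[i]->size == size) {
            Surface *s = p->items[i];
            p->items[i] = p->items[--p->count];
            return s;
        }
    }
    return surface_create(size);  /* miss: allocate a fresh surface */
}

/* Park a freed surface for reuse, or destroy it if the pool is full. */
static void pool_release(SurfacePool *p, Surface *s) {
    if (!s) return;
    if (p->count < POOL_CAP) p->items[p->count++] = s;
    else surface_destroy(s);
}

static void pool_drain(SurfacePool *p) {
    while (p->count > 0) surface_destroy(p->items[--p->count]);
}
```

Swap-remove reorders the remaining entries, which is harmless here because
lookup is by size, not by position.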

Four standalone probe tests to characterize the M5 ANE:
- test_weight_reload: Can weights be hot-swapped via unload+load without recompilation?
- test_perf_stats: Enumerate _ANEPerformanceStats methods/properties and hardware counters
- test_qos_sweep: Measure compile/load/eval latency across QoS 0-63
- test_ane_advanced: Probe SharedEvents, weightsBuffer IOSurface, procedureIndex, VirtualClient
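
The QoS sweep's measurement loop might look roughly like the C sketch
below. `qos_stage_fn` is a hypothetical hook standing in for the private
compile/load/eval calls (not shown), keeping the best of several runs is an
assumed noise-reduction choice, and `counting_stage` is only a demo stub:

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#include <assert.h>
#include <time.h>

#define QOS_MIN 0
#define QOS_MAX 63
#define QOS_COUNT (QOS_MAX - QOS_MIN + 1)

/* Hypothetical hook: run one stage (compile, load, or eval) at `qos`. */
typedef void (*qos_stage_fn)(unsigned qos, void *ctx);

static double now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

/* Time `stage` at every QoS value, keeping the fastest of `reps` runs.
 * out_ms[q - QOS_MIN] receives the result; `reps` must be >= 1. */
static void qos_sweep(qos_stage_fn stage, void *ctx, int reps,
                      double out_ms[QOS_COUNT]) {
    for (unsigned q = QOS_MIN; q <= QOS_MAX; q++) {
        double best = -1.0;
        for (int r = 0; r < reps; r++) {
            double t0 = now_ms();
            stage(q, ctx);
            double dt = now_ms() - t0;
            if (best < 0.0 || dt < best) best = dt;
        }
        out_ms[q - QOS_MIN] = best;
    }
}

/* Demo stub: counts invocations instead of touching the ANE. */
static void counting_stage(unsigned qos, void *ctx) {
    (void)qos;
    ++*(int *)ctx;
}
```

Running the same loop once per stage gives three latency curves over QoS,
which makes any QoS-dependent scheduling visible side by side.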

Training telemetry (train_large.m):
- Emit JSON Lines to stderr with a per-step timing breakdown and per-batch
  TFLOPS metrics
- Lets external monitoring tools visualize ANE utilization in real time
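
A sketch of one JSON Lines record per step, assuming hypothetical field
names (`fwd_ms`, `bwd_ms`, `update_ms`, `tflops`) rather than
train_large.m's real schema. TFLOPS is FLOPs / seconds / 10^12:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-step record; fields are illustrative only. */
typedef struct {
    int step;
    double fwd_ms, bwd_ms, update_ms;  /* wall-clock time per phase */
    double flops;                      /* FLOPs performed in this step */
} StepStats;

/* Format one JSON object on a single newline-terminated line.
 * Returns snprintf's result (length needed, or negative on error). */
static int format_step_json(char *buf, size_t n, const StepStats *s) {
    double total_ms = s->fwd_ms + s->bwd_ms + s->update_ms;
    double tflops = total_ms > 0.0 ? s->flops / (total_ms / 1e3) / 1e12 : 0.0;
    return snprintf(buf, n,
        "{\"step\":%d,\"fwd_ms\":%.3f,\"bwd_ms\":%.3f,\"update_ms\":%.3f,"
        "\"total_ms\":%.3f,\"tflops\":%.4f}\n",
        s->step, s->fwd_ms, s->bwd_ms, s->update_ms, total_ms, tflops);
}

/* Write to stderr, leaving stdout for normal logs. stderr is unbuffered
 * by default, so a monitor reading the pipe sees each line at once. */
static void emit_step_json(const StepStats *s) {
    char line[256];
    int len = format_step_json(line, sizeof line, s);
    if (len >= 0 && len < (int)sizeof line) fputs(line, stderr);
}
```

One complete object per line lets a consumer parse the stream
incrementally instead of waiting for the run to finish.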

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>