mirror of https://github.com/maderix/ANE.git
ANE Community Benchmarks
Standardized benchmark results from different Apple Silicon machines, contributed by the community.
How to Run
# Full benchmark (SRAM probe + peak TFLOPS + training)
bash scripts/run_community_benchmark.sh
# Quick benchmark (skip training -- useful if you don't have training data)
bash scripts/run_community_benchmark.sh --skip-training
# Custom training steps
bash scripts/run_community_benchmark.sh --steps 50
This produces a JSON file in community_benchmarks/ named <chip>_<date>.json.
Prerequisites
- macOS on Apple Silicon (M1/M2/M3/M4/M5)
- Xcode command line tools (`xcode-select --install`)
- Python 3.11-3.13 with `coremltools` (auto-installed into a temp venv)
- For training benchmarks: run `cd training && make data` first
How to Submit
Option 1: Pull Request
- Fork this repo
- Run the benchmark: `bash scripts/run_community_benchmark.sh`
- Commit the generated JSON file from `community_benchmarks/`
- Open a PR
Option 2: GitHub Issue
- Run the benchmark
- Open a new issue with title "Benchmark: [Your Chip]"
- Paste the contents of your JSON file
Viewing Aggregated Results
python3 scripts/aggregate_benchmarks.py
This reads all JSON files in community_benchmarks/ and prints a markdown comparison table.
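As a rough illustration of that aggregation step, the following is a minimal sketch and not the actual `scripts/aggregate_benchmarks.py`. The function names `load_results` and `to_markdown` and the choice of columns are assumptions made for this example; only the field names come from the v1 schema.

```python
import json
from pathlib import Path


def load_results(directory):
    """Load every *.json file in `directory` as a dict, sorted by filename."""
    return [json.loads(p.read_text()) for p in sorted(Path(directory).glob("*.json"))]


def to_markdown(results):
    """Render one table row per submission using fields from the v1 schema."""
    lines = [
        "| Chip | Peak TFLOPS | Train ms/step (ANE) | ANE util % |",
        "|---|---|---|---|",
    ]
    for r in results:
        s = r.get("summary", {})
        lines.append(
            "| {} | {} | {} | {} |".format(
                r.get("system", {}).get("chip", "?"),
                # Missing fields (for example after --skip-training) show as n/a.
                s.get("peak_tflops", "n/a"),
                s.get("training_ms_per_step_ane", "n/a"),
                s.get("training_ane_util_pct", "n/a"),
            )
        )
    return "\n".join(lines)
```

Calling `print(to_markdown(load_results("community_benchmarks")))` from the repo root would print one row per submitted file.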
JSON Schema (v1)
Each submission contains:
{
  "schema_version": 1,
  "timestamp": "2026-03-03T12:00:00Z",
  "system": {
    "chip": "Apple M4 Max",
    "machine": "Mac16,5",
    "macos_version": "26.2",
    "memory_gb": 128,
    "neural_engine_cores": "16"
  },
  "benchmarks": {
    "sram_probe": [
      {"channels": 256, "weight_mb": 0.1, "ms_per_eval": 0.378, "tflops": 0.02, "gflops_per_mb": 177.7},
      ...
    ],
    "inmem_peak": [
      {"depth": 128, "channels": 512, "spatial": 64, "weight_mb": 64.0, "gflops": 4.29, "ms_per_eval": 0.385, "tflops": 11.14},
      ...
    ],
    "training_cpu_classifier": {
      "ms_per_step": 72.4,
      "ane_tflops_sustained": 1.29,
      "ane_util_pct": 8.1,
      "compile_pct": 79.7
    },
    "training_ane_classifier": {
      "ms_per_step": 62.9,
      "ane_tflops_sustained": 1.68,
      "ane_util_pct": 10.6,
      "compile_pct": 84.5
    }
  },
  "summary": {
    "peak_tflops": 11.14,
    "sram_spill_start_channels": 4096,
    "training_ms_per_step_cpu": 72.4,
    "training_ms_per_step_ane": 62.9,
    "training_ane_tflops": 1.68,
    "training_ane_util_pct": 10.6
  }
}
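Before submitting, it may help to sanity-check a generated file against this layout. The sketch below is one possible check, not something shipped in the repo: `check_submission` is a hypothetical helper, and the list of required keys is inferred from the example above. The training entries are deliberately not required, on the assumption that `--skip-training` omits them.

```python
# Required keys are taken from the example submission; whether the real
# tooling treats any of them as optional is an assumption.
REQUIRED_TOP = ("schema_version", "timestamp", "system", "benchmarks", "summary")
REQUIRED_SYSTEM = ("chip", "machine", "macos_version", "memory_gb", "neural_engine_cores")


def check_submission(doc):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [f"missing top-level key: {k}" for k in REQUIRED_TOP if k not in doc]
    if doc.get("schema_version") != 1:
        problems.append("schema_version is not 1")
    system = doc.get("system", {})
    problems += [f"missing system key: {k}" for k in REQUIRED_SYSTEM if k not in system]
    # Only the two non-training benchmarks are checked, since training may be skipped.
    for name in ("sram_probe", "inmem_peak"):
        if not isinstance(doc.get("benchmarks", {}).get(name), list):
            problems.append(f"benchmarks.{name} should be a list")
    return problems
```

A file could be checked with `check_submission(json.load(open(path)))` after importing `json`; an empty result means the sketch found nothing to complain about, not that the file is guaranteed valid.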
What We're Measuring
| Benchmark | What it tells us |
|---|---|
| sram_probe | ANE SRAM capacity -- where weight spilling starts |
| inmem_peak | Maximum achievable TFLOPS via programmatic MIL |
| training (CPU cls) | End-to-end training perf with CPU classifier |
| training (ANE cls) | End-to-end training perf with ANE-offloaded classifier |
Key metrics to compare across chips:
- Peak TFLOPS: raw ANE compute capability
- SRAM spill point: determines max efficient kernel size
- Training ms/step: real-world training performance
- ANE utilization %: how much of peak we actually use
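To make these comparisons concrete, here is a small sketch that derives a few ratios from a `summary` object. The helper name `derived_metrics` and the three output keys are assumptions for illustration. One observation, inferred only from the sample values in the schema and not stated elsewhere in this document: a reported utilization of 10.6% at 1.68 sustained TFLOPS implies a reference peak of roughly 15.8 TFLOPS, which is higher than the measured `peak_tflops` of 11.14, so the utilization figure appears to be relative to some other peak value.

```python
def derived_metrics(summary):
    """Compute comparison ratios from a v1 `summary` object."""
    peak = summary["peak_tflops"]
    sustained = summary["training_ane_tflops"]
    util_pct = summary["training_ane_util_pct"]
    return {
        # Sustained training throughput as a share of the measured inmem_peak.
        "pct_of_measured_peak": 100.0 * sustained / peak,
        # Reference peak implied by the reported utilization percentage.
        "implied_reference_tflops": sustained / (util_pct / 100.0),
        # Greater than 1 means the ANE-offloaded classifier is faster per step.
        "ane_cls_speedup": summary["training_ms_per_step_cpu"]
        / summary["training_ms_per_step_ane"],
    }
```

Comparing `pct_of_measured_peak` across chips normalizes against what each machine actually achieved in the `inmem_peak` run, which may be a fairer basis than a fixed reference figure.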