Contributing to ANE Training

Thanks for your interest in contributing! This community fork welcomes benchmark submissions, bug fixes, and research contributions.

Benchmark Submissions (Easiest Way to Contribute)

The single most valuable thing you can do is run the benchmark on your hardware and submit results.

Quick Version

bash scripts/run_community_benchmark.sh

The script will guide you through everything, including optional auto-submission to the dashboard.

What Gets Collected

  • Your chip model (e.g., Apple M4 Max)
  • macOS version, memory, core counts
  • SRAM probe results (TFLOPS vs weight size)
  • In-memory peak TFLOPS
  • Training performance (optional, requires training data)
  • Your GitHub username (optional)

No personal data is collected, and raw IP addresses are not stored; they are only hashed for rate limiting.

Bug Reports

Open an issue with:

  • Your hardware (chip, macOS version, memory)
  • Steps to reproduce
  • Expected vs actual behavior
  • Relevant log output
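
The hardware details requested above can be gathered in one step. The following is a minimal sketch and not part of the repository: the helper names `probe` and `system_report` are hypothetical, and it assumes the standard macOS tools `sysctl` and `sw_vers` are available. On other systems each field falls back to "unknown".

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the output of a command, or "unknown" if it is missing or fails.
probe() {
  "$@" 2>/dev/null || echo "unknown"
}

# Print chip, macOS version, and memory in a form that can be pasted into an issue.
# Assumption: the sysctl keys below are the usual macOS ones for CPU name and RAM size.
system_report() {
  local chip os mem_bytes mem
  chip="$(probe sysctl -n machdep.cpu.brand_string)"
  os="$(probe sw_vers -productVersion)"
  mem_bytes="$(probe sysctl -n hw.memsize)"
  # hw.memsize reports bytes; convert to GiB only when the value is numeric.
  if [[ "$mem_bytes" =~ ^[0-9]+$ ]]; then
    mem="$(( mem_bytes / 1024 / 1024 / 1024 )) GiB"
  else
    mem="unknown"
  fi
  printf 'Chip: %s\nmacOS: %s\nMemory: %s\n' "$chip" "$os" "$mem"
}

system_report
```

Paste the three lines it prints at the top of the issue, followed by the reproduction steps and log output.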

Code Contributions

  1. Fork the repository
  2. Create a feature branch (git checkout -b my-feature)
  3. Make your changes
  4. Test on your hardware
  5. Submit a Pull Request
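
Step 2 above can be wrapped in a small helper. This is only a sketch under stated assumptions: `start_feature` is a hypothetical name, it expects the path to an existing local clone of your fork, and it assumes the remote is called `origin`. It uses only standard git commands (`check-ref-format`, `checkout -b`).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create a feature branch in an existing clone and print the follow-up step.
# Usage: start_feature <path-to-clone> <branch-name>
start_feature() {
  local repo="$1" branch="$2"
  # Reject names git would refuse (spaces, "..", a trailing ".lock", and so on).
  if ! git -C "$repo" check-ref-format --branch "$branch" >/dev/null 2>&1; then
    echo "error: '$branch' is not a valid branch name" >&2
    return 1
  fi
  git -C "$repo" checkout -b "$branch" || return 1
  # Assumption: the fork's remote is named "origin".
  echo "Next: commit your changes, then run: git -C $repo push -u origin $branch"
}

# Example (hypothetical path):
#   start_feature "$HOME/src/my-fork" my-feature
```

After pushing the branch, open the Pull Request from your fork on GitHub.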

Code Style

  • Objective-C: follow the existing style in training/ (no ARC annotations in headers, _Float16 for fp16)
  • Shell scripts: use set -euo pipefail, quote variables
  • Python: minimal dependencies, Python 3.11+ compatible
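
As an illustration of the shell guideline, here is a minimal script skeleton. It is a sketch, not a file from this repository; the function `count_lines` and the variable `log_file` are made up for the example.

```shell
#!/usr/bin/env bash
# -e: exit on the first failing command
# -u: treat unset variables as errors
# -o pipefail: a pipeline fails if any stage fails, not just the last one
set -euo pipefail

# Count the lines in a file. Quoting "$path" keeps paths with spaces intact.
count_lines() {
  local path="$1"
  wc -l < "$path" | tr -d ' '
}

# "${1:-}" supplies an empty default so that -u does not abort when no argument is given.
log_file="${1:-}"
if [[ -n "$log_file" ]]; then
  echo "lines: $(count_lines "$log_file")"
fi
```

Without the quotes, a path such as `my log.txt` would be split into two words and the redirect would fail.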

Areas Where Help is Needed

  • Benchmarks on hardware we don't have: M1, M2, M3, M3 Pro/Max/Ultra, M4 Pro, M5
  • Reducing compilation overhead: currently 80-85% of wall time
  • _ANEChainingRequest research: pipelining multiple ANE operations without recompiling
  • _ANEPerformanceStats investigation: getting real hardware timing data
  • Larger model support: scaling beyond Stories110M

Questions?

Open a GitHub issue or discussion. We're happy to help.