# Contributing to ANE Training

Thanks for your interest in contributing! This community fork welcomes benchmark submissions, bug fixes, and research contributions.

## Benchmark Submissions (Easiest Way to Contribute)

The single most valuable thing you can do is run the benchmark on your hardware and submit results.

### Quick Version

```bash
bash scripts/run_community_benchmark.sh
```

The script will guide you through everything, including optional auto-submission to the dashboard.

### What Gets Collected

- Your chip model (e.g., Apple M4 Max)
- macOS version, memory, core counts
- SRAM probe results (TFLOPS vs weight size)
- In-memory peak TFLOPS
- Training performance (optional, requires training data)
- Your GitHub username (optional)

No personal data, no IP addresses stored (only hashed for rate limiting).

## Bug Reports

Open an issue with:

- Your hardware (chip, macOS version, memory)
- Steps to reproduce
- Expected vs actual behavior
- Relevant log output

## Code Contributions

1. Fork the repository
2. Create a feature branch (`git checkout -b my-feature`)
3. Make your changes
4. Test on your hardware
5. Submit a Pull Request

### Code Style

- Objective-C: follow the existing style in `training/` (no ARC annotations in headers, `_Float16` for fp16)
- Shell scripts: use `set -euo pipefail`, quote variables
- Python: minimal dependencies, Python 3.11+ compatible

### Areas Where Help is Needed

- **Benchmarks on hardware we don't have**: M1, M2, M3, M3 Pro/Max/Ultra, M4 Pro, M5
- **Reducing compilation overhead**: currently 80-85% of wall time
- **`_ANEChainingRequest` research**: pipelining multiple ANE operations without recompile
- **`_ANEPerformanceStats` investigation**: getting real hardware timing data
- **Larger model support**: scaling beyond Stories110M

## Questions?

Open a GitHub issue or discussion. We're happy to help.
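
## Appendix: Shell Style Example

The shell guideline under Code Style (use `set -euo pipefail`, quote variables) is easier to follow with a concrete case. The sketch below is illustrative only and is not taken from this repository: the helper `count_args` and the environment variable `BENCH_LABEL` are hypothetical names chosen for the example. It shows why an unquoted variable holding a chip name with spaces is split into several arguments, and how to give an optional variable a default so that `set -u` does not abort the script.

```bash
#!/usr/bin/env bash
# Illustrative sketch; not a script from this repository.
# -e: exit on error, -u: error on unset variables, -o pipefail: a pipeline
# fails if any command in it fails.
set -euo pipefail

# Hypothetical helper: prints how many arguments it received.
count_args() {
  echo "$#"
}

chip="Apple M4 Max"   # a value containing spaces

# Quoted: the whole value is passed as a single argument.
quoted_count="$(count_args "$chip")"

# Unquoted (shown only to demonstrate the bug): word splitting turns the
# value into three separate arguments.
# shellcheck disable=SC2086
unquoted_count="$(count_args $chip)"

# With `set -u`, reading an unset variable aborts the script, so optional
# inputs get an explicit default. BENCH_LABEL is a hypothetical variable.
label="${BENCH_LABEL:-unnamed}"

echo "quoted: $quoted_count, unquoted: $unquoted_count, label: $label"
```

Running a linter such as ShellCheck over new scripts before opening a Pull Request should catch most unquoted expansions of this kind.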