Contribution submission guide
This file summarizes what was done on branch contribution/benchmark-m5-and-fixes and how to submit it.
1. Benchmark (submit to Issue #3)
Link: https://github.com/maderix/ANE/issues/3
Post this as a new comment:
## M5 MacBook Pro benchmark (static pipeline, 20 steps)
- **Chip:** Apple M5, 10-core (4P+6E)
- **RAM:** 24 GB
- **macOS:** 26.3 (Build 25D125)
- **Run:** `./train_large --data ./tinystories_data00.bin --steps 20 --lr 1e-4`
### Efficiency report
- Total steps: 20
- Wall time: 10423 ms (10.4 s)
- Compile time: 7187 ms (69.0%)
- Train time: 2542 ms (24.4%)
- **Avg train: 127.1 ms/step**
- ANE TFLOPS: 0.73 sustained
- ANE utilization: 4.6% of 15.8 TFLOPS
Full output with JSON lines is in `benchmarks/my_m5_benchmark_output.txt` (or paste the contents below).
After the report, paste the contents of benchmarks/my_m5_benchmark_output.txt into the same comment, or attach the file.
2. Bug fix (PR)
Fix: Guard short token datasets in training/train_large_ane.m and training/training_dynamic/train.m.
Why: When n_tokens <= SEQ + 1, the expression max_pos = n_tokens - SEQ - 1 underflows (unsigned), producing a huge range for the random start position and possible out-of-bounds reads. train_large.m already had this guard; the other two pipelines did not.
Changes:
- training/train_large_ane.m: After n_tokens = data_len / 2, add a check that fails early with a clear error, calls munmap, closes the fd, and returns 1.
- training/training_dynamic/train.m: Same guard added.
Suggested PR title: fix: guard short token datasets in train_large_ane and dynamic pipeline
Suggested PR description:
## Summary
- Add a token dataset length guard in `training/train_large_ane.m`
- Add the same guard in `training/training_dynamic/train.m`
- Fail early with a clear error when the dataset is too short for one (input, target) window
## Why
Both paths use `max_pos = n_tokens - SEQ - 1`. When `n_tokens <= SEQ + 1`, this unsigned subtraction underflows, producing a huge range and potentially out-of-bounds reads. `train_large.m` already had this guard (lines 299–304); this PR aligns the other two pipelines.
## Validation
- `make -C training train_large_ane` — builds
- `make -C training/training_dynamic train` — builds
- With a too-short data file, both binaries exit with the new error message.
3. Optional: benchmark data in repo
The branch also adds:
- benchmarks/my_m5_benchmark_output.txt: full benchmark log
- One new entry in benchmarks/community_results.json for this M5 run (contributor: log-wade)
You can either:
- Include the community_results.json update in the same PR as the bug fix, or
- Omit it and only post the benchmark to Issue #3 (the maintainer may update the report from the issue).
4. Before opening the PR
- Fork the repo on GitHub (if you haven’t): https://github.com/maderix/ANE → Fork.
- Add your fork as a remote and push:
    git remote add myfork git@github.com:YOUR_USERNAME/ANE.git
    git push myfork contribution/benchmark-m5-and-fixes
- Open a PR from myfork/contribution/benchmark-m5-and-fixes to maderix/ANE main.
- Post the benchmark comment to Issue #3 (link above).
5. Replace contributor name
In benchmarks/community_results.json, the new entry uses "contributor": "log-wade". Change that to your GitHub username if different.