mirror of https://github.com/maderix/ANE.git
96 lines
3.4 KiB
Markdown
96 lines
3.4 KiB
Markdown
# Contribution submission guide
|
||
|
||
This file summarizes what was done on branch `contribution/benchmark-m5-and-fixes` and how to submit it.
|
||
|
||
---
|
||
|
||
## 1. Benchmark (submit to Issue #3)
|
||
|
||
**Link:** https://github.com/maderix/ANE/issues/3
|
||
|
||
**Post this as a new comment:**
|
||
|
||
```
|
||
## M5 MacBook Pro benchmark (static pipeline, 20 steps)
|
||
|
||
- **Chip:** Apple M5, 10-core (4P+6E)
|
||
- **RAM:** 24 GB
|
||
- **macOS:** 26.3 (Build 25D125)
|
||
- **Run:** `./train_large --data ./tinystories_data00.bin --steps 20 --lr 1e-4`
|
||
|
||
### Efficiency report
|
||
- Total steps: 20
|
||
- Wall time: 10423 ms (10.4 s)
|
||
- Compile time: 7187 ms (69.0%)
|
||
- Train time: 2542 ms (24.4%)
|
||
- **Avg train: 127.1 ms/step**
|
||
- ANE TFLOPS: 0.73 sustained
|
||
- ANE utilization: 4.6% of 15.8 TFLOPS
|
||
|
||
Full output with JSON lines is in `benchmarks/my_m5_benchmark_output.txt` (or paste the contents below).
|
||
```
|
||
|
||
Then paste the contents of `benchmarks/my_m5_benchmark_output.txt` in the same comment, or attach it.
|
||
|
||
---
|
||
|
||
## 2. Bug fix (PR)
|
||
|
||
**Fix:** Guard short token datasets in `train_large_ane.m` and `training/training_dynamic/train.m`.
|
||
|
||
**Why:** When `n_tokens <= SEQ + 1`, the expression `max_pos = n_tokens - SEQ - 1` underflows (unsigned), leading to a huge random range and possible out-of-bounds reads. `train_large.m` already had this guard; the other two pipelines did not.
|
||
|
||
**Changes:**
|
||
- `training/train_large_ane.m`: After `n_tokens = data_len / 2`, add a check that fails early with a clear error, munmap and close the fd, and return 1.
|
||
- `training/training_dynamic/train.m`: Same guard added.
|
||
|
||
**Suggested PR title:** `fix: guard short token datasets in train_large_ane and dynamic pipeline`
|
||
|
||
**Suggested PR description:**
|
||
|
||
```markdown
|
||
## Summary
|
||
- Add a token dataset length guard in `training/train_large_ane.m`
|
||
- Add the same guard in `training/training_dynamic/train.m`
|
||
- Fail early with a clear error when the dataset is too short for one (input, target) window
|
||
|
||
## Why
|
||
Both paths use `max_pos = n_tokens - SEQ - 1`. When `n_tokens <= SEQ + 1`, this unsigned subtraction underflows, producing a huge range and potentially out-of-bounds reads. `train_large.m` already had this guard (lines 299–304); this PR aligns the other two pipelines.
|
||
|
||
## Validation
|
||
- `make -C training train_large_ane` — builds
|
||
- `make -C training/training_dynamic train` — builds
|
||
- With a too-short data file, both binaries exit with the new error message.
|
||
```
|
||
|
||
---
|
||
|
||
## 3. Optional: benchmark data in repo
|
||
|
||
Branch also adds:
|
||
- `benchmarks/my_m5_benchmark_output.txt` — full benchmark log
|
||
- One new entry in `benchmarks/community_results.json` for this M5 run (contributor: `log-wade`)
|
||
|
||
You can either:
|
||
- Include the `community_results.json` update in the same PR as the bug fix, or
|
||
- Omit it and only post the benchmark to Issue #3 (maintainer may update the report from the issue).
|
||
|
||
---
|
||
|
||
## 4. Before opening the PR
|
||
|
||
1. **Fork the repo** on GitHub (if you haven’t): https://github.com/maderix/ANE → Fork.
|
||
2. **Add your fork as a remote and push:**
|
||
```bash
|
||
git remote add myfork git@github.com:YOUR_USERNAME/ANE.git
|
||
git push myfork contribution/benchmark-m5-and-fixes
|
||
```
|
||
3. Open a PR from `myfork/contribution/benchmark-m5-and-fixes` to `maderix/ANE` main.
|
||
4. Post the benchmark comment to Issue #3 (link above).
|
||
|
||
---
|
||
|
||
## 5. Replace contributor name
|
||
|
||
In `benchmarks/community_results.json`, the new entry uses `"contributor": "log-wade"`. Change that to your GitHub username if different.
|