ANE/CONTRIBUTION_SUBMISSION.md

# Contribution submission guide

This file summarizes what was done on branch `contribution/benchmark-m5-and-fixes` and how to submit it.

---

## 1. Benchmark (submit to Issue #3)

**Link:** https://github.com/maderix/ANE/issues/3

**Post this as a new comment:**

```
## M5 MacBook Pro benchmark (static pipeline, 20 steps)

- **Chip:** Apple M5, 10-core (4P+6E)
- **RAM:** 24 GB
- **macOS:** 26.3 (Build 25D125)
- **Run:** `./train_large --data ./tinystories_data00.bin --steps 20 --lr 1e-4`

### Efficiency report
- Total steps: 20
- Wall time: 10423 ms (10.4 s)
- Compile time: 7187 ms (69.0%)
- Train time: 2542 ms (24.4%)
- **Avg train: 127.1 ms/step**
- ANE TFLOPS: 0.73 sustained
- ANE utilization: 4.6% of 15.8 TFLOPS

Full output with JSON lines is in `benchmarks/my_m5_benchmark_output.txt` (or paste the contents below).
```

Then paste the contents of `benchmarks/my_m5_benchmark_output.txt` in the same comment, or attach it.

---

## 2. Bug fix (PR)

**Fix:** Guard short token datasets in `train_large_ane.m` and `training/training_dynamic/train.m`.

**Why:** When `n_tokens <= SEQ + 1`, the expression `max_pos = n_tokens - SEQ - 1` underflows (unsigned), leading to a huge random range and possible out-of-bounds reads. `train_large.m` already had this guard; the other two pipelines did not.

**Changes:**
- `training/train_large_ane.m`: After `n_tokens = data_len / 2`, add a check that fails early with a clear error, munmap and close the fd, and return 1.
- `training/training_dynamic/train.m`: Same guard added.

**Suggested PR title:** `fix: guard short token datasets in train_large_ane and dynamic pipeline`

**Suggested PR description:**

```markdown
## Summary
- Add a token dataset length guard in `training/train_large_ane.m`
- Add the same guard in `training/training_dynamic/train.m`
- Fail early with a clear error when the dataset is too short for one (input, target) window

## Why
Both paths use `max_pos = n_tokens - SEQ - 1`. When `n_tokens <= SEQ + 1`, this unsigned subtraction underflows, producing a huge range and potentially out-of-bounds reads. `train_large.m` already had this guard (lines 299–304); this PR aligns the other two pipelines.

## Validation
- `make -C training train_large_ane` — builds
- `make -C training/training_dynamic train` — builds
- With a too-short data file, both binaries exit with the new error message.
```

---

## 3. Optional: benchmark data in repo

Branch also adds:
- `benchmarks/my_m5_benchmark_output.txt` — full benchmark log
- One new entry in `benchmarks/community_results.json` for this M5 run (contributor: `log-wade`)

You can either:
- Include the `community_results.json` update in the same PR as the bug fix, or
- Omit it and only post the benchmark to Issue #3 (maintainer may update the report from the issue).

---

## 4. Before opening the PR

1. **Fork the repo** on GitHub (if you haven’t): https://github.com/maderix/ANE → Fork.
2. **Add your fork as a remote and push:**
   ```bash
   git remote add myfork git@github.com:YOUR_USERNAME/ANE.git
   git push myfork contribution/benchmark-m5-and-fixes
   ```
3. Open a PR from `myfork/contribution/benchmark-m5-and-fixes` to `maderix/ANE` main.
4. Post the benchmark comment to Issue #3 (link above).

---

## 5. Replace contributor name

In `benchmarks/community_results.json`, the new entry uses `"contributor": "log-wade"`. Change that to your GitHub username if different.