mirror of https://github.com/maderix/ANE.git
Implement Grouped-Query Attention (16q/8kv heads, head_dim=128) for Qwen3-0.6B (28 layers, 596M params). Model configs moved to models/*.h headers, selected at build time via make MODEL=xxx.

Key changes:

- GQA-aware MIL kernels: sdpaFwd split from woFwd (Q_DIM != DIM); qBwd/kvBwd split from qkvBwd (different IC dimensions)
- K/V tile (KV_HEADS→HEADS) before SDPA backward, reduce after
- 10 kernels total, all model-agnostic via compile-time defines
- Makefile: make MODEL=qwen3_06b (default) or MODEL=stories110m
- Both models verified: Stories110M ~115ms/step, Qwen3 ~412ms/step

| | | |
|---|---|---|
| qwen3_06b.h | ||
| stories110m.h | ||
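
The commit message mentions tiling K/V from KV_HEADS to HEADS before the SDPA backward pass and reducing afterwards. The following is a minimal CPU sketch of that idea in plain C, not the repository's MIL kernels. The macro names `HEADS`, `KV_HEADS`, `HEAD_DIM`, `GROUP`, `Q_DIM`, and `KV_DIM`, the function names `kv_tile` and `kv_reduce`, the `[seq][heads][head_dim]` memory layout, and the mapping of query head `h` to KV head `h / GROUP` are all assumptions made for illustration. Only the numeric values (16 query heads, 8 KV heads, head_dim 128) come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the compile-time defines a header such as
 * models/qwen3_06b.h might provide; the real macro names may differ. */
#ifndef HEADS
#define HEADS 16      /* query heads (from the commit message) */
#endif
#ifndef KV_HEADS
#define KV_HEADS 8    /* key/value heads (from the commit message) */
#endif
#ifndef HEAD_DIM
#define HEAD_DIM 128  /* per-head dimension (from the commit message) */
#endif

#define GROUP  (HEADS / KV_HEADS)     /* query heads sharing one KV head */
#define Q_DIM  (HEADS * HEAD_DIM)     /* 2048 with the values above */
#define KV_DIM (KV_HEADS * HEAD_DIM)  /* 1024 with the values above */

static_assert(HEADS % KV_HEADS == 0, "HEADS must be a multiple of KV_HEADS");

/* Tile: expand K or V from KV_HEADS to HEADS heads.
 * Assumed layout is [seq][heads][head_dim]; query head h reads the
 * KV head h / GROUP (an assumption, not confirmed by the source). */
void kv_tile(const float *kv, float *tiled, size_t seq) {
    for (size_t t = 0; t < seq; t++) {
        for (size_t h = 0; h < HEADS; h++) {
            const float *src = kv + (t * KV_HEADS + h / GROUP) * HEAD_DIM;
            float *dst = tiled + (t * HEADS + h) * HEAD_DIM;
            memcpy(dst, src, HEAD_DIM * sizeof(float));
        }
    }
}

/* Reduce: fold gradients of the tiled tensor back to KV_HEADS heads by
 * summing over the GROUP query heads that shared each KV head. */
void kv_reduce(const float *d_tiled, float *d_kv, size_t seq) {
    memset(d_kv, 0, seq * KV_DIM * sizeof(float));
    for (size_t t = 0; t < seq; t++) {
        for (size_t h = 0; h < HEADS; h++) {
            const float *src = d_tiled + (t * HEADS + h) * HEAD_DIM;
            float *dst = d_kv + (t * KV_HEADS + h / GROUP) * HEAD_DIM;
            for (size_t d = 0; d < HEAD_DIM; d++) {
                dst[d] += src[d];
            }
        }
    }
}
```

Because every size is a macro, the same source would compile for a different model by overriding the defines, which is the spirit of the `make MODEL=qwen3_06b` / `make MODEL=stories110m` selection described in the commit message. The reduce step is a sum because each KV head contributes to `GROUP` query heads in the forward pass, so its gradient is the sum of the gradients from those heads.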