ANE/training/training_dynamic
maderix 475348ad14 Add Qwen3-0.6B GQA support and multi-model build system
Implement Grouped-Query Attention (16q/8kv heads, head_dim=128) for
Qwen3-0.6B (28 layers, 596M params). Model configs moved to
models/*.h headers selected at build time via make MODEL=xxx.
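
As a rough illustration of what a build-time model header might contain, the sketch below derives the GQA dimensions from the head counts quoted above. Only HEADS, KV_HEADS, DIM, Q_DIM and the 16/8/128 figures come from this commit message; HEAD_DIM, KV_DIM, GQA_GROUP and the DIM value of 1024 are assumptions made for the example, not the contents of the actual models/*.h files.

```c
#include <assert.h>

/* Hypothetical stand-in for one models/*.h header (not the real file). */
#ifndef DIM
#define DIM       1024   /* residual width; assumed value for illustration */
#endif
#define HEADS     16     /* query heads (from the commit message) */
#define KV_HEADS  8      /* key/value heads (from the commit message) */
#define HEAD_DIM  128    /* per-head width (from the commit message) */

/* Derived sizes. With GQA the Q projection width need not equal DIM. */
#define Q_DIM     (HEADS * HEAD_DIM)     /* 16 * 128 = 2048 */
#define KV_DIM    (KV_HEADS * HEAD_DIM)  /*  8 * 128 = 1024 */
#define GQA_GROUP (HEADS / KV_HEADS)     /* query heads per K/V head = 2 */

/* Reject configurations where query heads do not divide evenly. */
_Static_assert(HEADS % KV_HEADS == 0, "HEADS must be a multiple of KV_HEADS");
```

Under these assumed values Q_DIM (2048) differs from DIM (1024), which is the condition the commit cites for splitting sdpaFwd from woFwd.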

Key changes:
- GQA-aware MIL kernels: sdpaFwd is split from woFwd (Q_DIM != DIM), and
  qBwd/kvBwd are split from qkvBwd (different IC dimensions)
- K/V are tiled (KV_HEADS→HEADS) before SDPA backward, and their gradients
  are reduced (HEADS→KV_HEADS) afterward
- 10 kernels total, all model-agnostic via compile-time defines
- Makefile: make MODEL=qwen3_06b (default) or MODEL=stories110m
- Both models verified: Stories110M ~115ms/step, Qwen3 ~412ms/step
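
The K/V tile and reduce step listed above can be sketched on the CPU as follows. This is a minimal illustration, not the commit's MIL kernels: the function names, the [heads][seq][head_dim] row-major layout, and the mapping of query head h to K/V head h / group are all assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Tile K or V from kv_heads to heads: query head h reads K/V head h / group.
 * Assumed layout: [heads][seq][head_dim], row-major floats. */
static void kv_tile(const float *kv, float *tiled,
                    int kv_heads, int heads, int seq, int head_dim)
{
    int group = heads / kv_heads;
    size_t head_sz = (size_t)seq * (size_t)head_dim;
    for (int h = 0; h < heads; h++) {
        const float *src = kv + (size_t)(h / group) * head_sz;
        float *dst = tiled + (size_t)h * head_sz;
        for (size_t i = 0; i < head_sz; i++)
            dst[i] = src[i];
    }
}

/* Reduce gradients from heads back to kv_heads: each K/V head receives
 * the sum of the gradients of the query heads that shared it. */
static void kv_grad_reduce(const float *d_tiled, float *d_kv,
                           int kv_heads, int heads, int seq, int head_dim)
{
    int group = heads / kv_heads;
    size_t head_sz = (size_t)seq * (size_t)head_dim;
    for (int k = 0; k < kv_heads; k++) {
        for (size_t i = 0; i < head_sz; i++) {
            float sum = 0.0f;
            for (int g = 0; g < group; g++)
                sum += d_tiled[(size_t)(k * group + g) * head_sz + i];
            d_kv[(size_t)k * head_sz + i] = sum;
        }
    }
}
```

The reduce is a sum because tiling copies each K/V head into several query-head slots, so the gradient of the original head is the sum of the gradients of its copies.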
2026-03-06 06:23:15 -08:00
models         Add Qwen3-0.6B GQA support and multi-model build system  2026-03-06 06:23:15 -08:00
Makefile       Add Qwen3-0.6B GQA support and multi-model build system  2026-03-06 06:23:15 -08:00
config.h       Add Qwen3-0.6B GQA support and multi-model build system  2026-03-06 06:23:15 -08:00
cpu_ops.h      Fixed the dynamic pipeline logit generation              2026-03-06 04:51:32 -08:00
io.h           Add Qwen3-0.6B GQA support and multi-model build system  2026-03-06 06:23:15 -08:00
mil_dynamic.h  Add Qwen3-0.6B GQA support and multi-model build system  2026-03-06 06:23:15 -08:00
train.m        Add Qwen3-0.6B GQA support and multi-model build system  2026-03-06 06:23:15 -08:00