ANE/training_dynamic at 475348ad14c5d2fff31ec236b2d2aa31f3dc3e4e - ANE

maderix 475348ad14 Add Qwen3-0.6B GQA support and multi-model build system		2026-03-06 06:23:15 -08:00

Implement Grouped-Query Attention (16 query / 8 KV heads, head_dim=128) for Qwen3-0.6B (28 layers, 596M params). Model configs moved to models/*.h headers selected at build time via make MODEL=xxx.

Key changes:
- GQA-aware MIL kernels: sdpaFwd split from woFwd (Q_DIM != DIM), qBwd/kvBwd split from qkvBwd (different IC dimensions)
- K/V tile (KV_HEADS→HEADS) before SDPA backward, reduce after
- 10 kernels total, all model-agnostic via compile-time defines
- Makefile: make MODEL=qwen3_06b (default) or MODEL=stories110m
- Both models verified: Stories110M ~115ms/step, Qwen3 ~412ms/step
models	Add Qwen3-0.6B GQA support and multi-model build system	2026-03-06 06:23:15 -08:00
Makefile	Add Qwen3-0.6B GQA support and multi-model build system	2026-03-06 06:23:15 -08:00
config.h	Add Qwen3-0.6B GQA support and multi-model build system	2026-03-06 06:23:15 -08:00
cpu_ops.h	Fixed the dynamic pipeline logit generation	2026-03-06 04:51:32 -08:00
io.h	Add Qwen3-0.6B GQA support and multi-model build system	2026-03-06 06:23:15 -08:00
mil_dynamic.h	Add Qwen3-0.6B GQA support and multi-model build system	2026-03-06 06:23:15 -08:00
train.m	Add Qwen3-0.6B GQA support and multi-model build system	2026-03-06 06:23:15 -08:00
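
The "K/V tile (KV_HEADS→HEADS) before SDPA backward, reduce after" step in the commit message can be illustrated on the CPU with a minimal sketch. This is not the repository's MIL kernel code. The function names gqa_tile and gqa_reduce are hypothetical, the buffers are assumed to be contiguous [heads][per_head] floats, and the mapping of query head h to KV head h / GROUP is an assumption (the actual kernels may order the repeated heads differently). Only the head counts (16 query, 8 KV) come from the commit message.

```c
#include <stddef.h>

/* Head counts from the commit message; GROUP is the number of query heads
 * that share one KV head. */
enum { HEADS = 16, KV_HEADS = 8, GROUP = HEADS / KV_HEADS };

/* Hypothetical helper: expand [KV_HEADS][per_head] to [HEADS][per_head].
 * Assumption: query head h reads KV head h / GROUP. */
static void gqa_tile(const float *kv, float *tiled, size_t per_head) {
    for (size_t h = 0; h < HEADS; h++) {
        const float *src = kv + (h / GROUP) * per_head;
        float *dst = tiled + h * per_head;
        for (size_t i = 0; i < per_head; i++) dst[i] = src[i];
    }
}

/* Hypothetical helper: sum per-query-head gradients back into the shared
 * KV heads. This is the adjoint of gqa_tile under the same head mapping. */
static void gqa_reduce(const float *d_tiled, float *d_kv, size_t per_head) {
    for (size_t i = 0; i < (size_t)KV_HEADS * per_head; i++) d_kv[i] = 0.0f;
    for (size_t h = 0; h < HEADS; h++) {
        float *dst = d_kv + (h / GROUP) * per_head;
        const float *src = d_tiled + h * per_head;
        for (size_t i = 0; i < per_head; i++) dst[i] += src[i];
    }
}
```

With the tiled K and V, the SDPA backward pass can treat every query head uniformly. Because each KV head feeds GROUP query heads, its gradient is the sum of the gradients from those heads, which is what the reduce step computes.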