ANE/models at d91c9845c0784dec7753048954fc6d0e8411fe29 - ANE

qwen3_06b.h	Add Qwen3-0.6B GQA support and multi-model build system	2026-03-06 06:23:15 -08:00
stories110m.h	Add Qwen3-0.6B GQA support and multi-model build system	2026-03-06 06:23:15 -08:00

maderix 475348ad14 Add Qwen3-0.6B GQA support and multi-model build system

Implement Grouped-Query Attention (16q/8kv heads, head_dim=128) for
Qwen3-0.6B (28 layers, 596M params). Model configs moved to
models/*.h headers, selected at build time via make MODEL=xxx.
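The sketch below is a hypothetical models/*.h config that makes the GQA shapes concrete. The head counts, head_dim and layer count come from this commit. DIM = 1024 is Qwen3-0.6B's published hidden size, not something stated here. The derived names (KV_DIM, GQA_GROUP, ProjShape, proj_shape) are illustrative assumptions and are not taken from the repo's actual header.

```c
// Hypothetical sketch of a per-model config header (e.g. qwen3_06b.h).
// Not the repo's real header: derived names below are assumptions.
#include <assert.h>

#define N_LAYERS  28
#define HEADS     16
#define KV_HEADS  8
#define HEAD_DIM  128
#define DIM       1024  // Qwen3-0.6B hidden size (public model config, assumed here)

// Derived widths.
#define Q_DIM     (HEADS * HEAD_DIM)     // 2048: Q projection / attention output width
#define KV_DIM    (KV_HEADS * HEAD_DIM)  // 1024: width of each of the K and V projections
#define GQA_GROUP (HEADS / KV_HEADS)     // query heads sharing one KV head

_Static_assert(HEADS % KV_HEADS == 0, "HEADS must be a multiple of KV_HEADS");

// Shape (out_channels x in_channels) of an attention projection.
typedef struct { int out_ch, in_ch; } ProjShape;

static ProjShape proj_shape(char which)
{
    switch (which) {
    case 'q': return (ProjShape){Q_DIM, DIM};   // x -> Q
    case 'k':
    case 'v': return (ProjShape){KV_DIM, DIM};  // x -> K, x -> V
    case 'o': return (ProjShape){DIM, Q_DIM};   // attention output -> residual stream
    default:  assert(!"unknown projection"); return (ProjShape){0, 0};
    }
}
```

With these values Q_DIM (2048) differs from DIM (1024), so the output projection is not square. That is consistent with the commit's note that sdpaFwd had to be split from woFwd once Q_DIM != DIM.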

Key changes:
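- GQA-aware MIL kernels: sdpaFwd split from woFwd (Q_DIM != DIM),
  qBwd/kvBwd split from qkvBwd (the Q and K/V backward passes have
  different input-channel (IC) dimensions)
- K/V tiled from KV_HEADS to HEADS before SDPA backward; the K/V
  gradients are summed back to KV_HEADS afterward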
- 10 kernels total, all model-agnostic via compile-time defines
- Makefile: make MODEL=qwen3_06b (default) or MODEL=stories110m
- Both models verified: Stories110M ~115ms/step, Qwen3 ~412ms/step
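The K/V tile-then-reduce step can be illustrated with a minimal CPU reference in C. This is not the repo's MIL kernel. The [heads][seq][head_dim] row-major layout, the consecutive mapping of query head h to KV head h / group, and the function names kv_tile and kv_grad_reduce are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Assumed layout: [heads][seq][head_dim], row-major, float32.
// Assumed mapping: query head h reads KV head h / group.

// Expand K (or V) from kv_heads to heads by copying each KV head
// into every query head of its group.
static void kv_tile(const float *kv, float *out,
                    int kv_heads, int heads, int seq, int head_dim)
{
    assert(heads % kv_heads == 0);
    int group = heads / kv_heads;
    size_t n = (size_t)seq * head_dim;  // elements per head
    for (int h = 0; h < heads; h++)
        memcpy(out + (size_t)h * n, kv + (size_t)(h / group) * n, n * sizeof(float));
}

// After SDPA backward: sum the per-query-head gradients dK (or dV)
// of each group back into the shared KV head.
static void kv_grad_reduce(const float *dkv_tiled, float *dkv,
                           int kv_heads, int heads, int seq, int head_dim)
{
    assert(heads % kv_heads == 0);
    int group = heads / kv_heads;
    size_t n = (size_t)seq * head_dim;
    memset(dkv, 0, (size_t)kv_heads * n * sizeof(float));
    for (int h = 0; h < heads; h++) {
        float *dst = dkv + (size_t)(h / group) * n;
        const float *src = dkv_tiled + (size_t)h * n;
        for (size_t i = 0; i < n; i++)
            dst[i] += src[i];
    }
}
```

The reduction is a sum because every tiled head in a group is a copy of the same KV head. By the chain rule, that head's gradient is the sum of the gradients of its copies. With the commit's 16 query / 8 KV heads, each KV head gradient would collect two contributions.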

2026-03-06 06:23:15 -08:00
