ANE/training_dynamic at main - ANE

maderix 7d61ee4d25	Multi-model dashboard with GQA, W&B integration, and best-loss checkpointing	2026-03-07 02:56:27 -08:00
    Dashboard: multi-model support (Stories110M + Qwen3-0.6B) with GQA-aware text generation and KV cache.
    Weights & Biases logging (--wandb flag) for loss, timing, power, and checkpoint events.
    Top-k=50 sampling to eliminate garbage tokens from untrained vocab entries.
    Tokenizer reads any vocab size.
    train.m: only save checkpoint when loss improves (best_loss tracking).

models	Add Qwen3-0.6B GQA support and multi-model build system	2026-03-06 06:23:15 -08:00
Makefile	Add Qwen3-0.6B GQA support and multi-model build system	2026-03-06 06:23:15 -08:00
config.h	Add Qwen3-0.6B GQA support and multi-model build system	2026-03-06 06:23:15 -08:00
cpu_ops.h	Fixed the dynamic pipeline logit generation	2026-03-06 04:51:32 -08:00
io.h	Add Qwen3-0.6B GQA support and multi-model build system	2026-03-06 06:23:15 -08:00
mil_dynamic.h	Add Qwen3-0.6B GQA support and multi-model build system	2026-03-06 06:23:15 -08:00
train.m	Multi-model dashboard with GQA, W&B integration, and best-loss checkpointing	2026-03-07 02:56:27 -08:00