ANE/training/training_dynamic
maderix 7d61ee4d25 Multi-model dashboard with GQA, W&B integration, and best-loss checkpointing
Dashboard: multi-model support (Stories110M + Qwen3-0.6B) with GQA-aware
text generation and KV cache. Weights & Biases logging (--wandb flag) for
loss, timing, power, and checkpoint events. Top-k=50 sampling to eliminate
garbage tokens from untrained vocab entries. Tokenizer reads any vocab size.
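The top-k step mentioned above can be sketched in C as a filter over a logits buffer. This is an illustrative sketch only, not the dashboard's actual code: the name `top_k_filter`, its signature, and the choice to mask with negative infinity are assumptions. It keeps the k largest logits and masks the rest, so a later softmax gives the masked tokens zero probability.

```c
#include <assert.h>
#include <math.h>    /* INFINITY */
#include <stdlib.h>

/* Hypothetical helper (not from the repository): keep the k largest
 * logits in place and set all others to -INFINITY.
 * Returns the number of logits kept, or -1 on allocation failure. */
static int top_k_filter(float *logits, int vocab, int k) {
    assert(logits != NULL && vocab > 0 && k > 0);
    if (k >= vocab) return vocab;            /* nothing to mask */

    int   *idx  = malloc((size_t)k * sizeof *idx);
    float *kept = malloc((size_t)k * sizeof *kept);
    if (idx == NULL || kept == NULL) { free(idx); free(kept); return -1; }

    /* idx[0..n) holds indices of the best logits so far, sorted descending. */
    int n = 0;
    for (int i = 0; i < vocab; i++) {
        if (n == k && logits[i] <= logits[idx[n - 1]]) continue;
        int j = (n < k) ? n++ : k - 1;       /* append, or evict the smallest */
        while (j > 0 && logits[idx[j - 1]] < logits[i]) {
            idx[j] = idx[j - 1];
            j--;
        }
        idx[j] = i;
    }

    /* Save the winners, mask everything, then restore the winners. */
    for (int j = 0; j < k; j++) kept[j] = logits[idx[j]];
    for (int i = 0; i < vocab; i++) logits[i] = -INFINITY;
    for (int j = 0; j < k; j++) logits[idx[j]] = kept[j];

    free(idx);
    free(kept);
    return k;
}
```

With k=50, as in the commit message, a call would look like `top_k_filter(logits, vocab_size, 50)` before the softmax and sampling step. The insertion-based selection costs O(vocab * k), which is simple and adequate for small k.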

train.m: only save checkpoint when loss improves (best_loss tracking).
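A minimal sketch of the best-loss rule described above, assuming a small tracker struct. `BestLossTracker`, `tracker_init`, and `tracker_should_save` are hypothetical names, not taken from train.m. The caller would write a checkpoint only when the function returns true.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tracker (names are illustrative, not from train.m). */
typedef struct {
    float best_loss;  /* lowest loss seen so far */
    int   saves;      /* how many times a save was approved */
} BestLossTracker;

static void tracker_init(BestLossTracker *t) {
    assert(t != NULL);
    t->best_loss = FLT_MAX;  /* any finite first loss counts as an improvement */
    t->saves = 0;
}

/* Returns true only when loss is strictly lower than the best so far.
 * A NaN loss fails the comparison, so it never triggers a save. */
static bool tracker_should_save(BestLossTracker *t, float loss) {
    assert(t != NULL);
    if (!(loss < t->best_loss)) return false;
    t->best_loss = loss;
    t->saves++;
    return true;
}
```

In a training loop this would sit at the checkpoint interval: evaluate `tracker_should_save(&tracker, loss)` and call the checkpoint writer only on true, so a noisy or diverging step never overwrites a better checkpoint.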
2026-03-07 02:56:27 -08:00
models Add Qwen3-0.6B GQA support and multi-model build system 2026-03-06 06:23:15 -08:00
Makefile Add Qwen3-0.6B GQA support and multi-model build system 2026-03-06 06:23:15 -08:00
config.h Add Qwen3-0.6B GQA support and multi-model build system 2026-03-06 06:23:15 -08:00
cpu_ops.h Fixed the dynamic pipeline logit generation 2026-03-06 04:51:32 -08:00
io.h Add Qwen3-0.6B GQA support and multi-model build system 2026-03-06 06:23:15 -08:00
mil_dynamic.h Add Qwen3-0.6B GQA support and multi-model build system 2026-03-06 06:23:15 -08:00
train.m Multi-model dashboard with GQA, W&B integration, and best-loss checkpointing 2026-03-07 02:56:27 -08:00