mirror of https://github.com/maderix/ANE.git
Dashboard: multi-model support (Stories110M + Qwen3-0.6B) with GQA-aware text generation and KV cache. Weights & Biases logging (--wandb flag) for loss, timing, power, and checkpoint events. Top-k=50 sampling to eliminate garbage tokens from untrained vocab entries. Tokenizer reads any vocab size. train.m: only save checkpoint when loss improves (best_loss tracking).

| Name |
|---|
| models |
| Makefile |
| config.h |
| cpu_ops.h |
| io.h |
| mil_dynamic.h |
| train.m |