berkus/ANE - ANE


Author	SHA1	Message	Date
maderix	7d61ee4d25	Multi-model dashboard with GQA, W&B integration, and best-loss checkpointing. Dashboard: multi-model support (Stories110M + Qwen3-0.6B) with GQA-aware text generation and KV cache. Weights & Biases logging (--wandb flag) for loss, timing, power, and checkpoint events. Top-k sampling (k=50) to eliminate garbage tokens from untrained vocab entries. Tokenizer now reads any vocab size. train.m: only save a checkpoint when loss improves (best_loss tracking).	2026-03-07 02:56:27 -08:00
maderix	c3c5094865	Fixed the dynamic pipeline logit generation	2026-03-06 04:51:32 -08:00
maderix	06535fc5be	Fix dashboard text generation: add KV cache for proper autoregressive attention	2026-03-05 08:14:21 -08:00
maderix	389ee0dc77	Add --data flag to pass training data path from dashboard to binary	2026-03-05 08:03:54 -08:00
maderix	9595b1a499	Add tokenizer via git-lfs, fix dashboard tokenizer path - Add tokenizer.bin (434KB) to assets/models/ via git-lfs - Fix dashboard tokenizer path (it went up one parent directory too many)	2026-03-05 07:41:33 -08:00
Manjeet Singh	7fbb912a89	Merge pull request #20 from guitared/main: Optimize dashboard and prevent sudo hang when a password is needed	2026-03-04 17:48:30 +05:30
maderix	443194bca4	Dashboard v2: live stats, JSON parsing, all three pipelines - Parse static pipeline JSON step/batch/perf lines for real-time updates - Running elapsed time, ms/step from wall-clock timestamps, steps/sec - Compute ANE + Total TFLOPS from FLOPs/step when not reported directly - Support --ane (train_large_ane) and --no-ane-extras flags - Dynamic pipeline timing breakdown + CKPT_PATH per mode	2026-03-03 05:24:35 -08:00
maderix	cb474e1537	Add dynamic weight training pipeline: 110ms/step without recompilation. Dynamic weight pipeline that eliminates the ~3.7s recompile-every-10-steps bottleneck. Weights are passed via the IOSurface spatial dimension instead of being baked in as constants, so kernels compile once at startup (345ms) and run indefinitely without an exec() restart. Key components: - training_dynamic/ (full pipeline: config, IO, MIL generators, train loop) - 9 dynamic kernels shared across all 12 layers - Vocab compaction 32K→9.2K for a faster classifier - Vectorized cross-entropy with vDSP/NEON - Adam optimizer with gradient clipping + cosine LR schedule - Checkpoint save/resume - test_dynamic_matmul.m: validates dynamic weight matmul vs cblas - test_weight_patch.m: tests weight update via IOSurface - dashboard.py: updated with --dynamic flag for v2 pipeline support, improved step regex parsing, --scratch/--lr/--accum CLI args. Performance: 110ms/step steady-state (no recompile overhead); breakdown (ms): ane_fwd=21 ane_bwd=28 io_fwd=12 io_bwd=15 silu=10 cls=13 rms=5	2026-03-03 04:34:55 -08:00
Guitared	b8f09a6853	fix non-interactive session error and sudo password input for powermetrics	2026-03-03 14:14:30 +07:00
Guitared	65cfc3255f	optimize singleton token params in generate_text	2026-03-03 14:11:42 +07:00
maderix	4d67db1bdb	stories110M: 12-layer ANE training with dashboard, 107ms/step - Scale to full stories110M (109M params, 12 layers) with real TinyStories data - vDSP-vectorized cross-entropy (110ms→14ms), NEON fp16 IO, async dW - TUI dashboard: loss curve, ANE/CPU power, CPU/memory graphs, text generation - Split into modular headers: config, io, mil, cpu_ops	2026-03-01 03:14:39 -08:00

11 Commits
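Commit 7d61ee4d25 switches the dashboard's text generation to top-k sampling with k=50. The idea can be sketched as below: keep only the k highest logits, softmax over those, and draw one token, so entries with tiny logits (such as untrained vocab slots) can never be picked. This is a stdlib-only illustration assuming logits arrive as a plain list of floats; `top_k_sample` is a hypothetical helper, not the dashboard's actual function.

```python
import math
import random


def top_k_sample(logits, k=50, rng=None):
    """Hypothetical helper: sample a token id from the k largest logits only."""
    rng = rng or random.Random()
    k = max(1, min(k, len(logits)))
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Numerically stable softmax weights over the surviving candidates
    # (subtract the max logit before exponentiating).
    m = logits[top[0]]
    weights = [math.exp(logits[i] - m) for i in top]
    # random.choices normalizes the weights internally.
    return rng.choices(top, weights=weights, k=1)[0]
```

For example, `top_k_sample([0.1, 2.0, -1.0, 1.5], k=2)` can only return index 1 or 3, and with `k=1` it degenerates to greedy argmax decoding.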
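Commit cb474e1537 lists "Adam optimizer with gradient clipping + cosine LR schedule" without further detail. A minimal sketch of those three pieces on flat Python lists follows. It assumes clipping by global L2 norm, a plain cosine decay with no warmup, and standard Adam defaults (beta1=0.9, beta2=0.999, eps=1e-8); none of these choices is confirmed by the commit, and all function names are hypothetical.

```python
import math


def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine decay from lr_max at step 0 to lr_min at total_steps (no warmup assumed)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


def clip_by_global_norm(grads, max_norm):
    """Assumed variant: rescale all grads so their combined L2 norm is <= max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]


def adam_step(params, grads, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; m and v are updated in place, t is the 1-based step count."""
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g       # first-moment estimate
        v[i] = beta2 * v[i] + (1 - beta2) * g * g   # second-moment estimate
        m_hat = m[i] / (1 - beta1 ** t)             # bias corrections
        v_hat = v[i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params
```

In a training loop these would be applied in order each step: clip the gradients, look up `cosine_lr(step, ...)`, then call `adam_step` with that learning rate.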
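Commit 7d61ee4d25 also changes train.m to save a checkpoint only when the loss improves. train.m is Objective-C; the Python sketch below just illustrates the running-minimum check behind best_loss tracking. `BestLossCheckpointer` is a hypothetical name, not taken from the repository.

```python
import math


class BestLossCheckpointer:
    """Hypothetical sketch: allow a save only when loss beats the best seen so far."""

    def __init__(self):
        self.best_loss = math.inf

    def should_save(self, loss):
        # A NaN loss compares False against best_loss, so it never triggers a save.
        if loss < self.best_loss:
            self.best_loss = loss
            return True
        return False
```

Feeding it the losses 3.2, 3.5, 3.1, 3.1 allows saves at the first and third steps only; an equal loss does not count as an improvement.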