Commit Graph

5 Commits

Author SHA1 Message Date
Alvaro Videla 668c236a08
Merge 7ea45c2fab into 20cd236f61
2026-03-10 15:12:41 +05:30
maderix e986572e90 Replace assert() with non-fatal bounds checks on token IDs
Follow-up to PR #31: assert() aborts on bad tokens, which is too
harsh for training. Skip bad tokens with a warning instead.
2026-03-04 04:41:38 -08:00
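Here is a minimal sketch of the non-fatal check. It assumes a row-major embedding table of vocab x dim floats. embed_lookup_checked is a hypothetical helper, not the repository's function. An out-of-range ID is reported on stderr and its row is zero-filled so the step can continue, whereas assert() would abort the whole run.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the embedding row for token `tok` into `out`.
 * Returns 1 on success. An out-of-range ID is reported on stderr and the
 * output row is zero-filled so training can continue, instead of the
 * process aborting the way assert() would. */
static int embed_lookup_checked(const float *table, int vocab, int dim,
                                int tok, float *out) {
    if (tok < 0 || tok >= vocab) {
        fprintf(stderr, "warning: token id %d outside [0, %d), skipping\n",
                tok, vocab);
        memset(out, 0, (size_t)dim * sizeof *out);
        return 0;
    }
    memcpy(out, table + (size_t)tok * dim, (size_t)dim * sizeof *out);
    return 1;
}
```

Zero-filling is only one option. A caller could instead use the return value to drop the position from the loss entirely.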
Alvaro GPT 7ea45c2fab perf: vectorize CPU bottlenecks with vDSP and cblas
- Vectorize adam_update with vDSP batch ops (stories_cpu_ops.h)
  Replaces the scalar per-element loop with vDSP_vsmul/vsma/vsq/vdiv
  Expected to be ~3-4x faster for 2.4M-parameter updates

- Vectorize model_adam_step ADAM_UPDATE macro with vDSP (backward.h)
  Same batch ops pattern for the train.m model pipeline

- Replace cpu_accum_dW with cblas_sgemm (backward.h)
  dW += dy^T @ x is a standard BLAS GEMM operation
  Expected to be 5-10x faster for weight-gradient accumulation

- Replace cpu_matmul_backward_dx with cblas_sgemm (backward.h)
  dx = dy @ W^T is also a standard BLAS GEMM

- Add -framework Accelerate to train target (Makefile)
2026-03-03 20:47:03 +01:00
Alvaro GPT 541bf4ec90 fix: correctness & safety improvements
- Validate all fread() return values in model_load_weights (model.h)
- Check ane_eval() return values in ane_conv_eval (forward.h) and ane_eval_k (tiny_train.m)
- Log error details on ANE eval failure (ane_runtime.h)
- Thread-safe RMSNorm: replace global g_rms_tmp with local allocation (stories_cpu_ops.h)
- Bounds-check token indices in cross_entropy_loss, embed_lookup, and embed_backward
- Atomic checkpoint writes via tmp+rename pattern (tiny_train.m)
- Non-destructive recompile: compile new kernels first, swap only on success (model.h)
- Validate fread() in load_checkpoint (tiny_train.m)
2026-03-03 20:46:58 +01:00
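Here is a minimal sketch of the tmp+rename checkpoint pattern combined with checked fread() calls. It assumes a toy file format: a size_t count followed by raw floats. The repository's checkpoint layout is not shown in this log. save_floats_atomic and load_floats are hypothetical names. On POSIX systems, rename() replaces the destination atomically, so a crash during a save leaves the previous checkpoint readable.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write n floats to "<path>.tmp", then rename it over
 * `path` only if every write and the close succeeded. On failure the
 * temporary file is removed and the old checkpoint is left untouched. */
static int save_floats_atomic(const char *path, const float *data, size_t n) {
    char tmp[4096];
    int len = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (len < 0 || (size_t)len >= sizeof tmp) return -1;
    FILE *f = fopen(tmp, "wb");
    if (!f) return -1;
    if (fwrite(&n, sizeof n, 1, f) != 1 ||
        fwrite(data, sizeof *data, n, f) != n) {
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fclose(f) != 0) { remove(tmp); return -1; }
    if (rename(tmp, path) != 0) { remove(tmp); return -1; }
    return 0;
}

/* Hypothetical helper: read back a file written above. Every fread()
 * return value is checked, so a truncated file or a size mismatch is
 * rejected instead of leaving `data` partially filled and unnoticed. */
static int load_floats(const char *path, float *data, size_t n) {
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t stored = 0;
    int ok = fread(&stored, sizeof stored, 1, f) == 1 && stored == n &&
             fread(data, sizeof *data, n, f) == n;
    fclose(f);
    return ok ? 0 : -1;
}
```

Durability across power loss would also require an fsync() of the file before rename(). The sketch omits that step to stay within standard C.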
maderix 4d67db1bdb stories110M: 12-layer ANE training with dashboard, 107ms/step
- Scale to full stories110M (109M params, 12 layers) with real TinyStories data
- vDSP-vectorized cross-entropy (110ms→14ms), NEON fp16 IO, async dW
- TUI dashboard: loss curve, ANE/CPU power, CPU/memory graphs, text generation
- Split into modular headers: config, io, mil, cpu_ops
2026-03-01 03:14:39 -08:00