berkus/ANE - ANE

Commit Graph

Author	SHA1	Message	Date
Erik Bray	99b06838bc	[feat] Merge upstream: dynamic weight training, CLI fixes, dashboard v2	2026-03-03 14:38:52 +01:00
Erik Bray	216776bcb7	[docs] Community fork README, CONTRIBUTING guide, issue templates, gitignore: rewritten README with quickstart, env vars, benchmark instructions, dashboard link	2026-03-03 14:29:16 +01:00
Erik Bray	9832240e72	[feat] Community benchmark system: standardized JSON output, auto-submit to dashboard, aggregation script, M4 Max reference result	2026-03-03 14:29:11 +01:00
Erik Bray	517f1e45bb	[feat] Benchmark runner and mlpackage generator: run_benchmarks.sh for full test suite, gen_mlpackages.py for CoreML model generation	2026-03-03 14:29:04 +01:00
maderix	443194bca4	Dashboard v2: live stats, JSON parsing, all three pipelines - Parse static pipeline JSON step/batch/perf lines for real-time updates - Running elapsed time, ms/step from wall-clock timestamps, steps/sec - Compute ANE + Total TFLOPS from FLOPs/step when not reported directly - Support --ane (train_large_ane) and --no-ane-extras flags - Dynamic pipeline timing breakdown + CKPT_PATH per mode	2026-03-03 05:24:35 -08:00
Erik Bray	37cac988b8	[docs] Developer documentation: architecture diagrams, complete API reference, benchmark guide, M4 Max results, security audit report	2026-03-03 14:22:22 +01:00
Erik Bray	680f8c7e20	[feat] ANE ChainingRequest API prototype: baseline measurement for multi-kernel pipelining without recompile overhead	2026-03-03 14:22:18 +01:00
Erik Bray	7524260ead	[fix] Security hardening (upstream PRs #5, #7): stack-protector-strong, format-security flags, NULL guards on ane_compile/fread/fopen, tokenize.py input validation	2026-03-03 14:22:03 +01:00
Erik Bray	4ae51e038b	[fix] Dashboard sudo hang fix (upstream PR #20): prevent blocking when password is required for powermetrics	2026-03-03 14:21:57 +01:00
Erik Bray	380237af1f	[fix] Token sampling underflow fix (upstream PR #17): prevent size_t wraparound on short datasets in both train_large variants	2026-03-03 14:21:53 +01:00
Erik Bray	c41acd2290	[fix] M1/M2/M3 MIL syntax compatibility (upstream PR #6): use program(1.0), ios16 target, tensor types across 18 files	2026-03-03 14:21:48 +01:00
maderix	3c1aae65d7	Merge dynamic training pipeline + CLI fixes + benchmark comparison	2026-03-03 04:36:03 -08:00
maderix	4c14ed0e25	CLI fixes + --no-ane-extras flag + README benchmark table - Fix positional arg parsing (model_path, steps, lr were silently ignored) - Add --model, --ckpt flags; forward ckpt_path across exec() restarts - Add --no-ane-extras to disable ANE classifier/softmax/rmsnorm_bwd - CPU fallback for softmax/classifier/rmsnorm_bwd when extras disabled - Update README with 4-way benchmark comparison table (20 steps)	2026-03-03 04:34:55 -08:00
maderix	cb474e1537	Add dynamic weight training pipeline — 110ms/step without recompilation Dynamic weight pipeline that eliminates the ~3.7s recompile-every-10-steps bottleneck. Weights are passed via IOSurface spatial dimension instead of baked as constants, so kernels compile once at startup (345ms) and run indefinitely without exec() restart. Key components: - training_dynamic/ — full pipeline (config, IO, MIL generators, train loop) - 9 dynamic kernels shared across all 12 layers - Vocab compaction 32K→9.2K for faster classifier - Vectorized cross-entropy with vDSP/NEON - Adam optimizer with gradient clipping + cosine LR schedule - Checkpoint save/resume - test_dynamic_matmul.m — validates dynamic weight matmul vs cblas - test_weight_patch.m — tests weight update via IOSurface - dashboard.py — updated with --dynamic flag for v2 pipeline support, improved step regex parsing, --scratch/--lr/--accum CLI args Performance: 110ms/step steady-state (no recompile overhead) ane_fwd=21 ane_bwd=28 io_fwd=12 io_bwd=15 silu=10 cls=13 rms=5 ms	2026-03-03 04:34:55 -08:00
Manjeet Singh	c33077430e	Merge PR #19: Bridge API + ANE classifier/softmax/rmsnorm_bwd offload (16% faster) Bridge+Memory leak fix+More functions	2026-03-03 13:10:57 +05:30
Vipul	ebac5dd73f	Python Bridge+Memory leak fix+More functions	2026-03-03 02:04:36 -05:00
Manjeet Singh	1b792fce34	Merge pull request #15 from maderix/claude/add-readme-scope-notice-EL9sS Add Project Scope & Intent notice to README	2026-03-03 06:26:35 +05:30
Claude	752a3be81a	Add Project Scope & Intent notice to README Weave in scope notice near the top covering project intent, what it is/isn't, hype clarification, maintenance expectations, and fork encouragement. Consolidate private API disclaimer with existing disclaimer section to avoid duplication. https://claude.ai/code/session_01NNL4MVEY1aKp19eGHTYJUv	2026-03-03 00:54:46 +00:00
Manjeet Singh	893f58e725	Merge pull request #2 from m0at/m5-maximized ANE probe tests + training telemetry for M5 optimization	2026-03-02 14:57:12 +05:30
m0at	184b182bfc	Add M5 probe results: weight reload fails, all QoS work, chaining API found Key findings from running all 4 probes on Apple M5: - Weight reload (unload+load after file overwrite) does NOT work — weights are baked at compile time, output is identical regardless of file changes - weightsBuffer IOSurface parameter also does not override compiled weights - All QoS values 0-63 work, no measurable latency difference (~0.07ms/eval) - _ANEPerformanceStats has hwExecutionTime (ns) + perfCounterData - _ANEChainingRequest supports loopback execution (output→input chaining) - _ANEClient has real-time eval path and chaining preparation methods - procedureIndex 0-15 all succeed on single-procedure models Fixed probe tests to use fp32 I/O with cast (matching inmem_peak pattern) and 64+ channel kernels (ANE minimum size requirement). Full analysis in training/m5result.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 23:16:38 -08:00
m0at	40d3f45631	Add ANE probe tests and training telemetry for M5 optimization Four standalone probe tests to characterize the M5 ANE: - test_weight_reload: Can weights be hot-swapped via unload+load without recompilation? - test_perf_stats: Enumerate _ANEPerformanceStats methods/properties and hardware counters - test_qos_sweep: Measure compile/load/eval latency across QoS 0-63 - test_ane_advanced: Probe SharedEvents, weightsBuffer IOSurface, procedureIndex, VirtualClient Training telemetry (train_large.m): - JSON lines to stderr with per-step timing breakdown and per-batch TFLOPS metrics - Enables external monitoring tools to visualize ANE utilization in real-time Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 22:54:58 -08:00
maderix	4d67db1bdb	stories110M: 12-layer ANE training with dashboard, 107ms/step - Scale to full stories110M (109M params, 12 layers) with real TinyStories data - vDSP-vectorized cross-entropy (110ms→14ms), NEON fp16 IO, async dW - TUI dashboard: loss curve, ANE/CPU power, CPU/memory graphs, text generation - Split into modular headers: config, io, mil, cpu_ops	2026-03-01 03:14:39 -08:00
maderix	f213c8db68	Initial release	2026-02-28 00:22:06 -08:00
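
Commit 443194bca4 mentions deriving ms/step from wall-clock timestamps and computing TFLOPS from FLOPs/step when the pipeline does not report it directly. The arithmetic behind that can be sketched roughly as below. This is only an illustration: the function names `ms_per_step` and `tflops` are hypothetical and are not taken from dashboard.py, and the real dashboard may average or smooth its figures differently.

```python
def ms_per_step(t_start: float, t_end: float, steps: int) -> float:
    """Average milliseconds per step over a wall-clock interval.

    t_start and t_end are timestamps in seconds (hypothetical inputs,
    e.g. taken when the first and last step lines were parsed).
    """
    if steps <= 0:
        raise ValueError("steps must be positive")
    return (t_end - t_start) * 1000.0 / steps


def tflops(flops_per_step: float, step_ms: float) -> float:
    """Throughput in TFLOPS from FLOPs per step and step time in ms."""
    if step_ms <= 0:
        raise ValueError("step_ms must be positive")
    seconds = step_ms / 1000.0
    return flops_per_step / seconds / 1e12


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the repository.
    step_ms = ms_per_step(0.0, 2.0, 20)
    print(f"{step_ms:.1f} ms/step, {tflops(1e11, step_ms):.2f} TFLOPS")
```

Keeping the two conversions separate means the same `tflops` helper would work whether the step time comes from a reported per-step figure or from wall-clock timestamps.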

23 Commits