Commit Graph - ANE

a3e9b47702 Merge 7e40026e1b into d91c9845c0 ethankim 2026-06-06 16:01:52 +0900
7e40026e1b Document local M2 ANE benchmark behavior Hyoyeol Kim 2026-06-05 12:34:02 +0900
82c275d140 Merge c3263bd618 into d91c9845c0 Xiang Li 2026-04-23 17:37:56 +0800
c3263bd618 Add M5 Max benchmark — first H17 ANE on record lixiang 2026-04-23 17:18:52 +0800
2206c55bd8 Merge branch 'main' into docs/fix-readme-outdated-info Ömer 2026-03-15 20:38:37 +0300
d91c9845c0 Update README.md (main) Manjeet Singh 2026-03-10 15:51:05 +0530
9ff4a57e0a Merge df885ed3df into 20cd236f61 Alvaro Videla 2026-03-10 15:12:42 +0530
cc5bd18889 Merge 1049590df8 into 20cd236f61 Erik 2026-03-10 15:12:42 +0530
09028da929 Merge 17cda7d940 into 20cd236f61 call4pwn 2026-03-10 15:12:42 +0530
668c236a08 Merge 7ea45c2fab into 20cd236f61 Alvaro Videla 2026-03-10 15:12:41 +0530
88d01326eb Merge ec2b617064 into 20cd236f61 Erik 2026-03-10 15:12:41 +0530
1524292eb5 Merge 6f398781d7 into 20cd236f61 Livia Z. 2026-03-09 21:43:20 -0700
79aec4b028 Merge 9fbd4dff5b into 20cd236f61 log-wade 2026-03-09 23:22:49 -0400
20cd236f61 Add INT8 W8A8 support: 1.88x ANE throughput via quantize/dequantize MIL ops maderix 2026-03-09 19:47:01 -0700
9fbd4dff5b fix: guard short token datasets in train_large_ane and dynamic pipeline log-wade 2026-03-07 14:12:31 -0600
35152a3490 Merge 7fceb99988 into 7d61ee4d25 Nabbil Khan 2026-03-08 00:20:08 +0800
7d61ee4d25 Multi-model dashboard with GQA, W&B integration, and best-loss checkpointing maderix 2026-03-07 02:56:27 -0800
475348ad14 Add Qwen3-0.6B GQA support and multi-model build system maderix 2026-03-06 06:23:15 -0800
c3c5094865 Fixed the dynamic pipeline logit generation maderix 2026-03-06 04:51:32 -0800
06535fc5be Fix dashboard text generation: add KV cache for proper autoregressive attention maderix 2026-03-05 08:14:21 -0800
19da850fca Use ACCELERATE_NEW_LAPACK to fix deprecated cblas warnings maderix 2026-03-05 08:07:47 -0800
389ee0dc77 Add --data flag to pass training data path from dashboard to binary maderix 2026-03-05 08:03:54 -0800
9595b1a499 Add tokenizer via git-lfs, fix dashboard tokenizer path maderix 2026-03-05 07:41:33 -0800
926f977b40 Fix backward pass: global loss scaling, weight transpose, AdamW, activation clipping maderix 2026-03-05 07:23:08 -0800
17cda7d940 fix(security): prevent OOB write and integer overflows during model load call4pwn 2026-03-05 07:36:40 +0000
005fa4d79a Merge 99ba013d9b into efcf193075 Erik 2026-03-04 21:39:43 +0100
99ba013d9b [test] ANE private API research: chaining, E5 runtime, custom MIL compilation experiments Erik Bray 2026-03-04 21:39:24 +0100
6f398781d7 feat(training): add M5 ANE pipeline benchmark suite Livia 2026-03-04 14:13:21 -0500
b8d2069c48 fix: address PR review feedback (MIL 1.3 dual-track benchmark, ANE compiler dynamic weights constraints) Livia 2026-03-04 11:48:39 -0500
d5eb7d28e7 docs: update README file structure and fix typo sehawq 2026-03-04 17:27:38 +0300
efcf193075 Add model config to benchmark report, update README with current results maderix 2026-03-04 06:13:21 -0800
1a7d8846b2 Add NE core counts, clarify FP16 vs rated TOPS methodology maderix 2026-03-04 06:11:29 -0800
050bc4fdf0 Add cross-generation ANE benchmark report from issue #3 maderix 2026-03-04 05:30:00 -0800
2bd5e7e93c Merge e030ffb213 into e986572e90 TastyHeadphones 2026-03-04 22:22:00 +0900
e030ffb213 Guard short token datasets in ANE and dynamic trainers tastyheadphones 2026-03-04 22:21:44 +0900
ec2b617064 [feat] Add cache-optimized embedding ops (~12x lookup speedup) Erik Bray 2026-03-04 14:11:59 +0100
1049590df8 [chore] Add .gitignore for build artifacts, training binaries, and temp files Erik Bray 2026-03-04 14:01:21 +0100
e986572e90 Replace assert() with non-fatal bounds checks on token IDs maderix 2026-03-04 04:41:38 -0800
05fc8f85e3 Merge pull request #31 from alvgeppetto-debug/fix/safety-correctness Manjeet Singh 2026-03-04 18:09:56 +0530
032f866f2d Merge pull request #29 from nabbilkhan/contrib/fix-training-data-paths Manjeet Singh 2026-03-04 17:48:43 +0530
44309b7625 Merge pull request #27 from jskromer/fix/macos26-inmemory-benchmarks Manjeet Singh 2026-03-04 17:48:39 +0530
7fbb912a89 Merge pull request #20 from guitared/main Manjeet Singh 2026-03-04 17:48:30 +0530
37939c8a60 Merge pull request #34 from 04cb/fix/docs-add-training-data-link Manjeet Singh 2026-03-04 17:48:25 +0530
3efa27d7a3 Merge pull request #17 from TastyHeadphones/tastyheadphones/short-dataset-underflow-fix Manjeet Singh 2026-03-04 17:48:22 +0530
367d21afe2 Merge 9e6b7c6259 into 4a6f3e40a9 William Varney 2026-03-04 09:11:56 +0100
cde79b12ab Merge 60b0512be3 into 4a6f3e40a9 Nabbil Khan 2026-03-04 09:11:56 +0100
c9da9e62a2 Merge ad119aed46 into 4a6f3e40a9 manni07 2026-03-04 09:11:56 +0100
895b759756 Merge 3575766982 into 4a6f3e40a9 manni07 2026-03-04 09:11:56 +0100
e626968d30 Merge 2d2adacf09 into 4a6f3e40a9 Darko 2026-03-04 15:30:17 +0800
4a6f3e40a9 Revise README for clarity and project details Manjeet Singh 2026-03-04 12:59:09 +0530
0d9e139567 Fix docs: add training data download instructions 04cb 2026-03-04 08:16:20 +0800
be96079bbf [feat][gpu] Q4 quantization, Metal GPU shaders, ANE kernel fusion, memory safety Erik Bray 2026-03-04 00:48:17 +0100
df885ed3df perf: reduce compile & IO overhead Alvaro GPT 2026-03-02 23:16:52 +0100
7ea45c2fab perf: vectorize CPU bottlenecks with vDSP and cblas Alvaro GPT 2026-03-02 23:13:28 +0100
541bf4ec90 fix: correctness & safety improvements Alvaro GPT 2026-03-02 23:10:00 +0100
60b0512be3 Harden token file layout checks and prevent exec-time fd leaks nabbilkhan 2026-03-03 19:42:33 +0000
991bf4d618 Harden token dataset validation across all training pipelines nabbilkhan 2026-03-03 19:36:51 +0000
c04168ee17 Add --data path support for static training pipelines nabbilkhan 2026-03-03 19:19:49 +0000
0e70f5bd71 [feat] Optimize inference: vectorize ops (NEON/vDSP), gate debug output, skip unused ANE compilation, add round-trip benchmark timing, pure C HTTP API with tokenizer Erik Bray 2026-03-03 19:41:54 +0100
7fceb99988 Add reproducible M3 Ultra benchmark submission package nabbilkhan 2026-03-03 18:39:34 +0000
d3d00307c0 Fix benchmarks for macOS 26: replace compileModelAtURL with in-memory MIL pipeline John Stephen Kromer 2026-03-03 10:20:05 -0800
2d2adacf09 wire up fp16 I/O retry in train.m forward path imperatormk 2026-03-03 18:26:12 +0100
6f16dbefca [feat] Inference server mode: keep ANE kernels loaded between prompts (stdin loop + Unix socket server). Subsequent queries respond in ~0.5s instead of ~6s. run.py auto-connects to socket server when available. Erik Bray 2026-03-03 17:34:54 +0100
b4d81b71d4 [feat] Merge upstream PRs #21, #23, #26: NEON-optimized training (train_opt), double-buffered async ANE training (train_double_buffer), Qwen2.5-0.5B LLM inference (inference/). Added get_path() env var support and SEC_FLAGS to all new targets. Skipped PR #22 (binary blob risk). Erik Bray 2026-03-03 17:18:02 +0100
0cf13e2b84 define g_fp16_io in train.m (fixes linker error) imperatormk 2026-03-03 17:16:22 +0100
b476456736 Add LLM inference on ANE — first full transformer on Neural Engine without CoreML zemog 2026-03-03 10:18:15 -0500
21e8a58627 Qwen2.5-0.5B ANE inference — token-for-token match, 82 t/s zemog 2026-03-03 09:30:04 -0500
99b06838bc [feat] Merge upstream: dynamic weight training, CLI fixes, dashboard v2 Erik Bray 2026-03-03 14:38:52 +0100
0a1d841a10 Fix model path: accept argv[1] like train_large does tom 2026-03-03 09:33:58 -0400
216776bcb7 [docs] Community fork README, CONTRIBUTING guide, issue templates, gitignore: rewritten README with quickstart, env vars, benchmark instructions, dashboard link Erik Bray 2026-03-03 14:29:16 +0100
9832240e72 [feat] Community benchmark system: standardized JSON output, auto-submit to dashboard, aggregation script, M4 Max reference result Erik Bray 2026-03-03 14:29:11 +0100
517f1e45bb [feat] Benchmark runner and mlpackage generator: run_benchmarks.sh for full test suite, gen_mlpackages.py for CoreML model generation Erik Bray 2026-03-03 14:29:04 +0100
443194bca4 Dashboard v2: live stats, JSON parsing, all three pipelines maderix 2026-03-03 05:24:35 -0800
37cac988b8 [docs] Developer documentation: architecture diagrams, complete API reference, benchmark guide, M4 Max results, security audit report Erik Bray 2026-03-03 14:22:22 +0100
680f8c7e20 [feat] ANE ChainingRequest API prototype: baseline measurement for multi-kernel pipelining without recompile overhead Erik Bray 2026-03-03 14:22:18 +0100
7524260ead [fix] Security hardening (upstream PRs #5, #7): stack-protector-strong, format-security flags, NULL guards on ane_compile/fread/fopen, tokenize.py input validation Erik Bray 2026-03-03 14:22:03 +0100
4ae51e038b [fix] Dashboard sudo hang fix (upstream PR #20): prevent blocking when password is required for powermetrics Erik Bray 2026-03-03 14:21:57 +0100
380237af1f [fix] Token sampling underflow fix (upstream PR #17): prevent size_t wraparound on short datasets in both train_large variants Erik Bray 2026-03-03 14:21:53 +0100
c41acd2290 [fix] M1/M2/M3 MIL syntax compatibility (upstream PR #6): use program(1.0), ios16 target, tensor types across 18 files Erik Bray 2026-03-03 14:21:48 +0100
9e6b7c6259 fix: raise compile budget for double-buffer, add synthetic data mgkcloud 2026-03-03 12:13:01 +1100
3469d1d0de feat: synthetic data fallback for benchmark mode mgkcloud 2026-03-03 12:07:23 +1100
8fed989146 fix: block capture issues for GCD async compile mgkcloud 2026-03-03 12:06:27 +1100
0edafd48ca feat: double-buffered async ANE training mgkcloud 2026-03-03 10:48:07 +1100
09e9c996bb Add optimized training variant: 14% speedup (107→92 ms/step) tom 2026-03-03 08:33:26 -0400
be88b84fb3 Merge 98ddd2d190 into 3c1aae65d7 fspecii 2026-03-03 15:01:19 +0200
98ddd2d190 bridge: add compile_dyn + write_weight — function parameter IOSurfaces fspecii 2026-03-03 15:00:51 +0200
3c1aae65d7 Merge dynamic training pipeline + CLI fixes + benchmark comparison maderix 2026-03-03 04:36:03 -0800
4c14ed0e25 CLI fixes + --no-ane-extras flag + README benchmark table (feature/dynamic-training-pipeline) maderix 2026-03-03 04:33:30 -0800
cb474e1537 Add dynamic weight training pipeline — 110ms/step without recompilation maderix 2026-03-02 23:49:55 -0800
c33077430e Merge PR #19: Bridge API + ANE classifier/softmax/rmsnorm_bwd offload (16% faster) Manjeet Singh 2026-03-03 13:10:57 +0530
a14ce098fb Capitalize doc header Guitared 2026-03-03 14:18:35 +0700
b8f09a6853 fix non-interactive session error and sudo password input for powermetrics Guitared 2026-03-03 14:14:30 +0700
65cfc3255f optimize singleton token params in generate_text Guitared 2026-03-03 14:11:42 +0700
ebac5dd73f Python Bridge+Memory leak fix+More functions Vipul 2026-03-03 02:04:36 -0500
e113fae683 feat: implement ANE SDK for general-purpose neural engine development Andy Huang 2026-03-03 15:35:55 +1100
dcacf8a3ae Refactor hardcoded absolute paths to script-relative paths Andy Huang 2026-03-03 14:32:43 +1100
aedb036f08 Optimize ANE training with weights-as-tensors, add inference and benchmarking tools Andy Huang 2026-03-03 14:10:44 +1100
2b3b7ae5cc Fix token sampling underflow on short datasets tastyheadphones 2026-03-03 11:42:42 +0900
7b6a18a059 Add ANE int8/int4 quantization probe Claude 2026-03-03 01:02:05 +0000
f0b74cdc72 Merge pull request #15 from maderix/claude/add-readme-scope-notice-EL9sS Manjeet Singh 2026-03-03 06:26:35 +0530