Commit Graph

  • 82c275d140 Merge c3263bd618 into d91c9845c0 Xiang Li 2026-04-23 17:37:56 +0800
  • c3263bd618 Add M5 Max benchmark — first H17 ANE on record lixiang 2026-04-23 17:18:52 +0800
  • 2206c55bd8 Merge branch 'main' into docs/fix-readme-outdated-info Ömer 2026-03-15 20:38:37 +0300
  • d91c9845c0 Update README.md (main) Manjeet Singh 2026-03-10 15:51:05 +0530
  • 9ff4a57e0a Merge df885ed3df into 20cd236f61 Alvaro Videla 2026-03-10 15:12:42 +0530
  • cc5bd18889 Merge 1049590df8 into 20cd236f61 Erik 2026-03-10 15:12:42 +0530
  • 09028da929 Merge 17cda7d940 into 20cd236f61 call4pwn 2026-03-10 15:12:42 +0530
  • 668c236a08 Merge 7ea45c2fab into 20cd236f61 Alvaro Videla 2026-03-10 15:12:41 +0530
  • 88d01326eb Merge ec2b617064 into 20cd236f61 Erik 2026-03-10 15:12:41 +0530
  • 1524292eb5 Merge 6f398781d7 into 20cd236f61 Livia Z. 2026-03-09 21:43:20 -0700
  • 79aec4b028 Merge 9fbd4dff5b into 20cd236f61 log-wade 2026-03-09 23:22:49 -0400
  • 20cd236f61 Add INT8 W8A8 support: 1.88x ANE throughput via quantize/dequantize MIL ops maderix 2026-03-09 19:47:01 -0700
  • 9fbd4dff5b fix: guard short token datasets in train_large_ane and dynamic pipeline log-wade 2026-03-07 14:12:31 -0600
  • 35152a3490 Merge 7fceb99988 into 7d61ee4d25 Nabbil Khan 2026-03-08 00:20:08 +0800
  • 7d61ee4d25 Multi-model dashboard with GQA, W&B integration, and best-loss checkpointing maderix 2026-03-07 02:56:27 -0800
  • 475348ad14 Add Qwen3-0.6B GQA support and multi-model build system maderix 2026-03-06 06:23:15 -0800
  • c3c5094865 Fixed the dynamic pipeline logit generation maderix 2026-03-06 04:51:32 -0800
  • 06535fc5be Fix dashboard text generation: add KV cache for proper autoregressive attention maderix 2026-03-05 08:14:21 -0800
  • 19da850fca Use ACCELERATE_NEW_LAPACK to fix deprecated cblas warnings maderix 2026-03-05 08:07:47 -0800
  • 389ee0dc77 Add --data flag to pass training data path from dashboard to binary maderix 2026-03-05 08:03:54 -0800
  • 9595b1a499 Add tokenizer via git-lfs, fix dashboard tokenizer path maderix 2026-03-05 07:41:33 -0800
  • 926f977b40 Fix backward pass: global loss scaling, weight transpose, AdamW, activation clipping maderix 2026-03-05 07:23:08 -0800
  • 17cda7d940 fix(security): prevent OOB write and integer overflows during model load call4pwn 2026-03-05 07:36:40 +0000
  • 005fa4d79a Merge 99ba013d9b into efcf193075 Erik 2026-03-04 21:39:43 +0100
  • 99ba013d9b [test] ANE private API research: chaining, E5 runtime, custom MIL compilation experiments Erik Bray 2026-03-04 21:39:24 +0100
  • 6f398781d7 feat(training): add M5 ANE pipeline benchmark suite Livia 2026-03-04 14:13:21 -0500
  • b8d2069c48 fix: address PR review feedback (MIL 1.3 dual-track benchmark, ANE compiler dynamic weights constraints) Livia 2026-03-04 11:48:39 -0500
  • d5eb7d28e7 docs: update README file structure and fix typo sehawq 2026-03-04 17:27:38 +0300
  • efcf193075 Add model config to benchmark report, update README with current results maderix 2026-03-04 06:13:21 -0800
  • 1a7d8846b2 Add NE core counts, clarify FP16 vs rated TOPS methodology maderix 2026-03-04 06:11:29 -0800
  • 050bc4fdf0 Add cross-generation ANE benchmark report from issue #3 maderix 2026-03-04 05:30:00 -0800
  • 2bd5e7e93c Merge e030ffb213 into e986572e90 TastyHeadphones 2026-03-04 22:22:00 +0900
  • e030ffb213 Guard short token datasets in ANE and dynamic trainers tastyheadphones 2026-03-04 22:21:44 +0900
  • ec2b617064 [feat] Add cache-optimized embedding ops (~12x lookup speedup) Erik Bray 2026-03-04 14:11:59 +0100
  • 1049590df8 [chore] Add .gitignore for build artifacts, training binaries, and temp files Erik Bray 2026-03-04 14:01:21 +0100
  • e986572e90 Replace assert() with non-fatal bounds checks on token IDs maderix 2026-03-04 04:41:38 -0800
  • 05fc8f85e3 Merge pull request #31 from alvgeppetto-debug/fix/safety-correctness Manjeet Singh 2026-03-04 18:09:56 +0530
  • 032f866f2d Merge pull request #29 from nabbilkhan/contrib/fix-training-data-paths Manjeet Singh 2026-03-04 17:48:43 +0530
  • 44309b7625 Merge pull request #27 from jskromer/fix/macos26-inmemory-benchmarks Manjeet Singh 2026-03-04 17:48:39 +0530
  • 7fbb912a89 Merge pull request #20 from guitared/main Manjeet Singh 2026-03-04 17:48:30 +0530
  • 37939c8a60 Merge pull request #34 from 04cb/fix/docs-add-training-data-link Manjeet Singh 2026-03-04 17:48:25 +0530
  • 3efa27d7a3 Merge pull request #17 from TastyHeadphones/tastyheadphones/short-dataset-underflow-fix Manjeet Singh 2026-03-04 17:48:22 +0530
  • 367d21afe2 Merge 9e6b7c6259 into 4a6f3e40a9 William Varney 2026-03-04 09:11:56 +0100
  • cde79b12ab Merge 60b0512be3 into 4a6f3e40a9 Nabbil Khan 2026-03-04 09:11:56 +0100
  • c9da9e62a2 Merge ad119aed46 into 4a6f3e40a9 manni07 2026-03-04 09:11:56 +0100
  • 895b759756 Merge 3575766982 into 4a6f3e40a9 manni07 2026-03-04 09:11:56 +0100
  • e626968d30 Merge 2d2adacf09 into 4a6f3e40a9 Darko 2026-03-04 15:30:17 +0800
  • 4a6f3e40a9 Revise README for clarity and project details Manjeet Singh 2026-03-04 12:59:09 +0530
  • 0d9e139567 Fix docs: add training data download instructions 04cb 2026-03-04 08:16:20 +0800
  • be96079bbf [feat][gpu] Q4 quantization, Metal GPU shaders, ANE kernel fusion, memory safety Erik Bray 2026-03-04 00:48:17 +0100
  • df885ed3df perf: reduce compile & IO overhead Alvaro GPT 2026-03-02 23:16:52 +0100
  • 7ea45c2fab perf: vectorize CPU bottlenecks with vDSP and cblas Alvaro GPT 2026-03-02 23:13:28 +0100
  • 541bf4ec90 fix: correctness & safety improvements Alvaro GPT 2026-03-02 23:10:00 +0100
  • 60b0512be3 Harden token file layout checks and prevent exec-time fd leaks nabbilkhan 2026-03-03 19:42:33 +0000
  • 991bf4d618 Harden token dataset validation across all training pipelines nabbilkhan 2026-03-03 19:36:51 +0000
  • c04168ee17 Add --data path support for static training pipelines nabbilkhan 2026-03-03 19:19:49 +0000
  • 0e70f5bd71 [feat] Optimize inference: vectorize ops (NEON/vDSP), gate debug output, skip unused ANE compilation, add round-trip benchmark timing, pure C HTTP API with tokenizer Erik Bray 2026-03-03 19:41:54 +0100
  • 7fceb99988 Add reproducible M3 Ultra benchmark submission package nabbilkhan 2026-03-03 18:39:34 +0000
  • d3d00307c0 Fix benchmarks for macOS 26: replace compileModelAtURL with in-memory MIL pipeline John Stephen Kromer 2026-03-03 10:20:05 -0800
  • 2d2adacf09 wire up fp16 I/O retry in train.m forward path imperatormk 2026-03-03 18:26:12 +0100
  • 6f16dbefca [feat] Inference server mode: keep ANE kernels loaded between prompts (stdin loop + Unix socket server). Subsequent queries respond in ~0.5s instead of ~6s. run.py auto-connects to socket server when available. Erik Bray 2026-03-03 17:34:54 +0100
  • b4d81b71d4 [feat] Merge upstream PRs #21, #23, #26: NEON-optimized training (train_opt), double-buffered async ANE training (train_double_buffer), Qwen2.5-0.5B LLM inference (inference/). Added get_path() env var support and SEC_FLAGS to all new targets. Skipped PR #22 (binary blob risk). Erik Bray 2026-03-03 17:18:02 +0100
  • 0cf13e2b84 define g_fp16_io in train.m (fixes linker error) imperatormk 2026-03-03 17:16:22 +0100
  • b476456736 Add LLM inference on ANE — first full transformer on Neural Engine without CoreML zemog 2026-03-03 10:18:15 -0500
  • 21e8a58627 Qwen2.5-0.5B ANE inference — token-for-token match, 82 t/s zemog 2026-03-03 09:30:04 -0500
  • 99b06838bc [feat] Merge upstream: dynamic weight training, CLI fixes, dashboard v2 Erik Bray 2026-03-03 14:38:52 +0100
  • 0a1d841a10 Fix model path: accept argv[1] like train_large does tom 2026-03-03 09:33:58 -0400
  • 216776bcb7 [docs] Community fork README, CONTRIBUTING guide, issue templates, gitignore: rewritten README with quickstart, env vars, benchmark instructions, dashboard link Erik Bray 2026-03-03 14:29:16 +0100
  • 9832240e72 [feat] Community benchmark system: standardized JSON output, auto-submit to dashboard, aggregation script, M4 Max reference result Erik Bray 2026-03-03 14:29:11 +0100
  • 517f1e45bb [feat] Benchmark runner and mlpackage generator: run_benchmarks.sh for full test suite, gen_mlpackages.py for CoreML model generation Erik Bray 2026-03-03 14:29:04 +0100
  • 443194bca4 Dashboard v2: live stats, JSON parsing, all three pipelines maderix 2026-03-03 05:24:35 -0800
  • 37cac988b8 [docs] Developer documentation: architecture diagrams, complete API reference, benchmark guide, M4 Max results, security audit report Erik Bray 2026-03-03 14:22:22 +0100
  • 680f8c7e20 [feat] ANE ChainingRequest API prototype: baseline measurement for multi-kernel pipelining without recompile overhead Erik Bray 2026-03-03 14:22:18 +0100
  • 7524260ead [fix] Security hardening (upstream PRs #5, #7): stack-protector-strong, format-security flags, NULL guards on ane_compile/fread/fopen, tokenize.py input validation Erik Bray 2026-03-03 14:22:03 +0100
  • 4ae51e038b [fix] Dashboard sudo hang fix (upstream PR #20): prevent blocking when password is required for powermetrics Erik Bray 2026-03-03 14:21:57 +0100
  • 380237af1f [fix] Token sampling underflow fix (upstream PR #17): prevent size_t wraparound on short datasets in both train_large variants Erik Bray 2026-03-03 14:21:53 +0100
  • c41acd2290 [fix] M1/M2/M3 MIL syntax compatibility (upstream PR #6): use program(1.0), ios16 target, tensor types across 18 files Erik Bray 2026-03-03 14:21:48 +0100
  • 9e6b7c6259 fix: raise compile budget for double-buffer, add synthetic data mgkcloud 2026-03-03 12:13:01 +1100
  • 3469d1d0de feat: synthetic data fallback for benchmark mode mgkcloud 2026-03-03 12:07:23 +1100
  • 8fed989146 fix: block capture issues for GCD async compile mgkcloud 2026-03-03 12:06:27 +1100
  • 0edafd48ca feat: double-buffered async ANE training mgkcloud 2026-03-03 10:48:07 +1100
  • 09e9c996bb Add optimized training variant: 14% speedup (107→92 ms/step) tom 2026-03-03 08:33:26 -0400
  • be88b84fb3 Merge 98ddd2d190 into 3c1aae65d7 fspecii 2026-03-03 15:01:19 +0200
  • 98ddd2d190 bridge: add compile_dyn + write_weight — function parameter IOSurfaces fspecii 2026-03-03 15:00:51 +0200
  • 3c1aae65d7 Merge dynamic training pipeline + CLI fixes + benchmark comparison maderix 2026-03-03 04:36:03 -0800
  • 4c14ed0e25 CLI fixes + --no-ane-extras flag + README benchmark table (feature/dynamic-training-pipeline) maderix 2026-03-03 04:33:30 -0800
  • cb474e1537 Add dynamic weight training pipeline — 110ms/step without recompilation maderix 2026-03-02 23:49:55 -0800
  • c33077430e Merge PR #19: Bridge API + ANE classifier/softmax/rmsnorm_bwd offload (16% faster) Manjeet Singh 2026-03-03 13:10:57 +0530
  • a14ce098fb Capitalize doc header Guitared 2026-03-03 14:18:35 +0700
  • b8f09a6853 fix non-interactive session error and sudo password input for powermetrics Guitared 2026-03-03 14:14:30 +0700
  • 65cfc3255f optimize singleton token params in generate_text Guitared 2026-03-03 14:11:42 +0700
  • ebac5dd73f Python Bridge+Memory leak fix+More functions Vipul 2026-03-03 02:04:36 -0500
  • e113fae683 feat: implement ANE SDK for general-purpose neural engine development Andy Huang 2026-03-03 15:35:55 +1100
  • dcacf8a3ae Refactor hardcoded absolute paths to script-relative paths Andy Huang 2026-03-03 14:32:43 +1100
  • aedb036f08 Optimize ANE training with weights-as-tensors, add inference and benchmarking tools Andy Huang 2026-03-03 14:10:44 +1100
  • 2b3b7ae5cc Fix token sampling underflow on short datasets tastyheadphones 2026-03-03 11:42:42 +0900
  • 7b6a18a059 Add ANE int8/int4 quantization probe Claude 2026-03-03 01:02:05 +0000
  • f0b74cdc72 Merge pull request #15 from maderix/claude/add-readme-scope-notice-EL9sS Manjeet Singh 2026-03-03 06:26:35 +0530
  • 1b792fce34 Merge pull request #15 from maderix/claude/add-readme-scope-notice-EL9sS Manjeet Singh 2026-03-03 06:26:35 +0530
  • 9ba289cbca Add Project Scope & Intent notice to README Claude 2026-03-03 00:54:46 +0000