Erik Bray | be96079bbf | [feat][gpu] Q4 quantization, Metal GPU shaders, ANE kernel fusion, memory safety | 2026-03-04 00:51:59 +01:00
Erik Bray | 0e70f5bd71 | [feat] Optimize inference: vectorize ops (NEON/vDSP), gate debug output, skip unused ANE compilation, add round-trip benchmark timing, pure C HTTP API with tokenizer | 2026-03-03 19:41:54 +01:00
Erik Bray | 6f16dbefca | [feat] Inference server mode: keep ANE kernels loaded between prompts (stdin loop + Unix socket server). Subsequent queries respond in ~0.5s instead of ~6s. run.py auto-connects to socket server when available. | 2026-03-03 17:34:54 +01:00
Erik Bray | b4d81b71d4 | [feat] Merge upstream PRs #21, #23, #26: NEON-optimized training (train_opt), double-buffered async ANE training (train_double_buffer), Qwen2.5-0.5B LLM inference (inference/). Added get_path() env var support and SEC_FLAGS to all new targets. Skipped PR #22 (binary blob risk). | 2026-03-03 17:18:02 +01:00