berkus/ANE - ANE

Commit Graph


Author

SHA1

Message

Date

imperatormk

709b60208f

Fix MIL syntax for cross-generation ANE compatibility

The MIL scalar types used shorthand syntax (string("x"), int32(1)) that
only works on M4. Changed to the canonical verbose format that CoreML's
own compiler emits (tensor<string, []>("x"), tensor<int32, []>(1)).
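
To illustrate the syntax change described above, here is a minimal sketch of a text rewrite from the shorthand scalar form to the verbose form. It is not the code from this commit: the helper name `to_verbose_scalars` and the regex approach are assumptions made for illustration, and only the two type names mentioned in the message are handled.

```python
import re

# Scalar type names taken from the commit message; other names would be guesses.
SCALAR_TYPES = ("string", "int32")

# Match e.g. `int32(` unless it is part of a longer identifier or already
# sits inside `tensor<...>`.
_SHORTHAND = re.compile(r"(?<![\w<])(" + "|".join(SCALAR_TYPES) + r")\(")


def to_verbose_scalars(mil_text: str) -> str:
    """Rewrite shorthand scalars such as int32(1) to tensor<int32, []>(1)."""
    return _SHORTHAND.sub(r"tensor<\1, []>(", mil_text)
```

Running the helper on text that already uses the verbose form leaves it unchanged, so it should be safe to apply more than once.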

Also targets program(1.0) with <ios16> instead of program(1.3)/<ios18>,
and simplifies buildInfo to just coremlc-version.

For conv-based kernels, adds runtime fp16 I/O fallback — M1/M2 ANE
doesn't support the cast op (fp32<->fp16), so on first compile failure
it retries with native fp16 inputs/outputs and does the conversion on
the CPU side. The fallback is persisted across exec() restarts.
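
The retry-and-persist behaviour described above could look roughly like the sketch below. This is an illustration under stated assumptions, not the repository's implementation: `compile_fn`, `CompileError`, and the environment variable `ANE_FP16_IO` are hypothetical names, and using the process environment is just one way to make the flag survive an exec() restart.

```python
import os
import struct

FALLBACK_ENV = "ANE_FP16_IO"  # hypothetical variable name


class CompileError(Exception):
    """Stand-in for a kernel compile failure (hypothetical)."""


def compile_with_fallback(compile_fn, env=os.environ):
    """Compile with fp32 I/O first; on failure retry with fp16 I/O.

    Returns (kernel, uses_fp16_io). The decision is stored in `env`, which
    is inherited across exec(), so a restarted process skips the failed attempt.
    """
    if env.get(FALLBACK_ENV) == "1":
        return compile_fn(fp16_io=True), True
    try:
        return compile_fn(fp16_io=False), False
    except CompileError:
        kernel = compile_fn(fp16_io=True)
        env[FALLBACK_ENV] = "1"
        return kernel, True


def to_fp16_bytes(values):
    """CPU-side fp32 -> fp16 conversion (little-endian half floats)."""
    return struct.pack(f"<{len(values)}e", *values)


def from_fp16_bytes(buf):
    """CPU-side fp16 -> Python float conversion."""
    return list(struct.unpack(f"<{len(buf) // 2}e", buf))
```

When the fallback is active, the caller would pack inputs with `to_fp16_bytes` before handing them to the kernel and unpack outputs with `from_fp16_bytes`, which takes over the work the cast op would otherwise do on the ANE.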

Note: matmul and scaled_dot_product_attention ops still fail on M1/M2 —
these are M4+ ANE ops. The attention tests (test_ane_causal_attn,
test_ane_sdpa5, test_full_fused attention part) require M4 hardware.
Conv-based kernels (training, QKV projections, FFN) work on all generations.

Tested on M1 Pro, macOS 26.3 (Tahoe).

2026-03-02 22:00:45 +01:00

maderix

f213c8db68

Initial release

2026-02-28 00:22:06 -08:00

2 Commits