mirror of https://github.com/maderix/ANE.git
Adds a second dynamic weight approach to the bridge alongside the existing
BLOBFILE compile path. Instead of packing weights into the spatial dimension
of a single large input tensor and slicing them inside MIL (the training_dynamic/
approach), weights are declared as native MIL function parameters backed by
persistent IOSurfaces:
    // training_dynamic/ approach: spatial packing
    func main<ios18>(tensor<fp32, [1, DIM, 1, SEQ + 4*DIM]> x) {
        Wq = slice_by_size(x=x, begin=..., size=...);  // overhead
        ...
    }

    // this PR: native function parameters
    func main<ios18>(tensor<fp16,[1,K,1,M]> x, tensor<fp16,[1,N,K]> W) { ... }
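To make the spatial-packing cost concrete, here is a minimal C sketch of the offset arithmetic implied by the `[1, DIM, 1, SEQ + 4*DIM]` shape. It assumes the four weight blocks sit one after another behind the SEQ activation columns; that ordering, and the names `packed_width` and `packed_weight_begin`, are hypothetical and not taken from training_dynamic/.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: SEQ activation columns come first, followed by four
   DIM-wide weight blocks along the last (spatial) axis. The count of
   four comes from the 4*DIM term in the MIL signature; the order of
   the blocks is hypothetical. */
enum { PACKED_WEIGHTS = 4 };

/* Size of the last axis of the packed input tensor: SEQ + 4*DIM. */
static size_t packed_width(size_t seq, size_t dim) {
    return seq + (size_t)PACKED_WEIGHTS * dim;
}

/* First column of weight block `index` (0-based) under the assumed
   layout. A slice of size `dim` starting here would recover the block. */
static size_t packed_weight_begin(size_t seq, size_t dim, size_t index) {
    assert(index < PACKED_WEIGHTS);
    return seq + index * dim;
}
```

Every weight matrix needs one such slice inside MIL on each evaluation; with native function parameters the slice disappears because each weight arrives as its own tensor.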
New API:
    ane_bridge_compile_dyn():         compile with n_weights IOSurface parameters
    ane_bridge_write_weight():        write fp16 to weight IOSurface (~0.001 ms)
    ane_bridge_write_weight_f32():    write fp32 with NEON conversion
    ane_bridge_copy_io():             direct output→input copy, no CPU round-trip
    ane_bridge_begin/end_realtime():  90.6% p99 jitter reduction
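For readers unfamiliar with the fp32 to fp16 step that an fp32 write path has to perform, the following is a portable scalar sketch of the conversion (round to nearest, ties to even). It is not the bridge's NEON code; `f32_to_f16` and `f32_to_f16_buffer` are illustrative names, and the destination is a plain `uint16_t` array rather than an IOSurface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert one IEEE 754 binary32 value to binary16 bits,
   rounding to nearest with ties to even. */
static uint16_t f32_to_f16(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);               /* type-pun without UB */
    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t exp  = (x >> 23) & 0xFFu;
    uint32_t mant = x & 0x7FFFFFu;

    if (exp == 0xFFu)                       /* Inf or NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x200u : 0u));

    int32_t e = (int32_t)exp - 127 + 15;    /* rebias the exponent */
    if (e >= 31)                            /* too large: infinity */
        return (uint16_t)(sign | 0x7C00u);

    if (e <= 0) {                           /* subnormal half or zero */
        if (e < -10) return (uint16_t)sign; /* underflows to zero */
        mant |= 0x800000u;                  /* restore implicit 1 */
        uint32_t shift = (uint32_t)(14 - e);
        uint32_t half = mant >> shift;
        uint32_t rem = mant & ((1u << shift) - 1u);
        uint32_t mid = 1u << (shift - 1u);
        if (rem > mid || (rem == mid && (half & 1u))) half++;
        return (uint16_t)(sign | half);
    }

    uint32_t half = ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem = mant & 0x1FFFu;
    /* A carry out of the mantissa correctly bumps the exponent. */
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u))) half++;
    return (uint16_t)(sign | half);
}

/* Convert `n` floats into a caller-provided fp16 buffer. */
static void f32_to_f16_buffer(const float *src, uint16_t *dst, size_t n) {
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) dst[i] = f32_to_f16(src[i]);
}
```

A vectorised version processes several lanes per instruction, which is presumably why the bridge uses NEON here; the per-element result should match this scalar reference.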
Compile cache fix: for parameter-based models the ANE writes only net.plist
(no data file). try_cache_restore now checks for net.plist only; data is saved
and restored only for BLOBFILE models, which do produce it.
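The restore rule described above can be sketched as a small decision function. This is an illustration under assumptions, not the body of try_cache_restore: the enum, the helper names, and the literal file name "data" are hypothetical; only net.plist is named in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

typedef enum {
    CACHE_MISS,                   /* no net.plist: recompile */
    CACHE_RESTORE_PLIST,          /* parameter-based model */
    CACHE_RESTORE_PLIST_AND_DATA  /* BLOBFILE model */
} cache_plan;

/* net.plist alone decides hit or miss; the data file is optional. */
static cache_plan cache_restore_plan(bool has_plist, bool has_data) {
    if (!has_plist) return CACHE_MISS;
    return has_data ? CACHE_RESTORE_PLIST_AND_DATA : CACHE_RESTORE_PLIST;
}

/* True if dir/name exists. Paths that do not fit the buffer are
   treated as missing. */
static bool file_exists(const char *dir, const char *name) {
    char path[1024];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path) return false;
    return access(path, F_OK) == 0;
}

/* Probe a cache directory. "data" is an assumed file name. */
static cache_plan cache_probe(const char *cache_dir) {
    assert(cache_dir != NULL);
    return cache_restore_plan(file_exists(cache_dir, "net.plist"),
                              file_exists(cache_dir, "data"));
}
```

Keeping the decision separate from the filesystem probe makes the rule easy to test: a missing data file no longer turns a valid parameter-based cache entry into a miss.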
Also removes the pre-built libane_bridge.dylib binary from version control.
Performance vs spatial packing (Stories110M, 12 layers, M-series):
    training_dynamic/ (slice approach):  110 ms/step
    function parameter approach:         76.9 ms/step (-30%)
The slice/reshape/transpose overhead per weight matrix explains the gap.
Both compile once at startup; weight updates are IOSurface writes in both cases.
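As a quick check of the figures above, the arithmetic can be written out in C. The per-layer number assumes the saving is spread evenly over the 12 layers, which this message does not state; the function names are illustrative.

```c
#include <assert.h>

/* Relative change of new_ms against baseline_ms, in percent.
   Negative means the new path is faster. */
static double relative_change_pct(double baseline_ms, double new_ms) {
    assert(baseline_ms > 0.0);
    return (new_ms - baseline_ms) / baseline_ms * 100.0;
}

/* Time saved per layer, assuming an even split across layers. */
static double saved_per_layer_ms(double baseline_ms, double new_ms,
                                 int layers) {
    assert(layers > 0);
    return (baseline_ms - new_ms) / layers;
}
```

With 110 ms and 76.9 ms per step this gives roughly -30.1%, consistent with the rounded -30% above, and about 2.76 ms saved per layer under the even-split assumption.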
Tested: test_bridge.m — 15/15 assertions across all new API functions.
| Name |
|---|
| Makefile |
| ane_bridge.h |
| ane_bridge.m |
| test_bridge.m |