mirror of https://github.com/maderix/ANE.git
41 lines
2.6 KiB
Markdown
41 lines
2.6 KiB
Markdown
# ANE SDK Roadmap: General-Purpose Neural Engine Development Kit
|
|
|
|
This roadmap outlines the evolution of the current Apple Neural Engine (ANE) training infrastructure into a modular, high-level SDK for developing and training arbitrary neural network architectures on Apple Silicon.
|
|
|
|
## 🌟 Strategic Vision: "PyTorch for ANE"
|
|
Transform low-level, transformer-specific MIL (Model Intermediate Language) generation into a modular, layer-based system that allows developers to define, train, and benchmark any architecture (CNNs, MLPs, RNNs) with minimal boilerplate.
|
|
|
|
---
|
|
|
|
## 🛠 Phase 1: Modular Layer Abstractions (Short Term)
|
|
**Goal:** Decouple MIL generation from the Transformer-specific logic.
|
|
- [x] **ANE-MIL Layer Library**: Created a repository of optimized MIL builders for core primitives:
|
|
- `Linear(in, out)`, `Conv2D(kernel, stride, padding)`
|
|
- `ReLU`, `GELU`, `Sigmoid`, `Softmax` activations
|
|
- `LayerNorm` and `RMSNorm`
|
|
- [x] **Unified Tensor API**: High-level wrapper around `IOSurface` and `NEON` via `anesdk.h`.
|
|
- [x] **Weights-as-Tensors by Default**: Every layer automatically utilizes the dynamic weight update optimization (zero-recompile).
|
|
|
|
## 🚀 Phase 2: Automated Graph Engine (Medium Term)
|
|
**Goal:** Automate the orchestration of multiple kernels into a cohesive model.
|
|
- [x] **ANEGraph Orchestrator**: Implemented **Sequential** model container that automates execution order.
|
|
- [ ] **Automatic Backward Pass**: Orchestration of backward kernels in reverse order.
|
|
- [ ] **Automatic Gradient Management**: Logic to handle gradient accumulation and weight updates across multi-layer graphs.
|
|
- [ ] **Optimizer Library**: Implement standard optimizers (SGD, Adam, AdamW) as native C++ components using the Accelerate framework.
|
|
|
|
## 📈 Phase 3: Developer Ecosystem & Tooling (Long Term)
|
|
**Goal:** Improve developer velocity and integration.
|
|
- [ ] **Python Bridge (PyANE)**: A lightweight Python library for defining models that compiles directly to ANE-executable graph binaries.
|
|
- [ ] **Model Profiler**: Native tools to measure TFLOPS, memory bandwidth, and ANE utilization per-layer.
|
|
- [ ] **Deployment Export**: One-click export to CoreML `.mlpackage` for final production deployment.
|
|
|
|
---
|
|
|
|
## 🏁 Success Metrics
|
|
- **Agnosticism**: Ability to run a CIFAR-10 CNN and a Stories110M Transformer using the same core runtime.
|
|
- **Performance**: Maintain >90 TFLOPS sustained throughput across various architectures.
|
|
- **Simplicity**: Reduce the lines of code required to define a new model by >70%.
|
|
|
|
> [!NOTE]
|
|
> This SDK leverages private ANE infrastructure to bypass the limitations of public CoreML training, specifically focusing on high-throughput, on-device weight updates.
|