Claude
|
7b6a18a059
|
Add ANE int8/int4 quantization probe
Probe whether the Apple Neural Engine (ANE) executes quantized ops natively
(faster int8-int8 compute path) or just dequantizes to fp16 at load time.
Tests 5 approaches at transformer-representative dimensions:
1. FP16 baseline conv (baked weights)
2. INT8 via constexpr_affine_dequantize (per-channel scale+zp)
3. UINT4 via constexpr_affine_dequantize (per-channel)
4. UINT4 via constexpr_blockwise_shift_scale (block_size=32)
5. 4-bit palettized via constexpr_lut_to_dense (16-entry LUT)
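
For readers unfamiliar with the weight encodings listed above, here is a minimal Python sketch of what approaches 2, 3 and 5 decode to. It assumes the usual affine rule w = scale * (q - zero_point), applied per output channel, and a plain index lookup for palettization. The helper names and sample values are illustrative only and are not taken from quant_probe.m; real MIL ops also operate on packed tensors, which this sketch ignores.

```python
def affine_dequantize(q_rows, scales, zero_points):
    """Per-channel affine dequantization: w = scale * (q - zero_point).

    q_rows holds one list of quantized integers per output channel;
    scales and zero_points hold one value per output channel.
    """
    return [
        [s * (q - zp) for q in row]
        for row, s, zp in zip(q_rows, scales, zero_points)
    ]


def lut_to_dense(indices, lut):
    """Palettization: each stored index selects one entry of the lookup table."""
    return [lut[i] for i in indices]


# Hypothetical int8 weights for two output channels.
q = [[-128, 0, 127], [10, 20, 30]]
w = affine_dequantize(q, scales=[0.5, 0.25], zero_points=[0, 20])

# Hypothetical 16-entry LUT, as used by 4-bit palettization.
lut = [i / 8 - 1 for i in range(16)]
dense = lut_to_dense([0, 15, 8], lut)
```

In both cases the stored weights are small integers and the fp16 values are reconstructed from them, which is why the probe has to measure whether that reconstruction happens once at load time or is skipped in favor of integer compute.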
Each test compiles MIL → ANE kernel, benchmarks 100 evals, and reports
TFLOPS. If int8 reaches ~2x the fp16 TFLOPS, ANE has native int8 compute.
If TFLOPS match fp16, it's dequant-only (still useful for memory savings).
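
The decision rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the probe's actual code: it counts a 1x1 conv as 2 * C_in * C_out * spatial FLOPs per eval, and the 1.5x cutoff, the dimensions and the timing are all hypothetical values chosen for the example.

```python
def conv1x1_flops(c_in, c_out, spatial):
    """FLOPs for one 1x1 conv eval: one multiply and one add per weight per position."""
    return 2 * c_in * c_out * spatial


def tflops(flops_per_eval, evals, seconds):
    """Throughput in TFLOPS for `evals` evaluations that took `seconds` in total."""
    return flops_per_eval * evals / seconds / 1e12


def classify(quant_tflops, fp16_tflops, native_ratio=1.5):
    """Apply the commit's rule of thumb; native_ratio is an assumed cutoff below ~2x."""
    ratio = quant_tflops / fp16_tflops
    return "native int8 compute" if ratio >= native_ratio else "dequant-only"


# Hypothetical transformer-like dimensions and a made-up timing for 100 evals.
flops = conv1x1_flops(c_in=768, c_out=3072, spatial=256)
fp16_rate = tflops(flops, evals=100, seconds=0.05)
```

A cutoff somewhat below 2x leaves room for measurement noise and fixed per-eval overhead, while a ratio near 1.0 points to the dequant-only case.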
Build: xcrun clang -O2 -fobjc-arc -o quant_probe quant_probe.m \
-framework Foundation -framework IOSurface -ldl
https://claude.ai/code/session_01U5HLjsm4iUzL9iDaHbxeRB
|
2026-03-03 01:02:05 +00:00 |
gigantic
|
8c9161add6
|
Update README.md
|
2026-03-02 14:36:28 -08:00 |
m0at
|
5271c00281
|
Update README to reflect GPT-2 inference, M5 findings, and upstream attribution
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-02 14:26:48 -08:00 |
gigantic
|
70bfa4e54e
|
Revise README for clarity and additional details
Updated README to clarify implementation details and added NEON CPU decode.
|
2026-03-02 13:39:26 -08:00 |
gigantic
|
40a5384074
|
Revise README for project fork and updates
Forked the project and updated the README to reflect changes.
|
2026-03-02 13:31:31 -08:00 |
gigantic
|
6b8d69b93d
|
Update README with project details and structure
|
2026-03-02 13:03:27 -08:00 |
gigantic
|
a102d27fa2
|
Update README.md
|
2026-03-02 13:02:53 -08:00 |
maderix
|
4d67db1bdb
|
stories110M: 12-layer ANE training with dashboard, 107ms/step
- Scale to full stories110M (109M params, 12 layers) with real TinyStories data
- vDSP-vectorized cross-entropy (110ms→14ms), NEON fp16 IO, async dW
- TUI dashboard: loss curve, ANE/CPU power, CPU/memory graphs, text generation
- Split into modular headers: config, io, mil, cpu_ops
|
2026-03-01 03:14:39 -08:00 |
maderix
|
f213c8db68
|
Initial release
|
2026-02-28 00:22:06 -08:00 |