berkus/ANE - ANE

Commit Graph

Author	SHA1	Message	Date
manni07	ad119aed46	fix: address CRIT security findings (CRIT-01 to CRIT-04) - CRIT-01: dlopen() return check + NSClassFromString validation in ane_init() (ane_runtime.h + stories_config.h); g_ane_ok / g_ane_ok_large flag only set when all private classes load successfully; stories_config.h gets re-entry guard (g_ane_init_done) that was previously missing - CRIT-02: g_ane_ok guard in ane_compile() and compile_kern_mil_w(); NULL check for inMemoryModel after inMemoryModelWithDescriptor: — prevents crash when API call returns nil (ane_runtime.h, stories_io.h) - CRIT-03: Validate fread() return for critical config/header reads to prevent garbage malloc() sizes; fopen() NULL check in save_checkpoint(); design decision documented (model.h, train_large.m) - CRIT-04: int -> size_t in build_blob*/build_blob_t/build_blob_fp16; calloc() NULL checks added; (size_t) cast in malloc() size calculations to prevent signed integer overflow UB (stories_io.h, model.h) Simulation: 3 iterations, overall score 96.15% (all criteria >= 95%) ref: docs/reports/security-audit-2026-03-02.md	2026-03-02 22:14:51 +01:00
Manjeet Singh	893f58e725	Merge pull request #2 from m0at/m5-maximized ANE probe tests + training telemetry for M5 optimization	2026-03-02 14:57:12 +05:30
m0at	184b182bfc	Add M5 probe results: weight reload fails, all QoS work, chaining API found Key findings from running all 4 probes on Apple M5: - Weight reload (unload+load after file overwrite) does NOT work — weights are baked at compile time, output is identical regardless of file changes - weightsBuffer IOSurface parameter also does not override compiled weights - All QoS values 0-63 work, no measurable latency difference (~0.07ms/eval) - _ANEPerformanceStats has hwExecutionTime (ns) + perfCounterData - _ANEChainingRequest supports loopback execution (output→input chaining) - _ANEClient has real-time eval path and chaining preparation methods - procedureIndex 0-15 all succeed on single-procedure models Fixed probe tests to use fp32 I/O with cast (matching inmem_peak pattern) and 64+ channel kernels (ANE minimum size requirement). Full analysis in training/m5result.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 23:16:38 -08:00
m0at	40d3f45631	Add ANE probe tests and training telemetry for M5 optimization Four standalone probe tests to characterize the M5 ANE: - test_weight_reload: Can weights be hot-swapped via unload+load without recompilation? - test_perf_stats: Enumerate _ANEPerformanceStats methods/properties and hardware counters - test_qos_sweep: Measure compile/load/eval latency across QoS 0-63 - test_ane_advanced: Probe SharedEvents, weightsBuffer IOSurface, procedureIndex, VirtualClient Training telemetry (train_large.m): - JSON lines to stderr with per-step timing breakdown and per-batch TFLOPS metrics - Enables external monitoring tools to visualize ANE utilization in real-time Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-01 22:54:58 -08:00
maderix	4d67db1bdb	stories110M: 12-layer ANE training with dashboard, 107ms/step - Scale to full stories110M (109M params, 12 layers) with real TinyStories data - vDSP-vectorized cross-entropy (110ms→14ms), NEON fp16 IO, async dW - TUI dashboard: loss curve, ANE/CPU power, CPU/memory graphs, text generation - Split into modular headers: config, io, mil, cpu_ops	2026-03-01 03:14:39 -08:00
maderix	f213c8db68	Initial release	2026-02-28 00:22:06 -08:00
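
The CRIT-03 and CRIT-04 entries in commit ad119aed46 describe validating fread() return values before sizes reach malloc(), and doing size arithmetic in size_t. The following is a minimal sketch of that pattern, not the repository's code: demo_header, checked_mul, alloc_floats and read_header are hypothetical names introduced only for illustration, and the header layout is an assumption.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical header layout, for illustration only (not the repo's format). */
typedef struct {
    int32_t dim;
    int32_t n_layers;
} demo_header;

/* Multiply two sizes in size_t. Returns 1 on success, 0 if a * b would overflow. */
static int checked_mul(size_t a, size_t b, size_t *out) {
    if (a != 0 && b > SIZE_MAX / a) return 0;
    *out = a * b;
    return 1;
}

/* Allocate room for `count` floats. Returns NULL if the byte count
   overflows size_t or if malloc() itself fails. */
static float *alloc_floats(size_t count) {
    size_t bytes;
    if (!checked_mul(count, sizeof(float), &bytes)) return NULL;
    return malloc(bytes);
}

/* Read one header record. Returns 1 on success, 0 on a NULL stream,
   a short read, or non-positive fields, so that garbage values are
   never used as allocation sizes. */
static int read_header(FILE *f, demo_header *h) {
    if (f == NULL || fread(h, sizeof *h, 1, f) != 1) return 0;
    if (h->dim <= 0 || h->n_layers <= 0) return 0;
    return 1;
}
```

A caller would check read_header() first, convert the validated fields to size_t, and only then pass them to alloc_floats(), treating a NULL result as a fatal load error.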
