docs: update README file structure and fix typo

Update the file structure section to reflect the current repository
layout, including benchmarks/, bridge/, training_dynamic/, and newly
added header files, scripts, and training variants. Fix missing space
in "Fork it, build on it" section.
sehawq 2026-03-04 17:27:38 +03:00
parent efcf193075
commit d5eb7d28e7
1 changed file with 42 additions and 17 deletions


@@ -49,7 +49,7 @@ That said:
 ### Fork it, build on it
-This is MIT licensed for a reason. Everyone now has access to AI-assisted development tools that can adapt and extend code in hours. If this project is useful to you — take it, modify it, build something better. If you do something cool with it, I'd love to hear about it.If in future, community decides to maintain one source of truth repo, I'm in full support of that.
+This is MIT licensed for a reason. Everyone now has access to AI-assisted development tools that can adapt and extend code in hours. If this project is useful to you — take it, modify it, build something better. If you do something cool with it, I'd love to hear about it. If in future, community decides to maintain one source of truth repo, I'm in full support of that.
 ---
@@ -92,23 +92,48 @@ Key optimizations:
 ## File Structure
 ```
 ├── api_exploration.m       # Initial ANE API discovery
 ├── inmem_basic.m           # In-memory MIL compilation proof-of-concept
 ├── inmem_bench.m           # ANE dispatch latency benchmarks
 ├── inmem_peak.m            # Peak TFLOPS measurement (2048x2048 matmul)
 ├── sram_bench.m            # ANE SRAM bandwidth probing
 ├── sram_probe.m            # SRAM size/layout exploration
+├── benchmarks/
+│   ├── ANE_BENCHMARK_REPORT.md   # Cross-chip benchmark report
+│   └── community_results.json    # Community-submitted benchmark data
+├── bridge/
+│   ├── Makefile
+│   ├── ane_bridge.h              # ANE bridge header
+│   └── ane_bridge.m              # ANE bridge implementation
 └── training/
-    ├── ane_runtime.h           # ANE private API wrapper (compile, eval, IOSurface)
-    ├── ane_mil_gen.h           # MIL program generation helpers
-    ├── model.h                 # Model weight initialization and blob builders
-    ├── forward.h               # Forward pass MIL generators
-    ├── backward.h              # Backward pass MIL generators
-    ├── train.m                 # Minimal training loop (early prototype)
-    ├── tiny_train.m            # 2-layer tiny model training
-    ├── train_large.m           # Main: single-layer dim=768 training (optimized)
-    ├── test_*.m                # Unit tests for individual kernels
-    └── Makefile
+    ├── Makefile
+    ├── README.md               # Training pipeline documentation
+    ├── ane_runtime.h           # ANE private API wrapper (compile, eval, IOSurface)
+    ├── ane_mil_gen.h           # MIL program generation helpers
+    ├── ane_classifier.h        # ANE-offloaded classifier
+    ├── ane_rmsnorm_bwd.h       # ANE RMSNorm backward pass
+    ├── model.h                 # Model weight initialization and blob builders
+    ├── forward.h               # Forward pass MIL generators
+    ├── backward.h              # Backward pass MIL generators
+    ├── stories_config.h        # Stories model configuration
+    ├── stories_cpu_ops.h       # CPU-side operations for Stories model
+    ├── stories_io.h            # Stories data I/O (TinyStories loading)
+    ├── stories_mil.h           # Stories MIL program generators
+    ├── train.m                 # Minimal training loop (early prototype)
+    ├── tiny_train.m            # 2-layer tiny model training
+    ├── train_large.m           # Main: Stories110M training (static pipeline)
+    ├── train_large_ane.m       # Stories110M training (ANE classifier)
+    ├── test_*.m                # Unit tests for individual kernels
+    ├── dashboard.py            # Live training dashboard (power, throughput)
+    ├── tokenize.py             # TinyStories pretokenization script
+    ├── download_data.sh        # Training data download script
+    └── training_dynamic/       # Dynamic pipeline (no recompilation)
+        ├── Makefile
+        ├── config.h
+        ├── cpu_ops.h
+        ├── io.h
+        ├── mil_dynamic.h
+        └── train.m
 ```
 ## Training Data