docs: update README file structure and fix typo

Update the file structure section to reflect the current repository layout, including benchmarks/, bridge/, training_dynamic/, and newly added header files, scripts, and training variants. Fix missing space in "Fork it, build on it" section.
2026-03-04 17:27:38 +03:00 · 2026-03-04 17:27:38 +03:00 · d5eb7d28e7
parent efcf193075
commit d5eb7d28e7
1 changed file with 42 additions and 17 deletions
--- a/README.md
+++ b/README.md
@@ -49,7 +49,7 @@ That said:
 ### Fork it, build on it
-This is MIT licensed for a reason. Everyone now has access to AI-assisted development tools that can adapt and extend code in hours. If this project is useful to you — take it, modify it, build something better. If you do something cool with it, I'd love to hear about it.If in future, community decides to maintain one source of truth repo, I'm in full support of that.
+This is MIT licensed for a reason. Everyone now has access to AI-assisted development tools that can adapt and extend code in hours. If this project is useful to you — take it, modify it, build something better. If you do something cool with it, I'd love to hear about it. If in future, community decides to maintain one source of truth repo, I'm in full support of that.
 ---
@@ -92,23 +92,48 @@ Key optimizations:
 ## File Structure
 ```
-├── api_exploration.m       # Initial ANE API discovery
+├── api_exploration.m           # Initial ANE API discovery
-├── inmem_basic.m           # In-memory MIL compilation proof-of-concept
+├── inmem_basic.m               # In-memory MIL compilation proof-of-concept
-├── inmem_bench.m           # ANE dispatch latency benchmarks
+├── inmem_bench.m               # ANE dispatch latency benchmarks
-├── inmem_peak.m            # Peak TFLOPS measurement (2048x2048 matmul)
+├── inmem_peak.m                # Peak TFLOPS measurement (2048x2048 matmul)
-├── sram_bench.m            # ANE SRAM bandwidth probing
+├── sram_bench.m                # ANE SRAM bandwidth probing
-├── sram_probe.m            # SRAM size/layout exploration
+├── sram_probe.m                # SRAM size/layout exploration
+├── benchmarks/
+│   ├── ANE_BENCHMARK_REPORT.md # Cross-chip benchmark report
+│   └── community_results.json  # Community-submitted benchmark data
+├── bridge/
+│   ├── Makefile
+│   ├── ane_bridge.h            # ANE bridge header
+│   └── ane_bridge.m            # ANE bridge implementation
 └── training/
-    ├── ane_runtime.h       # ANE private API wrapper (compile, eval, IOSurface)
+    ├── Makefile
-    ├── ane_mil_gen.h       # MIL program generation helpers
+    ├── README.md               # Training pipeline documentation
-    ├── model.h             # Model weight initialization and blob builders
+    ├── ane_runtime.h           # ANE private API wrapper (compile, eval, IOSurface)
-    ├── forward.h           # Forward pass MIL generators
+    ├── ane_mil_gen.h           # MIL program generation helpers
-    ├── backward.h          # Backward pass MIL generators
+    ├── ane_classifier.h        # ANE-offloaded classifier
-    ├── train.m             # Minimal training loop (early prototype)
+    ├── ane_rmsnorm_bwd.h       # ANE RMSNorm backward pass
-    ├── tiny_train.m        # 2-layer tiny model training
+    ├── model.h                 # Model weight initialization and blob builders
-    ├── train_large.m       # Main: single-layer dim=768 training (optimized)
+    ├── forward.h               # Forward pass MIL generators
-    ├── test_*.m            # Unit tests for individual kernels
+    ├── backward.h              # Backward pass MIL generators
-    └── Makefile
+    ├── stories_config.h        # Stories model configuration
+    ├── stories_cpu_ops.h       # CPU-side operations for Stories model
+    ├── stories_io.h            # Stories data I/O (TinyStories loading)
+    ├── stories_mil.h           # Stories MIL program generators
+    ├── train.m                 # Minimal training loop (early prototype)
+    ├── tiny_train.m            # 2-layer tiny model training
+    ├── train_large.m           # Main: Stories110M training (static pipeline)
+    ├── train_large_ane.m       # Stories110M training (ANE classifier)
+    ├── test_*.m                # Unit tests for individual kernels
+    ├── dashboard.py            # Live training dashboard (power, throughput)
+    ├── tokenize.py             # TinyStories pretokenization script
+    ├── download_data.sh        # Training data download script
+    └── training_dynamic/       # Dynamic pipeline (no recompilation)
+        ├── Makefile
+        ├── config.h
+        ├── cpu_ops.h
+        ├── io.h
+        ├── mil_dynamic.h
+        └── train.m
 ```
 ## Training Data