# TinyTorch Complete Module Roadmap

## 20-Module ML Systems Course with Competition System

### **PHASE 1: FOUNDATION (Modules 1-6)**

Build the core mathematical infrastructure for neural networks.

- **Module 01**: `setup` - Development environment configuration
- **Module 02**: `tensor` - Core data structures with autodiff support *(backward design: built-in grad support)*
- **Module 03**: `activations` - ReLU, Sigmoid, nonlinearity functions
- **Module 04**: `layers` - Dense layers, network building blocks
- **Module 05**: `losses` - MSE, CrossEntropy, BCE loss functions
- **Module 06**: `autograd` - Automatic differentiation engine

**Capability Unlocked**: Networks can learn through backpropagation

**Historical Example**: XOR Problem (1969) - Solve what stumped AI for a decade
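
To make this concrete, here is a minimal sketch of reverse-mode automatic differentiation on scalars (illustrative code, not TinyTorch's actual `Tensor`/`autograd` API), showing how a node can record local gradients and backpropagate through a small expression:

```python
# Illustrative minimal reverse-mode autodiff sketch (not TinyTorch's real API).
class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # upstream nodes in the graph
        self._local_grads = local_grads  # d(out)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self, upstream=1.0):
        # Chain rule: accumulate upstream gradient times each local gradient.
        self.grad += upstream
        for parent, local in zip(self._parents, self._local_grads):
            parent.backward(upstream * local)

x = Value(2.0)
w = Value(3.0)
loss = x * w + x      # d(loss)/dx = w + 1 = 4, d(loss)/dw = x = 2
loss.backward()
print(x.grad, w.grad) # 4.0 2.0
```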

---

### **PHASE 2: TRAINING SYSTEMS (Modules 7-10)**

Build complete training pipelines for real datasets.

- **Module 07**: `dataloader` - Data pipelines, batching, real datasets *(moved from 09)*
- **Module 08**: `optimizers` - SGD, Adam optimization algorithms
- **Module 09**: `spatial` - Conv2D, pooling for image processing *(moved from 07)*
- **Module 10**: `training` - Complete training loops with validation

**Capability Unlocked**: Train deep networks on real datasets

**Historical Examples**:
- After Module 9: LeNet (1998) - First CNN for digit recognition
- After Module 10: AlexNet (2012) - Deep learning revolution
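
As a preview of what Phase 2 assembles, here is a schematic mini-batch SGD loop for linear regression (illustrative NumPy code; the shuffling and update steps stand in for the `dataloader` and `optimizers` modules, and none of the names are TinyTorch's actual API):

```python
import numpy as np

# Schematic mini-batch SGD loop (linear regression with a known ground truth).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=256)

w = np.zeros(3)
lr, batch_size = 0.1, 32

for epoch in range(20):
    idx = rng.permutation(len(X))          # shuffle each epoch (dataloader's job)
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]  # one mini-batch
        err = X[b] @ w - y[b]
        grad = X[b].T @ err / len(b)       # gradient of (1/2)*MSE w.r.t. w
        w -= lr * grad                     # SGD update (optimizer's job)

print(w)  # approaches [1.5, -2.0, 0.5]
```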

---

### **PHASE 3: LANGUAGE MODELS (Modules 11-14)**

Build modern transformer architectures for NLP.

- **Module 11**: `tokenization` - Text preprocessing and tokenization
- **Module 12**: `embeddings` - Word vectors, positional encoding
- **Module 13**: `attention` - Self-attention mechanisms
- **Module 14**: `transformers` - Complete transformer architecture

**Capability Unlocked**: Build GPT-style language models

**Historical Example**: GPT (2018) - Foundation of modern AI
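
The heart of Phase 3 is scaled dot-product attention, sketched here for a single head (illustrative NumPy code; the function and the `Wq`/`Wk`/`Wv` names are assumptions, not TinyTorch's actual API):

```python
import numpy as np

# Minimal single-head self-attention sketch (illustrative).
def self_attention(x, Wq, Wk, Wv):
    Q, K, V = x @ Wq, x @ Wk, x @ Wv          # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # scaled dot-product similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                        # each token mixes what it attends to

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)  # (4, 8)
```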

---

### **PHASE 4: SYSTEM OPTIMIZATION (Modules 15-19)**

Transform educational code into production-ready systems through progressive optimization.

- **Module 15**: `acceleration` - Core performance optimization *(see the vectorization sketch after this list)*
  - Journey from educational loops to optimized operations
  - Cache-friendly blocking for matrix multiplication
  - NumPy vectorization (10-100x speedups)
  - Transparent backend dispatch (existing code runs faster automatically!)
- **Module 16**: `caching` - Memory optimization patterns *(see the KV-cache sketch after this list)*
  - KV caching for transformer inference
  - Incremental computation techniques
  - Autoregressive generation optimization
  - Memory vs computation tradeoffs
- **Module 17**: `precision` - Numerical optimization *(see the quantization sketch after this list)*
  - Post-training INT8 quantization
  - Calibration and scaling techniques
  - Accuracy vs performance tradeoffs
  - Memory footprint reduction
- **Module 18**: `compression` - Model size optimization
  - Magnitude-based pruning
  - Structured vs unstructured sparsity
  - Knowledge distillation basics
  - Deployment optimization
- **Module 19**: `benchmarking` - Performance analysis
  - Profiling and bottleneck identification
  - Memory usage analysis
  - Comparative benchmarking
  - Scientific performance measurement
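
For a taste of Module 15, this sketch contrasts an educational triple loop with NumPy vectorization for matrix multiplication (an illustrative benchmark, not the module's actual code; exact speedups vary by machine):

```python
import numpy as np
import time

# Educational triple loop vs. vectorized matmul (illustrative benchmark).
def matmul_loops(A, B):
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    for i in range(n):          # one scalar multiply-add at a time
        for j in range(m):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

A = np.random.rand(128, 128)
B = np.random.rand(128, 128)

t0 = time.perf_counter(); C1 = matmul_loops(A, B); t1 = time.perf_counter()
C2 = A @ B;                                        t2 = time.perf_counter()
print(f"loops: {t1 - t0:.3f}s  vectorized: {t2 - t1:.6f}s")
print(np.allclose(C1, C2))  # same result, orders of magnitude faster
```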
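Module 16's key idea, sketched for a single attention head: cache each step's key/value projections so autoregressive generation only projects the newest token (hypothetical simplified code, not the module's actual implementation):

```python
import numpy as np

# Hypothetical KV-cache sketch: append new K/V instead of recomputing history.
rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
K_cache, V_cache = [], []

def generate_step(x_new):
    """Attend from the newest token over all cached positions."""
    q = x_new @ Wq
    K_cache.append(x_new @ Wk)   # O(1) projection work per step instead of O(t)
    V_cache.append(x_new @ Wv)
    K, V = np.stack(K_cache), np.stack(V_cache)
    scores = K @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()                 # softmax over all cached positions
    return w @ V

for _ in range(5):               # five autoregressive steps
    out = generate_step(rng.normal(size=d))
print(out.shape, len(K_cache))   # (8,) 5
```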
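And Module 17's post-training quantization idea, sketched with a single symmetric scale factor (illustrative code; real calibration uses representative data rather than one weight tensor):

```python
import numpy as np

# Symmetric post-training INT8 quantization sketch (illustrative).
def quantize_int8(w):
    scale = np.abs(w).max() / 127.0  # calibration: map max |w| to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
print("max abs error:", np.abs(w - w_hat).max())  # small, at 4x less memory
```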

---

### **PHASE 5: CAPSTONE PROJECT (Module 20)**

- **Module 20**: `capstone` - Complete ML system
  - Combine all optimization techniques
  - Build optimized end-to-end systems
  - Example projects:
    - Optimized CIFAR-10 trainer (75% accuracy, minimal resources)
    - Efficient GPT inference engine (memory-constrained)
    - Custom optimization challenge
  - Deploy production-ready ML systems

---

## **Key Design Principles**

### **1. Backward Design Philosophy**

Each module is designed with future needs in mind:
- **Tensors** (Module 2): Built with gradient support from day 1
- **Layers** (Module 4): Parameter management ready for optimizers
- **Training** (Module 10): Memory tracking for optimization modules
- **Transformers** (Module 14): KV structure ready for caching

### **2. Backend Dispatch Architecture**

```python
# Students run the SAME code throughout
model.train()  # Uses the appropriate backend automatically

# Modules 1-14: Naive backend (for learning)
# Module 15+: Optimized backend (for performance)
# Zero code changes needed!
```
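
One plausible way to realize this dispatch (a sketch under assumed names such as `use_backend`; TinyTorch's actual mechanism may differ) is a module-level registry that every op consults, so swapping the backend changes performance without touching user code:

```python
import numpy as np

# Hypothetical backend-dispatch sketch; the registry and names are assumptions.
_BACKENDS = {
    "naive": {
        # Pure-Python loops, kept for learning.
        "matmul": lambda A, B: [[sum(a * b for a, b in zip(row, col))
                                 for col in zip(*B)] for row in A],
    },
    "optimized": {
        # Vectorized NumPy path.
        "matmul": lambda A, B: np.asarray(A) @ np.asarray(B),
    },
}
_active = "naive"

def use_backend(name):
    global _active
    _active = name

def matmul(A, B):
    return _BACKENDS[_active]["matmul"](A, B)  # user code never changes

A, B = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
print(matmul(A, B))       # naive loops
use_backend("optimized")
print(matmul(A, B))       # same call, vectorized backend
```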

### **3. Progressive Optimization Journey**

- **Understanding through implementation** (Modules 1-14): Build with loops for clarity
- **Systematic optimization** (Modules 15-19): Transform loops into production code
- **Transparent acceleration**: Optimizations work automatically on existing code
- **Real-world techniques**: Learn optimizations used in PyTorch/TensorFlow

### **4. Historical Context**

Examples map to ML breakthroughs:
- 1957: Perceptron (Module 4)
- 1969: XOR Problem (Module 6)
- 1998: LeNet (Module 9)
- 2012: AlexNet (Module 10)
- 2018: GPT (Module 14)

---

## **Learning Progression**

### **Weeks 1-6**: Foundation

Students build mathematical infrastructure and understand how neural networks work.

### **Weeks 7-10**: Training Systems

Students build complete training pipelines and understand how to scale to real datasets.

### **Weeks 11-14**: Modern AI

Students build transformer architectures that power ChatGPT and modern AI.

### **Weeks 15-19**: System Optimization

Students transform educational code into production-ready systems through progressive optimization techniques.

### **Week 20**: Capstone Project

Students combine all techniques to build complete, optimized ML systems from scratch.

---

## **Success Metrics**

By completion, students will have:
- ✅ Built every component of modern ML systems from scratch
- ✅ Recreated the major breakthroughs in AI history
- ✅ Transformed educational loops into production-ready code (10-100x speedups)
- ✅ Understood why PyTorch and TensorFlow are designed the way they are
- ✅ Mastered real-world optimization techniques (caching, quantization, pruning)
- ✅ Built complete ML systems that transparently optimize themselves

**Ultimate Goal**: Students who can read PyTorch source code and think "I understand why they did it this way - I built this myself in TinyTorch!"