# Complete Beautiful Flow: All 20 Modules

## The Inevitable Discovery Pattern - Full Journey

### **PHASE 1: FOUNDATION (Modules 1-6)**

```
1. Setup → 2. Tensor → 3. Activations → 4. Layers → 5. Losses → 6. Optimizers
```

**Module 5 → 6 Connection:**
```python
# Module 5 ends: Manual weight updates are messy and error-prone
for layer in network:
    layer.weight -= learning_rate * layer.grad  # Easy to forget, inconsistent

# Module 6 starts: "We need systematic weight updates!"
optimizer = SGD(network.parameters(), lr=0.01)
optimizer.step()  # Clean, systematic, nothing to forget
```
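To make "systematic weight updates" concrete, here is a minimal sketch of what an SGD optimizer along these lines could look like inside. It is illustrative only: the `.data`/`.grad` attributes and the `parameters` argument are assumptions, not TinyTorch's actual API.

```python
# Illustrative SGD sketch (assumes each parameter exposes .data and .grad arrays)
class SGD:
    def __init__(self, parameters, lr=0.01):
        self.parameters = list(parameters)  # every weight and bias, collected once
        self.lr = lr

    def step(self):
        # One update rule applied uniformly to every parameter - nothing to forget
        for p in self.parameters:
            p.data -= self.lr * p.grad

    def zero_grad(self):
        # Clear stale gradients before the next backward pass
        for p in self.parameters:
            p.grad = None
```

The point is that the update rule lives in one place instead of being retyped (and occasionally forgotten) for every layer.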
### **PHASE 2: LEARNING TO LEARN (Modules 6-10)**

Here's where Training fits in the beautiful flow:

#### **Module 6 → 7: Optimizers → Autograd**

```python
# Module 6 ends: Computing gradients manually is error-prone
# For each layer: manually compute dL/dW, dL/db... tedious and buggy!

# Module 7 starts: "We need automatic gradient computation!"
loss.backward()   # Handles any architecture
optimizer.step()  # Use the gradients
```
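To feel why manual gradients do not scale, here is the chain rule written out by hand for a single linear layer with a mean-squared-error loss (plain NumPy, illustrative names). Every additional layer or activation adds another hand-derived block like this, which is exactly the bookkeeping autograd automates.

```python
import numpy as np

# One linear layer: y_hat = W @ x + b, loss = mean((y_hat - y) ** 2)
x = np.random.randn(4)        # inputs
y = np.random.randn(3)        # targets
W = np.random.randn(3, 4)     # weights
b = np.random.randn(3)        # biases

y_hat = W @ x + b
loss = np.mean((y_hat - y) ** 2)

# Gradients derived and typed by hand, valid only for THIS architecture:
dL_dyhat = 2.0 * (y_hat - y) / y.size  # derivative of the mean squared error
dL_dW = np.outer(dL_dyhat, x)          # each W[i, j] multiplies input x[j]
dL_db = dL_dyhat                       # the bias passes the gradient straight through

# With autograd, loss.backward() produces all of these for any architecture.
```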
#### **Module 7 → 8: Autograd → Training Loops**

```python
# Module 7 ends: We can optimize, but how do we do it systematically for many epochs?
loss.backward()
optimizer.step()
# How do we do this for 100 epochs? Track progress? Validate?

# Module 8 starts: "We need systematic training procedures!"
for epoch in range(100):
    for x, y in data:
        optimizer.zero_grad()
        loss = model(x, y)
        loss.backward()
        optimizer.step()

    # Validation, logging, early stopping
    if epoch % 10 == 0:
        accuracy = validate(model)
        print(f"Epoch {epoch}: {accuracy}")
```
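The "early stopping" mentioned in the comment can be made concrete with a little extra bookkeeping around the same loop. This is a sketch under assumptions: `train_one_epoch` and `validate` stand in for the batch loop and validation code above, and `patience = 5` is an arbitrary choice.

```python
# Early stopping sketch: stop once validation accuracy stops improving
best_accuracy = 0.0
bad_checks = 0
patience = 5  # how many non-improving validation checks we tolerate

for epoch in range(100):
    train_one_epoch(model, optimizer, data)  # the batch loop shown above

    accuracy = validate(model)               # same validation helper as above
    if accuracy > best_accuracy:
        best_accuracy, bad_checks = accuracy, 0
    else:
        bad_checks += 1
        if bad_checks >= patience:
            print(f"Stopping at epoch {epoch}: no improvement for {patience} checks")
            break
```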
#### **Module 8 → 9: Training → Spatial**

```python
# Module 8 ends: MLPs trained systematically get 85% on MNIST
# But images have spatial structure - MLPs treat pixels as independent

# Module 9 starts: "Images need spatial understanding!"
conv = Conv2d(1, 16, 3)          # Local patterns
cnn = CNN([conv, pool, linear])
accuracy = train(cnn)            # 98% vs 85% - huge jump!
```
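What "local patterns" means can be sketched in a few lines of plain NumPy: a convolution slides one small filter across the image and reuses the same weights at every location, instead of flattening the image and learning an independent weight per pixel the way an MLP does. The filter values below are illustrative.

```python
import numpy as np

def conv2d_single(image, kernel):
    """Valid 2D convolution of one channel with one filter (stride 1, no padding)."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # a local neighborhood of pixels
            out[i, j] = np.sum(patch * kernel)  # the same 3x3 weights, reused everywhere
    return out

image = np.random.rand(28, 28)            # e.g. one MNIST-sized grayscale image
vertical_edge = np.array([[1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0]])
feature_map = conv2d_single(image, vertical_edge)
print(feature_map.shape)                  # (26, 26): one response per spatial location
```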
#### **Module 9 → 10: Spatial → DataLoader**

```python
# Module 9 ends: Training CNNs sample-by-sample is painfully slow
for epoch in range(10):
    for i in range(50000):   # CIFAR-10 one by one
        sample = dataset[i]  # 50k individual loads!
        loss = cnn(sample)
        optimizer.step()
# Takes 3+ hours, terrible GPU utilization

# Module 10 starts: "We need efficient data feeding!"
loader = DataLoader(dataset, batch_size=32, shuffle=True)
for epoch in range(10):
    for batch in loader:  # 32 samples at once
        loss = cnn(batch)
        optimizer.step()
# Same training, 30 minutes instead of 3 hours!
```
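A minimal sketch of what a DataLoader can do internally: shuffle the sample indices once per epoch and hand back contiguous slices as batches. The class below is illustrative NumPy, not the TinyTorch module itself.

```python
import numpy as np

class SimpleDataLoader:
    """Illustrative loader: shuffles indices, yields (data, labels) batches."""
    def __init__(self, data, labels, batch_size=32, shuffle=True):
        self.data, self.labels = data, labels
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        indices = np.arange(len(self.data))
        if self.shuffle:
            np.random.shuffle(indices)  # a fresh order every epoch
        for start in range(0, len(indices), self.batch_size):
            idx = indices[start:start + self.batch_size]
            yield self.data[idx], self.labels[idx]  # one batch fetched as a single array slice

# Toy usage: 1,000 fake samples become ~32 batched loads instead of 1,000 single loads
data = np.random.rand(1000, 3, 32, 32).astype(np.float32)
labels = np.random.randint(0, 10, size=1000)
for x_batch, y_batch in SimpleDataLoader(data, labels, batch_size=32):
    pass  # forward / backward / optimizer.step() would go here
```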
## **COMPLETE BEAUTIFUL FLOW: Modules 1-20**

### **Phase 1: Foundation (1-6)**
1. **Setup** - Environment
2. **Tensor** - Data structures
3. **Activations** - Nonlinearity
4. **Layers** - Network building blocks
5. **Losses** - Learning objectives
6. **Optimizers** - Systematic weight updates

**Milestone**: Can solve XOR with clean, systematic code

### **Phase 2: Learning to Learn (7-10)**
7. **Autograd** - Automatic gradient computation
8. **Training** - Systematic learning procedures
9. **Spatial** - Architecture for images
10. **DataLoader** - Efficient data feeding

**Milestone**: Train CNN on CIFAR-10 to 75% - complete ML pipeline!

### **Phase 3: Modern AI (11-14)**
11. **Tokenization** - Text processing
12. **Embeddings** - Vector representations
13. **Attention** - Sequence understanding
14. **Transformers** - Complete language models

**Milestone**: Build GPT from scratch!

### **Phase 4: System Optimization (15-19)**
15. **Acceleration** - Loops → NumPy optimizations (see the sketch after this list)
16. **Caching** - KV cache for transformers
17. **Precision** - Quantization techniques
18. **Compression** - Pruning and distillation
19. **Benchmarking** - Performance measurement

**Milestone**: 10-100x speedups on existing models
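As a preview of Module 15's "Loops → NumPy" idea (the sketch referenced in the list above), replacing an interpreted Python loop with a single vectorized call moves the work into optimized native code. The exact speedup depends on the machine, but order-of-magnitude gains are typical.

```python
import time
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Pure-Python loop: one interpreted multiply-add per element
start = time.perf_counter()
total = 0.0
for i in range(len(a)):
    total += a[i] * b[i]
loop_seconds = time.perf_counter() - start

# Vectorized: the same dot product as one optimized call
start = time.perf_counter()
total_vec = np.dot(a, b)
numpy_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.3f}s  numpy: {numpy_seconds:.4f}s  "
      f"speedup: ~{loop_seconds / numpy_seconds:.0f}x")
```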
### **Phase 5: Capstone (20)**
20. **Capstone** - Complete optimized ML system

**Final Milestone**: Production-ready ML system

## **Key Insights: Why Training is Module 8**

### **Training Needs Both Optimizers AND Autograd**

```python
# Training module uses both:
def train_epoch(model, optimizer, data):  # Needs optimizer
    for x, y in data:
        optimizer.zero_grad()
        loss = model(x, y)
        loss.backward()                   # Needs autograd
        optimizer.step()
```
### **Training Creates Motivation for Better Architectures**
- Train MLPs systematically → hit accuracy limits
- "Images have structure MLPs can't see"
- Natural motivation for CNNs

### **Training Makes DataLoader Pain Real**
- Students experience slow single-sample training
- Feel the inefficiency before learning the solution
- DataLoader becomes obvious relief, not an abstract concept

## **Beautiful Connection Pattern**

**Every module solves the obvious problem left by the previous one:**

6. **Optimizers**: "Manual updates are error-prone"
7. **Autograd**: "Manual gradients are error-prone"
8. **Training**: "Ad hoc optimization is unsystematic"
9. **Spatial**: "MLPs hit accuracy limits on images"
10. **DataLoader**: "Sample-by-sample training is too slow"

## **Expert Validation Test**

Would PyTorch experts say this is beautiful?

✅ **Inevitable progression**: Each step solves obvious problems
✅ **Historical accuracy**: Mirrors how PyTorch actually evolved
✅ **Immediate gratification**: Every module provides clear value
✅ **No artificial gaps**: Students can predict what comes next
✅ **Production relevance**: Real ML engineering progression

## **The "Training as Bridge" Insight**

Training (Module 8) serves as the **bridge** between:
- **Infrastructure** (Modules 6-7): Optimizers + Autograd
- **Architecture** (Module 9): Spatial operations
- **Efficiency** (Module 10): Data loading

Students learn to train systematically, THEN discover architectural and efficiency improvements.

This creates the beautiful flow where experts will say: "This is exactly how someone should learn ML systems - every step feels inevitable."