# Beautiful Module Progression Analysis

## Creating Seamless Learning with Immediate Use and Tight Connections

Let me step through each module brutally honestly to ensure we have a **beautiful progression** where experts will say "this is perfect pedagogical flow."

## Current State Analysis: Where Are the Gaps?

### **Phase 1: Foundation (Modules 1-6)** ✅ TIGHT

```
1. Setup → 2. Tensor → 3. Activations → 4. Layers → 5. Losses → 6. Autograd
```

**Connection Analysis:**

- **1→2**: Setup enables tensor operations ✅
- **2→3**: Tensors immediately need nonlinearity ✅
- **3→4**: Activations go into layers ✅
- **4→5**: Layers need loss functions ✅
- **5→6**: Losses need gradients ✅

**Milestone**: XOR problem solved - beautiful culmination!
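
A sketch of what that XOR culmination could look like using only Modules 1-6. Names like `Tensor`, `MLP`, and `mse_loss` are assumptions about the hypothetical TinyTorch API, and the weight update is manual because optimizers don't exist yet:

```python
# Hypothetical sketch: solving XOR with Modules 1-6 only.
# Tensor, MLP, and mse_loss are assumed names, not confirmed API.
xor_x = Tensor([[0, 0], [0, 1], [1, 0], [1, 1]])
xor_y = Tensor([[0], [1], [1], [0]])

model = MLP([2, 4, 1])  # a tiny hidden layer is enough for XOR

for step in range(1000):
    loss = mse_loss(model(xor_x), xor_y)
    loss.backward()               # Module 6's autograd
    for p in model.parameters():  # manual update: no optimizer until Module 7
        p.data -= 0.1 * p.grad
        p.grad = None
```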

### **Phase 2: Training Systems (Modules 7-10)** ❌ BROKEN CONNECTIONS

**Current Order:**

```
7. DataLoader → 8. Optimizers → 9. Spatial → 10. Training
```

**Connection Problems:**

- **7→8**: DataLoader sits unused until training ❌
- **8→9**: Optimizers can't optimize spatial models yet ❌
- **9→10**: Why build CNNs if we can't train them? ❌

**PyTorch Expert's Proposed Order:**

```
7. Optimizers → 8. Spatial → 9. Training → 10. DataLoader
```

**Let Me Test This Connection by Connection:**

## **BRUTAL CONNECTION ANALYSIS: Proposed Order**

### **Module 6 → Module 7: Autograd → Optimizers**

**Connection**: ✅ PERFECT

- Module 6 ends: "Now we have gradients!"
- Module 7 starts: "What do we do with gradients? Optimize!"
- **Immediate use**: Use Module 6's gradient system in SGD/Adam
- **Gap distance**: ZERO

```python
# Module 6 ending
loss.backward()  # Gradients computed
print("Gradients:", [p.grad for p in model.parameters()])

# Module 7 immediate start
optimizer = SGD(model.parameters(), lr=0.01)
optimizer.step()  # USE those gradients immediately!
```
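
What `step()` actually does with those gradients is worth making explicit. A minimal sketch of the SGD class the snippets above assume (parameters exposing `.data` and `.grad` is itself an assumption):

```python
class SGD:
    """Minimal SGD: walk each parameter downhill along its gradient."""

    def __init__(self, parameters, lr=0.01):
        self.parameters = list(parameters)
        self.lr = lr

    def zero_grad(self):
        # Clear stale gradients before the next backward pass
        for p in self.parameters:
            p.grad = None

    def step(self):
        # Consume the gradients Module 6's backward() just computed
        for p in self.parameters:
            if p.grad is not None:
                p.data -= self.lr * p.grad
```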

### **Module 7 → Module 8: Optimizers → Spatial**

**Connection**: ⚠️ PROBLEMATIC

- Module 7 ends: "I can optimize parameters"
- Module 8 starts: "Let's build CNNs"
- **Problem**: What meaningful model do optimizers optimize in Module 7?
- **Gap distance**: LARGE

**The Issue:** Optimizers without meaningful models to optimize = abstract learning

**BETTER APPROACH:** What if Module 7 uses simple MLPs from Module 4?

```python
# Module 7: Optimizers (using existing components)
mlp = MLP([784, 64, 10])  # From Module 4
optimizer = SGD(mlp.parameters(), lr=0.01)

# Train on MNIST digits
for x, y in mnist_samples:
    optimizer.zero_grad()
    loss = cross_entropy(mlp(x), y)
    loss.backward()   # Module 6's autograd computes the gradients
    optimizer.step()  # Module 7's optimizer applies them
```

**This creates immediate use and motivation for CNNs!**

### **Module 8 → Module 9: Spatial → Training**

**Connection**: ❌ BROKEN

- Module 8 ends: "I built CNN components"
- Module 9 starts: "Let's train models"
- **Problem**: How do students test their CNNs? Random forward passes?
- **Gap distance**: MEDIUM

**What's Missing:** Immediate use of CNN components in Module 8

**SOLUTION:** Module 8 should immediately train simple CNNs:

```python
# Module 8: Spatial (with immediate training)
conv = Conv2d(3, 16, 3)            # assuming 32×32 RGB input
pool = MaxPool2d(2)
flatten = Flatten()                # Flatten is an assumed reshape layer
linear = Linear(16 * 15 * 15, 10)  # 32→30 after 3×3 conv, 30→15 after 2×2 pool
simple_cnn = Sequential([conv, pool, flatten, linear])

# Immediate training with Module 7's optimizers
optimizer = Adam(simple_cnn.parameters())  # From Module 7!
for epoch in range(5):
    optimizer.zero_grad()
    # sample_image/sample_label: a single labeled example
    loss = cross_entropy(simple_cnn(sample_image), sample_label)
    loss.backward()
    optimizer.step()
```
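
The `Adam` used here comes from Module 7. For reference, a minimal sketch of what that class could look like, implementing the standard Adam update; it assumes parameters expose `.data` and `.grad` supporting elementwise arithmetic, and is an illustration rather than the module's actual code:

```python
class Adam:
    """Minimal Adam: per-parameter adaptive learning rates."""

    def __init__(self, parameters, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.parameters = list(parameters)
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * len(self.parameters)  # first-moment estimates
        self.v = [0.0] * len(self.parameters)  # second-moment estimates
        self.t = 0                             # step counter for bias correction

    def zero_grad(self):
        for p in self.parameters:
            p.grad = None

    def step(self):
        self.t += 1
        for i, p in enumerate(self.parameters):
            g = p.grad
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias-corrected
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            p.data -= self.lr * m_hat / (v_hat ** 0.5 + self.eps)
```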

### **Module 9 → Module 10: Training → DataLoader**

**Connection**: ✅ BEAUTIFUL (if done right)

- Module 9 ends: "Single-sample training is painfully slow"
- Module 10 starts: "Let's batch this efficiently"
- **Immediate use**: Direct before/after comparison
- **Gap distance**: ZERO

## **REVISED BEAUTIFUL PROGRESSION**

Based on brutal analysis, here's what would create expert-level flow:

### **Module 7: Optimizers (with immediate MLP training)**

```python
# Build on Module 4 MLPs + Module 6 autograd
mnist_mlp = MLP([784, 64, 10])
optimizer = SGD(mnist_mlp.parameters(), lr=0.01)

# Train immediately on MNIST digits
for sample in range(1000):
    x, y = mnist[sample]
    optimizer.zero_grad()
    loss = cross_entropy(mnist_mlp(x), y)
    loss.backward()
    optimizer.step()

print("Achieved 85% on MNIST!")
print("But this is slow and MLPs aren't great for images...")
```
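
The quoted 85% implies an evaluation pass the snippet doesn't show. A minimal accuracy helper could look like this (`argmax()` on the model output and the `mnist_test` split are assumptions):

```python
def accuracy(model, dataset):
    """Fraction of samples where the top-scoring class matches the label."""
    correct = 0
    for x, y in dataset:
        if model(x).argmax() == y:  # assumes outputs expose argmax()
            correct += 1
    return correct / len(dataset)

print(f"MLP accuracy: {accuracy(mnist_mlp, mnist_test):.0%}")
```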

**Ends with motivation**: "We need better architectures for images"

### **Module 8: Spatial (with immediate CNN training)**

```python
# Build CNN components
conv = Conv2d(1, 16, 3)  # 1 input channel: 28×28 MNIST digits
pool = MaxPool2d(2)
flatten = Flatten()      # assumed reshape layer
mnist_cnn = Sequential([conv, pool, flatten, Linear(16 * 13 * 13, 10)])

# Train immediately using Module 7's optimizers
optimizer = Adam(mnist_cnn.parameters())  # Immediate use!
for sample in range(1000):
    x, y = mnist[sample]
    optimizer.zero_grad()
    loss = cross_entropy(mnist_cnn(x), y)
    loss.backward()
    optimizer.step()

print("CNN gets 92% vs MLP's 85%!")
print("But training sample-by-sample is still slow...")
```

(The `Linear` size follows from the shapes: a 28×28 input becomes 26×26 after the unpadded 3×3 conv, then 13×13 after 2×2 pooling, with 16 channels, giving 16 · 13 · 13 features.)

**Ends with motivation**: "We need systematic training"

### **Module 9: Training (systematic but inefficient)**

```python
# Build proper training loops
def train_epoch(model, optimizer, dataset):
    for i, (x, y) in enumerate(dataset):  # One sample at a time!
        optimizer.zero_grad()
        loss = cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()

        if i % 1000 == 0:
            print(f"Sample {i}/50000 - this is taking forever!")

# Train a CIFAR-10 CNN
cifar_cnn = CNN()  # From Module 8
optimizer = Adam(cifar_cnn.parameters())  # From Module 7
train_epoch(cifar_cnn, optimizer, cifar10_dataset)
# Takes 3 hours instead of the 30 minutes batching would allow!
```

**Ends with pain**: "This is unbearably slow for real datasets"
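
To make that pain measurable rather than anecdotal, the module could time the epoch directly; a minimal sketch with Python's standard `time` module, reusing the names from the snippet above:

```python
import time

start = time.perf_counter()
train_epoch(cifar_cnn, optimizer, cifar10_dataset)  # one sample at a time
minutes = (time.perf_counter() - start) / 60
print(f"One epoch at batch size 1 took {minutes:.1f} minutes!")
```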

### **Module 10: DataLoader (immediate relief)**

```python
# Same model, same optimizer, but batched!
loader = DataLoader(cifar10_dataset, batch_size=32)

def train_epoch_fast(model, optimizer, dataloader):
    for batch_x, batch_y in dataloader:  # 32 samples at once!
        optimizer.zero_grad()
        loss = cross_entropy(model(batch_x), batch_y)
        loss.backward()
        optimizer.step()

# Same training, but every step now processes 32 samples at once
train_epoch_fast(cifar_cnn, optimizer, loader)
# Takes 30 minutes - students see immediate relief!
```
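
Under the hood, the `DataLoader` this module builds needs little more than index shuffling and batch stacking. A minimal sketch, assuming each dataset item is a `(numpy array, label)` pair:

```python
import numpy as np

class DataLoader:
    """Minimal batching loader: shuffle indices, yield stacked batches."""

    def __init__(self, dataset, batch_size=32, shuffle=True):
        self.dataset, self.batch_size, self.shuffle = dataset, batch_size, shuffle

    def __iter__(self):
        indices = np.arange(len(self.dataset))
        if self.shuffle:
            np.random.shuffle(indices)
        for start in range(0, len(indices), self.batch_size):
            batch = [self.dataset[i] for i in indices[start:start + self.batch_size]]
            xs, ys = zip(*batch)
            # Stack samples so one vectorized pass covers the whole batch
            yield np.stack(xs), np.array(ys)
```

The speedup then comes from vectorization: one forward/backward pass over 32 stacked samples amortizes the per-sample Python and dispatch overhead that made Module 9 painful.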

## **BEAUTIFUL CONNECTIONS SUMMARY**

### **Every Module Immediately Uses the Previous One:**

- **Module 7**: Uses Module 6's autograd + Module 4's MLPs
- **Module 8**: Uses Module 7's optimizers for CNN training
- **Module 9**: Uses Module 8's CNNs + Module 7's optimizers
- **Module 10**: Uses Module 9's training loop but makes it efficient

### **Every Module Creates Clear Motivation:**

- **Module 7**: "MLPs aren't great for images" → need CNNs
- **Module 8**: "Sample-by-sample training is ad hoc" → need systematic training
- **Module 9**: "This is painfully slow" → need efficient data loading
- **Module 10**: "Now we can train real models on real data, fast!"

### **Gap Distance**: ZERO between every module

## **EXPERT VALIDATION PREDICTION**

With this progression, experts will say:

- ✅ **"Perfect logical flow"** - each module builds immediately on the last
- ✅ **"No wasted learning"** - everything gets used right away
- ✅ **"Natural motivation"** - students feel the need for each next step
- ✅ **"Production-like progression"** - mirrors how real ML systems evolve

## **IMPLEMENTATION REQUIREMENTS**

### **Module 7: Optimizers**

- Must include immediate MLP training examples
- Show clear performance metrics (85% on MNIST)
- End with "images need better architectures"

### **Module 8: Spatial**

- Must immediately train CNNs using Module 7's optimizers
- Show the CNN vs MLP comparison (92% vs 85%)
- End with "sample-by-sample is inefficient"

### **Module 9: Training**

- Must deliberately show slow single-sample training
- Create genuine frustration with timing
- End with a clear "this is too slow" message

### **Module 10: DataLoader**

- Must show a dramatic before/after speedup
- Use the identical model/optimizer from Module 9
- Students see an immediate 20-50x improvement

This creates the **beautiful progression** you want - every step immediately useful, tightly connected, with clear motivation for what's next.