This commit implements the pedagogically optimal "inevitable discovery" module progression based on expert validation and educational design principles.

## Module Reordering Summary

**Previous Order (Problems)**:
- 05_losses → 06_autograd → 07_dataloader → 08_optimizers → 09_spatial → 10_training
- Issues: Autograd before optimizers, DataLoader before training, scattered dependencies

**New Order (Beautiful Progression)**:
- 05_losses → 06_optimizers → 07_autograd → 08_training → 09_spatial → 10_dataloader
- Benefits: Each module creates an inevitable need for the next

## Pedagogical Flow Achieved

- **05_losses** → "Need systematic weight updates" → **06_optimizers**
- **06_optimizers** → "Need automatic gradients" → **07_autograd**
- **07_autograd** → "Need systematic training" → **08_training**
- **08_training** → "MLPs hit limits on images" → **09_spatial**
- **09_spatial** → "Training is too slow" → **10_dataloader**

## Technical Changes

### Module Directory Renaming
- `06_autograd` → `07_autograd`
- `07_dataloader` → `10_dataloader`
- `08_optimizers` → `06_optimizers`
- `10_training` → `08_training`
- `09_spatial` → `09_spatial` (no change)

### System Integration Updates
- **MODULE_TO_CHECKPOINT mapping**: Updated in tito/commands/export.py (sketched below)
- **Test directories**: Renamed module_XX directories to match the new numbers
- **Documentation**: Updated all references in MD files and agent configurations
- **CLI integration**: Updated next-steps suggestions for the proper flow

### Agent Configuration Updates
- **Quality Assurance**: Updated module audit status with the new numbers
- **Module Developer**: Updated work tracking with the new sequence
- **Documentation**: Updated MASTER_PLAN_OF_RECORD.md with the beautiful progression

## Educational Benefits

1. **Inevitable Discovery**: Each module naturally leads to the next
2. **Cognitive Load**: Concepts are introduced exactly when needed
3. **Motivation**: Students understand WHY each tool is necessary
4. **Synthesis**: Everything flows toward complete ML systems understanding
5. **Professional Alignment**: Matches real ML engineering workflows

## Quality Assurance

- ✅ All CLI commands still function
- ✅ Checkpoint system mappings updated
- ✅ Documentation consistency maintained
- ✅ Test directory structure aligned
- ✅ Agent configurations synchronized

**Impact**: This reordering transforms TinyTorch from a collection of modules into a coherent educational journey where each step naturally motivates the next, creating optimal conditions for deep learning systems understanding.
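For illustration only, the MODULE_TO_CHECKPOINT update mentioned under System Integration Updates might look roughly like the following. This is a hypothetical sketch of a renumbered mapping; the actual keys, values, and structure in tito/commands/export.py are not shown in this commit text.

```python
# Hypothetical sketch only -- the real mapping in tito/commands/export.py
# may use different keys, values, or structure.
MODULE_TO_CHECKPOINT = {
    "05_losses":     "checkpoint_05",
    "06_optimizers": "checkpoint_06",  # was 08_optimizers
    "07_autograd":   "checkpoint_07",  # was 06_autograd
    "08_training":   "checkpoint_08",  # was 10_training
    "09_spatial":    "checkpoint_09",  # unchanged
    "10_dataloader": "checkpoint_10",  # was 07_dataloader
}
```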
Beautiful Module Progression Analysis
Creating Seamless Learning with Immediate Use and Tight Connections
Let me step through each module brutally honestly to ensure we have a beautiful progression where experts will say "this is perfect pedagogical flow."
Current State Analysis: Where Are the Gaps?
Phase 1: Foundation (Modules 1-6) ✅ TIGHT
1. Setup → 2. Tensor → 3. Activations → 4. Layers → 5. Losses → 6. Autograd
Connection Analysis:
- 1→2: Setup enables tensor operations ✅
- 2→3: Tensors immediately need nonlinearity ✅
- 3→4: Activations go into layers ✅
- 4→5: Layers need loss functions ✅
- 5→6: Losses need gradients ✅
Milestone: XOR problem solved - beautiful culmination!
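As a rough sketch of what that Phase 1 milestone could look like using only Modules 1-6 (the names MLP, Tensor, and mse_loss are assumptions about the TinyTorch API, not the actual milestone code):

```python
# Hypothetical XOR sketch using only Phase 1 components (assumed names).
X = Tensor([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = Tensor([[0], [1], [1], [0]])

model = MLP([2, 4, 1])            # Module 4: layers (with Module 3 activations inside)
for step in range(1000):
    pred = model(X)               # Module 2: tensor operations
    loss = mse_loss(pred, Y)      # Module 5: losses
    loss.backward()               # Module 6: autograd computes gradients
    for p in model.parameters():  # Manual update -- no optimizer yet!
        p.data -= 0.1 * p.grad
        p.grad = None
```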
Phase 2: Training Systems (Modules 7-10) ❌ BROKEN CONNECTIONS
Current Order:
7. DataLoader → 8. Optimizers → 9. Spatial → 10. Training
Connection Problems:
- 7→8: DataLoader sits unused until training ❌
- 8→9: Optimizers can't optimize spatial models yet ❌
- 9→10: Why build CNNs if we can't train them? ❌
PyTorch Expert's Proposed Order:
7. Optimizers → 8. Spatial → 9. Training → 10. DataLoader
Let Me Test This Connection by Connection:
BRUTAL CONNECTION ANALYSIS: Proposed Order
Module 6 → Module 7: Autograd → Optimizers
Connection: ✅ PERFECT
- Module 6 ends: "Now we have gradients!"
- Module 7 starts: "What do we do with gradients? Optimize!"
- Immediate use: Use Module 6's gradient system in SGD/Adam
- Gap distance: ZERO
```python
# Module 6 ending
loss.backward()  # Gradients computed
print("Gradients:", [p.grad for p in model.parameters()])

# Module 7 immediate start
optimizer = SGD(model.parameters(), lr=0.01)
optimizer.step()  # USE those gradients immediately!
```
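To make the "use those gradients" step concrete, a minimal SGD sketch could look like the following. It assumes parameters expose `.data` and `.grad` as in the snippets above; it is not the actual Module 7 implementation.

```python
class SGD:
    """Minimal sketch: update each parameter from its stored gradient."""
    def __init__(self, params, lr=0.01):
        self.params = list(params)
        self.lr = lr

    def zero_grad(self):
        for p in self.params:
            p.grad = None  # clear stale gradients before the next backward()

    def step(self):
        for p in self.params:
            if p.grad is not None:
                p.data -= self.lr * p.grad  # plain gradient descent update
```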
Module 7 → Module 8: Optimizers → Spatial
Connection: ⚠️ PROBLEMATIC
- Module 7 ends: "I can optimize parameters"
- Module 8 starts: "Let's build CNNs"
- Problem: What meaningful model do optimizers optimize in Module 7?
- Gap distance: LARGE
The Issue: Optimizers without meaningful models to optimize = abstract learning
BETTER APPROACH: What if Module 7 uses simple MLPs from Module 4?
```python
# Module 7: Optimizers (using existing components)
mlp = MLP([784, 64, 10])  # From Module 4
optimizer = SGD(mlp.parameters(), lr=0.01)

# Train on MNIST digits
for x, y in mnist_samples:
    optimizer.zero_grad()
    loss = cross_entropy(mlp(x), y)
    loss.backward()
    optimizer.step()
```
This creates immediate use and motivation for CNNs!
Module 8 → Module 9: Spatial → Training
Connection: ❌ BROKEN
- Module 8 ends: "I built CNN components"
- Module 9 starts: "Let's train models"
- Problem: Students test CNNs how? Random forward passes?
- Gap distance: MEDIUM
What's Missing: Immediate use of CNN components in Module 8
SOLUTION: Module 8 should immediately train simple CNNs:
```python
# Module 8: Spatial (with immediate training)
conv = Conv2d(3, 16, 3)
pool = MaxPool2d(2)
simple_cnn = Sequential([conv, pool, flatten, linear])  # flatten/linear: final layers from Module 4

# Immediate training with Module 7's optimizers
optimizer = Adam(simple_cnn.parameters())  # From Module 7!
for epoch in range(5):
    optimizer.zero_grad()
    loss = cross_entropy(simple_cnn(sample_image), sample_label)  # sample_label: matching target
    loss.backward()
    optimizer.step()
```
Module 9 → Module 10: Training → DataLoader
Connection: ✅ BEAUTIFUL (if done right)
- Module 9 ends: "Single-sample training is painfully slow"
- Module 10 starts: "Let's batch this efficiently"
- Immediate use: Direct before/after comparison
- Gap distance: ZERO
REVISED BEAUTIFUL PROGRESSION
Based on brutal analysis, here's what would create expert-level flow:
Module 7: Optimizers (with immediate MLP training)
```python
# Build on Module 4 MLPs + Module 6 autograd
mnist_mlp = MLP([784, 64, 10])
optimizer = SGD(mnist_mlp.parameters(), lr=0.01)

# Train immediately on MNIST digits, one sample at a time
for sample in range(1000):
    x, y = mnist[sample]
    optimizer.zero_grad()
    loss = cross_entropy(mnist_mlp(x), y)
    loss.backward()
    optimizer.step()

print("Achieved 85% on MNIST!")
print("But this is slow and MLPs aren't great for images...")
```
Ends with motivation: "We need better architectures for images"
Module 8: Spatial (with immediate CNN training)
```python
# Build CNN components
conv = Conv2d(1, 16, 3)
pool = MaxPool2d(2)
mnist_cnn = Sequential([conv, pool, flatten, Linear(16*13*13, 10)])

# Train immediately using Module 7's optimizers
optimizer = Adam(mnist_cnn.parameters())  # Immediate use!
for sample in range(1000):
    x, y = mnist[sample]
    optimizer.zero_grad()
    loss = cross_entropy(mnist_cnn(x), y)
    loss.backward()
    optimizer.step()

print("CNN gets 92% vs MLP's 85%!")
print("But training sample-by-sample is still slow...")
```
Ends with motivation: "We need systematic training"
Module 9: Training (systematic but inefficient)
```python
# Build proper training loops
def train_epoch(model, optimizer, dataset):
    for i, (x, y) in enumerate(dataset):  # One sample at a time!
        optimizer.zero_grad()
        loss = cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
        if i % 1000 == 0:
            print(f"Sample {i}/50000 - this is taking forever!")

# Train a CIFAR-10 CNN
cifar_cnn = CNN()  # From Module 8
optimizer = Adam(cifar_cnn.parameters())  # From Module 7
train_epoch(cifar_cnn, optimizer, cifar10_dataset)
# Takes 3 hours instead of 30 minutes!
```
Ends with pain: "This is unbearably slow for real datasets"
Module 10: DataLoader (immediate relief)
```python
# Same model, same optimizer, but batched!
loader = DataLoader(cifar10_dataset, batch_size=32)

def train_epoch_fast(model, optimizer, dataloader):
    for batch_x, batch_y in dataloader:  # 32 samples at once!
        optimizer.zero_grad()
        loss = cross_entropy(model(batch_x), batch_y)
        loss.backward()
        optimizer.step()

# Same training, 32x faster!
train_epoch_fast(cifar_cnn, optimizer, loader)
# Takes 30 minutes - students see immediate relief!
```
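A minimal sketch of the batching DataLoader could be as simple as the following. It assumes dataset items are (x, y) pairs and a `stack` helper that collates samples into a batch tensor; it is not the actual Module 10 API.

```python
import random

class DataLoader:
    """Minimal sketch: yield shuffled mini-batches of (x, y) pairs."""
    def __init__(self, dataset, batch_size=32, shuffle=True):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        indices = list(range(len(self.dataset)))
        if self.shuffle:
            random.shuffle(indices)
        for start in range(0, len(indices), self.batch_size):
            batch = [self.dataset[i] for i in indices[start:start + self.batch_size]]
            xs, ys = zip(*batch)
            yield stack(xs), stack(ys)  # 'stack' assumed: collate samples into one batch tensor
```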
BEAUTIFUL CONNECTIONS SUMMARY
Every Module Immediately Uses Previous:
- Module 7: Uses Module 6's autograd + Module 4's MLPs
- Module 8: Uses Module 7's optimizers for CNN training
- Module 9: Uses Module 8's CNNs + Module 7's optimizers
- Module 10: Uses Module 9's training but makes it efficient
Every Module Creates Clear Motivation:
- Module 7: "MLPs aren't great for images" → need CNNs
- Module 8: "Sample-by-sample training is ad hoc" → need systematic training
- Module 9: "This is painfully slow" → need efficient data loading
- Module 10: "Now we can train real models on real data fast!"
Gap Distance: ZERO between every module
EXPERT VALIDATION PREDICTION
With this progression, experts will say:
- ✅ "Perfect logical flow" - each module builds immediately
- ✅ "No wasted learning" - everything gets used right away
- ✅ "Natural motivation" - students feel the need for each next step
- ✅ "Production-like progression" - mirrors how real ML systems evolve
IMPLEMENTATION REQUIREMENTS
Module 7: Optimizers
- Must include immediate MLP training examples
- Show clear performance metrics (85% MNIST)
- End with "images need better architectures"
Module 8: Spatial
- Must immediately train CNNs using Module 7's optimizers
- Show CNN vs MLP comparison (92% vs 85%)
- End with "sample-by-sample is inefficient"
Module 9: Training
- Must deliberately show slow single-sample training
- Create genuine frustration with timing
- End with clear "this is too slow" message
Module 10: DataLoader
- Must show dramatic before/after speedup
- Use identical model/optimizer from Module 9
- Students see immediate 20-50x improvement
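One way to make that before/after comparison concrete is a simple timing harness around the two loops from Modules 9 and 10 (function names reused from the sketches above; the printed numbers would come from actual runs, not from this sketch):

```python
import time

def timed(label, train_fn, *args):
    # Time one training epoch and report the wall-clock duration
    start = time.time()
    train_fn(*args)
    print(f"{label}: {time.time() - start:.1f}s")

timed("Sample-by-sample (Module 9)", train_epoch, cifar_cnn, optimizer, cifar10_dataset)
timed("Batched (Module 10)", train_epoch_fast, cifar_cnn, optimizer, loader)
```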
This creates the beautiful progression you want - every step immediately useful, tightly connected, with clear motivation for what's next.