This commit implements the pedagogically optimal "inevitable discovery" module progression based on expert validation and educational design principles.

## Module Reordering Summary

**Previous Order (Problems)**:
- 05_losses → 06_autograd → 07_dataloader → 08_optimizers → 09_spatial → 10_training
- Issues: Autograd before optimizers, DataLoader before training, scattered dependencies

**New Order (Beautiful Progression)**:
- 05_losses → 06_optimizers → 07_autograd → 08_training → 09_spatial → 10_dataloader
- Benefits: Each module creates an inevitable need for the next

## Pedagogical Flow Achieved

- **05_losses** → "Need systematic weight updates" → **06_optimizers**
- **06_optimizers** → "Need automatic gradients" → **07_autograd**
- **07_autograd** → "Need systematic training" → **08_training**
- **08_training** → "MLPs hit limits on images" → **09_spatial**
- **09_spatial** → "Training is too slow" → **10_dataloader**

## Technical Changes

### Module Directory Renaming
- `06_autograd` → `07_autograd`
- `07_dataloader` → `10_dataloader`
- `08_optimizers` → `06_optimizers`
- `10_training` → `08_training`
- `09_spatial` → `09_spatial` (no change)

### System Integration Updates
- **MODULE_TO_CHECKPOINT mapping**: Updated in tito/commands/export.py
- **Test directories**: Renamed module_XX directories to match the new numbers
- **Documentation**: Updated all references in MD files and agent configurations
- **CLI integration**: Updated next-steps suggestions for the proper flow

### Agent Configuration Updates
- **Quality Assurance**: Updated module audit status with the new numbers
- **Module Developer**: Updated work tracking with the new sequence
- **Documentation**: Updated MASTER_PLAN_OF_RECORD.md with the beautiful progression

## Educational Benefits

1. **Inevitable Discovery**: Each module naturally leads to the next
2. **Cognitive Load**: Concepts are introduced exactly when needed
3. **Motivation**: Students understand WHY each tool is necessary
4. **Synthesis**: Everything flows toward complete ML systems understanding
5. **Professional Alignment**: Matches real ML engineering workflows

## Quality Assurance

- ✅ All CLI commands still function
- ✅ Checkpoint system mappings updated
- ✅ Documentation consistency maintained
- ✅ Test directory structure aligned
- ✅ Agent configurations synchronized

**Impact**: This reordering transforms TinyTorch from a collection of modules into a coherent educational journey where each step naturally motivates the next, creating optimal conditions for deep learning systems understanding.
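For illustration only: the MODULE_TO_CHECKPOINT mapping mentioned above presumably pairs each renumbered module directory with its checkpoint. A hypothetical sketch; the actual names in tito/commands/export.py may differ:

```python
# Hypothetical shape of the updated mapping; the real checkpoint
# names in tito/commands/export.py may differ.
MODULE_TO_CHECKPOINT = {
    "05_losses":     "checkpoint_05",
    "06_optimizers": "checkpoint_06",
    "07_autograd":   "checkpoint_07",
    "08_training":   "checkpoint_08",
    "09_spatial":    "checkpoint_09",
    "10_dataloader": "checkpoint_10",
}
```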
# Training Systems Module Ordering Analysis

## The Core Question
Should DataLoader come BEFORE or AFTER Training? Let's analyze both directions.
## Option 1: DataLoader BEFORE Training (Current)

**7. DataLoader → 8. Optimizers → 9. Spatial → 10. Training**
### Pros ✅
- Training uses real data from the start - More satisfying
- Batching is available - Training loop can show proper batching
- Real patterns - SGD/Adam work on actual data distributions
- No rework - Training module uses DataLoader immediately
### Cons ❌
- DataLoader without purpose - Students don't know WHY they need it yet
- Abstract introduction - Batching/shuffling seems arbitrary without training context
- Delayed gratification - Can't train anything after building DataLoader
## Option 2: DataLoader AFTER Training

**7. Optimizers → 8. Spatial → 9. Training → 10. DataLoader**
### Pros ✅
- Clear motivation - Students hit limits with toy data, THEN get DataLoader
- Natural progression - Simple → Complex data handling
- Pedagogical clarity - "Now let's scale to real datasets"
### Cons ❌
- Training module is limited - Can only use toy/synthetic data
- Rework needed - Module 10 updates training to use DataLoader
- Artificial limitation - Training without batching feels incomplete
## Option 3: Split Approach (RECOMMENDED)

**7. Optimizers → 8. DataLoader → 9. Spatial → 10. Training**
### Why This Works Best 🎯
#### Module 7: Optimizers

```python
# Learn the algorithms on simple problems;
# no need for complex data yet.
def optimize_parabola():
    w = 5.0
    for _ in range(100):
        grad = 2 * w  # gradient of f(w) = w^2
        w = sgd_step(w, grad)
```
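For reference, `sgd_step` can be a one-liner; a minimal sketch assuming plain scalar parameters (the module's actual signature may differ):

```python
def sgd_step(w, grad, lr=0.1):
    """Vanilla SGD: move against the gradient, scaled by the learning rate."""
    return w - lr * grad
```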
#### Module 8: DataLoader (RIGHT AFTER OPTIMIZERS)

```python
# Now that we have optimizers, we need data!
# Introduce batching WITH IMMEDIATE USE:
# a simple example showing WHY we need it.
dataset = SimpleDataset(10000)  # Too big to process all at once!
loader = DataLoader(dataset, batch_size=32)

# Immediately use it with SGD
for batch in loader:
    # Show how optimizers work with batches
    loss = compute_loss(batch)
    sgd.step(loss)
```
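Why can a batch of 32 stand in for the whole dataset? Because the mini-batch mean gradient is an unbiased estimate of the full-data gradient. A minimal NumPy sketch for a linear model (names here are illustrative, not the module's API):

```python
import numpy as np

def batch_gradient(X, y, w):
    """Gradient of (1/2n) * ||X @ w - y||^2 over one mini-batch."""
    residual = X @ w - y            # batch predictions minus targets
    return X.T @ residual / len(X)  # average over the batch samples
```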
#### Module 9: Spatial

```python
# Build CNNs, using the DataLoader for testing
cifar = CIFAR10Dataset()
loader = DataLoader(cifar, batch_size=1)

# Test convolution on real images
for image, label in loader:
    output = conv2d(image)
    visualize(output)  # See feature maps!
```
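For intuition, the core of `conv2d` is just a sliding dot product. A minimal single-channel NumPy sketch (no stride, padding, or batching; like most frameworks, it is technically cross-correlation):

```python
import numpy as np

def conv2d_single(image, kernel):
    """Valid cross-correlation of a 2-D image with a 2-D kernel."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Dot product of the kernel with one image patch
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out
```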
#### Module 10: Training (EVERYTHING COMES TOGETHER)

```python
# Full training loop with all components
model = CNN()                           # From Module 9
optimizer = Adam(model.parameters())    # From Module 7
train_loader = DataLoader(cifar_train)  # From Module 8
val_loader = DataLoader(cifar_val)

# Complete training pipeline
for epoch in range(10):
    for batch, labels in train_loader:
        loss = compute_loss(model.forward(batch), labels)
        loss.backward()   # compute gradients
        optimizer.step()  # update parameters
```
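A matching validation pass reuses the same pieces without weight updates (a sketch; `count_correct` is a hypothetical helper):

```python
# Evaluate on held-out data: forward passes only, no optimizer steps
correct, total = 0, 0
for batch, labels in val_loader:
    predictions = model.forward(batch)
    correct += count_correct(predictions, labels)  # hypothetical helper
    total += len(labels)
print(f"Validation accuracy: {correct / total:.1%}")
```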
## The Winner: Modified Current Order

**7. Optimizers → 8. DataLoader → 9. Spatial → 10. Training**
This is optimal because:
- Optimizers (Module 7): Learn the algorithms without data complexity
- DataLoader (Module 8): Introduce right when needed for optimizer testing
- Spatial (Module 9): Use DataLoader to visualize CNN features on real images
- Training (Module 10): Everything culminates in complete pipeline
## Key Insight: DataLoader as the Bridge 🌉
DataLoader should come AFTER learning optimizers but BEFORE building architectures. This way:
- Students understand gradient descent first
- Then learn "how do we feed data to optimizers?"
- Then build architectures that process this data
- Finally put it all together in training
## Concrete Examples Showing the Flow

### Module 7 (Optimizers) - No DataLoader Needed
```python
# Optimize simple analytic functions
def rosenbrock(x, y):
    return (1 - x)**2 + 100 * (y - x**2)**2

# Students implement SGD and Adam
# (x and y are the trainable parameters)
optimizer = SGD([x, y], lr=0.01)
for _ in range(1000):
    loss = rosenbrock(x, y)
    loss.backward()   # compute gradients
    optimizer.step()  # apply the update
```
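The analytic gradient makes a handy sanity check for hand-implemented optimizers; it follows directly from the formula above:

```python
def rosenbrock_grad(x, y):
    """Analytic gradient of (1 - x)**2 + 100 * (y - x**2)**2."""
    dx = -2 * (1 - x) - 400 * x * (y - x**2)
    dy = 200 * (y - x**2)
    return dx, dy
```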
### Module 8 (DataLoader) - Immediate Use Case

```python
# NOW we need to handle real data
mnist = MNISTDataset()  # 60,000 images!

# Without a DataLoader (bad): one hand-indexed sample per step,
# so updates are slow and gradients are noisy
for i in range(60000):
    optimizer.step(mnist[i])

# With a DataLoader (good)
loader = DataLoader(mnist, batch_size=32)
for batch in loader:  # Only 32 samples in memory at a time
    optimizer.step(batch)
```
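To demystify the abstraction: the core of a DataLoader is tiny. A minimal sketch, assuming the dataset supports `len()` and slicing (the real module adds shuffling and more):

```python
class MiniDataLoader:
    """Barebones batcher: yields fixed-size slices of a dataset."""

    def __init__(self, dataset, batch_size=32):
        self.dataset = dataset
        self.batch_size = batch_size

    def __iter__(self):
        for start in range(0, len(self.dataset), self.batch_size):
            yield self.dataset[start:start + self.batch_size]
```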
### Module 9 (Spatial) - DataLoader for Visualization

```python
# Use the DataLoader to explore convolutions
loader = DataLoader(CIFAR10(), batch_size=1)
conv = Conv2d(3, 16, kernel_size=3)
for image, _ in loader:
    features = conv(image)
    plot_feature_maps(features)  # See what CNNs learn!
```
### Module 10 (Training) - Full Integration

```python
# Everything they've built comes together
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64)

trainer = Trainer(
    model=CNN(),                # Module 9
    optimizer=Adam(),           # Module 7
    train_loader=train_loader,  # Module 8
    val_loader=val_loader,      # Module 8
)
trainer.fit(epochs=20)  # 75% on CIFAR-10!
```
## Final Recommendation
Keep a modified version of the current order, but ensure:
- Module 7 (Optimizers): Focus on algorithms, not data
- Module 8 (DataLoader): Immediately show WHY it's needed for optimizers
- Module 9 (Spatial): Use DataLoader for CNN exploration
- Module 10 (Training): Grand synthesis of all components
This way DataLoader is introduced exactly when students need it, and they use it throughout modules 8-10!