MAJOR: Implement beautiful module progression through strategic reordering

This commit implements the pedagogically optimal "inevitable discovery" module progression based on expert validation and educational design principles.

## Module Reordering Summary

**Previous Order (Problems)**:
- 05_losses → 06_autograd → 07_dataloader → 08_optimizers → 09_spatial → 10_training
- Issues: Autograd before optimizers, DataLoader before training, scattered dependencies

**New Order (Beautiful Progression)**:
- 05_losses → 06_optimizers → 07_autograd → 08_training → 09_spatial → 10_dataloader
- Benefits: Each module creates inevitable need for the next

## Pedagogical Flow Achieved

- **05_losses** → "Need systematic weight updates" → **06_optimizers**
- **06_optimizers** → "Need automatic gradients" → **07_autograd**
- **07_autograd** → "Need systematic training" → **08_training**
- **08_training** → "MLPs hit limits on images" → **09_spatial**
- **09_spatial** → "Training is too slow" → **10_dataloader**

## Technical Changes

### Module Directory Renaming
- `06_autograd` → `07_autograd`
- `07_dataloader` → `10_dataloader`
- `08_optimizers` → `06_optimizers`
- `10_training` → `08_training`
- `09_spatial` → `09_spatial` (no change)

### System Integration Updates
- **MODULE_TO_CHECKPOINT mapping**: Updated in tito/commands/export.py (see the sketch after this list)
- **Test directories**: Renamed module_XX directories to match new numbers
- **Documentation**: Updated all references in MD files and agent configurations
- **CLI integration**: Updated next-steps suggestions for proper flow
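
For illustration, a sketch of what the renumbered `MODULE_TO_CHECKPOINT` mapping in `tito/commands/export.py` could look like after this change. The checkpoint values shown here are placeholders; only the module keys reflect the new ordering.

```python
# Illustrative sketch only: checkpoint identifiers are placeholders,
# the module keys follow the new ordering introduced in this commit.
MODULE_TO_CHECKPOINT = {
    "05_losses":     "05",
    "06_optimizers": "06",  # was 08_optimizers
    "07_autograd":   "07",  # was 06_autograd
    "08_training":   "08",  # was 10_training
    "09_spatial":    "09",  # unchanged
    "10_dataloader": "10",  # was 07_dataloader
}
```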

### Agent Configuration Updates
- **Quality Assurance**: Updated module audit status with new numbers
- **Module Developer**: Updated work tracking with new sequence
- **Documentation**: Updated MASTER_PLAN_OF_RECORD.md with beautiful progression

## Educational Benefits

1. **Inevitable Discovery**: Each module naturally leads to the next
2. **Cognitive Load**: Concepts introduced exactly when needed
3. **Motivation**: Students understand WHY each tool is necessary
4. **Synthesis**: Everything flows toward complete ML systems understanding
5. **Professional Alignment**: Matches real ML engineering workflows

## Quality Assurance

- All CLI commands still function
- Checkpoint system mappings updated
- Documentation consistency maintained
- Test directory structure aligned
- Agent configurations synchronized

**Impact**: This reordering transforms TinyTorch from a collection of modules into a coherent educational journey where each step naturally motivates the next, creating optimal conditions for deep learning systems understanding.
# Training Systems Module Ordering Analysis
## The Core Question
Should DataLoader come BEFORE or AFTER Training? Let's analyze both directions.
## Option 1: DataLoader BEFORE Training (Current)
```
7. DataLoader → 8. Optimizers → 9. Spatial → 10. Training
```
### Pros ✅
- **Training uses real data from the start** - More satisfying
- **Batching is available** - Training loop can show proper batching
- **Real patterns** - SGD/Adam work on actual data distributions
- **No rework** - Training module uses DataLoader immediately
### Cons ❌
- **DataLoader without purpose** - Students don't know WHY they need it yet
- **Abstract introduction** - Batching/shuffling seems arbitrary without training context
- **Delayed gratification** - Can't train anything after building DataLoader
## Option 2: DataLoader AFTER Training
```
7. Optimizers → 8. Spatial → 9. Training → 10. DataLoader
```
### Pros ✅
- **Clear motivation** - Students hit limits with toy data, THEN get DataLoader
- **Natural progression** - Simple → Complex data handling
- **Pedagogical clarity** - "Now let's scale to real datasets"
### Cons ❌
- **Training module is limited** - Can only use toy/synthetic data
- **Rework needed** - Module 10 updates training to use DataLoader
- **Artificial limitation** - Training without batching feels incomplete (see the toy-data sketch below)
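
To make these limitations concrete, here is a minimal, hypothetical sketch of training with only toy data held in memory: no Dataset, no batching, no shuffling. Names and values are illustrative, not part of any module.

```python
import numpy as np

# Tiny synthetic regression problem: y = 3x + noise, all in memory.
X = np.random.randn(200, 1)
y = 3 * X + 0.1 * np.random.randn(200, 1)

w = np.zeros((1, 1))
lr = 0.05
for epoch in range(50):
    pred = X @ w                      # forward pass over the WHOLE dataset
    grad = X.T @ (pred - y) / len(X)  # MSE gradient w.r.t. w
    w -= lr * grad                    # one full-batch SGD step per epoch
```

This works for 200 points, but the moment the data no longer fits in memory or needs shuffling and batching, the loop has to be rewritten, which is exactly the rework Option 2 defers to Module 10.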
## Option 3: Split Approach (RECOMMENDED)
```
7. Optimizers → 8. DataLoader → 9. Spatial → 10. Training
```
### Why This Works Best 🎯
#### Module 7: Optimizers
```python
# Learn algorithms on simple problems
# No need for complex data yet
def optimize_parabola():
    w = 5.0
    for _ in range(100):
        grad = 2 * w  # derivative of f(w) = w^2
        w = sgd_step(w, grad)
    return w
```
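
For reference, a one-line sketch of the `sgd_step` helper assumed above; the signature and learning rate are illustrative, not the module's actual API.

```python
def sgd_step(w, grad, lr=0.1):
    # Vanilla gradient descent: move the parameter against its gradient.
    return w - lr * grad
```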
#### Module 8: DataLoader (RIGHT AFTER OPTIMIZERS)
```python
# Now that we have optimizers, we need data!
# Introduce batching WITH IMMEDIATE USE
# Simple example showing WHY we need batching
dataset = SimpleDataset(10000) # Too big for memory!
loader = DataLoader(dataset, batch_size=32)
# Immediately use with SGD
for batch in loader:
# Show how optimizers work with batches
loss = compute_loss(batch)
sgd.step(loss)
```
#### Module 9: Spatial
```python
# Build CNNs using DataLoader for testing
cifar = CIFAR10Dataset()
loader = DataLoader(cifar, batch_size=1)
# Test convolution on real images
for image, label in loader:
output = conv2d(image)
visualize(output) # See feature maps!
```
#### Module 10: Training (EVERYTHING COMES TOGETHER)
```python
# Full training loop with all components
model = CNN() # From Module 9
optimizer = Adam(model.parameters()) # From Module 7
train_loader = DataLoader(cifar_train) # From Module 8
val_loader = DataLoader(cifar_val)
# Complete training pipeline
for epoch in range(10):
for batch in train_loader:
loss = model.forward(batch)
optimizer.step(loss.backward())
```
## The Winner: Modified Current Order
```
7. Optimizers → 8. DataLoader → 9. Spatial → 10. Training
```
### This is optimal because:
1. **Optimizers (Module 7)**: Learn the algorithms without data complexity
2. **DataLoader (Module 8)**: Introduce right when needed for optimizer testing
3. **Spatial (Module 9)**: Use DataLoader to visualize CNN features on real images
4. **Training (Module 10)**: Everything culminates in complete pipeline
### Key Insight: DataLoader as the Bridge 🌉
DataLoader should come AFTER learning optimizers but BEFORE building architectures. This way:
- Students understand gradient descent first
- Then learn "how do we feed data to optimizers?"
- Then build architectures that process this data
- Finally put it all together in training
## Concrete Examples Showing the Flow
### Module 7 (Optimizers) - No DataLoader Needed
```python
# Optimize simple functions
def rosenbrock(x, y):
    return (1 - x)**2 + 100 * (y - x**2)**2

# Students implement SGD, Adam
optimizer = SGD([x, y], lr=0.01)
for _ in range(1000):
    loss = rosenbrock(x, y)
    optimizer.step(loss.backward())
```
### Module 8 (DataLoader) - Immediate Use Case
```python
# NOW we need to handle real data
mnist = MNISTDataset() # 60,000 images!
# Without DataLoader (bad)
for i in range(60000): # Memory explosion!
optimizer.step(mnist[i])
# With DataLoader (good)
loader = DataLoader(mnist, batch_size=32)
for batch in loader: # Only 32 in memory
optimizer.step(batch)
```
### Module 9 (Spatial) - DataLoader for Visualization
```python
# Use DataLoader to explore convolutions
loader = DataLoader(CIFAR10(), batch_size=1)
conv = Conv2d(3, 16, kernel_size=3)
for image, _ in loader:
features = conv(image)
plot_feature_maps(features) # See what CNNs learn!
```
### Module 10 (Training) - Full Integration
```python
# Everything they've built comes together
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64)
trainer = Trainer(
model=CNN(), # Module 9
optimizer=Adam(), # Module 7
train_loader=train_loader, # Module 8
val_loader=val_loader # Module 8
)
trainer.fit(epochs=20) # 75% on CIFAR-10!
```
## Final Recommendation
Keep a modified version of current order but ensure:
1. **Module 7 (Optimizers)**: Focus on algorithms, not data
2. **Module 8 (DataLoader)**: Immediately show WHY it's needed for optimizers
3. **Module 9 (Spatial)**: Use DataLoader for CNN exploration
4. **Module 10 (Training)**: Grand synthesis of all components
This way DataLoader is introduced exactly when students need it, and they use it throughout modules 8-10!