File: TinyTorch/docs/training-systems-ordering-analysis.md
Vijay Janapa Reddi 2f23f757e7 MAJOR: Implement beautiful module progression through strategic reordering
This commit implements the pedagogically optimal "inevitable discovery" module progression based on expert validation and educational design principles.

## Module Reordering Summary

**Previous Order (Problems)**:
- 05_losses → 06_autograd → 07_dataloader → 08_optimizers → 09_spatial → 10_training
- Issues: Autograd before optimizers, DataLoader before training, scattered dependencies

**New Order (Beautiful Progression)**:
- 05_losses → 06_optimizers → 07_autograd → 08_training → 09_spatial → 10_dataloader
- Benefits: Each module creates inevitable need for the next

## Pedagogical Flow Achieved

- **05_losses** → "Need systematic weight updates" → **06_optimizers**
- **06_optimizers** → "Need automatic gradients" → **07_autograd**
- **07_autograd** → "Need systematic training" → **08_training**
- **08_training** → "MLPs hit limits on images" → **09_spatial**
- **09_spatial** → "Training is too slow" → **10_dataloader**

## Technical Changes

### Module Directory Renaming
- `06_autograd` → `07_autograd`
- `07_dataloader` → `10_dataloader`
- `08_optimizers` → `06_optimizers`
- `10_training` → `08_training`
- `09_spatial` → `09_spatial` (no change)

### System Integration Updates
- **MODULE_TO_CHECKPOINT mapping**: Updated in tito/commands/export.py (sketched after this list)
- **Test directories**: Renamed module_XX directories to match new numbers
- **Documentation**: Updated all references in MD files and agent configurations
- **CLI integration**: Updated next-steps suggestions for proper flow
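
For concreteness, here is a minimal sketch of what the updated mapping in `tito/commands/export.py` plausibly looks like; the exact keys and values are assumptions inferred from the rename list above, not the actual source:

```python
# Hypothetical shape of the MODULE_TO_CHECKPOINT mapping after reordering
# (keys/values inferred from the directory renames; the real dict lives
# in tito/commands/export.py and may differ).
MODULE_TO_CHECKPOINT = {
    "05_losses":     5,
    "06_optimizers": 6,   # was 08_optimizers
    "07_autograd":   7,   # was 06_autograd
    "08_training":   8,   # was 10_training
    "09_spatial":    9,   # unchanged
    "10_dataloader": 10,  # was 07_dataloader
}
```

Note also that if the test directories are named exactly `module_06` … `module_10`, their renames form a cycle (06 → 07 → 10 → 08 → 06) and need a temporary name to avoid collisions, unlike the source module directories, whose full names stay unique.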

### Agent Configuration Updates
- **Quality Assurance**: Updated module audit status with new numbers
- **Module Developer**: Updated work tracking with new sequence
- **Documentation**: Updated MASTER_PLAN_OF_RECORD.md with beautiful progression

## Educational Benefits

1. **Inevitable Discovery**: Each module naturally leads to the next
2. **Cognitive Load**: Concepts introduced exactly when needed
3. **Motivation**: Students understand WHY each tool is necessary
4. **Synthesis**: Everything flows toward complete ML systems understanding
5. **Professional Alignment**: Matches real ML engineering workflows

## Quality Assurance

- All CLI commands still function
- Checkpoint system mappings updated
- Documentation consistency maintained
- Test directory structure aligned
- Agent configurations synchronized
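
One way to keep these guarantees from silently regressing is a small consistency test. A minimal sketch, assuming a top-level `modules/` directory and importing the mapping from the export command (the test name and layout are illustrative):

```python
# Hypothetical consistency check: every checkpoint entry should have a
# matching module directory, so renames can't desynchronize the two.
from pathlib import Path

from tito.commands.export import MODULE_TO_CHECKPOINT

def test_module_dirs_match_checkpoint_mapping():
    module_dirs = {p.name for p in Path("modules").iterdir() if p.is_dir()}
    assert set(MODULE_TO_CHECKPOINT) <= module_dirs
```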

**Impact**: This reordering transforms TinyTorch from a collection of modules into a coherent educational journey where each step naturally motivates the next, creating optimal conditions for deep learning systems understanding.
2025-09-24 15:56:47 -04:00


# Training Systems Module Ordering Analysis

## The Core Question

Should DataLoader come BEFORE or AFTER Training? Let's analyze both directions.

## Option 1: DataLoader BEFORE Training (Current)

7. DataLoader → 8. Optimizers → 9. Spatial → 10. Training

### Pros

  • Training uses real data from the start - More satisfying
  • Batching is available - Training loop can show proper batching
  • Real patterns - SGD/Adam work on actual data distributions
  • No rework - Training module uses DataLoader immediately

Cons

  • DataLoader without purpose - Students don't know WHY they need it yet
  • Abstract introduction - Batching/shuffling seems arbitrary without training context
  • Delayed gratification - Can't train anything after building DataLoader

## Option 2: DataLoader AFTER Training

7. Optimizers → 8. Spatial → 9. Training → 10. DataLoader

### Pros

  • Clear motivation - Students hit limits with toy data, THEN get DataLoader
  • Natural progression - Simple → Complex data handling
  • Pedagogical clarity - "Now let's scale to real datasets"

### Cons

  • Training module is limited - Can only use toy/synthetic data
  • Rework needed - Module 10 updates training to use DataLoader
  • Artificial limitation - Training without batching feels incomplete
## Option 3: DataLoader Between Optimizers and Spatial

7. Optimizers → 8. DataLoader → 9. Spatial → 10. Training

### Why This Works Best 🎯

#### Module 7: Optimizers

```python
# Learn the algorithms on simple problems;
# no need for complex data yet
def sgd_step(w, grad, lr=0.1):
    return w - lr * grad

def optimize_parabola():
    w = 5.0
    for _ in range(100):
        grad = 2 * w  # derivative of f(w) = w^2
        w = sgd_step(w, grad)
    return w  # converges toward the minimum at w = 0
```

#### Module 8: DataLoader (RIGHT AFTER OPTIMIZERS)

```python
# Now that we have optimizers, we need data!
# Introduce batching WITH IMMEDIATE USE

# Simple example showing WHY we need batching
dataset = SimpleDataset(10000)  # Too big for memory!
loader = DataLoader(dataset, batch_size=32)

# Immediately use with SGD
for batch in loader:
    # Show how optimizers work with batches
    loss = compute_loss(batch)
    sgd.step(loss)
```

#### Module 9: Spatial

```python
# Build CNNs using DataLoader for testing
cifar = CIFAR10Dataset()
loader = DataLoader(cifar, batch_size=1)

# Test convolution on real images
for image, label in loader:
    output = conv2d(image)
    visualize(output)  # See feature maps!
```

#### Module 10: Training (EVERYTHING COMES TOGETHER)

```python
# Full training loop with all components
model = CNN()                           # From Module 9
optimizer = Adam(model.parameters())    # From Module 7
train_loader = DataLoader(cifar_train)  # From Module 8
val_loader = DataLoader(cifar_val)

# Complete training pipeline
for epoch in range(10):
    for batch in train_loader:
        loss = model.forward(batch)
        optimizer.step(loss.backward())
```

## The Winner: Modified Current Order

7. Optimizers → 8. DataLoader → 9. Spatial → 10. Training

This is optimal because:

  1. Optimizers (Module 7): Learn the algorithms without data complexity
  2. DataLoader (Module 8): Introduce right when needed for optimizer testing
  3. Spatial (Module 9): Use DataLoader to visualize CNN features on real images
  4. Training (Module 10): Everything culminates in complete pipeline

## Key Insight: DataLoader as the Bridge 🌉

DataLoader should come AFTER learning optimizers but BEFORE building architectures. This way:

  • Students understand gradient descent first
  • Then learn "how do we feed data to optimizers?"
  • Then build architectures that process this data
  • Finally put it all together in training

## Concrete Examples Showing the Flow

### Module 7 (Optimizers) - No DataLoader Needed

```python
# Optimize simple test functions
def rosenbrock(x, y):
    return (1 - x)**2 + 100 * (y - x**2)**2

# Students implement SGD, Adam
# (x and y are trainable scalar parameters)
optimizer = SGD([x, y], lr=0.01)
for _ in range(1000):
    loss = rosenbrock(x, y)
    optimizer.step(loss.backward())
```

### Module 8 (DataLoader) - Immediate Use Case

```python
# NOW we need to handle real data
mnist = MNISTDataset()  # 60,000 images!

# Without a DataLoader (bad): everything at once
all_images = [mnist[i] for i in range(60000)]  # Memory explosion!
optimizer.step(compute_loss(all_images))

# With a DataLoader (good)
loader = DataLoader(mnist, batch_size=32)
for batch in loader:  # Only 32 in memory
    optimizer.step(compute_loss(batch))
```

### Module 9 (Spatial) - DataLoader for Visualization

```python
# Use DataLoader to explore convolutions
loader = DataLoader(CIFAR10(), batch_size=1)
conv = Conv2d(3, 16, kernel_size=3)

for image, _ in loader:
    features = conv(image)
    plot_feature_maps(features)  # See what CNNs learn!
```

### Module 10 (Training) - Full Integration

```python
# Everything they've built comes together
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64)

model = CNN()                              # Module 9
trainer = Trainer(
    model=model,
    optimizer=Adam(model.parameters()),    # Module 7
    train_loader=train_loader,             # Module 8
    val_loader=val_loader,                 # Module 8
)

trainer.fit(epochs=20)  # 75% on CIFAR-10!
```

## Final Recommendation

Keep a modified version of the current order, but ensure:

  1. Module 7 (Optimizers): Focus on algorithms, not data
  2. Module 8 (DataLoader): Immediately show WHY it's needed for optimizers
  3. Module 9 (Spatial): Use DataLoader for CNN exploration
  4. Module 10 (Training): Grand synthesis of all components

This way, DataLoader is introduced exactly when students need it, and they use it throughout Modules 8-10!