# Training Systems Module Ordering Analysis
## The Core Question

Should DataLoader come BEFORE or AFTER Training? Let's analyze both directions.
## Option 1: DataLoader BEFORE Training (Current)

```
7. DataLoader → 8. Optimizers → 9. Spatial → 10. Training
```
### Pros ✅

- **Training uses real data from the start** - more satisfying for students
- **Batching is available** - the training loop can demonstrate proper batching
- **Real patterns** - SGD/Adam work on actual data distributions
- **No rework** - the Training module uses DataLoader immediately
### Cons ❌

- **DataLoader without purpose** - students don't yet know WHY they need it
- **Abstract introduction** - batching and shuffling seem arbitrary without a training context
- **Delayed gratification** - students can't train anything right after building DataLoader
## Option 2: DataLoader AFTER Training

```
7. Optimizers → 8. Spatial → 9. Training → 10. DataLoader
```
### Pros ✅

- **Clear motivation** - students hit the limits of toy data, THEN get DataLoader
- **Natural progression** - simple → complex data handling
- **Pedagogical clarity** - "Now let's scale to real datasets"
### Cons ❌

- **Training module is limited** - it can only use toy/synthetic data (see the sketch below)
- **Rework needed** - Module 10 must go back and update training to use DataLoader
- **Artificial limitation** - training without batching feels incomplete
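To make the first con concrete, the sketch below shows what a pre-DataLoader training loop is stuck with: a synthetic dataset small enough to generate and process in one shot. The logistic-regression setup here is purely illustrative, not taken from the modules:

```python
import numpy as np

# Illustrative only: training under Option 2, before DataLoader exists.
# The data must be synthetic and small enough to fit in memory at once.
X = np.random.randn(100, 2)                  # 100 toy data points
y = (X[:, 0] + X[:, 1] > 0).astype(float)    # synthetic labels

w = np.zeros(2)
lr = 0.1
for _ in range(50):
    pred = 1 / (1 + np.exp(-X @ w))          # logistic model, full batch
    grad = X.T @ (pred - y) / len(y)         # gradient over ALL samples
    w -= lr * grad                           # no batching, no shuffling

# Scaling this to 60,000 MNIST images is exactly the rework that
# Module 10 would then have to absorb.
```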
## Option 3: Split Approach (RECOMMENDED)

```
7. Optimizers → 8. DataLoader → 9. Spatial → 10. Training
```

### Why This Works Best 🎯
#### Module 7: Optimizers

```python
# Learn the algorithms on simple closed-form problems;
# no complex data handling is needed yet.
def optimize_parabola(lr=0.1):
    w = 5.0
    for _ in range(100):
        grad = 2 * w       # derivative of f(w) = w^2
        w -= lr * grad     # one SGD step
    return w               # converges toward the minimum at w = 0
```
#### Module 8: DataLoader (RIGHT AFTER OPTIMIZERS)

```python
# Now that we have optimizers, we need data!
# Introduce batching WITH IMMEDIATE USE: a simple example
# showing WHY we need batching.
dataset = SimpleDataset(10_000)              # too big to process in one shot
loader = DataLoader(dataset, batch_size=32)

# Immediately use it with SGD: one update per mini-batch.
for batch in loader:
    loss = compute_loss(batch)   # forward pass on just this batch
    loss.backward()              # gradients for this batch
    sgd.step()                   # apply the update
```
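For orientation, a minimal version of what students might build here looks like the sketch below. The constructor mirrors the usage above, but the real TinyTorch DataLoader API is not guaranteed to match:

```python
import numpy as np

# Minimal sketch of a batching loader; the actual Module 8 API may differ.
class DataLoader:
    def __init__(self, dataset, batch_size=32, shuffle=False):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        indices = np.arange(len(self.dataset))
        if self.shuffle:
            np.random.shuffle(indices)       # new sample order each epoch
        for start in range(0, len(indices), self.batch_size):
            batch = indices[start:start + self.batch_size]
            yield [self.dataset[i] for i in batch]
```

The key design point is that only one batch of samples is materialized per iteration, which is what makes the 10,000-example dataset above tractable.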
#### Module 9: Spatial

```python
# Build CNNs, using the DataLoader for testing.
cifar = CIFAR10Dataset()
loader = DataLoader(cifar, batch_size=1)   # one image at a time
conv = Conv2d(3, 16, kernel_size=3)

# Test the convolution on real images.
for image, label in loader:
    output = conv(image)
    visualize(output)  # see the feature maps!
```
#### Module 10: Training (EVERYTHING COMES TOGETHER)

```python
# Full training loop with all the components.
model = CNN()                            # from Module 9
optimizer = Adam(model.parameters())     # from Module 7
train_loader = DataLoader(cifar_train)   # from Module 8
val_loader = DataLoader(cifar_val)

# Complete training pipeline.
for epoch in range(10):
    for images, labels in train_loader:
        loss = compute_loss(model.forward(images), labels)  # assumed loss helper
        optimizer.zero_grad()
        loss.backward()    # compute gradients
        optimizer.step()   # apply the update
```
## The Winner: Modified Current Order

```
7. Optimizers → 8. DataLoader → 9. Spatial → 10. Training
```
This ordering is optimal because:

- **Optimizers (Module 7):** learn the algorithms without data complexity
- **DataLoader (Module 8):** introduced right when it's needed, for optimizer testing
- **Spatial (Module 9):** use the DataLoader to visualize CNN features on real images
- **Training (Module 10):** everything culminates in the complete pipeline
## Key Insight: DataLoader as the Bridge 🌉

DataLoader should come AFTER learning optimizers but BEFORE building architectures. This way:

1. Students understand gradient descent first
2. Then they ask "how do we feed data to optimizers?"
3. Then they build architectures that process this data
4. Finally, they put it all together in training
## Concrete Examples Showing the Flow

### Module 7 (Optimizers) - No DataLoader Needed
```python
# Optimize simple test functions first.
def rosenbrock(x, y):
    return (1 - x)**2 + 100 * (y - x**2)**2

# Students implement SGD and Adam; x and y are trainable parameters.
optimizer = SGD([x, y], lr=0.01)
for _ in range(1000):
    loss = rosenbrock(x, y)
    loss.backward()     # gradients with respect to x and y
    optimizer.step()    # one update
```
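The `SGD` interface assumed above can be sketched as follows; this is a simplified stand-in for the students' Module 7 implementation, assuming each parameter object exposes `.data` and `.grad` attributes:

```python
# Simplified sketch of the SGD optimizer assumed above; assumes each
# parameter object carries .data (value) and .grad (current gradient).
class SGD:
    def __init__(self, params, lr=0.01):
        self.params = list(params)
        self.lr = lr

    def step(self):
        for p in self.params:
            p.data -= self.lr * p.grad   # vanilla gradient descent update

    def zero_grad(self):
        for p in self.params:
            p.grad = 0.0                 # reset before the next backward()
```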
### Module 8 (DataLoader) - Immediate Use Case

```python
# NOW we need to handle real data.
mnist = MNISTDataset()   # 60,000 images!

# Without DataLoader (bad): a hand-rolled loop with one noisy update
# per sample and no shuffling or batching.
for i in range(len(mnist)):
    loss = compute_loss(mnist[i])
    loss.backward()
    optimizer.step()

# With DataLoader (good): only 32 samples processed per update.
loader = DataLoader(mnist, batch_size=32)
for batch in loader:
    loss = compute_loss(batch)
    loss.backward()
    optimizer.step()
```
### Module 9 (Spatial) - DataLoader for Visualization

```python
# Use the DataLoader to explore convolutions.
loader = DataLoader(CIFAR10(), batch_size=1)
conv = Conv2d(3, 16, kernel_size=3)
for image, _ in loader:
    features = conv(image)
    plot_feature_maps(features)  # see what CNNs learn!
```
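`plot_feature_maps` is left undefined in the module; one plausible implementation, assuming `features` is (or converts to) a NumPy array of shape `(1, channels, height, width)`, is:

```python
import matplotlib.pyplot as plt

# Hypothetical helper: tile the first 16 feature maps in a 4x4 grid.
# Assumes `features` has shape (1, channels, height, width).
def plot_feature_maps(features, n_maps=16):
    maps = features[0]                        # drop the batch dimension
    fig, axes = plt.subplots(4, 4, figsize=(8, 8))
    for ax, fmap in zip(axes.flat, maps[:n_maps]):
        ax.imshow(fmap, cmap="gray")          # one channel per panel
        ax.axis("off")
    plt.show()
```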
### Module 10 (Training) - Full Integration

```python
# Everything they've built comes together.
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64)

model = CNN()                            # Module 9
trainer = Trainer(
    model=model,
    optimizer=Adam(model.parameters()),  # Module 7
    train_loader=train_loader,           # Module 8
    val_loader=val_loader,               # Module 8
)
trainer.fit(epochs=20)  # 75% on CIFAR-10!
```
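For a sense of what `Trainer.fit` hides, a rough internal sketch follows; the constructor mirrors the call above, while `compute_loss` and `accuracy` are assumed helpers rather than confirmed TinyTorch APIs:

```python
# Rough sketch of what Trainer.fit() might do internally; not the
# confirmed Module 10 implementation.
class Trainer:
    def __init__(self, model, optimizer, train_loader, val_loader):
        self.model = model
        self.optimizer = optimizer
        self.train_loader = train_loader
        self.val_loader = val_loader

    def fit(self, epochs):
        for epoch in range(epochs):
            # One pass over the training set.
            for images, labels in self.train_loader:
                loss = compute_loss(self.model.forward(images), labels)
                self.optimizer.zero_grad()
                loss.backward()
                self.optimizer.step()
            # Mean per-batch accuracy on the validation set after each epoch.
            accs = [accuracy(self.model.forward(x), y)
                    for x, y in self.val_loader]
            print(f"epoch {epoch}: val accuracy {sum(accs) / len(accs):.3f}")
```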
## Final Recommendation

Keep a modified version of the current order, but ensure:

- **Module 7 (Optimizers):** focus on the algorithms, not the data
- **Module 8 (DataLoader):** immediately show WHY it's needed for optimizers
- **Module 9 (Spatial):** use DataLoader for CNN exploration
- **Module 10 (Training):** the grand synthesis of all components

This way DataLoader is introduced exactly when students need it, and they use it throughout Modules 8-10!