# 🚀 TinyTorch Final Module Plan: 17 Modules to ML Systems Mastery
## Overview: Four Learning Phases
**Phase 1: Foundation (Modules 1-5)** → Unlock Inference Examples
**Phase 2: Training & Vision (Modules 6-10)** → Unlock CNN Training
**Phase 3: Language (Modules 11-14)** → Unlock TinyGPT
**Phase 4: Systems Optimization (Modules 15-17)** → Unlock Competition
---
## 📚 Phase 1: Foundation - "Look What You Can Already Do!"
### Module 01: Setup
**What Students Build:**
- Virtual environment configuration
- Rich CLI for beautiful progress tracking
- Testing infrastructure
- Development tools (debugger, profiler stubs)
**Systems Concepts:**
- Development environment best practices
- Dependency management
- Testing frameworks
### Module 02: Tensor
**What Students Build:**
- N-dimensional array class
- Broadcasting operations
- Memory-efficient views and slicing
- Basic math operations (+, -, *, /)
**Systems Concepts:**
- Memory layout (row-major vs column-major)
- Cache efficiency
- Vectorization opportunities
- O(1) vs O(N) operations
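To make the memory points concrete, here is a minimal sketch of a NumPy-backed tensor; the class name and methods are illustrative, not TinyTorch's actual API. Broadcasting comes almost for free from NumPy, and slicing returns a view of the same buffer rather than a copy.
```python
import numpy as np

class Tensor:
    """Illustrative N-dimensional array wrapper backed by NumPy."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float32)

    @property
    def shape(self):
        return self.data.shape

    def __add__(self, other):
        # NumPy handles broadcasting (e.g. (3, 4) + (4,)) for us
        other = other.data if isinstance(other, Tensor) else other
        return Tensor(self.data + other)

    def __getitem__(self, idx):
        # Slicing returns a view of the same float32 buffer: O(1), no copy
        return Tensor(self.data[idx])

# Usage: broadcast a row vector across a matrix
a = Tensor(np.ones((3, 4)))
b = Tensor(np.arange(4))
print((a + b).shape)   # (3, 4)
```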
### Module 03: Activations
**What Students Build:**
- ReLU, Sigmoid, Tanh, Softmax
- Backward pass for each activation
- Numerical stability (LogSoftmax)
**Systems Concepts:**
- Numerical stability (overflow/underflow)
- Computational complexity per activation
- Memory requirements (in-place vs copy)
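The numerical-stability point is easiest to see in code. Below is a hedged sketch (standalone NumPy functions, not the module's real API) of stable softmax and log-softmax using the max-subtraction / log-sum-exp trick.
```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x, axis=-1):
    # Subtract the max before exponentiating so exp() never overflows;
    # the result is mathematically identical.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

def log_softmax(x, axis=-1):
    # log-sum-exp trick: avoids both overflow in exp() and log(0)
    shifted = x - np.max(x, axis=axis, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))

logits = np.array([1000.0, 1001.0, 1002.0])   # a naive softmax would overflow here
print(softmax(logits))                         # approx [0.090, 0.245, 0.665]
```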
### Module 04: Layers
**What Students Build:**
- Module base class
- Parameter management
- Forward/backward protocol
- Layer composition patterns
**Systems Concepts:**
- Object-oriented design for ML
- Memory management for parameters
- Modular architecture benefits
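As one possible shape for this module, here is an illustrative `Module` base class (names hypothetical): parameters are discovered by walking the object's attributes and recursing into sub-modules, and `__call__` delegates to `forward`. Later modules subclass this pattern.
```python
import numpy as np

class Module:
    """Illustrative base class: parameter discovery plus the forward/__call__ protocol."""

    def parameters(self):
        # Walk this object's attributes; recurse into sub-modules so nested
        # networks report all their parameters with one call.
        params = []
        for value in vars(self).values():
            if isinstance(value, np.ndarray):
                params.append(value)
            elif isinstance(value, Module):
                params.extend(value.parameters())
        return params

    def __call__(self, x):
        return self.forward(x)

    def forward(self, x):
        raise NotImplementedError("subclasses implement forward()")
```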
### Module 05: Networks (Dense)
**What Students Build:**
- Linear/Dense layer
- Sequential container
- Basic neural network class
- Weight initialization
**Systems Concepts:**
- Matrix multiplication complexity (O(N²) for matrix-vector, O(N³) for naive matrix-matrix products)
- Parameter memory scaling
- Why initialization matters
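A minimal sketch of the Linear + Sequential pattern (illustrative names, simple fan-in-scaled "Xavier-style" initialization) shows why initialization is tied to layer width:
```python
import numpy as np

class Linear:
    """Illustrative dense layer: y = x @ W + b."""

    def __init__(self, in_features, out_features):
        # Scale by 1/sqrt(fan_in) so activations neither blow up nor vanish
        scale = np.sqrt(1.0 / in_features)
        self.W = np.random.uniform(-scale, scale, (in_features, out_features))
        self.b = np.zeros(out_features)

    def __call__(self, x):
        return x @ self.W + self.b

class Sequential:
    """Applies layers in order."""

    def __init__(self, *layers):
        self.layers = layers

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

mlp = Sequential(Linear(784, 128), Linear(128, 10))
out = mlp(np.random.randn(32, 784))   # batch of 32 flattened images
print(out.shape)                       # (32, 10)
```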
**🎉 UNLOCK: Inference Examples!**
- Run pretrained XOR network
- Run pretrained MNIST classifier
- Run pretrained CIFAR-10 CNN
- Students see their code actually works!
---
## 📚 Phase 2: Training & Vision - "Now Train Your Own!"
### Module 06: DataLoader
**What Students Build:**
- Dataset abstraction
- Batch sampling
- Shuffling and iteration
- CIFAR-10 loader
**Systems Concepts:**
- I/O bottlenecks
- Memory vs disk tradeoffs
- Prefetching and pipelining
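One way the loader might look (illustrative API, not the module's final interface): shuffle an index array each epoch and yield slices, so batching never copies the whole dataset.
```python
import numpy as np

class DataLoader:
    """Illustrative loader: shuffles indices each epoch and yields mini-batches."""

    def __init__(self, images, labels, batch_size=64, shuffle=True):
        self.images, self.labels = images, labels
        self.batch_size, self.shuffle = batch_size, shuffle

    def __iter__(self):
        order = np.arange(len(self.images))
        if self.shuffle:
            np.random.shuffle(order)          # new order every epoch
        for start in range(0, len(order), self.batch_size):
            idx = order[start:start + self.batch_size]
            yield self.images[idx], self.labels[idx]

# Usage with fake CIFAR-10-shaped data (32x32x3 images)
images = np.zeros((1_000, 32, 32, 3), dtype=np.uint8)
labels = np.zeros(1_000, dtype=np.int64)
for x, y in DataLoader(images, labels, batch_size=128):
    print(x.shape, y.shape)   # (128, 32, 32, 3) (128,)
    break
```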
### Module 07: Autograd
**What Students Build:**
- Computational graph
- Automatic differentiation
- Gradient accumulation
- Backward pass automation
**Systems Concepts:**
- Graph memory consumption
- Forward vs reverse mode AD
- Gradient checkpointing concepts
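Reverse-mode AD is easiest to grasp from a tiny scalar example. The sketch below follows the common "micrograd-style" pattern and is illustrative, not TinyTorch's actual implementation: each operation records its parents and a closure that pushes gradients backward, and `backward()` topologically sorts the graph before applying the chain rule in reverse.
```python
class Value:
    """Illustrative scalar autograd node (the tensor version generalizes this)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents       # edges of the computational graph
        self._backward_fn = None      # how to push gradients to parents

    def __mul__(self, other):
        out = Value(self.data * other.data, parents=(self, other))
        def backward_fn():
            # d(out)/d(self) = other, d(out)/d(other) = self; accumulate with +=
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward_fn = backward_fn
        return out

    def __add__(self, other):
        out = Value(self.data + other.data, parents=(self, other))
        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse
        order, visited = [], set()
        def visit(node):
            if node not in visited:
                visited.add(node)
                for p in node._parents:
                    visit(p)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn:
                node._backward_fn()

x, w, b = Value(2.0), Value(3.0), Value(1.0)
y = x * w + b          # y = 7
y.backward()
print(w.grad, x.grad)  # 2.0 3.0  (dy/dw = x, dy/dx = w)
```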
### Module 08: Optimizers
**What Students Build:**
- SGD with momentum
- Adam optimizer
- Learning rate scheduling
- Gradient clipping
**Systems Concepts:**
- Memory usage (Adam = 3× parameters!)
- Convergence rates
- Numerical stability in updates
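A hedged sketch of SGD with momentum (illustrative API) makes the memory point visible: momentum adds one extra buffer per parameter, and Adam would add two (first and second moments), which is where the roughly 3× figure above comes from.
```python
import numpy as np

class SGD:
    """Illustrative SGD with momentum: one velocity buffer per parameter."""

    def __init__(self, params, lr=0.01, momentum=0.9):
        self.params = params                               # list of np.ndarray
        self.velocity = [np.zeros_like(p) for p in params] # extra memory: 1x params
        self.lr, self.momentum = lr, momentum

    def step(self, grads):
        for p, v, g in zip(self.params, self.velocity, grads):
            v[:] = self.momentum * v - self.lr * g   # update velocity in place
            p += v                                   # update parameter in place

# Usage: two toy parameters and matching gradients
W, b = np.random.randn(4, 2), np.zeros(2)
opt = SGD([W, b], lr=0.1)
opt.step([np.ones_like(W), np.ones_like(b)])
```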
### Module 09: Training
**What Students Build:**
- Training loop
- Loss functions (MSE, CrossEntropy)
- Validation and metrics
- Checkpointing
**Systems Concepts:**
- Memory during training
- Gradient accumulation for large batches
- Disk I/O for checkpoints
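To tie loss, gradient, and update together, here is a self-contained toy loop: softmax regression on random data with a hand-derived gradient and a plain SGD step. It is illustrative only; in the module the autograd engine supplies the gradients.
```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean negative log-likelihood; max-subtraction keeps exp() stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 20)).astype(np.float32)
y = rng.integers(0, 4, size=256)
W = np.zeros((20, 4), dtype=np.float32)

for epoch in range(5):
    logits = X @ W
    loss = cross_entropy(logits, y)
    # Gradient of mean cross-entropy w.r.t. logits is (softmax - one_hot) / N
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(len(y)), y] -= 1.0
    W -= 0.1 * (X.T @ (probs / len(y)))          # plain SGD step
    print(f"epoch {epoch}: loss {loss:.3f}")
    # np.save("checkpoint_W.npy", W)             # simplest possible checkpoint
```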
### Module 10: Spatial (CNN)
**What Students Build:**
- Conv2d layer
- Pooling operations
- CNN architectures
- Image augmentation
**Systems Concepts:**
- Convolution complexity O(N²·K²·C_in·C_out)
- Memory footprint of feature maps
- Cache-friendly implementations
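A deliberately naive convolution makes the O(N²·K²·C_in·C_out) cost explicit: every output pixel is a dot product over a C_in × K × K patch. (Illustrative code; the module's real implementation would vectorize this.)
```python
import numpy as np

def conv2d(x, kernels):
    """Naive 'valid' convolution (cross-correlation, the DL convention), stride 1.
    x: (C_in, H, W), kernels: (C_out, C_in, K, K) -> (C_out, H-K+1, W-K+1)."""
    c_out, c_in, k, _ = kernels.shape
    _, h, w = x.shape
    out = np.zeros((c_out, h - k + 1, w - k + 1))
    for o in range(c_out):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                # One output pixel = dot product over a C_in x K x K patch
                out[o, i, j] = np.sum(x[:, i:i + k, j:j + k] * kernels[o])
    return out

x = np.random.randn(3, 32, 32)          # one CIFAR-10 image (CHW layout)
kernels = np.random.randn(16, 3, 3, 3)  # 16 output channels, 3x3 kernels
print(conv2d(x, kernels).shape)         # (16, 30, 30)
```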
**🎉 UNLOCK: CNN Training!**
- Train CNN on CIFAR-10
- Achieve 75% accuracy milestone
- Visualize learned features
---
## 📚 Phase 3: Language - "From Vision to Language!"
### Module 11: Tokenization
**What Students Build:**
- Character tokenizer
- BPE tokenizer basics
- Vocabulary management
- Padding and truncation
**Systems Concepts:**
- Memory efficiency of token representations
- Vocabulary size tradeoffs
- Tokenization speed considerations
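A character tokenizer fits in a few lines; the sketch below is illustrative (names hypothetical) and shows the two lookup tables that make encode/decode O(1) per character.
```python
class CharTokenizer:
    """Illustrative character-level tokenizer with a fixed vocabulary."""

    def __init__(self, text):
        chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(chars)}   # char -> id
        self.itos = {i: ch for ch, i in self.stoi.items()}  # id -> char

    @property
    def vocab_size(self):
        return len(self.stoi)

    def encode(self, s):
        return [self.stoi[ch] for ch in s]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)

tok = CharTokenizer("hello tinytorch")
ids = tok.encode("torch")
print(ids, tok.decode(ids), tok.vocab_size)
```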
### Module 12: Embeddings
**What Students Build:**
- Embedding layer
- Positional encodings
- Learned vs fixed embeddings
- Embedding initialization
**Systems Concepts:**
- Embedding table memory (vocab_size × dim)
- Sparse vs dense operations
- Cache locality in lookups
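An embedding lookup is just row indexing into a (vocab_size × dim) table, which is also exactly its memory cost. A hedged sketch, with fixed sinusoidal encodings as the non-learned positional alternative:
```python
import numpy as np

def sinusoidal_positions(seq_len, dim):
    """Fixed (non-learned) positional encodings in the style of the original Transformer."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(dim)[None, :]
    angle = pos / np.power(10000, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

vocab_size, dim = 1000, 64
table = np.random.randn(vocab_size, dim) * 0.02   # vocab_size x dim parameters
token_ids = np.array([5, 42, 7, 999])
x = table[token_ids] + sinusoidal_positions(len(token_ids), dim)   # lookup + position
print(x.shape)   # (4, 64)
```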
### Module 13: Attention
**What Students Build:**
- Scaled dot-product attention
- Multi-head attention
- Causal masking
- KV-cache basics
**Systems Concepts:**
- O(N²) attention complexity
- Memory bottlenecks in attention
- Why KV-cache matters
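The O(N²) cost is the (seq_len × seq_len) score matrix. A single-head sketch with causal masking (illustrative and unbatched):
```python
import numpy as np

def causal_attention(Q, K, V):
    """Illustrative scaled dot-product attention with a causal mask.
    Q, K, V: (seq_len, d_k). The (seq_len, seq_len) score matrix is the O(N^2) cost."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # (N, N) similarity scores
    mask = np.triu(np.ones_like(scores), k=1)    # 1s above the diagonal = "future" tokens
    scores = np.where(mask == 1, -1e9, scores)   # block attention to the future
    # Row-wise stable softmax
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                           # weighted sum of values

N, d_k = 8, 16
out = causal_attention(np.random.randn(N, d_k), np.random.randn(N, d_k), np.random.randn(N, d_k))
print(out.shape)   # (8, 16)
```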
### Module 14: Transformers
**What Students Build:**
- LayerNorm
- Transformer block
- Full GPT architecture
- Residual connections
**Systems Concepts:**
- Layer normalization stability
- Residual path gradient flow
- Transformer memory scaling
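The wiring matters more than the parts: pre-norm plus residual connections, applied twice per block. A minimal sketch with stand-in sublayers (the real block would plug in multi-head attention and an MLP):
```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each token's features to zero mean / unit variance (scale and shift omitted)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def transformer_block(x, attention, mlp):
    """Pre-norm residual pattern: x + sublayer(norm(x)), twice per block."""
    x = x + attention(layer_norm(x))   # residual path keeps gradients flowing
    x = x + mlp(layer_norm(x))
    return x

# Usage with stand-in sublayers, just to show the wiring
x = np.random.randn(8, 64)
out = transformer_block(x, attention=lambda h: h * 0.5, mlp=lambda h: h * 0.5)
print(out.shape)   # (8, 64)
```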
**🎉 UNLOCK: TinyGPT!**
- Train character-level language model
- Generate text
- Compare with vision models
---
## 🔥 Phase 4: Systems Optimization - "Make It Fast, Make It Small!"
### Module 15: Kernels
**What Students Build:**
- Fused operations (e.g., fused_relu_add)
- Matrix multiplication optimization
- Custom CUDA-like kernels (in NumPy)
- Operator fusion patterns
**Why Universal:**
- Works for MLPs, CNNs, and Transformers
- Reduces memory bandwidth usage
- Speeds up any model architecture
**Systems Concepts:**
- Memory bandwidth vs compute bound
- Kernel fusion benefits
- Cache optimization
- Vectorization with NumPy
**Performance Gains:**
- 2-5× speedup from fusion
- Memory bandwidth reduction
- Works on CPU (NumPy vectorization)
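NumPy cannot literally fuse loops into one kernel, but the spirit of fusion, avoiding intermediate temporaries and reusing buffers, can be shown. An illustrative sketch (function names hypothetical):
```python
import numpy as np

def relu_add_unfused(x, y):
    tmp = np.maximum(x, 0)   # materializes a temporary array: extra memory traffic
    return tmp + y           # second temporary for the result

def relu_add_fused(x, y, out=None):
    """'Fused' flavor: write relu(x) straight into one output buffer, then add in place."""
    out = np.maximum(x, 0, out=out)
    out += y
    return out

x, y = np.random.randn(1_000_000), np.random.randn(1_000_000)
buf = np.empty_like(x)
assert np.allclose(relu_add_unfused(x, y), relu_add_fused(x, y, out=buf))
```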
### Module 16: Compression
**What Students Build:**
- Quantization (INT8, INT4)
- Pruning (magnitude, structured)
- Knowledge distillation setup
- Model size reduction
**Why Universal:**
- Quantize any model (MLP/CNN/GPT)
- Prune any architecture
- Distill large to small
**Systems Concepts:**
- Precision vs accuracy tradeoffs
- Structured vs unstructured sparsity
- Compression ratios
- Inference speedup from quantization
**Performance Gains:**
- 4× size reduction (FP32 → INT8)
- 2× inference speedup
- 90% sparsity possible
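Symmetric INT8 quantization needs only one scale per tensor: map the largest-magnitude weight to ±127 and round. The sketch below is illustrative and shows the 4× size reduction directly:
```python
import numpy as np

def quantize_int8(w):
    """Illustrative symmetric per-tensor quantization: FP32 -> INT8 plus one scale."""
    scale = np.abs(w).max() / 127.0                          # largest weight maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

W = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(W)
err = np.abs(W - dequantize(q, scale)).max()
print(W.nbytes, "->", q.nbytes, f"bytes, max error {err:.4f}")   # 262144 -> 65536 (4x smaller)
```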
### Module 17: Competition - "The Grand Finale!"
**What Students Build:**
- KV-cache for transformers
- Dynamic batching
- Mixed precision training
- Model ensemble techniques
- All optimizations combined!
**Competition Elements:**
- **Leaderboard**: Real-time ranking
- **Metrics**: Accuracy, speed, model size
- **Constraints**: Max 10MB model, <100ms inference
- **Tasks**: CIFAR-10, MNIST, TinyGPT generation
**Systems Concepts:**
- KV-cache memory management
- Batch size vs latency tradeoffs
- Optimization stacking
- Production deployment considerations
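As a flavor of what "KV-cache memory management" means in practice, here is an illustrative single-head cache (names hypothetical): keys and values for past tokens are stored once, so each new token attends to the prefix without recomputing it. The cache itself costs max_seq_len × d_k × 2 floats per head per layer.
```python
import numpy as np

class KVCache:
    """Illustrative single-head KV-cache for autoregressive generation."""

    def __init__(self, max_seq_len, d_k):
        self.K = np.zeros((max_seq_len, d_k))
        self.V = np.zeros((max_seq_len, d_k))
        self.length = 0

    def append(self, k, v):
        self.K[self.length] = k
        self.V[self.length] = v
        self.length += 1

    def attend(self, q):
        # Attention of the new query against everything generated so far
        K, V = self.K[:self.length], self.V[:self.length]
        scores = K @ q / np.sqrt(len(q))
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ V

cache = KVCache(max_seq_len=128, d_k=16)
for _ in range(5):                       # pretend we generate 5 tokens
    k = v = q = np.random.randn(16)
    cache.append(k, v)
    out = cache.attend(q)
print(out.shape, cache.length)           # (16,) 5
```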
**🏆 GRAND FINALE:**
- Students submit optimized models
- Automatic evaluation on hidden test set
- Leaderboard shows:
  - Accuracy scores
  - Inference time
  - Model size
  - Memory usage
- Winners announced for:
  - Best accuracy
  - Fastest inference
  - Smallest model
  - Best accuracy/size ratio
---
## 🎯 Why This Structure Works
### Progressive Unlocking
1. **Modules 1-5**: Build foundation → Unlock inference (immediate gratification)
2. **Modules 6-10**: Add training → Unlock CNN training (real achievement)
3. **Modules 11-14**: Add language → Unlock TinyGPT (wow factor)
4. **Modules 15-17**: Optimize everything → Competition (epic finale)
### Universal Optimizations (Modules 15-17)
- **Not** architecture-specific
- Work on MLPs, CNNs, and Transformers
- Real production techniques
- Measurable improvements
### Competition as Culmination
- Uses EVERYTHING students built
- Competitive element drives engagement
- Multiple winning categories (not just accuracy)
- Shows real ML engineering tradeoffs
- Students optimize their own code!
### High Note Ending
- Module 15: "Make it fast!" (kernels)
- Module 16: "Make it small!" (compression)
- Module 17: "Make it production-ready!" (competition)
- Final message: "You built a complete ML framework and optimized it for production!"
---
## 📊 Module Complexity Progression
```
Complexity: ▁ ▂ ▃ ▄ ▄ ▅ ▅ ▆ ▆  ▇  ▇  █  █  █  █  █  █
Modules:    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
            └─Found.─┘└Training─┘└─Language─┘└Systems┘
Unlocks:            ↑          ↑           ↑        ↑
                Inference     CNN      TinyGPT  Competition
```
---
## 🏁 Student Journey Summary
**Week 1-2**: Foundation (Modules 1-5)
- "I built tensors and layers!"
- "I can run pretrained models!"
**Week 3-4**: Training (Modules 6-10)
- "I built autograd from scratch!"
- "I trained a CNN to 75% accuracy!"
**Week 5-6**: Language (Modules 11-14)
- "I built attention mechanisms!"
- "I have a working GPT!"
**Week 7**: Systems (Modules 15-17)
- "I optimized everything!"
- "I'm on the leaderboard!"
- "I built a complete, optimized ML framework!"
**Final Achievement**:
"I didn't just learn ML algorithms - I built the entire infrastructure, optimized it for production, and competed against my peers. I understand ML systems engineering!"