# 🚀 TinyTorch Final Module Plan: 17 Modules to ML Systems Mastery

## Overview: Four Learning Phases

**Phase 1: Foundation (Modules 1-5)** → Unlock Inference Examples

**Phase 2: Training & Vision (Modules 6-10)** → Unlock CNN Training

**Phase 3: Language & Systems (Modules 11-14)** → Unlock TinyGPT

**Phase 4: Systems Optimization (Modules 15-17)** → Unlock Competition

---

## 📚 Phase 1: Foundation - "Look What You Can Already Do!"

### Module 01: Setup

**What Students Build:**
- Virtual environment configuration
- Rich CLI for beautiful progress tracking
- Testing infrastructure
- Development tools (debugger, profiler stubs)

**Systems Concepts:**
- Development environment best practices
- Dependency management
- Testing frameworks

### Module 02: Tensor

**What Students Build:**
- N-dimensional array class
- Broadcasting operations
- Memory-efficient views and slicing
- Basic math operations (+, -, *, /)

**Systems Concepts:**
- Memory layout (row-major vs column-major)
- Cache efficiency
- Vectorization opportunities
- O(1) vs O(N) operations
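
A minimal sketch of what a NumPy-backed tensor might look like (illustrative only; the class and method names here are assumptions, not the actual TinyTorch API):

```python
import numpy as np

class Tensor:
    """Illustrative N-dimensional array wrapper backed by NumPy."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float32)

    @property
    def shape(self):
        return self.data.shape

    def __add__(self, other):
        # NumPy broadcasting handles mismatched-but-compatible shapes.
        other = other.data if isinstance(other, Tensor) else other
        return Tensor(self.data + other)

    def reshape(self, *shape):
        # A view when possible: O(1), no copy of the underlying buffer.
        return Tensor(self.data.reshape(*shape))

# Broadcasting example: (3, 1) + (4,) -> (3, 4)
a = Tensor(np.ones((3, 1)))
b = Tensor(np.arange(4))
print((a + b).shape)  # (3, 4)
```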

### Module 03: Activations

**What Students Build:**
- ReLU, Sigmoid, Tanh, Softmax
- Backward pass for each activation
- Numerical stability (LogSoftmax)

**Systems Concepts:**
- Numerical stability (overflow/underflow)
- Computational complexity per activation
- Memory requirements (in-place vs copy)
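
A sketch of the numerical-stability idea in plain NumPy (not the module's actual code): subtracting the row max before exponentiating keeps softmax from overflowing.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max so the largest exponent is exp(0) = 1 (no overflow).
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def log_softmax(x, axis=-1):
    # Log-sum-exp trick: more stable than log(softmax(x)).
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

logits = np.array([[1000.0, 1001.0, 1002.0]])  # would overflow a naive softmax
print(softmax(logits))  # ~[[0.09, 0.24, 0.67]]
```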

### Module 04: Layers

**What Students Build:**
- Module base class
- Parameter management
- Forward/backward protocol
- Layer composition patterns

**Systems Concepts:**
- Object-oriented design for ML
- Memory management for parameters
- Modular architecture benefits
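
One possible shape for the base class (a sketch under assumed names; the real TinyTorch interface may differ):

```python
import numpy as np

class Module:
    """Illustrative base class: tracks parameters and defines the call protocol."""

    def parameters(self):
        # Own parameters plus those of any sub-modules stored as attributes.
        params = list(getattr(self, "_params", {}).values())
        for attr in vars(self).values():
            if isinstance(attr, Module):
                params.extend(attr.parameters())
        return params

    def add_parameter(self, name, value):
        self.__dict__.setdefault("_params", {})[name] = value
        return value

    def forward(self, x):
        raise NotImplementedError

    def __call__(self, x):
        return self.forward(x)

class Scale(Module):
    """Toy layer: multiplies the input by a learnable vector."""
    def __init__(self, dim):
        self.weight = self.add_parameter("weight", np.ones(dim, dtype=np.float32))

    def forward(self, x):
        return x * self.weight

layer = Scale(4)
print(len(layer.parameters()), layer(np.arange(4.0)))  # 1 [0. 1. 2. 3.]
```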

### Module 05: Networks (Dense)

**What Students Build:**
- Linear/Dense layer
- Sequential container
- Basic neural network class
- Weight initialization

**Systems Concepts:**
- Matrix multiplication cost: O(B·D_in·D_out) per dense layer (O(N³) for square N×N matrices)
- Parameter memory scaling
- Why initialization matters
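
A sketch of a dense layer and container in NumPy (illustrative names; the He-style initialization scale is an assumption):

```python
import numpy as np

class Linear:
    def __init__(self, in_features, out_features):
        # He-style init: scale keeps activation variance roughly constant.
        scale = np.sqrt(2.0 / in_features)
        self.W = (np.random.randn(in_features, out_features) * scale).astype(np.float32)
        self.b = np.zeros(out_features, dtype=np.float32)

    def __call__(self, x):
        # (B, D_in) @ (D_in, D_out) -> (B, D_out): O(B * D_in * D_out) multiply-adds.
        return x @ self.W + self.b

class Sequential:
    def __init__(self, *layers):
        self.layers = layers

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

relu = lambda x: np.maximum(x, 0.0)
mlp = Sequential(Linear(784, 128), relu, Linear(128, 10))
logits = mlp(np.random.randn(32, 784).astype(np.float32))
print(logits.shape)  # (32, 10)
```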

**🎉 UNLOCK: Inference Examples!**
- Run pretrained XOR network
- Run pretrained MNIST classifier
- Run pretrained CIFAR-10 CNN
- Students see their code actually works!

---

## 📚 Phase 2: Training & Vision - "Now Train Your Own!"

### Module 06: DataLoader

**What Students Build:**
- Dataset abstraction
- Batch sampling
- Shuffling and iteration
- CIFAR-10 loader

**Systems Concepts:**
- I/O bottlenecks
- Memory vs disk tradeoffs
- Prefetching and pipelining
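
A minimal in-memory loader sketch (an assumed interface, not the module's actual API):

```python
import numpy as np

class DataLoader:
    """Yields shuffled (x, y) mini-batches from in-memory arrays."""

    def __init__(self, x, y, batch_size=32, shuffle=True):
        self.x, self.y = x, y
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        idx = np.arange(len(self.x))
        if self.shuffle:
            np.random.shuffle(idx)
        for start in range(0, len(idx), self.batch_size):
            batch = idx[start:start + self.batch_size]
            yield self.x[batch], self.y[batch]

x = np.random.randn(100, 3, 32, 32).astype(np.float32)  # CIFAR-10-shaped dummy data
y = np.random.randint(0, 10, size=100)
for xb, yb in DataLoader(x, y, batch_size=32):
    print(xb.shape, yb.shape)  # (32, 3, 32, 32) (32,)
    break
```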

### Module 07: Autograd

**What Students Build:**
- Computational graph
- Automatic differentiation
- Gradient accumulation
- Backward pass automation

**Systems Concepts:**
- Graph memory consumption
- Forward vs reverse mode AD
- Gradient checkpointing concepts
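
The core idea in miniature: each operation records how to push gradients back to its inputs. This scalar-valued sketch (an assumption, far simpler than a tensor autograd) shows reverse-mode AD and gradient accumulation:

```python
class Value:
    """Scalar autograd node: stores data, grad, parents, and a backward rule."""

    def __init__(self, data, parents=(), backward=lambda: None):
        self.data, self.grad = data, 0.0
        self._parents, self._backward = parents, backward

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad   # accumulate, don't overwrite
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topological order over the graph, then apply each node's local rule.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x, w = Value(2.0), Value(3.0)
y = x * w + x        # dy/dx = w + 1 = 4, dy/dw = x = 2
y.backward()
print(x.grad, w.grad)  # 4.0 2.0
```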

### Module 08: Optimizers

**What Students Build:**
- SGD with momentum
- Adam optimizer
- Learning rate scheduling
- Gradient clipping

**Systems Concepts:**
- Memory usage (Adam keeps two state tensors per parameter, so ~3× parameter memory)
- Convergence rates
- Numerical stability in updates
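
Sketches of both update rules over NumPy arrays (parameter names like `lr` and `beta1` are the usual conventions, not the module's exact code):

```python
import numpy as np

def sgd_momentum(param, grad, velocity, lr=0.01, momentum=0.9):
    # One extra buffer (velocity) per parameter: ~2x parameter memory total.
    velocity[:] = momentum * velocity - lr * grad
    param += velocity

def adam(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Two extra buffers (m, v) per parameter: ~3x parameter memory total.
    m[:] = beta1 * m + (1 - beta1) * grad
    v[:] = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)   # bias correction
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (np.sqrt(v_hat) + eps)

w = np.random.randn(4).astype(np.float32)
g = np.ones_like(w)
m, v = np.zeros_like(w), np.zeros_like(w)
adam(w, g, m, v, t=1)
```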

### Module 09: Training

**What Students Build:**
- Training loop
- Loss functions (MSE, CrossEntropy)
- Validation and metrics
- Checkpointing

**Systems Concepts:**
- Memory during training
- Gradient accumulation for large batches
- Disk I/O for checkpoints
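
The skeleton of a training loop, shown on a toy linear-regression problem with hand-derived gradients so the example stays self-contained (the autograd module automates the gradient step; everything here is illustrative):

```python
import numpy as np

# Toy data: y = 3x + 1 with noise.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1)).astype(np.float32)
y = 3.0 * x + 1.0 + 0.1 * rng.normal(size=(200, 1)).astype(np.float32)
x_train, y_train, x_val, y_val = x[:160], y[:160], x[160:], y[160:]

w = np.zeros((1, 1), dtype=np.float32)
b = np.zeros(1, dtype=np.float32)
lr = 0.1

for epoch in range(20):
    # Forward pass + MSE loss.
    pred = x_train @ w + b
    err = pred - y_train
    loss = float((err ** 2).mean())
    # Manual gradients (replaced by autograd in the real pipeline).
    grad_w = 2 * x_train.T @ err / len(x_train)
    grad_b = 2 * err.mean(axis=0)
    # Parameter update.
    w -= lr * grad_w
    b -= lr * grad_b
    # Validation metric.
    val_loss = float(((x_val @ w + b - y_val) ** 2).mean())
    if epoch % 5 == 0:
        print(f"epoch {epoch}: train={loss:.4f} val={val_loss:.4f}")
```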

### Module 10: Spatial (CNN)

**What Students Build:**
- Conv2d layer
- Pooling operations
- CNN architectures
- Image augmentation

**Systems Concepts:**
- Convolution complexity: O(H·W·K²·C_in·C_out) per image
- Memory footprint of feature maps
- Cache-friendly implementations
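
A deliberately naive single-image convolution sketch (no padding or stride, loop-based so the O(H·W·K²·C_in·C_out) cost is visible; a real implementation would vectorize or use im2col):

```python
import numpy as np

def conv2d_naive(x, weight):
    """x: (C_in, H, W); weight: (C_out, C_in, K, K). Valid convolution, stride 1."""
    c_in, h, w = x.shape
    c_out, _, k, _ = weight.shape
    out_h, out_w = h - k + 1, w - k + 1
    out = np.zeros((c_out, out_h, out_w), dtype=x.dtype)
    for co in range(c_out):
        for i in range(out_h):
            for j in range(out_w):
                # One output value = sum over a (C_in, K, K) window.
                out[co, i, j] = np.sum(x[:, i:i + k, j:j + k] * weight[co])
    return out

x = np.random.randn(3, 8, 8).astype(np.float32)       # CIFAR-like channels
w = np.random.randn(16, 3, 3, 3).astype(np.float32)   # 16 filters, 3x3
print(conv2d_naive(x, w).shape)  # (16, 6, 6)
```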

**🎉 UNLOCK: CNN Training!**
- Train CNN on CIFAR-10
- Achieve 75% accuracy milestone
- Visualize learned features

---

## 📚 Phase 3: Language & Systems - "From Vision to Language to Production!"

### Module 11: Tokenization

**What Students Build:**
- Character tokenizer
- BPE tokenizer basics
- Vocabulary management
- Padding and truncation

**Systems Concepts:**
- Memory efficiency of token representations
- Vocabulary size tradeoffs
- Tokenization speed considerations
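
A character-level tokenizer sketch (the simplest possible vocabulary; BPE replaces single characters with learned merges):

```python
class CharTokenizer:
    """Maps each unique character to an integer id and back."""

    def __init__(self, text):
        vocab = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(vocab)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    @property
    def vocab_size(self):
        return len(self.stoi)

    def encode(self, text):
        return [self.stoi[ch] for ch in text]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)

tok = CharTokenizer("hello tinytorch")
ids = tok.encode("torch")
print(ids, tok.decode(ids), tok.vocab_size)
```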

### Module 12: Embeddings

**What Students Build:**
- Embedding layer
- Positional encodings
- Learned vs fixed embeddings
- Embedding initialization

**Systems Concepts:**
- Embedding table memory (vocab_size × dim)
- Sparse vs dense operations
- Cache locality in lookups
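
An embedding lookup is just indexing into a (vocab_size × dim) table. A sketch with fixed sinusoidal position encodings added on top (how the module combines them is an assumption here):

```python
import numpy as np

def embed(token_ids, table):
    # Fancy indexing: one row per token, no matmul needed.
    return table[token_ids]

def sinusoidal_positions(seq_len, dim):
    pos = np.arange(seq_len)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles)).astype(np.float32)

vocab_size, dim = 1000, 64
table = (np.random.randn(vocab_size, dim) * 0.02).astype(np.float32)  # ~0.25 MB at FP32
tokens = np.array([5, 17, 42, 7])
x = embed(tokens, table) + sinusoidal_positions(len(tokens), dim)
print(x.shape)  # (4, 64)
```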

### Module 13: Attention

**What Students Build:**
- Scaled dot-product attention
- Multi-head attention
- Causal masking
- KV-cache basics

**Systems Concepts:**
- O(N²) attention complexity
- Memory bottlenecks in attention
- Why KV-cache matters
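
Scaled dot-product attention with a causal mask in NumPy (single head, no batching, reusing a stable softmax; a sketch rather than the module's exact implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def causal_attention(q, k, v):
    """q, k, v: (seq_len, d_head). The (seq_len x seq_len) score matrix is the O(N^2) cost."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                        # (N, N)
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)             # each position sees only the past
    return softmax(scores, axis=-1) @ v                  # (N, d_head)

n, d = 8, 16
q = np.random.randn(n, d).astype(np.float32)
out = causal_attention(q, q, q)   # self-attention
print(out.shape)  # (8, 16)
```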

### Module 14: Transformers

**What Students Build:**
- LayerNorm
- Transformer block
- Full GPT architecture
- Residual connections

**Systems Concepts:**
- Layer normalization stability
- Residual path gradient flow
- Transformer memory scaling
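
A sketch of LayerNorm and a pre-norm residual block (the token-mixing and MLP pieces are passed in as stand-ins; the `causal_attention` function from the Module 13 sketch could slot into `mix`):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each token's features to zero mean / unit variance, then rescale.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def transformer_block(x, mix, mlp, gamma1, beta1, gamma2, beta2):
    """Pre-norm residual block: x + mix(LN(x)), then x + mlp(LN(x))."""
    x = x + mix(layer_norm(x, gamma1, beta1))   # token mixing (attention in a real block)
    x = x + mlp(layer_norm(x, gamma2, beta2))   # per-token feed-forward
    return x

seq_len, dim = 8, 32
x = np.random.randn(seq_len, dim).astype(np.float32)
gamma, beta = np.ones(dim, dtype=np.float32), np.zeros(dim, dtype=np.float32)
w1 = (np.random.randn(dim, 4 * dim) * 0.02).astype(np.float32)
w2 = (np.random.randn(4 * dim, dim) * 0.02).astype(np.float32)
mlp = lambda h: np.maximum(h @ w1, 0.0) @ w2   # 2-layer ReLU MLP
mix = lambda h: h                              # identity stand-in for attention
out = transformer_block(x, mix, mlp, gamma, beta, gamma, beta)
print(out.shape)  # (8, 32)
```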

**🎉 UNLOCK: TinyGPT!**
- Train character-level language model
- Generate text
- Compare with vision models

---

## 🔥 Phase 4: Systems Optimization - "Make It Fast, Make It Small!"

### Module 15: Kernels

**What Students Build:**
- Fused operations (e.g., fused_relu_add)
- Matrix multiplication optimization
- Custom CUDA-like kernels (in NumPy)
- Operator fusion patterns

**Why Universal:**
- Works for MLPs, CNNs, and Transformers
- Reduces memory bandwidth usage
- Speeds up any model architecture

**Systems Concepts:**
- Memory bandwidth vs compute bound
- Kernel fusion benefits
- Cache optimization
- Vectorization with NumPy
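
A sketch of the fusion idea in NumPy: the unfused version materializes an intermediate array (extra memory traffic), while the fused version reuses one output buffer via `out=`. The name `fused_relu_add` mirrors the example above; the exact signature is an assumption.

```python
import numpy as np

def relu_add_unfused(x, y):
    tmp = x + y                  # intermediate array: extra reads/writes over memory
    return np.maximum(tmp, 0.0)

def fused_relu_add(x, y, out=None):
    # Reuse one output buffer for both steps: fewer allocations, less bandwidth.
    out = np.add(x, y, out=out)
    return np.maximum(out, 0.0, out=out)

x = np.random.randn(1024, 1024).astype(np.float32)
y = np.random.randn(1024, 1024).astype(np.float32)
buf = np.empty_like(x)
assert np.allclose(relu_add_unfused(x, y), fused_relu_add(x, y, out=buf))
```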

**Performance Gains:**
- 2-5× speedup from fusion
- Memory bandwidth reduction
- Works on CPU (NumPy vectorization)

### Module 16: Compression

**What Students Build:**
- Quantization (INT8, INT4)
- Pruning (magnitude, structured)
- Knowledge distillation setup
- Model size reduction

**Why Universal:**
- Quantize any model (MLP/CNN/GPT)
- Prune any architecture
- Distill large to small

**Systems Concepts:**
- Precision vs accuracy tradeoffs
- Structured vs unstructured sparsity
- Compression ratios
- Inference speedup from quantization
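
A sketch of symmetric per-tensor INT8 quantization and magnitude pruning (one reasonable scheme among several; not necessarily the one the module uses):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor scheme: map [-max|w|, +max|w|] onto [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def magnitude_prune(w, sparsity=0.9):
    # Zero out the smallest-magnitude weights.
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= threshold, w, 0.0)

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)           # 4x smaller: 1 byte vs 4 bytes per weight
err = np.abs(dequantize(q, scale) - w).max()
print(f"max quantization error: {err:.4f}")
print(f"sparsity: {(magnitude_prune(w) == 0).mean():.2f}")  # ~0.90
```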

**Performance Gains:**
- 4× size reduction (FP32 → INT8)
- 2× inference speedup
- 90% sparsity possible

### Module 17: Competition - "The Grand Finale!"

**What Students Build:**
- KV-cache for transformers
- Dynamic batching
- Mixed precision training
- Model ensemble techniques
- All optimizations combined!

**Competition Elements:**
- **Leaderboard**: Real-time ranking
- **Metrics**: Accuracy, speed, model size
- **Constraints**: Max 10MB model, <100ms inference
- **Tasks**: CIFAR-10, MNIST, TinyGPT generation

**Systems Concepts:**
- KV-cache memory management
- Batch size vs latency tradeoffs
- Optimization stacking
- Production deployment considerations
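
A sketch of why the KV-cache matters: at generation step t, keys and values for earlier tokens are already known, so only the new token's projections are computed and appended (a simplified single-head view under assumed names; the cache grows as seq_len × d_head per layer):

```python
import numpy as np

def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def decode_step(x_new, wq, wk, wv, k_cache, v_cache):
    """x_new: (1, dim). Append this step's K/V to the cache and attend over it."""
    q = x_new @ wq                                   # (1, d_head)
    k_cache = np.concatenate([k_cache, x_new @ wk])  # (t, d_head) — grows by one row
    v_cache = np.concatenate([v_cache, x_new @ wv])
    scores = q @ k_cache.T / np.sqrt(q.shape[-1])    # O(t) work per step, not O(t^2)
    out = softmax(scores) @ v_cache                  # (1, d_head)
    return out, k_cache, v_cache

dim, d_head = 32, 16
wq, wk, wv = (np.random.randn(dim, d_head).astype(np.float32) * 0.1 for _ in range(3))
k_cache = np.zeros((0, d_head), dtype=np.float32)
v_cache = np.zeros((0, d_head), dtype=np.float32)
for _ in range(4):                                   # four decode steps
    x_new = np.random.randn(1, dim).astype(np.float32)
    out, k_cache, v_cache = decode_step(x_new, wq, wk, wv, k_cache, v_cache)
print(out.shape, k_cache.shape)  # (1, 16) (4, 16)
```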

**🏆 GRAND FINALE:**
- Students submit optimized models
- Automatic evaluation on hidden test set
- Leaderboard shows:
  - Accuracy scores
  - Inference time
  - Model size
  - Memory usage
- Winners announced for:
  - Best accuracy
  - Fastest inference
  - Smallest model
  - Best accuracy/size ratio

---

## 🎯 Why This Structure Works

### Progressive Unlocking
1. **Modules 1-5**: Build foundation → Unlock inference (immediate gratification)
2. **Modules 6-10**: Add training → Unlock CNN training (real achievement)
3. **Modules 11-14**: Add language → Unlock TinyGPT (wow factor)
4. **Modules 15-17**: Optimize everything → Competition (epic finale)

### Universal Optimizations (Modules 15-17)
- **Not** architecture-specific
- Work on MLPs, CNNs, and Transformers
- Real production techniques
- Measurable improvements

### Competition as Culmination
- Uses EVERYTHING students built
- Competitive element drives engagement
- Multiple winning categories (not just accuracy)
- Shows real ML engineering tradeoffs
- Students optimize their own code!

### High Note Ending
- Module 15: "Make it fast!" (kernels)
- Module 16: "Make it small!" (compression)
- Module 17: "Make it production-ready!" (competition)
- Final message: "You built a complete ML framework and optimized it for production!"

---

## 📊 Module Complexity Progression

```
Complexity: ▁▂▃▄▄▅▅▆▆▇▇██████
Modules:    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
            └─Found.─┘└Training┘└─Language─┘ └─Systems─┘
Unlocks:            ↑         ↑           ↑          ↑
                Inference    CNN       TinyGPT  Competition
```

---

## 🏁 Student Journey Summary

**Week 1-2**: Foundation (Modules 1-5)
- "I built tensors and layers!"
- "I can run pretrained models!"

**Week 3-4**: Training (Modules 6-10)
- "I built autograd from scratch!"
- "I trained a CNN to 75% accuracy!"

**Week 5-6**: Language (Modules 11-14)
- "I built attention mechanisms!"
- "I have a working GPT!"

**Week 7**: Systems (Modules 15-17)
- "I optimized everything!"
- "I'm on the leaderboard!"
- "I built a complete, optimized ML framework!"

**Final Achievement**:
"I didn't just learn ML algorithms - I built the entire infrastructure, optimized it for production, and competed against my peers. I understand ML systems engineering!"