🚀 TinyTorch Final Module Plan: 17 Modules to ML Systems Mastery
Overview: Four Learning Phases
Phase 1: Foundation (Modules 1-5) → Unlock Inference Examples
Phase 2: Training & Vision (Modules 6-10) → Unlock CNN Training
Phase 3: Language (Modules 11-14) → Unlock TinyGPT
Phase 4: Systems Optimization (Modules 15-17) → Unlock Competition
📚 Phase 1: Foundation - "Look What You Can Already Do!"
Module 01: Setup
What Students Build:
- Virtual environment configuration
- Rich CLI for beautiful progress tracking
- Testing infrastructure
- Development tools (debugger, profiler stubs)
Systems Concepts:
- Development environment best practices
- Dependency management
- Testing frameworks
Module 02: Tensor
What Students Build:
- N-dimensional array class
- Broadcasting operations
- Memory-efficient views and slicing
- Basic math operations (+, -, *, /)
Systems Concepts:
- Memory layout (row-major vs column-major)
- Cache efficiency
- Vectorization opportunities
- O(1) vs O(N) operations
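A minimal sketch of the kind of NumPy-backed tensor this module describes; class and method names here are illustrative, not the actual TinyTorch API:

```python
import numpy as np

class Tensor:
    """Thin wrapper over a NumPy array; illustrative only."""
    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float32)

    @property
    def shape(self):
        return self.data.shape

    def __add__(self, other):      # broadcasting comes for free from NumPy
        return Tensor(self.data + _unwrap(other))

    def __mul__(self, other):
        return Tensor(self.data * _unwrap(other))

    def __getitem__(self, idx):    # basic slicing returns a view of the same buffer (O(1))
        return Tensor(self.data[idx])

    def reshape(self, *shape):     # reshape is also a view when the memory layout allows it
        return Tensor(self.data.reshape(shape))

def _unwrap(x):
    return x.data if isinstance(x, Tensor) else x

a = Tensor(np.ones((4, 8)))
b = Tensor(np.arange(8))           # (8,) broadcasts against (4, 8)
print((a + b).shape)               # (4, 8)
```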
Module 03: Activations
What Students Build:
- ReLU, Sigmoid, Tanh, Softmax
- Backward pass for each activation
- Numerical stability (LogSoftmax)
Systems Concepts:
- Numerical stability (overflow/underflow)
- Computational complexity per activation
- Memory requirements (in-place vs copy)
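As one concrete instance of the stability concern, a softmax sketch that subtracts the row maximum before exponentiating (function names are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x, axis=-1):
    # Subtracting the max keeps exp() from overflowing without changing the result.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def log_softmax(x, axis=-1):
    # Log-sum-exp trick: more stable than np.log(softmax(x)).
    shifted = x - np.max(x, axis=axis, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))

logits = np.array([[1000.0, 1001.0, 1002.0]])   # would overflow a naive softmax
print(softmax(logits))                          # ≈ [[0.09, 0.24, 0.67]]
```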
Module 04: Layers
What Students Build:
- Module base class
- Parameter management
- Forward/backward protocol
- Layer composition patterns
Systems Concepts:
- Object-oriented design for ML
- Memory management for parameters
- Modular architecture benefits
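A sketch of the Module/parameter pattern this module builds; the names are hypothetical stand-ins, not the real TinyTorch classes:

```python
import numpy as np

class Module:
    """Base class: subclasses register parameters as attributes and define forward()."""
    def parameters(self):
        params = []
        for value in vars(self).values():
            if isinstance(value, np.ndarray):
                params.append(value)
            elif isinstance(value, Module):        # recurse into sub-modules
                params.extend(value.parameters())
        return params

    def __call__(self, x):
        return self.forward(x)

    def forward(self, x):
        raise NotImplementedError

class Bias(Module):
    """Tiny example layer: adds a learnable bias vector."""
    def __init__(self, dim):
        self.b = np.zeros(dim, dtype=np.float32)

    def forward(self, x):
        return x + self.b

layer = Bias(4)
print(len(layer.parameters()), layer(np.ones((2, 4))).shape)   # 1 (2, 4)
```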
Module 05: Networks (Dense)
What Students Build:
- Linear/Dense layer
- Sequential container
- Basic neural network class
- Weight initialization
Systems Concepts:
- Matrix multiplication cost: O(N³) for a naive N×N matmul; O(batch × in_dim × out_dim) per Linear forward
- Parameter memory scaling
- Why initialization matters
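A sketch of a Linear layer plus Sequential container in the spirit of this module; the He-style scaling shown is one common initialization choice, and all names are illustrative:

```python
import numpy as np

class Linear:
    def __init__(self, in_dim, out_dim):
        # He/Kaiming-style scaling keeps activations from exploding or vanishing.
        self.W = np.random.randn(in_dim, out_dim).astype(np.float32) * np.sqrt(2.0 / in_dim)
        self.b = np.zeros(out_dim, dtype=np.float32)

    def __call__(self, x):
        # One matmul: O(batch * in_dim * out_dim) multiply-adds.
        return x @ self.W + self.b

class Sequential:
    def __init__(self, *layers):
        self.layers = layers

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

relu = lambda x: np.maximum(x, 0.0)
mlp = Sequential(Linear(784, 128), relu, Linear(128, 10))
print(mlp(np.random.randn(32, 784).astype(np.float32)).shape)   # (32, 10)
```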
🎉 UNLOCK: Inference Examples!
- Run pretrained XOR network
- Run pretrained MNIST classifier
- Run pretrained CIFAR-10 CNN
- Students see their code actually works!
📚 Phase 2: Training & Vision - "Now Train Your Own!"
Module 06: DataLoader
What Students Build:
- Dataset abstraction
- Batch sampling
- Shuffling and iteration
- CIFAR-10 loader
Systems Concepts:
- I/O bottlenecks
- Memory vs disk tradeoffs
- Prefetching and pipelining
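A sketch of the shuffle-and-batch iterator this module describes, over arrays already in memory (sizes and names are illustrative; a real CIFAR-10 loader also handles disk I/O):

```python
import numpy as np

class DataLoader:
    """Illustrative batching iterator over (X, y) arrays held in memory."""
    def __init__(self, X, y, batch_size=64, shuffle=True):
        self.X, self.y = X, y
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        idx = np.arange(len(self.X))
        if self.shuffle:
            np.random.shuffle(idx)             # new order every epoch
        for start in range(0, len(idx), self.batch_size):
            batch = idx[start:start + self.batch_size]
            yield self.X[batch], self.y[batch]

# Small stand-in for CIFAR-10-shaped data.
X = np.random.randn(5000, 3, 32, 32).astype(np.float32)
y = np.random.randint(0, 10, size=5000)
for images, labels in DataLoader(X, y, batch_size=128):
    break
print(images.shape, labels.shape)   # (128, 3, 32, 32) (128,)
```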
Module 07: Autograd
What Students Build:
- Computational graph
- Automatic differentiation
- Gradient accumulation
- Backward pass automation
Systems Concepts:
- Graph memory consumption
- Forward vs reverse mode AD
- Gradient checkpointing concepts
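A deliberately tiny reverse-mode autograd sketch over scalars, in the spirit of what this module generalizes to tensors (names are illustrative):

```python
class Value:
    """Scalar node in a computational graph: data, grad, and a backward rule."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad            # accumulate, don't overwrite
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topological order, then apply the chain rule from output to inputs.
        order, seen = [], set()
        def visit(node):
            if node not in seen:
                seen.add(node)
                for p in node._parents:
                    visit(p)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

x, w = Value(3.0), Value(-2.0)
loss = x * w + w
loss.backward()
print(w.grad)   # d(loss)/dw = x + 1 = 4.0
```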
Module 08: Optimizers
What Students Build:
- SGD with momentum
- Adam optimizer
- Learning rate scheduling
- Gradient clipping
Systems Concepts:
- Memory usage (Adam keeps two extra moment buffers, so ≈3× parameter memory including the weights)
- Convergence rates
- Numerical stability in updates
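A sketch of the Adam update that makes the memory cost concrete: two full-size state buffers (m and v) per parameter tensor, on top of the weights themselves (illustrative, not the TinyTorch API):

```python
import numpy as np

class Adam:
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        self.params = params                      # list of NumPy arrays, updated in place
        self.lr, self.b1, self.b2, self.eps = lr, betas[0], betas[1], eps
        # Two full-size state buffers per parameter -> roughly 3x parameter memory overall.
        self.m = [np.zeros_like(p) for p in params]
        self.v = [np.zeros_like(p) for p in params]
        self.t = 0

    def step(self, grads):
        self.t += 1
        for p, g, m, v in zip(self.params, grads, self.m, self.v):
            m[:] = self.b1 * m + (1 - self.b1) * g
            v[:] = self.b2 * v + (1 - self.b2) * g * g
            m_hat = m / (1 - self.b1 ** self.t)   # bias correction
            v_hat = v / (1 - self.b2 ** self.t)
            p -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

w = np.random.randn(10, 10)
opt = Adam([w])
opt.step([np.ones_like(w)])   # one update with a dummy gradient
```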
Module 09: Training
What Students Build:
- Training loop
- Loss functions (MSE, CrossEntropy)
- Validation and metrics
- Checkpointing
Systems Concepts:
- Memory during training
- Gradient accumulation for large batches
- Disk I/O for checkpoints
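A self-contained skeleton of the loop this module builds, using a single linear layer with a hand-derived gradient so it runs end to end; everything here is a stand-in for the model, autograd, and optimizer from earlier modules:

```python
import numpy as np

def cross_entropy(logits, labels):
    # Stable log-softmax, then the mean negative log-probability of the true class.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 20)).astype(np.float32)        # toy dataset
y = rng.integers(0, 3, size=512)
W = np.zeros((20, 3), dtype=np.float32)                  # toy "model": one linear layer

for epoch in range(5):                                    # the training loop itself
    for start in range(0, len(X), 64):                    # mini-batches
        xb, yb = X[start:start+64], y[start:start+64]
        logits = xb @ W                                    # forward pass
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        probs[np.arange(len(yb)), yb] -= 1                 # d(loss)/d(logits) for cross-entropy
        grad_W = xb.T @ probs / len(yb)                    # backprop through the matmul
        W -= 0.1 * grad_W                                  # SGD step
    print(f"epoch {epoch}: loss {cross_entropy(X @ W, y):.4f}")   # validation metric
    # Checkpointing would write W to disk here, e.g. np.save(f"ckpt_{epoch}.npy", W)
```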
Module 10: Spatial (CNN)
What Students Build:
- Conv2d layer
- Pooling operations
- CNN architectures
- Image augmentation
Systems Concepts:
- Convolution complexity O(N²K²C²): output spatial size × kernel size² × input/output channels
- Memory footprint of feature maps
- Cache-friendly implementations
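A naive Conv2d forward pass, written so the O(N²K²C²) cost is visible as nested loops; real implementations use im2col or other cache-friendly layouts, and the names here are illustrative:

```python
import numpy as np

def conv2d(x, weight):
    """x: (C_in, H, W), weight: (C_out, C_in, K, K); no padding, stride 1."""
    c_in, h, w = x.shape
    c_out, _, k, _ = weight.shape
    out = np.zeros((c_out, h - k + 1, w - k + 1), dtype=x.dtype)
    for co in range(c_out):                    # every output channel
        for i in range(out.shape[1]):          # every output row
            for j in range(out.shape[2]):      # every output column
                # K*K*C_in multiply-adds per output element
                out[co, i, j] = np.sum(x[:, i:i+k, j:j+k] * weight[co])
    return out

x = np.random.randn(3, 32, 32).astype(np.float32)       # one CIFAR-10-sized image
w = np.random.randn(16, 3, 3, 3).astype(np.float32)     # 16 filters of size 3x3
print(conv2d(x, w).shape)                                # (16, 30, 30)
```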
🎉 UNLOCK: CNN Training!
- Train CNN on CIFAR-10
- Achieve 75% accuracy milestone
- Visualize learned features
📚 Phase 3: Language - "From Vision to Language!"
Module 11: Tokenization
What Students Build:
- Character tokenizer
- BPE tokenizer basics
- Vocabulary management
- Padding and truncation
Systems Concepts:
- Memory efficiency of token representations
- Vocabulary size tradeoffs
- Tokenization speed considerations
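A sketch of the character-level tokenizer this module starts from (BPE builds on the same encode/decode interface); names are illustrative:

```python
class CharTokenizer:
    """Character-level tokenizer: one id per distinct character in the corpus."""
    def __init__(self, text):
        chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    @property
    def vocab_size(self):
        return len(self.stoi)

    def encode(self, text):
        return [self.stoi[ch] for ch in text]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)

tok = CharTokenizer("hello tinytorch")
ids = tok.encode("torch")
print(ids, tok.decode(ids), tok.vocab_size)
```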
Module 12: Embeddings
What Students Build:
- Embedding layer
- Positional encodings
- Learned vs fixed embeddings
- Embedding initialization
Systems Concepts:
- Embedding table memory (vocab_size × dim)
- Sparse vs dense operations
- Cache locality in lookups
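A sketch of the embedding lookup plus fixed sinusoidal positional encodings (the fixed variant is shown here because it fits in a few lines; learned positions are just another embedding table). Names are illustrative:

```python
import numpy as np

class Embedding:
    """Lookup table: vocab_size x dim floats, so memory grows with both factors."""
    def __init__(self, vocab_size, dim):
        self.table = np.random.randn(vocab_size, dim).astype(np.float32) * 0.02

    def __call__(self, token_ids):
        return self.table[token_ids]          # fancy indexing = one gather per token

def sinusoidal_positions(seq_len, dim):
    # Fixed (not learned) positional encodings in the style of the original Transformer.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles)).astype(np.float32)

emb = Embedding(vocab_size=256, dim=64)
tokens = np.array([[5, 42, 7, 42]])           # (batch=1, seq_len=4)
x = emb(tokens) + sinusoidal_positions(4, 64) # token embedding + position
print(x.shape)                                # (1, 4, 64)
```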
Module 13: Attention
What Students Build:
- Scaled dot-product attention
- Multi-head attention
- Causal masking
- KV-cache basics
Systems Concepts:
- O(N²) attention complexity
- Memory bottlenecks in attention
- Why KV-cache matters
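A sketch of scaled dot-product attention with a causal mask; the (N, N) score matrix is exactly where the O(N²) cost lives, and a KV-cache avoids rebuilding K and V for the prefix at generation time. Names are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(q, k, v):
    """q, k, v: (seq_len, d). Returns (seq_len, d)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                     # (N, N): the O(N^2) term
    mask = np.triu(np.ones_like(scores, dtype=bool), 1)
    scores = np.where(mask, -np.inf, scores)          # each position attends only to the past
    return softmax(scores, axis=-1) @ v

n, d = 8, 16
q, k, v = (np.random.randn(n, d) for _ in range(3))
print(causal_attention(q, k, v).shape)                # (8, 16)
```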
Module 14: Transformers
What Students Build:
- LayerNorm
- Transformer block
- Full GPT architecture
- Residual connections
Systems Concepts:
- Layer normalization stability
- Residual path gradient flow
- Transformer memory scaling
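A sketch of LayerNorm and a pre-norm transformer block, with identity functions standing in for the attention and MLP sublayers so the residual structure is runnable on its own (all names illustrative):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each token's feature vector to zero mean / unit variance, then rescale.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def transformer_block(x, attn, mlp, ln1_params, ln2_params):
    # Pre-norm block: residuals keep a direct gradient path around each sublayer.
    x = x + attn(layer_norm(x, *ln1_params))
    x = x + mlp(layer_norm(x, *ln2_params))
    return x

d = 16
x = np.random.randn(8, d).astype(np.float32)              # (seq_len, d_model)
gamma, beta = np.ones(d, dtype=np.float32), np.zeros(d, dtype=np.float32)
identity = lambda t: t                                     # stand-ins for attention / MLP
print(transformer_block(x, identity, identity, (gamma, beta), (gamma, beta)).shape)  # (8, 16)
```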
🎉 UNLOCK: TinyGPT!
- Train character-level language model
- Generate text
- Compare with vision models
🔥 Phase 4: Systems Optimization - "Make It Fast, Make It Small!"
Module 15: Kernels
What Students Build:
- Fused operations (e.g., fused_relu_add)
- Matrix multiplication optimization
- Custom CUDA-like kernels (in NumPy)
- Operator fusion patterns
Why Universal:
- Works for MLPs, CNNs, and Transformers
- Reduces memory bandwidth usage
- Speeds up any model architecture
Systems Concepts:
- Memory bandwidth vs compute bound
- Kernel fusion benefits
- Cache optimization
- Vectorization with NumPy
Performance Gains:
- 2-5× speedup from fusion
- Memory bandwidth reduction
- Works on CPU (NumPy vectorization)
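A CPU-level illustration of the fusion idea, assuming hypothetical helper names: the unfused version allocates and rewrites a full intermediate array, while the "fused" version reuses one preallocated buffer (a real fused kernel would also do the add and the max in a single pass per element):

```python
import numpy as np

def relu_add_unfused(x, y):
    tmp = x + y                    # materializes a full temporary array (extra bandwidth)
    return np.maximum(tmp, 0.0)    # second full pass over memory, second allocation

def relu_add_fused(x, y, out):
    # Reuse one output buffer and avoid temporaries; this is the NumPy-level stand-in
    # for what a fused kernel achieves with a single pass per element.
    np.add(x, y, out=out)
    np.maximum(out, 0.0, out=out)
    return out

x = np.random.randn(1_000_000).astype(np.float32)
y = np.random.randn(1_000_000).astype(np.float32)
out = np.empty_like(x)
print(np.allclose(relu_add_unfused(x, y), relu_add_fused(x, y, out)))   # True
```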
Module 16: Compression
What Students Build:
- Quantization (INT8, INT4)
- Pruning (magnitude, structured)
- Knowledge distillation setup
- Model size reduction
Why Universal:
- Quantize any model (MLP/CNN/GPT)
- Prune any architecture
- Distill large to small
Systems Concepts:
- Precision vs accuracy tradeoffs
- Structured vs unstructured sparsity
- Compression ratios
- Inference speedup from quantization
Performance Gains:
- 4× size reduction (FP32 → INT8)
- 2× inference speedup
- 90% sparsity possible
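A sketch of symmetric INT8 quantization and magnitude pruning on a single weight matrix; function names are illustrative, and real pipelines add per-channel scales and fine-tuning:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: FP32 -> INT8 plus one FP32 scale (~4x smaller)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def magnitude_prune(w, sparsity=0.9):
    """Zero out the smallest-magnitude weights (unstructured pruning)."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

w = np.random.randn(128, 128).astype(np.float32)
q, scale = quantize_int8(w)
print("max quantization error:", np.abs(w - dequantize(q, scale)).max())
print("sparsity after pruning:", (magnitude_prune(w) == 0).mean())      # ≈ 0.9
```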
Module 17: Competition - "The Grand Finale!"
What Students Build:
- KV-cache for transformers
- Dynamic batching
- Mixed precision training
- Model ensemble techniques
- All optimizations combined!
Competition Elements:
- Leaderboard: Real-time ranking
- Metrics: Accuracy, speed, model size
- Constraints: Max 10MB model, <100ms inference
- Tasks: CIFAR-10, MNIST, TinyGPT generation
Systems Concepts:
- KV-cache memory management
- Batch size vs latency tradeoffs
- Optimization stacking
- Production deployment considerations
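A sketch of the KV-cache idea from the list above: keys and values are appended once per generated token, so each decode step attends over the cached prefix instead of recomputing it, at the price of memory that grows with sequence length (single-head, illustrative names):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class KVCache:
    """Append-only key/value cache for one attention head during generation."""
    def __init__(self, dim):
        self.keys = np.empty((0, dim), dtype=np.float32)
        self.values = np.empty((0, dim), dtype=np.float32)

    def step(self, q, k, v):
        # q, k, v: (dim,) vectors for the single newest token.
        self.keys = np.vstack([self.keys, k])        # memory grows linearly with sequence length
        self.values = np.vstack([self.values, v])
        scores = self.keys @ q / np.sqrt(len(q))     # (cached_len,)
        return softmax(scores) @ self.values         # attention output for the new token

cache = KVCache(dim=16)
for _ in range(5):                                   # five decode steps
    q = np.random.randn(16).astype(np.float32)
    out = cache.step(q, np.random.randn(16).astype(np.float32),
                        np.random.randn(16).astype(np.float32))
print(out.shape, cache.keys.shape)                   # (16,) (5, 16)
```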
🏆 GRAND FINALE:
- Students submit optimized models
- Automatic evaluation on hidden test set
- Leaderboard shows:
- Accuracy scores
- Inference time
- Model size
- Memory usage
- Winners announced for:
- Best accuracy
- Fastest inference
- Smallest model
- Best accuracy/size ratio
🎯 Why This Structure Works
Progressive Unlocking
- Modules 1-5: Build foundation → Unlock inference (immediate gratification)
- Modules 6-10: Add training → Unlock CNN training (real achievement)
- Modules 11-14: Add language → Unlock TinyGPT (wow factor)
- Modules 15-17: Optimize everything → Competition (epic finale)
Universal Optimizations (Modules 15-17)
- Not architecture-specific
- Work on MLPs, CNNs, and Transformers
- Real production techniques
- Measurable improvements
Competition as Culmination
- Uses EVERYTHING students built
- Competitive element drives engagement
- Multiple winning categories (not just accuracy)
- Shows real ML engineering tradeoffs
- Students optimize their own code!
High Note Ending
- Module 15: "Make it fast!" (kernels)
- Module 16: "Make it small!" (compression)
- Module 17: "Make it production-ready!" (competition)
- Final message: "You built a complete ML framework and optimized it for production!"
📊 Module Complexity Progression
Complexity: ▁▂▃▄▄▅▅▆▆▇▇██████
Modules: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
└─Found.─┘└Training┘└─Language─┘└Systems┘
Unlocks: ↑ ↑ ↑ ↑
Inference CNN TinyGPT Competition
🏁 Student Journey Summary
Week 1-2: Foundation (Modules 1-5)
- "I built tensors and layers!"
- "I can run pretrained models!"
Week 3-4: Training (Modules 6-10)
- "I built autograd from scratch!"
- "I trained a CNN to 75% accuracy!"
Week 5-6: Language (Modules 11-14)
- "I built attention mechanisms!"
- "I have a working GPT!"
Week 7: Systems (Modules 15-17)
- "I optimized everything!"
- "I'm on the leaderboard!"
- "I built a complete, optimized ML framework!"
Final Achievement: "I didn't just learn ML algorithms - I built the entire infrastructure, optimized it for production, and competed against my peers. I understand ML systems engineering!"