# TinyTorch 15-Module Structure

*Three-Part Journey: MLPs → CNNs → Transformers*
## Part I: Multi-Layer Perceptrons (Modules 1-5)

**Goal:** Build neural networks that can solve XOR
| Module | Topic | What You Build |
|---|---|---|
| 01 | Setup | Development environment |
| 02 | Tensors | N-dimensional arrays |
| 03 | Activations | ReLU, Sigmoid, Softmax |
| 04 | Layers | Dense layers |
| 05 | Networks | Sequential models |
**Capstone:** XORNet proves that neural networks can learn non-linear functions
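For scale: the entire Part I capstone fits in a page of code. Below is a plain-NumPy sketch of an XOR network (illustrative only; it hand-codes the forward and backward passes that TinyTorch's tensor, layer, and network modules encapsulate):

```python
import numpy as np

# Four XOR input pairs and their labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.], [1.], [1.], [0.]])

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> 4 hidden units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> 1 output

for step in range(5000):
    # Forward pass: tanh hidden layer, sigmoid output
    h = np.tanh(X @ W1 + b1)
    p = 1 / (1 + np.exp(-(h @ W2 + b2)))

    # Backward pass (binary cross-entropy; gradient at the logits is p - y)
    dz2 = (p - y) / len(X)
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
    dh = (dz2 @ W2.T) * (1 - h ** 2)            # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ dh, dh.sum(axis=0)

    # Plain SGD update
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= 0.5 * grad

print(p.round(2).ravel())  # should approach [0, 1, 1, 0]
```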
## Part II: Convolutional Neural Networks (Modules 6-10)

**Goal:** Build CNNs for image classification
| Module | Topic | What You Build |
|---|---|---|
| 06 | Spatial | Conv2D, MaxPool2D |
| 07 | DataLoader | Efficient data pipelines |
| 08 | Autograd | Automatic differentiation (sketched below) |
| 09 | Optimizers | SGD, Adam |
| 10 | Training | Complete training loops |
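Module 08 is the hinge of Part II: once gradients flow automatically, modules 09-10 become bookkeeping. The following micrograd-style `Value` class is a hypothetical sketch of reverse-mode autograd, not TinyTorch's actual implementation:

```python
class Value:
    """Scalar reverse-mode autograd node (hypothetical, micrograd-style)."""
    def __init__(self, data, parents=()):
        self.data, self.grad = data, 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward():             # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward():             # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse
        order, seen = [], set()
        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

x, w = Value(3.0), Value(-2.0)
loss = x * w + x                    # dloss/dx = w + 1, dloss/dw = x
loss.backward()
print(x.grad, w.grad)               # -> -1.0 3.0
```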
**Capstone:** CIFAR-10 with three approaches:
- Random Baseline: ~10% accuracy (chance)
- MLP Approach: ~55% accuracy (no convolutions)
- CNN Approach: 60%+ accuracy (WITH Conv2D!)

This progression shows WHY convolutions matter for vision!
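A quick way to see the "why": a convolution reuses one small filter at every spatial position, while a dense layer pays for every pixel-to-unit connection. A naive NumPy sketch (shapes chosen to match CIFAR-10; not TinyTorch's optimized `Conv2D`):

```python
import numpy as np

def conv2d(x, w):
    """Naive valid-padding Conv2D forward pass (illustrative sketch).
    x: (C_in, H, W) image, w: (C_out, C_in, kH, kW) filters."""
    c_out, c_in, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.zeros((c_out, h - kh + 1, wd - kw + 1))
    for o in range(c_out):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                # The same small filter is reused at every spatial position
                out[o, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * w[o])
    return out

x = np.random.randn(3, 32, 32)     # one CIFAR-10-sized image (3 channels)
w = np.random.randn(32, 3, 3, 3)   # 32 filters of shape 3x3x3 = 864 weights
print(conv2d(x, w).shape)          # -> (32, 30, 30)

# Contrast: one 512-unit dense layer on the flattened image
print(3 * 32 * 32 * 512)           # -> 1572864 weights for a single MLP layer
```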
## Part III: Transformers (Modules 11-15)

**Goal:** Build transformers for text generation
| Module | Topic | What You Build |
|---|---|---|
| 11 | Embeddings | Token & positional encoding |
| 12 | Attention | Multi-head attention (sketched below) |
| 13 | Normalization | LayerNorm for stable training |
| 14 | Transformers | Complete transformer blocks |
| 15 | Generation | Autoregressive decoding |
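The centerpiece of module 12 is scaled dot-product attention; multi-head attention runs several of these in parallel on slices of the embedding. A single-head NumPy sketch (shapes and projection names are illustrative, and the causal mask GPT needs is omitted):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention for a single head (sketch)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])         # (seq, seq) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                              # blend of value vectors

seq_len, d_model = 4, 8
x = np.random.randn(seq_len, d_model)               # token embeddings
Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
print(attention(x @ Wq, x @ Wk, x @ Wv).shape)      # -> (4, 8)
```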
**Capstone:** TinyGPT - character-level text generation
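Module 15's decoding loop is model-agnostic: predict next-token logits, sample one character, append it, repeat. A sketch with a toy bigram table standing in for the trained transformer (all names here are hypothetical, not TinyTorch's API):

```python
import numpy as np

def generate(model, prompt, steps, vocab, temperature=1.0):
    """Autoregressive decoding: predict, sample, append, repeat (sketch)."""
    rng = np.random.default_rng(0)
    ids = [vocab.index(ch) for ch in prompt]
    for _ in range(steps):
        z = model(ids) / temperature        # next-token logits, last position
        z -= z.max()                        # stable softmax
        probs = np.exp(z) / np.exp(z).sum()
        ids.append(rng.choice(len(vocab), p=probs))
    return "".join(vocab[i] for i in ids)

# Toy stand-in "model": a bigram table (the real capstone uses a transformer)
vocab = list("ab ")
bigram = np.log(np.array([[0.1, 0.8, 0.1],     # after 'a': mostly 'b'
                          [0.8, 0.1, 0.1],     # after 'b': mostly 'a'
                          [0.45, 0.45, 0.1]])) # after ' ': 'a' or 'b'
print(generate(lambda ids: bigram[ids[-1]], "a", steps=20, vocab=vocab))
```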
## Why This Structure Works

### Pedagogical Excellence
- Each part introduces ONE major innovation:
  - Part I: Fully connected networks (the foundation)
  - Part II: Convolutions (spatial processing)
  - Part III: Attention (sequence processing)
### Historical Accuracy
- Follows ML evolution:
  - 1980s-90s: MLPs dominate
  - 2012: AlexNet shows CNNs beat MLPs on ImageNet
  - 2017: Transformers revolutionize NLP
### Dependency-Driven Design
- Nothing unnecessary: Each module is needed for its capstone
- Progressive complexity: Each part builds on the previous
- Clear motivation: Students see WHY each innovation matters
## Module Dependencies

```
Part I: Foundations
├── 02_tensor (required by everything)
├── 03_activations (required by 04)
├── 04_layers (required by 05)
└── 05_networks (combines all above)
    └── ✅ XORNet works!

Part II: Computer Vision
├── 06_spatial (Conv2D - THE KEY!)
├── 07_dataloader (handle real data)
├── 08_autograd (enable learning)
├── 09_optimizers (gradient descent)
└── 10_training (put it all together)
    └── ✅ CIFAR-10 CNN works!

Part III: Language Models
├── 11_embeddings (discrete → continuous)
├── 12_attention (THE KEY!)
├── 13_normalization (stable training)
├── 14_transformers (attention + FFN)
└── 15_generation (sampling strategies)
    └── ✅ TinyGPT works!
```
## What We Dropped
- Module 16 (Regularization): Important but not essential for capstones
- Module 17 (Systems): Kernels, benchmarking - advanced optimization
These could be bonus content or a separate "Production ML" course.
## The Beauty of 15 Modules
- 3 parts × 5 modules = 15: Perfect symmetry!
- Each part is self-contained: Students can stop after any part
- Clear progression: MLP → CNN → Transformer
- Manageable scope: Achievable in one semester