# TinyTorch Learning Journey
**From Zero to Transformer: A 20-Module Adventure**
```
┌─────────────────────────────────────────────────────────────────────┐
│ 🎯 YOUR LEARNING DESTINATION                                        │
│                                                                     │
│ Start:  "What's a tensor?"                                          │
│    ↓                                                                │
│ Finish: "I built a transformer from scratch using only NumPy!"      │
│                                                                     │
│ 🏆 North Star Achievement: Train CNNs on CIFAR-10 to 75%+ accuracy  │
└─────────────────────────────────────────────────────────────────────┘
```
## Overview: 4 Phases, 20 Modules, 6 Milestones
**Total Time**: 60-80 hours (3-4 weeks at 20 hrs/week)
**Prerequisites**: Python, NumPy basics, basic linear algebra
**Tools**: Just Python + NumPy + Jupyter notebooks
---
## Phase 1: FOUNDATION (Modules 01-04)
**Goal**: Build the fundamental data structures and operations
**Time**: 10-12 hours | **Difficulty**: ⭐⭐ Beginner-friendly
```
┌──────────┐     ┌──────────────┐     ┌──────────┐     ┌──────────┐
│    01    │────▶│      02      │────▶│    03    │────▶│    04    │
│  Tensor  │     │ Activations  │     │  Layers  │     │  Losses  │
│          │     │              │     │          │     │          │
│ • Shape  │     │ • ReLU       │     │ • Linear │     │ • MSE    │
│ • Data   │     │ • Sigmoid    │     │ • Module │     │ • Cross  │
│ • Ops    │     │ • Softmax    │     │ • Params │     │   Entropy│
└──────────┘     └──────────────┘     └──────────┘     └──────────┘
  2-3 hrs           1.5-2 hrs          2-3 hrs          2-3 hrs
   ⭐⭐                ⭐⭐               ⭐⭐⭐             ⭐⭐⭐
```
### Module Details
**Module 01: Tensor** (2-3 hours, ⭐⭐)
- Build the foundation: n-dimensional arrays with operations
- Implement: shape, reshape, indexing, broadcasting
- Operations: add, multiply, matmul, transpose
- Why it matters: Everything in ML is tensor operations
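If those operations feel abstract, here is what they look like in raw NumPy, which your `Tensor` class will wrap. A quick illustration only, not the TinyTorch API:

```python
import numpy as np

# Plain NumPy versions of the operations Module 01 wraps in a Tensor class.
a = np.arange(6).reshape(2, 3)       # shape manipulation
b = np.ones((3, 2))

print(a.shape)                       # (2, 3)
print(a + 10)                        # broadcasting a scalar across the array
print(a @ b)                         # matmul: (2, 3) @ (3, 2) -> (2, 2)
print(a.T)                           # transpose: (3, 2)
```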
**Module 02: Activations** (1.5-2 hours, ⭐⭐)
- Add non-linearity: ReLU, Sigmoid, Softmax
- Understand: Why neural networks need activations
- Implement: Forward passes for each activation
- Why it matters: Without activations, networks are just linear algebra
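For reference, a minimal NumPy sketch of the three activations (function names are illustrative; Module 02 defines its own interface):

```python
import numpy as np

def relu(x):
    # Zero out negatives; the workhorse nonlinearity of modern networks.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squash to (0, 1); useful for binary outputs and gates.
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    # Subtract the max first for numerical stability, then normalize.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

print(softmax(np.array([1.0, 2.0, 3.0])))  # probabilities that sum to 1.0
```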
**Module 03: Layers** (2-3 hours, ⭐⭐⭐)
- Build neural network components: Linear layers
- Implement: nn.Module system, Parameter class
- Create: Weight initialization, layer composition
- Why it matters: Foundation for all network architectures
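As a taste of the structure, a minimal linear-layer sketch in NumPy (class and attribute names here are hypothetical, not the TinyTorch `nn.Module` API):

```python
import numpy as np

class Linear:
    """Sketch of a fully connected layer: y = xW + b."""

    def __init__(self, in_features, out_features):
        # Small random init; Module 03 covers principled schemes.
        self.weight = np.random.randn(in_features, out_features) * 0.01
        self.bias = np.zeros(out_features)

    def __call__(self, x):
        # Batched over the leading dimension of x.
        return x @ self.weight + self.bias

layer = Linear(4, 2)
print(layer(np.random.randn(8, 4)).shape)  # (8, 2)
```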
**Module 04: Losses** (2-3 hours, ⭐⭐⭐)
- Measure performance: MSE and CrossEntropy
- Understand: How to quantify model errors
- Implement: Loss calculation and aggregation
- Why it matters: Without loss, we can't train networks
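Both losses reduce to a few lines of NumPy. A sketch, using the standard max-shift trick for a stable log-softmax (Module 04 may structure things differently):

```python
import numpy as np

def mse(pred, target):
    # Mean squared error, averaged over all elements.
    return np.mean((pred - target) ** 2)

def cross_entropy(logits, labels):
    # Log-softmax plus negative log-likelihood of the true class.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, -1.0]])
print(cross_entropy(logits, np.array([0])))  # small loss: class 0 is favored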
### Milestone Checkpoint 1: 1957 Perceptron
**Unlock After**: Module 04
```
🏆 CHECKPOINT: Train Rosenblatt's Original Perceptron
├─ Dataset: Linearly separable binary classification
├─ Architecture: Single layer, no hidden units
├─ Achievement: First trainable neural network in history!
└─ Test: Can your implementation learn AND/OR logic?
```
---
## Phase 2: TRAINING SYSTEMS (Modules 05-08)
**Goal**: Make your networks learn from data
**Time**: 14-18 hours | **Difficulty**: ⭐⭐⭐ Core ML concepts
```
┌───────────┐     ┌────────────┐     ┌──────────┐     ┌────────────┐
│    05     │────▶│     06     │────▶│    07    │────▶│     08     │
│ Autograd  │     │ Optimizers │     │ Training │     │ DataLoader │
│           │     │            │     │          │     │            │
│ • Graph   │     │ • SGD      │     │ • Loops  │     │ • Batching │
│ • Forward │     │ • Momentum │     │ • Epochs │     │ • Shuffling│
│ • Backward│     │ • Adam     │     │ • Eval   │     │ • Pipeline │
└───────────┘     └────────────┘     └──────────┘     └────────────┘
  3-4 hrs           3-4 hrs            4-5 hrs          3-4 hrs
  ⭐⭐⭐⭐            ⭐⭐⭐⭐             ⭐⭐⭐⭐           ⭐⭐⭐
      │                 │                  │                │
      └─────────────────┴──────────────────┴────────────────┘
                  ALL BUILD ON TENSOR (Module 01)
```
### Module Details
**Module 05: Autograd** (3-4 hours, ⭐⭐⭐⭐) **CRITICAL MODULE**
- Implement automatic differentiation: The magic of modern ML
- Build: Computational graph, gradient tracking
- Implement: backward() for all operations
- Why it matters: This IS machine learning - without gradients, no training
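To make the mechanism concrete, here is a tiny scalar autograd sketch in the micrograd style. TinyTorch's Module 05 does the same thing for whole tensors, so treat this as an illustration of the idea, not the module's API:

```python
class Value:
    """Scalar autograd sketch: each op records how to push gradients back."""

    def __init__(self, data):
        self.data = data
        self.grad = 0.0
        self._parents = ()
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data)
        out._parents = (self, other)
        def _backward():
            self.grad += out.grad            # d(a+b)/da = 1
            other.grad += out.grad           # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data)
        out._parents = (self, other)
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x, y = Value(3.0), Value(4.0)
z = x * y + x          # z = xy + x, so dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```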
**Module 06: Optimizers** (3-4 hours, ⭐⭐⭐⭐)
- Update weights intelligently: SGD, Momentum, Adam
- Understand: Learning rates, momentum, adaptive methods
- Implement: Parameter updates, state management
- Why it matters: How networks actually improve over time
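The SGD-with-momentum update rule fits in a few lines. A sketch assuming parameters and gradients are plain NumPy arrays (TinyTorch's optimizer classes wrap this with proper state management):

```python
import numpy as np

def sgd_momentum_step(params, grads, velocities, lr=0.01, momentum=0.9):
    """One SGD+momentum update over parallel lists of NumPy arrays."""
    for p, g, v in zip(params, grads, velocities):
        v *= momentum          # decaying running average of past gradients
        v += g
        p -= lr * v            # move against the (smoothed) gradient

w = np.array([1.0, -2.0])
v = np.zeros_like(w)
sgd_momentum_step([w], [np.array([0.5, -0.5])], [v])
print(w)  # nudged toward lower loss: [ 0.995 -1.995]
```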
**Module 07: Training** (4-5 hours, ⭐⭐⭐⭐) **CRITICAL MODULE**
- Complete training loops: The full ML pipeline
- Implement: Epochs, batches, forward/backward passes
- Add: Metrics tracking, model evaluation
- Why it matters: This is where everything comes together
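Every training loop has the same skeleton: forward, loss, backward, update, repeat. A complete miniature example, using a hand-derived gradient for linear regression (Module 07 replaces the manual gradient with your autograd):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

w = np.zeros(3)
lr = 0.1
for epoch in range(100):
    pred = X @ w                          # forward pass
    loss = np.mean((pred - y) ** 2)       # MSE loss
    grad = 2 * X.T @ (pred - y) / len(y)  # backward pass (hand-derived)
    w -= lr * grad                        # optimizer step
print(np.round(w, 2))  # close to [ 2.  -1.   0.5]
```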
**Module 08: DataLoader** (3-4 hours, ⭐⭐⭐)
- Efficient data handling: Batching, shuffling, pipelines
- Implement: Batch creation, data iteration
- Optimize: Memory efficiency, preprocessing
- Why it matters: Real ML needs to handle millions of examples
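Batching and shuffling boil down to index bookkeeping. A minimal sketch (the real DataLoader adds preprocessing and memory-efficiency concerns):

```python
import numpy as np

def batches(X, y, batch_size, shuffle=True, seed=0):
    """Minimal batching iterator in the spirit of Module 08's DataLoader."""
    idx = np.arange(len(X))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)  # new order each epoch
    for start in range(0, len(X), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], y[sel]

X, y = np.arange(10).reshape(10, 1), np.arange(10)
for xb, yb in batches(X, y, batch_size=4):
    print(xb.ravel(), yb)  # batches of 4, 4, 2 in shuffled order
```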
### Milestone Checkpoint 2: 1969 XOR Crisis & Solution
**Unlock After**: Module 07
```
🏆 CHECKPOINT: Solve the Problem That Nearly Killed AI
├─ Dataset: XOR (the "impossible" problem for single-layer networks)
├─ Architecture: Multi-layer perceptron with hidden units
├─ Achievement: Prove Minsky wrong - MLPs can learn XOR!
└─ Test: 100% accuracy on XOR with your backpropagation
```
### Milestone Checkpoint 3: 1986 MLP Revival
**Unlock After**: Module 08
```
🏆 CHECKPOINT: Recognize Handwritten Digits (MNIST)
├─ Dataset: MNIST (60,000 handwritten digits)
├─ Architecture: 2-3 layer MLP with ReLU activations
├─ Achievement: 95%+ accuracy on real computer vision!
└─ Test: Your network recognizes digits you draw yourself
```
---
## Phase 3: ADVANCED ARCHITECTURES (Modules 09-13)
**Goal**: Build modern CV and NLP architectures
**Time**: 20-25 hours | **Difficulty**: ⭐⭐⭐⭐ Advanced concepts
```
┌──────────┐     ┌───────────────┐     ┌─────────────┐
│    09    │────▶│      10       │────▶│     11      │
│ Spatial  │     │ Tokenization  │     │ Embeddings  │
│          │     │               │     │             │
│ • Conv2d │     │ • BPE         │     │ • Token Emb │
│ • Pool2d │     │ • Vocab       │     │ • Position  │
│ • CNNs   │     │ • Encoding    │     │ • Learned   │
└──────────┘     └───────────────┘     └─────────────┘
  5-6 hrs            4-5 hrs              3-4 hrs
  ⭐⭐⭐⭐⭐           ⭐⭐⭐⭐               ⭐⭐⭐⭐
     │                  │                    │
     │                  └─────────┬──────────┘
     │                            ▼
     │                      ┌──────────┐     ┌──────────────┐
     │                      │    12    │────▶│      13      │
     │                      │Attention │     │ Transformers │
     │                      │          │     │              │
     │                      │ • Q,K,V  │     │ • Encoder    │
     │                      │ • Multi- │     │ • Decoder    │
     │                      │   Head   │     │ • Complete   │
     │                      └──────────┘     └──────────────┘
     │                        4-5 hrs            6-8 hrs
     │                        ⭐⭐⭐⭐⭐           ⭐⭐⭐⭐⭐
     │                            │                  │
     └────────────────────────────┴──────────────────┘
                 ALL USE AUTOGRAD (Module 05)
```
### Module Details
**Module 09: Spatial Operations** (5-6 hours, ⭐⭐⭐⭐⭐) **CRITICAL MODULE**
- Convolutional Neural Networks: Modern computer vision
- Implement: Conv2d (with 6 nested loops!), MaxPool2d
- Understand: Why CNNs revolutionized image processing
- Why it matters: The foundation of modern computer vision
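Those six nested loops look like this in a direct NumPy sketch (no padding or stride, and the function name is illustrative):

```python
import numpy as np

def conv2d_naive(x, kernels):
    """Direct convolution (cross-correlation, as in PyTorch) with explicit
    loops; x is (C_in, H, W), kernels is (C_out, C_in, KH, KW)."""
    c_in, h, w = x.shape
    c_out, _, kh, kw = kernels.shape
    out = np.zeros((c_out, h - kh + 1, w - kw + 1))
    for co in range(c_out):                   # each output channel
        for i in range(out.shape[1]):         # each output row
            for j in range(out.shape[2]):     # each output column
                for ci in range(c_in):        # each input channel
                    for di in range(kh):      # kernel rows
                        for dj in range(kw):  # kernel cols
                            out[co, i, j] += x[ci, i + di, j + dj] * kernels[co, ci, di, dj]
    return out

x = np.random.randn(3, 8, 8)     # 3-channel 8x8 "image"
k = np.random.randn(4, 3, 3, 3)  # four 3x3 filters
print(conv2d_naive(x, k).shape)  # (4, 6, 6)
```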
**Module 10: Tokenization** (4-5 hours, ⭐⭐⭐⭐)
- Text preprocessing: From strings to numbers
- Implement: Byte-Pair Encoding (BPE), vocabulary building
- Understand: How transformers see language
- Why it matters: Can't process text without tokenization
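The heart of BPE is "count adjacent pairs, merge the most frequent." A toy sketch of the training loop (real tokenizers add byte-level handling and special tokens):

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count adjacent symbol pairs; BPE repeatedly merges the most common one.
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def merge(tokens, pair):
    # Replace every occurrence of `pair` with a single merged symbol.
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = list("low lower lowest")
for _ in range(2):                  # two merge rounds
    tokens = merge(tokens, most_frequent_pair(tokens))
print(tokens)  # 'l'+'o' then 'lo'+'w' merge into a single 'low' symbol
```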
**Module 11: Embeddings** (3-4 hours, ⭐⭐⭐⭐)
- Convert tokens to vectors: Token and positional embeddings
- Implement: Embedding lookup, sinusoidal position encoding
- Understand: How models represent meaning
- Why it matters: Foundation for all language models
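Both ideas are compact in NumPy. A sketch of embedding lookup plus the sinusoidal encoding from "Attention Is All You Need" (Module 11 may use learned positions instead):

```python
import numpy as np

def embed(token_ids, table):
    # Token embedding is just a row lookup into a learned matrix.
    return table[token_ids]

def sinusoidal_positions(seq_len, d_model):
    # Even dims get sin, odd dims get cos, at geometrically spaced frequencies.
    pos = np.arange(seq_len)[:, None]
    dim = np.arange(0, d_model, 2)[None, :]
    angles = pos / (10000 ** (dim / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

table = np.random.randn(100, 16)  # vocab of 100, 16-dim embeddings
x = embed(np.array([5, 42, 7]), table) + sinusoidal_positions(3, 16)
print(x.shape)  # (3, 16): token meaning + position information
```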
**Module 12: Attention** (4-5 hours, ⭐⭐⭐⭐⭐) **CRITICAL MODULE**
- The transformer revolution: Multi-head self-attention
- Implement: Q, K, V projections, scaled dot-product attention
- Understand: Why attention changed everything
- Why it matters: The core of GPT, BERT, and all modern LLMs
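The whole mechanism is one formula: softmax(QKᵀ/√d)·V. A single-head NumPy sketch, without the masking and batching the module adds:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """The core equation of Module 12: softmax(QK^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)    # how much each query attends to each key
    scores -= scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # weighted mix of value vectors

seq, d = 5, 8
x = np.random.randn(seq, d)
Wq, Wk, Wv = (np.random.randn(d, d) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)  # (5, 8): one contextualized vector per position
```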
**Module 13: Transformers** (6-8 hours, ⭐⭐⭐⭐⭐) **CRITICAL MODULE**
- Complete transformer architecture: GPT-style models
- Implement: Encoder/decoder blocks, layer norm, residuals
- Build: Full transformer from components
- Why it matters: You're building GPT from scratch!
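Structurally, a block is just attention and an MLP, each wrapped in layer norm and a residual connection. A pre-norm sketch with made-up weight names, not the TinyTorch API:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize each position's features to zero mean, unit variance.
    return (x - x.mean(axis=-1, keepdims=True)) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def transformer_block(x, Wq, Wk, Wv, Wo, W1, W2):
    """One pre-norm block: attention + MLP, each with a residual connection."""
    h = layer_norm(x)
    Q, K, V = h @ Wq, h @ Wk, h @ Wv
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V
    x = x + attn @ Wo                     # residual 1: attention
    h = layer_norm(x)
    x = x + np.maximum(0.0, h @ W1) @ W2  # residual 2: ReLU MLP
    return x

d, seq = 8, 5
x = np.random.randn(seq, d)
Ws = [np.random.randn(d, d) * 0.1 for _ in range(4)]
W1, W2 = np.random.randn(d, 4 * d) * 0.1, np.random.randn(4 * d, d) * 0.1
print(transformer_block(x, *Ws, W1, W2).shape)  # (5, 8)
```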
### Milestone Checkpoint 4: 1998 CNN Revolution
**Unlock After**: Module 09
```
🏆 CHECKPOINT: CIFAR-10 Image Classification (North Star!)
├─ Dataset: CIFAR-10 (50,000 color images, 10 classes)
├─ Architecture: LeNet-inspired CNN with Conv2d + MaxPool
├─ Achievement: 75%+ accuracy on real-world images!
├─ Test: Classify airplanes, cars, birds, cats, etc.
└─ Impact: This is where your framework becomes REAL
```
### Milestone Checkpoint 5: 2017 Transformer Era
**Unlock After**: Module 13
```
🏆 CHECKPOINT: Build a Language Model
├─ Dataset: Text corpus (Shakespeare, WikiText, etc.)
├─ Architecture: GPT-style decoder with multi-head attention
├─ Achievement: Generate coherent text character-by-character
├─ Test: Your model completes sentences meaningfully
└─ Impact: You've built the architecture behind ChatGPT!
```
---
## Phase 4: PRODUCTION SYSTEMS (Modules 14-20)
**Goal**: Optimize and deploy ML systems at scale
**Time**: 18-22 hours | **Difficulty**: ⭐⭐⭐⭐⭐ Systems engineering
```
┌──────────┐     ┌──────────────┐     ┌──────────────┐
│    14    │────▶│      15      │────▶│      16      │
│Profiling │     │ Quantization │     │ Compression  │
│          │     │              │     │              │
│ • Time   │     │ • INT8       │     │ • Pruning    │
│ • Memory │     │ • Calibrate  │     │ • Distill    │
│ • FLOPs  │     │ • Compress   │     │ • Sparse     │
└──────────┘     └──────────────┘     └──────────────┘
  3-4 hrs            5-6 hrs              4-5 hrs
  ⭐⭐⭐⭐            ⭐⭐⭐⭐⭐             ⭐⭐⭐⭐⭐
     ▼                  ▼                    ▼
┌───────────┐    ┌──────────────┐    ┌──────────┐    ┌──────────┐
│    17     │───▶│      18      │───▶│    19    │───▶│    20    │
│Memoization│    │ Acceleration │    │Benchmark │    │ Capstone │
│           │    │              │    │          │    │          │
│ • KV-Cache│    │ • Vectorize  │    │ • Compare│    │ • Full   │
│ • Reuse   │    │ • Hardware   │    │ • Report │    │   System │
│ • Speedup │    │ • Parallel   │    │ • Analyze│    │ • Deploy │
└───────────┘    └──────────────┘    └──────────┘    └──────────┘
  3-4 hrs            3-4 hrs           3-4 hrs         4-6 hrs
  ⭐⭐⭐⭐            ⭐⭐⭐⭐            ⭐⭐⭐⭐         ⭐⭐⭐⭐⭐
```
### Module Details
**Module 14: Profiling** (3-4 hours, ⭐⭐⭐⭐)
- Measure everything: Time, memory, FLOPs
- Implement: Profiling decorators, bottleneck analysis
- Understand: Where computation actually happens
- Why it matters: Can't optimize what you don't measure
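A timing decorator captures the core idea. A sketch using `time.perf_counter` (Module 14 extends this to memory and FLOP counting):

```python
import time
from functools import wraps

def timed(fn):
    """Wrap a function and report wall-clock time per call."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: {elapsed * 1000:.2f} ms")
        return result
    return wrapper

@timed
def slow_sum(n):
    return sum(range(n))

slow_sum(1_000_000)  # prints something like "slow_sum: 25.31 ms"
```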
**Module 15: Quantization** (5-6 hours, ⭐⭐⭐⭐⭐)
- Compress models: Float32 → INT8
- Implement: Quantization, calibration, dequantization
- Achieve: 4× smaller models, faster inference
- Why it matters: Deploy models on edge devices
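Symmetric linear quantization is the simplest scheme and shows where the 4× comes from. A sketch (calibration, covered in the module, chooses the scale from representative data):

```python
import numpy as np

def quantize_int8(w):
    """Map float32 weights to int8 plus one scale factor."""
    scale = np.abs(w).max() / 127.0  # largest value maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale  # recover approximate floats

w = np.random.randn(256).astype(np.float32)
q, scale = quantize_int8(w)
print(q.nbytes / w.nbytes)                     # 0.25: 4x smaller
print(np.abs(dequantize(q, scale) - w).max())  # small rounding error
```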
**Module 16: Compression** (4-5 hours, ⭐⭐⭐⭐⭐)
- Shrink models: Pruning and distillation
- Implement: Weight pruning, knowledge distillation
- Achieve: 10× smaller models with minimal accuracy loss
- Why it matters: Mobile ML and resource-constrained deployment
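Magnitude pruning keeps the largest weights and zeroes the rest. A sketch (structured pruning and distillation, also covered, need more machinery):

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero out the smallest weights, keeping the top (1 - sparsity)
    fraction by absolute value."""
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) >= threshold
    return w * mask, mask

w = np.random.randn(1000)
pruned, mask = magnitude_prune(w, sparsity=0.9)
print(mask.mean())  # ~0.1: only 10% of weights survive
```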
**Module 17: Memoization** (3-4 hours, ⭐⭐⭐⭐)
- Cache computations: KV-cache for transformers
- Implement: Memoization decorators, cache management
- Optimize: 10-100× speedup for inference
- Why it matters: How production LLMs run efficiently
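The KV-cache idea: during autoregressive generation, past tokens' keys and values never change, so store them instead of recomputing them every step. A single-head sketch with illustrative names:

```python
import numpy as np

class KVCache:
    """Store keys/values for past tokens once; grow the cache per step."""

    def __init__(self):
        self.keys, self.values = None, None

    def append(self, k, v):
        # k, v: (new_tokens, d) for the tokens just processed.
        if self.keys is None:
            self.keys, self.values = k, v
        else:
            self.keys = np.concatenate([self.keys, k])
            self.values = np.concatenate([self.values, v])
        return self.keys, self.values

cache = KVCache()
for step in range(3):  # one new token per decode step
    k, v = np.random.randn(1, 8), np.random.randn(1, 8)
    K, V = cache.append(k, v)
print(K.shape)  # (3, 8): attention reuses all cached keys each step
```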
**Module 18: Acceleration** (3-4 hours, ⭐⭐⭐⭐)
- Hardware optimization: Vectorization, parallelization
- Implement: NumPy tricks, batch processing
- Achieve: 10-100× speedups
- Why it matters: Production systems need speed
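The single biggest win is replacing Python loops with vectorized NumPy calls. A self-contained comparison you can run (exact speedups vary by machine):

```python
import time
import numpy as np

x = np.random.randn(1_000_000)

start = time.perf_counter()
total = 0.0
for v in x:                         # interpreted loop: one step per element
    total += v * v
loop_time = time.perf_counter() - start

start = time.perf_counter()
total_vec = float(np.dot(x, x))     # one vectorized call into optimized C
vec_time = time.perf_counter() - start

print(f"speedup: {loop_time / vec_time:.0f}x")  # typically well over 50x
```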
**Module 19: Benchmarking** (3-4 hours, ⭐⭐⭐⭐)
- Compare implementations: Rigorous performance testing
- Implement: Benchmark suite, statistical analysis
- Report: Scientific measurements
- Why it matters: Engineering decisions need data
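The essentials are warmup, repetition, and reporting variance rather than a single noisy number. A sketch using only the standard library:

```python
import time
import statistics

def benchmark(fn, repeats=20, warmup=3):
    """Warm up, repeat, and report mean +/- stdev of wall-clock time."""
    for _ in range(warmup):
        fn()                          # let caches and allocators settle
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)

mean, stdev = benchmark(lambda: sum(range(100_000)))
print(f"{mean * 1000:.2f} ms +/- {stdev * 1000:.2f} ms")
```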
**Module 20: Capstone** (4-6 hours, ⭐⭐⭐⭐⭐) **FINAL PROJECT**
- Build complete system: End-to-end ML pipeline
- Integrate: All 19 previous modules into a production-ready system
- Deploy: Real application with optimization
- Why it matters: This is your portfolio piece!
### Milestone Checkpoint 6: 2024 Systems Age
**Unlock After**: Module 20
```
🏆 FINAL CHECKPOINT: Production-Optimized ML System
├─ Challenge: Take any milestone and make it production-ready
├─ Requirements:
│ ├─ 10× faster inference (profiling + acceleration)
│ ├─ 4× smaller model (quantization + compression)
│ ├─ <100ms latency (memoization + optimization)
│ └─ Rigorous benchmarks (statistical significance)
├─ Achievement: You're now an ML systems engineer!
└─ Test: Deploy your system, measure everything, compare to PyTorch
```
---
## Dependency Map: How Modules Connect
```
CORE FOUNDATION
├─ Module 01 (Tensor)
│ ├─▶ Module 02 (Activations)
│ ├─▶ Module 03 (Layers)
│ ├─▶ Module 04 (Losses)
│ └─▶ Module 08 (DataLoader)
TRAINING ENGINE
├─ Module 05 (Autograd) ← Enhances Module 01
│ ├─▶ Module 06 (Optimizers)
│ └─▶ Module 07 (Training)
COMPUTER VISION BRANCH
├─ Module 09 (Spatial) ← Uses 01,02,03,05
│ └─▶ Module 20 (Capstone)
NLP BRANCH
├─ Module 10 (Tokenization) ← Uses 01
│ ├─▶ Module 11 (Embeddings)
│ └─▶ Module 12 (Attention) ← Uses 01,03,05,11
│ └─▶ Module 13 (Transformers) ← Uses 02,11,12
OPTIMIZATION BRANCH
├─ Module 14 (Profiling) ← Measures any module
│ ├─▶ Module 15 (Quantization) ← Compresses any module
│ ├─▶ Module 16 (Compression) ← Shrinks any module
│ ├─▶ Module 17 (Memoization) ← Optimizes 12,13
│ ├─▶ Module 18 (Acceleration) ← Speeds up any module
│ └─▶ Module 19 (Benchmarking) ← Measures optimizations
│ └─▶ Module 20 (Capstone)
```
---
## Time Estimates by Experience Level
```
┌──────────────────┬──────────┬──────────┬──────────┬──────────┐
│ Experience Level │ Phase 1  │ Phase 2  │ Phase 3  │ Phase 4  │
├──────────────────┼──────────┼──────────┼──────────┼──────────┤
│ Beginner         │ 12-15h   │ 18-22h   │ 25-30h   │ 22-26h   │
│ (New to ML)      │          │          │          │          │
├──────────────────┼──────────┼──────────┼──────────┼──────────┤
│ Intermediate     │ 10-12h   │ 14-18h   │ 20-25h   │ 18-22h   │
│ (Used PyTorch)   │          │          │          │          │
├──────────────────┼──────────┼──────────┼──────────┼──────────┤
│ Advanced         │  8-10h   │ 12-15h   │ 18-22h   │ 16-20h   │
│ (Built models)   │          │          │          │          │
└──────────────────┴──────────┴──────────┴──────────┴──────────┘

Total Time: 60-80 hours (Intermediate) | 3-4 weeks at 20 hrs/week
```
---
## Difficulty Ratings Explained
```
⭐⭐       │ Beginner-friendly
          │ - Follow clear instructions
          │ - Build intuition for concepts
          │ - ~2 hours per module

⭐⭐⭐      │ Core ML concepts
          │ - Implement fundamental algorithms
          │ - Connect multiple concepts
          │ - ~3 hours per module

⭐⭐⭐⭐     │ Advanced implementation
          │ - Complex algorithms
          │ - Systems thinking required
          │ - ~4 hours per module

⭐⭐⭐⭐⭐    │ Expert-level systems
          │ - Multi-layered complexity
          │ - Production considerations
          │ - ~5-6 hours per module
```
---
## Suggested Learning Paths
### Fast Track (Core ML Only) - 40 hours
Focus on the essentials to build and train networks:
```
01 → 02 → 03 → 04 → 05 → 06 → 07 → 08 → 09
(Tensor through Spatial for CNNs)
Milestones: Perceptron → XOR → MNIST → CIFAR-10
```
### NLP Focus - 55 hours
Core + Language models:
```
01 → 02 → 03 → 04 → 05 → 06 → 07 → 08
10 → 11 → 12 → 13
(Add Tokenization through Transformers)
Milestones: All ML history + Transformer Era
```
### Systems Engineering Path - Full 75 hours
Everything + optimization:
```
Complete all 20 modules
(Tensor → Transformers → Optimization → Capstone)
Milestones: All 6 checkpoints + Production Systems
```
---
## Success Metrics: What "Done" Looks Like
```
✅ Module Complete When:
├─ All unit tests pass (test_unit_* functions)
├─ Module integration test passes (test_module())
├─ You can explain the algorithm to someone else
└─ Code matches PyTorch API (but implemented from scratch)
✅ Phase Complete When:
├─ All modules in phase pass tests
├─ Milestone checkpoint achieved
└─ You understand connections between modules
✅ Course Complete When:
├─ All 20 modules implemented
├─ All 6 milestones achieved
├─ Capstone project deployed
└─ You can confidently say: "I built a transformer from scratch!"
```
---
## Common Questions
**Q: Do I need to complete modules in order?**
A: YES! Each module builds on previous ones. Module 05 (Autograd) enhances Module 01 (Tensor), and Module 12 (Attention) uses Modules 01, 03, 05, and 11. The dependency chain is strict.
**Q: Can I skip modules?**
A: Modules 01-08 are REQUIRED. Modules 09-13 split into CV (09) and NLP (10-13) tracks - you can choose one. Modules 14-20 are optimization - recommended but optional for core understanding.
**Q: How do I know if I'm ready for the next module?**
A: Run `test_module()` - if all tests pass, you're ready! Each module has comprehensive integration tests.
**Q: What if I get stuck?**
A: Each module has reference solutions, detailed scaffolding, and clear error messages. Plus milestone checkpoints validate your progress.
**Q: How is this different from online courses?**
A: You BUILD everything from scratch. No black boxes. No "just import PyTorch." You implement every line of a production ML framework.
---
## Your Journey Starts Now
```
┌───────────────────────────────────────────────┐
│ 📍 YOU ARE HERE                               │
│                                               │
│ Next Step: cd modules/01_tensor/              │
│            jupyter notebook tensor_dev.py     │
│                                               │
│ First Goal:       Understand what a tensor is │
│ First Win:        Implement your first matmul │
│ First Checkpoint: Train a perceptron          │
│                                               │
│ 🎯 Final Destination (60-80 hours ahead):     │
│    "I built a transformer from scratch!"      │
└───────────────────────────────────────────────┘
```
**Remember**: Every expert was once a beginner. Every line of PyTorch was written by someone who understood these fundamentals. Now it's your turn.
**Ready to start building?**
```bash
cd modules/01_tensor
jupyter notebook tensor_dev.py
```
Let's build something amazing! 🚀