TinyTorch Learning Journey

From Zero to Transformer: A 20-Module Adventure

┌─────────────────────────────────────────────────────────────────────┐
│                    🎯 YOUR LEARNING DESTINATION                      │
│                                                                       │
│  Start: "What's a tensor?"                                           │
│    ↓                                                                  │
│  Finish: "I built a transformer from scratch using only NumPy!"      │
│                                                                       │
│  🏆 North Star Achievement: Train CNNs on CIFAR-10 to 75%+ accuracy │
└─────────────────────────────────────────────────────────────────────┘

Overview: 4 Phases, 20 Modules, 6 Milestones

Total Time: 60-80 hours (3-4 weeks at 20 hrs/week)
Prerequisites: Python, NumPy basics, basic linear algebra
Tools: Just Python + NumPy + Jupyter notebooks


Phase 1: FOUNDATION (Modules 01-04)

Goal: Build the fundamental data structures and operations
Time: 10-12 hours | Difficulty: Beginner-friendly

┌──────────┐      ┌──────────────┐      ┌─────────┐      ┌─────────┐
│    01    │─────▶│      02      │─────▶│   03    │─────▶│   04    │
│  Tensor  │      │ Activations  │      │ Layers  │      │ Losses  │
│          │      │              │      │         │      │         │
│ • Shape  │      │ • ReLU       │      │ • Linear│      │ • MSE   │
│ • Data   │      │ • Sigmoid    │      │ • Module│      │ • Cross │
│ • Ops    │      │ • Softmax    │      │ • Params│      │   Entropy│
└──────────┘      └──────────────┘      └─────────┘      └─────────┘
  2-3 hrs           1.5-2 hrs            2-3 hrs          2-3 hrs
   ⭐⭐              ⭐⭐                  ⭐⭐⭐            ⭐⭐⭐

Module Details

Module 01: Tensor (2-3 hours, ⭐⭐)

  • Build the foundation: n-dimensional arrays with operations
  • Implement: shape, reshape, indexing, broadcasting
  • Operations: add, multiply, matmul, transpose
  • Why it matters: Everything in ML is tensor operations
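
To make the API concrete, here is a minimal sketch of a NumPy-backed tensor. The class and method names are illustrative, not the actual Module 01 interface:

```python
import numpy as np

class Tensor:
    """Minimal n-dimensional array wrapper (illustrative sketch only)."""
    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float32)

    @property
    def shape(self):
        return self.data.shape

    def __add__(self, other):      # broadcasting comes free from NumPy
        return Tensor(self.data + other.data)

    def matmul(self, other):       # matrix multiplication
        return Tensor(self.data @ other.data)

a = Tensor([[1, 2], [3, 4]])
b = Tensor([[5, 6], [7, 8]])
print(a.matmul(b).shape)           # (2, 2)
```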

Module 02: Activations (1.5-2 hours, ⭐⭐)

  • Add non-linearity: ReLU, Sigmoid, Softmax
  • Understand: Why neural networks need activations
  • Implement: Forward passes for each activation
  • Why it matters: Without activations, networks are just linear algebra
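
Each activation is a few lines of NumPy. A minimal sketch (the stable-softmax trick of subtracting the row maximum is the detail worth remembering):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits).sum())                   # ≈ 1.0: a valid probability distribution
```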

Module 03: Layers (2-3 hours, ⭐⭐⭐)

  • Build neural network components: Linear layers
  • Implement: nn.Module system, Parameter class
  • Create: Weight initialization, layer composition
  • Why it matters: Foundation for all network architectures
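
A minimal sketch of a linear layer; the `Linear` name and the He-style initialization are illustrative choices, not necessarily the module's exact API:

```python
import numpy as np

class Linear:
    """y = x @ W + b, with scaled random initialization (illustrative)."""
    def __init__(self, in_features, out_features):
        # He-style scaling keeps activations from exploding or vanishing
        scale = np.sqrt(2.0 / in_features)
        self.weight = np.random.randn(in_features, out_features) * scale
        self.bias = np.zeros(out_features)

    def forward(self, x):
        return x @ self.weight + self.bias

layer = Linear(4, 3)
out = layer.forward(np.random.randn(8, 4))     # batch of 8 inputs
print(out.shape)                               # (8, 3)
```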

Module 04: Losses (2-3 hours, ⭐⭐⭐)

  • Measure performance: MSE and CrossEntropy
  • Understand: How to quantify model errors
  • Implement: Loss calculation and aggregation
  • Why it matters: Without loss, we can't train networks
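
Both losses reduce to short NumPy expressions. A sketch using the standard log-sum-exp trick for numerical stability (function names are illustrative):

```python
import numpy as np

def mse(pred, target):
    return np.mean((pred - target) ** 2)

def cross_entropy(logits, labels):
    # stable log-softmax, then pick out the true-class log-probabilities
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, 0.1], [0.2, 3.0, 0.3]])
labels = np.array([0, 1])
print(cross_entropy(logits, labels))   # small loss: both predictions are correct
```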

Milestone Checkpoint 1: 1957 Perceptron

Unlock After: Module 04

🏆 CHECKPOINT: Train Rosenblatt's Original Perceptron
├─ Dataset: Linearly separable binary classification
├─ Architecture: Single layer, no hidden units
├─ Achievement: First trainable neural network in history!
└─ Test: Can your implementation learn AND/OR logic?

Phase 2: TRAINING SYSTEMS (Modules 05-08)

Goal: Make your networks learn from data
Time: 14-18 hours | Difficulty: Core ML concepts

┌──────────┐      ┌────────────┐      ┌──────────┐      ┌────────────┐
│    05    │─────▶│     06     │─────▶│    07    │─────▶│     08     │
│ Autograd │      │ Optimizers │      │ Training │      │ DataLoader │
│          │      │            │      │          │      │            │
│ • Graph  │      │ • SGD      │      │ • Loops  │      │ • Batching │
│ • Forward│      │ • Momentum │      │ • Epochs │      │ • Shuffling│
│ • Backward│     │ • Adam     │      │ • Eval   │      │ • Pipeline │
└──────────┘      └────────────┘      └──────────┘      └────────────┘
  3-4 hrs          3-4 hrs             4-5 hrs           3-4 hrs
  ⭐⭐⭐⭐          ⭐⭐⭐⭐             ⭐⭐⭐⭐           ⭐⭐⭐
     │                 │                  │                  │
     └─────────────────┴──────────────────┴──────────────────┘
                    ALL BUILD ON TENSOR (Module 01)

Module Details

Module 05: Autograd (3-4 hours, ⭐⭐⭐⭐) CRITICAL MODULE

  • Implement automatic differentiation: The magic of modern ML
  • Build: Computational graph, gradient tracking
  • Implement: backward() for all operations
  • Why it matters: This IS machine learning - without gradients, no training
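
The core idea fits in a page: every operation records how to push gradients back to its inputs, and backward() replays those rules in reverse topological order. A scalar-valued sketch in the style of micrograd (the actual module is tensor-valued):

```python
class Value:
    """Scalar autograd node (illustrative sketch; Module 05 works on tensors)."""
    def __init__(self, data, parents=()):
        self.data, self.grad = data, 0.0
        self.parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():                       # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():                       # chain rule: d(ab)/da = b, d(ab)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        order, seen = [], set()
        def visit(v):                          # topological sort of the graph
            if v not in seen:
                seen.add(v)
                for p in v.parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0                        # seed the output gradient
        for v in reversed(order):
            v._backward()

x, y = Value(2.0), Value(3.0)
z = x * y + x                                  # z = xy + x
z.backward()
print(x.grad, y.grad)                          # 4.0 (= y + 1), 2.0 (= x)
```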

Module 06: Optimizers (3-4 hours, ⭐⭐⭐⭐)

  • Update weights intelligently: SGD, Momentum, Adam
  • Understand: Learning rates, momentum, adaptive methods
  • Implement: Parameter updates, state management
  • Why it matters: How networks actually improve over time
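
The update rules themselves are tiny; the module's real work is the API and state management. A sketch of SGD with momentum, using a stand-in dict in place of the module's parameter objects:

```python
import numpy as np

class SGD:
    """SGD with momentum: v = mu*v - lr*grad; w += v (illustrative API)."""
    def __init__(self, params, lr=0.01, momentum=0.9):
        self.params, self.lr, self.momentum = params, lr, momentum
        self.velocity = [np.zeros_like(p["w"]) for p in params]  # per-parameter state

    def step(self):
        for p, v in zip(self.params, self.velocity):
            v *= self.momentum          # decay old velocity
            v -= self.lr * p["grad"]    # accumulate current gradient
            p["w"] += v                 # update weights in place

param = {"w": np.array([1.0, 2.0]), "grad": np.array([0.5, -0.5])}
opt = SGD([param], lr=0.1)
opt.step()
print(param["w"])                       # [0.95 2.05]
```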

Module 07: Training (4-5 hours, ⭐⭐⭐⭐) CRITICAL MODULE

  • Complete training loops: The full ML pipeline
  • Implement: Epochs, batches, forward/backward passes
  • Add: Metrics tracking, model evaluation
  • Why it matters: This is where everything comes together
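
The loop structure is what matters: forward, loss, backward, step, repeat. A complete (if tiny) runnable example using linear regression with hand-computed gradients; Module 07 builds the same skeleton around autograd, models, and loaders:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=256)

w, lr = np.zeros(3), 0.1
for epoch in range(20):
    for i in range(0, len(X), 32):             # mini-batches of 32
        xb, yb = X[i:i + 32], y[i:i + 32]
        err = xb @ w - yb                      # forward pass: prediction residual
        grad = xb.T @ err / len(xb)            # gradient of the squared error
        w -= lr * grad                         # optimizer step
print(w)                                       # close to [ 1.5 -2.0  0.5]
```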

Module 08: DataLoader (3-4 hours, ⭐⭐⭐)

  • Efficient data handling: Batching, shuffling, pipelines
  • Implement: Batch creation, data iteration
  • Optimize: Memory efficiency, preprocessing
  • Why it matters: Real ML needs to handle millions of examples
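
The essential mechanics: shuffle an index array once per epoch, then slice it into batches. A minimal generator sketch (the real module wraps this in a class):

```python
import numpy as np

def batches(X, y, batch_size=32, shuffle=True, seed=0):
    """Yield (inputs, targets) mini-batches; a sketch of what Module 08 builds."""
    idx = np.arange(len(X))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)   # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], y[sel]

X, y = np.arange(100).reshape(50, 2), np.arange(50)
for xb, yb in batches(X, y, batch_size=16):
    print(xb.shape, yb.shape)    # three (16, 2) batches, then a final (2, 2)
```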

Milestone Checkpoint 2: 1969 XOR Crisis & Solution

Unlock After: Module 07

🏆 CHECKPOINT: Solve the Problem That Nearly Killed AI
├─ Dataset: XOR (the "impossible" problem for single-layer networks)
├─ Architecture: Multi-layer perceptron with hidden units
├─ Achievement: Prove Minsky wrong - MLPs can learn XOR!
└─ Test: 100% accuracy on XOR with your backpropagation

Milestone Checkpoint 3: 1986 MLP Revival

Unlock After: Module 08

🏆 CHECKPOINT: Recognize Handwritten Digits (MNIST)
├─ Dataset: MNIST (60,000 handwritten digits)
├─ Architecture: 2-3 layer MLP with ReLU activations
├─ Achievement: 95%+ accuracy on real computer vision!
└─ Test: Your network recognizes digits you draw yourself

Phase 3: ADVANCED ARCHITECTURES (Modules 09-13)

Goal: Build modern CV and NLP architectures
Time: 20-25 hours | Difficulty: Advanced concepts

┌──────────┐      ┌───────────────┐      ┌─────────────┐
│    09    │─────▶│      10       │─────▶│     11      │
│ Spatial  │      │ Tokenization  │      │ Embeddings  │
│          │      │               │      │             │
│ • Conv2d │      │ • BPE         │      │ • Token Emb │
│ • Pool2d │      │ • Vocab       │      │ • Position  │
│ • CNNs   │      │ • Encoding    │      │ • Learned   │
└──────────┘      └───────────────┘      └─────────────┘
  5-6 hrs          4-5 hrs                3-4 hrs
 ⭐⭐⭐⭐⭐         ⭐⭐⭐⭐               ⭐⭐⭐⭐
     │                  │                      │
     │                  └──────────┬───────────┘
     │                             ▼
     │            ┌──────────┐      ┌──────────────┐
     │            │    12    │─────▶│      13      │
     │            │Attention │      │Transformers  │
     │            │          │      │              │
     │            │ • Q,K,V  │      │ • Encoder    │
     │            │ • Multi  │      │ • Decoder    │
     │            │   -Head  │      │ • Complete   │
     │            └──────────┘      └──────────────┘
     │              4-5 hrs           6-8 hrs
     │             ⭐⭐⭐⭐⭐          ⭐⭐⭐⭐⭐
     │                  │                  │
     └──────────────────┴──────────────────┘
              ALL USE AUTOGRAD (Module 05)

Module Details

Module 09: Spatial Operations (5-6 hours, ⭐⭐⭐⭐⭐) CRITICAL MODULE

  • Convolutional Neural Networks: Modern computer vision
  • Implement: Conv2d (with 6 nested loops!), MaxPool2d
  • Understand: Why CNNs revolutionized image processing
  • Why it matters: The foundation of modern computer vision
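
The naive implementation really is just nested loops over output positions; the module's full version adds loops over batch, channels, and kernel positions (hence six). A single-channel sketch:

```python
import numpy as np

def conv2d(image, kernel):
    """Naive valid-mode 2-D convolution (cross-correlation, as is standard in ML)."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):          # slide the window over rows...
        for j in range(out.shape[1]):      # ...and columns
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

edge = np.array([[1.0, 0.0, -1.0]] * 3)    # simple vertical-edge detector
img = np.zeros((5, 5)); img[:, :2] = 1.0   # bright left half, dark right half
print(conv2d(img, edge))                   # strong response where the edge sits
```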

Module 10: Tokenization (4-5 hours, ⭐⭐⭐⭐)

  • Text preprocessing: From strings to numbers
  • Implement: Byte-Pair Encoding (BPE), vocabulary building
  • Understand: How transformers see language
  • Why it matters: Can't process text without tokenization
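
A toy sketch of the BPE training loop: count adjacent symbol pairs, merge the most frequent one, repeat. The regex guard keeps merges from crossing symbol boundaries (this follows the classic Sennrich-style pseudocode, not necessarily the module's API):

```python
import re
from collections import Counter

def bpe_merges(corpus, num_merges=5):
    """Learn byte-pair merges from a list of words (illustrative sketch)."""
    vocab = Counter(" ".join(word) for word in corpus)   # words as spaced symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():                 # count adjacent symbol pairs
            syms = word.split()
            for a, b in zip(syms, syms[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)                 # most frequent pair wins
        merges.append(best)
        pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(best)) + r"(?!\S)")
        vocab = Counter({pattern.sub("".join(best), w): f for w, f in vocab.items()})
    return merges

print(bpe_merges(["low", "low", "lower", "lowest"]))
# early merges pick up frequent pairs like ('l', 'o') and ('lo', 'w')
```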

Module 11: Embeddings (3-4 hours, ⭐⭐⭐⭐)

  • Convert tokens to vectors: Token and positional embeddings
  • Implement: Embedding lookup, sinusoidal position encoding
  • Understand: How models represent meaning
  • Why it matters: Foundation for all language models
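
Token embeddings are a table lookup; positions get the sinusoidal encoding from the original transformer paper. A sketch with illustrative sizes:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal position encodings from 'Attention Is All You Need'."""
    pos = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                       # odd dimensions: cosine
    return pe

# token embedding is just a lookup into a (learned) matrix
vocab, d_model = 100, 16
emb_table = np.random.randn(vocab, d_model) * 0.02
tokens = np.array([5, 42, 7])
x = emb_table[tokens] + positional_encoding(len(tokens), d_model)
print(x.shape)                                         # (3, 16)
```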

Module 12: Attention (4-5 hours, ⭐⭐⭐⭐⭐) CRITICAL MODULE

  • The transformer revolution: Multi-head self-attention
  • Implement: Q, K, V projections, scaled dot-product attention
  • Understand: Why attention changed everything
  • Why it matters: The core of GPT, BERT, and all modern LLMs
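
The whole mechanism is three matrix multiplications and a softmax. A single-head sketch; multi-head attention runs this in parallel over split projections, and GPT-style models add a causal mask to the scores:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # (seq, seq) similarity matrix
    weights = softmax(scores, axis=-1)     # each row sums to 1
    return weights @ V                     # weighted mix of value vectors

seq, d_k = 4, 8
Q, K, V = (np.random.randn(seq, d_k) for _ in range(3))
print(attention(Q, K, V).shape)            # (4, 8)
```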

Module 13: Transformers (6-8 hours, ⭐⭐⭐⭐⭐) CRITICAL MODULE

  • Complete transformer architecture: GPT-style models
  • Implement: Encoder/decoder blocks, layer norm, residuals
  • Build: Full transformer from components
  • Why it matters: You're building GPT from scratch!
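
One common wiring (pre-norm, as in GPT-2-style models) is two residual sublayers around layer norm. A sketch with stand-in sublayers just to show the data flow; the real attention and MLP come from earlier modules, and this layer norm omits the learnable scale and shift:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def transformer_block(x, attn, mlp):
    """Pre-norm residual block: x + Attn(LN(x)), then x + MLP(LN(x))."""
    x = x + attn(layer_norm(x))        # attention sublayer with residual connection
    x = x + mlp(layer_norm(x))         # feed-forward sublayer with residual connection
    return x

seq, d = 4, 8
x = np.random.randn(seq, d)
# stand-in sublayers just to exercise the wiring
out = transformer_block(x, attn=lambda h: h * 0.5, mlp=lambda h: np.tanh(h))
print(out.shape)                       # (4, 8)
```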

Milestone Checkpoint 4: 1998 CNN Revolution

Unlock After: Module 09

🏆 CHECKPOINT: CIFAR-10 Image Classification (North Star!)
├─ Dataset: CIFAR-10 (50,000 color images, 10 classes)
├─ Architecture: LeNet-inspired CNN with Conv2d + MaxPool
├─ Achievement: 75%+ accuracy on real-world images!
├─ Test: Classify airplanes, cars, birds, cats, etc.
└─ Impact: This is where your framework becomes REAL

Milestone Checkpoint 5: 2017 Transformer Era

Unlock After: Module 13

🏆 CHECKPOINT: Build a Language Model
├─ Dataset: Text corpus (Shakespeare, WikiText, etc.)
├─ Architecture: GPT-style decoder with multi-head attention
├─ Achievement: Generate coherent text character-by-character
├─ Test: Your model completes sentences meaningfully
└─ Impact: You've built the architecture behind ChatGPT!

Phase 4: PRODUCTION SYSTEMS (Modules 14-20)

Goal: Optimize and deploy ML systems at scale
Time: 18-22 hours | Difficulty: Systems engineering

┌──────────┐      ┌──────────────┐      ┌──────────────┐
│    14    │─────▶│      15      │─────▶│      16      │
│Profiling │      │ Quantization │      │ Compression  │
│          │      │              │      │              │
│ • Time   │      │ • INT8       │      │ • Pruning    │
│ • Memory │      │ • Calibrate  │      │ • Distill    │
│ • FLOPs  │      │ • Compress   │      │ • Sparse     │
└──────────┘      └──────────────┘      └──────────────┘
  3-4 hrs          5-6 hrs                4-5 hrs
  ⭐⭐⭐⭐          ⭐⭐⭐⭐⭐              ⭐⭐⭐⭐⭐

       ▼                 ▼                     ▼

┌───────────┐     ┌──────────────┐      ┌──────────┐      ┌──────────┐
│    17     │────▶│      18      │─────▶│    19    │─────▶│    20    │
│Memoization│     │ Acceleration │      │Benchmark │      │ Capstone │
│           │     │              │      │          │      │          │
│ • KV-Cache│     │ • Vectorize  │      │ • Compare│      │ • Full   │
│ • Reuse   │     │ • Hardware   │      │ • Report │      │   System │
│ • Speedup │     │ • Parallel   │      │ • Analyze│      │ • Deploy │
└───────────┘     └──────────────┘      └──────────┘      └──────────┘
  3-4 hrs           3-4 hrs               3-4 hrs          4-6 hrs
  ⭐⭐⭐⭐          ⭐⭐⭐⭐              ⭐⭐⭐⭐          ⭐⭐⭐⭐⭐

Module Details

Module 14: Profiling (3-4 hours, ⭐⭐⭐⭐)

  • Measure everything: Time, memory, FLOPs
  • Implement: Profiling decorators, bottleneck analysis
  • Understand: Where computation actually happens
  • Why it matters: Can't optimize what you don't measure
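
A decorator is enough to get started: wrap a function, time it, report. A minimal wall-clock sketch (the module adds memory and FLOP counting on top of this):

```python
import time
import numpy as np
from functools import wraps

def profiled(fn):
    """Decorator that reports wall-clock time per call (illustrative sketch)."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: {elapsed * 1e3:.2f} ms")
        return result
    return wrapper

@profiled
def matmul_test(n=512):
    return np.random.randn(n, n) @ np.random.randn(n, n)

matmul_test()    # prints something like: matmul_test: 15.03 ms
```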

Module 15: Quantization (5-6 hours, ⭐⭐⭐⭐⭐)

  • Compress models: Float32 → INT8
  • Implement: Quantization, calibration, dequantization
  • Achieve: 4× smaller models, faster inference
  • Why it matters: Deploy models on edge devices
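
Symmetric per-tensor quantization is one common scheme (not the only one): map the largest magnitude to 127 and round. A sketch:

```python
import numpy as np

def quantize(x):
    """Symmetric per-tensor INT8 quantization (illustrative sketch)."""
    scale = np.abs(x).max() / 127.0        # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize(w)
err = np.abs(w - dequantize(q, s)).max()
print(q.nbytes / w.nbytes, err)            # 0.25 (4x smaller), small max error
```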

Module 16: Compression (4-5 hours, ⭐⭐⭐⭐⭐)

  • Shrink models: Pruning and distillation
  • Implement: Weight pruning, knowledge distillation
  • Achieve: 10× smaller models with minimal accuracy loss
  • Why it matters: Mobile ML and resource-constrained deployment
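
Unstructured magnitude pruning is the simplest variant: zero out the smallest-magnitude weights. A sketch (distillation and structured sparsity are where the module goes from here):

```python
import numpy as np

def prune_by_magnitude(w, sparsity=0.9):
    """Zero out the smallest-magnitude weights (illustrative sketch)."""
    threshold = np.quantile(np.abs(w), sparsity)   # cutoff below which weights go to 0
    mask = np.abs(w) >= threshold
    return w * mask, mask

w = np.random.randn(512, 512)
pruned, mask = prune_by_magnitude(w, sparsity=0.9)
print(mask.mean())                                 # ~0.10 of weights survive
```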

Module 17: Memoization (3-4 hours, ⭐⭐⭐⭐)

  • Cache computations: KV-cache for transformers
  • Implement: Memoization decorators, cache management
  • Optimize: 10-100× speedup for inference
  • Why it matters: How production LLMs run efficiently
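
The KV-cache idea in miniature: during generation, the keys and values for past tokens never change, so each step appends one new row instead of recomputing the whole sequence. An illustrative sketch (not the module's actual API):

```python
import numpy as np

class KVCache:
    """Append-only cache of per-token keys/values (the idea behind fast decoding)."""
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        # each generation step adds ONE new row instead of recomputing all of them
        self.keys.append(k)
        self.values.append(v)

    def as_arrays(self):
        return np.stack(self.keys), np.stack(self.values)

cache = KVCache()
for step in range(5):                      # pretend we decode 5 tokens
    cache.append(np.random.randn(8), np.random.randn(8))
K, V = cache.as_arrays()
print(K.shape, V.shape)                    # (5, 8) (5, 8): grows one row per token
```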

Module 18: Acceleration (3-4 hours, ⭐⭐⭐⭐)

  • Hardware optimization: Vectorization, parallelization
  • Implement: NumPy tricks, batch processing
  • Achieve: 10-100× speedups
  • Why it matters: Production systems need speed
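
The single biggest lever in NumPy-land is replacing Python loops with vectorized calls. A self-contained before/after you can run:

```python
import time
import numpy as np

x = np.random.randn(1_000_000)

start = time.perf_counter()                # scalar Python loop, one element at a time
total = 0.0
for v in x:
    total += v * v
loop_time = time.perf_counter() - start

start = time.perf_counter()                # one vectorized call does the same work
total_vec = float(x @ x)
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.3f}s  vectorized: {vec_time:.5f}s  "
      f"speedup: ~{loop_time / vec_time:.0f}x")
```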

Module 19: Benchmarking (3-4 hours, ⭐⭐⭐⭐)

  • Compare implementations: Rigorous performance testing
  • Implement: Benchmark suite, statistical analysis
  • Report: Scientific measurements
  • Why it matters: Engineering decisions need data
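
The habit the module builds: repeat the measurement, report a robust statistic (the median resists outliers), and quantify the spread. A minimal sketch:

```python
import time
import statistics
import numpy as np

def benchmark(fn, repeats=20):
    """Time fn repeatedly; report median and spread (illustrative sketch)."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.median(times), statistics.stdev(times)

a = np.random.randn(256, 256)
med, sd = benchmark(lambda: a @ a)
print(f"median {med * 1e3:.3f} ms, stdev {sd * 1e3:.3f} ms")
```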

Module 20: Capstone (4-6 hours, ⭐⭐⭐⭐⭐) FINAL PROJECT

  • Build complete system: End-to-end ML pipeline
  • Integrate: All 19 modules into production-ready system
  • Deploy: Real application with optimization
  • Why it matters: This is your portfolio piece!

Milestone Checkpoint 6: 2024 Systems Age

Unlock After: Module 20

🏆 FINAL CHECKPOINT: Production-Optimized ML System
├─ Challenge: Take any milestone and make it production-ready
├─ Requirements:
│   ├─ 10× faster inference (profiling + acceleration)
│   ├─ 4× smaller model (quantization + compression)
│   ├─ <100ms latency (memoization + optimization)
│   └─ Rigorous benchmarks (statistical significance)
├─ Achievement: You're now an ML systems engineer!
└─ Test: Deploy your system, measure everything, compare to PyTorch

Dependency Map: How Modules Connect

CORE FOUNDATION
├─ Module 01 (Tensor)
│   ├─▶ Module 02 (Activations)
│   ├─▶ Module 03 (Layers)
│   ├─▶ Module 04 (Losses)
│   └─▶ Module 08 (DataLoader)
│
TRAINING ENGINE
├─ Module 05 (Autograd) ← Enhances Module 01
│   ├─▶ Module 06 (Optimizers)
│   └─▶ Module 07 (Training)
│
COMPUTER VISION BRANCH
├─ Module 09 (Spatial) ← Uses 01,02,03,05
│   └─▶ Module 20 (Capstone)
│
NLP BRANCH
├─ Module 10 (Tokenization) ← Uses 01
│   ├─▶ Module 11 (Embeddings)
│   └─▶ Module 12 (Attention) ← Uses 01,03,05,11
│       └─▶ Module 13 (Transformers) ← Uses 02,11,12
│
OPTIMIZATION BRANCH
├─ Module 14 (Profiling) ← Measures any module
│   ├─▶ Module 15 (Quantization) ← Compresses any module
│   ├─▶ Module 16 (Compression) ← Shrinks any module
│   ├─▶ Module 17 (Memoization) ← Optimizes 12,13
│   ├─▶ Module 18 (Acceleration) ← Speeds up any module
│   └─▶ Module 19 (Benchmarking) ← Measures optimizations
│       └─▶ Module 20 (Capstone)

Time Estimates by Experience Level

┌──────────────────┬──────────┬──────────┬──────────┬──────────┐
│ Experience Level │ Phase 1  │ Phase 2  │ Phase 3  │ Phase 4  │
├──────────────────┼──────────┼──────────┼──────────┼──────────┤
│ Beginner         │ 12-15h   │ 18-22h   │ 25-30h   │ 22-26h   │
│ (New to ML)      │          │          │          │          │
├──────────────────┼──────────┼──────────┼──────────┼──────────┤
│ Intermediate     │ 10-12h   │ 14-18h   │ 20-25h   │ 18-22h   │
│ (Used PyTorch)   │          │          │          │          │
├──────────────────┼──────────┼──────────┼──────────┼──────────┤
│ Advanced         │  8-10h   │ 12-15h   │ 18-22h   │ 16-20h   │
│ (Built models)   │          │          │          │          │
└──────────────────┴──────────┴──────────┴──────────┴──────────┘

Total Time: 60-80 hours (Intermediate) | 3-4 weeks at 20 hrs/week

Difficulty Ratings Explained

⭐⭐         │ Beginner-friendly
            │ - Follow clear instructions
            │ - Build intuition for concepts
            │ - ~2 hours per module
            │
⭐⭐⭐       │ Core ML concepts
            │ - Implement fundamental algorithms
            │ - Connect multiple concepts
            │ - ~3 hours per module
            │
⭐⭐⭐⭐     │ Advanced implementation
            │ - Complex algorithms
            │ - Systems thinking required
            │ - ~4 hours per module
            │
⭐⭐⭐⭐⭐   │ Expert-level systems
            │ - Multi-layered complexity
            │ - Production considerations
            │ - ~5-6 hours per module

Suggested Learning Paths

Fast Track (Core ML Only) - 40 hours

Focus on the essentials to build and train networks:

01 → 02 → 03 → 04 → 05 → 06 → 07 → 08 → 09
(Tensor through Spatial for CNNs)

Milestones: Perceptron → XOR → MNIST → CIFAR-10

NLP Focus - 55 hours

Core + Language models:

01 → 02 → 03 → 04 → 05 → 06 → 07 → 08
          ↓
10 → 11 → 12 → 13
(Add Tokenization through Transformers)

Milestones: All ML history + Transformer Era

Systems Engineering Path - Full 75 hours

Everything + optimization:

Complete all 20 modules
(Tensor → Transformers → Optimization → Capstone)

Milestones: All 6 checkpoints + Production Systems

Success Metrics: What "Done" Looks Like

✅ Module Complete When:
├─ All unit tests pass (test_unit_* functions)
├─ Module integration test passes (test_module())
├─ You can explain the algorithm to someone else
└─ Code matches PyTorch API (but implemented from scratch)

✅ Phase Complete When:
├─ All modules in phase pass tests
├─ Milestone checkpoint achieved
└─ You understand connections between modules

✅ Course Complete When:
├─ All 20 modules implemented
├─ All 6 milestones achieved
├─ Capstone project deployed
└─ You can confidently say: "I built a transformer from scratch!"

Common Questions

Q: Do I need to complete modules in order?
A: YES! Each module builds on previous ones: Module 05 (Autograd) enhances Module 01 (Tensor), and Module 12 (Attention) uses Modules 01, 03, 05, and 11. The dependency chain is strict.

Q: Can I skip modules?
A: Modules 01-08 are REQUIRED. Modules 09-13 split into CV (09) and NLP (10-13) tracks - you can choose one. Modules 14-20 are optimization - recommended but optional for core understanding.

Q: How do I know if I'm ready for the next module?
A: Run test_module() - if all tests pass, you're ready! Each module has comprehensive integration tests.

Q: What if I get stuck?
A: Each module has reference solutions, detailed scaffolding, and clear error messages. Plus milestone checkpoints validate your progress.

Q: How is this different from online courses?
A: You BUILD everything from scratch. No black boxes. No "just import PyTorch." You implement every line of a production ML framework.


Your Journey Starts Now

┌─────────────────────────────────────────────┐
│  📍 YOU ARE HERE                             │
│                                              │
│  Next Step: cd modules/01_tensor/           │
│             jupyter notebook tensor_dev.py   │
│                                              │
│  First Goal: Understand what a tensor is    │
│  First Win: Implement your first matmul     │
│  First Checkpoint: Train a perceptron       │
│                                              │
│  🎯 Final Destination (60-80 hours ahead):  │
│     "I built a transformer from scratch!"   │
└─────────────────────────────────────────────┘

Remember: Every expert was once a beginner. Every line of PyTorch was written by someone who understood these fundamentals. Now it's your turn.

Ready to start building?

cd modules/01_tensor
jupyter notebook tensor_dev.py

Let's build something amazing! 🚀