Vijay Janapa Reddi 0e306808f8 Updates module difficulty and time estimates
Refactors difficulty levels to use star ratings for better visual representation.

Adjusts time estimates for modules based on user feedback and complexity,
resulting in a more accurate learning path.
2025-11-24 12:56:26 -05:00


TinyTorch Learning Journey

From Zero to Transformer: A 20-Module Adventure

┌─────────────────────────────────────────────────────────────────────┐
│                    🎯 YOUR LEARNING DESTINATION                      │
│                                                                       │
│  Start: "What's a tensor?"                                           │
│    ↓                                                                  │
│  Finish: "I built a transformer from scratch using only NumPy!"      │
│                                                                       │
│  🏆 North Star Achievement: Train CNNs on CIFAR-10 to 75%+ accuracy │
└─────────────────────────────────────────────────────────────────────┘

Overview: 4 Phases, 20 Modules, 6 Milestones

Total Time: 100-130 hours (5-7 weeks at 20 hrs/week)
Prerequisites: Python, NumPy basics, basic linear algebra
Tools: Just Python + NumPy + Jupyter notebooks


Phase 1: FOUNDATION (Modules 01-04)

Goal: Build the fundamental data structures and operations
Time: 14-19 hours | Difficulty: ⭐-⭐⭐ (Beginner-friendly)

┌──────────┐      ┌──────────────┐      ┌─────────┐      ┌─────────┐
│    01    │─────▶│      02      │─────▶│   03    │─────▶│   04    │
│  Tensor  │      │ Activations  │      │ Layers  │      │ Losses  │
│          │      │              │      │         │      │         │
│ • Shape  │      │ • ReLU       │      │ • Linear│      │ • MSE   │
│ • Data   │      │ • Sigmoid    │      │ • Module│      │ • Cross │
│ • Ops    │      │ • Softmax    │      │ • Params│      │   Entropy│
└──────────┘      └──────────────┘      └─────────┘      └─────────┘
  4-6 hrs           3-4 hrs              4-5 hrs          3-4 hrs
    ⭐                ⭐⭐                  ⭐⭐              ⭐⭐

Module Details

Module 01: Tensor (4-6 hours, ⭐)

  • Build the foundation: n-dimensional arrays with operations
  • Implement: shape, reshape, indexing, broadcasting
  • Operations: add, multiply, matmul, transpose
  • Why it matters: Everything in ML is tensor operations
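A minimal sketch of the idea (the class and method names here are illustrative, not the module's actual API): wrap a NumPy array and expose shape, reshape, and arithmetic, letting NumPy's broadcasting do the heavy lifting.

```python
import numpy as np

class Tensor:
    """Toy tensor wrapper: a sketch, not the module's real implementation."""
    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float32)

    @property
    def shape(self):
        return self.data.shape

    def reshape(self, *shape):
        return Tensor(self.data.reshape(*shape))

    def __add__(self, other):      # broadcasting comes free from NumPy
        return Tensor(self.data + other.data)

    def matmul(self, other):
        return Tensor(self.data @ other.data)

a = Tensor([[1.0, 2.0], [3.0, 4.0]])
b = Tensor([[5.0], [6.0]])
print(a.matmul(b).shape)   # (2, 1)
```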

Module 02: Activations (3-4 hours, ⭐⭐)

  • Add non-linearity: ReLU, Sigmoid, Softmax
  • Understand: Why neural networks need activations
  • Implement: Forward passes for each activation
  • Why it matters: Without activations, networks are just linear algebra
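The three activations above fit in a few lines of NumPy (a sketch of the forward passes, with the standard max-subtraction trick for a stable softmax):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    # subtract the max for numerical stability before exponentiating
    shifted = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

x = np.array([-1.0, 0.0, 2.0])
print(relu(x))            # [0. 0. 2.]
print(softmax(x).sum())   # 1.0
```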

Module 03: Layers (4-5 hours, ⭐⭐)

  • Build neural network components: Linear layers
  • Implement: nn.Module system, Parameter class
  • Create: Weight initialization, layer composition
  • Why it matters: Foundation for all network architectures
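The core of a Linear layer is one matrix multiply plus a bias. A sketch of the forward math only (the module's real version wraps this in the nn.Module/Parameter system and uses a proper init scheme like Kaiming/Xavier rather than the tiny-random init shown here):

```python
import numpy as np

class Linear:
    """Fully connected layer: y = x @ W + b. Illustrative sketch."""
    def __init__(self, in_features, out_features, seed=0):
        rng = np.random.default_rng(seed)
        # small random init stands in for a real initialization scheme
        self.weight = rng.normal(0.0, 0.02, size=(in_features, out_features))
        self.bias = np.zeros(out_features)

    def __call__(self, x):
        return x @ self.weight + self.bias

layer = Linear(4, 3)
out = layer(np.ones((2, 4)))
print(out.shape)   # (2, 3)
```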

Module 04: Losses (3-4 hours, ⭐⭐)

  • Measure performance: MSE and CrossEntropy
  • Understand: How to quantify model errors
  • Implement: Loss calculation and aggregation
  • Why it matters: Without loss, we can't train networks
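Both losses reduce a batch of predictions to one number. A sketch of the two formulas (cross-entropy computed stably from raw logits via log-softmax):

```python
import numpy as np

def mse(pred, target):
    return np.mean((pred - target) ** 2)

def cross_entropy(logits, labels):
    # log-softmax + negative log-likelihood, computed stably from raw logits
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])

print(mse(np.array([1.0, 2.0]), np.array([1.0, 4.0])))   # 2.0
```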

Milestone Checkpoint 1: 1957 Perceptron

Unlock After: Module 04

🏆 CHECKPOINT: Train Rosenblatt's Original Perceptron
├─ Dataset: Linearly separable binary classification
├─ Architecture: Single layer, no hidden units
├─ Achievement: First trainable neural network in history!
└─ Test: Can your implementation learn AND/OR logic?

Phase 2: TRAINING SYSTEMS (Modules 05-08)

Goal: Make your networks learn from data
Time: 24-31 hours | Difficulty: ⭐⭐⭐-⭐⭐⭐⭐ (Core ML concepts)

┌──────────┐      ┌────────────┐      ┌──────────┐      ┌────────────┐
│    05    │─────▶│     06     │─────▶│    07    │─────▶│     08     │
│ Autograd │      │ Optimizers │      │ Training │      │ DataLoader │
│          │      │            │      │          │      │            │
│ • Graph  │      │ • SGD      │      │ • Loops  │      │ • Batching │
│ • Forward│      │ • Momentum │      │ • Epochs │      │ • Shuffling│
│ • Backward│     │ • Adam     │      │ • Eval   │      │ • Pipeline │
└──────────┘      └────────────┘      └──────────┘      └────────────┘
 8-10 hrs          6-8 hrs             6-8 hrs           4-5 hrs
 ⭐⭐⭐⭐          ⭐⭐⭐⭐             ⭐⭐⭐⭐           ⭐⭐⭐
     │                 │                  │                  │
     └─────────────────┴──────────────────┴──────────────────┘
                    ALL BUILD ON TENSOR (Module 01)

Module Details

Module 05: Autograd (8-10 hours, ⭐⭐⭐⭐) CRITICAL MODULE

  • Implement automatic differentiation: The magic of modern ML
  • Build: Computational graph, gradient tracking
  • Implement: backward() for all operations
  • Why it matters: This IS machine learning - without gradients, no training
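To make the graph-plus-backward idea concrete, here is a toy scalar-valued autograd node (the real module works on tensors; names here are illustrative): each operation records its parents and a local backward rule, and backward() replays the chain rule in reverse topological order.

```python
class Value:
    """Scalar autograd node: a toy sketch of reverse-mode differentiation."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():          # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():          # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological order, then apply the chain rule from the output back
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x = Value(3.0); y = Value(4.0)
z = x * y + x           # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)   # 5.0 3.0
```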

Module 06: Optimizers (6-8 hours, ⭐⭐⭐⭐)

  • Update weights intelligently: SGD, Momentum, Adam
  • Understand: Learning rates, momentum, adaptive methods
  • Implement: Parameter updates, state management
  • Why it matters: How networks actually improve over time
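The SGD-with-momentum update rule fits in a few lines (a sketch of the update math only; the parameter representation here is a stand-in, not the module's API):

```python
import numpy as np

class SGD:
    """SGD with optional momentum: v = m*v - lr*grad; param += v."""
    def __init__(self, params, lr=0.1, momentum=0.0):
        self.params = params          # list of dicts: {"value": ..., "grad": ...}
        self.lr = lr
        self.momentum = momentum
        self.velocity = [np.zeros_like(p["value"]) for p in params]

    def step(self):
        for v, p in zip(self.velocity, self.params):
            v *= self.momentum        # decay the running velocity
            v -= self.lr * p["grad"]  # add the scaled negative gradient
            p["value"] += v

p = {"value": np.array([1.0]), "grad": np.array([2.0])}
opt = SGD([p], lr=0.1)
opt.step()
print(p["value"])   # [0.8]
```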

Module 07: Training (6-8 hours, ⭐⭐⭐⭐) CRITICAL MODULE

  • Complete training loops: The full ML pipeline
  • Implement: Epochs, batches, forward/backward passes
  • Add: Metrics tracking, model evaluation
  • Why it matters: This is where everything comes together
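Every training loop shares the same skeleton: forward → loss → backward → update. A sketch on a one-parameter toy problem, with the gradient derived by hand (the real module gets it from autograd):

```python
import numpy as np

# Fit y = w * x on toy data: the minimal shape of a training loop
rng = np.random.default_rng(0)
X = rng.normal(size=100)
Y = 3.0 * X                              # true weight is 3.0
w, lr = 0.0, 0.1

for epoch in range(100):
    pred = w * X                         # forward pass
    loss = np.mean((pred - Y) ** 2)      # MSE loss
    grad = np.mean(2 * (pred - Y) * X)   # backward pass (derived by hand here)
    w -= lr * grad                       # optimizer step
print(f"{w:.2f}")   # ≈ 3.00
```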

Module 08: DataLoader (4-5 hours, ⭐⭐⭐)

  • Efficient data handling: Batching, shuffling, pipelines
  • Implement: Batch creation, data iteration
  • Optimize: Memory efficiency, preprocessing
  • Why it matters: Real ML needs to handle millions of examples
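The batching-and-shuffling core is a small generator (a sketch of the idea; the function name and signature are illustrative): shuffle an index array once per epoch, then slice it into batches.

```python
import numpy as np

def batches(X, y, batch_size, shuffle=True, rng=None):
    """Yield (X_batch, y_batch) pairs; the last batch may be smaller."""
    idx = np.arange(len(X))
    if shuffle:
        (rng or np.random.default_rng()).shuffle(idx)
    for start in range(0, len(X), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], y[sel]

X, y = np.arange(10).reshape(10, 1), np.arange(10)
sizes = [len(b) for b, _ in batches(X, y, batch_size=4, shuffle=False)]
print(sizes)   # [4, 4, 2]
```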

Milestone Checkpoint 2: 1969 XOR Crisis & Solution

Unlock After: Module 07

🏆 CHECKPOINT: Solve the Problem That Nearly Killed AI
├─ Dataset: XOR (the "impossible" problem for single-layer networks)
├─ Architecture: Multi-layer perceptron with hidden units
├─ Achievement: Prove Minsky wrong - MLPs can learn XOR!
└─ Test: 100% accuracy on XOR with your backpropagation

Milestone Checkpoint 3: 1986 MLP Revival

Unlock After: Module 08

🏆 CHECKPOINT: Recognize Handwritten Digits (MNIST)
├─ Dataset: MNIST (60,000 handwritten digits)
├─ Architecture: 2-3 layer MLP with ReLU activations
├─ Achievement: 95%+ accuracy on real computer vision!
└─ Test: Your network recognizes digits you draw yourself

Phase 3: ADVANCED ARCHITECTURES (Modules 09-13)

Goal: Build modern CV and NLP architectures
Time: 26-33 hours | Difficulty: ⭐⭐-⭐⭐⭐⭐ (Advanced concepts)

┌──────────┐      ┌───────────────┐      ┌─────────────┐
│    09    │─────▶│      10       │─────▶│     11      │
│ Spatial  │      │ Tokenization  │      │ Embeddings  │
│          │      │               │      │             │
│ • Conv2d │      │ • BPE         │      │ • Token Emb │
│ • Pool2d │      │ • Vocab       │      │ • Position  │
│ • CNNs   │      │ • Encoding    │      │ • Learned   │
└──────────┘      └───────────────┘      └─────────────┘
  6-8 hrs          4-5 hrs                4-5 hrs
  ⭐⭐⭐            ⭐⭐                    ⭐⭐
     │                  │                      │
     │                  └──────────┬───────────┘
     │                             ▼
     │            ┌──────────┐      ┌──────────────┐
     │            │    12    │─────▶│      13      │
     │            │Attention │      │Transformers  │
     │            │          │      │              │
     │            │ • Q,K,V  │      │ • Encoder    │
     │            │ • Multi  │      │ • Decoder    │
     │            │   -Head  │      │ • Complete   │
     │            └──────────┘      └──────────────┘
     │              5-6 hrs           6-8 hrs
     │              ⭐⭐⭐             ⭐⭐⭐⭐
     │                  │                  │
     └──────────────────┴──────────────────┘
              ALL USE AUTOGRAD (Module 05)

Module Details

Module 09: Spatial Operations (6-8 hours, ⭐⭐⭐) CRITICAL MODULE

  • Convolutional Neural Networks: Modern computer vision
  • Implement: Conv2d (with 6 nested loops!), MaxPool2d
  • Understand: Why CNNs revolutionized image processing
  • Why it matters: The foundation of modern computer vision
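A single-channel sketch of the sliding-window computation (the full Conv2d adds loops over batch, input channels, and output channels — hence the 6 nested loops; note that DL frameworks compute cross-correlation and call it convolution):

```python
import numpy as np

def conv2d(x, kernel):
    """Naive valid-mode 2D convolution (really cross-correlation, as in DL libs)."""
    H, W = x.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # elementwise product of the window with the kernel, then sum
            out[i, j] = np.sum(x[i:i + kH, j:j + kW] * kernel)
    return out

x = np.arange(16.0).reshape(4, 4)
edge = np.array([[1.0, -1.0]])   # horizontal difference filter
print(conv2d(x, edge).shape)   # (4, 3)
```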

Module 10: Tokenization (4-5 hours, ⭐⭐)

  • Text preprocessing: From strings to numbers
  • Implement: Byte-Pair Encoding (BPE), vocabulary building
  • Understand: How transformers see language
  • Why it matters: Can't process text without tokenization
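The strings-to-numbers pipeline in miniature — a word-level stand-in for the module's BPE tokenizer (BPE instead merges frequent character pairs, but the vocab-then-encode shape is the same):

```python
def build_vocab(corpus):
    """Map each distinct word to an integer id (word-level, not BPE)."""
    tokens = sorted(set(corpus.split()))
    return {tok: i for i, tok in enumerate(tokens)}

def encode(text, vocab, unk=-1):
    """Turn text into a list of token ids; unknown words get `unk`."""
    return [vocab.get(tok, unk) for tok in text.split()]

vocab = build_vocab("the cat sat on the mat")
print(encode("the cat", vocab))
```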

Module 11: Embeddings (4-5 hours, ⭐⭐)

  • Convert tokens to vectors: Token and positional embeddings
  • Implement: Embedding lookup, sinusoidal position encoding
  • Understand: How models represent meaning
  • Why it matters: Foundation for all language models
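The sinusoidal position encoding can be built in a few vectorized lines (a sketch of the standard formulation: sin on even dimensions, cos on odd, with geometrically spaced frequencies):

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Classic sin/cos position encodings, shape (seq_len, d_model)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles)   # even dimensions
    enc[:, 1::2] = np.cos(angles)   # odd dimensions
    return enc

pe = sinusoidal_positions(seq_len=8, d_model=4)
print(pe.shape)   # (8, 4)
```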

Module 12: Attention (5-6 hours, ⭐⭐⭐) CRITICAL MODULE

  • The transformer revolution: Multi-head self-attention
  • Implement: Q, K, V projections, scaled dot-product attention
  • Understand: Why attention changed everything
  • Why it matters: The core of GPT, BERT, and all modern LLMs
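The heart of it is one formula: Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A single-head, unmasked sketch (the module adds multi-head projections and causal masking on top):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q Kᵀ / sqrt(d_k)) V — single head, no masking."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # each row is a distribution over keys
    return weights @ V                   # weighted mixture of value vectors

Q = K = V = np.eye(3)
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)   # (3, 3)
```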

Module 13: Transformers (6-8 hours, ⭐⭐⭐⭐) CRITICAL MODULE

  • Complete transformer architecture: GPT-style models
  • Implement: Encoder/decoder blocks, layer norm, residuals
  • Build: Full transformer from components
  • Why it matters: You're building GPT from scratch!
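Two of the glue pieces, sketched without learned scale/shift parameters (the module's real LayerNorm has them): per-row normalization and the pre-norm residual pattern x + sublayer(LayerNorm(x)).

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean / unit variance (no learned gamma/beta)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def residual_block(x, sublayer):
    """Pre-norm residual connection: x + sublayer(LayerNorm(x))."""
    return x + sublayer(layer_norm(x))

x = np.array([[1.0, 2.0, 3.0]])
y = residual_block(x, lambda h: h * 0.0)   # a zero sublayer leaves x unchanged
print(np.allclose(y, x))   # True
```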

Milestone Checkpoint 4: 1998 CNN Revolution

Unlock After: Module 09

🏆 CHECKPOINT: CIFAR-10 Image Classification (North Star!)
├─ Dataset: CIFAR-10 (50,000 color images, 10 classes)
├─ Architecture: LeNet-inspired CNN with Conv2d + MaxPool
├─ Achievement: 75%+ accuracy on real-world images!
├─ Test: Classify airplanes, cars, birds, cats, etc.
└─ Impact: This is where your framework becomes REAL

Milestone Checkpoint 5: 2017 Transformer Era

Unlock After: Module 13

🏆 CHECKPOINT: Build a Language Model
├─ Dataset: Text corpus (Shakespeare, WikiText, etc.)
├─ Architecture: GPT-style decoder with multi-head attention
├─ Achievement: Generate coherent text character-by-character
├─ Test: Your model completes sentences meaningfully
└─ Impact: You've built the architecture behind ChatGPT!

Phase 4: PRODUCTION SYSTEMS (Modules 14-20)

Goal: Optimize and deploy ML systems at scale
Time: 36-47 hours | Difficulty: ⭐⭐⭐-⭐⭐⭐⭐ (Systems engineering)

┌──────────┐      ┌──────────────┐      ┌──────────────┐
│    14    │─────▶│      15      │─────▶│      16      │
│Profiling │      │ Quantization │      │ Compression  │
│          │      │              │      │              │
│ • Time   │      │ • INT8       │      │ • Pruning    │
│ • Memory │      │ • Calibrate  │      │ • Distill    │
│ • FLOPs  │      │ • Compress   │      │ • Sparse     │
└──────────┘      └──────────────┘      └──────────────┘
  5-6 hrs          5-6 hrs                5-6 hrs
  ⭐⭐⭐            ⭐⭐⭐                  ⭐⭐⭐

       ▼                 ▼                     ▼

┌───────────┐     ┌──────────────┐      ┌──────────┐      ┌──────────┐
│    17     │────▶│      18      │─────▶│    19    │─────▶│    20    │
│Memoization│     │ Acceleration │      │Benchmark │      │ Capstone │
│           │     │              │      │          │      │          │
│ • KV-Cache│     │ • Vectorize  │      │ • Compare│      │ • Full   │
│ • Reuse   │     │ • Hardware   │      │ • Report │      │   System │
│ • Speedup │     │ • Parallel   │      │ • Analyze│      │ • Deploy │
└───────────┘     └──────────────┘      └──────────┘      └──────────┘
  4-5 hrs          6-8 hrs               5-6 hrs          5-8 hrs
  ⭐⭐⭐            ⭐⭐⭐                 ⭐⭐⭐            ⭐⭐⭐⭐

Module Details

Module 14: Profiling (5-6 hours, ⭐⭐⭐)

  • Measure everything: Time, memory, FLOPs
  • Implement: Profiling decorators, bottleneck analysis
  • Understand: Where computation actually happens
  • Why it matters: Can't optimize what you don't measure
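A minimal timing decorator sketches the idea (names are illustrative; the module's profiler also tracks memory and FLOPs):

```python
import time
from functools import wraps

def profiled(fn):
    """Record wall-clock time of each call on the wrapper itself."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result
    wrapper.last_elapsed = 0.0
    return wrapper

@profiled
def work(n):
    return sum(i * i for i in range(n))

work(100_000)
print(f"{work.last_elapsed * 1000:.2f} ms")
```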

Module 15: Quantization (5-6 hours, ⭐⭐⭐)

  • Compress models: Float32 → INT8
  • Implement: Quantization, calibration, dequantization
  • Achieve: 4× smaller models, faster inference
  • Why it matters: Deploy models on edge devices
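The simplest scheme — symmetric per-tensor INT8 — is a scale factor plus a round and clip (a sketch; the module's calibration step chooses the scale from real activation statistics):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: map [-max|x|, max|x|] onto [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([-1.0, 0.5, 1.27], dtype=np.float32)
q, scale = quantize_int8(w)
print(q.dtype, np.max(np.abs(dequantize(q, scale) - w)))
```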

Module 16: Compression (5-6 hours, ⭐⭐⭐)

  • Shrink models: Pruning and distillation
  • Implement: Weight pruning, knowledge distillation
  • Achieve: 10× smaller models with minimal accuracy loss
  • Why it matters: Mobile ML and resource-constrained deployment
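Magnitude pruning in a nutshell (a sketch of unstructured pruning; the function name and signature are illustrative): zero out the fraction of weights with the smallest absolute values.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of smallest-magnitude weights."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights).ravel())[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([0.01, -0.5, 0.02, 0.9, -0.03, 0.4])
p = magnitude_prune(w, sparsity=0.5)
print(p)   # [ 0.  -0.5  0.   0.9  0.   0.4]
```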

Module 17: Memoization (4-5 hours, ⭐⭐⭐)

  • Cache computations: KV-cache for transformers
  • Implement: Memoization decorators, cache management
  • Optimize: 10-100× speedup for inference
  • Why it matters: How production LLMs run efficiently

Module 18: Acceleration (6-8 hours, ⭐⭐⭐)

  • Hardware optimization: Vectorization, parallelization
  • Implement: NumPy tricks, batch processing
  • Achieve: 10-100× speedups
  • Why it matters: Production systems need speed
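The flavor of those speedups: replace an interpreted Python loop with one vectorized NumPy call (a sketch; the exact speedup depends on the machine, so none is hard-coded here):

```python
import time
import numpy as np

x = np.random.default_rng(0).normal(size=200_000)

start = time.perf_counter()
loop_sum = 0.0
for v in x:                   # interpreted loop: one Python dispatch per element
    loop_sum += v * v
loop_time = time.perf_counter() - start

start = time.perf_counter()
vec_sum = np.dot(x, x)        # one vectorized call into optimized C code
vec_time = time.perf_counter() - start

print(f"speedup: {loop_time / vec_time:.0f}x")
```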

Module 19: Benchmarking (5-6 hours, ⭐⭐⭐)

  • Compare implementations: Rigorous performance testing
  • Implement: Benchmark suite, statistical analysis
  • Report: Scientific measurements
  • Why it matters: Engineering decisions need data
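A minimal harness shows the statistical core (a sketch; the module's real suite adds warmup runs, comparisons, and reporting): repeat the measurement and report mean ± standard deviation rather than a single timing.

```python
import time
import statistics

def benchmark(fn, repeats=20):
    """Run fn repeatedly; return (mean, stdev) of wall-clock seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)

mean, stdev = benchmark(lambda: sum(range(10_000)))
print(f"{mean * 1e6:.1f} ± {stdev * 1e6:.1f} µs")
```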

Module 20: Capstone (5-8 hours, ⭐⭐⭐⭐) FINAL PROJECT

  • Build complete system: End-to-end ML pipeline
  • Integrate: All 19 modules into production-ready system
  • Deploy: Real application with optimization
  • Why it matters: This is your portfolio piece!

Milestone Checkpoint 6: 2024 Systems Age

Unlock After: Module 20

🏆 FINAL CHECKPOINT: Production-Optimized ML System
├─ Challenge: Take any milestone and make it production-ready
├─ Requirements:
│   ├─ 10× faster inference (profiling + acceleration)
│   ├─ 4× smaller model (quantization + compression)
│   ├─ <100ms latency (memoization + optimization)
│   └─ Rigorous benchmarks (statistical significance)
├─ Achievement: You're now an ML systems engineer!
└─ Test: Deploy your system, measure everything, compare to PyTorch

Dependency Map: How Modules Connect

CORE FOUNDATION
├─ Module 01 (Tensor)
│   ├─▶ Module 02 (Activations)
│   ├─▶ Module 03 (Layers)
│   ├─▶ Module 04 (Losses)
│   └─▶ Module 08 (DataLoader)
│
TRAINING ENGINE
├─ Module 05 (Autograd) ← Enhances Module 01
│   ├─▶ Module 06 (Optimizers)
│   └─▶ Module 07 (Training)
│
COMPUTER VISION BRANCH
├─ Module 09 (Spatial) ← Uses 01,02,03,05
│   └─▶ Module 20 (Capstone)
│
NLP BRANCH
├─ Module 10 (Tokenization) ← Uses 01
│   ├─▶ Module 11 (Embeddings)
│   └─▶ Module 12 (Attention) ← Uses 01,03,05,11
│       └─▶ Module 13 (Transformers) ← Uses 02,11,12
│
OPTIMIZATION BRANCH
├─ Module 14 (Profiling) ← Measures any module
│   ├─▶ Module 15 (Quantization) ← Compresses any module
│   ├─▶ Module 16 (Compression) ← Shrinks any module
│   ├─▶ Module 17 (Memoization) ← Optimizes 12,13
│   ├─▶ Module 18 (Acceleration) ← Speeds up any module
│   └─▶ Module 19 (Benchmarking) ← Measures optimizations
│       └─▶ Module 20 (Capstone)

Time Estimates by Experience Level

┌──────────────────┬──────────┬──────────┬──────────┬──────────┐
│ Experience Level │ Phase 1  │ Phase 2  │ Phase 3  │ Phase 4  │
├──────────────────┼──────────┼──────────┼──────────┼──────────┤
│ Beginner         │ 17-23h   │ 29-37h   │ 31-40h   │ 43-56h   │
│ (New to ML)      │          │          │          │          │
├──────────────────┼──────────┼──────────┼──────────┼──────────┤
│ Intermediate     │ 14-19h   │ 24-31h   │ 26-33h   │ 36-47h   │
│ (Used PyTorch)   │          │          │          │          │
├──────────────────┼──────────┼──────────┼──────────┼──────────┤
│ Advanced         │ 11-15h   │ 19-25h   │ 21-26h   │ 29-38h   │
│ (Built models)   │          │          │          │          │
└──────────────────┴──────────┴──────────┴──────────┴──────────┘

Total Time: 100-130 hours (Intermediate) | 5-7 weeks at 20 hrs/week

Difficulty Ratings Explained

⭐-⭐⭐      │ Beginner-friendly
            │ - Follow clear instructions
            │ - Build intuition for concepts
            │ - ~2 hours per module
            │
⭐⭐⭐       │ Core ML concepts
            │ - Implement fundamental algorithms
            │ - Connect multiple concepts
            │ - ~3 hours per module
            │
⭐⭐⭐⭐     │ Advanced implementation
            │ - Complex algorithms
            │ - Systems thinking required
            │ - ~4 hours per module
            │
⭐⭐⭐⭐⭐   │ Expert-level systems
            │ - Multi-layered complexity
            │ - Production considerations
            │ - ~5-6 hours per module

Suggested Learning Paths

Fast Track (Core ML Only) - 64 hours

Focus on the essentials to build and train networks:

01 → 02 → 03 → 04 → 05 → 06 → 07 → 08 → 09
(Tensor through Spatial for CNNs)

Milestones: Perceptron → XOR → MNIST → CIFAR-10

NLP Focus - 85 hours

Core + Language models:

01 → 02 → 03 → 04 → 05 → 06 → 07 → 08
          ↓
10 → 11 → 12 → 13
(Add Tokenization through Transformers)

Milestones: All ML history + Transformer Era

Systems Engineering Path - Full 100-130 hours

Everything + optimization:

Complete all 20 modules
(Tensor → Transformers → Optimization → Capstone)

Milestones: All 6 checkpoints + Production Systems

Success Metrics: What "Done" Looks Like

✅ Module Complete When:
├─ All unit tests pass (test_unit_* functions)
├─ Module integration test passes (test_module())
├─ You can explain the algorithm to someone else
└─ Code matches PyTorch API (but implemented from scratch)

✅ Phase Complete When:
├─ All modules in phase pass tests
├─ Milestone checkpoint achieved
└─ You understand connections between modules

✅ Course Complete When:
├─ All 20 modules implemented
├─ All 6 milestones achieved
├─ Capstone project deployed
└─ You can confidently say: "I built a transformer from scratch!"

Common Questions

Q: Do I need to complete modules in order? A: YES! Each module builds on previous ones. Module 05 (Autograd) enhances Module 01 (Tensor), and Module 12 (Attention) uses Modules 01, 03, 05, and 11. The dependency chain is strict.

Q: Can I skip modules? A: Modules 01-08 are REQUIRED. Modules 09-13 split into CV (09) and NLP (10-13) tracks - you can choose one. Modules 14-20 are optimization - recommended but optional for core understanding.

Q: How do I know if I'm ready for the next module? A: Run test_module() - if all tests pass, you're ready! Each module has comprehensive integration tests.

Q: What if I get stuck? A: Each module has reference solutions, detailed scaffolding, and clear error messages. Plus milestone checkpoints validate your progress.

Q: How is this different from online courses? A: You BUILD everything from scratch. No black boxes. No "just import PyTorch." You implement every line of a production ML framework.


Your Journey Starts Now

┌─────────────────────────────────────────────┐
│  📍 YOU ARE HERE                            │
│                                             │
│  Next Step: cd modules/01_tensor/           │
│             jupyter notebook tensor_dev.py  │
│                                             │
│  First Goal: Understand what a tensor is    │
│  First Win: Implement your first matmul     │
│  First Checkpoint: Train a perceptron       │
│                                             │
│  🎯 Final Destination (100-130 hours ahead):│
│     "I built a transformer from scratch!"   │
└─────────────────────────────────────────────┘

Remember: Every expert was once a beginner. Every line of PyTorch was written by someone who understood these fundamentals. Now it's your turn.

Ready to start building?

cd modules/01_tensor
jupyter notebook tensor_dev.py

Let's build something amazing! 🚀