# TinyTorch 15-Module Structure

**Three-Part Journey: MLPs → CNNs → Transformers**

## Part I: Multi-Layer Perceptrons (Modules 1-5)

**Goal:** Build neural networks that can solve XOR

| Module | Topic | What You Build |
|--------|-------|----------------|
| 01 | Setup | Development environment |
| 02 | Tensors | N-dimensional arrays |
| 03 | Activations | ReLU, Sigmoid, Softmax |
| 04 | Layers | Dense layers |
| 05 | Networks | Sequential models |

**Capstone:** XORNet, which proves neural networks can learn non-linear functions
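To make the capstone concrete, here is a minimal NumPy sketch of the same idea: a Dense → ReLU → Dense → Sigmoid stack trained with full-batch gradient descent until it fits XOR. It is an illustration only, not the TinyTorch API these modules build (there, the layers come from Modules 04-05 and the training machinery from Modules 08-10).

```python
# Minimal NumPy sketch (not the TinyTorch API): a small MLP learning XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # hidden layer (Dense 2 -> 8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # output layer (Dense 8 -> 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(5000):
    # Forward pass: Dense -> ReLU -> Dense -> Sigmoid
    h_pre = X @ W1 + b1
    h = np.maximum(h_pre, 0.0)                  # ReLU
    out = sigmoid(h @ W2 + b2)

    # Backward pass (binary cross-entropy; d(loss)/d(logit) = out - y)
    d_logit = (out - y) / len(X)
    dW2, db2 = h.T @ d_logit, d_logit.sum(axis=0)
    d_h = (d_logit @ W2.T) * (h_pre > 0)        # ReLU gradient
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # Plain gradient-descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```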


## Part II: Convolutional Neural Networks (Modules 6-10)

**Goal:** Build CNNs for image classification

| Module | Topic | What You Build |
|--------|-------|----------------|
| 06 | Spatial | Conv2D, MaxPool2D |
| 07 | DataLoader | Efficient data pipelines |
| 08 | Autograd | Automatic differentiation |
| 09 | Optimizers | SGD, Adam (update rules sketched below) |
| 10 | Training | Complete training loops |
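Modules 08-10 all revolve around the gradient update step. As a rough sketch (plain NumPy, not TinyTorch's optimizer interfaces; the `sgd_step`/`adam_step` names and the dict-based Adam state are illustrative), here are the two update rules named in Module 09:

```python
# Hedged NumPy sketch of SGD and Adam update rules (illustrative names only).
import numpy as np

def sgd_step(param, grad, lr=0.01):
    """Vanilla stochastic gradient descent."""
    return param - lr * grad

def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step; `state` carries running moments and the step count."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad        # 1st moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2   # 2nd moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])              # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)

param = np.ones(3)
grad = np.array([0.5, -0.2, 0.1])                # stand-in for an autograd result
state = {"m": np.zeros(3), "v": np.zeros(3), "t": 0}
print(sgd_step(param, grad))
print(adam_step(param, grad, state))
```

In the curriculum, `grad` comes from Module 08's autograd, and Module 10's training loop repeats forward pass, loss, backward pass, and one of these updates for every batch.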

**Capstone:** CIFAR-10 with three approaches:

1. **Random baseline:** ~10% accuracy (chance level)
2. **MLP:** ~55% accuracy (no convolutions)
3. **CNN:** ~60%+ accuracy (with Conv2D)

This progression shows WHY convolutions matter for vision!
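The gap between the MLP and the CNN comes from the convolution itself: a small kernel slides over every spatial position and reuses the same weights, instead of connecting every pixel to every unit. Below is a naive, loop-based sketch of a 2-D valid cross-correlation (what deep-learning frameworks call Conv2D) in plain NumPy; it is illustrative only, not the Conv2D built in Module 06.

```python
# Naive NumPy sketch of a single-channel Conv2D forward pass (valid padding).
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of an (H, W) image with a (kh, kw) kernel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Same small kernel applied at every position: shared weights.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1.0, -1.0]])     # responds to horizontal intensity change
print(conv2d(image, edge_kernel))          # shape (5, 4); constant response here
```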


## Part III: Transformers (Modules 11-15)

**Goal:** Build transformers for text generation

| Module | Topic | What You Build |
|--------|-------|----------------|
| 11 | Embeddings | Token & positional encoding |
| 12 | Attention | Multi-head attention (sketched below) |
| 13 | Normalization | LayerNorm for stable training |
| 14 | Transformers | Complete transformer blocks |
| 15 | Generation | Autoregressive decoding |
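For concreteness, here is scaled dot-product attention, the core operation that Module 12 wraps into multiple heads, written as a short NumPy sketch (shapes and names are illustrative, not TinyTorch's API):

```python
# NumPy sketch of scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (seq, seq) pairwise similarities
    weights = softmax(scores, axis=-1)        # each query's weights sum to 1
    return weights @ V                        # weighted mixture of value vectors

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
print(attention(Q, K, V).shape)               # (4, 8): one output per position
```

Multi-head attention (Module 12) runs several of these in parallel on learned projections of the input, and Module 14 stacks attention with LayerNorm and a feed-forward network into a transformer block.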

**Capstone:** TinyGPT, a character-level text generator
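The generation step itself (Module 15) is a loop: feed the tokens so far to the model, turn the next-token logits into a distribution, sample, append, repeat. A rough sketch, using a hypothetical `dummy_model` stand-in instead of a trained transformer:

```python
# NumPy sketch of autoregressive, character-level sampling with temperature.
import numpy as np

rng = np.random.default_rng(0)
vocab = list("abcdefgh ")                      # toy character vocabulary

def dummy_model(token_ids):
    """Hypothetical stand-in for a trained TinyGPT: random next-token logits."""
    return rng.normal(size=len(vocab))

def sample_next(logits, temperature=1.0):
    """Softmax with temperature, then sample one token id."""
    logits = np.asarray(logits) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

tokens = [vocab.index("a")]                    # prompt: a single character
for _ in range(20):                            # generate 20 more characters
    next_id = sample_next(dummy_model(tokens), temperature=0.8)
    tokens.append(int(next_id))                # feed the sample back in

print("".join(vocab[t] for t in tokens))
```

Lower temperatures make the distribution peakier (more deterministic); higher temperatures make it flatter (more diverse). A real TinyGPT replaces `dummy_model` with the transformer from Module 14.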


## Why This Structure Works

### Pedagogical Excellence

- Each part introduces ONE major innovation:
  - Part I: fully connected networks (the foundation)
  - Part II: convolutions (spatial processing)
  - Part III: attention (sequence processing)

### Historical Accuracy

- Follows ML evolution:
  - 1980s-90s: MLPs dominate
  - 2012: AlexNet shows CNNs beat MLPs on ImageNet
  - 2017: Transformers revolutionize NLP

### Dependency-Driven Design

- **Nothing unnecessary:** each module is needed for its capstone
- **Progressive complexity:** each part builds on the previous
- **Clear motivation:** students see why each innovation matters

## Module Dependencies

Part I: Foundations
├── 02_tensor (required by everything)
├── 03_activations (required by 04)
├── 04_layers (required by 05)
└── 05_networks (combines all above)
    └── ✅ XORNet works!

Part II: Computer Vision
├── 06_spatial (Conv2D - THE KEY!)
├── 07_dataloader (handle real data)
├── 08_autograd (enable learning)
├── 09_optimizers (gradient descent)
└── 10_training (put it all together)
    └── ✅ CIFAR-10 CNN works!

Part III: Language Models
├── 11_embeddings (discrete → continuous)
├── 12_attention (THE KEY!)
├── 13_normalization (stable training)
├── 14_transformers (attention + FFN)
└── 15_generation (sampling strategies)
    └── ✅ TinyGPT works!

## What We Dropped

- **Module 16 (Regularization):** important, but not essential for the capstones
- **Module 17 (Systems):** kernels and benchmarking; advanced optimization

These could be bonus content or a separate "Production ML" course.

## The Beauty of 15 Modules

- **3 parts × 5 modules = 15:** perfect symmetry
- **Each part is self-contained:** students can stop after any part
- **Clear progression:** MLP → CNN → Transformer
- **Manageable scope:** achievable in one semester