# TinyTorch 15-Module Structure

**Three-Part Journey: MLPs → CNNs → Transformers**

## Part I: Multi-Layer Perceptrons (Modules 1-5)

**Goal:** Build neural networks that can solve XOR

| Module | Topic | What You Build |
|--------|-------|----------------|
| 01 | Setup | Development environment |
| 02 | Tensors | N-dimensional arrays |
| 03 | Activations | ReLU, Sigmoid, Softmax |
| 04 | Layers | Dense layers |
| 05 | Networks | Sequential models |

**Capstone:** XORNet, which proves neural networks can learn non-linear functions
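To make the capstone concrete, here is a minimal NumPy sketch of the same idea: a Dense → ReLU → Dense → Sigmoid stack trained with full-batch gradient descent until it fits XOR. It is an illustration only, not the TinyTorch API these modules build (there, the layers come from Modules 04-05 and the training machinery from Modules 08-10).

```python
# Minimal NumPy sketch (not the TinyTorch API): a small MLP learning XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # hidden layer (Dense 2 -> 8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # output layer (Dense 8 -> 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(5000):
    # Forward pass: Dense -> ReLU -> Dense -> Sigmoid
    h_pre = X @ W1 + b1
    h = np.maximum(h_pre, 0.0)                  # ReLU
    out = sigmoid(h @ W2 + b2)

    # Backward pass (binary cross-entropy; d(loss)/d(logit) = out - y)
    d_logit = (out - y) / len(X)
    dW2, db2 = h.T @ d_logit, d_logit.sum(axis=0)
    d_h = (d_logit @ W2.T) * (h_pre > 0)        # ReLU gradient
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # Plain gradient-descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```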


## Part II: Convolutional Neural Networks (Modules 6-10)

**Goal:** Build CNNs for image classification

| Module | Topic | What You Build |
|--------|-------|----------------|
| 06 | Spatial | Conv2D, MaxPool2D |
| 07 | DataLoader | Efficient data pipelines |
| 08 | Autograd | Automatic differentiation |
| 09 | Optimizers | SGD, Adam (update rules sketched below) |
| 10 | Training | Complete training loops |
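Modules 08-10 all revolve around the gradient update step. As a rough sketch (plain NumPy, not TinyTorch's optimizer interfaces; the `sgd_step`/`adam_step` names and the dict-based Adam state are illustrative), here are the two update rules named in Module 09:

```python
# Hedged NumPy sketch of SGD and Adam update rules (illustrative names only).
import numpy as np

def sgd_step(param, grad, lr=0.01):
    """Vanilla stochastic gradient descent."""
    return param - lr * grad

def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step; `state` carries running moments and the step count."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad        # 1st moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2   # 2nd moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])              # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)

param = np.ones(3)
grad = np.array([0.5, -0.2, 0.1])                # stand-in for an autograd result
state = {"m": np.zeros(3), "v": np.zeros(3), "t": 0}
print(sgd_step(param, grad))
print(adam_step(param, grad, state))
```

In the curriculum, `grad` comes from Module 08's autograd, and Module 10's training loop repeats forward pass, loss, backward pass, and one of these updates for every batch.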

**Capstone:** CIFAR-10 with three approaches:

1. **Random baseline:** ~10% accuracy (chance level)
2. **MLP:** ~55% accuracy (no convolutions)
3. **CNN:** ~60%+ accuracy (with Conv2D)

This progression shows WHY convolutions matter for vision!
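The gap between the MLP and the CNN comes from the convolution itself: a small kernel slides over every spatial position and reuses the same weights, instead of connecting every pixel to every unit. Below is a naive, loop-based sketch of a 2-D valid cross-correlation (what deep-learning frameworks call Conv2D) in plain NumPy; it is illustrative only, not the Conv2D built in Module 06.

```python
# Naive NumPy sketch of a single-channel Conv2D forward pass (valid padding).
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of an (H, W) image with a (kh, kw) kernel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Same small kernel applied at every position: shared weights.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1.0, -1.0]])     # responds to horizontal intensity change
print(conv2d(image, edge_kernel))          # shape (5, 4); constant response here
```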


## Part III: Transformers (Modules 11-15)

**Goal:** Build transformers for text generation

| Module | Topic | What You Build |
|--------|-------|----------------|
| 11 | Embeddings | Token & positional encoding |
| 12 | Attention | Multi-head attention (sketched below) |
| 13 | Normalization | LayerNorm for stable training |
| 14 | Transformers | Complete transformer blocks |
| 15 | Generation | Autoregressive decoding |
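For concreteness, here is scaled dot-product attention, the core operation that Module 12 wraps into multiple heads, written as a short NumPy sketch (shapes and names are illustrative, not TinyTorch's API):

```python
# NumPy sketch of scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (seq, seq) pairwise similarities
    weights = softmax(scores, axis=-1)        # each query's weights sum to 1
    return weights @ V                        # weighted mixture of value vectors

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
print(attention(Q, K, V).shape)               # (4, 8): one output per position
```

Multi-head attention (Module 12) runs several of these in parallel on learned projections of the input, and Module 14 stacks attention with LayerNorm and a feed-forward network into a transformer block.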

**Capstone:** TinyGPT, a character-level text generator
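The generation step itself (Module 15) is a loop: feed the tokens so far to the model, turn the next-token logits into a distribution, sample, append, repeat. A rough sketch, using a hypothetical `dummy_model` stand-in instead of a trained transformer:

```python
# NumPy sketch of autoregressive, character-level sampling with temperature.
import numpy as np

rng = np.random.default_rng(0)
vocab = list("abcdefgh ")                      # toy character vocabulary

def dummy_model(token_ids):
    """Hypothetical stand-in for a trained TinyGPT: random next-token logits."""
    return rng.normal(size=len(vocab))

def sample_next(logits, temperature=1.0):
    """Softmax with temperature, then sample one token id."""
    logits = np.asarray(logits) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

tokens = [vocab.index("a")]                    # prompt: a single character
for _ in range(20):                            # generate 20 more characters
    next_id = sample_next(dummy_model(tokens), temperature=0.8)
    tokens.append(int(next_id))                # feed the sample back in

print("".join(vocab[t] for t in tokens))
```

Lower temperatures make the distribution peakier (more deterministic); higher temperatures make it flatter (more diverse). A real TinyGPT replaces `dummy_model` with the transformer from Module 14.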


## Why This Structure Works

### Pedagogical Excellence

- Each part introduces ONE major innovation:
  - Part I: fully connected networks (the foundation)
  - Part II: convolutions (spatial processing)
  - Part III: attention (sequence processing)

### Historical Accuracy

- Follows ML evolution:
  - 1980s-90s: MLPs dominate
  - 2012: AlexNet shows CNNs beat MLPs on ImageNet
  - 2017: Transformers revolutionize NLP

### Dependency-Driven Design

- **Nothing unnecessary:** each module is needed for its capstone
- **Progressive complexity:** each part builds on the previous
- **Clear motivation:** students see why each innovation matters

## Module Dependencies

Part I: Foundations
├── 02_tensor (required by everything)
├── 03_activations (required by 04)
├── 04_layers (required by 05)
└── 05_networks (combines all above)
    └── ✅ XORNet works!

Part II: Computer Vision
├── 06_spatial (Conv2D - THE KEY!)
├── 07_dataloader (handle real data)
├── 08_autograd (enable learning)
├── 09_optimizers (gradient descent)
└── 10_training (put it all together)
    └── ✅ CIFAR-10 CNN works!

Part III: Language Models
├── 11_embeddings (discrete → continuous)
├── 12_attention (THE KEY!)
├── 13_normalization (stable training)
├── 14_transformers (attention + FFN)
└── 15_generation (sampling strategies)
    └── ✅ TinyGPT works!

## What We Dropped

- **Module 16 (Regularization):** important, but not essential for the capstones
- **Module 17 (Systems):** kernels and benchmarking; advanced optimization

These could be bonus content or a separate "Production ML" course.

## The Beauty of 15 Modules

- **3 parts × 5 modules = 15:** perfect symmetry
- **Each part is self-contained:** students can stop after any part
- **Clear progression:** MLP → CNN → Transformer
- **Manageable scope:** achievable in one semester