Add tier overview pages for Foundation, Architecture, Optimization, and Capstone

- Add foundation.md: Overview of Foundation Tier (modules 01-07)
- Add architecture.md: Overview of Architecture Tier (modules 08-13)
- Add optimization.md: Overview of Optimization Tier (modules 14-19)
- Add olympics.md: Overview of Capstone Competition (module 20)

These pages provide comprehensive tier-level context and learning objectives
Vijay Janapa Reddi
2025-11-14 18:27:13 -05:00
parent 15ab594dc4
commit a787bcbba4
4 changed files with 1113 additions and 0 deletions

site/tiers/architecture.md (new file)

@@ -0,0 +1,246 @@
# 🏛️ Architecture Tier (Modules 08-13)
**Build modern neural architectures—from computer vision to language models.**
---
## What You'll Learn
The Architecture tier teaches you how to build the neural network architectures that power modern AI. You'll implement CNNs for computer vision, transformers for language understanding, and the data loading infrastructure needed to train on real datasets.
**By the end of this tier, you'll understand:**
- How data loaders efficiently feed training data to models
- Why convolutional layers are essential for computer vision
- How attention mechanisms enable transformers to understand sequences
- What embeddings do to represent discrete tokens as continuous vectors
- How modern architectures compose these components into powerful systems
---
## Module Progression
```{mermaid}
graph TB
F[🏗 Foundation<br/>Tensor, Autograd, Training]
F --> M08[08. DataLoader<br/>Efficient data pipelines]
F --> M09[09. Spatial<br/>Conv2d + Pooling]
M08 --> M09
M09 --> VISION[💡 Computer Vision<br/>CNNs unlock spatial intelligence]
F --> M10[10. Tokenization<br/>Text → integers]
M10 --> M11[11. Embeddings<br/>Integers → vectors]
M11 --> M12[12. Attention<br/>Context-aware representations]
M12 --> M13[13. Transformers<br/>Complete architecture]
M13 --> LLM[💡 Language Models<br/>Transformers generate text]
style F fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style M08 fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px
style M09 fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px
style M10 fill:#e1bee7,stroke:#6a1b9a,stroke-width:3px
style M11 fill:#e1bee7,stroke:#6a1b9a,stroke-width:3px
style M12 fill:#ce93d8,stroke:#4a148c,stroke-width:3px
style M13 fill:#ba68c8,stroke:#4a148c,stroke-width:4px
style VISION fill:#fef3c7,stroke:#f59e0b,stroke-width:3px
style LLM fill:#fef3c7,stroke:#f59e0b,stroke-width:3px
```
---
## Module Details
### 08. DataLoader - Efficient Data Pipelines
**What it is**: Infrastructure for loading, batching, and shuffling training data efficiently.
**Why it matters**: Real ML systems train on datasets that don't fit in memory. DataLoaders handle batching, shuffling, and parallel data loading—essential for efficient training.
**What you'll build**: A DataLoader that supports batching, shuffling, and dataset iteration with proper memory management.
**Systems focus**: Memory efficiency, batching strategies, I/O optimization
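As a rough illustration of the batching and shuffling mechanics (the real module defines its own DataLoader API, so the class and names below are hypothetical), here is a minimal NumPy sketch:
```python
import numpy as np

class SimpleDataLoader:
    """Minimal illustration of batching + shuffling (not the TinyTorch API)."""
    def __init__(self, data, labels, batch_size=32, shuffle=True):
        self.data, self.labels = data, labels
        self.batch_size, self.shuffle = batch_size, shuffle

    def __iter__(self):
        indices = np.arange(len(self.data))
        if self.shuffle:
            np.random.shuffle(indices)              # new order every epoch
        for start in range(0, len(indices), self.batch_size):
            batch_idx = indices[start:start + self.batch_size]
            yield self.data[batch_idx], self.labels[batch_idx]

# Usage: iterate one epoch of batches
X = np.random.randn(100, 3, 32, 32)    # 100 fake CIFAR-like images
y = np.random.randint(0, 10, size=100)
for xb, yb in SimpleDataLoader(X, y, batch_size=32):
    pass  # forward/backward would go here
```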
---
### 09. Spatial - Convolutional Neural Networks
**What it is**: Conv2d (convolutional layers) and pooling operations for processing images.
**Why it matters**: CNNs revolutionized computer vision by exploiting spatial structure. Understanding convolutions, kernels, and pooling is essential for image processing and beyond.
**What you'll build**: Conv2d, MaxPool2d, and related operations with proper gradient computation.
**Systems focus**: Spatial operations, memory layout (channels), computational intensity
**Historical impact**: This module enables **Milestone 04 (1998 CNN Revolution)** - achieving 75%+ accuracy on CIFAR-10 with YOUR implementations.
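To make the core operation concrete, here is a deliberately naive, loop-based convolution sketch in NumPy; the module's Conv2d builds on this idea with batching, padding, and gradient computation:
```python
import numpy as np

def conv2d_naive(x, w, stride=1):
    """Naive single-image 2D convolution (illustrative, no padding).

    x: (C_in, H, W), w: (C_out, C_in, kH, kW) -> (C_out, H_out, W_out)
    """
    C_in, H, W = x.shape
    C_out, _, kH, kW = w.shape
    H_out = (H - kH) // stride + 1
    W_out = (W - kW) // stride + 1
    out = np.zeros((C_out, H_out, W_out))
    for co in range(C_out):
        for i in range(H_out):
            for j in range(W_out):
                patch = x[:, i*stride:i*stride+kH, j*stride:j*stride+kW]
                out[co, i, j] = np.sum(patch * w[co])   # dot product with one kernel
    return out

x = np.random.randn(3, 32, 32)       # one CIFAR-10-sized image
w = np.random.randn(16, 3, 3, 3)     # 16 filters of size 3x3
print(conv2d_naive(x, w).shape)      # (16, 30, 30)
```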
---
### 10. Tokenization - From Text to Numbers
**What it is**: Converting text into integer sequences that neural networks can process.
**Why it matters**: Neural networks operate on numbers, not text. Tokenization is the bridge between human language and machine learning—understanding vocabulary, encoding, and decoding is fundamental.
**What you'll build**: Character-level and subword tokenizers with vocabulary management and encoding/decoding.
**Systems focus**: Vocabulary management, encoding schemes, out-of-vocabulary handling
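A character-level tokenizer fits in a few lines. This toy sketch (not the module's exact interface) shows the vocabulary mapping and the encode/decode round trip:
```python
class CharTokenizer:
    """Toy character-level tokenizer; the module adds subword methods and OOV handling."""
    def __init__(self, corpus):
        vocab = sorted(set(corpus))
        self.stoi = {ch: i for i, ch in enumerate(vocab)}   # char -> id
        self.itos = {i: ch for ch, i in self.stoi.items()}  # id -> char

    def encode(self, text):
        return [self.stoi[ch] for ch in text]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)

tok = CharTokenizer("hello world")
ids = tok.encode("hello")
print(ids)               # [3, 2, 4, 4, 5] for this corpus's sorted vocabulary
print(tok.decode(ids))   # "hello"
```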
---
### 11. Embeddings - Learning Representations
**What it is**: Learned mappings from discrete tokens (words, characters) to continuous vectors.
**Why it matters**: Embeddings transform sparse, discrete representations into dense, semantic vectors. Understanding embeddings is crucial for NLP, recommendation systems, and any domain with categorical data.
**What you'll build**: Embedding layers with proper initialization and gradient computation.
**Systems focus**: Lookup tables, gradient backpropagation through indices, initialization
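Conceptually, an embedding layer is a lookup table whose rows receive gradients only when their token ids appear in a batch. A small NumPy sketch of both directions:
```python
import numpy as np

vocab_size, embed_dim = 8, 4
table = np.random.randn(vocab_size, embed_dim) * 0.02   # small random init

token_ids = np.array([3, 1, 3])          # a tiny sequence (note the repeated id 3)
vectors = table[token_ids]               # forward pass is just a row lookup
print(vectors.shape)                     # (3, 4)

# Backward pass: gradients flow only to the rows that were looked up.
grad_out = np.ones_like(vectors)         # pretend upstream gradient
grad_table = np.zeros_like(table)
np.add.at(grad_table, token_ids, grad_out)   # repeated ids accumulate gradients
print(grad_table[3])                     # row 3 was used twice -> all entries 2.0
```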
---
### 12. Attention - Context-Aware Representations
**What it is**: Self-attention mechanisms that let each token attend to all other tokens in a sequence.
**Why it matters**: Attention is the breakthrough that enabled modern LLMs. It allows models to capture long-range dependencies and contextual relationships that RNNs struggled with.
**What you'll build**: Scaled dot-product attention, multi-head attention, and causal masking for autoregressive generation.
**Systems focus**: O(n²) memory/compute, masking strategies, numerical stability
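The heart of the module is scaled dot-product attention with a causal mask. A single-head NumPy sketch (multi-head attention repeats this per head and concatenates the results):
```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention(Q, K, V):
    """Single-head scaled dot-product attention with a causal mask.
    Q, K, V: (seq_len, d_k) -> (seq_len, d_k)
    """
    n, d_k = Q.shape
    scores = Q @ K.T / np.sqrt(d_k)              # (n, n) pairwise similarities
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)        # block attention to future tokens
    weights = softmax(scores, axis=-1)           # each row sums to 1
    return weights @ V                           # weighted mix of value vectors

Q = K = V = np.random.randn(5, 16)               # 5 tokens, d_k = 16
print(causal_attention(Q, K, V).shape)           # (5, 16)
```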
---
### 13. Transformers - The Modern Architecture
**What it is**: Complete transformer architecture combining embeddings, attention, and feedforward layers.
**Why it matters**: Transformers power GPT, BERT, and virtually all modern LLMs. Understanding their architecture—positional encodings, layer normalization, residual connections—is essential for AI engineering.
**What you'll build**: A complete decoder-only transformer (GPT-style) for autoregressive text generation.
**Systems focus**: Layer composition, residual connections, generation loop
**Historical impact**: This module enables **Milestone 05 (2017 Transformer Era)** - generating coherent text with YOUR attention implementation.
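A decoder block is mostly wiring: pre-norm residual connections around attention and a feedforward expansion. This compact NumPy sketch shows the composition (weights and dimensions here are placeholders, not the module's API):
```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    n, d = x.shape
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(d)
    scores = np.where(np.triu(np.ones((n, n), dtype=bool), 1), -1e9, scores)
    return softmax(scores) @ V

def decoder_block(x, params):
    # Pre-norm residual wiring: x = x + Attn(LN(x)); x = x + MLP(LN(x))
    x = x + causal_self_attention(layer_norm(x), params["Wq"], params["Wk"], params["Wv"])
    h = np.maximum(0.0, layer_norm(x) @ params["W1"])      # feedforward expansion + ReLU
    return x + h @ params["W2"]

d, seq = 32, 6
rng = np.random.default_rng(0)
params = {k: rng.normal(0, 0.02, (d, d)) for k in ("Wq", "Wk", "Wv")}
params["W1"] = rng.normal(0, 0.02, (d, 4 * d))             # expand
params["W2"] = rng.normal(0, 0.02, (4 * d, d))             # project back
x = rng.normal(size=(seq, d))                               # token embeddings + positions
print(decoder_block(x, params).shape)                       # (6, 32)
```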
---
## What You Can Build After This Tier
```{mermaid}
timeline
title Historical Achievements Unlocked
1998 : CNN Revolution : 75%+ accuracy on CIFAR-10 with spatial intelligence
2017 : Transformer Era : Text generation with attention mechanisms
```
After completing the Architecture tier, you'll be able to:
- **Milestone 04 (1998)**: Build CNNs that achieve 75%+ accuracy on CIFAR-10 (color images)
- **Milestone 05 (2017)**: Implement transformers that generate coherent text responses
- Train on real datasets (MNIST, CIFAR-10, text corpora)
- Understand why modern architectures (ResNets, Vision Transformers, LLMs) work
---
## Prerequisites
**Required**:
- **🏗 Foundation Tier** (Modules 01-07) completed
- Understanding of tensors, autograd, and training loops
- Basic understanding of images (height, width, channels)
- Basic understanding of text/language concepts
**Helpful but not required**:
- Computer vision concepts (convolution, feature maps)
- NLP concepts (tokens, vocabulary, sequence modeling)
---
## Time Commitment
**Per module**: 4-6 hours (implementation + exercises + datasets)
**Total tier**: ~30-40 hours for complete mastery
**Recommended pace**: 1 module per week (2 modules/week for intensive study)
---
## Learning Approach
Each module follows the **Build → Use → Reflect** cycle with **real datasets**:
1. **Build**: Implement the architecture component (Conv2d, attention, transformers)
2. **Use**: Train on real data (CIFAR-10 images, text corpora)
3. **Reflect**: Analyze systems trade-offs (memory vs accuracy, speed vs quality)
---
## Key Achievements
### 🎯 Milestone 04: CNN Revolution (1998)
**After Module 09**, you'll recreate Yann LeCun's breakthrough:
```bash
cd milestones/04_1998_cnn
python 02_lecun_cifar10.py # 75%+ accuracy on CIFAR-10
```
**What makes this special**: You're not just importing `torch.nn.Conv2d`—you built the entire convolutional architecture from scratch.
### 🎯 Milestone 05: Transformer Era (2017)
**After Module 13**, you'll implement the attention revolution:
```bash
cd milestones/05_2017_transformer
python 01_vaswani_generation.py # Text generation with YOUR transformer
```
**What makes this special**: Your attention implementation powers the same architecture behind GPT, ChatGPT, and modern LLMs.
---
## Two Parallel Tracks
The Architecture tier splits into two parallel paths that can be learned in any order:
**Vision Track (Modules 08-09)**:
- DataLoader → Spatial (Conv2d + Pooling)
- Enables computer vision applications
- Culminates in CNN milestone
**Language Track (Modules 10-13)**:
- Tokenization → Embeddings → Attention → Transformers
- Enables natural language processing
- Culminates in Transformer milestone
**Recommendation**: Complete both tracks in order (08→09→10→11→12→13), but you can prioritize the track that interests you more.
---
## Next Steps
**Ready to build modern architectures?**
```bash
# Start the Architecture tier
tito module start 08_dataloader
# Or jump to language models
tito module start 10_tokenization
```
**Or explore other tiers:**
- **[🏗 Foundation Tier](foundation)** (Modules 01-07): Mathematical foundations
- **[⏱️ Optimization Tier](optimization)** (Modules 14-19): Production-ready performance
- **[🏅 Torch Olympics](olympics)** (Module 20): Compete in ML systems challenges
---
**[← Back to Home](../intro)** • **[View All Modules](../chapters/00-introduction)** • **[Historical Milestones](../chapters/milestones)**

site/tiers/foundation.md (new file)

@@ -0,0 +1,206 @@
# 🏗 Foundation Tier (Modules 01-07)
**Build the mathematical core that makes neural networks learn.**
---
## What You'll Learn
The Foundation tier teaches you how to build a complete learning system from scratch. Starting with basic tensor operations, you'll construct the mathematical infrastructure that powers every modern ML framework—automatic differentiation, gradient-based optimization, and training loops.
**By the end of this tier, you'll understand:**
- How tensors represent and transform data in neural networks
- Why activation functions enable non-linear learning
- How backpropagation computes gradients automatically
- What optimizers do to make training converge
- How training loops orchestrate the entire learning process
---
## Module Progression
```{mermaid}
graph TB
M01[01. Tensor<br/>Multidimensional arrays] --> M03[03. Layers<br/>Linear transformations]
M02[02. Activations<br/>Non-linear functions] --> M03
M03 --> M04[04. Losses<br/>Measure prediction quality]
M03 --> M05[05. Autograd<br/>Automatic differentiation]
M04 --> M06[06. Optimizers<br/>Gradient-based updates]
M05 --> M06
M06 --> M07[07. Training<br/>Complete learning loop]
style M01 fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
style M02 fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
style M03 fill:#bbdefb,stroke:#1565c0,stroke-width:3px
style M04 fill:#90caf9,stroke:#1565c0,stroke-width:3px
style M05 fill:#90caf9,stroke:#1565c0,stroke-width:3px
style M06 fill:#64b5f6,stroke:#0d47a1,stroke-width:3px
style M07 fill:#42a5f5,stroke:#0d47a1,stroke-width:4px
```
---
## Module Details
### 01. Tensor - The Foundation of Everything
**What it is**: Multidimensional arrays with automatic shape tracking and broadcasting.
**Why it matters**: Tensors are the universal data structure for ML. Understanding tensor operations, broadcasting, and memory layouts is essential for building efficient neural networks.
**What you'll build**: A pure Python tensor class supporting arithmetic, reshaping, slicing, and broadcasting—just like PyTorch tensors.
**Systems focus**: Memory layout, broadcasting semantics, operation fusion
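As a flavor of what the module asks for, here is a tiny tensor wrapper. This sketch leans on NumPy for storage and broadcasting, whereas the module walks you through implementing those semantics yourself:
```python
import numpy as np

class TinyTensor:
    """Minimal sketch of a tensor wrapper (not the module's full class)."""
    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float32)
        self.shape = self.data.shape

    def __add__(self, other):
        return TinyTensor(self.data + other.data)    # NumPy handles broadcasting here

    def __matmul__(self, other):
        return TinyTensor(self.data @ other.data)

    def reshape(self, *shape):
        return TinyTensor(self.data.reshape(shape))

x = TinyTensor([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])       # shape (2, 3)
bias = TinyTensor([10.0, 20.0, 30.0])   # shape (3,) broadcasts across rows
print((x + bias).data)                  # [[11. 22. 33.] [14. 25. 36.]]
```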
---
### 02. Activations - Enabling Non-Linear Learning
**What it is**: Non-linear functions applied element-wise to tensors.
**Why it matters**: Without activations, neural networks collapse to linear models. Activations like ReLU, Sigmoid, and Tanh enable networks to learn complex, non-linear patterns.
**What you'll build**: Common activation functions with their gradients for backpropagation.
**Systems focus**: Numerical stability, in-place operations, gradient flow
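For example, ReLU and Sigmoid with their derivatives, including a numerically stable sigmoid form (the kind of detail the numerical-stability focus is about):
```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(x.dtype)        # derivative is 0 or 1

def sigmoid(x):
    # Stable form: never exponentiate a large positive number
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), relu_grad(x))
print(sigmoid(x), sigmoid_grad(x))
```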
---
### 03. Layers - Building Blocks of Networks
**What it is**: Parameterized transformations (Linear, Conv2d) that learn from data.
**Why it matters**: Layers are the modular components you stack to build networks. Understanding weight initialization, parameter management, and forward passes is crucial.
**What you'll build**: Linear (fully-connected) layers with proper initialization and parameter tracking.
**Systems focus**: Parameter storage, initialization strategies, forward computation
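A sketch of the core idea: a Linear layer stores `W` and `b`, caches its input on the forward pass, and produces parameter and input gradients on the backward pass (the initialization scheme below is one common choice, not necessarily the module's):
```python
import numpy as np

class Linear:
    """Sketch of a fully-connected layer: y = x @ W + b."""
    def __init__(self, in_features, out_features):
        # Kaiming-style init keeps activation variance roughly constant
        scale = np.sqrt(2.0 / in_features)
        self.W = np.random.randn(in_features, out_features) * scale
        self.b = np.zeros(out_features)

    def forward(self, x):
        self.x = x                        # cache input for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out):
        self.grad_W = self.x.T @ grad_out        # dL/dW
        self.grad_b = grad_out.sum(axis=0)       # dL/db
        return grad_out @ self.W.T               # dL/dx, passed to the previous layer

layer = Linear(784, 128)
out = layer.forward(np.random.randn(32, 784))    # batch of 32 flattened images
print(out.shape)                                 # (32, 128)
```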
---
### 04. Losses - Measuring Success
**What it is**: Functions that quantify how wrong your predictions are.
**Why it matters**: Loss functions define what "good" means for your model. Different tasks (classification, regression) require different loss functions.
**What you'll build**: CrossEntropyLoss, MSELoss, and other common objectives with their gradients.
**Systems focus**: Numerical stability (log-sum-exp trick), reduction strategies
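The log-sum-exp trick mentioned above is easy to show directly: shifting the logits by their maximum keeps `exp` from overflowing without changing the result.
```python
import numpy as np

def cross_entropy(logits, targets):
    """Numerically stable cross-entropy via the log-sum-exp trick.
    logits: (batch, classes), targets: (batch,) integer class ids.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)   # avoid exp overflow
    log_sum_exp = np.log(np.exp(shifted).sum(axis=1))
    log_probs = shifted[np.arange(len(targets)), targets] - log_sum_exp
    return -log_probs.mean()                                # mean over the batch

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 3.0,  0.2]])
targets = np.array([0, 1])
print(cross_entropy(logits, targets))   # small loss: both predictions are correct and confident
```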
---
### 05. Autograd - The Gradient Revolution
**What it is**: Automatic differentiation system that computes gradients through computation graphs.
**Why it matters**: Autograd is what makes deep learning practical. It automatically computes gradients for any computation, enabling backpropagation through arbitrarily complex networks.
**What you'll build**: A computational graph system that tracks operations and computes gradients via the chain rule.
**Systems focus**: Computational graphs, topological sorting, gradient accumulation
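A scalar-valued sketch (in the spirit of micrograd, not the module's tensor-level API) shows the whole idea: each operation records its parents and a local backward rule, and `backward()` replays them in reverse topological order.
```python
class Value:
    """Tiny scalar autograd node (illustrative sketch)."""
    def __init__(self, data, parents=(), backward_fn=None):
        self.data, self.grad = data, 0.0
        self._parents = parents
        self._backward = backward_fn or (lambda: None)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad              # d(a+b)/da = 1
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topological sort so each node runs after everything that depends on it
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

a, b = Value(2.0), Value(3.0)
loss = a * b + a                # d(loss)/da = b + 1 = 4, d(loss)/db = a = 2
loss.backward()
print(a.grad, b.grad)           # 4.0 2.0
```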
---
### 06. Optimizers - Learning from Gradients
**What it is**: Algorithms that update parameters using gradients (SGD, Adam, RMSprop).
**Why it matters**: Raw gradients don't directly tell you how to update parameters. Optimizers use momentum, adaptive learning rates, and other tricks to make training converge faster and more reliably.
**What you'll build**: SGD, Adam, and RMSprop with proper momentum and learning rate scheduling.
**Systems focus**: Update rules, momentum buffers, numerical stability
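The update rules themselves are short. As a sketch, SGD with momentum and Adam with bias correction (hyperparameter defaults below are the conventional ones, not necessarily the module's):
```python
import numpy as np

def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.9):
    velocity[:] = momentum * velocity - lr * grad
    param += velocity
    return param, velocity

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m[:] = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v[:] = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                    # bias correction
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

grad = np.array([0.5, -0.5])

w_sgd, vel = np.array([1.0, -2.0]), np.zeros(2)
w_sgd, vel = sgd_momentum_step(w_sgd, grad, vel)

w_adam, m, v = np.array([1.0, -2.0]), np.zeros(2), np.zeros(2)
w_adam, m, v = adam_step(w_adam, grad, m, v, t=1)
print(w_sgd, w_adam)   # both move each weight opposite to its gradient, at different scales
```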
---
### 07. Training - Orchestrating the Learning Process
**What it is**: The training loop that ties everything together—forward pass, loss computation, backpropagation, parameter updates.
**Why it matters**: Training loops orchestrate the entire learning process. Understanding this flow—including batching, epochs, and validation—is essential for practical ML.
**What you'll build**: A complete training framework with progress tracking, validation, and model checkpointing.
**Systems focus**: Batch processing, gradient clipping, learning rate scheduling
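Stripped to its skeleton, every training loop has the same shape: shuffle, batch, forward, gradient, update, repeat. A self-contained toy example fitting a line:
```python
import numpy as np

# Minimal end-to-end loop: fit y = 3x + 1 with a single linear neuron.
X = np.random.randn(256, 1)
y = 3.0 * X + 1.0

w, b, lr = np.zeros((1, 1)), np.zeros(1), 0.1
for epoch in range(20):
    perm = np.random.permutation(len(X))             # shuffle each epoch
    for start in range(0, len(X), 32):               # mini-batches of 32
        idx = perm[start:start + 32]
        xb, yb = X[idx], y[idx]
        pred = xb @ w + b                            # forward pass
        grad_pred = 2.0 * (pred - yb) / len(xb)      # dMSE/dpred
        grad_w = xb.T @ grad_pred                    # backward pass
        grad_b = grad_pred.sum(axis=0)
        w -= lr * grad_w                             # parameter update
        b -= lr * grad_b
    loss = np.mean((X @ w + b - y) ** 2)
    if epoch % 5 == 0:
        print(f"epoch {epoch}: loss {loss:.4f}")
print(w.ravel(), b)    # approaches [3.0] [1.0]
```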
---
## What You Can Build After This Tier
```{mermaid}
timeline
title Historical Achievements Unlocked
1957 : Perceptron : Binary classification with gradient descent
1969 : XOR Crisis Solved : Hidden layers enable non-linear learning
1986 : MLP Revival : Multi-layer networks achieve 95%+ on MNIST
```
After completing the Foundation tier, you'll be able to:
- **Milestone 01 (1957)**: Recreate the Perceptron, the first trainable neural network
- **Milestone 02 (1969)**: Solve the XOR problem that nearly ended AI research
- **Milestone 03 (1986)**: Build multi-layer perceptrons that achieve 95%+ accuracy on MNIST
---
## Prerequisites
**Required**:
- Python programming (functions, classes, loops)
- Basic linear algebra (matrix multiplication, dot products)
- Basic calculus (derivatives, chain rule)
**Helpful but not required**:
- NumPy experience
- Understanding of neural network concepts
---
## Time Commitment
**Per module**: 3-5 hours (implementation + exercises + systems thinking)
**Total tier**: ~25-35 hours for complete mastery
**Recommended pace**: 1-2 modules per week
---
## Learning Approach
Each module follows the **Build → Use → Reflect** cycle:
1. **Build**: Implement the component from scratch (tensor operations, autograd, optimizers)
2. **Use**: Apply it to real problems (toy datasets, simple networks)
3. **Reflect**: Answer systems thinking questions (memory usage, computational complexity, design trade-offs)
---
## Next Steps
**Ready to start building?**
```bash
# Start with Module 01: Tensor
tito module start 01_tensor
# Follow the daily workflow
# 1. Read the ABOUT guide
# 2. Implement in *_dev.py
# 3. Test with tito module test
# 4. Export to *_sol.py
```
**Or explore other tiers:**
- **[🏛️ Architecture Tier](architecture)** (Modules 08-13): CNNs, transformers, attention
- **[⏱️ Optimization Tier](optimization)** (Modules 14-19): Production-ready performance
- **[🏅 Torch Olympics](olympics)** (Module 20): Compete in ML systems challenges
---
**[← Back to Home](../intro)** • **[View All Modules](../chapters/00-introduction)** • **[Daily Workflow Guide](../student-workflow)**

site/tiers/olympics.md (new file)

@@ -0,0 +1,385 @@
# 🏅 Torch Olympics (Module 20)
**The ultimate test: Build a complete, competition-ready ML system.**
---
## What Is the Torch Olympics?
The Torch Olympics is TinyTorch's **capstone experience**—a comprehensive challenge where you integrate everything you've learned across 19 modules to build, optimize, and compete with a complete ML system.
This isn't a traditional homework assignment. It's a **systems engineering competition** where you'll:
- Design and implement a complete neural architecture
- Train it on real datasets with YOUR framework
- Optimize for production deployment
- Benchmark against other students
- Submit to the TinyTorch Leaderboard
**Think of it as**: MLPerf meets academic research meets systems engineering—all using the framework YOU built.
---
## What You'll Build
```{mermaid}
graph TB
FOUNDATION[🏗 Foundation<br/>Tensor, Autograd, Training]
ARCHITECTURE[🏛️ Architecture<br/>CNNs, Transformers]
OPTIMIZATION[⏱️ Optimization<br/>Quantization, Acceleration]
FOUNDATION --> SYSTEM[🏅 Production System]
ARCHITECTURE --> SYSTEM
OPTIMIZATION --> SYSTEM
SYSTEM --> CHALLENGES[Competition Challenges]
CHALLENGES --> C1[Vision: CIFAR-10<br/>Goal: 80%+ accuracy]
CHALLENGES --> C2[Language: TinyTalks<br/>Goal: Coherent generation]
CHALLENGES --> C3[Optimization: Speed<br/>Goal: 100 tokens/sec]
CHALLENGES --> C4[Compression: Size<br/>Goal: <10MB model]
C1 --> LEADERBOARD[🏆 TinyTorch Leaderboard]
C2 --> LEADERBOARD
C3 --> LEADERBOARD
C4 --> LEADERBOARD
style FOUNDATION fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style ARCHITECTURE fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
style OPTIMIZATION fill:#fff3e0,stroke:#f57c00,stroke-width:2px
style SYSTEM fill:#fef3c7,stroke:#f59e0b,stroke-width:4px
style LEADERBOARD fill:#c8e6c9,stroke:#388e3c,stroke-width:4px
```
---
## Competition Tracks
### Track 1: Computer Vision Excellence
**Challenge**: Achieve the highest accuracy on CIFAR-10 (color images) using YOUR Conv2d implementation.
**Constraints**:
- Must use YOUR TinyTorch implementation (no PyTorch/TensorFlow)
- Training time: <2 hours on standard hardware
- Model size: <50MB
**Skills tested**:
- CNN architecture design
- Data augmentation strategies
- Hyperparameter tuning
- Training loop optimization
**Current record**: 82% accuracy (can you beat it?)
---
### Track 2: Language Generation Quality
**Challenge**: Build the best text generation system using YOUR transformer implementation.
**Evaluation**:
- Coherence: Do responses make sense?
- Relevance: Does the model stay on topic?
- Fluency: Is the language natural?
- Perplexity: Lower is better
**Constraints**:
- Must use YOUR attention + transformer code
- Trained on TinyTalks dataset
- Context length: 512 tokens
**Skills tested**:
- Transformer architecture design
- Tokenization strategy
- Training stability
- Generation sampling techniques
---
### Track 3: Inference Speed Championship
**Challenge**: Achieve the highest throughput (tokens/second) for transformer inference.
**Optimization techniques**:
- KV-cache implementation quality
- Batching efficiency
- Operation fusion
- Memory management
**Constraints**:
- Must maintain >95% of baseline accuracy
- Measured on standard hardware (CPU or GPU)
- Single-thread or multi-thread allowed
**Current record**: 250 tokens/sec (can you go faster?)
**Skills tested**:
- Profiling and bottleneck identification
- Cache management
- Systems-level optimization
- Performance benchmarking
---
### Track 4: Model Compression Masters
**Challenge**: Build the smallest model that maintains competitive accuracy.
**Optimization techniques**:
- Quantization (INT8, INT4)
- Structured pruning
- Knowledge distillation
- Architecture search
**Constraints**:
- Accuracy drop: <3% from baseline
- Target: <10MB model size
- Must run on CPU (no GPU required)
**Current record**: 8.2MB model with 92% CIFAR-10 accuracy
**Skills tested**:
- Quantization strategy
- Pruning methodology
- Accuracy-efficiency trade-offs
- Edge deployment considerations
---
## How It Works
### 1. Choose Your Challenge
Pick one or more competition tracks based on your interests:
- Vision (CNNs)
- Language (Transformers)
- Speed (Inference optimization)
- Size (Model compression)
### 2. Design Your System
Use all 19 modules you've completed:
```python
from tinytorch import Tensor, Linear, Conv2d, Attention # YOUR code
from tinytorch import Adam, CrossEntropyLoss # YOUR optimizers
from tinytorch import DataLoader, train_loop # YOUR infrastructure
# Design your architecture
model = YourCustomArchitecture() # Your design choices matter!
# Train with YOUR framework
optimizer = Adam(model.parameters(), lr=0.001)
train_loop(model, train_loader, optimizer, epochs=50)
# Optimize for production
quantized_model = quantize(model) # YOUR quantization
pruned_model = prune(quantized_model, sparsity=0.5) # YOUR pruning
```
### 3. Benchmark Rigorously
Use Module 19's benchmarking tools:
```bash
# Accuracy
tito benchmark accuracy --model your_model.pt --dataset cifar10
# Speed (tokens/sec)
tito benchmark speed --model your_transformer.pt --input-length 512
# Size (MB)
tito benchmark size --model your_model.pt
# Memory (peak usage)
tito benchmark memory --model your_model.pt
```
### 4. Submit to Leaderboard
```bash
# Package your submission
tito olympics submit \
--track vision \
--model your_model.pt \
--code your_training.py \
--report your_analysis.md
# View leaderboard
tito olympics leaderboard --track vision
```
---
## Leaderboard Dimensions
Your submission is evaluated across **multiple dimensions**:
| Dimension | Weight | What It Measures |
|-----------|--------|------------------|
| **Accuracy** | 40% | Primary task performance |
| **Speed** | 20% | Inference throughput (tokens/sec or images/sec) |
| **Size** | 20% | Model size in MB |
| **Code Quality** | 10% | Implementation clarity and documentation |
| **Innovation** | 10% | Novel techniques or insights |
**Final score**: Weighted combination of all dimensions. This mirrors real-world ML where you optimize for multiple objectives simultaneously.
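As a sketch of how such a weighted score combines (the per-dimension normalization to [0, 1] is defined by the competition and is only assumed here):
```python
# Illustrative scoring with made-up normalized per-dimension scores.
weights = {"accuracy": 0.40, "speed": 0.20, "size": 0.20,
           "code_quality": 0.10, "innovation": 0.10}
scores = {"accuracy": 0.82, "speed": 0.65, "size": 0.90,
          "code_quality": 0.80, "innovation": 0.50}

final = sum(weights[k] * scores[k] for k in weights)
print(f"final score: {final:.3f}")   # 0.40*0.82 + 0.20*0.65 + ... = 0.768
```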
---
## Learning Objectives
The Torch Olympics integrates everything you've learned:
### Systems Engineering Skills
- **Architecture design**: Making trade-offs between depth, width, and complexity
- **Hyperparameter tuning**: Systematic search vs intuition
- **Performance optimization**: Profiling → optimization → validation loop
- **Benchmarking**: Rigorous measurement and comparison
### Production Readiness
- **Deployment constraints**: Size, speed, memory limits
- **Quality assurance**: Testing, validation, error analysis
- **Documentation**: Explaining your design choices
- **Reproducibility**: Others can run your code
### Research Skills
- **Experimentation**: Hypothesis → experiment → analysis
- **Literature review**: Understanding SOTA techniques
- **Innovation**: Trying new ideas and combinations
- **Communication**: Writing clear technical reports
---
## Grading (For Classroom Use)
Instructors can use the Torch Olympics as a capstone project:
**Deliverables**:
1. **Working Implementation** (40%): Model trains and achieves target metrics
2. **Technical Report** (30%): Design choices, experiments, analysis
3. **Code Quality** (20%): Clean, documented, reproducible
4. **Leaderboard Performance** (10%): Relative ranking
**Example rubric**:
- 90-100%: Top 10% of leaderboard + excellent report
- 80-89%: Top 25% + good report
- 70-79%: Baseline metrics met + complete report
- 60-69%: Partial completion
- <60%: Incomplete submission
---
## Timeline
**Recommended schedule** (8-week capstone):
- **Weeks 1-2**: Challenge selection and initial implementation
- **Weeks 3-4**: Training and baseline experiments
- **Weeks 5-6**: Optimization and experimentation
- **Week 7**: Benchmarking and final tuning
- **Week 8**: Report writing and submission
**Intensive schedule** (2-week sprint):
- Days 1-3: Baseline implementation
- Days 4-7: Optimization sprint
- Days 8-10: Benchmarking
- Days 11-14: Documentation and submission
---
## Support and Resources
### Reference Implementations
Starter code is provided for each track:
```bash
# Vision track starter
tito olympics init --track vision --output ./my_vision_project
# Language track starter
tito olympics init --track language --output ./my_language_project
```
### Community
- **Discord**: Get help from other students and instructors
- **Office Hours**: Weekly video calls for Q&A
- **Leaderboard**: See what others are achieving
- **Forums**: Share insights and techniques
### Documentation
- **[MLPerf Milestone](../chapters/milestones)**: Historical context
- **[Benchmarking Guide](../modules/19_benchmarking_ABOUT)**: Measurement methodology
- **[Optimization Techniques](../tiers/optimization)**: Compression and acceleration strategies
---
## Prerequisites
**Required**:
- ✅ **All 19 modules completed** (Foundation + Architecture + Optimization)
- ✅ Experience training models on real datasets
- ✅ Understanding of profiling and benchmarking
- ✅ Comfort with YOUR TinyTorch codebase
**Highly recommended**:
- Complete all 6 historical milestones (1957-2018)
- Review optimization tier (Modules 14-19)
- Practice with profiling tools
---
## Time Commitment
**Minimum**: 20-30 hours for single track completion
**Recommended**: 40-60 hours for multi-track competition + excellent report
**Intensive**: 80+ hours for top leaderboard performance + research-level analysis
This is a capstone project—expect it to be challenging and rewarding!
---
## What You'll Take Away
By completing the Torch Olympics, you'll have:
1. **Portfolio piece**: A complete ML system you built from scratch
2. **Systems thinking**: Deep understanding of ML engineering trade-offs
3. **Benchmarking skills**: Ability to measure and optimize systematically
4. **Production experience**: End-to-end ML system development
5. **Competition experience**: Leaderboard ranking and peer comparison
**This is what sets TinyTorch apart**: You didn't just learn to use ML frameworks—you built one, optimized it, and competed with it.
---
## Next Steps
**Ready to compete?**
```bash
# Initialize your Torch Olympics project
tito olympics init --track vision
# Review the rules
tito olympics rules
# View current leaderboard
tito olympics leaderboard
```
**Or review prerequisites:**
- **[🏗 Foundation Tier](foundation)** (Modules 01-07)
- **[🏛️ Architecture Tier](architecture)** (Modules 08-13)
- **[⏱️ Optimization Tier](optimization)** (Modules 14-19)
---
**[← Back to Home](../intro)** • **[View Leaderboard](../leaderboard)** • **[Competition Rules](../olympics-rules)**

site/tiers/optimization.md (new file)

@@ -0,0 +1,276 @@
# ⏱️ Optimization Tier (Modules 14-19)
**Transform research prototypes into production-ready systems.**
---
## What You'll Learn
The Optimization tier teaches you how to make ML systems fast, small, and deployable. You'll learn systematic profiling, model compression through quantization and pruning, inference acceleration with caching and batching, and comprehensive benchmarking methodologies.
**By the end of this tier, you'll understand:**
- How to identify performance bottlenecks through profiling
- Why quantization reduces model size by 4× or more with minimal accuracy loss
- How pruning removes unnecessary parameters to compress models
- What KV-caching does to accelerate transformer inference
- How batching and other optimizations achieve production speed
---
## Module Progression
```{mermaid}
graph TB
A[🏛️ Architecture<br/>CNNs + Transformers]
A --> M14[14. Profiling<br/>Find bottlenecks]
M14 --> M15[15. Quantization<br/>INT8 compression]
M14 --> M16[16. Compression<br/>Structured pruning]
M15 --> SMALL[💡 Smaller Models<br/>4-16× size reduction]
M16 --> SMALL
M14 --> M17[17. Memoization<br/>KV-cache for inference]
M17 --> M18[18. Acceleration<br/>Batching + optimizations]
M18 --> FAST[💡 Faster Inference<br/>12-40× speedup]
SMALL --> M19[19. Benchmarking<br/>Systematic measurement]
FAST --> M19
M19 --> OLYMPICS[🏅 MLPerf Torch Olympics<br/>Production-ready systems]
style A fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
style M14 fill:#fff3e0,stroke:#f57c00,stroke-width:3px
style M15 fill:#ffe0b2,stroke:#ef6c00,stroke-width:3px
style M16 fill:#ffe0b2,stroke:#ef6c00,stroke-width:3px
style M17 fill:#ffcc80,stroke:#e65100,stroke-width:3px
style M18 fill:#ffb74d,stroke:#e65100,stroke-width:3px
style M19 fill:#ffa726,stroke:#e65100,stroke-width:4px
style SMALL fill:#c8e6c9,stroke:#388e3c,stroke-width:3px
style FAST fill:#c8e6c9,stroke:#388e3c,stroke-width:3px
style OLYMPICS fill:#fef3c7,stroke:#f59e0b,stroke-width:4px
```
---
## Module Details
### 14. Profiling - Measure Before Optimizing
**What it is**: Tools and techniques to identify computational bottlenecks in ML systems.
**Why it matters**: "Premature optimization is the root of all evil." Profiling tells you WHERE to optimize—which operations consume the most time, memory, or energy. Without profiling, you're guessing.
**What you'll build**: Memory profilers, timing utilities, and FLOPs counters to analyze model performance.
**Systems focus**: Time complexity, space complexity, computational graphs, hotspot identification
**Key insight**: Don't optimize blindly. Profile first, then optimize the bottlenecks.
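Even a ten-line timer captures the spirit of measure-first; the module builds this idea out into memory and FLOPs profiling:
```python
import time
from contextlib import contextmanager
import numpy as np

@contextmanager
def timer(label):
    """Simple wall-clock timer around a block of code."""
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.4f}s")

a = np.random.randn(512, 512)
with timer("matmul 512x512"):
    _ = a @ a
```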
---
### 15. Quantization - Smaller Models, Similar Accuracy
**What it is**: Converting FP32 weights to INT8 to reduce model size and speed up inference.
**Why it matters**: Quantization achieves 4× size reduction and faster computation with minimal accuracy loss (often <1%). Essential for deploying models on edge devices or reducing cloud costs.
**What you'll build**: Post-training quantization (PTQ) for weights and activations with calibration.
**Systems focus**: Numerical precision, scale/zero-point calculation, quantization-aware operations
**Impact**: Models shrink from 100MB → 25MB while maintaining 95%+ of original accuracy.
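The scale/zero-point arithmetic is the whole trick. A sketch of asymmetric INT8 post-training quantization and its reconstruction error:
```python
import numpy as np

def quantize_int8(x):
    """Asymmetric post-training quantization: float32 -> int8 plus (scale, zero_point)."""
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / 255.0                     # 256 representable levels
    zero_point = int(round(-128 - x_min / scale))       # int8 code corresponding to 0.0
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale, zp = quantize_int8(w)
w_hat = dequantize(q, scale, zp)
print(w.nbytes, "->", q.nbytes, "bytes")                # 4x smaller
print("max abs error:", np.abs(w - w_hat).max())        # small reconstruction error
```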
---
### 16. Compression - Pruning Unnecessary Parameters
**What it is**: Removing unimportant weights and neurons through structured pruning.
**Why it matters**: Neural networks are often over-parameterized. Pruning removes 50-90% of parameters with minimal accuracy loss, reducing memory and computation.
**What you'll build**: Magnitude-based pruning, structured pruning (entire channels/layers), and fine-tuning after pruning.
**Systems focus**: Sparsity patterns, memory layout, retraining strategies
**Impact**: Combined with quantization, achieve 8-16× compression (quantize + prune).
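The simplest variant, unstructured magnitude pruning, is just a threshold on |w|; the structured version the module also covers removes whole channels instead of individual weights:
```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Unstructured magnitude pruning: zero out the smallest |w| entries."""
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) >= threshold
    return w * mask, mask      # keep the mask to freeze pruned weights during fine-tuning

w = np.random.randn(128, 128)
w_pruned, mask = magnitude_prune(w, sparsity=0.9)
print(f"sparsity: {1 - mask.mean():.2%}")   # ~90% of weights are now zero
```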
---
### 17. Memoization - KV-Cache for Fast Generation
**What it is**: Caching key-value pairs in transformers to avoid recomputing attention for previously generated tokens.
**Why it matters**: Without KV-cache, generating each new token requires O(n²) recomputation of all previous tokens. With KV-cache, generation becomes O(n), achieving 10-100× speedups for long sequences.
**What you'll build**: KV-cache implementation for transformer inference with proper memory management.
**Systems focus**: Cache management, memory vs speed trade-offs, incremental computation
**Impact**: Text generation goes from 0.5 tokens/sec → 50+ tokens/sec.
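A single-head sketch of the cache: each generation step appends one key/value pair and attends the new query over the accumulated cache instead of recomputing the whole prefix:
```python
import numpy as np

def attend(q, K, V):
    """One query attending over all cached keys/values (single head)."""
    scores = (K @ q) / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

d_k, K_cache, V_cache = 16, [], []
np.random.seed(0)
for step in range(5):                          # autoregressive generation loop
    q = np.random.randn(d_k)                   # query for the NEW token only
    k, v = np.random.randn(d_k), np.random.randn(d_k)
    K_cache.append(k)                          # append instead of recomputing history
    V_cache.append(v)
    out = attend(q, np.stack(K_cache), np.stack(V_cache))
print(out.shape)   # (16,) -- each step costs O(current length), not O(length^2)
```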
---
### 18. Acceleration - Batching and Beyond
**What it is**: Batching multiple requests, operation fusion, and other inference optimizations.
**Why it matters**: Production systems serve multiple users simultaneously. Batching amortizes overhead across requests, achieving near-linear throughput scaling.
**What you'll build**: Dynamic batching, operation fusion, and inference server patterns.
**Systems focus**: Throughput vs latency, memory pooling, request scheduling
**Impact**: Combined with KV-cache, achieve 12-40× faster inference than naive implementations.
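The amortization effect is easy to see even on CPU: one batched matrix multiply versus a loop of per-request matrix-vector products (the exact speedup depends on your BLAS and hardware):
```python
import time
import numpy as np

W = np.random.randn(512, 512)
requests = [np.random.randn(512) for _ in range(64)]   # 64 independent "requests"

t0 = time.perf_counter()
_ = [W @ x for x in requests]                          # one matvec per request
sequential = time.perf_counter() - t0

t0 = time.perf_counter()
_ = np.stack(requests) @ W.T                           # one batched matmul
batched = time.perf_counter() - t0

print(f"sequential {sequential*1e3:.2f} ms vs batched {batched*1e3:.2f} ms")
```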
---
### 19. Benchmarking - Systematic Measurement
**What it is**: Rigorous methodology for measuring model performance across multiple dimensions.
**Why it matters**: "What gets measured gets managed." Benchmarking provides apples-to-apples comparisons of accuracy, speed, memory, and energy—essential for production decisions.
**What you'll build**: Comprehensive benchmarking suite measuring accuracy, latency, throughput, memory, and FLOPs.
**Systems focus**: Measurement methodology, statistical significance, performance metrics
**Historical context**: MLPerf (launched in 2018, now stewarded by MLCommons) established systematic benchmarking as AI systems grew too complex for ad-hoc evaluation.
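A minimal latency harness already shows the key methodological points: warm up first, run many iterations, and report robust statistics such as the median and p95 rather than a single measurement:
```python
import time
import numpy as np

def benchmark(fn, warmup=10, iters=100):
    """Report median and p95 latency; warmup runs are excluded from the statistics."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    times = np.array(times)
    return np.median(times), np.percentile(times, 95)

a = np.random.randn(256, 256)
median, p95 = benchmark(lambda: a @ a)
print(f"median {median*1e3:.3f} ms, p95 {p95*1e3:.3f} ms")
```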
---
## What You Can Build After This Tier
```{mermaid}
timeline
title Production-Ready Systems
Baseline : 100MB model, 0.5 tokens/sec, 95% accuracy
Quantization : 25MB model (4× smaller), same accuracy
Pruning : 12MB model (8× smaller), 94% accuracy
KV-Cache : 50 tokens/sec (100× faster generation)
Batching : 500 tokens/sec (1000× throughput)
MLPerf Olympics : Production-ready transformer deployment
```
After completing the Optimization tier, you'll be able to:
- **Milestone 06 (2018)**: Achieve production-ready optimization:
- 8-16× smaller models (quantization + pruning)
- 12-40× faster inference (KV-cache + batching)
- Systematic profiling and benchmarking workflows
- Deploy models that run on:
- Edge devices (Raspberry Pi, mobile phones)
- Cloud infrastructure (cost-effective serving)
- Real-time applications (low-latency requirements)
---
## Prerequisites
**Required**:
- **🏛️ Architecture Tier** (Modules 08-13) completed
- Understanding of CNNs and/or transformers
- Experience training models on real datasets
- Basic understanding of systems concepts (memory, CPU/GPU, throughput)
**Helpful but not required**:
- Production ML experience
- Systems programming background
- Understanding of hardware constraints
---
## Time Commitment
**Per module**: 4-6 hours (implementation + profiling + benchmarking)
**Total tier**: ~30-40 hours for complete mastery
**Recommended pace**: 1 module per week (this tier is dense!)
---
## Learning Approach
Each module follows **Measure → Optimize → Validate**:
1. **Measure**: Profile baseline performance (time, memory, accuracy)
2. **Optimize**: Implement optimization technique (quantize, prune, cache)
3. **Validate**: Benchmark improvements and understand trade-offs
This mirrors production ML workflows where optimization is an iterative, data-driven process.
---
## Key Achievement: MLPerf Torch Olympics
**After Module 19**, you'll complete the **MLPerf Torch Olympics Milestone (2018)**:
```bash
cd milestones/06_2018_mlperf
python 01_baseline_profile.py # Identify bottlenecks
python 02_compression.py # Quantize + prune (8-16× smaller)
python 03_generation_opts.py # KV-cache + batching (12-40× faster)
```
**What makes this special**: You'll have built the entire optimization pipeline from scratch—profiling tools, quantization engine, pruning algorithms, caching systems, and benchmarking infrastructure.
---
## Two Optimization Tracks
The Optimization tier has two parallel focuses:
**Size Optimization (Modules 15-16)**:
- Quantization (INT8 compression)
- Pruning (removing parameters)
- Goal: Smaller models for deployment
**Speed Optimization (Modules 17-18)**:
- Memoization (KV-cache)
- Acceleration (batching, fusion)
- Goal: Faster inference for production
Both tracks start from **Module 14 (Profiling)** and converge at **Module 19 (Benchmarking)**.
**Recommendation**: Complete modules in order (14→15→16→17→18→19) to build a complete understanding of the optimization landscape.
---
## Real-World Impact
The techniques in this tier are used by every production ML system:
- **Quantization**: TensorFlow Lite, ONNX Runtime, Apple Neural Engine
- **Pruning**: Mobile ML, edge AI, efficient transformers
- **KV-Cache**: All transformer inference engines (vLLM, TGI, llama.cpp)
- **Batching**: Cloud serving (AWS SageMaker, GCP Vertex AI)
- **Benchmarking**: MLPerf industry standard for AI performance
After this tier, you'll understand how real ML systems achieve production performance.
---
## Next Steps
**Ready to optimize?**
```bash
# Start the Optimization tier
tito module start 14_profiling
# Follow the measure → optimize → validate cycle
```
**Or explore other tiers:**
- **[🏗 Foundation Tier](foundation)** (Modules 01-07): Mathematical foundations
- **[🏛️ Architecture Tier](architecture)** (Modules 08-13): CNNs and transformers
- **[🏅 Torch Olympics](olympics)** (Module 20): Final integration challenge
---
**[← Back to Home](../intro)** • **[View All Modules](../chapters/00-introduction)** • **[MLPerf Milestone](../chapters/milestones)**