Course Introduction: ML Systems Engineering Through Implementation
Transform from ML user to ML systems engineer by building everything yourself.
The Origin Story: Why TinyTorch Exists
The Problem We're Solving
There's a critical gap in ML engineering today. Plenty of people can use ML frameworks (PyTorch, TensorFlow, JAX, etc.), but very few understand the systems underneath. This creates real problems:
- Engineers deploy models but can't debug when things go wrong
- Teams hit performance walls because no one understands the bottlenecks
- Companies struggle to scale - whether to tiny edge devices or massive clusters
- Innovation stalls when everyone is limited to existing framework capabilities
How TinyTorch Began
TinyTorch started as exercises for the MLSysBook.ai textbook - students needed hands-on implementation experience. But it quickly became clear this addressed a much bigger problem:
The industry desperately needs engineers who can BUILD ML systems, not just USE them.
Deploying ML systems at scale is hard. Scale means both directions:
- Small scale: Running models on edge devices with 1MB of RAM
- Large scale: Training models across thousands of GPUs
- Production scale: Serving millions of requests with <100ms latency
We need more engineers who understand memory hierarchies, computational graphs, kernel optimization, distributed communication - the actual systems that make ML work.
Our Solution: Learn By Building
TinyTorch teaches ML systems the only way that really works: by building them yourself.
When you implement your own tensor operations, write your own autograd, build your own optimizer - you gain understanding that's impossible to achieve by just calling APIs. You learn not just what these systems do, but HOW they do it and WHY they're designed that way.
Core Learning Concepts
Concept 1: Systems Memory Analysis
```python
# Learning objective: Understand memory usage patterns
# Framework user: "torch.optim.Adam()" - black box
# TinyTorch student: Implements Adam and discovers why it needs 3x parameter memory
# Result: Deep understanding of optimizer trade-offs applicable to any framework
```
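The 3x memory claim can be made concrete. A minimal Adam sketch (illustrative only, not TinyTorch's actual class) shows where the cost comes from: beyond the parameters themselves, Adam keeps two running-moment buffers of the same size as every parameter.

```python
import numpy as np

class Adam:
    """Minimal Adam sketch -- illustrative, not TinyTorch's real implementation."""
    def __init__(self, params, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = params
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        # Two extra buffers per parameter: this is the "3x parameter memory".
        self.m = [np.zeros_like(p) for p in params]  # first moment (mean of grads)
        self.v = [np.zeros_like(p) for p in params]  # second moment (mean of grads^2)
        self.t = 0

    def step(self, grads):
        self.t += 1
        for i, (p, g) in enumerate(zip(self.params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            p -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)  # in-place update

# Optimizer state alone holds two extra arrays the size of every parameter.
params = [np.random.randn(100, 100)]
opt = Adam(params)
opt.step([np.random.randn(100, 100)])
```

SGD, by contrast, needs no per-parameter state at all (or one buffer with momentum), which is exactly the trade-off the Foundation Tier has you discover.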
Concept 2: Computational Complexity
```python
# Learning objective: Analyze algorithmic scaling behavior
# Framework user: "Attention mechanism" - abstract concept
# TinyTorch student: Implements attention from scratch, measures O(n²) scaling
# Result: Intuition for sequence modeling limits across PyTorch, TensorFlow, JAX
```
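The O(n²) behavior is easy to verify empirically. A rough single-head sketch in plain NumPy (hypothetical, much simpler than a real implementation) times attention as the sequence length doubles; the score matrix has n² entries, which is the quadratic cost.

```python
import time
import numpy as np

def attention(Q, K, V):
    # Score matrix is (seq_len x seq_len): this is the O(n^2) term.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

d = 64
for n in (256, 512, 1024):  # doubling sequence length
    X = np.random.randn(n, d)
    start = time.perf_counter()
    attention(X, X, X)
    elapsed = time.perf_counter() - start
    print(f"seq_len={n:5d}  time={elapsed:.4f}s  score entries={n*n:,}")
```

Each doubling of sequence length roughly quadruples the score matrix, which is why context windows are expensive to extend.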
Concept 3: Automatic Differentiation
```python
# Learning objective: Understand gradient computation
# Framework user: "loss.backward()" - mysterious process
# TinyTorch student: Builds autograd engine with computational graphs
# Result: Knowledge of how all modern ML frameworks enable learning
```
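The core of a computational-graph autograd engine fits in a few dozen lines. The scalar `Value` class below is illustrative only (far simpler than a real tensor-based engine): each operation records how to propagate gradients to its inputs, and `backward()` replays those rules in reverse topological order.

```python
class Value:
    """Minimal scalar autograd sketch -- illustrative, not TinyTorch's real engine."""
    def __init__(self, data, _parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = _parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad   # d(out)/d(self) = 1
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # product rule
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x = Value(3.0); y = Value(4.0)
z = x * y + x          # z = x*y + x
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 5.0, dz/dy = x = 3.0
```

This is the same mechanism behind `loss.backward()` in every major framework, just specialized to scalars.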
What Makes TinyTorch Different
Most ML education teaches you to use frameworks (PyTorch, TensorFlow, JAX, etc.). TinyTorch teaches you to build them.
This fundamental difference creates engineers who understand systems deeply, not just APIs superficially.
The Learning Philosophy: Build → Use → Reflect
Traditional Approach:

```python
import torch

model = torch.nn.Linear(784, 10)  # Use someone else's implementation
output = model(input)             # Trust it works, don't understand how
```
TinyTorch Approach:

```python
# 1. BUILD: You implement Linear from scratch
class Linear:
    def forward(self, x):
        return x @ self.weight + self.bias  # You write this

# 2. USE: Your implementation in action
from tinytorch.core.layers import Linear  # YOUR code
model = Linear(784, 10)                   # YOUR implementation
output = model(input)                     # YOU know exactly how this works

# 3. REFLECT: Systems thinking
# "Why does matrix multiplication dominate compute time?"
# "How does this scale with larger models?"
# "What memory optimizations are possible?"
```
Who This Course Serves
Perfect For:
🎓 Computer Science Students
- Want to understand ML systems beyond high-level APIs
- Need to implement custom operations for research
- Preparing for ML engineering roles that require systems knowledge
👩‍💻 Software Engineers → ML Engineers
- Transitioning into ML engineering roles
- Need to debug and optimize production ML systems
- Want to understand what happens "under the hood" of ML frameworks
🔬 ML Practitioners & Researchers
- Debug performance issues in production systems
- Implement novel architectures and custom operations
- Optimize training and inference for resource constraints
🧠 Anyone Curious About ML Systems
- Understand how PyTorch, TensorFlow actually work
- Build intuition for ML systems design and optimization
- Appreciate the engineering behind modern AI breakthroughs
Prerequisites
Required:
- Python Programming: Comfortable with classes, functions, basic NumPy
- Linear Algebra Basics: Matrix multiplication, gradients (we review as needed)
- Learning Mindset: Willingness to implement rather than just use
Not Required:
- Prior ML framework experience (we build our own!)
- Deep learning theory (we learn through implementation)
- Advanced math (we focus on practical systems implementation)
What You'll Achieve: Tier-by-Tier Mastery
After Foundation Tier (Modules 01-07)
Build a complete neural network framework from mathematical first principles:
```python
# YOUR implementation training real networks on real data
model = Sequential([
    Linear(784, 128),  # Your linear algebra implementation
    ReLU(),            # Your activation function
    Linear(128, 64),   # Your gradient-aware layers
    ReLU(),            # Your nonlinearity
    Linear(64, 10),    # Your classification head
])

# YOUR complete training system
optimizer = Adam(model.parameters(), lr=0.001)  # Your optimization algorithm
for batch in dataloader:                        # Your data management
    output = model(batch.x)                     # Your forward computation
    loss = CrossEntropyLoss()(output, batch.y)  # Your loss calculation
    loss.backward()                             # YOUR backpropagation engine
    optimizer.step()                            # Your parameter updates
```
🎯 Foundation Achievement: 95%+ accuracy on MNIST using 100% your own mathematical implementations
After Architecture Tier (Modules 08-13)
- Computer Vision Mastery: CNNs achieving 75%+ accuracy on CIFAR-10 with YOUR convolution implementations
- Language Understanding: Transformers generating coherent text using YOUR attention mechanisms
- Universal Architecture: Discover why the SAME mathematical principles work for vision AND language
- AI Breakthrough Recreation: Implement the architectures that created the modern AI revolution
After Optimization Tier (Modules 14-20)
- Production Performance: Systems optimized for <100ms inference latency using YOUR profiling tools
- Memory Efficiency: Models compressed to 25% original size with YOUR quantization implementations
- Hardware Acceleration: Kernels achieving 10x speedups through YOUR vectorization techniques
- Competition Ready: Torch Olympics submissions competitive with industry implementations
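The "25% of original size" figure corresponds to INT8 quantization: each float32 weight is stored as one int8 plus a shared scale factor. A minimal symmetric-quantization sketch (illustrative only, not the module's actual code):

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric INT8 quantization sketch: float32 -> int8 + one scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(128, 64).astype(np.float32)
q, scale = quantize_int8(w)
error = np.abs(w - dequantize(q, scale)).max()
print(f"float32: {w.nbytes} bytes  int8: {q.nbytes} bytes "
      f"({q.nbytes / w.nbytes:.0%} of original)  max error: {error:.4f}")
```

One byte per weight instead of four gives the 4x compression; the rounding error is bounded by half a quantization step, which is why accuracy usually survives.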
The ML Evolution Story You'll Experience
TinyTorch's three-tier structure follows the actual historical progression of machine learning breakthroughs:
Foundation Era (1980s-1990s) → Foundation Tier
The Beginning: Mathematical foundations that started it all
- 1986 Breakthrough: Backpropagation enables multi-layer networks
- Your Implementation: Build automatic differentiation and gradient-based optimization
- Historical Milestone: Train MLPs to 95%+ accuracy on MNIST using YOUR autograd engine
Architecture Era (1990s-2010s) → Architecture Tier
The Revolution: Specialized architectures for vision and language
- 1998 Breakthrough: CNNs revolutionize computer vision (LeCun's LeNet)
- 2017 Breakthrough: Transformers unify vision and language ("Attention is All You Need")
- Your Implementation: Build CNNs achieving 75%+ on CIFAR-10, then transformers for text generation
- Historical Milestone: Recreate both revolutions using YOUR spatial and attention implementations
Optimization Era (2010s-Present) → Optimization Tier
The Engineering: Production systems that scale to billions of users
- 2020s Breakthrough: Efficient inference enables real-time LLMs (GPT, ChatGPT)
- Your Implementation: Build KV-caching, quantization, and production optimizations
- Historical Milestone: Deploy systems competitive in Torch Olympics benchmarks
Why This Progression Matters: You'll understand not just modern AI, but WHY it evolved this way. Each tier builds essential capabilities that inform the next, just like ML history itself.
Systems Engineering Focus: Why Tiers Matter
Traditional ML courses teach algorithms in isolation. TinyTorch's tier structure teaches systems thinking - how components interact to create production ML systems.
Traditional Linear Approach:
Module 1: Tensors → Module 2: Layers → Module 3: Training → ...
Problem: Students learn components but miss system interactions
TinyTorch Tier Approach:
🏗️ Foundation Tier: Build mathematical infrastructure
🏛️ Architecture Tier: Compose intelligent architectures
⚡ Optimization Tier: Deploy at production scale
Advantage: Each tier builds complete, working systems with clear progression
What Traditional Courses Teach vs. TinyTorch Tiers:
Traditional: "Use torch.optim.Adam for optimization"
Foundation Tier: "Why Adam needs 3× more memory than SGD and how to implement both from mathematical first principles"
Traditional: "Transformers use attention mechanisms"
Architecture Tier: "How attention creates O(N²) scaling, why this limits context windows, and how to implement efficient attention yourself"
Traditional: "Deploy models with TensorFlow Serving"
Optimization Tier: "How to profile bottlenecks, implement KV-caching for 10× speedup, and compete in production benchmarks"
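The KV-caching idea works like this: during autoregressive generation, each past token's key and value projections are computed once and cached, so attending from a new token costs O(n) instead of recomputing O(n²) work every step. A rough NumPy sketch (hypothetical shapes; a real model would compute k, v, q from the input token):

```python
import numpy as np

def attend(q, K, V):
    """Single-query attention over all cached keys/values."""
    scores = q @ K.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

d = 16
K_cache = np.empty((0, d))  # grows by one row per generated token
V_cache = np.empty((0, d))

for step in range(5):
    # Hypothetical per-token projections; a real model derives these from x.
    k, v, q = (np.random.randn(d) for _ in range(3))
    # Cache k and v instead of recomputing them for every past token.
    K_cache = np.vstack([K_cache, k])
    V_cache = np.vstack([V_cache, v])
    out = attend(q, K_cache, V_cache)

print(K_cache.shape)  # (5, 16): one cached key per generated token
```

Trading memory (the growing cache) for compute (no recomputation) is the standard optimization behind fast LLM inference.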
Career Impact by Tier
After each tier, you become the team member who:
🏗️ Foundation Tier Graduate:
- Debugs gradient flow issues: "Your ReLU is causing dead neurons"
- Implements custom optimizers: "I'll build a variant of Adam for this use case"
- Understands memory patterns: "Batch size 64 hits your GPU memory limit here"
🏛️ Architecture Tier Graduate:
- Designs novel architectures: "We can adapt transformers for this computer vision task"
- Optimizes attention patterns: "This attention bottleneck is why your model won't scale to longer sequences"
- Bridges vision and language: "The same mathematical principles work for both domains"
⚡ Optimization Tier Graduate:
- Deploys production systems: "I can get us from 500ms to 50ms inference latency"
- Leads performance optimization: "Here's our memory bottleneck and my 3-step plan to fix it"
- Competes at industry scale: "Our optimizations achieve Torch Olympics benchmark performance"
Learning Support & Community
Comprehensive Infrastructure
- Automated Testing: Every component includes comprehensive test suites
- Progress Tracking: 16-checkpoint capability assessment system
- CLI Tools: `tito` command-line interface for development workflow
- Visual Progress: Real-time tracking of learning milestones
Multiple Learning Paths
- Quick Exploration (5 min): Browser-based exploration, no setup required
- Serious Development (8+ weeks): Full local development environment
- Classroom Use: Complete course infrastructure with automated grading
Professional Development Practices
- Version Control: Git-based workflow with feature branches
- Testing Culture: Test-driven development for all implementations
- Code Quality: Professional coding standards and review processes
- Documentation: Comprehensive guides and system architecture documentation
Start Your Journey
Begin Building ML Systems
Choose your starting point based on your goals and time commitment
Next Steps:
- New to TinyTorch: Start with Quick Start Guide for immediate hands-on experience
- Ready to Commit: Begin Module 01: Tensor to start building
- Teaching a Course: Review Getting Started Guide - For Instructors for classroom integration
By completing all three tiers, you'll have built a complete ML framework that rivals production implementations:
**🏗️ Foundation Tier Achievement**: 95%+ accuracy on MNIST with YOUR mathematical implementations
**🏛️ Architecture Tier Achievement**: 75%+ accuracy on CIFAR-10 AND coherent text generation
**⚡ Optimization Tier Achievement**: Production systems competitive in Torch Olympics benchmarks
All using code you wrote yourself, from mathematical first principles to production optimization.
📖 Want to understand the pedagogical narrative behind this structure? See The Learning Journey to understand WHY modules flow this way and HOW they build on each other through a six-act learning story.
Foundation Tier (Modules 01-07)
Building Blocks of ML Systems • 6-8 weeks • All Prerequisites for Neural Networks
What You'll Learn: Build the mathematical and computational infrastructure that powers all neural networks. Master tensor operations, gradient computation, and optimization algorithms.
Prerequisites: Python programming, basic linear algebra (matrix multiplication)
Career Connection: Foundation skills required for ML Infrastructure Engineer, Research Engineer, Framework Developer roles
Time Investment: ~20 hours total (3 hours/week for 6-8 weeks)
| Module | Component | Core Capability | Real-World Connection |
|---|---|---|---|
| 01 | Tensor | Data structures and operations | NumPy, PyTorch tensors |
| 02 | Activations | Nonlinear functions | ReLU, attention activations |
| 03 | Layers | Linear transformations | nn.Linear, dense layers |
| 04 | Losses | Optimization objectives | CrossEntropy, MSE loss |
| 05 | Autograd | Automatic differentiation | PyTorch autograd engine |
| 06 | Optimizers | Parameter updates | Adam, SGD optimizers |
| 07 | Training | Complete training loops | Model.fit(), training scripts |
🎯 Tier Milestone: Train neural networks achieving 95%+ accuracy on MNIST using 100% your own implementations!
Skills Gained:
- Understand memory layout and computational graphs
- Debug gradient flow and numerical stability issues
- Implement any optimization algorithm from research papers
- Build custom neural network architectures from scratch
Architecture Tier (Modules 08-13)
Modern AI Algorithms • 4-6 weeks • Vision + Language Architectures
What You'll Learn: Implement the architectures powering modern AI: convolutional networks for vision and transformers for language. Discover why the same mathematical principles work across domains.
Prerequisites: Foundation Tier complete (Modules 01-07)
Career Connection: Computer Vision Engineer, NLP Engineer, AI Research Scientist, ML Product Manager roles
Time Investment: ~25 hours total (4-6 hours/week for 4-6 weeks)
| Module | Component | Core Capability | Real-World Connection |
|---|---|---|---|
| 08 | Spatial | Convolutions and regularization | CNNs, ResNet, computer vision |
| 09 | DataLoader | Batch processing | PyTorch DataLoader, tf.data |
| 10 | Tokenization | Text preprocessing | BERT tokenizer, GPT tokenizer |
| 11 | Embeddings | Representation learning | Word2Vec, positional encodings |
| 12 | Attention | Information routing | Multi-head attention, self-attention |
| 13 | Transformers | Modern architectures | GPT, BERT, Vision Transformer |
🎯 Tier Milestone: Achieve 75%+ accuracy on CIFAR-10 with CNNs AND generate coherent text with transformers!
Skills Gained:
- Understand why convolution works for spatial data
- Implement attention mechanisms from scratch
- Build transformer architectures for any domain
- Debug sequence modeling and attention patterns
Optimization Tier (Modules 14-20)
Production & Performance • 4-6 weeks • Deploy and Scale ML Systems
What You'll Learn: Transform research models into production systems. Master profiling, optimization, and deployment techniques used by companies like OpenAI, Google, and Meta.
Prerequisites: Architecture Tier complete (Modules 08-13)
Career Connection: ML Systems Engineer, Performance Engineer, MLOps Engineer, Senior ML Engineer roles
Time Investment: ~30 hours total (5-7 hours/week for 4-6 weeks)
| Module | Component | Core Capability | Real-World Connection |
|---|---|---|---|
| 14 | Profiling | Performance analysis | PyTorch Profiler, TensorBoard |
| 15 | Quantization | Memory efficiency | INT8 inference, model compression |
| 16 | Compression | Model optimization | Pruning, distillation, ONNX |
| 17 | Memoization | Memory management | KV-cache for generation |
| 18 | Acceleration | Speed improvements | CUDA kernels, vectorization |
| 19 | Benchmarking | Measurement systems | Torch Olympics, production monitoring |
| 20 | Capstone | Full system integration | End-to-end ML pipeline |
🎯 Tier Milestone: Build production-ready systems competitive in Torch Olympics benchmarks!
Skills Gained:
- Profile memory usage and identify bottlenecks
- Implement efficient inference optimizations
- Deploy models with <100ms latency requirements
- Design scalable ML system architectures
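These skills start from simple measurement. A minimal timing helper (a sketch, not the course's actual profiling module) shows the basic pattern behind any profiler: wrap a region of code, measure, report.

```python
import time
from contextlib import contextmanager

import numpy as np

@contextmanager
def profile(name):
    """Minimal timing profiler sketch -- a stand-in for real profiling tools."""
    start = time.perf_counter()
    yield
    elapsed = (time.perf_counter() - start) * 1000
    print(f"{name}: {elapsed:.2f} ms")

with profile("matmul"):
    a = np.random.randn(512, 512)
    b = np.random.randn(512, 512)
    c = a @ b
```

Real profilers add call-stack attribution and memory tracking, but the measure-then-attribute loop is the same.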
Learning Path Recommendations
Choose Your Learning Style
🚀 Complete Builder
Implement every component from scratch
Time: 14-18 weeks
Ideal for: CS students, aspiring ML engineers
⚡ Focused Explorer
Pick one tier based on your goals
Time: 4-8 weeks
Ideal for: Working professionals, specific skill gaps
📚 Guided Learner
Study implementations with hands-on exercises
Time: 8-12 weeks
Ideal for: Self-directed learners, bootcamp graduates
Welcome to ML systems engineering!