TinyTorch: Build ML Systems from Scratch
Don't just import it. Build it.
What is TinyTorch?
TinyTorch is an educational ML systems course where you build complete neural networks from scratch. Instead of treating PyTorch or TensorFlow as black boxes, you implement every component yourself, from tensors and gradients to optimizers and attention mechanisms, and gain a deep understanding of how modern ML frameworks actually work.
Core Learning Approach: Build → Profile → Optimize. You'll implement each system component, measure its performance characteristics, and understand the engineering trade-offs that shape production ML systems.
Three-Tier Learning Pathway
TinyTorch organizes learning into three pedagogically motivated tiers that follow ML history:
- 🏗️ Foundation Tier (Modules 01-07): Build mathematical infrastructure - tensors, autograd, optimizers
- 🧠 Intelligence Tier (Modules 08-13): Implement modern AI - CNNs for vision, transformers for language
- ⚡ Optimization Tier (Modules 14-20): Deploy production systems - profiling, quantization, acceleration
Each tier builds complete, working systems with clear career connections and practical skills.
📖 See Complete Three-Tier Structure for detailed tier breakdown, time estimates, and learning outcomes.
🏆 Prove Your Mastery Through History
As you complete modules, unlock historical milestone demonstrations that prove what you've built works! From Rosenblatt's 1957 perceptron to modern CNNs achieving 75%+ accuracy on CIFAR-10, each milestone recreates a breakthrough using YOUR implementations:
- 🧠 1957: Perceptron - First trainable network with YOUR Linear layer
- ⚡ 1969: XOR Solution - Multi-layer networks with YOUR autograd
- 🔢 1986: MNIST MLP - Backpropagation achieving 95%+ with YOUR optimizers
- 🖼️ 1998: CIFAR-10 CNN - Spatial intelligence with YOUR Conv2d (75%+ accuracy!)
- 🤖 2017: Transformers - Language generation with YOUR attention
- ⚡ 2024: Systems Age - Production optimization with YOUR profiling
📖 See Journey Through ML History for complete milestone details and requirements.
Why Build Instead of Use?
The difference between using a library and understanding a system is the difference between being limited by tools and being empowered to create them. When you build from scratch, you transform from a framework user into a systems engineer:
❌ Using PyTorch
import torch.nn as nn
import torch.optim as optim
model = nn.Linear(784, 10)
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Your model trains but then...
# 🔥 OOM error! Why?
# 🔥 Loss is NaN! How to debug?
# 🔥 Training is slow! What's the bottleneck?
You're stuck when things break
❌ Using TensorFlow
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])
# Magic happens somewhere...
# 🤷 How are gradients computed?
# 🤷 Why this initialization?
# 🤷 What's happening in backward pass?
Magic boxes you can't understand
✅ Building TinyTorch
class Linear:
    def __init__(self, in_features, out_features):
        self.weight = randn(in_features, out_features) * 0.01
        self.bias = zeros(out_features)
    def forward(self, x):
        self.input = x  # Save for backward
        return x @ self.weight + self.bias
    def backward(self, grad):
        # You wrote this! You know exactly why:
        self.weight.grad = self.input.T @ grad
        self.bias.grad = grad.sum(axis=0)
        return grad @ self.weight.T
You can debug anything
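One way to convince yourself that the hand-written backward pass above is right is to compare it against a numerical gradient. The following is a minimal sketch, not TinyTorch's actual implementation: it assumes NumPy arrays stand in for TinyTorch tensors, stores gradients in hypothetical `weight_grad` and `bias_grad` attributes (plain NumPy arrays cannot carry a `.grad` attribute), and uses a helper named `numerical_weight_grad` that is invented here for illustration.

```python
import numpy as np


class Linear:
    """Fully connected layer with hand-written forward and backward passes."""

    def __init__(self, in_features, out_features, seed=0):
        rng = np.random.default_rng(seed)
        self.weight = rng.standard_normal((in_features, out_features)) * 0.01
        self.bias = np.zeros(out_features)
        # Assumption: gradients live beside the parameters because NumPy
        # arrays do not accept new attributes such as .grad.
        self.weight_grad = None
        self.bias_grad = None

    def forward(self, x):
        self.input = x  # saved for the backward pass
        return x @ self.weight + self.bias

    def backward(self, grad):
        self.weight_grad = self.input.T @ grad
        self.bias_grad = grad.sum(axis=0)
        return grad @ self.weight.T  # gradient with respect to the input


def numerical_weight_grad(layer, x, eps=1e-6):
    """Central-difference estimate of d(sum(forward(x))) / d(weight)."""
    estimate = np.zeros_like(layer.weight)
    for idx in np.ndindex(*layer.weight.shape):
        original = layer.weight[idx]
        layer.weight[idx] = original + eps
        plus = layer.forward(x).sum()
        layer.weight[idx] = original - eps
        minus = layer.forward(x).sum()
        layer.weight[idx] = original  # restore the weight
        estimate[idx] = (plus - minus) / (2 * eps)
    return estimate


layer = Linear(4, 3)
x = np.random.default_rng(1).standard_normal((5, 4))
out = layer.forward(x)

# Treat loss = out.sum(), so the upstream gradient is all ones.
grad_input = layer.backward(np.ones_like(out))

max_error = np.abs(layer.weight_grad - numerical_weight_grad(layer, x)).max()
print(f"max |analytic - numerical| = {max_error:.2e}")
```

If the two gradients disagree by more than floating-point noise, the bug is in `backward`, which is exactly the kind of check a black-box layer does not let you write as easily.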
✅ Building KV Cache
class KVCache:
    def __init__(self, max_seq_len, n_heads, head_dim):
        # You understand EXACTLY the memory layout:
        self.k_cache = zeros(max_seq_len, n_heads, head_dim)
        self.v_cache = zeros(max_seq_len, n_heads, head_dim)
        # That's why GPT needs GBs of RAM!
    def update(self, k, v, pos):
        # You know why position matters:
        self.k_cache[pos:pos+len(k)] = k  # Reuse past computations
        self.v_cache[pos:pos+len(v)] = v  # Per-token attention cost: O(n²) → O(n)
# Now you understand why context windows are limited
You master modern LLM optimizations
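To see that caching keys and values changes the cost but not the result, the sketch below decodes one token at a time with a cache and compares the outputs against full causal attention recomputed over the whole sequence. It is an illustration under stated assumptions rather than TinyTorch's code: NumPy arrays stand in for tensors, the `length` field and the `softmax` and `attend` helpers are hypothetical additions, and the random `q`, `k`, `v` arrays stand in for real projections from a model.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


class KVCache:
    def __init__(self, max_seq_len, n_heads, head_dim):
        self.k_cache = np.zeros((max_seq_len, n_heads, head_dim))
        self.v_cache = np.zeros((max_seq_len, n_heads, head_dim))
        self.length = 0  # hypothetical field: number of filled positions

    def update(self, k, v, pos):
        self.k_cache[pos:pos + len(k)] = k
        self.v_cache[pos:pos + len(v)] = v
        self.length = max(self.length, pos + len(k))


def attend(q, keys, values):
    """One query (n_heads, head_dim) against t cached keys and values."""
    scores = np.einsum("hd,thd->ht", q, keys) / np.sqrt(q.shape[-1])
    return np.einsum("ht,thd->hd", softmax(scores), values)


rng = np.random.default_rng(0)
seq_len, n_heads, head_dim = 6, 2, 4
q = rng.standard_normal((seq_len, n_heads, head_dim))
k = rng.standard_normal((seq_len, n_heads, head_dim))
v = rng.standard_normal((seq_len, n_heads, head_dim))

# Incremental decoding: each step stores one new key/value pair and
# attends over everything cached so far, which is O(n) work per token.
cache = KVCache(max_seq_len=8, n_heads=n_heads, head_dim=head_dim)
steps = []
for t in range(seq_len):
    cache.update(k[t:t + 1], v[t:t + 1], pos=t)
    steps.append(attend(q[t], cache.k_cache[:cache.length],
                        cache.v_cache[:cache.length]))
cached_out = np.stack(steps)

# Reference: full causal attention over the whole sequence at once.
scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(head_dim)
future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores = np.where(future, -np.inf, scores)  # hide future positions
full_out = np.einsum("hqk,khd->qhd", softmax(scores), v)

cache_bytes = cache.k_cache.nbytes + cache.v_cache.nbytes
print("outputs match:", np.allclose(cached_out, full_out))
print("cache size in bytes:", cache_bytes)
```

The cache is allocated for `max_seq_len` positions up front, so its memory grows linearly with the context window, the number of heads, and the head dimension, and a real model pays that cost once per layer.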
Who Is This For?
Perfect if you're asking these questions:
ML Systems Engineers: "Why does my model training OOM at batch size 32? Why does attention cost scale quadratically with sequence length? When does data loading become the bottleneck?" You'll build and profile every component, understanding the memory hierarchies, computational complexity, and system bottlenecks that production ML systems face daily.
Students & Researchers: "How does that nn.Linear() call actually compute gradients? Why does the Adam optimizer need 3× the memory of SGD? What's actually happening during a forward pass?" You'll implement the mathematics you learned in class and discover how theoretical concepts become practical systems with real performance implications.
Performance Engineers: "Where are the actual bottlenecks in transformer inference? How does KV-cache reduce computation by 10-100×? Why does my CNN use 4GB of memory?" By building these systems from scratch, you'll understand memory access patterns, cache efficiency, and optimization opportunities that profilers alone can't teach.
Academics & Educators: "How can I teach ML systems—not just ML algorithms?" TinyTorch provides a complete pedagogical framework emphasizing systems thinking: memory profiling, performance analysis, and scaling behavior are built into every module, not added as an afterthought.
ML Practitioners: "Why does training slow down after epoch 10? How do I debug gradient explosions? When should I use mixed precision?" Even experienced engineers often treat frameworks as black boxes. By understanding the systems underneath, you'll debug faster, optimize better, and make informed architectural decisions.
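The Adam question in the Students & Researchers paragraph comes down to simple accounting. Adam keeps two extra values per parameter (first and second moment estimates), while plain SGD keeps none. The sketch below assumes float32 values and counts only weights plus optimizer state; the function name `optimizer_memory_bytes` is hypothetical. The exact ratio depends on what you count: if gradient buffers are included, every total grows by one more copy of the weights.

```python
def optimizer_memory_bytes(num_params, bytes_per_value=4):
    """Weights plus optimizer state, ignoring gradients and activations.

    Assumption: every value is stored at the same precision (float32 here).
    """
    weights = num_params * bytes_per_value
    return {
        "sgd": weights,               # weights only, no optimizer state
        "sgd_momentum": 2 * weights,  # weights + velocity
        "adam": 3 * weights,          # weights + first and second moments
    }


# Parameter count of the nn.Linear(784, 10) layer shown earlier:
# a 784 x 10 weight matrix plus 10 biases.
num_params = 784 * 10 + 10
memory = optimizer_memory_bytes(num_params)

for name, size in memory.items():
    print(f"{name:>12}: {size:>7} bytes ({size / memory['sgd']:.0f}x plain SGD)")
```

For a layer this small the difference is negligible, but the same multiplier applies to models with billions of parameters, where optimizer state can dominate the memory budget.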
How to Choose Your Learning Path
Three Learning Approaches: You can build all three tiers (implement all 20 modules), focus on specific tiers (target your skill gaps), or explore selectively (study key concepts). Each tier builds complete, working systems.
Getting Started
Whether you're just exploring or ready to dive in, these resources will help: 📖 See Essential Commands for the complete setup and command reference, or 📖 See Three-Tier Learning Structure for a detailed tier breakdown and learning outcomes.
Additional Resources:
- Progress Tracking - Monitor your learning journey with 21 capability checkpoints
- Testing Framework - Understand our comprehensive validation system
- Documentation & Guides - Complete technical documentation and tutorials
TinyTorch is more than a course—it's a community of learners building together. Join thousands exploring ML systems from the ground up.