---
html_meta:
"property=og:title": "TinyTorch: Build your own ML framework from scratch"
"property=og:description": "Learn ML systems by building them. From computer vision to language models. Comprehensive educational framework for understanding ML systems engineering."
"property=og:url": "https://mlsysbook.github.io/TinyTorch/"
"property=og:type": "website"
"property=og:image": "https://mlsysbook.github.io/TinyTorch/logo.png"
"property=og:site_name": "TinyTorch Course"
"name=twitter:card": "summary_large_image"
"name=twitter:title": "TinyTorch: Build your own ML framework"
"name=twitter:description": "TinyTorch is a minimalist framework for building machine learning systems from scratchβfrom vision to language."
"name=twitter:image": "https://mlsysbook.github.io/TinyTorch/logo.png"
---
# TinyTorch: Build Your Own ML Framework from First Principles
**Most ML education teaches you to _use_ frameworks. TinyTorch teaches you to _build_ them.**
Tinyπ₯Torch is a minimalist framework for building machine learning systems from scratchβfrom tensors to systems. Instead of relying on PyTorch or TensorFlow, you implement everything yourselfβtensors, autograd, optimizers, even MLOps tooling.
**The Vision: Train ML Systems Engineers, Not Just ML Users**
This hands-on approach builds the deep systems intuition that separates ML engineers from ML users. You'll understand not just *what* neural networks do, but *how* they work under the hood, *why* certain design choices matter in production, and *when* to make trade-offs between memory, speed, and accuracy.
```{admonition} What You'll Build: The Complete ML Evolution Story
:class: tip
**A complete ML framework from scratch** that recreates the history of ML breakthroughs:
**π§ MLP Era (1980s): The Foundation**
- **Train MLPs to 52.7% accuracy on CIFAR-10** (the baseline everyone tried to beat)
- Implement automatic differentiation from first principles
- Master gradient-based optimization with SGD and Adam
**π‘ CNN Revolution (1989-1998): Spatial Intelligence**
- **LeNet-1 (1989)**: Build the first successful CNN architecture (39.4% accuracy)
- **LeNet-5 (1998)**: Implement the classic CNN that established the standard (47.5% accuracy)
- **Modern CNNs**: Push beyond MLPs with optimized architectures (55%+ achievable)
**π₯ Transformer Era (2017-present): Language & Beyond**
- **TinyGPT**: Complete language models using your vision framework
- **Universal Architecture**: 95% component reuse from vision to language
- **Modern ML Systems**: Full pipeline from data loading to deployment
**Result:** You experience firsthand how ML evolved from simple perceptrons to modern AI systems, implementing every breakthrough yourself. All 16 modules pass comprehensive tests with 100% health status.
```
_Understanding how to build ML systems makes you a more effective ML engineer._
```{admonition} The Perfect Learning Combination
:class: note
TinyTorch was designed as the hands-on lab companion to [**Machine Learning Systems**](https://mlsysbook.ai) by [Prof. Vijay Janapa Reddi](https://vijay.seas.harvard.edu) (Harvard). The book teaches you ML systems **theory and principles** - TinyTorch lets you **implement and experience** those concepts firsthand. Together, they provide complete ML systems mastery.
```
---
## The Historic Journey: From MLPs to Modern AI
TinyTorch recreates the actual progression of machine learning breakthroughs. You don't just learn modern AI - you **experience the evolution that created it**:
```python
π§ MLP Era (1980s): π‘ CNN Revolution (1989): π₯ Transformer Era (2017):
βββ class MLP: βββ class LeNet1: βββ class TinyGPT:
β def forward(self, x): β def forward(self, x): β def forward(self, x):
β h = x.reshape(batch,-1) β h = self.conv1(x) β h = self.embed(x)
β h = self.fc1(h) β h = self.pool(h) β h = self.attention(h)
β return self.fc2(h) β h = self.conv2(h) β return self.lm_head(h)
β β return self.fc(h.flat()) β
β Result: 52.7% CIFAR-10 β Result: 47.5% CIFAR-10 β Result: Language generation
βββ "Good, but can we do βββ "Spatial features help!" βββ "Universal intelligence!"
better with images?"
The SAME tensor operations power all three eras - you build them once, use everywhere.
```
**The ML Evolution Story:**
- **1980s**: MLPs could learn, but struggled with complex patterns
- **1989**: LeNet-1 proved convolutions extract spatial features
- **1998**: LeNet-5 established CNNs as the vision standard
- **2012**: AlexNet showed deep CNNs dominate computer vision
- **2017**: Transformers unified vision AND language processing
- **Today**: Same mathematical foundations power all AI systems
TinyTorch focuses on implementation and systems thinking. You learn *how* to build working systems with progressive scaffolding, production ready practices, and comprehensive course infrastructure that bridges the gap between learning and building.
**What Makes This Different: Systems-First Thinking**
Traditional ML courses teach algorithms. TinyTorch teaches **ML systems engineering**:
- **Memory Management**: Why Adam uses 3Γ more memory than SGD and when that matters
- **Performance Analysis**: How attention mechanisms scale O(NΒ²) and limit context length
- **Production Trade-offs**: When to use gradient accumulation vs larger GPUs
- **Hardware Awareness**: How cache misses make naive convolution 100Γ slower
- **System Design**: How autograd graphs consume memory and enable gradient checkpointing
**Result**: You become the engineer who designs ML systems, not just uses them.
---
## Learning Philosophy: Build, Use, Reflect
Every component follows the same powerful learning cycle:
### Example: Activation Functions
**Build:** Implement ReLU from scratch
```python
def relu(x):
# YOU implement this function
return np.maximum(0, x) # Your solution
```
**Use:** Immediately use your own code
```python
from tinytorch.core.activations import ReLU # YOUR implementation!
layer = ReLU()
output = layer.forward(input_tensor) # Your code working!
```
**Reflect:** See it working in real networks
```python
# Your ReLU is now part of a real neural network
model = Sequential([
Dense(784, 128),
ReLU(), # <-- Your implementation
Dense(128, 10)
])
```
This pattern repeats for every component: tensors, layers, optimizers, even MLOps systems. You build it, use it immediately, then reflect on how it fits into larger systems.
**π― Track Your Capabilities**
TinyTorch uses a [checkpoint system](checkpoint-system.md) to track your progress through **ML systems engineering capabilities**:
- **Foundation** β Core ML primitives and setup
- **Architecture** β Neural network building
- **Training** β Model training pipeline
- **Inference** β Deployment and optimization
- **Serving** β Complete system integration
Use `tito checkpoint status` to see your progress anytime!
**π― Beyond Code: Systems Intuition**
Each module includes **ML Systems Thinking** sections that connect your implementations to production reality:
- *"How does your tensor implementation compare to PyTorch's memory management?"*
- *"When would you choose SGD over Adam in production training?"*
- *"How do frameworks handle the quadratic memory scaling of attention?"*
- *"What happens to your autograd implementation under distributed training?"*
These aren't just academic questions - they're the system-level challenges that ML engineers solve every day.
---
## π₯ Who This Is For
### π― Perfect For:
- **CS students** who want to understand ML systems beyond high-level APIs
- **Software engineers** transitioning to ML engineering roles
- **ML practitioners** who want to optimize and debug production systems
- **Researchers** who need to implement custom operations and architectures
- **Anyone curious** about how PyTorch/TensorFlow actually work under the hood
### π Prerequisites:
- **Python programming** (comfortable with classes, functions, basic NumPy)
- **Linear algebra basics** (matrix multiplication, gradients)
- **Learning mindset** - we'll teach you everything else!
### π Career Impact:
After TinyTorch, you'll be the person your team asks:
- *"Why is this training so slow?"* (You'll know how to profile and optimize)
- *"Can we fit this model in GPU memory?"* (You'll understand memory trade-offs)
- *"What's the best optimizer for this problem?"* (You'll know the system implications)
---
## π **STREAMLINED Journey: Train Neural Networks in 7 Modules!**
```{admonition} β¨ **NEW: Accelerated Learning Path**
:class: important
**BREAKTHROUGH: Students can train neural networks after just 7 modules** (vs 11 before)!
The reorganization eliminates forward dependencies and focuses on essentials.
```
```{admonition} π§ Neural Network Foundations (Modules 1-7)
:class: note
**1. Setup** β’ **2. Tensor + Autograd** β’ **3. ReLU + Softmax** β’ **4. Linear + Module + Flatten**
**5. Loss Functions** β’ **6. Optimizers** β’ **7. Training**
**GAME CHANGER**: Complete neural network training capability in 7 modules!
- **Module 2**: Gradients from the start (no waiting until Module 9!)
- **Module 3**: Focus on 2 essential activations (not 6 distractions)
- **Module 4**: All building blocks in one place (Linear + Module + Flatten)
- **Module 7**: **Train XOR and MNIST after 7 modules!**
```
```{admonition} π‘ Computer Vision (Modules 8-9)
:class: note
**8. CNN Operations** β’ **9. DataLoader**
Add convolutional intelligence: Conv2d, MaxPool2d, and efficient data loading.
**Result**: Train CNNs on CIFAR-10 after just 9 modules!
```
```{admonition} π₯ Language Models (Modules 10-12)
:class: note
**10. Embeddings** β’ **11. Attention** β’ **12. Transformers**
Universal intelligence: Build GPT-style language models using your vision infrastructure.
**Result**: Complete TinyGPT using 95% of your vision components!
```
---
## π Complete System Integration
**This isn't 16 separate exercises.** Every component you build integrates into one fully functional ML framework with universal foundations:
```{admonition} π― How It All Connects
:class: important
```{mermaid}
flowchart TD
A[01_setup
π§ Environment & CLI] --> B[02_tensor
π Tensor + Basic Autograd
π GRADIENTS FROM START!]
B --> C[03_activations
β‘ ReLU + Softmax
π― ESSENTIALS ONLY]
C --> D[04_layers
π§± Linear + Module + Flatten
π COMPLETE BUILDING BLOCKS]
D --> E[05_losses
π MSE + CrossEntropy
π― WHAT TO OPTIMIZE]
E --> F[06_optimizers
π SGD + Adam
π― HOW TO OPTIMIZE]
F --> G[07_training
π₯ Complete Training
β
TRAIN NETWORKS NOW!]
G --> H[08_cnn_ops
ποΈ Conv2d + MaxPool2d
πΌοΈ VISION INTELLIGENCE]
G --> I[09_dataloader
π CIFAR10 + DataLoader
ποΈ REAL DATA]
H --> I
I --> J[πΌοΈ CIFAR-10 CNNs
Train on Real Images]
G --> K[10_embeddings
π Token Embeddings]
K --> L[11_attention
π Multi-Head Attention]
L --> M[12_transformers
π€ TinyGPT
π₯ LANGUAGE MODELS]
style G fill:#ff6b6b,stroke:#333,stroke-width:3px,color:#fff
style J fill:#4ecdc4,stroke:#333,stroke-width:3px,color:#fff
style M fill:#45b7d1,stroke:#333,stroke-width:3px,color:#fff
```
**Result:** Every component you build converges into TinyGPT - proving your framework is complete and production-ready.
```
### π₯ TinyGPT: Proving Framework Universality
TinyGPT is your **capstone achievement** - demonstrating that the same foundations power all modern AI:
**The Historical Proof:**
- **1980s MLP components** β **1989 CNN revolution** β **2017 Transformer era**
- **95% component reuse**: Your tensors, layers, and training systems work across all three eras
- **Universal mathematics**: The same operations that power MLPs (52.7%) and CNNs (LeNet-5: 47.5%) also power language models
**What TinyGPT Proves:**
- **Framework Universality**: Vision and language use identical mathematical foundations
- **Component Integration**: All 16 modules work together seamlessly across domains
- **Systems Mastery**: You understand how modern AI builds on historical breakthroughs
- **Career Readiness**: You can implement any architecture from any era
**The Achievement:** Build GPT using components you designed for computer vision. This proves you didn't just learn isolated techniques - you built a complete, universal ML framework capable of any task.
---
## Choose Your Learning Path
```{admonition} Three Ways to Engage with TinyTorch
:class: important
### [Quick Exploration](usage-paths/quick-exploration.md) *(5 minutes)*
*"I want to see what this is about"*
- Click and run code immediately in your browser (Binder)
- No installation or setup required
- Implement ReLU, tensors, neural networks interactively
- Perfect for getting a feel for the course
### [Serious Development](usage-paths/serious-development.md) *(8+ weeks)*
*"I want to build this myself"*
- Fork the repo and work locally with full development environment
- Build complete ML framework from scratch with `tito` CLI
- 16 progressive assignments from setup to language models
- Professional development workflow with automated testing
### [Classroom Use](usage-paths/classroom-use.md) *(Instructors)*
*"I want to teach this course"*
- Complete course infrastructure with NBGrader integration
- Automated grading for comprehensive testing
- Flexible pacing (8-16 weeks) with proven pedagogical structure
- Turn-key solution for ML systems education
```
---
## Ready to Start?
### Quick Taste: Try Module 1 Right Now
Want to see what TinyTorch feels like? **[Launch the Setup chapter](chapters/01-setup.md)** in Binder and implement your first TinyTorch function in 2 minutes!
---
## Acknowledgments
TinyTorch originated from CS249r: Tiny Machine Learning Systems at Harvard University. We're inspired by projects like [tinygrad](https://github.com/geohot/tinygrad), [micrograd](https://github.com/karpathy/micrograd), and [MiniTorch](https://minitorch.github.io/) that demonstrate the power of minimal implementations.