---
html_meta:
  "property=og:title": "TinyTorch: Build your own ML framework from scratch"
  "property=og:description": "Learn ML systems by building them. From computer vision to language models. Comprehensive educational framework for understanding ML systems engineering."
  "property=og:url": "https://mlsysbook.github.io/TinyTorch/"
  "property=og:type": "website"
  "property=og:image": "https://mlsysbook.github.io/TinyTorch/logo.png"
  "property=og:site_name": "TinyTorch Course"
  "name=twitter:card": "summary_large_image"
  "name=twitter:title": "TinyTorch: Build your own ML framework"
  "name=twitter:description": "TinyTorch is a minimalist framework for building machine learning systems from scratch—from vision to language."
  "name=twitter:image": "https://mlsysbook.github.io/TinyTorch/logo.png"
---

# TinyTorch: Build Your Own ML Framework from First Principles

Most ML education teaches you to use frameworks. TinyTorch teaches you to build them.

Tiny🔥Torch is a minimalist framework for building machine learning systems from scratch—from tensors to systems. Instead of relying on PyTorch or TensorFlow, you implement everything yourself—tensors, autograd, optimizers, even MLOps tooling.

## The Vision: Train ML Systems Engineers, Not Just ML Users

This hands-on approach builds the deep systems intuition that separates ML engineers from ML users. You'll understand not just what neural networks do, but how they work under the hood, why certain design choices matter in production, and when to make trade-offs between memory, speed, and accuracy.

```{tip}
**A complete ML framework from scratch** that recreates the history of ML breakthroughs:

**🧠 MLP Era (1980s): The Foundation**
- **Train MLPs to 52.7% accuracy on CIFAR-10** (the baseline everyone tried to beat)
- Implement automatic differentiation from first principles
- Master gradient-based optimization with SGD and Adam

**📡 CNN Revolution (1989-1998): Spatial Intelligence**
- **LeNet-1 (1989)**: Build the first successful CNN architecture (39.4% accuracy)
- **LeNet-5 (1998)**: Implement the classic CNN that established the standard (47.5% accuracy)
- **Modern CNNs**: Push beyond MLPs with optimized architectures (55%+ achievable)

**🔥 Transformer Era (2017-present): Language & Beyond**
- **TinyGPT**: Complete language models using your vision framework
- **Universal Architecture**: 95% component reuse from vision to language
- **Modern ML Systems**: Full pipeline from data loading to deployment

**Result:** You experience firsthand how ML evolved from simple perceptrons to modern AI systems, implementing every breakthrough yourself. All 16 modules pass comprehensive tests with 100% health status.
```
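
To make "implement automatic differentiation from first principles" concrete, here is a micrograd-style scalar autograd node in plain Python. It is a minimal sketch of the idea you build out in Module 2; the `Value` class and its method names are illustrative, not TinyTorch's actual API.

```python
# Minimal sketch of reverse-mode autodiff (names are illustrative).
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None   # set by the op that created this node

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            self.grad += other.data * out.grad   # d(x*y)/dx = y
            other.grad += self.data * out.grad   # d(x*y)/dy = x
        out._backward = backward_fn
        return out

    def backward(self):
        # Visit the graph in reverse topological order, applying the chain rule.
        topo, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                topo.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

x, y = Value(3.0), Value(4.0)
z = x * y
z.backward()
print(x.grad, y.grad)   # 4.0 3.0
```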

Understanding how to build ML systems makes you a more effective ML engineer.

```{note}
TinyTorch was designed as the hands-on lab companion to [**Machine Learning Systems**](https://mlsysbook.ai) by [Prof. Vijay Janapa Reddi](https://vijay.seas.harvard.edu) (Harvard). The book teaches you ML systems **theory and principles** - TinyTorch lets you **implement and experience** those concepts firsthand. Together, they provide complete ML systems mastery.
```

## The Historic Journey: From MLPs to Modern AI

TinyTorch recreates the actual progression of machine learning breakthroughs. You don't just learn modern AI - you experience the evolution that created it:

```text
🧠 MLP Era (1980s):               📡 CNN Revolution (1989):           🔥 Transformer Era (2017):
├── class MLP:                    ├── class LeNet1:                   ├── class TinyGPT:
│     def forward(self, x):       │     def forward(self, x):         │     def forward(self, x):
│       h = x.reshape(batch,-1)   │       h = self.conv1(x)           │       h = self.embed(x)
│       h = self.fc1(h)           │       h = self.pool(h)            │       h = self.attention(h)
│       return self.fc2(h)        │       h = self.conv2(h)           │       return self.lm_head(h)
│                                 │       return self.fc(h.flat())    │
├── Result: 52.7% CIFAR-10        ├── Result: 47.5% CIFAR-10          ├── Result: Language generation
└── "Good, but can we do          └── "Spatial features help!"        └── "Universal intelligence!"
     better with images?"
```

The SAME tensor operations power all three eras - you build them once and use them everywhere.

**The ML Evolution Story:**

- **1980s**: MLPs could learn, but struggled with complex patterns
- **1989**: LeNet-1 proved convolutions extract spatial features
- **1998**: LeNet-5 established CNNs as the vision standard
- **2012**: AlexNet showed deep CNNs dominate computer vision
- **2017**: Transformers unified vision AND language processing
- **Today**: Same mathematical foundations power all AI systems

TinyTorch focuses on implementation and systems thinking. You learn how to build working systems with progressive scaffolding, production-ready practices, and comprehensive course infrastructure that bridges the gap between learning and building.

## What Makes This Different: Systems-First Thinking

Traditional ML courses teach algorithms. TinyTorch teaches ML systems engineering:

- **Memory Management**: Why Adam uses 3× more memory than SGD and when that matters (see the sketch after this list)
- **Performance Analysis**: How attention mechanisms scale O(N²) and limit context length
- **Production Trade-offs**: When to use gradient accumulation vs larger GPUs
- **Hardware Awareness**: How cache misses make naive convolution 100× slower
- **System Design**: How autograd graphs consume memory and enable gradient checkpointing
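
As a back-of-the-envelope sketch of the first point (the parameter count below is an assumption, chosen only to make the arithmetic concrete): SGD's persistent optimizer state is essentially the parameters themselves, while Adam also keeps two moment buffers (`m` and `v`) per parameter.

```python
# Rough optimizer-state memory estimate (hypothetical 10M-parameter model).
num_params = 10_000_000
bytes_per_float32 = 4

params_mb = num_params * bytes_per_float32 / 1e6
print(f"parameters:  {params_mb:.0f} MB")
print(f"SGD state:   {params_mb:.0f} MB (just the parameters)")
print(f"Adam state:  {3 * params_mb:.0f} MB (parameters + m + v buffers)")
```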

**Result:** You become the engineer who designs ML systems, not just uses them.


## Learning Philosophy: Build, Use, Reflect

Every component follows the same powerful learning cycle:

### Example: Activation Functions

**Build:** Implement ReLU from scratch

```python
import numpy as np

def relu(x):
    # YOU implement this function
    return np.maximum(0, x)  # Your solution
```

**Use:** Immediately use your own code

```python
from tinytorch.core.activations import ReLU  # YOUR implementation!

layer = ReLU()
output = layer.forward(input_tensor)  # Your code working!
```

**Reflect:** See it working in real networks

```python
# Your ReLU is now part of a real neural network
model = Sequential([
    Dense(784, 128),
    ReLU(),           # <-- Your implementation
    Dense(128, 10)
])
```

This pattern repeats for every component: tensors, layers, optimizers, even MLOps systems. You build it, use it immediately, then reflect on how it fits into larger systems.

## 🎯 Track Your Capabilities

TinyTorch uses a checkpoint system to track your progress through ML systems engineering capabilities:

- **Foundation** → Core ML primitives and setup
- **Architecture** → Neural network building
- **Training** → Model training pipeline
- **Inference** → Deployment and optimization
- **Serving** → Complete system integration

Use `tito checkpoint status` to see your progress anytime!

## 🎯 Beyond Code: Systems Intuition

Each module includes **ML Systems Thinking** sections that connect your implementations to production reality:

  • "How does your tensor implementation compare to PyTorch's memory management?"
  • "When would you choose SGD over Adam in production training?"
  • "How do frameworks handle the quadratic memory scaling of attention?"
  • "What happens to your autograd implementation under distributed training?"

These aren't just academic questions - they're the system-level challenges that ML engineers solve every day.


## 👥 Who This Is For

### 🎯 Perfect For:

- **CS students** who want to understand ML systems beyond high-level APIs
- **Software engineers** transitioning to ML engineering roles
- **ML practitioners** who want to optimize and debug production systems
- **Researchers** who need to implement custom operations and architectures
- **Anyone curious** about how PyTorch/TensorFlow actually work under the hood

### 📚 Prerequisites:

- **Python programming** (comfortable with classes, functions, basic NumPy)
- **Linear algebra basics** (matrix multiplication, gradients)
- **Learning mindset** - we'll teach you everything else!

### 🚀 Career Impact:

After TinyTorch, you'll be the person your team asks:

  • "Why is this training so slow?" (You'll know how to profile and optimize)
  • "Can we fit this model in GPU memory?" (You'll understand memory trade-offs)
  • "What's the best optimizer for this problem?" (You'll know the system implications)

## 📚 STREAMLINED Journey: Train Neural Networks in 7 Modules!

```{important}
**BREAKTHROUGH: Students can train neural networks after just 7 modules** (vs 11 before)!
The reorganization eliminates forward dependencies and focuses on essentials.
```
```{note}
**1. Setup** • **2. Tensor + Autograd** • **3. ReLU + Softmax** • **4. Linear + Module + Flatten**  
**5. Loss Functions** • **6. Optimizers** • **7. Training**

**GAME CHANGER**: Complete neural network training capability in 7 modules!
- **Module 2**: Gradients from the start (no waiting until Module 9!)
- **Module 3**: Focus on 2 essential activations (not 6 distractions)
- **Module 4**: All building blocks in one place (Linear + Module + Flatten)
- **Module 7**: **Train XOR and MNIST after 7 modules!** (see the sketch below)
```
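
As a concrete picture of that Module 7 milestone, here is a from-scratch XOR trainer in plain NumPy. It is a sketch under the assumption that your TinyTorch `Linear`, `ReLU`, MSE loss, and `SGD` components replace the hand-rolled pieces below (and your autograd replaces the manual backward pass); it is not the course's reference solution.

```python
import numpy as np

# XOR inputs and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)   # plays the role of Linear(2, 8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)   # plays the role of Linear(8, 1)
lr = 0.5

for step in range(2000):
    # Forward pass: Linear -> ReLU -> Linear
    h = X @ W1 + b1
    a = np.maximum(0, h)             # ReLU
    out = a @ W2 + b2
    loss = ((out - y) ** 2).mean()   # MSE loss

    # Backward pass: hand-derived gradients (your autograd automates this)
    d_out = 2 * (out - y) / len(X)
    dW2 = a.T @ d_out; db2 = d_out.sum(0)
    d_h = (d_out @ W2.T) * (h > 0)   # ReLU gradient
    dW1 = X.T @ d_h;    db1 = d_h.sum(0)

    # SGD update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(out, 2))  # approaches [[0], [1], [1], [0]]
```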
```{note}
**8. CNN Operations** • **9. DataLoader**

Add convolutional intelligence: Conv2d, MaxPool2d, and efficient data loading.
**Result**: Train CNNs on CIFAR-10 after just 9 modules!
```
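
To see why these operations deserve systems attention (recall the naive-convolution claim above), here is a deliberately naive single-channel `Conv2d` forward pass: the patch-by-patch loop you would write first, before studying the vectorized, cache-friendly layouts that run far faster. The function name and shapes here are illustrative.

```python
import numpy as np

def conv2d_naive(image, kernel):
    """Valid (no padding, stride 1) 2D cross-correlation over one channel."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Dot product of the kernel with one image patch
            out[i, j] = (image[i:i + kH, j:j + kW] * kernel).sum()
    return out

edge = np.array([[1.0, 0.0, -1.0]] * 3)       # simple vertical-edge filter
print(conv2d_naive(np.eye(8), edge).shape)    # (6, 6)
```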
```{note}
**10. Embeddings** • **11. Attention** • **12. Transformers**

Universal intelligence: Build GPT-style language models using your vision infrastructure.
**Result**: Complete TinyGPT using 95% of your vision components!
```
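
A minimal scaled dot-product attention in NumPy makes the O(N²) scaling mentioned earlier tangible: the score matrix is `seq_len × seq_len`, which is exactly what limits context length. This is a sketch only; the names and shapes are illustrative, not TinyTorch's exact API.

```python
import numpy as np

def attention(Q, K, V):
    """Single-head scaled dot-product attention."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (N, N): quadratic in sequence length
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V

N, d = 16, 8
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(N, d)) for _ in range(3))
print(attention(Q, K, V).shape)   # (16, 8)
```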

## 🔗 Complete System Integration

This isn't 16 separate exercises. Every component you build integrates into one fully functional ML framework with universal foundations:

```{mermaid}
flowchart TD
    A[01_setup<br/>🔧 Environment & CLI] --> B[02_tensor<br/>📊 Tensor + Basic Autograd<br/>🚀 GRADIENTS FROM START!]
    
    B --> C[03_activations<br/>⚡ ReLU + Softmax<br/>🎯 ESSENTIALS ONLY]
    
    C --> D[04_layers<br/>🧱 Linear + Module + Flatten<br/>💎 COMPLETE BUILDING BLOCKS]
    
    D --> E[05_losses<br/>📊 MSE + CrossEntropy<br/>🎯 WHAT TO OPTIMIZE]
    
    E --> F[06_optimizers<br/>🚀 SGD + Adam<br/>🎯 HOW TO OPTIMIZE]
    
    F --> G[07_training<br/>🔥 Complete Training<br/>✅ TRAIN NETWORKS NOW!]
    
    G --> H[08_cnn_ops<br/>👁️ Conv2d + MaxPool2d<br/>🖼️ VISION INTELLIGENCE]
    
    G --> I[09_dataloader<br/>📁 CIFAR10 + DataLoader<br/>🗂️ REAL DATA]
    
    H --> I
    I --> J[🖼️ CIFAR-10 CNNs<br/>Train on Real Images]
    
    G --> K[10_embeddings<br/>📚 Token Embeddings]
    K --> L[11_attention<br/>🔍 Multi-Head Attention]
    L --> M[12_transformers<br/>🤖 TinyGPT<br/>🔥 LANGUAGE MODELS]
    
    style G fill:#ff6b6b,stroke:#333,stroke-width:3px,color:#fff
    style J fill:#4ecdc4,stroke:#333,stroke-width:3px,color:#fff
    style M fill:#45b7d1,stroke:#333,stroke-width:3px,color:#fff
```

**Result:** Every component you build converges into TinyGPT - proving your framework is complete and production-ready.


### 🔥 TinyGPT: Proving Framework Universality

TinyGPT is your **capstone achievement** - demonstrating that the same foundations power all modern AI:

**The Historical Proof:**
- **1980s MLP components** → **1989 CNN revolution** → **2017 Transformer era**
- **95% component reuse**: Your tensors, layers, and training systems work across all three eras
- **Universal mathematics**: The same operations that power MLPs (52.7%) and CNNs (LeNet-5: 47.5%) also power language models

**What TinyGPT Proves:**
- **Framework Universality**: Vision and language use identical mathematical foundations  
- **Component Integration**: All 16 modules work together seamlessly across domains
- **Systems Mastery**: You understand how modern AI builds on historical breakthroughs
- **Career Readiness**: You can implement any architecture from any era

**The Achievement:** Build GPT using components you designed for computer vision. This proves you didn't just learn isolated techniques - you built a complete, universal ML framework capable of any task.

---

## Choose Your Learning Path

```{admonition} Three Ways to Engage with TinyTorch
:class: important

### [Quick Exploration](usage-paths/quick-exploration.md) *(5 minutes)*
*"I want to see what this is about"*
- Click and run code immediately in your browser (Binder)
- No installation or setup required
- Implement ReLU, tensors, neural networks interactively
- Perfect for getting a feel for the course

### [Serious Development](usage-paths/serious-development.md) *(8+ weeks)*
*"I want to build this myself"*
- Fork the repo and work locally with full development environment
- Build complete ML framework from scratch with `tito` CLI
- 16 progressive assignments from setup to language models
- Professional development workflow with automated testing

### [Classroom Use](usage-paths/classroom-use.md) *(Instructors)*
*"I want to teach this course"*
- Complete course infrastructure with NBGrader integration
- Automated grading for comprehensive testing
- Flexible pacing (8-16 weeks) with proven pedagogical structure
- Turn-key solution for ML systems education
```

## Ready to Start?

### Quick Taste: Try Module 1 Right Now

Want to see what TinyTorch feels like? Launch the Setup chapter in Binder and implement your first TinyTorch function in 2 minutes!


## Acknowledgments

TinyTorch originated from CS249r: Tiny Machine Learning Systems at Harvard University. We're inspired by projects like tinygrad, micrograd, and MiniTorch that demonstrate the power of minimal implementations.