Course Introduction: ML Systems Engineering Through Implementation

Transform from ML user to ML systems engineer by building everything yourself.


The Origin Story: Why TinyTorch Exists

The Problem We're Solving

There's a critical gap in ML engineering today. Plenty of people can use ML frameworks (PyTorch, TensorFlow, JAX, etc.), but very few understand the systems underneath. This creates real problems:

  • Engineers deploy models but can't debug when things go wrong
  • Teams hit performance walls because no one understands the bottlenecks
  • Companies struggle to scale - whether to tiny edge devices or massive clusters
  • Innovation stalls when everyone is limited to existing framework capabilities

How TinyTorch Began

TinyTorch started as exercises for the MLSysBook.ai textbook - students needed hands-on implementation experience. But it quickly became clear this addressed a much bigger problem:

The industry desperately needs engineers who can BUILD ML systems, not just USE them.

Deploying ML systems at scale is hard, and scale cuts in every direction:

  • Small scale: Running models on edge devices with 1MB of RAM
  • Large scale: Training models across thousands of GPUs
  • Production scale: Serving millions of requests with <100ms latency

We need more engineers who understand memory hierarchies, computational graphs, kernel optimization, distributed communication - the actual systems that make ML work.

Our Solution: Learn By Building

TinyTorch teaches ML systems the only way that really works: by building them yourself.

When you implement your own tensor operations, write your own autograd, build your own optimizer - you gain understanding that's impossible to achieve by just calling APIs. You learn not just what these systems do, but HOW they do it and WHY they're designed that way.


Core Learning Concepts

Concept 1: Systems Memory Analysis

# Learning objective: Understand memory usage patterns
# Framework user: "torch.optim.Adam()" - black box
# TinyTorch student: Implements Adam and discovers why it needs 3x parameter memory
# Result: Deep understanding of optimizer trade-offs applicable to any framework
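The memory claim above can be made concrete. The sketch below is illustrative, not TinyTorch's actual API: Adam keeps two full-size moment buffers (`m` and `v`) per parameter, so optimizer state plus parameters comes to roughly 3× the parameter memory.

```python
import numpy as np

class Adam:
    """Minimal Adam sketch: note the two extra state buffers per parameter."""
    def __init__(self, params, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = params
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        # Two parameter-sized buffers -> ~3x parameter memory overall
        self.m = [np.zeros_like(p) for p in params]  # first moment
        self.v = [np.zeros_like(p) for p in params]  # second moment
        self.t = 0

    def step(self, grads):
        self.t += 1
        for i, (p, g) in enumerate(zip(self.params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            p -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)  # in-place update
```

SGD, by contrast, needs no per-parameter state at all, which is exactly the trade-off the exercise surfaces.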

Concept 2: Computational Complexity

# Learning objective: Analyze algorithmic scaling behavior
# Framework user: "Attention mechanism" - abstract concept
# TinyTorch student: Implements attention from scratch, measures O(n²) scaling
# Result: Intuition for sequence modeling limits across PyTorch, TensorFlow, JAX
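The O(n²) behavior falls directly out of the implementation. In this toy single-head sketch (illustrative, not TinyTorch's actual code), the score matrix `Q @ K.T` has one entry per pair of positions, so its size grows quadratically with sequence length:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention; the scores matrix is (n, n)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)  # O(n^2 * d) compute, O(n^2) memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V

for n in (64, 128, 256):
    x = np.random.randn(n, 32)
    out = attention(x, x, x)
    # score-matrix entries grow as n^2: 4096, 16384, 65536
    print(n, n * n)
```

Doubling the sequence length quadruples the score matrix, which is why context windows hit memory walls.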

Concept 3: Automatic Differentiation

# Learning objective: Understand gradient computation
# Framework user: "loss.backward()" - mysterious process
# TinyTorch student: Builds autograd engine with computational graphs
# Result: Knowledge of how all modern ML frameworks enable learning
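The core idea is small enough to sketch. This toy scalar autograd (far simpler than TinyTorch's actual tensor engine) records a graph during the forward pass, then replays it in reverse topological order applying the chain rule:

```python
class Value:
    """Toy scalar autograd node: records the graph, replays it backward."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            self.grad += other.data * out.grad   # d(ab)/da = b
            other.grad += self.data * out.grad   # d(ab)/db = a
        out._backward = backward_fn
        return out

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            self.grad += out.grad                # d(a+b)/da = 1
            other.grad += out.grad
        out._backward = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()
```

Calling `(x * y + x).backward()` populates `x.grad` and `y.grad`; the same record-then-replay pattern, generalized to tensors, is what `loss.backward()` does in every modern framework.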

What Makes TinyTorch Different

Most ML education teaches you to use frameworks (PyTorch, TensorFlow, JAX, etc.). TinyTorch teaches you to build them.

This fundamental difference creates engineers who understand systems deeply, not just APIs superficially.

The Learning Philosophy: Build → Use → Reflect

Traditional Approach:

import torch
model = torch.nn.Linear(784, 10)  # Use someone else's implementation
output = model(input)             # Trust it works, don't understand how

TinyTorch Approach:

# 1. BUILD: You implement Linear from scratch
import numpy as np

class Linear:
    def __init__(self, in_features, out_features):
        # Small random weights and zero bias, stored as plain arrays
        self.weight = np.random.randn(in_features, out_features) * 0.01
        self.bias = np.zeros(out_features)

    def forward(self, x):
        return x @ self.weight + self.bias  # You write this
        
# 2. USE: Your implementation in action
from tinytorch.core.layers import Linear  # YOUR code
model = Linear(784, 10)                  # YOUR implementation
output = model(input)                    # YOU know exactly how this works

# 3. REFLECT: Systems thinking
# "Why does matrix multiplication dominate compute time?"
# "How does this scale with larger models?"
# "What memory optimizations are possible?"

Who This Course Serves

Perfect For:

🎓 Computer Science Students

  • Want to understand ML systems beyond high-level APIs
  • Need to implement custom operations for research
  • Preparing for ML engineering roles that require systems knowledge

👩‍💻 Software Engineers → ML Engineers

  • Transitioning into ML engineering roles
  • Need to debug and optimize production ML systems
  • Want to understand what happens "under the hood" of ML frameworks

🔬 ML Practitioners & Researchers

  • Debug performance issues in production systems
  • Implement novel architectures and custom operations
  • Optimize training and inference for resource constraints

🧠 Anyone Curious About ML Systems

  • Understand how PyTorch, TensorFlow actually work
  • Build intuition for ML systems design and optimization
  • Appreciate the engineering behind modern AI breakthroughs

Prerequisites

Required:

  • Python Programming: Comfortable with classes, functions, basic NumPy
  • Linear Algebra Basics: Matrix multiplication, gradients (we review as needed)
  • Learning Mindset: Willingness to implement rather than just use

Not Required:

  • Prior ML framework experience (we build our own!)
  • Deep learning theory (we learn through implementation)
  • Advanced math (we focus on practical systems implementation)

What You'll Achieve: Tier-by-Tier Mastery

After Foundation Tier (Modules 01-07)

Build a complete neural network framework from mathematical first principles:

# YOUR implementation training real networks on real data
model = Sequential([
    Linear(784, 128),    # Your linear algebra implementation
    ReLU(),              # Your activation function
    Linear(128, 64),     # Your gradient-aware layers
    ReLU(),              # Your nonlinearity
    Linear(64, 10)       # Your classification head
])

# YOUR complete training system
optimizer = Adam(model.parameters(), lr=0.001)  # Your optimization algorithm
for batch in dataloader:  # Your data management
    output = model(batch.x)                     # Your forward computation
    loss = CrossEntropyLoss()(output, batch.y)  # Your loss calculation
    loss.backward()                             # YOUR backpropagation engine
    optimizer.step()                            # Your parameter updates

🎯 Foundation Achievement: 95%+ accuracy on MNIST using 100% your own mathematical implementations

After Architecture Tier (Modules 08-13)

  • Computer Vision Mastery: CNNs achieving 75%+ accuracy on CIFAR-10 with YOUR convolution implementations
  • Language Understanding: Transformers generating coherent text using YOUR attention mechanisms
  • Universal Architecture: Discover why the SAME mathematical principles work for vision AND language
  • AI Breakthrough Recreation: Implement the architectures that created the modern AI revolution

After Optimization Tier (Modules 14-20)

  • Production Performance: Systems optimized for <100ms inference latency using YOUR profiling tools
  • Memory Efficiency: Models compressed to 25% of their original size with YOUR quantization implementations
  • Hardware Acceleration: Kernels achieving 10x speedups through YOUR vectorization techniques
  • Competition Ready: Torch Olympics submissions competitive with industry implementations
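The 25% figure above comes straight from the data types: FP32 weights are 4 bytes each, INT8 weights are 1. A minimal symmetric-quantization sketch (illustrative names, not TinyTorch's actual API) shows both the size reduction and the precision cost:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric INT8 quantization: 4 bytes/weight -> 1 byte/weight (25%)."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(1000).astype(np.float32)
q, scale = quantize_int8(w)
print(q.nbytes / w.nbytes)  # 0.25
max_err = np.abs(dequantize(q, scale) - w).max()  # bounded by ~scale/2
```

The engineering question the tier explores is when that rounding error is acceptable, and when it visibly degrades model accuracy.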

The ML Evolution Story You'll Experience

TinyTorch's three-tier structure follows the actual historical progression of machine learning breakthroughs:

Foundation Era (1980s-1990s) → Foundation Tier

The Beginning: Mathematical foundations that started it all

  • 1986 Breakthrough: Backpropagation enables multi-layer networks
  • Your Implementation: Build automatic differentiation and gradient-based optimization
  • Historical Milestone: Train MLPs to 95%+ accuracy on MNIST using YOUR autograd engine

Architecture Era (1990s-2010s) → Architecture Tier

The Revolution: Specialized architectures for vision and language

  • 1998 Breakthrough: CNNs revolutionize computer vision (LeCun's LeNet)
  • 2017 Breakthrough: Transformers unify vision and language ("Attention is All You Need")
  • Your Implementation: Build CNNs achieving 75%+ on CIFAR-10, then transformers for text generation
  • Historical Milestone: Recreate both revolutions using YOUR spatial and attention implementations

Optimization Era (2010s-Present) → Optimization Tier

The Engineering: Production systems that scale to billions of users

  • 2020s Breakthrough: Efficient inference enables real-time LLMs (GPT, ChatGPT)
  • Your Implementation: Build KV-caching, quantization, and production optimizations
  • Historical Milestone: Deploy systems competitive in Torch Olympics benchmarks

Why This Progression Matters: You'll understand not just modern AI, but WHY it evolved this way. Each tier builds essential capabilities that inform the next, just like ML history itself.


Systems Engineering Focus: Why Tiers Matter

Traditional ML courses teach algorithms in isolation. TinyTorch's tier structure teaches systems thinking - how components interact to create production ML systems.

Traditional Linear Approach:

Module 1: Tensors → Module 2: Layers → Module 3: Training → ...

Problem: Students learn components but miss system interactions

TinyTorch Tier Approach:

🏗️ Foundation Tier: Build mathematical infrastructure
🏛️ Architecture Tier: Compose intelligent architectures
⚡ Optimization Tier: Deploy at production scale

Advantage: Each tier builds complete, working systems with clear progression

What Traditional Courses Teach vs. TinyTorch Tiers:

Traditional: "Use torch.optim.Adam for optimization"
Foundation Tier: "Why Adam needs 3× more memory than SGD and how to implement both from mathematical first principles"

Traditional: "Transformers use attention mechanisms"
Architecture Tier: "How attention creates O(N²) scaling, why this limits context windows, and how to implement efficient attention yourself"

Traditional: "Deploy models with TensorFlow Serving"
Optimization Tier: "How to profile bottlenecks, implement KV-caching for 10× speedup, and compete in production benchmarks"
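The KV-caching idea is simple to sketch. During autoregressive generation, each new token's query must attend over all previous keys and values; caching them means each step appends one row instead of recomputing the whole prefix. The toy single-head version below is illustrative only (projections are omitted, and the names are not TinyTorch's actual API):

```python
import numpy as np

def generate_with_kv_cache(embed, steps, d=16):
    """Autoregressive attention with a KV cache: each step appends ONE new
    key/value row and attends over the cache, instead of recomputing the
    full prefix from scratch."""
    K_cache = np.empty((0, d))
    V_cache = np.empty((0, d))
    x = embed                      # current-step input vector, shape (d,)
    outputs = []
    for _ in range(steps):
        q, k, v = x, x, x          # toy: real code applies learned projections
        K_cache = np.vstack([K_cache, k[None]])   # append one row, O(1) new work
        V_cache = np.vstack([V_cache, v[None]])
        scores = q @ K_cache.T / np.sqrt(d)       # shape (t,), not (t, t)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        x = w @ V_cache                            # toy next-step input
        outputs.append(x)
    return np.stack(outputs)
```

Without the cache, step t would recompute t keys and values; with it, per-step work for the K/V side drops from O(t) recomputation to a single append, which is where the headline speedups in production LLM serving come from.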

Career Impact by Tier

After each tier, you become the team member who:

🏗️ Foundation Tier Graduate:

  • Debugs gradient flow issues: "Your ReLU is causing dead neurons"
  • Implements custom optimizers: "I'll build a variant of Adam for this use case"
  • Understands memory patterns: "Batch size 64 hits your GPU memory limit here"

🏛️ Architecture Tier Graduate:

  • Designs novel architectures: "We can adapt transformers for this computer vision task"
  • Optimizes attention patterns: "This attention bottleneck is why your model won't scale to longer sequences"
  • Bridges vision and language: "The same mathematical principles work for both domains"

⚡ Optimization Tier Graduate:

  • Deploys production systems: "I can get us from 500ms to 50ms inference latency"
  • Leads performance optimization: "Here's our memory bottleneck and my 3-step plan to fix it"
  • Competes at industry scale: "Our optimizations achieve Torch Olympics benchmark performance"

Learning Support & Community

Comprehensive Infrastructure

  • Automated Testing: Every component includes comprehensive test suites
  • Progress Tracking: 16-checkpoint capability assessment system
  • CLI Tools: tito command-line interface for development workflow
  • Visual Progress: Real-time tracking of learning milestones

Multiple Learning Paths

  • Quick Exploration (5 min): Browser-based exploration, no setup required
  • Serious Development (8+ weeks): Full local development environment
  • Classroom Use: Complete course infrastructure with automated grading

Professional Development Practices

  • Version Control: Git-based workflow with feature branches
  • Testing Culture: Test-driven development for all implementations
  • Code Quality: Professional coding standards and review processes
  • Documentation: Comprehensive guides and system architecture documentation

Start Your Journey

Begin Building ML Systems

Choose your starting point based on your goals and time commitment

15-Minute Start → Foundation Tier →

Next Steps:

By completing all three tiers, you'll have built a complete ML framework that rivals production implementations:

**🏗️ Foundation Tier Achievement**: 95%+ accuracy on MNIST with YOUR mathematical implementations
**🏛️ Architecture Tier Achievement**: 75%+ accuracy on CIFAR-10 AND coherent text generation
**⚡ Optimization Tier Achievement**: Production systems competitive in Torch Olympics benchmarks

All using code you wrote yourself, from mathematical first principles to production optimization.

📖 Want to understand the pedagogical narrative behind this structure? See The Learning Journey to understand WHY modules flow this way and HOW they build on each other through a six-act learning story.


Foundation Tier (Modules 01-07)

Building Blocks of ML Systems • 6-8 weeks • All Prerequisites for Neural Networks

What You'll Learn: Build the mathematical and computational infrastructure that powers all neural networks. Master tensor operations, gradient computation, and optimization algorithms.

Prerequisites: Python programming, basic linear algebra (matrix multiplication)

Career Connection: Foundation skills required for ML Infrastructure Engineer, Research Engineer, Framework Developer roles

Time Investment: ~20 hours total (3 hours/week for 6-8 weeks)

| Module | Component | Core Capability | Real-World Connection |
|--------|-----------|-----------------|-----------------------|
| 01 | Tensor | Data structures and operations | NumPy, PyTorch tensors |
| 02 | Activations | Nonlinear functions | ReLU, attention activations |
| 03 | Layers | Linear transformations | nn.Linear, dense layers |
| 04 | Losses | Optimization objectives | CrossEntropy, MSE loss |
| 05 | Autograd | Automatic differentiation | PyTorch autograd engine |
| 06 | Optimizers | Parameter updates | Adam, SGD optimizers |
| 07 | Training | Complete training loops | Model.fit(), training scripts |

🎯 Tier Milestone: Train neural networks achieving 95%+ accuracy on MNIST using 100% your own implementations!

Skills Gained:

  • Understand memory layout and computational graphs
  • Debug gradient flow and numerical stability issues
  • Implement any optimization algorithm from research papers
  • Build custom neural network architectures from scratch

Architecture Tier (Modules 08-13)

Modern AI Algorithms • 4-6 weeks • Vision + Language Architectures

What You'll Learn: Implement the architectures powering modern AI: convolutional networks for vision and transformers for language. Discover why the same mathematical principles work across domains.

Prerequisites: Foundation Tier complete (Modules 01-07)

Career Connection: Computer Vision Engineer, NLP Engineer, AI Research Scientist, ML Product Manager roles

Time Investment: ~25 hours total (4-6 hours/week for 4-6 weeks)

| Module | Component | Core Capability | Real-World Connection |
|--------|-----------|-----------------|-----------------------|
| 08 | Spatial | Convolutions and regularization | CNNs, ResNet, computer vision |
| 09 | DataLoader | Batch processing | PyTorch DataLoader, tf.data |
| 10 | Tokenization | Text preprocessing | BERT tokenizer, GPT tokenizer |
| 11 | Embeddings | Representation learning | Word2Vec, positional encodings |
| 12 | Attention | Information routing | Multi-head attention, self-attention |
| 13 | Transformers | Modern architectures | GPT, BERT, Vision Transformer |

🎯 Tier Milestone: Achieve 75%+ accuracy on CIFAR-10 with CNNs AND generate coherent text with transformers!

Skills Gained:

  • Understand why convolution works for spatial data
  • Implement attention mechanisms from scratch
  • Build transformer architectures for any domain
  • Debug sequence modeling and attention patterns

Optimization Tier (Modules 14-20)

Production & Performance • 4-6 weeks • Deploy and Scale ML Systems

What You'll Learn: Transform research models into production systems. Master profiling, optimization, and deployment techniques used by companies like OpenAI, Google, and Meta.

Prerequisites: Architecture Tier complete (Modules 08-13)

Career Connection: ML Systems Engineer, Performance Engineer, MLOps Engineer, Senior ML Engineer roles

Time Investment: ~30 hours total (5-7 hours/week for 4-6 weeks)

| Module | Component | Core Capability | Real-World Connection |
|--------|-----------|-----------------|-----------------------|
| 14 | Profiling | Performance analysis | PyTorch Profiler, TensorBoard |
| 15 | Quantization | Memory efficiency | INT8 inference, model compression |
| 16 | Compression | Model optimization | Pruning, distillation, ONNX |
| 17 | Memoization | Memory management | KV-cache for generation |
| 18 | Acceleration | Speed improvements | CUDA kernels, vectorization |
| 19 | Benchmarking | Measurement systems | Torch Olympics, production monitoring |
| 20 | Capstone | Full system integration | End-to-end ML pipeline |

🎯 Tier Milestone: Build production-ready systems competitive in Torch Olympics benchmarks!

Skills Gained:

  • Profile memory usage and identify bottlenecks
  • Implement efficient inference optimizations
  • Deploy models with <100ms latency requirements
  • Design scalable ML system architectures

Learning Path Recommendations

Choose Your Learning Style

🚀 Complete Builder

Implement every component from scratch

Time: 14-18 weeks
Ideal for: CS students, aspiring ML engineers

🎯 Focused Explorer

Pick one tier based on your goals

Time: 4-8 weeks
Ideal for: Working professionals, specific skill gaps

📚 Guided Learner

Study implementations with hands-on exercises

Time: 8-12 weeks
Ideal for: Self-directed learners, bootcamp graduates


Welcome to ML systems engineering!