Course Introduction: ML Systems Engineering Through Implementation
Transform from ML user to ML systems engineer by building everything yourself.
The Origin Story: Why TinyTorch Exists
The Problem We're Solving
There's a critical gap in ML engineering today. Plenty of people can use ML frameworks (PyTorch, TensorFlow, JAX, etc.), but very few understand the systems underneath. This creates real problems:
- Engineers deploy models but can't debug when things go wrong
- Teams hit performance walls because no one understands the bottlenecks
- Companies struggle to scale - whether to tiny edge devices or massive clusters
- Innovation stalls when everyone is limited to existing framework capabilities
How TinyTorch Began
TinyTorch started as exercises for the MLSysBook.ai textbook - students needed hands-on implementation experience. But it quickly became clear this addressed a much bigger problem:
The industry desperately needs engineers who can BUILD ML systems, not just USE them.
Deploying ML systems at scale is hard. Scale means both directions:
- Small scale: Running models on edge devices with 1MB of RAM
- Large scale: Training models across thousands of GPUs
- Production scale: Serving millions of requests with <100ms latency
We need more engineers who understand memory hierarchies, computational graphs, kernel optimization, distributed communication - the actual systems that make ML work.
Our Solution: Learn By Building
TinyTorch teaches ML systems the only way that really works: by building them yourself.
When you implement your own tensor operations, write your own autograd, build your own optimizer - you gain understanding that's impossible to achieve by just calling APIs. You learn not just what these systems do, but HOW they do it and WHY they're designed that way.
Core Learning Concepts
Concept 1: Systems Memory Analysis
```python
# Learning objective: Understand memory usage patterns
# Framework user: "torch.optim.Adam()" - black box
# TinyTorch student: Implements Adam and discovers why it needs 3x parameter memory
# Result: Deep understanding of optimizer trade-offs applicable to any framework
```
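The 3x memory claim can be made concrete. A minimal Adam sketch (illustrative only, not TinyTorch's actual class) shows where the cost comes from: beyond the parameters themselves, Adam keeps two running-moment buffers of the same size as every parameter.

```python
import numpy as np

class Adam:
    """Minimal Adam sketch -- illustrative, not TinyTorch's real implementation."""
    def __init__(self, params, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.params = params
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        # Two extra buffers per parameter: this is the "3x parameter memory".
        self.m = [np.zeros_like(p) for p in params]  # first moment (mean of grads)
        self.v = [np.zeros_like(p) for p in params]  # second moment (mean of grads^2)
        self.t = 0

    def step(self, grads):
        self.t += 1
        for i, (p, g) in enumerate(zip(self.params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            p -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)  # in-place update

# Optimizer state alone holds two extra arrays the size of every parameter.
params = [np.random.randn(100, 100)]
opt = Adam(params)
opt.step([np.random.randn(100, 100)])
```

SGD, by contrast, needs no per-parameter state at all (or one buffer with momentum), which is exactly the trade-off the Foundation Tier has you discover.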
Concept 2: Computational Complexity
```python
# Learning objective: Analyze algorithmic scaling behavior
# Framework user: "Attention mechanism" - abstract concept
# TinyTorch student: Implements attention from scratch, measures O(n²) scaling
# Result: Intuition for sequence modeling limits across PyTorch, TensorFlow, JAX
```
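The O(n²) behavior is easy to verify empirically. A rough single-head sketch in plain NumPy (hypothetical, much simpler than a real implementation) times attention as the sequence length doubles; the score matrix has n² entries, which is the quadratic cost.

```python
import time
import numpy as np

def attention(Q, K, V):
    # Score matrix is (seq_len x seq_len): this is the O(n^2) term.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

d = 64
for n in (256, 512, 1024):  # doubling sequence length
    X = np.random.randn(n, d)
    start = time.perf_counter()
    attention(X, X, X)
    elapsed = time.perf_counter() - start
    print(f"seq_len={n:5d}  time={elapsed:.4f}s  score entries={n*n:,}")
```

Each doubling of sequence length roughly quadruples the score matrix, which is why context windows are expensive to extend.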
Concept 3: Automatic Differentiation
```python
# Learning objective: Understand gradient computation
# Framework user: "loss.backward()" - mysterious process
# TinyTorch student: Builds autograd engine with computational graphs
# Result: Knowledge of how all modern ML frameworks enable learning
```
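The core of a computational-graph autograd engine fits in a few dozen lines. The scalar `Value` class below is illustrative only (far simpler than a real tensor-based engine): each operation records how to propagate gradients to its inputs, and `backward()` replays those rules in reverse topological order.

```python
class Value:
    """Minimal scalar autograd sketch -- illustrative, not TinyTorch's real engine."""
    def __init__(self, data, _parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = _parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad   # d(out)/d(self) = 1
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # product rule
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x = Value(3.0); y = Value(4.0)
z = x * y + x          # z = x*y + x
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 5.0, dz/dy = x = 3.0
```

This is the same mechanism behind `loss.backward()` in every major framework, just specialized to scalars.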
What Makes TinyTorch Different
Most ML education teaches you to use frameworks (PyTorch, TensorFlow, JAX, etc.). TinyTorch teaches you to build them.
This fundamental difference creates engineers who understand systems deeply, not just APIs superficially.
The Learning Philosophy: Build → Use → Reflect
Traditional Approach:

```python
import torch

model = torch.nn.Linear(784, 10)  # Use someone else's implementation
output = model(input)             # Trust it works, don't understand how
```
TinyTorch Approach:

```python
# 1. BUILD: You implement Linear from scratch
class Linear:
    def forward(self, x):
        return x @ self.weight + self.bias  # You write this

# 2. USE: Your implementation in action
from tinytorch.core.layers import Linear  # YOUR code
model = Linear(784, 10)                   # YOUR implementation
output = model(input)                     # YOU know exactly how this works

# 3. REFLECT: Systems thinking
# "Why does matrix multiplication dominate compute time?"
# "How does this scale with larger models?"
# "What memory optimizations are possible?"
```
Who This Course Serves
Perfect For:
🎓 Computer Science Students
- Want to understand ML systems beyond high-level APIs
- Need to implement custom operations for research
- Preparing for ML engineering roles that require systems knowledge
👩‍💻 Software Engineers → ML Engineers
- Transitioning into ML engineering roles
- Need to debug and optimize production ML systems
- Want to understand what happens "under the hood" of ML frameworks
🔬 ML Practitioners & Researchers
- Debug performance issues in production systems
- Implement novel architectures and custom operations
- Optimize training and inference for resource constraints
🧠 Anyone Curious About ML Systems
- Understand how PyTorch, TensorFlow actually work
- Build intuition for ML systems design and optimization
- Appreciate the engineering behind modern AI breakthroughs
Prerequisites
Required:
- Python Programming: Comfortable with classes, functions, basic NumPy
- Linear Algebra Basics: Matrix multiplication, gradients (we review as needed)
- Learning Mindset: Willingness to implement rather than just use
Not Required:
- Prior ML framework experience (we build our own!)
- Deep learning theory (we learn through implementation)
- Advanced math (we focus on practical systems implementation)
What You'll Achieve: Tier-by-Tier Mastery
After Foundation Tier (Modules 01-07)
Build a complete neural network framework from mathematical first principles:
```python
# YOUR implementation training real networks on real data
model = Sequential([
    Linear(784, 128),  # Your linear algebra implementation
    ReLU(),            # Your activation function
    Linear(128, 64),   # Your gradient-aware layers
    ReLU(),            # Your nonlinearity
    Linear(64, 10),    # Your classification head
])

# YOUR complete training system
optimizer = Adam(model.parameters(), lr=0.001)  # Your optimization algorithm
for batch in dataloader:                        # Your data management
    output = model(batch.x)                     # Your forward computation
    loss = CrossEntropyLoss()(output, batch.y)  # Your loss calculation
    loss.backward()                             # YOUR backpropagation engine
    optimizer.step()                            # Your parameter updates
```
🎯 Foundation Achievement: 95%+ accuracy on MNIST using 100% your own mathematical implementations
After Architecture Tier (Modules 08-13)
- Computer Vision Mastery: CNNs achieving 75%+ accuracy on CIFAR-10 with YOUR convolution implementations
- Language Understanding: Transformers generating coherent text using YOUR attention mechanisms
- Universal Architecture: Discover why the SAME mathematical principles work for vision AND language
- AI Breakthrough Recreation: Implement the architectures that created the modern AI revolution
After Optimization Tier (Modules 14-20)
- Production Performance: Systems optimized for <100ms inference latency using YOUR profiling tools
- Memory Efficiency: Models compressed to 25% original size with YOUR quantization implementations
- Hardware Acceleration: Kernels achieving 10x speedups through YOUR vectorization techniques
- Competition Ready: Torch Olympics submissions competitive with industry implementations
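The "25% of original size" figure corresponds to INT8 quantization: each float32 weight is stored as one int8 plus a shared scale factor. A minimal symmetric-quantization sketch (illustrative only, not the module's actual code):

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric INT8 quantization sketch: float32 -> int8 + one scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(128, 64).astype(np.float32)
q, scale = quantize_int8(w)
error = np.abs(w - dequantize(q, scale)).max()
print(f"float32: {w.nbytes} bytes  int8: {q.nbytes} bytes "
      f"({q.nbytes / w.nbytes:.0%} of original)  max error: {error:.4f}")
```

One byte per weight instead of four gives the 4x compression; the rounding error is bounded by half a quantization step, which is why accuracy usually survives.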
The ML Evolution Story You'll Experience
TinyTorch's three-tier structure follows the actual historical progression of machine learning breakthroughs:
Foundation Era (1980s-1990s) → Foundation Tier
The Beginning: Mathematical foundations that started it all
- 1986 Breakthrough: Backpropagation enables multi-layer networks
- Your Implementation: Build automatic differentiation and gradient-based optimization
- Historical Milestone: Train MLPs to 95%+ accuracy on MNIST using YOUR autograd engine
Architecture Era (1990s-2010s) → Architecture Tier
The Revolution: Specialized architectures for vision and language
- 1998 Breakthrough: CNNs revolutionize computer vision (LeCun's LeNet)
- 2017 Breakthrough: Transformers unify vision and language ("Attention is All You Need")
- Your Implementation: Build CNNs achieving 75%+ on CIFAR-10, then transformers for text generation
- Historical Milestone: Recreate both revolutions using YOUR spatial and attention implementations
Optimization Era (2010s-Present) → Optimization Tier
The Engineering: Production systems that scale to billions of users
- 2020s Breakthrough: Efficient inference enables real-time LLMs (GPT, ChatGPT)
- Your Implementation: Build KV-caching, quantization, and production optimizations
- Historical Milestone: Deploy systems competitive in Torch Olympics benchmarks
Why This Progression Matters: You'll understand not just modern AI, but WHY it evolved this way. Each tier builds essential capabilities that inform the next, just like ML history itself.
Systems Engineering Focus: Why Tiers Matter
Traditional ML courses teach algorithms in isolation. TinyTorch's tier structure teaches systems thinking - how components interact to create production ML systems.
Traditional Linear Approach:
Module 1: Tensors → Module 2: Layers → Module 3: Training → ...
Problem: Students learn components but miss system interactions
TinyTorch Tier Approach:
🏗️ Foundation Tier: Build mathematical infrastructure
🏛️ Architecture Tier: Compose intelligent architectures
⚡ Optimization Tier: Deploy at production scale
Advantage: Each tier builds complete, working systems with clear progression
What Traditional Courses Teach vs. TinyTorch Tiers:
Traditional: "Use torch.optim.Adam for optimization"
Foundation Tier: "Why Adam needs 3× more memory than SGD and how to implement both from mathematical first principles"
Traditional: "Transformers use attention mechanisms"
Architecture Tier: "How attention creates O(N²) scaling, why this limits context windows, and how to implement efficient attention yourself"
Traditional: "Deploy models with TensorFlow Serving"
Optimization Tier: "How to profile bottlenecks, implement KV-caching for 10× speedup, and compete in production benchmarks"
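The KV-caching idea works like this: during autoregressive generation, each past token's key and value projections are computed once and cached, so attending from a new token costs O(n) instead of recomputing O(n²) work every step. A rough NumPy sketch (hypothetical shapes; a real model would compute k, v, q from the input token):

```python
import numpy as np

def attend(q, K, V):
    """Single-query attention over all cached keys/values."""
    scores = q @ K.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

d = 16
K_cache = np.empty((0, d))  # grows by one row per generated token
V_cache = np.empty((0, d))

for step in range(5):
    # Hypothetical per-token projections; a real model derives these from x.
    k, v, q = (np.random.randn(d) for _ in range(3))
    # Cache k and v instead of recomputing them for every past token.
    K_cache = np.vstack([K_cache, k])
    V_cache = np.vstack([V_cache, v])
    out = attend(q, K_cache, V_cache)

print(K_cache.shape)  # (5, 16): one cached key per generated token
```

Trading memory (the growing cache) for compute (no recomputation) is the standard optimization behind fast LLM inference.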
Career Impact by Tier
After each tier, you become the team member who:
🏗️ Foundation Tier Graduate:
- Debugs gradient flow issues: "Your ReLU is causing dead neurons"
- Implements custom optimizers: "I'll build a variant of Adam for this use case"
- Understands memory patterns: "Batch size 64 hits your GPU memory limit here"
🏛️ Architecture Tier Graduate:
- Designs novel architectures: "We can adapt transformers for this computer vision task"
- Optimizes attention patterns: "This attention bottleneck is why your model won't scale to longer sequences"
- Bridges vision and language: "The same mathematical principles work for both domains"
⚡ Optimization Tier Graduate:
- Deploys production systems: "I can get us from 500ms to 50ms inference latency"
- Leads performance optimization: "Here's our memory bottleneck and my 3-step plan to fix it"
- Competes at industry scale: "Our optimizations achieve Torch Olympics benchmark performance"
Learning Support & Community
Comprehensive Infrastructure
- Automated Testing: Every component includes comprehensive test suites
- Progress Tracking: 16-checkpoint capability assessment system
- CLI Tools: `tito` command-line interface for development workflow
- Visual Progress: Real-time tracking of learning milestones
Multiple Learning Paths
- Quick Exploration (5 min): Browser-based exploration, no setup required
- Serious Development (8+ weeks): Full local development environment
- Classroom Use: Complete course infrastructure with automated grading
Professional Development Practices
- Version Control: Git-based workflow with feature branches
- Testing Culture: Test-driven development for all implementations
- Code Quality: Professional coding standards and review processes
- Documentation: Comprehensive guides and system architecture documentation
Start Your Journey
Begin Building ML Systems
Choose your starting point based on your goals and time commitment
Next Steps:
- New to TinyTorch: Start with Quick Start Guide for immediate hands-on experience
- Ready to Commit: Begin Module 01: Tensor to start building
- Teaching a Course: Review Getting Started Guide - For Instructors for classroom integration
By completing all three tiers, you'll have built a complete ML framework that rivals production implementations:
**🏗️ Foundation Tier Achievement**: 95%+ accuracy on MNIST with YOUR mathematical implementations
**🏛️ Architecture Tier Achievement**: 75%+ accuracy on CIFAR-10 AND coherent text generation
**⚡ Optimization Tier Achievement**: Production systems competitive in Torch Olympics benchmarks
All using code you wrote yourself, from mathematical first principles to production optimization.
📖 Want to understand the pedagogical narrative behind this structure? See The Learning Journey to understand WHY modules flow this way and HOW they build on each other through a six-act learning story.
Foundation Tier (Modules 01-07)
Building Blocks of ML Systems • 6-8 weeks • All Prerequisites for Neural Networks
What You'll Learn: Build the mathematical and computational infrastructure that powers all neural networks. Master tensor operations, gradient computation, and optimization algorithms.
Prerequisites: Python programming, basic linear algebra (matrix multiplication)
Career Connection: Foundation skills required for ML Infrastructure Engineer, Research Engineer, Framework Developer roles
Time Investment: ~20 hours total (3 hours/week for 6-8 weeks)
| Module | Component | Core Capability | Real-World Connection |
|---|---|---|---|
| 01 | Tensor | Data structures and operations | NumPy, PyTorch tensors |
| 02 | Activations | Nonlinear functions | ReLU, attention activations |
| 03 | Layers | Linear transformations | nn.Linear, dense layers |
| 04 | Losses | Optimization objectives | CrossEntropy, MSE loss |
| 05 | Autograd | Automatic differentiation | PyTorch autograd engine |
| 06 | Optimizers | Parameter updates | Adam, SGD optimizers |
| 07 | Training | Complete training loops | Model.fit(), training scripts |
🎯 Tier Milestone: Train neural networks achieving 95%+ accuracy on MNIST using 100% your own implementations!
Skills Gained:
- Understand memory layout and computational graphs
- Debug gradient flow and numerical stability issues
- Implement any optimization algorithm from research papers
- Build custom neural network architectures from scratch
Architecture Tier (Modules 08-13)
Modern AI Algorithms • 4-6 weeks • Vision + Language Architectures
What You'll Learn: Implement the architectures powering modern AI: convolutional networks for vision and transformers for language. Discover why the same mathematical principles work across domains.
Prerequisites: Foundation Tier complete (Modules 01-07)
Career Connection: Computer Vision Engineer, NLP Engineer, AI Research Scientist, ML Product Manager roles
Time Investment: ~25 hours total (4-6 hours/week for 4-6 weeks)
| Module | Component | Core Capability | Real-World Connection |
|---|---|---|---|
| 08 | Spatial | Convolutions and regularization | CNNs, ResNet, computer vision |
| 09 | DataLoader | Batch processing | PyTorch DataLoader, tf.data |
| 10 | Tokenization | Text preprocessing | BERT tokenizer, GPT tokenizer |
| 11 | Embeddings | Representation learning | Word2Vec, positional encodings |
| 12 | Attention | Information routing | Multi-head attention, self-attention |
| 13 | Transformers | Modern architectures | GPT, BERT, Vision Transformer |
🎯 Tier Milestone: Achieve 75%+ accuracy on CIFAR-10 with CNNs AND generate coherent text with transformers!
Skills Gained:
- Understand why convolution works for spatial data
- Implement attention mechanisms from scratch
- Build transformer architectures for any domain
- Debug sequence modeling and attention patterns
Optimization Tier (Modules 14-20)
Production & Performance • 4-6 weeks • Deploy and Scale ML Systems
What You'll Learn: Transform research models into production systems. Master profiling, optimization, and deployment techniques used by companies like OpenAI, Google, and Meta.
Prerequisites: Architecture Tier complete (Modules 08-13)
Career Connection: ML Systems Engineer, Performance Engineer, MLOps Engineer, Senior ML Engineer roles
Time Investment: ~30 hours total (5-7 hours/week for 4-6 weeks)
| Module | Component | Core Capability | Real-World Connection |
|---|---|---|---|
| 14 | Profiling | Performance analysis | PyTorch Profiler, TensorBoard |
| 15 | Quantization | Memory efficiency | INT8 inference, model compression |
| 16 | Compression | Model optimization | Pruning, distillation, ONNX |
| 17 | Memoization | Memory management | KV-cache for generation |
| 18 | Acceleration | Speed improvements | CUDA kernels, vectorization |
| 19 | Benchmarking | Measurement systems | Torch Olympics, production monitoring |
| 20 | Capstone | Full system integration | End-to-end ML pipeline |
🎯 Tier Milestone: Build production-ready systems competitive in Torch Olympics benchmarks!
Skills Gained:
- Profile memory usage and identify bottlenecks
- Implement efficient inference optimizations
- Deploy models with <100ms latency requirements
- Design scalable ML system architectures
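These skills start from simple measurement. A minimal timing helper (a sketch, not the course's actual profiling module) shows the basic pattern behind any profiler: wrap a region of code, measure, report.

```python
import time
from contextlib import contextmanager

import numpy as np

@contextmanager
def profile(name):
    """Minimal timing profiler sketch -- a stand-in for real profiling tools."""
    start = time.perf_counter()
    yield
    elapsed = (time.perf_counter() - start) * 1000
    print(f"{name}: {elapsed:.2f} ms")

with profile("matmul"):
    a = np.random.randn(512, 512)
    b = np.random.randn(512, 512)
    c = a @ b
```

Real profilers add call-stack attribution and memory tracking, but the measure-then-attribute loop is the same.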
Learning Path Recommendations
Choose Your Learning Style
🚀 Complete Builder
Implement every component from scratch
Time: 14-18 weeks
Ideal for: CS students, aspiring ML engineers
⚡ Focused Explorer
Pick one tier based on your goals
Time: 4-8 weeks
Ideal for: Working professionals, specific skill gaps
📚 Guided Learner
Study implementations with hands-on exercises
Time: 8-12 weeks
Ideal for: Self-directed learners, bootcamp graduates
Welcome to ML systems engineering!