🧠 Module 7: Autograd - Automatic Differentiation Engine
📊 Module Info
- Difficulty: ⭐⭐⭐⭐ Advanced
- Time Estimate: 6-8 hours
- Prerequisites: Tensor, Activations, Layers modules
- Next Steps: Training, Optimizers modules
Build the automatic differentiation engine that makes neural network training possible
🎯 Learning Objectives
After completing this module, you will:
- Understand how automatic differentiation works through computational graphs
- Implement the Variable class that tracks gradients and operations
- Build backward propagation for gradient computation
- Create differentiable versions of all mathematical operations
- Master the mathematical foundations of backpropagation
🧠 Build → Use → Analyze
This module follows the TinyTorch pedagogical framework:
- Build: Create the Variable class and gradient computation system
- Use: Perform automatic differentiation on complex expressions
- Analyze: Understand how gradients flow through computational graphs and optimize performance
📚 What You'll Build
Variable Class
# Gradient-tracking wrapper around Tensors
x = Variable(5.0, requires_grad=True)
y = Variable(3.0, requires_grad=True)
z = x * y + x**2
z.backward()
print(x.grad) # Gradient of z with respect to x
print(y.grad) # Gradient of z with respect to y
Differentiable Operations
# All operations track gradients automatically
def f(x, y):
    return x**2 + 2*x*y + y**2
x = Variable(2.0, requires_grad=True)
y = Variable(3.0, requires_grad=True)
result = f(x, y)
result.backward()
print(f"df/dx = {x.grad}") # Should be 2x + 2y = 10
print(f"df/dy = {y.grad}") # Should be 2x + 2y = 10
Neural Network Integration
# Works seamlessly with existing TinyTorch components
from tinytorch.core.activations import ReLU
from tinytorch.core.layers import Dense
# Create differentiable network
x = Variable([[1.0, 2.0, 3.0]])
layer = Dense(3, 2)
relu = ReLU()
# Forward pass with gradient tracking
output = relu(layer(x))
loss = output.sum()
loss.backward()
# Gradients available for all parameters
print(layer.weights.grad) # Weight gradients
print(layer.bias.grad) # Bias gradients
🚀 Getting Started
Prerequisites
- Activate the virtual environment: source bin/activate-tinytorch.sh
- Start the development environment: tito jupyter
Development Workflow
- Open the development file: modules/source/07_autograd/autograd_dev.py
- Implement the core components:
  - Start with the Variable class (gradient tracking)
  - Add basic operations (add, multiply, etc.)
  - Implement backward propagation
  - Add activation function gradients
- Test your implementation: tito test --module 07_autograd
📊 Understanding Automatic Differentiation
The Chain Rule in Action
Automatic differentiation is based on the chain rule:
If z = f(y) and y = g(x), then dz/dx = (dz/dy) * (dy/dx)
Computational Graph Example
Expression: f(x, y) = (x + y) * (x - y)
Forward Pass:
x = 2, y = 3
a = x + y = 5
b = x - y = -1
f = a * b = -5
Backward Pass:
df/df = 1
df/da = b = -1, df/db = a = 5
da/dx = 1, da/dy = 1
db/dx = 1, db/dy = -1
df/dx = df/da * da/dx + df/db * db/dx = (-1)(1) + (5)(1) = 4
df/dy = df/da * da/dy + df/db * db/dy = (-1)(1) + (5)(-1) = -6
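Both results match the analytical derivatives of f = x^2 - y^2, namely df/dx = 2x = 4 and df/dy = -2y = -6. Once your Variable class works, you can check this example directly. A minimal sketch, assuming the Variable API built in this module and that subtraction is implemented alongside addition and multiplication:

x = Variable(2.0, requires_grad=True)
y = Variable(3.0, requires_grad=True)
f = (x + y) * (x - y)   # forward pass builds the computational graph
f.backward()            # backward pass applies the chain rule shown above
print(f.data)   # expected: -5
print(x.grad)   # expected: 4
print(y.grad)   # expected: -6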
Key Concepts
| Concept | Description | Example |
|---|---|---|
| Variable | Tensor wrapper with gradient tracking | Variable(5.0, requires_grad=True) |
| Computational Graph | DAG representing operations | z = x * y creates graph |
| Forward Pass | Computing function values | z.data contains result |
| Backward Pass | Computing gradients | z.backward() fills gradients |
| Leaf Node | Variable created by user | x = Variable(5.0) |
| Gradient Function | How to compute gradients | grad_fn for each operation |
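To make the leaf node and gradient function rows concrete, here is a small sketch. It assumes your Variable exposes a grad_fn attribute as in the table and leaves it as None for user-created leaves; the attribute names in your implementation may differ:

x = Variable(5.0, requires_grad=True)   # leaf node: created by the user
y = Variable(3.0, requires_grad=True)   # leaf node
z = x * y                               # interior node: created by an operation
print(x.grad_fn)   # None for leaf nodes (assumed default)
print(z.grad_fn)   # the gradient function recorded by multiplication
z.backward()       # backward pass fills x.grad and y.grad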
🧪 Testing Your Implementation
Unit Tests
tito test --module 07_autograd
Test Coverage:
- ✅ Variable creation and properties
- ✅ Basic arithmetic operations
- ✅ Gradient computation correctness
- ✅ Chain rule implementation
- ✅ Integration with existing modules
Manual Testing
# Test basic gradients
x = Variable(2.0, requires_grad=True)
y = x**2 + 3*x + 1
y.backward()
print(x.grad) # Should be 2*2 + 3 = 7
# Test chain rule
x = Variable(2.0, requires_grad=True)
y = Variable(3.0, requires_grad=True)
z = x * y
w = z + x
w.backward()
print(x.grad) # Should be y + 1 = 4
print(y.grad) # Should be x = 2
📊 Mathematical Foundations
Gradient Computation Rules
| Operation | Forward | Backward (Gradient) |
|---|---|---|
| Addition | z = x + y | dx = dz, dy = dz |
| Multiplication | z = x * y | dx = y * dz, dy = x * dz |
| Power | z = x^n | dx = n * x^(n-1) * dz |
| Exp | z = exp(x) | dx = exp(x) * dz |
| Log | z = log(x) | dx = (1/x) * dz |
| ReLU | z = max(0, x) | dx = (x > 0) * dz |
| Sigmoid | z = 1/(1+exp(-x)) | dx = z * (1-z) * dz |
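Any rule in this table can be sanity-checked with a finite-difference approximation. The snippet below verifies the sigmoid rule using plain NumPy, so it does not depend on your Variable implementation at all:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x, eps = 0.7, 1e-6
numerical = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)  # central difference
analytical = sigmoid(x) * (1 - sigmoid(x))                     # rule from the table
print(abs(numerical - analytical) < 1e-8)  # should print True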
Advanced Concepts
- Higher-order gradients: Gradients of gradients
- Jacobian matrices: Gradients for vector functions
- Hessian matrices: Second-order derivatives
- Gradient checkpointing: Memory optimization
🔧 Integration with TinyTorch
After implementation, your autograd system will enable:
from tinytorch.core.autograd import Variable
from tinytorch.core.layers import Dense
from tinytorch.core.activations import ReLU
# Create a simple neural network
x = Variable([[1.0, 2.0, 3.0]])
layer1 = Dense(3, 4)
layer2 = Dense(4, 1)
relu = ReLU()
# Forward pass
h = relu(layer1(x))
output = layer2(h)
loss = output.sum()
# Backward pass
loss.backward()
# All gradients computed automatically!
print(layer1.weights.grad)
print(layer2.weights.grad)
🎯 Success Criteria
Your autograd module is complete when:
- All tests pass: tito test --module 07_autograd
- Variable imports correctly: from tinytorch.core.autograd import Variable
- Basic operations work: can create Variables and do arithmetic
- Gradients compute correctly: backward pass produces correct gradients
- Integration works: seamlessly works with existing TinyTorch modules
💡 Implementation Tips
Start with the Basics
- Variable class - Wrap Tensors with gradient tracking
- Simple operations - Start with addition and multiplication
- Backward method - Implement gradient computation
- Test frequently - Verify gradients match analytical solutions
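A convenient way to test frequently is a small gradient checker that compares your backward pass against finite differences. A minimal sketch, assuming a scalar-valued function built from Variable operations and the .data/.grad/.backward() API shown earlier (the helper name is hypothetical):

def check_gradient(f, value, eps=1e-5, tol=1e-4):
    # Analytical gradient from your autograd implementation
    x = Variable(value, requires_grad=True)
    f(x).backward()
    analytical = x.grad
    # Numerical gradient from a central finite difference
    plus = f(Variable(value + eps, requires_grad=True)).data
    minus = f(Variable(value - eps, requires_grad=True)).data
    numerical = (plus - minus) / (2 * eps)
    return abs(analytical - numerical) < tol

print(check_gradient(lambda x: x**2 + 3*x + 1, 2.0))  # should print True (gradient is 7)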
Design Patterns
class Variable:
    def __init__(self, data, requires_grad=True, grad_fn=None):
        # Store data, gradient state, and computation history
        self.data = data
        self.requires_grad = requires_grad
        self.grad = None
        self.grad_fn = grad_fn

    def backward(self, gradient=None):
        # Implement backpropagation using the chain rule:
        # seed the output gradient (1.0 by default), then pass it to grad_fn
        ...

def add(a, b):
    # Create a new Variable whose grad_fn knows how to backprop
    def backward_fn(grad):
        # Distribute the incoming gradient to both inputs (unchanged for addition)
        ...
    return Variable(a.data + b.data, grad_fn=backward_fn)
Common Challenges
- Gradient accumulation - Handle multiple paths to the same Variable (see the sketch after this list)
- Memory management - Store intermediate values efficiently
- Numerical stability - Handle edge cases in gradient computation
- Graph construction - Build computation graph correctly
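Gradient accumulation, referenced above, appears whenever a Variable reaches the output along more than one path: each path contributes a gradient, and backward must sum them rather than overwrite. A sketch of the behavior to aim for, assuming the Variable API above:

x = Variable(2.0, requires_grad=True)
z = x * x + x            # x contributes along two paths: through x*x and through + x
z.backward()
print(x.grad)            # should be 2*x + 1 = 5, the sum of both contributions
# Inside each backward_fn, accumulate rather than assign, e.g.
#   x.grad = (x.grad or 0.0) + incoming_grad    # hypothetical accumulation pattern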
🔧 Advanced Features (Optional)
If you finish early, try implementing:
- Higher-order gradients - Gradients of gradients
- Gradient checkpointing - Memory optimization
- Custom operations - Define your own differentiable functions (a sketch follows this list)
- Gradient clipping - Prevent exploding gradients
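For custom operations, the same grad_fn pattern used for add and multiply carries over. A hedged sketch of a custom differentiable square operation, assuming your backward() seeds the output gradient with 1.0 and then calls grad_fn as in the design pattern above (your mechanism for routing gradients to inputs may differ):

def square(x):
    # Forward value plus a grad_fn implementing d(x^2)/dx = 2x via the chain rule
    def backward_fn(grad):
        # Hypothetical: accumulate directly into the input's gradient
        x.grad = (x.grad or 0.0) + 2 * x.data * grad
    return Variable(x.data ** 2, grad_fn=backward_fn)

x = Variable(3.0, requires_grad=True)
y = square(x)
y.backward()
print(x.grad)   # expected: 2 * 3.0 = 6.0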
🚀 Next Steps
Once you complete the autograd module:
- Move to Training: cd modules/source/08_training/
- Build optimization algorithms: implement SGD, Adam, etc. (a preview sketch follows this list)
- Create training loops: put it all together
- Train real models: use your autograd system for actual ML!
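As a preview of where this leads, here is roughly what one gradient-descent step looks like once autograd supplies the gradients. A hedged sketch, assuming the Dense/ReLU integration shown earlier and that parameters expose .data and .grad (the real training module may structure this differently):

from tinytorch.core.autograd import Variable
from tinytorch.core.layers import Dense
from tinytorch.core.activations import ReLU

layer = Dense(3, 1)
relu = ReLU()
lr = 0.01   # learning rate

x = Variable([[1.0, 2.0, 3.0]])
loss = relu(layer(x)).sum()   # forward pass
loss.backward()               # gradients for all parameters

# Manual SGD step: move each parameter against its gradient
layer.weights.data -= lr * layer.weights.grad
layer.bias.data -= lr * layer.bias.grad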
🔗 Why Autograd Matters
Automatic differentiation is the foundation of modern ML:
- Neural networks require gradients for backpropagation
- Optimization needs gradients for parameter updates
- Research benefits from easy gradient computation
- Production systems rely on efficient autodiff
This module transforms TinyTorch from a static computation library into a dynamic, trainable ML framework!