🧠 Module 7: Autograd - Automatic Differentiation Engine
📊 Module Info
- Difficulty: ⭐⭐⭐⭐ Advanced
- Time Estimate: 6-8 hours
- Prerequisites: Tensor, Activations, Layers modules
- Next Steps: Training, Optimizers modules
Build the automatic differentiation engine that makes neural network training possible
🎯 Learning Objectives
After completing this module, you will:
- Understand how automatic differentiation works through computational graphs
- Implement the Variable class that tracks gradients and operations
- Build backward propagation for gradient computation
- Create differentiable versions of all mathematical operations
- Master the mathematical foundations of backpropagation
🧠 Build → Use → Analyze
This module follows the TinyTorch pedagogical framework:
- Build: Create the Variable class and gradient computation system
- Use: Perform automatic differentiation on complex expressions
- Analyze: Understand how gradients flow through computational graphs and optimize performance
📚 What You'll Build
Variable Class
# Gradient-tracking wrapper around Tensors
x = Variable(5.0, requires_grad=True)
y = Variable(3.0, requires_grad=True)
z = x * y + x**2
z.backward()
print(x.grad) # Gradient of z with respect to x
print(y.grad) # Gradient of z with respect to y
Differentiable Operations
# All operations track gradients automatically
def f(x, y):
    return x**2 + 2*x*y + y**2
x = Variable(2.0, requires_grad=True)
y = Variable(3.0, requires_grad=True)
result = f(x, y)
result.backward()
print(f"df/dx = {x.grad}") # Should be 2x + 2y = 10
print(f"df/dy = {y.grad}") # Should be 2x + 2y = 10
Neural Network Integration
# Works seamlessly with existing TinyTorch components
from tinytorch.core.activations import ReLU
from tinytorch.core.layers import Dense
# Create differentiable network
x = Variable([[1.0, 2.0, 3.0]])
layer = Dense(3, 2)
relu = ReLU()
# Forward pass with gradient tracking
output = relu(layer(x))
loss = output.sum()
loss.backward()
# Gradients available for all parameters
print(layer.weights.grad) # Weight gradients
print(layer.bias.grad) # Bias gradients
🚀 Getting Started
Prerequisites
- Activate the virtual environment: source bin/activate-tinytorch.sh
- Start the development environment: tito jupyter
Development Workflow
- Open the development file: modules/source/07_autograd/autograd_dev.py
- Implement the core components:
  - Start with the Variable class (gradient tracking)
  - Add basic operations (add, multiply, etc.)
  - Implement backward propagation
  - Add activation function gradients
- Test your implementation: tito test --module 07_autograd
📊 Understanding Automatic Differentiation
The Chain Rule in Action
Automatic differentiation is based on the chain rule:
If z = f(y) and y = g(x), then dz/dx = (dz/dy) * (dy/dx)
Computational Graph Example
Expression: f(x, y) = (x + y) * (x - y)
Forward Pass:
x = 2, y = 3
a = x + y = 5
b = x - y = -1
f = a * b = -5
Backward Pass:
df/df = 1
df/da = b = -1, df/db = a = 5
da/dx = 1, da/dy = 1
db/dx = 1, db/dy = -1
df/dx = df/da * da/dx + df/db * db/dx = (-1)(1) + (5)(1) = 4
df/dy = df/da * da/dy + df/db * db/dy = (-1)(1) + (5)(-1) = -6
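Both results match the analytical derivatives of f = x^2 - y^2, namely df/dx = 2x = 4 and df/dy = -2y = -6. Once your Variable class works, you can check this example directly. A minimal sketch, assuming the Variable API built in this module and that subtraction is implemented alongside addition and multiplication:

x = Variable(2.0, requires_grad=True)
y = Variable(3.0, requires_grad=True)
f = (x + y) * (x - y)   # forward pass builds the computational graph
f.backward()            # backward pass applies the chain rule shown above
print(f.data)   # expected: -5
print(x.grad)   # expected: 4
print(y.grad)   # expected: -6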
Key Concepts
| Concept | Description | Example |
|---|---|---|
| Variable | Tensor wrapper with gradient tracking | Variable(5.0, requires_grad=True) |
| Computational Graph | DAG representing operations | z = x * y creates graph |
| Forward Pass | Computing function values | z.data contains result |
| Backward Pass | Computing gradients | z.backward() fills gradients |
| Leaf Node | Variable created by user | x = Variable(5.0) |
| Gradient Function | How to compute gradients | grad_fn for each operation |
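To make the leaf node and gradient function rows concrete, here is a small sketch. It assumes your Variable exposes a grad_fn attribute as in the table and leaves it as None for user-created leaves; the attribute names in your implementation may differ:

x = Variable(5.0, requires_grad=True)   # leaf node: created by the user
y = Variable(3.0, requires_grad=True)   # leaf node
z = x * y                               # interior node: created by an operation
print(x.grad_fn)   # None for leaf nodes (assumed default)
print(z.grad_fn)   # the gradient function recorded by multiplication
z.backward()       # backward pass fills x.grad and y.grad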
🧪 Testing Your Implementation
Unit Tests
tito test --module 07_autograd
Test Coverage:
- ✅ Variable creation and properties
- ✅ Basic arithmetic operations
- ✅ Gradient computation correctness
- ✅ Chain rule implementation
- ✅ Integration with existing modules
Manual Testing
# Test basic gradients
x = Variable(2.0, requires_grad=True)
y = x**2 + 3*x + 1
y.backward()
print(x.grad) # Should be 2*2 + 3 = 7
# Test chain rule
x = Variable(2.0, requires_grad=True)
y = Variable(3.0, requires_grad=True)
z = x * y
w = z + x
w.backward()
print(x.grad) # Should be y + 1 = 4
print(y.grad) # Should be x = 2
📊 Mathematical Foundations
Gradient Computation Rules
| Operation | Forward | Backward (Gradient) |
|---|---|---|
| Addition | z = x + y | dx = dz, dy = dz |
| Multiplication | z = x * y | dx = y * dz, dy = x * dz |
| Power | z = x^n | dx = n * x^(n-1) * dz |
| Exp | z = exp(x) | dx = exp(x) * dz |
| Log | z = log(x) | dx = (1/x) * dz |
| ReLU | z = max(0, x) | dx = (x > 0) * dz |
| Sigmoid | z = 1/(1+exp(-x)) | dx = z * (1-z) * dz |
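Any rule in this table can be sanity-checked with a finite-difference approximation. The snippet below verifies the sigmoid rule using plain NumPy, so it does not depend on your Variable implementation at all:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x, eps = 0.7, 1e-6
numerical = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)  # central difference
analytical = sigmoid(x) * (1 - sigmoid(x))                     # rule from the table
print(abs(numerical - analytical) < 1e-8)  # should print True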
Advanced Concepts
- Higher-order gradients: Gradients of gradients
- Jacobian matrices: Gradients for vector functions
- Hessian matrices: Second-order derivatives
- Gradient checkpointing: Memory optimization
🔧 Integration with TinyTorch
After implementation, your autograd system will enable:
from tinytorch.core.autograd import Variable
from tinytorch.core.layers import Dense
from tinytorch.core.activations import ReLU
# Create a simple neural network
x = Variable([[1.0, 2.0, 3.0]])
layer1 = Dense(3, 4)
layer2 = Dense(4, 1)
relu = ReLU()
# Forward pass
h = relu(layer1(x))
output = layer2(h)
loss = output.sum()
# Backward pass
loss.backward()
# All gradients computed automatically!
print(layer1.weights.grad)
print(layer2.weights.grad)
🎯 Success Criteria
Your autograd module is complete when:
- All tests pass: tito test --module 07_autograd
- Variable imports correctly: from tinytorch.core.autograd import Variable
- Basic operations work: can create Variables and do arithmetic
- Gradients compute correctly: backward pass produces correct gradients
- Integration works: seamlessly works with existing TinyTorch modules
💡 Implementation Tips
Start with the Basics
- Variable class - Wrap Tensors with gradient tracking
- Simple operations - Start with addition and multiplication
- Backward method - Implement gradient computation
- Test frequently - Verify gradients match analytical solutions
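A convenient way to test frequently is a small gradient checker that compares your backward pass against finite differences. A minimal sketch, assuming a scalar-valued function built from Variable operations and the .data/.grad/.backward() API shown earlier (the helper name is hypothetical):

def check_gradient(f, value, eps=1e-5, tol=1e-4):
    # Analytical gradient from your autograd implementation
    x = Variable(value, requires_grad=True)
    f(x).backward()
    analytical = x.grad
    # Numerical gradient from a central finite difference
    plus = f(Variable(value + eps, requires_grad=True)).data
    minus = f(Variable(value - eps, requires_grad=True)).data
    numerical = (plus - minus) / (2 * eps)
    return abs(analytical - numerical) < tol

print(check_gradient(lambda x: x**2 + 3*x + 1, 2.0))  # should print True (gradient is 7)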
Design Patterns
class Variable:
    def __init__(self, data, requires_grad=True, grad_fn=None):
        # Store data, gradient state, and computation history
        self.data = data
        self.requires_grad = requires_grad
        self.grad = None
        self.grad_fn = grad_fn

    def backward(self, gradient=None):
        # Implement backpropagation using the chain rule:
        # seed the output gradient (1.0 by default), then pass it to grad_fn
        ...

def add(a, b):
    # Create a new Variable whose grad_fn knows how to backprop
    def backward_fn(grad):
        # Distribute the incoming gradient to both inputs (unchanged for addition)
        ...
    return Variable(a.data + b.data, grad_fn=backward_fn)
Common Challenges
- Gradient accumulation - Handle multiple paths to the same Variable (see the sketch after this list)
- Memory management - Store intermediate values efficiently
- Numerical stability - Handle edge cases in gradient computation
- Graph construction - Build computation graph correctly
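Gradient accumulation, referenced above, appears whenever a Variable reaches the output along more than one path: each path contributes a gradient, and backward must sum them rather than overwrite. A sketch of the behavior to aim for, assuming the Variable API above:

x = Variable(2.0, requires_grad=True)
z = x * x + x            # x contributes along two paths: through x*x and through + x
z.backward()
print(x.grad)            # should be 2*x + 1 = 5, the sum of both contributions
# Inside each backward_fn, accumulate rather than assign, e.g.
#   x.grad = (x.grad or 0.0) + incoming_grad    # hypothetical accumulation pattern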
🔧 Advanced Features (Optional)
If you finish early, try implementing:
- Higher-order gradients - Gradients of gradients
- Gradient checkpointing - Memory optimization
- Custom operations - Define your own differentiable functions (a sketch follows this list)
- Gradient clipping - Prevent exploding gradients
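For custom operations, the same grad_fn pattern used for add and multiply carries over. A hedged sketch of a custom differentiable square operation, assuming your backward() seeds the output gradient with 1.0 and then calls grad_fn as in the design pattern above (your mechanism for routing gradients to inputs may differ):

def square(x):
    # Forward value plus a grad_fn implementing d(x^2)/dx = 2x via the chain rule
    def backward_fn(grad):
        # Hypothetical: accumulate directly into the input's gradient
        x.grad = (x.grad or 0.0) + 2 * x.data * grad
    return Variable(x.data ** 2, grad_fn=backward_fn)

x = Variable(3.0, requires_grad=True)
y = square(x)
y.backward()
print(x.grad)   # expected: 2 * 3.0 = 6.0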
🚀 Next Steps
Once you complete the autograd module:
- Move to Training: cd modules/source/08_training/
- Build optimization algorithms: implement SGD, Adam, etc. (a preview sketch follows this list)
- Create training loops: put it all together
- Train real models: use your autograd system for actual ML!
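As a preview of where this leads, here is roughly what one gradient-descent step looks like once autograd supplies the gradients. A hedged sketch, assuming the Dense/ReLU integration shown earlier and that parameters expose .data and .grad (the real training module may structure this differently):

from tinytorch.core.autograd import Variable
from tinytorch.core.layers import Dense
from tinytorch.core.activations import ReLU

layer = Dense(3, 1)
relu = ReLU()
lr = 0.01   # learning rate

x = Variable([[1.0, 2.0, 3.0]])
loss = relu(layer(x)).sum()   # forward pass
loss.backward()               # gradients for all parameters

# Manual SGD step: move each parameter against its gradient
layer.weights.data -= lr * layer.weights.grad
layer.bias.data -= lr * layer.bias.grad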
🔗 Why Autograd Matters
Automatic differentiation is the foundation of modern ML:
- Neural networks require gradients for backpropagation
- Optimization needs gradients for parameter updates
- Research benefits from easy gradient computation
- Production systems rely on efficient autodiff
This module transforms TinyTorch from a static computation library into a dynamic, trainable ML framework!