mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-04-30 23:31:32 -05:00
- Remove circular imports where modules imported from themselves - Convert tinytorch.core imports to sys.path relative imports - Only import dependencies that are actually used in each module - Preserve documentation imports in markdown cells - Use consistent relative path pattern across all modules - Remove hardcoded absolute paths in favor of relative imports Affected modules: 02_activations, 03_layers, 04_losses, 06_optimizers, 07_training, 09_spatial, 12_attention, 17_quantization
5.6 KiB
5.6 KiB
The Simplest Solution: Progressive Enhancement
The Problem with My Previous Solution
- Too much upfront complexity - Module 01 has gradient stuff students don't understand
- Confusing placeholders - Why does
backward()exist but do nothing? - Not pedagogically sound - Students see complexity before they need it
The Simplest Solution: Just Add Attributes When Needed
Module 01-04: Keep It SIMPLE
# Module 01: tensor_dev.py
class Tensor:
"""Simple tensor - just data and operations."""
def __init__(self, data):
self.data = np.array(data)
self.shape = self.data.shape
def __add__(self, other):
return Tensor(self.data + other.data)
def __mul__(self, other):
return Tensor(self.data * other.data)
# That's it! No gradient stuff at all.
Module 05: Dynamically Add Gradient Support
# Module 05: autograd_dev.py
from tinytorch import Tensor
# Python lets us add attributes and methods at runtime!
def enable_gradients():
"""Upgrade Tensor class with gradient support."""
# Add gradient storage to __init__
original_init = Tensor.__init__
def init_with_grad(self, data, requires_grad=False):
original_init(self, data)
self.requires_grad = requires_grad
self.grad = None
self._backward = lambda: None
Tensor.__init__ = init_with_grad
# Add backward method
def backward(self, grad=None):
if not hasattr(self, 'requires_grad') or not self.requires_grad:
return
if grad is None:
grad = np.ones_like(self.data)
# Accumulate gradients
if self.grad is None:
self.grad = grad
else:
self.grad += grad
# Call the operation's backward
self._backward()
Tensor.backward = backward
# Wrap operations to track gradients
original_add = Tensor.__add__
def add_with_grad(self, other):
result = original_add(self, other)
# Only track if needed
if (hasattr(self, 'requires_grad') and self.requires_grad) or \
(hasattr(other, 'requires_grad') and other.requires_grad):
result.requires_grad = True
def _backward():
if hasattr(self, 'requires_grad') and self.requires_grad:
self.backward(result.grad)
if hasattr(other, 'requires_grad') and other.requires_grad:
other.backward(result.grad)
result._backward = _backward
return result
Tensor.__add__ = add_with_grad
# Enable the gradient system
enable_gradients()
# Now Tensors created AFTER this point can have gradients
x = Tensor([1, 2, 3], requires_grad=True) # Works!
Why This Is Better
| Aspect | Previous (Forward-Compatible) | This (Progressive) |
|---|---|---|
| Module 01 complexity | Has confusing gradient placeholders | Just data and ops |
| Student confusion | "Why is requires_grad there?" | Everything makes sense |
| Implementation | Careful planning needed | Natural progression |
| Pedagogical value | Shows "planning ahead" | Shows "evolving design" |
Alternative: Even Simpler with Subclassing
Module 01-04: Basic Tensor
class Tensor:
def __init__(self, data):
self.data = np.array(data)
Module 05: Introduce GradTensor
class GradTensor(Tensor):
"""Tensor with gradient support."""
def __init__(self, data, requires_grad=False):
super().__init__(data)
self.requires_grad = requires_grad
self.grad = None
self._backward = lambda: None
def backward(self, grad=None):
# Implementation here
pass
# Make Tensor an alias to GradTensor
import tinytorch
tinytorch.Tensor = GradTensor # Replace globally!
Alternative: Context Manager Pattern (Like TensorFlow)
Keep Tensor Simple Forever
class Tensor:
def __init__(self, data):
self.data = np.array(data)
Module 05: Add Gradient Tape
class GradientTape:
"""Context manager for gradient tracking."""
def __enter__(self):
self.tape = []
return self
def watch(self, tensor):
tensor._tape_grad = None
def gradient(self, target, sources):
# Compute gradients
return grads
# Usage:
with GradientTape() as tape:
tape.watch(x)
y = x * 2 + 1
loss = y.sum()
grads = tape.gradient(loss, [x])
Alternative: Functional Pattern (Like JAX)
def grad(f):
"""Return gradient function of f."""
def grad_f(x):
# Use finite differences or AD
return gradient
return grad_f
# Usage:
def loss_fn(params):
return (params ** 2).sum()
grad_fn = grad(loss_fn)
gradients = grad_fn(params)
🎯 RECOMMENDATION: Progressive Enhancement
Go with the simplest approach:
- Modules 01-04: Dead simple Tensor class (no gradient stuff)
- Module 05: Monkey-patch to add gradient support
- Key insight: Old Tensors (created before Module 05) won't have gradients, but that's fine - students won't use them for training anyway!
Why this is best:
- ✅ Maximally simple for students in early modules
- ✅ Natural progression - complexity only when needed
- ✅ Pedagogically sound - students see evolution of a framework
- ✅ No wasted concepts - everything introduced has immediate use
- ✅ Honest about engineering - real frameworks evolve too!
The implementation is just:
- Module 01: 50 lines of simple Tensor
- Module 05: 100 lines to add gradients via monkey-patching
- Module 06+: Everything just works with the enhanced Tensor
This is what I recommend!