Files
TinyTorch/modules/source/archive/MODULE_PLAN_SIMPLEST_SOLUTION.md
Vijay Janapa Reddi e1a9541c4b Clean up module imports: convert tinytorch.core to sys.path style
- Remove circular imports where modules imported from themselves
- Convert tinytorch.core imports to sys.path relative imports
- Only import dependencies that are actually used in each module
- Preserve documentation imports in markdown cells
- Use consistent relative path pattern across all modules
- Remove hardcoded absolute paths in favor of relative imports

Affected modules: 02_activations, 03_layers, 04_losses, 06_optimizers,
07_training, 09_spatial, 12_attention, 17_quantization
2025-09-30 08:58:58 -04:00

5.6 KiB

The Simplest Solution: Progressive Enhancement

The Problem with My Previous Solution

  • Too much upfront complexity - Module 01 has gradient stuff students don't understand
  • Confusing placeholders - Why does backward() exist but do nothing?
  • Not pedagogically sound - Students see complexity before they need it

The Simplest Solution: Just Add Attributes When Needed

Module 01-04: Keep It SIMPLE

# Module 01: tensor_dev.py
class Tensor:
    """Simple tensor - just data and operations."""
    def __init__(self, data):
        self.data = np.array(data)
        self.shape = self.data.shape

    def __add__(self, other):
        return Tensor(self.data + other.data)

    def __mul__(self, other):
        return Tensor(self.data * other.data)

# That's it! No gradient stuff at all.

Module 05: Dynamically Add Gradient Support

# Module 05: autograd_dev.py
from tinytorch import Tensor

# Python lets us add attributes and methods at runtime!
def enable_gradients():
    """Upgrade Tensor class with gradient support."""

    # Add gradient storage to __init__
    original_init = Tensor.__init__
    def init_with_grad(self, data, requires_grad=False):
        original_init(self, data)
        self.requires_grad = requires_grad
        self.grad = None
        self._backward = lambda: None

    Tensor.__init__ = init_with_grad

    # Add backward method
    def backward(self, grad=None):
        if not hasattr(self, 'requires_grad') or not self.requires_grad:
            return

        if grad is None:
            grad = np.ones_like(self.data)

        # Accumulate gradients
        if self.grad is None:
            self.grad = grad
        else:
            self.grad += grad

        # Call the operation's backward
        self._backward()

    Tensor.backward = backward

    # Wrap operations to track gradients
    original_add = Tensor.__add__
    def add_with_grad(self, other):
        result = original_add(self, other)

        # Only track if needed
        if (hasattr(self, 'requires_grad') and self.requires_grad) or \
           (hasattr(other, 'requires_grad') and other.requires_grad):
            result.requires_grad = True

            def _backward():
                if hasattr(self, 'requires_grad') and self.requires_grad:
                    self.backward(result.grad)
                if hasattr(other, 'requires_grad') and other.requires_grad:
                    other.backward(result.grad)

            result._backward = _backward

        return result

    Tensor.__add__ = add_with_grad

# Enable the gradient system
enable_gradients()

# Now Tensors created AFTER this point can have gradients
x = Tensor([1, 2, 3], requires_grad=True)  # Works!

Why This Is Better

Aspect Previous (Forward-Compatible) This (Progressive)
Module 01 complexity Has confusing gradient placeholders Just data and ops
Student confusion "Why is requires_grad there?" Everything makes sense
Implementation Careful planning needed Natural progression
Pedagogical value Shows "planning ahead" Shows "evolving design"

Alternative: Even Simpler with Subclassing

Module 01-04: Basic Tensor

class Tensor:
    def __init__(self, data):
        self.data = np.array(data)

Module 05: Introduce GradTensor

class GradTensor(Tensor):
    """Tensor with gradient support."""
    def __init__(self, data, requires_grad=False):
        super().__init__(data)
        self.requires_grad = requires_grad
        self.grad = None
        self._backward = lambda: None

    def backward(self, grad=None):
        # Implementation here
        pass

# Make Tensor an alias to GradTensor
import tinytorch
tinytorch.Tensor = GradTensor  # Replace globally!

Alternative: Context Manager Pattern (Like TensorFlow)

Keep Tensor Simple Forever

class Tensor:
    def __init__(self, data):
        self.data = np.array(data)

Module 05: Add Gradient Tape

class GradientTape:
    """Context manager for gradient tracking."""
    def __enter__(self):
        self.tape = []
        return self

    def watch(self, tensor):
        tensor._tape_grad = None

    def gradient(self, target, sources):
        # Compute gradients
        return grads

# Usage:
with GradientTape() as tape:
    tape.watch(x)
    y = x * 2 + 1
    loss = y.sum()

grads = tape.gradient(loss, [x])

Alternative: Functional Pattern (Like JAX)

def grad(f):
    """Return gradient function of f."""
    def grad_f(x):
        # Use finite differences or AD
        return gradient
    return grad_f

# Usage:
def loss_fn(params):
    return (params ** 2).sum()

grad_fn = grad(loss_fn)
gradients = grad_fn(params)

🎯 RECOMMENDATION: Progressive Enhancement

Go with the simplest approach:

  1. Modules 01-04: Dead simple Tensor class (no gradient stuff)
  2. Module 05: Monkey-patch to add gradient support
  3. Key insight: Old Tensors (created before Module 05) won't have gradients, but that's fine - students won't use them for training anyway!

Why this is best:

  • Maximally simple for students in early modules
  • Natural progression - complexity only when needed
  • Pedagogically sound - students see evolution of a framework
  • No wasted concepts - everything introduced has immediate use
  • Honest about engineering - real frameworks evolve too!

The implementation is just:

  1. Module 01: 50 lines of simple Tensor
  2. Module 05: 100 lines to add gradients via monkey-patching
  3. Module 06+: Everything just works with the enhanced Tensor

This is what I recommend!