diff --git a/CLAUDE.md b/CLAUDE.md
index e9d5420d..6d92f857 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -681,31 +681,72 @@ Modules are built by students in numerical order. Each module can ONLY use what
 - ✅ **CORRECT**: Module 06_spatial importing from 02_tensor and 03_layers
 - ✅ **CORRECT**: Module 10_optimizers using all modules 01-09
 
-#### **3. Progressive Enhancement Pattern**
-When later modules add capabilities (like autograd), use adaptive patterns:
+#### **3. Tensor Evolution Pattern - THE CLEAN APPROACH**
+**CRITICAL: Use ONE evolving Tensor class, NOT separate Tensor/Variable classes**
+
+Following PyTorch's actual design philosophy, TinyTorch uses a single `Tensor` class that gains capabilities over time:
 
 ```python
-# CORRECT: Adaptive import in early module
-class Parameter:
-    def __init__(self, data):
-        # Works with basic Tensor initially
-        self._tensor = Tensor(data)
+# Module 01: Basic Tensor (no gradients yet)
+class Tensor:
+    def __init__(self, data, requires_grad=False):
+        self.data = np.array(data)  # Assumes `import numpy as np` at the top of the module
+        self.requires_grad = requires_grad
+        self.grad = self.grad_fn = None  # Placeholders; Module 05's backward() reads both
 
-        # Try to upgrade if autograd available
-        try:
-            from tinytorch.core.autograd import Variable
-            self._variable = Variable(data, requires_grad=True)
-            self._has_autograd = True
-        except ImportError:
-            # Autograd not built yet - work without it
-            self._has_autograd = False
+    def backward(self, gradient=None):
+        # Helpful error message before autograd is implemented
+        raise NotImplementedError("Autograd coming in Module 05! Set requires_grad=True after implementing autograd.")
+
+    def __add__(self, other):
+        # Basic operation without gradient tracking
+        return Tensor(self.data + other.data)
 ```
 
-#### **4. NO hasattr() Hacks - Find Root Causes**
-- ❌ **BAD**: Using `hasattr()` checks everywhere as band-aids
-- ❌ **BAD**: Catching AttributeErrors without understanding why
-- ✅ **GOOD**: Clean interfaces that work at each stage
-- ✅ **GOOD**: Clear error messages when features aren't available yet
+```python
+# Module 05: Students ADD autograd to existing Tensor class
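+def backward(self, gradient=None):
+    """Student implements this in Module 05"""
+    if not self.requires_grad:
+        raise RuntimeError("Tensor doesn't require gradients")
+    if gradient is None:
+        gradient = np.ones_like(self.data)  # Seed gradient when called on the final output
+    if self.grad is None:
+        self.grad = np.zeros_like(self.data, dtype=float)  # Float buffer so += accepts float gradients
+    self.grad += gradient  # Accumulate, so repeated uses of a tensor sum their contributions
+    if self.grad_fn:
+        self.grad_fn(gradient)  # Propagate to the tensors that produced this one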
+
+# Students UPDATE existing operations to track gradients
+def __add__(self, other):
+    result_data = self.data + other.data
+    result = Tensor(result_data, requires_grad=(self.requires_grad or other.requires_grad))
+
+    if result.requires_grad:
+        def grad_fn(gradient):
+            if self.requires_grad:
+                self.backward(gradient)
+            if other.requires_grad:
+                other.backward(gradient)
+        result.grad_fn = grad_fn
+
+    return result
+```
+
+**Key Benefits:**
+- ✅ **No hasattr() checks needed anywhere**
+- ✅ **Single class students always use: Tensor**
+- ✅ **Clean evolution: students enhance existing class**
+- ✅ **Matches PyTorch's mental model** (PyTorch merged `Variable` into `Tensor` in version 0.4)
+- ✅ **No type confusion or conversion needed**
+
+#### **4. NO hasattr() Hacks - Use Clean Evolution Instead**
+- ❌ **BAD**: `x.data if hasattr(x, 'data') else x`
+- ❌ **BAD**: `x.grad if hasattr(x, 'grad') else None`
+- ❌ **BAD**: Separate Tensor and Variable classes
+- ✅ **GOOD**: Single Tensor class with `requires_grad` flag
+- ✅ **GOOD**: Clear error messages: "Autograd not implemented yet"
+- ✅ **GOOD**: Students enhance existing classes, don't create new ones
 
 #### **5. Educational Framework Standards**
 **Remember: This is an educational framework, not production code**
@@ -721,15 +762,55 @@ Each module MUST be testable in isolation:
 - No mocking of future module functionality
 - If a test needs autograd but module comes before autograd, the test is wrong
 
-#### **7. Clear Capability Boundaries**
+#### **7. Module Evolution Plan - Tensor Class Growth**
+
+**CRITICAL: This is exactly how students build TinyTorch - evolving ONE Tensor class:**
+
+```
+Module 01 (Tensor):
+├── Create basic Tensor class with data storage
+├── Add requires_grad=False by default
+├── Add placeholder grad=None
+├── Add NotImplementedError for backward()
+└── Basic operations (__add__, __mul__) without gradient tracking
+
+Modules 02-04 (Activations, Layers, Losses):
+├── Use existing Tensor class as-is
+├── Work with requires_grad=False tensors
+├── Build layers, activations, losses on basic Tensor
+└── No gradient functionality needed yet
+
+Module 05 (Autograd):
+├── STUDENTS UPDATE the existing Tensor class
+├── Implement the backward() method (replace NotImplementedError)
+├── Update operations (__add__, __mul__) to build computation graph
+├── Add grad_fn tracking for chain rule
+└── Now requires_grad=True works everywhere automatically
+
+Module 06+ (Optimizers, Training, etc.):
+├── Use enhanced Tensor class with full gradient capabilities
+├── All previous code works unchanged (backward compatibility)
+├── New code can use requires_grad=True for automatic differentiation
+└── Single clean interface throughout
+```
+
+**Key Teaching Points:**
+1. **Module 01**: "Here's a Tensor data structure"
+2. **Modules 02-04**: "Here's how to build ML components with Tensors"
+3. **Module 05**: "Now let's add automatic differentiation to our existing Tensor"
+4. **Module 06+**: "Our enhanced Tensor enables gradient-based optimization"
+
+#### **8. Clear Capability Boundaries**
 Document what each module provides and requires:
 ```python
 # Module 03_layers header comment
 """
 Layers Module - Neural Network Building Blocks
 Prerequisites: 01_tensor, 02_activations
+Uses: Tensor class (requires_grad=False only)
 Provides: Linear, Parameter, Module base class
 Does NOT provide: Automatic differentiation (comes in 05_autograd)
+After Module 05: Same code works with requires_grad=True automatically
 """
 ```
 
@@ -873,7 +954,9 @@ Content here...
 **Module Developer:**
 - **MUST respect module dependency order** - NO forward references, EVER
 - **MUST ensure module N only imports from modules 1 through N-1**
-- **MUST NOT use hasattr() hacks** - fix root causes instead
+- **MUST use the Tensor Evolution Pattern** - a single evolving Tensor class, NO separate Variable class
+- **MUST NOT use hasattr() hacks** - use the single Tensor class with its `requires_grad` flag
+- **MUST follow the Module Evolution Plan**: basic Tensor → enhanced Tensor in Module 05
 - Code implementation with MANDATORY ML systems analysis
 - **Memory profiling and complexity analysis** in every module
 - **Performance benchmarking** and bottleneck identification
@@ -883,8 +966,8 @@ Content here...
 - **Checkpoint system implementation**: Build checkpoint test files and CLI integration
 - **Module completion workflow**: Implement `tito module complete` with export and testing
 - **MUST include systems insights**: memory usage, computational complexity, scaling behavior
-- **MUST use adaptive patterns** when later modules add capabilities
-- **MUST ensure each module is testable in isolation**
+- **MUST ensure each module is testable in isolation** using only the Tensor class
+- **MUST provide clear error messages** when gradient features are not yet implemented
 - **MUST notify QA Agent after ANY module changes**
 
 **Package Manager:**
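
The Tensor Evolution Pattern added in section 3 of this diff can be exercised end to end with a minimal sketch. It assumes NumPy and shows the single `Tensor` class in its post-Module-05 state. Three details are assumptions rather than part of the patch: `grad_fn` is initialized to `None` in `__init__`, `backward()` seeds itself with ones when no gradient is passed, and data is stored as floats.

```python
import numpy as np


class Tensor:
    """Single evolving Tensor, shown after Module 05 (illustrative sketch only)."""

    def __init__(self, data, requires_grad=False):
        self.data = np.array(data, dtype=float)  # assumption: float storage
        self.requires_grad = requires_grad
        self.grad = None
        self.grad_fn = None  # assumption: set here so backward() can always read it

    def backward(self, gradient=None):
        if not self.requires_grad:
            raise RuntimeError("Tensor doesn't require gradients")
        if gradient is None:
            # assumption: seed with ones when called on the final output
            gradient = np.ones_like(self.data)
        if self.grad is None:
            self.grad = np.zeros_like(self.data)
        self.grad += gradient
        if self.grad_fn is not None:
            self.grad_fn(gradient)

    def __add__(self, other):
        result = Tensor(self.data + other.data,
                        requires_grad=self.requires_grad or other.requires_grad)
        if result.requires_grad:
            def grad_fn(gradient):
                # d(a + b)/da = d(a + b)/db = 1, so the gradient passes through unchanged
                if self.requires_grad:
                    self.backward(gradient)
                if other.requires_grad:
                    other.backward(gradient)
            result.grad_fn = grad_fn
        return result


a = Tensor([1.0, 2.0], requires_grad=True)
b = Tensor([3.0, 4.0])  # plain tensor, no gradient tracking
c = a + b
c.backward()
print(c.data)  # [4. 6.]
print(a.grad)  # [1. 1.]
print(b.grad)  # None
```

The same `a + b` expression works whether or not either operand tracks gradients, so code written for Modules 02-04 needs no `hasattr()` checks or type conversion after Module 05.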