From 8e6ad0eabdcbdb579587c66e4a264abdc265a137 Mon Sep 17 00:00:00 2001
From: Vijay Janapa Reddi
Date: Mon, 29 Sep 2025 11:42:29 -0400
Subject: [PATCH] Reorganize documentation structure properly

- Move detailed Tensor Evolution Pattern to .claude/guidelines/MODULE_DEVELOPMENT.md
- Clean up CLAUDE.md to focus on agent coordination and high-level principles
- Point Module Developer to proper guidelines file for technical details
- Maintain separation of concerns: CLAUDE.md = agent coordination, guidelines = technical specs
- Proper documentation architecture for agent-based development
---
 .claude/guidelines/MODULE_DEVELOPMENT.md |  94 +++++++++++++
 CLAUDE.md                                | 165 ++---------------------
 2 files changed, 108 insertions(+), 151 deletions(-)

diff --git a/.claude/guidelines/MODULE_DEVELOPMENT.md b/.claude/guidelines/MODULE_DEVELOPMENT.md
index 72d9ea3a..c25607db 100644
--- a/.claude/guidelines/MODULE_DEVELOPMENT.md
+++ b/.claude/guidelines/MODULE_DEVELOPMENT.md
@@ -126,6 +126,100 @@ def test_dense_layer():
 test_dense_layer()
 ```
 
+## 🚨 CRITICAL: Module Dependency Rules
+
+### Tensor Evolution Pattern - THE CLEAN APPROACH
+
+**CRITICAL: Use ONE evolving Tensor class, NOT separate Tensor/Variable classes**
+
+Following PyTorch's design philosophy, TinyTorch uses a single `Tensor` class that gains capabilities over time:
+
+#### Module Evolution Plan
+
+```
+Module 01 (Tensor):
+├── Create basic Tensor class with data storage
+├── Add requires_grad=False by default
+├── Add placeholder grad=None
+├── Add NotImplementedError for backward()
+└── Basic operations (__add__, __mul__) without gradient tracking
+
+Module 02-04 (Activations, Layers, Losses):
+├── Use existing Tensor class as-is
+├── Work with requires_grad=False tensors
+├── Build layers, activations, losses on basic Tensor
+└── No gradient functionality needed yet
+
+Module 05 (Autograd):
+├── STUDENTS UPDATE the existing Tensor class
+├── Implement the backward() method (replace NotImplementedError)
+├── Update operations (__add__, __mul__) to build computation graph
+├── Add grad_fn tracking for chain rule
+└── Now requires_grad=True works everywhere automatically
+
+Module 06+ (Optimizers, Training, etc.):
+├── Use enhanced Tensor class with full gradient capabilities
+├── All previous code works unchanged (backward compatibility)
+├── New code can use requires_grad=True for automatic differentiation
+└── Single clean interface throughout
+```
+
+#### Implementation Examples
+
+**Module 01: Basic Tensor**
+```python
+class Tensor:
+    def __init__(self, data, requires_grad=False):
+        self.data = np.array(data)
+        self.requires_grad = requires_grad
+        self.grad = None  # Placeholder for later
+
+    def backward(self, gradient=None):
+        raise NotImplementedError("Autograd coming in Module 05!")
+
+    def __add__(self, other):
+        return Tensor(self.data + other.data)
+```
+
+**Module 03: Layers using Tensor**
+```python
+class Linear:
+    def __init__(self, in_features, out_features):
+        # Use Tensor directly, not Parameter wrapper
+        self.weights = Tensor(np.random.randn(in_features, out_features) * 0.1)
+        self.bias = Tensor(np.zeros(out_features))
+
+    def forward(self, x):
+        return x @ self.weights + self.bias  # Clean operations
+```
+
+**Module 05: Students enhance existing Tensor**
+```python
+def backward(self, gradient=None):
+    """Students implement this to replace NotImplementedError"""
+    if not self.requires_grad:
+        raise RuntimeError("Tensor doesn't require gradients")
+    if self.grad is None:
+        self.grad = np.zeros_like(self.data)
+    self.grad += gradient
+    if self.grad_fn:
+        self.grad_fn(gradient)
+```
+
+### Key Benefits
+- ✅ **No hasattr() checks needed anywhere**
+- ✅ **Single class students always use: Tensor**
+- ✅ **Clean evolution: students enhance existing class**
+- ✅ **Matches PyTorch mental model exactly**
+- ✅ **No type confusion or conversion needed**
+
+### Forbidden Patterns
+- ❌ **BAD**: `if hasattr(x, 'data'): x.data else: x`
+- ❌ **BAD**: Separate Tensor and Variable classes
+- ❌ **BAD**: Parameter wrappers with hasattr() checks
+- ✅ **GOOD**: Single Tensor class with requires_grad flag
+- ✅ **GOOD**: Clear error messages when features not available
+
 ## 🔬 ML Systems Focus
 
 ### MANDATORY Systems Analysis Sections
diff --git a/CLAUDE.md b/CLAUDE.md
index 6d92f857..ea017fb1 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -663,156 +663,21 @@ All TinyTorch modules MUST follow the standardized structure with MANDATORY syst
 ### 🔬 **New Principle: Every Module Teaches Systems Thinking Through Implementation**
 **MANDATORY**: Every module must demonstrate that understanding systems comes through building them, not just studying them.
 
-### 🚨 **CRITICAL: Module Dependency Rules - NO FORWARD REFERENCES**
+### 🚨 **CRITICAL: Module Development Guidelines**
 
-**MANDATORY MODULE DEPENDENCY PRINCIPLES:**
+**All detailed module development standards are in `.claude/guidelines/MODULE_DEVELOPMENT.md`**
 
-#### **1. Sequential Build Order - STRICTLY ENFORCED**
-Modules are built by students in numerical order. Each module can ONLY use what came before:
-```
-01_tensor → 02_activations → 03_layers → 04_losses → 05_autograd → 06_spatial → ...
-```
+#### **Key Principles for All Agents:**
+1. **Sequential dependency order** - Module N only uses modules 1 through N-1
+2. **Single evolving Tensor class** - No separate Variable classes or hasattr() hacks
+3. **Educational framework focus** - Good enough to teach, not production-level
+4. **Test in isolation** - Each module works with only prior dependencies
 
-**GOLDEN RULE: Module N can only import from modules 1 through N-1**
-
-#### **2. NO Forward References - ZERO TOLERANCE**
-- ❌ **FORBIDDEN**: Module 03_layers importing from 05_autograd
-- ❌ **FORBIDDEN**: Module 04_losses importing from 09_optimizers
-- ✅ **CORRECT**: Module 06_spatial importing from 02_tensor and 03_layers
-- ✅ **CORRECT**: Module 10_optimizers using all modules 01-09
-
-#### **3. Tensor Evolution Pattern - THE CLEAN APPROACH**
-**CRITICAL: Use ONE evolving Tensor class, NOT separate Tensor/Variable classes**
-
-Following PyTorch's actual design philosophy, TinyTorch uses a single `Tensor` class that gains capabilities over time:
-
-```python
-# Module 02: Basic Tensor (no gradients yet)
-class Tensor:
-    def __init__(self, data, requires_grad=False):
-        self.data = np.array(data)
-        self.requires_grad = requires_grad
-        self.grad = None  # Placeholder for later
-
-    def backward(self, gradient=None):
-        # Helpful error message before autograd is implemented
-        raise NotImplementedError("Autograd coming in Module 05! Set requires_grad=True after implementing autograd.")
-
-    def __add__(self, other):
-        # Basic operation without gradient tracking
-        return Tensor(self.data + other.data)
-```
-
-```python
-# Module 05: Students ADD autograd to existing Tensor class
-def backward(self, gradient=None):
-    """Student implements this in Module 05"""
-    if not self.requires_grad:
-        raise RuntimeError("Tensor doesn't require gradients")
-
-    if self.grad is None:
-        self.grad = np.zeros_like(self.data)
-    self.grad += gradient
-
-    if self.grad_fn:
-        self.grad_fn(gradient)
-
-# Students UPDATE existing operations to track gradients
-def __add__(self, other):
-    result_data = self.data + other.data
-    result = Tensor(result_data, requires_grad=(self.requires_grad or other.requires_grad))
-
-    if result.requires_grad:
-        def grad_fn(gradient):
-            if self.requires_grad:
-                self.backward(gradient)
-            if other.requires_grad:
-                other.backward(gradient)
-        result.grad_fn = grad_fn
-
-    return result
-```
-
-**Key Benefits:**
-- ✅ **No hasattr() checks needed anywhere**
-- ✅ **Single class students always use: Tensor**
-- ✅ **Clean evolution: students enhance existing class**
-- ✅ **Matches PyTorch mental model exactly**
-- ✅ **No type confusion or conversion needed**
-
-#### **4. NO hasattr() Hacks - Use Clean Evolution Instead**
-- ❌ **BAD**: `if hasattr(x, 'data'): x.data else: x`
-- ❌ **BAD**: `if hasattr(x, 'grad'): x.grad else: None`
-- ❌ **BAD**: Separate Tensor and Variable classes
-- ✅ **GOOD**: Single Tensor class with `requires_grad` flag
-- ✅ **GOOD**: Clear error messages: "Autograd not implemented yet"
-- ✅ **GOOD**: Students enhance existing classes, don't create new ones
-
-#### **5. Educational Framework Standards**
-**Remember: This is an educational framework, not production code**
-- **Goal**: Good enough to teach concepts clearly
-- **Non-goal**: Production-level performance or features
-- **Priority**: Clear, understandable code that builds incrementally
-- **OK to**: Look at PyTorch/TensorFlow for implementation patterns
-- **NOT OK**: Complex abstractions that confuse learning
-
-#### **6. Module Testing Independence**
-Each module MUST be testable in isolation:
-- Module tests should pass using only prior modules
-- No mocking of future module functionality
-- If a test needs autograd but module comes before autograd, the test is wrong
-
-#### **7. Module Evolution Plan - Tensor Class Growth**
-
-**CRITICAL: This is exactly how students build TinyTorch - evolving ONE Tensor class:**
-
-```
-Module 01 (Tensor):
-├── Create basic Tensor class with data storage
-├── Add requires_grad=False by default
-├── Add placeholder grad=None
-├── Add NotImplementedError for backward()
-└── Basic operations (__add__, __mul__) without gradient tracking
-
-Module 02-04 (Activations, Layers, Losses):
-├── Use existing Tensor class as-is
-├── Work with requires_grad=False tensors
-├── Build layers, activations, losses on basic Tensor
-└── No gradient functionality needed yet
-
-Module 05 (Autograd):
-├── STUDENTS UPDATE the existing Tensor class
-├── Implement the backward() method (replace NotImplementedError)
-├── Update operations (__add__, __mul__) to build computation graph
-├── Add grad_fn tracking for chain rule
-└── Now requires_grad=True works everywhere automatically
-
-Module 06+ (Optimizers, Training, etc.):
-├── Use enhanced Tensor class with full gradient capabilities
-├── All previous code works unchanged (backward compatibility)
-├── New code can use requires_grad=True for automatic differentiation
-└── Single clean interface throughout
-```
-
-**Key Teaching Points:**
-1. **Module 01**: "Here's a Tensor data structure"
-2. **Modules 02-04**: "Here's how to build ML components with Tensors"
-3. **Module 05**: "Now let's add automatic differentiation to our existing Tensor"
-4. **Module 06+**: "Our enhanced Tensor enables gradient-based optimization"
-
-#### **8. Clear Capability Boundaries**
-Document what each module provides and requires:
-```python
-# Module 03_layers header comment
-"""
-Layers Module - Neural Network Building Blocks
-Prerequisites: 01_tensor, 02_activations
-Uses: Tensor class (requires_grad=False only)
-Provides: Linear, Parameter, Module base class
-Does NOT provide: Automatic differentiation (comes in 05_autograd)
-After Module 05: Same code works with requires_grad=True automatically
-"""
-```
+**Module Developer MUST read and follow `.claude/guidelines/MODULE_DEVELOPMENT.md` for:**
+- Tensor Evolution Pattern implementation details
+- Forbidden and required coding patterns
+- Module structure requirements
+- NBGrader integration standards
 
 ### 🧪 Testing Pattern - MANDATORY
 ```
@@ -952,11 +817,10 @@ Content here...
 - **MUST ensure every module teaches systems thinking through implementation**
 
 **Module Developer:**
-- **MUST respect module dependency order** - NO forward references, EVER
-- **MUST ensure module N only imports from modules 1 through N-1**
+- **MUST read and follow `.claude/guidelines/MODULE_DEVELOPMENT.md`** - ALL technical standards documented there
 - **MUST use Tensor Evolution Pattern** - single evolving Tensor class, NO separate Variable class
+- **MUST respect module dependency order** - NO forward references, EVER
 - **MUST NOT use hasattr() hacks** - use clean Tensor with requires_grad flag
-- **MUST follow Module Evolution Plan**: basic Tensor → enhanced Tensor in Module 05
 - Code implementation with MANDATORY ML systems analysis
 - **Memory profiling and complexity analysis** in every module
 - **Performance benchmarking** and bottleneck identification
@@ -967,7 +831,6 @@ Content here...
 - **Module completion workflow**: Implement `tito module complete` with export and testing
 - **MUST include systems insights**: memory usage, computational complexity, scaling behavior
 - **MUST ensure each module is testable in isolation** using only Tensor class
-- **MUST provide clear error messages** when gradient features not yet implemented
 - **MUST notify QA Agent after ANY module changes**
 
 **Package Manager:**