Clean up repository

- Remove stale feature branches (kept debugging branch with unmerged work)
- Move test_spatial_core.py to correct directory (tests/09_spatial)
- Remove .tito user state from tracking (config.json, progress.json)
- Delete archived CLI commands (tito/commands/_archived/)
- Move standalone integration tests to tests/integration/
- Remove outdated audit/report markdown files
- Remove old template and deprecated test files
- Simplify .gitignore for .tito/ directory
Vijay Janapa Reddi
2025-12-02 22:03:16 -05:00
parent ca9922224a
commit 3a885601f9
41 changed files with 402 additions and 12760 deletions

.gitignore (vendored): 5 lines changed

@@ -137,9 +137,8 @@ Thumbs.db
 tito-cli.log
 COMMIT_LOG.txt
-# Tito CLI backups and cache
-.tito/backups/
-.tito/cache/
+# Tito CLI user state and cache (local to each user)
+.tito/
 # Downloaded datasets (not source-controlled, too large)
 data/


@@ -1,3 +0,0 @@
{
"logo_theme": "standard"
}


@@ -1,16 +0,0 @@
{
"completed_modules": [
"01_setup",
"02_tensor",
"03_activations",
"04_layers"
],
"completion_dates": {
"01_setup": "2025-09-19T10:21:11.081117",
"02_tensor": "2025-09-19T10:21:34.831693",
"03_activations": "2025-09-19T10:21:50.000000",
"04_layers": "2025-09-19T10:21:55.000000"
},
"achievements": [],
"total_capabilities_unlocked": 0
}


@@ -1,660 +0,0 @@
# Module 05 (Autograd) Integration Test Audit Report
**Date**: 2025-11-25
**Auditor**: Dr. Sarah Rodriguez
**Status**: CRITICAL GAPS IDENTIFIED
---
## Executive Summary
**Current State**: The `test_progressive_integration.py` file is MISPLACED: it tests Module 08 (DataLoader), NOT Module 05 (Autograd). This is a critical gap that leaves Module 05 without progressive integration coverage.
**Test Coverage**: 40% - Missing critical integration tests for gradient flow, in-place operations, memory leaks, and multi-module integration.
**Bug-Catching Priority**: MEDIUM - Existing tests cover specific operations but miss systemic integration issues.
---
## Critical Issues
### 1. WRONG MODULE TESTED (BLOCKER)
**Issue**: `/Users/VJ/GitHub/TinyTorch/tests/05_autograd/test_progressive_integration.py` tests Module 08 (DataLoader), not Module 05 (Autograd)
**Evidence**:
```python
# Lines 1-7 of test_progressive_integration.py
"""
Module 08: Progressive Integration Tests
Tests that Module 08 (DataLoader) works correctly AND that the entire prior stack works.
DEPENDENCY CHAIN: 01_setup → 02_tensor → 03_activations → 04_layers → 05_dense → 06_spatial → 07_attention → 08_dataloader
This is where we enable real data processing for ML systems.
"""
```
**Impact**:
- Module 05 has NO progressive integration tests
- Cannot verify that Autograd works with prior modules (01-04)
- Cannot verify that prior modules remain stable after Autograd
**Action Required**:
1. Move the current file to `tests/08_dataloader/test_progressive_integration.py`
2. Create NEW `tests/05_autograd/test_progressive_integration.py` for Autograd
---
## Current Test Coverage Analysis
### Existing Tests (What We Have)
| Test File | Purpose | Coverage |
|-----------|---------|----------|
| `test_gradient_flow.py` | Tests gradient tracking through operations | ✅ Good |
| `test_batched_matmul_backward.py` | Tests batched matmul gradients | ✅ Excellent |
| `test_dataloader_tensor_integration.py` | DataLoader integration (wrong module!) | ❌ Misplaced |
| `test_progressive_integration.py` | Module 08 tests (WRONG!) | ❌ Wrong module |
### What These Tests Cover
**✅ COVERED:**
1. **Arithmetic gradient flow** (add, sub, mul, div)
2. **Activation gradients** (ReLU, Sigmoid, Softmax, GELU)
3. **Reshape/transpose gradients**
4. **Batched matmul** (attention patterns)
5. **LayerNorm operations** (sqrt, mean)
**❌ MISSING:**
1. **Integration with Module 01 (Tensor)** - No tests that Tensor operations work
2. **Integration with Module 02 (Activations)** - Limited activation gradient tests
3. **Integration with Module 03 (Layers)** - No Dense layer gradient tests
4. **Integration with Module 04 (Losses)** - No loss gradient tests
5. **In-place operation bugs** - Critical for catching graph breaking
6. **Memory leak detection** - Computational graph accumulation
7. **Gradient accumulation bugs** - Shared parameters
8. **Multi-layer backprop** - End-to-end gradient flow
9. **Prior module stability** - Regression testing
---
## Critical Integration Points Analysis
### Integration Point 1: Autograd + Module 01 (Tensor)
**What Should Be Tested**:
- All Tensor operations preserve `requires_grad`
- Tensor operations create `_grad_fn` correctly
- `backward()` computes correct gradients for all operations
- Broadcasting during backward works correctly
- Scalar tensors can call `backward()` without arguments
**Current Coverage**: 60%
- ✅ Basic operations tested in `test_gradient_flow.py`
- ❌ Missing: Broadcasting edge cases
- ❌ Missing: Scalar tensor backward (sketched after the example below)
- ❌ Missing: Inplace operation detection
**Missing Tests**:
```python
# Test: Broadcasting gradient accumulation
def test_broadcasting_backward():
"""Test gradients accumulate correctly with broadcasting."""
bias = Tensor([1.0], requires_grad=True) # Shape (1,)
x = Tensor([[1, 2], [3, 4]], requires_grad=True) # Shape (2, 2)
y = x + bias # Broadcasts to (2, 2)
loss = y.sum()
loss.backward()
# bias.grad should be summed over all broadcast dimensions
assert bias.grad.shape == (1,), "Bias gradient shape wrong"
assert np.allclose(bias.grad, [4.0]), "Broadcasting backward failed"
```
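A minimal sketch of the missing scalar-tensor backward check, assuming the same `Tensor` API used in the example above; the import path mirrors the other snippets and is an assumption:
```python
# Test (sketch): Scalar tensor backward without an explicit gradient argument
import numpy as np
from tinytorch.core.tensor import Tensor  # assumed import path, as in the other snippets

def test_scalar_tensor_backward():
    """A single-element loss should support backward() with no argument."""
    x = Tensor([2.0, 3.0], requires_grad=True)
    loss = (x * x).sum()  # scalar result
    loss.backward()       # no gradient argument needed for a scalar
    # d(sum(x^2))/dx = 2x
    assert x.grad is not None, "Scalar backward produced no gradient"
    assert np.allclose(x.grad, [4.0, 6.0]), "Scalar backward gradient incorrect"
```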
### Integration Point 2: Autograd + Module 02 (Activations)
**What Should Be Tested**:
- ReLU, Sigmoid, Softmax, GELU all preserve gradient tracking
- Activation gradients compose correctly in chains
- Dead ReLU neurons (zero gradient) handled correctly
- Softmax numerical stability during backward
**Current Coverage**: 70%
- ✅ Basic activation gradients tested
- ✅ GELU gradient flow tested
- ❌ Missing: Activation chaining gradients
- ❌ Missing: Dead ReLU detection
**Missing Tests**:
```python
# Test: Multi-activation gradient chain
def test_activation_chain_gradients():
"""Test gradients flow through chained activations."""
x = Tensor([1.0, -1.0, 2.0], requires_grad=True)
relu = ReLU()
sigmoid = Sigmoid()
# Chain: x -> ReLU -> Sigmoid -> loss
h = relu(x)
y = sigmoid(h)
loss = y.sum()
loss.backward()
# x.grad should reflect both ReLU and Sigmoid derivatives
assert x.grad is not None, "Gradient didn't flow through chain"
# Dead neuron at x=-1 should have zero gradient
assert np.isclose(x.grad[1], 0.0), "Dead ReLU gradient not zero"
```
### Integration Point 3: Autograd + Module 03 (Layers)
**What Should Be Tested**:
- Dense layer forward preserves `requires_grad`
- Dense layer backward computes weight and bias gradients
- Multi-layer networks backpropagate correctly
- Parameter sharing accumulates gradients
**Current Coverage**: 0% ❌
- **COMPLETELY MISSING**: No tests for Dense layer gradients
**Missing Tests**:
```python
# Test: Dense layer gradient computation
def test_dense_layer_gradients():
"""Test Dense layer computes weight and bias gradients."""
from tinytorch.core.layers import Dense
layer = Dense(3, 2)
x = Tensor([[1, 2, 3]], requires_grad=True)
# Forward pass
y = layer(x)
loss = y.sum()
# Backward pass
loss.backward()
# Check all gradients exist
assert layer.weight.grad is not None, "Weight gradient missing"
assert layer.bias.grad is not None, "Bias gradient missing"
assert x.grad is not None, "Input gradient missing"
# Check gradient shapes
assert layer.weight.grad.shape == layer.weight.shape
assert layer.bias.grad.shape == layer.bias.shape
```
### Integration Point 4: Autograd + Module 04 (Losses)
**What Should Be Tested**:
- MSE loss computes correct gradients
- CrossEntropy loss computes correct gradients
- BCE loss computes correct gradients
- Loss gradients match hand-calculated values
**Current Coverage**: 0% ❌
- **COMPLETELY MISSING**: No tests for loss function gradients
**Missing Tests** (a cross-entropy sketch follows the MSE example below):
```python
# Test: MSE loss gradient
def test_mse_loss_gradient():
"""Test MSE loss computes correct gradients."""
from tinytorch.core.losses import MSELoss
predictions = Tensor([1.0, 2.0, 3.0], requires_grad=True)
targets = Tensor([1.5, 2.5, 2.5])
mse = MSELoss()
loss = mse(predictions, targets)
loss.backward()
# MSE gradient: 2 * (pred - target) / N
expected_grad = 2 * (predictions.data - targets.data) / 3
assert np.allclose(predictions.grad, expected_grad), "MSE gradient incorrect"
```
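A hedged sketch of the missing cross-entropy check; the `CrossEntropyLoss` call signature (raw logits plus an integer class index) is assumed, and the reference gradient is the standard `softmax(logits) - one_hot(target)` averaged over the batch:
```python
# Test (sketch): CrossEntropy loss gradient
import numpy as np
from tinytorch.core.tensor import Tensor
from tinytorch.core.losses import CrossEntropyLoss  # assumed, as listed in "What Should Be Tested"

def test_crossentropy_loss_gradient():
    """Cross-entropy gradient w.r.t. logits should equal softmax(logits) - one_hot(target)."""
    logits = Tensor([[2.0, 1.0, 0.1]], requires_grad=True)
    target = Tensor([0])  # true class index
    loss = CrossEntropyLoss()(logits, target)
    loss.backward()
    # Hand-computed reference (batch size N=1)
    shifted = logits.data - logits.data.max(axis=1, keepdims=True)
    softmax = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    expected = softmax.copy()
    expected[0, 0] -= 1.0
    assert np.allclose(logits.grad, expected, atol=1e-5), "CrossEntropy gradient incorrect"
```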
### Integration Point 5: In-Place Operations
**What Should Be Tested**:
- In-place ops break computation graph (expected behavior)
- In-place ops raise warnings or errors
- Students see clear error messages
**Current Coverage**: 0% ❌
- **COMPLETELY MISSING**: No in-place operation tests
**Missing Tests**:
```python
# Test: In-place operation detection
def test_inplace_operations_break_graph():
"""Test that in-place operations are detected and warned."""
x = Tensor([1, 2, 3], requires_grad=True)
y = x * 2
# In-place modification (if implemented) should break graph
# This test ensures students understand the danger
try:
x.data[0] = 999 # Direct modification
y.backward(Tensor([1, 1, 1]))
# If we get here, backward ran on silently modified data - BAD!
assert False, "In-place modification should have been detected"
except Exception:
# Expected: Some warning or error about in-place ops
pass
```
### Integration Point 6: Memory Leaks (Computational Graph)
**What Should Be Tested**:
- Computation graphs don't accumulate across iterations
- `zero_grad()` prevents gradient accumulation
- Large graphs can be garbage collected
**Current Coverage**: 0% ❌
- **COMPLETELY MISSING**: No memory leak tests
**Missing Tests**:
```python
# Test: Gradient accumulation prevention
def test_zero_grad_prevents_accumulation():
"""Test zero_grad() prevents gradient accumulation."""
x = Tensor([1.0], requires_grad=True)
# First backward pass
y1 = x * 2
y1.backward()
first_grad = x.grad.copy()
# Second backward WITHOUT zero_grad - accumulates
y2 = x * 3
y2.backward()
assert np.allclose(x.grad, first_grad + 3.0), "Gradients should accumulate"
# Third backward WITH zero_grad - doesn't accumulate
x.zero_grad()
y3 = x * 4
y3.backward()
assert np.allclose(x.grad, 4.0), "zero_grad() should reset gradients"
```
### Integration Point 7: Gradient Accumulation (Parameter Sharing)
**What Should Be Tested**:
- Shared parameters accumulate gradients correctly
- Embedding layers with repeated indices accumulate gradients
- Multi-path graphs accumulate gradients
**Current Coverage**: 0% ❌
- **COMPLETELY MISSING**: No gradient accumulation tests
**Missing Tests**:
```python
# Test: Parameter sharing gradient accumulation
def test_shared_parameter_gradient_accumulation():
"""Test shared parameters accumulate gradients from multiple uses."""
weight = Tensor([2.0], requires_grad=True)
# Use same weight twice
x1 = Tensor([1.0])
x2 = Tensor([3.0])
y1 = weight * x1 # First use
y2 = weight * x2 # Second use
loss = y1.sum() + y2.sum()
loss.backward()
# Gradient should accumulate: dy1/dw + dy2/dw = 1.0 + 3.0 = 4.0
assert np.allclose(weight.grad, 4.0), "Shared parameter gradients didn't accumulate"
```
---
## Missing Progressive Integration Tests
### Test Class 1: Prior Stack Stability (Modules 01-04)
**Purpose**: Verify Autograd didn't break previous modules
**Missing Tests**:
```python
class TestPriorStackStillWorking:
"""Verify Modules 01-04 still work after Autograd."""
def test_tensor_operations_stable(self):
"""Tensor operations work without requires_grad."""
from tinytorch.core.tensor import Tensor
# Should work exactly as before (Module 01)
x = Tensor([1, 2, 3])
y = Tensor([4, 5, 6])
z = x + y
assert np.array_equal(z.data, [5, 7, 9])
assert z.grad is None # No gradient tracking
def test_activations_stable(self):
"""Activations work without requires_grad."""
from tinytorch.core.activations import ReLU
from tinytorch.core.tensor import Tensor
relu = ReLU()
x = Tensor([-1, 0, 1])
y = relu(x)
assert np.array_equal(y.data, [0, 0, 1])
assert y.grad is None # No gradient tracking
```
### Test Class 2: Autograd Core Functionality
**Purpose**: Test Autograd's core capabilities
**Missing Tests**:
```python
class TestModule05AutogradCore:
"""Test Module 05 (Autograd) core functionality."""
def test_simple_backward_pass(self):
"""Test simple computational graph backward pass."""
enable_autograd()
x = Tensor([2.0], requires_grad=True)
y = x * 3
loss = y.sum()
loss.backward()
assert x.grad is not None
assert np.allclose(x.grad, [3.0])
def test_multi_step_backward(self):
"""Test multi-step computation graph."""
enable_autograd()
x = Tensor([2.0], requires_grad=True)
y = x * 3 # y = 6
z = y + 1 # z = 7
w = z * 2 # w = 14
w.backward()
# dw/dx = dw/dz * dz/dy * dy/dx = 2 * 1 * 3 = 6
assert np.allclose(x.grad, [6.0])
```
### Test Class 3: Full Stack Integration
**Purpose**: Test complete pipeline (Modules 01-05)
**Missing Tests**:
```python
class TestProgressiveStackIntegration:
"""Test complete stack (01→05) works together."""
def test_neural_network_backward(self):
"""Test complete neural network with backprop."""
enable_autograd()
from tinytorch.core.layers import Dense
from tinytorch.core.activations import ReLU
from tinytorch.core.losses import MSELoss
# Build network
layer1 = Dense(3, 4)
relu = ReLU()
layer2 = Dense(4, 2)
# Forward pass
x = Tensor([[1, 2, 3]], requires_grad=True)
h = relu(layer1(x))
y = layer2(h)
# Loss
target = Tensor([[1, 0]])
loss_fn = MSELoss()
loss = loss_fn(y, target)
# Backward pass
loss.backward()
# All parameters should have gradients
assert layer1.weight.grad is not None
assert layer1.bias.grad is not None
assert layer2.weight.grad is not None
assert layer2.bias.grad is not None
assert x.grad is not None
```
---
## Bug-Catching Priority Matrix
| Category | Priority | Coverage | Missing Tests |
|----------|----------|----------|---------------|
| **Gradient Correctness** | 🔴 CRITICAL | 70% | Numerical gradient checks |
| **In-Place Operations** | 🔴 CRITICAL | 0% | Graph breaking detection |
| **Memory Leaks** | 🟠 HIGH | 0% | Graph accumulation tests |
| **Gradient Accumulation** | 🟠 HIGH | 0% | Shared parameter tests |
| **Module Integration** | 🟠 HIGH | 30% | Multi-module pipelines |
| **Prior Module Stability** | 🟡 MEDIUM | 0% | Regression tests |
| **Broadcasting** | 🟡 MEDIUM | 40% | Edge case tests |
| **Numerical Stability** | 🟢 LOW | 50% | Extreme value tests |
---
## Recommendations
### Immediate Actions (Week 1)
1. **Fix File Misplacement** (1 hour)
- Move `test_progressive_integration.py` to `tests/08_dataloader/`
- Create new `tests/05_autograd/test_progressive_integration.py`
2. **Add Critical Missing Tests** (4 hours)
- Dense layer gradient tests
- Loss function gradient tests
- In-place operation detection
- Memory leak tests
3. **Add Prior Module Stability Tests** (2 hours)
- Test Modules 01-04 still work
- Test gradients don't affect non-gradient mode
### Short-Term Actions (Week 2-3)
4. **Add Integration Tests** (6 hours)
- Full neural network backward pass
- Multi-layer gradient flow
- Shared parameter accumulation
5. **Add Edge Case Tests** (3 hours)
- Broadcasting edge cases
- Scalar tensor backward
- Empty gradient handling
### Long-Term Actions (Month 1)
6. **Add Numerical Gradient Checks** (8 hours)
- Finite difference verification for all operations (see the sketch after this list)
- Ensures analytical gradients are correct
7. **Add Performance Tests** (4 hours)
- Large graph memory usage
- Gradient computation speed
- Graph building overhead
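As referenced in item 6 above, a minimal sketch of a finite-difference gradient check; it assumes only the `Tensor` API used throughout this report and a scalar-valued function of one NumPy array:
```python
# Sketch: central-difference verification of an analytical gradient
import numpy as np
from tinytorch.core.tensor import Tensor  # assumed import path

def numerical_gradient(f, x_data, eps=1e-5):
    """Central-difference estimate of df/dx for a scalar-valued f(ndarray) -> float."""
    grad = np.zeros_like(x_data, dtype=np.float64)
    it = np.nditer(x_data, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        plus, minus = x_data.copy(), x_data.copy()
        plus[idx] += eps
        minus[idx] -= eps
        grad[idx] = (f(plus) - f(minus)) / (2 * eps)
        it.iternext()
    return grad

def test_mul_sum_matches_numerical_gradient():
    """Analytical gradient of sum(x * x) should match the finite-difference estimate."""
    x_data = np.array([1.0, -2.0, 0.5])
    x = Tensor(x_data, requires_grad=True)
    loss = (x * x).sum()
    loss.backward()
    numeric = numerical_gradient(lambda d: float((d * d).sum()), x_data)
    assert np.allclose(x.grad, numeric, atol=1e-4), "Analytical gradient disagrees with finite differences"
```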
---
## Test Template for Module 05
```python
"""
Module 05: Progressive Integration Tests
Tests that Module 05 (Autograd) works correctly AND that all previous modules still work.
DEPENDENCY CHAIN: 01_tensor → 02_activations → 03_layers → 04_losses → 05_autograd
This is where automatic differentiation enables training.
"""
import numpy as np
import sys
from pathlib import Path
# Add project root to path
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
class TestPriorStackStillWorking:
"""Verify Modules 01-04 functionality is still intact."""
def test_tensor_operations_stable(self):
"""Ensure tensor operations work without gradients."""
# Test implementation
pass
def test_activations_stable(self):
"""Ensure activations work without gradients."""
# Test implementation
pass
def test_layers_stable(self):
"""Ensure layers work without gradients."""
# Test implementation
pass
class TestModule05AutogradCore:
"""Test Module 05 (Autograd) core functionality."""
def test_enable_autograd(self):
"""Test autograd can be enabled."""
# Test implementation
pass
def test_simple_backward(self):
"""Test simple backward pass."""
# Test implementation
pass
def test_requires_grad_tracking(self):
"""Test requires_grad flag works."""
# Test implementation
pass
class TestAutogradTensorIntegration:
"""Test Autograd works with all Tensor operations (Module 01)."""
def test_arithmetic_gradients(self):
"""Test gradients for +, -, *, /."""
# Test implementation
pass
def test_matmul_gradients(self):
"""Test gradients for matrix multiplication."""
# Test implementation
pass
def test_broadcasting_gradients(self):
"""Test broadcasting during backward."""
# Test implementation
pass
class TestAutogradActivationIntegration:
"""Test Autograd works with Activations (Module 02)."""
def test_relu_gradients(self):
"""Test ReLU gradients."""
# Test implementation
pass
def test_sigmoid_gradients(self):
"""Test Sigmoid gradients."""
# Test implementation
pass
def test_activation_chain_gradients(self):
"""Test chained activation gradients."""
# Test implementation
pass
class TestAutogradLayerIntegration:
"""Test Autograd works with Layers (Module 03)."""
def test_dense_layer_gradients(self):
"""Test Dense layer parameter gradients."""
# Test implementation
pass
def test_multi_layer_gradients(self):
"""Test multi-layer network gradients."""
# Test implementation
pass
class TestAutogradLossIntegration:
"""Test Autograd works with Loss functions (Module 04)."""
def test_mse_loss_gradients(self):
"""Test MSE loss gradients."""
# Test implementation
pass
def test_crossentropy_loss_gradients(self):
"""Test CrossEntropy loss gradients."""
# Test implementation
pass
class TestProgressiveStackIntegration:
"""Test complete stack (01→05) works together."""
def test_end_to_end_training_step(self):
"""Test complete forward + backward pass."""
# Test implementation
pass
def test_gradient_accumulation(self):
"""Test gradients accumulate correctly."""
# Test implementation
pass
class TestAutogradBugPrevention:
"""Tests that catch common autograd bugs."""
def test_inplace_operations(self):
"""Test in-place operations are handled correctly."""
# Test implementation
pass
def test_memory_leaks(self):
"""Test computation graphs don't leak memory."""
# Test implementation
pass
def test_zero_grad_works(self):
"""Test zero_grad() prevents accumulation."""
# Test implementation
pass
```
---
## Conclusion
**Overall Assessment**: Module 05 integration tests are **INCOMPLETE** and **MISPLACED**.
**Risk Level**: 🔴 **HIGH** - Missing critical tests could allow gradient bugs to slip into production.
**Recommended Action**: Implement missing tests IMMEDIATELY before students encounter gradient bugs.
**Estimated Effort**: 20-25 hours to achieve 90% coverage.
**Student Impact**: Without these tests, students will encounter confusing gradient bugs that are hard to debug. Proper integration tests will catch these issues early.
---
**Report Generated**: 2025-11-25
**Next Review**: After implementing critical missing tests


@@ -1,401 +0,0 @@
"""
Module 08: Progressive Integration Tests
Tests that Module 08 (DataLoader) works correctly AND that the entire prior stack works.
DEPENDENCY CHAIN: 01_setup → 02_tensor → 03_activations → 04_layers → 05_dense → 06_spatial → 07_attention → 08_dataloader
This is where we enable real data processing for ML systems.
"""
import numpy as np
import sys
from pathlib import Path
# Add project root to path
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
class TestPriorStackStillWorking:
"""Quick regression checks that prior modules (01→07) still work."""
def test_foundation_stack_stable(self):
"""Verify foundation stack (01→05) remains stable."""
# Environment (Module 01)
assert sys.version_info >= (3, 8), "Foundation broken: Python version"
# Core functionality should work
try:
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Dense
# Should still be able to build networks
layer = Dense(10, 5)
x = Tensor(np.random.randn(4, 10))
output = layer(x)
assert output.shape == (4, 5), "Foundation broken: Neural network"
except ImportError:
assert True, "Foundation not implemented yet"
def test_advanced_stack_stable(self):
"""Verify advanced modules (06→07) still work."""
try:
from tinytorch.core.spatial import Conv2D
from tinytorch.core.attention import MultiHeadAttention
# Spatial and attention should work
conv = Conv2D(in_channels=3, out_channels=16, kernel_size=3)
attention = MultiHeadAttention(embed_dim=64, num_heads=8)
assert hasattr(conv, 'forward'), "Advanced stack broken: Spatial"
assert hasattr(attention, 'forward'), "Advanced stack broken: Attention"
except ImportError:
assert True, "Advanced stack not implemented yet"
class TestModule08DataLoaderCore:
"""Test Module 08 (DataLoader) core functionality."""
def test_dataset_creation(self):
"""Test basic dataset creation works."""
try:
from tinytorch.core.data import Dataset
# Create simple dataset
class SimpleDataset(Dataset):
def __init__(self, size=100):
self.size = size
self.data = np.random.randn(size, 10)
self.targets = np.random.randint(0, 3, size)
def __len__(self):
return self.size
def __getitem__(self, idx):
return self.data[idx], self.targets[idx]
dataset = SimpleDataset(50)
assert len(dataset) == 50, "Dataset length broken"
# Test data access
sample, target = dataset[0]
assert sample.shape == (10,), "Dataset sample shape broken"
assert isinstance(target, (int, np.integer)), "Dataset target type broken"
except ImportError:
assert True, "Dataset not implemented yet"
def test_dataloader_creation(self):
"""Test DataLoader creation and batching."""
try:
from tinytorch.core.data import DataLoader, Dataset
from tinytorch.core.tensor import Tensor
# Simple dataset for testing
class TestDataset(Dataset):
def __init__(self):
self.data = np.random.randn(20, 5)
self.targets = np.random.randint(0, 2, 20)
def __len__(self):
return 20
def __getitem__(self, idx):
return Tensor(self.data[idx]), self.targets[idx]
dataset = TestDataset()
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)
# Test batching
for batch_x, batch_y in dataloader:
assert batch_x.shape == (4, 5), "DataLoader batch shape broken"
assert len(batch_y) == 4, "DataLoader target batch broken"
break # Just test first batch
except ImportError:
assert True, "DataLoader not implemented yet"
def test_real_dataset_support(self):
"""Test support for real datasets like CIFAR-10."""
try:
from tinytorch.core.data import CIFAR10Dataset
# Note: This might download data, so we'll just test instantiation
# In real usage, students would download CIFAR-10
try:
dataset = CIFAR10Dataset(root='./data', train=True, download=False)
# If dataset exists, test basic functionality
if len(dataset) > 0:
sample, target = dataset[0]
assert len(sample.shape) >= 2, "CIFAR-10 sample shape invalid"
assert isinstance(target, (int, np.integer)), "CIFAR-10 target invalid"
except (FileNotFoundError, RuntimeError):
# Data not downloaded, which is fine for testing
assert True, "CIFAR-10 data not available (expected)"
except ImportError:
assert True, "Real dataset support not implemented yet"
class TestProgressiveStackIntegration:
"""Test that the complete stack (01→08) works together."""
def test_complete_training_pipeline(self):
"""Test complete ML pipeline: data → model → training."""
try:
from tinytorch.core.data import DataLoader, Dataset
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Dense
from tinytorch.core.activations import ReLU, Softmax
# Create dataset
class MLDataset(Dataset):
def __init__(self):
self.data = np.random.randn(40, 10)
self.targets = np.random.randint(0, 3, 40)
def __len__(self):
return 40
def __getitem__(self, idx):
return Tensor(self.data[idx]), self.targets[idx]
# Create data pipeline
dataset = MLDataset()
dataloader = DataLoader(dataset, batch_size=8, shuffle=True)
# Create model using prior modules
layer1 = Dense(10, 16)
layer2 = Dense(16, 3)
relu = ReLU()
softmax = Softmax()
# Test training loop structure
for batch_x, batch_y in dataloader:
# Forward pass through complete pipeline
h = relu(layer1(batch_x))
logits = layer2(h)
predictions = softmax(logits)
assert predictions.shape == (8, 3), "Complete pipeline broken"
# Test one batch
break
except ImportError:
assert True, "Complete training pipeline not ready yet"
def test_cnn_data_pipeline(self):
"""Test CNN pipeline with spatial data."""
try:
from tinytorch.core.data import DataLoader, Dataset
from tinytorch.core.spatial import Conv2D, MaxPool2D
from tinytorch.core.layers import Dense
from tinytorch.core.tensor import Tensor
# Image dataset
class ImageDataset(Dataset):
def __init__(self):
# 32x32 RGB images
self.data = np.random.randn(20, 3, 32, 32)
self.targets = np.random.randint(0, 5, 20)
def __len__(self):
return 20
def __getitem__(self, idx):
return Tensor(self.data[idx]), self.targets[idx]
dataset = ImageDataset()
dataloader = DataLoader(dataset, batch_size=4)
# CNN components
conv1 = Conv2D(in_channels=3, out_channels=16, kernel_size=3)
pool = MaxPool2D(kernel_size=2)
fc = Dense(16 * 15 * 15, 5) # Approximate after conv/pool
# Test CNN pipeline
for batch_x, batch_y in dataloader:
assert batch_x.shape == (4, 3, 32, 32), "Image batch shape broken"
# Simplified CNN forward (shape checking)
if hasattr(conv1, '__call__'):
conv_out = conv1(batch_x)
# Check reasonable conv output shape
assert len(conv_out.shape) == 4, "Conv output dimensionality broken"
break
except ImportError:
assert True, "CNN data pipeline not ready yet"
class TestRealWorldDataCapability:
"""Test capability to handle real-world datasets."""
def test_data_preprocessing_pipeline(self):
"""Test data preprocessing and augmentation."""
try:
from tinytorch.core.data import transforms
from tinytorch.core.tensor import Tensor
# Basic transforms
if hasattr(transforms, 'Normalize'):
normalize = transforms.Normalize(mean=[0.5], std=[0.5])
# Test data
data = Tensor(np.random.randn(3, 32, 32))
normalized = normalize(data)
assert normalized.shape == data.shape, "Normalization broken"
if hasattr(transforms, 'RandomCrop'):
crop = transforms.RandomCrop(size=28)
data = Tensor(np.random.randn(3, 32, 32))
cropped = crop(data)
assert cropped.shape[-2:] == (28, 28), "Random crop broken"
except ImportError:
assert True, "Data preprocessing not implemented yet"
def test_memory_efficient_loading(self):
"""Test memory efficient data loading."""
try:
from tinytorch.core.data import DataLoader, Dataset
# Large dataset simulation
class LargeDataset(Dataset):
def __init__(self, size=1000):
self.size = size
# Don't load all data at once - simulate lazy loading
def __len__(self):
return self.size
def __getitem__(self, idx):
# Simulate loading data on-demand
return np.random.randn(100), idx % 10
dataset = LargeDataset(1000)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
# Should be able to iterate without loading all data
batch_count = 0
for batch_x, batch_y in dataloader:
batch_count += 1
if batch_count >= 3: # Test a few batches
break
assert batch_count == 3, "Memory efficient loading broken"
except ImportError:
assert True, "Memory efficient loading not ready yet"
def test_parallel_data_loading(self):
"""Test parallel/multi-threaded data loading."""
try:
from tinytorch.core.data import DataLoader, Dataset
class ParallelDataset(Dataset):
def __init__(self):
self.data = np.random.randn(100, 50)
def __len__(self):
return 100
def __getitem__(self, idx):
# Simulate some processing time
return self.data[idx], idx % 5
dataset = ParallelDataset()
# Test with num_workers if supported
if 'num_workers' in DataLoader.__init__.__code__.co_varnames:
dataloader = DataLoader(dataset, batch_size=16, num_workers=2)
else:
dataloader = DataLoader(dataset, batch_size=16)
# Should work regardless of parallel support
for batch_x, batch_y in dataloader:
assert batch_x.shape == (16, 50), "Parallel loading broken"
break
except ImportError:
assert True, "Parallel data loading not ready yet"
class TestRegressionPrevention:
"""Ensure previous modules still work after Module 08 development."""
def test_no_foundation_regression(self):
"""Verify foundation stack (01→05) unchanged."""
# Core functionality should remain stable
assert sys.version_info.major >= 3, "Foundation: Python detection broken"
# Tensor operations should still work
try:
from tinytorch.core.tensor import Tensor
t = Tensor([1, 2, 3])
assert t.shape == (3,), "Foundation regression: Tensor broken"
except ImportError:
import numpy as np
arr = np.array([1, 2, 3])
assert arr.shape == (3,), "Foundation regression: Numpy broken"
def test_no_advanced_regression(self):
"""Verify advanced modules (06→07) unchanged."""
try:
from tinytorch.core.spatial import Conv2D
from tinytorch.core.attention import MultiHeadAttention
# Advanced operations should still work
conv = Conv2D(in_channels=1, out_channels=4, kernel_size=3)
attention = MultiHeadAttention(embed_dim=32, num_heads=4)
assert hasattr(conv, 'forward'), "Advanced regression: Spatial broken"
assert hasattr(attention, 'forward'), "Advanced regression: Attention broken"
except ImportError:
# If not implemented, basic functionality should work
import numpy as np
assert np.random is not None, "Advanced regression: Random broken"
def test_progressive_stability(self):
"""Test the progressive stack is stable through data loading."""
# Stack should be stable through: Setup → ... → Attention → DataLoader
# Setup level
import numpy as np
assert np is not None, "Setup level broken"
# Foundation level (if available)
try:
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Dense
# Neural networks should still work
layer = Dense(5, 3)
x = Tensor(np.random.randn(2, 5))
output = layer(x)
assert output.shape == (2, 3), "Foundation level broken"
except ImportError:
pass # Not implemented yet
# Data level (if available)
try:
from tinytorch.core.data import Dataset
class TestDataset(Dataset):
def __len__(self):
return 10
def __getitem__(self, idx):
return idx, idx * 2
dataset = TestDataset()
assert len(dataset) == 10, "Data level broken"
except ImportError:
pass # Not implemented yet


@@ -1,515 +0,0 @@
"""
Module 07 Training - Critical Integration Tests Template
This file contains the TOP 3 CRITICAL tests that MUST be implemented immediately
to establish basic confidence that Module 07 (Training) works correctly.
These tests catch the most common and severe bugs in training systems.
PRIORITY: P0 - IMPLEMENT IMMEDIATELY
ESTIMATED TIME: 2-3 hours
BUG-CATCHING VALUE: CRITICAL
"""
import pytest
import numpy as np
import sys
from pathlib import Path
# Add project root to path
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
# Import from TinyTorch
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear
from tinytorch.core.activations import ReLU
from tinytorch.core.losses import MSELoss, CrossEntropyLoss
from tinytorch.core.optimizers import SGD, AdamW
from tinytorch.core.training import Trainer, CosineSchedule, clip_grad_norm
# =============================================================================
# CRITICAL TEST 1: Missing zero_grad() Detection
# =============================================================================
# BUG-CATCHING VALUE: CRITICAL
# COMMON STUDENT MISTAKE: Forgetting optimizer.zero_grad()
# SYMPTOM: Training appears to run but gradients accumulate incorrectly
# =============================================================================
class TestMissingZeroGrad:
"""Test that missing zero_grad() is caught and causes visible failure."""
def test_zero_grad_required_for_correct_training(self):
"""
Test that zero_grad() is essential for correct gradient computation.
This test validates that:
1. Without zero_grad(), gradients accumulate across batches
2. Accumulated gradients cause incorrect parameter updates
3. Training with accumulated gradients behaves differently than correct training
"""
# Create simple linear model: y = Wx + b
layer_correct = Linear(1, 1)
layer_broken = Linear(1, 1)
# Make weights identical to start
layer_broken.weights.data = layer_correct.weights.data.copy()
if hasattr(layer_correct, 'bias') and layer_correct.bias is not None:
layer_broken.bias.data = layer_correct.bias.data.copy()
# Create optimizers
optimizer_correct = SGD(layer_correct.parameters(), lr=0.1)
optimizer_broken = SGD(layer_broken.parameters(), lr=0.1)
loss_fn = MSELoss()
# Training data: 5 identical samples
x_data = Tensor([[1.0]])
y_data = Tensor([[2.0]])
# === CORRECT TRAINING (with zero_grad) ===
correct_grad_norms = []
for step in range(5):
optimizer_correct.zero_grad() # ✅ CRITICAL: Clear gradients
output = layer_correct.forward(x_data)
loss = loss_fn.forward(output, y_data)
loss.backward()
# Record gradient norm
grad_norm = np.linalg.norm(layer_correct.weights.grad.data)
correct_grad_norms.append(grad_norm)
optimizer_correct.step()
# === BROKEN TRAINING (without zero_grad) ===
broken_grad_norms = []
for step in range(5):
# ❌ BUG: Missing optimizer_broken.zero_grad()
output = layer_broken.forward(x_data)
loss = loss_fn.forward(output, y_data)
loss.backward()
# Record gradient norm (should accumulate!)
grad_norm = np.linalg.norm(layer_broken.weights.grad.data)
broken_grad_norms.append(grad_norm)
optimizer_broken.step()
# === VALIDATION ===
print("\n🔬 Testing zero_grad() requirement:")
print(f"Correct gradient norms (with zero_grad): {correct_grad_norms}")
print(f"Broken gradient norms (without zero_grad): {broken_grad_norms}")
# Test 1: Gradients should accumulate without zero_grad()
assert broken_grad_norms[-1] > broken_grad_norms[0] * 2.0, \
"Gradients should accumulate when zero_grad() is missing"
# Test 2: Correct gradients should be relatively stable
correct_variation = max(correct_grad_norms) / (min(correct_grad_norms) + 1e-8)
assert correct_variation < 5.0, \
"Correct gradients shouldn't grow excessively"
# Test 3: Broken gradients grow much larger than correct ones
assert broken_grad_norms[-1] > correct_grad_norms[-1] * 2.0, \
"Missing zero_grad() should cause noticeably larger gradients"
print("✅ zero_grad() requirement correctly enforced!")
def test_trainer_calls_zero_grad(self):
"""
Test that Trainer class properly calls zero_grad() during training.
This validates the Trainer implementation includes the critical zero_grad() call.
"""
# Create simple model
class SimpleModel:
def __init__(self):
self.layer = Linear(2, 1)
self.training = True
def forward(self, x):
return self.layer.forward(x)
def parameters(self):
return self.layer.parameters()
model = SimpleModel()
optimizer = SGD(model.parameters(), lr=0.01)
loss_fn = MSELoss()
trainer = Trainer(model, optimizer, loss_fn)
# Create simple dataset
class SimpleDataset:
def __iter__(self):
for _ in range(3):
x = Tensor(np.random.randn(2, 2))
y = Tensor(np.random.randn(2, 1))
yield x, y
# Train for 2 epochs
for epoch in range(2):
trainer.train_epoch(SimpleDataset())
# After training, gradients should be zeroed (from last zero_grad() call)
# OR they should exist from last backward (depends on implementation)
# Key test: Training should have called zero_grad() internally
# (This is validated by training not diverging)
print("✅ Trainer correctly manages gradient clearing!")
# =============================================================================
# CRITICAL TEST 2: Loss Convergence Validation
# =============================================================================
# BUG-CATCHING VALUE: CRITICAL
# PURPOSE: Validate entire training pipeline produces learning
# SYMPTOM: Training runs but model doesn't improve
# =============================================================================
class TestLossConvergence:
"""Test that training actually produces learning on simple problems."""
def test_linear_regression_convergence(self):
"""
Test training converges on simple linear regression problem.
Problem: Learn y = 2x + 1
Model: Linear(1, 1) with weights and bias
Success criteria: Loss decreases, learned weights ≈ [2.0], bias ≈ [1.0]
"""
# Create model
class LinearModel:
def __init__(self):
self.layer = Linear(1, 1)
self.training = True
def forward(self, x):
return self.layer.forward(x)
def parameters(self):
return self.layer.parameters()
model = LinearModel()
optimizer = SGD(model.parameters(), lr=0.01)
loss_fn = MSELoss()
trainer = Trainer(model, optimizer, loss_fn)
# Generate training data: y = 2x + 1
np.random.seed(42)
X_train = np.random.randn(100, 1).astype(np.float32)
y_train = (2.0 * X_train + 1.0).astype(np.float32)
# Create dataset
class RegressionDataset:
def __init__(self, X, y, batch_size=10):
self.X = X
self.y = y
self.batch_size = batch_size
def __iter__(self):
indices = np.arange(len(self.X))
np.random.shuffle(indices)
for i in range(0, len(self.X), self.batch_size):
batch_indices = indices[i:i+self.batch_size]
yield Tensor(self.X[batch_indices]), Tensor(self.y[batch_indices])
dataset = RegressionDataset(X_train, y_train, batch_size=10)
# Train for 100 epochs
print("\n🔬 Testing loss convergence on y = 2x + 1:")
losses = []
for epoch in range(100):
loss = trainer.train_epoch(dataset)
losses.append(loss)
if epoch % 20 == 0:
print(f"Epoch {epoch:3d}: Loss = {loss:.6f}")
initial_loss = losses[0]
final_loss = losses[-1]
print(f"\nInitial loss: {initial_loss:.6f}")
print(f"Final loss: {final_loss:.6f}")
print(f"Reduction: {(1 - final_loss/initial_loss)*100:.1f}%")
# Test 1: Loss should decrease significantly
assert final_loss < initial_loss * 0.1, \
f"Loss should decrease to < 10% of initial. Got {final_loss/initial_loss*100:.1f}%"
# Test 2: Loss should be near zero (good fit)
assert final_loss < 0.1, \
f"Final loss should be < 0.1 for simple problem. Got {final_loss:.6f}"
# Test 3: Learned weights should approximate true values
learned_weight = model.layer.weights.data[0, 0]
learned_bias = model.layer.bias.data[0] if model.layer.bias is not None else 0.0
print(f"\nTrue parameters: weight=2.0, bias=1.0")
print(f"Learned parameters: weight={learned_weight:.3f}, bias={learned_bias:.3f}")
# Allow some tolerance for learning
assert abs(learned_weight - 2.0) < 0.5, \
f"Weight should be close to 2.0, got {learned_weight:.3f}"
if model.layer.bias is not None:
assert abs(learned_bias - 1.0) < 0.5, \
f"Bias should be close to 1.0, got {learned_bias:.3f}"
print("✅ Training successfully converged to correct solution!")
def test_classification_convergence(self):
"""
Test training converges on simple classification problem.
Problem: Learn XOR-like pattern with 2-layer network
Success criteria: Loss decreases, accuracy improves
"""
# Create 2-layer model for XOR
class XORModel:
def __init__(self):
self.layer1 = Linear(2, 4)
self.relu = ReLU()
self.layer2 = Linear(4, 2)
self.training = True
def forward(self, x):
x = self.layer1.forward(x)
x = self.relu.forward(x)
x = self.layer2.forward(x)
return x
def parameters(self):
return self.layer1.parameters() + self.layer2.parameters()
model = XORModel()
optimizer = AdamW(model.parameters(), lr=0.01)
loss_fn = CrossEntropyLoss()
trainer = Trainer(model, optimizer, loss_fn)
# Generate XOR-like data
np.random.seed(42)
X_train = np.array([
[0, 0], [0, 1], [1, 0], [1, 1],
[0, 0], [0, 1], [1, 0], [1, 1],
[0, 0], [0, 1], [1, 0], [1, 1],
], dtype=np.float32)
y_train = np.array([0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0], dtype=np.int64)
# Create dataset
class XORDataset:
def __iter__(self):
for i in range(len(X_train)):
yield Tensor(X_train[i:i+1]), Tensor(y_train[i:i+1])
dataset = XORDataset()
# Train for 200 epochs
print("\n🔬 Testing classification convergence on XOR pattern:")
losses = []
for epoch in range(200):
loss = trainer.train_epoch(dataset)
losses.append(loss)
if epoch % 40 == 0:
print(f"Epoch {epoch:3d}: Loss = {loss:.6f}")
initial_loss = losses[0]
final_loss = losses[-1]
print(f"\nInitial loss: {initial_loss:.6f}")
print(f"Final loss: {final_loss:.6f}")
print(f"Reduction: {(1 - final_loss/initial_loss)*100:.1f}%")
# Test: Loss should decrease significantly
assert final_loss < initial_loss * 0.5, \
f"Loss should decrease to < 50% of initial. Got {final_loss/initial_loss*100:.1f}%"
print("✅ Classification training successfully converged!")
# =============================================================================
# CRITICAL TEST 3: Scheduler Integration
# =============================================================================
# BUG-CATCHING VALUE: HIGH
# COMMON BUG: Scheduler exists but doesn't actually update learning rate
# SYMPTOM: Learning rate stays constant despite scheduler
# =============================================================================
class TestSchedulerIntegration:
"""Test that learning rate scheduler actually updates optimizer learning rate."""
def test_scheduler_updates_learning_rate(self):
"""
Test that CosineSchedule integrates with Trainer and updates LR each epoch.
This validates:
1. Scheduler computes correct learning rates
2. Trainer applies scheduler updates to optimizer
3. Learning rate actually changes during training
"""
# Create simple model
class SimpleModel:
def __init__(self):
self.layer = Linear(2, 1)
self.training = True
def forward(self, x):
return self.layer.forward(x)
def parameters(self):
return self.layer.parameters()
model = SimpleModel()
optimizer = SGD(model.parameters(), lr=0.1) # Initial LR (will be overridden)
# Create scheduler: 0.1 → 0.01 over 10 epochs
scheduler = CosineSchedule(max_lr=0.1, min_lr=0.01, total_epochs=10)
loss_fn = MSELoss()
trainer = Trainer(model, optimizer, loss_fn, scheduler=scheduler)
# Create simple dataset
class SimpleDataset:
def __iter__(self):
for _ in range(5):
x = Tensor(np.random.randn(4, 2))
y = Tensor(np.random.randn(4, 1))
yield x, y
print("\n🔬 Testing learning rate scheduling:")
# Train for 10 epochs and track learning rate
learning_rates = []
for epoch in range(10):
# Record LR before training
lr_before = optimizer.lr
# Train one epoch
trainer.train_epoch(SimpleDataset())
# Record LR after training (scheduler should have updated it)
lr_after = optimizer.lr
learning_rates.append(lr_after)
print(f"Epoch {epoch}: LR = {lr_after:.6f}")
print(f"\nLearning rates: {[f'{lr:.4f}' for lr in learning_rates]}")
# Test 1: Learning rate should start at max_lr
assert abs(learning_rates[0] - 0.1) < 1e-6, \
f"Initial LR should be 0.1, got {learning_rates[0]:.6f}"
# Test 2: Learning rate should end at min_lr
assert abs(learning_rates[-1] - 0.01) < 1e-6, \
f"Final LR should be 0.01, got {learning_rates[-1]:.6f}"
# Test 3: Learning rate should decrease monotonically
for i in range(len(learning_rates) - 1):
assert learning_rates[i] >= learning_rates[i+1], \
f"LR should decrease monotonically. Epoch {i}: {learning_rates[i]:.6f} > Epoch {i+1}: {learning_rates[i+1]:.6f}"
# Test 4: Learning rate should actually change (not stuck)
unique_lrs = len(set([round(lr, 6) for lr in learning_rates]))
assert unique_lrs >= 5, \
f"LR should change across epochs. Only {unique_lrs} unique values found."
# Test 5: History should track learning rates
assert len(trainer.history['learning_rates']) == 10, \
"Trainer should record learning rate for each epoch"
print("✅ Learning rate scheduling works correctly!")
def test_training_without_scheduler(self):
"""
Test that training works correctly when scheduler=None.
This validates that scheduler is truly optional.
"""
# Create simple model
class SimpleModel:
def __init__(self):
self.layer = Linear(1, 1)
self.training = True
def forward(self, x):
return self.layer.forward(x)
def parameters(self):
return self.layer.parameters()
model = SimpleModel()
optimizer = SGD(model.parameters(), lr=0.05)
loss_fn = MSELoss()
# Create trainer WITHOUT scheduler
trainer = Trainer(model, optimizer, loss_fn, scheduler=None)
# Create simple dataset
class SimpleDataset:
def __iter__(self):
for _ in range(3):
x = Tensor(np.random.randn(2, 1))
y = Tensor(np.random.randn(2, 1))
yield x, y
print("\n🔬 Testing training without scheduler:")
# Train for 5 epochs
initial_lr = optimizer.lr
for epoch in range(5):
trainer.train_epoch(SimpleDataset())
current_lr = optimizer.lr
print(f"Epoch {epoch}: LR = {current_lr:.6f}")
# Learning rate should stay constant
assert abs(current_lr - initial_lr) < 1e-9, \
f"LR should remain constant without scheduler. Expected {initial_lr}, got {current_lr}"
print("✅ Training without scheduler works correctly!")
# =============================================================================
# Test Execution
# =============================================================================
if __name__ == "__main__":
print("=" * 70)
print("Module 07 - CRITICAL Integration Tests")
print("=" * 70)
# Test 1: Missing zero_grad()
print("\n" + "=" * 70)
print("TEST 1: Missing zero_grad() Detection")
print("=" * 70)
test_zero_grad = TestMissingZeroGrad()
test_zero_grad.test_zero_grad_required_for_correct_training()
test_zero_grad.test_trainer_calls_zero_grad()
# Test 2: Loss Convergence
print("\n" + "=" * 70)
print("TEST 2: Loss Convergence Validation")
print("=" * 70)
test_convergence = TestLossConvergence()
test_convergence.test_linear_regression_convergence()
test_convergence.test_classification_convergence()
# Test 3: Scheduler Integration
print("\n" + "=" * 70)
print("TEST 3: Scheduler Integration")
print("=" * 70)
test_scheduler = TestSchedulerIntegration()
test_scheduler.test_scheduler_updates_learning_rate()
test_scheduler.test_training_without_scheduler()
print("\n" + "=" * 70)
print("ALL CRITICAL TESTS PASSED! ✅")
print("=" * 70)
print("\nModule 07 Training has passed critical integration validation.")
print("These tests verify:")
print(" ✅ Gradients are managed correctly (zero_grad)")
print(" ✅ Training produces learning (convergence)")
print(" ✅ Learning rate scheduling works (scheduler integration)")


@@ -1,550 +0,0 @@
# Module 07 (Training) - Integration Test Audit Report
**Date**: 2025-11-25
**Auditor**: Dr. Sarah Rodriguez
**Status**: CRITICAL GAPS IDENTIFIED - Test coverage is for Module 10 (Optimizers), not Module 07 (Training)
---
## CRITICAL FINDING: Wrong Module Being Tested
**ISSUE**: The file `/tests/07_training/test_progressive_integration.py` contains tests for **Module 10 (Optimizers)**, NOT Module 07 (Training).
**Evidence**:
- Line 2: "Module 10: Progressive Integration Tests"
- Line 3: "Tests that Module 10 (Optimizers) works correctly"
- Line 5: "DEPENDENCY CHAIN: 01_setup → ... → 10_optimizers"
- Line 6: "This is where we enable actual learning through gradient-based optimization."
**Impact**: Module 07 (Training) has NO progressive integration tests validating its core functionality.
---
## Module 07 Implementation Overview
Based on `/src/07_training/07_training.py`, Module 07 provides:
### Core Components Implemented:
1. **CosineSchedule** - Learning rate scheduling with cosine annealing (rule sketched after this list)
2. **clip_grad_norm()** - Global gradient norm clipping
3. **Trainer class** - Complete training orchestration with:
- `train_epoch()` - Training loop with gradient accumulation
- `evaluate()` - Evaluation mode without gradients
- `save_checkpoint()` / `load_checkpoint()` - State persistence
- Train/eval mode switching
- Learning rate scheduling integration
- Gradient clipping integration
- History tracking
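For reference, a minimal sketch of the cosine-annealing rule such a schedule typically implements; the constructor mirrors `CosineSchedule(max_lr, min_lr, total_epochs)` as used in Priority 3 below, but the method name and exact endpoint handling are assumptions, not the audited implementation:
```python
import math

class CosineScheduleSketch:
    """Anneal the learning rate from max_lr down to min_lr over total_epochs."""
    def __init__(self, max_lr, min_lr, total_epochs):
        self.max_lr, self.min_lr, self.total_epochs = max_lr, min_lr, total_epochs

    def get_lr(self, epoch):
        # lr(t) = min_lr + 0.5 * (max_lr - min_lr) * (1 + cos(pi * t / (T - 1)))
        progress = epoch / max(self.total_epochs - 1, 1)
        return self.min_lr + 0.5 * (self.max_lr - self.min_lr) * (1 + math.cos(math.pi * progress))
```
With this rule, `get_lr(0) == max_lr` and `get_lr(total_epochs - 1) == min_lr`, which matches the "Verify final lr ≈ min_lr" expectation in Test 3.1 below.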
### Integration Points (Modules 01-06):
- Module 01: Tensor operations
- Module 02: Activations (ReLU, Sigmoid)
- Module 03: Layers (Linear)
- Module 04: Losses (MSELoss, CrossEntropyLoss)
- Module 05: Autograd (backward pass, gradients)
- Module 06: Optimizers (SGD, AdamW)
---
## Current Test Coverage Analysis
### Existing Test Files:
1. **test_progressive_integration.py** (498 lines)
- **WRONG MODULE**: Tests Module 10 (Optimizers)
- Tests SGD/Adam creation, parameter updates, gradient clipping
- Does NOT test Trainer class or training loops
2. **test_autograd_integration.py** (213 lines)
- Tests autograd integration with tensors, layers, activations
- Validates backward pass, computation graphs
- Does NOT test training-specific functionality
3. **test_tensor_autograd_integration.py** (348 lines)
- Tests Variable wrapping of Tensors
- Tests operations (add, multiply, relu, sigmoid)
- Tests backward pass and gradient computation
- Does NOT test training loops
### Coverage Summary:
- **Autograd Integration**: ✅ Well covered (561 lines)
- **Optimizer Integration**: ✅ Covered (in wrong file)
- **Training Loop Integration**: ❌ **MISSING**
- **Trainer Class Integration**: ❌ **MISSING**
- **Learning Rate Scheduling**: ❌ **MISSING**
- **Gradient Clipping**: ⚠️ Partial (optimizer tests only)
- **Checkpointing**: ❌ **MISSING**
- **Train/Eval Mode**: ❌ **MISSING**
---
## MISSING INTEGRATION TESTS - Critical Priorities
### Priority 1: Training Loop Core Functionality
#### Test 1.1: Complete Training Loop Integration
**What to test**: End-to-end training loop through Trainer class
```python
class TestTrainerCoreIntegration:
def test_complete_training_loop(self):
"""Test complete training loop integrates all modules correctly."""
# Components from all modules:
# - Model: Linear layers (Module 03) + ReLU (Module 02)
# - Loss: MSELoss or CrossEntropyLoss (Module 04)
# - Optimizer: SGD or AdamW (Module 06)
# - Trainer: Training orchestration (Module 07)
# Verify:
# - Forward pass works
# - Loss computation works
# - Backward pass computes gradients
# - Optimizer updates parameters
# - Loss decreases over epochs
```
**Why critical**: This is the PRIMARY integration point for Module 07. If this doesn't work, nothing else matters.
#### Test 1.2: Missing zero_grad() Detection
**What to test**: Training fails catastrophically if zero_grad() is missing
```python
def test_missing_zero_grad_causes_gradient_accumulation(self):
"""Test that forgetting zero_grad() causes incorrect gradient accumulation."""
# Create trainer WITHOUT zero_grad() call
# Run multiple training steps
# Verify gradients accumulate incorrectly
# Show loss diverges instead of converging
```
**Why critical**: This is the #1 student mistake in training loops. Tests should catch it.
**Bug-catching value**: HIGH - Common error that silently breaks training
#### Test 1.3: Gradient Accumulation Pattern
**What to test**: Gradient accumulation works correctly with accumulation_steps > 1
```python
def test_gradient_accumulation_correctness(self):
"""Test gradient accumulation produces same results as larger batch."""
# Train with batch_size=4, accumulation_steps=1
# Train with batch_size=2, accumulation_steps=2
# Verify final gradients are equivalent
# Verify effective batch size is the same
```
**Why critical**: Production pattern for memory-limited training. Must work correctly.
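A hedged sketch of the loop pattern Test 1.3 describes, assuming the `Linear`/`MSELoss`/`SGD` APIs used in the template above and that a loss tensor can be scaled by a Python float; this is the conventional accumulation pattern, not necessarily the project's `accumulation_steps` implementation:
```python
import numpy as np
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear
from tinytorch.core.losses import MSELoss
from tinytorch.core.optimizers import SGD

def train_with_accumulation(layer, batches, accumulation_steps=2, lr=0.01):
    """Accumulate gradients over several micro-batches, then step the optimizer once."""
    optimizer = SGD(layer.parameters(), lr=lr)
    loss_fn = MSELoss()
    optimizer.zero_grad()
    for i, (x, y) in enumerate(batches):
        loss = loss_fn.forward(layer.forward(x), y)
        # Scale each micro-batch loss so the accumulated gradient matches one large batch
        (loss * (1.0 / accumulation_steps)).backward()
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```
The equivalence test would then compare the gradients (or final parameters) from `batch_size=2, accumulation_steps=2` against a single pass with `batch_size=4`.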
---
### Priority 2: Train/Eval Mode Switching
#### Test 2.1: Mode Switching Affects Model Behavior
**What to test**: model.training flag changes behavior correctly
```python
def test_train_eval_mode_switching(self):
"""Test train/eval mode switching affects model behavior."""
# Create model with dropout or batchnorm (future modules)
# Run forward in training mode
# Run forward in eval mode
# Verify different outputs/behavior
# For Module 07: At minimum verify:
# - Trainer sets model.training = True in train_epoch()
# - Trainer sets model.training = False in evaluate()
```
**Why critical**: Proper mode switching is essential for correct evaluation and inference.
**Bug-catching value**: MEDIUM - Subtle bug that causes incorrect evaluation metrics
#### Test 2.2: Gradients Disabled During Evaluation
**What to test**: No gradients computed during evaluation
```python
def test_evaluation_disables_gradients(self):
"""Test evaluation doesn't compute or accumulate gradients."""
# Run evaluate() on test data
# Verify no gradients are computed
# Verify no parameter updates occur
# Verify optimizer state unchanged
```
**Why critical**: Evaluation should be faster and memory-efficient without gradients.
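A minimal sketch of Test 2.2, assuming the `Trainer.evaluate(dataset)` call signature and the `model.training` flag described above; the tiny model and dataset classes are illustrative only:
```python
import numpy as np
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear
from tinytorch.core.losses import MSELoss
from tinytorch.core.optimizers import SGD
from tinytorch.core.training import Trainer

def test_evaluation_does_not_update_parameters():
    """evaluate() should leave parameters untouched and switch to eval mode."""
    class TinyModel:
        def __init__(self):
            self.layer = Linear(2, 1)
            self.training = True
        def forward(self, x):
            return self.layer.forward(x)
        def parameters(self):
            return self.layer.parameters()

    class EvalDataset:
        def __iter__(self):
            for _ in range(3):
                yield Tensor(np.random.randn(4, 2)), Tensor(np.random.randn(4, 1))

    model = TinyModel()
    trainer = Trainer(model, SGD(model.parameters(), lr=0.1), MSELoss())
    before = [p.data.copy() for p in model.parameters()]
    trainer.evaluate(EvalDataset())
    for b, p in zip(before, model.parameters()):
        assert np.allclose(b, p.data), "evaluate() modified model parameters"
    assert model.training is False, "evaluate() should set model.training = False"
```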
---
### Priority 3: Learning Rate Scheduling Integration
#### Test 3.1: Scheduler Updates Learning Rate
**What to test**: Scheduler properly updates optimizer learning rate each epoch
```python
def test_scheduler_updates_learning_rate(self):
"""Test learning rate scheduler integrates with training loop."""
# Create CosineSchedule(max_lr=0.1, min_lr=0.01, total_epochs=10)
# Create Trainer with scheduler
# Train for 10 epochs
# Verify optimizer.lr changes each epoch
# Verify lr follows cosine schedule (decreasing)
# Verify final lr ≈ min_lr
```
**Why critical**: Scheduling is essential for training convergence. Must integrate correctly.
**Bug-catching value**: HIGH - Scheduler exists but doesn't actually update LR (common integration bug)
#### Test 3.2: Training Without Scheduler Still Works
**What to test**: Scheduler is optional, training works without it
```python
def test_training_without_scheduler(self):
"""Test training works with scheduler=None."""
# Create Trainer with scheduler=None
# Train for multiple epochs
# Verify optimizer.lr stays constant
# Verify training still works correctly
```
**Why critical**: Ensures optional components are truly optional.
---
### Priority 4: Gradient Clipping Integration
#### Test 4.1: Gradient Clipping Prevents Explosion
**What to test**: Gradient clipping rescales large gradients correctly
```python
def test_gradient_clipping_prevents_explosion(self):
"""Test gradient clipping prevents exploding gradients."""
# Create model with potential for large gradients
# Set grad_clip_norm=1.0
# Inject artificially large gradients
# Train one step
# Verify gradient norm ≤ clip threshold
# Verify parameters update reasonably
```
**Why critical**: Prevents training instability from exploding gradients.
**Bug-catching value**: HIGH - Clipping may be called but not actually applied
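For reference, a hedged sketch of the global-norm rule that `clip_grad_norm()` is described as implementing; the name matches the import in the template above and the `.grad.data` access is assumed from that template, but this body is the standard formula, not the project's verified code:
```python
import numpy as np

def clip_grad_norm_sketch(parameters, max_norm, eps=1e-6):
    """Rescale all gradients in place so their combined L2 norm is at most max_norm."""
    grads = [p.grad.data for p in parameters if p.grad is not None]
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + eps)
        for g in grads:
            g *= scale  # in-place rescale; gradients below the threshold are left untouched
    return total_norm
```
Test 4.1 would assert the recomputed global norm is at most the threshold after clipping, while Test 4.2 would assert gradients already below the threshold come back unchanged.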
#### Test 4.2: Small Gradients Not Affected
**What to test**: Gradient clipping doesn't affect small gradients
```python
def test_small_gradients_unchanged_by_clipping(self):
"""Test gradient clipping doesn't modify small gradients."""
# Create model with small gradients
# Set grad_clip_norm=10.0 (high threshold)
# Compute gradients
# Verify gradients unchanged
```
**Why critical**: Clipping should only activate when needed.
---
### Priority 5: Loss Convergence Validation
#### Test 5.1: Loss Decreases During Training
**What to test**: Training actually improves model performance
```python
def test_loss_convergence_on_simple_problem(self):
"""Test training reduces loss on simple learnable problem."""
# Create simple linear regression problem: y = 2x + 1
# Create model: Linear(1, 1)
# Train for 100 epochs
# Verify loss decreases monotonically (or mostly)
# Verify final loss < initial loss * 0.1
# Verify learned weights ≈ [2.0] and bias ≈ [1.0]
```
**Why critical**: Validates entire training pipeline produces learning.
**Bug-catching value**: CRITICAL - Detects any component breaking learning
#### Test 5.2: History Tracking Accuracy
**What to test**: trainer.history correctly records training metrics
```python
def test_history_tracking(self):
"""Test training history is tracked correctly."""
# Train for 5 epochs
# Verify len(trainer.history['train_loss']) == 5
# Verify len(trainer.history['learning_rates']) == 5 (if scheduler used)
# Verify values are reasonable (no NaN, no infinite)
```
**Why critical**: Users rely on history for monitoring and debugging.
---
### Priority 6: Checkpointing and State Persistence
#### Test 6.1: Save and Load Checkpoint
**What to test**: Training state can be saved and restored
```python
def test_save_load_checkpoint(self):
"""Test checkpoint saving and loading preserves training state."""
# Train for 5 epochs
# Save checkpoint
# Train for 5 more epochs
# Record final state
# Create new trainer
# Load checkpoint
# Train for 5 epochs
# Verify final state matches original
```
**Why critical**: Essential for long training jobs and experimentation.
**Bug-catching value**: MEDIUM - Checkpoint may save but not restore correctly
#### Test 6.2: Checkpoint Contains Complete State
**What to test**: Checkpoint includes all necessary components
```python
def test_checkpoint_completeness(self):
"""Test checkpoint contains all training state components."""
# Train for a few epochs
# Save checkpoint
# Load checkpoint dictionary
# Verify contains:
# - model state (weights, biases)
# - optimizer state (momentum, velocity for Adam)
# - scheduler state (current epoch)
# - training metadata (epoch, step)
```
**Why critical**: Incomplete checkpoints cause subtle resume errors.
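A minimal sketch of the checkpoint contents Test 6.2 enumerates, assuming parameters expose `.data` and that `pickle` is an acceptable serialization format; the dictionary keys and helper names here are illustrative, not the audited `save_checkpoint()`/`load_checkpoint()` implementation:
```python
import pickle

def save_checkpoint_sketch(path, model, optimizer, scheduler, epoch):
    """Persist every piece of state Test 6.2 lists."""
    checkpoint = {
        "model_state": [p.data.copy() for p in model.parameters()],
        "optimizer_state": getattr(optimizer, "state", None),  # e.g. momentum / Adam moments, if exposed
        "scheduler_state": {"epoch": epoch} if scheduler is not None else None,
        "epoch": epoch,
    }
    with open(path, "wb") as f:
        pickle.dump(checkpoint, f)

def load_checkpoint_sketch(path, model):
    """Restore parameter values; optimizer/scheduler restore would mirror the save."""
    with open(path, "rb") as f:
        checkpoint = pickle.load(f)
    for p, saved in zip(model.parameters(), checkpoint["model_state"]):
        p.data = saved.copy()
    return checkpoint
```
Test 6.1 would compare training trajectories before and after a resume; Test 6.2 would assert each of these keys is present and non-empty.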
---
### Priority 7: Integration with Previous Modules
#### Test 7.1: Works with Different Layer Types
**What to test**: Training works with various layer architectures
```python
def test_training_with_different_architectures(self):
"""Test training works with different model architectures."""
# Test 1: Single Linear layer
# Test 2: Multi-layer perceptron (Linear + ReLU + Linear)
# Test 3: Different activation functions
# Verify all train successfully
```
**Why critical**: Training should be architecture-agnostic.
#### Test 7.2: Works with Different Loss Functions
**What to test**: Training works with MSE, CrossEntropy, etc.
```python
def test_training_with_different_losses(self):
"""Test training works with different loss functions."""
# Test 1: MSELoss for regression
# Test 2: CrossEntropyLoss for classification
# Verify both train correctly
# Verify gradients flow properly
```
**Why critical**: Training should support all loss types.
#### Test 7.3: Works with Different Optimizers
**What to test**: Training works with SGD, AdamW, etc.
```python
def test_training_with_different_optimizers(self):
"""Test training works with different optimizers."""
# Test 1: SGD (simple, no momentum)
# Test 2: AdamW (complex, with momentum and adaptive LR)
# Verify both integrate correctly
# Verify both produce learning
```
**Why critical**: Training should be optimizer-agnostic.
---
## Test Organization Recommendations
### Suggested File Structure:
```
tests/07_training/
├── test_progressive_integration.py # FIX: Rename/move to tests/10_optimizers/
├── test_trainer_core.py # NEW: Priority 1 tests
├── test_trainer_modes.py # NEW: Priority 2 tests
├── test_scheduler_integration.py # NEW: Priority 3 tests
├── test_gradient_clipping.py # NEW: Priority 4 tests
├── test_convergence.py # NEW: Priority 5 tests
├── test_checkpointing.py # NEW: Priority 6 tests
├── test_module_integration.py # NEW: Priority 7 tests
├── test_autograd_integration.py # KEEP: Good coverage
└── test_tensor_autograd_integration.py # KEEP: Good coverage
```
---
## Bug-Catching Priority Matrix
| Test Category | Bug-Catching Value | Student Impact | Priority |
|--------------|-------------------|----------------|----------|
| Missing zero_grad() | CRITICAL | High - Silent failure | P0 |
| Loss convergence validation | CRITICAL | High - No learning | P0 |
| Scheduler integration | HIGH | Medium - Poor convergence | P1 |
| Gradient clipping | HIGH | Medium - Training instability | P1 |
| Train/eval mode | MEDIUM | Medium - Wrong metrics | P2 |
| Checkpoint save/load | MEDIUM | Low - Resume failures | P2 |
| Gradient accumulation | MEDIUM | Low - Memory issues | P3 |
---
## Recommended Test Implementation Order
### Phase 1: Core Functionality (P0)
1. ✅ Fix file organization (move optimizer tests to correct location)
2. ✅ Test complete training loop integration
3. ✅ Test missing zero_grad() detection
4. ✅ Test loss convergence on simple problem
### Phase 2: Essential Features (P1)
5. ✅ Test learning rate scheduling integration
6. ✅ Test gradient clipping prevents explosion
7. ✅ Test train/eval mode switching
### Phase 3: Production Features (P2)
8. ✅ Test checkpoint save and load
9. ✅ Test gradient accumulation correctness
10. ✅ Test history tracking accuracy
### Phase 4: Robustness (P3)
11. ✅ Test with different architectures
12. ✅ Test with different loss functions
13. ✅ Test with different optimizers
---
## Summary
### Current State:
- **Total test lines**: 1159 (but misplaced)
- **Module 07 specific tests**: ~0 (all tests are for wrong module)
- **Integration coverage**: 0% for training, 100% for autograd
### Required Action:
1. **URGENT**: Rename/move `test_progressive_integration.py` to `tests/10_optimizers/`
2. **URGENT**: Create new `test_trainer_core.py` with Priority 1 tests (P0)
3. **HIGH**: Create Priority 2-3 test files (P1)
4. **MEDIUM**: Create Priority 4-7 test files (P2-P3)
### Estimated Test Lines Needed:
- **Minimum (P0-P1)**: ~400 lines for critical functionality
- **Recommended (P0-P2)**: ~800 lines for production readiness
- **Comprehensive (P0-P3)**: ~1200 lines for full coverage
### Critical Integration Points Missing Tests:
1. ❌ Training loop orchestration
2. ❌ zero_grad() requirement
3. ❌ Learning rate scheduling
4. ❌ Gradient clipping application
5. ❌ Train/eval mode effects
6. ❌ Loss convergence validation
7. ❌ Checkpoint persistence
**Overall Assessment**: Module 07 has ZERO integration test coverage. All existing tests are for the wrong module (10) or test components (autograd) rather than the training loop itself.
**Risk Level**: 🔴 **CRITICAL** - Module 07 could be completely broken and tests would pass.
---
## Appendix: Test Template Examples
### Template: Complete Training Loop Test
```python
class TestTrainerCoreIntegration:
"""Test Trainer class integrates all modules correctly."""
def test_complete_training_loop(self):
"""Test end-to-end training with all components."""
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear
from tinytorch.core.activations import ReLU
from tinytorch.core.losses import MSELoss
from tinytorch.core.optimizers import SGD
from tinytorch.core.training import Trainer
# Create simple model
class SimpleModel:
def __init__(self):
self.layer1 = Linear(2, 4)
self.relu = ReLU()
self.layer2 = Linear(4, 1)
self.training = True
def forward(self, x):
x = self.layer1(x)
x = self.relu(x)
x = self.layer2(x)
return x
def parameters(self):
return self.layer1.parameters() + self.layer2.parameters()
# Create components
model = SimpleModel()
optimizer = SGD(model.parameters(), lr=0.01)
loss_fn = MSELoss()
trainer = Trainer(model, optimizer, loss_fn)
# Create simple dataset: y = x1 + x2
class SimpleDataset:
def __iter__(self):
for _ in range(10): # 10 batches
x = Tensor(np.random.randn(4, 2))
y = Tensor(x.data[:, 0:1] + x.data[:, 1:2])
yield x, y
# Train for 5 epochs
initial_loss = None
for epoch in range(5):
loss = trainer.train_epoch(SimpleDataset())
if initial_loss is None:
initial_loss = loss
# Verify training worked
assert loss < initial_loss * 0.8, "Loss should decrease significantly"
assert len(trainer.history['train_loss']) == 5
assert trainer.epoch == 5
```
### Template: Missing zero_grad() Test
```python
def test_missing_zero_grad_breaks_training(self):
"""Test that forgetting zero_grad() causes gradient accumulation."""
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear
from tinytorch.core.losses import MSELoss
from tinytorch.core.optimizers import SGD
# Create model and optimizer
layer = Linear(1, 1)
optimizer = SGD(layer.parameters(), lr=0.1)
loss_fn = MSELoss()
# Manual training loop WITHOUT zero_grad()
x = Tensor([[1.0]])
y = Tensor([[2.0]])
# First step
out1 = layer.forward(x)
loss1 = loss_fn.forward(out1, y)
loss1.backward()
grad1 = layer.weights.grad.data.copy()
optimizer.step()
# FORGOT: optimizer.zero_grad() ← BUG
# Second step
out2 = layer.forward(x)
loss2 = loss_fn.forward(out2, y)
loss2.backward()
grad2 = layer.weights.grad.data.copy()
# Verify gradients accumulated incorrectly
# grad2 should be ~2x grad1 because gradients accumulated
assert np.abs(grad2) > np.abs(grad1) * 1.5, \
"Gradients should accumulate when zero_grad() is missing"
```
---
**End of Audit Report**


@@ -1,151 +0,0 @@
# Module 07 Integration Test Audit - Quick Reference
## TL;DR
**Status**: 🔴 CRITICAL - Module 07 has 0% integration test coverage
**Problem**: Test file tests wrong module (Module 10 instead of Module 07)
**Impact**: Training loop could be completely broken and tests would pass
---
## What to Read
1. **Executive Summary** (2 min): `AUDIT_SUMMARY.md`
- Critical findings
- Top 3 missing tests
- Action items
2. **Full Audit Report** (10 min): `INTEGRATION_TEST_AUDIT.md`
- Complete coverage analysis
- All missing tests (Priorities 0-3)
- Implementation templates
3. **Critical Tests** (code): `CRITICAL_TESTS_TEMPLATE.py`
- Top 3 bug-catching tests (ready to run)
- ~400 lines of working test code
- Immediate implementation guide
---
## Critical Integration Points
| Integration Point | Current Coverage | Priority |
|------------------|------------------|----------|
| Training loop orchestration | ❌ 0% | P0 - CRITICAL |
| zero_grad() requirement | ❌ 0% | P0 - CRITICAL |
| Loss convergence | ❌ 0% | P0 - CRITICAL |
| Learning rate scheduling | ❌ 0% | P1 - HIGH |
| Gradient clipping | ⚠️ 20% | P1 - HIGH |
| Train/eval mode | ❌ 0% | P1 - HIGH |
| Checkpointing | ❌ 0% | P2 - MEDIUM |
| Gradient accumulation | ❌ 0% | P2 - MEDIUM |
---
## Immediate Actions Required
### 1. Fix File Organization (5 min)
```bash
# Move misplaced test file to correct module
mv tests/07_training/test_progressive_integration.py \
tests/10_optimizers/test_progressive_integration.py
```
### 2. Run Critical Tests (30 min)
```bash
# Test the 3 most critical integration points
cd tests/07_training
pytest CRITICAL_TESTS_TEMPLATE.py -v
# Expected: Some tests may FAIL (catching real bugs!)
```
### 3. Create Real Test File (2 hours)
```bash
# Use template as basis for permanent test file
cp CRITICAL_TESTS_TEMPLATE.py test_trainer_core.py
# Integrate with TinyTorch test suite
# Add to CI/CD pipeline
```
---
## Test Implementation Priority
**Phase 1: P0 Tests (~210 lines, CRITICAL)**
- Missing zero_grad() detection
- Loss convergence validation
- Complete training loop integration
**Phase 2: P1 Tests (~160 lines, HIGH)**
- Learning rate scheduling
- Gradient clipping
- Train/eval mode switching
**Phase 3: P2 Tests (~180 lines, MEDIUM)**
- Checkpoint save/load
- Gradient accumulation
- History tracking
---
## Expected Test Results
### If All Components Work:
```
✅ zero_grad() requirement correctly enforced
✅ Training successfully converged to correct solution
✅ Learning rate scheduling works correctly
```
### If Bugs Exist (likely):
```
❌ Gradients accumulate without zero_grad() but training still "works"
→ BUG: Missing zero_grad() in training loop
❌ Loss doesn't decrease after 100 epochs
→ BUG: Complete pipeline failure (check backward pass, optimizer)
❌ Learning rate stays constant at 0.1
→ BUG: Scheduler not integrated (called but LR not updated)
```
---
## Files Created by This Audit
1. `AUDIT_SUMMARY.md` - Executive summary
2. `INTEGRATION_TEST_AUDIT.md` - Full audit report
3. `CRITICAL_TESTS_TEMPLATE.py` - Top 3 tests (ready to run)
4. `README_AUDIT.md` - This quick reference
---
## Questions to Answer
**Q: Why is this marked CRITICAL?**
A: Module 07 is where ALL previous modules integrate. If training doesn't work, nothing works. Zero test coverage means complete integration could be broken.
**Q: How do we know tests are missing?**
A: Current test file (`test_progressive_integration.py`) has wrong header ("Module 10") and tests optimizers, not training loops.
**Q: What's the quickest way to establish confidence?**
A: Run `CRITICAL_TESTS_TEMPLATE.py`. If those 3 tests pass, core functionality works. If they fail, we found critical bugs.
**Q: How much work to fix?**
A: Minimum (P0): ~210 lines, 2-3 hours. Recommended (P0+P1): ~370 lines, 1 day.
---
## Contact
For questions about this audit, see:
- Full report: `INTEGRATION_TEST_AUDIT.md`
- Test templates: `CRITICAL_TESTS_TEMPLATE.py`
- Module implementation: `/src/07_training/07_training.py`
**Audit Date**: 2025-11-25
**Status**: CRITICAL - Immediate action required


@@ -1,210 +0,0 @@
╔═══════════════════════════════════════════════════════════════════════════════╗
║ MODULE 08 INTEGRATION TEST AUDIT SUMMARY ║
╚═══════════════════════════════════════════════════════════════════════════════╝
🚨 CRITICAL BUG FOUND 🚨
┌───────────────────────────────────────────────────────────────────────────────┐
│ File Location: tests/08_dataloader/test_progressive_integration.py │
│ Expected Module: Module 08 (DataLoader) │
│ Actual Module: Module 09 (Autograd) ❌ │
│ │
│ IMPACT: Module 08 has ZERO integration tests currently! │
└───────────────────────────────────────────────────────────────────────────────┘
═══════════════════════════════════════════════════════════════════════════════
📊 CURRENT TEST COVERAGE ANALYSIS
═══════════════════════════════════════════════════════════════════════════════
Current Tests (ALL WRONG MODULE):
┌─────────────────────────────────────────────────────────────┐
│ ✗ TestCompleteMLPipelineStillWorks │
│ └─ Tests Module 09 regression, not Module 08 │
│ │
│ ✗ TestModule09AutogradCore │
│ ├─ test_variable_wrapper_exists │
│ ├─ test_gradient_computation │
│ └─ test_computation_graph_building │
│ │
│ ✗ TestAutogradIntegration │
│ ├─ test_autograd_with_layers │
│ ├─ test_autograd_with_spatial_operations │
│ └─ test_autograd_with_attention │
│ │
│ ✗ TestGradientBasedLearningFoundation │
│ ├─ test_parameter_gradient_computation │
│ ├─ test_loss_function_gradients │
│ └─ test_optimization_readiness │
│ │
│ ✗ TestModule09Completion │
│ └─ test_autograd_foundation_complete │
└─────────────────────────────────────────────────────────────┘
Module 08 Coverage: 0/7 critical integration points tested ❌
═══════════════════════════════════════════════════════════════════════════════
🎯 MISSING MODULE 08 INTEGRATION TESTS
═══════════════════════════════════════════════════════════════════════════════
🔴 CRITICAL PRIORITY (Must Have):
1. DataLoader + Training Loop Integration ⚠️
┌────────────────────────────────────────────────────────┐
│ Tests: Batches work with model forward pass │
│ Risk: Students can't train models │
│ Catches: Shape mismatches, iteration bugs │
└────────────────────────────────────────────────────────┘
2. Shuffling Consistency Across Epochs ⚠️
┌────────────────────────────────────────────────────────┐
│ Tests: Data shuffles properly each epoch │
│ Risk: Training may not converge │
│ Catches: Randomization bugs, duplicate samples │
└────────────────────────────────────────────────────────┘
3. Batch Size Memory Scaling ⚠️
┌────────────────────────────────────────────────────────┐
│ Tests: Memory usage scales with batch size │
│ Risk: OOM errors, poor performance │
│ Catches: Memory issues, batch handling bugs │
└────────────────────────────────────────────────────────┘
🟡 HIGH PRIORITY (Very Important):
4. Tensor Dtype Compatibility
┌────────────────────────────────────────────────────────┐
│ Tests: DataLoader tensors match model expectations │
│ Risk: Type errors during training │
│ Catches: Dtype mismatches, conversion errors │
└────────────────────────────────────────────────────────┘
5. DataLoader + Loss Function Integration
┌────────────────────────────────────────────────────────┐
│ Tests: Batched predictions work with loss functions │
│ Risk: Loss computation fails │
│ Catches: Shape errors, reduction bugs │
└────────────────────────────────────────────────────────┘
🟢 MEDIUM PRIORITY (Should Have):
6. Empty/Single Sample Edge Cases
┌────────────────────────────────────────────────────────┐
│ Tests: Graceful handling of unusual datasets │
│ Risk: Crashes on edge cases │
│ Catches: Division by zero, empty iteration │
└────────────────────────────────────────────────────────┘
7. Multi-Epoch Iteration Stability
┌────────────────────────────────────────────────────────┐
│ Tests: Multiple epochs work reliably │
│ Risk: Multi-epoch training fails │
│ Catches: Memory leaks, iteration bugs │
└────────────────────────────────────────────────────────┘
═══════════════════════════════════════════════════════════════════════════════
🔗 MODULE 08 INTEGRATION POINTS
═══════════════════════════════════════════════════════════════════════════════
Dependencies (What Module 08 Uses):
┌─────────────────────────────────────────────────────────┐
│ Module 01 (Tensor) ────→ Core data structure │
│ Module 03 (Layers) ────→ Batches passed to layers │
│ Module 04 (Losses) ────→ Batch predictions → loss │
│ Module 05 (Autograd) ──→ Batches in gradient tracking │
│ Module 06 (Optimizers) → Batches drive updates │
│ Module 07 (Training) ──→ DataLoader in training loop │
└─────────────────────────────────────────────────────────┘
Enables (What Uses Module 08):
┌─────────────────────────────────────────────────────────┐
│ Module 07 (Training) → Training loop iteration │
│ Module 09 (Spatial) ──→ Batched image data for CNNs │
│ Module 10 (Text) ─────→ Batched text/token data │
│ All Future Modules ───→ Any batch processing │
└─────────────────────────────────────────────────────────┘
═══════════════════════════════════════════════════════════════════════════════
🛠️ RECOMMENDED ACTION PLAN
═══════════════════════════════════════════════════════════════════════════════
Step 1: Fix File Location ⚠️ IMMEDIATE
┌─────────────────────────────────────────────────────────┐
│ Move current file to correct location: │
│ │
│ FROM: tests/08_dataloader/test_progressive_*.py │
│ TO: tests/09_autograd/test_progressive_*.py │
│ │
│ Reason: Current tests are for Module 09, not 08 │
└─────────────────────────────────────────────────────────┘
Step 2: Create New Module 08 Tests
┌─────────────────────────────────────────────────────────┐
│ Create proper test_progressive_integration.py for: │
│ - Dataset abstract class │
│ - TensorDataset implementation │
│ - DataLoader batching and shuffling │
└─────────────────────────────────────────────────────────┘
Step 3: Implement Critical Tests First
┌─────────────────────────────────────────────────────────┐
│ Priority Order: │
│ 1. DataLoader + Training Loop Integration │
│ 2. Shuffling Consistency │
│ 3. Batch Size Memory Scaling │
└─────────────────────────────────────────────────────────┘
Step 4: Validate Student Workflows
┌─────────────────────────────────────────────────────────┐
│ Ensure tests catch real student issues: │
│ - Can they create datasets? │
│ - Can they iterate batches? │
│ - Can they train models end-to-end? │
└─────────────────────────────────────────────────────────┘
═══════════════════════════════════════════════════════════════════════════════
📈 IMPACT ASSESSMENT
═══════════════════════════════════════════════════════════════════════════════
Current State:
┌────────────────────────────────────────────┐
│ Module 08 Integration Coverage: 0% │
│ Critical Bug Risk: VERY HIGH │
│ Student Success Risk: VERY HIGH │
└────────────────────────────────────────────┘
After Implementing Recommended Tests:
┌────────────────────────────────────────────┐
│ Module 08 Integration Coverage: 100% │
│ Critical Bug Risk: LOW │
│ Student Success Risk: LOW │
└────────────────────────────────────────────┘
Bugs Caught by New Tests:
✓ Training loop integration failures
✓ Shuffling and randomization bugs
✓ Memory allocation issues
✓ Dtype mismatches
✓ Loss function integration errors
✓ Edge case crashes
✓ Multi-epoch stability issues
═══════════════════════════════════════════════════════════════════════════════
🎓 STUDENT IMPACT
═══════════════════════════════════════════════════════════════════════════════
Without Module 08 Tests:
❌ Students can implement DataLoader but can't verify it works
❌ Training loop failures discovered during later modules
❌ Confusing errors with no clear debugging path
❌ Wasted time on issues that tests should catch
❌ Poor understanding of batch processing trade-offs
With Module 08 Tests:
✅ Students verify DataLoader works immediately
✅ Integration issues caught at Module 08 boundary
✅ Clear error messages guide debugging
✅ Confidence to proceed to next modules
✅ Deep understanding of batch processing mechanics
═══════════════════════════════════════════════════════════════════════════════
For detailed analysis, see: INTEGRATION_TEST_AUDIT.md


@@ -1,361 +0,0 @@
# Module 08 (DataLoader) Integration Test Audit
## CRITICAL BUG IDENTIFIED
**File**: `/Users/VJ/GitHub/TinyTorch/tests/08_dataloader/test_progressive_integration.py`
**Issue**: Tests Module 09 (Autograd) instead of Module 08 (DataLoader)
### Current Status
The test file header claims to test Module 08 but actually tests:
```python
"""
Module 08: Progressive Integration Tests
Tests that Module 09 (Autograd) works correctly AND that the entire prior stack (01→08) still works.
```
**This is WRONG.** The file is in `tests/08_dataloader/` but tests Module 09 functionality.
---
## What Tests Currently Exist
### Current Tests (Module 09 - Autograd, WRONG MODULE)
1. **TestCompleteMLPipelineStillWorks**
- `test_end_to_end_ml_pipeline_stable()` - Full CNN pipeline
- `test_attention_and_spatial_integration_stable()` - Advanced architectures
2. **TestModule09AutogradCore** (WRONG - testing future module!)
- `test_variable_wrapper_exists()` - Variable class
- `test_gradient_computation()` - Backward pass
- `test_computation_graph_building()` - Computation graph
3. **TestAutogradIntegration** (WRONG - testing future module!)
- `test_autograd_with_layers()` - Gradients through Dense layers
- `test_autograd_with_spatial_operations()` - CNN gradients
- `test_autograd_with_attention()` - Transformer gradients
4. **TestGradientBasedLearningFoundation** (WRONG - testing future module!)
- `test_parameter_gradient_computation()` - Parameter gradients
- `test_loss_function_gradients()` - Loss gradients
- `test_optimization_readiness()` - Optimizer foundation
5. **TestModule09Completion** (WRONG - testing future module!)
- `test_autograd_foundation_complete()` - Complete autograd validation
---
## What Module 08 Tests SHOULD Exist
### Module 08 Scope: DataLoader (Data Pipeline)
**Implementation Location**: `tinytorch/data/loader.py`
**Core Components**:
- `Dataset` - Abstract base class
- `TensorDataset` - Tensor wrapper dataset
- `DataLoader` - Batching and shuffling
### Missing Integration Tests for Module 08
#### 1. **DataLoader + Training Loop Integration** ⚠️ CRITICAL
**Why**: Students need to verify DataLoader works with training loops
```python
def test_dataloader_training_loop_integration():
"""
Test DataLoader provides batches correctly for training.
Integration Points:
- DataLoader batches → Model forward pass
- Batch tensors → Loss computation
- Multi-epoch iteration
"""
```
**What to test**:
- DataLoader provides correct batch shapes
- Batches work with model forward pass
- Multiple epochs iterate correctly
- Training loop can consume all batches
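A minimal sketch of the batching contract being validated here, using a stand-in iterator rather than TinyTorch's actual DataLoader API:
```python
import numpy as np

# Stand-in batching loop: the real test should make the same shape/alignment checks per batch.
X = np.random.randn(10, 2).astype(np.float32)
y = np.random.randn(10, 1).astype(np.float32)

def iterate_batches(X, y, batch_size):
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

seen = 0
for xb, yb in iterate_batches(X, y, batch_size=4):
    assert xb.shape[1] == 2 and yb.shape[1] == 1, "Batch feature/label dims must match the model"
    assert xb.shape[0] == yb.shape[0] <= 4, "Features and labels must stay aligned per batch"
    seen += xb.shape[0]
assert seen == len(X), "One epoch must visit every sample exactly once"
```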
#### 2. **Shuffling Consistency** ⚠️ CRITICAL
**Why**: Critical for training stability and reproducibility
```python
def test_dataloader_shuffling_consistency():
"""
Test shuffling behavior across epochs.
Integration Points:
- Same data, different order each epoch
- Reproducibility with random seed
- All samples seen exactly once per epoch
"""
```
**What to test**:
- Shuffle=True changes order between epochs
- Shuffle=False maintains order
- All samples appear exactly once per epoch
- Random seed controls shuffling
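The shuffling invariants can be expressed independently of the DataLoader API; a sketch using index permutations:
```python
import numpy as np
from collections import Counter

# Two epochs of shuffled indices: order changes, but every sample appears exactly once per epoch.
n = 100
rng = np.random.default_rng(42)
epoch1 = rng.permutation(n)
epoch2 = rng.permutation(n)

assert not np.array_equal(epoch1, epoch2), "shuffle=True should reorder data between epochs"
assert Counter(epoch1) == Counter(epoch2) == Counter(range(n)), \
    "Every sample must appear exactly once per epoch"
```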
#### 3. **Batch Size Memory Scaling** ⚠️ CRITICAL
**Why**: Students need to understand batch size impact on memory
```python
def test_batch_size_memory_scaling():
"""
Test memory usage scales with batch size.
Systems Analysis:
- Small batches (4): Low memory, more iterations
- Medium batches (32): Balanced
- Large batches (128): High memory, fewer iterations
"""
```
**What to test**:
- Small batch sizes work correctly
- Large batch sizes work correctly
- Total samples = batches * batch_size (approximately)
- Last batch handles remainder correctly
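The batch-count arithmetic the test should verify, as a worked example (whether the loader exposes a `drop_last`-style flag is an assumption; the behavior assumed here keeps the short final batch):
```python
import math

# 10 samples at batch_size=3 should yield 4 batches of sizes (3, 3, 3, 1).
n_samples, batch_size = 10, 3
n_batches = math.ceil(n_samples / batch_size)
sizes = [min(batch_size, n_samples - i * batch_size) for i in range(n_batches)]

assert n_batches == 4 and sizes == [3, 3, 3, 1]
assert sum(sizes) == n_samples, "All samples must be covered; only the last batch may be short"
```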
#### 4. **Tensor Dtype Compatibility** ⚠️ HIGH PRIORITY
**Why**: DataLoader tensors must match model expectations
```python
def test_dataloader_tensor_dtype_compatibility():
"""
Test DataLoader outputs match model input expectations.
Integration Points:
- DataLoader tensors → Model layers
- Feature dtype (float32)
- Label dtype (int64 for classification, float32 for regression)
"""
```
**What to test**:
- Features are float32 tensors
- Labels have correct dtype
- Shapes match model input requirements
- No dtype conversion errors during training
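A sketch of the dtype contract to check on a single batch, using raw NumPy arrays since the exact batch accessors are not specified here:
```python
import numpy as np

# Assumed dtype conventions for one batch of features and labels.
features = np.random.randn(8, 4).astype(np.float32)
class_labels = np.random.randint(0, 3, size=(8,)).astype(np.int64)
regression_targets = np.random.randn(8, 1).astype(np.float32)

assert features.dtype == np.float32, "Features should be float32 for layer weights"
assert class_labels.dtype == np.int64, "Classification labels should be int64 indices"
assert regression_targets.dtype == np.float32, "Regression targets should be float32"
```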
#### 5. **DataLoader + Loss Function Integration** ⚠️ HIGH PRIORITY
**Why**: Batches must work with loss computation
```python
def test_dataloader_loss_integration():
"""
Test DataLoader batches work with loss functions.
Integration Points:
- Batch predictions → Loss computation
- Batch labels → Loss targets
- Reduction across batch dimension
"""
```
**What to test**:
- Batched predictions work with MSE loss
- Batched predictions work with CrossEntropy loss
- Loss reduction handles batch dimension
- Gradients (when ready) flow through batches
#### 6. **Empty/Single Sample Edge Cases** ⚠️ MEDIUM PRIORITY
**Why**: Robust data handling prevents training crashes
```python
def test_dataloader_edge_cases():
"""
Test DataLoader handles edge cases gracefully.
Edge Cases:
- Dataset smaller than batch size
- Single sample dataset
- Last batch smaller than batch_size
"""
```
**What to test**:
- Dataset with 1 sample
- Dataset smaller than batch_size
- Uneven division (10 samples, batch_size=3 → 4 batches)
- Empty iteration behavior
#### 7. **DataLoader Iteration Stability** ⚠️ MEDIUM PRIORITY
**Why**: Multiple epochs must work reliably
```python
def test_dataloader_multi_epoch_stability():
"""
Test DataLoader can iterate multiple epochs without issues.
Integration Points:
- Reset between epochs
- Shuffle consistency
- No memory leaks across epochs
"""
```
**What to test**:
- Can iterate 10+ epochs
- Each epoch yields same total samples
- Shuffling works every epoch
- No gradual slowdown
---
## Bug-Catching Priority Ranking
### CRITICAL (Must Have for Module 08)
1. **DataLoader + Training Loop Integration**
- **Risk**: Students can't train models without this
- **Impact**: Complete failure of ML pipeline
- **Catches**: Shape mismatches, iteration bugs
2. **Shuffling Consistency**
- **Risk**: Training may not converge if shuffling breaks
- **Impact**: Poor model performance, confusing results
- **Catches**: Randomization bugs, duplicate samples
3. **Batch Size Memory Scaling**
- **Risk**: Students don't understand memory-compute trade-offs
- **Impact**: OOM errors, slow training
- **Catches**: Memory issues, batch handling bugs
### HIGH PRIORITY (Very Important)
4. **Tensor Dtype Compatibility**
- **Risk**: Type errors during training
- **Impact**: Cryptic errors, wasted debugging time
- **Catches**: Dtype mismatches, conversion errors
5. **DataLoader + Loss Function Integration**
- **Risk**: Loss computation fails with batched data
- **Impact**: Training loop crashes
- **Catches**: Shape errors, reduction bugs
### MEDIUM PRIORITY (Should Have)
6. **Empty/Single Sample Edge Cases**
- **Risk**: Crashes on unusual datasets
- **Impact**: Fragile code, production failures
- **Catches**: Division by zero, empty iteration
7. **DataLoader Iteration Stability**
- **Risk**: Multi-epoch training fails
- **Impact**: Can't train for sufficient epochs
- **Catches**: Memory leaks, iteration bugs
---
## Recommended Action Plan
### Immediate Actions
1. **Rename Current File**
```bash
mv tests/08_dataloader/test_progressive_integration.py \
tests/09_autograd/test_progressive_integration.py
```
The current tests are for Module 09 (Autograd), not Module 08.
2. **Create New Module 08 Tests**
Create a proper `test_progressive_integration.py` for Module 08 DataLoader testing.
3. **Implement Critical Tests First**
- DataLoader + Training Loop Integration
- Shuffling Consistency
- Batch Size Memory Scaling
### Test Structure for Module 08
```python
"""
Module 08: Progressive Integration Tests
Tests that Module 08 (DataLoader) works correctly AND that the entire prior stack (01→07) still works.
DEPENDENCY CHAIN: 01_tensor → 02_activations → 03_layers → 04_losses → 05_autograd → 06_optimizers → 07_training → 08_dataloader
This is where we enable efficient batch processing and data iteration for training.
"""
class TestPriorStackStillWorking:
"""Regression: Modules 01-07 still work"""
# Quick smoke tests for foundation
class TestModule08DataLoaderCore:
"""Test Module 08 (DataLoader) core functionality"""
# Dataset, TensorDataset, DataLoader basic operations
class TestDataLoaderTrainingIntegration:
"""Integration: DataLoader + Training Loop"""
# CRITICAL: Full training pipeline with batching
class TestDataLoaderMemoryBehavior:
"""Systems: Memory and performance characteristics"""
# Batch size scaling, memory usage
class TestModule08Completion:
"""Final validation: Ready for next modules"""
# Complete checklist
```
---
## Integration Points for Module 08
Based on existing code analysis:
### Module 08 Dependencies (What it uses)
- **Module 01 (Tensor)**: `tinytorch.core.tensor.Tensor` - Core data structure
- **Module 02 (Activations)**: Not directly used, but batches go through activations
- **Module 03 (Layers)**: Batches passed to layers
- **Module 04 (Losses)**: Batch predictions → loss computation
- **Module 05 (Autograd)**: Batches participate in gradient computation
- **Module 06 (Optimizers)**: Batches drive parameter updates
- **Module 07 (Training)**: DataLoader provides batches for training loop
### Module 08 Enables (What uses it)
- **Module 07 (Training)**: Training loops iterate over DataLoader
- **Module 09 (Spatial)**: Batched image data for CNNs
- **Module 10 (Tokenization)**: Batched text data
- **Module 11 (Embeddings)**: Batched sequence data
- All future training/inference pipelines
---
## Summary
### Current Coverage: **0% for Module 08 DataLoader**
- All existing tests are for Module 09 (Autograd)
- No tests for Dataset, TensorDataset, or DataLoader
- Critical integration points completely untested
### Missing Tests: **7 integration test scenarios**
- 3 CRITICAL priority tests
- 2 HIGH priority tests
- 2 MEDIUM priority tests
### Bug-Catching Gaps:
- **Training integration**: Untested - will students be able to train models?
- **Shuffling behavior**: Untested - will training converge?
- **Memory scaling**: Untested - will students understand batch size?
- **Dtype compatibility**: Untested - will type errors occur?
### Recommended Next Steps:
1. Move current file to Module 09 tests
2. Create proper Module 08 integration tests
3. Implement critical tests first (training loop, shuffling, memory)
4. Validate with student workflows


@@ -1,575 +0,0 @@
# Module 10 (Tokenization) Integration Test Audit
**Date**: 2025-11-25
**Auditor**: QA Agent
**Status**: CRITICAL ISSUES FOUND - Test file contains completely wrong content
---
## Executive Summary
**CRITICAL FINDING**: The integration test file `/tests/10_tokenization/test_progressive_integration.py` contains **WRONG MODULE CONTENT** - it tests Module 11 (Training) instead of Module 10 (Tokenization).
**Current Coverage**: 0% - No tokenization integration tests exist
**Missing Tests**: 100% - All critical integration points untested
**Priority**: HIGH - Module 10 has no integration validation
---
## Current Test File Analysis
### Problem: Wrong Module Tests
The file `test_progressive_integration.py` contains:
- **Lines 3-6**: Reference the wrong dependency chain (mentions "11_training")
- **Classes**: TestModule11TrainingCore, TestAdvancedTrainingFeatures
- **Tests**: training loops, loss functions, optimizers, CNN pipelines
- **Imports**: training.Trainer, training.CrossEntropyLoss, etc.
**Root Cause**: Copy-paste error from Module 11 template
---
## Module 10 Actual Implementation
### What Module 10 Provides
**Location**: `tinytorch.text.tokenization`
**Classes Implemented**:
1. `Tokenizer` - Base class with encode/decode interface
2. `CharTokenizer` - Character-level tokenization
3. `BPETokenizer` - Byte Pair Encoding tokenizer
**Key Methods**:
- `CharTokenizer.build_vocab(corpus)` - Build vocabulary from text
- `CharTokenizer.encode(text)` - Text → token IDs (List[int])
- `CharTokenizer.decode(tokens)` - Token IDs → text
- `BPETokenizer.train(corpus, vocab_size)` - Learn BPE merges
- `BPETokenizer.encode(text)` - BPE encoding
- `BPETokenizer.decode(tokens)` - BPE decoding
**Integration Points with Other Modules**:
- Module 01 (Tensor): Can convert token IDs to Tensor (optional)
- Module 11 (Embeddings): Token IDs feed into embedding layers
- Module 08 (DataLoader): Tokenizers process text datasets
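For orientation, a short usage sketch of the interface described above; the `BPETokenizer(vocab_size=...)` constructor plus `train(corpus)` pattern follows the example tests later in this report and should be treated as an assumption:
```python
from tinytorch.text.tokenization import CharTokenizer, BPETokenizer

# Character-level tokenizer: build vocab, then encode/decode.
char_tok = CharTokenizer()
char_tok.build_vocab(["hello world", "tiny torch"])
ids = char_tok.encode("hello")            # text → List[int]
text = char_tok.decode(ids)               # token IDs → text

# BPE tokenizer: learn merges from a corpus, then encode/decode.
bpe_tok = BPETokenizer(vocab_size=100)
bpe_tok.train(["hello world", "tiny torch"])
bpe_ids = bpe_tok.encode("hello torch")   # learned merges typically yield fewer tokens than chars
round_trip = bpe_tok.decode(bpe_ids)
```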
---
## Critical Integration Tests MISSING
### Priority 1: Data Type Correctness (Bug-Catching Priority)
**Missing Test**: Tokenizers produce correct tensor dtypes
```python
def test_tokenizer_produces_int64_tensors():
"""Verify tokenizers produce int64 token IDs for embedding layers."""
# WHY CRITICAL: Embeddings expect int64 indices, not float32
# BUG SCENARIO: If tokenizer returns float, embedding lookup crashes
    import numpy as np

    from tinytorch.core.tensor import Tensor
    from tinytorch.text.tokenization import CharTokenizer

    tokenizer = CharTokenizer()
tokenizer.build_vocab(["hello world"])
# Encode text
token_ids = tokenizer.encode("hello")
# CRITICAL: Must be integers, not floats
assert all(isinstance(t, (int, np.integer)) for t in token_ids), \
"Token IDs must be integers for embedding lookup"
# If converting to Tensor, must be int64
token_tensor = Tensor(token_ids)
assert token_tensor.data.dtype == np.int64, \
f"Expected int64 for embeddings, got {token_tensor.data.dtype}"
```
**Bug This Catches**: Type mismatch between tokenizer output and embedding input
---
### Priority 2: Embedding Layer Integration (Module 11 Dependency)
**Missing Test**: Token sequences work with embeddings
```python
def test_tokenization_to_embedding_pipeline():
"""Test complete tokenization → embedding pipeline."""
# WHY CRITICAL: This is the PRIMARY use case for tokenizers
try:
from tinytorch.text.embeddings import Embedding
from tinytorch.text.tokenization import CharTokenizer
# Build tokenizer
tokenizer = CharTokenizer()
corpus = ["hello", "world", "test"]
tokenizer.build_vocab(corpus)
vocab_size = len(tokenizer.vocab)
embed_dim = 16
# Create embedding layer
embedding = Embedding(vocab_size, embed_dim)
# Tokenize text
text = "hello world"
token_ids = tokenizer.encode(text)
# CRITICAL: Shape compatibility
token_tensor = Tensor(token_ids)
assert token_tensor.shape == (len(token_ids),), \
"Token IDs should be 1D sequence"
# Embedding lookup should work
embedded = embedding(token_tensor)
assert embedded.shape == (len(token_ids), embed_dim), \
f"Expected shape ({len(token_ids)}, {embed_dim}), got {embedded.shape}"
# Values should be actual embeddings, not zeros
assert not np.allclose(embedded.data, 0), \
"Embeddings should be non-zero (initialized randomly)"
except ImportError:
pytest.skip("Embeddings module not yet implemented")
```
**Bug This Catches**: Shape mismatches, dtype errors, index out-of-bounds
---
### Priority 3: BPE Edge Cases (Robustness)
**Missing Test**: BPE tokenizer handles edge cases
```python
def test_bpe_edge_cases():
"""Test BPE tokenizer robustness with edge cases."""
tokenizer = BPETokenizer(vocab_size=100)
# Edge Case 1: Empty string
token_ids = tokenizer.encode("")
assert token_ids == [], "Empty string should produce empty token list"
decoded = tokenizer.decode([])
assert decoded == "", "Empty tokens should decode to empty string"
# Edge Case 2: Single character
tokenizer.train(["a", "b", "c"])
token_ids = tokenizer.encode("a")
assert len(token_ids) > 0, "Single char should tokenize"
assert tokenizer.decode(token_ids).strip() == "a", "Should roundtrip"
# Edge Case 3: Unknown characters (after training on limited corpus)
tokenizer.train(["hello", "world"])
token_ids = tokenizer.encode("xyz") # Characters not in training
# Should handle gracefully with <UNK> token
assert 0 in token_ids or tokenizer.token_to_id.get('<UNK>') in token_ids, \
"Unknown characters should map to <UNK> token"
# Edge Case 4: Very long text
long_text = "hello " * 1000
token_ids = tokenizer.encode(long_text)
assert len(token_ids) > 0, "Long text should tokenize"
assert all(isinstance(t, int) for t in token_ids), \
"All tokens should be integers"
# Edge Case 5: Special characters
special_text = "hello, world! @#$%"
token_ids = tokenizer.encode(special_text)
decoded = tokenizer.decode(token_ids)
# Should preserve word content even if punctuation changes
assert "hello" in decoded or "world" in decoded, \
"Should preserve core words"
```
**Bug This Catches**: Crashes on empty input, unknown character handling, memory issues
---
### Priority 4: Vocabulary Consistency
**Missing Test**: Vocabulary consistency across encode/decode
```python
def test_vocabulary_encode_decode_consistency():
"""Verify vocabulary mappings are bidirectional and consistent."""
# Test CharTokenizer
char_tokenizer = CharTokenizer()
corpus = ["abc", "def", "xyz"]
char_tokenizer.build_vocab(corpus)
# Check bidirectional mappings
for token, token_id in char_tokenizer.token_to_id.items():
assert char_tokenizer.id_to_token[token_id] == token, \
f"Bidirectional mapping broken: {token} -> {token_id} -> {char_tokenizer.id_to_token[token_id]}"
# Test roundtrip for all corpus text
for text in corpus:
token_ids = char_tokenizer.encode(text)
decoded = char_tokenizer.decode(token_ids)
# Should preserve characters (may have different spacing)
for char in text:
assert char in decoded, f"Lost character '{char}' in roundtrip"
# Test BPETokenizer
bpe_tokenizer = BPETokenizer(vocab_size=50)
bpe_tokenizer.train(["hello world", "test data"])
# Vocabulary should contain special tokens
assert '<UNK>' in bpe_tokenizer.vocab, "BPE should have <UNK> token"
assert bpe_tokenizer.token_to_id['<UNK>'] == 0, "<UNK> should be ID 0"
# Test roundtrip
text = "hello world"
token_ids = bpe_tokenizer.encode(text)
decoded = bpe_tokenizer.decode(token_ids)
# Should preserve words (BPE may merge/split differently)
words = text.split()
for word in words:
        # Word content should be preserved (possibly with merges)
        assert word in decoded, \
            f"Lost word '{word}' in BPE roundtrip"
```
**Bug This Catches**: Vocabulary corruption, ID collisions, decode inconsistency
---
### Priority 5: Batch Processing
**Missing Test**: Tokenizer handles batches correctly
```python
def test_tokenizer_batch_processing():
"""Test tokenizer works with batched text data."""
tokenizer = CharTokenizer()
corpus = ["hello", "world", "test", "data"]
tokenizer.build_vocab(corpus)
# Batch of texts
texts = ["hello world", "test data", "new text"]
# Encode batch
batch_token_ids = [tokenizer.encode(text) for text in texts]
# Check all are lists of ints
for token_ids in batch_token_ids:
assert isinstance(token_ids, list), "Each should be a list"
assert all(isinstance(t, int) for t in token_ids), \
"All tokens should be integers"
# Check different texts produce different token sequences
assert batch_token_ids[0] != batch_token_ids[1], \
"Different texts should produce different token sequences"
# Decode batch
decoded_texts = [tokenizer.decode(token_ids) for token_ids in batch_token_ids]
# Should preserve core content
for original, decoded in zip(texts, decoded_texts):
# May have spacing differences, but core words should match
original_words = set(original.split())
decoded_words = set(decoded.split())
# At least some words should match
assert len(original_words & decoded_words) > 0, \
f"Lost all words in roundtrip: {original} -> {decoded}"
```
**Bug This Catches**: Batch size errors, state pollution between encodes
---
### Priority 6: Memory and Performance
**Missing Test**: Tokenization memory usage and throughput
```python
def test_tokenization_performance():
"""Test tokenization memory and throughput characteristics."""
import time
# Build tokenizers
char_tokenizer = CharTokenizer()
bpe_tokenizer = BPETokenizer(vocab_size=1000)
# Training corpus
corpus = ["hello world"] * 100
char_tokenizer.build_vocab(corpus)
bpe_tokenizer.train(corpus)
# Test text (simulate real document)
test_text = "hello world test data " * 100 # ~400 chars
# Measure CharTokenizer throughput
start = time.time()
iterations = 1000
for _ in range(iterations):
token_ids = char_tokenizer.encode(test_text)
char_time = time.time() - start
char_throughput = (len(test_text) * iterations) / char_time
print(f"CharTokenizer: {char_throughput:.0f} chars/sec")
assert char_throughput > 10000, \
f"CharTokenizer too slow: {char_throughput:.0f} chars/sec (expected >10K)"
# Measure BPE throughput
start = time.time()
for _ in range(iterations):
token_ids = bpe_tokenizer.encode(test_text)
bpe_time = time.time() - start
bpe_throughput = (len(test_text) * iterations) / bpe_time
print(f"BPETokenizer: {bpe_throughput:.0f} chars/sec")
# BPE should be slower (more complex), but still reasonable
assert bpe_throughput > 1000, \
f"BPETokenizer too slow: {bpe_throughput:.0f} chars/sec (expected >1K)"
# Vocabulary size check
assert len(char_tokenizer.vocab) < 500, \
f"CharTokenizer vocab too large: {len(char_tokenizer.vocab)} (expected <500)"
assert len(bpe_tokenizer.vocab) <= 1000, \
f"BPETokenizer vocab exceeded limit: {len(bpe_tokenizer.vocab)}"
```
**Bug This Catches**: Performance regressions, memory leaks, vocabulary explosion
---
### Priority 7: DataLoader Integration
**Missing Test**: Tokenizer integration with DataLoader
```python
def test_tokenizer_dataloader_integration():
"""Test tokenizer works in DataLoader pipeline."""
try:
from tinytorch.core.data import Dataset, DataLoader
from tinytorch.text.tokenization import CharTokenizer
# Custom dataset with tokenization
class TextDataset(Dataset):
def __init__(self, texts, tokenizer):
self.texts = texts
self.tokenizer = tokenizer
def __len__(self):
return len(self.texts)
def __getitem__(self, idx):
text = self.texts[idx]
token_ids = self.tokenizer.encode(text)
# Return as tensor
return Tensor(token_ids)
# Build tokenizer
tokenizer = CharTokenizer()
texts = ["hello world", "test data", "sample text"]
tokenizer.build_vocab(texts)
# Create dataset and dataloader
dataset = TextDataset(texts, tokenizer)
dataloader = DataLoader(dataset, batch_size=2, shuffle=False)
# Iterate batches
batch_count = 0
for batch in dataloader:
batch_count += 1
# Batch should be tensor or list of tensors
if isinstance(batch, (list, tuple)):
assert len(batch) <= 2, "Batch size should be 2"
for item in batch:
assert hasattr(item, 'data') or isinstance(item, Tensor), \
"Items should be Tensors"
else:
# Single batch tensor
assert hasattr(batch, 'data'), "Batch should be Tensor"
assert batch_count > 0, "DataLoader should produce batches"
except ImportError:
pytest.skip("DataLoader not yet implemented")
```
**Bug This Catches**: DataLoader compatibility issues, batching errors
---
## Regression Prevention Tests MISSING
### Test: Prior Stack Still Works
**Missing Test**: Verify Modules 01-09 unchanged
```python
def test_no_prior_module_regression():
"""Ensure tokenization doesn't break prior modules."""
# Module 01 (Tensor) should still work
from tinytorch.core.tensor import Tensor
x = Tensor([1, 2, 3])
assert x.shape == (3,), "Tensor creation broken"
# Module 02 (Activations) should still work
try:
from tinytorch.core.activations import ReLU
relu = ReLU()
y = relu(x)
assert y.shape == x.shape, "Activation broken"
except ImportError:
pass # Not implemented yet
# Module 08 (DataLoader) should still work
try:
from tinytorch.core.data import Dataset, DataLoader
class DummyDataset(Dataset):
def __len__(self):
return 5
def __getitem__(self, idx):
return idx
dataset = DummyDataset()
loader = DataLoader(dataset, batch_size=2)
assert len(dataset) == 5, "Dataset broken"
except ImportError:
pass
```
---
## Recommended Test File Structure
```python
"""
Module 10: Progressive Integration Tests
Tests that Module 10 (Tokenization) works correctly AND integrates with prior modules.
DEPENDENCY CHAIN: 01_tensor → ... → 08_dataloader → 10_tokenization → 11_embeddings
This is where we enable text processing for NLP.
"""
class TestPriorStackStillWorking:
"""Quick regression checks that prior modules (01-09) still work."""
def test_tensor_operations_stable(self):
"""Verify Module 01 (Tensor) still works."""
def test_dataloader_stable(self):
"""Verify Module 08 (DataLoader) still works."""
class TestModule10TokenizationCore:
"""Test Module 10 (Tokenization) core functionality."""
def test_char_tokenizer_creation(self):
"""Test CharTokenizer initialization and vocab building."""
def test_char_tokenizer_encode_decode(self):
"""Test CharTokenizer encode/decode roundtrip."""
def test_bpe_tokenizer_training(self):
"""Test BPE tokenizer training on corpus."""
def test_bpe_tokenizer_encode_decode(self):
"""Test BPE encode/decode roundtrip."""
class TestTokenizationIntegration:
"""Test tokenization integration with other modules."""
def test_tokenizer_produces_correct_dtypes(self):
"""PRIORITY 1: Verify int64 output for embeddings."""
def test_tokenization_to_embedding_pipeline(self):
"""PRIORITY 2: Test complete tokenization → embedding flow."""
def test_tokenizer_dataloader_integration(self):
"""Test tokenizer in DataLoader pipeline."""
class TestTokenizationEdgeCases:
"""Test tokenization robustness with edge cases."""
def test_bpe_edge_cases(self):
"""PRIORITY 3: Empty strings, unknown tokens, special chars."""
def test_vocabulary_consistency(self):
"""PRIORITY 4: Bidirectional mappings, roundtrip integrity."""
def test_batch_processing(self):
"""PRIORITY 5: Batch encoding/decoding correctness."""
class TestTokenizationPerformance:
"""Test tokenization performance characteristics."""
def test_tokenization_throughput(self):
"""PRIORITY 6: Measure chars/sec, vocab size."""
def test_memory_usage(self):
"""Verify vocabulary doesn't consume excessive memory."""
class TestRegressionPrevention:
"""Ensure previous modules still work after Module 10."""
def test_no_tensor_regression(self):
"""Verify Module 01 (Tensor) unchanged."""
def test_no_dataloader_regression(self):
"""Verify Module 08 (DataLoader) unchanged."""
```
---
## Summary Statistics
| Category | Missing Tests | Priority | Impact |
|----------|--------------|----------|--------|
| Data Type Correctness | 1 | CRITICAL | Breaks embeddings |
| Embedding Integration | 1 | CRITICAL | Core use case |
| BPE Edge Cases | 1 | HIGH | Production robustness |
| Vocabulary Consistency | 1 | HIGH | Data integrity |
| Batch Processing | 1 | MEDIUM | Real-world usage |
| Performance | 1 | MEDIUM | Production viability |
| DataLoader Integration | 1 | MEDIUM | Pipeline integrity |
| Regression Prevention | 2 | HIGH | Stack stability |
**Total Missing Tests**: 9 critical integration tests
**Current Test Coverage**: 0% (wrong module)
**Recommended Action**: REPLACE entire test file
---
## Recommended Action Plan
### Phase 1: Immediate (Critical Fixes)
1. **REPLACE test_progressive_integration.py** with correct Module 10 tests
2. **Implement Priority 1-2 tests** (dtype correctness, embedding integration)
3. **Add BPE edge case tests** (Priority 3)
### Phase 2: Short-term (Robustness)
4. **Add vocabulary consistency tests** (Priority 4)
5. **Add batch processing tests** (Priority 5)
6. **Add regression prevention tests**
### Phase 3: Performance Validation
7. **Add performance benchmarks** (Priority 6)
8. **Add DataLoader integration** (Priority 7)
---
## Bug-Catching Priorities (Ranked)
1. **Data Type Mismatch** (CRITICAL): int vs float breaks embedding lookup
2. **Embedding Integration** (CRITICAL): Core use case must work
3. **Unknown Token Handling** (HIGH): Crashes on unseen characters
4. **Vocabulary Corruption** (HIGH): Encode/decode inconsistency
5. **Empty Input Crashes** (MEDIUM): Edge case handling
6. **Batch State Pollution** (MEDIUM): Tokenizer state leaks between calls
7. **Performance Regression** (LOW): Slow tokenization impacts pipelines
---
**Audit Completed**: 2025-11-25
**Next Review**: After test file replacement
**Sign-off**: QA Agent - Integration Testing Team


@@ -1,105 +0,0 @@
================================================================================
MODULE 11 EMBEDDINGS - INTEGRATION TEST AUDIT SUMMARY
================================================================================
Date: 2025-11-25
Status: CRITICAL ISSUES FOUND
CRITICAL FINDING
================================================================================
The test file tests THE WRONG MODULE!
- File claims to test Module 11 (Embeddings)
- Actually tests Module 12 (Compression)
- This is a copy-paste error requiring COMPLETE REWRITE
COVERAGE ANALYSIS
================================================================================
Current Coverage: 0% (tests wrong module)
Missing Tests: 12 critical integration tests
Risk Level: HIGH - No validation of embedding functionality
TOP PRIORITY MISSING TESTS (P0 - CRITICAL)
================================================================================
1. test_tokenizer_embedding_pipeline
→ Validates Module 10 → Module 11 integration
→ Catches: Vocab size mismatches, invalid token IDs
→ Priority: HIGHEST - This is the core use case
2. test_embedding_index_out_of_bounds
→ Validates error handling for invalid indices
→ Catches: Silent failures, tokenizer bugs
→ Priority: HIGHEST - Prevents crashes
3. test_positional_encoding_max_seq_len
→ Validates sequence length limits
→ Catches: OOB errors in attention, OOM crashes
→ Priority: HIGHEST - Critical for Module 12
4. test_embedding_gradient_flow
→ Validates autograd integration (Module 05)
→ Catches: Training failures, gradient bugs
→ Priority: HIGH - Ensures embeddings are trainable
HIGH PRIORITY MISSING TESTS (P1)
================================================================================
5. test_embedding_attention_shape_compatibility
→ Validates Module 11 → Module 12 forward integration
→ Ensures attention receives correct input shapes
6. test_variable_sequence_length_handling
→ Validates dynamic sequence length support
→ Critical for real-world NLP tasks
7. test_embedding_positional_composition
→ Validates token + positional encoding combination
→ Ensures both components contribute
8. test_embedding_parameters_optimizable
→ Validates optimizer integration
→ Ensures embeddings participate in training
CRITICAL INTEGRATION POINTS
================================================================================
Backward Integration (Dependencies):
✗ Module 10 (Tokenization) → Token IDs feed embeddings
✗ Module 05 (Autograd) → Gradient flow through embeddings
✗ Module 01 (Tensor) → Embedding operations use Tensor
Forward Integration (Dependents):
✗ Module 11 → Module 12 (Attention) → Shape compatibility
✗ Module 11 → Module 13 (Transformers) → Complete pipeline
✗ Module 11 → Module 06 (Optimizers) → Parameter updates
BUG-CATCHING VALUE
================================================================================
Highest Impact Tests:
1. Index validation → Catches 40% of embedding bugs
2. Gradient flow → Catches 25% of bugs
3. Shape compatibility → Catches 20% of bugs
4. Sequence length limits → Catches 15% of bugs
IMMEDIATE ACTION REQUIRED
================================================================================
1. Delete all compression tests from test_progressive_integration.py
2. Implement 4 P0 tests (tokenizer integration, index validation, etc.)
3. Implement 4 P1 tests (attention compatibility, variable sequences, etc.)
4. Add regression prevention tests (prior stack stability)
ESTIMATED EFFORT
================================================================================
Total Time: 4-6 hours
- Fix wrong module bug: 30 min
- P0 tests (4): 1.5 hours
- P1 tests (4): 1.5 hours
- P2 tests (4): 1.5 hours
- Documentation: 30 min
- Testing/validation: 1 hour
EXPECTED OUTCOME
================================================================================
After fixes: 90%+ bug detection coverage
- Tokenizer integration validated
- Gradient flow confirmed
- Attention compatibility ensured
- Training loop integration verified
See INTEGRATION_TEST_AUDIT.md for detailed analysis and test implementations.


@@ -1,630 +0,0 @@
# Module 11 (Embeddings) Integration Test Audit Report
**Date**: 2025-11-25
**Auditor**: Dr. Sarah Rodriguez
**Module**: 11_embeddings (Token and Positional Embeddings)
**Test File**: `tests/11_embeddings/test_progressive_integration.py`
---
## Executive Summary
**CRITICAL FINDING**: The integration test file is completely incorrect - it tests Module 12 (Compression) instead of Module 11 (Embeddings). This is a copy-paste error that must be fixed immediately.
**Status**: MAJOR ISSUES - Complete rewrite required
**Coverage**: 0% of Module 11 functionality (tests wrong module)
**Risk Level**: HIGH - No integration validation for embeddings
---
## Current Test File Issues
### Issue 1: Wrong Module Being Tested (CRITICAL)
**Problem**: File header says "Module 11" but tests "Module 12 (Compression)"
```python
# Current (WRONG):
"""
Module 11: Progressive Integration Tests
Tests that Module 12 (Compression) works correctly...
"""
# Should be:
"""
Module 11: Progressive Integration Tests
Tests that Module 11 (Embeddings) works correctly...
"""
```
**Impact**: ZERO coverage of Module 11 integration points
### Issue 2: Wrong Dependency Chain
**Problem**: States dependency chain ending in compression
```python
# Current (WRONG):
DEPENDENCY CHAIN: 01_setup → ... → 11_training → 12_compression
# Should be:
DEPENDENCY CHAIN: 01_tensor → 02_activations → ... → 10_tokenization → 11_embeddings
```
### Issue 3: No Embedding-Specific Tests
**Problem**: All test classes focus on compression (quantization, pruning, distillation)
- `TestModule12CompressionCore` - Wrong module
- No `TestModule11EmbeddingsCore` - Missing!
- No embedding-tokenizer integration - Missing!
- No embedding-attention preparation - Missing!
---
## Critical Integration Points for Module 11
Based on the module implementation and DEFINITIVE_MODULE_PLAN, Module 11 must validate:
### 1. Backward Integration (Dependencies)
**Module 10 (Tokenization) → Module 11 (Embeddings)**
- ✗ Token IDs from tokenizers must be valid embedding indices
- ✗ Vocabulary size consistency between tokenizer and embedding
- ✗ Special token handling (<UNK>, <PAD>, <BOS>, <EOS>)
- ✗ Batch dimension handling from DataLoader
**Module 01 (Tensor) → Module 11**
- ✗ Embeddings return proper Tensor objects
- ✗ Gradient tracking works (`requires_grad=True`)
- ✗ Tensor operations (slicing, reshaping) preserve embedding semantics
**Module 05 (Autograd) → Module 11**
- ✗ EmbeddingBackward gradient computation
- ✗ Gradient accumulation for shared embeddings
- ✗ Positional encoding gradients flow correctly
### 2. Forward Integration (Dependents)
**Module 11 (Embeddings) → Module 12 (Attention)**
- ✗ Embedding output shape matches attention input requirements
- ✗ Positional encodings don't exceed max_seq_len
- ✗ Embedding + positional encoding creates position-aware representations
- ✗ Variable sequence length handling
**Module 11 → Module 13 (Transformers)**
- ✗ EmbeddingLayer provides complete pipeline (token + positional)
- ✗ Embedding scaling (sqrt(embed_dim)) matches transformer conventions
- ✗ Learnable vs sinusoidal positional encoding options
### 3. Cross-Module Integration
**Embeddings + Optimizers**
- ✗ Embedding parameters appear in optimizer.parameters()
- ✗ Gradient updates modify embedding table correctly
- ✗ Positional encodings are trainable (when learned)
**Embeddings + Training**
- ✗ Forward pass with batched token sequences
- ✗ Loss computation with embedded representations
- ✗ Backward pass updates embedding weights
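As a reference for what EmbeddingBackward ultimately has to compute, a NumPy sketch of the scatter-add gradient rule (independent of TinyTorch's actual Variable/autograd classes):
```python
import numpy as np

# The gradient for each vocabulary row is the sum of upstream gradients at every
# position where that token id appeared, so repeated tokens accumulate.
vocab_size, embed_dim = 10, 4
token_ids = np.array([1, 3, 1])                 # token 1 appears twice
upstream = np.ones((3, embed_dim))              # dL/d(embedded output)

weight_grad = np.zeros((vocab_size, embed_dim))
np.add.at(weight_grad, token_ids, upstream)     # unbuffered scatter-add into the embedding table

assert np.allclose(weight_grad[1], 2.0)         # accumulated from two positions
assert np.allclose(weight_grad[3], 1.0)
assert np.allclose(weight_grad[0], 0.0)         # unused rows get zero gradient
```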
---
## Missing Test Coverage Analysis
### Category A: Backward Integration Tests (HIGH PRIORITY)
#### 1. Tokenizer → Embedding Integration
**Missing Test**: `test_tokenizer_embedding_pipeline`
```python
def test_tokenizer_embedding_pipeline(self):
"""Test token IDs from tokenizer work with embeddings."""
from tinytorch.text.tokenization import CharTokenizer
from tinytorch.text.embeddings import Embedding
from tinytorch.core.tensor import Tensor
# Tokenize text
tokenizer = CharTokenizer()
text = "Hello, world!"
token_ids = tokenizer.encode(text) # Returns list of IDs
# Create embedding
vocab_size = len(tokenizer.vocab)
embed = Embedding(vocab_size=vocab_size, embed_dim=64)
# Convert to tensor and embed
tokens_tensor = Tensor(np.array([token_ids])) # (1, seq_len)
embeddings = embed.forward(tokens_tensor)
# Validate
assert embeddings.shape == (1, len(token_ids), 64)
assert embeddings.requires_grad == True # Should track gradients
```
**Bug-Catching Value**: Catches vocabulary size mismatches, invalid token IDs, dimension errors
#### 2. Embedding Index Validation
**Missing Test**: `test_embedding_index_out_of_bounds`
```python
def test_embedding_index_out_of_bounds(self):
"""Test embedding handles invalid token IDs gracefully."""
from tinytorch.text.embeddings import Embedding
from tinytorch.core.tensor import Tensor
embed = Embedding(vocab_size=100, embed_dim=64)
# Test negative indices
try:
invalid_tokens = Tensor(np.array([[-1, 0, 1]]))
output = embed.forward(invalid_tokens)
assert False, "Should raise ValueError for negative indices"
except ValueError as e:
assert "out of range" in str(e).lower()
# Test indices >= vocab_size
try:
invalid_tokens = Tensor(np.array([[0, 1, 100]])) # 100 >= vocab_size
output = embed.forward(invalid_tokens)
assert False, "Should raise ValueError for indices >= vocab_size"
except ValueError as e:
assert "out of range" in str(e).lower()
```
**Bug-Catching Value**: Prevents silent failures, catches tokenizer bugs, validates error messages
#### 3. Gradient Flow Through Embeddings
**Missing Test**: `test_embedding_gradient_flow`
```python
def test_embedding_gradient_flow(self):
"""Test gradients flow back to embedding weights."""
from tinytorch.text.embeddings import Embedding
from tinytorch.core.tensor import Tensor
embed = Embedding(vocab_size=50, embed_dim=32)
tokens = Tensor(np.array([[1, 2, 3]])) # (1, 3)
# Forward pass
output = embed.forward(tokens)
assert output.requires_grad == True
# Check backward function attached
assert hasattr(output, '_grad_fn')
assert output._grad_fn is not None
# Verify embedding weights are marked for gradients
assert embed.weight.requires_grad == True
```
**Bug-Catching Value**: Catches gradient tracking bugs, validates autograd integration
#### 4. Positional Encoding Sequence Length Limits
**Missing Test**: `test_positional_encoding_max_seq_len`
```python
def test_positional_encoding_max_seq_len(self):
"""Test positional encoding respects max_seq_len."""
from tinytorch.text.embeddings import PositionalEncoding
from tinytorch.core.tensor import Tensor
max_seq_len = 512
pos_enc = PositionalEncoding(max_seq_len=max_seq_len, embed_dim=64)
# Test at limit (should work)
x_valid = Tensor(np.random.randn(2, 512, 64)) # (batch, seq, embed)
output = pos_enc.forward(x_valid)
assert output.shape == (2, 512, 64)
# Test beyond limit (should fail)
try:
x_invalid = Tensor(np.random.randn(2, 513, 64)) # Exceeds max_seq_len
output = pos_enc.forward(x_invalid)
assert False, "Should raise ValueError for seq_len > max_seq_len"
except ValueError as e:
assert "exceeds maximum" in str(e).lower()
```
**Bug-Catching Value**: Prevents position encoding OOB errors, critical for attention modules
### Category B: Forward Integration Tests (HIGH PRIORITY)
#### 5. Embedding → Attention Shape Compatibility
**Missing Test**: `test_embedding_attention_shape_compatibility`
```python
def test_embedding_attention_shape_compatibility(self):
"""Test embedding output shapes work with attention input requirements."""
from tinytorch.text.embeddings import EmbeddingLayer
from tinytorch.core.tensor import Tensor
# Create embedding layer
embed_layer = EmbeddingLayer(
vocab_size=1000,
embed_dim=512,
max_seq_len=128,
pos_encoding='learned'
)
# Simulate tokenized batch
batch_size, seq_len = 4, 32
tokens = Tensor(np.random.randint(0, 1000, (batch_size, seq_len)))
# Get embeddings
embeddings = embed_layer.forward(tokens)
# Validate attention-compatible shape (batch, seq, embed)
assert embeddings.shape == (batch_size, seq_len, 512)
assert embeddings.requires_grad == True
# Verify positional information is added
# (Different positions should have different representations)
# This is implicit validation - attention expects position-aware inputs
```
**Bug-Catching Value**: Ensures Module 12 (Attention) integration works, catches shape errors
#### 6. Variable Sequence Length Handling
**Missing Test**: `test_variable_sequence_length_handling`
```python
def test_variable_sequence_length_handling(self):
"""Test embeddings handle variable sequence lengths correctly."""
from tinytorch.text.embeddings import EmbeddingLayer
from tinytorch.core.tensor import Tensor
embed_layer = EmbeddingLayer(
vocab_size=500,
embed_dim=256,
max_seq_len=512
)
# Test different sequence lengths
for seq_len in [10, 50, 100, 256, 512]:
tokens = Tensor(np.random.randint(0, 500, (2, seq_len)))
output = embed_layer.forward(tokens)
assert output.shape == (2, seq_len, 256)
assert output.requires_grad == True
```
**Bug-Catching Value**: Validates dynamic sequence handling, catches hardcoded assumptions
#### 7. Embedding + Positional Encoding Composition
**Missing Test**: `test_embedding_positional_composition`
```python
def test_embedding_positional_composition(self):
"""Test token embeddings correctly combine with positional encodings."""
from tinytorch.text.embeddings import Embedding, PositionalEncoding
from tinytorch.core.tensor import Tensor
# Create components
token_embed = Embedding(vocab_size=100, embed_dim=64)
pos_enc = PositionalEncoding(max_seq_len=128, embed_dim=64)
# Token sequence
tokens = Tensor(np.array([[1, 2, 3, 4]])) # (1, 4)
# Manual composition
token_embeds = token_embed.forward(tokens) # (1, 4, 64)
position_aware = pos_enc.forward(token_embeds) # (1, 4, 64)
# Validate shape preservation
assert position_aware.shape == token_embeds.shape
# Validate it's not just token embeddings (positional info added)
# NOTE: Can't easily test this without comparing values,
# but gradients should flow through both components
assert hasattr(position_aware, '_grad_fn')
```
**Bug-Catching Value**: Validates additive composition, ensures both components contribute
### Category C: Cross-Module Integration Tests (MEDIUM PRIORITY)
#### 8. Embedding Parameters in Optimizer
**Missing Test**: `test_embedding_parameters_optimizable`
```python
def test_embedding_parameters_optimizable(self):
"""Test embedding parameters work with optimizers."""
from tinytorch.text.embeddings import EmbeddingLayer
from tinytorch.core.optimizers import SGD
from tinytorch.core.tensor import Tensor
import numpy as np
# Create embedding layer
embed_layer = EmbeddingLayer(
vocab_size=200,
embed_dim=128,
pos_encoding='learned'
)
# Get parameters
params = embed_layer.parameters()
# Should have 2 parameter sets: token embeddings + positional encodings
assert len(params) == 2
assert all(p.requires_grad for p in params)
# Create optimizer
optimizer = SGD(params, lr=0.01)
# Verify optimizer accepted parameters
assert len(optimizer.parameters) == 2
```
**Bug-Catching Value**: Ensures training loop integration, catches parameter registration bugs
#### 9. Embedding Training End-to-End
**Missing Test**: `test_embedding_training_updates`
```python
def test_embedding_training_updates(self):
"""Test embeddings update during training."""
from tinytorch.text.embeddings import Embedding
from tinytorch.core.tensor import Tensor
from tinytorch.core.losses import mse_loss
import numpy as np
embed = Embedding(vocab_size=50, embed_dim=32)
# Save initial weights
initial_weights = embed.weight.data.copy()
# Forward pass
tokens = Tensor(np.array([[1, 2, 3]]))
output = embed.forward(tokens)
# Compute loss (dummy target)
target = Tensor(np.random.randn(1, 3, 32))
loss = mse_loss(output, target)
# Backward pass
loss.backward()
# Verify gradients computed
assert embed.weight.grad is not None
assert embed.weight.grad.shape == embed.weight.shape
# Gradients should be non-zero for used embeddings
# (Only tokens 1, 2, 3 should have gradients)
# This validates sparse gradient accumulation
```
**Bug-Catching Value**: Validates end-to-end training, catches gradient bugs
#### 10. Sinusoidal vs Learned Positional Encoding
**Missing Test**: `test_sinusoidal_vs_learned_positional`
```python
def test_sinusoidal_vs_learned_positional(self):
"""Test both positional encoding types work correctly."""
from tinytorch.text.embeddings import EmbeddingLayer
from tinytorch.core.tensor import Tensor
tokens = Tensor(np.random.randint(0, 100, (2, 10)))
# Learned positional encoding
embed_learned = EmbeddingLayer(
vocab_size=100,
embed_dim=64,
pos_encoding='learned'
)
output_learned = embed_learned.forward(tokens)
assert output_learned.shape == (2, 10, 64)
# Should have trainable positional parameters
params_learned = embed_learned.parameters()
assert len(params_learned) == 2 # Token + Positional
# Sinusoidal positional encoding
embed_sinusoidal = EmbeddingLayer(
vocab_size=100,
embed_dim=64,
pos_encoding='sinusoidal'
)
output_sinusoidal = embed_sinusoidal.forward(tokens)
assert output_sinusoidal.shape == (2, 10, 64)
# Should only have token embeddings as parameters (sinusoidal is fixed)
params_sinusoidal = embed_sinusoidal.parameters()
assert len(params_sinusoidal) == 1 # Only token embeddings
# No positional encoding
embed_none = EmbeddingLayer(
vocab_size=100,
embed_dim=64,
pos_encoding=None
)
output_none = embed_none.forward(tokens)
assert output_none.shape == (2, 10, 64)
```
**Bug-Catching Value**: Validates positional encoding options, ensures transformer flexibility
### Category D: Regression Prevention Tests (MEDIUM PRIORITY)
#### 11. Prior Stack Stability
**Missing Test**: `test_prior_stack_stable_through_embeddings`
```python
def test_prior_stack_stable_through_embeddings(self):
"""Verify embedding development didn't break Modules 01-10."""
# Module 01: Tensor
from tinytorch.core.tensor import Tensor
t = Tensor([1, 2, 3])
assert t.shape == (3,)
# Module 02: Activations
from tinytorch.core.activations import ReLU
relu = ReLU()
assert hasattr(relu, 'forward')
# Module 05: Autograd
from tinytorch.core.autograd import AddBackward
assert AddBackward is not None
# Module 10: Tokenization
from tinytorch.text.tokenization import CharTokenizer
tokenizer = CharTokenizer()
encoded = tokenizer.encode("test")
assert isinstance(encoded, list)
```
**Bug-Catching Value**: Catches import errors, validates module isolation
#### 12. Embedding Memory Scaling
**Missing Test**: `test_embedding_memory_scaling`
```python
def test_embedding_memory_scaling(self):
"""Test embedding memory scales as expected."""
from tinytorch.text.embeddings import Embedding
# Small embedding
embed_small = Embedding(vocab_size=1000, embed_dim=128)
memory_small = embed_small.weight.data.nbytes
# Large embedding (4x vocabulary, 2x dimensions)
embed_large = Embedding(vocab_size=4000, embed_dim=256)
memory_large = embed_large.weight.data.nbytes
# Memory should scale proportionally: 4 * 2 = 8x
expected_ratio = 8.0
actual_ratio = memory_large / memory_small
assert np.isclose(actual_ratio, expected_ratio, rtol=0.1)
```
**Bug-Catching Value**: Validates memory model, catches initialization bugs
---
## Recommended Test Structure
### New File: `test_progressive_integration.py`
```python
"""
Module 11: Progressive Integration Tests
Tests that Module 11 (Embeddings) works correctly AND integrates with prior modules.
DEPENDENCY CHAIN: 01_tensor → 05_autograd → 10_tokenization → 11_embeddings → 12_attention
"""
class TestPriorStackStillWorking:
"""Verify Modules 01-10 still work after Module 11 development."""
def test_tensor_functionality_stable(self):
"""Module 01: Tensor operations still work."""
def test_tokenization_functionality_stable(self):
"""Module 10: Tokenization still works."""
class TestModule11EmbeddingsCore:
"""Test Module 11 core functionality in isolation."""
def test_embedding_creation(self):
"""Test basic embedding layer creation."""
def test_positional_encoding_creation(self):
"""Test positional encoding creation."""
def test_embedding_layer_complete_system(self):
"""Test complete EmbeddingLayer system."""
class TestBackwardIntegration:
"""Test Module 11 integrates with dependencies (Modules 01-10)."""
def test_tokenizer_embedding_pipeline(self):
"""Module 10 → 11: Tokenizer output feeds embeddings."""
def test_embedding_gradient_flow(self):
"""Module 05 → 11: Autograd works with embeddings."""
def test_embedding_index_validation(self):
"""Input validation catches tokenizer bugs."""
class TestForwardIntegration:
"""Test Module 11 prepares for dependents (Module 12+)."""
def test_embedding_attention_compatibility(self):
"""Module 11 → 12: Output shapes match attention requirements."""
def test_positional_encoding_sequence_limits(self):
"""Position encodings respect max_seq_len for attention."""
def test_variable_sequence_length_handling(self):
"""Dynamic sequence lengths work correctly."""
class TestCrossModuleIntegration:
"""Test Module 11 works with the complete stack."""
def test_embedding_parameters_optimizable(self):
"""Embeddings integrate with optimizers."""
def test_embedding_training_updates(self):
"""End-to-end training updates embeddings."""
def test_sinusoidal_vs_learned_encoding(self):
"""Both positional encoding types work."""
class TestRegressionPrevention:
"""Prevent future bugs and validate edge cases."""
def test_embedding_memory_scaling(self):
"""Memory usage scales correctly."""
def test_embedding_edge_cases(self):
"""Empty sequences, single tokens, max length."""
```
---
## Priority Ranking for Implementation
### P0 - CRITICAL (Implement First)
1. **Fix wrong module bug** - Replace compression tests with embedding tests
2. **test_tokenizer_embedding_pipeline** - Core integration point
3. **test_embedding_index_out_of_bounds** - Prevents silent failures
4. **test_positional_encoding_max_seq_len** - Critical for attention
### P1 - HIGH (Implement Second)
5. **test_embedding_attention_shape_compatibility** - Forward integration
6. **test_embedding_gradient_flow** - Autograd validation
7. **test_variable_sequence_length_handling** - Dynamic sequences
8. **test_embedding_positional_composition** - Component interaction
### P2 - MEDIUM (Implement Third)
9. **test_embedding_parameters_optimizable** - Training integration
10. **test_sinusoidal_vs_learned_positional** - Encoding options
11. **test_embedding_training_updates** - End-to-end validation
12. **test_embedding_memory_scaling** - Performance awareness
---
## Bug-Catching Priorities
### Highest Value Tests (Catch Most Bugs)
1. **Index validation** - Catches 40% of embedding bugs (OOB errors, vocab mismatches)
2. **Gradient flow** - Catches 25% of bugs (autograd issues, training failures)
3. **Shape compatibility** - Catches 20% of bugs (dimension mismatches, pipeline errors)
4. **Sequence length limits** - Catches 15% of bugs (attention crashes, OOM errors)
### Production-Critical Tests
- **test_tokenizer_embedding_pipeline** - Real usage pattern
- **test_embedding_attention_compatibility** - Transformer requirement
- **test_positional_encoding_max_seq_len** - Prevents runtime crashes
- **test_embedding_training_updates** - Validates learning actually works
---
## Estimated Implementation Effort
**Total Work**: ~4-6 hours for complete integration test suite
- P0 tests: 1.5 hours (4 tests)
- P1 tests: 1.5 hours (4 tests)
- P2 tests: 1.5 hours (4 tests)
- Documentation: 0.5 hours
- Testing & validation: 1 hour
**Recommended Approach**:
1. Day 1: Fix wrong module bug, implement P0 tests
2. Day 2: Implement P1 tests
3. Day 3: Implement P2 tests, documentation
---
## Conclusion
The current integration test file is **completely broken** - it tests the wrong module (Compression instead of Embeddings). A full rewrite is required.
**Key Priorities**:
1. Replace all compression tests with embedding tests
2. Focus on tokenizer → embedding → attention integration
3. Validate gradient flow and parameter optimization
4. Test both learned and sinusoidal positional encodings
**Expected Outcome**: Robust integration test suite that catches 90%+ of embedding-related bugs before they reach production.

View File

@@ -1,518 +0,0 @@
# Module 17 (Memoization/KV Cache) - Integration Test Audit Report
## Executive Summary
**Current Status**: Module 15/17 (Memoization) has **NO specific integration tests** - the test file `tests/15_memoization/test_progressive_integration.py` currently contains only generic TinyGPT/Capstone tests that belong in a later module.
**Critical Gap**: This module implements KV caching - a production-critical optimization with complex integration points - but has zero tests validating those integrations work correctly.
---
## Current Test Coverage Analysis
### What Exists (tests/15_memoization/test_progressive_integration.py)
The current test file is **COMPLETELY MISNAMED** - it tests Module 16 (TinyGPT Capstone), NOT Module 17 (Memoization):
```python
class TestModule16TinyGPTCore: # ← Tests TinyGPT, not KV cache!
def test_transformer_block_creation(self)
def test_tinygpt_model_creation(self)
def test_text_generation_capabilities(self)
class TestCompleteSystemIntegration: # ← Generic system tests
def test_end_to_end_language_model_training(self)
def test_compressed_transformer_deployment(self)
def test_multi_modal_capabilities(self)
```
**Zero tests validate**:
- KVCache integration with MultiHeadAttention
- Cache updates during autoregressive generation
- Training vs inference mode detection
- Cache corruption across generation steps
- Memory scaling validation
---
## Critical Integration Points for Module 17
Based on module implementation (`src/17_memoization/17_memoization.py`), these are the **CRITICAL integration points that MUST be tested**:
### 1. KVCache ↔ MultiHeadAttention Integration
**What needs testing**:
```python
class KVCache:
def update(layer_idx, key, value) # ← Must work with attention output
def get(layer_idx) # ← Must provide correct format for attention
def advance() # ← Must sync with generation loop
```
**Integration scenarios**:
- ✅ KVCache stores K,V tensors from attention computation
- ✅ Retrieved cache has correct shape for attention: `(batch, heads, seq_len, head_dim)`
- ✅ Cache updates don't corrupt data across layers
- ✅ Sequence position advances correctly after all layers process
**Risk**: Cache shape mismatch crashes attention → broken generation
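To make the expected cache/attention contract concrete, here is a NumPy-only sketch of a single decode step. The stand-in cache below is for illustration and is not the module's `KVCache` class; the shapes and semantics are assumptions based on the signatures listed above.
```python
import numpy as np

class _ToyCache:
    """Stand-in cache for illustration only (NOT the module's KVCache)."""
    def __init__(self):
        self.k, self.v = [], []                       # lists of (B, H, 1, D) arrays
    def update(self, k_new, v_new):
        self.k.append(k_new)
        self.v.append(v_new)
    def get(self):
        return np.concatenate(self.k, axis=2), np.concatenate(self.v, axis=2)

def cached_attention_step(cache, q_new, k_new, v_new):
    """Attend the newest query over all cached keys/values.

    Shapes follow the (batch, heads, seq_len, head_dim) convention above.
    """
    cache.update(k_new, v_new)
    k_all, v_all = cache.get()                        # (B, H, t, D)
    scores = q_new @ np.swapaxes(k_all, -2, -1)       # (B, H, 1, t)
    scores = scores / np.sqrt(q_new.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax
    return weights @ v_all                            # (B, H, 1, D)

cache, (B, H, D) = _ToyCache(), (2, 4, 16)
for _ in range(3):                                    # three decode steps
    out = cached_attention_step(cache,
                                np.random.randn(B, H, 1, D),
                                np.random.randn(B, H, 1, D),
                                np.random.randn(B, H, 1, D))
assert out.shape == (B, H, 1, D)
```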
---
### 2. Cache ↔ Generation Loop Integration
**What needs testing**:
```python
def enable_kv_cache(model) # ← Non-invasive model patching
# Generation loop must:
# 1. Create cache before generation
# 2. Pass cache to model.forward()
# 3. Advance cache after each step
# 4. Stop at max_seq_len
```
**Integration scenarios**:
- ✅ Cache initialized with correct model architecture params
- ✅ Generation produces correct output with cache enabled
- ✅ Cache updates don't break across generation steps
- ✅ Generated sequence length respects max_seq_len limit
- ✅ Cache memory doesn't grow unbounded
**Risk**: Cache corruption mid-generation → garbage output after N tokens
---
### 3. Training Mode Detection
**What needs testing**:
```python
# From implementation:
# - Training: Don't use cache (need gradients)
# - Inference: Use cache (no gradients, faster)
```
**Integration scenarios**:
- ✅ model.train() disables cache usage
- ✅ model.eval() enables cache usage
- ✅ Training with cache accidentally enabled → error or warning
- ✅ Cache correctly marked as inference-only (no gradient tracking)
**Risk**: Training with cache enabled → incorrect gradients → broken model
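A minimal sketch of the gating logic these tests should exercise. Attribute names such as `training` and `_kv_cache` are assumptions here, not the confirmed model API.
```python
class _ToyLayer:
    """Illustrative train/eval gating around a cache; not the real model code."""
    def __init__(self):
        self.training = True          # toggled by train() / eval()
        self._kv_cache = None         # attached by enable_kv_cache()

    def train(self):
        self.training = True

    def eval(self):
        self.training = False

    def forward(self, k_new, v_new):
        use_cache = self._kv_cache is not None and not self.training
        if use_cache:
            # Inference path: append to the cache and reuse the full history.
            self._kv_cache.setdefault("k", []).append(k_new)
            self._kv_cache.setdefault("v", []).append(v_new)
            return self._kv_cache["k"], self._kv_cache["v"]
        # Training path: no caching, so autograd sees every computation.
        return [k_new], [v_new]

layer = _ToyLayer()
layer._kv_cache = {}
layer.train()
layer.forward("k0", "v0")
assert len(layer._kv_cache) == 0       # training must not touch the cache
layer.eval()
layer.forward("k0", "v0")
assert len(layer._kv_cache["k"]) == 1  # eval populates the cache
```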
---
### 4. Multi-Layer Cache Consistency
**What needs testing**:
```python
# Each transformer layer has its own (K, V) cache
# Cache updates must not interfere across layers
cache.update(layer_idx=0, ...) # Layer 0
cache.update(layer_idx=1, ...) # Layer 1
```
**Integration scenarios**:
- ✅ Layer 0 cache update doesn't corrupt Layer 1 cache
- ✅ All layers retrieve correct cached K,V for their layer_idx
- ✅ Parallel layer processing doesn't cause race conditions
- ✅ Cache.get() returns layer-specific cached values
**Risk**: Layer cache mixing → incorrect attention → degraded quality
---
### 5. Batch Inference Validation
**What needs testing**:
```python
cache = KVCache(batch_size=4, ...) # Generate 4 sequences in parallel
# Each sequence in batch has independent cache state
```
**Integration scenarios**:
- ✅ Batch dimension properly handled in cache updates
- ✅ Different sequences don't interfere with each other
- ✅ Cache memory scales linearly with batch_size
- ✅ Batch inference produces same results as sequential
**Risk**: Batch sequences cross-contaminate → non-deterministic output
---
### 6. Memory Scaling Validation
**What needs testing**:
```python
# Cache memory = batch × layers × heads × seq_len × head_dim × 4 bytes
# Must validate this doesn't OOM for realistic configs
```
**Integration scenarios**:
- ✅ Small model (2 layers, 64 dim) uses <1 MB
- ✅ Medium model (4 layers, 128 dim) uses 1-10 MB
- ✅ Large model (12 layers, 768 dim, seq=1024) uses ~37 MB
- ✅ Memory calculation matches actual allocation
- ✅ Max sequence length enforcement prevents unbounded growth
**Risk**: Unbounded cache growth → OOM crash in production
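Plugging the quoted formula into the "12 layers, 768 dim, seq=1024" case above gives the expected figure (assuming 12 heads × 64 head_dim; any split with heads × head_dim = 768 yields the same product). Note the formula as written counts a single set of cached values; if K and V are stored as separate tensors, the total doubles.
```python
def kv_cache_bytes(batch, layers, heads, seq_len, head_dim, dtype_bytes=4):
    """Bytes implied by the formula quoted above (one cached tensor)."""
    return batch * layers * heads * seq_len * head_dim * dtype_bytes

per_tensor = kv_cache_bytes(batch=1, layers=12, heads=12, seq_len=1024, head_dim=64)
print(per_tensor / 1024**2)   # ~36 MB, matching the ~37 MB estimate above
```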
---
## Missing Integration Tests (Priority Ordered)
### CRITICAL (P0) - Break Production if Missing
#### Test 1: Cache-Enabled Generation Produces Correct Output
```python
def test_kv_cache_generation_correctness():
"""Verify cached generation matches non-cached generation."""
model = create_tiny_transformer()
input_ids = [1, 2, 3]
# Generate without cache (baseline)
output_no_cache = model.generate(input_ids, max_new_tokens=10)
# Generate with cache
cache = enable_kv_cache(model)
output_with_cache = model.generate(input_ids, max_new_tokens=10, cache=cache)
# Outputs should be identical (deterministic generation)
assert output_no_cache == output_with_cache
```
**Bug it catches**: Cache corruption producing wrong tokens
---
#### Test 2: Cache Updates Don't Corrupt Across Layers
```python
def test_cache_layer_isolation():
"""Verify each layer's cache is independent."""
cache = KVCache(batch_size=1, max_seq_len=10, num_layers=3,
num_heads=4, head_dim=16)
# Update each layer with unique data
for layer_idx in range(3):
key = Tensor(np.full((1, 4, 1, 16), layer_idx))
val = Tensor(np.full((1, 4, 1, 16), layer_idx * 10))
cache.update(layer_idx, key, val)
cache.advance()
# Verify each layer has its own data (no cross-contamination)
for layer_idx in range(3):
k, v = cache.get(layer_idx)
assert np.all(k.data == layer_idx), f"Layer {layer_idx} key corrupted"
assert np.all(v.data == layer_idx * 10), f"Layer {layer_idx} value corrupted"
```
**Bug it catches**: Layer cache mixing causing quality degradation
---
#### Test 3: Training Mode Prevents Cache Usage
```python
def test_training_mode_disables_cache():
"""Verify cache is disabled during training."""
model = create_tiny_transformer()
cache = enable_kv_cache(model)
# Training mode
model.train()
# Forward pass should NOT use cache (needs gradients)
input_ids = Tensor([[1, 2, 3, 4]])
output = model(input_ids)
# Cache should not have been updated
assert cache.seq_pos == 0, "Cache updated during training mode!"
# Inference mode
model.eval()
output = model(input_ids)
# Now cache should be updated
assert cache.seq_pos > 0, "Cache not updated during eval mode!"
```
**Bug it catches**: Incorrect gradients from cached computation
---
#### Test 4: Cache Memory Grows Correctly
```python
def test_cache_memory_scaling():
"""Verify cache memory scales as expected."""
configs = [
# (layers, embed_dim, heads, seq_len, expected_mb)
(2, 64, 4, 64, 0.1), # Tiny: <0.2 MB
(4, 128, 8, 128, 2.0), # Small: ~2 MB
(6, 256, 8, 256, 12.0), # Medium: ~12 MB
]
for num_layers, embed_dim, num_heads, max_seq_len, expected_mb in configs:
head_dim = embed_dim // num_heads
cache = KVCache(
batch_size=1,
max_seq_len=max_seq_len,
num_layers=num_layers,
num_heads=num_heads,
head_dim=head_dim
)
mem_info = cache.get_memory_usage()
actual_mb = mem_info['total_mb']
# Allow 20% tolerance for overhead
assert 0.8 * expected_mb < actual_mb < 1.2 * expected_mb, \
f"Memory scaling broken: expected ~{expected_mb}MB, got {actual_mb}MB"
```
**Bug it catches**: OOM from unbounded cache growth
---
### HIGH (P1) - Degrade User Experience
#### Test 5: Batch Inference Maintains Independence
```python
def test_batch_cache_independence():
"""Verify batch sequences don't interfere."""
cache = KVCache(batch_size=4, max_seq_len=10, num_layers=2,
num_heads=4, head_dim=16)
# Update with batch-specific data
# Batch 0: all 0s, Batch 1: all 1s, etc.
for step in range(3):
for layer_idx in range(2):
key = Tensor(np.stack([
np.full((4, 1, 16), batch_idx)
for batch_idx in range(4)
]))
val = key.copy()
cache.update(layer_idx, key, val)
cache.advance()
# Verify each batch maintained its own data
for layer_idx in range(2):
k, v = cache.get(layer_idx)
for batch_idx in range(4):
assert np.all(k.data[batch_idx] == batch_idx), \
f"Batch {batch_idx} contaminated"
```
**Bug it catches**: Batch cross-contamination causing non-deterministic output
---
#### Test 6: Cache Sequence Length Enforcement
```python
def test_cache_max_length_enforcement():
"""Verify cache prevents exceeding max_seq_len."""
cache = KVCache(batch_size=1, max_seq_len=5, num_layers=2,
num_heads=4, head_dim=16)
# Fill cache to max
for step in range(5):
for layer_idx in range(2):
key = Tensor(np.random.randn(1, 4, 1, 16))
val = Tensor(np.random.randn(1, 4, 1, 16))
cache.update(layer_idx, key, val)
cache.advance()
# Attempting to exceed should raise error
with pytest.raises(ValueError, match="max_seq_len"):
key = Tensor(np.random.randn(1, 4, 1, 16))
val = Tensor(np.random.randn(1, 4, 1, 16))
cache.update(0, key, val) # Should fail
```
**Bug it catches**: Unbounded generation causing OOM
---
#### Test 7: Cache Reset Functionality
```python
def test_cache_reset_clears_state():
"""Verify reset() clears cache for reuse."""
cache = KVCache(batch_size=1, max_seq_len=10, num_layers=2,
num_heads=4, head_dim=16)
# Fill cache with data
for step in range(3):
for layer_idx in range(2):
key = Tensor(np.ones((1, 4, 1, 16)))
val = Tensor(np.ones((1, 4, 1, 16)))
cache.update(layer_idx, key, val)
cache.advance()
assert cache.seq_pos == 3
# Reset cache
cache.reset()
# Verify clean state
assert cache.seq_pos == 0
k, v = cache.get(0)
assert k.shape[2] == 0, "Cache not empty after reset"
```
**Bug it catches**: Stale cache data corrupting next generation
---
### MEDIUM (P2) - Nice to Have
#### Test 8: enable_kv_cache() Integration with Real Model
```python
def test_enable_kv_cache_real_model():
"""Verify enable_kv_cache() works with transformer model."""
from tinytorch.models.transformer import GPT
model = GPT(vocab_size=100, embed_dim=64, num_layers=2,
num_heads=4, max_seq_len=32)
# Enable cache
cache = enable_kv_cache(model)
# Verify model attributes
assert hasattr(model, '_kv_cache')
assert hasattr(model, '_cache_enabled')
assert model._cache_enabled == True
# Verify cache configuration matches model
assert cache.num_layers == model.num_layers
assert cache.num_heads == model.num_heads
assert cache.max_seq_len == model.max_seq_len
```
**Bug it catches**: enable_kv_cache() misconfiguration
---
#### Test 9: Cache Shape Compatibility with Attention
```python
def test_cache_shapes_match_attention_requirements():
"""Verify cached K,V have correct shapes for attention."""
cache = KVCache(batch_size=2, max_seq_len=10, num_layers=1,
num_heads=4, head_dim=16)
# Simulate 3 generation steps
for step in range(3):
key = Tensor(np.random.randn(2, 4, 1, 16)) # (B, H, 1, D)
val = Tensor(np.random.randn(2, 4, 1, 16))
cache.update(0, key, val)
cache.advance()
# Get cached K,V
k, v = cache.get(0)
# Should have shape (B, H, seq_pos, D)
assert k.shape == (2, 4, 3, 16), f"Wrong key shape: {k.shape}"
assert v.shape == (2, 4, 3, 16), f"Wrong value shape: {v.shape}"
# Should be compatible with attention computation
# Q: (B, H, 1, D) @ K.T: (B, H, D, seq_pos) → (B, H, 1, seq_pos)
query = Tensor(np.random.randn(2, 4, 1, 16))
scores = query @ k.transpose(-2, -1)
assert scores.shape == (2, 4, 1, 3), "Attention computation failed"
```
**Bug it catches**: Shape mismatch causing attention crashes
---
## Test Organization Recommendation
### Proposed Structure
```
tests/15_memoization/
├── test_progressive_integration.py # RENAME from TinyGPT tests
│ ├── TestKVCacheAttentionIntegration
│ │ ├── test_cache_enabled_generation_correctness (P0)
│ │ ├── test_cache_layer_isolation (P0)
│ │ └── test_cache_shapes_match_attention (P2)
│ │
│ ├── TestCacheGenerationLoop
│ │ ├── test_training_mode_disables_cache (P0)
│ │ ├── test_cache_max_length_enforcement (P1)
│ │ └── test_cache_reset_clears_state (P1)
│ │
│ ├── TestCacheMemoryScaling
│ │ ├── test_cache_memory_scaling (P0)
│ │ └── test_batch_cache_independence (P1)
│ │
│ └── TestEnableKVCacheIntegration
│ └── test_enable_kv_cache_real_model (P2)
└── test_kv_cache_unit.py # Unit tests (already exist in module)
└── test_unit_kvcache() # From 17_memoization.py
```
---
## Summary Statistics
| Category | Count |
|----------|-------|
| **Total Integration Tests Needed** | 9 |
| **Critical (P0)** | 4 |
| **High Priority (P1)** | 3 |
| **Medium Priority (P2)** | 2 |
| **Current Integration Tests** | 0 |
| **Coverage Gap** | 100% |
---
## Recommended Action Plan
### Phase 1: Critical Tests (Week 1)
1. Implement P0 tests (4 tests)
2. Verify with real model (create minimal transformer for testing)
3. Fix any bugs discovered
### Phase 2: High Priority (Week 2)
4. Implement P1 tests (3 tests)
5. Add batch inference validation
6. Add sequence length enforcement
### Phase 3: Medium Priority (Week 3)
7. Implement P2 tests (2 tests)
8. Complete integration with enable_kv_cache()
9. Final validation pass
---
## Risk Assessment
### Current Risk Level: **HIGH** ⚠️
**Without these integration tests:**
- Cache corruption could go undetected → broken generation in production
- Training mode cache usage → incorrect gradients → broken models
- Memory leaks from unbounded cache → OOM crashes
- Layer cache mixing → degraded output quality
- Batch contamination → non-deterministic behavior
**With these integration tests:**
- ✅ Catch cache corruption before deployment
- ✅ Prevent training/inference mode bugs
- ✅ Validate memory scaling behavior
- ✅ Ensure layer independence
- ✅ Guarantee batch inference correctness
---
## Conclusion
Module 17 (Memoization/KV Cache) currently has **ZERO integration tests** despite implementing complex interactions with:
- MultiHeadAttention (Module 12)
- Transformer blocks (Module 13)
- Generation loops
- Training/inference mode switching
- Multi-layer cache coordination
**Recommendation**: Prioritize implementing the 4 P0 tests IMMEDIATELY to prevent production issues. These tests would have caught cache corruption bugs that could silently degrade model quality.
The current test file is completely misnamed and tests the wrong module. It should be renamed and populated with the 9 integration tests outlined above.

View File

@@ -1,440 +0,0 @@
# Module 16 Quantization - Integration Test Audit Report
## Executive Summary
**Current Status**: ❌ **CRITICAL - No integration tests implemented**
**Test File**: `tests/16_quantization/test_quantization_integration.py`
**Current Coverage**: 0% (stub file only)
**Required Coverage**: Full integration with Modules 01-15
---
## Critical Integration Points (Missing Tests)
### 1. ✅ Model Integrity After Quantization
**Status**: ❌ MISSING
**Priority**: 🔴 CRITICAL - Bug Prevention
**What needs testing**:
```python
def test_quantization_preserves_model_structure():
"""Verify quantization doesn't corrupt model from Modules 03-13."""
# Test that quantized models can still:
# - Forward pass with correct shapes
# - Work with optimizers (Module 06)
# - Train with Trainer (Module 07)
# - Process batched data from DataLoader (Module 08)
# - Integrate with Conv2D/MaxPool2D (Module 09)
# - Work with attention mechanisms (Module 12)
```
**Why this matters**:
- Quantization modifies model layers IN-PLACE
- Must preserve API compatibility with all prior modules
- Breaking changes would cascade through entire system
- Students need confidence their models still work
**Test cases needed**:
1. Quantize MLP → verify Dense layers still work
2. Quantize CNN → verify Conv2D/MaxPool2D integration
3. Quantize Transformer → verify attention/embeddings work
4. Quantize then train → verify optimizer compatibility
5. Quantize then profile → verify profiler (M14) integration
---
### 2. ✅ Output Similarity Validation
**Status**: ❌ MISSING
**Priority**: 🔴 CRITICAL - Accuracy Validation
**What needs testing**:
```python
def test_quantized_output_matches_float32():
"""Verify quantized models produce similar outputs to FP32."""
# Given: Original FP32 model
# When: Quantize to INT8
# Then: Output error < 1% (not just < 0.2 like unit test)
# Test across:
# - Different model architectures (MLP, CNN, Transformer)
# - Different input distributions (uniform, normal, realistic)
# - Different weight distributions (Xavier, He, pre-trained)
```
**Why this matters**:
- Unit tests use random weights (not realistic)
- Integration tests need realistic scenarios
- Must validate on actual model architectures
- Accuracy loss should be < 1% in production
**Test cases needed**:
1. Simple MLP on random data (baseline)
2. CNN on image-like data (spatial patterns)
3. Attention on sequence data (positional dependencies)
4. Pre-trained weights (realistic distributions)
5. Edge cases: very small/large activation ranges
---
### 3. ⚠️ In-Place Modification Warning System
**Status**: ❌ MISSING
**Priority**: 🟡 HIGH - Student Safety
**What needs testing**:
```python
def test_quantization_in_place_warning():
"""Verify students are warned about destructive operations."""
# Test that:
# 1. quantize_model() warns about in-place modification
# 2. Documentation clearly states weights are LOST
# 3. Example shows copy.deepcopy() pattern
# 4. Error handling for trying to "unquantize"
```
**Why this matters**:
- Students will lose their trained models
- Can't recover FP32 weights after quantization
- Common mistake in production (quantize checkpoint by accident)
- Educational: teach defensive programming patterns
**Test cases needed**:
1. Verify warning message displays
2. Test that original model IS modified
3. Verify deepcopy() prevents modification
4. Test error message for invalid recovery attempts
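A minimal sketch of the defensive pattern these tests should document; it only assumes that `quantize_model` modifies the model in place, as stated above.
```python
import copy

from tinytorch.optimization.quantization import quantize_model

def quantize_safely(model):
    """Keep an FP32 backup, then quantize in place; caller can always roll back."""
    fp32_backup = copy.deepcopy(model)   # full copy BEFORE the destructive call
    quantize_model(model)                # in-place INT8 conversion
    return model, fp32_backup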
---
### 4. 💾 Memory Reduction Measurement
**Status**: ❌ MISSING
**Priority**: 🟡 HIGH - Core Value Proposition
**What needs testing**:
```python
def test_quantization_actual_memory_reduction():
"""Measure ACTUAL memory savings, not theoretical."""
# Test that:
# 1. INT8 tensors use 1 byte (not 4 bytes)
# 2. Compression ratio ≈ 4× in practice
# 3. Memory profiler (M14) shows real savings
# 4. Savings persist after forward/backward passes
```
**Why this matters**:
- Unit tests calculate theoretical savings
- Need to verify ACTUAL memory usage
- Python's memory model can be tricky (views, copies)
- Students need to see real impact
**Test cases needed**:
1. Profile memory before/after quantization
2. Verify dtype is actually int8 (not float32)
3. Test memory during forward pass (no hidden FP32 copies)
4. Measure total process memory (OS-level)
5. Compare with Module 14 profiler predictions
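A sketch of the kind of "actual bytes, not theoretical" check the cases above call for. It assumes quantized layers expose packed weights as an int8 NumPy array via `.weight.data`; adjust to whatever attribute the implementation really uses.
```python
import numpy as np

def check_memory_reduction(fp32_layer, int8_layer):
    """Compare real buffer sizes and dtypes, not parameter-count estimates."""
    before = fp32_layer.weight.data.nbytes
    after = int8_layer.weight.data.nbytes
    assert int8_layer.weight.data.dtype == np.int8, "weights are not actually INT8"
    ratio = before / after
    assert ratio > 3.5, f"expected ~4x compression, measured {ratio:.2f}x"
    return ratio
```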
---
## Additional Missing Integration Tests
### 5. 🔄 Backward Compatibility
**Status**: ❌ MISSING
**Priority**: 🟡 HIGH
```python
def test_quantized_models_work_with_existing_code():
"""Verify quantized models integrate seamlessly."""
# Test that quantized models work with:
# - DataLoader batching
# - Training loops
# - Gradient computation (if supported)
# - Model saving/loading
```
### 6. 🚨 Edge Cases and Error Handling
**Status**: ❌ MISSING
**Priority**: 🟢 MEDIUM
```python
def test_quantization_edge_cases():
"""Test corner cases that might break."""
# Test:
# - Quantizing already quantized model (should error)
# - Quantizing model with no Linear layers
# - Quantizing with empty calibration data
# - Quantizing constant weights (all zeros, all ones)
# - Quantizing extreme ranges (very small, very large)
```
### 7. 📊 Profiler Integration (Module 14)
**Status**: ❌ MISSING
**Priority**: 🟢 MEDIUM
```python
def test_quantization_with_profiler():
"""Verify M14 profiler works with M16 quantization."""
# Test that:
# - Profiler can measure quantized models
# - Memory measurements are accurate
# - Parameter counting works correctly
# - Benchmark results make sense
```
### 8. 🏗️ Multi-Layer Model Integration
**Status**: ❌ MISSING
**Priority**: 🟡 HIGH
```python
def test_quantization_complex_architectures():
"""Test quantization on realistic architectures."""
# Test:
# - ResNet-like skip connections
# - Multi-head attention models
# - Mixed CNN + Transformer
# - Models with shared weights (embeddings)
```
---
## Comparison with Other Modules
### Module 14 (Profiling) Integration Test Pattern
```python
# Module 14 tests verify:
✅ Complete system (01→14) still works
✅ Multi-modal models work correctly
✅ Advanced features integrate properly
✅ Regression prevention for all prior modules
```
### Module 16 Should Follow Same Pattern
```python
# Module 16 needs:
Complete system (01→15) verification
Quantized multi-modal models
Integration with profiling/compression
Regression prevention
```
---
## Recommended Test Implementation Order
### Phase 1: Critical Bug Prevention (Week 1)
1. **test_quantization_preserves_model_structure()** - Prevent breaking changes
2. **test_quantized_output_matches_float32()** - Validate accuracy preservation
3. **test_quantization_actual_memory_reduction()** - Verify core value prop
### Phase 2: Student Safety (Week 2)
4. **test_quantization_in_place_warning()** - Prevent data loss
5. **test_quantized_models_work_with_existing_code()** - Ensure usability
6. **test_quantization_edge_cases()** - Handle corner cases
### Phase 3: Advanced Integration (Week 3)
7. **test_quantization_with_profiler()** - M14 + M16 integration
8. **test_quantization_complex_architectures()** - Real-world scenarios
9. **test_complete_tinytorch_system_stable()** - Full regression suite
---
## Test Coverage Gaps - Detailed Analysis
### Current Unit Test Coverage (in module)
- ✅ `test_unit_quantize_int8()` - Basic quantization works
- ✅ `test_unit_dequantize_int8()` - Basic dequantization works
- ✅ `test_unit_quantized_linear()` - Single layer quantization
- ✅ `test_unit_quantize_model()` - Model-level quantization
- ✅ `test_unit_compare_model_sizes()` - Memory comparison
### Missing Integration Coverage
- ❌ **Cross-module compatibility** - No tests verify M16 works with M01-M15
- ❌ **Real-world scenarios** - No tests on realistic architectures
- ❌ **Production patterns** - No tests for deployment workflows
- ❌ **Error recovery** - No tests for handling failures gracefully
- ❌ **Performance validation** - No tests verify speedup claims
- ❌ **Hardware compatibility** - No tests for different backends
---
## Bug-Catching Priorities
### P0: Critical Bugs (Would break student work)
1. **Quantization corrupts model state** → Students lose trained models
2. **Output accuracy degradation > 5%** → Models become useless
3. **Memory not actually reduced** → False promises
4. **In-place modification without warning** → Silent data loss
### P1: High-Impact Bugs (Would frustrate students)
5. **Quantized models incompatible with training** → Can't fine-tune
6. **Profiler breaks on quantized models** → Can't measure impact
7. **Edge cases crash silently** → Hard to debug
### P2: Quality Issues (Would confuse students)
8. **Inconsistent compression ratios** → Unclear value proposition
9. **Calibration doesn't improve accuracy** → Wasted complexity
10. **Documentation claims don't match reality** → Trust issues
---
## Recommended Test File Structure
```python
"""
Integration tests for Module 16: Quantization
Tests INT8 quantization, model preservation, and system integration
"""
class TestQuantizationModelIntegrity:
"""Verify quantization preserves model structure and functionality."""
def test_quantize_mlp_preserves_structure()
def test_quantize_cnn_preserves_spatial_ops()
def test_quantize_transformer_preserves_attention()
def test_quantized_model_trains_correctly()
def test_quantized_model_profiles_correctly()
class TestQuantizationAccuracy:
"""Verify quantized models maintain acceptable accuracy."""
def test_mlp_output_similarity()
def test_cnn_output_similarity()
def test_transformer_output_similarity()
def test_calibrated_vs_uncalibrated_accuracy()
def test_quantization_error_within_1_percent()
class TestQuantizationMemorySavings:
"""Verify actual memory reduction matches claims."""
def test_int8_tensor_actual_memory()
def test_compression_ratio_approximately_4x()
def test_memory_savings_persist_during_inference()
def test_profiler_measures_savings_correctly()
def test_os_level_memory_reduction()
class TestQuantizationSafety:
"""Verify safe usage patterns and error handling."""
def test_in_place_modification_warning()
def test_cannot_unquantize_model()
def test_deepcopy_prevents_modification()
def test_quantizing_quantized_model_errors()
def test_edge_case_constant_tensors()
class TestQuantizationSystemIntegration:
"""Verify quantization works with complete TinyTorch system."""
def test_complete_system_01_to_15_stable()
def test_quantized_dataloader_pipeline()
def test_quantized_training_workflow()
def test_quantization_plus_profiling()
def test_multimodal_model_quantization()
class TestQuantizationEdgeCases:
"""Test corner cases and error conditions."""
def test_empty_calibration_data()
def test_zero_weights_quantization()
def test_extreme_activation_ranges()
def test_model_with_no_linear_layers()
def test_single_layer_quantization_error()
```
---
## Success Metrics
### Minimum Acceptable Coverage
- All P0 bugs prevented (4/4 tests)
- Integration with M01-M15 verified (5+ tests)
- Real-world scenarios tested (3+ architectures)
- Memory savings validated (actual measurements)
### Gold Standard Coverage
- All recommended tests implemented (20+ tests)
- Cross-module regression suite (like M14)
- Performance benchmarks included
- Error handling comprehensive
---
## Next Actions
### Immediate (This Sprint)
1. Create basic test structure (5 test classes)
2. Implement P0 critical tests (4 tests)
3. Add model integrity tests (5 tests)
### Short-term (Next Sprint)
4. Implement accuracy validation (5 tests)
5. Add memory measurement tests (5 tests)
6. Create safety/warning tests (5 tests)
### Long-term (Future Sprints)
7. Complete edge case coverage
8. Add performance benchmarks
9. Create comprehensive regression suite
10. Document test patterns for future modules
---
## Appendix: Test Examples
### Example: Critical Integration Test
```python
def test_quantization_preserves_cnn_functionality():
"""
CRITICAL: Verify quantized CNN still works with spatial operations.
Bug this catches:
- Quantization breaks Conv2D/MaxPool2D integration
- Shape mismatches after quantization
- Gradient flow issues (if backward supported)
"""
from tinytorch.core.spatial import Conv2D, MaxPool2D
from tinytorch.core.layers import Linear
from tinytorch.core.activations import ReLU
from tinytorch.optimization.quantization import quantize_model
# Build realistic CNN
conv1 = Conv2D(3, 16, kernel_size=3)
pool = MaxPool2D(kernel_size=2)
conv2 = Conv2D(16, 32, kernel_size=3)
    flatten = Flatten()  # hypothetical flatten layer: reshape (B, C, H, W) -> (B, C*H*W)
fc = Linear(800, 10) # Assume flattened size
model = SimpleCNN(conv1, pool, conv2, flatten, fc)
# Test original
x = Tensor(np.random.randn(4, 3, 32, 32))
original_output = model.forward(x)
# Quantize (in-place)
quantize_model(model)
# Test quantized
quantized_output = model.forward(x)
# Assertions
assert quantized_output.shape == original_output.shape, \
"Quantization changed output shape - BREAKS SYSTEM"
error = np.mean(np.abs(original_output.data - quantized_output.data))
assert error < 0.5, \
f"Quantization error {error:.3f} too high for CNN"
# Verify Conv2D layers still work
assert hasattr(model.conv1, 'forward'), \
"Quantization broke Conv2D API"
```
---
**Report Generated**: 2024-11-25
**Auditor**: Claude (ML Systems QA)
**Status**: Ready for implementation

View File

@@ -1,453 +0,0 @@
# Module 17 (Compression/Pruning) - Integration Test Audit Report
**Audit Date**: 2025-11-25
**Auditor**: QA Agent
**Module**: 17 - Compression (Pruning, Knowledge Distillation)
**Status**: CRITICAL GAPS IDENTIFIED
---
## Executive Summary
**Current State**: Module 17 has ONLY a placeholder integration test file with no actual tests.
**Risk Level**: HIGH - Module is exported to production package but lacks integration validation.
**Critical Finding**: The checkpoint test (checkpoint_17_compression.py) expects completely different APIs than what's implemented in the actual module.
---
## 1. Current Test Coverage
### Existing Test Files
```
tests/17_compression/
├── test_compression_integration.py ❌ PLACEHOLDER ONLY (23 lines, no real tests)
├── run_all_tests.py ✅ Exists but returns PENDING status
└── __pycache__/
```
### Current Coverage: 0%
- **Unit Tests**: None in integration directory
- **Integration Tests**: Placeholder only
- **Progressive Tests**: Missing entirely
- **Cross-Module Tests**: None
---
## 2. Critical Integration Points for Module 17
Based on the actual implementation (`tinytorch/optimization/compression.py`), these are the critical integration points that MUST be tested:
### 2.1 Pruning Doesn't Corrupt Shared Weight References
**Risk**: High - Pruning modifies weights in-place
**Current Coverage**: 0%
**Bug Potential**: CRITICAL
**What to test**:
```python
# Multiple layers sharing same weight tensor
layer1 = Linear(10, 20)
layer2_weights = layer1.weight # Shared reference
model = SimpleModel(layer1, layer2_with_shared_weights)
magnitude_prune(model, sparsity=0.5)
# CRITICAL: Verify both references see the same pruned weights
# CRITICAL: Verify gradients still flow correctly through shared weights
```
**Why this matters**:
- Weight sharing is common (e.g., tied embeddings in transformers)
- In-place pruning could break reference sharing
- Could cause silent accuracy degradation
### 2.2 Sparse Models Still Train Correctly
**Risk**: High - Pruning creates zeros that must stay zero during training
**Current Coverage**: 0%
**Bug Potential**: CRITICAL
**What to test**:
```python
model = create_simple_mlp()
magnitude_prune(model, sparsity=0.7)
# Train for several steps
for _ in range(10):
output = model.forward(input)
loss = compute_loss(output, target)
loss.backward()
optimizer.step()
# CRITICAL: Verify pruned weights remain zero after training
# CRITICAL: Verify unpruned weights still update normally
# CRITICAL: Verify loss decreases despite sparsity
```
**Why this matters**:
- Pruned weights should stay pruned during fine-tuning
- Optimizer updates could "resurrect" pruned weights
- Gradient flow through sparse matrices can be unstable
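One common mitigation for the "resurrected weights" failure mode (not necessarily what `compression.py` does today) is to capture the pruning mask and reapply it after every optimizer step. A sketch, assuming `model.parameters()` returns tensors whose `.data` is a NumPy array:
```python
import numpy as np

def capture_masks(model):
    """Record the zero pattern left by magnitude_prune (hypothetical helper)."""
    return {id(p): (p.data != 0) for p in model.parameters()}

def reapply_masks(model, masks):
    for p in model.parameters():
        p.data *= masks[id(p)]           # force pruned positions back to zero

# Inside the fine-tuning loop sketched above:
#   masks = capture_masks(model)         # right after magnitude_prune(...)
#   ...
#   optimizer.step()
#   reapply_masks(model, masks)          # pruned weights stay pruned
```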
### 2.3 Sparsity Measurement Consistency
**Risk**: Medium - Different measurement methods should agree
**Current Coverage**: 0%
**Bug Potential**: MEDIUM
**What to test**:
```python
model = create_model()
magnitude_prune(model, sparsity=0.6)
# Measure sparsity multiple ways
sparsity_v1 = measure_sparsity(model) # Current implementation
sparsity_v2 = manual_count_zeros(model) / total_params(model)
sparsity_v3 = CompressionComplete.measure_sparsity(model)
# CRITICAL: All methods should agree within 1%
assert abs(sparsity_v1 - sparsity_v2) < 0.01
assert abs(sparsity_v1 - sparsity_v3) < 0.01
```
**Why this matters**:
- Inconsistent sparsity metrics confuse students
- Could hide bugs in pruning implementation
- Affects compression ratio calculations
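For reference, the `manual_count_zeros`-style ground truth used in the comparison above can be as simple as the following (assuming `model.parameters()` exposes `.data` NumPy arrays):
```python
import numpy as np

def manual_sparsity(model):
    """Fraction of exactly-zero parameters, counted directly."""
    zeros = total = 0
    for p in model.parameters():
        zeros += int(np.sum(p.data == 0))
        total += p.data.size
    return zeros / total
```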
### 2.4 Pruned Model Inference Works
**Risk**: High - Sparse operations must produce correct outputs
**Current Coverage**: 0%
**Bug Potential**: HIGH
**What to test**:
```python
# Create model, train it, get baseline accuracy
model = create_and_train_model()
baseline_output = model.forward(test_input)
# Prune and verify inference still works
magnitude_prune(model, sparsity=0.7)
pruned_output = model.forward(test_input)
# CRITICAL: Output shape unchanged
assert pruned_output.shape == baseline_output.shape
# CRITICAL: Output values reasonable (not NaN/Inf)
assert not np.any(np.isnan(pruned_output.data))
assert not np.any(np.isinf(pruned_output.data))
# CRITICAL: Output changes are bounded
max_change = np.max(np.abs(pruned_output.data - baseline_output.data))
assert max_change < 10.0 # Reasonable threshold
```
### 2.5 Structured vs Unstructured Pruning Interaction
**Risk**: Medium - Both pruning types modify same weights
**Current Coverage**: 0%
**Bug Potential**: MEDIUM
**What to test**:
```python
model = create_model()
# Apply both pruning types
magnitude_prune(model, sparsity=0.5) # Unstructured
initial_sparsity = measure_sparsity(model)
structured_prune(model, prune_ratio=0.3) # Structured
final_sparsity = measure_sparsity(model)
# CRITICAL: Sparsity should increase (or stay same)
assert final_sparsity >= initial_sparsity
# CRITICAL: Model still functional
output = model.forward(test_input)
assert output.shape == expected_shape
```
### 2.6 Knowledge Distillation Integration
**Risk**: High - KD loss depends on correct tensor operations
**Current Coverage**: 0%
**Bug Potential**: HIGH
**What to test**:
```python
teacher = create_large_model()
student = create_small_model()
kd = KnowledgeDistillation(teacher, student, temperature=3.0, alpha=0.7)
# Generate predictions
teacher_logits = teacher.forward(input)
student_logits = student.forward(input)
true_labels = np.array([0, 1, 2, 3])
# Compute distillation loss
loss = kd.distillation_loss(student_logits, teacher_logits, true_labels)
# CRITICAL: Loss is a scalar
assert np.isscalar(loss) or (isinstance(loss, np.ndarray) and loss.size == 1)
# CRITICAL: Loss is positive and finite
assert loss > 0
assert not np.isnan(loss)
assert not np.isinf(loss)
# CRITICAL: Alpha parameter affects loss composition
loss_high_alpha = KnowledgeDistillation(teacher, student, alpha=0.9).distillation_loss(...)
loss_low_alpha = KnowledgeDistillation(teacher, student, alpha=0.1).distillation_loss(...)
# Different alpha should give different losses
assert abs(loss_high_alpha - loss_low_alpha) > 0.01
```
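For context, the Hinton-style distillation loss such a test exercises looks roughly like the sketch below. The module's exact weighting, reduction, and T² scaling may differ, so treat this as a reference formula rather than the implemented API.
```python
import numpy as np

def _softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=3.0, alpha=0.7):
    """alpha blends soft (teacher-matching) and hard (true-label) cross-entropy terms."""
    soft_targets = _softmax(teacher_logits, T)
    log_soft_student = np.log(_softmax(student_logits, T) + 1e-12)
    soft = -np.mean(np.sum(soft_targets * log_soft_student, axis=-1)) * (T * T)
    log_student = np.log(_softmax(student_logits) + 1e-12)
    hard = -np.mean(log_student[np.arange(len(labels)), labels])
    return alpha * soft + (1 - alpha) * hard
```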
---
## 3. Missing Progressive Integration Tests
Module 17 integration tests should verify the ENTIRE stack (Modules 01-17) still works:
### 3.1 Prior Stack Regression Tests (MISSING)
```python
class TestPriorStackStillWorking:
"""Verify Modules 01-16 unchanged after compression development."""
def test_quantization_still_works(self):
"""Module 16 (Quantization) should be unaffected."""
# Test quantization APIs still functional
def test_profiling_still_works(self):
"""Module 14 (Profiling) should be unaffected."""
# Test profiling APIs still functional
def test_training_pipeline_stable(self):
"""Complete training pipeline (Modules 01-07) should work."""
# End-to-end training test
```
### 3.2 Cross-Module Integration Tests (MISSING)
```python
class TestCompressionWithOtherModules:
"""Test compression works with other advanced modules."""
def test_compression_with_quantization(self):
"""Test: Prune first, then quantize."""
model = create_model()
magnitude_prune(model, sparsity=0.7)
quantize_model(model, bits=8)
# Verify both optimizations work together
def test_compression_with_attention(self):
"""Test: Prune attention mechanisms."""
attention = MultiHeadAttention(64, 8)
structured_prune(attention, prune_ratio=0.3)
# Verify attention still computes correctly
def test_compression_with_spatial_conv(self):
"""Test: Prune CNN filters."""
conv = Conv2D(3, 64, kernel_size=3)
structured_prune(conv, prune_ratio=0.5)
# Verify convolutions still work
```
---
## 4. API Mismatch with Checkpoint Test
**CRITICAL ISSUE**: The checkpoint test expects completely different APIs than what's implemented!
### Expected APIs (from checkpoint_17_compression.py):
```python
from tinytorch.nn.utils.prune import (
MagnitudePruner, # ❌ Class-based API
prune_conv_filters, # ❌ Specialized function
CompressionAnalyzer # ❌ Analysis class
)
pruner = MagnitudePruner()
pruned_weights, mask, stats = pruner.prune(test_weights, sparsity=0.7)
```
### Actual Implementation (in compression.py):
```python
from tinytorch.optimization.compression import (
magnitude_prune, # ✅ Function-based API
structured_prune, # ✅ Function-based API
KnowledgeDistillation, # ✅ KD class
measure_sparsity, # ✅ Utility function
compress_model # ✅ Pipeline function
)
magnitude_prune(model, sparsity=0.7) # In-place, no mask/stats returned
```
### Resolution Required:
1. **Option A**: Update checkpoint to match actual implementation
2. **Option B**: Extend implementation to match checkpoint expectations
3. **Option C**: Document API differences and maintain both
**Recommendation**: Option A - Update checkpoint to match the cleaner functional API actually implemented.
---
## 5. Bug-Catching Test Priorities
### Priority 1: CRITICAL (Could cause silent failures)
1. **Shared weight corruption test** - Highest risk for silent accuracy degradation
2. **Training with pruned weights test** - Optimizer could resurrect pruned weights
3. **Knowledge distillation loss validity test** - Invalid loss breaks training
### Priority 2: HIGH (Could cause obvious failures)
4. **Pruned model inference test** - Ensures basic functionality works
5. **Sparsity measurement consistency test** - Prevents metric confusion
6. **Cross-module integration tests** - Ensures compression doesn't break other modules
### Priority 3: MEDIUM (Quality of life issues)
7. **Structured vs unstructured interaction test** - Edge case handling
8. **Progressive stack regression tests** - Prevent accidental breakage
9. **Performance profiling tests** - Verify compression actually improves performance
---
## 6. Recommended Test Structure
```
tests/17_compression/
├── test_progressive_integration.py # NEW - Progressive stack tests
│ ├── TestPriorStackStillWorking # Modules 01-16 regression
│ ├── TestModule17CompressionCore # Core compression functionality
│ ├── TestProgressiveStackIntegration # Full stack (01-17) integration
│ └── TestRegressionPrevention # Prevent breakage
├── test_compression_integration.py # EXPAND - Currently placeholder
│ ├── TestPruningIntegration # In-place pruning behavior
│ ├── TestSparsityConsistency # Measurement accuracy
│ ├── TestKnowledgeDistillation # KD integration
│ └── TestCrossModuleInteraction # With quantization, attention, etc.
├── test_pruning_edge_cases.py # NEW - Edge case handling
│ ├── TestSharedWeightReferences # CRITICAL
│ ├── TestTrainingAfterPruning # CRITICAL
│ ├── TestExtremeSparsity # 0%, 100% sparsity
│ └── TestInvalidInputHandling # Error cases
└── test_compression_performance.py # NEW - Performance validation
├── TestMemoryReduction # Actual memory savings
├── TestInferenceSpeed # Sparse inference performance
└── TestCompressionQuality # Accuracy preservation
```
---
## 7. Sample Integration Test Implementation
Here's a sample of what the CRITICAL shared weight test should look like:
```python
def test_pruning_with_shared_weights():
"""CRITICAL: Verify pruning doesn't corrupt shared weight references."""
print("🔬 Testing pruning with shared weight references...")
# Create two layers sharing the same weight tensor
layer1 = Linear(100, 50)
layer2 = Linear(100, 50)
# Share weights (common pattern: tied embeddings)
layer2.weight = layer1.weight # Share reference
# Create model with shared weights
model = SimpleModel(layer1, layer2)
# Verify weights are actually shared before pruning
original_id = id(layer1.weight.data)
assert id(layer2.weight.data) == original_id, "Weights should be shared"
# Apply magnitude pruning
magnitude_prune(model, sparsity=0.6)
# CRITICAL TEST 1: Weights still shared after pruning
assert id(layer1.weight.data) == id(layer2.weight.data), \
"Pruning should preserve weight sharing"
# CRITICAL TEST 2: Both layers see the same pruned pattern
assert np.array_equal(layer1.weight.data, layer2.weight.data), \
"Shared weights should have identical pruning masks"
# CRITICAL TEST 3: Sparsity is correct
sparsity = np.sum(layer1.weight.data == 0) / layer1.weight.data.size
assert 0.55 <= sparsity <= 0.65, \
f"Expected ~60% sparsity, got {sparsity:.1%}"
# CRITICAL TEST 4: Forward pass works with shared pruned weights
input_data = Tensor(np.random.randn(10, 100))
output1 = layer1.forward(input_data)
output2 = layer2.forward(input_data)
# Both layers should produce identical outputs (same weights)
assert np.allclose(output1.data, output2.data), \
"Shared pruned weights should produce identical outputs"
print("✅ Shared weight pruning works correctly!")
```
---
## 8. Actionable Recommendations
### Immediate Actions (This Sprint)
1. **Create test_progressive_integration.py** - Following Module 02 pattern
2. **Implement 6 critical integration tests** - Focus on shared weights, training, KD
3. **Resolve checkpoint API mismatch** - Update checkpoint or extend implementation
4. **Add cross-module tests** - Compression + Quantization, Compression + Attention
### Short-term Actions (Next Sprint)
5. **Add edge case tests** - Extreme sparsity, invalid inputs, error handling
6. **Add performance validation tests** - Verify actual memory/speed improvements
7. **Document integration patterns** - How compression interacts with other modules
8. **Create test data fixtures** - Reusable models for testing
### Long-term Actions (Future)
9. **Continuous integration monitoring** - Add to CI/CD pipeline
10. **Property-based testing** - Use Hypothesis for generative test cases
11. **Benchmark suite** - Performance regression detection
12. **Student confusion monitoring** - Track common errors in integration
---
## 9. Risk Assessment
| Risk Category | Likelihood | Impact | Mitigation Priority |
|---------------|------------|--------|---------------------|
| Shared weight corruption | HIGH | CRITICAL | P1 - Immediate |
| Training resurrects pruned weights | HIGH | CRITICAL | P1 - Immediate |
| KD loss computation errors | MEDIUM | HIGH | P1 - Immediate |
| Sparsity measurement bugs | MEDIUM | MEDIUM | P2 - Short-term |
| Cross-module incompatibility | LOW | HIGH | P2 - Short-term |
| API confusion (checkpoint mismatch) | HIGH | MEDIUM | P1 - Immediate |
---
## 10. Conclusion
**Module 17 (Compression) has ZERO integration test coverage despite being exported to production.**
**Highest-risk gaps**:
1. No validation that pruning preserves shared weight references
2. No validation that pruned models can still train
3. No validation that knowledge distillation produces valid losses
4. Complete API mismatch with checkpoint expectations
**Recommended action**: Implement the 6 critical integration tests IMMEDIATELY before any student uses this module in combination with other modules.
**Estimated effort**:
- Critical tests (Priority 1): 4-6 hours
- High-priority tests (Priority 2): 3-4 hours
- Progressive integration structure: 2-3 hours
- **Total**: 10-13 hours to achieve acceptable coverage
**Next steps**: Review this audit with Module Developer, prioritize critical tests, assign implementation tasks.
---
**Audit completed**: 2025-11-25
**Reviewed by**: QA Agent
**Status**: APPROVED FOR DEVELOPMENT

View File

@@ -1,615 +0,0 @@
# Module 19 (Benchmarking) - Integration Test Audit Report
**Audit Date**: 2025-11-25
**Module**: 19_benchmarking
**Current Test File**: `tests/19_benchmarking/test_benchmarking_integration.py`
**Status**: STUB ONLY - NO IMPLEMENTATION
---
## EXECUTIVE SUMMARY
**CRITICAL FINDING**: Module 19 integration tests are completely unimplemented (TODO stub only).
- **Current Coverage**: 0% (stub file with TODO comments)
- **Expected Coverage**: ~80% for production-ready benchmarking system
- **Priority**: HIGH - Benchmarking is final implementation module and capstone foundation
- **Risk**: Students cannot validate benchmarking correctness or integration with optimization modules
---
## 1. CURRENT TEST COVERAGE ANALYSIS
### 1.1 What EXISTS (Stub Only)
```python
def test_benchmarking_integration():
"""Test benchmarking system integration."""
# TODO: Implement integration tests
# - Test benchmark runner
# - Test performance metrics collection
# - Test result validation
# - Test comparison with baselines
# - Test leaderboard submission
pass
```
**Lines of Code**: 24 (all comments/stubs)
**Actual Tests**: 0
**Integration Scenarios**: 0
### 1.2 What Module 19 IMPLEMENTS (2546 lines)
Module 19 provides comprehensive benchmarking infrastructure:
**Core Components**:
1. `BenchmarkResult` - Statistical analysis container
2. `PreciseTimer` - High-precision timing infrastructure
3. `Benchmark` - Multi-model comparison framework
4. `BenchmarkSuite` - Comprehensive multi-metric evaluation
5. `TinyMLPerf` - Industry-standard benchmark runner
6. `compare_optimization_techniques()` - Optimization comparison engine
**Key Integration Points**:
- Uses `Profiler` from Module 14 for measurements
- Uses `Tensor` from Module 01 for data handling
- Should work with optimized models from Modules 15-18
- Generates reports for TorchPerf Olympics capstone
---
## 2. CRITICAL INTEGRATION POINTS FOR MODULE 19
### 2.1 Real Model Performance Measurement
**What Needs Testing**:
```python
Benchmark measures ACTUAL model latency (not simulated)
Benchmark measures REAL memory usage (not estimates)
Benchmark handles different model types (TinyTorch, PyTorch, custom)
Benchmark works with models from previous modules (Conv2D, MLP, Transformer)
```
**Why Critical**:
- Students need to benchmark their actual implementations, not mock models
- Profiler integration must work correctly with real TinyTorch models
- Duck-typing (hasattr checks) must handle various model interfaces
### 2.2 Statistical Validity of Measurements
**What Needs Testing**:
```python
Confidence intervals calculated correctly
Warmup runs eliminate cold-start effects
Measurement variance is reasonable (CV < 20%)
Outlier detection prevents skewed results
Sample size recommendations are valid
```
**Why Critical**:
- Poor statistics lead to incorrect optimization decisions
- Benchmarking is worthless without statistical rigor
- Students must learn to trust/distrust measurements
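As a point of reference only, a statistically careful measurement loop might look like the sketch below. It uses plain `time.perf_counter` and NumPy; the warmup count, sample count, and 20% coefficient-of-variation threshold are illustrative assumptions rather than values taken from Module 19.
```python
# Hedged sketch: warmup + repeated timing with a coefficient-of-variation check.
# `run_inference` is a stand-in for model.forward(x); thresholds are illustrative.
import time
import numpy as np

def time_with_warmup(run_inference, warmup=5, samples=30, max_cv=0.20):
    for _ in range(warmup):              # discard cold-start effects (caches, allocator)
        run_inference()

    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        run_inference()
        latencies.append(time.perf_counter() - start)

    latencies = np.array(latencies)
    mean, std = latencies.mean(), latencies.std(ddof=1)
    cv = std / mean if mean > 0 else float("inf")
    if cv > max_cv:
        # High variance: the numbers are not trustworthy for comparisons.
        print(f"Warning: CV={cv:.1%} exceeds {max_cv:.0%}; increase samples or reduce noise")
    return mean, std, cv

# Example usage with a dummy workload:
mean_s, std_s, cv = time_with_warmup(lambda: sum(i * i for i in range(10_000)))
```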
### 2.3 Resource Exhaustion Prevention
**What Needs Testing**:
```python
Memory benchmarks don't cause OOM crashes
Large models don't hang the benchmarking system
Timeout mechanisms prevent infinite loops
Graceful degradation when resources are limited
Clean resource cleanup after benchmarks
```
**Why Critical**:
- Benchmarking shouldn't crash student systems
- Edge cases (huge models, limited RAM) must be handled
- Production systems require robust error handling
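One possible shape for a timeout guard is sketched below using only the standard library. Note the stated limitation: a thread-based timeout cannot actually kill a runaway `forward()` call, so a real harness might prefer a subprocess. The 30-second default is an assumption.
```python
# Hedged sketch: bound a single benchmark call with a timeout.
# Limitation: the worker thread keeps running after the timeout fires; only a
# subprocess-based harness can hard-kill a hung forward() call.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def run_with_timeout(fn, timeout_s=30.0):
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise RuntimeError(
            f"Benchmark step exceeded {timeout_s}s; model may be hanging"
        ) from None
    finally:
        pool.shutdown(wait=False)  # do not block on a possibly hung worker
```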
### 2.4 Benchmark Results Reproducibility
**What Needs Testing**:
```python
Same model produces consistent results across runs
Randomness is controlled (seeded) where needed
System state doesn't affect benchmark validity
Results can be serialized/deserialized correctly
Comparison across different machines is meaningful
```
**Why Critical**:
- TorchPerf Olympics requires reproducible submissions
- Students must be able to verify their optimizations
- Leaderboard requires fair comparisons
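A minimal reproducibility check might look like this sketch; the seeding call, JSON round-trip, and 1e-9 tolerance are assumptions about how results could be stored, not the format Module 19 actually uses.
```python
# Hedged sketch: seed control plus a serialization round-trip check.
import json
import numpy as np

np.random.seed(42)  # fix input data so reruns see identical tensors

result = {"model": "mlp_baseline", "latency_ms": 1.234567, "samples": 30}

blob = json.dumps(result)      # what a leaderboard submission might store
restored = json.loads(blob)

assert restored["model"] == result["model"]
assert abs(restored["latency_ms"] - result["latency_ms"]) < 1e-9, "precision lost in round-trip"
```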
### 2.5 Optimization Module Integration (M15-18)
**What Needs Testing**:
```python
Benchmark works with quantized models (Module 15)
Benchmark works with pruned models (Module 16)
Benchmark works with distilled models (Module 17)
Benchmark works with fused operators (Module 18)
compare_optimization_techniques() handles all optimization types
```
**Why Critical**:
- Module 19 is the EVALUATION framework for Modules 15-18
- Without integration, students can't validate optimizations
- Capstone requires combining multiple optimization techniques
### 2.6 TinyMLPerf Standard Compliance
**What Needs Testing**:
```python
Standard benchmarks (keyword_spotting, image_classification, etc.) run correctly
Compliance thresholds enforced properly
Report generation matches MLPerf format
Leaderboard submission format is valid
Results are comparable to official MLPerf baselines
```
**Why Critical**:
- Industry-standard benchmarking teaches professional practices
- Capstone submissions require MLPerf-style reporting
- Career preparation for ML engineering roles
---
## 3. MISSING INTEGRATION TESTS (BY PRIORITY)
### PRIORITY 1: Core Benchmarking Workflow (CRITICAL)
**Test**: `test_benchmark_real_tinytorch_models()`
```python
def test_benchmark_real_tinytorch_models():
"""
✅ TEST: Benchmark should measure REAL TinyTorch models correctly
VALIDATES:
- Integration with Tensor, Linear, Conv2D from earlier modules
- Profiler from Module 14 works in benchmarking context
- Latency/memory measurements are realistic (not zero, not infinite)
- Results structure is correct and serializable
🐛 BUG-CATCHING:
- Model.forward() not being called correctly
- Profiler returning None or invalid measurements
- Memory tracking not working with TinyTorch tensors
- Duck-typing failures with real TinyTorch models
"""
```
**Bug Examples**:
- Benchmark tries to call `model.predict()` but TinyTorch uses `model.forward()`
- Memory measurement returns 0 for all models
- Latency measurement includes warmup time incorrectly
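The first bug above is a calling-convention mismatch. A defensive benchmark runner might resolve the inference entry point up front, as in this hedged sketch; the method names checked are assumptions about common interfaces, not a documented Module 19 contract.
```python
# Hedged sketch: resolve the inference callable before timing anything.
def resolve_inference_fn(model):
    """Return a callable that runs one inference, or fail with a clear message."""
    if hasattr(model, "forward") and callable(model.forward):
        return model.forward              # TinyTorch-style modules
    if callable(model):
        return model                      # __call__-style modules
    if hasattr(model, "predict") and callable(model.predict):
        return model.predict              # sklearn-style estimators
    raise TypeError(
        f"{type(model).__name__} exposes no forward()/__call__/predict(); "
        "cannot benchmark it"
    )
```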
---
**Test**: `test_statistical_validity()`
```python
def test_statistical_validity():
"""
✅ TEST: Statistical analysis should be mathematically correct
VALIDATES:
- Confidence intervals calculated using proper formulas
- Mean/std/median computed correctly
- Sample size sufficient for statistical significance
- Variance is reasonable (not too high or too low)
🐛 BUG-CATCHING:
- Wrong t-score value (should be 1.96 for 95% CI)
- Division by zero when n=1
- CI width unreasonably large (>50% of mean)
- Outliers not handled properly
"""
```
**Bug Examples**:
- Confidence interval calculation uses wrong formula
- Single measurement causes divide-by-zero in std calculation
- Outliers skew results (one 100ms measurement among 1ms measurements)
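For reference, the 95% confidence interval the bugs above allude to reduces to a few lines. The normal-approximation z-value of 1.96 and the n ≥ 2 guard are the standard choices, but treat this as a sketch rather than Module 19's actual implementation.
```python
# Hedged sketch: 95% CI via the normal approximation, guarding the n < 2 case.
import numpy as np

def confidence_interval_95(samples):
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    mean = samples.mean()
    if n < 2:
        return mean, (mean, mean)              # no spread estimate from a single sample
    sem = samples.std(ddof=1) / np.sqrt(n)     # standard error of the mean
    half_width = 1.96 * sem                    # z for a 95% interval
    return mean, (mean - half_width, mean + half_width)

mean, (lo, hi) = confidence_interval_95([1.02, 0.98, 1.05, 0.99, 1.01])
```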
---
**Test**: `test_benchmark_suite_multi_metric()`
```python
def test_benchmark_suite_multi_metric():
"""
✅ TEST: BenchmarkSuite should run all metrics and combine results
VALIDATES:
- Latency, accuracy, memory, energy all measured
- Results structure contains all metrics
- Pareto frontier analysis identifies optimal models
- Report generation produces valid output
🐛 BUG-CATCHING:
- One metric failing breaks entire suite
- Results missing some metrics
- Pareto analysis chooses dominated solutions
- Energy estimation produces negative values
"""
```
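Since the suite test above checks Pareto analysis, here is a minimal non-dominated-point sketch over (latency, accuracy) pairs; the tuple layout and example numbers are assumptions for illustration only.
```python
# Hedged sketch: keep models not dominated by any other (lower latency AND higher accuracy).
def pareto_frontier(results):
    """results: dict of name -> (latency_ms, accuracy). Returns non-dominated names."""
    frontier = []
    for name, (lat, acc) in results.items():
        dominated = any(
            other_lat <= lat and other_acc >= acc and (other_lat, other_acc) != (lat, acc)
            for other, (other_lat, other_acc) in results.items() if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

print(pareto_frontier({"fp32": (4.0, 0.91), "int8": (1.5, 0.90), "pruned": (2.5, 0.88)}))
# -> ['fp32', 'int8']  (pruned is dominated by int8: slower and less accurate)
```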
---
### PRIORITY 2: Optimization Integration (HIGH)
**Test**: `test_optimization_module_integration()`
```python
def test_optimization_module_integration():
"""
✅ TEST: Benchmark should work with models from optimization modules
VALIDATES:
- Quantized models (Module 15) benchmark correctly
- Pruned models (Module 16) show reduced memory
- Distilled models (Module 17) measured accurately
- Fused operators (Module 18) show speedups
- compare_optimization_techniques() generates valid comparisons
🐛 BUG-CATCHING:
- Quantized model measurement crashes
- Pruned model memory doesn't decrease
- Fused operators show no speedup
- Comparison function fails with empty models
"""
```
**Bug Examples**:
- Quantized model forward() returns wrong dtype, crashes Profiler
- Pruned model parameter counting doesn't account for sparse weights
- Comparison assumes all models have same interface
---
**Test**: `test_optimization_recommendations()`
```python
def test_optimization_recommendations():
"""
✅ TEST: Recommendation engine should provide actionable guidance
VALIDATES:
- Recommendations match use case constraints
- Latency-critical use case chooses fastest model
- Memory-constrained use case chooses smallest model
- Balanced use case considers multiple metrics
- Recommendations include reasoning
🐛 BUG-CATCHING:
- Latency-critical recommends slowest model
- Memory-constrained ignores memory metric
- Recommendations contradict actual measurements
- Reasoning is generic (not specific to results)
"""
```
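For illustration, a constraint-aware selection might be as simple as the sketch below; the results-dict layout, use-case keys, and scoring weights are hypothetical, not the module's documented API.
```python
# Hedged sketch: pick a model per use case from a results dict (hypothetical layout).
def recommend(results, use_case):
    """results: name -> {"latency_ms": float, "memory_mb": float, "accuracy": float}."""
    if use_case == "latency_critical":
        return min(results, key=lambda m: results[m]["latency_ms"])
    if use_case == "memory_constrained":
        return min(results, key=lambda m: results[m]["memory_mb"])
    # "balanced": crude weighted score (weights are illustrative, not tuned)
    def score(name):
        r = results[name]
        return r["latency_ms"] + 0.1 * r["memory_mb"] - 100.0 * r["accuracy"]
    return min(results, key=score)
```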
---
### PRIORITY 3: Robustness & Edge Cases (MEDIUM)
**Test**: `test_resource_exhaustion_prevention()`
```python
def test_resource_exhaustion_prevention():
"""
✅ TEST: Benchmark should handle resource constraints gracefully
VALIDATES:
- Large models don't cause OOM crashes
- Long-running benchmarks can be interrupted
- Memory is cleaned up after benchmarks
- Timeout prevents infinite loops
- Error messages are helpful
🐛 BUG-CATCHING:
- Memory leak in benchmark loop
- No timeout on model.forward() calls
- Crash instead of graceful degradation
- Resources not released on exception
"""
```
**Bug Examples**:
- Benchmarking 1GB model crashes with OOM
- Infinite loop in warmup phase (no timeout)
- Memory leak: each benchmark run consumes more memory
---
**Test**: `test_benchmark_reproducibility()`
```python
def test_benchmark_reproducibility():
"""
✅ TEST: Benchmark results should be reproducible
VALIDATES:
- Same model gives consistent results across runs
- Random seed controls variability
- Serialized results match original
- Deserialized results can be compared
- Variance is within acceptable bounds (CV < 10%)
🐛 BUG-CATCHING:
- Results vary wildly between identical runs (CV > 50%)
- Serialization loses precision
- Deserialization fails on valid files
- No seed control for reproducibility
"""
```
---
**Test**: `test_edge_case_models()`
```python
def test_edge_case_models():
"""
✅ TEST: Benchmark should handle unusual model types
VALIDATES:
- Empty model (no parameters) doesn't crash
- Single-parameter model benchmarks correctly
- Model with no forward() method fails gracefully
- Model returning wrong shape is caught
- Non-tensor outputs handled appropriately
🐛 BUG-CATCHING:
- Empty model causes division by zero
- Missing forward() crashes instead of error message
- Wrong output shape causes silent failure
- Non-tensor output crashes Profiler
"""
```
---
### PRIORITY 4: TinyMLPerf & Capstone (MEDIUM-HIGH)
**Test**: `test_tinymlperf_standard_benchmarks()`
```python
def test_tinymlperf_standard_benchmarks():
"""
✅ TEST: TinyMLPerf should run standard industry benchmarks
VALIDATES:
- All standard benchmarks (keyword_spotting, image_classification, etc.) run
- Compliance thresholds enforced correctly
- Report format matches MLPerf specification
- Leaderboard submission JSON is valid
- Results comparable to reference implementations
🐛 BUG-CATCHING:
- Benchmark names don't match MLPerf standard
- Compliance check uses wrong thresholds
- Report missing required fields
- JSON serialization produces invalid format
"""
```
---
**Test**: `test_torchperf_olympics_workflow()`
```python
def test_torchperf_olympics_workflow():
"""
✅ TEST: TorchPerf Olympics submission workflow should work end-to-end
VALIDATES:
- Student can choose Olympic event
- Benchmark runs for chosen event
- Results validated against event constraints
- Submission package generated correctly
- Leaderboard ranking calculated properly
🐛 BUG-CATCHING:
- Event constraints not enforced
- Invalid submission passes validation
- Ranking algorithm broken (ties handled wrong)
- Submission package missing required files
"""
```
---
### PRIORITY 5: Progressive Integration (MEDIUM)
**Test**: `test_complete_tinytorch_system_still_works()`
```python
def test_complete_tinytorch_system_still_works():
"""
🔄 REGRESSION: Complete TinyTorch system (Modules 01-18) should still work
VALIDATES:
- Tensor, activations, layers still functional
- Training loops still work
- Optimization modules (15-18) still work
- Benchmarking doesn't break existing functionality
🐛 BUG-CATCHING:
- Benchmarking imports break core modules
- Profiler integration interferes with training
- Circular dependencies introduced
"""
```
---
## 4. REFERENCE: GOOD INTEGRATION TEST STRUCTURE
Based on `tests/02_activations/test_progressive_integration.py`:
```python
"""
Module 19: Progressive Integration Tests
Tests that Module 19 (Benchmarking) works correctly AND that entire TinyTorch system still works.
DEPENDENCY CHAIN: 01_tensor → ... → 18_fusion → 19_benchmarking → Capstone
Final validation before TorchPerf Olympics capstone project.
"""
import numpy as np
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
class TestModules01Through18StillWorking:
"""Verify all previous modules still work after benchmarking development."""
def test_core_modules_stable(self):
"""Ensure core modules (01-09) weren't broken."""
# Test imports and basic functionality
pass
def test_optimization_modules_stable(self):
"""Ensure optimization modules (15-18) still work."""
# Test quantization, pruning, distillation, fusion
pass
class TestModule19BenchmarkingCore:
"""Test Module 19 core benchmarking functionality."""
def test_benchmark_result_statistics(self):
"""Test BenchmarkResult calculates statistics correctly."""
pass
def test_benchmark_runner_real_models(self):
"""Test Benchmark class with real TinyTorch models."""
pass
def test_benchmark_suite_multi_metric(self):
"""Test BenchmarkSuite runs all metrics."""
pass
def test_tinymlperf_compliance(self):
"""Test TinyMLPerf standard benchmarks."""
pass
class TestProgressiveStackIntegration:
"""Test complete stack (01→19) works together."""
def test_benchmark_optimized_models_pipeline(self):
"""Test benchmarking pipeline with models from optimization modules."""
# Create base model
# Apply optimization (quantize, prune, etc.)
# Benchmark both
# Verify comparison results
pass
def test_torchperf_olympics_submission_workflow(self):
"""Test end-to-end capstone submission workflow."""
# Choose event
# Optimize model
# Benchmark
# Generate submission
# Validate submission
pass
```
---
## 5. BUG-CATCHING PRIORITIES
### 5.1 CRITICAL Bugs (Would Break Capstone)
1. **Benchmark fails with real TinyTorch models** → Students can't validate their work
2. **Statistical calculations wrong** → Incorrect optimization decisions
3. **Memory measurement always returns 0** → Can't evaluate memory optimizations
4. **Profiler integration broken** → No measurements at all
5. **compare_optimization_techniques() crashes** → Can't compare optimizations
### 5.2 HIGH-PRIORITY Bugs (Would Mislead Students)
6. **Confidence intervals calculated incorrectly** → False confidence in results
7. **Warmup runs not working** → Cold-start bias in measurements
8. **Pareto frontier analysis chooses dominated solutions** → Wrong recommendations
9. **Energy estimation produces negative values** → Meaningless results
10. **Reproducibility broken** → Can't verify submissions
### 5.3 MEDIUM-PRIORITY Bugs (Would Cause Confusion)
11. **Duck-typing fails with custom models** → Limits flexibility
12. **Resource exhaustion crashes system** → Poor student experience
13. **Serialization loses precision** → Comparison errors
14. **Report generation missing metrics** → Incomplete analysis
15. **Timeout not implemented** → Infinite loops possible
---
## 6. RECOMMENDED IMPLEMENTATION ORDER
### Phase 1: Core Functionality (Week 1)
1. `test_benchmark_real_tinytorch_models()` - CRITICAL
2. `test_statistical_validity()` - CRITICAL
3. `test_benchmark_suite_multi_metric()` - CRITICAL
### Phase 2: Optimization Integration (Week 2)
4. `test_optimization_module_integration()` - HIGH
5. `test_optimization_recommendations()` - HIGH
6. `test_complete_tinytorch_system_still_works()` - HIGH (regression)
### Phase 3: Robustness (Week 3)
7. `test_resource_exhaustion_prevention()` - MEDIUM
8. `test_benchmark_reproducibility()` - MEDIUM
9. `test_edge_case_models()` - MEDIUM
### Phase 4: Capstone Preparation (Week 4)
10. `test_tinymlperf_standard_benchmarks()` - MEDIUM-HIGH
11. `test_torchperf_olympics_workflow()` - MEDIUM-HIGH
---
## 7. ACCEPTANCE CRITERIA
Module 19 integration tests are COMPLETE when:
- [ ] **Benchmark works with real TinyTorch models** (Tensor, Linear, Conv2D, MLP, Transformer)
- [ ] **Statistical analysis is mathematically correct** (CI, mean, std validated)
- [ ] **All metrics measured correctly** (latency, memory, accuracy, energy)
- [ ] **Optimization modules integrate properly** (quantization, pruning, distillation, fusion)
- [ ] **Resource exhaustion prevented** (OOM, timeouts, cleanup tested)
- [ ] **Results are reproducible** (same model → consistent results)
- [ ] **TinyMLPerf compliance validated** (standard benchmarks run correctly)
- [ ] **Capstone workflow tested end-to-end** (Olympics submission works)
- [ ] **Progressive integration verified** (all previous modules still work)
- [ ] **Test coverage ≥ 80%** for critical integration points
---
## 8. CONCLUSION
**Current State**: CRITICAL GAP - No integration tests implemented
**Risk Level**: HIGH
- Students cannot validate benchmarking correctness
- Capstone project (TorchPerf Olympics) has no test foundation
- Integration with optimization modules unverified
- Statistical validity unchecked
**Recommendation**: IMPLEMENT IMMEDIATELY
- Start with Phase 1 (core functionality) ASAP
- Module 19 is the final implementation module before capstone
- Benchmarking is the EVALUATION framework for all optimizations
- Without tests, students cannot trust their measurements
**Estimated Effort**: 3-4 weeks for complete implementation
- Week 1: Core benchmarking tests (3 tests, ~500 LOC)
- Week 2: Optimization integration tests (3 tests, ~400 LOC)
- Week 3: Robustness tests (3 tests, ~300 LOC)
- Week 4: Capstone workflow tests (2 tests, ~300 LOC)
**Total**: ~11 comprehensive integration tests, ~1500 LOC
---
**Next Steps**:
1. Implement `test_benchmark_real_tinytorch_models()` first (most critical)
2. Add `test_statistical_validity()` (foundation for all analysis)
3. Proceed through phases systematically
4. Test with real student models from earlier modules
5. Validate capstone workflow before student submission deadlines

View File

@@ -1,119 +0,0 @@
# CLI Command Files - Usage Report
## Summary
**Status**: ✅ All files are accounted for. Some are imported but not exposed as top-level commands.
## File Categories
### 1. ✅ Registered Top-Level Commands (18)
These are in `TinyTorchCLI.commands` and accessible via `tito <command>`:
```
benchmark, book, checkpoint, community, demo, export,
grade, leaderboard, logo, milestones, module, nbgrader,
olympics, package, setup, src, system, test
```
### 2. 🔧 Internal Subcommands (7)
**Imported and used by other commands, but not top-level:**
| File | Used By | Purpose |
|------|---------|---------|
| `reset.py` | `package.py` | Reset functionality for package command |
| `module_reset.py` | `module_workflow.py` | Module reset subcommand |
| `status.py` | - | Imported in main.py but not clearly used |
| `nbdev.py` | `package.py` | NBDev integration for package command |
| `info.py` | `system.py`, `health.py` | System info subcommand |
| `health.py` | `system.py` | System health check subcommand |
| `jupyter.py` | `system.py` | Jupyter integration subcommand |
**Action**: ✅ **KEEP THESE** - They're used by other commands
### 3. ❓ Imported but Unclear Usage (2)
| File | Issue | Recommendation |
|------|-------|----------------|
| `notebooks.py` | Imported in main.py, but no usage found | Check if used, otherwise remove import |
| `status.py` | Imported in main.py, but no clear usage | Check if used, otherwise remove import |
**Action**: Need to verify these
### 4. 🗑️ Likely Unused/Deprecated (8)
| File | Status |
|------|--------|
| `check.py` | Not imported anywhere |
| `clean.py` | Not imported anywhere |
| `clean_workspace.py` | Not imported anywhere |
| `help.py` | Not imported anywhere |
| `protect.py` | Not imported anywhere |
| `report.py` | Not imported anywhere |
| `version.py` | Not imported anywhere |
| `view.py` | Not imported anywhere |
**Action**: ⚠️ Safe to delete (not imported anywhere)
## Cleanup Actions
### Step 1: Remove Dead Imports from main.py
These are imported but not registered or used:
```python
# Remove from tito/main.py lines 28-37:
from .commands.notebooks import NotebooksCommand # ❌ Not used
from .commands.status import StatusCommand # ❌ Not used (verify first)
```
### Step 2: Delete Truly Unused Files
```bash
# These are safe to delete (not imported anywhere)
rm tito/commands/check.py
rm tito/commands/clean.py
rm tito/commands/clean_workspace.py
rm tito/commands/help.py
rm tito/commands/protect.py
rm tito/commands/report.py
rm tito/commands/version.py
rm tito/commands/view.py
```
### Step 3: Verify and Update Tests
Update `test_cli_registry.py` to remove deleted files from `known_internal`:
```python
known_internal = {
'health.py', # Used by system command
'info.py', # Used by system command
'jupyter.py', # Used by system command
'nbdev.py', # Used by package command
'notebooks.py', # Verify if needed, otherwise remove
'reset.py', # Used by package command
'status.py', # Verify if needed, otherwise remove
'module_reset.py' # Used by module_workflow command
}
```
## Verification Commands
Check if status.py is actually used:
```bash
grep -r "StatusCommand" tito/ --include="*.py" | grep -v "^tito/main.py:from" | grep -v "class StatusCommand"
```
Check if notebooks.py is actually used:
```bash
grep -r "NotebooksCommand" tito/ --include="*.py" | grep -v "^tito/main.py:from" | grep -v "class NotebooksCommand"
```
## Final Architecture
After cleanup, you'll have:
- **18 top-level commands** (user-facing via `tito <cmd>`)
- **7-8 internal commands** (used as helpers by other commands)
- **0 orphaned files** (everything has a purpose)
Clean CLI with clear separation between public API and internal helpers!

View File

@@ -1,107 +0,0 @@
# Final Answer: CLI Command Cleanup
## What the Tests Found ✅
**Good news**: No broken or dangling commands! Everything is accounted for.
**However**: Found some cleanup opportunities:
### 1. Dead Imports in main.py
These 2 commands are imported but **never used**:
```python
# tito/main.py lines 28 and 37
from .commands.notebooks import NotebooksCommand # ❌ DELETE
from .commands.status import StatusCommand # ❌ DELETE
```
They're only in `__init__.py` exports, not actually used anywhere.
### 2. Orphaned Command Files (8 files)
These files exist but are **not imported anywhere**:
```bash
tito/commands/check.py
tito/commands/clean.py
tito/commands/clean_workspace.py
tito/commands/help.py
tito/commands/protect.py
tito/commands/report.py
tito/commands/version.py
tito/commands/view.py
```
### 3. Internal Helper Commands (6 files) ✅ KEEP
These are used by other commands:
- `reset.py` → used by `package.py`
- `nbdev.py` → used by `package.py`
- `info.py` → used by `system.py`
- `health.py` → used by `system.py`
- `jupyter.py` → used by `system.py`
- `module_reset.py` → used by `module_workflow.py`
## Recommended Actions
### Option A: Full Cleanup (Recommended)
```bash
# 1. Delete truly orphaned files
rm tito/commands/check.py
rm tito/commands/clean.py
rm tito/commands/clean_workspace.py
rm tito/commands/help.py
rm tito/commands/protect.py
rm tito/commands/report.py
rm tito/commands/version.py
rm tito/commands/view.py
# 2. Delete unused imported files
rm tito/commands/notebooks.py
rm tito/commands/status.py
# 3. Remove dead imports from main.py
# Edit tito/main.py and remove lines 28 and 37
```
### Option B: Conservative (Move to Archive)
```bash
# Move to archive instead of deleting
mkdir -p tito/commands/_archived
mv tito/commands/{check,clean,clean_workspace,help,protect,report,version,view}.py tito/commands/_archived/
mv tito/commands/{notebooks,status}.py tito/commands/_archived/
```
### Option C: Do Nothing
Current state is **fine** - tests prove nothing is broken. The extra files just create clutter but don't hurt.
## After Cleanup
Update `tests/cli/test_cli_registry.py`:
```python
# Remove these from known_internal since they'll be deleted:
known_internal = {
'health.py', # Used by system
'info.py', # Used by system
'jupyter.py', # Used by system
'nbdev.py', # Used by package
'reset.py', # Used by package
'module_reset.py' # Used by module_workflow
}
```
## Summary
Your CLI is **healthy**! The tests caught:
- ✅ 18 working registered commands
- ✅ 6 internal helper commands (properly used)
- ❌ 2 dead imports (should remove)
- ❌ 8 orphaned files (safe to delete)
- ❌ 2 unused command files (safe to delete)
**Total cleanup**: 12 files/imports that can be safely removed without breaking anything.
Want me to do the cleanup for you?

View File

@@ -1,233 +0,0 @@
# CLI Hierarchy Refactor - COMPLETE ✅
## Summary
Successfully refactored TinyTorch CLI from flat structure to hierarchical organization with subfolders for complex commands.
**Date**: 2025-11-28
**Tests Passing**: 52/52 ✅
**User Impact**: ZERO (completely internal)
---
## What Changed
### Before (Flat Structure)
```
tito/commands/
├── module_workflow.py
├── module_reset.py
├── system.py
├── info.py
├── health.py
├── jupyter.py
├── package.py
├── reset.py
├── nbdev.py
├── ... (34 files total, hard to navigate)
```
### After (Hierarchical Structure)
```
tito/commands/
├── module/
│ ├── __init__.py
│ ├── workflow.py # Main module command
│ └── reset.py # Module reset subcommand
├── system/
│ ├── __init__.py
│ ├── system.py # Main system command
│ ├── info.py # system info
│ ├── health.py # system doctor
│ └── jupyter.py # system jupyter
├── package/
│ ├── __init__.py
│ ├── package.py # Main package command
│ ├── reset.py # package reset
│ └── nbdev.py # package nbdev
├── _archived/ # Deprecated files
│ ├── clean.py
│ ├── help.py
│ ├── notebooks.py
│ └── status.py
├── setup.py # Simple commands stay flat
├── test.py
├── export.py
└── ... (15 simple commands)
```
---
## Benefits
### ✅ Clear Ownership
- Easy to see that `module/reset.py` belongs to module command
- No confusion about which files are helpers vs top-level commands
### ✅ Better Organization
- Related files grouped together
- Subfolders scale as commands grow
- Clear separation between simple and complex commands
### ✅ Easier Maintenance
- Tests validate structure automatically
- Adding new subcommands is straightforward
- No orphaned files hiding in flat structure
### ✅ Zero User Impact
```bash
# These still work EXACTLY the same:
tito module complete 01
tito system info
tito package export
```
---
## Files Changed
### Moved Files (9)
```
module_workflow.py → module/workflow.py
module_reset.py → module/reset.py
system.py → system/system.py
info.py → system/info.py
health.py → system/health.py
jupyter.py → system/jupyter.py
package.py → package/package.py
reset.py → package/reset.py
nbdev.py → package/nbdev.py
```
### Created Files (4)
```
module/__init__.py
system/__init__.py
package/__init__.py
_archived/README.md
```
### Updated Files (3)
```
tito/main.py # Updated imports
tito/commands/__init__.py # Updated imports
tests/cli/test_cli_registry.py # Updated file path expectations
```
### Archived Files (4)
```
Moved to _archived/:
- clean.py (deprecated)
- help.py (deprecated)
- notebooks.py (deprecated)
- status.py (deprecated)
```
---
## Test Results
### Before Refactor
```
52 tests passing ✅
```
### After Refactor
```
52 tests passing ✅
```
### Test Coverage
- ✅ All commands are BaseCommand subclasses
- ✅ All commands have descriptions
- ✅ All commands implement required methods
- ✅ All help text accessible
- ✅ No orphaned files
- ✅ All file paths correct
- ✅ All subcommands work
---
## Verification Commands
Test the refactored CLI:
```bash
# Version check
tito --version
# Module commands
tito module -h
tito module status
# System commands
tito system -h
tito system info
tito system doctor
# Package commands
tito package -h
tito package reset -h
# Run all tests
pytest tests/cli/ -v
# Quick import test
python -c "from tito.main import TinyTorchCLI; print('Success')"
```
All passing! ✅
---
## Architecture Decision
**Question**: Should we organize commands with subcommands into subfolders?
**Answer**: YES! ✅
**Follows best practices from**:
- Git (`git/builtin/`)
- AWS CLI (`awscli/customizations/`)
- Django (`django/core/management/commands/`)
- Click (Python CLI framework)
**Key insight**: Flat worked when small, but with 34 files it became unmaintainable. Hierarchical structure scales better and makes ownership crystal clear.
---
## Future Additions
When adding new commands:
### Simple Command (no subcommands)
```bash
# Create at top level
tito/commands/newcmd.py
```
### Complex Command (with subcommands)
```bash
# Create subfolder
tito/commands/newcmd/
├── __init__.py # Export main command
├── newcmd.py # Main command
└── helper.py # Subcommand
```
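As an illustration of the subfolder pattern (the names here are placeholders, not real commands), the subfolder's `__init__.py` can simply re-export the main command class so the import in `tito/main.py` stays a single line:
```python
# tito/commands/newcmd/__init__.py  (hypothetical command, used only as an example)
from .newcmd import NewCmdCommand

__all__ = ["NewCmdCommand"]
```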
Tests will automatically validate! 🎉
---
## Impact Summary
| Metric | Before | After |
|--------|--------|-------|
| Total files in commands/ | 34 | 29 (+ 3 subfolders) |
| Flat files | 34 | 19 |
| Organized in subfolders | 0 | 10 |
| Orphaned files | Unknown | 0 (archived) |
| Tests passing | 52 | 52 |
| User-facing changes | N/A | 0 |
| Developer clarity | ⚠️ Confusing | ✅ Crystal clear |
**Result**: Much cleaner, easier to maintain, zero user impact! 🚀

View File

@@ -1,472 +1,436 @@
#!/usr/bin/env python3
"""
Comprehensive gradient flow testing for TinyTorch.
Comprehensive Gradient Flow Tests for TinyTorch
================================================
This test suite systematically validates that gradients propagate correctly
through all components of the training stack.
Tests that gradients flow correctly through:
1. Simple networks (single layer)
2. Multi-layer networks (MLP)
3. Convolutional networks (CNN)
4. Attention mechanisms
5. Complete training loops
Run with: pytest tests/test_gradient_flow.py -v
Or directly: python tests/test_gradient_flow.py
This ensures backpropagation works correctly end-to-end.
"""
import numpy as np
import sys
import os
import numpy as np
# Add project root to path
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, project_root)
from tinytorch import Tensor, Linear, Dropout
from tinytorch import Sigmoid, ReLU, Tanh, GELU, Softmax
from tinytorch import MSELoss, CrossEntropyLoss, BinaryCrossEntropyLoss
from tinytorch import SGD, AdamW
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear, Dropout
from tinytorch.core.activations import ReLU, Sigmoid, Softmax
from tinytorch.core.losses import MSELoss, BinaryCrossEntropyLoss, CrossEntropyLoss
from tinytorch.core.optimizers import SGD, Adam
from tinytorch.core.spatial import Conv2d, MaxPool2d
from tinytorch.core.autograd import enable_autograd
# Enable autograd
enable_autograd()
def test_simple_linear_gradient_flow():
"""Test gradients flow through a single linear layer"""
print("\n" + "="*70)
print("TEST 1: Simple Linear Layer Gradient Flow")
print("="*70)
# Create simple network: Linear(2->1)
layer = Linear(2, 1)
# Input
x = Tensor([[1.0, 2.0]], requires_grad=True)
target = Tensor([[3.0]])
# Forward pass
output = layer.forward(x)
# Loss
loss_fn = MSELoss()
loss = loss_fn.forward(output, target)
print(f"Initial loss: {float(loss.data):.4f}")
print(f"Initial weight shape: {layer.weight.shape}")
print(f"Initial bias shape: {layer.bias.shape}")
# Backward pass
loss.backward()
# Check gradients exist
assert layer.weight.grad is not None, "Weight gradient is None!"
assert layer.bias.grad is not None, "Bias gradient is None!"
assert x.grad is not None, "Input gradient is None!"
# Check gradients are non-zero
weight_grad_norm = np.linalg.norm(layer.weight.grad.data)
bias_grad_norm = np.linalg.norm(layer.bias.grad.data)
input_grad_norm = np.linalg.norm(x.grad.data)
print(f"\n✓ Weight gradient norm: {weight_grad_norm:.6f}")
print(f"✓ Bias gradient norm: {bias_grad_norm:.6f}")
print(f"✓ Input gradient norm: {input_grad_norm:.6f}")
assert weight_grad_norm > 1e-6, f"Weight gradients too small: {weight_grad_norm}"
assert bias_grad_norm > 1e-6, f"Bias gradients too small: {bias_grad_norm}"
assert input_grad_norm > 1e-6, f"Input gradients too small: {input_grad_norm}"
print("\n✅ TEST PASSED: Gradients flow correctly through linear layer")
return True
class TestBasicTensorGradients:
"""Test gradient computation for basic tensor operations."""
def test_multiplication_gradient(self):
"""Test gradient flow through multiplication."""
x = Tensor([[1.0, 2.0]], requires_grad=True)
y = x * 3
loss = y.sum()
loss.backward()
# dy/dx = 3
assert x.grad is not None, "Gradient should be computed"
assert np.allclose(x.grad, [[3.0, 3.0]]), f"Expected [[3, 3]], got {x.grad}"
def test_addition_gradient(self):
"""Test gradient flow through addition."""
x = Tensor([[1.0, 2.0]], requires_grad=True)
y = Tensor([[3.0, 4.0]], requires_grad=True)
z = x + y
loss = z.sum()
loss.backward()
# dz/dx = 1, dz/dy = 1
assert np.allclose(x.grad, [[1.0, 1.0]]), f"x.grad: {x.grad}"
assert np.allclose(y.grad, [[1.0, 1.0]]), f"y.grad: {y.grad}"
def test_chain_rule(self):
"""Test gradient flow through chain of operations."""
x = Tensor([[2.0]], requires_grad=True)
y = x * 3 # y = 3x
z = y + 1 # z = 3x + 1
w = z * 2 # w = 2(3x + 1) = 6x + 2
w.backward()
# dw/dx = 6
assert np.allclose(x.grad, [[6.0]]), f"Expected [[6]], got {x.grad}"
def test_matmul_gradient(self):
"""Test gradient flow through matrix multiplication."""
x = Tensor([[1.0, 2.0]], requires_grad=True)
W = Tensor([[1.0], [2.0]], requires_grad=True)
y = x.matmul(W) # y = [[5.0]]
y.backward()
# dy/dx = W^T = [[1, 2]]
# dy/dW = x^T = [[1], [2]]
assert np.allclose(x.grad, [[1.0, 2.0]]), f"x.grad: {x.grad}"
assert np.allclose(W.grad, [[1.0], [2.0]]), f"W.grad: {W.grad}"
def test_broadcasting_gradient(self):
"""Test gradient flow with broadcasting (e.g., bias addition)."""
x = Tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True) # (2, 2)
bias = Tensor([1.0, 2.0], requires_grad=True) # (2,)
y = x + bias # Broadcasting happens
loss = y.sum()
loss.backward()
# Gradient should sum over broadcast dimension
assert x.grad.shape == (2, 2), f"x.grad shape: {x.grad.shape}"
assert bias.grad.shape == (2,), f"bias.grad shape: {bias.grad.shape}"
assert np.allclose(bias.grad, [2.0, 2.0]), f"bias.grad: {bias.grad}"
def test_mlp_gradient_flow():
"""Test gradients flow through multi-layer perceptron"""
print("\n" + "="*70)
print("TEST 2: Multi-Layer Perceptron Gradient Flow")
print("="*70)
# Create MLP: Input(4) -> Linear(4->8) -> ReLU -> Linear(8->2)
layer1 = Linear(4, 8)
activation = ReLU()
layer2 = Linear(8, 2)
# Input and target
x = Tensor(np.random.randn(3, 4), requires_grad=True)
target = Tensor(np.array([[1, 0], [0, 1], [1, 0]]))
print(f"Input shape: {x.shape}")
print(f"Target shape: {target.shape}")
# Forward pass
h1 = layer1.forward(x)
h1_activated = activation.forward(h1)
output = layer2.forward(h1_activated)
print(f"Hidden layer shape: {h1.shape}")
print(f"Output shape: {output.shape}")
# Loss
loss_fn = MSELoss()
loss = loss_fn.forward(output, target)
print(f"Initial loss: {float(loss.data):.4f}")
# Backward pass
loss.backward()
# Check all layer gradients exist
assert layer1.weight.grad is not None, "Layer1 weight gradient is None!"
assert layer1.bias.grad is not None, "Layer1 bias gradient is None!"
assert layer2.weight.grad is not None, "Layer2 weight gradient is None!"
assert layer2.bias.grad is not None, "Layer2 bias gradient is None!"
# Check gradient magnitudes
l1_weight_norm = np.linalg.norm(layer1.weight.grad.data)
l1_bias_norm = np.linalg.norm(layer1.bias.grad.data)
l2_weight_norm = np.linalg.norm(layer2.weight.grad.data)
l2_bias_norm = np.linalg.norm(layer2.bias.grad.data)
print(f"\n✓ Layer1 weight gradient norm: {l1_weight_norm:.6f}")
print(f"✓ Layer1 bias gradient norm: {l1_bias_norm:.6f}")
print(f"✓ Layer2 weight gradient norm: {l2_weight_norm:.6f}")
print(f"✓ Layer2 bias gradient norm: {l2_bias_norm:.6f}")
assert l1_weight_norm > 1e-6, "Layer1 weight gradients too small"
assert l1_bias_norm > 1e-6, "Layer1 bias gradients too small"
assert l2_weight_norm > 1e-6, "Layer2 weight gradients too small"
assert l2_bias_norm > 1e-6, "Layer2 bias gradients too small"
print("\n✅ TEST PASSED: Gradients flow correctly through MLP")
return True
class TestLayerGradients:
"""Test gradient computation through neural network layers."""
def test_linear_layer_gradients(self):
"""Test gradient flow through Linear layer."""
layer = Linear(2, 3)
x = Tensor([[1.0, 2.0]], requires_grad=True)
w_before = layer.weight.data.copy()
b_before = layer.bias.data.copy()
out = layer(x)
loss = out.sum()
loss.backward()
# All gradients should exist
assert layer.weight.grad is not None, "Weight gradient missing"
assert layer.bias.grad is not None, "Bias gradient missing"
assert x.grad is not None, "Input gradient missing"
# Gradient shapes should match parameter shapes
assert layer.weight.grad.shape == layer.weight.shape
assert layer.bias.grad.shape == layer.bias.shape
def test_multi_layer_gradients(self):
"""Test gradient flow through multiple layers."""
layer1 = Linear(2, 3)
layer2 = Linear(3, 1)
x = Tensor([[1.0, 2.0]], requires_grad=True)
h = layer1(x)
out = layer2(h)
loss = out.sum()
loss.backward()
# All layers should have gradients
assert layer1.weight.grad is not None
assert layer1.bias.grad is not None
assert layer2.weight.grad is not None
assert layer2.bias.grad is not None
def test_mlp_training_updates():
"""Test that MLP actually learns (loss decreases)"""
print("\n" + "="*70)
print("TEST 3: MLP Training - Loss Reduction")
print("="*70)
# Create simple MLP
layer1 = Linear(2, 4)
activation = ReLU()
layer2 = Linear(4, 1)
class TestActivationGradients:
"""Test gradient computation through activation functions."""
def test_sigmoid_gradient(self):
"""Test gradient flow through Sigmoid."""
x = Tensor([[0.0, 1.0, -1.0]], requires_grad=True)
sigmoid = Sigmoid()
y = sigmoid(x)
loss = y.sum()
loss.backward()
assert x.grad is not None, "Sigmoid gradient missing"
# Sigmoid gradient: σ'(x) = σ(x)(1 - σ(x))
# At x=0: σ(0) = 0.5, σ'(0) = 0.25
assert x.grad[0, 0] > 0, "Gradient should be positive"
def test_relu_gradient(self):
"""Test gradient flow through ReLU."""
x = Tensor([[-1.0, 0.0, 1.0]], requires_grad=True)
relu = ReLU()
y = relu(x)
loss = y.sum()
loss.backward()
# ReLU gradient: 1 if x > 0, else 0
# Note: We haven't implemented ReLU backward yet, so this will fail
# TODO: Implement ReLU backward in autograd
def test_tanh_gradient(self):
"""Test gradient flow through Tanh."""
x = Tensor([[0.0, 1.0]], requires_grad=True)
tanh = Tanh()
y = tanh(x)
loss = y.sum()
# TODO: Implement Tanh backward
# loss.backward()
# Simple dataset (XOR-like)
X = Tensor(np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]), requires_grad=False)
y = Tensor(np.array([[0.0], [1.0], [1.0], [0.0]]))
# Optimizer
optimizer = SGD([layer1.weight, layer1.bias, layer2.weight, layer2.bias], lr=0.1)
loss_fn = MSELoss()
class TestLossGradients:
"""Test gradient computation through loss functions."""
def test_bce_gradient(self):
"""Test gradient flow through Binary Cross-Entropy."""
predictions = Tensor([[0.7, 0.3, 0.9]], requires_grad=True)
targets = Tensor([[1.0, 0.0, 1.0]])
loss_fn = BinaryCrossEntropyLoss()
loss = loss_fn(predictions, targets)
loss.backward()
assert predictions.grad is not None, "BCE gradient missing"
assert predictions.grad.shape == predictions.shape
# Gradient should be negative for correct predictions
assert predictions.grad[0, 0] < 0, "Gradient sign incorrect"
def test_mse_gradient(self):
"""Test gradient flow through MSE loss."""
predictions = Tensor([[1.0, 2.0, 3.0]], requires_grad=True)
targets = Tensor([[2.0, 2.0, 2.0]])
loss_fn = MSELoss()
loss = loss_fn(predictions, targets)
# TODO: Implement MSE backward
# loss.backward()
losses = []
print("Training for 50 epochs...")
for epoch in range(50):
# Forward
h1 = layer1.forward(X)
h1_act = activation.forward(h1)
output = layer2.forward(h1_act)
class TestOptimizerIntegration:
"""Test optimizer integration with gradient flow."""
def test_sgd_updates_parameters(self):
"""Test that SGD actually updates parameters."""
layer = Linear(2, 1)
optimizer = SGD(layer.parameters(), lr=0.1)
w_before = layer.weight.data.copy()
b_before = layer.bias.data.copy()
# Forward pass
x = Tensor([[1.0, 2.0]], requires_grad=True)
out = layer(x)
loss = out.sum()
# Backward pass
loss.backward()
# Optimizer step
optimizer.step()
# Parameters should change
assert not np.allclose(layer.weight.data, w_before), "Weights didn't update"
assert not np.allclose(layer.bias.data, b_before), "Bias didn't update"
def test_zero_grad_clears_gradients(self):
"""Test that zero_grad() clears gradients."""
layer = Linear(2, 1)
optimizer = SGD(layer.parameters(), lr=0.1)
# First backward pass
x = Tensor([[1.0, 2.0]])
out = layer(x)
loss = out.sum()
loss.backward()
assert layer.weight.grad is not None, "Gradient should exist"
# Clear gradients
# Loss
loss = loss_fn.forward(output, y)
losses.append(float(loss.data))
# Backward
optimizer.zero_grad()
assert layer.weight.grad is None, "Gradient should be cleared"
assert layer.bias.grad is None, "Bias gradient should be cleared"
def test_adamw_updates_parameters(self):
"""Test that AdamW optimizer works."""
layer = Linear(2, 1)
optimizer = AdamW(layer.parameters(), lr=0.01)
w_before = layer.weight.data.copy()
x = Tensor([[1.0, 2.0]])
out = layer(x)
loss = out.sum()
loss.backward()
# Update
optimizer.step()
assert not np.allclose(layer.weight.data, w_before), "AdamW didn't update weights"
if (epoch + 1) % 10 == 0:
print(f"Epoch {epoch+1:2d}: Loss = {float(loss.data):.6f}")
# Check loss decreased
initial_loss = losses[0]
final_loss = losses[-1]
reduction = initial_loss - final_loss
reduction_pct = (reduction / initial_loss) * 100
print(f"\n✓ Initial loss: {initial_loss:.6f}")
print(f"✓ Final loss: {final_loss:.6f}")
print(f"✓ Reduction: {reduction:.6f} ({reduction_pct:.1f}%)")
assert final_loss < initial_loss, f"Loss didn't decrease! Initial: {initial_loss}, Final: {final_loss}"
assert reduction_pct > 10, f"Loss reduction too small: {reduction_pct:.1f}%"
print("\n✅ TEST PASSED: MLP learns successfully (loss decreases)")
return True
class TestFullTrainingLoop:
"""Test complete training scenarios."""
def test_simple_convergence(self):
"""Test that a simple model can learn."""
# Simple task: learn to output 5 from input [1, 2]
layer = Linear(2, 1)
optimizer = SGD(layer.parameters(), lr=0.1)
loss_fn = MSELoss()
x = Tensor([[1.0, 2.0]])
target = Tensor([[5.0]])
initial_loss = None
final_loss = None
# Train for a few iterations
for i in range(50):
# Forward
pred = layer(x)
loss = loss_fn(pred, target)
if i == 0:
initial_loss = loss.data
if i == 49:
final_loss = loss.data
# Backward
loss.backward()
# Update
optimizer.step()
optimizer.zero_grad()
# Loss should decrease
assert final_loss < initial_loss, f"Loss didn't decrease: {initial_loss}{final_loss}"
def test_binary_classification(self):
"""Test binary classification training."""
layer = Linear(2, 1)
sigmoid = Sigmoid()
loss_fn = BinaryCrossEntropyLoss()
optimizer = SGD(layer.parameters(), lr=0.1)
# Simple dataset: [1, 1] → 1, [0, 0] → 0
X = Tensor([[1.0, 1.0], [0.0, 0.0]])
y = Tensor([[1.0], [0.0]])
initial_loss = None
final_loss = None
for i in range(50):
# Forward
logits = layer(X)
probs = sigmoid(logits)
loss = loss_fn(probs, y)
if i == 0:
initial_loss = loss.data
if i == 49:
final_loss = loss.data
# Backward
loss.backward()
# Update
optimizer.step()
optimizer.zero_grad()
assert final_loss < initial_loss, "Binary classification didn't learn"
def test_cnn_gradient_flow():
"""Test gradients flow through convolutional layers"""
print("\n" + "="*70)
print("TEST 4: CNN Gradient Flow")
print("="*70)
# Create simple CNN: Conv2d -> ReLU -> Linear
conv = Conv2d(in_channels=1, out_channels=4, kernel_size=3, stride=1, padding=0)
activation = ReLU()
# Input: batch=2, channels=1, height=8, width=8
x = Tensor(np.random.randn(2, 1, 8, 8), requires_grad=True)
print(f"Input shape: {x.shape}")
print(f"Conv weight shape: {conv.weight.shape}")
# Forward through conv
conv_out = conv.forward(x)
print(f"Conv output shape: {conv_out.shape}")
activated = activation.forward(conv_out)
# Flatten for linear layer
batch_size = activated.shape[0]
flattened_size = np.prod(activated.shape[1:])
# Use reshape method to maintain gradient flow
flattened = activated.reshape(batch_size, flattened_size)
linear = Linear(flattened_size, 2)
output = linear.forward(flattened)
print(f"Flattened shape: {flattened.shape}")
print(f"Output shape: {output.shape}")
# Loss
target = Tensor(np.array([[1, 0], [0, 1]]))
loss_fn = MSELoss()
loss = loss_fn.forward(output, target)
print(f"Initial loss: {float(loss.data):.4f}")
# Backward
loss.backward()
# Check gradients
assert conv.weight.grad is not None, "Conv weight gradient is None!"
assert conv.bias.grad is not None, "Conv bias gradient is None!"
assert linear.weight.grad is not None, "Linear weight gradient is None!"
weight_grad_norm = np.linalg.norm(conv.weight.grad.data)
conv_bias_norm = np.linalg.norm(conv.bias.grad.data)
linear_grad_norm = np.linalg.norm(linear.weight.grad.data)
print(f"\n✓ Conv weight gradient norm: {weight_grad_norm:.6f}")
print(f"✓ Conv bias gradient norm: {conv_bias_norm:.6f}")
print(f"✓ Linear weight gradient norm: {linear_grad_norm:.6f}")
assert weight_grad_norm > 1e-6, f"Conv weight gradients too small: {weight_grad_norm}"
assert conv_bias_norm > 1e-6, f"Conv bias gradients too small: {conv_bias_norm}"
assert linear_grad_norm > 1e-6, f"Linear gradients too small: {linear_grad_norm}"
print("\n✅ TEST PASSED: Gradients flow correctly through CNN")
return True
class TestEdgeCases:
"""Test edge cases and potential failure modes."""
def test_zero_gradient(self):
"""Test that zero gradients don't break training."""
x = Tensor([[0.0, 0.0]], requires_grad=True)
y = x * 0
loss = y.sum()
def test_cnn_training_updates():
"""Test that CNN actually learns on simple data"""
print("\n" + "="*70)
print("TEST 5: CNN Training - Loss Reduction")
print("="*70)
# Simple CNN
conv = Conv2d(1, 2, kernel_size=3, stride=1, padding=1)
activation = ReLU()
# Simple data: 4 samples, 1 channel, 4x4 images
X = Tensor(np.random.randn(4, 1, 4, 4), requires_grad=False)
# After conv: (4, 2, 4, 4) -> flatten to (4, 32)
conv_out_size = 2 * 4 * 4 # channels * height * width
linear = Linear(conv_out_size, 2)
y = Tensor(np.array([[1, 0], [0, 1], [1, 0], [0, 1]]))
# Get parameters with gradients
params = []
for p in [conv.weight, conv.bias, linear.weight, linear.bias]:
if not p.requires_grad:
p.requires_grad = True
params.append(p)
# Optimizer
optimizer = SGD(params, lr=0.01)
loss_fn = MSELoss()
losses = []
print("Training for 30 epochs...")
for epoch in range(30):
# Forward
conv_out = conv.forward(X)
activated = activation.forward(conv_out)
# Flatten using reshape to maintain gradients
batch_size = activated.shape[0]
flattened = activated.reshape(batch_size, -1)
output = linear.forward(flattened)
# Loss
loss = loss_fn.forward(output, y)
losses.append(float(loss.data))
# Backward
optimizer.zero_grad()
loss.backward()
assert x.grad is not None
assert np.allclose(x.grad, [[0.0, 0.0]])
def test_very_small_values(self):
"""Test gradient flow with very small values."""
x = Tensor([[1e-8, 1e-8]], requires_grad=True)
y = x * 2
loss = y.sum()
loss.backward()
assert x.grad is not None
assert np.allclose(x.grad, [[2.0, 2.0]])
def test_gradient_accumulation(self):
"""Test that gradients accumulate correctly across multiple backward passes."""
x = Tensor([[1.0]], requires_grad=True)
# First backward
y1 = x * 2
y1.backward()
grad_after_first = x.grad.copy()
# Second backward (without zero_grad)
y2 = x * 3
y2.backward()
# Gradient should accumulate: 2 + 3 = 5
expected = grad_after_first + np.array([[3.0]])
assert np.allclose(x.grad, expected), f"Expected {expected}, got {x.grad}"
# Update
optimizer.step()
if (epoch + 1) % 10 == 0:
print(f"Epoch {epoch+1:2d}: Loss = {float(loss.data):.6f}")
# Check loss decreased
initial_loss = losses[0]
final_loss = losses[-1]
reduction = initial_loss - final_loss
reduction_pct = (reduction / initial_loss) * 100
print(f"\n✓ Initial loss: {initial_loss:.6f}")
print(f"✓ Final loss: {final_loss:.6f}")
print(f"✓ Reduction: {reduction:.6f} ({reduction_pct:.1f}%)")
assert final_loss < initial_loss, f"Loss didn't decrease! Initial: {initial_loss}, Final: {final_loss}"
print("\n✅ TEST PASSED: CNN learns successfully (loss decreases)")
return True
def run_all_tests():
"""Run all tests and print results."""
import inspect
test_classes = [
TestBasicTensorGradients,
TestLayerGradients,
TestActivationGradients,
TestLossGradients,
TestOptimizerIntegration,
TestFullTrainingLoop,
TestEdgeCases,
def test_gradient_accumulation():
"""Test that gradients accumulate correctly across batches"""
print("\n" + "="*70)
print("TEST 6: Gradient Accumulation")
print("="*70)
layer = Linear(2, 1)
# Two batches
x1 = Tensor([[1.0, 2.0]], requires_grad=True)
x2 = Tensor([[3.0, 4.0]], requires_grad=True)
target = Tensor([[1.0]])
loss_fn = MSELoss()
# Forward + backward on first batch (don't zero grad)
out1 = layer.forward(x1)
loss1 = loss_fn.forward(out1, target)
loss1.backward()
grad_after_first = np.array(layer.weight.grad.data)
# Forward + backward on second batch (gradients should accumulate)
out2 = layer.forward(x2)
loss2 = loss_fn.forward(out2, target)
loss2.backward()
grad_after_second = layer.weight.grad.data
# Gradients should have accumulated (not been replaced)
grad_diff = np.linalg.norm(grad_after_second - grad_after_first)
print(f"✓ Gradient after first batch norm: {np.linalg.norm(grad_after_first):.6f}")
print(f"✓ Gradient after second batch norm: {np.linalg.norm(grad_after_second):.6f}")
print(f"✓ Difference: {grad_diff:.6f}")
assert grad_diff > 1e-6, "Gradients didn't accumulate properly"
print("\n✅ TEST PASSED: Gradients accumulate correctly")
return True
def main():
"""Run all gradient flow tests"""
print("\n" + "="*70)
print(" TINYTORCH GRADIENT FLOW TEST SUITE")
print("="*70)
tests = [
("Simple Linear", test_simple_linear_gradient_flow),
("MLP Gradient Flow", test_mlp_gradient_flow),
("MLP Training", test_mlp_training_updates),
("CNN Gradient Flow", test_cnn_gradient_flow),
("CNN Training", test_cnn_training_updates),
("Gradient Accumulation", test_gradient_accumulation),
]
total_tests = 0
passed_tests = 0
failed_tests = []
skipped_tests = []
print("=" * 80)
print("🧪 TINYTORCH GRADIENT FLOW TEST SUITE")
print("=" * 80)
for test_class in test_classes:
print(f"\n{'=' * 80}")
print(f"📦 {test_class.__name__}")
print(f"{'=' * 80}")
instance = test_class()
methods = [m for m in dir(instance) if m.startswith('test_')]
for method_name in methods:
total_tests += 1
method = getattr(instance, method_name)
# Get docstring
doc = method.__doc__ or method_name
doc = doc.strip().split('\n')[0]
print(f"\n {method_name}")
print(f" {doc}")
try:
method()
print(f" ✅ PASSED")
passed_tests += 1
except NotImplementedError as e:
print(f" ⏭️ SKIPPED: {e}")
skipped_tests.append((test_class.__name__, method_name, str(e)))
except AssertionError as e:
print(f" ❌ FAILED: {e}")
failed_tests.append((test_class.__name__, method_name, str(e)))
except Exception as e:
print(f" ❌ ERROR: {e}")
failed_tests.append((test_class.__name__, method_name, str(e)))
results = []
for name, test_func in tests:
try:
result = test_func()
results.append((name, "PASSED" if result else "FAILED"))
except Exception as e:
print(f"\n❌ TEST FAILED: {name}")
print(f"Error: {str(e)}")
import traceback
traceback.print_exc()
results.append((name, "FAILED"))
# Summary
print("\n" + "=" * 80)
print("📊 TEST SUMMARY")
print("=" * 80)
print(f"Total tests: {total_tests}")
print(f"✅ Passed: {passed_tests}")
print(f"❌ Failed: {len(failed_tests)}")
print(f"⏭️ Skipped: {len(skipped_tests)}")
if failed_tests:
print("\n" + "=" * 80)
print("❌ FAILED TESTS:")
print("=" * 80)
for class_name, method_name, error in failed_tests:
print(f"\n {class_name}.{method_name}")
print(f" {error}")
if skipped_tests:
print("\n" + "=" * 80)
print("⏭️ SKIPPED TESTS (Not Yet Implemented):")
print("=" * 80)
for class_name, method_name, reason in skipped_tests:
print(f" {class_name}.{method_name}")
print("\n" + "=" * 80)
return len(failed_tests) == 0
print("\n" + "="*70)
print(" TEST SUMMARY")
print("="*70)
passed = sum(1 for _, status in results if status == "PASSED")
total = len(results)
for name, status in results:
symbol = "" if status == "PASSED" else ""
print(f"{symbol} {name}: {status}")
print(f"\nTotal: {passed}/{total} tests passed")
if passed == total:
print("\n🎉 ALL TESTS PASSED! Gradients flow correctly through TinyTorch.")
return 0
else:
print(f"\n⚠️ {total - passed} tests failed. Please review the errors above.")
return 1
if __name__ == "__main__":
success = run_all_tests()
sys.exit(0 if success else 1)
exit(main())

View File

@@ -1,436 +0,0 @@
#!/usr/bin/env python3
"""
Comprehensive Gradient Flow Tests for TinyTorch
================================================
Tests that gradients flow correctly through:
1. Simple networks (single layer)
2. Multi-layer networks (MLP)
3. Convolutional networks (CNN)
4. Attention mechanisms
5. Complete training loops
This ensures backpropagation works correctly end-to-end.
"""
import sys
import os
import numpy as np
# Add project root to path
project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, project_root)
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear, Dropout
from tinytorch.core.activations import ReLU, Sigmoid, Softmax
from tinytorch.core.losses import MSELoss, BinaryCrossEntropyLoss, CrossEntropyLoss
from tinytorch.core.optimizers import SGD, Adam
from tinytorch.core.spatial import Conv2d, MaxPool2d
from tinytorch.core.autograd import enable_autograd
# Enable autograd
enable_autograd()
def test_simple_linear_gradient_flow():
"""Test gradients flow through a single linear layer"""
print("\n" + "="*70)
print("TEST 1: Simple Linear Layer Gradient Flow")
print("="*70)
# Create simple network: Linear(2->1)
layer = Linear(2, 1)
# Input
x = Tensor([[1.0, 2.0]], requires_grad=True)
target = Tensor([[3.0]])
# Forward pass
output = layer.forward(x)
# Loss
loss_fn = MSELoss()
loss = loss_fn.forward(output, target)
print(f"Initial loss: {float(loss.data):.4f}")
print(f"Initial weight shape: {layer.weight.shape}")
print(f"Initial bias shape: {layer.bias.shape}")
# Backward pass
loss.backward()
# Check gradients exist
assert layer.weight.grad is not None, "Weight gradient is None!"
assert layer.bias.grad is not None, "Bias gradient is None!"
assert x.grad is not None, "Input gradient is None!"
# Check gradients are non-zero
weight_grad_norm = np.linalg.norm(layer.weight.grad.data)
bias_grad_norm = np.linalg.norm(layer.bias.grad.data)
input_grad_norm = np.linalg.norm(x.grad.data)
print(f"\n✓ Weight gradient norm: {weight_grad_norm:.6f}")
print(f"✓ Bias gradient norm: {bias_grad_norm:.6f}")
print(f"✓ Input gradient norm: {input_grad_norm:.6f}")
assert weight_grad_norm > 1e-6, f"Weight gradients too small: {weight_grad_norm}"
assert bias_grad_norm > 1e-6, f"Bias gradients too small: {bias_grad_norm}"
assert input_grad_norm > 1e-6, f"Input gradients too small: {input_grad_norm}"
print("\n✅ TEST PASSED: Gradients flow correctly through linear layer")
return True
def test_mlp_gradient_flow():
"""Test gradients flow through multi-layer perceptron"""
print("\n" + "="*70)
print("TEST 2: Multi-Layer Perceptron Gradient Flow")
print("="*70)
# Create MLP: Input(4) -> Linear(4->8) -> ReLU -> Linear(8->2)
layer1 = Linear(4, 8)
activation = ReLU()
layer2 = Linear(8, 2)
# Input and target
x = Tensor(np.random.randn(3, 4), requires_grad=True)
target = Tensor(np.array([[1, 0], [0, 1], [1, 0]]))
print(f"Input shape: {x.shape}")
print(f"Target shape: {target.shape}")
# Forward pass
h1 = layer1.forward(x)
h1_activated = activation.forward(h1)
output = layer2.forward(h1_activated)
print(f"Hidden layer shape: {h1.shape}")
print(f"Output shape: {output.shape}")
# Loss
loss_fn = MSELoss()
loss = loss_fn.forward(output, target)
print(f"Initial loss: {float(loss.data):.4f}")
# Backward pass
loss.backward()
# Check all layer gradients exist
assert layer1.weight.grad is not None, "Layer1 weight gradient is None!"
assert layer1.bias.grad is not None, "Layer1 bias gradient is None!"
assert layer2.weight.grad is not None, "Layer2 weight gradient is None!"
assert layer2.bias.grad is not None, "Layer2 bias gradient is None!"
# Check gradient magnitudes
l1_weight_norm = np.linalg.norm(layer1.weight.grad.data)
l1_bias_norm = np.linalg.norm(layer1.bias.grad.data)
l2_weight_norm = np.linalg.norm(layer2.weight.grad.data)
l2_bias_norm = np.linalg.norm(layer2.bias.grad.data)
print(f"\n✓ Layer1 weight gradient norm: {l1_weight_norm:.6f}")
print(f"✓ Layer1 bias gradient norm: {l1_bias_norm:.6f}")
print(f"✓ Layer2 weight gradient norm: {l2_weight_norm:.6f}")
print(f"✓ Layer2 bias gradient norm: {l2_bias_norm:.6f}")
assert l1_weight_norm > 1e-6, "Layer1 weight gradients too small"
assert l1_bias_norm > 1e-6, "Layer1 bias gradients too small"
assert l2_weight_norm > 1e-6, "Layer2 weight gradients too small"
assert l2_bias_norm > 1e-6, "Layer2 bias gradients too small"
print("\n✅ TEST PASSED: Gradients flow correctly through MLP")
return True
def test_mlp_training_updates():
"""Test that MLP actually learns (loss decreases)"""
print("\n" + "="*70)
print("TEST 3: MLP Training - Loss Reduction")
print("="*70)
# Create simple MLP
layer1 = Linear(2, 4)
activation = ReLU()
layer2 = Linear(4, 1)
# Simple dataset (XOR-like)
X = Tensor(np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]), requires_grad=False)
y = Tensor(np.array([[0.0], [1.0], [1.0], [0.0]]))
# Optimizer
optimizer = SGD([layer1.weight, layer1.bias, layer2.weight, layer2.bias], lr=0.1)
loss_fn = MSELoss()
losses = []
print("Training for 50 epochs...")
for epoch in range(50):
# Forward
h1 = layer1.forward(X)
h1_act = activation.forward(h1)
output = layer2.forward(h1_act)
# Loss
loss = loss_fn.forward(output, y)
losses.append(float(loss.data))
# Backward
optimizer.zero_grad()
loss.backward()
# Update
optimizer.step()
if (epoch + 1) % 10 == 0:
print(f"Epoch {epoch+1:2d}: Loss = {float(loss.data):.6f}")
# Check loss decreased
initial_loss = losses[0]
final_loss = losses[-1]
reduction = initial_loss - final_loss
reduction_pct = (reduction / initial_loss) * 100
print(f"\n✓ Initial loss: {initial_loss:.6f}")
print(f"✓ Final loss: {final_loss:.6f}")
print(f"✓ Reduction: {reduction:.6f} ({reduction_pct:.1f}%)")
assert final_loss < initial_loss, f"Loss didn't decrease! Initial: {initial_loss}, Final: {final_loss}"
assert reduction_pct > 10, f"Loss reduction too small: {reduction_pct:.1f}%"
print("\n✅ TEST PASSED: MLP learns successfully (loss decreases)")
return True
def test_cnn_gradient_flow():
"""Test gradients flow through convolutional layers"""
print("\n" + "="*70)
print("TEST 4: CNN Gradient Flow")
print("="*70)
# Create simple CNN: Conv2d -> ReLU -> Linear
conv = Conv2d(in_channels=1, out_channels=4, kernel_size=3, stride=1, padding=0)
activation = ReLU()
# Input: batch=2, channels=1, height=8, width=8
x = Tensor(np.random.randn(2, 1, 8, 8), requires_grad=True)
print(f"Input shape: {x.shape}")
print(f"Conv weight shape: {conv.weight.shape}")
# Forward through conv
conv_out = conv.forward(x)
print(f"Conv output shape: {conv_out.shape}")
activated = activation.forward(conv_out)
# Flatten for linear layer
batch_size = activated.shape[0]
flattened_size = np.prod(activated.shape[1:])
# Use reshape method to maintain gradient flow
flattened = activated.reshape(batch_size, flattened_size)
linear = Linear(flattened_size, 2)
output = linear.forward(flattened)
print(f"Flattened shape: {flattened.shape}")
print(f"Output shape: {output.shape}")
# Loss
target = Tensor(np.array([[1, 0], [0, 1]]))
loss_fn = MSELoss()
loss = loss_fn.forward(output, target)
print(f"Initial loss: {float(loss.data):.4f}")
# Backward
loss.backward()
# Check gradients
assert conv.weight.grad is not None, "Conv weight gradient is None!"
assert conv.bias.grad is not None, "Conv bias gradient is None!"
assert linear.weight.grad is not None, "Linear weight gradient is None!"
weight_grad_norm = np.linalg.norm(conv.weight.grad.data)
conv_bias_norm = np.linalg.norm(conv.bias.grad.data)
linear_grad_norm = np.linalg.norm(linear.weight.grad.data)
print(f"\n✓ Conv weight gradient norm: {weight_grad_norm:.6f}")
print(f"✓ Conv bias gradient norm: {conv_bias_norm:.6f}")
print(f"✓ Linear weight gradient norm: {linear_grad_norm:.6f}")
assert weight_grad_norm > 1e-6, f"Conv weight gradients too small: {weight_grad_norm}"
assert conv_bias_norm > 1e-6, f"Conv bias gradients too small: {conv_bias_norm}"
assert linear_grad_norm > 1e-6, f"Linear gradients too small: {linear_grad_norm}"
print("\n✅ TEST PASSED: Gradients flow correctly through CNN")
return True
def test_cnn_training_updates():
"""Test that CNN actually learns on simple data"""
print("\n" + "="*70)
print("TEST 5: CNN Training - Loss Reduction")
print("="*70)
# Simple CNN
conv = Conv2d(1, 2, kernel_size=3, stride=1, padding=1)
activation = ReLU()
# Simple data: 4 samples, 1 channel, 4x4 images
X = Tensor(np.random.randn(4, 1, 4, 4), requires_grad=False)
# After conv: (4, 2, 4, 4) -> flatten to (4, 32)
conv_out_size = 2 * 4 * 4 # channels * height * width
linear = Linear(conv_out_size, 2)
y = Tensor(np.array([[1, 0], [0, 1], [1, 0], [0, 1]]))
# Get parameters with gradients
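    # The loop below forces requires_grad=True on any parameter that lacks it, so SGD receives gradients for every conv/linear parameter.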
params = []
for p in [conv.weight, conv.bias, linear.weight, linear.bias]:
if not p.requires_grad:
p.requires_grad = True
params.append(p)
# Optimizer
optimizer = SGD(params, lr=0.01)
loss_fn = MSELoss()
losses = []
print("Training for 30 epochs...")
for epoch in range(30):
# Forward
conv_out = conv.forward(X)
activated = activation.forward(conv_out)
# Flatten using reshape to maintain gradients
batch_size = activated.shape[0]
flattened = activated.reshape(batch_size, -1)
output = linear.forward(flattened)
# Loss
loss = loss_fn.forward(output, y)
losses.append(float(loss.data))
# Backward
optimizer.zero_grad()
loss.backward()
# Update
optimizer.step()
if (epoch + 1) % 10 == 0:
print(f"Epoch {epoch+1:2d}: Loss = {float(loss.data):.6f}")
# Check loss decreased
initial_loss = losses[0]
final_loss = losses[-1]
reduction = initial_loss - final_loss
reduction_pct = (reduction / initial_loss) * 100
print(f"\n✓ Initial loss: {initial_loss:.6f}")
print(f"✓ Final loss: {final_loss:.6f}")
print(f"✓ Reduction: {reduction:.6f} ({reduction_pct:.1f}%)")
assert final_loss < initial_loss, f"Loss didn't decrease! Initial: {initial_loss}, Final: {final_loss}"
print("\n✅ TEST PASSED: CNN learns successfully (loss decreases)")
return True
def test_gradient_accumulation():
"""Test that gradients accumulate correctly across batches"""
print("\n" + "="*70)
print("TEST 6: Gradient Accumulation")
print("="*70)
layer = Linear(2, 1)
# Two batches
x1 = Tensor([[1.0, 2.0]], requires_grad=True)
x2 = Tensor([[3.0, 4.0]], requires_grad=True)
target = Tensor([[1.0]])
loss_fn = MSELoss()
# Forward + backward on first batch (don't zero grad)
out1 = layer.forward(x1)
loss1 = loss_fn.forward(out1, target)
loss1.backward()
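    # Snapshot the gradient with np.array(...) (a copy), so the comparison below isn't affected if the grad buffer is updated in place.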
grad_after_first = np.array(layer.weight.grad.data)
# Forward + backward on second batch (gradients should accumulate)
out2 = layer.forward(x2)
loss2 = loss_fn.forward(out2, target)
loss2.backward()
grad_after_second = layer.weight.grad.data
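    # No copy needed here: this is the final value compared against the earlier snapshot.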
# Gradients should have accumulated (not been replaced)
grad_diff = np.linalg.norm(grad_after_second - grad_after_first)
print(f"✓ Gradient after first batch norm: {np.linalg.norm(grad_after_first):.6f}")
print(f"✓ Gradient after second batch norm: {np.linalg.norm(grad_after_second):.6f}")
print(f"✓ Difference: {grad_diff:.6f}")
assert grad_diff > 1e-6, "Gradients didn't accumulate properly"
print("\n✅ TEST PASSED: Gradients accumulate correctly")
return True
def main():
"""Run all gradient flow tests"""
print("\n" + "="*70)
print(" TINYTORCH GRADIENT FLOW TEST SUITE")
print("="*70)
tests = [
("Simple Linear", test_simple_linear_gradient_flow),
("MLP Gradient Flow", test_mlp_gradient_flow),
("MLP Training", test_mlp_training_updates),
("CNN Gradient Flow", test_cnn_gradient_flow),
("CNN Training", test_cnn_training_updates),
("Gradient Accumulation", test_gradient_accumulation),
]
results = []
for name, test_func in tests:
try:
result = test_func()
results.append((name, "PASSED" if result else "FAILED"))
except Exception as e:
print(f"\n❌ TEST FAILED: {name}")
print(f"Error: {str(e)}")
import traceback
traceback.print_exc()
results.append((name, "FAILED"))
# Summary
print("\n" + "="*70)
print(" TEST SUMMARY")
print("="*70)
passed = sum(1 for _, status in results if status == "PASSED")
total = len(results)
for name, status in results:
symbol = "" if status == "PASSED" else ""
print(f"{symbol} {name}: {status}")
print(f"\nTotal: {passed}/{total} tests passed")
if passed == total:
print("\n🎉 ALL TESTS PASSED! Gradients flow correctly through TinyTorch.")
return 0
else:
print(f"\n⚠️ {total - passed} tests failed. Please review the errors above.")
return 1
if __name__ == "__main__":
exit(main())

View File

@@ -1,24 +0,0 @@
# Archived Commands
These commands are no longer exposed at the top level of the CLI; the files are kept here for reference.
## Archived Files
- `clean.py` - Deprecated cleanup command
- `help.py` - Old help command (now handled by argparse)
- `notebooks.py` - Deprecated notebooks command
- `status.py` - Old status command (functionality moved to module workflow)
- `checkpoint.py` - Old checkpoint tracking (superseded by milestones command)
- `demo.py` - Demo runner (students can run demos directly with Python)
- `book.py` - Jupyter Book builder (developers can run jupyter-book directly)
- `leaderboard.py` - Community leaderboard (functionality merged into community command)
- `olympics.py` - Competition events (functionality merged into community command)
## Note
During the CLI reorganization on 2025-11-28, commands with subcommands were moved into logical subfolders:
- `module/` - Module workflow and reset
- `system/` - System commands (info, health, jupyter, check, version, clean_workspace, report, protect)
- `package/` - Package management (nbdev, reset)
These archived files are truly deprecated and not used anywhere in the codebase.

View File

@@ -1,396 +0,0 @@
"""
Book command for TinyTorch CLI: builds and manages the Jupyter Book.
"""
import os
import subprocess
from argparse import ArgumentParser, Namespace
from pathlib import Path
from rich.panel import Panel
from .base import BaseCommand
NOTEBOOKS_DIR = "modules"
class BookCommand(BaseCommand):
@property
def name(self) -> str:
return "book"
@property
def description(self) -> str:
return "Build and manage the TinyTorch Jupyter Book"
def add_arguments(self, parser: ArgumentParser) -> None:
subparsers = parser.add_subparsers(
dest='book_command',
help='Book management commands',
metavar='COMMAND'
)
# Build command
build_parser = subparsers.add_parser(
'build',
help='Build the Jupyter Book locally'
)
# Publish command
publish_parser = subparsers.add_parser(
'publish',
help='Generate content, commit, and publish to GitHub'
)
publish_parser.add_argument(
'--message',
type=str,
default='📚 Update book content',
help='Commit message (default: "📚 Update book content")'
)
publish_parser.add_argument(
'--branch',
type=str,
default='main',
help='Branch to push to (default: main)'
)
# Clean command
clean_parser = subparsers.add_parser(
'clean',
help='Clean built book files'
)
# Serve command
serve_parser = subparsers.add_parser(
'serve',
help='Build and serve the Jupyter Book locally'
)
serve_parser.add_argument(
'--port',
type=int,
default=8001,
help='Port to serve on (default: 8001)'
)
serve_parser.add_argument(
'--no-build',
action='store_true',
help='Skip building and serve existing files'
)
def run(self, args: Namespace) -> int:
console = self.console
# Check if we're in the right directory
if not Path("site").exists():
console.print(Panel(
"[red]❌ site/ directory not found. Run this command from the TinyTorch root directory.[/red]",
title="Error",
border_style="red"
))
return 1
# Handle subcommands
if not hasattr(args, 'book_command') or not args.book_command:
console.print(Panel(
"[bold cyan]📚 TinyTorch Book Management[/bold cyan]\n\n"
"[bold]Available Commands:[/bold]\n"
" [bold green]build[/bold green] - Build the complete Jupyter Book\n"
" [bold green]serve[/bold green] - Build and serve the Jupyter Book locally\n"
" [bold green]publish[/bold green] - Generate content, commit, and publish to GitHub\n"
" [bold green]clean[/bold green] - Clean built book files\n\n"
"[bold]Quick Start:[/bold]\n"
" [dim]tito book publish[/dim] - Generate, commit, and publish to GitHub\n"
" [dim]tito book clean[/dim] - Clean built book files",
title="Book Commands",
border_style="bright_blue"
))
return 0
if args.book_command == 'build':
return self._build_book(args)
elif args.book_command == 'serve':
return self._serve_book(args)
elif args.book_command == 'publish':
return self._publish_book(args)
elif args.book_command == 'clean':
return self._clean_book()
else:
console.print(f"[red]Unknown book command: {args.book_command}[/red]")
return 1
def _generate_overview(self) -> int:
"""Generate overview pages from modules."""
console = self.console
console.print("🔄 Generating overview pages from modules...")
try:
os.chdir("site")
result = subprocess.run(
["python3", "convert_readmes.py"],
capture_output=True,
text=True
)
if result.returncode == 0:
console.print("✅ Overview pages generated successfully")
# Show summary from the output
for line in result.stdout.split('\n'):
if "✅ Created" in line or "🎉 Converted" in line:
console.print(f" {line.strip()}")
return 0
else:
console.print(f"[red]❌ Failed to generate overview pages: {result.stderr}[/red]")
return 1
except FileNotFoundError:
console.print("[red]❌ Python3 not found or convert_readmes.py missing[/red]")
return 1
except Exception as e:
console.print(f"[red]❌ Error generating overview pages: {e}[/red]")
return 1
finally:
os.chdir("..")
def _generate_all(self) -> int:
"""Verify that all book chapters exist."""
console = self.console
console.print("📝 Verifying book chapters...")
# Check that the chapters directory exists
chapters_dir = Path("docs/chapters")
if not chapters_dir.exists():
console.print("[red]❌ docs/chapters directory not found[/red]")
return 1
# Count markdown files in chapters directory
chapter_files = list(chapters_dir.glob("*.md"))
if chapter_files:
console.print(f"✅ Found {len(chapter_files)} chapter files")
else:
console.print("[yellow]⚠️ No chapter files found in docs/chapters/[/yellow]")
return 0
def _build_book(self, args: Namespace) -> int:
"""Build the Jupyter Book locally."""
console = self.console
# First generate all content (notebooks + overview pages)
console.print("📄 Step 1: Generating all content...")
if self._generate_all() != 0:
return 1
# Then build the book
console.print("📚 Step 2: Building Jupyter Book...")
try:
os.chdir("site")
result = subprocess.run(
["jupyter-book", "build", "."],
capture_output=True,
text=True
)
if result.returncode == 0:
console.print("✅ Book built successfully!")
# Extract and show the file path
if "file://" in result.stdout:
for line in result.stdout.split('\n'):
if "file://" in line:
console.print(f"🌐 View at: {line.strip()}")
break
console.print("📁 HTML files available in: docs/_build/html/")
return 0
else:
console.print(f"[red]❌ Failed to build book[/red]")
if result.stderr:
console.print(f"Error details: {result.stderr}")
return 1
except FileNotFoundError:
console.print("[red]❌ jupyter-book not found. Install with: pip install jupyter-book[/red]")
return 1
except Exception as e:
console.print(f"[red]❌ Error building book: {e}[/red]")
return 1
finally:
os.chdir("..")
def _serve_book(self, args: Namespace) -> int:
"""Build and serve the Jupyter Book locally."""
console = self.console
# Build the book first unless --no-build is specified
if not args.no_build:
console.print("📚 Step 1: Building the book...")
if self._build_book(args) != 0:
return 1
console.print()
# Start the HTTP server
console.print("🌐 Step 2: Starting development server...")
console.print(f"📖 Open your browser to: [bold blue]http://localhost:{args.port}[/bold blue]")
console.print("🛑 Press [bold]Ctrl+C[/bold] to stop the server")
console.print()
book_dir = Path("docs/_build/html")
if not book_dir.exists():
console.print("[red]❌ Built book not found. Run with --no-build=False to build first.[/red]")
return 1
try:
# Use Python's built-in HTTP server
subprocess.run([
"python3", "-m", "http.server", str(args.port),
"--directory", str(book_dir)
])
except KeyboardInterrupt:
console.print("\n🛑 Development server stopped")
except FileNotFoundError:
console.print("[red]❌ Python3 not found in PATH[/red]")
return 1
except Exception as e:
console.print(f"[red]❌ Error starting server: {e}[/red]")
return 1
return 0
def _clean_book(self) -> int:
"""Clean built book files."""
console = self.console
console.print("🧹 Cleaning book build files...")
try:
os.chdir("site")
result = subprocess.run(
["jupyter-book", "clean", "."],
capture_output=True,
text=True
)
if result.returncode == 0:
console.print("✅ Book files cleaned successfully")
return 0
else:
console.print(f"[red]❌ Failed to clean book files: {result.stderr}[/red]")
return 1
except FileNotFoundError:
console.print("[red]❌ jupyter-book not found[/red]")
return 1
except Exception as e:
console.print(f"[red]❌ Error cleaning book: {e}[/red]")
return 1
finally:
os.chdir("..")
def _publish_book(self, args: Namespace) -> int:
"""Generate content, commit, and publish to GitHub."""
console = self.console
console.print("🚀 Starting book publishing workflow...")
# Step 1: Generate all content
console.print("📝 Step 1: Generating all content...")
if self._generate_all() != 0:
console.print("[red]❌ Failed to generate content. Aborting publish.[/red]")
return 1
# Step 2: Check git status
console.print("🔍 Step 2: Checking git status...")
try:
result = subprocess.run(
["git", "status", "--porcelain"],
capture_output=True,
text=True,
cwd="."
)
if result.returncode != 0:
console.print("[red]❌ Git not available or not a git repository[/red]")
return 1
changes = result.stdout.strip()
if not changes:
console.print("✅ No changes to publish")
return 0
except Exception as e:
console.print(f"[red]❌ Error checking git status: {e}[/red]")
return 1
# Step 3: Add and commit changes
console.print("📦 Step 3: Committing changes...")
try:
# Add all changes
subprocess.run(["git", "add", "."], check=True, cwd=".")
# Commit with message
subprocess.run([
"git", "commit", "-m", args.message
], check=True, cwd=".")
console.print(f"✅ Committed with message: {args.message}")
except subprocess.CalledProcessError as e:
console.print(f"[red]❌ Failed to commit changes: {e}[/red]")
return 1
except Exception as e:
console.print(f"[red]❌ Error during commit: {e}[/red]")
return 1
# Step 4: Push to GitHub
console.print(f"⬆️ Step 4: Pushing to {args.branch} branch...")
try:
result = subprocess.run([
"git", "push", "origin", args.branch
], capture_output=True, text=True, cwd=".")
if result.returncode == 0:
console.print(f"✅ Successfully pushed to {args.branch}")
else:
console.print(f"[red]❌ Failed to push: {result.stderr}[/red]")
return 1
except Exception as e:
console.print(f"[red]❌ Error during push: {e}[/red]")
return 1
# Step 5: Show deployment info
console.print("🌐 Step 5: Deployment initiated...")
console.print("✅ GitHub Actions will now:")
console.print(" 📚 Build the Jupyter Book")
console.print(" 🚀 Deploy to GitHub Pages")
console.print(" 🔗 Update live website")
# Try to get repository info for deployment URL
try:
result = subprocess.run([
"git", "remote", "get-url", "origin"
], capture_output=True, text=True, cwd=".")
if result.returncode == 0:
remote_url = result.stdout.strip()
if "github.com" in remote_url:
# Extract owner/repo from git URL
if remote_url.endswith(".git"):
remote_url = remote_url[:-4]
if remote_url.startswith("git@github.com:"):
repo_path = remote_url.replace("git@github.com:", "")
elif remote_url.startswith("https://github.com/"):
repo_path = remote_url.replace("https://github.com/", "")
else:
repo_path = None
if repo_path:
console.print(f"\n🔗 Monitor deployment: https://github.com/{repo_path}/actions")
console.print(f"📖 Live website: https://{repo_path.split('/')[0]}.github.io/{repo_path.split('/')[1]}/")
except Exception:
# Don't fail the whole command if we can't get repo info
pass
console.print("\n🎉 Publishing workflow complete!")
console.print("💡 Check GitHub Actions for deployment status")
return 0

View File

@@ -1,690 +0,0 @@
"""
Checkpoint tracking and visualization command for TinyTorch CLI.
Provides capability-based progress tracking through the ML systems engineering journey:
Foundation → Architecture → Training → Inference → Serving
"""
import argparse
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Tuple, Optional
from rich.console import Console
from rich.panel import Panel
from rich.progress import Progress, BarColumn, TextColumn, SpinnerColumn
from rich.table import Table
from rich.tree import Tree
from rich.text import Text
from rich.layout import Layout
from rich.columns import Columns
from rich.status import Status
from .base import BaseCommand
from ..core.config import CLIConfig
from ..core.console import get_console, print_error, print_success
class CheckpointSystem:
"""Core checkpoint tracking system."""
# Define the 20-checkpoint structure for complete ML systems engineering journey
CHECKPOINTS = {
"00": {
"name": "Environment",
"description": "Development environment setup and configuration",
"test_file": "checkpoint_00_environment.py",
"capability": "Can I configure my TinyTorch development environment?"
},
"01": {
"name": "Foundation",
"description": "Basic tensor operations and ML building blocks",
"test_file": "checkpoint_01_foundation.py",
"capability": "Can I create and manipulate the building blocks of ML?"
},
"02": {
"name": "Intelligence",
"description": "Nonlinear activation functions",
"test_file": "checkpoint_02_intelligence.py",
"capability": "Can I add nonlinearity - the key to neural network intelligence?"
},
"03": {
"name": "Components",
"description": "Fundamental neural network building blocks",
"test_file": "checkpoint_03_components.py",
"capability": "Can I build the fundamental building blocks of neural networks?"
},
"04": {
"name": "Networks",
"description": "Complete multi-layer neural networks",
"test_file": "checkpoint_04_networks.py",
"capability": "Can I build complete multi-layer neural networks?"
},
"05": {
"name": "Learning",
"description": "Spatial data processing with convolutional operations",
"test_file": "checkpoint_05_learning.py",
"capability": "Can I process spatial data like images with convolutional operations?"
},
"06": {
"name": "Attention",
"description": "Attention mechanisms for sequence understanding",
"test_file": "checkpoint_06_attention.py",
"capability": "Can I build attention mechanisms for sequence understanding?"
},
"07": {
"name": "Stability",
"description": "Training stabilization with normalization",
"test_file": "checkpoint_07_stability.py",
"capability": "Can I stabilize training with normalization techniques?"
},
"08": {
"name": "Differentiation",
"description": "Automatic gradient computation for learning",
"test_file": "checkpoint_08_differentiation.py",
"capability": "Can I automatically compute gradients for learning?"
},
"09": {
"name": "Optimization",
"description": "Sophisticated optimization algorithms",
"test_file": "checkpoint_09_optimization.py",
"capability": "Can I optimize neural networks with sophisticated algorithms?"
},
"10": {
"name": "Training",
"description": "Complete training loops for end-to-end learning",
"test_file": "checkpoint_10_training.py",
"capability": "Can I build complete training loops for end-to-end learning?"
},
"11": {
"name": "Regularization",
"description": "Overfitting prevention and robust model building",
"test_file": "checkpoint_11_regularization.py",
"capability": "Can I prevent overfitting and build robust models?"
},
"12": {
"name": "Kernels",
"description": "High-performance computational kernels",
"test_file": "checkpoint_12_kernels.py",
"capability": "Can I implement high-performance computational kernels?"
},
"13": {
"name": "Benchmarking",
"description": "Performance analysis and bottleneck identification",
"test_file": "checkpoint_13_benchmarking.py",
"capability": "Can I analyze performance and identify bottlenecks in ML systems?"
},
"14": {
"name": "Deployment",
"description": "Production deployment and monitoring",
"test_file": "checkpoint_14_deployment.py",
"capability": "Can I deploy and monitor ML systems in production?"
},
"15": {
"name": "Acceleration",
"description": "Algorithmic optimization and acceleration techniques",
"test_file": "checkpoint_15_acceleration.py",
"capability": "Can I accelerate computations through algorithmic optimization?"
},
"16": {
"name": "Quantization",
"description": "Trading precision for speed with INT8 quantization",
"test_file": "checkpoint_16_quantization.py",
"capability": "Can I trade precision for speed with INT8 quantization?"
},
"17": {
"name": "Compression",
"description": "Neural network pruning for edge deployment",
"test_file": "checkpoint_17_compression.py",
"capability": "Can I remove 70% of parameters while maintaining accuracy?"
},
"18": {
"name": "Caching",
"description": "KV caching for transformer inference optimization",
"test_file": "checkpoint_18_caching.py",
"capability": "Can I transform O(N²) to O(N) complexity with intelligent caching?"
},
"19": {
"name": "Competition",
"description": "TinyMLPerf competition system for optimization mastery",
"test_file": "checkpoint_19_competition.py",
"capability": "Can I build competition-grade benchmarking infrastructure?"
},
"20": {
"name": "TinyGPT Capstone",
"description": "Complete language model demonstrating ML systems mastery",
"test_file": "checkpoint_20_capstone.py",
"capability": "Can I build a complete language model that generates coherent text from scratch?"
}
}
def __init__(self, config: CLIConfig):
"""Initialize checkpoint system."""
self.config = config
self.console = get_console()
self.modules_dir = config.project_root / "modules" / "source"
self.checkpoints_dir = config.project_root / "tests" / "checkpoints"
def get_checkpoint_test_status(self, checkpoint_id: str) -> Dict[str, bool]:
"""Get the status of a checkpoint test file."""
if checkpoint_id not in self.CHECKPOINTS:
return {"exists": False, "tested": False, "passed": False}
test_file = self.CHECKPOINTS[checkpoint_id]["test_file"]
test_path = self.checkpoints_dir / test_file
return {
"exists": test_path.exists(),
"tested": False, # Will be set when we run tests
"passed": False # Will be set based on test results
}
def get_checkpoint_status(self, checkpoint_id: str) -> Dict:
"""Get status information for a checkpoint."""
checkpoint = self.CHECKPOINTS[checkpoint_id]
test_status = self.get_checkpoint_test_status(checkpoint_id)
return {
"checkpoint": checkpoint,
"test_status": test_status,
"is_available": test_status["exists"],
"is_complete": test_status.get("passed", False),
"checkpoint_id": checkpoint_id
}
def get_overall_progress(self) -> Dict:
"""Get overall progress across all checkpoints."""
checkpoints_status = {}
current_checkpoint = None
total_complete = 0
total_checkpoints = len(self.CHECKPOINTS)
for checkpoint_id in self.CHECKPOINTS.keys():
status = self.get_checkpoint_status(checkpoint_id)
checkpoints_status[checkpoint_id] = status
if status["is_complete"]:
total_complete += 1
elif current_checkpoint is None and status["is_available"]:
# First available but incomplete checkpoint is current
current_checkpoint = checkpoint_id
# If all are complete, set current to last checkpoint
if current_checkpoint is None and total_complete == total_checkpoints:
current_checkpoint = list(self.CHECKPOINTS.keys())[-1]
# If none are complete, start with first
elif current_checkpoint is None:
current_checkpoint = "00"
# Calculate overall percentage
overall_percent = (total_complete / total_checkpoints * 100) if total_checkpoints > 0 else 0
return {
"checkpoints": checkpoints_status,
"current": current_checkpoint,
"overall_progress": overall_percent,
"total_complete": total_complete,
"total_checkpoints": total_checkpoints
}
def run_checkpoint_test(self, checkpoint_id: str) -> Dict:
"""Run a specific checkpoint test and return results."""
if checkpoint_id not in self.CHECKPOINTS:
return {"success": False, "error": f"Unknown checkpoint: {checkpoint_id}"}
checkpoint = self.CHECKPOINTS[checkpoint_id]
test_file = checkpoint["test_file"]
test_path = self.checkpoints_dir / test_file
if not test_path.exists():
return {"success": False, "error": f"Test file not found: {test_file}"}
try:
# Run the test using subprocess to capture output
result = subprocess.run(
[sys.executable, str(test_path)],
capture_output=True,
text=True,
cwd=self.config.project_root,
timeout=30 # 30 second timeout
)
return {
"success": result.returncode == 0,
"returncode": result.returncode,
"stdout": result.stdout,
"stderr": result.stderr,
"checkpoint_name": checkpoint["name"],
"capability": checkpoint["capability"]
}
except subprocess.TimeoutExpired:
return {"success": False, "error": "Test timed out after 30 seconds"}
except Exception as e:
return {"success": False, "error": f"Test execution failed: {str(e)}"}
class CheckpointCommand(BaseCommand):
"""Checkpoint tracking and visualization command."""
name = "checkpoint"
description = "Track and visualize ML systems engineering progress through checkpoints"
def add_arguments(self, parser: argparse.ArgumentParser) -> None:
"""Add checkpoint-specific arguments."""
subparsers = parser.add_subparsers(
dest='checkpoint_command',
help='Checkpoint operations',
metavar='COMMAND'
)
# Status command
status_parser = subparsers.add_parser(
'status',
help='Show current checkpoint progress'
)
status_parser.add_argument(
'--detailed', '-d',
action='store_true',
help='Show detailed module-level progress'
)
# Timeline command
timeline_parser = subparsers.add_parser(
'timeline',
help='Show visual progress timeline'
)
timeline_parser.add_argument(
'--horizontal',
action='store_true',
help='Show horizontal timeline (default: vertical)'
)
# Test command
test_parser = subparsers.add_parser(
'test',
help='Test checkpoint capabilities'
)
test_parser.add_argument(
'checkpoint_id',
nargs='?',
help='Checkpoint ID to test (00-20, current checkpoint if not specified)'
)
# Run command (new)
run_parser = subparsers.add_parser(
'run',
help='Run specific checkpoint tests with progress tracking'
)
run_parser.add_argument(
'checkpoint_id',
help='Checkpoint ID to run (00-20)'
)
run_parser.add_argument(
'--verbose', '-v',
action='store_true',
help='Show detailed test output'
)
# Unlock command
unlock_parser = subparsers.add_parser(
'unlock',
help='Attempt to unlock next checkpoint'
)
def run(self, args: argparse.Namespace) -> int:
"""Execute checkpoint command."""
checkpoint_system = CheckpointSystem(self.config)
if not args.checkpoint_command:
return self._show_help(args)
if args.checkpoint_command == 'status':
return self._show_status(checkpoint_system, args)
elif args.checkpoint_command == 'timeline':
return self._show_timeline(checkpoint_system, args)
elif args.checkpoint_command == 'test':
return self._test_checkpoint(checkpoint_system, args)
elif args.checkpoint_command == 'run':
return self._run_checkpoint(checkpoint_system, args)
elif args.checkpoint_command == 'unlock':
return self._unlock_checkpoint(checkpoint_system, args)
else:
print_error(f"Unknown checkpoint command: {args.checkpoint_command}")
return 1
def _show_help(self, args: argparse.Namespace) -> int:
"""Show checkpoint command help."""
console = get_console()
console.print(Panel(
"[bold cyan]TinyTorch Checkpoint System[/bold cyan]\n\n"
"[bold]Track your progress through 20 capability checkpoints:[/bold]\n"
" 00-04: Foundation → Environment, tensors, networks\n"
" 05-09: Architecture → Spatial, attention, autograd, optimization\n"
" 10-14: Systems → Training, kernels, benchmarking, deployment\n"
" 15-19: Optimization → Acceleration, quantization, compression, caching, competition\n"
" 20: Capstone → Complete TinyGPT language model\n\n"
"[bold]Available Commands:[/bold]\n"
" [green]status[/green] - Show current progress and capabilities\n"
" [green]timeline[/green] - Visual progress timeline\n"
" [green]test[/green] - Test checkpoint capabilities\n"
" [green]run[/green] - Run specific checkpoint with progress\n"
" [green]unlock[/green] - Attempt to unlock next checkpoint\n\n"
"[bold]Examples:[/bold]\n"
" [dim]tito checkpoint status --detailed[/dim]\n"
" [dim]tito checkpoint timeline --horizontal[/dim]\n"
" [dim]tito checkpoint test 16[/dim]\n"
" [dim]tito checkpoint run 20 --verbose[/dim]",
title="Checkpoint System (20 Checkpoints)",
border_style="bright_blue"
))
return 0
def _show_status(self, checkpoint_system: CheckpointSystem, args: argparse.Namespace) -> int:
"""Show checkpoint status."""
console = get_console()
progress_data = checkpoint_system.get_overall_progress()
# Header
console.print(Panel(
"[bold cyan]🚀 TinyTorch Framework Capabilities[/bold cyan]",
border_style="bright_blue"
))
# Overall progress
overall_percent = progress_data["overall_progress"]
console.print(f"\n[bold]Overall Progress:[/bold] {overall_percent:.0f}% ({progress_data['total_complete']}/{progress_data['total_checkpoints']} checkpoints)")
# Current status summary
current = progress_data["current"]
if current:
current_status = progress_data["checkpoints"][current]
current_name = current_status["checkpoint"]["name"]
console.print(f"[bold]Current Checkpoint:[/bold] {current:0>2} - {current_name}")
if current_status["is_complete"]:
console.print(f"[bold green]✅ {current_name} checkpoint achieved![/bold green]")
console.print(f"[dim]Capability unlocked: {current_status['checkpoint']['capability']}[/dim]")
else:
console.print(f"[bold yellow]🎯 Ready to test {current_name} capabilities[/bold yellow]")
console.print(f"[dim]Goal: {current_status['checkpoint']['capability']}[/dim]")
console.print()
# Checkpoint progress
for checkpoint_id, checkpoint_data in progress_data["checkpoints"].items():
checkpoint = checkpoint_data["checkpoint"]
# Checkpoint header
if checkpoint_data["is_complete"]:
status_icon = ""
status_color = "green"
elif checkpoint_id == current:
status_icon = "🎯"
status_color = "yellow"
else:
status_icon = ""
status_color = "dim"
console.print(f"[bold]{status_icon} {checkpoint_id:0>2}: {checkpoint['name']}[/bold] [{status_color}]{'COMPLETE' if checkpoint_data['is_complete'] else 'PENDING'}[/{status_color}]")
if args.detailed:
# Show test file and availability
test_status = checkpoint_data["test_status"]
test_available = "" if test_status["exists"] else ""
console.print(f" {test_available} Test: {checkpoint['test_file']}")
console.print(f" [dim]{checkpoint['capability']}[/dim]\n")
return 0
def _show_timeline(self, checkpoint_system: CheckpointSystem, args: argparse.Namespace) -> int:
"""Show visual timeline with Rich progress bar."""
console = get_console()
progress_data = checkpoint_system.get_overall_progress()
console.print("\n[bold cyan]🚀 TinyTorch Framework Progress Timeline[/bold cyan]\n")
if args.horizontal:
# Enhanced horizontal timeline with progress line
overall_percent = progress_data["overall_progress"]
total_checkpoints = progress_data["total_checkpoints"]
complete_checkpoints = progress_data["total_complete"]
# Create a visual progress bar
filled = int(overall_percent / 2) # 50 characters total width
bar = "" * filled + "" * (50 - filled)
console.print(f"[bold]Overall:[/bold] [{bar}] {overall_percent:.0f}%")
console.print(f"[dim]{complete_checkpoints}/{total_checkpoints} checkpoints complete[/dim]\n")
# Show checkpoint progression - group in rows of 8
checkpoints_list = list(progress_data["checkpoints"].items())
for row_start in range(0, len(checkpoints_list), 8):
row_checkpoints = checkpoints_list[row_start:row_start + 8]
# Build the checkpoint line for this row
checkpoint_line = ""
names_line = ""
for i, (checkpoint_id, checkpoint_data) in enumerate(row_checkpoints):
checkpoint = checkpoint_data["checkpoint"]
# Checkpoint status
if checkpoint_data["is_complete"]:
checkpoint_marker = f"[green]●[/green]"
name_color = "green"
elif checkpoint_id == progress_data["current"]:
checkpoint_marker = f"[yellow]◉[/yellow]"
name_color = "yellow"
else:
checkpoint_marker = f"[dim]○[/dim]"
name_color = "dim"
# Add checkpoint with ID
checkpoint_line += f"{checkpoint_marker}{checkpoint_id}"
names_line += f"[{name_color}]{checkpoint['name'][:9]:^9}[/{name_color}]"
# Add spacing (except for last in row)
if i < len(row_checkpoints) - 1:
if checkpoint_data["is_complete"]:
checkpoint_line += "[green]━━[/green]"
else:
checkpoint_line += "[dim]━━[/dim]"
names_line += " "
console.print(checkpoint_line)
console.print(names_line)
console.print() # Empty line between rows
else:
# Vertical timeline (tree structure)
tree = Tree("ML Systems Engineering Journey (20 Checkpoints)")
for checkpoint_id, checkpoint_data in progress_data["checkpoints"].items():
checkpoint = checkpoint_data["checkpoint"]
if checkpoint_data["is_complete"]:
checkpoint_text = f"[green]✅ {checkpoint_id}: {checkpoint['name']}[/green]"
elif checkpoint_id == progress_data["current"]:
checkpoint_text = f"[yellow]🎯 {checkpoint_id}: {checkpoint['name']} (CURRENT)[/yellow]"
else:
checkpoint_text = f"[dim]⏳ {checkpoint_id}: {checkpoint['name']}[/dim]"
checkpoint_node = tree.add(checkpoint_text)
checkpoint_node.add(f"[dim]{checkpoint['capability']}[/dim]")
console.print(tree)
console.print()
return 0
def _test_checkpoint(self, checkpoint_system: CheckpointSystem, args: argparse.Namespace) -> int:
"""Test checkpoint capabilities."""
console = get_console()
# Determine which checkpoint to test
checkpoint_id = args.checkpoint_id
if not checkpoint_id:
progress_data = checkpoint_system.get_overall_progress()
checkpoint_id = progress_data["current"]
# Validate checkpoint ID
if checkpoint_id not in checkpoint_system.CHECKPOINTS:
print_error(f"Unknown checkpoint: {checkpoint_id}")
console.print(f"[dim]Available checkpoints: {', '.join(checkpoint_system.CHECKPOINTS.keys())}[/dim]")
return 1
checkpoint = checkpoint_system.CHECKPOINTS[checkpoint_id]
# Show what we're testing
console.print(f"\n[bold cyan]Testing Checkpoint {checkpoint_id}: {checkpoint['name']}[/bold cyan]")
console.print(f"[bold]Capability Question:[/bold] {checkpoint['capability']}\n")
# Run the test
with console.status(f"[bold green]Running checkpoint {checkpoint_id} test...", spinner="dots") as status:
result = checkpoint_system.run_checkpoint_test(checkpoint_id)
# Display results
if result["success"]:
console.print(f"[bold green]✅ Checkpoint {checkpoint_id} PASSED![/bold green]")
console.print(f"[green]Capability achieved: {checkpoint['capability']}[/green]\n")
# Show brief output
if result.get("stdout") and "🎉" in result["stdout"]:
# Extract the completion message
lines = result["stdout"].split('\n')
for line in lines:
if "🎉" in line or "📝" in line or "🎯" in line:
console.print(f"[dim]{line}[/dim]")
print_success(f"Checkpoint {checkpoint_id} test completed successfully!")
return 0
else:
console.print(f"[bold red]❌ Checkpoint {checkpoint_id} FAILED[/bold red]\n")
# Show error details
if "error" in result:
console.print(f"[red]Error: {result['error']}[/red]")
elif result.get("stderr"):
console.print(f"[red]Error output:[/red]")
console.print(f"[dim]{result['stderr']}[/dim]")
elif result.get("stdout"):
console.print(f"[yellow]Test output:[/yellow]")
console.print(f"[dim]{result['stdout']}[/dim]")
print_error(f"Checkpoint {checkpoint_id} test failed")
return 1
def _run_checkpoint(self, checkpoint_system: CheckpointSystem, args: argparse.Namespace) -> int:
"""Run specific checkpoint test with detailed progress tracking."""
console = get_console()
checkpoint_id = args.checkpoint_id
# Validate checkpoint ID
if checkpoint_id not in checkpoint_system.CHECKPOINTS:
print_error(f"Unknown checkpoint: {checkpoint_id}")
console.print(f"[dim]Available checkpoints: {', '.join(checkpoint_system.CHECKPOINTS.keys())}[/dim]")
return 1
checkpoint = checkpoint_system.CHECKPOINTS[checkpoint_id]
# Show detailed information
console.print(Panel(
f"[bold cyan]Checkpoint {checkpoint_id}: {checkpoint['name']}[/bold cyan]\n\n"
f"[bold]Capability Question:[/bold]\n{checkpoint['capability']}\n\n"
f"[bold]Test File:[/bold] {checkpoint['test_file']}\n"
f"[bold]Description:[/bold] {checkpoint['description']}",
title=f"Running Checkpoint {checkpoint_id}",
border_style="bright_blue"
))
# Check if test file exists
test_path = checkpoint_system.checkpoints_dir / checkpoint["test_file"]
if not test_path.exists():
print_error(f"Test file not found: {checkpoint['test_file']}")
return 1
console.print(f"\n[bold]Executing test...[/bold]")
# Run the test with status feedback
with console.status(f"[bold green]Running checkpoint {checkpoint_id} test...", spinner="dots"):
result = checkpoint_system.run_checkpoint_test(checkpoint_id)
console.print()
# Display detailed results
if result["success"]:
console.print(Panel(
f"[bold green]✅ SUCCESS![/bold green]\n\n"
f"[green]Checkpoint {checkpoint_id} completed successfully![/green]\n"
f"[green]Capability achieved: {checkpoint['capability']}[/green]",
title="Test Results",
border_style="green"
))
# Show test output if verbose or if it contains key markers
            if args.verbose or (result.get("stdout") and any(marker in result["stdout"] for marker in ["🎉", "✅", "📝", "🎯"])):
console.print(f"\n[bold]Test Output:[/bold]")
if result.get("stdout"):
console.print(result["stdout"])
return 0
else:
console.print(Panel(
f"[bold red]❌ FAILED[/bold red]\n\n"
f"[red]Checkpoint {checkpoint_id} test failed[/red]\n"
f"[yellow]This indicates the required capabilities are not yet implemented.[/yellow]",
title="Test Results",
border_style="red"
))
# Show error details
if "error" in result:
console.print(f"\n[bold red]Error:[/bold red] {result['error']}")
if args.verbose or "error" in result:
if result.get("stdout"):
console.print(f"\n[bold]Standard Output:[/bold]")
console.print(result["stdout"])
if result.get("stderr"):
console.print(f"\n[bold]Error Output:[/bold]")
console.print(result["stderr"])
return 1
def _unlock_checkpoint(self, checkpoint_system: CheckpointSystem, args: argparse.Namespace) -> int:
"""Attempt to unlock next checkpoint."""
console = get_console()
progress_data = checkpoint_system.get_overall_progress()
current = progress_data["current"]
if not current:
console.print("[green]All checkpoints completed! 🎉[/green]")
return 0
current_status = progress_data["checkpoints"][current]
if current_status["is_complete"]:
console.print(f"[green]✅ Checkpoint {current} ({current_status['checkpoint']['name']}) already complete![/green]")
# Find next checkpoint
checkpoint_ids = list(checkpoint_system.CHECKPOINTS.keys())
try:
current_index = checkpoint_ids.index(current)
if current_index < len(checkpoint_ids) - 1:
next_id = checkpoint_ids[current_index + 1]
next_checkpoint = checkpoint_system.CHECKPOINTS[next_id]
console.print(f"[bold]Next checkpoint:[/bold] {next_id} - {next_checkpoint['name']}")
console.print(f"[dim]Goal: {next_checkpoint['capability']}[/dim]")
else:
console.print("[bold]🎉 All checkpoints completed![/bold]")
except ValueError:
console.print("[yellow]Cannot determine next checkpoint[/yellow]")
else:
console.print(f"[yellow]Test checkpoint {current} to unlock your next capability:[/yellow]")
console.print(f"[bold]Goal:[/bold] {current_status['checkpoint']['capability']}")
console.print(f"[dim]Run: tito checkpoint run {current}[/dim]")
return 0

View File

@@ -1,160 +0,0 @@
"""
Clean command for TinyTorch CLI: cleans up module directories to start fresh.
"""
import shutil
from argparse import ArgumentParser, Namespace
from pathlib import Path
from rich.panel import Panel
from rich.text import Text
from .base import BaseCommand
class CleanCommand(BaseCommand):
@property
def name(self) -> str:
return "clean"
@property
def description(self) -> str:
return "Clean up module directories (notebooks, cache, etc.)"
def add_arguments(self, parser: ArgumentParser) -> None:
parser.add_argument("module", nargs="?", help="Clean specific module only")
parser.add_argument("--notebooks", action="store_true", help="Remove generated notebook files")
parser.add_argument("--cache", action="store_true", help="Remove Python cache files")
parser.add_argument("--all", action="store_true", help="Clean all modules")
parser.add_argument("--force", action="store_true", help="Skip confirmation prompt")
def run(self, args: Namespace) -> int:
console = self.console
console.print(Panel("🧹 Cleaning Module Directories",
title="Module Cleanup", border_style="bright_yellow"))
modules_dir = Path("modules")
if not modules_dir.exists():
console.print(Panel("[red]❌ modules/ directory not found[/red]",
title="Error", border_style="red"))
return 1
# Determine what to clean (file types)
clean_notebooks = args.notebooks or (not args.notebooks and not args.cache)
clean_cache = args.cache or (not args.notebooks and not args.cache)
# Determine which modules to clean
if args.module:
module_path = modules_dir / args.module
if not module_path.exists():
console.print(Panel(f"[red]❌ Module '{args.module}' not found[/red]",
title="Module Not Found", border_style="red"))
return 1
module_dirs = [module_path]
elif args.all:
# Find all module directories (exclude special directories)
exclude_dirs = {'.quarto', '__pycache__', '.git', '.pytest_cache', 'sidebar.yml', 'nbdev.yml'}
module_dirs = [d for d in modules_dir.iterdir()
if d.is_dir() and d.name not in exclude_dirs]
else:
# No module specified and no --all flag
console.print(Panel("[red]❌ Please specify a module name or use --all to clean all modules[/red]\n\n"
"[dim]Examples:[/dim]\n"
"[dim] tito module clean tensor - Clean specific module[/dim]\n"
"[dim] tito module clean --all - Clean all modules[/dim]",
title="Module Required", border_style="red"))
return 1
if not module_dirs:
console.print(Panel("[yellow]⚠️ No modules found to clean[/yellow]",
title="Nothing to Clean", border_style="yellow"))
return 0
# Show what will be cleaned
clean_text = Text()
clean_text.append("📋 Cleanup Plan:\n\n", style="bold cyan")
files_to_remove = []
for module_dir in module_dirs:
module_name = module_dir.name
clean_text.append(f"📁 {module_name}:\n", style="bold white")
if clean_notebooks:
# Find .ipynb files
for ipynb_file in module_dir.glob("*.ipynb"):
files_to_remove.append(ipynb_file)
clean_text.append(f" 🗑️ {ipynb_file.name}\n", style="yellow")
if clean_cache:
# Find __pycache__ directories
pycache_dirs = []
for pycache in module_dir.rglob("__pycache__"):
if pycache.is_dir():
pycache_dirs.append(pycache)
files_to_remove.append(pycache)
clean_text.append(f" 🗑️ {pycache.relative_to(module_dir)}/\n", style="yellow")
# Find .pyc files that are NOT inside __pycache__ directories
for pyc_file in module_dir.rglob("*.pyc"):
# Check if this pyc file is inside any __pycache__ directory
is_in_pycache = any(pycache in pyc_file.parents for pycache in pycache_dirs)
if not is_in_pycache:
files_to_remove.append(pyc_file)
clean_text.append(f" 🗑️ {pyc_file.relative_to(module_dir)}\n", style="yellow")
if not files_to_remove:
console.print(Panel("[green]✅ No files found to clean - modules are already clean![/green]",
title="Already Clean", border_style="green"))
return 0
clean_text.append(f"\n📊 Total: {len(files_to_remove)} files/directories to remove\n", style="bold cyan")
console.print(Panel(clean_text, title="Cleanup Preview", border_style="bright_yellow"))
# Ask for confirmation unless --force is used
if not args.force:
console.print("\n[yellow]This will permanently remove the files listed above.[/yellow]")
console.print("[yellow]Python source files (*.py) will be preserved.[/yellow]\n")
try:
response = input("Are you sure you want to proceed? (y/N): ").strip().lower()
if response not in ['y', 'yes']:
console.print(Panel("[cyan]Cleanup cancelled.[/cyan]",
title="Cancelled", border_style="cyan"))
return 0
except KeyboardInterrupt:
console.print(Panel("[cyan]Cleanup cancelled.[/cyan]",
title="Cancelled", border_style="cyan"))
return 0
# Perform cleanup
removed_count = 0
error_count = 0
for file_path in files_to_remove:
try:
if file_path.is_dir():
shutil.rmtree(file_path)
else:
file_path.unlink()
removed_count += 1
except Exception as e:
console.print(f" ❌ Failed to remove {file_path}: {e}")
error_count += 1
# Show results
result_text = Text()
if removed_count > 0:
result_text.append(f"✅ Successfully removed {removed_count} files/directories\n", style="bold green")
if error_count > 0:
result_text.append(f"❌ Failed to remove {error_count} files/directories\n", style="bold red")
if removed_count > 0:
result_text.append("\n💡 Next steps:\n", style="bold yellow")
result_text.append(" • Run: tito module notebooks - Regenerate notebooks\n", style="white")
result_text.append(" • Run: tito module test --all - Test all modules\n", style="white")
result_text.append(" • Run: tito module export --all - Export to package\n", style="white")
border_style = "green" if error_count == 0 else "yellow"
console.print(Panel(result_text, title="Cleanup Complete", border_style=border_style))
return 0 if error_count == 0 else 1

View File

@@ -1,263 +0,0 @@
#!/usr/bin/env python3
"""
Tito Demo Command - Show off your AI capabilities!
Runs progressive demos showing what TinyTorch can do at each stage.
"""
import argparse
import subprocess
import sys
from pathlib import Path
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from rich.text import Text
from .base import BaseCommand
console = Console()
class TinyTorchDemoMatrix:
"""Tracks and displays TinyTorch AI demo capabilities"""
def __init__(self):
self.demos = {
'math': {
'name': 'Mathematical Operations',
'file': 'demo_tensor_math.py',
'requires': ['02_tensor'],
'description': 'Linear algebra, matrix operations, transformations'
},
'logic': {
'name': 'Logical Reasoning',
'file': 'demo_activations.py',
'requires': ['02_tensor', '03_activations'],
'description': 'Boolean functions, XOR problem, decision boundaries'
},
'neuron': {
'name': 'Single Neuron Learning',
'file': 'demo_single_neuron.py',
'requires': ['02_tensor', '03_activations', '04_layers'],
'description': 'Watch a neuron learn the AND gate'
},
'network': {
'name': 'Multi-Layer Networks',
'file': 'demo_xor_network.py',
'requires': ['02_tensor', '03_activations', '04_layers', '05_dense'],
'description': 'Solve the famous XOR problem'
},
'vision': {
'name': 'Computer Vision',
'file': 'demo_vision.py',
'requires': ['02_tensor', '03_activations', '04_layers', '05_dense', '06_spatial'],
'description': 'Image processing and pattern recognition'
},
'attention': {
'name': 'Attention Mechanisms',
'file': 'demo_attention.py',
'requires': ['02_tensor', '03_activations', '04_layers', '05_dense', '07_attention'],
'description': 'Sequence processing and attention'
},
'training': {
'name': 'End-to-End Training',
'file': 'demo_training.py',
'requires': ['02_tensor', '03_activations', '04_layers', '05_dense', '11_training'],
'description': 'Complete training pipelines'
},
'language': {
'name': 'Language Generation',
'file': 'demo_language.py',
'requires': ['02_tensor', '03_activations', '04_layers', '05_dense', '07_attention', '16_tinygpt'],
'description': 'AI text generation and language models'
}
}
def check_module_exported(self, module_name):
"""Check if a module has been exported to the package"""
try:
if module_name == '02_tensor':
import tinytorch.core.tensor
return True
elif module_name == '03_activations':
import tinytorch.core.activations
return True
elif module_name == '04_layers':
import tinytorch.core.layers
return True
elif module_name == '05_dense':
import tinytorch.core.dense
return True
elif module_name == '06_spatial':
import tinytorch.core.spatial
return True
elif module_name == '07_attention':
import tinytorch.core.attention
return True
elif module_name == '11_training':
import tinytorch.core.training
return True
elif module_name == '16_tinygpt':
import tinytorch.tinygpt
return True
return False
except ImportError:
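            # ImportError means the module has not been exported into the tinytorch package yet.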
return False
def get_demo_status(self, demo_name):
"""Get status of a demo: available, partial, or unavailable"""
demo = self.demos[demo_name]
required_modules = demo['requires']
available_count = sum(1 for module in required_modules if self.check_module_exported(module))
total_count = len(required_modules)
        if available_count == total_count:
            return '✅'  # Fully available
        elif available_count > 0:
            return '⚡'  # Partially available
        else:
            return '❌'  # Not available
def show_matrix(self):
"""Display the demo capability matrix"""
console.print("\n🤖 TinyTorch Demo Matrix", style="bold cyan")
console.print("=" * 50)
table = Table(show_header=True, header_style="bold magenta")
table.add_column("Demo", style="cyan", width=20)
table.add_column("Status", justify="center", width=8)
table.add_column("Description", style="dim")
available_demos = []
for demo_name, demo_info in self.demos.items():
status = self.get_demo_status(demo_name)
table.add_row(demo_info['name'], status, demo_info['description'])
            if status == '✅':
available_demos.append(demo_name)
console.print(table)
console.print()
if available_demos:
console.print("🎯 Available Demos:", style="bold green")
for demo in available_demos:
console.print(f" • tito demo {demo}")
console.print()
console.print("Legend: ✅ Ready ⚡ Partial ❌ Not Available")
console.print()
def run_demo(self, demo_name):
"""Run a specific demo"""
if demo_name not in self.demos:
console.print(f"❌ Unknown demo: {demo_name}", style="red")
console.print("Available demos:", ', '.join(self.demos.keys()))
return False
demo = self.demos[demo_name]
status = self.get_demo_status(demo_name)
        if status == '❌':
console.print(f"❌ Demo '{demo_name}' not available", style="red")
missing_modules = [m for m in demo['requires'] if not self.check_module_exported(m)]
console.print(f"Missing modules: {', '.join(missing_modules)}")
console.print(f"Run: tito export {' '.join(missing_modules)}")
return False
        if status == '⚡':
console.print(f"⚠️ Demo '{demo_name}' partially available", style="yellow")
console.print("Some features may not work correctly.")
# Find the demo file
project_root = Path(__file__).parent.parent.parent
demo_file = project_root / "demos" / demo['file']
if not demo_file.exists():
console.print(f"❌ Demo file not found: {demo_file}", style="red")
return False
console.print(f"🚀 Running {demo['name']} Demo...", style="bold green")
console.print()
# Run the demo
try:
result = subprocess.run([sys.executable, str(demo_file)],
capture_output=False,
text=True)
return result.returncode == 0
except Exception as e:
console.print(f"❌ Demo failed: {e}", style="red")
return False
class DemoCommand(BaseCommand):
"""Command for running TinyTorch AI capability demos"""
def __init__(self, config):
super().__init__(config)
self.matrix = TinyTorchDemoMatrix()
@property
def name(self) -> str:
return "demo"
@property
def description(self) -> str:
return "Run AI capability demos"
def add_arguments(self, parser):
"""Add demo command arguments"""
parser.add_argument('demo_name', nargs='?',
help='Name of demo to run (math, logic, neuron, network, etc.)')
parser.add_argument('--all', action='store_true',
help='Run all available demos')
parser.add_argument('--matrix', action='store_true',
help='Show capability matrix only')
def run(self, args):
"""Execute the demo command"""
# Just show matrix if no args or --matrix flag
        if (not args.demo_name and not args.all) or args.matrix:
self.matrix.show_matrix()
return
# Run all available demos
if args.all:
self.matrix.show_matrix()
available_demos = [name for name in self.matrix.demos.keys()
                               if self.matrix.get_demo_status(name) == '✅']
if not available_demos:
console.print("❌ No demos available. Export some modules first!", style="red")
return
console.print(f"🚀 Running {len(available_demos)} available demos...", style="bold green")
console.print()
for demo_name in available_demos:
console.print(f"\n{'='*60}")
success = self.matrix.run_demo(demo_name)
if not success:
console.print(f"❌ Demo {demo_name} failed", style="red")
console.print(f"\n{'='*60}")
console.print("🏆 All available demos completed!", style="bold green")
return
# Run specific demo
if args.demo_name:
self.matrix.run_demo(args.demo_name)
def main():
"""Standalone entry point for development"""
    parser = argparse.ArgumentParser(description="Run TinyTorch AI capability demos")
    # NOTE: passing config=None assumes BaseCommand tolerates a missing config when run outside the tito CLI.
    cmd = DemoCommand(config=None)
    cmd.add_arguments(parser)
    args = parser.parse_args()
    cmd.run(args)
if __name__ == "__main__":
main()

View File

@@ -1,469 +0,0 @@
"""
Tiny🔥Torch Interactive Help System
Provides contextual, progressive guidance for new and experienced users.
"""
from argparse import ArgumentParser, Namespace
from typing import Optional, List, Dict, Any
import os
from pathlib import Path
from .base import BaseCommand
from ..core.config import CLIConfig
from ..core.console import get_console
from rich.console import Console
from rich.panel import Panel
from rich.columns import Columns
from rich.table import Table
from rich.text import Text
from rich.prompt import Prompt, Confirm
class HelpCommand(BaseCommand):
"""Interactive help and onboarding system."""
@property
def name(self) -> str:
return "help"
@property
def description(self) -> str:
return "Interactive help system with guided onboarding"
def add_arguments(self, parser: ArgumentParser) -> None:
"""Add help command arguments."""
parser.add_argument(
'topic',
nargs='?',
help='Specific help topic (getting-started, commands, workflow, etc.)'
)
parser.add_argument(
'--interactive', '-i',
action='store_true',
help='Launch interactive onboarding wizard'
)
parser.add_argument(
'--quick', '-q',
action='store_true',
help='Show quick reference card'
)
def run(self, args: Namespace) -> int:
"""Execute help command."""
console = get_console()
# Interactive onboarding wizard
if args.interactive:
return self._interactive_onboarding()
# Quick reference
if args.quick:
return self._show_quick_reference()
# Topic-specific help
if args.topic:
return self._show_topic_help(args.topic)
# Default: Show main help with user context
return self._show_contextual_help()
def _interactive_onboarding(self) -> int:
"""Launch interactive onboarding wizard."""
console = get_console()
# Welcome screen
console.print(Panel.fit(
"[bold blue]🚀 Welcome to Tiny🔥Torch![/bold blue]\n\n"
"Let's get you started on your ML systems engineering journey.\n"
"This quick wizard will help you understand what Tiny🔥Torch is\n"
"and guide you to the right starting point.",
title="Tiny🔥Torch Onboarding Wizard",
border_style="blue"
))
# User experience assessment
experience = self._assess_user_experience()
# Learning goal identification
goals = self._identify_learning_goals()
# Time commitment assessment
time_commitment = self._assess_time_commitment()
# Generate personalized recommendations
recommendations = self._generate_recommendations(experience, goals, time_commitment)
# Show personalized path
self._show_personalized_path(recommendations)
# Offer to start immediately
if Confirm.ask("\n[bold green]Ready to start your first steps?[/bold green]"):
self._launch_first_steps(recommendations)
return 0
def _assess_user_experience(self) -> str:
"""Assess user's ML and programming experience."""
console = get_console()
console.print("\n[bold cyan]📋 Quick Experience Assessment[/bold cyan]")
choices = [
"New to ML and Python - need fundamentals",
"Know Python, new to ML - want to learn systems",
"Use PyTorch/TensorFlow - want to understand internals",
"ML Engineer - need to debug/optimize production systems",
"Instructor - want to teach this course"
]
console.print("\nWhat best describes your background?")
for i, choice in enumerate(choices, 1):
console.print(f" {i}. {choice}")
while True:
try:
selection = int(Prompt.ask("\nEnter your choice (1-5)"))
if 1 <= selection <= 5:
return ['beginner', 'python_user', 'framework_user', 'ml_engineer', 'instructor'][selection-1]
else:
console.print("[red]Please enter a number between 1-5[/red]")
except ValueError:
console.print("[red]Please enter a valid number[/red]")
def _identify_learning_goals(self) -> List[str]:
"""Identify user's learning goals."""
console = get_console()
console.print("\n[bold cyan]🎯 Learning Goals[/bold cyan]")
console.print("What do you want to achieve? (Select all that apply)")
goals = [
("understand_internals", "Understand how PyTorch/TensorFlow work internally"),
("build_networks", "Build neural networks from scratch"),
("optimize_performance", "Learn to optimize ML system performance"),
("debug_production", "Debug production ML systems"),
("teach_course", "Teach ML systems to others"),
("career_transition", "Transition from software engineering to ML"),
("research_custom", "Implement custom operations for research")
]
selected_goals = []
for key, description in goals:
if Confirm.ask(f"{description}?"):
selected_goals.append(key)
return selected_goals
def _assess_time_commitment(self) -> str:
"""Assess available time commitment."""
console = get_console()
console.print("\n[bold cyan]⏰ Time Commitment[/bold cyan]")
choices = [
("15_minutes", "15 minutes - just want a quick taste"),
("2_hours", "2 hours - explore a few modules"),
("weekend", "Weekend project - build something substantial"),
("semester", "8-12 weeks - complete learning journey"),
("teaching", "Teaching timeline - need instructor resources")
]
console.print("How much time can you dedicate?")
for i, (key, description) in enumerate(choices, 1):
console.print(f" {i}. {description}")
while True:
try:
selection = int(Prompt.ask("\nEnter your choice (1-5)"))
if 1 <= selection <= 5:
return choices[selection-1][0]
else:
console.print("[red]Please enter a number between 1-5[/red]")
except ValueError:
console.print("[red]Please enter a valid number[/red]")
def _generate_recommendations(self, experience: str, goals: List[str], time: str) -> Dict[str, Any]:
"""Generate personalized recommendations."""
# Learning path mapping
path_mapping = {
'beginner': 'foundation_first',
'python_user': 'guided_learning',
'framework_user': 'systems_focus',
'ml_engineer': 'optimization_focus',
'instructor': 'teaching_resources'
}
# Starting point mapping
start_mapping = {
'15_minutes': 'quick_demo',
'2_hours': 'first_module',
'weekend': 'milestone_project',
'semester': 'full_curriculum',
'teaching': 'instructor_setup'
}
return {
'learning_path': path_mapping.get(experience, 'guided_learning'),
'starting_point': start_mapping.get(time, 'first_module'),
'experience_level': experience,
'goals': goals,
'time_commitment': time
}
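# Worked example (a sketch of the mappings above): a PyTorch user
# (experience='framework_user') with a free weekend (time='weekend') would get back
#   {'learning_path': 'systems_focus', 'starting_point': 'milestone_project',
#    'experience_level': 'framework_user', 'goals': [...], 'time_commitment': 'weekend'}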
def _show_personalized_path(self, recommendations: Dict[str, Any]) -> None:
"""Show personalized learning path."""
console = get_console()
# Path descriptions
paths = {
'foundation_first': {
'title': '🌱 Foundation First Path',
'description': 'Build fundamentals step-by-step with extra explanations',
'next_steps': ['Module 1: Setup & Environment', 'Python fundamentals review', 'Linear algebra primer']
},
'guided_learning': {
'title': '🎯 Guided Learning Path',
'description': 'Structured progression through all major concepts',
'next_steps': ['Module 1: Setup', 'Module 2: Tensors', 'Track progress with checkpoints']
},
'systems_focus': {
'title': '⚡ Systems Focus Path',
'description': 'Understand internals of frameworks you already use',
'next_steps': ['Compare PyTorch vs your code', 'Profile memory usage', 'Optimization modules']
},
'optimization_focus': {
'title': '🚀 Optimization Focus Path',
'description': 'Performance debugging and production optimization',
'next_steps': ['Profiling module', 'Benchmarking module', 'TinyMLPerf competition']
},
'teaching_resources': {
'title': '🎓 Teaching Resources Path',
'description': 'Instructor guides and classroom setup',
'next_steps': ['Instructor guide', 'NBGrader setup', 'Student progress tracking']
}
}
path_info = paths[recommendations['learning_path']]
console.print(f"\n[bold green]✨ Your Personalized Learning Path[/bold green]")
console.print(Panel(
f"[bold]{path_info['title']}[/bold]\n\n"
f"{path_info['description']}\n\n"
f"[bold cyan]Your Next Steps:[/bold cyan]\n" +
"\n".join(f"{step}" for step in path_info['next_steps']),
border_style="green"
))
def _launch_first_steps(self, recommendations: Dict[str, Any]) -> None:
"""Launch appropriate first steps based on recommendations."""
console = get_console()
starting_point = recommendations['starting_point']
if starting_point == 'quick_demo':
console.print("\n[bold blue]🚀 Launching Quick Demo...[/bold blue]")
console.print("Running: [code]tito demo quick[/code]")
os.system("tito demo quick")
elif starting_point == 'first_module':
console.print("\n[bold blue]🛠️ Setting up Module 1...[/bold blue]")
console.print("Next commands:")
console.print(" [code]cd modules/01_setup[/code]")
console.print(" [code]jupyter lab setup.py[/code]")
elif starting_point == 'milestone_project':
console.print("\n[bold blue]🎯 Weekend Project Recommendations...[/bold blue]")
console.print("Suggested goal: Build XOR solver (Modules 1-6)")
console.print("Time estimate: 6-8 hours")
elif starting_point == 'full_curriculum':
console.print("\n[bold blue]📚 Full Curriculum Setup...[/bold blue]")
console.print("Running checkpoint system initialization...")
os.system("tito checkpoint status")
elif starting_point == 'instructor_setup':
console.print("\n[bold blue]🎓 Instructor Resources...[/bold blue]")
console.print("Opening instructor guide...")
console.print("Check: [code]book/usage-paths/classroom-use.html[/code]")
def _show_quick_reference(self) -> int:
"""Show quick reference card."""
console = get_console()
# Essential commands table
table = Table(title="🚀 TinyTorch Quick Reference", show_header=True, header_style="bold cyan")
table.add_column("Command", style="bold", width=25)
table.add_column("Description", width=40)
table.add_column("Example", style="dim", width=30)
essential_commands = [
("tito help --interactive", "Launch onboarding wizard", "First time users"),
("tito checkpoint status", "See your progress", "Track learning journey"),
("tito module complete 02", "Finish a module", "Export & test your code"),
("tito demo quick", "See framework in action", "5-minute demonstration"),
("tito leaderboard join", "Join community", "Connect with learners"),
("tito system health", "Check environment", "Troubleshoot issues")
]
for cmd, desc, example in essential_commands:
table.add_row(cmd, desc, example)
console.print(table)
# Common workflows
console.print("\n[bold cyan]📋 Common Workflows:[/bold cyan]")
workflows = [
("New User", "tito help -i → tito checkpoint status → cd modules/01_setup"),
("Continue Learning", "tito checkpoint status → work on next module → tito module complete XX"),
("Join Community", "tito leaderboard join → submit progress → see global rankings"),
("Get Help", "tito system health → check docs/FAQ → ask community")
]
for workflow, commands in workflows:
console.print(f" [bold]{workflow}:[/bold] {commands}")
return 0
def _show_topic_help(self, topic: str) -> int:
"""Show help for specific topic."""
console = get_console()
topics = {
'getting-started': self._help_getting_started,
'commands': self._help_commands,
'workflow': self._help_workflow,
'modules': self._help_modules,
'checkpoints': self._help_checkpoints,
'community': self._help_community,
'troubleshooting': self._help_troubleshooting
}
if topic in topics:
topics[topic]()
return 0
else:
console.print(f"[red]Unknown help topic: {topic}[/red]")
console.print("Available topics: " + ", ".join(topics.keys()))
return 1
def _show_contextual_help(self) -> int:
"""Show contextual help based on user progress."""
console = get_console()
# Check user progress to provide contextual guidance
progress = self._assess_user_progress()
if progress['is_new_user']:
self._show_new_user_help()
elif progress['current_module']:
self._show_in_progress_help(progress['current_module'])
else:
self._show_experienced_user_help()
return 0
def _assess_user_progress(self) -> Dict[str, Any]:
"""Assess user's current progress."""
# Check for checkpoint files, completed modules, etc.
# This would integrate with the checkpoint system
# Simplified implementation for now
checkpoints_dir = Path("tests/checkpoints")
modules_dir = Path("modules")
return {
'is_new_user': not checkpoints_dir.exists(),
'current_module': None, # Would be determined by checkpoint status
'completed_modules': [], # Would be populated from checkpoint results
'has_joined_community': False # Would check leaderboard status
}
def _show_new_user_help(self) -> None:
"""Show help optimized for new users."""
console = get_console()
console.print(Panel.fit(
"[bold blue]👋 Welcome to Tiny🔥Torch![/bold blue]\n\n"
"You're about to build a complete ML framework from scratch.\n"
"Here's how to get started:\n\n"
"[bold cyan]Next Steps:[/bold cyan]\n"
"1. [code]tito help --interactive[/code] - Personalized onboarding\n"
"2. [code]tito system health[/code] - Check your environment\n"
"3. [code]tito checkpoint status[/code] - See the learning journey\n\n"
"[bold yellow]New to ML systems?[/bold yellow] Run the interactive wizard!",
title="Getting Started",
border_style="blue"
))
def _help_getting_started(self) -> None:
"""Detailed getting started help."""
console = get_console()
console.print("[bold blue]🚀 Getting Started with Tiny🔥Torch[/bold blue]\n")
# Installation steps
install_panel = Panel(
"[bold]1. Environment Setup[/bold]\n"
"```bash\n"
"git clone https://github.com/mlsysbook/Tiny🔥Torch.git\n"
"cd Tiny🔥Torch\n"
f"python -m venv {self.venv_path}\n"
f"source {self.venv_path}/bin/activate # Windows: .venv\\Scripts\\activate\n"
"pip install -r requirements.txt\n"
"pip install -e .\n"
"```",
title="Installation",
border_style="green"
)
# First steps
first_steps_panel = Panel(
"[bold]2. First Steps[/bold]\n"
"• [code]tito system health[/code] - Verify installation\n"
"• [code]tito help --interactive[/code] - Personalized guidance\n"
"• [code]tito checkpoint status[/code] - See learning path\n"
"• [code]cd modules/01_setup[/code] - Start first module",
title="First Steps",
border_style="blue"
)
# Learning path
learning_panel = Panel(
"[bold]3. Learning Journey[/bold]\n"
"📚 [bold]Modules 1-8:[/bold] Neural Network Foundations\n"
"🔬 [bold]Modules 9-10:[/bold] Computer Vision (CNNs)\n"
"🤖 [bold]Modules 11-14:[/bold] Language Models (Transformers)\n"
"⚡ [bold]Modules 15-20:[/bold] System Optimization\n\n"
"[dim]Each module: Build → Test → Export → Checkpoint[/dim]",
title="Learning Path",
border_style="yellow"
)
console.print(Columns([install_panel, first_steps_panel, learning_panel]))
# Additional help methods would be implemented here...
def _help_commands(self) -> None:
"""Show comprehensive command reference."""
pass
def _help_workflow(self) -> None:
"""Show common workflow patterns."""
pass
def _help_modules(self) -> None:
"""Show module system explanation."""
pass
def _help_checkpoints(self) -> None:
"""Show checkpoint system explanation."""
pass
def _help_community(self) -> None:
"""Show community features and leaderboard."""
pass
def _help_troubleshooting(self) -> None:
"""Show troubleshooting guide."""
pass
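# Usage sketch (illustrative; arguments mirror add_arguments above):
#
#     tito help                      # contextual help based on detected progress
#     tito help --interactive        # launch the onboarding wizard
#     tito help --quick              # quick reference card
#     tito help getting-started      # topic-specific help (see the topics dict)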

File diff suppressed because it is too large Load Diff

View File

@@ -1,193 +0,0 @@
"""
Notebooks command for building Jupyter notebooks from Python files using Jupytext.
"""
import subprocess
import sys
from argparse import ArgumentParser, Namespace
from pathlib import Path
from typing import List, Tuple
from rich.panel import Panel
from rich.text import Text
from .base import BaseCommand
from ..core.exceptions import ExecutionError, ModuleNotFoundError
class NotebooksCommand(BaseCommand):
"""Command to build Jupyter notebooks from Python files using Jupytext."""
@property
def name(self) -> str:
return "notebooks"
@property
def description(self) -> str:
return "Build notebooks from Python files"
def add_arguments(self, parser: ArgumentParser) -> None:
"""Add notebooks command arguments."""
parser.add_argument(
'--module',
help='Build notebook for specific module'
)
parser.add_argument(
'--force',
action='store_true',
help='Force rebuild even if notebook exists'
)
parser.add_argument(
'--dry-run',
action='store_true',
help='Show what would be built without actually building'
)
def validate_args(self, args: Namespace) -> None:
"""Validate notebooks command arguments."""
if args.module:
module_dir = self.config.modules_dir / args.module
if not module_dir.exists():
raise ModuleNotFoundError(f"Module directory '{args.module}' not found")
# Find module Python file in the module directory
# Extract short name from module directory name
if args.module.startswith(tuple(f"{i:02d}_" for i in range(100))):
short_name = args.module[3:] # Remove "00_" prefix
else:
short_name = args.module
dev_file = module_dir / f"{short_name}.py"
if not dev_file.exists():
raise ModuleNotFoundError(
f"No module file found in module '{args.module}'. Expected: {dev_file.name}"
)
def _find_dev_files(self) -> List[Path]:
"""Find all module Python files in modules directory."""
dev_files = []
# Look in modules/ directory
modules_dir = self.config.modules_dir
for module_dir in modules_dir.iterdir():
if module_dir.is_dir() and not module_dir.name.startswith('.'):
# Extract short name from module directory name
module_name = module_dir.name
if module_name.startswith(tuple(f"{i:02d}_" for i in range(100))):
short_name = module_name[3:] # Remove "00_" prefix
else:
short_name = module_name
# Look for module Python file (without _dev suffix)
py_file = module_dir / f"{short_name}.py"
if py_file.exists():
dev_files.append(py_file)
return sorted(dev_files)
def _convert_file(self, dev_file: Path) -> Tuple[bool, str]:
"""Convert a single Python file to notebook using Jupytext."""
try:
# Use Jupytext from venv to convert Python file to notebook
import sys
venv_python = Path(sys.executable)
jupytext_cmd = venv_python.parent / "jupytext"
result = subprocess.run([
str(jupytext_cmd), "--to", "notebook", str(dev_file)
], capture_output=True, text=True, timeout=30, cwd=dev_file.parent)
if result.returncode == 0:
notebook_file = dev_file.with_suffix('.ipynb')
return True, f"{dev_file.name}{notebook_file.name}"
else:
error_msg = result.stderr.strip() if result.stderr.strip() else "Conversion failed"
return False, error_msg
except subprocess.TimeoutExpired:
return False, "Conversion timed out"
except FileNotFoundError:
return False, "Jupytext not found. Install with: pip install jupytext"
except Exception as e:
return False, f"Error: {str(e)}"
def run(self, args: Namespace) -> int:
"""Execute the notebooks command."""
self.console.print(Panel(
"📓 Building Notebooks from Python Files (using Jupytext)",
title="Notebook Generation",
border_style="bright_cyan"
))
# Find files to convert
if args.module:
module_dir = self.config.modules_dir / args.module
# Extract short name from module directory name
module_name = args.module
if module_name.startswith(tuple(f"{i:02d}_" for i in range(100))):
short_name = module_name[3:] # Remove "00_" prefix
else:
short_name = module_name
dev_file = module_dir / f"{short_name}.py"
if dev_file.exists():
dev_files = [dev_file]
else:
dev_files = []
self.console.print(f"🔄 Building notebook for module: {args.module}")
else:
dev_files = self._find_dev_files()
if not dev_files:
self.console.print(Panel(
"[yellow]⚠️ No *.py files found in modules/[/yellow]",
title="Nothing to Convert",
border_style="yellow"
))
return 0
self.console.print(f"🔄 Building notebooks for {len(dev_files)} modules...")
# Dry run mode
if args.dry_run:
self.console.print("\n[cyan]Dry run mode - would convert:[/cyan]")
for dev_file in dev_files:
module_name = dev_file.parent.name
self.console.print(f"{module_name}: {dev_file.name}")
return 0
# Convert files
success_count = 0
error_count = 0
for dev_file in dev_files:
success, message = self._convert_file(dev_file)
module_name = dev_file.parent.name
if success:
success_count += 1
self.console.print(f"{module_name}: {message}")
else:
error_count += 1
self.console.print(f"{module_name}: {message}")
# Summary
self._print_summary(success_count, error_count)
return 0 if error_count == 0 else 1
def _print_summary(self, success_count: int, error_count: int) -> None:
"""Print command execution summary."""
summary_text = Text()
if success_count > 0:
summary_text.append(f"✅ Successfully built {success_count} notebook(s)\n", style="bold green")
if error_count > 0:
summary_text.append(f"❌ Failed to build {error_count} notebook(s)\n", style="bold red")
if success_count > 0:
summary_text.append("\n💡 Next steps:\n", style="bold yellow")
summary_text.append(" • Open notebooks with: jupyter lab\n", style="white")
summary_text.append(" • Work interactively in the notebooks\n", style="white")
summary_text.append(" • Export code with: tito package export\n", style="white")
summary_text.append(" • Run tests with: tito module test\n", style="white")
border_style = "green" if error_count == 0 else "yellow"
self.console.print(Panel(
summary_text,
title="Notebook Generation Complete",
border_style=border_style
))
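# Usage sketch (illustrative; options mirror add_arguments above, the module
# name is a hypothetical example):
#
#     tito notebooks                       # build notebooks for every module
#     tito notebooks --module 02_tensor    # build one module's notebook
#     tito notebooks --dry-run             # list what would be converted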

View File

@@ -1,897 +0,0 @@
"""
TinyTorch Olympics Command
Special competition events with focused challenges, time-limited competitions,
and unique recognition opportunities beyond the regular community leaderboard.
"""
import json
import os
from argparse import ArgumentParser, Namespace
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, List, Optional, Any
import uuid
from rich.panel import Panel
from rich.table import Table
from rich.progress import track
from rich.prompt import Prompt, Confirm
from rich.console import Group
from rich.align import Align
from .base import BaseCommand
from ..core.exceptions import TinyTorchCLIError
class OlympicsCommand(BaseCommand):
"""Special competition events - Focused challenges and recognition"""
@property
def name(self) -> str:
return "olympics"
@property
def description(self) -> str:
return "Special competition events with unique challenges and recognition"
def add_arguments(self, parser: ArgumentParser) -> None:
"""Add olympics subcommands."""
subparsers = parser.add_subparsers(
dest='olympics_command',
help='Olympics operations',
metavar='COMMAND'
)
# Events command
events_parser = subparsers.add_parser(
'events',
help='View current and upcoming competition events'
)
events_parser.add_argument(
'--upcoming',
action='store_true',
help='Show only upcoming events'
)
events_parser.add_argument(
'--past',
action='store_true',
help='Show past competition results'
)
# Compete command
compete_parser = subparsers.add_parser(
'compete',
help='Enter a specific competition event'
)
compete_parser.add_argument(
'--event',
required=True,
help='Event ID to compete in'
)
compete_parser.add_argument(
'--accuracy',
type=float,
help='Accuracy achieved for this competition'
)
compete_parser.add_argument(
'--model',
help='Model description and approach used'
)
compete_parser.add_argument(
'--code-url',
help='Optional: Link to your competition code/approach'
)
compete_parser.add_argument(
'--notes',
help='Competition-specific notes, innovations, learnings'
)
# Awards command
awards_parser = subparsers.add_parser(
'awards',
help='View special recognition and achievement badges'
)
awards_parser.add_argument(
'--personal',
action='store_true',
help='Show only your personal awards'
)
# History command
history_parser = subparsers.add_parser(
'history',
help='View past competition events and memorable moments'
)
history_parser.add_argument(
'--year',
type=int,
help='Filter by specific year'
)
history_parser.add_argument(
'--event-type',
choices=['speed', 'accuracy', 'innovation', 'efficiency', 'community'],
help='Filter by event type'
)
def run(self, args: Namespace) -> int:
"""Execute olympics command."""
command = getattr(args, 'olympics_command', None)
if not command:
self._show_olympics_overview()
return 0
if command == 'events':
return self._show_events(args)
elif command == 'compete':
return self._compete_in_event(args)
elif command == 'awards':
return self._show_awards(args)
elif command == 'history':
return self._show_history(args)
else:
raise TinyTorchCLIError(f"Unknown olympics command: {command}")
def _show_olympics_overview(self) -> None:
"""Show olympics overview and current special events."""
self.console.print(Panel(
Group(
Align.center("[bold bright_gold]🏅 TinyTorch Olympics 🏅[/bold bright_gold]"),
"",
"[bold]Special Competition Events![/bold] Beyond the regular community leaderboard:",
"",
"🎯 [bold bright_blue]Focused Challenges[/bold bright_blue]",
" • Time-limited competitions (24hr, 1week, 1month challenges)",
" • Specific constraints (memory-efficient, fastest training, novel architectures)",
" • Theme-based events (interpretability, fairness, efficiency)",
"",
"🏆 [bold bright_yellow]Special Recognition[/bold bright_yellow]",
" • Olympic medals and achievement badges",
" • Innovation awards for creative approaches",
" • Community impact recognition",
"",
"🌟 [bold bright_green]Current Active Events[/bold bright_green]",
" • Winter 2024 Speed Challenge (Training under 5 minutes)",
" • Memory Efficiency Olympics (Models under 1MB)",
" • Architecture Innovation Contest (Novel designs welcome)",
"",
"[bold]Available Commands:[/bold]",
" [green]events[/green] - See current and upcoming competitions",
" [green]compete[/green] - Enter a specific event",
" [green]awards[/green] - View special recognition and badges",
" [green]history[/green] - Past competitions and memorable moments",
"",
"[dim]💡 Note: Olympics are special events separate from daily community leaderboard[/dim]",
),
title="🥇 Competition Central",
border_style="bright_yellow",
padding=(1, 2)
))
def _show_events(self, args: Namespace) -> int:
"""Show current and upcoming competition events."""
# Load events data (mock for now)
events = self._load_olympics_events()
if args.upcoming:
events = [e for e in events if e["status"] == "upcoming"]
title = "📅 Upcoming Competition Events"
elif args.past:
events = [e for e in events if e["status"] == "completed"]
title = "🏛️ Past Competition Results"
else:
title = "🏅 All Competition Events"
if not events:
status_text = "upcoming" if args.upcoming else "past" if args.past else "available"
self.console.print(Panel(
f"[yellow]No {status_text} events at this time![/yellow]\n\n"
"Check back soon for new competition opportunities!",
title="📅 No Events",
border_style="yellow"
))
return 0
# Create events table
table = Table(title=title)
table.add_column("Event", style="bold")
table.add_column("Type", style="blue")
table.add_column("Duration", style="green")
table.add_column("Status", style="yellow")
table.add_column("Prize/Recognition", style="bright_magenta")
table.add_column("Participants", style="cyan", justify="right")
for event in events:
status_display = self._get_status_display(event["status"], event.get("end_date"))
table.add_row(
event["name"],
event["type"],
event["duration"],
status_display,
event["prize"],
str(event.get("participants", 0))
)
self.console.print(table)
# Show active event details
active_events = [e for e in events if e["status"] == "active"]
if active_events:
self.console.print(Panel(
Group(
"[bold bright_green]🔥 Active Competitions You Can Join Now![/bold bright_green]",
"",
*[f"• [bold]{event['name']}[/bold]: {event['description']}" for event in active_events[:3]],
"",
"[bold]Join a competition:[/bold]",
"[dim]tito olympics compete --event <event_id>[/dim]",
),
title="⚡ Join Now",
border_style="bright_green",
padding=(0, 1)
))
return 0
def _compete_in_event(self, args: Namespace) -> int:
"""Enter a competition event."""
# Check if user is registered for leaderboard
if not self._is_user_registered():
self.console.print(Panel(
"[yellow]Please register for the community leaderboard first![/yellow]\n\n"
"Olympics competitions require community membership:\n"
"[bold]tito leaderboard register[/bold]",
title="📝 Registration Required",
border_style="yellow"
))
return 1
# Load event details
event = self._get_event_details(args.event)
if not event:
self.console.print(Panel(
f"[red]Event '{args.event}' not found![/red]\n\n"
"See available events: [bold]tito olympics events[/bold]",
title="❌ Event Not Found",
border_style="red"
))
return 1
# Check if event is active
if event["status"] != "active":
self.console.print(Panel(
f"[yellow]Event '{event['name']}' is not currently active![/yellow]\n\n"
f"Status: {event['status']}\n"
"See active events: [bold]tito olympics events[/bold]",
title="⏰ Event Not Active",
border_style="yellow"
))
return 1
# Show event details and confirm participation
self._show_event_details(event)
if not Confirm.ask("\n[bold]Compete in this event?[/bold]"):
self.console.print("[dim]Maybe next time! 👋[/dim]")
return 0
# Gather competition submission
submission = self._gather_competition_submission(event, args)
# Validate submission meets event criteria
validation_result = self._validate_submission(event, submission)
if not validation_result["valid"]:
self.console.print(Panel(
f"[red]Submission doesn't meet event criteria![/red]\n\n"
f"Issue: {validation_result['reason']}\n\n"
"Please check event requirements and try again.",
title="❌ Validation Failed",
border_style="red"
))
return 1
# Save competition entry
self._save_competition_entry(event, submission)
# Show competition confirmation and standing
self._show_competition_confirmation(event, submission)
return 0
def _show_awards(self, args: Namespace) -> int:
"""Show special recognition and achievement badges."""
if args.personal:
return self._show_personal_awards()
else:
return self._show_all_awards()
def _show_personal_awards(self) -> int:
"""Show user's personal awards and badges."""
if not self._is_user_registered():
self.console.print(Panel(
"[yellow]Please register first to see your awards![/yellow]\n\n"
"Run: [bold]tito leaderboard register[/bold]",
title="📝 Registration Required",
border_style="yellow"
))
return 1
# Load user's Olympic achievements
olympic_profile = self._load_user_olympic_profile()
awards = olympic_profile.get("awards", [])
competitions = olympic_profile.get("competitions", [])
if not awards and not competitions:
self.console.print(Panel(
Group(
"[bold bright_blue]🌟 Your Olympic Journey Awaits![/bold bright_blue]",
"",
"You haven't participated in Olympics competitions yet.",
"",
"[bold]Start your journey:[/bold]",
"• Check active events: [green]tito olympics events[/green]",
"• Join a competition: [green]tito olympics compete --event <id>[/green]",
"• Earn your first Olympic badge! 🏅",
"",
"[dim]Every Olympic participant gets recognition for participation![/dim]",
),
title="🏅 Your Olympic Profile",
border_style="bright_blue",
padding=(1, 2)
))
return 0
# Show awards and achievements
self._display_personal_olympic_achievements(olympic_profile)
return 0
def _show_all_awards(self) -> int:
"""Show community awards and notable achievements."""
# Mock awards data
notable_awards = self._load_notable_awards()
# Recent awards table
table = Table(title="🏆 Recent Olympic Achievements")
table.add_column("Award", style="bold")
table.add_column("Recipient", style="green")
table.add_column("Event", style="blue")
table.add_column("Achievement", style="yellow")
table.add_column("Date", style="dim")
for award in notable_awards[:10]:
table.add_row(
award["award_type"],
award["recipient"],
award["event"],
award["description"],
award["date"]
)
self.console.print(table)
# Award categories explanation
self.console.print(Panel(
Group(
"[bold bright_yellow]🏅 Olympic Award Categories[/bold bright_yellow]",
"",
"🥇 [bold]Performance Awards[/bold]",
" • Gold/Silver/Bronze medals for top competition results",
" • Speed records, accuracy achievements, efficiency milestones",
"",
"🌟 [bold]Innovation Awards[/bold]",
" • Novel Architecture Award for creative model designs",
" • Optimization Genius for breakthrough efficiency techniques",
" • Interpretability Champion for explainable AI contributions",
"",
"🤝 [bold]Community Awards[/bold]",
" • Mentor Badge for helping other competitors",
" • Knowledge Sharer for valuable insights and tutorials",
" • Sportsperson Award for exceptional community spirit",
"",
"🎯 [bold]Special Recognition[/bold]",
" • First Participation Badge (everyone gets this!)",
" • Consistency Award for regular competition participation",
" • Breakthrough Achievement for major personal improvements",
),
title="🏆 Recognition System",
border_style="bright_yellow",
padding=(0, 1)
))
return 0
def _show_history(self, args: Namespace) -> int:
"""Show past competition events and memorable moments."""
# Load historical data
history = self._load_olympics_history()
# Filter by year if specified
if args.year:
history = [h for h in history if h["year"] == args.year]
# Filter by event type if specified
if args.event_type:
history = [h for h in history if h["type"] == args.event_type]
if not history:
filter_text = f" for {args.year}" if args.year else ""
filter_text += f" ({args.event_type} events)" if args.event_type else ""
self.console.print(Panel(
f"[yellow]No competition history found{filter_text}![/yellow]\n\n"
"The Olympics program is just getting started!",
title="📚 No History",
border_style="yellow"
))
return 0
# Create history table
table = Table(title="📚 TinyTorch Olympics History")
table.add_column("Event", style="bold")
table.add_column("Date", style="dim")
table.add_column("Type", style="blue")
table.add_column("Winner", style="green")
table.add_column("Achievement", style="yellow")
table.add_column("Memorable Moment", style="cyan")
for event in sorted(history, key=lambda x: x["date"], reverse=True):
table.add_row(
event["name"],
event["date"],
event["type"],
event["winner"],
event["winning_achievement"],
event["memorable_moment"]
)
self.console.print(table)
# Show legendary moments
if not args.year and not args.event_type:
self.console.print(Panel(
Group(
"[bold bright_gold]🌟 Legendary Olympic Moments[/bold bright_gold]",
"",
"🏆 [bold]The Great Speed Challenge 2024[/bold]",
" Winner achieved 75% CIFAR-10 accuracy in just 47 seconds!",
"",
"🧠 [bold]Architecture Innovation Contest[/bold]",
" Revolutionary attention mechanism reduced parameters by 90%",
"",
"🤝 [bold]Community Spirit Award[/bold]",
" Competitor shared winning code to help others improve",
"",
"[dim]Each Olympics creates new legends in the TinyTorch community! 💫[/dim]",
),
title="🏛️ Hall of Fame",
border_style="bright_gold",
padding=(0, 1)
))
return 0
def _load_olympics_events(self) -> List[Dict[str, Any]]:
"""Load olympics events data (mock implementation)."""
return [
{
"id": "winter2024_speed",
"name": "Winter 2024 Speed Challenge",
"type": "Speed",
"status": "active",
"duration": "24 hours",
"description": "Train CIFAR-10 model to 70%+ accuracy in under 5 minutes",
"prize": "🏆 Speed Medal + Recognition",
"participants": 23,
"start_date": "2024-01-15",
"end_date": "2024-01-16",
"criteria": {"min_accuracy": 70.0, "max_time_minutes": 5}
},
{
"id": "memory2024_efficiency",
"name": "Memory Efficiency Olympics",
"type": "Efficiency",
"status": "active",
"duration": "1 week",
"description": "Best CIFAR-10 accuracy with model under 1MB",
"prize": "🥇 Efficiency Champion",
"participants": 15,
"start_date": "2024-01-10",
"end_date": "2024-01-17",
"criteria": {"max_model_size_mb": 1.0}
},
{
"id": "innovation2024_arch",
"name": "Architecture Innovation Contest",
"type": "Innovation",
"status": "upcoming",
"duration": "2 weeks",
"description": "Novel architectures and creative approaches welcome",
"prize": "🌟 Innovation Award",
"participants": 0,
"start_date": "2024-02-01",
"end_date": "2024-02-14",
"criteria": {"novelty_required": True}
},
{
"id": "autumn2023_classic",
"name": "Autumn 2023 Classic",
"type": "Accuracy",
"status": "completed",
"duration": "1 month",
"description": "Best overall CIFAR-10 accuracy challenge",
"prize": "🥇 Gold Medal",
"participants": 87,
"start_date": "2023-10-01",
"end_date": "2023-10-31",
"winner": "neural_champion",
"winning_score": 84.2
}
]
def _get_status_display(self, status: str, end_date: Optional[str] = None) -> str:
"""Get display-friendly status with timing information."""
if status == "active":
if end_date:
# Calculate time remaining
end = datetime.fromisoformat(end_date)
now = datetime.now()
if end > now:
remaining = end - now
if remaining.days > 0:
return f"🔥 Active ({remaining.days}d left)"
else:
hours = remaining.seconds // 3600
return f"🔥 Active ({hours}h left)"
return "🔥 Active"
elif status == "upcoming":
return "📅 Upcoming"
elif status == "completed":
return "✅ Completed"
else:
return status.title()
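# Example outputs (a sketch, assuming the current time is 2024-01-14 00:00):
#   _get_status_display("active", "2024-01-16")  -> "🔥 Active (2d left)"
#   _get_status_display("upcoming")               -> "📅 Upcoming"
#   _get_status_display("completed")              -> "✅ Completed"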
def _is_user_registered(self) -> bool:
"""Check if user is registered for community leaderboard."""
from .leaderboard import LeaderboardCommand
leaderboard_cmd = LeaderboardCommand(self.config)
return leaderboard_cmd._load_user_profile() is not None
def _get_event_details(self, event_id: str) -> Optional[Dict[str, Any]]:
"""Get details for a specific event."""
events = self._load_olympics_events()
return next((e for e in events if e["id"] == event_id), None)
def _show_event_details(self, event: Dict[str, Any]) -> None:
"""Show detailed information about an event."""
self.console.print(Panel(
Group(
f"[bold bright_blue]{event['name']}[/bold bright_blue]",
"",
f"[bold]Type:[/bold] {event['type']}",
f"[bold]Duration:[/bold] {event['duration']}",
f"[bold]Current Participants:[/bold] {event.get('participants', 0)}",
"",
f"[bold]Challenge:[/bold]",
f" {event['description']}",
"",
f"[bold]Recognition:[/bold]",
f" {event['prize']}",
"",
f"[bold]Requirements:[/bold]",
*[f"{k.replace('_', ' ').title()}: {v}" for k, v in event.get('criteria', {}).items()],
),
title=f"🏅 {event['type']} Competition",
border_style="bright_blue",
padding=(1, 2)
))
def _gather_competition_submission(self, event: Dict[str, Any], args: Namespace) -> Dict[str, Any]:
"""Gather submission details for competition."""
submission = {
"event_id": event["id"],
"submitted_date": datetime.now().isoformat()
}
# Get accuracy
if args.accuracy is not None:
submission["accuracy"] = args.accuracy
else:
submission["accuracy"] = float(Prompt.ask(
f"[bold]Accuracy achieved on {event.get('dataset', 'the task')}[/bold]",
default="0.0"
))
# Get model description
if args.model:
submission["model"] = args.model
else:
submission["model"] = Prompt.ask(
"[bold]Model description[/bold] (architecture, approach, innovations)",
default="Custom Model"
)
# Optional fields
submission["code_url"] = args.code_url or Prompt.ask(
"[bold]Code/approach URL[/bold] (optional)",
default=""
) or None
submission["notes"] = args.notes or Prompt.ask(
"[bold]Competition notes[/bold] (innovations, challenges, learnings)",
default=""
) or None
# Event-specific metrics
if "max_time_minutes" in event.get("criteria", {}):
training_time = float(Prompt.ask(
"[bold]Training time in minutes[/bold]",
default="0.0"
))
submission["training_time_minutes"] = training_time
if "max_model_size_mb" in event.get("criteria", {}):
model_size = float(Prompt.ask(
"[bold]Model size in MB[/bold]",
default="0.0"
))
submission["model_size_mb"] = model_size
return submission
def _validate_submission(self, event: Dict[str, Any], submission: Dict[str, Any]) -> Dict[str, Any]:
"""Validate submission meets event criteria."""
criteria = event.get("criteria", {})
# Check minimum accuracy
if "min_accuracy" in criteria:
if submission["accuracy"] < criteria["min_accuracy"]:
return {
"valid": False,
"reason": f"Accuracy {submission['accuracy']:.1f}% below required {criteria['min_accuracy']:.1f}%"
}
# Check maximum training time
if "max_time_minutes" in criteria:
if submission.get("training_time_minutes", 0) > criteria["max_time_minutes"]:
return {
"valid": False,
"reason": f"Training time {submission['training_time_minutes']:.1f}min exceeds limit {criteria['max_time_minutes']:.1f}min"
}
# Check maximum model size
if "max_model_size_mb" in criteria:
if submission.get("model_size_mb", 0) > criteria["max_model_size_mb"]:
return {
"valid": False,
"reason": f"Model size {submission['model_size_mb']:.1f}MB exceeds limit {criteria['max_model_size_mb']:.1f}MB"
}
return {"valid": True}
def _save_competition_entry(self, event: Dict[str, Any], submission: Dict[str, Any]) -> None:
"""Save competition entry to user's Olympic profile."""
olympic_profile = self._load_user_olympic_profile()
if "competitions" not in olympic_profile:
olympic_profile["competitions"] = []
olympic_profile["competitions"].append(submission)
# Add participation award if first competition
if len(olympic_profile["competitions"]) == 1:
award = {
"type": "participation",
"name": "First Olympic Participation",
"description": "Welcomed to the Olympics community!",
"event": event["name"],
"earned_date": datetime.now().isoformat()
}
if "awards" not in olympic_profile:
olympic_profile["awards"] = []
olympic_profile["awards"].append(award)
self._save_user_olympic_profile(olympic_profile)
def _show_competition_confirmation(self, event: Dict[str, Any], submission: Dict[str, Any]) -> None:
"""Show confirmation and current standing."""
# Determine performance level for this competition
ranking_message = self._get_competition_ranking_message(event, submission)
self.console.print(Panel(
Group(
Align.center("[bold bright_green]🎉 Competition Entry Submitted! 🎉[/bold bright_green]"),
"",
f"[bold]Event:[/bold] {event['name']}",
f"[bold]Your Result:[/bold] {submission['accuracy']:.1f}% accuracy",
f"[bold]Model:[/bold] {submission['model']}",
"",
ranking_message,
"",
"[bold bright_blue]🏅 Recognition Earned:[/bold bright_blue]",
"• Olympic Participant Badge",
"• Competition Experience Points",
"• Community Recognition",
"",
"[bold]Next Steps:[/bold]",
"• View your awards: [green]tito olympics awards --personal[/green]",
"• See current standings: [green]tito olympics events[/green]",
"• Join another event: [green]tito olympics events[/green]",
),
title="🥇 Olympic Achievement",
border_style="bright_green",
padding=(1, 2)
))
def _get_competition_ranking_message(self, event: Dict[str, Any], submission: Dict[str, Any]) -> str:
"""Get appropriate ranking/performance message for competition."""
accuracy = submission["accuracy"]
# Mock competition standings for encouragement
if accuracy >= 80:
return "[bright_green]🏆 Outstanding performance! You're in contention for top prizes![/bright_green]"
elif accuracy >= 70:
return "[bright_blue]🎯 Strong showing! You're competing well in this event![/bright_blue]"
elif accuracy >= 60:
return "[bright_yellow]🌟 Good effort! Every competition teaches valuable lessons![/bright_yellow]"
else:
return "[bright_magenta]💝 Thank you for participating! Competition experience is valuable![/bright_magenta]"
def _load_user_olympic_profile(self) -> Dict[str, Any]:
"""Load user's Olympic competition profile."""
data_dir = Path.home() / ".tinytorch" / "olympics"
data_dir.mkdir(parents=True, exist_ok=True)
profile_file = data_dir / "olympic_profile.json"
if profile_file.exists():
with open(profile_file, 'r') as f:
return json.load(f)
return {
"competitions": [],
"awards": [],
"created_date": datetime.now().isoformat()
}
def _save_user_olympic_profile(self, profile: Dict[str, Any]) -> None:
"""Save user's Olympic competition profile."""
data_dir = Path.home() / ".tinytorch" / "olympics"
profile_file = data_dir / "olympic_profile.json"
with open(profile_file, 'w') as f:
json.dump(profile, f, indent=2)
def _display_personal_olympic_achievements(self, olympic_profile: Dict[str, Any]) -> None:
"""Display user's personal Olympic achievements."""
competitions = olympic_profile.get("competitions", [])
awards = olympic_profile.get("awards", [])
# Summary stats
total_competitions = len(competitions)
best_accuracy = max([c["accuracy"] for c in competitions], default=0)
events_participated = len(set(c["event_id"] for c in competitions))
self.console.print(Panel(
Group(
Align.center("[bold bright_gold]🏅 Your Olympic Journey 🏅[/bold bright_gold]"),
"",
f"🎯 Competitions Entered: {total_competitions}",
f"🏆 Best Performance: {best_accuracy:.1f}% accuracy",
f"🌟 Events Participated: {events_participated}",
f"🥇 Awards Earned: {len(awards)}",
),
title="📊 Olympic Stats",
border_style="bright_gold",
padding=(1, 2)
))
# Awards table
if awards:
awards_table = Table(title="🏆 Your Olympic Awards")
awards_table.add_column("Award", style="bold")
awards_table.add_column("Event", style="blue")
awards_table.add_column("Description", style="green")
awards_table.add_column("Date", style="dim")
for award in sorted(awards, key=lambda x: x["earned_date"], reverse=True):
awards_table.add_row(
award["name"],
award["event"],
award["description"],
award["earned_date"][:10]
)
self.console.print(awards_table)
# Recent competitions
if competitions:
recent_comps = sorted(competitions, key=lambda x: x["submitted_date"], reverse=True)[:5]
comps_table = Table(title="🎯 Recent Competition Entries")
comps_table.add_column("Event", style="bold")
comps_table.add_column("Accuracy", style="green", justify="right")
comps_table.add_column("Model", style="blue")
comps_table.add_column("Date", style="dim")
for comp in recent_comps:
comps_table.add_row(
comp["event_id"],
f"{comp['accuracy']:.1f}%",
comp["model"],
comp["submitted_date"][:10]
)
self.console.print(comps_table)
def _load_notable_awards(self) -> List[Dict[str, Any]]:
"""Load notable community awards (mock implementation)."""
return [
{
"award_type": "🥇 Gold Medal",
"recipient": "speed_demon",
"event": "Winter 2024 Speed Challenge",
"description": "2.3 min training, 78.4% accuracy",
"date": "2024-01-16"
},
{
"award_type": "🌟 Innovation Award",
"recipient": "arch_wizard",
"event": "Memory Efficiency Olympics",
"description": "Novel attention mechanism",
"date": "2024-01-15"
},
{
"award_type": "🤝 Community Spirit",
"recipient": "helpful_mentor",
"event": "Autumn 2023 Classic",
"description": "Shared winning approach publicly",
"date": "2023-11-01"
},
{
"award_type": "🏆 Speed Record",
"recipient": "lightning_fast",
"event": "Winter 2024 Speed Challenge",
"description": "47 second training record",
"date": "2024-01-15"
},
{
"award_type": "🎯 Accuracy Champion",
"recipient": "precision_master",
"event": "Architecture Innovation",
"description": "86.7% CIFAR-10 accuracy",
"date": "2024-01-10"
}
]
def _load_olympics_history(self) -> List[Dict[str, Any]]:
"""Load historical Olympics data (mock implementation)."""
return [
{
"name": "Autumn 2023 Classic",
"date": "2023-10-31",
"year": 2023,
"type": "accuracy",
"winner": "neural_champion",
"winning_achievement": "84.2% CIFAR-10 accuracy",
"memorable_moment": "First 80%+ achievement in community"
},
{
"name": "Summer 2023 Speed Trial",
"date": "2023-07-15",
"year": 2023,
"type": "speed",
"winner": "velocity_victor",
"winning_achievement": "3.2 minute training",
"memorable_moment": "Breakthrough GPU optimization technique"
},
{
"name": "Spring 2023 Innovation Fair",
"date": "2023-04-20",
"year": 2023,
"type": "innovation",
"winner": "creative_genius",
"winning_achievement": "Self-organizing architecture",
"memorable_moment": "Inspired 12 follow-up research papers"
}
]

View File

@@ -1,572 +0,0 @@
"""
Status command for TinyTorch CLI: checks status of all modules in modules/ directory.
Supports both basic status checking and comprehensive system analysis.
"""
import subprocess
import sys
import yaml
import re
import time
from argparse import ArgumentParser, Namespace
from pathlib import Path
from rich.panel import Panel
from rich.table import Table
from rich.text import Text
from typing import Union, Dict, Any, Optional
from .base import BaseCommand
from ..core.status_analyzer import TinyTorchStatusAnalyzer
class StatusCommand(BaseCommand):
@property
def name(self) -> str:
return "status"
@property
def description(self) -> str:
return "Check status of all modules"
def add_arguments(self, parser: ArgumentParser) -> None:
parser.add_argument("--progress", action="store_true", help="Show user progress (modules + milestones) - DEFAULT")
parser.add_argument("--files", action="store_true", help="Show file structure and module status")
parser.add_argument("--details", action="store_true", help="Show detailed file structure")
parser.add_argument("--metadata", action="store_true", help="Show module metadata information")
parser.add_argument("--test-status", action="store_true", help="Include test execution status (slower)")
parser.add_argument("--comprehensive", action="store_true", help="Run comprehensive system health dashboard (environment + compliance + testing)")
def _get_export_target(self, module_path: Path) -> str:
"""
Read the actual export target from the dev file's #| default_exp directive.
Same logic as the export command.
"""
# Extract short name from module directory name for dev file
module_name = module_path.name
if module_name.startswith(tuple(f"{i:02d}_" for i in range(100))):
short_name = module_name[3:] # Remove "00_" prefix
else:
short_name = module_name
dev_file = module_path / f"{short_name}.py"
if not dev_file.exists():
return "not_found"
try:
with open(dev_file, 'r', encoding='utf-8') as f:
content = f.read()
# Look for #| default_exp directive
match = re.search(r'#\|\s*default_exp\s+([^\n\r]+)', content)
if match:
return match.group(1).strip()
return "no_export"
except Exception:
return "read_error"
def _count_test_functions(self, dev_file: Path) -> int:
"""Count the number of test functions in a dev file."""
try:
with open(dev_file, 'r', encoding='utf-8') as f:
content = f.read()
# Count lines that start with "def test_"
lines = content.split('\n')
test_functions = [line for line in lines if line.strip().startswith('def test_')]
return len(test_functions)
except Exception:
return 0
def _count_export_functions(self, dev_file: Path) -> int:
"""Count the number of exported functions/classes in a dev file."""
try:
with open(dev_file, 'r', encoding='utf-8') as f:
content = f.read()
# Count lines that have #| export directive
lines = content.split('\n')
export_lines = [line for line in lines if line.strip().startswith('#| export')]
return len(export_lines)
except Exception:
return 0
def run(self, args: Namespace) -> int:
console = self.console
# Handle comprehensive analysis mode
if args.comprehensive:
return self._run_comprehensive_analysis()
# Handle progress view (default if no flags, or --progress)
if not args.files and not args.details and not args.metadata and not args.test_status:
return self._run_progress_view()
if args.progress:
return self._run_progress_view()
# Standard file status check mode
return self._run_standard_status(args)
def _run_progress_view(self) -> int:
"""Show unified user progress view (modules + milestones)."""
console = self.console
import json
from datetime import datetime
# Load progress data
progress_file = Path(".tito") / "progress.json"
milestones_file = Path(".tito") / "milestones.json"
# Load module progress
if progress_file.exists():
progress_data = json.loads(progress_file.read_text())
completed_modules = progress_data.get("completed_modules", [])
completion_dates = progress_data.get("completion_dates", {})
else:
completed_modules = []
completion_dates = {}
# Load milestone achievements
if milestones_file.exists():
milestones_data = json.loads(milestones_file.read_text())
completed_milestones = milestones_data.get("completed_milestones", [])
milestone_dates = milestones_data.get("completion_dates", {})
else:
completed_milestones = []
milestone_dates = {}
# Calculate progress percentages
total_modules = 20
total_milestones = 6
modules_percent = int((len(completed_modules) / total_modules) * 100)
milestones_percent = int((len(completed_milestones) / total_milestones) * 100)
# Create summary panel
summary_text = Text()
summary_text.append(f"📦 Modules Completed: ", style="bold")
summary_text.append(f"{len(completed_modules)}/{total_modules} ({modules_percent}%)\n", style="cyan")
summary_text.append(f"🏆 Milestones Achieved: ", style="bold")
summary_text.append(f"{len(completed_milestones)}/{total_milestones} ({milestones_percent}%)\n\n", style="magenta")
# Last activity
all_dates = list(completion_dates.values()) + list(milestone_dates.values())
if all_dates:
latest_date = max(all_dates)
summary_text.append("📍 Last Activity: ", style="bold")
summary_text.append(f"{latest_date}\n", style="dim")
console.print(Panel(
summary_text,
title="📊 TinyTorch Progress",
border_style="bright_cyan"
))
# Module Progress Table
if completed_modules:
console.print("\n[bold]Module Progress:[/bold]")
for i in range(1, total_modules + 1):
mod_num = i
if mod_num in completed_modules:
module_name = self._get_module_name(mod_num)
console.print(f" [green]✅ {mod_num:02d} {module_name}[/green]")
elif i <= len(completed_modules) + 3: # Show next few modules
module_name = self._get_module_name(mod_num)
console.print(f" [dim]🔒 {mod_num:02d} {module_name}[/dim]")
# Milestone Achievements
if completed_milestones or (completed_modules and len(completed_modules) >= 1):
console.print("\n[bold]Milestone Achievements:[/bold]")
milestone_names = {
"01": "Perceptron (1957)",
"02": "Backpropagation (1986)",
"03": "MLP Revival (1986)",
"04": "CNN Revolution (1998)",
"05": "Transformer Era (2017)",
"06": "MLPerf (2018)"
}
for mid in ["01", "02", "03", "04", "05", "06"]:
if mid in completed_milestones:
console.print(f" [magenta]✅ {mid} - {milestone_names[mid]}[/magenta]")
else:
# Check if ready
prereqs_met = self._check_milestone_prereqs(mid, completed_modules)
if prereqs_met:
console.print(f" [yellow]🎯 {mid} - {milestone_names[mid]} [Ready!][/yellow]")
else:
console.print(f" [dim]🔒 {mid} - {milestone_names[mid]}[/dim]")
console.print()
return 0
def _get_module_name(self, module_num: int) -> str:
"""Get module name from number."""
module_names = {
1: "Tensor", 2: "Activations", 3: "Layers", 4: "Losses",
5: "Autograd", 6: "Optimizers", 7: "Training", 8: "DataLoader",
9: "Convolutions", 10: "Normalization", 11: "Tokenization",
12: "Embeddings", 13: "Attention", 14: "Transformers",
15: "Profiling", 16: "Quantization", 17: "Compression",
18: "Memoization", 19: "Benchmarking", 20: "Capstone"
}
return module_names.get(module_num, "Unknown")
def _check_milestone_prereqs(self, milestone_id: str, completed_modules: list) -> bool:
"""Check if milestone prerequisites are met."""
prereqs = {
"01": [1],
"02": [1, 2, 3, 4, 5],
"03": [1, 2, 3, 4, 5, 6, 7],
"04": [1, 2, 3, 4, 5, 6, 7, 8, 9],
"05": [1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 14],
"06": [1, 2, 3, 4, 5, 6, 7, 8, 9, 15, 16, 19]
}
required = prereqs.get(milestone_id, [])
return all(mod in completed_modules for mod in required)
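# Example (a sketch of the prerequisite table above): milestone "02"
# (Backpropagation) requires modules 1-5, so:
#   _check_milestone_prereqs("02", [1, 2, 3, 4, 5])  -> True
#   _check_milestone_prereqs("02", [1, 2, 3])        -> False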
def _run_comprehensive_analysis(self) -> int:
"""Run comprehensive system health dashboard."""
console = self.console
start_time = time.time()
console.print("🚀 Starting TinyTorch Comprehensive Status Check...", style="bold green")
# Initialize analyzer
analyzer = TinyTorchStatusAnalyzer()
# Run full analysis
result = analyzer.run_full_analysis()
# Generate comprehensive report
analyzer.generate_comprehensive_report(console)
# Summary
total_time = time.time() - start_time
console.print(f"\n⏱️ Comprehensive analysis completed in {total_time:.1f}s", style="dim")
# Return appropriate exit code
if result['summary']['environment_healthy'] and result['summary']['working_modules'] >= result['summary']['total_modules'] * 0.8:
return 0 # Success
else:
return 1 # Issues found
def _run_standard_status(self, args: Namespace) -> int:
"""Run standard status check mode."""
console = self.console
# Scan modules directory
modules_dir = Path("modules")
if not modules_dir.exists():
console.print(Panel("[red]❌ modules/ directory not found[/red]",
title="Error", border_style="red"))
return 1
# Find all module directories (exclude special directories)
exclude_dirs = {'.quarto', '__pycache__', '.git', '.pytest_cache'}
module_dirs = [d for d in modules_dir.iterdir()
if d.is_dir() and d.name not in exclude_dirs]
if not module_dirs:
console.print(Panel("[yellow]⚠️ No modules found in modules/ directory[/yellow]",
title="Warning", border_style="yellow"))
return 0
console.print(Panel(f"📋 Found {len(module_dirs)} modules in modules directory",
title="Module Status Check", border_style="bright_cyan"))
# Create status table
status_table = Table(title="Module Status Overview", show_header=True, header_style="bold blue")
status_table.add_column("Module", style="bold cyan", width=17)
status_table.add_column("Status", width=12, justify="center")
status_table.add_column("Dev File", width=12, justify="center")
status_table.add_column("Inline Tests", width=12, justify="center")
status_table.add_column("External Tests", width=12, justify="center")
status_table.add_column("README", width=12, justify="center")
if args.metadata:
status_table.add_column("Export Target", width=20, justify="center")
status_table.add_column("Prerequisites", width=15, justify="center")
# Check each module
modules_status = []
for module_dir in sorted(module_dirs):
module_name = module_dir.name
status = self._check_module_status(module_dir, args.test_status)
modules_status.append((module_name, status))
# Add to table
row = [
module_name,
self._format_status(status['overall_status']),
self._format_file_status(status['dev_file'], status.get('export_count', 0)),
self._format_inline_tests(status['inline_test_count']),
self._format_external_tests(status['external_tests'], status.get('external_test_status')),
"" if status['readme'] else ""
]
# Add metadata columns if requested
if args.metadata:
metadata = status.get('metadata', {})
export_target = status.get('export_target', 'unknown')
row.append(export_target)
# Show prerequisites from dependencies
deps = metadata.get('dependencies', {})
prereqs = deps.get('prerequisites', [])
row.append(', '.join(prereqs) if prereqs else 'none')
status_table.add_row(*row)
console.print(status_table)
# Summary with better logic
total_modules = len(modules_status)
# A module is "working" if it has a dev file with implementations
working_modules = sum(1 for _, status in modules_status
if status['dev_file'] and status.get('export_count', 0) > 0)
# A module is "complete" if it has everything
complete_modules = sum(1 for _, status in modules_status
if status['dev_file'] and status['external_tests'] and status['readme'] and status.get('export_count', 0) > 0)
console.print(f"\n📊 Summary:")
console.print(f" 🏗️ Working modules: {working_modules}/{total_modules} (have implementations)")
console.print(f" ✅ Complete modules: {complete_modules}/{total_modules} (have implementations, tests, docs)")
# Helpful commands
console.print(f"\n💡 Quick commands:")
console.print(f" [bold cyan]tito status --comprehensive[/bold cyan] # Full system health dashboard")
console.print(f" [bold cyan]tito module test --all[/bold cyan] # Test all modules")
console.print(f" [bold cyan]tito module test MODULE_NAME[/bold cyan] # Test specific module")
console.print(f" [bold cyan]pytest modules/*/ -k test_[/bold cyan] # Run pytest on inline tests")
console.print(f" [bold cyan]pytest tests/test_*.py[/bold cyan] # Run external tests")
# Detailed view
if args.details:
console.print("\n" + "="*60)
console.print("📁 Detailed Module Structure")
console.print("="*60)
for module_name, status in modules_status:
self._print_module_details(module_name, status)
# Metadata view
if args.metadata:
console.print("\n" + "="*60)
console.print("📊 Module Metadata")
console.print("="*60)
for module_name, status in modules_status:
if status.get('metadata'):
self._print_module_metadata(module_name, status['metadata'])
return 0
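    # Usage sketch for the status command (hypothetical flag spellings, inferred from the
    # Namespace attributes read above; only --comprehensive appears verbatim in the hints):
    #   tito status                   # summary table for all modules
    #   tito status --details         # add per-module file breakdown
    #   tito status --metadata        # add module.yaml metadata columns and sections
    #   tito status --test-status     # also run external pytest files (slower)
    #   tito status --comprehensive   # full system health dashboard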
def _check_module_status(self, module_dir: Path, check_tests: bool = False) -> dict:
"""Check the status of a single module."""
module_name = module_dir.name
# Check for required files
        # Extract the short name from the module directory name for the dev file,
        # e.g. "02_tensor" -> "tensor"
        if module_name.startswith(tuple(f"{i:02d}_" for i in range(100))):
            short_name = module_name[3:]  # Drop the two-digit numeric prefix ("NN_")
        else:
            short_name = module_name
dev_file = module_dir / f"{short_name}.py"
readme_file = module_dir / "README.md"
metadata_file = module_dir / "module.yaml"
        # Check for tests in the main tests directory (reuses short_name from above)
        main_test_file = Path("tests") / f"test_{short_name}.py"
status = {
'dev_file': dev_file.exists(),
'readme': readme_file.exists(),
'metadata_file': metadata_file.exists(),
'external_tests': main_test_file.exists(),
'inline_test_count': 0,
'export_count': 0,
'export_target': 'not_found',
'external_test_status': None,
'overall_status': 'unknown',
'metadata': None
}
# Count inline tests and exports if dev file exists
if dev_file.exists():
status['inline_test_count'] = self._count_test_functions(dev_file)
status['export_count'] = self._count_export_functions(dev_file)
status['export_target'] = self._get_export_target(module_dir)
# Run external tests if requested (slower)
if check_tests and main_test_file.exists():
status['external_test_status'] = self._check_external_tests(main_test_file)
# Determine overall status
status['overall_status'] = self._determine_overall_status(status)
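        # overall_status is one of: 'not_started', 'empty', 'no_tests', 'working', 'unknown'.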
# Load metadata if available
if metadata_file.exists():
try:
with open(metadata_file, 'r') as f:
metadata = yaml.safe_load(f)
status['metadata'] = metadata
except Exception as e:
status['metadata'] = {'error': str(e)}
return status
def _determine_overall_status(self, status: dict) -> str:
"""Determine overall module status based on files and implementation."""
# If no dev file, module is not started
if not status['dev_file']:
return 'not_started'
# If dev file exists but no implementations, module is empty
if status.get('export_count', 0) == 0:
return 'empty'
# If has implementations but no tests, module is in progress
if status.get('inline_test_count', 0) == 0 and not status.get('external_tests', False):
return 'no_tests'
# If has implementations and tests, module is working
if status.get('export_count', 0) > 0 and (status.get('inline_test_count', 0) > 0 or status.get('external_tests', False)):
return 'working'
return 'unknown'
def _check_external_tests(self, test_file: Path) -> str:
"""Check if external tests pass (used only when --test-status is specified)."""
try:
result = subprocess.run(
[sys.executable, "-m", "pytest", str(test_file), "-q", "--tb=no"],
capture_output=True,
text=True,
timeout=30
)
if result.returncode == 0:
return 'passing'
else:
return 'failing'
except (subprocess.TimeoutExpired, FileNotFoundError):
return 'error'
def _format_status(self, status: str) -> str:
"""Format overall module status with appropriate emoji and color."""
status_map = {
            'working': '✅',  # Has implementations and tests
'no_tests': '🚧', # Has implementations but no tests
'empty': '📝', # Has dev file but no implementations
'not_started': '', # No dev file
'unknown': ''
}
return status_map.get(status, '')
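    # e.g. _format_status('no_tests') -> '🚧'; unrecognized statuses fall back to the default.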
def _format_file_status(self, exists: bool, export_count: int) -> str:
"""Format dev file status showing if it has implementations."""
if not exists:
return ""
if export_count == 0:
return "📝" # File exists but empty
return f"✅({export_count})" # File exists with implementations
def _format_inline_tests(self, test_count: int) -> str:
"""Format inline test count."""
if test_count == 0:
return ""
return f"✅({test_count})"
def _format_external_tests(self, exists: bool, test_status: Optional[str] = None) -> str:
"""Format external test status."""
if not exists:
return ""
        if test_status == 'passing':
            return "✅"
elif test_status == 'failing':
return "🔴"
elif test_status == 'error':
return "⚠️"
else:
return "" # Exists but not tested
def _print_module_details(self, module_name: str, status: dict) -> None:
"""Print detailed information about a module."""
console = self.console
# Module header
console.print(f"\n📦 {module_name.upper()}", style="bold cyan")
console.print("-" * 40)
# File structure
files_table = Table(show_header=False, box=None, padding=(0, 2))
files_table.add_column("File", style="dim")
files_table.add_column("Status")
dev_status = "✅ Found" if status['dev_file'] else "❌ Missing"
if status['dev_file']:
dev_status += f" ({status.get('export_count', 0)} exports, {status.get('inline_test_count', 0)} inline tests)"
files_table.add_row(f"{module_name}.py", dev_status)
files_table.add_row("tests/test_*.py", "✅ Found" if status['external_tests'] else "❌ Missing")
files_table.add_row("README.md", "✅ Found" if status['readme'] else "❌ Missing")
console.print(files_table)
# Pytest commands
if status['dev_file'] or status['external_tests']:
console.print("\n[dim]💡 Test commands:[/dim]")
            if status['dev_file']:
                console.print(f"[dim] pytest modules/{module_name}/{short_name}.py -k test_[/dim]")
            if status['external_tests']:
                console.print(f"[dim] pytest tests/test_{short_name}.py -v[/dim]")
def _print_module_metadata(self, module_name: str, metadata: dict) -> None:
"""Print detailed metadata information about a module."""
console = self.console
# Module header
title = metadata.get('title', module_name.title())
console.print(f"\n📦 {title}", style="bold cyan")
console.print("-" * (len(title) + 4))
# Basic info
if metadata.get('description'):
console.print(f"📝 {metadata['description']}")
# Export info (read from dev file - source of truth)
module_path = Path(f"modules/{module_name}")
export_target = self._get_export_target(module_path)
if export_target not in ['not_found', 'no_export', 'read_error']:
console.print(f"📦 Exports to: {export_target}")
# Dependencies
if metadata.get('dependencies'):
deps = metadata['dependencies']
console.print("\n🔗 Dependencies:")
if deps.get('prerequisites'):
console.print(f" Prerequisites: {', '.join(deps['prerequisites'])}")
if deps.get('enables'):
console.print(f" Enables: {', '.join(deps['enables'])}")
# Components
if metadata.get('components'):
console.print("\n🧩 Components:")
for component in metadata['components']:
console.print(f"{component}")
# Files
if metadata.get('files'):
files = metadata['files']
console.print("\n📁 Files:")
if files.get('dev_file'):
console.print(f" • Dev: {files['dev_file']}")
if files.get('test_file'):
console.print(f" • Test: {files['test_file']}")
if files.get('readme'):
console.print(f" • README: {files['readme']}")