Clean up repository

- Remove stale feature branches (kept debugging branch with unmerged work)
- Move test_spatial_core.py to correct directory (tests/09_spatial)
- Remove .tito user state from tracking (config.json, progress.json)
- Delete archived CLI commands (tito/commands/_archived/)
- Move standalone integration tests to tests/integration/
- Remove outdated audit/report markdown files
- Remove old template and deprecated test files
- Simplify .gitignore for .tito/ directory
Vijay Janapa Reddi
2025-12-02 22:03:16 -05:00
parent ca9922224a
commit 3a885601f9
41 changed files with 402 additions and 12760 deletions

.gitignore (vendored): 5 lines changed

@@ -137,9 +137,8 @@ Thumbs.db
 tito-cli.log
 COMMIT_LOG.txt
-# Tito CLI backups and cache
-.tito/backups/
-.tito/cache/
+# Tito CLI user state and cache (local to each user)
+.tito/
 # Downloaded datasets (not source-controlled, too large)
 data/


@@ -1,3 +0,0 @@
{
"logo_theme": "standard"
}


@@ -1,16 +0,0 @@
{
"completed_modules": [
"01_setup",
"02_tensor",
"03_activations",
"04_layers"
],
"completion_dates": {
"01_setup": "2025-09-19T10:21:11.081117",
"02_tensor": "2025-09-19T10:21:34.831693",
"03_activations": "2025-09-19T10:21:50.000000",
"04_layers": "2025-09-19T10:21:55.000000"
},
"achievements": [],
"total_capabilities_unlocked": 0
}


@@ -1,660 +0,0 @@
# Module 05 (Autograd) Integration Test Audit Report
**Date**: 2025-11-25
**Auditor**: Dr. Sarah Rodriguez
**Status**: CRITICAL GAPS IDENTIFIED
---
## Executive Summary
**Current State**: The `test_progressive_integration.py` file is MISPLACED: it tests Module 08 (DataLoader), NOT Module 05 (Autograd). This is a critical gap that leaves Module 05 without progressive integration coverage.
**Test Coverage**: 40% - Missing critical integration tests for gradient flow, in-place operations, memory leaks, and multi-module integration.
**Bug-Catching Priority**: MEDIUM - Existing tests cover specific operations but miss systemic integration issues.
---
## Critical Issues
### 1. WRONG MODULE TESTED (BLOCKER)
**Issue**: `/Users/VJ/GitHub/TinyTorch/tests/05_autograd/test_progressive_integration.py` tests Module 08 (DataLoader), not Module 05 (Autograd)
**Evidence**:
```python
# Lines 1-7 of test_progressive_integration.py
"""
Module 08: Progressive Integration Tests
Tests that Module 08 (DataLoader) works correctly AND that the entire prior stack works.
DEPENDENCY CHAIN: 01_setup → 02_tensor → 03_activations → 04_layers → 05_dense → 06_spatial → 07_attention → 08_dataloader
This is where we enable real data processing for ML systems.
"""
```
**Impact**:
- Module 05 has NO progressive integration tests
- Cannot verify that Autograd works with prior modules (01-04)
- Cannot verify that prior modules remain stable after Autograd
**Action Required**:
1. Move the current file to `tests/08_dataloader/test_progressive_integration.py`
2. Create NEW `tests/05_autograd/test_progressive_integration.py` for Autograd
---
## Current Test Coverage Analysis
### Existing Tests (What We Have)
| Test File | Purpose | Coverage |
|-----------|---------|----------|
| `test_gradient_flow.py` | Tests gradient tracking through operations | ✅ Good |
| `test_batched_matmul_backward.py` | Tests batched matmul gradients | ✅ Excellent |
| `test_dataloader_tensor_integration.py` | DataLoader integration (wrong module!) | ❌ Misplaced |
| `test_progressive_integration.py` | Module 08 tests (WRONG!) | ❌ Wrong module |
### What These Tests Cover
**✅ COVERED:**
1. **Arithmetic gradient flow** (add, sub, mul, div)
2. **Activation gradients** (ReLU, Sigmoid, Softmax, GELU)
3. **Reshape/transpose gradients**
4. **Batched matmul** (attention patterns)
5. **LayerNorm operations** (sqrt, mean)
**❌ MISSING:**
1. **Integration with Module 01 (Tensor)** - No tests that Tensor operations work
2. **Integration with Module 02 (Activations)** - Limited activation gradient tests
3. **Integration with Module 03 (Layers)** - No Dense layer gradient tests
4. **Integration with Module 04 (Losses)** - No loss gradient tests
5. **In-place operation bugs** - Critical for catching graph breaking
6. **Memory leak detection** - Computational graph accumulation
7. **Gradient accumulation bugs** - Shared parameters
8. **Multi-layer backprop** - End-to-end gradient flow
9. **Prior module stability** - Regression testing
---
## Critical Integration Points Analysis
### Integration Point 1: Autograd + Module 01 (Tensor)
**What Should Be Tested**:
- All Tensor operations preserve `requires_grad`
- Tensor operations create `_grad_fn` correctly
- `backward()` computes correct gradients for all operations
- Broadcasting during backward works correctly
- Scalar tensors can call `backward()` without arguments
**Current Coverage**: 60%
- ✅ Basic operations tested in `test_gradient_flow.py`
- ❌ Missing: Broadcasting edge cases
- ❌ Missing: Scalar tensor backward (sketched after the example below)
- ❌ Missing: Inplace operation detection
**Missing Tests**:
```python
# Test: Broadcasting gradient accumulation
def test_broadcasting_backward():
"""Test gradients accumulate correctly with broadcasting."""
bias = Tensor([1.0], requires_grad=True) # Shape (1,)
x = Tensor([[1, 2], [3, 4]], requires_grad=True) # Shape (2, 2)
y = x + bias # Broadcasts to (2, 2)
loss = y.sum()
loss.backward()
# bias.grad should be summed over all broadcast dimensions
assert bias.grad.shape == (1,), "Bias gradient shape wrong"
assert np.allclose(bias.grad, [4.0]), "Broadcasting backward failed"
```
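A minimal sketch of the missing scalar-tensor backward check, assuming the same `Tensor` API used in the example above; the import path mirrors the other snippets and is an assumption:
```python
# Test (sketch): Scalar tensor backward without an explicit gradient argument
import numpy as np
from tinytorch.core.tensor import Tensor  # assumed import path, as in the other snippets

def test_scalar_tensor_backward():
    """A single-element loss should support backward() with no argument."""
    x = Tensor([2.0, 3.0], requires_grad=True)
    loss = (x * x).sum()  # scalar result
    loss.backward()       # no gradient argument needed for a scalar
    # d(sum(x^2))/dx = 2x
    assert x.grad is not None, "Scalar backward produced no gradient"
    assert np.allclose(x.grad, [4.0, 6.0]), "Scalar backward gradient incorrect"
```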
### Integration Point 2: Autograd + Module 02 (Activations)
**What Should Be Tested**:
- ReLU, Sigmoid, Softmax, GELU all preserve gradient tracking
- Activation gradients compose correctly in chains
- Dead ReLU neurons (zero gradient) handled correctly
- Softmax numerical stability during backward
**Current Coverage**: 70%
- ✅ Basic activation gradients tested
- ✅ GELU gradient flow tested
- ❌ Missing: Activation chaining gradients
- ❌ Missing: Dead ReLU detection
**Missing Tests**:
```python
# Test: Multi-activation gradient chain
def test_activation_chain_gradients():
"""Test gradients flow through chained activations."""
x = Tensor([1.0, -1.0, 2.0], requires_grad=True)
relu = ReLU()
sigmoid = Sigmoid()
# Chain: x -> ReLU -> Sigmoid -> loss
h = relu(x)
y = sigmoid(h)
loss = y.sum()
loss.backward()
# x.grad should reflect both ReLU and Sigmoid derivatives
assert x.grad is not None, "Gradient didn't flow through chain"
# Dead neuron at x=-1 should have zero gradient
assert np.isclose(x.grad[1], 0.0), "Dead ReLU gradient not zero"
```
### Integration Point 3: Autograd + Module 03 (Layers)
**What Should Be Tested**:
- Dense layer forward preserves `requires_grad`
- Dense layer backward computes weight and bias gradients
- Multi-layer networks backpropagate correctly
- Parameter sharing accumulates gradients
**Current Coverage**: 0% ❌
- **COMPLETELY MISSING**: No tests for Dense layer gradients
**Missing Tests**:
```python
# Test: Dense layer gradient computation
def test_dense_layer_gradients():
"""Test Dense layer computes weight and bias gradients."""
from tinytorch.core.layers import Dense
layer = Dense(3, 2)
x = Tensor([[1, 2, 3]], requires_grad=True)
# Forward pass
y = layer(x)
loss = y.sum()
# Backward pass
loss.backward()
# Check all gradients exist
assert layer.weight.grad is not None, "Weight gradient missing"
assert layer.bias.grad is not None, "Bias gradient missing"
assert x.grad is not None, "Input gradient missing"
# Check gradient shapes
assert layer.weight.grad.shape == layer.weight.shape
assert layer.bias.grad.shape == layer.bias.shape
```
### Integration Point 4: Autograd + Module 04 (Losses)
**What Should Be Tested**:
- MSE loss computes correct gradients
- CrossEntropy loss computes correct gradients
- BCE loss computes correct gradients
- Loss gradients match hand-calculated values
**Current Coverage**: 0% ❌
- **COMPLETELY MISSING**: No tests for loss function gradients
**Missing Tests** (a cross-entropy sketch follows the MSE example below):
```python
# Test: MSE loss gradient
def test_mse_loss_gradient():
"""Test MSE loss computes correct gradients."""
from tinytorch.core.losses import MSELoss
predictions = Tensor([1.0, 2.0, 3.0], requires_grad=True)
targets = Tensor([1.5, 2.5, 2.5])
mse = MSELoss()
loss = mse(predictions, targets)
loss.backward()
# MSE gradient: 2 * (pred - target) / N
expected_grad = 2 * (predictions.data - targets.data) / 3
assert np.allclose(predictions.grad, expected_grad), "MSE gradient incorrect"
```
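A hedged sketch of the missing cross-entropy check; the `CrossEntropyLoss` call signature (raw logits plus an integer class index) is assumed, and the reference gradient is the standard `softmax(logits) - one_hot(target)` averaged over the batch:
```python
# Test (sketch): CrossEntropy loss gradient
import numpy as np
from tinytorch.core.tensor import Tensor
from tinytorch.core.losses import CrossEntropyLoss  # assumed, as listed in "What Should Be Tested"

def test_crossentropy_loss_gradient():
    """Cross-entropy gradient w.r.t. logits should equal softmax(logits) - one_hot(target)."""
    logits = Tensor([[2.0, 1.0, 0.1]], requires_grad=True)
    target = Tensor([0])  # true class index
    loss = CrossEntropyLoss()(logits, target)
    loss.backward()
    # Hand-computed reference (batch size N=1)
    shifted = logits.data - logits.data.max(axis=1, keepdims=True)
    softmax = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    expected = softmax.copy()
    expected[0, 0] -= 1.0
    assert np.allclose(logits.grad, expected, atol=1e-5), "CrossEntropy gradient incorrect"
```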
### Integration Point 5: In-Place Operations
**What Should Be Tested**:
- In-place ops break computation graph (expected behavior)
- In-place ops raise warnings or errors
- Students see clear error messages
**Current Coverage**: 0% ❌
- **COMPLETELY MISSING**: No in-place operation tests
**Missing Tests**:
```python
# Test: In-place operation detection
def test_inplace_operations_break_graph():
"""Test that in-place operations are detected and warned."""
x = Tensor([1, 2, 3], requires_grad=True)
y = x * 2
# In-place modification (if implemented) should break graph
# This test ensures students understand the danger
try:
x.data[0] = 999 # Direct modification
y.backward(Tensor([1, 1, 1]))
# If we get here, backward ran on silently modified data - BAD!
assert False, "In-place modification should have been detected"
except Exception:
# Expected: Some warning or error about in-place ops
pass
```
### Integration Point 6: Memory Leaks (Computational Graph)
**What Should Be Tested**:
- Computation graphs don't accumulate across iterations
- `zero_grad()` prevents gradient accumulation
- Large graphs can be garbage collected
**Current Coverage**: 0% ❌
- **COMPLETELY MISSING**: No memory leak tests
**Missing Tests**:
```python
# Test: Gradient accumulation prevention
def test_zero_grad_prevents_accumulation():
"""Test zero_grad() prevents gradient accumulation."""
x = Tensor([1.0], requires_grad=True)
# First backward pass
y1 = x * 2
y1.backward()
first_grad = x.grad.copy()
# Second backward WITHOUT zero_grad - accumulates
y2 = x * 3
y2.backward()
assert np.allclose(x.grad, first_grad + 3.0), "Gradients should accumulate"
# Third backward WITH zero_grad - doesn't accumulate
x.zero_grad()
y3 = x * 4
y3.backward()
assert np.allclose(x.grad, 4.0), "zero_grad() should reset gradients"
```
### Integration Point 7: Gradient Accumulation (Parameter Sharing)
**What Should Be Tested**:
- Shared parameters accumulate gradients correctly
- Embedding layers with repeated indices accumulate gradients
- Multi-path graphs accumulate gradients
**Current Coverage**: 0% ❌
- **COMPLETELY MISSING**: No gradient accumulation tests
**Missing Tests**:
```python
# Test: Parameter sharing gradient accumulation
def test_shared_parameter_gradient_accumulation():
"""Test shared parameters accumulate gradients from multiple uses."""
weight = Tensor([2.0], requires_grad=True)
# Use same weight twice
x1 = Tensor([1.0])
x2 = Tensor([3.0])
y1 = weight * x1 # First use
y2 = weight * x2 # Second use
loss = y1.sum() + y2.sum()
loss.backward()
# Gradient should accumulate: dy1/dw + dy2/dw = 1.0 + 3.0 = 4.0
assert np.allclose(weight.grad, 4.0), "Shared parameter gradients didn't accumulate"
```
---
## Missing Progressive Integration Tests
### Test Class 1: Prior Stack Stability (Modules 01-04)
**Purpose**: Verify Autograd didn't break previous modules
**Missing Tests**:
```python
class TestPriorStackStillWorking:
"""Verify Modules 01-04 still work after Autograd."""
def test_tensor_operations_stable(self):
"""Tensor operations work without requires_grad."""
from tinytorch.core.tensor import Tensor
# Should work exactly as before (Module 01)
x = Tensor([1, 2, 3])
y = Tensor([4, 5, 6])
z = x + y
assert np.array_equal(z.data, [5, 7, 9])
assert z.grad is None # No gradient tracking
def test_activations_stable(self):
"""Activations work without requires_grad."""
from tinytorch.core.activations import ReLU
from tinytorch.core.tensor import Tensor
relu = ReLU()
x = Tensor([-1, 0, 1])
y = relu(x)
assert np.array_equal(y.data, [0, 0, 1])
assert y.grad is None # No gradient tracking
```
### Test Class 2: Autograd Core Functionality
**Purpose**: Test Autograd's core capabilities
**Missing Tests**:
```python
class TestModule05AutogradCore:
"""Test Module 05 (Autograd) core functionality."""
def test_simple_backward_pass(self):
"""Test simple computational graph backward pass."""
enable_autograd()
x = Tensor([2.0], requires_grad=True)
y = x * 3
loss = y.sum()
loss.backward()
assert x.grad is not None
assert np.allclose(x.grad, [3.0])
def test_multi_step_backward(self):
"""Test multi-step computation graph."""
enable_autograd()
x = Tensor([2.0], requires_grad=True)
y = x * 3 # y = 6
z = y + 1 # z = 7
w = z * 2 # w = 14
w.backward()
# dw/dx = dw/dz * dz/dy * dy/dx = 2 * 1 * 3 = 6
assert np.allclose(x.grad, [6.0])
```
### Test Class 3: Full Stack Integration
**Purpose**: Test complete pipeline (Modules 01-05)
**Missing Tests**:
```python
class TestProgressiveStackIntegration:
"""Test complete stack (01→05) works together."""
def test_neural_network_backward(self):
"""Test complete neural network with backprop."""
enable_autograd()
from tinytorch.core.layers import Dense
from tinytorch.core.activations import ReLU
from tinytorch.core.losses import MSELoss
# Build network
layer1 = Dense(3, 4)
relu = ReLU()
layer2 = Dense(4, 2)
# Forward pass
x = Tensor([[1, 2, 3]], requires_grad=True)
h = relu(layer1(x))
y = layer2(h)
# Loss
target = Tensor([[1, 0]])
loss_fn = MSELoss()
loss = loss_fn(y, target)
# Backward pass
loss.backward()
# All parameters should have gradients
assert layer1.weight.grad is not None
assert layer1.bias.grad is not None
assert layer2.weight.grad is not None
assert layer2.bias.grad is not None
assert x.grad is not None
```
---
## Bug-Catching Priority Matrix
| Category | Priority | Coverage | Missing Tests |
|----------|----------|----------|---------------|
| **Gradient Correctness** | 🔴 CRITICAL | 70% | Numerical gradient checks |
| **In-Place Operations** | 🔴 CRITICAL | 0% | Graph breaking detection |
| **Memory Leaks** | 🟠 HIGH | 0% | Graph accumulation tests |
| **Gradient Accumulation** | 🟠 HIGH | 0% | Shared parameter tests |
| **Module Integration** | 🟠 HIGH | 30% | Multi-module pipelines |
| **Prior Module Stability** | 🟡 MEDIUM | 0% | Regression tests |
| **Broadcasting** | 🟡 MEDIUM | 40% | Edge case tests |
| **Numerical Stability** | 🟢 LOW | 50% | Extreme value tests |
---
## Recommendations
### Immediate Actions (Week 1)
1. **Fix File Misplacement** (1 hour)
- Move `test_progressive_integration.py` to `tests/08_dataloader/`
- Create new `tests/05_autograd/test_progressive_integration.py`
2. **Add Critical Missing Tests** (4 hours)
- Dense layer gradient tests
- Loss function gradient tests
- In-place operation detection
- Memory leak tests
3. **Add Prior Module Stability Tests** (2 hours)
- Test Modules 01-04 still work
- Test gradients don't affect non-gradient mode
### Short-Term Actions (Week 2-3)
4. **Add Integration Tests** (6 hours)
- Full neural network backward pass
- Multi-layer gradient flow
- Shared parameter accumulation
5. **Add Edge Case Tests** (3 hours)
- Broadcasting edge cases
- Scalar tensor backward
- Empty gradient handling
### Long-Term Actions (Month 1)
6. **Add Numerical Gradient Checks** (8 hours)
- Finite difference verification for all operations (see the sketch after this list)
- Ensures analytical gradients are correct
7. **Add Performance Tests** (4 hours)
- Large graph memory usage
- Gradient computation speed
- Graph building overhead
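As referenced in item 6 above, a minimal sketch of a finite-difference gradient check; it assumes only the `Tensor` API used throughout this report and a scalar-valued function of one NumPy array:
```python
# Sketch: central-difference verification of an analytical gradient
import numpy as np
from tinytorch.core.tensor import Tensor  # assumed import path

def numerical_gradient(f, x_data, eps=1e-5):
    """Central-difference estimate of df/dx for a scalar-valued f(ndarray) -> float."""
    grad = np.zeros_like(x_data, dtype=np.float64)
    it = np.nditer(x_data, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        plus, minus = x_data.copy(), x_data.copy()
        plus[idx] += eps
        minus[idx] -= eps
        grad[idx] = (f(plus) - f(minus)) / (2 * eps)
        it.iternext()
    return grad

def test_mul_sum_matches_numerical_gradient():
    """Analytical gradient of sum(x * x) should match the finite-difference estimate."""
    x_data = np.array([1.0, -2.0, 0.5])
    x = Tensor(x_data, requires_grad=True)
    loss = (x * x).sum()
    loss.backward()
    numeric = numerical_gradient(lambda d: float((d * d).sum()), x_data)
    assert np.allclose(x.grad, numeric, atol=1e-4), "Analytical gradient disagrees with finite differences"
```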
---
## Test Template for Module 05
```python
"""
Module 05: Progressive Integration Tests
Tests that Module 05 (Autograd) works correctly AND that all previous modules still work.
DEPENDENCY CHAIN: 01_tensor → 02_activations → 03_layers → 04_losses → 05_autograd
This is where automatic differentiation enables training.
"""
import numpy as np
import sys
from pathlib import Path
# Add project root to path
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
class TestPriorStackStillWorking:
"""Verify Modules 01-04 functionality is still intact."""
def test_tensor_operations_stable(self):
"""Ensure tensor operations work without gradients."""
# Test implementation
pass
def test_activations_stable(self):
"""Ensure activations work without gradients."""
# Test implementation
pass
def test_layers_stable(self):
"""Ensure layers work without gradients."""
# Test implementation
pass
class TestModule05AutogradCore:
"""Test Module 05 (Autograd) core functionality."""
def test_enable_autograd(self):
"""Test autograd can be enabled."""
# Test implementation
pass
def test_simple_backward(self):
"""Test simple backward pass."""
# Test implementation
pass
def test_requires_grad_tracking(self):
"""Test requires_grad flag works."""
# Test implementation
pass
class TestAutogradTensorIntegration:
"""Test Autograd works with all Tensor operations (Module 01)."""
def test_arithmetic_gradients(self):
"""Test gradients for +, -, *, /."""
# Test implementation
pass
def test_matmul_gradients(self):
"""Test gradients for matrix multiplication."""
# Test implementation
pass
def test_broadcasting_gradients(self):
"""Test broadcasting during backward."""
# Test implementation
pass
class TestAutogradActivationIntegration:
"""Test Autograd works with Activations (Module 02)."""
def test_relu_gradients(self):
"""Test ReLU gradients."""
# Test implementation
pass
def test_sigmoid_gradients(self):
"""Test Sigmoid gradients."""
# Test implementation
pass
def test_activation_chain_gradients(self):
"""Test chained activation gradients."""
# Test implementation
pass
class TestAutogradLayerIntegration:
"""Test Autograd works with Layers (Module 03)."""
def test_dense_layer_gradients(self):
"""Test Dense layer parameter gradients."""
# Test implementation
pass
def test_multi_layer_gradients(self):
"""Test multi-layer network gradients."""
# Test implementation
pass
class TestAutogradLossIntegration:
"""Test Autograd works with Loss functions (Module 04)."""
def test_mse_loss_gradients(self):
"""Test MSE loss gradients."""
# Test implementation
pass
def test_crossentropy_loss_gradients(self):
"""Test CrossEntropy loss gradients."""
# Test implementation
pass
class TestProgressiveStackIntegration:
"""Test complete stack (01→05) works together."""
def test_end_to_end_training_step(self):
"""Test complete forward + backward pass."""
# Test implementation
pass
def test_gradient_accumulation(self):
"""Test gradients accumulate correctly."""
# Test implementation
pass
class TestAutogradBugPrevention:
"""Tests that catch common autograd bugs."""
def test_inplace_operations(self):
"""Test in-place operations are handled correctly."""
# Test implementation
pass
def test_memory_leaks(self):
"""Test computation graphs don't leak memory."""
# Test implementation
pass
def test_zero_grad_works(self):
"""Test zero_grad() prevents accumulation."""
# Test implementation
pass
```
---
## Conclusion
**Overall Assessment**: Module 05 integration tests are **INCOMPLETE** and **MISPLACED**.
**Risk Level**: 🔴 **HIGH** - Missing critical tests could allow gradient bugs to slip into production.
**Recommended Action**: Implement missing tests IMMEDIATELY before students encounter gradient bugs.
**Estimated Effort**: 20-25 hours to achieve 90% coverage.
**Student Impact**: Without these tests, students will encounter confusing gradient bugs that are hard to debug. Proper integration tests will catch these issues early.
---
**Report Generated**: 2025-11-25
**Next Review**: After implementing critical missing tests


@@ -1,401 +0,0 @@
"""
Module 08: Progressive Integration Tests
Tests that Module 08 (DataLoader) works correctly AND that the entire prior stack works.
DEPENDENCY CHAIN: 01_setup → 02_tensor → 03_activations → 04_layers → 05_dense → 06_spatial → 07_attention → 08_dataloader
This is where we enable real data processing for ML systems.
"""
import numpy as np
import sys
from pathlib import Path
# Add project root to path
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
class TestPriorStackStillWorking:
"""Quick regression checks that prior modules (01→07) still work."""
def test_foundation_stack_stable(self):
"""Verify foundation stack (01→05) remains stable."""
# Environment (Module 01)
assert sys.version_info >= (3, 8), "Foundation broken: Python version"
# Core functionality should work
try:
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Dense
# Should still be able to build networks
layer = Dense(10, 5)
x = Tensor(np.random.randn(4, 10))
output = layer(x)
assert output.shape == (4, 5), "Foundation broken: Neural network"
except ImportError:
assert True, "Foundation not implemented yet"
def test_advanced_stack_stable(self):
"""Verify advanced modules (06→07) still work."""
try:
from tinytorch.core.spatial import Conv2D
from tinytorch.core.attention import MultiHeadAttention
# Spatial and attention should work
conv = Conv2D(in_channels=3, out_channels=16, kernel_size=3)
attention = MultiHeadAttention(embed_dim=64, num_heads=8)
assert hasattr(conv, 'forward'), "Advanced stack broken: Spatial"
assert hasattr(attention, 'forward'), "Advanced stack broken: Attention"
except ImportError:
assert True, "Advanced stack not implemented yet"
class TestModule08DataLoaderCore:
"""Test Module 08 (DataLoader) core functionality."""
def test_dataset_creation(self):
"""Test basic dataset creation works."""
try:
from tinytorch.core.data import Dataset
# Create simple dataset
class SimpleDataset(Dataset):
def __init__(self, size=100):
self.size = size
self.data = np.random.randn(size, 10)
self.targets = np.random.randint(0, 3, size)
def __len__(self):
return self.size
def __getitem__(self, idx):
return self.data[idx], self.targets[idx]
dataset = SimpleDataset(50)
assert len(dataset) == 50, "Dataset length broken"
# Test data access
sample, target = dataset[0]
assert sample.shape == (10,), "Dataset sample shape broken"
assert isinstance(target, (int, np.integer)), "Dataset target type broken"
except ImportError:
assert True, "Dataset not implemented yet"
def test_dataloader_creation(self):
"""Test DataLoader creation and batching."""
try:
from tinytorch.core.data import DataLoader, Dataset
from tinytorch.core.tensor import Tensor
# Simple dataset for testing
class TestDataset(Dataset):
def __init__(self):
self.data = np.random.randn(20, 5)
self.targets = np.random.randint(0, 2, 20)
def __len__(self):
return 20
def __getitem__(self, idx):
return Tensor(self.data[idx]), self.targets[idx]
dataset = TestDataset()
dataloader = DataLoader(dataset, batch_size=4, shuffle=True)
# Test batching
for batch_x, batch_y in dataloader:
assert batch_x.shape == (4, 5), "DataLoader batch shape broken"
assert len(batch_y) == 4, "DataLoader target batch broken"
break # Just test first batch
except ImportError:
assert True, "DataLoader not implemented yet"
def test_real_dataset_support(self):
"""Test support for real datasets like CIFAR-10."""
try:
from tinytorch.core.data import CIFAR10Dataset
# Note: This might download data, so we'll just test instantiation
# In real usage, students would download CIFAR-10
try:
dataset = CIFAR10Dataset(root='./data', train=True, download=False)
# If dataset exists, test basic functionality
if len(dataset) > 0:
sample, target = dataset[0]
assert len(sample.shape) >= 2, "CIFAR-10 sample shape invalid"
assert isinstance(target, (int, np.integer)), "CIFAR-10 target invalid"
except (FileNotFoundError, RuntimeError):
# Data not downloaded, which is fine for testing
assert True, "CIFAR-10 data not available (expected)"
except ImportError:
assert True, "Real dataset support not implemented yet"
class TestProgressiveStackIntegration:
"""Test that the complete stack (01→08) works together."""
def test_complete_training_pipeline(self):
"""Test complete ML pipeline: data → model → training."""
try:
from tinytorch.core.data import DataLoader, Dataset
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Dense
from tinytorch.core.activations import ReLU, Softmax
# Create dataset
class MLDataset(Dataset):
def __init__(self):
self.data = np.random.randn(40, 10)
self.targets = np.random.randint(0, 3, 40)
def __len__(self):
return 40
def __getitem__(self, idx):
return Tensor(self.data[idx]), self.targets[idx]
# Create data pipeline
dataset = MLDataset()
dataloader = DataLoader(dataset, batch_size=8, shuffle=True)
# Create model using prior modules
layer1 = Dense(10, 16)
layer2 = Dense(16, 3)
relu = ReLU()
softmax = Softmax()
# Test training loop structure
for batch_x, batch_y in dataloader:
# Forward pass through complete pipeline
h = relu(layer1(batch_x))
logits = layer2(h)
predictions = softmax(logits)
assert predictions.shape == (8, 3), "Complete pipeline broken"
# Test one batch
break
except ImportError:
assert True, "Complete training pipeline not ready yet"
def test_cnn_data_pipeline(self):
"""Test CNN pipeline with spatial data."""
try:
from tinytorch.core.data import DataLoader, Dataset
from tinytorch.core.spatial import Conv2D, MaxPool2D
from tinytorch.core.layers import Dense
from tinytorch.core.tensor import Tensor
# Image dataset
class ImageDataset(Dataset):
def __init__(self):
# 32x32 RGB images
self.data = np.random.randn(20, 3, 32, 32)
self.targets = np.random.randint(0, 5, 20)
def __len__(self):
return 20
def __getitem__(self, idx):
return Tensor(self.data[idx]), self.targets[idx]
dataset = ImageDataset()
dataloader = DataLoader(dataset, batch_size=4)
# CNN components
conv1 = Conv2D(in_channels=3, out_channels=16, kernel_size=3)
pool = MaxPool2D(kernel_size=2)
fc = Dense(16 * 15 * 15, 5) # Approximate after conv/pool
# Test CNN pipeline
for batch_x, batch_y in dataloader:
assert batch_x.shape == (4, 3, 32, 32), "Image batch shape broken"
# Simplified CNN forward (shape checking)
if hasattr(conv1, '__call__'):
conv_out = conv1(batch_x)
# Check reasonable conv output shape
assert len(conv_out.shape) == 4, "Conv output dimensionality broken"
break
except ImportError:
assert True, "CNN data pipeline not ready yet"
class TestRealWorldDataCapability:
"""Test capability to handle real-world datasets."""
def test_data_preprocessing_pipeline(self):
"""Test data preprocessing and augmentation."""
try:
from tinytorch.core.data import transforms
from tinytorch.core.tensor import Tensor
# Basic transforms
if hasattr(transforms, 'Normalize'):
normalize = transforms.Normalize(mean=[0.5], std=[0.5])
# Test data
data = Tensor(np.random.randn(3, 32, 32))
normalized = normalize(data)
assert normalized.shape == data.shape, "Normalization broken"
if hasattr(transforms, 'RandomCrop'):
crop = transforms.RandomCrop(size=28)
data = Tensor(np.random.randn(3, 32, 32))
cropped = crop(data)
assert cropped.shape[-2:] == (28, 28), "Random crop broken"
except ImportError:
assert True, "Data preprocessing not implemented yet"
def test_memory_efficient_loading(self):
"""Test memory efficient data loading."""
try:
from tinytorch.core.data import DataLoader, Dataset
# Large dataset simulation
class LargeDataset(Dataset):
def __init__(self, size=1000):
self.size = size
# Don't load all data at once - simulate lazy loading
def __len__(self):
return self.size
def __getitem__(self, idx):
# Simulate loading data on-demand
return np.random.randn(100), idx % 10
dataset = LargeDataset(1000)
dataloader = DataLoader(dataset, batch_size=32, shuffle=True)
# Should be able to iterate without loading all data
batch_count = 0
for batch_x, batch_y in dataloader:
batch_count += 1
if batch_count >= 3: # Test a few batches
break
assert batch_count == 3, "Memory efficient loading broken"
except ImportError:
assert True, "Memory efficient loading not ready yet"
def test_parallel_data_loading(self):
"""Test parallel/multi-threaded data loading."""
try:
from tinytorch.core.data import DataLoader, Dataset
class ParallelDataset(Dataset):
def __init__(self):
self.data = np.random.randn(100, 50)
def __len__(self):
return 100
def __getitem__(self, idx):
# Simulate some processing time
return self.data[idx], idx % 5
dataset = ParallelDataset()
# Test with num_workers if supported
if 'num_workers' in DataLoader.__init__.__code__.co_varnames:
dataloader = DataLoader(dataset, batch_size=16, num_workers=2)
else:
dataloader = DataLoader(dataset, batch_size=16)
# Should work regardless of parallel support
for batch_x, batch_y in dataloader:
assert batch_x.shape == (16, 50), "Parallel loading broken"
break
except ImportError:
assert True, "Parallel data loading not ready yet"
class TestRegressionPrevention:
"""Ensure previous modules still work after Module 08 development."""
def test_no_foundation_regression(self):
"""Verify foundation stack (01→05) unchanged."""
# Core functionality should remain stable
assert sys.version_info.major >= 3, "Foundation: Python detection broken"
# Tensor operations should still work
try:
from tinytorch.core.tensor import Tensor
t = Tensor([1, 2, 3])
assert t.shape == (3,), "Foundation regression: Tensor broken"
except ImportError:
import numpy as np
arr = np.array([1, 2, 3])
assert arr.shape == (3,), "Foundation regression: Numpy broken"
def test_no_advanced_regression(self):
"""Verify advanced modules (06→07) unchanged."""
try:
from tinytorch.core.spatial import Conv2D
from tinytorch.core.attention import MultiHeadAttention
# Advanced operations should still work
conv = Conv2D(in_channels=1, out_channels=4, kernel_size=3)
attention = MultiHeadAttention(embed_dim=32, num_heads=4)
assert hasattr(conv, 'forward'), "Advanced regression: Spatial broken"
assert hasattr(attention, 'forward'), "Advanced regression: Attention broken"
except ImportError:
# If not implemented, basic functionality should work
import numpy as np
assert np.random is not None, "Advanced regression: Random broken"
def test_progressive_stability(self):
"""Test the progressive stack is stable through data loading."""
# Stack should be stable through: Setup → ... → Attention → DataLoader
# Setup level
import numpy as np
assert np is not None, "Setup level broken"
# Foundation level (if available)
try:
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Dense
# Neural networks should still work
layer = Dense(5, 3)
x = Tensor(np.random.randn(2, 5))
output = layer(x)
assert output.shape == (2, 3), "Foundation level broken"
except ImportError:
pass # Not implemented yet
# Data level (if available)
try:
from tinytorch.core.data import Dataset
class TestDataset(Dataset):
def __len__(self):
return 10
def __getitem__(self, idx):
return idx, idx * 2
dataset = TestDataset()
assert len(dataset) == 10, "Data level broken"
except ImportError:
pass # Not implemented yet


@@ -1,515 +0,0 @@
"""
Module 07 Training - Critical Integration Tests Template
This file contains the TOP 3 CRITICAL tests that MUST be implemented immediately
to establish basic confidence that Module 07 (Training) works correctly.
These tests catch the most common and severe bugs in training systems.
PRIORITY: P0 - IMPLEMENT IMMEDIATELY
ESTIMATED TIME: 2-3 hours
BUG-CATCHING VALUE: CRITICAL
"""
import pytest
import numpy as np
import sys
from pathlib import Path
# Add project root to path
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
# Import from TinyTorch
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear
from tinytorch.core.activations import ReLU
from tinytorch.core.losses import MSELoss, CrossEntropyLoss
from tinytorch.core.optimizers import SGD, AdamW
from tinytorch.core.training import Trainer, CosineSchedule, clip_grad_norm
# =============================================================================
# CRITICAL TEST 1: Missing zero_grad() Detection
# =============================================================================
# BUG-CATCHING VALUE: CRITICAL
# COMMON STUDENT MISTAKE: Forgetting optimizer.zero_grad()
# SYMPTOM: Training appears to run but gradients accumulate incorrectly
# =============================================================================
class TestMissingZeroGrad:
"""Test that missing zero_grad() is caught and causes visible failure."""
def test_zero_grad_required_for_correct_training(self):
"""
Test that zero_grad() is essential for correct gradient computation.
This test validates that:
1. Without zero_grad(), gradients accumulate across batches
2. Accumulated gradients cause incorrect parameter updates
3. Training with accumulated gradients behaves differently than correct training
"""
# Create simple linear model: y = Wx + b
layer_correct = Linear(1, 1)
layer_broken = Linear(1, 1)
# Make weights identical to start
layer_broken.weights.data = layer_correct.weights.data.copy()
if hasattr(layer_correct, 'bias') and layer_correct.bias is not None:
layer_broken.bias.data = layer_correct.bias.data.copy()
# Create optimizers
optimizer_correct = SGD(layer_correct.parameters(), lr=0.1)
optimizer_broken = SGD(layer_broken.parameters(), lr=0.1)
loss_fn = MSELoss()
# Training data: 5 identical samples
x_data = Tensor([[1.0]])
y_data = Tensor([[2.0]])
# === CORRECT TRAINING (with zero_grad) ===
correct_grad_norms = []
for step in range(5):
optimizer_correct.zero_grad() # ✅ CRITICAL: Clear gradients
output = layer_correct.forward(x_data)
loss = loss_fn.forward(output, y_data)
loss.backward()
# Record gradient norm
grad_norm = np.linalg.norm(layer_correct.weights.grad.data)
correct_grad_norms.append(grad_norm)
optimizer_correct.step()
# === BROKEN TRAINING (without zero_grad) ===
broken_grad_norms = []
for step in range(5):
# ❌ BUG: Missing optimizer_broken.zero_grad()
output = layer_broken.forward(x_data)
loss = loss_fn.forward(output, y_data)
loss.backward()
# Record gradient norm (should accumulate!)
grad_norm = np.linalg.norm(layer_broken.weights.grad.data)
broken_grad_norms.append(grad_norm)
optimizer_broken.step()
# === VALIDATION ===
print("\n🔬 Testing zero_grad() requirement:")
print(f"Correct gradient norms (with zero_grad): {correct_grad_norms}")
print(f"Broken gradient norms (without zero_grad): {broken_grad_norms}")
# Test 1: Gradients should accumulate without zero_grad()
assert broken_grad_norms[-1] > broken_grad_norms[0] * 2.0, \
"Gradients should accumulate when zero_grad() is missing"
# Test 2: Correct gradients should be relatively stable
correct_variation = max(correct_grad_norms) / (min(correct_grad_norms) + 1e-8)
assert correct_variation < 5.0, \
"Correct gradients shouldn't grow excessively"
# Test 3: Broken gradients grow much larger than correct ones
assert broken_grad_norms[-1] > correct_grad_norms[-1] * 2.0, \
"Missing zero_grad() should cause noticeably larger gradients"
print("✅ zero_grad() requirement correctly enforced!")
def test_trainer_calls_zero_grad(self):
"""
Test that Trainer class properly calls zero_grad() during training.
This validates the Trainer implementation includes the critical zero_grad() call.
"""
# Create simple model
class SimpleModel:
def __init__(self):
self.layer = Linear(2, 1)
self.training = True
def forward(self, x):
return self.layer.forward(x)
def parameters(self):
return self.layer.parameters()
model = SimpleModel()
optimizer = SGD(model.parameters(), lr=0.01)
loss_fn = MSELoss()
trainer = Trainer(model, optimizer, loss_fn)
# Create simple dataset
class SimpleDataset:
def __iter__(self):
for _ in range(3):
x = Tensor(np.random.randn(2, 2))
y = Tensor(np.random.randn(2, 1))
yield x, y
# Train for 2 epochs
for epoch in range(2):
trainer.train_epoch(SimpleDataset())
# After training, gradients should be zeroed (from last zero_grad() call)
# OR they should exist from last backward (depends on implementation)
# Key test: Training should have called zero_grad() internally
# (This is validated by training not diverging)
print("✅ Trainer correctly manages gradient clearing!")
# =============================================================================
# CRITICAL TEST 2: Loss Convergence Validation
# =============================================================================
# BUG-CATCHING VALUE: CRITICAL
# PURPOSE: Validate entire training pipeline produces learning
# SYMPTOM: Training runs but model doesn't improve
# =============================================================================
class TestLossConvergence:
"""Test that training actually produces learning on simple problems."""
def test_linear_regression_convergence(self):
"""
Test training converges on simple linear regression problem.
Problem: Learn y = 2x + 1
Model: Linear(1, 1) with weights and bias
Success criteria: Loss decreases, learned weights ≈ [2.0], bias ≈ [1.0]
"""
# Create model
class LinearModel:
def __init__(self):
self.layer = Linear(1, 1)
self.training = True
def forward(self, x):
return self.layer.forward(x)
def parameters(self):
return self.layer.parameters()
model = LinearModel()
optimizer = SGD(model.parameters(), lr=0.01)
loss_fn = MSELoss()
trainer = Trainer(model, optimizer, loss_fn)
# Generate training data: y = 2x + 1
np.random.seed(42)
X_train = np.random.randn(100, 1).astype(np.float32)
y_train = (2.0 * X_train + 1.0).astype(np.float32)
# Create dataset
class RegressionDataset:
def __init__(self, X, y, batch_size=10):
self.X = X
self.y = y
self.batch_size = batch_size
def __iter__(self):
indices = np.arange(len(self.X))
np.random.shuffle(indices)
for i in range(0, len(self.X), self.batch_size):
batch_indices = indices[i:i+self.batch_size]
yield Tensor(self.X[batch_indices]), Tensor(self.y[batch_indices])
dataset = RegressionDataset(X_train, y_train, batch_size=10)
# Train for 100 epochs
print("\n🔬 Testing loss convergence on y = 2x + 1:")
losses = []
for epoch in range(100):
loss = trainer.train_epoch(dataset)
losses.append(loss)
if epoch % 20 == 0:
print(f"Epoch {epoch:3d}: Loss = {loss:.6f}")
initial_loss = losses[0]
final_loss = losses[-1]
print(f"\nInitial loss: {initial_loss:.6f}")
print(f"Final loss: {final_loss:.6f}")
print(f"Reduction: {(1 - final_loss/initial_loss)*100:.1f}%")
# Test 1: Loss should decrease significantly
assert final_loss < initial_loss * 0.1, \
f"Loss should decrease to < 10% of initial. Got {final_loss/initial_loss*100:.1f}%"
# Test 2: Loss should be near zero (good fit)
assert final_loss < 0.1, \
f"Final loss should be < 0.1 for simple problem. Got {final_loss:.6f}"
# Test 3: Learned weights should approximate true values
learned_weight = model.layer.weights.data[0, 0]
learned_bias = model.layer.bias.data[0] if model.layer.bias is not None else 0.0
print(f"\nTrue parameters: weight=2.0, bias=1.0")
print(f"Learned parameters: weight={learned_weight:.3f}, bias={learned_bias:.3f}")
# Allow some tolerance for learning
assert abs(learned_weight - 2.0) < 0.5, \
f"Weight should be close to 2.0, got {learned_weight:.3f}"
if model.layer.bias is not None:
assert abs(learned_bias - 1.0) < 0.5, \
f"Bias should be close to 1.0, got {learned_bias:.3f}"
print("✅ Training successfully converged to correct solution!")
def test_classification_convergence(self):
"""
Test training converges on simple classification problem.
Problem: Learn XOR-like pattern with 2-layer network
Success criteria: Loss decreases, accuracy improves
"""
# Create 2-layer model for XOR
class XORModel:
def __init__(self):
self.layer1 = Linear(2, 4)
self.relu = ReLU()
self.layer2 = Linear(4, 2)
self.training = True
def forward(self, x):
x = self.layer1.forward(x)
x = self.relu.forward(x)
x = self.layer2.forward(x)
return x
def parameters(self):
return self.layer1.parameters() + self.layer2.parameters()
model = XORModel()
optimizer = AdamW(model.parameters(), lr=0.01)
loss_fn = CrossEntropyLoss()
trainer = Trainer(model, optimizer, loss_fn)
# Generate XOR-like data
np.random.seed(42)
X_train = np.array([
[0, 0], [0, 1], [1, 0], [1, 1],
[0, 0], [0, 1], [1, 0], [1, 1],
[0, 0], [0, 1], [1, 0], [1, 1],
], dtype=np.float32)
y_train = np.array([0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0], dtype=np.int64)
# Create dataset
class XORDataset:
def __iter__(self):
for i in range(len(X_train)):
yield Tensor(X_train[i:i+1]), Tensor(y_train[i:i+1])
dataset = XORDataset()
# Train for 200 epochs
print("\n🔬 Testing classification convergence on XOR pattern:")
losses = []
for epoch in range(200):
loss = trainer.train_epoch(dataset)
losses.append(loss)
if epoch % 40 == 0:
print(f"Epoch {epoch:3d}: Loss = {loss:.6f}")
initial_loss = losses[0]
final_loss = losses[-1]
print(f"\nInitial loss: {initial_loss:.6f}")
print(f"Final loss: {final_loss:.6f}")
print(f"Reduction: {(1 - final_loss/initial_loss)*100:.1f}%")
# Test: Loss should decrease significantly
assert final_loss < initial_loss * 0.5, \
f"Loss should decrease to < 50% of initial. Got {final_loss/initial_loss*100:.1f}%"
print("✅ Classification training successfully converged!")
# =============================================================================
# CRITICAL TEST 3: Scheduler Integration
# =============================================================================
# BUG-CATCHING VALUE: HIGH
# COMMON BUG: Scheduler exists but doesn't actually update learning rate
# SYMPTOM: Learning rate stays constant despite scheduler
# =============================================================================
class TestSchedulerIntegration:
"""Test that learning rate scheduler actually updates optimizer learning rate."""
def test_scheduler_updates_learning_rate(self):
"""
Test that CosineSchedule integrates with Trainer and updates LR each epoch.
This validates:
1. Scheduler computes correct learning rates
2. Trainer applies scheduler updates to optimizer
3. Learning rate actually changes during training
"""
# Create simple model
class SimpleModel:
def __init__(self):
self.layer = Linear(2, 1)
self.training = True
def forward(self, x):
return self.layer.forward(x)
def parameters(self):
return self.layer.parameters()
model = SimpleModel()
optimizer = SGD(model.parameters(), lr=0.1) # Initial LR (will be overridden)
# Create scheduler: 0.1 → 0.01 over 10 epochs
scheduler = CosineSchedule(max_lr=0.1, min_lr=0.01, total_epochs=10)
loss_fn = MSELoss()
trainer = Trainer(model, optimizer, loss_fn, scheduler=scheduler)
# Create simple dataset
class SimpleDataset:
def __iter__(self):
for _ in range(5):
x = Tensor(np.random.randn(4, 2))
y = Tensor(np.random.randn(4, 1))
yield x, y
print("\n🔬 Testing learning rate scheduling:")
# Train for 10 epochs and track learning rate
learning_rates = []
for epoch in range(10):
# Record LR before training
lr_before = optimizer.lr
# Train one epoch
trainer.train_epoch(SimpleDataset())
# Record LR after training (scheduler should have updated it)
lr_after = optimizer.lr
learning_rates.append(lr_after)
print(f"Epoch {epoch}: LR = {lr_after:.6f}")
print(f"\nLearning rates: {[f'{lr:.4f}' for lr in learning_rates]}")
# Test 1: Learning rate should start at max_lr
assert abs(learning_rates[0] - 0.1) < 1e-6, \
f"Initial LR should be 0.1, got {learning_rates[0]:.6f}"
# Test 2: Learning rate should end at min_lr
assert abs(learning_rates[-1] - 0.01) < 1e-6, \
f"Final LR should be 0.01, got {learning_rates[-1]:.6f}"
# Test 3: Learning rate should decrease monotonically
for i in range(len(learning_rates) - 1):
assert learning_rates[i] >= learning_rates[i+1], \
f"LR should decrease monotonically. Epoch {i}: {learning_rates[i]:.6f} > Epoch {i+1}: {learning_rates[i+1]:.6f}"
# Test 4: Learning rate should actually change (not stuck)
unique_lrs = len(set([round(lr, 6) for lr in learning_rates]))
assert unique_lrs >= 5, \
f"LR should change across epochs. Only {unique_lrs} unique values found."
# Test 5: History should track learning rates
assert len(trainer.history['learning_rates']) == 10, \
"Trainer should record learning rate for each epoch"
print("✅ Learning rate scheduling works correctly!")
def test_training_without_scheduler(self):
"""
Test that training works correctly when scheduler=None.
This validates that scheduler is truly optional.
"""
# Create simple model
class SimpleModel:
def __init__(self):
self.layer = Linear(1, 1)
self.training = True
def forward(self, x):
return self.layer.forward(x)
def parameters(self):
return self.layer.parameters()
model = SimpleModel()
optimizer = SGD(model.parameters(), lr=0.05)
loss_fn = MSELoss()
# Create trainer WITHOUT scheduler
trainer = Trainer(model, optimizer, loss_fn, scheduler=None)
# Create simple dataset
class SimpleDataset:
def __iter__(self):
for _ in range(3):
x = Tensor(np.random.randn(2, 1))
y = Tensor(np.random.randn(2, 1))
yield x, y
print("\n🔬 Testing training without scheduler:")
# Train for 5 epochs
initial_lr = optimizer.lr
for epoch in range(5):
trainer.train_epoch(SimpleDataset())
current_lr = optimizer.lr
print(f"Epoch {epoch}: LR = {current_lr:.6f}")
# Learning rate should stay constant
assert abs(current_lr - initial_lr) < 1e-9, \
f"LR should remain constant without scheduler. Expected {initial_lr}, got {current_lr}"
print("✅ Training without scheduler works correctly!")
# =============================================================================
# Test Execution
# =============================================================================
if __name__ == "__main__":
print("=" * 70)
print("Module 07 - CRITICAL Integration Tests")
print("=" * 70)
# Test 1: Missing zero_grad()
print("\n" + "=" * 70)
print("TEST 1: Missing zero_grad() Detection")
print("=" * 70)
test_zero_grad = TestMissingZeroGrad()
test_zero_grad.test_zero_grad_required_for_correct_training()
test_zero_grad.test_trainer_calls_zero_grad()
# Test 2: Loss Convergence
print("\n" + "=" * 70)
print("TEST 2: Loss Convergence Validation")
print("=" * 70)
test_convergence = TestLossConvergence()
test_convergence.test_linear_regression_convergence()
test_convergence.test_classification_convergence()
# Test 3: Scheduler Integration
print("\n" + "=" * 70)
print("TEST 3: Scheduler Integration")
print("=" * 70)
test_scheduler = TestSchedulerIntegration()
test_scheduler.test_scheduler_updates_learning_rate()
test_scheduler.test_training_without_scheduler()
print("\n" + "=" * 70)
print("ALL CRITICAL TESTS PASSED! ✅")
print("=" * 70)
print("\nModule 07 Training has passed critical integration validation.")
print("These tests verify:")
print(" ✅ Gradients are managed correctly (zero_grad)")
print(" ✅ Training produces learning (convergence)")
print(" ✅ Learning rate scheduling works (scheduler integration)")


@@ -1,550 +0,0 @@
# Module 07 (Training) - Integration Test Audit Report
**Date**: 2025-11-25
**Auditor**: Dr. Sarah Rodriguez
**Status**: CRITICAL GAPS IDENTIFIED - Test coverage is for Module 10 (Optimizers), not Module 07 (Training)
---
## CRITICAL FINDING: Wrong Module Being Tested
**ISSUE**: The file `/tests/07_training/test_progressive_integration.py` contains tests for **Module 10 (Optimizers)**, NOT Module 07 (Training).
**Evidence**:
- Line 2: "Module 10: Progressive Integration Tests"
- Line 3: "Tests that Module 10 (Optimizers) works correctly"
- Line 5: "DEPENDENCY CHAIN: 01_setup → ... → 10_optimizers"
- Line 6: "This is where we enable actual learning through gradient-based optimization."
**Impact**: Module 07 (Training) has NO progressive integration tests validating its core functionality.
---
## Module 07 Implementation Overview
Based on `/src/07_training/07_training.py`, Module 07 provides:
### Core Components Implemented:
1. **CosineSchedule** - Learning rate scheduling with cosine annealing (rule sketched after this list)
2. **clip_grad_norm()** - Global gradient norm clipping
3. **Trainer class** - Complete training orchestration with:
- `train_epoch()` - Training loop with gradient accumulation
- `evaluate()` - Evaluation mode without gradients
- `save_checkpoint()` / `load_checkpoint()` - State persistence
- Train/eval mode switching
- Learning rate scheduling integration
- Gradient clipping integration
- History tracking
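For reference, a minimal sketch of the cosine-annealing rule such a schedule typically implements; the constructor mirrors `CosineSchedule(max_lr, min_lr, total_epochs)` as used in Priority 3 below, but the method name and exact endpoint handling are assumptions, not the audited implementation:
```python
import math

class CosineScheduleSketch:
    """Anneal the learning rate from max_lr down to min_lr over total_epochs."""
    def __init__(self, max_lr, min_lr, total_epochs):
        self.max_lr, self.min_lr, self.total_epochs = max_lr, min_lr, total_epochs

    def get_lr(self, epoch):
        # lr(t) = min_lr + 0.5 * (max_lr - min_lr) * (1 + cos(pi * t / (T - 1)))
        progress = epoch / max(self.total_epochs - 1, 1)
        return self.min_lr + 0.5 * (self.max_lr - self.min_lr) * (1 + math.cos(math.pi * progress))
```
With this rule, `get_lr(0) == max_lr` and `get_lr(total_epochs - 1) == min_lr`, which matches the "Verify final lr ≈ min_lr" expectation in Test 3.1 below.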
### Integration Points (Modules 01-06):
- Module 01: Tensor operations
- Module 02: Activations (ReLU, Sigmoid)
- Module 03: Layers (Linear)
- Module 04: Losses (MSELoss, CrossEntropyLoss)
- Module 05: Autograd (backward pass, gradients)
- Module 06: Optimizers (SGD, AdamW)
---
## Current Test Coverage Analysis
### Existing Test Files:
1. **test_progressive_integration.py** (498 lines)
- **WRONG MODULE**: Tests Module 10 (Optimizers)
- Tests SGD/Adam creation, parameter updates, gradient clipping
- Does NOT test Trainer class or training loops
2. **test_autograd_integration.py** (213 lines)
- Tests autograd integration with tensors, layers, activations
- Validates backward pass, computation graphs
- Does NOT test training-specific functionality
3. **test_tensor_autograd_integration.py** (348 lines)
- Tests Variable wrapping of Tensors
- Tests operations (add, multiply, relu, sigmoid)
- Tests backward pass and gradient computation
- Does NOT test training loops
### Coverage Summary:
- **Autograd Integration**: ✅ Well covered (561 lines)
- **Optimizer Integration**: ✅ Covered (in wrong file)
- **Training Loop Integration**: ❌ **MISSING**
- **Trainer Class Integration**: ❌ **MISSING**
- **Learning Rate Scheduling**: ❌ **MISSING**
- **Gradient Clipping**: ⚠️ Partial (optimizer tests only)
- **Checkpointing**: ❌ **MISSING**
- **Train/Eval Mode**: ❌ **MISSING**
---
## MISSING INTEGRATION TESTS - Critical Priorities
### Priority 1: Training Loop Core Functionality
#### Test 1.1: Complete Training Loop Integration
**What to test**: End-to-end training loop through Trainer class
```python
class TestTrainerCoreIntegration:
def test_complete_training_loop(self):
"""Test complete training loop integrates all modules correctly."""
# Components from all modules:
# - Model: Linear layers (Module 03) + ReLU (Module 02)
# - Loss: MSELoss or CrossEntropyLoss (Module 04)
# - Optimizer: SGD or AdamW (Module 06)
# - Trainer: Training orchestration (Module 07)
# Verify:
# - Forward pass works
# - Loss computation works
# - Backward pass computes gradients
# - Optimizer updates parameters
# - Loss decreases over epochs
```
**Why critical**: This is the PRIMARY integration point for Module 07. If this doesn't work, nothing else matters.
#### Test 1.2: Missing zero_grad() Detection
**What to test**: Training fails catastrophically if zero_grad() is missing
```python
def test_missing_zero_grad_causes_gradient_accumulation(self):
"""Test that forgetting zero_grad() causes incorrect gradient accumulation."""
# Create trainer WITHOUT zero_grad() call
# Run multiple training steps
# Verify gradients accumulate incorrectly
# Show loss diverges instead of converging
```
**Why critical**: This is the #1 student mistake in training loops. Tests should catch it.
**Bug-catching value**: HIGH - Common error that silently breaks training
#### Test 1.3: Gradient Accumulation Pattern
**What to test**: Gradient accumulation works correctly with accumulation_steps > 1
```python
def test_gradient_accumulation_correctness(self):
"""Test gradient accumulation produces same results as larger batch."""
# Train with batch_size=4, accumulation_steps=1
# Train with batch_size=2, accumulation_steps=2
# Verify final gradients are equivalent
# Verify effective batch size is the same
```
**Why critical**: Production pattern for memory-limited training. Must work correctly.
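A hedged sketch of the loop pattern Test 1.3 describes, assuming the `Linear`/`MSELoss`/`SGD` APIs used in the template above and that a loss tensor can be scaled by a Python float; this is the conventional accumulation pattern, not necessarily the project's `accumulation_steps` implementation:
```python
import numpy as np
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear
from tinytorch.core.losses import MSELoss
from tinytorch.core.optimizers import SGD

def train_with_accumulation(layer, batches, accumulation_steps=2, lr=0.01):
    """Accumulate gradients over several micro-batches, then step the optimizer once."""
    optimizer = SGD(layer.parameters(), lr=lr)
    loss_fn = MSELoss()
    optimizer.zero_grad()
    for i, (x, y) in enumerate(batches):
        loss = loss_fn.forward(layer.forward(x), y)
        # Scale each micro-batch loss so the accumulated gradient matches one large batch
        (loss * (1.0 / accumulation_steps)).backward()
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```
The equivalence test would then compare the gradients (or final parameters) from `batch_size=2, accumulation_steps=2` against a single pass with `batch_size=4`.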
---
### Priority 2: Train/Eval Mode Switching
#### Test 2.1: Mode Switching Affects Model Behavior
**What to test**: model.training flag changes behavior correctly
```python
def test_train_eval_mode_switching(self):
"""Test train/eval mode switching affects model behavior."""
# Create model with dropout or batchnorm (future modules)
# Run forward in training mode
# Run forward in eval mode
# Verify different outputs/behavior
# For Module 07: At minimum verify:
# - Trainer sets model.training = True in train_epoch()
# - Trainer sets model.training = False in evaluate()
```
**Why critical**: Proper mode switching is essential for correct evaluation and inference.
**Bug-catching value**: MEDIUM - Subtle bug that causes incorrect evaluation metrics
#### Test 2.2: Gradients Disabled During Evaluation
**What to test**: No gradients computed during evaluation
```python
def test_evaluation_disables_gradients(self):
"""Test evaluation doesn't compute or accumulate gradients."""
# Run evaluate() on test data
# Verify no gradients are computed
# Verify no parameter updates occur
# Verify optimizer state unchanged
```
**Why critical**: Evaluation should be faster and memory-efficient without gradients.
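A minimal sketch of Test 2.2, assuming the `Trainer.evaluate(dataset)` call signature and the `model.training` flag described above; the tiny model and dataset classes are illustrative only:
```python
import numpy as np
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear
from tinytorch.core.losses import MSELoss
from tinytorch.core.optimizers import SGD
from tinytorch.core.training import Trainer

def test_evaluation_does_not_update_parameters():
    """evaluate() should leave parameters untouched and switch to eval mode."""
    class TinyModel:
        def __init__(self):
            self.layer = Linear(2, 1)
            self.training = True
        def forward(self, x):
            return self.layer.forward(x)
        def parameters(self):
            return self.layer.parameters()

    class EvalDataset:
        def __iter__(self):
            for _ in range(3):
                yield Tensor(np.random.randn(4, 2)), Tensor(np.random.randn(4, 1))

    model = TinyModel()
    trainer = Trainer(model, SGD(model.parameters(), lr=0.1), MSELoss())
    before = [p.data.copy() for p in model.parameters()]
    trainer.evaluate(EvalDataset())
    for b, p in zip(before, model.parameters()):
        assert np.allclose(b, p.data), "evaluate() modified model parameters"
    assert model.training is False, "evaluate() should set model.training = False"
```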
---
### Priority 3: Learning Rate Scheduling Integration
#### Test 3.1: Scheduler Updates Learning Rate
**What to test**: Scheduler properly updates optimizer learning rate each epoch
```python
def test_scheduler_updates_learning_rate(self):
"""Test learning rate scheduler integrates with training loop."""
# Create CosineSchedule(max_lr=0.1, min_lr=0.01, total_epochs=10)
# Create Trainer with scheduler
# Train for 10 epochs
# Verify optimizer.lr changes each epoch
# Verify lr follows cosine schedule (decreasing)
# Verify final lr ≈ min_lr
```
**Why critical**: Scheduling is essential for training convergence. Must integrate correctly.
**Bug-catching value**: HIGH - Scheduler exists but doesn't actually update LR (common integration bug)
#### Test 3.2: Training Without Scheduler Still Works
**What to test**: Scheduler is optional, training works without it
```python
def test_training_without_scheduler(self):
"""Test training works with scheduler=None."""
# Create Trainer with scheduler=None
# Train for multiple epochs
# Verify optimizer.lr stays constant
# Verify training still works correctly
```
**Why critical**: Ensures optional components are truly optional.
---
### Priority 4: Gradient Clipping Integration
#### Test 4.1: Gradient Clipping Prevents Explosion
**What to test**: Gradient clipping rescales large gradients correctly
```python
def test_gradient_clipping_prevents_explosion(self):
"""Test gradient clipping prevents exploding gradients."""
# Create model with potential for large gradients
# Set grad_clip_norm=1.0
# Inject artificially large gradients
# Train one step
# Verify gradient norm ≤ clip threshold
# Verify parameters update reasonably
```
**Why critical**: Prevents training instability from exploding gradients.
**Bug-catching value**: HIGH - Clipping may be called but not actually applied
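For reference, a hedged sketch of the global-norm rule that `clip_grad_norm()` is described as implementing; the name matches the import in the template above and the `.grad.data` access is assumed from that template, but this body is the standard formula, not the project's verified code:
```python
import numpy as np

def clip_grad_norm_sketch(parameters, max_norm, eps=1e-6):
    """Rescale all gradients in place so their combined L2 norm is at most max_norm."""
    grads = [p.grad.data for p in parameters if p.grad is not None]
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + eps)
        for g in grads:
            g *= scale  # in-place rescale; gradients below the threshold are left untouched
    return total_norm
```
Test 4.1 would assert the recomputed global norm is at most the threshold after clipping, while Test 4.2 would assert gradients already below the threshold come back unchanged.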
#### Test 4.2: Small Gradients Not Affected
**What to test**: Gradient clipping doesn't affect small gradients
```python
def test_small_gradients_unchanged_by_clipping(self):
"""Test gradient clipping doesn't modify small gradients."""
# Create model with small gradients
# Set grad_clip_norm=10.0 (high threshold)
# Compute gradients
# Verify gradients unchanged
```
**Why critical**: Clipping should only activate when needed.
---
### Priority 5: Loss Convergence Validation
#### Test 5.1: Loss Decreases During Training
**What to test**: Training actually improves model performance
```python
def test_loss_convergence_on_simple_problem(self):
"""Test training reduces loss on simple learnable problem."""
# Create simple linear regression problem: y = 2x + 1
# Create model: Linear(1, 1)
# Train for 100 epochs
# Verify loss decreases monotonically (or mostly)
# Verify final loss < initial loss * 0.1
# Verify learned weights ≈ [2.0] and bias ≈ [1.0]
```
**Why critical**: Validates entire training pipeline produces learning.
**Bug-catching value**: CRITICAL - Detects any component breaking learning
#### Test 5.2: History Tracking Accuracy
**What to test**: trainer.history correctly records training metrics
```python
def test_history_tracking(self):
"""Test training history is tracked correctly."""
# Train for 5 epochs
# Verify len(trainer.history['train_loss']) == 5
# Verify len(trainer.history['learning_rates']) == 5 (if scheduler used)
# Verify values are reasonable (no NaN, no infinite)
```
**Why critical**: Users rely on history for monitoring and debugging.
---
### Priority 6: Checkpointing and State Persistence
#### Test 6.1: Save and Load Checkpoint
**What to test**: Training state can be saved and restored
```python
def test_save_load_checkpoint(self):
"""Test checkpoint saving and loading preserves training state."""
# Train for 5 epochs
# Save checkpoint
# Train for 5 more epochs
# Record final state
# Create new trainer
# Load checkpoint
# Train for 5 epochs
# Verify final state matches original
```
**Why critical**: Essential for long training jobs and experimentation.
**Bug-catching value**: MEDIUM - Checkpoint may save but not restore correctly
#### Test 6.2: Checkpoint Contains Complete State
**What to test**: Checkpoint includes all necessary components
```python
def test_checkpoint_completeness(self):
"""Test checkpoint contains all training state components."""
# Train for a few epochs
# Save checkpoint
# Load checkpoint dictionary
# Verify contains:
# - model state (weights, biases)
# - optimizer state (momentum, velocity for Adam)
# - scheduler state (current epoch)
# - training metadata (epoch, step)
```
**Why critical**: Incomplete checkpoints cause subtle resume errors.
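A minimal sketch of the checkpoint contents Test 6.2 enumerates, assuming parameters expose `.data` and that `pickle` is an acceptable serialization format; the dictionary keys and helper names here are illustrative, not the audited `save_checkpoint()`/`load_checkpoint()` implementation:
```python
import pickle

def save_checkpoint_sketch(path, model, optimizer, scheduler, epoch):
    """Persist every piece of state Test 6.2 lists."""
    checkpoint = {
        "model_state": [p.data.copy() for p in model.parameters()],
        "optimizer_state": getattr(optimizer, "state", None),  # e.g. momentum / Adam moments, if exposed
        "scheduler_state": {"epoch": epoch} if scheduler is not None else None,
        "epoch": epoch,
    }
    with open(path, "wb") as f:
        pickle.dump(checkpoint, f)

def load_checkpoint_sketch(path, model):
    """Restore parameter values; optimizer/scheduler restore would mirror the save."""
    with open(path, "rb") as f:
        checkpoint = pickle.load(f)
    for p, saved in zip(model.parameters(), checkpoint["model_state"]):
        p.data = saved.copy()
    return checkpoint
```
Test 6.1 would compare training trajectories before and after a resume; Test 6.2 would assert each of these keys is present and non-empty.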
---
### Priority 7: Integration with Previous Modules
#### Test 7.1: Works with Different Layer Types
**What to test**: Training works with various layer architectures
```python
def test_training_with_different_architectures(self):
"""Test training works with different model architectures."""
# Test 1: Single Linear layer
# Test 2: Multi-layer perceptron (Linear + ReLU + Linear)
# Test 3: Different activation functions
# Verify all train successfully
```
**Why critical**: Training should be architecture-agnostic.
#### Test 7.2: Works with Different Loss Functions
**What to test**: Training works with MSE, CrossEntropy, etc.
```python
def test_training_with_different_losses(self):
"""Test training works with different loss functions."""
# Test 1: MSELoss for regression
# Test 2: CrossEntropyLoss for classification
# Verify both train correctly
# Verify gradients flow properly
```
**Why critical**: Training should support all loss types.
#### Test 7.3: Works with Different Optimizers
**What to test**: Training works with SGD, AdamW, etc.
```python
def test_training_with_different_optimizers(self):
"""Test training works with different optimizers."""
# Test 1: SGD (simple, no momentum)
# Test 2: AdamW (complex, with momentum and adaptive LR)
# Verify both integrate correctly
# Verify both produce learning
```
**Why critical**: Training should be optimizer-agnostic.
---
## Test Organization Recommendations
### Suggested File Structure:
```
tests/07_training/
├── test_progressive_integration.py # FIX: Rename/move to tests/10_optimizers/
├── test_trainer_core.py # NEW: Priority 1 tests
├── test_trainer_modes.py # NEW: Priority 2 tests
├── test_scheduler_integration.py # NEW: Priority 3 tests
├── test_gradient_clipping.py # NEW: Priority 4 tests
├── test_convergence.py # NEW: Priority 5 tests
├── test_checkpointing.py # NEW: Priority 6 tests
├── test_module_integration.py # NEW: Priority 7 tests
├── test_autograd_integration.py # KEEP: Good coverage
└── test_tensor_autograd_integration.py # KEEP: Good coverage
```
---
## Bug-Catching Priority Matrix
| Test Category | Bug-Catching Value | Student Impact | Priority |
|--------------|-------------------|----------------|----------|
| Missing zero_grad() | CRITICAL | High - Silent failure | P0 |
| Loss convergence validation | CRITICAL | High - No learning | P0 |
| Scheduler integration | HIGH | Medium - Poor convergence | P1 |
| Gradient clipping | HIGH | Medium - Training instability | P1 |
| Train/eval mode | MEDIUM | Medium - Wrong metrics | P2 |
| Checkpoint save/load | MEDIUM | Low - Resume failures | P2 |
| Gradient accumulation | MEDIUM | Low - Memory issues | P3 |
---
## Recommended Test Implementation Order
### Phase 1: Core Functionality (P0)
1. ✅ Fix file organization (move optimizer tests to correct location)
2. ✅ Test complete training loop integration
3. ✅ Test missing zero_grad() detection
4. ✅ Test loss convergence on simple problem
### Phase 2: Essential Features (P1)
5. ✅ Test learning rate scheduling integration
6. ✅ Test gradient clipping prevents explosion
7. ✅ Test train/eval mode switching
### Phase 3: Production Features (P2)
8. ✅ Test checkpoint save and load
9. ✅ Test gradient accumulation correctness
10. ✅ Test history tracking accuracy
### Phase 4: Robustness (P3)
11. ✅ Test with different architectures
12. ✅ Test with different loss functions
13. ✅ Test with different optimizers
---
## Summary
### Current State:
- **Total test lines**: 1159 (but misplaced)
- **Module 07 specific tests**: ~0 (all tests are for wrong module)
- **Integration coverage**: 0% for training, 100% for autograd
### Required Action:
1. **URGENT**: Rename/move `test_progressive_integration.py` to `tests/10_optimizers/`
2. **URGENT**: Create new `test_trainer_core.py` with Priority 1 tests (P0)
3. **HIGH**: Create Priority 2-3 test files (P1)
4. **MEDIUM**: Create Priority 4-7 test files (P2-P3)
### Estimated Test Lines Needed:
- **Minimum (P0-P1)**: ~400 lines for critical functionality
- **Recommended (P0-P2)**: ~800 lines for production readiness
- **Comprehensive (P0-P3)**: ~1200 lines for full coverage
### Critical Integration Points Missing Tests:
1. ❌ Training loop orchestration
2. ❌ zero_grad() requirement
3. ❌ Learning rate scheduling
4. ❌ Gradient clipping application
5. ❌ Train/eval mode effects
6. ❌ Loss convergence validation
7. ❌ Checkpoint persistence
**Overall Assessment**: Module 07 has ZERO integration test coverage. All existing tests are for the wrong module (10) or test components (autograd) rather than the training loop itself.
**Risk Level**: 🔴 **CRITICAL** - Module 07 could be completely broken and tests would pass.
---
## Appendix: Test Template Examples
### Template: Complete Training Loop Test
```python
class TestTrainerCoreIntegration:
"""Test Trainer class integrates all modules correctly."""
def test_complete_training_loop(self):
"""Test end-to-end training with all components."""
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear
from tinytorch.core.activations import ReLU
from tinytorch.core.losses import MSELoss
from tinytorch.core.optimizers import SGD
from tinytorch.core.training import Trainer
# Create simple model
class SimpleModel:
def __init__(self):
self.layer1 = Linear(2, 4)
self.relu = ReLU()
self.layer2 = Linear(4, 1)
self.training = True
def forward(self, x):
x = self.layer1(x)
x = self.relu(x)
x = self.layer2(x)
return x
def parameters(self):
return self.layer1.parameters() + self.layer2.parameters()
# Create components
model = SimpleModel()
optimizer = SGD(model.parameters(), lr=0.01)
loss_fn = MSELoss()
trainer = Trainer(model, optimizer, loss_fn)
# Create simple dataset: y = x1 + x2
class SimpleDataset:
def __iter__(self):
for _ in range(10): # 10 batches
x = Tensor(np.random.randn(4, 2))
y = Tensor(x.data[:, 0:1] + x.data[:, 1:2])
yield x, y
# Train for 5 epochs
initial_loss = None
for epoch in range(5):
loss = trainer.train_epoch(SimpleDataset())
if initial_loss is None:
initial_loss = loss
# Verify training worked
assert loss < initial_loss * 0.8, "Loss should decrease significantly"
assert len(trainer.history['train_loss']) == 5
assert trainer.epoch == 5
```
### Template: Missing zero_grad() Test
```python
def test_missing_zero_grad_breaks_training(self):
"""Test that forgetting zero_grad() causes gradient accumulation."""
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear
from tinytorch.core.losses import MSELoss
from tinytorch.core.optimizers import SGD
# Create model and optimizer
layer = Linear(1, 1)
optimizer = SGD(layer.parameters(), lr=0.1)
loss_fn = MSELoss()
# Manual training loop WITHOUT zero_grad()
x = Tensor([[1.0]])
y = Tensor([[2.0]])
# First step
out1 = layer.forward(x)
loss1 = loss_fn.forward(out1, y)
loss1.backward()
grad1 = layer.weights.grad.data.copy()
optimizer.step()
# FORGOT: optimizer.zero_grad() ← BUG
# Second step
out2 = layer.forward(x)
loss2 = loss_fn.forward(out2, y)
loss2.backward()
grad2 = layer.weights.grad.data.copy()
# Verify gradients accumulated incorrectly
# grad2 should be ~2x grad1 because gradients accumulated
assert np.abs(grad2) > np.abs(grad1) * 1.5, \
"Gradients should accumulate when zero_grad() is missing"
```
---
**End of Audit Report**


@@ -1,151 +0,0 @@
# Module 07 Integration Test Audit - Quick Reference
## TL;DR
**Status**: 🔴 CRITICAL - Module 07 has 0% integration test coverage
**Problem**: Test file tests wrong module (Module 10 instead of Module 07)
**Impact**: Training loop could be completely broken and tests would pass
---
## What to Read
1. **Executive Summary** (2 min): `AUDIT_SUMMARY.md`
- Critical findings
- Top 3 missing tests
- Action items
2. **Full Audit Report** (10 min): `INTEGRATION_TEST_AUDIT.md`
- Complete coverage analysis
- All missing tests (Priorities 0-3)
- Implementation templates
3. **Critical Tests** (code): `CRITICAL_TESTS_TEMPLATE.py`
- Top 3 bug-catching tests (ready to run)
- ~400 lines of working test code
- Immediate implementation guide
---
## Critical Integration Points
| Integration Point | Current Coverage | Priority |
|------------------|------------------|----------|
| Training loop orchestration | ❌ 0% | P0 - CRITICAL |
| zero_grad() requirement | ❌ 0% | P0 - CRITICAL |
| Loss convergence | ❌ 0% | P0 - CRITICAL |
| Learning rate scheduling | ❌ 0% | P1 - HIGH |
| Gradient clipping | ⚠️ 20% | P1 - HIGH |
| Train/eval mode | ❌ 0% | P1 - HIGH |
| Checkpointing | ❌ 0% | P2 - MEDIUM |
| Gradient accumulation | ❌ 0% | P2 - MEDIUM |
---
## Immediate Actions Required
### 1. Fix File Organization (5 min)
```bash
# Move misplaced test file to correct module
mv tests/07_training/test_progressive_integration.py \
tests/10_optimizers/test_progressive_integration.py
```
### 2. Run Critical Tests (30 min)
```bash
# Test the 3 most critical integration points
cd tests/07_training
pytest CRITICAL_TESTS_TEMPLATE.py -v
# Expected: Some tests may FAIL (catching real bugs!)
```
### 3. Create Real Test File (2 hours)
```bash
# Use template as basis for permanent test file
cp CRITICAL_TESTS_TEMPLATE.py test_trainer_core.py
# Integrate with TinyTorch test suite
# Add to CI/CD pipeline
```
---
## Test Implementation Priority
**Phase 1: P0 Tests (~210 lines, CRITICAL)**
- Missing zero_grad() detection
- Loss convergence validation
- Complete training loop integration
**Phase 2: P1 Tests (~160 lines, HIGH)**
- Learning rate scheduling
- Gradient clipping
- Train/eval mode switching
**Phase 3: P2 Tests (~180 lines, MEDIUM)**
- Checkpoint save/load
- Gradient accumulation
- History tracking
---
## Expected Test Results
### If All Components Work:
```
✅ zero_grad() requirement correctly enforced
✅ Training successfully converged to correct solution
✅ Learning rate scheduling works correctly
```
### If Bugs Exist (likely):
```
❌ Gradients accumulate without zero_grad() but training still "works"
→ BUG: Missing zero_grad() in training loop
❌ Loss doesn't decrease after 100 epochs
→ BUG: Complete pipeline failure (check backward pass, optimizer)
❌ Learning rate stays constant at 0.1
→ BUG: Scheduler not integrated (called but LR not updated)
```
---
## Files Created by This Audit
1. `AUDIT_SUMMARY.md` - Executive summary
2. `INTEGRATION_TEST_AUDIT.md` - Full audit report
3. `CRITICAL_TESTS_TEMPLATE.py` - Top 3 tests (ready to run)
4. `README_AUDIT.md` - This quick reference
---
## Questions to Answer
**Q: Why is this marked CRITICAL?**
A: Module 07 is where ALL previous modules integrate. If training doesn't work, nothing works. Zero test coverage means complete integration could be broken.
**Q: How do we know tests are missing?**
A: Current test file (`test_progressive_integration.py`) has wrong header ("Module 10") and tests optimizers, not training loops.
**Q: What's the quickest way to establish confidence?**
A: Run `CRITICAL_TESTS_TEMPLATE.py`. If those 3 tests pass, core functionality works. If they fail, we found critical bugs.
**Q: How much work to fix?**
A: Minimum (P0): ~210 lines, 2-3 hours. Recommended (P0+P1): ~370 lines, 1 day.
---
## Contact
For questions about this audit, see:
- Full report: `INTEGRATION_TEST_AUDIT.md`
- Test templates: `CRITICAL_TESTS_TEMPLATE.py`
- Module implementation: `/src/07_training/07_training.py`
**Audit Date**: 2025-11-25
**Status**: CRITICAL - Immediate action required


@@ -1,210 +0,0 @@
╔═══════════════════════════════════════════════════════════════════════════════╗
║ MODULE 08 INTEGRATION TEST AUDIT SUMMARY ║
╚═══════════════════════════════════════════════════════════════════════════════╝
🚨 CRITICAL BUG FOUND 🚨
┌───────────────────────────────────────────────────────────────────────────────┐
│ File Location: tests/08_dataloader/test_progressive_integration.py │
│ Expected Module: Module 08 (DataLoader) │
│ Actual Module: Module 09 (Autograd) ❌ │
│ │
│ IMPACT: Module 08 has ZERO integration tests currently! │
└───────────────────────────────────────────────────────────────────────────────┘
═══════════════════════════════════════════════════════════════════════════════
📊 CURRENT TEST COVERAGE ANALYSIS
═══════════════════════════════════════════════════════════════════════════════
Current Tests (ALL WRONG MODULE):
┌─────────────────────────────────────────────────────────────┐
│ ✗ TestCompleteMLPipelineStillWorks │
│ └─ Tests Module 09 regression, not Module 08 │
│ │
│ ✗ TestModule09AutogradCore │
│ ├─ test_variable_wrapper_exists │
│ ├─ test_gradient_computation │
│ └─ test_computation_graph_building │
│ │
│ ✗ TestAutogradIntegration │
│ ├─ test_autograd_with_layers │
│ ├─ test_autograd_with_spatial_operations │
│ └─ test_autograd_with_attention │
│ │
│ ✗ TestGradientBasedLearningFoundation │
│ ├─ test_parameter_gradient_computation │
│ ├─ test_loss_function_gradients │
│ └─ test_optimization_readiness │
│ │
│ ✗ TestModule09Completion │
│ └─ test_autograd_foundation_complete │
└─────────────────────────────────────────────────────────────┘
Module 08 Coverage: 0/7 critical integration points tested ❌
═══════════════════════════════════════════════════════════════════════════════
🎯 MISSING MODULE 08 INTEGRATION TESTS
═══════════════════════════════════════════════════════════════════════════════
🔴 CRITICAL PRIORITY (Must Have):
1. DataLoader + Training Loop Integration ⚠️
┌────────────────────────────────────────────────────────┐
│ Tests: Batches work with model forward pass │
│ Risk: Students can't train models │
│ Catches: Shape mismatches, iteration bugs │
└────────────────────────────────────────────────────────┘
2. Shuffling Consistency Across Epochs ⚠️
┌────────────────────────────────────────────────────────┐
│ Tests: Data shuffles properly each epoch │
│ Risk: Training may not converge │
│ Catches: Randomization bugs, duplicate samples │
└────────────────────────────────────────────────────────┘
3. Batch Size Memory Scaling ⚠️
┌────────────────────────────────────────────────────────┐
│ Tests: Memory usage scales with batch size │
│ Risk: OOM errors, poor performance │
│ Catches: Memory issues, batch handling bugs │
└────────────────────────────────────────────────────────┘
🟡 HIGH PRIORITY (Very Important):
4. Tensor Dtype Compatibility
┌────────────────────────────────────────────────────────┐
│ Tests: DataLoader tensors match model expectations │
│ Risk: Type errors during training │
│ Catches: Dtype mismatches, conversion errors │
└────────────────────────────────────────────────────────┘
5. DataLoader + Loss Function Integration
┌────────────────────────────────────────────────────────┐
│ Tests: Batched predictions work with loss functions │
│ Risk: Loss computation fails │
│ Catches: Shape errors, reduction bugs │
└────────────────────────────────────────────────────────┘
🟢 MEDIUM PRIORITY (Should Have):
6. Empty/Single Sample Edge Cases
┌────────────────────────────────────────────────────────┐
│ Tests: Graceful handling of unusual datasets │
│ Risk: Crashes on edge cases │
│ Catches: Division by zero, empty iteration │
└────────────────────────────────────────────────────────┘
7. Multi-Epoch Iteration Stability
┌────────────────────────────────────────────────────────┐
│ Tests: Multiple epochs work reliably │
│ Risk: Multi-epoch training fails │
│ Catches: Memory leaks, iteration bugs │
└────────────────────────────────────────────────────────┘
═══════════════════════════════════════════════════════════════════════════════
🔗 MODULE 08 INTEGRATION POINTS
═══════════════════════════════════════════════════════════════════════════════
Dependencies (What Module 08 Uses):
┌─────────────────────────────────────────────────────────┐
│ Module 01 (Tensor) ────→ Core data structure │
│ Module 03 (Layers) ────→ Batches passed to layers │
│ Module 04 (Losses) ────→ Batch predictions → loss │
│ Module 05 (Autograd) ──→ Batches in gradient tracking │
│ Module 06 (Optimizers) → Batches drive updates │
│ Module 07 (Training) ──→ DataLoader in training loop │
└─────────────────────────────────────────────────────────┘
Enables (What Uses Module 08):
┌─────────────────────────────────────────────────────────┐
│ Module 07 (Training) → Training loop iteration │
│ Module 09 (Spatial) ──→ Batched image data for CNNs │
│ Module 10 (Text) ─────→ Batched text/token data │
│ All Future Modules ───→ Any batch processing │
└─────────────────────────────────────────────────────────┘
═══════════════════════════════════════════════════════════════════════════════
🛠️ RECOMMENDED ACTION PLAN
═══════════════════════════════════════════════════════════════════════════════
Step 1: Fix File Location ⚠️ IMMEDIATE
┌─────────────────────────────────────────────────────────┐
│ Move current file to correct location: │
│ │
│ FROM: tests/08_dataloader/test_progressive_*.py │
│ TO: tests/09_autograd/test_progressive_*.py │
│ │
│ Reason: Current tests are for Module 09, not 08 │
└─────────────────────────────────────────────────────────┘
Step 2: Create New Module 08 Tests
┌─────────────────────────────────────────────────────────┐
│ Create proper test_progressive_integration.py for: │
│ - Dataset abstract class │
│ - TensorDataset implementation │
│ - DataLoader batching and shuffling │
└─────────────────────────────────────────────────────────┘
Step 3: Implement Critical Tests First
┌─────────────────────────────────────────────────────────┐
│ Priority Order: │
│ 1. DataLoader + Training Loop Integration │
│ 2. Shuffling Consistency │
│ 3. Batch Size Memory Scaling │
└─────────────────────────────────────────────────────────┘
Step 4: Validate Student Workflows
┌─────────────────────────────────────────────────────────┐
│ Ensure tests catch real student issues: │
│ - Can they create datasets? │
│ - Can they iterate batches? │
│ - Can they train models end-to-end? │
└─────────────────────────────────────────────────────────┘
═══════════════════════════════════════════════════════════════════════════════
📈 IMPACT ASSESSMENT
═══════════════════════════════════════════════════════════════════════════════
Current State:
┌────────────────────────────────────────────┐
│ Module 08 Integration Coverage: 0% │
│ Critical Bug Risk: VERY HIGH │
│ Student Success Risk: VERY HIGH │
└────────────────────────────────────────────┘
After Implementing Recommended Tests:
┌────────────────────────────────────────────┐
│ Module 08 Integration Coverage: 100% │
│ Critical Bug Risk: LOW │
│ Student Success Risk: LOW │
└────────────────────────────────────────────┘
Bugs Caught by New Tests:
✓ Training loop integration failures
✓ Shuffling and randomization bugs
✓ Memory allocation issues
✓ Dtype mismatches
✓ Loss function integration errors
✓ Edge case crashes
✓ Multi-epoch stability issues
═══════════════════════════════════════════════════════════════════════════════
🎓 STUDENT IMPACT
═══════════════════════════════════════════════════════════════════════════════
Without Module 08 Tests:
❌ Students can implement DataLoader but can't verify it works
❌ Training loop failures discovered during later modules
❌ Confusing errors with no clear debugging path
❌ Wasted time on issues that tests should catch
❌ Poor understanding of batch processing trade-offs
With Module 08 Tests:
✅ Students verify DataLoader works immediately
✅ Integration issues caught at Module 08 boundary
✅ Clear error messages guide debugging
✅ Confidence to proceed to next modules
✅ Deep understanding of batch processing mechanics
═══════════════════════════════════════════════════════════════════════════════
For detailed analysis, see: INTEGRATION_TEST_AUDIT.md


@@ -1,361 +0,0 @@
# Module 08 (DataLoader) Integration Test Audit
## CRITICAL BUG IDENTIFIED
**File**: `/Users/VJ/GitHub/TinyTorch/tests/08_dataloader/test_progressive_integration.py`
**Issue**: Tests Module 09 (Autograd) instead of Module 08 (DataLoader)
### Current Status
The test file header claims to test Module 08 but actually tests:
```python
"""
Module 08: Progressive Integration Tests
Tests that Module 09 (Autograd) works correctly AND that the entire prior stack (01→08) still works.
```
**This is WRONG.** The file is in `tests/08_dataloader/` but tests Module 09 functionality.
---
## What Tests Currently Exist
### Current Tests (Module 09 - Autograd, WRONG MODULE)
1. **TestCompleteMLPipelineStillWorks**
- `test_end_to_end_ml_pipeline_stable()` - Full CNN pipeline
- `test_attention_and_spatial_integration_stable()` - Advanced architectures
2. **TestModule09AutogradCore** (WRONG - testing future module!)
- `test_variable_wrapper_exists()` - Variable class
- `test_gradient_computation()` - Backward pass
- `test_computation_graph_building()` - Computation graph
3. **TestAutogradIntegration** (WRONG - testing future module!)
- `test_autograd_with_layers()` - Gradients through Dense layers
- `test_autograd_with_spatial_operations()` - CNN gradients
- `test_autograd_with_attention()` - Transformer gradients
4. **TestGradientBasedLearningFoundation** (WRONG - testing future module!)
- `test_parameter_gradient_computation()` - Parameter gradients
- `test_loss_function_gradients()` - Loss gradients
- `test_optimization_readiness()` - Optimizer foundation
5. **TestModule09Completion** (WRONG - testing future module!)
- `test_autograd_foundation_complete()` - Complete autograd validation
---
## What Module 08 Tests SHOULD Exist
### Module 08 Scope: DataLoader (Data Pipeline)
**Implementation Location**: `tinytorch/data/loader.py`
**Core Components**:
- `Dataset` - Abstract base class
- `TensorDataset` - Tensor wrapper dataset
- `DataLoader` - Batching and shuffling
### Missing Integration Tests for Module 08
#### 1. **DataLoader + Training Loop Integration** ⚠️ CRITICAL
**Why**: Students need to verify DataLoader works with training loops
```python
def test_dataloader_training_loop_integration():
"""
Test DataLoader provides batches correctly for training.
Integration Points:
- DataLoader batches → Model forward pass
- Batch tensors → Loss computation
- Multi-epoch iteration
"""
```
**What to test**:
- DataLoader provides correct batch shapes
- Batches work with model forward pass
- Multiple epochs iterate correctly
- Training loop can consume all batches
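A minimal sketch of the batching contract being validated here, using a stand-in iterator rather than TinyTorch's actual DataLoader API:
```python
import numpy as np

# Stand-in batching loop: the real test should make the same shape/alignment checks per batch.
X = np.random.randn(10, 2).astype(np.float32)
y = np.random.randn(10, 1).astype(np.float32)

def iterate_batches(X, y, batch_size):
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

seen = 0
for xb, yb in iterate_batches(X, y, batch_size=4):
    assert xb.shape[1] == 2 and yb.shape[1] == 1, "Batch feature/label dims must match the model"
    assert xb.shape[0] == yb.shape[0] <= 4, "Features and labels must stay aligned per batch"
    seen += xb.shape[0]
assert seen == len(X), "One epoch must visit every sample exactly once"
```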
#### 2. **Shuffling Consistency** ⚠️ CRITICAL
**Why**: Critical for training stability and reproducibility
```python
def test_dataloader_shuffling_consistency():
"""
Test shuffling behavior across epochs.
Integration Points:
- Same data, different order each epoch
- Reproducibility with random seed
- All samples seen exactly once per epoch
"""
```
**What to test**:
- Shuffle=True changes order between epochs
- Shuffle=False maintains order
- All samples appear exactly once per epoch
- Random seed controls shuffling
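The shuffling invariants can be expressed independently of the DataLoader API; a sketch using index permutations:
```python
import numpy as np
from collections import Counter

# Two epochs of shuffled indices: order changes, but every sample appears exactly once per epoch.
n = 100
rng = np.random.default_rng(42)
epoch1 = rng.permutation(n)
epoch2 = rng.permutation(n)

assert not np.array_equal(epoch1, epoch2), "shuffle=True should reorder data between epochs"
assert Counter(epoch1) == Counter(epoch2) == Counter(range(n)), \
    "Every sample must appear exactly once per epoch"
```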
#### 3. **Batch Size Memory Scaling** ⚠️ CRITICAL
**Why**: Students need to understand batch size impact on memory
```python
def test_batch_size_memory_scaling():
"""
Test memory usage scales with batch size.
Systems Analysis:
- Small batches (4): Low memory, more iterations
- Medium batches (32): Balanced
- Large batches (128): High memory, fewer iterations
"""
```
**What to test**:
- Small batch sizes work correctly
- Large batch sizes work correctly
- Total samples = batches * batch_size (approximately)
- Last batch handles remainder correctly
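The batch-count arithmetic the test should verify, as a worked example (whether the loader exposes a `drop_last`-style flag is an assumption; the behavior assumed here keeps the short final batch):
```python
import math

# 10 samples at batch_size=3 should yield 4 batches of sizes (3, 3, 3, 1).
n_samples, batch_size = 10, 3
n_batches = math.ceil(n_samples / batch_size)
sizes = [min(batch_size, n_samples - i * batch_size) for i in range(n_batches)]

assert n_batches == 4 and sizes == [3, 3, 3, 1]
assert sum(sizes) == n_samples, "All samples must be covered; only the last batch may be short"
```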
#### 4. **Tensor Dtype Compatibility** ⚠️ HIGH PRIORITY
**Why**: DataLoader tensors must match model expectations
```python
def test_dataloader_tensor_dtype_compatibility():
"""
Test DataLoader outputs match model input expectations.
Integration Points:
- DataLoader tensors → Model layers
- Feature dtype (float32)
- Label dtype (int64 for classification, float32 for regression)
"""
```
**What to test**:
- Features are float32 tensors
- Labels have correct dtype
- Shapes match model input requirements
- No dtype conversion errors during training
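A sketch of the dtype contract to check on a single batch, using raw NumPy arrays since the exact batch accessors are not specified here:
```python
import numpy as np

# Assumed dtype conventions for one batch of features and labels.
features = np.random.randn(8, 4).astype(np.float32)
class_labels = np.random.randint(0, 3, size=(8,)).astype(np.int64)
regression_targets = np.random.randn(8, 1).astype(np.float32)

assert features.dtype == np.float32, "Features should be float32 for layer weights"
assert class_labels.dtype == np.int64, "Classification labels should be int64 indices"
assert regression_targets.dtype == np.float32, "Regression targets should be float32"
```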
#### 5. **DataLoader + Loss Function Integration** ⚠️ HIGH PRIORITY
**Why**: Batches must work with loss computation
```python
def test_dataloader_loss_integration():
"""
Test DataLoader batches work with loss functions.
Integration Points:
- Batch predictions → Loss computation
- Batch labels → Loss targets
- Reduction across batch dimension
"""
```
**What to test**:
- Batched predictions work with MSE loss
- Batched predictions work with CrossEntropy loss
- Loss reduction handles batch dimension
- Gradients (when ready) flow through batches
#### 6. **Empty/Single Sample Edge Cases** ⚠️ MEDIUM PRIORITY
**Why**: Robust data handling prevents training crashes
```python
def test_dataloader_edge_cases():
"""
Test DataLoader handles edge cases gracefully.
Edge Cases:
- Dataset smaller than batch size
- Single sample dataset
- Last batch smaller than batch_size
"""
```
**What to test**:
- Dataset with 1 sample
- Dataset smaller than batch_size
- Uneven division (10 samples, batch_size=3 → 4 batches)
- Empty iteration behavior
#### 7. **DataLoader Iteration Stability** ⚠️ MEDIUM PRIORITY
**Why**: Multiple epochs must work reliably
```python
def test_dataloader_multi_epoch_stability():
"""
Test DataLoader can iterate multiple epochs without issues.
Integration Points:
- Reset between epochs
- Shuffle consistency
- No memory leaks across epochs
"""
```
**What to test**:
- Can iterate 10+ epochs
- Each epoch yields same total samples
- Shuffling works every epoch
- No gradual slowdown
---
## Bug-Catching Priority Ranking
### CRITICAL (Must Have for Module 08)
1. **DataLoader + Training Loop Integration**
- **Risk**: Students can't train models without this
- **Impact**: Complete failure of ML pipeline
- **Catches**: Shape mismatches, iteration bugs
2. **Shuffling Consistency**
- **Risk**: Training may not converge if shuffling breaks
- **Impact**: Poor model performance, confusing results
- **Catches**: Randomization bugs, duplicate samples
3. **Batch Size Memory Scaling**
- **Risk**: Students don't understand memory-compute trade-offs
- **Impact**: OOM errors, slow training
- **Catches**: Memory issues, batch handling bugs
### HIGH PRIORITY (Very Important)
4. **Tensor Dtype Compatibility**
- **Risk**: Type errors during training
- **Impact**: Cryptic errors, wasted debugging time
- **Catches**: Dtype mismatches, conversion errors
5. **DataLoader + Loss Function Integration**
- **Risk**: Loss computation fails with batched data
- **Impact**: Training loop crashes
- **Catches**: Shape errors, reduction bugs
### MEDIUM PRIORITY (Should Have)
6. **Empty/Single Sample Edge Cases**
- **Risk**: Crashes on unusual datasets
- **Impact**: Fragile code, production failures
- **Catches**: Division by zero, empty iteration
7. **DataLoader Iteration Stability**
- **Risk**: Multi-epoch training fails
- **Impact**: Can't train for sufficient epochs
- **Catches**: Memory leaks, iteration bugs
---
## Recommended Action Plan
### Immediate Actions
1. **Rename Current File**
```bash
mv tests/08_dataloader/test_progressive_integration.py \
tests/09_autograd/test_progressive_integration.py
```
The current tests are for Module 09 (Autograd), not Module 08.
2. **Create New Module 08 Tests**
Create a proper `test_progressive_integration.py` for Module 08 DataLoader testing.
3. **Implement Critical Tests First**
- DataLoader + Training Loop Integration
- Shuffling Consistency
- Batch Size Memory Scaling
### Test Structure for Module 08
```python
"""
Module 08: Progressive Integration Tests
Tests that Module 08 (DataLoader) works correctly AND that the entire prior stack (01→07) still works.
DEPENDENCY CHAIN: 01_tensor → 02_activations → 03_layers → 04_losses → 05_autograd → 06_optimizers → 07_training → 08_dataloader
This is where we enable efficient batch processing and data iteration for training.
"""
class TestPriorStackStillWorking:
"""Regression: Modules 01-07 still work"""
# Quick smoke tests for foundation
class TestModule08DataLoaderCore:
"""Test Module 08 (DataLoader) core functionality"""
# Dataset, TensorDataset, DataLoader basic operations
class TestDataLoaderTrainingIntegration:
"""Integration: DataLoader + Training Loop"""
# CRITICAL: Full training pipeline with batching
class TestDataLoaderMemoryBehavior:
"""Systems: Memory and performance characteristics"""
# Batch size scaling, memory usage
class TestModule08Completion:
"""Final validation: Ready for next modules"""
# Complete checklist
```
---
## Integration Points for Module 08
Based on existing code analysis:
### Module 08 Dependencies (What it uses)
- **Module 01 (Tensor)**: `tinytorch.core.tensor.Tensor` - Core data structure
- **Module 02 (Activations)**: Not directly used, but batches go through activations
- **Module 03 (Layers)**: Batches passed to layers
- **Module 04 (Losses)**: Batch predictions → loss computation
- **Module 05 (Autograd)**: Batches participate in gradient computation
- **Module 06 (Optimizers)**: Batches drive parameter updates
- **Module 07 (Training)**: DataLoader provides batches for training loop
### Module 08 Enables (What uses it)
- **Module 07 (Training)**: Training loops iterate over DataLoader
- **Module 09 (Spatial)**: Batched image data for CNNs
- **Module 10 (Tokenization)**: Batched text data
- **Module 11 (Embeddings)**: Batched sequence data
- All future training/inference pipelines
---
## Summary
### Current Coverage: **0% for Module 08 DataLoader**
- All existing tests are for Module 09 (Autograd)
- No tests for Dataset, TensorDataset, or DataLoader
- Critical integration points completely untested
### Missing Tests: **7 integration test scenarios**
- 3 CRITICAL priority tests
- 2 HIGH priority tests
- 2 MEDIUM priority tests
### Bug-Catching Gaps:
- **Training integration**: Untested - will students be able to train models?
- **Shuffling behavior**: Untested - will training converge?
- **Memory scaling**: Untested - will students understand batch size?
- **Dtype compatibility**: Untested - will type errors occur?
### Recommended Next Steps:
1. Move current file to Module 09 tests
2. Create proper Module 08 integration tests
3. Implement critical tests first (training loop, shuffling, memory)
4. Validate with student workflows


@@ -1,575 +0,0 @@
# Module 10 (Tokenization) Integration Test Audit
**Date**: 2025-11-25
**Auditor**: QA Agent
**Status**: CRITICAL ISSUES FOUND - Test file contains completely wrong content
---
## Executive Summary
**CRITICAL FINDING**: The integration test file `/tests/10_tokenization/test_progressive_integration.py` contains **WRONG MODULE CONTENT** - it tests Module 11 (Training) instead of Module 10 (Tokenization).
**Current Coverage**: 0% - No tokenization integration tests exist
**Missing Tests**: 100% - All critical integration points untested
**Priority**: HIGH - Module 10 has no integration validation
---
## Current Test File Analysis
### Problem: Wrong Module Tests
The file `test_progressive_integration.py` contains:
- **Lines 3-6**: Reference the wrong dependency chain (mentions "11_training")
- **Classes**: TestModule11TrainingCore, TestAdvancedTrainingFeatures
- **Tests**: training loops, loss functions, optimizers, CNN pipelines
- **Imports**: training.Trainer, training.CrossEntropyLoss, etc.
**Root Cause**: Copy-paste error from Module 11 template
---
## Module 10 Actual Implementation
### What Module 10 Provides
**Location**: `tinytorch.text.tokenization`
**Classes Implemented**:
1. `Tokenizer` - Base class with encode/decode interface
2. `CharTokenizer` - Character-level tokenization
3. `BPETokenizer` - Byte Pair Encoding tokenizer
**Key Methods**:
- `CharTokenizer.build_vocab(corpus)` - Build vocabulary from text
- `CharTokenizer.encode(text)` - Text → token IDs (List[int])
- `CharTokenizer.decode(tokens)` - Token IDs → text
- `BPETokenizer.train(corpus, vocab_size)` - Learn BPE merges
- `BPETokenizer.encode(text)` - BPE encoding
- `BPETokenizer.decode(tokens)` - BPE decoding
**Integration Points with Other Modules**:
- Module 01 (Tensor): Can convert token IDs to Tensor (optional)
- Module 11 (Embeddings): Token IDs feed into embedding layers
- Module 08 (DataLoader): Tokenizers process text datasets
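For orientation, a short usage sketch of the interface described above; the `BPETokenizer(vocab_size=...)` constructor plus `train(corpus)` pattern follows the example tests later in this report and should be treated as an assumption:
```python
from tinytorch.text.tokenization import CharTokenizer, BPETokenizer

# Character-level tokenizer: build vocab, then encode/decode.
char_tok = CharTokenizer()
char_tok.build_vocab(["hello world", "tiny torch"])
ids = char_tok.encode("hello")            # text → List[int]
text = char_tok.decode(ids)               # token IDs → text

# BPE tokenizer: learn merges from a corpus, then encode/decode.
bpe_tok = BPETokenizer(vocab_size=100)
bpe_tok.train(["hello world", "tiny torch"])
bpe_ids = bpe_tok.encode("hello torch")   # learned merges typically yield fewer tokens than chars
round_trip = bpe_tok.decode(bpe_ids)
```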
---
## Critical Integration Tests MISSING
### Priority 1: Data Type Correctness (Bug-Catching Priority)
**Missing Test**: Tokenizers produce correct tensor dtypes
```python
def test_tokenizer_produces_int64_tensors():
"""Verify tokenizers produce int64 token IDs for embedding layers."""
# WHY CRITICAL: Embeddings expect int64 indices, not float32
# BUG SCENARIO: If tokenizer returns float, embedding lookup crashes
    import numpy as np

    from tinytorch.core.tensor import Tensor
    from tinytorch.text.tokenization import CharTokenizer

    tokenizer = CharTokenizer()
tokenizer.build_vocab(["hello world"])
# Encode text
token_ids = tokenizer.encode("hello")
# CRITICAL: Must be integers, not floats
assert all(isinstance(t, (int, np.integer)) for t in token_ids), \
"Token IDs must be integers for embedding lookup"
# If converting to Tensor, must be int64
token_tensor = Tensor(token_ids)
assert token_tensor.data.dtype == np.int64, \
f"Expected int64 for embeddings, got {token_tensor.data.dtype}"
```
**Bug This Catches**: Type mismatch between tokenizer output and embedding input
---
### Priority 2: Embedding Layer Integration (Module 11 Dependency)
**Missing Test**: Token sequences work with embeddings
```python
def test_tokenization_to_embedding_pipeline():
"""Test complete tokenization → embedding pipeline."""
# WHY CRITICAL: This is the PRIMARY use case for tokenizers
try:
from tinytorch.text.embeddings import Embedding
from tinytorch.text.tokenization import CharTokenizer
# Build tokenizer
tokenizer = CharTokenizer()
corpus = ["hello", "world", "test"]
tokenizer.build_vocab(corpus)
vocab_size = len(tokenizer.vocab)
embed_dim = 16
# Create embedding layer
embedding = Embedding(vocab_size, embed_dim)
# Tokenize text
text = "hello world"
token_ids = tokenizer.encode(text)
# CRITICAL: Shape compatibility
token_tensor = Tensor(token_ids)
assert token_tensor.shape == (len(token_ids),), \
"Token IDs should be 1D sequence"
# Embedding lookup should work
embedded = embedding(token_tensor)
assert embedded.shape == (len(token_ids), embed_dim), \
f"Expected shape ({len(token_ids)}, {embed_dim}), got {embedded.shape}"
# Values should be actual embeddings, not zeros
assert not np.allclose(embedded.data, 0), \
"Embeddings should be non-zero (initialized randomly)"
except ImportError:
pytest.skip("Embeddings module not yet implemented")
```
**Bug This Catches**: Shape mismatches, dtype errors, index out-of-bounds
---
### Priority 3: BPE Edge Cases (Robustness)
**Missing Test**: BPE tokenizer handles edge cases
```python
def test_bpe_edge_cases():
"""Test BPE tokenizer robustness with edge cases."""
tokenizer = BPETokenizer(vocab_size=100)
# Edge Case 1: Empty string
token_ids = tokenizer.encode("")
assert token_ids == [], "Empty string should produce empty token list"
decoded = tokenizer.decode([])
assert decoded == "", "Empty tokens should decode to empty string"
# Edge Case 2: Single character
tokenizer.train(["a", "b", "c"])
token_ids = tokenizer.encode("a")
assert len(token_ids) > 0, "Single char should tokenize"
assert tokenizer.decode(token_ids).strip() == "a", "Should roundtrip"
# Edge Case 3: Unknown characters (after training on limited corpus)
tokenizer.train(["hello", "world"])
token_ids = tokenizer.encode("xyz") # Characters not in training
# Should handle gracefully with <UNK> token
assert 0 in token_ids or tokenizer.token_to_id.get('<UNK>') in token_ids, \
"Unknown characters should map to <UNK> token"
# Edge Case 4: Very long text
long_text = "hello " * 1000
token_ids = tokenizer.encode(long_text)
assert len(token_ids) > 0, "Long text should tokenize"
assert all(isinstance(t, int) for t in token_ids), \
"All tokens should be integers"
# Edge Case 5: Special characters
special_text = "hello, world! @#$%"
token_ids = tokenizer.encode(special_text)
decoded = tokenizer.decode(token_ids)
# Should preserve word content even if punctuation changes
assert "hello" in decoded or "world" in decoded, \
"Should preserve core words"
```
**Bug This Catches**: Crashes on empty input, unknown character handling, memory issues
---
### Priority 4: Vocabulary Consistency
**Missing Test**: Vocabulary consistency across encode/decode
```python
def test_vocabulary_encode_decode_consistency():
"""Verify vocabulary mappings are bidirectional and consistent."""
# Test CharTokenizer
char_tokenizer = CharTokenizer()
corpus = ["abc", "def", "xyz"]
char_tokenizer.build_vocab(corpus)
# Check bidirectional mappings
for token, token_id in char_tokenizer.token_to_id.items():
assert char_tokenizer.id_to_token[token_id] == token, \
f"Bidirectional mapping broken: {token} -> {token_id} -> {char_tokenizer.id_to_token[token_id]}"
# Test roundtrip for all corpus text
for text in corpus:
token_ids = char_tokenizer.encode(text)
decoded = char_tokenizer.decode(token_ids)
# Should preserve characters (may have different spacing)
for char in text:
assert char in decoded, f"Lost character '{char}' in roundtrip"
# Test BPETokenizer
bpe_tokenizer = BPETokenizer(vocab_size=50)
bpe_tokenizer.train(["hello world", "test data"])
# Vocabulary should contain special tokens
assert '<UNK>' in bpe_tokenizer.vocab, "BPE should have <UNK> token"
assert bpe_tokenizer.token_to_id['<UNK>'] == 0, "<UNK> should be ID 0"
# Test roundtrip
text = "hello world"
token_ids = bpe_tokenizer.encode(text)
decoded = bpe_tokenizer.decode(token_ids)
# Should preserve words (BPE may merge/split differently)
words = text.split()
for word in words:
        # Word content should be preserved (possibly with merges)
        assert word in decoded, \
            f"Lost word '{word}' in BPE roundtrip"
```
**Bug This Catches**: Vocabulary corruption, ID collisions, decode inconsistency
---
### Priority 5: Batch Processing
**Missing Test**: Tokenizer handles batches correctly
```python
def test_tokenizer_batch_processing():
"""Test tokenizer works with batched text data."""
tokenizer = CharTokenizer()
corpus = ["hello", "world", "test", "data"]
tokenizer.build_vocab(corpus)
# Batch of texts
texts = ["hello world", "test data", "new text"]
# Encode batch
batch_token_ids = [tokenizer.encode(text) for text in texts]
# Check all are lists of ints
for token_ids in batch_token_ids:
assert isinstance(token_ids, list), "Each should be a list"
assert all(isinstance(t, int) for t in token_ids), \
"All tokens should be integers"
# Check different texts produce different token sequences
assert batch_token_ids[0] != batch_token_ids[1], \
"Different texts should produce different token sequences"
# Decode batch
decoded_texts = [tokenizer.decode(token_ids) for token_ids in batch_token_ids]
# Should preserve core content
for original, decoded in zip(texts, decoded_texts):
# May have spacing differences, but core words should match
original_words = set(original.split())
decoded_words = set(decoded.split())
# At least some words should match
assert len(original_words & decoded_words) > 0, \
f"Lost all words in roundtrip: {original} -> {decoded}"
```
**Bug This Catches**: Batch size errors, state pollution between encodes
---
### Priority 6: Memory and Performance
**Missing Test**: Tokenization memory usage and throughput
```python
def test_tokenization_performance():
"""Test tokenization memory and throughput characteristics."""
import time
# Build tokenizers
char_tokenizer = CharTokenizer()
bpe_tokenizer = BPETokenizer(vocab_size=1000)
# Training corpus
corpus = ["hello world"] * 100
char_tokenizer.build_vocab(corpus)
bpe_tokenizer.train(corpus)
# Test text (simulate real document)
test_text = "hello world test data " * 100 # ~400 chars
# Measure CharTokenizer throughput
start = time.time()
iterations = 1000
for _ in range(iterations):
token_ids = char_tokenizer.encode(test_text)
char_time = time.time() - start
char_throughput = (len(test_text) * iterations) / char_time
print(f"CharTokenizer: {char_throughput:.0f} chars/sec")
assert char_throughput > 10000, \
f"CharTokenizer too slow: {char_throughput:.0f} chars/sec (expected >10K)"
# Measure BPE throughput
start = time.time()
for _ in range(iterations):
token_ids = bpe_tokenizer.encode(test_text)
bpe_time = time.time() - start
bpe_throughput = (len(test_text) * iterations) / bpe_time
print(f"BPETokenizer: {bpe_throughput:.0f} chars/sec")
# BPE should be slower (more complex), but still reasonable
assert bpe_throughput > 1000, \
f"BPETokenizer too slow: {bpe_throughput:.0f} chars/sec (expected >1K)"
# Vocabulary size check
assert len(char_tokenizer.vocab) < 500, \
f"CharTokenizer vocab too large: {len(char_tokenizer.vocab)} (expected <500)"
assert len(bpe_tokenizer.vocab) <= 1000, \
f"BPETokenizer vocab exceeded limit: {len(bpe_tokenizer.vocab)}"
```
**Bug This Catches**: Performance regressions, memory leaks, vocabulary explosion
---
### Priority 7: DataLoader Integration
**Missing Test**: Tokenizer integration with DataLoader
```python
def test_tokenizer_dataloader_integration():
"""Test tokenizer works in DataLoader pipeline."""
try:
from tinytorch.core.data import Dataset, DataLoader
from tinytorch.text.tokenization import CharTokenizer
# Custom dataset with tokenization
class TextDataset(Dataset):
def __init__(self, texts, tokenizer):
self.texts = texts
self.tokenizer = tokenizer
def __len__(self):
return len(self.texts)
def __getitem__(self, idx):
text = self.texts[idx]
token_ids = self.tokenizer.encode(text)
# Return as tensor
return Tensor(token_ids)
# Build tokenizer
tokenizer = CharTokenizer()
texts = ["hello world", "test data", "sample text"]
tokenizer.build_vocab(texts)
# Create dataset and dataloader
dataset = TextDataset(texts, tokenizer)
dataloader = DataLoader(dataset, batch_size=2, shuffle=False)
# Iterate batches
batch_count = 0
for batch in dataloader:
batch_count += 1
# Batch should be tensor or list of tensors
if isinstance(batch, (list, tuple)):
assert len(batch) <= 2, "Batch size should be 2"
for item in batch:
assert hasattr(item, 'data') or isinstance(item, Tensor), \
"Items should be Tensors"
else:
# Single batch tensor
assert hasattr(batch, 'data'), "Batch should be Tensor"
assert batch_count > 0, "DataLoader should produce batches"
except ImportError:
pytest.skip("DataLoader not yet implemented")
```
**Bug This Catches**: DataLoader compatibility issues, batching errors
---
## Regression Prevention Tests MISSING
### Test: Prior Stack Still Works
**Missing Test**: Verify Modules 01-09 unchanged
```python
def test_no_prior_module_regression():
"""Ensure tokenization doesn't break prior modules."""
# Module 01 (Tensor) should still work
from tinytorch.core.tensor import Tensor
x = Tensor([1, 2, 3])
assert x.shape == (3,), "Tensor creation broken"
# Module 02 (Activations) should still work
try:
from tinytorch.core.activations import ReLU
relu = ReLU()
y = relu(x)
assert y.shape == x.shape, "Activation broken"
except ImportError:
pass # Not implemented yet
# Module 08 (DataLoader) should still work
try:
from tinytorch.core.data import Dataset, DataLoader
class DummyDataset(Dataset):
def __len__(self):
return 5
def __getitem__(self, idx):
return idx
dataset = DummyDataset()
loader = DataLoader(dataset, batch_size=2)
assert len(dataset) == 5, "Dataset broken"
except ImportError:
pass
```
---
## Recommended Test File Structure
```python
"""
Module 10: Progressive Integration Tests
Tests that Module 10 (Tokenization) works correctly AND integrates with prior modules.
DEPENDENCY CHAIN: 01_tensor → ... → 08_dataloader → 10_tokenization → 11_embeddings
This is where we enable text processing for NLP.
"""
class TestPriorStackStillWorking:
"""Quick regression checks that prior modules (01-09) still work."""
def test_tensor_operations_stable(self):
"""Verify Module 01 (Tensor) still works."""
def test_dataloader_stable(self):
"""Verify Module 08 (DataLoader) still works."""
class TestModule10TokenizationCore:
"""Test Module 10 (Tokenization) core functionality."""
def test_char_tokenizer_creation(self):
"""Test CharTokenizer initialization and vocab building."""
def test_char_tokenizer_encode_decode(self):
"""Test CharTokenizer encode/decode roundtrip."""
def test_bpe_tokenizer_training(self):
"""Test BPE tokenizer training on corpus."""
def test_bpe_tokenizer_encode_decode(self):
"""Test BPE encode/decode roundtrip."""
class TestTokenizationIntegration:
"""Test tokenization integration with other modules."""
def test_tokenizer_produces_correct_dtypes(self):
"""PRIORITY 1: Verify int64 output for embeddings."""
def test_tokenization_to_embedding_pipeline(self):
"""PRIORITY 2: Test complete tokenization → embedding flow."""
def test_tokenizer_dataloader_integration(self):
"""Test tokenizer in DataLoader pipeline."""
class TestTokenizationEdgeCases:
"""Test tokenization robustness with edge cases."""
def test_bpe_edge_cases(self):
"""PRIORITY 3: Empty strings, unknown tokens, special chars."""
def test_vocabulary_consistency(self):
"""PRIORITY 4: Bidirectional mappings, roundtrip integrity."""
def test_batch_processing(self):
"""PRIORITY 5: Batch encoding/decoding correctness."""
class TestTokenizationPerformance:
"""Test tokenization performance characteristics."""
def test_tokenization_throughput(self):
"""PRIORITY 6: Measure chars/sec, vocab size."""
def test_memory_usage(self):
"""Verify vocabulary doesn't consume excessive memory."""
class TestRegressionPrevention:
"""Ensure previous modules still work after Module 10."""
def test_no_tensor_regression(self):
"""Verify Module 01 (Tensor) unchanged."""
def test_no_dataloader_regression(self):
"""Verify Module 08 (DataLoader) unchanged."""
```
---
## Summary Statistics
| Category | Missing Tests | Priority | Impact |
|----------|--------------|----------|--------|
| Data Type Correctness | 1 | CRITICAL | Breaks embeddings |
| Embedding Integration | 1 | CRITICAL | Core use case |
| BPE Edge Cases | 1 | HIGH | Production robustness |
| Vocabulary Consistency | 1 | HIGH | Data integrity |
| Batch Processing | 1 | MEDIUM | Real-world usage |
| Performance | 1 | MEDIUM | Production viability |
| DataLoader Integration | 1 | MEDIUM | Pipeline integrity |
| Regression Prevention | 2 | HIGH | Stack stability |
**Total Missing Tests**: 9 critical integration tests
**Current Test Coverage**: 0% (wrong module)
**Recommended Action**: REPLACE entire test file
---
## Recommended Action Plan
### Phase 1: Immediate (Critical Fixes)
1. **REPLACE test_progressive_integration.py** with correct Module 10 tests
2. **Implement Priority 1-2 tests** (dtype correctness, embedding integration)
3. **Add BPE edge case tests** (Priority 3)
### Phase 2: Short-term (Robustness)
4. **Add vocabulary consistency tests** (Priority 4)
5. **Add batch processing tests** (Priority 5)
6. **Add regression prevention tests**
### Phase 3: Performance Validation
7. **Add performance benchmarks** (Priority 6)
8. **Add DataLoader integration** (Priority 7)
---
## Bug-Catching Priorities (Ranked)
1. **Data Type Mismatch** (CRITICAL): int vs float breaks embedding lookup
2. **Embedding Integration** (CRITICAL): Core use case must work
3. **Unknown Token Handling** (HIGH): Crashes on unseen characters
4. **Vocabulary Corruption** (HIGH): Encode/decode inconsistency
5. **Empty Input Crashes** (MEDIUM): Edge case handling
6. **Batch State Pollution** (MEDIUM): Tokenizer state leaks between calls
7. **Performance Regression** (LOW): Slow tokenization impacts pipelines
---
**Audit Completed**: 2025-11-25
**Next Review**: After test file replacement
**Sign-off**: QA Agent - Integration Testing Team


@@ -1,105 +0,0 @@
================================================================================
MODULE 11 EMBEDDINGS - INTEGRATION TEST AUDIT SUMMARY
================================================================================
Date: 2025-11-25
Status: CRITICAL ISSUES FOUND
CRITICAL FINDING
================================================================================
The test file tests THE WRONG MODULE!
- File claims to test Module 11 (Embeddings)
- Actually tests Module 12 (Compression)
- This is a copy-paste error requiring COMPLETE REWRITE
COVERAGE ANALYSIS
================================================================================
Current Coverage: 0% (tests wrong module)
Missing Tests: 12 critical integration tests
Risk Level: HIGH - No validation of embedding functionality
TOP PRIORITY MISSING TESTS (P0 - CRITICAL)
================================================================================
1. test_tokenizer_embedding_pipeline
→ Validates Module 10 → Module 11 integration
→ Catches: Vocab size mismatches, invalid token IDs
→ Priority: HIGHEST - This is the core use case
2. test_embedding_index_out_of_bounds
→ Validates error handling for invalid indices
→ Catches: Silent failures, tokenizer bugs
→ Priority: HIGHEST - Prevents crashes
3. test_positional_encoding_max_seq_len
→ Validates sequence length limits
→ Catches: OOB errors in attention, OOM crashes
→ Priority: HIGHEST - Critical for Module 12
4. test_embedding_gradient_flow
→ Validates autograd integration (Module 05)
→ Catches: Training failures, gradient bugs
→ Priority: HIGH - Ensures embeddings are trainable
HIGH PRIORITY MISSING TESTS (P1)
================================================================================
5. test_embedding_attention_shape_compatibility
→ Validates Module 11 → Module 12 forward integration
→ Ensures attention receives correct input shapes
6. test_variable_sequence_length_handling
→ Validates dynamic sequence length support
→ Critical for real-world NLP tasks
7. test_embedding_positional_composition
→ Validates token + positional encoding combination
→ Ensures both components contribute
8. test_embedding_parameters_optimizable
→ Validates optimizer integration
→ Ensures embeddings participate in training
CRITICAL INTEGRATION POINTS
================================================================================
Backward Integration (Dependencies):
✗ Module 10 (Tokenization) → Token IDs feed embeddings
✗ Module 05 (Autograd) → Gradient flow through embeddings
✗ Module 01 (Tensor) → Embedding operations use Tensor
Forward Integration (Dependents):
✗ Module 11 → Module 12 (Attention) → Shape compatibility
✗ Module 11 → Module 13 (Transformers) → Complete pipeline
✗ Module 11 → Module 06 (Optimizers) → Parameter updates
BUG-CATCHING VALUE
================================================================================
Highest Impact Tests:
1. Index validation → Catches 40% of embedding bugs
2. Gradient flow → Catches 25% of bugs
3. Shape compatibility → Catches 20% of bugs
4. Sequence length limits → Catches 15% of bugs
IMMEDIATE ACTION REQUIRED
================================================================================
1. Delete all compression tests from test_progressive_integration.py
2. Implement 4 P0 tests (tokenizer integration, index validation, etc.)
3. Implement 4 P1 tests (attention compatibility, variable sequences, etc.)
4. Add regression prevention tests (prior stack stability)
ESTIMATED EFFORT
================================================================================
Total Time: 4-6 hours
- Fix wrong module bug: 30 min
- P0 tests (4): 1.5 hours
- P1 tests (4): 1.5 hours
- P2 tests (4): 1.5 hours
- Documentation: 30 min
- Testing/validation: 1 hour
EXPECTED OUTCOME
================================================================================
After fixes: 90%+ bug detection coverage
- Tokenizer integration validated
- Gradient flow confirmed
- Attention compatibility ensured
- Training loop integration verified
See INTEGRATION_TEST_AUDIT.md for detailed analysis and test implementations.


@@ -1,630 +0,0 @@
# Module 11 (Embeddings) Integration Test Audit Report
**Date**: 2025-11-25
**Auditor**: Dr. Sarah Rodriguez
**Module**: 11_embeddings (Token and Positional Embeddings)
**Test File**: `tests/11_embeddings/test_progressive_integration.py`
---
## Executive Summary
**CRITICAL FINDING**: The integration test file is completely incorrect - it tests Module 12 (Compression) instead of Module 11 (Embeddings). This is a copy-paste error that must be fixed immediately.
**Status**: MAJOR ISSUES - Complete rewrite required
**Coverage**: 0% of Module 11 functionality (tests wrong module)
**Risk Level**: HIGH - No integration validation for embeddings
---
## Current Test File Issues
### Issue 1: Wrong Module Being Tested (CRITICAL)
**Problem**: File header says "Module 11" but tests "Module 12 (Compression)"
```python
# Current (WRONG):
"""
Module 11: Progressive Integration Tests
Tests that Module 12 (Compression) works correctly...
"""
# Should be:
"""
Module 11: Progressive Integration Tests
Tests that Module 11 (Embeddings) works correctly...
"""
```
**Impact**: ZERO coverage of Module 11 integration points
### Issue 2: Wrong Dependency Chain
**Problem**: States dependency chain ending in compression
```python
# Current (WRONG):
DEPENDENCY CHAIN: 01_setup → ... → 11_training → 12_compression
# Should be:
DEPENDENCY CHAIN: 01_tensor → 02_activations → ... → 10_tokenization → 11_embeddings
```
### Issue 3: No Embedding-Specific Tests
**Problem**: All test classes focus on compression (quantization, pruning, distillation)
- `TestModule12CompressionCore` - Wrong module
- No `TestModule11EmbeddingsCore` - Missing!
- No embedding-tokenizer integration - Missing!
- No embedding-attention preparation - Missing!
---
## Critical Integration Points for Module 11
Based on the module implementation and DEFINITIVE_MODULE_PLAN, Module 11 must validate:
### 1. Backward Integration (Dependencies)
**Module 10 (Tokenization) → Module 11 (Embeddings)**
- ✗ Token IDs from tokenizers must be valid embedding indices
- ✗ Vocabulary size consistency between tokenizer and embedding
- ✗ Special token handling (<UNK>, <PAD>, <BOS>, <EOS>)
- ✗ Batch dimension handling from DataLoader
**Module 01 (Tensor) → Module 11**
- ✗ Embeddings return proper Tensor objects
- ✗ Gradient tracking works (`requires_grad=True`)
- ✗ Tensor operations (slicing, reshaping) preserve embedding semantics
**Module 05 (Autograd) → Module 11**
- ✗ EmbeddingBackward gradient computation
- ✗ Gradient accumulation for shared embeddings
- ✗ Positional encoding gradients flow correctly
### 2. Forward Integration (Dependents)
**Module 11 (Embeddings) → Module 12 (Attention)**
- ✗ Embedding output shape matches attention input requirements
- ✗ Positional encodings don't exceed max_seq_len
- ✗ Embedding + positional encoding creates position-aware representations
- ✗ Variable sequence length handling
**Module 11 → Module 13 (Transformers)**
- ✗ EmbeddingLayer provides complete pipeline (token + positional)
- ✗ Embedding scaling (sqrt(embed_dim)) matches transformer conventions
- ✗ Learnable vs sinusoidal positional encoding options
### 3. Cross-Module Integration
**Embeddings + Optimizers**
- ✗ Embedding parameters appear in optimizer.parameters()
- ✗ Gradient updates modify embedding table correctly
- ✗ Positional encodings are trainable (when learned)
**Embeddings + Training**
- ✗ Forward pass with batched token sequences
- ✗ Loss computation with embedded representations
- ✗ Backward pass updates embedding weights
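As a reference for what EmbeddingBackward ultimately has to compute, a NumPy sketch of the scatter-add gradient rule (independent of TinyTorch's actual Variable/autograd classes):
```python
import numpy as np

# The gradient for each vocabulary row is the sum of upstream gradients at every
# position where that token id appeared, so repeated tokens accumulate.
vocab_size, embed_dim = 10, 4
token_ids = np.array([1, 3, 1])                 # token 1 appears twice
upstream = np.ones((3, embed_dim))              # dL/d(embedded output)

weight_grad = np.zeros((vocab_size, embed_dim))
np.add.at(weight_grad, token_ids, upstream)     # unbuffered scatter-add into the embedding table

assert np.allclose(weight_grad[1], 2.0)         # accumulated from two positions
assert np.allclose(weight_grad[3], 1.0)
assert np.allclose(weight_grad[0], 0.0)         # unused rows get zero gradient
```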
---
## Missing Test Coverage Analysis
### Category A: Backward Integration Tests (HIGH PRIORITY)
#### 1. Tokenizer → Embedding Integration
**Missing Test**: `test_tokenizer_embedding_pipeline`
```python
def test_tokenizer_embedding_pipeline(self):
"""Test token IDs from tokenizer work with embeddings."""
from tinytorch.text.tokenization import CharTokenizer
from tinytorch.text.embeddings import Embedding
from tinytorch.core.tensor import Tensor
# Tokenize text
tokenizer = CharTokenizer()
text = "Hello, world!"
token_ids = tokenizer.encode(text) # Returns list of IDs
# Create embedding
vocab_size = len(tokenizer.vocab)
embed = Embedding(vocab_size=vocab_size, embed_dim=64)
# Convert to tensor and embed
tokens_tensor = Tensor(np.array([token_ids])) # (1, seq_len)
embeddings = embed.forward(tokens_tensor)
# Validate
assert embeddings.shape == (1, len(token_ids), 64)
assert embeddings.requires_grad == True # Should track gradients
```
**Bug-Catching Value**: Catches vocabulary size mismatches, invalid token IDs, dimension errors
#### 2. Embedding Index Validation
**Missing Test**: `test_embedding_index_out_of_bounds`
```python
def test_embedding_index_out_of_bounds(self):
"""Test embedding handles invalid token IDs gracefully."""
from tinytorch.text.embeddings import Embedding
from tinytorch.core.tensor import Tensor
embed = Embedding(vocab_size=100, embed_dim=64)
# Test negative indices
try:
invalid_tokens = Tensor(np.array([[-1, 0, 1]]))
output = embed.forward(invalid_tokens)
assert False, "Should raise ValueError for negative indices"
except ValueError as e:
assert "out of range" in str(e).lower()
# Test indices >= vocab_size
try:
invalid_tokens = Tensor(np.array([[0, 1, 100]])) # 100 >= vocab_size
output = embed.forward(invalid_tokens)
assert False, "Should raise ValueError for indices >= vocab_size"
except ValueError as e:
assert "out of range" in str(e).lower()
```
**Bug-Catching Value**: Prevents silent failures, catches tokenizer bugs, validates error messages
#### 3. Gradient Flow Through Embeddings
**Missing Test**: `test_embedding_gradient_flow`
```python
def test_embedding_gradient_flow(self):
"""Test gradients flow back to embedding weights."""
from tinytorch.text.embeddings import Embedding
from tinytorch.core.tensor import Tensor
embed = Embedding(vocab_size=50, embed_dim=32)
tokens = Tensor(np.array([[1, 2, 3]])) # (1, 3)
# Forward pass
output = embed.forward(tokens)
assert output.requires_grad == True
# Check backward function attached
assert hasattr(output, '_grad_fn')
assert output._grad_fn is not None
# Verify embedding weights are marked for gradients
assert embed.weight.requires_grad == True
```
**Bug-Catching Value**: Catches gradient tracking bugs, validates autograd integration
#### 4. Positional Encoding Sequence Length Limits
**Missing Test**: `test_positional_encoding_max_seq_len`
```python
def test_positional_encoding_max_seq_len(self):
"""Test positional encoding respects max_seq_len."""
from tinytorch.text.embeddings import PositionalEncoding
from tinytorch.core.tensor import Tensor
max_seq_len = 512
pos_enc = PositionalEncoding(max_seq_len=max_seq_len, embed_dim=64)
# Test at limit (should work)
x_valid = Tensor(np.random.randn(2, 512, 64)) # (batch, seq, embed)
output = pos_enc.forward(x_valid)
assert output.shape == (2, 512, 64)
# Test beyond limit (should fail)
try:
x_invalid = Tensor(np.random.randn(2, 513, 64)) # Exceeds max_seq_len
output = pos_enc.forward(x_invalid)
assert False, "Should raise ValueError for seq_len > max_seq_len"
except ValueError as e:
assert "exceeds maximum" in str(e).lower()
```
**Bug-Catching Value**: Prevents position encoding OOB errors, critical for attention modules
### Category B: Forward Integration Tests (HIGH PRIORITY)
#### 5. Embedding → Attention Shape Compatibility
**Missing Test**: `test_embedding_attention_shape_compatibility`
```python
def test_embedding_attention_shape_compatibility(self):
"""Test embedding output shapes work with attention input requirements."""
from tinytorch.text.embeddings import EmbeddingLayer
from tinytorch.core.tensor import Tensor
# Create embedding layer
embed_layer = EmbeddingLayer(
vocab_size=1000,
embed_dim=512,
max_seq_len=128,
pos_encoding='learned'
)
# Simulate tokenized batch
batch_size, seq_len = 4, 32
tokens = Tensor(np.random.randint(0, 1000, (batch_size, seq_len)))
# Get embeddings
embeddings = embed_layer.forward(tokens)
# Validate attention-compatible shape (batch, seq, embed)
assert embeddings.shape == (batch_size, seq_len, 512)
assert embeddings.requires_grad == True
# Verify positional information is added
# (Different positions should have different representations)
# This is implicit validation - attention expects position-aware inputs
```
**Bug-Catching Value**: Ensures Module 12 (Attention) integration works, catches shape errors
#### 6. Variable Sequence Length Handling
**Missing Test**: `test_variable_sequence_length_handling`
```python
def test_variable_sequence_length_handling(self):
"""Test embeddings handle variable sequence lengths correctly."""
from tinytorch.text.embeddings import EmbeddingLayer
from tinytorch.core.tensor import Tensor
embed_layer = EmbeddingLayer(
vocab_size=500,
embed_dim=256,
max_seq_len=512
)
# Test different sequence lengths
for seq_len in [10, 50, 100, 256, 512]:
tokens = Tensor(np.random.randint(0, 500, (2, seq_len)))
output = embed_layer.forward(tokens)
assert output.shape == (2, seq_len, 256)
assert output.requires_grad == True
```
**Bug-Catching Value**: Validates dynamic sequence handling, catches hardcoded assumptions
#### 7. Embedding + Positional Encoding Composition
**Missing Test**: `test_embedding_positional_composition`
```python
def test_embedding_positional_composition(self):
"""Test token embeddings correctly combine with positional encodings."""
from tinytorch.text.embeddings import Embedding, PositionalEncoding
from tinytorch.core.tensor import Tensor
# Create components
token_embed = Embedding(vocab_size=100, embed_dim=64)
pos_enc = PositionalEncoding(max_seq_len=128, embed_dim=64)
# Token sequence
tokens = Tensor(np.array([[1, 2, 3, 4]])) # (1, 4)
# Manual composition
token_embeds = token_embed.forward(tokens) # (1, 4, 64)
position_aware = pos_enc.forward(token_embeds) # (1, 4, 64)
# Validate shape preservation
assert position_aware.shape == token_embeds.shape
# Validate it's not just token embeddings (positional info added)
# NOTE: Can't easily test this without comparing values,
# but gradients should flow through both components
assert hasattr(position_aware, '_grad_fn')
```
**Bug-Catching Value**: Validates additive composition, ensures both components contribute
### Category C: Cross-Module Integration Tests (MEDIUM PRIORITY)
#### 8. Embedding Parameters in Optimizer
**Missing Test**: `test_embedding_parameters_optimizable`
```python
def test_embedding_parameters_optimizable(self):
"""Test embedding parameters work with optimizers."""
from tinytorch.text.embeddings import EmbeddingLayer
from tinytorch.core.optimizers import SGD
from tinytorch.core.tensor import Tensor
import numpy as np
# Create embedding layer
embed_layer = EmbeddingLayer(
vocab_size=200,
embed_dim=128,
pos_encoding='learned'
)
# Get parameters
params = embed_layer.parameters()
# Should have 2 parameter sets: token embeddings + positional encodings
assert len(params) == 2
assert all(p.requires_grad for p in params)
# Create optimizer
optimizer = SGD(params, lr=0.01)
# Verify optimizer accepted parameters
assert len(optimizer.parameters) == 2
```
**Bug-Catching Value**: Ensures training loop integration, catches parameter registration bugs
#### 9. Embedding Training End-to-End
**Missing Test**: `test_embedding_training_updates`
```python
def test_embedding_training_updates(self):
"""Test embeddings update during training."""
from tinytorch.text.embeddings import Embedding
from tinytorch.core.tensor import Tensor
from tinytorch.core.losses import mse_loss
import numpy as np
embed = Embedding(vocab_size=50, embed_dim=32)
# Save initial weights
initial_weights = embed.weight.data.copy()
# Forward pass
tokens = Tensor(np.array([[1, 2, 3]]))
output = embed.forward(tokens)
# Compute loss (dummy target)
target = Tensor(np.random.randn(1, 3, 32))
loss = mse_loss(output, target)
# Backward pass
loss.backward()
# Verify gradients computed
assert embed.weight.grad is not None
assert embed.weight.grad.shape == embed.weight.shape
# Gradients should be non-zero for used embeddings
# (Only tokens 1, 2, 3 should have gradients)
# This validates sparse gradient accumulation
```
**Bug-Catching Value**: Validates end-to-end training, catches gradient bugs
#### 10. Sinusoidal vs Learned Positional Encoding
**Missing Test**: `test_sinusoidal_vs_learned_positional`
```python
def test_sinusoidal_vs_learned_positional(self):
"""Test both positional encoding types work correctly."""
from tinytorch.text.embeddings import EmbeddingLayer
from tinytorch.core.tensor import Tensor
tokens = Tensor(np.random.randint(0, 100, (2, 10)))
# Learned positional encoding
embed_learned = EmbeddingLayer(
vocab_size=100,
embed_dim=64,
pos_encoding='learned'
)
output_learned = embed_learned.forward(tokens)
assert output_learned.shape == (2, 10, 64)
# Should have trainable positional parameters
params_learned = embed_learned.parameters()
assert len(params_learned) == 2 # Token + Positional
# Sinusoidal positional encoding
embed_sinusoidal = EmbeddingLayer(
vocab_size=100,
embed_dim=64,
pos_encoding='sinusoidal'
)
output_sinusoidal = embed_sinusoidal.forward(tokens)
assert output_sinusoidal.shape == (2, 10, 64)
# Should only have token embeddings as parameters (sinusoidal is fixed)
params_sinusoidal = embed_sinusoidal.parameters()
assert len(params_sinusoidal) == 1 # Only token embeddings
# No positional encoding
embed_none = EmbeddingLayer(
vocab_size=100,
embed_dim=64,
pos_encoding=None
)
output_none = embed_none.forward(tokens)
assert output_none.shape == (2, 10, 64)
```
**Bug-Catching Value**: Validates positional encoding options, ensures transformer flexibility
### Category D: Regression Prevention Tests (MEDIUM PRIORITY)
#### 11. Prior Stack Stability
**Missing Test**: `test_prior_stack_stable_through_embeddings`
```python
def test_prior_stack_stable_through_embeddings(self):
"""Verify embedding development didn't break Modules 01-10."""
# Module 01: Tensor
from tinytorch.core.tensor import Tensor
t = Tensor([1, 2, 3])
assert t.shape == (3,)
# Module 02: Activations
from tinytorch.core.activations import ReLU
relu = ReLU()
assert hasattr(relu, 'forward')
# Module 05: Autograd
from tinytorch.core.autograd import AddBackward
assert AddBackward is not None
# Module 10: Tokenization
from tinytorch.text.tokenization import CharTokenizer
tokenizer = CharTokenizer()
encoded = tokenizer.encode("test")
assert isinstance(encoded, list)
```
**Bug-Catching Value**: Catches import errors, validates module isolation
#### 12. Embedding Memory Scaling
**Missing Test**: `test_embedding_memory_scaling`
```python
def test_embedding_memory_scaling(self):
"""Test embedding memory scales as expected."""
from tinytorch.text.embeddings import Embedding
# Small embedding
embed_small = Embedding(vocab_size=1000, embed_dim=128)
memory_small = embed_small.weight.data.nbytes
# Large embedding (4x vocabulary, 2x dimensions)
embed_large = Embedding(vocab_size=4000, embed_dim=256)
memory_large = embed_large.weight.data.nbytes
# Memory should scale proportionally: 4 * 2 = 8x
expected_ratio = 8.0
actual_ratio = memory_large / memory_small
assert np.isclose(actual_ratio, expected_ratio, rtol=0.1)
```
**Bug-Catching Value**: Validates memory model, catches initialization bugs
---
## Recommended Test Structure
### New File: `test_progressive_integration.py`
```python
"""
Module 11: Progressive Integration Tests
Tests that Module 11 (Embeddings) works correctly AND integrates with prior modules.
DEPENDENCY CHAIN: 01_tensor → 05_autograd → 10_tokenization → 11_embeddings → 12_attention
"""
class TestPriorStackStillWorking:
"""Verify Modules 01-10 still work after Module 11 development."""
def test_tensor_functionality_stable(self):
"""Module 01: Tensor operations still work."""
def test_tokenization_functionality_stable(self):
"""Module 10: Tokenization still works."""
class TestModule11EmbeddingsCore:
"""Test Module 11 core functionality in isolation."""
def test_embedding_creation(self):
"""Test basic embedding layer creation."""
def test_positional_encoding_creation(self):
"""Test positional encoding creation."""
def test_embedding_layer_complete_system(self):
"""Test complete EmbeddingLayer system."""
class TestBackwardIntegration:
"""Test Module 11 integrates with dependencies (Modules 01-10)."""
def test_tokenizer_embedding_pipeline(self):
"""Module 10 → 11: Tokenizer output feeds embeddings."""
def test_embedding_gradient_flow(self):
"""Module 05 → 11: Autograd works with embeddings."""
def test_embedding_index_validation(self):
"""Input validation catches tokenizer bugs."""
class TestForwardIntegration:
"""Test Module 11 prepares for dependents (Module 12+)."""
def test_embedding_attention_compatibility(self):
"""Module 11 → 12: Output shapes match attention requirements."""
def test_positional_encoding_sequence_limits(self):
"""Position encodings respect max_seq_len for attention."""
def test_variable_sequence_length_handling(self):
"""Dynamic sequence lengths work correctly."""
class TestCrossModuleIntegration:
"""Test Module 11 works with the complete stack."""
def test_embedding_parameters_optimizable(self):
"""Embeddings integrate with optimizers."""
def test_embedding_training_updates(self):
"""End-to-end training updates embeddings."""
def test_sinusoidal_vs_learned_encoding(self):
"""Both positional encoding types work."""
class TestRegressionPrevention:
"""Prevent future bugs and validate edge cases."""
def test_embedding_memory_scaling(self):
"""Memory usage scales correctly."""
def test_embedding_edge_cases(self):
"""Empty sequences, single tokens, max length."""
```
---
## Priority Ranking for Implementation
### P0 - CRITICAL (Implement First)
1. **Fix wrong module bug** - Replace compression tests with embedding tests
2. **test_tokenizer_embedding_pipeline** - Core integration point
3. **test_embedding_index_out_of_bounds** - Prevents silent failures
4. **test_positional_encoding_max_seq_len** - Critical for attention
### P1 - HIGH (Implement Second)
5. **test_embedding_attention_shape_compatibility** - Forward integration
6. **test_embedding_gradient_flow** - Autograd validation
7. **test_variable_sequence_length_handling** - Dynamic sequences
8. **test_embedding_positional_composition** - Component interaction
### P2 - MEDIUM (Implement Third)
9. **test_embedding_parameters_optimizable** - Training integration
10. **test_sinusoidal_vs_learned_positional** - Encoding options
11. **test_embedding_training_updates** - End-to-end validation
12. **test_embedding_memory_scaling** - Performance awareness
---
## Bug-Catching Priorities
### Highest Value Tests (Catch Most Bugs)
1. **Index validation** - Catches 40% of embedding bugs (OOB errors, vocab mismatches)
2. **Gradient flow** - Catches 25% of bugs (autograd issues, training failures)
3. **Shape compatibility** - Catches 20% of bugs (dimension mismatches, pipeline errors)
4. **Sequence length limits** - Catches 15% of bugs (attention crashes, OOM errors)
### Production-Critical Tests
- **test_tokenizer_embedding_pipeline** - Real usage pattern
- **test_embedding_attention_compatibility** - Transformer requirement
- **test_positional_encoding_max_seq_len** - Prevents runtime crashes
- **test_embedding_training_updates** - Validates learning actually works
---
## Estimated Implementation Effort
**Total Work**: ~4-6 hours for complete integration test suite
- P0 tests: 1.5 hours (4 tests)
- P1 tests: 1.5 hours (4 tests)
- P2 tests: 1.5 hours (4 tests)
- Documentation: 0.5 hours
- Testing & validation: 1 hour
**Recommended Approach**:
1. Day 1: Fix wrong module bug, implement P0 tests
2. Day 2: Implement P1 tests
3. Day 3: Implement P2 tests, documentation
---
## Conclusion
The current integration test file is **completely broken** - it tests the wrong module (Compression instead of Embeddings). A full rewrite is required.
**Key Priorities**:
1. Replace all compression tests with embedding tests
2. Focus on tokenizer → embedding → attention integration
3. Validate gradient flow and parameter optimization
4. Test both learned and sinusoidal positional encodings
**Expected Outcome**: Robust integration test suite that catches 90%+ of embedding-related bugs before they reach production.

View File

@@ -1,518 +0,0 @@
# Module 17 (Memoization/KV Cache) - Integration Test Audit Report
## Executive Summary
**Current Status**: Module 15/17 (Memoization) has **NO specific integration tests** - the test file `tests/15_memoization/test_progressive_integration.py` currently contains only generic TinyGPT/Capstone tests that belong in a later module.
**Critical Gap**: This module implements KV caching - a production-critical optimization with complex integration points - but has zero tests validating those integrations work correctly.
---
## Current Test Coverage Analysis
### What Exists (tests/15_memoization/test_progressive_integration.py)
The current test file is **COMPLETELY MISNAMED** - it tests Module 16 (TinyGPT Capstone), NOT Module 17 (Memoization):
```python
class TestModule16TinyGPTCore: # ← Tests TinyGPT, not KV cache!
def test_transformer_block_creation(self)
def test_tinygpt_model_creation(self)
def test_text_generation_capabilities(self)
class TestCompleteSystemIntegration: # ← Generic system tests
def test_end_to_end_language_model_training(self)
def test_compressed_transformer_deployment(self)
def test_multi_modal_capabilities(self)
```
**Zero tests validate**:
- KVCache integration with MultiHeadAttention
- Cache updates during autoregressive generation
- Training vs inference mode detection
- Cache corruption across generation steps
- Memory scaling validation
---
## Critical Integration Points for Module 17
Based on module implementation (`src/17_memoization/17_memoization.py`), these are the **CRITICAL integration points that MUST be tested**:
### 1. KVCache ↔ MultiHeadAttention Integration
**What needs testing**:
```python
class KVCache:
def update(layer_idx, key, value) # ← Must work with attention output
def get(layer_idx) # ← Must provide correct format for attention
def advance() # ← Must sync with generation loop
```
**Integration scenarios**:
- ✅ KVCache stores K,V tensors from attention computation
- ✅ Retrieved cache has correct shape for attention: `(batch, heads, seq_len, head_dim)`
- ✅ Cache updates don't corrupt data across layers
- ✅ Sequence position advances correctly after all layers process
**Risk**: Cache shape mismatch crashes attention → broken generation
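To make the expected cache/attention contract concrete, here is a NumPy-only sketch of a single decode step. The stand-in cache below is for illustration and is not the module's `KVCache` class; the shapes and semantics are assumptions based on the signatures listed above.
```python
import numpy as np

class _ToyCache:
    """Stand-in cache for illustration only (NOT the module's KVCache)."""
    def __init__(self):
        self.k, self.v = [], []                       # lists of (B, H, 1, D) arrays
    def update(self, k_new, v_new):
        self.k.append(k_new)
        self.v.append(v_new)
    def get(self):
        return np.concatenate(self.k, axis=2), np.concatenate(self.v, axis=2)

def cached_attention_step(cache, q_new, k_new, v_new):
    """Attend the newest query over all cached keys/values.

    Shapes follow the (batch, heads, seq_len, head_dim) convention above.
    """
    cache.update(k_new, v_new)
    k_all, v_all = cache.get()                        # (B, H, t, D)
    scores = q_new @ np.swapaxes(k_all, -2, -1)       # (B, H, 1, t)
    scores = scores / np.sqrt(q_new.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax
    return weights @ v_all                            # (B, H, 1, D)

cache, (B, H, D) = _ToyCache(), (2, 4, 16)
for _ in range(3):                                    # three decode steps
    out = cached_attention_step(cache,
                                np.random.randn(B, H, 1, D),
                                np.random.randn(B, H, 1, D),
                                np.random.randn(B, H, 1, D))
assert out.shape == (B, H, 1, D)
```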
---
### 2. Cache ↔ Generation Loop Integration
**What needs testing**:
```python
def enable_kv_cache(model) # ← Non-invasive model patching
# Generation loop must:
# 1. Create cache before generation
# 2. Pass cache to model.forward()
# 3. Advance cache after each step
# 4. Stop at max_seq_len
```
**Integration scenarios**:
- ✅ Cache initialized with correct model architecture params
- ✅ Generation produces correct output with cache enabled
- ✅ Cache updates don't break across generation steps
- ✅ Generated sequence length respects max_seq_len limit
- ✅ Cache memory doesn't grow unbounded
**Risk**: Cache corruption mid-generation → garbage output after N tokens
---
### 3. Training Mode Detection
**What needs testing**:
```python
# From implementation:
# - Training: Don't use cache (need gradients)
# - Inference: Use cache (no gradients, faster)
```
**Integration scenarios**:
- ✅ model.train() disables cache usage
- ✅ model.eval() enables cache usage
- ✅ Training with cache accidentally enabled → error or warning
- ✅ Cache correctly marked as inference-only (no gradient tracking)
**Risk**: Training with cache enabled → incorrect gradients → broken model
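A minimal sketch of the gating logic these tests should exercise. Attribute names such as `training` and `_kv_cache` are assumptions here, not the confirmed model API.
```python
class _ToyLayer:
    """Illustrative train/eval gating around a cache; not the real model code."""
    def __init__(self):
        self.training = True          # toggled by train() / eval()
        self._kv_cache = None         # attached by enable_kv_cache()

    def train(self):
        self.training = True

    def eval(self):
        self.training = False

    def forward(self, k_new, v_new):
        use_cache = self._kv_cache is not None and not self.training
        if use_cache:
            # Inference path: append to the cache and reuse the full history.
            self._kv_cache.setdefault("k", []).append(k_new)
            self._kv_cache.setdefault("v", []).append(v_new)
            return self._kv_cache["k"], self._kv_cache["v"]
        # Training path: no caching, so autograd sees every computation.
        return [k_new], [v_new]

layer = _ToyLayer()
layer._kv_cache = {}
layer.train()
layer.forward("k0", "v0")
assert len(layer._kv_cache) == 0       # training must not touch the cache
layer.eval()
layer.forward("k0", "v0")
assert len(layer._kv_cache["k"]) == 1  # eval populates the cache
```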
---
### 4. Multi-Layer Cache Consistency
**What needs testing**:
```python
# Each transformer layer has its own (K, V) cache
# Cache updates must not interfere across layers
cache.update(layer_idx=0, ...) # Layer 0
cache.update(layer_idx=1, ...) # Layer 1
```
**Integration scenarios**:
- ✅ Layer 0 cache update doesn't corrupt Layer 1 cache
- ✅ All layers retrieve correct cached K,V for their layer_idx
- ✅ Parallel layer processing doesn't cause race conditions
- ✅ Cache.get() returns layer-specific cached values
**Risk**: Layer cache mixing → incorrect attention → degraded quality
---
### 5. Batch Inference Validation
**What needs testing**:
```python
cache = KVCache(batch_size=4, ...) # Generate 4 sequences in parallel
# Each sequence in batch has independent cache state
```
**Integration scenarios**:
- ✅ Batch dimension properly handled in cache updates
- ✅ Different sequences don't interfere with each other
- ✅ Cache memory scales linearly with batch_size
- ✅ Batch inference produces same results as sequential
**Risk**: Batch sequences cross-contaminate → non-deterministic output
---
### 6. Memory Scaling Validation
**What needs testing**:
```python
# Cache memory = batch × layers × heads × seq_len × head_dim × 4 bytes
# Must validate this doesn't OOM for realistic configs
```
**Integration scenarios**:
- ✅ Small model (2 layers, 64 dim) uses <1 MB
- ✅ Medium model (4 layers, 128 dim) uses 1-10 MB
- ✅ Large model (12 layers, 768 dim, seq=1024) uses ~37 MB
- ✅ Memory calculation matches actual allocation
- ✅ Max sequence length enforcement prevents unbounded growth
**Risk**: Unbounded cache growth → OOM crash in production
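Plugging the quoted formula into the "12 layers, 768 dim, seq=1024" case above gives the expected figure (assuming 12 heads × 64 head_dim; any split with heads × head_dim = 768 yields the same product). Note the formula as written counts a single set of cached values; if K and V are stored as separate tensors, the total doubles.
```python
def kv_cache_bytes(batch, layers, heads, seq_len, head_dim, dtype_bytes=4):
    """Bytes implied by the formula quoted above (one cached tensor)."""
    return batch * layers * heads * seq_len * head_dim * dtype_bytes

per_tensor = kv_cache_bytes(batch=1, layers=12, heads=12, seq_len=1024, head_dim=64)
print(per_tensor / 1024**2)   # ~36 MB, matching the ~37 MB estimate above
```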
---
## Missing Integration Tests (Priority Ordered)
### CRITICAL (P0) - Break Production if Missing
#### Test 1: Cache-Enabled Generation Produces Correct Output
```python
def test_kv_cache_generation_correctness():
"""Verify cached generation matches non-cached generation."""
model = create_tiny_transformer()
input_ids = [1, 2, 3]
# Generate without cache (baseline)
output_no_cache = model.generate(input_ids, max_new_tokens=10)
# Generate with cache
cache = enable_kv_cache(model)
output_with_cache = model.generate(input_ids, max_new_tokens=10, cache=cache)
# Outputs should be identical (deterministic generation)
assert output_no_cache == output_with_cache
```
**Bug it catches**: Cache corruption producing wrong tokens
---
#### Test 2: Cache Updates Don't Corrupt Across Layers
```python
def test_cache_layer_isolation():
"""Verify each layer's cache is independent."""
cache = KVCache(batch_size=1, max_seq_len=10, num_layers=3,
num_heads=4, head_dim=16)
# Update each layer with unique data
for layer_idx in range(3):
key = Tensor(np.full((1, 4, 1, 16), layer_idx))
val = Tensor(np.full((1, 4, 1, 16), layer_idx * 10))
cache.update(layer_idx, key, val)
cache.advance()
# Verify each layer has its own data (no cross-contamination)
for layer_idx in range(3):
k, v = cache.get(layer_idx)
assert np.all(k.data == layer_idx), f"Layer {layer_idx} key corrupted"
assert np.all(v.data == layer_idx * 10), f"Layer {layer_idx} value corrupted"
```
**Bug it catches**: Layer cache mixing causing quality degradation
---
#### Test 3: Training Mode Prevents Cache Usage
```python
def test_training_mode_disables_cache():
"""Verify cache is disabled during training."""
model = create_tiny_transformer()
cache = enable_kv_cache(model)
# Training mode
model.train()
# Forward pass should NOT use cache (needs gradients)
input_ids = Tensor([[1, 2, 3, 4]])
output = model(input_ids)
# Cache should not have been updated
assert cache.seq_pos == 0, "Cache updated during training mode!"
# Inference mode
model.eval()
output = model(input_ids)
# Now cache should be updated
assert cache.seq_pos > 0, "Cache not updated during eval mode!"
```
**Bug it catches**: Incorrect gradients from cached computation
---
#### Test 4: Cache Memory Grows Correctly
```python
def test_cache_memory_scaling():
"""Verify cache memory scales as expected."""
configs = [
# (layers, embed_dim, heads, seq_len, expected_mb)
(2, 64, 4, 64, 0.1), # Tiny: <0.2 MB
(4, 128, 8, 128, 2.0), # Small: ~2 MB
(6, 256, 8, 256, 12.0), # Medium: ~12 MB
]
for num_layers, embed_dim, num_heads, max_seq_len, expected_mb in configs:
head_dim = embed_dim // num_heads
cache = KVCache(
batch_size=1,
max_seq_len=max_seq_len,
num_layers=num_layers,
num_heads=num_heads,
head_dim=head_dim
)
mem_info = cache.get_memory_usage()
actual_mb = mem_info['total_mb']
# Allow 20% tolerance for overhead
assert 0.8 * expected_mb < actual_mb < 1.2 * expected_mb, \
f"Memory scaling broken: expected ~{expected_mb}MB, got {actual_mb}MB"
```
**Bug it catches**: OOM from unbounded cache growth
---
### HIGH (P1) - Degrade User Experience
#### Test 5: Batch Inference Maintains Independence
```python
def test_batch_cache_independence():
"""Verify batch sequences don't interfere."""
cache = KVCache(batch_size=4, max_seq_len=10, num_layers=2,
num_heads=4, head_dim=16)
# Update with batch-specific data
# Batch 0: all 0s, Batch 1: all 1s, etc.
for step in range(3):
for layer_idx in range(2):
key = Tensor(np.stack([
np.full((4, 1, 16), batch_idx)
for batch_idx in range(4)
]))
val = key.copy()
cache.update(layer_idx, key, val)
cache.advance()
# Verify each batch maintained its own data
for layer_idx in range(2):
k, v = cache.get(layer_idx)
for batch_idx in range(4):
assert np.all(k.data[batch_idx] == batch_idx), \
f"Batch {batch_idx} contaminated"
```
**Bug it catches**: Batch cross-contamination causing non-deterministic output
---
#### Test 6: Cache Sequence Length Enforcement
```python
def test_cache_max_length_enforcement():
"""Verify cache prevents exceeding max_seq_len."""
cache = KVCache(batch_size=1, max_seq_len=5, num_layers=2,
num_heads=4, head_dim=16)
# Fill cache to max
for step in range(5):
for layer_idx in range(2):
key = Tensor(np.random.randn(1, 4, 1, 16))
val = Tensor(np.random.randn(1, 4, 1, 16))
cache.update(layer_idx, key, val)
cache.advance()
# Attempting to exceed should raise error
with pytest.raises(ValueError, match="max_seq_len"):
key = Tensor(np.random.randn(1, 4, 1, 16))
val = Tensor(np.random.randn(1, 4, 1, 16))
cache.update(0, key, val) # Should fail
```
**Bug it catches**: Unbounded generation causing OOM
---
#### Test 7: Cache Reset Functionality
```python
def test_cache_reset_clears_state():
"""Verify reset() clears cache for reuse."""
cache = KVCache(batch_size=1, max_seq_len=10, num_layers=2,
num_heads=4, head_dim=16)
# Fill cache with data
for step in range(3):
for layer_idx in range(2):
key = Tensor(np.ones((1, 4, 1, 16)))
val = Tensor(np.ones((1, 4, 1, 16)))
cache.update(layer_idx, key, val)
cache.advance()
assert cache.seq_pos == 3
# Reset cache
cache.reset()
# Verify clean state
assert cache.seq_pos == 0
k, v = cache.get(0)
assert k.shape[2] == 0, "Cache not empty after reset"
```
**Bug it catches**: Stale cache data corrupting next generation
---
### MEDIUM (P2) - Nice to Have
#### Test 8: enable_kv_cache() Integration with Real Model
```python
def test_enable_kv_cache_real_model():
"""Verify enable_kv_cache() works with transformer model."""
from tinytorch.models.transformer import GPT
model = GPT(vocab_size=100, embed_dim=64, num_layers=2,
num_heads=4, max_seq_len=32)
# Enable cache
cache = enable_kv_cache(model)
# Verify model attributes
assert hasattr(model, '_kv_cache')
assert hasattr(model, '_cache_enabled')
assert model._cache_enabled == True
# Verify cache configuration matches model
assert cache.num_layers == model.num_layers
assert cache.num_heads == model.num_heads
assert cache.max_seq_len == model.max_seq_len
```
**Bug it catches**: enable_kv_cache() misconfiguration
---
#### Test 9: Cache Shape Compatibility with Attention
```python
def test_cache_shapes_match_attention_requirements():
"""Verify cached K,V have correct shapes for attention."""
cache = KVCache(batch_size=2, max_seq_len=10, num_layers=1,
num_heads=4, head_dim=16)
# Simulate 3 generation steps
for step in range(3):
key = Tensor(np.random.randn(2, 4, 1, 16)) # (B, H, 1, D)
val = Tensor(np.random.randn(2, 4, 1, 16))
cache.update(0, key, val)
cache.advance()
# Get cached K,V
k, v = cache.get(0)
# Should have shape (B, H, seq_pos, D)
assert k.shape == (2, 4, 3, 16), f"Wrong key shape: {k.shape}"
assert v.shape == (2, 4, 3, 16), f"Wrong value shape: {v.shape}"
# Should be compatible with attention computation
# Q: (B, H, 1, D) @ K.T: (B, H, D, seq_pos) → (B, H, 1, seq_pos)
query = Tensor(np.random.randn(2, 4, 1, 16))
scores = query @ k.transpose(-2, -1)
assert scores.shape == (2, 4, 1, 3), "Attention computation failed"
```
**Bug it catches**: Shape mismatch causing attention crashes
---
## Test Organization Recommendation
### Proposed Structure
```
tests/15_memoization/
├── test_progressive_integration.py # RENAME from TinyGPT tests
│ ├── TestKVCacheAttentionIntegration
│ │ ├── test_cache_enabled_generation_correctness (P0)
│ │ ├── test_cache_layer_isolation (P0)
│ │ └── test_cache_shapes_match_attention (P2)
│ │
│ ├── TestCacheGenerationLoop
│ │ ├── test_training_mode_disables_cache (P0)
│ │ ├── test_cache_max_length_enforcement (P1)
│ │ └── test_cache_reset_clears_state (P1)
│ │
│ ├── TestCacheMemoryScaling
│ │ ├── test_cache_memory_scaling (P0)
│ │ └── test_batch_cache_independence (P1)
│ │
│ └── TestEnableKVCacheIntegration
│ └── test_enable_kv_cache_real_model (P2)
└── test_kv_cache_unit.py # Unit tests (already exist in module)
└── test_unit_kvcache() # From 17_memoization.py
```
---
## Summary Statistics
| Category | Count |
|----------|-------|
| **Total Integration Tests Needed** | 9 |
| **Critical (P0)** | 4 |
| **High Priority (P1)** | 3 |
| **Medium Priority (P2)** | 2 |
| **Current Integration Tests** | 0 |
| **Coverage Gap** | 100% |
---
## Recommended Action Plan
### Phase 1: Critical Tests (Week 1)
1. Implement P0 tests (4 tests)
2. Verify with real model (create minimal transformer for testing)
3. Fix any bugs discovered
### Phase 2: High Priority (Week 2)
4. Implement P1 tests (3 tests)
5. Add batch inference validation
6. Add sequence length enforcement
### Phase 3: Medium Priority (Week 3)
7. Implement P2 tests (2 tests)
8. Complete integration with enable_kv_cache()
9. Final validation pass
---
## Risk Assessment
### Current Risk Level: **HIGH** ⚠️
**Without these integration tests:**
- Cache corruption could go undetected → broken generation in production
- Training mode cache usage → incorrect gradients → broken models
- Memory leaks from unbounded cache → OOM crashes
- Layer cache mixing → degraded output quality
- Batch contamination → non-deterministic behavior
**With these integration tests:**
- ✅ Catch cache corruption before deployment
- ✅ Prevent training/inference mode bugs
- ✅ Validate memory scaling behavior
- ✅ Ensure layer independence
- ✅ Guarantee batch inference correctness
---
## Conclusion
Module 17 (Memoization/KV Cache) currently has **ZERO integration tests** despite implementing complex interactions with:
- MultiHeadAttention (Module 12)
- Transformer blocks (Module 13)
- Generation loops
- Training/inference mode switching
- Multi-layer cache coordination
**Recommendation**: Prioritize implementing the 4 P0 tests IMMEDIATELY to prevent production issues. These tests would have caught cache corruption bugs that could silently degrade model quality.
The current test file is completely misnamed and tests the wrong module. It should be renamed and populated with the 9 integration tests outlined above.

View File

@@ -1,440 +0,0 @@
# Module 16 Quantization - Integration Test Audit Report
## Executive Summary
**Current Status**: ❌ **CRITICAL - No integration tests implemented**
**Test File**: `tests/16_quantization/test_quantization_integration.py`
**Current Coverage**: 0% (stub file only)
**Required Coverage**: Full integration with Modules 01-15
---
## Critical Integration Points (Missing Tests)
### 1. ✅ Model Integrity After Quantization
**Status**: ❌ MISSING
**Priority**: 🔴 CRITICAL - Bug Prevention
**What needs testing**:
```python
def test_quantization_preserves_model_structure():
"""Verify quantization doesn't corrupt model from Modules 03-13."""
# Test that quantized models can still:
# - Forward pass with correct shapes
# - Work with optimizers (Module 06)
# - Train with Trainer (Module 07)
# - Process batched data from DataLoader (Module 08)
# - Integrate with Conv2D/MaxPool2D (Module 09)
# - Work with attention mechanisms (Module 12)
```
**Why this matters**:
- Quantization modifies model layers IN-PLACE
- Must preserve API compatibility with all prior modules
- Breaking changes would cascade through entire system
- Students need confidence their models still work
**Test cases needed**:
1. Quantize MLP → verify Dense layers still work
2. Quantize CNN → verify Conv2D/MaxPool2D integration
3. Quantize Transformer → verify attention/embeddings work
4. Quantize then train → verify optimizer compatibility
5. Quantize then profile → verify profiler (M14) integration
---
### 2. ✅ Output Similarity Validation
**Status**: ❌ MISSING
**Priority**: 🔴 CRITICAL - Accuracy Validation
**What needs testing**:
```python
def test_quantized_output_matches_float32():
"""Verify quantized models produce similar outputs to FP32."""
# Given: Original FP32 model
# When: Quantize to INT8
# Then: Output error < 1% (not just < 0.2 like unit test)
# Test across:
# - Different model architectures (MLP, CNN, Transformer)
# - Different input distributions (uniform, normal, realistic)
# - Different weight distributions (Xavier, He, pre-trained)
```
**Why this matters**:
- Unit tests use random weights (not realistic)
- Integration tests need realistic scenarios
- Must validate on actual model architectures
- Accuracy loss should be < 1% in production
**Test cases needed**:
1. Simple MLP on random data (baseline)
2. CNN on image-like data (spatial patterns)
3. Attention on sequence data (positional dependencies)
4. Pre-trained weights (realistic distributions)
5. Edge cases: very small/large activation ranges
---
### 3. ⚠️ In-Place Modification Warning System
**Status**: ❌ MISSING
**Priority**: 🟡 HIGH - Student Safety
**What needs testing**:
```python
def test_quantization_in_place_warning():
"""Verify students are warned about destructive operations."""
# Test that:
# 1. quantize_model() warns about in-place modification
# 2. Documentation clearly states weights are LOST
# 3. Example shows copy.deepcopy() pattern
# 4. Error handling for trying to "unquantize"
```
**Why this matters**:
- Students will lose their trained models
- Can't recover FP32 weights after quantization
- Common mistake in production (quantize checkpoint by accident)
- Educational: teach defensive programming patterns
**Test cases needed**:
1. Verify warning message displays
2. Test that original model IS modified
3. Verify deepcopy() prevents modification
4. Test error message for invalid recovery attempts
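A minimal sketch of the defensive pattern these tests should document; it only assumes that `quantize_model` modifies the model in place, as stated above.
```python
import copy

from tinytorch.optimization.quantization import quantize_model

def quantize_safely(model):
    """Keep an FP32 backup, then quantize in place; caller can always roll back."""
    fp32_backup = copy.deepcopy(model)   # full copy BEFORE the destructive call
    quantize_model(model)                # in-place INT8 conversion
    return model, fp32_backup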
---
### 4. 💾 Memory Reduction Measurement
**Status**: ❌ MISSING
**Priority**: 🟡 HIGH - Core Value Proposition
**What needs testing**:
```python
def test_quantization_actual_memory_reduction():
"""Measure ACTUAL memory savings, not theoretical."""
# Test that:
# 1. INT8 tensors use 1 byte (not 4 bytes)
# 2. Compression ratio ≈ 4× in practice
# 3. Memory profiler (M14) shows real savings
# 4. Savings persist after forward/backward passes
```
**Why this matters**:
- Unit tests calculate theoretical savings
- Need to verify ACTUAL memory usage
- Python's memory model can be tricky (views, copies)
- Students need to see real impact
**Test cases needed**:
1. Profile memory before/after quantization
2. Verify dtype is actually int8 (not float32)
3. Test memory during forward pass (no hidden FP32 copies)
4. Measure total process memory (OS-level)
5. Compare with Module 14 profiler predictions
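A sketch of the kind of "actual bytes, not theoretical" check the cases above call for. It assumes quantized layers expose packed weights as an int8 NumPy array via `.weight.data`; adjust to whatever attribute the implementation really uses.
```python
import numpy as np

def check_memory_reduction(fp32_layer, int8_layer):
    """Compare real buffer sizes and dtypes, not parameter-count estimates."""
    before = fp32_layer.weight.data.nbytes
    after = int8_layer.weight.data.nbytes
    assert int8_layer.weight.data.dtype == np.int8, "weights are not actually INT8"
    ratio = before / after
    assert ratio > 3.5, f"expected ~4x compression, measured {ratio:.2f}x"
    return ratio
```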
---
## Additional Missing Integration Tests
### 5. 🔄 Backward Compatibility
**Status**: ❌ MISSING
**Priority**: 🟡 HIGH
```python
def test_quantized_models_work_with_existing_code():
"""Verify quantized models integrate seamlessly."""
# Test that quantized models work with:
# - DataLoader batching
# - Training loops
# - Gradient computation (if supported)
# - Model saving/loading
```
### 6. 🚨 Edge Cases and Error Handling
**Status**: ❌ MISSING
**Priority**: 🟢 MEDIUM
```python
def test_quantization_edge_cases():
"""Test corner cases that might break."""
# Test:
# - Quantizing already quantized model (should error)
# - Quantizing model with no Linear layers
# - Quantizing with empty calibration data
# - Quantizing constant weights (all zeros, all ones)
# - Quantizing extreme ranges (very small, very large)
```
### 7. 📊 Profiler Integration (Module 14)
**Status**: ❌ MISSING
**Priority**: 🟢 MEDIUM
```python
def test_quantization_with_profiler():
"""Verify M14 profiler works with M16 quantization."""
# Test that:
# - Profiler can measure quantized models
# - Memory measurements are accurate
# - Parameter counting works correctly
# - Benchmark results make sense
```
### 8. 🏗️ Multi-Layer Model Integration
**Status**: ❌ MISSING
**Priority**: 🟡 HIGH
```python
def test_quantization_complex_architectures():
"""Test quantization on realistic architectures."""
# Test:
# - ResNet-like skip connections
# - Multi-head attention models
# - Mixed CNN + Transformer
# - Models with shared weights (embeddings)
```
---
## Comparison with Other Modules
### Module 14 (Profiling) Integration Test Pattern
```python
# Module 14 tests verify:
✅ Complete system (01→14) still works
✅ Multi-modal models work correctly
✅ Advanced features integrate properly
✅ Regression prevention for all prior modules
```
### Module 16 Should Follow Same Pattern
```python
# Module 16 needs:
Complete system (01→15) verification
Quantized multi-modal models
Integration with profiling/compression
Regression prevention
```
---
## Recommended Test Implementation Order
### Phase 1: Critical Bug Prevention (Week 1)
1. **test_quantization_preserves_model_structure()** - Prevent breaking changes
2. **test_quantized_output_matches_float32()** - Validate accuracy preservation
3. **test_quantization_actual_memory_reduction()** - Verify core value prop
### Phase 2: Student Safety (Week 2)
4. **test_quantization_in_place_warning()** - Prevent data loss
5. **test_quantized_models_work_with_existing_code()** - Ensure usability
6. **test_quantization_edge_cases()** - Handle corner cases
### Phase 3: Advanced Integration (Week 3)
7. **test_quantization_with_profiler()** - M14 + M16 integration
8. **test_quantization_complex_architectures()** - Real-world scenarios
9. **test_complete_tinytorch_system_stable()** - Full regression suite
---
## Test Coverage Gaps - Detailed Analysis
### Current Unit Test Coverage (in module)
- ✅ `test_unit_quantize_int8()` - Basic quantization works
- ✅ `test_unit_dequantize_int8()` - Basic dequantization works
- ✅ `test_unit_quantized_linear()` - Single layer quantization
- ✅ `test_unit_quantize_model()` - Model-level quantization
- ✅ `test_unit_compare_model_sizes()` - Memory comparison
### Missing Integration Coverage
- ❌ **Cross-module compatibility** - No tests verify M16 works with M01-M15
- ❌ **Real-world scenarios** - No tests on realistic architectures
- ❌ **Production patterns** - No tests for deployment workflows
- ❌ **Error recovery** - No tests for handling failures gracefully
- ❌ **Performance validation** - No tests verify speedup claims
- ❌ **Hardware compatibility** - No tests for different backends
---
## Bug-Catching Priorities
### P0: Critical Bugs (Would break student work)
1. **Quantization corrupts model state** → Students lose trained models
2. **Output accuracy degradation > 5%** → Models become useless
3. **Memory not actually reduced** → False promises
4. **In-place modification without warning** → Silent data loss
### P1: High-Impact Bugs (Would frustrate students)
5. **Quantized models incompatible with training** → Can't fine-tune
6. **Profiler breaks on quantized models** → Can't measure impact
7. **Edge cases crash silently** → Hard to debug
### P2: Quality Issues (Would confuse students)
8. **Inconsistent compression ratios** → Unclear value proposition
9. **Calibration doesn't improve accuracy** → Wasted complexity
10. **Documentation claims don't match reality** → Trust issues
---
## Recommended Test File Structure
```python
"""
Integration tests for Module 16: Quantization
Tests INT8 quantization, model preservation, and system integration
"""
class TestQuantizationModelIntegrity:
"""Verify quantization preserves model structure and functionality."""
def test_quantize_mlp_preserves_structure()
def test_quantize_cnn_preserves_spatial_ops()
def test_quantize_transformer_preserves_attention()
def test_quantized_model_trains_correctly()
def test_quantized_model_profiles_correctly()
class TestQuantizationAccuracy:
"""Verify quantized models maintain acceptable accuracy."""
def test_mlp_output_similarity()
def test_cnn_output_similarity()
def test_transformer_output_similarity()
def test_calibrated_vs_uncalibrated_accuracy()
def test_quantization_error_within_1_percent()
class TestQuantizationMemorySavings:
"""Verify actual memory reduction matches claims."""
def test_int8_tensor_actual_memory()
def test_compression_ratio_approximately_4x()
def test_memory_savings_persist_during_inference()
def test_profiler_measures_savings_correctly()
def test_os_level_memory_reduction()
class TestQuantizationSafety:
"""Verify safe usage patterns and error handling."""
def test_in_place_modification_warning()
def test_cannot_unquantize_model()
def test_deepcopy_prevents_modification()
def test_quantizing_quantized_model_errors()
def test_edge_case_constant_tensors()
class TestQuantizationSystemIntegration:
"""Verify quantization works with complete TinyTorch system."""
def test_complete_system_01_to_15_stable()
def test_quantized_dataloader_pipeline()
def test_quantized_training_workflow()
def test_quantization_plus_profiling()
def test_multimodal_model_quantization()
class TestQuantizationEdgeCases:
"""Test corner cases and error conditions."""
def test_empty_calibration_data()
def test_zero_weights_quantization()
def test_extreme_activation_ranges()
def test_model_with_no_linear_layers()
def test_single_layer_quantization_error()
```
---
## Success Metrics
### Minimum Acceptable Coverage
- All P0 bugs prevented (4/4 tests)
- Integration with M01-M15 verified (5+ tests)
- Real-world scenarios tested (3+ architectures)
- Memory savings validated (actual measurements)
### Gold Standard Coverage
- All recommended tests implemented (20+ tests)
- Cross-module regression suite (like M14)
- Performance benchmarks included
- Error handling comprehensive
---
## Next Actions
### Immediate (This Sprint)
1. Create basic test structure (5 test classes)
2. Implement P0 critical tests (4 tests)
3. Add model integrity tests (5 tests)
### Short-term (Next Sprint)
4. Implement accuracy validation (5 tests)
5. Add memory measurement tests (5 tests)
6. Create safety/warning tests (5 tests)
### Long-term (Future Sprints)
7. Complete edge case coverage
8. Add performance benchmarks
9. Create comprehensive regression suite
10. Document test patterns for future modules
---
## Appendix: Test Examples
### Example: Critical Integration Test
```python
def test_quantization_preserves_cnn_functionality():
"""
CRITICAL: Verify quantized CNN still works with spatial operations.
Bug this catches:
- Quantization breaks Conv2D/MaxPool2D integration
- Shape mismatches after quantization
- Gradient flow issues (if backward supported)
"""
from tinytorch.core.spatial import Conv2D, MaxPool2D
from tinytorch.core.layers import Linear
from tinytorch.core.activations import ReLU
from tinytorch.optimization.quantization import quantize_model
# Build realistic CNN
conv1 = Conv2D(3, 16, kernel_size=3)
pool = MaxPool2D(kernel_size=2)
conv2 = Conv2D(16, 32, kernel_size=3)
    flatten = Flatten()  # hypothetical flatten layer: reshape (B, C, H, W) -> (B, C*H*W)
fc = Linear(800, 10) # Assume flattened size
model = SimpleCNN(conv1, pool, conv2, flatten, fc)
# Test original
x = Tensor(np.random.randn(4, 3, 32, 32))
original_output = model.forward(x)
# Quantize (in-place)
quantize_model(model)
# Test quantized
quantized_output = model.forward(x)
# Assertions
assert quantized_output.shape == original_output.shape, \
"Quantization changed output shape - BREAKS SYSTEM"
error = np.mean(np.abs(original_output.data - quantized_output.data))
assert error < 0.5, \
f"Quantization error {error:.3f} too high for CNN"
# Verify Conv2D layers still work
assert hasattr(model.conv1, 'forward'), \
"Quantization broke Conv2D API"
```
---
**Report Generated**: 2024-11-25
**Auditor**: Claude (ML Systems QA)
**Status**: Ready for implementation

View File

@@ -1,453 +0,0 @@
# Module 17 (Compression/Pruning) - Integration Test Audit Report
**Audit Date**: 2025-11-25
**Auditor**: QA Agent
**Module**: 17 - Compression (Pruning, Knowledge Distillation)
**Status**: CRITICAL GAPS IDENTIFIED
---
## Executive Summary
**Current State**: Module 17 has ONLY a placeholder integration test file with no actual tests.
**Risk Level**: HIGH - Module is exported to production package but lacks integration validation.
**Critical Finding**: The checkpoint test (checkpoint_17_compression.py) expects completely different APIs than what's implemented in the actual module.
---
## 1. Current Test Coverage
### Existing Test Files
```
tests/17_compression/
├── test_compression_integration.py ❌ PLACEHOLDER ONLY (23 lines, no real tests)
├── run_all_tests.py ✅ Exists but returns PENDING status
└── __pycache__/
```
### Current Coverage: 0%
- **Unit Tests**: None in integration directory
- **Integration Tests**: Placeholder only
- **Progressive Tests**: Missing entirely
- **Cross-Module Tests**: None
---
## 2. Critical Integration Points for Module 17
Based on the actual implementation (`tinytorch/optimization/compression.py`), these are the critical integration points that MUST be tested:
### 2.1 Pruning Doesn't Corrupt Shared Weight References
**Risk**: High - Pruning modifies weights in-place
**Current Coverage**: 0%
**Bug Potential**: CRITICAL
**What to test**:
```python
# Multiple layers sharing same weight tensor
layer1 = Linear(10, 20)
layer2_weights = layer1.weight # Shared reference
model = SimpleModel(layer1, layer2_with_shared_weights)
magnitude_prune(model, sparsity=0.5)
# CRITICAL: Verify both references see the same pruned weights
# CRITICAL: Verify gradients still flow correctly through shared weights
```
**Why this matters**:
- Weight sharing is common (e.g., tied embeddings in transformers)
- In-place pruning could break reference sharing
- Could cause silent accuracy degradation
### 2.2 Sparse Models Still Train Correctly
**Risk**: High - Pruning creates zeros that must stay zero during training
**Current Coverage**: 0%
**Bug Potential**: CRITICAL
**What to test**:
```python
model = create_simple_mlp()
magnitude_prune(model, sparsity=0.7)
# Train for several steps
for _ in range(10):
output = model.forward(input)
loss = compute_loss(output, target)
loss.backward()
optimizer.step()
# CRITICAL: Verify pruned weights remain zero after training
# CRITICAL: Verify unpruned weights still update normally
# CRITICAL: Verify loss decreases despite sparsity
```
**Why this matters**:
- Pruned weights should stay pruned during fine-tuning
- Optimizer updates could "resurrect" pruned weights
- Gradient flow through sparse matrices can be unstable
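One common mitigation for the "resurrected weights" failure mode (not necessarily what `compression.py` does today) is to capture the pruning mask and reapply it after every optimizer step. A sketch, assuming `model.parameters()` returns tensors whose `.data` is a NumPy array:
```python
import numpy as np

def capture_masks(model):
    """Record the zero pattern left by magnitude_prune (hypothetical helper)."""
    return {id(p): (p.data != 0) for p in model.parameters()}

def reapply_masks(model, masks):
    for p in model.parameters():
        p.data *= masks[id(p)]           # force pruned positions back to zero

# Inside the fine-tuning loop sketched above:
#   masks = capture_masks(model)         # right after magnitude_prune(...)
#   ...
#   optimizer.step()
#   reapply_masks(model, masks)          # pruned weights stay pruned
```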
### 2.3 Sparsity Measurement Consistency
**Risk**: Medium - Different measurement methods should agree
**Current Coverage**: 0%
**Bug Potential**: MEDIUM
**What to test**:
```python
model = create_model()
magnitude_prune(model, sparsity=0.6)
# Measure sparsity multiple ways
sparsity_v1 = measure_sparsity(model) # Current implementation
sparsity_v2 = manual_count_zeros(model) / total_params(model)
sparsity_v3 = CompressionComplete.measure_sparsity(model)
# CRITICAL: All methods should agree within 1%
assert abs(sparsity_v1 - sparsity_v2) < 0.01
assert abs(sparsity_v1 - sparsity_v3) < 0.01
```
**Why this matters**:
- Inconsistent sparsity metrics confuse students
- Could hide bugs in pruning implementation
- Affects compression ratio calculations
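For reference, the `manual_count_zeros`-style ground truth used in the comparison above can be as simple as the following (assuming `model.parameters()` exposes `.data` NumPy arrays):
```python
import numpy as np

def manual_sparsity(model):
    """Fraction of exactly-zero parameters, counted directly."""
    zeros = total = 0
    for p in model.parameters():
        zeros += int(np.sum(p.data == 0))
        total += p.data.size
    return zeros / total
```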
### 2.4 Pruned Model Inference Works
**Risk**: High - Sparse operations must produce correct outputs
**Current Coverage**: 0%
**Bug Potential**: HIGH
**What to test**:
```python
# Create model, train it, get baseline accuracy
model = create_and_train_model()
baseline_output = model.forward(test_input)
# Prune and verify inference still works
magnitude_prune(model, sparsity=0.7)
pruned_output = model.forward(test_input)
# CRITICAL: Output shape unchanged
assert pruned_output.shape == baseline_output.shape
# CRITICAL: Output values reasonable (not NaN/Inf)
assert not np.any(np.isnan(pruned_output.data))
assert not np.any(np.isinf(pruned_output.data))
# CRITICAL: Output changes are bounded
max_change = np.max(np.abs(pruned_output.data - baseline_output.data))
assert max_change < 10.0 # Reasonable threshold
```
### 2.5 Structured vs Unstructured Pruning Interaction
**Risk**: Medium - Both pruning types modify same weights
**Current Coverage**: 0%
**Bug Potential**: MEDIUM
**What to test**:
```python
model = create_model()
# Apply both pruning types
magnitude_prune(model, sparsity=0.5) # Unstructured
initial_sparsity = measure_sparsity(model)
structured_prune(model, prune_ratio=0.3) # Structured
final_sparsity = measure_sparsity(model)
# CRITICAL: Sparsity should increase (or stay same)
assert final_sparsity >= initial_sparsity
# CRITICAL: Model still functional
output = model.forward(test_input)
assert output.shape == expected_shape
```
### 2.6 Knowledge Distillation Integration
**Risk**: High - KD loss depends on correct tensor operations
**Current Coverage**: 0%
**Bug Potential**: HIGH
**What to test**:
```python
teacher = create_large_model()
student = create_small_model()
kd = KnowledgeDistillation(teacher, student, temperature=3.0, alpha=0.7)
# Generate predictions
teacher_logits = teacher.forward(input)
student_logits = student.forward(input)
true_labels = np.array([0, 1, 2, 3])
# Compute distillation loss
loss = kd.distillation_loss(student_logits, teacher_logits, true_labels)
# CRITICAL: Loss is a scalar
assert np.isscalar(loss) or (isinstance(loss, np.ndarray) and loss.size == 1)
# CRITICAL: Loss is positive and finite
assert loss > 0
assert not np.isnan(loss)
assert not np.isinf(loss)
# CRITICAL: Alpha parameter affects loss composition
loss_high_alpha = KnowledgeDistillation(teacher, student, alpha=0.9).distillation_loss(...)
loss_low_alpha = KnowledgeDistillation(teacher, student, alpha=0.1).distillation_loss(...)
# Different alpha should give different losses
assert abs(loss_high_alpha - loss_low_alpha) > 0.01
```
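For context, the Hinton-style distillation loss such a test exercises looks roughly like the sketch below. The module's exact weighting, reduction, and T² scaling may differ, so treat this as a reference formula rather than the implemented API.
```python
import numpy as np

def _softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, labels, T=3.0, alpha=0.7):
    """alpha blends soft (teacher-matching) and hard (true-label) cross-entropy terms."""
    soft_targets = _softmax(teacher_logits, T)
    log_soft_student = np.log(_softmax(student_logits, T) + 1e-12)
    soft = -np.mean(np.sum(soft_targets * log_soft_student, axis=-1)) * (T * T)
    log_student = np.log(_softmax(student_logits) + 1e-12)
    hard = -np.mean(log_student[np.arange(len(labels)), labels])
    return alpha * soft + (1 - alpha) * hard
```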
---
## 3. Missing Progressive Integration Tests
Module 17 integration tests should verify the ENTIRE stack (Modules 01-17) still works:
### 3.1 Prior Stack Regression Tests (MISSING)
```python
class TestPriorStackStillWorking:
"""Verify Modules 01-16 unchanged after compression development."""
def test_quantization_still_works(self):
"""Module 16 (Quantization) should be unaffected."""
# Test quantization APIs still functional
def test_profiling_still_works(self):
"""Module 14 (Profiling) should be unaffected."""
# Test profiling APIs still functional
def test_training_pipeline_stable(self):
"""Complete training pipeline (Modules 01-07) should work."""
# End-to-end training test
```
### 3.2 Cross-Module Integration Tests (MISSING)
```python
class TestCompressionWithOtherModules:
"""Test compression works with other advanced modules."""
def test_compression_with_quantization(self):
"""Test: Prune first, then quantize."""
model = create_model()
magnitude_prune(model, sparsity=0.7)
quantize_model(model, bits=8)
# Verify both optimizations work together
def test_compression_with_attention(self):
"""Test: Prune attention mechanisms."""
attention = MultiHeadAttention(64, 8)
structured_prune(attention, prune_ratio=0.3)
# Verify attention still computes correctly
def test_compression_with_spatial_conv(self):
"""Test: Prune CNN filters."""
conv = Conv2D(3, 64, kernel_size=3)
structured_prune(conv, prune_ratio=0.5)
# Verify convolutions still work
```
---
## 4. API Mismatch with Checkpoint Test
**CRITICAL ISSUE**: The checkpoint test expects completely different APIs than what's implemented!
### Expected APIs (from checkpoint_17_compression.py):
```python
from tinytorch.nn.utils.prune import (
MagnitudePruner, # ❌ Class-based API
prune_conv_filters, # ❌ Specialized function
CompressionAnalyzer # ❌ Analysis class
)
pruner = MagnitudePruner()
pruned_weights, mask, stats = pruner.prune(test_weights, sparsity=0.7)
```
### Actual Implementation (in compression.py):
```python
from tinytorch.optimization.compression import (
magnitude_prune, # ✅ Function-based API
structured_prune, # ✅ Function-based API
KnowledgeDistillation, # ✅ KD class
measure_sparsity, # ✅ Utility function
compress_model # ✅ Pipeline function
)
magnitude_prune(model, sparsity=0.7) # In-place, no mask/stats returned
```
### Resolution Required:
1. **Option A**: Update checkpoint to match actual implementation
2. **Option B**: Extend implementation to match checkpoint expectations
3. **Option C**: Document API differences and maintain both
**Recommendation**: Option A - Update checkpoint to match the cleaner functional API actually implemented.
---
## 5. Bug-Catching Test Priorities
### Priority 1: CRITICAL (Could cause silent failures)
1. **Shared weight corruption test** - Highest risk for silent accuracy degradation
2. **Training with pruned weights test** - Optimizer could resurrect pruned weights
3. **Knowledge distillation loss validity test** - Invalid loss breaks training
### Priority 2: HIGH (Could cause obvious failures)
4. **Pruned model inference test** - Ensures basic functionality works
5. **Sparsity measurement consistency test** - Prevents metric confusion
6. **Cross-module integration tests** - Ensures compression doesn't break other modules
### Priority 3: MEDIUM (Quality of life issues)
7. **Structured vs unstructured interaction test** - Edge case handling
8. **Progressive stack regression tests** - Prevent accidental breakage
9. **Performance profiling tests** - Verify compression actually improves performance
---
## 6. Recommended Test Structure
```
tests/17_compression/
├── test_progressive_integration.py # NEW - Progressive stack tests
│ ├── TestPriorStackStillWorking # Modules 01-16 regression
│ ├── TestModule17CompressionCore # Core compression functionality
│ ├── TestProgressiveStackIntegration # Full stack (01-17) integration
│ └── TestRegressionPrevention # Prevent breakage
├── test_compression_integration.py # EXPAND - Currently placeholder
│ ├── TestPruningIntegration # In-place pruning behavior
│ ├── TestSparsityConsistency # Measurement accuracy
│ ├── TestKnowledgeDistillation # KD integration
│ └── TestCrossModuleInteraction # With quantization, attention, etc.
├── test_pruning_edge_cases.py # NEW - Edge case handling
│ ├── TestSharedWeightReferences # CRITICAL
│ ├── TestTrainingAfterPruning # CRITICAL
│ ├── TestExtremeSparsity # 0%, 100% sparsity
│ └── TestInvalidInputHandling # Error cases
└── test_compression_performance.py # NEW - Performance validation
├── TestMemoryReduction # Actual memory savings
├── TestInferenceSpeed # Sparse inference performance
└── TestCompressionQuality # Accuracy preservation
```
---
## 7. Sample Integration Test Implementation
Here's a sample of what the CRITICAL shared weight test should look like:
```python
def test_pruning_with_shared_weights():
"""CRITICAL: Verify pruning doesn't corrupt shared weight references."""
print("🔬 Testing pruning with shared weight references...")
# Create two layers sharing the same weight tensor
layer1 = Linear(100, 50)
layer2 = Linear(100, 50)
# Share weights (common pattern: tied embeddings)
layer2.weight = layer1.weight # Share reference
# Create model with shared weights
model = SimpleModel(layer1, layer2)
# Verify weights are actually shared before pruning
original_id = id(layer1.weight.data)
assert id(layer2.weight.data) == original_id, "Weights should be shared"
# Apply magnitude pruning
magnitude_prune(model, sparsity=0.6)
# CRITICAL TEST 1: Weights still shared after pruning
assert id(layer1.weight.data) == id(layer2.weight.data), \
"Pruning should preserve weight sharing"
# CRITICAL TEST 2: Both layers see the same pruned pattern
assert np.array_equal(layer1.weight.data, layer2.weight.data), \
"Shared weights should have identical pruning masks"
# CRITICAL TEST 3: Sparsity is correct
sparsity = np.sum(layer1.weight.data == 0) / layer1.weight.data.size
assert 0.55 <= sparsity <= 0.65, \
f"Expected ~60% sparsity, got {sparsity:.1%}"
# CRITICAL TEST 4: Forward pass works with shared pruned weights
input_data = Tensor(np.random.randn(10, 100))
output1 = layer1.forward(input_data)
output2 = layer2.forward(input_data)
# Both layers should produce identical outputs (same weights)
assert np.allclose(output1.data, output2.data), \
"Shared pruned weights should produce identical outputs"
print("✅ Shared weight pruning works correctly!")
```
---
## 8. Actionable Recommendations
### Immediate Actions (This Sprint)
1. **Create test_progressive_integration.py** - Following Module 02 pattern
2. **Implement 6 critical integration tests** - Focus on shared weights, training, KD
3. **Resolve checkpoint API mismatch** - Update checkpoint or extend implementation
4. **Add cross-module tests** - Compression + Quantization, Compression + Attention
### Short-term Actions (Next Sprint)
5. **Add edge case tests** - Extreme sparsity, invalid inputs, error handling
6. **Add performance validation tests** - Verify actual memory/speed improvements
7. **Document integration patterns** - How compression interacts with other modules
8. **Create test data fixtures** - Reusable models for testing
### Long-term Actions (Future)
9. **Continuous integration monitoring** - Add to CI/CD pipeline
10. **Property-based testing** - Use Hypothesis for generative test cases
11. **Benchmark suite** - Performance regression detection
12. **Student confusion monitoring** - Track common errors in integration
---
## 9. Risk Assessment
| Risk Category | Likelihood | Impact | Mitigation Priority |
|---------------|------------|--------|---------------------|
| Shared weight corruption | HIGH | CRITICAL | P1 - Immediate |
| Training resurrects pruned weights | HIGH | CRITICAL | P1 - Immediate |
| KD loss computation errors | MEDIUM | HIGH | P1 - Immediate |
| Sparsity measurement bugs | MEDIUM | MEDIUM | P2 - Short-term |
| Cross-module incompatibility | LOW | HIGH | P2 - Short-term |
| API confusion (checkpoint mismatch) | HIGH | MEDIUM | P1 - Immediate |
---
## 10. Conclusion
**Module 17 (Compression) has ZERO integration test coverage despite being exported to production.**
**Highest-risk gaps**:
1. No validation that pruning preserves shared weight references
2. No validation that pruned models can still train
3. No validation that knowledge distillation produces valid losses
4. Complete API mismatch with checkpoint expectations
**Recommended action**: Implement the 6 critical integration tests IMMEDIATELY before any student uses this module in combination with other modules.
**Estimated effort**:
- Critical tests (Priority 1): 4-6 hours
- High-priority tests (Priority 2): 3-4 hours
- Progressive integration structure: 2-3 hours
- **Total**: 10-13 hours to achieve acceptable coverage
**Next steps**: Review this audit with Module Developer, prioritize critical tests, assign implementation tasks.
---
**Audit completed**: 2025-11-25
**Reviewed by**: QA Agent
**Status**: APPROVED FOR DEVELOPMENT

View File

@@ -1,615 +0,0 @@
# Module 19 (Benchmarking) - Integration Test Audit Report
**Audit Date**: 2025-11-25
**Module**: 19_benchmarking
**Current Test File**: `tests/19_benchmarking/test_benchmarking_integration.py`
**Status**: STUB ONLY - NO IMPLEMENTATION
---
## EXECUTIVE SUMMARY
**CRITICAL FINDING**: Module 19 integration tests are completely unimplemented (TODO stub only).
- **Current Coverage**: 0% (stub file with TODO comments)
- **Expected Coverage**: ~80% for production-ready benchmarking system
- **Priority**: HIGH - Benchmarking is final implementation module and capstone foundation
- **Risk**: Students cannot validate benchmarking correctness or integration with optimization modules
---
## 1. CURRENT TEST COVERAGE ANALYSIS
### 1.1 What EXISTS (Stub Only)
```python
def test_benchmarking_integration():
"""Test benchmarking system integration."""
# TODO: Implement integration tests
# - Test benchmark runner
# - Test performance metrics collection
# - Test result validation
# - Test comparison with baselines
# - Test leaderboard submission
pass
```
**Lines of Code**: 24 (all comments/stubs)
**Actual Tests**: 0
**Integration Scenarios**: 0
### 1.2 What Module 19 IMPLEMENTS (2546 lines)
Module 19 provides comprehensive benchmarking infrastructure:
**Core Components**:
1. `BenchmarkResult` - Statistical analysis container
2. `PreciseTimer` - High-precision timing infrastructure
3. `Benchmark` - Multi-model comparison framework
4. `BenchmarkSuite` - Comprehensive multi-metric evaluation
5. `TinyMLPerf` - Industry-standard benchmark runner
6. `compare_optimization_techniques()` - Optimization comparison engine
**Key Integration Points**:
- Uses `Profiler` from Module 14 for measurements
- Uses `Tensor` from Module 01 for data handling
- Should work with optimized models from Modules 15-18
- Generates reports for TorchPerf Olympics capstone
---
## 2. CRITICAL INTEGRATION POINTS FOR MODULE 19
### 2.1 Real Model Performance Measurement
**What Needs Testing**:
```python
Benchmark measures ACTUAL model latency (not simulated)
Benchmark measures REAL memory usage (not estimates)
Benchmark handles different model types (TinyTorch, PyTorch, custom)
Benchmark works with models from previous modules (Conv2D, MLP, Transformer)
```
**Why Critical**:
- Students need to benchmark their actual implementations, not mock models
- Profiler integration must work correctly with real TinyTorch models
- Duck-typing (hasattr checks) must handle various model interfaces
### 2.2 Statistical Validity of Measurements
**What Needs Testing**:
```python
Confidence intervals calculated correctly
Warmup runs eliminate cold-start effects
Measurement variance is reasonable (CV < 20%)
Outlier detection prevents skewed results
Sample size recommendations are valid
```
**Why Critical**:
- Poor statistics lead to incorrect optimization decisions
- Benchmarking is worthless without statistical rigor
- Students must learn to trust/distrust measurements
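As a point of reference only, a statistically careful measurement loop might look like the sketch below. It uses plain `time.perf_counter` and NumPy; the warmup count, sample count, and 20% coefficient-of-variation threshold are illustrative assumptions rather than values taken from Module 19.
```python
# Hedged sketch: warmup + repeated timing with a coefficient-of-variation check.
# `run_inference` is a stand-in for model.forward(x); thresholds are illustrative.
import time
import numpy as np

def time_with_warmup(run_inference, warmup=5, samples=30, max_cv=0.20):
    for _ in range(warmup):              # discard cold-start effects (caches, allocator)
        run_inference()

    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        run_inference()
        latencies.append(time.perf_counter() - start)

    latencies = np.array(latencies)
    mean, std = latencies.mean(), latencies.std(ddof=1)
    cv = std / mean if mean > 0 else float("inf")
    if cv > max_cv:
        # High variance: the numbers are not trustworthy for comparisons.
        print(f"Warning: CV={cv:.1%} exceeds {max_cv:.0%}; increase samples or reduce noise")
    return mean, std, cv

# Example usage with a dummy workload:
mean_s, std_s, cv = time_with_warmup(lambda: sum(i * i for i in range(10_000)))
```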
### 2.3 Resource Exhaustion Prevention
**What Needs Testing**:
```python
Memory benchmarks don't cause OOM crashes
Large models don't hang the benchmarking system
Timeout mechanisms prevent infinite loops
Graceful degradation when resources are limited
Clean resource cleanup after benchmarks
```
**Why Critical**:
- Benchmarking shouldn't crash student systems
- Edge cases (huge models, limited RAM) must be handled
- Production systems require robust error handling
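One possible shape for a timeout guard is sketched below using only the standard library. Note the stated limitation: a thread-based timeout cannot actually kill a runaway `forward()` call, so a real harness might prefer a subprocess. The 30-second default is an assumption.
```python
# Hedged sketch: bound a single benchmark call with a timeout.
# Limitation: the worker thread keeps running after the timeout fires; only a
# subprocess-based harness can hard-kill a hung forward() call.
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def run_with_timeout(fn, timeout_s=30.0):
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        raise RuntimeError(
            f"Benchmark step exceeded {timeout_s}s; model may be hanging"
        ) from None
    finally:
        pool.shutdown(wait=False)  # do not block on a possibly hung worker
```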
### 2.4 Benchmark Results Reproducibility
**What Needs Testing**:
```python
Same model produces consistent results across runs
Randomness is controlled (seeded) where needed
System state doesn't affect benchmark validity
Results can be serialized/deserialized correctly
Comparison across different machines is meaningful
```
**Why Critical**:
- TorchPerf Olympics requires reproducible submissions
- Students must be able to verify their optimizations
- Leaderboard requires fair comparisons
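A minimal reproducibility check might look like this sketch; the seeding call, JSON round-trip, and 1e-9 tolerance are assumptions about how results could be stored, not the format Module 19 actually uses.
```python
# Hedged sketch: seed control plus a serialization round-trip check.
import json
import numpy as np

np.random.seed(42)  # fix input data so reruns see identical tensors

result = {"model": "mlp_baseline", "latency_ms": 1.234567, "samples": 30}

blob = json.dumps(result)      # what a leaderboard submission might store
restored = json.loads(blob)

assert restored["model"] == result["model"]
assert abs(restored["latency_ms"] - result["latency_ms"]) < 1e-9, "precision lost in round-trip"
```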
### 2.5 Optimization Module Integration (M15-18)
**What Needs Testing**:
```python
Benchmark works with quantized models (Module 15)
Benchmark works with pruned models (Module 16)
Benchmark works with distilled models (Module 17)
Benchmark works with fused operators (Module 18)
compare_optimization_techniques() handles all optimization types
```
**Why Critical**:
- Module 19 is the EVALUATION framework for Modules 15-18
- Without integration, students can't validate optimizations
- Capstone requires combining multiple optimization techniques
### 2.6 TinyMLPerf Standard Compliance
**What Needs Testing**:
```python
Standard benchmarks (keyword_spotting, image_classification, etc.) run correctly
Compliance thresholds enforced properly
Report generation matches MLPerf format
Leaderboard submission format is valid
Results are comparable to official MLPerf baselines
```
**Why Critical**:
- Industry-standard benchmarking teaches professional practices
- Capstone submissions require MLPerf-style reporting
- Career preparation for ML engineering roles
---
## 3. MISSING INTEGRATION TESTS (BY PRIORITY)
### PRIORITY 1: Core Benchmarking Workflow (CRITICAL)
**Test**: `test_benchmark_real_tinytorch_models()`
```python
def test_benchmark_real_tinytorch_models():
"""
✅ TEST: Benchmark should measure REAL TinyTorch models correctly
VALIDATES:
- Integration with Tensor, Linear, Conv2D from earlier modules
- Profiler from Module 14 works in benchmarking context
- Latency/memory measurements are realistic (not zero, not infinite)
- Results structure is correct and serializable
🐛 BUG-CATCHING:
- Model.forward() not being called correctly
- Profiler returning None or invalid measurements
- Memory tracking not working with TinyTorch tensors
- Duck-typing failures with real TinyTorch models
"""
```
**Bug Examples**:
- Benchmark tries to call `model.predict()` but TinyTorch uses `model.forward()`
- Memory measurement returns 0 for all models
- Latency measurement includes warmup time incorrectly
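The first bug above is a calling-convention mismatch. A defensive benchmark runner might resolve the inference entry point up front, as in this hedged sketch; the method names checked are assumptions about common interfaces, not a documented Module 19 contract.
```python
# Hedged sketch: resolve the inference callable before timing anything.
def resolve_inference_fn(model):
    """Return a callable that runs one inference, or fail with a clear message."""
    if hasattr(model, "forward") and callable(model.forward):
        return model.forward              # TinyTorch-style modules
    if callable(model):
        return model                      # __call__-style modules
    if hasattr(model, "predict") and callable(model.predict):
        return model.predict              # sklearn-style estimators
    raise TypeError(
        f"{type(model).__name__} exposes no forward()/__call__/predict(); "
        "cannot benchmark it"
    )
```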
---
**Test**: `test_statistical_validity()`
```python
def test_statistical_validity():
"""
✅ TEST: Statistical analysis should be mathematically correct
VALIDATES:
- Confidence intervals calculated using proper formulas
- Mean/std/median computed correctly
- Sample size sufficient for statistical significance
- Variance is reasonable (not too high or too low)
🐛 BUG-CATCHING:
- Wrong t-score value (should be 1.96 for 95% CI)
- Division by zero when n=1
- CI width unreasonably large (>50% of mean)
- Outliers not handled properly
"""
```
**Bug Examples**:
- Confidence interval calculation uses wrong formula
- Single measurement causes divide-by-zero in std calculation
- Outliers skew results (one 100ms measurement among 1ms measurements)
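For reference, the 95% confidence interval the bugs above allude to reduces to a few lines. The normal-approximation z-value of 1.96 and the n ≥ 2 guard are the standard choices, but treat this as a sketch rather than Module 19's actual implementation.
```python
# Hedged sketch: 95% CI via the normal approximation, guarding the n < 2 case.
import numpy as np

def confidence_interval_95(samples):
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    mean = samples.mean()
    if n < 2:
        return mean, (mean, mean)              # no spread estimate from a single sample
    sem = samples.std(ddof=1) / np.sqrt(n)     # standard error of the mean
    half_width = 1.96 * sem                    # z for a 95% interval
    return mean, (mean - half_width, mean + half_width)

mean, (lo, hi) = confidence_interval_95([1.02, 0.98, 1.05, 0.99, 1.01])
```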
---
**Test**: `test_benchmark_suite_multi_metric()`
```python
def test_benchmark_suite_multi_metric():
"""
✅ TEST: BenchmarkSuite should run all metrics and combine results
VALIDATES:
- Latency, accuracy, memory, energy all measured
- Results structure contains all metrics
- Pareto frontier analysis identifies optimal models
- Report generation produces valid output
🐛 BUG-CATCHING:
- One metric failing breaks entire suite
- Results missing some metrics
- Pareto analysis chooses dominated solutions
- Energy estimation produces negative values
"""
```
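Since the suite test above checks Pareto analysis, here is a minimal non-dominated-point sketch over (latency, accuracy) pairs; the tuple layout and example numbers are assumptions for illustration only.
```python
# Hedged sketch: keep models not dominated by any other (lower latency AND higher accuracy).
def pareto_frontier(results):
    """results: dict of name -> (latency_ms, accuracy). Returns non-dominated names."""
    frontier = []
    for name, (lat, acc) in results.items():
        dominated = any(
            other_lat <= lat and other_acc >= acc and (other_lat, other_acc) != (lat, acc)
            for other, (other_lat, other_acc) in results.items() if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

print(pareto_frontier({"fp32": (4.0, 0.91), "int8": (1.5, 0.90), "pruned": (2.5, 0.88)}))
# -> ['fp32', 'int8']  (pruned is dominated by int8: slower and less accurate)
```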
---
### PRIORITY 2: Optimization Integration (HIGH)
**Test**: `test_optimization_module_integration()`
```python
def test_optimization_module_integration():
"""
✅ TEST: Benchmark should work with models from optimization modules
VALIDATES:
- Quantized models (Module 15) benchmark correctly
- Pruned models (Module 16) show reduced memory
- Distilled models (Module 17) measured accurately
- Fused operators (Module 18) show speedups
- compare_optimization_techniques() generates valid comparisons
🐛 BUG-CATCHING:
- Quantized model measurement crashes
- Pruned model memory doesn't decrease
- Fused operators show no speedup
- Comparison function fails with empty models
"""
```
**Bug Examples**:
- Quantized model forward() returns wrong dtype, crashes Profiler
- Pruned model parameter counting doesn't account for sparse weights
- Comparison assumes all models have same interface
---
**Test**: `test_optimization_recommendations()`
```python
def test_optimization_recommendations():
"""
✅ TEST: Recommendation engine should provide actionable guidance
VALIDATES:
- Recommendations match use case constraints
- Latency-critical use case chooses fastest model
- Memory-constrained use case chooses smallest model
- Balanced use case considers multiple metrics
- Recommendations include reasoning
🐛 BUG-CATCHING:
- Latency-critical recommends slowest model
- Memory-constrained ignores memory metric
- Recommendations contradict actual measurements
- Reasoning is generic (not specific to results)
"""
```
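For illustration, a constraint-aware selection might be as simple as the sketch below; the results-dict layout, use-case keys, and scoring weights are hypothetical, not the module's documented API.
```python
# Hedged sketch: pick a model per use case from a results dict (hypothetical layout).
def recommend(results, use_case):
    """results: name -> {"latency_ms": float, "memory_mb": float, "accuracy": float}."""
    if use_case == "latency_critical":
        return min(results, key=lambda m: results[m]["latency_ms"])
    if use_case == "memory_constrained":
        return min(results, key=lambda m: results[m]["memory_mb"])
    # "balanced": crude weighted score (weights are illustrative, not tuned)
    def score(name):
        r = results[name]
        return r["latency_ms"] + 0.1 * r["memory_mb"] - 100.0 * r["accuracy"]
    return min(results, key=score)
```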
---
### PRIORITY 3: Robustness & Edge Cases (MEDIUM)
**Test**: `test_resource_exhaustion_prevention()`
```python
def test_resource_exhaustion_prevention():
"""
✅ TEST: Benchmark should handle resource constraints gracefully
VALIDATES:
- Large models don't cause OOM crashes
- Long-running benchmarks can be interrupted
- Memory is cleaned up after benchmarks
- Timeout prevents infinite loops
- Error messages are helpful
🐛 BUG-CATCHING:
- Memory leak in benchmark loop
- No timeout on model.forward() calls
- Crash instead of graceful degradation
- Resources not released on exception
"""
```
**Bug Examples**:
- Benchmarking 1GB model crashes with OOM
- Infinite loop in warmup phase (no timeout)
- Memory leak: each benchmark run consumes more memory
---
**Test**: `test_benchmark_reproducibility()`
```python
def test_benchmark_reproducibility():
"""
✅ TEST: Benchmark results should be reproducible
VALIDATES:
- Same model gives consistent results across runs
- Random seed controls variability
- Serialized results match original
- Deserialized results can be compared
- Variance is within acceptable bounds (CV < 10%)
🐛 BUG-CATCHING:
- Results vary wildly between identical runs (CV > 50%)
- Serialization loses precision
- Deserialization fails on valid files
- No seed control for reproducibility
"""
```
---
**Test**: `test_edge_case_models()`
```python
def test_edge_case_models():
"""
✅ TEST: Benchmark should handle unusual model types
VALIDATES:
- Empty model (no parameters) doesn't crash
- Single-parameter model benchmarks correctly
- Model with no forward() method fails gracefully
- Model returning wrong shape is caught
- Non-tensor outputs handled appropriately
🐛 BUG-CATCHING:
- Empty model causes division by zero
- Missing forward() crashes instead of error message
- Wrong output shape causes silent failure
- Non-tensor output crashes Profiler
"""
```
---
### PRIORITY 4: TinyMLPerf & Capstone (MEDIUM-HIGH)
**Test**: `test_tinymlperf_standard_benchmarks()`
```python
def test_tinymlperf_standard_benchmarks():
"""
✅ TEST: TinyMLPerf should run standard industry benchmarks
VALIDATES:
- All standard benchmarks (keyword_spotting, image_classification, etc.) run
- Compliance thresholds enforced correctly
- Report format matches MLPerf specification
- Leaderboard submission JSON is valid
- Results comparable to reference implementations
🐛 BUG-CATCHING:
- Benchmark names don't match MLPerf standard
- Compliance check uses wrong thresholds
- Report missing required fields
- JSON serialization produces invalid format
"""
```
---
**Test**: `test_torchperf_olympics_workflow()`
```python
def test_torchperf_olympics_workflow():
"""
✅ TEST: TorchPerf Olympics submission workflow should work end-to-end
VALIDATES:
- Student can choose Olympic event
- Benchmark runs for chosen event
- Results validated against event constraints
- Submission package generated correctly
- Leaderboard ranking calculated properly
🐛 BUG-CATCHING:
- Event constraints not enforced
- Invalid submission passes validation
- Ranking algorithm broken (ties handled wrong)
- Submission package missing required files
"""
```
---
### PRIORITY 5: Progressive Integration (MEDIUM)
**Test**: `test_complete_tinytorch_system_still_works()`
```python
def test_complete_tinytorch_system_still_works():
"""
🔄 REGRESSION: Complete TinyTorch system (Modules 01-18) should still work
VALIDATES:
- Tensor, activations, layers still functional
- Training loops still work
- Optimization modules (15-18) still work
- Benchmarking doesn't break existing functionality
🐛 BUG-CATCHING:
- Benchmarking imports break core modules
- Profiler integration interferes with training
- Circular dependencies introduced
"""
```
---
## 4. REFERENCE: GOOD INTEGRATION TEST STRUCTURE
Based on `tests/02_activations/test_progressive_integration.py`:
```python
"""
Module 19: Progressive Integration Tests
Tests that Module 19 (Benchmarking) works correctly AND that entire TinyTorch system still works.
DEPENDENCY CHAIN: 01_tensor → ... → 18_fusion → 19_benchmarking → Capstone
Final validation before TorchPerf Olympics capstone project.
"""
import numpy as np
import sys
from pathlib import Path
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
class TestModules01Through18StillWorking:
"""Verify all previous modules still work after benchmarking development."""
def test_core_modules_stable(self):
"""Ensure core modules (01-09) weren't broken."""
# Test imports and basic functionality
pass
def test_optimization_modules_stable(self):
"""Ensure optimization modules (15-18) still work."""
# Test quantization, pruning, distillation, fusion
pass
class TestModule19BenchmarkingCore:
"""Test Module 19 core benchmarking functionality."""
def test_benchmark_result_statistics(self):
"""Test BenchmarkResult calculates statistics correctly."""
pass
def test_benchmark_runner_real_models(self):
"""Test Benchmark class with real TinyTorch models."""
pass
def test_benchmark_suite_multi_metric(self):
"""Test BenchmarkSuite runs all metrics."""
pass
def test_tinymlperf_compliance(self):
"""Test TinyMLPerf standard benchmarks."""
pass
class TestProgressiveStackIntegration:
"""Test complete stack (01→19) works together."""
def test_benchmark_optimized_models_pipeline(self):
"""Test benchmarking pipeline with models from optimization modules."""
# Create base model
# Apply optimization (quantize, prune, etc.)
# Benchmark both
# Verify comparison results
pass
def test_torchperf_olympics_submission_workflow(self):
"""Test end-to-end capstone submission workflow."""
# Choose event
# Optimize model
# Benchmark
# Generate submission
# Validate submission
pass
```
---
## 5. BUG-CATCHING PRIORITIES
### 5.1 CRITICAL Bugs (Would Break Capstone)
1. **Benchmark fails with real TinyTorch models** → Students can't validate their work
2. **Statistical calculations wrong** → Incorrect optimization decisions
3. **Memory measurement always returns 0** → Can't evaluate memory optimizations
4. **Profiler integration broken** → No measurements at all
5. **compare_optimization_techniques() crashes** → Can't compare optimizations
### 5.2 HIGH-PRIORITY Bugs (Would Mislead Students)
6. **Confidence intervals calculated incorrectly** → False confidence in results
7. **Warmup runs not working** → Cold-start bias in measurements
8. **Pareto frontier analysis chooses dominated solutions** → Wrong recommendations
9. **Energy estimation produces negative values** → Meaningless results
10. **Reproducibility broken** → Can't verify submissions
### 5.3 MEDIUM-PRIORITY Bugs (Would Cause Confusion)
11. **Duck-typing fails with custom models** → Limits flexibility
12. **Resource exhaustion crashes system** → Poor student experience
13. **Serialization loses precision** → Comparison errors
14. **Report generation missing metrics** → Incomplete analysis
15. **Timeout not implemented** → Infinite loops possible
---
## 6. RECOMMENDED IMPLEMENTATION ORDER
### Phase 1: Core Functionality (Week 1)
1. `test_benchmark_real_tinytorch_models()` - CRITICAL
2. `test_statistical_validity()` - CRITICAL
3. `test_benchmark_suite_multi_metric()` - CRITICAL
### Phase 2: Optimization Integration (Week 2)
4. `test_optimization_module_integration()` - HIGH
5. `test_optimization_recommendations()` - HIGH
6. `test_complete_tinytorch_system_still_works()` - HIGH (regression)
### Phase 3: Robustness (Week 3)
7. `test_resource_exhaustion_prevention()` - MEDIUM
8. `test_benchmark_reproducibility()` - MEDIUM
9. `test_edge_case_models()` - MEDIUM
### Phase 4: Capstone Preparation (Week 4)
10. `test_tinymlperf_standard_benchmarks()` - MEDIUM-HIGH
11. `test_torchperf_olympics_workflow()` - MEDIUM-HIGH
---
## 7. ACCEPTANCE CRITERIA
Module 19 integration tests are COMPLETE when:
- [ ] **Benchmark works with real TinyTorch models** (Tensor, Linear, Conv2D, MLP, Transformer)
- [ ] **Statistical analysis is mathematically correct** (CI, mean, std validated)
- [ ] **All metrics measured correctly** (latency, memory, accuracy, energy)
- [ ] **Optimization modules integrate properly** (quantization, pruning, distillation, fusion)
- [ ] **Resource exhaustion prevented** (OOM, timeouts, cleanup tested)
- [ ] **Results are reproducible** (same model → consistent results)
- [ ] **TinyMLPerf compliance validated** (standard benchmarks run correctly)
- [ ] **Capstone workflow tested end-to-end** (Olympics submission works)
- [ ] **Progressive integration verified** (all previous modules still work)
- [ ] **Test coverage ≥ 80%** for critical integration points
---
## 8. CONCLUSION
**Current State**: CRITICAL GAP - No integration tests implemented
**Risk Level**: HIGH
- Students cannot validate benchmarking correctness
- Capstone project (TorchPerf Olympics) has no test foundation
- Integration with optimization modules unverified
- Statistical validity unchecked
**Recommendation**: IMPLEMENT IMMEDIATELY
- Start with Phase 1 (core functionality) ASAP
- Module 19 is the final implementation module before capstone
- Benchmarking is the EVALUATION framework for all optimizations
- Without tests, students cannot trust their measurements
**Estimated Effort**: 3-4 weeks for complete implementation
- Week 1: Core benchmarking tests (3 tests, ~500 LOC)
- Week 2: Optimization integration tests (3 tests, ~400 LOC)
- Week 3: Robustness tests (3 tests, ~300 LOC)
- Week 4: Capstone workflow tests (2 tests, ~300 LOC)
**Total**: ~11 comprehensive integration tests, ~1500 LOC
---
**Next Steps**:
1. Implement `test_benchmark_real_tinytorch_models()` first (most critical)
2. Add `test_statistical_validity()` (foundation for all analysis)
3. Proceed through phases systematically
4. Test with real student models from earlier modules
5. Validate capstone workflow before student submission deadlines

View File

@@ -1,119 +0,0 @@
# CLI Command Files - Usage Report
## Summary
**Status**: ✅ All files are accounted for. Some are imported but not exposed as top-level commands.
## File Categories
### 1. ✅ Registered Top-Level Commands (18)
These are in `TinyTorchCLI.commands` and accessible via `tito <command>`:
```
benchmark, book, checkpoint, community, demo, export,
grade, leaderboard, logo, milestones, module, nbgrader,
olympics, package, setup, src, system, test
```
### 2. 🔧 Internal Subcommands (7)
**Imported and used by other commands, but not top-level:**
| File | Used By | Purpose |
|------|---------|---------|
| `reset.py` | `package.py` | Reset functionality for package command |
| `module_reset.py` | `module_workflow.py` | Module reset subcommand |
| `status.py` | - | Imported in main.py but not clearly used |
| `nbdev.py` | `package.py` | NBDev integration for package command |
| `info.py` | `system.py`, `health.py` | System info subcommand |
| `health.py` | `system.py` | System health check subcommand |
| `jupyter.py` | `system.py` | Jupyter integration subcommand |
**Action**: ✅ **KEEP THESE** - They're used by other commands
### 3. ❓ Imported but Unclear Usage (2)
| File | Issue | Recommendation |
|------|-------|----------------|
| `notebooks.py` | Imported in main.py, but no usage found | Check if used, otherwise remove import |
| `status.py` | Imported in main.py, but no clear usage | Check if used, otherwise remove import |
**Action**: Need to verify these
### 4. 🗑️ Likely Unused/Deprecated (8)
| File | Status |
|------|--------|
| `check.py` | Not imported anywhere |
| `clean.py` | Not imported anywhere |
| `clean_workspace.py` | Not imported anywhere |
| `help.py` | Not imported anywhere |
| `protect.py` | Not imported anywhere |
| `report.py` | Not imported anywhere |
| `version.py` | Not imported anywhere |
| `view.py` | Not imported anywhere |
**Action**: ⚠️ Safe to delete (not imported anywhere)
## Cleanup Actions
### Step 1: Remove Dead Imports from main.py
These are imported but not registered or used:
```python
# Remove from tito/main.py lines 28-37:
from .commands.notebooks import NotebooksCommand # ❌ Not used
from .commands.status import StatusCommand # ❌ Not used (verify first)
```
### Step 2: Delete Truly Unused Files
```bash
# These are safe to delete (not imported anywhere)
rm tito/commands/check.py
rm tito/commands/clean.py
rm tito/commands/clean_workspace.py
rm tito/commands/help.py
rm tito/commands/protect.py
rm tito/commands/report.py
rm tito/commands/version.py
rm tito/commands/view.py
```
### Step 3: Verify and Update Tests
Update `test_cli_registry.py` to remove deleted files from `known_internal`:
```python
known_internal = {
'health.py', # Used by system command
'info.py', # Used by system command
'jupyter.py', # Used by system command
'nbdev.py', # Used by package command
'notebooks.py', # Verify if needed, otherwise remove
'reset.py', # Used by package command
'status.py', # Verify if needed, otherwise remove
'module_reset.py' # Used by module_workflow command
}
```
## Verification Commands
Check if status.py is actually used:
```bash
grep -r "StatusCommand" tito/ --include="*.py" | grep -v "^tito/main.py:from" | grep -v "class StatusCommand"
```
Check if notebooks.py is actually used:
```bash
grep -r "NotebooksCommand" tito/ --include="*.py" | grep -v "^tito/main.py:from" | grep -v "class NotebooksCommand"
```
## Final Architecture
After cleanup, you'll have:
- **18 top-level commands** (user-facing via `tito <cmd>`)
- **7-8 internal commands** (used as helpers by other commands)
- **0 orphaned files** (everything has a purpose)
Clean CLI with clear separation between public API and internal helpers!

View File

@@ -1,107 +0,0 @@
# Final Answer: CLI Command Cleanup
## What the Tests Found ✅
**Good news**: No broken or dangling commands! Everything is accounted for.
**However**: Found some cleanup opportunities:
### 1. Dead Imports in main.py
These 2 commands are imported but **never used**:
```python
# tito/main.py lines 28 and 37
from .commands.notebooks import NotebooksCommand # ❌ DELETE
from .commands.status import StatusCommand # ❌ DELETE
```
They're only in `__init__.py` exports, not actually used anywhere.
### 2. Orphaned Command Files (8 files)
These files exist but are **not imported anywhere**:
```bash
tito/commands/check.py
tito/commands/clean.py
tito/commands/clean_workspace.py
tito/commands/help.py
tito/commands/protect.py
tito/commands/report.py
tito/commands/version.py
tito/commands/view.py
```
### 3. Internal Helper Commands (6 files) ✅ KEEP
These are used by other commands:
- `reset.py` → used by `package.py`
- `nbdev.py` → used by `package.py`
- `info.py` → used by `system.py`
- `health.py` → used by `system.py`
- `jupyter.py` → used by `system.py`
- `module_reset.py` → used by `module_workflow.py`
## Recommended Actions
### Option A: Full Cleanup (Recommended)
```bash
# 1. Delete truly orphaned files
rm tito/commands/check.py
rm tito/commands/clean.py
rm tito/commands/clean_workspace.py
rm tito/commands/help.py
rm tito/commands/protect.py
rm tito/commands/report.py
rm tito/commands/version.py
rm tito/commands/view.py
# 2. Delete unused imported files
rm tito/commands/notebooks.py
rm tito/commands/status.py
# 3. Remove dead imports from main.py
# Edit tito/main.py and remove lines 28 and 37
```
### Option B: Conservative (Move to Archive)
```bash
# Move to archive instead of deleting
mkdir -p tito/commands/_archived
mv tito/commands/{check,clean,clean_workspace,help,protect,report,version,view}.py tito/commands/_archived/
mv tito/commands/{notebooks,status}.py tito/commands/_archived/
```
### Option C: Do Nothing
Current state is **fine** - tests prove nothing is broken. The extra files just create clutter but don't hurt.
## After Cleanup
Update `tests/cli/test_cli_registry.py`:
```python
# Remove these from known_internal since they'll be deleted:
known_internal = {
'health.py', # Used by system
'info.py', # Used by system
'jupyter.py', # Used by system
'nbdev.py', # Used by package
'reset.py', # Used by package
'module_reset.py' # Used by module_workflow
}
```
## Summary
Your CLI is **healthy**! The tests caught:
- ✅ 18 working registered commands
- ✅ 6 internal helper commands (properly used)
- ❌ 2 dead imports (should remove)
- ❌ 8 orphaned files (safe to delete)
- ❌ 2 unused command files (safe to delete)
**Total cleanup**: 12 files/imports that can be safely removed without breaking anything.
Want me to do the cleanup for you?

View File

@@ -1,233 +0,0 @@
# CLI Hierarchy Refactor - COMPLETE ✅
## Summary
Successfully refactored TinyTorch CLI from flat structure to hierarchical organization with subfolders for complex commands.
**Date**: 2025-11-28
**Tests Passing**: 52/52 ✅
**User Impact**: ZERO (completely internal)
---
## What Changed
### Before (Flat Structure)
```
tito/commands/
├── module_workflow.py
├── module_reset.py
├── system.py
├── info.py
├── health.py
├── jupyter.py
├── package.py
├── reset.py
├── nbdev.py
├── ... (34 files total, hard to navigate)
```
### After (Hierarchical Structure)
```
tito/commands/
├── module/
│ ├── __init__.py
│ ├── workflow.py # Main module command
│ └── reset.py # Module reset subcommand
├── system/
│ ├── __init__.py
│ ├── system.py # Main system command
│ ├── info.py # system info
│ ├── health.py # system doctor
│ └── jupyter.py # system jupyter
├── package/
│ ├── __init__.py
│ ├── package.py # Main package command
│ ├── reset.py # package reset
│ └── nbdev.py # package nbdev
├── _archived/ # Deprecated files
│ ├── clean.py
│ ├── help.py
│ ├── notebooks.py
│ └── status.py
├── setup.py # Simple commands stay flat
├── test.py
├── export.py
└── ... (15 simple commands)
```
---
## Benefits
### ✅ Clear Ownership
- Easy to see that `module/reset.py` belongs to module command
- No confusion about which files are helpers vs top-level commands
### ✅ Better Organization
- Related files grouped together
- Subfolders scale as commands grow
- Clear separation between simple and complex commands
### ✅ Easier Maintenance
- Tests validate structure automatically
- Adding new subcommands is straightforward
- No orphaned files hiding in flat structure
### ✅ Zero User Impact
```bash
# These still work EXACTLY the same:
tito module complete 01
tito system info
tito package export
```
---
## Files Changed
### Moved Files (9)
```
module_workflow.py → module/workflow.py
module_reset.py → module/reset.py
system.py → system/system.py
info.py → system/info.py
health.py → system/health.py
jupyter.py → system/jupyter.py
package.py → package/package.py
reset.py → package/reset.py
nbdev.py → package/nbdev.py
```
### Created Files (4)
```
module/__init__.py
system/__init__.py
package/__init__.py
_archived/README.md
```
### Updated Files (3)
```
tito/main.py # Updated imports
tito/commands/__init__.py # Updated imports
tests/cli/test_cli_registry.py # Updated file path expectations
```
### Archived Files (4)
```
Moved to _archived/:
- clean.py (deprecated)
- help.py (deprecated)
- notebooks.py (deprecated)
- status.py (deprecated)
```
---
## Test Results
### Before Refactor
```
52 tests passing ✅
```
### After Refactor
```
52 tests passing ✅
```
### Test Coverage
- ✅ All commands are BaseCommand subclasses
- ✅ All commands have descriptions
- ✅ All commands implement required methods
- ✅ All help text accessible
- ✅ No orphaned files
- ✅ All file paths correct
- ✅ All subcommands work
---
## Verification Commands
Test the refactored CLI:
```bash
# Version check
tito --version
# Module commands
tito module -h
tito module status
# System commands
tito system -h
tito system info
tito system doctor
# Package commands
tito package -h
tito package reset -h
# Run all tests
pytest tests/cli/ -v
# Quick import test
python -c "from tito.main import TinyTorchCLI; print('Success')"
```
All passing! ✅
---
## Architecture Decision
**Question**: Should we organize commands with subcommands into subfolders?
**Answer**: YES! ✅
**Follows best practices from**:
- Git (`git/builtin/`)
- AWS CLI (`awscli/customizations/`)
- Django (`django/core/management/commands/`)
- Click (Python CLI framework)
**Key insight**: Flat worked when small, but with 34 files it became unmaintainable. Hierarchical structure scales better and makes ownership crystal clear.
---
## Future Additions
When adding new commands:
### Simple Command (no subcommands)
```bash
# Create at top level
tito/commands/newcmd.py
```
### Complex Command (with subcommands)
```bash
# Create subfolder
tito/commands/newcmd/
├── __init__.py # Export main command
├── newcmd.py # Main command
└── helper.py # Subcommand
```
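As an illustration of the subfolder pattern (the names here are placeholders, not real commands), the subfolder's `__init__.py` can simply re-export the main command class so the import in `tito/main.py` stays a single line:
```python
# tito/commands/newcmd/__init__.py  (hypothetical command, used only as an example)
from .newcmd import NewCmdCommand

__all__ = ["NewCmdCommand"]
```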
Tests will automatically validate! 🎉
---
## Impact Summary
| Metric | Before | After |
|--------|--------|-------|
| Total files in commands/ | 34 | 29 (+ 3 subfolders) |
| Flat files | 34 | 19 |
| Organized in subfolders | 0 | 10 |
| Orphaned files | Unknown | 0 (archived) |
| Tests passing | 52 | 52 |
| User-facing changes | N/A | 0 |
| Developer clarity | ⚠️ Confusing | ✅ Crystal clear |
**Result**: Much cleaner, easier to maintain, zero user impact! 🚀

View File

@@ -1,472 +1,436 @@
#!/usr/bin/env python3
"""
Comprehensive gradient flow testing for TinyTorch.
Comprehensive Gradient Flow Tests for TinyTorch
================================================
This test suite systematically validates that gradients propagate correctly
through all components of the training stack.
Tests that gradients flow correctly through:
1. Simple networks (single layer)
2. Multi-layer networks (MLP)
3. Convolutional networks (CNN)
4. Attention mechanisms
5. Complete training loops
Run with: pytest tests/test_gradient_flow.py -v
Or directly: python tests/test_gradient_flow.py
This ensures backpropagation works correctly end-to-end.
"""
import numpy as np
import sys
import os
import numpy as np
# Add project root to path
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, project_root)
from tinytorch import Tensor, Linear, Dropout
from tinytorch import Sigmoid, ReLU, Tanh, GELU, Softmax
from tinytorch import MSELoss, CrossEntropyLoss, BinaryCrossEntropyLoss
from tinytorch import SGD, AdamW
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear, Dropout
from tinytorch.core.activations import ReLU, Sigmoid, Softmax
from tinytorch.core.losses import MSELoss, BinaryCrossEntropyLoss, CrossEntropyLoss
from tinytorch.core.optimizers import SGD, Adam
from tinytorch.core.spatial import Conv2d, MaxPool2d
from tinytorch.core.autograd import enable_autograd
# Enable autograd
enable_autograd()
def test_simple_linear_gradient_flow():
"""Test gradients flow through a single linear layer"""
print("\n" + "="*70)
print("TEST 1: Simple Linear Layer Gradient Flow")
print("="*70)
# Create simple network: Linear(2->1)
layer = Linear(2, 1)
# Input
x = Tensor([[1.0, 2.0]], requires_grad=True)
target = Tensor([[3.0]])
# Forward pass
output = layer.forward(x)
# Loss
loss_fn = MSELoss()
loss = loss_fn.forward(output, target)
print(f"Initial loss: {float(loss.data):.4f}")
print(f"Initial weight shape: {layer.weight.shape}")
print(f"Initial bias shape: {layer.bias.shape}")
# Backward pass
loss.backward()
# Check gradients exist
assert layer.weight.grad is not None, "Weight gradient is None!"
assert layer.bias.grad is not None, "Bias gradient is None!"
assert x.grad is not None, "Input gradient is None!"
# Check gradients are non-zero
weight_grad_norm = np.linalg.norm(layer.weight.grad.data)
bias_grad_norm = np.linalg.norm(layer.bias.grad.data)
input_grad_norm = np.linalg.norm(x.grad.data)
print(f"\n✓ Weight gradient norm: {weight_grad_norm:.6f}")
print(f"✓ Bias gradient norm: {bias_grad_norm:.6f}")
print(f"✓ Input gradient norm: {input_grad_norm:.6f}")
assert weight_grad_norm > 1e-6, f"Weight gradients too small: {weight_grad_norm}"
assert bias_grad_norm > 1e-6, f"Bias gradients too small: {bias_grad_norm}"
assert input_grad_norm > 1e-6, f"Input gradients too small: {input_grad_norm}"
print("\n✅ TEST PASSED: Gradients flow correctly through linear layer")
return True
class TestBasicTensorGradients:
"""Test gradient computation for basic tensor operations."""
def test_multiplication_gradient(self):
"""Test gradient flow through multiplication."""
x = Tensor([[1.0, 2.0]], requires_grad=True)
y = x * 3
loss = y.sum()
loss.backward()
# dy/dx = 3
assert x.grad is not None, "Gradient should be computed"
assert np.allclose(x.grad, [[3.0, 3.0]]), f"Expected [[3, 3]], got {x.grad}"
def test_addition_gradient(self):
"""Test gradient flow through addition."""
x = Tensor([[1.0, 2.0]], requires_grad=True)
y = Tensor([[3.0, 4.0]], requires_grad=True)
z = x + y
loss = z.sum()
loss.backward()
# dz/dx = 1, dz/dy = 1
assert np.allclose(x.grad, [[1.0, 1.0]]), f"x.grad: {x.grad}"
assert np.allclose(y.grad, [[1.0, 1.0]]), f"y.grad: {y.grad}"
def test_chain_rule(self):
"""Test gradient flow through chain of operations."""
x = Tensor([[2.0]], requires_grad=True)
y = x * 3 # y = 3x
z = y + 1 # z = 3x + 1
w = z * 2 # w = 2(3x + 1) = 6x + 2
w.backward()
# dw/dx = 6
assert np.allclose(x.grad, [[6.0]]), f"Expected [[6]], got {x.grad}"
def test_matmul_gradient(self):
"""Test gradient flow through matrix multiplication."""
x = Tensor([[1.0, 2.0]], requires_grad=True)
W = Tensor([[1.0], [2.0]], requires_grad=True)
y = x.matmul(W) # y = [[5.0]]
y.backward()
# dy/dx = W^T = [[1, 2]]
# dy/dW = x^T = [[1], [2]]
assert np.allclose(x.grad, [[1.0, 2.0]]), f"x.grad: {x.grad}"
assert np.allclose(W.grad, [[1.0], [2.0]]), f"W.grad: {W.grad}"
def test_broadcasting_gradient(self):
"""Test gradient flow with broadcasting (e.g., bias addition)."""
x = Tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True) # (2, 2)
bias = Tensor([1.0, 2.0], requires_grad=True) # (2,)
y = x + bias # Broadcasting happens
loss = y.sum()
loss.backward()
# Gradient should sum over broadcast dimension
assert x.grad.shape == (2, 2), f"x.grad shape: {x.grad.shape}"
assert bias.grad.shape == (2,), f"bias.grad shape: {bias.grad.shape}"
assert np.allclose(bias.grad, [2.0, 2.0]), f"bias.grad: {bias.grad}"
def test_mlp_gradient_flow():
"""Test gradients flow through multi-layer perceptron"""
print("\n" + "="*70)
print("TEST 2: Multi-Layer Perceptron Gradient Flow")
print("="*70)
# Create MLP: Input(4) -> Linear(4->8) -> ReLU -> Linear(8->2)
layer1 = Linear(4, 8)
activation = ReLU()
layer2 = Linear(8, 2)
# Input and target
x = Tensor(np.random.randn(3, 4), requires_grad=True)
target = Tensor(np.array([[1, 0], [0, 1], [1, 0]]))
print(f"Input shape: {x.shape}")
print(f"Target shape: {target.shape}")
# Forward pass
h1 = layer1.forward(x)
h1_activated = activation.forward(h1)
output = layer2.forward(h1_activated)
print(f"Hidden layer shape: {h1.shape}")
print(f"Output shape: {output.shape}")
# Loss
loss_fn = MSELoss()
loss = loss_fn.forward(output, target)
print(f"Initial loss: {float(loss.data):.4f}")
# Backward pass
loss.backward()
# Check all layer gradients exist
assert layer1.weight.grad is not None, "Layer1 weight gradient is None!"
assert layer1.bias.grad is not None, "Layer1 bias gradient is None!"
assert layer2.weight.grad is not None, "Layer2 weight gradient is None!"
assert layer2.bias.grad is not None, "Layer2 bias gradient is None!"
# Check gradient magnitudes
l1_weight_norm = np.linalg.norm(layer1.weight.grad.data)
l1_bias_norm = np.linalg.norm(layer1.bias.grad.data)
l2_weight_norm = np.linalg.norm(layer2.weight.grad.data)
l2_bias_norm = np.linalg.norm(layer2.bias.grad.data)
print(f"\n✓ Layer1 weight gradient norm: {l1_weight_norm:.6f}")
print(f"✓ Layer1 bias gradient norm: {l1_bias_norm:.6f}")
print(f"✓ Layer2 weight gradient norm: {l2_weight_norm:.6f}")
print(f"✓ Layer2 bias gradient norm: {l2_bias_norm:.6f}")
assert l1_weight_norm > 1e-6, "Layer1 weight gradients too small"
assert l1_bias_norm > 1e-6, "Layer1 bias gradients too small"
assert l2_weight_norm > 1e-6, "Layer2 weight gradients too small"
assert l2_bias_norm > 1e-6, "Layer2 bias gradients too small"
print("\n✅ TEST PASSED: Gradients flow correctly through MLP")
return True
class TestLayerGradients:
"""Test gradient computation through neural network layers."""
def test_linear_layer_gradients(self):
"""Test gradient flow through Linear layer."""
layer = Linear(2, 3)
x = Tensor([[1.0, 2.0]], requires_grad=True)
w_before = layer.weight.data.copy()
b_before = layer.bias.data.copy()
out = layer(x)
loss = out.sum()
loss.backward()
# All gradients should exist
assert layer.weight.grad is not None, "Weight gradient missing"
assert layer.bias.grad is not None, "Bias gradient missing"
assert x.grad is not None, "Input gradient missing"
# Gradient shapes should match parameter shapes
assert layer.weight.grad.shape == layer.weight.shape
assert layer.bias.grad.shape == layer.bias.shape
def test_multi_layer_gradients(self):
"""Test gradient flow through multiple layers."""
layer1 = Linear(2, 3)
layer2 = Linear(3, 1)
x = Tensor([[1.0, 2.0]], requires_grad=True)
h = layer1(x)
out = layer2(h)
loss = out.sum()
loss.backward()
# All layers should have gradients
assert layer1.weight.grad is not None
assert layer1.bias.grad is not None
assert layer2.weight.grad is not None
assert layer2.bias.grad is not None
def test_mlp_training_updates():
"""Test that MLP actually learns (loss decreases)"""
print("\n" + "="*70)
print("TEST 3: MLP Training - Loss Reduction")
print("="*70)
# Create simple MLP
layer1 = Linear(2, 4)
activation = ReLU()
layer2 = Linear(4, 1)
class TestActivationGradients:
"""Test gradient computation through activation functions."""
def test_sigmoid_gradient(self):
"""Test gradient flow through Sigmoid."""
x = Tensor([[0.0, 1.0, -1.0]], requires_grad=True)
sigmoid = Sigmoid()
y = sigmoid(x)
loss = y.sum()
loss.backward()
assert x.grad is not None, "Sigmoid gradient missing"
# Sigmoid gradient: σ'(x) = σ(x)(1 - σ(x))
# At x=0: σ(0) = 0.5, σ'(0) = 0.25
assert x.grad[0, 0] > 0, "Gradient should be positive"
def test_relu_gradient(self):
"""Test gradient flow through ReLU."""
x = Tensor([[-1.0, 0.0, 1.0]], requires_grad=True)
relu = ReLU()
y = relu(x)
loss = y.sum()
loss.backward()
# ReLU gradient: 1 if x > 0, else 0
# Note: We haven't implemented ReLU backward yet, so this will fail
# TODO: Implement ReLU backward in autograd
def test_tanh_gradient(self):
"""Test gradient flow through Tanh."""
x = Tensor([[0.0, 1.0]], requires_grad=True)
tanh = Tanh()
y = tanh(x)
loss = y.sum()
# TODO: Implement Tanh backward
# loss.backward()
# Simple dataset (XOR-like)
X = Tensor(np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]), requires_grad=False)
y = Tensor(np.array([[0.0], [1.0], [1.0], [0.0]]))
# Optimizer
optimizer = SGD([layer1.weight, layer1.bias, layer2.weight, layer2.bias], lr=0.1)
loss_fn = MSELoss()
class TestLossGradients:
"""Test gradient computation through loss functions."""
def test_bce_gradient(self):
"""Test gradient flow through Binary Cross-Entropy."""
predictions = Tensor([[0.7, 0.3, 0.9]], requires_grad=True)
targets = Tensor([[1.0, 0.0, 1.0]])
loss_fn = BinaryCrossEntropyLoss()
loss = loss_fn(predictions, targets)
loss.backward()
assert predictions.grad is not None, "BCE gradient missing"
assert predictions.grad.shape == predictions.shape
# Gradient should be negative for correct predictions
assert predictions.grad[0, 0] < 0, "Gradient sign incorrect"
def test_mse_gradient(self):
"""Test gradient flow through MSE loss."""
predictions = Tensor([[1.0, 2.0, 3.0]], requires_grad=True)
targets = Tensor([[2.0, 2.0, 2.0]])
loss_fn = MSELoss()
loss = loss_fn(predictions, targets)
# TODO: Implement MSE backward
# loss.backward()
losses = []
print("Training for 50 epochs...")
for epoch in range(50):
# Forward
h1 = layer1.forward(X)
h1_act = activation.forward(h1)
output = layer2.forward(h1_act)
class TestOptimizerIntegration:
"""Test optimizer integration with gradient flow."""
def test_sgd_updates_parameters(self):
"""Test that SGD actually updates parameters."""
layer = Linear(2, 1)
optimizer = SGD(layer.parameters(), lr=0.1)
w_before = layer.weight.data.copy()
b_before = layer.bias.data.copy()
# Forward pass
x = Tensor([[1.0, 2.0]], requires_grad=True)
out = layer(x)
loss = out.sum()
# Backward pass
loss.backward()
# Optimizer step
optimizer.step()
# Parameters should change
assert not np.allclose(layer.weight.data, w_before), "Weights didn't update"
assert not np.allclose(layer.bias.data, b_before), "Bias didn't update"
def test_zero_grad_clears_gradients(self):
"""Test that zero_grad() clears gradients."""
layer = Linear(2, 1)
optimizer = SGD(layer.parameters(), lr=0.1)
# First backward pass
x = Tensor([[1.0, 2.0]])
out = layer(x)
loss = out.sum()
loss.backward()
assert layer.weight.grad is not None, "Gradient should exist"
# Clear gradients
# Loss
loss = loss_fn.forward(output, y)
losses.append(float(loss.data))
# Backward
optimizer.zero_grad()
assert layer.weight.grad is None, "Gradient should be cleared"
assert layer.bias.grad is None, "Bias gradient should be cleared"
def test_adamw_updates_parameters(self):
"""Test that AdamW optimizer works."""
layer = Linear(2, 1)
optimizer = AdamW(layer.parameters(), lr=0.01)
w_before = layer.weight.data.copy()
x = Tensor([[1.0, 2.0]])
out = layer(x)
loss = out.sum()
loss.backward()
# Update
optimizer.step()
assert not np.allclose(layer.weight.data, w_before), "AdamW didn't update weights"
if (epoch + 1) % 10 == 0:
print(f"Epoch {epoch+1:2d}: Loss = {float(loss.data):.6f}")
# Check loss decreased
initial_loss = losses[0]
final_loss = losses[-1]
reduction = initial_loss - final_loss
reduction_pct = (reduction / initial_loss) * 100
print(f"\n✓ Initial loss: {initial_loss:.6f}")
print(f"✓ Final loss: {final_loss:.6f}")
print(f"✓ Reduction: {reduction:.6f} ({reduction_pct:.1f}%)")
assert final_loss < initial_loss, f"Loss didn't decrease! Initial: {initial_loss}, Final: {final_loss}"
assert reduction_pct > 10, f"Loss reduction too small: {reduction_pct:.1f}%"
print("\n✅ TEST PASSED: MLP learns successfully (loss decreases)")
return True
class TestFullTrainingLoop:
"""Test complete training scenarios."""
def test_simple_convergence(self):
"""Test that a simple model can learn."""
# Simple task: learn to output 5 from input [1, 2]
layer = Linear(2, 1)
optimizer = SGD(layer.parameters(), lr=0.1)
loss_fn = MSELoss()
x = Tensor([[1.0, 2.0]])
target = Tensor([[5.0]])
initial_loss = None
final_loss = None
# Train for a few iterations
for i in range(50):
# Forward
pred = layer(x)
loss = loss_fn(pred, target)
if i == 0:
initial_loss = loss.data
if i == 49:
final_loss = loss.data
# Backward
loss.backward()
# Update
optimizer.step()
optimizer.zero_grad()
# Loss should decrease
assert final_loss < initial_loss, f"Loss didn't decrease: {initial_loss}{final_loss}"
def test_binary_classification(self):
"""Test binary classification training."""
layer = Linear(2, 1)
sigmoid = Sigmoid()
loss_fn = BinaryCrossEntropyLoss()
optimizer = SGD(layer.parameters(), lr=0.1)
# Simple dataset: [1, 1] → 1, [0, 0] → 0
X = Tensor([[1.0, 1.0], [0.0, 0.0]])
y = Tensor([[1.0], [0.0]])
initial_loss = None
final_loss = None
for i in range(50):
# Forward
logits = layer(X)
probs = sigmoid(logits)
loss = loss_fn(probs, y)
if i == 0:
initial_loss = loss.data
if i == 49:
final_loss = loss.data
# Backward
loss.backward()
# Update
optimizer.step()
optimizer.zero_grad()
assert final_loss < initial_loss, "Binary classification didn't learn"
def test_cnn_gradient_flow():
"""Test gradients flow through convolutional layers"""
print("\n" + "="*70)
print("TEST 4: CNN Gradient Flow")
print("="*70)
# Create simple CNN: Conv2d -> ReLU -> Linear
conv = Conv2d(in_channels=1, out_channels=4, kernel_size=3, stride=1, padding=0)
activation = ReLU()
# Input: batch=2, channels=1, height=8, width=8
x = Tensor(np.random.randn(2, 1, 8, 8), requires_grad=True)
print(f"Input shape: {x.shape}")
print(f"Conv weight shape: {conv.weight.shape}")
# Forward through conv
conv_out = conv.forward(x)
print(f"Conv output shape: {conv_out.shape}")
activated = activation.forward(conv_out)
# Flatten for linear layer
batch_size = activated.shape[0]
flattened_size = np.prod(activated.shape[1:])
# Use reshape method to maintain gradient flow
flattened = activated.reshape(batch_size, flattened_size)
linear = Linear(flattened_size, 2)
output = linear.forward(flattened)
print(f"Flattened shape: {flattened.shape}")
print(f"Output shape: {output.shape}")
# Loss
target = Tensor(np.array([[1, 0], [0, 1]]))
loss_fn = MSELoss()
loss = loss_fn.forward(output, target)
print(f"Initial loss: {float(loss.data):.4f}")
# Backward
loss.backward()
# Check gradients
assert conv.weight.grad is not None, "Conv weight gradient is None!"
assert conv.bias.grad is not None, "Conv bias gradient is None!"
assert linear.weight.grad is not None, "Linear weight gradient is None!"
weight_grad_norm = np.linalg.norm(conv.weight.grad.data)
conv_bias_norm = np.linalg.norm(conv.bias.grad.data)
linear_grad_norm = np.linalg.norm(linear.weight.grad.data)
print(f"\n✓ Conv weight gradient norm: {weight_grad_norm:.6f}")
print(f"✓ Conv bias gradient norm: {conv_bias_norm:.6f}")
print(f"✓ Linear weight gradient norm: {linear_grad_norm:.6f}")
assert weight_grad_norm > 1e-6, f"Conv weight gradients too small: {weight_grad_norm}"
assert conv_bias_norm > 1e-6, f"Conv bias gradients too small: {conv_bias_norm}"
assert linear_grad_norm > 1e-6, f"Linear gradients too small: {linear_grad_norm}"
print("\n✅ TEST PASSED: Gradients flow correctly through CNN")
return True
class TestEdgeCases:
"""Test edge cases and potential failure modes."""
def test_zero_gradient(self):
"""Test that zero gradients don't break training."""
x = Tensor([[0.0, 0.0]], requires_grad=True)
y = x * 0
loss = y.sum()
def test_cnn_training_updates():
"""Test that CNN actually learns on simple data"""
print("\n" + "="*70)
print("TEST 5: CNN Training - Loss Reduction")
print("="*70)
# Simple CNN
conv = Conv2d(1, 2, kernel_size=3, stride=1, padding=1)
activation = ReLU()
# Simple data: 4 samples, 1 channel, 4x4 images
X = Tensor(np.random.randn(4, 1, 4, 4), requires_grad=False)
# After conv: (4, 2, 4, 4) -> flatten to (4, 32)
conv_out_size = 2 * 4 * 4 # channels * height * width
linear = Linear(conv_out_size, 2)
y = Tensor(np.array([[1, 0], [0, 1], [1, 0], [0, 1]]))
# Get parameters with gradients
params = []
for p in [conv.weight, conv.bias, linear.weight, linear.bias]:
if not p.requires_grad:
p.requires_grad = True
params.append(p)
# Optimizer
optimizer = SGD(params, lr=0.01)
loss_fn = MSELoss()
losses = []
print("Training for 30 epochs...")
for epoch in range(30):
# Forward
conv_out = conv.forward(X)
activated = activation.forward(conv_out)
# Flatten using reshape to maintain gradients
batch_size = activated.shape[0]
flattened = activated.reshape(batch_size, -1)
output = linear.forward(flattened)
# Loss
loss = loss_fn.forward(output, y)
losses.append(float(loss.data))
# Backward
optimizer.zero_grad()
loss.backward()
assert x.grad is not None
assert np.allclose(x.grad, [[0.0, 0.0]])
def test_very_small_values(self):
"""Test gradient flow with very small values."""
x = Tensor([[1e-8, 1e-8]], requires_grad=True)
y = x * 2
loss = y.sum()
loss.backward()
assert x.grad is not None
assert np.allclose(x.grad, [[2.0, 2.0]])
def test_gradient_accumulation(self):
"""Test that gradients accumulate correctly across multiple backward passes."""
x = Tensor([[1.0]], requires_grad=True)
# First backward
y1 = x * 2
y1.backward()
grad_after_first = x.grad.copy()
# Second backward (without zero_grad)
y2 = x * 3
y2.backward()
# Gradient should accumulate: 2 + 3 = 5
expected = grad_after_first + np.array([[3.0]])
assert np.allclose(x.grad, expected), f"Expected {expected}, got {x.grad}"
# Update
optimizer.step()
if (epoch + 1) % 10 == 0:
print(f"Epoch {epoch+1:2d}: Loss = {float(loss.data):.6f}")
# Check loss decreased
initial_loss = losses[0]
final_loss = losses[-1]
reduction = initial_loss - final_loss
reduction_pct = (reduction / initial_loss) * 100
print(f"\n✓ Initial loss: {initial_loss:.6f}")
print(f"✓ Final loss: {final_loss:.6f}")
print(f"✓ Reduction: {reduction:.6f} ({reduction_pct:.1f}%)")
assert final_loss < initial_loss, f"Loss didn't decrease! Initial: {initial_loss}, Final: {final_loss}"
print("\n✅ TEST PASSED: CNN learns successfully (loss decreases)")
return True
def run_all_tests():
"""Run all tests and print results."""
import inspect
test_classes = [
TestBasicTensorGradients,
TestLayerGradients,
TestActivationGradients,
TestLossGradients,
TestOptimizerIntegration,
TestFullTrainingLoop,
TestEdgeCases,
def test_gradient_accumulation():
"""Test that gradients accumulate correctly across batches"""
print("\n" + "="*70)
print("TEST 6: Gradient Accumulation")
print("="*70)
layer = Linear(2, 1)
# Two batches
x1 = Tensor([[1.0, 2.0]], requires_grad=True)
x2 = Tensor([[3.0, 4.0]], requires_grad=True)
target = Tensor([[1.0]])
loss_fn = MSELoss()
# Forward + backward on first batch (don't zero grad)
out1 = layer.forward(x1)
loss1 = loss_fn.forward(out1, target)
loss1.backward()
grad_after_first = np.array(layer.weight.grad.data)
# Forward + backward on second batch (gradients should accumulate)
out2 = layer.forward(x2)
loss2 = loss_fn.forward(out2, target)
loss2.backward()
grad_after_second = layer.weight.grad.data
# Gradients should have accumulated (not been replaced)
grad_diff = np.linalg.norm(grad_after_second - grad_after_first)
print(f"✓ Gradient after first batch norm: {np.linalg.norm(grad_after_first):.6f}")
print(f"✓ Gradient after second batch norm: {np.linalg.norm(grad_after_second):.6f}")
print(f"✓ Difference: {grad_diff:.6f}")
assert grad_diff > 1e-6, "Gradients didn't accumulate properly"
print("\n✅ TEST PASSED: Gradients accumulate correctly")
return True
def main():
"""Run all gradient flow tests"""
print("\n" + "="*70)
print(" TINYTORCH GRADIENT FLOW TEST SUITE")
print("="*70)
tests = [
("Simple Linear", test_simple_linear_gradient_flow),
("MLP Gradient Flow", test_mlp_gradient_flow),
("MLP Training", test_mlp_training_updates),
("CNN Gradient Flow", test_cnn_gradient_flow),
("CNN Training", test_cnn_training_updates),
("Gradient Accumulation", test_gradient_accumulation),
]
total_tests = 0
passed_tests = 0
failed_tests = []
skipped_tests = []
print("=" * 80)
print("🧪 TINYTORCH GRADIENT FLOW TEST SUITE")
print("=" * 80)
for test_class in test_classes:
print(f"\n{'=' * 80}")
print(f"📦 {test_class.__name__}")
print(f"{'=' * 80}")
instance = test_class()
methods = [m for m in dir(instance) if m.startswith('test_')]
for method_name in methods:
total_tests += 1
method = getattr(instance, method_name)
# Get docstring
doc = method.__doc__ or method_name
doc = doc.strip().split('\n')[0]
print(f"\n {method_name}")
print(f" {doc}")
try:
method()
print(f" ✅ PASSED")
passed_tests += 1
except NotImplementedError as e:
print(f" ⏭️ SKIPPED: {e}")
skipped_tests.append((test_class.__name__, method_name, str(e)))
except AssertionError as e:
print(f" ❌ FAILED: {e}")
failed_tests.append((test_class.__name__, method_name, str(e)))
except Exception as e:
print(f" ❌ ERROR: {e}")
failed_tests.append((test_class.__name__, method_name, str(e)))
results = []
for name, test_func in tests:
try:
result = test_func()
results.append((name, "PASSED" if result else "FAILED"))
except Exception as e:
print(f"\n❌ TEST FAILED: {name}")
print(f"Error: {str(e)}")
import traceback
traceback.print_exc()
results.append((name, "FAILED"))
# Summary
print("\n" + "=" * 80)
print("📊 TEST SUMMARY")
print("=" * 80)
print(f"Total tests: {total_tests}")
print(f"✅ Passed: {passed_tests}")
print(f"❌ Failed: {len(failed_tests)}")
print(f"⏭️ Skipped: {len(skipped_tests)}")
if failed_tests:
print("\n" + "=" * 80)
print("❌ FAILED TESTS:")
print("=" * 80)
for class_name, method_name, error in failed_tests:
print(f"\n {class_name}.{method_name}")
print(f" {error}")
if skipped_tests:
print("\n" + "=" * 80)
print("⏭️ SKIPPED TESTS (Not Yet Implemented):")
print("=" * 80)
for class_name, method_name, reason in skipped_tests:
print(f" {class_name}.{method_name}")
print("\n" + "=" * 80)
return len(failed_tests) == 0
print("\n" + "="*70)
print(" TEST SUMMARY")
print("="*70)
passed = sum(1 for _, status in results if status == "PASSED")
total = len(results)
for name, status in results:
symbol = "" if status == "PASSED" else ""
print(f"{symbol} {name}: {status}")
print(f"\nTotal: {passed}/{total} tests passed")
if passed == total:
print("\n🎉 ALL TESTS PASSED! Gradients flow correctly through TinyTorch.")
return 0
else:
print(f"\n⚠️ {total - passed} tests failed. Please review the errors above.")
return 1
if __name__ == "__main__":
success = run_all_tests()
sys.exit(0 if success else 1)
exit(main())

View File

@@ -1,436 +0,0 @@
#!/usr/bin/env python3
"""
Comprehensive Gradient Flow Tests for TinyTorch
================================================
Tests that gradients flow correctly through:
1. Simple networks (single layer)
2. Multi-layer networks (MLP)
3. Convolutional networks (CNN)
4. Attention mechanisms
5. Complete training loops
This ensures backpropagation works correctly end-to-end.
"""
import sys
import os
import numpy as np
# Add project root to path
project_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, project_root)
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear, Dropout
from tinytorch.core.activations import ReLU, Sigmoid, Softmax
from tinytorch.core.losses import MSELoss, BinaryCrossEntropyLoss, CrossEntropyLoss
from tinytorch.core.optimizers import SGD, Adam
from tinytorch.core.spatial import Conv2d, MaxPool2d
from tinytorch.core.autograd import enable_autograd
# Enable autograd
enable_autograd()
def test_simple_linear_gradient_flow():
"""Test gradients flow through a single linear layer"""
print("\n" + "="*70)
print("TEST 1: Simple Linear Layer Gradient Flow")
print("="*70)
# Create simple network: Linear(2->1)
layer = Linear(2, 1)
# Input
x = Tensor([[1.0, 2.0]], requires_grad=True)
target = Tensor([[3.0]])
# Forward pass
output = layer.forward(x)
# Loss
loss_fn = MSELoss()
loss = loss_fn.forward(output, target)
print(f"Initial loss: {float(loss.data):.4f}")
print(f"Initial weight shape: {layer.weight.shape}")
print(f"Initial bias shape: {layer.bias.shape}")
# Backward pass
loss.backward()
# Check gradients exist
assert layer.weight.grad is not None, "Weight gradient is None!"
assert layer.bias.grad is not None, "Bias gradient is None!"
assert x.grad is not None, "Input gradient is None!"
# Check gradients are non-zero
weight_grad_norm = np.linalg.norm(layer.weight.grad.data)
bias_grad_norm = np.linalg.norm(layer.bias.grad.data)
input_grad_norm = np.linalg.norm(x.grad.data)
print(f"\n✓ Weight gradient norm: {weight_grad_norm:.6f}")
print(f"✓ Bias gradient norm: {bias_grad_norm:.6f}")
print(f"✓ Input gradient norm: {input_grad_norm:.6f}")
assert weight_grad_norm > 1e-6, f"Weight gradients too small: {weight_grad_norm}"
assert bias_grad_norm > 1e-6, f"Bias gradients too small: {bias_grad_norm}"
assert input_grad_norm > 1e-6, f"Input gradients too small: {input_grad_norm}"
print("\n✅ TEST PASSED: Gradients flow correctly through linear layer")
return True
def test_mlp_gradient_flow():
"""Test gradients flow through multi-layer perceptron"""
print("\n" + "="*70)
print("TEST 2: Multi-Layer Perceptron Gradient Flow")
print("="*70)
# Create MLP: Input(4) -> Linear(4->8) -> ReLU -> Linear(8->2)
layer1 = Linear(4, 8)
activation = ReLU()
layer2 = Linear(8, 2)
# Input and target
x = Tensor(np.random.randn(3, 4), requires_grad=True)
target = Tensor(np.array([[1, 0], [0, 1], [1, 0]]))
print(f"Input shape: {x.shape}")
print(f"Target shape: {target.shape}")
# Forward pass
h1 = layer1.forward(x)
h1_activated = activation.forward(h1)
output = layer2.forward(h1_activated)
print(f"Hidden layer shape: {h1.shape}")
print(f"Output shape: {output.shape}")
# Loss
loss_fn = MSELoss()
loss = loss_fn.forward(output, target)
print(f"Initial loss: {float(loss.data):.4f}")
# Backward pass
loss.backward()
# Check all layer gradients exist
assert layer1.weight.grad is not None, "Layer1 weight gradient is None!"
assert layer1.bias.grad is not None, "Layer1 bias gradient is None!"
assert layer2.weight.grad is not None, "Layer2 weight gradient is None!"
assert layer2.bias.grad is not None, "Layer2 bias gradient is None!"
# Check gradient magnitudes
l1_weight_norm = np.linalg.norm(layer1.weight.grad.data)
l1_bias_norm = np.linalg.norm(layer1.bias.grad.data)
l2_weight_norm = np.linalg.norm(layer2.weight.grad.data)
l2_bias_norm = np.linalg.norm(layer2.bias.grad.data)
print(f"\n✓ Layer1 weight gradient norm: {l1_weight_norm:.6f}")
print(f"✓ Layer1 bias gradient norm: {l1_bias_norm:.6f}")
print(f"✓ Layer2 weight gradient norm: {l2_weight_norm:.6f}")
print(f"✓ Layer2 bias gradient norm: {l2_bias_norm:.6f}")
assert l1_weight_norm > 1e-6, "Layer1 weight gradients too small"
assert l1_bias_norm > 1e-6, "Layer1 bias gradients too small"
assert l2_weight_norm > 1e-6, "Layer2 weight gradients too small"
assert l2_bias_norm > 1e-6, "Layer2 bias gradients too small"
print("\n✅ TEST PASSED: Gradients flow correctly through MLP")
return True
def test_mlp_training_updates():
"""Test that MLP actually learns (loss decreases)"""
print("\n" + "="*70)
print("TEST 3: MLP Training - Loss Reduction")
print("="*70)
# Create simple MLP
layer1 = Linear(2, 4)
activation = ReLU()
layer2 = Linear(4, 1)
# Simple dataset (XOR-like)
X = Tensor(np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]), requires_grad=False)
y = Tensor(np.array([[0.0], [1.0], [1.0], [0.0]]))
# Optimizer
optimizer = SGD([layer1.weight, layer1.bias, layer2.weight, layer2.bias], lr=0.1)
loss_fn = MSELoss()
losses = []
print("Training for 50 epochs...")
for epoch in range(50):
# Forward
h1 = layer1.forward(X)
h1_act = activation.forward(h1)
output = layer2.forward(h1_act)
# Loss
loss = loss_fn.forward(output, y)
losses.append(float(loss.data))
# Backward
optimizer.zero_grad()
loss.backward()
# Update
optimizer.step()
if (epoch + 1) % 10 == 0:
print(f"Epoch {epoch+1:2d}: Loss = {float(loss.data):.6f}")
# Check loss decreased
initial_loss = losses[0]
final_loss = losses[-1]
reduction = initial_loss - final_loss
reduction_pct = (reduction / initial_loss) * 100
print(f"\n✓ Initial loss: {initial_loss:.6f}")
print(f"✓ Final loss: {final_loss:.6f}")
print(f"✓ Reduction: {reduction:.6f} ({reduction_pct:.1f}%)")
assert final_loss < initial_loss, f"Loss didn't decrease! Initial: {initial_loss}, Final: {final_loss}"
assert reduction_pct > 10, f"Loss reduction too small: {reduction_pct:.1f}%"
print("\n✅ TEST PASSED: MLP learns successfully (loss decreases)")
return True
def test_cnn_gradient_flow():
"""Test gradients flow through convolutional layers"""
print("\n" + "="*70)
print("TEST 4: CNN Gradient Flow")
print("="*70)
# Create simple CNN: Conv2d -> ReLU -> Linear
conv = Conv2d(in_channels=1, out_channels=4, kernel_size=3, stride=1, padding=0)
activation = ReLU()
# Input: batch=2, channels=1, height=8, width=8
x = Tensor(np.random.randn(2, 1, 8, 8), requires_grad=True)
print(f"Input shape: {x.shape}")
print(f"Conv weight shape: {conv.weight.shape}")
# Forward through conv
conv_out = conv.forward(x)
print(f"Conv output shape: {conv_out.shape}")
activated = activation.forward(conv_out)
# Flatten for linear layer
batch_size = activated.shape[0]
flattened_size = np.prod(activated.shape[1:])
# Use reshape method to maintain gradient flow
flattened = activated.reshape(batch_size, flattened_size)
linear = Linear(flattened_size, 2)
output = linear.forward(flattened)
print(f"Flattened shape: {flattened.shape}")
print(f"Output shape: {output.shape}")
# Loss
target = Tensor(np.array([[1, 0], [0, 1]]))
loss_fn = MSELoss()
loss = loss_fn.forward(output, target)
print(f"Initial loss: {float(loss.data):.4f}")
# Backward
loss.backward()
# Check gradients
assert conv.weight.grad is not None, "Conv weight gradient is None!"
assert conv.bias.grad is not None, "Conv bias gradient is None!"
assert linear.weight.grad is not None, "Linear weight gradient is None!"
weight_grad_norm = np.linalg.norm(conv.weight.grad.data)
conv_bias_norm = np.linalg.norm(conv.bias.grad.data)
linear_grad_norm = np.linalg.norm(linear.weight.grad.data)
print(f"\n✓ Conv weight gradient norm: {weight_grad_norm:.6f}")
print(f"✓ Conv bias gradient norm: {conv_bias_norm:.6f}")
print(f"✓ Linear weight gradient norm: {linear_grad_norm:.6f}")
assert weight_grad_norm > 1e-6, f"Conv weight gradients too small: {weight_grad_norm}"
assert conv_bias_norm > 1e-6, f"Conv bias gradients too small: {conv_bias_norm}"
assert linear_grad_norm > 1e-6, f"Linear gradients too small: {linear_grad_norm}"
print("\n✅ TEST PASSED: Gradients flow correctly through CNN")
return True
def test_cnn_training_updates():
"""Test that CNN actually learns on simple data"""
print("\n" + "="*70)
print("TEST 5: CNN Training - Loss Reduction")
print("="*70)
# Simple CNN
conv = Conv2d(1, 2, kernel_size=3, stride=1, padding=1)
activation = ReLU()
# Simple data: 4 samples, 1 channel, 4x4 images
X = Tensor(np.random.randn(4, 1, 4, 4), requires_grad=False)
# After conv: (4, 2, 4, 4) -> flatten to (4, 32)
conv_out_size = 2 * 4 * 4 # channels * height * width
linear = Linear(conv_out_size, 2)
y = Tensor(np.array([[1, 0], [0, 1], [1, 0], [0, 1]]))
# Get parameters with gradients
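    # The loop below forces requires_grad=True on any parameter that lacks it, so SGD receives gradients for every conv/linear parameter.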
params = []
for p in [conv.weight, conv.bias, linear.weight, linear.bias]:
if not p.requires_grad:
p.requires_grad = True
params.append(p)
# Optimizer
optimizer = SGD(params, lr=0.01)
loss_fn = MSELoss()
losses = []
print("Training for 30 epochs...")
for epoch in range(30):
# Forward
conv_out = conv.forward(X)
activated = activation.forward(conv_out)
# Flatten using reshape to maintain gradients
batch_size = activated.shape[0]
flattened = activated.reshape(batch_size, -1)
output = linear.forward(flattened)
# Loss
loss = loss_fn.forward(output, y)
losses.append(float(loss.data))
# Backward
optimizer.zero_grad()
loss.backward()
# Update
optimizer.step()
if (epoch + 1) % 10 == 0:
print(f"Epoch {epoch+1:2d}: Loss = {float(loss.data):.6f}")
# Check loss decreased
initial_loss = losses[0]
final_loss = losses[-1]
reduction = initial_loss - final_loss
reduction_pct = (reduction / initial_loss) * 100
print(f"\n✓ Initial loss: {initial_loss:.6f}")
print(f"✓ Final loss: {final_loss:.6f}")
print(f"✓ Reduction: {reduction:.6f} ({reduction_pct:.1f}%)")
assert final_loss < initial_loss, f"Loss didn't decrease! Initial: {initial_loss}, Final: {final_loss}"
print("\n✅ TEST PASSED: CNN learns successfully (loss decreases)")
return True
def test_gradient_accumulation():
"""Test that gradients accumulate correctly across batches"""
print("\n" + "="*70)
print("TEST 6: Gradient Accumulation")
print("="*70)
layer = Linear(2, 1)
# Two batches
x1 = Tensor([[1.0, 2.0]], requires_grad=True)
x2 = Tensor([[3.0, 4.0]], requires_grad=True)
target = Tensor([[1.0]])
loss_fn = MSELoss()
# Forward + backward on first batch (don't zero grad)
out1 = layer.forward(x1)
loss1 = loss_fn.forward(out1, target)
loss1.backward()
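    # Snapshot the gradient with np.array(...) (a copy), so the comparison below isn't affected if the grad buffer is updated in place.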
grad_after_first = np.array(layer.weight.grad.data)
# Forward + backward on second batch (gradients should accumulate)
out2 = layer.forward(x2)
loss2 = loss_fn.forward(out2, target)
loss2.backward()
grad_after_second = layer.weight.grad.data
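    # No copy needed here: this is the final value compared against the earlier snapshot.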
# Gradients should have accumulated (not been replaced)
grad_diff = np.linalg.norm(grad_after_second - grad_after_first)
print(f"✓ Gradient after first batch norm: {np.linalg.norm(grad_after_first):.6f}")
print(f"✓ Gradient after second batch norm: {np.linalg.norm(grad_after_second):.6f}")
print(f"✓ Difference: {grad_diff:.6f}")
assert grad_diff > 1e-6, "Gradients didn't accumulate properly"
print("\n✅ TEST PASSED: Gradients accumulate correctly")
return True
def main():
"""Run all gradient flow tests"""
print("\n" + "="*70)
print(" TINYTORCH GRADIENT FLOW TEST SUITE")
print("="*70)
tests = [
("Simple Linear", test_simple_linear_gradient_flow),
("MLP Gradient Flow", test_mlp_gradient_flow),
("MLP Training", test_mlp_training_updates),
("CNN Gradient Flow", test_cnn_gradient_flow),
("CNN Training", test_cnn_training_updates),
("Gradient Accumulation", test_gradient_accumulation),
]
results = []
for name, test_func in tests:
try:
result = test_func()
results.append((name, "PASSED" if result else "FAILED"))
except Exception as e:
print(f"\n❌ TEST FAILED: {name}")
print(f"Error: {str(e)}")
import traceback
traceback.print_exc()
results.append((name, "FAILED"))
# Summary
print("\n" + "="*70)
print(" TEST SUMMARY")
print("="*70)
passed = sum(1 for _, status in results if status == "PASSED")
total = len(results)
for name, status in results:
symbol = "" if status == "PASSED" else ""
print(f"{symbol} {name}: {status}")
print(f"\nTotal: {passed}/{total} tests passed")
if passed == total:
print("\n🎉 ALL TESTS PASSED! Gradients flow correctly through TinyTorch.")
return 0
else:
print(f"\n⚠️ {total - passed} tests failed. Please review the errors above.")
return 1
if __name__ == "__main__":
exit(main())

View File

@@ -1,24 +0,0 @@
# Archived Commands
These commands are no longer exposed at the top level of the CLI; the files are kept here for reference.
## Archived Files
- `clean.py` - Deprecated cleanup command
- `help.py` - Old help command (now handled by argparse)
- `notebooks.py` - Deprecated notebooks command
- `status.py` - Old status command (functionality moved to module workflow)
- `checkpoint.py` - Old checkpoint tracking (superseded by milestones command)
- `demo.py` - Demo runner (students can run demos directly with Python)
- `book.py` - Jupyter Book builder (developers can run jupyter-book directly)
- `leaderboard.py` - Community leaderboard (functionality merged into community command)
- `olympics.py` - Competition events (functionality merged into community command)
## Note
During the CLI reorganization on 2025-11-28, commands with subcommands were moved into logical subfolders:
- `module/` - Module workflow and reset
- `system/` - System commands (info, health, jupyter, check, version, clean_workspace, report, protect)
- `package/` - Package management (nbdev, reset)
These archived files are truly deprecated and not used anywhere in the codebase.

View File

@@ -1,396 +0,0 @@
"""
Book command for TinyTorch CLI: builds and manages the Jupyter Book.
"""
import os
import subprocess
from argparse import ArgumentParser, Namespace
from pathlib import Path
from rich.panel import Panel
from .base import BaseCommand
NOTEBOOKS_DIR = "modules"
class BookCommand(BaseCommand):
@property
def name(self) -> str:
return "book"
@property
def description(self) -> str:
return "Build and manage the TinyTorch Jupyter Book"
def add_arguments(self, parser: ArgumentParser) -> None:
subparsers = parser.add_subparsers(
dest='book_command',
help='Book management commands',
metavar='COMMAND'
)
# Build command
build_parser = subparsers.add_parser(
'build',
help='Build the Jupyter Book locally'
)
# Publish command
publish_parser = subparsers.add_parser(
'publish',
help='Generate content, commit, and publish to GitHub'
)
publish_parser.add_argument(
'--message',
type=str,
default='📚 Update book content',
help='Commit message (default: "📚 Update book content")'
)
publish_parser.add_argument(
'--branch',
type=str,
default='main',
help='Branch to push to (default: main)'
)
# Clean command
clean_parser = subparsers.add_parser(
'clean',
help='Clean built book files'
)
# Serve command
serve_parser = subparsers.add_parser(
'serve',
help='Build and serve the Jupyter Book locally'
)
serve_parser.add_argument(
'--port',
type=int,
default=8001,
help='Port to serve on (default: 8001)'
)
serve_parser.add_argument(
'--no-build',
action='store_true',
help='Skip building and serve existing files'
)
def run(self, args: Namespace) -> int:
console = self.console
# Check if we're in the right directory
if not Path("site").exists():
console.print(Panel(
"[red]❌ site/ directory not found. Run this command from the TinyTorch root directory.[/red]",
title="Error",
border_style="red"
))
return 1
# Handle subcommands
if not hasattr(args, 'book_command') or not args.book_command:
console.print(Panel(
"[bold cyan]📚 TinyTorch Book Management[/bold cyan]\n\n"
"[bold]Available Commands:[/bold]\n"
" [bold green]build[/bold green] - Build the complete Jupyter Book\n"
" [bold green]serve[/bold green] - Build and serve the Jupyter Book locally\n"
" [bold green]publish[/bold green] - Generate content, commit, and publish to GitHub\n"
" [bold green]clean[/bold green] - Clean built book files\n\n"
"[bold]Quick Start:[/bold]\n"
" [dim]tito book publish[/dim] - Generate, commit, and publish to GitHub\n"
" [dim]tito book clean[/dim] - Clean built book files",
title="Book Commands",
border_style="bright_blue"
))
return 0
if args.book_command == 'build':
return self._build_book(args)
elif args.book_command == 'serve':
return self._serve_book(args)
elif args.book_command == 'publish':
return self._publish_book(args)
elif args.book_command == 'clean':
return self._clean_book()
else:
console.print(f"[red]Unknown book command: {args.book_command}[/red]")
return 1
def _generate_overview(self) -> int:
"""Generate overview pages from modules."""
console = self.console
console.print("🔄 Generating overview pages from modules...")
try:
os.chdir("site")
result = subprocess.run(
["python3", "convert_readmes.py"],
capture_output=True,
text=True
)
if result.returncode == 0:
console.print("✅ Overview pages generated successfully")
# Show summary from the output
for line in result.stdout.split('\n'):
if "✅ Created" in line or "🎉 Converted" in line:
console.print(f" {line.strip()}")
return 0
else:
console.print(f"[red]❌ Failed to generate overview pages: {result.stderr}[/red]")
return 1
except FileNotFoundError:
console.print("[red]❌ Python3 not found or convert_readmes.py missing[/red]")
return 1
except Exception as e:
console.print(f"[red]❌ Error generating overview pages: {e}[/red]")
return 1
finally:
os.chdir("..")
def _generate_all(self) -> int:
"""Verify that all book chapters exist."""
console = self.console
console.print("📝 Verifying book chapters...")
# Check that the chapters directory exists
chapters_dir = Path("docs/chapters")
if not chapters_dir.exists():
console.print("[red]❌ docs/chapters directory not found[/red]")
return 1
# Count markdown files in chapters directory
chapter_files = list(chapters_dir.glob("*.md"))
if chapter_files:
console.print(f"✅ Found {len(chapter_files)} chapter files")
else:
console.print("[yellow]⚠️ No chapter files found in docs/chapters/[/yellow]")
return 0
def _build_book(self, args: Namespace) -> int:
"""Build the Jupyter Book locally."""
console = self.console
# First generate all content (notebooks + overview pages)
console.print("📄 Step 1: Generating all content...")
if self._generate_all() != 0:
return 1
# Then build the book
console.print("📚 Step 2: Building Jupyter Book...")
try:
os.chdir("site")
result = subprocess.run(
["jupyter-book", "build", "."],
capture_output=True,
text=True
)
if result.returncode == 0:
console.print("✅ Book built successfully!")
# Extract and show the file path
if "file://" in result.stdout:
for line in result.stdout.split('\n'):
if "file://" in line:
console.print(f"🌐 View at: {line.strip()}")
break
console.print("📁 HTML files available in: docs/_build/html/")
return 0
else:
console.print(f"[red]❌ Failed to build book[/red]")
if result.stderr:
console.print(f"Error details: {result.stderr}")
return 1
except FileNotFoundError:
console.print("[red]❌ jupyter-book not found. Install with: pip install jupyter-book[/red]")
return 1
except Exception as e:
console.print(f"[red]❌ Error building book: {e}[/red]")
return 1
finally:
os.chdir("..")
def _serve_book(self, args: Namespace) -> int:
"""Build and serve the Jupyter Book locally."""
console = self.console
# Build the book first unless --no-build is specified
if not args.no_build:
console.print("📚 Step 1: Building the book...")
if self._build_book(args) != 0:
return 1
console.print()
# Start the HTTP server
console.print("🌐 Step 2: Starting development server...")
console.print(f"📖 Open your browser to: [bold blue]http://localhost:{args.port}[/bold blue]")
console.print("🛑 Press [bold]Ctrl+C[/bold] to stop the server")
console.print()
book_dir = Path("docs/_build/html")
if not book_dir.exists():
console.print("[red]❌ Built book not found. Run with --no-build=False to build first.[/red]")
return 1
try:
# Use Python's built-in HTTP server
subprocess.run([
"python3", "-m", "http.server", str(args.port),
"--directory", str(book_dir)
])
except KeyboardInterrupt:
console.print("\n🛑 Development server stopped")
except FileNotFoundError:
console.print("[red]❌ Python3 not found in PATH[/red]")
return 1
except Exception as e:
console.print(f"[red]❌ Error starting server: {e}[/red]")
return 1
return 0
def _clean_book(self) -> int:
"""Clean built book files."""
console = self.console
console.print("🧹 Cleaning book build files...")
try:
os.chdir("site")
result = subprocess.run(
["jupyter-book", "clean", "."],
capture_output=True,
text=True
)
if result.returncode == 0:
console.print("✅ Book files cleaned successfully")
return 0
else:
console.print(f"[red]❌ Failed to clean book files: {result.stderr}[/red]")
return 1
except FileNotFoundError:
console.print("[red]❌ jupyter-book not found[/red]")
return 1
except Exception as e:
console.print(f"[red]❌ Error cleaning book: {e}[/red]")
return 1
finally:
os.chdir("..")
def _publish_book(self, args: Namespace) -> int:
"""Generate content, commit, and publish to GitHub."""
console = self.console
console.print("🚀 Starting book publishing workflow...")
# Step 1: Generate all content
console.print("📝 Step 1: Generating all content...")
if self._generate_all() != 0:
console.print("[red]❌ Failed to generate content. Aborting publish.[/red]")
return 1
# Step 2: Check git status
console.print("🔍 Step 2: Checking git status...")
try:
result = subprocess.run(
["git", "status", "--porcelain"],
capture_output=True,
text=True,
cwd="."
)
if result.returncode != 0:
console.print("[red]❌ Git not available or not a git repository[/red]")
return 1
changes = result.stdout.strip()
if not changes:
console.print("✅ No changes to publish")
return 0
except Exception as e:
console.print(f"[red]❌ Error checking git status: {e}[/red]")
return 1
# Step 3: Add and commit changes
console.print("📦 Step 3: Committing changes...")
try:
# Add all changes
subprocess.run(["git", "add", "."], check=True, cwd=".")
# Commit with message
subprocess.run([
"git", "commit", "-m", args.message
], check=True, cwd=".")
console.print(f"✅ Committed with message: {args.message}")
except subprocess.CalledProcessError as e:
console.print(f"[red]❌ Failed to commit changes: {e}[/red]")
return 1
except Exception as e:
console.print(f"[red]❌ Error during commit: {e}[/red]")
return 1
# Step 4: Push to GitHub
console.print(f"⬆️ Step 4: Pushing to {args.branch} branch...")
try:
result = subprocess.run([
"git", "push", "origin", args.branch
], capture_output=True, text=True, cwd=".")
if result.returncode == 0:
console.print(f"✅ Successfully pushed to {args.branch}")
else:
console.print(f"[red]❌ Failed to push: {result.stderr}[/red]")
return 1
except Exception as e:
console.print(f"[red]❌ Error during push: {e}[/red]")
return 1
# Step 5: Show deployment info
console.print("🌐 Step 5: Deployment initiated...")
console.print("✅ GitHub Actions will now:")
console.print(" 📚 Build the Jupyter Book")
console.print(" 🚀 Deploy to GitHub Pages")
console.print(" 🔗 Update live website")
# Try to get repository info for deployment URL
try:
result = subprocess.run([
"git", "remote", "get-url", "origin"
], capture_output=True, text=True, cwd=".")
if result.returncode == 0:
remote_url = result.stdout.strip()
if "github.com" in remote_url:
# Extract owner/repo from git URL
if remote_url.endswith(".git"):
remote_url = remote_url[:-4]
if remote_url.startswith("git@github.com:"):
repo_path = remote_url.replace("git@github.com:", "")
elif remote_url.startswith("https://github.com/"):
repo_path = remote_url.replace("https://github.com/", "")
else:
repo_path = None
if repo_path:
console.print(f"\n🔗 Monitor deployment: https://github.com/{repo_path}/actions")
console.print(f"📖 Live website: https://{repo_path.split('/')[0]}.github.io/{repo_path.split('/')[1]}/")
except Exception:
# Don't fail the whole command if we can't get repo info
pass
console.print("\n🎉 Publishing workflow complete!")
console.print("💡 Check GitHub Actions for deployment status")
return 0

View File

@@ -1,690 +0,0 @@
"""
Checkpoint tracking and visualization command for TinyTorch CLI.
Provides capability-based progress tracking through the ML systems engineering journey:
Foundation → Architecture → Training → Inference → Serving
"""
import argparse
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Tuple, Optional
from rich.console import Console
from rich.panel import Panel
from rich.progress import Progress, BarColumn, TextColumn, SpinnerColumn
from rich.table import Table
from rich.tree import Tree
from rich.text import Text
from rich.layout import Layout
from rich.columns import Columns
from rich.status import Status
from .base import BaseCommand
from ..core.config import CLIConfig
from ..core.console import get_console, print_error, print_success
class CheckpointSystem:
"""Core checkpoint tracking system."""
# Define the 20-checkpoint structure for complete ML systems engineering journey
CHECKPOINTS = {
"00": {
"name": "Environment",
"description": "Development environment setup and configuration",
"test_file": "checkpoint_00_environment.py",
"capability": "Can I configure my TinyTorch development environment?"
},
"01": {
"name": "Foundation",
"description": "Basic tensor operations and ML building blocks",
"test_file": "checkpoint_01_foundation.py",
"capability": "Can I create and manipulate the building blocks of ML?"
},
"02": {
"name": "Intelligence",
"description": "Nonlinear activation functions",
"test_file": "checkpoint_02_intelligence.py",
"capability": "Can I add nonlinearity - the key to neural network intelligence?"
},
"03": {
"name": "Components",
"description": "Fundamental neural network building blocks",
"test_file": "checkpoint_03_components.py",
"capability": "Can I build the fundamental building blocks of neural networks?"
},
"04": {
"name": "Networks",
"description": "Complete multi-layer neural networks",
"test_file": "checkpoint_04_networks.py",
"capability": "Can I build complete multi-layer neural networks?"
},
"05": {
"name": "Learning",
"description": "Spatial data processing with convolutional operations",
"test_file": "checkpoint_05_learning.py",
"capability": "Can I process spatial data like images with convolutional operations?"
},
"06": {
"name": "Attention",
"description": "Attention mechanisms for sequence understanding",
"test_file": "checkpoint_06_attention.py",
"capability": "Can I build attention mechanisms for sequence understanding?"
},
"07": {
"name": "Stability",
"description": "Training stabilization with normalization",
"test_file": "checkpoint_07_stability.py",
"capability": "Can I stabilize training with normalization techniques?"
},
"08": {
"name": "Differentiation",
"description": "Automatic gradient computation for learning",
"test_file": "checkpoint_08_differentiation.py",
"capability": "Can I automatically compute gradients for learning?"
},
"09": {
"name": "Optimization",
"description": "Sophisticated optimization algorithms",
"test_file": "checkpoint_09_optimization.py",
"capability": "Can I optimize neural networks with sophisticated algorithms?"
},
"10": {
"name": "Training",
"description": "Complete training loops for end-to-end learning",
"test_file": "checkpoint_10_training.py",
"capability": "Can I build complete training loops for end-to-end learning?"
},
"11": {
"name": "Regularization",
"description": "Overfitting prevention and robust model building",
"test_file": "checkpoint_11_regularization.py",
"capability": "Can I prevent overfitting and build robust models?"
},
"12": {
"name": "Kernels",
"description": "High-performance computational kernels",
"test_file": "checkpoint_12_kernels.py",
"capability": "Can I implement high-performance computational kernels?"
},
"13": {
"name": "Benchmarking",
"description": "Performance analysis and bottleneck identification",
"test_file": "checkpoint_13_benchmarking.py",
"capability": "Can I analyze performance and identify bottlenecks in ML systems?"
},
"14": {
"name": "Deployment",
"description": "Production deployment and monitoring",
"test_file": "checkpoint_14_deployment.py",
"capability": "Can I deploy and monitor ML systems in production?"
},
"15": {
"name": "Acceleration",
"description": "Algorithmic optimization and acceleration techniques",
"test_file": "checkpoint_15_acceleration.py",
"capability": "Can I accelerate computations through algorithmic optimization?"
},
"16": {
"name": "Quantization",
"description": "Trading precision for speed with INT8 quantization",
"test_file": "checkpoint_16_quantization.py",
"capability": "Can I trade precision for speed with INT8 quantization?"
},
"17": {
"name": "Compression",
"description": "Neural network pruning for edge deployment",
"test_file": "checkpoint_17_compression.py",
"capability": "Can I remove 70% of parameters while maintaining accuracy?"
},
"18": {
"name": "Caching",
"description": "KV caching for transformer inference optimization",
"test_file": "checkpoint_18_caching.py",
"capability": "Can I transform O(N²) to O(N) complexity with intelligent caching?"
},
"19": {
"name": "Competition",
"description": "TinyMLPerf competition system for optimization mastery",
"test_file": "checkpoint_19_competition.py",
"capability": "Can I build competition-grade benchmarking infrastructure?"
},
"20": {
"name": "TinyGPT Capstone",
"description": "Complete language model demonstrating ML systems mastery",
"test_file": "checkpoint_20_capstone.py",
"capability": "Can I build a complete language model that generates coherent text from scratch?"
}
}
def __init__(self, config: CLIConfig):
"""Initialize checkpoint system."""
self.config = config
self.console = get_console()
self.modules_dir = config.project_root / "modules" / "source"
self.checkpoints_dir = config.project_root / "tests" / "checkpoints"
def get_checkpoint_test_status(self, checkpoint_id: str) -> Dict[str, bool]:
"""Get the status of a checkpoint test file."""
if checkpoint_id not in self.CHECKPOINTS:
return {"exists": False, "tested": False, "passed": False}
test_file = self.CHECKPOINTS[checkpoint_id]["test_file"]
test_path = self.checkpoints_dir / test_file
return {
"exists": test_path.exists(),
"tested": False, # Will be set when we run tests
"passed": False # Will be set based on test results
}
def get_checkpoint_status(self, checkpoint_id: str) -> Dict:
"""Get status information for a checkpoint."""
checkpoint = self.CHECKPOINTS[checkpoint_id]
test_status = self.get_checkpoint_test_status(checkpoint_id)
return {
"checkpoint": checkpoint,
"test_status": test_status,
"is_available": test_status["exists"],
"is_complete": test_status.get("passed", False),
"checkpoint_id": checkpoint_id
}
def get_overall_progress(self) -> Dict:
"""Get overall progress across all checkpoints."""
checkpoints_status = {}
current_checkpoint = None
total_complete = 0
total_checkpoints = len(self.CHECKPOINTS)
for checkpoint_id in self.CHECKPOINTS.keys():
status = self.get_checkpoint_status(checkpoint_id)
checkpoints_status[checkpoint_id] = status
if status["is_complete"]:
total_complete += 1
elif current_checkpoint is None and status["is_available"]:
# First available but incomplete checkpoint is current
current_checkpoint = checkpoint_id
# If all are complete, set current to last checkpoint
if current_checkpoint is None and total_complete == total_checkpoints:
current_checkpoint = list(self.CHECKPOINTS.keys())[-1]
# If none are complete, start with first
elif current_checkpoint is None:
current_checkpoint = "00"
# Calculate overall percentage
overall_percent = (total_complete / total_checkpoints * 100) if total_checkpoints > 0 else 0
return {
"checkpoints": checkpoints_status,
"current": current_checkpoint,
"overall_progress": overall_percent,
"total_complete": total_complete,
"total_checkpoints": total_checkpoints
}
def run_checkpoint_test(self, checkpoint_id: str) -> Dict:
"""Run a specific checkpoint test and return results."""
if checkpoint_id not in self.CHECKPOINTS:
return {"success": False, "error": f"Unknown checkpoint: {checkpoint_id}"}
checkpoint = self.CHECKPOINTS[checkpoint_id]
test_file = checkpoint["test_file"]
test_path = self.checkpoints_dir / test_file
if not test_path.exists():
return {"success": False, "error": f"Test file not found: {test_file}"}
try:
# Run the test using subprocess to capture output
result = subprocess.run(
[sys.executable, str(test_path)],
capture_output=True,
text=True,
cwd=self.config.project_root,
timeout=30 # 30 second timeout
)
return {
"success": result.returncode == 0,
"returncode": result.returncode,
"stdout": result.stdout,
"stderr": result.stderr,
"checkpoint_name": checkpoint["name"],
"capability": checkpoint["capability"]
}
except subprocess.TimeoutExpired:
return {"success": False, "error": "Test timed out after 30 seconds"}
except Exception as e:
return {"success": False, "error": f"Test execution failed: {str(e)}"}
class CheckpointCommand(BaseCommand):
"""Checkpoint tracking and visualization command."""
name = "checkpoint"
description = "Track and visualize ML systems engineering progress through checkpoints"
def add_arguments(self, parser: argparse.ArgumentParser) -> None:
"""Add checkpoint-specific arguments."""
subparsers = parser.add_subparsers(
dest='checkpoint_command',
help='Checkpoint operations',
metavar='COMMAND'
)
# Status command
status_parser = subparsers.add_parser(
'status',
help='Show current checkpoint progress'
)
status_parser.add_argument(
'--detailed', '-d',
action='store_true',
help='Show detailed module-level progress'
)
# Timeline command
timeline_parser = subparsers.add_parser(
'timeline',
help='Show visual progress timeline'
)
timeline_parser.add_argument(
'--horizontal',
action='store_true',
help='Show horizontal timeline (default: vertical)'
)
# Test command
test_parser = subparsers.add_parser(
'test',
help='Test checkpoint capabilities'
)
test_parser.add_argument(
'checkpoint_id',
nargs='?',
help='Checkpoint ID to test (00-20, current checkpoint if not specified)'
)
# Run command (new)
run_parser = subparsers.add_parser(
'run',
help='Run specific checkpoint tests with progress tracking'
)
run_parser.add_argument(
'checkpoint_id',
help='Checkpoint ID to run (00-20)'
)
run_parser.add_argument(
'--verbose', '-v',
action='store_true',
help='Show detailed test output'
)
# Unlock command
unlock_parser = subparsers.add_parser(
'unlock',
help='Attempt to unlock next checkpoint'
)
def run(self, args: argparse.Namespace) -> int:
"""Execute checkpoint command."""
checkpoint_system = CheckpointSystem(self.config)
if not args.checkpoint_command:
return self._show_help(args)
if args.checkpoint_command == 'status':
return self._show_status(checkpoint_system, args)
elif args.checkpoint_command == 'timeline':
return self._show_timeline(checkpoint_system, args)
elif args.checkpoint_command == 'test':
return self._test_checkpoint(checkpoint_system, args)
elif args.checkpoint_command == 'run':
return self._run_checkpoint(checkpoint_system, args)
elif args.checkpoint_command == 'unlock':
return self._unlock_checkpoint(checkpoint_system, args)
else:
print_error(f"Unknown checkpoint command: {args.checkpoint_command}")
return 1
def _show_help(self, args: argparse.Namespace) -> int:
"""Show checkpoint command help."""
console = get_console()
console.print(Panel(
"[bold cyan]TinyTorch Checkpoint System[/bold cyan]\n\n"
"[bold]Track your progress through 20 capability checkpoints:[/bold]\n"
" 00-04: Foundation → Environment, tensors, networks\n"
" 05-09: Architecture → Spatial, attention, autograd, optimization\n"
" 10-14: Systems → Training, kernels, benchmarking, deployment\n"
" 15-19: Optimization → Acceleration, quantization, compression, caching, competition\n"
" 20: Capstone → Complete TinyGPT language model\n\n"
"[bold]Available Commands:[/bold]\n"
" [green]status[/green] - Show current progress and capabilities\n"
" [green]timeline[/green] - Visual progress timeline\n"
" [green]test[/green] - Test checkpoint capabilities\n"
" [green]run[/green] - Run specific checkpoint with progress\n"
" [green]unlock[/green] - Attempt to unlock next checkpoint\n\n"
"[bold]Examples:[/bold]\n"
" [dim]tito checkpoint status --detailed[/dim]\n"
" [dim]tito checkpoint timeline --horizontal[/dim]\n"
" [dim]tito checkpoint test 16[/dim]\n"
" [dim]tito checkpoint run 20 --verbose[/dim]",
title="Checkpoint System (20 Checkpoints)",
border_style="bright_blue"
))
return 0
def _show_status(self, checkpoint_system: CheckpointSystem, args: argparse.Namespace) -> int:
"""Show checkpoint status."""
console = get_console()
progress_data = checkpoint_system.get_overall_progress()
# Header
console.print(Panel(
"[bold cyan]🚀 TinyTorch Framework Capabilities[/bold cyan]",
border_style="bright_blue"
))
# Overall progress
overall_percent = progress_data["overall_progress"]
console.print(f"\n[bold]Overall Progress:[/bold] {overall_percent:.0f}% ({progress_data['total_complete']}/{progress_data['total_checkpoints']} checkpoints)")
# Current status summary
current = progress_data["current"]
if current:
current_status = progress_data["checkpoints"][current]
current_name = current_status["checkpoint"]["name"]
console.print(f"[bold]Current Checkpoint:[/bold] {current:0>2} - {current_name}")
if current_status["is_complete"]:
console.print(f"[bold green]✅ {current_name} checkpoint achieved![/bold green]")
console.print(f"[dim]Capability unlocked: {current_status['checkpoint']['capability']}[/dim]")
else:
console.print(f"[bold yellow]🎯 Ready to test {current_name} capabilities[/bold yellow]")
console.print(f"[dim]Goal: {current_status['checkpoint']['capability']}[/dim]")
console.print()
# Checkpoint progress
for checkpoint_id, checkpoint_data in progress_data["checkpoints"].items():
checkpoint = checkpoint_data["checkpoint"]
# Checkpoint header
if checkpoint_data["is_complete"]:
status_icon = ""
status_color = "green"
elif checkpoint_id == current:
status_icon = "🎯"
status_color = "yellow"
else:
status_icon = ""
status_color = "dim"
console.print(f"[bold]{status_icon} {checkpoint_id:0>2}: {checkpoint['name']}[/bold] [{status_color}]{'COMPLETE' if checkpoint_data['is_complete'] else 'PENDING'}[/{status_color}]")
if args.detailed:
# Show test file and availability
test_status = checkpoint_data["test_status"]
test_available = "" if test_status["exists"] else ""
console.print(f" {test_available} Test: {checkpoint['test_file']}")
console.print(f" [dim]{checkpoint['capability']}[/dim]\n")
return 0
def _show_timeline(self, checkpoint_system: CheckpointSystem, args: argparse.Namespace) -> int:
"""Show visual timeline with Rich progress bar."""
console = get_console()
progress_data = checkpoint_system.get_overall_progress()
console.print("\n[bold cyan]🚀 TinyTorch Framework Progress Timeline[/bold cyan]\n")
if args.horizontal:
# Enhanced horizontal timeline with progress line
overall_percent = progress_data["overall_progress"]
total_checkpoints = progress_data["total_checkpoints"]
complete_checkpoints = progress_data["total_complete"]
# Create a visual progress bar
filled = int(overall_percent / 2) # 50 characters total width
bar = "" * filled + "" * (50 - filled)
console.print(f"[bold]Overall:[/bold] [{bar}] {overall_percent:.0f}%")
console.print(f"[dim]{complete_checkpoints}/{total_checkpoints} checkpoints complete[/dim]\n")
# Show checkpoint progression - group in rows of 8
checkpoints_list = list(progress_data["checkpoints"].items())
for row_start in range(0, len(checkpoints_list), 8):
row_checkpoints = checkpoints_list[row_start:row_start + 8]
# Build the checkpoint line for this row
checkpoint_line = ""
names_line = ""
for i, (checkpoint_id, checkpoint_data) in enumerate(row_checkpoints):
checkpoint = checkpoint_data["checkpoint"]
# Checkpoint status
if checkpoint_data["is_complete"]:
checkpoint_marker = f"[green]●[/green]"
name_color = "green"
elif checkpoint_id == progress_data["current"]:
checkpoint_marker = f"[yellow]◉[/yellow]"
name_color = "yellow"
else:
checkpoint_marker = f"[dim]○[/dim]"
name_color = "dim"
# Add checkpoint with ID
checkpoint_line += f"{checkpoint_marker}{checkpoint_id}"
names_line += f"[{name_color}]{checkpoint['name'][:9]:^9}[/{name_color}]"
# Add spacing (except for last in row)
if i < len(row_checkpoints) - 1:
if checkpoint_data["is_complete"]:
checkpoint_line += "[green]━━[/green]"
else:
checkpoint_line += "[dim]━━[/dim]"
names_line += " "
console.print(checkpoint_line)
console.print(names_line)
console.print() # Empty line between rows
else:
# Vertical timeline (tree structure)
tree = Tree("ML Systems Engineering Journey (20 Checkpoints)")
for checkpoint_id, checkpoint_data in progress_data["checkpoints"].items():
checkpoint = checkpoint_data["checkpoint"]
if checkpoint_data["is_complete"]:
checkpoint_text = f"[green]✅ {checkpoint_id}: {checkpoint['name']}[/green]"
elif checkpoint_id == progress_data["current"]:
checkpoint_text = f"[yellow]🎯 {checkpoint_id}: {checkpoint['name']} (CURRENT)[/yellow]"
else:
checkpoint_text = f"[dim]⏳ {checkpoint_id}: {checkpoint['name']}[/dim]"
checkpoint_node = tree.add(checkpoint_text)
checkpoint_node.add(f"[dim]{checkpoint['capability']}[/dim]")
console.print(tree)
console.print()
return 0
def _test_checkpoint(self, checkpoint_system: CheckpointSystem, args: argparse.Namespace) -> int:
"""Test checkpoint capabilities."""
console = get_console()
# Determine which checkpoint to test
checkpoint_id = args.checkpoint_id
if not checkpoint_id:
progress_data = checkpoint_system.get_overall_progress()
checkpoint_id = progress_data["current"]
# Validate checkpoint ID
if checkpoint_id not in checkpoint_system.CHECKPOINTS:
print_error(f"Unknown checkpoint: {checkpoint_id}")
console.print(f"[dim]Available checkpoints: {', '.join(checkpoint_system.CHECKPOINTS.keys())}[/dim]")
return 1
checkpoint = checkpoint_system.CHECKPOINTS[checkpoint_id]
# Show what we're testing
console.print(f"\n[bold cyan]Testing Checkpoint {checkpoint_id}: {checkpoint['name']}[/bold cyan]")
console.print(f"[bold]Capability Question:[/bold] {checkpoint['capability']}\n")
# Run the test
with console.status(f"[bold green]Running checkpoint {checkpoint_id} test...", spinner="dots") as status:
result = checkpoint_system.run_checkpoint_test(checkpoint_id)
# Display results
if result["success"]:
console.print(f"[bold green]✅ Checkpoint {checkpoint_id} PASSED![/bold green]")
console.print(f"[green]Capability achieved: {checkpoint['capability']}[/green]\n")
# Show brief output
if result.get("stdout") and "🎉" in result["stdout"]:
# Extract the completion message
lines = result["stdout"].split('\n')
for line in lines:
if "🎉" in line or "📝" in line or "🎯" in line:
console.print(f"[dim]{line}[/dim]")
print_success(f"Checkpoint {checkpoint_id} test completed successfully!")
return 0
else:
console.print(f"[bold red]❌ Checkpoint {checkpoint_id} FAILED[/bold red]\n")
# Show error details
if "error" in result:
console.print(f"[red]Error: {result['error']}[/red]")
elif result.get("stderr"):
console.print(f"[red]Error output:[/red]")
console.print(f"[dim]{result['stderr']}[/dim]")
elif result.get("stdout"):
console.print(f"[yellow]Test output:[/yellow]")
console.print(f"[dim]{result['stdout']}[/dim]")
print_error(f"Checkpoint {checkpoint_id} test failed")
return 1
def _run_checkpoint(self, checkpoint_system: CheckpointSystem, args: argparse.Namespace) -> int:
"""Run specific checkpoint test with detailed progress tracking."""
console = get_console()
checkpoint_id = args.checkpoint_id
# Validate checkpoint ID
if checkpoint_id not in checkpoint_system.CHECKPOINTS:
print_error(f"Unknown checkpoint: {checkpoint_id}")
console.print(f"[dim]Available checkpoints: {', '.join(checkpoint_system.CHECKPOINTS.keys())}[/dim]")
return 1
checkpoint = checkpoint_system.CHECKPOINTS[checkpoint_id]
# Show detailed information
console.print(Panel(
f"[bold cyan]Checkpoint {checkpoint_id}: {checkpoint['name']}[/bold cyan]\n\n"
f"[bold]Capability Question:[/bold]\n{checkpoint['capability']}\n\n"
f"[bold]Test File:[/bold] {checkpoint['test_file']}\n"
f"[bold]Description:[/bold] {checkpoint['description']}",
title=f"Running Checkpoint {checkpoint_id}",
border_style="bright_blue"
))
# Check if test file exists
test_path = checkpoint_system.checkpoints_dir / checkpoint["test_file"]
if not test_path.exists():
print_error(f"Test file not found: {checkpoint['test_file']}")
return 1
console.print(f"\n[bold]Executing test...[/bold]")
# Run the test with status feedback
with console.status(f"[bold green]Running checkpoint {checkpoint_id} test...", spinner="dots"):
result = checkpoint_system.run_checkpoint_test(checkpoint_id)
console.print()
# Display detailed results
if result["success"]:
console.print(Panel(
f"[bold green]✅ SUCCESS![/bold green]\n\n"
f"[green]Checkpoint {checkpoint_id} completed successfully![/green]\n"
f"[green]Capability achieved: {checkpoint['capability']}[/green]",
title="Test Results",
border_style="green"
))
# Show test output if verbose or if it contains key markers
            if args.verbose or (result.get("stdout") and any(marker in result["stdout"] for marker in ["🎉", "✅", "📝", "🎯"])):
console.print(f"\n[bold]Test Output:[/bold]")
if result.get("stdout"):
console.print(result["stdout"])
return 0
else:
console.print(Panel(
f"[bold red]❌ FAILED[/bold red]\n\n"
f"[red]Checkpoint {checkpoint_id} test failed[/red]\n"
f"[yellow]This indicates the required capabilities are not yet implemented.[/yellow]",
title="Test Results",
border_style="red"
))
# Show error details
if "error" in result:
console.print(f"\n[bold red]Error:[/bold red] {result['error']}")
if args.verbose or "error" in result:
if result.get("stdout"):
console.print(f"\n[bold]Standard Output:[/bold]")
console.print(result["stdout"])
if result.get("stderr"):
console.print(f"\n[bold]Error Output:[/bold]")
console.print(result["stderr"])
return 1
def _unlock_checkpoint(self, checkpoint_system: CheckpointSystem, args: argparse.Namespace) -> int:
"""Attempt to unlock next checkpoint."""
console = get_console()
progress_data = checkpoint_system.get_overall_progress()
current = progress_data["current"]
if not current:
console.print("[green]All checkpoints completed! 🎉[/green]")
return 0
current_status = progress_data["checkpoints"][current]
if current_status["is_complete"]:
console.print(f"[green]✅ Checkpoint {current} ({current_status['checkpoint']['name']}) already complete![/green]")
# Find next checkpoint
checkpoint_ids = list(checkpoint_system.CHECKPOINTS.keys())
try:
current_index = checkpoint_ids.index(current)
if current_index < len(checkpoint_ids) - 1:
next_id = checkpoint_ids[current_index + 1]
next_checkpoint = checkpoint_system.CHECKPOINTS[next_id]
console.print(f"[bold]Next checkpoint:[/bold] {next_id} - {next_checkpoint['name']}")
console.print(f"[dim]Goal: {next_checkpoint['capability']}[/dim]")
else:
console.print("[bold]🎉 All checkpoints completed![/bold]")
except ValueError:
console.print("[yellow]Cannot determine next checkpoint[/yellow]")
else:
console.print(f"[yellow]Test checkpoint {current} to unlock your next capability:[/yellow]")
console.print(f"[bold]Goal:[/bold] {current_status['checkpoint']['capability']}")
console.print(f"[dim]Run: tito checkpoint run {current}[/dim]")
return 0

View File

@@ -1,160 +0,0 @@
"""
Clean command for TinyTorch CLI: cleans up module directories to start fresh.
"""
import shutil
from argparse import ArgumentParser, Namespace
from pathlib import Path
from rich.panel import Panel
from rich.text import Text
from .base import BaseCommand
class CleanCommand(BaseCommand):
@property
def name(self) -> str:
return "clean"
@property
def description(self) -> str:
return "Clean up module directories (notebooks, cache, etc.)"
def add_arguments(self, parser: ArgumentParser) -> None:
parser.add_argument("module", nargs="?", help="Clean specific module only")
parser.add_argument("--notebooks", action="store_true", help="Remove generated notebook files")
parser.add_argument("--cache", action="store_true", help="Remove Python cache files")
parser.add_argument("--all", action="store_true", help="Clean all modules")
parser.add_argument("--force", action="store_true", help="Skip confirmation prompt")
def run(self, args: Namespace) -> int:
console = self.console
console.print(Panel("🧹 Cleaning Module Directories",
title="Module Cleanup", border_style="bright_yellow"))
modules_dir = Path("modules")
if not modules_dir.exists():
console.print(Panel("[red]❌ modules/ directory not found[/red]",
title="Error", border_style="red"))
return 1
# Determine what to clean (file types)
clean_notebooks = args.notebooks or (not args.notebooks and not args.cache)
clean_cache = args.cache or (not args.notebooks and not args.cache)
# Determine which modules to clean
if args.module:
module_path = modules_dir / args.module
if not module_path.exists():
console.print(Panel(f"[red]❌ Module '{args.module}' not found[/red]",
title="Module Not Found", border_style="red"))
return 1
module_dirs = [module_path]
elif args.all:
# Find all module directories (exclude special directories)
exclude_dirs = {'.quarto', '__pycache__', '.git', '.pytest_cache', 'sidebar.yml', 'nbdev.yml'}
module_dirs = [d for d in modules_dir.iterdir()
if d.is_dir() and d.name not in exclude_dirs]
else:
# No module specified and no --all flag
console.print(Panel("[red]❌ Please specify a module name or use --all to clean all modules[/red]\n\n"
"[dim]Examples:[/dim]\n"
"[dim] tito module clean tensor - Clean specific module[/dim]\n"
"[dim] tito module clean --all - Clean all modules[/dim]",
title="Module Required", border_style="red"))
return 1
if not module_dirs:
console.print(Panel("[yellow]⚠️ No modules found to clean[/yellow]",
title="Nothing to Clean", border_style="yellow"))
return 0
# Show what will be cleaned
clean_text = Text()
clean_text.append("📋 Cleanup Plan:\n\n", style="bold cyan")
files_to_remove = []
for module_dir in module_dirs:
module_name = module_dir.name
clean_text.append(f"📁 {module_name}:\n", style="bold white")
if clean_notebooks:
# Find .ipynb files
for ipynb_file in module_dir.glob("*.ipynb"):
files_to_remove.append(ipynb_file)
clean_text.append(f" 🗑️ {ipynb_file.name}\n", style="yellow")
if clean_cache:
# Find __pycache__ directories
pycache_dirs = []
for pycache in module_dir.rglob("__pycache__"):
if pycache.is_dir():
pycache_dirs.append(pycache)
files_to_remove.append(pycache)
clean_text.append(f" 🗑️ {pycache.relative_to(module_dir)}/\n", style="yellow")
# Find .pyc files that are NOT inside __pycache__ directories
for pyc_file in module_dir.rglob("*.pyc"):
# Check if this pyc file is inside any __pycache__ directory
is_in_pycache = any(pycache in pyc_file.parents for pycache in pycache_dirs)
if not is_in_pycache:
files_to_remove.append(pyc_file)
clean_text.append(f" 🗑️ {pyc_file.relative_to(module_dir)}\n", style="yellow")
if not files_to_remove:
console.print(Panel("[green]✅ No files found to clean - modules are already clean![/green]",
title="Already Clean", border_style="green"))
return 0
clean_text.append(f"\n📊 Total: {len(files_to_remove)} files/directories to remove\n", style="bold cyan")
console.print(Panel(clean_text, title="Cleanup Preview", border_style="bright_yellow"))
# Ask for confirmation unless --force is used
if not args.force:
console.print("\n[yellow]This will permanently remove the files listed above.[/yellow]")
console.print("[yellow]Python source files (*.py) will be preserved.[/yellow]\n")
try:
response = input("Are you sure you want to proceed? (y/N): ").strip().lower()
if response not in ['y', 'yes']:
console.print(Panel("[cyan]Cleanup cancelled.[/cyan]",
title="Cancelled", border_style="cyan"))
return 0
except KeyboardInterrupt:
console.print(Panel("[cyan]Cleanup cancelled.[/cyan]",
title="Cancelled", border_style="cyan"))
return 0
# Perform cleanup
removed_count = 0
error_count = 0
for file_path in files_to_remove:
try:
if file_path.is_dir():
shutil.rmtree(file_path)
else:
file_path.unlink()
removed_count += 1
except Exception as e:
console.print(f" ❌ Failed to remove {file_path}: {e}")
error_count += 1
# Show results
result_text = Text()
if removed_count > 0:
result_text.append(f"✅ Successfully removed {removed_count} files/directories\n", style="bold green")
if error_count > 0:
result_text.append(f"❌ Failed to remove {error_count} files/directories\n", style="bold red")
if removed_count > 0:
result_text.append("\n💡 Next steps:\n", style="bold yellow")
result_text.append(" • Run: tito module notebooks - Regenerate notebooks\n", style="white")
result_text.append(" • Run: tito module test --all - Test all modules\n", style="white")
result_text.append(" • Run: tito module export --all - Export to package\n", style="white")
border_style = "green" if error_count == 0 else "yellow"
console.print(Panel(result_text, title="Cleanup Complete", border_style=border_style))
return 0 if error_count == 0 else 1

View File

@@ -1,263 +0,0 @@
#!/usr/bin/env python3
"""
Tito Demo Command - Show off your AI capabilities!
Runs progressive demos showing what TinyTorch can do at each stage.
"""
import argparse
import subprocess
import sys
from pathlib import Path
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from rich.text import Text
from .base import BaseCommand
console = Console()
class TinyTorchDemoMatrix:
"""Tracks and displays TinyTorch AI demo capabilities"""
def __init__(self):
self.demos = {
'math': {
'name': 'Mathematical Operations',
'file': 'demo_tensor_math.py',
'requires': ['02_tensor'],
'description': 'Linear algebra, matrix operations, transformations'
},
'logic': {
'name': 'Logical Reasoning',
'file': 'demo_activations.py',
'requires': ['02_tensor', '03_activations'],
'description': 'Boolean functions, XOR problem, decision boundaries'
},
'neuron': {
'name': 'Single Neuron Learning',
'file': 'demo_single_neuron.py',
'requires': ['02_tensor', '03_activations', '04_layers'],
'description': 'Watch a neuron learn the AND gate'
},
'network': {
'name': 'Multi-Layer Networks',
'file': 'demo_xor_network.py',
'requires': ['02_tensor', '03_activations', '04_layers', '05_dense'],
'description': 'Solve the famous XOR problem'
},
'vision': {
'name': 'Computer Vision',
'file': 'demo_vision.py',
'requires': ['02_tensor', '03_activations', '04_layers', '05_dense', '06_spatial'],
'description': 'Image processing and pattern recognition'
},
'attention': {
'name': 'Attention Mechanisms',
'file': 'demo_attention.py',
'requires': ['02_tensor', '03_activations', '04_layers', '05_dense', '07_attention'],
'description': 'Sequence processing and attention'
},
'training': {
'name': 'End-to-End Training',
'file': 'demo_training.py',
'requires': ['02_tensor', '03_activations', '04_layers', '05_dense', '11_training'],
'description': 'Complete training pipelines'
},
'language': {
'name': 'Language Generation',
'file': 'demo_language.py',
'requires': ['02_tensor', '03_activations', '04_layers', '05_dense', '07_attention', '16_tinygpt'],
'description': 'AI text generation and language models'
}
}
def check_module_exported(self, module_name):
"""Check if a module has been exported to the package"""
try:
if module_name == '02_tensor':
import tinytorch.core.tensor
return True
elif module_name == '03_activations':
import tinytorch.core.activations
return True
elif module_name == '04_layers':
import tinytorch.core.layers
return True
elif module_name == '05_dense':
import tinytorch.core.dense
return True
elif module_name == '06_spatial':
import tinytorch.core.spatial
return True
elif module_name == '07_attention':
import tinytorch.core.attention
return True
elif module_name == '11_training':
import tinytorch.core.training
return True
elif module_name == '16_tinygpt':
import tinytorch.tinygpt
return True
return False
except ImportError:
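            # ImportError means the module has not been exported into the tinytorch package yet.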
return False
def get_demo_status(self, demo_name):
"""Get status of a demo: available, partial, or unavailable"""
demo = self.demos[demo_name]
required_modules = demo['requires']
available_count = sum(1 for module in required_modules if self.check_module_exported(module))
total_count = len(required_modules)
        if available_count == total_count:
            return '✅'  # Fully available
        elif available_count > 0:
            return '⚡'  # Partially available
        else:
            return '❌'  # Not available
def show_matrix(self):
"""Display the demo capability matrix"""
console.print("\n🤖 TinyTorch Demo Matrix", style="bold cyan")
console.print("=" * 50)
table = Table(show_header=True, header_style="bold magenta")
table.add_column("Demo", style="cyan", width=20)
table.add_column("Status", justify="center", width=8)
table.add_column("Description", style="dim")
available_demos = []
for demo_name, demo_info in self.demos.items():
status = self.get_demo_status(demo_name)
table.add_row(demo_info['name'], status, demo_info['description'])
            if status == '✅':
available_demos.append(demo_name)
console.print(table)
console.print()
if available_demos:
console.print("🎯 Available Demos:", style="bold green")
for demo in available_demos:
console.print(f" • tito demo {demo}")
console.print()
console.print("Legend: ✅ Ready ⚡ Partial ❌ Not Available")
console.print()
def run_demo(self, demo_name):
"""Run a specific demo"""
if demo_name not in self.demos:
console.print(f"❌ Unknown demo: {demo_name}", style="red")
console.print("Available demos:", ', '.join(self.demos.keys()))
return False
demo = self.demos[demo_name]
status = self.get_demo_status(demo_name)
        if status == '❌':
console.print(f"❌ Demo '{demo_name}' not available", style="red")
missing_modules = [m for m in demo['requires'] if not self.check_module_exported(m)]
console.print(f"Missing modules: {', '.join(missing_modules)}")
console.print(f"Run: tito export {' '.join(missing_modules)}")
return False
        if status == '⚡':
console.print(f"⚠️ Demo '{demo_name}' partially available", style="yellow")
console.print("Some features may not work correctly.")
# Find the demo file
project_root = Path(__file__).parent.parent.parent
demo_file = project_root / "demos" / demo['file']
if not demo_file.exists():
console.print(f"❌ Demo file not found: {demo_file}", style="red")
return False
console.print(f"🚀 Running {demo['name']} Demo...", style="bold green")
console.print()
# Run the demo
try:
result = subprocess.run([sys.executable, str(demo_file)],
capture_output=False,
text=True)
return result.returncode == 0
except Exception as e:
console.print(f"❌ Demo failed: {e}", style="red")
return False
class DemoCommand(BaseCommand):
"""Command for running TinyTorch AI capability demos"""
def __init__(self, config):
super().__init__(config)
self.matrix = TinyTorchDemoMatrix()
@property
def name(self) -> str:
return "demo"
@property
def description(self) -> str:
return "Run AI capability demos"
def add_arguments(self, parser):
"""Add demo command arguments"""
parser.add_argument('demo_name', nargs='?',
help='Name of demo to run (math, logic, neuron, network, etc.)')
parser.add_argument('--all', action='store_true',
help='Run all available demos')
parser.add_argument('--matrix', action='store_true',
help='Show capability matrix only')
def run(self, args):
"""Execute the demo command"""
# Just show matrix if no args or --matrix flag
        if (not args.demo_name and not args.all) or args.matrix:
self.matrix.show_matrix()
return
# Run all available demos
if args.all:
self.matrix.show_matrix()
available_demos = [name for name in self.matrix.demos.keys()
                               if self.matrix.get_demo_status(name) == '✅']
if not available_demos:
console.print("❌ No demos available. Export some modules first!", style="red")
return
console.print(f"🚀 Running {len(available_demos)} available demos...", style="bold green")
console.print()
for demo_name in available_demos:
console.print(f"\n{'='*60}")
success = self.matrix.run_demo(demo_name)
if not success:
console.print(f"❌ Demo {demo_name} failed", style="red")
console.print(f"\n{'='*60}")
console.print("🏆 All available demos completed!", style="bold green")
return
# Run specific demo
if args.demo_name:
self.matrix.run_demo(args.demo_name)
def main():
"""Standalone entry point for development"""
    parser = argparse.ArgumentParser(description="Run TinyTorch AI capability demos")
    # NOTE: passing config=None assumes BaseCommand tolerates a missing config when run outside the tito CLI.
    cmd = DemoCommand(config=None)
    cmd.add_arguments(parser)
    args = parser.parse_args()
    cmd.run(args)
if __name__ == "__main__":
main()

View File

@@ -1,469 +0,0 @@
"""
Tiny🔥Torch Interactive Help System
Provides contextual, progressive guidance for new and experienced users.
"""
from argparse import ArgumentParser, Namespace
from typing import Optional, List, Dict, Any
import os
from pathlib import Path
from .base import BaseCommand
from ..core.config import CLIConfig
from ..core.console import get_console
from rich.console import Console
from rich.panel import Panel
from rich.columns import Columns
from rich.table import Table
from rich.text import Text
from rich.prompt import Prompt, Confirm
class HelpCommand(BaseCommand):
"""Interactive help and onboarding system."""
@property
def name(self) -> str:
return "help"
@property
def description(self) -> str:
return "Interactive help system with guided onboarding"
def add_arguments(self, parser: ArgumentParser) -> None:
"""Add help command arguments."""
parser.add_argument(
'topic',
nargs='?',
help='Specific help topic (getting-started, commands, workflow, etc.)'
)
parser.add_argument(
'--interactive', '-i',
action='store_true',
help='Launch interactive onboarding wizard'
)
parser.add_argument(
'--quick', '-q',
action='store_true',
help='Show quick reference card'
)
def run(self, args: Namespace) -> int:
"""Execute help command."""
console = get_console()
# Interactive onboarding wizard
if args.interactive:
return self._interactive_onboarding()
# Quick reference
if args.quick:
return self._show_quick_reference()
# Topic-specific help
if args.topic:
return self._show_topic_help(args.topic)
# Default: Show main help with user context
return self._show_contextual_help()
def _interactive_onboarding(self) -> int:
"""Launch interactive onboarding wizard."""
console = get_console()
# Welcome screen
console.print(Panel.fit(
"[bold blue]🚀 Welcome to Tiny🔥Torch![/bold blue]\n\n"
"Let's get you started on your ML systems engineering journey.\n"
"This quick wizard will help you understand what Tiny🔥Torch is\n"
"and guide you to the right starting point.",
title="Tiny🔥Torch Onboarding Wizard",
border_style="blue"
))
# User experience assessment
experience = self._assess_user_experience()
# Learning goal identification
goals = self._identify_learning_goals()
# Time commitment assessment
time_commitment = self._assess_time_commitment()
# Generate personalized recommendations
recommendations = self._generate_recommendations(experience, goals, time_commitment)
# Show personalized path
self._show_personalized_path(recommendations)
# Offer to start immediately
if Confirm.ask("\n[bold green]Ready to start your first steps?[/bold green]"):
self._launch_first_steps(recommendations)
return 0
def _assess_user_experience(self) -> str:
"""Assess user's ML and programming experience."""
console = get_console()
console.print("\n[bold cyan]📋 Quick Experience Assessment[/bold cyan]")
choices = [
"New to ML and Python - need fundamentals",
"Know Python, new to ML - want to learn systems",
"Use PyTorch/TensorFlow - want to understand internals",
"ML Engineer - need to debug/optimize production systems",
"Instructor - want to teach this course"
]
console.print("\nWhat best describes your background?")
for i, choice in enumerate(choices, 1):
console.print(f" {i}. {choice}")
while True:
try:
selection = int(Prompt.ask("\nEnter your choice (1-5)"))
if 1 <= selection <= 5:
return ['beginner', 'python_user', 'framework_user', 'ml_engineer', 'instructor'][selection-1]
else:
console.print("[red]Please enter a number between 1-5[/red]")
except ValueError:
console.print("[red]Please enter a valid number[/red]")
def _identify_learning_goals(self) -> List[str]:
"""Identify user's learning goals."""
console = get_console()
console.print("\n[bold cyan]🎯 Learning Goals[/bold cyan]")
console.print("What do you want to achieve? (Select all that apply)")
goals = [
("understand_internals", "Understand how PyTorch/TensorFlow work internally"),
("build_networks", "Build neural networks from scratch"),
("optimize_performance", "Learn to optimize ML system performance"),
("debug_production", "Debug production ML systems"),
("teach_course", "Teach ML systems to others"),
("career_transition", "Transition from software engineering to ML"),
("research_custom", "Implement custom operations for research")
]
selected_goals = []
for key, description in goals:
if Confirm.ask(f"{description}?"):
selected_goals.append(key)
return selected_goals
def _assess_time_commitment(self) -> str:
"""Assess available time commitment."""
console = get_console()
console.print("\n[bold cyan]⏰ Time Commitment[/bold cyan]")
choices = [
("15_minutes", "15 minutes - just want a quick taste"),
("2_hours", "2 hours - explore a few modules"),
("weekend", "Weekend project - build something substantial"),
("semester", "8-12 weeks - complete learning journey"),
("teaching", "Teaching timeline - need instructor resources")
]
console.print("How much time can you dedicate?")
for i, (key, description) in enumerate(choices, 1):
console.print(f" {i}. {description}")
while True:
try:
selection = int(Prompt.ask("\nEnter your choice (1-5)"))
if 1 <= selection <= 5:
return choices[selection-1][0]
else:
console.print("[red]Please enter a number between 1-5[/red]")
except ValueError:
console.print("[red]Please enter a valid number[/red]")
def _generate_recommendations(self, experience: str, goals: List[str], time: str) -> Dict[str, Any]:
"""Generate personalized recommendations."""
# Learning path mapping
path_mapping = {
'beginner': 'foundation_first',
'python_user': 'guided_learning',
'framework_user': 'systems_focus',
'ml_engineer': 'optimization_focus',
'instructor': 'teaching_resources'
}
# Starting point mapping
start_mapping = {
'15_minutes': 'quick_demo',
'2_hours': 'first_module',
'weekend': 'milestone_project',
'semester': 'full_curriculum',
'teaching': 'instructor_setup'
}
return {
'learning_path': path_mapping.get(experience, 'guided_learning'),
'starting_point': start_mapping.get(time, 'first_module'),
'experience_level': experience,
'goals': goals,
'time_commitment': time
}
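# Worked example (a sketch of the mappings above): a PyTorch user
# (experience='framework_user') with a free weekend (time='weekend') would get back
#   {'learning_path': 'systems_focus', 'starting_point': 'milestone_project',
#    'experience_level': 'framework_user', 'goals': [...], 'time_commitment': 'weekend'}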
def _show_personalized_path(self, recommendations: Dict[str, Any]) -> None:
"""Show personalized learning path."""
console = get_console()
# Path descriptions
paths = {
'foundation_first': {
'title': '🌱 Foundation First Path',
'description': 'Build fundamentals step-by-step with extra explanations',
'next_steps': ['Module 1: Setup & Environment', 'Python fundamentals review', 'Linear algebra primer']
},
'guided_learning': {
'title': '🎯 Guided Learning Path',
'description': 'Structured progression through all major concepts',
'next_steps': ['Module 1: Setup', 'Module 2: Tensors', 'Track progress with checkpoints']
},
'systems_focus': {
'title': '⚡ Systems Focus Path',
'description': 'Understand internals of frameworks you already use',
'next_steps': ['Compare PyTorch vs your code', 'Profile memory usage', 'Optimization modules']
},
'optimization_focus': {
'title': '🚀 Optimization Focus Path',
'description': 'Performance debugging and production optimization',
'next_steps': ['Profiling module', 'Benchmarking module', 'TinyMLPerf competition']
},
'teaching_resources': {
'title': '🎓 Teaching Resources Path',
'description': 'Instructor guides and classroom setup',
'next_steps': ['Instructor guide', 'NBGrader setup', 'Student progress tracking']
}
}
path_info = paths[recommendations['learning_path']]
console.print(f"\n[bold green]✨ Your Personalized Learning Path[/bold green]")
console.print(Panel(
f"[bold]{path_info['title']}[/bold]\n\n"
f"{path_info['description']}\n\n"
f"[bold cyan]Your Next Steps:[/bold cyan]\n" +
"\n".join(f"{step}" for step in path_info['next_steps']),
border_style="green"
))
def _launch_first_steps(self, recommendations: Dict[str, Any]) -> None:
"""Launch appropriate first steps based on recommendations."""
console = get_console()
starting_point = recommendations['starting_point']
if starting_point == 'quick_demo':
console.print("\n[bold blue]🚀 Launching Quick Demo...[/bold blue]")
console.print("Running: [code]tito demo quick[/code]")
os.system("tito demo quick")
elif starting_point == 'first_module':
console.print("\n[bold blue]🛠️ Setting up Module 1...[/bold blue]")
console.print("Next commands:")
console.print(" [code]cd modules/01_setup[/code]")
console.print(" [code]jupyter lab setup.py[/code]")
elif starting_point == 'milestone_project':
console.print("\n[bold blue]🎯 Weekend Project Recommendations...[/bold blue]")
console.print("Suggested goal: Build XOR solver (Modules 1-6)")
console.print("Time estimate: 6-8 hours")
elif starting_point == 'full_curriculum':
console.print("\n[bold blue]📚 Full Curriculum Setup...[/bold blue]")
console.print("Running checkpoint system initialization...")
os.system("tito checkpoint status")
elif starting_point == 'instructor_setup':
console.print("\n[bold blue]🎓 Instructor Resources...[/bold blue]")
console.print("Opening instructor guide...")
console.print("Check: [code]book/usage-paths/classroom-use.html[/code]")
def _show_quick_reference(self) -> int:
"""Show quick reference card."""
console = get_console()
# Essential commands table
table = Table(title="🚀 TinyTorch Quick Reference", show_header=True, header_style="bold cyan")
table.add_column("Command", style="bold", width=25)
table.add_column("Description", width=40)
table.add_column("Example", style="dim", width=30)
essential_commands = [
("tito help --interactive", "Launch onboarding wizard", "First time users"),
("tito checkpoint status", "See your progress", "Track learning journey"),
("tito module complete 02", "Finish a module", "Export & test your code"),
("tito demo quick", "See framework in action", "5-minute demonstration"),
("tito leaderboard join", "Join community", "Connect with learners"),
("tito system health", "Check environment", "Troubleshoot issues")
]
for cmd, desc, example in essential_commands:
table.add_row(cmd, desc, example)
console.print(table)
# Common workflows
console.print("\n[bold cyan]📋 Common Workflows:[/bold cyan]")
workflows = [
("New User", "tito help -i → tito checkpoint status → cd modules/01_setup"),
("Continue Learning", "tito checkpoint status → work on next module → tito module complete XX"),
("Join Community", "tito leaderboard join → submit progress → see global rankings"),
("Get Help", "tito system health → check docs/FAQ → ask community")
]
for workflow, commands in workflows:
console.print(f" [bold]{workflow}:[/bold] {commands}")
return 0
def _show_topic_help(self, topic: str) -> int:
"""Show help for specific topic."""
console = get_console()
topics = {
'getting-started': self._help_getting_started,
'commands': self._help_commands,
'workflow': self._help_workflow,
'modules': self._help_modules,
'checkpoints': self._help_checkpoints,
'community': self._help_community,
'troubleshooting': self._help_troubleshooting
}
if topic in topics:
topics[topic]()
return 0
else:
console.print(f"[red]Unknown help topic: {topic}[/red]")
console.print("Available topics: " + ", ".join(topics.keys()))
return 1
def _show_contextual_help(self) -> int:
"""Show contextual help based on user progress."""
console = get_console()
# Check user progress to provide contextual guidance
progress = self._assess_user_progress()
if progress['is_new_user']:
self._show_new_user_help()
elif progress['current_module']:
self._show_in_progress_help(progress['current_module'])
else:
self._show_experienced_user_help()
return 0
def _assess_user_progress(self) -> Dict[str, Any]:
"""Assess user's current progress."""
# Check for checkpoint files, completed modules, etc.
# This would integrate with the checkpoint system
# Simplified implementation for now
checkpoints_dir = Path("tests/checkpoints")
modules_dir = Path("modules")
return {
'is_new_user': not checkpoints_dir.exists(),
'current_module': None, # Would be determined by checkpoint status
'completed_modules': [], # Would be populated from checkpoint results
'has_joined_community': False # Would check leaderboard status
}
def _show_new_user_help(self) -> None:
"""Show help optimized for new users."""
console = get_console()
console.print(Panel.fit(
"[bold blue]👋 Welcome to Tiny🔥Torch![/bold blue]\n\n"
"You're about to build a complete ML framework from scratch.\n"
"Here's how to get started:\n\n"
"[bold cyan]Next Steps:[/bold cyan]\n"
"1. [code]tito help --interactive[/code] - Personalized onboarding\n"
"2. [code]tito system health[/code] - Check your environment\n"
"3. [code]tito checkpoint status[/code] - See the learning journey\n\n"
"[bold yellow]New to ML systems?[/bold yellow] Run the interactive wizard!",
title="Getting Started",
border_style="blue"
))
def _help_getting_started(self) -> None:
"""Detailed getting started help."""
console = get_console()
console.print("[bold blue]🚀 Getting Started with Tiny🔥Torch[/bold blue]\n")
# Installation steps
install_panel = Panel(
"[bold]1. Environment Setup[/bold]\n"
"```bash\n"
"git clone https://github.com/mlsysbook/Tiny🔥Torch.git\n"
"cd Tiny🔥Torch\n"
f"python -m venv {self.venv_path}\n"
f"source {self.venv_path}/bin/activate # Windows: .venv\\Scripts\\activate\n"
"pip install -r requirements.txt\n"
"pip install -e .\n"
"```",
title="Installation",
border_style="green"
)
# First steps
first_steps_panel = Panel(
"[bold]2. First Steps[/bold]\n"
"• [code]tito system health[/code] - Verify installation\n"
"• [code]tito help --interactive[/code] - Personalized guidance\n"
"• [code]tito checkpoint status[/code] - See learning path\n"
"• [code]cd modules/01_setup[/code] - Start first module",
title="First Steps",
border_style="blue"
)
# Learning path
learning_panel = Panel(
"[bold]3. Learning Journey[/bold]\n"
"📚 [bold]Modules 1-8:[/bold] Neural Network Foundations\n"
"🔬 [bold]Modules 9-10:[/bold] Computer Vision (CNNs)\n"
"🤖 [bold]Modules 11-14:[/bold] Language Models (Transformers)\n"
"⚡ [bold]Modules 15-20:[/bold] System Optimization\n\n"
"[dim]Each module: Build → Test → Export → Checkpoint[/dim]",
title="Learning Path",
border_style="yellow"
)
console.print(Columns([install_panel, first_steps_panel, learning_panel]))
# Additional help methods would be implemented here...
def _help_commands(self) -> None:
"""Show comprehensive command reference."""
pass
def _help_workflow(self) -> None:
"""Show common workflow patterns."""
pass
def _help_modules(self) -> None:
"""Show module system explanation."""
pass
def _help_checkpoints(self) -> None:
"""Show checkpoint system explanation."""
pass
def _help_community(self) -> None:
"""Show community features and leaderboard."""
pass
def _help_troubleshooting(self) -> None:
"""Show troubleshooting guide."""
pass
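# Usage sketch (illustrative; arguments mirror add_arguments above):
#
#     tito help                      # contextual help based on detected progress
#     tito help --interactive        # launch the onboarding wizard
#     tito help --quick              # quick reference card
#     tito help getting-started      # topic-specific help (see the topics dict)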

File diff suppressed because it is too large Load Diff

View File

@@ -1,193 +0,0 @@
"""
Notebooks command for building Jupyter notebooks from Python files using Jupytext.
"""
import subprocess
import sys
from argparse import ArgumentParser, Namespace
from pathlib import Path
from typing import List, Tuple
from rich.panel import Panel
from rich.text import Text
from .base import BaseCommand
from ..core.exceptions import ExecutionError, ModuleNotFoundError
class NotebooksCommand(BaseCommand):
"""Command to build Jupyter notebooks from Python files using Jupytext."""
@property
def name(self) -> str:
return "notebooks"
@property
def description(self) -> str:
return "Build notebooks from Python files"
def add_arguments(self, parser: ArgumentParser) -> None:
"""Add notebooks command arguments."""
parser.add_argument(
'--module',
help='Build notebook for specific module'
)
parser.add_argument(
'--force',
action='store_true',
help='Force rebuild even if notebook exists'
)
parser.add_argument(
'--dry-run',
action='store_true',
help='Show what would be built without actually building'
)
def validate_args(self, args: Namespace) -> None:
"""Validate notebooks command arguments."""
if args.module:
module_dir = self.config.modules_dir / args.module
if not module_dir.exists():
raise ModuleNotFoundError(f"Module directory '{args.module}' not found")
# Find module Python file in the module directory
# Extract short name from module directory name
if args.module.startswith(tuple(f"{i:02d}_" for i in range(100))):
short_name = args.module[3:] # Remove "00_" prefix
else:
short_name = args.module
dev_file = module_dir / f"{short_name}.py"
if not dev_file.exists():
raise ModuleNotFoundError(
f"No module file found in module '{args.module}'. Expected: {dev_file.name}"
)
def _find_dev_files(self) -> List[Path]:
"""Find all module Python files in modules directory."""
dev_files = []
# Look in modules/ directory
modules_dir = self.config.modules_dir
for module_dir in modules_dir.iterdir():
if module_dir.is_dir() and not module_dir.name.startswith('.'):
# Extract short name from module directory name
module_name = module_dir.name
if module_name.startswith(tuple(f"{i:02d}_" for i in range(100))):
short_name = module_name[3:] # Remove "00_" prefix
else:
short_name = module_name
# Look for module Python file (without _dev suffix)
py_file = module_dir / f"{short_name}.py"
if py_file.exists():
dev_files.append(py_file)
return sorted(dev_files)
def _convert_file(self, dev_file: Path) -> Tuple[bool, str]:
"""Convert a single Python file to notebook using Jupytext."""
try:
# Use Jupytext from venv to convert Python file to notebook
import sys
venv_python = Path(sys.executable)
jupytext_cmd = venv_python.parent / "jupytext"
result = subprocess.run([
str(jupytext_cmd), "--to", "notebook", str(dev_file)
], capture_output=True, text=True, timeout=30, cwd=dev_file.parent)
if result.returncode == 0:
notebook_file = dev_file.with_suffix('.ipynb')
return True, f"{dev_file.name}{notebook_file.name}"
else:
error_msg = result.stderr.strip() if result.stderr.strip() else "Conversion failed"
return False, error_msg
except subprocess.TimeoutExpired:
return False, "Conversion timed out"
except FileNotFoundError:
return False, "Jupytext not found. Install with: pip install jupytext"
except Exception as e:
return False, f"Error: {str(e)}"
def run(self, args: Namespace) -> int:
"""Execute the notebooks command."""
self.console.print(Panel(
"📓 Building Notebooks from Python Files (using Jupytext)",
title="Notebook Generation",
border_style="bright_cyan"
))
# Find files to convert
if args.module:
module_dir = self.config.modules_dir / args.module
# Extract short name from module directory name
module_name = args.module
if module_name.startswith(tuple(f"{i:02d}_" for i in range(100))):
short_name = module_name[3:] # Remove "00_" prefix
else:
short_name = module_name
dev_file = module_dir / f"{short_name}.py"
if dev_file.exists():
dev_files = [dev_file]
else:
dev_files = []
self.console.print(f"🔄 Building notebook for module: {args.module}")
else:
dev_files = self._find_dev_files()
if not dev_files:
self.console.print(Panel(
"[yellow]⚠️ No *.py files found in modules/[/yellow]",
title="Nothing to Convert",
border_style="yellow"
))
return 0
self.console.print(f"🔄 Building notebooks for {len(dev_files)} modules...")
# Dry run mode
if args.dry_run:
self.console.print("\n[cyan]Dry run mode - would convert:[/cyan]")
for dev_file in dev_files:
module_name = dev_file.parent.name
self.console.print(f"{module_name}: {dev_file.name}")
return 0
# Convert files
success_count = 0
error_count = 0
for dev_file in dev_files:
success, message = self._convert_file(dev_file)
module_name = dev_file.parent.name
if success:
success_count += 1
self.console.print(f"{module_name}: {message}")
else:
error_count += 1
self.console.print(f"{module_name}: {message}")
# Summary
self._print_summary(success_count, error_count)
return 0 if error_count == 0 else 1
def _print_summary(self, success_count: int, error_count: int) -> None:
"""Print command execution summary."""
summary_text = Text()
if success_count > 0:
summary_text.append(f"✅ Successfully built {success_count} notebook(s)\n", style="bold green")
if error_count > 0:
summary_text.append(f"❌ Failed to build {error_count} notebook(s)\n", style="bold red")
if success_count > 0:
summary_text.append("\n💡 Next steps:\n", style="bold yellow")
summary_text.append(" • Open notebooks with: jupyter lab\n", style="white")
summary_text.append(" • Work interactively in the notebooks\n", style="white")
summary_text.append(" • Export code with: tito package export\n", style="white")
summary_text.append(" • Run tests with: tito module test\n", style="white")
border_style = "green" if error_count == 0 else "yellow"
self.console.print(Panel(
summary_text,
title="Notebook Generation Complete",
border_style=border_style
))
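# Usage sketch (illustrative; options mirror add_arguments above, the module
# name is a hypothetical example):
#
#     tito notebooks                       # build notebooks for every module
#     tito notebooks --module 02_tensor    # build one module's notebook
#     tito notebooks --dry-run             # list what would be converted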

View File

@@ -1,897 +0,0 @@
"""
TinyTorch Olympics Command
Special competition events with focused challenges, time-limited competitions,
and unique recognition opportunities beyond the regular community leaderboard.
"""
import json
import os
from argparse import ArgumentParser, Namespace
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, List, Optional, Any
import uuid
from rich.panel import Panel
from rich.table import Table
from rich.progress import track
from rich.prompt import Prompt, Confirm
from rich.console import Group
from rich.align import Align
from .base import BaseCommand
from ..core.exceptions import TinyTorchCLIError
class OlympicsCommand(BaseCommand):
"""Special competition events - Focused challenges and recognition"""
@property
def name(self) -> str:
return "olympics"
@property
def description(self) -> str:
return "Special competition events with unique challenges and recognition"
def add_arguments(self, parser: ArgumentParser) -> None:
"""Add olympics subcommands."""
subparsers = parser.add_subparsers(
dest='olympics_command',
help='Olympics operations',
metavar='COMMAND'
)
# Events command
events_parser = subparsers.add_parser(
'events',
help='View current and upcoming competition events'
)
events_parser.add_argument(
'--upcoming',
action='store_true',
help='Show only upcoming events'
)
events_parser.add_argument(
'--past',
action='store_true',
help='Show past competition results'
)
# Compete command
compete_parser = subparsers.add_parser(
'compete',
help='Enter a specific competition event'
)
compete_parser.add_argument(
'--event',
required=True,
help='Event ID to compete in'
)
compete_parser.add_argument(
'--accuracy',
type=float,
help='Accuracy achieved for this competition'
)
compete_parser.add_argument(
'--model',
help='Model description and approach used'
)
compete_parser.add_argument(
'--code-url',
help='Optional: Link to your competition code/approach'
)
compete_parser.add_argument(
'--notes',
help='Competition-specific notes, innovations, learnings'
)
# Awards command
awards_parser = subparsers.add_parser(
'awards',
help='View special recognition and achievement badges'
)
awards_parser.add_argument(
'--personal',
action='store_true',
help='Show only your personal awards'
)
# History command
history_parser = subparsers.add_parser(
'history',
help='View past competition events and memorable moments'
)
history_parser.add_argument(
'--year',
type=int,
help='Filter by specific year'
)
history_parser.add_argument(
'--event-type',
choices=['speed', 'accuracy', 'innovation', 'efficiency', 'community'],
help='Filter by event type'
)
def run(self, args: Namespace) -> int:
"""Execute olympics command."""
command = getattr(args, 'olympics_command', None)
if not command:
self._show_olympics_overview()
return 0
if command == 'events':
return self._show_events(args)
elif command == 'compete':
return self._compete_in_event(args)
elif command == 'awards':
return self._show_awards(args)
elif command == 'history':
return self._show_history(args)
else:
raise TinyTorchCLIError(f"Unknown olympics command: {command}")
def _show_olympics_overview(self) -> None:
"""Show olympics overview and current special events."""
self.console.print(Panel(
Group(
Align.center("[bold bright_gold]🏅 TinyTorch Olympics 🏅[/bold bright_gold]"),
"",
"[bold]Special Competition Events![/bold] Beyond the regular community leaderboard:",
"",
"🎯 [bold bright_blue]Focused Challenges[/bold bright_blue]",
" • Time-limited competitions (24hr, 1week, 1month challenges)",
" • Specific constraints (memory-efficient, fastest training, novel architectures)",
" • Theme-based events (interpretability, fairness, efficiency)",
"",
"🏆 [bold bright_yellow]Special Recognition[/bold bright_yellow]",
" • Olympic medals and achievement badges",
" • Innovation awards for creative approaches",
" • Community impact recognition",
"",
"🌟 [bold bright_green]Current Active Events[/bold bright_green]",
" • Winter 2024 Speed Challenge (Training under 5 minutes)",
" • Memory Efficiency Olympics (Models under 1MB)",
" • Architecture Innovation Contest (Novel designs welcome)",
"",
"[bold]Available Commands:[/bold]",
" [green]events[/green] - See current and upcoming competitions",
" [green]compete[/green] - Enter a specific event",
" [green]awards[/green] - View special recognition and badges",
" [green]history[/green] - Past competitions and memorable moments",
"",
"[dim]💡 Note: Olympics are special events separate from daily community leaderboard[/dim]",
),
title="🥇 Competition Central",
border_style="bright_yellow",
padding=(1, 2)
))
def _show_events(self, args: Namespace) -> int:
"""Show current and upcoming competition events."""
# Load events data (mock for now)
events = self._load_olympics_events()
if args.upcoming:
events = [e for e in events if e["status"] == "upcoming"]
title = "📅 Upcoming Competition Events"
elif args.past:
events = [e for e in events if e["status"] == "completed"]
title = "🏛️ Past Competition Results"
else:
title = "🏅 All Competition Events"
if not events:
status_text = "upcoming" if args.upcoming else "past" if args.past else "available"
self.console.print(Panel(
f"[yellow]No {status_text} events at this time![/yellow]\n\n"
"Check back soon for new competition opportunities!",
title="📅 No Events",
border_style="yellow"
))
return 0
# Create events table
table = Table(title=title)
table.add_column("Event", style="bold")
table.add_column("Type", style="blue")
table.add_column("Duration", style="green")
table.add_column("Status", style="yellow")
table.add_column("Prize/Recognition", style="bright_magenta")
table.add_column("Participants", style="cyan", justify="right")
for event in events:
status_display = self._get_status_display(event["status"], event.get("end_date"))
table.add_row(
event["name"],
event["type"],
event["duration"],
status_display,
event["prize"],
str(event.get("participants", 0))
)
self.console.print(table)
# Show active event details
active_events = [e for e in events if e["status"] == "active"]
if active_events:
self.console.print(Panel(
Group(
"[bold bright_green]🔥 Active Competitions You Can Join Now![/bold bright_green]",
"",
*[f"• [bold]{event['name']}[/bold]: {event['description']}" for event in active_events[:3]],
"",
"[bold]Join a competition:[/bold]",
"[dim]tito olympics compete --event <event_id>[/dim]",
),
title="⚡ Join Now",
border_style="bright_green",
padding=(0, 1)
))
return 0
def _compete_in_event(self, args: Namespace) -> int:
"""Enter a competition event."""
# Check if user is registered for leaderboard
if not self._is_user_registered():
self.console.print(Panel(
"[yellow]Please register for the community leaderboard first![/yellow]\n\n"
"Olympics competitions require community membership:\n"
"[bold]tito leaderboard register[/bold]",
title="📝 Registration Required",
border_style="yellow"
))
return 1
# Load event details
event = self._get_event_details(args.event)
if not event:
self.console.print(Panel(
f"[red]Event '{args.event}' not found![/red]\n\n"
"See available events: [bold]tito olympics events[/bold]",
title="❌ Event Not Found",
border_style="red"
))
return 1
# Check if event is active
if event["status"] != "active":
self.console.print(Panel(
f"[yellow]Event '{event['name']}' is not currently active![/yellow]\n\n"
f"Status: {event['status']}\n"
"See active events: [bold]tito olympics events[/bold]",
title="⏰ Event Not Active",
border_style="yellow"
))
return 1
# Show event details and confirm participation
self._show_event_details(event)
if not Confirm.ask("\n[bold]Compete in this event?[/bold]"):
self.console.print("[dim]Maybe next time! 👋[/dim]")
return 0
# Gather competition submission
submission = self._gather_competition_submission(event, args)
# Validate submission meets event criteria
validation_result = self._validate_submission(event, submission)
if not validation_result["valid"]:
self.console.print(Panel(
f"[red]Submission doesn't meet event criteria![/red]\n\n"
f"Issue: {validation_result['reason']}\n\n"
"Please check event requirements and try again.",
title="❌ Validation Failed",
border_style="red"
))
return 1
# Save competition entry
self._save_competition_entry(event, submission)
# Show competition confirmation and standing
self._show_competition_confirmation(event, submission)
return 0
def _show_awards(self, args: Namespace) -> int:
"""Show special recognition and achievement badges."""
if args.personal:
return self._show_personal_awards()
else:
return self._show_all_awards()
def _show_personal_awards(self) -> int:
"""Show user's personal awards and badges."""
if not self._is_user_registered():
self.console.print(Panel(
"[yellow]Please register first to see your awards![/yellow]\n\n"
"Run: [bold]tito leaderboard register[/bold]",
title="📝 Registration Required",
border_style="yellow"
))
return 1
# Load user's Olympic achievements
olympic_profile = self._load_user_olympic_profile()
awards = olympic_profile.get("awards", [])
competitions = olympic_profile.get("competitions", [])
if not awards and not competitions:
self.console.print(Panel(
Group(
"[bold bright_blue]🌟 Your Olympic Journey Awaits![/bold bright_blue]",
"",
"You haven't participated in Olympics competitions yet.",
"",
"[bold]Start your journey:[/bold]",
"• Check active events: [green]tito olympics events[/green]",
"• Join a competition: [green]tito olympics compete --event <id>[/green]",
"• Earn your first Olympic badge! 🏅",
"",
"[dim]Every Olympic participant gets recognition for participation![/dim]",
),
title="🏅 Your Olympic Profile",
border_style="bright_blue",
padding=(1, 2)
))
return 0
# Show awards and achievements
self._display_personal_olympic_achievements(olympic_profile)
return 0
def _show_all_awards(self) -> int:
"""Show community awards and notable achievements."""
# Mock awards data
notable_awards = self._load_notable_awards()
# Recent awards table
table = Table(title="🏆 Recent Olympic Achievements")
table.add_column("Award", style="bold")
table.add_column("Recipient", style="green")
table.add_column("Event", style="blue")
table.add_column("Achievement", style="yellow")
table.add_column("Date", style="dim")
for award in notable_awards[:10]:
table.add_row(
award["award_type"],
award["recipient"],
award["event"],
award["description"],
award["date"]
)
self.console.print(table)
# Award categories explanation
self.console.print(Panel(
Group(
"[bold bright_yellow]🏅 Olympic Award Categories[/bold bright_yellow]",
"",
"🥇 [bold]Performance Awards[/bold]",
" • Gold/Silver/Bronze medals for top competition results",
" • Speed records, accuracy achievements, efficiency milestones",
"",
"🌟 [bold]Innovation Awards[/bold]",
" • Novel Architecture Award for creative model designs",
" • Optimization Genius for breakthrough efficiency techniques",
" • Interpretability Champion for explainable AI contributions",
"",
"🤝 [bold]Community Awards[/bold]",
" • Mentor Badge for helping other competitors",
" • Knowledge Sharer for valuable insights and tutorials",
" • Sportsperson Award for exceptional community spirit",
"",
"🎯 [bold]Special Recognition[/bold]",
" • First Participation Badge (everyone gets this!)",
" • Consistency Award for regular competition participation",
" • Breakthrough Achievement for major personal improvements",
),
title="🏆 Recognition System",
border_style="bright_yellow",
padding=(0, 1)
))
return 0
def _show_history(self, args: Namespace) -> int:
"""Show past competition events and memorable moments."""
# Load historical data
history = self._load_olympics_history()
# Filter by year if specified
if args.year:
history = [h for h in history if h["year"] == args.year]
# Filter by event type if specified
if args.event_type:
history = [h for h in history if h["type"] == args.event_type]
if not history:
filter_text = f" for {args.year}" if args.year else ""
filter_text += f" ({args.event_type} events)" if args.event_type else ""
self.console.print(Panel(
f"[yellow]No competition history found{filter_text}![/yellow]\n\n"
"The Olympics program is just getting started!",
title="📚 No History",
border_style="yellow"
))
return 0
# Create history table
table = Table(title="📚 TinyTorch Olympics History")
table.add_column("Event", style="bold")
table.add_column("Date", style="dim")
table.add_column("Type", style="blue")
table.add_column("Winner", style="green")
table.add_column("Achievement", style="yellow")
table.add_column("Memorable Moment", style="cyan")
for event in sorted(history, key=lambda x: x["date"], reverse=True):
table.add_row(
event["name"],
event["date"],
event["type"],
event["winner"],
event["winning_achievement"],
event["memorable_moment"]
)
self.console.print(table)
# Show legendary moments
if not args.year and not args.event_type:
self.console.print(Panel(
Group(
"[bold bright_gold]🌟 Legendary Olympic Moments[/bold bright_gold]",
"",
"🏆 [bold]The Great Speed Challenge 2024[/bold]",
" Winner achieved 75% CIFAR-10 accuracy in just 47 seconds!",
"",
"🧠 [bold]Architecture Innovation Contest[/bold]",
" Revolutionary attention mechanism reduced parameters by 90%",
"",
"🤝 [bold]Community Spirit Award[/bold]",
" Competitor shared winning code to help others improve",
"",
"[dim]Each Olympics creates new legends in the TinyTorch community! 💫[/dim]",
),
title="🏛️ Hall of Fame",
border_style="bright_gold",
padding=(0, 1)
))
return 0
def _load_olympics_events(self) -> List[Dict[str, Any]]:
"""Load olympics events data (mock implementation)."""
return [
{
"id": "winter2024_speed",
"name": "Winter 2024 Speed Challenge",
"type": "Speed",
"status": "active",
"duration": "24 hours",
"description": "Train CIFAR-10 model to 70%+ accuracy in under 5 minutes",
"prize": "🏆 Speed Medal + Recognition",
"participants": 23,
"start_date": "2024-01-15",
"end_date": "2024-01-16",
"criteria": {"min_accuracy": 70.0, "max_time_minutes": 5}
},
{
"id": "memory2024_efficiency",
"name": "Memory Efficiency Olympics",
"type": "Efficiency",
"status": "active",
"duration": "1 week",
"description": "Best CIFAR-10 accuracy with model under 1MB",
"prize": "🥇 Efficiency Champion",
"participants": 15,
"start_date": "2024-01-10",
"end_date": "2024-01-17",
"criteria": {"max_model_size_mb": 1.0}
},
{
"id": "innovation2024_arch",
"name": "Architecture Innovation Contest",
"type": "Innovation",
"status": "upcoming",
"duration": "2 weeks",
"description": "Novel architectures and creative approaches welcome",
"prize": "🌟 Innovation Award",
"participants": 0,
"start_date": "2024-02-01",
"end_date": "2024-02-14",
"criteria": {"novelty_required": True}
},
{
"id": "autumn2023_classic",
"name": "Autumn 2023 Classic",
"type": "Accuracy",
"status": "completed",
"duration": "1 month",
"description": "Best overall CIFAR-10 accuracy challenge",
"prize": "🥇 Gold Medal",
"participants": 87,
"start_date": "2023-10-01",
"end_date": "2023-10-31",
"winner": "neural_champion",
"winning_score": 84.2
}
]
def _get_status_display(self, status: str, end_date: Optional[str] = None) -> str:
"""Get display-friendly status with timing information."""
if status == "active":
if end_date:
# Calculate time remaining
end = datetime.fromisoformat(end_date)
now = datetime.now()
if end > now:
remaining = end - now
if remaining.days > 0:
return f"🔥 Active ({remaining.days}d left)"
else:
hours = remaining.seconds // 3600
return f"🔥 Active ({hours}h left)"
return "🔥 Active"
elif status == "upcoming":
return "📅 Upcoming"
elif status == "completed":
return "✅ Completed"
else:
return status.title()
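# Example outputs (a sketch, assuming the current time is 2024-01-14 00:00):
#   _get_status_display("active", "2024-01-16")  -> "🔥 Active (2d left)"
#   _get_status_display("upcoming")               -> "📅 Upcoming"
#   _get_status_display("completed")              -> "✅ Completed"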
def _is_user_registered(self) -> bool:
"""Check if user is registered for community leaderboard."""
from .leaderboard import LeaderboardCommand
leaderboard_cmd = LeaderboardCommand(self.config)
return leaderboard_cmd._load_user_profile() is not None
def _get_event_details(self, event_id: str) -> Optional[Dict[str, Any]]:
"""Get details for a specific event."""
events = self._load_olympics_events()
return next((e for e in events if e["id"] == event_id), None)
def _show_event_details(self, event: Dict[str, Any]) -> None:
"""Show detailed information about an event."""
self.console.print(Panel(
Group(
f"[bold bright_blue]{event['name']}[/bold bright_blue]",
"",
f"[bold]Type:[/bold] {event['type']}",
f"[bold]Duration:[/bold] {event['duration']}",
f"[bold]Current Participants:[/bold] {event.get('participants', 0)}",
"",
f"[bold]Challenge:[/bold]",
f" {event['description']}",
"",
f"[bold]Recognition:[/bold]",
f" {event['prize']}",
"",
f"[bold]Requirements:[/bold]",
*[f"{k.replace('_', ' ').title()}: {v}" for k, v in event.get('criteria', {}).items()],
),
title=f"🏅 {event['type']} Competition",
border_style="bright_blue",
padding=(1, 2)
))
def _gather_competition_submission(self, event: Dict[str, Any], args: Namespace) -> Dict[str, Any]:
"""Gather submission details for competition."""
submission = {
"event_id": event["id"],
"submitted_date": datetime.now().isoformat()
}
# Get accuracy
if args.accuracy is not None:
submission["accuracy"] = args.accuracy
else:
submission["accuracy"] = float(Prompt.ask(
f"[bold]Accuracy achieved on {event.get('dataset', 'the task')}[/bold]",
default="0.0"
))
# Get model description
if args.model:
submission["model"] = args.model
else:
submission["model"] = Prompt.ask(
"[bold]Model description[/bold] (architecture, approach, innovations)",
default="Custom Model"
)
# Optional fields
submission["code_url"] = args.code_url or Prompt.ask(
"[bold]Code/approach URL[/bold] (optional)",
default=""
) or None
submission["notes"] = args.notes or Prompt.ask(
"[bold]Competition notes[/bold] (innovations, challenges, learnings)",
default=""
) or None
# Event-specific metrics
if "max_time_minutes" in event.get("criteria", {}):
training_time = float(Prompt.ask(
"[bold]Training time in minutes[/bold]",
default="0.0"
))
submission["training_time_minutes"] = training_time
if "max_model_size_mb" in event.get("criteria", {}):
model_size = float(Prompt.ask(
"[bold]Model size in MB[/bold]",
default="0.0"
))
submission["model_size_mb"] = model_size
return submission
def _validate_submission(self, event: Dict[str, Any], submission: Dict[str, Any]) -> Dict[str, Any]:
"""Validate submission meets event criteria."""
criteria = event.get("criteria", {})
# Check minimum accuracy
if "min_accuracy" in criteria:
if submission["accuracy"] < criteria["min_accuracy"]:
return {
"valid": False,
"reason": f"Accuracy {submission['accuracy']:.1f}% below required {criteria['min_accuracy']:.1f}%"
}
# Check maximum training time
if "max_time_minutes" in criteria:
if submission.get("training_time_minutes", 0) > criteria["max_time_minutes"]:
return {
"valid": False,
"reason": f"Training time {submission['training_time_minutes']:.1f}min exceeds limit {criteria['max_time_minutes']:.1f}min"
}
# Check maximum model size
if "max_model_size_mb" in criteria:
if submission.get("model_size_mb", 0) > criteria["max_model_size_mb"]:
return {
"valid": False,
"reason": f"Model size {submission['model_size_mb']:.1f}MB exceeds limit {criteria['max_model_size_mb']:.1f}MB"
}
return {"valid": True}
def _save_competition_entry(self, event: Dict[str, Any], submission: Dict[str, Any]) -> None:
"""Save competition entry to user's Olympic profile."""
olympic_profile = self._load_user_olympic_profile()
if "competitions" not in olympic_profile:
olympic_profile["competitions"] = []
olympic_profile["competitions"].append(submission)
# Add participation award if first competition
if len(olympic_profile["competitions"]) == 1:
award = {
"type": "participation",
"name": "First Olympic Participation",
"description": "Welcomed to the Olympics community!",
"event": event["name"],
"earned_date": datetime.now().isoformat()
}
if "awards" not in olympic_profile:
olympic_profile["awards"] = []
olympic_profile["awards"].append(award)
self._save_user_olympic_profile(olympic_profile)
def _show_competition_confirmation(self, event: Dict[str, Any], submission: Dict[str, Any]) -> None:
"""Show confirmation and current standing."""
# Determine performance level for this competition
ranking_message = self._get_competition_ranking_message(event, submission)
self.console.print(Panel(
Group(
Align.center("[bold bright_green]🎉 Competition Entry Submitted! 🎉[/bold bright_green]"),
"",
f"[bold]Event:[/bold] {event['name']}",
f"[bold]Your Result:[/bold] {submission['accuracy']:.1f}% accuracy",
f"[bold]Model:[/bold] {submission['model']}",
"",
ranking_message,
"",
"[bold bright_blue]🏅 Recognition Earned:[/bold bright_blue]",
"• Olympic Participant Badge",
"• Competition Experience Points",
"• Community Recognition",
"",
"[bold]Next Steps:[/bold]",
"• View your awards: [green]tito olympics awards --personal[/green]",
"• See current standings: [green]tito olympics events[/green]",
"• Join another event: [green]tito olympics events[/green]",
),
title="🥇 Olympic Achievement",
border_style="bright_green",
padding=(1, 2)
))
def _get_competition_ranking_message(self, event: Dict[str, Any], submission: Dict[str, Any]) -> str:
"""Get appropriate ranking/performance message for competition."""
accuracy = submission["accuracy"]
# Mock competition standings for encouragement
if accuracy >= 80:
return "[bright_green]🏆 Outstanding performance! You're in contention for top prizes![/bright_green]"
elif accuracy >= 70:
return "[bright_blue]🎯 Strong showing! You're competing well in this event![/bright_blue]"
elif accuracy >= 60:
return "[bright_yellow]🌟 Good effort! Every competition teaches valuable lessons![/bright_yellow]"
else:
return "[bright_magenta]💝 Thank you for participating! Competition experience is valuable![/bright_magenta]"
def _load_user_olympic_profile(self) -> Dict[str, Any]:
"""Load user's Olympic competition profile."""
data_dir = Path.home() / ".tinytorch" / "olympics"
data_dir.mkdir(parents=True, exist_ok=True)
profile_file = data_dir / "olympic_profile.json"
if profile_file.exists():
with open(profile_file, 'r') as f:
return json.load(f)
return {
"competitions": [],
"awards": [],
"created_date": datetime.now().isoformat()
}
def _save_user_olympic_profile(self, profile: Dict[str, Any]) -> None:
"""Save user's Olympic competition profile."""
data_dir = Path.home() / ".tinytorch" / "olympics"
profile_file = data_dir / "olympic_profile.json"
with open(profile_file, 'w') as f:
json.dump(profile, f, indent=2)
def _display_personal_olympic_achievements(self, olympic_profile: Dict[str, Any]) -> None:
"""Display user's personal Olympic achievements."""
competitions = olympic_profile.get("competitions", [])
awards = olympic_profile.get("awards", [])
# Summary stats
total_competitions = len(competitions)
best_accuracy = max([c["accuracy"] for c in competitions], default=0)
events_participated = len(set(c["event_id"] for c in competitions))
self.console.print(Panel(
Group(
Align.center("[bold bright_gold]🏅 Your Olympic Journey 🏅[/bold bright_gold]"),
"",
f"🎯 Competitions Entered: {total_competitions}",
f"🏆 Best Performance: {best_accuracy:.1f}% accuracy",
f"🌟 Events Participated: {events_participated}",
f"🥇 Awards Earned: {len(awards)}",
),
title="📊 Olympic Stats",
border_style="bright_gold",
padding=(1, 2)
))
# Awards table
if awards:
awards_table = Table(title="🏆 Your Olympic Awards")
awards_table.add_column("Award", style="bold")
awards_table.add_column("Event", style="blue")
awards_table.add_column("Description", style="green")
awards_table.add_column("Date", style="dim")
for award in sorted(awards, key=lambda x: x["earned_date"], reverse=True):
awards_table.add_row(
award["name"],
award["event"],
award["description"],
award["earned_date"][:10]
)
self.console.print(awards_table)
# Recent competitions
if competitions:
recent_comps = sorted(competitions, key=lambda x: x["submitted_date"], reverse=True)[:5]
comps_table = Table(title="🎯 Recent Competition Entries")
comps_table.add_column("Event", style="bold")
comps_table.add_column("Accuracy", style="green", justify="right")
comps_table.add_column("Model", style="blue")
comps_table.add_column("Date", style="dim")
for comp in recent_comps:
comps_table.add_row(
comp["event_id"],
f"{comp['accuracy']:.1f}%",
comp["model"],
comp["submitted_date"][:10]
)
self.console.print(comps_table)
def _load_notable_awards(self) -> List[Dict[str, Any]]:
"""Load notable community awards (mock implementation)."""
return [
{
"award_type": "🥇 Gold Medal",
"recipient": "speed_demon",
"event": "Winter 2024 Speed Challenge",
"description": "2.3 min training, 78.4% accuracy",
"date": "2024-01-16"
},
{
"award_type": "🌟 Innovation Award",
"recipient": "arch_wizard",
"event": "Memory Efficiency Olympics",
"description": "Novel attention mechanism",
"date": "2024-01-15"
},
{
"award_type": "🤝 Community Spirit",
"recipient": "helpful_mentor",
"event": "Autumn 2023 Classic",
"description": "Shared winning approach publicly",
"date": "2023-11-01"
},
{
"award_type": "🏆 Speed Record",
"recipient": "lightning_fast",
"event": "Winter 2024 Speed Challenge",
"description": "47 second training record",
"date": "2024-01-15"
},
{
"award_type": "🎯 Accuracy Champion",
"recipient": "precision_master",
"event": "Architecture Innovation",
"description": "86.7% CIFAR-10 accuracy",
"date": "2024-01-10"
}
]
def _load_olympics_history(self) -> List[Dict[str, Any]]:
"""Load historical Olympics data (mock implementation)."""
return [
{
"name": "Autumn 2023 Classic",
"date": "2023-10-31",
"year": 2023,
"type": "accuracy",
"winner": "neural_champion",
"winning_achievement": "84.2% CIFAR-10 accuracy",
"memorable_moment": "First 80%+ achievement in community"
},
{
"name": "Summer 2023 Speed Trial",
"date": "2023-07-15",
"year": 2023,
"type": "speed",
"winner": "velocity_victor",
"winning_achievement": "3.2 minute training",
"memorable_moment": "Breakthrough GPU optimization technique"
},
{
"name": "Spring 2023 Innovation Fair",
"date": "2023-04-20",
"year": 2023,
"type": "innovation",
"winner": "creative_genius",
"winning_achievement": "Self-organizing architecture",
"memorable_moment": "Inspired 12 follow-up research papers"
}
]

View File

@@ -1,572 +0,0 @@
"""
Status command for TinyTorch CLI: checks status of all modules in modules/ directory.
Supports both basic status checking and comprehensive system analysis.
"""
import subprocess
import sys
import yaml
import re
import time
from argparse import ArgumentParser, Namespace
from pathlib import Path
from rich.panel import Panel
from rich.table import Table
from rich.text import Text
from typing import Union, Dict, Any, Optional
from .base import BaseCommand
from ..core.status_analyzer import TinyTorchStatusAnalyzer
class StatusCommand(BaseCommand):
@property
def name(self) -> str:
return "status"
@property
def description(self) -> str:
return "Check status of all modules"
def add_arguments(self, parser: ArgumentParser) -> None:
parser.add_argument("--progress", action="store_true", help="Show user progress (modules + milestones) - DEFAULT")
parser.add_argument("--files", action="store_true", help="Show file structure and module status")
parser.add_argument("--details", action="store_true", help="Show detailed file structure")
parser.add_argument("--metadata", action="store_true", help="Show module metadata information")
parser.add_argument("--test-status", action="store_true", help="Include test execution status (slower)")
parser.add_argument("--comprehensive", action="store_true", help="Run comprehensive system health dashboard (environment + compliance + testing)")
def _get_export_target(self, module_path: Path) -> str:
"""
Read the actual export target from the dev file's #| default_exp directive.
Same logic as the export command.
"""
# Extract short name from module directory name for dev file
module_name = module_path.name
if module_name.startswith(tuple(f"{i:02d}_" for i in range(100))):
short_name = module_name[3:] # Remove "00_" prefix
else:
short_name = module_name
dev_file = module_path / f"{short_name}.py"
if not dev_file.exists():
return "not_found"
try:
with open(dev_file, 'r', encoding='utf-8') as f:
content = f.read()
# Look for #| default_exp directive
match = re.search(r'#\|\s*default_exp\s+([^\n\r]+)', content)
if match:
return match.group(1).strip()
return "no_export"
except Exception:
return "read_error"
def _count_test_functions(self, dev_file: Path) -> int:
"""Count the number of test functions in a dev file."""
try:
with open(dev_file, 'r', encoding='utf-8') as f:
content = f.read()
# Count lines that start with "def test_"
lines = content.split('\n')
test_functions = [line for line in lines if line.strip().startswith('def test_')]
return len(test_functions)
except Exception:
return 0
def _count_export_functions(self, dev_file: Path) -> int:
"""Count the number of exported functions/classes in a dev file."""
try:
with open(dev_file, 'r', encoding='utf-8') as f:
content = f.read()
# Count lines that have #| export directive
lines = content.split('\n')
export_lines = [line for line in lines if line.strip().startswith('#| export')]
return len(export_lines)
except Exception:
return 0
def run(self, args: Namespace) -> int:
console = self.console
# Handle comprehensive analysis mode
if args.comprehensive:
return self._run_comprehensive_analysis()
# Handle progress view (default if no flags, or --progress)
if not args.files and not args.details and not args.metadata and not args.test_status:
return self._run_progress_view()
if args.progress:
return self._run_progress_view()
# Standard file status check mode
return self._run_standard_status(args)
def _run_progress_view(self) -> int:
"""Show unified user progress view (modules + milestones)."""
console = self.console
import json
from datetime import datetime
# Load progress data
progress_file = Path(".tito") / "progress.json"
milestones_file = Path(".tito") / "milestones.json"
# Load module progress
if progress_file.exists():
progress_data = json.loads(progress_file.read_text())
completed_modules = progress_data.get("completed_modules", [])
completion_dates = progress_data.get("completion_dates", {})
else:
completed_modules = []
completion_dates = {}
# Load milestone achievements
if milestones_file.exists():
milestones_data = json.loads(milestones_file.read_text())
completed_milestones = milestones_data.get("completed_milestones", [])
milestone_dates = milestones_data.get("completion_dates", {})
else:
completed_milestones = []
milestone_dates = {}
# Calculate progress percentages
total_modules = 20
total_milestones = 6
modules_percent = int((len(completed_modules) / total_modules) * 100)
milestones_percent = int((len(completed_milestones) / total_milestones) * 100)
# Create summary panel
summary_text = Text()
summary_text.append(f"📦 Modules Completed: ", style="bold")
summary_text.append(f"{len(completed_modules)}/{total_modules} ({modules_percent}%)\n", style="cyan")
summary_text.append(f"🏆 Milestones Achieved: ", style="bold")
summary_text.append(f"{len(completed_milestones)}/{total_milestones} ({milestones_percent}%)\n\n", style="magenta")
# Last activity
all_dates = list(completion_dates.values()) + list(milestone_dates.values())
if all_dates:
latest_date = max(all_dates)
summary_text.append("📍 Last Activity: ", style="bold")
summary_text.append(f"{latest_date}\n", style="dim")
console.print(Panel(
summary_text,
title="📊 TinyTorch Progress",
border_style="bright_cyan"
))
# Module Progress Table
if completed_modules:
console.print("\n[bold]Module Progress:[/bold]")
for i in range(1, total_modules + 1):
mod_num = i
if mod_num in completed_modules:
module_name = self._get_module_name(mod_num)
console.print(f" [green]✅ {mod_num:02d} {module_name}[/green]")
elif i <= len(completed_modules) + 3: # Show next few modules
module_name = self._get_module_name(mod_num)
console.print(f" [dim]🔒 {mod_num:02d} {module_name}[/dim]")
# Milestone Achievements
if completed_milestones or (completed_modules and len(completed_modules) >= 1):
console.print("\n[bold]Milestone Achievements:[/bold]")
milestone_names = {
"01": "Perceptron (1957)",
"02": "Backpropagation (1986)",
"03": "MLP Revival (1986)",
"04": "CNN Revolution (1998)",
"05": "Transformer Era (2017)",
"06": "MLPerf (2018)"
}
for mid in ["01", "02", "03", "04", "05", "06"]:
if mid in completed_milestones:
console.print(f" [magenta]✅ {mid} - {milestone_names[mid]}[/magenta]")
else:
# Check if ready
prereqs_met = self._check_milestone_prereqs(mid, completed_modules)
if prereqs_met:
console.print(f" [yellow]🎯 {mid} - {milestone_names[mid]} [Ready!][/yellow]")
else:
console.print(f" [dim]🔒 {mid} - {milestone_names[mid]}[/dim]")
console.print()
return 0
def _get_module_name(self, module_num: int) -> str:
"""Get module name from number."""
module_names = {
1: "Tensor", 2: "Activations", 3: "Layers", 4: "Losses",
5: "Autograd", 6: "Optimizers", 7: "Training", 8: "DataLoader",
9: "Convolutions", 10: "Normalization", 11: "Tokenization",
12: "Embeddings", 13: "Attention", 14: "Transformers",
15: "Profiling", 16: "Quantization", 17: "Compression",
18: "Memoization", 19: "Benchmarking", 20: "Capstone"
}
return module_names.get(module_num, "Unknown")
def _check_milestone_prereqs(self, milestone_id: str, completed_modules: list) -> bool:
"""Check if milestone prerequisites are met."""
prereqs = {
"01": [1],
"02": [1, 2, 3, 4, 5],
"03": [1, 2, 3, 4, 5, 6, 7],
"04": [1, 2, 3, 4, 5, 6, 7, 8, 9],
"05": [1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 14],
"06": [1, 2, 3, 4, 5, 6, 7, 8, 9, 15, 16, 19]
}
required = prereqs.get(milestone_id, [])
return all(mod in completed_modules for mod in required)
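# Example (a sketch of the prerequisite table above): milestone "02"
# (Backpropagation) requires modules 1-5, so:
#   _check_milestone_prereqs("02", [1, 2, 3, 4, 5])  -> True
#   _check_milestone_prereqs("02", [1, 2, 3])        -> False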
def _run_comprehensive_analysis(self) -> int:
"""Run comprehensive system health dashboard."""
console = self.console
start_time = time.time()
console.print("🚀 Starting TinyTorch Comprehensive Status Check...", style="bold green")
# Initialize analyzer
analyzer = TinyTorchStatusAnalyzer()
# Run full analysis
result = analyzer.run_full_analysis()
# Generate comprehensive report
analyzer.generate_comprehensive_report(console)
# Summary
total_time = time.time() - start_time
console.print(f"\n⏱️ Comprehensive analysis completed in {total_time:.1f}s", style="dim")
# Return appropriate exit code
if result['summary']['environment_healthy'] and result['summary']['working_modules'] >= result['summary']['total_modules'] * 0.8:
return 0 # Success
else:
return 1 # Issues found
def _run_standard_status(self, args: Namespace) -> int:
"""Run standard status check mode."""
console = self.console
# Scan modules directory
modules_dir = Path("modules")
if not modules_dir.exists():
console.print(Panel("[red]❌ modules/ directory not found[/red]",
title="Error", border_style="red"))
return 1
# Find all module directories (exclude special directories)
exclude_dirs = {'.quarto', '__pycache__', '.git', '.pytest_cache'}
module_dirs = [d for d in modules_dir.iterdir()
if d.is_dir() and d.name not in exclude_dirs]
if not module_dirs:
console.print(Panel("[yellow]⚠️ No modules found in modules/ directory[/yellow]",
title="Warning", border_style="yellow"))
return 0
console.print(Panel(f"📋 Found {len(module_dirs)} modules in modules directory",
title="Module Status Check", border_style="bright_cyan"))
# Create status table
status_table = Table(title="Module Status Overview", show_header=True, header_style="bold blue")
status_table.add_column("Module", style="bold cyan", width=17)
status_table.add_column("Status", width=12, justify="center")
status_table.add_column("Dev File", width=12, justify="center")
status_table.add_column("Inline Tests", width=12, justify="center")
status_table.add_column("External Tests", width=12, justify="center")
status_table.add_column("README", width=12, justify="center")
if args.metadata:
status_table.add_column("Export Target", width=20, justify="center")
status_table.add_column("Prerequisites", width=15, justify="center")
# Check each module
modules_status = []
for module_dir in sorted(module_dirs):
module_name = module_dir.name
status = self._check_module_status(module_dir, args.test_status)
modules_status.append((module_name, status))
# Add to table
row = [
module_name,
self._format_status(status['overall_status']),
self._format_file_status(status['dev_file'], status.get('export_count', 0)),
self._format_inline_tests(status['inline_test_count']),
self._format_external_tests(status['external_tests'], status.get('external_test_status')),
"" if status['readme'] else ""
]
# Add metadata columns if requested
if args.metadata:
metadata = status.get('metadata', {})
export_target = status.get('export_target', 'unknown')
row.append(export_target)
# Show prerequisites from dependencies
deps = metadata.get('dependencies', {})
prereqs = deps.get('prerequisites', [])
row.append(', '.join(prereqs) if prereqs else 'none')
status_table.add_row(*row)
console.print(status_table)
# Summary with better logic
total_modules = len(modules_status)
# A module is "working" if it has a dev file with implementations
working_modules = sum(1 for _, status in modules_status
if status['dev_file'] and status.get('export_count', 0) > 0)
# A module is "complete" if it has everything
complete_modules = sum(1 for _, status in modules_status
if status['dev_file'] and status['external_tests'] and status['readme'] and status.get('export_count', 0) > 0)
console.print(f"\n📊 Summary:")
console.print(f" 🏗️ Working modules: {working_modules}/{total_modules} (have implementations)")
console.print(f" ✅ Complete modules: {complete_modules}/{total_modules} (have implementations, tests, docs)")
# Helpful commands
console.print(f"\n💡 Quick commands:")
console.print(f" [bold cyan]tito status --comprehensive[/bold cyan] # Full system health dashboard")
console.print(f" [bold cyan]tito module test --all[/bold cyan] # Test all modules")
console.print(f" [bold cyan]tito module test MODULE_NAME[/bold cyan] # Test specific module")
console.print(f" [bold cyan]pytest modules/*/ -k test_[/bold cyan] # Run pytest on inline tests")
console.print(f" [bold cyan]pytest tests/test_*.py[/bold cyan] # Run external tests")
# Detailed view
if args.details:
console.print("\n" + "="*60)
console.print("📁 Detailed Module Structure")
console.print("="*60)
for module_name, status in modules_status:
self._print_module_details(module_name, status)
# Metadata view
if args.metadata:
console.print("\n" + "="*60)
console.print("📊 Module Metadata")
console.print("="*60)
for module_name, status in modules_status:
if status.get('metadata'):
self._print_module_metadata(module_name, status['metadata'])
return 0
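    # Usage sketch for the status command (hypothetical flag spellings, inferred from the
    # Namespace attributes read above; only --comprehensive appears verbatim in the hints):
    #   tito status                   # summary table for all modules
    #   tito status --details         # add per-module file breakdown
    #   tito status --metadata        # add module.yaml metadata columns and sections
    #   tito status --test-status     # also run external pytest files (slower)
    #   tito status --comprehensive   # full system health dashboard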
def _check_module_status(self, module_dir: Path, check_tests: bool = False) -> dict:
"""Check the status of a single module."""
module_name = module_dir.name
# Check for required files
        # Extract the short name from the module directory name for the dev file,
        # e.g. "02_tensor" -> "tensor"
        if module_name.startswith(tuple(f"{i:02d}_" for i in range(100))):
            short_name = module_name[3:]  # Drop the two-digit numeric prefix ("NN_")
        else:
            short_name = module_name
dev_file = module_dir / f"{short_name}.py"
readme_file = module_dir / "README.md"
metadata_file = module_dir / "module.yaml"
        # Check for tests in the main tests directory (reuses short_name from above)
        main_test_file = Path("tests") / f"test_{short_name}.py"
status = {
'dev_file': dev_file.exists(),
'readme': readme_file.exists(),
'metadata_file': metadata_file.exists(),
'external_tests': main_test_file.exists(),
'inline_test_count': 0,
'export_count': 0,
'export_target': 'not_found',
'external_test_status': None,
'overall_status': 'unknown',
'metadata': None
}
# Count inline tests and exports if dev file exists
if dev_file.exists():
status['inline_test_count'] = self._count_test_functions(dev_file)
status['export_count'] = self._count_export_functions(dev_file)
status['export_target'] = self._get_export_target(module_dir)
# Run external tests if requested (slower)
if check_tests and main_test_file.exists():
status['external_test_status'] = self._check_external_tests(main_test_file)
# Determine overall status
status['overall_status'] = self._determine_overall_status(status)
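        # overall_status is one of: 'not_started', 'empty', 'no_tests', 'working', 'unknown'.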
# Load metadata if available
if metadata_file.exists():
try:
with open(metadata_file, 'r') as f:
metadata = yaml.safe_load(f)
status['metadata'] = metadata
except Exception as e:
status['metadata'] = {'error': str(e)}
return status
def _determine_overall_status(self, status: dict) -> str:
"""Determine overall module status based on files and implementation."""
# If no dev file, module is not started
if not status['dev_file']:
return 'not_started'
# If dev file exists but no implementations, module is empty
if status.get('export_count', 0) == 0:
return 'empty'
# If has implementations but no tests, module is in progress
if status.get('inline_test_count', 0) == 0 and not status.get('external_tests', False):
return 'no_tests'
# If has implementations and tests, module is working
if status.get('export_count', 0) > 0 and (status.get('inline_test_count', 0) > 0 or status.get('external_tests', False)):
return 'working'
return 'unknown'
def _check_external_tests(self, test_file: Path) -> str:
"""Check if external tests pass (used only when --test-status is specified)."""
try:
result = subprocess.run(
[sys.executable, "-m", "pytest", str(test_file), "-q", "--tb=no"],
capture_output=True,
text=True,
timeout=30
)
if result.returncode == 0:
return 'passing'
else:
return 'failing'
except (subprocess.TimeoutExpired, FileNotFoundError):
return 'error'
def _format_status(self, status: str) -> str:
"""Format overall module status with appropriate emoji and color."""
status_map = {
            'working': '✅',  # Has implementations and tests
'no_tests': '🚧', # Has implementations but no tests
'empty': '📝', # Has dev file but no implementations
'not_started': '', # No dev file
'unknown': ''
}
return status_map.get(status, '')
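    # e.g. _format_status('no_tests') -> '🚧'; unrecognized statuses fall back to the default.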
def _format_file_status(self, exists: bool, export_count: int) -> str:
"""Format dev file status showing if it has implementations."""
if not exists:
return ""
if export_count == 0:
return "📝" # File exists but empty
return f"✅({export_count})" # File exists with implementations
def _format_inline_tests(self, test_count: int) -> str:
"""Format inline test count."""
if test_count == 0:
return ""
return f"✅({test_count})"
def _format_external_tests(self, exists: bool, test_status: Optional[str] = None) -> str:
"""Format external test status."""
if not exists:
return ""
        if test_status == 'passing':
            return "✅"
elif test_status == 'failing':
return "🔴"
elif test_status == 'error':
return "⚠️"
else:
return "" # Exists but not tested
def _print_module_details(self, module_name: str, status: dict) -> None:
"""Print detailed information about a module."""
console = self.console
# Module header
console.print(f"\n📦 {module_name.upper()}", style="bold cyan")
console.print("-" * 40)
# File structure
files_table = Table(show_header=False, box=None, padding=(0, 2))
files_table.add_column("File", style="dim")
files_table.add_column("Status")
dev_status = "✅ Found" if status['dev_file'] else "❌ Missing"
if status['dev_file']:
dev_status += f" ({status.get('export_count', 0)} exports, {status.get('inline_test_count', 0)} inline tests)"
files_table.add_row(f"{module_name}.py", dev_status)
files_table.add_row("tests/test_*.py", "✅ Found" if status['external_tests'] else "❌ Missing")
files_table.add_row("README.md", "✅ Found" if status['readme'] else "❌ Missing")
console.print(files_table)
# Pytest commands
if status['dev_file'] or status['external_tests']:
console.print("\n[dim]💡 Test commands:[/dim]")
            if status['dev_file']:
                console.print(f"[dim] pytest modules/{module_name}/{short_name}.py -k test_[/dim]")
            if status['external_tests']:
                console.print(f"[dim] pytest tests/test_{short_name}.py -v[/dim]")
def _print_module_metadata(self, module_name: str, metadata: dict) -> None:
"""Print detailed metadata information about a module."""
console = self.console
# Module header
title = metadata.get('title', module_name.title())
console.print(f"\n📦 {title}", style="bold cyan")
console.print("-" * (len(title) + 4))
# Basic info
if metadata.get('description'):
console.print(f"📝 {metadata['description']}")
# Export info (read from dev file - source of truth)
module_path = Path(f"modules/{module_name}")
export_target = self._get_export_target(module_path)
if export_target not in ['not_found', 'no_export', 'read_error']:
console.print(f"📦 Exports to: {export_target}")
# Dependencies
if metadata.get('dependencies'):
deps = metadata['dependencies']
console.print("\n🔗 Dependencies:")
if deps.get('prerequisites'):
console.print(f" Prerequisites: {', '.join(deps['prerequisites'])}")
if deps.get('enables'):
console.print(f" Enables: {', '.join(deps['enables'])}")
# Components
if metadata.get('components'):
console.print("\n🧩 Components:")
for component in metadata['components']:
console.print(f"{component}")
# Files
if metadata.get('files'):
files = metadata['files']
console.print("\n📁 Files:")
if files.get('dev_file'):
console.print(f" • Dev: {files['dev_file']}")
if files.get('test_file'):
console.print(f" • Test: {files['test_file']}")
if files.get('readme'):
console.print(f" • README: {files['readme']}")