TinyTorch Testing Architecture

🎯 Overview: Two-Tier Testing Strategy

TinyTorch uses a two-tier testing approach that separates component validation from system integration:

  1. Inline Tests (modules/) - Component validation, unit tests
  2. Integration Tests (tests/) - Inter-module integration, edge cases, system tests

This separation follows ML engineering best practices: validate components in isolation, then test how they work together.


📋 Tier 1: Inline Tests (Component Validation)

Location: modules/XX_modulename/*.py

Purpose:

  • Validate individual components work correctly in isolation
  • Test single module functionality
  • Provide immediate feedback during development
  • Educate students about expected behavior
  • Fast execution for rapid iteration

What Gets Tested:

  • Individual class/function correctness
  • Mathematical operations (forward passes)
  • Shape transformations
  • Basic edge cases and error handling
  • Component-level functionality

Test Pattern:

def test_unit_componentname():
    """🧪 Unit Test: Component Name
    
    **This is a unit test** - it tests [component] in isolation.
    """
    print("🔬 Unit Test: Component...")
    
    # Test implementation
    assert condition, "Component should <describe expected behavior>"
    
    print("✅ Component test passed")

Example: modules/01_tensor/tensor.py

  • test_unit_tensor_creation() - Tests tensor creation
  • test_unit_arithmetic_operations() - Tests +, -, *, /
  • test_unit_matrix_multiplication() - Tests @ operator
  • test_unit_shape_manipulation() - Tests reshape, transpose
  • test_unit_reduction_operations() - Tests sum, mean, max
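
As a concrete illustration, one of these inline tests might look roughly like the sketch below. It would live in the same file as the Tensor class, so no import of Tensor is needed; the .shape and .data attributes (a NumPy array) are assumptions based on the operations listed above.

import numpy as np

def test_unit_tensor_creation():
    """🧪 Unit Test: Tensor Creation

    **This is a unit test** - it tests Tensor construction in isolation.
    """
    print("🔬 Unit Test: Tensor creation...")

    t = Tensor([[1.0, 2.0], [3.0, 4.0]])
    assert t.shape == (2, 2), "Tensor should report the shape of its data"
    assert np.allclose(t.data, [[1.0, 2.0], [3.0, 4.0]]), "Tensor should preserve its values"

    print("✅ Tensor creation test passed")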

Execution:

# Run inline tests only
tito test 01_tensor --inline-only

# Tests run when you execute the module file
python modules/01_tensor/tensor.py

Key Characteristics:

  • Fast: Run during development for immediate feedback
  • Isolated: No dependencies on other modules
  • Educational: Shows students what "correct" looks like
  • Component-focused: Tests one thing at a time

📊 Tier 2: Integration Tests (tests/ Directory)

Location: tests/

Purpose:

  • Test how multiple modules work together
  • Validate cross-module dependencies
  • Test realistic workflows and use cases
  • Ensure system-level correctness
  • Catch bugs that unit tests miss
  • Test edge cases and corner scenarios
  • Validate exported code (tinytorch/) works correctly

Key Insight:

Component correctness ≠ System correctness

A tensor might work perfectly in isolation, but fail when gradients flow through layers → activations → losses → optimizers. Integration tests catch these "seam" bugs.


🗂️ Structure of tests/ Directory

1. Module-Specific Integration Tests (tests/XX_modulename/)

Purpose: Test that module N works correctly with all previous modules (1 through N-1)

Example: tests/05_autograd/test_progressive_integration.py

  • Tests autograd with Tensor (01), Activations (02), Layers (03), Losses (04)
  • Validates that gradients flow correctly through the entire stack built so far

Pattern: Progressive integration

# tests/05_autograd/test_progressive_integration.py
def test_autograd_with_all_previous_modules():
    # Uses real Tensor, real Layers, real Activations, real Losses
    # Then tests Autograd (05) with all of them
    x = Tensor([[1.0, 2.0]], requires_grad=True)
    target = Tensor([[0.0, 1.0, 0.0]])  # target matching the (1, 3) output of the stack below
    layer = Linear(2, 3)
    activation = ReLU()
    loss_fn = MSELoss()
    
    output = activation(layer(x))
    loss = loss_fn(output, target)
    loss.backward()
    
    assert x.grad is not None  # Gradient flowed through everything!

Why This Matters:

  • Catches integration bugs early
  • Ensures modules don't break previous functionality
  • Validates the "seams" between modules

2. Cross-Module Integration Tests (tests/integration/)

Purpose: Test multiple modules working together in realistic scenarios

Key Files:

  • test_gradient_flow.py - CRITICAL: Validates gradients flow through entire training stack
  • test_end_to_end_training.py - Full training loops
  • test_module_compatibility.py - Module interfaces

Example: tests/integration/test_gradient_flow.py

def test_complete_training_stack():
    """Test that gradients flow through: Tensor → Layers → Activations → Loss → Autograd → Optimizer"""
    # Uses modules 01, 02, 03, 04, 05, 06, 07
    # Validates the entire training pipeline works
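
A fleshed-out version of this test might look like the sketch below. Tensor, Linear, ReLU, and MSELoss appear elsewhere in this document; the SGD optimizer class and its parameters()/step() interface are assumptions.

def test_complete_training_stack():
    """Test that gradients flow through: Tensor → Layers → Activations → Loss → Autograd → Optimizer"""
    x = Tensor([[1.0, 2.0]], requires_grad=True)
    target = Tensor([[0.0, 1.0, 0.0]])

    layer = Linear(2, 3)
    activation = ReLU()
    loss_fn = MSELoss()
    optimizer = SGD(layer.parameters(), lr=0.01)  # hypothetical optimizer API

    # One full training step: forward → loss → backward → update
    output = activation(layer(x))
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()

    # Gradients must have reached the input, and the update must not have crashed
    assert x.grad is not None, "Gradient did not flow back to the input tensor"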

Why This Matters:

  • Catches bugs that unit tests miss
  • Validates the "seams" between modules
  • Ensures training actually works end-to-end
  • Tests realistic ML workflows

3. Edge Cases & Stress Tests (tests/05_autograd/, tests/debugging/)

Purpose: Test corner cases and common pitfalls

Examples:

  • tests/05_autograd/test_broadcasting.py - Broadcasting gradient bugs
  • tests/05_autograd/test_computation_graph.py - Graph construction edge cases
  • tests/debugging/test_gradient_vanishing.py - Detect vanishing gradients
  • tests/debugging/test_common_mistakes.py - "Did you forget backward()?" style tests

Philosophy: When these tests fail, the error message should teach the student what went wrong and how to fix it.
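
For example, a debugging-style test can phrase its assertion messages as hints. The sketch below only uses classes already shown in this document; the exact wording in tests/debugging/ will differ.

def test_common_mistake_forgot_backward():
    """Catches the classic mistake of reading .grad before calling backward()."""
    x = Tensor([[1.0, 2.0]], requires_grad=True)
    output = Linear(2, 1)(x)
    loss = MSELoss()(output, Tensor([[0.0]]))

    # Before backward(), no gradient should exist yet
    assert x.grad is None, (
        "x.grad was populated before backward() was called - "
        "gradients should only appear after the backward pass."
    )

    loss.backward()
    assert x.grad is not None, (
        "x.grad is still None after backward(). "
        "Did you forget to build the computation graph, or detach the loss?"
    )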

Why This Matters:

  • Catches numerical stability issues
  • Tests edge cases that break in production
  • Pedagogical: teaches debugging skills

4. Regression Tests (tests/regression/)

Purpose: Ensure previously fixed bugs don't come back

Pattern: Each bug gets a test file

  • test_issue_20241125_conv_fc_shapes.py - Tests a specific bug that was fixed
  • Documents the bug, root cause, fix, and prevention
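
A regression test can follow a simple skeleton like the one below: the docstring carries the documentation, and the test body re-creates the exact scenario that used to fail. The details here are placeholders, not the actual 2024-11-25 bug.

def test_issue_YYYYMMDD_short_description():
    """Regression test skeleton.

    Bug:        what was observed (e.g. shape mismatch between conv output and FC input)
    Root cause: what was actually wrong in the code
    Fix:        what changed, and where
    Prevention: this test fails if the old behavior ever returns
    """
    # Re-create the exact failing scenario here and assert the fixed behavior.
    ...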

Why This Matters:

  • Prevents regressions
  • Documents historical bugs
  • Ensures fixes persist

5. Performance Tests (tests/performance/)

Purpose: Validate system-level performance characteristics

Examples:

  • Memory profiling
  • Speed benchmarks
  • Scalability tests
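
As one sketch of what a speed benchmark might look like (the matrix size and time budget below are arbitrary and would need tuning for the hardware that runs the suite):

import time

def test_matmul_speed_budget():
    """Speed benchmark sketch: a moderately sized matmul should finish quickly."""
    a = Tensor([[float(j) for j in range(256)] for _ in range(256)])
    b = Tensor([[float(j) for j in range(256)] for _ in range(256)])

    start = time.perf_counter()
    _ = a @ b
    elapsed = time.perf_counter() - start

    # Generous budget; tighten or relax to match CI/grading hardware
    assert elapsed < 1.0, f"256x256 matmul took {elapsed:.3f}s, expected under 1.0s"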

Why This Matters:

  • Ensures implementations are efficient
  • Validates performance characteristics
  • Catches performance regressions

6. System Tests (tests/system/)

Purpose: Test entire system workflows

Examples:

  • End-to-end training pipelines
  • Model export/import
  • Checkpoint system tests

Why This Matters:

  • Validates complete workflows
  • Tests production scenarios
  • Ensures system-level correctness

7. Checkpoint Tests (tests/checkpoints/)

Purpose: Validate milestone capabilities

Examples:

  • checkpoint_01_foundation.py - Tensor operations mastered
  • checkpoint_05_learning.py - Autograd working correctly
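
A checkpoint script might boil down to a handful of capability checks like this sketch (the function name, the .shape attribute, and the reshape call signature are assumptions; + and @ are operations mentioned in the inline-test examples above):

def check_foundation_milestone():
    """Checkpoint sketch: the core tensor operations from module 01 all work."""
    a = Tensor([[1.0, 2.0], [3.0, 4.0]])
    b = Tensor([[5.0, 6.0], [7.0, 8.0]])

    assert (a + b).shape == (2, 2)      # arithmetic preserves shape
    assert (a @ b).shape == (2, 2)      # matrix multiplication works
    assert a.reshape((4,)).shape == (4,)  # shape manipulation works

    print("✅ Checkpoint 01: foundation capabilities verified")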

Why This Matters:

  • Validates student progress
  • Ensures milestones are met
  • Provides clear success criteria

🔄 Code Flow: Development → Export → Testing

┌─────────────────────────────────────────────────────────────┐
│                    DEVELOPMENT WORKFLOW                      │
└─────────────────────────────────────────────────────────────┘

1. DEVELOP in modules/
   └─> modules/01_tensor/tensor.py
       ├─> Write code
       ├─> Write inline tests (test_unit_*)
       └─> Run: python modules/01_tensor/tensor.py

2. EXPORT to tinytorch/
   └─> tito export 01_tensor
       └─> Code exported to tinytorch/core/tensor.py

3. TEST integration
   └─> tests/01_tensor/test_progressive_integration.py
       ├─> Imports from tinytorch.core.tensor (exported code!)
       ├─> Tests module works with previous modules
       └─> Run: pytest tests/01_tensor/

4. TEST cross-module
   └─> tests/integration/test_gradient_flow.py
       ├─> Imports from tinytorch.* (all exported modules)
       ├─> Tests multiple modules working together
       └─> Run: pytest tests/integration/

🎯 Decision Tree: Where Should This Test Go?

Is it testing a single component in isolation?
├─ YES → modules/XX_modulename/*.py (inline test_unit_*)
│
└─ NO → Is it testing module N with previous modules?
    ├─ YES → tests/XX_modulename/test_progressive_integration.py
    │
    └─ NO → Is it testing multiple modules together?
        ├─ YES → tests/integration/test_*.py
        │
        └─ NO → Is it an edge case or stress test?
            ├─ YES → tests/XX_modulename/test_*_edge_cases.py
            │         OR tests/debugging/test_*.py
            │
            └─ NO → Is it a regression test?
                ├─ YES → tests/regression/test_issue_*.py
                │
                └─ NO → Is it a performance test?
                    ├─ YES → tests/performance/test_*.py
                    │
                    └─ NO → Is it a system test?
                        └─ YES → tests/system/test_*.py

📝 Best Practices

DO:

  • Write inline tests immediately after implementing a component
  • Test one thing per inline test function
  • Use descriptive test function names (test_unit_sigmoid, not test1)
  • Add integration tests when combining multiple modules
  • Run inline tests frequently during development
  • Run the full test suite before committing
  • Test exported code (tinytorch/), not development code (modules/)
  • Write tests that catch real bugs you've encountered

DON'T:

  • Mix inline and integration test concerns
  • Test implementation details in integration tests
  • Skip inline tests and jump straight to integration
  • Test mocked/fake components (use real ones)
  • Create dependencies between test files
  • Test code in modules/ directly in tests/ (test tinytorch/ instead)
  • Duplicate inline tests in the tests/ directory


🔍 Key Distinctions

| Aspect           | Inline Tests (modules/)     | Integration Tests (tests/)            |
|------------------|-----------------------------|---------------------------------------|
| Location         | modules/XX_name/*.py        | tests/XX_name/ or tests/integration/  |
| Scope            | Single component            | Multiple modules                      |
| Dependencies     | None (isolated)             | Previous modules                      |
| Speed            | Fast                        | Slower                                |
| Purpose          | Component correctness       | System correctness                    |
| When to run      | During development          | Before commit/export                  |
| What gets tested | modules/ code directly      | tinytorch/ exported code              |
| Example          | test_unit_tensor_creation() | test_tensor_with_layers()             |

🚀 Testing Workflow

For Students:

# 1. Work on module
cd modules/01_tensor
vim tensor.py

# 2. Run inline tests (fast feedback)
python tensor.py
# or
tito test 01_tensor --inline-only

# 3. Export to package
tito export 01_tensor

# 4. Run integration tests (full validation)
tito test 01_tensor
# or
pytest tests/01_tensor/

# 5. Run cross-module tests (ensure nothing broke)
pytest tests/integration/

For Instructors:

# Comprehensive test suite
tito test --comprehensive

# Specific module deep dive
tito test 05_autograd --detailed

# All inline tests only (quick check)
tito test --all --inline-only

# Critical integration tests
pytest tests/integration/test_gradient_flow.py -v

💡 Why This Architecture?

Separation of Concerns:

  • Inline tests = "Does this component work?"
  • Integration tests = "Do these components work together?"

Educational Value:

  • Students learn component testing first
  • Then learn integration testing
  • Mirrors professional ML engineering workflows

Practical Benefits:

  • Fast feedback during development (inline tests)
  • Comprehensive validation before commit (integration tests)
  • Catches bugs at the right level
  • Clear mental model: component vs. system

Real-World Alignment:

  • Professional ML teams use this pattern
  • Unit tests for components
  • Integration tests for pipelines
  • System tests for workflows

📚 Summary

Think of tests/ as the "system validation layer":

  1. modules/ inline tests = "Does my component work?"
  2. tests/XX_modulename/ = "Does my module work with previous modules?"
  3. tests/integration/ = "Do multiple modules work together?"
  4. tests/debugging/ = "Are there edge cases I'm missing?"
  5. tests/regression/ = "Did I break something that was working?"
  6. tests/performance/ = "Is my implementation efficient?"
  7. tests/system/ = "Does the entire system work?"

The key insight: tests/ validates that exported code (tinytorch/) works correctly in realistic scenarios, catching bugs that isolated unit tests miss.


Last Updated: 2025-01-XX
Test Infrastructure: Complete (20/20 modules have test directories)
Philosophy: Component correctness ≠ System correctness