TinyTorch/docs/development/comprehensive-module-testing-plan.md
Vijay Janapa Reddi 857553fa9d Update site documentation and development guides
- Improve site navigation and content structure
- Update development testing documentation
- Enhance site styling and visual consistency
- Update release notes and milestone templates
- Improve site rebuild script functionality
2025-11-13 10:42:51 -05:00

Comprehensive Module Testing Plan

🎯 Overview

This document defines a systematic testing strategy for all TinyTorch modules. It identifies the critical checks each module needs so that both students and maintainers can catch issues early and build robust systems.

Key Principle: Every module needs tests that validate:

  1. Correctness - Does it work as intended?
  2. Integration - Does it work with other modules?
  3. Robustness - Does it handle edge cases?
  4. Usability - Can students actually use it?

📊 Test Categories: What to Test

Category 1: Core Functionality

Purpose: Verify the module does what it's supposed to do

Checks:

  • Forward pass correctness
  • Output shapes match expectations
  • Mathematical correctness (compare to reference implementations)
  • API correctness (methods exist, signatures correct)
  • Parameter initialization (if applicable)

Example: For 03_layers:

  • Linear layer computes output = input @ weight + bias correctly
  • Output shape is (batch, out_features) when input is (batch, in_features)
  • Weight and bias are initialized properly
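The checks above can be sketched in plain NumPy. This is a minimal illustration, not the TinyTorch API: `linear_forward` is a hypothetical stand-in for a student's Linear layer, and a real test would call the TinyTorch layer in its place.

```python
import numpy as np

def linear_forward(x, weight, bias):
    """Hypothetical stand-in for a Linear layer: output = input @ weight + bias."""
    return x @ weight + bias

def check_linear(batch=4, in_features=3, out_features=2, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(batch, in_features))
    weight = rng.normal(size=(in_features, out_features))
    bias = rng.normal(size=(out_features,))

    out = linear_forward(x, weight, bias)

    # Shape check: (batch, in_features) -> (batch, out_features)
    assert out.shape == (batch, out_features)

    # Reference check: recompute every output element with explicit loops
    expected = np.zeros((batch, out_features))
    for i in range(batch):
        for j in range(out_features):
            expected[i, j] = sum(x[i, k] * weight[k, j] for k in range(in_features)) + bias[j]
    assert np.allclose(out, expected)
    return out

out = check_linear()
```

Comparing against a slow but obviously correct loop is one way to satisfy the "compare to reference implementations" check without depending on another framework.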

Category 2: Gradient Flow 🔥

Purpose: Verify gradients flow correctly (critical for training)

Checks:

  • Gradients exist after backward pass
  • Gradients are not all zero
  • All trainable parameters receive gradients
  • Gradient shapes match parameter shapes
  • Gradients propagate through the component back to its inputs

Example: For 02_activations:

  • ReLU preserves requires_grad flag
  • Backward pass computes correct gradients
  • Gradient is 0 for negative inputs, 1 for positive inputs
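One common way to check "backward pass computes correct gradients" is to compare the analytic gradient with a finite-difference estimate. The sketch below does this for ReLU in plain NumPy; the helper names are hypothetical, and a TinyTorch test would obtain the analytic gradient from the module's backward pass instead.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def relu_grad(x):
    """Analytic local gradient of ReLU: 1 where x > 0, else 0."""
    return (x > 0).astype(x.dtype)

def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of df/dx for an elementwise function f."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Avoid x == 0, where ReLU is not differentiable
x = np.array([-2.0, -0.5, 0.5, 3.0])
analytic = relu_grad(x)
numeric = numerical_grad(relu, x)
assert np.allclose(analytic, numeric, atol=1e-6)
```

The same comparison applies to any elementwise activation, as long as the test points stay away from kinks.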

Modules That Need This: All modules with trainable parameters or that process gradients

  • 02_activations, 03_layers, 04_losses, 05_autograd, 06_optimizers, 07_training, 09_spatial, 11_embeddings, 12_attention, 13_transformers, 20_capstone

Modules That Don't Need This: Modules that don't process gradients

  • 01_tensor (foundation, no gradients yet), 08_dataloader (data only), 10_tokenization (text processing), 14_profiling (analysis), 15_quantization (post-training, unless quantization-aware training is added), 16_compression (post-training, unless compression-aware training is added), 17_memoization (caching), 18_acceleration (optimization), 19_benchmarking (evaluation)

Category 3: Integration with Previous Modules 🔗

Purpose: Verify module N works with modules 1 through N-1

Checks:

  • Imports from previous modules work
  • Components from previous modules integrate correctly
  • Data flows correctly through the stack
  • No breaking changes to previous modules

Example: For 07_training:

  • Uses Tensor (01), Layers (03), Losses (04), Autograd (05), Optimizers (06)
  • All components work together in a training loop
  • Training loop actually trains (loss decreases)

All Modules Need This: Every module should test integration with previous modules


Category 4: Shape Correctness 📐

Purpose: Verify shapes are handled correctly (common source of bugs)

Checks:

  • Output shapes match expected dimensions
  • Broadcasting works correctly
  • Reshape operations preserve data
  • Batch dimensions handled correctly
  • Edge cases (empty tensors, single samples, etc.)

Example: For 09_spatial:

  • Conv2d output shape: (batch, out_channels, height_out, width_out)
  • MaxPool2d reduces spatial dimensions correctly
  • Shapes work with Linear layers downstream
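The expected spatial size follows the standard convolution arithmetic, floor((size + 2 * padding - kernel) / stride) + 1. The helper below is a sketch that assumes a square kernel and no dilation; its name and signature are illustrative, not part of TinyTorch.

```python
def conv2d_output_shape(batch, out_channels, height, width,
                        kernel_size, stride=1, padding=0):
    """Expected output shape for a square kernel without dilation."""
    height_out = (height + 2 * padding - kernel_size) // stride + 1
    width_out = (width + 2 * padding - kernel_size) // stride + 1
    if height_out <= 0 or width_out <= 0:
        raise ValueError("kernel does not fit inside the padded input")
    return (batch, out_channels, height_out, width_out)

# 3x3 kernel with padding 1 keeps the spatial size
same = conv2d_output_shape(8, 16, 32, 32, kernel_size=3, padding=1)

# 5x5 kernel without padding shrinks 32 -> 28
valid = conv2d_output_shape(8, 16, 32, 32, kernel_size=5)

# The same formula covers 2x2 pooling with stride 2: 28 -> 14
pooled = conv2d_output_shape(8, 16, 28, 28, kernel_size=2, stride=2)

# Features a downstream Linear layer would see after flattening
flat_features = pooled[1] * pooled[2] * pooled[3]
```

A shape test can then assert that the layer's actual output shape equals the value this helper predicts.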

Modules That Need This: All modules that transform shapes

  • 01_tensor, 03_layers, 09_spatial, 11_embeddings, 12_attention, 13_transformers

Category 5: Edge Cases & Error Handling ⚠️

Purpose: Verify robustness and helpful error messages

Checks:

  • Handles empty inputs gracefully
  • Handles zero values correctly
  • Handles very large/small values
  • Provides helpful error messages for invalid inputs
  • Handles NaN/Inf correctly
  • Handles out-of-bounds indices

Example: For 08_dataloader:

  • Empty dataset handled gracefully
  • Batch size larger than dataset handled correctly
  • Invalid indices raise clear error messages
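These edge cases can be expressed as small, direct assertions. The batcher below is a hypothetical minimal stand-in, not the TinyTorch DataLoader. Yielding nothing for an empty dataset and one short batch for an oversized batch size is an assumed behaviour, chosen here for illustration.

```python
def iter_batches(items, batch_size):
    """Hypothetical minimal batcher used only to illustrate the edge cases."""
    if batch_size <= 0:
        raise ValueError(f"batch_size must be positive, got {batch_size}")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Empty dataset: iteration yields nothing instead of raising
assert list(iter_batches([], batch_size=4)) == []

# Batch size larger than dataset: one short batch containing every item
assert list(iter_batches([1, 2, 3], batch_size=10)) == [[1, 2, 3]]

# Invalid argument: the error message names the offending value
try:
    list(iter_batches([1, 2, 3], batch_size=0))
except ValueError as err:
    message = str(err)
else:
    raise AssertionError("expected ValueError for batch_size=0")
```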

All Modules Need This: Every module should handle edge cases


Category 6: Numerical Stability 🔢

Purpose: Verify numerical correctness and stability

Checks:

  • No NaN values in outputs
  • No Inf values in outputs
  • Numerical precision is acceptable
  • Operations are numerically stable
  • Compare to reference implementations (NumPy, PyTorch)

Example: For 02_activations:

  • Sigmoid doesn't overflow for large inputs
  • Softmax is numerically stable (subtracts the maximum before exponentiating, the same shift used in the log-sum-exp trick)
  • No NaN/Inf in outputs
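A stability test typically feeds extreme inputs and asserts that the outputs stay finite. The NumPy functions below are one possible stable formulation, shown as a sketch; TinyTorch's own implementations may differ.

```python
import numpy as np

def stable_softmax(x, axis=-1):
    """Softmax with the maximum subtracted first, so exp never overflows."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def stable_sigmoid(x):
    """Sigmoid that never exponentiates a large positive number."""
    x = np.asarray(x, dtype=np.float64)
    e = np.exp(-np.abs(x))  # always in [0, 1]
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

logits = np.array([[1000.0, 1001.0, 1002.0],
                   [-1000.0, 0.0, 1000.0]])
probs = stable_softmax(logits)
assert np.all(np.isfinite(probs))
assert np.allclose(probs.sum(axis=-1), 1.0)

s = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
assert np.all(np.isfinite(s))
```

A naive `np.exp(1000.0)` overflows to infinity, which is exactly the failure these inputs are meant to expose.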

Modules That Need This: Modules with numerical operations

  • 01_tensor, 02_activations, 03_layers, 04_losses, 05_autograd, 09_spatial, 11_embeddings, 12_attention, 13_transformers

Category 7: Memory & Performance

Purpose: Verify reasonable performance (not exhaustive, but catch major issues)

Checks:

  • No memory leaks
  • Operations complete in reasonable time
  • Memory usage is reasonable
  • Can handle realistic batch sizes

Example: For 13_transformers:

  • Forward pass completes in reasonable time for small models
  • Memory usage scales linearly with batch size
  • No memory leaks across multiple forward passes
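A leak check across repeated forward passes can be approximated with the standard library's `tracemalloc`. In this sketch the two "forward" functions are hypothetical stand-ins that allocate byte buffers; a real test would call the model's forward pass.

```python
import tracemalloc

def memory_growth(fn, repeats=50):
    """Bytes still allocated after `repeats` calls, relative to one warm-up call."""
    tracemalloc.start()
    try:
        fn()  # warm-up so one-time allocations are not counted
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(repeats):
            fn()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before

def clean_forward():
    buffer = bytearray(100_000)  # temporary buffer, freed on return
    return len(buffer)

_leaked = []

def leaky_forward():
    _leaked.append(bytearray(100_000))  # reference is kept, so memory grows
    return len(_leaked)

clean = memory_growth(clean_forward)
leaky = memory_growth(leaky_forward)
```

The threshold separating "clean" from "leaky" is a judgment call. Comparing growth against the size of a single pass, as the assertions in a test would, keeps the check insensitive to small interpreter allocations.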

Modules That Need This: Modules with performance-sensitive operations

  • 05_autograd, 09_spatial, 12_attention, 13_transformers, 14_profiling, 18_acceleration, 19_benchmarking

Category 8: Real-World Usage 🌍

Purpose: Verify the module works in realistic scenarios

Checks:

  • Can solve the intended problem
  • Works with real datasets (if applicable)
  • Matches expected behavior from documentation
  • Can be used in production-like scenarios

Example: For 07_training:

  • Can train a simple model on real data
  • Loss decreases over epochs
  • Model actually learns (accuracy improves)
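"Loss decreases over epochs" can be tested on a problem small enough to converge in well under a second. The sketch below fits a one-dimensional linear regression with hand-written gradient descent in NumPy. The data, learning rate, and epoch count are arbitrary choices for illustration, and a TinyTorch test would use its own layers, loss, and optimizer.

```python
import numpy as np

def train_linear_regression(epochs=200, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(64, 1))
    y = 3.0 * x + 1.0  # noiseless target with w = 3, b = 1
    w, b = 0.0, 0.0
    losses = []
    for _ in range(epochs):
        pred = w * x + b
        error = pred - y
        losses.append(float(np.mean(error ** 2)))
        # Gradients of mean squared error with respect to w and b
        grad_w = float(2.0 * np.mean(error * x))
        grad_b = float(2.0 * np.mean(error))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses

w, b, losses = train_linear_regression()
assert losses[-1] < losses[0]
```

Asserting on recovered parameters as well as on the loss guards against a loop that lowers the loss for the wrong reason.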

Modules That Need This: All modules should have at least one real-world usage test


Category 9: Export/Import Correctness 📦

Purpose: Verify code exports correctly and can be imported

Checks:

  • Code exports to tinytorch/ correctly
  • Can import from tinytorch.* package
  • Exported API matches module API
  • No import errors
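An export/import test can be as simple as importing the package path and checking for expected names. The helper below is a sketch built on `importlib`. It is demonstrated on the standard library so that it runs without TinyTorch installed. In a real test the module name would be a TinyTorch path such as `tinytorch.core.tensor`, and the list of expected names would have to come from the module's documented API.

```python
import importlib

def missing_exports(module_name, expected_names):
    """Return the expected names that the imported module does not expose."""
    module = importlib.import_module(module_name)
    return [name for name in expected_names if not hasattr(module, name)]

# Demonstration against the standard library
assert missing_exports("math", ["sqrt", "pi"]) == []
assert missing_exports("math", ["sqrt", "not_a_real_name"]) == ["not_a_real_name"]
```

An import failure surfaces as `ModuleNotFoundError` from `import_module`, which already covers the "no import errors" check.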

All Modules Need This: Every module should test export/import


Category 10: API Consistency 🔌

Purpose: Verify API matches conventions and is usable

Checks:

  • Methods have expected names
  • Parameters match expected signatures
  • Return types are consistent
  • Follows TinyTorch conventions
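Signature consistency can be checked mechanically with `inspect.signature`. The classes below are hypothetical stand-ins, not TinyTorch classes, and the method name `forward` is an assumption about the convention being enforced.

```python
import inspect

class ReLU:  # hypothetical stand-in
    def forward(self, x): ...

class Sigmoid:  # hypothetical stand-in
    def forward(self, x): ...

class Broken:  # deliberately inconsistent signature
    def forward(self, x, extra): ...

def parameter_names(cls, method="forward"):
    """Parameter names of cls.method, including self."""
    return list(inspect.signature(getattr(cls, method)).parameters)

def inconsistent_classes(classes, method="forward"):
    """Names of classes whose method signature differs from the first class."""
    reference = parameter_names(classes[0], method)
    return [
        cls.__name__
        for cls in classes[1:]
        if not hasattr(cls, method) or parameter_names(cls, method) != reference
    ]
```

A test would pass the full list of activations (or layers, losses, optimizers) and assert that the returned list is empty.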

All Modules Need This: Every module should test API consistency


📋 Module-by-Module Testing Plan

Module 01: Tensor (Foundation)

Critical Checks Needed:

  • Core Functionality: All operations work (add, mul, matmul, etc.)
  • Gradient Flow: Not applicable (no gradients yet)
  • Integration: No previous modules
  • Shape Correctness: Broadcasting, reshaping, indexing
  • Edge Cases: Empty tensors, zero values, large arrays
  • Numerical Stability: Precision, overflow handling
  • ⚠️ Memory & Performance: Large tensor operations
  • Real-World Usage: Can build neural networks with tensors
  • Export/Import: Exports to tinytorch.core.tensor
  • API Consistency: Matches NumPy-like API

Test Files:

  • tests/01_tensor/test_tensor_core.py - Core functionality
  • tests/01_tensor/test_tensor_integration.py - Integration with NumPy
  • tests/01_tensor/test_progressive_integration.py - Progressive integration

Module 02: Activations

Critical Checks Needed:

  • Core Functionality: Forward pass correctness
  • Gradient Flow: CRITICAL - All activations preserve gradients
  • Integration: Works with Tensor (01)
  • Shape Correctness: Output shape matches input shape
  • Edge Cases: Large values, zero values, negative values
  • Numerical Stability: CRITICAL - No overflow/underflow
  • ⚠️ Memory & Performance: Fast forward/backward passes
  • Real-World Usage: Can use in neural networks
  • Export/Import: Exports to tinytorch.core.activations
  • API Consistency: All activations have same interface

Test Files:

  • tests/02_activations/test_activations_core.py - Core functionality
  • tests/02_activations/test_gradient_flow.py - MISSING - Gradient flow tests
  • tests/02_activations/test_activations_integration.py - Integration
  • tests/02_activations/test_progressive_integration.py - Progressive integration

Gap: Missing comprehensive gradient flow tests for all activations


Module 03: Layers

Critical Checks Needed:

  • Core Functionality: Forward pass, parameter initialization
  • Gradient Flow: CRITICAL - All layers compute gradients correctly
  • Integration: Works with Tensor (01), Activations (02)
  • Shape Correctness: CRITICAL - Output shapes match expectations
  • Edge Cases: Zero inputs, single samples, large batches
  • Numerical Stability: No NaN/Inf in outputs
  • ⚠️ Memory & Performance: Reasonable memory usage
  • Real-World Usage: Can build neural networks
  • Export/Import: Exports to tinytorch.core.layers
  • API Consistency: All layers follow Module interface

Test Files:

  • tests/03_layers/test_layers_core.py - Core functionality
  • tests/03_layers/test_layers_integration.py - Integration
  • tests/03_layers/test_layers_networks_integration.py - Network integration
  • tests/03_layers/test_progressive_integration.py - Progressive integration

Gap: Missing gradient flow tests for Dropout and LayerNorm (if they live in the layers module)


Module 04: Losses

Critical Checks Needed:

  • Core Functionality: Loss computation correctness
  • Gradient Flow: CRITICAL - Loss functions compute gradients
  • Integration: Works with Tensor (01), Layers (03)
  • Shape Correctness: Handles different batch sizes
  • Edge Cases: Perfect predictions, zero loss, large losses
  • Numerical Stability: CRITICAL - Log operations stable
  • ⚠️ Memory & Performance: Efficient computation
  • Real-World Usage: Can use in training loops
  • Export/Import: Exports to tinytorch.core.losses
  • API Consistency: All losses have same interface

Test Files:

  • tests/04_losses/test_dense_layer.py - Layer tests
  • tests/04_losses/test_dense_integration.py - Integration
  • tests/04_losses/test_network_capability.py - Network capability
  • tests/04_losses/test_progressive_integration.py - Progressive integration

Status: Good coverage


Module 05: Autograd

Critical Checks Needed:

  • Core Functionality: Forward/backward pass correctness
  • Gradient Flow: CRITICAL - Gradients computed correctly
  • Integration: Works with all previous modules
  • Shape Correctness: Gradient shapes match parameter shapes
  • Edge Cases: CRITICAL - Broadcasting, reshape, chain rule
  • Numerical Stability: CRITICAL - Gradient computation stable
  • Memory & Performance: CRITICAL - No memory leaks
  • Real-World Usage: Can train models
  • Export/Import: Exports to tinytorch.core.autograd
  • API Consistency: Matches PyTorch-like API

Test Files:

  • tests/05_autograd/test_gradient_flow.py - Gradient flow
  • tests/05_autograd/test_batched_matmul_backward.py - Batched operations
  • tests/05_autograd/test_progressive_integration.py - Progressive integration

Status: Excellent coverage


Module 06: Optimizers

Critical Checks Needed:

  • Core Functionality: Parameter updates work
  • Gradient Flow: CRITICAL - Optimizers use gradients correctly
  • Integration: Works with Autograd (05), Layers (03)
  • Shape Correctness: Parameter shapes preserved
  • Edge Cases: Zero gradients, very small/large learning rates
  • Numerical Stability: Updates don't cause overflow
  • ⚠️ Memory & Performance: Efficient updates
  • Real-World Usage: Can train models
  • Export/Import: Exports to tinytorch.core.optimizers
  • API Consistency: All optimizers have same interface

Test Files:

  • tests/06_optimizers/test_progressive_integration.py - Progressive integration
  • tests/06_optimizers/test_cnn_networks_integration.py - CNN integration
  • tests/06_optimizers/test_cnn_pipeline_integration.py - Pipeline integration

Gap: Missing dedicated optimizer functionality tests
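Dedicated optimizer tests could start from cases whose result is known in advance. The sketch below uses a hypothetical functional `sgd_step` in NumPy, not a TinyTorch optimizer class, to show three of the checks listed above: zero gradients, a known update, and preserved shapes.

```python
import numpy as np

def sgd_step(params, grads, lr):
    """Hypothetical plain SGD update: p <- p - lr * g. Returns new arrays."""
    return [p - lr * g for p, g in zip(params, grads)]

params = [np.ones((3, 2)), np.zeros(2)]

# Zero gradients must leave parameters unchanged
zero_grads = [np.zeros_like(p) for p in params]
unchanged = sgd_step(params, zero_grads, lr=0.1)
assert all(np.array_equal(p, q) for p, q in zip(params, unchanged))

# A known gradient produces a known update, and shapes are preserved
grads = [np.full((3, 2), 2.0), np.array([1.0, -1.0])]
updated = sgd_step(params, grads, lr=0.5)
assert [u.shape for u in updated] == [p.shape for p in params]
assert np.allclose(updated[0], 0.0)          # 1 - 0.5 * 2
assert np.allclose(updated[1], [-0.5, 0.5])  # 0 - 0.5 * (+1, -1)
```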


Module 07: Training

Critical Checks Needed:

  • Core Functionality: Training loops work
  • Gradient Flow: CRITICAL - Full training stack gradients work
  • Integration: Works with all previous modules (01-06)
  • Shape Correctness: Batch handling, loss aggregation
  • Edge Cases: Single sample, empty batches, convergence
  • Numerical Stability: Training doesn't diverge
  • ⚠️ Memory & Performance: Reasonable training speed
  • Real-World Usage: CRITICAL - Can actually train models
  • Export/Import: Exports to tinytorch.core.training
  • API Consistency: Training API is usable

Test Files:

  • tests/07_training/test_autograd_integration.py - Autograd integration
  • tests/07_training/test_tensor_autograd_integration.py - Tensor integration
  • tests/07_training/test_progressive_integration.py - Progressive integration

Gap: Missing end-to-end training convergence tests


Module 08: Dataloader

Critical Checks Needed:

  • Core Functionality: Batching, shuffling, iteration work
  • Gradient Flow: Not applicable (data only)
  • Integration: Works with Tensor (01), doesn't break gradients
  • Shape Correctness: Batch shapes correct
  • Edge Cases: CRITICAL - Empty dataset, batch > dataset size
  • ⚠️ Numerical Stability: Not applicable
  • Memory & Performance: CRITICAL - Efficient data loading
  • Real-World Usage: Can load real datasets
  • Export/Import: Exports to tinytorch.data.dataloader
  • API Consistency: Iterator interface works

Test Files:

  • tests/08_dataloader/test_autograd_core.py - Core functionality
  • tests/08_dataloader/test_progressive_integration.py - Progressive integration

Gap: Missing comprehensive edge case tests and tests that verify the dataloader doesn't break gradient flow


Module 09: Spatial (CNNs)

Critical Checks Needed:

  • Core Functionality: Conv2d, Pooling work correctly
  • Gradient Flow: CRITICAL - Conv2d gradients work
  • Integration: Works with Tensor (01), Layers (03), Autograd (05)
  • Shape Correctness: CRITICAL - Output shapes match expectations
  • Edge Cases: Kernel size > image size, stride > kernel size
  • Numerical Stability: No NaN/Inf in outputs
  • Memory & Performance: CRITICAL - Efficient convolution
  • Real-World Usage: Can build CNNs
  • Export/Import: Exports to tinytorch.core.spatial
  • API Consistency: Matches PyTorch Conv2d API

Test Files:

  • tests/integration/test_cnn_integration.py - CNN integration
  • tests/09_spatial/test_progressive_integration.py - Progressive integration

Status: Good coverage


Module 10: Tokenization

Critical Checks Needed:

  • Core Functionality: Tokenization works correctly
  • Gradient Flow: Not applicable (text processing)
  • Integration: Works with Tensor (01)
  • Shape Correctness: Token sequences have correct shapes
  • Edge Cases: Empty strings, special characters, long sequences
  • ⚠️ Numerical Stability: Not applicable
  • ⚠️ Memory & Performance: Efficient tokenization
  • Real-World Usage: Can tokenize real text
  • Export/Import: Exports to tinytorch.text.tokenization
  • API Consistency: Tokenizer interface works

Test Files:

  • tests/10_tokenization/test_progressive_integration.py - Progressive integration

Gap: Missing comprehensive tokenization tests


Module 11: Embeddings

Critical Checks Needed:

  • Core Functionality: Embedding lookup works
  • Gradient Flow: CRITICAL - Embedding gradients work
  • Integration: Works with Tokenization (10), Tensor (01)
  • Shape Correctness: Embedding shapes correct
  • Edge Cases: Out-of-vocab tokens, zero embeddings
  • Numerical Stability: Embedding values reasonable
  • ⚠️ Memory & Performance: Efficient embedding lookup
  • Real-World Usage: Can embed real text
  • Export/Import: Exports to tinytorch.text.embeddings
  • API Consistency: Embedding interface works

Test Files:

  • tests/11_embeddings/test_training_integration.py - Training integration
  • tests/11_embeddings/test_ml_pipeline.py - ML pipeline
  • tests/11_embeddings/test_progressive_integration.py - Progressive integration

Status: Good coverage


Module 12: Attention

Critical Checks Needed:

  • Core Functionality: Attention mechanism works
  • Gradient Flow: CRITICAL - Attention gradients work
  • Integration: Works with Embeddings (11), Tensor (01)
  • Shape Correctness: CRITICAL - Attention output shapes correct
  • Edge Cases: Causal masking, padding masks, long sequences
  • Numerical Stability: CRITICAL - Softmax stability
  • Memory & Performance: CRITICAL - O(n²) cost in sequence length stays manageable
  • Real-World Usage: Can use in transformers
  • Export/Import: Exports to tinytorch.models.attention
  • API Consistency: Attention interface works

Test Files:

  • tests/12_attention/test_progressive_integration.py - Progressive integration
  • tests/12_attention/test_compression_integration.py - Compression integration

Gap: Missing dedicated attention mechanism tests
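A dedicated attention test might check output shape, row normalization, and causal masking together. The function below is a single-head, unbatched NumPy sketch of scaled dot-product attention with a causal mask. It is meant only to show what such assertions look like and does not reflect TinyTorch's attention interface.

```python
import numpy as np

def causal_attention(query, key, value):
    """Scaled dot-product attention with a causal mask for (seq, d) inputs."""
    seq_len, d = query.shape
    scores = query @ key.T / np.sqrt(d)
    # Mask positions j > i so a token cannot attend to the future
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Stability shift: the diagonal is never masked, so each row max is finite
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ value, weights

rng = np.random.default_rng(0)
query, key, value = (rng.normal(size=(5, 8)) for _ in range(3))
out, weights = causal_attention(query, key, value)

assert out.shape == (5, 8)
assert np.allclose(weights.sum(axis=-1), 1.0)
assert np.all(weights[np.triu_indices(5, k=1)] == 0.0)
```

The first row is a useful special case: with only one visible position, its attention weight must be exactly 1.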


Module 13: Transformers

Critical Checks Needed:

  • Core Functionality: Transformer blocks work
  • Gradient Flow: CRITICAL - Full transformer gradients work
  • Integration: Works with all previous modules (01-12)
  • Shape Correctness: CRITICAL - Transformer output shapes correct
  • Edge Cases: Variable sequence lengths, masking
  • Numerical Stability: CRITICAL - LayerNorm, attention stability
  • Memory & Performance: CRITICAL - Efficient transformer forward/backward
  • Real-World Usage: CRITICAL - Can train transformers
  • Export/Import: Exports to tinytorch.models.transformer
  • API Consistency: Transformer API works

Test Files:

  • tests/13_transformers/test_transformer_gradient_flow.py - Gradient flow
  • tests/13_transformers/test_training_simple.py - Training tests
  • tests/13_transformers/test_kernels_integration.py - Kernel integration
  • tests/13_transformers/test_progressive_integration.py - Progressive integration

Status: Excellent coverage


Module 14: Profiling

Critical Checks Needed:

  • Core Functionality: Profiling works correctly
  • Gradient Flow: Not applicable (analysis only)
  • Integration: Works with all modules
  • ⚠️ Shape Correctness: Not applicable
  • Edge Cases: Empty profiles, very fast operations
  • ⚠️ Numerical Stability: Not applicable
  • Memory & Performance: CRITICAL - Profiling overhead minimal
  • Real-World Usage: Can profile real models
  • Export/Import: Exports to tinytorch.profiling
  • API Consistency: Profiler interface works

Test Files:

  • tests/14_profiling/test_progressive_integration.py - Progressive integration
  • tests/14_profiling/test_benchmarking_integration.py - Benchmarking integration
  • tests/14_profiling/test_kv_cache_integration.py - KV cache integration

Status: Good coverage


Module 15: Quantization

Critical Checks Needed:

  • Core Functionality: Quantization works correctly
  • ⚠️ Gradient Flow: May need gradient tests if quantization-aware training
  • Integration: Works with trained models
  • Shape Correctness: Quantized model shapes preserved
  • Edge Cases: Extreme values, zero values
  • Numerical Stability: Quantization doesn't cause overflow
  • Memory & Performance: CRITICAL - Memory reduction achieved
  • Real-World Usage: Can quantize real models
  • Export/Import: Exports to tinytorch.quantization
  • API Consistency: Quantization API works

Test Files:

  • tests/15_memoization/test_progressive_integration.py - Progressive integration
  • tests/15_memoization/test_mlops_integration.py - MLOps integration
  • tests/15_memoization/test_tinygpt_integration.py - TinyGPT integration

Gap: Missing quantization-specific tests (the files listed above live under tests/15_memoization/, which does not match this module's name)
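Quantization-specific tests could assert a bounded round-trip error, the expected memory reduction, and sane handling of all-zero input. The sketch below assumes symmetric per-tensor int8 quantization, which is one common scheme chosen here for illustration. The function names are hypothetical and the scheme TinyTorch implements may differ.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization. Returns (int8 values, scale)."""
    max_abs = float(np.max(np.abs(x))) if x.size else 0.0
    # For all-zero input any positive scale works; 1.0 avoids division by zero
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = float(np.max(np.abs(weights - restored)))

zero_q, zero_scale = quantize_int8(np.zeros(4, dtype=np.float32))
```

Rounding to the nearest integer bounds the per-element error by about half a quantization step (`scale / 2`), and storing int8 instead of float32 gives a 4x reduction in bytes.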


Module 16: Compression

Critical Checks Needed:

  • Core Functionality: Compression works correctly
  • ⚠️ Gradient Flow: May need gradient tests if compression-aware training
  • Integration: Works with trained models
  • Shape Correctness: Compressed model shapes handled
  • Edge Cases: Already sparse models, extreme compression
  • Numerical Stability: Compression doesn't cause instability
  • Memory & Performance: CRITICAL - Compression ratio achieved
  • Real-World Usage: Can compress real models
  • Export/Import: Exports to tinytorch.compression
  • API Consistency: Compression API works

Test Files: Need to check

Gap: Unknown - needs assessment


Module 17: Memoization

Critical Checks Needed:

  • Core Functionality: Caching works correctly
  • Gradient Flow: Not applicable (caching only)
  • Integration: Works with all modules
  • ⚠️ Shape Correctness: Not applicable
  • Edge Cases: Cache invalidation, memory limits
  • ⚠️ Numerical Stability: Not applicable
  • Memory & Performance: CRITICAL - Caching improves performance
  • Real-World Usage: Can cache real computations
  • Export/Import: Exports to tinytorch.memoization
  • API Consistency: Cache interface works

Test Files:

  • tests/17_compression/ - Need to check (directory name does not match this module's name)

Gap: Unknown - needs assessment


Module 18: Acceleration

Critical Checks Needed:

  • Core Functionality: Acceleration works correctly
  • Gradient Flow: Not applicable (optimization only)
  • Integration: Works with all modules
  • ⚠️ Shape Correctness: Not applicable
  • Edge Cases: Already optimized code
  • ⚠️ Numerical Stability: Not applicable
  • Memory & Performance: CRITICAL - Speedup achieved
  • Real-World Usage: Can accelerate real models
  • Export/Import: Exports to tinytorch.acceleration
  • API Consistency: Acceleration API works

Test Files: Need to check

Gap: Unknown - needs assessment


Module 19: Benchmarking

Critical Checks Needed:

  • Core Functionality: Benchmarking works correctly
  • Gradient Flow: Not applicable (evaluation only)
  • Integration: Works with all modules
  • ⚠️ Shape Correctness: Not applicable
  • Edge Cases: Very fast/slow operations
  • ⚠️ Numerical Stability: Not applicable
  • Memory & Performance: CRITICAL - Benchmarking overhead minimal
  • Real-World Usage: Can benchmark real models
  • Export/Import: Exports to tinytorch.benchmarking
  • API Consistency: Benchmarking API works

Test Files: Need to check

Gap: Unknown - needs assessment


Module 20: Capstone

Critical Checks Needed:

  • Core Functionality: Complete system works
  • Gradient Flow: CRITICAL - Full system gradients work
  • Integration: Works with ALL modules (01-19)
  • Shape Correctness: End-to-end shapes correct
  • Edge Cases: All edge cases from all modules
  • Numerical Stability: Full system stable
  • Memory & Performance: CRITICAL - System performance acceptable
  • Real-World Usage: CRITICAL - Can train and use TinyGPT
  • Export/Import: Exports to tinytorch.applications.tinygpt
  • API Consistency: Complete API works

Test Files: Need to check

Gap: Unknown - needs assessment


🎯 Priority Implementation Plan

Phase 1: Critical Gaps (Must Fix)

  1. Module 02_activations: Add comprehensive gradient flow tests
  2. Module 08_dataloader: Add edge case tests and verify it doesn't break gradient flow
  3. Module 06_optimizers: Add dedicated optimizer functionality tests
  4. Module 07_training: Add end-to-end convergence tests

Phase 2: Important Gaps (Should Fix)

  1. Module 03_layers: Add gradient flow tests for Dropout, LayerNorm
  2. Module 10_tokenization: Add comprehensive tokenization tests
  3. Module 12_attention: Add dedicated attention mechanism tests
  4. All modules: Add export/import correctness tests

Phase 3: Nice to Have (Can Fix)

  1. All modules: Add numerical stability tests
  2. All modules: Add memory/performance tests
  3. All modules: Add real-world usage tests

📝 Test File Naming Convention

For each module XX_modulename, create:

tests/XX_modulename/
├── test_[modulename]_core.py              # Core functionality
├── test_gradient_flow.py                  # Gradient flow (if applicable)
├── test_[modulename]_integration.py       # Integration with previous modules
├── test_progressive_integration.py        # Progressive integration (module N with modules 1 through N-1)
├── test_edge_cases.py                     # Edge cases and error handling
├── test_numerical_stability.py            # Numerical stability (if applicable)
└── test_real_world_usage.py               # Real-world usage scenarios
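The convention above can be checked with a short script that reports which expected files a module directory lacks. This is a sketch using `pathlib`. The function names are hypothetical, and it treats every file in the layout as expected even though some are marked "if applicable".

```python
from pathlib import Path

def expected_test_files(module_dir_name):
    """Expected file names for a module directory such as '03_layers'."""
    short_name = module_dir_name.split("_", 1)[1]  # '03_layers' -> 'layers'
    return [
        f"test_{short_name}_core.py",
        "test_gradient_flow.py",
        f"test_{short_name}_integration.py",
        "test_progressive_integration.py",
        "test_edge_cases.py",
        "test_numerical_stability.py",
        "test_real_world_usage.py",
    ]

def missing_test_files(tests_root, module_dir_name):
    """Expected test files that do not exist under tests_root/module_dir_name."""
    module_dir = Path(tests_root) / module_dir_name
    return [
        name
        for name in expected_test_files(module_dir_name)
        if not (module_dir / name).is_file()
    ]
```

For example, `missing_test_files("tests", "02_activations")` would list the files still to be written for that module, assuming the script runs from the repository root.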

Success Criteria

A module has complete test coverage when:

  1. Core functionality tests pass
  2. Gradient flow tests pass (if applicable)
  3. Integration tests pass
  4. Progressive integration tests pass
  5. Edge case tests pass
  6. Export/import tests pass
  7. At least one real-world usage test passes

🎓 For Students

This testing plan helps you:

  • Understand what to test: Clear categories of what matters
  • Catch bugs early: Test as you build
  • Learn best practices: See how professional ML systems are tested
  • Build confidence: Know your code works correctly

🔧 For Maintainers

This testing plan helps you:

  • Catch regressions: Comprehensive tests catch breaking changes
  • Ensure quality: All modules meet quality standards
  • Document behavior: Tests document expected behavior
  • Maintain system: Keep TinyTorch robust as it evolves

Last Updated: 2025-01-XX
Status: Comprehensive plan complete, implementation in progress
Priority: High - Systematic testing ensures robust system