mirror of https://github.com/MLSysBook/TinyTorch.git synced 2026-05-31 09:21:30 -05:00

Vijay Janapa Reddi 857553fa9d Update site documentation and development guides

- Improve site navigation and content structure
- Update development testing documentation
- Enhance site styling and visual consistency
- Update release notes and milestone templates
- Improve site rebuild script functionality

2025-11-13 10:42:51 -05:00

Comprehensive Module Testing Plan

🎯 Overview

This document defines a systematic testing strategy for all TinyTorch modules. It identifies which critical checks each module needs so that students and maintainers can catch issues early and build robust systems.

Key Principle: Every module needs tests that validate:

Correctness - Does it work as intended?
Integration - Does it work with other modules?
Robustness - Does it handle edge cases?
Usability - Can students actually use it?

📊 Test Categories: What to Test

Category 1: Core Functionality ✅

Purpose: Verify the module does what it's supposed to do

Checks:

✅ Forward pass correctness
✅ Output shapes match expectations
✅ Mathematical correctness (compare to reference implementations)
✅ API correctness (methods exist, signatures correct)
✅ Parameter initialization (if applicable)

Example: For 03_layers:

Linear layer computes output = input @ weight + bias correctly
Output shape is (batch, out_features) when input is (batch, in_features)
Weight and bias are initialized properly
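
A minimal sketch of these checks in plain Python, with nested lists standing in for tensors. `linear_forward` is a hypothetical reference helper, not TinyTorch's `Linear` layer. A real test would call the layer and compare its output against a reference like this one.

```python
def linear_forward(x, weight, bias):
    """Reference output = x @ weight + bias for nested-list matrices.

    x: (batch, in_features), weight: (in_features, out_features),
    bias: (out_features,). Hypothetical helper for illustration only.
    """
    in_features = len(weight)
    out_features = len(bias)
    for row in x:
        if len(row) != in_features:
            raise ValueError(
                f"expected {in_features} input features, got {len(row)}"
            )
    return [
        [
            sum(row[k] * weight[k][j] for k in range(in_features)) + bias[j]
            for j in range(out_features)
        ]
        for row in x
    ]


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]         # (batch=2, in_features=3)
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # (in_features=3, out_features=2)
bias = [0.5, -0.5]

out = linear_forward(x, weight, bias)
assert (len(out), len(out[0])) == (2, 2)       # (batch, out_features)
assert out == [[4.5, 4.5], [10.5, 10.5]]
```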

Category 2: Gradient Flow 🔥

Purpose: Verify gradients flow correctly (critical for training)

Checks:

✅ Gradients exist after backward pass
✅ Gradients are not identically zero
✅ All trainable parameters receive gradients
✅ Gradient shapes match parameter shapes
✅ Gradients flow through the component correctly

Example: For 02_activations:

ReLU preserves requires_grad flag
Backward pass computes correct gradients
Gradient is 0 for negative inputs, 1 for positive inputs
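
The ReLU gradient rule above can be checked against a finite-difference estimate. This is a sketch on plain floats. The function names are illustrative and do not come from TinyTorch, where the analytic gradient would be produced by the backward pass.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def relu_grad(x):
    """Analytic derivative: 0 for negative inputs, 1 for positive inputs."""
    return 1.0 if x > 0.0 else 0.0


def numerical_grad(f, x, eps=1e-6):
    """Central finite difference, used as an independent reference."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


# Skip x == 0, where ReLU is not differentiable.
for x in [-3.0, -0.5, 0.5, 2.0]:
    assert abs(relu_grad(x) - numerical_grad(relu, x)) < 1e-6
```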

Modules That Need This: All modules with trainable parameters or that process gradients

✅ 02_activations, 03_layers, 04_losses, 05_autograd, 06_optimizers, 07_training, 09_spatial, 11_embeddings, 12_attention, 13_transformers

Modules That Don't Need This: Modules that don't process gradients

❌ 01_tensor (foundation, no gradients yet), 08_dataloader (data only), 10_tokenization (text processing), 14_profiling (analysis), 15_quantization (post-training), 16_compression (post-training), 17_memoization (caching), 18_acceleration (optimization), 19_benchmarking (evaluation)

Category 3: Integration with Previous Modules 🔗

Purpose: Verify module N works with modules 1 through N-1

Checks:

✅ Imports from previous modules work
✅ Components from previous modules integrate correctly
✅ Data flows correctly through the stack
✅ No breaking changes to previous modules

Example: For 07_training:

Uses Tensor (01), Layers (03), Losses (04), Autograd (05), Optimizers (06)
All components work together in a training loop
Training loop actually trains (loss decreases)

All Modules Need This: Every module should test integration with previous modules

Category 4: Shape Correctness 📐

Purpose: Verify shapes are handled correctly (common source of bugs)

Checks:

✅ Output shapes match expected dimensions
✅ Broadcasting works correctly
✅ Reshape operations preserve data
✅ Batch dimensions handled correctly
✅ Edge cases (empty tensors, single samples, etc.)

Example: For 09_spatial:

Conv2d output shape: (batch, out_channels, height_out, width_out)
MaxPool2d reduces spatial dimensions correctly
Shapes work with Linear layers downstream
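
Expected output shapes can be computed independently of the layer under test. The sketch below uses the standard convolution size formula with no dilation, assuming TinyTorch's Conv2d follows the same convention as PyTorch. `conv2d_output_shape` is a hypothetical helper.

```python
def conv2d_output_shape(input_shape, out_channels, kernel_size, stride=1, padding=0):
    """Expected (batch, out_channels, height_out, width_out) for a square kernel."""
    batch, _in_channels, height, width = input_shape

    def out_dim(size):
        span = size + 2 * padding - kernel_size
        if span < 0:
            raise ValueError(
                f"kernel_size {kernel_size} exceeds padded input size {size + 2 * padding}"
            )
        return span // stride + 1

    return (batch, out_channels, out_dim(height), out_dim(width))


assert conv2d_output_shape((8, 3, 32, 32), 16, kernel_size=3) == (8, 16, 30, 30)
assert conv2d_output_shape((8, 3, 32, 32), 16, kernel_size=3, padding=1) == (8, 16, 32, 32)
# The same formula covers 2x2 max pooling with stride 2.
assert conv2d_output_shape((8, 16, 32, 32), 16, kernel_size=2, stride=2) == (8, 16, 16, 16)
```

A shape test would then assert that the layer's actual output shape equals this tuple, and that flattening it yields the `in_features` expected by the downstream Linear layer.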

Modules That Need This: All modules that transform shapes

✅ 01_tensor, 03_layers, 09_spatial, 11_embeddings, 12_attention, 13_transformers

Category 5: Edge Cases & Error Handling ⚠️

Purpose: Verify robustness and helpful error messages

Checks:

✅ Handles empty inputs gracefully
✅ Handles zero values correctly
✅ Handles very large/small values
✅ Provides helpful error messages for invalid inputs
✅ Handles NaN/Inf correctly
✅ Handles out-of-bounds indices

Example: For 08_dataloader:

Empty dataset handled gracefully
Batch size larger than dataset handled correctly
Invalid indices raise clear error messages
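
A sketch of how these three cases could be pinned down, using a plain list as the dataset. `iter_batches` and `get_item` are hypothetical stand-ins rather than the TinyTorch dataloader API, and yielding one short batch is only one reasonable policy for an oversized batch size.

```python
def iter_batches(dataset, batch_size):
    """Yield consecutive batches; the last batch may be smaller."""
    if batch_size <= 0:
        raise ValueError(f"batch_size must be positive, got {batch_size}")
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]


def get_item(dataset, index):
    """Index lookup with an error message that names the valid range."""
    if not 0 <= index < len(dataset):
        raise IndexError(
            f"index {index} is out of range for dataset of size {len(dataset)}"
        )
    return dataset[index]


assert list(iter_batches([], 4)) == []                    # empty dataset
assert list(iter_batches([1, 2, 3], 8)) == [[1, 2, 3]]    # batch > dataset
assert list(iter_batches([1, 2, 3, 4, 5], 2)) == [[1, 2], [3, 4], [5]]
```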

All Modules Need This: Every module should handle edge cases

Category 6: Numerical Stability 🔢

Purpose: Verify numerical correctness and stability

Checks:

✅ No NaN values in outputs
✅ No Inf values in outputs
✅ Numerical precision is acceptable
✅ Operations are numerically stable
✅ Compare to reference implementations (NumPy, PyTorch)

Example: For 02_activations:

Sigmoid doesn't overflow for large inputs
Softmax is numerically stable (subtracts the max logit before exponentiating, as in the log-sum-exp trick)
No NaN/Inf in outputs
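
A sketch of what stability means for these two functions, written on plain floats with the standard library. The sigmoid branches on the sign of the input so it never exponentiates a large positive number, and the softmax subtracts the maximum before exponentiating (the same shift that underlies the log-sum-exp trick). The function names are illustrative, not TinyTorch's.

```python
import math


def stable_sigmoid(x):
    """Sigmoid that never calls exp() on a large positive argument."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def stable_softmax(logits):
    """Softmax computed on logits shifted so that the largest is 0."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


for x in [-1000.0, 0.0, 1000.0]:
    y = stable_sigmoid(x)
    assert math.isfinite(y) and 0.0 <= y <= 1.0

probs = stable_softmax([1000.0, 1001.0, 1002.0])
assert all(math.isfinite(p) for p in probs)
assert abs(sum(probs) - 1.0) < 1e-12
```

For comparison, a naive `math.exp(1000.0)` raises `OverflowError` in Python, which is the failure these formulations avoid.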

Modules That Need This: Modules with numerical operations

✅ 01_tensor, 02_activations, 03_layers, 04_losses, 05_autograd, 09_spatial, 11_embeddings, 12_attention, 13_transformers

Category 7: Memory & Performance ⚡

Purpose: Verify reasonable performance (not exhaustive, but enough to catch major issues)

Checks:

✅ No memory leaks
✅ Operations complete in reasonable time
✅ Memory usage is reasonable
✅ Can handle realistic batch sizes

Example: For 13_transformers:

Forward pass completes in reasonable time for small models
Memory usage scales linearly with batch size
No memory leaks across multiple forward passes
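
One way to approximate the leak check is the standard library's `tracemalloc`, which tracks Python-level allocations. The sketch below measures how much traced memory is still held after repeated calls. The two `*_forward` functions are hypothetical stand-ins for a model's forward pass, and the byte thresholds are arbitrary assumptions that would need tuning for a real model.

```python
import gc
import tracemalloc


def memory_growth(fn, warmup=3, repeats=20):
    """Bytes of traced Python memory still held after repeated calls to fn."""
    for _ in range(warmup):            # let caches and free lists settle
        fn()
    gc.collect()
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    for _ in range(repeats):
        fn()
    gc.collect()
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


_retained = []


def leaky_forward():
    _retained.append([0.0] * 10_000)   # keeps every buffer alive


def clean_forward():
    return sum([float(i) for i in range(10_000)])


assert memory_growth(clean_forward) < 100_000
assert memory_growth(leaky_forward) > 1_000_000
```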

Modules That Need This: Modules with performance-sensitive operations

✅ 05_autograd, 09_spatial, 12_attention, 13_transformers, 14_profiling, 18_acceleration, 19_benchmarking

Category 8: Real-World Usage 🌍

Purpose: Verify the module works in realistic scenarios

Checks:

✅ Can solve the intended problem
✅ Works with real datasets (if applicable)
✅ Matches expected behavior from documentation
✅ Can be used in production-like scenarios

Example: For 07_training:

Can train a simple model on real data
Loss decreases over epochs
Model actually learns (accuracy improves)
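
The "loss decreases" check can be illustrated with a self-contained toy problem. The sketch below fits y = 2x + 1 by gradient descent on mean squared error in plain Python. It is not the TinyTorch training API, and the learning rate and epoch count are assumptions chosen so this small problem converges.

```python
def train_linear_model(xs, ys, lr=0.05, epochs=200):
    """Fit y = w * x + b by gradient descent; return (w, b, per-epoch losses)."""
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / n)          # MSE before the update
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b, losses = train_linear_model(xs, ys)

assert losses[-1] < 0.01 * losses[0]           # loss dropped by more than 100x
assert abs(w - 2.0) < 0.05 and abs(b - 1.0) < 0.05
```

A relative threshold on the loss keeps the test meaningful without tying it to exact floating-point values.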

Modules That Need This: All modules should have at least one real-world usage test

Category 9: Export/Import Correctness 📦

Purpose: Verify code exports correctly and can be imported

Checks:

✅ Code exports to tinytorch/ correctly
✅ Can import from tinytorch.* package
✅ Exported API matches module API
✅ No import errors
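
These checks can be automated with `importlib`. The sketch below is demonstrated on a standard-library module so that it runs anywhere. `missing_exports` is a hypothetical helper.

```python
import importlib


def missing_exports(module_name, expected_names):
    """Return the expected names that the imported module does not define."""
    module = importlib.import_module(module_name)  # raises ImportError if absent
    return [name for name in expected_names if not hasattr(module, name)]


assert missing_exports("math", ["sqrt", "exp"]) == []
assert missing_exports("math", ["sqrt", "not_a_real_function"]) == ["not_a_real_function"]
```

In the repository, the same helper would be pointed at a package path such as `tinytorch.core.tensor`, with the list of expected names taken from the module's source (the exact exported names are not specified here).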

All Modules Need This: Every module should test export/import

Category 10: API Consistency 🔌

Purpose: Verify API matches conventions and is usable

Checks:

✅ Methods have expected names
✅ Parameters match expected signatures
✅ Return types are consistent
✅ Follows TinyTorch conventions
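
Signature consistency can be checked mechanically with `inspect.signature`. In this sketch, `ReLU` and `Identity` are hypothetical stand-in classes, and the `forward` method name is an assumption about the convention being enforced.

```python
import inspect


class ReLU:
    def forward(self, x):
        return max(x, 0.0)


class Identity:
    def forward(self, x):
        return x


def interface_mismatches(classes, method_names):
    """List (class name, method name) pairs that differ from the first class."""
    reference = classes[0]
    problems = []
    for cls in classes[1:]:
        for name in method_names:
            ref_method = getattr(reference, name, None)
            method = getattr(cls, name, None)
            if ref_method is None or method is None:
                problems.append((cls.__name__, name))      # method missing
            elif inspect.signature(method) != inspect.signature(ref_method):
                problems.append((cls.__name__, name))      # parameters differ
    return problems


assert interface_mismatches([ReLU, Identity], ["forward"]) == []
```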

All Modules Need This: Every module should test API consistency

📋 Module-by-Module Testing Plan

Module 01: Tensor (Foundation)

Critical Checks Needed:

✅ Core Functionality: All operations work (add, mul, matmul, etc.)
❌ Gradient Flow: Not applicable (no gradients yet)
❌ Integration: No previous modules
✅ Shape Correctness: Broadcasting, reshaping, indexing
✅ Edge Cases: Empty tensors, zero values, large arrays
✅ Numerical Stability: Precision, overflow handling
⚠️ Memory & Performance: Large tensor operations
✅ Real-World Usage: Can build neural networks with tensors
✅ Export/Import: Exports to tinytorch.core.tensor
✅ API Consistency: Matches NumPy-like API

Test Files:

tests/01_tensor/test_tensor_core.py - Core functionality
tests/01_tensor/test_tensor_integration.py - Integration with NumPy
tests/01_tensor/test_progressive_integration.py - Progressive integration

Module 02: Activations

Critical Checks Needed:

✅ Core Functionality: Forward pass correctness
✅ Gradient Flow: CRITICAL - All activations preserve gradients
✅ Integration: Works with Tensor (01)
✅ Shape Correctness: Output shape matches input shape
✅ Edge Cases: Large values, zero values, negative values
✅ Numerical Stability: CRITICAL - No overflow/underflow
⚠️ Memory & Performance: Fast forward/backward passes
✅ Real-World Usage: Can use in neural networks
✅ Export/Import: Exports to tinytorch.core.activations
✅ API Consistency: All activations have same interface

Test Files:

tests/02_activations/test_activations_core.py - Core functionality
tests/02_activations/test_gradient_flow.py - MISSING - Gradient flow tests
tests/02_activations/test_activations_integration.py - Integration
tests/02_activations/test_progressive_integration.py - Progressive integration

Gap: Missing comprehensive gradient flow tests for all activations

Module 03: Layers

Critical Checks Needed:

✅ Core Functionality: Forward pass, parameter initialization
✅ Gradient Flow: CRITICAL - All layers compute gradients correctly
✅ Integration: Works with Tensor (01), Activations (02)
✅ Shape Correctness: CRITICAL - Output shapes match expectations
✅ Edge Cases: Zero inputs, single samples, large batches
✅ Numerical Stability: No NaN/Inf in outputs
⚠️ Memory & Performance: Reasonable memory usage
✅ Real-World Usage: Can build neural networks
✅ Export/Import: Exports to tinytorch.core.layers
✅ API Consistency: All layers follow Module interface

Test Files:

tests/03_layers/test_layers_core.py - Core functionality
tests/03_layers/test_layers_integration.py - Integration
tests/03_layers/test_layers_networks_integration.py - Network integration
tests/03_layers/test_progressive_integration.py - Progressive integration

Gap: Missing gradient flow tests for Dropout, LayerNorm (if in layers module)

Module 04: Losses

Critical Checks Needed:

✅ Core Functionality: Loss computation correctness
✅ Gradient Flow: CRITICAL - Loss functions compute gradients
✅ Integration: Works with Tensor (01), Layers (03)
✅ Shape Correctness: Handles different batch sizes
✅ Edge Cases: Perfect predictions, zero loss, large losses
✅ Numerical Stability: CRITICAL - Log operations stable
⚠️ Memory & Performance: Efficient computation
✅ Real-World Usage: Can use in training loops
✅ Export/Import: Exports to tinytorch.core.losses
✅ API Consistency: All losses have same interface

Test Files:

tests/04_losses/test_dense_layer.py - Layer tests
tests/04_losses/test_dense_integration.py - Integration
tests/04_losses/test_network_capability.py - Network capability
tests/04_losses/test_progressive_integration.py - Progressive integration

Status: Good coverage

Module 05: Autograd

Critical Checks Needed:

✅ Core Functionality: Forward/backward pass correctness
✅ Gradient Flow: CRITICAL - Gradients computed correctly
✅ Integration: Works with all previous modules
✅ Shape Correctness: Gradient shapes match parameter shapes
✅ Edge Cases: CRITICAL - Broadcasting, reshape, chain rule
✅ Numerical Stability: CRITICAL - Gradient computation stable
✅ Memory & Performance: CRITICAL - No memory leaks
✅ Real-World Usage: Can train models
✅ Export/Import: Exports to tinytorch.core.autograd
✅ API Consistency: Matches PyTorch-like API

Test Files:

tests/05_autograd/test_gradient_flow.py - Gradient flow ✅
tests/05_autograd/test_batched_matmul_backward.py - Batched operations
tests/05_autograd/test_progressive_integration.py - Progressive integration

Status: Excellent coverage

Module 06: Optimizers

Critical Checks Needed:

✅ Core Functionality: Parameter updates work
✅ Gradient Flow: CRITICAL - Optimizers use gradients correctly
✅ Integration: Works with Autograd (05), Layers (03)
✅ Shape Correctness: Parameter shapes preserved
✅ Edge Cases: Zero gradients, very small/large learning rates
✅ Numerical Stability: Updates don't cause overflow
⚠️ Memory & Performance: Efficient updates
✅ Real-World Usage: Can train models
✅ Export/Import: Exports to tinytorch.core.optimizers
✅ API Consistency: All optimizers have same interface

Test Files:

tests/06_optimizers/test_progressive_integration.py - Progressive integration
tests/06_optimizers/test_cnn_networks_integration.py - CNN integration
tests/06_optimizers/test_cnn_pipeline_integration.py - Pipeline integration

Gap: Missing dedicated optimizer functionality tests
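
As a starting point for closing this gap, a dedicated optimizer test could pin down the update rule directly. The sketch below checks plain SGD on lists of floats. `sgd_step` is a hypothetical reference helper, not the TinyTorch optimizer.

```python
def sgd_step(params, grads, lr):
    """Return params updated by plain SGD: p <- p - lr * g."""
    if len(params) != len(grads):
        raise ValueError("params and grads must have the same length")
    return [p - lr * g for p, g in zip(params, grads)]


params = [1.0, -2.0, 0.5]

# Zero gradients leave parameters unchanged.
assert sgd_step(params, [0.0, 0.0, 0.0], lr=0.1) == params

# A known gradient produces a known update, and the shape is preserved.
updated = sgd_step(params, [1.0, -1.0, 2.0], lr=0.5)
assert updated == [0.5, -1.5, -0.5]
assert len(updated) == len(params)
```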

Module 07: Training

Critical Checks Needed:

✅ Core Functionality: Training loops work
✅ Gradient Flow: CRITICAL - Full training stack gradients work
✅ Integration: Works with all previous modules (01-06)
✅ Shape Correctness: Batch handling, loss aggregation
✅ Edge Cases: Single sample, empty batches, convergence
✅ Numerical Stability: Training doesn't diverge
⚠️ Memory & Performance: Reasonable training speed
✅ Real-World Usage: CRITICAL - Can actually train models
✅ Export/Import: Exports to tinytorch.core.training
✅ API Consistency: Training API is usable

Test Files:

tests/07_training/test_autograd_integration.py - Autograd integration
tests/07_training/test_tensor_autograd_integration.py - Tensor integration
tests/07_training/test_progressive_integration.py - Progressive integration

Gap: Missing end-to-end training convergence tests

Module 08: Dataloader

Critical Checks Needed:

✅ Core Functionality: Batching, shuffling, iteration work
❌ Gradient Flow: Not applicable (data only)
✅ Integration: Works with Tensor (01), doesn't break gradients
✅ Shape Correctness: Batch shapes correct
✅ Edge Cases: CRITICAL - Empty dataset, batch > dataset size
❌ Numerical Stability: Not applicable
✅ Memory & Performance: CRITICAL - Efficient data loading
✅ Real-World Usage: Can load real datasets
✅ Export/Import: Exports to tinytorch.data.dataloader
✅ API Consistency: Iterator interface works

Test Files:

tests/08_dataloader/test_autograd_core.py - Core functionality
tests/08_dataloader/test_progressive_integration.py - Progressive integration

Gap: Missing comprehensive edge case tests and tests that verify the dataloader doesn't break gradient flow

Module 09: Spatial (CNNs)

Critical Checks Needed:

✅ Core Functionality: Conv2d, Pooling work correctly
✅ Gradient Flow: CRITICAL - Conv2d gradients work
✅ Integration: Works with Tensor (01), Layers (03), Autograd (05)
✅ Shape Correctness: CRITICAL - Output shapes match expectations
✅ Edge Cases: Kernel size > image size, stride > kernel size
✅ Numerical Stability: No NaN/Inf in outputs
✅ Memory & Performance: CRITICAL - Efficient convolution
✅ Real-World Usage: Can build CNNs
✅ Export/Import: Exports to tinytorch.core.spatial
✅ API Consistency: Matches PyTorch Conv2d API

Test Files:

tests/integration/test_cnn_integration.py - CNN integration ✅
tests/09_spatial/test_progressive_integration.py - Progressive integration

Status: Good coverage

Module 10: Tokenization

Critical Checks Needed:

✅ Core Functionality: Tokenization works correctly
❌ Gradient Flow: Not applicable (text processing)
✅ Integration: Works with Tensor (01)
✅ Shape Correctness: Token sequences have correct shapes
✅ Edge Cases: Empty strings, special characters, long sequences
❌ Numerical Stability: Not applicable
⚠️ Memory & Performance: Efficient tokenization
✅ Real-World Usage: Can tokenize real text
✅ Export/Import: Exports to tinytorch.text.tokenization
✅ API Consistency: Tokenizer interface works

Test Files:

tests/10_tokenization/test_progressive_integration.py - Progressive integration

Gap: Missing comprehensive tokenization tests
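
A sketch of the kind of checks that could fill this gap: empty input, round-tripping, unknown characters, and long sequences. `CharTokenizer` is a hypothetical character-level tokenizer written for illustration. It does not reflect the TinyTorch tokenizer interface.

```python
class CharTokenizer:
    """Hypothetical character-level tokenizer with a reserved unknown id."""

    def __init__(self, corpus):
        self.unk_id = 0
        chars = sorted(set(corpus))
        self.char_to_id = {ch: i + 1 for i, ch in enumerate(chars)}
        self.id_to_char = {i: ch for ch, i in self.char_to_id.items()}

    def encode(self, text):
        return [self.char_to_id.get(ch, self.unk_id) for ch in text]

    def decode(self, ids):
        return "".join(self.id_to_char.get(i, "?") for i in ids)


tok = CharTokenizer("hello world")

assert tok.encode("") == []                               # empty string
assert tok.decode(tok.encode("hello")) == "hello"         # round trip
assert tok.encode("z") == [tok.unk_id]                    # out-of-vocabulary
assert len(tok.encode("lo" * 10_000)) == 20_000           # long sequence
```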

Module 11: Embeddings

Critical Checks Needed:

✅ Core Functionality: Embedding lookup works
✅ Gradient Flow: CRITICAL - Embedding gradients work
✅ Integration: Works with Tokenization (10), Tensor (01)
✅ Shape Correctness: Embedding shapes correct
✅ Edge Cases: Out-of-vocab tokens, zero embeddings
✅ Numerical Stability: Embedding values reasonable
⚠️ Memory & Performance: Efficient embedding lookup
✅ Real-World Usage: Can embed real text
✅ Export/Import: Exports to tinytorch.text.embeddings
✅ API Consistency: Embedding interface works

Test Files:

tests/11_embeddings/test_training_integration.py - Training integration
tests/11_embeddings/test_ml_pipeline.py - ML pipeline
tests/11_embeddings/test_progressive_integration.py - Progressive integration

Status: Good coverage

Module 12: Attention

Critical Checks Needed:

✅ Core Functionality: Attention mechanism works
✅ Gradient Flow: CRITICAL - Attention gradients work
✅ Integration: Works with Embeddings (11), Tensor (01)
✅ Shape Correctness: CRITICAL - Attention output shapes correct
✅ Edge Cases: Causal masking, padding masks, long sequences
✅ Numerical Stability: CRITICAL - Softmax stability
✅ Memory & Performance: CRITICAL - O(n²) cost in sequence length handled
✅ Real-World Usage: Can use in transformers
✅ Export/Import: Exports to tinytorch.models.attention
✅ API Consistency: Attention interface works

Test Files:

tests/12_attention/test_progressive_integration.py - Progressive integration
tests/12_attention/test_compression_integration.py - Compression integration

Gap: Missing dedicated attention mechanism tests
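
A dedicated attention test could start from invariants that hold for any correct implementation: each row of attention weights sums to 1, and causal masking assigns zero weight to future positions. The sketch below computes scaled dot-product attention weights in plain Python. The function names are illustrative, not TinyTorch's.

```python
import math


def softmax(row):
    shift = max(row)
    exps = [math.exp(v - shift) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention_weights(queries, keys):
    """Scaled dot-product attention weights with a causal mask.

    queries, keys: lists of seq_len vectors of dimension d.
    Position i may only attend to positions j <= i.
    """
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if j <= i else -math.inf
            for j, k in enumerate(keys)
        ]
        weights.append(softmax(scores))
    return weights


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = causal_attention_weights(vectors, vectors)

for i, row in enumerate(weights):
    assert abs(sum(row) - 1.0) < 1e-12                    # rows are distributions
    assert all(row[j] == 0.0 for j in range(i + 1, 3))    # no attention to the future
assert weights[0] == [1.0, 0.0, 0.0]
```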

Module 13: Transformers

Critical Checks Needed:

✅ Core Functionality: Transformer blocks work
✅ Gradient Flow: CRITICAL - Full transformer gradients work
✅ Integration: Works with all previous modules (01-12)
✅ Shape Correctness: CRITICAL - Transformer output shapes correct
✅ Edge Cases: Variable sequence lengths, masking
✅ Numerical Stability: CRITICAL - LayerNorm, attention stability
✅ Memory & Performance: CRITICAL - Efficient transformer forward/backward
✅ Real-World Usage: CRITICAL - Can train transformers
✅ Export/Import: Exports to tinytorch.models.transformer
✅ API Consistency: Transformer API works

Test Files:

tests/13_transformers/test_transformer_gradient_flow.py - Gradient flow ✅
tests/13_transformers/test_training_simple.py - Training tests
tests/13_transformers/test_kernels_integration.py - Kernel integration
tests/13_transformers/test_progressive_integration.py - Progressive integration

Status: Excellent coverage

Module 14: Profiling

Critical Checks Needed:

✅ Core Functionality: Profiling works correctly
❌ Gradient Flow: Not applicable (analysis only)
✅ Integration: Works with all modules
❌ Shape Correctness: Not applicable
✅ Edge Cases: Empty profiles, very fast operations
❌ Numerical Stability: Not applicable
✅ Memory & Performance: CRITICAL - Profiling overhead minimal
✅ Real-World Usage: Can profile real models
✅ Export/Import: Exports to tinytorch.profiling
✅ API Consistency: Profiler interface works

Test Files:

tests/14_profiling/test_progressive_integration.py - Progressive integration
tests/14_profiling/test_benchmarking_integration.py - Benchmarking integration
tests/14_profiling/test_kv_cache_integration.py - KV cache integration

Status: Good coverage

Module 15: Quantization

Critical Checks Needed:

✅ Core Functionality: Quantization works correctly
⚠️ Gradient Flow: May need gradient tests if quantization-aware training is supported
✅ Integration: Works with trained models
✅ Shape Correctness: Quantized model shapes preserved
✅ Edge Cases: Extreme values, zero values
✅ Numerical Stability: Quantization doesn't cause overflow
✅ Memory & Performance: CRITICAL - Memory reduction achieved
✅ Real-World Usage: Can quantize real models
✅ Export/Import: Exports to tinytorch.quantization
✅ API Consistency: Quantization API works

Test Files:

tests/15_memoization/test_progressive_integration.py - Progressive integration
tests/15_memoization/test_mlops_integration.py - MLOps integration
tests/15_memoization/test_tinygpt_integration.py - TinyGPT integration

Gap: Missing quantization-specific tests
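
A quantization-specific test could assert a round-trip error bound. The sketch below assumes symmetric per-tensor int8 quantization, which may differ from the scheme the module actually implements. `quantize_int8` and `dequantize_int8` are hypothetical helpers.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization; returns (codes, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0.0 else 1.0     # avoid dividing by zero
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    return [c * scale for c in codes]


values = [-1.5, -0.25, 0.0, 0.3, 1.5]
codes, scale = quantize_int8(values)
restored = dequantize_int8(codes, scale)

assert all(-127 <= c <= 127 for c in codes)               # fits in int8
assert len(restored) == len(values)                       # shape preserved
# Rounding to the nearest code bounds the error by half a quantization step.
assert all(abs(v - r) <= scale / 2 + 1e-12 for v, r in zip(values, restored))
assert quantize_int8([0.0, 0.0]) == ([0, 0], 1.0)         # all-zero input
```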