Add integration tests for training flow and NLP pipeline

New tests that catch module boundary bugs:

test_training_flow.py:
- Optimizer actually updates weights (SGD, Adam)
- Training reduces loss over iterations
- Gradient chain not broken through 5 layers
- Input receives gradients
- zero_grad works correctly
- Batch gradients are averaged

test_nlp_pipeline_flow.py:
- Embedding receives gradients
- Repeated tokens accumulate gradients
- Attention projections receive gradients
- Attention input receives gradients (xfail - known issue)
- Transformer block gradient flow
- Complete NLP pipeline end-to-end

README.md:
- Integration test philosophy
- Good vs bad integration test examples
- Coverage gaps and how to fill them
Vijay Janapa Reddi
2025-12-02 22:14:08 -05:00
parent 3a885601f9
commit fb20e255c9
3 changed files with 847 additions and 0 deletions

tests/integration/README.md

@@ -0,0 +1,142 @@
# Integration Tests
## Philosophy
Integration tests catch bugs that **unit tests miss** - specifically bugs at **module boundaries** where one module's output becomes another module's input.
### The Gradient Flow Pattern
The gold standard is `test_gradient_flow.py`. It verifies:
1. **Gradients exist** (not None)
2. **Gradients are non-zero** (actually computed)
3. **Gradients flow through each layer** (chain not broken)
4. **Training actually works** (loss decreases)
This pattern catches the most common and frustrating bugs students encounter.
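Here is a minimal sketch of the whole pattern as a single test, using the `tinytorch` API from the test files in this commit (exact loss values depend on how `MSELoss` is defined, but one small SGD step on this quadratic should reduce the loss):
```python
import numpy as np
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear
from tinytorch.core.losses import MSELoss
from tinytorch.core.optimizers import SGD
from tinytorch.core.autograd import enable_autograd

enable_autograd()

def test_gradient_flow_pattern():
    layer = Linear(2, 1)
    x = Tensor([[1.0, 2.0]], requires_grad=True)
    target = Tensor([[3.0]])
    loss = MSELoss().forward(layer.forward(x), target)
    loss.backward()

    assert layer.weight.grad is not None    # 1. gradients exist
    assert np.any(layer.weight.grad != 0)   # 2. gradients are non-zero
    assert x.grad is not None               # 3. chain reaches the input

    before = float(loss.data)
    SGD([layer.weight, layer.bias], lr=0.1).step()
    after = float(MSELoss().forward(layer.forward(x), target).data)
    assert after < before                   # 4. training actually works
```
If any one of the four assertions fails, the failure points at the exact stage where the chain broke.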
## Test Categories
### 🔥 Critical (Must Pass)
| Test File | What It Catches | Modules |
|-----------|-----------------|---------|
| `test_gradient_flow.py` | Broken backpropagation | 01-07 |
| `test_training_flow.py` | Training loop failures | 05-07 |
| `test_nlp_pipeline_flow.py` | NLP stack issues | 10-13 |
| `test_cnn_integration.py` | CNN gradient issues | 09 |
### 📋 Standard (Should Pass)
| Test File | What It Catches | Modules |
|-----------|-----------------|---------|
| `test_dataloader_integration.py` | Data pipeline issues | 08 |
| `test_api_simplification_integration.py` | API compatibility | All |
### 🔬 Scenario Tests
These test complete use cases:
- `integration_xor_test.py` - XOR learning (classic test)
- `integration_mnist_test.py` - MNIST classification
- `integration_cnn_test.py` - CNN on images
- `integration_tinygpt_test.py` - Language model training
## What Makes a Good Integration Test
### ✅ Good Integration Test
```python
def test_gradients_flow_through_mlp():
    """Gradients must reach all layers"""
    layers = [Linear(4, 4) for _ in range(5)]
    x = Tensor(np.random.randn(1, 4), requires_grad=True)
    target = Tensor(np.zeros((1, 4)))
    h = x
    for layer in layers:
        h = relu(layer(h))
    loss = mse_loss(h, target)
    loss.backward()

    # ALL layers must have gradients
    for i, layer in enumerate(layers):
        assert layer.weight.grad is not None, f"Layer {i} has no gradient!"
```
**Why it's good:**
- Tests the **boundary** between layers
- Catches gradient chain breaks
- Clear error message tells you WHERE it broke
### ❌ Bad Integration Test
```python
def test_linear_layer():
    """Test linear layer works"""
    layer = Linear(2, 3)
    x = Tensor([[1, 2]])
    y = layer(x)
    assert y.shape == (1, 3)
```
**Why it's bad:**
- This is a **unit test**, not integration
- Doesn't test interaction with other modules
- Belongs in `tests/03_layers/`
## Running Tests
```bash
# Run all integration tests
pytest tests/integration/ -v
# Run only gradient flow tests
pytest tests/integration/test_gradient_flow.py -v
# Run only training flow tests
pytest tests/integration/test_training_flow.py -v
# Run quick smoke tests (for CI)
pytest tests/integration/ -v -m quick
# Run with detailed output on failure
pytest tests/integration/ -v --tb=long
```
## Adding New Integration Tests
When adding a new module (e.g., Module 14: Profiling), ask:
1. **What other modules does it interact with?**
- Profiling interacts with training loops (07) and models (03)
2. **What could break at the boundary?**
- Profiling hooks might interfere with autograd
- Timing might change tensor operations
3. **Write a test that exercises the boundary:**
```python
def test_profiling_does_not_break_training():
    """Profiling should not interfere with gradient flow"""
    with profiler.profile():
        loss = model(x)
        loss.backward()  # Should still work!
    assert model.weight.grad is not None
```
## Coverage Gaps
### Currently Missing
| Module | Integration Test Needed |
|--------|------------------------|
| 14 Profiling | Profiler + training loop |
| 15 Quantization | Quantized model accuracy |
| 16 Compression | Compressed model still trains |
| 17 Memoization | Cached ops maintain correctness |
| 18 Acceleration | Accelerated ops match baseline |
### How to Fill Gaps
For each gap, create a test that:
1. Uses the module in a **realistic scenario**
2. Verifies **correctness** (not just "doesn't crash")
3. Checks **boundaries** with connected modules, as in the sketch below
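For example, here is a hedged sketch for the Module 18 gap. The `accelerated_matmul` import and the `Tensor.matmul` baseline call are hypothetical placeholders (Module 18 does not exist yet); adjust the names to the real API when it lands:
```python
import numpy as np
from tinytorch.core.tensor import Tensor
from tinytorch.core.losses import MSELoss

def test_accelerated_matmul_matches_baseline():
    """Accelerated op must match the baseline AND keep gradients flowing."""
    # Hypothetical import - replace with the real Module 18 API
    from tinytorch.core.acceleration import accelerated_matmul

    a = Tensor(np.random.randn(8, 4), requires_grad=True)
    b = Tensor(np.random.randn(4, 8), requires_grad=True)

    baseline = a.matmul(b)            # assumes a Tensor.matmul baseline op
    fast = accelerated_matmul(a, b)

    # Correctness: outputs agree within tolerance (not just "doesn't crash")
    assert np.allclose(baseline.data, fast.data, atol=1e-5)

    # Boundary: the accelerated path must not break autograd
    loss = MSELoss().forward(fast, Tensor(np.zeros((8, 8))))
    loss.backward()
    assert a.grad is not None and b.grad is not None
```
The same shape works for Modules 14-17: run the new module inside a small training step and assert both numerical agreement and gradient flow.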

tests/integration/test_nlp_pipeline_flow.py

@@ -0,0 +1,349 @@
"""
NLP Pipeline Flow Integration Tests
====================================
Tests that the NLP pipeline works end-to-end:
1. Tokenization produces valid token IDs
2. Embeddings convert tokens to vectors
3. Attention mechanisms process sequences
4. Transformers combine everything correctly
5. Gradients flow back through the entire pipeline
These tests catch issues at module boundaries in the NLP stack.
Modules tested: 10-13 (Tokenization → Embeddings → Attention → Transformers)
"""
import pytest
import numpy as np
import sys
from pathlib import Path
# Add project root
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
from tinytorch.core.tensor import Tensor
from tinytorch.core.autograd import enable_autograd
# Enable autograd
enable_autograd()
class TestEmbeddingGradientFlow:
    """
    Critical Test: Verify gradients flow through embeddings.

    Common bugs caught:
    - Embedding lookup not differentiable
    - Wrong gradient accumulation for repeated tokens
    - Shape mismatches between embedding and attention
    """

    def test_embedding_receives_gradients(self):
        """Embedding weights must receive gradients during training"""
        try:
            from tinytorch.core.embeddings import Embedding
        except ImportError:
            pytest.skip("Embedding module not yet implemented")

        vocab_size = 100
        embed_dim = 32
        embedding = Embedding(vocab_size, embed_dim)

        # Token IDs (integers)
        token_ids = [1, 5, 3, 7, 2]

        # Forward pass
        embedded = embedding.forward(token_ids)

        # Simple loss: sum of embeddings
        loss = Tensor(np.array([[embedded.data.sum()]]), requires_grad=True)
        loss.backward()

        # Embedding weights should have gradients
        assert embedding.weight.grad is not None, (
            "Embedding weights did not receive gradients!"
        )

        # Only used token embeddings should have non-zero gradients
        for token_id in token_ids:
            grad_row = embedding.weight.grad[token_id]
            assert np.any(grad_row != 0), (
                f"Token {token_id} embedding has zero gradient!"
            )

    def test_repeated_tokens_accumulate_gradients(self):
        """Same token appearing twice should have accumulated gradient"""
        try:
            from tinytorch.core.embeddings import Embedding
        except ImportError:
            pytest.skip("Embedding module not yet implemented")

        vocab_size = 10
        embed_dim = 4
        embedding = Embedding(vocab_size, embed_dim)

        # Token 5 appears twice
        token_ids = [5, 2, 5, 3]
        embedded = embedding.forward(token_ids)

        # Loss that weights all positions equally
        loss = Tensor(np.array([[embedded.data.sum()]]), requires_grad=True)
        loss.backward()

        # Token 5 should have ~2x the gradient of token 2 or 3
        grad_5 = np.linalg.norm(embedding.weight.grad[5])
        grad_2 = np.linalg.norm(embedding.weight.grad[2])

        # Allow some tolerance
        assert grad_5 > grad_2 * 1.5, (
            f"Repeated token gradient not accumulated!\n"
            f"  Token 5 (appears 2x) grad: {grad_5}\n"
            f"  Token 2 (appears 1x) grad: {grad_2}\n"
            f"  Expected ratio ~2, got {grad_5/grad_2:.2f}"
        )
class TestAttentionGradientFlow:
    """
    Critical Test: Verify gradients flow through attention mechanism.

    Common bugs caught:
    - Softmax gradient issues
    - Attention weights not differentiable
    - Query/Key/Value projection gradients
    """

    def test_attention_all_projections_receive_gradients(self):
        """Q, K, V projections must all receive gradients"""
        try:
            from tinytorch.core.attention import MultiHeadAttention
        except ImportError:
            pytest.skip("Attention module not yet implemented")

        embed_dim = 32
        num_heads = 4
        seq_len = 8
        batch_size = 2

        attention = MultiHeadAttention(embed_dim, num_heads)

        # Random input sequence
        x = Tensor(
            np.random.randn(batch_size, seq_len, embed_dim),
            requires_grad=True
        )

        # Forward pass (self-attention - single input for Q, K, V)
        output = attention.forward(x)

        # Simple loss
        loss = Tensor(np.array([[output.data.sum()]]), requires_grad=True)
        loss.backward()

        # All projection matrices should have gradients
        projections = ['W_q', 'W_k', 'W_v', 'W_o']
        for proj_name in projections:
            if hasattr(attention, proj_name):
                proj = getattr(attention, proj_name)
                if hasattr(proj, 'weight'):
                    assert proj.weight.grad is not None, (
                        f"{proj_name} did not receive gradients!"
                    )

    @pytest.mark.xfail(reason="Known issue: Attention gradient flow needs fix - see Module 12")
    def test_attention_input_receives_gradients(self):
        """Input to attention must receive gradients for residual connections"""
        try:
            from tinytorch.core.attention import MultiHeadAttention
        except ImportError:
            pytest.skip("Attention module not yet implemented")

        embed_dim = 16
        num_heads = 2
        attention = MultiHeadAttention(embed_dim, num_heads)

        x = Tensor(
            np.random.randn(1, 4, embed_dim),
            requires_grad=True
        )

        output = attention.forward(x)
        loss = Tensor(np.array([[output.data.sum()]]), requires_grad=True)
        loss.backward()

        assert x.grad is not None, (
            "Input to attention did not receive gradients!\n"
            "This breaks residual connections in Transformers."
        )
        assert x.grad.shape == x.shape, (
            f"Input gradient shape mismatch: {x.grad.shape} vs {x.shape}"
        )
class TestTransformerGradientFlow:
    """
    Critical Test: Verify gradients flow through complete Transformer.

    Common bugs caught:
    - Residual connection gradients
    - Layer norm gradient issues
    - Deep network vanishing gradients
    """

    def test_transformer_block_gradient_flow(self):
        """Gradients must flow through a complete transformer block"""
        try:
            from tinytorch.core.transformers import TransformerBlock
        except ImportError:
            pytest.skip("Transformer module not yet implemented")

        embed_dim = 32
        num_heads = 4
        ff_dim = 64

        block = TransformerBlock(embed_dim, num_heads, ff_dim)

        x = Tensor(
            np.random.randn(1, 8, embed_dim),
            requires_grad=True
        )

        output = block.forward(x)
        loss = Tensor(np.array([[output.data.sum()]]), requires_grad=True)
        loss.backward()

        # Input must receive gradients (for stacking blocks)
        assert x.grad is not None, (
            "Transformer block input did not receive gradients!"
        )

        # Gradient should not be too small (vanishing)
        grad_norm = np.linalg.norm(x.grad)
        assert grad_norm > 1e-6, (
            f"Vanishing gradients in transformer block: {grad_norm}"
        )

    def test_stacked_transformer_blocks(self):
        """Gradients must flow through multiple stacked blocks"""
        try:
            from tinytorch.core.transformers import TransformerBlock
        except ImportError:
            pytest.skip("Transformer module not yet implemented")

        embed_dim = 32
        num_heads = 4
        ff_dim = 64
        num_layers = 4

        blocks = [TransformerBlock(embed_dim, num_heads, ff_dim) for _ in range(num_layers)]

        x = Tensor(
            np.random.randn(1, 8, embed_dim),
            requires_grad=True
        )

        # Forward through all blocks
        h = x
        for block in blocks:
            h = block.forward(h)

        loss = Tensor(np.array([[h.data.sum()]]), requires_grad=True)
        loss.backward()

        # Input must receive gradients through all layers
        assert x.grad is not None, (
            f"Gradients did not flow through {num_layers} transformer blocks!"
        )

        # Check gradient magnitude is reasonable
        grad_norm = np.linalg.norm(x.grad)
        assert grad_norm > 1e-8, (
            f"Severe vanishing gradients through {num_layers} blocks: {grad_norm}"
        )
class TestNLPPipelineEndToEnd:
    """
    Integration Test: Full NLP pipeline from tokens to loss.

    This tests the complete flow:
    tokens → embedding → attention → linear → loss
    """

    def test_complete_nlp_forward_backward(self):
        """Complete NLP pipeline must work end-to-end"""
        try:
            from tinytorch.core.embeddings import Embedding
            from tinytorch.core.attention import MultiHeadAttention
            from tinytorch.core.layers import Linear
            from tinytorch.core.losses import CrossEntropyLoss
        except ImportError:
            pytest.skip("NLP modules not yet implemented")

        vocab_size = 100
        embed_dim = 32
        num_heads = 4
        num_classes = 10
        seq_len = 8

        # Build pipeline
        embedding = Embedding(vocab_size, embed_dim)
        attention = MultiHeadAttention(embed_dim, num_heads)
        classifier = Linear(embed_dim, num_classes)
        loss_fn = CrossEntropyLoss()

        # Input: token IDs
        token_ids = list(np.random.randint(0, vocab_size, seq_len))
        target = Tensor(np.array([[3]]))  # Class 3

        # Forward pass
        embedded = embedding.forward(token_ids)  # [seq_len, embed_dim]

        # Reshape for attention: add batch dimension
        embedded_batched = Tensor(embedded.data.reshape(1, seq_len, embed_dim), requires_grad=True)
        attended = attention.forward(embedded_batched)  # [1, seq_len, embed_dim]

        # Mean pooling over the sequence dimension (axis=1)
        pooled = Tensor(attended.data.mean(axis=1), requires_grad=True)  # [1, embed_dim]
        logits = classifier.forward(pooled)  # [1, num_classes]
        loss = loss_fn.forward(logits, target)

        # Backward pass
        loss.backward()

        # Verify gradients flowed to embedding
        assert embedding.weight.grad is not None, (
            "Gradients did not flow back to embeddings!"
        )

        # Verify classifier received gradients
        assert classifier.weight.grad is not None, (
            "Classifier did not receive gradients!"
        )
# Quick smoke tests for CI
@pytest.mark.quick
class TestQuickNLPSmoke:
    """Fast tests for CI"""

    def test_embedding_forward_works(self):
        """Embedding forward should not crash"""
        try:
            from tinytorch.core.embeddings import Embedding
        except ImportError:
            pytest.skip("Embedding module not yet implemented")

        embedding = Embedding(100, 32)
        result = embedding.forward([1, 2, 3])
        assert result.shape[0] == 3
        assert result.shape[1] == 32


if __name__ == "__main__":
    pytest.main([__file__, "-v"])

tests/integration/test_training_flow.py

@@ -0,0 +1,356 @@
"""
Training Flow Integration Tests
================================
Tests that the complete training pipeline works:
1. Forward pass produces valid outputs
2. Loss computes correctly
3. Backward pass populates gradients
4. Optimizer updates weights
5. Loss decreases over iterations
These tests catch issues that unit tests miss - where modules
work individually but fail when connected.
Modules tested: 01-07 (Tensor → Training)
"""
import pytest
import numpy as np
import sys
from pathlib import Path
# Add project root
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear
from tinytorch.core.activations import ReLU, Sigmoid
from tinytorch.core.losses import MSELoss, CrossEntropyLoss
from tinytorch.core.optimizers import SGD, Adam
from tinytorch.core.autograd import enable_autograd
# Enable autograd for all tests
enable_autograd()
class TestOptimizerActuallyUpdatesWeights:
    """
    Critical Test: Verify optimizer.step() actually changes weights.

    Common bugs caught:
    - Optimizer not connected to parameters
    - Gradients not flowing to weights
    - Learning rate is zero
    - step() not implemented correctly
    """

    def test_sgd_updates_weights(self):
        """SGD must modify weights after step()"""
        layer = Linear(2, 1)
        optimizer = SGD([layer.weight, layer.bias], lr=0.1)

        # Store initial weights
        initial_weight = layer.weight.data.copy()
        initial_bias = layer.bias.data.copy()

        # Forward + backward
        x = Tensor([[1.0, 2.0]], requires_grad=True)
        target = Tensor([[5.0]])
        output = layer.forward(x)
        loss = MSELoss().forward(output, target)
        loss.backward()

        # Verify gradients exist
        assert layer.weight.grad is not None, "Weight gradient is None!"
        assert layer.bias.grad is not None, "Bias gradient is None!"

        # Step should update weights
        optimizer.step()

        # Weights MUST be different
        weight_changed = not np.allclose(initial_weight, layer.weight.data)
        bias_changed = not np.allclose(initial_bias, layer.bias.data)
        assert weight_changed, (
            f"SGD.step() did not change weights!\n"
            f"  Before: {initial_weight}\n"
            f"  After: {layer.weight.data}\n"
            f"  Grad: {layer.weight.grad}"
        )
        assert bias_changed, "SGD.step() did not change bias!"

    def test_adam_updates_weights(self):
        """Adam must modify weights after step()"""
        layer = Linear(2, 1)
        optimizer = Adam([layer.weight, layer.bias], lr=0.1)

        initial_weight = layer.weight.data.copy()

        x = Tensor([[1.0, 2.0]], requires_grad=True)
        target = Tensor([[5.0]])
        output = layer.forward(x)
        loss = MSELoss().forward(output, target)
        loss.backward()
        optimizer.step()

        assert not np.allclose(initial_weight, layer.weight.data), (
            "Adam.step() did not change weights!"
        )
class TestTrainingReducesLoss:
    """
    Critical Test: Verify that training actually reduces loss.

    Common bugs caught:
    - Gradients have wrong sign
    - Learning rate too high (divergence)
    - Optimizer not using gradients correctly
    - Loss function returning wrong values
    """

    def test_mlp_loss_decreases(self):
        """A simple MLP must learn an XOR-like pattern"""
        # Simple 2-layer network
        layer1 = Linear(2, 4)
        relu = ReLU()
        layer2 = Linear(4, 1)
        sigmoid = Sigmoid()
        loss_fn = MSELoss()

        params = [layer1.weight, layer1.bias, layer2.weight, layer2.bias]
        optimizer = SGD(params, lr=0.5)

        # XOR-like data
        X = Tensor([
            [0., 0.],
            [0., 1.],
            [1., 0.],
            [1., 1.]
        ], requires_grad=True)
        y = Tensor([[0.], [1.], [1.], [0.]])

        # Track loss over time
        losses = []
        for epoch in range(100):
            # Zero gradients
            for p in params:
                if p.grad is not None:
                    p.grad = np.zeros_like(p.grad)

            # Forward
            h = relu.forward(layer1.forward(X))
            out = sigmoid.forward(layer2.forward(h))
            loss = loss_fn.forward(out, y)
            losses.append(float(loss.data))

            # Backward
            loss.backward()

            # Update
            optimizer.step()

        # Loss MUST decrease
        initial_loss = losses[0]
        final_loss = losses[-1]
        assert final_loss < initial_loss, (
            f"Training did not reduce loss!\n"
            f"  Initial: {initial_loss:.4f}\n"
            f"  Final: {final_loss:.4f}\n"
            f"  Loss history: {losses[:5]}...{losses[-5:]}"
        )

        # Loss should decrease significantly (at least 20%)
        improvement = (initial_loss - final_loss) / initial_loss
        assert improvement > 0.2, (
            f"Training improved loss by only {improvement*100:.1f}%\n"
            f"  Expected at least 20% improvement"
        )
class TestGradientChainNotBroken:
    """
    Critical Test: Verify gradient chain is not broken.

    Common bugs caught:
    - requires_grad not propagating
    - Operations not recording grad_fn
    - Intermediate tensors breaking the chain
    """

    def test_deep_network_gradient_chain(self):
        """Gradients must flow through 5 layers"""
        layers = [Linear(4, 4) for _ in range(5)]
        relu = ReLU()

        x = Tensor(np.random.randn(1, 4), requires_grad=True)
        target = Tensor(np.random.randn(1, 4))

        # Forward through all layers
        h = x
        for layer in layers:
            h = relu.forward(layer.forward(h))

        loss = MSELoss().forward(h, target)
        loss.backward()

        # ALL layers must have gradients
        for i, layer in enumerate(layers):
            assert layer.weight.grad is not None, (
                f"Layer {i} weight.grad is None - gradient chain broken!"
            )
            assert layer.bias.grad is not None, (
                f"Layer {i} bias.grad is None - gradient chain broken!"
            )
            # Gradients should be non-trivial
            grad_norm = np.linalg.norm(layer.weight.grad)
            assert grad_norm > 1e-10, (
                f"Layer {i} has vanishing gradients: {grad_norm}"
            )

    def test_input_receives_gradients(self):
        """Input tensor must receive gradients for visualization/debugging"""
        layer = Linear(3, 2)
        x = Tensor([[1., 2., 3.]], requires_grad=True)
        target = Tensor([[1., 0.]])

        output = layer.forward(x)
        loss = MSELoss().forward(output, target)
        loss.backward()

        assert x.grad is not None, "Input tensor did not receive gradients!"
        assert x.grad.shape == x.shape, (
            f"Input gradient shape mismatch: {x.grad.shape} vs {x.shape}"
        )
class TestZeroGradWorks:
    """
    Critical Test: Verify zero_grad clears gradients properly.

    Common bugs caught:
    - Gradients accumulating across batches
    - zero_grad not actually zeroing
    - Memory leaks from gradient accumulation
    """

    def test_gradients_dont_accumulate_after_zero_grad(self):
        """Gradients must not accumulate when zero_grad is called"""
        layer = Linear(2, 1)
        optimizer = SGD([layer.weight, layer.bias], lr=0.1)

        x = Tensor([[1., 2.]], requires_grad=True)
        target = Tensor([[1.]])

        # First forward/backward
        out1 = layer.forward(x)
        loss1 = MSELoss().forward(out1, target)
        loss1.backward()
        grad_after_first = layer.weight.grad.copy()

        # Zero gradients
        optimizer.zero_grad()

        # Verify zeroed
        assert layer.weight.grad is None or np.allclose(layer.weight.grad, 0), (
            "zero_grad() did not clear weight gradients!"
        )

        # Second forward/backward
        out2 = layer.forward(x)
        loss2 = MSELoss().forward(out2, target)
        loss2.backward()
        grad_after_second = layer.weight.grad.copy()

        # Gradients should be similar magnitude (not accumulated)
        ratio = np.linalg.norm(grad_after_second) / np.linalg.norm(grad_after_first)
        assert 0.5 < ratio < 2.0, (
            f"Gradients appear to be accumulating!\n"
            f"  First grad norm: {np.linalg.norm(grad_after_first)}\n"
            f"  Second grad norm: {np.linalg.norm(grad_after_second)}\n"
            f"  Ratio: {ratio} (should be ~1.0)"
        )
class TestBatchTraining:
    """
    Critical Test: Verify batch training works correctly.

    Common bugs caught:
    - Shape mismatches with batches
    - Mean vs sum reduction issues
    - Gradient scaling problems
    """

    def test_batch_gradients_are_averaged(self):
        """Gradients should be averaged over batch (not summed)"""
        layer = Linear(2, 1)

        # Single sample
        x1 = Tensor([[1., 2.]], requires_grad=True)
        target1 = Tensor([[3.]])
        out1 = layer.forward(x1)
        loss1 = MSELoss().forward(out1, target1)
        loss1.backward()
        single_grad = layer.weight.grad.copy()

        # Reset
        layer.weight.grad = None
        layer.bias.grad = None

        # Batch of same sample repeated 4 times
        x_batch = Tensor([[1., 2.]] * 4, requires_grad=True)
        target_batch = Tensor([[3.]] * 4)
        out_batch = layer.forward(x_batch)
        loss_batch = MSELoss().forward(out_batch, target_batch)
        loss_batch.backward()
        batch_grad = layer.weight.grad.copy()

        # Gradients should be similar (averaged, not 4x)
        ratio = np.linalg.norm(batch_grad) / np.linalg.norm(single_grad)
        assert 0.8 < ratio < 1.2, (
            f"Batch gradients not properly averaged!\n"
            f"  Single sample grad norm: {np.linalg.norm(single_grad)}\n"
            f"  Batch (4x same) grad norm: {np.linalg.norm(batch_grad)}\n"
            f"  Ratio: {ratio:.2f} (should be ~1.0)"
        )
# Quick smoke test for CI
@pytest.mark.quick
class TestQuickTrainingSmoke:
    """Fast tests for CI - just verify nothing crashes"""

    def test_simple_training_step(self):
        """One training step should not crash"""
        layer = Linear(2, 1)
        opt = SGD([layer.weight, layer.bias], lr=0.1)

        x = Tensor([[1., 2.]], requires_grad=True)
        y = Tensor([[1.]])

        out = layer.forward(x)
        loss = MSELoss().forward(out, y)
        loss.backward()
        opt.step()

        assert True  # If we got here, it works


if __name__ == "__main__":
    pytest.main([__file__, "-v"])