mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-04-27 11:38:42 -05:00
Add integration tests for training flow and NLP pipeline
New tests that catch module boundary bugs:

test_training_flow.py:
- Optimizer actually updates weights (SGD, Adam)
- Training reduces loss over iterations
- Gradient chain not broken through 5 layers
- Input receives gradients
- zero_grad works correctly
- Batch gradients are averaged

test_nlp_pipeline_flow.py:
- Embedding receives gradients
- Repeated tokens accumulate gradients
- Attention projections receive gradients
- Attention input receives gradients (xfail - known issue)
- Transformer block gradient flow
- Complete NLP pipeline end-to-end

README.md:
- Integration test philosophy
- Good vs bad integration test examples
- Coverage gaps and how to fill them
142  tests/integration/README.md  Normal file
@@ -0,0 +1,142 @@
# Integration Tests

## Philosophy

Integration tests catch bugs that **unit tests miss** - specifically bugs at **module boundaries** where one module's output becomes another module's input.

### The Gradient Flow Pattern

The gold standard is `test_gradient_flow.py`. It verifies:

1. **Gradients exist** (not None)
2. **Gradients are non-zero** (actually computed)
3. **Gradients flow through each layer** (chain not broken)
4. **Training actually works** (loss decreases)

This pattern catches the most common and frustrating bugs students encounter.
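A minimal sketch of the four checks, reusing the `tinytorch.core` classes the tests in this commit import (the `forward()`-style calls follow those tests; adjust if your implementation differs):

```python
import numpy as np

from tinytorch.core.autograd import enable_autograd
from tinytorch.core.layers import Linear
from tinytorch.core.losses import MSELoss
from tinytorch.core.optimizers import SGD
from tinytorch.core.tensor import Tensor

enable_autograd()


def test_gradient_flow_pattern():
    """One tiny example exercising all four checks"""
    layer = Linear(2, 1)
    x = Tensor([[1.0, 2.0]], requires_grad=True)
    target = Tensor([[3.0]])

    loss = MSELoss().forward(layer.forward(x), target)
    loss.backward()

    # 1. Gradients exist
    assert layer.weight.grad is not None
    # 2. Gradients are non-zero
    assert np.any(layer.weight.grad != 0)
    # 3. The chain reaches all the way back to the input
    assert x.grad is not None

    # 4. Training works: one small SGD step lowers the loss on this example
    SGD([layer.weight, layer.bias], lr=0.1).step()
    new_loss = MSELoss().forward(layer.forward(x), target)
    assert float(new_loss.data) < float(loss.data)
```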
## Test Categories

### 🔥 Critical (Must Pass)

| Test File | What It Catches | Modules |
|-----------|-----------------|---------|
| `test_gradient_flow.py` | Broken backpropagation | 01-07 |
| `test_training_flow.py` | Training loop failures | 05-07 |
| `test_nlp_pipeline_flow.py` | NLP stack issues | 10-13 |
| `test_cnn_integration.py` | CNN gradient issues | 09 |

### 📋 Standard (Should Pass)

| Test File | What It Catches | Modules |
|-----------|-----------------|---------|
| `test_dataloader_integration.py` | Data pipeline issues | 08 |
| `test_api_simplification_integration.py` | API compatibility | All |

### 🔬 Scenario Tests

These test complete use cases:

- `integration_xor_test.py` - XOR learning (classic test)
- `integration_mnist_test.py` - MNIST classification
- `integration_cnn_test.py` - CNN on images
- `integration_tinygpt_test.py` - Language model training
## What Makes a Good Integration Test

### ✅ Good Integration Test

```python
def test_gradients_flow_through_mlp():
    """Gradients must reach all layers"""
    layers = [Linear(4, 4) for _ in range(5)]

    x = Tensor(np.random.randn(1, 4), requires_grad=True)
    target = Tensor(np.random.randn(1, 4))

    h = x
    for layer in layers:
        h = relu(layer(h))
    loss = mse_loss(h, target)
    loss.backward()

    # ALL layers must have gradients
    for i, layer in enumerate(layers):
        assert layer.weight.grad is not None, f"Layer {i} has no gradient!"
```

**Why it's good:**
- Tests the **boundary** between layers
- Catches gradient chain breaks
- Clear error message tells you WHERE it broke
### ❌ Bad Integration Test

```python
def test_linear_layer():
    """Test linear layer works"""
    layer = Linear(2, 3)
    x = Tensor([[1, 2]])
    y = layer(x)
    assert y.shape == (1, 3)
```

**Why it's bad:**
- This is a **unit test**, not integration
- Doesn't test interaction with other modules
- Belongs in `tests/03_layers/`
## Running Tests

```bash
# Run all integration tests
pytest tests/integration/ -v

# Run only gradient flow tests
pytest tests/integration/test_gradient_flow.py -v

# Run only training flow tests
pytest tests/integration/test_training_flow.py -v

# Run quick smoke tests (for CI)
pytest tests/integration/ -v -k quick

# Run with detailed output on failure
pytest tests/integration/ -v --tb=long
```
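The smoke-test classes in this commit tag themselves with a custom `@pytest.mark.quick` marker. To keep pytest from warning about an unknown mark, the marker can be registered once - a sketch, assuming a `tests/integration/conftest.py`:

```python
# conftest.py - register the custom "quick" marker used by the smoke-test classes
def pytest_configure(config):
    config.addinivalue_line("markers", "quick: fast smoke tests for CI")
```

With the marker registered, `pytest tests/integration/ -m quick` selects the smoke tests by mark rather than by name substring.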
## Adding New Integration Tests

When adding a new module (e.g., Module 14: Profiling), ask:

1. **What other modules does it interact with?**
   - Profiling interacts with training loops (07) and models (03)

2. **What could break at the boundary?**
   - Profiling hooks might interfere with autograd
   - Timing might change tensor operations

3. **Write a test that exercises the boundary:**

```python
def test_profiling_does_not_break_training():
    """Profiling should not interfere with gradient flow"""
    with profiler.profile():
        loss = model(x)
        loss.backward()  # Should still work!

    assert model.weight.grad is not None
```
## Coverage Gaps

### Currently Missing

| Module | Integration Test Needed |
|--------|------------------------|
| 14 Profiling | Profiler + training loop |
| 15 Quantization | Quantized model accuracy |
| 16 Compression | Compressed model still trains |
| 17 Memoization | Cached ops maintain correctness |
| 18 Acceleration | Accelerated ops match baseline |
### How to Fill Gaps

For each gap, create a test that:

1. Uses the module in a **realistic scenario**
2. Verifies **correctness** (not just "doesn't crash")
3. Checks **boundaries** with connected modules
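As a concrete starting point, here is a sketch for the Module 18 gap. The import path and function name (`tinytorch.core.acceleration.matmul`) are assumptions - adjust them once the real module exists - but the structure follows the same try/skip pattern the existing integration tests use:

```python
import numpy as np
import pytest


def test_accelerated_matmul_matches_numpy_baseline():
    """Accelerated ops must produce the same numbers as the baseline"""
    try:
        # Hypothetical import - the real name may differ once Module 18 lands
        from tinytorch.core.acceleration import matmul as fast_matmul
    except ImportError:
        pytest.skip("Acceleration module not yet implemented")

    a = np.random.randn(8, 16)
    b = np.random.randn(16, 4)

    # Boundary check: the accelerated result must match the NumPy reference
    expected = a @ b
    result = fast_matmul(a, b)

    np.testing.assert_allclose(result, expected, rtol=1e-5, atol=1e-6)
```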
349  tests/integration/test_nlp_pipeline_flow.py  Normal file
@@ -0,0 +1,349 @@
"""
NLP Pipeline Flow Integration Tests
====================================

Tests that the NLP pipeline works end-to-end:
1. Tokenization produces valid token IDs
2. Embeddings convert tokens to vectors
3. Attention mechanisms process sequences
4. Transformers combine everything correctly
5. Gradients flow back through the entire pipeline

These tests catch issues at module boundaries in the NLP stack.

Modules tested: 10-13 (Tokenization → Embeddings → Attention → Transformers)
"""

import pytest
import numpy as np
import sys
from pathlib import Path

# Add project root
sys.path.insert(0, str(Path(__file__).parent.parent.parent))

from tinytorch.core.tensor import Tensor
from tinytorch.core.autograd import enable_autograd

# Enable autograd
enable_autograd()

class TestEmbeddingGradientFlow:
    """
    Critical Test: Verify gradients flow through embeddings.

    Common bugs caught:
    - Embedding lookup not differentiable
    - Wrong gradient accumulation for repeated tokens
    - Shape mismatches between embedding and attention
    """

    def test_embedding_receives_gradients(self):
        """Embedding weights must receive gradients during training"""
        try:
            from tinytorch.core.embeddings import Embedding
        except ImportError:
            pytest.skip("Embedding module not yet implemented")

        vocab_size = 100
        embed_dim = 32
        embedding = Embedding(vocab_size, embed_dim)

        # Token IDs (integers)
        token_ids = [1, 5, 3, 7, 2]

        # Forward pass
        embedded = embedding.forward(token_ids)

        # Simple loss: sum of embeddings
        loss = Tensor(np.array([[embedded.data.sum()]]), requires_grad=True)
        loss.backward()

        # Embedding weights should have gradients
        assert embedding.weight.grad is not None, (
            "Embedding weights did not receive gradients!"
        )

        # Only used token embeddings should have non-zero gradients
        for token_id in token_ids:
            grad_row = embedding.weight.grad[token_id]
            assert np.any(grad_row != 0), (
                f"Token {token_id} embedding has zero gradient!"
            )

    def test_repeated_tokens_accumulate_gradients(self):
        """Same token appearing twice should have accumulated gradient"""
        try:
            from tinytorch.core.embeddings import Embedding
        except ImportError:
            pytest.skip("Embedding module not yet implemented")

        vocab_size = 10
        embed_dim = 4
        embedding = Embedding(vocab_size, embed_dim)

        # Token 5 appears twice
        token_ids = [5, 2, 5, 3]

        embedded = embedding.forward(token_ids)

        # Loss that weights all positions equally
        loss = Tensor(np.array([[embedded.data.sum()]]), requires_grad=True)
        loss.backward()

        # Token 5 should have ~2x the gradient of token 2 or 3
        grad_5 = np.linalg.norm(embedding.weight.grad[5])
        grad_2 = np.linalg.norm(embedding.weight.grad[2])

        # Allow some tolerance
        assert grad_5 > grad_2 * 1.5, (
            f"Repeated token gradient not accumulated!\n"
            f"  Token 5 (appears 2x) grad: {grad_5}\n"
            f"  Token 2 (appears 1x) grad: {grad_2}\n"
            f"  Expected ratio ~2, got {grad_5/grad_2:.2f}"
        )

class TestAttentionGradientFlow:
    """
    Critical Test: Verify gradients flow through attention mechanism.

    Common bugs caught:
    - Softmax gradient issues
    - Attention weights not differentiable
    - Query/Key/Value projection gradients
    """

    def test_attention_all_projections_receive_gradients(self):
        """Q, K, V projections must all receive gradients"""
        try:
            from tinytorch.core.attention import MultiHeadAttention
        except ImportError:
            pytest.skip("Attention module not yet implemented")

        embed_dim = 32
        num_heads = 4
        seq_len = 8
        batch_size = 2

        attention = MultiHeadAttention(embed_dim, num_heads)

        # Random input sequence
        x = Tensor(
            np.random.randn(batch_size, seq_len, embed_dim),
            requires_grad=True
        )

        # Forward pass (self-attention - single input for Q, K, V)
        output = attention.forward(x)

        # Simple loss
        loss = Tensor(np.array([[output.data.sum()]]), requires_grad=True)
        loss.backward()

        # All projection matrices should have gradients
        projections = ['W_q', 'W_k', 'W_v', 'W_o']
        for proj_name in projections:
            if hasattr(attention, proj_name):
                proj = getattr(attention, proj_name)
                if hasattr(proj, 'weight'):
                    assert proj.weight.grad is not None, (
                        f"{proj_name} did not receive gradients!"
                    )

    @pytest.mark.xfail(reason="Known issue: Attention gradient flow needs fix - see Module 12")
    def test_attention_input_receives_gradients(self):
        """Input to attention must receive gradients for residual connections"""
        try:
            from tinytorch.core.attention import MultiHeadAttention
        except ImportError:
            pytest.skip("Attention module not yet implemented")

        embed_dim = 16
        num_heads = 2

        attention = MultiHeadAttention(embed_dim, num_heads)

        x = Tensor(
            np.random.randn(1, 4, embed_dim),
            requires_grad=True
        )

        output = attention.forward(x)
        loss = Tensor(np.array([[output.data.sum()]]), requires_grad=True)
        loss.backward()

        assert x.grad is not None, (
            "Input to attention did not receive gradients!\n"
            "This breaks residual connections in Transformers."
        )

        assert x.grad.shape == x.shape, (
            f"Input gradient shape mismatch: {x.grad.shape} vs {x.shape}"
        )

class TestTransformerGradientFlow:
    """
    Critical Test: Verify gradients flow through complete Transformer.

    Common bugs caught:
    - Residual connection gradients
    - Layer norm gradient issues
    - Deep network vanishing gradients
    """

    def test_transformer_block_gradient_flow(self):
        """Gradients must flow through a complete transformer block"""
        try:
            from tinytorch.core.transformers import TransformerBlock
        except ImportError:
            pytest.skip("Transformer module not yet implemented")

        embed_dim = 32
        num_heads = 4
        ff_dim = 64

        block = TransformerBlock(embed_dim, num_heads, ff_dim)

        x = Tensor(
            np.random.randn(1, 8, embed_dim),
            requires_grad=True
        )

        output = block.forward(x)
        loss = Tensor(np.array([[output.data.sum()]]), requires_grad=True)
        loss.backward()

        # Input must receive gradients (for stacking blocks)
        assert x.grad is not None, (
            "Transformer block input did not receive gradients!"
        )

        # Gradient should not be too small (vanishing)
        grad_norm = np.linalg.norm(x.grad)
        assert grad_norm > 1e-6, (
            f"Vanishing gradients in transformer block: {grad_norm}"
        )

    def test_stacked_transformer_blocks(self):
        """Gradients must flow through multiple stacked blocks"""
        try:
            from tinytorch.core.transformers import TransformerBlock
        except ImportError:
            pytest.skip("Transformer module not yet implemented")

        embed_dim = 32
        num_heads = 4
        ff_dim = 64
        num_layers = 4

        blocks = [TransformerBlock(embed_dim, num_heads, ff_dim) for _ in range(num_layers)]

        x = Tensor(
            np.random.randn(1, 8, embed_dim),
            requires_grad=True
        )

        # Forward through all blocks
        h = x
        for block in blocks:
            h = block.forward(h)

        loss = Tensor(np.array([[h.data.sum()]]), requires_grad=True)
        loss.backward()

        # Input must receive gradients through all layers
        assert x.grad is not None, (
            f"Gradients did not flow through {num_layers} transformer blocks!"
        )

        # Check gradient magnitude is reasonable
        grad_norm = np.linalg.norm(x.grad)
        assert grad_norm > 1e-8, (
            f"Severe vanishing gradients through {num_layers} blocks: {grad_norm}"
        )

class TestNLPPipelineEndToEnd:
    """
    Integration Test: Full NLP pipeline from tokens to loss.

    This tests the complete flow:
    tokens → embedding → attention → linear → loss
    """

    def test_complete_nlp_forward_backward(self):
        """Complete NLP pipeline must work end-to-end"""
        try:
            from tinytorch.core.embeddings import Embedding
            from tinytorch.core.attention import MultiHeadAttention
            from tinytorch.core.layers import Linear
            from tinytorch.core.losses import CrossEntropyLoss
        except ImportError:
            pytest.skip("NLP modules not yet implemented")

        vocab_size = 100
        embed_dim = 32
        num_heads = 4
        num_classes = 10
        seq_len = 8

        # Build pipeline
        embedding = Embedding(vocab_size, embed_dim)
        attention = MultiHeadAttention(embed_dim, num_heads)
        classifier = Linear(embed_dim, num_classes)
        loss_fn = CrossEntropyLoss()

        # Input: token IDs
        token_ids = list(np.random.randint(0, vocab_size, seq_len))
        target = Tensor(np.array([[3]]))  # Class 3

        # Forward pass
        embedded = embedding.forward(token_ids)  # [seq_len, embed_dim]
        # Reshape for attention: add batch dimension
        embedded_batched = Tensor(embedded.data.reshape(1, seq_len, embed_dim), requires_grad=True)
        attended = attention.forward(embedded_batched)  # [1, seq_len, embed_dim]

        # Mean pooling over the sequence dimension
        pooled = Tensor(attended.data.mean(axis=1), requires_grad=True)  # [1, embed_dim]

        logits = classifier.forward(pooled)  # [1, num_classes]
        loss = loss_fn.forward(logits, target)

        # Backward pass
        loss.backward()

        # Verify gradients flowed to embedding
        assert embedding.weight.grad is not None, (
            "Gradients did not flow back to embeddings!"
        )

        # Verify classifier received gradients
        assert classifier.weight.grad is not None, (
            "Classifier did not receive gradients!"
        )

# Quick smoke tests for CI
@pytest.mark.quick
class TestQuickNLPSmoke:
    """Fast tests for CI"""

    def test_embedding_forward_works(self):
        """Embedding forward should not crash"""
        try:
            from tinytorch.core.embeddings import Embedding
        except ImportError:
            pytest.skip("Embedding module not yet implemented")

        embedding = Embedding(100, 32)
        result = embedding.forward([1, 2, 3])
        assert result.shape[0] == 3
        assert result.shape[1] == 32


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
356  tests/integration/test_training_flow.py  Normal file
@@ -0,0 +1,356 @@
"""
Training Flow Integration Tests
================================

Tests that the complete training pipeline works:
1. Forward pass produces valid outputs
2. Loss computes correctly
3. Backward pass populates gradients
4. Optimizer updates weights
5. Loss decreases over iterations

These tests catch issues that unit tests miss - where modules
work individually but fail when connected.

Modules tested: 01-07 (Tensor → Training)
"""

import pytest
import numpy as np
import sys
from pathlib import Path

# Add project root
sys.path.insert(0, str(Path(__file__).parent.parent.parent))

from tinytorch.core.tensor import Tensor
from tinytorch.core.layers import Linear
from tinytorch.core.activations import ReLU, Sigmoid
from tinytorch.core.losses import MSELoss, CrossEntropyLoss
from tinytorch.core.optimizers import SGD, Adam
from tinytorch.core.autograd import enable_autograd

# Enable autograd for all tests
enable_autograd()

class TestOptimizerActuallyUpdatesWeights:
    """
    Critical Test: Verify optimizer.step() actually changes weights.

    Common bugs caught:
    - Optimizer not connected to parameters
    - Gradients not flowing to weights
    - Learning rate is zero
    - step() not implemented correctly
    """

    def test_sgd_updates_weights(self):
        """SGD must modify weights after step()"""
        layer = Linear(2, 1)
        optimizer = SGD([layer.weight, layer.bias], lr=0.1)

        # Store initial weights
        initial_weight = layer.weight.data.copy()
        initial_bias = layer.bias.data.copy()

        # Forward + backward
        x = Tensor([[1.0, 2.0]], requires_grad=True)
        target = Tensor([[5.0]])

        output = layer.forward(x)
        loss = MSELoss().forward(output, target)
        loss.backward()

        # Verify gradients exist
        assert layer.weight.grad is not None, "Weight gradient is None!"
        assert layer.bias.grad is not None, "Bias gradient is None!"

        # Step should update weights
        optimizer.step()

        # Weights MUST be different
        weight_changed = not np.allclose(initial_weight, layer.weight.data)
        bias_changed = not np.allclose(initial_bias, layer.bias.data)

        assert weight_changed, (
            f"SGD.step() did not change weights!\n"
            f"  Before: {initial_weight}\n"
            f"  After: {layer.weight.data}\n"
            f"  Grad: {layer.weight.grad}"
        )
        assert bias_changed, "SGD.step() did not change bias!"

    def test_adam_updates_weights(self):
        """Adam must modify weights after step()"""
        layer = Linear(2, 1)
        optimizer = Adam([layer.weight, layer.bias], lr=0.1)

        initial_weight = layer.weight.data.copy()

        x = Tensor([[1.0, 2.0]], requires_grad=True)
        target = Tensor([[5.0]])

        output = layer.forward(x)
        loss = MSELoss().forward(output, target)
        loss.backward()

        optimizer.step()

        assert not np.allclose(initial_weight, layer.weight.data), (
            "Adam.step() did not change weights!"
        )

class TestTrainingReducesLoss:
    """
    Critical Test: Verify that training actually reduces loss.

    Common bugs caught:
    - Gradients have wrong sign
    - Learning rate too high (divergence)
    - Optimizer not using gradients correctly
    - Loss function returning wrong values
    """

    def test_mlp_loss_decreases(self):
        """A simple MLP must learn XOR-like pattern"""
        # Simple 2-layer network
        layer1 = Linear(2, 4)
        relu = ReLU()
        layer2 = Linear(4, 1)
        sigmoid = Sigmoid()
        loss_fn = MSELoss()

        params = [layer1.weight, layer1.bias, layer2.weight, layer2.bias]
        optimizer = SGD(params, lr=0.5)

        # XOR-like data
        X = Tensor([
            [0., 0.],
            [0., 1.],
            [1., 0.],
            [1., 1.]
        ], requires_grad=True)
        y = Tensor([[0.], [1.], [1.], [0.]])

        # Track loss over time
        losses = []

        for epoch in range(100):
            # Zero gradients
            for p in params:
                if p.grad is not None:
                    p.grad = np.zeros_like(p.grad)

            # Forward
            h = relu.forward(layer1.forward(X))
            out = sigmoid.forward(layer2.forward(h))
            loss = loss_fn.forward(out, y)

            losses.append(float(loss.data))

            # Backward
            loss.backward()

            # Update
            optimizer.step()

        # Loss MUST decrease
        initial_loss = losses[0]
        final_loss = losses[-1]

        assert final_loss < initial_loss, (
            f"Training did not reduce loss!\n"
            f"  Initial: {initial_loss:.4f}\n"
            f"  Final: {final_loss:.4f}\n"
            f"  Loss history: {losses[:5]}...{losses[-5:]}"
        )

        # Loss should decrease significantly (at least 20%)
        improvement = (initial_loss - final_loss) / initial_loss
        assert improvement > 0.2, (
            f"Training improved loss by only {improvement*100:.1f}%\n"
            f"  Expected at least 20% improvement"
        )

class TestGradientChainNotBroken:
    """
    Critical Test: Verify gradient chain is not broken.

    Common bugs caught:
    - requires_grad not propagating
    - Operations not recording grad_fn
    - Intermediate tensors breaking the chain
    """

    def test_deep_network_gradient_chain(self):
        """Gradients must flow through 5 layers"""
        layers = [Linear(4, 4) for _ in range(5)]
        relu = ReLU()

        x = Tensor(np.random.randn(1, 4), requires_grad=True)
        target = Tensor(np.random.randn(1, 4))

        # Forward through all layers
        h = x
        for layer in layers:
            h = relu.forward(layer.forward(h))

        loss = MSELoss().forward(h, target)
        loss.backward()

        # ALL layers must have gradients
        for i, layer in enumerate(layers):
            assert layer.weight.grad is not None, (
                f"Layer {i} weight.grad is None - gradient chain broken!"
            )
            assert layer.bias.grad is not None, (
                f"Layer {i} bias.grad is None - gradient chain broken!"
            )

            # Gradients should be non-trivial
            grad_norm = np.linalg.norm(layer.weight.grad)
            assert grad_norm > 1e-10, (
                f"Layer {i} has vanishing gradients: {grad_norm}"
            )

    def test_input_receives_gradients(self):
        """Input tensor must receive gradients for visualization/debugging"""
        layer = Linear(3, 2)
        x = Tensor([[1., 2., 3.]], requires_grad=True)
        target = Tensor([[1., 0.]])

        output = layer.forward(x)
        loss = MSELoss().forward(output, target)
        loss.backward()

        assert x.grad is not None, "Input tensor did not receive gradients!"
        assert x.grad.shape == x.shape, (
            f"Input gradient shape mismatch: {x.grad.shape} vs {x.shape}"
        )

class TestZeroGradWorks:
    """
    Critical Test: Verify zero_grad clears gradients properly.

    Common bugs caught:
    - Gradients accumulating across batches
    - zero_grad not actually zeroing
    - Memory leaks from gradient accumulation
    """

    def test_gradients_dont_accumulate_after_zero_grad(self):
        """Gradients must not accumulate when zero_grad is called"""
        layer = Linear(2, 1)
        optimizer = SGD([layer.weight, layer.bias], lr=0.1)

        x = Tensor([[1., 2.]], requires_grad=True)
        target = Tensor([[1.]])

        # First forward/backward
        out1 = layer.forward(x)
        loss1 = MSELoss().forward(out1, target)
        loss1.backward()

        grad_after_first = layer.weight.grad.copy()

        # Zero gradients
        optimizer.zero_grad()

        # Verify zeroed
        assert layer.weight.grad is None or np.allclose(layer.weight.grad, 0), (
            "zero_grad() did not clear weight gradients!"
        )

        # Second forward/backward
        out2 = layer.forward(x)
        loss2 = MSELoss().forward(out2, target)
        loss2.backward()

        grad_after_second = layer.weight.grad.copy()

        # Gradients should be similar magnitude (not accumulated)
        ratio = np.linalg.norm(grad_after_second) / np.linalg.norm(grad_after_first)
        assert 0.5 < ratio < 2.0, (
            f"Gradients appear to be accumulating!\n"
            f"  First grad norm: {np.linalg.norm(grad_after_first)}\n"
            f"  Second grad norm: {np.linalg.norm(grad_after_second)}\n"
            f"  Ratio: {ratio} (should be ~1.0)"
        )

class TestBatchTraining:
    """
    Critical Test: Verify batch training works correctly.

    Common bugs caught:
    - Shape mismatches with batches
    - Mean vs sum reduction issues
    - Gradient scaling problems
    """

    def test_batch_gradients_are_averaged(self):
        """Gradients should be averaged over batch (not summed)"""
        layer = Linear(2, 1)

        # Single sample
        x1 = Tensor([[1., 2.]], requires_grad=True)
        target1 = Tensor([[3.]])

        out1 = layer.forward(x1)
        loss1 = MSELoss().forward(out1, target1)
        loss1.backward()

        single_grad = layer.weight.grad.copy()

        # Reset
        layer.weight.grad = None
        layer.bias.grad = None

        # Batch of same sample repeated 4 times
        x_batch = Tensor([[1., 2.]] * 4, requires_grad=True)
        target_batch = Tensor([[3.]] * 4)

        out_batch = layer.forward(x_batch)
        loss_batch = MSELoss().forward(out_batch, target_batch)
        loss_batch.backward()

        batch_grad = layer.weight.grad.copy()

        # Gradients should be similar (averaged, not 4x)
        ratio = np.linalg.norm(batch_grad) / np.linalg.norm(single_grad)
        assert 0.8 < ratio < 1.2, (
            f"Batch gradients not properly averaged!\n"
            f"  Single sample grad norm: {np.linalg.norm(single_grad)}\n"
            f"  Batch (4x same) grad norm: {np.linalg.norm(batch_grad)}\n"
            f"  Ratio: {ratio:.2f} (should be ~1.0)"
        )

# Quick smoke test for CI
@pytest.mark.quick
class TestQuickTrainingSmoke:
    """Fast tests for CI - just verify nothing crashes"""

    def test_simple_training_step(self):
        """One training step should not crash"""
        layer = Linear(2, 1)
        opt = SGD([layer.weight, layer.bias], lr=0.1)

        x = Tensor([[1., 2.]], requires_grad=True)
        y = Tensor([[1.]])

        out = layer.forward(x)
        loss = MSELoss().forward(out, y)
        loss.backward()
        opt.step()

        assert True  # If we got here, it works


if __name__ == "__main__":
    pytest.main([__file__, "-v"])