TinyTorch/modules/source/03_layers/layers_dev.py
Vijay Janapa Reddi 365e2ee394 feat: Add comprehensive intermediate testing across all TinyTorch modules
- Add 17 intermediate test points across 6 modules for immediate student feedback
- Tensor module: Tests after creation, properties, arithmetic, and operators
- Activations module: Tests after each activation function (ReLU, Sigmoid, Tanh, Softmax)
- Layers module: Tests after matrix multiplication and Dense layer implementation
- Networks module: Tests after Sequential class and MLP creation
- CNN module: Tests after convolution, Conv2D layer, and flatten operations
- DataLoader module: Tests after Dataset interface and DataLoader class
- All tests include visual progress indicators and behavioral explanations
- Maintains NBGrader compliance with proper metadata and point allocation
- Enables steady forward progress and better debugging for students
- 100% test success rate across all modules and integration testing
2025-07-12 18:28:35 -04:00

# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.17.1
# ---
# %% [markdown]
"""
# Module 3: Layers - Building Blocks of Neural Networks
Welcome to the Layers module! This is where we build the fundamental components that stack together to form neural networks.
## Learning Goals
- Understand how matrix multiplication powers neural networks
- Implement naive matrix multiplication from scratch for deep understanding
- Build the Dense (Linear) layer - the foundation of all neural networks
- Learn weight initialization strategies and their importance
- See how layers compose with activations to create powerful networks
## Build → Use → Understand
1. **Build**: Matrix multiplication and Dense layers from scratch
2. **Use**: Create and test layers with real data
3. **Understand**: How linear transformations enable feature learning
"""
# %% nbgrader={"grade": false, "grade_id": "layers-imports", "locked": false, "schema_version": 3, "solution": false, "task": false}
#| default_exp core.layers
#| export
import numpy as np
import matplotlib.pyplot as plt
import os
import sys
from typing import Union, List, Tuple, Optional
# Import our dependencies - try from package first, then local modules
try:
    from tinytorch.core.tensor import Tensor
    from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax
except ImportError:
    # For development, import from local modules
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations'))
    from tensor_dev import Tensor
    from activations_dev import ReLU, Sigmoid, Tanh, Softmax
# %% nbgrader={"grade": false, "grade_id": "layers-setup", "locked": false, "schema_version": 3, "solution": false, "task": false}
#| hide
#| export
def _should_show_plots():
    """Check if we should show plots (disable during testing)"""
    # Check multiple conditions that indicate we're in test mode
    is_pytest = (
        'pytest' in sys.modules or
        'test' in sys.argv or
        os.environ.get('PYTEST_CURRENT_TEST') is not None or
        any('test' in arg for arg in sys.argv) or
        any('pytest' in arg for arg in sys.argv)
    )
    # Show plots in development mode (when not in test mode)
    return not is_pytest
# %% nbgrader={"grade": false, "grade_id": "layers-welcome", "locked": false, "schema_version": 3, "solution": false, "task": false}
print("🔥 TinyTorch Layers Module")
print(f"NumPy version: {np.__version__}")
print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")
print("Ready to build neural network layers!")
# %% [markdown]
"""
## 📦 Where This Code Lives in the Final Package
**Learning Side:** You work in `modules/source/03_layers/layers_dev.py`
**Building Side:** Code exports to `tinytorch.core.layers`
```python
# Final package structure:
from tinytorch.core.layers import Dense, Conv2D # All layer types together!
from tinytorch.core.tensor import Tensor # The foundation
from tinytorch.core.activations import ReLU, Sigmoid # Nonlinearity
```
**Why this matters:**
- **Learning:** Focused modules for deep understanding
- **Production:** Proper organization like PyTorch's `torch.nn.Linear`
- **Consistency:** All layer types live together in `core.layers`
- **Integration:** Works seamlessly with tensors and activations
"""
# %% [markdown]
"""
## 🧠 The Mathematical Foundation of Neural Layers
### Linear Algebra at the Heart of ML
Neural networks are fundamentally about **linear transformations** followed by **nonlinear activations**:
```
Layer: y = Wx + b (linear transformation)
Activation: z = σ(y) (nonlinear transformation)
```
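A minimal NumPy sketch of these two steps, assuming the row-vector convention used later in this module (each input is a row, so the product is written `x @ W`). The values and the names `x`, `W`, `b`, `y`, `z` are made up for illustration, and ReLU stands in for σ:

```python
import numpy as np

x = np.array([[1.0, -2.0, 3.0]])     # one sample with 3 features
W = np.array([[0.5, -1.0],
              [0.25, 0.5],
              [-0.5, 1.0]])          # hypothetical weights, shape (3, 2)
b = np.array([0.1, -0.2])            # hypothetical bias, shape (2,)

y = x @ W + b                        # linear transformation
z = np.maximum(y, 0.0)               # nonlinear activation (ReLU)

print(y)   # approximately [[-1.4  0.8]]
print(z)   # approximately [[0.   0.8]]
```

The negative entry of `y` is clipped to zero by the activation, which is what makes the overall mapping nonlinear.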
### Matrix Multiplication: The Engine of Deep Learning
Every forward pass in a neural network involves matrix multiplication:
- **Dense layers**: Matrix multiplication between inputs and weights
- **Convolutional layers**: Convolution can be rewritten as matrix multiplication (for example via im2col)
- **Attention**: Query-key-value matrix operations
- **Transformers**: Self-attention through matrix operations
### Why Matrix Multiplication Matters
- **Parallel computation**: GPUs excel at matrix operations
- **Batch processing**: Handle multiple samples simultaneously
- **Feature learning**: Each column of the weight matrix belongs to one output neuron and can learn a different pattern
- **Composability**: Layers stack naturally through matrix chains
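
To make the batch-processing point concrete, here is a small sketch (random data, hypothetical names `X` and `W`) showing that one matrix multiplication over a batch gives the same result as processing each sample separately:

```python
import numpy as np

rng = np.random.default_rng(0)       # seeded only for repeatability
X = rng.standard_normal((4, 3))      # batch of 4 samples, 3 features each
W = rng.standard_normal((3, 2))      # hypothetical weights: 3 features -> 2 outputs

# One matrix multiplication for the whole batch
batched = X @ W

# The same computation, one sample at a time
looped = np.stack([X[i] @ W for i in range(X.shape[0])])

print(batched.shape)                  # (4, 2)
print(np.allclose(batched, looped))   # True
```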
### Connection to Real ML Systems
Every framework optimizes matrix multiplication:
- **PyTorch**: `torch.nn.Linear` uses optimized BLAS
- **TensorFlow**: `tf.keras.layers.Dense` dispatches to optimized matmul kernels (cuBLAS on NVIDIA GPUs; cuDNN mainly serves convolutions)
- **JAX**: `jax.numpy.dot` uses XLA compilation
- **TinyTorch**: `tinytorch.core.layers.Dense` (what we're building!)
### Performance Considerations
- **Memory layout**: Contiguous arrays for cache efficiency
- **Vectorization**: SIMD operations for speed
- **Parallelization**: Multi-threading and GPU acceleration
- **Numerical stability**: Proper initialization and normalization
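
As a small illustration of the memory-layout point, the sketch below checks whether an array is stored contiguously in row-major (C) order. It uses only standard NumPy calls; the array names are illustrative:

```python
import numpy as np

A = np.arange(12, dtype=np.float32).reshape(3, 4)
At = A.T                               # a view with swapped strides, no data copied

print(A.flags["C_CONTIGUOUS"])         # True
print(At.flags["C_CONTIGUOUS"])        # False: rows are no longer adjacent in memory

At_c = np.ascontiguousarray(At)        # copy into row-major layout
print(At_c.flags["C_CONTIGUOUS"])      # True
print(np.array_equal(At, At_c))        # True: same values, different layout
```

Whether the copy pays off depends on how often the array is reused; this only shows how to inspect and change the layout.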
"""
# %% [markdown]
"""
## Step 1: Understanding Matrix Multiplication
### What is Matrix Multiplication?
Matrix multiplication is the **fundamental operation** that powers neural networks. When we multiply matrices A and B:
```
C = A @ B
```
Each element C[i,j] is the **dot product** of row i from A and column j from B.
### Why Matrix Multiplication in Neural Networks?
- **Dense layers**: Transform inputs through learned weights
- **Batch processing**: Handle multiple samples at once
- **Feature learning**: Each neuron learns different patterns
- **Efficiency**: GPUs are optimized for matrix operations
### Visual Example
```
A = [[1, 2],    B = [[5, 6],    C = [[19, 22],
     [3, 4]]         [7, 8]]         [43, 50]]
C[0,0] = 1*5 + 2*7 = 19
C[0,1] = 1*6 + 2*8 = 22
C[1,0] = 3*5 + 4*7 = 43
C[1,1] = 3*6 + 4*8 = 50
```
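The same example can be checked in NumPy. This short sketch computes the full product with `@` and then recomputes a single entry as the dot product of one row and one column:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

C = A @ B
# C[i, j] is the dot product of row i of A and column j of B
c01 = np.dot(A[0, :], B[:, 1])   # 1*6 + 2*8

print(C)     # [[19 22]
             #  [43 50]]
print(c01)   # 22
```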
### The Algorithm
For matrices A(m×n) and B(n×p) → C(m×p):
```
for i in range(m):
    for j in range(p):
        for k in range(n):
            C[i,j] += A[i,k] * B[k,j]
```
Let's implement this to truly understand it!
"""
# %% nbgrader={"grade": false, "grade_id": "matmul-naive", "locked": false, "schema_version": 3, "solution": true, "task": false}
#| export
def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """
    Naive matrix multiplication using explicit for-loops.
    This helps you understand what matrix multiplication really does!

    Args:
        A: Matrix of shape (m, n)
        B: Matrix of shape (n, p)

    Returns:
        Matrix of shape (m, p) where C[i,j] = sum(A[i,k] * B[k,j] for k in range(n))

    TODO: Implement matrix multiplication using three nested for-loops.

    APPROACH:
    1. Get the dimensions: m, n from A and n2, p from B
    2. Check that n == n2 (matrices must be compatible)
    3. Create output matrix C of shape (m, p) filled with zeros
    4. Use three nested loops:
       - i loop: rows of A (0 to m-1)
       - j loop: columns of B (0 to p-1)
       - k loop: shared dimension (0 to n-1)
    5. For each (i,j), compute: C[i,j] += A[i,k] * B[k,j]

    EXAMPLE:
    A = [[1, 2],    B = [[5, 6],
         [3, 4]]         [7, 8]]
    C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0] = 1*5 + 2*7 = 19
    C[0,1] = A[0,0]*B[0,1] + A[0,1]*B[1,1] = 1*6 + 2*8 = 22
    C[1,0] = A[1,0]*B[0,0] + A[1,1]*B[1,0] = 3*5 + 4*7 = 43
    C[1,1] = A[1,0]*B[0,1] + A[1,1]*B[1,1] = 3*6 + 4*8 = 50

    HINTS:
    - Start with C = np.zeros((m, p))
    - Use three nested for loops: for i in range(m): for j in range(p): for k in range(n):
    - Accumulate the sum: C[i,j] += A[i,k] * B[k,j]
    """
    ### BEGIN SOLUTION
    # Get matrix dimensions
    m, n = A.shape
    n2, p = B.shape
    # Check compatibility
    if n != n2:
        raise ValueError(f"Incompatible matrix dimensions: A is {m}x{n}, B is {n2}x{p}")
    # Initialize result matrix
    C = np.zeros((m, p))
    # Triple nested loop for matrix multiplication
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C
    ### END SOLUTION
# %% [markdown]
"""
### 🧪 Quick Test: Matrix Multiplication
Let's test your matrix multiplication implementation right away! This is the foundation of neural networks.
"""
# %% nbgrader={"grade": true, "grade_id": "test-matmul-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false}
# Test matrix multiplication immediately after implementation
print("🔬 Testing matrix multiplication...")
# Test simple 2x2 case
try:
    A = np.array([[1, 2], [3, 4]], dtype=np.float32)
    B = np.array([[5, 6], [7, 8]], dtype=np.float32)
    result = matmul_naive(A, B)
    expected = np.array([[19, 22], [43, 50]], dtype=np.float32)
    assert np.allclose(result, expected), f"Matrix multiplication failed: expected {expected}, got {result}"
    print(f"✅ Simple 2x2 test: {A.tolist()} @ {B.tolist()} = {result.tolist()}")
    # Compare with NumPy
    numpy_result = A @ B
    assert np.allclose(result, numpy_result), f"Doesn't match NumPy: got {result}, expected {numpy_result}"
    print("✅ Matches NumPy's result")
except Exception as e:
    print(f"❌ Matrix multiplication test failed: {e}")
    raise
# Test different shapes
try:
    A2 = np.array([[1, 2, 3]], dtype=np.float32)  # 1x3
    B2 = np.array([[4], [5], [6]], dtype=np.float32)  # 3x1
    result2 = matmul_naive(A2, B2)
    expected2 = np.array([[32]], dtype=np.float32)  # 1*4 + 2*5 + 3*6 = 32
    assert np.allclose(result2, expected2), f"Different shapes failed: got {result2}, expected {expected2}"
    print(f"✅ Different shapes test: {A2.tolist()} @ {B2.tolist()} = {result2.tolist()}")
except Exception as e:
    print(f"❌ Different shapes test failed: {e}")
    raise
# Show the algorithm in action
print("🎯 Matrix multiplication algorithm:")
print(" C[i,j] = Σ(A[i,k] * B[k,j]) for all k")
print(" Triple nested loops compute each element")
print("📈 Progress: Matrix multiplication ✓")
# %% [markdown]
"""
## Step 2: Building the Dense Layer
Now let's build the **Dense layer**, the most fundamental building block of neural networks. A Dense layer performs a linear transformation: `y = Wx + b`
### What is a Dense Layer?
- **Linear transformation**: `y = Wx + b`
- **W**: Weight matrix (learnable parameters)
- **x**: Input tensor
- **b**: Bias vector (learnable parameters)
- **y**: Output tensor
### Why Dense Layers Matter
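- **Universal approximation**: With a nonlinear activation and enough hidden neurons, stacked Dense layers can approximate any continuous function on a bounded input range
- **Feature learning**: Each neuron learns a different feature
- **Nonlinearity**: A Dense layer alone is linear; combined with activation functions it becomes very powerful
- **Foundation**: Many other layers (convolution, attention) build on the same weighted-sum idea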
### The Math
For input x of shape (batch_size, input_size), the layer computes `y = x @ W + b` (the row-vector form of `y = Wx + b`):
- **W**: Weight matrix of shape (input_size, output_size)
- **b**: Bias vector of shape (output_size,)
- **y**: Output of shape (batch_size, output_size)
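
A quick shape check of these rules, as a sketch with made-up constant values. Note how the bias of shape (output_size,) is broadcast across every row of the batch:

```python
import numpy as np

batch_size, input_size, output_size = 4, 3, 2
x = np.ones((batch_size, input_size), dtype=np.float32)
W = np.full((input_size, output_size), 0.5, dtype=np.float32)
b = np.array([1.0, -1.0], dtype=np.float32)

y = x @ W + b      # b has shape (2,) and is broadcast over all 4 rows

print(y.shape)     # (4, 2)
print(y[0])        # [2.5 0.5]
```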
### Visual Example
```
Input:   x = [1, 2, 3]          (3 features)
Weights: W = [[0.1, 0.2],       Bias: b = [0.1, 0.2]
              [0.3, 0.4],
              [0.5, 0.6]]
Step 1: Wx = [0.1*1 + 0.3*2 + 0.5*3, 0.2*1 + 0.4*2 + 0.6*3]
           = [2.2, 2.8]
Step 2: y = Wx + b = [2.2 + 0.1, 2.8 + 0.2] = [2.3, 3.0]
```
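The arithmetic above can be checked numerically. This is only a sketch using the example's values; because `W` is stored with shape (input_size, output_size), the product is computed as `x @ W`:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])
b = np.array([0.1, 0.2])

xW = x @ W     # Step 1: weighted sums, shape (2,)
y = xW + b     # Step 2: add the bias

print(xW)
print(y)
```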
Let's implement this!
"""
# %% nbgrader={"grade": false, "grade_id": "dense-class", "locked": false, "schema_version": 3, "solution": true, "task": false}
#| export
class Dense:
    """
    Dense (Linear) Layer: y = Wx + b
    The fundamental building block of neural networks.
    Performs linear transformation: matrix multiplication + bias addition.
    """
    def __init__(self, input_size: int, output_size: int, use_bias: bool = True,
                 use_naive_matmul: bool = False):
        """
        Initialize Dense layer with random weights.

        Args:
            input_size: Number of input features
            output_size: Number of output features
            use_bias: Whether to include bias term (default: True)
            use_naive_matmul: Whether to use naive matrix multiplication (for learning)

        TODO: Implement Dense layer initialization with proper weight initialization.

        APPROACH:
        1. Store layer parameters (input_size, output_size, use_bias, use_naive_matmul)
        2. Initialize weights with Xavier/Glorot initialization
        3. Initialize bias to zeros (if use_bias=True)
        4. Convert to float32 for consistency

        EXAMPLE:
        Dense(3, 2) creates:
        - weights: shape (3, 2) with small random values
        - bias: shape (2,) with zeros

        HINTS:
        - Use np.random.randn() for random initialization
        - Scale weights by sqrt(2/(input_size + output_size)) for Xavier init
        - Use np.zeros() for bias initialization
        - Convert to float32 with .astype(np.float32)
        """
        ### BEGIN SOLUTION
        # Store parameters
        self.input_size = input_size
        self.output_size = output_size
        self.use_bias = use_bias
        self.use_naive_matmul = use_naive_matmul
        # Xavier/Glorot initialization. Scale first, then cast: multiplying a
        # float32 array by the float64 scalar can promote the result to float64
        # under NumPy 2 promotion rules.
        scale = np.sqrt(2.0 / (input_size + output_size))
        self.weights = (np.random.randn(input_size, output_size) * scale).astype(np.float32)
        # Initialize bias
        if use_bias:
            self.bias = np.zeros(output_size, dtype=np.float32)
        else:
            self.bias = None
        ### END SOLUTION

    def forward(self, x: Tensor) -> Tensor:
        """
        Forward pass: y = Wx + b

        Args:
            x: Input tensor of shape (batch_size, input_size)

        Returns:
            Output tensor of shape (batch_size, output_size)

        TODO: Implement matrix multiplication and bias addition.

        APPROACH:
        1. Choose matrix multiplication method based on use_naive_matmul flag
        2. Perform matrix multiplication: Wx
        3. Add bias if use_bias=True
        4. Return result wrapped in Tensor

        EXAMPLE:
        Input x: Tensor([[1, 2, 3]])  # shape (1, 3)
        Weights: shape (3, 2)
        Output: Tensor([[val1, val2]])  # shape (1, 2)

        HINTS:
        - Use self.use_naive_matmul to choose between matmul_naive and @
        - x.data gives you the numpy array
        - Use broadcasting for bias addition: result + self.bias
        - Return Tensor(result) to wrap the result
        """
        ### BEGIN SOLUTION
        # Matrix multiplication (inputs are rows, so this is x @ W)
        if self.use_naive_matmul:
            result = matmul_naive(x.data, self.weights)
        else:
            result = x.data @ self.weights
        # Add bias (broadcast across the batch dimension)
        if self.use_bias:
            result += self.bias
        return Tensor(result)
        ### END SOLUTION

    def __call__(self, x: Tensor) -> Tensor:
        """Make layer callable: layer(x) same as layer.forward(x)"""
        return self.forward(x)
# %% [markdown]
"""
### 🧪 Quick Test: Dense Layer
Let's test your Dense layer implementation! This is the fundamental building block of neural networks.
"""
# %% nbgrader={"grade": true, "grade_id": "test-dense-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false}
# Test Dense layer immediately after implementation
print("🔬 Testing Dense layer...")
# Test basic Dense layer
try:
    layer = Dense(input_size=3, output_size=2, use_bias=True)
    x = Tensor([[1, 2, 3]])  # batch_size=1, input_size=3
    print(f"Input shape: {x.shape}")
    print(f"Layer weights shape: {layer.weights.shape}")
    if layer.bias is not None:
        print(f"Layer bias shape: {layer.bias.shape}")
    y = layer(x)
    print(f"Output shape: {y.shape}")
    print(f"Output: {y}")
    # Test shape compatibility
    assert y.shape == (1, 2), f"Output shape should be (1, 2), got {y.shape}"
    print("✅ Dense layer produces correct output shape")
    # Test weights initialization
    assert layer.weights.shape == (3, 2), f"Weights shape should be (3, 2), got {layer.weights.shape}"
    if layer.bias is not None:
        assert layer.bias.shape == (2,), f"Bias shape should be (2,), got {layer.bias.shape}"
    print("✅ Dense layer has correct weight and bias shapes")
    # Test that weights are not all zeros (proper initialization)
    assert not np.allclose(layer.weights, 0), "Weights should not be all zeros"
    if layer.bias is not None:
        assert np.allclose(layer.bias, 0), "Bias should be initialized to zeros"
    print("✅ Dense layer has proper weight initialization")
except Exception as e:
    print(f"❌ Dense layer test failed: {e}")
    raise
# Test without bias
try:
    layer_no_bias = Dense(input_size=2, output_size=1, use_bias=False)
    x2 = Tensor([[1, 2]])
    y2 = layer_no_bias(x2)
    assert y2.shape == (1, 1), f"No bias output shape should be (1, 1), got {y2.shape}"
    assert layer_no_bias.bias is None, "Bias should be None when use_bias=False"
    print("✅ Dense layer works without bias")
except Exception as e:
    print(f"❌ Dense layer no-bias test failed: {e}")
    raise
# Test naive matrix multiplication
try:
    layer_naive = Dense(input_size=2, output_size=2, use_naive_matmul=True)
    x3 = Tensor([[1, 2]])
    y3 = layer_naive(x3)
    assert y3.shape == (1, 2), f"Naive matmul output shape should be (1, 2), got {y3.shape}"
    print("✅ Dense layer works with naive matrix multiplication")
except Exception as e:
    print(f"❌ Dense layer naive matmul test failed: {e}")
    raise
# Show the linear transformation in action
print("🎯 Dense layer behavior:")
print(" y = Wx + b (linear transformation)")
print(" W: learnable weight matrix")
print(" b: learnable bias vector")
print("📈 Progress: Matrix multiplication ✓, Dense layer ✓")
# %% [markdown]
"""
### 🧪 Test Your Implementations
Once you implement the functions above, run these cells to test them:
"""
# %% nbgrader={"grade": true, "grade_id": "test-matmul-naive", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false}
# Test matrix multiplication
print("Testing matrix multiplication...")
# Test case 1: Simple 2x2 matrices
A = np.array([[1, 2], [3, 4]], dtype=np.float32)
B = np.array([[5, 6], [7, 8]], dtype=np.float32)
result = matmul_naive(A, B)
expected = np.array([[19, 22], [43, 50]], dtype=np.float32)
print(f"Matrix A:\n{A}")
print(f"Matrix B:\n{B}")
print(f"Your result:\n{result}")
print(f"Expected:\n{expected}")
assert np.allclose(result, expected), f"Result doesn't match expected: got {result}, expected {expected}"
# Test case 2: Compare with NumPy
numpy_result = A @ B
assert np.allclose(result, numpy_result), f"Doesn't match NumPy result: got {result}, expected {numpy_result}"
# Test case 3: Different shapes
A2 = np.array([[1, 2, 3]], dtype=np.float32) # 1x3
B2 = np.array([[4], [5], [6]], dtype=np.float32) # 3x1
result2 = matmul_naive(A2, B2)
expected2 = np.array([[32]], dtype=np.float32) # 1*4 + 2*5 + 3*6 = 32
assert np.allclose(result2, expected2), f"Different shapes failed: got {result2}, expected {expected2}"
print("✅ Matrix multiplication tests passed!")
# %% nbgrader={"grade": true, "grade_id": "test-dense-layer", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false}
# Test Dense layer
print("Testing Dense layer...")
# Test basic Dense layer
layer = Dense(input_size=3, output_size=2, use_bias=True)
x = Tensor([[1, 2, 3]]) # batch_size=1, input_size=3
print(f"Input shape: {x.shape}")
print(f"Layer weights shape: {layer.weights.shape}")
if layer.bias is not None:
    print(f"Layer bias shape: {layer.bias.shape}")
else:
    print("Layer bias: None")
y = layer(x)
print(f"Output shape: {y.shape}")
print(f"Output: {y}")
# Test shape compatibility
assert y.shape == (1, 2), f"Output shape should be (1, 2), got {y.shape}"
# Test without bias
layer_no_bias = Dense(input_size=2, output_size=1, use_bias=False)
x2 = Tensor([[1, 2]])
y2 = layer_no_bias(x2)
assert y2.shape == (1, 1), f"No bias output shape should be (1, 1), got {y2.shape}"
assert layer_no_bias.bias is None, "Bias should be None when use_bias=False"
# Test naive matrix multiplication
layer_naive = Dense(input_size=2, output_size=2, use_naive_matmul=True)
x3 = Tensor([[1, 2]])
y3 = layer_naive(x3)
assert y3.shape == (1, 2), f"Naive matmul output shape should be (1, 2), got {y3.shape}"
print("✅ Dense layer tests passed!")
# %% nbgrader={"grade": true, "grade_id": "test-layer-composition", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false}
# Test layer composition
print("Testing layer composition...")
# Create a simple network: Dense → ReLU → Dense
dense1 = Dense(input_size=3, output_size=2)
relu = ReLU()
dense2 = Dense(input_size=2, output_size=1)
# Test input
x = Tensor([[1, 2, 3]])
print(f"Input: {x}")
# Forward pass through the network
h1 = dense1(x)
print(f"After Dense1: {h1}")
h2 = relu(h1)
print(f"After ReLU: {h2}")
h3 = dense2(h2)
print(f"After Dense2: {h3}")
# Test shapes
assert h1.shape == (1, 2), f"Dense1 output should be (1, 2), got {h1.shape}"
assert h2.shape == (1, 2), f"ReLU output should be (1, 2), got {h2.shape}"
assert h3.shape == (1, 1), f"Dense2 output should be (1, 1), got {h3.shape}"
# Test that ReLU actually applied (non-negative values)
assert np.all(h2.data >= 0), "ReLU should produce non-negative values"
print("✅ Layer composition tests passed!")
# %% [markdown]
"""
## 🎯 Module Summary
Congratulations! You've successfully implemented the core building blocks of neural networks:
### What You've Accomplished
✅ **Matrix Multiplication**: Implemented from scratch with triple nested loops
✅ **Dense Layer**: The fundamental linear transformation y = Wx + b
✅ **Weight Initialization**: Xavier/Glorot initialization for stable training
✅ **Layer Composition**: Combining layers with activations
✅ **Flexible Implementation**: Support for both naive and optimized matrix multiplication
### Key Concepts You've Learned
- **Matrix multiplication** is the engine of neural networks
- **Dense layers** perform linear transformations that learn features
- **Weight initialization** is crucial for stable training
- **Layer composition** creates powerful nonlinear functions
- **Batch processing** enables efficient computation
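
To see what the Xavier/Glorot scale does in practice, here is a minimal sketch that draws weights with standard deviation `sqrt(2 / (fan_in + fan_out))` and checks the empirical spread. The layer sizes are hypothetical, and a seeded `np.random.default_rng` is used here only for repeatability (the module itself uses `np.random.randn`):

```python
import numpy as np

fan_in, fan_out = 256, 128                    # hypothetical layer sizes
rng = np.random.default_rng(0)                # seeded only for repeatability

scale = np.sqrt(2.0 / (fan_in + fan_out))     # Glorot/Xavier normal std
W = (rng.standard_normal((fan_in, fan_out)) * scale).astype(np.float32)

print(scale)       # target standard deviation
print(W.std())     # empirical standard deviation, close to scale
print(W.dtype)     # float32
```

Larger layers get smaller weights, which keeps the variance of each layer's output roughly comparable to that of its input.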
### Mathematical Foundations
- **Linear algebra**: Matrix operations power all neural computations
- **Universal approximation**: Dense layers with nonlinear activations can approximate any continuous function on a bounded domain, given enough neurons
- **Feature learning**: Each neuron learns different patterns
- **Composability**: Simple operations combine to create complex behaviors
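
One way to see why activations matter for composition: two Dense layers stacked without an activation collapse into a single linear map, while inserting ReLU between them changes the result. A small sketch with hand-picked values (biases omitted, names hypothetical):

```python
import numpy as np

x = np.array([[1.0, -2.0]])
W1 = np.array([[1.0, -1.0],
               [0.5, 1.0]])
W2 = np.array([[1.0],
               [2.0]])

two_linear = (x @ W1) @ W2                  # two stacked layers, no activation
one_linear = x @ (W1 @ W2)                  # a single equivalent linear layer
with_relu = np.maximum(x @ W1, 0.0) @ W2    # ReLU between the two layers

print(two_linear)   # [[-6.]]
print(one_linear)   # [[-6.]]
print(with_relu)    # [[0.]]
```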
### Next Steps
1. **Export your code**: `tito package nbdev --export 03_layers`
2. **Test your implementation**: `tito module test 03_layers`
3. **Use your layers**:
```python
from tinytorch.core.layers import Dense
from tinytorch.core.activations import ReLU
layer = Dense(10, 5)
activation = ReLU()
```
4. **Move to Module 4**: Start building complete neural networks!
**Ready for the next challenge?** Let's compose these layers into complete neural network architectures!
"""