mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-06-01 01:16:17 -05:00
- Add 17 intermediate test points across 6 modules for immediate student feedback
- Tensor module: Tests after creation, properties, arithmetic, and operators
- Activations module: Tests after each activation function (ReLU, Sigmoid, Tanh, Softmax)
- Layers module: Tests after matrix multiplication and Dense layer implementation
- Networks module: Tests after Sequential class and MLP creation
- CNN module: Tests after convolution, Conv2D layer, and flatten operations
- DataLoader module: Tests after Dataset interface and DataLoader class
- All tests include visual progress indicators and behavioral explanations
- Maintains NBGrader compliance with proper metadata and point allocation
- Enables steady forward progress and better debugging for students
- 100% test success rate across all modules and integration testing
661 lines
23 KiB
Python
# ---
|
||
# jupyter:
|
||
# jupytext:
|
||
# text_representation:
|
||
# extension: .py
|
||
# format_name: percent
|
||
# format_version: '1.3'
|
||
# jupytext_version: 1.17.1
|
||
# ---
|
||
|
||
# %% [markdown]
|
||
"""
|
||
# Module 3: Layers - Building Blocks of Neural Networks
|
||
|
||
Welcome to the Layers module! This is where we build the fundamental components that stack together to form neural networks.
|
||
|
||
## Learning Goals
|
||
- Understand how matrix multiplication powers neural networks
|
||
- Implement naive matrix multiplication from scratch for deep understanding
|
||
- Build the Dense (Linear) layer - the foundation of all neural networks
|
||
- Learn weight initialization strategies and their importance
|
||
- See how layers compose with activations to create powerful networks
|
||
|
||
## Build → Use → Understand
|
||
1. **Build**: Matrix multiplication and Dense layers from scratch
|
||
2. **Use**: Create and test layers with real data
|
||
3. **Understand**: How linear transformations enable feature learning
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "layers-imports", "locked": false, "schema_version": 3, "solution": false, "task": false}
|
||
#| default_exp core.layers
|
||
|
||
#| export
|
||
import numpy as np
|
||
import matplotlib.pyplot as plt
|
||
import os
|
||
import sys
|
||
from typing import Union, List, Tuple, Optional
|
||
|
||
# Import our dependencies - try from package first, then local modules
try:
    from tinytorch.core.tensor import Tensor
    from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax
except ImportError:
    # For development, import from local modules
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations'))
    from tensor_dev import Tensor
    from activations_dev import ReLU, Sigmoid, Tanh, Softmax
|
||
# %% nbgrader={"grade": false, "grade_id": "layers-setup", "locked": false, "schema_version": 3, "solution": false, "task": false}
|
||
#| hide
|
||
#| export
|
||
def _should_show_plots():
    """Check if we should show plots (disable during testing)"""
    # Check multiple conditions that indicate we're in test mode
    is_pytest = (
        'pytest' in sys.modules or
        'test' in sys.argv or
        os.environ.get('PYTEST_CURRENT_TEST') is not None or
        any('test' in arg for arg in sys.argv) or
        any('pytest' in arg for arg in sys.argv)
    )

    # Show plots in development mode (when not in test mode)
    return not is_pytest
|
||
# %% nbgrader={"grade": false, "grade_id": "layers-welcome", "locked": false, "schema_version": 3, "solution": false, "task": false}
|
||
print("🔥 TinyTorch Layers Module")
|
||
print(f"NumPy version: {np.__version__}")
|
||
print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")
|
||
print("Ready to build neural network layers!")
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## 📦 Where This Code Lives in the Final Package
|
||
|
||
**Learning Side:** You work in `modules/source/03_layers/layers_dev.py`
|
||
**Building Side:** Code exports to `tinytorch.core.layers`
|
||
|
||
```python
|
||
# Final package structure:
|
||
from tinytorch.core.layers import Dense, Conv2D # All layer types together!
|
||
from tinytorch.core.tensor import Tensor # The foundation
|
||
from tinytorch.core.activations import ReLU, Sigmoid # Nonlinearity
|
||
```
|
||
|
||
**Why this matters:**
|
||
- **Learning:** Focused modules for deep understanding
|
||
- **Production:** Proper organization like PyTorch's `torch.nn.Linear`
|
||
- **Consistency:** All layer types live together in `core.layers`
|
||
- **Integration:** Works seamlessly with tensors and activations
|
||
"""
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## 🧠 The Mathematical Foundation of Neural Layers
|
||
|
||
### Linear Algebra at the Heart of ML
|
||
Neural networks are fundamentally about **linear transformations** followed by **nonlinear activations**:
|
||
|
||
```
|
||
Layer: y = Wx + b (linear transformation)
|
||
Activation: z = σ(y) (nonlinear transformation)
|
||
```
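
As a minimal sketch of these two steps in plain NumPy (the names `W`, `b`, `x`, and `sigmoid`, and all values, are illustrative assumptions rather than TinyTorch code):

```python
import numpy as np

# Illustrative values only: 3 input features mapped to 2 outputs
W = np.array([[0.5, -1.0, 0.25],
              [1.0,  0.0, -0.5]])   # shape (2, 3)
b = np.array([0.1, -0.2])           # shape (2,)
x = np.array([1.0, 2.0, 4.0])       # shape (3,)

def sigmoid(y):
    # Elementwise logistic function, standing in for the activation σ
    return 1.0 / (1.0 + np.exp(-y))

y = W @ x + b    # linear transformation
z = sigmoid(y)   # nonlinear transformation

print(y)         # [-0.4 -1.2]
print(z.shape)   # (2,)
```

The activation is applied elementwise, so `z` has the same shape as `y`.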
|
||
|
||
### Matrix Multiplication: The Engine of Deep Learning
|
||
Every forward pass in a neural network involves matrix multiplication:
|
||
- **Dense layers**: Matrix multiplication between inputs and weights
|
||
- **Convolutional layers**: Convolution can be lowered to matrix multiplication (for example, via im2col)
|
||
- **Attention**: Query-key-value matrix operations
|
||
- **Transformers**: Self-attention through matrix operations
|
||
|
||
### Why Matrix Multiplication Matters
|
||
- **Parallel computation**: GPUs excel at matrix operations
|
||
- **Batch processing**: Handle multiple samples simultaneously
|
||
- **Feature learning**: Each row/column learns different patterns
|
||
- **Composability**: Layers stack naturally through matrix chains
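
To illustrate the batch-processing point, here is a small sketch (array names and sizes are assumptions chosen for the example) showing that one matrix multiplication over a batch gives the same result as transforming each sample separately:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))   # batch of 4 samples, 3 features each
W = rng.standard_normal((3, 2))   # maps 3 features to 2 outputs

batched = X @ W                             # one matrix multiplication for the whole batch
one_by_one = np.stack([x @ W for x in X])   # one sample at a time

print(batched.shape)                      # (4, 2)
print(np.allclose(batched, one_by_one))   # True
```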
|
||
|
||
### Connection to Real ML Systems
|
||
Every framework optimizes matrix multiplication:
|
||
- **PyTorch**: `torch.nn.Linear` uses optimized BLAS
|
||
- **TensorFlow**: `tf.keras.layers.Dense` dispatches to optimized matmul kernels (such as cuBLAS on GPU)
|
||
- **JAX**: `jax.numpy.dot` uses XLA compilation
|
||
- **TinyTorch**: `tinytorch.core.layers.Dense` (what we're building!)
|
||
|
||
### Performance Considerations
|
||
- **Memory layout**: Contiguous arrays for cache efficiency
|
||
- **Vectorization**: SIMD operations for speed
|
||
- **Parallelization**: Multi-threading and GPU acceleration
|
||
- **Numerical stability**: Proper initialization and normalization
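
As a small illustration of the memory-layout point, the sketch below (variable names are assumptions for this example) uses NumPy's array flags to show that a transposed view is not row-major contiguous until it is copied:

```python
import numpy as np

A = np.arange(12, dtype=np.float32).reshape(3, 4)   # row-major (C order) by default
A_t = A.T                                           # transposed view, no data copied

print(A.flags['C_CONTIGUOUS'])      # True
print(A_t.flags['C_CONTIGUOUS'])    # False

A_t_copy = np.ascontiguousarray(A_t)    # same values, stored contiguously
print(A_t_copy.flags['C_CONTIGUOUS'])   # True
print(np.array_equal(A_t, A_t_copy))    # True
```

Whether the copy is worth its cost depends on how often the array is reused; this only shows how to inspect and change the layout.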
|
||
"""
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## Step 1: Understanding Matrix Multiplication
|
||
|
||
### What is Matrix Multiplication?
|
||
Matrix multiplication is the **fundamental operation** that powers neural networks. When we multiply matrices A and B:
|
||
|
||
```
|
||
C = A @ B
|
||
```
|
||
|
||
Each element C[i,j] is the **dot product** of row i from A and column j from B.
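
A quick way to check this definition with NumPy (the indices `i` and `j` below are picked arbitrarily for illustration):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = A @ B

i, j = 1, 0
entry = np.dot(A[i, :], B[:, j])   # row i of A dotted with column j of B
print(entry, C[i, j])              # 43 43
```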
|
||
|
||
### Why Matrix Multiplication in Neural Networks?
|
||
- **Dense layers**: Transform inputs through learned weights
|
||
- **Batch processing**: Handle multiple samples at once
|
||
- **Feature learning**: Each neuron learns different patterns
|
||
- **Efficiency**: GPUs are optimized for matrix operations
|
||
|
||
### Visual Example
|
||
```
|
||
A = [[1, 2], B = [[5, 6], C = [[19, 22],
|
||
[3, 4]] [7, 8]] [43, 50]]
|
||
|
||
C[0,0] = 1*5 + 2*7 = 19
|
||
C[0,1] = 1*6 + 2*8 = 22
|
||
C[1,0] = 3*5 + 4*7 = 43
|
||
C[1,1] = 3*6 + 4*8 = 50
|
||
```
|
||
|
||
### The Algorithm
|
||
For matrices A(m×n) and B(n×p) → C(m×p):
|
||
```
|
||
for i in range(m):
|
||
for j in range(p):
|
||
for k in range(n):
|
||
C[i,j] += A[i,k] * B[k,j]
|
||
```
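
One consequence of this loop nest is its cost: the innermost statement runs once for every (i, j, k) triple, so the algorithm performs m × n × p multiply-add steps. The sketch below counts them directly; `count_multiply_adds` is a hypothetical helper written for this illustration, not part of TinyTorch:

```python
def count_multiply_adds(m, n, p):
    # Walk the same loop nest as the pseudocode and count inner-loop steps
    steps = 0
    for i in range(m):
        for j in range(p):
            for k in range(n):
                steps += 1
    return steps

print(count_multiply_adds(2, 3, 4))   # 24
```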
|
||
|
||
Let's implement this to truly understand it!
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "matmul-naive", "locked": false, "schema_version": 3, "solution": true, "task": false}
|
||
#| export
|
||
def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """
    Naive matrix multiplication using explicit for-loops.

    This helps you understand what matrix multiplication really does!

    Args:
        A: Matrix of shape (m, n)
        B: Matrix of shape (n, p)

    Returns:
        Matrix of shape (m, p) where C[i,j] = sum(A[i,k] * B[k,j] for k in range(n))

    TODO: Implement matrix multiplication using three nested for-loops.

    APPROACH:
    1. Get the dimensions: m, n from A and n2, p from B
    2. Check that n == n2 (matrices must be compatible)
    3. Create output matrix C of shape (m, p) filled with zeros
    4. Use three nested loops:
       - i loop: rows of A (0 to m-1)
       - j loop: columns of B (0 to p-1)
       - k loop: shared dimension (0 to n-1)
    5. For each (i,j), compute: C[i,j] += A[i,k] * B[k,j]

    EXAMPLE:
    A = [[1, 2],     B = [[5, 6],
         [3, 4]]          [7, 8]]

    C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0] = 1*5 + 2*7 = 19
    C[0,1] = A[0,0]*B[0,1] + A[0,1]*B[1,1] = 1*6 + 2*8 = 22
    C[1,0] = A[1,0]*B[0,0] + A[1,1]*B[1,0] = 3*5 + 4*7 = 43
    C[1,1] = A[1,0]*B[0,1] + A[1,1]*B[1,1] = 3*6 + 4*8 = 50

    HINTS:
    - Start with C = np.zeros((m, p))
    - Use three nested for loops: for i in range(m): for j in range(p): for k in range(n):
    - Accumulate the sum: C[i,j] += A[i,k] * B[k,j]
    """
    ### BEGIN SOLUTION
    # Get matrix dimensions
    m, n = A.shape
    n2, p = B.shape

    # Check compatibility
    if n != n2:
        raise ValueError(f"Incompatible matrix dimensions: A is {m}x{n}, B is {n2}x{p}")

    # Initialize result matrix
    C = np.zeros((m, p))

    # Triple nested loop for matrix multiplication
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]

    return C
    ### END SOLUTION
|
||
# %% [markdown]
|
||
"""
|
||
### 🧪 Quick Test: Matrix Multiplication
|
||
|
||
Let's test your matrix multiplication implementation right away! This is the foundation of neural networks.
|
||
"""
|
||
|
||
# %% nbgrader={"grade": true, "grade_id": "test-matmul-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false}
|
||
# Test matrix multiplication immediately after implementation
print("🔬 Testing matrix multiplication...")

# Test simple 2x2 case
try:
    A = np.array([[1, 2], [3, 4]], dtype=np.float32)
    B = np.array([[5, 6], [7, 8]], dtype=np.float32)

    result = matmul_naive(A, B)
    expected = np.array([[19, 22], [43, 50]], dtype=np.float32)

    assert np.allclose(result, expected), f"Matrix multiplication failed: expected {expected}, got {result}"
    print(f"✅ Simple 2x2 test: {A.tolist()} @ {B.tolist()} = {result.tolist()}")

    # Compare with NumPy
    numpy_result = A @ B
    assert np.allclose(result, numpy_result), f"Doesn't match NumPy: got {result}, expected {numpy_result}"
    print("✅ Matches NumPy's result")

except Exception as e:
    print(f"❌ Matrix multiplication test failed: {e}")
    raise

# Test different shapes
try:
    A2 = np.array([[1, 2, 3]], dtype=np.float32)  # 1x3
    B2 = np.array([[4], [5], [6]], dtype=np.float32)  # 3x1
    result2 = matmul_naive(A2, B2)
    expected2 = np.array([[32]], dtype=np.float32)  # 1*4 + 2*5 + 3*6 = 32

    assert np.allclose(result2, expected2), f"Different shapes failed: got {result2}, expected {expected2}"
    print(f"✅ Different shapes test: {A2.tolist()} @ {B2.tolist()} = {result2.tolist()}")

except Exception as e:
    print(f"❌ Different shapes test failed: {e}")
    raise

# Show the algorithm in action
print("🎯 Matrix multiplication algorithm:")
print("   C[i,j] = Σ(A[i,k] * B[k,j]) for all k")
print("   Triple nested loops compute each element")
print("📈 Progress: Matrix multiplication ✓")
|
||
# %% [markdown]
|
||
"""
|
||
## Step 2: Building the Dense Layer
|
||
|
||
Now let's build the **Dense layer**, the most fundamental building block of neural networks. A Dense layer performs a linear transformation: `y = Wx + b`
|
||
|
||
### What is a Dense Layer?
|
||
- **Linear transformation**: `y = Wx + b`
|
||
- **W**: Weight matrix (learnable parameters)
|
||
- **x**: Input tensor
|
||
- **b**: Bias vector (learnable parameters)
|
||
- **y**: Output tensor
|
||
|
||
### Why Dense Layers Matter
|
||
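- **Universal approximation**: With a nonlinear activation, a network with one hidden Dense layer can approximate any continuous function on a bounded domain, given enough neurons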
|
||
- **Feature learning**: Each neuron learns a different feature
|
||
- **Nonlinearity**: When combined with activation functions, becomes very powerful
|
||
- **Foundation**: All other layers build on this concept
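
The nonlinearity point can be checked numerically. In the sketch below (weights and input are small made-up values, and biases are left out for brevity), two stacked linear maps collapse into a single linear map, while placing a ReLU between them changes the result:

```python
import numpy as np

x = np.array([[1.0, -1.0]])          # one sample, 2 features
W1 = np.array([[2.0, -1.0],
               [1.0,  1.0]])         # first layer weights, shape (2, 2)
W2 = np.array([[1.0],
               [3.0]])               # second layer weights, shape (2, 1)

two_layers = (x @ W1) @ W2           # Dense -> Dense, no activation
one_layer = x @ (W1 @ W2)            # a single equivalent weight matrix
with_relu = np.maximum(x @ W1, 0) @ W2   # Dense -> ReLU -> Dense

print(two_layers)   # [[-5.]]
print(one_layer)    # [[-5.]]
print(with_relu)    # [[1.]]
```

Without an activation, extra Dense layers add parameters but no extra expressive power.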
|
||
|
||
### The Math
|
||
For input x of shape (batch_size, input_size), the layer computes `y = x @ W + b`, the batched row-vector form of `y = Wx + b`:
|
||
- **W**: Weight matrix of shape (input_size, output_size)
|
||
- **b**: Bias vector of shape (output_size)
|
||
- **y**: Output of shape (batch_size, output_size)
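
A short shape check of these dimensions in NumPy (the sizes and constant values are assumptions chosen to keep the arithmetic easy to follow):

```python
import numpy as np

batch_size, input_size, output_size = 4, 3, 2
x = np.ones((batch_size, input_size), dtype=np.float32)
W = np.full((input_size, output_size), 0.5, dtype=np.float32)
b = np.array([1.0, -1.0], dtype=np.float32)

y = x @ W + b    # b has shape (2,) and is broadcast across the batch dimension
print(y.shape)   # (4, 2)
print(y[0])      # [2.5 0.5]
```

Each row of `x @ W` is `[1.5, 1.5]` here, and broadcasting adds the same bias vector to every row.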
|
||
|
||
### Visual Example
|
||
```
|
||
Input: x = [1, 2, 3] (3 features)
|
||
Weights: W = [[0.1, 0.2], Bias: b = [0.1, 0.2]
|
||
[0.3, 0.4],
|
||
[0.5, 0.6]]
|
||
|
||
Step 1: Wx = [0.1*1 + 0.3*2 + 0.5*3, 0.2*1 + 0.4*2 + 0.6*3]
           = [2.2, 2.8]

Step 2: y = Wx + b = [2.2 + 0.1, 2.8 + 0.2] = [2.3, 3.0]
|
||
```
|
||
|
||
Let's implement this!
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "dense-class", "locked": false, "schema_version": 3, "solution": true, "task": false}
|
||
#| export
|
||
class Dense:
    """
    Dense (Linear) Layer: y = Wx + b

    The fundamental building block of neural networks.
    Performs linear transformation: matrix multiplication + bias addition.
    """

    def __init__(self, input_size: int, output_size: int, use_bias: bool = True,
                 use_naive_matmul: bool = False):
        """
        Initialize Dense layer with random weights.

        Args:
            input_size: Number of input features
            output_size: Number of output features
            use_bias: Whether to include bias term (default: True)
            use_naive_matmul: Whether to use naive matrix multiplication (for learning)

        TODO: Implement Dense layer initialization with proper weight initialization.

        APPROACH:
        1. Store layer parameters (input_size, output_size, use_bias, use_naive_matmul)
        2. Initialize weights with Xavier/Glorot initialization
        3. Initialize bias to zeros (if use_bias=True)
        4. Convert to float32 for consistency

        EXAMPLE:
        Dense(3, 2) creates:
        - weights: shape (3, 2) with small random values
        - bias: shape (2,) with zeros

        HINTS:
        - Use np.random.randn() for random initialization
        - Scale weights by sqrt(2/(input_size + output_size)) for Xavier init
        - Use np.zeros() for bias initialization
        - Convert to float32 with .astype(np.float32)
        """
        ### BEGIN SOLUTION
        # Store parameters
        self.input_size = input_size
        self.output_size = output_size
        self.use_bias = use_bias
        self.use_naive_matmul = use_naive_matmul

        # Xavier/Glorot initialization
        scale = np.sqrt(2.0 / (input_size + output_size))
        # Scale first, then cast: `scale` is a float64 NumPy scalar, and under
        # NumPy 2 promotion rules multiplying after the cast would yield float64.
        self.weights = (np.random.randn(input_size, output_size) * scale).astype(np.float32)

        # Initialize bias
        if use_bias:
            self.bias = np.zeros(output_size, dtype=np.float32)
        else:
            self.bias = None
        ### END SOLUTION

    def forward(self, x: Tensor) -> Tensor:
        """
        Forward pass: y = Wx + b (computed as x @ W + b for batched input)

        Args:
            x: Input tensor of shape (batch_size, input_size)

        Returns:
            Output tensor of shape (batch_size, output_size)

        TODO: Implement matrix multiplication and bias addition.

        APPROACH:
        1. Choose matrix multiplication method based on use_naive_matmul flag
        2. Perform matrix multiplication: x @ W
        3. Add bias if use_bias=True
        4. Return result wrapped in Tensor

        EXAMPLE:
        Input x: Tensor([[1, 2, 3]])  # shape (1, 3)
        Weights: shape (3, 2)
        Output: Tensor([[val1, val2]])  # shape (1, 2)

        HINTS:
        - Use self.use_naive_matmul to choose between matmul_naive and @
        - x.data gives you the numpy array
        - Use broadcasting for bias addition: result + self.bias
        - Return Tensor(result) to wrap the result
        """
        ### BEGIN SOLUTION
        # Matrix multiplication
        if self.use_naive_matmul:
            result = matmul_naive(x.data, self.weights)
        else:
            result = x.data @ self.weights

        # Add bias (broadcast across the batch dimension)
        if self.use_bias:
            result += self.bias

        return Tensor(result)
        ### END SOLUTION

    def __call__(self, x: Tensor) -> Tensor:
        """Make layer callable: layer(x) same as layer.forward(x)"""
        return self.forward(x)
|
||
# %% [markdown]
|
||
"""
|
||
### 🧪 Quick Test: Dense Layer
|
||
|
||
Let's test your Dense layer implementation! This is the fundamental building block of neural networks.
|
||
"""
|
||
|
||
# %% nbgrader={"grade": true, "grade_id": "test-dense-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false}
|
||
# Test Dense layer immediately after implementation
print("🔬 Testing Dense layer...")

# Test basic Dense layer
try:
    layer = Dense(input_size=3, output_size=2, use_bias=True)
    x = Tensor([[1, 2, 3]])  # batch_size=1, input_size=3

    print(f"Input shape: {x.shape}")
    print(f"Layer weights shape: {layer.weights.shape}")
    if layer.bias is not None:
        print(f"Layer bias shape: {layer.bias.shape}")

    y = layer(x)
    print(f"Output shape: {y.shape}")
    print(f"Output: {y}")

    # Test shape compatibility
    assert y.shape == (1, 2), f"Output shape should be (1, 2), got {y.shape}"
    print("✅ Dense layer produces correct output shape")

    # Test weights initialization
    assert layer.weights.shape == (3, 2), f"Weights shape should be (3, 2), got {layer.weights.shape}"
    if layer.bias is not None:
        assert layer.bias.shape == (2,), f"Bias shape should be (2,), got {layer.bias.shape}"
    print("✅ Dense layer has correct weight and bias shapes")

    # Test that weights are not all zeros (proper initialization)
    assert not np.allclose(layer.weights, 0), "Weights should not be all zeros"
    if layer.bias is not None:
        assert np.allclose(layer.bias, 0), "Bias should be initialized to zeros"
    print("✅ Dense layer has proper weight initialization")

except Exception as e:
    print(f"❌ Dense layer test failed: {e}")
    raise

# Test without bias
try:
    layer_no_bias = Dense(input_size=2, output_size=1, use_bias=False)
    x2 = Tensor([[1, 2]])
    y2 = layer_no_bias(x2)

    assert y2.shape == (1, 1), f"No bias output shape should be (1, 1), got {y2.shape}"
    assert layer_no_bias.bias is None, "Bias should be None when use_bias=False"
    print("✅ Dense layer works without bias")

except Exception as e:
    print(f"❌ Dense layer no-bias test failed: {e}")
    raise

# Test naive matrix multiplication
try:
    layer_naive = Dense(input_size=2, output_size=2, use_naive_matmul=True)
    x3 = Tensor([[1, 2]])
    y3 = layer_naive(x3)

    assert y3.shape == (1, 2), f"Naive matmul output shape should be (1, 2), got {y3.shape}"
    print("✅ Dense layer works with naive matrix multiplication")

except Exception as e:
    print(f"❌ Dense layer naive matmul test failed: {e}")
    raise

# Show the linear transformation in action
print("🎯 Dense layer behavior:")
print("   y = Wx + b (linear transformation)")
print("   W: learnable weight matrix")
print("   b: learnable bias vector")
print("📈 Progress: Matrix multiplication ✓, Dense layer ✓")
|
||
# %% [markdown]
|
||
"""
|
||
### 🧪 Test Your Implementations
|
||
|
||
Once you implement the functions above, run these cells to test them:
|
||
"""
|
||
|
||
# %% nbgrader={"grade": true, "grade_id": "test-matmul-naive", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false}
|
||
# Test matrix multiplication
|
||
print("Testing matrix multiplication...")
|
||
|
||
# Test case 1: Simple 2x2 matrices
|
||
A = np.array([[1, 2], [3, 4]], dtype=np.float32)
|
||
B = np.array([[5, 6], [7, 8]], dtype=np.float32)
|
||
|
||
result = matmul_naive(A, B)
|
||
expected = np.array([[19, 22], [43, 50]], dtype=np.float32)
|
||
|
||
print(f"Matrix A:\n{A}")
|
||
print(f"Matrix B:\n{B}")
|
||
print(f"Your result:\n{result}")
|
||
print(f"Expected:\n{expected}")
|
||
|
||
assert np.allclose(result, expected), f"Result doesn't match expected: got {result}, expected {expected}"
|
||
|
||
# Test case 2: Compare with NumPy
|
||
numpy_result = A @ B
|
||
assert np.allclose(result, numpy_result), f"Doesn't match NumPy result: got {result}, expected {numpy_result}"
|
||
|
||
# Test case 3: Different shapes
|
||
A2 = np.array([[1, 2, 3]], dtype=np.float32) # 1x3
|
||
B2 = np.array([[4], [5], [6]], dtype=np.float32) # 3x1
|
||
result2 = matmul_naive(A2, B2)
|
||
expected2 = np.array([[32]], dtype=np.float32) # 1*4 + 2*5 + 3*6 = 32
|
||
assert np.allclose(result2, expected2), f"Different shapes failed: got {result2}, expected {expected2}"
|
||
|
||
print("✅ Matrix multiplication tests passed!")
|
||
|
||
# %% nbgrader={"grade": true, "grade_id": "test-dense-layer", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false}
|
||
# Test Dense layer
|
||
print("Testing Dense layer...")
|
||
|
||
# Test basic Dense layer
|
||
layer = Dense(input_size=3, output_size=2, use_bias=True)
|
||
x = Tensor([[1, 2, 3]]) # batch_size=1, input_size=3
|
||
|
||
print(f"Input shape: {x.shape}")
|
||
print(f"Layer weights shape: {layer.weights.shape}")
|
||
if layer.bias is not None:
    print(f"Layer bias shape: {layer.bias.shape}")
else:
    print("Layer bias: None")
|
||
y = layer(x)
|
||
print(f"Output shape: {y.shape}")
|
||
print(f"Output: {y}")
|
||
|
||
# Test shape compatibility
|
||
assert y.shape == (1, 2), f"Output shape should be (1, 2), got {y.shape}"
|
||
|
||
# Test without bias
|
||
layer_no_bias = Dense(input_size=2, output_size=1, use_bias=False)
|
||
x2 = Tensor([[1, 2]])
|
||
y2 = layer_no_bias(x2)
|
||
assert y2.shape == (1, 1), f"No bias output shape should be (1, 1), got {y2.shape}"
|
||
assert layer_no_bias.bias is None, "Bias should be None when use_bias=False"
|
||
|
||
# Test naive matrix multiplication
|
||
layer_naive = Dense(input_size=2, output_size=2, use_naive_matmul=True)
|
||
x3 = Tensor([[1, 2]])
|
||
y3 = layer_naive(x3)
|
||
assert y3.shape == (1, 2), f"Naive matmul output shape should be (1, 2), got {y3.shape}"
|
||
|
||
print("✅ Dense layer tests passed!")
|
||
|
||
# %% nbgrader={"grade": true, "grade_id": "test-layer-composition", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false}
|
||
# Test layer composition
|
||
print("Testing layer composition...")
|
||
|
||
# Create a simple network: Dense → ReLU → Dense
|
||
dense1 = Dense(input_size=3, output_size=2)
|
||
relu = ReLU()
|
||
dense2 = Dense(input_size=2, output_size=1)
|
||
|
||
# Test input
|
||
x = Tensor([[1, 2, 3]])
|
||
print(f"Input: {x}")
|
||
|
||
# Forward pass through the network
|
||
h1 = dense1(x)
|
||
print(f"After Dense1: {h1}")
|
||
|
||
h2 = relu(h1)
|
||
print(f"After ReLU: {h2}")
|
||
|
||
h3 = dense2(h2)
|
||
print(f"After Dense2: {h3}")
|
||
|
||
# Test shapes
|
||
assert h1.shape == (1, 2), f"Dense1 output should be (1, 2), got {h1.shape}"
|
||
assert h2.shape == (1, 2), f"ReLU output should be (1, 2), got {h2.shape}"
|
||
assert h3.shape == (1, 1), f"Dense2 output should be (1, 1), got {h3.shape}"
|
||
|
||
# Test that ReLU actually applied (non-negative values)
|
||
assert np.all(h2.data >= 0), "ReLU should produce non-negative values"
|
||
|
||
print("✅ Layer composition tests passed!")
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## 🎯 Module Summary
|
||
|
||
Congratulations! You've successfully implemented the core building blocks of neural networks:
|
||
|
||
### What You've Accomplished
|
||
✅ **Matrix Multiplication**: Implemented from scratch with triple nested loops
|
||
✅ **Dense Layer**: The fundamental linear transformation y = Wx + b
|
||
✅ **Weight Initialization**: Xavier/Glorot initialization for stable training
|
||
✅ **Layer Composition**: Combining layers with activations
|
||
✅ **Flexible Implementation**: Support for both naive and optimized matrix multiplication
|
||
|
||
### Key Concepts You've Learned
|
||
- **Matrix multiplication** is the engine of neural networks
|
||
- **Dense layers** perform linear transformations that learn features
|
||
- **Weight initialization** is crucial for stable training
|
||
- **Layer composition** creates powerful nonlinear functions
|
||
- **Batch processing** enables efficient computation
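
To make the initialization point concrete, here is a rough sketch of the Xavier/Glorot scaling used in this module. The layer sizes, the seed, and the use of `np.random.default_rng` (instead of `np.random.randn`) are assumptions made for a reproducible illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
input_size, output_size = 256, 128   # illustrative layer sizes

# Xavier/Glorot scale: standard deviation shrinks as the layer gets wider
scale = np.sqrt(2.0 / (input_size + output_size))
weights = (rng.standard_normal((input_size, output_size)) * scale).astype(np.float32)

print(weights.dtype)    # float32
print(scale)            # about 0.072
print(weights.std())    # close to scale
```

Keeping the weight spread tied to the layer widths helps keep activations from growing or shrinking sharply as they pass through stacked layers.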
|
||
|
||
### Mathematical Foundations
|
||
- **Linear algebra**: Matrix operations power all neural computations
|
||
- **Universal approximation**: Dense layers combined with nonlinear activations can approximate any continuous function on a bounded domain, given enough neurons
|
||
- **Feature learning**: Each neuron learns different patterns
|
||
- **Composability**: Simple operations combine to create complex behaviors
|
||
|
||
### Next Steps
|
||
1. **Export your code**: `tito package nbdev --export 03_layers`
|
||
2. **Test your implementation**: `tito module test 03_layers`
|
||
3. **Use your layers**:
|
||
```python
|
||
from tinytorch.core.layers import Dense
|
||
from tinytorch.core.activations import ReLU
|
||
layer = Dense(10, 5)
|
||
activation = ReLU()
|
||
```
|
||
4. **Move to Module 4**: Start building complete neural networks!
|
||
|
||
**Ready for the next challenge?** Let's compose these layers into complete neural network architectures!
|
||
""" |