Files
TinyTorch/modules/layers/layers_dev.py
Vijay Janapa Reddi 578a00f608 Revert to rock solid foundation approach for module imports
- Fix module imports to use tinytorch.core.* instead of local module imports
- Activations module now imports from tinytorch.core.tensor for stability
- Layers module imports from tinytorch.core.tensor and tinytorch.core.activations
- Test files updated to use main package imports for dependencies
- This ensures students can focus on current module without dependency issues
- Previous modules are 'locked in' and guaranteed to work
- Mirrors real-world usage patterns like PyTorch
- Maintains educational progression while ensuring system stability
2025-07-12 02:00:30 -04:00

658 lines
21 KiB
Python
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.17.1
# ---
# %% [markdown]
"""
# Module 2: Layers - Neural Network Building Blocks
Welcome to the Layers module! This is where neural networks begin. You'll implement the fundamental building blocks that transform tensors.
## Learning Goals
- Understand layers as functions that transform tensors: `y = f(x)`
- Implement Dense layers with linear transformations: `y = Wx + b`
- Use activation functions from the activations module for nonlinearity
- See how neural networks are just function composition
- Build intuition before diving into training
## Build → Use → Understand
1. **Build**: Dense layers using activation functions as building blocks
2. **Use**: Transform tensors and see immediate results
3. **Understand**: How neural networks transform information
## Module Dependencies
This module builds on the **activations** module:
- **activations** → **layers** → **networks**
- Clean separation of concerns: math functions → layer building blocks → full networks
"""
# %% [markdown]
"""
## 📦 Where This Code Lives in the Final Package
**Learning Side:** You work in `modules/layers/layers_dev.py`
**Building Side:** Code exports to `tinytorch.core.layers`
```python
# Final package structure:
from tinytorch.core.layers import Dense, Conv2D # All layers together!
from tinytorch.core.activations import ReLU, Sigmoid, Tanh
from tinytorch.core.tensor import Tensor
```
**Why this matters:**
- **Learning:** Focused modules for deep understanding
- **Production:** Proper organization like PyTorch's `torch.nn`
- **Consistency:** All layers (Dense, Conv2D) live together in `core.layers`
"""
# %%
#| default_exp core.layers
# Setup and imports
import numpy as np
import sys
from typing import Union, Optional, Callable
import math
# %%
#| export
import numpy as np
import math
import sys
from typing import Union, Optional, Callable
# Import from the main package (rock solid foundation)
from tinytorch.core.tensor import Tensor
from tinytorch.core.activations import ReLU, Sigmoid, Tanh
# print("🔥 TinyTorch Layers Module")
# print(f"NumPy version: {np.__version__}")
# print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")
# print("Ready to build neural network layers!")
# %% [markdown]
"""
## Step 1: What is a Layer?
### Definition
A **layer** is a function that transforms tensors. Think of it as a mathematical operation that takes input data and produces output data:
```
Input Tensor → Layer → Output Tensor
```
### Why Layers Matter in Neural Networks
Layers are the fundamental building blocks of all neural networks because:
- **Modularity**: Each layer has a specific job (linear transformation, nonlinearity, etc.)
- **Composability**: Layers can be combined to create complex functions
- **Learnability**: Each layer has parameters that can be learned from data
- **Interpretability**: Different layers learn different features
### The Fundamental Insight
**Neural networks are just function composition!**
```
x → Layer1 → Layer2 → Layer3 → y
```
Each layer transforms the data, and the final output is the composition of all these transformations.
### Real-World Examples
- **Dense Layer**: Learns linear relationships between features
- **Convolutional Layer**: Learns spatial patterns in images
- **Recurrent Layer**: Learns temporal patterns in sequences
- **Activation Layer**: Adds nonlinearity to make networks powerful
### Visual Intuition
```
Input: [1, 2, 3] (3 features)
Dense Layer: y = Wx + b
Weights W: [[0.1, 0.2, 0.3],
[0.4, 0.5, 0.6]] (2×3 matrix)
Bias b: [0.1, 0.2] (2 values)
Output: [0.1*1 + 0.2*2 + 0.3*3 + 0.1,
0.4*1 + 0.5*2 + 0.6*3 + 0.2] = [1.4, 3.2]
```
Let's start with the most important layer: **Dense** (also called Linear or Fully Connected).
"""
# %% [markdown]
"""
## Step 2: Understanding Matrix Multiplication
Before we build layers, let's understand the core operation: **matrix multiplication**. This is what powers all neural network computations.
### Why Matrix Multiplication Matters
- **Efficiency**: Process multiple inputs at once
- **Parallelization**: GPU acceleration works great with matrix operations
- **Batch processing**: Handle multiple samples simultaneously
- **Mathematical foundation**: Linear algebra is the language of neural networks
### The Math Behind It
For matrices A (m×n) and B (n×p), the result C (m×p) is:
```
C[i,j] = sum(A[i,k] * B[k,j] for k in range(n))
```
### Visual Example
```
A = [[1, 2], B = [[5, 6],
[3, 4]] [7, 8]]
C = A @ B = [[1*5 + 2*7, 1*6 + 2*8],
[3*5 + 4*7, 3*6 + 4*8]]
= [[19, 22],
[43, 50]]
```
Let's implement this step by step!
"""
# %%
#| export
def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray:
"""
Naive matrix multiplication using explicit for-loops.
This helps you understand what matrix multiplication really does!
Args:
A: Matrix of shape (m, n)
B: Matrix of shape (n, p)
Returns:
Matrix of shape (m, p) where C[i,j] = sum(A[i,k] * B[k,j] for k in range(n))
TODO: Implement matrix multiplication using three nested for-loops.
APPROACH:
1. Get the dimensions: m, n from A and n2, p from B
2. Check that n == n2 (matrices must be compatible)
3. Create output matrix C of shape (m, p) filled with zeros
4. Use three nested loops:
- i loop: rows of A (0 to m-1)
- j loop: columns of B (0 to p-1)
- k loop: shared dimension (0 to n-1)
5. For each (i,j), compute: C[i,j] += A[i,k] * B[k,j]
EXAMPLE:
A = [[1, 2], B = [[5, 6],
[3, 4]] [7, 8]]
C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0] = 1*5 + 2*7 = 19
C[0,1] = A[0,0]*B[0,1] + A[0,1]*B[1,1] = 1*6 + 2*8 = 22
C[1,0] = A[1,0]*B[0,0] + A[1,1]*B[1,0] = 3*5 + 4*7 = 43
C[1,1] = A[1,0]*B[0,1] + A[1,1]*B[1,1] = 3*6 + 4*8 = 50
HINTS:
- Start with C = np.zeros((m, p))
- Use three nested for loops: for i in range(m): for j in range(p): for k in range(n):
- Accumulate the sum: C[i,j] += A[i,k] * B[k,j]
"""
raise NotImplementedError("Student implementation required")
# %%
#| hide
#| export
def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray:
"""
Naive matrix multiplication using explicit for-loops.
This helps you understand what matrix multiplication really does!
"""
m, n = A.shape
n2, p = B.shape
assert n == n2, f"Matrix shapes don't match: A({m},{n}) @ B({n2},{p})"
C = np.zeros((m, p))
for i in range(m):
for j in range(p):
for k in range(n):
C[i, j] += A[i, k] * B[k, j]
return C
# %% [markdown]
"""
### 🧪 Test Your Matrix Multiplication
"""
# %%
# Test matrix multiplication
print("Testing matrix multiplication...")
try:
# Test case 1: Simple 2x2 matrices
A = np.array([[1, 2], [3, 4]], dtype=np.float32)
B = np.array([[5, 6], [7, 8]], dtype=np.float32)
result = matmul_naive(A, B)
expected = np.array([[19, 22], [43, 50]], dtype=np.float32)
print(f"✅ Matrix A:\n{A}")
print(f"✅ Matrix B:\n{B}")
print(f"✅ Your result:\n{result}")
print(f"✅ Expected:\n{expected}")
assert np.allclose(result, expected), "❌ Result doesn't match expected!"
print("🎉 Matrix multiplication works!")
# Test case 2: Compare with NumPy
numpy_result = A @ B
assert np.allclose(result, numpy_result), "❌ Doesn't match NumPy result!"
print("✅ Matches NumPy implementation!")
except Exception as e:
print(f"❌ Error: {e}")
print("Make sure to implement matmul_naive above!")
# %% [markdown]
"""
## Step 3: Building the Dense Layer
Now let's build the **Dense layer**, the most fundamental building block of neural networks. A Dense layer performs a linear transformation: `y = Wx + b`
### What is a Dense Layer?
- **Linear transformation**: `y = Wx + b`
- **W**: Weight matrix (learnable parameters)
- **x**: Input tensor
- **b**: Bias vector (learnable parameters)
- **y**: Output tensor
### Why Dense Layers Matter
- **Universal approximation**: Can approximate any function with enough neurons
- **Feature learning**: Each neuron learns a different feature
- **Nonlinearity**: When combined with activation functions, becomes very powerful
- **Foundation**: All other layers build on this concept
### The Math
For input x of shape (batch_size, input_size):
- **W**: Weight matrix of shape (input_size, output_size)
- **b**: Bias vector of shape (output_size)
- **y**: Output of shape (batch_size, output_size)
### Visual Example
```
Input: x = [1, 2, 3] (3 features)
Weights: W = [[0.1, 0.2], Bias: b = [0.1, 0.2]
[0.3, 0.4],
[0.5, 0.6]]
Step 1: Wx = [0.1*1 + 0.3*2 + 0.5*3, 0.2*1 + 0.4*2 + 0.6*3]
= [2.2, 3.2]
Step 2: y = Wx + b = [2.2 + 0.1, 3.2 + 0.2] = [2.3, 3.4]
```
Let's implement this!
"""
# %%
#| export
class Dense:
"""
Dense (Linear) Layer: y = Wx + b
The fundamental building block of neural networks.
Performs linear transformation: matrix multiplication + bias addition.
Args:
input_size: Number of input features
output_size: Number of output features
use_bias: Whether to include bias term (default: True)
use_naive_matmul: Whether to use naive matrix multiplication (for learning)
TODO: Implement the Dense layer with weight initialization and forward pass.
APPROACH:
1. Store layer parameters (input_size, output_size, use_bias, use_naive_matmul)
2. Initialize weights with small random values (Xavier/Glorot initialization)
3. Initialize bias to zeros (if use_bias=True)
4. Implement forward pass using matrix multiplication and bias addition
EXAMPLE:
layer = Dense(input_size=3, output_size=2)
x = Tensor([[1, 2, 3]]) # batch_size=1, input_size=3
y = layer(x) # shape: (1, 2)
HINTS:
- Use np.random.randn() for random initialization
- Scale weights by sqrt(2/(input_size + output_size)) for Xavier init
- Store weights and bias as numpy arrays
- Use matmul_naive or @ operator based on use_naive_matmul flag
"""
def __init__(self, input_size: int, output_size: int, use_bias: bool = True,
use_naive_matmul: bool = False):
"""
Initialize Dense layer with random weights.
Args:
input_size: Number of input features
output_size: Number of output features
use_bias: Whether to include bias term
use_naive_matmul: Use naive matrix multiplication (for learning)
TODO:
1. Store layer parameters (input_size, output_size, use_bias, use_naive_matmul)
2. Initialize weights with small random values
3. Initialize bias to zeros (if use_bias=True)
STEP-BY-STEP:
1. Store the parameters as instance variables
2. Calculate scale factor for Xavier initialization: sqrt(2/(input_size + output_size))
3. Initialize weights: np.random.randn(input_size, output_size) * scale
4. If use_bias=True, initialize bias: np.zeros(output_size)
5. If use_bias=False, set bias to None
EXAMPLE:
Dense(3, 2) creates:
- weights: shape (3, 2) with small random values
- bias: shape (2,) with zeros
"""
raise NotImplementedError("Student implementation required")
def forward(self, x: Tensor) -> Tensor:
"""
Forward pass: y = Wx + b
Args:
x: Input tensor of shape (batch_size, input_size)
Returns:
Output tensor of shape (batch_size, output_size)
TODO: Implement matrix multiplication and bias addition
- Use self.use_naive_matmul to choose between NumPy and naive implementation
- If use_naive_matmul=True, use matmul_naive(x.data, self.weights)
- If use_naive_matmul=False, use x.data @ self.weights
- Add bias if self.use_bias=True
STEP-BY-STEP:
1. Perform matrix multiplication: Wx
- If use_naive_matmul: result = matmul_naive(x.data, self.weights)
- Else: result = x.data @ self.weights
2. Add bias if use_bias: result += self.bias
3. Return Tensor(result)
EXAMPLE:
Input x: Tensor([[1, 2, 3]]) # shape (1, 3)
Weights: shape (3, 2)
Output: Tensor([[val1, val2]]) # shape (1, 2)
HINTS:
- x.data gives you the numpy array
- self.weights is your weight matrix
- Use broadcasting for bias addition: result + self.bias
- Return Tensor(result) to wrap the result
"""
raise NotImplementedError("Student implementation required")
def __call__(self, x: Tensor) -> Tensor:
"""Make layer callable: layer(x) same as layer.forward(x)"""
return self.forward(x)
# %%
#| hide
#| export
class Dense:
"""
Dense (Linear) Layer: y = Wx + b
The fundamental building block of neural networks.
Performs linear transformation: matrix multiplication + bias addition.
"""
def __init__(self, input_size: int, output_size: int, use_bias: bool = True,
use_naive_matmul: bool = False):
"""
Initialize Dense layer with random weights.
Args:
input_size: Number of input features
output_size: Number of output features
use_bias: Whether to include bias term
use_naive_matmul: Use naive matrix multiplication (for learning)
"""
# Store parameters
self.input_size = input_size
self.output_size = output_size
self.use_bias = use_bias
self.use_naive_matmul = use_naive_matmul
# Xavier/Glorot initialization
scale = np.sqrt(2.0 / (input_size + output_size))
self.weights = np.random.randn(input_size, output_size).astype(np.float32) * scale
# Initialize bias
if use_bias:
self.bias = np.zeros(output_size, dtype=np.float32)
else:
self.bias = None
def forward(self, x: Tensor) -> Tensor:
"""
Forward pass: y = Wx + b
Args:
x: Input tensor of shape (batch_size, input_size)
Returns:
Output tensor of shape (batch_size, output_size)
"""
# Matrix multiplication
if self.use_naive_matmul:
result = matmul_naive(x.data, self.weights)
else:
result = x.data @ self.weights
# Add bias
if self.use_bias:
result += self.bias
return Tensor(result)
def __call__(self, x: Tensor) -> Tensor:
"""Make layer callable: layer(x) same as layer.forward(x)"""
return self.forward(x)
# %% [markdown]
"""
### 🧪 Test Your Dense Layer
"""
# %%
# Test Dense layer
print("Testing Dense layer...")
try:
# Test basic Dense layer
layer = Dense(input_size=3, output_size=2, use_bias=True)
x = Tensor([[1, 2, 3]]) # batch_size=1, input_size=3
print(f"✅ Input shape: {x.shape}")
print(f"✅ Layer weights shape: {layer.weights.shape}")
print(f"✅ Layer bias shape: {layer.bias.shape}")
y = layer(x)
print(f"✅ Output shape: {y.shape}")
print(f"✅ Output: {y}")
# Test without bias
layer_no_bias = Dense(input_size=2, output_size=1, use_bias=False)
x2 = Tensor([[1, 2]])
y2 = layer_no_bias(x2)
print(f"✅ No bias output: {y2}")
# Test naive matrix multiplication
layer_naive = Dense(input_size=2, output_size=2, use_naive_matmul=True)
x3 = Tensor([[1, 2]])
y3 = layer_naive(x3)
print(f"✅ Naive matmul output: {y3}")
print("\n🎉 All Dense layer tests passed!")
except Exception as e:
print(f"❌ Error: {e}")
print("Make sure to implement the Dense layer above!")
# %% [markdown]
"""
## Step 4: Composing Layers with Activations
Now let's see how layers work together! A neural network is just layers composed with activation functions.
### Why Layer Composition Matters
- **Nonlinearity**: Activation functions make networks powerful
- **Feature learning**: Each layer learns different levels of features
- **Universal approximation**: Can approximate any function
- **Modularity**: Easy to experiment with different architectures
### The Pattern
```
Input → Dense → Activation → Dense → Activation → Output
```
### Real-World Example
```
Input: [1, 2, 3] (3 features)
Dense(3→2): [1.4, 2.8] (linear transformation)
ReLU: [1.4, 2.8] (nonlinearity)
Dense(2→1): [3.2] (final prediction)
```
Let's build a simple network!
"""
# %%
# Test layer composition
print("Testing layer composition...")
try:
# Create a simple network: Dense → ReLU → Dense
dense1 = Dense(input_size=3, output_size=2)
relu = ReLU()
dense2 = Dense(input_size=2, output_size=1)
# Test input
x = Tensor([[1, 2, 3]])
print(f"✅ Input: {x}")
# Forward pass through the network
h1 = dense1(x)
print(f"✅ After Dense1: {h1}")
h2 = relu(h1)
print(f"✅ After ReLU: {h2}")
y = dense2(h2)
print(f"✅ Final output: {y}")
print("\n🎉 Layer composition works!")
print("This is how neural networks work: layers + activations!")
except Exception as e:
print(f"❌ Error: {e}")
print("Make sure all your layers and activations are working!")
# %% [markdown]
"""
## Step 5: Performance Comparison
Let's compare our naive matrix multiplication with NumPy's optimized version to understand why optimization matters in ML.
### Why Performance Matters
- **Training time**: Neural networks train for hours/days
- **Inference speed**: Real-time applications need fast predictions
- **GPU utilization**: Optimized operations use hardware efficiently
- **Scalability**: Large models need efficient implementations
"""
# %%
# Performance comparison
print("Comparing naive vs NumPy matrix multiplication...")
try:
import time
# Create test matrices
A = np.random.randn(100, 100).astype(np.float32)
B = np.random.randn(100, 100).astype(np.float32)
# Time naive implementation
start_time = time.time()
result_naive = matmul_naive(A, B)
naive_time = time.time() - start_time
# Time NumPy implementation
start_time = time.time()
result_numpy = A @ B
numpy_time = time.time() - start_time
print(f"✅ Naive time: {naive_time:.4f} seconds")
print(f"✅ NumPy time: {numpy_time:.4f} seconds")
print(f"✅ Speedup: {naive_time/numpy_time:.1f}x faster")
# Verify correctness
assert np.allclose(result_naive, result_numpy), "Results don't match!"
print("✅ Results are identical!")
print("\n💡 This is why we use optimized libraries in production!")
except Exception as e:
print(f"❌ Error: {e}")
# %% [markdown]
"""
## 🎯 Module Summary
Congratulations! You've built the foundation of neural network layers:
### What You've Accomplished
✅ **Matrix Multiplication**: Understanding the core operation
✅ **Dense Layer**: Linear transformation with weights and bias
✅ **Layer Composition**: Combining layers with activations
✅ **Performance Awareness**: Understanding optimization importance
✅ **Testing**: Immediate feedback on your implementations
### Key Concepts You've Learned
- **Layers** are functions that transform tensors
- **Matrix multiplication** powers all neural network computations
- **Dense layers** perform linear transformations: `y = Wx + b`
- **Layer composition** creates complex functions from simple building blocks
- **Performance** matters for real-world ML applications
### What's Next
In the next modules, you'll build on this foundation:
- **Networks**: Compose layers into complete models
- **Training**: Learn parameters with gradients and optimization
- **Convolutional layers**: Process spatial data like images
- **Recurrent layers**: Process sequential data like text
### Real-World Connection
Your Dense layer is now ready to:
- Learn patterns in data through weight updates
- Transform features for classification and regression
- Serve as building blocks for complex architectures
- Integrate with the rest of the TinyTorch ecosystem
**Ready for the next challenge?** Let's move on to building complete neural networks!
"""
# %%
# Final verification
print("\n" + "="*50)
print("🎉 LAYERS MODULE COMPLETE!")
print("="*50)
print("✅ Matrix multiplication understanding")
print("✅ Dense layer implementation")
print("✅ Layer composition with activations")
print("✅ Performance awareness")
print("✅ Comprehensive testing")
print("\n🚀 Ready to build networks in the next module!")