Files
TinyTorch/modules/layers/layers_dev.py
Vijay Janapa Reddi e2c659023d 🧱 Implement Layers module - Neural Network Building Blocks
 Features:
- Dense layer with Xavier initialization (y = Wx + b)
- Activation functions: ReLU, Sigmoid, Tanh
- Layer composition for building neural networks
- Comprehensive test suite (17 passed, 5 skipped stretch goals)
- Package-level integration tests (14 passed)
- Complete documentation and examples

🎯 Educational Design:
- Follows 'Build → Use → Understand' pedagogical framework
- Immediate visual feedback with working examples
- Progressive complexity from simple layers to full networks
- Students see neural networks as function composition

🧪 Testing Architecture:
- Module tests: 17/17 core tests pass, 5 stretch goals available
- Package tests: 14/14 integration tests pass
- Dual testing supports both learning and validation

📚 Complete Implementation:
- Dense layer with proper weight initialization
- Numerically stable activation functions
- Batch processing support
- Real-world examples (image classification network)
- CLI integration: 'tito test --module layers'

This establishes the fundamental building blocks students need
to understand neural networks before diving into training.
2025-07-10 20:30:31 -04:00

548 lines
16 KiB
Python

# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.17.1
# ---
# %% [markdown]
"""
# Module 2: Layers - Neural Network Building Blocks
Welcome to the Layers module! This is where neural networks begin. You'll implement the fundamental building blocks that transform tensors.
## Learning Goals
- Understand layers as functions that transform tensors: `y = f(x)`
- Implement Dense layers with linear transformations: `y = Wx + b`
- Add activation functions for nonlinearity (ReLU, Sigmoid, Tanh)
- See how neural networks are just function composition
- Build intuition before diving into training
## Build → Use → Understand
1. **Build**: Dense layers and activation functions
2. **Use**: Transform tensors and see immediate results
3. **Understand**: How neural networks transform information
## Module → Package Structure
**🎓 Teaching vs. 🔧 Building**:
- **Learning side**: Work in `modules/layers/layers_dev.py`
- **Building side**: Exports to `tinytorch/core/layers.py`
This module builds the fundamental transformations that compose into neural networks.
"""
# %%
#| default_exp core.layers
# Setup and imports
import numpy as np
import sys
from typing import Union, Optional, Callable
import math
# %%
#| export
import numpy as np
import math
import sys
from typing import Union, Optional, Callable
from tinytorch.core.tensor import Tensor
# Import our Tensor class
# sys.path.append('../../')
# from modules.tensor.tensor_dev import Tensor
# print("🔥 TinyTorch Layers Module")
# print(f"NumPy version: {np.__version__}")
# print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")
# print("Ready to build neural network layers!")
# %% [markdown]
"""
## Step 1: What is a Layer?
A **layer** is a function that transforms tensors. Think of it as:
- **Input**: Tensor with some shape
- **Transformation**: Mathematical operation (linear, nonlinear, etc.)
- **Output**: Tensor with possibly different shape
**The fundamental insight**: Neural networks are just function composition!
```
x → Layer1 → Layer2 → Layer3 → y
```
**Why layers matter**:
- They're the building blocks of all neural networks
- Each layer learns a different transformation
- Composing layers creates complex functions
- Understanding layers = understanding neural networks
Let's start with the most important layer: **Dense** (also called Linear or Fully Connected).
"""
# %%
#| export
class Dense:
"""
Dense (Linear) Layer: y = Wx + b
The fundamental building block of neural networks.
Performs linear transformation: matrix multiplication + bias addition.
Args:
input_size: Number of input features
output_size: Number of output features
use_bias: Whether to include bias term (default: True)
TODO: Implement the Dense layer with weight initialization and forward pass.
"""
def __init__(self, input_size: int, output_size: int, use_bias: bool = True):
"""
Initialize Dense layer with random weights.
TODO:
1. Store layer parameters (input_size, output_size, use_bias)
2. Initialize weights with small random values
3. Initialize bias to zeros (if use_bias=True)
"""
raise NotImplementedError("Student implementation required")
def forward(self, x: Tensor) -> Tensor:
"""
Forward pass: y = Wx + b
Args:
x: Input tensor of shape (batch_size, input_size)
Returns:
Output tensor of shape (batch_size, output_size)
TODO: Implement matrix multiplication and bias addition
"""
raise NotImplementedError("Student implementation required")
def __call__(self, x: Tensor) -> Tensor:
"""Make layer callable: layer(x) same as layer.forward(x)"""
return self.forward(x)
# %%
#| hide
#| export
class Dense:
"""
Dense (Linear) Layer: y = Wx + b
The fundamental building block of neural networks.
Performs linear transformation: matrix multiplication + bias addition.
"""
def __init__(self, input_size: int, output_size: int, use_bias: bool = True):
"""Initialize Dense layer with random weights."""
self.input_size = input_size
self.output_size = output_size
self.use_bias = use_bias
# Initialize weights with Xavier/Glorot initialization
# This helps with gradient flow during training
limit = math.sqrt(6.0 / (input_size + output_size))
self.weights = Tensor(
np.random.uniform(-limit, limit, (input_size, output_size)).astype(np.float32)
)
# Initialize bias to zeros
if use_bias:
self.bias = Tensor(np.zeros(output_size, dtype=np.float32))
else:
self.bias = None
def forward(self, x: Tensor) -> Tensor:
"""Forward pass: y = Wx + b"""
# Matrix multiplication: x @ weights
# x shape: (batch_size, input_size)
# weights shape: (input_size, output_size)
# result shape: (batch_size, output_size)
output = Tensor(x.data @ self.weights.data)
# Add bias if present
if self.bias is not None:
output = Tensor(output.data + self.bias.data)
return output
def __call__(self, x: Tensor) -> Tensor:
"""Make layer callable: layer(x) same as layer.forward(x)"""
return self.forward(x)
# %% [markdown]
"""
### 🧪 Test Your Dense Layer
Once you implement the Dense layer above, run this cell to test it:
"""
# %%
# Test the Dense layer
try:
print("=== Testing Dense Layer ===")
# Create a simple Dense layer: 3 inputs → 2 outputs
layer = Dense(input_size=3, output_size=2)
print(f"Created Dense layer: {layer.input_size}{layer.output_size}")
print(f"Weights shape: {layer.weights.shape}")
print(f"Bias shape: {layer.bias.shape if layer.bias else 'No bias'}")
# Test with a single example
x = Tensor([[1.0, 2.0, 3.0]]) # Shape: (1, 3)
y = layer(x)
print(f"Input shape: {x.shape}")
print(f"Output shape: {y.shape}")
print(f"Input: {x.data}")
print(f"Output: {y.data}")
# Test with batch of examples
x_batch = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]) # Shape: (2, 3)
y_batch = layer(x_batch)
print(f"\nBatch input shape: {x_batch.shape}")
print(f"Batch output shape: {y_batch.shape}")
print(f"Batch output: {y_batch.data}")
print("✅ Dense layer working!")
except Exception as e:
print(f"❌ Error: {e}")
print("Make sure to implement the Dense layer above!")
# %% [markdown]
"""
## Step 2: Activation Functions
Dense layers alone can only learn **linear** transformations. But most real-world problems need **nonlinear** transformations.
**Activation functions** add nonlinearity:
- **ReLU**: `max(0, x)` - Most common, simple and effective
- **Sigmoid**: `1 / (1 + e^(-x))` - Squashes to (0, 1)
- **Tanh**: `tanh(x)` - Squashes to (-1, 1)
**Why nonlinearity matters**: Without it, stacking layers is pointless!
```
Linear → Linear → Linear = Just one big Linear transformation
Linear → NonLinear → Linear = Can learn complex patterns
```
"""
# %%
#| export
class ReLU:
"""
ReLU Activation: f(x) = max(0, x)
The most popular activation function in deep learning.
Simple, effective, and computationally efficient.
TODO: Implement ReLU activation function.
"""
def forward(self, x: Tensor) -> Tensor:
"""
Apply ReLU: f(x) = max(0, x)
Args:
x: Input tensor
Returns:
Output tensor with ReLU applied element-wise
TODO: Implement element-wise max(0, x) operation
"""
raise NotImplementedError("Student implementation required")
def __call__(self, x: Tensor) -> Tensor:
"""Make activation callable: relu(x) same as relu.forward(x)"""
return self.forward(x)
# %%
#| hide
#| export
class ReLU:
"""ReLU Activation: f(x) = max(0, x)"""
def forward(self, x: Tensor) -> Tensor:
"""Apply ReLU: f(x) = max(0, x)"""
return Tensor(np.maximum(0, x.data))
def __call__(self, x: Tensor) -> Tensor:
return self.forward(x)
# %%
#| export
class Sigmoid:
"""
Sigmoid Activation: f(x) = 1 / (1 + e^(-x))
Squashes input to range (0, 1). Often used for binary classification.
TODO: Implement Sigmoid activation function.
"""
def forward(self, x: Tensor) -> Tensor:
"""
Apply Sigmoid: f(x) = 1 / (1 + e^(-x))
Args:
x: Input tensor
Returns:
Output tensor with Sigmoid applied element-wise
TODO: Implement sigmoid function (be careful with numerical stability!)
"""
raise NotImplementedError("Student implementation required")
def __call__(self, x: Tensor) -> Tensor:
return self.forward(x)
# %%
#| hide
#| export
class Sigmoid:
"""Sigmoid Activation: f(x) = 1 / (1 + e^(-x))"""
def forward(self, x: Tensor) -> Tensor:
"""Apply Sigmoid with numerical stability"""
# Use the numerically stable version to avoid overflow
# For x >= 0: sigmoid(x) = 1 / (1 + exp(-x))
# For x < 0: sigmoid(x) = exp(x) / (1 + exp(x))
x_data = x.data
result = np.zeros_like(x_data)
# Stable computation
positive_mask = x_data >= 0
result[positive_mask] = 1.0 / (1.0 + np.exp(-x_data[positive_mask]))
result[~positive_mask] = np.exp(x_data[~positive_mask]) / (1.0 + np.exp(x_data[~positive_mask]))
return Tensor(result)
def __call__(self, x: Tensor) -> Tensor:
return self.forward(x)
# %%
#| export
class Tanh:
"""
Tanh Activation: f(x) = tanh(x)
Squashes input to range (-1, 1). Zero-centered output.
TODO: Implement Tanh activation function.
"""
def forward(self, x: Tensor) -> Tensor:
"""
Apply Tanh: f(x) = tanh(x)
Args:
x: Input tensor
Returns:
Output tensor with Tanh applied element-wise
TODO: Implement tanh function
"""
raise NotImplementedError("Student implementation required")
def __call__(self, x: Tensor) -> Tensor:
return self.forward(x)
# %%
#| hide
#| export
class Tanh:
"""Tanh Activation: f(x) = tanh(x)"""
def forward(self, x: Tensor) -> Tensor:
"""Apply Tanh"""
return Tensor(np.tanh(x.data))
def __call__(self, x: Tensor) -> Tensor:
return self.forward(x)
# %% [markdown]
"""
### 🧪 Test Your Activation Functions
Once you implement the activation functions above, run this cell to test them:
"""
# %%
# Test activation functions
try:
print("=== Testing Activation Functions ===")
# Test data: mix of positive, negative, and zero
x = Tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]])
print(f"Input: {x.data}")
# Test ReLU
relu = ReLU()
y_relu = relu(x)
print(f"ReLU output: {y_relu.data}")
# Test Sigmoid
sigmoid = Sigmoid()
y_sigmoid = sigmoid(x)
print(f"Sigmoid output: {y_sigmoid.data}")
# Test Tanh
tanh = Tanh()
y_tanh = tanh(x)
print(f"Tanh output: {y_tanh.data}")
print("✅ Activation functions working!")
except Exception as e:
print(f"❌ Error: {e}")
print("Make sure to implement the activation functions above!")
# %% [markdown]
"""
## Step 3: Layer Composition - Building Neural Networks
Now comes the magic! We can **compose** layers to build neural networks:
```
Input → Dense → ReLU → Dense → Sigmoid → Output
```
This is a 2-layer neural network that can learn complex nonlinear patterns!
"""
# %%
# Build a simple 2-layer neural network
try:
print("=== Building a 2-Layer Neural Network ===")
# Network architecture: 3 → 4 → 2
# Input: 3 features
# Hidden: 4 neurons with ReLU
# Output: 2 neurons with Sigmoid
layer1 = Dense(input_size=3, output_size=4)
activation1 = ReLU()
layer2 = Dense(input_size=4, output_size=2)
activation2 = Sigmoid()
print("Network architecture:")
print(f" Input: 3 features")
print(f" Hidden: {layer1.input_size}{layer1.output_size} (Dense + ReLU)")
print(f" Output: {layer2.input_size}{layer2.output_size} (Dense + Sigmoid)")
# Test with sample data
x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]) # 2 examples, 3 features each
print(f"\nInput shape: {x.shape}")
print(f"Input data: {x.data}")
# Forward pass through the network
h1 = layer1(x) # Dense layer 1
h1_activated = activation1(h1) # ReLU activation
h2 = layer2(h1_activated) # Dense layer 2
output = activation2(h2) # Sigmoid activation
print(f"\nAfter layer 1: {h1.shape}")
print(f"After ReLU: {h1_activated.shape}")
print(f"After layer 2: {h2.shape}")
print(f"Final output: {output.shape}")
print(f"Output values: {output.data}")
print("\n🎉 Neural network working! You just built your first neural network!")
print("Notice how the network transforms 3D input into 2D output through learned transformations.")
except Exception as e:
print(f"❌ Error: {e}")
print("Make sure to implement the layers and activations above!")
# %% [markdown]
"""
## Step 4: Understanding What We Built
Congratulations! You just implemented the fundamental building blocks of neural networks:
### 🧱 **What You Built**
1. **Dense Layer**: Linear transformation `y = Wx + b`
2. **Activation Functions**: Nonlinear transformations (ReLU, Sigmoid, Tanh)
3. **Layer Composition**: Chaining layers to build networks
### 🎯 **Key Insights**
- **Layers are functions**: They transform tensors from one space to another
- **Composition creates complexity**: Simple layers → complex networks
- **Nonlinearity is crucial**: Without it, deep networks are just linear transformations
- **Neural networks are function approximators**: They learn to map inputs to outputs
### 🚀 **What's Next**
In the next modules, you'll learn:
- **Training**: How networks learn from data (backpropagation, optimizers)
- **Architectures**: Specialized layers for different problems (CNNs, RNNs)
- **Applications**: Using networks for real problems
### 🔧 **Export to Package**
Run this to export your layers to the TinyTorch package:
```bash
python bin/tito.py sync
```
Then test your implementation:
```bash
python bin/tito.py test --module layers
```
**Great job! You've built the foundation of neural networks!** 🎉
"""
# %%
# Final demonstration: A more complex example
try:
print("=== Final Demo: Image Classification Network ===")
# Simulate a small image: 28x28 pixels flattened to 784 features
# This is like a tiny MNIST digit
image_size = 28 * 28 # 784 pixels
num_classes = 10 # 10 digits (0-9)
# Build a 3-layer network for digit classification
# 784 → 128 → 64 → 10
layer1 = Dense(input_size=image_size, output_size=128)
relu1 = ReLU()
layer2 = Dense(input_size=128, output_size=64)
relu2 = ReLU()
layer3 = Dense(input_size=64, output_size=num_classes)
softmax = Sigmoid() # Using Sigmoid as a simple "probability-like" output
print(f"Image classification network:")
print(f" Input: {image_size} pixels (28x28 image)")
print(f" Hidden 1: {layer1.input_size}{layer1.output_size} (Dense + ReLU)")
print(f" Hidden 2: {layer2.input_size}{layer2.output_size} (Dense + ReLU)")
print(f" Output: {layer3.input_size}{layer3.output_size} (Dense + Sigmoid)")
# Simulate a batch of 5 images
batch_size = 5
fake_images = Tensor(np.random.randn(batch_size, image_size).astype(np.float32))
# Forward pass
h1 = relu1(layer1(fake_images))
h2 = relu2(layer2(h1))
predictions = softmax(layer3(h2))
print(f"\nBatch processing:")
print(f" Input batch shape: {fake_images.shape}")
print(f" Predictions shape: {predictions.shape}")
print(f" Sample predictions: {predictions.data[0]}") # First image predictions
print("\n🎉 You built a neural network that could classify images!")
print("With training, this network could learn to recognize handwritten digits!")
except Exception as e:
print(f"❌ Error: {e}")
print("Check your layer implementations!")