# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#     jupytext_version: 1.17.1
# ---
# %% [markdown]
"""
# Module 2: Activations - Nonlinearity in Neural Networks
Welcome to the Activations module! This is where neural networks get their power through nonlinearity.
## Learning Goals
- Understand why activation functions are essential for neural networks
- Implement the four most important activation functions: ReLU, Sigmoid, Tanh, and Softmax
- Visualize how activations transform data and enable complex learning
- See how activations work with layers to build powerful networks
- Master the NBGrader workflow with comprehensive testing
## Build → Use → Understand
1. **Build**: Activation functions that add nonlinearity
2. **Use**: Transform tensors and see immediate results
3. **Understand**: How nonlinearity enables complex pattern learning
"""
# %% nbgrader={"grade": false, "grade_id": "activations-imports", "locked": false, "schema_version": 3, "solution": false, "task": false}
#| default_exp core.activations
#| export
import math
import numpy as np
import matplotlib.pyplot as plt
import os
import sys
from typing import Union, List
# Import our Tensor class - try from package first, then from local module
try:
    from tinytorch.core.tensor import Tensor
except ImportError:
    # For development, import from local tensor module
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))
    from tensor_dev import Tensor
# %% nbgrader={"grade": false, "grade_id": "activations-setup", "locked": false, "schema_version": 3, "solution": false, "task": false}
#| hide
#| export
def _should_show_plots():
    """Check if we should show plots (disable during testing)"""
    # Check multiple conditions that indicate we're in test mode
    is_pytest = (
        'pytest' in sys.modules or
        'test' in sys.argv or
        os.environ.get('PYTEST_CURRENT_TEST') is not None or
        any('test' in arg for arg in sys.argv) or
        any('pytest' in arg for arg in sys.argv)
    )
    # Show plots in development mode (when not in test mode)
    return not is_pytest
# %% nbgrader={"grade": false, "grade_id": "activations-welcome", "locked": false, "schema_version": 3, "solution": false, "task": false}
print("🔥 TinyTorch Activations Module")
print(f"NumPy version: {np.__version__}")
print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")
print("Ready to build activation functions!")
# %% [markdown]
"""
## 📦 Where This Code Lives in the Final Package
**Learning Side:** You work in `modules/source/02_activations/activations_dev.py`
**Building Side:** Code exports to `tinytorch.core.activations`
```python
# Final package structure:
from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax
from tinytorch.core.tensor import Tensor # Foundation
from tinytorch.core.layers import Dense # Uses activations
```
**Why this matters:**
- **Learning:** Focused modules for deep understanding
- **Production:** Proper organization like PyTorch's `torch.nn.ReLU`
- **Consistency:** All activation functions live together in `core.activations`
- **Integration:** Works seamlessly with tensors and layers
"""
# %% [markdown]
"""
## What Are Activation Functions?
### The Problem: Linear Limitations
Without activation functions, neural networks can only learn linear relationships:
```
y = W₁ · (W₂ · (W₃ · x + b₃) + b₂) + b₁
```
This simplifies to just:
```
y = W_combined · x + b_combined
```
**A single linear function!** No matter how many layers you add, you can't learn complex patterns like:
- Image recognition (nonlinear pixel relationships)
- Language understanding (nonlinear word relationships)
- Game playing (nonlinear strategy relationships)
### The Solution: Nonlinearity
Activation functions add nonlinearity between layers:
```
y = W₁ · f(W₂ · f(W₃ · x + b₃) + b₂) + b₁
```
Now each layer can learn complex transformations!
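A quick way to see both claims numerically is plain NumPy: without an activation the stacked layers collapse to one linear map, and with ReLU they do not. The sketch below is illustrative only (raw NumPy arrays with arbitrary shapes, not the TinyTorch `Tensor` class):
```python
import numpy as np

# Two "layers" without an activation collapse into one linear map.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)
W2, b2 = rng.normal(size=(4, 5)), rng.normal(size=4)
x = rng.normal(size=5)

two_layers = W1 @ (W2 @ x + b2) + b1          # "deep" linear network
one_layer = (W1 @ W2) @ x + (W1 @ b2 + b1)    # equivalent single layer
print(np.allclose(two_layers, one_layer))     # True: no extra expressive power

# Insert a nonlinearity (ReLU) and the collapse no longer holds in general.
nonlinear = W1 @ np.maximum(0, W2 @ x + b2) + b1
print(np.allclose(two_layers, nonlinear))     # False (in general)
```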
### Real-World Impact
- **Before activations**: Only linear classifiers (logistic regression)
- **After activations**: Complex pattern recognition (deep learning revolution)
### What We'll Build
1. **ReLU**: The foundation of modern deep learning
2. **Sigmoid**: Classic activation for binary classification
3. **Tanh**: Centered activation for better gradients
4. **Softmax**: Probability distributions for multi-class classification
"""
# %% [markdown]
"""
## Step 1: ReLU - The Foundation of Deep Learning
### What is ReLU?
**ReLU (Rectified Linear Unit)** is the most important activation function in deep learning:
```
f(x) = max(0, x)
```
- **Positive inputs**: Pass through unchanged
- **Negative inputs**: Become zero
- **Zero**: Stays zero
### Why ReLU Revolutionized Deep Learning
1. **Computational efficiency**: Just a max operation
2. **No vanishing gradients**: Derivative is 1 for positive values
3. **Sparsity**: Many neurons output exactly 0
4. **Empirical success**: Works well in practice
### Visual Understanding
```
Input: [-2, -1, 0, 1, 2]
ReLU: [ 0, 0, 0, 1, 2]
```
### Real-World Applications
- **Image classification**: ResNet, VGG, AlexNet
- **Object detection**: YOLO, R-CNN
- **Language models**: Transformer feedforward layers
- **Recommendation**: Deep collaborative filtering
### Mathematical Properties
- **Derivative**: f'(x) = 1 if x > 0, else 0
- **Range**: [0, ∞)
- **Sparsity**: Outputs exactly 0 for negative inputs
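These properties are easy to see in plain NumPy. The sketch below is illustrative only; it previews the 0/1 derivative mask that the autograd module will use later, and its variable names are not part of this module's API:
```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.maximum(0, x)                 # forward pass: [0, 0, 0, 1, 2]
grad = (x > 0).astype(x.dtype)       # derivative: 1 where x > 0, else 0
sparsity = np.mean(y == 0)           # fraction of outputs that are exactly 0

print(y, grad, sparsity)             # [0. 0. 0. 1. 2.] [0. 0. 0. 1. 1.] 0.6
```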
"""
# %% nbgrader={"grade": false, "grade_id": "relu-class", "locked": false, "schema_version": 3, "solution": true, "task": false}
#| export
class ReLU:
    """
    ReLU Activation Function: f(x) = max(0, x)

    The most popular activation function in deep learning.
    Simple, fast, and effective for most applications.
    """

    def forward(self, x: Tensor) -> Tensor:
        """
        Apply ReLU activation: f(x) = max(0, x)

        TODO: Implement ReLU activation function.

        STEP-BY-STEP IMPLEMENTATION:
        1. For each element in the input tensor, apply max(0, element)
        2. Use NumPy's maximum function for efficient element-wise operation
        3. Return a new Tensor with the results
        4. Preserve the input tensor's shape

        EXAMPLE USAGE:
        ```python
        relu = ReLU()
        input_tensor = Tensor([[-2, -1, 0, 1, 2]])
        output = relu(input_tensor)
        print(output.data)  # [[0, 0, 0, 1, 2]]
        ```

        IMPLEMENTATION HINTS:
        - Use np.maximum(0, x.data) for element-wise max with 0
        - Remember to return a new Tensor object: return Tensor(result)
        - The shape should remain the same as input
        - Don't modify the input tensor (immutable operations)

        LEARNING CONNECTIONS:
        - This is like torch.nn.ReLU() in PyTorch
        - Used in virtually every modern neural network
        - Enables deep networks by preventing vanishing gradients
        - Creates sparse representations (many zeros)
        """
        ### BEGIN SOLUTION
        result = np.maximum(0, x.data)
        return Tensor(result)
        ### END SOLUTION

    def __call__(self, x: Tensor) -> Tensor:
        """Make the class callable: relu(x) instead of relu.forward(x)"""
        return self.forward(x)
# %% [markdown]
"""
### 🧪 Test Your ReLU Implementation
Once you implement the ReLU forward method above, run this cell to test it:
"""
# %% nbgrader={"grade": true, "grade_id": "test-relu-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false}
def test_relu_activation():
    """Test ReLU activation function"""
    print("Testing ReLU activation...")

    # Create ReLU instance
    relu = ReLU()

    # Test with mixed positive/negative values
    test_input = Tensor([[-2, -1, 0, 1, 2]])
    result = relu(test_input)
    expected = np.array([[0, 0, 0, 1, 2]])
    assert np.array_equal(result.data, expected), f"ReLU failed: expected {expected}, got {result.data}"

    # Test that negative values become zero
    assert np.all(result.data >= 0), "ReLU should make all negative values zero"

    # Test that positive values remain unchanged
    positive_input = Tensor([[1, 2, 3, 4, 5]])
    positive_result = relu(positive_input)
    assert np.array_equal(positive_result.data, positive_input.data), "ReLU should preserve positive values"

    # Test with 2D tensor
    matrix_input = Tensor([[-1, 2], [3, -4]])
    matrix_result = relu(matrix_input)
    matrix_expected = np.array([[0, 2], [3, 0]])
    assert np.array_equal(matrix_result.data, matrix_expected), "ReLU should work with 2D tensors"

    # Test shape preservation
    assert matrix_result.shape == matrix_input.shape, "ReLU should preserve input shape"

    print("✅ ReLU activation tests passed!")
    print("✅ Negative values correctly zeroed")
    print("✅ Positive values preserved")
    print("✅ Shape preservation working")
    print("✅ Works with multi-dimensional tensors")
# Run the test
test_relu_activation()
# %% [markdown]
"""
## Step 2: Sigmoid - Classic Binary Classification
### What is Sigmoid?
**Sigmoid** is the classic activation function that maps any real number to (0, 1):
```
f(x) = 1 / (1 + e^(-x))
```
### Why Sigmoid Matters
1. **Probability interpretation**: Outputs between 0 and 1
2. **Smooth gradients**: Differentiable everywhere
3. **Historical importance**: Enabled early neural networks
4. **Binary classification**: Perfect for yes/no decisions
### Visual Understanding
```
Input: [-∞, -2, -1, 0, 1, 2, ∞]
Sigmoid:[0, 0.12, 0.27, 0.5, 0.73, 0.88, 1]
```
### Real-World Applications
- **Binary classification**: Spam detection, medical diagnosis
- **Gating mechanisms**: LSTM and GRU cells
- **Output layers**: When you need probabilities
- **Attention mechanisms**: Where to focus attention
### Mathematical Properties
- **Range**: (0, 1)
- **Derivative**: f'(x) = f(x) · (1 - f(x))
- **Centered**: f(0) = 0.5
- **Symmetric**: f(-x) = 1 - f(x)
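These properties are easy to verify numerically. The sketch below is illustrative only (a standalone helper, not the `Sigmoid` class you implement next); it checks the derivative identity against a finite-difference estimate:
```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-4, 4, 9)
analytic = sigmoid(x) * (1 - sigmoid(x))                   # f'(x) = f(x)(1 - f(x))
numeric = (sigmoid(x + 1e-6) - sigmoid(x - 1e-6)) / 2e-6   # finite differences

print(np.allclose(analytic, numeric, atol=1e-6))  # True
print(analytic.max())                             # 0.25 -- gradients are at most 1/4
```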
"""
# %% nbgrader={"grade": false, "grade_id": "sigmoid-class", "locked": false, "schema_version": 3, "solution": true, "task": false}
#| export
class Sigmoid:
    """
    Sigmoid Activation Function: f(x) = 1 / (1 + e^(-x))

    Maps any real number to the range (0, 1).
    Useful for binary classification and probability outputs.
    """

    def forward(self, x: Tensor) -> Tensor:
        """
        Apply Sigmoid activation: f(x) = 1 / (1 + e^(-x))

        TODO: Implement Sigmoid activation function.

        STEP-BY-STEP IMPLEMENTATION:
        1. Compute the negative of input: -x.data
        2. Compute the exponential: np.exp(-x.data)
        3. Add 1 to the exponential: 1 + np.exp(-x.data)
        4. Take the reciprocal: 1 / (1 + np.exp(-x.data))
        5. Return as new Tensor

        EXAMPLE USAGE:
        ```python
        sigmoid = Sigmoid()
        input_tensor = Tensor([[-2, -1, 0, 1, 2]])
        output = sigmoid(input_tensor)
        print(output.data)  # [[0.119, 0.269, 0.5, 0.731, 0.881]]
        ```

        IMPLEMENTATION HINTS:
        - Use np.exp() for exponential function
        - Formula: 1 / (1 + np.exp(-x.data))
        - Handle potential overflow with np.clip(-x.data, -500, 500)
        - Return Tensor(result)

        LEARNING CONNECTIONS:
        - This is like torch.nn.Sigmoid() in PyTorch
        - Used in binary classification output layers
        - Key component in LSTM and GRU gating mechanisms
        - Historically important for early neural networks
        """
        ### BEGIN SOLUTION
        # Clip to prevent overflow
        clipped_input = np.clip(-x.data, -500, 500)
        result = 1 / (1 + np.exp(clipped_input))
        return Tensor(result)
        ### END SOLUTION

    def __call__(self, x: Tensor) -> Tensor:
        """Make the class callable: sigmoid(x) instead of sigmoid.forward(x)"""
        return self.forward(x)
# %% [markdown]
"""
### 🧪 Test Your Sigmoid Implementation
Once you implement the Sigmoid forward method above, run this cell to test it:
"""
# %% nbgrader={"grade": true, "grade_id": "test-sigmoid-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false}
def test_sigmoid_activation():
    """Test Sigmoid activation function"""
    print("Testing Sigmoid activation...")

    # Create Sigmoid instance
    sigmoid = Sigmoid()

    # Test with known values
    test_input = Tensor([[0]])
    result = sigmoid(test_input)
    expected = 0.5
    assert abs(result.data[0][0] - expected) < 1e-6, f"Sigmoid(0) should be 0.5, got {result.data[0][0]}"

    # Test with positive and negative values
    test_input = Tensor([[-2, -1, 0, 1, 2]])
    result = sigmoid(test_input)

    # Check that all values are between 0 and 1
    assert np.all(result.data > 0), "Sigmoid output should be > 0"
    assert np.all(result.data < 1), "Sigmoid output should be < 1"

    # Test symmetry: sigmoid(-x) = 1 - sigmoid(x)
    x_val = 1.0
    pos_result = sigmoid(Tensor([[x_val]]))
    neg_result = sigmoid(Tensor([[-x_val]]))
    symmetry_check = abs(pos_result.data[0][0] + neg_result.data[0][0] - 1.0)
    assert symmetry_check < 1e-6, "Sigmoid should be symmetric around 0.5"

    # Test with 2D tensor
    matrix_input = Tensor([[-1, 1], [0, 2]])
    matrix_result = sigmoid(matrix_input)
    assert matrix_result.shape == matrix_input.shape, "Sigmoid should preserve shape"

    # Test extreme values (should not overflow)
    extreme_input = Tensor([[-100, 100]])
    extreme_result = sigmoid(extreme_input)
    assert not np.any(np.isnan(extreme_result.data)), "Sigmoid should handle extreme values"
    assert not np.any(np.isinf(extreme_result.data)), "Sigmoid should not produce inf values"

    print("✅ Sigmoid activation tests passed!")
    print("✅ Outputs correctly bounded between 0 and 1")
    print("✅ Symmetric property verified")
    print("✅ Handles extreme values without overflow")
    print("✅ Shape preservation working")
# Run the test
test_sigmoid_activation()
# %% [markdown]
"""
## Step 3: Tanh - Centered Activation
### What is Tanh?
**Tanh (Hyperbolic Tangent)** is similar to sigmoid but centered around zero:
```
f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
```
### Why Tanh is Better Than Sigmoid
1. **Zero-centered**: Outputs range from -1 to 1
2. **Better gradients**: Helps with gradient flow in deep networks
3. **Faster convergence**: Less bias shift during training
4. **Stronger gradients**: Maximum gradient is 1 vs 0.25 for sigmoid
### Visual Understanding
```
Input: [-∞, -2, -1, 0, 1, 2, ∞]
Tanh: [-1, -0.96, -0.76, 0, 0.76, 0.96, 1]
```
### Real-World Applications
- **Hidden layers**: Better than sigmoid for internal activations
- **RNN cells**: Classic RNN and LSTM use tanh
- **Normalization**: When you need zero-centered outputs
- **Feature scaling**: Maps inputs to [-1, 1] range
### Mathematical Properties
- **Range**: (-1, 1)
- **Derivative**: f'(x) = 1 - f(x)²
- **Zero-centered**: f(0) = 0
- **Antisymmetric**: f(-x) = -f(x)
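Two related facts are worth checking numerically: tanh is a shifted, rescaled sigmoid (tanh(x) = 2·sigmoid(2x) − 1), and its maximum gradient is 1 rather than sigmoid's 0.25. The sketch below is illustrative only and uses plain NumPy rather than the `Tanh` class you implement next:
```python
import numpy as np

x = np.linspace(-3, 3, 7)
sigmoid_2x = 1 / (1 + np.exp(-2 * x))
print(np.allclose(np.tanh(x), 2 * sigmoid_2x - 1))   # True: tanh is a rescaled sigmoid

numeric = (np.tanh(x + 1e-6) - np.tanh(x - 1e-6)) / 2e-6
print(np.allclose(numeric, 1 - np.tanh(x) ** 2))     # True: f'(x) = 1 - f(x)^2
print((1 - np.tanh(x) ** 2).max())                   # 1.0 -- maximum gradient, at x = 0
```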
"""
# %% nbgrader={"grade": false, "grade_id": "tanh-class", "locked": false, "schema_version": 3, "solution": true, "task": false}
#| export
class Tanh:
    """
    Tanh Activation Function: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))

    Zero-centered activation function with range (-1, 1).
    Better gradient properties than sigmoid.
    """

    def forward(self, x: Tensor) -> Tensor:
        """
        Apply Tanh activation: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))

        TODO: Implement Tanh activation function.

        STEP-BY-STEP IMPLEMENTATION:
        1. Use NumPy's built-in tanh function: np.tanh(x.data)
        2. Alternatively, implement manually:
           - Compute e^x and e^(-x)
           - Calculate (e^x - e^(-x)) / (e^x + e^(-x))
        3. Return as new Tensor

        EXAMPLE USAGE:
        ```python
        tanh = Tanh()
        input_tensor = Tensor([[-2, -1, 0, 1, 2]])
        output = tanh(input_tensor)
        print(output.data)  # [[-0.964, -0.762, 0, 0.762, 0.964]]
        ```

        IMPLEMENTATION HINTS:
        - Use np.tanh(x.data) for simplicity
        - Manual implementation: (np.exp(x.data) - np.exp(-x.data)) / (np.exp(x.data) + np.exp(-x.data))
        - Handle overflow by clipping inputs: np.clip(x.data, -500, 500)
        - Return Tensor(result)

        LEARNING CONNECTIONS:
        - This is like torch.nn.Tanh() in PyTorch
        - Used in RNN, LSTM, and GRU cells
        - Better than sigmoid for hidden layers
        - Zero-centered outputs help with gradient flow
        """
        ### BEGIN SOLUTION
        # Use NumPy's built-in tanh function
        result = np.tanh(x.data)
        return Tensor(result)
        ### END SOLUTION

    def __call__(self, x: Tensor) -> Tensor:
        """Make the class callable: tanh(x) instead of tanh.forward(x)"""
        return self.forward(x)
# %% [markdown]
"""
### 🧪 Test Your Tanh Implementation
Once you implement the Tanh forward method above, run this cell to test it:
"""
# %% nbgrader={"grade": true, "grade_id": "test-tanh-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false}
def test_tanh_activation():
    """Test Tanh activation function"""
    print("Testing Tanh activation...")

    # Create Tanh instance
    tanh = Tanh()

    # Test with zero (should be 0)
    test_input = Tensor([[0]])
    result = tanh(test_input)
    expected = 0.0
    assert abs(result.data[0][0] - expected) < 1e-6, f"Tanh(0) should be 0, got {result.data[0][0]}"

    # Test with positive and negative values
    test_input = Tensor([[-2, -1, 0, 1, 2]])
    result = tanh(test_input)

    # Check that all values are between -1 and 1
    assert np.all(result.data > -1), "Tanh output should be > -1"
    assert np.all(result.data < 1), "Tanh output should be < 1"

    # Test antisymmetry: tanh(-x) = -tanh(x)
    x_val = 1.5
    pos_result = tanh(Tensor([[x_val]]))
    neg_result = tanh(Tensor([[-x_val]]))
    antisymmetry_check = abs(pos_result.data[0][0] + neg_result.data[0][0])
    assert antisymmetry_check < 1e-6, "Tanh should be antisymmetric"

    # Test with 2D tensor
    matrix_input = Tensor([[-1, 1], [0, 2]])
    matrix_result = tanh(matrix_input)
    assert matrix_result.shape == matrix_input.shape, "Tanh should preserve shape"

    # Test extreme values (should not overflow)
    extreme_input = Tensor([[-100, 100]])
    extreme_result = tanh(extreme_input)
    assert not np.any(np.isnan(extreme_result.data)), "Tanh should handle extreme values"
    assert not np.any(np.isinf(extreme_result.data)), "Tanh should not produce inf values"

    # Test that extreme values approach ±1
    assert abs(extreme_result.data[0][0] - (-1)) < 1e-6, "Tanh(-∞) should approach -1"
    assert abs(extreme_result.data[0][1] - 1) < 1e-6, "Tanh(∞) should approach 1"

    print("✅ Tanh activation tests passed!")
    print("✅ Outputs correctly bounded between -1 and 1")
    print("✅ Antisymmetric property verified")
    print("✅ Zero-centered (tanh(0) = 0)")
    print("✅ Handles extreme values correctly")
# Run the test
test_tanh_activation()
# %% [markdown]
"""
## Step 4: Softmax - Probability Distributions
### What is Softmax?
**Softmax** converts a vector of real numbers into a probability distribution:
```
f(x_i) = e^(x_i) / Σ(e^(x_j))
```
### Why Softmax is Essential
1. **Probability distribution**: Outputs sum to 1
2. **Multi-class classification**: Choose one class from many
3. **Interpretable**: Each output is a probability
4. **Differentiable**: Enables gradient-based learning
### Visual Understanding
```
Input: [1, 2, 3]
Softmax:[0.09, 0.24, 0.67] # Sums to 1.0
```
### Real-World Applications
- **Classification**: Image classification, text classification
- **Language models**: Next word prediction
- **Attention mechanisms**: Where to focus attention
- **Reinforcement learning**: Action selection probabilities
### Mathematical Properties
- **Range**: (0, 1) for each output
- **Constraint**: Σ(f(x_i)) = 1
- **Argmax preservation**: Doesn't change relative ordering
- **Temperature scaling**: Can be made sharper or softer
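The temperature-scaling idea from the last bullet can be sketched in a few lines; the `temperature` parameter below is illustrative and not part of the `Softmax` class you implement next:
```python
import numpy as np

def softmax(x, temperature=1.0):
    z = x / temperature
    z = z - np.max(z)           # subtract the max for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

logits = np.array([1.0, 2.0, 3.0])
print(softmax(logits))          # ~[0.09, 0.24, 0.67]
print(softmax(logits, 0.5))     # sharper: more mass on the largest logit
print(softmax(logits, 5.0))     # softer: closer to uniform
```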
"""
# %% nbgrader={"grade": false, "grade_id": "softmax-class", "locked": false, "schema_version": 3, "solution": true, "task": false}
#| export
class Softmax:
    """
    Softmax Activation Function: f(x_i) = e^(x_i) / Σ(e^(x_j))

    Converts a vector of real numbers into a probability distribution.
    Essential for multi-class classification.
    """

    def forward(self, x: Tensor) -> Tensor:
        """
        Apply Softmax activation: f(x_i) = e^(x_i) / Σ(e^(x_j))

        TODO: Implement Softmax activation function.

        STEP-BY-STEP IMPLEMENTATION:
        1. Subtract max value for numerical stability: x - max(x)
        2. Compute exponentials: np.exp(x - max(x))
        3. Compute sum of exponentials: np.sum(exp_values)
        4. Divide each exponential by the sum: exp_values / sum
        5. Return as new Tensor

        EXAMPLE USAGE:
        ```python
        softmax = Softmax()
        input_tensor = Tensor([[1, 2, 3]])
        output = softmax(input_tensor)
        print(output.data)  # [[0.09, 0.24, 0.67]]
        print(np.sum(output.data))  # 1.0
        ```

        IMPLEMENTATION HINTS:
        - Subtract max for numerical stability: x_shifted = x.data - np.max(x.data, axis=-1, keepdims=True)
        - Compute exponentials: exp_values = np.exp(x_shifted)
        - Sum along last axis: sum_exp = np.sum(exp_values, axis=-1, keepdims=True)
        - Divide: result = exp_values / sum_exp
        - Return Tensor(result)

        LEARNING CONNECTIONS:
        - This is like torch.nn.Softmax() in PyTorch
        - Used in classification output layers
        - Key component in attention mechanisms
        - Enables probability-based decision making
        """
        ### BEGIN SOLUTION
        # Subtract max for numerical stability
        x_shifted = x.data - np.max(x.data, axis=-1, keepdims=True)
        # Compute exponentials
        exp_values = np.exp(x_shifted)
        # Sum along last axis
        sum_exp = np.sum(exp_values, axis=-1, keepdims=True)
        # Divide to get probabilities
        result = exp_values / sum_exp
        return Tensor(result)
        ### END SOLUTION

    def __call__(self, x: Tensor) -> Tensor:
        """Make the class callable: softmax(x) instead of softmax.forward(x)"""
        return self.forward(x)
# %% [markdown]
"""
### 🧪 Test Your Softmax Implementation
Once you implement the Softmax forward method above, run this cell to test it:
"""
# %% nbgrader={"grade": true, "grade_id": "test-softmax-immediate", "locked": true, "points": 15, "schema_version": 3, "solution": false, "task": false}
def test_softmax_activation():
    """Test Softmax activation function"""
    print("Testing Softmax activation...")

    # Create Softmax instance
    softmax = Softmax()

    # Test with simple input
    test_input = Tensor([[1, 2, 3]])
    result = softmax(test_input)

    # Check that outputs sum to 1
    output_sum = np.sum(result.data)
    assert abs(output_sum - 1.0) < 1e-6, f"Softmax outputs should sum to 1, got {output_sum}"

    # Check that all outputs are positive
    assert np.all(result.data > 0), "Softmax outputs should be positive"
    assert np.all(result.data < 1), "Softmax outputs should be less than 1"

    # Test with uniform input (should give equal probabilities)
    uniform_input = Tensor([[1, 1, 1]])
    uniform_result = softmax(uniform_input)
    expected_prob = 1.0 / 3.0
    for prob in uniform_result.data[0]:
        assert abs(prob - expected_prob) < 1e-6, "Uniform input should give equal probabilities"

    # Test with batch input (multiple samples)
    batch_input = Tensor([[1, 2, 3], [4, 5, 6]])
    batch_result = softmax(batch_input)

    # Check that each row sums to 1
    for i in range(batch_input.shape[0]):
        row_sum = np.sum(batch_result.data[i])
        assert abs(row_sum - 1.0) < 1e-6, f"Each row should sum to 1, row {i} sums to {row_sum}"

    # Test numerical stability with large values
    large_input = Tensor([[1000, 1001, 1002]])
    large_result = softmax(large_input)
    assert not np.any(np.isnan(large_result.data)), "Softmax should handle large values"
    assert not np.any(np.isinf(large_result.data)), "Softmax should not produce inf values"
    large_sum = np.sum(large_result.data)
    assert abs(large_sum - 1.0) < 1e-6, "Large values should still sum to 1"

    # Test shape preservation
    assert batch_result.shape == batch_input.shape, "Softmax should preserve shape"

    print("✅ Softmax activation tests passed!")
    print("✅ Outputs sum to 1 (probability distribution)")
    print("✅ All outputs are positive")
    print("✅ Handles uniform inputs correctly")
    print("✅ Works with batch inputs")
    print("✅ Numerically stable with large values")
# Run the test
test_softmax_activation()
# %% [markdown]
"""
## 🎯 Integration Test: All Activations Working Together
### Real-World Scenario
Let's test how all activation functions work together in a realistic neural network scenario:
- **Input processing**: Raw data transformation
- **Hidden layers**: ReLU for internal processing
- **Output layer**: Softmax for classification
- **Comparison**: See how different activations transform the same data
"""
# %% nbgrader={"grade": true, "grade_id": "test-activations-integration", "locked": true, "points": 15, "schema_version": 3, "solution": false, "task": false}
def test_activations_integration():
    """Test all activation functions working together"""
    print("Testing activation functions integration...")

    # Create instances of all activation functions
    relu = ReLU()
    sigmoid = Sigmoid()
    tanh = Tanh()
    softmax = Softmax()

    # Test data: simulating neural network layer outputs
    test_data = Tensor([[-2, -1, 0, 1, 2]])

    # Apply each activation function
    relu_result = relu(test_data)
    sigmoid_result = sigmoid(test_data)
    tanh_result = tanh(test_data)
    softmax_result = softmax(test_data)

    # Test that all functions preserve input shape
    assert relu_result.shape == test_data.shape, "ReLU should preserve shape"
    assert sigmoid_result.shape == test_data.shape, "Sigmoid should preserve shape"
    assert tanh_result.shape == test_data.shape, "Tanh should preserve shape"
    assert softmax_result.shape == test_data.shape, "Softmax should preserve shape"

    # Test that all functions return Tensor objects
    assert isinstance(relu_result, Tensor), "ReLU should return Tensor"
    assert isinstance(sigmoid_result, Tensor), "Sigmoid should return Tensor"
    assert isinstance(tanh_result, Tensor), "Tanh should return Tensor"
    assert isinstance(softmax_result, Tensor), "Softmax should return Tensor"

    # Test ReLU properties
    assert np.all(relu_result.data >= 0), "ReLU output should be non-negative"

    # Test Sigmoid properties
    assert np.all(sigmoid_result.data > 0), "Sigmoid output should be positive"
    assert np.all(sigmoid_result.data < 1), "Sigmoid output should be less than 1"

    # Test Tanh properties
    assert np.all(tanh_result.data > -1), "Tanh output should be > -1"
    assert np.all(tanh_result.data < 1), "Tanh output should be < 1"

    # Test Softmax properties
    softmax_sum = np.sum(softmax_result.data)
    assert abs(softmax_sum - 1.0) < 1e-6, "Softmax outputs should sum to 1"

    # Test chaining activations (realistic neural network scenario)
    # Hidden layer with ReLU
    hidden_output = relu(test_data)
    # Add some weights simulation (element-wise multiplication)
    weights = Tensor([[0.5, 0.3, 0.8, 0.2, 0.7]])
    weighted_output = hidden_output * weights
    # Final layer with Softmax
    final_output = softmax(weighted_output)

    # Test that chained operations work
    assert isinstance(final_output, Tensor), "Chained operations should return Tensor"
    assert abs(np.sum(final_output.data) - 1.0) < 1e-6, "Final output should be valid probability"

    # Test with batch data (multiple samples)
    batch_data = Tensor([
        [-2, -1, 0, 1, 2],
        [1, 2, 3, 4, 5],
        [-1, 0, 1, 2, 3]
    ])
    batch_softmax = softmax(batch_data)

    # Each row should sum to 1
    for i in range(batch_data.shape[0]):
        row_sum = np.sum(batch_softmax.data[i])
        assert abs(row_sum - 1.0) < 1e-6, f"Batch row {i} should sum to 1"

    print("✅ Activation functions integration tests passed!")
    print("✅ All functions work together seamlessly")
    print("✅ Shape preservation across all activations")
    print("✅ Chained operations work correctly")
    print("✅ Batch processing works for all activations")
    print("✅ Ready for neural network integration!")
# Run the integration test
test_activations_integration()
# %% [markdown]
"""
## 🎯 Module Summary: Activation Functions Mastery!
Congratulations! You've successfully implemented all four essential activation functions:
### ✅ What You've Built
- **ReLU**: The foundation of modern deep learning with sparsity and efficiency
- **Sigmoid**: Classic activation for binary classification and probability outputs
- **Tanh**: Zero-centered activation with better gradient properties
- **Softmax**: Probability distribution for multi-class classification
### ✅ Key Learning Outcomes
- **Understanding**: Why nonlinearity is essential for neural networks
- **Implementation**: Built activation functions from scratch using NumPy
- **Testing**: Progressive validation with immediate feedback after each function
- **Integration**: Saw how activations work together in neural networks
- **Real-world context**: Understanding where each activation is used
### ✅ Mathematical Mastery
- **ReLU**: f(x) = max(0, x) - Simple but powerful
- **Sigmoid**: f(x) = 1/(1 + e^(-x)) - Maps to (0,1)
- **Tanh**: f(x) = tanh(x) - Zero-centered, maps to (-1,1)
- **Softmax**: f(x_i) = e^(x_i)/Σ(e^(x_j)) - Probability distribution
### ✅ Professional Skills Developed
- **Numerical stability**: Handling overflow and underflow
- **API design**: Consistent interfaces across all functions
- **Testing discipline**: Immediate validation after each implementation
- **Integration thinking**: Understanding how components work together
### ✅ Ready for Next Steps
Your activation functions are now ready to power:
- **Dense layers**: Linear transformations with nonlinear activations
- **Convolutional layers**: Spatial feature extraction with ReLU
- **Network architectures**: Complete neural networks with proper activations
- **Training**: Gradient computation through activation functions
### 🔗 Connection to Real ML Systems
Your implementations mirror production systems:
- **PyTorch**: `torch.nn.ReLU()`, `torch.nn.Sigmoid()`, `torch.nn.Tanh()`, `torch.nn.Softmax()`
- **TensorFlow**: `tf.nn.relu()`, `tf.nn.sigmoid()`, `tf.nn.tanh()`, `tf.nn.softmax()`
- **Industry applications**: Every major deep learning model uses these functions
### 🎯 The Power of Nonlinearity
You've unlocked the key to deep learning:
- **Before**: Linear models limited to simple patterns
- **After**: Nonlinear models can approximate any continuous function (universal approximation)
**Next Module**: Layers - Building blocks that combine your tensors and activations into powerful transformations!
Your activation functions are the key to neural network intelligence. Now let's build the layers that use them!
"""