mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-06-02 06:17:44 -05:00
- Enhanced tensor module documentation with mathematical foundations - Improved explanations for scalars, vectors, and matrices - Added NBGrader workflow documentation to activations module - Cleaned up .cursor/rules/ directory structure - Updated user preferences for better development workflow These changes improve the educational content and developer experience while maintaining the core functionality of all modules.
1589 lines
60 KiB
Python
# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#       jupytext_version: 1.17.1
# ---

# %% [markdown]
|
||
"""
|
||
# Module 2: Activations - Nonlinearity in Neural Networks
|
||
|
||
Welcome to the Activations module! This is where neural networks get their power through nonlinearity.
|
||
|
||
## Learning Goals
|
||
- Understand why activation functions are essential for neural networks
|
||
- Implement the four most important activation functions: ReLU, Sigmoid, Tanh, and Softmax
|
||
- Visualize how activations transform data and enable complex learning
|
||
- See how activations work with layers to build powerful networks
|
||
- Master the NBGrader workflow with comprehensive testing
|
||
|
||
## Build → Use → Understand
|
||
1. **Build**: Activation functions that add nonlinearity
|
||
2. **Use**: Transform tensors and see immediate results
|
||
3. **Understand**: How nonlinearity enables complex pattern learning
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "activations-imports", "locked": false, "schema_version": 3, "solution": false, "task": false}
#| default_exp core.activations

#| export
import math
import numpy as np
import matplotlib.pyplot as plt
import os
import sys
from typing import Union, List

# Import our Tensor class - try from package first, then from local module
try:
    from tinytorch.core.tensor import Tensor
except ImportError:
    # For development, import from local tensor module
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))
    from tensor_dev import Tensor

# %% nbgrader={"grade": false, "grade_id": "activations-setup", "locked": false, "schema_version": 3, "solution": false, "task": false}
#| hide
#| export
def _should_show_plots():
    """Check if we should show plots (disable during testing)"""
    # Check multiple conditions that indicate we're in test mode
    is_pytest = (
        'pytest' in sys.modules or
        'test' in sys.argv or
        os.environ.get('PYTEST_CURRENT_TEST') is not None or
        any('test' in arg for arg in sys.argv) or
        any('pytest' in arg for arg in sys.argv)
    )

    # Show plots in development mode (when not in test mode)
    return not is_pytest

# %% nbgrader={"grade": false, "grade_id": "activations-visualization", "locked": false, "schema_version": 3, "solution": false, "task": false}
#| hide
#| export
def visualize_activation_function(activation_fn, name: str, x_range: tuple = (-5, 5), num_points: int = 100):
    """Visualize an activation function's behavior"""
    if not _should_show_plots():
        return

    try:
        # Generate input values
        x_vals = np.linspace(x_range[0], x_range[1], num_points)

        # Apply activation function
        y_vals = []
        for x in x_vals:
            input_tensor = Tensor([[x]])
            output = activation_fn(input_tensor)
            y_vals.append(output.data.item())

        # Create plot
        plt.figure(figsize=(10, 6))
        plt.plot(x_vals, y_vals, 'b-', linewidth=2, label=f'{name} Activation')
        plt.grid(True, alpha=0.3)
        plt.xlabel('Input (x)')
        plt.ylabel(f'{name}(x)')
        plt.title(f'{name} Activation Function')
        plt.legend()
        plt.show()

    except ImportError:
        print(" 📊 Matplotlib not available - skipping visualization")
    except Exception as e:
        print(f" ⚠️ Visualization error: {e}")

def visualize_activation_on_data(activation_fn, name: str, data: Tensor):
    """Show activation function applied to sample data"""
    if not _should_show_plots():
        return

    try:
        output = activation_fn(data)
        print(f" 📊 {name} Example:")
        print(f" Input: {data.data.flatten()}")
        print(f" Output: {output.data.flatten()}")
        print(f" Range: [{output.data.min():.3f}, {output.data.max():.3f}]")

    except Exception as e:
        print(f" ⚠️ Data visualization error: {e}")

# %% [markdown]
|
||
"""
|
||
## 📦 Where This Code Lives in the Final Package
|
||
|
||
**Learning Side:** You work in `modules/source/02_activations/activations_dev.py`
|
||
**Building Side:** Code exports to `tinytorch.core.activations`
|
||
|
||
```python
|
||
# Final package structure:
|
||
from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax # All activations together!
|
||
from tinytorch.core.tensor import Tensor # The foundation
|
||
from tinytorch.core.layers import Dense, Conv2D # Coming next!
|
||
```
|
||
|
||
**Why this matters:**
|
||
- **Learning:** Focused modules for deep understanding
|
||
- **Production:** Proper organization like PyTorch's `torch.nn.functional`
|
||
- **Consistency:** All activation functions live together in `core.activations`
|
||
- **Integration:** Works seamlessly with tensors and layers
|
||
"""
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## 🧠 The Mathematical Foundation of Nonlinearity
|
||
|
||
### The Universal Approximation Theorem
|
||
**Key Insight:** Given enough hidden units, a neural network with a nonlinear activation function can approximate any continuous function on a bounded input domain to arbitrary accuracy!
|
||
|
||
```
|
||
Without activation: f(x) = W₃(W₂(W₁x + b₁) + b₂) + b₃ = Wx + b (still linear!)
|
||
With activation: f(x) = W₃σ(W₂σ(W₁x + b₁) + b₂) + b₃ (nonlinear!)
|
||
```
|
||
|
||
### Why Nonlinearity is Critical
|
||
- **Linear Limitations**: Without activations, any deep network collapses to a single linear transformation
|
||
- **Feature Learning**: Nonlinear functions create complex decision boundaries
|
||
- **Representation Power**: Each layer can learn different levels of abstraction
|
||
- **Biological Inspiration**: Neurons fire (activate) only above certain thresholds
|
||
|
||
### Mathematical Properties We Care About
|
||
- **Differentiability**: For gradient-based optimization
|
||
- **Computational Efficiency**: Fast forward and backward passes
|
||
- **Numerical Stability**: Avoiding overflow in the forward pass and vanishing/exploding gradients in the backward pass
|
||
- **Sparsity**: Some activations (like ReLU) produce sparse representations
|
||
|
||
### Connection to Real ML Systems
|
||
Every major framework has these same activations:
|
||
- **PyTorch**: `torch.nn.ReLU()`, `torch.nn.Sigmoid()`, etc.
|
||
- **TensorFlow**: `tf.nn.relu()`, `tf.nn.sigmoid()`, etc.
|
||
- **JAX**: `jax.nn.relu()`, `jax.nn.sigmoid()`, etc.
|
||
- **TinyTorch**: `tinytorch.core.activations.ReLU()` (what we're building!)
|
||
"""
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## Step 1: What is an Activation Function?
|
||
|
||
### Definition
|
||
An **activation function** is a mathematical function that adds nonlinearity to neural networks. It transforms the output of a layer before passing it to the next layer.
|
||
|
||
### The Fundamental Problem: Why We Need Nonlinearity
|
||
|
||
#### **The Linear Limitation**
|
||
Without activation functions, neural networks are just linear transformations:
|
||
|
||
```python
|
||
# Without activation functions:
|
||
layer1 = W1 @ x + b1 # Linear transformation
|
||
layer2 = W2 @ layer1 + b2 # Another linear transformation
|
||
layer3 = W3 @ layer2 + b3 # Yet another linear transformation
|
||
|
||
# This is equivalent to:
|
||
final_output = (W3 @ W2 @ W1) @ x + (W3 @ W2 @ b1 + W3 @ b2 + b3)
|
||
# = W_combined @ x + b_combined
|
||
# Still just one linear transformation!
|
||
```
|
||
|
||
**No matter how many layers you stack, without activation functions, you can only learn linear relationships.**
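
The collapse can be checked numerically. The sketch below uses plain NumPy rather than the TinyTorch `Tensor`, and the layer shapes (4 → 5 → 3 → 2) and random weights are hypothetical values chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer shapes, chosen only for illustration: 4 -> 5 -> 3 -> 2
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)
W3, b3 = rng.normal(size=(2, 3)), rng.normal(size=2)
x = rng.normal(size=4)

# Three stacked linear layers with no activation in between
stacked = W3 @ (W2 @ (W1 @ x + b1) + b2) + b3

# The same map written as a single linear layer
W_combined = W3 @ W2 @ W1
b_combined = W3 @ W2 @ b1 + W3 @ b2 + b3
single = W_combined @ x + b_combined

collapses = np.allclose(stacked, single)
print(collapses)
```

The combined weight matrix has shape (2, 4): the two intermediate layers leave no trace in the final map.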
|
||
|
||
#### **The Nonlinearity Solution**
|
||
Activation functions break this linearity:
|
||
|
||
```python
|
||
# With activation functions:
|
||
layer1 = activation(W1 @ x + b1) # Nonlinear transformation
|
||
layer2 = activation(W2 @ layer1 + b2) # Another nonlinear transformation
|
||
layer3 = activation(W3 @ layer2 + b3) # Complex nonlinear composition
|
||
|
||
# This can approximate any continuous function!
|
||
```
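
As a concrete case, XOR is not linearly separable, yet a single hidden layer of two ReLU units computes it exactly. The weights below are hand-picked for illustration (not learned, and not taken from TinyTorch), and the sketch uses plain NumPy:

```python
import numpy as np

def relu(z):
    # Element-wise max(0, z)
    return np.maximum(0, z)

# All four XOR inputs, one per row
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-picked (not learned) weights for a 2-unit hidden layer
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
w2 = np.array([1.0, -2.0])

hidden = relu(X @ W1.T + b1)   # h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
xor_out = hidden @ w2          # y = h1 - 2 * h2

print(xor_out)  # [0. 1. 1. 0.]
```

Removing the `relu` call turns `xor_out` back into a linear function of the inputs, which cannot produce this pattern.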
|
||
|
||
### Biological Inspiration: How Neurons Really Work
|
||
|
||
#### **The Biological Neuron**
|
||
Real neurons in the brain exhibit nonlinear behavior:
|
||
|
||
1. **Threshold behavior**: Neurons fire only when input exceeds a threshold
|
||
2. **Saturation**: Neurons have maximum firing rates
|
||
3. **Sparsity**: Most neurons are inactive most of the time
|
||
4. **Adaptation**: Neurons adjust their sensitivity over time
|
||
|
||
#### **Activation Functions as Neuron Models**
|
||
- **ReLU**: Models threshold behavior (fire or don't fire)
|
||
- **Sigmoid**: Models saturation (smooth transition from inactive to active)
|
||
- **Tanh**: Models bipolar neurons (inhibitory and excitatory)
|
||
- **Softmax**: Models competition between neurons (winner-take-all)
|
||
|
||
### Mathematical Foundation: The Universal Approximation Theorem
|
||
|
||
#### **The Theorem**
|
||
**Any continuous function on a bounded input domain can be approximated to any desired accuracy by a neural network with:**
|
||
- **One hidden layer**
|
||
- **Enough neurons**
|
||
- **Nonlinear activation functions**
|
||
|
||
#### **Why This Matters**
|
||
This theorem shows that neural networks with nonlinear activations can, in principle, represent mappings such as (it does not guarantee that training will find them):
|
||
- **Image recognition**: Mapping pixels to object classes
|
||
- **Language understanding**: Mapping words to meanings
|
||
- **Game playing**: Mapping board states to optimal moves
|
||
- **Scientific modeling**: Mapping inputs to complex phenomena
|
||
|
||
#### **The Catch**
|
||
- **"Enough neurons"** might be exponentially large
|
||
- **Deep networks** can approximate the same functions with fewer neurons
|
||
- **Nonlinearity is essential** - linear networks can't do this
|
||
|
||
### Real-World Impact: What Nonlinearity Enables
|
||
|
||
#### **Computer Vision**
|
||
```python
|
||
# Linear model: Can only learn linear classifiers
|
||
# "Is this a cat?" → Only works if cats are linearly separable from dogs
|
||
# Reality: Cats and dogs are NOT linearly separable in pixel space!
|
||
|
||
# Nonlinear model: Can learn complex decision boundaries
|
||
# "Is this a cat?" → Can learn fur patterns, ear shapes, eye positions
|
||
# Reality: Deep networks with ReLU can distinguish thousands of objects
|
||
```
|
||
|
||
#### **Natural Language Processing**
|
||
```python
|
||
# Linear model: Can only learn word co-occurrence
|
||
# "The movie was great" → Linear combination of word vectors
|
||
# Problem: "The movie was not great" looks similar to linear model
|
||
|
||
# Nonlinear model: Can understand context and negation
|
||
# "The movie was great" vs "The movie was not great"
|
||
# Solution: Transformers with nonlinear feedforward layers
|
||
```
|
||
|
||
#### **Game Playing**
|
||
```python
|
||
# Linear model: Can only learn linear strategies
|
||
# Chess position → Linear combination of piece values
|
||
# Problem: Chess strategy is highly nonlinear (tactics, combinations)
|
||
|
||
# Nonlinear model: Can learn complex strategies
|
||
# Chess position → Deep evaluation of patterns and tactics
|
||
# Success: AlphaZero uses deep networks with ReLU
|
||
```
|
||
|
||
### Activation Function Properties: What Makes Them Work
|
||
|
||
#### **1. Nonlinearity (Essential)**
|
||
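- **Definition**: For some inputs x, y and scalars a, b: f(ax + by) ≠ af(x) + bf(y)
- **Why crucial**: Enables complex function approximation
- **Example**: ReLU(1 + (-1)) = 0, but ReLU(1) + ReLU(-1) = 1 (scaling by a positive constant is preserved: ReLU(2x) = 2×ReLU(x) for every x)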
|
||
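
A linear function must satisfy both additivity and scaling for every input. The minimal NumPy sketch below (the helper name `relu` is illustrative, not the class built later in this module) shows ReLU failing each test on a simple input:

```python
import numpy as np

def relu(z):
    # Element-wise max(0, z)
    return np.maximum(0, z)

x, y = 1.0, -1.0

# Additivity fails: relu(x + y) differs from relu(x) + relu(y)
sum_then_relu = relu(x + y)          # relu(0.0) = 0.0
relu_then_sum = relu(x) + relu(y)    # 1.0 + 0.0 = 1.0

# Scaling by a negative constant fails too: relu(-x) differs from -relu(x)
negate_then_relu = relu(-x)          # relu(-1.0) = 0.0
relu_then_negate = -relu(x)          # -1.0

print(sum_then_relu, relu_then_sum)
print(negate_then_relu, relu_then_negate)
```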
|
||
#### **2. Differentiability (Important)**
|
||
- **Definition**: Function has well-defined derivatives
|
||
- **Why important**: Enables gradient-based optimization
|
||
- **Trade-off**: ReLU is not differentiable at 0, but works well in practice
|
||
|
||
#### **3. Computational Efficiency (Practical)**
|
||
- **Definition**: Fast to compute forward and backward passes
|
||
- **Why important**: Training speed and inference speed
|
||
- **Example**: ReLU is faster than sigmoid (no exponentials)
|
||
|
||
#### **4. Gradient Properties (Critical)**
|
||
- **Vanishing gradients**: Derivatives approach 0 (sigmoid, tanh)
|
||
- **Exploding gradients**: Derivatives grow exponentially (rare)
|
||
- **Gradient preservation**: Derivatives stay reasonable (ReLU)
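
These gradient properties can be seen by evaluating the analytic derivatives at a few points. This is a plain NumPy sketch with illustrative sample points, independent of the TinyTorch classes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])

# Analytic derivatives of each activation
d_sigmoid = sigmoid(x) * (1.0 - sigmoid(x))   # peaks at 0.25 when x = 0
d_tanh = 1.0 - np.tanh(x) ** 2                # peaks at 1.0 when x = 0
d_relu = (x > 0).astype(float)                # 1 for x > 0, else 0

for name, grad in [("sigmoid", d_sigmoid), ("tanh", d_tanh), ("relu", d_relu)]:
    print(name, np.round(grad, 5))
```

Because the sigmoid derivative never exceeds 0.25, a gradient passing through ten sigmoid layers is scaled by at most 0.25^10 ≈ 9.5e-7 (ignoring the weight matrices), which is the vanishing gradient effect in miniature.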
|
||
|
||
#### **5. Output Range (Application-dependent)**
|
||
- **Bounded**: Output in fixed range (sigmoid: (0,1), tanh: (-1,1))
|
||
- **Unbounded**: Output can be any value (ReLU: [0,∞))
|
||
- **Probabilistic**: Output sums to 1 (softmax)
|
||
|
||
### The Four Fundamental Activation Functions
|
||
|
||
#### **1. ReLU (Rectified Linear Unit)**
|
||
- **Formula**: $f(x) = \max(0, x)$
|
||
- **Use case**: Hidden layers in most networks
|
||
- **Advantages**: Simple, fast, gradient does not vanish for positive inputs
|
||
- **Disadvantages**: "Dead neurons" problem
|
||
|
||
#### **2. Sigmoid**
|
||
- **Formula**: $f(x) = \frac{1}{1 + e^{-x}}$
|
||
- **Use case**: Binary classification output
|
||
- **Advantages**: Smooth, probabilistic interpretation
|
||
- **Disadvantages**: Vanishing gradients for large |x|, costlier than ReLU (requires an exponential)
|
||
|
||
#### **3. Tanh (Hyperbolic Tangent)**
|
||
- **Formula**: $f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
|
||
- **Use case**: Hidden layers (better than sigmoid)
|
||
- **Advantages**: Zero-centered, stronger gradients than sigmoid
|
||
- **Disadvantages**: Still suffers from vanishing gradients
|
||
|
||
#### **4. Softmax**
|
||
- **Formula**: $f(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$
|
||
- **Use case**: Multi-class classification output
|
||
- **Advantages**: Probabilistic, sums to 1
|
||
- **Disadvantages**: Computationally expensive, can saturate
|
||
|
||
### Modern Activation Function Evolution
|
||
|
||
#### **Historical Timeline**
|
||
1. **1943**: Threshold functions (McCulloch-Pitts neurons)
|
||
2. **1950s-1960s**: Step functions (perceptrons)
3. **1980s-1990s**: Sigmoid and tanh functions (backpropagation era)
|
||
4. **2010s**: ReLU revolution (deep learning breakthrough)
|
||
5. **Late 2010s onward**: Smooth variants (GELU, Swish, Mish)
|
||
|
||
#### **Why ReLU Won**
|
||
- **Simplicity**: Just max(0, x)
|
||
- **Speed**: No exponentials or divisions
|
||
- **Gradients**: No saturation for positive inputs, which eases the vanishing gradient problem
|
||
- **Sparsity**: Creates sparse representations
|
||
- **Empirical success**: Works well in practice
|
||
|
||
### Connection to Previous Modules
|
||
|
||
#### **From Module 1 (Tensor)**
|
||
- **Input**: Tensors from previous layers
|
||
- **Output**: Transformed tensors for next layers
|
||
- **Operations**: Element-wise transformations
|
||
|
||
#### **To Module 3 (Layers)**
|
||
- **Integration**: Layers + activations = nonlinear transformations
|
||
- **Composition**: Stack layers with activations for deep networks
|
||
- **Design**: Choose activation based on layer purpose
|
||
|
||
### Visual Analogy: The Activation Function Zoo
|
||
|
||
Think of activation functions as different types of **signal processors**:
|
||
|
||
- **ReLU**: One-way valve (blocks negative, passes positive)
|
||
- **Sigmoid**: Volume knob (smoothly adjusts from 0 to 1)
|
||
- **Tanh**: Balanced amplifier (amplifies around 0, saturates at extremes)
|
||
- **Softmax**: Probability distributor (converts scores to probabilities)
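
To see the four "signal processors" side by side before implementing them, the sketch below applies each formula to the same input using plain NumPy. The input values are arbitrary, and these one-liners are a preview rather than the TinyTorch implementations:

```python
import numpy as np

scores = np.array([-2.0, 0.0, 1.0, 3.0])

relu_out = np.maximum(0, scores)
sigmoid_out = 1.0 / (1.0 + np.exp(-scores))
tanh_out = np.tanh(scores)

# Softmax: shift by the max before exponentiating, then normalize
exp_shifted = np.exp(scores - scores.max())
softmax_out = exp_shifted / exp_shifted.sum()

for name, out in [("relu", relu_out), ("sigmoid", sigmoid_out),
                  ("tanh", tanh_out), ("softmax", softmax_out)]:
    print(name, np.round(out, 3))
```

ReLU, Sigmoid, and Tanh act on each element independently, while Softmax couples all elements through the shared normalizing sum.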
|
||
|
||
Let's implement these essential nonlinear functions!
|
||
"""
|
||
|
||
# %% [markdown]
|
||
"""
|
||
## Step 2: ReLU - The Workhorse of Deep Learning
|
||
|
||
### What is ReLU?
|
||
**ReLU (Rectified Linear Unit)** is the most popular activation function in deep learning.
|
||
|
||
**Mathematical Definition:**
|
||
```
|
||
f(x) = max(0, x)
|
||
```
|
||
|
||
**In Plain English:**
|
||
- If input is positive → pass it through unchanged
|
||
- If input is negative → output zero
|
||
|
||
### Why ReLU is Popular
|
||
1. **Simple**: Easy to compute and understand
|
||
2. **Fast**: No expensive operations (no exponentials)
|
||
3. **Sparse**: Outputs many zeros, creating sparse representations
|
||
4. **Gradient-friendly**: Gradient is either 0 or 1 (no vanishing gradient for positive inputs)
|
||
|
||
### Real-World Analogy
|
||
ReLU is like a **one-way valve** - it only lets positive "pressure" through, blocking negative values completely.
|
||
|
||
### When to Use ReLU
|
||
- **Hidden layers** in most neural networks (the usual default choice)
- **Convolutional layers** in image processing (CNNs)
- **When you want sparse activations** (many zeros)
- **Deep networks** (the gradient does not shrink for positive inputs)
|
||
|
||
### Real-World Applications
|
||
- **Image Classification**: ResNet, VGG, AlexNet all use ReLU
|
||
- **Object Detection**: YOLO, R-CNN use ReLU in backbone networks
|
||
- **Natural Language Processing**: The original Transformer uses ReLU in its feedforward layers (many later models use GELU)
|
||
- **Recommendation Systems**: Deep collaborative filtering with ReLU
|
||
|
||
### Mathematical Properties
|
||
- **Derivative**: f'(x) = 1 if x > 0, 0 if x < 0 (undefined at x = 0; implementations conventionally use 0)
|
||
- **Range**: [0, ∞)
|
||
- **Sparsity**: Outputs exactly 0 for negative inputs
|
||
- **Computational Cost**: O(1) per element - just a max operation
|
||
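
The sparsity and derivative properties can be observed on random data. The sketch below assumes pre-activations drawn from a standard normal distribution, which is an illustrative choice and not something the module prescribes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-activations: 10,000 draws from a standard normal
pre_activations = rng.normal(size=10_000)

activations = np.maximum(0, pre_activations)
local_grad = (pre_activations > 0).astype(float)   # f'(x), using 0 at x = 0

sparsity = np.mean(activations == 0)               # fraction of exact zeros
print(round(float(sparsity), 3))
```

With inputs symmetric around zero, roughly half of the outputs are exactly zero, and the gradient is zero at exactly those positions.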
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "relu-class", "locked": false, "schema_version": 3, "solution": true, "task": false}
#| export
class ReLU:
    """
    ReLU Activation Function: f(x) = max(0, x)

    The most popular activation function in deep learning.
    Simple, fast, and effective for most applications.
    """

    def forward(self, x: Tensor) -> Tensor:
        """
        Apply ReLU activation: f(x) = max(0, x)

        TODO: Implement ReLU activation

        APPROACH:
        1. For each element in the input tensor, apply max(0, element)
        2. Return a new Tensor with the results

        EXAMPLE:
        Input: Tensor([[-1, 0, 1, 2, -3]])
        Expected: Tensor([[0, 0, 1, 2, 0]])

        HINTS:
        - Use np.maximum(0, x.data) for element-wise max
        - Remember to return a new Tensor object
        - The shape should remain the same as input
        """
        ### BEGIN SOLUTION
        result = np.maximum(0, x.data)
        return Tensor(result)
        ### END SOLUTION

    def __call__(self, x: Tensor) -> Tensor:
        """Make the class callable: relu(x) instead of relu.forward(x)"""
        return self.forward(x)

# %% [markdown]
|
||
"""
|
||
### 🧪 Unit Test: ReLU Activation
|
||
|
||
Let's test your ReLU implementation right away! This gives you immediate feedback on whether your activation function works correctly.
|
||
|
||
**This is a unit test** - it tests one specific activation function (ReLU) in isolation.
|
||
"""
|
||
|
||
# %% nbgrader={"grade": true, "grade_id": "test-relu-immediate", "locked": true, "points": 5, "schema_version": 3, "solution": false, "task": false}
# Test ReLU activation immediately after implementation
print("🔬 Unit Test: ReLU Activation...")

# Create ReLU instance
relu = ReLU()

# Test with mixed positive/negative values
try:
    test_input = Tensor([[-2, -1, 0, 1, 2]])
    result = relu(test_input)
    expected = np.array([[0, 0, 0, 1, 2]])

    assert np.array_equal(result.data, expected), f"ReLU failed: expected {expected}, got {result.data}"
    print(f"✅ ReLU test: input {test_input.data} → output {result.data}")

    # Test that negative values become zero
    assert np.all(result.data >= 0), "ReLU should make all negative values zero"
    print("✅ ReLU correctly zeros negative values")

    # Test that positive values remain unchanged
    positive_input = Tensor([[1, 2, 3, 4, 5]])
    positive_result = relu(positive_input)
    assert np.array_equal(positive_result.data, positive_input.data), "ReLU should preserve positive values"
    print("✅ ReLU preserves positive values")

except Exception as e:
    print(f"❌ ReLU test failed: {e}")
    raise

# Show visual example
print("🎯 ReLU behavior:")
print(" Negative → 0 (blocked)")
print(" Zero → 0 (blocked)")
print(" Positive → unchanged (passed through)")
print("📈 Progress: ReLU ✓")

# %% [markdown]
|
||
"""
|
||
## Step 3: Sigmoid - The Smooth Squasher
|
||
|
||
### What is Sigmoid?
|
||
**Sigmoid** is a smooth S-shaped function that squashes inputs to the range (0, 1).
|
||
|
||
**Mathematical Definition:**
|
||
```
|
||
f(x) = 1 / (1 + e^(-x))
|
||
```
|
||
|
||
**Properties:**
|
||
- **Range**: (0, 1) - never exactly 0 or 1
|
||
- **Smooth**: Differentiable everywhere
|
||
- **Monotonic**: Always increasing
|
||
- **Centered**: Around 0.5
|
||
|
||
### Why Sigmoid is Useful
|
||
1. **Probabilistic**: Output can be interpreted as probabilities
|
||
2. **Bounded**: Output is always between 0 and 1
|
||
3. **Smooth**: Good for gradient-based optimization
|
||
4. **Historical**: Was the standard before ReLU
|
||
|
||
### Real-World Analogy
|
||
Sigmoid is like a **soft switch** - it gradually turns on as input increases, unlike ReLU's hard cutoff.
|
||
|
||
### Real-World Applications
|
||
- **Binary Classification**: Final layer for yes/no decisions (spam detection, medical diagnosis)
|
||
- **Logistic Regression**: The classic ML algorithm uses sigmoid
|
||
- **Gating Mechanisms**: Gates in LSTM and GRU cells
|
||
- **Probability Estimation**: When you need outputs between 0 and 1
|
||
|
||
### Mathematical Properties
|
||
- **Derivative**: f'(x) = f(x)(1 - f(x)) - elegant and efficient!
|
||
- **Range**: (0, 1) - never exactly 0 or 1
|
||
- **Symmetry**: Sigmoid(-x) = 1 - Sigmoid(x), so Sigmoid(0) = 0.5 (centered)
|
||
- **Saturation**: Gradients approach 0 for large |x| (vanishing gradient problem)
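
The derivative identity f'(x) = f(x)(1 - f(x)) can be verified against a finite-difference estimate. This is a NumPy sketch; the step size `h` and the sample range are illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-6, 6, 25)
h = 1e-5

# Central finite difference as an independent estimate of f'(x)
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid(x) * (1.0 - sigmoid(x))

max_error = np.max(np.abs(numeric - analytic))
print(max_error < 1e-8)
```

The practical benefit is that the backward pass can reuse the forward output f(x) instead of computing another exponential.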
|
||
|
||
### When to Use Sigmoid
|
||
- **Binary classification** (output layer)
|
||
- **Gates** in LSTM/GRU networks
|
||
- **When you need probabilistic outputs**
|
||
- **Avoid in deep networks** (vanishing gradients)
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "sigmoid-class", "locked": false, "schema_version": 3, "solution": true, "task": false}
#| export
class Sigmoid:
    """
    Sigmoid Activation Function: f(x) = 1 / (1 + e^(-x))

    Smooth S-shaped function that squashes inputs to (0, 1).
    Useful for binary classification and probabilistic outputs.
    """

    def forward(self, x: Tensor) -> Tensor:
        """
        Apply Sigmoid activation: f(x) = 1 / (1 + e^(-x))

        TODO: Implement Sigmoid activation with numerical stability

        APPROACH:
        1. Clip input values to prevent overflow (e.g., between -500 and 500)
        2. Apply the sigmoid formula: 1 / (1 + exp(-x))
        3. Return a new Tensor with the results

        EXAMPLE:
        Input: Tensor([[-2, 0, 2]])
        Expected: Tensor([[0.119, 0.5, 0.881]]) (approximately)

        HINTS:
        - Use np.clip(x.data, -500, 500) for numerical stability
        - Use np.exp() for the exponential function
        - Be careful with very large/small inputs to avoid overflow
        """
        ### BEGIN SOLUTION
        # Clip for numerical stability
        clipped = np.clip(x.data, -500, 500)
        result = 1 / (1 + np.exp(-clipped))
        return Tensor(result)
        ### END SOLUTION

    def __call__(self, x: Tensor) -> Tensor:
        """Make the class callable: sigmoid(x) instead of sigmoid.forward(x)"""
        return self.forward(x)

# %% [markdown]
|
||
"""
|
||
### 🧪 Unit Test: Sigmoid Activation
|
||
|
||
Let's test your Sigmoid implementation! This should squash all values to the range (0, 1).
|
||
|
||
**This is a unit test** - it tests one specific activation function (Sigmoid) in isolation.
|
||
"""
|
||
|
||
# %% nbgrader={"grade": true, "grade_id": "test-sigmoid-immediate", "locked": true, "points": 5, "schema_version": 3, "solution": false, "task": false}
# Test Sigmoid activation immediately after implementation
print("🔬 Unit Test: Sigmoid Activation...")

# Create Sigmoid instance
sigmoid = Sigmoid()

# Test with various inputs
try:
    test_input = Tensor([[-2, -1, 0, 1, 2]])
    result = sigmoid(test_input)

    # Check that all outputs are between 0 and 1
    assert np.all(result.data > 0), "Sigmoid outputs should be > 0"
    assert np.all(result.data < 1), "Sigmoid outputs should be < 1"
    print(f"✅ Sigmoid test: input {test_input.data} → output {result.data}")

    # Test specific values
    zero_input = Tensor([[0]])
    zero_result = sigmoid(zero_input)
    assert np.allclose(zero_result.data, 0.5, atol=1e-6), f"Sigmoid(0) should be 0.5, got {zero_result.data}"
    print("✅ Sigmoid(0) = 0.5 (correct)")

    # Test that it's monotonic (larger inputs give larger outputs)
    small_input = Tensor([[-1]])
    large_input = Tensor([[1]])
    small_result = sigmoid(small_input)
    large_result = sigmoid(large_input)
    assert small_result.data < large_result.data, "Sigmoid should be monotonic"
    print("✅ Sigmoid is monotonic (increasing)")

except Exception as e:
    print(f"❌ Sigmoid test failed: {e}")
    raise

# Show visual example
print("🎯 Sigmoid behavior:")
print(" Large negative → approaches 0")
print(" Zero → 0.5")
print(" Large positive → approaches 1")
print("📈 Progress: ReLU ✓, Sigmoid ✓")

# %% [markdown]
|
||
"""
|
||
## Step 4: Tanh - The Zero-Centered Squasher
|
||
|
||
### What is Tanh?
|
||
**Tanh (Hyperbolic Tangent)** is similar to Sigmoid but centered around zero.
|
||
|
||
**Mathematical Definition:**
|
||
```
|
||
f(x) = tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
|
||
```
|
||
|
||
**Properties:**
|
||
- **Range**: (-1, 1) - symmetric around zero
|
||
- **Zero-centered**: tanh(0) = 0 and tanh(-x) = -tanh(x), so outputs are balanced around zero
- **Smooth**: Differentiable everywhere
- **Stronger gradients**: Maximum slope of 1 at x = 0, versus 0.25 for sigmoid
|
||
|
||
### Why Tanh is Useful
|
||
1. **Zero-centered**: Better for training (gradients don't all have same sign)
|
||
2. **Symmetric**: Treats positive and negative inputs equally
|
||
3. **Stronger gradients**: Can help with training dynamics
|
||
4. **Bounded**: Output is always between -1 and 1
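
Tanh is a rescaled and shifted sigmoid, tanh(x) = 2·sigmoid(2x) - 1, which explains both its zero-centering and its steeper slope. A NumPy sketch of that relationship, with an illustrative sample range:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-4, 4, 17)

# Identity: tanh(x) = 2 * sigmoid(2x) - 1
rescaled_sigmoid = 2.0 * sigmoid(2.0 * x) - 1.0
identity_holds = np.allclose(np.tanh(x), rescaled_sigmoid)

# Slopes at the origin: tanh'(0) = 1, sigmoid'(0) = 0.25
tanh_slope_at_0 = 1.0 - np.tanh(0.0) ** 2
sigmoid_slope_at_0 = sigmoid(0.0) * (1.0 - sigmoid(0.0))

print(identity_holds, tanh_slope_at_0, sigmoid_slope_at_0)
```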
|
||
|
||
### Real-World Analogy
|
||
Tanh is like a **balanced scale** - it can tip positive or negative, with zero as the neutral point.
|
||
|
||
### When to Use Tanh
|
||
- **Hidden layers** (alternative to ReLU)
|
||
- **RNNs** (traditional choice)
|
||
- **When you need zero-centered outputs**
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "tanh-class", "locked": false, "schema_version": 3, "solution": true, "task": false}
#| export
class Tanh:
    """
    Tanh Activation Function: f(x) = tanh(x)

    Zero-centered S-shaped function that squashes inputs to (-1, 1).
    Better than sigmoid for hidden layers due to zero-centered outputs.
    """

    def forward(self, x: Tensor) -> Tensor:
        """
        Apply Tanh activation: f(x) = tanh(x)

        TODO: Implement Tanh activation

        APPROACH:
        1. Use NumPy's tanh function for numerical stability
        2. Apply to the tensor data
        3. Return a new Tensor with the results

        EXAMPLE:
        Input: Tensor([[-2, 0, 2]])
        Expected: Tensor([[-0.964, 0.0, 0.964]]) (approximately)

        HINTS:
        - Use np.tanh(x.data) - NumPy handles the math
        - Much simpler than implementing the formula manually
        - NumPy's tanh is numerically stable
        """
        ### BEGIN SOLUTION
        result = np.tanh(x.data)
        return Tensor(result)
        ### END SOLUTION

    def __call__(self, x: Tensor) -> Tensor:
        """Make the class callable: tanh(x) instead of tanh.forward(x)"""
        return self.forward(x)

# %% [markdown]
|
||
"""
|
||
### 🧪 Unit Test: Tanh Activation
|
||
|
||
Let's test your Tanh implementation! This should squash all values to the range (-1, 1) and be zero-centered.
|
||
|
||
**This is a unit test** - it tests one specific activation function (Tanh) in isolation.
|
||
"""
|
||
|
||
# %% nbgrader={"grade": true, "grade_id": "test-tanh-immediate", "locked": true, "points": 5, "schema_version": 3, "solution": false, "task": false}
# Test Tanh activation immediately after implementation
print("🔬 Unit Test: Tanh Activation...")

# Create Tanh instance
tanh = Tanh()

# Test with various inputs
try:
    test_input = Tensor([[-2, -1, 0, 1, 2]])
    result = tanh(test_input)

    # Check that all outputs are between -1 and 1
    assert np.all(result.data > -1), "Tanh outputs should be > -1"
    assert np.all(result.data < 1), "Tanh outputs should be < 1"
    print(f"✅ Tanh test: input {test_input.data} → output {result.data}")

    # Test specific values
    zero_input = Tensor([[0]])
    zero_result = tanh(zero_input)
    assert np.allclose(zero_result.data, 0.0, atol=1e-6), f"Tanh(0) should be 0.0, got {zero_result.data}"
    print("✅ Tanh(0) = 0.0 (zero-centered)")

    # Test symmetry: tanh(-x) = -tanh(x)
    pos_input = Tensor([[1]])
    neg_input = Tensor([[-1]])
    pos_result = tanh(pos_input)
    neg_result = tanh(neg_input)
    assert np.allclose(pos_result.data, -neg_result.data, atol=1e-6), "Tanh should be symmetric"
    print("✅ Tanh is symmetric: tanh(-x) = -tanh(x)")

except Exception as e:
    print(f"❌ Tanh test failed: {e}")
    raise

# Show visual example
print("🎯 Tanh behavior:")
print(" Large negative → approaches -1")
print(" Zero → 0.0 (zero-centered)")
print(" Large positive → approaches 1")
print("📈 Progress: ReLU ✓, Sigmoid ✓, Tanh ✓")

# %% [markdown]
|
||
"""
|
||
## Step 5: Softmax - The Probability Converter
|
||
|
||
### What is Softmax?
|
||
**Softmax** converts a vector of numbers into a probability distribution.
|
||
|
||
**Mathematical Definition:**
|
||
```
|
||
f(x_i) = e^(x_i) / Σ_j e^(x_j)    (sum over all elements j)
|
||
```
|
||
|
||
**Properties:**
|
||
- **Probabilities**: All outputs sum to 1
|
||
- **Non-negative**: All outputs are > 0 mathematically (very small values can underflow to 0 in floating point)
|
||
- **Differentiable**: Smooth everywhere
|
||
- **Competitive**: Amplifies differences between inputs
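
One more property matters for implementation: softmax is unchanged when the same constant is added to every input, which is what makes the usual "subtract the max" trick safe. The sketch below uses a standalone NumPy helper named `softmax` (illustrative, not the TinyTorch class) and an arbitrary offset of 1000:

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max so the largest exponent is exp(0) = 1
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp_shifted = np.exp(shifted)
    return exp_shifted / np.sum(exp_shifted, axis=-1, keepdims=True)

small = np.array([[1.0, 2.0, 3.0]])
large = small + 1000.0   # np.exp(1001.0) alone would overflow float64 to inf

probs_small = softmax(small)
probs_large = softmax(large)

print(np.round(probs_small, 3))
print(np.allclose(probs_small, probs_large))
```

Without the shift, the large inputs would produce `inf / inf = nan`; with it, both inputs give the same distribution.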
|
||
|
||
### Why Softmax is Essential
|
||
1. **Multi-class classification**: Converts logits to probabilities
|
||
2. **Attention mechanisms**: Focuses on important elements
|
||
3. **Interpretable**: Output can be understood as confidence
|
||
4. **Competitive**: Emphasizes the largest input
|
||
|
||
### Real-World Analogy
|
||
Softmax is like **dividing a pie** - it takes any set of numbers and converts them into slices that sum to 100%.
|
||
|
||
### When to Use Softmax
|
||
- **Multi-class classification** (output layer)
|
||
- **Attention mechanisms** in transformers
|
||
- **When you need probability distributions**
|
||
"""
|
||
|
||
# %% nbgrader={"grade": false, "grade_id": "softmax-class", "locked": false, "schema_version": 3, "solution": true, "task": false}
#| export
class Softmax:
    """
    Softmax Activation Function: f(x_i) = e^(x_i) / Σ(e^(x_j))

    Converts a vector of numbers into a probability distribution.
    Essential for multi-class classification and attention mechanisms.
    """

    def forward(self, x: Tensor) -> Tensor:
        """
        Apply Softmax activation: f(x_i) = e^(x_i) / Σ(e^(x_j))

        TODO: Implement Softmax activation with numerical stability

        APPROACH:
        1. Subtract max value from inputs for numerical stability
        2. Compute exponentials: e^(x_i - max)
        3. Divide by sum of exponentials
        4. Return a new Tensor with the results

        EXAMPLE:
        Input: Tensor([[1, 2, 3]])
        Expected: Tensor([[0.09, 0.24, 0.67]]) (approximately, sums to 1)

        HINTS:
        - Use np.max(x.data, axis=-1, keepdims=True) for stability
        - Use np.exp() for exponentials
        - Use np.sum() for the denominator
        - Make sure the result sums to 1 along the last axis
        """
        ### BEGIN SOLUTION
        # Subtract max for numerical stability
        x_max = np.max(x.data, axis=-1, keepdims=True)
        x_shifted = x.data - x_max

        # Compute softmax
        exp_x = np.exp(x_shifted)
        sum_exp = np.sum(exp_x, axis=-1, keepdims=True)
        result = exp_x / sum_exp

        return Tensor(result)
        ### END SOLUTION

    def __call__(self, x: Tensor) -> Tensor:
        """Make the class callable: softmax(x) instead of softmax.forward(x)"""
        return self.forward(x)

# %% [markdown]
"""
### 🧪 Unit Test: Softmax Activation

Let's test your Softmax implementation! This should convert any vector into a probability distribution that sums to 1.

**This is a unit test** - it tests one specific activation function (Softmax) in isolation.
"""

# %% nbgrader={"grade": true, "grade_id": "test-softmax-immediate", "locked": true, "points": 5, "schema_version": 3, "solution": false, "task": false}
# Test Softmax activation immediately after implementation
print("🔬 Unit Test: Softmax Activation...")

# Create Softmax instance
softmax = Softmax()

# Test with various inputs
try:
    test_input = Tensor([[1, 2, 3]])
    result = softmax(test_input)

    # Check that all outputs are non-negative
    assert np.all(result.data >= 0), "Softmax outputs should be non-negative"
    print(f"✅ Softmax test: input {test_input.data} → output {result.data}")

    # Check that outputs sum to 1
    sum_result = np.sum(result.data)
    assert np.allclose(sum_result, 1.0, atol=1e-6), f"Softmax should sum to 1, got {sum_result}"
    print(f"✅ Softmax sums to 1: {sum_result:.6f}")

    # Test that larger inputs get higher probabilities
    large_input = Tensor([[1, 2, 5]])  # 5 should get the highest probability
    large_result = softmax(large_input)
    max_idx = np.argmax(large_result.data)
    assert max_idx == 2, f"Largest input should get highest probability, got max at index {max_idx}"
    print("✅ Softmax gives highest probability to largest input")

    # Test numerical stability with large numbers
    stable_input = Tensor([[1000, 1001, 1002]])
    stable_result = softmax(stable_input)
    assert not np.any(np.isnan(stable_result.data)), "Softmax should be numerically stable"
    assert np.allclose(np.sum(stable_result.data), 1.0, atol=1e-6), "Softmax should still sum to 1 with large inputs"
    print("✅ Softmax is numerically stable with large inputs")

except Exception as e:
    print(f"❌ Softmax test failed: {e}")
    raise

# Show visual example
print("🎯 Softmax behavior:")
print(" Converts any vector → probability distribution")
print(" All outputs ≥ 0, sum = 1")
print(" Larger inputs → higher probabilities")
print("📈 Progress: ReLU ✓, Sigmoid ✓, Tanh ✓, Softmax ✓")
print("🚀 All activation functions ready!")

# %% [markdown]
"""
### 🧪 Test Your Activation Functions

Once you implement the activation functions above, run these cells to test them:
"""

# %% nbgrader={"grade": true, "grade_id": "test-relu", "locked": true, "points": 20, "schema_version": 3, "solution": false, "task": false}
# Test ReLU activation
print("Testing ReLU activation...")

relu = ReLU()

# Test basic functionality
input_tensor = Tensor([[-2, -1, 0, 1, 2]])
output = relu(input_tensor)
expected = np.array([[0, 0, 0, 1, 2]])
assert np.array_equal(output.data, expected), f"ReLU failed: expected {expected}, got {output.data}"

# Test with matrix
matrix_input = Tensor([[-1, 2], [3, -4]])
matrix_output = relu(matrix_input)
expected_matrix = np.array([[0, 2], [3, 0]])
assert np.array_equal(matrix_output.data, expected_matrix), f"ReLU matrix failed: expected {expected_matrix}, got {matrix_output.data}"

# Test shape preservation
assert output.shape == input_tensor.shape, f"ReLU should preserve shape: input {input_tensor.shape}, output {output.shape}"

print("✅ ReLU tests passed!")
print(f"✅ ReLU({input_tensor.data.flatten()}) = {output.data.flatten()}")

# %% nbgrader={"grade": true, "grade_id": "test-sigmoid", "locked": true, "points": 20, "schema_version": 3, "solution": false, "task": false}
# Test Sigmoid activation
print("Testing Sigmoid activation...")

sigmoid = Sigmoid()

# Test basic functionality
input_tensor = Tensor([[0]])
output = sigmoid(input_tensor)
expected_value = 0.5
assert abs(output.data.item() - expected_value) < 1e-6, f"Sigmoid(0) should be 0.5, got {output.data.item()}"

# Test range bounds (allowing for floating-point precision at extremes)
large_input = Tensor([[100]])
large_output = sigmoid(large_input)
assert 0 < large_output.data.item() <= 1, f"Sigmoid output should be in (0,1], got {large_output.data.item()}"

small_input = Tensor([[-100]])
small_output = sigmoid(small_input)
assert 0 <= small_output.data.item() < 1, f"Sigmoid output should be in [0,1), got {small_output.data.item()}"

# Test with multiple values
multi_input = Tensor([[-2, 0, 2]])
multi_output = sigmoid(multi_input)
assert multi_output.shape == multi_input.shape, "Sigmoid should preserve shape"
assert np.all((multi_output.data > 0) & (multi_output.data < 1)), "All sigmoid outputs should be in (0,1)"

print("✅ Sigmoid tests passed!")
print(f"✅ Sigmoid({multi_input.data.flatten()}) = {multi_output.data.flatten()}")

# %% nbgrader={"grade": true, "grade_id": "test-tanh", "locked": true, "points": 20, "schema_version": 3, "solution": false, "task": false}
# Test Tanh activation
print("Testing Tanh activation...")

tanh = Tanh()

# Test basic functionality
input_tensor = Tensor([[0]])
output = tanh(input_tensor)
expected_value = 0.0
assert abs(output.data.item() - expected_value) < 1e-6, f"Tanh(0) should be 0.0, got {output.data.item()}"

# Test range bounds (allowing for floating-point precision at extremes)
large_input = Tensor([[100]])
large_output = tanh(large_input)
assert -1 <= large_output.data.item() <= 1, f"Tanh output should be in [-1,1], got {large_output.data.item()}"

small_input = Tensor([[-100]])
small_output = tanh(small_input)
assert -1 <= small_output.data.item() <= 1, f"Tanh output should be in [-1,1], got {small_output.data.item()}"

# Test symmetry: tanh(-x) = -tanh(x)
test_input = Tensor([[2]])
pos_output = tanh(test_input)
neg_input = Tensor([[-2]])
neg_output = tanh(neg_input)
assert abs(pos_output.data.item() + neg_output.data.item()) < 1e-6, "Tanh should be symmetric: tanh(-x) = -tanh(x)"

print("✅ Tanh tests passed!")
print(f"✅ Tanh(±2) = ±{abs(pos_output.data.item()):.3f}")

# %% nbgrader={"grade": true, "grade_id": "test-softmax", "locked": true, "points": 20, "schema_version": 3, "solution": false, "task": false}
# Test Softmax activation
print("Testing Softmax activation...")

softmax = Softmax()

# Test basic functionality
input_tensor = Tensor([[1, 2, 3]])
output = softmax(input_tensor)

# Check that outputs sum to 1
sum_output = np.sum(output.data)
assert abs(sum_output - 1.0) < 1e-6, f"Softmax outputs should sum to 1, got {sum_output}"

# Check that all outputs are positive
assert np.all(output.data > 0), "All softmax outputs should be positive"

# Check that larger inputs give larger outputs
assert output.data[0, 2] > output.data[0, 1] > output.data[0, 0], "Softmax should preserve order"

# Test with matrix (multiple rows)
matrix_input = Tensor([[1, 2], [3, 4]])
matrix_output = softmax(matrix_input)
row_sums = np.sum(matrix_output.data, axis=1)
assert np.allclose(row_sums, 1.0), f"Each row should sum to 1, got {row_sums}"

print("✅ Softmax tests passed!")
print(f"✅ Softmax({input_tensor.data.flatten()}) = {output.data.flatten()}")
print(f"✅ Sum = {np.sum(output.data):.6f}")

# %% nbgrader={"grade": true, "grade_id": "test-activation-integration", "locked": true, "points": 20, "schema_version": 3, "solution": false, "task": false}
# Test activation function integration
print("Testing activation function integration...")

# Create test data
test_data = Tensor([[-2, -1, 0, 1, 2]])

# Test all activations
relu = ReLU()
sigmoid = Sigmoid()
tanh = Tanh()
softmax = Softmax()

# Apply all activations
relu_out = relu(test_data)
sigmoid_out = sigmoid(test_data)
tanh_out = tanh(test_data)
softmax_out = softmax(test_data)

# Check shapes are preserved
assert relu_out.shape == test_data.shape, "ReLU should preserve shape"
assert sigmoid_out.shape == test_data.shape, "Sigmoid should preserve shape"
assert tanh_out.shape == test_data.shape, "Tanh should preserve shape"
assert softmax_out.shape == test_data.shape, "Softmax should preserve shape"

# Check ranges (allowing for floating-point precision at extremes)
assert np.all(relu_out.data >= 0), "ReLU outputs should be non-negative"
assert np.all((sigmoid_out.data >= 0) & (sigmoid_out.data <= 1)), "Sigmoid outputs should be in [0,1]"
assert np.all((tanh_out.data >= -1) & (tanh_out.data <= 1)), "Tanh outputs should be in [-1,1]"
assert np.all(softmax_out.data > 0), "Softmax outputs should be positive"

# Test chaining (composition)
chained = relu(sigmoid(test_data))
assert chained.shape == test_data.shape, "Chained activations should preserve shape"

print("✅ Activation integration tests passed!")
print("✅ All activation functions work correctly")
print(f"✅ Input: {test_data.data.flatten()}")
print(f"✅ ReLU: {relu_out.data.flatten()}")
print(f"✅ Sigmoid: {sigmoid_out.data.flatten()}")
print(f"✅ Tanh: {tanh_out.data.flatten()}")
print(f"✅ Softmax: {softmax_out.data.flatten()}")

# %% [markdown]
"""
## 🧪 Comprehensive Testing: All Activation Functions

Let's thoroughly test all your activation functions to make sure they work correctly in all scenarios.
This comprehensive testing ensures your implementations are robust and ready for real ML applications.
"""

# %% nbgrader={"grade": true, "grade_id": "test-activations-comprehensive", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false}
def test_activations_comprehensive():
    """Comprehensive test of all activation functions."""
    print("🔬 Testing all activation functions comprehensively...")

    tests_passed = 0
    total_tests = 12

    # Test 1: ReLU Basic Functionality
    try:
        relu = ReLU()
        test_input = Tensor([[-2, -1, 0, 1, 2]])
        result = relu(test_input)
        expected = np.array([[0, 0, 0, 1, 2]])

        assert np.array_equal(result.data, expected), f"ReLU failed: expected {expected}, got {result.data}"
        assert result.shape == test_input.shape, "ReLU should preserve shape"
        assert np.all(result.data >= 0), "ReLU outputs should be non-negative"

        print(f"✅ ReLU basic: {test_input.data.flatten()} → {result.data.flatten()}")
        tests_passed += 1
    except Exception as e:
        print(f"❌ ReLU basic test failed: {e}")

    # Test 2: ReLU Edge Cases
    try:
        relu = ReLU()

        # Test with zeros
        zero_input = Tensor([[0, 0, 0]])
        zero_result = relu(zero_input)
        assert np.array_equal(zero_result.data, np.array([[0, 0, 0]])), "ReLU(0) should be 0"

        # Test with large values
        large_input = Tensor([[1000, -1000]])
        large_result = relu(large_input)
        expected_large = np.array([[1000, 0]])
        assert np.array_equal(large_result.data, expected_large), "ReLU should handle large values"

        # Test with matrix
        matrix_input = Tensor([[-1, 2], [3, -4]])
        matrix_result = relu(matrix_input)
        expected_matrix = np.array([[0, 2], [3, 0]])
        assert np.array_equal(matrix_result.data, expected_matrix), "ReLU should work with matrices"

        print("✅ ReLU edge cases: zeros, large values, matrices")
        tests_passed += 1
    except Exception as e:
        print(f"❌ ReLU edge cases failed: {e}")

    # Test 3: Sigmoid Basic Functionality
    try:
        sigmoid = Sigmoid()

        # Test sigmoid(0) = 0.5
        zero_input = Tensor([[0]])
        zero_result = sigmoid(zero_input)
        assert abs(zero_result.data.item() - 0.5) < 1e-6, f"Sigmoid(0) should be 0.5, got {zero_result.data.item()}"

        # Test range bounds
        test_input = Tensor([[-10, -1, 0, 1, 10]])
        result = sigmoid(test_input)
        assert np.all((result.data > 0) & (result.data < 1)), "Sigmoid outputs should be in (0,1)"
        assert result.shape == test_input.shape, "Sigmoid should preserve shape"

        print("✅ Sigmoid basic: range (0,1), sigmoid(0)=0.5")
        tests_passed += 1
    except Exception as e:
        print(f"❌ Sigmoid basic test failed: {e}")

    # Test 4: Sigmoid Properties
    try:
        sigmoid = Sigmoid()

        # Test monotonicity
        inputs = Tensor([[-2, -1, 0, 1, 2]])
        outputs = sigmoid(inputs)
        output_values = outputs.data.flatten()

        # Check that outputs are increasing
        for i in range(len(output_values) - 1):
            assert output_values[i] < output_values[i + 1], "Sigmoid should be monotonic increasing"

        # Test numerical stability with extreme values
        extreme_input = Tensor([[-1000, 1000]])
        extreme_result = sigmoid(extreme_input)
        assert not np.any(np.isnan(extreme_result.data)), "Sigmoid should handle extreme values without NaN"
        assert not np.any(np.isinf(extreme_result.data)), "Sigmoid should handle extreme values without Inf"

        print("✅ Sigmoid properties: monotonic, numerically stable")
        tests_passed += 1
    except Exception as e:
        print(f"❌ Sigmoid properties failed: {e}")

    # Test 5: Tanh Basic Functionality
    try:
        tanh = Tanh()

        # Test tanh(0) = 0
        zero_input = Tensor([[0]])
        zero_result = tanh(zero_input)
        assert abs(zero_result.data.item() - 0.0) < 1e-6, f"Tanh(0) should be 0.0, got {zero_result.data.item()}"

        # Test range bounds
        test_input = Tensor([[-10, -1, 0, 1, 10]])
        result = tanh(test_input)
        assert np.all((result.data >= -1) & (result.data <= 1)), "Tanh outputs should be in [-1,1]"
        assert result.shape == test_input.shape, "Tanh should preserve shape"

        print("✅ Tanh basic: range [-1,1], tanh(0)=0")
        tests_passed += 1
    except Exception as e:
        print(f"❌ Tanh basic test failed: {e}")

    # Test 6: Tanh Symmetry
    try:
        tanh = Tanh()

        # Test symmetry: tanh(-x) = -tanh(x)
        test_values = [1, 2, 3, 5]
        for val in test_values:
            pos_input = Tensor([[val]])
            neg_input = Tensor([[-val]])
            pos_result = tanh(pos_input)
            neg_result = tanh(neg_input)

            assert abs(pos_result.data.item() + neg_result.data.item()) < 1e-6, f"Tanh should be symmetric: tanh(-{val}) ≠ -tanh({val})"

        # Test numerical stability
        extreme_input = Tensor([[-1000, 1000]])
        extreme_result = tanh(extreme_input)
        assert not np.any(np.isnan(extreme_result.data)), "Tanh should handle extreme values without NaN"

        print("✅ Tanh symmetry: tanh(-x) = -tanh(x), numerically stable")
        tests_passed += 1
    except Exception as e:
        print(f"❌ Tanh symmetry failed: {e}")

    # Test 7: Softmax Basic Functionality
    try:
        softmax = Softmax()

        # Test that outputs sum to 1
        test_input = Tensor([[1, 2, 3]])
        result = softmax(test_input)
        sum_result = np.sum(result.data)
        assert abs(sum_result - 1.0) < 1e-6, f"Softmax outputs should sum to 1, got {sum_result}"

        # Test that all outputs are positive
        assert np.all(result.data > 0), "All softmax outputs should be positive"

        # Test that larger inputs give larger outputs
        assert result.data[0, 2] > result.data[0, 1] > result.data[0, 0], "Softmax should preserve order"

        print("✅ Softmax basic: sums to 1, all positive, preserves order")
        tests_passed += 1
    except Exception as e:
        print(f"❌ Softmax basic test failed: {e}")

    # Test 8: Softmax with Multiple Rows
    try:
        softmax = Softmax()

        # Test with matrix (multiple rows)
        matrix_input = Tensor([[1, 2, 3], [4, 5, 6]])
        matrix_result = softmax(matrix_input)

        # Each row should sum to 1
        row_sums = np.sum(matrix_result.data, axis=1)
        assert np.allclose(row_sums, 1.0), f"Each row should sum to 1, got {row_sums}"

        # All values should be positive
        assert np.all(matrix_result.data > 0), "All softmax outputs should be positive"

        # Test numerical stability with extreme values
        extreme_input = Tensor([[1000, 1001, 1002]])
        extreme_result = softmax(extreme_input)
        assert not np.any(np.isnan(extreme_result.data)), "Softmax should handle extreme values without NaN"
        assert abs(np.sum(extreme_result.data) - 1.0) < 1e-6, "Softmax should still sum to 1 with extreme values"

        print("✅ Softmax matrices: each row sums to 1, numerically stable")
        tests_passed += 1
    except Exception as e:
        print(f"❌ Softmax matrices failed: {e}")

    # Test 9: Shape Preservation
    try:
        relu = ReLU()
        sigmoid = Sigmoid()
        tanh = Tanh()
        softmax = Softmax()

        # Test different shapes
        test_shapes = [
            Tensor([[1]]),                   # 1x1
            Tensor([[1, 2, 3]]),             # 1x3
            Tensor([[1], [2], [3]]),         # 3x1
            Tensor([[1, 2], [3, 4]]),        # 2x2
            Tensor([[1, 2, 3], [4, 5, 6]]),  # 2x3
        ]

        for i, test_tensor in enumerate(test_shapes):
            original_shape = test_tensor.shape

            relu_result = relu(test_tensor)
            sigmoid_result = sigmoid(test_tensor)
            tanh_result = tanh(test_tensor)
            softmax_result = softmax(test_tensor)

            assert relu_result.shape == original_shape, f"ReLU shape mismatch for test {i}"
            assert sigmoid_result.shape == original_shape, f"Sigmoid shape mismatch for test {i}"
            assert tanh_result.shape == original_shape, f"Tanh shape mismatch for test {i}"
            assert softmax_result.shape == original_shape, f"Softmax shape mismatch for test {i}"

        print("✅ Shape preservation: all activations preserve input shapes")
        tests_passed += 1
    except Exception as e:
        print(f"❌ Shape preservation failed: {e}")

    # Test 10: Function Composition
    try:
        relu = ReLU()
        sigmoid = Sigmoid()
        tanh = Tanh()

        # Test chaining activations
        test_input = Tensor([[-2, -1, 0, 1, 2]])

        # Chain: input → tanh → relu
        tanh_result = tanh(test_input)
        relu_tanh_result = relu(tanh_result)

        # Chain: input → sigmoid → tanh
        sigmoid_result = sigmoid(test_input)
        tanh_sigmoid_result = tanh(sigmoid_result)

        # All should preserve shape
        assert relu_tanh_result.shape == test_input.shape, "Chained activations should preserve shape"
        assert tanh_sigmoid_result.shape == test_input.shape, "Chained activations should preserve shape"

        # Results should be valid
        assert np.all(relu_tanh_result.data >= 0), "ReLU after Tanh should be non-negative"
        assert np.all((tanh_sigmoid_result.data >= -1) & (tanh_sigmoid_result.data <= 1)), "Tanh after Sigmoid should be in [-1,1]"

        print("✅ Function composition: activations can be chained together")
        tests_passed += 1
    except Exception as e:
        print(f"❌ Function composition failed: {e}")

    # Test 11: Real ML Scenario
    try:
        # Simulate a neural network layer output
        logits = Tensor([[2.0, 1.0, 0.1]])  # Raw network outputs

        # Apply softmax for classification
        softmax = Softmax()
        probabilities = softmax(logits)

        # Check that we get valid probabilities
        assert abs(np.sum(probabilities.data) - 1.0) < 1e-6, "Probabilities should sum to 1"
        assert np.all(probabilities.data > 0), "All probabilities should be positive"

        # The highest logit should give the highest probability
        max_logit_idx = np.argmax(logits.data)
        max_prob_idx = np.argmax(probabilities.data)
        assert max_logit_idx == max_prob_idx, "Highest logit should give highest probability"

        # Apply ReLU to hidden layer
        hidden_activations = Tensor([[-0.5, 0.8, -1.2, 2.1]])
        relu = ReLU()
        relu_output = relu(hidden_activations)

        # Should zero out negative values
        # (np.allclose tolerates float32 vs. float64 rounding in the stored values)
        expected_relu = np.array([[0.0, 0.8, 0.0, 2.1]])
        assert np.allclose(relu_output.data, expected_relu), "ReLU should zero negative values"

        print("✅ Real ML scenario: classification probabilities, hidden layer activation")
        tests_passed += 1
    except Exception as e:
        print(f"❌ Real ML scenario failed: {e}")

    # Test 12: Performance and Stability
    try:
        # Test with large tensors
        large_input = Tensor(np.random.randn(100, 50))

        relu = ReLU()
        sigmoid = Sigmoid()
        tanh = Tanh()
        softmax = Softmax()

        # All should handle large tensors
        relu_large = relu(large_input)
        sigmoid_large = sigmoid(large_input)
        tanh_large = tanh(large_input)
        softmax_large = softmax(large_input)

        # Check for NaN or Inf
        assert not np.any(np.isnan(relu_large.data)), "ReLU should not produce NaN"
        assert not np.any(np.isnan(sigmoid_large.data)), "Sigmoid should not produce NaN"
        assert not np.any(np.isnan(tanh_large.data)), "Tanh should not produce NaN"
        assert not np.any(np.isnan(softmax_large.data)), "Softmax should not produce NaN"

        assert not np.any(np.isinf(relu_large.data)), "ReLU should not produce Inf"
        assert not np.any(np.isinf(sigmoid_large.data)), "Sigmoid should not produce Inf"
        assert not np.any(np.isinf(tanh_large.data)), "Tanh should not produce Inf"
        assert not np.any(np.isinf(softmax_large.data)), "Softmax should not produce Inf"

        print("✅ Performance and stability: large tensors handled without NaN/Inf")
        tests_passed += 1
    except Exception as e:
        print(f"❌ Performance and stability failed: {e}")

    # Results summary
    print(f"\n📊 Activation Functions Results: {tests_passed}/{total_tests} tests passed")

    if tests_passed == total_tests:
        print("🎉 All activation function tests passed! Your implementations support:")
        print(" • ReLU: Fast, sparse activation for hidden layers")
        print(" • Sigmoid: Smooth probabilistic outputs (0,1)")
        print(" • Tanh: Zero-centered activation (-1,1)")
        print(" • Softmax: Probability distributions for classification")
        print(" • All functions preserve shapes and handle edge cases")
        print(" • Numerical stability with extreme values")
        print(" • Function composition for complex networks")
        print("📈 Progress: All Activation Functions ✓")
        return True
    else:
        print("⚠️ Some activation tests failed. Common issues:")
        print(" • Check mathematical formulas (especially sigmoid and tanh)")
        print(" • Verify numerical stability (clip extreme values)")
        print(" • Ensure proper shape preservation")
        print(" • Test with edge cases (zeros, large values)")
        print(" • Verify softmax sums to 1 for each row")
        return False

# Run the comprehensive test
success = test_activations_comprehensive()

# %% [markdown]
"""
### 🧪 Integration Test: Activation Functions in Neural Networks

Let's test how your activation functions work in a realistic neural network scenario.
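
For orientation, here is a NumPy-only sketch of the kind of forward pass such a scenario stands in for: a linear step, ReLU, another linear step, then softmax. The weights are random and the layer sizes are arbitrary assumptions; none of these names come from TinyTorch.

```python
import numpy as np

rng = np.random.default_rng(0)                  # fixed seed so the sketch is repeatable

x = rng.normal(size=(3, 4))                     # batch of 3 samples, 4 features each
W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)   # hypothetical hidden layer
W2, b2 = rng.normal(size=(5, 3)), np.zeros(3)   # hypothetical output layer

hidden = np.maximum(0, x @ W1 + b1)             # ReLU zeroes out negative values
logits = hidden @ W2 + b2

# Numerically stable softmax over the class axis
shifted = logits - logits.max(axis=-1, keepdims=True)
exp_shifted = np.exp(shifted)
probs = exp_shifted / exp_shifted.sum(axis=-1, keepdims=True)

print(probs.shape)          # (3, 3)
print(probs.sum(axis=-1))   # each entry is 1 up to rounding
```

The test that follows replaces the linear steps with hand-written values, so only the activation functions are exercised.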
"""

# %% nbgrader={"grade": true, "grade_id": "test-activations-integration", "locked": true, "points": 15, "schema_version": 3, "solution": false, "task": false}
def test_activations_integration():
    """Integration test with realistic neural network scenario."""
    print("🔬 Testing activation functions in neural network scenario...")

    try:
        print("🧠 Simulating a 3-layer neural network...")

        # Layer 1: Input data (batch of 3 samples, 4 features each)
        input_data = Tensor([[1.0, -2.0, 3.0, -1.0],
                             [2.0, 1.0, -1.0, 0.5],
                             [-1.0, 3.0, 2.0, -0.5]])
        print(f"📊 Input data shape: {input_data.shape}")

        # Layer 2: Hidden layer with ReLU activation
        # Simulate some linear transformation results
        hidden_raw = Tensor([[2.1, -1.5, 0.8],
                             [1.2, 3.4, -0.3],
                             [-0.7, 2.8, 1.9]])

        relu = ReLU()
        hidden_activated = relu(hidden_raw)
        print(f"✅ Hidden layer (ReLU): {hidden_raw.data.flatten()[:3]} → {hidden_activated.data.flatten()[:3]}")

        # Verify ReLU worked correctly
        assert np.all(hidden_activated.data >= 0), "Hidden layer should have non-negative activations"

        # Layer 3: Output layer for binary classification (sigmoid)
        output_raw = Tensor([[0.8], [2.1], [-0.5]])

        sigmoid = Sigmoid()
        output_probs = sigmoid(output_raw)
        print(f"✅ Output layer (Sigmoid): {output_raw.data.flatten()} → {output_probs.data.flatten()}")

        # Verify sigmoid outputs are valid probabilities
        assert np.all((output_probs.data > 0) & (output_probs.data < 1)), "Output should be valid probabilities"

        # Alternative: Multi-class classification with softmax
        multiclass_raw = Tensor([[1.0, 2.0, 0.5],
                                 [0.1, 0.8, 2.1],
                                 [1.5, 0.3, 1.2]])

        softmax = Softmax()
        class_probs = softmax(multiclass_raw)
        print(f"✅ Multi-class output (Softmax): each row sums to {np.sum(class_probs.data, axis=1)}")

        # Verify softmax outputs
        row_sums = np.sum(class_probs.data, axis=1)
        assert np.allclose(row_sums, 1.0), "Each sample should have probabilities summing to 1"

        # Test activation function chaining
        print("\n🔗 Testing activation function chaining...")

        # Chain: Tanh → ReLU (unusual but valid)
        tanh = Tanh()
        test_input = Tensor([[-2, -1, 0, 1, 2]])

        tanh_result = tanh(test_input)
        relu_tanh_result = relu(tanh_result)

        print(f"✅ Tanh → ReLU: {test_input.data.flatten()} → {tanh_result.data.flatten()} → {relu_tanh_result.data.flatten()}")

        # Verify chaining worked
        assert relu_tanh_result.shape == test_input.shape, "Chained activations should preserve shape"
        assert np.all(relu_tanh_result.data >= 0), "Final result should be non-negative (ReLU effect)"

        # Test different activation choices
        print("\n🎯 Testing activation function choices...")

        # Compare different activations on same input
        comparison_input = Tensor([[0.5, -0.5, 1.0, -1.0]])

        relu_comp = relu(comparison_input)
        sigmoid_comp = sigmoid(comparison_input)
        tanh_comp = tanh(comparison_input)

        print(f"Input: {comparison_input.data.flatten()}")
        print(f"ReLU: {relu_comp.data.flatten()}")
        print(f"Sigmoid: {sigmoid_comp.data.flatten()}")
        print(f"Tanh: {tanh_comp.data.flatten()}")

        # Show how different activations affect the same input
        print("\n📈 Activation function characteristics:")
        print("• ReLU: Sparse (many zeros), unbounded positive")
        print("• Sigmoid: Smooth, bounded (0,1), good for probabilities")
        print("• Tanh: Zero-centered (-1,1), symmetric")
        print("• Softmax: Probability distribution, sums to 1")

        print("\n🎉 Integration test passed! Your activation functions work correctly in:")
        print(" • Multi-layer neural networks")
        print(" • Binary and multi-class classification")
        print(" • Function composition and chaining")
        print(" • Different architectural choices")
        print("📈 Progress: All activation functions ready for neural networks!")

        return True

    except Exception as e:
        print(f"❌ Integration test failed: {e}")
        print("\n💡 This suggests an issue with:")
        print(" • Basic activation function implementation")
        print(" • Shape handling in neural network context")
        print(" • Mathematical correctness of the functions")
        print(" • Check your activation function implementations")
        return False

# Run the integration test
success = test_activations_integration() and success

# Print final summary
print(f"\n{'='*60}")
print("🎯 ACTIVATION FUNCTIONS MODULE TESTING COMPLETE")
print(f"{'='*60}")

if success:
    print("🎉 CONGRATULATIONS! All activation function tests passed!")
    print("\n✅ Your activation functions successfully implement:")
    print(" • ReLU: max(0, x) for sparse hidden layer activation")
    print(" • Sigmoid: 1/(1+e^(-x)) for binary classification")
    print(" • Tanh: tanh(x) for zero-centered activation")
    print(" • Softmax: probability distributions for multi-class classification")
    print(" • Numerical stability with extreme values")
    print(" • Shape preservation and function composition")
    print(" • Real neural network integration")
    print("\n🚀 You're ready to build neural network layers!")
    print("📈 Final Progress: Activation Functions Module ✓ COMPLETE")
else:
    print("⚠️ Some tests failed. Please review the error messages above.")
    print("\n🔧 To fix issues:")
    print(" 1. Check the specific activation function that failed")
    print(" 2. Review the mathematical formulas")
    print(" 3. Verify numerical stability (especially for sigmoid/tanh)")
    print(" 4. Test with edge cases (zeros, large values)")
    print(" 5. Ensure softmax sums to 1")
    print("\n💪 Keep going! These functions are the key to neural network power.")

# %% [markdown]
"""
## 🎯 Module Summary

Congratulations! You've successfully implemented the core activation functions for TinyTorch:

### What You've Accomplished
✅ **ReLU**: The workhorse activation for hidden layers
✅ **Sigmoid**: Smooth probabilistic outputs for binary classification
✅ **Tanh**: Zero-centered activation for better training dynamics
✅ **Softmax**: Probability distributions for multi-class classification
✅ **Integration**: All functions work together and preserve tensor shapes

### Key Concepts You've Learned
- **Nonlinearity** is essential for neural networks to learn complex patterns
- **ReLU** is simple, fast, and effective for most hidden layers
- **Sigmoid** squashes outputs to (0,1) for probabilistic interpretation
- **Tanh** is zero-centered and often better than sigmoid for hidden layers
- **Softmax** converts logits to probability distributions
- **Numerical stability** is crucial for functions with exponentials
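
To make the last point concrete, the sketch below compares a direct transcription of the softmax formula with the max-shifted version on large inputs. It uses plain NumPy; the function names `naive_softmax` and `stable_softmax` are illustrative, not TinyTorch APIs.

```python
import numpy as np

def naive_softmax(x):
    # Direct formula: np.exp overflows to inf for large inputs
    exp_x = np.exp(x)
    return exp_x / exp_x.sum(axis=-1, keepdims=True)

def stable_softmax(x):
    # Shift by the row maximum so every exponent is <= 0
    shifted = x - x.max(axis=-1, keepdims=True)
    exp_x = np.exp(shifted)
    return exp_x / exp_x.sum(axis=-1, keepdims=True)

x = np.array([[1000.0, 1001.0, 1002.0]])

# Silence the overflow/invalid warnings the naive version triggers
with np.errstate(over="ignore", invalid="ignore"):
    naive = naive_softmax(x)

stable = stable_softmax(x)

print(np.isnan(naive).all())   # True: inf / inf gives nan
print(stable.round(3))         # approximately [[0.09  0.245 0.665]]
```

The stable result matches softmax of `[0, 1, 2]`, since only the differences between inputs matter.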

### Next Steps
1. **Export your code**: `tito package nbdev --export 02_activations`
2. **Test your implementation**: `tito module test 02_activations`
3. **Use your activations**:
   ```python
   from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax
   from tinytorch.core.tensor import Tensor

   relu = ReLU()
   x = Tensor([[-1, 0, 1, 2]])
   y = relu(x)  # Your activation in action!
   ```
4. **Move to Module 3**: Start building neural network layers!

**Ready for the next challenge?** Let's combine tensors and activations to build the fundamental building blocks of neural networks!
"""