# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.17.1
# ---
# %% [markdown]
"""
# 🔥 TinyTorch Activations Module
Welcome to the **Activations** module! This is where you'll implement the mathematical functions that give neural networks their power.
## 🎯 Learning Objectives
By the end of this module, you will:
1. **Understand** why activation functions are essential for neural networks
2. **Implement** the three most important activation functions: ReLU, Sigmoid, and Tanh
3. **Test** your functions with various inputs to understand their behavior
4. **Use** these functions as building blocks for neural networks
## 🧠 Why Activation Functions Matter
**Without activation functions, neural networks are just linear transformations!**
```
Linear → Linear → Linear = Still just Linear
Linear → Activation → Linear = Can learn complex patterns!
```
**Key insight**: Activation functions add **nonlinearity**, allowing networks to learn complex patterns that purely linear functions cannot capture. (A short demo of this collapse appears right after the imports below.)
## 📚 What You'll Build
- **ReLU**: `f(x) = max(0, x)` - The workhorse of deep learning
- **Sigmoid**: `f(x) = 1 / (1 + e^(-x))` - Squashes to (0, 1)
- **Tanh**: `f(x) = tanh(x)` - Squashes to (-1, 1)
Each function serves different purposes and has different mathematical properties.
---
Let's start building! 🚀
"""
# %%
#| default_exp core.activations
# Standard library imports
import math
import numpy as np
import matplotlib.pyplot as plt
import os
import sys
# TinyTorch imports
from tinytorch.core.tensor import Tensor
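# %% [markdown]
"""
### 🔍 Sanity check: why nonlinearity matters
Before building anything, here is a minimal sketch (plain NumPy, using the imports above; the matrices and input are arbitrary illustrative values, not part of TinyTorch) showing that stacking two linear maps collapses into a single linear map, while inserting an element-wise nonlinearity between them does not.
"""
# %%
# Two "layers" as matrices: without an activation, W2 @ (W1 @ x) equals (W2 @ W1) @ x
W1 = np.array([[1.0, -2.0], [3.0, 4.0]])
W2 = np.array([[0.5, -1.0], [2.0, 1.0]])
x_demo = np.array([[1.0], [1.0]])

linear_stack = W2 @ (W1 @ x_demo)     # two linear "layers" in a row
collapsed = (W2 @ W1) @ x_demo        # one equivalent linear layer
print("Linear → Linear collapses to one linear map:", np.allclose(linear_stack, collapsed))

# Insert an element-wise nonlinearity (max(0, ·)) between the two layers
nonlinear_stack = W2 @ np.maximum(0, W1 @ x_demo)
print("Linear → Activation → Linear is different:   ", not np.allclose(nonlinear_stack, collapsed))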
# %%
# Helper function to detect if we're in a testing environment
def _should_show_plots():
"""
Determine if we should show plots based on the execution context.
Returns False if:
- Running in pytest (detected by 'pytest' in sys.modules)
- Running in test environment (detected by environment variables)
- Running from command line test runner
Returns True if:
- Running in Jupyter notebook
- Running interactively in Python
"""
# Check if we're running in pytest
if 'pytest' in sys.modules:
return False
# Check if we're in a test environment
if os.environ.get('PYTEST_CURRENT_TEST'):
return False
# Check if we're running from a test file (more specific check)
if any(arg.endswith('.py') and 'test_' in os.path.basename(arg) and 'tests/' in arg for arg in sys.argv):
return False
# Check if we're running from the tito CLI test command
if len(sys.argv) > 0 and 'tito.py' in sys.argv[0] and 'test' in sys.argv:
return False
# Default to showing plots (notebook/interactive environment)
return True
# %% [markdown]
"""
## Step 1: ReLU Activation Function
**ReLU** (Rectified Linear Unit) is the most popular activation function in deep learning.
**Formula**: `f(x) = max(0, x)`
**Properties**:
- **Simple**: Easy to compute and understand
- **Sparse**: Outputs exactly zero for negative inputs
- **Unbounded**: No upper limit on positive outputs
- **Non-saturating**: Doesn't saturate for positive inputs, so gradients don't vanish there (negative inputs do get zero gradient, though)
**When to use**: Almost everywhere! It's the default choice for hidden layers.
"""
# %%
#| export
class ReLU:
"""
ReLU Activation: f(x) = max(0, x)
The most popular activation function in deep learning.
Simple, effective, and computationally efficient.
TODO: Implement ReLU activation function.
"""
def forward(self, x: Tensor) -> Tensor:
"""
Apply ReLU: f(x) = max(0, x)
Args:
x: Input tensor
Returns:
Output tensor with ReLU applied element-wise
TODO: Implement element-wise max(0, x) operation
Hint: Use np.maximum(0, x.data)
"""
raise NotImplementedError("Student implementation required")
def __call__(self, x: Tensor) -> Tensor:
"""Make activation callable: relu(x) same as relu.forward(x)"""
return self.forward(x)
# %%
#| hide
#| export
class ReLU:
"""ReLU Activation: f(x) = max(0, x)"""
def forward(self, x: Tensor) -> Tensor:
"""Apply ReLU: f(x) = max(0, x)"""
return Tensor(np.maximum(0, x.data))
def __call__(self, x: Tensor) -> Tensor:
return self.forward(x)
# %% [markdown]
"""
### 🧪 Test Your ReLU Function
Once you implement ReLU above, run this cell to test it:
"""
# %%
# Test ReLU function
try:
print("=== Testing ReLU Function ===")
# Test data: mix of positive, negative, and zero
x = Tensor([[-3.0, -1.0, 0.0, 1.0, 3.0]])
print(f"Input: {x.data}")
# Test ReLU
relu = ReLU()
y = relu(x)
print(f"ReLU output: {y.data}")
print(f"Expected: [[0. 0. 0. 1. 3.]]")
# Test with different shapes
x_2d = Tensor([[-2.0, 1.0], [0.5, -0.5]])
y_2d = relu(x_2d)
print(f"\n2D Input: {x_2d.data}")
print(f"2D ReLU output: {y_2d.data}")
print("✅ ReLU working!")
except Exception as e:
print(f"❌ Error: {e}")
print("Make sure to implement the ReLU function above!")
# %% [markdown]
"""
### 📊 Visualize ReLU Function
Let's plot the ReLU function to see how it transforms inputs:
"""
# %%
# Plot ReLU function
try:
print("=== Plotting ReLU Function ===")
# Create a range of input values
x_range = np.linspace(-5, 5, 100)
x_tensor = Tensor([x_range])
# Apply ReLU (student implementation)
relu = ReLU()
y_tensor = relu(x_tensor)
y_range = y_tensor.data[0]
# Create ideal ReLU for comparison
y_ideal = np.maximum(0, x_range)
# Only show plots if we're not in a testing environment
if _should_show_plots():
# Create the plot
plt.figure(figsize=(12, 8))
# Plot both student implementation and ideal
plt.subplot(2, 2, 1)
plt.plot(x_range, y_range, 'b-', linewidth=3, label='Your ReLU Implementation')
plt.plot(x_range, y_ideal, 'r--', linewidth=2, alpha=0.7, label='Ideal ReLU')
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
plt.xlabel('Input (x)')
plt.ylabel('Output')
plt.title('ReLU: Your Implementation vs Ideal')
plt.grid(True, alpha=0.3)
plt.legend()
plt.xlim(-5, 5)
plt.ylim(-1, 5)
# Mathematical explanation plot
plt.subplot(2, 2, 2)
# Show the mathematical definition
x_math = np.array([-3, -2, -1, 0, 1, 2, 3])
y_math = np.maximum(0, x_math)
plt.stem(x_math, y_math, basefmt=' ', linefmt='g-', markerfmt='go')
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
plt.xlabel('Input (x)')
plt.ylabel('max(0, x)')
plt.title('Mathematical Definition: max(0, x)')
plt.grid(True, alpha=0.3)
plt.xlim(-4, 4)
plt.ylim(-0.5, 3.5)
# Show the piecewise nature
plt.subplot(2, 2, 3)
x_left = np.linspace(-5, 0, 50)
x_right = np.linspace(0, 5, 50)
plt.plot(x_left, np.zeros_like(x_left), 'r-', linewidth=3, label='f(x) = 0 for x < 0')
plt.plot(x_right, x_right, 'b-', linewidth=3, label='f(x) = x for x ≥ 0')
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
plt.xlabel('Input (x)')
plt.ylabel('Output')
plt.title('Piecewise Function Definition')
plt.grid(True, alpha=0.3)
plt.legend()
plt.xlim(-5, 5)
plt.ylim(-1, 5)
# Error analysis
plt.subplot(2, 2, 4)
difference = np.abs(y_range - y_ideal)
max_error = np.max(difference)
plt.plot(x_range, difference, 'purple', linewidth=2)
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.xlabel('Input (x)')
plt.ylabel('|Your Output - Ideal Output|')
plt.title(f'Implementation Error (Max: {max_error:.6f})')
plt.grid(True, alpha=0.3)
plt.xlim(-5, 5)
plt.tight_layout()
plt.show()
# Print analysis
print(f"\n📊 Analysis:")
print(f"✅ Maximum error: {max_error:.10f}")
if max_error < 1e-10:
print("🎉 Perfect implementation!")
elif max_error < 1e-6:
print("🌟 Excellent implementation!")
elif max_error < 1e-3:
print("👍 Good implementation!")
else:
print("🔧 Implementation needs work.")
print(f"📈 Function properties:")
print(f" • Range: [0, ∞)")
print(f" • Piecewise: f(x) = 0 for x < 0, f(x) = x for x ≥ 0")
print(f" • Monotonic: Always increasing for x ≥ 0")
print(f" • Sparse: Exactly zero for negative inputs")
else:
print("📊 Plots disabled during testing - this is normal!")
# Always show the mathematical analysis
difference = np.abs(y_range - y_ideal)
max_error = np.max(difference)
print(f"\n📊 Mathematical Analysis:")
print(f"✅ Maximum error: {max_error:.10f}")
if max_error < 1e-10:
print("🎉 Perfect implementation!")
elif max_error < 1e-6:
print("🌟 Excellent implementation!")
elif max_error < 1e-3:
print("👍 Good implementation!")
else:
print("🔧 Implementation needs work.")
except Exception as e:
print(f"❌ Error in plotting: {e}")
print("Make sure to implement the ReLU function above!")
# %% [markdown]
"""
## Step 2: Sigmoid Activation Function
**Sigmoid** squashes any input to the range (0, 1), making it useful for probabilities.
**Formula**: `f(x) = 1 / (1 + e^(-x))`
**Properties**:
- **Bounded**: Always outputs between 0 and 1
- **Smooth**: Differentiable everywhere
- **S-shaped**: Smooth transition from 0 to 1
- **Saturating**: Can suffer from vanishing gradients
**When to use**: Binary classification (final layer), gates in RNNs/LSTMs.
**⚠️ Numerical Stability**: The naive formula overflows `e^(-x)` for large negative inputs, so compute it carefully!
"""
# %%
#| export
class Sigmoid:
"""
Sigmoid Activation: f(x) = 1 / (1 + e^(-x))
Squashes input to range (0, 1). Often used for binary classification.
TODO: Implement Sigmoid activation function.
"""
def forward(self, x: Tensor) -> Tensor:
"""
Apply Sigmoid: f(x) = 1 / (1 + e^(-x))
Args:
x: Input tensor
Returns:
Output tensor with Sigmoid applied element-wise
TODO: Implement sigmoid function (be careful with numerical stability!)
Hint: For numerical stability, use:
- For x >= 0: sigmoid(x) = 1 / (1 + exp(-x))
- For x < 0: sigmoid(x) = exp(x) / (1 + exp(x))
"""
raise NotImplementedError("Student implementation required")
def __call__(self, x: Tensor) -> Tensor:
return self.forward(x)
# %%
#| hide
#| export
class Sigmoid:
"""Sigmoid Activation: f(x) = 1 / (1 + e^(-x))"""
def forward(self, x: Tensor) -> Tensor:
"""Apply Sigmoid with numerical stability"""
# Use the numerically stable version to avoid overflow
# For x >= 0: sigmoid(x) = 1 / (1 + exp(-x))
# For x < 0: sigmoid(x) = exp(x) / (1 + exp(x))
x_data = x.data
result = np.zeros_like(x_data)
# Stable computation
positive_mask = x_data >= 0
result[positive_mask] = 1.0 / (1.0 + np.exp(-x_data[positive_mask]))
result[~positive_mask] = np.exp(x_data[~positive_mask]) / (1.0 + np.exp(x_data[~positive_mask]))
return Tensor(result)
def __call__(self, x: Tensor) -> Tensor:
return self.forward(x)
# %% [markdown]
"""
### 🧪 Test Your Sigmoid Function
Once you implement Sigmoid above, run this cell to test it:
"""
# %%
# Test Sigmoid function
try:
print("=== Testing Sigmoid Function ===")
# Test data: mix of positive, negative, and zero
x = Tensor([[-5.0, -1.0, 0.0, 1.0, 5.0]])
print(f"Input: {x.data}")
# Test Sigmoid
sigmoid = Sigmoid()
y = sigmoid(x)
print(f"Sigmoid output: {y.data}")
print("Expected: values between 0 and 1")
print(f"All values in (0,1)? {np.all((y.data > 0) & (y.data < 1))}")
# Test specific values
x_zero = Tensor([[0.0]])
y_zero = sigmoid(x_zero)
print(f"\nSigmoid(0) = {y_zero.data[0, 0]:.4f} (should be 0.5)")
# Test extreme values (numerical stability)
x_extreme = Tensor([[-100.0, 100.0]])
y_extreme = sigmoid(x_extreme)
print(f"Sigmoid([-100, 100]) = {y_extreme.data}")
print("Should be close to [0, 1] without overflow errors")
print("✅ Sigmoid working!")
except Exception as e:
print(f"❌ Error: {e}")
print("Make sure to implement the Sigmoid function above!")
# %% [markdown]
"""
### 📊 Visualize Sigmoid Function
Let's plot the Sigmoid function to see its S-shaped curve:
"""
# %%
# Plot Sigmoid function
try:
print("=== Plotting Sigmoid Function ===")
# Create a range of input values
x_range = np.linspace(-10, 10, 100)
x_tensor = Tensor([x_range])
# Apply Sigmoid (student implementation)
sigmoid = Sigmoid()
y_tensor = sigmoid(x_tensor)
y_range = y_tensor.data[0]
# Create ideal Sigmoid for comparison
y_ideal = 1.0 / (1.0 + np.exp(-x_range))
# Only show plots if we're not in a testing environment
if _should_show_plots():
# Create the plot
plt.figure(figsize=(12, 8))
# Plot both student implementation and ideal
plt.subplot(2, 2, 1)
plt.plot(x_range, y_range, 'g-', linewidth=3, label='Your Sigmoid Implementation')
plt.plot(x_range, y_ideal, 'r--', linewidth=2, alpha=0.7, label='Ideal Sigmoid')
plt.axhline(y=0.5, color='orange', linestyle='--', alpha=0.5, label='y = 0.5')
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.axhline(y=1, color='k', linestyle='-', alpha=0.3)
plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
plt.xlabel('Input (x)')
plt.ylabel('Output')
plt.title('Sigmoid: Your Implementation vs Ideal')
plt.grid(True, alpha=0.3)
plt.legend()
plt.xlim(-10, 10)
plt.ylim(-0.1, 1.1)
# Mathematical explanation plot
plt.subplot(2, 2, 2)
# Show key points
x_key = np.array([-5, -2, -1, 0, 1, 2, 5])
y_key = 1.0 / (1.0 + np.exp(-x_key))
plt.stem(x_key, y_key, basefmt=' ', linefmt='orange', markerfmt='o')
plt.axhline(y=0.5, color='orange', linestyle='--', alpha=0.5)
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.axhline(y=1, color='k', linestyle='-', alpha=0.3)
plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
plt.xlabel('Input (x)')
plt.ylabel('1/(1+e^(-x))')
plt.title('Mathematical Definition: 1/(1+e^(-x))')
plt.grid(True, alpha=0.3)
plt.xlim(-6, 6)
plt.ylim(-0.1, 1.1)
# Show the S-curve properties
plt.subplot(2, 2, 3)
x_detailed = np.linspace(-8, 8, 200)
y_detailed = 1.0 / (1.0 + np.exp(-x_detailed))
plt.plot(x_detailed, y_detailed, 'g-', linewidth=3)
# Add asymptotes
plt.axhline(y=0, color='r', linestyle='--', alpha=0.7, label='Lower asymptote: y = 0')
plt.axhline(y=1, color='r', linestyle='--', alpha=0.7, label='Upper asymptote: y = 1')
plt.axhline(y=0.5, color='orange', linestyle='--', alpha=0.7, label='Midpoint: y = 0.5')
plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
plt.xlabel('Input (x)')
plt.ylabel('Output')
plt.title('S-Curve Properties')
plt.grid(True, alpha=0.3)
plt.legend()
plt.xlim(-8, 8)
plt.ylim(-0.1, 1.1)
# Error analysis
plt.subplot(2, 2, 4)
difference = np.abs(y_range - y_ideal)
max_error = np.max(difference)
plt.plot(x_range, difference, 'purple', linewidth=2)
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.xlabel('Input (x)')
plt.ylabel('|Your Output - Ideal Output|')
plt.title(f'Implementation Error (Max: {max_error:.6f})')
plt.grid(True, alpha=0.3)
plt.xlim(-10, 10)
plt.tight_layout()
plt.show()
# Print analysis
print(f"\n📊 Analysis:")
print(f"✅ Maximum error: {max_error:.10f}")
if max_error < 1e-10:
print("🎉 Perfect implementation!")
elif max_error < 1e-6:
print("🌟 Excellent implementation!")
elif max_error < 1e-3:
print("👍 Good implementation!")
else:
print("🔧 Implementation needs work.")
print(f"📈 Function properties:")
print(f" • Range: (0, 1)")
print(f" • Symmetric around (0, 0.5)")
print(f" • Smooth and differentiable everywhere")
print(f" • Saturates for large |x| (vanishing gradient problem)")
print(f" • Useful for binary classification (outputs probabilities)")
else:
print("📊 Plots disabled during testing - this is normal!")
# Always show the mathematical analysis
difference = np.abs(y_range - y_ideal)
max_error = np.max(difference)
print(f"\n📊 Mathematical Analysis:")
print(f"✅ Maximum error: {max_error:.10f}")
if max_error < 1e-10:
print("🎉 Perfect implementation!")
elif max_error < 1e-6:
print("🌟 Excellent implementation!")
elif max_error < 1e-3:
print("👍 Good implementation!")
else:
print("🔧 Implementation needs work.")
except Exception as e:
print(f"❌ Error in plotting: {e}")
print("Make sure to implement the Sigmoid function above!")
# %% [markdown]
"""
## Step 3: Tanh Activation Function
**Tanh** (Hyperbolic Tangent) squashes inputs to the range (-1, 1).
**Formula**: `f(x) = tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))`
**Properties**:
- **Bounded**: Always outputs between -1 and 1
- **Zero-centered**: Output is centered around 0
- **Smooth**: Differentiable everywhere
- **Stronger gradients**: Steeper than sigmoid near zero (maximum slope 1 vs. 0.25)
**When to use**: Hidden layers when you want zero-centered outputs, RNNs.
**Advantage over Sigmoid**: Zero-centered outputs help with gradient flow.
"""
# %%
#| export
class Tanh:
"""
Tanh Activation: f(x) = tanh(x)
Squashes input to range (-1, 1). Zero-centered output.
TODO: Implement Tanh activation function.
"""
def forward(self, x: Tensor) -> Tensor:
"""
Apply Tanh: f(x) = tanh(x)
Args:
x: Input tensor
Returns:
Output tensor with Tanh applied element-wise
TODO: Implement tanh function
Hint: Use np.tanh(x.data)
"""
raise NotImplementedError("Student implementation required")
def __call__(self, x: Tensor) -> Tensor:
return self.forward(x)
# %%
#| hide
#| export
class Tanh:
"""Tanh Activation: f(x) = tanh(x)"""
def forward(self, x: Tensor) -> Tensor:
"""Apply Tanh"""
return Tensor(np.tanh(x.data))
def __call__(self, x: Tensor) -> Tensor:
return self.forward(x)
# %% [markdown]
"""
### 🧪 Test Your Tanh Function
Once you implement Tanh above, run this cell to test it:
"""
# %%
# Test Tanh function
try:
print("=== Testing Tanh Function ===")
# Test data: mix of positive, negative, and zero
x = Tensor([[-3.0, -1.0, 0.0, 1.0, 3.0]])
print(f"Input: {x.data}")
# Test Tanh
tanh = Tanh()
y = tanh(x)
print(f"Tanh output: {y.data}")
print("Expected: values between -1 and 1")
print(f"All values in (-1,1)? {np.all((y.data > -1) & (y.data < 1))}")
# Test specific values
x_zero = Tensor([[0.0]])
y_zero = tanh(x_zero)
print(f"\nTanh(0) = {y_zero.data[0, 0]:.4f} (should be 0.0)")
# Test extreme values
x_extreme = Tensor([[-10.0, 10.0]])
y_extreme = tanh(x_extreme)
print(f"Tanh([-10, 10]) = {y_extreme.data}")
print("Should be close to [-1, 1]")
print("✅ Tanh working!")
except Exception as e:
print(f"❌ Error: {e}")
print("Make sure to implement the Tanh function above!")
# %% [markdown]
"""
### 📊 Visualize Tanh Function
Let's plot the Tanh function to see its zero-centered S-shaped curve:
"""
# %%
# Plot Tanh function
try:
print("=== Plotting Tanh Function ===")
# Create a range of input values
x_range = np.linspace(-5, 5, 100)
x_tensor = Tensor([x_range])
# Apply Tanh (student implementation)
tanh = Tanh()
y_tensor = tanh(x_tensor)
y_range = y_tensor.data[0]
# Create ideal Tanh for comparison
y_ideal = np.tanh(x_range)
# Only show plots if we're not in a testing environment
if _should_show_plots():
# Create the plot
plt.figure(figsize=(12, 8))
# Plot both student implementation and ideal
plt.subplot(2, 2, 1)
plt.plot(x_range, y_range, 'orange', linewidth=3, label='Your Tanh Implementation')
plt.plot(x_range, y_ideal, 'r--', linewidth=2, alpha=0.7, label='Ideal Tanh')
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.axhline(y=1, color='k', linestyle='--', alpha=0.3)
plt.axhline(y=-1, color='k', linestyle='--', alpha=0.3)
plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
plt.xlabel('Input (x)')
plt.ylabel('Output')
plt.title('Tanh: Your Implementation vs Ideal')
plt.grid(True, alpha=0.3)
plt.legend()
plt.xlim(-5, 5)
plt.ylim(-1.2, 1.2)
# Mathematical explanation plot
plt.subplot(2, 2, 2)
# Show key points
x_key = np.array([-3, -2, -1, 0, 1, 2, 3])
y_key = np.tanh(x_key)
plt.stem(x_key, y_key, basefmt=' ', linefmt='purple', markerfmt='o')
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.axhline(y=1, color='k', linestyle='--', alpha=0.3)
plt.axhline(y=-1, color='k', linestyle='--', alpha=0.3)
plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
plt.xlabel('Input (x)')
plt.ylabel('tanh(x)')
plt.title('Mathematical Definition: tanh(x)')
plt.grid(True, alpha=0.3)
plt.xlim(-4, 4)
plt.ylim(-1.2, 1.2)
# Show symmetry property
plt.subplot(2, 2, 3)
x_sym = np.linspace(-4, 4, 100)
y_sym = np.tanh(x_sym)
plt.plot(x_sym, y_sym, 'orange', linewidth=3, label='tanh(x)')
plt.plot(-x_sym, -y_sym, 'b--', linewidth=2, alpha=0.7, label='-tanh(-x)')
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.axhline(y=1, color='r', linestyle='--', alpha=0.7, label='Upper asymptote: y = 1')
plt.axhline(y=-1, color='r', linestyle='--', alpha=0.7, label='Lower asymptote: y = -1')
plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
plt.xlabel('Input (x)')
plt.ylabel('Output')
plt.title('Symmetry: tanh(-x) = -tanh(x)')
plt.grid(True, alpha=0.3)
plt.legend()
plt.xlim(-4, 4)
plt.ylim(-1.2, 1.2)
# Error analysis
plt.subplot(2, 2, 4)
difference = np.abs(y_range - y_ideal)
max_error = np.max(difference)
plt.plot(x_range, difference, 'purple', linewidth=2)
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.xlabel('Input (x)')
plt.ylabel('|Your Output - Ideal Output|')
plt.title(f'Implementation Error (Max: {max_error:.6f})')
plt.grid(True, alpha=0.3)
plt.xlim(-5, 5)
plt.tight_layout()
plt.show()
# Print analysis
print(f"\n📊 Analysis:")
print(f"✅ Maximum error: {max_error:.10f}")
if max_error < 1e-10:
print("🎉 Perfect implementation!")
elif max_error < 1e-6:
print("🌟 Excellent implementation!")
elif max_error < 1e-3:
print("👍 Good implementation!")
else:
print("🔧 Implementation needs work.")
print(f"📈 Function properties:")
print(f" • Range: (-1, 1)")
print(f" • Odd function: tanh(-x) = -tanh(x)")
print(f" • Symmetric around origin (0, 0)")
print(f" • Smooth and differentiable everywhere")
print(f" • Stronger gradients than sigmoid around zero")
print(f" • Related to sigmoid: tanh(x) = 2*sigmoid(2x) - 1")
else:
print("📊 Plots disabled during testing - this is normal!")
# Always show the mathematical analysis
difference = np.abs(y_range - y_ideal)
max_error = np.max(difference)
print(f"\n📊 Mathematical Analysis:")
print(f"✅ Maximum error: {max_error:.10f}")
if max_error < 1e-10:
print("🎉 Perfect implementation!")
elif max_error < 1e-6:
print("🌟 Excellent implementation!")
elif max_error < 1e-3:
print("👍 Good implementation!")
else:
print("🔧 Implementation needs work.")
except Exception as e:
print(f"❌ Error in plotting: {e}")
print("Make sure to implement the Tanh function above!")
# %% [markdown]
"""
## Step 4: Compare All Activation Functions
Let's see how all three functions behave on the same input:
"""
# %%
# Compare all activation functions
try:
print("=== Comparing All Activation Functions ===")
# Test data: range from -5 to 5
x = Tensor([[-5.0, -2.0, -1.0, 0.0, 1.0, 2.0, 5.0]])
print(f"Input: {x.data}")
# Apply all activations
relu = ReLU()
sigmoid = Sigmoid()
tanh = Tanh()
y_relu = relu(x)
y_sigmoid = sigmoid(x)
y_tanh = tanh(x)
print(f"\nReLU: {y_relu.data}")
print(f"Sigmoid: {y_sigmoid.data}")
print(f"Tanh: {y_tanh.data}")
print("\n📊 Key Differences:")
print("- ReLU: Zeros out negative values, unbounded positive")
print("- Sigmoid: Squashes to (0, 1), always positive")
print("- Tanh: Squashes to (-1, 1), zero-centered")
print("\n✅ All activation functions working!")
except Exception as e:
print(f"❌ Error: {e}")
print("Make sure to implement all activation functions above!")
# %% [markdown]
"""
### 📊 Comprehensive Activation Function Comparison
Let's plot all three functions together to see their differences:
"""
# %%
# Plot all activation functions together
try:
print("=== Plotting All Activation Functions Together ===")
# Create a range of input values
x_range = np.linspace(-5, 5, 100)
x_tensor = Tensor([x_range])
# Apply all activations (student implementations)
relu = ReLU()
sigmoid = Sigmoid()
tanh = Tanh()
y_relu = relu(x_tensor).data[0]
y_sigmoid = sigmoid(x_tensor).data[0]
y_tanh = tanh(x_tensor).data[0]
# Create ideal functions for comparison
y_relu_ideal = np.maximum(0, x_range)
y_sigmoid_ideal = 1.0 / (1.0 + np.exp(-x_range))
y_tanh_ideal = np.tanh(x_range)
# Only show plots if we're not in a testing environment
if _should_show_plots():
# Create the comprehensive plot
plt.figure(figsize=(15, 10))
# Main comparison plot
plt.subplot(2, 3, (1, 2))
plt.plot(x_range, y_relu, 'b-', linewidth=3, label='Your ReLU')
plt.plot(x_range, y_sigmoid, 'g-', linewidth=3, label='Your Sigmoid')
plt.plot(x_range, y_tanh, 'orange', linewidth=3, label='Your Tanh')
# Add ideal functions as dashed lines
plt.plot(x_range, y_relu_ideal, 'b--', linewidth=1, alpha=0.7, label='Ideal ReLU')
plt.plot(x_range, y_sigmoid_ideal, 'g--', linewidth=1, alpha=0.7, label='Ideal Sigmoid')
plt.plot(x_range, y_tanh_ideal, '--', color='orange', linewidth=1, alpha=0.7, label='Ideal Tanh')
# Add reference lines
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.axhline(y=1, color='k', linestyle='--', alpha=0.3)
plt.axhline(y=-1, color='k', linestyle='--', alpha=0.3)
plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
# Formatting
plt.xlabel('Input (x)', fontsize=12)
plt.ylabel('Output f(x)', fontsize=12)
plt.title('Activation Functions: Your Implementation vs Ideal', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.legend(fontsize=10, loc='upper left')
plt.xlim(-5, 5)
plt.ylim(-1.5, 5)
# Mathematical definitions
plt.subplot(2, 3, 3)
plt.text(0.05, 0.95, 'Mathematical Definitions:', fontsize=12, fontweight='bold',
transform=plt.gca().transAxes, verticalalignment='top')
plt.text(0.05, 0.85, 'ReLU:', fontsize=11, fontweight='bold', color='blue',
transform=plt.gca().transAxes, verticalalignment='top')
plt.text(0.05, 0.80, 'f(x) = max(0, x)', fontsize=10, fontfamily='monospace',
transform=plt.gca().transAxes, verticalalignment='top')
plt.text(0.05, 0.70, 'Sigmoid:', fontsize=11, fontweight='bold', color='green',
transform=plt.gca().transAxes, verticalalignment='top')
plt.text(0.05, 0.65, 'f(x) = 1/(1+e^(-x))', fontsize=10, fontfamily='monospace',
transform=plt.gca().transAxes, verticalalignment='top')
plt.text(0.05, 0.55, 'Tanh:', fontsize=11, fontweight='bold', color='orange',
transform=plt.gca().transAxes, verticalalignment='top')
plt.text(0.05, 0.50, 'f(x) = tanh(x)', fontsize=10, fontfamily='monospace',
transform=plt.gca().transAxes, verticalalignment='top')
plt.text(0.05, 0.45, ' = (e^x-e^(-x))/(e^x+e^(-x))', fontsize=10, fontfamily='monospace',
transform=plt.gca().transAxes, verticalalignment='top')
plt.text(0.05, 0.30, 'Key Properties:', fontsize=12, fontweight='bold',
transform=plt.gca().transAxes, verticalalignment='top')
plt.text(0.05, 0.25, '• ReLU: Sparse, unbounded', fontsize=10, color='blue',
transform=plt.gca().transAxes, verticalalignment='top')
plt.text(0.05, 0.20, '• Sigmoid: Bounded (0,1)', fontsize=10, color='green',
transform=plt.gca().transAxes, verticalalignment='top')
plt.text(0.05, 0.15, '• Tanh: Zero-centered (-1,1)', fontsize=10, color='orange',
transform=plt.gca().transAxes, verticalalignment='top')
plt.axis('off')
# Error analysis for ReLU
plt.subplot(2, 3, 4)
error_relu = np.abs(y_relu - y_relu_ideal)
plt.plot(x_range, error_relu, 'b-', linewidth=2)
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.xlabel('Input (x)')
plt.ylabel('Error')
plt.title(f'ReLU Error (Max: {np.max(error_relu):.2e})')
plt.grid(True, alpha=0.3)
plt.xlim(-5, 5)
# Error analysis for Sigmoid
plt.subplot(2, 3, 5)
error_sigmoid = np.abs(y_sigmoid - y_sigmoid_ideal)
plt.plot(x_range, error_sigmoid, 'g-', linewidth=2)
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.xlabel('Input (x)')
plt.ylabel('Error')
plt.title(f'Sigmoid Error (Max: {np.max(error_sigmoid):.2e})')
plt.grid(True, alpha=0.3)
plt.xlim(-5, 5)
# Error analysis for Tanh
plt.subplot(2, 3, 6)
error_tanh = np.abs(y_tanh - y_tanh_ideal)
plt.plot(x_range, error_tanh, 'orange', linewidth=2)
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
plt.xlabel('Input (x)')
plt.ylabel('Error')
plt.title(f'Tanh Error (Max: {np.max(error_tanh):.2e})')
plt.grid(True, alpha=0.3)
plt.xlim(-5, 5)
plt.tight_layout()
plt.show()
# Comprehensive analysis
print("\n📊 Comprehensive Analysis:")
print("=" * 60)
# Function ranges
print("📈 Output Ranges:")
print(f" ReLU: [{np.min(y_relu):.3f}, {np.max(y_relu):.3f}]")
print(f" Sigmoid: [{np.min(y_sigmoid):.3f}, {np.max(y_sigmoid):.3f}]")
print(f" Tanh: [{np.min(y_tanh):.3f}, {np.max(y_tanh):.3f}]")
# Implementation accuracy
print("\n🎯 Implementation Accuracy:")
max_errors = [np.max(error_relu), np.max(error_sigmoid), np.max(error_tanh)]
functions = ['ReLU', 'Sigmoid', 'Tanh']
for func, error in zip(functions, max_errors):
if error < 1e-10:
status = "✅ PERFECT"
elif error < 1e-6:
status = "✅ EXCELLENT"
elif error < 1e-3:
status = "⚠️ GOOD"
else:
status = "❌ NEEDS WORK"
print(f" {func:8s}: {status:12s} (error: {error:.2e})")
# Mathematical properties verification
print("\n🔍 Mathematical Properties:")
# Zero-centered test
x_zero = Tensor([[0.0]])
print(" Zero-centered test (f(0) should be 0):")
for name, func in [("ReLU", relu), ("Sigmoid", sigmoid), ("Tanh", tanh)]:
output = func(x_zero).data[0, 0]
is_zero = abs(output) < 1e-6
expected = 0.0 if name != "Sigmoid" else 0.5
print(f" {name:8s}: f(0) = {output:.4f} {'✅' if abs(output - expected) < 1e-6 else '❌'}")
# Monotonicity test
print(" Monotonicity test (should be increasing):")
test_vals = np.array([-2, -1, 0, 1, 2])
x_test = Tensor([test_vals])
for name, func in [("ReLU", relu), ("Sigmoid", sigmoid), ("Tanh", tanh)]:
outputs = func(x_test).data[0]
is_monotonic = np.all(outputs[1:] >= outputs[:-1])
print(f" {name:8s}: {'✅ Monotonic' if is_monotonic else '❌ Not monotonic'}")
print("\n🎉 Comparison complete! Use these insights to understand each function's role in neural networks.")
else:
print("📊 Plots disabled during testing - this is normal!")
except Exception as e:
print(f"❌ Error in plotting: {e}")
print("Make sure matplotlib is installed and all functions are implemented!")
# %% [markdown]
"""
## Step 5: Understanding Activation Function Properties
Let's explore the mathematical properties of each function (a short gradient-saturation sketch follows the checks below):
"""
# %%
# Explore activation function properties
try:
print("=== Activation Function Properties ===")
# Create test functions
relu = ReLU()
sigmoid = Sigmoid()
tanh = Tanh()
# Test with a range of values
test_values = np.linspace(-5, 5, 11)
x = Tensor([test_values])
print(f"Input range: {test_values}")
print(f"ReLU range: [{np.min(relu(x).data):.2f}, {np.max(relu(x).data):.2f}]")
print(f"Sigmoid range: [{np.min(sigmoid(x).data):.2f}, {np.max(sigmoid(x).data):.2f}]")
print(f"Tanh range: [{np.min(tanh(x).data):.2f}, {np.max(tanh(x).data):.2f}]")
# Test monotonicity (should all be increasing functions)
print(f"\n📈 Monotonicity Test:")
for name, func in [("ReLU", relu), ("Sigmoid", sigmoid), ("Tanh", tanh)]:
outputs = func(x).data[0]
is_monotonic = np.all(outputs[1:] >= outputs[:-1])
print(f"{name}: {'✅ Monotonic' if is_monotonic else '❌ Not monotonic'}")
# Test zero-centered property
print(f"\n🎯 Zero-Centered Test (f(0) = 0):")
x_zero = Tensor([[0.0]])
for name, func in [("ReLU", relu), ("Sigmoid", sigmoid), ("Tanh", tanh)]:
output = func(x_zero).data[0, 0]
is_zero_centered = abs(output) < 1e-6
print(f"{name}: f(0) = {output:.4f} {'✅ Zero-centered' if is_zero_centered else '❌ Not zero-centered'}")
print("\n🎉 Property analysis complete!")
except Exception as e:
print(f"❌ Error: {e}")
print("Check your activation function implementations!")
# %% [markdown]
"""
## Step 6: Practical Usage Examples
Let's see how these functions would be used in practice:
"""
# %%
# Practical usage examples
try:
print("=== Practical Usage Examples ===")
# Example 1: Binary classification with sigmoid
print("1. Binary Classification (Sigmoid):")
logits = Tensor([[2.5, -1.2, 0.8, -0.3]]) # Raw network outputs
sigmoid = Sigmoid()
probabilities = sigmoid(logits)
print(f" Logits: {logits.data}")
print(f" Probabilities: {probabilities.data}")
print(f" Predictions: {(probabilities.data > 0.5).astype(int)}")
# Example 2: Feature processing with ReLU
print("\n2. Feature Processing (ReLU):")
features = Tensor([[-0.5, 1.2, -2.1, 0.8, -0.1]]) # Mixed positive/negative
relu = ReLU()
processed = relu(features)
print(f" Raw features: {features.data}")
print(f" After ReLU: {processed.data}")
print(f" Sparsity: {np.mean(processed.data == 0):.1%} zeros")
# Example 3: Normalized features with Tanh
print("\n3. Normalized Features (Tanh):")
raw_features = Tensor([[3.2, -1.8, 0.5, -2.4, 1.1]])
tanh = Tanh()
normalized = tanh(raw_features)
print(f" Raw features: {raw_features.data}")
print(f" Normalized: {normalized.data}")
print(f" Mean: {np.mean(normalized.data):.3f} (close to 0)")
print("\n✅ Practical examples complete!")
except Exception as e:
print(f"❌ Error: {e}")
print("Check your activation function implementations!")
# %% [markdown]
"""
## 🎉 Congratulations!
You've successfully implemented the three most important activation functions in deep learning!
### 🧱 What You Built
1. **ReLU**: The workhorse activation that enables deep networks
2. **Sigmoid**: The probability activation for binary classification
3. **Tanh**: The zero-centered activation for better gradient flow
### 🎯 Key Insights
- **Nonlinearity is essential**: Without activations, neural networks are just linear transformations
- **Different functions serve different purposes**: ReLU for hidden layers, Sigmoid for probabilities, Tanh for zero-centered outputs
- **Mathematical properties matter**: Monotonicity, boundedness, and zero-centering affect learning
### 🚀 What's Next
These activation functions will be used in:
- **Layers Module**: Building neural network layers
- **Loss Functions**: Computing training objectives
- **Advanced Architectures**: CNNs, RNNs, and more
### 🔧 Export to Package
Run this to export your activations to the TinyTorch package:
```bash
python bin/tito.py sync
```
Then test your implementation:
```bash
python bin/tito.py test --module activations
```
**Excellent work! You've mastered the mathematical foundations of neural networks!** 🎉
---
## 📚 Further Reading
**Want to learn more about activation functions?**
- **ReLU variants**: Leaky ReLU, ELU, Swish (a tiny Leaky ReLU sketch follows below)
- **Advanced activations**: GELU, Mish, SiLU
- **Activation choice**: When to use which function
- **Gradient flow**: How activations affect training
**Next modules**: Layers, Loss Functions, Optimization
"""