mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-04-29 00:37:32 -05:00
- Restored tools/py_to_notebook.py as a focused, standalone tool - Updated tito notebooks command to use subprocess to call the separate tool - Maintains clean separation of concerns: tito.py for CLI orchestration, py_to_notebook.py for conversion logic - Updated documentation to use 'tito notebooks' command instead of direct tool calls - Benefits: easier debugging, better maintainability, focused single-responsibility modules
1162 lines
39 KiB
Plaintext
# ---
# jupyter:
#   jupytext:
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#       jupytext_version: 1.17.1
# ---

# %% [markdown]
"""
# 🔥 TinyTorch Activations Module

Welcome to the **Activations** module! This is where you'll implement the mathematical functions that give neural networks their power.

## 🎯 Learning Objectives

By the end of this module, you will:
1. **Understand** why activation functions are essential for neural networks
2. **Implement** the three most important activation functions: ReLU, Sigmoid, and Tanh
3. **Test** your functions with various inputs to understand their behavior
4. **Use** these functions as building blocks for neural networks

## 🧠 Why Activation Functions Matter

**Without activation functions, neural networks are just linear transformations!**

```
Linear → Linear → Linear = Still just Linear
Linear → Activation → Linear = Can learn complex patterns!
```

**Key insight**: Activation functions add **nonlinearity**, allowing networks to learn complex patterns that linear functions cannot capture.
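
To make this concrete, here is a minimal sketch in plain Python. It is not part of TinyTorch: the scalar "layers", their weights, and the helper names `linear` and `relu` are made up for illustration. It checks that two stacked linear layers compute the same function as one linear layer, and that placing a ReLU between them breaks that equivalence.

```python
# Illustrative sketch only: scalar "layers" with made-up weights.
def linear(w, b):
    # Return a function computing w * x + b
    return lambda x: w * x + b

def relu(x):
    return max(0.0, x)

layer1 = linear(2.0, 1.0)    # 2x + 1
layer2 = linear(-3.0, 0.5)   # -3x + 0.5

# Two linear layers collapse into one: -3(2x + 1) + 0.5 = -6x - 2.5
collapsed = linear(-6.0, -2.5)

inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]
stacked_outputs = [layer2(layer1(x)) for x in inputs]
collapsed_outputs = [collapsed(x) for x in inputs]
nonlinear_outputs = [layer2(relu(layer1(x))) for x in inputs]

print(stacked_outputs == collapsed_outputs)    # True: same function
print(nonlinear_outputs == collapsed_outputs)  # False: ReLU changes the function
```

The same argument holds for matrices: a product of weight matrices is itself a single matrix, so depth adds nothing without a nonlinearity in between.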

## 📚 What You'll Build

- **ReLU**: `f(x) = max(0, x)` - The workhorse of deep learning
- **Sigmoid**: `f(x) = 1 / (1 + e^(-x))` - Squashes to (0, 1)
- **Tanh**: `f(x) = tanh(x)` - Squashes to (-1, 1)

Each function serves different purposes and has different mathematical properties.

---

Let's start building! 🚀
"""

# %%
#| default_exp core.activations

# Standard library imports
import math
import os
import sys

# Third-party imports
import matplotlib.pyplot as plt
import numpy as np

# TinyTorch imports
from tinytorch.core.tensor import Tensor

# %%
# Helper function to detect if we're in a testing environment
def _should_show_plots():
    """
    Determine if we should show plots based on the execution context.

    Returns False if:
    - Running in pytest (detected by 'pytest' in sys.modules)
    - Running in test environment (detected by environment variables)
    - Running from command line test runner

    Returns True if:
    - Running in Jupyter notebook
    - Running interactively in Python
    """
    # Check if we're running in pytest
    if 'pytest' in sys.modules:
        return False

    # Check if we're in a test environment
    if os.environ.get('PYTEST_CURRENT_TEST'):
        return False

    # Check if we're running from a test file (more specific check)
    if any(arg.endswith('.py') and 'test_' in os.path.basename(arg) and 'tests/' in arg for arg in sys.argv):
        return False

    # Check if we're running from the tito CLI test command
    if len(sys.argv) > 0 and 'tito.py' in sys.argv[0] and 'test' in sys.argv:
        return False

    # Default to showing plots (notebook/interactive environment)
    return True

# %% [markdown]
"""
## Step 1: ReLU Activation Function

**ReLU** (Rectified Linear Unit) is the most popular activation function in deep learning.

**Formula**: `f(x) = max(0, x)`

**Properties**:
- **Simple**: Easy to compute and understand
- **Sparse**: Outputs exactly zero for negative inputs
- **Unbounded**: No upper limit on positive outputs
- **Non-saturating**: The gradient is 1 for every positive input, which mitigates vanishing gradients (units stuck at negative inputs receive zero gradient and can "die")

**When to use**: Almost everywhere! It's the default choice for hidden layers.
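
As a rough illustration of the sparsity and gradient properties listed above, the following sketch uses plain Python lists rather than the `Tensor` class. The helper names `relu` and `relu_grad` are hypothetical and exist only for this example. The gradient at exactly x = 0 is taken to be 0 here, which is a common convention and not something this module prescribes.

```python
def relu(values):
    # Element-wise max(0, x) over a list of floats
    return [max(0.0, v) for v in values]

def relu_grad(values):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise (convention at x = 0)
    return [1.0 if v > 0 else 0.0 for v in values]

xs = [-3.0, -1.0, 0.0, 1.0, 3.0]
outputs = relu(xs)
grads = relu_grad(xs)
sparsity = outputs.count(0.0) / len(outputs)

print(outputs)   # [0.0, 0.0, 0.0, 1.0, 3.0]
print(grads)     # [0.0, 0.0, 0.0, 1.0, 1.0]
print(sparsity)  # 0.6
```

Three of the five outputs are exactly zero, and the gradient passes through unchanged wherever the input is positive.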
"""

# %%
#| export
class ReLU:
    """
    ReLU Activation: f(x) = max(0, x)

    The most popular activation function in deep learning.
    Simple, effective, and computationally efficient.

    TODO: Implement ReLU activation function.
    """

    def forward(self, x: Tensor) -> Tensor:
        """
        Apply ReLU: f(x) = max(0, x)

        Args:
            x: Input tensor

        Returns:
            Output tensor with ReLU applied element-wise

        TODO: Implement element-wise max(0, x) operation
        Hint: Use np.maximum(0, x.data)
        """
        raise NotImplementedError("Student implementation required")

    def __call__(self, x: Tensor) -> Tensor:
        """Make activation callable: relu(x) same as relu.forward(x)"""
        return self.forward(x)

# %%
#| hide
#| export
class ReLU:
    """ReLU Activation: f(x) = max(0, x)"""

    def forward(self, x: Tensor) -> Tensor:
        """Apply ReLU: f(x) = max(0, x)"""
        return Tensor(np.maximum(0, x.data))

    def __call__(self, x: Tensor) -> Tensor:
        return self.forward(x)

# %% [markdown]
"""
### 🧪 Test Your ReLU Function

Once you implement ReLU above, run this cell to test it:
"""

# %%
# Test ReLU function
try:
    print("=== Testing ReLU Function ===")

    # Test data: mix of positive, negative, and zero
    x = Tensor([[-3.0, -1.0, 0.0, 1.0, 3.0]])
    print(f"Input: {x.data}")

    # Test ReLU
    relu = ReLU()
    y = relu(x)
    print(f"ReLU output: {y.data}")
    print(f"Expected: [[0. 0. 0. 1. 3.]]")

    # Test with different shapes
    x_2d = Tensor([[-2.0, 1.0], [0.5, -0.5]])
    y_2d = relu(x_2d)
    print(f"\n2D Input: {x_2d.data}")
    print(f"2D ReLU output: {y_2d.data}")

    print("✅ ReLU working!")

except Exception as e:
    print(f"❌ Error: {e}")
    print("Make sure to implement the ReLU function above!")

# %% [markdown]
"""
### 📊 Visualize ReLU Function

Let's plot the ReLU function to see how it transforms inputs:
"""

# %%
# Plot ReLU function
try:
    print("=== Plotting ReLU Function ===")

    # Create a range of input values
    x_range = np.linspace(-5, 5, 100)
    x_tensor = Tensor([x_range])

    # Apply ReLU (student implementation)
    relu = ReLU()
    y_tensor = relu(x_tensor)
    y_range = y_tensor.data[0]

    # Create ideal ReLU for comparison
    y_ideal = np.maximum(0, x_range)

    # Only show plots if we're not in a testing environment
    if _should_show_plots():
        # Create the plot
        plt.figure(figsize=(12, 8))

        # Plot both student implementation and ideal
        plt.subplot(2, 2, 1)
        plt.plot(x_range, y_range, 'b-', linewidth=3, label='Your ReLU Implementation')
        plt.plot(x_range, y_ideal, 'r--', linewidth=2, alpha=0.7, label='Ideal ReLU')
        plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
        plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
        plt.xlabel('Input (x)')
        plt.ylabel('Output')
        plt.title('ReLU: Your Implementation vs Ideal')
        plt.grid(True, alpha=0.3)
        plt.legend()
        plt.xlim(-5, 5)
        plt.ylim(-1, 5)

        # Mathematical explanation plot
        plt.subplot(2, 2, 2)
        # Show the mathematical definition
        x_math = np.array([-3, -2, -1, 0, 1, 2, 3])
        y_math = np.maximum(0, x_math)
        plt.stem(x_math, y_math, basefmt=' ', linefmt='g-', markerfmt='go')
        plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
        plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
        plt.xlabel('Input (x)')
        plt.ylabel('max(0, x)')
        plt.title('Mathematical Definition: max(0, x)')
        plt.grid(True, alpha=0.3)
        plt.xlim(-4, 4)
        plt.ylim(-0.5, 3.5)

        # Show the piecewise nature
        plt.subplot(2, 2, 3)
        x_left = np.linspace(-5, 0, 50)
        x_right = np.linspace(0, 5, 50)
        plt.plot(x_left, np.zeros_like(x_left), 'r-', linewidth=3, label='f(x) = 0 for x < 0')
        plt.plot(x_right, x_right, 'b-', linewidth=3, label='f(x) = x for x ≥ 0')
        plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
        plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
        plt.xlabel('Input (x)')
        plt.ylabel('Output')
        plt.title('Piecewise Function Definition')
        plt.grid(True, alpha=0.3)
        plt.legend()
        plt.xlim(-5, 5)
        plt.ylim(-1, 5)

        # Error analysis
        plt.subplot(2, 2, 4)
        difference = np.abs(y_range - y_ideal)
        max_error = np.max(difference)
        plt.plot(x_range, difference, 'purple', linewidth=2)
        plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
        plt.xlabel('Input (x)')
        plt.ylabel('|Your Output - Ideal Output|')
        plt.title(f'Implementation Error (Max: {max_error:.6f})')
        plt.grid(True, alpha=0.3)
        plt.xlim(-5, 5)

        plt.tight_layout()
        plt.show()

        # Print analysis
        print(f"\n📊 Analysis:")
        print(f"✅ Maximum error: {max_error:.10f}")
        if max_error < 1e-10:
            print("🎉 Perfect implementation!")
        elif max_error < 1e-6:
            print("🌟 Excellent implementation!")
        elif max_error < 1e-3:
            print("👍 Good implementation!")
        else:
            print("🔧 Implementation needs work.")

        print(f"📈 Function properties:")
        print(f" • Range: [0, ∞)")
        print(f" • Piecewise: f(x) = 0 for x < 0, f(x) = x for x ≥ 0")
        print(f" • Monotonic: Non-decreasing everywhere, strictly increasing for x ≥ 0")
        print(f" • Sparse: Exactly zero for negative inputs")
    else:
        print("📊 Plots disabled during testing - this is normal!")

        # Always show the mathematical analysis
        difference = np.abs(y_range - y_ideal)
        max_error = np.max(difference)
        print(f"\n📊 Mathematical Analysis:")
        print(f"✅ Maximum error: {max_error:.10f}")
        if max_error < 1e-10:
            print("🎉 Perfect implementation!")
        elif max_error < 1e-6:
            print("🌟 Excellent implementation!")
        elif max_error < 1e-3:
            print("👍 Good implementation!")
        else:
            print("🔧 Implementation needs work.")

except Exception as e:
    print(f"❌ Error in plotting: {e}")
    print("Make sure to implement the ReLU function above!")

# %% [markdown]
"""
## Step 2: Sigmoid Activation Function

**Sigmoid** squashes any input to the range (0, 1), making it useful for probabilities.

**Formula**: `f(x) = 1 / (1 + e^(-x))`

**Properties**:
- **Bounded**: Always outputs between 0 and 1
- **Smooth**: Differentiable everywhere
- **S-shaped**: Smooth transition from 0 to 1
- **Saturating**: The gradient approaches zero for large |x|, which can cause vanishing gradients

**When to use**: Binary classification (final layer), gates in RNNs/LSTMs.

**⚠️ Numerical Stability**: Be careful with large inputs to avoid overflow!
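
The overflow risk can be seen in a small sketch using only the standard-library `math` module. The function names `naive_sigmoid` and `stable_sigmoid` are hypothetical and chosen for this example. Note that `math.exp` raises `OverflowError`, whereas NumPy's `np.exp` typically emits an overflow warning and returns `inf`, so the exact symptom depends on which one you use.

```python
import math

def naive_sigmoid(x):
    # Direct translation of the formula; exp(-x) overflows for very negative x
    return 1.0 / (1.0 + math.exp(-x))

def stable_sigmoid(x):
    # Only ever exponentiate a non-positive number, so exp() stays in [0, 1]
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

try:
    naive_sigmoid(-1000.0)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

print(naive_overflowed)          # True
print(stable_sigmoid(-1000.0))   # 0.0
print(stable_sigmoid(0.0))       # 0.5
print(stable_sigmoid(1000.0))    # 1.0
```

Both branches of `stable_sigmoid` are algebraically the same function. Splitting on the sign of `x` only changes which exponential gets evaluated.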
"""

# %%
#| export
class Sigmoid:
    """
    Sigmoid Activation: f(x) = 1 / (1 + e^(-x))

    Squashes input to range (0, 1). Often used for binary classification.

    TODO: Implement Sigmoid activation function.
    """

    def forward(self, x: Tensor) -> Tensor:
        """
        Apply Sigmoid: f(x) = 1 / (1 + e^(-x))

        Args:
            x: Input tensor

        Returns:
            Output tensor with Sigmoid applied element-wise

        TODO: Implement sigmoid function (be careful with numerical stability!)

        Hint: For numerical stability, use:
        - For x >= 0: sigmoid(x) = 1 / (1 + exp(-x))
        - For x < 0: sigmoid(x) = exp(x) / (1 + exp(x))
        """
        raise NotImplementedError("Student implementation required")

    def __call__(self, x: Tensor) -> Tensor:
        return self.forward(x)

# %%
#| hide
#| export
class Sigmoid:
    """Sigmoid Activation: f(x) = 1 / (1 + e^(-x))"""

    def forward(self, x: Tensor) -> Tensor:
        """Apply Sigmoid with numerical stability"""
        # Use the numerically stable version to avoid overflow
        # For x >= 0: sigmoid(x) = 1 / (1 + exp(-x))
        # For x < 0: sigmoid(x) = exp(x) / (1 + exp(x))
        x_data = x.data
        # Allocate a float result so integer-valued inputs are not truncated to 0 or 1
        result = np.zeros_like(x_data, dtype=float)

        # Stable computation
        positive_mask = x_data >= 0
        result[positive_mask] = 1.0 / (1.0 + np.exp(-x_data[positive_mask]))
        result[~positive_mask] = np.exp(x_data[~positive_mask]) / (1.0 + np.exp(x_data[~positive_mask]))

        return Tensor(result)

    def __call__(self, x: Tensor) -> Tensor:
        return self.forward(x)

# %% [markdown]
"""
### 🧪 Test Your Sigmoid Function

Once you implement Sigmoid above, run this cell to test it:
"""

# %%
# Test Sigmoid function
try:
    print("=== Testing Sigmoid Function ===")

    # Test data: mix of positive, negative, and zero
    x = Tensor([[-5.0, -1.0, 0.0, 1.0, 5.0]])
    print(f"Input: {x.data}")

    # Test Sigmoid
    sigmoid = Sigmoid()
    y = sigmoid(x)
    print(f"Sigmoid output: {y.data}")
    print("Expected: values between 0 and 1")
    print(f"All values in (0,1)? {np.all((y.data > 0) & (y.data < 1))}")

    # Test specific values
    x_zero = Tensor([[0.0]])
    y_zero = sigmoid(x_zero)
    print(f"\nSigmoid(0) = {y_zero.data[0, 0]:.4f} (should be 0.5)")

    # Test extreme values (numerical stability)
    x_extreme = Tensor([[-100.0, 100.0]])
    y_extreme = sigmoid(x_extreme)
    print(f"Sigmoid([-100, 100]) = {y_extreme.data}")
    print("Should be close to [0, 1] without overflow errors")

    print("✅ Sigmoid working!")

except Exception as e:
    print(f"❌ Error: {e}")
    print("Make sure to implement the Sigmoid function above!")

# %% [markdown]
"""
### 📊 Visualize Sigmoid Function

Let's plot the Sigmoid function to see its S-shaped curve:
"""

# %%
# Plot Sigmoid function
try:
    print("=== Plotting Sigmoid Function ===")

    # Create a range of input values
    x_range = np.linspace(-10, 10, 100)
    x_tensor = Tensor([x_range])

    # Apply Sigmoid (student implementation)
    sigmoid = Sigmoid()
    y_tensor = sigmoid(x_tensor)
    y_range = y_tensor.data[0]

    # Create ideal Sigmoid for comparison
    y_ideal = 1.0 / (1.0 + np.exp(-x_range))

    # Only show plots if we're not in a testing environment
    if _should_show_plots():
        # Create the plot
        plt.figure(figsize=(12, 8))

        # Plot both student implementation and ideal
        plt.subplot(2, 2, 1)
        plt.plot(x_range, y_range, 'g-', linewidth=3, label='Your Sigmoid Implementation')
        plt.plot(x_range, y_ideal, 'r--', linewidth=2, alpha=0.7, label='Ideal Sigmoid')
        plt.axhline(y=0.5, color='orange', linestyle='--', alpha=0.5, label='y = 0.5')
        plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
        plt.axhline(y=1, color='k', linestyle='-', alpha=0.3)
        plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
        plt.xlabel('Input (x)')
        plt.ylabel('Output')
        plt.title('Sigmoid: Your Implementation vs Ideal')
        plt.grid(True, alpha=0.3)
        plt.legend()
        plt.xlim(-10, 10)
        plt.ylim(-0.1, 1.1)

        # Mathematical explanation plot
        plt.subplot(2, 2, 2)
        # Show key points
        x_key = np.array([-5, -2, -1, 0, 1, 2, 5])
        y_key = 1.0 / (1.0 + np.exp(-x_key))
        plt.stem(x_key, y_key, basefmt=' ', linefmt='orange', markerfmt='o')
        plt.axhline(y=0.5, color='orange', linestyle='--', alpha=0.5)
        plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
        plt.axhline(y=1, color='k', linestyle='-', alpha=0.3)
        plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
        plt.xlabel('Input (x)')
        plt.ylabel('1/(1+e^(-x))')
        plt.title('Mathematical Definition: 1/(1+e^(-x))')
        plt.grid(True, alpha=0.3)
        plt.xlim(-6, 6)
        plt.ylim(-0.1, 1.1)

        # Show the S-curve properties
        plt.subplot(2, 2, 3)
        x_detailed = np.linspace(-8, 8, 200)
        y_detailed = 1.0 / (1.0 + np.exp(-x_detailed))
        plt.plot(x_detailed, y_detailed, 'g-', linewidth=3)
        # Add asymptotes
        plt.axhline(y=0, color='r', linestyle='--', alpha=0.7, label='Lower asymptote: y = 0')
        plt.axhline(y=1, color='r', linestyle='--', alpha=0.7, label='Upper asymptote: y = 1')
        plt.axhline(y=0.5, color='orange', linestyle='--', alpha=0.7, label='Midpoint: y = 0.5')
        plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
        plt.xlabel('Input (x)')
        plt.ylabel('Output')
        plt.title('S-Curve Properties')
        plt.grid(True, alpha=0.3)
        plt.legend()
        plt.xlim(-8, 8)
        plt.ylim(-0.1, 1.1)

        # Error analysis
        plt.subplot(2, 2, 4)
        difference = np.abs(y_range - y_ideal)
        max_error = np.max(difference)
        plt.plot(x_range, difference, 'purple', linewidth=2)
        plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
        plt.xlabel('Input (x)')
        plt.ylabel('|Your Output - Ideal Output|')
        plt.title(f'Implementation Error (Max: {max_error:.6f})')
        plt.grid(True, alpha=0.3)
        plt.xlim(-10, 10)

        plt.tight_layout()
        plt.show()

        # Print analysis
        print(f"\n📊 Analysis:")
        print(f"✅ Maximum error: {max_error:.10f}")
        if max_error < 1e-10:
            print("🎉 Perfect implementation!")
        elif max_error < 1e-6:
            print("🌟 Excellent implementation!")
        elif max_error < 1e-3:
            print("👍 Good implementation!")
        else:
            print("🔧 Implementation needs work.")

        print(f"📈 Function properties:")
        print(f" • Range: (0, 1)")
        print(f" • Symmetric around (0, 0.5)")
        print(f" • Smooth and differentiable everywhere")
        print(f" • Saturates for large |x| (vanishing gradient problem)")
        print(f" • Useful for binary classification (outputs probabilities)")
    else:
        print("📊 Plots disabled during testing - this is normal!")

        # Always show the mathematical analysis
        difference = np.abs(y_range - y_ideal)
        max_error = np.max(difference)
        print(f"\n📊 Mathematical Analysis:")
        print(f"✅ Maximum error: {max_error:.10f}")
        if max_error < 1e-10:
            print("🎉 Perfect implementation!")
        elif max_error < 1e-6:
            print("🌟 Excellent implementation!")
        elif max_error < 1e-3:
            print("👍 Good implementation!")
        else:
            print("🔧 Implementation needs work.")

except Exception as e:
    print(f"❌ Error in plotting: {e}")
    print("Make sure to implement the Sigmoid function above!")

# %% [markdown]
"""
## Step 3: Tanh Activation Function

**Tanh** (Hyperbolic Tangent) squashes inputs to the range (-1, 1).

**Formula**: `f(x) = tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))`

**Properties**:
- **Bounded**: Always outputs between -1 and 1
- **Zero-centered**: Output is centered around 0
- **Smooth**: Differentiable everywhere
- **Stronger gradients**: The derivative at zero is 1, compared with 0.25 for sigmoid

**When to use**: Hidden layers when you want zero-centered outputs, RNNs.

**Advantage over Sigmoid**: Zero-centered outputs keep the inputs to the next layer balanced around zero, which tends to help gradient flow during training.
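
The relationship between the two functions can be checked numerically. The sketch below is illustrative only and uses the standard-library `math` module. The helpers `sigmoid` and `numerical_derivative` are hypothetical names for this example, and the slopes are finite-difference estimates, not exact derivatives.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def numerical_derivative(f, x, h=1e-6):
    # Central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Identity: tanh(x) = 2 * sigmoid(2x) - 1
xs = [-2.0, -0.5, 0.0, 0.5, 2.0]
max_gap = max(abs(math.tanh(x) - (2.0 * sigmoid(2.0 * x) - 1.0)) for x in xs)

tanh_slope = numerical_derivative(math.tanh, 0.0)
sigmoid_slope = numerical_derivative(sigmoid, 0.0)

print(max_gap < 1e-12)                                # True
print(round(tanh_slope, 4), round(sigmoid_slope, 4))  # 1.0 0.25
```

So tanh is a rescaled, shifted sigmoid, and its slope at the origin is four times larger.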
"""

# %%
#| export
class Tanh:
    """
    Tanh Activation: f(x) = tanh(x)

    Squashes input to range (-1, 1). Zero-centered output.

    TODO: Implement Tanh activation function.
    """

    def forward(self, x: Tensor) -> Tensor:
        """
        Apply Tanh: f(x) = tanh(x)

        Args:
            x: Input tensor

        Returns:
            Output tensor with Tanh applied element-wise

        TODO: Implement tanh function
        Hint: Use np.tanh(x.data)
        """
        raise NotImplementedError("Student implementation required")

    def __call__(self, x: Tensor) -> Tensor:
        return self.forward(x)

# %%
#| hide
#| export
class Tanh:
    """Tanh Activation: f(x) = tanh(x)"""

    def forward(self, x: Tensor) -> Tensor:
        """Apply Tanh"""
        return Tensor(np.tanh(x.data))

    def __call__(self, x: Tensor) -> Tensor:
        return self.forward(x)

# %% [markdown]
"""
### 🧪 Test Your Tanh Function

Once you implement Tanh above, run this cell to test it:
"""

# %%
# Test Tanh function
try:
    print("=== Testing Tanh Function ===")

    # Test data: mix of positive, negative, and zero
    x = Tensor([[-3.0, -1.0, 0.0, 1.0, 3.0]])
    print(f"Input: {x.data}")

    # Test Tanh
    tanh = Tanh()
    y = tanh(x)
    print(f"Tanh output: {y.data}")
    print("Expected: values between -1 and 1")
    print(f"All values in (-1,1)? {np.all((y.data > -1) & (y.data < 1))}")

    # Test specific values
    x_zero = Tensor([[0.0]])
    y_zero = tanh(x_zero)
    print(f"\nTanh(0) = {y_zero.data[0, 0]:.4f} (should be 0.0)")

    # Test extreme values
    x_extreme = Tensor([[-10.0, 10.0]])
    y_extreme = tanh(x_extreme)
    print(f"Tanh([-10, 10]) = {y_extreme.data}")
    print("Should be close to [-1, 1]")

    print("✅ Tanh working!")

except Exception as e:
    print(f"❌ Error: {e}")
    print("Make sure to implement the Tanh function above!")

# %% [markdown]
"""
### 📊 Visualize Tanh Function

Let's plot the Tanh function to see its zero-centered S-shaped curve:
"""

# %%
# Plot Tanh function
try:
    print("=== Plotting Tanh Function ===")

    # Create a range of input values
    x_range = np.linspace(-5, 5, 100)
    x_tensor = Tensor([x_range])

    # Apply Tanh (student implementation)
    tanh = Tanh()
    y_tensor = tanh(x_tensor)
    y_range = y_tensor.data[0]

    # Create ideal Tanh for comparison
    y_ideal = np.tanh(x_range)

    # Only show plots if we're not in a testing environment
    if _should_show_plots():
        # Create the plot
        plt.figure(figsize=(12, 8))

        # Plot both student implementation and ideal
        plt.subplot(2, 2, 1)
        plt.plot(x_range, y_range, 'orange', linewidth=3, label='Your Tanh Implementation')
        plt.plot(x_range, y_ideal, 'r--', linewidth=2, alpha=0.7, label='Ideal Tanh')
        plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
        plt.axhline(y=1, color='k', linestyle='--', alpha=0.3)
        plt.axhline(y=-1, color='k', linestyle='--', alpha=0.3)
        plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
        plt.xlabel('Input (x)')
        plt.ylabel('Output')
        plt.title('Tanh: Your Implementation vs Ideal')
        plt.grid(True, alpha=0.3)
        plt.legend()
        plt.xlim(-5, 5)
        plt.ylim(-1.2, 1.2)

        # Mathematical explanation plot
        plt.subplot(2, 2, 2)
        # Show key points
        x_key = np.array([-3, -2, -1, 0, 1, 2, 3])
        y_key = np.tanh(x_key)
        plt.stem(x_key, y_key, basefmt=' ', linefmt='purple', markerfmt='o')
        plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
        plt.axhline(y=1, color='k', linestyle='--', alpha=0.3)
        plt.axhline(y=-1, color='k', linestyle='--', alpha=0.3)
        plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
        plt.xlabel('Input (x)')
        plt.ylabel('tanh(x)')
        plt.title('Mathematical Definition: tanh(x)')
        plt.grid(True, alpha=0.3)
        plt.xlim(-4, 4)
        plt.ylim(-1.2, 1.2)

        # Show symmetry property
        plt.subplot(2, 2, 3)
        x_sym = np.linspace(-4, 4, 100)
        y_sym = np.tanh(x_sym)
        plt.plot(x_sym, y_sym, 'orange', linewidth=3, label='tanh(x)')
        plt.plot(-x_sym, -y_sym, 'b--', linewidth=2, alpha=0.7, label='-tanh(-x)')
        plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
        plt.axhline(y=1, color='r', linestyle='--', alpha=0.7, label='Upper asymptote: y = 1')
        plt.axhline(y=-1, color='r', linestyle='--', alpha=0.7, label='Lower asymptote: y = -1')
        plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
        plt.xlabel('Input (x)')
        plt.ylabel('Output')
        plt.title('Symmetry: tanh(-x) = -tanh(x)')
        plt.grid(True, alpha=0.3)
        plt.legend()
        plt.xlim(-4, 4)
        plt.ylim(-1.2, 1.2)

        # Error analysis
        plt.subplot(2, 2, 4)
        difference = np.abs(y_range - y_ideal)
        max_error = np.max(difference)
        plt.plot(x_range, difference, 'purple', linewidth=2)
        plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
        plt.xlabel('Input (x)')
        plt.ylabel('|Your Output - Ideal Output|')
        plt.title(f'Implementation Error (Max: {max_error:.6f})')
        plt.grid(True, alpha=0.3)
        plt.xlim(-5, 5)

        plt.tight_layout()
        plt.show()

        # Print analysis
        print(f"\n📊 Analysis:")
        print(f"✅ Maximum error: {max_error:.10f}")
        if max_error < 1e-10:
            print("🎉 Perfect implementation!")
        elif max_error < 1e-6:
            print("🌟 Excellent implementation!")
        elif max_error < 1e-3:
            print("👍 Good implementation!")
        else:
            print("🔧 Implementation needs work.")

        print(f"📈 Function properties:")
        print(f" • Range: (-1, 1)")
        print(f" • Odd function: tanh(-x) = -tanh(x)")
        print(f" • Symmetric around origin (0, 0)")
        print(f" • Smooth and differentiable everywhere")
        print(f" • Stronger gradients than sigmoid around zero")
        print(f" • Related to sigmoid: tanh(x) = 2*sigmoid(2x) - 1")
    else:
        print("📊 Plots disabled during testing - this is normal!")

        # Always show the mathematical analysis
        difference = np.abs(y_range - y_ideal)
        max_error = np.max(difference)
        print(f"\n📊 Mathematical Analysis:")
        print(f"✅ Maximum error: {max_error:.10f}")
        if max_error < 1e-10:
            print("🎉 Perfect implementation!")
        elif max_error < 1e-6:
            print("🌟 Excellent implementation!")
        elif max_error < 1e-3:
            print("👍 Good implementation!")
        else:
            print("🔧 Implementation needs work.")

except Exception as e:
    print(f"❌ Error in plotting: {e}")
    print("Make sure to implement the Tanh function above!")

# %% [markdown]
"""
## Step 4: Compare All Activation Functions

Let's see how all three functions behave on the same input:
"""

# %%
# Compare all activation functions
try:
    print("=== Comparing All Activation Functions ===")

    # Test data: range from -5 to 5
    x = Tensor([[-5.0, -2.0, -1.0, 0.0, 1.0, 2.0, 5.0]])
    print(f"Input: {x.data}")

    # Apply all activations
    relu = ReLU()
    sigmoid = Sigmoid()
    tanh = Tanh()

    y_relu = relu(x)
    y_sigmoid = sigmoid(x)
    y_tanh = tanh(x)

    print(f"\nReLU: {y_relu.data}")
    print(f"Sigmoid: {y_sigmoid.data}")
    print(f"Tanh: {y_tanh.data}")

    print("\n📊 Key Differences:")
    print("- ReLU: Zeros out negative values, unbounded positive")
    print("- Sigmoid: Squashes to (0, 1), always positive")
    print("- Tanh: Squashes to (-1, 1), zero-centered")

    print("\n✅ All activation functions working!")

except Exception as e:
    print(f"❌ Error: {e}")
    print("Make sure to implement all activation functions above!")

# %% [markdown]
|
|
"""
|
|
### 📊 Comprehensive Activation Function Comparison
|
|
|
|
Let's plot all three functions together to see their differences:
|
|
"""
|
|
|
|
# %%
|
|
# Plot all activation functions together
|
|
try:
|
|
print("=== Plotting All Activation Functions Together ===")
|
|
|
|
# Create a range of input values
|
|
x_range = np.linspace(-5, 5, 100)
|
|
x_tensor = Tensor([x_range])
|
|
|
|
# Apply all activations (student implementations)
|
|
relu = ReLU()
|
|
sigmoid = Sigmoid()
|
|
tanh = Tanh()
|
|
|
|
y_relu = relu(x_tensor).data[0]
|
|
y_sigmoid = sigmoid(x_tensor).data[0]
|
|
y_tanh = tanh(x_tensor).data[0]
|
|
|
|
# Create ideal functions for comparison
|
|
y_relu_ideal = np.maximum(0, x_range)
|
|
y_sigmoid_ideal = 1.0 / (1.0 + np.exp(-x_range))
|
|
y_tanh_ideal = np.tanh(x_range)
|
|
|
|
# Only show plots if we're not in a testing environment
|
|
if _should_show_plots():
|
|
# Create the comprehensive plot
|
|
plt.figure(figsize=(15, 10))
|
|
|
|
# Main comparison plot
|
|
plt.subplot(2, 3, (1, 2))
|
|
plt.plot(x_range, y_relu, 'b-', linewidth=3, label='Your ReLU')
|
|
plt.plot(x_range, y_sigmoid, 'g-', linewidth=3, label='Your Sigmoid')
|
|
plt.plot(x_range, y_tanh, 'orange', linewidth=3, label='Your Tanh')
|
|
|
|
# Add ideal functions as dashed lines
|
|
plt.plot(x_range, y_relu_ideal, 'b--', linewidth=1, alpha=0.7, label='Ideal ReLU')
|
|
plt.plot(x_range, y_sigmoid_ideal, 'g--', linewidth=1, alpha=0.7, label='Ideal Sigmoid')
|
|
plt.plot(x_range, y_tanh_ideal, '--', color='orange', linewidth=1, alpha=0.7, label='Ideal Tanh')
|
|
|
|
# Add reference lines
|
|
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
|
|
plt.axhline(y=1, color='k', linestyle='--', alpha=0.3)
|
|
plt.axhline(y=-1, color='k', linestyle='--', alpha=0.3)
|
|
plt.axvline(x=0, color='k', linestyle='-', alpha=0.3)
|
|
|
|
# Formatting
|
|
plt.xlabel('Input (x)', fontsize=12)
|
|
plt.ylabel('Output f(x)', fontsize=12)
|
|
plt.title('Activation Functions: Your Implementation vs Ideal', fontsize=14, fontweight='bold')
|
|
plt.grid(True, alpha=0.3)
|
|
plt.legend(fontsize=10, loc='upper left')
|
|
plt.xlim(-5, 5)
|
|
plt.ylim(-1.5, 5)
|
|
|
|
# Mathematical definitions
|
|
plt.subplot(2, 3, 3)
|
|
plt.text(0.05, 0.95, 'Mathematical Definitions:', fontsize=12, fontweight='bold',
|
|
transform=plt.gca().transAxes, verticalalignment='top')
|
|
plt.text(0.05, 0.85, 'ReLU:', fontsize=11, fontweight='bold', color='blue',
|
|
transform=plt.gca().transAxes, verticalalignment='top')
|
|
plt.text(0.05, 0.80, 'f(x) = max(0, x)', fontsize=10, fontfamily='monospace',
|
|
transform=plt.gca().transAxes, verticalalignment='top')
|
|
plt.text(0.05, 0.70, 'Sigmoid:', fontsize=11, fontweight='bold', color='green',
|
|
transform=plt.gca().transAxes, verticalalignment='top')
|
|
plt.text(0.05, 0.65, 'f(x) = 1/(1+e^(-x))', fontsize=10, fontfamily='monospace',
|
|
transform=plt.gca().transAxes, verticalalignment='top')
|
|
plt.text(0.05, 0.55, 'Tanh:', fontsize=11, fontweight='bold', color='orange',
|
|
transform=plt.gca().transAxes, verticalalignment='top')
|
|
plt.text(0.05, 0.50, 'f(x) = tanh(x)', fontsize=10, fontfamily='monospace',
|
|
transform=plt.gca().transAxes, verticalalignment='top')
|
|
plt.text(0.05, 0.45, ' = (e^x-e^(-x))/(e^x+e^(-x))', fontsize=10, fontfamily='monospace',
|
|
transform=plt.gca().transAxes, verticalalignment='top')
|
|
|
|
plt.text(0.05, 0.30, 'Key Properties:', fontsize=12, fontweight='bold',
|
|
transform=plt.gca().transAxes, verticalalignment='top')
|
|
plt.text(0.05, 0.25, '• ReLU: Sparse, unbounded', fontsize=10, color='blue',
|
|
transform=plt.gca().transAxes, verticalalignment='top')
|
|
plt.text(0.05, 0.20, '• Sigmoid: Bounded (0,1)', fontsize=10, color='green',
|
|
transform=plt.gca().transAxes, verticalalignment='top')
|
|
plt.text(0.05, 0.15, '• Tanh: Zero-centered (-1,1)', fontsize=10, color='orange',
|
|
transform=plt.gca().transAxes, verticalalignment='top')
|
|
plt.axis('off')
|
|
|
|
# Error analysis for ReLU
|
|
plt.subplot(2, 3, 4)
|
|
error_relu = np.abs(y_relu - y_relu_ideal)
|
|
plt.plot(x_range, error_relu, 'b-', linewidth=2)
|
|
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
|
|
plt.xlabel('Input (x)')
|
|
plt.ylabel('Error')
|
|
plt.title(f'ReLU Error (Max: {np.max(error_relu):.2e})')
|
|
plt.grid(True, alpha=0.3)
|
|
plt.xlim(-5, 5)
|
|
|
|
# Error analysis for Sigmoid
|
|
plt.subplot(2, 3, 5)
|
|
error_sigmoid = np.abs(y_sigmoid - y_sigmoid_ideal)
|
|
plt.plot(x_range, error_sigmoid, 'g-', linewidth=2)
|
|
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
|
|
plt.xlabel('Input (x)')
|
|
plt.ylabel('Error')
|
|
plt.title(f'Sigmoid Error (Max: {np.max(error_sigmoid):.2e})')
|
|
plt.grid(True, alpha=0.3)
|
|
plt.xlim(-5, 5)
|
|
|
|
# Error analysis for Tanh
|
|
plt.subplot(2, 3, 6)
|
|
error_tanh = np.abs(y_tanh - y_tanh_ideal)
|
|
plt.plot(x_range, error_tanh, 'orange', linewidth=2)
|
|
plt.axhline(y=0, color='k', linestyle='-', alpha=0.3)
|
|
plt.xlabel('Input (x)')
|
|
plt.ylabel('Error')
|
|
plt.title(f'Tanh Error (Max: {np.max(error_tanh):.2e})')
|
|
plt.grid(True, alpha=0.3)
|
|
plt.xlim(-5, 5)
|
|
|
|
plt.tight_layout()
|
|
plt.show()
|
|
|
|
        # Comprehensive analysis
        print("\n📊 Comprehensive Analysis:")
        print("=" * 60)

        # Function ranges
        print("📈 Output Ranges:")
        print(f"   ReLU:    [{np.min(y_relu):.3f}, {np.max(y_relu):.3f}]")
        print(f"   Sigmoid: [{np.min(y_sigmoid):.3f}, {np.max(y_sigmoid):.3f}]")
        print(f"   Tanh:    [{np.min(y_tanh):.3f}, {np.max(y_tanh):.3f}]")

        # Implementation accuracy
        print("\n🎯 Implementation Accuracy:")
        max_errors = [np.max(error_relu), np.max(error_sigmoid), np.max(error_tanh)]
        functions = ['ReLU', 'Sigmoid', 'Tanh']

        for func, error in zip(functions, max_errors):
            if error < 1e-10:
                status = "✅ PERFECT"
            elif error < 1e-6:
                status = "✅ EXCELLENT"
            elif error < 1e-3:
                status = "⚠️ GOOD"
            else:
                status = "❌ NEEDS WORK"
            print(f"   {func:8s}: {status:12s} (error: {error:.2e})")

        # Mathematical properties verification
        print("\n🔍 Mathematical Properties:")

        # Value at zero: ReLU and Tanh map 0 to 0, while Sigmoid maps 0 to 0.5
        x_zero = Tensor([[0.0]])
        print("   Value-at-zero test (ReLU and Tanh: f(0) = 0, Sigmoid: f(0) = 0.5):")
        for name, func in [("ReLU", relu), ("Sigmoid", sigmoid), ("Tanh", tanh)]:
            output = func(x_zero).data[0, 0]
            expected = 0.5 if name == "Sigmoid" else 0.0
            print(f"   {name:8s}: f(0) = {output:.4f} {'✅' if abs(output - expected) < 1e-6 else '❌'}")

        # Monotonicity test
        print("   Monotonicity test (should be increasing):")
        test_vals = np.array([-2, -1, 0, 1, 2])
        x_test = Tensor([test_vals])
        for name, func in [("ReLU", relu), ("Sigmoid", sigmoid), ("Tanh", tanh)]:
            outputs = func(x_test).data[0]
            is_monotonic = np.all(outputs[1:] >= outputs[:-1])
            print(f"   {name:8s}: {'✅ Monotonic' if is_monotonic else '❌ Not monotonic'}")

        print("\n🎉 Comparison complete! Use these insights to understand each function's role in neural networks.")
    else:
        print("📊 Plots disabled during testing - this is normal!")

except Exception as e:
    print(f"❌ Error in plotting: {e}")
    print("Make sure matplotlib is installed and all functions are implemented!")

# %% [markdown]
"""
## Step 5: Understanding Activation Function Properties

Let's explore the mathematical properties of each function:
"""

# %%
# Explore activation function properties
try:
    print("=== Activation Function Properties ===")

    # Create test functions
    relu = ReLU()
    sigmoid = Sigmoid()
    tanh = Tanh()

    # Test with a range of values
    test_values = np.linspace(-5, 5, 11)
    x = Tensor([test_values])

    print(f"Input range: {test_values}")
    print(f"ReLU range: [{np.min(relu(x).data):.2f}, {np.max(relu(x).data):.2f}]")
    print(f"Sigmoid range: [{np.min(sigmoid(x).data):.2f}, {np.max(sigmoid(x).data):.2f}]")
    print(f"Tanh range: [{np.min(tanh(x).data):.2f}, {np.max(tanh(x).data):.2f}]")

    # Test monotonicity (should all be increasing functions)
    print("\n📈 Monotonicity Test:")
    for name, func in [("ReLU", relu), ("Sigmoid", sigmoid), ("Tanh", tanh)]:
        outputs = func(x).data[0]
        is_monotonic = np.all(outputs[1:] >= outputs[:-1])
        print(f"{name}: {'✅ Monotonic' if is_monotonic else '❌ Not monotonic'}")

    # Test whether each function passes through the origin (f(0) = 0).
    # Note: f(0) = 0 alone does not make a function zero-centered. ReLU passes
    # this check, yet its outputs are never negative, so they are not centered on 0.
    print("\n🎯 Origin Test (f(0) = 0):")
    x_zero = Tensor([[0.0]])
    for name, func in [("ReLU", relu), ("Sigmoid", sigmoid), ("Tanh", tanh)]:
        output = func(x_zero).data[0, 0]
        passes_through_origin = abs(output) < 1e-6
        print(f"{name}: f(0) = {output:.4f} {'✅ f(0) = 0' if passes_through_origin else '❌ f(0) ≠ 0'}")

    print("\n🎉 Property analysis complete!")

except Exception as e:
    print(f"❌ Error: {e}")
    print("Check your activation function implementations!")

# %% [markdown]
"""
## Step 6: Practical Usage Examples

Let's see how these functions would be used in practice:
"""

# %%
# Practical usage examples
try:
    print("=== Practical Usage Examples ===")

    # Example 1: Binary classification with sigmoid
    print("1. Binary Classification (Sigmoid):")
    logits = Tensor([[2.5, -1.2, 0.8, -0.3]])  # Raw network outputs
    sigmoid = Sigmoid()
    probabilities = sigmoid(logits)
    print(f"   Logits: {logits.data}")
    print(f"   Probabilities: {probabilities.data}")
    print(f"   Predictions: {(probabilities.data > 0.5).astype(int)}")

    # Example 2: Feature processing with ReLU
    print("\n2. Feature Processing (ReLU):")
    features = Tensor([[-0.5, 1.2, -2.1, 0.8, -0.1]])  # Mixed positive/negative
    relu = ReLU()
    processed = relu(features)
    print(f"   Raw features: {features.data}")
    print(f"   After ReLU: {processed.data}")
    print(f"   Sparsity: {np.mean(processed.data == 0):.1%} zeros")

    # Example 3: Normalized features with Tanh
    print("\n3. Normalized Features (Tanh):")
    raw_features = Tensor([[3.2, -1.8, 0.5, -2.4, 1.1]])
    tanh = Tanh()
    normalized = tanh(raw_features)
    print(f"   Raw features: {raw_features.data}")
    print(f"   Normalized: {normalized.data}")
    print(f"   Mean: {np.mean(normalized.data):.3f} (close to 0)")

    print("\n✅ Practical examples complete!")

except Exception as e:
    print(f"❌ Error: {e}")
    print("Check your activation function implementations!")

# %% [markdown]
"""
## 🎉 Congratulations!

You've successfully implemented the three most important activation functions in deep learning!

### 🧱 What You Built
1. **ReLU**: The workhorse activation that enables deep networks
2. **Sigmoid**: The probability activation for binary classification
3. **Tanh**: The zero-centered activation for better gradient flow
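
To make the "zero-centered" label concrete, here is a small sketch that compares the mean output of each function over inputs symmetric around zero. It uses plain NumPy formulas rather than the `ReLU`, `Sigmoid`, and `Tanh` classes from this module, and the input grid (`np.linspace(-5, 5, 101)`) is an arbitrary choice for illustration.

```python
import numpy as np

# Inputs spread symmetrically around 0 (grid chosen only for illustration)
x = np.linspace(-5, 5, 101)

# Reference formulas written directly in NumPy (not the module's classes)
relu_out = np.maximum(0, x)
sigmoid_out = 1 / (1 + np.exp(-x))
tanh_out = np.tanh(x)

# A zero-centered activation produces outputs that average to about 0 here
means = {
    "ReLU": relu_out.mean(),
    "Sigmoid": sigmoid_out.mean(),
    "Tanh": tanh_out.mean(),
}
for name, mean in means.items():
    print(f"{name:8s} mean output: {mean:+.3f}")
```

On this grid Tanh averages to roughly 0 and Sigmoid to roughly 0.5, while ReLU's mean is positive because it never produces negative values. Note that ReLU satisfies f(0) = 0 and is still not zero-centered.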

### 🎯 Key Insights
- **Nonlinearity is essential**: Without activations, neural networks are just linear transformations
- **Different functions serve different purposes**: ReLU for hidden layers, Sigmoid for probabilities, Tanh for zero-centered outputs
- **Mathematical properties matter**: Monotonicity, boundedness, and zero-centering affect learning
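
The first insight can be checked numerically. The sketch below uses plain NumPy with small, hand-picked matrices (`W1`, `W2`, and `x` are hypothetical values, not part of TinyTorch) to show that two stacked linear maps equal a single linear map, and that placing ReLU between them breaks that equivalence.

```python
import numpy as np

# Hypothetical weights and inputs, chosen by hand so the arithmetic is easy to follow
x = np.array([[1.0, -2.0],
              [3.0, 0.5]])          # batch of 2 inputs, 2 features each
W1 = np.array([[1.0, -1.0],
               [0.5, 2.0]])         # first linear map (2 -> 2)
W2 = np.array([[2.0],
               [-1.0]])             # second linear map (2 -> 1)

# Linear -> Linear: apply the two maps one after the other
stacked = (x @ W1) @ W2

# The same result from ONE linear map whose weights are W1 @ W2
collapsed = x @ (W1 @ W2)
linear_collapses = np.allclose(stacked, collapsed)

# Linear -> ReLU -> Linear: the nonlinearity prevents the collapse
with_relu = np.maximum(0, x @ W1) @ W2
relu_differs = not np.allclose(with_relu, collapsed)

print(f"Linear -> Linear equals a single Linear: {linear_collapses}")
print(f"Linear -> ReLU -> Linear differs from it: {relu_differs}")
```

Because matrix multiplication is associative, any number of stacked linear layers reduces to one matrix. The ReLU step zeroes the negative intermediate values, so no single matrix reproduces the result for all inputs.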

### 🚀 What's Next
These activation functions will be used in:
- **Layers Module**: Building neural network layers
- **Loss Functions**: Computing training objectives
- **Advanced Architectures**: CNNs, RNNs, and more

### 🔧 Export to Package
Run this to export your activations to the TinyTorch package:
```bash
python bin/tito.py sync
```

Then test your implementation:
```bash
python bin/tito.py test --module activations
```

**Excellent work! You've mastered the mathematical foundations of neural networks!** 🎉

---

## 📚 Further Reading

**Want to learn more about activation functions?**
- **ReLU variants**: Leaky ReLU, ELU, Swish
- **Advanced activations**: GELU, Mish, SiLU
- **Activation choice**: When to use which function
- **Gradient flow**: How activations affect training

**Next modules**: Layers, Loss Functions, Optimization
"""