mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-06-03 17:50:52 -05:00
- Preserve computation graph by using Tensor arithmetic (x - x_max, exp / sum)
- No more .data extraction that breaks gradient flow
- Numerically stable with max subtraction before exp

Required for transformer attention softmax gradient flow
1209 lines
40 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "a65f03ef",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"# Activations - Intelligence Through Nonlinearity\n",
|
||
"\n",
|
||
"Welcome to Activations! Today you'll add the secret ingredient that makes neural networks intelligent: **nonlinearity**.\n",
|
||
"\n",
|
||
"## 🔗 Prerequisites & Progress\n",
|
||
"**You've Built**: Tensor with data manipulation and basic operations\n",
|
||
"**You'll Build**: Activation functions that add nonlinearity to transformations\n",
|
||
"**You'll Enable**: Neural networks with the ability to learn complex patterns\n",
|
||
"\n",
|
||
"**Connection Map**:\n",
|
||
"```\n",
|
||
"Tensor → Activations    → Layers\n",
"(data)   (intelligence)   (architecture)\n",
|
||
"```\n",
|
||
"\n",
|
||
"## Learning Objectives\n",
|
||
"By the end of this module, you will:\n",
|
||
"1. Implement 5 core activation functions (Sigmoid, ReLU, Tanh, GELU, Softmax)\n",
|
||
"2. Understand how nonlinearity enables neural network intelligence\n",
|
||
"3. Test activation behaviors and output ranges\n",
|
||
"4. Connect activations to real neural network components\n",
|
||
"\n",
|
||
"Let's add intelligence to your tensors!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "2d2bde70",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 📦 Where This Code Lives in the Final Package\n",
|
||
"\n",
|
||
"**Learning Side:** You work in modules/02_activations/activations_dev.py\n",
|
||
"**Building Side:** Code exports to tinytorch.core.activations\n",
|
||
"\n",
|
||
"```python\n",
|
||
"# Final package structure:\n",
|
||
"from tinytorch.core.activations import Sigmoid, ReLU, Tanh, GELU, Softmax # This module\n",
|
||
"from tinytorch.core.tensor import Tensor # Foundation (Module 01)\n",
|
||
"```\n",
|
||
"\n",
|
||
"**Why this matters:**\n",
|
||
"- **Learning:** Complete activation system in one focused module for deep understanding\n",
|
||
"- **Production:** Proper organization like PyTorch's torch.nn.functional with all activation operations together\n",
|
||
"- **Consistency:** All activation functions and behaviors in core.activations\n",
|
||
"- **Integration:** Works seamlessly with Tensor for complete nonlinear transformations"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "fc87ae92",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 📋 Module Prerequisites & Setup\n",
|
||
"\n",
|
||
"This module builds on previous TinyTorch components. Here's what we need and why:\n",
|
||
"\n",
|
||
"**Required Components:**\n",
|
||
"- **Tensor** (Module 01): Foundation for all activation computations and data flow\n",
|
||
"\n",
|
||
"**Integration Helper:**\n",
|
||
"The `import_previous_module()` function below helps us cleanly import components from previous modules during development and testing."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "7797ec62",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "setup",
|
||
"solution": true
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#| default_exp core.activations\n",
|
||
"#| export\n",
|
||
"\n",
|
||
"import numpy as np\n",
|
||
"from typing import Optional\n",
|
||
"import sys\n",
|
||
"import os\n",
|
||
"\n",
|
||
"\n",
|
||
"# Import will be in export cell"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "4cf71245",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 1. Introduction - What Makes Neural Networks Intelligent?\n",
|
||
"\n",
|
||
"Consider two scenarios:\n",
|
||
"\n",
|
||
"**Without Activations (Linear Only):**\n",
|
||
"```\n",
|
||
"Input → Linear Transform → Output\n",
|
||
"[1, 2] → [3, 4] → [11] # Just weighted sum\n",
|
||
"```\n",
|
||
"\n",
|
||
"**With Activations (Nonlinear):**\n",
|
||
"```\n",
|
||
"Input → Linear → Activation → Linear → Activation → Output\n",
|
||
"[1, 2] → [3, 4] → [3, 4] → [7] → [7] → Complex Pattern!\n",
|
||
"```\n",
|
||
"\n",
|
||
"The magic happens in those activation functions. They introduce **nonlinearity** - the ability to curve, bend, and create complex decision boundaries instead of just straight lines.\n",
|
||
"\n",
|
||
"### Why Nonlinearity Matters\n",
|
||
"\n",
|
||
"Without activation functions, stacking multiple linear layers is pointless:\n",
|
||
"```\n",
|
||
"Linear(Linear(x)) = Linear(x) # Same as single layer!\n",
|
||
"```\n",
|
||
"\n",
|
||
"With activation functions, each layer can learn increasingly complex patterns:\n",
|
||
"```\n",
|
||
"Layer 1: Simple edges and lines\n",
|
||
"Layer 2: Curves and shapes\n",
|
||
"Layer 3: Complex objects and concepts\n",
|
||
"```\n",
|
||
"\n",
|
||
"This is how deep networks build intelligence from simple mathematical operations."
|
||
]
|
||
},
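The claim above that stacked linear layers collapse into a single one can be checked numerically. This is a minimal sketch only: it uses plain Python lists rather than the module's `Tensor` class, and the weights `W1`, `W2` and the helpers `matvec`, `matmul`, and `relu` are made-up names for this illustration.

```python
# Illustrative helpers (hypothetical names, not part of TinyTorch).
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def relu(v):
    """Element-wise max(0, v)."""
    return [max(0.0, vi) for vi in v]

W1 = [[1.0, -2.0], [0.5, 1.0]]   # first "layer" (arbitrary example weights)
W2 = [[2.0, 1.0], [-1.0, 3.0]]   # second "layer"
x = [1.0, 2.0]

stacked = matvec(W2, matvec(W1, x))          # Linear(Linear(x))
collapsed = matvec(matmul(W2, W1), x)        # one layer with weights W2 @ W1
with_relu = matvec(W2, relu(matvec(W1, x)))  # activation between the layers

print("stacked   :", stacked)
print("collapsed :", collapsed)
print("with ReLU :", with_relu)
```

The first two results agree because composing linear maps gives another linear map. Placing ReLU between the layers changes the result, and that is what lets depth add expressive power.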
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "1a42e702",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 2. Mathematical Foundations\n",
|
||
"\n",
|
||
"Each activation function serves a different purpose in neural networks:\n",
|
||
"\n",
|
||
"### The Five Essential Activations\n",
|
||
"\n",
|
||
"1. **Sigmoid**: Maps to (0, 1) - perfect for probabilities\n",
|
||
"2. **ReLU**: Removes negatives - creates sparsity and efficiency\n",
|
||
"3. **Tanh**: Maps to (-1, 1) - zero-centered for better training\n",
|
||
"4. **GELU**: Smooth ReLU - modern choice for transformers\n",
|
||
"5. **Softmax**: Creates probability distributions - essential for classification\n",
|
||
"\n",
|
||
"Let's implement each one with clear explanations and immediate testing!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "a08f91f1",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 3. Implementation - Building Activation Functions\n",
|
||
"\n",
|
||
"### 🏗️ Implementation Pattern\n",
|
||
"\n",
|
||
"Each activation follows this structure:\n",
|
||
"```python\n",
|
||
"class ActivationName:\n",
|
||
" def forward(self, x: Tensor) -> Tensor:\n",
|
||
" # Apply mathematical transformation\n",
|
||
" # Return new Tensor with result\n",
|
||
"\n",
|
||
" def backward(self, grad: Tensor) -> Tensor:\n",
|
||
" # Stub for Module 05 - gradient computation\n",
|
||
" pass\n",
|
||
"```"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "bb7e11b8",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## Sigmoid - The Probability Gatekeeper\n",
|
||
"\n",
|
||
"Sigmoid maps any real number to the range (0, 1), making it perfect for probabilities and binary decisions.\n",
|
||
"\n",
|
||
"### Mathematical Definition\n",
|
||
"```\n",
|
||
"σ(x) = 1/(1 + e^(-x))\n",
|
||
"```\n",
|
||
"\n",
|
||
"### Visual Behavior\n",
|
||
"```\n",
|
||
"Input: [-3, -1, 0, 1, 3]\n",
|
||
" ↓ ↓ ↓ ↓ ↓ Sigmoid Function\n",
|
||
"Output: [0.05, 0.27, 0.5, 0.73, 0.95]\n",
|
||
"```\n",
|
||
"\n",
|
||
"### ASCII Visualization\n",
|
||
"```\n",
|
||
"Sigmoid Curve:\n",
|
||
" 1.0 ┤ ╭─────\n",
|
||
" │ ╱\n",
|
||
" 0.5 ┤ ╱\n",
|
||
" │ ╱\n",
|
||
" 0.0 ┤─╱─────────\n",
|
||
" -3 0 3\n",
|
||
"```\n",
|
||
"\n",
|
||
"**Why Sigmoid matters**: In binary classification, we need outputs between 0 and 1 to represent probabilities. Sigmoid gives us exactly that!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "b90730ab",
|
||
"metadata": {
|
||
"lines_to_next_cell": 1,
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "sigmoid-impl",
|
||
"solution": true
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#| export\n",
"from tinytorch.core.tensor import Tensor\n",
"\n",
"class Sigmoid:\n",
"    \"\"\"\n",
"    Sigmoid activation: σ(x) = 1/(1 + e^(-x))\n",
"\n",
"    Maps any real number to (0, 1) range.\n",
"    Perfect for probabilities and binary classification.\n",
"    \"\"\"\n",
"\n",
"    def forward(self, x: Tensor) -> Tensor:\n",
"        \"\"\"\n",
"        Apply sigmoid activation element-wise.\n",
"\n",
"        TODO: Implement sigmoid function\n",
"\n",
"        APPROACH:\n",
"        1. Apply sigmoid formula: 1 / (1 + exp(-x))\n",
"        2. Use np.exp for exponential\n",
"        3. Return result wrapped in new Tensor\n",
"\n",
"        EXAMPLE:\n",
"        >>> sigmoid = Sigmoid()\n",
"        >>> x = Tensor([-2, 0, 2])\n",
"        >>> result = sigmoid(x)\n",
"        >>> print(result.data)\n",
"        [0.119, 0.5, 0.881]  # All values between 0 and 1\n",
"\n",
"        HINT: np.exp(-x.data) computes the exponential element-wise\n",
"        \"\"\"\n",
"        ### BEGIN SOLUTION\n",
"        # Apply sigmoid: 1 / (1 + exp(-x))\n",
"        result_data = 1.0 / (1.0 + np.exp(-x.data))\n",
"        result = Tensor(result_data)\n",
"\n",
"        # Track gradients if autograd is enabled and input requires_grad.\n",
"        # SigmoidBackward is not defined in this module, so look it up safely\n",
"        # instead of risking a NameError before autograd exists.\n",
"        backward_cls = globals().get(\"SigmoidBackward\")\n",
"        if backward_cls is not None and x.requires_grad:\n",
"            result.requires_grad = True\n",
"            result._grad_fn = backward_cls(x, result)\n",
"\n",
"        return result\n",
"        ### END SOLUTION\n",
"\n",
"    def __call__(self, x: Tensor) -> Tensor:\n",
"        \"\"\"Allows the activation to be called like a function.\"\"\"\n",
"        return self.forward(x)\n",
"\n",
"    def backward(self, grad: Tensor) -> Tensor:\n",
"        \"\"\"Compute gradient (implemented in Module 05).\"\"\"\n",
"        pass  # Will implement backward pass in Module 05"
|
||
]
|
||
},
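The formula `1 / (1 + exp(-x))` evaluates `exp` of a large positive number when `x` is very negative. NumPy typically reports this as an overflow warning while still returning the right limit, whereas Python's `math.exp` raises `OverflowError`. One common way around this is to branch on the sign of `x`. The sketch below is not the module's implementation: it works on plain floats, and `stable_sigmoid` is a hypothetical name used only for this illustration.

```python
import math

def stable_sigmoid(x: float) -> float:
    """Sigmoid that only ever exponentiates a non-positive number."""
    if x >= 0:
        # exp(-x) is in (0, 1], so this cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite 1/(1+e^-x) as e^x/(1+e^x); exp(x) is in (0, 1).
    z = math.exp(x)
    return z / (1.0 + z)

for value in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(f"sigmoid({value:+.1f}) = {stable_sigmoid(value):.6f}")
```

Both branches compute the same function. The only difference is which algebraically equivalent form is evaluated, so the extreme inputs used in the unit test below (-1000 and 1000) are handled without overflow.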
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "27a57cf3",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🔬 Unit Test: Sigmoid\n",
|
||
"This test validates sigmoid activation behavior.\n",
|
||
"**What we're testing**: Sigmoid maps inputs to (0, 1) range\n",
|
||
"**Why it matters**: Ensures proper probability-like outputs\n",
|
||
"**Expected**: All outputs between 0 and 1, sigmoid(0) = 0.5"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "91296689",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": true,
|
||
"grade_id": "test-sigmoid",
|
||
"locked": true,
|
||
"points": 10
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def test_unit_sigmoid():\n",
"    \"\"\"🔬 Test Sigmoid implementation.\"\"\"\n",
"    print(\"🔬 Unit Test: Sigmoid...\")\n",
"\n",
"    sigmoid = Sigmoid()\n",
"\n",
"    # Test basic cases\n",
"    x = Tensor([0.0])\n",
"    result = sigmoid.forward(x)\n",
"    assert np.allclose(result.data, [0.5]), f\"sigmoid(0) should be 0.5, got {result.data}\"\n",
"\n",
"    # Test range property - all outputs should be in (0, 1)\n",
"    x = Tensor([-10, -1, 0, 1, 10])\n",
"    result = sigmoid.forward(x)\n",
"    assert np.all(result.data > 0) and np.all(result.data < 1), \"All sigmoid outputs should be in (0, 1)\"\n",
"\n",
"    # Test specific values\n",
"    x = Tensor([-1000, 1000])  # Extreme values\n",
"    result = sigmoid.forward(x)\n",
"    assert np.allclose(result.data[0], 0, atol=1e-10), \"sigmoid(-∞) should approach 0\"\n",
"    assert np.allclose(result.data[1], 1, atol=1e-10), \"sigmoid(+∞) should approach 1\"\n",
"\n",
"    print(\"✅ Sigmoid works correctly!\")\n",
"\n",
"if __name__ == \"__main__\":\n",
"    test_unit_sigmoid()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "41ae8ed4",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"## ReLU - The Sparsity Creator\n",
|
||
"\n",
|
||
"ReLU (Rectified Linear Unit) is the most popular activation function. It simply removes negative values, creating sparsity that makes neural networks more efficient.\n",
|
||
"\n",
|
||
"### Mathematical Definition\n",
|
||
"```\n",
|
||
"f(x) = max(0, x)\n",
|
||
"```\n",
|
||
"\n",
|
||
"### Visual Behavior\n",
|
||
"```\n",
|
||
"Input: [-2, -1, 0, 1, 2]\n",
|
||
" ↓ ↓ ↓ ↓ ↓ ReLU Function\n",
|
||
"Output: [ 0, 0, 0, 1, 2]\n",
|
||
"```\n",
|
||
"\n",
|
||
"### ASCII Visualization\n",
|
||
"```\n",
|
||
"ReLU Function:\n",
|
||
" ╱\n",
|
||
" 2 ╱\n",
|
||
" ╱\n",
|
||
" 1╱\n",
|
||
" ╱\n",
|
||
" ╱\n",
|
||
" ╱\n",
|
||
"─┴─────\n",
|
||
"-2 0 2\n",
|
||
"```\n",
|
||
"\n",
|
||
"**Why ReLU matters**: By zeroing negative values, ReLU creates sparsity (many zeros), which can make computation cheaper. Its gradient is also constant for positive inputs, so it does not saturate the way sigmoid and tanh do."
|
||
]
|
||
},
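To make the sparsity claim concrete, the sketch below applies ReLU to zero-mean random inputs and measures the fraction of zeros. It is an illustration under stated assumptions: inputs are drawn from a standard normal distribution, plain Python lists stand in for the `Tensor` class, and `relu` and `sparsity` are names chosen only for this example.

```python
import random

def relu(values):
    """Element-wise max(0, v) on a list of floats."""
    return [max(0.0, v) for v in values]

random.seed(0)  # fixed seed so the run is repeatable
pre_activations = [random.gauss(0.0, 1.0) for _ in range(10_000)]
activated = relu(pre_activations)

# Fraction of outputs that ReLU set to exactly zero.
sparsity = sum(1 for v in activated if v == 0.0) / len(activated)
print(f"fraction of zeros after ReLU: {sparsity:.2f}")
```

With symmetric, zero-centered inputs roughly half of the activations become zero. In a trained network the actual fraction depends on the weights and the data.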
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "c3438519",
|
||
"metadata": {
|
||
"lines_to_next_cell": 1,
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "relu-impl",
|
||
"solution": true
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#| export\n",
"class ReLU:\n",
"    \"\"\"\n",
"    ReLU activation: f(x) = max(0, x)\n",
"\n",
"    Sets negative values to zero, keeps positive values unchanged.\n",
"    Most popular activation for hidden layers.\n",
"    \"\"\"\n",
"\n",
"    def forward(self, x: Tensor) -> Tensor:\n",
"        \"\"\"\n",
"        Apply ReLU activation element-wise.\n",
"\n",
"        TODO: Implement ReLU function\n",
"\n",
"        APPROACH:\n",
"        1. Use np.maximum(0, x.data) for element-wise max with zero\n",
"        2. Return result wrapped in new Tensor\n",
"\n",
"        EXAMPLE:\n",
"        >>> relu = ReLU()\n",
"        >>> x = Tensor([-2, -1, 0, 1, 2])\n",
"        >>> result = relu(x)\n",
"        >>> print(result.data)\n",
"        [0, 0, 0, 1, 2]  # Negative values become 0, positive unchanged\n",
"\n",
"        HINT: np.maximum handles element-wise maximum automatically\n",
"        \"\"\"\n",
"        ### BEGIN SOLUTION\n",
"        # Apply ReLU: max(0, x)\n",
"        result = np.maximum(0, x.data)\n",
"        return Tensor(result)\n",
"        ### END SOLUTION\n",
"\n",
"    def __call__(self, x: Tensor) -> Tensor:\n",
"        \"\"\"Allows the activation to be called like a function.\"\"\"\n",
"        return self.forward(x)\n",
"\n",
"    def backward(self, grad: Tensor) -> Tensor:\n",
"        \"\"\"Compute gradient (implemented in Module 05).\"\"\"\n",
"        pass  # Will implement backward pass in Module 05"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "b038349a",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🔬 Unit Test: ReLU\n",
|
||
"This test validates ReLU activation behavior.\n",
|
||
"**What we're testing**: ReLU zeros negative values and preserves positive values\n",
|
||
"**Why it matters**: ReLU's sparsity helps neural networks train efficiently\n",
|
||
"**Expected**: Negative → 0, positive unchanged, zero → 0"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "710535c5",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": true,
|
||
"grade_id": "test-relu",
|
||
"locked": true,
|
||
"points": 10
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def test_unit_relu():\n",
"    \"\"\"🔬 Test ReLU implementation.\"\"\"\n",
"    print(\"🔬 Unit Test: ReLU...\")\n",
"\n",
"    relu = ReLU()\n",
"\n",
"    # Test mixed positive/negative values\n",
"    x = Tensor([-2, -1, 0, 1, 2])\n",
"    result = relu.forward(x)\n",
"    expected = [0, 0, 0, 1, 2]\n",
"    assert np.allclose(result.data, expected), f\"ReLU failed, expected {expected}, got {result.data}\"\n",
"\n",
"    # Test all negative\n",
"    x = Tensor([-5, -3, -1])\n",
"    result = relu.forward(x)\n",
"    assert np.allclose(result.data, [0, 0, 0]), \"ReLU should zero all negative values\"\n",
"\n",
"    # Test all positive\n",
"    x = Tensor([1, 3, 5])\n",
"    result = relu.forward(x)\n",
"    assert np.allclose(result.data, [1, 3, 5]), \"ReLU should preserve all positive values\"\n",
"\n",
"    # Test sparsity property\n",
"    x = Tensor([-1, -2, -3, 1])\n",
"    result = relu.forward(x)\n",
"    zeros = np.sum(result.data == 0)\n",
"    assert zeros == 3, f\"ReLU should create sparsity, got {zeros} zeros out of 4\"\n",
"\n",
"    print(\"✅ ReLU works correctly!\")\n",
"\n",
"if __name__ == \"__main__\":\n",
"    test_unit_relu()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "25c9a414",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"## Tanh - The Zero-Centered Alternative\n",
|
||
"\n",
|
||
"Tanh (hyperbolic tangent) is like sigmoid but centered around zero, mapping inputs to (-1, 1). This zero-centering helps with gradient flow during training.\n",
|
||
"\n",
|
||
"### Mathematical Definition\n",
|
||
"```\n",
|
||
"f(x) = (e^x - e^(-x))/(e^x + e^(-x))\n",
|
||
"```\n",
|
||
"\n",
|
||
"### Visual Behavior\n",
|
||
"```\n",
|
||
"Input: [-2, 0, 2]\n",
|
||
" ↓ ↓ ↓ Tanh Function\n",
|
||
"Output: [-0.96, 0, 0.96]\n",
|
||
"```\n",
|
||
"\n",
|
||
"### ASCII Visualization\n",
|
||
"```\n",
|
||
"Tanh Curve:\n",
|
||
" 1 ┤ ╭─────\n",
|
||
" │ ╱\n",
|
||
" 0 ┤───╱─────\n",
|
||
" │ ╱\n",
|
||
" -1 ┤─╱───────\n",
|
||
" -3 0 3\n",
|
||
"```\n",
|
||
"\n",
|
||
"**Why Tanh matters**: Unlike sigmoid, tanh outputs are centered around zero, which can help gradients flow better through deep networks."
|
||
]
|
||
},
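The description of tanh as a zero-centered relative of sigmoid can be made exact: tanh(x) = 2·σ(2x) − 1, a rescaled and shifted sigmoid. The sketch below checks this identity on a few points using only the standard library. The helper names `sigmoid` and `tanh_via_sigmoid` are chosen for this illustration and are not part of the module.

```python
import math

def sigmoid(x: float) -> float:
    """Plain sigmoid; fine for the moderate inputs used here."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_via_sigmoid(x: float) -> float:
    """tanh expressed as a rescaled, shifted sigmoid."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

points = (-2.0, -0.5, 0.0, 0.5, 2.0)
for x in points:
    print(f"x={x:+.1f}  tanh={math.tanh(x):+.4f}  2*sigmoid(2x)-1={tanh_via_sigmoid(x):+.4f}")
```

The two columns agree up to floating-point rounding, which shows that the practical difference between the two activations is the output range, (0, 1) versus (-1, 1), and the steeper slope of tanh at the origin.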
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "2e428827",
|
||
"metadata": {
|
||
"lines_to_next_cell": 1,
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "tanh-impl",
|
||
"solution": true
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#| export\n",
"class Tanh:\n",
"    \"\"\"\n",
"    Tanh activation: f(x) = (e^x - e^(-x))/(e^x + e^(-x))\n",
"\n",
"    Maps any real number to (-1, 1) range.\n",
"    Zero-centered alternative to sigmoid.\n",
"    \"\"\"\n",
"\n",
"    def forward(self, x: Tensor) -> Tensor:\n",
"        \"\"\"\n",
"        Apply tanh activation element-wise.\n",
"\n",
"        TODO: Implement tanh function\n",
"\n",
"        APPROACH:\n",
"        1. Use np.tanh(x.data) for hyperbolic tangent\n",
"        2. Return result wrapped in new Tensor\n",
"\n",
"        EXAMPLE:\n",
"        >>> tanh = Tanh()\n",
"        >>> x = Tensor([-2, 0, 2])\n",
"        >>> result = tanh(x)\n",
"        >>> print(result.data)\n",
"        [-0.964, 0.0, 0.964]  # Range (-1, 1), symmetric around 0\n",
"\n",
"        HINT: NumPy provides np.tanh function\n",
"        \"\"\"\n",
"        ### BEGIN SOLUTION\n",
"        # Apply tanh using NumPy\n",
"        result = np.tanh(x.data)\n",
"        return Tensor(result)\n",
"        ### END SOLUTION\n",
"\n",
"    def __call__(self, x: Tensor) -> Tensor:\n",
"        \"\"\"Allows the activation to be called like a function.\"\"\"\n",
"        return self.forward(x)\n",
"\n",
"    def backward(self, grad: Tensor) -> Tensor:\n",
"        \"\"\"Compute gradient (implemented in Module 05).\"\"\"\n",
"        pass  # Will implement backward pass in Module 05"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "045af2f1",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🔬 Unit Test: Tanh\n",
|
||
"This test validates tanh activation behavior.\n",
|
||
"**What we're testing**: Tanh maps inputs to (-1, 1) range, zero-centered\n",
|
||
"**Why it matters**: Zero-centered activations can help with gradient flow\n",
|
||
"**Expected**: All outputs in (-1, 1), tanh(0) = 0, symmetric behavior"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "287a3c73",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": true,
|
||
"grade_id": "test-tanh",
|
||
"locked": true,
|
||
"points": 10
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def test_unit_tanh():\n",
"    \"\"\"🔬 Test Tanh implementation.\"\"\"\n",
"    print(\"🔬 Unit Test: Tanh...\")\n",
"\n",
"    tanh = Tanh()\n",
"\n",
"    # Test zero\n",
"    x = Tensor([0.0])\n",
"    result = tanh.forward(x)\n",
"    assert np.allclose(result.data, [0.0]), f\"tanh(0) should be 0, got {result.data}\"\n",
"\n",
"    # Test range property - outputs lie in (-1, 1) mathematically, but floating-point\n",
"    # rounding can reach the endpoints, so check the closed interval [-1, 1]\n",
"    x = Tensor([-10, -1, 0, 1, 10])\n",
"    result = tanh.forward(x)\n",
"    assert np.all(result.data >= -1) and np.all(result.data <= 1), \"All tanh outputs should be in [-1, 1]\"\n",
"\n",
"    # Test symmetry: tanh(-x) = -tanh(x)\n",
"    x = Tensor([2.0])\n",
"    pos_result = tanh.forward(x)\n",
"    x_neg = Tensor([-2.0])\n",
"    neg_result = tanh.forward(x_neg)\n",
"    assert np.allclose(pos_result.data, -neg_result.data), \"tanh should be symmetric: tanh(-x) = -tanh(x)\"\n",
"\n",
"    # Test extreme values\n",
"    x = Tensor([-1000, 1000])\n",
"    result = tanh.forward(x)\n",
"    assert np.allclose(result.data[0], -1, atol=1e-10), \"tanh(-∞) should approach -1\"\n",
"    assert np.allclose(result.data[1], 1, atol=1e-10), \"tanh(+∞) should approach 1\"\n",
"\n",
"    print(\"✅ Tanh works correctly!\")\n",
"\n",
"if __name__ == \"__main__\":\n",
"    test_unit_tanh()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "7be7b936",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"## GELU - The Smooth Modern Choice\n",
|
||
"\n",
|
||
"GELU (Gaussian Error Linear Unit) is a smooth approximation to ReLU that's become popular in modern architectures like transformers. Unlike ReLU's sharp corner, GELU is smooth everywhere.\n",
|
||
"\n",
|
||
"### Mathematical Definition\n",
|
||
"```\n",
|
||
"f(x) = x * Φ(x) ≈ x * Sigmoid(1.702 * x)\n",
|
||
"```\n",
|
||
"Where Φ(x) is the cumulative distribution function of the standard normal distribution.\n",
|
||
"\n",
|
||
"### Visual Behavior\n",
|
||
"```\n",
|
||
"Input: [-1, 0, 1]\n",
|
||
" ↓ ↓ ↓ GELU Function\n",
|
||
"Output: [-0.16, 0, 0.84]\n",
|
||
"```\n",
|
||
"\n",
|
||
"### ASCII Visualization\n",
|
||
"```\n",
|
||
"GELU Function:\n",
|
||
" ╱\n",
|
||
" 1 ╱\n",
|
||
" ╱\n",
|
||
" ╱\n",
|
||
" ╱\n",
|
||
" ╱ ↙ (smooth curve, no sharp corner)\n",
|
||
" ╱\n",
|
||
"─┴─────\n",
|
||
"-2 0 2\n",
|
||
"```\n",
|
||
"\n",
|
||
"**Why GELU matters**: Used in GPT, BERT, and other transformers. The smoothness helps with optimization compared to ReLU's sharp corner."
|
||
]
|
||
},
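The definition above gives both the exact form `x * Φ(x)` and the sigmoid approximation `x * Sigmoid(1.702 * x)`. A quick way to see how close they are is to evaluate both on a grid. This is a sketch on plain floats using the standard library's `math.erf`; the function names `gelu_exact` and `gelu_sigmoid` and the grid are assumptions made for this illustration.

```python
import math

def gelu_exact(x: float) -> float:
    """x * Φ(x), with Φ written in terms of the error function."""
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return x * phi

def gelu_sigmoid(x: float) -> float:
    """Sigmoid-based approximation: x * sigmoid(1.702 * x)."""
    return x / (1.0 + math.exp(-1.702 * x))

# Compare the two on a grid from -4.0 to 4.0 in steps of 0.1.
xs = [i / 10.0 for i in range(-40, 41)]
max_gap = max(abs(gelu_exact(x) - gelu_sigmoid(x)) for x in xs)

for x in (-1.0, 0.0, 1.0):
    print(f"x={x:+.1f}  exact={gelu_exact(x):+.4f}  approx={gelu_sigmoid(x):+.4f}")
print(f"largest gap on the grid: {max_gap:.4f}")
```

On this grid the two curves stay within a few hundredths of each other. That is why the approximate values at ±1 differ slightly in the third decimal place from the exact ones.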
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "faa72fc8",
|
||
"metadata": {
|
||
"lines_to_next_cell": 1,
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "gelu-impl",
|
||
"solution": true
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#| export\n",
"class GELU:\n",
"    \"\"\"\n",
"    GELU activation: f(x) = x * Φ(x) ≈ x * Sigmoid(1.702 * x)\n",
"\n",
"    Smooth approximation to ReLU, used in modern transformers.\n",
"    Where Φ(x) is the cumulative distribution function of standard normal.\n",
"    \"\"\"\n",
"\n",
"    def forward(self, x: Tensor) -> Tensor:\n",
"        \"\"\"\n",
"        Apply GELU activation element-wise.\n",
"\n",
"        TODO: Implement GELU approximation\n",
"\n",
"        APPROACH:\n",
"        1. Use approximation: x * sigmoid(1.702 * x)\n",
"        2. Compute sigmoid part: 1 / (1 + exp(-1.702 * x))\n",
"        3. Multiply by x element-wise\n",
"        4. Return result wrapped in new Tensor\n",
"\n",
"        EXAMPLE:\n",
"        >>> gelu = GELU()\n",
"        >>> x = Tensor([-1, 0, 1])\n",
"        >>> result = gelu(x)\n",
"        >>> print(result.data)\n",
"        [-0.154, 0.0, 0.846]  # Smooth, like ReLU but differentiable everywhere\n",
"\n",
"        HINT: 1.702 is an empirically fitted constant for the sigmoid approximation;\n",
"        the √(2/π) factor belongs to the separate tanh-based GELU approximation\n",
"        \"\"\"\n",
"        ### BEGIN SOLUTION\n",
"        # GELU approximation: x * sigmoid(1.702 * x)\n",
"        # First compute sigmoid part\n",
"        sigmoid_part = 1.0 / (1.0 + np.exp(-1.702 * x.data))\n",
"        # Then multiply by x\n",
"        result = x.data * sigmoid_part\n",
"        return Tensor(result)\n",
"        ### END SOLUTION\n",
"\n",
"    def __call__(self, x: Tensor) -> Tensor:\n",
"        \"\"\"Allows the activation to be called like a function.\"\"\"\n",
"        return self.forward(x)\n",
"\n",
"    def backward(self, grad: Tensor) -> Tensor:\n",
"        \"\"\"Compute gradient (implemented in Module 05).\"\"\"\n",
"        pass  # Will implement backward pass in Module 05"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "aca7e16d",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🔬 Unit Test: GELU\n",
|
||
"This test validates GELU activation behavior.\n",
|
||
"**What we're testing**: GELU provides smooth ReLU-like behavior\n",
|
||
"**Why it matters**: GELU is used in modern transformers like GPT and BERT\n",
|
||
"**Expected**: Smooth curve, GELU(0) ≈ 0, positive values roughly preserved"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "d66fad33",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": true,
|
||
"grade_id": "test-gelu",
|
||
"locked": true,
|
||
"points": 10
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def test_unit_gelu():\n",
"    \"\"\"🔬 Test GELU implementation.\"\"\"\n",
"    print(\"🔬 Unit Test: GELU...\")\n",
"\n",
"    gelu = GELU()\n",
"\n",
"    # Test zero (should be approximately 0)\n",
"    x = Tensor([0.0])\n",
"    result = gelu.forward(x)\n",
"    assert np.allclose(result.data, [0.0], atol=1e-10), f\"GELU(0) should be ≈0, got {result.data}\"\n",
"\n",
"    # Test positive values (should be roughly preserved)\n",
"    x = Tensor([1.0])\n",
"    result = gelu.forward(x)\n",
"    assert result.data[0] > 0.8, f\"GELU(1) should be ≈0.84, got {result.data[0]}\"\n",
"\n",
"    # Test negative values (should be small but not zero)\n",
"    x = Tensor([-1.0])\n",
"    result = gelu.forward(x)\n",
"    assert result.data[0] < 0 and result.data[0] > -0.2, f\"GELU(-1) should be ≈-0.16, got {result.data[0]}\"\n",
"\n",
"    # Test smoothness property (no sharp corners like ReLU)\n",
"    x = Tensor([-0.001, 0.0, 0.001])\n",
"    result = gelu.forward(x)\n",
"    # Values should be close to each other (smooth)\n",
"    diff1 = abs(result.data[1] - result.data[0])\n",
"    diff2 = abs(result.data[2] - result.data[1])\n",
"    assert diff1 < 0.01 and diff2 < 0.01, \"GELU should be smooth around zero\"\n",
"\n",
"    print(\"✅ GELU works correctly!\")\n",
"\n",
"if __name__ == \"__main__\":\n",
"    test_unit_gelu()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "13a2312e",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"## Softmax - The Probability Distributor\n",
|
||
"\n",
|
||
"Softmax converts any vector into a valid probability distribution. All outputs are positive and sum to exactly 1.0, making it essential for multi-class classification.\n",
|
||
"\n",
|
||
"### Mathematical Definition\n",
|
||
"```\n",
|
||
"f(x_i) = e^(x_i) / Σ(e^(x_j))\n",
|
||
"```\n",
|
||
"\n",
|
||
"### Visual Behavior\n",
|
||
"```\n",
|
||
"Input: [1, 2, 3]\n",
|
||
" ↓ ↓ ↓ Softmax Function\n",
|
||
"Output: [0.09, 0.24, 0.67] # Sum = 1.0\n",
|
||
"```\n",
|
||
"\n",
|
||
"### ASCII Visualization\n",
|
||
"```\n",
|
||
"Softmax Transform:\n",
|
||
"Raw scores: [1, 2, 3, 4]\n",
|
||
" ↓ Exponential ↓\n",
|
||
" [2.7, 7.4, 20.1, 54.6]\n",
|
||
" ↓ Normalize ↓\n",
|
||
" [0.03, 0.09, 0.24, 0.64] ← Sum = 1.0\n",
|
||
"```\n",
|
||
"\n",
|
||
"**Why Softmax matters**: In multi-class classification, we need outputs that represent probabilities for each class. Softmax guarantees valid probabilities."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "a5fbaab2",
|
||
"metadata": {
|
||
"lines_to_next_cell": 1,
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "softmax-impl",
|
||
"solution": true
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#| export\n",
"class Softmax:\n",
"    \"\"\"\n",
"    Softmax activation: f(x_i) = e^(x_i) / Σ(e^(x_j))\n",
"\n",
"    Converts any vector to a probability distribution.\n",
"    Sum of all outputs equals 1.0.\n",
"    \"\"\"\n",
"\n",
"    def forward(self, x: Tensor, dim: int = -1) -> Tensor:\n",
"        \"\"\"\n",
"        Apply softmax activation along specified dimension.\n",
"\n",
"        TODO: Implement numerically stable softmax\n",
"\n",
"        APPROACH:\n",
"        1. Subtract max for numerical stability: x - max(x)\n",
"        2. Compute exponentials: exp(x - max(x))\n",
"        3. Sum along dimension: sum(exp_values)\n",
"        4. Divide: exp_values / sum\n",
"        5. Return result wrapped in new Tensor\n",
"\n",
"        EXAMPLE:\n",
"        >>> softmax = Softmax()\n",
"        >>> x = Tensor([1, 2, 3])\n",
"        >>> result = softmax(x)\n",
"        >>> print(result.data)\n",
"        [0.090, 0.245, 0.665]  # Sums to 1.0, larger inputs get higher probability\n",
"\n",
"        HINTS:\n",
"        - Use np.max(x.data, axis=dim, keepdims=True) for max\n",
"        - Use np.sum(exp_values, axis=dim, keepdims=True) for sum\n",
"        - The max subtraction prevents overflow in exponentials\n",
"        \"\"\"\n",
"        ### BEGIN SOLUTION\n",
"        # Numerical stability: subtract the max along `dim` to prevent overflow in exp\n",
"        # The shift and the final division use Tensor arithmetic to preserve gradient flow\n",
"        x_max_data = np.max(x.data, axis=dim, keepdims=True)\n",
"        x_max = Tensor(x_max_data, requires_grad=False)  # max is treated as a constant shift\n",
"        x_shifted = x - x_max  # Tensor subtraction!\n",
"\n",
"        # Compute exponentials (NumPy operation, but wrapped in Tensor)\n",
"        exp_values = Tensor(np.exp(x_shifted.data), requires_grad=x_shifted.requires_grad)\n",
"\n",
"        # Sum along dimension (NumPy operation, re-wrapped in a Tensor)\n",
"        exp_sum_data = np.sum(exp_values.data, axis=dim, keepdims=True)\n",
"        exp_sum = Tensor(exp_sum_data, requires_grad=exp_values.requires_grad)\n",
"\n",
"        # Normalize to get probabilities (Tensor division!)\n",
"        result = exp_values / exp_sum\n",
"        return result\n",
"        ### END SOLUTION\n",
"\n",
"    def __call__(self, x: Tensor, dim: int = -1) -> Tensor:\n",
"        \"\"\"Allows the activation to be called like a function.\"\"\"\n",
"        return self.forward(x, dim)\n",
"\n",
"    def backward(self, grad: Tensor) -> Tensor:\n",
"        \"\"\"Compute gradient (implemented in Module 05).\"\"\"\n",
"        pass  # Will implement backward pass in Module 05"
|
||
]
|
||
},
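The max-subtraction step works because softmax is unchanged when the same constant is added to every input, so shifting by the maximum keeps every exponent at or below zero. The sketch below contrasts a naive version with a shifted one on plain Python lists. The names `naive_softmax` and `stable_softmax` are hypothetical and used only for this illustration. Note also that `math.exp` raises `OverflowError` on large inputs, whereas NumPy would typically produce `inf` and then `nan` with a warning.

```python
import math

def naive_softmax(xs):
    """Direct translation of e^(x_i) / Σ e^(x_j); overflows for large inputs."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(xs):
    """Same function, but every exponent is shifted to be <= 0 first."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

small = stable_softmax([1.0, 2.0, 3.0])
large = stable_softmax([1000.0, 1001.0, 1002.0])  # same gaps, huge offset

try:
    naive_softmax([1000.0, 1001.0, 1002.0])
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True

print("small inputs  :", [round(p, 3) for p in small])
print("large inputs  :", [round(p, 3) for p in large])
print("naive overflow:", naive_overflowed)
```

Both inputs have the same gaps between entries, so the shifted version returns the same distribution for each, while the naive version fails on the large one.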
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "b7f6d4a6",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🔬 Unit Test: Softmax\n",
|
||
"This test validates softmax activation behavior.\n",
|
||
"**What we're testing**: Softmax creates valid probability distributions\n",
|
||
"**Why it matters**: Essential for multi-class classification outputs\n",
|
||
"**Expected**: Outputs sum to 1.0, all values in (0, 1), largest input gets highest probability"
|
||
]
|
||
},
|
||
{
"cell_type": "code",
"execution_count": null,
"id": "a68dea4a",
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "test-softmax",
"locked": true,
"points": 10
}
},
"outputs": [],
"source": [
"def test_unit_softmax():\n",
"    \"\"\"🔬 Test Softmax implementation.\"\"\"\n",
"    print(\"🔬 Unit Test: Softmax...\")\n",
"\n",
"    softmax = Softmax()\n",
"\n",
"    # Test basic probability properties\n",
"    x = Tensor([1, 2, 3])\n",
"    result = softmax.forward(x)\n",
"\n",
"    # Should sum to 1\n",
"    assert np.allclose(np.sum(result.data), 1.0), f\"Softmax should sum to 1, got {np.sum(result.data)}\"\n",
"\n",
"    # All values should be positive\n",
"    assert np.all(result.data > 0), \"All softmax values should be positive\"\n",
"\n",
"    # All values should be less than 1\n",
"    assert np.all(result.data < 1), \"All softmax values should be less than 1\"\n",
"\n",
"    # Largest input should get largest output\n",
"    max_input_idx = np.argmax(x.data)\n",
"    max_output_idx = np.argmax(result.data)\n",
"    assert max_input_idx == max_output_idx, \"Largest input should get largest softmax output\"\n",
"\n",
"    # Test numerical stability with large numbers\n",
"    x = Tensor([1000, 1001, 1002])  # Would overflow without max subtraction\n",
"    result = softmax.forward(x)\n",
"    assert np.allclose(np.sum(result.data), 1.0), \"Softmax should handle large numbers\"\n",
"    assert not np.any(np.isnan(result.data)), \"Softmax should not produce NaN\"\n",
"    assert not np.any(np.isinf(result.data)), \"Softmax should not produce infinity\"\n",
"\n",
"    # Test with 2D tensor (batch dimension)\n",
"    x = Tensor([[1, 2], [3, 4]])\n",
"    result = softmax.forward(x, dim=-1)  # Softmax along last dimension\n",
"    assert result.shape == (2, 2), \"Softmax should preserve input shape\"\n",
"    # Each row should sum to 1\n",
"    row_sums = np.sum(result.data, axis=-1)\n",
"    assert np.allclose(row_sums, [1.0, 1.0]), \"Each row should sum to 1\"\n",
"\n",
"    print(\"✅ Softmax works correctly!\")\n",
"\n",
"if __name__ == \"__main__\":\n",
"    test_unit_softmax()"
]
},
{
"cell_type": "markdown",
"id": "936779e1",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 2
},
"source": [
"## 4. Integration - Bringing It Together\n",
"\n",
"Now let's test how all our activation functions work together and understand their different behaviors."
]
},
{
"cell_type": "markdown",
"id": "5ecfa064",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"### Understanding the Output Patterns\n",
"\n",
"Each activation serves a different purpose:\n",
"\n",
"- **Sigmoid**: Squashes everything to (0, 1) - good for probabilities\n",
"- **ReLU**: Zeros negatives, keeps positives - creates sparsity\n",
"- **Tanh**: Like sigmoid but zero-centered, with outputs in (-1, 1) - often better gradient flow\n",
"- **GELU**: Smooth ReLU-like behavior - modern choice for transformers\n",
"- **Softmax**: Converts to a probability distribution - outputs sum to 1\n",
"\n",
"These different behaviors make each activation suitable for different parts of neural networks."
]
},
{
"cell_type": "markdown",
"id": "e6d4f14d",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"## 🧪 Module Integration Test\n",
"\n",
"Final validation that everything works together correctly."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d3e00f4",
"metadata": {
"lines_to_next_cell": 2,
"nbgrader": {
"grade": true,
"grade_id": "module-test",
"locked": true,
"points": 20
}
},
"outputs": [],
"source": [
"def import_previous_module(module_name: str, component_name: str):\n",
"    \"\"\"Import `component_name` from an earlier module's `*_dev.py` file.\"\"\"\n",
"    import sys\n",
"    import os\n",
"    sys.path.append(os.path.join(os.path.dirname(__file__), '..', module_name))\n",
"    module = __import__(f\"{module_name.split('_')[1]}_dev\")\n",
"    return getattr(module, component_name)\n",
"\n",
"def test_module():\n",
"    \"\"\"\n",
"    Comprehensive test of the entire module's functionality.\n",
"\n",
"    This final test runs before the module summary to ensure:\n",
"    - All unit tests pass\n",
"    - Functions work together correctly\n",
"    - Module is ready for integration with TinyTorch\n",
"    \"\"\"\n",
"    print(\"🧪 RUNNING MODULE INTEGRATION TEST\")\n",
"    print(\"=\" * 50)\n",
"\n",
"    # Run all unit tests\n",
"    print(\"Running unit tests...\")\n",
"    test_unit_sigmoid()\n",
"    test_unit_relu()\n",
"    test_unit_tanh()\n",
"    test_unit_gelu()\n",
"    test_unit_softmax()\n",
"\n",
"    print(\"\\nRunning integration scenarios...\")\n",
"\n",
"    # Test 1: All activations preserve tensor properties\n",
"    print(\"🔬 Integration Test: Tensor property preservation...\")\n",
"    test_data = Tensor([[1, -1], [2, -2]])  # 2D tensor\n",
"\n",
"    activations = [Sigmoid(), ReLU(), Tanh(), GELU()]\n",
"    for activation in activations:\n",
"        result = activation.forward(test_data)\n",
"        assert result.shape == test_data.shape, f\"Shape not preserved by {activation.__class__.__name__}\"\n",
"        assert isinstance(result, Tensor), f\"Output not Tensor from {activation.__class__.__name__}\"\n",
"\n",
"    print(\"✅ All activations preserve tensor properties!\")\n",
"\n",
"    # Test 2: Softmax works with different dimensions\n",
"    print(\"🔬 Integration Test: Softmax dimension handling...\")\n",
"    data_3d = Tensor([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])  # (2, 2, 3)\n",
"    softmax = Softmax()\n",
"\n",
"    # Softmax along the last dimension\n",
"    result_last = softmax(data_3d, dim=-1)\n",
"    assert result_last.shape == (2, 2, 3), \"Softmax should preserve shape\"\n",
"\n",
"    # Check that last dimension sums to 1\n",
"    last_dim_sums = np.sum(result_last.data, axis=-1)\n",
"    assert np.allclose(last_dim_sums, 1.0), \"Last dimension should sum to 1\"\n",
"\n",
"    # Softmax along the first dimension should normalize over axis 0\n",
"    result_first = softmax(data_3d, dim=0)\n",
"    assert np.allclose(np.sum(result_first.data, axis=0), 1.0), \"First dimension should sum to 1\"\n",
"\n",
"    print(\"✅ Softmax handles different dimensions correctly!\")\n",
"\n",
"    # Test 3: Activation chaining (simulating neural network)\n",
"    print(\"🔬 Integration Test: Activation chaining...\")\n",
"\n",
"    # Simulate the activation stages of a simple network: ReLU (hidden) → Softmax (output).\n",
"    # Layers are built in Module 03, so the Linear steps are omitted here.\n",
"    x = Tensor([[-1, 0, 1, 2]])  # Batch of 1, 4 features\n",
"\n",
"    # Apply ReLU (hidden layer activation)\n",
"    relu = ReLU()\n",
"    hidden = relu.forward(x)\n",
"\n",
"    # Apply Softmax (output layer activation)\n",
"    softmax = Softmax()\n",
"    output = softmax.forward(hidden)\n",
"\n",
"    # Verify the chain\n",
"    assert hidden.data[0, 0] == 0, \"ReLU should zero negative input\"\n",
"    assert np.allclose(np.sum(output.data), 1.0), \"Final output should be probability distribution\"\n",
"\n",
"    print(\"✅ Activation chaining works correctly!\")\n",
"\n",
"    print(\"\\n\" + \"=\" * 50)\n",
"    print(\"🎉 ALL TESTS PASSED! Module ready for export.\")\n",
"    print(\"Run: tito module complete 02\")\n",
"\n",
"# Run comprehensive module test\n",
"if __name__ == \"__main__\":\n",
"    test_module()"
]
},
{
"cell_type": "markdown",
"id": "df17a734",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 🎯 MODULE SUMMARY: Activations\n",
"\n",
"Congratulations! You've built the intelligence engine of neural networks!\n",
"\n",
"### Key Accomplishments\n",
"- Built 5 core activation functions with distinct behaviors and use cases\n",
"- Implemented forward passes for Sigmoid, ReLU, Tanh, GELU, and Softmax\n",
"- Discovered how nonlinearity enables complex pattern learning\n",
"- All tests pass ✅ (validated by `test_module()`)\n",
"\n",
"### Ready for Next Steps\n",
"Your activation implementations enable neural network layers to learn complex, nonlinear patterns instead of just linear transformations.\n",
"\n",
"Export with: `tito module complete 02`\n",
"\n",
"**Next**: Module 03 will combine your Tensors and Activations to build complete neural network Layers!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}