diff --git a/docs/_static/demos/02-build-test-ship.gif b/docs/_static/demos/02-build-test-ship.gif index 70d6b157..415503f3 100644 Binary files a/docs/_static/demos/02-build-test-ship.gif and b/docs/_static/demos/02-build-test-ship.gif differ diff --git a/docs/_static/demos/03-milestone-unlocked.gif b/docs/_static/demos/03-milestone-unlocked.gif index 2ddad865..80fa821a 100644 Binary files a/docs/_static/demos/03-milestone-unlocked.gif and b/docs/_static/demos/03-milestone-unlocked.gif differ diff --git a/src/02_activations/02_activations.ipynb b/src/02_activations/02_activations.ipynb new file mode 100644 index 00000000..d215e06f --- /dev/null +++ b/src/02_activations/02_activations.ipynb @@ -0,0 +1,1355 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7d81c336", + "metadata": { + "cell_marker": "\"\"\"" + }, + "source": [ + "# Activations - Intelligence Through Nonlinearity\n", + "\n", + "Welcome to Activations! Today you'll add the secret ingredient that makes neural networks intelligent: **nonlinearity**.\n", + "\n", + "## πŸ”— Prerequisites & Progress\n", + "**You've Built**: Tensor with data manipulation and basic operations\n", + "**You'll Build**: Activation functions that add nonlinearity to transformations\n", + "**You'll Enable**: Neural networks with the ability to learn complex patterns\n", + "\n", + "**Connection Map**:\n", + "```\n", + "Tensor β†’ Activations β†’ Layers\n", + "(data) (intelligence) (architecture)\n", + "```\n", + "\n", + "## Learning Objectives\n", + "By the end of this module, you will:\n", + "1. Implement 5 core activation functions (Sigmoid, ReLU, Tanh, GELU, Softmax)\n", + "2. Understand how nonlinearity enables neural network intelligence\n", + "3. Test activation behaviors and output ranges\n", + "4. Connect activations to real neural network components\n", + "\n", + "Let's add intelligence to your tensors!" 
+ ] + }, + { + "cell_type": "markdown", + "id": "4d420d81", + "metadata": { + "cell_marker": "\"\"\"" + }, + "source": [ + "## πŸ“¦ Where This Code Lives in the Final Package\n", + "\n", + "**Learning Side:** You work in modules/02_activations/activations_dev.py\n", + "**Building Side:** Code exports to tinytorch.core.activations\n", + "\n", + "```python\n", + "# Final package structure:\n", + "from tinytorch.core.activations import Sigmoid, ReLU, Tanh, GELU, Softmax # This module\n", + "from tinytorch.core.tensor import Tensor # Foundation (Module 01)\n", + "```\n", + "\n", + "**Why this matters:**\n", + "- **Learning:** Complete activation system in one focused module for deep understanding\n", + "- **Production:** Proper organization like PyTorch's torch.nn.functional with all activation operations together\n", + "- **Consistency:** All activation functions and behaviors in core.activations\n", + "- **Integration:** Works seamlessly with Tensor for complete nonlinear transformations" + ] + }, + { + "cell_type": "markdown", + "id": "ef737812", + "metadata": { + "cell_marker": "\"\"\"" + }, + "source": [ + "## πŸ“‹ Module Dependencies\n", + "\n", + "**Prerequisites**: Module 01 (Tensor) must be completed\n", + "\n", + "**External Dependencies**:\n", + "- `numpy` (for numerical operations)\n", + "\n", + "**TinyTorch Dependencies**:\n", + "- **Module 01 (Tensor)**: Foundation for all activation computations and data flow\n", + " - Used for: Input/output data structures, shape operations, element-wise operations\n", + " - Required: Yes - activations operate on Tensor objects\n", + "\n", + "**Dependency Flow**:\n", + "```\n", + "Module 01 (Tensor) β†’ Module 02 (Activations) β†’ Module 03 (Layers)\n", + " ↓ ↓ ↓\n", + " Foundation Nonlinearity Architecture\n", + "```\n", + "\n", + "**Import Strategy**:\n", + "This module imports directly from the TinyTorch package (`from tinytorch.core.*`).\n", + "**Assumption**: Module 01 (Tensor) has been completed and exported to 
the package.\n", + "If you see import errors, ensure you've run `tito export` after completing Module 01." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2066641f", + "metadata": { + "nbgrader": { + "grade": false, + "grade_id": "setup", + "solution": true + } + }, + "outputs": [], + "source": [ + "#| default_exp core.activations\n", + "#| export\n", + "\n", + "import numpy as np\n", + "from typing import Optional\n", + "\n", + "# Import from TinyTorch package (previous modules must be completed and exported)\n", + "from tinytorch.core.tensor import Tensor\n", + "\n", + "# Constants for numerical comparisons\n", + "TOLERANCE = 1e-10 # Small tolerance for floating-point comparisons in tests" + ] + }, + { + "cell_type": "markdown", + "id": "c47833e7", + "metadata": { + "cell_marker": "\"\"\"" + }, + "source": [ + "## 1. Introduction - What Makes Neural Networks Intelligent?\n", + "\n", + "Consider two scenarios:\n", + "\n", + "**Without Activations (Linear Only):**\n", + "```\n", + "Input β†’ Linear Transform β†’ Output\n", + "[1, 2] Β· weights [3, 4] β†’ 11 # Just a weighted sum: 3Β·1 + 4Β·2 = 11\n", + "```\n", + "\n", + "**With Activations (Nonlinear):**\n", + "```\n", + "Input β†’ Linear β†’ Activation β†’ Linear β†’ Activation β†’ Output\n", + "[1, 2] β†’ weighted sums β†’ bend β†’ weighted sums β†’ bend β†’ Complex Pattern!\n", + "```\n", + "\n", + "The magic happens in those activation functions. 
They introduce **nonlinearity** - the ability to curve, bend, and create complex decision boundaries instead of just straight lines.\n", + "\n", + "### Why Nonlinearity Matters\n", + "\n", + "Without activation functions, stacking multiple linear layers is pointless:\n", + "```\n", + "Linear(Linear(x)) = Linear(x) # Same as single layer!\n", + "```\n", + "\n", + "With activation functions, each layer can learn increasingly complex patterns:\n", + "```\n", + "Layer 1: Simple edges and lines\n", + "Layer 2: Curves and shapes\n", + "Layer 3: Complex objects and concepts\n", + "```\n", + "\n", + "This is how deep networks build intelligence from simple mathematical operations." + ] + }, + { + "cell_type": "markdown", + "id": "ca836d90", + "metadata": { + "cell_marker": "\"\"\"" + }, + "source": [ + "## 2. Mathematical Foundations\n", + "\n", + "Each activation function serves a different purpose in neural networks:\n", + "\n", + "### The Five Essential Activations\n", + "\n", + "1. **Sigmoid**: Maps to (0, 1) - perfect for probabilities\n", + "2. **ReLU**: Removes negatives - creates sparsity and efficiency\n", + "3. **Tanh**: Maps to (-1, 1) - zero-centered for better training\n", + "4. **GELU**: Smooth ReLU - modern choice for transformers\n", + "5. **Softmax**: Creates probability distributions - essential for classification\n", + "\n", + "Let's implement each one with clear explanations and immediate testing!" + ] + }, + { + "cell_type": "markdown", + "id": "5f73cf7e", + "metadata": { + "cell_marker": "\"\"\"" + }, + "source": [ + "## 3. 
Implementation - Building Activation Functions\n", + "\n", + "### πŸ—οΈ Implementation Pattern\n", + "\n", + "Each activation follows this structure:\n", + "```python\n", + "class ActivationName:\n", + " def forward(self, x: Tensor) -> Tensor:\n", + " # Apply mathematical transformation\n", + " # Return new Tensor with result\n", + "\n", + " def backward(self, grad: Tensor) -> Tensor:\n", + " # Stub for Module 05 - gradient computation\n", + " pass\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "79ef7336", + "metadata": { + "cell_marker": "\"\"\"" + }, + "source": [ + "## Sigmoid - The Probability Gatekeeper\n", + "\n", + "Sigmoid maps any real number to the range (0, 1), making it perfect for probabilities and binary decisions.\n", + "\n", + "### Mathematical Definition\n", + "```\n", + "Οƒ(x) = 1/(1 + e^(-x))\n", + "```\n", + "\n", + "### Visual Behavior\n", + "```\n", + "Input: [-3, -1, 0, 1, 3]\n", + " ↓ ↓ ↓ ↓ ↓ Sigmoid Function\n", + "Output: [0.05, 0.27, 0.5, 0.73, 0.95]\n", + "```\n", + "\n", + "### ASCII Visualization\n", + "```\n", + "Sigmoid Curve:\n", + " 1.0 ─ ╭─────\n", + " β”‚ β•±\n", + " 0.5 ─ β•±\n", + " β”‚ β•±\n", + " 0.0 ──╱─────────\n", + " -3 0 3\n", + "```\n", + "\n", + "**Why Sigmoid matters**: In binary classification, we need outputs between 0 and 1 to represent probabilities. Sigmoid gives us exactly that!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0e0285e2", + "metadata": { + "lines_to_next_cell": 1, + "nbgrader": { + "grade": false, + "grade_id": "sigmoid-impl", + "solution": true + } + }, + "outputs": [], + "source": [ + "#| export\n", + "from tinytorch.core.tensor import Tensor\n", + "\n", + "class Sigmoid:\n", + " \"\"\"\n", + " Sigmoid activation: Οƒ(x) = 1/(1 + e^(-x))\n", + "\n", + " Maps any real number to (0, 1) range.\n", + " Perfect for probabilities and binary classification.\n", + " \"\"\"\n", + "\n", + " def forward(self, x: Tensor) -> Tensor:\n", + " \"\"\"\n", + " Apply sigmoid activation element-wise.\n", + "\n", + " TODO: Implement sigmoid function\n", + "\n", + " APPROACH:\n", + " 1. Apply sigmoid formula: 1 / (1 + exp(-x))\n", + " 2. Use np.exp for exponential\n", + " 3. Return result wrapped in new Tensor\n", + "\n", + " EXAMPLE:\n", + " >>> sigmoid = Sigmoid()\n", + " >>> x = Tensor([-2, 0, 2])\n", + " >>> result = sigmoid(x)\n", + " >>> print(result.data)\n", + " [0.119, 0.5, 0.881] # All values between 0 and 1\n", + "\n", + " HINT: Clip inputs (e.g. to Β±500) before calling np.exp to avoid overflow\n", + " \"\"\"\n", + " ### BEGIN SOLUTION\n", + " # Apply sigmoid: 1 / (1 + exp(-x))\n", + " # Clip extreme values to prevent overflow (sigmoid(-500) β‰ˆ 0, sigmoid(500) β‰ˆ 1)\n", + " # Clipping at Β±500 ensures exp() stays within float64 range\n", + " z = np.clip(x.data, -500, 500)\n", + "\n", + " # Use numerically stable sigmoid\n", + " # For positive values: 1 / (1 + exp(-x))\n", + " # For negative values: exp(x) / (1 + exp(x)), which equals 1 / (1 + exp(-x))\n", + " # but never exponentiates a large positive number\n", + " result_data = np.zeros_like(z, dtype=np.float64) # float output even for integer inputs\n", + "\n", + " # Positive values (including zero)\n", + " pos_mask = z >= 0\n", + " result_data[pos_mask] = 1.0 / (1.0 + np.exp(-z[pos_mask]))\n", + "\n", + " # Negative values\n", + " neg_mask = z < 0\n", + " exp_z = np.exp(z[neg_mask])\n", + " result_data[neg_mask] = exp_z / (1.0 + exp_z)\n", + "\n", + " return Tensor(result_data)\n", + " 
### END SOLUTION\n", + "\n", + " def __call__(self, x: Tensor) -> Tensor:\n", + " \"\"\"Allows the activation to be called like a function.\"\"\"\n", + " return self.forward(x)\n", + "\n", + " def backward(self, grad: Tensor) -> Tensor:\n", + " \"\"\"Compute gradient (implemented in Module 05).\"\"\"\n", + " pass # Will implement backward pass in Module 05" + ] + }, + { + "cell_type": "markdown", + "id": "064c45b4", + "metadata": { + "cell_marker": "\"\"\"", + "lines_to_next_cell": 1 + }, + "source": [ + "### πŸ”¬ Unit Test: Sigmoid\n", + "This test validates sigmoid activation behavior.\n", + "**What we're testing**: Sigmoid maps inputs to (0, 1) range\n", + "**Why it matters**: Ensures proper probability-like outputs\n", + "**Expected**: All outputs between 0 and 1, sigmoid(0) = 0.5" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "622abc9a", + "metadata": { + "nbgrader": { + "grade": true, + "grade_id": "test-sigmoid", + "locked": true, + "points": 10 + } + }, + "outputs": [], + "source": [ + "def test_unit_sigmoid():\n", + " \"\"\"πŸ”¬ Test Sigmoid implementation.\"\"\"\n", + " print(\"πŸ”¬ Unit Test: Sigmoid...\")\n", + "\n", + " sigmoid = Sigmoid()\n", + "\n", + " # Test basic cases\n", + " x = Tensor([0.0])\n", + " result = sigmoid.forward(x)\n", + " assert np.allclose(result.data, [0.5]), f\"sigmoid(0) should be 0.5, got {result.data}\"\n", + "\n", + " # Test range property - all outputs should be in (0, 1)\n", + " x = Tensor([-10, -1, 0, 1, 10])\n", + " result = sigmoid.forward(x)\n", + " assert np.all(result.data > 0) and np.all(result.data < 1), \"All sigmoid outputs should be in (0, 1)\"\n", + "\n", + " # Test specific values\n", + " x = Tensor([-1000, 1000]) # Extreme values\n", + " result = sigmoid.forward(x)\n", + " assert np.allclose(result.data[0], 0, atol=TOLERANCE), \"sigmoid(-∞) should approach 0\"\n", + " assert np.allclose(result.data[1], 1, atol=TOLERANCE), \"sigmoid(+∞) should approach 1\"\n", + "\n", + " print(\"βœ… 
Sigmoid works correctly!\")\n", + "\n", + "if __name__ == \"__main__\":\n", + " test_unit_sigmoid()" + ] + }, + { + "cell_type": "markdown", + "id": "edb8b018", + "metadata": { + "cell_marker": "\"\"\"", + "lines_to_next_cell": 1 + }, + "source": [ + "## ReLU - The Sparsity Creator\n", + "\n", + "ReLU (Rectified Linear Unit) is the most popular activation function. It simply removes negative values, creating sparsity that makes neural networks more efficient.\n", + "\n", + "### Mathematical Definition\n", + "```\n", + "f(x) = max(0, x)\n", + "```\n", + "\n", + "### Visual Behavior\n", + "```\n", + "Input: [-2, -1, 0, 1, 2]\n", + " ↓ ↓ ↓ ↓ ↓ ReLU Function\n", + "Output: [ 0, 0, 0, 1, 2]\n", + "```\n", + "\n", + "### ASCII Visualization\n", + "```\n", + "ReLU Function:\n", + " β•±\n", + " 2 β•±\n", + " β•±\n", + " 1β•±\n", + " β•±\n", + " β•±\n", + " β•±\n", + "─┴─────\n", + "-2 0 2\n", + "```\n", + "\n", + "**Why ReLU matters**: By zeroing negative values, ReLU creates sparsity (many zeros) which makes computation faster and helps prevent overfitting." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "492c0f67", + "metadata": { + "lines_to_next_cell": 1, + "nbgrader": { + "grade": false, + "grade_id": "relu-impl", + "solution": true + } + }, + "outputs": [], + "source": [ + "#| export\n", + "class ReLU:\n", + " \"\"\"\n", + " ReLU activation: f(x) = max(0, x)\n", + "\n", + " Sets negative values to zero, keeps positive values unchanged.\n", + " Most popular activation for hidden layers.\n", + " \"\"\"\n", + "\n", + " def forward(self, x: Tensor) -> Tensor:\n", + " \"\"\"\n", + " Apply ReLU activation element-wise.\n", + "\n", + " TODO: Implement ReLU function\n", + "\n", + " APPROACH:\n", + " 1. Use np.maximum(0, x.data) for element-wise max with zero\n", + " 2. 
Return result wrapped in new Tensor\n", + "\n", + " EXAMPLE:\n", + " >>> relu = ReLU()\n", + " >>> x = Tensor([-2, -1, 0, 1, 2])\n", + " >>> result = relu(x)\n", + " >>> print(result.data)\n", + " [0, 0, 0, 1, 2] # Negative values become 0, positive unchanged\n", + "\n", + " HINT: np.maximum handles element-wise maximum automatically\n", + " \"\"\"\n", + " ### BEGIN SOLUTION\n", + " # Apply ReLU: max(0, x)\n", + " result = np.maximum(0, x.data)\n", + " return Tensor(result)\n", + " ### END SOLUTION\n", + "\n", + " def __call__(self, x: Tensor) -> Tensor:\n", + " \"\"\"Allows the activation to be called like a function.\"\"\"\n", + " return self.forward(x)\n", + "\n", + " def backward(self, grad: Tensor) -> Tensor:\n", + " \"\"\"Compute gradient (implemented in Module 05).\"\"\"\n", + " pass # Will implement backward pass in Module 05" + ] + }, + { + "cell_type": "markdown", + "id": "12eff51b", + "metadata": { + "cell_marker": "\"\"\"", + "lines_to_next_cell": 1 + }, + "source": [ + "### πŸ”¬ Unit Test: ReLU\n", + "This test validates ReLU activation behavior.\n", + "**What we're testing**: ReLU zeros negative values, preserves positive\n", + "**Why it matters**: ReLU's sparsity helps neural networks train efficiently\n", + "**Expected**: Negative β†’ 0, positive unchanged, zero β†’ 0" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2f82fe9d", + "metadata": { + "nbgrader": { + "grade": true, + "grade_id": "test-relu", + "locked": true, + "points": 10 + } + }, + "outputs": [], + "source": [ + "def test_unit_relu():\n", + " \"\"\"πŸ”¬ Test ReLU implementation.\"\"\"\n", + " print(\"πŸ”¬ Unit Test: ReLU...\")\n", + "\n", + " relu = ReLU()\n", + "\n", + " # Test mixed positive/negative values\n", + " x = Tensor([-2, -1, 0, 1, 2])\n", + " result = relu.forward(x)\n", + " expected = [0, 0, 0, 1, 2]\n", + " assert np.allclose(result.data, expected), f\"ReLU failed, expected {expected}, got {result.data}\"\n", + "\n", + " # Test all negative\n", + " 
x = Tensor([-5, -3, -1])\n", + " result = relu.forward(x)\n", + " assert np.allclose(result.data, [0, 0, 0]), \"ReLU should zero all negative values\"\n", + "\n", + " # Test all positive\n", + " x = Tensor([1, 3, 5])\n", + " result = relu.forward(x)\n", + " assert np.allclose(result.data, [1, 3, 5]), \"ReLU should preserve all positive values\"\n", + "\n", + " # Test sparsity property\n", + " x = Tensor([-1, -2, -3, 1])\n", + " result = relu.forward(x)\n", + " zeros = np.sum(result.data == 0)\n", + " assert zeros == 3, f\"ReLU should create sparsity, got {zeros} zeros out of 4\"\n", + "\n", + " print(\"βœ… ReLU works correctly!\")\n", + "\n", + "if __name__ == \"__main__\":\n", + " test_unit_relu()" + ] + }, + { + "cell_type": "markdown", + "id": "e337e334", + "metadata": { + "cell_marker": "\"\"\"", + "lines_to_next_cell": 1 + }, + "source": [ + "## Tanh - The Zero-Centered Alternative\n", + "\n", + "Tanh (hyperbolic tangent) is like sigmoid but centered around zero, mapping inputs to (-1, 1). This zero-centering helps with gradient flow during training.\n", + "\n", + "### Mathematical Definition\n", + "```\n", + "f(x) = (e^x - e^(-x))/(e^x + e^(-x))\n", + "```\n", + "\n", + "### Visual Behavior\n", + "```\n", + "Input: [-2, 0, 2]\n", + " ↓ ↓ ↓ Tanh Function\n", + "Output: [-0.96, 0, 0.96]\n", + "```\n", + "\n", + "### ASCII Visualization\n", + "```\n", + "Tanh Curve:\n", + " 1 ─ ╭─────\n", + " β”‚ β•±\n", + " 0 ────╱─────\n", + " β”‚ β•±\n", + " -1 ──╱───────\n", + " -3 0 3\n", + "```\n", + "\n", + "**Why Tanh matters**: Unlike sigmoid, tanh outputs are centered around zero, which can help gradients flow better through deep networks." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5097cffc", + "metadata": { + "lines_to_next_cell": 1, + "nbgrader": { + "grade": false, + "grade_id": "tanh-impl", + "solution": true + } + }, + "outputs": [], + "source": [ + "#| export\n", + "class Tanh:\n", + " \"\"\"\n", + " Tanh activation: f(x) = (e^x - e^(-x))/(e^x + e^(-x))\n", + "\n", + " Maps any real number to (-1, 1) range.\n", + " Zero-centered alternative to sigmoid.\n", + " \"\"\"\n", + "\n", + " def forward(self, x: Tensor) -> Tensor:\n", + " \"\"\"\n", + " Apply tanh activation element-wise.\n", + "\n", + " TODO: Implement tanh function\n", + "\n", + " APPROACH:\n", + " 1. Use np.tanh(x.data) for hyperbolic tangent\n", + " 2. Return result wrapped in new Tensor\n", + "\n", + " EXAMPLE:\n", + " >>> tanh = Tanh()\n", + " >>> x = Tensor([-2, 0, 2])\n", + " >>> result = tanh(x)\n", + " >>> print(result.data)\n", + " [-0.964, 0.0, 0.964] # Range (-1, 1), symmetric around 0\n", + "\n", + " HINT: NumPy provides np.tanh function\n", + " \"\"\"\n", + " ### BEGIN SOLUTION\n", + " # Apply tanh using NumPy\n", + " result = np.tanh(x.data)\n", + " return Tensor(result)\n", + " ### END SOLUTION\n", + "\n", + " def __call__(self, x: Tensor) -> Tensor:\n", + " \"\"\"Allows the activation to be called like a function.\"\"\"\n", + " return self.forward(x)\n", + "\n", + " def backward(self, grad: Tensor) -> Tensor:\n", + " \"\"\"Compute gradient (implemented in Module 05).\"\"\"\n", + " pass # Will implement backward pass in Module 05" + ] + }, + { + "cell_type": "markdown", + "id": "83e4b892", + "metadata": { + "cell_marker": "\"\"\"", + "lines_to_next_cell": 1 + }, + "source": [ + "### πŸ”¬ Unit Test: Tanh\n", + "This test validates tanh activation behavior.\n", + "**What we're testing**: Tanh maps inputs to (-1, 1) range, zero-centered\n", + "**Why it matters**: Zero-centered activations can help with gradient flow\n", + "**Expected**: All outputs in (-1, 1), tanh(0) = 0, symmetric behavior" + 
] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f55159ca", + "metadata": { + "nbgrader": { + "grade": true, + "grade_id": "test-tanh", + "locked": true, + "points": 10 + } + }, + "outputs": [], + "source": [ + "def test_unit_tanh():\n", + " \"\"\"πŸ”¬ Test Tanh implementation.\"\"\"\n", + " print(\"πŸ”¬ Unit Test: Tanh...\")\n", + "\n", + " tanh = Tanh()\n", + "\n", + " # Test zero\n", + " x = Tensor([0.0])\n", + " result = tanh.forward(x)\n", + " assert np.allclose(result.data, [0.0]), f\"tanh(0) should be 0, got {result.data}\"\n", + "\n", + " # Test range property - all outputs should be in (-1, 1)\n", + " x = Tensor([-10, -1, 0, 1, 10])\n", + " result = tanh.forward(x)\n", + " assert np.all(result.data >= -1) and np.all(result.data <= 1), \"All tanh outputs should be in [-1, 1]\"\n", + "\n", + " # Test symmetry: tanh(-x) = -tanh(x)\n", + " x = Tensor([2.0])\n", + " pos_result = tanh.forward(x)\n", + " x_neg = Tensor([-2.0])\n", + " neg_result = tanh.forward(x_neg)\n", + " assert np.allclose(pos_result.data, -neg_result.data), \"tanh should be symmetric: tanh(-x) = -tanh(x)\"\n", + "\n", + " # Test extreme values\n", + " x = Tensor([-1000, 1000])\n", + " result = tanh.forward(x)\n", + " assert np.allclose(result.data[0], -1, atol=TOLERANCE), \"tanh(-∞) should approach -1\"\n", + " assert np.allclose(result.data[1], 1, atol=TOLERANCE), \"tanh(+∞) should approach 1\"\n", + "\n", + " print(\"βœ… Tanh works correctly!\")\n", + "\n", + "if __name__ == \"__main__\":\n", + " test_unit_tanh()" + ] + }, + { + "cell_type": "markdown", + "id": "3a2be663", + "metadata": { + "cell_marker": "\"\"\"", + "lines_to_next_cell": 1 + }, + "source": [ + "## GELU - The Smooth Modern Choice\n", + "\n", + "GELU (Gaussian Error Linear Unit) is a smooth approximation to ReLU that's become popular in modern architectures like transformers. 
Unlike ReLU's sharp corner, GELU is smooth everywhere.\n", + "\n", + "### Mathematical Definition\n", + "```\n", + "f(x) = x * Ξ¦(x) β‰ˆ x * Sigmoid(1.702 * x)\n", + "```\n", + "Where Ξ¦(x) is the cumulative distribution function of standard normal distribution.\n", + "\n", + "### Visual Behavior\n", + "```\n", + "Input: [-1, 0, 1]\n", + " ↓ ↓ ↓ GELU Function\n", + "Output: [-0.16, 0, 0.84]\n", + "```\n", + "\n", + "### ASCII Visualization\n", + "```\n", + "GELU Function:\n", + " β•±\n", + " 1 β•±\n", + " β•±\n", + " β•±\n", + " β•±\n", + " β•± ↙ (smooth curve, no sharp corner)\n", + " β•±\n", + "─┴─────\n", + "-2 0 2\n", + "```\n", + "\n", + "**Why GELU matters**: Used in GPT, BERT, and other transformers. The smoothness helps with optimization compared to ReLU's sharp corner." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "702988e0", + "metadata": { + "lines_to_next_cell": 1, + "nbgrader": { + "grade": false, + "grade_id": "gelu-impl", + "solution": true + } + }, + "outputs": [], + "source": [ + "#| export\n", + "class GELU:\n", + " \"\"\"\n", + " GELU activation: f(x) = x * Ξ¦(x) β‰ˆ x * Sigmoid(1.702 * x)\n", + "\n", + " Smooth approximation to ReLU, used in modern transformers.\n", + " Where Ξ¦(x) is the cumulative distribution function of standard normal.\n", + " \"\"\"\n", + "\n", + " def forward(self, x: Tensor) -> Tensor:\n", + " \"\"\"\n", + " Apply GELU activation element-wise.\n", + "\n", + " TODO: Implement GELU approximation\n", + "\n", + " APPROACH:\n", + " 1. Use approximation: x * sigmoid(1.702 * x)\n", + " 2. Compute sigmoid part: 1 / (1 + exp(-1.702 * x))\n", + " 3. Multiply by x element-wise\n", + " 4. 
Return result wrapped in new Tensor\n", + "\n", + " EXAMPLE:\n", + " >>> gelu = GELU()\n", + " >>> x = Tensor([-1, 0, 1])\n", + " >>> result = gelu(x)\n", + " >>> print(result.data)\n", + " [-0.159, 0.0, 0.841] # Smooth, like ReLU but differentiable everywhere\n", + "\n", + " HINT: 1.702 is an empirically fitted constant for the sigmoid form of GELU; the √(2/Ο€) constant belongs to the alternative tanh-based approximation\n", + " \"\"\"\n", + " ### BEGIN SOLUTION\n", + " # GELU approximation: x * sigmoid(1.702 * x)\n", + " # Clip the sigmoid argument so exp() cannot overflow for large negative x\n", + " z = np.clip(1.702 * x.data, -500, 500)\n", + " sigmoid_part = 1.0 / (1.0 + np.exp(-z))\n", + " # Then multiply by x\n", + " result = x.data * sigmoid_part\n", + " return Tensor(result)\n", + " ### END SOLUTION\n", + "\n", + " def __call__(self, x: Tensor) -> Tensor:\n", + " \"\"\"Allows the activation to be called like a function.\"\"\"\n", + " return self.forward(x)\n", + "\n", + " def backward(self, grad: Tensor) -> Tensor:\n", + " \"\"\"Compute gradient (implemented in Module 05).\"\"\"\n", + " pass # Will implement backward pass in Module 05" + ] + }, + { + "cell_type": "markdown", + "id": "5c9142d2", + "metadata": { + "cell_marker": "\"\"\"", + "lines_to_next_cell": 1 + }, + "source": [ + "### πŸ”¬ Unit Test: GELU\n", + "This test validates GELU activation behavior.\n", + "**What we're testing**: GELU provides smooth ReLU-like behavior\n", + "**Why it matters**: GELU is used in modern transformers like GPT and BERT\n", + "**Expected**: Smooth curve, GELU(0) β‰ˆ 0, positive values preserved roughly" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e9f917b3", + "metadata": { + "nbgrader": { + "grade": true, + "grade_id": "test-gelu", + "locked": true, + "points": 10 + } + }, + "outputs": [], + "source": [ + "def test_unit_gelu():\n", + " \"\"\"πŸ”¬ Test GELU implementation.\"\"\"\n", + " print(\"πŸ”¬ Unit Test: GELU...\")\n", + "\n", + " gelu = GELU()\n", + "\n", + " # Test zero (should be approximately 0)\n", + " x = Tensor([0.0])\n", + " result = gelu.forward(x)\n", + " assert np.allclose(result.data, 
[0.0], atol=TOLERANCE), f\"GELU(0) should be β‰ˆ0, got {result.data}\"\n", + "\n", + " # Test positive values (should be roughly preserved)\n", + " x = Tensor([1.0])\n", + " result = gelu.forward(x)\n", + " assert result.data[0] > 0.8, f\"GELU(1) should be β‰ˆ0.84, got {result.data[0]}\"\n", + "\n", + " # Test negative values (should be small but not zero)\n", + " x = Tensor([-1.0])\n", + " result = gelu.forward(x)\n", + " assert result.data[0] < 0 and result.data[0] > -0.2, f\"GELU(-1) should be β‰ˆ-0.16, got {result.data[0]}\"\n", + "\n", + " # Test smoothness property (no sharp corners like ReLU)\n", + " x = Tensor([-0.001, 0.0, 0.001])\n", + " result = gelu.forward(x)\n", + " # Values should be close to each other (smooth)\n", + " diff1 = abs(result.data[1] - result.data[0])\n", + " diff2 = abs(result.data[2] - result.data[1])\n", + " assert diff1 < 0.01 and diff2 < 0.01, \"GELU should be smooth around zero\"\n", + "\n", + " print(\"βœ… GELU works correctly!\")\n", + "\n", + "if __name__ == \"__main__\":\n", + " test_unit_gelu()" + ] + }, + { + "cell_type": "markdown", + "id": "770d4997", + "metadata": { + "cell_marker": "\"\"\"", + "lines_to_next_cell": 1 + }, + "source": [ + "## Softmax - The Probability Distributor\n", + "\n", + "Softmax converts any vector into a valid probability distribution. 
All outputs are positive and sum to exactly 1.0, making it essential for multi-class classification.\n", + "\n", + "### Mathematical Definition\n", + "```\n", + "f(x_i) = e^(x_i) / Ξ£(e^(x_j))\n", + "```\n", + "\n", + "### Visual Behavior\n", + "```\n", + "Input: [1, 2, 3]\n", + " ↓ ↓ ↓ Softmax Function\n", + "Output: [0.09, 0.24, 0.67] # Sum = 1.0\n", + "```\n", + "\n", + "### ASCII Visualization\n", + "```\n", + "Softmax Transform:\n", + "Raw scores: [1, 2, 3, 4]\n", + " ↓ Exponential ↓\n", + " [2.7, 7.4, 20.1, 54.6]\n", + " ↓ Normalize ↓\n", + " [0.03, 0.09, 0.24, 0.64] ← Sum = 1.0\n", + "```\n", + "\n", + "**Why Softmax matters**: In multi-class classification, we need outputs that represent probabilities for each class. Softmax guarantees valid probabilities." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b8a39ebe", + "metadata": { + "lines_to_next_cell": 1, + "nbgrader": { + "grade": false, + "grade_id": "softmax-impl", + "solution": true + } + }, + "outputs": [], + "source": [ + "#| export\n", + "class Softmax:\n", + " \"\"\"\n", + " Softmax activation: f(x_i) = e^(x_i) / Ξ£(e^(x_j))\n", + "\n", + " Converts any vector to a probability distribution.\n", + " Sum of all outputs equals 1.0.\n", + " \"\"\"\n", + "\n", + " def forward(self, x: Tensor, dim: int = -1) -> Tensor:\n", + " \"\"\"\n", + " Apply softmax activation along specified dimension.\n", + "\n", + " TODO: Implement numerically stable softmax\n", + "\n", + " APPROACH:\n", + " 1. Subtract max for numerical stability: x - max(x)\n", + " 2. Compute exponentials: exp(x - max(x))\n", + " 3. Sum along dimension: sum(exp_values)\n", + " 4. Divide: exp_values / sum\n", + " 5. 
Return result wrapped in new Tensor\n", + "\n", + " EXAMPLE:\n", + " >>> softmax = Softmax()\n", + " >>> x = Tensor([1, 2, 3])\n", + " >>> result = softmax(x)\n", + " >>> print(result.data)\n", + " [0.090, 0.245, 0.665] # Sums to 1.0, larger inputs get higher probability\n", + "\n", + " HINTS:\n", + " - Use np.max(x.data, axis=dim, keepdims=True) for max\n", + " - Use np.sum(exp_values, axis=dim, keepdims=True) for sum\n", + " - The max subtraction prevents overflow in exponentials\n", + " \"\"\"\n", + " ### BEGIN SOLUTION\n", + " # Numerical stability: subtract max to prevent overflow\n", + " # Use Tensor operations to preserve gradient flow!\n", + " x_max_data = np.max(x.data, axis=dim, keepdims=True)\n", + " x_max = Tensor(x_max_data, requires_grad=False) # max is not differentiable in this context\n", + " x_shifted = x - x_max # Tensor subtraction!\n", + "\n", + " # Compute exponentials (NumPy operation, but wrapped in Tensor)\n", + " exp_values = Tensor(np.exp(x_shifted.data), requires_grad=x_shifted.requires_grad)\n", + "\n", + " # Sum along dimension (Tensor operation)\n", + " exp_sum_data = np.sum(exp_values.data, axis=dim, keepdims=True)\n", + " exp_sum = Tensor(exp_sum_data, requires_grad=exp_values.requires_grad)\n", + "\n", + " # Normalize to get probabilities (Tensor division!)\n", + " result = exp_values / exp_sum\n", + " return result\n", + " ### END SOLUTION\n", + "\n", + " def __call__(self, x: Tensor, dim: int = -1) -> Tensor:\n", + " \"\"\"Allows the activation to be called like a function.\"\"\"\n", + " return self.forward(x, dim)\n", + "\n", + " def backward(self, grad: Tensor) -> Tensor:\n", + " \"\"\"Compute gradient (implemented in Module 05).\"\"\"\n", + " pass # Will implement backward pass in Module 05" + ] + }, + { + "cell_type": "markdown", + "id": "7b9d8ff4", + "metadata": { + "cell_marker": "\"\"\"", + "lines_to_next_cell": 1 + }, + "source": [ + "### πŸ”¬ Unit Test: Softmax\n", + "This test validates softmax activation behavior.\n", 
+ "**What we're testing**: Softmax creates valid probability distributions\n", + "**Why it matters**: Essential for multi-class classification outputs\n", + "**Expected**: Outputs sum to 1.0, all values in (0, 1), largest input gets highest probability" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ba0c1c6e", + "metadata": { + "nbgrader": { + "grade": true, + "grade_id": "test-softmax", + "locked": true, + "points": 10 + } + }, + "outputs": [], + "source": [ + "def test_unit_softmax():\n", + " \"\"\"πŸ”¬ Test Softmax implementation.\"\"\"\n", + " print(\"πŸ”¬ Unit Test: Softmax...\")\n", + "\n", + " softmax = Softmax()\n", + "\n", + " # Test basic probability properties\n", + " x = Tensor([1, 2, 3])\n", + " result = softmax.forward(x)\n", + "\n", + " # Should sum to 1\n", + " assert np.allclose(np.sum(result.data), 1.0), f\"Softmax should sum to 1, got {np.sum(result.data)}\"\n", + "\n", + " # All values should be positive\n", + " assert np.all(result.data > 0), \"All softmax values should be positive\"\n", + "\n", + " # All values should be less than 1\n", + " assert np.all(result.data < 1), \"All softmax values should be less than 1\"\n", + "\n", + " # Largest input should get largest output\n", + " max_input_idx = np.argmax(x.data)\n", + " max_output_idx = np.argmax(result.data)\n", + " assert max_input_idx == max_output_idx, \"Largest input should get largest softmax output\"\n", + "\n", + " # Test numerical stability with large numbers\n", + " x = Tensor([1000, 1001, 1002]) # Would overflow without max subtraction\n", + " result = softmax.forward(x)\n", + " assert np.allclose(np.sum(result.data), 1.0), \"Softmax should handle large numbers\"\n", + " assert not np.any(np.isnan(result.data)), \"Softmax should not produce NaN\"\n", + " assert not np.any(np.isinf(result.data)), \"Softmax should not produce infinity\"\n", + "\n", + " # Test with 2D tensor (batch dimension)\n", + " x = Tensor([[1, 2], [3, 4]])\n", + " result = 
softmax.forward(x, dim=-1)  # Softmax along last dimension\n", + "    assert result.shape == (2, 2), \"Softmax should preserve input shape\"\n", + "    # Each row should sum to 1\n", + "    row_sums = np.sum(result.data, axis=-1)\n", + "    assert np.allclose(row_sums, [1.0, 1.0]), \"Each row should sum to 1\"\n", + "\n", + "    print(\"✅ Softmax works correctly!\")\n", + "\n", + "if __name__ == \"__main__\":\n", + "    test_unit_softmax()" + ] + }, + { + "cell_type": "markdown", + "id": "6e51cf5d", + "metadata": { + "cell_marker": "\"\"\"", + "lines_to_next_cell": 2 + }, + "source": [ + "## 4. Integration - Bringing It Together\n", + "\n", + "Now let's test how all our activation functions work together and understand their different behaviors." + ] + }, + { + "cell_type": "markdown", + "id": "3c74a5d8", + "metadata": { + "cell_marker": "\"\"\"" + }, + "source": [ + "### Understanding the Output Patterns\n", + "\n", + "From the demonstration above, notice how each activation serves a different purpose:\n", + "\n", + "**Sigmoid**: Squashes everything to (0, 1) - good for probabilities\n", + "**ReLU**: Zeros negatives, keeps positives - creates sparsity\n", + "**Tanh**: Like sigmoid but centered at zero (-1, 1) - better gradient flow\n", + "**GELU**: Smooth ReLU-like behavior - modern choice for transformers\n", + "**Softmax**: Converts to probability distribution - sum equals 1\n", + "\n", + "These different behaviors make each activation suitable for different parts of neural networks." + ] + }, + { + "cell_type": "markdown", + "id": "e4a784c4", + "metadata": { + "cell_marker": "\"\"\"", + "lines_to_next_cell": 1 + }, + "source": [ + "## 🧪 Module Integration Test\n", + "\n", + "Final validation that everything works together correctly." 
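The contrasting output patterns summarized above are easy to reproduce outside the module in plain NumPy — a standalone sketch that bypasses the Tensor class and uses the common tanh approximation of GELU:

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

sigmoid = 1 / (1 + np.exp(-x))   # every value lands in (0, 1)
relu = np.maximum(0.0, x)        # negatives become exactly 0 (sparsity)
tanh = np.tanh(x)                # zero-centered, range (-1, 1)
# Tanh approximation of GELU; 0.044715 is the standard constant for this form
gelu = 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))
shifted = np.exp(x - x.max())    # max subtraction keeps exp() finite
softmax = shifted / shifted.sum()  # normalized: sums to 1

print("sigmoid:", sigmoid.round(3))
print("relu:   ", relu)
print("softmax:", softmax.round(3), "sum =", softmax.sum())
```

Running this makes the differences concrete: ReLU zeroes the first two entries, sigmoid and tanh saturate at the extremes, and only softmax couples the outputs into a single distribution.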
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c9f61aa8", + "metadata": { + "lines_to_next_cell": 2, + "nbgrader": { + "grade": true, + "grade_id": "module-test", + "locked": true, + "points": 20 + } + }, + "outputs": [], + "source": [ + "\n", + "def test_module():\n", + "    \"\"\"🧪 Module Test: Complete Integration\n", + "\n", + "    Comprehensive test of entire module functionality.\n", + "\n", + "    This final test runs before module summary to ensure:\n", + "    - All unit tests pass\n", + "    - Functions work together correctly\n", + "    - Module is ready for integration with TinyTorch\n", + "    \"\"\"\n", + "    print(\"🧪 RUNNING MODULE INTEGRATION TEST\")\n", + "    print(\"=\" * 50)\n", + "\n", + "    # Run all unit tests\n", + "    print(\"Running unit tests...\")\n", + "    test_unit_sigmoid()\n", + "    test_unit_relu()\n", + "    test_unit_tanh()\n", + "    test_unit_gelu()\n", + "    test_unit_softmax()\n", + "\n", + "    print(\"\\nRunning integration scenarios...\")\n", + "\n", + "    # Test 1: All activations preserve tensor properties\n", + "    print(\"🔬 Integration Test: Tensor property preservation...\")\n", + "    test_data = Tensor([[1, -1], [2, -2]])  # 2D tensor\n", + "\n", + "    activations = [Sigmoid(), ReLU(), Tanh(), GELU()]\n", + "    for activation in activations:\n", + "        result = activation.forward(test_data)\n", + "        assert result.shape == test_data.shape, f\"Shape not preserved by {activation.__class__.__name__}\"\n", + "        assert isinstance(result, Tensor), f\"Output not Tensor from {activation.__class__.__name__}\"\n", + "\n", + "    print(\"✅ All activations preserve tensor properties!\")\n", + "\n", + "    # Test 2: Softmax works with different dimensions\n", + "    print(\"🔬 Integration Test: Softmax dimension handling...\")\n", + "    data_3d = Tensor([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])  # (2, 2, 3)\n", + "    softmax = Softmax()\n", + "\n", + "    # Test different dimensions\n", + "    result_last = softmax(data_3d, dim=-1)\n", + "    assert 
result_last.shape == (2, 2, 3), \"Softmax should preserve shape\"\n", + "\n", + "    # Check that last dimension sums to 1\n", + "    last_dim_sums = np.sum(result_last.data, axis=-1)\n", + "    assert np.allclose(last_dim_sums, 1.0), \"Last dimension should sum to 1\"\n", + "\n", + "    print(\"✅ Softmax handles different dimensions correctly!\")\n", + "\n", + "    # Test 3: Activation chaining (simulating neural network)\n", + "    print(\"🔬 Integration Test: Activation chaining...\")\n", + "\n", + "    # Simulate: Input → Linear → ReLU → Linear → Softmax (like a simple network)\n", + "    x = Tensor([[-1, 0, 1, 2]])  # Batch of 1, 4 features\n", + "\n", + "    # Apply ReLU (hidden layer activation)\n", + "    relu = ReLU()\n", + "    hidden = relu.forward(x)\n", + "\n", + "    # Apply Softmax (output layer activation)\n", + "    softmax = Softmax()\n", + "    output = softmax.forward(hidden)\n", + "\n", + "    # Verify the chain\n", + "    assert hidden.data[0, 0] == 0, \"ReLU should zero negative input\"\n", + "    assert np.allclose(np.sum(output.data), 1.0), \"Final output should be probability distribution\"\n", + "\n", + "    print(\"✅ Activation chaining works correctly!\")\n", + "\n", + "    print(\"\\n\" + \"=\" * 50)\n", + "    print(\"🎉 ALL TESTS PASSED! Module ready for export.\")\n", + "    print(\"Run: tito module complete 02\")\n", + "\n", + "# Run comprehensive module test\n", + "if __name__ == \"__main__\":\n", + "    test_module()" + ] + }, + { + "cell_type": "markdown", + "id": "d1a45630", + "metadata": { + "cell_marker": "\"\"\"", + "lines_to_next_cell": 2 + }, + "source": [ + "## 5. 
Real-World Production Context\n", + "\n", + "Now that you've implemented these activations, let's understand how they're used in real ML systems.\n", + "\n", + "### Activation Selection Guide\n", + "\n", + "**When to Use Each Activation:**\n", + "\n", + "**Sigmoid**\n", + "- **Use case**: Binary classification output layers, gates in LSTMs/GRUs\n", + "- **Production example**: Spam detection (output: probability of spam)\n", + "- **Why**: Outputs valid probabilities in (0, 1)\n", + "- **Avoid**: Hidden layers in deep networks (vanishing gradients)\n", + "\n", + "**ReLU**\n", + "- **Use case**: Hidden layers in CNNs, feedforward networks\n", + "- **Production example**: Image classification networks (ResNet, VGG)\n", + "- **Why**: Fast computation, prevents vanishing gradients, creates sparsity\n", + "- **Avoid**: Output layers (can't output negative values or probabilities)\n", + "\n", + "**Tanh**\n", + "- **Use case**: RNN hidden states, when zero-centered outputs matter\n", + "- **Production example**: Sentiment analysis RNNs, time series prediction\n", + "- **Why**: Zero-centered helps with gradient flow in recurrent networks\n", + "- **Avoid**: Very deep networks (still suffers from vanishing gradients)\n", + "\n", + "**GELU**\n", + "- **Use case**: Transformer models, modern architectures\n", + "- **Production example**: GPT, BERT, modern language models\n", + "- **Why**: Smooth approximation of ReLU, better gradient flow, state-of-the-art results\n", + "- **Avoid**: When computational efficiency is critical (slightly slower than ReLU)\n", + "\n", + "**Softmax**\n", + "- **Use case**: Multi-class classification output layers\n", + "- **Production example**: ImageNet classification (1000 classes), NLP token prediction\n", + "- **Why**: Converts logits to valid probability distribution (sums to 1)\n", + "- **Avoid**: Hidden layers (loses information through normalization)\n", + "\n", + "### Common Production Patterns\n", + "\n", + "**Pattern 1: CNN Image 
Classification**\n", + "```\n", + "Input → Conv+ReLU → Conv+ReLU → ... → Linear → Softmax → Class Probabilities\n", + "```\n", + "\n", + "**Pattern 2: Binary Classifier**\n", + "```\n", + "Input → Linear+ReLU → Linear+ReLU → Linear → Sigmoid → Binary Probability\n", + "```\n", + "\n", + "**Pattern 3: Modern Transformer**\n", + "```\n", + "Input → Attention → Linear+GELU → Linear+GELU → Output\n", + "```\n", + "\n", + "### Common Pitfalls and Debugging\n", + "\n", + "**Sigmoid/Tanh Pitfalls:**\n", + "- **Vanishing gradients**: Gradients near 0 for extreme inputs\n", + "- **Saturation**: Outputs plateau, learning slows\n", + "- **Debug tip**: Check activation distribution - avoid all values near 0 or 1\n", + "\n", + "**ReLU Pitfalls:**\n", + "- **Dying ReLU**: Neurons output 0 forever after large negative gradient\n", + "- **No negative outputs**: Can't represent negative relationships\n", + "- **Debug tip**: Monitor % of dead neurons (always output 0)\n", + "\n", + "**Softmax Pitfalls:**\n", + "- **Numerical overflow**: exp(x) explodes for large x (solved by max subtraction)\n", + "- **Dimension confusion**: Must apply along correct axis for batched data\n", + "- **Debug tip**: Verify outputs sum to 1.0 along correct dimension\n", + "\n", + "**GELU Pitfalls:**\n", + "- **Approximation error**: Using wrong approximation constant\n", + "- **Speed**: Slightly slower than ReLU\n", + "- **Debug tip**: Compare outputs to reference implementation\n", + "\n", + "### Performance Characteristics\n", + "\n", + "**Computational Cost (relative to ReLU = 1.0):**\n", + "- ReLU: 1.0× (fastest - just comparison and max)\n", + "- Sigmoid: ~3×-4× (exponential computation)\n", + "- Tanh: ~3×-4× (two exponentials)\n", + "- GELU: ~4×-5× (exponential in approximation)\n", + "- Softmax: ~5×+ (exponentials + division across all elements)\n", + "\n", + "**Memory Impact:**\n", + "- All activations: Minimal memory overhead (output same size as input)\n", + 
"- Softmax: Slightly higher (temporary buffers for exp and sum)\n", + "- For autograd (Module 05): Must cache inputs for backward pass\n", + "\n", + "### Integration with TinyTorch\n", + "\n", + "Your activation functions integrate seamlessly with other modules:\n", + "\n", + "**Module 03 (Layers)**: Will use these activations\n", + "```python\n", + "# Coming in Module 03\n", + "class Linear:\n", + " def __init__(self, in_features, out_features, activation=None):\n", + " self.activation = activation # Your ReLU, Sigmoid, etc.\n", + "\n", + " def forward(self, x):\n", + " out = self.compute_linear(x)\n", + " if self.activation:\n", + " out = self.activation(out) # Uses your forward()\n", + " return out\n", + "```\n", + "\n", + "**Module 05 (Autograd)**: Will add gradient computation\n", + "```python\n", + "# Coming in Module 05\n", + "class Sigmoid:\n", + " def backward(self, grad):\n", + " # βˆ‚sigmoid/βˆ‚x = sigmoid(x) * (1 - sigmoid(x))\n", + " return grad * self.output * (1 - self.output)\n", + "```" + ] + }, + { + "cell_type": "markdown", + "id": "904d4f89", + "metadata": { + "cell_marker": "\"\"\"" + }, + "source": [ + "## 🎯 MODULE SUMMARY: Activations\n", + "\n", + "Congratulations! You've built the intelligence engine of neural networks!\n", + "\n", + "### Key Accomplishments\n", + "- Built 5 core activation functions with distinct behaviors and use cases\n", + "- Implemented forward passes for Sigmoid, ReLU, Tanh, GELU, and Softmax\n", + "- Discovered how nonlinearity enables complex pattern learning\n", + "- All tests pass βœ… (validated by `test_module()`)\n", + "\n", + "### Ready for Next Steps\n", + "Your activation implementations enable neural network layers to learn complex, nonlinear patterns instead of just linear transformations.\n", + "\n", + "Export with: `tito module complete 02`\n", + "\n", + "**Next**: Module 03 will combine your Tensors and Activations to build complete neural network Layers!" 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/tito/main.py b/tito/main.py index 02e2c6ee..fe279511 100644 --- a/tito/main.py +++ b/tito/main.py @@ -216,37 +216,60 @@ Tracking Progress: if not parsed_args.command: # Show ASCII logo first print_ascii_logo() - - # Show enhanced help with command groups + + # Dynamically build help based on registered commands + # Categorize commands by role + essential = ['setup'] + student_workflow = ['module', 'checkpoint', 'milestones'] + community = ['leaderboard', 'olympics', 'community'] + developer = ['system', 'package', 'nbgrader', 'src'] + shortcuts = ['export', 'test', 'book', 'demo'] + other = ['benchmark', 'grade', 'logo'] + + help_text = "[bold]Essential Commands:[/bold]\n" + for cmd in essential: + if cmd in self.commands: + desc = self.commands[cmd](self.config).description + help_text += f" [bold cyan]{cmd}[/bold cyan] - {desc}\n" + + help_text += "\n[bold]Student Workflow:[/bold]\n" + for cmd in student_workflow: + if cmd in self.commands: + desc = self.commands[cmd](self.config).description + help_text += f" [bold green]{cmd}[/bold green] - {desc}\n" + + help_text += "\n[bold]Community:[/bold]\n" + for cmd in community: + if cmd in self.commands: + desc = self.commands[cmd](self.config).description + help_text += f" [bold bright_blue]{cmd}[/bold bright_blue] - {desc}\n" + + help_text += "\n[bold]Developer Tools:[/bold]\n" + for cmd in developer: + if cmd in self.commands: + desc = self.commands[cmd](self.config).description + help_text += f" [dim]{cmd}[/dim] - {desc}\n" + + help_text += "\n[bold]Shortcuts:[/bold]\n" + for cmd in shortcuts: + if cmd in self.commands: + desc = self.commands[cmd](self.config).description + help_text += f" [bold yellow]{cmd}[/bold yellow] - {desc}\n" + + help_text += "\n[bold]Quick Start:[/bold]\n" + help_text += " [dim]tito setup[/dim] - 
First-time setup (run once)\n" + help_text += " [dim]tito module start 01[/dim] - Start Module 01 (tensors)\n" + help_text += " [dim]tito module complete 01[/dim] - Complete it (test + export + track)\n" + help_text += " [dim]tito module status[/dim] - View all progress\n" + help_text += "\n[bold]Track Progress:[/bold]\n" + help_text += " [dim]tito checkpoint status[/dim] - Capabilities unlocked\n" + help_text += " [dim]tito leaderboard profile[/dim] - Your achievement journey\n" + help_text += "\n[bold]Get Help:[/bold]\n" + help_text += " [dim]tito <command>[/dim] - Show command subcommands\n" + help_text += " [dim]tito --help[/dim] - Show full help" + self.console.print(Panel( - "[bold]Essential Commands:[/bold]\n" - " [bold cyan]setup[/bold cyan] - First-time environment setup\n\n" - "[bold]Command Groups:[/bold]\n" - " [bold green]system[/bold green] - System environment and configuration\n" - " [bold green]module[/bold green] - Module workflow (start, complete, resume)\n" - " [bold green]package[/bold green] - Package management and nbdev integration\n" - " [bold green]nbgrader[/bold green] - Assignment management and auto-grading\n" - " [bold cyan]checkpoint[/bold cyan] - Progress tracking (capabilities unlocked)\n" - " [bold magenta]milestones[/bold magenta] - Epic achievements (major unlocks)\n" - " [bold bright_blue]leaderboard[/bold bright_blue] - Community showcase (share progress)\n" - " [bold bright_yellow]olympics[/bold bright_yellow] - Competition events (challenges)\n\n" - "[bold]Convenience Shortcuts:[/bold]\n" - " [bold yellow]export[/bold yellow] - Quick export (→ module export)\n" - " [bold yellow]test[/bold yellow] - Quick test (→ module test)\n" - " [bold green]book[/bold green] - Build Jupyter Book documentation\n" - " [bold green]logo[/bold green] - Learn about Tiny🔥Torch philosophy\n" - "[bold]Quick Start:[/bold]\n" - " [dim]tito setup[/dim] - First-time setup (run once)\n" - 
" [dim]tito module complete 01[/dim] - Complete it (test + export + track)\n" - " [dim]tito module start 02[/dim] - Continue to Module 02\n" - " [dim]tito module status[/dim] - View all progress\n\n" - "[bold]Track Progress:[/bold]\n" - " [dim]tito checkpoint status[/dim] - Capabilities unlocked\n" - " [dim]tito leaderboard profile[/dim] - Your achievement journey\n\n" - "[bold]Get Help:[/bold]\n" - " [dim]tito [/dim] - Show command subcommands\n" - " [dim]tito --help[/dim] - Show full help", + help_text, title="Welcome to TinyπŸ”₯Torch!", border_style="bright_green" ))