TinyTorch/modules/source/04_layers/layers_dev.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0e007598",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "# Layers - Building Blocks of Neural Networks\n",
    "\n",
    "Welcome to the Layers module! This is where we build the fundamental components that stack together to form neural networks.\n",
    "\n",
    "## Learning Goals\n",
    "- Understand how matrix multiplication powers neural networks\n",
    "- Implement naive matrix multiplication from scratch for deep understanding\n",
    "- Build the Dense (Linear) layer - the foundation of all neural networks\n",
    "- Learn weight initialization strategies and their importance\n",
    "- See how layers compose with activations to create powerful networks\n",
    "\n",
    "## Build → Use → Understand\n",
    "1. **Build**: Matrix multiplication and Dense layers from scratch\n",
    "2. **Use**: Create and test layers with real data\n",
    "3. **Understand**: How linear transformations enable feature learning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bc400228",
   "metadata": {
    "lines_to_next_cell": 1,
    "nbgrader": {
     "grade": false,
     "grade_id": "layers-imports",
     "locked": false,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "#| default_exp core.layers\n",
    "\n",
    "#| export\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import os\n",
    "import sys\n",
    "from typing import Union, List, Tuple, Optional\n",
    "\n",
    "# Import our dependencies - try from package first, then local modules\n",
    "try:\n",
    "    from tinytorch.core.tensor import Tensor\n",
    "    from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax\n",
    "except ImportError:\n",
    "    # For development, import from local modules\n",
    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))\n",
    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations'))\n",
    "    try:\n",
    "        from tensor_dev import Tensor\n",
    "        from activations_dev import ReLU, Sigmoid, Tanh, Softmax\n",
    "    except ImportError:\n",
    "        # If the local modules are not available, use relative imports\n",
    "        from ..tensor.tensor_dev import Tensor\n",
    "        from ..activations.activations_dev import ReLU, Sigmoid, Tanh, Softmax"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e186492c",
   "metadata": {
    "lines_to_next_cell": 1,
    "nbgrader": {
     "grade": false,
     "grade_id": "layers-setup",
     "locked": false,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "#| hide\n",
    "#| export\n",
    "def _should_show_plots():\n",
    "    \"\"\"Check if we should show plots (disable during testing)\"\"\"\n",
    "    # Check multiple conditions that indicate we're in test mode\n",
    "    is_pytest = (\n",
    "        'pytest' in sys.modules or\n",
    "        'test' in sys.argv or\n",
    "        os.environ.get('PYTEST_CURRENT_TEST') is not None or\n",
    "        any('test' in arg for arg in sys.argv) or\n",
    "        any('pytest' in arg for arg in sys.argv)\n",
    "    )\n",
    "    \n",
    "    # Show plots in development mode (when not in test mode)\n",
    "    return not is_pytest"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d41a5d47",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "layers-welcome",
     "locked": false,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "print(\"🔥 TinyTorch Layers Module\")\n",
    "print(f\"NumPy version: {np.__version__}\")\n",
    "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
    "print(\"Ready to build neural network layers!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bed6f41e",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## 📦 Where This Code Lives in the Final Package\n",
    "\n",
    "**Learning Side:** You work in `modules/source/03_layers/layers_dev.py`  \n",
    "**Building Side:** Code exports to `tinytorch.core.layers`\n",
    "\n",
    "```python\n",
    "# Final package structure:\n",
    "from tinytorch.core.layers import Dense, Conv2D  # All layer types together!\n",
    "from tinytorch.core.tensor import Tensor  # The foundation\n",
    "from tinytorch.core.activations import ReLU, Sigmoid  # Nonlinearity\n",
    "```\n",
    "\n",
    "**Why this matters:**\n",
    "- **Learning:** Focused modules for deep understanding\n",
    "- **Production:** Proper organization like PyTorch's `torch.nn.Linear`\n",
    "- **Consistency:** All layer types live together in `core.layers`\n",
    "- **Integration:** Works seamlessly with tensors and activations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2c033ee",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## What Are Neural Network Layers?\n",
    "\n",
    "### The Building Block Pattern\n",
    "Neural networks are built by stacking **layers** - each layer is a function that:\n",
    "1. **Takes input**: Tensor data from previous layer\n",
    "2. **Transforms**: Applies mathematical operations (linear transformation + activation)\n",
    "3. **Produces output**: New tensor data for next layer\n",
    "\n",
    "### The Universal Pattern\n",
    "Every layer follows this pattern:\n",
    "```python\n",
    "def layer(x):\n",
    "    # 1. Linear transformation\n",
    "    linear_output = x @ weights + bias\n",
    "    \n",
    "    # 2. Nonlinear activation\n",
    "    output = activation(linear_output)\n",
    "    \n",
    "    return output\n",
    "```\n",
    "\n",
    "### Why This Works\n",
    "- **Linear part**: Learns feature combinations\n",
    "- **Nonlinear part**: Enables complex patterns\n",
    "- **Stacking**: Multiple layers = more complex functions\n",
    "\n",
    "### Mathematical Foundation\n",
    "A neural network is function composition:\n",
    "```\n",
    "f(x) = layer_n(layer_{n-1}(...layer_2(layer_1(x))))\n",
    "```\n",
    "\n",
    "Each layer transforms the representation to be more useful for the final task.\n",
    "\n",
    "### What We'll Build\n",
    "1. **Matrix Multiplication**: The core operation powering all layers\n",
    "2. **Dense Layer**: The fundamental building block of neural networks\n",
    "3. **Integration**: How layers work with activations and tensors"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "448f63f6",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "## Step 1: Matrix Multiplication - The Engine of Neural Networks\n",
    "\n",
    "### What is Matrix Multiplication?\n",
    "Matrix multiplication is the core operation that powers all neural network layers:\n",
    "\n",
    "```\n",
    "C = A @ B\n",
    "```\n",
    "\n",
    "Where:\n",
    "- **A**: Input data (batch_size × input_features)\n",
    "- **B**: Weight matrix (input_features × output_features)  \n",
    "- **C**: Output data (batch_size × output_features)\n",
    "\n",
    "### Why It's Essential\n",
    "- **Feature combination**: Each output combines all input features\n",
    "- **Learned weights**: B contains the learned parameters\n",
    "- **Efficient computation**: Vectorized operations are much faster\n",
    "- **Parallel processing**: GPUs are designed for matrix operations\n",
    "\n",
    "### The Mathematical Definition\n",
    "For matrices A (m×n) and B (n×p), the result C (m×p) is:\n",
    "```\n",
    "C[i,j] = Σ(k=0 to n-1) A[i,k] * B[k,j]\n",
    "```\n",
    "\n",
    "### Visual Understanding\n",
    "```\n",
    "[1 2] @ [5 6] = [1*5+2*7  1*6+2*8] = [19 22]\n",
    "[3 4]   [7 8]   [3*5+4*7  3*6+4*8]   [43 50]\n",
    "```\n",
    "\n",
    "### Real-World Context\n",
    "Every major operation in deep learning uses matrix multiplication:\n",
    "- **Dense layers**: Linear transformations\n",
    "- **Convolutional layers**: Convolution as matrix multiplication\n",
    "- **Attention mechanisms**: Query-Key-Value computations\n",
    "- **Embeddings**: Lookup tables as matrix multiplication"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cccd838f",
   "metadata": {
    "lines_to_next_cell": 1,
    "nbgrader": {
     "grade": false,
     "grade_id": "matmul-naive",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "def matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:\n",
    "    \"\"\"\n",
    "    Matrix multiplication using explicit for-loops.\n",
    "    \n",
    "    This helps you understand what matrix multiplication really does!\n",
    "        \n",
    "    TODO: Implement matrix multiplication using three nested for-loops.\n",
    "    \n",
    "    STEP-BY-STEP IMPLEMENTATION:\n",
    "    1. Get the dimensions: m, n from A.shape and n2, p from B.shape\n",
    "    2. Check compatibility: n must equal n2\n",
    "    3. Create output matrix C of shape (m, p) filled with zeros\n",
    "    4. Use three nested loops:\n",
    "       - i loop: iterate through rows of A (0 to m-1)\n",
    "       - j loop: iterate through columns of B (0 to p-1)\n",
    "       - k loop: iterate through shared dimension (0 to n-1)\n",
    "    5. For each (i,j), accumulate: C[i,j] += A[i,k] * B[k,j]\n",
    "    \n",
    "    EXAMPLE WALKTHROUGH:\n",
    "    ```python\n",
    "    A = [[1, 2],     B = [[5, 6],\n",
    "         [3, 4]]          [7, 8]]\n",
    "    \n",
    "    C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0] = 1*5 + 2*7 = 19\n",
    "    C[0,1] = A[0,0]*B[0,1] + A[0,1]*B[1,1] = 1*6 + 2*8 = 22\n",
    "    C[1,0] = A[1,0]*B[0,0] + A[1,1]*B[1,0] = 3*5 + 4*7 = 43\n",
    "    C[1,1] = A[1,0]*B[0,1] + A[1,1]*B[1,1] = 3*6 + 4*8 = 50\n",
    "    \n",
    "    Result: [[19, 22], [43, 50]]\n",
    "    ```\n",
    "    \n",
    "    IMPLEMENTATION HINTS:\n",
    "    - Get dimensions: m, n = A.shape; n2, p = B.shape\n",
    "    - Check compatibility: if n != n2: raise ValueError\n",
    "    - Initialize result: C = np.zeros((m, p))\n",
    "    - Triple nested loop: for i in range(m): for j in range(p): for k in range(n):\n",
    "    - Accumulate sum: C[i,j] += A[i,k] * B[k,j]\n",
    "    \n",
    "    LEARNING CONNECTIONS:\n",
    "    - This is what every neural network layer does internally\n",
    "    - Understanding this helps debug shape mismatches\n",
    "    - Essential for understanding the foundation of neural networks\n",
    "    \"\"\"\n",
    "    ### BEGIN SOLUTION\n",
    "    # Get matrix dimensions\n",
    "    m, n = A.shape\n",
    "    n2, p = B.shape\n",
    "    \n",
    "    # Check compatibility\n",
    "    if n != n2:\n",
    "        raise ValueError(f\"Incompatible matrix dimensions: A is {m}x{n}, B is {n2}x{p}\")\n",
    "    \n",
    "    # Initialize result matrix\n",
    "    C = np.zeros((m, p))\n",
    "    \n",
    "    # Triple nested loop for matrix multiplication\n",
    "    for i in range(m):\n",
    "        for j in range(p):\n",
    "            for k in range(n):\n",
    "                C[i, j] += A[i, k] * B[k, j]\n",
    "    \n",
    "    return C\n",
    "    ### END SOLUTION"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e695714",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "### 🧪 Test Your Matrix Multiplication\n",
    "\n",
    "Once you implement the `matmul` function above, run this cell to test it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bed91066",
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "test-matmul-immediate",
     "locked": true,
     "points": 10,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "def test_matrix_multiplication():\n",
    "    \"\"\"Test matrix multiplication implementation\"\"\"\n",
    "    print(\"🔬 Unit Test: Matrix Multiplication...\")\n",
    "\n",
    "# Test simple 2x2 case\n",
    "    A = np.array([[1, 2], [3, 4]], dtype=np.float32)\n",
    "    B = np.array([[5, 6], [7, 8]], dtype=np.float32)\n",
    "    \n",
    "    result = matmul(A, B)\n",
    "    expected = np.array([[19, 22], [43, 50]], dtype=np.float32)\n",
    "    \n",
    "    assert np.allclose(result, expected), f\"Matrix multiplication failed: expected {expected}, got {result}\"\n",
    "    \n",
    "    # Compare with NumPy\n",
    "    numpy_result = A @ B\n",
    "    assert np.allclose(result, numpy_result), f\"Doesn't match NumPy: got {result}, expected {numpy_result}\"\n",
    "\n",
    "# Test different shapes\n",
    "    A2 = np.array([[1, 2, 3]], dtype=np.float32)  # 1x3\n",
    "    B2 = np.array([[4], [5], [6]], dtype=np.float32)  # 3x1\n",
    "    result2 = matmul(A2, B2)\n",
    "    expected2 = np.array([[32]], dtype=np.float32)  # 1*4 + 2*5 + 3*6 = 32\n",
    "    \n",
    "    assert np.allclose(result2, expected2), f\"1x3 @ 3x1 failed: expected {expected2}, got {result2}\"\n",
    "    \n",
    "    # Test 3x3 case\n",
    "    A3 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)\n",
    "    B3 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float32)  # Identity\n",
    "    result3 = matmul(A3, B3)\n",
    "    \n",
    "    assert np.allclose(result3, A3), \"Multiplication by identity should preserve matrix\"\n",
    "    \n",
    "    # Test incompatible shapes\n",
    "    A4 = np.array([[1, 2]], dtype=np.float32)  # 1x2\n",
    "    B4 = np.array([[3], [4], [5]], dtype=np.float32)  # 3x1\n",
    "    \n",
    "    try:\n",
    "        matmul(A4, B4)\n",
    "        assert False, \"Should raise error for incompatible shapes\"\n",
    "    except ValueError as e:\n",
    "        assert \"Incompatible matrix dimensions\" in str(e)\n",
    "    \n",
    "    print(\"✅ Matrix multiplication tests passed!\")\n",
    "    print(f\"✅ 2x2 multiplication working correctly\")\n",
    "    print(f\"✅ Matches NumPy's implementation\")\n",
    "    print(f\"✅ Handles different shapes correctly\")\n",
    "    print(f\"✅ Proper error handling for incompatible shapes\")\n",
    "\n",
    "# Run the test\n",
    "test_matrix_multiplication()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ab183a07",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "## Step 2: Dense Layer - The Foundation of Neural Networks\n",
    "\n",
    "### What is a Dense Layer?\n",
    "A **Dense layer** (also called Linear or Fully Connected layer) is the fundamental building block of neural networks:\n",
    "\n",
    "```python\n",
    "output = input @ weights + bias\n",
    "```\n",
    "\n",
    "Where:\n",
    "- **input**: Input data (batch_size × input_features)\n",
    "- **weights**: Learned parameters (input_features × output_features)\n",
    "- **bias**: Learned bias terms (output_features,)\n",
    "- **output**: Transformed data (batch_size × output_features)\n",
    "\n",
    "### Why Dense Layers Are Essential\n",
    "1. **Feature transformation**: Learn meaningful combinations of input features\n",
    "2. **Universal approximation**: Stack enough layers to approximate any function\n",
    "3. **Learnable parameters**: Weights and biases are optimized during training\n",
    "4. **Composability**: Can be stacked to create complex architectures\n",
    "\n",
    "### The Mathematical Foundation\n",
    "For input x, weight matrix W, and bias b:\n",
    "```\n",
    "y = xW + b\n",
    "```\n",
    "\n",
    "This is a linear transformation that:\n",
    "- **Combines features**: Each output is a weighted sum of all inputs\n",
    "- **Learns relationships**: Weights encode feature interactions\n",
    "- **Adds flexibility**: Bias allows shifting the output\n",
    "\n",
    "### Real-World Applications\n",
    "- **Classification**: Transform features to class logits\n",
    "- **Regression**: Transform features to continuous outputs\n",
    "- **Representation learning**: Learn useful intermediate representations\n",
    "- **Attention mechanisms**: Compute queries, keys, and values\n",
    "\n",
    "### Design Decisions\n",
    "- **Weight initialization**: Random initialization to break symmetry\n",
    "- **Bias usage**: Usually included for flexibility\n",
    "- **Activation**: Often followed by nonlinear activation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eec77bde",
   "metadata": {
    "lines_to_next_cell": 1,
    "nbgrader": {
     "grade": false,
     "grade_id": "dense-layer",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "class Dense:\n",
    "    \"\"\"\n",
    "    Dense (Linear/Fully Connected) Layer\n",
    "    \n",
    "    Applies a linear transformation: y = xW + b\n",
    "    \n",
    "    This is the fundamental building block of neural networks.\n",
    "    \"\"\"\n",
    "    \n",
    "    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):\n",
    "        \"\"\"\n",
    "        Initialize Dense layer with random weights and optional bias.\n",
    "        \n",
    "        TODO: Implement Dense layer initialization.\n",
    "        \n",
    "        STEP-BY-STEP IMPLEMENTATION:\n",
    "        1. Store the layer parameters (input_size, output_size, use_bias)\n",
    "        2. Initialize weights with random values using proper scaling\n",
    "        3. Initialize bias (if use_bias=True) with zeros\n",
    "        4. Convert weights and bias to Tensor objects\n",
    "        \n",
    "        WEIGHT INITIALIZATION STRATEGY:\n",
    "        - Use Xavier/Glorot initialization for better gradient flow\n",
    "        - Scale: sqrt(2 / (input_size + output_size))\n",
    "        - Random values: np.random.randn() * scale\n",
    "        \n",
    "        EXAMPLE USAGE:\n",
    "        ```python\n",
    "        layer = Dense(input_size=3, output_size=2)\n",
    "        # Creates weight matrix of shape (3, 2) and bias of shape (2,)\n",
    "        ```\n",
    "        \n",
    "        IMPLEMENTATION HINTS:\n",
    "        - Store parameters: self.input_size, self.output_size, self.use_bias\n",
    "        - Weight shape: (input_size, output_size)\n",
    "        - Bias shape: (output_size,) if use_bias else None\n",
    "        - Use Xavier initialization: scale = np.sqrt(2.0 / (input_size + output_size))\n",
    "        - Initialize weights: np.random.randn(input_size, output_size) * scale\n",
    "        - Initialize bias: np.zeros(output_size) if use_bias else None\n",
    "        - Convert to Tensors: self.weights = Tensor(weight_data), self.bias = Tensor(bias_data)\n",
    "        \"\"\"\n",
    "        ### BEGIN SOLUTION\n",
    "        # Store layer parameters\n",
    "        self.input_size = input_size\n",
    "        self.output_size = output_size\n",
    "        self.use_bias = use_bias\n",
    "        \n",
    "        # Xavier/Glorot initialization\n",
    "        scale = np.sqrt(2.0 / (input_size + output_size))\n",
    "        \n",
    "        # Initialize weights with random values\n",
    "        weight_data = np.random.randn(input_size, output_size) * scale\n",
    "        self.weights = Tensor(weight_data)\n",
    "        \n",
    "        # Initialize bias\n",
    "        if use_bias:\n",
    "            bias_data = np.zeros(output_size)\n",
    "            self.bias = Tensor(bias_data)\n",
    "        else:\n",
    "            self.bias = None\n",
    "        ### END SOLUTION\n",
    "    \n",
    "    def forward(self, x):\n",
    "        \"\"\"\n",
    "        Forward pass through the Dense layer.\n",
    "        \n",
    "        TODO: Implement the forward pass: y = xW + b\n",
    "        \n",
    "        STEP-BY-STEP IMPLEMENTATION:\n",
    "        1. Perform matrix multiplication: x @ self.weights\n",
    "        2. Add bias if present: result + self.bias\n",
    "        3. Return the result as a Tensor\n",
    "        \n",
    "        EXAMPLE USAGE:\n",
    "        ```python\n",
    "        layer = Dense(input_size=3, output_size=2)\n",
    "        input_data = Tensor([[1, 2, 3]])  # Shape: (1, 3)\n",
    "        output = layer(input_data)        # Shape: (1, 2)\n",
    "        ```\n",
    "        \n",
    "        IMPLEMENTATION HINTS:\n",
    "        - Matrix multiplication: matmul(x.data, self.weights.data)\n",
    "        - Add bias: result + self.bias.data (broadcasting handles shape)\n",
    "        - Return as Tensor: return Tensor(final_result)\n",
    "        - Handle both cases: with and without bias\n",
    "        \n",
    "        LEARNING CONNECTIONS:\n",
    "        - This is the core operation in every neural network layer\n",
    "        - Matrix multiplication combines all input features\n",
    "        - Bias addition allows shifting the output distribution\n",
    "        - The result feeds into activation functions\n",
    "        \"\"\"\n",
    "        ### BEGIN SOLUTION\n",
    "        # Perform matrix multiplication\n",
    "        linear_output = matmul(x.data, self.weights.data)\n",
    "        \n",
    "        # Add bias if present\n",
    "        if self.use_bias and self.bias is not None:\n",
    "            linear_output = linear_output + self.bias.data\n",
    "        \n",
    "        return type(x)(linear_output)\n",
    "        ### END SOLUTION\n",
    "    \n",
    "    def __call__(self, x):\n",
    "        \"\"\"Make the layer callable: layer(x) instead of layer.forward(x)\"\"\"\n",
    "        return self.forward(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5736d98c",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "### 🧪 Test Your Dense Layer\n",
    "\n",
    "Once you implement the Dense layer above, run this cell to test it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b1d056c",
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "test-dense-layer",
     "locked": true,
     "points": 15,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "def test_dense_layer():\n",
    "    \"\"\"Test Dense layer implementation\"\"\"\n",
    "    print(\"🔬 Unit Test: Dense Layer...\")\n",
    "    \n",
    "    # Test layer creation\n",
    "    layer = Dense(input_size=3, output_size=2)\n",
    "    \n",
    "    # Check weight and bias shapes\n",
    "    assert layer.weights.shape == (3, 2), f\"Weight shape should be (3, 2), got {layer.weights.shape}\"\n",
    "    assert layer.bias is not None, \"Bias should not be None when use_bias=True\"\n",
    "    assert layer.bias.shape == (2,), f\"Bias shape should be (2,), got {layer.bias.shape}\"\n",
    "    \n",
    "    # Test forward pass\n",
    "    input_data = Tensor([[1, 2, 3]])  # Shape: (1, 3)\n",
    "    output = layer(input_data)\n",
    "    \n",
    "    # Check output shape\n",
    "    assert output.shape == (1, 2), f\"Output shape should be (1, 2), got {output.shape}\"\n",
    "    \n",
    "    # Test batch processing\n",
    "    batch_input = Tensor([[1, 2, 3], [4, 5, 6]])  # Shape: (2, 3)\n",
    "    batch_output = layer(batch_input)\n",
    "    \n",
    "    assert batch_output.shape == (2, 2), f\"Batch output shape should be (2, 2), got {batch_output.shape}\"\n",
    "\n",
    "# Test without bias\n",
    "    no_bias_layer = Dense(input_size=3, output_size=2, use_bias=False)\n",
    "    assert no_bias_layer.bias is None, \"Layer without bias should have None bias\"\n",
    "    \n",
    "    no_bias_output = no_bias_layer(input_data)\n",
    "    assert no_bias_output.shape == (1, 2), \"No-bias layer should still produce correct shape\"\n",
    "    \n",
    "    # Test that different inputs produce different outputs\n",
    "    input1 = Tensor([[1, 0, 0]])\n",
    "    input2 = Tensor([[0, 1, 0]])\n",
    "    \n",
    "    output1 = layer(input1)\n",
    "    output2 = layer(input2)\n",
    "    \n",
    "    # Should not be equal (with high probability due to random initialization)\n",
    "    assert not np.allclose(output1.data, output2.data), \"Different inputs should produce different outputs\"\n",
    "    \n",
    "    # Test linearity property: layer(a*x) = a*layer(x)\n",
    "    scale = 2.0\n",
    "    scaled_input = Tensor([[2, 4, 6]])  # 2 * [1, 2, 3]\n",
    "    scaled_output = layer(scaled_input)\n",
    "    \n",
    "    # Due to bias, this won't be exactly 2*output, but the linear part should scale\n",
    "    print(\"✅ Dense layer tests passed!\")\n",
    "    print(f\"✅ Correct weight and bias initialization\")\n",
    "    print(f\"✅ Forward pass produces correct shapes\")\n",
    "    print(f\"✅ Batch processing works correctly\")\n",
    "    print(f\"✅ Bias and no-bias variants work\")\n",
    "    print(f\"✅ Naive matrix multiplication option works\")\n",
    "\n",
    "# Run the test\n",
    "test_dense_layer()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac4dcba0",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "## Step 3: Layer Integration with Activations\n",
    "\n",
    "### Building Complete Neural Network Components\n",
    "Now let's see how Dense layers work with activation functions to create complete neural network components:\n",
    "\n",
    "```python\n",
    "# Complete neural network layer\n",
    "x = input_data\n",
    "linear_output = dense_layer(x)\n",
    "final_output = activation_function(linear_output)\n",
    "```\n",
    "\n",
    "### Why This Combination Works\n",
    "1. **Linear transformation**: Dense layer learns feature combinations\n",
    "2. **Nonlinear activation**: Enables complex pattern recognition\n",
    "3. **Stacking**: Multiple layer+activation pairs create deep networks\n",
    "4. **Universal approximation**: Can approximate any continuous function\n",
    "\n",
    "### Real-World Layer Patterns\n",
    "- **Hidden layers**: Dense + ReLU (most common)\n",
    "- **Output layers**: Dense + Softmax (classification) or Dense + Sigmoid (binary)\n",
    "- **Gated layers**: Dense + Sigmoid (for gates in LSTM/GRU)\n",
    "- **Attention layers**: Dense + Softmax (for attention weights)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f5e77a64",
   "metadata": {
    "nbgrader": {
     "grade": true,
     "grade_id": "test-layer-activation-comprehensive",
     "locked": true,
     "points": 15,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "def test_layer_activation():\n",
    "    \"\"\"Test Dense layer comprehensive testing with activation functions\"\"\"\n",
    "    print(\"🔬 Unit Test: Layer-Activation Comprehensive Test...\")\n",
    "    \n",
    "    # Create layer and activation functions\n",
    "    layer = Dense(input_size=4, output_size=3)\n",
    "    relu = ReLU()\n",
    "    sigmoid = Sigmoid()\n",
    "    tanh = Tanh()\n",
    "    softmax = Softmax()\n",
    "    \n",
    "    # Test input\n",
    "    input_data = Tensor([[1, -2, 3, -4], [2, 1, -1, 3]])  # Shape: (2, 4)\n",
    "    \n",
    "    # Test Dense + ReLU (common hidden layer pattern)\n",
    "    linear_output = layer(input_data)\n",
    "    relu_output = relu(linear_output)\n",
    "    \n",
    "    assert relu_output.shape == (2, 3), \"ReLU output should preserve shape\"\n",
    "    assert np.all(relu_output.data >= 0), \"ReLU output should be non-negative\"\n",
    "    \n",
    "    # Test Dense + Softmax (classification output pattern)\n",
    "    softmax_output = softmax(linear_output)\n",
    "    \n",
    "    assert softmax_output.shape == (2, 3), \"Softmax output should preserve shape\"\n",
    "    \n",
    "    # Each row should sum to 1 (probability distribution)\n",
    "    for i in range(2):\n",
    "        row_sum = np.sum(softmax_output.data[i])\n",
    "        assert abs(row_sum - 1.0) < 1e-6, f\"Row {i} should sum to 1, got {row_sum}\"\n",
    "    \n",
    "    # Test Dense + Sigmoid (binary classification pattern)\n",
    "    sigmoid_output = sigmoid(linear_output)\n",
    "    \n",
    "    assert sigmoid_output.shape == (2, 3), \"Sigmoid output should preserve shape\"\n",
    "    assert np.all(sigmoid_output.data > 0), \"Sigmoid output should be positive\"\n",
    "    assert np.all(sigmoid_output.data < 1), \"Sigmoid output should be less than 1\"\n",
    "    \n",
    "    # Test Dense + Tanh (hidden layer with centered outputs)\n",
    "    tanh_output = tanh(linear_output)\n",
    "    \n",
    "    assert tanh_output.shape == (2, 3), \"Tanh output should preserve shape\"\n",
    "    assert np.all(tanh_output.data > -1), \"Tanh output should be > -1\"\n",
    "    assert np.all(tanh_output.data < 1), \"Tanh output should be < 1\"\n",
    "    \n",
    "    # Test chained layers (simple 2-layer network)\n",
    "    layer1 = Dense(input_size=4, output_size=5)\n",
    "    layer2 = Dense(input_size=5, output_size=3)\n",
    "    \n",
    "    # Forward pass through 2-layer network\n",
    "    hidden = relu(layer1(input_data))\n",
    "    output = softmax(layer2(hidden))\n",
    "    \n",
    "    assert output.shape == (2, 3), \"2-layer network should produce correct output shape\"\n",
    "    \n",
    "    # Each output should be a valid probability distribution\n",
    "    for i in range(2):\n",
    "        row_sum = np.sum(output.data[i])\n",
    "        assert abs(row_sum - 1.0) < 1e-6, f\"Network output row {i} should sum to 1\"\n",
    "    \n",
    "    # Test that layers are learning-ready (have parameters)\n",
    "    assert hasattr(layer1, 'weights'), \"Layer should have weights\"\n",
    "    assert hasattr(layer1, 'bias'), \"Layer should have bias\"\n",
    "    assert isinstance(layer1.weights, Tensor), \"Weights should be Tensor\"\n",
    "    assert isinstance(layer1.bias, Tensor), \"Bias should be Tensor\"\n",
    "    \n",
    "    print(\"✅ Layer-activation comprehensive tests passed!\")\n",
    "    print(f\"✅ Dense + ReLU working correctly\")\n",
    "    print(f\"✅ Dense + Softmax producing valid probabilities\")\n",
    "    print(f\"✅ Dense + Sigmoid bounded correctly\")\n",
    "    print(f\"✅ Dense + Tanh centered correctly\")\n",
    "    print(f\"✅ Multi-layer networks working\")\n",
    "    print(f\"✅ All components ready for training!\")\n",
    "\n",
    "# Run the test\n",
    "test_layer_activation()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9cfd022a",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## 🧪 Module Testing\n",
    "\n",
    "Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
    "\n",
    "**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e508b1ce",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "standardized-testing",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# =============================================================================\n",
    "# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
    "# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
    "# =============================================================================\n",
    "\n",
    "if __name__ == \"__main__\":\n",
    "    from tito.tools.testing import run_module_tests_auto\n",
    "    \n",
    "    # Automatically discover and run all tests in this module\n",
    "    success = run_module_tests_auto(\"Layers\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89a2d068",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## 🎯 Module Summary: Neural Network Layers Mastery!\n",
    "\n",
    "Congratulations! You've successfully implemented the fundamental building blocks of neural networks:\n",
    "\n",
    "### ✅ What You've Built\n",
    "- **Matrix Multiplication**: The core operation powering all neural network computations\n",
    "- **Dense Layer**: The fundamental building block with proper weight initialization\n",
    "- **Integration**: How layers work with activation functions to create complete neural components\n",
    "- **Flexibility**: Support for bias/no-bias and naive/optimized matrix multiplication\n",
    "\n",
    "### ✅ Key Learning Outcomes\n",
    "- **Understanding**: How linear transformations enable feature learning\n",
    "- **Implementation**: Built layers from scratch with proper initialization\n",
    "- **Testing**: Progressive validation with immediate feedback\n",
    "- **Integration**: Saw how layers compose with activations for complete functionality\n",
    "- **Real-world skills**: Understanding the mathematics behind neural networks\n",
    "\n",
    "### ✅ Mathematical Mastery\n",
    "- **Matrix Multiplication**: C[i,j] = Σ(A[i,k] * B[k,j]) - implemented with loops\n",
    "- **Linear Transformation**: y = xW + b - the heart of neural networks\n",
    "- **Xavier Initialization**: Proper weight scaling for stable gradients\n",
    "- **Composition**: How multiple layers create complex functions\n",
    "\n",
    "### ✅ Professional Skills Developed\n",
    "- **Algorithm implementation**: From mathematical definition to working code\n",
    "- **Performance considerations**: Naive vs optimized implementations\n",
    "- **API design**: Clean, consistent interfaces for layer creation and usage\n",
    "- **Testing methodology**: Unit tests, comprehensive tests, and edge case handling\n",
    "\n",
    "### ✅ Ready for Next Steps\n",
    "Your layers are now ready to power:\n",
    "- **Complete Networks**: Stack multiple layers with activations\n",
    "- **Training**: Gradient computation and parameter updates\n",
    "- **Specialized Architectures**: CNNs, RNNs, Transformers all use these foundations\n",
    "- **Real Applications**: Image classification, NLP, game playing, etc.\n",
    "\n",
    "### 🔗 Connection to Real ML Systems\n",
    "Your implementations mirror production frameworks:\n",
    "- **PyTorch**: `torch.nn.Linear()` - same mathematical operations\n",
    "- **TensorFlow**: `tf.keras.layers.Dense()` - identical functionality\n",
    "- **Industry**: Every major neural network uses these exact computations\n",
    "\n",
    "### 🎯 The Power of Linear Algebra\n",
    "You've unlocked the mathematical foundation of AI:\n",
    "- **Feature combination**: Each layer learns how to combine input features\n",
    "- **Representation learning**: Layers automatically discover useful representations\n",
    "- **Universal approximation**: Stack enough layers to approximate any function\n",
    "- **Scalability**: Same operations work from small networks to massive language models\n",
    "\n",
    "### 🧠 Deep Learning Insights\n",
    "- **Why deep networks work**: Multiple layers = multiple levels of abstraction\n",
    "- **Parameter efficiency**: Shared weights enable learning with limited data\n",
    "- **Gradient flow**: Proper initialization enables training deep networks\n",
    "- **Composability**: Simple components combine to create complex intelligence\n",
    "\n",
    "**Next Module**: Networks - Composing your layers into complete neural network architectures!\n",
    "\n",
    "Your layers are the building blocks. Now let's assemble them into powerful neural networks that can learn to solve complex problems!"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "main_language": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}