mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-04-30 10:13:57 -05:00
✅ COMPLETED:
- Instructor solution executes perfectly
- NBDev export works (fixed import directives)
- Package functionality verified
- Student assignment generation works
- CLI integration complete
- Systematic testing framework established

⚠️ CRITICAL DISCOVERY:
- NBGrader requires cell metadata architecture changes
- Current generator creates content correctly but wrong cell types
- Would require major rework of assignment generation pipeline

📊 STATUS:
- Core TinyTorch functionality: ✅ READY FOR STUDENTS
- NBGrader integration: requires Phase 2 rework
- Ready to continue systematic testing of modules 01-06

🔧 FIXES APPLIED:
- Added #| export directive to imports in enhanced modules
- Fixed generator logic for student scaffolding
- Updated testing framework and documentation
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ca53839c",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "# Module X: CNN - Convolutional Neural Networks\n",
    "\n",
    "Welcome to the CNN module! Here you'll implement the core building block of modern computer vision: the convolutional layer.\n",
    "\n",
    "## Learning Goals\n",
    "- Understand the convolution operation (sliding window, local connectivity, weight sharing)\n",
    "- Implement Conv2D with explicit for-loops\n",
    "- Visualize how convolution builds feature maps\n",
    "- Compose Conv2D with other layers to build a simple ConvNet\n",
    "- (Stretch) Explore stride, padding, pooling, and multi-channel input\n",
    "\n",
    "## Build \u2192 Use \u2192 Understand\n",
    "1. **Build**: Conv2D layer using sliding window convolution\n",
    "2. **Use**: Transform images and see feature maps\n",
    "3. **Understand**: How CNNs learn spatial patterns"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e0d8f02",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## \ud83d\udce6 Where This Code Lives in the Final Package\n",
    "\n",
    "**Learning Side:** You work in `modules/cnn/cnn_dev.py`  \n",
    "**Building Side:** Code exports to `tinytorch.core.layers`\n",
    "\n",
    "```python\n",
    "# Final package structure:\n",
    "from tinytorch.core.layers import Dense, Conv2D  # Both layers together!\n",
    "from tinytorch.core.activations import ReLU\n",
    "from tinytorch.core.tensor import Tensor\n",
    "```\n",
    "\n",
    "**Why this matters:**\n",
    "- **Learning:** Focused modules for deep understanding\n",
    "- **Production:** Proper organization like PyTorch's `torch.nn`\n",
    "- **Consistency:** All layers (Dense, Conv2D) live together in `core.layers`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fbd717db",
   "metadata": {},
   "outputs": [],
   "source": [
    "#| default_exp core.cnn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f22e530",
   "metadata": {},
   "outputs": [],
   "source": [
    "#| export\n",
    "import numpy as np\n",
    "from typing import List, Tuple, Optional\n",
    "from tinytorch.core.tensor import Tensor\n",
    "\n",
    "# Setup and imports (for development)\n",
    "import matplotlib.pyplot as plt\n",
    "from tinytorch.core.layers import Dense\n",
    "from tinytorch.core.activations import ReLU"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f99723c8",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "## Step 1: What is Convolution?\n",
    "\n",
    "### Definition\n",
    "A **convolutional layer** applies a small filter (kernel) across the input, producing a feature map. This operation captures local patterns and is the foundation of modern vision models.\n",
    "\n",
    "### Why Convolution Matters in Computer Vision\n",
    "- **Local connectivity**: Each output value depends only on a small region of the input\n",
    "- **Weight sharing**: The same filter is applied everywhere (translation invariance)\n",
    "- **Spatial hierarchy**: Multiple layers build increasingly complex features\n",
    "- **Parameter efficiency**: Far fewer parameters than fully connected layers\n",
    "\n",
    "### The Fundamental Insight\n",
    "**Convolution is pattern matching!** The kernel learns to detect specific patterns:\n",
    "- **Edge detectors**: Find boundaries between objects\n",
    "- **Texture detectors**: Recognize surface patterns\n",
    "- **Shape detectors**: Identify geometric forms\n",
    "- **Feature detectors**: Combine simple patterns into complex features\n",
    "\n",
    "### Real-World Examples\n",
    "- **Image processing**: Detect edges, blur, sharpen\n",
    "- **Computer vision**: Recognize objects, faces, text\n",
    "- **Medical imaging**: Detect tumors, analyze scans\n",
    "- **Autonomous driving**: Identify traffic signs, pedestrians\n",
    "\n",
    "### Visual Intuition\n",
    "```\n",
    "Input Image:    Kernel:     Output Feature Map:\n",
    "[1, 2, 3]       [ 1,  0]    [1*1+2*0+4*0+5*(-1), 2*1+3*0+5*0+6*(-1)]\n",
    "[4, 5, 6]       [ 0, -1]    [4*1+5*0+7*0+8*(-1), 5*1+6*0+8*0+9*(-1)]\n",
    "[7, 8, 9]\n",
    "```\n",
    "\n",
    "The kernel slides across the input, computing dot products at each position.\n",
    "\n",
    "### The Math Behind It\n",
    "For input I (H\u00d7W) and kernel K (kH\u00d7kW), the output O (out_H\u00d7out_W) is:\n",
    "```\n",
    "O[i,j] = sum(I[i+di, j+dj] * K[di, dj] for di in range(kH) for dj in range(kW))\n",
    "```\n",
    "\n",
    "where out_H = H - kH + 1 and out_W = W - kW + 1. (Strictly speaking this is cross-correlation, since the kernel is not flipped; deep learning frameworks call it convolution, and so will we.)\n",
    "\n",
    "Let's implement this step by step!"
   ]
  },
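  {
   "cell_type": "markdown",
   "id": "a0c0e001",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "### Optional: A Reference Oracle\n",
    "\n",
    "Before implementing the formula yourself, it can help to have an independent reference to check against. The sketch below uses SciPy's `correlate2d` (the formula above is cross-correlation, so `correlate2d` with `mode='valid'` computes exactly the same thing). This assumes SciPy happens to be available in your environment; it is not a TinyTorch dependency, so feel free to skip this cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0c0e002",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional sanity check: SciPy's correlate2d computes the same formula.\n",
    "# This cell assumes SciPy is installed; skip it if not.\n",
    "try:\n",
    "    from scipy.signal import correlate2d\n",
    "    I = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)\n",
    "    K = np.array([[1, 0], [0, -1]], dtype=np.float32)\n",
    "    # mode='valid' keeps only positions where the kernel fits entirely inside I\n",
    "    print(correlate2d(I, K, mode='valid'))  # expected: [[-4. -4.] [-4. -4.]]\n",
    "except ImportError:\n",
    "    print(\"SciPy not installed - skipping the reference check.\")"
   ]
  },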
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa4af055",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:\n",
    "    \"\"\"\n",
    "    Naive 2D convolution (single channel, no stride, no padding).\n",
    "\n",
    "    Args:\n",
    "        input: 2D input array (H, W)\n",
    "        kernel: 2D filter (kH, kW)\n",
    "    Returns:\n",
    "        2D output array (H-kH+1, W-kW+1)\n",
    "\n",
    "    TODO: Implement the sliding window convolution using for-loops.\n",
    "\n",
    "    APPROACH:\n",
    "    1. Get input dimensions: H, W = input.shape\n",
    "    2. Get kernel dimensions: kH, kW = kernel.shape\n",
    "    3. Calculate output dimensions: out_H = H - kH + 1, out_W = W - kW + 1\n",
    "    4. Create output array: np.zeros((out_H, out_W))\n",
    "    5. Use nested loops to slide the kernel:\n",
    "       - i loop: output rows (0 to out_H-1)\n",
    "       - j loop: output columns (0 to out_W-1)\n",
    "       - di loop: kernel rows (0 to kH-1)\n",
    "       - dj loop: kernel columns (0 to kW-1)\n",
    "    6. For each (i,j), compute: output[i,j] += input[i+di, j+dj] * kernel[di, dj]\n",
    "\n",
    "    EXAMPLE:\n",
    "    Input: [[1, 2, 3],     Kernel: [[1, 0],\n",
    "            [4, 5, 6],              [0, -1]]\n",
    "            [7, 8, 9]]\n",
    "\n",
    "    Output[0,0] = 1*1 + 2*0 + 4*0 + 5*(-1) = 1 - 5 = -4\n",
    "    Output[0,1] = 2*1 + 3*0 + 5*0 + 6*(-1) = 2 - 6 = -4\n",
    "    Output[1,0] = 4*1 + 5*0 + 7*0 + 8*(-1) = 4 - 8 = -4\n",
    "    Output[1,1] = 5*1 + 6*0 + 8*0 + 9*(-1) = 5 - 9 = -4\n",
    "\n",
    "    HINTS:\n",
    "    - Start with output = np.zeros((out_H, out_W))\n",
    "    - Use four nested loops: for i in range(out_H): for j in range(out_W): for di in range(kH): for dj in range(kW):\n",
    "    - Accumulate the sum: output[i,j] += input[i+di, j+dj] * kernel[di, dj]\n",
    "    \"\"\"\n",
    "    raise NotImplementedError(\"Student implementation required\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d83b2c10",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "#| hide\n",
    "#| export\n",
    "def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:\n",
    "    \"\"\"Reference solution: slide the kernel and accumulate dot products.\"\"\"\n",
    "    H, W = input.shape\n",
    "    kH, kW = kernel.shape\n",
    "    out_H, out_W = H - kH + 1, W - kW + 1\n",
    "    output = np.zeros((out_H, out_W), dtype=input.dtype)\n",
    "    for i in range(out_H):\n",
    "        for j in range(out_W):\n",
    "            for di in range(kH):\n",
    "                for dj in range(kW):\n",
    "                    output[i, j] += input[i + di, j + dj] * kernel[di, dj]\n",
    "    return output"
   ]
  },
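  {
   "cell_type": "markdown",
   "id": "a0c0e003",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "### Optional: The Same Computation, Vectorized\n",
    "\n",
    "The four nested loops above are the clearest way to see the operation, but NumPy can express the same arithmetic without Python loops. This is a minimal sketch, assuming NumPy >= 1.20 (for `sliding_window_view`); it is a cross-check for intuition, not the implementation this module exports."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0c0e004",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Vectorized cross-check (assumes NumPy >= 1.20 for sliding_window_view).\n",
    "# Same arithmetic as conv2d_naive, with the loops pushed into NumPy.\n",
    "from numpy.lib.stride_tricks import sliding_window_view\n",
    "\n",
    "def conv2d_vectorized(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:\n",
    "    # windows has shape (out_H, out_W, kH, kW): every kernel-sized patch of x\n",
    "    windows = sliding_window_view(x, kernel.shape)\n",
    "    # multiply each patch by the kernel and sum: one output value per patch\n",
    "    return np.einsum('ijkl,kl->ij', windows, kernel)\n",
    "\n",
    "x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)\n",
    "k = np.array([[1, 0], [0, -1]], dtype=np.float32)\n",
    "assert np.allclose(conv2d_vectorized(x, k), conv2d_naive(x, k))\n",
    "print(\"Vectorized and naive agree:\\n\", conv2d_vectorized(x, k))"
   ]
  },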
  {
   "cell_type": "markdown",
   "id": "454a6bad",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "### \ud83e\uddea Test Your Conv2D Implementation\n",
    "\n",
    "Try your function on this simple example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7705032a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test case for conv2d_naive\n",
    "input = np.array([\n",
    "    [1, 2, 3],\n",
    "    [4, 5, 6],\n",
    "    [7, 8, 9]\n",
    "], dtype=np.float32)\n",
    "kernel = np.array([\n",
    "    [1, 0],\n",
    "    [0, -1]\n",
    "], dtype=np.float32)\n",
    "\n",
    "expected = np.array([\n",
    "    [1*1+2*0+4*0+5*(-1), 2*1+3*0+5*0+6*(-1)],\n",
    "    [4*1+5*0+7*0+8*(-1), 5*1+6*0+8*0+9*(-1)]\n",
    "], dtype=np.float32)\n",
    "\n",
    "try:\n",
    "    output = conv2d_naive(input, kernel)\n",
    "    print(\"\u2705 Input:\\n\", input)\n",
    "    print(\"\u2705 Kernel:\\n\", kernel)\n",
    "    print(\"\u2705 Your output:\\n\", output)\n",
    "    print(\"\u2705 Expected:\\n\", expected)\n",
    "    assert np.allclose(output, expected), \"\u274c Output does not match expected!\"\n",
    "    print(\"\ud83c\udf89 conv2d_naive works!\")\n",
    "except Exception as e:\n",
    "    print(f\"\u274c Error: {e}\")\n",
    "    print(\"Make sure to implement conv2d_naive above!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53449e22",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## Step 2: Understanding What Convolution Does\n",
    "\n",
    "Let's visualize how different kernels detect different patterns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "05a1ce2c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize different convolution kernels\n",
    "print(\"Visualizing different convolution kernels...\")\n",
    "\n",
    "try:\n",
    "    # Test different kernels\n",
    "    test_input = np.array([\n",
    "        [1, 1, 1, 0, 0],\n",
    "        [1, 1, 1, 0, 0],\n",
    "        [1, 1, 1, 0, 0],\n",
    "        [0, 0, 0, 0, 0],\n",
    "        [0, 0, 0, 0, 0]\n",
    "    ], dtype=np.float32)\n",
    "\n",
    "    # Edge detection kernel (horizontal)\n",
    "    edge_kernel = np.array([\n",
    "        [1, 1, 1],\n",
    "        [0, 0, 0],\n",
    "        [-1, -1, -1]\n",
    "    ], dtype=np.float32)\n",
    "\n",
    "    # Sharpening kernel\n",
    "    sharpen_kernel = np.array([\n",
    "        [0, -1, 0],\n",
    "        [-1, 5, -1],\n",
    "        [0, -1, 0]\n",
    "    ], dtype=np.float32)\n",
    "\n",
    "    # Test edge detection\n",
    "    edge_output = conv2d_naive(test_input, edge_kernel)\n",
    "    print(\"\u2705 Edge detection kernel:\")\n",
    "    print(\"   Detects horizontal edges (boundaries between light and dark)\")\n",
    "    print(\"   Output:\\n\", edge_output)\n",
    "\n",
    "    # Test sharpening\n",
    "    sharpen_output = conv2d_naive(test_input, sharpen_kernel)\n",
    "    print(\"\u2705 Sharpening kernel:\")\n",
    "    print(\"   Enhances edges and details\")\n",
    "    print(\"   Output:\\n\", sharpen_output)\n",
    "\n",
    "    print(\"\\n\ud83d\udca1 Different kernels detect different patterns!\")\n",
    "    print(\"   Neural networks learn these kernels automatically!\")\n",
    "\n",
    "except Exception as e:\n",
    "    print(f\"\u274c Error: {e}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b33791b",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "## Step 3: Conv2D Layer Class\n",
    "\n",
    "Now let's wrap your convolution function in a layer class for use in networks. This makes it consistent with other layers like Dense.\n",
    "\n",
    "### Why Layer Classes Matter\n",
    "- **Consistent API**: Same interface as Dense layers\n",
    "- **Learnable parameters**: Kernels can be learned from data\n",
    "- **Composability**: Can be combined with other layers\n",
    "- **Integration**: Works seamlessly with the rest of TinyTorch\n",
    "\n",
    "### The Pattern\n",
    "```\n",
    "Input Tensor \u2192 Conv2D \u2192 Output Tensor\n",
    "```\n",
    "\n",
    "Just like Dense layers, but with spatial operations instead of linear transformations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "118ba687",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "class Conv2D:\n",
    "    \"\"\"\n",
    "    2D Convolutional Layer (single channel, single filter, no stride/pad).\n",
    "\n",
    "    Args:\n",
    "        kernel_size: (kH, kW) - size of the convolution kernel\n",
    "\n",
    "    TODO: Initialize a random kernel and implement the forward pass using conv2d_naive.\n",
    "\n",
    "    APPROACH:\n",
    "    1. Store kernel_size as instance variable\n",
    "    2. Initialize random kernel with small values\n",
    "    3. Implement forward pass using conv2d_naive function\n",
    "    4. Return Tensor wrapped around the result\n",
    "\n",
    "    EXAMPLE:\n",
    "    layer = Conv2D(kernel_size=(2, 2))\n",
    "    x = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # shape (3, 3)\n",
    "    y = layer(x)  # shape (2, 2)\n",
    "\n",
    "    HINTS:\n",
    "    - Store kernel_size as (kH, kW)\n",
    "    - Initialize kernel with np.random.randn(kH, kW) * 0.1 (small values)\n",
    "    - Use conv2d_naive(x.data, self.kernel) in forward pass\n",
    "    - Return Tensor(result) to wrap the result\n",
    "    \"\"\"\n",
    "    def __init__(self, kernel_size: Tuple[int, int]):\n",
    "        \"\"\"\n",
    "        Initialize Conv2D layer with random kernel.\n",
    "\n",
    "        Args:\n",
    "            kernel_size: (kH, kW) - size of the convolution kernel\n",
    "\n",
    "        TODO:\n",
    "        1. Store kernel_size as instance variable\n",
    "        2. Initialize random kernel with small values\n",
    "        3. Scale kernel values to prevent large outputs\n",
    "\n",
    "        STEP-BY-STEP:\n",
    "        1. Store kernel_size as self.kernel_size\n",
    "        2. Unpack kernel_size into kH, kW\n",
    "        3. Initialize kernel: np.random.randn(kH, kW) * 0.1\n",
    "        4. Convert to float32 for consistency\n",
    "\n",
    "        EXAMPLE:\n",
    "        Conv2D((2, 2)) creates:\n",
    "        - kernel: shape (2, 2) with small random values\n",
    "        \"\"\"\n",
    "        raise NotImplementedError(\"Student implementation required\")\n",
    "\n",
    "    def forward(self, x: Tensor) -> Tensor:\n",
    "        \"\"\"\n",
    "        Forward pass: apply convolution to input.\n",
    "\n",
    "        Args:\n",
    "            x: Input tensor of shape (H, W)\n",
    "\n",
    "        Returns:\n",
    "            Output tensor of shape (H-kH+1, W-kW+1)\n",
    "\n",
    "        TODO: Implement convolution using conv2d_naive function.\n",
    "\n",
    "        STEP-BY-STEP:\n",
    "        1. Use conv2d_naive(x.data, self.kernel)\n",
    "        2. Return Tensor(result)\n",
    "\n",
    "        EXAMPLE:\n",
    "        Input x: Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # shape (3, 3)\n",
    "        Kernel: shape (2, 2)\n",
    "        Output: Tensor([[val1, val2], [val3, val4]])  # shape (2, 2)\n",
    "\n",
    "        HINTS:\n",
    "        - x.data gives you the numpy array\n",
    "        - self.kernel is your learned kernel\n",
    "        - Use conv2d_naive(x.data, self.kernel)\n",
    "        - Return Tensor(result) to wrap the result\n",
    "        \"\"\"\n",
    "        raise NotImplementedError(\"Student implementation required\")\n",
    "\n",
    "    def __call__(self, x: Tensor) -> Tensor:\n",
    "        \"\"\"Make layer callable: layer(x) same as layer.forward(x)\"\"\"\n",
    "        return self.forward(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e18c382",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "#| hide\n",
    "#| export\n",
    "class Conv2D:\n",
    "    def __init__(self, kernel_size: Tuple[int, int]):\n",
    "        self.kernel_size = kernel_size\n",
    "        kH, kW = kernel_size\n",
    "        # Initialize with small random values\n",
    "        self.kernel = np.random.randn(kH, kW).astype(np.float32) * 0.1\n",
    "\n",
    "    def forward(self, x: Tensor) -> Tensor:\n",
    "        return Tensor(conv2d_naive(x.data, self.kernel))\n",
    "\n",
    "    def __call__(self, x: Tensor) -> Tensor:\n",
    "        return self.forward(x)"
   ]
  },
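  {
   "cell_type": "markdown",
   "id": "a0c0e005",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "### Optional: Hand-Setting the Kernel\n",
    "\n",
    "The kernel is ordinary data, so you can assign it directly instead of keeping the random initialization. A minimal sketch connecting the layer back to the worked example from Step 1; in a later module, training will update these values from data instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0c0e006",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The kernel is plain data: hand-set it to the Step 1 example filter.\n",
    "conv = Conv2D(kernel_size=(2, 2))\n",
    "conv.kernel = np.array([[1, 0], [0, -1]], dtype=np.float32)\n",
    "x = Tensor(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32))\n",
    "print(conv(x))  # expected values: [[-4, -4], [-4, -4]], as in Step 1"
   ]
  },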
  {
   "cell_type": "markdown",
   "id": "e288fb18",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "### \ud83e\uddea Test Your Conv2D Layer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f1a4a6a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test Conv2D layer\n",
    "print(\"Testing Conv2D layer...\")\n",
    "\n",
    "try:\n",
    "    # Test basic Conv2D layer\n",
    "    conv = Conv2D(kernel_size=(2, 2))\n",
    "    x = Tensor(np.array([\n",
    "        [1, 2, 3],\n",
    "        [4, 5, 6],\n",
    "        [7, 8, 9]\n",
    "    ], dtype=np.float32))\n",
    "\n",
    "    print(f\"\u2705 Input shape: {x.shape}\")\n",
    "    print(f\"\u2705 Kernel shape: {conv.kernel.shape}\")\n",
    "    print(f\"\u2705 Kernel values:\\n{conv.kernel}\")\n",
    "\n",
    "    y = conv(x)\n",
    "    print(f\"\u2705 Output shape: {y.shape}\")\n",
    "    print(f\"\u2705 Output: {y}\")\n",
    "\n",
    "    # Test with different kernel size\n",
    "    conv2 = Conv2D(kernel_size=(3, 3))\n",
    "    y2 = conv2(x)\n",
    "    print(f\"\u2705 3x3 kernel output shape: {y2.shape}\")\n",
    "\n",
    "    print(\"\\n\ud83c\udf89 Conv2D layer works!\")\n",
    "\n",
    "except Exception as e:\n",
    "    print(f\"\u274c Error: {e}\")\n",
    "    print(\"Make sure to implement the Conv2D layer above!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97939763",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "## Step 4: Building a Simple ConvNet\n",
    "\n",
    "Now let's compose Conv2D layers with other layers to build a complete convolutional neural network!\n",
    "\n",
    "### Why ConvNets Matter\n",
    "- **Spatial hierarchy**: Each layer learns increasingly complex features\n",
    "- **Parameter sharing**: Same kernel applied everywhere (efficiency)\n",
    "- **Translation invariance**: Can recognize objects regardless of position\n",
    "- **Real-world success**: Power most modern computer vision systems\n",
    "\n",
    "### The Architecture\n",
    "```\n",
    "Input Image \u2192 Conv2D \u2192 ReLU \u2192 Flatten \u2192 Dense \u2192 Output\n",
    "```\n",
    "\n",
    "This simple architecture can learn to recognize patterns in images!"
   ]
  },
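  {
   "cell_type": "markdown",
   "id": "a0c0e007",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "### Checking the Shape Arithmetic\n",
    "\n",
    "Connecting Conv2D to Dense means knowing how many values the flattened feature map holds. Here is a quick check of the numbers used in the ConvNet cell below; this is just the output-size formula from Step 1:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0c0e008",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Why the ConvNet below uses Dense(input_size=4):\n",
    "# a 3x3 input through a 2x2 kernel (no stride/padding) gives a 2x2 feature map.\n",
    "H, W = 3, 3    # input image size\n",
    "kH, kW = 2, 2  # kernel size\n",
    "out_H, out_W = H - kH + 1, W - kW + 1\n",
    "print(f\"Feature map: {out_H}x{out_W} -> {out_H * out_W} flattened features\")"
   ]
  },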
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51631fe6",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "def flatten(x: Tensor) -> Tensor:\n",
    "    \"\"\"\n",
    "    Flatten a 2D tensor into a single row of shape (1, H*W) (for connecting to Dense).\n",
    "\n",
    "    TODO: Implement flattening operation.\n",
    "\n",
    "    APPROACH:\n",
    "    1. Get the numpy array from the tensor\n",
    "    2. Use .flatten() to convert to 1D\n",
    "    3. Add batch dimension with [None, :]\n",
    "    4. Return Tensor wrapped around the result\n",
    "\n",
    "    EXAMPLE:\n",
    "    Input: Tensor([[1, 2], [3, 4]])  # shape (2, 2)\n",
    "    Output: Tensor([[1, 2, 3, 4]])  # shape (1, 4)\n",
    "\n",
    "    HINTS:\n",
    "    - Use x.data.flatten() to get 1D array\n",
    "    - Add batch dimension: result[None, :]\n",
    "    - Return Tensor(result)\n",
    "    \"\"\"\n",
    "    raise NotImplementedError(\"Student implementation required\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e8f2b50",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "#| hide\n",
    "#| export\n",
    "def flatten(x: Tensor) -> Tensor:\n",
    "    \"\"\"Flatten a 2D tensor into a single row of shape (1, H*W) (for connecting to Dense).\"\"\"\n",
    "    return Tensor(x.data.flatten()[None, :])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7bdb9f80",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "### \ud83e\uddea Test Your Flatten Function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c6d92ebc",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test flatten function\n",
    "print(\"Testing flatten function...\")\n",
    "\n",
    "try:\n",
    "    # Test flattening\n",
    "    x = Tensor([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)\n",
    "    flattened = flatten(x)\n",
    "\n",
    "    print(f\"\u2705 Input shape: {x.shape}\")\n",
    "    print(f\"\u2705 Flattened shape: {flattened.shape}\")\n",
    "    print(f\"\u2705 Flattened values: {flattened}\")\n",
    "\n",
    "    # Verify the flattening worked correctly\n",
    "    expected = np.array([[1, 2, 3, 4, 5, 6]])\n",
    "    assert np.allclose(flattened.data, expected), \"\u274c Flattening incorrect!\"\n",
    "    print(\"\u2705 Flattening works correctly!\")\n",
    "\n",
    "except Exception as e:\n",
    "    print(f\"\u274c Error: {e}\")\n",
    "    print(\"Make sure to implement the flatten function above!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9804128d",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## Step 5: Composing a Complete ConvNet\n",
    "\n",
    "Now let's build a simple convolutional neural network that can process images!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d60d05b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compose a simple ConvNet\n",
    "print(\"Building a simple ConvNet...\")\n",
    "\n",
    "try:\n",
    "    # Create network components\n",
    "    conv = Conv2D((2, 2))\n",
    "    relu = ReLU()\n",
    "    dense = Dense(input_size=4, output_size=1)  # 4 features from 2x2 output\n",
    "\n",
    "    # Test input (small 3x3 \"image\")\n",
    "    x = Tensor(np.random.randn(3, 3).astype(np.float32))\n",
    "    print(f\"\u2705 Input shape: {x.shape}\")\n",
    "    print(f\"\u2705 Input: {x}\")\n",
    "\n",
    "    # Forward pass through the network\n",
    "    conv_out = conv(x)\n",
    "    print(f\"\u2705 After Conv2D: {conv_out}\")\n",
    "\n",
    "    relu_out = relu(conv_out)\n",
    "    print(f\"\u2705 After ReLU: {relu_out}\")\n",
    "\n",
    "    flattened = flatten(relu_out)\n",
    "    print(f\"\u2705 After flatten: {flattened}\")\n",
    "\n",
    "    final_out = dense(flattened)\n",
    "    print(f\"\u2705 Final output: {final_out}\")\n",
    "\n",
    "    print(\"\\n\ud83c\udf89 Simple ConvNet works!\")\n",
    "    print(\"This network can learn to recognize patterns in images!\")\n",
    "\n",
    "except Exception as e:\n",
    "    print(f\"\u274c Error: {e}\")\n",
    "    print(\"Check your Conv2D, flatten, and Dense implementations!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9fe4faf0",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## Step 6: Understanding the Power of Convolution\n",
    "\n",
    "Let's see how convolution captures different types of patterns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "434133c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Demonstrate pattern detection\n",
    "print(\"Demonstrating pattern detection...\")\n",
    "\n",
    "try:\n",
    "    # Create a simple \"image\" with a pattern\n",
    "    image = np.array([\n",
    "        [0, 0, 0, 0, 0],\n",
    "        [0, 1, 1, 1, 0],\n",
    "        [0, 1, 1, 1, 0],\n",
    "        [0, 1, 1, 1, 0],\n",
    "        [0, 0, 0, 0, 0]\n",
    "    ], dtype=np.float32)\n",
    "\n",
    "    # Different kernels detect different patterns\n",
    "    edge_kernel = np.array([\n",
    "        [1, 1, 1],\n",
    "        [1, -8, 1],\n",
    "        [1, 1, 1]\n",
    "    ], dtype=np.float32)\n",
    "\n",
    "    blur_kernel = np.array([\n",
    "        [1/9, 1/9, 1/9],\n",
    "        [1/9, 1/9, 1/9],\n",
    "        [1/9, 1/9, 1/9]\n",
    "    ], dtype=np.float32)\n",
    "\n",
    "    # Test edge detection\n",
    "    edge_result = conv2d_naive(image, edge_kernel)\n",
    "    print(\"\u2705 Edge detection:\")\n",
    "    print(\"   Detects boundaries around the white square\")\n",
    "    print(\"   Result:\\n\", edge_result)\n",
    "\n",
    "    # Test blurring\n",
    "    blur_result = conv2d_naive(image, blur_kernel)\n",
    "    print(\"\u2705 Blurring:\")\n",
    "    print(\"   Smooths the image\")\n",
    "    print(\"   Result:\\n\", blur_result)\n",
    "\n",
    "    print(\"\\n\ud83d\udca1 Different kernels = different feature detectors!\")\n",
    "    print(\"   Neural networks learn these automatically from data!\")\n",
    "\n",
    "except Exception as e:\n",
    "    print(f\"\u274c Error: {e}\")"
   ]
  },
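  {
   "cell_type": "markdown",
   "id": "a0c0e009",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "### The Arithmetic of Parameter Efficiency\n",
    "\n",
    "The summary below repeats the claim that weight sharing makes CNNs parameter-efficient. Here is the arithmetic behind that claim, using a hypothetical 28\u00d728 input purely for scale:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0c0e010",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parameter efficiency from weight sharing (28x28 is a hypothetical input size).\n",
    "H, W = 28, 28\n",
    "kH, kW = 3, 3\n",
    "out_H, out_W = H - kH + 1, W - kW + 1\n",
    "\n",
    "conv_params = kH * kW                     # one shared 3x3 kernel: 9 weights\n",
    "dense_params = (H * W) * (out_H * out_W)  # a Dense layer producing the same number of outputs\n",
    "print(f\"Conv2D kernel: {conv_params} parameters\")\n",
    "print(f\"Equivalent Dense layer: {dense_params:,} parameters\")"
   ]
  },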
  {
   "cell_type": "markdown",
   "id": "80938b52",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## \ud83c\udfaf Module Summary\n",
    "\n",
    "Congratulations! You've built the foundation of convolutional neural networks:\n",
    "\n",
    "### What You've Accomplished\n",
    "\u2705 **Convolution Operation**: Understanding the sliding window mechanism  \n",
    "\u2705 **Conv2D Layer**: Learnable convolutional layer implementation  \n",
    "\u2705 **Pattern Detection**: Visualizing how kernels detect different features  \n",
    "\u2705 **ConvNet Architecture**: Composing Conv2D with other layers  \n",
    "\u2705 **Real-world Applications**: Understanding computer vision applications  \n",
    "\n",
    "### Key Concepts You've Learned\n",
    "- **Convolution** is pattern matching with sliding windows\n",
    "- **Local connectivity** means each output depends on a small input region\n",
    "- **Weight sharing** makes CNNs parameter-efficient\n",
    "- **Spatial hierarchy** builds complex features from simple patterns\n",
    "- **Translation invariance** allows recognition regardless of position\n",
    "\n",
    "### What's Next\n",
    "In the next modules, you'll build on this foundation:\n",
    "- **Advanced CNN features**: Stride, padding, pooling\n",
    "- **Multi-channel convolution**: RGB images, multiple filters\n",
    "- **Training**: Learning kernels from data\n",
    "- **Real applications**: Image classification, object detection\n",
    "\n",
    "### Real-World Connection\n",
    "Your Conv2D layer is now ready to:\n",
    "- Learn edge detectors, texture recognizers, and shape detectors\n",
    "- Process real images for computer vision tasks\n",
    "- Integrate with the rest of the TinyTorch ecosystem\n",
    "- Scale to complex architectures like ResNet, VGG, etc.\n",
    "\n",
    "**Ready for the next challenge?** Let's move on to training these networks!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "03f153f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Final verification\n",
    "print(\"\\n\" + \"=\"*50)\n",
    "print(\"\ud83c\udf89 CNN MODULE COMPLETE!\")\n",
    "print(\"=\"*50)\n",
    "print(\"\u2705 Convolution operation understanding\")\n",
    "print(\"\u2705 Conv2D layer implementation\")\n",
    "print(\"\u2705 Pattern detection visualization\")\n",
    "print(\"\u2705 ConvNet architecture composition\")\n",
    "print(\"\u2705 Real-world computer vision context\")\n",
    "print(\"\\n\ud83d\ude80 Ready to train networks in the next module!\")"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "main_language": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}