mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-04-30 10:13:57 -05:00
✅ COMPLETED:
- Instructor solution executes perfectly
- NBDev export works (fixed import directives)
- Package functionality verified
- Student assignment generation works
- CLI integration complete
- Systematic testing framework established

⚠️ CRITICAL DISCOVERY:
- NBGrader requires cell metadata architecture changes
- Current generator creates content correctly but wrong cell types
- Would require major rework of assignment generation pipeline

📊 STATUS:
- Core TinyTorch functionality: ✅ READY FOR STUDENTS
- NBGrader integration: requires Phase 2 rework
- Ready to continue systematic testing of modules 01-06

🔧 FIXES APPLIED:
- Added #| export directive to imports in enhanced modules
- Fixed generator logic for student scaffolding
- Updated testing framework and documentation
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ca53839c",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "# Module X: CNN - Convolutional Neural Networks\n",
    "\n",
    "Welcome to the CNN module! Here you'll implement the core building block of modern computer vision: the convolutional layer.\n",
    "\n",
    "## Learning Goals\n",
    "- Understand the convolution operation (sliding window, local connectivity, weight sharing)\n",
    "- Implement Conv2D with explicit for-loops\n",
    "- Visualize how convolution builds feature maps\n",
    "- Compose Conv2D with other layers to build a simple ConvNet\n",
    "- (Stretch) Explore stride, padding, pooling, and multi-channel input\n",
    "\n",
    "## Build \u2192 Use \u2192 Understand\n",
    "1. **Build**: Conv2D layer using sliding window convolution\n",
    "2. **Use**: Transform images and see feature maps\n",
    "3. **Understand**: How CNNs learn spatial patterns"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e0d8f02",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## \ud83d\udce6 Where This Code Lives in the Final Package\n",
    "\n",
    "**Learning Side:** You work in `modules/cnn/cnn_dev.py`  \n",
    "**Building Side:** Code exports to `tinytorch.core.layers`\n",
    "\n",
    "```python\n",
    "# Final package structure:\n",
    "from tinytorch.core.layers import Dense, Conv2D  # Both layers together!\n",
    "from tinytorch.core.activations import ReLU\n",
    "from tinytorch.core.tensor import Tensor\n",
    "```\n",
    "\n",
    "**Why this matters:**\n",
    "- **Learning:** Focused modules for deep understanding\n",
    "- **Production:** Proper organization like PyTorch's `torch.nn`\n",
    "- **Consistency:** All layers (Dense, Conv2D) live together in `core.layers`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fbd717db",
   "metadata": {},
   "outputs": [],
   "source": [
    "#| default_exp core.cnn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f22e530",
   "metadata": {},
   "outputs": [],
   "source": [
    "#| export\n",
    "import numpy as np\n",
    "from typing import List, Tuple, Optional\n",
    "from tinytorch.core.tensor import Tensor\n",
    "\n",
    "# Setup and imports (for development)\n",
    "import matplotlib.pyplot as plt\n",
    "from tinytorch.core.layers import Dense\n",
    "from tinytorch.core.activations import ReLU"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f99723c8",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "## Step 1: What is Convolution?\n",
    "\n",
    "### Definition\n",
    "A **convolutional layer** applies a small filter (kernel) across the input, producing a feature map. This operation captures local patterns and is the foundation of modern vision models.\n",
    "\n",
    "### Why Convolution Matters in Computer Vision\n",
    "- **Local connectivity**: Each output value depends only on a small region of the input\n",
    "- **Weight sharing**: The same filter is applied everywhere (translation invariance)\n",
    "- **Spatial hierarchy**: Multiple layers build increasingly complex features\n",
    "- **Parameter efficiency**: Far fewer parameters than fully connected layers\n",
    "\n",
    "### The Fundamental Insight\n",
    "**Convolution is pattern matching!** The kernel learns to detect specific patterns:\n",
    "- **Edge detectors**: Find boundaries between objects\n",
    "- **Texture detectors**: Recognize surface patterns\n",
    "- **Shape detectors**: Identify geometric forms\n",
    "- **Feature detectors**: Combine simple patterns into complex features\n",
    "\n",
    "### Real-World Examples\n",
    "- **Image processing**: Detect edges, blur, sharpen\n",
    "- **Computer vision**: Recognize objects, faces, text\n",
    "- **Medical imaging**: Detect tumors, analyze scans\n",
    "- **Autonomous driving**: Identify traffic signs, pedestrians\n",
    "\n",
    "### Visual Intuition\n",
    "```\n",
    "Input Image:    Kernel:     Output Feature Map:\n",
    "[1, 2, 3]       [ 1,  0]    [1*1+2*0+4*0+5*(-1), 2*1+3*0+5*0+6*(-1)]\n",
    "[4, 5, 6]       [ 0, -1]    [4*1+5*0+7*0+8*(-1), 5*1+6*0+8*0+9*(-1)]\n",
    "[7, 8, 9]\n",
    "```\n",
    "\n",
    "The kernel slides across the input, computing dot products at each position.\n",
    "\n",
    "### The Math Behind It\n",
    "For input I (H\u00d7W) and kernel K (kH\u00d7kW), the output O (out_H\u00d7out_W) is:\n",
    "```\n",
    "O[i,j] = sum(I[i+di, j+dj] * K[di, dj] for di in range(kH) for dj in range(kW))\n",
    "```\n",
    "\n",
    "where out_H = H - kH + 1 and out_W = W - kW + 1. (Strictly speaking this is cross-correlation, since the kernel is not flipped; deep learning frameworks call it convolution, and so will we.)\n",
    "\n",
    "Let's implement this step by step!"
   ]
  },
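  {
   "cell_type": "markdown",
   "id": "a0c0e001",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "### Optional: A Reference Oracle\n",
    "\n",
    "Before implementing the formula yourself, it can help to have an independent reference to check against. The sketch below uses SciPy's `correlate2d` (the formula above is cross-correlation, so `correlate2d` with `mode='valid'` computes exactly the same thing). This assumes SciPy happens to be available in your environment; it is not a TinyTorch dependency, so feel free to skip this cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0c0e002",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Optional sanity check: SciPy's correlate2d computes the same formula.\n",
    "# This cell assumes SciPy is installed; skip it if not.\n",
    "try:\n",
    "    from scipy.signal import correlate2d\n",
    "    I = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)\n",
    "    K = np.array([[1, 0], [0, -1]], dtype=np.float32)\n",
    "    # mode='valid' keeps only positions where the kernel fits entirely inside I\n",
    "    print(correlate2d(I, K, mode='valid'))  # expected: [[-4. -4.] [-4. -4.]]\n",
    "except ImportError:\n",
    "    print(\"SciPy not installed - skipping the reference check.\")"
   ]
  },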
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa4af055",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:\n",
    "    \"\"\"\n",
    "    Naive 2D convolution (single channel, no stride, no padding).\n",
    "\n",
    "    Args:\n",
    "        input: 2D input array (H, W)\n",
    "        kernel: 2D filter (kH, kW)\n",
    "    Returns:\n",
    "        2D output array (H-kH+1, W-kW+1)\n",
    "\n",
    "    TODO: Implement the sliding window convolution using for-loops.\n",
    "\n",
    "    APPROACH:\n",
    "    1. Get input dimensions: H, W = input.shape\n",
    "    2. Get kernel dimensions: kH, kW = kernel.shape\n",
    "    3. Calculate output dimensions: out_H = H - kH + 1, out_W = W - kW + 1\n",
    "    4. Create output array: np.zeros((out_H, out_W))\n",
    "    5. Use nested loops to slide the kernel:\n",
    "       - i loop: output rows (0 to out_H-1)\n",
    "       - j loop: output columns (0 to out_W-1)\n",
    "       - di loop: kernel rows (0 to kH-1)\n",
    "       - dj loop: kernel columns (0 to kW-1)\n",
    "    6. For each (i,j), compute: output[i,j] += input[i+di, j+dj] * kernel[di, dj]\n",
    "\n",
    "    EXAMPLE:\n",
    "    Input: [[1, 2, 3],     Kernel: [[1, 0],\n",
    "            [4, 5, 6],              [0, -1]]\n",
    "            [7, 8, 9]]\n",
    "\n",
    "    Output[0,0] = 1*1 + 2*0 + 4*0 + 5*(-1) = 1 - 5 = -4\n",
    "    Output[0,1] = 2*1 + 3*0 + 5*0 + 6*(-1) = 2 - 6 = -4\n",
    "    Output[1,0] = 4*1 + 5*0 + 7*0 + 8*(-1) = 4 - 8 = -4\n",
    "    Output[1,1] = 5*1 + 6*0 + 8*0 + 9*(-1) = 5 - 9 = -4\n",
    "\n",
    "    HINTS:\n",
    "    - Start with output = np.zeros((out_H, out_W))\n",
    "    - Use four nested loops: for i in range(out_H): for j in range(out_W): for di in range(kH): for dj in range(kW):\n",
    "    - Accumulate the sum: output[i,j] += input[i+di, j+dj] * kernel[di, dj]\n",
    "    \"\"\"\n",
    "    raise NotImplementedError(\"Student implementation required\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d83b2c10",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "#| hide\n",
    "#| export\n",
    "def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:\n",
    "    \"\"\"Reference solution: slide the kernel and accumulate dot products.\"\"\"\n",
    "    H, W = input.shape\n",
    "    kH, kW = kernel.shape\n",
    "    out_H, out_W = H - kH + 1, W - kW + 1\n",
    "    output = np.zeros((out_H, out_W), dtype=input.dtype)\n",
    "    for i in range(out_H):\n",
    "        for j in range(out_W):\n",
    "            for di in range(kH):\n",
    "                for dj in range(kW):\n",
    "                    output[i, j] += input[i + di, j + dj] * kernel[di, dj]\n",
    "    return output"
   ]
  },
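  {
   "cell_type": "markdown",
   "id": "a0c0e003",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "### Optional: The Same Computation, Vectorized\n",
    "\n",
    "The four nested loops above are the clearest way to see the operation, but NumPy can express the same arithmetic without Python loops. This is a minimal sketch, assuming NumPy >= 1.20 (for `sliding_window_view`); it is a cross-check for intuition, not the implementation this module exports."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0c0e004",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Vectorized cross-check (assumes NumPy >= 1.20 for sliding_window_view).\n",
    "# Same arithmetic as conv2d_naive, with the loops pushed into NumPy.\n",
    "from numpy.lib.stride_tricks import sliding_window_view\n",
    "\n",
    "def conv2d_vectorized(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:\n",
    "    # windows has shape (out_H, out_W, kH, kW): every kernel-sized patch of x\n",
    "    windows = sliding_window_view(x, kernel.shape)\n",
    "    # multiply each patch by the kernel and sum: one output value per patch\n",
    "    return np.einsum('ijkl,kl->ij', windows, kernel)\n",
    "\n",
    "x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)\n",
    "k = np.array([[1, 0], [0, -1]], dtype=np.float32)\n",
    "assert np.allclose(conv2d_vectorized(x, k), conv2d_naive(x, k))\n",
    "print(\"Vectorized and naive agree:\\n\", conv2d_vectorized(x, k))"
   ]
  },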
  {
   "cell_type": "markdown",
   "id": "454a6bad",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "### \ud83e\uddea Test Your Conv2D Implementation\n",
    "\n",
    "Try your function on this simple example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7705032a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test case for conv2d_naive\n",
    "input = np.array([\n",
    "    [1, 2, 3],\n",
    "    [4, 5, 6],\n",
    "    [7, 8, 9]\n",
    "], dtype=np.float32)\n",
    "kernel = np.array([\n",
    "    [1, 0],\n",
    "    [0, -1]\n",
    "], dtype=np.float32)\n",
    "\n",
    "expected = np.array([\n",
    "    [1*1+2*0+4*0+5*(-1), 2*1+3*0+5*0+6*(-1)],\n",
    "    [4*1+5*0+7*0+8*(-1), 5*1+6*0+8*0+9*(-1)]\n",
    "], dtype=np.float32)\n",
    "\n",
    "try:\n",
    "    output = conv2d_naive(input, kernel)\n",
    "    print(\"\u2705 Input:\\n\", input)\n",
    "    print(\"\u2705 Kernel:\\n\", kernel)\n",
    "    print(\"\u2705 Your output:\\n\", output)\n",
    "    print(\"\u2705 Expected:\\n\", expected)\n",
    "    assert np.allclose(output, expected), \"\u274c Output does not match expected!\"\n",
    "    print(\"\ud83c\udf89 conv2d_naive works!\")\n",
    "except Exception as e:\n",
    "    print(f\"\u274c Error: {e}\")\n",
    "    print(\"Make sure to implement conv2d_naive above!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53449e22",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## Step 2: Understanding What Convolution Does\n",
    "\n",
    "Let's visualize how different kernels detect different patterns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "05a1ce2c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize different convolution kernels\n",
    "print(\"Visualizing different convolution kernels...\")\n",
    "\n",
    "try:\n",
    "    # Test different kernels\n",
    "    test_input = np.array([\n",
    "        [1, 1, 1, 0, 0],\n",
    "        [1, 1, 1, 0, 0],\n",
    "        [1, 1, 1, 0, 0],\n",
    "        [0, 0, 0, 0, 0],\n",
    "        [0, 0, 0, 0, 0]\n",
    "    ], dtype=np.float32)\n",
    "\n",
    "    # Edge detection kernel (horizontal)\n",
    "    edge_kernel = np.array([\n",
    "        [1, 1, 1],\n",
    "        [0, 0, 0],\n",
    "        [-1, -1, -1]\n",
    "    ], dtype=np.float32)\n",
    "\n",
    "    # Sharpening kernel\n",
    "    sharpen_kernel = np.array([\n",
    "        [0, -1, 0],\n",
    "        [-1, 5, -1],\n",
    "        [0, -1, 0]\n",
    "    ], dtype=np.float32)\n",
    "\n",
    "    # Test edge detection\n",
    "    edge_output = conv2d_naive(test_input, edge_kernel)\n",
    "    print(\"\u2705 Edge detection kernel:\")\n",
    "    print(\"   Detects horizontal edges (boundaries between light and dark)\")\n",
    "    print(\"   Output:\\n\", edge_output)\n",
    "\n",
    "    # Test sharpening\n",
    "    sharpen_output = conv2d_naive(test_input, sharpen_kernel)\n",
    "    print(\"\u2705 Sharpening kernel:\")\n",
    "    print(\"   Enhances edges and details\")\n",
    "    print(\"   Output:\\n\", sharpen_output)\n",
    "\n",
    "    print(\"\\n\ud83d\udca1 Different kernels detect different patterns!\")\n",
    "    print(\"   Neural networks learn these kernels automatically!\")\n",
    "\n",
    "except Exception as e:\n",
    "    print(f\"\u274c Error: {e}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0b33791b",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "## Step 3: Conv2D Layer Class\n",
    "\n",
    "Now let's wrap your convolution function in a layer class for use in networks. This makes it consistent with other layers like Dense.\n",
    "\n",
    "### Why Layer Classes Matter\n",
    "- **Consistent API**: Same interface as Dense layers\n",
    "- **Learnable parameters**: Kernels can be learned from data\n",
    "- **Composability**: Can be combined with other layers\n",
    "- **Integration**: Works seamlessly with the rest of TinyTorch\n",
    "\n",
    "### The Pattern\n",
    "```\n",
    "Input Tensor \u2192 Conv2D \u2192 Output Tensor\n",
    "```\n",
    "\n",
    "Just like Dense layers, but with spatial operations instead of linear transformations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "118ba687",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "class Conv2D:\n",
    "    \"\"\"\n",
    "    2D Convolutional Layer (single channel, single filter, no stride/pad).\n",
    "\n",
    "    Args:\n",
    "        kernel_size: (kH, kW) - size of the convolution kernel\n",
    "\n",
    "    TODO: Initialize a random kernel and implement the forward pass using conv2d_naive.\n",
    "\n",
    "    APPROACH:\n",
    "    1. Store kernel_size as instance variable\n",
    "    2. Initialize random kernel with small values\n",
    "    3. Implement forward pass using conv2d_naive function\n",
    "    4. Return Tensor wrapped around the result\n",
    "\n",
    "    EXAMPLE:\n",
    "    layer = Conv2D(kernel_size=(2, 2))\n",
    "    x = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # shape (3, 3)\n",
    "    y = layer(x)  # shape (2, 2)\n",
    "\n",
    "    HINTS:\n",
    "    - Store kernel_size as (kH, kW)\n",
    "    - Initialize kernel with np.random.randn(kH, kW) * 0.1 (small values)\n",
    "    - Use conv2d_naive(x.data, self.kernel) in forward pass\n",
    "    - Return Tensor(result) to wrap the result\n",
    "    \"\"\"\n",
    "    def __init__(self, kernel_size: Tuple[int, int]):\n",
    "        \"\"\"\n",
    "        Initialize Conv2D layer with random kernel.\n",
    "\n",
    "        Args:\n",
    "            kernel_size: (kH, kW) - size of the convolution kernel\n",
    "\n",
    "        TODO:\n",
    "        1. Store kernel_size as instance variable\n",
    "        2. Initialize random kernel with small values\n",
    "        3. Scale kernel values to prevent large outputs\n",
    "\n",
    "        STEP-BY-STEP:\n",
    "        1. Store kernel_size as self.kernel_size\n",
    "        2. Unpack kernel_size into kH, kW\n",
    "        3. Initialize kernel: np.random.randn(kH, kW) * 0.1\n",
    "        4. Convert to float32 for consistency\n",
    "\n",
    "        EXAMPLE:\n",
    "        Conv2D((2, 2)) creates:\n",
    "        - kernel: shape (2, 2) with small random values\n",
    "        \"\"\"\n",
    "        raise NotImplementedError(\"Student implementation required\")\n",
    "\n",
    "    def forward(self, x: Tensor) -> Tensor:\n",
    "        \"\"\"\n",
    "        Forward pass: apply convolution to input.\n",
    "\n",
    "        Args:\n",
    "            x: Input tensor of shape (H, W)\n",
    "\n",
    "        Returns:\n",
    "            Output tensor of shape (H-kH+1, W-kW+1)\n",
    "\n",
    "        TODO: Implement convolution using conv2d_naive function.\n",
    "\n",
    "        STEP-BY-STEP:\n",
    "        1. Use conv2d_naive(x.data, self.kernel)\n",
    "        2. Return Tensor(result)\n",
    "\n",
    "        EXAMPLE:\n",
    "        Input x: Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # shape (3, 3)\n",
    "        Kernel: shape (2, 2)\n",
    "        Output: Tensor([[val1, val2], [val3, val4]])  # shape (2, 2)\n",
    "\n",
    "        HINTS:\n",
    "        - x.data gives you the numpy array\n",
    "        - self.kernel is your learned kernel\n",
    "        - Use conv2d_naive(x.data, self.kernel)\n",
    "        - Return Tensor(result) to wrap the result\n",
    "        \"\"\"\n",
    "        raise NotImplementedError(\"Student implementation required\")\n",
    "\n",
    "    def __call__(self, x: Tensor) -> Tensor:\n",
    "        \"\"\"Make layer callable: layer(x) same as layer.forward(x)\"\"\"\n",
    "        return self.forward(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e18c382",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "#| hide\n",
    "#| export\n",
    "class Conv2D:\n",
    "    def __init__(self, kernel_size: Tuple[int, int]):\n",
    "        self.kernel_size = kernel_size\n",
    "        kH, kW = kernel_size\n",
    "        # Initialize with small random values\n",
    "        self.kernel = np.random.randn(kH, kW).astype(np.float32) * 0.1\n",
    "\n",
    "    def forward(self, x: Tensor) -> Tensor:\n",
    "        return Tensor(conv2d_naive(x.data, self.kernel))\n",
    "\n",
    "    def __call__(self, x: Tensor) -> Tensor:\n",
    "        return self.forward(x)"
   ]
  },
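  {
   "cell_type": "markdown",
   "id": "a0c0e005",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "### Optional: Hand-Setting the Kernel\n",
    "\n",
    "The kernel is ordinary data, so you can assign it directly instead of keeping the random initialization. A minimal sketch connecting the layer back to the worked example from Step 1; in a later module, training will update these values from data instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0c0e006",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The kernel is plain data: hand-set it to the Step 1 example filter.\n",
    "conv = Conv2D(kernel_size=(2, 2))\n",
    "conv.kernel = np.array([[1, 0], [0, -1]], dtype=np.float32)\n",
    "x = Tensor(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32))\n",
    "print(conv(x))  # expected values: [[-4, -4], [-4, -4]], as in Step 1"
   ]
  },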
  {
   "cell_type": "markdown",
   "id": "e288fb18",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "### \ud83e\uddea Test Your Conv2D Layer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f1a4a6a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test Conv2D layer\n",
    "print(\"Testing Conv2D layer...\")\n",
    "\n",
    "try:\n",
    "    # Test basic Conv2D layer\n",
    "    conv = Conv2D(kernel_size=(2, 2))\n",
    "    x = Tensor(np.array([\n",
    "        [1, 2, 3],\n",
    "        [4, 5, 6],\n",
    "        [7, 8, 9]\n",
    "    ], dtype=np.float32))\n",
    "\n",
    "    print(f\"\u2705 Input shape: {x.shape}\")\n",
    "    print(f\"\u2705 Kernel shape: {conv.kernel.shape}\")\n",
    "    print(f\"\u2705 Kernel values:\\n{conv.kernel}\")\n",
    "\n",
    "    y = conv(x)\n",
    "    print(f\"\u2705 Output shape: {y.shape}\")\n",
    "    print(f\"\u2705 Output: {y}\")\n",
    "\n",
    "    # Test with different kernel size\n",
    "    conv2 = Conv2D(kernel_size=(3, 3))\n",
    "    y2 = conv2(x)\n",
    "    print(f\"\u2705 3x3 kernel output shape: {y2.shape}\")\n",
    "\n",
    "    print(\"\\n\ud83c\udf89 Conv2D layer works!\")\n",
    "\n",
    "except Exception as e:\n",
    "    print(f\"\u274c Error: {e}\")\n",
    "    print(\"Make sure to implement the Conv2D layer above!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97939763",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "## Step 4: Building a Simple ConvNet\n",
    "\n",
    "Now let's compose Conv2D layers with other layers to build a complete convolutional neural network!\n",
    "\n",
    "### Why ConvNets Matter\n",
    "- **Spatial hierarchy**: Each layer learns increasingly complex features\n",
    "- **Parameter sharing**: Same kernel applied everywhere (efficiency)\n",
    "- **Translation invariance**: Can recognize objects regardless of position\n",
    "- **Real-world success**: Power most modern computer vision systems\n",
    "\n",
    "### The Architecture\n",
    "```\n",
    "Input Image \u2192 Conv2D \u2192 ReLU \u2192 Flatten \u2192 Dense \u2192 Output\n",
    "```\n",
    "\n",
    "This simple architecture can learn to recognize patterns in images!"
   ]
  },
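  {
   "cell_type": "markdown",
   "id": "a0c0e007",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "### Checking the Shape Arithmetic\n",
    "\n",
    "Connecting Conv2D to Dense means knowing how many values the flattened feature map holds. Here is a quick check of the numbers used in the ConvNet cell below; this is just the output-size formula from Step 1:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0c0e008",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Why the ConvNet below uses Dense(input_size=4):\n",
    "# a 3x3 input through a 2x2 kernel (no stride/padding) gives a 2x2 feature map.\n",
    "H, W = 3, 3    # input image size\n",
    "kH, kW = 2, 2  # kernel size\n",
    "out_H, out_W = H - kH + 1, W - kW + 1\n",
    "print(f\"Feature map: {out_H}x{out_W} -> {out_H * out_W} flattened features\")"
   ]
  },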
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51631fe6",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "def flatten(x: Tensor) -> Tensor:\n",
    "    \"\"\"\n",
    "    Flatten a 2D tensor into a single row of shape (1, H*W) (for connecting to Dense).\n",
    "\n",
    "    TODO: Implement flattening operation.\n",
    "\n",
    "    APPROACH:\n",
    "    1. Get the numpy array from the tensor\n",
    "    2. Use .flatten() to convert to 1D\n",
    "    3. Add batch dimension with [None, :]\n",
    "    4. Return Tensor wrapped around the result\n",
    "\n",
    "    EXAMPLE:\n",
    "    Input: Tensor([[1, 2], [3, 4]])  # shape (2, 2)\n",
    "    Output: Tensor([[1, 2, 3, 4]])  # shape (1, 4)\n",
    "\n",
    "    HINTS:\n",
    "    - Use x.data.flatten() to get 1D array\n",
    "    - Add batch dimension: result[None, :]\n",
    "    - Return Tensor(result)\n",
    "    \"\"\"\n",
    "    raise NotImplementedError(\"Student implementation required\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e8f2b50",
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "#| hide\n",
    "#| export\n",
    "def flatten(x: Tensor) -> Tensor:\n",
    "    \"\"\"Flatten a 2D tensor into a single row of shape (1, H*W) (for connecting to Dense).\"\"\"\n",
    "    return Tensor(x.data.flatten()[None, :])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7bdb9f80",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "### \ud83e\uddea Test Your Flatten Function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c6d92ebc",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test flatten function\n",
    "print(\"Testing flatten function...\")\n",
    "\n",
    "try:\n",
    "    # Test flattening\n",
    "    x = Tensor([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)\n",
    "    flattened = flatten(x)\n",
    "\n",
    "    print(f\"\u2705 Input shape: {x.shape}\")\n",
    "    print(f\"\u2705 Flattened shape: {flattened.shape}\")\n",
    "    print(f\"\u2705 Flattened values: {flattened}\")\n",
    "\n",
    "    # Verify the flattening worked correctly\n",
    "    expected = np.array([[1, 2, 3, 4, 5, 6]])\n",
    "    assert np.allclose(flattened.data, expected), \"\u274c Flattening incorrect!\"\n",
    "    print(\"\u2705 Flattening works correctly!\")\n",
    "\n",
    "except Exception as e:\n",
    "    print(f\"\u274c Error: {e}\")\n",
    "    print(\"Make sure to implement the flatten function above!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9804128d",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## Step 5: Composing a Complete ConvNet\n",
    "\n",
    "Now let's build a simple convolutional neural network that can process images!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d60d05b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compose a simple ConvNet\n",
    "print(\"Building a simple ConvNet...\")\n",
    "\n",
    "try:\n",
    "    # Create network components\n",
    "    conv = Conv2D((2, 2))\n",
    "    relu = ReLU()\n",
    "    dense = Dense(input_size=4, output_size=1)  # 4 features from 2x2 output\n",
    "\n",
    "    # Test input (small 3x3 \"image\")\n",
    "    x = Tensor(np.random.randn(3, 3).astype(np.float32))\n",
    "    print(f\"\u2705 Input shape: {x.shape}\")\n",
    "    print(f\"\u2705 Input: {x}\")\n",
    "\n",
    "    # Forward pass through the network\n",
    "    conv_out = conv(x)\n",
    "    print(f\"\u2705 After Conv2D: {conv_out}\")\n",
    "\n",
    "    relu_out = relu(conv_out)\n",
    "    print(f\"\u2705 After ReLU: {relu_out}\")\n",
    "\n",
    "    flattened = flatten(relu_out)\n",
    "    print(f\"\u2705 After flatten: {flattened}\")\n",
    "\n",
    "    final_out = dense(flattened)\n",
    "    print(f\"\u2705 Final output: {final_out}\")\n",
    "\n",
    "    print(\"\\n\ud83c\udf89 Simple ConvNet works!\")\n",
    "    print(\"This network can learn to recognize patterns in images!\")\n",
    "\n",
    "except Exception as e:\n",
    "    print(f\"\u274c Error: {e}\")\n",
    "    print(\"Check your Conv2D, flatten, and Dense implementations!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9fe4faf0",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## Step 6: Understanding the Power of Convolution\n",
    "\n",
    "Let's see how convolution captures different types of patterns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "434133c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Demonstrate pattern detection\n",
    "print(\"Demonstrating pattern detection...\")\n",
    "\n",
    "try:\n",
    "    # Create a simple \"image\" with a pattern\n",
    "    image = np.array([\n",
    "        [0, 0, 0, 0, 0],\n",
    "        [0, 1, 1, 1, 0],\n",
    "        [0, 1, 1, 1, 0],\n",
    "        [0, 1, 1, 1, 0],\n",
    "        [0, 0, 0, 0, 0]\n",
    "    ], dtype=np.float32)\n",
    "\n",
    "    # Different kernels detect different patterns\n",
    "    edge_kernel = np.array([\n",
    "        [1, 1, 1],\n",
    "        [1, -8, 1],\n",
    "        [1, 1, 1]\n",
    "    ], dtype=np.float32)\n",
    "\n",
    "    blur_kernel = np.array([\n",
    "        [1/9, 1/9, 1/9],\n",
    "        [1/9, 1/9, 1/9],\n",
    "        [1/9, 1/9, 1/9]\n",
    "    ], dtype=np.float32)\n",
    "\n",
    "    # Test edge detection\n",
    "    edge_result = conv2d_naive(image, edge_kernel)\n",
    "    print(\"\u2705 Edge detection:\")\n",
    "    print(\"   Detects boundaries around the white square\")\n",
    "    print(\"   Result:\\n\", edge_result)\n",
    "\n",
    "    # Test blurring\n",
    "    blur_result = conv2d_naive(image, blur_kernel)\n",
    "    print(\"\u2705 Blurring:\")\n",
    "    print(\"   Smooths the image\")\n",
    "    print(\"   Result:\\n\", blur_result)\n",
    "\n",
    "    print(\"\\n\ud83d\udca1 Different kernels = different feature detectors!\")\n",
    "    print(\"   Neural networks learn these automatically from data!\")\n",
    "\n",
    "except Exception as e:\n",
    "    print(f\"\u274c Error: {e}\")"
   ]
  },
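  {
   "cell_type": "markdown",
   "id": "a0c0e009",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "### The Arithmetic of Parameter Efficiency\n",
    "\n",
    "The summary below repeats the claim that weight sharing makes CNNs parameter-efficient. Here is the arithmetic behind that claim, using a hypothetical 28\u00d728 input purely for scale:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0c0e010",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parameter efficiency from weight sharing (28x28 is a hypothetical input size).\n",
    "H, W = 28, 28\n",
    "kH, kW = 3, 3\n",
    "out_H, out_W = H - kH + 1, W - kW + 1\n",
    "\n",
    "conv_params = kH * kW                     # one shared 3x3 kernel: 9 weights\n",
    "dense_params = (H * W) * (out_H * out_W)  # a Dense layer producing the same number of outputs\n",
    "print(f\"Conv2D kernel: {conv_params} parameters\")\n",
    "print(f\"Equivalent Dense layer: {dense_params:,} parameters\")"
   ]
  },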
  {
   "cell_type": "markdown",
   "id": "80938b52",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## \ud83c\udfaf Module Summary\n",
    "\n",
    "Congratulations! You've built the foundation of convolutional neural networks:\n",
    "\n",
    "### What You've Accomplished\n",
    "\u2705 **Convolution Operation**: Understanding the sliding window mechanism  \n",
    "\u2705 **Conv2D Layer**: Learnable convolutional layer implementation  \n",
    "\u2705 **Pattern Detection**: Visualizing how kernels detect different features  \n",
    "\u2705 **ConvNet Architecture**: Composing Conv2D with other layers  \n",
    "\u2705 **Real-world Applications**: Understanding computer vision applications  \n",
    "\n",
    "### Key Concepts You've Learned\n",
    "- **Convolution** is pattern matching with sliding windows\n",
    "- **Local connectivity** means each output depends on a small input region\n",
    "- **Weight sharing** makes CNNs parameter-efficient\n",
    "- **Spatial hierarchy** builds complex features from simple patterns\n",
    "- **Translation invariance** allows recognition regardless of position\n",
    "\n",
    "### What's Next\n",
    "In the next modules, you'll build on this foundation:\n",
    "- **Advanced CNN features**: Stride, padding, pooling\n",
    "- **Multi-channel convolution**: RGB images, multiple filters\n",
    "- **Training**: Learning kernels from data\n",
    "- **Real applications**: Image classification, object detection\n",
    "\n",
    "### Real-World Connection\n",
    "Your Conv2D layer is now ready to:\n",
    "- Learn edge detectors, texture recognizers, and shape detectors\n",
    "- Process real images for computer vision tasks\n",
    "- Integrate with the rest of the TinyTorch ecosystem\n",
    "- Scale to complex architectures like ResNet, VGG, etc.\n",
    "\n",
    "**Ready for the next challenge?** Let's move on to training these networks!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "03f153f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Final verification\n",
    "print(\"\\n\" + \"=\"*50)\n",
    "print(\"\ud83c\udf89 CNN MODULE COMPLETE!\")\n",
    "print(\"=\"*50)\n",
    "print(\"\u2705 Convolution operation understanding\")\n",
    "print(\"\u2705 Conv2D layer implementation\")\n",
    "print(\"\u2705 Pattern detection visualization\")\n",
    "print(\"\u2705 ConvNet architecture composition\")\n",
    "print(\"\u2705 Real-world computer vision context\")\n",
    "print(\"\\n\ud83d\ude80 Ready to train networks in the next module!\")"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "main_language": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}