Files
TinyTorch/modules/cnn/cnn_dev.py
Vijay Janapa Reddi 718f52380e Improve CLI: remove redundant --module flags
- Update test, export, and clean commands to use positional arguments
- Change from 'tito module test --module dataloader' to 'tito module test dataloader'
- Eliminates redundant --module flag within module command group
- Update help text and examples to reflect new syntax
- Maintains backward compatibility with --all flag
- More intuitive and consistent CLI design
2025-07-12 00:56:00 -04:00

634 lines
19 KiB
Python
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.17.1
# ---
# %% [markdown]
"""
# Module X: CNN - Convolutional Neural Networks
Welcome to the CNN module! Here you'll implement the core building block of modern computer vision: the convolutional layer.
## Learning Goals
- Understand the convolution operation (sliding window, local connectivity, weight sharing)
- Implement Conv2D with explicit for-loops
- Visualize how convolution builds feature maps
- Compose Conv2D with other layers to build a simple ConvNet
- (Stretch) Explore stride, padding, pooling, and multi-channel input
## Build → Use → Understand
1. **Build**: Conv2D layer using sliding window convolution
2. **Use**: Transform images and see feature maps
3. **Understand**: How CNNs learn spatial patterns
"""
# %% [markdown]
"""
## 📦 Where This Code Lives in the Final Package
**Learning Side:** You work in `modules/cnn/cnn_dev.py`
**Building Side:** Code exports to `tinytorch.core.layers`
```python
# Final package structure:
from tinytorch.core.layers import Dense, Conv2D # Both layers together!
from tinytorch.core.activations import ReLU
from tinytorch.core.tensor import Tensor
```
**Why this matters:**
- **Learning:** Focused modules for deep understanding
- **Production:** Proper organization like PyTorch's `torch.nn`
- **Consistency:** All layers (Dense, Conv2D) live together in `core.layers`
"""
# %%
#| default_exp core.cnn
# %%
#| export
import numpy as np
from typing import List, Tuple, Optional
from tinytorch.core.tensor import Tensor
# Setup and imports (for development)
import matplotlib.pyplot as plt
from tinytorch.core.layers import Dense
from tinytorch.core.activations import ReLU
# %% [markdown]
"""
## Step 1: What is Convolution?
### Definition
A **convolutional layer** applies a small filter (kernel) across the input, producing a feature map. This operation captures local patterns and is the foundation of modern vision models.
### Why Convolution Matters in Computer Vision
- **Local connectivity**: Each output value depends only on a small region of the input
- **Weight sharing**: The same filter is applied everywhere (translation invariance)
- **Spatial hierarchy**: Multiple layers build increasingly complex features
- **Parameter efficiency**: Much fewer parameters than fully connected layers
### The Fundamental Insight
**Convolution is pattern matching!** The kernel learns to detect specific patterns:
- **Edge detectors**: Find boundaries between objects
- **Texture detectors**: Recognize surface patterns
- **Shape detectors**: Identify geometric forms
- **Feature detectors**: Combine simple patterns into complex features
### Real-World Examples
- **Image processing**: Detect edges, blur, sharpen
- **Computer vision**: Recognize objects, faces, text
- **Medical imaging**: Detect tumors, analyze scans
- **Autonomous driving**: Identify traffic signs, pedestrians
### Visual Intuition
```
Input Image: Kernel: Output Feature Map:
[1, 2, 3] [1, 0] [1*1+2*0+4*0+5*(-1), 2*1+3*0+5*0+6*(-1)]
[4, 5, 6] [0, -1] [4*1+5*0+7*0+8*(-1), 5*1+6*0+8*0+9*(-1)]
[7, 8, 9]
```
The kernel slides across the input, computing dot products at each position.
### The Math Behind It
For input I (H×W) and kernel K (kH×kW), the output O (out_H×out_W) is:
```
O[i,j] = sum(I[i+di, j+dj] * K[di, dj] for di in range(kH), dj in range(kW))
```
Let's implement this step by step!
"""
# %%
#| export
def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:
"""
Naive 2D convolution (single channel, no stride, no padding).
Args:
input: 2D input array (H, W)
kernel: 2D filter (kH, kW)
Returns:
2D output array (H-kH+1, W-kW+1)
TODO: Implement the sliding window convolution using for-loops.
APPROACH:
1. Get input dimensions: H, W = input.shape
2. Get kernel dimensions: kH, kW = kernel.shape
3. Calculate output dimensions: out_H = H - kH + 1, out_W = W - kW + 1
4. Create output array: np.zeros((out_H, out_W))
5. Use nested loops to slide the kernel:
- i loop: output rows (0 to out_H-1)
- j loop: output columns (0 to out_W-1)
- di loop: kernel rows (0 to kH-1)
- dj loop: kernel columns (0 to kW-1)
6. For each (i,j), compute: output[i,j] += input[i+di, j+dj] * kernel[di, dj]
EXAMPLE:
Input: [[1, 2, 3], Kernel: [[1, 0],
[4, 5, 6], [0, -1]]
[7, 8, 9]]
Output[0,0] = 1*1 + 2*0 + 4*0 + 5*(-1) = 1 - 5 = -4
Output[0,1] = 2*1 + 3*0 + 5*0 + 6*(-1) = 2 - 6 = -4
Output[1,0] = 4*1 + 5*0 + 7*0 + 8*(-1) = 4 - 8 = -4
Output[1,1] = 5*1 + 6*0 + 8*0 + 9*(-1) = 5 - 9 = -4
HINTS:
- Start with output = np.zeros((out_H, out_W))
- Use four nested loops: for i in range(out_H): for j in range(out_W): for di in range(kH): for dj in range(kW):
- Accumulate the sum: output[i,j] += input[i+di, j+dj] * kernel[di, dj]
"""
raise NotImplementedError("Student implementation required")
# %%
#| hide
#| export
def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:
H, W = input.shape
kH, kW = kernel.shape
out_H, out_W = H - kH + 1, W - kW + 1
output = np.zeros((out_H, out_W), dtype=input.dtype)
for i in range(out_H):
for j in range(out_W):
for di in range(kH):
for dj in range(kW):
output[i, j] += input[i + di, j + dj] * kernel[di, dj]
return output
# %% [markdown]
"""
### 🧪 Test Your Conv2D Implementation
Try your function on this simple example:
"""
# %%
# Test case for conv2d_naive
input = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
], dtype=np.float32)
kernel = np.array([
[1, 0],
[0, -1]
], dtype=np.float32)
expected = np.array([
[1*1+2*0+4*0+5*(-1), 2*1+3*0+5*0+6*(-1)],
[4*1+5*0+7*0+8*(-1), 5*1+6*0+8*0+9*(-1)]
], dtype=np.float32)
try:
output = conv2d_naive(input, kernel)
print("✅ Input:\n", input)
print("✅ Kernel:\n", kernel)
print("✅ Your output:\n", output)
print("✅ Expected:\n", expected)
assert np.allclose(output, expected), "❌ Output does not match expected!"
print("🎉 conv2d_naive works!")
except Exception as e:
print(f"❌ Error: {e}")
print("Make sure to implement conv2d_naive above!")
# %% [markdown]
"""
## Step 2: Understanding What Convolution Does
Let's visualize how different kernels detect different patterns:
"""
# %%
# Visualize different convolution kernels
print("Visualizing different convolution kernels...")
try:
# Test different kernels
test_input = np.array([
[1, 1, 1, 0, 0],
[1, 1, 1, 0, 0],
[1, 1, 1, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]
], dtype=np.float32)
# Edge detection kernel (horizontal)
edge_kernel = np.array([
[1, 1, 1],
[0, 0, 0],
[-1, -1, -1]
], dtype=np.float32)
# Sharpening kernel
sharpen_kernel = np.array([
[0, -1, 0],
[-1, 5, -1],
[0, -1, 0]
], dtype=np.float32)
# Test edge detection
edge_output = conv2d_naive(test_input, edge_kernel)
print("✅ Edge detection kernel:")
print(" Detects horizontal edges (boundaries between light and dark)")
print(" Output:\n", edge_output)
# Test sharpening
sharpen_output = conv2d_naive(test_input, sharpen_kernel)
print("✅ Sharpening kernel:")
print(" Enhances edges and details")
print(" Output:\n", sharpen_output)
print("\n💡 Different kernels detect different patterns!")
print(" Neural networks learn these kernels automatically!")
except Exception as e:
print(f"❌ Error: {e}")
# %% [markdown]
"""
## Step 3: Conv2D Layer Class
Now let's wrap your convolution function in a layer class for use in networks. This makes it consistent with other layers like Dense.
### Why Layer Classes Matter
- **Consistent API**: Same interface as Dense layers
- **Learnable parameters**: Kernels can be learned from data
- **Composability**: Can be combined with other layers
- **Integration**: Works seamlessly with the rest of TinyTorch
### The Pattern
```
Input Tensor → Conv2D → Output Tensor
```
Just like Dense layers, but with spatial operations instead of linear transformations.
"""
# %%
#| export
class Conv2D:
"""
2D Convolutional Layer (single channel, single filter, no stride/pad).
Args:
kernel_size: (kH, kW) - size of the convolution kernel
TODO: Initialize a random kernel and implement the forward pass using conv2d_naive.
APPROACH:
1. Store kernel_size as instance variable
2. Initialize random kernel with small values
3. Implement forward pass using conv2d_naive function
4. Return Tensor wrapped around the result
EXAMPLE:
layer = Conv2D(kernel_size=(2, 2))
x = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) # shape (3, 3)
y = layer(x) # shape (2, 2)
HINTS:
- Store kernel_size as (kH, kW)
- Initialize kernel with np.random.randn(kH, kW) * 0.1 (small values)
- Use conv2d_naive(x.data, self.kernel) in forward pass
- Return Tensor(result) to wrap the result
"""
def __init__(self, kernel_size: Tuple[int, int]):
"""
Initialize Conv2D layer with random kernel.
Args:
kernel_size: (kH, kW) - size of the convolution kernel
TODO:
1. Store kernel_size as instance variable
2. Initialize random kernel with small values
3. Scale kernel values to prevent large outputs
STEP-BY-STEP:
1. Store kernel_size as self.kernel_size
2. Unpack kernel_size into kH, kW
3. Initialize kernel: np.random.randn(kH, kW) * 0.1
4. Convert to float32 for consistency
EXAMPLE:
Conv2D((2, 2)) creates:
- kernel: shape (2, 2) with small random values
"""
raise NotImplementedError("Student implementation required")
def forward(self, x: Tensor) -> Tensor:
"""
Forward pass: apply convolution to input.
Args:
x: Input tensor of shape (H, W)
Returns:
Output tensor of shape (H-kH+1, W-kW+1)
TODO: Implement convolution using conv2d_naive function.
STEP-BY-STEP:
1. Use conv2d_naive(x.data, self.kernel)
2. Return Tensor(result)
EXAMPLE:
Input x: Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) # shape (3, 3)
Kernel: shape (2, 2)
Output: Tensor([[val1, val2], [val3, val4]]) # shape (2, 2)
HINTS:
- x.data gives you the numpy array
- self.kernel is your learned kernel
- Use conv2d_naive(x.data, self.kernel)
- Return Tensor(result) to wrap the result
"""
raise NotImplementedError("Student implementation required")
def __call__(self, x: Tensor) -> Tensor:
"""Make layer callable: layer(x) same as layer.forward(x)"""
return self.forward(x)
# %%
#| hide
#| export
class Conv2D:
def __init__(self, kernel_size: Tuple[int, int]):
self.kernel_size = kernel_size
kH, kW = kernel_size
# Initialize with small random values
self.kernel = np.random.randn(kH, kW).astype(np.float32) * 0.1
def forward(self, x: Tensor) -> Tensor:
return Tensor(conv2d_naive(x.data, self.kernel))
def __call__(self, x: Tensor) -> Tensor:
return self.forward(x)
# %% [markdown]
"""
### 🧪 Test Your Conv2D Layer
"""
# %%
# Test Conv2D layer
print("Testing Conv2D layer...")
try:
# Test basic Conv2D layer
conv = Conv2D(kernel_size=(2, 2))
x = Tensor(np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
], dtype=np.float32))
print(f"✅ Input shape: {x.shape}")
print(f"✅ Kernel shape: {conv.kernel.shape}")
print(f"✅ Kernel values:\n{conv.kernel}")
y = conv(x)
print(f"✅ Output shape: {y.shape}")
print(f"✅ Output: {y}")
# Test with different kernel size
conv2 = Conv2D(kernel_size=(3, 3))
y2 = conv2(x)
print(f"✅ 3x3 kernel output shape: {y2.shape}")
print("\n🎉 Conv2D layer works!")
except Exception as e:
print(f"❌ Error: {e}")
print("Make sure to implement the Conv2D layer above!")
# %% [markdown]
"""
## Step 4: Building a Simple ConvNet
Now let's compose Conv2D layers with other layers to build a complete convolutional neural network!
### Why ConvNets Matter
- **Spatial hierarchy**: Each layer learns increasingly complex features
- **Parameter sharing**: Same kernel applied everywhere (efficiency)
- **Translation invariance**: Can recognize objects regardless of position
- **Real-world success**: Power most modern computer vision systems
### The Architecture
```
Input Image → Conv2D → ReLU → Flatten → Dense → Output
```
This simple architecture can learn to recognize patterns in images!
"""
# %%
#| export
def flatten(x: Tensor) -> Tensor:
"""
Flatten a 2D tensor to 1D (for connecting to Dense).
TODO: Implement flattening operation.
APPROACH:
1. Get the numpy array from the tensor
2. Use .flatten() to convert to 1D
3. Add batch dimension with [None, :]
4. Return Tensor wrapped around the result
EXAMPLE:
Input: Tensor([[1, 2], [3, 4]]) # shape (2, 2)
Output: Tensor([[1, 2, 3, 4]]) # shape (1, 4)
HINTS:
- Use x.data.flatten() to get 1D array
- Add batch dimension: result[None, :]
- Return Tensor(result)
"""
raise NotImplementedError("Student implementation required")
# %%
#| hide
#| export
def flatten(x: Tensor) -> Tensor:
"""Flatten a 2D tensor to 1D (for connecting to Dense)."""
return Tensor(x.data.flatten()[None, :])
# %% [markdown]
"""
### 🧪 Test Your Flatten Function
"""
# %%
# Test flatten function
print("Testing flatten function...")
try:
# Test flattening
x = Tensor([[1, 2, 3], [4, 5, 6]]) # shape (2, 3)
flattened = flatten(x)
print(f"✅ Input shape: {x.shape}")
print(f"✅ Flattened shape: {flattened.shape}")
print(f"✅ Flattened values: {flattened}")
# Verify the flattening worked correctly
expected = np.array([[1, 2, 3, 4, 5, 6]])
assert np.allclose(flattened.data, expected), "❌ Flattening incorrect!"
print("✅ Flattening works correctly!")
except Exception as e:
print(f"❌ Error: {e}")
print("Make sure to implement the flatten function above!")
# %% [markdown]
"""
## Step 5: Composing a Complete ConvNet
Now let's build a simple convolutional neural network that can process images!
"""
# %%
# Compose a simple ConvNet
print("Building a simple ConvNet...")
try:
# Create network components
conv = Conv2D((2, 2))
relu = ReLU()
dense = Dense(input_size=4, output_size=1) # 4 features from 2x2 output
# Test input (small 3x3 "image")
x = Tensor(np.random.randn(3, 3).astype(np.float32))
print(f"✅ Input shape: {x.shape}")
print(f"✅ Input: {x}")
# Forward pass through the network
conv_out = conv(x)
print(f"✅ After Conv2D: {conv_out}")
relu_out = relu(conv_out)
print(f"✅ After ReLU: {relu_out}")
flattened = flatten(relu_out)
print(f"✅ After flatten: {flattened}")
final_out = dense(flattened)
print(f"✅ Final output: {final_out}")
print("\n🎉 Simple ConvNet works!")
print("This network can learn to recognize patterns in images!")
except Exception as e:
print(f"❌ Error: {e}")
print("Check your Conv2D, flatten, and Dense implementations!")
# %% [markdown]
"""
## Step 6: Understanding the Power of Convolution
Let's see how convolution captures different types of patterns:
"""
# %%
# Demonstrate pattern detection
print("Demonstrating pattern detection...")
try:
# Create a simple "image" with a pattern
image = np.array([
[0, 0, 0, 0, 0],
[0, 1, 1, 1, 0],
[0, 1, 1, 1, 0],
[0, 1, 1, 1, 0],
[0, 0, 0, 0, 0]
], dtype=np.float32)
# Different kernels detect different patterns
edge_kernel = np.array([
[1, 1, 1],
[1, -8, 1],
[1, 1, 1]
], dtype=np.float32)
blur_kernel = np.array([
[1/9, 1/9, 1/9],
[1/9, 1/9, 1/9],
[1/9, 1/9, 1/9]
], dtype=np.float32)
# Test edge detection
edge_result = conv2d_naive(image, edge_kernel)
print("✅ Edge detection:")
print(" Detects boundaries around the white square")
print(" Result:\n", edge_result)
# Test blurring
blur_result = conv2d_naive(image, blur_kernel)
print("✅ Blurring:")
print(" Smooths the image")
print(" Result:\n", blur_result)
print("\n💡 Different kernels = different feature detectors!")
print(" Neural networks learn these automatically from data!")
except Exception as e:
print(f"❌ Error: {e}")
# %% [markdown]
"""
## 🎯 Module Summary
Congratulations! You've built the foundation of convolutional neural networks:
### What You've Accomplished
✅ **Convolution Operation**: Understanding the sliding window mechanism
✅ **Conv2D Layer**: Learnable convolutional layer implementation
✅ **Pattern Detection**: Visualizing how kernels detect different features
✅ **ConvNet Architecture**: Composing Conv2D with other layers
✅ **Real-world Applications**: Understanding computer vision applications
### Key Concepts You've Learned
- **Convolution** is pattern matching with sliding windows
- **Local connectivity** means each output depends on a small input region
- **Weight sharing** makes CNNs parameter-efficient
- **Spatial hierarchy** builds complex features from simple patterns
- **Translation invariance** allows recognition regardless of position
### What's Next
In the next modules, you'll build on this foundation:
- **Advanced CNN features**: Stride, padding, pooling
- **Multi-channel convolution**: RGB images, multiple filters
- **Training**: Learning kernels from data
- **Real applications**: Image classification, object detection
### Real-World Connection
Your Conv2D layer is now ready to:
- Learn edge detectors, texture recognizers, and shape detectors
- Process real images for computer vision tasks
- Integrate with the rest of the TinyTorch ecosystem
- Scale to complex architectures like ResNet, VGG, etc.
**Ready for the next challenge?** Let's move on to training these networks!
"""
# %%
# Final verification
print("\n" + "="*50)
print("🎉 CNN MODULE COMPLETE!")
print("="*50)
print("✅ Convolution operation understanding")
print("✅ Conv2D layer implementation")
print("✅ Pattern detection visualization")
print("✅ ConvNet architecture composition")
print("✅ Real-world computer vision context")
print("\n🚀 Ready to train networks in the next module!")