# Module 09: Spatial Operations - CNNs for Vision

## Overview

**Time**: 3-4 hours

**Difficulty**: ⭐⭐⭐⭐☆

Build convolutional neural networks (CNNs) - the foundation of computer vision. Learn how spatial operations enable pattern recognition in images through local connectivity and parameter sharing.
## Prerequisites

**Required Modules**: 01-08 must be completed and tested

- ✅ Module 01 (Tensor): Data structures
- ✅ Module 02 (Activations): ReLU for feature detection
- ✅ Module 03 (Layers): Linear layer foundation
- ✅ Module 04 (Losses): CrossEntropy for classification
- ✅ Module 05 (Autograd): Gradient computation
- ✅ Module 06 (Optimizers): SGD/Adam for training
- ✅ Module 07 (Training): Training loop patterns
- ✅ Module 08 (Data): Efficient data loading

**Before starting**, verify prerequisites:

```bash
pytest modules/01_tensor/test_tensor.py
pytest modules/02_activations/test_activations.py
# ... test all modules 01-08
```
## Learning Objectives

By the end of this module, you will:

### Core Concepts

1. **Understand Convolutional Operations**
   - Sliding window computation over spatial dimensions
   - Filter/kernel mathematics (cross-correlation)
   - Output size calculation: `(H - K + 2P)/S + 1`
   - Why convolution works for spatial data

2. **Implement Conv2d Layers**
   - Forward pass: applying filters to extract features
   - Backward pass: gradients for filters, inputs, and biases
   - Parameter sharing reduces model size vs fully-connected layers
   - Local connectivity captures spatial patterns

3. **Master Pooling Operations**
   - MaxPool2d: dimensionality reduction while preserving features
   - Stride and kernel size trade-offs
   - Translation invariance for robust recognition
   - When to pool vs when to use strided convolution

4. **Build Spatial Hierarchies**
   - Early layers: edges and textures (local patterns)
   - Middle layers: parts and shapes (combinations)
   - Deep layers: objects and scenes (high-level concepts)
   - How receptive fields grow with depth
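As a quick sanity check on the output-size formula, it can be sketched as a small helper (the function name here is illustrative, not part of the module's API):

```python
def conv2d_output_size(size, kernel, padding=0, stride=1):
    """Spatial output size along one dimension: (H - K + 2P) / S + 1."""
    return (size - kernel + 2 * padding) // stride + 1

# 28x28 MNIST image, 5x5 kernel, no padding, stride 1 -> 24
print(conv2d_output_size(28, 5))             # 24
# Padding 2 preserves the input size -> 28
print(conv2d_output_size(28, 5, padding=2))  # 28
```

Checking your expected shapes with a helper like this before running a forward pass catches most dimension bugs early.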
### Systems Understanding

1. **Computational Complexity**
   - FLOPs analysis: `O(N²M²K²)` for naive convolution
   - Why convolution is expensive (six nested loops)
   - Memory bottlenecks in spatial operations
   - Cache efficiency and data locality

2. **Optimization Techniques**
   - Im2col algorithm: trade memory for speed
   - Vectorization strategies for convolution
   - Why GPUs excel at convolutional operations
   - Batch processing for throughput

3. **Production Considerations**
   - Parameter efficiency: CNNs vs MLPs for images
   - Mobile deployment: depthwise-separable convolutions
   - Memory footprint during training (activations + gradients)
   - Inference optimization patterns
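To see where the loop nesting comes from, here is a minimal single-channel sketch (stride 1, no padding); adding batch and channel dimensions turns these four loops into the six mentioned above:

```python
import numpy as np

def conv2d_naive(x, w):
    """Single-channel cross-correlation, stride 1, no padding.
    Every output pixel costs K*K multiply-adds, which is why
    naive convolution is expensive."""
    H, W = x.shape
    K, _ = w.shape
    out = np.zeros((H - K + 1, W - K + 1))
    for i in range(out.shape[0]):          # output rows
        for j in range(out.shape[1]):      # output cols
            for ki in range(K):            # kernel rows
                for kj in range(K):        # kernel cols
                    out[i, j] += x[i + ki, j + kj] * w[ki, kj]
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((3, 3))
print(conv2d_naive(x, w).shape)  # (2, 2)
```

The module's own implementation follows the same structure; the explicit loops are intentional so the cost is visible.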
### ML Engineering Skills

1. **Architecture Design**
   - Choosing filter sizes (1×1, 3×3, 5×5)
   - Balancing depth vs width
   - When to pool and when to stride
   - Building feature extraction pipelines

2. **Debugging Spatial Layers**
   - Shape tracking through conv and pool layers
   - Gradient flow verification in deep networks
   - Common errors: dimension mismatches
   - Validating learned filters visually

3. **Performance Profiling**
   - Measuring convolution speed vs input size
   - Memory usage scaling with batch size
   - Comparing naive vs optimized implementations
   - Bottleneck identification in CNN pipelines
## What You'll Build

### Core Components

1. **Conv2d**: Convolutional layer with learnable filters
2. **MaxPool2d**: Max pooling for dimensionality reduction
3. **Flatten**: Reshape spatial features for classification
4. **Helper functions**: Shape calculation utilities
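The pooling component is the simplest of the three layers; a single-channel sketch of its forward pass looks like this (your `MaxPool2d` class will wrap the same window-maximum idea):

```python
import numpy as np

def maxpool2d(x, k=2, stride=2):
    """2x2 max pooling sketch (single channel): keep the largest
    value in each window, halving the spatial resolution."""
    H, W = x.shape
    out_h, out_w = (H - k) // stride + 1, (W - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i*stride:i*stride+k, j*stride:j*stride+k].max()
    return out

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 1., 2., 1.],
              [9., 2., 0., 3.]])
print(maxpool2d(x))  # [[4. 8.] [9. 3.]]
```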
### Complete CNN System

By module end, you'll have all the components to build:

- LeNet-style architectures (1998 - digit recognition)
- Feature extraction pipelines
- Spatial hierarchy networks
- Ready for Milestone 04: LeNet CNN

## Module Structure

```
modules/09_spatial/
├── README.md          ← You are here
├── spatial_dev.py     ← Main implementation file
├── spatial_dev.ipynb  ← Jupyter notebook version
└── test_spatial.py    ← Validation tests
```
## After This Module

### Immediate Next Step

**→ Milestone 04: LeNet CNN (1998)**

Build Yann LeCun's historic convolutional network that revolutionized digit recognition. You now have all the components: Conv2d, MaxPool2d, ReLU, and training loops.

### Future Modules Will Add

- **Module 10**: Normalization (BatchNorm, LayerNorm)
- **Module 11**: Modern architectures (ResNets, skip connections)
- **Module 12**: Attention mechanisms (transformers)

### What Becomes Possible

- ✅ Image classification (MNIST, CIFAR-10)
- ✅ Feature extraction for transfer learning
- ✅ Spatial pattern recognition
- ✅ Building blocks for modern vision models
## Key Insights You'll Discover

### Why CNNs Work

1. **Parameter Sharing**: Same filter applied everywhere → fewer parameters
2. **Local Connectivity**: Neurons see small regions → translation equivariance
3. **Hierarchical Features**: Stack layers → learn complex patterns
4. **Spatial Structure**: Preserve 2D topology → better for images
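To make the parameter-sharing claim concrete, here is a rough back-of-the-envelope count (the layer sizes are illustrative, assuming a 28×28 single-channel input as in MNIST):

```python
# Fully-connected: every hidden unit sees every input pixel.
mlp_params = 28 * 28 * 128   # 28x28 input -> 128 hidden units (weights only)
# Convolutional: each 3x3 filter is shared across all positions.
conv_params = 3 * 3 * 8      # eight 3x3 filters, one input channel

print(mlp_params)   # 100352
print(conv_params)  # 72
```

Three orders of magnitude fewer weights for the convolutional layer, and it still sees the entire image - just one patch at a time.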
### Performance Realities

1. **Convolution is Expensive**: O(N²M²K²) complexity → GPUs essential
2. **Memory Scales Quadratically**: Large images → huge activations
3. **Im2col Trade-off**: 10× memory → 100× speed possible
4. **Batch Processing**: Amortize overhead → better throughput
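The im2col trade-off above can be sketched in a few lines (single channel, stride 1; function names are illustrative): unfold every K×K patch into a row, and convolution collapses into one matrix multiply, at the cost of storing each pixel up to K² times.

```python
import numpy as np

def im2col(x, k):
    """Unfold every KxK patch of x into a row, so convolution
    becomes a single matrix multiply (memory traded for speed)."""
    H, W = x.shape
    rows = []
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            rows.append(x[i:i+k, j:j+k].ravel())
    return np.array(rows)

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((3, 3))
# (num_patches, K*K) @ (K*K,) -> every output pixel at once
out = im2col(x, 3) @ w.ravel()
print(out.reshape(2, 2))
```

Production libraries use exactly this reshaping (plus batched, multi-channel variants) to hand convolution to highly tuned matrix-multiply kernels.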
### Architectural Patterns

1. **Gradual Downsampling**: Increase channels, decrease spatial size
2. **3×3 Dominance**: Best balance of expressiveness and efficiency
3. **Pooling Alternatives**: Strided conv can replace pooling
4. **Depth Matters**: More layers → better hierarchies
## Tips for Success

### Implementation Strategy

1. **Start Simple**: Get 3×3 convolution working first
2. **Test Incrementally**: Verify shapes at each step
3. **Profile Early**: Measure performance to understand complexity
4. **Visualize Outputs**: Check that feature maps make sense

### Common Pitfalls

- ⚠️ **Shape Mismatches**: Track dimensions carefully through conv/pool
- ⚠️ **Memory Errors**: Batch size × spatial size can be huge
- ⚠️ **Gradient Issues**: Deep networks need careful initialization
- ⚠️ **Performance**: The naive implementation will be slow (that's the point!)
### Debugging Techniques

```python
# Always print shapes during development
print(f"Input: {x.shape}")
x = conv1(x)
print(f"After conv1: {x.shape}")
x = pool1(x)
print(f"After pool1: {x.shape}")
```
## Estimated Timeline

- **Parts 1-2**: Introduction & Math (30 minutes)
- **Part 3**: Conv2d Implementation (90 minutes)
- **Part 4**: MaxPool2d & Flatten (45 minutes)
- **Part 5**: Systems Analysis (30 minutes)
- **Part 6**: Integration & Testing (30 minutes)
- **Total**: 3-4 hours with breaks
## Learning Approach

This is a **Core Module (complexity level 4/5)**:

- Full implementation with explicit loops (see the complexity!)
- Systems analysis reveals performance characteristics
- Connection to production patterns (im2col, GPU kernels)
- Immediate testing after each component

**Don't rush** - understanding spatial operations deeply is crucial for modern ML.
## Getting Started

Open `spatial_dev.py` and begin with Part 1: Introduction to Spatial Operations.

**Remember**: You're building the foundation of computer vision. Take time to understand how these operations enable hierarchical feature learning in images.

---

**Ready?** Let's build CNNs! 🏗️