# Module 09: Spatial Operations - CNNs for Vision

## Overview

**Time:** 3-4 hours | **Difficulty:** ⭐⭐⭐⭐☆

Build convolutional neural networks (CNNs), the foundation of computer vision. Learn how spatial operations enable pattern recognition in images through local connectivity and parameter sharing.
## Prerequisites

**Required Modules:** 01-08 must be completed and tested.
- ✅ Module 01 (Tensor): Data structures
- ✅ Module 02 (Activations): ReLU for feature detection
- ✅ Module 03 (Layers): Linear layers foundation
- ✅ Module 04 (Losses): CrossEntropy for classification
- ✅ Module 05 (Autograd): Gradient computation
- ✅ Module 06 (Optimizers): SGD/Adam for training
- ✅ Module 07 (Training): Training loop patterns
- ✅ Module 08 (Data): Efficient data loading
Before starting, verify prerequisites:

```bash
pytest modules/01_tensor/test_tensor.py
pytest modules/02_activations/test_activations.py
# ... test all modules 01-08
```
## Learning Objectives

By the end of this module, you will:

### Core Concepts
- **Understand Convolutional Operations**
  - Sliding window computation over spatial dimensions
  - Filter/kernel mathematics (cross-correlation)
  - Output size calculation: `(H - K + 2P) / S + 1`
  - Why convolution works for spatial data

- **Implement Conv2d Layers**
  - Forward pass: applying filters to extract features
  - Backward pass: gradients for filters, inputs, and biases
  - Parameter sharing reduces model size vs fully-connected
  - Local connectivity captures spatial patterns

- **Master Pooling Operations**
  - MaxPool2d: dimensionality reduction while preserving features
  - Stride and kernel size trade-offs
  - Translation invariance for robust recognition
  - When to pool vs when to use strided convolution

- **Build Spatial Hierarchies**
  - Early layers: edges and textures (local patterns)
  - Middle layers: parts and shapes (combinations)
  - Deep layers: objects and scenes (high-level concepts)
  - How receptive fields grow with depth
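The output-size rule and the sliding-window computation above can be sketched in a few lines of NumPy. This is a minimal single-channel sketch with hypothetical helper names; the module's actual Conv2d will handle batches, channels, and gradients:

```python
import numpy as np

def conv2d_output_size(h, w, k, stride=1, padding=0):
    """Output spatial size for a square kernel: (H - K + 2P) / S + 1."""
    out_h = (h - k + 2 * padding) // stride + 1
    out_w = (w - k + 2 * padding) // stride + 1
    return out_h, out_w

def naive_conv2d(x, kernel):
    """Valid cross-correlation of one single-channel image with one filter."""
    h, w = x.shape
    k = kernel.shape[0]
    out_h, out_w = conv2d_output_size(h, w, k)
    out = np.zeros((out_h, out_w))
    for i in range(out_h):        # slide the window over rows...
        for j in range(out_w):    # ...and columns
            out[i, j] = np.sum(x[i:i + k, j:j + k] * kernel)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
y = naive_conv2d(x, np.ones((3, 3)))
print(y.shape)  # (2, 2): (4 - 3 + 0)/1 + 1 = 2 per dimension
```

Note that a 3×3 kernel with padding 1 and stride 1 preserves the spatial size, which is why that configuration is so common.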
### Systems Understanding

- **Computational Complexity**
  - FLOPs analysis: `O(N²M²K²)` for naive convolution
  - Why convolution is expensive (6 nested loops)
  - Memory bottlenecks in spatial operations
  - Cache efficiency and data locality
- **Optimization Techniques**
  - Im2col algorithm: trade memory for speed
  - Vectorization strategies for convolution
  - Why GPUs excel at convolutional operations
  - Batch processing for throughput

- **Production Considerations**
  - Parameter efficiency: CNNs vs MLPs for images
  - Mobile deployment: depthwise-separable convolutions
  - Memory footprint during training (activations + gradients)
  - Inference optimization patterns
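To make the im2col idea concrete: unrolling every patch into a row of a matrix turns convolution into a single matrix product. This is an illustrative sketch with hypothetical names (the patch extraction is still a Python loop here; production im2col vectorizes that step too, which is where the memory-for-speed trade comes from):

```python
import numpy as np

def im2col(x, k):
    """Unroll every k×k patch of a 2-D array into one row of a matrix."""
    h, w = x.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.zeros((out_h * out_w, k * k))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[i:i + k, j:j + k].ravel()
    return cols

def conv2d_im2col(x, kernel):
    """Convolution as one matrix-vector product over the unrolled patches."""
    k = kernel.shape[0]
    out_h, out_w = x.shape[0] - k + 1, x.shape[1] - k + 1
    return (im2col(x, k) @ kernel.ravel()).reshape(out_h, out_w)

x = np.random.default_rng(0).standard_normal((6, 6))
kernel = np.random.default_rng(1).standard_normal((3, 3))
y = conv2d_im2col(x, kernel)
print(y.shape)  # (4, 4)
```

The unrolled matrix stores each pixel up to K² times, but the resulting matmul maps directly onto highly optimized BLAS/GPU kernels.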
### ML Engineering Skills

- **Architecture Design**
  - Choosing filter sizes (1×1, 3×3, 5×5)
  - Balancing depth vs width
  - When to pool and when to stride
  - Building feature extraction pipelines

- **Debugging Spatial Layers**
  - Shape tracking through conv and pool layers
  - Gradient flow verification in deep networks
  - Common errors: dimension mismatches
  - Validating learned filters visually

- **Performance Profiling**
  - Measuring convolution speed vs input size
  - Memory usage scaling with batch size
  - Comparing naive vs optimized implementations
  - Bottleneck identification in CNN pipelines
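As a taste of shape tracking, a small helper (hypothetical, not part of the module's API) can propagate an input shape through a stack of conv/pool specs, since both layer types share the same output-size rule:

```python
def track_shapes(h, w, layers):
    """Propagate an (H, W) spatial shape through a list of layer specs.

    Each spec is (kind, kernel, stride, padding); conv and pool layers
    both follow the same rule: (H - K + 2P) // S + 1.
    """
    shapes = [(h, w)]
    for kind, k, s, p in layers:
        h = (h - k + 2 * p) // s + 1
        w = (w - k + 2 * p) // s + 1
        shapes.append((h, w))
    return shapes

# A LeNet-like stack on a 28×28 input (hypothetical layer specs):
stack = [("conv", 5, 1, 0), ("pool", 2, 2, 0),
         ("conv", 5, 1, 0), ("pool", 2, 2, 0)]
print(track_shapes(28, 28, stack))
# [(28, 28), (24, 24), (12, 12), (8, 8), (4, 4)]
```

Running this before writing any layer code catches dimension mismatches on paper instead of in stack traces.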
## What You'll Build

### Core Components

- **Conv2d**: Convolutional layer with learnable filters
- **MaxPool2d**: Max pooling for dimensionality reduction
- **Flatten**: Reshape spatial features for classification
- **Helper functions**: Shape calculation utilities
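For reference, max pooling reduces each window to its maximum value. A naive single-channel sketch (illustrative only; the module's MaxPool2d will handle batches and channels, and its API may differ):

```python
import numpy as np

def maxpool2d(x, k=2, stride=2):
    """Naive max pooling over k×k windows (non-overlapping by default)."""
    h, w = x.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + k, c:c + k].max()  # keep the strongest activation
    return out

x = np.array([[ 1.,  2.,  3.,  4.],
              [ 5.,  6.,  7.,  8.],
              [ 9., 10., 11., 12.],
              [13., 14., 15., 16.]])
print(maxpool2d(x))  # [[ 6.  8.]  [14. 16.]]
```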
### Complete CNN System

By module end, you'll have all the components to build:
- LeNet-style architectures (1998 - digit recognition)
- Feature extraction pipelines
- Spatial hierarchy networks
- Ready for Milestone 04: LeNet CNN
Module Structure
modules/09_spatial/
├── README.md ← You are here
├── spatial_dev.py ← Main implementation file
├── spatial_dev.ipynb ← Jupyter notebook version
└── test_spatial.py ← Validation tests
## After This Module

### Immediate Next Step

**→ Milestone 04: LeNet CNN (1998)**

Build Yann LeCun's historic convolutional network that revolutionized digit recognition. You now have all the components: Conv2d, MaxPool2d, ReLU, and training loops.
### Future Modules Will Add
- Module 10: Normalization (BatchNorm, LayerNorm)
- Module 11: Modern architectures (ResNets, skip connections)
- Module 12: Attention mechanisms (transformers)
### What Becomes Possible
- ✅ Image classification (MNIST, CIFAR-10)
- ✅ Feature extraction for transfer learning
- ✅ Spatial pattern recognition
- ✅ Building blocks for modern vision models
## Key Insights You'll Discover

### Why CNNs Work
- Parameter Sharing: Same filter applied everywhere → fewer parameters
- Local Connectivity: Neurons see small regions → translation equivariance
- Hierarchical Features: Stack layers → learn complex patterns
- Spatial Structure: Preserve 2D topology → better for images
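Parameter sharing is easy to quantify. A back-of-the-envelope comparison (illustrative numbers, not from the module) of a 3×3 conv layer versus a fully-connected layer, both mapping a 1×28×28 input to 8 same-sized feature maps:

```python
# A conv layer shares one small filter per output channel;
# a dense layer needs a weight for every input-output pixel pair.
in_ch, out_ch, k = 1, 8, 3
h = w = 28

conv_params = out_ch * (in_ch * k * k + 1)                      # filters + biases
dense_params = (in_ch * h * w) * (out_ch * h * w) + out_ch * h * w

print(conv_params)   # 80
print(dense_params)  # 4923520, roughly 60,000× more
```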
### Performance Realities
- Convolution is Expensive: O(N²M²K²) complexity → GPUs essential
- Memory Scales Quadratically: Large images → huge activations
- Im2col Trade-off: 10× memory → 100× speed possible
- Batch Processing: Amortize overhead → better throughput
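The cost of a conv layer is easy to estimate directly. A small calculator (hypothetical helper, counting multiply-accumulates) shows why GPUs are essential:

```python
def conv_flops(n, c_in, c_out, h_out, w_out, k):
    """Multiply-adds for one conv layer: each output element costs C_in * K² MACs."""
    return n * c_out * h_out * w_out * c_in * k * k

# One 3×3 conv, 64 → 64 channels, batch of 32 images at 56×56 resolution:
flops = conv_flops(32, 64, 64, 56, 56, 3)
print(f"{flops / 1e9:.1f} GMACs")  # 3.7 GMACs for a single layer
```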
### Architectural Patterns
- Gradual Downsampling: Increase channels, decrease spatial size
- 3×3 Dominance: Best balance of expressiveness and efficiency
- Pooling Alternatives: Strided conv can replace pooling
- Depth Matters: More layers → better hierarchies
## Tips for Success

### Implementation Strategy
- Start Simple: Get 3×3 convolution working first
- Test Incrementally: Verify shapes at each step
- Profile Early: Measure performance to understand complexity
- Visualize Outputs: Check feature maps make sense
### Common Pitfalls
- ⚠️ Shape Mismatches: Track dimensions carefully through conv/pool
- ⚠️ Memory Errors: Batch size × spatial size can be huge
- ⚠️ Gradient Issues: Deep networks need careful initialization
- ⚠️ Performance: Naive implementation will be slow (that's the point!)
### Debugging Techniques

```python
# Always print shapes during development
print(f"Input: {x.shape}")
x = conv1(x)
print(f"After conv1: {x.shape}")
x = pool1(x)
print(f"After pool1: {x.shape}")
```
## Estimated Timeline
- Part 1-2: Introduction & Math (30 minutes)
- Part 3: Conv2d Implementation (90 minutes)
- Part 4: MaxPool2d & Flatten (45 minutes)
- Part 5: Systems Analysis (30 minutes)
- Part 6: Integration & Testing (30 minutes)
- Total: 3-4 hours with breaks
## Learning Approach

This is a **Core Module** (complexity level 4/5):
- Full implementation with explicit loops (see the complexity!)
- Systems analysis reveals performance characteristics
- Connection to production patterns (im2col, GPU kernels)
- Immediate testing after each component
Don't rush - understanding spatial operations deeply is crucial for modern ML.
## Getting Started

Open `spatial_dev.py` and begin with Part 1: Introduction to Spatial Operations.
Remember: You're building the foundation of computer vision. Take time to understand how these operations enable hierarchical feature learning in images.
Ready? Let's build CNNs! 🏗️