# Module 09: Spatial Operations - CNNs for Vision
## Overview
**Time**: 3-4 hours
**Difficulty**: ⭐⭐⭐⭐☆
Build convolutional neural networks (CNNs), the foundation of computer vision. Learn how spatial operations enable pattern recognition in images through local connectivity and parameter sharing.
## Prerequisites
**Required Modules**: 01-08 must be completed and tested
- ✅ Module 01 (Tensor): Data structures
- ✅ Module 02 (Activations): ReLU for feature detection
- ✅ Module 03 (Layers): Linear layers foundation
- ✅ Module 04 (Losses): CrossEntropy for classification
- ✅ Module 05 (Autograd): Gradient computation
- ✅ Module 06 (Optimizers): SGD/Adam for training
- ✅ Module 07 (Training): Training loop patterns
- ✅ Module 08 (Data): Efficient data loading
**Before starting**, verify prerequisites:
```bash
pytest modules/01_tensor/test_tensor.py
pytest modules/02_activations/test_activations.py
# ... test all modules 01-08
```
## Learning Objectives
By the end of this module, you will:
### Core Concepts
1. **Understand Convolutional Operations**
- Sliding window computation over spatial dimensions
- Filter/kernel mathematics (cross-correlation)
- Output size calculations: `(H - K + 2P)/S + 1` (floor division)
- Why convolution works for spatial data
2. **Implement Conv2d Layers**
- Forward pass: applying filters to extract features
- Backward pass: gradients for filters, inputs, and biases
- Parameter sharing reduces model size vs fully-connected
- Local connectivity captures spatial patterns
3. **Master Pooling Operations**
- MaxPool2d: dimensionality reduction while preserving features
- Stride and kernel size trade-offs
- Translation invariance for robust recognition
- When to pool vs when to use strided convolution
4. **Build Spatial Hierarchies**
- Early layers: edges and textures (local patterns)
- Middle layers: parts and shapes (combinations)
- Deep layers: objects and scenes (high-level concepts)
- How receptive fields grow with depth
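The core concepts above can be sketched directly. Below is a minimal NumPy version of the output-size formula and a naive `Conv2d` forward pass — a sketch for intuition, not the module's actual API (function names here are illustrative):

```python
import numpy as np

def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size: floor((H - K + 2P) / S) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

def conv2d_naive(x, w, b, stride=1, padding=0):
    """Naive cross-correlation.

    x: (C_in, H, W), w: (C_out, C_in, K, K), b: (C_out,).
    """
    c_out, c_in, k, _ = w.shape
    if padding > 0:
        x = np.pad(x, ((0, 0), (padding, padding), (padding, padding)))
    # x is already padded, so pass padding=0 here
    h_out = conv_output_size(x.shape[1], k, stride, 0)
    w_out = conv_output_size(x.shape[2], k, stride, 0)
    out = np.zeros((c_out, h_out, w_out))
    for co in range(c_out):            # each output channel has its own filter
        for i in range(h_out):
            for j in range(w_out):
                # slide the K×K window over all input channels at once
                patch = x[:, i*stride:i*stride+k, j*stride:j*stride+k]
                out[co, i, j] = np.sum(patch * w[co]) + b[co]
    return out
```

Note the three explicit loops (plus the two hidden inside `np.sum` over the patch) — this is exactly the complexity the systems analysis later in the module examines.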
### Systems Understanding
1. **Computational Complexity**
- FLOPs analysis: `O(N²M²K²)` for naive convolution
- Why convolution is expensive (6 nested loops)
- Memory bottlenecks in spatial operations
- Cache efficiency and data locality
2. **Optimization Techniques**
- Im2col algorithm: trade memory for speed
- Vectorization strategies for convolution
- Why GPUs excel at convolutional operations
- Batch processing for throughput
3. **Production Considerations**
- Parameter efficiency: CNNs vs MLPs for images
- Mobile deployment: depthwise-separable convolutions
- Memory footprint during training (activations + gradients)
- Inference optimization patterns
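The im2col idea mentioned above — trading memory for speed — can be sketched in a few lines: unroll every receptive-field patch into a column, then replace the nested loops with one matrix multiply. This is a simplified sketch (no padding, square kernels), not the optimized version you'll study:

```python
import numpy as np

def im2col(x, k, stride=1):
    """Unroll (C, H, W) into a patch matrix of shape (C*k*k, H_out*W_out)."""
    c, h, w = x.shape
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    cols = np.zeros((c * k * k, h_out * w_out))
    idx = 0
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i*stride:i*stride+k, j*stride:j*stride+k]
            cols[:, idx] = patch.ravel()   # each patch becomes one column
            idx += 1
    return cols

def conv2d_im2col(x, w, b, stride=1):
    """Convolution as a single matmul over the unrolled patches."""
    c_out, c_in, k, _ = w.shape
    cols = im2col(x, k, stride)            # (C_in*k*k, L) -- the memory cost
    w_mat = w.reshape(c_out, -1)           # (C_out, C_in*k*k)
    out = w_mat @ cols + b[:, None]        # one big, cache-friendly matmul
    h_out = (x.shape[1] - k) // stride + 1
    w_out = (x.shape[2] - k) // stride + 1
    return out.reshape(c_out, h_out, w_out)
```

The patch matrix duplicates overlapping pixels (each input pixel appears in up to `k*k` columns), which is exactly the memory-for-speed trade-off — and why GPUs, built around large matmuls, excel at this formulation.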
### ML Engineering Skills
1. **Architecture Design**
- Choosing filter sizes (1×1, 3×3, 5×5)
- Balancing depth vs width
- When to pool and when to stride
- Building feature extraction pipelines
2. **Debugging Spatial Layers**
- Shape tracking through conv and pool layers
- Gradient flow verification in deep networks
- Common errors: dimension mismatches
- Validating learned filters visually
3. **Performance Profiling**
- Measuring convolution speed vs input size
- Memory usage scaling with batch size
- Comparing naive vs optimized implementations
- Bottleneck identification in CNN pipelines
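A quick way to practice the profiling skills above is to time a naive convolution as the input grows — a rough wall-clock sketch (function name is illustrative; real profiling should use repeated trials):

```python
import time
import numpy as np

def naive_conv_time(h, k=3):
    """Time one naive 2-D convolution on an h×h single-channel image."""
    x = np.random.randn(h, h)
    w = np.random.randn(k, k)
    n = h - k + 1
    t0 = time.perf_counter()
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            out[i, j] = np.sum(x[i:i+k, j:j+k] * w)
    return time.perf_counter() - t0

for h in (32, 64, 128):
    # doubling H roughly quadruples the work: O(H^2) output positions
    print(f"H={h:4d}  {naive_conv_time(h)*1e3:7.2f} ms")
```

Plotting these timings against `h` makes the quadratic scaling visible, and comparing against an im2col version shows where the bottleneck moves.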
## What You'll Build
### Core Components
1. **Conv2d**: Convolutional layer with learnable filters
2. **MaxPool2d**: Max pooling for dimensionality reduction
3. **Flatten**: Reshape spatial features for classification
4. **Helper functions**: Shape calculation utilities
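To preview the remaining components, here is a minimal sketch of max pooling and flattening in NumPy — again illustrative, not the module's real classes:

```python
import numpy as np

def maxpool2d(x, k=2, stride=2):
    """Max pooling over (C, H, W): keep the strongest activation per window."""
    c, h, w = x.shape
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    out = np.zeros((c, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            window = x[:, i*stride:i*stride+k, j*stride:j*stride+k]
            out[:, i, j] = window.max(axis=(1, 2))   # max per channel
    return out

def flatten(x):
    """Collapse (C, H, W) spatial features into a vector for a classifier head."""
    return x.reshape(-1)
```

Note pooling has no learnable parameters — it only discards spatial resolution, which is why its backward pass just routes gradients to the max locations.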
### Complete CNN System
By module end, you'll have all components to build:
- LeNet-style architectures (1998, digit recognition)
- Feature extraction pipelines
- Spatial hierarchy networks
- Ready for Milestone 04: LeNet CNN
## Module Structure
```
modules/09_spatial/
├── README.md ← You are here
├── spatial_dev.py ← Main implementation file
├── spatial_dev.ipynb ← Jupyter notebook version
└── test_spatial.py ← Validation tests
```
## After This Module
### Immediate Next Step
**→ Milestone 04: LeNet CNN (1998)**
Build Yann LeCun's historic convolutional network that revolutionized digit recognition. You now have all components: Conv2d, MaxPool2d, ReLU, and training loops.
### Future Modules Will Add
- **Module 10**: Normalization (BatchNorm, LayerNorm)
- **Module 11**: Modern architectures (ResNets, skip connections)
- **Module 12**: Attention mechanisms (transformers)
### What Becomes Possible
- ✅ Image classification (MNIST, CIFAR-10)
- ✅ Feature extraction for transfer learning
- ✅ Spatial pattern recognition
- ✅ Building blocks for modern vision models
## Key Insights You'll Discover
### Why CNNs Work
1. **Parameter Sharing**: Same filter applied everywhere → fewer parameters
2. **Local Connectivity**: Neurons see small regions → translation equivariance
3. **Hierarchical Features**: Stack layers → learn complex patterns
4. **Spatial Structure**: Preserve 2D topology → better for images
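Parameter sharing is worth seeing in numbers. A back-of-the-envelope comparison for a hypothetical 32×32 RGB input (sizes chosen for illustration):

```python
# Conv layer: 16 filters of shape 3×3×3, plus 16 biases.
conv_params = 16 * (3 * 3 * 3) + 16                      # 448 parameters

# Fully-connected layer producing the same 16×32×32 output volume:
# every input pixel connects to every output unit.
fc_params = (3 * 32 * 32) * (16 * 32 * 32) + 16 * 32 * 32  # 50,348,032

print(conv_params, fc_params)  # 448 vs 50,348,032 -- ~100,000× fewer
```

The same 448 weights are reused at every spatial position, which is exactly the parameter-sharing insight above.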
### Performance Realities
1. **Convolution is Expensive**: O(N²M²K²) complexity → GPUs essential
2. **Memory Scales Quadratically**: Large images → huge activations
3. **Im2col Trade-off**: 10× memory → 100× speed possible
4. **Batch Processing**: Amortize overhead → better throughput
### Architectural Patterns
1. **Gradual Downsampling**: Increase channels, decrease spatial size
2. **3×3 Dominance**: Best balance of expressiveness and efficiency
3. **Pooling Alternatives**: Strided conv can replace pooling
4. **Depth Matters**: More layers → better hierarchies
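The "3×3 dominance" and "depth matters" patterns connect through receptive fields, which you can compute with a small helper — a sketch under the usual assumption of no dilation (function name is illustrative):

```python
def receptive_field(layers):
    """Receptive field after a stack of (kernel, stride) layers."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k-1) input steps
        jump *= s              # striding makes later layers skip input pixels
    return rf

# Three 3×3 stride-1 convs see a 7×7 region -- the same as one 7×7 filter,
# but with fewer parameters and two extra nonlinearities in between.
print(receptive_field([(3, 1), (3, 1), (3, 1)]))
```

This is why stacking small filters won out architecturally: equal coverage, more depth, fewer weights.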
## Tips for Success
### Implementation Strategy
1. **Start Simple**: Get 3×3 convolution working first
2. **Test Incrementally**: Verify shapes at each step
3. **Profile Early**: Measure performance to understand complexity
4. **Visualize Outputs**: Check feature maps make sense
### Common Pitfalls
- ⚠️ **Shape Mismatches**: Track dimensions carefully through conv/pool
- ⚠️ **Memory Errors**: Batch size × spatial size can be huge
- ⚠️ **Gradient Issues**: Deep networks need careful initialization
- ⚠️ **Performance**: Naive implementation will be slow (that's the point!)
### Debugging Techniques
```python
# Always print shapes during development
print(f"Input: {x.shape}")
x = conv1(x)
print(f"After conv1: {x.shape}")
x = pool1(x)
print(f"After pool1: {x.shape}")
```
## Estimated Timeline
- **Part 1-2**: Introduction & Math (30 minutes)
- **Part 3**: Conv2d Implementation (90 minutes)
- **Part 4**: MaxPool2d & Flatten (45 minutes)
- **Part 5**: Systems Analysis (30 minutes)
- **Part 6**: Integration & Testing (30 minutes)
- **Total**: 3-4 hours with breaks
## Learning Approach
This is a **Core Module (complexity level 4/5)**:
- Full implementation with explicit loops (see the complexity!)
- Systems analysis reveals performance characteristics
- Connection to production patterns (im2col, GPU kernels)
- Immediate testing after each component
**Don't rush** - understanding spatial operations deeply is crucial for modern ML.
## Getting Started
Open `spatial_dev.py` and begin with Part 1: Introduction to Spatial Operations.
**Remember**: You're building the foundation of computer vision. Take time to understand how these operations enable hierarchical feature learning in images.
---
**Ready?** Let's build CNNs! 🏗️