# Module 09: Spatial Operations - CNNs for Vision

## Overview

**Time**: 3-4 hours

**Difficulty**: ⭐⭐⭐⭐☆

Build convolutional neural networks (CNNs) - the foundation of computer vision. Learn how spatial operations enable pattern recognition in images through local connectivity and parameter sharing.
## Prerequisites

**Required Modules**: 01-08 must be completed and tested

- ✅ Module 01 (Tensor): Data structures
- ✅ Module 02 (Activations): ReLU for feature detection
- ✅ Module 03 (Layers): Linear layer foundation
- ✅ Module 04 (Losses): CrossEntropy for classification
- ✅ Module 05 (Autograd): Gradient computation
- ✅ Module 06 (Optimizers): SGD/Adam for training
- ✅ Module 07 (Training): Training loop patterns
- ✅ Module 08 (Data): Efficient data loading

**Before starting**, verify prerequisites:

```bash
pytest modules/01_tensor/test_tensor.py
pytest modules/02_activations/test_activations.py
# ... test all modules 01-08
```
## Learning Objectives

By the end of this module, you will:

### Core Concepts

1. **Understand Convolutional Operations**
   - Sliding window computation over spatial dimensions
   - Filter/kernel mathematics (cross-correlation)
   - Output size calculation: `(H - K + 2P)/S + 1`
   - Why convolution works for spatial data

2. **Implement Conv2d Layers**
   - Forward pass: applying filters to extract features
   - Backward pass: gradients for filters, inputs, and biases
   - Parameter sharing reduces model size vs fully-connected layers
   - Local connectivity captures spatial patterns

3. **Master Pooling Operations**
   - MaxPool2d: dimensionality reduction while preserving features
   - Stride and kernel size trade-offs
   - Translation invariance for robust recognition
   - When to pool vs when to use strided convolution

4. **Build Spatial Hierarchies**
   - Early layers: edges and textures (local patterns)
   - Middle layers: parts and shapes (combinations)
   - Deep layers: objects and scenes (high-level concepts)
   - How receptive fields grow with depth
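As a quick sanity check on the output-size formula, it can be sketched as a small helper (the function name here is illustrative, not part of the module's API):

```python
def conv2d_output_size(size, kernel, padding=0, stride=1):
    """Spatial output size along one dimension: (H - K + 2P) / S + 1."""
    return (size - kernel + 2 * padding) // stride + 1

# 28x28 MNIST image, 5x5 kernel, no padding, stride 1 -> 24
print(conv2d_output_size(28, 5))             # 24
# Padding 2 preserves the input size -> 28
print(conv2d_output_size(28, 5, padding=2))  # 28
```

Checking your expected shapes with a helper like this before running a forward pass catches most dimension bugs early.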
### Systems Understanding

1. **Computational Complexity**
   - FLOPs analysis: `O(N²M²K²)` for naive convolution
   - Why convolution is expensive (six nested loops)
   - Memory bottlenecks in spatial operations
   - Cache efficiency and data locality

2. **Optimization Techniques**
   - Im2col algorithm: trade memory for speed
   - Vectorization strategies for convolution
   - Why GPUs excel at convolutional operations
   - Batch processing for throughput

3. **Production Considerations**
   - Parameter efficiency: CNNs vs MLPs for images
   - Mobile deployment: depthwise-separable convolutions
   - Memory footprint during training (activations + gradients)
   - Inference optimization patterns
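To see where the loop nesting comes from, here is a minimal single-channel sketch (stride 1, no padding); adding batch and channel dimensions turns these four loops into the six mentioned above:

```python
import numpy as np

def conv2d_naive(x, w):
    """Single-channel cross-correlation, stride 1, no padding.
    Every output pixel costs K*K multiply-adds, which is why
    naive convolution is expensive."""
    H, W = x.shape
    K, _ = w.shape
    out = np.zeros((H - K + 1, W - K + 1))
    for i in range(out.shape[0]):          # output rows
        for j in range(out.shape[1]):      # output cols
            for ki in range(K):            # kernel rows
                for kj in range(K):        # kernel cols
                    out[i, j] += x[i + ki, j + kj] * w[ki, kj]
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((3, 3))
print(conv2d_naive(x, w).shape)  # (2, 2)
```

The module's own implementation follows the same structure; the explicit loops are intentional so the cost is visible.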
### ML Engineering Skills

1. **Architecture Design**
   - Choosing filter sizes (1×1, 3×3, 5×5)
   - Balancing depth vs width
   - When to pool and when to stride
   - Building feature extraction pipelines

2. **Debugging Spatial Layers**
   - Shape tracking through conv and pool layers
   - Gradient flow verification in deep networks
   - Common errors: dimension mismatches
   - Validating learned filters visually

3. **Performance Profiling**
   - Measuring convolution speed vs input size
   - Memory usage scaling with batch size
   - Comparing naive vs optimized implementations
   - Bottleneck identification in CNN pipelines
## What You'll Build

### Core Components

1. **Conv2d**: Convolutional layer with learnable filters
2. **MaxPool2d**: Max pooling for dimensionality reduction
3. **Flatten**: Reshape spatial features for classification
4. **Helper functions**: Shape calculation utilities
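The pooling component is the simplest of the three layers; a single-channel sketch of its forward pass looks like this (your `MaxPool2d` class will wrap the same window-maximum idea):

```python
import numpy as np

def maxpool2d(x, k=2, stride=2):
    """2x2 max pooling sketch (single channel): keep the largest
    value in each window, halving the spatial resolution."""
    H, W = x.shape
    out_h, out_w = (H - k) // stride + 1, (W - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i*stride:i*stride+k, j*stride:j*stride+k].max()
    return out

x = np.array([[1., 2., 5., 6.],
              [3., 4., 7., 8.],
              [0., 1., 2., 1.],
              [9., 2., 0., 3.]])
print(maxpool2d(x))  # [[4. 8.] [9. 3.]]
```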
### Complete CNN System

By module end, you'll have all the components to build:

- LeNet-style architectures (1998 - digit recognition)
- Feature extraction pipelines
- Spatial hierarchy networks
- Ready for Milestone 04: LeNet CNN

## Module Structure

```
modules/09_spatial/
├── README.md          ← You are here
├── spatial_dev.py     ← Main implementation file
├── spatial_dev.ipynb  ← Jupyter notebook version
└── test_spatial.py    ← Validation tests
```
## After This Module

### Immediate Next Step

**→ Milestone 04: LeNet CNN (1998)**

Build Yann LeCun's historic convolutional network that revolutionized digit recognition. You now have all the components: Conv2d, MaxPool2d, ReLU, and training loops.

### Future Modules Will Add

- **Module 10**: Normalization (BatchNorm, LayerNorm)
- **Module 11**: Modern architectures (ResNets, skip connections)
- **Module 12**: Attention mechanisms (transformers)

### What Becomes Possible

- ✅ Image classification (MNIST, CIFAR-10)
- ✅ Feature extraction for transfer learning
- ✅ Spatial pattern recognition
- ✅ Building blocks for modern vision models
## Key Insights You'll Discover

### Why CNNs Work

1. **Parameter Sharing**: Same filter applied everywhere → fewer parameters
2. **Local Connectivity**: Neurons see small regions → translation equivariance
3. **Hierarchical Features**: Stack layers → learn complex patterns
4. **Spatial Structure**: Preserve 2D topology → better for images
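To make the parameter-sharing claim concrete, here is a rough back-of-the-envelope count (the layer sizes are illustrative, assuming a 28×28 single-channel input as in MNIST):

```python
# Fully-connected: every hidden unit sees every input pixel.
mlp_params = 28 * 28 * 128   # 28x28 input -> 128 hidden units (weights only)
# Convolutional: each 3x3 filter is shared across all positions.
conv_params = 3 * 3 * 8      # eight 3x3 filters, one input channel

print(mlp_params)   # 100352
print(conv_params)  # 72
```

Three orders of magnitude fewer weights for the convolutional layer, and it still sees the entire image - just one patch at a time.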
### Performance Realities

1. **Convolution is Expensive**: O(N²M²K²) complexity → GPUs essential
2. **Memory Scales Quadratically**: Large images → huge activations
3. **Im2col Trade-off**: 10× memory → 100× speed possible
4. **Batch Processing**: Amortize overhead → better throughput
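The im2col trade-off above can be sketched in a few lines (single channel, stride 1; function names are illustrative): unfold every K×K patch into a row, and convolution collapses into one matrix multiply, at the cost of storing each pixel up to K² times.

```python
import numpy as np

def im2col(x, k):
    """Unfold every KxK patch of x into a row, so convolution
    becomes a single matrix multiply (memory traded for speed)."""
    H, W = x.shape
    rows = []
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            rows.append(x[i:i+k, j:j+k].ravel())
    return np.array(rows)

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((3, 3))
# (num_patches, K*K) @ (K*K,) -> every output pixel at once
out = im2col(x, 3) @ w.ravel()
print(out.reshape(2, 2))
```

Production libraries use exactly this reshaping (plus batched, multi-channel variants) to hand convolution to highly tuned matrix-multiply kernels.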
### Architectural Patterns

1. **Gradual Downsampling**: Increase channels, decrease spatial size
2. **3×3 Dominance**: Best balance of expressiveness and efficiency
3. **Pooling Alternatives**: Strided conv can replace pooling
4. **Depth Matters**: More layers → better hierarchies
## Tips for Success

### Implementation Strategy

1. **Start Simple**: Get 3×3 convolution working first
2. **Test Incrementally**: Verify shapes at each step
3. **Profile Early**: Measure performance to understand complexity
4. **Visualize Outputs**: Check that feature maps make sense

### Common Pitfalls

- ⚠️ **Shape Mismatches**: Track dimensions carefully through conv/pool
- ⚠️ **Memory Errors**: Batch size × spatial size can be huge
- ⚠️ **Gradient Issues**: Deep networks need careful initialization
- ⚠️ **Performance**: The naive implementation will be slow (that's the point!)
### Debugging Techniques

```python
# Always print shapes during development
print(f"Input: {x.shape}")
x = conv1(x)
print(f"After conv1: {x.shape}")
x = pool1(x)
print(f"After pool1: {x.shape}")
```
## Estimated Timeline

- **Parts 1-2**: Introduction & Math (30 minutes)
- **Part 3**: Conv2d Implementation (90 minutes)
- **Part 4**: MaxPool2d & Flatten (45 minutes)
- **Part 5**: Systems Analysis (30 minutes)
- **Part 6**: Integration & Testing (30 minutes)
- **Total**: 3-4 hours with breaks
## Learning Approach

This is a **Core Module (complexity level 4/5)**:

- Full implementation with explicit loops (see the complexity!)
- Systems analysis reveals performance characteristics
- Connection to production patterns (im2col, GPU kernels)
- Immediate testing after each component

**Don't rush** - understanding spatial operations deeply is crucial for modern ML.
## Getting Started

Open `spatial_dev.py` and begin with Part 1: Introduction to Spatial Operations.

**Remember**: You're building the foundation of computer vision. Take time to understand how these operations enable hierarchical feature learning in images.

---

**Ready?** Let's build CNNs! 🏗️