mirror of https://github.com/MLSysBook/TinyTorch.git synced 2026-05-08 05:45:18 -05:00

Module 09: Spatial Operations - CNNs for Vision

Overview

Time: 3-4 hours
Difficulty: ⭐⭐⭐⭐☆

Build convolutional neural networks (CNNs) - the foundation of computer vision. Learn how spatial operations enable pattern recognition in images through local connectivity and parameter sharing.

Prerequisites

Required Modules: 01-08 must be completed and tested

✅ Module 01 (Tensor): Data structures
✅ Module 02 (Activations): ReLU for feature detection
✅ Module 03 (Layers): Linear layers foundation
✅ Module 04 (Losses): CrossEntropy for classification
✅ Module 05 (Autograd): Gradient computation
✅ Module 06 (Optimizers): SGD/Adam for training
✅ Module 07 (Training): Training loop patterns
✅ Module 08 (Data): Efficient data loading

Before starting, verify prerequisites:

pytest modules/01_tensor/test_tensor.py
pytest modules/02_activations/test_activations.py
# ... test all modules 01-08

Learning Objectives

By the end of this module, you will:

Core Concepts

Understand Convolutional Operations
- Sliding window computation over spatial dimensions
- Filter/kernel mathematics (cross-correlation)
- Output size calculations: ⌊(H - K + 2P) / S⌋ + 1, for input size H, kernel size K, padding P, and stride S
- Why convolution works for spatial data
Implement Conv2d Layers
- Forward pass: applying filters to extract features
- Backward pass: gradients for filters, inputs, and biases
- Parameter sharing reduces model size vs fully-connected
- Local connectivity captures spatial patterns
Master Pooling Operations
- MaxPool2d: dimensionality reduction while preserving features
- Stride and kernel size trade-offs
- Approximate invariance to small translations for robust recognition
- When to pool vs when to use strided convolution
Build Spatial Hierarchies
- Early layers: edges and textures (local patterns)
- Middle layers: parts and shapes (combinations)
- Deep layers: objects and scenes (high-level concepts)
- How receptive fields grow with depth
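The output-size formula and the receptive-field growth listed above can be checked with a few lines of plain Python. This is a minimal sketch, not the module's helper API: `conv_output_size` and `receptive_field` are hypothetical names, and the division is assumed to round down, which is the usual convention.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial dimension: floor((size - kernel + 2*padding) / stride) + 1."""
    if size + 2 * padding < kernel:
        raise ValueError("kernel is larger than the padded input")
    return (size - kernel + 2 * padding) // stride + 1


def receptive_field(layers) -> int:
    """Receptive field of one output unit after a stack of (kernel, stride) layers."""
    rf, jump = 1, 1  # jump: distance in input pixels between neighbouring units
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


print(conv_output_size(32, kernel=5))             # 28
print(conv_output_size(28, kernel=2, stride=2))   # 14
print(conv_output_size(32, kernel=3, padding=1))  # 32 ("same" padding for a 3x3 kernel)
print(receptive_field([(3, 1), (3, 1)]))          # 5
print(receptive_field([(5, 1), (2, 2), (5, 1)]))  # 14
```

The last call shows why depth matters: after a stride-2 layer, every later kernel tap covers twice as many input pixels.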

Systems Understanding

Computational Complexity
- FLOPs analysis: O(N²M²K²) for naive convolution
- Why convolution is expensive (6 nested loops)
- Memory bottlenecks in spatial operations
- Cache efficiency and data locality
Optimization Techniques
- Im2col algorithm: trade memory for speed
- Vectorization strategies for convolution
- Why GPUs excel at convolutional operations
- Batch processing for throughput
Production Considerations
- Parameter efficiency: CNNs vs MLPs for images
- Mobile deployment: depthwise-separable convolutions
- Memory footprint during training (activations + gradients)
- Inference optimization patterns
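To make the cost analysis concrete, the sketch below counts multiply-accumulate operations (MACs) for a naive convolution and the size of the matrix that im2col materializes. The dimension names are assumptions chosen for this example rather than the symbols used in the module, and the layer sizes are only illustrative.

```python
def conv2d_macs(batch, c_in, c_out, h_out, w_out, k_h, k_w):
    """One multiply-accumulate per (output element, input channel, kernel offset)."""
    return batch * c_out * h_out * w_out * c_in * k_h * k_w


def im2col_elements(c_in, h_out, w_out, k_h, k_w):
    """Elements in the unrolled matrix: one column per output position."""
    return (c_in * k_h * k_w) * (h_out * w_out)


# A 5x5 convolution from 1 to 6 channels producing a 28x28 map.
print(conv2d_macs(1, 1, 6, 28, 28, 5, 5))    # 117600

# A 5x5 convolution from 6 to 16 channels producing a 10x10 map.
print(conv2d_macs(1, 6, 16, 10, 10, 5, 5))   # 240000

# Memory cost of im2col for a 3x32x32 input, 3x3 kernel, stride 1, padding 1.
unrolled = im2col_elements(3, 32, 32, 3, 3)
original = 3 * 32 * 32
print(unrolled / original)                   # 9.0
```

The ratio in the last line is the kernel area: at stride 1, each input pixel is copied once for every kernel offset that touches it.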

ML Engineering Skills

Architecture Design
- Choosing filter sizes (1×1, 3×3, 5×5)
- Balancing depth vs width
- When to pool and when to stride
- Building feature extraction pipelines
Debugging Spatial Layers
- Shape tracking through conv and pool layers
- Gradient flow verification in deep networks
- Common errors: dimension mismatches
- Validating learned filters visually
Performance Profiling
- Measuring convolution speed vs input size
- Memory usage scaling with batch size
- Comparing naive vs optimized implementations
- Bottleneck identification in CNN pipelines
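One way to verify gradients for a spatial layer is a finite-difference check. The sketch below does this for a single-channel cross-correlation using plain NumPy arrays. The helper names (`corr2d`, `corr2d_backward`, `numerical_grad`) are hypothetical, and the module's own Conv2d and autograd machinery may be organized differently.

```python
import numpy as np


def corr2d(x, w):
    """Valid cross-correlation of a 2D input with a 2D kernel."""
    k_h, k_w = w.shape
    h_out, w_out = x.shape[0] - k_h + 1, x.shape[1] - k_w + 1
    out = np.zeros((h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            out[i, j] = np.sum(x[i:i + k_h, j:j + k_w] * w)
    return out


def corr2d_backward(x, w, grad_out):
    """Analytical gradients of sum(out * grad_out) with respect to x and w."""
    k_h, k_w = w.shape
    grad_x = np.zeros_like(x)
    grad_w = np.zeros_like(w)
    for i in range(grad_out.shape[0]):
        for j in range(grad_out.shape[1]):
            # Each output element touched one window of x and every weight.
            grad_w += grad_out[i, j] * x[i:i + k_h, j:j + k_w]
            grad_x[i:i + k_h, j:j + k_w] += grad_out[i, j] * w
    return grad_x, grad_w


def numerical_grad(loss_fn, arr, eps=1e-6):
    """Central differences, perturbing one element of arr at a time (in place)."""
    grad = np.zeros_like(arr)
    for idx in np.ndindex(arr.shape):
        old = arr[idx]
        arr[idx] = old + eps
        plus = loss_fn()
        arr[idx] = old - eps
        minus = loss_fn()
        arr[idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5))
w = rng.standard_normal((3, 3))
g = rng.standard_normal((3, 3))  # stand-in for the upstream gradient dL/d(out)


def loss():
    return np.sum(corr2d(x, w) * g)


grad_x, grad_w = corr2d_backward(x, w, g)
print(np.allclose(grad_w, numerical_grad(loss, w), atol=1e-5))
print(np.allclose(grad_x, numerical_grad(loss, x), atol=1e-5))
```

Both comparisons should agree closely because the loss is linear in `x` and in `w` separately, so central differences have no truncation error here.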

What You'll Build

Core Components

Conv2d: Convolutional layer with learnable filters
MaxPool2d: Max pooling for dimensionality reduction
Flatten: Reshape spatial features for classification
Helper functions: Shape calculation utilities
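As a rough picture of what the forward passes of these components compute, here is a NumPy sketch that assumes an (N, C, H, W) layout and plain arrays instead of the module's Tensor class. The function names and signatures are illustrative only and are not TinyTorch's actual API.

```python
import numpy as np


def conv2d_forward(x, weight, bias=None, stride=1, padding=0):
    """Naive cross-correlation. x: (N, C_in, H, W); weight: (C_out, C_in, K_h, K_w)."""
    n, c_in, h, w = x.shape
    c_out, c_in_w, k_h, k_w = weight.shape
    assert c_in == c_in_w, "input channels must match the filter's channel dimension"
    h_out = (h - k_h + 2 * padding) // stride + 1
    w_out = (w - k_w + 2 * padding) // stride + 1
    # Zero-pad only the two spatial dimensions.
    xp = np.pad(x, ((0, 0), (0, 0), (padding, padding), (padding, padding)))
    out = np.zeros((n, c_out, h_out, w_out), dtype=x.dtype)
    for b in range(n):
        for oc in range(c_out):
            for i in range(h_out):
                for j in range(w_out):
                    top, left = i * stride, j * stride
                    window = xp[b, :, top:top + k_h, left:left + k_w]
                    # The loops over input channels and kernel offsets are
                    # folded into one elementwise product and sum.
                    out[b, oc, i, j] = np.sum(window * weight[oc])
            if bias is not None:
                out[b, oc] += bias[oc]
    return out


def maxpool2d_forward(x, kernel=2, stride=None):
    """Max over each kernel x kernel window; stride defaults to the kernel size."""
    stride = kernel if stride is None else stride
    n, c, h, w = x.shape
    h_out = (h - kernel) // stride + 1
    w_out = (w - kernel) // stride + 1
    out = np.empty((n, c, h_out, w_out), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            top, left = i * stride, j * stride
            window = x[:, :, top:top + kernel, left:left + kernel]
            out[:, :, i, j] = window.max(axis=(2, 3))
    return out


def flatten(x):
    """Keep the batch dimension and merge everything else."""
    return x.reshape(x.shape[0], -1)


x = np.arange(2 * 1 * 6 * 6, dtype=np.float64).reshape(2, 1, 6, 6)
w = np.ones((4, 1, 3, 3))
y = conv2d_forward(x, w, bias=np.zeros(4), padding=1)
p = maxpool2d_forward(y)
f = flatten(p)
print(y.shape, p.shape, f.shape)  # (2, 4, 6, 6) (2, 4, 3, 3) (2, 36)
```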

Complete CNN System

By module end, you'll have all components to build:

LeNet-style architectures (1998 - digit recognition)
Feature extraction pipelines
Spatial hierarchy networks
Ready for Milestone 04: LeNet CNN
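To see how these components chain into a LeNet-style feature extractor, the sketch below traces shapes through two conv/pool stages. It assumes the classic LeNet-5 sizes (a 1×32×32 input, 5×5 convolutions with 6 and 16 filters, 2×2 pooling); the milestone itself may use different sizes. `trace_shapes` is a hypothetical helper.

```python
def out_size(size, kernel, stride=1, padding=0):
    return (size - kernel + 2 * padding) // stride + 1


def trace_shapes(input_shape, layers):
    """Return the (C, H, W) shape after each layer; raise if a layer does not fit."""
    c, h, w = input_shape
    shapes = [(c, h, w)]
    for name, kind, kernel, stride, out_channels in layers:
        h, w = out_size(h, kernel, stride), out_size(w, kernel, stride)
        if kind == "conv":
            c = out_channels  # pooling leaves the channel count unchanged
        if h <= 0 or w <= 0:
            raise ValueError(f"{name}: kernel {kernel} does not fit the input")
        shapes.append((c, h, w))
    return shapes


# (name, kind, kernel, stride, out_channels)
lenet_features = [
    ("conv1", "conv", 5, 1, 6),
    ("pool1", "pool", 2, 2, None),
    ("conv2", "conv", 5, 1, 16),
    ("pool2", "pool", 2, 2, None),
]

shapes = trace_shapes((1, 32, 32), lenet_features)
for (name, *_), shape in zip(lenet_features, shapes[1:]):
    print(f"{name}: {shape}")

c, h, w = shapes[-1]
print("flattened features:", c * h * w)  # 400
```

The flattened size is what the first fully-connected layer after Flatten must accept as its input dimension.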

Module Structure

modules/09_spatial/
├── README.md                 ← You are here
├── spatial_dev.py            ← Main implementation file
├── spatial_dev.ipynb         ← Jupyter notebook version
└── test_spatial.py           ← Validation tests

After This Module

Immediate Next Step

→ Milestone 04: LeNet CNN (1998) Build Yann LeCun's historic convolutional network that revolutionized digit recognition. You now have all components: Conv2d, MaxPool2d, ReLU, and training loops.


Future Modules Will Add

Module 10: Normalization (BatchNorm, LayerNorm)
Module 11: Modern architectures (ResNets, skip connections)
Module 12: Attention mechanisms (transformers)

What Becomes Possible

✅ Image classification (MNIST, CIFAR-10)
✅ Feature extraction for transfer learning
✅ Spatial pattern recognition
✅ Building blocks for modern vision models

Key Insights You'll Discover

Why CNNs Work

Parameter Sharing: Same filter applied everywhere → fewer parameters and translation equivariance
Local Connectivity: Neurons see small regions → local spatial patterns
Hierarchical Features: Stack layers → learn complex patterns
Spatial Structure: Preserve 2D topology → better for images
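The effect of parameter sharing is easiest to appreciate with a count. The sketch below compares a 3×3 convolution with a fully-connected layer that produces the same number of outputs from a CIFAR-10-sized input; the layer sizes are illustrative assumptions, not taken from the module.

```python
def conv_params(c_in, c_out, kernel):
    """Weights plus one bias per output channel; independent of image size."""
    return c_out * (c_in * kernel * kernel + 1)


def linear_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_out * (n_in + 1)


c_in, c_out, h, w = 3, 16, 32, 32

conv = conv_params(c_in, c_out, kernel=3)
dense = linear_params(c_in * h * w, c_out * h * w)

print(conv)           # 448
print(dense)          # 50348032
print(dense // conv)  # 112384
```

The convolution's count does not change if the image grows, while the dense layer's count grows with the product of input and output sizes.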

Performance Realities

Convolution is Expensive: O(N²M²K²) complexity → GPUs essential
Memory Scales Quadratically: Doubling image height and width → 4× the activation memory
Im2col Trade-off: ~10× memory (about K² for a K×K kernel at stride 1) → 100× speed possible
Batch Processing: Amortize overhead → better throughput
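The im2col trade-off can be seen in a small NumPy sketch: every input window is copied into a column so that the whole convolution becomes one matrix multiply. This is an illustrative single-image version without padding, with hypothetical function names. The patch extraction is left as a Python loop for readability, so only the multiply-accumulate work is moved into the matrix product.

```python
import numpy as np


def im2col(x, k_h, k_w, stride=1):
    """x: (C, H, W) -> matrix of shape (C*k_h*k_w, h_out*w_out), one column per window."""
    c, h, w = x.shape
    h_out = (h - k_h) // stride + 1
    w_out = (w - k_w) // stride + 1
    cols = np.empty((c * k_h * k_w, h_out * w_out), dtype=x.dtype)
    col = 0
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i * stride:i * stride + k_h, j * stride:j * stride + k_w]
            cols[:, col] = patch.reshape(-1)  # same (C, k_h, k_w) order as the weights
            col += 1
    return cols, h_out, w_out


def conv2d_im2col(x, weight, stride=1):
    """weight: (C_out, C, k_h, k_w). A single matmul replaces the inner loops."""
    c_out, _, k_h, k_w = weight.shape
    cols, h_out, w_out = im2col(x, k_h, k_w, stride)
    out = weight.reshape(c_out, -1) @ cols  # (C_out, h_out*w_out)
    return out.reshape(c_out, h_out, w_out)


def conv2d_loops(x, weight, stride=1):
    """Reference implementation with six explicit loops."""
    c_out, c, k_h, k_w = weight.shape
    h_out = (x.shape[1] - k_h) // stride + 1
    w_out = (x.shape[2] - k_w) // stride + 1
    out = np.zeros((c_out, h_out, w_out))
    for oc in range(c_out):
        for i in range(h_out):
            for j in range(w_out):
                for ic in range(c):
                    for u in range(k_h):
                        for v in range(k_w):
                            out[oc, i, j] += x[ic, i * stride + u, j * stride + v] * weight[oc, ic, u, v]
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
weight = rng.standard_normal((4, 3, 3, 3))

cols, _, _ = im2col(x, 3, 3)
print(cols.shape)                     # (27, 36)
print(cols.size / x.size)             # memory blow-up of the unrolled input
print(np.allclose(conv2d_im2col(x, weight), conv2d_loops(x, weight)))
```

The blow-up is below the kernel area here because no padding is used, so border pixels appear in fewer windows.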

Architectural Patterns

Gradual Downsampling: Increase channels, decrease spatial size
3×3 Dominance: Best balance of expressiveness and efficiency
Pooling Alternatives: Strided conv can replace pooling
Depth Matters: More layers → better hierarchies
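Two of these patterns can be checked with simple arithmetic. The sketch below assumes stride-1 layers with equal input and output channel counts and ignores biases. It compares two stacked 3×3 layers against one 5×5 layer, then shows that a strided convolution can produce the same spatial size as 2×2 pooling. The channel count and image size are illustrative.

```python
def conv_weights(c_in, c_out, kernel):
    """Weight count of one conv layer (biases ignored)."""
    return c_in * c_out * kernel * kernel


def stacked_receptive_field(num_layers, kernel):
    """Receptive field of num_layers stride-1 convolutions with the same kernel size."""
    return 1 + num_layers * (kernel - 1)


def out_size(size, kernel, stride=1, padding=0):
    return (size - kernel + 2 * padding) // stride + 1


channels = 64
two_3x3 = 2 * conv_weights(channels, channels, 3)
one_5x5 = conv_weights(channels, channels, 5)

print(stacked_receptive_field(2, 3), stacked_receptive_field(1, 5))  # 5 5
print(two_3x3, one_5x5)                                              # 73728 102400

# Downsampling a 32-pixel side by 2: pooling vs. strided convolution.
pooled = out_size(32, kernel=2, stride=2)
strided = out_size(32, kernel=3, stride=2, padding=1)
print(pooled, strided)                                               # 16 16
```

The stacked 3×3 layers cover the same 5×5 region with fewer weights and an extra nonlinearity in between, which is one common argument for the pattern.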

Tips for Success

Implementation Strategy

Start Simple: Get 3×3 convolution working first
Test Incrementally: Verify shapes at each step
Profile Early: Measure performance to understand complexity
Visualize Outputs: Check feature maps make sense

Common Pitfalls

⚠️ Shape Mismatches: Track dimensions carefully through conv/pool
⚠️ Memory Errors: Batch size × spatial size can be huge
⚠️ Gradient Issues: Deep networks need careful initialization
⚠️ Performance: Naive implementation will be slow (that's the point!)
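A quick estimate before running a layer can catch memory errors early. The sketch below assumes float32 activations (4 bytes per element) and counts a single feature map; training also stores gradients and the activations of every other layer. The sizes are illustrative.

```python
def activation_bytes(batch, channels, height, width, bytes_per_element=4):
    """Memory for one (N, C, H, W) activation tensor."""
    return batch * channels * height * width * bytes_per_element


MIB = 1024 ** 2

small = activation_bytes(64, 64, 112, 112)
large = activation_bytes(64, 64, 224, 224)

print(small / MIB)    # 196.0
print(large / MIB)    # 784.0
print(large / small)  # 4.0: doubling the side length quadruples the memory
```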

Debugging Techniques

# Always print shapes during development
print(f"Input: {x.shape}")
x = conv1(x)
print(f"After conv1: {x.shape}")
x = pool1(x)
print(f"After pool1: {x.shape}")

Estimated Timeline

Part 1-2: Introduction & Math (30 minutes)
Part 3: Conv2d Implementation (90 minutes)
Part 4: MaxPool2d & Flatten (45 minutes)
Part 5: Systems Analysis (30 minutes)
Part 6: Integration & Testing (30 minutes)
Total: 3-4 hours with breaks

Learning Approach

This is a Core Module (complexity level 4/5):

Full implementation with explicit loops (see the complexity!)
Systems analysis reveals performance characteristics
Connection to production patterns (im2col, GPU kernels)
Immediate testing after each component

Don't rush - understanding spatial operations deeply is crucial for modern ML.

Getting Started

Open spatial_dev.py and begin with Part 1: Introduction to Spatial Operations.

Remember: You're building the foundation of computer vision. Take time to understand how these operations enable hierarchical feature learning in images.

Ready? Let's build CNNs! 🏗️
