Add comprehensive multi-channel Conv2D support to Module 06 (Spatial)

MAJOR FEATURE: Multi-channel convolutions for real CNN architectures

Key additions:
- MultiChannelConv2D class with in_channels/out_channels support
- Handles RGB images (3 channels) and arbitrary channel counts
- He initialization for stable training
- Optional bias parameters
- Batch processing support

Testing & Validation:
- Comprehensive unit tests for single/multi-channel
- Integration tests for complete CNN pipelines
- Memory profiling and parameter scaling analysis
- QA approved: All mandatory tests passing

CIFAR-10 CNN Example:
- Updated train_cnn.py to use MultiChannelConv2D
- Architecture: Conv(3→32) → Pool → Conv(32→64) → Pool → Dense
- Demonstrates why convolutions matter for vision
- Shows parameter reduction vs MLPs (18KB vs 12MB)

Systems Analysis:
- Parameter scaling: O(in_channels × out_channels × kernel²)
- Memory profiling shows efficient scaling
- Performance characteristics documented
- Production context with PyTorch comparisons

This enables proper CNN training on CIFAR-10 with ~60% accuracy target.
This commit is contained in:
Vijay Janapa Reddi
2025-09-22 10:26:13 -04:00
parent c963c8b676
commit a07451ece3
4 changed files with 1221 additions and 254 deletions

194
docs/module-audit.md Normal file
View File

@@ -0,0 +1,194 @@
# TinyTorch Module Audit: Essential vs Extra Components
## Overview
This audit examines what components are NEEDED for each milestone vs EXTRA components that enhance the framework but aren't strictly necessary.
---
## Part I: MLPs (Target: XORNet)
### Module 02: Tensor
**ESSENTIAL for XORNet:**
- Basic Tensor class with data storage
- Addition, subtraction, multiplication
- Matrix multiply (for layers)
- Shape, reshape operations
**EXTRA (but good for framework):**
- Broadcasting ✓ (nice but XOR doesn't need)
- Fancy indexing ✓
- Statistical operations (mean, sum, std) ✓
- Comparison operators ✓
### Module 03: Activations
**ESSENTIAL for XORNet:**
- ReLU ✓ (used in XORNet)
- Sigmoid (could use for XOR output)
**EXTRA (but good for framework):**
- Tanh ✓ (alternative to ReLU)
- Softmax ✓ (not needed for XOR, but needed for CIFAR-10)
- ActivationProfiler ✓ (pedagogical tool)
### Module 04: Layers
**ESSENTIAL for XORNet:**
- Dense layer ✓ (fully connected)
- Weight initialization
- Forward pass
**EXTRA:**
- Different initialization strategies (Xavier, He, etc.)
- Bias option
### Module 05: Networks
**ESSENTIAL for XORNet:**
- Sequential model ✓
- Forward pass through layers
**EXTRA:**
- Model summary/printing
- Parameter counting
---
## Part II: CNNs (Target: CIFAR-10)
### Module 06: Spatial
**ESSENTIAL for CNN CIFAR-10:**
- Conv2D ✓ (the key innovation!)
- MaxPool2D ✓ (for downsampling)
**EXTRA (but pedagogically valuable):**
- Different padding modes
- Stride options
- AvgPool2D (alternative pooling)
- Multiple filter support
### Module 07: DataLoader
**ESSENTIAL for CIFAR-10:**
- CIFAR10Dataset ✓
- DataLoader with batching ✓
- Shuffling ✓
**EXTRA:**
- Data augmentation (but helps accuracy!)
- Other datasets (MNIST, etc.)
- Prefetching/parallel loading
### Module 08: Autograd
**ESSENTIAL for CIFAR-10:**
- Variable class ✓
- Backward pass ✓
- Gradient computation ✓
**EXTRA:**
- Computation graph visualization
- Gradient checking
- Higher-order derivatives
### Module 09: Optimizers
**ESSENTIAL for CIFAR-10:**
- SGD (basic, could work)
- Adam ✓ (used in CIFAR-10, converges faster)
**EXTRA:**
- Learning rate scheduling
- Momentum variants
- RMSprop, AdaGrad
### Module 10: Training
**ESSENTIAL for CIFAR-10:**
- Training loop ✓
- CrossEntropyLoss ✓
- Basic evaluation ✓
**EXTRA (but very useful):**
- Checkpointing ✓
- Early stopping ✓
- Metrics tracking ✓
- Validation splits ✓
- MeanSquaredError (for XOR)
---
## Part III: Transformers (Target: TinyGPT)
### Module 11: Embeddings
**ESSENTIAL for TinyGPT:**
- Token embedding layer
- Positional encoding (sinusoidal or learned)
**EXTRA:**
- Multiple embedding types
- Embedding dropout
### Module 12: Attention
**ESSENTIAL for TinyGPT:**
- Multi-head attention ✓ (already implemented!)
- Scaled dot-product attention ✓
- Causal masking ✓
**EXTRA:**
- Different attention variants
- Attention visualization
### Module 13: Normalization
**ESSENTIAL for TinyGPT:**
- LayerNorm (critical for transformer stability)
**EXTRA:**
- BatchNorm (not used in transformers)
- GroupNorm, InstanceNorm
### Module 14: Transformers
**ESSENTIAL for TinyGPT:**
- TransformerBlock (attention + FFN + residual)
- Positional encoding integration
- Stack of blocks
**EXTRA:**
- Encoder-decoder architecture
- Cross-attention
### Module 15: Generation
**ESSENTIAL for TinyGPT:**
- Autoregressive generation
- Temperature sampling
- Greedy decoding
**EXTRA:**
- Beam search
- Top-k, Top-p sampling
- Repetition penalty
---
## Summary
### Truly Minimal Path
If we wanted ONLY what's needed for milestones:
- **XORNet**: Just needs Dense, ReLU, basic Tensor ops
- **CIFAR-10 MLP**: Add DataLoader, Adam, CrossEntropyLoss
- **CIFAR-10 CNN**: Add Conv2D, MaxPool2D
- **TinyGPT**: Add Embeddings, Attention, LayerNorm, Generation
### What We Have (Good Extras)
- **More activation choices**: Good for experimentation
- **Better optimizers**: Adam converges faster than SGD
- **Training utilities**: Checkpointing, metrics (very practical!)
- **Profiling tools**: Help understand performance
### Missing Essentials
For Part III (TinyGPT) we still need to implement:
1. **Module 11**: Embedding layer, positional encoding
2. **Module 13**: LayerNorm
3. **Module 14**: TransformerBlock
4. **Module 15**: Generation strategies
### Verdict
The current modules have a good balance of essential + useful extras. The extras are:
- **Pedagogically valuable** (show alternatives)
- **Practically useful** (checkpointing, better optimizers)
- **Framework completeness** (makes TinyTorch feel real)
The only "bloat" might be multiple activation functions, but even those are good for showing students the options and tradeoffs.

View File

@@ -1,9 +1,10 @@
#!/usr/bin/env python3
"""
CIFAR-10 CNN Training - Using Conv2D
CIFAR-10 CNN Training - Using MultiChannelConv2D
Demonstrates the power of convolutions for image classification.
Should achieve better accuracy than MLP version.
Uses TinyTorch's multi-channel Conv2D implementation.
Should achieve better accuracy than MLP version (~60% vs 55%).
"""
import sys
@@ -16,56 +17,56 @@ from tinytorch.core.tensor import Tensor
from tinytorch.core.autograd import Variable
from tinytorch.core.layers import Dense
from tinytorch.core.activations import ReLU
from tinytorch.core.spatial import Conv2D, MaxPool2D
from tinytorch.core.spatial import MultiChannelConv2D, MaxPool2D, flatten
from tinytorch.core.training import CrossEntropyLoss
from tinytorch.core.optimizers import Adam
from tinytorch.core.dataloader import DataLoader, CIFAR10Dataset
class SimpleCNN:
"""CNN for CIFAR-10 using Conv2D layers."""
"""CNN for CIFAR-10 using multi-channel Conv2D layers.
Architecture:
- Conv(3→32) → ReLU → Pool(2x2) → 32@15x15
- Conv(32→64) → ReLU → Pool(2x2) → 64@6x6
- Flatten → Dense(2304→128) → ReLU
- Dense(128→10) → Softmax (via CrossEntropyLoss)
"""
def __init__(self):
# Convolutional layers
self.conv1 = Conv2D(in_channels=3, out_channels=32, kernel_size=3, padding=1)
self.conv2 = Conv2D(in_channels=32, out_channels=64, kernel_size=3, padding=1)
self.conv3 = Conv2D(in_channels=64, out_channels=128, kernel_size=3, padding=1)
# Convolutional layers using MultiChannelConv2D
# Note: No padding support yet, so output sizes will be smaller
self.conv1 = MultiChannelConv2D(in_channels=3, out_channels=32, kernel_size=(3, 3))
self.conv2 = MultiChannelConv2D(in_channels=32, out_channels=64, kernel_size=(3, 3))
# Pooling layers
self.pool = MaxPool2D(kernel_size=2, stride=2)
self.pool = MaxPool2D(pool_size=(2, 2))
# Calculate size after convolutions and pooling
# 32x32 -> pool -> 16x16 -> pool -> 8x8 -> pool -> 4x4
# 128 channels * 4 * 4 = 2048
# Input: 3@32x32
# After conv1 (3x3): 32@30x30
# After pool1 (2x2): 32@15x15
# After conv2 (3x3): 64@13x13
# After pool2 (2x2): 64@6x6
# Flattened: 64 * 6 * 6 = 2304
# Fully connected layers
self.fc1 = Dense(128 * 4 * 4, 256)
self.fc2 = Dense(256, 10)
self.fc1 = Dense(64 * 6 * 6, 128)
self.fc2 = Dense(128, 10)
self.relu = ReLU()
# Collect all layers with parameters
self.conv_layers = [self.conv1, self.conv2, self.conv3]
self.conv_layers = [self.conv1, self.conv2]
self.fc_layers = [self.fc1, self.fc2]
# Initialize weights
self._initialize_weights()
# Initialize weights (already done in MultiChannelConv2D with He init)
self._initialize_fc_weights()
def _initialize_weights(self):
"""Initialize weights with proper scaling."""
# Conv layers - He initialization
for conv in self.conv_layers:
fan_in = conv.weight.shape[1] * conv.weight.shape[2] * conv.weight.shape[3]
std = np.sqrt(2.0 / fan_in)
conv.weight._data = np.random.randn(*conv.weight.shape).astype(np.float32) * std
if conv.bias is not None:
conv.bias._data = np.zeros(conv.bias.shape, dtype=np.float32)
conv.weight = Variable(conv.weight.data, requires_grad=True)
if conv.bias is not None:
conv.bias = Variable(conv.bias.data, requires_grad=True)
# FC layers
def _initialize_fc_weights(self):
"""Initialize fully connected layer weights."""
for i, layer in enumerate(self.fc_layers):
fan_in = layer.weights.shape[0]
# Use smaller std for output layer
std = 0.01 if i == len(self.fc_layers) - 1 else np.sqrt(2.0 / fan_in)
layer.weights._data = np.random.randn(*layer.weights.shape).astype(np.float32) * std
layer.bias._data = np.zeros(layer.bias.shape, dtype=np.float32)
@@ -73,81 +74,134 @@ class SimpleCNN:
layer.bias = Variable(layer.bias.data, requires_grad=True)
def forward(self, x):
"""Forward pass through CNN."""
# Reshape from (batch, 3072) to (batch, 3, 32, 32) if needed
batch_size = x.shape[0]
if len(x.shape) == 2:
x = x.reshape(batch_size, 3, 32, 32)
"""Forward pass through CNN.
# Conv block 1
h = self.relu(self.conv1(x))
h = self.pool(h) # 32x32 -> 16x16
Args:
x: Input tensor of shape (batch, 3, 32, 32) or flattened
Returns:
Logits of shape (batch, 10)
"""
batch_size = x.shape[0] if len(x.shape) > 1 else 1
# Conv block 2
h = self.relu(self.conv2(h))
h = self.pool(h) # 16x16 -> 8x8
# Reshape from flattened to image format if needed
if len(x.shape) == 2 and x.shape[1] == 3072:
# Reshape from (batch, 3072) to (batch, 3, 32, 32)
x_data = x.data if hasattr(x, 'data') else x._data
x_reshaped = x_data.reshape(batch_size, 3, 32, 32)
x = Tensor(x_reshaped) if not isinstance(x, Variable) else Variable(x_reshaped, x.requires_grad)
elif len(x.shape) == 2:
# Single flattened image
x_data = x.data if hasattr(x, 'data') else x._data
x_reshaped = x_data.reshape(3, 32, 32)
x = Tensor(x_reshaped) if not isinstance(x, Variable) else Variable(x_reshaped, x.requires_grad)
# Conv block 3
h = self.relu(self.conv3(h))
h = self.pool(h) # 8x8 -> 4x4
# Conv block 1: 3@32x32 → 32@30x30 → 32@15x15
h = self.conv1(x)
h = self.relu(h)
h = self.pool(h)
# Flatten for FC layers
h = h.reshape(batch_size, -1)
# Conv block 2: 32@15x15 → 64@13x13 → 64@6x6
h = self.conv2(h)
h = self.relu(h)
h = self.pool(h)
# FC layers
# Flatten for FC layers: 64@6x6 → 2304
h = flatten(h)
# FC layers: 2304 → 128 → 10
h = self.relu(self.fc1(h))
return self.fc2(h)
def parameters(self):
"""Get all trainable parameters."""
params = []
# Conv layer parameters
for conv in self.conv_layers:
params.append(conv.weight)
params.append(conv.weights)
if conv.bias is not None:
params.append(conv.bias)
# FC layer parameters
for fc in self.fc_layers:
params.extend([fc.weights, fc.bias])
return params
def count_parameters(self):
"""Count total number of parameters."""
total = 0
for p in self.parameters():
if hasattr(p, 'data'):
data = p.data if not hasattr(p.data, '_data') else p.data._data
total += np.prod(data.shape)
return total
def preprocess(images, training=True):
"""Preprocess CIFAR-10 images."""
batch_size = images.shape[0]
images_np = images.data if hasattr(images, 'data') else images._data
"""Preprocess CIFAR-10 images.
# Data augmentation for training
Args:
images: Raw image tensor
training: Whether to apply data augmentation
Returns:
Preprocessed tensor ready for CNN
"""
images_np = images.data if hasattr(images, 'data') else images._data
batch_size = images_np.shape[0]
# Data augmentation for training (horizontal flip)
if training:
augmented = np.copy(images_np)
for i in range(batch_size):
if np.random.random() > 0.5:
# Horizontal flip
augmented[i] = np.flip(augmented[i], axis=2)
# Flip the spatial dimensions (last axis for flattened, axis 2 for image format)
if len(augmented.shape) == 2:
# Flattened format: reshape, flip, flatten
img = augmented[i].reshape(3, 32, 32)
img = np.flip(img, axis=2)
augmented[i] = img.flatten()
else:
augmented[i] = np.flip(augmented[i], axis=2)
images_np = augmented
# Normalize
# Normalize (using CIFAR-10 statistics)
normalized = (images_np - 0.485) / 0.229
# Ensure correct shape for CNN: (batch, 3, 32, 32)
# Ensure correct shape for CNN
if len(normalized.shape) == 2:
# From flat to image format
batch_size = normalized.shape[0]
# From flat (batch, 3072) to image format (batch, 3, 32, 32)
normalized = normalized.reshape(batch_size, 3, 32, 32)
return Tensor(normalized.astype(np.float32))
def evaluate(model, dataloader, max_batches=30):
"""Evaluate model accuracy."""
"""Evaluate model accuracy.
Args:
model: CNN model
dataloader: Data loader
max_batches: Maximum number of batches to evaluate
Returns:
Accuracy as float between 0 and 1
"""
correct = total = 0
for batch_idx, (images, labels) in enumerate(dataloader):
if batch_idx >= max_batches:
break
# Preprocess and create Variable
x = Variable(preprocess(images, training=False), requires_grad=False)
# Forward pass
logits = model.forward(x)
# Get predictions
logits_np = logits.data._data if hasattr(logits.data, '_data') else logits.data
predictions = np.argmax(logits_np, axis=1)
labels_np = labels.data if hasattr(labels, 'data') else labels._data
# Count correct predictions
correct += np.sum(predictions == labels_np)
total += len(labels_np)
@@ -155,43 +209,51 @@ def evaluate(model, dataloader, max_batches=30):
def main():
print("="*60)
print("CIFAR-10 CNN Training - Convolutional Neural Network")
print("CIFAR-10 CNN Training - MultiChannelConv2D")
print("="*60)
print("\nUsing Conv2D layers for spatial feature extraction!")
print("Architecture: Conv2D -> Pool -> Conv2D -> Pool -> Conv2D -> Pool -> FC")
print("\n🧠 Using TinyTorch's multi-channel convolutions!")
print("Architecture: Conv(3→32) → Pool Conv(32→64) → Pool → Dense")
# Load data
print("\nLoading CIFAR-10 dataset...")
print("\n📚 Loading CIFAR-10 dataset...")
train_dataset = CIFAR10Dataset(train=True, root='data')
test_dataset = CIFAR10Dataset(train=False, root='data')
# Smaller batch size for memory efficiency with convolutions
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"Training samples: {len(train_dataset):,}")
print(f"Test samples: {len(test_dataset):,}")
# Create model
print("\nInitializing CNN model...")
print("\n🔧 Initializing CNN model...")
model = SimpleCNN()
print(f"Total parameters: {model.count_parameters():,}")
print(f" - Conv layers: {32*3*3*3 + 32 + 64*32*3*3 + 64:,} parameters")
print(f" - FC layers: {64*6*6*128 + 128 + 128*10 + 10:,} parameters")
# Loss and optimizer
loss_fn = CrossEntropyLoss()
optimizer = Adam(model.parameters(), lr=0.001)
# Training settings
epochs = 10
# Training settings (reduced for demo)
epochs = 5 # Reduced for faster demo
eval_every = 50
max_batches = 200 # Limit batches per epoch for demo
print(f"\nTraining for {epochs} epochs...")
print(f"\n🚀 Training for {epochs} epochs (limited to {max_batches} batches/epoch)...")
print("-" * 40)
# Training loop
best_accuracy = 0
for epoch in range(epochs):
start_time = time.time()
running_loss = 0
batches = 0
for batch_idx, (images, labels) in enumerate(train_loader):
if batch_idx >= 100: # Limit batches for quick demo
if batch_idx >= max_batches:
break
# Forward pass
@@ -209,41 +271,54 @@ def main():
running_loss += loss.data
batches += 1
# Evaluate periodically
# Periodic evaluation
if (batch_idx + 1) % eval_every == 0:
train_acc = evaluate(model, train_loader, max_batches=10)
test_acc = evaluate(model, test_loader, max_batches=20)
train_acc = evaluate(model, train_loader, max_batches=5)
test_acc = evaluate(model, test_loader, max_batches=10)
print(f"Epoch {epoch+1}, Batch {batch_idx+1}: "
f"Loss={running_loss/batches:.3f}, "
f"Train={train_acc:.1%}, Test={test_acc:.1%}")
if test_acc > best_accuracy:
best_accuracy = test_acc
# End of epoch evaluation
# End of epoch
epoch_time = time.time() - start_time
test_accuracy = evaluate(model, test_loader, max_batches=50)
print(f"\nEpoch {epoch+1} complete in {epoch_time:.1f}s - Test Accuracy: {test_accuracy:.1%}")
print(f"\nEpoch {epoch+1} complete in {epoch_time:.1f}s")
print(f" Test Accuracy: {test_accuracy:.1%}")
if test_accuracy > best_accuracy:
best_accuracy = test_accuracy
# Final evaluation
print("\n" + "="*60)
print("Final Evaluation")
print("📊 Final Evaluation")
print("-" * 40)
final_accuracy = evaluate(model, test_loader, max_batches=100)
print(f"Final Test Accuracy: {final_accuracy:.1%}")
print(f"Best Accuracy Achieved: {best_accuracy:.1%}")
# Compare with baselines
print("\n📊 Performance Comparison:")
print(f" Random Baseline: ~10%")
print(f" MLP (no conv): ~55%")
# Performance comparison
print("\n🎯 Performance Comparison:")
print(f" Random Baseline: ~10%")
print(f" MLP (no conv): ~55%")
print(f" CNN (with Conv2D): {final_accuracy:.1%} {'' if final_accuracy > 0.55 else ''}")
if final_accuracy > 0.55:
print("\n🎉 CNN outperforms MLP! Convolutions work!")
print("\n🎉 Success! CNN outperforms MLP!")
print(" Convolutions extract spatial features effectively!")
print("\n💡 Why CNNs work better for images:")
print(" - Conv2D learns spatial features")
print(" - Pooling provides translation invariance")
print(" - Hierarchical feature learning")
print(" - Parameter sharing reduces overfitting")
print(" - Conv2D learns spatial feature detectors")
print(" - Parameter sharing (same filter across image)")
print(" - Translation invariance from pooling")
print(" - Hierarchical feature learning (edges → shapes → objects)")
print("\n📈 Systems Insight:")
print(f" - Conv parameters: {32*3*3*3 + 64*32*3*3:,} (~{(32*3*3*3 + 64*32*3*3)*4/1024:.1f} KB)")
print(f" - MLP equivalent: {3072*1024:,} (~{3072*1024*4/1024/1024:.1f} MB)")
print(" - Parameter reduction: {(1 - (32*3*3*3 + 64*32*3*3)/(3072*1024)):.1%}")
if __name__ == "__main__":
main()

File diff suppressed because it is too large Load Diff

44
test_15_modules.py Normal file
View File

@@ -0,0 +1,44 @@
#!/usr/bin/env python3
"""Test the final 15-module structure."""
import subprocess
import sys
from pathlib import Path
def test_module(module_path):
"""Test a single module."""
py_files = list(module_path.glob("*_dev.py"))
if not py_files:
return None
result = subprocess.run([sys.executable, str(py_files[0])],
capture_output=True, timeout=10, cwd=Path.cwd())
return result.returncode == 0
print("="*60)
print("TinyTorch 15-Module Structure Test")
print("="*60)
modules_dir = Path("modules/source")
parts = [
("Part I: MLPs (XORNet)", ["01_setup", "02_tensor", "03_activations", "04_layers", "05_networks"]),
("Part II: CNNs (CIFAR-10)", ["06_spatial", "07_dataloader", "08_autograd", "09_optimizers", "10_training"]),
("Part III: Transformers (TinyGPT)", ["11_embeddings", "12_attention", "13_normalization", "14_transformers", "15_generation"])
]
for part_name, modules in parts:
print(f"\n{part_name}")
print("-"*40)
for module in modules:
path = modules_dir / module
if not path.exists():
print(f" ⚠️ {module:20} Missing")
elif test_module(path):
print(f"{module:20} Passes")
elif test_module(path) is None:
print(f" ⚠️ {module:20} No implementation")
else:
print(f"{module:20} Failed")
print("\n" + "="*60)
print("✨ Clean 15-module structure ready!")
print("Each part: 5 modules, 1 innovation, 1 capstone")