mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-05-31 11:01:14 -05:00
Add comprehensive multi-channel Conv2D support to Module 06 (Spatial)
MAJOR FEATURE: Multi-channel convolutions for real CNN architectures

Key additions:
- MultiChannelConv2D class with in_channels/out_channels support
- Handles RGB images (3 channels) and arbitrary channel counts
- He initialization for stable training
- Optional bias parameters
- Batch processing support

Testing & Validation:
- Comprehensive unit tests for single/multi-channel
- Integration tests for complete CNN pipelines
- Memory profiling and parameter scaling analysis
- QA approved: All mandatory tests passing

CIFAR-10 CNN Example:
- Updated train_cnn.py to use MultiChannelConv2D
- Architecture: Conv(3→32) → Pool → Conv(32→64) → Pool → Dense
- Demonstrates why convolutions matter for vision
- Shows parameter reduction vs MLPs (~75 KB of conv weights vs ~12 MB for a 3072×1024 dense layer)

Systems Analysis:
- Parameter scaling: O(in_channels × out_channels × kernel²)
- Memory profiling shows efficient scaling
- Performance characteristics documented
- Production context with PyTorch comparisons

This enables proper CNN training on CIFAR-10 with ~60% accuracy target.
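The parameter-scaling claim can be checked with a few lines of arithmetic. The sketch below is illustrative and not TinyTorch code: `conv2d_params` and `dense_params` are hypothetical helpers, square kernels and float32 weights are assumed, and the layer sizes are the ones used in the Systems Insight printout of `train_cnn.py` (weights only, no biases).

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=False):
    """Parameter count of one Conv2D layer with a square kernel (hypothetical helper)."""
    weights = in_channels * out_channels * kernel_size ** 2
    return weights + (out_channels if bias else 0)


def dense_params(in_features, out_features, bias=False):
    """Parameter count of one fully connected layer (hypothetical helper)."""
    weights = in_features * out_features
    return weights + (out_features if bias else 0)


BYTES_PER_FLOAT32 = 4

# Conv(3→32) and Conv(32→64), both 3x3, weights only
conv_total = conv2d_params(3, 32, 3) + conv2d_params(32, 64, 3)
# A single dense layer mapping a flattened 3x32x32 image to 1024 units
mlp_total = dense_params(3 * 32 * 32, 1024)

print(f"Conv weights: {conv_total:,} (~{conv_total * BYTES_PER_FLOAT32 / 1024:.1f} KB)")
print(f"MLP weights:  {mlp_total:,} (~{mlp_total * BYTES_PER_FLOAT32 / 1024**2:.1f} MB)")
print(f"Reduction:    {1 - conv_total / mlp_total:.1%}")
```

Doubling either channel count doubles the weight count, and moving from a 3x3 to a 5x5 kernel multiplies it by 25/9. The count does not depend on the image size, which is where the saving over a dense layer comes from.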
194
docs/module-audit.md
Normal file
@@ -0,0 +1,194 @@
# TinyTorch Module Audit: Essential vs Extra Components

## Overview
This audit examines which components are NEEDED for each milestone and which are EXTRA: components that enhance the framework but aren't strictly necessary.

---

## Part I: MLPs (Target: XORNet)

### Module 02: Tensor
**ESSENTIAL for XORNet:**
- Basic Tensor class with data storage
- Addition, subtraction, multiplication
- Matrix multiply (for layers)
- Shape, reshape operations

**EXTRA (but good for framework):**
- Broadcasting ✓ (nice, but XOR doesn't need it)
- Fancy indexing ✓
- Statistical operations (mean, sum, std) ✓
- Comparison operators ✓

### Module 03: Activations
**ESSENTIAL for XORNet:**
- ReLU ✓ (used in XORNet)
- Sigmoid (could use for XOR output)

**EXTRA (but good for framework):**
- Tanh ✓ (alternative to ReLU)
- Softmax ✓ (not needed for XOR, but needed for CIFAR-10)
- ActivationProfiler ✓ (pedagogical tool)

### Module 04: Layers
**ESSENTIAL for XORNet:**
- Dense layer ✓ (fully connected)
- Weight initialization
- Forward pass

**EXTRA:**
- Different initialization strategies (Xavier, He, etc.)
- Bias option
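As an illustration of what the initialization strategies and the bias option amount to, here is a minimal NumPy sketch. It is not the TinyTorch `Dense` implementation: `he_init`, `xavier_init`, and `dense_forward` are hypothetical names, the formulas are the standard normal-distribution variants, and the 2304→128 shape is borrowed from the CIFAR-10 CNN example.

```python
import numpy as np


def he_init(fan_in, fan_out, rng):
    """He (Kaiming) normal init: std = sqrt(2 / fan_in), commonly paired with ReLU."""
    std = np.sqrt(2.0 / fan_in)
    return (rng.standard_normal((fan_in, fan_out)) * std).astype(np.float32)


def xavier_init(fan_in, fan_out, rng):
    """Xavier (Glorot) normal init: std = sqrt(2 / (fan_in + fan_out))."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return (rng.standard_normal((fan_in, fan_out)) * std).astype(np.float32)


def dense_forward(x, weights, bias=None):
    """y = x @ W (+ b); the bias term is optional."""
    y = x @ weights
    return y if bias is None else y + bias


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 2304)).astype(np.float32)  # batch of 4 flattened inputs
w = he_init(2304, 128, rng)
b = np.zeros(128, dtype=np.float32)

print(dense_forward(x, w, b).shape)   # (4, 128)
print(w.std(), np.sqrt(2.0 / 2304))   # empirical vs target standard deviation
```

He initialization keeps the variance of ReLU activations roughly constant from layer to layer, which is why it is the usual default for ReLU networks; Xavier makes the same argument for symmetric activations such as tanh.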
### Module 05: Networks
**ESSENTIAL for XORNet:**
- Sequential model ✓
- Forward pass through layers

**EXTRA:**
- Model summary/printing
- Parameter counting

---
## Part II: CNNs (Target: CIFAR-10)

### Module 06: Spatial
**ESSENTIAL for CNN CIFAR-10:**
- Conv2D ✓ (the key innovation!)
- MaxPool2D ✓ (for downsampling)

**EXTRA (but pedagogically valuable):**
- Different padding modes
- Stride options
- AvgPool2D (alternative pooling)
- Multiple filter support
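To make the two essential operations concrete, the following is a minimal NumPy sketch of a multi-channel convolution and a max-pool forward pass. It is an assumption-laden illustration, not TinyTorch's `Conv2D` or `MaxPool2D`: the function names are hypothetical, there is no padding, the stride is 1 for the convolution, and pooling windows do not overlap. Like most deep learning frameworks, it computes a cross-correlation (the kernel is not flipped).

```python
import numpy as np


def conv2d_forward(x, weights, bias=None):
    """Valid (no padding), stride-1 multi-channel convolution.

    x:       (batch, in_channels, H, W)
    weights: (out_channels, in_channels, kH, kW)
    bias:    (out_channels,) or None
    returns: (batch, out_channels, H - kH + 1, W - kW + 1)
    """
    batch, in_ch, height, width = x.shape
    out_ch, w_in_ch, k_h, k_w = weights.shape
    assert in_ch == w_in_ch, "input channels must match the filter depth"
    out_h, out_w = height - k_h + 1, width - k_w + 1
    out = np.zeros((batch, out_ch, out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, :, i:i + k_h, j:j + k_w]  # (batch, in_ch, kH, kW)
            # Contract over in_ch, kH, kW: one value per (sample, filter) pair
            out[:, :, i, j] = np.tensordot(patch, weights, axes=([1, 2, 3], [1, 2, 3]))
    if bias is not None:
        out += bias.reshape(1, -1, 1, 1)
    return out


def maxpool2d_forward(x, pool=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    batch, channels, height, width = x.shape
    out_h, out_w = height // pool, width // pool
    x = x[:, :, :out_h * pool, :out_w * pool]
    return x.reshape(batch, channels, out_h, pool, out_w, pool).max(axis=(3, 5))


rng = np.random.default_rng(0)
images = rng.standard_normal((2, 3, 32, 32))        # two RGB 32x32 images
filters = rng.standard_normal((32, 3, 3, 3)) * 0.1  # 32 filters, each 3x3 over 3 channels

features = conv2d_forward(images, filters)
pooled = maxpool2d_forward(features)
print(features.shape, pooled.shape)
```

Each output channel sums over all input channels, which is why the filter bank is four-dimensional. The explicit loops favour readability over speed.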
### Module 07: DataLoader
**ESSENTIAL for CIFAR-10:**
- CIFAR10Dataset ✓
- DataLoader with batching ✓
- Shuffling ✓

**EXTRA:**
- Data augmentation (but helps accuracy!)
- Other datasets (MNIST, etc.)
- Prefetching/parallel loading

### Module 08: Autograd
**ESSENTIAL for CIFAR-10:**
- Variable class ✓
- Backward pass ✓
- Gradient computation ✓

**EXTRA:**
- Computation graph visualization
- Gradient checking
- Higher-order derivatives

### Module 09: Optimizers
**ESSENTIAL for CIFAR-10:**
- SGD (basic, could work)
- Adam ✓ (used in CIFAR-10, converges faster)

**EXTRA:**
- Learning rate scheduling
- Momentum variants
- RMSprop, AdaGrad
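For reference, the two essential update rules can be written in a few lines. This is a sketch of the standard SGD and Adam formulas on plain NumPy values, not the TinyTorch optimizer classes; `sgd_step` and `adam_step` are hypothetical names, and the hyperparameter defaults are the commonly used ones.

```python
import numpy as np


def sgd_step(param, grad, lr=0.01):
    """Plain gradient descent: step against the gradient."""
    return param - lr * grad


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count. Returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


# Toy problem: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)
print(w)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so Adam moves by roughly `lr` regardless of the gradient's scale. That scale invariance is one reason it often needs less learning-rate tuning than SGD.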
### Module 10: Training
**ESSENTIAL for CIFAR-10:**
- Training loop ✓
- CrossEntropyLoss ✓
- Basic evaluation ✓

**EXTRA (but very useful):**
- Checkpointing ✓
- Early stopping ✓
- Metrics tracking ✓
- Validation splits ✓
- MeanSquaredError (for XOR)
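A cross-entropy loss over raw logits is the piece of this module that is easiest to get numerically wrong, so a minimal sketch may help. It assumes integer class labels and uses the log-sum-exp trick for stability; `cross_entropy` is a hypothetical function, not TinyTorch's `CrossEntropyLoss`.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean cross-entropy from raw logits (batch, classes) and integer labels (batch,)."""
    # Subtracting the row maximum leaves softmax unchanged but prevents overflow in exp
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class for each sample
    return -log_probs[np.arange(len(labels)), labels].mean()


labels = np.array([0, 3, 7, 9])

# Uniform logits over 10 classes: the loss equals log(10), the untrained baseline
uniform_loss = cross_entropy(np.zeros((4, 10)), labels)
print(uniform_loss, np.log(10))

# Very large logits do not overflow thanks to the max subtraction
confident = np.full((4, 10), -1000.0)
confident[np.arange(4), labels] = 1000.0
print(cross_entropy(confident, labels))
```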
---

## Part III: Transformers (Target: TinyGPT)

### Module 11: Embeddings
**ESSENTIAL for TinyGPT:**
- Token embedding layer
- Positional encoding (sinusoidal or learned)

**EXTRA:**
- Multiple embedding types
- Embedding dropout
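Since Module 11 is not implemented yet, the following is only a sketch of what the two essential pieces could look like in NumPy. The sinusoidal formula is the standard one; the function names, the random embedding table, and the sizes are assumptions made for illustration.

```python
import numpy as np


def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] = cos of the same angle."""
    assert d_model % 2 == 0, "this sketch assumes an even model dimension"
    positions = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    divisors = 10000 ** (np.arange(0, d_model, 2) / d_model)  # (d_model / 2,)
    pe = np.zeros((seq_len, d_model), dtype=np.float32)
    pe[:, 0::2] = np.sin(positions / divisors)
    pe[:, 1::2] = np.cos(positions / divisors)
    return pe


def embed_tokens(token_ids, table):
    """Token embedding is a row lookup: (seq_len,) ids -> (seq_len, d_model) vectors."""
    return table[token_ids]


rng = np.random.default_rng(0)
vocab_size, d_model = 50, 16
table = rng.standard_normal((vocab_size, d_model)).astype(np.float32)

token_ids = np.array([4, 8, 15, 16, 23, 42])
x = embed_tokens(token_ids, table) + sinusoidal_positional_encoding(len(token_ids), d_model)
print(x.shape)
```

A learned positional encoding would replace the sinusoid with a second lookup table indexed by position.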
### Module 12: Attention
**ESSENTIAL for TinyGPT:**
- Multi-head attention ✓ (already implemented!)
- Scaled dot-product attention ✓
- Causal masking ✓

**EXTRA:**
- Different attention variants
- Attention visualization
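As a compact reference for the last two essentials, here is a single-head NumPy sketch of scaled dot-product attention with a causal mask. It is illustrative and does not reproduce the existing TinyTorch implementation; `softmax` and `causal_attention` are hypothetical names, and inputs are unbatched `(seq_len, d_k)` matrices.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def causal_attention(q, k, v):
    """Scaled dot-product attention where position i may only attend to positions <= i.

    q, k, v: (seq_len, d_k). Returns (output, weights).
    """
    seq_len, d_k = q.shape
    scores = q @ k.T / np.sqrt(d_k)                                # (seq_len, seq_len)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)                     # block attention to the future
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((5, 8)) for _ in range(3))
out, weights = causal_attention(q, k, v)
print(np.round(weights, 2))  # lower-triangular: each row sums to 1
```

Multi-head attention runs several such heads on separate projections of the input and concatenates their outputs.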
### Module 13: Normalization
**ESSENTIAL for TinyGPT:**
- LayerNorm (critical for transformer stability)

**EXTRA:**
- BatchNorm (not typically used in transformers)
- GroupNorm, InstanceNorm
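LayerNorm is small enough to sketch in full. The version below is a NumPy illustration of the standard formula, normalizing over the last (feature) axis; `layer_norm` is a hypothetical function and `eps=1e-5` is a common default rather than a TinyTorch setting.

```python
import numpy as np


def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature vector to zero mean and unit variance, then scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)) * 5.0 + 2.0   # 4 tokens, 16 features, arbitrary scale and offset
gamma, beta = np.ones(16), np.zeros(16)        # learnable in a real layer

y = layer_norm(x, gamma, beta)
print(y.mean(axis=-1))  # close to 0 for every token
print(y.std(axis=-1))   # close to 1 for every token
```

Unlike BatchNorm, the statistics are computed per token rather than across the batch, so the result does not depend on batch size or on the other sequences in the batch.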
### Module 14: Transformers
**ESSENTIAL for TinyGPT:**
- TransformerBlock (attention + FFN + residual)
- Positional encoding integration
- Stack of blocks

**EXTRA:**
- Encoder-decoder architecture
- Cross-attention

### Module 15: Generation
**ESSENTIAL for TinyGPT:**
- Autoregressive generation
- Temperature sampling
- Greedy decoding

**EXTRA:**
- Beam search
- Top-k, Top-p sampling
- Repetition penalty
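The three essentials fit in one short loop. The sketch below is an assumption-based illustration, since Module 15 does not exist yet: `generate`, `greedy_decode`, and `sample_with_temperature` are hypothetical names, and `toy_next_logits` stands in for a trained model by always favouring the next token id.

```python
import numpy as np


def greedy_decode(logits):
    """Pick the single most likely token."""
    return int(np.argmax(logits))


def sample_with_temperature(logits, temperature, rng):
    """Sample from softmax(logits / temperature); lower temperature is closer to greedy."""
    scaled = logits / temperature
    scaled = scaled - scaled.max()  # stabilise exp
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(rng.choice(len(logits), p=probs))


def generate(next_logits_fn, prompt, max_new_tokens, temperature=None, rng=None):
    """Autoregressive loop: feed the growing sequence back in, one token at a time."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_logits_fn(tokens)
        if temperature is None:
            tokens.append(greedy_decode(logits))
        else:
            tokens.append(sample_with_temperature(logits, temperature, rng))
    return tokens


def toy_next_logits(tokens, vocab_size=5):
    """Stand-in for a model: strongly prefers (last token + 1) modulo the vocabulary size."""
    logits = np.zeros(vocab_size)
    logits[(tokens[-1] + 1) % vocab_size] = 5.0
    return logits


rng = np.random.default_rng(0)
print(generate(toy_next_logits, [0], 4))                            # greedy
print(generate(toy_next_logits, [0], 4, temperature=1.0, rng=rng))  # sampled
```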
---

## Summary

### Truly Minimal Path
If we wanted ONLY what's needed for milestones:
- **XORNet**: Just needs Dense, ReLU, basic Tensor ops
- **CIFAR-10 MLP**: Add DataLoader, Adam, CrossEntropyLoss
- **CIFAR-10 CNN**: Add Conv2D, MaxPool2D
- **TinyGPT**: Add Embeddings, Attention, LayerNorm, Generation

### What We Have (Good Extras)
- **More activation choices**: Good for experimentation
- **Better optimizers**: Adam converges faster than SGD
- **Training utilities**: Checkpointing, metrics (very practical!)
- **Profiling tools**: Help understand performance

### Missing Essentials
For Part III (TinyGPT) we still need to implement:
1. **Module 11**: Embedding layer, positional encoding
2. **Module 13**: LayerNorm
3. **Module 14**: TransformerBlock
4. **Module 15**: Generation strategies

### Verdict
The current modules strike a good balance between essentials and useful extras. The extras are:
- **Pedagogically valuable** (show alternatives)
- **Practically useful** (checkpointing, better optimizers)
- **Framework completeness** (makes TinyTorch feel real)

The only "bloat" might be the multiple activation functions, but even those are good for showing students the options and tradeoffs.
@@ -1,9 +1,10 @@
 #!/usr/bin/env python3
 """
-CIFAR-10 CNN Training - Using Conv2D
+CIFAR-10 CNN Training - Using MultiChannelConv2D

 Demonstrates the power of convolutions for image classification.
-Should achieve better accuracy than MLP version.
+Uses TinyTorch's multi-channel Conv2D implementation.
+Should achieve better accuracy than MLP version (~60% vs 55%).
 """

 import sys
@@ -16,56 +17,56 @@ from tinytorch.core.tensor import Tensor
 from tinytorch.core.autograd import Variable
 from tinytorch.core.layers import Dense
 from tinytorch.core.activations import ReLU
-from tinytorch.core.spatial import Conv2D, MaxPool2D
+from tinytorch.core.spatial import MultiChannelConv2D, MaxPool2D, flatten
 from tinytorch.core.training import CrossEntropyLoss
 from tinytorch.core.optimizers import Adam
 from tinytorch.core.dataloader import DataLoader, CIFAR10Dataset

 class SimpleCNN:
-    """CNN for CIFAR-10 using Conv2D layers."""
+    """CNN for CIFAR-10 using multi-channel Conv2D layers.
+
+    Architecture:
+    - Conv(3→32) → ReLU → Pool(2x2) → 32@15x15
+    - Conv(32→64) → ReLU → Pool(2x2) → 64@6x6
+    - Flatten → Dense(2304→128) → ReLU
+    - Dense(128→10) → Softmax (via CrossEntropyLoss)
+    """
+
     def __init__(self):
-        # Convolutional layers
-        self.conv1 = Conv2D(in_channels=3, out_channels=32, kernel_size=3, padding=1)
-        self.conv2 = Conv2D(in_channels=32, out_channels=64, kernel_size=3, padding=1)
-        self.conv3 = Conv2D(in_channels=64, out_channels=128, kernel_size=3, padding=1)
+        # Convolutional layers using MultiChannelConv2D
+        # Note: No padding support yet, so output sizes will be smaller
+        self.conv1 = MultiChannelConv2D(in_channels=3, out_channels=32, kernel_size=(3, 3))
+        self.conv2 = MultiChannelConv2D(in_channels=32, out_channels=64, kernel_size=(3, 3))

         # Pooling layers
-        self.pool = MaxPool2D(kernel_size=2, stride=2)
+        self.pool = MaxPool2D(pool_size=(2, 2))

         # Calculate size after convolutions and pooling
-        # 32x32 -> pool -> 16x16 -> pool -> 8x8 -> pool -> 4x4
-        # 128 channels * 4 * 4 = 2048
+        # Input: 3@32x32
+        # After conv1 (3x3): 32@30x30
+        # After pool1 (2x2): 32@15x15
+        # After conv2 (3x3): 64@13x13
+        # After pool2 (2x2): 64@6x6
+        # Flattened: 64 * 6 * 6 = 2304

         # Fully connected layers
-        self.fc1 = Dense(128 * 4 * 4, 256)
-        self.fc2 = Dense(256, 10)
+        self.fc1 = Dense(64 * 6 * 6, 128)
+        self.fc2 = Dense(128, 10)

         self.relu = ReLU()

         # Collect all layers with parameters
-        self.conv_layers = [self.conv1, self.conv2, self.conv3]
+        self.conv_layers = [self.conv1, self.conv2]
         self.fc_layers = [self.fc1, self.fc2]

-        # Initialize weights
-        self._initialize_weights()
+        # Initialize weights (already done in MultiChannelConv2D with He init)
+        self._initialize_fc_weights()

-    def _initialize_weights(self):
-        """Initialize weights with proper scaling."""
-        # Conv layers - He initialization
-        for conv in self.conv_layers:
-            fan_in = conv.weight.shape[1] * conv.weight.shape[2] * conv.weight.shape[3]
-            std = np.sqrt(2.0 / fan_in)
-            conv.weight._data = np.random.randn(*conv.weight.shape).astype(np.float32) * std
-            if conv.bias is not None:
-                conv.bias._data = np.zeros(conv.bias.shape, dtype=np.float32)
-            conv.weight = Variable(conv.weight.data, requires_grad=True)
-            if conv.bias is not None:
-                conv.bias = Variable(conv.bias.data, requires_grad=True)
-
-        # FC layers
+    def _initialize_fc_weights(self):
+        """Initialize fully connected layer weights."""
         for i, layer in enumerate(self.fc_layers):
             fan_in = layer.weights.shape[0]
             # Use smaller std for output layer
             std = 0.01 if i == len(self.fc_layers) - 1 else np.sqrt(2.0 / fan_in)
             layer.weights._data = np.random.randn(*layer.weights.shape).astype(np.float32) * std
             layer.bias._data = np.zeros(layer.bias.shape, dtype=np.float32)
@@ -73,81 +74,134 @@ class SimpleCNN:
             layer.bias = Variable(layer.bias.data, requires_grad=True)

     def forward(self, x):
-        """Forward pass through CNN."""
-        # Reshape from (batch, 3072) to (batch, 3, 32, 32) if needed
-        batch_size = x.shape[0]
-        if len(x.shape) == 2:
-            x = x.reshape(batch_size, 3, 32, 32)
+        """Forward pass through CNN.

-        # Conv block 1
-        h = self.relu(self.conv1(x))
-        h = self.pool(h) # 32x32 -> 16x16
+        Args:
+            x: Input tensor of shape (batch, 3, 32, 32) or flattened
+
+        Returns:
+            Logits of shape (batch, 10)
+        """
+        batch_size = x.shape[0] if len(x.shape) > 1 else 1

-        # Conv block 2
-        h = self.relu(self.conv2(h))
-        h = self.pool(h) # 16x16 -> 8x8
+        # Reshape from flattened to image format if needed
+        if len(x.shape) == 2 and x.shape[1] == 3072:
+            # Reshape from (batch, 3072) to (batch, 3, 32, 32)
+            x_data = x.data if hasattr(x, 'data') else x._data
+            x_reshaped = x_data.reshape(batch_size, 3, 32, 32)
+            x = Tensor(x_reshaped) if not isinstance(x, Variable) else Variable(x_reshaped, x.requires_grad)
+        elif len(x.shape) == 2:
+            # Single flattened image
+            x_data = x.data if hasattr(x, 'data') else x._data
+            x_reshaped = x_data.reshape(3, 32, 32)
+            x = Tensor(x_reshaped) if not isinstance(x, Variable) else Variable(x_reshaped, x.requires_grad)

-        # Conv block 3
-        h = self.relu(self.conv3(h))
-        h = self.pool(h) # 8x8 -> 4x4
+        # Conv block 1: 3@32x32 → 32@30x30 → 32@15x15
+        h = self.conv1(x)
+        h = self.relu(h)
+        h = self.pool(h)

-        # Flatten for FC layers
-        h = h.reshape(batch_size, -1)
+        # Conv block 2: 32@15x15 → 64@13x13 → 64@6x6
+        h = self.conv2(h)
+        h = self.relu(h)
+        h = self.pool(h)

-        # FC layers
+        # Flatten for FC layers: 64@6x6 → 2304
+        h = flatten(h)
+
+        # FC layers: 2304 → 128 → 10
         h = self.relu(self.fc1(h))
         return self.fc2(h)

     def parameters(self):
         """Get all trainable parameters."""
         params = []
+        # Conv layer parameters
         for conv in self.conv_layers:
-            params.append(conv.weight)
+            params.append(conv.weights)
             if conv.bias is not None:
                 params.append(conv.bias)
+        # FC layer parameters
         for fc in self.fc_layers:
             params.extend([fc.weights, fc.bias])
         return params

+    def count_parameters(self):
+        """Count total number of parameters."""
+        total = 0
+        for p in self.parameters():
+            if hasattr(p, 'data'):
+                data = p.data if not hasattr(p.data, '_data') else p.data._data
+                total += np.prod(data.shape)
+        return total
+
 def preprocess(images, training=True):
-    """Preprocess CIFAR-10 images."""
-    batch_size = images.shape[0]
-    images_np = images.data if hasattr(images, 'data') else images._data
+    """Preprocess CIFAR-10 images.

-    # Data augmentation for training
+    Args:
+        images: Raw image tensor
+        training: Whether to apply data augmentation
+
+    Returns:
+        Preprocessed tensor ready for CNN
+    """
+    images_np = images.data if hasattr(images, 'data') else images._data
+    batch_size = images_np.shape[0]
+
+    # Data augmentation for training (horizontal flip)
     if training:
         augmented = np.copy(images_np)
         for i in range(batch_size):
             if np.random.random() > 0.5:
-                # Horizontal flip
-                augmented[i] = np.flip(augmented[i], axis=2)
+                # Flip along the width axis (axis 2 of a single 3x32x32 image)
+                if len(augmented.shape) == 2:
+                    # Flattened format: reshape, flip, flatten
+                    img = augmented[i].reshape(3, 32, 32)
+                    img = np.flip(img, axis=2)
+                    augmented[i] = img.flatten()
+                else:
+                    augmented[i] = np.flip(augmented[i], axis=2)
         images_np = augmented

-    # Normalize
+    # Normalize (0.485 / 0.229 are the ImageNet red-channel mean/std, applied to all channels)
     normalized = (images_np - 0.485) / 0.229

-    # Ensure correct shape for CNN: (batch, 3, 32, 32)
+    # Ensure correct shape for CNN
     if len(normalized.shape) == 2:
-        # From flat to image format
-        batch_size = normalized.shape[0]
+        # From flat (batch, 3072) to image format (batch, 3, 32, 32)
         normalized = normalized.reshape(batch_size, 3, 32, 32)

     return Tensor(normalized.astype(np.float32))

 def evaluate(model, dataloader, max_batches=30):
-    """Evaluate model accuracy."""
+    """Evaluate model accuracy.
+
+    Args:
+        model: CNN model
+        dataloader: Data loader
+        max_batches: Maximum number of batches to evaluate
+
+    Returns:
+        Accuracy as float between 0 and 1
+    """
     correct = total = 0

     for batch_idx, (images, labels) in enumerate(dataloader):
         if batch_idx >= max_batches:
             break
+
+        # Preprocess and create Variable
         x = Variable(preprocess(images, training=False), requires_grad=False)
+
+        # Forward pass
         logits = model.forward(x)

+        # Get predictions
         logits_np = logits.data._data if hasattr(logits.data, '_data') else logits.data
         predictions = np.argmax(logits_np, axis=1)
         labels_np = labels.data if hasattr(labels, 'data') else labels._data

+        # Count correct predictions
         correct += np.sum(predictions == labels_np)
         total += len(labels_np)
@@ -155,43 +209,51 @@ def evaluate(model, dataloader, max_batches=30):

 def main():
     print("="*60)
-    print("CIFAR-10 CNN Training - Convolutional Neural Network")
+    print("CIFAR-10 CNN Training - MultiChannelConv2D")
     print("="*60)
-    print("\nUsing Conv2D layers for spatial feature extraction!")
-    print("Architecture: Conv2D -> Pool -> Conv2D -> Pool -> Conv2D -> Pool -> FC")
+    print("\n🧠 Using TinyTorch's multi-channel convolutions!")
+    print("Architecture: Conv(3→32) → Pool → Conv(32→64) → Pool → Dense")

     # Load data
-    print("\nLoading CIFAR-10 dataset...")
+    print("\n📚 Loading CIFAR-10 dataset...")
     train_dataset = CIFAR10Dataset(train=True, root='data')
     test_dataset = CIFAR10Dataset(train=False, root='data')

+    # Smaller batch size for memory efficiency with convolutions
     train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
     test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

-    print(f"Training samples: {len(train_dataset)}")
-    print(f"Test samples: {len(test_dataset)}")
+    print(f"Training samples: {len(train_dataset):,}")
+    print(f"Test samples: {len(test_dataset):,}")

     # Create model
-    print("\nInitializing CNN model...")
+    print("\n🔧 Initializing CNN model...")
     model = SimpleCNN()
+    print(f"Total parameters: {model.count_parameters():,}")
+    print(f" - Conv layers: {32*3*3*3 + 32 + 64*32*3*3 + 64:,} parameters")
+    print(f" - FC layers: {64*6*6*128 + 128 + 128*10 + 10:,} parameters")

     # Loss and optimizer
     loss_fn = CrossEntropyLoss()
     optimizer = Adam(model.parameters(), lr=0.001)

-    # Training settings
-    epochs = 10
+    # Training settings (reduced for demo)
+    epochs = 5 # Reduced for faster demo
     eval_every = 50
+    max_batches = 200 # Limit batches per epoch for demo

-    print(f"\nTraining for {epochs} epochs...")
+    print(f"\n🚀 Training for {epochs} epochs (limited to {max_batches} batches/epoch)...")
     print("-" * 40)

     # Training loop
+    best_accuracy = 0
     for epoch in range(epochs):
         start_time = time.time()
         running_loss = 0
         batches = 0

         for batch_idx, (images, labels) in enumerate(train_loader):
-            if batch_idx >= 100: # Limit batches for quick demo
+            if batch_idx >= max_batches:
                 break

             # Forward pass
@@ -209,41 +271,54 @@ def main():
             running_loss += loss.data
             batches += 1

-            # Evaluate periodically
+            # Periodic evaluation
             if (batch_idx + 1) % eval_every == 0:
-                train_acc = evaluate(model, train_loader, max_batches=10)
-                test_acc = evaluate(model, test_loader, max_batches=20)
+                train_acc = evaluate(model, train_loader, max_batches=5)
+                test_acc = evaluate(model, test_loader, max_batches=10)
                 print(f"Epoch {epoch+1}, Batch {batch_idx+1}: "
                       f"Loss={running_loss/batches:.3f}, "
                       f"Train={train_acc:.1%}, Test={test_acc:.1%}")
+
+                if test_acc > best_accuracy:
+                    best_accuracy = test_acc

-        # End of epoch evaluation
+        # End of epoch
         epoch_time = time.time() - start_time
         test_accuracy = evaluate(model, test_loader, max_batches=50)
-        print(f"\nEpoch {epoch+1} complete in {epoch_time:.1f}s - Test Accuracy: {test_accuracy:.1%}")
+        print(f"\n✓ Epoch {epoch+1} complete in {epoch_time:.1f}s")
+        print(f" Test Accuracy: {test_accuracy:.1%}")
+
+        if test_accuracy > best_accuracy:
+            best_accuracy = test_accuracy

     # Final evaluation
     print("\n" + "="*60)
-    print("Final Evaluation")
+    print("📊 Final Evaluation")
     print("-" * 40)

     final_accuracy = evaluate(model, test_loader, max_batches=100)
     print(f"Final Test Accuracy: {final_accuracy:.1%}")
+    print(f"Best Accuracy Achieved: {best_accuracy:.1%}")

-    # Compare with baselines
-    print("\n📊 Performance Comparison:")
-    print(f" Random Baseline: ~10%")
-    print(f" MLP (no conv): ~55%")
+    # Performance comparison
+    print("\n🎯 Performance Comparison:")
+    print(f" Random Baseline: ~10%")
+    print(f" MLP (no conv): ~55%")
     print(f" CNN (with Conv2D): {final_accuracy:.1%} {'✅' if final_accuracy > 0.55 else ''}")

     if final_accuracy > 0.55:
-        print("\n🎉 CNN outperforms MLP! Convolutions work!")
+        print("\n🎉 Success! CNN outperforms MLP!")
+        print(" Convolutions extract spatial features effectively!")

     print("\n💡 Why CNNs work better for images:")
-    print(" - Conv2D learns spatial features")
-    print(" - Pooling provides translation invariance")
-    print(" - Hierarchical feature learning")
-    print(" - Parameter sharing reduces overfitting")
+    print(" - Conv2D learns spatial feature detectors")
+    print(" - Parameter sharing (same filter across image)")
+    print(" - Translation invariance from pooling")
+    print(" - Hierarchical feature learning (edges → shapes → objects)")
+    print("\n📈 Systems Insight:")
+    print(f" - Conv parameters: {32*3*3*3 + 64*32*3*3:,} (~{(32*3*3*3 + 64*32*3*3)*4/1024:.1f} KB)")
+    print(f" - MLP equivalent: {3072*1024:,} (~{3072*1024*4/1024/1024:.1f} MB)")
+    print(f" - Parameter reduction: {(1 - (32*3*3*3 + 64*32*3*3)/(3072*1024)):.1%}")

 if __name__ == "__main__":
     main()
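The shape comments in `SimpleCNN.__init__` follow from two small formulas, sketched here so the numbers can be re-derived when layers change. `conv_out` and `pool_out` are hypothetical helpers; the sketch assumes unpadded stride-1 convolutions and non-overlapping pooling that drops any remainder, which is what the 13x13 → 6x6 step in the script's comments implies.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis of a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, pool):
    """Output length for non-overlapping pooling (stride == pool size, remainder dropped)."""
    return size // pool


size, channels = 32, 3
print(f"input:  {channels}@{size}x{size}")
for name, out_channels in [("conv1", 32), ("conv2", 64)]:
    size = conv_out(size, kernel=3)  # 3x3 kernel, no padding
    channels = out_channels
    print(f"{name}:  {channels}@{size}x{size}")
    size = pool_out(size, pool=2)    # 2x2 max pooling
    print(f"pool:   {channels}@{size}x{size}")

flattened = channels * size * size
print(f"flattened: {flattened}")
```

With `padding=1`, as in the removed three-layer version, `conv_out(32, 3, padding=1)` stays at 32, so only the pooling layers shrink the feature maps.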
File diff suppressed because it is too large
44
test_15_modules.py
Normal file
@@ -0,0 +1,44 @@
#!/usr/bin/env python3
"""Test the final 15-module structure."""

import subprocess
import sys
from pathlib import Path

def test_module(module_path):
    """Test a single module.

    Returns True if it runs cleanly, False if it fails or times out,
    and None if the module has no *_dev.py file.
    """
    py_files = list(module_path.glob("*_dev.py"))
    if not py_files:
        return None
    try:
        result = subprocess.run([sys.executable, str(py_files[0])],
                                capture_output=True, timeout=10, cwd=Path.cwd())
    except subprocess.TimeoutExpired:
        # A module that runs longer than 10 seconds counts as a failure
        return False
    return result.returncode == 0

print("="*60)
print("TinyTorch 15-Module Structure Test")
print("="*60)

modules_dir = Path("modules/source")
parts = [
    ("Part I: MLPs (XORNet)", ["01_setup", "02_tensor", "03_activations", "04_layers", "05_networks"]),
    ("Part II: CNNs (CIFAR-10)", ["06_spatial", "07_dataloader", "08_autograd", "09_optimizers", "10_training"]),
    ("Part III: Transformers (TinyGPT)", ["11_embeddings", "12_attention", "13_normalization", "14_transformers", "15_generation"])
]

for part_name, modules in parts:
    print(f"\n{part_name}")
    print("-"*40)
    for module in modules:
        path = modules_dir / module
        if not path.exists():
            print(f" ⚠️ {module:20} Missing")
            continue
        # Run each module once and reuse the result
        passed = test_module(path)
        if passed is None:
            print(f" ⚠️ {module:20} No implementation")
        elif passed:
            print(f" ✅ {module:20} Passes")
        else:
            print(f" ❌ {module:20} Failed")

print("\n" + "="*60)
print("✨ Clean 15-module structure ready!")
print("Each part: 5 modules, 1 innovation, 1 capstone")