Add comprehensive multi-channel Conv2D support to Module 06 (Spatial)

MAJOR FEATURE: Multi-channel convolutions for real CNN architectures

Key additions:
- MultiChannelConv2D class with in_channels/out_channels support
- Handles RGB images (3 channels) and arbitrary channel counts
- He initialization for stable training
- Optional bias parameters
- Batch processing support

Testing & Validation:
- Comprehensive unit tests for single- and multi-channel convolutions
- Integration tests for complete CNN pipelines
- Memory profiling and parameter scaling analysis
- QA approved: all mandatory tests passing

CIFAR-10 CNN Example:
- Updated train_cnn.py to use MultiChannelConv2D
- Architecture: Conv(3→32) → Pool → Conv(32→64) → Pool → Dense
- Demonstrates why convolutions matter for vision
- Shows parameter reduction vs. MLPs: ~19K conv weights (~75 KB) vs. ~3.1M weights (~12 MB) for a single 3072→1024 dense layer

Systems Analysis:
- Parameter scaling: O(in_channels × out_channels × kernel²)
- Memory profiling shows efficient scaling
- Performance characteristics documented
- Production context with PyTorch comparisons

This enables proper CNN training on CIFAR-10 with a ~60% accuracy target.
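The shape and parameter arithmetic above can be checked with a small NumPy sketch. It is not TinyTorch's MultiChannelConv2D (that source is not shown in this diff); `he_init`, `conv2d_multichannel`, and `max_pool2d` are hypothetical helpers that assume valid (no-padding), stride-1 convolution and non-overlapping 2x2 pooling, matching the size comments in train_cnn.py.

```python
import numpy as np


def he_init(out_channels, in_channels, kh, kw, rng):
    """He (Kaiming) normal init: std = sqrt(2 / fan_in), fan_in = in_channels * kh * kw."""
    fan_in = in_channels * kh * kw
    w = rng.standard_normal((out_channels, in_channels, kh, kw))
    return (w * np.sqrt(2.0 / fan_in)).astype(np.float32)


def conv2d_multichannel(x, weights, bias=None):
    """Valid (no padding), stride-1 multi-channel cross-correlation.

    x:       (batch, in_channels, H, W)
    weights: (out_channels, in_channels, kh, kw)
    bias:    (out_channels,) or None
    returns: (batch, out_channels, H - kh + 1, W - kw + 1)
    """
    batch, c_in, h, w = x.shape
    c_out, c_in_k, kh, kw = weights.shape
    if c_in != c_in_k:
        raise ValueError(f"expected {c_in_k} input channels, got {c_in}")
    out_h, out_w = h - kh + 1, w - kw + 1
    out = np.zeros((batch, c_out, out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, :, i:i + kh, j:j + kw]          # (batch, c_in, kh, kw)
            # Sum over (c_in, kh, kw) for every filter at once -> (batch, c_out)
            out[:, :, i, j] = np.tensordot(patch, weights, axes=([1, 2, 3], [1, 2, 3]))
    if bias is not None:
        out += bias[None, :, None, None]                 # one bias per output channel
    return out


def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/cols that don't fill a window are dropped."""
    b, c, h, w = x.shape
    oh, ow = h // size, w // size
    x = x[:, :, :oh * size, :ow * size]
    return x.reshape(b, c, oh, size, ow, size).max(axis=(3, 5))


rng = np.random.default_rng(0)
w1, b1 = he_init(32, 3, 3, 3, rng), np.zeros(32, dtype=np.float32)
w2, b2 = he_init(64, 32, 3, 3, rng), np.zeros(64, dtype=np.float32)

x = rng.standard_normal((2, 3, 32, 32)).astype(np.float32)       # batch of 2 RGB images
h = max_pool2d(np.maximum(conv2d_multichannel(x, w1, b1), 0))    # 32@30x30 -> 32@15x15
h = max_pool2d(np.maximum(conv2d_multichannel(h, w2, b2), 0))    # 64@13x13 -> 64@6x6
flat = h.reshape(h.shape[0], -1)                                 # 64 * 6 * 6 = 2304

conv_weights = w1.size + w2.size    # 3*32*3*3 + 32*64*3*3 weights, biases excluded
dense_weights = 3072 * 1024         # a single Dense(3072 -> 1024) layer
print(h.shape, flat.shape)          # (2, 64, 6, 6) (2, 2304)
print(conv_weights, conv_weights * 4 / 1024, dense_weights * 4 / 1024**2)  # 19296 75.375 12.0
```

The double loop over output positions keeps the per-channel sum explicit; a production implementation would typically use im2col or an optimized library kernel instead.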
2026-05-31 11:01:14 -05:00 · 2025-09-22 10:26:13 -04:00
parent c963c8b676
commit a07451ece3
4 changed files with 1221 additions and 254 deletions
--- a/docs/module-audit.md
+++ b/docs/module-audit.md
@@ -0,0 +1,194 @@
+# TinyTorch Module Audit: Essential vs Extra Components
+
+## Overview
+This audit examines what components are NEEDED for each milestone vs EXTRA components that enhance the framework but aren't strictly necessary.
+
+---
+
+## Part I: MLPs (Target: XORNet)
+
+### Module 02: Tensor
+**ESSENTIAL for XORNet:**
+- Basic Tensor class with data storage
+- Addition, subtraction, multiplication
+- Matrix multiply (for layers)
+- Shape, reshape operations
+
+**EXTRA (but good for framework):**
+- Broadcasting ✓ (nice, but XOR doesn't need it)
+- Fancy indexing ✓ 
+- Statistical operations (mean, sum, std) ✓
+- Comparison operators ✓
+
+### Module 03: Activations  
+**ESSENTIAL for XORNet:**
+- ReLU ✓ (used in XORNet)
+- Sigmoid (could use for XOR output)
+
+**EXTRA (but good for framework):**
+- Tanh ✓ (alternative to ReLU)
+- Softmax ✓ (not needed for XOR, but needed for CIFAR-10)
+- ActivationProfiler ✓ (pedagogical tool)
+
+### Module 04: Layers
+**ESSENTIAL for XORNet:**
+- Dense layer ✓ (fully connected)
+- Weight initialization
+- Forward pass
+
+**EXTRA:**
+- Different initialization strategies (Xavier, He, etc.)
+- Bias option
+
+### Module 05: Networks
+**ESSENTIAL for XORNet:**
+- Sequential model ✓
+- Forward pass through layers
+
+**EXTRA:**
+- Model summary/printing
+- Parameter counting
+
+---
+
+## Part II: CNNs (Target: CIFAR-10)
+
+### Module 06: Spatial
+**ESSENTIAL for CNN CIFAR-10:**
+- Conv2D ✓ (the key innovation!)
+- MaxPool2D ✓ (for downsampling)
+
+**EXTRA (but pedagogically valuable):**
+- Different padding modes
+- Stride options
+- AvgPool2D (alternative pooling)
+- Multiple filter support
+
+### Module 07: DataLoader
+**ESSENTIAL for CIFAR-10:**
+- CIFAR10Dataset ✓
+- DataLoader with batching ✓
+- Shuffling ✓
+
+**EXTRA:**
+- Data augmentation (but helps accuracy!)
+- Other datasets (MNIST, etc.)
+- Prefetching/parallel loading
+
+### Module 08: Autograd
+**ESSENTIAL for CIFAR-10:**
+- Variable class ✓
+- Backward pass ✓
+- Gradient computation ✓
+
+**EXTRA:**
+- Computation graph visualization
+- Gradient checking
+- Higher-order derivatives
+
+### Module 09: Optimizers
+**ESSENTIAL for CIFAR-10:**
+- SGD (basic, could work)
+- Adam ✓ (used in CIFAR-10, converges faster)
+
+**EXTRA:**
+- Learning rate scheduling
+- Momentum variants
+- RMSprop, AdaGrad
+
+### Module 10: Training
+**ESSENTIAL for CIFAR-10:**
+- Training loop ✓
+- CrossEntropyLoss ✓
+- Basic evaluation ✓
+
+**EXTRA (but very useful):**
+- Checkpointing ✓
+- Early stopping ✓
+- Metrics tracking ✓
+- Validation splits ✓
+- MeanSquaredError (for XOR)
+
+---
+
+## Part III: Transformers (Target: TinyGPT)
+
+### Module 11: Embeddings
+**ESSENTIAL for TinyGPT:**
+- Token embedding layer
+- Positional encoding (sinusoidal or learned)
+
+**EXTRA:**
+- Multiple embedding types
+- Embedding dropout
+
+### Module 12: Attention
+**ESSENTIAL for TinyGPT:**
+- Multi-head attention ✓ (already implemented!)
+- Scaled dot-product attention ✓
+- Causal masking ✓
+
+**EXTRA:**
+- Different attention variants
+- Attention visualization
+
+### Module 13: Normalization
+**ESSENTIAL for TinyGPT:**
+- LayerNorm (critical for transformer stability)
+
+**EXTRA:**
+- BatchNorm (not used in transformers)
+- GroupNorm, InstanceNorm
+
+### Module 14: Transformers
+**ESSENTIAL for TinyGPT:**
+- TransformerBlock (attention + FFN + residual)
+- Positional encoding integration
+- Stack of blocks
+
+**EXTRA:**
+- Encoder-decoder architecture
+- Cross-attention
+
+### Module 15: Generation
+**ESSENTIAL for TinyGPT:**
+- Autoregressive generation
+- Temperature sampling
+- Greedy decoding
+
+**EXTRA:**
+- Beam search
+- Top-k, Top-p sampling
+- Repetition penalty
+
+---
+
+## Summary
+
+### Truly Minimal Path
+If we wanted ONLY what's needed for milestones:
+- **XORNet**: Just needs Dense, ReLU, basic Tensor ops
+- **CIFAR-10 MLP**: Add DataLoader, Autograd, Adam, CrossEntropyLoss
+- **CIFAR-10 CNN**: Add Conv2D, MaxPool2D
+- **TinyGPT**: Add Embeddings, Attention, LayerNorm, Generation
+
+### What We Have (Good Extras)
+- **More activation choices**: Good for experimentation
+- **Better optimizers**: Adam converges faster than SGD
+- **Training utilities**: Checkpointing, metrics (very practical!)
+- **Profiling tools**: Help understand performance
+
+### Missing Essentials
+For Part III (TinyGPT) we still need to implement:
+1. **Module 11**: Embedding layer, positional encoding
+2. **Module 13**: LayerNorm 
+3. **Module 14**: TransformerBlock
+4. **Module 15**: Generation strategies
+
+### Verdict
+The current modules have a good balance of essential + useful extras. The extras are:
+- **Pedagogically valuable** (show alternatives)
+- **Practically useful** (checkpointing, better optimizers)
+- **Framework completeness** (makes TinyTorch feel real)
+
+The only "bloat" might be multiple activation functions, but even those are good for showing students the options and tradeoffs.
--- a/examples/cifar10/train_cnn.py
+++ b/examples/cifar10/train_cnn.py
@@ -1,9 +1,10 @@
 #!/usr/bin/env python3
 """
-CIFAR-10 CNN Training - Using Conv2D
+CIFAR-10 CNN Training - Using MultiChannelConv2D

 Demonstrates the power of convolutions for image classification.
-Should achieve better accuracy than MLP version.
+Uses TinyTorch's multi-channel Conv2D implementation.
+Should achieve better accuracy than the MLP version (~60% vs. ~55%).
 """

 import sys
@@ -16,56 +17,56 @@ from tinytorch.core.tensor import Tensor
 from tinytorch.core.autograd import Variable
 from tinytorch.core.layers import Dense
 from tinytorch.core.activations import ReLU
-from tinytorch.core.spatial import Conv2D, MaxPool2D
+from tinytorch.core.spatial import MultiChannelConv2D, MaxPool2D, flatten
 from tinytorch.core.training import CrossEntropyLoss
 from tinytorch.core.optimizers import Adam
 from tinytorch.core.dataloader import DataLoader, CIFAR10Dataset

 class SimpleCNN:
-    """CNN for CIFAR-10 using Conv2D layers."""
+    """CNN for CIFAR-10 using multi-channel Conv2D layers.
+    
+    Architecture:
+    - Conv(3→32) → ReLU → Pool(2x2) → 32@15x15
+    - Conv(32→64) → ReLU → Pool(2x2) → 64@6x6  
+    - Flatten → Dense(2304→128) → ReLU
+    - Dense(128→10) → Softmax (via CrossEntropyLoss)
+    """
    
    def __init__(self):
-        # Convolutional layers
-        self.conv1 = Conv2D(in_channels=3, out_channels=32, kernel_size=3, padding=1)
-        self.conv2 = Conv2D(in_channels=32, out_channels=64, kernel_size=3, padding=1)
-        self.conv3 = Conv2D(in_channels=64, out_channels=128, kernel_size=3, padding=1)
+        # Convolutional layers using MultiChannelConv2D
+        # Note: No padding support yet, so output sizes will be smaller
+        self.conv1 = MultiChannelConv2D(in_channels=3, out_channels=32, kernel_size=(3, 3))
+        self.conv2 = MultiChannelConv2D(in_channels=32, out_channels=64, kernel_size=(3, 3))
        
        # Pooling layers
-        self.pool = MaxPool2D(kernel_size=2, stride=2)
+        self.pool = MaxPool2D(pool_size=(2, 2))
        
        # Calculate size after convolutions and pooling
-        # 32x32 -> pool -> 16x16 -> pool -> 8x8 -> pool -> 4x4
-        # 128 channels * 4 * 4 = 2048
+        # Input: 3@32x32
+        # After conv1 (3x3): 32@30x30
+        # After pool1 (2x2): 32@15x15
+        # After conv2 (3x3): 64@13x13
+        # After pool2 (2x2): 64@6x6
+        # Flattened: 64 * 6 * 6 = 2304
        
        # Fully connected layers
-        self.fc1 = Dense(128 * 4 * 4, 256)
-        self.fc2 = Dense(256, 10)
+        self.fc1 = Dense(64 * 6 * 6, 128)
+        self.fc2 = Dense(128, 10)
        
        self.relu = ReLU()
        
        # Collect all layers with parameters
-        self.conv_layers = [self.conv1, self.conv2, self.conv3]
+        self.conv_layers = [self.conv1, self.conv2]
        self.fc_layers = [self.fc1, self.fc2]
        
-        # Initialize weights
-        self._initialize_weights()
+        # Conv weights are already He-initialized inside MultiChannelConv2D; only FC layers need it here
+        self._initialize_fc_weights()
    
-    def _initialize_weights(self):
-        """Initialize weights with proper scaling."""
-        # Conv layers - He initialization
-        for conv in self.conv_layers:
-            fan_in = conv.weight.shape[1] * conv.weight.shape[2] * conv.weight.shape[3]
-            std = np.sqrt(2.0 / fan_in)
-            conv.weight._data = np.random.randn(*conv.weight.shape).astype(np.float32) * std
-            if conv.bias is not None:
-                conv.bias._data = np.zeros(conv.bias.shape, dtype=np.float32)
-            conv.weight = Variable(conv.weight.data, requires_grad=True)
-            if conv.bias is not None:
-                conv.bias = Variable(conv.bias.data, requires_grad=True)
-        
-        # FC layers
+    def _initialize_fc_weights(self):
+        """Initialize fully connected layer weights."""
        for i, layer in enumerate(self.fc_layers):
            fan_in = layer.weights.shape[0]
+            # Use smaller std for output layer
            std = 0.01 if i == len(self.fc_layers) - 1 else np.sqrt(2.0 / fan_in)
            layer.weights._data = np.random.randn(*layer.weights.shape).astype(np.float32) * std
            layer.bias._data = np.zeros(layer.bias.shape, dtype=np.float32)
@@ -73,81 +74,134 @@ class SimpleCNN:
            layer.bias = Variable(layer.bias.data, requires_grad=True)
    
    def forward(self, x):
-        """Forward pass through CNN."""
-        # Reshape from (batch, 3072) to (batch, 3, 32, 32) if needed
-        batch_size = x.shape[0]
-        if len(x.shape) == 2:
-            x = x.reshape(batch_size, 3, 32, 32)
+        """Forward pass through CNN.
        
-        # Conv block 1
-        h = self.relu(self.conv1(x))
-        h = self.pool(h)  # 32x32 -> 16x16
+        Args:
+            x: Input tensor of shape (batch, 3, 32, 32) or flattened
+            
+        Returns:
+            Logits of shape (batch, 10)
+        """
+        batch_size = x.shape[0] if len(x.shape) > 1 else 1
        
-        # Conv block 2  
-        h = self.relu(self.conv2(h))
-        h = self.pool(h)  # 16x16 -> 8x8
+        # Reshape from flattened to image format if needed
+        if len(x.shape) == 2 and x.shape[1] == 3072:
+            # Reshape from (batch, 3072) to (batch, 3, 32, 32)
+            x_data = x.data if hasattr(x, 'data') else x._data
+            x_reshaped = x_data.reshape(batch_size, 3, 32, 32)
+            x = Tensor(x_reshaped) if not isinstance(x, Variable) else Variable(x_reshaped, x.requires_grad)
+        elif len(x.shape) == 1:
+            # Single flattened image: (3072,) -> (3, 32, 32)
+            x_data = x.data if hasattr(x, 'data') else x._data
+            x_reshaped = x_data.reshape(3, 32, 32)
+            x = Tensor(x_reshaped) if not isinstance(x, Variable) else Variable(x_reshaped, x.requires_grad)
        
-        # Conv block 3
-        h = self.relu(self.conv3(h))
-        h = self.pool(h)  # 8x8 -> 4x4
+        # Conv block 1: 3@32x32 → 32@30x30 → 32@15x15
+        h = self.conv1(x)
+        h = self.relu(h)
+        h = self.pool(h)
        
-        # Flatten for FC layers
-        h = h.reshape(batch_size, -1)
+        # Conv block 2: 32@15x15 → 64@13x13 → 64@6x6
+        h = self.conv2(h)
+        h = self.relu(h)
+        h = self.pool(h)
        
-        # FC layers
+        # Flatten for FC layers: 64@6x6 → 2304
+        h = flatten(h)
+        
+        # FC layers: 2304 → 128 → 10
        h = self.relu(self.fc1(h))
        return self.fc2(h)
    
    def parameters(self):
        """Get all trainable parameters."""
        params = []
+        # Conv layer parameters
        for conv in self.conv_layers:
-            params.append(conv.weight)
+            params.append(conv.weights)
            if conv.bias is not None:
                params.append(conv.bias)
+        # FC layer parameters
        for fc in self.fc_layers:
            params.extend([fc.weights, fc.bias])
        return params
+    
+    def count_parameters(self):
+        """Count total number of parameters."""
+        total = 0
+        for p in self.parameters():
+            if hasattr(p, 'data'):
+                data = p.data if not hasattr(p.data, '_data') else p.data._data
+                total += np.prod(data.shape)
+        return total

 def preprocess(images, training=True):
-    """Preprocess CIFAR-10 images."""
-    batch_size = images.shape[0]
-    images_np = images.data if hasattr(images, 'data') else images._data
+    """Preprocess CIFAR-10 images.
    
-    # Data augmentation for training
+    Args:
+        images: Raw image tensor
+        training: Whether to apply data augmentation
+        
+    Returns:
+        Preprocessed tensor ready for CNN
+    """
+    images_np = images.data if hasattr(images, 'data') else images._data
+    batch_size = images_np.shape[0]
+    
+    # Data augmentation for training (horizontal flip)
    if training:
        augmented = np.copy(images_np)
        for i in range(batch_size):
            if np.random.random() > 0.5:
-                # Horizontal flip
-                augmented[i] = np.flip(augmented[i], axis=2)
+                # Horizontal flip: reverse the width axis of each (3, 32, 32) image
+                if len(augmented.shape) == 2:
+                    # Flattened format: reshape, flip, flatten
+                    img = augmented[i].reshape(3, 32, 32)
+                    img = np.flip(img, axis=2)
+                    augmented[i] = img.flatten()
+                else:
+                    augmented[i] = np.flip(augmented[i], axis=2)
        images_np = augmented
    
-    # Normalize
+    # Normalize with a single scalar mean/std for all channels
+    # (0.485/0.229 are ImageNet red-channel statistics, not CIFAR-10's)
    normalized = (images_np - 0.485) / 0.229
    
-    # Ensure correct shape for CNN: (batch, 3, 32, 32)
+    # Ensure correct shape for CNN
    if len(normalized.shape) == 2:
-        # From flat to image format
-        batch_size = normalized.shape[0]
+        # From flat (batch, 3072) to image format (batch, 3, 32, 32)
        normalized = normalized.reshape(batch_size, 3, 32, 32)
    
    return Tensor(normalized.astype(np.float32))

 def evaluate(model, dataloader, max_batches=30):
-    """Evaluate model accuracy."""
+    """Evaluate model accuracy.
+    
+    Args:
+        model: CNN model
+        dataloader: Data loader
+        max_batches: Maximum number of batches to evaluate
+        
+    Returns:
+        Accuracy as float between 0 and 1
+    """
    correct = total = 0
+    
    for batch_idx, (images, labels) in enumerate(dataloader):
        if batch_idx >= max_batches:
            break
        
+        # Preprocess and create Variable
        x = Variable(preprocess(images, training=False), requires_grad=False)
+        
+        # Forward pass
        logits = model.forward(x)
        
+        # Get predictions
        logits_np = logits.data._data if hasattr(logits.data, '_data') else logits.data
        predictions = np.argmax(logits_np, axis=1)
        labels_np = labels.data if hasattr(labels, 'data') else labels._data
        
+        # Count correct predictions
        correct += np.sum(predictions == labels_np)
        total += len(labels_np)
    
@@ -155,43 +209,51 @@ def evaluate(model, dataloader, max_batches=30):

 def main():
    print("="*60)
-    print("CIFAR-10 CNN Training - Convolutional Neural Network")
+    print("CIFAR-10 CNN Training - MultiChannelConv2D")
    print("="*60)
-    print("\nUsing Conv2D layers for spatial feature extraction!")
-    print("Architecture: Conv2D -> Pool -> Conv2D -> Pool -> Conv2D -> Pool -> FC")
+    print("\n🧠 Using TinyTorch's multi-channel convolutions!")
+    print("Architecture: Conv(3→32) → Pool → Conv(32→64) → Pool → Dense")
    
    # Load data
-    print("\nLoading CIFAR-10 dataset...")
+    print("\n📚 Loading CIFAR-10 dataset...")
    train_dataset = CIFAR10Dataset(train=True, root='data')
    test_dataset = CIFAR10Dataset(train=False, root='data')
    
+    # Smaller batch size for memory efficiency with convolutions
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
    
-    print(f"Training samples: {len(train_dataset)}")
-    print(f"Test samples: {len(test_dataset)}")
+    print(f"Training samples: {len(train_dataset):,}")
+    print(f"Test samples: {len(test_dataset):,}")
    
    # Create model
-    print("\nInitializing CNN model...")
+    print("\n🔧 Initializing CNN model...")
    model = SimpleCNN()
+    print(f"Total parameters: {model.count_parameters():,}")
+    print(f"  - Conv layers: {32*3*3*3 + 32 + 64*32*3*3 + 64:,} parameters")
+    print(f"  - FC layers: {64*6*6*128 + 128 + 128*10 + 10:,} parameters")
+    
+    # Loss and optimizer
    loss_fn = CrossEntropyLoss()
    optimizer = Adam(model.parameters(), lr=0.001)
    
-    # Training settings
-    epochs = 10
+    # Training settings (reduced for demo)
+    epochs = 5  # Reduced for faster demo
    eval_every = 50
+    max_batches = 200  # Limit batches per epoch for demo
    
-    print(f"\nTraining for {epochs} epochs...")
+    print(f"\n🚀 Training for {epochs} epochs (limited to {max_batches} batches/epoch)...")
    print("-" * 40)
    
    # Training loop
+    best_accuracy = 0
    for epoch in range(epochs):
        start_time = time.time()
        running_loss = 0
        batches = 0
        
        for batch_idx, (images, labels) in enumerate(train_loader):
-            if batch_idx >= 100:  # Limit batches for quick demo
+            if batch_idx >= max_batches:
                break
            
            # Forward pass
@@ -209,41 +271,54 @@ def main():
            running_loss += loss.data
            batches += 1
            
-            # Evaluate periodically
+            # Periodic evaluation
            if (batch_idx + 1) % eval_every == 0:
-                train_acc = evaluate(model, train_loader, max_batches=10)
-                test_acc = evaluate(model, test_loader, max_batches=20)
+                train_acc = evaluate(model, train_loader, max_batches=5)
+                test_acc = evaluate(model, test_loader, max_batches=10)
                print(f"Epoch {epoch+1}, Batch {batch_idx+1}: "
                      f"Loss={running_loss/batches:.3f}, "
                      f"Train={train_acc:.1%}, Test={test_acc:.1%}")
+                
+                if test_acc > best_accuracy:
+                    best_accuracy = test_acc
        
-        # End of epoch evaluation
+        # End of epoch
        epoch_time = time.time() - start_time
        test_accuracy = evaluate(model, test_loader, max_batches=50)
-        print(f"\nEpoch {epoch+1} complete in {epoch_time:.1f}s - Test Accuracy: {test_accuracy:.1%}")
+        print(f"\n✓ Epoch {epoch+1} complete in {epoch_time:.1f}s")
+        print(f"  Test Accuracy: {test_accuracy:.1%}")
+        
+        if test_accuracy > best_accuracy:
+            best_accuracy = test_accuracy
    
    # Final evaluation
    print("\n" + "="*60)
-    print("Final Evaluation")
+    print("📊 Final Evaluation")
    print("-" * 40)
    
    final_accuracy = evaluate(model, test_loader, max_batches=100)
    print(f"Final Test Accuracy: {final_accuracy:.1%}")
+    print(f"Best Accuracy Achieved: {best_accuracy:.1%}")
    
-    # Compare with baselines
-    print("\n📊 Performance Comparison:")
-    print(f"  Random Baseline: ~10%")
-    print(f"  MLP (no conv):   ~55%")
+    # Performance comparison
+    print("\n🎯 Performance Comparison:")
+    print(f"  Random Baseline:  ~10%")
+    print(f"  MLP (no conv):    ~55%")
    print(f"  CNN (with Conv2D): {final_accuracy:.1%} {'✅' if final_accuracy > 0.55 else ''}")
    
    if final_accuracy > 0.55:
-        print("\n🎉 CNN outperforms MLP! Convolutions work!")
+        print("\n🎉 Success! CNN outperforms MLP!")
+        print("   Convolutions extract spatial features effectively!")
    
    print("\n💡 Why CNNs work better for images:")
-    print("  - Conv2D learns spatial features")
-    print("  - Pooling provides translation invariance")
-    print("  - Hierarchical feature learning")
-    print("  - Parameter sharing reduces overfitting")
+    print("  - Conv2D learns spatial feature detectors")
+    print("  - Parameter sharing (same filter across image)")
+    print("  - Translation invariance from pooling")
+    print("  - Hierarchical feature learning (edges → shapes → objects)")
+    print("\n📈 Systems Insight:")
+    print(f"  - Conv weights: {32*3*3*3 + 64*32*3*3:,} (~{(32*3*3*3 + 64*32*3*3)*4/1024:.1f} KB)")
+    print(f"  - MLP first layer (3072→1024): {3072*1024:,} (~{3072*1024*4/1024/1024:.1f} MB)")
+    print(f"  - Parameter reduction: {(1 - (32*3*3*3 + 64*32*3*3)/(3072*1024)):.1%}")

 if __name__ == "__main__":
    main()
--- a/modules/source/06_spatial/spatial_dev.py
+++ b/modules/source/06_spatial/spatial_dev.py
--- a/test_15_modules.py
+++ b/test_15_modules.py
@@ -0,0 +1,44 @@
+#!/usr/bin/env python3
+"""Test the final 15-module structure."""
+
+import subprocess
+import sys
+from pathlib import Path
+
+def test_module(module_path):
+    """Test a single module."""
+    py_files = list(module_path.glob("*_dev.py"))
+    if not py_files:
+        return None
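+    try:
+        result = subprocess.run([sys.executable, str(py_files[0])],
+                                capture_output=True, timeout=10, cwd=Path.cwd())
+    except subprocess.TimeoutExpired:
+        return False  # Treat a hung module as a failure instead of crashing the script
+    return result.returncode == 0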
+
+print("="*60)
+print("TinyTorch 15-Module Structure Test")
+print("="*60)
+
+modules_dir = Path("modules/source")
+parts = [
+    ("Part I: MLPs (XORNet)", ["01_setup", "02_tensor", "03_activations", "04_layers", "05_networks"]),
+    ("Part II: CNNs (CIFAR-10)", ["06_spatial", "07_dataloader", "08_autograd", "09_optimizers", "10_training"]),
+    ("Part III: Transformers (TinyGPT)", ["11_embeddings", "12_attention", "13_normalization", "14_transformers", "15_generation"])
+]
+
+for part_name, modules in parts:
+    print(f"\n{part_name}")
+    print("-"*40)
+    for module in modules:
+        path = modules_dir / module
+        if not path.exists():
+            print(f"  ⚠️  {module:20} Missing")
+            continue
+        passed = test_module(path)  # Run each module only once
+        if passed is None:
+            print(f"  ⚠️  {module:20} No implementation")
+        elif passed:
+            print(f"  ✅ {module:20} Passes")
+        else:
+            print(f"  ❌ {module:20} Failed")
+
+print("\n" + "="*60)
+print("✨ Clean 15-module structure ready!")
+print("Each part: 5 modules, 1 innovation, 1 capstone")