Clean up examples directory to essential files only

Structure simplified:
- Keep main examples/README.md with comprehensive overview
- Remove individual READMEs (redundant with main overview)
- Remove all test files (were for debugging)
- Keep only polished examples with Rich UI dashboards

Final clean structure:
├── examples/README.md                  # Complete overview and usage
├── common/training_dashboard.py        # Universal Rich UI dashboard
├── xornet/train_with_dashboard.py      # XOR with 100% accuracy + Rich UI
├── cifar10/train_with_dashboard.py     # CIFAR-10 standard (53%+ accuracy)
└── cifar10/train_optimized_60.py       # CIFAR-10 advanced (targeting 60%)

Examples are now production-ready with:
- Beautiful Rich UI visualization
- Real-time ASCII plotting
- Verified performance on real datasets
- Clean, professional codebase
- Single comprehensive README
2026-06-02 08:32:31 -05:00 · 2025-09-21 17:01:39 -04:00
parent abce43a75f
commit ad40f45b59
16 changed files with 167 additions and 3200 deletions
--- a/examples/README.md
+++ b/examples/README.md
@@ -1,75 +1,129 @@
 # TinyTorch Examples 🔥

-Real-world examples showing what you can build with TinyTorch!
+Beautiful, real-world examples showcasing TinyTorch capabilities with stunning visualization!

-## What Are These Examples?
+## 🎯 What Makes These Special?

-These are **real ML applications** written using TinyTorch just like you would use PyTorch. Each example:
- Uses `import tinytorch` as a real package
- Shows professional ML code patterns
- Demonstrates actual capabilities you've built
- Can be run by anyone to see TinyTorch in action
+- **Gorgeous Rich UI** with real-time ASCII plots
+- **Professional ML patterns** using TinyTorch as a complete framework
+- **Verified performance** on real datasets
+- **Educational excellence** - students see exactly what's happening

-## Running Examples
+## 🚀 Quick Start
+
+```bash
+# XOR with beautiful visualization (30 seconds):
+python examples/xornet/train_with_dashboard.py
+
+# CIFAR-10 image classification with Rich UI (2 minutes):
+python examples/cifar10/train_with_dashboard.py
+
+# Advanced optimization targeting 60% (5+ minutes):
+python examples/cifar10/train_optimized_60.py
+```
+
+## 📁 Available Examples
+
+### 🧠 **XOR Neural Network** (`xornet/`)
+**Classic non-linear function learning with beautiful visualization**
+
+- **Performance**: 100% accuracy (perfect XOR solution)
+- **Features**: Real-time ASCII plots, Rich UI, convergence visualization
+- **Architecture**: 2 → 8 → 1 with ReLU
+- **Training Time**: <30 seconds

 ```bash
-# After installing/building TinyTorch:
 cd examples/xornet/
-python train.py
-
-# Or for image classification:
-cd examples/cifar10/
-python train_cifar10_mlp.py
+python train_with_dashboard.py
 ```

-## Available Examples
+### 🖼️ **CIFAR-10 Image Classification** (`cifar10/`)
+**Real-world computer vision with stunning training visualization**

-### 🧠 **`xornet/`** - Neural Network Fundamentals
- Classic XOR problem with hidden layers
- Clean implementation showing autograd and training basics
- Architecture: 2 → 4 → 1 with ReLU and Sigmoid
- **Achieves 100% accuracy** on XOR truth table
+#### Standard Training (`train_with_dashboard.py`)
+- **Performance**: 53%+ accuracy on real images
+- **Features**: Rich UI, real-time plots, comprehensive metrics
+- **Dataset**: 60,000 32×32 color images (10 classes)
+- **Training Time**: ~2 minutes

-### 👁️ **`cifar10/`** - Real-World Computer Vision
- Real-world object classification
- **ACHIEVEMENT: 57.2% accuracy** - exceeds typical ML course benchmarks!
- Multiple architectures: MLP, LeNet-5, and optimized models
- Data augmentation, proper initialization, Adam optimization
- Real dataset: 50,000 training images, 10,000 test images
-
-## Example Structure
-
-Each example directory contains:
-```
-example_name/
-├── train.py          # Main training script
-├── README.md         # What this example demonstrates
-└── data/            # Datasets (downloaded automatically)
-```
-
-## Learning Progression
-
-After completing each module, examples become functional:
- **Module 05** → `xornet/` works (Dense layers + activations)
- **Module 11** → `cifar10/` works with training loops
-
-## Quick Demo
-
-Want to see TinyTorch in action? Try these:
+#### Advanced Optimization (`train_optimized_60.py`)
+- **Target**: 60%+ accuracy with cutting-edge techniques
+- **Architecture**: 7-layer deep MLP (11.7M parameters)
+- **Techniques**: Dropout, advanced augmentation, learning rate scheduling
+- **Features**: Top-3 accuracy, class balance metrics, gradient clipping

 ```bash
-# See a neural network learn XOR (30 seconds):
-python examples/xornet/train.py
-
-# Train on real images (5 minutes, 57% accuracy):
-python examples/cifar10/train_cifar10_mlp.py --epochs 10
+cd examples/cifar10/
+python train_with_dashboard.py        # Standard training
+python train_optimized_60.py          # Advanced optimization
 ```

-## Performance Achievements
+## 🎨 Universal Training Dashboard

- **XORnet**: 100% accuracy (perfect solution)
- **CIFAR-10**: 57.2% accuracy (exceeds typical course benchmarks)
+All examples use the beautiful `common/training_dashboard.py`:
+
+- **Real-time ASCII plotting** of accuracy and loss curves
+- **Rich console interface** with progress bars and tables
+- **Comprehensive metrics** (confidence, class accuracy, learning rates)
+- **Engaging visualization** that makes training exciting
+- **Educational focus** - students see every aspect of training
+
+## 📊 Performance Achievements
+
+| Example | Accuracy | Training Time | Features |
+|---------|----------|---------------|----------|
+| **XOR** | 100% | <30s | Perfect convergence visualization |
+| **CIFAR-10 Standard** | 53%+ | ~2min | Rich UI, real-time plots |
+| **CIFAR-10 Advanced** | Targeting 60% | ~5min | Cutting-edge optimization |
+
+**Comparison Context:**
+- Random chance (CIFAR-10): 10%
+- Typical ML course MLPs: 50-55%
+- **TinyTorch**: 53-60%+ 🔥
+- Research MLP SOTA: 60-65%
+- Simple CNNs: 70-80%
+
+## 🛠️ Technical Highlights
+
+### Advanced Optimization Techniques
+- **Deep architectures** (up to 7 layers)
+- **Dropout simulation** for regularization
+- **Progressive data augmentation** 
+- **Learning rate scheduling** (warmup + cosine annealing)
+- **Gradient clipping** simulation
+- **Advanced weight initialization**
+
+### Beautiful Visualization
+- **ASCII plotting** works in any terminal
+- **No external dependencies** (self-contained)
+- **Rich console interface** with colors and formatting
+- **Real-time updates** showing training progress
+- **Multiple metrics** displayed simultaneously
+
+## 🎓 Educational Value
+
+Students experience:
+- **Visual feedback** during training
+- **Real-world performance** on challenging datasets  
+- **Professional code patterns** using their own framework
+- **Advanced techniques** pushing the limits of what's possible
+- **Immediate gratification** seeing their code work on real problems
+
+## 🏗️ Structure
+
+```
+examples/
+├── common/
+│   └── training_dashboard.py    # Universal Rich UI dashboard
+├── xornet/
+│   ├── README.md               # XOR problem details
+│   └── train_with_dashboard.py # XOR with beautiful UI
+└── cifar10/
+    ├── README.md               # Image classification details
+    ├── train_with_dashboard.py # Standard CIFAR-10 training
+    └── train_optimized_60.py   # Advanced optimization
+```

 ---

-**These aren't toy demos - they're real ML applications achieving competitive results with a framework built from scratch!**
+**These aren't toy demos - they're polished ML applications with gorgeous visualization, achieving competitive results with a framework built entirely from scratch!** 🚀
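Note on the README above: it lists "Learning rate scheduling (warmup + cosine annealing)" among the techniques used by `train_optimized_60.py`, but that script is not part of this diff. The following is only a minimal sketch of what such a schedule commonly looks like, assuming linear warmup followed by cosine decay. The function name `warmup_cosine_lr` and all default values are hypothetical and are not taken from the TinyTorch code.

```python
import math


def warmup_cosine_lr(step, total_steps, base_lr=3e-4, warmup_steps=100, min_lr=0.0):
    """Learning rate for a given step (hypothetical helper, not TinyTorch API).

    Steps 0..warmup_steps-1 ramp linearly up to base_lr; the remaining steps
    follow a half cosine from base_lr down to min_lr.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < warmup_steps:
        # Linear warmup; +1 so the very first step does not use a zero rate.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print a few sample points of the schedule.
    for s in (0, 50, 100, 500, 1000):
        print(s, warmup_cosine_lr(s, total_steps=1000))
```

In a training loop, the returned value would be assigned to the optimizer's learning rate before each update. How TinyTorch's `Adam` exposes that setting is not shown in this diff.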
--- a/examples/cifar10/README.md
+++ b/examples/cifar10/README.md
@@ -1,202 +0,0 @@
-# CIFAR-10 🎯
-
-This directory demonstrates TinyTorch's capability to train real neural networks on real datasets with impressive results. Students can achieve **57.2% test accuracy** on CIFAR-10 using their own autograd implementation - performance that **exceeds typical ML course benchmarks** and approaches research-level results for MLPs!
-
-## 🎯 Performance Overview
-
-| Approach | Accuracy | Notes |
-|----------|----------|-------|
-| Random chance | 10.0% | Baseline for 10-class problem |
-| **TinyTorch Simple** | ~40% | Basic 3-layer MLP |
-| **TinyTorch Optimized** | **57.2%** | ✨ **Main achievement** |
-| CS231n/CS229 MLPs | 50-55% | Typical course benchmarks |
-| PyTorch tutorials | 45-50% | Standard educational examples |
-| Research MLP SOTA | 60-65% | State-of-the-art pure MLPs |
-| Simple CNNs | 70-80% | With convolutional layers |
-
-**Key insight**: TinyTorch's 57.2% result **exceeds typical educational benchmarks** and demonstrates that students can build working ML systems that achieve impressive real-world performance!
-
-## 📁 Files Overview
-
-### Main Training Scripts
-
- **`train_cifar10_mlp.py`** - ⭐ **Main example** achieving 57.2% accuracy
- **`train_simple_baseline.py`** - Simple baseline (~40%) for comparison
- **`train_lenet5.py`** - Historical LeNet-5 adaptation
-
-### Data
- **`data/`** - CIFAR-10 dataset (downloaded automatically)
-
-## 🚀 Quick Start
-
-### Run the Main Example (57.2% accuracy)
-```bash
-cd examples/cifar10
-python train_cifar10_mlp.py
-```
-
-Expected output:
-```
-🚀 TinyTorch CIFAR-10 MLP Training
-============================================================
-📚 Loading CIFAR-10 dataset...
-✅ Loaded 50,000 train samples
-✅ Loaded 10,000 test samples
-
-🏗️ Building Optimized MLP for CIFAR-10...
-✅ Model: 3072 → 1024 → 512 → 256 → 128 → 10
-   Parameters: 3,837,066
-
-📊 TRAINING (Target: 57.2% Test Accuracy)
-  Epoch  1 Batch 100: Acc=23.1%, Loss=2.089
-  ...
-⭐ NEW BEST: 57.2%
-
-🎯 FINAL RESULTS
-Final Test Accuracy: 57.2%
-🏆 OUTSTANDING SUCCESS!
-   TinyTorch achieves research-level MLP performance!
-```
-
-### Compare with Simple Baseline
-```bash
-python train_simple_baseline.py
-```
-
-This shows how optimization techniques improve performance from ~40% to 57.2%!
-
-## 🔧 Key Optimization Techniques
-
-The 57.2% result comes from careful optimization of multiple factors:
-
-### 1. **Architecture Design** (+5-8% accuracy)
- **Gradual dimension reduction**: 3072 → 1024 → 512 → 256 → 128 → 10
- **Sufficient capacity**: 3.8M parameters vs simple 660k baseline
- **Proper depth**: 5 layers balance capacity with trainability
-
-### 2. **Weight Initialization** (+3-5% accuracy)
-```python
-# He initialization with conservative scaling
-std = np.sqrt(2.0 / fan_in) * 0.5  # 0.5 scaling prevents explosion
-```
-
-### 3. **Data Augmentation** (+8-12% accuracy)
- **Horizontal flips**: Double effective training data
- **Random brightness**: Handle lighting variations
- **Small translations**: Add translation invariance
-```python
-# Prevents overfitting, improves generalization
-if training:
-    if np.random.random() > 0.5:
-        image = np.flip(image, axis=2)  # Horizontal flip
-```
-
-### 4. **Optimized Preprocessing** (+3-5% accuracy)
-```python
-# Scale to [-2, 2] range for better convergence
-normalized = (flat - 0.5) / 0.25
-```
-
-### 5. **Learning Rate Tuning** (+2-3% accuracy)
- **Conservative start**: 0.0003 (vs typical 0.001)
- **Scheduled decay**: Reduce by 0.8× at epochs 12 and 20
- **Adam optimizer**: Better than SGD for this problem
-
-### 6. **Training Strategy** (+2-4% accuracy)
- **More data per epoch**: 500 batches vs typical 200
- **Larger batch size**: 64 for stable gradients
- **Early stopping**: Prevent overfitting
-
-## 📊 Performance Analysis
-
-### Why 57.2% is Impressive
-
-1. **Exceeds Course Standards**: Most ML courses target 50-55% with MLPs
-2. **Approaches Research Level**: Pure MLP SOTA is 60-65%
-3. **Real Dataset**: CIFAR-10 is genuinely challenging (32×32 natural images)
-4. **Student Implementation**: Built with student's own autograd code!
-
-### Comparison Context
-
-| Framework | MLP Performance | Notes |
-|-----------|----------------|-------|
-| TinyTorch | **57.2%** | Student implementation |
-| PyTorch (tutorial) | 45-50% | Standard educational examples |
-| Scikit-learn | 35-40% | Simple MLPClassifier |
-| TensorFlow (tutorial) | 48-52% | Basic tutorial examples |
-
-### Parameter Efficiency
-
-| Model | Parameters | Accuracy | Efficiency |
-|-------|------------|----------|------------|
-| Simple baseline | 660k | ~40% | Good for learning |
-| **TinyTorch optimized** | **3.8M** | **57.2%** | **Excellent** |
-| Typical course models | 2-5M | 50-55% | Standard |
-| Research MLPs | 10M+ | 60-65% | Heavy |
-
-## 🎓 Educational Value
-
-This example demonstrates several key ML concepts:
-
-### Core ML Engineering Skills
- **Data preprocessing and augmentation**
- **Architecture design principles**
- **Hyperparameter optimization**
- **Training loop implementation**
- **Performance evaluation and analysis**
-
-### Deep Learning Fundamentals
- **Gradient-based optimization**
- **Backpropagation through deep networks**
- **Overfitting prevention techniques**
- **Learning rate scheduling**
-
-### Real-World ML Practices
- **Working with standard datasets**
- **Achieving competitive benchmarks**
- **Systematic experimentation**
- **Performance comparison and analysis**
-
-## 🔮 Future Improvements
-
-To reach **70-80% accuracy**, students can explore:
-
-### Architectural Improvements
- **Conv2D layers**: TinyTorch already implements these!
- **Batch normalization**: Stabilize training
- **Residual connections**: Enable deeper networks
-
-### Advanced Techniques  
- **Learning rate scheduling**: Cosine annealing, warmup
- **Regularization**: Dropout, weight decay
- **Data augmentation**: Rotation, cutout, mixup
- **Ensemble methods**: Average multiple models
-
-### Example CNN Extension
-```python
-# Future work: Use TinyTorch's Conv2D layers
-from tinytorch.core.spatial import Conv2D
-
-# Simple CNN: 32×32×3 → Conv → Pool → Conv → Pool → Dense → 10
-# Expected performance: 70-75% accuracy
-```
-
-## 🏆 Success Criteria
-
-Students successfully demonstrate ML engineering skills when they:
-
-1. ✅ **Achieve >50% accuracy** (exceeds random baseline significantly)
-2. ✅ **Understand optimization techniques** (can explain why each helps)
-3. ✅ **Compare with baselines** (appreciate value of good engineering)
-4. ✅ **Analyze results** (understand performance in context)
-
-The 57.2% result **exceeds all these criteria** and proves TinyTorch enables students to build impressive, working ML systems!
-
-## 💡 Key Takeaways
-
-1. **TinyTorch Works**: 57.2% proves students can build real ML systems
-2. **Engineering Matters**: Optimization techniques provide huge gains
-3. **Real Performance**: Results competitive with professional frameworks
-4. **Foundation for Growth**: Clear path to 70-80% with Conv2D layers
-
-Students can be genuinely proud of achieving 57.2% accuracy with their own autograd implementation. This demonstrates deep understanding of ML fundamentals and practical engineering skills that transfer to real-world projects!
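Note on the removed README above: its expected output reports `Parameters: 3,837,066` for the `3072 → 1024 → 512 → 256 → 128 → 10` model. That figure can be checked by hand, since each dense layer contributes `n_in * n_out` weights plus `n_out` biases. A small sketch of the arithmetic (the helper name `dense_param_count` is introduced here for illustration only):

```python
def dense_param_count(layer_sizes):
    """Total weights plus biases for a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Layer widths quoted in the removed README.
optimized_mlp = [3072, 1024, 512, 256, 128, 10]
print(f"{dense_param_count(optimized_mlp):,}")  # prints 3,837,066
```

The result matches the count quoted in the README, so the "3.8M parameters" figure is consistent with the stated architecture.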
--- a/examples/cifar10/test_cifar10_components.py
+++ b/examples/cifar10/test_cifar10_components.py
@@ -1,190 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test CIFAR-10 components individually to isolate issues
-"""
-
-import sys
-import os
-import time
-sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
-
-import numpy as np
-from tinytorch.core.tensor import Tensor
-from tinytorch.core.autograd import Variable
-from tinytorch.core.layers import Dense
-from tinytorch.core.activations import ReLU
-from tinytorch.core.training import CrossEntropyLoss
-from tinytorch.core.optimizers import Adam
-from tinytorch.core.dataloader import DataLoader, CIFAR10Dataset
-
-def test_basic_components():
-    """Test basic components work"""
-    print("🔧 Testing basic components...")
-    
-    # Test Tensor creation
-    print("1. Testing Tensor creation...")
-    x = Tensor([[1, 2], [3, 4]])
-    print(f"✅ Tensor created: {x.shape}")
-    
-    # Test Variable creation
-    print("2. Testing Variable creation...")
-    v = Variable(x, requires_grad=True)
-    print(f"✅ Variable created: requires_grad={v.requires_grad}")
-    
-    # Test Dense layer
-    print("3. Testing Dense layer...")
-    fc = Dense(2, 3)
-    print(f"✅ Dense layer created: {fc.weights.shape}")
-    
-    # Test ReLU
-    print("4. Testing ReLU...")
-    relu = ReLU()
-    out = relu(v)
-    print(f"✅ ReLU works: output shape {out.data.shape}")
-    
-    print("✅ All basic components work!\n")
-
-def test_loss_function():
-    """Test loss function works"""
-    print("🔧 Testing loss function...")
-    
-    loss_fn = CrossEntropyLoss()
-    
-    # Create test data
-    pred = Variable(Tensor([[1.0, 2.0, 0.5]]), requires_grad=True)
-    true = Variable(Tensor([[1]]), requires_grad=False)  # Class 1
-    
-    print("Computing loss...")
-    loss = loss_fn(pred, true)
-    
-    # Extract loss value properly
-    if hasattr(loss.data, 'data'):
-        loss_val = float(loss.data.data)
-    elif hasattr(loss.data, '_data'):
-        loss_val = float(loss.data._data)
-    else:
-        loss_val = float(loss.data)
-    
-    print(f"✅ Loss computed: {loss_val:.4f}")
-    print("✅ Loss function works!\n")
-
-def test_dataset_creation():
-    """Test dataset creation (without loading data)"""
-    print("🔧 Testing dataset creation...")
-    
-    try:
-        print("Creating train dataset...")
-        start_time = time.time()
-        train_dataset = CIFAR10Dataset(train=True, root='data')
-        creation_time = time.time() - start_time
-        print(f"✅ Train dataset created in {creation_time:.2f}s")
-        print(f"   Size: {len(train_dataset)} samples")
-        
-        print("Creating test dataset...")
-        start_time = time.time()
-        test_dataset = CIFAR10Dataset(train=False, root='data')
-        creation_time = time.time() - start_time
-        print(f"✅ Test dataset created in {creation_time:.2f}s")
-        print(f"   Size: {len(test_dataset)} samples")
-        
-        print("✅ Dataset creation works!\n")
-        return train_dataset, test_dataset
-        
-    except Exception as e:
-        print(f"❌ Dataset creation failed: {e}")
-        return None, None
-
-def test_dataloader_first_batch(train_dataset):
-    """Test loading first batch from dataloader"""
-    print("🔧 Testing DataLoader first batch...")
-    
-    if train_dataset is None:
-        print("❌ Skipping - no dataset available")
-        return
-    
-    try:
-        print("Creating DataLoader...")
-        train_loader = DataLoader(train_dataset, batch_size=4, shuffle=False)
-        
-        print("Getting first batch...")
-        start_time = time.time()
-        
-        # Get first batch
-        for batch_idx, (images, labels) in enumerate(train_loader):
-            batch_time = time.time() - start_time
-            print(f"✅ First batch loaded in {batch_time:.2f}s")
-            print(f"   Images shape: {images.shape}")
-            print(f"   Labels shape: {labels.shape}")
-            print(f"   Labels: {labels.data[:4] if hasattr(labels, 'data') else labels[:4]}")
-            break
-        
-        print("✅ DataLoader first batch works!\n")
-        
-    except Exception as e:
-        print(f"❌ DataLoader failed: {e}\n")
-
-def test_simple_forward_pass():
-    """Test simple forward pass with dummy data"""
-    print("🔧 Testing simple forward pass...")
-    
-    try:
-        # Create simple model
-        fc1 = Dense(10, 5)
-        fc2 = Dense(5, 3)
-        relu = ReLU()
-        
-        # Initialize properly as Variables
-        fc1.weights = Variable(fc1.weights.data, requires_grad=True)
-        fc1.bias = Variable(fc1.bias.data, requires_grad=True)
-        fc2.weights = Variable(fc2.weights.data, requires_grad=True)
-        fc2.bias = Variable(fc2.bias.data, requires_grad=True)
-        
-        # Create dummy input
-        x = Variable(Tensor(np.random.randn(2, 10)), requires_grad=False)
-        
-        print("Forward pass...")
-        start_time = time.time()
-        
-        h1 = fc1(x)
-        h1_act = relu(h1)
-        logits = fc2(h1_act)
-        
-        forward_time = time.time() - start_time
-        print(f"✅ Forward pass completed in {forward_time:.4f}s")
-        print(f"   Output shape: {logits.data.shape}")
-        
-        # Test loss
-        loss_fn = CrossEntropyLoss()
-        targets = Variable(Tensor([[1], [2]]), requires_grad=False)
-        loss = loss_fn(logits, targets)
-        
-        if hasattr(loss.data, 'data'):
-            loss_val = loss.data.data
-        elif hasattr(loss.data, '_data'):
-            loss_val = loss.data._data
-        else:
-            loss_val = loss.data
-            
-        print(f"✅ Loss computed: {loss_val}")
-        print("✅ Simple forward pass works!\n")
-        
-    except Exception as e:
-        print(f"❌ Forward pass failed: {e}\n")
-
-def main():
-    print("🧪 CIFAR-10 Component Testing")
-    print("=" * 50)
-    
-    test_basic_components()
-    test_loss_function()
-    
-    train_dataset, test_dataset = test_dataset_creation()
-    test_dataloader_first_batch(train_dataset)
-    
-    test_simple_forward_pass()
-    
-    print("🎯 Component testing complete!")
-    print("If all tests pass, the issue is likely in the training loop logic.")
-
-if __name__ == "__main__":
-    main()
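Note on the removed test file above: it repeats the same `hasattr(loss.data, 'data')` / `hasattr(loss.data, '_data')` chain to pull a number out of a loss value. One pitfall with that pattern is that NumPy arrays and NumPy scalars also expose a `.data` attribute (a raw buffer), so the check can succeed on an object that is already unwrapped. The sketch below shows one way such a helper could look, assuming wrappers expose their payload as `.data` or `._data`. The names `unwrap`, `to_scalar`, and the stand-in class `Box` are hypothetical and do not come from TinyTorch.

```python
import numpy as np


def unwrap(obj, max_depth=8):
    """Peel off .data / ._data wrappers until a NumPy value or plain number remains."""
    for _ in range(max_depth):
        # Stop on NumPy values first: ndarray.data is a buffer, not a wrapper payload.
        if isinstance(obj, (np.ndarray, np.generic, int, float)):
            break
        if hasattr(obj, "data"):
            obj = obj.data
        elif hasattr(obj, "_data"):
            obj = obj._data
        else:
            break
    return np.asarray(obj)


def to_scalar(obj):
    """Return a Python float from a (possibly nested) single-element value."""
    arr = unwrap(obj)
    if arr.size != 1:
        raise ValueError(f"expected a single element, got shape {arr.shape}")
    return float(arr.item())


class Box:
    """Stand-in for a Tensor/Variable-style wrapper (illustration only)."""

    def __init__(self, data):
        self.data = data


if __name__ == "__main__":
    # A loss wrapped twice, similar to Variable -> Tensor -> ndarray.
    loss = Box(Box(np.array([[1.5]])))
    print(to_scalar(loss))
```

Centralizing the unwrapping in one function would have replaced the three copies of the `if/elif/else` chain in this file.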
--- a/examples/cifar10/test_dataloader_output.py
+++ b/examples/cifar10/test_dataloader_output.py
@@ -1,51 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test what the DataLoader actually returns
-"""
-
-import sys
-import os
-sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
-
-from tinytorch.core.dataloader import DataLoader, CIFAR10Dataset
-
-def main():
-    print("🔍 DataLoader Output Investigation")
-    print("=" * 50)
-    
-    # Load dataset
-    train_dataset = CIFAR10Dataset(train=True, root='data')
-    train_loader = DataLoader(train_dataset, batch_size=4, shuffle=False)
-    
-    # Get first batch
-    images, labels = next(iter(train_loader))
-    
-    print(f"Images type: {type(images)}")
-    print(f"Images shape: {images.shape}")
-    print(f"Images has reshape: {hasattr(images, 'reshape')}")
-    print(f"Images has data: {hasattr(images, 'data')}")
-    print(f"Images has _data: {hasattr(images, '_data')}")
-    
-    if hasattr(images, 'data'):
-        print(f"Images.data type: {type(images.data)}")
-        print(f"Images.data shape: {images.data.shape}")
-        print(f"Images.data has reshape: {hasattr(images.data, 'reshape')}")
-    
-    if hasattr(images, '_data'):
-        print(f"Images._data type: {type(images._data)}")
-        print(f"Images._data shape: {images._data.shape}")
-        print(f"Images._data has reshape: {hasattr(images._data, 'reshape')}")
-    
-    print(f"\nLabels type: {type(labels)}")
-    print(f"Labels shape: {labels.shape}")
-    print(f"Labels has data: {hasattr(labels, 'data')}")
-    print(f"Labels has _data: {hasattr(labels, '_data')}")
-    
-    if hasattr(labels, 'data'):
-        print(f"Labels.data type: {type(labels.data)}")
-    
-    if hasattr(labels, '_data'):
-        print(f"Labels._data type: {type(labels._data)}")
-
-if __name__ == "__main__":
-    main()
--- a/examples/cifar10/test_preprocessing.py
+++ b/examples/cifar10/test_preprocessing.py
@@ -1,116 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test the preprocessing function specifically
-"""
-
-import sys
-import os
-import time
-sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
-
-import numpy as np
-from tinytorch.core.tensor import Tensor
-from tinytorch.core.dataloader import DataLoader, CIFAR10Dataset
-
-def preprocess_images(images, training=True):
-    """Copy of the preprocessing function from train_cifar10_mlp.py"""
-    print(f"    Preprocessing batch of size {images.shape[0]}, training={training}")
-    batch_size = images.shape[0]
-    images_np = images.data if hasattr(images, 'data') else images._data
-    print(f"    Extracted numpy array: {images_np.shape}")
-    
-    if training:
-        print("    Applying data augmentation...")
-        # Data augmentation - prevents overfitting
-        augmented = np.copy(images_np)
-        print(f"    Copied data for augmentation: {augmented.shape}")
-        
-        for i in range(batch_size):
-            print(f"      Processing image {i+1}/{batch_size}")
-            # Random horizontal flip (50% chance)
-            if np.random.random() > 0.5:
-                augmented[i] = np.flip(augmented[i], axis=2)
-            
-            # Random brightness adjustment
-            brightness = np.random.uniform(0.8, 1.2)
-            augmented[i] = np.clip(augmented[i] * brightness, 0, 1)
-            
-            # Small random translations
-            if np.random.random() > 0.5:
-                shift_x = np.random.randint(-2, 3)
-                shift_y = np.random.randint(-2, 3)
-                augmented[i] = np.roll(augmented[i], shift_x, axis=2)
-                augmented[i] = np.roll(augmented[i], shift_y, axis=1)
-        
-        images_np = augmented
-        print("    ✅ Data augmentation complete")
-    
-    print("    Flattening and normalizing...")
-    # Flatten to (batch_size, 3072)
-    flat = images_np.reshape(batch_size, -1)
-    
-    # Optimized normalization: scale to [-2, 2] range
-    normalized = (flat - 0.5) / 0.25
-    
-    result = Tensor(normalized.astype(np.float32))
-    print(f"    ✅ Preprocessing complete: {result.shape}")
-    return result
-
-def test_preprocessing():
-    """Test preprocessing function with different batch sizes"""
-    print("🔧 Testing preprocessing function...")
-    
-    # Load dataset
-    print("Loading dataset...")
-    train_dataset = CIFAR10Dataset(train=True, root='data')
-    train_loader = DataLoader(train_dataset, batch_size=4, shuffle=False)
-    
-    # Get first batch
-    print("Getting first batch...")
-    images, labels = next(iter(train_loader))
-    print(f"Batch: images {images.shape}, labels {labels.shape}")
-    
-    # Test preprocessing without augmentation
-    print("\n1. Testing preprocessing without augmentation...")
-    start_time = time.time()
-    result1 = preprocess_images(images, training=False)
-    time1 = time.time() - start_time
-    print(f"✅ No augmentation: {time1:.4f}s, output shape {result1.shape}")
-    
-    # Test preprocessing with augmentation
-    print("\n2. Testing preprocessing with augmentation...")
-    start_time = time.time()
-    result2 = preprocess_images(images, training=True)
-    time2 = time.time() - start_time
-    print(f"✅ With augmentation: {time2:.4f}s, output shape {result2.shape}")
-    
-    # Test with larger batch
-    print("\n3. Testing with larger batch (32)...")
-    train_loader_large = DataLoader(train_dataset, batch_size=32, shuffle=False)
-    images_large, labels_large = next(iter(train_loader_large))
-    print(f"Large batch: images {images_large.shape}, labels {labels_large.shape}")
-    
-    start_time = time.time()
-    result3 = preprocess_images(images_large, training=True)
-    time3 = time.time() - start_time
-    print(f"✅ Large batch with augmentation: {time3:.4f}s, output shape {result3.shape}")
-    
-    # Check if timing scales linearly
-    if time3 > time2 * 10:  # Should be roughly 8x slower (32/4), but allowing 10x
-        print(f"⚠️  Preprocessing may be inefficient: {time2:.4f}s -> {time3:.4f}s")
-    else:
-        print("✅ Preprocessing timing looks reasonable")
-
-def main():
-    print("🧪 Preprocessing Function Test")
-    print("=" * 50)
-    
-    try:
-        test_preprocessing()
-    except Exception as e:
-        print(f"❌ Preprocessing failed: {e}")
-        import traceback
-        traceback.print_exc()
-
-if __name__ == "__main__":
-    main()
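Note on the removed test file above: its copy of `preprocess_images` augments one image at a time in a Python loop, then flattens and applies `(flat - 0.5) / 0.25`. The sketch below shows a vectorized variant of the flip, brightness, and normalization steps using plain NumPy. It assumes batches are laid out as `(N, C, H, W)` with values in `[0, 1]`, which is consistent with the per-image flip on `axis=2` in the original. The random translations are left out for brevity. The function name `preprocess_batch` and the `rng` parameter are hypothetical, and the output is a NumPy array rather than a TinyTorch `Tensor`.

```python
import numpy as np


def preprocess_batch(images, training=True, rng=None):
    """Augment (optionally), flatten, and normalize a batch of images.

    Assumes images has shape (N, C, H, W) with values in [0, 1].
    Returns a float32 array of shape (N, C*H*W) with values in [-2, 2].
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(images, dtype=np.float32)
    n = x.shape[0]

    if training:
        x = x.copy()  # never modify the caller's batch in place
        # Horizontal flip for roughly half of the images (reverse the width axis).
        flip = rng.random(n) > 0.5
        x[flip] = x[flip][:, :, :, ::-1]
        # One brightness factor per image, broadcast over C, H, W.
        brightness = rng.uniform(0.8, 1.2, size=(n, 1, 1, 1)).astype(np.float32)
        x = np.clip(x * brightness, 0.0, 1.0)

    flat = x.reshape(n, -1)
    # Same normalization as the original: maps [0, 1] onto [-2, 2].
    return ((flat - 0.5) / 0.25).astype(np.float32)


if __name__ == "__main__":
    demo = np.random.default_rng(0).random((4, 3, 32, 32))
    print(preprocess_batch(demo, training=True).shape)
```

Passing a seeded `rng` makes the augmentation reproducible, which is useful when comparing timings across batch sizes as the removed test did.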
--- a/examples/cifar10/test_simple_training.py
+++ b/examples/cifar10/test_simple_training.py
@@ -1,197 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test simple CIFAR-10 training with just a few batches to see what works
-"""
-
-import sys
-import os
-import time
-sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
-
-import numpy as np
-from tinytorch.core.tensor import Tensor
-from tinytorch.core.autograd import Variable
-from tinytorch.core.layers import Dense
-from tinytorch.core.activations import ReLU
-from tinytorch.core.training import CrossEntropyLoss
-from tinytorch.core.optimizers import Adam
-from tinytorch.core.dataloader import DataLoader, CIFAR10Dataset
-
-def preprocess_images(images, training=True):
-    """Simplified preprocessing to avoid potential issues"""
-    batch_size = images.shape[0]
-    images_np = images.data if hasattr(images, 'data') else images._data
-    
-    # Skip augmentation for now to test core training
-    flat = images_np.reshape(batch_size, -1)
-    normalized = (flat - 0.5) / 0.25
-    return Tensor(normalized.astype(np.float32))
-
-class SimpleCIFAR10_MLP:
-    """Much simpler model for testing"""
-    
-    def __init__(self):
-        print("🏗️ Building Simple MLP for CIFAR-10...")
-        
-        # Simple architecture
-        self.fc1 = Dense(3072, 128)  # Much smaller
-        self.fc2 = Dense(128, 10)
-        self.relu = ReLU()
-        self.layers = [self.fc1, self.fc2]
-        
-        # Initialize weights
-        self._initialize_weights()
-        
-        total_params = sum(np.prod(layer.weights.shape) + np.prod(layer.bias.shape) 
-                          for layer in self.layers)
-        print(f"✅ Model: 3072 → 128 → 10")
-        print(f"   Parameters: {total_params:,}")
-    
-    def _initialize_weights(self):
-        """Simple He initialization"""
-        for i, layer in enumerate(self.layers):
-            fan_in = layer.weights.shape[0]
-            std = np.sqrt(2.0 / fan_in) * 0.5
-            
-            layer.weights._data = np.random.randn(*layer.weights.shape).astype(np.float32) * std
-            layer.bias._data = np.zeros(layer.bias.shape, dtype=np.float32)
-            
-            # Make trainable
-            layer.weights = Variable(layer.weights.data, requires_grad=True)
-            layer.bias = Variable(layer.bias.data, requires_grad=True)
-    
-    def forward(self, x):
-        """Forward pass through the network."""
-        h1 = self.relu(self.fc1(x))
-        logits = self.fc2(h1)
-        return logits
-    
-    def parameters(self):
-        """Get all trainable parameters."""
-        params = []
-        for layer in self.layers:
-            params.extend([layer.weights, layer.bias])
-        return params
-
-def test_simple_cifar10_training():
-    """Test the simplest possible CIFAR-10 training"""
-    print("🚀 Simple CIFAR-10 Training Test")
-    print("=" * 50)
-    
-    # Load data - just small batch
-    print("📚 Loading CIFAR-10 dataset...")
-    train_dataset = CIFAR10Dataset(train=True, root='data')
-    train_loader = DataLoader(train_dataset, batch_size=8, shuffle=False)  # Very small batch
-    
-    print(f"✅ Loaded {len(train_dataset):,} train samples")
-    
-    # Create simple model
-    print("\n🏗️ Creating simple model...")
-    model = SimpleCIFAR10_MLP()
-    
-    # Setup training
-    print("\n⚙️ Setting up training...")
-    loss_fn = CrossEntropyLoss()
-    optimizer = Adam(model.parameters(), learning_rate=0.001)
-    
-    print("✅ Training setup complete")
-    
-    # Test training on just a few batches
-    print("\n📊 Training on 3 batches...")
-    
-    total_start = time.time()
-    
-    for batch_idx, (images, labels) in enumerate(train_loader):
-        if batch_idx >= 3:  # Only 3 batches
-            break
-        
-        print(f"\n  🔄 Batch {batch_idx + 1}/3")
-        batch_start = time.time()
-        
-        # Preprocess
-        print("    Preprocessing...")
-        preprocess_start = time.time()
-        x = Variable(preprocess_images(images, training=False), requires_grad=False)  # No augmentation
-        y_true = Variable(labels, requires_grad=False)
-        preprocess_time = time.time() - preprocess_start
-        print(f"    ✅ Preprocess: {preprocess_time:.4f}s")
-        
-        # Forward pass
-        print("    Forward pass...")
-        forward_start = time.time()
-        logits = model.forward(x)
-        forward_time = time.time() - forward_start
-        print(f"    ✅ Forward: {forward_time:.4f}s")
-        
-        # Loss
-        print("    Computing loss...")
-        loss_start = time.time()
-        loss = loss_fn(logits, y_true)
-        loss_time = time.time() - loss_start
-        
-        # Extract loss value
-        if hasattr(loss.data, 'data'):
-            loss_val = float(loss.data.data)
-        elif hasattr(loss.data, '_data'):
-            loss_val = float(loss.data._data)
-        else:
-            loss_val = float(loss.data)
-        
-        print(f"    ✅ Loss: {loss_time:.4f}s, Value: {loss_val:.4f}")
-        
-        # Backward
-        print("    Backward pass...")
-        backward_start = time.time()
-        optimizer.zero_grad()
-        loss.backward()
-        backward_time = time.time() - backward_start
-        print(f"    ✅ Backward: {backward_time:.4f}s")
-        
-        # Update
-        print("    Parameter update...")
-        update_start = time.time()
-        optimizer.step()
-        update_time = time.time() - update_start
-        print(f"    ✅ Update: {update_time:.4f}s")
-        
-        batch_time = time.time() - batch_start
-        print(f"  ✅ Batch {batch_idx + 1} total: {batch_time:.4f}s")
-        
-        # If any step takes too long, report it
-        if batch_time > 5.0:
-            print(f"    ⚠️  Batch taking very long: {batch_time:.4f}s")
-        
-        # Calculate accuracy for this batch
-        logits_np = logits.data._data if hasattr(logits.data, '_data') else logits.data
-        preds = np.argmax(logits_np, axis=1)
-        labels_np = y_true.data._data if hasattr(y_true.data, '_data') else y_true.data
-        accuracy = np.mean(preds == labels_np)
-        print(f"    📊 Batch accuracy: {accuracy:.1%}")
-    
-    total_time = time.time() - total_start
-    print(f"\n✅ 3 batches completed in {total_time:.4f}s")
-    print(f"   Average per batch: {total_time/3:.4f}s")
-    
-    if total_time < 10.0:
-        print("🎉 Training speed looks good!")
-        return True
-    else:
-        print("⚠️  Training seems slow")
-        return False
-
-def main():
-    try:
-        success = test_simple_cifar10_training()
-        if success:
-            print("\n💡 Core training works! The issue might be:")
-            print("   - Too many batches per epoch (500)")
-            print("   - Large batch size (64)")
-            print("   - Complex data augmentation")
-            print("   - Memory accumulation over many batches")
-    except Exception as e:
-        print(f"\n❌ Training failed: {e}")
-        import traceback
-        traceback.print_exc()
-
-if __name__ == "__main__":
-    main()
--- a/examples/cifar10/test_training_loop.py
+++ b/examples/cifar10/test_training_loop.py
@@ -1,198 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test just the training loop with minimal data to isolate the hang
-"""
-
-import sys
-import os
-import time
-sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
-
-import numpy as np
-from tinytorch.core.tensor import Tensor
-from tinytorch.core.autograd import Variable
-from tinytorch.core.layers import Dense
-from tinytorch.core.activations import ReLU
-from tinytorch.core.training import CrossEntropyLoss
-from tinytorch.core.optimizers import Adam
-from tinytorch.core.dataloader import DataLoader, CIFAR10Dataset
-
-def preprocess_images_simple(images):
-    """Simplified preprocessing without augmentation"""
-    batch_size = images.shape[0]
-    flat = images.reshape(batch_size, -1)
-    normalized = (flat - 0.5) / 0.25
-    return Tensor(normalized.astype(np.float32))
-
-def create_simple_model():
-    """Create and initialize a simple model"""
-    fc1 = Dense(3072, 64)   # Much smaller than original
-    fc2 = Dense(64, 10)
-    
-    # Initialize with reasonable values
-    for layer in [fc1, fc2]:
-        fan_in = layer.weights.shape[0]
-        std = np.sqrt(2.0 / fan_in) * 0.5
-        layer.weights._data = np.random.randn(*layer.weights.shape).astype(np.float32) * std
-        layer.bias._data = np.zeros(layer.bias.shape, dtype=np.float32)
-        
-        layer.weights = Variable(layer.weights, requires_grad=True)
-        layer.bias = Variable(layer.bias, requires_grad=True)
-    
-    return fc1, fc2
-
-def test_single_batch_training():
-    """Test training on just one batch to isolate the issue"""
-    print("🔧 Testing single batch training...")
-    
-    # Load dataset
-    print("Loading dataset...")
-    train_dataset = CIFAR10Dataset(train=True, root='data')
-    train_loader = DataLoader(train_dataset, batch_size=8, shuffle=False)
-    
-    # Create model
-    print("Creating model...")
-    fc1, fc2 = create_simple_model()
-    relu = ReLU()
-    
-    # Setup training
-    loss_fn = CrossEntropyLoss()
-    optimizer = Adam([fc1.weights, fc1.bias, fc2.weights, fc2.bias], learning_rate=0.001)
-    
-    print("Getting first batch...")
-    images, labels = next(iter(train_loader))
-    print(f"Batch loaded: images {images.shape}, labels {labels.shape}")
-    
-    print("Starting training step...")
-    step_start = time.time()
-    
-    # Preprocessing
-    print("  Preprocessing...")
-    preprocess_start = time.time()
-    x = Variable(preprocess_images_simple(images), requires_grad=False)
-    y_true = Variable(labels, requires_grad=False)
-    preprocess_time = time.time() - preprocess_start
-    print(f"  ✅ Preprocessing: {preprocess_time:.4f}s")
-    
-    # Forward pass
-    print("  Forward pass...")
-    forward_start = time.time()
-    h1 = fc1(x)
-    h1_act = relu(h1)
-    logits = fc2(h1_act)
-    forward_time = time.time() - forward_start
-    print(f"  ✅ Forward pass: {forward_time:.4f}s")
-    print(f"     Logits shape: {logits.data.shape}")
-    
-    # Loss computation
-    print("  Computing loss...")
-    loss_start = time.time()
-    loss = loss_fn(logits, y_true)
-    loss_time = time.time() - loss_start
-    
-    # Extract loss value
-    if hasattr(loss.data, 'data'):
-        loss_val = float(loss.data.data)
-    elif hasattr(loss.data, '_data'):
-        loss_val = float(loss.data._data)
-    else:
-        loss_val = float(loss.data)
-    
-    print(f"  ✅ Loss computation: {loss_time:.4f}s, Loss: {loss_val:.4f}")
-    
-    # Backward pass
-    print("  Backward pass...")
-    backward_start = time.time()
-    optimizer.zero_grad()
-    loss.backward()
-    backward_time = time.time() - backward_start
-    print(f"  ✅ Backward pass: {backward_time:.4f}s")
-    
-    # Optimizer step  
-    print("  Optimizer step...")
-    step_start_time = time.time()
-    optimizer.step()
-    step_time = time.time() - step_start_time
-    print(f"  ✅ Optimizer step: {step_time:.4f}s")
-    
-    total_time = time.time() - step_start
-    print(f"✅ Single batch training: {total_time:.4f}s total")
-    
-    return True
-
-def test_multiple_batches():
-    """Test multiple batches to see if there's a memory leak or accumulation issue"""
-    print("\n🔧 Testing multiple batch training...")
-    
-    # Load dataset
-    train_dataset = CIFAR10Dataset(train=True, root='data')
-    train_loader = DataLoader(train_dataset, batch_size=8, shuffle=False)
-    
-    # Create model
-    fc1, fc2 = create_simple_model()
-    relu = ReLU()
-    
-    # Setup training
-    loss_fn = CrossEntropyLoss()
-    optimizer = Adam([fc1.weights, fc1.bias, fc2.weights, fc2.bias], learning_rate=0.001)
-    
-    print("Training on 5 batches...")
-    
-    for batch_idx, (images, labels) in enumerate(train_loader):
-        if batch_idx >= 5:  # Only 5 batches
-            break
-            
-        print(f"  Batch {batch_idx + 1}/5...")
-        batch_start = time.time()
-        
-        # Simple training step
-        x = Variable(preprocess_images_simple(images), requires_grad=False)
-        y_true = Variable(labels, requires_grad=False)
-        
-        # Forward
-        h1 = fc1(x)
-        h1_act = relu(h1)
-        logits = fc2(h1_act)
-        
-        # Loss
-        loss = loss_fn(logits, y_true)
-        
-        # Backward
-        optimizer.zero_grad()
-        loss.backward()
-        optimizer.step()
-        
-        batch_time = time.time() - batch_start
-        
-        # Extract loss
-        if hasattr(loss.data, 'data'):
-            loss_val = float(loss.data.data)
-        elif hasattr(loss.data, '_data'):
-            loss_val = float(loss.data._data)
-        else:
-            loss_val = float(loss.data)
-            
-        print(f"    ✅ Batch {batch_idx + 1}: {batch_time:.4f}s, Loss: {loss_val:.4f}")
-        
-        # Check if it's getting slower (memory leak indicator)
-        if batch_time > 1.0:  # If any batch takes over 1 second, something's wrong
-            print(f"    ⚠️  Batch taking too long: {batch_time:.4f}s")
-            break
-    
-    print("✅ Multiple batch training completed")
-
-def main():
-    print("🧪 Training Loop Diagnostic")
-    print("=" * 50)
-    
-    try:
-        success = test_single_batch_training()
-        if success:
-            test_multiple_batches()
-    except Exception as e:
-        print(f"❌ Training failed: {e}")
-        import traceback
-        traceback.print_exc()
-
-if __name__ == "__main__":
-    main()
--- a/examples/cifar10/train_cifar10_enhanced.py
+++ b/examples/cifar10/train_cifar10_enhanced.py
@@ -1,482 +0,0 @@
-#!/usr/bin/env python3
-"""
-TinyTorch CIFAR-10 Enhanced Training with Rich UI and Real-time Plotting
-
-This script demonstrates TinyTorch's capability with beautiful Rich UI,
-real-time ASCII plotting, and extended training for higher accuracy.
-
-Features:
- Rich console with progress bars and live tables
- Real-time ASCII plots of training progress  
- Extended training for 55%+ accuracy
- Beautiful formatted output
-
-Performance Target: 55%+ accuracy with engaging visual feedback
-"""
-
-import sys
-import os
-import time
-sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
-
-import numpy as np
-from tinytorch.core.tensor import Tensor
-from tinytorch.core.autograd import Variable
-from tinytorch.core.layers import Dense
-from tinytorch.core.activations import ReLU
-from tinytorch.core.training import CrossEntropyLoss
-from tinytorch.core.optimizers import Adam
-from tinytorch.core.dataloader import DataLoader, CIFAR10Dataset
-
-# Rich imports for beautiful UI
-from rich.console import Console
-from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TaskProgressColumn, TimeElapsedColumn
-from rich.table import Table
-from rich.panel import Panel
-from rich.layout import Layout
-from rich.live import Live
-from rich.text import Text
-from rich.rule import Rule
-from rich import box
-import threading
-import queue
-
-console = Console()
-
-class ASCIIPlotter:
-    """Real-time ASCII plotting for training metrics"""
-    
-    def __init__(self, width=60, height=12):
-        self.width = width
-        self.height = height
-        self.train_acc_history = []
-        self.test_acc_history = []
-        self.loss_history = []
-        
-    def add_data(self, train_acc, test_acc, loss):
-        """Add new data point"""
-        self.train_acc_history.append(train_acc)
-        self.test_acc_history.append(test_acc)
-        self.loss_history.append(loss)
-        
-        # Keep only recent history for plotting
-        max_points = self.width - 10
-        if len(self.train_acc_history) > max_points:
-            self.train_acc_history = self.train_acc_history[-max_points:]
-            self.test_acc_history = self.test_acc_history[-max_points:]
-            self.loss_history = self.loss_history[-max_points:]
-    
-    def plot_accuracy(self):
-        """Generate ASCII plot of accuracy over time"""
-        if not self.train_acc_history:
-            return "No data yet..."
-        
-        # Normalize data to plot height
-        all_acc = self.train_acc_history + self.test_acc_history
-        min_acc = min(all_acc)
-        max_acc = max(all_acc)
-        range_acc = max_acc - min_acc if max_acc > min_acc else 1
-        
-        lines = []
-        
-        # Create plot grid
-        for y in range(self.height):
-            line = []
-            threshold = max_acc - (y / (self.height - 1)) * range_acc
-            
-            for x in range(len(self.train_acc_history)):
-                train_val = self.train_acc_history[x]
-                test_val = self.test_acc_history[x] if x < len(self.test_acc_history) else 0
-                
-                if abs(train_val - threshold) < range_acc / (self.height * 2):
-                    line.append('●')  # Train accuracy
-                elif abs(test_val - threshold) < range_acc / (self.height * 2):
-                    line.append('○')  # Test accuracy  
-                else:
-                    line.append(' ')
-            
-            # Pad line to full width
-            while len(line) < self.width - 10:
-                line.append(' ')
-            
-            # Add y-axis label
-            y_label = f"{threshold:.1%}"
-            lines.append(f"{y_label:>6}│{''.join(line[:self.width-10])}")
-        
-        # Add x-axis
-        x_axis = "      └" + "─" * (self.width - 10)
-        lines.append(x_axis)
-        
-        # Add legend
-        legend = "      ● Train  ○ Test"
-        lines.append(legend)
-        
-        return "\n".join(lines)
-    
-    def plot_loss(self):
-        """Generate ASCII plot of loss over time"""
-        if not self.loss_history:
-            return "No loss data yet..."
-        
-        # Normalize loss data
-        min_loss = min(self.loss_history)
-        max_loss = max(self.loss_history)
-        range_loss = max_loss - min_loss if max_loss > min_loss else 1
-        
-        lines = []
-        
-        for y in range(8):  # Smaller height for loss
-            line = []
-            threshold = max_loss - (y / 7) * range_loss
-            
-            for x in range(len(self.loss_history)):
-                loss_val = self.loss_history[x]
-                
-                if abs(loss_val - threshold) < range_loss / 16:
-                    line.append('▓')
-                else:
-                    line.append(' ')
-            
-            # Pad and add label
-            while len(line) < self.width - 10:
-                line.append(' ')
-                
-            y_label = f"{threshold:.2f}"
-            lines.append(f"{y_label:>6}│{''.join(line[:self.width-10])}")
-        
-        # Add x-axis  
-        lines.append("      └" + "─" * (self.width - 10))
-        lines.append("      Loss over time")
-        
-        return "\n".join(lines)
-
-class EnhancedCIFAR10_MLP:
-    """Enhanced MLP with better architecture for higher accuracy"""
-    
-    def __init__(self):
-        # Larger architecture for better accuracy
-        self.fc1 = Dense(3072, 1024)  # Bigger first layer
-        self.fc2 = Dense(1024, 512)
-        self.fc3 = Dense(512, 256)
-        self.fc4 = Dense(256, 10)
-        
-        self.relu = ReLU()
-        self.layers = [self.fc1, self.fc2, self.fc3, self.fc4]
-        
-        self._initialize_weights()
-        
-        total_params = sum(np.prod(layer.weights.shape) + np.prod(layer.bias.shape) 
-                          for layer in self.layers)
-        
-        console.print(f"[bold green]✅ Model Architecture:[/bold green] 3072 → 1024 → 512 → 256 → 10")
-        console.print(f"[bold blue]📊 Parameters:[/bold blue] {total_params:,}")
-    
-    def _initialize_weights(self):
-        """Improved initialization"""
-        for i, layer in enumerate(self.layers):
-            fan_in = layer.weights.shape[0]
-            
-            if i == len(self.layers) - 1:  # Output layer
-                std = 0.01
-            else:  # Hidden layers
-                std = np.sqrt(2.0 / fan_in) * 0.6  # Slightly more aggressive
-            
-            layer.weights._data = np.random.randn(*layer.weights.shape).astype(np.float32) * std
-            layer.bias._data = np.zeros(layer.bias.shape, dtype=np.float32)
-            
-            layer.weights = Variable(layer.weights.data, requires_grad=True)
-            layer.bias = Variable(layer.bias.data, requires_grad=True)
-    
-    def forward(self, x):
-        """Forward pass"""
-        h1 = self.relu(self.fc1(x))
-        h2 = self.relu(self.fc2(h1))
-        h3 = self.relu(self.fc3(h2))
-        logits = self.fc4(h3)
-        return logits
-    
-    def parameters(self):
-        """Get all parameters"""
-        params = []
-        for layer in self.layers:
-            params.extend([layer.weights, layer.bias])
-        return params
-
-def preprocess_images_enhanced(images, training=True):
-    """Enhanced preprocessing with better augmentation"""
-    batch_size = images.shape[0]
-    images_np = images.data if hasattr(images, 'data') else images._data
-    
-    if training:
-        # Enhanced augmentation
-        augmented = np.copy(images_np)
-        for i in range(batch_size):
-            # Horizontal flip
-            if np.random.random() > 0.5:
-                augmented[i] = np.flip(augmented[i], axis=2)
-            
-            # Brightness
-            brightness = np.random.uniform(0.85, 1.15)
-            augmented[i] = np.clip(augmented[i] * brightness, 0, 1)
-            
-            # Small random translation (np.roll shifts pixels and wraps them around the edges)
-            if np.random.random() > 0.7:
-                shift_x = np.random.randint(-2, 3)
-                shift_y = np.random.randint(-2, 3)
-                augmented[i] = np.roll(augmented[i], shift_x, axis=2)
-                augmented[i] = np.roll(augmented[i], shift_y, axis=1)
-        
-        images_np = augmented
-    
-    # Improved normalization
-    flat = images_np.reshape(batch_size, -1)
-    normalized = (flat - 0.485) / 0.229  # ImageNet red-channel mean/std, applied to every channel
-    
-    return Tensor(normalized.astype(np.float32))
-
-def evaluate_model_enhanced(model, dataloader, max_batches=100):
-    """Enhanced evaluation with more thorough testing"""
-    correct = 0
-    total = 0
-    class_correct = np.zeros(10)
-    class_total = np.zeros(10)
-    
-    for batch_idx, (images, labels) in enumerate(dataloader):
-        if batch_idx >= max_batches:
-            break
-        
-        x = Variable(preprocess_images_enhanced(images, training=False), requires_grad=False)
-        logits = model.forward(x)
-        
-        logits_np = logits.data._data if hasattr(logits.data, '_data') else logits.data
-        predictions = np.argmax(logits_np, axis=1)
-        
-        labels_np = labels.data if hasattr(labels, 'data') else labels._data
-        
-        correct += np.sum(predictions == labels_np)
-        total += len(labels_np)
-        
-        # Per-class accuracy
-        for i in range(len(labels_np)):
-            label = labels_np[i]
-            class_total[label] += 1
-            if predictions[i] == label:
-                class_correct[label] += 1
-    
-    accuracy = correct / total if total > 0 else 0
-    class_accuracies = class_correct / np.maximum(class_total, 1)
-    
-    return accuracy, class_accuracies
-
-def create_training_display(plotter, epoch, total_epochs, train_acc, test_acc, best_acc, current_loss, time_elapsed):
-    """Create rich display layout"""
-    
-    # Main stats table
-    stats_table = Table(show_header=True, header_style="bold magenta", box=box.ROUNDED)
-    stats_table.add_column("Metric", style="cyan", no_wrap=True)
-    stats_table.add_column("Current", style="green")
-    stats_table.add_column("Best", style="yellow")
-    
-    stats_table.add_row("Epoch", f"{epoch}/{total_epochs}", f"—")
-    stats_table.add_row("Train Accuracy", f"{train_acc:.1%}", f"—")
-    stats_table.add_row("Test Accuracy", f"{test_acc:.1%}", f"{best_acc:.1%}")
-    stats_table.add_row("Loss", f"{current_loss:.3f}", f"—")
-    stats_table.add_row("Time Elapsed", f"{time_elapsed:.1f}s", f"—")
-    
-    # Accuracy plot
-    acc_plot = plotter.plot_accuracy()
-    
-    # Loss plot
-    loss_plot = plotter.plot_loss()
-    
-    # Create panels
-    stats_panel = Panel(stats_table, title="📊 Training Statistics", border_style="blue")
-    acc_panel = Panel(acc_plot, title="📈 Accuracy Progress", border_style="green")
-    loss_panel = Panel(loss_plot, title="📉 Loss Progress", border_style="red")
-    
-    return stats_panel, acc_panel, loss_panel
-
-def main():
-    """Enhanced main training loop with Rich UI"""
-    
-    # Rich welcome
-    console.print("\n" + "=" * 70, style="bold blue")
-    console.print("🚀 TinyTorch CIFAR-10 Enhanced Training", style="bold green", justify="center")
-    console.print("Real-time plots • Rich UI • Higher accuracy target", style="italic", justify="center")
-    console.print("=" * 70 + "\n", style="bold blue")
-    
-    # Initialize plotter
-    plotter = ASCIIPlotter()
-    
-    # Load dataset with progress
-    with Progress(
-        SpinnerColumn(),
-        TextColumn("[progress.description]{task.description}"),
-        transient=True,
-    ) as progress:
-        task = progress.add_task("Loading CIFAR-10 dataset...", total=None)
-        
-        train_dataset = CIFAR10Dataset(train=True, root='data')
-        test_dataset = CIFAR10Dataset(train=False, root='data')
-        
-        progress.update(task, description="Creating data loaders...")
-        train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)  # Larger batch
-        test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
-        
-        progress.update(task, description="✅ Dataset loaded!")
-    
-    console.print(f"[bold green]✅ Dataset:[/bold green] {len(train_dataset):,} train + {len(test_dataset):,} test samples")
-    
-    # Create model
-    console.print("\n[bold yellow]🏗️ Building Enhanced Model...[/bold yellow]")
-    model = EnhancedCIFAR10_MLP()
-    
-    # Setup training
-    loss_fn = CrossEntropyLoss()
-    optimizer = Adam(model.parameters(), learning_rate=0.002)  # Higher learning rate
-    
-    console.print(f"\n[bold cyan]⚙️ Training Configuration:[/bold cyan]")
-    console.print(f"• Optimizer: Adam (LR: {optimizer.learning_rate})")
-    console.print(f"• Batch size: 64")
-    console.print(f"• Batches per epoch: 300")
-    console.print(f"• Target accuracy: 55%+")
-    
-    # Training parameters
-    num_epochs = 20  # More epochs for higher accuracy
-    best_test_accuracy = 0
-    batches_per_epoch = 300
-    
-    console.print(f"\n[bold red]🎯 Starting Training (Target: 55%+ accuracy)[/bold red]\n")
-    
-    # Training loop with live display
-    start_time = time.time()
-    
-    for epoch in range(num_epochs):
-        epoch_start = time.time()
-        
-        # Training phase with progress bar
-        train_losses = []
-        train_correct = 0
-        train_total = 0
-        
-        with Progress(
-            TextColumn("[progress.description]{task.description}"),
-            BarColumn(),
-            TaskProgressColumn(),
-            TimeElapsedColumn(),
-            transient=True
-        ) as progress:
-            
-            train_task = progress.add_task(f"Epoch {epoch+1}/{num_epochs}", total=batches_per_epoch)
-            
-            for batch_idx, (images, labels) in enumerate(train_loader):
-                if batch_idx >= batches_per_epoch:
-                    break
-                
-                # Training step
-                x = Variable(preprocess_images_enhanced(images, training=True), requires_grad=False)
-                y_true = Variable(labels, requires_grad=False)
-                
-                logits = model.forward(x)
-                loss = loss_fn(logits, y_true)
-                
-                # Track metrics
-                loss_val = float(loss.data.data) if hasattr(loss.data, 'data') else float(loss.data._data)
-                train_losses.append(loss_val)
-                
-                logits_np = logits.data._data if hasattr(logits.data, '_data') else logits.data
-                preds = np.argmax(logits_np, axis=1)
-                labels_np = y_true.data._data if hasattr(y_true.data, '_data') else y_true.data
-                train_correct += np.sum(preds == labels_np)
-                train_total += len(labels_np)
-                
-                # Backward pass
-                optimizer.zero_grad()
-                loss.backward()
-                optimizer.step()
-                
-                # Update progress
-                progress.update(train_task, advance=1, description=f"Epoch {epoch+1}/{num_epochs} (Loss: {loss_val:.3f})")
-        
-        # Evaluation
-        train_accuracy = train_correct / train_total
-        test_accuracy, class_accuracies = evaluate_model_enhanced(model, test_loader, max_batches=80)
-        
-        # Update best accuracy
-        if test_accuracy > best_test_accuracy:
-            best_test_accuracy = test_accuracy
-        
-        # Add to plotter
-        avg_loss = np.mean(train_losses)
-        plotter.add_data(train_accuracy, test_accuracy, avg_loss)
-        
-        # Create display
-        time_elapsed = time.time() - start_time
-        stats_panel, acc_panel, loss_panel = create_training_display(
-            plotter, epoch+1, num_epochs, train_accuracy, test_accuracy, 
-            best_test_accuracy, avg_loss, time_elapsed
-        )
-        
-        # Print results
-        console.print(stats_panel)
-        console.print(acc_panel)
-        console.print(loss_panel)
-        
-        # Success check
-        if test_accuracy >= 0.55:
-            console.print("\n🎊 [bold green]TARGET ACHIEVED![/bold green] 55%+ accuracy reached!")
-        
-        # Learning rate schedule
-        if epoch == 10:
-            optimizer.learning_rate *= 0.5
-            console.print(f"[yellow]📉 Learning rate reduced to {optimizer.learning_rate:.4f}[/yellow]")
-        
-        console.print(Rule(style="dim"))
-    
-    # Final results
-    total_time = time.time() - start_time
-    
-    console.print("\n" + "=" * 70, style="bold blue")
-    console.print("🎯 FINAL RESULTS", style="bold green", justify="center")
-    console.print("=" * 70, style="bold blue")
-    
-    # Final evaluation
-    final_accuracy, final_class_acc = evaluate_model_enhanced(model, test_loader, max_batches=float('inf'))
-    
-    # Results table
-    results_table = Table(show_header=True, header_style="bold magenta", box=box.DOUBLE)
-    results_table.add_column("Metric", style="cyan")
-    results_table.add_column("Value", style="green")
-    results_table.add_column("Comparison", style="yellow")
-    
-    results_table.add_row("Final Accuracy", f"{final_accuracy:.1%}", "")
-    results_table.add_row("Best Accuracy", f"{best_test_accuracy:.1%}", "")
-    results_table.add_row("Training Time", f"{total_time:.1f} seconds", "")
-    results_table.add_row("Random Chance", "10.0%", "❌")
-    results_table.add_row("CS231n Baseline", "50-55%", "✅" if best_test_accuracy >= 0.50 else "📈")
-    results_table.add_row("Target (55%)", "55.0%", "🎊" if best_test_accuracy >= 0.55 else "📈")
-    
-    console.print(Panel(results_table, title="📊 Performance Summary", border_style="green"))
-    
-    # Success assessment
-    if best_test_accuracy >= 0.55:
-        console.print("\n🏆 [bold green]OUTSTANDING SUCCESS![/bold green]")
-        console.print("🎉 TinyTorch achieves excellent performance on real dataset!")
-    elif best_test_accuracy >= 0.50:
-        console.print("\n✅ [bold yellow]STRONG PERFORMANCE![/bold yellow]")
-        console.print("🎯 TinyTorch matches professional ML course benchmarks!")
-    else:
-        console.print("\n📈 [bold blue]GOOD PROGRESS![/bold blue]")
-        console.print("⚡ TinyTorch demonstrates working ML system!")
-    
-    # Final plot
-    console.print(Panel(plotter.plot_accuracy(), title="📈 Final Training Progress", border_style="blue"))
-    
-    console.print(f"\n💡 [bold cyan]Key Achievements:[/bold cyan]")
-    console.print(f"   • Built complete neural network from scratch")
-    console.print(f"   • Achieved {best_test_accuracy:.1%} on real image classification")
-    console.print(f"   • Trained in {total_time:.1f} seconds with beautiful UI")
-    console.print(f"   • Proved TinyTorch enables real ML development")
-
-if __name__ == "__main__":
-    main()
--- a/examples/cifar10/train_cifar10_mlp.py
+++ b/examples/cifar10/train_cifar10_mlp.py
@@ -1,401 +0,0 @@
-#!/usr/bin/env python3
-"""
-TinyTorch CIFAR-10 MLP Training - Achieving 57.2% Accuracy
-
-This script demonstrates TinyTorch's capability to train real neural networks
-on real datasets with impressive results. Students achieve 57.2% accuracy
-with their own autograd implementation - exceeding typical ML course benchmarks!
-
-Performance Comparison:
- Random chance: 10%
- CS231n/CS229 MLPs: 50-55%
- TinyTorch MLP: 57.2% ✨
- Research MLP SOTA: 60-65%
- Simple CNNs: 70-80%
-
-Architecture: 3072 → 1024 → 512 → 256 → 128 → 10 (3.8M parameters)
-"""
-
-import sys
-import os
-import time
-sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
-
-import numpy as np
-from tinytorch.core.tensor import Tensor
-from tinytorch.core.autograd import Variable
-from tinytorch.core.layers import Dense
-from tinytorch.core.activations import ReLU
-from tinytorch.core.training import CrossEntropyLoss
-from tinytorch.core.optimizers import Adam
-from tinytorch.core.dataloader import DataLoader, CIFAR10Dataset
-
-class CIFAR10_MLP:
-    """
-    Optimized MLP for CIFAR-10 classification.
-    
-    This architecture achieves 57.2% test accuracy, demonstrating that:
-    1. TinyTorch builds working ML systems, not just toy examples
-    2. Students can achieve research-level performance with their own code
-    3. Proper optimization techniques make a huge difference
-    """
-    
-    def __init__(self):
-        print("🏗️ Building Optimized MLP for CIFAR-10...")
-        
-        # Architecture: Gradual dimension reduction
-        self.fc1 = Dense(3072, 1024)  # 32×32×3 = 3072 input features
-        self.fc2 = Dense(1024, 512)
-        self.fc3 = Dense(512, 256)
-        self.fc4 = Dense(256, 128)
-        self.fc5 = Dense(128, 10)     # 10 CIFAR-10 classes
-        
-        self.relu = ReLU()
-        self.layers = [self.fc1, self.fc2, self.fc3, self.fc4, self.fc5]
-        
-        # Optimized weight initialization (critical for performance!)
-        self._initialize_weights()
-        
-        total_params = sum(np.prod(layer.weights.shape) + np.prod(layer.bias.shape) 
-                          for layer in self.layers)
-        print(f"✅ Model: 3072 → 1024 → 512 → 256 → 128 → 10")
-        print(f"   Parameters: {total_params:,}")
-    
-    def _initialize_weights(self):
-        """
-        Proper weight initialization - key optimization technique!
-        
-        Uses He initialization for ReLU layers with conservative scaling
-        to prevent gradient explosion and improve training stability.
-        """
-        for i, layer in enumerate(self.layers):
-            fan_in = layer.weights.shape[0]
-            
-            if i == len(self.layers) - 1:  # Output layer
-                # Small weights for output stability
-                std = 0.01
-            else:  # Hidden layers
-                # He initialization with conservative scaling
-                std = np.sqrt(2.0 / fan_in) * 0.5
-            
-            layer.weights._data = np.random.randn(*layer.weights.shape).astype(np.float32) * std
-            layer.bias._data = np.zeros(layer.bias.shape, dtype=np.float32)
-            
-            # Make trainable
-            layer.weights = Variable(layer.weights.data, requires_grad=True)
-            layer.bias = Variable(layer.bias.data, requires_grad=True)
-    
-    def forward(self, x):
-        """Forward pass through the network."""
-        h1 = self.relu(self.fc1(x))
-        h2 = self.relu(self.fc2(h1))
-        h3 = self.relu(self.fc3(h2))
-        h4 = self.relu(self.fc4(h3))
-        logits = self.fc5(h4)
-        return logits
-    
-    def parameters(self):
-        """Get all trainable parameters."""
-        params = []
-        for layer in self.layers:
-            params.extend([layer.weights, layer.bias])
-        return params
-
-def preprocess_images(images, training=True):
-    """
-    Advanced preprocessing pipeline that significantly improves performance.
-    
-    Key optimizations:
-    1. Data augmentation during training (horizontal flip, brightness)
-    2. Proper normalization to [-2, 2] range for better convergence
-    3. Consistent preprocessing between train/test
-    
-    This preprocessing alone improves accuracy by ~10%!
-    """
-    batch_size = images.shape[0]
-    images_np = images.data if hasattr(images, 'data') else images._data
-    
-    if training:
-        # Data augmentation - prevents overfitting
-        augmented = np.copy(images_np)
-        
-        for i in range(batch_size):
-            # Random horizontal flip (50% chance)
-            if np.random.random() > 0.5:
-                augmented[i] = np.flip(augmented[i], axis=2)
-            
-            # Random brightness adjustment
-            brightness = np.random.uniform(0.8, 1.2)
-            augmented[i] = np.clip(augmented[i] * brightness, 0, 1)
-            
-            # Small random translations
-            if np.random.random() > 0.5:
-                shift_x = np.random.randint(-2, 3)
-                shift_y = np.random.randint(-2, 3)
-                augmented[i] = np.roll(augmented[i], shift_x, axis=2)
-                augmented[i] = np.roll(augmented[i], shift_y, axis=1)
-        
-        images_np = augmented
-    
-    # Flatten to (batch_size, 3072)
-    flat = images_np.reshape(batch_size, -1)
-    
-    # Optimized normalization: scale to [-2, 2] range
-    # This works better than standard [0,1] or [-1,1] normalization
-    normalized = (flat - 0.5) / 0.25
-    
-    return Tensor(normalized.astype(np.float32))
-
-def evaluate_model(model, dataloader, max_batches=100):
-    """
-    Comprehensive model evaluation.
-    
-    Args:
-        model: The MLP model to evaluate
-        dataloader: Test data loader
-        max_batches: Number of batches to evaluate on
-        
-    Returns:
-        accuracy: Test accuracy as a float
-    """
-    correct = 0
-    total = 0
-    
-    print("📊 Evaluating model...")
-    
-    for batch_idx, (images, labels) in enumerate(dataloader):
-        if max_batches is not None and batch_idx >= max_batches:
-            break
-        
-        # Preprocess without augmentation
-        x = Variable(preprocess_images(images, training=False), requires_grad=False)
-        
-        # Forward pass
-        logits = model.forward(x)
-        
-        # Get predictions
-        logits_np = logits.data._data if hasattr(logits.data, '_data') else logits.data
-        predictions = np.argmax(logits_np, axis=1)
-        
-        # Count correct predictions
-        labels_np = labels.data if hasattr(labels, 'data') else labels._data
-        correct += np.sum(predictions == labels_np)
-        total += len(labels_np)
-    
-    accuracy = correct / total if total > 0 else 0
-    print(f"✅ Evaluated on {total:,} samples")
-    return accuracy
-
-def main():
-    """
-    Main training loop demonstrating TinyTorch's capabilities.
-    
-    This script shows that students can:
-    1. Build working neural networks from scratch
-    2. Achieve impressive results on real datasets
-    3. Understand and implement key optimization techniques
-    """
-    print("🚀 TinyTorch CIFAR-10 MLP Training")
-    print("=" * 60)
-    print("Goal: Demonstrate that TinyTorch achieves impressive results!")
-    
-    # Load CIFAR-10 dataset
-    print("\n📚 Loading CIFAR-10 dataset...")
-    print("Creating train dataset...")
-    train_dataset = CIFAR10Dataset(train=True, root='data')
-    print(f"✅ Train dataset created with {len(train_dataset)} samples")
-    
-    print("Creating test dataset...")
-    test_dataset = CIFAR10Dataset(train=False, root='data')
-    print(f"✅ Test dataset created with {len(test_dataset)} samples")
-    
-    print("Creating DataLoaders...")
-    train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
-    print("✅ Train DataLoader created")
-    test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
-    print("✅ Test DataLoader created")
-    
-    print(f"✅ Loaded {len(train_dataset):,} train samples")
-    print(f"✅ Loaded {len(test_dataset):,} test samples")
-    
-    # Create optimized model
-    print(f"\n🏗️ Creating optimized model...")
-    print("Initializing CIFAR10_MLP...")
-    model = CIFAR10_MLP()
-    print("✅ Model created successfully")
-    
-    # Setup training
-    print("Setting up training components...")
-    print("Creating CrossEntropyLoss...")
-    loss_fn = CrossEntropyLoss()
-    print("✅ Loss function created")
-    
-    print("Getting model parameters...")
-    params = model.parameters()
-    print(f"✅ Got {len(params)} parameters")
-    
-    print("Creating Adam optimizer...")
-    optimizer = Adam(params, learning_rate=0.0003)
-    print("✅ Optimizer created")
-    
-    print(f"\n⚙️ Training configuration:")
-    print(f"   Optimizer: Adam (LR: {optimizer.learning_rate})")
-    print(f"   Loss: CrossEntropy")
-    print(f"   Batch size: 64")
-    print(f"   Data augmentation: Horizontal flip, brightness, translation")
-    
-    # Training loop
-    print(f"\n" + "=" * 60)
-    print("📊 TRAINING (Target: 57.2% Test Accuracy)")
-    print("=" * 60)
-    
-    num_epochs = 25
-    best_test_accuracy = 0
-    
-    print(f"Starting training for {num_epochs} epochs...")
-    
-    for epoch in range(num_epochs):
-        print(f"\n🔄 Starting Epoch {epoch+1}/{num_epochs}")
-        epoch_start_time = time.time()
-        # Training phase
-        train_losses = []
-        train_correct = 0
-        train_total = 0
-        
-        batches_per_epoch = 500  # Use more data for better performance
-        print(f"Processing {batches_per_epoch} batches...")
-        
-        batch_count = 0
-        for batch_idx, (images, labels) in enumerate(train_loader):
-            if batch_idx >= batches_per_epoch:
-                break
-            
-            if batch_idx == 0:
-                print(f"📦 First batch - images shape: {images.shape}, labels shape: {labels.shape}")
-            elif batch_idx % 50 == 0:
-                print(f"📦 Batch {batch_idx}/{batches_per_epoch}")
-            
-            batch_count += 1
-            
-            # Preprocess with augmentation
-            if batch_idx == 0:
-                print("🔄 Preprocessing first batch...")
-            x = Variable(preprocess_images(images, training=True), requires_grad=False)
-            y_true = Variable(labels, requires_grad=False)
-            
-            if batch_idx == 0:
-                print(f"✅ Preprocessed - x shape: {x.data.shape}, y_true shape: {y_true.data.shape}")
-            
-            # Forward pass
-            if batch_idx == 0:
-                print("🔄 Forward pass...")
-            logits = model.forward(x)
-            
-            if batch_idx == 0:
-                print(f"✅ Forward pass done - logits shape: {logits.data.shape}")
-                print("🔄 Computing loss...")
-            
-            loss = loss_fn(logits, y_true)
-            
-            if batch_idx == 0:
-                print("✅ Loss computed")
-            
-            # Track training metrics
-            loss_val = float(loss.data.data) if hasattr(loss.data, 'data') else float(loss.data._data)
-            train_losses.append(loss_val)
-            
-            # Calculate training accuracy
-            logits_np = logits.data._data if hasattr(logits.data, '_data') else logits.data
-            preds = np.argmax(logits_np, axis=1)
-            labels_np = y_true.data._data if hasattr(y_true.data, '_data') else y_true.data
-            train_correct += np.sum(preds == labels_np)
-            train_total += len(labels_np)
-            
-            # Backward pass
-            optimizer.zero_grad()
-            loss.backward()
-            optimizer.step()
-            
-            # Progress update
-            if (batch_idx + 1) % 100 == 0:
-                batch_acc = train_correct / train_total
-                recent_loss = np.mean(train_losses[-50:])
-                print(f"  Epoch {epoch+1:2d} Batch {batch_idx+1:3d}: "
-                      f"Acc={batch_acc:.1%}, Loss={recent_loss:.3f}")
-        
-        # Evaluation phase
-        train_accuracy = train_correct / train_total
-        test_accuracy = evaluate_model(model, test_loader, max_batches=80)
-        
-        # Track best performance
-        if test_accuracy > best_test_accuracy:
-            best_test_accuracy = test_accuracy
-            print(f"\n⭐ NEW BEST: {best_test_accuracy:.1%}")
-            
-            if best_test_accuracy >= 0.57:
-                print("🎊 ACHIEVED TARGET PERFORMANCE!")
-        
-        # Epoch summary
-        avg_train_loss = np.mean(train_losses)
-        print(f"\n📊 Epoch {epoch+1}/{num_epochs} Complete:")
-        print(f"   Train: {train_accuracy:.1%} (loss: {avg_train_loss:.3f})")
-        print(f"   Test:  {test_accuracy:.1%}")
-        print(f"   Best:  {best_test_accuracy:.1%}")
-        
-        # Learning rate scheduling
-        if epoch == 12:  # Reduce LR midway through training
-            optimizer.learning_rate *= 0.8
-            print(f"   📉 Learning rate → {optimizer.learning_rate:.5f}")
-        elif epoch == 20:  # Further reduction near end
-            optimizer.learning_rate *= 0.8
-            print(f"   📉 Learning rate → {optimizer.learning_rate:.5f}")
-        
-        # Early stopping if we achieve excellent performance
-        if best_test_accuracy >= 0.58:
-            print("🏆 Excellent performance achieved! Stopping early.")
-            break
-    
-    # Final results
-    print(f"\n" + "=" * 60)
-    print("🎯 FINAL RESULTS")
-    print("=" * 60)
-    
-    # Final comprehensive evaluation
-    final_accuracy = evaluate_model(model, test_loader, max_batches=None)
-    
-    print(f"Final Test Accuracy: {final_accuracy:.1%}")
-    print(f"Best Test Accuracy:  {best_test_accuracy:.1%}")
-    
-    # Performance analysis
-    print(f"\n📚 Performance Comparison:")
-    print(f"   🎯 TinyTorch MLP:       {best_test_accuracy:.1%}")
-    print(f"   🎲 Random chance:       10.0%")
-    print(f"   📖 CS231n/CS229 MLPs:   50-55%")
-    print(f"   📖 PyTorch tutorials:   45-50%")
-    print(f"   📖 Research MLP SOTA:   60-65%")
-    print(f"   📖 Simple CNNs:         70-80%")
-    
-    # Success assessment
-    if best_test_accuracy >= 0.57:
-        print(f"\n🏆 OUTSTANDING SUCCESS!")
-        print(f"   TinyTorch achieves research-level MLP performance!")
-        print(f"   Students can be proud of building systems that work!")
-    elif best_test_accuracy >= 0.55:
-        print(f"\n🎉 EXCELLENT PERFORMANCE!")
-        print(f"   TinyTorch exceeds typical ML course expectations!")
-    elif best_test_accuracy >= 0.50:
-        print(f"\n✅ STRONG PERFORMANCE!")
-        print(f"   TinyTorch matches professional course benchmarks!")
-    else:
-        print(f"\n📈 Good progress - room for further optimization")
-    
-    print(f"\n💡 Key takeaways:")
-    print(f"   • Students build working ML systems from scratch")
-    print(f"   • TinyTorch enables impressive real-world results")
-    print(f"   • Proper optimization techniques are crucial")
-    print(f"   • Path to 70-80%: Add Conv2D layers (already implemented!)")
-    
-    print(f"\n🚀 Next steps: Try Conv2D networks for even better performance!")
-
-if __name__ == "__main__":
-    main()
--- a/examples/cifar10/train_lenet5.py
+++ b/examples/cifar10/train_lenet5.py
@@ -1,346 +0,0 @@
-#!/usr/bin/env python3
-"""
-TinyTorch CIFAR-10 with LeNet-5 MLP Configuration
-
-Historical reference: Uses the dense layer sizes from LeCun et al. (1998) 
-"Gradient-based learning applied to document recognition" - but adapted as 
-an MLP since TinyTorch doesn't use Conv2D layers in this example.
-
-LeNet-5 Original: 32×32 → Conv → Pool → Conv → Pool → 120 → 84 → 10
-TinyTorch Adaptation: 32×32×3 → 1024 → 120 → 84 → 10
-
-Expected Performance: ~40% accuracy (good for such a simple architecture!)
-"""
-
-import numpy as np
-from tinytorch.core.tensor import Tensor
-from tinytorch.core.layers import Dense
-from tinytorch.core.activations import ReLU, Softmax
-from tinytorch.core.autograd import Variable
-from tinytorch.core.optimizers import Adam
-from tinytorch.core.training import MeanSquaredError
-from tinytorch.core.dataloader import DataLoader, CIFAR10Dataset
-
-
-class LeNet5ForCIFAR10:
-    """
-    LeNet-5 architecture adapted for CIFAR-10, using exact configuration from:
-    LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). 
-    "Gradient-based learning applied to document recognition"
-    
-    Original: 32x32 grayscale → 6@28x28 → pool → 16@10x10 → pool → 120 → 84 → 10
-    
-    Our adaptation:
-    - Input: 32x32 RGB → grayscale (same as original)
-    - Skip convolutions (not implemented), use direct flattening
-    - Use LeNet-5's exact dense layer sizes: 1024 → 120 → 84 → 10
-    - ReLU activations (modern improvement over original tanh)
-    - Adam optimizer (modern improvement over SGD)
-    
-    This is a proven architecture that's been working since 1998!
-    """
-    
-    def __init__(self):
-        print("🏛️ Building LeNet-5 Architecture (LeCun et al. 1998)")
-        print("📖 Using proven configuration from literature")
-        
-        # LeNet-5 layer sizes (exact from paper)
-        self.fc1 = Dense(1024, 120)    # Feature extraction layer
-        self.fc2 = Dense(120, 84)      # Hidden representation layer  
-        self.fc3 = Dense(84, 10)       # Output layer
-        
-        # Modern activations (ReLU instead of original tanh)
-        self.relu = ReLU()
-        self.softmax = Softmax()
-        
-        # LeCun initialization (small weights, zero bias)
-        self._lecun_initialization()
-        
-        # Convert to Variables for training
-        self._make_trainable()
-        
-        # Report model size
-        total_params = sum(p.data.size for p in self.parameters())
-        memory_mb = total_params * 4 / (1024 * 1024)
-        print(f"📊 LeNet-5 Model: {total_params:,} parameters ({memory_mb:.1f} MB)")
-        print(f"🎯 Expected: ~40% accuracy for this MLP adaptation")
-    
-    def _lecun_initialization(self):
-        """
-        LeCun initialization from the original paper.
-        Weights ~ Normal(mean=0, std=sqrt(1/fan_in)), bias = 0
-        """
-        for layer in [self.fc1, self.fc2, self.fc3]:
-            fan_in = layer.weights.shape[0]
-            std = np.sqrt(1.0 / fan_in)
-            layer.weights._data = np.random.normal(0, std, layer.weights.shape).astype(np.float32)
-            if layer.bias is not None:
-                layer.bias._data = np.zeros(layer.bias.shape, dtype=np.float32)
-    
-    def _make_trainable(self):
-        """Convert parameters to Variables for autograd."""
-        self.fc1.weights = Variable(self.fc1.weights, requires_grad=True)
-        self.fc1.bias = Variable(self.fc1.bias, requires_grad=True)
-        self.fc2.weights = Variable(self.fc2.weights, requires_grad=True)
-        self.fc2.bias = Variable(self.fc2.bias, requires_grad=True)
-        self.fc3.weights = Variable(self.fc3.weights, requires_grad=True)
-        self.fc3.bias = Variable(self.fc3.bias, requires_grad=True)
-    
-    def preprocess_images(self, x):
-        """
-        LeNet-5 preprocessing: RGB → grayscale, normalize to [0,1]
-        Original paper used 32x32 grayscale, we adapt from RGB.
-        """
-        batch_size = x.shape[0]
-        
-        # RGB to grayscale (same as original LeNet-5 paper)
-        # Use standard luminance formula from TV industry
-        gray = (0.299 * x[:, 0, :, :] + 
-                0.587 * x[:, 1, :, :] + 
-                0.114 * x[:, 2, :, :])
-        
-        # Normalize to [0,1] (original used [-1,1] but [0,1] works better with ReLU)
-        gray = gray / 255.0
-        
-        # Flatten to match dense layer input: 32*32 = 1024
-        return gray.reshape(batch_size, -1)
-    
-    def forward(self, x):
-        """Forward pass using exact LeNet-5 layer progression."""
-        # Convert input to Variable if needed
-        if not hasattr(x, 'requires_grad'):
-            x = Variable(x, requires_grad=True)
-        
-        # Extract numpy data for preprocessing
-        x_data = x.data.data if hasattr(x.data, 'data') else x.data
-        
-        # Apply LeNet-5 preprocessing
-        processed_data = self.preprocess_images(x_data)
-        
-        # Convert back to Variable for neural network
-        x = Variable(Tensor(processed_data), requires_grad=True)
-        
-        # LeNet-5 layer progression (exact from paper)
-        x = self.fc1(x)       # 1024 → 120 (feature extraction)
-        x = self.relu(x)
-        
-        x = self.fc2(x)       # 120 → 84 (hidden representation)
-        x = self.relu(x)
-        
-        x = self.fc3(x)       # 84 → 10 (classification)
-        x = self.softmax(x)
-        
-        return x
-    
-    def parameters(self):
-        """Get all trainable parameters."""
-        return [
-            self.fc1.weights, self.fc1.bias,
-            self.fc2.weights, self.fc2.bias,
-            self.fc3.weights, self.fc3.bias
-        ]
-
-
-def train_epoch(model, dataloader, optimizer, loss_fn, epoch):
-    """Training loop with LeNet-5 training hyperparameters."""
-    total_loss = 0
-    correct = 0
-    total = 0
-    
-    print(f"\n--- Epoch {epoch + 1} Training ---")
-    
-    for batch_idx, (images, labels) in enumerate(dataloader):
-        # Forward pass
-        predictions = model.forward(images)
-        
-        # Convert labels to one-hot (standard approach)
-        batch_size = labels.shape[0]
-        num_classes = 10
-        labels_onehot = np.zeros((batch_size, num_classes))
-        for i in range(batch_size):
-            label_idx = int(labels.data[i])
-            labels_onehot[i, label_idx] = 1.0
-        labels_var = Variable(Tensor(labels_onehot), requires_grad=False)
-        
-        # Compute loss
-        loss = loss_fn(predictions, labels_var)
-        loss_value = loss.data.data if hasattr(loss.data, 'data') else loss.data
-        total_loss += float(np.asarray(loss_value).item())
-        
-        # Compute accuracy
-        pred_data = predictions.data.data if hasattr(predictions.data, 'data') else predictions.data
-        if len(pred_data.shape) == 3:
-            pred_data = pred_data.squeeze(1)
-        pred_classes = np.argmax(pred_data, axis=1)
-        true_classes = labels.data.flatten()
-        correct += np.sum(pred_classes == true_classes)
-        total += labels.shape[0]
-        
-        # Backward pass
-        if hasattr(loss, 'backward'):
-            optimizer.zero_grad()
-            loss.backward()
-            optimizer.step()
-        
-        # Log progress
-        if batch_idx % 150 == 0:
-            curr_acc = 100 * correct / total if total > 0 else 0
-            print(f"  Batch {batch_idx:3d}/{len(dataloader)} | "
-                  f"Loss: {float(np.asarray(loss_value).item()):.4f} | "
-                  f"Acc: {curr_acc:.1f}%")
-    
-    epoch_loss = total_loss / len(dataloader)
-    epoch_acc = correct / total
-    return epoch_loss, epoch_acc
-
-
-def evaluate(model, dataloader):
-    """Evaluate model performance."""
-    correct = 0
-    total = 0
-    
-    print("\n--- Evaluation ---")
-    
-    for batch_idx, (images, labels) in enumerate(dataloader):
-        predictions = model.forward(images)
-        
-        pred_data = predictions.data.data if hasattr(predictions.data, 'data') else predictions.data
-        if len(pred_data.shape) == 3:
-            pred_data = pred_data.squeeze(1)
-        pred_classes = np.argmax(pred_data, axis=1)
-        true_classes = labels.data.flatten()
-        
-        correct += np.sum(pred_classes == true_classes)
-        total += labels.shape[0]
-        
-        if batch_idx % 25 == 0:
-            print(f"  Batch {batch_idx}: {100*correct/total:.1f}% accuracy")
-    
-    return correct / total
-
-
-def main():
-    print("=" * 80)
-    print("📚 CIFAR-10 with LeNet-5 Architecture from Literature")
-    print("🏛️ LeCun et al. (1998) - Proven configuration that works!")
-    print("=" * 80)
-    print()
-    
-    # Load CIFAR-10 dataset
-    print("📚 Loading CIFAR-10 dataset...")
-    train_dataset = CIFAR10Dataset(root="./data", train=True, download=True)
-    test_dataset = CIFAR10Dataset(root="./data", train=False, download=False)
-    
-    # Use batch size from literature (LeNet-5 used small batches)
-    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
-    test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
-    
-    print(f"  Training batches: {len(train_loader)}")
-    print(f"  Test batches: {len(test_loader)}")
-    print(f"  Image shape: {train_dataset[0][0].shape}")
-    print()
-    
-    # Build LeNet-5 model
-    print("🏗️ Building LeNet-5 Model...")
-    model = LeNet5ForCIFAR10()
-    print()
-    
-    # Use hyperparameters close to original paper
-    # Original used SGD with LR=0.01, we use Adam with equivalent LR
-    optimizer = Adam(model.parameters(), learning_rate=0.002)
-    loss_fn = MeanSquaredError()
-    
-    # Training
-    print("🎯 Training LeNet-5...")
-    print("-" * 80)
-    
-    num_epochs = 5  # Should converge quickly with good architecture
-    best_accuracy = 0
-    
-    for epoch in range(num_epochs):
-        # Train
-        train_loss, train_acc = train_epoch(model, train_loader, optimizer, loss_fn, epoch)
-        
-        # Evaluate every epoch (quick with smaller model)
-        test_acc = evaluate(model, test_loader)
-        
-        print(f"\nEpoch {epoch+1} Summary:")
-        print(f"  Train Loss: {train_loss:.4f}")
-        print(f"  Train Accuracy: {train_acc:.1%}")
-        print(f"  Test Accuracy: {test_acc:.1%}")
-        
-        if test_acc > best_accuracy:
-            best_accuracy = test_acc
-            print(f"  🎯 New best accuracy!")
-    
-    # Final evaluation
-    print("\n" + "=" * 80)
-    print("📊 Final LeNet-5 Results:")
-    print("-" * 80)
-    
-    final_accuracy = evaluate(model, test_loader)
-    print(f"\n🎯 Final Test Accuracy: {final_accuracy:.1%}")
-    print(f"🏆 Best Accuracy Achieved: {best_accuracy:.1%}")
-    
-    # Compare to literature expectations
-    literature_expectation = 0.45  # 45% is reasonable for this simplified version
-    if final_accuracy >= literature_expectation:
-        print(f"\n🎉 SUCCESS!")
-        print(f"LeNet-5 on TinyTorch achieves {final_accuracy:.1%} accuracy!")
-        print("This matches literature expectations for this architecture!")
-    else:
-        print(f"\n📈 Progress: {final_accuracy:.1%} (Literature expectation: {literature_expectation:.1%})")
-        print("Architecture is proven - may need more training or better implementation!")
-    
-    # Show what we've accomplished
-    print(f"\n🏛️ LeNet-5 Heritage:")
-    print("-" * 50)
-    print("✅ Using exact layer sizes from LeCun et al. (1998)")
-    print("✅ LeCun weight initialization (proven to work)")
-    print("✅ Standard preprocessing (RGB → grayscale → normalize)")
-    print("✅ Modern improvements (ReLU activations, Adam optimizer)")
-    print("✅ Proven architecture that launched the deep learning revolution")
-    
-    # Sample predictions
-    class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 
-                   'dog', 'frog', 'horse', 'ship', 'truck']
-    
-    print("\n🔍 Sample LeNet-5 Predictions:")
-    print("-" * 50)
-    
-    for images, labels in test_loader:
-        predictions = model.forward(images)
-        pred_data = predictions.data.data if hasattr(predictions.data, 'data') else predictions.data
-        if len(pred_data.shape) == 3:
-            pred_data = pred_data.squeeze(1)
-        pred_classes = np.argmax(pred_data, axis=1)
-        true_classes = labels.data.flatten()
-        
-        correct_count = 0
-        for i in range(min(8, len(pred_classes))):
-            true_name = class_names[true_classes[i]]
-            pred_name = class_names[pred_classes[i]]
-            status = "✅" if true_classes[i] == pred_classes[i] else "❌"
-            if status == "✅":
-                correct_count += 1
-            print(f"  True: {true_name:>10}, Predicted: {pred_name:>10} {status}")
-        
-        print(f"\n  Sample accuracy: {correct_count}/8 = {100*correct_count/8:.0f}%")
-        break
-    
-    print("\n" + "=" * 80)
-    print("🎯 Key Takeaway:")
-    print("-" * 80)
-    print("✅ TinyTorch successfully implements LeNet-5 from literature")
-    print("✅ Uses proven architecture and initialization from 1998 paper")
-    print("✅ Demonstrates that good ML is about using known techniques")
-    print("✅ Shows TinyTorch can reproduce classic results")
-    print()
-    print("This proves TinyTorch works - we're using a 25-year-old")
-    print("architecture that's been tested by thousands of researchers!")
-    
-    return final_accuracy
-
-
-if __name__ == "__main__":
-    accuracy = main()
--- a/examples/cifar10/train_simple_baseline.py
+++ b/examples/cifar10/train_simple_baseline.py
@@ -1,211 +0,0 @@
-#!/usr/bin/env python3
-"""
-TinyTorch CIFAR-10 Simple Baseline
-
-This script demonstrates a simple baseline that students can easily understand
-and achieve ~40% accuracy with minimal optimization. It serves as a comparison
-point to show how optimization techniques improve performance.
-
-Simple Baseline: ~40% accuracy
-Optimized MLP: 57.2% accuracy  
-Improvement: ~17 percentage points from optimization techniques!
-
-Architecture: 3072 → 512 → 128 → 10 (simple 3-layer MLP)
-"""
-
-import sys
-import os
-sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
-
-import numpy as np
-from tinytorch.core.tensor import Tensor
-from tinytorch.core.autograd import Variable
-from tinytorch.core.layers import Dense
-from tinytorch.core.activations import ReLU
-from tinytorch.core.training import CrossEntropyLoss
-from tinytorch.core.optimizers import Adam
-from tinytorch.core.dataloader import DataLoader, CIFAR10Dataset
-
-class SimpleMLP:
-    """
-    Simple 3-layer MLP baseline for CIFAR-10.
-    
-    This demonstrates basic neural network training without advanced
-    optimization techniques. Good for understanding fundamentals!
-    """
-    
-    def __init__(self):
-        print("🏗️ Building Simple MLP Baseline...")
-        
-        # Simple architecture
-        self.fc1 = Dense(3072, 512)  # 32×32×3 = 3072 input
-        self.fc2 = Dense(512, 128)
-        self.fc3 = Dense(128, 10)    # 10 CIFAR-10 classes
-        
-        self.relu = ReLU()
-        
-        # Basic weight initialization
-        for layer in [self.fc1, self.fc2, self.fc3]:
-            fan_in = layer.weights.shape[0]
-            std = np.sqrt(2.0 / fan_in)  # Standard He initialization
-            
-            layer.weights._data = np.random.randn(*layer.weights.shape).astype(np.float32) * std
-            layer.bias._data = np.zeros(layer.bias.shape, dtype=np.float32)
-            
-            layer.weights = Variable(layer.weights.data, requires_grad=True)
-            layer.bias = Variable(layer.bias.data, requires_grad=True)
-        
-        total_params = (3072*512 + 512) + (512*128 + 128) + (128*10 + 10)
-        print(f"✅ Architecture: 3072 → 512 → 128 → 10")
-        print(f"   Parameters: {total_params:,} (much smaller than optimized version)")
-    
-    def forward(self, x):
-        """Simple forward pass."""
-        h1 = self.relu(self.fc1(x))
-        h2 = self.relu(self.fc2(h1))
-        logits = self.fc3(h2)
-        return logits
-    
-    def parameters(self):
-        """Get all parameters."""
-        return [self.fc1.weights, self.fc1.bias,
-                self.fc2.weights, self.fc2.bias,
-                self.fc3.weights, self.fc3.bias]
-
-def simple_preprocess(images):
-    """
-    Simple preprocessing - just flatten and normalize.
-    No data augmentation or advanced techniques.
-    """
-    batch_size = images.shape[0]
-    images_np = images.data if hasattr(images, 'data') else images._data
-    
-    # Flatten to (batch_size, 3072)
-    flat = images_np.reshape(batch_size, -1)
-    
-    # No rescaling is applied here; pixel values are assumed to already be in [0, 1]
-    normalized = flat
-    
-    return Tensor(normalized.astype(np.float32))
-
-def evaluate_simple(model, dataloader, max_batches=50):
-    """Simple evaluation function."""
-    correct = 0
-    total = 0
-    
-    for batch_idx, (images, labels) in enumerate(dataloader):
-        if batch_idx >= max_batches:
-            break
-        
-        x = Variable(simple_preprocess(images), requires_grad=False)
-        logits = model.forward(x)
-        
-        logits_np = logits.data._data if hasattr(logits.data, '_data') else logits.data
-        preds = np.argmax(logits_np, axis=1)
-        
-        labels_np = labels.data if hasattr(labels, 'data') else labels._data
-        correct += np.sum(preds == labels_np)
-        total += len(labels_np)
-    
-    return correct / total if total > 0 else 0
-
-def main():
-    """
-    Simple training demonstrating baseline performance.
-    
-    This script shows what students can achieve with basic techniques,
-    highlighting the value of the optimizations in train_cifar10_mlp.py.
-    """
-    print("🎯 TinyTorch CIFAR-10 Simple Baseline")
-    print("=" * 50)
-    print("Goal: Establish baseline to show value of optimization!")
-    
-    # Load data
-    print("\n📚 Loading CIFAR-10...")
-    train_dataset = CIFAR10Dataset(train=True, root='data')
-    test_dataset = CIFAR10Dataset(train=False, root='data')
-    
-    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
-    test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
-    
-    print(f"✅ Loaded {len(train_dataset):,} train samples")
-    
-    # Create simple model
-    model = SimpleMLP()
-    
-    # Basic training setup
-    loss_fn = CrossEntropyLoss()
-    optimizer = Adam(model.parameters(), learning_rate=0.001)  # Higher LR, no tuning
-    
-    print(f"\n⚙️ Simple configuration:")
-    print(f"   No data augmentation")
-    print(f"   Basic normalization")
-    print(f"   Standard learning rate")
-    print(f"   Smaller architecture")
-    
-    # Simple training loop
-    print(f"\n📊 TRAINING (Target: ~40% accuracy)")
-    print("=" * 40)
-    
-    num_epochs = 15
-    best_accuracy = 0
-    
-    for epoch in range(num_epochs):
-        # Training
-        train_losses = []
-        
-        for batch_idx, (images, labels) in enumerate(train_loader):
-            if batch_idx >= 200:  # Fewer batches per epoch
-                break
-            
-            x = Variable(simple_preprocess(images), requires_grad=False)
-            y_true = Variable(labels, requires_grad=False)
-            
-            logits = model.forward(x)
-            loss = loss_fn(logits, y_true)
-            
-            optimizer.zero_grad()
-            loss.backward()
-            optimizer.step()
-            
-            loss_val = float(loss.data.data) if hasattr(loss.data, 'data') else float(loss.data._data)
-            train_losses.append(loss_val)
-        
-        # Evaluate
-        test_accuracy = evaluate_simple(model, test_loader, max_batches=40)
-        best_accuracy = max(best_accuracy, test_accuracy)
-        
-        if epoch % 3 == 0:
-            print(f"Epoch {epoch+1:2d}: Test {test_accuracy:.1%}, "
-                  f"Loss {np.mean(train_losses):.3f}")
-        
-        # Simple LR decay
-        if epoch == 8:
-            optimizer.learning_rate *= 0.5
-    
-    # Results
-    print(f"\n" + "=" * 50)
-    print("📊 BASELINE RESULTS")
-    print("=" * 50)
-    
-    print(f"Best Test Accuracy: {best_accuracy:.1%}")
-    
-    print(f"\n📈 Comparison:")
-    print(f"   🎯 Simple Baseline:     {best_accuracy:.1%}")
-    print(f"   🚀 Optimized MLP:       57.2%")
-    print(f"   📊 Improvement:         +{57.2 - best_accuracy*100:.1f}%")
-    
-    print(f"\n💡 Key optimizations that improve performance:")
-    print(f"   • Larger, deeper architecture (+5-10%)")
-    print(f"   • Data augmentation (+8-12%)")  
-    print(f"   • Better normalization (+3-5%)")
-    print(f"   • Careful weight initialization (+2-4%)")
-    print(f"   • Learning rate tuning (+2-3%)")
-    
-    print(f"\n✅ This baseline proves TinyTorch works!")
-    print(f"   Even simple approaches achieve meaningful results.")
-    print(f"   Optimizations in train_cifar10_mlp.py show the power")
-    print(f"   of proper ML engineering techniques!")
-
-if __name__ == "__main__":
-    main()
--- a/examples/cifar10/working_cifar10_train.py
+++ b/examples/cifar10/working_cifar10_train.py
@@ -1,288 +0,0 @@
-#!/usr/bin/env python3
-"""
-TinyTorch CIFAR-10 MLP Training - Working Version
-
-This script demonstrates TinyTorch's capability to train real neural networks
-on real datasets with good results. Based on the original but optimized for
-reasonable training time while maintaining educational value.
-
-Performance Comparison:
- Random chance: 10%
- CS231n/CS229 MLPs: 50-55%  
- TinyTorch MLP: 55-60% ✨
- Research MLP SOTA: 60-65%
- Simple CNNs: 70-80%
-
-Architecture: 3072 → 512 → 256 → 10 (optimized for speed)
-"""
-
-import sys
-import os
-import time
-sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
-
-import numpy as np
-from tinytorch.core.tensor import Tensor
-from tinytorch.core.autograd import Variable
-from tinytorch.core.layers import Dense
-from tinytorch.core.activations import ReLU
-from tinytorch.core.training import CrossEntropyLoss
-from tinytorch.core.optimizers import Adam
-from tinytorch.core.dataloader import DataLoader, CIFAR10Dataset
-
-class OptimizedCIFAR10_MLP:
-    """
-    Optimized MLP for CIFAR-10 classification - faster training, good accuracy.
-    
-    This architecture achieves 55-60% test accuracy while training quickly,
-    demonstrating that TinyTorch builds working ML systems.
-    """
-    
-    def __init__(self):
-        print("🏗️ Building Optimized MLP for CIFAR-10...")
-        
-        # Optimized architecture: fewer parameters for faster training
-        self.fc1 = Dense(3072, 512)   # 32×32×3 = 3072 input features
-        self.fc2 = Dense(512, 256)
-        self.fc3 = Dense(256, 10)     # 10 CIFAR-10 classes
-        
-        self.relu = ReLU()
-        self.layers = [self.fc1, self.fc2, self.fc3]
-        
-        # Initialize weights
-        self._initialize_weights()
-        
-        total_params = sum(np.prod(layer.weights.shape) + np.prod(layer.bias.shape) 
-                          for layer in self.layers)
-        print(f"✅ Model: 3072 → 512 → 256 → 10")
-        print(f"   Parameters: {total_params:,}")
-    
-    def _initialize_weights(self):
-        """He initialization with conservative scaling"""
-        for i, layer in enumerate(self.layers):
-            fan_in = layer.weights.shape[0]
-            
-            if i == len(self.layers) - 1:  # Output layer
-                std = 0.01
-            else:  # Hidden layers
-                std = np.sqrt(2.0 / fan_in) * 0.5
-            
-            layer.weights._data = np.random.randn(*layer.weights.shape).astype(np.float32) * std
-            layer.bias._data = np.zeros(layer.bias.shape, dtype=np.float32)
-            
-            # Make trainable
-            layer.weights = Variable(layer.weights.data, requires_grad=True)
-            layer.bias = Variable(layer.bias.data, requires_grad=True)
-    
-    def forward(self, x):
-        """Forward pass through the network."""
-        h1 = self.relu(self.fc1(x))
-        h2 = self.relu(self.fc2(h1))
-        logits = self.fc3(h2)
-        return logits
-    
-    def parameters(self):
-        """Get all trainable parameters."""
-        params = []
-        for layer in self.layers:
-            params.extend([layer.weights, layer.bias])
-        return params
-
-def preprocess_images_fast(images, training=True):
-    """
-    Fast preprocessing optimized for educational use.
-    
-    Focuses on core concepts without complex augmentation that slows training.
-    """
-    batch_size = images.shape[0]
-    images_np = images.data if hasattr(images, 'data') else images._data
-    
-    if training:
-        # Simple augmentation: just horizontal flip
-        augmented = np.copy(images_np)
-        for i in range(batch_size):
-            if np.random.random() > 0.5:
-                augmented[i] = np.flip(augmented[i], axis=2)
-        images_np = augmented
-    
-    # Flatten and normalize
-    flat = images_np.reshape(batch_size, -1)
-    normalized = (flat - 0.5) / 0.25
-    
-    return Tensor(normalized.astype(np.float32))
-
-def evaluate_model(model, dataloader, max_batches=50):
-    """Fast model evaluation."""
-    correct = 0
-    total = 0
-    
-    for batch_idx, (images, labels) in enumerate(dataloader):
-        if batch_idx >= max_batches:
-            break
-        
-        # Preprocess without augmentation
-        x = Variable(preprocess_images_fast(images, training=False), requires_grad=False)
-        
-        # Forward pass
-        logits = model.forward(x)
-        
-        # Get predictions
-        logits_np = logits.data._data if hasattr(logits.data, '_data') else logits.data
-        predictions = np.argmax(logits_np, axis=1)
-        
-        # Count correct predictions
-        labels_np = labels.data if hasattr(labels, 'data') else labels._data
-        correct += np.sum(predictions == labels_np)
-        total += len(labels_np)
-    
-    accuracy = correct / total if total > 0 else 0
-    return accuracy
-
-def main():
-    """
-    Main training loop demonstrating TinyTorch's capabilities with reasonable timing.
-    """
-    print("🚀 TinyTorch CIFAR-10 MLP Training (Optimized)")
-    print("=" * 60)
-    print("Goal: Demonstrate working ML system with good accuracy!")
-    
-    # Load CIFAR-10 dataset
-    print("\n📚 Loading CIFAR-10 dataset...")
-    train_dataset = CIFAR10Dataset(train=True, root='data')
-    test_dataset = CIFAR10Dataset(train=False, root='data')
-    
-    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)  # Smaller batch
-    test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
-    
-    print(f"✅ Loaded {len(train_dataset):,} train samples")
-    print(f"✅ Loaded {len(test_dataset):,} test samples")
-    
-    # Create optimized model
-    print(f"\n🏗️ Creating optimized model...")
-    model = OptimizedCIFAR10_MLP()
-    
-    # Setup training
-    loss_fn = CrossEntropyLoss()
-    optimizer = Adam(model.parameters(), learning_rate=0.001)
-    
-    print(f"\n⚙️ Training configuration:")
-    print(f"   Optimizer: Adam (LR: {optimizer.learning_rate})")
-    print(f"   Loss: CrossEntropy")
-    print(f"   Batch size: 32")
-    print(f"   Batches per epoch: 200 (reasonable for demonstration)")
-    
-    # Training loop
-    print(f"\n" + "=" * 60)
-    print("📊 TRAINING (Target: 55%+ Test Accuracy)")
-    print("=" * 60)
-    
-    num_epochs = 10  # Fewer epochs for faster training
-    best_test_accuracy = 0
-    batches_per_epoch = 200  # Much fewer batches for reasonable timing
-    
-    total_training_start = time.time()
-    
-    for epoch in range(num_epochs):
-        print(f"\n🔄 Epoch {epoch+1}/{num_epochs}")
-        epoch_start = time.time()
-        
-        # Training phase
-        train_losses = []
-        train_correct = 0
-        train_total = 0
-        
-        for batch_idx, (images, labels) in enumerate(train_loader):
-            if batch_idx >= batches_per_epoch:
-                break
-            
-            # Progress updates
-            if batch_idx % 50 == 0:
-                print(f"  Batch {batch_idx+1}/{batches_per_epoch}")
-            
-            # Preprocess with simple augmentation
-            x = Variable(preprocess_images_fast(images, training=True), requires_grad=False)
-            y_true = Variable(labels, requires_grad=False)
-            
-            # Forward pass
-            logits = model.forward(x)
-            loss = loss_fn(logits, y_true)
-            
-            # Track training metrics
-            loss_val = float(loss.data.data) if hasattr(loss.data, 'data') else float(loss.data._data)
-            train_losses.append(loss_val)
-            
-            # Calculate training accuracy
-            logits_np = logits.data._data if hasattr(logits.data, '_data') else logits.data
-            preds = np.argmax(logits_np, axis=1)
-            labels_np = y_true.data._data if hasattr(y_true.data, '_data') else y_true.data
-            train_correct += np.sum(preds == labels_np)
-            train_total += len(labels_np)
-            
-            # Backward pass
-            optimizer.zero_grad()
-            loss.backward()
-            optimizer.step()
-        
-        # Evaluation phase
-        train_accuracy = train_correct / train_total
-        test_accuracy = evaluate_model(model, test_loader, max_batches=50)
-        
-        # Track best performance
-        if test_accuracy > best_test_accuracy:
-            best_test_accuracy = test_accuracy
-            print(f"⭐ NEW BEST: {best_test_accuracy:.1%}")
-        
-        # Epoch summary
-        avg_train_loss = np.mean(train_losses)
-        epoch_time = time.time() - epoch_start
-        print(f"📊 Epoch {epoch+1} Complete ({epoch_time:.1f}s):")
-        print(f"   Train: {train_accuracy:.1%} (loss: {avg_train_loss:.3f})")
-        print(f"   Test:  {test_accuracy:.1%}")
-        print(f"   Best:  {best_test_accuracy:.1%}")
-        
-        # Learning rate decay
-        if epoch == 5:
-            optimizer.learning_rate *= 0.5
-            print(f"   📉 Learning rate → {optimizer.learning_rate:.4f}")
-    
-    # Final results
-    total_training_time = time.time() - total_training_start
-    print(f"\n" + "=" * 60)
-    print("🎯 FINAL RESULTS")
-    print("=" * 60)
-    
-    # Final comprehensive evaluation
-    final_accuracy = evaluate_model(model, test_loader, max_batches=100)
-    
-    print(f"Final Test Accuracy: {final_accuracy:.1%}")
-    print(f"Best Test Accuracy:  {best_test_accuracy:.1%}")
-    print(f"Total Training Time: {total_training_time:.1f} seconds")
-    
-    # Performance analysis
-    print(f"\n📚 Performance Comparison:")
-    print(f"   🎯 TinyTorch MLP:       {best_test_accuracy:.1%}")
-    print(f"   🎲 Random chance:       10.0%")
-    print(f"   📖 CS231n/CS229 MLPs:   50-55%")
-    print(f"   📖 Research MLP SOTA:   60-65%")
-    
-    # Success assessment
-    if best_test_accuracy >= 0.55:
-        print(f"\n🏆 SUCCESS!")
-        print(f"   TinyTorch achieves excellent MLP performance!")
-        print(f"   Students built a working ML system from scratch!")
-    elif best_test_accuracy >= 0.50:
-        print(f"\n✅ STRONG PERFORMANCE!")
-        print(f"   TinyTorch matches professional ML course benchmarks!")
-    elif best_test_accuracy >= 0.40:
-        print(f"\n📈 Good progress - demonstrates learning is happening")
-    else:
-        print(f"\n📈 System works - may need more training time or tuning")
-    
-    print(f"\n💡 Key takeaways:")
-    print(f"   • Students build working ML systems from scratch")
-    print(f"   • TinyTorch enables real neural network training")
-    print(f"   • Training time: {total_training_time:.1f}s (reasonable for education)")
-    print(f"   • Path to higher accuracy: More training time or CNN layers")
-
-if __name__ == "__main__":
-    main()
--- a/examples/xornet/README.md
+++ b/examples/xornet/README.md
@@ -1,60 +1,75 @@
-# XORnet 🔥
+# XOR Neural Network 🧠

-The classic XOR problem that launched the deep learning revolution!
+**Classic non-linear function learning with beautiful visualization**

-## What This Demonstrates
+## What is XOR?

- **Multi-layer networks** can solve non-linear problems
- **Hidden layers** transform the input space  
- **Backpropagation** finds the right weights
- **Your TinyTorch framework** works like PyTorch!
-
-## The XOR Problem
-
-XOR (exclusive OR) outputs 1 when inputs differ, 0 when they're the same:
+The XOR (exclusive OR) problem is a classic neural network challenge that demonstrates a network's ability to learn non-linear functions. Linear models cannot solve XOR, but neural networks with hidden layers can.

+**XOR Truth Table:**
 ```
-0 XOR 0 = 0
-0 XOR 1 = 1
-1 XOR 0 = 1  
-1 XOR 1 = 0
+Input  | Output
+-------|-------
+0  0   |   0
+0  1   |   1  
+1  0   |   1
+1  1   |   0
 ```

-Single neurons can't solve this - but 2 layers can!
+## Features

-## Running the Example
-
-```bash
-python train.py
-```
-
-Expected output:
-```
-Training XOR Network...
----------------------------------------
-Epoch    0 | Loss: 0.2500 | Accuracy: 50.0%
-Epoch  100 | Loss: 0.1234 | Accuracy: 75.0%
-Epoch  200 | Loss: 0.0456 | Accuracy: 100.0%
-...
-Final Accuracy: 100.0%
-🎉 SUCCESS! XOR problem solved!
-```
+- **Beautiful Rich UI** with real-time ASCII plotting
+- **Perfect convergence visualization** 
+- **100% accuracy achievement** on XOR truth table
+- **Educational value** - see exactly how the network learns

 ## Architecture

 ```
-Input Layer (2 neurons)
-    ↓
-Hidden Layer (4 neurons, ReLU)
-    ↓
-Output Layer (1 neuron, Sigmoid)
+Input Layer (2) → Hidden Layer (8) → Output Layer (1)
 ```

-## Key Insight
+- **Activation**: ReLU for hidden layer, linear for output
+- **Loss**: Mean Squared Error
+- **Optimizer**: SGD with learning rate 0.1
+- **Parameters**: 33 total (2×8 weights + 8 biases in the hidden layer, 8×1 weights + 1 bias in the output layer)

-The hidden layer transforms XOR from "not linearly separable" to "linearly separable" - this is the power of deep learning!
+## Running the Example

-## Requirements
+```bash
+cd examples/xornet/
+python train_with_dashboard.py
+```

- Module 05 (Dense Networks) completed
- TinyTorch package exported
+**Expected Output:**
+- Training completes in ~30 seconds
+- Reaches 100% accuracy (perfect XOR solution)
+- Beautiful real-time visualization of learning progress
+- Final predictions table showing exact XOR outputs
+
+## What You'll See
+
+1. **Welcome Screen**: Model architecture and training configuration
+2. **Real-time Training**: ASCII plots showing accuracy and loss curves
+3. **Convergence Metrics**: Custom "convergence" metric showing progress to solution
+4. **Final Results**: Exact predictions for all XOR inputs
+5. **Success Celebration**: Visual confirmation of perfect learning
+
+## Educational Value
+
+This example demonstrates:
+- **Non-linear learning**: How hidden layers enable complex function approximation
+- **Training visualization**: Real-time feedback on neural network learning
+- **Perfect convergence**: What successful optimization looks like
+- **TinyTorch capabilities**: Using your own framework for real problems
+
+## Technical Details
+
+- **Training time**: ~30 seconds
+- **Memory usage**: Minimal (~1MB)
+- **Success rate**: 100% (XOR is reliably solvable)
+- **Visualization**: Rich console interface with ASCII plotting
+
+---
+
+**Perfect for demonstrating that TinyTorch can solve classic ML problems with beautiful visualization!** ✨
--- a/examples/xornet/simple_test.py
+++ b/examples/xornet/simple_test.py
@@ -1,113 +0,0 @@
-#!/usr/bin/env python3
-"""
-Simple XOR test using the exact pattern from the working autograd test
-"""
-
-import numpy as np
-from tinytorch.core.tensor import Tensor
-from tinytorch.core.layers import Dense
-from tinytorch.core.activations import ReLU
-from tinytorch.core.optimizers import SGD
-from tinytorch.core.training import MeanSquaredError
-from tinytorch.core.autograd import Variable
-
-def test_xor_simple():
-    """Test XOR using the exact working pattern from autograd tests"""
-    
-    # Simple model
-    fc1 = Dense(2, 4)  # 2 inputs -> 4 hidden
-    fc2 = Dense(4, 1)  # 4 hidden -> 1 output
-    
-    # Initialize with reasonable values (from working test)
-    for layer in [fc1, fc2]:
-        fan_in = layer.weights.shape[0]
-        std = np.sqrt(2.0 / fan_in)
-        layer.weights._data = np.random.randn(*layer.weights.shape).astype(np.float32) * std
-        layer.bias._data = np.zeros(layer.bias.shape, dtype=np.float32)
-        
-        layer.weights = Variable(layer.weights, requires_grad=True)
-        layer.bias = Variable(layer.bias, requires_grad=True)
-    
-    # Optimizer
-    params = [fc1.weights, fc1.bias, fc2.weights, fc2.bias]
-    optimizer = SGD(params, learning_rate=0.1)
-    
-    # XOR training data
-    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
-    y = np.array([[0], [1], [1], [0]], dtype=np.float32)
-    
-    print("Training XOR with working pattern...")
-    print("Initial test:")
-    
-    # Track losses
-    losses = []
-    
-    for i in range(100):
-        # Forward (exact pattern from working test)
-        x_var = Variable(Tensor(X), requires_grad=True)
-        h = fc1(x_var)
-        relu = ReLU()
-        h = relu(h)
-        out = fc2(h)
-        
-        # Loss
-        y_var = Variable(Tensor(y), requires_grad=False)
-        loss_fn = MeanSquaredError()
-        loss = loss_fn(out, y_var)
-        
-        if hasattr(loss.data, 'data'):
-            loss_val = float(loss.data.data)
-        else:
-            loss_val = float(loss.data._data)
-        losses.append(loss_val)
-        
-        # Backward
-        optimizer.zero_grad()
-        loss.backward()
-        
-        # Fix bias gradients if needed (from working test)
-        for layer in [fc1, fc2]:
-            if layer.bias.grad is not None:
-                if hasattr(layer.bias.grad.data, 'data'):
-                    grad = layer.bias.grad.data.data
-                else:
-                    grad = layer.bias.grad.data
-                
-                if len(grad.shape) == 2:
-                    # Sum over batch dimension
-                    layer.bias.grad = Variable(Tensor(np.sum(grad, axis=0)))
-        
-        # Update
-        optimizer.step()
-        
-        if i % 20 == 0:
-            print(f"  Iteration {i:2d}: Loss = {loss_val:.4f}")
-    
-    # Final test
-    x_var = Variable(Tensor(X), requires_grad=False)
-    h = fc1(x_var)
-    h = relu(h)
-    predictions = fc2(h)
-    
-    print("\nFinal results:")
-    pred_data = predictions.data._data
-    for i in range(4):
-        prediction = pred_data[i, 0]
-        target = y[i, 0]
-        correct = "✅" if abs(prediction - target) < 0.5 else "❌"
-        print(f"  {X[i]} -> {prediction:.3f} (want {target}) {correct}")
-    
-    # Check if loss decreased
-    initial_loss = losses[0]
-    final_loss = losses[-1]
-    
-    print(f"\nLoss change: {initial_loss:.4f} -> {final_loss:.4f}")
-    if final_loss < initial_loss * 0.9:
-        print("✅ Learning happened!")
-        return True
-    else:
-        print("❌ No learning detected")
-        return False
-
-if __name__ == "__main__":
-    success = test_xor_simple()
--- a/examples/xornet/train.py
+++ b/examples/xornet/train.py
@@ -1,194 +0,0 @@
-#!/usr/bin/env python3
-"""
-XOR Network Training with TinyTorch
-
-This example demonstrates training a neural network to solve the classic XOR problem,
-proving that multi-layer networks can learn non-linear functions.
-
-Just like in PyTorch, we:
-1. Create a dataset
-2. Build a model
-3. Train with gradient descent
-4. Evaluate performance
-
-Architecture: 2 → 4 → 1 with ReLU and Sigmoid
-Expected Result: 100% accuracy on XOR truth table
-"""
-
-import numpy as np
-import tinytorch as tt
-from tinytorch.core.tensor import Tensor
-from tinytorch.core.layers import Dense
-from tinytorch.core.activations import ReLU, Sigmoid
-from tinytorch.core.optimizers import SGD
-from tinytorch.core.training import MeanSquaredError as MSELoss
-from tinytorch.core.autograd import Variable
-
-
-def create_dataset():
-    """Create the XOR dataset."""
-    # XOR truth table
-    X = np.array([
-        [0, 0],
-        [0, 1],
-        [1, 0],
-        [1, 1]
-    ], dtype=np.float32)
-    
-    y = np.array([
-        [0],  # 0 XOR 0 = 0
-        [1],  # 0 XOR 1 = 1
-        [1],  # 1 XOR 0 = 1
-        [0]   # 1 XOR 1 = 0
-    ], dtype=np.float32)
-    
-    return X, y
-
-
-def create_model():
-    """Create and initialize the XOR network."""
-    # Simple model: 2 → 4 → 1
-    fc1 = Dense(2, 4)  # 2 inputs -> 4 hidden
-    fc2 = Dense(4, 1)  # 4 hidden -> 1 output
-    
-    # Initialize with reasonable values (He initialization)
-    for layer in [fc1, fc2]:
-        fan_in = layer.weights.shape[0]
-        std = np.sqrt(2.0 / fan_in)
-        layer.weights._data = np.random.randn(*layer.weights.shape).astype(np.float32) * std
-        layer.bias._data = np.zeros(layer.bias.shape, dtype=np.float32)
-        
-        layer.weights = Variable(layer.weights, requires_grad=True)
-        layer.bias = Variable(layer.bias, requires_grad=True)
-    
-    return fc1, fc2
-
-
-def forward_pass(fc1, fc2, X, requires_grad=True):
-    """Forward pass through the network."""
-    relu = ReLU()
-    
-    x_var = Variable(Tensor(X), requires_grad=requires_grad)
-    h = fc1(x_var)
-    h = relu(h)
-    out = fc2(h)
-    return out
-
-
-def train_network(fc1, fc2, X, y, epochs=500, lr=0.1):
-    """Train the network using gradient descent."""
-    # Optimizer
-    params = [fc1.weights, fc1.bias, fc2.weights, fc2.bias]
-    optimizer = SGD(params, learning_rate=lr)
-    
-    print("Training XOR Network...")
-    print("-" * 40)
-    
-    losses = []
-    
-    for epoch in range(epochs):
-        # Forward pass
-        predictions = forward_pass(fc1, fc2, X)
-        
-        # Loss
-        y_var = Variable(Tensor(y), requires_grad=False)
-        loss_fn = MSELoss()
-        loss = loss_fn(predictions, y_var)
-        
-        if hasattr(loss.data, 'data'):
-            loss_val = float(loss.data.data)
-        else:
-            loss_val = float(loss.data._data)
-        losses.append(loss_val)
-        
-        # Backward
-        optimizer.zero_grad()
-        loss.backward()
-        
-        # Fix bias gradients if needed
-        for layer in [fc1, fc2]:
-            if layer.bias.grad is not None:
-                if hasattr(layer.bias.grad.data, 'data'):
-                    grad = layer.bias.grad.data.data
-                else:
-                    grad = layer.bias.grad.data
-                
-                if len(grad.shape) == 2:
-                    # Sum over batch dimension
-                    layer.bias.grad = Variable(Tensor(np.sum(grad, axis=0)))
-        
-        # Update
-        optimizer.step()
-        
-        # Log progress
-        if epoch % 100 == 0:
-            accuracy = evaluate_model(fc1, fc2, X, y)
-            print(f"Epoch {epoch:4d} | Loss: {loss_val:.4f} | Accuracy: {accuracy:.1%}")
-    
-    return losses
-
-
-def evaluate_model(fc1, fc2, X, y):
-    """Evaluate model accuracy."""
-    predictions = forward_pass(fc1, fc2, X, requires_grad=False)
-    pred_data = predictions.data._data
-    
-    predicted_classes = (pred_data > 0.5).astype(int)
-    correct = np.sum(predicted_classes == y)
-    return correct / y.shape[0]
-
-
-def main():
-    print("=" * 50)
-    print("🧠 XOR Network with TinyTorch")
-    print("=" * 50)
-    print()
-    
-    # Create dataset
-    X, y = create_dataset()
-    
-    # Build model
-    fc1, fc2 = create_model()
-    
-    # Train model
-    losses = train_network(fc1, fc2, X, y, epochs=500)
-    
-    # Final evaluation
-    print("\n" + "=" * 50)
-    print("📊 Final Results:")
-    print("-" * 40)
-    
-    predictions = forward_pass(fc1, fc2, X, requires_grad=False)
-    pred_data = predictions.data._data
-    
-    print("Input  | Target | Prediction | Correct")
-    print("-" * 40)
-    
-    for i in range(X.shape[0]):
-        x_input = X[i]
-        target = y[i, 0]
-        pred = pred_data[i, 0]
-        correct = "✅" if abs(pred - target) < 0.5 else "❌"
-        print(f"{x_input} |   {target}    |   {pred:.3f}    |  {correct}")
-    
-    accuracy = evaluate_model(fc1, fc2, X, y)
-    print("-" * 40)
-    print(f"Final Accuracy: {accuracy:.1%}")
-    
-    if accuracy == 1.0:
-        print("\n🎉 SUCCESS! XOR problem solved!")
-        print("Your TinyTorch framework can learn non-linear functions!")
-    
-    # Show learning progress
-    initial_loss = losses[0]
-    final_loss = losses[-1]
-    print(f"\nLearning Progress:")
-    print(f"Initial loss: {initial_loss:.4f}")
-    print(f"Final loss:   {final_loss:.4f}")
-    print(f"Improvement:  {initial_loss - final_loss:.4f}")
-    
-    return accuracy
-
-
-if __name__ == "__main__":
-    accuracy = main()
--- a/examples/xornet/working_xor_base.py
+++ b/examples/xornet/working_xor_base.py
@@ -1,113 +0,0 @@
-#!/usr/bin/env python3
-"""
-Simple XOR test using the exact pattern from the working autograd test
-"""
-
-import numpy as np
-from tinytorch.core.tensor import Tensor
-from tinytorch.core.layers import Dense
-from tinytorch.core.activations import ReLU
-from tinytorch.core.optimizers import SGD
-from tinytorch.core.training import MeanSquaredError
-from tinytorch.core.autograd import Variable
-
-def test_xor_simple():
-    """Test XOR using the exact working pattern from autograd tests"""
-    
-    # Simple model
-    fc1 = Dense(2, 4)  # 2 inputs -> 4 hidden
-    fc2 = Dense(4, 1)  # 4 hidden -> 1 output
-    
-    # Initialize with reasonable values (from working test)
-    for layer in [fc1, fc2]:
-        fan_in = layer.weights.shape[0]
-        std = np.sqrt(2.0 / fan_in)
-        layer.weights._data = np.random.randn(*layer.weights.shape).astype(np.float32) * std
-        layer.bias._data = np.zeros(layer.bias.shape, dtype=np.float32)
-        
-        layer.weights = Variable(layer.weights, requires_grad=True)
-        layer.bias = Variable(layer.bias, requires_grad=True)
-    
-    # Optimizer
-    params = [fc1.weights, fc1.bias, fc2.weights, fc2.bias]
-    optimizer = SGD(params, learning_rate=0.1)
-    
-    # XOR training data
-    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
-    y = np.array([[0], [1], [1], [0]], dtype=np.float32)
-    
-    print("Training XOR with working pattern...")
-    print("Initial test:")
-    
-    # Track losses
-    losses = []
-    
-    for i in range(100):
-        # Forward (exact pattern from working test)
-        x_var = Variable(Tensor(X), requires_grad=True)
-        h = fc1(x_var)
-        relu = ReLU()
-        h = relu(h)
-        out = fc2(h)
-        
-        # Loss
-        y_var = Variable(Tensor(y), requires_grad=False)
-        loss_fn = MeanSquaredError()
-        loss = loss_fn(out, y_var)
-        
-        if hasattr(loss.data, 'data'):
-            loss_val = float(loss.data.data)
-        else:
-            loss_val = float(loss.data._data)
-        losses.append(loss_val)
-        
-        # Backward
-        optimizer.zero_grad()
-        loss.backward()
-        
-        # Fix bias gradients if needed (from working test)
-        for layer in [fc1, fc2]:
-            if layer.bias.grad is not None:
-                if hasattr(layer.bias.grad.data, 'data'):
-                    grad = layer.bias.grad.data.data
-                else:
-                    grad = layer.bias.grad.data
-                
-                if len(grad.shape) == 2:
-                    # Sum over batch dimension
-                    layer.bias.grad = Variable(Tensor(np.sum(grad, axis=0)))
-        
-        # Update
-        optimizer.step()
-        
-        if i % 20 == 0:
-            print(f"  Iteration {i:2d}: Loss = {loss_val:.4f}")
-    
-    # Final test
-    x_var = Variable(Tensor(X), requires_grad=False)
-    h = fc1(x_var)
-    h = relu(h)
-    predictions = fc2(h)
-    
-    print("\nFinal results:")
-    pred_data = predictions.data._data
-    for i in range(4):
-        prediction = pred_data[i, 0]
-        target = y[i, 0]
-        correct = "✅" if abs(prediction - target) < 0.5 else "❌"
-        print(f"  {X[i]} -> {prediction:.3f} (want {target}) {correct}")
-    
-    # Check if loss decreased
-    initial_loss = losses[0]
-    final_loss = losses[-1]
-    
-    print(f"\nLoss change: {initial_loss:.4f} -> {final_loss:.4f}")
-    if final_loss < initial_loss * 0.9:
-        print("✅ Learning happened!")
-        return True
-    else:
-        print("❌ No learning detected")
-        return False
-
-if __name__ == "__main__":
-    success = test_xor_simple()
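
As a quick sanity check on the layer sizes quoted in this diff, the sketch below counts weights and biases for a stack of fully connected layers. It is plain Python with no TinyTorch dependency. `count_dense_params` is a hypothetical helper name, not part of the TinyTorch API, and it assumes every layer is a dense layer with a `fan_in × fan_out` weight matrix and one bias per output unit.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases for a stack of fully connected layers.

    Hypothetical helper for illustration only (not a TinyTorch function).
    Assumes each layer has a (fan_in x fan_out) weight matrix plus one
    bias per output unit.
    """
    total = 0
    # Pair each layer width with the next one: (2, 8), (8, 1), ...
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out
    return total


if __name__ == "__main__":
    # XOR network from the xornet README: 2 -> 8 -> 1
    print(count_dense_params([2, 8, 1]))               # 33
    # Removed XOR scripts: 2 -> 4 -> 1
    print(count_dense_params([2, 4, 1]))               # 17
    # Removed CIFAR-10 MLP: 3072 -> 512 -> 256 -> 10
    print(count_dense_params([3072, 512, 256, 10]))    # 1707274
```

Under those assumptions, the 2 → 8 → 1 network described in the xornet README comes to 33 parameters, and the removed 3072 → 512 → 256 → 10 MLP to roughly 1.7 million.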