Files
TinyTorch/docs/cifar10-training-guide.md
Vijay Janapa Reddi 9ab3b7a5b6 Document north star CIFAR-10 training capabilities
- Add comprehensive README section showcasing 75% accuracy goal
- Update dataloader module README with CIFAR-10 support details
- Update training module README with checkpointing features
- Create complete CIFAR-10 training guide for students
- Document all north star implementations in CLAUDE.md

Students can now train real CNNs on CIFAR-10 using 100% TinyTorch code.
2025-09-17 00:43:19 -04:00

7.6 KiB

🎯 CIFAR-10 Training Guide: Achieving 75% Accuracy

Overview

This guide walks you through training a CNN on CIFAR-10 using your TinyTorch implementation to achieve our north star goal of 75% accuracy.

Prerequisites

Complete these modules first:

  • Module 08: DataLoader (for CIFAR-10 loading)
  • Module 11: Training (for model checkpointing)
  • Module 06: Spatial (for CNN layers)
  • Module 10: Optimizers (for Adam optimizer)

Step 1: Load CIFAR-10 Data

from tinytorch.core.dataloader import CIFAR10Dataset, DataLoader

# Download CIFAR-10 (one-time, ~170MB)
dataset = CIFAR10Dataset(download=True, flatten=False)
print(f"✅ Training samples: {len(dataset.train_data)}")
print(f"✅ Test samples: {len(dataset.test_data)}")

# Create data loaders
train_loader = DataLoader(
    dataset.train_data, 
    dataset.train_labels, 
    batch_size=32, 
    shuffle=True
)

test_loader = DataLoader(
    dataset.test_data,
    dataset.test_labels,
    batch_size=32,
    shuffle=False
)

Step 2: Build Your CNN Architecture

Option A: Simple CNN (Good for initial testing)

from tinytorch.core.networks import Sequential
from tinytorch.core.layers import Dense
from tinytorch.core.spatial import Conv2D, MaxPool2D, Flatten
from tinytorch.core.activations import ReLU

model = Sequential([
    # First conv block
    Conv2D(3, 32, kernel_size=3, padding=1),
    ReLU(),
    MaxPool2D(2),
    
    # Second conv block  
    Conv2D(32, 64, kernel_size=3, padding=1),
    ReLU(),
    MaxPool2D(2),
    
    # Flatten and classify
    Flatten(),
    Dense(64 * 8 * 8, 128),
    ReLU(),
    Dense(128, 10)
])

Option B: Deeper CNN (Better accuracy)

model = Sequential([
    # Block 1
    Conv2D(3, 64, kernel_size=3, padding=1),
    ReLU(),
    Conv2D(64, 64, kernel_size=3, padding=1),
    ReLU(),
    MaxPool2D(2),
    
    # Block 2
    Conv2D(64, 128, kernel_size=3, padding=1),
    ReLU(),
    Conv2D(128, 128, kernel_size=3, padding=1),
    ReLU(),
    MaxPool2D(2),
    
    # Classifier
    Flatten(),
    Dense(128 * 8 * 8, 256),
    ReLU(),
    Dense(256, 128),
    ReLU(),
    Dense(128, 10)
])

Step 3: Configure Training

from tinytorch.core.training import Trainer, CrossEntropyLoss, Accuracy
from tinytorch.core.optimizers import Adam

# Setup training components
loss_fn = CrossEntropyLoss()
optimizer = Adam(lr=0.001)
metrics = [Accuracy()]

# Create trainer
trainer = Trainer(model, loss_fn, optimizer, metrics)

Step 4: Train with Checkpointing

# Train with automatic model saving
history = trainer.fit(
    train_loader,
    val_dataloader=test_loader,
    epochs=30,
    save_best=True,                    # Save best model
    checkpoint_path='best_cifar10.pkl', # Where to save
    early_stopping_patience=5,          # Stop if no improvement
    verbose=True                        # Show progress
)

print(f"🎉 Best validation accuracy: {max(history['val_accuracy']):.2%}")

Step 5: Evaluate Performance

from tinytorch.core.training import evaluate_model, plot_training_history

# Load best model
trainer.load_checkpoint('best_cifar10.pkl')

# Comprehensive evaluation
results = evaluate_model(model, test_loader)
print(f"\n📊 Test Results:")
print(f"Accuracy: {results['accuracy']:.2%}")
print(f"Per-class accuracy:")
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer', 
           'dog', 'frog', 'horse', 'ship', 'truck']
for i, class_name in enumerate(classes):
    class_acc = results['per_class_accuracy'][i]
    print(f"  {class_name}: {class_acc:.2%}")

# Visualize training curves
plot_training_history(history)

Step 6: Analyze Confusion Matrix

from tinytorch.core.training import compute_confusion_matrix
import numpy as np

# Get predictions for entire test set
all_preds = []
all_labels = []
for batch_x, batch_y in test_loader:
    preds = model(batch_x).data.argmax(axis=1)
    all_preds.extend(preds)
    all_labels.extend(batch_y.data)

# Compute confusion matrix
cm = compute_confusion_matrix(np.array(all_preds), np.array(all_labels))

# Analyze common mistakes
print("\n🔍 Common Confusions:")
for i in range(10):
    for j in range(10):
        if i != j and cm[i, j] > 100:  # More than 100 mistakes
            print(f"{classes[i]} confused as {classes[j]}: {cm[i, j]} times")

Training Tips for 75%+ Accuracy

1. Data Preprocessing

# Normalize data for better convergence
from tinytorch.core.dataloader import Normalizer

normalizer = Normalizer()
normalizer.fit(dataset.train_data)
train_data_normalized = normalizer.transform(dataset.train_data)
test_data_normalized = normalizer.transform(dataset.test_data)

2. Learning Rate Scheduling

# Reduce learning rate when stuck
for epoch in range(epochs):
    if epoch == 20:
        optimizer.lr *= 0.1  # Reduce by 10x
    trainer.train_epoch(train_loader)

3. Data Augmentation (Simple)

# Random horizontal flips for training
def augment_batch(batch_x, batch_y):
    # Randomly flip half the images horizontally
    flip_mask = np.random.random(len(batch_x)) > 0.5
    batch_x[flip_mask] = batch_x[flip_mask][:, :, :, ::-1]
    return batch_x, batch_y

4. Monitor Training Progress

# Check if model is learning
if epoch % 5 == 0:
    train_acc = evaluate_model(model, train_loader)['accuracy']
    test_acc = evaluate_model(model, test_loader)['accuracy']
    gap = train_acc - test_acc
    
    if gap > 0.15:
        print("⚠️ Overfitting detected! Consider:")
        print("  - Adding dropout layers")
        print("  - Reducing model complexity")
        print("  - Increasing batch size")
    elif train_acc < 0.6:
        print("⚠️ Underfitting! Consider:")
        print("  - Increasing model capacity")
        print("  - Checking learning rate")
        print("  - Training longer")

Expected Results Timeline

  • After 5 epochs: ~40-50% accuracy (model learning basic patterns)
  • After 10 epochs: ~55-65% accuracy (recognizing shapes)
  • After 20 epochs: ~70-75% accuracy (good feature extraction)
  • After 30 epochs: ~75-80% accuracy (north star achieved! 🎉)

Troubleshooting Common Issues

Issue: Accuracy stuck at ~10%

Solution: Check loss is decreasing. If not, reduce learning rate.

Issue: Loss is NaN

Solution: Learning rate too high. Start with 0.0001 instead.

Issue: Accuracy oscillating wildly

Solution: Batch size too small. Try 64 or 128.

Issue: Training very slow

Solution: Ensure you're using vectorized operations, not loops.

Issue: Memory errors

Solution: Reduce batch size or model size.

Celebrating Success! 🎉

Once you achieve 75% accuracy:

  1. Save your model: This is a real achievement!
trainer.save_checkpoint('my_75_percent_model.pkl')
  1. Document your architecture: What worked?
print(model.summary())  # Your architecture
print(f"Parameters: {model.count_parameters()}")
print(f"Best epoch: {np.argmax(history['val_accuracy'])}")
  1. Share your results: You built this from scratch!
print(f"🏆 CIFAR-10 Test Accuracy: {results['accuracy']:.2%}")
print("✅ North Star Goal Achieved!")
print("🎯 Built entirely with TinyTorch - no PyTorch/TensorFlow!")

Next Challenges

After achieving 75%:

  • 🚀 Push for 80%+ with better architectures
  • 🎨 Implement data augmentation for 85%+
  • Optimize training speed with better kernels
  • 🔬 Analyze what your CNN learned with visualizations
  • 🏆 Try other datasets (Fashion-MNIST, etc.)

Remember: You built every component from scratch - from tensors to convolutions to optimizers. This 75% accuracy represents deep understanding of ML systems, not just API usage!