Document north star CIFAR-10 training capabilities

- Add comprehensive README section showcasing 75% accuracy goal
- Update dataloader module README with CIFAR-10 support details
- Update training module README with checkpointing features
- Create complete CIFAR-10 training guide for students
- Document all north star implementations in CLAUDE.md

Students can now train real CNNs on CIFAR-10 using 100% TinyTorch code.
This commit is contained in:
Vijay Janapa Reddi
2025-09-17 00:43:19 -04:00
parent 7f94cc4809
commit 2c0f5ba8c9
5 changed files with 483 additions and 1 deletions

View File

@@ -0,0 +1,282 @@
# 🎯 CIFAR-10 Training Guide: Achieving 75% Accuracy
## Overview
This guide walks you through training a CNN on CIFAR-10 using your TinyTorch implementation to achieve our north star goal of 75% accuracy.
## Prerequisites
Complete these modules first:
- ✅ Module 08: DataLoader (for CIFAR-10 loading)
- ✅ Module 11: Training (for model checkpointing)
- ✅ Module 06: Spatial (for CNN layers)
- ✅ Module 10: Optimizers (for Adam optimizer)
## Step 1: Load CIFAR-10 Data
```python
from tinytorch.core.dataloader import CIFAR10Dataset, DataLoader
# Download CIFAR-10 (one-time, ~170MB)
dataset = CIFAR10Dataset(download=True, flatten=False)
print(f"✅ Training samples: {len(dataset.train_data)}")
print(f"✅ Test samples: {len(dataset.test_data)}")
# Create data loaders
train_loader = DataLoader(
dataset.train_data,
dataset.train_labels,
batch_size=32,
shuffle=True
)
test_loader = DataLoader(
dataset.test_data,
dataset.test_labels,
batch_size=32,
shuffle=False
)
```
## Step 2: Build Your CNN Architecture
### Option A: Simple CNN (Good for initial testing)
```python
from tinytorch.core.networks import Sequential
from tinytorch.core.layers import Dense
from tinytorch.core.spatial import Conv2D, MaxPool2D, Flatten
from tinytorch.core.activations import ReLU
model = Sequential([
# First conv block
Conv2D(3, 32, kernel_size=3, padding=1),
ReLU(),
MaxPool2D(2),
# Second conv block
Conv2D(32, 64, kernel_size=3, padding=1),
ReLU(),
MaxPool2D(2),
# Flatten and classify
Flatten(),
Dense(64 * 8 * 8, 128),
ReLU(),
Dense(128, 10)
])
```
### Option B: Deeper CNN (Better accuracy)
```python
model = Sequential([
# Block 1
Conv2D(3, 64, kernel_size=3, padding=1),
ReLU(),
Conv2D(64, 64, kernel_size=3, padding=1),
ReLU(),
MaxPool2D(2),
# Block 2
Conv2D(64, 128, kernel_size=3, padding=1),
ReLU(),
Conv2D(128, 128, kernel_size=3, padding=1),
ReLU(),
MaxPool2D(2),
# Classifier
Flatten(),
Dense(128 * 8 * 8, 256),
ReLU(),
Dense(256, 128),
ReLU(),
Dense(128, 10)
])
```
## Step 3: Configure Training
```python
from tinytorch.core.training import Trainer, CrossEntropyLoss, Accuracy
from tinytorch.core.optimizers import Adam
# Setup training components
loss_fn = CrossEntropyLoss()
optimizer = Adam(lr=0.001)
metrics = [Accuracy()]
# Create trainer
trainer = Trainer(model, loss_fn, optimizer, metrics)
```
## Step 4: Train with Checkpointing
```python
# Train with automatic model saving
history = trainer.fit(
train_loader,
val_dataloader=test_loader,
epochs=30,
save_best=True, # Save best model
checkpoint_path='best_cifar10.pkl', # Where to save
early_stopping_patience=5, # Stop if no improvement
verbose=True # Show progress
)
print(f"🎉 Best validation accuracy: {max(history['val_accuracy']):.2%}")
```
## Step 5: Evaluate Performance
```python
from tinytorch.core.training import evaluate_model, plot_training_history
# Load best model
trainer.load_checkpoint('best_cifar10.pkl')
# Comprehensive evaluation
results = evaluate_model(model, test_loader)
print(f"\n📊 Test Results:")
print(f"Accuracy: {results['accuracy']:.2%}")
print(f"Per-class accuracy:")
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
'dog', 'frog', 'horse', 'ship', 'truck']
for i, class_name in enumerate(classes):
class_acc = results['per_class_accuracy'][i]
print(f" {class_name}: {class_acc:.2%}")
# Visualize training curves
plot_training_history(history)
```
## Step 6: Analyze Confusion Matrix
```python
from tinytorch.core.training import compute_confusion_matrix
import numpy as np
# Get predictions for entire test set
all_preds = []
all_labels = []
for batch_x, batch_y in test_loader:
preds = model(batch_x).data.argmax(axis=1)
all_preds.extend(preds)
all_labels.extend(batch_y.data)
# Compute confusion matrix
cm = compute_confusion_matrix(np.array(all_preds), np.array(all_labels))
# Analyze common mistakes
print("\n🔍 Common Confusions:")
for i in range(10):
for j in range(10):
if i != j and cm[i, j] > 100: # More than 100 mistakes
print(f"{classes[i]} confused as {classes[j]}: {cm[i, j]} times")
```
## Training Tips for 75%+ Accuracy
### 1. Data Preprocessing
```python
# Normalize data for better convergence
from tinytorch.core.dataloader import Normalizer
normalizer = Normalizer()
normalizer.fit(dataset.train_data)
train_data_normalized = normalizer.transform(dataset.train_data)
test_data_normalized = normalizer.transform(dataset.test_data)
```
### 2. Learning Rate Scheduling
```python
# Reduce learning rate when stuck
for epoch in range(epochs):
if epoch == 20:
optimizer.lr *= 0.1 # Reduce by 10x
trainer.train_epoch(train_loader)
```
### 3. Data Augmentation (Simple)
```python
# Random horizontal flips for training
def augment_batch(batch_x, batch_y):
# Randomly flip half the images horizontally
flip_mask = np.random.random(len(batch_x)) > 0.5
batch_x[flip_mask] = batch_x[flip_mask][:, :, :, ::-1]
return batch_x, batch_y
```
### 4. Monitor Training Progress
```python
# Check if model is learning
if epoch % 5 == 0:
train_acc = evaluate_model(model, train_loader)['accuracy']
test_acc = evaluate_model(model, test_loader)['accuracy']
gap = train_acc - test_acc
if gap > 0.15:
print("⚠️ Overfitting detected! Consider:")
print(" - Adding dropout layers")
print(" - Reducing model complexity")
print(" - Increasing batch size")
elif train_acc < 0.6:
print("⚠️ Underfitting! Consider:")
print(" - Increasing model capacity")
print(" - Checking learning rate")
print(" - Training longer")
```
## Expected Results Timeline
- **After 5 epochs**: ~40-50% accuracy (model learning basic patterns)
- **After 10 epochs**: ~55-65% accuracy (recognizing shapes)
- **After 20 epochs**: ~70-75% accuracy (good feature extraction)
- **After 30 epochs**: ~75-80% accuracy (north star achieved! 🎉)
## Troubleshooting Common Issues
### Issue: Accuracy stuck at ~10%
**Solution**: Check loss is decreasing. If not, reduce learning rate.
### Issue: Loss is NaN
**Solution**: Learning rate too high. Start with 0.0001 instead.
### Issue: Accuracy oscillating wildly
**Solution**: Batch size too small. Try 64 or 128.
### Issue: Training very slow
**Solution**: Ensure you're using vectorized operations, not loops.
### Issue: Memory errors
**Solution**: Reduce batch size or model size.
## Celebrating Success! 🎉
Once you achieve 75% accuracy:
1. **Save your model**: This is a real achievement!
```python
trainer.save_checkpoint('my_75_percent_model.pkl')
```
2. **Document your architecture**: What worked?
```python
print(model.summary()) # Your architecture
print(f"Parameters: {model.count_parameters()}")
print(f"Best epoch: {np.argmax(history['val_accuracy'])}")
```
3. **Share your results**: You built this from scratch!
```python
print(f"🏆 CIFAR-10 Test Accuracy: {results['accuracy']:.2%}")
print("✅ North Star Goal Achieved!")
print("🎯 Built entirely with TinyTorch - no PyTorch/TensorFlow!")
```
## Next Challenges
After achieving 75%:
- 🚀 Push for 80%+ with better architectures
- 🎨 Implement data augmentation for 85%+
- ⚡ Optimize training speed with better kernels
- 🔬 Analyze what your CNN learned with visualizations
- 🏆 Try other datasets (Fashion-MNIST, etc.)
Remember: You built every component from scratch - from tensors to convolutions to optimizers. This 75% accuracy represents deep understanding of ML systems, not just API usage!