🏋️ Module 9: Training - Complete Neural Network Training Pipeline
📊 Module Info
- Difficulty: ⭐⭐⭐⭐⭐ Expert
- Time Estimate: 8-10 hours
- Prerequisites: Tensor, Activations, Layers, Networks, DataLoader, Autograd, Optimizers modules
- Next Steps: Compression, Kernels, Benchmarking, MLOps modules
Build the complete training pipeline that brings all TinyTorch components together
🎯 Learning Objectives
After completing this module, you will:
- Understand loss functions and how they guide neural network training
- Implement essential loss functions: MSE, CrossEntropy, and BinaryCrossEntropy
- Build evaluation metrics for classification and regression tasks
- Create a complete training loop that orchestrates the entire training process
- Master training workflows with validation, logging, and progress tracking
🧠 Build → Use → Optimize
This module follows the TinyTorch pedagogical framework:
- Build: Loss functions, metrics, and training orchestration components
- Use: Train complete neural networks on real datasets
- Optimize: Analyze training dynamics and improve performance
📚 What You'll Build
Loss Functions
```python
# Regression loss
mse = MeanSquaredError()
loss = mse(predictions, targets)

# Multi-class classification loss
ce = CrossEntropyLoss()
loss = ce(logits, class_indices)

# Binary classification loss
bce = BinaryCrossEntropyLoss()
loss = bce(logits, binary_labels)
```
Evaluation Metrics
```python
# Classification accuracy
accuracy = Accuracy()
acc = accuracy(predictions, true_labels)  # Returns 0.0 to 1.0

# Regression metrics
mae = MeanAbsoluteError()
error = mae(predictions, targets)
```
Complete Training Pipeline
```python
# Set up training components
model = Sequential([
    Dense(784, 128), ReLU(),
    Dense(128, 64), ReLU(),
    Dense(64, 10), Softmax()
])
optimizer = Adam(model.parameters, learning_rate=0.001)
loss_fn = CrossEntropyLoss()
metrics = [Accuracy()]

# Create trainer
trainer = Trainer(model, optimizer, loss_fn, metrics)

# Train the model
history = trainer.fit(
    train_dataloader,
    val_dataloader,
    epochs=10,
    verbose=True
)
```
Training with Real Data
```python
# Load dataset utilities
from tinytorch.core.dataloader import SimpleDataset, DataLoader

# Create dataset
train_dataset = SimpleDataset(size=1000, num_features=784, num_classes=10)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Train on real data
history = trainer.fit(train_loader, epochs=50)

# Analyze training
print(f"Final training loss: {history['train_loss'][-1]:.4f}")
print(f"Final training accuracy: {history['train_accuracy'][-1]:.4f}")
```
🚀 Getting Started
Prerequisites
- Complete Modules 1-8: Setup through Optimizers ✅
- Understand backpropagation and gradient descent
- Familiar with classification and regression tasks
Quick Start
```bash
# Navigate to the training module
cd modules/source/09_training

# Open the development notebook
jupyter lab training_dev.py

# Or use the TinyTorch CLI
tito module info training
tito module test training
```
📖 Core Concepts
Loss Functions: The Training Signal
Loss functions measure how far our predictions are from the true values:
- MSE: For regression tasks, penalizes large errors heavily
- CrossEntropy: For classification, works with softmax outputs
- BinaryCrossEntropy: For binary classification, works with sigmoid outputs
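To make these concrete, here is the arithmetic for MSE and binary cross-entropy on a few hand-picked values, written in plain NumPy rather than the TinyTorch classes above:

```python
import numpy as np

# Regression: MSE = mean((pred - target)^2)
predictions = np.array([2.5, 0.0, 2.0])
targets     = np.array([3.0, -0.5, 2.0])
mse = np.mean((predictions - targets) ** 2)   # (0.25 + 0.25 + 0.0) / 3 ≈ 0.1667

# Binary classification: BCE = -mean(y*log(p) + (1-y)*log(1-p))
probs  = np.array([0.9, 0.2, 0.7])            # sigmoid outputs
labels = np.array([1.0, 0.0, 1.0])
bce = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

print(f"MSE: {mse:.4f}, BCE: {bce:.4f}")
```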
Metrics: Human-Interpretable Performance
Metrics provide understandable measures of model performance:
- Accuracy: Fraction of correct predictions
- Precision: Of positive predictions, how many were correct?
- Recall: Of actual positives, how many were found?
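These metrics reduce to simple counting. A minimal NumPy sketch, independent of the module's metric classes:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])

accuracy = np.mean(y_pred == y_true)          # 4 of 6 predictions correct

tp = np.sum((y_pred == 1) & (y_true == 1))    # true positives
precision = tp / np.sum(y_pred == 1)          # 3 of 4 positive predictions correct
recall    = tp / np.sum(y_true == 1)          # 3 of 4 actual positives found

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```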
Training Loop: Orchestrating Learning
The training loop coordinates all components, in order (sketched in code below):
1. Forward Pass: Model makes predictions
2. Loss Computation: Measure prediction quality
3. Backward Pass: Compute gradients
4. Parameter Update: Improve model weights
5. Validation: Monitor generalization performance
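A minimal sketch of how these steps compose into one epoch. The method names (`zero_grad`, `backward`, `step`) mirror common autograd conventions and are assumptions here, not the verified TinyTorch interface:

```python
def train_epoch(model, dataloader, loss_fn, optimizer):
    """One pass over the training data: the five steps above, in order."""
    epoch_loss = 0.0
    for x_batch, y_batch in dataloader:
        predictions = model(x_batch)          # 1. forward pass
        loss = loss_fn(predictions, y_batch)  # 2. loss computation
        optimizer.zero_grad()                 # clear stale gradients
        loss.backward()                       # 3. backward pass
        optimizer.step()                      # 4. parameter update
        epoch_loss += float(loss)
    return epoch_loss / len(dataloader)       # 5. report, then validate separately
```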
Training Dynamics
Understanding how training behaves:
- Overfitting: Model memorizes training data
- Underfitting: Model too simple to learn patterns
- Convergence: Loss stops decreasing
- Validation: Monitoring generalization
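One practical diagnostic is to compare the train and validation losses in the returned history. A rough, hypothetical heuristic (the gap threshold is arbitrary and task-dependent):

```python
def diagnose(history, gap_threshold=0.1):
    """Rough heuristic: a large train/val gap suggests overfitting."""
    train_loss = history['train_loss'][-1]
    val_loss = history['val_loss'][-1]
    if val_loss - train_loss > gap_threshold:
        return "overfitting: validation loss lags training loss"
    if train_loss > 0.9 * history['train_loss'][0]:
        return "underfitting: training loss barely moved"
    return "healthy: both losses decreasing together"
```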
🔬 Advanced Features
Training Monitoring
```python
# Track training progress
history = trainer.fit(train_loader, val_loader, epochs=100)

# Plot training curves
import matplotlib.pyplot as plt
plt.plot(history['train_loss'], label='Training Loss')
plt.plot(history['val_loss'], label='Validation Loss')
plt.legend()
plt.show()
```
Custom Metrics
```python
import numpy as np

# Create custom metrics by implementing __call__
class F1Score:
    def __call__(self, y_pred, y_true):
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom > 0 else 0.0

# Use in training
trainer = Trainer(model, optimizer, loss_fn, metrics=[Accuracy(), F1Score()])
```
Training Strategies
```python
# Learning rate scheduling
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

# Early stopping: halt when validation loss stops improving
class EarlyStopping:
    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float('inf')
        self.counter = 0

    def __call__(self, val_loss):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience
```
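Put together, a training loop can consult both after every epoch. This usage sketch assumes `StepLR` exposes a `step()` method, as schedulers conventionally do, and reuses the `EarlyStopping` class above (`train_epoch` is the sketch from earlier; `evaluate` is a hypothetical validation helper):

```python
early_stop = EarlyStopping(patience=5)
for epoch in range(100):
    train_loss = train_epoch(model, train_loader, loss_fn, optimizer)
    val_loss = evaluate(model, val_loader, loss_fn)  # hypothetical helper
    scheduler.step()                                 # decay learning rate on schedule
    if early_stop(val_loss):                         # patience exhausted: stop training
        print(f"Stopping early at epoch {epoch}")
        break
```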
🛠️ Real-World Applications
Computer Vision
```python
# Image classification pipeline
model = Sequential([
    Conv2D((3, 3)), ReLU(),
    flatten,
    Dense(128), ReLU(),
    Dense(10), Softmax()
])
trainer = Trainer(model, Adam(model.parameters), CrossEntropyLoss(), [Accuracy()])
history = trainer.fit(cifar10_loader, epochs=50)
```
Natural Language Processing
```python
# Text classification
model = Sequential([
    Dense(vocab_size, 128), ReLU(),
    Dense(128, 64), ReLU(),
    Dense(64, num_classes), Softmax()
])
trainer = Trainer(model, SGD(model.parameters), CrossEntropyLoss(), [Accuracy()])
history = trainer.fit(text_loader, epochs=20)
```
Regression Tasks
```python
# House price prediction
model = Sequential([
    Dense(features, 64), ReLU(),
    Dense(64, 32), ReLU(),
    Dense(32, 1)  # Single output for regression
])
trainer = Trainer(model, Adam(model.parameters), MeanSquaredError(), [])
history = trainer.fit(housing_loader, epochs=100)
```
📈 Performance Optimization
Batch Size Selection
- Small batches: More updates, noisier gradients
- Large batches: Fewer updates, smoother gradients
- Sweet spot: Usually 32-256 depending on dataset
Learning Rate Tuning
- Too high: Training diverges or oscillates
- Too low: Training is slow or gets stuck
- Adaptive methods: Adam often works well out of the box
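A coarse sweep over powers of ten is often enough to find a workable starting point. The sketch below reuses the Trainer API shown earlier; `make_model` is a hypothetical factory that returns freshly initialized weights for each run:

```python
results = {}
for lr in [1e-1, 1e-2, 1e-3, 1e-4]:
    model = make_model()  # hypothetical: fresh weights per run
    trainer = Trainer(model, Adam(model.parameters, learning_rate=lr),
                      CrossEntropyLoss(), [Accuracy()])
    history = trainer.fit(train_loader, val_loader, epochs=5)
    results[lr] = history['val_loss'][-1]  # compare short-run validation loss

best_lr = min(results, key=results.get)
print(f"Best learning rate from sweep: {best_lr}")
```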
Regularization
- Dropout: Randomly disable neurons during training
- Weight decay: L2 regularization on parameters
- Early stopping: Stop when validation performance plateaus
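Of the three, weight decay is the simplest to implement by hand: add an L2 penalty on the weights to the data loss before the backward pass. A NumPy sketch, assuming `parameters` is a list of weight arrays:

```python
import numpy as np

def l2_penalty(parameters, weight_decay=1e-4):
    """Sum of squared weights, scaled by the decay coefficient."""
    return weight_decay * sum(np.sum(w ** 2) for w in parameters)

# total_loss = data_loss + l2_penalty(model.parameters)  # added before backward()
```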
🎯 Module Completion
What You've Built
✅ Complete loss function library: MSE, CrossEntropy, BinaryCrossEntropy
✅ Evaluation metrics: Accuracy and extensible metric framework
✅ Training orchestration: Full-featured Trainer class
✅ Real-world pipeline: Train models on actual datasets
✅ Monitoring tools: Track training progress and performance
Skills Developed
✅ Training loop design: Coordinate all training components
✅ Loss function implementation: Measure prediction quality
✅ Metric computation: Evaluate model performance
✅ Training dynamics: Understand convergence and optimization
✅ Production workflows: Build scalable training pipelines
Next Steps
- Export your training module: `tito export training`
- Train a complete model: Use all TinyTorch components together
- Explore advanced topics: Regularization, scheduling, ensembles
- Build production pipelines: Scale training to larger datasets
Ready for the final stretch? Your training module completes the core TinyTorch framework. Next up: compression, kernels, and MLOps! 🚀