# 🏋️ Module 9: Training - Complete Neural Network Training Pipeline

## 📊 Module Info
- **Difficulty**: ⭐⭐⭐⭐⭐ Expert
- **Time Estimate**: 8-10 hours
- **Prerequisites**: Tensor, Activations, Layers, Networks, DataLoader, Autograd, Optimizers modules
- **Next Steps**: Compression, Kernels, Benchmarking, MLOps modules

**Build the complete training pipeline that brings all TinyTorch components together**

## 🎯 Learning Objectives

After completing this module, you will:
- Understand loss functions and how they guide neural network training
- Implement essential loss functions: MSE, CrossEntropy, and BinaryCrossEntropy
- Build evaluation metrics for classification and regression tasks
- Create a complete training loop that orchestrates the entire training process
- Master training workflows with validation, logging, and progress tracking

## 🧠 Build → Use → Optimize

This module follows the TinyTorch pedagogical framework:

1. **Build**: Loss functions, metrics, and training orchestration components
2. **Use**: Train complete neural networks on real datasets
3. **Optimize**: Analyze training dynamics and improve performance

## 📚 What You'll Build

### **Loss Functions**
```python
# Regression loss
mse = MeanSquaredError()
loss = mse(predictions, targets)

# Multi-class classification loss
ce = CrossEntropyLoss()
loss = ce(logits, class_indices)

# Binary classification loss
bce = BinaryCrossEntropyLoss()
loss = bce(logits, binary_labels)
```

### **Evaluation Metrics**
```python
# Classification accuracy
accuracy = Accuracy()
acc = accuracy(predictions, true_labels)  # Returns 0.0 to 1.0

# Regression metrics
mae = MeanAbsoluteError()
error = mae(predictions, targets)
```

### **Complete Training Pipeline**
```python
# Set up training components
model = Sequential([
    Dense(784, 128), ReLU(),
    Dense(128, 64), ReLU(),
    Dense(64, 10), Softmax()
])

optimizer = Adam(model.parameters, learning_rate=0.001)
loss_fn = CrossEntropyLoss()
metrics = [Accuracy()]

# Create trainer
trainer = Trainer(model, optimizer, loss_fn, metrics)

# Train the model
history = trainer.fit(
    train_dataloader,
    val_dataloader,
    epochs=10,
    verbose=True
)
```

### **Training with Real Data**
```python
# Import the dataset utilities
from tinytorch.core.dataloader import SimpleDataset, DataLoader

# Create the dataset and a shuffling batch loader
train_dataset = SimpleDataset(size=1000, num_features=784, num_classes=10)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Train on the loaded data (reuses the trainer from the previous example)
history = trainer.fit(train_loader, epochs=50)

# Analyze training
print(f"Final training loss: {history['train_loss'][-1]:.4f}")
print(f"Final training accuracy: {history['train_accuracy'][-1]:.4f}")
```

## 🚀 Getting Started

### Prerequisites
- Complete Modules 1-8: Setup through Optimizers ✅
- Understand backpropagation and gradient descent
- Be familiar with classification and regression tasks

### Quick Start
```bash
# Navigate to the training module
cd modules/source/09_training

# Open the development notebook
jupyter lab training_dev.py

# Or use the TinyTorch CLI
tito module info training
tito module test training
```

## 📖 Core Concepts

### **Loss Functions: The Training Signal**
Loss functions measure how far the model's predictions are from the true values:

- **MSE**: For regression tasks, penalizes large errors heavily because each error is squared
- **CrossEntropy**: For multi-class classification, works with softmax outputs
- **BinaryCrossEntropy**: For binary classification, works with sigmoid outputs

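The three losses above can be sketched in plain NumPy. This is a minimal illustration, not TinyTorch's implementation: the function names are hypothetical, and the sketch assumes the classification losses receive raw scores (logits) and apply softmax or sigmoid internally, which may differ from how the module's classes are written.

```python
import numpy as np

def mse_loss(predictions, targets):
    """Mean squared error: the average of the squared differences."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return float(np.mean((predictions - targets) ** 2))

def cross_entropy_loss(logits, class_indices):
    """Softmax cross-entropy for logits of shape (batch, classes)."""
    logits = np.asarray(logits, dtype=float)
    class_indices = np.asarray(class_indices)
    # Subtract the row maximum so exp() cannot overflow
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability assigned to the true class of each example
    picked = log_probs[np.arange(len(class_indices)), class_indices]
    return float(-picked.mean())

def binary_cross_entropy_loss(logits, binary_labels):
    """Sigmoid cross-entropy computed directly from logits."""
    z = np.asarray(logits, dtype=float)
    y = np.asarray(binary_labels, dtype=float)
    # Stable form of -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(z)
    return float(np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))))

print(mse_loss([1.0, 2.0], [1.0, 4.0]))
print(cross_entropy_loss([[2.0, 0.5, 0.1]], [0]))
print(binary_cross_entropy_loss([1.5, -0.5], [1, 0]))
```

Working from logits keeps the computation numerically stable; if a model already ends in `Softmax()`, a loss of this form would apply softmax a second time, so it is worth checking which convention the loss class expects.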
### **Metrics: Human-Interpretable Performance**
Metrics provide understandable measures of model performance:

- **Accuracy**: Fraction of correct predictions
- **Precision**: Of positive predictions, how many were correct?
- **Recall**: Of actual positives, how many were found?

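As a rough sketch of these three definitions, the following NumPy functions compute them for hard label predictions. The function names and the `positive` parameter are hypothetical and are not part of the TinyTorch API.

```python
import numpy as np

def accuracy(pred, true):
    """Fraction of predictions that match the true labels."""
    pred, true = np.asarray(pred), np.asarray(true)
    return float(np.mean(pred == true))

def precision_recall(pred, true, positive=1):
    """Precision and recall for the class treated as positive."""
    pred, true = np.asarray(pred), np.asarray(true)
    tp = int(np.sum((pred == positive) & (true == positive)))  # true positives
    fp = int(np.sum((pred == positive) & (true != positive)))  # false positives
    fn = int(np.sum((pred != positive) & (true == positive)))  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
print(accuracy(y_pred, y_true))
print(precision_recall(y_pred, y_true))
```

In this toy example four of six predictions are correct, and there are two true positives, one false positive, and one false negative, so precision and recall both come out to 2/3.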
### **Training Loop: Orchestrating Learning**
The training loop coordinates all components:

1. **Forward Pass**: Model makes predictions
2. **Loss Computation**: Measure prediction quality
3. **Backward Pass**: Compute gradients
4. **Parameter Update**: Improve model weights
5. **Validation**: Monitor generalization performance

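The five steps can be seen end to end in a small NumPy example. This is a sketch under simplifying assumptions, not the `Trainer` implementation: the model is a one-parameter-pair linear fit, gradients are written out by hand instead of coming from autograd, and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = 3x + 1 plus a little noise
x = rng.uniform(-1.0, 1.0, size=(200, 1))
y = 3.0 * x + 1.0 + 0.05 * rng.normal(size=(200, 1))
x_train, y_train = x[:160], y[:160]
x_val, y_val = x[160:], y[160:]

w, b = 0.0, 0.0
learning_rate = 0.1
history = {"train_loss": [], "val_loss": []}

for epoch in range(200):
    # 1. Forward pass: the model makes predictions
    pred = w * x_train + b
    # 2. Loss computation: mean squared error
    error = pred - y_train
    train_loss = float(np.mean(error ** 2))
    # 3. Backward pass: analytic gradients of the MSE
    grad_w = float(np.mean(2.0 * error * x_train))
    grad_b = float(np.mean(2.0 * error))
    # 4. Parameter update: plain gradient descent
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    # 5. Validation: evaluate on held-out data without updating parameters
    val_loss = float(np.mean((w * x_val + b - y_val) ** 2))
    history["train_loss"].append(train_loss)
    history["val_loss"].append(val_loss)

print(f"w={w:.2f}, b={b:.2f}, final val loss={history['val_loss'][-1]:.4f}")
```

The `history` dictionary mirrors the keys used in the examples above, so the same plotting and analysis code would apply to it.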
### **Training Dynamics**
Understanding how training behaves:

- **Overfitting**: Model memorizes training data and generalizes poorly to new data
- **Underfitting**: Model too simple to learn patterns
- **Convergence**: Loss stops decreasing
- **Validation**: Monitoring generalization on held-out data

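One rough way to read these behaviors off a loss history is to compare the change in training and validation loss over the last few epochs. The helper below is a hypothetical heuristic for illustration only; the `window` and `tol` values are arbitrary assumptions, and it does not attempt to detect underfitting, which needs a reference for what a good loss looks like.

```python
def diagnose(train_loss, val_loss, window=3, tol=1e-3):
    """Label the most recent `window` epochs of a loss history (heuristic)."""
    if len(train_loss) <= window or len(val_loss) <= window:
        return "not enough data"
    # Change in each loss over the last `window` epochs
    train_change = train_loss[-1] - train_loss[-1 - window]
    val_change = val_loss[-1] - val_loss[-1 - window]
    # Training loss still falling while validation loss rises
    if train_change < -tol and val_change > tol:
        return "possible overfitting"
    # Neither loss is moving any more
    if abs(train_change) <= tol and abs(val_change) <= tol:
        return "converged"
    return "still training"

print(diagnose([1.0, 0.8, 0.6, 0.4, 0.2], [1.0, 0.9, 0.8, 0.9, 1.0]))
print(diagnose([0.5, 0.3, 0.2, 0.2, 0.2, 0.2], [0.6, 0.4, 0.3, 0.3, 0.3, 0.3]))
```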
## 🔬 Advanced Features

### **Training Monitoring**
```python
# Track training progress
history = trainer.fit(train_loader, val_loader, epochs=100)

# Plot training curves
import matplotlib.pyplot as plt
plt.plot(history['train_loss'], label='Training Loss')
plt.plot(history['val_loss'], label='Validation Loss')
plt.legend()
plt.show()
```

### **Custom Metrics**
```python
# Create custom metrics
class F1Score:
    def __call__(self, y_pred, y_true):
        # Implement F1 score calculation
        pass

# Use in training
trainer = Trainer(model, optimizer, loss_fn, metrics=[Accuracy(), F1Score()])
```

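One possible way to fill in the `F1Score` stub for binary classification is sketched below. It assumes `y_pred` holds probabilities (or scores) as array-like values and `y_true` holds 0/1 labels; the `threshold` parameter is a hypothetical addition, and adapting the class to TinyTorch `Tensor` inputs is left open.

```python
import numpy as np

class F1Score:
    """Binary F1: the harmonic mean of precision and recall (sketch)."""

    def __init__(self, threshold=0.5):
        # Scores at or above the threshold count as positive predictions
        self.threshold = threshold

    def __call__(self, y_pred, y_true):
        pred = np.asarray(y_pred, dtype=float) >= self.threshold
        true = np.asarray(y_true).astype(bool)
        tp = np.sum(pred & true)    # predicted positive, actually positive
        fp = np.sum(pred & ~true)   # predicted positive, actually negative
        fn = np.sum(~pred & true)   # predicted negative, actually positive
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives
        return float(2 * tp / denom) if denom else 0.0

f1 = F1Score()
print(f1([0.9, 0.8, 0.2, 0.1], [1, 0, 1, 0]))
```

Here there is one true positive, one false positive, and one false negative, so the score is 2 / (2 + 1 + 1) = 0.5.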
### **Training Strategies**
```python
# Learning rate scheduling
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

# Early stopping
class EarlyStopping:
    def __init__(self, patience=10):
        self.patience = patience          # epochs to wait without improvement
        self.best_loss = float('inf')
        self.counter = 0

    def __call__(self, val_loss):
        """Return True when training should stop."""
        if val_loss < self.best_loss:
            # New best validation loss: reset the counter
            self.best_loss = val_loss
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience
```

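To make both strategies concrete, the sketch below computes a step-decay schedule by hand and drives the early-stopping check with a made-up list of validation losses. The `step_lr` function is a hypothetical stand-in that follows the common step-decay convention (multiply the rate by `gamma` every `step_size` epochs); whether TinyTorch's `StepLR` behaves identically is an assumption. The `EarlyStopping` class is repeated so the block runs on its own.

```python
def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate after `epoch` epochs of step decay."""
    return base_lr * gamma ** (epoch // step_size)

class EarlyStopping:
    """Same logic as the class above."""
    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float('inf')
        self.counter = 0

    def __call__(self, val_loss):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience

# Illustrative validation losses: improvement stalls after the third epoch
val_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    lr = step_lr(0.1, epoch, step_size=2, gamma=0.5)
    print(f"epoch {epoch}: lr={lr:.4f}, val_loss={val_loss}")
    if stopper(val_loss):
        stopped_at = epoch
        break

print("stopped at epoch", stopped_at, "with best loss", stopper.best_loss)
```

With `patience=3`, training stops at epoch 5 after three epochs without a new best loss, so the later improvement to 0.5 is never seen. That trade-off is why patience is usually set with some slack.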
## 🛠️ Real-World Applications

### **Computer Vision**
```python
# Image classification pipeline
model = Sequential([
    Conv2D((3, 3)), ReLU(),
    flatten,
    Dense(128), ReLU(),
    Dense(10), Softmax()
])

trainer = Trainer(model, Adam(model.parameters), CrossEntropyLoss(), [Accuracy()])
history = trainer.fit(cifar10_loader, epochs=50)
```

### **Natural Language Processing**
```python
# Text classification
model = Sequential([
    Dense(vocab_size, 128), ReLU(),
    Dense(128, 64), ReLU(),
    Dense(64, num_classes), Softmax()
])

trainer = Trainer(model, SGD(model.parameters), CrossEntropyLoss(), [Accuracy()])
history = trainer.fit(text_loader, epochs=20)
```

### **Regression Tasks**
```python
# House price prediction
model = Sequential([
    Dense(features, 64), ReLU(),
    Dense(64, 32), ReLU(),
    Dense(32, 1)  # Single output for regression
])

trainer = Trainer(model, Adam(model.parameters), MeanSquaredError(), [])
history = trainer.fit(housing_loader, epochs=100)
```

## 📈 Performance Optimization

### **Batch Size Selection**
- **Small batches**: More updates, noisier gradients
- **Large batches**: Fewer updates, smoother gradients
- **Sweet spot**: Usually 32-256 depending on dataset

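The trade-off can be put into numbers with a short sketch. The first helper counts parameter updates per epoch for the 1000-example dataset used earlier; the second simulates gradient noise by averaging random per-example values, which is a simplified stand-in for real gradients. Both function names are hypothetical.

```python
import math
import numpy as np

def updates_per_epoch(num_examples, batch_size):
    """Number of parameter updates in one pass over the data."""
    return math.ceil(num_examples / batch_size)

rng = np.random.default_rng(0)

def batch_gradient_std(batch_size, trials=2000):
    """Spread of the mini-batch mean for noisy per-example 'gradients'."""
    # Stand-in gradients: true value 1.0 plus unit-variance noise per example
    samples = rng.normal(loc=1.0, scale=1.0, size=(trials, batch_size))
    return float(samples.mean(axis=1).std())

for batch_size in (32, 64, 128, 256):
    print(batch_size,
          updates_per_epoch(1000, batch_size),
          round(batch_gradient_std(batch_size), 3))
```

Going from a batch size of 32 to 256 cuts the updates per epoch from 32 to 4, while the noise in the averaged gradient shrinks roughly with the square root of the batch size.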
### **Learning Rate Tuning**
- **Too high**: Training diverges or oscillates
- **Too low**: Training is slow or gets stuck
- **Adaptive methods**: Adam often works well out of the box

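A toy problem shows the first two failure modes clearly. The sketch below runs plain gradient descent on f(w) = w², whose gradient is 2w, so each step multiplies w by (1 - 2 × learning rate). The function name is hypothetical and the example is far simpler than a real network.

```python
def minimize_quadratic(learning_rate, steps=50, w=1.0):
    """Run gradient descent on f(w) = w**2 and return the final w."""
    for _ in range(steps):
        grad = 2.0 * w              # derivative of w**2
        w -= learning_rate * grad   # gradient descent update
    return w

for lr in (1.1, 0.3, 0.001):
    print(f"lr={lr}: final w = {minimize_quadratic(lr):.6g}")
```

With a rate of 1.1 the factor is -1.2, so w flips sign and grows at every step; with 0.001 the factor is 0.998 and w has barely moved after 50 steps; with 0.3 the factor is 0.4 and w collapses to the minimum at 0.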
### **Regularization**
- **Dropout**: Randomly disable neurons during training
- **Weight decay**: L2 regularization on parameters
- **Early stopping**: Stop when validation performance plateaus

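The first two techniques can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not TinyTorch code: the dropout shown is the common "inverted" variant that rescales surviving activations, the weight decay is folded into a plain SGD step, and both function names are hypothetical.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return x  # dropout is disabled at evaluation time
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep   # True for units that stay active
    return x * mask / keep              # rescale so the expected value is unchanged

def sgd_step_with_weight_decay(w, grad, learning_rate, weight_decay):
    """SGD step on loss + (weight_decay / 2) * ||w||^2."""
    return w - learning_rate * (grad + weight_decay * w)

rng = np.random.default_rng(0)
activations = np.ones(10)
print(dropout(activations, rate=0.5, rng=rng))

weights = np.array([1.0, -2.0])
print(sgd_step_with_weight_decay(weights, np.zeros(2), learning_rate=0.1, weight_decay=0.5))
```

Even with a zero gradient, the weight-decay term shrinks every weight toward zero by a factor of (1 - learning rate × weight decay) per step.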
## 🎯 Module Completion

### **What You've Built**
- ✅ **Complete loss function library**: MSE, CrossEntropy, BinaryCrossEntropy
- ✅ **Evaluation metrics**: Accuracy and extensible metric framework
- ✅ **Training orchestration**: Full-featured Trainer class
- ✅ **Real-world pipeline**: Train models on actual datasets
- ✅ **Monitoring tools**: Track training progress and performance

### **Skills Developed**
- ✅ **Training loop design**: Coordinate all training components
- ✅ **Loss function implementation**: Measure prediction quality
- ✅ **Metric computation**: Evaluate model performance
- ✅ **Training dynamics**: Understand convergence and optimization
- ✅ **Production workflows**: Build scalable training pipelines

### **Next Steps**
1. **Export your training module**: `tito export training`
2. **Train a complete model**: Use all TinyTorch components together
3. **Explore advanced topics**: Regularization, scheduling, ensembles
4. **Build production pipelines**: Scale training to larger datasets

**Ready for the final stretch?** Your training module completes the core TinyTorch framework. Next up: compression, kernels, benchmarking, and MLOps! 🚀