# 🏋️ Module 9: Training - Complete Neural Network Training Pipeline

## 📊 Module Info
- **Difficulty**: ⭐⭐⭐⭐⭐ Expert
- **Time Estimate**: 8-10 hours
- **Prerequisites**: Tensor, Activations, Layers, Networks, DataLoader, Autograd, Optimizers modules
- **Next Steps**: Compression, Kernels, Benchmarking, MLOps modules

**Build the complete training pipeline that brings all TinyTorch components together**

## 🎯 Learning Objectives

After completing this module, you will:
- Understand loss functions and how they guide neural network training
- Implement essential loss functions: MSE, CrossEntropy, and BinaryCrossEntropy
- Build evaluation metrics for classification and regression tasks
- Create a complete training loop that orchestrates the entire training process
- Master training workflows with validation, logging, and progress tracking

## 🧠 Build → Use → Optimize

This module follows the TinyTorch pedagogical framework:

1. **Build**: Loss functions, metrics, and training orchestration components
2. **Use**: Train complete neural networks on real datasets
3. **Optimize**: Analyze training dynamics and improve performance

## 📚 What You'll Build

### **Loss Functions**
```python
# Regression loss
mse = MeanSquaredError()
loss = mse(predictions, targets)

# Multi-class classification loss
ce = CrossEntropyLoss()
loss = ce(logits, class_indices)

# Binary classification loss
bce = BinaryCrossEntropyLoss()
loss = bce(logits, binary_labels)
```

### **Evaluation Metrics**
```python
# Classification accuracy
accuracy = Accuracy()
acc = accuracy(predictions, true_labels)  # Returns 0.0 to 1.0

# Regression metrics
mae = MeanAbsoluteError()
error = mae(predictions, targets)
```

### **Complete Training Pipeline**
```python
# Set up training components
model = Sequential([
    Dense(784, 128), ReLU(),
    Dense(128, 64), ReLU(),
    Dense(64, 10), Softmax()
])

optimizer = Adam(model.parameters, learning_rate=0.001)
loss_fn = CrossEntropyLoss()
metrics = [Accuracy()]

# Create trainer
trainer = Trainer(model, optimizer, loss_fn, metrics)

# Train the model
history = trainer.fit(
    train_dataloader, 
    val_dataloader, 
    epochs=10,
    verbose=True
)
```

### **Training with Real Data**
```python
# Load dataset
from tinytorch.core.dataloader import SimpleDataset, DataLoader

# Create dataset
train_dataset = SimpleDataset(size=1000, num_features=784, num_classes=10)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Train on real data
history = trainer.fit(train_loader, epochs=50)

# Analyze training
print(f"Final training loss: {history['train_loss'][-1]:.4f}")
print(f"Final training accuracy: {history['train_accuracy'][-1]:.4f}")
```

## 🚀 Getting Started

### Prerequisites
- Complete Modules 1-8: Setup through Optimizers ✅
- Understand backpropagation and gradient descent
- Familiar with classification and regression tasks

### Quick Start
```bash
# Navigate to the training module
cd modules/source/09_training

# Open the development notebook
jupyter lab training_dev.py

# Or use the TinyTorch CLI
tito module info training
tito module test training
```

## 📖 Core Concepts

### **Loss Functions: The Training Signal**
Loss functions measure how far our predictions are from the true values:

- **MSE**: For regression tasks, penalizes large errors heavily
- **CrossEntropy**: For classification, works with softmax outputs
- **BinaryCrossEntropy**: For binary classification, works with sigmoid outputs

### **Metrics: Human-Interpretable Performance**
Metrics provide understandable measures of model performance:

- **Accuracy**: Fraction of correct predictions
- **Precision**: Of positive predictions, how many were correct?
- **Recall**: Of actual positives, how many were found?

### **Training Loop: Orchestrating Learning**
The training loop coordinates all components:

1. **Forward Pass**: Model makes predictions
2. **Loss Computation**: Measure prediction quality
3. **Backward Pass**: Compute gradients
4. **Parameter Update**: Improve model weights
5. **Validation**: Monitor generalization performance

### **Training Dynamics**
Understanding how training behaves:

- **Overfitting**: Model memorizes training data
- **Underfitting**: Model too simple to learn patterns
- **Convergence**: Loss stops decreasing
- **Validation**: Monitoring generalization

## 🔬 Advanced Features

### **Training Monitoring**
```python
# Track training progress
history = trainer.fit(train_loader, val_loader, epochs=100)

# Plot training curves
import matplotlib.pyplot as plt
plt.plot(history['train_loss'], label='Training Loss')
plt.plot(history['val_loss'], label='Validation Loss')
plt.legend()
plt.show()
```

### **Custom Metrics**
```python
# Create custom metrics
class F1Score:
    def __call__(self, y_pred, y_true):
        # Implement F1 score calculation
        pass

# Use in training
trainer = Trainer(model, optimizer, loss_fn, metrics=[Accuracy(), F1Score()])
```

### **Training Strategies**
```python
# Learning rate scheduling
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

# Early stopping
class EarlyStopping:
    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float('inf')
        self.counter = 0
    
    def __call__(self, val_loss):
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.counter = 0
        else:
            self.counter += 1
            return self.counter >= self.patience
```

## 🛠️ Real-World Applications

### **Computer Vision**
```python
# Image classification pipeline
model = Sequential([
    Conv2D((3, 3)), ReLU(),
    flatten,
    Dense(128), ReLU(),
    Dense(10), Softmax()
])

trainer = Trainer(model, Adam(model.parameters), CrossEntropyLoss(), [Accuracy()])
history = trainer.fit(cifar10_loader, epochs=50)
```

### **Natural Language Processing**
```python
# Text classification
model = Sequential([
    Dense(vocab_size, 128), ReLU(),
    Dense(128, 64), ReLU(),
    Dense(64, num_classes), Softmax()
])

trainer = Trainer(model, SGD(model.parameters), CrossEntropyLoss(), [Accuracy()])
history = trainer.fit(text_loader, epochs=20)
```

### **Regression Tasks**
```python
# House price prediction
model = Sequential([
    Dense(features, 64), ReLU(),
    Dense(64, 32), ReLU(),
    Dense(32, 1)  # Single output for regression
])

trainer = Trainer(model, Adam(model.parameters), MeanSquaredError(), [])
history = trainer.fit(housing_loader, epochs=100)
```

## 📈 Performance Optimization

### **Batch Size Selection**
- **Small batches**: More updates, noisier gradients
- **Large batches**: Fewer updates, smoother gradients
- **Sweet spot**: Usually 32-256 depending on dataset

### **Learning Rate Tuning**
- **Too high**: Training diverges or oscillates
- **Too low**: Training is slow or gets stuck
- **Adaptive methods**: Adam often works well out of the box

### **Regularization**
- **Dropout**: Randomly disable neurons during training
- **Weight decay**: L2 regularization on parameters
- **Early stopping**: Stop when validation performance plateaus

## 🎯 Module Completion

### **What You've Built**
✅ **Complete loss function library**: MSE, CrossEntropy, BinaryCrossEntropy  
✅ **Evaluation metrics**: Accuracy and extensible metric framework  
✅ **Training orchestration**: Full-featured Trainer class  
✅ **Real-world pipeline**: Train models on actual datasets  
✅ **Monitoring tools**: Track training progress and performance  

### **Skills Developed**
✅ **Training loop design**: Coordinate all training components  
✅ **Loss function implementation**: Measure prediction quality  
✅ **Metric computation**: Evaluate model performance  
✅ **Training dynamics**: Understand convergence and optimization  
✅ **Production workflows**: Build scalable training pipelines  

### **Next Steps**
1. **Export your training module**: `tito export training`
2. **Train a complete model**: Use all TinyTorch components together
3. **Explore advanced topics**: Regularization, scheduling, ensembles
4. **Build production pipelines**: Scale training to larger datasets

**Ready for the final stretch?** Your training module completes the core TinyTorch framework. Next up: compression, kernels, and MLOps! 🚀