mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-05-05 08:57:28 -05:00
✅ Renamed modules for clearer pedagogical flow:
- 05_networks → 05_dense (multi-layer dense/fully connected networks)
- 06_cnn → 06_spatial (convolutional networks for spatial patterns)
- 06_attention → 07_attention (attention mechanisms for sequences)

✅ Shifted remaining modules down by 1:
- 07_dataloader → 08_dataloader
- 08_autograd → 09_autograd
- 09_optimizers → 10_optimizers
- 10_training → 11_training
- 11_compression → 12_compression
- 12_kernels → 13_kernels
- 13_benchmarking → 14_benchmarking
- 14_mlops → 15_mlops
- 15_capstone → 16_capstone

✅ Updated module metadata (module.yaml files):
- Updated names, descriptions, dependencies
- Fixed prerequisite chains and enables relationships
- Updated export paths to match new names

New learner progression: Foundation → Individual Layers → Dense Networks → Spatial Networks → Attention Networks → Training Pipeline

Perfect pedagogical flow: Build one layer → Stack dense layers → Add spatial patterns → Add attention mechanisms → Learn to train them all.

# 🔥 Module: Training

## 📊 Module Info
- **Difficulty**: ⭐⭐⭐⭐ Expert
- **Time Estimate**: 8-10 hours
- **Prerequisites**: Tensor, Activations, Layers, Networks, DataLoader, Autograd, Optimizers modules
- **Next Steps**: Compression, Kernels, Benchmarking, MLOps modules

Build the complete training pipeline that brings all TinyTorch components together. This capstone module orchestrates data loading, model forward passes, loss computation, backpropagation, and optimization into the end-to-end training workflows that power modern AI systems.

## 🎯 Learning Objectives

By the end of this module, you will be able to:

- **Design complete training architectures**: Orchestrate all ML components into cohesive training systems
- **Implement essential loss functions**: Build MSE, CrossEntropy, and BinaryCrossEntropy from mathematical foundations
- **Create evaluation frameworks**: Develop metrics systems for classification, regression, and model performance assessment
- **Build production training loops**: Implement robust training workflows with validation, logging, and progress tracking
- **Master training dynamics**: Understand convergence, overfitting, generalization, and optimization in real scenarios

## 🧠 Build → Use → Optimize

This module follows TinyTorch's **Build → Use → Optimize** framework:

1. **Build**: Implement loss functions, evaluation metrics, and complete training orchestration systems
2. **Use**: Train end-to-end neural networks on real datasets with full pipeline automation
3. **Optimize**: Analyze training dynamics, debug convergence issues, and optimize training performance for production

## 📚 What You'll Build

### Complete Training Pipeline
```python
# End-to-end training system
from tinytorch.core.training import Trainer
from tinytorch.core.losses import CrossEntropyLoss
from tinytorch.core.metrics import Accuracy
# Sequential, Dense, ReLU, Softmax, and Adam are assumed to be imported
# from the earlier TinyTorch modules that define them.

# Define complete model architecture
model = Sequential([
    Dense(784, 128), ReLU(),
    Dense(128, 64), ReLU(),
    Dense(64, 10), Softmax()
])

# Configure training components
optimizer = Adam(model.parameters(), learning_rate=0.001)
loss_fn = CrossEntropyLoss()
metrics = [Accuracy()]

# Create and configure trainer
trainer = Trainer(
    model=model,
    optimizer=optimizer,
    loss_fn=loss_fn,
    metrics=metrics
)

# Train with comprehensive monitoring
history = trainer.fit(
    train_dataloader=train_loader,
    val_dataloader=val_loader,
    epochs=50,
    verbose=True
)
```

### Loss Function Library
```python
# Regression loss for continuous targets
mse_loss = MeanSquaredError()
regression_loss = mse_loss(predictions, continuous_targets)

# Multi-class classification loss
ce_loss = CrossEntropyLoss()
classification_loss = ce_loss(logits, class_indices)

# Binary classification loss
bce_loss = BinaryCrossEntropyLoss()
binary_loss = bce_loss(sigmoid_outputs, binary_labels)

# All losses support batch processing and gradient computation
classification_loss.backward()  # Automatic differentiation integration
```

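
To make the mathematics behind these three losses concrete, here is a minimal pure-Python sketch that does not depend on TinyTorch. The function names (`mse`, `cross_entropy`, `binary_cross_entropy`) and the clipping constant `eps` are illustrative assumptions rather than the module's API. The sketch computes forward values only, with no gradients.

```python
import math

def mse(predictions, targets):
    """Mean squared error over a batch of scalar predictions."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def cross_entropy(logits, class_indices):
    """Mean cross-entropy from raw logits, using a stable log-sum-exp."""
    total = 0.0
    for row, target in zip(logits, class_indices):
        m = max(row)  # subtract the max so exp() cannot overflow
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[target]  # equals -log softmax(row)[target]
    return total / len(class_indices)

def binary_cross_entropy(probabilities, labels, eps=1e-12):
    """Mean binary cross-entropy from sigmoid outputs in (0, 1)."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

print(mse([1.0, 2.0], [1.0, 4.0]))        # 2.0
print(cross_entropy([[0.0, 0.0]], [0]))    # log(2), about 0.693
print(binary_cross_entropy([0.5], [1]))    # log(2), about 0.693
```

Working from logits with log-sum-exp avoids the overflow that a naive `exp` followed by `log` would hit for large scores, which is one reason a cross-entropy loss is often defined on logits rather than on probabilities.
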
### Evaluation Metrics System
```python
# Classification performance measurement
accuracy = Accuracy()
acc_score = accuracy(predictions, true_labels)  # Returns 0.0 to 1.0

# Regression error measurement
mae = MeanAbsoluteError()
error = mae(predictions, targets)

# Extensible metric framework
class CustomMetric:
    def __call__(self, y_pred, y_true):
        # Implement custom evaluation logic
        custom_score = 0.0  # placeholder: replace with your own computation
        return custom_score

metrics = [Accuracy(), CustomMetric()]
trainer = Trainer(model, optimizer, loss_fn, metrics)
```

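
As a rough illustration of what these two metrics compute, the following pure-Python sketch treats each prediction as a list of per-class scores for accuracy and as a scalar for mean absolute error. The function names and the list-based inputs are assumptions made for this example. TinyTorch's own `Accuracy` and `MeanAbsoluteError` operate on its tensor type and may differ in detail.

```python
def accuracy(predictions, true_labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    correct = 0
    for scores, label in zip(predictions, true_labels):
        predicted = max(range(len(scores)), key=scores.__getitem__)  # argmax
        correct += int(predicted == label)
    return correct / len(true_labels)

def mean_absolute_error(predictions, targets):
    """Average absolute difference between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

print(accuracy([[0.1, 0.9], [0.8, 0.2]], [1, 1]))              # 0.5
print(mean_absolute_error([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))   # 1.0
```
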
### Real-World Training Workflows
```python
# Train on CIFAR-10 with full pipeline
from tinytorch.core.dataloader import CIFAR10Dataset, DataLoader

# Load and prepare data
train_dataset = CIFAR10Dataset("data/cifar10/", train=True, download=True)
# Assumption: train=False selects the held-out split used for validation
val_dataset = CIFAR10Dataset("data/cifar10/", train=False, download=True)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

# Configure CNN for computer vision
cnn_model = Sequential([
    Conv2D(3, 16, kernel_size=3), ReLU(),
    MaxPool2D(kernel_size=2),
    Conv2D(16, 32, kernel_size=3), ReLU(),
    Flatten(),
    Dense(32 * 13 * 13, 128), ReLU(),
    Dense(128, 10)
])

# Train with monitoring and validation
trainer = Trainer(cnn_model, Adam(cnn_model.parameters()), CrossEntropyLoss(), [Accuracy()])
history = trainer.fit(train_loader, val_loader, epochs=100)

# Analyze training results
print(f"Final train accuracy: {history['train_accuracy'][-1]:.4f}")
print(f"Final val accuracy: {history['val_accuracy'][-1]:.4f}")
```

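
The `32 * 13 * 13` input size of the first `Dense` layer follows from how each layer shrinks the 32x32 CIFAR-10 image. The sketch below reproduces that arithmetic. It assumes the convolutions use stride 1 with no padding and that pooling uses a stride equal to its kernel size. The helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 32                                               # CIFAR-10 images are 32x32
size = conv_output_size(size, kernel_size=3)            # Conv2D(3, 16, 3)  -> 30
size = conv_output_size(size, kernel_size=2, stride=2)  # MaxPool2D(2)      -> 15
size = conv_output_size(size, kernel_size=3)            # Conv2D(16, 32, 3) -> 13
flattened = 32 * size * size                            # 32 output channels
print(size, flattened)  # 13 5408
```

If the layers used padding or a different stride, the flattened size would change and the `Dense` input dimension would need to be recomputed the same way.
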
## 🚀 Getting Started

### Prerequisites
Ensure you have completed the entire TinyTorch foundation:

```bash
# Activate TinyTorch environment
source bin/activate-tinytorch.sh

# Verify all prerequisite modules (this is the capstone!)
tito test --module tensor
tito test --module activations
tito test --module layers
tito test --module networks
tito test --module dataloader
tito test --module autograd
tito test --module optimizers
```

### Development Workflow
1. **Open the development file**: `modules/source/10_training/training_dev.py`
2. **Implement loss functions**: Build MSE, CrossEntropy, and BinaryCrossEntropy with proper gradients
3. **Create metrics system**: Develop Accuracy and extensible evaluation framework
4. **Build Trainer class**: Orchestrate training loop with validation and monitoring
5. **Test end-to-end training**: Apply complete pipeline to real datasets and problems
6. **Export and verify**: `tito export --module training && tito test --module training`

## 🧪 Testing Your Implementation

### Comprehensive Test Suite
Run the full test suite to verify complete training system functionality:

```bash
# TinyTorch CLI (recommended)
tito test --module training

# Direct pytest execution
python -m pytest tests/ -k training -v
```

### Test Coverage Areas
- ✅ **Loss Function Implementation**: Verify mathematical correctness and gradient computation
- ✅ **Metrics System**: Test accuracy calculation and extensible framework
- ✅ **Training Loop Orchestration**: Ensure proper coordination of all components
- ✅ **End-to-End Training**: Verify complete workflows on real datasets
- ✅ **Convergence Analysis**: Test training dynamics and optimization behavior

### Inline Testing & Training Analysis
The module includes comprehensive training validation and convergence monitoring:
```text
# Example inline test output
🔬 Unit Test: CrossEntropy loss function...
✅ Mathematical correctness verified
✅ Gradient computation working
✅ Batch processing supported
📈 Progress: Loss Functions ✓

# Training monitoring
🔬 Unit Test: Complete training pipeline...
✅ Trainer orchestrates all components correctly
✅ Training loop converges on test problem
✅ Validation monitoring working
📈 Progress: End-to-End Training ✓

# Real dataset training
📊 Training on CIFAR-10 subset...
Epoch 1/10: train_loss=2.345, train_acc=0.234, val_loss=2.123, val_acc=0.278
Epoch 5/10: train_loss=1.456, train_acc=0.567, val_loss=1.543, val_acc=0.523
✅ Model converging successfully
```

### Manual Testing Examples
```python
from training_dev import Trainer, CrossEntropyLoss, Accuracy
from networks_dev import Sequential
from layers_dev import Dense
from activations_dev import ReLU, Softmax
from optimizers_dev import Adam

# Test complete training on synthetic data
model = Sequential([Dense(4, 8), ReLU(), Dense(8, 3), Softmax()])
optimizer = Adam(model.parameters(), learning_rate=0.01)
loss_fn = CrossEntropyLoss()
metrics = [Accuracy()]

trainer = Trainer(model, optimizer, loss_fn, metrics)

# Create simple dataset
from dataloader_dev import SimpleDataset, DataLoader
train_dataset = SimpleDataset(size=1000, num_features=4, num_classes=3)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Train and monitor
history = trainer.fit(train_loader, epochs=20, verbose=True)
print(f"Training completed. Final accuracy: {history['train_accuracy'][-1]:.4f}")
```

## 🎯 Key Concepts

### Real-World Applications
- **Production ML Systems**: Companies such as Netflix and Google use similar training pipelines for recommendation and search systems
- **Research Workflows**: Academic researchers use training frameworks like this one for experimental model development
- **MLOps Platforms**: Production training systems extend these patterns with distributed computing and monitoring
- **Edge AI Training**: Federated learning systems use similar orchestration patterns across distributed devices

### Training System Architecture
- **Loss Functions**: Mathematical objectives that define what the model should learn
- **Metrics**: Human-interpretable measures of model performance for monitoring and decision-making
- **Training Loop**: Orchestration pattern that coordinates data loading, forward passes, backward passes, and optimization
- **Validation Strategy**: Techniques for monitoring generalization and preventing overfitting

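
The four pieces above can be seen working together in a deliberately tiny, pure-Python sketch. It fits a one-variable linear model with per-sample SGD and records a `history` dictionary per epoch. The function name `train`, the hand-written gradient, and the synthetic data are all assumptions made for illustration. TinyTorch's `Trainer` delegates these steps to the autograd and optimizer modules instead.

```python
import random

def train(train_data, val_data, epochs=50, lr=0.1, seed=0):
    """Fit y = w*x + b with per-sample SGD and record per-epoch losses."""
    rng = random.Random(seed)
    data = list(train_data)            # copy so the caller's list is not shuffled
    w, b = 0.0, 0.0
    history = {"train_loss": [], "val_loss": []}
    for _ in range(epochs):
        rng.shuffle(data)              # data loading: new sample order each epoch
        total = 0.0
        for x, y in data:
            pred = w * x + b           # forward pass
            err = pred - y
            total += err ** 2          # loss (squared error)
            grad_w = 2 * err * x       # backward pass (analytic gradient)
            grad_b = 2 * err
            w -= lr * grad_w           # optimizer step
            b -= lr * grad_b
        history["train_loss"].append(total / len(data))
        # Validation: evaluate only, no parameter updates
        val = sum((w * x + b - y) ** 2 for x, y in val_data) / len(val_data)
        history["val_loss"].append(val)
    return (w, b), history

# Synthetic, noise-free data drawn from y = 2x + 1
train_set = [(i / 20, 2 * (i / 20) + 1) for i in range(20)]
val_set = [(0.025 + i / 10, 2 * (0.025 + i / 10) + 1) for i in range(10)]
(w, b), history = train(train_set, val_set)
print(f"w={w:.3f}, b={b:.3f}, final val_loss={history['val_loss'][-1]:.2e}")
```

Keeping validation as a separate, update-free pass is what makes the two loss curves comparable: any gap between them reflects generalization, not extra training.
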
### Machine Learning Engineering
- **Training Dynamics**: Understanding convergence, overfitting, underfitting, and optimization landscapes
- **Hyperparameter Tuning**: Systematic approaches to learning rate, batch size, and architecture selection
- **Debugging Training**: Common failure modes and diagnostic techniques for training issues
- **Production Considerations**: Scalability, monitoring, reproducibility, and deployment readiness

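
One common way to act on overfitting is early stopping: halt once the validation loss has failed to improve for a fixed number of epochs. The sketch below applies that rule to a list of per-epoch validation losses. The function name `early_stopping_epoch` and the `patience` parameter are assumptions for this example, and nothing here implies that TinyTorch's `Trainer` provides early stopping.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) for a patience-based stopping rule."""
    best, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0   # new best: reset counter
        else:
            waited += 1                                  # no improvement this epoch
            if waited >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1               # never triggered

# Validation loss falls, then rises as the model starts to overfit
val_losses = [1.0, 0.8, 0.7, 0.75, 0.9, 0.95]
print(early_stopping_epoch(val_losses, patience=2))  # (2, 4)
```
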
### Systems Integration Patterns
- **Component Orchestration**: How to coordinate multiple ML components into cohesive systems
- **Error Handling**: Robust handling of training failures, data issues, and convergence problems
- **Monitoring and Logging**: Tracking training progress, performance metrics, and system health
- **Extensibility**: Design patterns that enable easy addition of new losses, metrics, and training strategies

## 🎉 Ready to Build?

You're about to complete the TinyTorch framework by building the training system that brings everything together! This is where all your hard work on tensors, layers, networks, data loading, gradients, and optimization culminates in a complete ML system.

Training is the heart of machine learning: it's where models learn from data. You're building the same patterns used to train GPT and computer vision models and to power production AI systems. Take your time, understand how all the pieces fit together, and enjoy creating something truly powerful!

```{grid} 3
:gutter: 3
:margin: 2

{grid-item-card} 🚀 Launch Builder
:link: https://mybinder.org/v2/gh/VJProductions/TinyTorch/main?filepath=modules/source/10_training/training_dev.py
:class-title: text-center
:class-body: text-center

Interactive development environment

{grid-item-card} 📓 Open in Colab
:link: https://colab.research.google.com/github/VJProductions/TinyTorch/blob/main/modules/source/10_training/training_dev.ipynb
:class-title: text-center
:class-body: text-center

Google Colab notebook

{grid-item-card} 👀 View Source
:link: https://github.com/VJProductions/TinyTorch/blob/main/modules/source/10_training/training_dev.py
:class-title: text-center
:class-body: text-center

Browse the code on GitHub
```