🏋️ Module 9: Training - Complete Neural Network Training Pipeline

📊 Module Info

  • Difficulty: Expert
  • Time Estimate: 8-10 hours
  • Prerequisites: Tensor, Activations, Layers, Networks, DataLoader, Autograd, Optimizers modules
  • Next Steps: Compression, Kernels, Benchmarking, MLOps modules

Build the complete training pipeline that brings all TinyTorch components together

🎯 Learning Objectives

After completing this module, you will:

  • Understand loss functions and how they guide neural network training
  • Implement essential loss functions: MSE, CrossEntropy, and BinaryCrossEntropy
  • Build evaluation metrics for classification and regression tasks
  • Create a complete training loop that orchestrates the entire training process
  • Master training workflows with validation, logging, and progress tracking

🧠 Build → Use → Optimize

This module follows the TinyTorch pedagogical framework:

  1. Build: Loss functions, metrics, and training orchestration components
  2. Use: Train complete neural networks on real datasets
  3. Optimize: Analyze training dynamics and improve performance

📚 What You'll Build

Loss Functions

# Regression loss
mse = MeanSquaredError()
loss = mse(predictions, targets)

# Multi-class classification loss (expects softmax outputs)
ce = CrossEntropyLoss()
loss = ce(probabilities, class_indices)

# Binary classification loss (expects sigmoid outputs)
bce = BinaryCrossEntropyLoss()
loss = bce(probabilities, binary_labels)

Evaluation Metrics

# Classification accuracy
accuracy = Accuracy()
acc = accuracy(predictions, true_labels)  # Returns 0.0 to 1.0

# Regression metrics
mae = MeanAbsoluteError()
error = mae(predictions, targets)

Complete Training Pipeline

# Set up training components
model = Sequential([
    Dense(784, 128), ReLU(),
    Dense(128, 64), ReLU(),
    Dense(64, 10), Softmax()
])

optimizer = Adam(model.parameters, learning_rate=0.001)
loss_fn = CrossEntropyLoss()
metrics = [Accuracy()]

# Create trainer
trainer = Trainer(model, optimizer, loss_fn, metrics)

# Train the model
history = trainer.fit(
    train_dataloader, 
    val_dataloader, 
    epochs=10,
    verbose=True
)

Training with Real Data

# Import data utilities
from tinytorch.core.dataloader import SimpleDataset, DataLoader

# Create a dataset and wrap it in a batched loader
train_dataset = SimpleDataset(size=1000, num_features=784, num_classes=10)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Train the model
history = trainer.fit(train_loader, epochs=50)

# Analyze training
print(f"Final training loss: {history['train_loss'][-1]:.4f}")
print(f"Final training accuracy: {history['train_accuracy'][-1]:.4f}")

🚀 Getting Started

Prerequisites

  • Complete Modules 1-8: Setup through Optimizers
  • Understand backpropagation and gradient descent
  • Familiar with classification and regression tasks

Quick Start

# Navigate to the training module
cd modules/source/09_training

# Open the development notebook
jupyter lab training_dev.py

# Or use the TinyTorch CLI
tito module info training
tito module test training

📖 Core Concepts

Loss Functions: The Training Signal

Loss functions measure how far our predictions are from the true values:

  • MSE: For regression tasks, penalizes large errors heavily
  • CrossEntropy: For classification, works with softmax outputs
  • BinaryCrossEntropy: For binary classification, works with sigmoid outputs
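A minimal NumPy sketch of MSE shows the shape these classes take; the version you build in this module also works with TinyTorch tensors so gradients can flow through autograd:

import numpy as np

class MeanSquaredError:
    def __call__(self, predictions, targets):
        # Mean of squared differences between predictions and targets
        predictions = np.asarray(predictions)
        targets = np.asarray(targets)
        return float(np.mean((predictions - targets) ** 2))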

Metrics: Human-Interpretable Performance

Metrics provide understandable measures of model performance:

  • Accuracy: Fraction of correct predictions
  • Precision: Of positive predictions, how many were correct?
  • Recall: Of actual positives, how many were found?
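Accuracy, for example, reduces to an argmax and a mean. A NumPy sketch (the module's version follows the same callable pattern):

import numpy as np

class Accuracy:
    def __call__(self, y_pred, y_true):
        y_pred = np.asarray(y_pred)
        y_true = np.asarray(y_true)
        if y_pred.ndim > 1:
            # Per-class scores: predict the highest-scoring class
            y_pred = np.argmax(y_pred, axis=1)
        return float(np.mean(y_pred == y_true))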

Training Loop: Orchestrating Learning

The training loop coordinates all components:

  1. Forward Pass: Model makes predictions
  2. Loss Computation: Measure prediction quality
  3. Backward Pass: Compute gradients
  4. Parameter Update: Improve model weights
  5. Validation: Monitor generalization performance
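In code, one epoch of that loop looks roughly like this sketch (the zero_grad/backward/step names assume the conventions of the Autograd and Optimizers modules):

for batch_x, batch_y in train_dataloader:
    predictions = model(batch_x)           # 1. forward pass
    loss = loss_fn(predictions, batch_y)   # 2. loss computation
    optimizer.zero_grad()                  # clear stale gradients
    loss.backward()                        # 3. backward pass
    optimizer.step()                       # 4. parameter update
# 5. validation: evaluate on val_dataloader without updating weights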

Training Dynamics

Understanding how training behaves:

  • Overfitting: Model memorizes training data
  • Underfitting: Model too simple to learn patterns
  • Convergence: Loss stops decreasing
  • Validation: Monitoring generalization
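These patterns show up in the history dictionary returned by trainer.fit; a quick diagnostic sketch (the thresholds below are illustrative, not rules):

train_loss = history['train_loss'][-1]
val_loss = history['val_loss'][-1]

if val_loss - train_loss > 0.5:   # illustrative threshold
    print("Validation loss far above training loss: likely overfitting")
elif train_loss > 1.0:            # illustrative threshold
    print("Training loss still high: possibly underfitting")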

🔬 Advanced Features

Training Monitoring

# Track training progress
history = trainer.fit(train_loader, val_loader, epochs=100)

# Plot training curves
import matplotlib.pyplot as plt
plt.plot(history['train_loss'], label='Training Loss')
plt.plot(history['val_loss'], label='Validation Loss')
plt.legend()
plt.show()

Custom Metrics

# Create a custom metric: binary F1, the harmonic mean of precision and recall
import numpy as np

class F1Score:
    def __call__(self, y_pred, y_true):
        y_pred = np.asarray(y_pred)
        y_true = np.asarray(y_true)
        tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
        fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
        fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Use in training
trainer = Trainer(model, optimizer, loss_fn, metrics=[Accuracy(), F1Score()])

Training Strategies

# Learning rate scheduling
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)

# Early stopping
class EarlyStopping:
    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float('inf')
        self.counter = 0
    
    def __call__(self, val_loss):
        # Returns True when validation loss hasn't improved for `patience` epochs
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.counter = 0
            return False
        self.counter += 1
        return self.counter >= self.patience
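StepLR is used above without a definition; here is a minimal sketch consistent with the optimizer shown earlier. The learning_rate attribute is an assumption about the optimizer's interface:

class StepLR:
    def __init__(self, optimizer, step_size=10, gamma=0.1):
        self.optimizer = optimizer
        self.step_size = step_size
        self.gamma = gamma
        self.epoch = 0

    def step(self):
        # Call once per epoch; decay the learning rate on schedule boundaries
        self.epoch += 1
        if self.epoch % self.step_size == 0:
            self.optimizer.learning_rate *= self.gamma  # assumed attribute name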

🛠️ Real-World Applications

Computer Vision

# Image classification pipeline (layer sizes are illustrative)
model = Sequential([
    Conv2D((3, 3)), ReLU(),
    flatten,
    Dense(conv_output_size, 128), ReLU(),  # conv_output_size: flattened feature count
    Dense(128, 10), Softmax()
])

trainer = Trainer(model, Adam(model.parameters), CrossEntropyLoss(), [Accuracy()])
history = trainer.fit(cifar10_loader, epochs=50)

Natural Language Processing

# Text classification
model = Sequential([
    Dense(vocab_size, 128), ReLU(),
    Dense(128, 64), ReLU(),
    Dense(64, num_classes), Softmax()
])

trainer = Trainer(model, SGD(model.parameters), CrossEntropyLoss(), [Accuracy()])
history = trainer.fit(text_loader, epochs=20)

Regression Tasks

# House price prediction
model = Sequential([
    Dense(features, 64), ReLU(),
    Dense(64, 32), ReLU(),
    Dense(32, 1)  # Single output for regression
])

trainer = Trainer(model, Adam(model.parameters), MeanSquaredError(), [MeanAbsoluteError()])
history = trainer.fit(housing_loader, epochs=100)

📈 Performance Optimization

Batch Size Selection

  • Small batches: More updates, noisier gradients
  • Large batches: Fewer updates, smoother gradients
  • Sweet spot: Usually 32-256 depending on dataset

Learning Rate Tuning

  • Too high: Training diverges or oscillates
  • Too low: Training is slow or gets stuck
  • Adaptive methods: Adam often works well out of the box

Regularization

  • Dropout: Randomly disable neurons during training
  • Weight decay: L2 regularization on parameters
  • Early stopping: Stop when validation performance plateaus
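As a concrete example, weight decay can be added straight into the loss. A sketch, assuming parameters are NumPy-backed tensors with a data attribute:

weight_decay = 1e-4  # regularization strength (illustrative)

# Add an L2 penalty on all parameters to the data loss
l2_penalty = sum(float((p.data ** 2).sum()) for p in model.parameters)
total_loss = loss + weight_decay * l2_penalty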

🎯 Module Completion

What You've Built

  • Complete loss function library: MSE, CrossEntropy, BinaryCrossEntropy
  • Evaluation metrics: Accuracy and an extensible metric framework
  • Training orchestration: Full-featured Trainer class
  • Real-world pipeline: Train models on actual datasets
  • Monitoring tools: Track training progress and performance

Skills Developed

  • Training loop design: Coordinate all training components
  • Loss function implementation: Measure prediction quality
  • Metric computation: Evaluate model performance
  • Training dynamics: Understand convergence and optimization
  • Production workflows: Build scalable training pipelines

Next Steps

  1. Export your training module: tito export training
  2. Train a complete model: Use all TinyTorch components together
  3. Explore advanced topics: Regularization, scheduling, ensembles
  4. Build production pipelines: Scale training to larger datasets

Ready for the final stretch? Your training module completes the core TinyTorch framework. Next up: compression, kernels, and MLOps! 🚀