| title | description | difficulty | time_estimate | prerequisites | next_steps | learning_objectives |
|---|---|---|---|---|---|---|
| Loss Functions | Implement MSE and CrossEntropy loss functions for training neural networks | 2 | 3-4 hours | | | |
# 04. Losses

🏗️ FOUNDATION TIER | Difficulty: ⭐⭐ (2/4) | Time: 3-4 hours
## Overview

Implement the mathematical functions that measure how wrong your model's predictions are. Loss functions are the bridge between model outputs and the optimization process: they define what "better" means and drive the entire learning process.
## Learning Objectives
By completing this module, you will be able to:
- Implement MSE loss for regression tasks with numerically stable computation
- Build CrossEntropy loss for classification using the log-sum-exp trick for numerical stability
- Understand mathematical properties of loss landscapes and their impact on optimization
- Recognize the role of loss functions in connecting predictions to training objectives
- Apply appropriate losses for regression, binary classification, and multi-class classification
## Why This Matters

### Production Context
Loss functions are fundamental to all machine learning systems:
- Recommendation Systems use MSE and ranking losses to learn user preferences
- Image Classification relies on CrossEntropy loss for category prediction (ImageNet, CIFAR-10)
- Language Models use CrossEntropy to predict next tokens in GPT, Claude, and all LLMs
- Autonomous Driving combines multiple losses for perception, planning, and control
### Historical Context
Loss functions evolved with machine learning itself:
- Least Squares (1805): Gauss invented MSE for astronomical orbit predictions
- Maximum Likelihood (1912): Fisher formalized statistical foundations of loss functions
- CrossEntropy (1950s): Information theory brought entropy-based losses to ML
- Modern Deep Learning (2012+): Careful loss design enables training billion-parameter models
## Build → Use → Understand
This module follows the classic pedagogy for foundational concepts:
- Build: Implement MSE and CrossEntropy loss functions from mathematical definitions
- Use: Apply losses to regression and classification tasks, seeing how they drive learning
- Understand: Analyze loss landscapes, gradients, and numerical stability considerations
## Implementation Guide

### Step 1: MSE (Mean Squared Error) Loss

Implement L2 loss for regression:

```python
class MSELoss:
    """Mean Squared Error loss for regression."""

    def __call__(self, predictions: Tensor, targets: Tensor) -> Tensor:
        """
        Compute MSE: (1/n) * Σ(predictions - targets)²

        Args:
            predictions: Model outputs
            targets: Ground truth values

        Returns:
            Scalar loss value
        """
        diff = predictions - targets
        squared = diff * diff
        return squared.mean()
```
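As a quick sanity check, the same computation in plain NumPy (a standalone sketch using numeric arrays in place of the `Tensor` class above):

```python
import numpy as np

def mse(predictions: np.ndarray, targets: np.ndarray) -> float:
    """Mean of squared differences: (1/n) * sum((p - t)**2)."""
    diff = predictions - targets
    return float((diff * diff).mean())

preds = np.array([2.5, 0.0, 2.0])
truth = np.array([3.0, -0.5, 2.0])
print(mse(preds, truth))  # (0.25 + 0.25 + 0.0) / 3 ≈ 0.1667
```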
### Step 2: CrossEntropy Loss

Implement negative log-likelihood loss for classification:

```python
class CrossEntropyLoss:
    """CrossEntropy loss for multi-class classification."""

    def __call__(self, logits: Tensor, targets: Tensor) -> Tensor:
        """
        Compute CrossEntropy with the log-sum-exp trick for numerical stability.

        Args:
            logits: Raw model outputs (before softmax), shape (batch, num_classes)
            targets: Integer class indices, shape (batch,)

        Returns:
            Scalar loss value
        """
        # Log-sum-exp trick: subtracting the max logit prevents overflow in exp()
        max_logits = logits.max(axis=1, keepdims=True)
        shifted = logits - max_logits
        log_probs = shifted - shifted.exp().sum(axis=1, keepdims=True).log()
        # Negative log-likelihood of the true class for each example
        # (averaging over all classes, rather than selecting the target
        # class, is a common bug that makes the loss meaningless)
        return -log_probs[range(len(targets)), targets].mean()
```
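To see why the trick matters, here is a self-contained NumPy version (a sketch, not the module's Tensor API) run on logits large enough to overflow a naive softmax:

```python
import numpy as np

def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Stable CE: log-sum-exp trick, then mean NLL of the true class."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

# Logits this large overflow a naive softmax (np.exp(1000.0) is inf),
# but the shifted version never exponentiates anything above 0.
logits = np.array([[1000.0, 0.0], [0.0, 1000.0]])
targets = np.array([0, 1])
print(cross_entropy(logits, targets))  # ≈ 0.0: the true class dominates
```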
### Step 3: Loss Function Properties
Understand key mathematical properties:
- Convexity: MSE is convex; CrossEntropy is convex in logits
- Gradients: Smooth gradients enable effective optimization
- Scale: Loss magnitude affects learning rate tuning
- Numerical Stability: Requires careful implementation (log-sum-exp trick)
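The scale point can be made concrete: the gradient of MSE with respect to each prediction is 2(p − t)/n, so multiplying the loss by a constant multiplies every gradient by the same constant (a small NumPy illustration):

```python
import numpy as np

p = np.array([1.0, 2.0, 3.0])   # predictions
t = np.array([0.0, 0.0, 0.0])   # targets

# Analytic gradient of MSE w.r.t. predictions: dL/dp = 2 * (p - t) / n
grad = 2 * (p - t) / len(p)
print(grad)

# Scaling the loss by 100 scales every gradient entry by 100, which
# for plain SGD behaves like a 100x larger learning rate.
print(100 * grad)
```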
## Testing

### Inline Tests

The module includes immediate feedback:

```text
# Example inline test output
🔬 Unit Test: MSE Loss...
✅ MSE computes squared error correctly
✅ MSE gradient flows properly
✅ MSE handles batch dimensions correctly
📈 Progress: MSE Loss ✓

🔬 Unit Test: CrossEntropy Loss...
✅ CrossEntropy numerically stable
✅ CrossEntropy matches PyTorch implementation
✅ CrossEntropy handles multi-class problems
📈 Progress: CrossEntropy Loss ✓
```
### Export and Validate

```bash
# Export to package
tito export --module 04_losses

# Run test suite
tito test --module 04_losses
```
## Where This Code Lives

```text
tinytorch/
├── nn/
│   └── losses.py    # MSELoss, CrossEntropyLoss
└── core/
    └── tensor.py    # Underlying tensor operations
```

After export, use as:

```python
from tinytorch.nn import MSELoss, CrossEntropyLoss

# For regression
mse = MSELoss()
loss = mse(predictions, targets)

# For classification
ce = CrossEntropyLoss()
loss = ce(logits, labels)
```
## Systems Thinking Questions

1. Why does CrossEntropy require the log-sum-exp trick? What numerical instability occurs without it?
2. How does loss scale affect learning? If you multiply your loss by 100, what happens to gradients and learning?
3. Why do we use MSE for regression but CrossEntropy for classification? What makes each appropriate for its task?
4. How do loss functions connect to probability theory? What is the relationship between CrossEntropy and maximum likelihood?
5. What happens if you use the wrong loss function? Try MSE for classification or CrossEntropy for regression. What breaks?
## Real-World Connections

### Industry Applications
- Computer Vision: CrossEntropy trains all classification models (ResNet, EfficientNet, Vision Transformers)
- NLP: CrossEntropy is the foundation of all language models (GPT, BERT, T5)
- Recommendation: MSE and ranking losses optimize Netflix, Spotify, YouTube recommendations
- Robotics: MSE trains continuous control policies for manipulation and navigation
### Production Considerations
- Numerical Stability: Log-sum-exp trick prevents overflow/underflow in production systems
- Loss Scaling: Careful scaling enables mixed-precision training (FP16/BF16)
- Weighted Losses: Class weights handle imbalanced datasets in production
- Custom Losses: Production systems often combine multiple loss terms
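Class weighting, for example, can be sketched on top of the stable cross-entropy by weighting each example's negative log-likelihood by the weight of its true class. The function name and weighting scheme below are illustrative, not part of the module:

```python
import numpy as np

def weighted_cross_entropy(logits, targets, class_weights):
    """CE where each example counts by the weight of its true class."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    w = class_weights[targets]          # per-example weight
    return float((w * nll).sum() / w.sum())

# Rare class 1 gets 10x weight, so mistakes on it dominate the loss
logits = np.array([[2.0, 0.0], [2.0, 0.0]])
targets = np.array([0, 1])
weights = np.array([1.0, 10.0])
print(weighted_cross_entropy(logits, targets, weights))
```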
## What's Next?

Now that you can measure prediction quality, you're ready for Module 05: Autograd, where you'll learn how to automatically compute gradients of these loss functions, enabling the optimization that drives all of machine learning.

Preview: Autograd will automatically compute ∂Loss/∂weights for any loss function you build, making training possible without manual gradient derivations!
## Need Help?

- Check the inline tests in `modules/04_losses/losses_dev.py`
- Review the mathematical derivations in the module comments
- Compare your implementation against PyTorch's losses