# --- # jupyter: # jupytext: # text_representation: # extension: .py # format_name: percent # format_version: '1.3' # jupytext_version: 1.17.1 # --- # %% [markdown] """ # Loss Functions - Essential Training Objectives for Neural Networks Welcome to the Loss Functions module! You'll implement the essential loss functions that define learning objectives and enable neural networks to learn from data through gradient-based optimization. ## Learning Goals - Systems understanding: How loss functions define learning objectives and drive gradient-based optimization - Core implementation skill: Build MSE, CrossEntropy, and BinaryCrossEntropy with proper numerical stability - Pattern recognition: Understand how different loss functions shape learning dynamics and convergence behavior - Framework connection: See how your loss implementations mirror PyTorch's loss functions and autograd integration - Performance insight: Learn why numerically stable loss computation affects training reliability and convergence speed ## Build โ†’ Use โ†’ Reflect 1. **Build**: Complete loss function implementations with numerical stability and gradient support 2. **Use**: Apply loss functions to regression and classification problems with real neural networks 3. **Reflect**: Why do different loss functions lead to different learning behaviors, and when does numerical stability matter? ## What You'll Achieve By the end of this module, you'll understand: - Deep technical understanding of how loss functions translate learning problems into optimization objectives - Practical capability to implement production-quality loss functions with proper numerical stability - Systems insight into why loss function design affects training stability and convergence characteristics - Performance consideration of how numerical precision in loss computation affects training reliability - Connection to production ML systems and how frameworks implement robust loss computation ## Systems Reality Check ๐Ÿ’ก **Production Context**: PyTorch's loss functions use numerically stable implementations and automatic mixed precision to handle extreme gradients and values โšก **Performance Note**: Numerically unstable loss functions can cause training to fail catastrophically - proper implementation is critical for reliable ML systems """ # %% nbgrader={"grade": false, "grade_id": "losses-imports", "locked": false, "schema_version": 3, "solution": false, "task": false} #| default_exp core.losses #| export import numpy as np import sys import os # Import our building blocks - try package first, then local modules try: from tinytorch.core.tensor import Tensor # Note: For now, we'll use simplified implementations without full autograd # In a complete system, these would integrate with the autograd Variable system except ImportError: # For development, import from local modules sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_tensor')) from tensor_dev import Tensor # %% nbgrader={"grade": false, "grade_id": "losses-setup", "locked": false, "schema_version": 3, "solution": false, "task": false} print("๐Ÿ”ฅ TinyTorch Loss Functions Module") print(f"NumPy version: {np.__version__}") print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}") print("Ready to build loss functions for neural network training!") # %% [markdown] """ ## Where This Code Lives in the Final Package **Learning Side:** You work in modules/05_losses/losses_dev.py **Building Side:** Code exports to tinytorch.core.losses ```python # Final package structure: from tinytorch.core.losses import MeanSquaredError, CrossEntropyLoss, BinaryCrossEntropyLoss # All loss functions! from tinytorch.core.tensor import Tensor # The foundation from tinytorch.core.layers import Linear, Sequential # Network components ``` **Why this matters:** - **Learning:** Focused module for understanding loss functions and training objectives - **Production:** Proper organization like PyTorch's torch.nn with all loss functions together - **Consistency:** All loss functions live together in core.losses for easy access - **Integration:** Works seamlessly with tensors and neural networks for complete training systems """ # %% [markdown] """ # Understanding Loss Functions in Neural Networks ## What are Loss Functions? Loss functions (also called cost functions or objective functions) quantify how far your model's predictions are from the true targets. They provide: ๐ŸŽฏ **Learning Objectives**: Define what "good" performance means for your specific problem ๐Ÿ“ˆ **Gradient Signal**: Provide gradients that guide parameter updates during training ๐Ÿ” **Progress Measurement**: Enable monitoring training progress and convergence โš–๏ธ **Trade-off Control**: Balance different aspects of model performance (accuracy vs regularization) ## Why Loss Functions Matter for ML Systems ### The Learning Loop ``` 1. Forward Pass: Input โ†’ Network โ†’ Predictions 2. Loss Computation: Loss = loss_function(predictions, targets) 3. Backward Pass: Compute gradients of loss w.r.t. parameters 4. Parameter Update: parameters -= learning_rate * gradients 5. Repeat until convergence ``` ### Different Problems Need Different Loss Functions ๐Ÿ”ข **Regression Problems**: Mean Squared Error (MSE) - Predicting continuous values (house prices, temperatures, stock prices) - Penalizes large errors more than small errors (quadratic penalty) ๐Ÿท๏ธ **Multi-Class Classification**: Cross-Entropy Loss - Predicting one class from many options (image classification, text categorization) - Works with probability distributions over classes โšช **Binary Classification**: Binary Cross-Entropy Loss - Predicting yes/no, positive/negative (spam detection, medical diagnosis) - Optimized for two-class problems Let's implement these essential loss functions! """ # %% [markdown] """ # Mean Squared Error - Foundation for Regression MSE measures the average squared difference between predictions and targets. It's the most fundamental loss function for regression problems. ## Why MSE Matters ๐Ÿ“Š **Regression Standard**: The go-to loss function for predicting continuous values ๐ŸŽฏ **Quadratic Penalty**: Penalizes large errors more than small errors ๐Ÿ“ˆ **Smooth Gradients**: Provides smooth gradients for stable optimization ๐Ÿ”ข **Interpretable**: Loss value has same units as your target variable (squared) ## Mathematical Foundation For batch of predictions and targets: ``` MSE = (1/n) ร— ฮฃ(y_pred - y_true)ยฒ ``` ## Learning Objectives By implementing MSE, you'll understand: - How regression loss functions quantify prediction quality - Why squared error creates smooth gradients for optimization - How batch processing enables efficient training on multiple samples - The connection between mathematical loss functions and practical ML training """ # %% nbgrader={"grade": false, "grade_id": "mse-loss-implementation", "locked": false, "schema_version": 3, "solution": true, "task": false} #| export class MeanSquaredError: """ Mean Squared Error Loss for Regression Problems Computes the average squared difference between predictions and targets: MSE = (1/n) ร— ฮฃ(y_pred - y_true)ยฒ Features: - Numerically stable computation - Efficient batch processing - Clean gradient properties for optimization - Compatible with tensor operations Example Usage: mse = MeanSquaredError() loss = mse(predictions, targets) # Returns scalar loss value """ def __init__(self): """Initialize MSE loss function.""" pass def __call__(self, y_pred, y_true): """ Compute MSE loss between predictions and targets. Args: y_pred: Model predictions (Tensor, shape: [batch_size, ...]) y_true: True targets (Tensor, shape: [batch_size, ...]) Returns: Tensor with scalar loss value """ # Convert to tensors if needed if not isinstance(y_pred, Tensor): y_pred = Tensor(y_pred) if not isinstance(y_true, Tensor): y_true = Tensor(y_true) # Compute mean squared error diff = y_pred.data - y_true.data squared_diff = diff * diff mean_loss = np.mean(squared_diff) return Tensor(mean_loss) def forward(self, y_pred, y_true): """Alternative interface for forward pass.""" return self.__call__(y_pred, y_true) # %% [markdown] """ ## Testing Mean Squared Error Let's verify our MSE implementation works correctly with various test cases. """ # %% nbgrader={"grade": true, "grade_id": "test-mse-loss", "locked": true, "points": 3, "schema_version": 3, "solution": false, "task": false} def test_mse_loss(): """Test MSE loss implementation.""" print("๐Ÿงช Testing Mean Squared Error Loss...") mse = MeanSquaredError() # Test case 1: Perfect predictions (loss should be 0) y_pred = Tensor([[1.0, 2.0], [3.0, 4.0]]) y_true = Tensor([[1.0, 2.0], [3.0, 4.0]]) loss = mse(y_pred, y_true) assert abs(loss.data) < 1e-6, f"Perfect predictions should have loss โ‰ˆ 0, got {loss.data}" print("โœ… Perfect predictions test passed") # Test case 2: Known loss computation y_pred = Tensor([[1.0, 2.0]]) y_true = Tensor([[0.0, 1.0]]) loss = mse(y_pred, y_true) expected = 1.0 # [(1-0)ยฒ + (2-1)ยฒ] / 2 = [1 + 1] / 2 = 1.0 assert abs(loss.data - expected) < 1e-6, f"Expected loss {expected}, got {loss.data}" print("โœ… Known loss computation test passed") # Test case 3: Batch processing y_pred = Tensor([[1.0, 2.0], [3.0, 4.0]]) y_true = Tensor([[1.5, 2.5], [2.5, 3.5]]) loss = mse(y_pred, y_true) expected = 0.25 # All squared differences are 0.25 assert abs(loss.data - expected) < 1e-6, f"Expected batch loss {expected}, got {loss.data}" print("โœ… Batch processing test passed") # Test case 4: Single value y_pred = Tensor([5.0]) y_true = Tensor([3.0]) loss = mse(y_pred, y_true) expected = 4.0 # (5-3)ยฒ = 4 assert abs(loss.data - expected) < 1e-6, f"Expected single value loss {expected}, got {loss.data}" print("โœ… Single value test passed") print("๐ŸŽ‰ All MSE loss tests passed!") test_mse_loss() # %% [markdown] """ # Cross-Entropy Loss - Foundation for Multi-Class Classification Cross-Entropy Loss measures the difference between predicted probability distributions and true class labels. It's the standard loss function for multi-class classification problems. ## Why Cross-Entropy Matters ๐ŸŽฏ **Classification Standard**: The go-to loss function for multi-class problems ๐Ÿ“Š **Probability Interpretation**: Works naturally with softmax probability outputs ๐Ÿ”„ **Information Theory**: Measures information distance between distributions โš–๏ธ **Class Balance**: Handles multiple classes in a principled way ## Mathematical Foundation For predictions and class indices: ``` CrossEntropy = -ฮฃ y_true ร— log(softmax(y_pred)) ``` With softmax normalization: ``` softmax(x_i) = exp(x_i) / ฮฃ exp(x_j) ``` ## Learning Objectives By implementing Cross-Entropy, you'll understand: - How classification losses work with probability distributions - Why softmax normalization is essential for multi-class problems - The importance of numerical stability in log computations - How cross-entropy encourages confident, correct predictions """ # %% nbgrader={"grade": false, "grade_id": "crossentropy-loss-implementation", "locked": false, "schema_version": 3, "solution": true, "task": false} #| export class CrossEntropyLoss: """ Cross-Entropy Loss for Multi-Class Classification Problems Computes the cross-entropy between predicted probability distributions and true class labels with numerically stable implementation. Features: - Numerically stable softmax computation - Support for both class indices and one-hot encoding - Efficient batch processing - Automatic handling of edge cases Example Usage: ce_loss = CrossEntropyLoss() loss = ce_loss(logits, class_indices) # Returns scalar loss value """ def __init__(self): """Initialize CrossEntropy loss function.""" pass def __call__(self, y_pred, y_true): """ Compute CrossEntropy loss between predictions and targets. Args: y_pred: Model predictions/logits (Tensor, shape: [batch_size, num_classes]) y_true: True class indices (Tensor, shape: [batch_size]) or one-hot encoding Returns: Tensor with scalar loss value """ # Convert to tensors if needed if not isinstance(y_pred, Tensor): y_pred = Tensor(y_pred) if not isinstance(y_true, Tensor): y_true = Tensor(y_true) # Get data arrays pred_data = y_pred.data true_data = y_true.data # Handle both 1D and 2D prediction arrays if pred_data.ndim == 1: pred_data = pred_data.reshape(1, -1) # Apply softmax to get probability distribution (numerically stable) exp_pred = np.exp(pred_data - np.max(pred_data, axis=1, keepdims=True)) softmax_pred = exp_pred / np.sum(exp_pred, axis=1, keepdims=True) # Add small epsilon to avoid log(0) epsilon = 1e-15 softmax_pred = np.clip(softmax_pred, epsilon, 1.0 - epsilon) # Handle class indices vs one-hot encoding if len(true_data.shape) == 1: # y_true contains class indices batch_size = true_data.shape[0] log_probs = np.log(softmax_pred[np.arange(batch_size), true_data.astype(int)]) loss_value = -np.mean(log_probs) else: # y_true is one-hot encoded log_probs = np.log(softmax_pred) loss_value = -np.mean(np.sum(true_data * log_probs, axis=1)) return Tensor(loss_value) def forward(self, y_pred, y_true): """Alternative interface for forward pass.""" return self.__call__(y_pred, y_true) # %% [markdown] """ ## Testing Cross-Entropy Loss Let's verify our Cross-Entropy implementation handles various classification scenarios. """ # %% nbgrader={"grade": true, "grade_id": "test-crossentropy-loss", "locked": true, "points": 4, "schema_version": 3, "solution": false, "task": false} def test_crossentropy_loss(): """Test CrossEntropy loss implementation.""" print("๐Ÿงช Testing Cross-Entropy Loss...") ce = CrossEntropyLoss() # Test case 1: Perfect predictions y_pred = Tensor([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]) # Very confident correct predictions y_true = Tensor([0, 1]) # Class indices loss = ce(y_pred, y_true) assert loss.data < 0.1, f"Perfect predictions should have low loss, got {loss.data}" print("โœ… Perfect predictions test passed") # Test case 2: Random predictions (should have higher loss) y_pred = Tensor([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]) # Uniform after softmax y_true = Tensor([0, 1]) loss = ce(y_pred, y_true) expected_random = -np.log(1.0/3.0) # log(1/num_classes) for uniform distribution assert abs(loss.data - expected_random) < 0.1, f"Random predictions should have loss โ‰ˆ {expected_random}, got {loss.data}" print("โœ… Random predictions test passed") # Test case 3: Binary classification y_pred = Tensor([[2.0, 1.0], [1.0, 2.0]]) y_true = Tensor([0, 1]) loss = ce(y_pred, y_true) assert 0.0 < loss.data < 2.0, f"Binary classification loss should be reasonable, got {loss.data}" print("โœ… Binary classification test passed") # Test case 4: One-hot encoded labels y_pred = Tensor([[2.0, 1.0, 0.0], [0.0, 2.0, 1.0]]) y_true = Tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]) # One-hot encoded loss = ce(y_pred, y_true) assert 0.0 < loss.data < 2.0, f"One-hot encoded loss should be reasonable, got {loss.data}" print("โœ… One-hot encoded labels test passed") print("๐ŸŽ‰ All Cross-Entropy loss tests passed!") test_crossentropy_loss() # %% [markdown] """ # Binary Cross-Entropy Loss - Optimized for Binary Classification Binary Cross-Entropy Loss is specifically designed for binary classification problems. While you could use regular Cross-Entropy with 2 classes, BCE is more efficient and numerically stable for binary problems. ## Why Binary Cross-Entropy Matters โšช **Binary Optimization**: Specifically designed for two-class problems ๐Ÿ”ข **Efficiency**: More efficient than multi-class cross-entropy for binary cases ๐ŸŽฏ **Stability**: Better numerical stability with sigmoid outputs ๐Ÿ“ˆ **Standard Practice**: Industry standard for binary classification ## Mathematical Foundation For binary predictions and labels: ``` BCE = -y_true ร— log(ฯƒ(y_pred)) - (1-y_true) ร— log(1-ฯƒ(y_pred)) ``` Where ฯƒ(x) is the sigmoid function: ``` ฯƒ(x) = 1 / (1 + exp(-x)) ``` ## Learning Objectives By implementing Binary Cross-Entropy, you'll understand: - How binary classification differs from multi-class problems - Why sigmoid activation is natural for binary problems - The importance of numerical stability in sigmoid + log computations - How BCE loss shapes binary decision boundaries """ # %% nbgrader={"grade": false, "grade_id": "binary-crossentropy-implementation", "locked": false, "schema_version": 3, "solution": true, "task": false} #| export class BinaryCrossEntropyLoss: """ Binary Cross-Entropy Loss for Binary Classification Problems Computes binary cross-entropy between predictions and binary labels with numerically stable sigmoid + BCE implementation. Features: - Numerically stable computation from logits - Efficient batch processing - Automatic sigmoid application - Robust to extreme input values Example Usage: bce_loss = BinaryCrossEntropyLoss() loss = bce_loss(logits, binary_labels) # Returns scalar loss value """ def __init__(self): """Initialize Binary CrossEntropy loss function.""" pass def __call__(self, y_pred, y_true): """ Compute Binary CrossEntropy loss between predictions and targets. Args: y_pred: Model predictions/logits (Tensor, shape: [batch_size, 1] or [batch_size]) y_true: True binary labels (Tensor, shape: [batch_size, 1] or [batch_size]) Returns: Tensor with scalar loss value """ # Convert to tensors if needed if not isinstance(y_pred, Tensor): y_pred = Tensor(y_pred) if not isinstance(y_true, Tensor): y_true = Tensor(y_true) # Get flat arrays for computation logits = y_pred.data.flatten() labels = y_true.data.flatten() # Numerically stable binary cross-entropy from logits def stable_bce_with_logits(logits, labels): # Use the stable formulation: max(x, 0) - x * y + log(1 + exp(-abs(x))) stable_loss = np.maximum(logits, 0) - logits * labels + np.log(1 + np.exp(-np.abs(logits))) return stable_loss # Compute loss for each sample losses = stable_bce_with_logits(logits, labels) mean_loss = np.mean(losses) return Tensor(mean_loss) def forward(self, y_pred, y_true): """Alternative interface for forward pass.""" return self.__call__(y_pred, y_true) # %% [markdown] """ ## Testing Binary Cross-Entropy Loss Let's verify our Binary Cross-Entropy implementation handles binary classification correctly. """ # %% nbgrader={"grade": true, "grade_id": "test-binary-crossentropy", "locked": true, "points": 4, "schema_version": 3, "solution": false, "task": false} def test_binary_crossentropy_loss(): """Test Binary CrossEntropy loss implementation.""" print("๐Ÿงช Testing Binary Cross-Entropy Loss...") bce = BinaryCrossEntropyLoss() # Test case 1: Perfect predictions y_pred = Tensor([[10.0], [-10.0]]) # Very confident correct predictions y_true = Tensor([[1.0], [0.0]]) loss = bce(y_pred, y_true) assert loss.data < 0.1, f"Perfect predictions should have low loss, got {loss.data}" print("โœ… Perfect predictions test passed") # Test case 2: Random predictions (should have higher loss) y_pred = Tensor([[0.0], [0.0]]) # 0.5 probability after sigmoid y_true = Tensor([[1.0], [0.0]]) loss = bce(y_pred, y_true) expected_random = -np.log(0.5) # log(0.5) for random guessing assert abs(loss.data - expected_random) < 0.1, f"Random predictions should have loss โ‰ˆ {expected_random}, got {loss.data}" print("โœ… Random predictions test passed") # Test case 3: Batch processing y_pred = Tensor([[1.0], [2.0], [-1.0]]) y_true = Tensor([[1.0], [1.0], [0.0]]) loss = bce(y_pred, y_true) assert 0.0 < loss.data < 2.0, f"Batch processing loss should be reasonable, got {loss.data}" print("โœ… Batch processing test passed") # Test case 4: Extreme values (test numerical stability) y_pred = Tensor([[100.0], [-100.0]]) # Extreme logits y_true = Tensor([[1.0], [0.0]]) loss = bce(y_pred, y_true) assert not np.isnan(loss.data) and not np.isinf(loss.data), f"Extreme values should not cause NaN/Inf, got {loss.data}" assert loss.data < 1.0, f"Extreme correct predictions should have low loss, got {loss.data}" print("โœ… Extreme values test passed") print("๐ŸŽ‰ All Binary Cross-Entropy loss tests passed!") test_binary_crossentropy_loss() # %% [markdown] """ # Loss Function Comparison and Usage Guide ## When to Use Each Loss Function ### Mean Squared Error (MSE) **Best for:** Regression problems where you predict continuous values - **Examples:** Predicting house prices, temperature, stock values, ages - **Characteristics:** Penalizes large errors more than small ones - **Output:** Any real number - **Activation:** Usually none (linear output) ### Cross-Entropy Loss **Best for:** Multi-class classification (3+ classes) - **Examples:** Image classification (cats/dogs/birds), text categorization, medical diagnosis - **Characteristics:** Works with probability distributions over classes - **Output:** Class probabilities (sums to 1) - **Activation:** Softmax ### Binary Cross-Entropy Loss **Best for:** Binary classification (2 classes) - **Examples:** Spam detection, medical positive/negative, fraud detection - **Characteristics:** Optimized for binary decisions - **Output:** Single probability (0 to 1) - **Activation:** Sigmoid ## Numerical Stability Considerations All our implementations include numerical stability features: ๐Ÿ”ข **MSE**: Straightforward - no special numerical concerns ๐Ÿ“Š **Cross-Entropy**: Uses log-sum-exp trick, clips probabilities, handles edge cases โšช **Binary CE**: Uses stable logits formulation, clips extreme values ## Integration with Neural Networks ```python # Example usage in training loop model = Sequential([ Linear(784, 128), ReLU(), Linear(128, 10), Softmax() ]) # Choose appropriate loss for your problem loss_fn = CrossEntropyLoss() # For 10-class classification # In training loop predictions = model(inputs) loss = loss_fn(predictions, targets) # loss.backward() # Would trigger gradient computation (when autograd is integrated) ``` """ # %% [markdown] """ # Comprehensive Loss Function Testing Let's verify all our loss functions work correctly together and can be used interchangeably. """ # %% nbgrader={"grade": false, "grade_id": "comprehensive-loss-tests", "locked": false, "schema_version": 3, "solution": false, "task": false} def test_all_loss_functions(): """Test all loss functions work correctly together.""" print("๐Ÿ”ฌ Comprehensive Loss Function Testing") print("=" * 45) # Test 1: All losses can be instantiated print("\\n1. Loss Function Instantiation:") mse = MeanSquaredError() ce = CrossEntropyLoss() bce = BinaryCrossEntropyLoss() print(" โœ… All loss functions created successfully") # Test 2: Loss functions return appropriate types print("\\n2. Return Type Verification:") # MSE test pred = Tensor([[1.0, 2.0]]) target = Tensor([[1.0, 2.0]]) loss = mse(pred, target) assert isinstance(loss, Tensor), "MSE should return Tensor" assert loss.data.shape == (), "MSE should return scalar" # Cross-entropy test pred = Tensor([[1.0, 2.0], [2.0, 1.0]]) target = Tensor([1, 0]) loss = ce(pred, target) assert isinstance(loss, Tensor), "CrossEntropy should return Tensor" assert loss.data.shape == (), "CrossEntropy should return scalar" # Binary cross-entropy test pred = Tensor([[1.0], [-1.0]]) target = Tensor([[1.0], [0.0]]) loss = bce(pred, target) assert isinstance(loss, Tensor), "Binary CrossEntropy should return Tensor" assert loss.data.shape == (), "Binary CrossEntropy should return scalar" print(" โœ… All loss functions return correct types") # Test 3: Loss values are reasonable print("\\n3. Loss Value Sanity Checks:") # All losses should be non-negative assert mse.forward(Tensor([1.0]), Tensor([2.0])).data >= 0, "MSE should be non-negative" assert ce.forward(Tensor([[1.0, 0.0]]), Tensor([0])).data >= 0, "CrossEntropy should be non-negative" assert bce.forward(Tensor([1.0]), Tensor([1.0])).data >= 0, "Binary CrossEntropy should be non-negative" print(" โœ… All loss functions produce reasonable values") # Test 4: Perfect predictions give low loss print("\\n4. Perfect Prediction Tests:") perfect_mse = mse(Tensor([5.0]), Tensor([5.0])) perfect_ce = ce(Tensor([[10.0, 0.0]]), Tensor([0])) perfect_bce = bce(Tensor([10.0]), Tensor([1.0])) assert perfect_mse.data < 1e-10, f"Perfect MSE should be ~0, got {perfect_mse.data}" assert perfect_ce.data < 0.1, f"Perfect CE should be low, got {perfect_ce.data}" assert perfect_bce.data < 0.1, f"Perfect BCE should be low, got {perfect_bce.data}" print(" โœ… Perfect predictions produce low loss") print("\\n๐ŸŽ‰ All comprehensive tests passed!") print(" โ€ข Loss functions instantiate correctly") print(" โ€ข Return types are consistent (Tensor scalars)") print(" โ€ข Loss values are mathematically sound") print(" โ€ข Perfect predictions are handled correctly") print(" โ€ข Ready for integration with neural network training!") test_all_loss_functions() # %% [markdown] """ ## ๐Ÿค” ML Systems Thinking: Interactive Questions Now that you've implemented all the core loss functions, let's think about their implications for ML systems: """ # %% nbgrader={"grade": false, "grade_id": "question-1", "locked": false, "schema_version": 3, "solution": false, "task": false} # Question 1: Loss Function Selection and System Performance """ ๐Ÿค” **Question 1: Loss Function Selection Impact** You're building a production recommendation system that needs to predict user ratings (1-5 stars) for movies. You have three options: A) Treat as regression: Use MSE loss with continuous outputs (1.0-5.0) B) Treat as classification: Use CrossEntropy loss with 5 classes C) Use a custom loss that penalizes being off by multiple stars more heavily Analyze each approach considering: - Training speed and convergence behavior - Model interpretability and debugging - Production inference speed - How well each loss function matches the business objective - Edge case handling (what happens with ratings like 3.7?) Which approach would you choose and why? Consider both technical and business factors. """ # %% nbgrader={"grade": false, "grade_id": "question-2", "locked": false, "schema_version": 3, "solution": false, "task": false} # Question 2: Numerical Stability in Production """ ๐Ÿค” **Question 2: Numerical Stability Analysis** Your cross-entropy loss function works perfectly in development, but in production you start seeing NaN losses that crash training. Investigate the numerical stability issues: 1. What specific computations in cross-entropy can produce NaN or infinity values? 2. How do our implementations handle these edge cases? 3. What would happen if you removed the epsilon clipping in softmax computation? 4. How would you debug this in a production system with millions of training examples? Research areas to consider: - Floating point precision and representation limits - Log of very small numbers and exp of very large numbers - Batch processing effects on numerical stability - How PyTorch handles these same numerical challenges """ # %% nbgrader={"grade": false, "grade_id": "question-3", "locked": false, "schema_version": 3, "solution": false, "task": false} # Question 3: Loss Function Innovation """ ๐Ÿค” **Question 3: Custom Loss Functions for Real Problems** Standard loss functions don't always match real-world objectives. Consider these scenarios: **Scenario A**: Medical diagnosis where false negatives are 10x more costly than false positives **Scenario B**: Search ranking where being wrong about the top result is much worse than being wrong about result #50 **Scenario C**: Financial trading where large losses should be penalized exponentially more than small losses For each scenario: 1. Why would standard loss functions (MSE, CrossEntropy, BCE) be suboptimal? 2. How would you modify the loss function to better match the business objective? 3. What are the implementation challenges of custom loss functions? 4. How would you validate that your custom loss actually improves business outcomes? Design principles to consider: - Asymmetric penalties for different types of errors - Position-aware losses for ranking problems - Risk-adjusted losses for financial applications - How custom losses affect gradient flow and training dynamics """ # %% [markdown] """ ## ๐ŸŽฏ MODULE SUMMARY: Loss Functions - Learning Objectives Made Concrete ## ๐ŸŽฏ What You've Accomplished You've successfully implemented the complete foundation for neural network training objectives: ### โœ… **Complete Loss Function Library** - **Mean Squared Error**: Robust regression loss with smooth gradients for continuous value prediction - **Cross-Entropy Loss**: Multi-class classification loss with numerically stable softmax integration - **Binary Cross-Entropy Loss**: Optimized binary classification loss with stable sigmoid computation - **Numerical Stability**: All implementations handle edge cases and extreme values gracefully ### โœ… **Systems Understanding** - **Training Objectives**: How loss functions translate business problems into mathematical optimization objectives - **Numerical Stability**: Why proper implementation prevents catastrophic training failures in production - **Performance Characteristics**: Understanding computational complexity and batch processing efficiency - **Problem Matching**: When to use each loss function based on problem structure and data characteristics ### โœ… **ML Engineering Skills** - **Production-Ready Implementation**: Robust loss functions that handle real-world data edge cases - **Batch Processing**: Efficient computation across multiple samples for scalable training - **Error Handling**: Proper numerical stability measures for reliable production deployment - **Integration Ready**: Clean interfaces that work seamlessly with neural network training loops ## ๐Ÿ”— **Connection to Production ML Systems** Your implementations mirror the essential patterns used in: - **PyTorch's loss functions**: Same mathematical formulations with production-grade numerical stability - **TensorFlow's losses**: Identical computational patterns and stability measures - **Production ML pipelines**: The same loss functions that power real ML systems at scale - **Research frameworks**: Foundation for experimenting with custom loss functions and training objectives ## ๐Ÿš€ **What's Next** With solid loss function implementations, you're ready to: - **Build complete training loops** that optimize neural networks on real data - **Implement optimizers** that use loss gradients to update model parameters - **Create training infrastructure** with proper monitoring and convergence detection - **Experiment with custom losses** for specialized business objectives and research problems ## ๐Ÿ’ก **Key Systems Insights** 1. **Loss functions are the interface between business objectives and mathematical optimization** - they translate "what we want" into "what the computer can optimize" 2. **Numerical stability is not optional in production** - unstable loss computation causes catastrophic training failures 3. **Different problem types require different loss functions** - the choice affects both convergence speed and final model behavior 4. **Batch processing efficiency determines training speed** - loss computation must scale to handle large datasets efficiently You now understand how to implement the mathematical foundation that enables neural networks to learn from data and solve real-world problems! """ # %% nbgrader={"grade": false, "grade_id": "final-demo", "locked": false, "schema_version": 3, "solution": false, "task": false} if __name__ == "__main__": print("๐Ÿ”ฅ TinyTorch Loss Functions Module - Complete Demo") print("=" * 55) # Test all core implementations print("\\n๐Ÿงช Testing All Loss Functions:") test_mse_loss() test_crossentropy_loss() test_binary_crossentropy_loss() test_all_loss_functions() print("\\n" + "="*60) print("๐Ÿ“Š Loss Function Usage Examples") print("=" * 35) # Example 1: Regression with MSE print("\\n1. Regression Example (Predicting House Prices):") mse = MeanSquaredError() house_predictions = Tensor([[250000, 180000, 320000]]) # Predicted prices house_actual = Tensor([[240000, 175000, 315000]]) # Actual prices regression_loss = mse(house_predictions, house_actual) print(f" House price prediction loss: ${regression_loss.data:,.0f}ยฒ average error") # Example 2: Multi-class classification with CrossEntropy print("\\n2. Multi-Class Classification Example (Image Recognition):") ce = CrossEntropyLoss() image_logits = Tensor([[2.1, 0.5, -0.3, 1.8, 0.1], # Model outputs for 5 classes [-0.2, 3.1, 0.8, -1.0, 0.4]]) # (cat, dog, bird, fish, rabbit) true_classes = Tensor([0, 1]) # First image = cat, second = dog classification_loss = ce(image_logits, true_classes) print(f" Image classification loss: {classification_loss.data:.4f}") # Example 3: Binary classification with BCE print("\\n3. Binary Classification Example (Spam Detection):") bce = BinaryCrossEntropyLoss() spam_logits = Tensor([[1.2], [-0.8], [2.1], [-1.5]]) # Spam prediction logits spam_labels = Tensor([[1.0], [0.0], [1.0], [0.0]]) # 1=spam, 0=not spam spam_loss = bce(spam_logits, spam_labels) print(f" Spam detection loss: {spam_loss.data:.4f}") print("\\n" + "="*60) print("๐ŸŽฏ Loss Function Characteristics") print("=" * 35) # Compare perfect vs imperfect predictions print("\\n๐Ÿ“Š Perfect vs Random Predictions:") # Perfect predictions perfect_mse = mse(Tensor([5.0]), Tensor([5.0])) perfect_ce = ce(Tensor([[10.0, 0.0, 0.0]]), Tensor([0])) perfect_bce = bce(Tensor([10.0]), Tensor([1.0])) print(f" Perfect MSE loss: {perfect_mse.data:.6f}") print(f" Perfect CE loss: {perfect_ce.data:.6f}") print(f" Perfect BCE loss: {perfect_bce.data:.6f}") # Random predictions random_mse = mse(Tensor([3.0]), Tensor([5.0])) # Off by 2 random_ce = ce(Tensor([[0.0, 0.0, 0.0]]), Tensor([0])) # Uniform distribution random_bce = bce(Tensor([0.0]), Tensor([1.0])) # 50% confidence print(f" Random MSE loss: {random_mse.data:.6f}") print(f" Random CE loss: {random_ce.data:.6f}") print(f" Random BCE loss: {random_bce.data:.6f}") print("\\n๐ŸŽ‰ Complete loss function foundation ready!") print(" โœ… MSE for regression problems") print(" โœ… CrossEntropy for multi-class classification") print(" โœ… Binary CrossEntropy for binary classification") print(" โœ… Numerically stable implementations") print(" โœ… Production-ready batch processing") print(" โœ… Ready for neural network training!")