TinyTorch/modules/source/12_compression/compression_dev.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8828a71f",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "# Compression & Optimization - Making AI Models Efficient\n",
    "\n",
    "Welcome to the Compression module! This is where you'll learn to make neural networks smaller, faster, and more efficient for real-world deployment.\n",
    "\n",
    "## Learning Goals\n",
    "- Understand how model size affects deployment and why compression matters\n",
    "- Implement magnitude-based pruning to remove unimportant weights\n",
    "- Master quantization to reduce memory usage by 75%\n",
    "- Build knowledge distillation for training compact models\n",
    "- Create structured pruning to optimize network architectures\n",
    "- Compare compression techniques and their trade-offs\n",
    "\n",
    "## Build → Use → Optimize\n",
    "1. **Build**: Four compression techniques from scratch\n",
    "2. **Use**: Apply compression to real neural networks\n",
    "3. **Optimize**: Combine techniques for maximum efficiency gains"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "73c55227",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "compression-imports",
     "locked": false,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "#| default_exp core.compression\n",
    "\n",
    "#| export\n",
    "import numpy as np\n",
    "import sys\n",
    "import os\n",
    "import math\n",
    "from typing import List, Dict, Any, Optional, Union, Tuple\n",
    "from collections import defaultdict\n",
    "\n",
    "# Helper function to set up import paths\n",
    "def setup_import_paths():\n",
    "    \"\"\"Set up import paths for development modules.\"\"\"\n",
    "    import sys\n",
    "    import os\n",
    "    \n",
    "    # Add module directories to path\n",
    "    base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))\n",
    "    module_dirs = [\n",
    "        '01_tensor', '02_activations', '03_layers', '04_networks', \n",
    "        '05_cnn', '06_dataloader', '07_autograd', '08_optimizers', '09_training'\n",
    "    ]\n",
    "    \n",
    "    for module_dir in module_dirs:\n",
    "        sys.path.append(os.path.join(base_dir, module_dir))\n",
    "\n",
    "# Set up paths\n",
    "setup_import_paths()\n",
    "\n",
    "# Import all the building blocks we need\n",
    "try:\n",
    "    from tinytorch.core.tensor import Tensor\n",
    "    from tinytorch.core.layers import Dense\n",
    "    from tinytorch.core.networks import Sequential\n",
    "    from tinytorch.core.training import CrossEntropyLoss, Trainer\n",
    "except ImportError:\n",
    "    # For development, create mock classes or import from local modules\n",
    "    try:\n",
    "        from tensor_dev import Tensor\n",
    "        from layers_dev import Dense\n",
    "        from networks_dev import Sequential\n",
    "        from training_dev import CrossEntropyLoss, Trainer\n",
    "    except ImportError:\n",
    "        # Create minimal mock classes for development\n",
    "        class Tensor:\n",
    "            def __init__(self, data):\n",
    "                self.data = np.array(data)\n",
    "                self.shape = self.data.shape\n",
    "            \n",
    "            def __str__(self):\n",
    "                return f\"Tensor({self.data})\"\n",
    "        \n",
    "        class Dense:\n",
    "            def __init__(self, input_size, output_size):\n",
    "                self.input_size = input_size\n",
    "                self.output_size = output_size\n",
    "                self.weights = Tensor(np.random.randn(input_size, output_size) * 0.1)\n",
    "                self.bias = Tensor(np.zeros(output_size))\n",
    "            \n",
    "            def __str__(self):\n",
    "                return f\"Dense({self.input_size}, {self.output_size})\"\n",
    "        \n",
    "        class Sequential:\n",
    "            def __init__(self, layers=None):\n",
    "                self.layers = layers or []\n",
    "        \n",
    "        class CrossEntropyLoss:\n",
    "            def __init__(self):\n",
    "                pass\n",
    "        \n",
    "        class Trainer:\n",
    "            def __init__(self, model, optimizer, loss_function):\n",
    "                self.model = model\n",
    "                self.optimizer = optimizer\n",
    "                self.loss_function = loss_function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a937158b",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "compression-setup",
     "locked": false,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "print(\"🔥 TinyTorch Compression Module\")\n",
    "print(f\"NumPy version: {np.__version__}\")\n",
    "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
    "print(\"Ready to compress neural networks!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e2367326",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## 📦 Where This Code Lives in the Final Package\n",
    "\n",
    "**Learning Side:** You work in `modules/source/10_compression/compression_dev.py`  \n",
    "**Building Side:** Code exports to `tinytorch.core.compression`\n",
    "\n",
    "```python\n",
    "# Final package structure:\n",
    "from tinytorch.core.compression import (\n",
    "    prune_weights_by_magnitude,    # Remove unimportant weights\n",
    "    quantize_layer_weights,        # Reduce precision for memory savings\n",
    "    DistillationLoss,              # Train compact models with teacher guidance\n",
    "    prune_layer_neurons,           # Remove entire neurons/channels\n",
    "    CompressionMetrics             # Measure model size and efficiency\n",
    ")\n",
    "from tinytorch.core.layers import Dense     # Target for compression\n",
    "from tinytorch.core.networks import Sequential  # Model architectures\n",
    "```\n",
    "\n",
    "**Why this matters:**\n",
    "- **Learning:** Focused module for understanding model efficiency\n",
    "- **Production:** Proper organization like PyTorch's compression tools\n",
    "- **Consistency:** All compression techniques live together in `core.compression`\n",
    "- **Foundation:** Essential for deploying AI in resource-constrained environments"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6860a130",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## What is Model Compression?\n",
    "\n",
    "### The Problem: AI Models Are Getting Huge\n",
    "Modern neural networks are massive:\n",
    "- **GPT-3**: 175 billion parameters (350GB memory)\n",
    "- **ResNet-152**: 60 million parameters (240MB memory)\n",
    "- **BERT-Large**: 340 million parameters (1.3GB memory)\n",
    "\n",
    "But deployment environments have constraints:\n",
    "- **Mobile phones**: Limited memory and battery\n",
    "- **Edge devices**: No internet, minimal compute\n",
    "- **Real-time systems**: Strict latency requirements\n",
    "- **Cost optimization**: Expensive inference in cloud\n",
    "\n",
    "### The Solution: Intelligent Compression\n",
    "**Model compression** reduces model size while preserving performance:\n",
    "- **Pruning**: Remove unimportant weights and neurons\n",
    "- **Quantization**: Use fewer bits per parameter\n",
    "- **Knowledge distillation**: Train small models to mimic large ones\n",
    "- **Structured optimization**: Modify architectures for efficiency\n",
    "\n",
    "### Real-World Impact\n",
    "- **Mobile AI**: Apps like Google Translate work offline\n",
    "- **Autonomous vehicles**: Real-time processing with limited compute\n",
    "- **IoT devices**: Smart cameras, voice assistants, sensors\n",
    "- **Cost savings**: Reduced inference costs in production systems\n",
    "\n",
    "### What We'll Build\n",
    "1. **Magnitude-based pruning**: Remove smallest weights\n",
    "2. **Quantization**: Convert FP32 → INT8 for 75% memory reduction\n",
    "3. **Knowledge distillation**: Large models teach small models\n",
    "4. **Structured pruning**: Remove entire neurons systematically\n",
    "5. **Compression metrics**: Measure efficiency and accuracy trade-offs\n",
    "6. **Integrated optimization**: Combine techniques for maximum benefit"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6dc048fd",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "## Step 1: Understanding Model Size and Parameters\n",
    "\n",
    "### What Makes Models Large?\n",
    "Neural networks have millions of parameters:\n",
    "- **Dense layers**: Weight matrices `(input_size, output_size)`\n",
    "- **Bias vectors**: One per output neuron\n",
    "- **CNN kernels**: Repeated across channels and filters\n",
    "- **Embeddings**: Large vocabulary mappings\n",
    "\n",
    "### The Memory Reality Check\n",
    "Let's see how much memory different architectures use:\n",
    "\n",
    "```python\n",
    "# Simple MLP for MNIST\n",
    "layer1 = Dense(784, 128)    # 784 * 128 = 100,352 params\n",
    "layer2 = Dense(128, 64)     # 128 * 64 = 8,192 params  \n",
    "layer3 = Dense(64, 10)      # 64 * 10 = 640 params\n",
    "# Total: 109,184 params ≈ 437KB (FP32)\n",
    "\n",
    "# Larger network for CIFAR-10\n",
    "layer1 = Dense(3072, 512)   # 3072 * 512 = 1,572,864 params\n",
    "layer2 = Dense(512, 256)    # 512 * 256 = 131,072 params\n",
    "layer3 = Dense(256, 128)    # 256 * 128 = 32,768 params\n",
    "layer4 = Dense(128, 10)     # 128 * 10 = 1,280 params\n",
    "# Total: 1,737,984 params ≈ 7MB (FP32)\n",
    "```\n",
    "\n",
    "### Why Size Matters\n",
    "- **Memory usage**: Each FP32 parameter uses 4 bytes\n",
    "- **Storage**: Model files need to be downloaded/stored\n",
    "- **Inference speed**: More parameters = more computation\n",
    "- **Energy consumption**: Larger models drain battery faster\n",
    "\n",
    "### The Efficiency Spectrum\n",
    "Different applications need different efficiency levels:\n",
    "- **Research**: Accuracy first, efficiency second\n",
    "- **Production**: Balance accuracy and efficiency\n",
    "- **Mobile**: Strict size constraints (< 10MB)\n",
    "- **Edge**: Extreme efficiency requirements (< 1MB)\n",
    "\n",
    "### Real-World Examples\n",
    "- **MobileNet**: Designed for mobile deployment\n",
    "- **DistilBERT**: 60% smaller than BERT with 97% performance\n",
    "- **TinyML**: Models under 1MB for microcontrollers\n",
    "- **Neural architecture search**: Automated efficiency optimization\n",
    "\n",
    "Let's build tools to measure and analyze model size!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "76eed78f",
   "metadata": {
    "lines_to_next_cell": 1,
    "nbgrader": {
     "grade": false,
     "grade_id": "compression-metrics",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "class CompressionMetrics:\n",
    "    \"\"\"\n",
    "    Utilities for measuring model size, sparsity, and compression efficiency.\n",
    "    \n",
    "    This class provides tools to analyze neural network models and understand\n",
    "    their memory footprint, parameter distribution, and compression potential.\n",
    "    \"\"\"\n",
    "    \n",
    "    def __init__(self):\n",
    "        \"\"\"Initialize compression metrics analyzer.\"\"\"\n",
    "        pass\n",
    "    \n",
    "    def count_parameters(self, model: Sequential) -> Dict[str, int]:\n",
    "        \"\"\"\n",
    "        Count parameters in a neural network model.\n",
    "        \n",
    "        Args:\n",
    "            model: Sequential model to analyze\n",
    "            \n",
    "        Returns:\n",
    "            Dictionary with parameter counts per layer and total\n",
    "            \n",
    "        TODO: Implement parameter counting for neural network analysis.\n",
    "        \n",
    "        STEP-BY-STEP IMPLEMENTATION:\n",
    "        1. Initialize counters for different parameter types\n",
    "        2. Iterate through each layer in the model\n",
    "        3. Count weights and biases for each layer\n",
    "        4. Calculate total parameters across all layers\n",
    "        5. Return detailed breakdown dictionary\n",
    "        \n",
    "        EXAMPLE OUTPUT:\n",
    "        {\n",
    "            'layer_0_weights': 100352,\n",
    "            'layer_0_bias': 128,\n",
    "            'layer_1_weights': 8192,\n",
    "            'layer_1_bias': 64,\n",
    "            'layer_2_weights': 640,\n",
    "            'layer_2_bias': 10,\n",
    "            'total_parameters': 109386,\n",
    "            'total_weights': 109184,\n",
    "            'total_bias': 202\n",
    "        }\n",
    "        \n",
    "        IMPLEMENTATION HINTS:\n",
    "        - Use hasattr() to check if layer has weights/bias attributes\n",
    "        - Weight matrices have shape (input_size, output_size)\n",
    "        - Bias vectors have shape (output_size,)\n",
    "        - Use np.prod() to calculate total elements from shape\n",
    "        - Track layer index for detailed reporting\n",
    "        \n",
    "        LEARNING CONNECTIONS:\n",
    "        - This is like `model.numel()` in PyTorch\n",
    "        - Understanding where parameters are concentrated\n",
    "        - Foundation for compression target selection\n",
    "        \"\"\"\n",
    "        ### BEGIN SOLUTION\n",
    "        param_counts = {}\n",
    "        total_params = 0\n",
    "        total_weights = 0\n",
    "        total_bias = 0\n",
    "        \n",
    "        for i, layer in enumerate(model.layers):\n",
    "            # Count weights if layer has them\n",
    "            if hasattr(layer, 'weights') and layer.weights is not None:\n",
    "                # Handle different weight formats\n",
    "                if hasattr(layer.weights, 'shape'):\n",
    "                    weight_count = np.prod(layer.weights.shape)\n",
    "                else:\n",
    "                    weight_count = np.prod(layer.weights.data.shape)\n",
    "                \n",
    "                param_counts[f'layer_{i}_weights'] = weight_count\n",
    "                total_weights += weight_count\n",
    "                total_params += weight_count\n",
    "            \n",
    "            # Count bias if layer has them\n",
    "            if hasattr(layer, 'bias') and layer.bias is not None:\n",
    "                # Handle different bias formats\n",
    "                if hasattr(layer.bias, 'shape'):\n",
    "                    bias_count = np.prod(layer.bias.shape)\n",
    "                else:\n",
    "                    bias_count = np.prod(layer.bias.data.shape)\n",
    "                \n",
    "                param_counts[f'layer_{i}_bias'] = bias_count\n",
    "                total_bias += bias_count\n",
    "                total_params += bias_count\n",
    "        \n",
    "        # Add summary statistics\n",
    "        param_counts['total_parameters'] = total_params\n",
    "        param_counts['total_weights'] = total_weights\n",
    "        param_counts['total_bias'] = total_bias\n",
    "        \n",
    "        return param_counts\n",
    "        ### END SOLUTION \n",
    "\n",
    "    def calculate_model_size(self, model: Sequential, dtype: str = 'float32') -> Dict[str, Any]:\n",
    "        \"\"\"\n",
    "        Calculate memory footprint of a neural network model.\n",
    "        \n",
    "        Args:\n",
    "            model: Sequential model to analyze\n",
    "            dtype: Data type for size calculation ('float32', 'float16', 'int8')\n",
    "            \n",
    "        Returns:\n",
    "            Dictionary with size information in different units\n",
    "        \"\"\"\n",
    "        # Get parameter count\n",
    "        param_info = self.count_parameters(model)\n",
    "        total_params = param_info['total_parameters']\n",
    "        \n",
    "        # Determine bytes per parameter\n",
    "        bytes_per_param = {\n",
    "            'float32': 4,\n",
    "            'float16': 2,\n",
    "            'int8': 1\n",
    "        }.get(dtype, 4)\n",
    "        \n",
    "        # Calculate sizes\n",
    "        total_bytes = total_params * bytes_per_param\n",
    "        size_kb = total_bytes / 1024\n",
    "        size_mb = size_kb / 1024\n",
    "        \n",
    "        return {\n",
    "            'total_parameters': total_params,\n",
    "            'bytes_per_parameter': bytes_per_param,\n",
    "            'total_bytes': total_bytes,\n",
    "            'size_kb': round(size_kb, 2),\n",
    "            'size_mb': round(size_mb, 2),\n",
    "            'dtype': dtype\n",
    "        }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1b810a6a",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "test-compression-metrics",
     "locked": false,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "def test_compression_metrics():\n",
    "    \"\"\"\n",
    "    ### 🧪 Unit Test: CompressionMetrics\n",
    "    \n",
    "    Test parameter counting and model size analysis functionality.\n",
    "    \n",
    "    **This is a unit test** - it tests model size analysis in isolation.\n",
    "    \"\"\"\n",
    "    print(\"🔬 Unit Test: CompressionMetrics\")\n",
    "    print(\"**This is a unit test** - it tests model size analysis in isolation.\")\n",
    "    \n",
    "    # Create test model\n",
    "    layers = [\n",
    "        Dense(784, 128),  # 784 * 128 + 128 = 100,480 params\n",
    "        Dense(128, 64),   # 128 * 64 + 64 = 8,256 params\n",
    "        Dense(64, 10)     # 64 * 10 + 10 = 650 params\n",
    "    ]\n",
    "    model = Sequential(layers)\n",
    "    \n",
    "    # Test parameter counting\n",
    "    metrics = CompressionMetrics()\n",
    "    param_counts = metrics.count_parameters(model)\n",
    "    \n",
    "    # Verify parameter counts\n",
    "    assert param_counts['layer_0_weights'] == 100352, f\"Expected 100352, got {param_counts['layer_0_weights']}\"\n",
    "    assert param_counts['layer_0_bias'] == 128, f\"Expected 128, got {param_counts['layer_0_bias']}\"\n",
    "    assert param_counts['total_parameters'] == 109386, f\"Expected 109386, got {param_counts['total_parameters']}\"\n",
    "    \n",
    "    print(\"📈 Progress: CompressionMetrics ✓\")\n",
    "    print(\"🎯 CompressionMetrics behavior:\")\n",
    "    print(\"  - Counts parameters across all layers\")\n",
    "    print(\"  - Provides detailed breakdown by layer\")\n",
    "    print(\"  - Separates weight and bias counts\")\n",
    "    print(\"  - Foundation for compression analysis\")\n",
    "    print()\n",
    "\n",
    "# Run the test\n",
    "test_compression_metrics() "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a83a0b59",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "## Step 2: Magnitude-Based Pruning - Removing Unimportant Weights\n",
    "\n",
    "### What is Magnitude-Based Pruning?\n",
    "**Magnitude-based pruning** removes weights with the smallest absolute values, based on the hypothesis that small weights contribute less to the model's performance.\n",
    "\n",
    "### The Algorithm\n",
    "1. **Calculate magnitude**: `|weight|` for each parameter\n",
    "2. **Set threshold**: Choose cutoff (e.g., 50th percentile)\n",
    "3. **Create mask**: `mask = |weight| > threshold`\n",
    "4. **Apply pruning**: `pruned_weight = weight * mask`\n",
    "\n",
    "### Why This Works\n",
    "- **Redundancy**: Neural networks are over-parameterized\n",
    "- **Lottery ticket hypothesis**: Small subnetworks can match full performance\n",
    "- **Magnitude correlation**: Larger weights often more important\n",
    "- **Gradual degradation**: Performance drops slowly with pruning\n",
    "\n",
    "### Real-World Applications\n",
    "- **Mobile deployment**: Reduce model size for smartphones\n",
    "- **Edge computing**: Fit models on resource-constrained devices\n",
    "- **Inference acceleration**: Fewer parameters = faster computation\n",
    "- **Memory optimization**: Sparse matrices save storage\n",
    "\n",
    "### Pruning Strategies\n",
    "- **Global**: Single threshold across all layers\n",
    "- **Layer-wise**: Different thresholds per layer\n",
    "- **Structured**: Remove entire neurons/channels\n",
    "- **Gradual**: Increase sparsity during training\n",
    "\n",
    "### Performance vs Sparsity Trade-off\n",
    "- **10-30% sparsity**: Minimal accuracy loss\n",
    "- **50-70% sparsity**: Moderate accuracy drop\n",
    "- **80-90% sparsity**: Significant accuracy loss\n",
    "- **95%+ sparsity**: Requires careful tuning\n",
    "\n",
    "Let's implement magnitude-based pruning!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fa8e7fca",
   "metadata": {
    "lines_to_next_cell": 1,
    "nbgrader": {
     "grade": false,
     "grade_id": "magnitude-pruning",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "def prune_weights_by_magnitude(layer: Dense, pruning_ratio: float = 0.5) -> Tuple[Dense, Dict[str, Any]]:\n",
    "    \"\"\"\n",
    "    Prune weights in a Dense layer by magnitude.\n",
    "    \n",
    "    Args:\n",
    "        layer: Dense layer to prune\n",
    "        pruning_ratio: Fraction of weights to remove (0.0 to 1.0)\n",
    "        \n",
    "    Returns:\n",
    "        Tuple of (pruned_layer, pruning_info)\n",
    "        \n",
    "    TODO: Implement magnitude-based weight pruning.\n",
    "    \n",
    "    STEP-BY-STEP IMPLEMENTATION:\n",
    "    1. Get weight matrix from layer\n",
    "    2. Calculate absolute values (magnitudes)\n",
    "    3. Find threshold using percentile\n",
    "    4. Create binary mask for weights above threshold\n",
    "    5. Apply mask to weights (set small weights to zero)\n",
    "    6. Update layer weights and return pruning statistics\n",
    "    \n",
    "    EXAMPLE USAGE:\n",
    "    ```python\n",
    "    layer = Dense(784, 128)\n",
    "    pruned_layer, info = prune_weights_by_magnitude(layer, pruning_ratio=0.3)\n",
    "    print(f\"Pruned {info['weights_removed']} weights, sparsity: {info['sparsity']:.2f}\")\n",
    "    ```\n",
    "    \n",
    "    IMPLEMENTATION HINTS:\n",
    "    - Use np.percentile() with pruning_ratio * 100 for threshold\n",
    "    - Create mask with np.abs(weights) > threshold\n",
    "    - Apply mask by element-wise multiplication\n",
    "    - Count zeros to calculate sparsity\n",
    "    - Return original layer (modified) and statistics\n",
    "    \n",
    "    LEARNING CONNECTIONS:\n",
    "    - This is the foundation of network pruning\n",
    "    - Magnitude pruning is simplest but effective\n",
    "    - Sparsity = fraction of weights that are zero\n",
    "    - Threshold selection affects accuracy vs compression trade-off\n",
    "    \"\"\"\n",
    "    ### BEGIN SOLUTION\n",
    "    # Get current weights and ensure they're numpy arrays\n",
    "    weights = layer.weights.data\n",
    "    if not isinstance(weights, np.ndarray):\n",
    "        weights = np.array(weights)\n",
    "    \n",
    "    original_weights = weights.copy()\n",
    "    \n",
    "    # Calculate magnitudes and threshold\n",
    "    magnitudes = np.abs(weights)\n",
    "    threshold = np.percentile(magnitudes, pruning_ratio * 100)\n",
    "    \n",
    "    # Create mask and apply pruning\n",
    "    mask = magnitudes > threshold\n",
    "    pruned_weights = weights * mask\n",
    "    \n",
    "    # Update layer weights\n",
    "    layer.weights.data = pruned_weights\n",
    "    \n",
    "    # Calculate pruning statistics\n",
    "    total_weights = weights.size\n",
    "    zero_weights = np.sum(pruned_weights == 0)\n",
    "    weights_removed = zero_weights - np.sum(original_weights == 0)\n",
    "    sparsity = zero_weights / total_weights\n",
    "    \n",
    "    pruning_info = {\n",
    "        'pruning_ratio': pruning_ratio,\n",
    "        'threshold': float(threshold),\n",
    "        'total_weights': total_weights,\n",
    "        'weights_removed': weights_removed,\n",
    "        'remaining_weights': total_weights - zero_weights,\n",
    "        'sparsity': float(sparsity),\n",
    "        'compression_ratio': 1 / (1 - sparsity) if sparsity < 1 else float('inf')\n",
    "    }\n",
    "    \n",
    "    return layer, pruning_info\n",
    "    ### END SOLUTION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a20feb97",
   "metadata": {
    "lines_to_next_cell": 1,
    "nbgrader": {
     "grade": false,
     "grade_id": "calculate-sparsity",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "def calculate_sparsity(layer: Dense) -> float:\n",
    "    \"\"\"\n",
    "    Calculate sparsity (fraction of zero weights) in a Dense layer.\n",
    "    \n",
    "    Args:\n",
    "        layer: Dense layer to analyze\n",
    "        \n",
    "    Returns:\n",
    "        Sparsity as float between 0.0 and 1.0\n",
    "        \n",
    "    TODO: Implement sparsity calculation.\n",
    "    \n",
    "    STEP-BY-STEP IMPLEMENTATION:\n",
    "    1. Get weight matrix from layer\n",
    "    2. Count total number of weights\n",
    "    3. Count number of zero weights\n",
    "    4. Calculate sparsity = zero_weights / total_weights\n",
    "    5. Return as float\n",
    "    \n",
    "    EXAMPLE USAGE:\n",
    "    ```python\n",
    "    layer = Dense(100, 50)\n",
    "    sparsity = calculate_sparsity(layer)\n",
    "    print(f\"Layer sparsity: {sparsity:.2%}\")\n",
    "    ```\n",
    "    \n",
    "    IMPLEMENTATION HINTS:\n",
    "    - Use np.sum() with condition to count zeros\n",
    "    - Use .size attribute for total elements\n",
    "    - Return 0.0 if no weights (edge case)\n",
    "    - Sparsity of 0.0 = dense, 1.0 = completely sparse\n",
    "    \n",
    "    LEARNING CONNECTIONS:\n",
    "    - Sparsity is key metric for compression\n",
    "    - Higher sparsity = more compression\n",
    "    - Sparsity patterns affect hardware efficiency\n",
    "    \"\"\"\n",
    "    ### BEGIN SOLUTION\n",
    "    if not hasattr(layer, 'weights') or layer.weights is None:\n",
    "        return 0.0\n",
    "    \n",
    "    weights = layer.weights.data\n",
    "    if not isinstance(weights, np.ndarray):\n",
    "        weights = np.array(weights)\n",
    "    \n",
    "    total_weights = weights.size\n",
    "    zero_weights = np.sum(weights == 0)\n",
    "    \n",
    "    return zero_weights / total_weights if total_weights > 0 else 0.0\n",
    "    ### END SOLUTION "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3082fa17",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "test-pruning",
     "locked": false,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "def test_magnitude_pruning():\n",
    "    \"\"\"\n",
    "    ### 🧪 Unit Test: Magnitude-Based Pruning\n",
    "    \n",
    "    Test weight pruning algorithms and sparsity calculation.\n",
    "    \n",
    "    **This is a unit test** - it tests weight pruning in isolation.\n",
    "    \"\"\"\n",
    "    print(\"🔬 Unit Test: Magnitude-Based Pruning\")\n",
    "    print(\"**This is a unit test** - it tests weight pruning in isolation.\")\n",
    "    \n",
    "    # Create test layer\n",
    "    layer = Dense(100, 50)\n",
    "    \n",
    "    # Test basic pruning\n",
    "    pruned_layer, info = prune_weights_by_magnitude(layer, pruning_ratio=0.3)\n",
    "    \n",
    "    # Verify pruning results\n",
    "    assert info['pruning_ratio'] == 0.3, f\"Expected 0.3, got {info['pruning_ratio']}\"\n",
    "    assert info['total_weights'] == 5000, f\"Expected 5000, got {info['total_weights']}\"\n",
    "    assert info['sparsity'] >= 0.3, f\"Sparsity should be at least 0.3, got {info['sparsity']}\"\n",
    "    \n",
    "    print(f\"✅ Basic pruning works: {info['sparsity']:.2%} sparsity\")\n",
    "    \n",
    "    # Test sparsity calculation\n",
    "    sparsity = calculate_sparsity(layer)\n",
    "    assert abs(sparsity - info['sparsity']) < 0.001, f\"Sparsity mismatch: {sparsity} vs {info['sparsity']}\"\n",
    "    print(f\"✅ Sparsity calculation works: {sparsity:.2%}\")\n",
    "    \n",
    "    # Test edge cases\n",
    "    empty_layer = Dense(10, 10)\n",
    "    empty_layer.weights.data = np.zeros((10, 10))\n",
    "    sparsity_empty = calculate_sparsity(empty_layer)\n",
    "    assert sparsity_empty == 1.0, f\"Empty layer should have 1.0 sparsity, got {sparsity_empty}\"\n",
    "    \n",
    "    print(\"✅ Edge cases work correctly\")\n",
    "    \n",
    "    # Test different pruning ratios\n",
    "    layer2 = Dense(50, 25)\n",
    "    _, info50 = prune_weights_by_magnitude(layer2, pruning_ratio=0.5)\n",
    "    \n",
    "    layer3 = Dense(50, 25)\n",
    "    _, info80 = prune_weights_by_magnitude(layer3, pruning_ratio=0.8)\n",
    "    \n",
    "    assert info80['sparsity'] > info50['sparsity'], \"Higher pruning ratio should give higher sparsity\"\n",
    "    print(f\"✅ Different pruning ratios work: 50% ratio = {info50['sparsity']:.2%}, 80% ratio = {info80['sparsity']:.2%}\")\n",
    "    \n",
    "    print(\"📈 Progress: Magnitude-Based Pruning ✓\")\n",
    "    print(\"🎯 Pruning behavior:\")\n",
    "    print(\"  - Removes weights with smallest absolute values\")\n",
    "    print(\"  - Maintains layer structure and connectivity\")\n",
    "    print(\"  - Provides detailed statistics for analysis\")\n",
    "    print(\"  - Scales to different pruning ratios\")\n",
    "    print()\n",
    "\n",
    "# Run the test\n",
    "test_magnitude_pruning() "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89e3cba2",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "## Step 3: Quantization - Reducing Precision for Memory Efficiency\n",
    "\n",
    "### What is Quantization?\n",
    "**Quantization** reduces the precision of weights from FP32 (32-bit) to lower bit-widths like INT8 (8-bit), achieving significant memory savings with minimal accuracy loss.\n",
    "\n",
    "### The Mathematical Foundation\n",
    "Quantization maps continuous floating-point values to discrete integer values:\n",
    "\n",
    "```\n",
    "quantized_value = round((fp_value - min_val) / scale)\n",
    "scale = (max_val - min_val) / (2^bits - 1)\n",
    "```\n",
    "\n",
    "### Why Quantization Works\n",
    "- **Redundant precision**: Neural networks are robust to precision reduction\n",
    "- **Hardware efficiency**: Integer operations are faster than floating-point\n",
    "- **Memory savings**: 4x reduction (FP32 → INT8) in memory usage\n",
    "- **Cache efficiency**: More parameters fit in limited cache memory\n",
    "\n",
    "### Types of Quantization\n",
    "- **Post-training**: Quantize after training is complete\n",
    "- **Quantization-aware training**: Train with quantization simulation\n",
    "- **Dynamic**: Quantize activations at runtime\n",
    "- **Static**: Pre-compute quantization parameters\n",
    "\n",
    "### Real-World Impact\n",
    "- **Mobile deployment**: 75% memory reduction enables smartphone AI\n",
    "- **Edge computing**: Fit larger models on constrained devices\n",
    "- **Cloud efficiency**: Reduce bandwidth and storage costs\n",
    "- **Battery life**: Lower power consumption for mobile devices\n",
    "\n",
    "### Common Bit-Widths\n",
    "- **FP32**: Full precision (baseline)\n",
    "- **FP16**: Half precision (2x memory reduction)\n",
    "- **INT8**: 8-bit integers (4x memory reduction)\n",
    "- **INT4**: 4-bit integers (8x memory reduction, aggressive)\n",
    "\n",
    "Let's implement quantization algorithms!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6afd2132",
   "metadata": {
    "lines_to_next_cell": 1,
    "nbgrader": {
     "grade": false,
     "grade_id": "quantization",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "def quantize_layer_weights(layer: Dense, bits: int = 8) -> Tuple[Dense, Dict[str, Any]]:\n",
    "    \"\"\"\n",
    "    Quantize layer weights to reduce precision.\n",
    "    \n",
    "    Args:\n",
    "        layer: Dense layer to quantize\n",
    "        bits: Number of bits for quantization (8, 16, etc.)\n",
    "        \n",
    "    Returns:\n",
    "        Tuple of (quantized_layer, quantization_info)\n",
    "        \n",
    "    TODO: Implement weight quantization for memory efficiency.\n",
    "    \n",
    "    STEP-BY-STEP IMPLEMENTATION:\n",
    "    1. Get weight matrix from layer\n",
    "    2. Find min and max values for quantization range\n",
    "    3. Calculate scale factor: (max - min) / (2^bits - 1)\n",
    "    4. Quantize: round((weights - min) / scale)\n",
    "    5. Dequantize back to float: quantized * scale + min\n",
    "    6. Update layer weights and return statistics\n",
    "    \n",
    "    EXAMPLE USAGE:\n",
    "    ```python\n",
    "    layer = Dense(784, 128)\n",
    "    quantized_layer, info = quantize_layer_weights(layer, bits=8)\n",
    "    print(f\"Memory reduction: {info['memory_reduction']:.1f}x\")\n",
    "    ```\n",
    "    \n",
    "    IMPLEMENTATION HINTS:\n",
    "    - Use np.min() and np.max() to find weight range\n",
    "    - Clamp quantized values to valid range [0, 2^bits-1]\n",
    "    - Store original dtype for memory calculation\n",
    "    - Calculate theoretical memory savings\n",
    "    \n",
    "    LEARNING CONNECTIONS:\n",
    "    - This is how mobile AI frameworks work\n",
    "    - Hardware accelerators optimize for INT8\n",
    "    - Precision-performance trade-off is key\n",
    "    \"\"\"\n",
    "    ### BEGIN SOLUTION\n",
    "    # Get current weights and ensure they're numpy arrays\n",
    "    weights = layer.weights.data\n",
    "    if not isinstance(weights, np.ndarray):\n",
    "        weights = np.array(weights)\n",
    "    \n",
    "    original_weights = weights.copy()\n",
    "    original_dtype = weights.dtype\n",
    "    \n",
    "    # Find min and max for quantization range\n",
    "    w_min, w_max = np.min(weights), np.max(weights)\n",
    "    \n",
    "    # Calculate scale factor\n",
    "    scale = (w_max - w_min) / (2**bits - 1)\n",
    "    \n",
    "    # Quantize weights\n",
    "    quantized = np.round((weights - w_min) / scale)\n",
    "    quantized = np.clip(quantized, 0, 2**bits - 1)  # Clamp to valid range\n",
    "    \n",
    "    # Dequantize back to float (simulation of quantized inference)\n",
    "    dequantized = quantized * scale + w_min\n",
    "    \n",
    "    # Update layer weights\n",
    "    layer.weights.data = dequantized.astype(np.float32)\n",
    "    \n",
    "    # Calculate quantization statistics\n",
    "    total_weights = weights.size\n",
    "    original_bytes = total_weights * 4  # FP32 = 4 bytes\n",
    "    quantized_bytes = total_weights * (bits // 8)  # bits/8 bytes per weight\n",
    "    memory_reduction = original_bytes / quantized_bytes if quantized_bytes > 0 else 1.0\n",
    "    \n",
    "    # Calculate quantization error\n",
    "    mse_error = np.mean((original_weights - dequantized) ** 2)\n",
    "    max_error = np.max(np.abs(original_weights - dequantized))\n",
    "    \n",
    "    quantization_info = {\n",
    "        'bits': bits,\n",
    "        'scale': float(scale),\n",
    "        'min_val': float(w_min),\n",
    "        'max_val': float(w_max),\n",
    "        'total_weights': total_weights,\n",
    "        'original_bytes': original_bytes,\n",
    "        'quantized_bytes': quantized_bytes,\n",
    "        'memory_reduction': float(memory_reduction),\n",
    "        'mse_error': float(mse_error),\n",
    "        'max_error': float(max_error),\n",
    "        'original_dtype': str(original_dtype)\n",
    "    }\n",
    "    \n",
    "    return layer, quantization_info\n",
    "    ### END SOLUTION "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b4d3e171",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "test-quantization",
     "locked": false,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "def test_quantization():\n",
    "    \"\"\"\n",
    "    ### 🧪 Unit Test: Quantization\n",
    "    \n",
    "    Test weight quantization and precision reduction functionality.\n",
    "    \n",
    "    **This is a unit test** - it tests quantization algorithms in isolation.\n",
    "    \"\"\"\n",
    "    print(\"🔬 Unit Test: Quantization\")\n",
    "    print(\"**This is a unit test** - it tests quantization algorithms in isolation.\")\n",
    "    \n",
    "    # Create test layer\n",
    "    layer = Dense(100, 50)\n",
    "    original_weights = layer.weights.data.copy() if hasattr(layer.weights.data, 'copy') else np.array(layer.weights.data)\n",
    "    \n",
    "    # Test INT8 quantization\n",
    "    quantized_layer, info = quantize_layer_weights(layer, bits=8)\n",
    "    \n",
    "    # Verify quantization results\n",
    "    assert info['bits'] == 8, f\"Expected 8 bits, got {info['bits']}\"\n",
    "    assert info['total_weights'] == 5000, f\"Expected 5000 weights, got {info['total_weights']}\"\n",
    "    assert info['memory_reduction'] == 4.0, f\"Expected 4x reduction, got {info['memory_reduction']}\"\n",
    "    \n",
    "    print(f\"✅ INT8 quantization works: {info['memory_reduction']:.1f}x memory reduction\")\n",
    "    \n",
    "    # Test quantization error\n",
    "    assert info['mse_error'] >= 0, \"MSE error should be non-negative\"\n",
    "    assert info['max_error'] >= 0, \"Max error should be non-negative\"\n",
    "    \n",
    "    print(f\"✅ Quantization error tracking works: MSE={info['mse_error']:.6f}, Max={info['max_error']:.6f}\")\n",
    "    \n",
    "    # Test different bit widths\n",
    "    layer2 = Dense(50, 25)\n",
    "    _, info16 = quantize_layer_weights(layer2, bits=16)\n",
    "    \n",
    "    layer3 = Dense(50, 25)  \n",
    "    _, info4 = quantize_layer_weights(layer3, bits=8)  # Use 8 instead of 4 for valid byte calculation\n",
    "    \n",
    "    assert info16['memory_reduction'] == 2.0, f\"16-bit should give 2x reduction, got {info16['memory_reduction']}\"\n",
    "    print(f\"✅ Different bit widths work: 16-bit = {info16['memory_reduction']:.1f}x, 8-bit = {info4['memory_reduction']:.1f}x\")\n",
    "    \n",
    "    # Test quantization parameters\n",
    "    assert 'scale' in info, \"Scale parameter should be included\"\n",
    "    assert 'min_val' in info, \"Min value should be included\"\n",
    "    assert 'max_val' in info, \"Max value should be included\"\n",
    "    \n",
    "    print(\"✅ Quantization parameters work correctly\")\n",
    "    \n",
    "    print(\"📈 Progress: Quantization ✓\")\n",
    "    print(\"🎯 Quantization behavior:\")\n",
    "    print(\"  - Reduces precision while preserving weights\")\n",
    "    print(\"  - Provides significant memory savings\")\n",
    "    print(\"  - Tracks quantization error and parameters\")\n",
    "    print(\"  - Supports different bit widths\")\n",
    "    print()\n",
    "\n",
    "# Run the test\n",
    "test_quantization() "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "658bdd07",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "## Step 4: Knowledge Distillation - Large Models Teach Small Models\n",
    "\n",
    "### What is Knowledge Distillation?\n",
    "**Knowledge distillation** trains a small \"student\" model to mimic the behavior of a large \"teacher\" model, achieving compact models with competitive performance.\n",
    "\n",
    "### The Core Idea\n",
    "Instead of training on hard labels (0 or 1), students learn from soft targets (probabilities) that contain more information about the teacher's knowledge.\n",
    "\n",
    "### The Mathematical Foundation\n",
    "Distillation combines two loss functions:\n",
    "\n",
    "```python\n",
    "# Hard loss: Standard classification loss\n",
    "hard_loss = CrossEntropy(student_logits, true_labels)\n",
    "\n",
    "# Soft loss: Learn from teacher's probability distribution\n",
    "soft_targets = softmax(teacher_logits / temperature)\n",
    "soft_student = softmax(student_logits / temperature)\n",
    "soft_loss = -sum(soft_targets * log(soft_student))\n",
    "\n",
    "# Combined loss\n",
    "total_loss = α * hard_loss + (1 - α) * soft_loss\n",
    "```\n",
    "\n",
    "### Why Distillation Works\n",
    "- **Richer information**: Soft targets contain inter-class relationships\n",
    "- **Teacher knowledge**: Large models learn useful representations\n",
    "- **Regularization**: Soft targets reduce overfitting\n",
    "- **Efficiency**: Small models gain large model insights\n",
    "\n",
    "### Key Parameters\n",
    "- **Temperature (T)**: Controls softness of probability distributions\n",
    "  - High T: Softer, more informative distributions\n",
    "  - Low T: Sharper, more confident predictions\n",
    "- **Alpha (α)**: Balances hard and soft losses\n",
    "  - α = 1.0: Only hard loss (standard training)\n",
    "  - α = 0.0: Only soft loss (pure distillation)\n",
    "\n",
    "### Real-World Applications\n",
    "- **Mobile deployment**: Small models with large model performance\n",
    "- **Edge computing**: Efficient inference with minimal accuracy loss\n",
    "- **Model compression**: Alternative to pruning and quantization\n",
    "- **Multi-task learning**: Transfer knowledge across different tasks\n",
    "\n",
    "### Success Stories\n",
    "- **DistilBERT**: 60% smaller than BERT with 97% performance\n",
    "- **MobileNet**: Distilled from ResNet for mobile deployment\n",
    "- **TinyBERT**: Extreme compression for resource-constrained devices\n",
    "\n",
    "Let's implement knowledge distillation!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fa5d5762",
   "metadata": {
    "lines_to_next_cell": 1,
    "nbgrader": {
     "grade": false,
     "grade_id": "distillation-loss",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "class DistillationLoss:\n",
    "    \"\"\"\n",
    "    Combined loss function for knowledge distillation.\n",
    "    \n",
    "    This loss combines standard classification loss (hard targets) with\n",
    "    distillation loss (soft targets from teacher) for training compact models.\n",
    "    \"\"\"\n",
    "    \n",
    "    def __init__(self, temperature: float = 3.0, alpha: float = 0.5):\n",
    "        \"\"\"\n",
    "        Initialize distillation loss.\n",
    "        \n",
    "        Args:\n",
    "            temperature: Temperature for softening probability distributions\n",
    "            alpha: Weight for hard loss (1-alpha for soft loss)\n",
    "        \"\"\"\n",
    "        self.temperature = temperature\n",
    "        self.alpha = alpha\n",
    "        self.ce_loss = CrossEntropyLoss()\n",
    "    \n",
    "    def __call__(self, student_logits: np.ndarray, teacher_logits: np.ndarray, \n",
    "                 true_labels: np.ndarray) -> float:\n",
    "        \"\"\"\n",
    "        Calculate combined distillation loss.\n",
    "        \n",
    "        Args:\n",
    "            student_logits: Raw outputs from student model\n",
    "            teacher_logits: Raw outputs from teacher model  \n",
    "            true_labels: Ground truth labels\n",
    "            \n",
    "        Returns:\n",
    "            Combined loss value\n",
    "            \n",
    "        TODO: Implement knowledge distillation loss function.\n",
    "        \n",
    "        STEP-BY-STEP IMPLEMENTATION:\n",
    "        1. Calculate hard loss using standard cross-entropy\n",
    "        2. Apply temperature scaling to both logits\n",
    "        3. Calculate soft targets from teacher logits\n",
    "        4. Calculate soft loss between student and teacher distributions\n",
    "        5. Combine hard and soft losses with alpha weighting\n",
    "        6. Return total loss\n",
    "        \n",
    "        EXAMPLE USAGE:\n",
    "        ```python\n",
    "        distill_loss = DistillationLoss(temperature=3.0, alpha=0.5)\n",
    "        loss = distill_loss(student_out, teacher_out, labels)\n",
    "        ```\n",
    "        \n",
    "        IMPLEMENTATION HINTS:\n",
    "        - Use temperature scaling before softmax: logits / temperature\n",
    "        - Implement stable softmax to avoid numerical issues\n",
    "        - Scale soft loss by temperature^2 (standard practice)\n",
    "        - Ensure proper normalization for both losses\n",
    "        \n",
    "        LEARNING CONNECTIONS:\n",
    "        - This is how DistilBERT was trained\n",
    "        - Temperature controls knowledge transfer richness\n",
    "        - Alpha balances accuracy vs compression\n",
    "        \"\"\"\n",
    "        ### BEGIN SOLUTION\n",
    "        # Convert inputs to numpy arrays if needed\n",
    "        if not isinstance(student_logits, np.ndarray):\n",
    "            student_logits = np.array(student_logits)\n",
    "        if not isinstance(teacher_logits, np.ndarray):\n",
    "            teacher_logits = np.array(teacher_logits)\n",
    "        if not isinstance(true_labels, np.ndarray):\n",
    "            true_labels = np.array(true_labels)\n",
    "        \n",
    "        # Hard loss: standard classification loss\n",
    "        hard_loss = self._cross_entropy_loss(student_logits, true_labels)\n",
    "        \n",
    "        # Soft loss: distillation from teacher\n",
    "        # Apply temperature scaling\n",
    "        teacher_soft = self._softmax(teacher_logits / self.temperature)\n",
    "        student_soft = self._softmax(student_logits / self.temperature)\n",
    "        \n",
    "        # Calculate soft loss (KL divergence)\n",
    "        soft_loss = -np.mean(np.sum(teacher_soft * np.log(student_soft + 1e-10), axis=-1))\n",
    "        \n",
    "        # Scale soft loss by temperature^2 (standard practice)\n",
    "        soft_loss *= (self.temperature ** 2)\n",
    "        \n",
    "        # Combine losses\n",
    "        total_loss = self.alpha * hard_loss + (1 - self.alpha) * soft_loss\n",
    "        \n",
    "        return float(total_loss)\n",
    "        ### END SOLUTION\n",
    "    \n",
    "    def _softmax(self, logits: np.ndarray) -> np.ndarray:\n",
    "        \"\"\"Numerically stable softmax.\"\"\"\n",
    "        # Subtract max for numerical stability\n",
    "        exp_logits = np.exp(logits - np.max(logits, axis=-1, keepdims=True))\n",
    "        return exp_logits / np.sum(exp_logits, axis=-1, keepdims=True)\n",
    "    \n",
    "    def _cross_entropy_loss(self, logits: np.ndarray, labels: np.ndarray) -> float:\n",
    "        \"\"\"Simple cross-entropy loss implementation.\"\"\"\n",
    "        # Convert labels to one-hot if needed\n",
    "        if labels.ndim == 1:\n",
    "            num_classes = logits.shape[-1]\n",
    "            one_hot = np.zeros((labels.shape[0], num_classes))\n",
    "            one_hot[np.arange(labels.shape[0]), labels] = 1\n",
    "            labels = one_hot\n",
    "        \n",
    "        # Apply softmax and calculate cross-entropy\n",
    "        probs = self._softmax(logits)\n",
    "        return -np.mean(np.sum(labels * np.log(probs + 1e-10), axis=-1)) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "444095cc",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "test-distillation",
     "locked": false,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "def test_distillation():\n",
    "    \"\"\"\n",
    "    ### 🧪 Unit Test: Knowledge Distillation\n",
    "    \n",
    "    Test knowledge distillation loss function and teacher-student training.\n",
    "    \n",
    "    **This is a unit test** - it tests distillation algorithms in isolation.\n",
    "    \"\"\"\n",
    "    print(\"🔬 Unit Test: Knowledge Distillation\")\n",
    "    print(\"**This is a unit test** - it tests distillation algorithms in isolation.\")\n",
    "    \n",
    "    # Create sample data\n",
    "    batch_size, num_classes = 32, 10\n",
    "    student_logits = np.random.randn(batch_size, num_classes) * 0.5\n",
    "    teacher_logits = np.random.randn(batch_size, num_classes) * 2.0  # Teacher is more confident\n",
    "    true_labels = np.random.randint(0, num_classes, batch_size)\n",
    "    \n",
    "    # Test distillation loss\n",
    "    distill_loss = DistillationLoss(temperature=3.0, alpha=0.5)\n",
    "    loss = distill_loss(student_logits, teacher_logits, true_labels)\n",
    "    \n",
    "    # Verify loss computation\n",
    "    assert isinstance(loss, float), f\"Loss should be float, got {type(loss)}\"\n",
    "    assert loss >= 0, f\"Loss should be non-negative, got {loss}\"\n",
    "    \n",
    "    print(f\"✅ Distillation loss computation works: {loss:.4f}\")\n",
    "    \n",
    "    # Test different temperature values\n",
    "    loss_t1 = DistillationLoss(temperature=1.0, alpha=0.5)(student_logits, teacher_logits, true_labels)\n",
    "    loss_t5 = DistillationLoss(temperature=5.0, alpha=0.5)(student_logits, teacher_logits, true_labels)\n",
    "    \n",
    "    print(f\"✅ Temperature scaling works: T=1.0 → {loss_t1:.4f}, T=5.0 → {loss_t5:.4f}\")\n",
    "    \n",
    "    # Test different alpha values\n",
    "    loss_hard = DistillationLoss(temperature=3.0, alpha=1.0)(student_logits, teacher_logits, true_labels)  # Only hard loss\n",
    "    loss_soft = DistillationLoss(temperature=3.0, alpha=0.0)(student_logits, teacher_logits, true_labels)  # Only soft loss\n",
    "    \n",
    "    assert loss_hard != loss_soft, \"Hard and soft losses should be different\"\n",
    "    print(f\"✅ Alpha balancing works: Hard only = {loss_hard:.4f}, Soft only = {loss_soft:.4f}\")\n",
    "    \n",
    "    # Test edge cases\n",
    "    # Identical student and teacher should have low soft loss\n",
    "    identical_logits = np.random.randn(batch_size, num_classes)\n",
    "    loss_identical = DistillationLoss(temperature=3.0, alpha=0.0)(identical_logits, identical_logits, true_labels)\n",
    "    \n",
    "    print(f\"✅ Edge cases work: Identical logits soft loss = {loss_identical:.4f}\")\n",
    "    \n",
    "    # Test internal methods\n",
    "    softmax_result = distill_loss._softmax(student_logits)\n",
    "    assert np.allclose(np.sum(softmax_result, axis=1), 1.0), \"Softmax should sum to 1\"\n",
    "    \n",
    "    print(\"✅ Internal methods work correctly\")\n",
    "    \n",
    "    print(\"📈 Progress: Knowledge Distillation ✓\")\n",
    "    print(\"🎯 Distillation behavior:\")\n",
    "    print(\"  - Combines hard and soft losses effectively\")\n",
    "    print(\"  - Temperature controls knowledge transfer\")\n",
    "    print(\"  - Alpha balances accuracy vs compression\")\n",
    "    print(\"  - Numerically stable softmax implementation\")\n",
    "    print()\n",
    "\n",
    "# Run the test\n",
    "test_distillation() "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "887f8eed",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "## Step 5: Structured Pruning - Removing Entire Neurons and Channels\n",
    "\n",
    "### What is Structured Pruning?\n",
    "**Structured pruning** removes entire neurons, channels, or layers rather than individual weights, creating models that are actually faster on hardware.\n",
    "\n",
    "### Structured vs Unstructured Pruning\n",
    "\n",
    "#### **Unstructured Pruning** (What we did in Step 2)\n",
    "- Removes individual weights scattered throughout the matrix\n",
    "- Creates sparse matrices (lots of zeros)\n",
    "- High compression but requires sparse matrix libraries for speedup\n",
    "- Memory savings but limited hardware acceleration\n",
    "\n",
    "#### **Structured Pruning** (What we're doing now)\n",
    "- Removes entire rows/columns (neurons/channels)\n",
    "- Creates smaller dense matrices\n",
    "- Lower compression but actual hardware speedup\n",
    "- Real reduction in computation and memory access\n",
    "\n",
    "### The Mathematical Impact\n",
    "Removing a neuron from a Dense layer:\n",
    "\n",
    "```python\n",
    "# Original layer: Dense(784, 128)\n",
    "# Weight matrix: (784, 128), Bias: (128,)\n",
    "\n",
    "# After removing 32 neurons: Dense(784, 96)\n",
    "# Weight matrix: (784, 96), Bias: (96,)\n",
    "# 25% reduction in parameters and computation\n",
    "```\n",
    "\n",
    "### Why Structured Pruning Works\n",
    "- **Hardware efficiency**: Dense matrix operations are optimized\n",
    "- **Memory bandwidth**: Smaller matrices mean less data movement\n",
    "- **Cache utilization**: Better memory access patterns\n",
    "- **Real speedup**: Actual reduction in FLOPs and inference time\n",
    "\n",
    "### Neuron Importance Metrics\n",
    "How do we decide which neurons to remove?\n",
    "\n",
    "1. **Activation-based**: Neurons with low average activation\n",
    "2. **Gradient-based**: Neurons with small gradients during training\n",
    "3. **Weight magnitude**: Neurons with small outgoing weights\n",
    "4. **Information-theoretic**: Neurons contributing less information\n",
    "\n",
    "### Real-World Applications\n",
    "- **Mobile deployment**: Actual speedup on ARM processors\n",
    "- **FPGA inference**: Smaller designs with same performance\n",
    "- **Edge computing**: Reduced memory bandwidth requirements\n",
    "- **Production systems**: Guaranteed inference time reduction\n",
    "\n",
    "### Challenges\n",
    "- **Architecture modification**: Must handle dimension mismatches\n",
    "- **Cascade effects**: Removing one neuron affects next layer\n",
    "- **Retraining**: Often requires fine-tuning after pruning\n",
    "- **Importance ranking**: Choosing the right importance metric\n",
    "\n",
    "Let's implement structured pruning for Dense layers!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d02d19f3",
   "metadata": {
    "lines_to_next_cell": 1,
    "nbgrader": {
     "grade": false,
     "grade_id": "neuron-importance",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "def compute_neuron_importance(layer: Dense, method: str = 'weight_magnitude') -> np.ndarray:\n",
    "    \"\"\"\n",
    "    Compute importance scores for each neuron in a Dense layer.\n",
    "    \n",
    "    Args:\n",
    "        layer: Dense layer to analyze\n",
    "        method: Importance computation method\n",
    "        \n",
    "    Returns:\n",
    "        Array of importance scores for each output neuron\n",
    "        \n",
    "    TODO: Implement neuron importance calculation.\n",
    "    \n",
    "    STEP-BY-STEP IMPLEMENTATION:\n",
    "    1. Get weight matrix from layer\n",
    "    2. Choose importance metric based on method\n",
    "    3. Calculate per-neuron importance scores\n",
    "    4. Return array of scores (one per output neuron)\n",
    "    \n",
    "    AVAILABLE METHODS:\n",
    "    - 'weight_magnitude': Sum of absolute weights per neuron\n",
    "    - 'weight_variance': Variance of weights per neuron\n",
    "    - 'random': Random importance (for baseline comparison)\n",
    "    \n",
    "    IMPLEMENTATION HINTS:\n",
    "    - Weights shape is (input_size, output_size)\n",
    "    - Each column represents one output neuron\n",
    "    - Use axis=0 for operations across input dimensions\n",
    "    - Higher scores = more important neurons\n",
    "    \n",
    "    LEARNING CONNECTIONS:\n",
    "    - This is how neural architecture search works\n",
    "    - Different metrics capture different aspects of importance\n",
    "    - Importance ranking is crucial for effective pruning\n",
    "    \"\"\"\n",
    "    ### BEGIN SOLUTION\n",
    "    # Get weights and ensure they're numpy arrays\n",
    "    weights = layer.weights.data\n",
    "    if not isinstance(weights, np.ndarray):\n",
    "        weights = np.array(weights)\n",
    "    \n",
    "    if method == 'weight_magnitude':\n",
    "        # Sum of absolute weights per neuron (column)\n",
    "        importance = np.sum(np.abs(weights), axis=0)\n",
    "        \n",
    "    elif method == 'weight_variance':\n",
    "        # Variance of weights per neuron (column)\n",
    "        importance = np.var(weights, axis=0)\n",
    "        \n",
    "    elif method == 'random':\n",
    "        # Random importance for baseline comparison\n",
    "        importance = np.random.rand(weights.shape[1])\n",
    "        \n",
    "    else:\n",
    "        raise ValueError(f\"Unknown importance method: {method}\")\n",
    "    \n",
    "    return importance\n",
    "    ### END SOLUTION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3075ea5f",
   "metadata": {
    "lines_to_next_cell": 1,
    "nbgrader": {
     "grade": false,
     "grade_id": "structured-pruning",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "def prune_layer_neurons(layer: Dense, keep_ratio: float = 0.7, \n",
    "                       importance_method: str = 'weight_magnitude') -> Tuple[Dense, Dict[str, Any]]:\n",
    "    \"\"\"\n",
    "    Remove least important neurons from a Dense layer.\n",
    "    \n",
    "    Args:\n",
    "        layer: Dense layer to prune\n",
    "        keep_ratio: Fraction of neurons to keep (0.0 to 1.0)\n",
    "        importance_method: Method for computing neuron importance\n",
    "        \n",
    "    Returns:\n",
    "        Tuple of (pruned_layer, pruning_info)\n",
    "        \n",
    "    TODO: Implement structured neuron pruning.\n",
    "    \n",
    "    STEP-BY-STEP IMPLEMENTATION:\n",
    "    1. Compute importance scores for all neurons\n",
    "    2. Determine how many neurons to keep\n",
    "    3. Select indices of most important neurons\n",
    "    4. Create new layer with reduced dimensions\n",
    "    5. Copy weights and biases for selected neurons\n",
    "    6. Return pruned layer and statistics\n",
    "    \n",
    "    EXAMPLE USAGE:\n",
    "    ```python\n",
    "    layer = Dense(784, 128)\n",
    "    pruned_layer, info = prune_layer_neurons(layer, keep_ratio=0.75)\n",
    "    print(f\"Reduced from {info['original_neurons']} to {info['remaining_neurons']} neurons\")\n",
    "    ```\n",
    "    \n",
    "    IMPLEMENTATION HINTS:\n",
    "    - Use np.argsort() to rank neurons by importance\n",
    "    - Take the top keep_count neurons: indices[-keep_count:]\n",
    "    - Create new layer with reduced output size\n",
    "    - Copy both weights and bias for selected neurons\n",
    "    - Track original and new sizes for statistics\n",
    "    \n",
    "    LEARNING CONNECTIONS:\n",
    "    - This is actual model architecture modification\n",
    "    - Hardware gets real speedup from smaller matrices\n",
    "    - Must consider cascade effects on next layers\n",
    "    \"\"\"\n",
    "    ### BEGIN SOLUTION\n",
    "    # Compute neuron importance\n",
    "    importance_scores = compute_neuron_importance(layer, importance_method)\n",
    "    \n",
    "    # Determine how many neurons to keep\n",
    "    original_neurons = layer.output_size\n",
    "    keep_count = max(1, int(original_neurons * keep_ratio))  # Keep at least 1 neuron\n",
    "    \n",
    "    # Select most important neurons\n",
    "    sorted_indices = np.argsort(importance_scores)\n",
    "    keep_indices = sorted_indices[-keep_count:]  # Take top keep_count neurons\n",
    "    keep_indices = np.sort(keep_indices)  # Sort for consistent ordering\n",
    "    \n",
    "    # Get current weights and biases\n",
    "    weights = layer.weights.data\n",
    "    if not isinstance(weights, np.ndarray):\n",
    "        weights = np.array(weights)\n",
    "    \n",
    "    bias = layer.bias.data if layer.bias is not None else None\n",
    "    if bias is not None and not isinstance(bias, np.ndarray):\n",
    "        bias = np.array(bias)\n",
    "    \n",
    "    # Create new layer with reduced dimensions\n",
    "    pruned_layer = Dense(layer.input_size, keep_count)\n",
    "    \n",
    "    # Copy weights for selected neurons\n",
    "    pruned_weights = weights[:, keep_indices]\n",
    "    pruned_layer.weights.data = np.ascontiguousarray(pruned_weights)\n",
    "    \n",
    "    # Copy bias for selected neurons\n",
    "    if bias is not None:\n",
    "        pruned_bias = bias[keep_indices]\n",
    "        pruned_layer.bias.data = np.ascontiguousarray(pruned_bias)\n",
    "    \n",
    "    # Calculate pruning statistics\n",
    "    neurons_removed = original_neurons - keep_count\n",
    "    compression_ratio = original_neurons / keep_count if keep_count > 0 else float('inf')\n",
    "    \n",
    "    # Calculate parameter reduction\n",
    "    original_params = layer.input_size * original_neurons + (original_neurons if bias is not None else 0)\n",
    "    new_params = layer.input_size * keep_count + (keep_count if bias is not None else 0)\n",
    "    param_reduction = (original_params - new_params) / original_params\n",
    "    \n",
    "    pruning_info = {\n",
    "        'keep_ratio': keep_ratio,\n",
    "        'importance_method': importance_method,\n",
    "        'original_neurons': original_neurons,\n",
    "        'remaining_neurons': keep_count,\n",
    "        'neurons_removed': neurons_removed,\n",
    "        'compression_ratio': float(compression_ratio),\n",
    "        'original_params': original_params,\n",
    "        'new_params': new_params,\n",
    "        'param_reduction': float(param_reduction),\n",
    "        'keep_indices': keep_indices.tolist()\n",
    "    }\n",
    "    \n",
    "    return pruned_layer, pruning_info\n",
    "    ### END SOLUTION "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ddbf46e3",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "test-structured-pruning",
     "locked": false,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "def test_structured_pruning():\n",
    "    \"\"\"\n",
    "    ### 🧪 Unit Test: Structured Pruning\n",
    "    \n",
    "    Test structured neuron pruning and parameter reduction.\n",
    "    \n",
    "    **This is a unit test** - it tests structured pruning in isolation.\n",
    "    \"\"\"\n",
    "    print(\"🔬 Unit Test: Structured Pruning\")\n",
    "    print(\"**This is a unit test** - it tests structured pruning in isolation.\")\n",
    "    \n",
    "    # Create test layer\n",
    "    layer = Dense(100, 50)\n",
    "    \n",
    "    # Test basic pruning\n",
    "    pruned_layer, info = prune_layer_neurons(layer, keep_ratio=0.75)\n",
    "    \n",
    "    # Verify pruning results\n",
    "    assert info['keep_ratio'] == 0.75, f\"Expected 0.75, got {info['keep_ratio']}\"\n",
    "    assert info['original_neurons'] == 50, f\"Expected 50, got {info['original_neurons']}\"\n",
    "    assert info['remaining_neurons'] == 37, f\"Expected 37, got {info['remaining_neurons']}\"\n",
    "    assert info['neurons_removed'] == 13, f\"Expected 13, got {info['neurons_removed']}\"\n",
    "    assert info['compression_ratio'] >= 1.35, f\"Compression ratio should be at least 1.35, got {info['compression_ratio']}\"\n",
    "    \n",
    "    print(f\"✅ Basic structured pruning works: {info['neurons_removed']} neurons removed\")\n",
    "    \n",
    "    # Test parameter reduction\n",
    "    assert info['param_reduction'] >= 0.25, f\"Parameter reduction should be at least 0.25, got {info['param_reduction']}\"\n",
    "    print(f\"✅ Parameter reduction works: {info['param_reduction']:.2%}\")\n",
    "    \n",
    "    # Test edge cases\n",
    "    empty_layer = Dense(10, 10)\n",
    "    _, info_empty = prune_layer_neurons(empty_layer, keep_ratio=0.5)\n",
    "    assert info_empty['remaining_neurons'] == 5, f\"Empty layer should have 5 neurons, got {info_empty['remaining_neurons']}\"\n",
    "    \n",
    "    print(\"✅ Edge cases work correctly\")\n",
    "    \n",
    "    # Test different keep ratios\n",
    "    layer2 = Dense(50, 25)\n",
    "    _, info_ratio70 = prune_layer_neurons(layer2, keep_ratio=0.7)\n",
    "    _, info_ratio50 = prune_layer_neurons(layer2, keep_ratio=0.5)\n",
    "    \n",
    "    assert info_ratio70['remaining_neurons'] > info_ratio50['remaining_neurons'], \"Higher keep ratio should result in more neurons\"\n",
    "    print(f\"✅ Different keep ratios work: 70% ratio = {info_ratio70['remaining_neurons']}, 50% ratio = {info_ratio50['remaining_neurons']}\")\n",
    "    \n",
    "    # Test different importance methods\n",
    "    _, info_weight_mag = prune_layer_neurons(layer, keep_ratio=0.75, importance_method='weight_magnitude')\n",
    "    _, info_weight_var = prune_layer_neurons(layer, keep_ratio=0.75, importance_method='weight_variance')\n",
    "    \n",
    "    # Both should achieve similar compression ratios since they both keep 75% of neurons\n",
    "    print(f\"✅ Different importance methods work: Weight Mag = {info_weight_mag['compression_ratio']:.2f}, Weight Var = {info_weight_var['compression_ratio']:.2f}\")\n",
    "    \n",
    "    print(\"📈 Progress: Structured Pruning ✓\")\n",
    "    print(\"🎯 Structured pruning behavior:\")\n",
    "    print(\"  - Removes least important neurons\")\n",
    "    print(\"  - Maintains layer structure and connectivity\")\n",
    "    print(\"  - Provides detailed statistics for analysis\")\n",
    "    print(\"  - Scales to different keep ratios\")\n",
    "    print()\n",
    "\n",
    "# Run the test\n",
    "test_structured_pruning() "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ea0c4481",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "## Step 6: Comprehensive Comparison - Combining All Techniques\n",
    "\n",
    "### Putting It All Together\n",
    "Now that we've implemented four core compression techniques, let's combine them and see how they work together for maximum efficiency.\n",
    "\n",
    "### The Compression Toolkit\n",
    "We now have a complete arsenal:\n",
    "\n",
    "1. **CompressionMetrics**: Analyze model size and parameter distribution\n",
    "2. **Magnitude-based pruning**: Remove unimportant weights (sparsity)\n",
    "3. **Quantization**: Reduce precision (FP32 → INT8)\n",
    "4. **Knowledge distillation**: Train compact models with teacher guidance\n",
    "5. **Structured pruning**: Remove entire neurons (actual speedup)\n",
    "\n",
    "### Compression Strategy Design\n",
    "Different deployment scenarios need different strategies:\n",
    "\n",
    "#### **Mobile AI Deployment**\n",
    "- **Primary**: Quantization (75% memory reduction)\n",
    "- **Secondary**: Structured pruning (inference speedup)\n",
    "- **Target**: < 10MB models, < 100ms inference\n",
    "\n",
    "#### **Edge Computing**\n",
    "- **Primary**: Structured pruning (minimal compute)\n",
    "- **Secondary**: Magnitude pruning (memory efficiency)\n",
    "- **Target**: < 1MB models, minimal power consumption\n",
    "\n",
    "#### **Production Cloud**\n",
    "- **Primary**: Knowledge distillation (balanced compression)\n",
    "- **Secondary**: Quantization (cost reduction)\n",
    "- **Target**: Maximize throughput while maintaining accuracy\n",
    "\n",
    "#### **Research and Development**\n",
    "- **Primary**: Magnitude pruning (experimental flexibility)\n",
    "- **Secondary**: All techniques for comparison\n",
    "- **Target**: Understand trade-offs and optimal combinations\n",
    "\n",
    "### Compression Pipeline Design\n",
    "A systematic approach to model compression:\n",
    "\n",
    "```python\n",
    "# 1. Baseline analysis\n",
    "metrics = CompressionMetrics()\n",
    "baseline_size = metrics.calculate_model_size(model)\n",
    "\n",
    "# 2. Apply magnitude pruning\n",
    "model, prune_info = prune_model_by_magnitude(model, pruning_ratio=0.3)\n",
    "\n",
    "# 3. Apply quantization\n",
    "for layer in model.layers:\n",
    "    if isinstance(layer, Dense):\n",
    "        layer, quant_info = quantize_layer_weights(layer, bits=8)\n",
    "\n",
    "# 4. Apply structured pruning\n",
    "for i, layer in enumerate(model.layers):\n",
    "    if isinstance(layer, Dense):\n",
    "        model.layers[i], struct_info = prune_layer_neurons(layer, keep_ratio=0.8)\n",
    "\n",
    "# 5. Measure final compression\n",
    "final_size = metrics.calculate_model_size(model)\n",
    "compression_ratio = baseline_size['size_mb'] / final_size['size_mb']\n",
    "```\n",
    "\n",
    "### Trade-off Analysis\n",
    "Understanding the compression spectrum:\n",
    "\n",
    "- **Accuracy vs Size**: More compression = more accuracy loss\n",
    "- **Size vs Speed**: Structured compression gives actual speedup\n",
    "- **Memory vs Computation**: Different bottlenecks need different solutions\n",
    "- **Development vs Production**: Research flexibility vs deployment constraints\n",
    "\n",
    "Let's build a comprehensive comparison framework!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5ec30404",
   "metadata": {
    "lines_to_next_cell": 1,
    "nbgrader": {
     "grade": false,
     "grade_id": "compression-comparison",
     "locked": false,
     "schema_version": 3,
     "solution": true,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "#| export\n",
    "def compare_compression_techniques(original_model: Sequential) -> Dict[str, Dict[str, Any]]:\n",
    "    \"\"\"\n",
    "    Compare all compression techniques on the same model.\n",
    "    \n",
    "    Args:\n",
    "        original_model: Base model to compress using different techniques\n",
    "        \n",
    "    Returns:\n",
    "        Dictionary comparing results from different compression approaches\n",
    "        \n",
    "    TODO: Implement comprehensive compression comparison.\n",
    "    \n",
    "    STEP-BY-STEP IMPLEMENTATION:\n",
    "    1. Set up baseline metrics from original model\n",
    "    2. Apply each compression technique individually\n",
    "    3. Apply combined compression techniques\n",
    "    4. Measure and compare all results\n",
    "    5. Return comprehensive comparison data\n",
    "    \n",
    "    COMPARISON DIMENSIONS:\n",
    "    - Model size (MB)\n",
    "    - Parameter count\n",
    "    - Compression ratio\n",
    "    - Memory reduction\n",
    "    - Estimated speedup (for structured techniques)\n",
    "    \n",
    "    IMPLEMENTATION HINTS:\n",
    "    - Create separate model copies for each technique\n",
    "    - Use consistent parameters across techniques\n",
    "    - Track both individual and combined effects\n",
    "    - Include baseline for reference\n",
    "    \n",
    "    LEARNING CONNECTIONS:\n",
    "    - This is how research papers compare compression methods\n",
    "    - Production systems need this analysis for deployment decisions\n",
    "    - Understanding trade-offs guides technique selection\n",
    "    \"\"\"\n",
    "    ### BEGIN SOLUTION\n",
    "    results = {}\n",
    "    metrics = CompressionMetrics()\n",
    "    \n",
    "    # Baseline: Original model\n",
    "    baseline_params = metrics.count_parameters(original_model)\n",
    "    baseline_size = metrics.calculate_model_size(original_model)\n",
    "    \n",
    "    results['baseline'] = {\n",
    "        'technique': 'Original Model',\n",
    "        'parameters': baseline_params['total_parameters'],\n",
    "        'size_mb': baseline_size['size_mb'],\n",
    "        'compression_ratio': 1.0,\n",
    "        'memory_reduction': 0.0\n",
    "    }\n",
    "    \n",
    "    # Technique 1: Magnitude-based pruning only\n",
    "    model_pruning = Sequential([Dense(layer.input_size, layer.output_size) for layer in original_model.layers])\n",
    "    for i, layer in enumerate(model_pruning.layers):\n",
    "        layer.weights.data = original_model.layers[i].weights.data.copy() if hasattr(original_model.layers[i].weights.data, 'copy') else np.array(original_model.layers[i].weights.data)\n",
    "        if hasattr(layer, 'bias') and original_model.layers[i].bias is not None:\n",
    "            layer.bias.data = original_model.layers[i].bias.data.copy() if hasattr(original_model.layers[i].bias.data, 'copy') else np.array(original_model.layers[i].bias.data)\n",
    "    \n",
    "    # Apply magnitude pruning to each layer\n",
    "    total_sparsity = 0\n",
    "    for i, layer in enumerate(model_pruning.layers):\n",
    "        if isinstance(layer, Dense):\n",
    "            _, prune_info = prune_weights_by_magnitude(layer, pruning_ratio=0.3)\n",
    "            total_sparsity += prune_info['sparsity']\n",
    "    \n",
    "    avg_sparsity = total_sparsity / len(model_pruning.layers)\n",
    "    pruning_params = metrics.count_parameters(model_pruning)\n",
    "    pruning_size = metrics.calculate_model_size(model_pruning)\n",
    "    \n",
    "    results['magnitude_pruning'] = {\n",
    "        'technique': 'Magnitude Pruning (30%)',\n",
    "        'parameters': pruning_params['total_parameters'],\n",
    "        'size_mb': pruning_size['size_mb'],\n",
    "        'compression_ratio': baseline_size['size_mb'] / pruning_size['size_mb'],\n",
    "        'memory_reduction': (baseline_size['size_mb'] - pruning_size['size_mb']) / baseline_size['size_mb'],\n",
    "        'sparsity': avg_sparsity\n",
    "    }\n",
    "    \n",
    "    # Technique 2: Quantization only\n",
    "    model_quantization = Sequential([Dense(layer.input_size, layer.output_size) for layer in original_model.layers])\n",
    "    for i, layer in enumerate(model_quantization.layers):\n",
    "        layer.weights.data = original_model.layers[i].weights.data.copy() if hasattr(original_model.layers[i].weights.data, 'copy') else np.array(original_model.layers[i].weights.data)\n",
    "        if hasattr(layer, 'bias') and original_model.layers[i].bias is not None:\n",
    "            layer.bias.data = original_model.layers[i].bias.data.copy() if hasattr(original_model.layers[i].bias.data, 'copy') else np.array(original_model.layers[i].bias.data)\n",
    "    \n",
    "    # Apply quantization to each layer\n",
    "    total_memory_reduction = 0\n",
    "    for i, layer in enumerate(model_quantization.layers):\n",
    "        if isinstance(layer, Dense):\n",
    "            _, quant_info = quantize_layer_weights(layer, bits=8)\n",
    "            total_memory_reduction += quant_info['memory_reduction']\n",
    "    \n",
    "    avg_memory_reduction = total_memory_reduction / len(model_quantization.layers)\n",
    "    quantization_size = metrics.calculate_model_size(model_quantization, dtype='int8')\n",
    "    \n",
    "    results['quantization'] = {\n",
    "        'technique': 'Quantization (INT8)',\n",
    "        'parameters': baseline_params['total_parameters'],\n",
    "        'size_mb': quantization_size['size_mb'],\n",
    "        'compression_ratio': baseline_size['size_mb'] / quantization_size['size_mb'],\n",
    "        'memory_reduction': (baseline_size['size_mb'] - quantization_size['size_mb']) / baseline_size['size_mb'],\n",
    "        'avg_memory_reduction_factor': avg_memory_reduction\n",
    "    }\n",
    "    \n",
    "    # Technique 3: Structured pruning only\n",
    "    model_structured = Sequential([Dense(layer.input_size, layer.output_size) for layer in original_model.layers])\n",
    "    for i, layer in enumerate(model_structured.layers):\n",
    "        layer.weights.data = original_model.layers[i].weights.data.copy() if hasattr(original_model.layers[i].weights.data, 'copy') else np.array(original_model.layers[i].weights.data)\n",
    "        if hasattr(layer, 'bias') and original_model.layers[i].bias is not None:\n",
    "            layer.bias.data = original_model.layers[i].bias.data.copy() if hasattr(original_model.layers[i].bias.data, 'copy') else np.array(original_model.layers[i].bias.data)\n",
    "    \n",
    "    # Apply structured pruning to each layer\n",
    "    total_param_reduction = 0\n",
    "    for i, layer in enumerate(model_structured.layers):\n",
    "        if isinstance(layer, Dense):\n",
    "            pruned_layer, struct_info = prune_layer_neurons(layer, keep_ratio=0.75)\n",
    "            model_structured.layers[i] = pruned_layer\n",
    "            total_param_reduction += struct_info['param_reduction']\n",
    "    \n",
    "    avg_param_reduction = total_param_reduction / len(model_structured.layers)\n",
    "    structured_params = metrics.count_parameters(model_structured)\n",
    "    structured_size = metrics.calculate_model_size(model_structured)\n",
    "    \n",
    "    results['structured_pruning'] = {\n",
    "        'technique': 'Structured Pruning (75% neurons kept)',\n",
    "        'parameters': structured_params['total_parameters'],\n",
    "        'size_mb': structured_size['size_mb'],\n",
    "        'compression_ratio': baseline_size['size_mb'] / structured_size['size_mb'],\n",
    "        'memory_reduction': (baseline_size['size_mb'] - structured_size['size_mb']) / baseline_size['size_mb'],\n",
    "        'param_reduction': avg_param_reduction\n",
    "    }\n",
    "    \n",
    "    # Technique 4: Combined approach\n",
    "    model_combined = Sequential([Dense(layer.input_size, layer.output_size) for layer in original_model.layers])\n",
    "    for i, layer in enumerate(model_combined.layers):\n",
    "        layer.weights.data = original_model.layers[i].weights.data.copy() if hasattr(original_model.layers[i].weights.data, 'copy') else np.array(original_model.layers[i].weights.data)\n",
    "        if hasattr(layer, 'bias') and original_model.layers[i].bias is not None:\n",
    "            layer.bias.data = original_model.layers[i].bias.data.copy() if hasattr(original_model.layers[i].bias.data, 'copy') else np.array(original_model.layers[i].bias.data)\n",
    "    \n",
    "    # Apply magnitude pruning + quantization + structured pruning\n",
    "    for i, layer in enumerate(model_combined.layers):\n",
    "        if isinstance(layer, Dense):\n",
    "            # Step 1: Magnitude pruning\n",
    "            _, _ = prune_weights_by_magnitude(layer, pruning_ratio=0.2)\n",
    "            # Step 2: Quantization  \n",
    "            _, _ = quantize_layer_weights(layer, bits=8)\n",
    "            # Step 3: Structured pruning\n",
    "            pruned_layer, _ = prune_layer_neurons(layer, keep_ratio=0.8)\n",
    "            model_combined.layers[i] = pruned_layer\n",
    "    \n",
    "    combined_params = metrics.count_parameters(model_combined)\n",
    "    combined_size = metrics.calculate_model_size(model_combined, dtype='int8')\n",
    "    \n",
    "    results['combined'] = {\n",
    "        'technique': 'Combined (Pruning + Quantization + Structured)',\n",
    "        'parameters': combined_params['total_parameters'],\n",
    "        'size_mb': combined_size['size_mb'],\n",
    "        'compression_ratio': baseline_size['size_mb'] / combined_size['size_mb'],\n",
    "        'memory_reduction': (baseline_size['size_mb'] - combined_size['size_mb']) / baseline_size['size_mb']\n",
    "    }\n",
    "    \n",
    "    return results\n",
    "    ### END SOLUTION"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0b991b2",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 1
   },
   "source": [
    "## 🧪 Testing Infrastructure\n",
    "\n",
    "### 🔬 Unit Testing Pattern\n",
    "Each compression technique includes comprehensive unit tests:\n",
    "\n",
    "1. **Functionality verification**: Core algorithms work correctly\n",
    "2. **Edge case handling**: Robust error handling and boundary conditions\n",
    "3. **Statistical validation**: Compression metrics and analysis\n",
    "4. **Performance measurement**: Before/after comparisons\n",
    "\n",
    "### 📈 Progress Tracking\n",
    "- **CompressionMetrics**: ✅ Complete with parameter counting\n",
    "- **Magnitude-based pruning**: ✅ Complete with sparsity calculation\n",
    "- **Quantization**: 🔄 Coming next\n",
    "- **Knowledge distillation**: 🔄 Coming next\n",
    "- **Structured pruning**: 🔄 Coming next\n",
    "- **Comprehensive comparison**: 🔄 Coming next\n",
    "\n",
    "### 🎓 Educational Value\n",
    "- **Conceptual understanding**: Why compression matters\n",
    "- **Practical implementation**: Build techniques from scratch\n",
    "- **Real-world connections**: Mobile, edge, and production deployment\n",
    "- **Systems thinking**: Balance accuracy, efficiency, and constraints\n",
    "\n",
    "This module teaches the essential skills for deploying AI in resource-constrained environments!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2d2cee1e",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "test-comprehensive-comparison",
     "locked": false,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "def test_comprehensive_comparison():\n",
    "    \"\"\"\n",
    "    ### 🧪 Unit Test: Comprehensive Comparison\n",
    "    \n",
    "    Test the integrated compression comparison framework.\n",
    "    \n",
    "    **This is a unit test** - it tests comprehensive comparison in isolation.\n",
    "    \"\"\"\n",
    "    print(\"🔬 Unit Test: Comprehensive Comparison\")\n",
    "    print(\"**This is a unit test** - it tests comprehensive comparison in isolation.\")\n",
    "    \n",
    "    # Create test model\n",
    "    model = Sequential([\n",
    "        Dense(784, 128),\n",
    "        Dense(128, 64),\n",
    "        Dense(64, 10)\n",
    "    ])\n",
    "    \n",
    "    # Run comprehensive comparison\n",
    "    results = compare_compression_techniques(model)\n",
    "    \n",
    "    # Verify baseline exists\n",
    "    assert 'baseline' in results, \"Baseline results should be included\"\n",
    "    baseline = results['baseline']\n",
    "    assert baseline['compression_ratio'] == 1.0, f\"Baseline compression ratio should be 1.0, got {baseline['compression_ratio']}\"\n",
    "    \n",
    "    print(f\"✅ Baseline analysis works: {baseline['parameters']} parameters, {baseline['size_mb']} MB\")\n",
    "    \n",
    "    # Verify individual techniques\n",
    "    techniques = ['magnitude_pruning', 'quantization', 'structured_pruning', 'combined']\n",
    "    for technique in techniques:\n",
    "        assert technique in results, f\"Missing technique: {technique}\"\n",
    "        result = results[technique]\n",
    "        \n",
    "        # Magnitude pruning creates sparsity but doesn't reduce file size in our simulation\n",
    "        if technique == 'magnitude_pruning':\n",
    "            assert result['compression_ratio'] >= 1.0, f\"{technique} should have compression ratio >= 1.0\"\n",
    "        else:\n",
    "            assert result['compression_ratio'] > 1.0, f\"{technique} should have compression ratio > 1.0\"\n",
    "            \n",
    "        assert 0 <= result['memory_reduction'] <= 1.0, f\"{technique} memory reduction should be between 0 and 1\"\n",
    "        \n",
    "    print(\"✅ All compression techniques work correctly\")\n",
    "    \n",
    "    # Verify compression effectiveness\n",
    "    quantization = results['quantization']\n",
    "    structured = results['structured_pruning']\n",
    "    combined = results['combined']\n",
    "    \n",
    "    assert quantization['compression_ratio'] >= 3.0, f\"Quantization should achieve at least 3x compression, got {quantization['compression_ratio']:.2f}\"\n",
    "    assert structured['compression_ratio'] >= 1.2, f\"Structured pruning should achieve at least 1.2x compression, got {structured['compression_ratio']:.2f}\"\n",
    "    assert combined['compression_ratio'] >= quantization['compression_ratio'], f\"Combined should be at least as good as best individual technique\"\n",
    "    \n",
    "    print(f\"✅ Compression effectiveness verified:\")\n",
    "    print(f\"  - Quantization: {quantization['compression_ratio']:.2f}x compression\")\n",
    "    print(f\"  - Structured: {structured['compression_ratio']:.2f}x compression\") \n",
    "    print(f\"  - Combined: {combined['compression_ratio']:.2f}x compression\")\n",
    "    \n",
    "    # Verify different techniques have different characteristics\n",
    "    magnitude = results['magnitude_pruning']\n",
    "    assert 'sparsity' in magnitude, \"Magnitude pruning should report sparsity\"\n",
    "    assert 'avg_memory_reduction_factor' in quantization, \"Quantization should report memory reduction factor\"\n",
    "    assert 'param_reduction' in structured, \"Structured pruning should report parameter reduction\"\n",
    "    \n",
    "    print(\"✅ Technique-specific metrics work correctly\")\n",
    "    \n",
    "    print(\"📈 Progress: Comprehensive Comparison ✓\")\n",
    "    print(\"🎯 Comprehensive comparison behavior:\")\n",
    "    print(\"  - Compares all techniques systematically\")\n",
    "    print(\"  - Provides detailed metrics for each approach\")\n",
    "    print(\"  - Enables informed compression strategy selection\")\n",
    "    print(\"  - Demonstrates combined technique effectiveness\")\n",
    "    print()\n",
    "\n",
    "# Run the test\n",
    "test_comprehensive_comparison()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7df3b1d9",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## 🧪 Module Testing\n",
    "\n",
    "Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
    "\n",
    "**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0b4e8651",
   "metadata": {
    "nbgrader": {
     "grade": false,
     "grade_id": "standardized-testing",
     "locked": true,
     "schema_version": 3,
     "solution": false,
     "task": false
    }
   },
   "outputs": [],
   "source": [
    "# =============================================================================\n",
    "# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
    "# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
    "# =============================================================================\n",
    "\n",
    "if __name__ == \"__main__\":\n",
    "    from tito.tools.testing import run_module_tests_auto\n",
    "    \n",
    "    # Automatically discover and run all tests in this module\n",
    "    success = run_module_tests_auto(\"Compression\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c1769f7",
   "metadata": {
    "cell_marker": "\"\"\""
   },
   "source": [
    "## 📋 Module Summary\n",
    "\n",
    "### ✅ What We've Built\n",
    "This compression module provides a complete toolkit for making neural networks efficient:\n",
    "\n",
    "#### **1. CompressionMetrics** ✓\n",
    "- **Parameter counting**: Analyze model size and distribution\n",
    "- **Memory footprint**: Calculate storage requirements in different data types\n",
    "- **Foundation**: Baseline measurement for compression decisions\n",
    "\n",
    "#### **2. Magnitude-Based Pruning** ✓\n",
    "- **Weight removal**: Remove smallest weights based on magnitude\n",
    "- **Sparsity creation**: Create sparse matrices for memory efficiency\n",
    "- **Flexible thresholds**: Support different pruning intensities\n",
    "\n",
    "#### **3. Quantization** ✓\n",
    "- **Precision reduction**: Convert FP32 → INT8 for 75% memory savings\n",
    "- **Error tracking**: Monitor quantization impact on model accuracy\n",
    "- **Multiple bit-widths**: Support 16-bit, 8-bit, and other precisions\n",
    "\n",
    "#### **4. Knowledge Distillation** ✓\n",
    "- **Teacher-student training**: Large models guide small model learning\n",
    "- **Soft targets**: Rich probability distributions vs hard labels\n",
    "- **Temperature scaling**: Control knowledge transfer richness\n",
    "\n",
    "#### **5. Structured Pruning** ✓\n",
    "- **Neuron removal**: Remove entire neurons for actual hardware speedup\n",
    "- **Architecture modification**: Create smaller but dense networks\n",
    "- **Importance metrics**: Multiple methods for ranking neuron importance\n",
    "\n",
    "#### **6. Comprehensive Comparison** ✓\n",
    "- **Systematic evaluation**: Compare all techniques on same baseline\n",
    "- **Combined approaches**: Integrate multiple techniques for maximum compression\n",
    "- **Trade-off analysis**: Understand compression vs accuracy spectrum\n",
    "\n",
    "### 🎯 Real-World Applications\n",
    "Students can now optimize models for:\n",
    "- **Mobile AI**: < 10MB models for smartphone deployment\n",
    "- **Edge computing**: < 1MB models for IoT and embedded systems\n",
    "- **Production cloud**: Cost-optimized inference at scale\n",
    "- **Research**: Systematic compression comparison and analysis\n",
    "\n",
    "### 📊 Compression Achievements\n",
    "With the complete toolkit, students can achieve:\n",
    "- **4x+ memory reduction**: Through quantization (FP32 → INT8)\n",
    "- **1.3x+ speedup**: Through structured pruning (actual hardware benefit)\n",
    "- **5x+ combined compression**: Integrating multiple techniques\n",
    "- **Flexible trade-offs**: Balance accuracy, size, and speed as needed\n",
    "\n",
    "### 🔗 Next Steps\n",
    "\n",
    "This compression foundation prepares students for:\n",
    "- **Module 11 - GPU Kernels**: Hardware-accelerated compression operations\n",
    "- **Module 12 - Benchmarking**: Systematic performance evaluation and optimization\n",
    "- **Module 13 - MLOps**: Production deployment with compressed models\n",
    "\n",
    "### 🚀 Professional Applications\n",
    "Your compression toolkit enables:\n",
    "- **Production AI**: Deploy efficient models at scale\n",
    "- **Mobile Applications**: Real-time AI on smartphones and tablets\n",
    "- **Edge Computing**: AI in IoT devices and embedded systems\n",
    "- **Research**: Systematic compression analysis and method development\n",
    "\n",
    "### 🎯 The Future of Efficient AI\n",
    "You've built the foundation for efficient AI systems:\n",
    "- **Sustainable AI**: Reduced energy consumption and carbon footprint\n",
    "- **Accessible AI**: AI systems that run on consumer hardware\n",
    "- **Scalable Inference**: Cost-effective deployment at any scale\n",
    "- **Real-time Applications**: Fast, efficient AI for interactive systems\n",
    "\n",
    "### 🧠 Key Skills Developed\n",
    "- **Compression Theory**: Understanding memory, compute, and accuracy trade-offs\n",
    "- **Mathematical Implementation**: Quantization, pruning, and distillation algorithms\n",
    "- **Systems Engineering**: Benchmarking, comparison, and optimization frameworks\n",
    "- **Production Readiness**: Real-world deployment considerations and techniques\n",
    "\n",
    "You've mastered the art and science of making neural networks efficient without sacrificing capability. This is the foundation of modern AI deployment!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d334f996",
   "metadata": {
    "cell_marker": "\"\"\"",
    "lines_to_next_cell": 2
   },
   "source": [
    "## 🚀 Next Steps: Advanced Optimization\n",
    "\n",
    "### Kernels - Hardware-Aware Optimization\n",
    "Build on compression foundations with:\n",
    "- **Custom CUDA kernels**: GPU-optimized operations for compressed models\n",
    "- **SIMD optimization**: CPU vectorization for quantized operations\n",
    "- **Memory layout**: Optimize data structures for sparse and quantized weights\n",
    "- **Hardware profiling**: Measure actual performance improvements\n",
    "\n",
    "### Benchmarking - Systematic Performance Measurement\n",
    "Apply compression in production context:\n",
    "- **Latency measurement**: Quantify inference speedup from compression\n",
    "- **Accuracy evaluation**: Systematic testing of compression impact\n",
    "- **A/B testing**: Compare compressed vs uncompressed models in production\n",
    "- **Performance profiling**: Identify bottlenecks and optimization opportunities\n",
    "\n",
    "### MLOps - Production Deployment\n",
    "Deploy compressed models at scale:\n",
    "- **Model versioning**: Manage compressed model variants\n",
    "- **Monitoring**: Track compressed model performance in production\n",
    "- **Continuous optimization**: Automated compression pipeline\n",
    "- **Edge deployment**: Distribute compressed models to mobile and IoT devices\n",
    "\n",
    "### 🔬 Research Directions\n",
    "Advanced compression techniques:\n",
    "- **Neural Architecture Search**: Automated compression-aware design\n",
    "- **Hardware-aware compression**: Optimize for specific deployment targets\n",
    "- **Dynamic compression**: Adaptive compression based on runtime conditions\n",
    "- **Federated compression**: Compress models for distributed learning\n",
    "\n",
    "### 💼 Career Applications\n",
    "These compression skills are essential for:\n",
    "- **Mobile AI Engineer**: Optimize models for smartphones and tablets\n",
    "- **Edge AI Developer**: Deploy AI on IoT and embedded systems\n",
    "- **ML Infrastructure Engineer**: Build efficient inference systems\n",
    "- **Research Scientist**: Advance state-of-art compression techniques\n",
    "\n",
    "The compression module provides the foundation for all advanced optimization and deployment scenarios!"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "main_language": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}