diff --git a/modules/source/01_setup/setup_dev.ipynb b/modules/source/01_setup/setup_dev.ipynb
deleted file mode 100644
index afa9d5e6..00000000
--- a/modules/source/01_setup/setup_dev.ipynb
+++ /dev/null
@@ -1,1699 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "a84f5309",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "# Setup - TinyTorch System Configuration\n",
-    "\n",
-    "Welcome to TinyTorch! This setup module configures your personal TinyTorch installation and teaches you the NBGrader workflow.\n",
-    "\n",
-    "## Learning Goals\n",
-    "- Configure your personal TinyTorch installation with custom information\n",
-    "- Learn to query system information using Python modules\n",
-    "- Master the NBGrader workflow: implement → test → export\n",
-    "- Create functions that become part of your tinytorch package\n",
-    "- Understand solution blocks, hidden tests, and automated grading\n",
-    "\n",
-    "## The Big Picture: Why Configuration Matters in ML Systems\n",
-    "Configuration is the foundation of any production ML system. In this module, you'll learn:\n",
-    "\n",
-    "### 1. **System Awareness**\n",
-    "Real ML systems need to understand their environment:\n",
-    "- **Hardware constraints**: Memory, CPU cores, GPU availability\n",
-    "- **Software dependencies**: Python version, library compatibility\n",
-    "- **Platform differences**: Linux servers, macOS development, Windows deployment\n",
-    "\n",
-    "### 2. **Reproducibility**\n",
-    "Configuration enables reproducible ML:\n",
-    "- **Environment documentation**: Exactly what system was used\n",
-    "- **Dependency management**: Precise versions and requirements\n",
-    "- **Debugging support**: System info helps troubleshoot issues\n",
-    "\n",
-    "### 3. **Professional Development**\n",
-    "Proper configuration shows engineering maturity:\n",
-    "- **Attribution**: Your work is properly credited\n",
-    "- **Collaboration**: Others can understand and extend your setup\n",
-    "- **Maintenance**: Systems can be updated and maintained\n",
-    "\n",
-    "### 4. **ML Systems Context**\n",
-    "This connects to broader ML engineering:\n",
-    "- **Model deployment**: Different environments need different configs\n",
-    "- **Monitoring**: System metrics help track performance\n",
-    "- **Scaling**: Understanding hardware helps optimize training\n",
-    "\n",
-    "Let's build the foundation of your ML systems engineering skills!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "b608e2e6",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "setup-imports",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| default_exp core.setup\n",
-    "\n",
-    "#| export\n",
-    "import sys\n",
-    "import platform\n",
-    "import psutil\n",
-    "import os\n",
-    "from typing import Dict, Any"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "427aefa2",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "setup-welcome",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "print(\"🔥 TinyTorch Setup Module\")\n",
-    "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
-    "print(f\"Platform: {platform.system()}\")\n",
-    "print(\"Ready to configure your TinyTorch installation!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "946074ef",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🏗️ The Architecture of ML Systems Configuration\n",
-    "\n",
-    "### Configuration Layers in Production ML\n",
-    "Real ML systems have multiple configuration layers:\n",
-    "\n",
-    "```\n",
-    "┌─────────────────────────────────────┐\n",
-    "│        Application Config           │  ← Your personal info\n",
-    "├─────────────────────────────────────┤\n",
-    "│        System Environment           │  ← Hardware specs\n",
-    "├─────────────────────────────────────┤\n",
-    "│        Runtime Configuration        │  ← Python, libraries\n",
-    "├─────────────────────────────────────┤\n",
-    "│        Infrastructure Config        │  ← Cloud, containers\n",
-    "└─────────────────────────────────────┘\n",
-    "```\n",
-    "\n",
-    "### Why Each Layer Matters\n",
-    "- **Application**: Identifies who built what and when\n",
-    "- **System**: Determines performance characteristics and limitations\n",
-    "- **Runtime**: Affects compatibility and feature availability\n",
-    "- **Infrastructure**: Enables scaling and deployment strategies\n",
-    "\n",
-    "### Connection to Real ML Frameworks\n",
-    "Every major ML framework has configuration:\n",
-    "- **PyTorch**: `torch.cuda.is_available()`, `torch.get_num_threads()`\n",
-    "- **TensorFlow**: `tf.config.list_physical_devices()`, `tf.sysconfig.get_build_info()`\n",
-    "- **Hugging Face**: Model cards with system requirements and performance metrics\n",
-    "- **MLflow**: Experiment tracking with system context and reproducibility\n",
-    "\n",
-    "### TinyTorch's Approach\n",
-    "We'll build configuration that's:\n",
-    "- **Educational**: Teaches system awareness\n",
-    "- **Practical**: Actually useful for debugging\n",
-    "- **Professional**: Follows industry standards\n",
-    "- **Extensible**: Ready for future ML systems features"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b2bb27d7",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 1: What is System Configuration?\n",
-    "\n",
-    "### Definition\n",
-    "**System configuration** is the process of setting up your development environment with personalized information and system diagnostics. In TinyTorch, this means:\n",
-    "\n",
-    "- **Personal Information**: Your name, email, institution for identification\n",
-    "- **System Information**: Hardware specs, Python version, platform details\n",
-    "- **Customization**: Making your TinyTorch installation uniquely yours\n",
-    "\n",
-    "### Why Configuration Matters in ML Systems\n",
-    "Proper system configuration is crucial because:\n",
-    "\n",
-    "#### 1. **Reproducibility** \n",
-    "Your setup can be documented and shared:\n",
-    "```python\n",
-    "# Someone else can recreate your environment\n",
-    "config = {\n",
-    "    'developer': 'Your Name',\n",
-    "    'python_version': '3.9.7',\n",
-    "    'platform': 'Darwin',\n",
-    "    'memory_gb': 16.0\n",
-    "}\n",
-    "```\n",
-    "\n",
-    "#### 2. **Debugging**\n",
-    "System info helps troubleshoot ML performance issues:\n",
-    "- **Memory errors**: \"Do I have enough RAM for this model?\"\n",
-    "- **Performance issues**: \"How many CPU cores can I use?\"\n",
-    "- **Compatibility problems**: \"What Python version am I running?\"\n",
-    "\n",
-    "#### 3. **Professional Development**\n",
-    "Shows proper engineering practices:\n",
-    "- **Attribution**: Your work is properly credited\n",
-    "- **Collaboration**: Others can contact you about your code\n",
-    "- **Documentation**: System context is preserved\n",
-    "\n",
-    "#### 4. **ML Systems Integration**\n",
-    "Connects to broader ML engineering:\n",
-    "- **Model cards**: Document system requirements\n",
-    "- **Experiment tracking**: Record hardware context\n",
-    "- **Deployment**: Match development to production environments\n",
-    "\n",
-    "### Real-World Examples\n",
-    "- **Google Colab**: Shows GPU type, RAM, disk space\n",
-    "- **Kaggle**: Displays system specs for reproducibility\n",
-    "- **MLflow**: Tracks system context with experiments\n",
-    "- **Docker**: Containerizes entire system configuration\n",
-    "\n",
-    "Let's start configuring your TinyTorch system!"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "26b13500",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 2: Personal Information Configuration\n",
-    "\n",
-    "### The Concept: Identity in ML Systems\n",
-    "Your **personal information** identifies you as the developer and configures your TinyTorch installation. This isn't just administrative - it's foundational to professional ML development.\n",
-    "\n",
-    "### Why Personal Info Matters in ML Engineering\n",
-    "\n",
-    "#### 1. **Attribution and Accountability**\n",
-    "- **Model ownership**: Who built this model?\n",
-    "- **Responsibility**: Who should be contacted about issues?\n",
-    "- **Credit**: Proper recognition for your work\n",
-    "\n",
-    "#### 2. **Collaboration and Communication**\n",
-    "- **Team coordination**: Multiple developers on ML projects\n",
-    "- **Knowledge sharing**: Others can learn from your work\n",
-    "- **Bug reports**: Contact info for issues and improvements\n",
-    "\n",
-    "#### 3. **Professional Standards**\n",
-    "- **Industry practice**: Professional software typically carries attribution\n",
-    "- **Open source**: Proper credit in shared code\n",
-    "- **Academic integrity**: Clear authorship in research\n",
-    "\n",
-    "#### 4. **System Customization**\n",
-    "- **Personalized experience**: Your TinyTorch installation\n",
-    "- **Unique identification**: Distinguish your work from others\n",
-    "- **Development tracking**: Link code to developer\n",
-    "\n",
-    "### Real-World Parallels\n",
-    "- **Git commits**: Author name and email in every commit\n",
-    "- **Docker images**: Maintainer information in container metadata\n",
-    "- **Python packages**: Author info in `setup.py` and `pyproject.toml`\n",
-    "- **Model cards**: Creator information for ML models\n",
-    "\n",
-    "### Best Practices for Personal Configuration\n",
-    "- **Use real information**: Not placeholders or fake data\n",
-    "- **Professional email**: Accessible and appropriate\n",
-    "- **Descriptive system name**: Unique and meaningful\n",
-    "- **Consistent formatting**: Follow established conventions\n",
-    "\n",
-    "Now let's implement your personal configuration!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ae4d2930",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "personal-info",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def personal_info() -> Dict[str, str]:\n",
-    "    \"\"\"\n",
-    "    Return personal information for this TinyTorch installation.\n",
-    "    \n",
-    "    This function configures your personal TinyTorch installation with your identity.\n",
-    "    It's the foundation of proper ML engineering practices - every system needs\n",
-    "    to know who built it and how to contact them.\n",
-    "    \n",
-    "    TODO: Implement personal information configuration.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Create a dictionary with your personal details\n",
-    "    2. Include all required keys: developer, email, institution, system_name, version\n",
-    "    3. Use your actual information (not placeholder text)\n",
-    "    4. Make system_name unique and descriptive\n",
-    "    5. Keep version as '1.0.0' for now\n",
-    "    \n",
-    "    EXAMPLE OUTPUT:\n",
-    "    {\n",
-    "        'developer': 'Vijay Janapa Reddi',\n",
-    "        'email': 'vj@eecs.harvard.edu', \n",
-    "        'institution': 'Harvard University',\n",
-    "        'system_name': 'VJ-TinyTorch-Dev',\n",
-    "        'version': '1.0.0'\n",
-    "    }\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Replace the example with your real information\n",
-    "    - Use a descriptive system_name (e.g., 'YourName-TinyTorch-Dev')\n",
-    "    - Keep email format valid (contains @ and domain)\n",
-    "    - Make sure all values are strings\n",
-    "    - Consider how this info will be used in debugging and collaboration\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - This is like the 'author' field in Git commits\n",
-    "    - Similar to maintainer info in Docker images\n",
-    "    - Parallels author info in Python packages\n",
-    "    - Foundation for professional ML development\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    return {\n",
-    "        'developer': 'Vijay Janapa Reddi',\n",
-    "        'email': 'vj@eecs.harvard.edu',\n",
-    "        'institution': 'Harvard University',\n",
-    "        'system_name': 'VJ-TinyTorch-Dev',\n",
-    "        'version': '1.0.0'\n",
-    "    }\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "3e8b5d05",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 3: System Information Queries\n",
-    "\n",
-    "### The Concept: Hardware-Aware ML Systems\n",
-    "**System information** provides details about your hardware and software environment. This is crucial for ML development because machine learning is fundamentally about computation, and computation depends on hardware.\n",
-    "\n",
-    "### Why System Information Matters in ML Engineering\n",
-    "\n",
-    "#### 1. **Performance Optimization**\n",
-    "- **CPU cores**: Determines parallelization strategies\n",
-    "- **Memory**: Limits batch size and model size\n",
-    "- **Architecture**: Affects numerical precision and optimization\n",
-    "\n",
-    "#### 2. **Compatibility and Debugging**\n",
-    "- **Python version**: Determines available features and libraries\n",
-    "- **Platform**: Affects file paths, process management, and system calls\n",
-    "- **Architecture**: Determines which prebuilt binaries and wheels you can install\n",
-    "\n",
-    "#### 3. **Resource Planning**\n",
-    "- **Training time estimation**: More cores can shorten training when the workload parallelizes well\n",
-    "- **Memory requirements**: Avoid out-of-memory errors\n",
-    "- **Deployment matching**: Development should match production\n",
-    "\n",
-    "#### 4. **Reproducibility**\n",
-    "- **Environment documentation**: Exact system specifications\n",
-    "- **Performance comparison**: Same code, different hardware\n",
-    "- **Bug reproduction**: System-specific issues\n",
-    "\n",
-    "### The Python System Query Toolkit\n",
-    "You'll learn to use these essential Python modules:\n",
-    "\n",
-    "#### `sys.version_info` - Python Version\n",
-    "```python\n",
-    "version_info = sys.version_info\n",
-    "python_version = f\"{version_info.major}.{version_info.minor}.{version_info.micro}\"\n",
-    "# Example: \"3.9.7\"\n",
-    "```\n",
-    "\n",
-    "#### `platform.system()` - Operating System\n",
-    "```python\n",
-    "platform_name = platform.system()\n",
-    "# Examples: \"Darwin\" (macOS), \"Linux\", \"Windows\"\n",
-    "```\n",
-    "\n",
-    "#### `platform.machine()` - CPU Architecture\n",
-    "```python\n",
-    "architecture = platform.machine()\n",
-    "# Examples: \"x86_64\", \"arm64\", \"aarch64\"\n",
-    "```\n",
-    "\n",
-    "#### `psutil.cpu_count()` - CPU Cores\n",
-    "```python\n",
-    "cpu_count = psutil.cpu_count()\n",
-    "# Example: 8 (logical CPUs by default; pass logical=False for physical cores)\n",
-    "```\n",
-    "\n",
-    "#### `psutil.virtual_memory().total` - Total RAM\n",
-    "```python\n",
-    "memory_bytes = psutil.virtual_memory().total\n",
-    "memory_gb = round(memory_bytes / (1024**3), 1)\n",
-    "# Example: 16.0 GB\n",
-    "```\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **PyTorch**: `torch.get_num_threads()` reports the thread count used for CPU parallelism\n",
-    "- **TensorFlow**: `tf.config.list_physical_devices()` queries hardware\n",
-    "- **Scikit-learn**: `n_jobs=-1` uses all available cores\n",
-    "- **Dask**: Automatically configures workers based on CPU count\n",
-    "\n",
-    "### ML Systems Performance Considerations\n",
-    "- **Memory-bound operations**: Element-wise operations on large tensors, large model loading\n",
-    "- **CPU-bound operations**: Data preprocessing, feature engineering\n",
-    "- **I/O-bound operations**: Data loading, model saving\n",
-    "- **Platform-specific optimizations**: SIMD instructions, memory management\n",
-    "\n",
-    "Now let's implement system information queries!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f1607388",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "system-info",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def system_info() -> Dict[str, Any]:\n",
-    "    \"\"\"\n",
-    "    Query and return system information for this TinyTorch installation.\n",
-    "    \n",
-    "    This function gathers crucial hardware and software information that affects\n",
-    "    ML performance, compatibility, and debugging. It's the foundation of \n",
-    "    hardware-aware ML systems.\n",
-    "    \n",
-    "    TODO: Implement system information queries.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Get Python version using sys.version_info\n",
-    "    2. Get platform using platform.system()\n",
-    "    3. Get architecture using platform.machine()\n",
-    "    4. Get CPU count using psutil.cpu_count()\n",
-    "    5. Get memory using psutil.virtual_memory().total\n",
-    "    6. Convert memory from bytes to GB (divide by 1024**3)\n",
-    "    7. Return all information in a dictionary\n",
-    "    \n",
-    "    EXAMPLE OUTPUT:\n",
-    "    {\n",
-    "        'python_version': '3.9.7',\n",
-    "        'platform': 'Darwin', \n",
-    "        'architecture': 'arm64',\n",
-    "        'cpu_count': 8,\n",
-    "        'memory_gb': 16.0\n",
-    "    }\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Use f-string formatting for Python version: f\"{major}.{minor}.{micro}\"\n",
-    "    - Memory conversion: bytes / (1024**3) = GB\n",
-    "    - Round memory to 1 decimal place for readability\n",
-    "    - Make sure data types are correct (strings for text, int for cpu_count, float for memory_gb)\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - This is like `torch.cuda.is_available()` in PyTorch\n",
-    "    - Similar to system info in MLflow experiment tracking\n",
-    "    - Parallels hardware detection in TensorFlow\n",
-    "    - Foundation for performance optimization in ML systems\n",
-    "    \n",
-    "    PERFORMANCE IMPLICATIONS:\n",
-    "    - cpu_count affects parallel processing capabilities\n",
-    "    - memory_gb determines maximum model and batch sizes\n",
-    "    - platform affects file system and process management\n",
-    "    - architecture influences numerical precision and optimization\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Get Python version\n",
-    "    version_info = sys.version_info\n",
-    "    python_version = f\"{version_info.major}.{version_info.minor}.{version_info.micro}\"\n",
-    "    \n",
-    "    # Get platform information\n",
-    "    platform_name = platform.system()\n",
-    "    architecture = platform.machine()\n",
-    "    \n",
-    "    # Get CPU information\n",
-    "    cpu_count = psutil.cpu_count()\n",
-    "    \n",
-    "    # Get memory information (convert bytes to GB)\n",
-    "    memory_bytes = psutil.virtual_memory().total\n",
-    "    memory_gb = round(memory_bytes / (1024**3), 1)\n",
-    "    \n",
-    "    return {\n",
-    "        'python_version': python_version,\n",
-    "        'platform': platform_name,\n",
-    "        'architecture': architecture,\n",
-    "        'cpu_count': cpu_count,\n",
-    "        'memory_gb': memory_gb\n",
-    "    }\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "3671c633",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧪 Testing Your Configuration Functions\n",
-    "\n",
-    "### The Importance of Testing in ML Systems\n",
-    "Before we test your implementation, let's understand why testing is crucial in ML systems:\n",
-    "\n",
-    "#### 1. **Reliability**\n",
-    "- **Function correctness**: Does your code do what it's supposed to?\n",
-    "- **Edge case handling**: What happens with unexpected inputs?\n",
-    "- **Error detection**: Catch bugs before they cause problems\n",
-    "\n",
-    "#### 2. **Reproducibility**\n",
-    "- **Consistent behavior**: The same inputs always produce the same outputs\n",
-    "- **Environment validation**: Ensure setup works across different systems\n",
-    "- **Regression prevention**: New changes don't break existing functionality\n",
-    "\n",
-    "#### 3. **Professional Development**\n",
-    "- **Code quality**: Well-tested code is maintainable code\n",
-    "- **Collaboration**: Others can trust and extend your work\n",
-    "- **Documentation**: Tests serve as executable documentation\n",
-    "\n",
-    "#### 4. **ML-Specific Concerns**\n",
-    "- **Data validation**: Ensure data types and shapes are correct\n",
-    "- **Performance verification**: Check that optimizations work\n",
-    "- **System compatibility**: Verify cross-platform behavior\n",
-    "\n",
-    "### Testing Strategy\n",
-    "We'll use comprehensive testing that checks:\n",
-    "- **Return types**: Are outputs the correct data types?\n",
-    "- **Required fields**: Are all expected keys present?\n",
-    "- **Data validation**: Are values reasonable and properly formatted?\n",
-    "- **System accuracy**: Do queries match actual system state?\n",
-    "\n",
-    "Now let's test your configuration functions!"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "fa14788c",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "### 🧪 Test Your Configuration Functions\n",
-    "\n",
-    "Once you implement both functions above, run this cell to test them:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "6c0c8c52",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-personal-info",
-     "locked": true,
-     "points": 25,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test personal information configuration\n",
-    "print(\"🔬 Unit Test: Personal Information...\")\n",
-    "\n",
-    "# Test personal_info function\n",
-    "personal = personal_info()\n",
-    "\n",
-    "# Test return type\n",
-    "assert isinstance(personal, dict), \"personal_info should return a dictionary\"\n",
-    "\n",
-    "# Test required keys\n",
-    "required_keys = ['developer', 'email', 'institution', 'system_name', 'version']\n",
-    "for key in required_keys:\n",
-    "    assert key in personal, f\"Dictionary should have '{key}' key\"\n",
-    "\n",
-    "# Test that every value is a non-empty string\n",
-    "for key, value in personal.items():\n",
-    "    assert isinstance(value, str), f\"Value for '{key}' should be a string\"\n",
-    "    assert len(value) > 0, f\"Value for '{key}' cannot be empty\"\n",
-    "\n",
-    "# Test email format\n",
-    "assert '@' in personal['email'], \"Email should contain @ symbol\"\n",
-    "assert '.' in personal['email'], \"Email should contain domain\"\n",
-    "\n",
-    "# Test version format\n",
-    "assert personal['version'] == '1.0.0', \"Version should be '1.0.0'\"\n",
-    "\n",
-    "# Test system name (should be unique/personalized)\n",
-    "assert len(personal['system_name']) > 5, \"System name should be descriptive\"\n",
-    "\n",
-    "print(\"✅ Personal info function tests passed!\")\n",
-    "print(f\"✅ TinyTorch configured for: {personal['developer']}\")\n",
-    "print(f\"✅ System: {personal['system_name']}\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "7b30693d",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-system-info",
-     "locked": true,
-     "points": 25,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test system information queries\n",
-    "print(\"🔬 Unit Test: System Information...\")\n",
-    "\n",
-    "# Test system_info function\n",
-    "sys_info = system_info()\n",
-    "\n",
-    "# Test return type\n",
-    "assert isinstance(sys_info, dict), \"system_info should return a dictionary\"\n",
-    "\n",
-    "# Test required keys\n",
-    "required_keys = ['python_version', 'platform', 'architecture', 'cpu_count', 'memory_gb']\n",
-    "for key in required_keys:\n",
-    "    assert key in sys_info, f\"Dictionary should have '{key}' key\"\n",
-    "\n",
-    "# Test data types\n",
-    "assert isinstance(sys_info['python_version'], str), \"python_version should be string\"\n",
-    "assert isinstance(sys_info['platform'], str), \"platform should be string\"\n",
-    "assert isinstance(sys_info['architecture'], str), \"architecture should be string\"\n",
-    "assert isinstance(sys_info['cpu_count'], int), \"cpu_count should be integer\"\n",
-    "assert isinstance(sys_info['memory_gb'], (int, float)), \"memory_gb should be number\"\n",
-    "\n",
-    "# Test reasonable values\n",
-    "assert sys_info['cpu_count'] > 0, \"CPU count should be positive\"\n",
-    "assert sys_info['memory_gb'] > 0, \"Memory should be positive\"\n",
-    "assert len(sys_info['python_version']) > 0, \"Python version should not be empty\"\n",
-    "\n",
-    "# Test that values are actually queried (not hardcoded)\n",
-    "actual_version = f\"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}\"\n",
-    "assert sys_info['python_version'] == actual_version, \"Python version should match actual system\"\n",
-    "\n",
-    "print(\"✅ System info function tests passed!\")\n",
-    "print(f\"✅ Python: {sys_info['python_version']} on {sys_info['platform']}\")\n",
-    "print(f\"✅ Memory: {sys_info['memory_gb']} GB, CPUs: {sys_info['cpu_count']}\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "c44390b2",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Inline Test Functions\n",
-    "\n",
-    "These test functions provide immediate feedback when developing your solutions:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "404c5605",
-   "metadata": {
-    "lines_to_next_cell": 1
-   },
-   "outputs": [],
-   "source": [
-    "def test_personal_info():\n",
-    "    \"\"\"Test personal_info function implementation.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Personal Information...\")\n",
-    "    \n",
-    "    # Test personal_info function\n",
-    "    personal = personal_info()\n",
-    "    \n",
-    "    # Test return type\n",
-    "    assert isinstance(personal, dict), \"personal_info should return a dictionary\"\n",
-    "    \n",
-    "    # Test required keys\n",
-    "    required_keys = ['developer', 'email', 'institution', 'system_name', 'version']\n",
-    "    for key in required_keys:\n",
-    "        assert key in personal, f\"Dictionary should have '{key}' key\"\n",
-    "    \n",
-    "    # Test that every value is a non-empty string\n",
-    "    for key, value in personal.items():\n",
-    "        assert isinstance(value, str), f\"Value for '{key}' should be a string\"\n",
-    "        assert len(value) > 0, f\"Value for '{key}' cannot be empty\"\n",
-    "    \n",
-    "    # Test email format\n",
-    "    assert '@' in personal['email'], \"Email should contain @ symbol\"\n",
-    "    assert '.' in personal['email'], \"Email should contain domain\"\n",
-    "    \n",
-    "    # Test version format\n",
-    "    assert personal['version'] == '1.0.0', \"Version should be '1.0.0'\"\n",
-    "    \n",
-    "    # Test system name (should be unique/personalized)\n",
-    "    assert len(personal['system_name']) > 5, \"System name should be descriptive\"\n",
-    "    \n",
-    "    print(\"✅ Personal info function tests passed!\")\n",
-    "    print(f\"✅ TinyTorch configured for: {personal['developer']}\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "5ab7c64b",
-   "metadata": {
-    "lines_to_next_cell": 1
-   },
-   "outputs": [],
-   "source": [
-    "def test_system_info():\n",
-    "    \"\"\"Test system_info function implementation.\"\"\"\n",
-    "    print(\"🔬 Unit Test: System Information...\")\n",
-    "    \n",
-    "    # Test system_info function\n",
-    "    sys_info = system_info()\n",
-    "    \n",
-    "    # Test return type\n",
-    "    assert isinstance(sys_info, dict), \"system_info should return a dictionary\"\n",
-    "    \n",
-    "    # Test required keys\n",
-    "    required_keys = ['python_version', 'platform', 'architecture', 'cpu_count', 'memory_gb']\n",
-    "    for key in required_keys:\n",
-    "        assert key in sys_info, f\"Dictionary should have '{key}' key\"\n",
-    "    \n",
-    "    # Test data types\n",
-    "    assert isinstance(sys_info['python_version'], str), \"python_version should be string\"\n",
-    "    assert isinstance(sys_info['platform'], str), \"platform should be string\"\n",
-    "    assert isinstance(sys_info['architecture'], str), \"architecture should be string\"\n",
-    "    assert isinstance(sys_info['cpu_count'], int), \"cpu_count should be integer\"\n",
-    "    assert isinstance(sys_info['memory_gb'], (int, float)), \"memory_gb should be number\"\n",
-    "    \n",
-    "    # Test reasonable values\n",
-    "    assert sys_info['cpu_count'] > 0, \"CPU count should be positive\"\n",
-    "    assert sys_info['memory_gb'] > 0, \"Memory should be positive\"\n",
-    "    assert len(sys_info['python_version']) > 0, \"Python version should not be empty\"\n",
-    "    \n",
-    "    # Test that values are actually queried (not hardcoded)\n",
-    "    actual_version = f\"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}\"\n",
-    "    assert sys_info['python_version'] == actual_version, \"Python version should match actual system\"\n",
-    "    \n",
-    "    print(\"✅ System info function tests passed!\")\n",
-    "    print(f\"✅ Python: {sys_info['python_version']} on {sys_info['platform']}\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "54d58db1",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🎯 Professional ML Engineering Skills\n",
-    "\n",
-    "You've successfully configured your TinyTorch installation and learned the foundations of ML systems engineering:\n",
-    "\n",
-    "### What You've Accomplished\n",
-    "✅ **Personal Configuration**: Set up your identity and custom system name  \n",
-    "✅ **System Queries**: Learned to gather hardware and software information  \n",
-    "✅ **NBGrader Workflow**: Mastered solution blocks and automated testing  \n",
-    "✅ **Code Export**: Created functions that become part of your tinytorch package  \n",
-    "✅ **Professional Setup**: Established proper development practices  \n",
-    "\n",
-    "### Key Concepts You've Learned\n",
-    "\n",
-    "#### 1. **System Awareness**\n",
-    "- **Hardware constraints**: Understanding CPU, memory, and architecture limitations\n",
-    "- **Software dependencies**: Python version and platform compatibility\n",
-    "- **Performance implications**: How system specs affect ML workloads\n",
-    "\n",
-    "#### 2. **Configuration Management**\n",
-    "- **Personal identification**: Professional attribution and contact information\n",
-    "- **Environment documentation**: Reproducible system specifications\n",
-    "- **Professional standards**: Industry-standard development practices\n",
-    "\n",
-    "#### 3. **ML Systems Foundations**\n",
-    "- **Reproducibility**: System context for experiment tracking\n",
-    "- **Debugging**: Hardware info for performance troubleshooting\n",
-    "- **Collaboration**: Proper attribution and contact information\n",
-    "\n",
-    "#### 4. **Development Workflow**\n",
-    "- **NBGrader integration**: Automated testing and grading\n",
-    "- **Code export**: Functions become part of production package\n",
-    "- **Testing practices**: Comprehensive validation of functionality\n",
-    "\n",
-    "### Connections to Real ML Systems\n",
-    "\n",
-    "This module connects to broader ML engineering practices:\n",
-    "\n",
-    "#### **Industry Parallels**\n",
-    "- **Docker containers**: System configuration and reproducibility\n",
-    "- **MLflow tracking**: Experiment context and system metadata\n",
-    "- **Model cards**: Documentation of system requirements and performance\n",
-    "- **CI/CD pipelines**: Automated testing and environment validation\n",
-    "\n",
-    "#### **Production Considerations**\n",
-    "- **Deployment matching**: Development environment should match production\n",
-    "- **Resource planning**: Understanding hardware constraints for scaling\n",
-    "- **Monitoring**: System metrics for performance optimization\n",
-    "- **Debugging**: System context for troubleshooting issues\n",
-    "\n",
-    "### Next Steps in Your ML Systems Journey\n",
-    "\n",
-    "#### **Immediate Actions**\n",
-    "1. **Export your code**: `tito module export 01_setup`\n",
-    "2. **Test your installation**:\n",
-    "   ```python\n",
-    "   from tinytorch.core.setup import personal_info, system_info\n",
-    "   print(personal_info())  # Your personal details\n",
-    "   print(system_info())    # System information\n",
-    "   ```\n",
-    "3. **Verify package integration**: Ensure your functions work in the tinytorch package\n",
-    "\n",
-    "#### **Looking Ahead**\n",
-    "- **Module 2 (Tensor)**: Build the fundamental data structure for ML\n",
-    "- **Module 3 (Activations)**: Add nonlinearity for complex learning\n",
-    "- **Module 4 (Layers)**: Create the building blocks of neural networks\n",
-    "- **Module 5 (Networks)**: Compose layers into powerful architectures\n",
-    "\n",
-    "#### **Course Progression**\n",
-    "You're now ready to build a complete ML system from scratch:\n",
-    "```\n",
-    "Setup → Tensor → Activations → Layers → Networks → CNN → DataLoader → \n",
-    "Autograd → Optimizers → Training → Compression → Kernels → Benchmarking → MLOps\n",
-    "```\n",
-    "\n",
-    "### Professional Development Milestone\n",
-    "\n",
-    "You've taken your first step in ML systems engineering! This module taught you:\n",
-    "- **System thinking**: Understanding hardware and software constraints\n",
-    "- **Professional practices**: Proper attribution, testing, and documentation\n",
-    "- **Tool mastery**: NBGrader workflow and package development\n",
-    "- **Foundation building**: Creating reusable, tested, documented code\n",
-    "\n",
-    "**Ready for the next challenge?** Let's build the foundation of ML systems with tensors!"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "fdb8068c",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 4: Environment Validation\n",
-    "\n",
-    "### The Concept: Dependency Management in ML Systems\n",
-    "**Environment validation** ensures your system has the necessary packages and versions for ML development. This is crucial because ML systems have complex dependency chains that can break in subtle ways.\n",
-    "\n",
-    "### Why Environment Validation Matters\n",
-    "\n",
-    "#### 1. **Compatibility Assurance**\n",
-    "- **Version conflicts**: Different packages may require incompatible versions\n",
-    "- **API changes**: New versions might break existing code\n",
-    "- **Feature availability**: Some features require specific versions\n",
-    "\n",
-    "#### 2. **Reproducibility**\n",
-    "- **Environment documentation**: Exact package versions for reproduction\n",
-    "- **Dependency tracking**: Understanding what's installed and why\n",
-    "- **Debugging support**: Version info helps troubleshoot issues\n",
-    "\n",
-    "#### 3. **Professional Development**\n",
-    "- **Deployment safety**: Ensure development matches production\n",
-    "- **Collaboration**: Team members need compatible environments\n",
-    "- **Quality assurance**: Validate setup before beginning work\n",
-    "\n",
-    "### Essential ML Dependencies\n",
-    "We'll check for core packages that ML systems depend on:\n",
-    "- **numpy**: Fundamental numerical computing\n",
-    "- **matplotlib**: Visualization and plotting\n",
-    "- **psutil**: System information and monitoring\n",
-    "- **jupyter**: Interactive development environment\n",
-    "- **nbdev**: Package development tools\n",
-    "- **pytest**: Testing framework\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Docker**: Container images include dependency validation\n",
-    "- **CI/CD**: Automated testing validates environment setup\n",
-    "- **MLflow**: Tracks package versions with experiment metadata\n",
-    "- **Kaggle**: Validates package availability in competition environments\n",
-    "\n",
-    "Let's implement environment validation!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "7e36a801",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "environment-validation",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "import importlib\n",
-    "import importlib.metadata\n",
-    "from typing import Any, Dict, List, Optional\n",
-    "\n",
-    "def validate_environment() -> Dict[str, Any]:\n",
-    "    \"\"\"\n",
-    "    Validate ML development environment and check essential dependencies.\n",
-    "    \n",
-    "    This function checks that your system has the necessary packages for ML development.\n",
-    "    It's like a pre-flight check before you start building ML systems.\n",
-    "    \n",
-    "    TODO: Implement environment validation.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Define list of essential ML packages to check\n",
-    "    2. For each package, try to import it and get version\n",
-    "    3. Track which packages are available vs missing\n",
-    "    4. Calculate environment health score\n",
-    "    5. Return comprehensive environment report\n",
-    "    \n",
-    "    ESSENTIAL PACKAGES TO CHECK:\n",
-    "    - numpy: Numerical computing foundation\n",
-    "    - matplotlib: Visualization and plotting\n",
-    "    - psutil: System monitoring\n",
-    "    - jupyter: Interactive development\n",
-    "    - nbdev: Package development\n",
-    "    - pytest: Testing framework\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Use try/except to handle missing packages gracefully\n",
-    "    - Use importlib.metadata.version(package) for versions\n",
-    "    - Calculate health_score as (available_packages / total_packages) * 100\n",
-    "    - Round health_score to 1 decimal place\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    essential_packages = [\n",
-    "        'numpy', 'matplotlib', 'psutil', 'jupyter', 'nbdev', 'pytest'\n",
-    "    ]\n",
-    "    \n",
-    "    available = {}\n",
-    "    missing = []\n",
-    "    \n",
-    "    for package in essential_packages:\n",
-    "        try:\n",
-    "            # Try to import the package\n",
-    "            importlib.import_module(package)\n",
-    "            # Get version information from the installed distribution metadata\n",
-    "            version = importlib.metadata.version(package)\n",
-    "            available[package] = version\n",
-    "        except (ImportError, importlib.metadata.PackageNotFoundError):\n",
-    "            missing.append(package)\n",
-    "    \n",
-    "    # Calculate health score\n",
-    "    total_packages = len(essential_packages)\n",
-    "    available_packages = len(available)\n",
-    "    health_score = round((available_packages / total_packages) * 100, 1)\n",
-    "    \n",
-    "    return {\n",
-    "        'available_packages': available,\n",
-    "        'missing_packages': missing,\n",
-    "        'health_score': health_score,\n",
-    "        'total_checked': total_packages,\n",
-    "        'status': 'healthy' if health_score >= 80 else 'needs_attention'\n",
-    "    }\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "4547fb8d",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 5: Performance Benchmarking\n",
-    "\n",
-    "### The Concept: Hardware Performance Profiling\n",
-    "**Performance benchmarking** measures your system's computational capabilities for ML workloads. This helps you understand your hardware limits and optimize your development workflow.\n",
-    "\n",
-    "### Why Performance Benchmarking Matters\n",
-    "\n",
-    "#### 1. **Resource Planning**\n",
-    "- **Training time estimation**: How long will model training take?\n",
-    "- **Memory allocation**: What's the maximum batch size you can handle?\n",
-    "- **Parallelization**: How many cores can you effectively use?\n",
-    "\n",
-    "#### 2. **Optimization Guidance**\n",
-    "- **Bottleneck identification**: Is your system CPU-bound or memory-bound?\n",
-    "- **Hardware upgrades**: What would improve performance most?\n",
-    "- **Algorithm selection**: Which algorithms suit your hardware?\n",
-    "\n",
-    "#### 3. **Performance Comparison**\n",
-    "- **Baseline establishment**: Track performance over time\n",
-    "- **System comparison**: Compare different development environments\n",
-    "- **Deployment planning**: Match development to production performance\n",
-    "\n",
-    "### Benchmarking Strategy\n",
-    "We'll test key ML operations:\n",
-    "- **CPU computation**: Matrix operations that stress the processor\n",
-    "- **Memory bandwidth**: Large data transfers that test memory speed\n",
-    "- **Overall system**: Combined CPU and memory performance\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **MLPerf**: Industry-standard ML benchmarks\n",
-    "- **Cloud providers**: Performance metrics for instance selection\n",
-    "- **Hardware vendors**: Benchmark comparisons for purchasing decisions\n",
-    "\n",
-    "Let's implement performance benchmarking!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c80ba038",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "performance-benchmark",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "import time\n",
-    "import random\n",
-    "\n",
-    "def benchmark_performance() -> Dict[str, Any]:\n",
-    "    \"\"\"\n",
-    "    Benchmark system performance for ML workloads.\n",
-    "    \n",
-    "    This function measures computational performance to help you understand\n",
-    "    your system's capabilities and optimize your ML development workflow.\n",
-    "    \n",
-    "    TODO: Implement performance benchmarking.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. CPU Test: Time a computationally intensive operation\n",
-    "    2. Memory Test: Time a memory-intensive operation\n",
-    "    3. Calculate performance scores based on execution time\n",
-    "    4. Determine overall system performance rating\n",
-    "    5. Return comprehensive benchmark results\n",
-    "    \n",
-    "    BENCHMARK TESTS:\n",
-    "    - CPU: Arithmetic loop (computational intensity)\n",
-    "    - Memory: Large list operations (memory bandwidth)\n",
-    "    - Combined: Overall system performance score\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Use time.perf_counter() to measure execution time (monotonic, high resolution)\n",
-    "    - CPU test: a loop of mathematical operations\n",
-    "    - Memory test: large list creation and manipulation\n",
-    "    - Lower execution time = better performance\n",
-    "    - Calculate scores as inverse of time (e.g., 1/time * 1000)\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    benchmarks = {}\n",
-    "    \n",
-    "    # CPU Performance Test\n",
-    "    print(\"⚡ Running CPU benchmark...\")\n",
-    "    # perf_counter() avoids the zero-length intervals time.time() can report on coarse clocks\n",
-    "    start_time = time.perf_counter()\n",
-    "    \n",
-    "    # CPU-intensive calculation\n",
-    "    result = 0\n",
-    "    for i in range(100000):\n",
-    "        result += i * i + i / 2\n",
-    "    \n",
-    "    cpu_time = time.perf_counter() - start_time\n",
-    "    benchmarks['cpu_time'] = round(cpu_time, 3)\n",
-    "    benchmarks['cpu_score'] = round(1000 / cpu_time, 1)\n",
-    "    \n",
-    "    # Memory Performance Test\n",
-    "    print(\"🧠 Running memory benchmark...\")\n",
-    "    start_time = time.perf_counter()\n",
-    "    \n",
-    "    # Memory-intensive operations\n",
-    "    large_list = list(range(1000000))\n",
-    "    large_list.reverse()\n",
-    "    large_list.sort()\n",
-    "    \n",
-    "    memory_time = time.perf_counter() - start_time\n",
-    "    benchmarks['memory_time'] = round(memory_time, 3)\n",
-    "    benchmarks['memory_score'] = round(1000 / memory_time, 1)\n",
-    "    \n",
-    "    # Overall Performance Score\n",
-    "    overall_score = round((benchmarks['cpu_score'] + benchmarks['memory_score']) / 2, 1)\n",
-    "    benchmarks['overall_score'] = overall_score\n",
-    "    \n",
-    "    # Performance Rating\n",
-    "    if overall_score >= 80:\n",
-    "        rating = 'excellent'\n",
-    "    elif overall_score >= 60:\n",
-    "        rating = 'good'\n",
-    "    elif overall_score >= 40:\n",
-    "        rating = 'fair'\n",
-    "    else:\n",
-    "        rating = 'needs_optimization'\n",
-    "    \n",
-    "    benchmarks['performance_rating'] = rating\n",
-    "    \n",
-    "    return benchmarks\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "666b386a",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 6: Development Environment Setup\n",
-    "\n",
-    "### The Concept: Professional Development Configuration\n",
-    "**Development environment setup** configures essential tools and settings for professional ML development. This includes Git configuration, Jupyter settings, and other tools that make development more efficient.\n",
-    "\n",
-    "### Why Development Setup Matters\n",
-    "\n",
-    "#### 1. **Professional Standards**\n",
-    "- **Version control**: Proper Git configuration for collaboration\n",
-    "- **Code quality**: Consistent formatting and style\n",
-    "- **Documentation**: Automatic documentation generation\n",
-    "\n",
-    "#### 2. **Productivity Optimization**\n",
-    "- **Tool configuration**: Optimized settings for efficiency\n",
-    "- **Workflow automation**: Reduce repetitive tasks\n",
-    "- **Error prevention**: Catch issues before they become problems\n",
-    "\n",
-    "#### 3. **Collaboration Readiness**\n",
-    "- **Team compatibility**: Consistent development environment\n",
-    "- **Code sharing**: Proper attribution and commit messages\n",
-    "- **Project standards**: Follow established conventions\n",
-    "\n",
-    "### Essential Development Tools\n",
-    "We'll configure key tools for ML development:\n",
-    "- **Git**: Version control and collaboration\n",
-    "- **Jupyter**: Interactive development environment\n",
-    "- **Python tooling**: The `python` interpreter and the `pip` package manager\n",
-    "\n",
-    "Let's implement development environment setup!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "a34ebb28",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "development-setup",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "import subprocess\n",
-    "import json\n",
-    "from pathlib import Path\n",
-    "\n",
-    "def setup_development_environment() -> Dict[str, Any]:\n",
-    "    \"\"\"\n",
-    "    Configure development environment for professional ML development.\n",
-    "    \n",
-    "    This function sets up essential tools and configurations to make your\n",
-    "    development workflow more efficient and professional.\n",
-    "    \n",
-    "    TODO: Implement development environment setup.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Check if Git is installed and configured\n",
-    "    2. Verify Jupyter installation and configuration\n",
-    "    3. Check Python development tools\n",
-    "    4. Record recommendations for any missing or unconfigured tools\n",
-    "    5. Return setup status and recommendations\n",
-    "    \n",
-    "    DEVELOPMENT TOOLS TO CHECK:\n",
-    "    - Git: Version control system\n",
-    "    - Jupyter: Interactive development\n",
-    "    - Python tools: pip and the python interpreter\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Use subprocess.run() to check tool availability\n",
-    "    - Use try/except to handle missing tools gracefully\n",
-    "    - Provide helpful recommendations for missing tools\n",
-    "    - Focus on tools that improve ML development workflow\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    setup_status = {}\n",
-    "    recommendations = []\n",
-    "    \n",
-    "    # Check Git installation and configuration\n",
-    "    try:\n",
-    "        git_version = subprocess.run(['git', '--version'], \n",
-    "                                   capture_output=True, text=True, check=True)\n",
-    "        setup_status['git_installed'] = True\n",
-    "        setup_status['git_version'] = git_version.stdout.strip()\n",
-    "        \n",
-    "        # Check Git configuration\n",
-    "        try:\n",
-    "            git_name = subprocess.run(['git', 'config', 'user.name'], \n",
-    "                                    capture_output=True, text=True, check=True)\n",
-    "            git_email = subprocess.run(['git', 'config', 'user.email'], \n",
-    "                                     capture_output=True, text=True, check=True)\n",
-    "            setup_status['git_configured'] = True\n",
-    "            setup_status['git_name'] = git_name.stdout.strip()\n",
-    "            setup_status['git_email'] = git_email.stdout.strip()\n",
-    "        except subprocess.CalledProcessError:\n",
-    "            setup_status['git_configured'] = False\n",
-    "            recommendations.append(\"Configure Git: git config --global user.name 'Your Name'\")\n",
-    "            recommendations.append(\"Configure Git: git config --global user.email 'your.email@domain.com'\")\n",
-    "    \n",
-    "    except (subprocess.CalledProcessError, FileNotFoundError):\n",
-    "        setup_status['git_installed'] = False\n",
-    "        recommendations.append(\"Install Git: https://git-scm.com/downloads\")\n",
-    "    \n",
-    "    # Check Jupyter installation\n",
-    "    try:\n",
-    "        jupyter_version = subprocess.run(['jupyter', '--version'], \n",
-    "                                       capture_output=True, text=True, check=True)\n",
-    "        setup_status['jupyter_installed'] = True\n",
-    "        setup_status['jupyter_version'] = jupyter_version.stdout.strip()\n",
-    "    except (subprocess.CalledProcessError, FileNotFoundError):\n",
-    "        setup_status['jupyter_installed'] = False\n",
-    "        recommendations.append(\"Install Jupyter: pip install jupyter\")\n",
-    "    \n",
-    "    # Check Python tools\n",
-    "    python_tools = ['pip', 'python']\n",
-    "    for tool in python_tools:\n",
-    "        try:\n",
-    "            tool_version = subprocess.run([tool, '--version'], \n",
-    "                                        capture_output=True, text=True, check=True)\n",
-    "            setup_status[f'{tool}_installed'] = True\n",
-    "            setup_status[f'{tool}_version'] = tool_version.stdout.strip()\n",
-    "        except (subprocess.CalledProcessError, FileNotFoundError):\n",
-    "            setup_status[f'{tool}_installed'] = False\n",
-    "            recommendations.append(f\"Install {tool}: Check Python installation\")\n",
-    "    \n",
-    "    # Calculate setup health\n",
-    "    total_tools = 4  # git, jupyter, pip, python\n",
-    "    installed_tools = sum([\n",
-    "        setup_status.get('git_installed', False),\n",
-    "        setup_status.get('jupyter_installed', False),\n",
-    "        setup_status.get('pip_installed', False),\n",
-    "        setup_status.get('python_installed', False)\n",
-    "    ])\n",
-    "    \n",
-    "    setup_score = round((installed_tools / total_tools) * 100, 1)\n",
-    "    \n",
-    "    return {\n",
-    "        'setup_status': setup_status,\n",
-    "        'recommendations': recommendations,\n",
-    "        'setup_score': setup_score,\n",
-    "        'status': 'ready' if setup_score >= 75 else 'needs_configuration'\n",
-    "    }\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "c27d83df",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 7: Comprehensive System Report\n",
-    "\n",
-    "### The Concept: Integrated System Analysis\n",
-    "**Comprehensive system reporting** combines all your configuration and diagnostic information into a single, actionable report. This is like a \"health check\" for your ML development environment.\n",
-    "\n",
-    "### Why Comprehensive Reporting Matters\n",
-    "\n",
-    "#### 1. **Holistic View**\n",
-    "- **Complete picture**: All system information in one place\n",
-    "- **Dependency analysis**: How different components interact\n",
-    "- **Performance context**: Understanding system capabilities\n",
-    "\n",
-    "#### 2. **Troubleshooting Support**\n",
-    "- **Debugging aid**: Complete environment information for issue resolution\n",
-    "- **Performance analysis**: Identify bottlenecks and optimization opportunities\n",
-    "- **Compatibility checking**: Ensure all components work together\n",
-    "\n",
-    "#### 3. **Professional Documentation**\n",
-    "- **Environment documentation**: Complete system specification\n",
-    "- **Reproducibility**: All information needed to recreate environment\n",
-    "- **Sharing**: Easy to share system information with collaborators\n",
-    "\n",
-    "Let's create a comprehensive system report!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "89b9aac3",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "system-report",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "from datetime import datetime\n",
-    "\n",
-    "def generate_system_report() -> Dict[str, Any]:\n",
-    "    \"\"\"\n",
-    "    Generate comprehensive system report for ML development.\n",
-    "    \n",
-    "    This function combines all configuration and diagnostic information\n",
-    "    into a single, actionable report for your ML development environment.\n",
-    "    \n",
-    "    TODO: Implement comprehensive system reporting.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Gather personal information\n",
-    "    2. Collect system information\n",
-    "    3. Validate environment\n",
-    "    4. Run performance benchmarks\n",
-    "    5. Check development setup\n",
-    "    6. Generate overall health score\n",
-    "    7. Create comprehensive report with recommendations\n",
-    "    \n",
-    "    REPORT SECTIONS:\n",
-    "    - Personal configuration\n",
-    "    - System specifications\n",
-    "    - Environment validation\n",
-    "    - Performance benchmarks\n",
-    "    - Development setup\n",
-    "    - Overall health assessment\n",
-    "    - Recommendations for improvement\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Call all previously implemented functions\n",
-    "    - Combine results into comprehensive report\n",
-    "    - Calculate overall health score from all components\n",
-    "    - Provide actionable recommendations\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    print(\"📊 Generating comprehensive system report...\")\n",
-    "    \n",
-    "    # Gather all information\n",
-    "    personal = personal_info()\n",
-    "    system = system_info()\n",
-    "    environment = validate_environment()\n",
-    "    performance = benchmark_performance()\n",
-    "    development = setup_development_environment()\n",
-    "    \n",
-    "    # Calculate overall health score (cap the raw performance score at 100 to match the other 0-100 components)\n",
-    "    normalized_performance = min(performance['overall_score'], 100)  # Cap at 100\n",
-    "    \n",
-    "    health_components = [\n",
-    "        environment['health_score'],\n",
-    "        normalized_performance,\n",
-    "        development['setup_score']\n",
-    "    ]\n",
-    "    \n",
-    "    overall_health = round(sum(health_components) / len(health_components), 1)\n",
-    "    \n",
-    "    # Generate status\n",
-    "    if overall_health >= 85:\n",
-    "        status = 'excellent'\n",
-    "    elif overall_health >= 70:\n",
-    "        status = 'good'\n",
-    "    elif overall_health >= 50:\n",
-    "        status = 'fair'\n",
-    "    else:\n",
-    "        status = 'needs_attention'\n",
-    "    \n",
-    "    # Compile recommendations\n",
-    "    recommendations = []\n",
-    "    \n",
-    "    # Recommend every missing package, even when the health score is still 80 or above\n",
-    "    recommendations.extend([f\"Install missing package: {pkg}\" for pkg in environment['missing_packages']])\n",
-    "    \n",
-    "    if performance['overall_score'] < 50:\n",
-    "        recommendations.append(\"Consider hardware upgrade for better ML performance\")\n",
-    "    \n",
-    "    recommendations.extend(development['recommendations'])\n",
-    "    \n",
-    "    # Create comprehensive report\n",
-    "    report = {\n",
-    "        'timestamp': datetime.now().isoformat(),\n",
-    "        'personal_info': personal,\n",
-    "        'system_info': system,\n",
-    "        'environment_validation': environment,\n",
-    "        'performance_benchmarks': performance,\n",
-    "        'development_setup': development,\n",
-    "        'overall_health': overall_health,\n",
-    "        'status': status,\n",
-    "        'recommendations': recommendations,\n",
-    "        'report_version': '1.0.0'\n",
-    "    }\n",
-    "    \n",
-    "    return report\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9063a17e",
-   "metadata": {},
-   "source": [
-    "## 🧪 Unit Test: Enhanced Setup Functions\n",
-    "\n",
-    "Test all the new enhanced setup functions:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "4b48e976",
-   "metadata": {
-    "lines_to_next_cell": 1
-   },
-   "outputs": [],
-   "source": [
-    "def test_performance_benchmark():\n",
-    "    \"\"\"Test performance benchmarking function.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Performance Benchmarking...\")\n",
-    "    \n",
-    "    benchmark_report = benchmark_performance()\n",
-    "    \n",
-    "    # Test return type and structure\n",
-    "    assert isinstance(benchmark_report, dict), \"benchmark_performance should return a dictionary\"\n",
-    "    \n",
-    "    # Test required keys\n",
-    "    required_keys = ['cpu_time', 'cpu_score', 'memory_time', 'memory_score', 'overall_score', 'performance_rating']\n",
-    "    for key in required_keys:\n",
-    "        assert key in benchmark_report, f\"Report should have '{key}' key\"\n",
-    "    \n",
-    "    # Test data types\n",
-    "    assert isinstance(benchmark_report['cpu_time'], (int, float)), \"cpu_time should be number\"\n",
-    "    assert isinstance(benchmark_report['cpu_score'], (int, float)), \"cpu_score should be number\"\n",
-    "    assert isinstance(benchmark_report['memory_time'], (int, float)), \"memory_time should be number\"\n",
-    "    assert isinstance(benchmark_report['memory_score'], (int, float)), \"memory_score should be number\"\n",
-    "    assert isinstance(benchmark_report['overall_score'], (int, float)), \"overall_score should be number\"\n",
-    "    assert isinstance(benchmark_report['performance_rating'], str), \"performance_rating should be string\"\n",
-    "    \n",
-    "    # Test reasonable values\n",
-    "    assert benchmark_report['cpu_time'] > 0, \"cpu_time should be positive\"\n",
-    "    assert benchmark_report['memory_time'] > 0, \"memory_time should be positive\"\n",
-    "    assert benchmark_report['cpu_score'] > 0, \"cpu_score should be positive\"\n",
-    "    assert benchmark_report['memory_score'] > 0, \"memory_score should be positive\"\n",
-    "    assert benchmark_report['overall_score'] > 0, \"overall_score should be positive\"\n",
-    "    \n",
-    "    valid_ratings = ['excellent', 'good', 'fair', 'needs_optimization']\n",
-    "    assert benchmark_report['performance_rating'] in valid_ratings, \"performance_rating should be valid\"\n",
-    "    \n",
-    "    print(\"✅ Performance benchmark tests passed!\")\n",
-    "    print(f\"✅ Performance rating: {benchmark_report['performance_rating']}\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "7b09b6ad",
-   "metadata": {
-    "lines_to_next_cell": 1
-   },
-   "outputs": [],
-   "source": [
-    "def test_development_setup():\n",
-    "    \"\"\"Test development environment setup function.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Development Environment Setup...\")\n",
-    "    \n",
-    "    setup_report = setup_development_environment()\n",
-    "    \n",
-    "    # Test return type and structure\n",
-    "    assert isinstance(setup_report, dict), \"setup_development_environment should return a dictionary\"\n",
-    "    \n",
-    "    # Test required keys\n",
-    "    required_keys = ['setup_status', 'recommendations', 'setup_score', 'status']\n",
-    "    for key in required_keys:\n",
-    "        assert key in setup_report, f\"Report should have '{key}' key\"\n",
-    "    \n",
-    "    # Test data types\n",
-    "    assert isinstance(setup_report['setup_status'], dict), \"setup_status should be dict\"\n",
-    "    assert isinstance(setup_report['recommendations'], list), \"recommendations should be list\"\n",
-    "    assert isinstance(setup_report['setup_score'], (int, float)), \"setup_score should be number\"\n",
-    "    assert isinstance(setup_report['status'], str), \"status should be string\"\n",
-    "    \n",
-    "    # Test reasonable values\n",
-    "    assert 0 <= setup_report['setup_score'] <= 100, \"setup_score should be between 0 and 100\"\n",
-    "    assert setup_report['status'] in ['ready', 'needs_configuration'], \"status should be valid\"\n",
-    "    \n",
-    "    print(\"✅ Development setup tests passed!\")\n",
-    "    print(f\"✅ Setup score: {setup_report['setup_score']}%\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "68475c70",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def test_system_report():\n",
-    "    \"\"\"Test comprehensive system report function.\"\"\"\n",
-    "    print(\"🔬 Unit Test: System Report Generation...\")\n",
-    "    \n",
-    "    report = generate_system_report()\n",
-    "    \n",
-    "    # Test return type and structure\n",
-    "    assert isinstance(report, dict), \"generate_system_report should return a dictionary\"\n",
-    "    \n",
-    "    # Test required keys\n",
-    "    required_keys = ['timestamp', 'personal_info', 'system_info', 'environment_validation', \n",
-    "                    'performance_benchmarks', 'development_setup', 'overall_health', \n",
-    "                    'status', 'recommendations', 'report_version']\n",
-    "    for key in required_keys:\n",
-    "        assert key in report, f\"Report should have '{key}' key\"\n",
-    "    \n",
-    "    # Test data types\n",
-    "    assert isinstance(report['timestamp'], str), \"timestamp should be string\"\n",
-    "    assert isinstance(report['personal_info'], dict), \"personal_info should be dict\"\n",
-    "    assert isinstance(report['system_info'], dict), \"system_info should be dict\"\n",
-    "    assert isinstance(report['environment_validation'], dict), \"environment_validation should be dict\"\n",
-    "    assert isinstance(report['performance_benchmarks'], dict), \"performance_benchmarks should be dict\"\n",
-    "    assert isinstance(report['development_setup'], dict), \"development_setup should be dict\"\n",
-    "    assert isinstance(report['overall_health'], (int, float)), \"overall_health should be number\"\n",
-    "    assert isinstance(report['status'], str), \"status should be string\"\n",
-    "    assert isinstance(report['recommendations'], list), \"recommendations should be list\"\n",
-    "    assert isinstance(report['report_version'], str), \"report_version should be string\"\n",
-    "    \n",
-    "    # Test reasonable values\n",
-    "    assert 0 <= report['overall_health'] <= 100, \"overall_health should be between 0 and 100\"\n",
-    "    valid_statuses = ['excellent', 'good', 'fair', 'needs_attention']\n",
-    "    assert report['status'] in valid_statuses, \"status should be valid\"\n",
-    "    \n",
-    "    print(\"✅ System report tests passed!\")\n",
-    "    print(f\"✅ Overall system health: {report['overall_health']}%\")\n",
-    "\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ba1bcd18",
-   "metadata": {
-    "lines_to_next_cell": 1
-   },
-   "outputs": [],
-   "source": [
-    "def test_personal_info():\n",
-    "    \"\"\"Test personal information function comprehensively.\"\"\"\n",
-    "    personal = personal_info()\n",
-    "    assert isinstance(personal, dict), \"personal_info should return a dictionary\"\n",
-    "    assert 'developer' in personal, \"Dictionary should have 'developer' key\"\n",
-    "    assert '@' in personal['email'], \"Email should contain @ symbol\"\n",
-    "    print(\"✅ Personal information function works\")\n",
-    "\n",
-    "def test_system_info():\n",
-    "    \"\"\"Test system information function comprehensively.\"\"\"\n",
-    "    system = system_info()\n",
-    "    assert isinstance(system, dict), \"system_info should return a dictionary\"\n",
-    "    assert 'python_version' in system, \"Dictionary should have 'python_version' key\"\n",
-    "    assert system['memory_gb'] > 0, \"Memory should be positive\"\n",
-    "    print(\"✅ System information function works\")\n",
-    "\n",
-    "def test_environment_validation():\n",
-    "    \"\"\"Test environment validation function comprehensively.\"\"\"\n",
-    "    env = validate_environment()\n",
-    "    assert isinstance(env, dict), \"validate_environment should return a dictionary\"\n",
-    "    assert 'health_score' in env, \"Dictionary should have 'health_score' key\"\n",
-    "    print(\"✅ Environment validation function works\")\n",
-    "\n",
-    "def test_performance_benchmark():\n",
-    "    \"\"\"Test performance benchmarking function comprehensively.\"\"\"\n",
-    "    perf = benchmark_performance()\n",
-    "    assert isinstance(perf, dict), \"benchmark_performance should return a dictionary\"\n",
-    "    assert 'cpu_score' in perf, \"Dictionary should have 'cpu_score' key\"\n",
-    "    print(\"✅ Performance benchmarking function works\")\n",
-    "\n",
-    "def test_development_setup_smoke():\n",
-    "    \"\"\"Smoke test: development setup returns a dict with a setup score.\"\"\"\n",
-    "    dev = setup_development_environment()\n",
-    "    assert isinstance(dev, dict), \"setup_development_environment should return a dictionary\"\n",
-    "    assert 'setup_score' in dev, \"Dictionary should have 'setup_score' key\"\n",
-    "    print(\"✅ Development setup function works\")\n",
-    "\n",
-    "def test_system_report_smoke():\n",
-    "    \"\"\"Smoke test: system report returns a dict with an overall health score.\"\"\"\n",
-    "    report = generate_system_report()\n",
-    "    assert isinstance(report, dict), \"generate_system_report should return a dictionary\"\n",
-    "    assert 'overall_health' in report, \"Dictionary should have 'overall_health' key\"\n",
-    "    print(\"✅ System report function works\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2415d2ab",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧪 Module Testing\n",
-    "\n",
-    "Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
-    "\n",
-    "**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "526c9009",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "standardized-testing",
-     "locked": true,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# =============================================================================\n",
-    "# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
-    "# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
-    "# =============================================================================\n",
-    "\n",
-    "if __name__ == \"__main__\":\n",
-    "    from tito.tools.testing import run_module_tests_auto\n",
-    "    \n",
-    "    # Automatically discover and run all tests in this module\n",
-    "    success = run_module_tests_auto(\"Setup\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "35feea10",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🎯 Module Summary: Development Environment Setup Complete!\n",
-    "\n",
-    "Congratulations! You've successfully set up your TinyTorch development environment:\n",
-    "\n",
-    "### What You've Accomplished\n",
-    "✅ **Personal Configuration**: Developer information and preferences\n",
-    "✅ **System Analysis**: Hardware and software environment validation\n",
-    "✅ **Environment Validation**: Python packages and dependencies\n",
-    "✅ **Performance Benchmarking**: CPU and memory performance testing\n",
-    "✅ **Development Setup**: IDE configuration and tooling\n",
-    "✅ **Comprehensive Reporting**: System health and recommendations\n",
-    "\n",
-    "### Key Concepts You've Learned\n",
-    "- **Environment Management**: How to validate and configure development environments\n",
-    "- **Performance Analysis**: Benchmarking system capabilities for ML workloads\n",
-    "- **System Diagnostics**: Comprehensive health checking and reporting\n",
-    "- **Development Best Practices**: Professional setup for ML development\n",
-    "\n",
-    "### Next Steps\n",
-    "1. **Export your code**: `tito package nbdev --export 01_setup`\n",
-    "2. **Test your implementation**: `tito test 01_setup`\n",
-    "3. **Use your environment**: Start building with confidence in a validated setup\n",
-    "4. **Move to the Tensor module**: Begin implementing the core tensor system!\n",
-    "\n",
-    "**Ready for the ML journey?** Your development environment is now optimized for building neural networks from scratch!"
-   ]
-  }
- ],
- "metadata": {
-  "jupytext": {
-   "main_language": "python"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/modules/source/02_tensor/tensor_dev.ipynb b/modules/source/02_tensor/tensor_dev.ipynb
deleted file mode 100644
index b62b109a..00000000
--- a/modules/source/02_tensor/tensor_dev.ipynb
+++ /dev/null
@@ -1,1921 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "2f536feb",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "# Tensor - Core Data Structure\n",
-    "\n",
-    "Welcome to the Tensor module! This is where TinyTorch really begins. You'll implement the fundamental data structure that powers all ML systems.\n",
-    "\n",
-    "## Learning Goals\n",
-    "- Understand tensors as N-dimensional arrays with ML-specific operations\n",
-    "- Implement a complete Tensor class with arithmetic operations\n",
-    "- Handle shape management, data types, and memory layout\n",
-    "- Build the foundation for neural networks and automatic differentiation\n",
-    "- Master the NBGrader workflow with comprehensive testing\n",
-    "\n",
-    "## Build → Use → Understand\n",
-    "1. **Build**: Create the Tensor class with core operations\n",
-    "2. **Use**: Perform tensor arithmetic and transformations\n",
-    "3. **Understand**: How tensors form the foundation of ML systems"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ac7d8c03",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "tensor-imports",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| default_exp core.tensor\n",
-    "\n",
-    "#| export\n",
-    "import numpy as np\n",
-    "import sys\n",
-    "from typing import Union, List, Tuple, Optional, Any"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e6a6eb48",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "tensor-setup",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "print(\"🔥 TinyTorch Tensor Module\")\n",
-    "print(f\"NumPy version: {np.__version__}\")\n",
-    "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
-    "print(\"Ready to build tensors!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "3d304c03",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 📦 Where This Code Lives in the Final Package\n",
-    "\n",
-    "**Learning Side:** You work in `modules/source/02_tensor/tensor_dev.py`  \n",
-    "**Building Side:** Code exports to `tinytorch.core.tensor`\n",
-    "\n",
-    "```python\n",
-    "# Final package structure:\n",
-    "from tinytorch.core.tensor import Tensor  # The foundation of everything!\n",
-    "from tinytorch.core.activations import ReLU, Sigmoid, Tanh\n",
-    "from tinytorch.core.layers import Dense, Conv2D\n",
-    "```\n",
-    "\n",
-    "**Why this matters:**\n",
-    "- **Learning:** Focused modules for deep understanding\n",
-    "- **Production:** Proper organization like PyTorch's `torch.Tensor`\n",
-    "- **Consistency:** All tensor operations live together in `core.tensor`\n",
-    "- **Foundation:** Every other module depends on Tensor"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "231bde56",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 1: What is a Tensor?\n",
-    "\n",
-    "### Definition\n",
-    "A **tensor** is an N-dimensional array with ML-specific operations. Think of it as a container that can hold data in multiple dimensions:\n",
-    "\n",
-    "- **Scalar** (0D): A single number - `5.0`\n",
-    "- **Vector** (1D): A list of numbers - `[1, 2, 3]`  \n",
-    "- **Matrix** (2D): A 2D array - `[[1, 2], [3, 4]]`\n",
-    "- **Higher dimensions**: 3D, 4D, etc. for images, video, batches\n",
-    "\n",
-    "### The Mathematical Foundation: From Scalars to Tensors\n",
-    "Understanding tensors requires building from mathematical fundamentals:\n",
-    "\n",
-    "#### **Scalars (Rank 0)**\n",
-    "- **Definition**: A single number with no direction\n",
-    "- **Examples**: Temperature (25°C), mass (5.2 kg), probability (0.7)\n",
-    "- **Operations**: Addition, multiplication, comparison\n",
-    "- **ML Context**: Loss values, learning rates, regularization parameters\n",
-    "\n",
-    "#### **Vectors (Rank 1)**\n",
-    "- **Definition**: An ordered list of numbers with direction and magnitude\n",
-    "- **Examples**: Position [x, y, z], RGB color [255, 128, 0], word embedding [0.1, -0.5, 0.8]\n",
-    "- **Operations**: Dot product, cross product, norm calculation\n",
-    "- **ML Context**: Feature vectors, gradients, model parameters\n",
-    "\n",
-    "#### **Matrices (Rank 2)**\n",
-    "- **Definition**: A 2D array organizing data in rows and columns\n",
-    "- **Examples**: Image (height × width), weight matrix (input × output), covariance matrix\n",
-    "- **Operations**: Matrix multiplication, transpose, inverse, eigendecomposition\n",
-    "- **ML Context**: Linear layer weights, attention matrices, batch data\n",
-    "\n",
-    "#### **Higher-Order Tensors (Rank 3+)**\n",
-    "- **Definition**: Multi-dimensional arrays extending matrices\n",
-    "- **Examples**: \n",
-    "  - **3D**: Video frames (time × height × width), RGB images (height × width × channels)\n",
-    "  - **4D**: Image batches (batch × height × width × channels)\n",
-    "  - **5D**: Video batches (batch × time × height × width × channels)\n",
-    "- **Operations**: Tensor products, contractions, decompositions\n",
-    "- **ML Context**: Convolutional features, RNN states, transformer attention\n",
-    "\n",
-    "### Why Tensors Matter in ML: The Computational Foundation\n",
-    "\n",
-    "#### **1. Unified Data Representation**\n",
-    "Tensors provide a consistent way to represent all ML data:\n",
-    "```python\n",
-    "# All of these are tensors with different shapes\n",
-    "scalar_loss = Tensor(0.5)              # Shape: ()\n",
-    "feature_vector = Tensor([1, 2, 3])      # Shape: (3,)\n",
-    "weight_matrix = Tensor([[1, 2], [3, 4]]) # Shape: (2, 2)\n",
-    "image_batch = Tensor(np.random.rand(32, 224, 224, 3)) # Shape: (32, 224, 224, 3)\n",
-    "```\n",
-    "\n",
-    "#### **2. Efficient Batch Processing**\n",
-    "ML systems process multiple samples simultaneously:\n",
-    "```python\n",
-    "# Instead of processing one image at a time:\n",
-    "for image in images:\n",
-    "    result = model(image)  # Slow: one separate call per image\n",
-    "\n",
-    "# Process entire batch at once:\n",
-    "batch_result = model(image_batch)  # Fast: 1 vectorized operation\n",
-    "```\n",
-    "\n",
-    "#### **3. Hardware Acceleration**\n",
-    "Modern hardware (GPUs, TPUs) excels at tensor operations:\n",
-    "- **Parallel processing**: Multiple operations simultaneously\n",
-    "- **Vectorization**: SIMD (Single Instruction, Multiple Data) operations\n",
-    "- **Memory optimization**: Contiguous memory layout for cache efficiency\n",
-    "\n",
-    "#### **4. Automatic Differentiation**\n",
-    "Tensors enable gradient computation through computational graphs:\n",
-    "```python\n",
-    "# Conceptually, each tensor operation creates a node in the computation graph\n",
-    "x = Tensor([1, 2, 3])\n",
-    "y = x * 2          # Node: multiplication\n",
-    "z = y + 1          # Node: addition\n",
-    "loss = z.sum()     # Node: summation\n",
-    "# Gradients flow backward through this graph\n",
-    "```\n",
-    "\n",
-    "### Real-World Examples: Tensors in Action\n",
-    "\n",
-    "#### **Computer Vision**\n",
-    "- **Grayscale image**: 2D tensor `(height, width)` - `(28, 28)` for MNIST\n",
-    "- **Color image**: 3D tensor `(height, width, channels)` - `(224, 224, 3)` for RGB\n",
-    "- **Image batch**: 4D tensor `(batch, height, width, channels)` - `(32, 224, 224, 3)`\n",
-    "- **Video**: 5D tensor `(batch, time, height, width, channels)`\n",
-    "\n",
-    "#### **Natural Language Processing**\n",
-    "- **Word embedding**: 1D tensor `(embedding_dim,)` - `(300,)` for Word2Vec\n",
-    "- **Sentence**: 2D tensor `(sequence_length, embedding_dim)` - `(50, 768)` for BERT\n",
-    "- **Batch of sentences**: 3D tensor `(batch, sequence_length, embedding_dim)`\n",
-    "\n",
-    "#### **Audio Processing**\n",
-    "- **Audio signal**: 1D tensor `(time_steps,)` - `(16000,)` for 1 second at 16kHz\n",
-    "- **Spectrogram**: 2D tensor `(time_frames, frequency_bins)`\n",
-    "- **Batch of audio**: 3D tensor `(batch, time_steps, features)`\n",
-    "\n",
-    "#### **Time Series**\n",
-    "- **Single series**: 2D tensor `(time_steps, features)`\n",
-    "- **Multiple series**: 3D tensor `(batch, time_steps, features)`\n",
-    "- **Multivariate forecasting**: 4D tensor `(batch, time_steps, features, predictions)`\n",
-    "\n",
-    "### Why Not Just Use NumPy?\n",
-    "\n",
-    "While we use NumPy internally, our Tensor class adds ML-specific functionality:\n",
-    "\n",
-    "#### **1. ML-Specific Operations**\n",
-    "- **Gradient tracking**: For automatic differentiation (coming in Module 7)\n",
-    "- **GPU support**: For hardware acceleration (future extension)\n",
-    "- **Broadcasting semantics**: ML-friendly dimension handling\n",
-    "\n",
-    "#### **2. Consistent API**\n",
-    "- **Type safety**: Predictable behavior across operations\n",
-    "- **Error checking**: Clear error messages for debugging\n",
-    "- **Integration**: Seamless work with other TinyTorch components\n",
-    "\n",
-    "#### **3. Educational Value**\n",
-    "- **Conceptual clarity**: Understand what tensors really are\n",
-    "- **Implementation insight**: See how frameworks work internally\n",
-    "- **Debugging skills**: Trace through tensor operations step by step\n",
-    "\n",
-    "#### **4. Extensibility**\n",
-    "- **Future features**: Ready for gradients, GPU, distributed computing\n",
-    "- **Customization**: Add domain-specific operations\n",
-    "- **Optimization**: Profile and optimize specific use cases\n",
-    "\n",
-    "### Performance Considerations: Building Efficient Tensors\n",
-    "\n",
-    "#### **Memory Layout**\n",
-    "- **Contiguous arrays**: Better cache locality and performance\n",
-    "- **Data types**: `float32` vs `float64` trade-offs\n",
-    "- **Memory sharing**: Avoid unnecessary copies\n",
-    "\n",
-    "#### **Vectorization**\n",
-    "- **SIMD operations**: Single Instruction, Multiple Data\n",
-    "- **Broadcasting**: Efficient operations on different shapes\n",
-    "- **Batch operations**: Process multiple samples simultaneously\n",
-    "\n",
-    "#### **Numerical Stability**\n",
-    "- **Precision**: Balancing speed and accuracy\n",
-    "- **Overflow/underflow**: Handling extreme values\n",
-    "- **Gradient flow**: Maintaining numerical stability for training\n",
-    "\n",
-    "Let's start building our tensor foundation!"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5a2e60a3",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧠 The Mathematical Foundation\n",
-    "\n",
-    "### Linear Algebra Refresher\n",
-    "Tensors are generalizations of scalars, vectors, and matrices:\n",
-    "\n",
-    "```\n",
-    "Scalar (0D): 5\n",
-    "Vector (1D): [1, 2, 3]\n",
-    "Matrix (2D): [[1, 2], [3, 4]]\n",
-    "Tensor (3D): [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]\n",
-    "```\n",
-    "\n",
-    "### Why This Matters for Neural Networks\n",
-    "- **Forward Pass**: Matrix multiplication between layers\n",
-    "- **Batch Processing**: Multiple samples processed simultaneously\n",
-    "- **Convolutions**: 3D operations on image data\n",
-    "- **Gradients**: Derivatives computed across all dimensions\n",
-    "\n",
-    "### Connection to Real ML Systems\n",
-    "Every major ML framework uses tensors:\n",
-    "- **PyTorch**: `torch.Tensor`\n",
-    "- **TensorFlow**: `tf.Tensor`\n",
-    "- **JAX**: `jax.numpy.ndarray`\n",
-    "- **TinyTorch**: `tinytorch.core.tensor.Tensor` (what we're building!)\n",
-    "\n",
-    "### Performance Considerations\n",
-    "- **Memory Layout**: Contiguous arrays for cache efficiency\n",
-    "- **Vectorization**: SIMD operations for speed\n",
-    "- **Broadcasting**: Efficient operations on different shapes\n",
-    "- **Type Consistency**: Avoiding unnecessary conversions"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d543ca4a",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 2: The Tensor Class Foundation\n",
-    "\n",
-    "### Core Concept: Wrapping NumPy with ML Intelligence\n",
-    "Our Tensor class wraps NumPy arrays with ML-specific functionality. This design pattern is used by all major ML frameworks:\n",
-    "\n",
-    "- **PyTorch**: `torch.Tensor` wraps ATen (C++ tensor library)\n",
-    "- **TensorFlow**: `tf.Tensor` wraps Eigen (C++ linear algebra library)\n",
-    "- **JAX**: `jax.numpy.ndarray` wraps XLA (Google's linear algebra compiler)\n",
-    "- **TinyTorch**: `Tensor` wraps NumPy (Python's numerical computing library)\n",
-    "\n",
-    "### Design Requirements Analysis\n",
-    "\n",
-    "#### **1. Input Flexibility**\n",
-    "Our tensor must handle diverse input types:\n",
-    "```python\n",
-    "# Scalars (Python numbers)\n",
-    "t1 = Tensor(5)           # int → numpy array\n",
-    "t2 = Tensor(3.14)        # float → numpy array\n",
-    "\n",
-    "# Lists (Python sequences)\n",
-    "t3 = Tensor([1, 2, 3])   # list → numpy array\n",
-    "t4 = Tensor([[1, 2], [3, 4]])  # nested list → 2D array\n",
-    "\n",
-    "# NumPy arrays (existing arrays)\n",
-    "t5 = Tensor(np.array([1, 2, 3]))  # array → tensor wrapper\n",
-    "```\n",
-    "\n",
-    "#### **2. Type Management**\n",
-    "ML systems need consistent, predictable types:\n",
-    "- **Default behavior**: Auto-detect appropriate types\n",
-    "- **Explicit control**: Allow manual type specification\n",
-    "- **Performance optimization**: Prefer `float32` over `float64`\n",
-    "- **Memory efficiency**: Use appropriate precision\n",
-    "\n",
-    "#### **3. Property Access**\n",
-    "Essential tensor properties for ML operations:\n",
-    "- **Shape**: Dimensions for compatibility checking\n",
-    "- **Size**: Total elements for memory estimation\n",
-    "- **Data type**: For numerical computation planning\n",
-    "- **Data access**: For integration with other libraries\n",
-    "\n",
-    "#### **4. Arithmetic Operations**\n",
-    "Support for mathematical operations:\n",
-    "- **Element-wise**: Addition, multiplication, subtraction, division\n",
-    "- **Broadcasting**: Operations on different shapes\n",
-    "- **Type promotion**: Consistent result types\n",
-    "- **Error handling**: Clear messages for incompatible operations\n",
-    "\n",
-    "### Implementation Strategy\n",
-    "\n",
-    "#### **Memory Management**\n",
-    "- **Copy vs. Reference**: When to copy data vs. share memory\n",
-    "- **Type conversion**: Efficient dtype changes\n",
-    "- **Contiguous layout**: Ensure optimal memory access patterns\n",
-    "\n",
-    "#### **Error Handling**\n",
-    "- **Input validation**: Check for valid input types\n",
-    "- **Shape compatibility**: Verify operations are mathematically valid\n",
-    "- **Informative messages**: Help users debug issues quickly\n",
-    "\n",
-    "#### **Performance Optimization**\n",
-    "- **Lazy evaluation**: Defer expensive operations when possible\n",
-    "- **Vectorization**: Use NumPy's optimized operations\n",
-    "- **Memory reuse**: Minimize unnecessary allocations\n",
-    "\n",
-    "### Learning Objectives for Implementation\n",
-    "\n",
-    "By implementing this Tensor class, you'll learn:\n",
-    "1. **Wrapper pattern**: How to extend existing libraries\n",
-    "2. **Type system design**: Managing data types in numerical computing\n",
-    "3. **API design**: Creating intuitive, consistent interfaces\n",
-    "4. **Performance considerations**: Balancing flexibility and speed\n",
-    "5. **Error handling**: Providing helpful feedback to users\n",
-    "\n",
-    "Let's implement our tensor foundation!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "0b3fe49f",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "tensor-class",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class Tensor:\n",
-    "    \"\"\"\n",
-    "    TinyTorch Tensor: N-dimensional array with ML operations.\n",
-    "    \n",
-    "    The fundamental data structure for all TinyTorch operations.\n",
-    "    Wraps NumPy arrays with ML-specific functionality.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, data: Union[int, float, List, np.ndarray], dtype: Optional[str] = None):\n",
-    "        \"\"\"\n",
-    "        Create a new tensor from data.\n",
-    "        \n",
-    "        Args:\n",
-    "            data: Input data (scalar, list, or numpy array)\n",
-    "            dtype: Data type ('float32', 'int32', etc.). Defaults to auto-detect.\n",
-    "            \n",
-    "        TODO: Implement tensor creation with proper type handling.\n",
-    "        \n",
-    "        STEP-BY-STEP:\n",
-    "        1. Check if data is a scalar (int/float) - convert to numpy array\n",
-    "        2. Check if data is a list - convert to numpy array  \n",
-    "        3. Check if data is already a numpy array - copy it (float64 becomes float32)\n",
-    "        4. Apply dtype conversion if specified\n",
-    "        5. Store the result in self._data\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        Tensor(5) → stores np.array(5)\n",
-    "        Tensor([1, 2, 3]) → stores np.array([1, 2, 3])\n",
-    "        Tensor(np.array([1, 2, 3])) → stores a copy of the array\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use isinstance() to check data types\n",
-    "        - Use np.array() for conversion\n",
-    "        - Handle dtype parameter for type conversion\n",
-    "        - Store the array in self._data\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Convert input to numpy array\n",
-    "        if isinstance(data, (int, float, np.number)):\n",
-    "            # Handle Python and NumPy scalars\n",
-    "            if dtype is None:\n",
-    "                # Auto-detect type: int32 for integers, float32 for floats\n",
-    "                if isinstance(data, int) or (isinstance(data, np.number) and np.issubdtype(type(data), np.integer)):\n",
-    "                    dtype = 'int32'\n",
-    "                else:\n",
-    "                    dtype = 'float32'\n",
-    "            self._data = np.array(data, dtype=dtype)\n",
-    "        elif isinstance(data, list):\n",
-    "            # Let NumPy auto-detect type, then convert if needed\n",
-    "            temp_array = np.array(data)\n",
-    "            if dtype is None:\n",
-    "                # Use NumPy's auto-detected type, but prefer float32 for floats\n",
-    "                if temp_array.dtype == np.float64:\n",
-    "                    dtype = 'float32'\n",
-    "                else:\n",
-    "                    dtype = str(temp_array.dtype)\n",
-    "            self._data = np.array(data, dtype=dtype)\n",
-    "        elif isinstance(data, np.ndarray):\n",
-    "            # Already a numpy array\n",
-    "            if dtype is None:\n",
-    "                # Keep existing dtype, but prefer float32 for float64\n",
-    "                if data.dtype == np.float64:\n",
-    "                    dtype = 'float32'\n",
-    "                else:\n",
-    "                    dtype = str(data.dtype)\n",
-    "            self._data = data.astype(dtype) if dtype != data.dtype else data.copy()\n",
-    "        else:\n",
-    "            # Try to convert unknown types\n",
-    "            self._data = np.array(data, dtype=dtype)\n",
-    "        ### END SOLUTION\n",
-    "\n",
-    "    @property\n",
-    "    def data(self) -> np.ndarray:\n",
-    "        \"\"\"\n",
-    "        Access underlying numpy array.\n",
-    "        \n",
-    "        TODO: Return the stored numpy array.\n",
-    "        \n",
-    "        HINT: Return self._data (the array you stored in __init__)\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        return self._data\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    @property\n",
-    "    def shape(self) -> Tuple[int, ...]:\n",
-    "        \"\"\"\n",
-    "        Get tensor shape.\n",
-    "        \n",
-    "        TODO: Return the shape of the stored numpy array.\n",
-    "        \n",
-    "        HINT: Use .shape attribute of the numpy array\n",
-    "        EXAMPLE: Tensor([1, 2, 3]).shape should return (3,)\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        return self._data.shape\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    @property\n",
-    "    def size(self) -> int:\n",
-    "        \"\"\"\n",
-    "        Get total number of elements.\n",
-    "        \n",
-    "        TODO: Return the total number of elements in the tensor.\n",
-    "        \n",
-    "        HINT: Use .size attribute of the numpy array\n",
-    "        EXAMPLE: Tensor([1, 2, 3]).size should return 3\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        return self._data.size\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    @property\n",
-    "    def dtype(self) -> np.dtype:\n",
-    "        \"\"\"\n",
-    "        Get data type as numpy dtype.\n",
-    "        \n",
-    "        TODO: Return the data type of the stored numpy array.\n",
-    "        \n",
-    "        HINT: Use .dtype attribute of the numpy array\n",
-    "        EXAMPLE: Tensor([1, 2, 3]).dtype returns NumPy's default integer dtype (dtype('int64') on most platforms)\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        return self._data.dtype\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def __repr__(self) -> str:\n",
-    "        \"\"\"\n",
-    "        String representation.\n",
-    "        \n",
-    "        TODO: Create a clear string representation of the tensor.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Convert the numpy array to a list for readable output\n",
-    "        2. Include the shape and dtype information\n",
-    "        3. Format: \"Tensor([data], shape=shape, dtype=dtype)\"\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        Tensor([1, 2, 3]) → \"Tensor([1, 2, 3], shape=(3,), dtype=int64)\" (on most platforms)\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use .tolist() to convert numpy array to list\n",
-    "        - Include shape and dtype information\n",
-    "        - Keep format consistent and readable\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        return f\"Tensor({self._data.tolist()}, shape={self.shape}, dtype={self.dtype})\"\n",
-    "        ### END SOLUTION\n",
-    "\n",
-    "    def add(self, other: 'Tensor') -> 'Tensor':\n",
-    "        \"\"\"\n",
-    "        Add two tensors element-wise.\n",
-    "        \n",
-    "        TODO: Implement tensor addition.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Add the numpy arrays using +\n",
-    "        2. Return a new Tensor with the result\n",
-    "        3. Handle broadcasting automatically\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        Tensor([1, 2]).add(Tensor([3, 4])) → Tensor([4, 6])\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use self._data + other._data\n",
-    "        - Return Tensor(result)\n",
-    "        - NumPy handles broadcasting automatically\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        result = self._data + other._data\n",
-    "        return Tensor(result)\n",
-    "        ### END SOLUTION\n",
-    "\n",
-    "    def multiply(self, other: 'Tensor') -> 'Tensor':\n",
-    "        \"\"\"\n",
-    "        Multiply two tensors element-wise.\n",
-    "        \n",
-    "        TODO: Implement tensor multiplication.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Multiply the numpy arrays using *\n",
-    "        2. Return a new Tensor with the result\n",
-    "        3. Handle broadcasting automatically\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        Tensor([1, 2]).multiply(Tensor([3, 4])) → Tensor([3, 8])\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use self._data * other._data\n",
-    "        - Return Tensor(result)\n",
-    "        - This is element-wise, not matrix multiplication\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        result = self._data * other._data\n",
-    "        return Tensor(result)\n",
-    "        ### END SOLUTION\n",
-    "\n",
-    "    def __add__(self, other: Union['Tensor', int, float]) -> 'Tensor':\n",
-    "        \"\"\"\n",
-    "        Addition operator: tensor + other\n",
-    "        \n",
-    "        TODO: Implement + operator for tensors.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. If other is a Tensor, use tensor addition\n",
-    "        2. If other is a scalar, convert to Tensor first\n",
-    "        3. Return the result\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        Tensor([1, 2]) + Tensor([3, 4]) → Tensor([4, 6])\n",
-    "        Tensor([1, 2]) + 5 → Tensor([6, 7])\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        if isinstance(other, Tensor):\n",
-    "            return self.add(other)\n",
-    "        else:\n",
-    "            return self.add(Tensor(other))\n",
-    "        ### END SOLUTION\n",
-    "\n",
-    "    def __mul__(self, other: Union['Tensor', int, float]) -> 'Tensor':\n",
-    "        \"\"\"\n",
-    "        Multiplication operator: tensor * other\n",
-    "        \n",
-    "        TODO: Implement * operator for tensors.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. If other is a Tensor, use tensor multiplication\n",
-    "        2. If other is a scalar, convert to Tensor first\n",
-    "        3. Return the result\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        Tensor([1, 2]) * Tensor([3, 4]) → Tensor([3, 8])\n",
-    "        Tensor([1, 2]) * 3 → Tensor([3, 6])\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        if isinstance(other, Tensor):\n",
-    "            return self.multiply(other)\n",
-    "        else:\n",
-    "            return self.multiply(Tensor(other))\n",
-    "        ### END SOLUTION\n",
-    "\n",
-    "    def __sub__(self, other: Union['Tensor', int, float]) -> 'Tensor':\n",
-    "        \"\"\"\n",
-    "        Subtraction operator: tensor - other\n",
-    "        \n",
-    "        TODO: Implement - operator for tensors.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. If other is a Tensor, use its underlying numpy array; otherwise use the scalar directly\n",
-    "        2. Subtract using numpy arrays\n",
-    "        3. Return new Tensor with result\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        Tensor([5, 6]) - Tensor([1, 2]) → Tensor([4, 4])\n",
-    "        Tensor([5, 6]) - 1 → Tensor([4, 5])\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        if isinstance(other, Tensor):\n",
-    "            result = self._data - other._data\n",
-    "        else:\n",
-    "            result = self._data - other\n",
-    "        return Tensor(result)\n",
-    "        ### END SOLUTION\n",
-    "\n",
-    "    def __truediv__(self, other: Union['Tensor', int, float]) -> 'Tensor':\n",
-    "        \"\"\"\n",
-    "        Division operator: tensor / other\n",
-    "        \n",
-    "        TODO: Implement / operator for tensors.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. If other is a Tensor, use its underlying numpy array; otherwise use the scalar directly\n",
-    "        2. Divide using numpy arrays\n",
-    "        3. Return new Tensor with result\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        Tensor([6, 8]) / Tensor([2, 4]) → Tensor([3.0, 2.0])\n",
-    "        Tensor([6, 8]) / 2 → Tensor([3.0, 4.0])\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        if isinstance(other, Tensor):\n",
-    "            result = self._data / other._data\n",
-    "        else:\n",
-    "            result = self._data / other\n",
-    "        return Tensor(result)\n",
-    "        ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f9b2247b",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "### 🧪 Unit Test: Tensor Creation\n",
-    "\n",
-    "Let's test your tensor creation implementation right away! This gives you immediate feedback on whether your `__init__` method works correctly.\n",
-    "\n",
-    "**This is a unit test** - it tests one specific function (tensor creation) in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "010efe49",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-tensor-creation-immediate",
-     "locked": true,
-     "points": 5,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test tensor creation immediately after implementation\n",
-    "print(\"🔬 Unit Test: Tensor Creation...\")\n",
-    "\n",
-    "# Test basic tensor creation\n",
-    "try:\n",
-    "    # Test scalar\n",
-    "    scalar = Tensor(5.0)\n",
-    "    assert hasattr(scalar, '_data'), \"Tensor should have _data attribute\"\n",
-    "    assert scalar._data.shape == (), f\"Scalar should have shape (), got {scalar._data.shape}\"\n",
-    "    print(\"✅ Scalar creation works\")\n",
-    "    \n",
-    "    # Test vector\n",
-    "    vector = Tensor([1, 2, 3])\n",
-    "    assert vector._data.shape == (3,), f\"Vector should have shape (3,), got {vector._data.shape}\"\n",
-    "    print(\"✅ Vector creation works\")\n",
-    "    \n",
-    "    # Test matrix\n",
-    "    matrix = Tensor([[1, 2], [3, 4]])\n",
-    "    assert matrix._data.shape == (2, 2), f\"Matrix should have shape (2, 2), got {matrix._data.shape}\"\n",
-    "    print(\"✅ Matrix creation works\")\n",
-    "    \n",
-    "    print(\"📈 Progress: Tensor Creation ✓\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ Tensor creation test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "print(\"🎯 Tensor creation behavior:\")\n",
-    "print(\"   Converts data to NumPy arrays\")\n",
-    "print(\"   Preserves shape and data type\")\n",
-    "print(\"   Stores in _data attribute\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "41578d61",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "### 🧪 Unit Test: Tensor Properties\n",
-    "\n",
-    "Now let's test that your tensor properties work correctly. This tests the @property methods you implemented.\n",
-    "\n",
-    "**This is a unit test** - it tests specific properties (shape, size, dtype, data) in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ec6fa2f5",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-tensor-properties-immediate",
-     "locked": true,
-     "points": 5,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test tensor properties immediately after implementation\n",
-    "print(\"🔬 Unit Test: Tensor Properties...\")\n",
-    "\n",
-    "# Test properties with simple examples\n",
-    "try:\n",
-    "    # Test with a simple matrix\n",
-    "    tensor = Tensor([[1, 2, 3], [4, 5, 6]])\n",
-    "    \n",
-    "    # Test shape property\n",
-    "    assert tensor.shape == (2, 3), f\"Shape should be (2, 3), got {tensor.shape}\"\n",
-    "    print(\"✅ Shape property works\")\n",
-    "    \n",
-    "    # Test size property\n",
-    "    assert tensor.size == 6, f\"Size should be 6, got {tensor.size}\"\n",
-    "    print(\"✅ Size property works\")\n",
-    "    \n",
-    "    # Test data property\n",
-    "    assert np.array_equal(tensor.data, np.array([[1, 2, 3], [4, 5, 6]])), \"Data property should return numpy array\"\n",
-    "    print(\"✅ Data property works\")\n",
-    "    \n",
-    "    # Test dtype property\n",
-    "    assert tensor.dtype in [np.int32, np.int64], f\"Dtype should be int32 or int64, got {tensor.dtype}\"\n",
-    "    print(\"✅ Dtype property works\")\n",
-    "    \n",
-    "    print(\"📈 Progress: Tensor Properties ✓\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ Tensor properties test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "print(\"🎯 Tensor properties behavior:\")\n",
-    "print(\"   shape: Returns tuple of dimensions\")\n",
-    "print(\"   size: Returns total number of elements\")\n",
-    "print(\"   data: Returns underlying NumPy array\")\n",
-    "print(\"   dtype: Returns NumPy data type\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e4270b31",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "### 🧪 Unit Test: Tensor Arithmetic\n",
-    "\n",
-    "Let's test your tensor arithmetic operations. This quick check covers the `__add__` and `__mul__` operators, with both tensor and scalar operands.\n",
-    "\n",
-    "**This is a unit test** - it tests specific arithmetic operations in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f64ec4db",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-tensor-arithmetic-immediate",
-     "locked": true,
-     "points": 5,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test tensor arithmetic immediately after implementation\n",
-    "print(\"🔬 Unit Test: Tensor Arithmetic...\")\n",
-    "\n",
-    "# Test basic arithmetic with simple examples\n",
-    "try:\n",
-    "    # Test addition\n",
-    "    a = Tensor([1, 2, 3])\n",
-    "    b = Tensor([4, 5, 6])\n",
-    "    result = a + b\n",
-    "    expected = np.array([5, 7, 9])\n",
-    "    assert np.array_equal(result.data, expected), f\"Addition failed: expected {expected}, got {result.data}\"\n",
-    "    print(\"✅ Addition works\")\n",
-    "    \n",
-    "    # Test scalar addition\n",
-    "    result_scalar = a + 10\n",
-    "    expected_scalar = np.array([11, 12, 13])\n",
-    "    assert np.array_equal(result_scalar.data, expected_scalar), f\"Scalar addition failed: expected {expected_scalar}, got {result_scalar.data}\"\n",
-    "    print(\"✅ Scalar addition works\")\n",
-    "    \n",
-    "    # Test multiplication\n",
-    "    result_mul = a * b\n",
-    "    expected_mul = np.array([4, 10, 18])\n",
-    "    assert np.array_equal(result_mul.data, expected_mul), f\"Multiplication failed: expected {expected_mul}, got {result_mul.data}\"\n",
-    "    print(\"✅ Multiplication works\")\n",
-    "    \n",
-    "    # Test scalar multiplication\n",
-    "    result_scalar_mul = a * 2\n",
-    "    expected_scalar_mul = np.array([2, 4, 6])\n",
-    "    assert np.array_equal(result_scalar_mul.data, expected_scalar_mul), f\"Scalar multiplication failed: expected {expected_scalar_mul}, got {result_scalar_mul.data}\"\n",
-    "    print(\"✅ Scalar multiplication works\")\n",
-    "    \n",
-    "    print(\"📈 Progress: Tensor Arithmetic ✓\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ Tensor arithmetic test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "print(\"🎯 Tensor arithmetic behavior:\")\n",
-    "print(\"   Element-wise operations on tensors\")\n",
-    "print(\"   Broadcasting with scalars\")\n",
-    "print(\"   Returns new Tensor objects\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "741633c3",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Comprehensive Test: Tensor Creation\n",
-    "\n",
-    "Let's thoroughly test your tensor creation to make sure it handles the input types and shapes you'll commonly encounter in ML.\n",
-    "This tests the foundation of everything else we'll build."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "0ab23cfa",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-tensor-creation-comprehensive",
-     "locked": true,
-     "points": 15,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_tensor_creation():\n",
-    "    \"\"\"Comprehensive test of tensor creation with all data types and shapes.\"\"\"\n",
-    "    print(\"🔬 Testing comprehensive tensor creation...\")\n",
-    "    \n",
-    "    tests_passed = 0\n",
-    "    total_tests = 8\n",
-    "    \n",
-    "    # Test 1: Scalar creation (0D tensor)\n",
-    "    try:\n",
-    "        scalar_int = Tensor(42)\n",
-    "        scalar_float = Tensor(3.14)\n",
-    "        scalar_zero = Tensor(0)\n",
-    "        \n",
-    "        assert hasattr(scalar_int, '_data'), \"Tensor should have _data attribute\"\n",
-    "        assert scalar_int._data.shape == (), f\"Scalar should have shape (), got {scalar_int._data.shape}\"\n",
-    "        assert scalar_float._data.shape == (), f\"Float scalar should have shape (), got {scalar_float._data.shape}\"\n",
-    "        assert scalar_zero._data.shape == (), f\"Zero scalar should have shape (), got {scalar_zero._data.shape}\"\n",
-    "        \n",
-    "        print(\"✅ Scalar creation: integers, floats, and zero\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Scalar creation failed: {e}\")\n",
-    "    \n",
-    "    # Test 2: Vector creation (1D tensor)\n",
-    "    try:\n",
-    "        vector_int = Tensor([1, 2, 3, 4, 5])\n",
-    "        vector_float = Tensor([1.0, 2.5, 3.7])\n",
-    "        vector_single = Tensor([42])\n",
-    "        vector_empty = Tensor([])\n",
-    "        \n",
-    "        assert vector_int._data.shape == (5,), f\"Int vector should have shape (5,), got {vector_int._data.shape}\"\n",
-    "        assert vector_float._data.shape == (3,), f\"Float vector should have shape (3,), got {vector_float._data.shape}\"\n",
-    "        assert vector_single._data.shape == (1,), f\"Single element vector should have shape (1,), got {vector_single._data.shape}\"\n",
-    "        assert vector_empty._data.shape == (0,), f\"Empty vector should have shape (0,), got {vector_empty._data.shape}\"\n",
-    "        \n",
-    "        print(\"✅ Vector creation: integers, floats, single element, and empty\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Vector creation failed: {e}\")\n",
-    "    \n",
-    "    # Test 3: Matrix creation (2D tensor)\n",
-    "    try:\n",
-    "        matrix_2x2 = Tensor([[1, 2], [3, 4]])\n",
-    "        matrix_3x2 = Tensor([[1, 2], [3, 4], [5, 6]])\n",
-    "        matrix_1x3 = Tensor([[1, 2, 3]])\n",
-    "        \n",
-    "        assert matrix_2x2._data.shape == (2, 2), f\"2x2 matrix should have shape (2, 2), got {matrix_2x2._data.shape}\"\n",
-    "        assert matrix_3x2._data.shape == (3, 2), f\"3x2 matrix should have shape (3, 2), got {matrix_3x2._data.shape}\"\n",
-    "        assert matrix_1x3._data.shape == (1, 3), f\"1x3 matrix should have shape (1, 3), got {matrix_1x3._data.shape}\"\n",
-    "        \n",
-    "        print(\"✅ Matrix creation: 2x2, 3x2, and 1x3 matrices\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Matrix creation failed: {e}\")\n",
-    "    \n",
-    "    # Test 4: Data type handling\n",
-    "    try:\n",
-    "        int_tensor = Tensor([1, 2, 3])\n",
-    "        float_tensor = Tensor([1.0, 2.0, 3.0])\n",
-    "        mixed_tensor = Tensor([1, 2.5, 3])  # Should convert to float\n",
-    "        \n",
-    "        # Check that data types are reasonable\n",
-    "        assert int_tensor._data.dtype in [np.int32, np.int64], f\"Int tensor has unexpected dtype: {int_tensor._data.dtype}\"\n",
-    "        assert float_tensor._data.dtype in [np.float32, np.float64], f\"Float tensor has unexpected dtype: {float_tensor._data.dtype}\"\n",
-    "        assert mixed_tensor._data.dtype in [np.float32, np.float64], f\"Mixed tensor should be float, got: {mixed_tensor._data.dtype}\"\n",
-    "        \n",
-    "        print(\"✅ Data type handling: integers, floats, and mixed types\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Data type handling failed: {e}\")\n",
-    "    \n",
-    "    # Test 5: NumPy array input\n",
-    "    try:\n",
-    "        np_array = np.array([1, 2, 3, 4])\n",
-    "        tensor_from_np = Tensor(np_array)\n",
-    "        \n",
-    "        assert tensor_from_np._data.shape == (4,), f\"Tensor from NumPy should have shape (4,), got {tensor_from_np._data.shape}\"\n",
-    "        assert np.array_equal(tensor_from_np._data, np_array), \"Tensor from NumPy should preserve data\"\n",
-    "        \n",
-    "        print(\"✅ NumPy array input: conversion works correctly\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ NumPy array input failed: {e}\")\n",
-    "    \n",
-    "    # Test 6: Large tensor creation\n",
-    "    try:\n",
-    "        large_tensor = Tensor(list(range(1000)))\n",
-    "        assert large_tensor._data.shape == (1000,), f\"Large tensor should have shape (1000,), got {large_tensor._data.shape}\"\n",
-    "        assert large_tensor._data[0] == 0, \"Large tensor should start with 0\"\n",
-    "        assert large_tensor._data[-1] == 999, \"Large tensor should end with 999\"\n",
-    "        \n",
-    "        print(\"✅ Large tensor creation: 1000 elements\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Large tensor creation failed: {e}\")\n",
-    "    \n",
-    "    # Test 7: Negative numbers\n",
-    "    try:\n",
-    "        negative_tensor = Tensor([-1, -2, -3])\n",
-    "        mixed_signs = Tensor([-1, 0, 1])\n",
-    "        \n",
-    "        assert negative_tensor._data.shape == (3,), f\"Negative tensor should have shape (3,), got {negative_tensor._data.shape}\"\n",
-    "        assert np.array_equal(negative_tensor._data, np.array([-1, -2, -3])), \"Negative numbers should be preserved\"\n",
-    "        assert np.array_equal(mixed_signs._data, np.array([-1, 0, 1])), \"Mixed signs should be preserved\"\n",
-    "        \n",
-    "        print(\"✅ Negative numbers: handled correctly\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Negative numbers failed: {e}\")\n",
-    "    \n",
-    "    # Test 8: Edge cases\n",
-    "    try:\n",
-    "        # Very large and very small magnitudes\n",
-    "        big_tensor = Tensor([1e6, 1e-6])\n",
-    "        assert big_tensor._data.shape == (2,), \"Extreme-magnitude tensor should have shape (2,)\"\n",
-    "        \n",
-    "        # Zero tensor\n",
-    "        zero_tensor = Tensor([0, 0, 0])\n",
-    "        assert np.all(zero_tensor._data == 0), \"Zero tensor should contain all zeros\"\n",
-    "        \n",
-    "        print(\"✅ Edge cases: large and small magnitudes, and zeros\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Edge cases failed: {e}\")\n",
-    "    \n",
-    "    # Results summary\n",
-    "    print(f\"\\n📊 Tensor Creation Results: {tests_passed}/{total_tests} tests passed\")\n",
-    "    \n",
-    "    if tests_passed == total_tests:\n",
-    "        print(\"🎉 All tensor creation tests passed! Your Tensor class can handle:\")\n",
-    "        print(\"  • Scalars, vectors, and matrices\")\n",
-    "        print(\"  • Different data types (int, float)\")\n",
-    "        print(\"  • NumPy arrays\")\n",
-    "        print(\"  • Large tensors and edge cases\")\n",
-    "        print(\"📈 Progress: Tensor Creation ✓\")\n",
-    "        return True\n",
-    "    else:\n",
-    "        print(\"⚠️  Some tensor creation tests failed. Common issues:\")\n",
-    "        print(\"  • Check your __init__ method implementation\")\n",
-    "        print(\"  • Make sure you're storing data in self._data\")\n",
-    "        print(\"  • Verify NumPy array conversion works correctly\")\n",
-    "        print(\"  • Test with different input types (int, float, list, np.array)\")\n",
-    "        return False\n",
-    "\n",
-    "# Run the comprehensive test\n",
-    "success = test_tensor_creation()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "c4ea8283",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Comprehensive Test: Tensor Properties\n",
-    "\n",
-    "Now let's test all the properties your tensor should have. These properties are essential for ML operations."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "96f34f0e",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-tensor-properties-comprehensive",
-     "locked": true,
-     "points": 15,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_tensor_properties():\n",
-    "    \"\"\"Comprehensive test of tensor properties (shape, size, dtype, data access).\"\"\"\n",
-    "    print(\"🔬 Testing comprehensive tensor properties...\")\n",
-    "    \n",
-    "    tests_passed = 0\n",
-    "    total_tests = 6\n",
-    "    \n",
-    "    # Test 1: Shape property\n",
-    "    try:\n",
-    "        scalar = Tensor(5.0)\n",
-    "        vector = Tensor([1, 2, 3])\n",
-    "        matrix = Tensor([[1, 2], [3, 4]])\n",
-    "        tensor_3d = Tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])\n",
-    "        \n",
-    "        assert scalar.shape == (), f\"Scalar shape should be (), got {scalar.shape}\"\n",
-    "        assert vector.shape == (3,), f\"Vector shape should be (3,), got {vector.shape}\"\n",
-    "        assert matrix.shape == (2, 2), f\"Matrix shape should be (2, 2), got {matrix.shape}\"\n",
-    "        assert tensor_3d.shape == (2, 2, 2), f\"3D tensor shape should be (2, 2, 2), got {tensor_3d.shape}\"\n",
-    "        \n",
-    "        print(\"✅ Shape property: scalar, vector, matrix, and 3D tensor\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Shape property failed: {e}\")\n",
-    "    \n",
-    "    # Test 2: Size property\n",
-    "    try:\n",
-    "        scalar = Tensor(5.0)\n",
-    "        vector = Tensor([1, 2, 3])\n",
-    "        matrix = Tensor([[1, 2], [3, 4]])\n",
-    "        empty = Tensor([])\n",
-    "        \n",
-    "        assert scalar.size == 1, f\"Scalar size should be 1, got {scalar.size}\"\n",
-    "        assert vector.size == 3, f\"Vector size should be 3, got {vector.size}\"\n",
-    "        assert matrix.size == 4, f\"Matrix size should be 4, got {matrix.size}\"\n",
-    "        assert empty.size == 0, f\"Empty tensor size should be 0, got {empty.size}\"\n",
-    "        \n",
-    "        print(\"✅ Size property: scalar, vector, matrix, and empty tensor\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Size property failed: {e}\")\n",
-    "    \n",
-    "    # Test 3: Data type property\n",
-    "    try:\n",
-    "        int_tensor = Tensor([1, 2, 3])\n",
-    "        float_tensor = Tensor([1.0, 2.0, 3.0])\n",
-    "        \n",
-    "        # Check that dtype is accessible and reasonable\n",
-    "        assert hasattr(int_tensor, 'dtype'), \"Tensor should have dtype property\"\n",
-    "        assert hasattr(float_tensor, 'dtype'), \"Tensor should have dtype property\"\n",
-    "        \n",
-    "        # Data types should be NumPy dtypes\n",
-    "        assert isinstance(int_tensor.dtype, np.dtype), f\"dtype should be np.dtype, got {type(int_tensor.dtype)}\"\n",
-    "        assert isinstance(float_tensor.dtype, np.dtype), f\"dtype should be np.dtype, got {type(float_tensor.dtype)}\"\n",
-    "        \n",
-    "        print(f\"✅ Data type property: int tensor is {int_tensor.dtype}, float tensor is {float_tensor.dtype}\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Data type property failed: {e}\")\n",
-    "    \n",
-    "    # Test 4: Data access property\n",
-    "    try:\n",
-    "        scalar = Tensor(5.0)\n",
-    "        vector = Tensor([1, 2, 3])\n",
-    "        matrix = Tensor([[1, 2], [3, 4]])\n",
-    "        \n",
-    "        # Test data access\n",
-    "        assert hasattr(scalar, 'data'), \"Tensor should have data property\"\n",
-    "        assert hasattr(vector, 'data'), \"Tensor should have data property\"\n",
-    "        assert hasattr(matrix, 'data'), \"Tensor should have data property\"\n",
-    "        \n",
-    "        # Test data content\n",
-    "        assert scalar.data.item() == 5.0, f\"Scalar data should be 5.0, got {scalar.data.item()}\"\n",
-    "        assert np.array_equal(vector.data, np.array([1, 2, 3])), \"Vector data mismatch\"\n",
-    "        assert np.array_equal(matrix.data, np.array([[1, 2], [3, 4]])), \"Matrix data mismatch\"\n",
-    "        \n",
-    "        print(\"✅ Data access: scalar, vector, and matrix data retrieval\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Data access failed: {e}\")\n",
-    "    \n",
-    "    # Test 5: String representation\n",
-    "    try:\n",
-    "        scalar = Tensor(5.0)\n",
-    "        vector = Tensor([1, 2, 3])\n",
-    "        \n",
-    "        # Test that __repr__ works (str() falls back to __repr__ when __str__ is not defined)\n",
-    "        scalar_str = str(scalar)\n",
-    "        vector_str = str(vector)\n",
-    "        \n",
-    "        assert isinstance(scalar_str, str), \"Tensor string representation should be a string\"\n",
-    "        assert isinstance(vector_str, str), \"Tensor string representation should be a string\"\n",
-    "        assert len(scalar_str) > 0, \"Tensor string representation should not be empty\"\n",
-    "        assert len(vector_str) > 0, \"Tensor string representation should not be empty\"\n",
-    "        \n",
-    "        print(f\"✅ String representation: scalar={scalar_str[:50]}{'...' if len(scalar_str) > 50 else ''}\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ String representation failed: {e}\")\n",
-    "    \n",
-    "    # Test 6: Property consistency\n",
-    "    try:\n",
-    "        test_cases = [\n",
-    "            Tensor(42),\n",
-    "            Tensor([1, 2, 3, 4, 5]),\n",
-    "            Tensor([[1, 2, 3], [4, 5, 6]]),\n",
-    "            Tensor([])\n",
-    "        ]\n",
-    "        \n",
-    "        for i, tensor in enumerate(test_cases):\n",
-    "            # Size should equal product of shape\n",
-    "            expected_size = np.prod(tensor.shape) if tensor.shape else 1\n",
-    "            assert tensor.size == expected_size, f\"Test case {i}: size {tensor.size} doesn't match shape {tensor.shape}\"\n",
-    "            \n",
-    "            # Data shape should match tensor shape\n",
-    "            assert tensor.data.shape == tensor.shape, f\"Test case {i}: data shape {tensor.data.shape} doesn't match tensor shape {tensor.shape}\"\n",
-    "        \n",
-    "        print(\"✅ Property consistency: size matches shape, data shape matches tensor shape\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Property consistency failed: {e}\")\n",
-    "    \n",
-    "    # Results summary\n",
-    "    print(f\"\\n📊 Tensor Properties Results: {tests_passed}/{total_tests} tests passed\")\n",
-    "    \n",
-    "    if tests_passed == total_tests:\n",
-    "        print(\"🎉 All tensor property tests passed! Your tensor has:\")\n",
-    "        print(\"  • Correct shape property for all dimensions\")\n",
-    "        print(\"  • Accurate size calculation\")\n",
-    "        print(\"  • Proper data type handling\")\n",
-    "        print(\"  • Working data access\")\n",
-    "        print(\"  • Good string representation\")\n",
-    "        print(\"📈 Progress: Tensor Creation ✓, Properties ✓\")\n",
-    "        return True\n",
-    "    else:\n",
-    "        print(\"⚠️  Some property tests failed. Common issues:\")\n",
-    "        print(\"  • Check your @property decorators\")\n",
-    "        print(\"  • Verify shape returns self._data.shape\")\n",
-    "        print(\"  • Make sure size returns self._data.size\")\n",
-    "        print(\"  • Ensure dtype returns self._data.dtype\")\n",
-    "        print(\"  • Test your __repr__ method\")\n",
-    "        return False\n",
-    "\n",
-    "# Run the comprehensive test\n",
-    "success = test_tensor_properties() and success"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b692b968",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Comprehensive Test: Tensor Arithmetic\n",
-    "\n",
-    "Let's test all arithmetic operations. These are the foundation of neural network computations!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "12022726",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-tensor-arithmetic-comprehensive",
-     "locked": true,
-     "points": 20,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_tensor_arithmetic():\n",
-    "    \"\"\"Comprehensive test of tensor arithmetic operations.\"\"\"\n",
-    "    print(\"🔬 Testing comprehensive tensor arithmetic...\")\n",
-    "    \n",
-    "    tests_passed = 0\n",
-    "    total_tests = 8\n",
-    "    \n",
-    "    # Test 1: Basic addition method\n",
-    "    try:\n",
-    "        a = Tensor([1, 2, 3])\n",
-    "        b = Tensor([4, 5, 6])\n",
-    "        c = a.add(b)\n",
-    "        \n",
-    "        expected = np.array([5, 7, 9])\n",
-    "        assert np.array_equal(c.data, expected), f\"Addition method failed: expected {expected}, got {c.data}\"\n",
-    "        assert isinstance(c, Tensor), \"Addition should return a Tensor\"\n",
-    "        \n",
-    "        print(f\"✅ Addition method: {a.data} + {b.data} = {c.data}\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Addition method failed: {e}\")\n",
-    "    \n",
-    "    # Test 2: Basic multiplication method\n",
-    "    try:\n",
-    "        a = Tensor([1, 2, 3])\n",
-    "        b = Tensor([4, 5, 6])\n",
-    "        c = a.multiply(b)\n",
-    "        \n",
-    "        expected = np.array([4, 10, 18])\n",
-    "        assert np.array_equal(c.data, expected), f\"Multiplication method failed: expected {expected}, got {c.data}\"\n",
-    "        assert isinstance(c, Tensor), \"Multiplication should return a Tensor\"\n",
-    "        \n",
-    "        print(f\"✅ Multiplication method: {a.data} * {b.data} = {c.data}\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Multiplication method failed: {e}\")\n",
-    "    \n",
-    "    # Test 3: Addition operator (+)\n",
-    "    try:\n",
-    "        a = Tensor([1, 2, 3])\n",
-    "        b = Tensor([4, 5, 6])\n",
-    "        c = a + b\n",
-    "        \n",
-    "        expected = np.array([5, 7, 9])\n",
-    "        assert np.array_equal(c.data, expected), f\"+ operator failed: expected {expected}, got {c.data}\"\n",
-    "        assert isinstance(c, Tensor), \"+ operator should return a Tensor\"\n",
-    "        \n",
-    "        print(f\"✅ + operator: {a.data} + {b.data} = {c.data}\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ + operator failed: {e}\")\n",
-    "    \n",
-    "    # Test 4: Multiplication operator (*)\n",
-    "    try:\n",
-    "        a = Tensor([1, 2, 3])\n",
-    "        b = Tensor([4, 5, 6])\n",
-    "        c = a * b\n",
-    "        \n",
-    "        expected = np.array([4, 10, 18])\n",
-    "        assert np.array_equal(c.data, expected), f\"* operator failed: expected {expected}, got {c.data}\"\n",
-    "        assert isinstance(c, Tensor), \"* operator should return a Tensor\"\n",
-    "        \n",
-    "        print(f\"✅ * operator: {a.data} * {b.data} = {c.data}\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ * operator failed: {e}\")\n",
-    "    \n",
-    "    # Test 5: Subtraction operator (-)\n",
-    "    try:\n",
-    "        a = Tensor([1, 2, 3])\n",
-    "        b = Tensor([4, 5, 6])\n",
-    "        c = b - a\n",
-    "        \n",
-    "        expected = np.array([3, 3, 3])\n",
-    "        assert np.array_equal(c.data, expected), f\"- operator failed: expected {expected}, got {c.data}\"\n",
-    "        assert isinstance(c, Tensor), \"- operator should return a Tensor\"\n",
-    "        \n",
-    "        print(f\"✅ - operator: {b.data} - {a.data} = {c.data}\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ - operator failed: {e}\")\n",
-    "    \n",
-    "    # Test 6: Division operator (/)\n",
-    "    try:\n",
-    "        a = Tensor([1, 2, 4])\n",
-    "        b = Tensor([2, 4, 8])\n",
-    "        c = b / a\n",
-    "        \n",
-    "        expected = np.array([2.0, 2.0, 2.0])\n",
-    "        assert np.allclose(c.data, expected), f\"/ operator failed: expected {expected}, got {c.data}\"\n",
-    "        assert isinstance(c, Tensor), \"/ operator should return a Tensor\"\n",
-    "        \n",
-    "        print(f\"✅ / operator: {b.data} / {a.data} = {c.data}\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ / operator failed: {e}\")\n",
-    "    \n",
-    "    # Test 7: Scalar operations\n",
-    "    try:\n",
-    "        a = Tensor([1, 2, 3])\n",
-    "        \n",
-    "        # Addition with scalar\n",
-    "        b = a + 10\n",
-    "        expected_add = np.array([11, 12, 13])\n",
-    "        assert np.array_equal(b.data, expected_add), f\"Scalar addition failed: expected {expected_add}, got {b.data}\"\n",
-    "        \n",
-    "        # Multiplication with scalar\n",
-    "        c = a * 2\n",
-    "        expected_mul = np.array([2, 4, 6])\n",
-    "        assert np.array_equal(c.data, expected_mul), f\"Scalar multiplication failed: expected {expected_mul}, got {c.data}\"\n",
-    "        \n",
-    "        # Subtraction with scalar\n",
-    "        d = a - 1\n",
-    "        expected_sub = np.array([0, 1, 2])\n",
-    "        assert np.array_equal(d.data, expected_sub), f\"Scalar subtraction failed: expected {expected_sub}, got {d.data}\"\n",
-    "        \n",
-    "        # Division with scalar\n",
-    "        e = a / 2\n",
-    "        expected_div = np.array([0.5, 1.0, 1.5])\n",
-    "        assert np.allclose(e.data, expected_div), f\"Scalar division failed: expected {expected_div}, got {e.data}\"\n",
-    "        \n",
-    "        print(\"✅ Scalar operations: +10, *2, -1, /2 all work correctly\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Scalar operations failed: {e}\")\n",
-    "    \n",
-    "    # Test 8: Matrix operations\n",
-    "    try:\n",
-    "        matrix_a = Tensor([[1, 2], [3, 4]])\n",
-    "        matrix_b = Tensor([[5, 6], [7, 8]])\n",
-    "        \n",
-    "        # Matrix addition\n",
-    "        c = matrix_a + matrix_b\n",
-    "        expected = np.array([[6, 8], [10, 12]])\n",
-    "        assert np.array_equal(c.data, expected), f\"Matrix addition failed: expected {expected}, got {c.data}\"\n",
-    "        assert c.shape == (2, 2), f\"Matrix addition should preserve shape, got {c.shape}\"\n",
-    "        \n",
-    "        # Matrix multiplication (element-wise)\n",
-    "        d = matrix_a * matrix_b\n",
-    "        expected_mul = np.array([[5, 12], [21, 32]])\n",
-    "        assert np.array_equal(d.data, expected_mul), f\"Element-wise matrix multiplication failed: expected {expected_mul}, got {d.data}\"\n",
-    "        \n",
-    "        print(\"✅ Matrix operations: 2x2 element-wise addition and multiplication\")\n",
-    "        tests_passed += 1\n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Matrix operations failed: {e}\")\n",
-    "    \n",
-    "    # Results summary\n",
-    "    print(f\"\\n📊 Tensor Arithmetic Results: {tests_passed}/{total_tests} tests passed\")\n",
-    "    \n",
-    "    if tests_passed == total_tests:\n",
-    "        print(\"🎉 All tensor arithmetic tests passed! Your tensor supports:\")\n",
-    "        print(\"  • Basic methods: add(), multiply()\")\n",
-    "        print(\"  • Python operators: +, -, *, /\")\n",
-    "        print(\"  • Scalar operations: tensor + number\")\n",
-    "        print(\"  • Matrix operations: element-wise operations\")\n",
-    "        print(\"📈 Progress: Tensor Creation ✓, Properties ✓, Arithmetic ✓\")\n",
-    "        return True\n",
-    "    else:\n",
-    "        print(\"⚠️  Some arithmetic tests failed. Common issues:\")\n",
-    "        print(\"  • Check your add() and multiply() methods\")\n",
-    "        print(\"  • Verify operator overloading (__add__, __mul__, __sub__, __truediv__)\")\n",
-    "        print(\"  • Make sure scalar operations work (convert scalar to Tensor)\")\n",
-    "        print(\"  • Test with different tensor shapes\")\n",
-    "        return False\n",
-    "\n",
-    "# Run the comprehensive test\n",
-    "success = test_tensor_arithmetic() and success"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e1bc89e2",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Comprehensive Test: Real ML Scenario\n",
-    "\n",
-    "Let's test your tensor with a realistic machine learning scenario to make sure everything works together."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c2620df1",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-tensor-comprehensive",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_tensor():\n",
-    "    \"\"\"Comprehensive test with realistic ML scenario.\"\"\"\n",
-    "    print(\"🔬 Testing tensor comprehensively with ML scenario...\")\n",
-    "    \n",
-    "    try:\n",
-    "        print(\"🧠 Simulating a simple neural network forward pass...\")\n",
-    "        \n",
-    "        # Simulate input data (batch of 2 samples, 3 features each)\n",
-    "        X = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])\n",
-    "        print(f\"📊 Input data shape: {X.shape}\")\n",
-    "        \n",
-    "        # Simulate weights (3 input features, 2 output neurons)\n",
-    "        W = Tensor([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])\n",
-    "        print(f\"🎯 Weights shape: {W.shape}\")\n",
-    "        \n",
-    "        # Simulate bias (2 output neurons)\n",
-    "        b = Tensor([0.1, 0.2])\n",
-    "        print(f\"⚖️  Bias shape: {b.shape}\")\n",
-    "        \n",
-    "        # A real linear layer computes y = X @ W + b (matrix multiplication).\n",
-    "        # This test does not use matrix multiplication; it only exercises\n",
-    "        # the element-wise operations that a linear layer is built from.\n",
-    "        \n",
-    "        # Test that we can do basic operations needed for ML\n",
-    "        sample = Tensor([1.0, 2.0, 3.0])  # Single sample\n",
-    "        weight_col = Tensor([0.1, 0.3, 0.5])  # First column of weights\n",
-    "        \n",
-    "        # First step of a dot product: element-wise products (not summed here)\n",
-    "        products = sample * weight_col  # Element-wise multiplication\n",
-    "        print(f\"✅ Element-wise multiplication works: {products.data}\")\n",
-    "        \n",
-    "        # Test addition for bias\n",
-    "        result = products + Tensor([0.1, 0.1, 0.1])\n",
-    "        print(f\"✅ Bias addition works: {result.data}\")\n",
-    "        \n",
-    "        # Test with different shapes\n",
-    "        matrix_a = Tensor([[1, 2], [3, 4]])\n",
-    "        matrix_b = Tensor([[0.1, 0.2], [0.3, 0.4]])\n",
-    "        matrix_result = matrix_a * matrix_b\n",
-    "        print(f\"✅ Matrix operations work: {matrix_result.data}\")\n",
-    "        \n",
-    "        # Test scalar operations (common in ML)\n",
-    "        scaled = sample * 0.5  # Learning rate scaling\n",
-    "        print(f\"✅ Scalar scaling works: {scaled.data}\")\n",
-    "        \n",
-    "        # Test normalization-like operations\n",
-    "        mean_val = Tensor([2.0, 2.0, 2.0])  # Simulate mean\n",
-    "        normalized = sample - mean_val\n",
-    "        print(f\"✅ Mean subtraction works: {normalized.data}\")\n",
-    "        \n",
-    "        print(\"\\n🎉 Comprehensive test passed! Your tensor class can handle:\")\n",
-    "        print(\"  • Multi-dimensional data (batches, features)\")\n",
-    "        print(\"  • Element-wise operations needed for ML\")\n",
-    "        print(\"  • Scalar operations (learning rates, normalization)\")\n",
-    "        print(\"  • Matrix operations (weights, transformations)\")\n",
-    "        print(\"📈 Progress: All tensor functionality ✓\")\n",
-    "        print(\"🚀 Ready for neural network layers!\")\n",
-    "        \n",
-    "        return True\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Comprehensive test failed: {e}\")\n",
-    "        print(\"\\n💡 This suggests an issue with:\")\n",
-    "        print(\"  • Basic tensor operations not working together\")\n",
-    "        print(\"  • Shape handling problems\")\n",
-    "        print(\"  • Arithmetic operation implementation\")\n",
-    "        print(\"  • Check your tensor creation and arithmetic methods\")\n",
-    "        return False\n",
-    "\n",
-    "# Run the comprehensive test\n",
-    "success = test_tensor() and success\n",
-    "\n",
-    "# Print final summary\n",
-    "print(f\"\\n{'='*60}\")\n",
-    "print(\"🎯 TENSOR MODULE TESTING COMPLETE\")\n",
-    "print(f\"{'='*60}\")\n",
-    "\n",
-    "if success:\n",
-    "    print(\"🎉 CONGRATULATIONS! All tensor tests passed!\")\n",
-    "    print(\"\\n✅ Your Tensor class successfully implements:\")\n",
-    "    print(\"  • Comprehensive tensor creation (scalars, vectors, matrices)\")\n",
-    "    print(\"  • All essential properties (shape, size, dtype, data access)\")\n",
-    "    print(\"  • Complete arithmetic operations (methods and operators)\")\n",
-    "    print(\"  • Scalar and matrix operations\")\n",
-    "    print(\"  • Real ML scenario compatibility\")\n",
-    "    print(\"\\n🚀 You're ready to move to the next module!\")\n",
-    "    print(\"📈 Final Progress: Tensor Module ✓ COMPLETE\")\n",
-    "else:\n",
-    "    print(\"⚠️  Some tests failed. Please review the error messages above.\")\n",
-    "    print(\"\\n🔧 To fix issues:\")\n",
-    "    print(\"  1. Check the specific test that failed\")\n",
-    "    print(\"  2. Review the error message and hints\")\n",
-    "    print(\"  3. Fix your implementation\")\n",
-    "    print(\"  4. Re-run the notebook cells\")\n",
-    "    print(\"\\n💪 Don't give up! Debugging is part of learning.\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "95e55372",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 3: Tensor Arithmetic Operations\n",
-    "\n",
-    "### Why Arithmetic Matters\n",
-    "Tensor arithmetic is the foundation of all neural network operations:\n",
-    "- **Forward pass**: Matrix multiplications and additions\n",
-    "- **Activation functions**: Element-wise operations\n",
-    "- **Loss computation**: Differences and squares\n",
-    "- **Gradient computation**: Chain rule applications\n",
-    "\n",
-    "### Operations We'll Implement\n",
-    "- **Addition**: Element-wise addition of tensors\n",
-    "- **Multiplication**: Element-wise multiplication\n",
-    "- **Python operators**: `+`, `-`, `*`, `/` for natural syntax\n",
-    "- **Broadcasting**: Handle different shapes automatically"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "892bdfc8",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 3: Tensor Arithmetic Methods\n",
-    "\n",
-    "The arithmetic methods are now part of the Tensor class above. Let's test them!"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b41da2be",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 4: Python Operator Overloading\n",
-    "\n",
-    "### Why Operator Overloading?\n",
-    "Python's magic methods allow us to use natural syntax:\n",
-    "- `a + b` instead of `a.add(b)`\n",
-    "- `a * b` instead of `a.multiply(b)`\n",
-    "- `a - b` for subtraction\n",
-    "- `a / b` for division\n",
-    "\n",
-    "This makes tensor operations feel natural and readable."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a1593a73",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 4: Operator Overloading\n",
-    "\n",
-    "The operator methods (`__add__`, `__mul__`, `__sub__`, `__truediv__`) are now part of the Tensor class above. This enables natural syntax like `a + b` and `a * b`."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e96b0ba6",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "### 🧪 Test Your Tensor Implementation\n",
-    "\n",
-    "Once you implement the Tensor class above, run these cells to test your implementation:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "3c0f07ec",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-tensor-creation",
-     "locked": true,
-     "points": 25,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test tensor creation and properties\n",
-    "print(\"🔬 Unit Test: Tensor Creation...\")\n",
-    "\n",
-    "# Test scalar creation\n",
-    "scalar = Tensor(5.0)\n",
-    "assert scalar.shape == (), f\"Scalar shape should be (), got {scalar.shape}\"\n",
-    "assert scalar.size == 1, f\"Scalar size should be 1, got {scalar.size}\"\n",
-    "assert scalar.data.item() == 5.0, f\"Scalar value should be 5.0, got {scalar.data.item()}\"\n",
-    "\n",
-    "# Test vector creation\n",
-    "vector = Tensor([1, 2, 3])\n",
-    "assert vector.shape == (3,), f\"Vector shape should be (3,), got {vector.shape}\"\n",
-    "assert vector.size == 3, f\"Vector size should be 3, got {vector.size}\"\n",
-    "assert np.array_equal(vector.data, np.array([1, 2, 3])), \"Vector data mismatch\"\n",
-    "\n",
-    "# Test matrix creation\n",
-    "matrix = Tensor([[1, 2], [3, 4]])\n",
-    "assert matrix.shape == (2, 2), f\"Matrix shape should be (2, 2), got {matrix.shape}\"\n",
-    "assert matrix.size == 4, f\"Matrix size should be 4, got {matrix.size}\"\n",
-    "assert np.array_equal(matrix.data, np.array([[1, 2], [3, 4]])), \"Matrix data mismatch\"\n",
-    "\n",
-    "# Test dtype handling\n",
-    "float_tensor = Tensor([1.0, 2.0, 3.0])\n",
-    "assert float_tensor.dtype == np.float32, f\"Float tensor dtype should be float32, got {float_tensor.dtype}\"\n",
-    "\n",
-    "int_tensor = Tensor([1, 2, 3])\n",
-    "# Note: NumPy may default to int64 on some systems, so we check for integer types\n",
-    "assert int_tensor.dtype in [np.int32, np.int64], f\"Int tensor dtype should be int32 or int64, got {int_tensor.dtype}\"\n",
-    "\n",
-    "print(\"✅ Tensor creation tests passed!\")\n",
-    "print(f\"✅ Scalar: {scalar}\")\n",
-    "print(f\"✅ Vector: {vector}\")\n",
-    "print(f\"✅ Matrix: {matrix}\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "56ac7c19",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-tensor-arithmetic",
-     "locked": true,
-     "points": 25,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test tensor arithmetic operations\n",
-    "print(\"🔬 Unit Test: Tensor Arithmetic...\")\n",
-    "\n",
-    "# Test addition\n",
-    "a = Tensor([1, 2, 3])\n",
-    "b = Tensor([4, 5, 6])\n",
-    "c = a + b\n",
-    "expected = np.array([5, 7, 9])\n",
-    "assert np.array_equal(c.data, expected), f\"Addition failed: expected {expected}, got {c.data}\"\n",
-    "\n",
-    "# Test multiplication\n",
-    "d = a * b\n",
-    "expected = np.array([4, 10, 18])\n",
-    "assert np.array_equal(d.data, expected), f\"Multiplication failed: expected {expected}, got {d.data}\"\n",
-    "\n",
-    "# Test subtraction\n",
-    "e = b - a\n",
-    "expected = np.array([3, 3, 3])\n",
-    "assert np.array_equal(e.data, expected), f\"Subtraction failed: expected {expected}, got {e.data}\"\n",
-    "\n",
-    "# Test division\n",
-    "f = b / a\n",
-    "expected = np.array([4.0, 2.5, 2.0])\n",
-    "assert np.allclose(f.data, expected), f\"Division failed: expected {expected}, got {f.data}\"\n",
-    "\n",
-    "# Test scalar operations\n",
-    "g = a + 10\n",
-    "expected = np.array([11, 12, 13])\n",
-    "assert np.array_equal(g.data, expected), f\"Scalar addition failed: expected {expected}, got {g.data}\"\n",
-    "\n",
-    "h = a * 2\n",
-    "expected = np.array([2, 4, 6])\n",
-    "assert np.array_equal(h.data, expected), f\"Scalar multiplication failed: expected {expected}, got {h.data}\"\n",
-    "\n",
-    "print(\"✅ Tensor arithmetic tests passed!\")\n",
-    "print(f\"✅ Addition: {a} + {b} = {c}\")\n",
-    "print(f\"✅ Multiplication: {a} * {b} = {d}\")\n",
-    "print(f\"✅ Subtraction: {b} - {a} = {e}\")\n",
-    "print(f\"✅ Division: {b} / {a} = {f}\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c1a1d9b9",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-tensor-broadcasting",
-     "locked": true,
-     "points": 25,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test tensor broadcasting\n",
-    "print(\"🔬 Unit Test: Tensor Broadcasting...\")\n",
-    "\n",
-    "# Test scalar broadcasting\n",
-    "matrix = Tensor([[1, 2], [3, 4]])\n",
-    "scalar = Tensor(10)\n",
-    "result = matrix + scalar\n",
-    "expected = np.array([[11, 12], [13, 14]])\n",
-    "assert np.array_equal(result.data, expected), f\"Scalar broadcasting failed: expected {expected}, got {result.data}\"\n",
-    "\n",
-    "# Test vector broadcasting\n",
-    "vector = Tensor([1, 2])\n",
-    "result = matrix + vector\n",
-    "expected = np.array([[2, 4], [4, 6]])\n",
-    "assert np.array_equal(result.data, expected), f\"Vector broadcasting failed: expected {expected}, got {result.data}\"\n",
-    "\n",
-    "# Test different shapes\n",
-    "a = Tensor([[1], [2], [3]])  # (3, 1)\n",
-    "b = Tensor([10, 20])         # (2,)\n",
-    "result = a + b\n",
-    "expected = np.array([[11, 21], [12, 22], [13, 23]])\n",
-    "assert np.array_equal(result.data, expected), f\"Shape broadcasting failed: expected {expected}, got {result.data}\"\n",
-    "\n",
-    "print(\"✅ Tensor broadcasting tests passed!\")\n",
-    "print(f\"✅ Shape broadcasting: {a} + {b} = {result}\")\n",
-    "print(\"✅ Broadcasting works correctly!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f50aa11a",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🎯 Module Summary\n",
-    "\n",
-    "Congratulations! You've successfully implemented the core Tensor class for TinyTorch:\n",
-    "\n",
-    "### What You've Accomplished\n",
-    "✅ **Tensor Creation**: Handle scalars, vectors, matrices, and higher-dimensional arrays  \n",
-    "✅ **Data Types**: Proper dtype handling with auto-detection and conversion  \n",
-    "✅ **Properties**: Shape, size, dtype, and data access  \n",
-    "✅ **Arithmetic**: Addition, multiplication, subtraction, division  \n",
-    "✅ **Operators**: Natural Python syntax with `+`, `-`, `*`, `/`  \n",
-    "✅ **Broadcasting**: Automatic shape compatibility like NumPy  \n",
-    "\n",
-    "### Key Concepts You've Learned\n",
-    "- **Tensors** are the fundamental data structure for ML systems\n",
-    "- **NumPy backend** provides efficient computation with an ML-friendly API\n",
-    "- **Operator overloading** makes tensor operations feel natural\n",
-    "- **Broadcasting** enables flexible operations between different shapes\n",
-    "- **Type safety** ensures consistent behavior across operations"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "8517b929",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧪 Module Testing\n",
-    "\n",
-    "Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
-    "\n",
-    "**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "1a502bc5",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "standardized-testing",
-     "locked": true,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# =============================================================================\n",
-    "# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
-    "# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
-    "# =============================================================================\n",
-    "\n",
-    "if __name__ == \"__main__\":\n",
-    "    from tito.tools.testing import run_module_tests_auto\n",
-    "    \n",
-    "    # Automatically discover and run all tests in this module\n",
-    "    success = run_module_tests_auto(\"Tensor\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "dee65782",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🎯 Module Summary\n",
-    "\n",
-    "Congratulations! You've successfully implemented the core Tensor class for TinyTorch:\n",
-    "\n",
-    "### What You've Accomplished\n",
-    "✅ **Tensor Creation**: Handle scalars, vectors, matrices, and higher-dimensional arrays  \n",
-    "✅ **Data Types**: Proper dtype handling with auto-detection and conversion  \n",
-    "✅ **Properties**: Shape, size, dtype, and data access  \n",
-    "✅ **Arithmetic**: Addition, multiplication, subtraction, division  \n",
-    "✅ **Operators**: Natural Python syntax with `+`, `-`, `*`, `/`  \n",
-    "✅ **Broadcasting**: Automatic shape compatibility like NumPy  \n",
-    "\n",
-    "### Key Concepts You've Learned\n",
-    "- **Tensors** are the fundamental data structure for ML systems\n",
-    "- **NumPy backend** provides efficient computation with an ML-friendly API\n",
-    "- **Operator overloading** makes tensor operations feel natural\n",
-    "- **Broadcasting** enables flexible operations between different shapes\n",
-    "- **Type safety** ensures consistent behavior across operations\n",
-    "\n",
-    "### Next Steps\n",
-    "1. **Export your code**: `tito package nbdev --export 01_tensor`\n",
-    "2. **Test your implementation**: `tito module test 01_tensor`\n",
-    "3. **Use your tensors**: \n",
-    "   ```python\n",
-    "   from tinytorch.core.tensor import Tensor\n",
-    "   t = Tensor([1, 2, 3])\n",
-    "   print(t + 5)  # Your tensor in action!\n",
-    "   ```\n",
-    "4. **Move to Module 2**: Start building activation functions!\n",
-    "\n",
-    "**Ready for the next challenge?** Let's add the mathematical functions that make neural networks powerful!"
-   ]
-  }
- ],
- "metadata": {
-  "jupytext": {
-   "main_language": "python"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/modules/source/03_activations/activations_dev.ipynb b/modules/source/03_activations/activations_dev.ipynb
deleted file mode 100644
index e356e73e..00000000
--- a/modules/source/03_activations/activations_dev.ipynb
+++ /dev/null
@@ -1,1213 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "be8121ef",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "# Activations - Nonlinearity in Neural Networks\n",
-    "\n",
-    "Welcome to the Activations module! This is where neural networks get their power through nonlinearity.\n",
-    "\n",
-    "## Learning Goals\n",
-    "- Understand why activation functions are essential for neural networks\n",
-    "- Implement four widely used activation functions: ReLU, Sigmoid, Tanh, and Softmax\n",
-    "- Visualize how activations transform data and enable complex learning\n",
-    "- See how activations work with layers to build powerful networks\n",
-    "- Master the NBGrader workflow with comprehensive testing\n",
-    "\n",
-    "## Build → Use → Understand\n",
-    "1. **Build**: Activation functions that add nonlinearity\n",
-    "2. **Use**: Transform tensors and see immediate results\n",
-    "3. **Understand**: How nonlinearity enables complex pattern learning"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "961899d6",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "activations-imports",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| default_exp core.activations\n",
-    "\n",
-    "#| export\n",
-    "import math\n",
-    "import numpy as np\n",
-    "import matplotlib.pyplot as plt\n",
-    "import os\n",
-    "import sys\n",
-    "from typing import Union, List\n",
-    "\n",
-    "# Import our Tensor class - try from package first, then from local module\n",
-    "try:\n",
-    "    from tinytorch.core.tensor import Tensor\n",
-    "except ImportError:\n",
-    "    # For development, import from local tensor module\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))\n",
-    "    from tensor_dev import Tensor"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "dad2f6a1",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "activations-setup",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| hide\n",
-    "#| export\n",
-    "def _should_show_plots():\n",
-    "    \"\"\"Check if we should show plots (disable during testing)\"\"\"\n",
-    "    # Check multiple conditions that indicate we're in test mode\n",
-    "    is_pytest = (\n",
-    "        'pytest' in sys.modules or\n",
-    "        'test' in sys.argv or\n",
-    "        os.environ.get('PYTEST_CURRENT_TEST') is not None or\n",
-    "        any('test' in arg for arg in sys.argv) or\n",
-    "        any('pytest' in arg for arg in sys.argv)\n",
-    "    )\n",
-    "    \n",
-    "    # Show plots in development mode (when not in test mode)\n",
-    "    return not is_pytest"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e4d45d18",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "activations-welcome",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "print(\"🔥 TinyTorch Activations Module\")\n",
-    "print(f\"NumPy version: {np.__version__}\")\n",
-    "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
-    "print(\"Ready to build activation functions!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a504e812",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 📦 Where This Code Lives in the Final Package\n",
-    "\n",
-    "**Learning Side:** You work in `modules/source/03_activations/activations_dev.py`  \n",
-    "**Building Side:** Code exports to `tinytorch.core.activations`\n",
-    "\n",
-    "```python\n",
-    "# Final package structure:\n",
-    "from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax\n",
-    "from tinytorch.core.tensor import Tensor  # Foundation\n",
-    "from tinytorch.core.layers import Dense  # Uses activations\n",
-    "```\n",
-    "\n",
-    "**Why this matters:**\n",
-    "- **Learning:** Focused modules for deep understanding\n",
-    "- **Production:** Proper organization like PyTorch's `torch.nn.ReLU`\n",
-    "- **Consistency:** All activation functions live together in `core.activations`\n",
-    "- **Integration:** Works seamlessly with tensors and layers"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b2e5d721",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## What Are Activation Functions?\n",
-    "\n",
-    "### The Problem: Linear Limitations\n",
-    "Without activation functions, neural networks can only learn linear relationships:\n",
-    "```\n",
-    "y = W₁ · (W₂ · (W₃ · x + b₃) + b₂) + b₁\n",
-    "```\n",
-    "\n",
-    "This simplifies to just:\n",
-    "```\n",
-    "y = W_combined · x + b_combined\n",
-    "```\n",
-    "\n",
-    "**A single linear function!** No matter how many layers you add, you can't learn complex patterns like:\n",
-    "- Image recognition (nonlinear pixel relationships)\n",
-    "- Language understanding (nonlinear word relationships) \n",
-    "- Game playing (nonlinear strategy relationships)\n",
-    "\n",
-    "### The Solution: Nonlinearity\n",
-    "Activation functions add nonlinearity between layers:\n",
-    "```\n",
-    "y = W₁ · f(W₂ · f(W₃ · x + b₃) + b₂) + b₁\n",
-    "```\n",
-    "\n",
-    "Now each layer can learn complex transformations!\n",
-    "\n",
-    "### Real-World Impact\n",
-    "- **Before activations**: Only linear classifiers (logistic regression)\n",
-    "- **After activations**: Complex pattern recognition (deep learning revolution)\n",
-    "\n",
-    "### What We'll Build\n",
-    "1. **ReLU**: The foundation of modern deep learning\n",
-    "2. **Sigmoid**: Classic activation for binary classification\n",
-    "3. **Tanh**: Centered activation for better gradients\n",
-    "4. **Softmax**: Probability distributions for multi-class classification"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b4d72ee2",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 1: ReLU - The Foundation of Deep Learning\n",
-    "\n",
-    "### What is ReLU?\n",
-    "**ReLU (Rectified Linear Unit)** is the most widely used activation function in modern deep learning:\n",
-    "\n",
-    "```\n",
-    "f(x) = max(0, x)\n",
-    "```\n",
-    "\n",
-    "- **Positive inputs**: Pass through unchanged\n",
-    "- **Negative inputs**: Become zero\n",
-    "- **Zero**: Stays zero\n",
-    "\n",
-    "### Why ReLU Revolutionized Deep Learning\n",
-    "1. **Computational efficiency**: Just a max operation\n",
-    "2. **Mitigates vanishing gradients**: Derivative is exactly 1 for positive inputs, so gradients pass through active units unscaled\n",
-    "3. **Sparsity**: Many neurons output exactly 0\n",
-    "4. **Empirical success**: Works well in practice\n",
-    "\n",
-    "### Visual Understanding\n",
-    "```\n",
-    "Input:  [-2, -1, 0, 1, 2]\n",
-    "ReLU:   [ 0,  0, 0, 1, 2]\n",
-    "```\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Image classification**: ResNet, VGG, AlexNet\n",
-    "- **Object detection**: YOLO, R-CNN\n",
-    "- **Language models**: Transformer feedforward layers\n",
-    "- **Recommendation**: Deep collaborative filtering\n",
-    "\n",
-    "### Mathematical Properties\n",
-    "- **Derivative**: f'(x) = 1 if x > 0, else 0\n",
-    "- **Range**: [0, ∞)\n",
-    "- **Sparsity**: Outputs exactly 0 for negative inputs"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "6d036432",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "relu-class",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class ReLU:\n",
-    "    \"\"\"\n",
-    "    ReLU Activation Function: f(x) = max(0, x)\n",
-    "    \n",
-    "    The most popular activation function in deep learning.\n",
-    "    Simple, fast, and effective for most applications.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def forward(self, x):\n",
-    "        \"\"\"\n",
-    "        Apply ReLU activation: f(x) = max(0, x)\n",
-    "        \n",
-    "        TODO: Implement ReLU activation function.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. For each element in the input tensor, apply max(0, element)\n",
-    "        2. Use NumPy's maximum function for efficient element-wise operation\n",
-    "        3. Return a new tensor of the same type with the results\n",
-    "        4. Preserve the input tensor's shape\n",
-    "        \n",
-    "        EXAMPLE USAGE:\n",
-    "        ```python\n",
-    "        relu = ReLU()\n",
-    "        input_tensor = Tensor([[-2, -1, 0, 1, 2]])\n",
-    "        output = relu(input_tensor)\n",
-    "        print(output.data)  # [[0, 0, 0, 1, 2]]\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use np.maximum(0, x.data) for element-wise max with 0\n",
-    "        - Return the same type as input: return type(x)(result)\n",
-    "        - The shape should remain the same as input\n",
-    "        - Don't modify the input tensor (immutable operations)\n",
-    "        \n",
-    "        LEARNING CONNECTIONS:\n",
-    "        - This is like torch.nn.ReLU() in PyTorch\n",
-    "        - Used in virtually every modern neural network\n",
-    "        - Helps train deep networks by mitigating vanishing gradients\n",
-    "        - Creates sparse representations (many zeros)\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        result = np.maximum(0, x.data)\n",
-    "        return type(x)(result)\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def __call__(self, x):\n",
-    "        \"\"\"Make the class callable: relu(x) instead of relu.forward(x)\"\"\"\n",
-    "        return self.forward(x)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "3d92ddc5",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Test Your ReLU Implementation\n",
-    "\n",
-    "Once you implement the ReLU forward method above, run this cell to test it:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c3049a30",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-relu-immediate",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_relu_activation():\n",
-    "    \"\"\"Test ReLU activation function\"\"\"\n",
-    "    print(\"🔬 Unit Test: ReLU Activation...\")\n",
-    "\n",
-    "    # Create ReLU instance\n",
-    "    relu = ReLU()\n",
-    "\n",
-    "    # Test with mixed positive/negative values\n",
-    "    test_input = Tensor([[-2, -1, 0, 1, 2]])\n",
-    "    result = relu(test_input)\n",
-    "    expected = np.array([[0, 0, 0, 1, 2]])\n",
-    "    \n",
-    "    assert np.array_equal(result.data, expected), f\"ReLU failed: expected {expected}, got {result.data}\"\n",
-    "    \n",
-    "    # Test that negative values become zero\n",
-    "    assert np.all(result.data >= 0), \"ReLU should make all negative values zero\"\n",
-    "    \n",
-    "    # Test that positive values remain unchanged\n",
-    "    positive_input = Tensor([[1, 2, 3, 4, 5]])\n",
-    "    positive_result = relu(positive_input)\n",
-    "    assert np.array_equal(positive_result.data, positive_input.data), \"ReLU should preserve positive values\"\n",
-    "    \n",
-    "    # Test with 2D tensor\n",
-    "    matrix_input = Tensor([[-1, 2], [3, -4]])\n",
-    "    matrix_result = relu(matrix_input)\n",
-    "    matrix_expected = np.array([[0, 2], [3, 0]])\n",
-    "    assert np.array_equal(matrix_result.data, matrix_expected), \"ReLU should work with 2D tensors\"\n",
-    "    \n",
-    "    # Test shape preservation\n",
-    "    assert matrix_result.shape == matrix_input.shape, \"ReLU should preserve input shape\"\n",
-    "    \n",
-    "    print(\"✅ ReLU activation tests passed!\")\n",
-    "    print(\"✅ Negative values correctly zeroed\")\n",
-    "    print(\"✅ Positive values preserved\")\n",
-    "    print(\"✅ Shape preservation working\")\n",
-    "    print(\"✅ Works with multi-dimensional tensors\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_relu_activation()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d1ff8f4e",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 2: Sigmoid - Classic Binary Classification\n",
-    "\n",
-    "### What is Sigmoid?\n",
-    "**Sigmoid** is the classic activation function that maps any real number to (0, 1):\n",
-    "\n",
-    "```\n",
-    "f(x) = 1 / (1 + e^(-x))\n",
-    "```\n",
-    "\n",
-    "### Why Sigmoid Matters\n",
-    "1. **Probability interpretation**: Outputs between 0 and 1\n",
-    "2. **Smooth gradients**: Differentiable everywhere\n",
-    "3. **Historical importance**: Enabled early neural networks\n",
-    "4. **Binary classification**: Perfect for yes/no decisions\n",
-    "\n",
-    "### Visual Understanding\n",
-    "```\n",
-    "Input:  [-∞, -2, -1, 0, 1, 2, ∞]\n",
-    "Sigmoid:[0,  0.12, 0.27, 0.5, 0.73, 0.88, 1]\n",
-    "```\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Binary classification**: Spam detection, medical diagnosis\n",
-    "- **Gating mechanisms**: LSTM and GRU cells\n",
-    "- **Output layers**: When you need probabilities\n",
-    "- **Attention mechanisms**: Where to focus attention\n",
-    "\n",
-    "### Mathematical Properties\n",
-    "- **Range**: (0, 1)\n",
-    "- **Derivative**: f'(x) = f(x) · (1 - f(x))\n",
-    "- **Midpoint**: f(0) = 0.5 (the output is not zero-centered)\n",
-    "- **Symmetric**: f(-x) = 1 - f(x)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "96622d93",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "sigmoid-class",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class Sigmoid:\n",
-    "    \"\"\"\n",
-    "    Sigmoid Activation Function: f(x) = 1 / (1 + e^(-x))\n",
-    "    \n",
-    "    Maps any real number to the range (0, 1).\n",
-    "    Useful for binary classification and probability outputs.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def forward(self, x):\n",
-    "        \"\"\"\n",
-    "        Apply Sigmoid activation: f(x) = 1 / (1 + e^(-x))\n",
-    "        \n",
-    "        TODO: Implement Sigmoid activation function.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Compute the negative of input: -x.data\n",
-    "        2. Compute the exponential: np.exp(-x.data)\n",
-    "        3. Add 1 to the exponential: 1 + np.exp(-x.data)\n",
-    "        4. Take the reciprocal: 1 / (1 + np.exp(-x.data))\n",
-    "        5. Return as new Tensor\n",
-    "        \n",
-    "        EXAMPLE USAGE:\n",
-    "        ```python\n",
-    "        sigmoid = Sigmoid()\n",
-    "        input_tensor = Tensor([[-2, -1, 0, 1, 2]])\n",
-    "        output = sigmoid(input_tensor)\n",
-    "        print(output.data)  # [[0.119, 0.269, 0.5, 0.731, 0.881]]\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use np.exp() for exponential function\n",
-    "        - Formula: 1 / (1 + np.exp(-x.data))\n",
-    "        - Handle potential overflow with np.clip(-x.data, -500, 500)\n",
-    "        - Return the same type as the input: return type(x)(result)\n",
-    "        \n",
-    "        LEARNING CONNECTIONS:\n",
-    "        - This is like torch.nn.Sigmoid() in PyTorch\n",
-    "        - Used in binary classification output layers\n",
-    "        - Key component in LSTM and GRU gating mechanisms\n",
-    "        - Historically important for early neural networks\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Clip the negated input to [-500, 500] so np.exp cannot overflow\n",
-    "        clipped_input = np.clip(-x.data, -500, 500)\n",
-    "        result = 1 / (1 + np.exp(clipped_input))\n",
-    "        return type(x)(result)\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def __call__(self, x):\n",
-    "        \"\"\"Make the class callable: sigmoid(x) instead of sigmoid.forward(x)\"\"\"\n",
-    "        return self.forward(x)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d5fc932b",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Test Your Sigmoid Implementation\n",
-    "\n",
-    "Once you implement the Sigmoid forward method above, run this cell to test it:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "8adc7dbd",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-sigmoid-immediate",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_sigmoid_activation():\n",
-    "    \"\"\"Test Sigmoid activation function\"\"\"\n",
-    "    print(\"🔬 Unit Test: Sigmoid Activation...\")\n",
-    "\n",
-    "    # Create Sigmoid instance\n",
-    "    sigmoid = Sigmoid()\n",
-    "\n",
-    "    # Test with known values\n",
-    "    test_input = Tensor([[0]])\n",
-    "    result = sigmoid(test_input)\n",
-    "    expected = 0.5\n",
-    "    \n",
-    "    assert abs(result.data[0][0] - expected) < 1e-6, f\"Sigmoid(0) should be 0.5, got {result.data[0][0]}\"\n",
-    "    \n",
-    "    # Test with positive and negative values\n",
-    "    test_input = Tensor([[-2, -1, 0, 1, 2]])\n",
-    "    result = sigmoid(test_input)\n",
-    "    \n",
-    "    # Check that all values are between 0 and 1\n",
-    "    assert np.all(result.data > 0), \"Sigmoid output should be > 0\"\n",
-    "    assert np.all(result.data < 1), \"Sigmoid output should be < 1\"\n",
-    "    \n",
-    "    # Test symmetry: sigmoid(-x) = 1 - sigmoid(x)\n",
-    "    x_val = 1.0\n",
-    "    pos_result = sigmoid(Tensor([[x_val]]))\n",
-    "    neg_result = sigmoid(Tensor([[-x_val]]))\n",
-    "    symmetry_check = abs(pos_result.data[0][0] + neg_result.data[0][0] - 1.0)\n",
-    "    assert symmetry_check < 1e-6, \"Sigmoid should be symmetric around 0.5\"\n",
-    "    \n",
-    "    # Test with 2D tensor\n",
-    "    matrix_input = Tensor([[-1, 1], [0, 2]])\n",
-    "    matrix_result = sigmoid(matrix_input)\n",
-    "    assert matrix_result.shape == matrix_input.shape, \"Sigmoid should preserve shape\"\n",
-    "    \n",
-    "    # Test extreme values (should not overflow)\n",
-    "    extreme_input = Tensor([[-100, 100]])\n",
-    "    extreme_result = sigmoid(extreme_input)\n",
-    "    assert not np.any(np.isnan(extreme_result.data)), \"Sigmoid should handle extreme values\"\n",
-    "    assert not np.any(np.isinf(extreme_result.data)), \"Sigmoid should not produce inf values\"\n",
-    "    \n",
-    "    print(\"✅ Sigmoid activation tests passed!\")\n",
-    "    print(\"✅ Outputs correctly bounded between 0 and 1\")\n",
-    "    print(\"✅ Symmetric property verified\")\n",
-    "    print(\"✅ Handles extreme values without overflow\")\n",
-    "    print(\"✅ Shape preservation working\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_sigmoid_activation()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "13ec7799",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 3: Tanh - Centered Activation\n",
-    "\n",
-    "### What is Tanh?\n",
-    "**Tanh (Hyperbolic Tangent)** is similar to sigmoid but centered around zero:\n",
-    "\n",
-    "```\n",
-    "f(x) = (e^x - e^(-x)) / (e^x + e^(-x))\n",
-    "```\n",
-    "\n",
-    "### Why Tanh is Better Than Sigmoid\n",
-    "1. **Zero-centered**: Outputs range from -1 to 1\n",
-    "2. **Better gradients**: Helps with gradient flow in deep networks\n",
-    "3. **Faster convergence**: Less bias shift during training\n",
-    "4. **Stronger gradients**: Maximum gradient is 1 vs 0.25 for sigmoid\n",
-    "\n",
-    "### Visual Understanding\n",
-    "```\n",
-    "Input: [-∞, -2, -1, 0, 1, 2, ∞]\n",
-    "Tanh:  [-1, -0.96, -0.76, 0, 0.76, 0.96, 1]\n",
-    "```\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Hidden layers**: Better than sigmoid for internal activations\n",
-    "- **RNN cells**: Classic RNN and LSTM use tanh\n",
-    "- **Normalization**: When you need zero-centered outputs\n",
-    "- **Feature scaling**: Maps inputs to the (-1, 1) range\n",
-    "\n",
-    "### Mathematical Properties\n",
-    "- **Range**: (-1, 1)\n",
-    "- **Derivative**: f'(x) = 1 - f(x)²\n",
-    "- **Zero-centered**: f(0) = 0\n",
-    "- **Antisymmetric**: f(-x) = -f(x)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "7759b33e",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "tanh-class",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class Tanh:\n",
-    "    \"\"\"\n",
-    "    Tanh Activation Function: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))\n",
-    "    \n",
-    "    Zero-centered activation function with range (-1, 1).\n",
-    "    Better gradient properties than sigmoid.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def forward(self, x: Tensor) -> Tensor:\n",
-    "        \"\"\"\n",
-    "        Apply Tanh activation: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))\n",
-    "        \n",
-    "        TODO: Implement Tanh activation function.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Use NumPy's built-in tanh function: np.tanh(x.data)\n",
-    "        2. Alternatively, implement manually:\n",
-    "           - Compute e^x and e^(-x)\n",
-    "           - Calculate (e^x - e^(-x)) / (e^x + e^(-x))\n",
-    "        3. Return as new Tensor\n",
-    "        \n",
-    "        EXAMPLE USAGE:\n",
-    "        ```python\n",
-    "        tanh = Tanh()\n",
-    "        input_tensor = Tensor([[-2, -1, 0, 1, 2]])\n",
-    "        output = tanh(input_tensor)\n",
-    "        print(output.data)  # [[-0.964, -0.762, 0, 0.762, 0.964]]\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use np.tanh(x.data) for simplicity\n",
-    "        - Manual implementation: (np.exp(x.data) - np.exp(-x.data)) / (np.exp(x.data) + np.exp(-x.data))\n",
-    "        - If you implement it manually, avoid overflow by clipping inputs first: np.clip(x.data, -500, 500)\n",
-    "        - Return the same type as the input: return type(x)(result)\n",
-    "        \n",
-    "        LEARNING CONNECTIONS:\n",
-    "        - This is like torch.nn.Tanh() in PyTorch\n",
-    "        - Used in RNN, LSTM, and GRU cells\n",
-    "        - Better than sigmoid for hidden layers\n",
-    "        - Zero-centered outputs help with gradient flow\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Use NumPy's built-in tanh function\n",
-    "        result = np.tanh(x.data)\n",
-    "        return type(x)(result)\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def __call__(self, x: Tensor) -> Tensor:\n",
-    "        \"\"\"Make the class callable: tanh(x) instead of tanh.forward(x)\"\"\"\n",
-    "        return self.forward(x)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "23418296",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Test Your Tanh Implementation\n",
-    "\n",
-    "Once you implement the Tanh forward method above, run this cell to test it:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e237b49a",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-tanh-immediate",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_tanh_activation():\n",
-    "    \"\"\"Test Tanh activation function\"\"\"\n",
-    "    print(\"🔬 Unit Test: Tanh Activation...\")\n",
-    "\n",
-    "    # Create Tanh instance\n",
-    "    tanh = Tanh()\n",
-    "\n",
-    "    # Test with zero (should be 0)\n",
-    "    test_input = Tensor([[0]])\n",
-    "    result = tanh(test_input)\n",
-    "    expected = 0.0\n",
-    "    \n",
-    "    assert abs(result.data[0][0] - expected) < 1e-6, f\"Tanh(0) should be 0, got {result.data[0][0]}\"\n",
-    "    \n",
-    "    # Test with positive and negative values\n",
-    "    test_input = Tensor([[-2, -1, 0, 1, 2]])\n",
-    "    result = tanh(test_input)\n",
-    "    \n",
-    "    # Check that all values are between -1 and 1\n",
-    "    assert np.all(result.data > -1), \"Tanh output should be > -1\"\n",
-    "    assert np.all(result.data < 1), \"Tanh output should be < 1\"\n",
-    "    \n",
-    "    # Test antisymmetry: tanh(-x) = -tanh(x)\n",
-    "    x_val = 1.5\n",
-    "    pos_result = tanh(Tensor([[x_val]]))\n",
-    "    neg_result = tanh(Tensor([[-x_val]]))\n",
-    "    antisymmetry_check = abs(pos_result.data[0][0] + neg_result.data[0][0])\n",
-    "    assert antisymmetry_check < 1e-6, \"Tanh should be antisymmetric\"\n",
-    "    \n",
-    "    # Test with 2D tensor\n",
-    "    matrix_input = Tensor([[-1, 1], [0, 2]])\n",
-    "    matrix_result = tanh(matrix_input)\n",
-    "    assert matrix_result.shape == matrix_input.shape, \"Tanh should preserve shape\"\n",
-    "    \n",
-    "    # Test extreme values (should not overflow)\n",
-    "    extreme_input = Tensor([[-100, 100]])\n",
-    "    extreme_result = tanh(extreme_input)\n",
-    "    assert not np.any(np.isnan(extreme_result.data)), \"Tanh should handle extreme values\"\n",
-    "    assert not np.any(np.isinf(extreme_result.data)), \"Tanh should not produce inf values\"\n",
-    "    \n",
-    "    # Test that extreme values approach ±1\n",
-    "    assert abs(extreme_result.data[0][0] - (-1)) < 1e-6, \"Tanh(-∞) should approach -1\"\n",
-    "    assert abs(extreme_result.data[0][1] - 1) < 1e-6, \"Tanh(∞) should approach 1\"\n",
-    "    \n",
-    "    print(\"✅ Tanh activation tests passed!\")\n",
-    "    print(\"✅ Outputs correctly bounded between -1 and 1\")\n",
-    "    print(\"✅ Antisymmetric property verified\")\n",
-    "    print(\"✅ Zero-centered (tanh(0) = 0)\")\n",
-    "    print(\"✅ Handles extreme values correctly\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_tanh_activation()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ca26ddb0",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 4: Softmax - Probability Distributions\n",
-    "\n",
-    "### What is Softmax?\n",
-    "**Softmax** converts a vector of real numbers into a probability distribution:\n",
-    "\n",
-    "```\n",
-    "f(x_i) = e^(x_i) / Σ(e^(x_j))\n",
-    "```\n",
-    "\n",
-    "### Why Softmax is Essential\n",
-    "1. **Probability distribution**: Outputs sum to 1\n",
-    "2. **Multi-class classification**: Choose one class from many\n",
-    "3. **Interpretable**: Each output is a probability\n",
-    "4. **Differentiable**: Enables gradient-based learning\n",
-    "\n",
-    "### Visual Understanding\n",
-    "```\n",
-    "Input:  [1, 2, 3]\n",
-    "Softmax:[0.09, 0.24, 0.67]  # Sums to 1.0\n",
-    "```\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Classification**: Image classification, text classification\n",
-    "- **Language models**: Next word prediction\n",
-    "- **Attention mechanisms**: Where to focus attention\n",
-    "- **Reinforcement learning**: Action selection probabilities\n",
-    "\n",
-    "### Mathematical Properties\n",
-    "- **Range**: (0, 1) for each output\n",
-    "- **Constraint**: Σ(f(x_i)) = 1\n",
-    "- **Argmax preservation**: Doesn't change relative ordering\n",
-    "- **Temperature scaling**: Can be made sharper or softer"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "bdcc52c8",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "softmax-class",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class Softmax:\n",
-    "    \"\"\"\n",
-    "    Softmax Activation Function: f(x_i) = e^(x_i) / Σ(e^(x_j))\n",
-    "    \n",
-    "    Converts a vector of real numbers into a probability distribution.\n",
-    "    Essential for multi-class classification.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def forward(self, x):\n",
-    "        \"\"\"\n",
-    "        Apply Softmax activation: f(x_i) = e^(x_i) / Σ(e^(x_j))\n",
-    "        \n",
-    "        TODO: Implement Softmax activation function.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Handle empty input case\n",
-    "        2. Subtract the max value along the last axis for numerical stability: x - max(x)\n",
-    "        3. Compute exponentials: np.exp(x - max(x))\n",
-    "        4. Compute sum of exponentials along the last axis: np.sum(exp_values, axis=-1, keepdims=True)\n",
-    "        5. Divide each exponential by its row's sum: exp_values / sum\n",
-    "        6. Return as same tensor type as input\n",
-    "        \n",
-    "        EXAMPLE USAGE:\n",
-    "        ```python\n",
-    "        softmax = Softmax()\n",
-    "        input_tensor = Tensor([[1, 2, 3]])\n",
-    "        output = softmax(input_tensor)\n",
-    "        print(output.data)  # [[0.09, 0.24, 0.67]]\n",
-    "        print(np.sum(output.data))  # 1.0\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Handle empty case: if x.data.size == 0: return type(x)(x.data.copy())\n",
-    "        - Subtract max for numerical stability: x_shifted = x.data - np.max(x.data, axis=-1, keepdims=True)\n",
-    "        - Compute exponentials: exp_values = np.exp(x_shifted)\n",
-    "        - Sum along last axis: sum_exp = np.sum(exp_values, axis=-1, keepdims=True)\n",
-    "        - Divide: result = exp_values / sum_exp\n",
-    "        - Return same type as input: return type(x)(result)\n",
-    "        \n",
-    "        LEARNING CONNECTIONS:\n",
-    "        - This is like torch.nn.Softmax(dim=-1) in PyTorch\n",
-    "        - Used in classification output layers\n",
-    "        - Key component in attention mechanisms\n",
-    "        - Enables probability-based decision making\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Handle empty input\n",
-    "        if x.data.size == 0:\n",
-    "            return type(x)(x.data.copy())\n",
-    "        \n",
-    "        # Subtract max for numerical stability\n",
-    "        x_shifted = x.data - np.max(x.data, axis=-1, keepdims=True)\n",
-    "        \n",
-    "        # Compute exponentials\n",
-    "        exp_values = np.exp(x_shifted)\n",
-    "        \n",
-    "        # Sum along last axis\n",
-    "        sum_exp = np.sum(exp_values, axis=-1, keepdims=True)\n",
-    "        \n",
-    "        # Divide to get probabilities\n",
-    "        result = exp_values / sum_exp\n",
-    "        \n",
-    "        return type(x)(result)\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def __call__(self, x):\n",
-    "        \"\"\"Make the class callable: softmax(x) instead of softmax.forward(x)\"\"\"\n",
-    "        return self.forward(x)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "747e807d",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Test Your Softmax Implementation\n",
-    "\n",
-    "Once you implement the Softmax forward method above, run this cell to test it:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "1695c1d7",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-softmax-immediate",
-     "locked": true,
-     "points": 15,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_softmax_activation():\n",
-    "    \"\"\"Test Softmax activation function\"\"\"\n",
-    "    print(\"🔬 Unit Test: Softmax Activation...\")\n",
-    "\n",
-    "    # Create Softmax instance\n",
-    "    softmax = Softmax()\n",
-    "\n",
-    "    # Test with simple input\n",
-    "    test_input = Tensor([[1, 2, 3]])\n",
-    "    result = softmax(test_input)\n",
-    "    \n",
-    "    # Check that outputs sum to 1\n",
-    "    output_sum = np.sum(result.data)\n",
-    "    assert abs(output_sum - 1.0) < 1e-6, f\"Softmax outputs should sum to 1, got {output_sum}\"\n",
-    "    \n",
-    "    # Check that all outputs lie strictly between 0 and 1\n",
-    "    assert np.all(result.data > 0), \"Softmax outputs should be positive\"\n",
-    "    assert np.all(result.data < 1), \"Softmax outputs should be less than 1\"\n",
-    "    \n",
-    "    # Test with uniform input (should give equal probabilities)\n",
-    "    uniform_input = Tensor([[1, 1, 1]])\n",
-    "    uniform_result = softmax(uniform_input)\n",
-    "    expected_prob = 1.0 / 3.0\n",
-    "    \n",
-    "    for prob in uniform_result.data[0]:\n",
-    "        assert abs(prob - expected_prob) < 1e-6, f\"Uniform input should give equal probabilities, got {prob}\"\n",
-    "    \n",
-    "    # Test with batch input (multiple samples)\n",
-    "    batch_input = Tensor([[1, 2, 3], [4, 5, 6]])\n",
-    "    batch_result = softmax(batch_input)\n",
-    "    \n",
-    "    # Check that each row sums to 1\n",
-    "    for i in range(batch_input.shape[0]):\n",
-    "        row_sum = np.sum(batch_result.data[i])\n",
-    "        assert abs(row_sum - 1.0) < 1e-6, f\"Each row should sum to 1, row {i} sums to {row_sum}\"\n",
-    "    \n",
-    "    # Test numerical stability with large values\n",
-    "    large_input = Tensor([[1000, 1001, 1002]])\n",
-    "    large_result = softmax(large_input)\n",
-    "    \n",
-    "    assert not np.any(np.isnan(large_result.data)), \"Softmax should handle large values\"\n",
-    "    assert not np.any(np.isinf(large_result.data)), \"Softmax should not produce inf values\"\n",
-    "    \n",
-    "    large_sum = np.sum(large_result.data)\n",
-    "    assert abs(large_sum - 1.0) < 1e-6, \"Large values should still sum to 1\"\n",
-    "\n",
-    "    # Test shape preservation\n",
-    "    assert batch_result.shape == batch_input.shape, \"Softmax should preserve shape\"\n",
-    "    \n",
-    "    print(\"✅ Softmax activation tests passed!\")\n",
-    "    print(\"✅ Outputs sum to 1 (probability distribution)\")\n",
-    "    print(\"✅ All outputs are positive\")\n",
-    "    print(\"✅ Handles uniform inputs correctly\")\n",
-    "    print(\"✅ Works with batch inputs\")\n",
-    "    print(\"✅ Numerically stable with large values\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_softmax_activation()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "6b7807a6",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## 🎯 Comprehensive Test: All Activations Working Together\n",
-    "\n",
-    "### Real-World Scenario\n",
-    "Let's test how all activation functions work together in a realistic neural network scenario:\n",
-    "\n",
-    "- **Input processing**: Raw data transformation\n",
-    "- **Hidden layers**: ReLU for internal processing\n",
-    "- **Output layer**: Softmax for classification\n",
-    "- **Comparison**: See how different activations transform the same data"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "8e64dc4b",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-activations-comprehensive",
-     "locked": true,
-     "points": 15,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_activations():\n",
-    "    \"\"\"Test all activation functions working together\"\"\"\n",
-    "    print(\"🔬 Unit Test: Activation Functions Comprehensive Test...\")\n",
-    "    \n",
-    "    # Create instances of all activation functions\n",
-    "    relu = ReLU()\n",
-    "    sigmoid = Sigmoid()\n",
-    "    tanh = Tanh()\n",
-    "    softmax = Softmax()\n",
-    "    \n",
-    "    # Test data: simulating neural network layer outputs\n",
-    "    test_data = Tensor([[-2, -1, 0, 1, 2]])\n",
-    "    \n",
-    "    # Apply each activation function\n",
-    "    relu_result = relu(test_data)\n",
-    "    sigmoid_result = sigmoid(test_data)\n",
-    "    tanh_result = tanh(test_data)\n",
-    "    softmax_result = softmax(test_data)\n",
-    "    \n",
-    "    # Test that all functions preserve input shape\n",
-    "    assert relu_result.shape == test_data.shape, \"ReLU should preserve shape\"\n",
-    "    assert sigmoid_result.shape == test_data.shape, \"Sigmoid should preserve shape\"\n",
-    "    assert tanh_result.shape == test_data.shape, \"Tanh should preserve shape\"\n",
-    "    assert softmax_result.shape == test_data.shape, \"Softmax should preserve shape\"\n",
-    "    \n",
-    "    # Test that all functions return Tensor objects\n",
-    "    assert isinstance(relu_result, Tensor), \"ReLU should return Tensor\"\n",
-    "    assert isinstance(sigmoid_result, Tensor), \"Sigmoid should return Tensor\"\n",
-    "    assert isinstance(tanh_result, Tensor), \"Tanh should return Tensor\"\n",
-    "    assert isinstance(softmax_result, Tensor), \"Softmax should return Tensor\"\n",
-    "    \n",
-    "    # Test ReLU properties\n",
-    "    assert np.all(relu_result.data >= 0), \"ReLU output should be non-negative\"\n",
-    "    \n",
-    "    # Test Sigmoid properties\n",
-    "    assert np.all(sigmoid_result.data > 0), \"Sigmoid output should be positive\"\n",
-    "    assert np.all(sigmoid_result.data < 1), \"Sigmoid output should be less than 1\"\n",
-    "    \n",
-    "    # Test Tanh properties\n",
-    "    assert np.all(tanh_result.data > -1), \"Tanh output should be > -1\"\n",
-    "    assert np.all(tanh_result.data < 1), \"Tanh output should be < 1\"\n",
-    "    \n",
-    "    # Test Softmax properties\n",
-    "    softmax_sum = np.sum(softmax_result.data)\n",
-    "    assert abs(softmax_sum - 1.0) < 1e-6, \"Softmax outputs should sum to 1\"\n",
-    "    \n",
-    "    # Test chaining activations (realistic neural network scenario)\n",
-    "    # Hidden layer with ReLU\n",
-    "    hidden_output = relu(test_data)\n",
-    "    \n",
-    "    # Simulate weights via element-wise multiplication\n",
-    "    weights = Tensor([[0.5, 0.3, 0.8, 0.2, 0.7]])\n",
-    "    weighted_output = hidden_output * weights\n",
-    "    \n",
-    "    # Final layer with Softmax\n",
-    "    final_output = softmax(weighted_output)\n",
-    "    \n",
-    "    # Test that chained operations work\n",
-    "    assert isinstance(final_output, Tensor), \"Chained operations should return Tensor\"\n",
-    "    assert abs(np.sum(final_output.data) - 1.0) < 1e-6, \"Final output should be valid probability\"\n",
-    "    \n",
-    "    # Test with batch data (multiple samples)\n",
-    "    batch_data = Tensor([\n",
-    "        [-2, -1, 0, 1, 2],\n",
-    "        [1, 2, 3, 4, 5],\n",
-    "        [-1, 0, 1, 2, 3]\n",
-    "    ])\n",
-    "    \n",
-    "    batch_softmax = softmax(batch_data)\n",
-    "    \n",
-    "    # Each row should sum to 1\n",
-    "    for i in range(batch_data.shape[0]):\n",
-    "        row_sum = np.sum(batch_softmax.data[i])\n",
-    "        assert abs(row_sum - 1.0) < 1e-6, f\"Batch row {i} should sum to 1\"\n",
-    "    \n",
-    "    print(\"✅ Activation functions comprehensive tests passed!\")\n",
-    "    print(\"✅ All functions work together seamlessly\")\n",
-    "    print(\"✅ Shape preservation across all activations\")\n",
-    "    print(\"✅ Chained operations work correctly\")\n",
-    "    print(\"✅ Batch processing works for Softmax\")\n",
-    "    print(\"✅ Ready for neural network integration!\")\n",
-    "\n",
-    "# Run the comprehensive test\n",
-    "test_activations()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "4917e71a",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧪 Module Testing\n",
-    "\n",
-    "Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
-    "\n",
-    "**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "02cdd85e",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "standardized-testing",
-     "locked": true,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# =============================================================================\n",
-    "# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
-    "# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
-    "# =============================================================================\n",
-    "\n",
-    "if __name__ == \"__main__\":\n",
-    "    from tito.tools.testing import run_module_tests_auto\n",
-    "    \n",
-    "    # Automatically discover and run all tests in this module\n",
-    "    success = run_module_tests_auto(\"Activations\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "792a556f",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🎯 Module Summary: Activation Functions Mastery!\n",
-    "\n",
-    "Congratulations! You've successfully implemented all four essential activation functions:\n",
-    "\n",
-    "### ✅ What You've Built\n",
-    "- **ReLU**: The foundation of modern deep learning with sparsity and efficiency\n",
-    "- **Sigmoid**: Classic activation for binary classification and probability outputs\n",
-    "- **Tanh**: Zero-centered activation, which often gives better-behaved gradients than Sigmoid\n",
-    "- **Softmax**: Probability distribution for multi-class classification\n",
-    "\n",
-    "### ✅ Key Learning Outcomes\n",
-    "- **Understanding**: Why nonlinearity is essential for neural networks\n",
-    "- **Implementation**: Built activation functions from scratch using NumPy\n",
-    "- **Testing**: Progressive validation with immediate feedback after each function\n",
-    "- **Integration**: Saw how activations work together in neural networks\n",
-    "- **Real-world context**: Understanding where each activation is used\n",
-    "\n",
-    "### ✅ Mathematical Mastery\n",
-    "- **ReLU**: f(x) = max(0, x) - Simple but powerful\n",
-    "- **Sigmoid**: f(x) = 1/(1 + e^(-x)) - Maps to (0,1)\n",
-    "- **Tanh**: f(x) = tanh(x) - Zero-centered, maps to (-1,1)\n",
-    "- **Softmax**: f(x_i) = e^(x_i)/Σ(e^(x_j)) - Probability distribution\n",
-    "\n",
-    "### ✅ Professional Skills Developed\n",
-    "- **Numerical stability**: Handling overflow and underflow\n",
-    "- **API design**: Consistent interfaces across all functions\n",
-    "- **Testing discipline**: Immediate validation after each implementation\n",
-    "- **Integration thinking**: Understanding how components work together\n",
-    "\n",
-    "### ✅ Ready for Next Steps\n",
-    "Your activation functions are now ready to power:\n",
-    "- **Dense layers**: Linear transformations with nonlinear activations\n",
-    "- **Convolutional layers**: Spatial feature extraction with ReLU\n",
-    "- **Network architectures**: Complete neural networks with proper activations\n",
-    "- **Training**: Gradient computation through activation functions\n",
-    "\n",
-    "### 🔗 Connection to Real ML Systems\n",
-    "Your implementations mirror production systems:\n",
-    "- **PyTorch**: `torch.nn.ReLU()`, `torch.nn.Sigmoid()`, `torch.nn.Tanh()`, `torch.nn.Softmax()`\n",
-    "- **TensorFlow**: `tf.nn.relu()`, `tf.nn.sigmoid()`, `tf.nn.tanh()`, `tf.nn.softmax()`\n",
-    "- **Industry applications**: Every major deep learning model uses these functions\n",
-    "\n",
-    "### 🎯 The Power of Nonlinearity\n",
-    "You've unlocked the key to deep learning:\n",
-    "- **Before**: Linear models limited to simple patterns\n",
-    "- **After**: Nonlinear models can approximate a very broad class of functions (universal approximation)\n",
-    "\n",
-    "**Next Module**: Layers - Building blocks that combine your tensors and activations into powerful transformations!\n",
-    "\n",
-    "Your activation functions are the key to neural network intelligence. Now let's build the layers that use them!"
-   ]
-  }
- ],
- "metadata": {
-  "jupytext": {
-   "main_language": "python"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/modules/source/04_layers/layers_dev.ipynb b/modules/source/04_layers/layers_dev.ipynb
deleted file mode 100644
index b20937bd..00000000
--- a/modules/source/04_layers/layers_dev.ipynb
+++ /dev/null
@@ -1,924 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "0e007598",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "# Layers - Building Blocks of Neural Networks\n",
-    "\n",
-    "Welcome to the Layers module! This is where we build the fundamental components that stack together to form neural networks.\n",
-    "\n",
-    "## Learning Goals\n",
-    "- Understand how matrix multiplication powers neural networks\n",
-    "- Implement naive matrix multiplication from scratch for deep understanding\n",
-    "- Build the Dense (Linear) layer - the foundation of all neural networks\n",
-    "- Learn weight initialization strategies and their importance\n",
-    "- See how layers compose with activations to create powerful networks\n",
-    "\n",
-    "## Build → Use → Understand\n",
-    "1. **Build**: Matrix multiplication and Dense layers from scratch\n",
-    "2. **Use**: Create and test layers with real data\n",
-    "3. **Understand**: How linear transformations enable feature learning"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "bc400228",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "layers-imports",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| default_exp core.layers\n",
-    "\n",
-    "#| export\n",
-    "import numpy as np\n",
-    "import matplotlib.pyplot as plt\n",
-    "import os\n",
-    "import sys\n",
-    "from typing import Union, List, Tuple, Optional\n",
-    "\n",
-    "# Import our dependencies - try from package first, then local modules\n",
-    "try:\n",
-    "    from tinytorch.core.tensor import Tensor\n",
-    "    from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax\n",
-    "except ImportError:\n",
-    "    # For development, import from local modules\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations'))\n",
-    "    try:\n",
-    "        from tensor_dev import Tensor\n",
-    "        from activations_dev import ReLU, Sigmoid, Tanh, Softmax\n",
-    "    except ImportError:\n",
-    "        # If the local modules are not available, use relative imports\n",
-    "        from ..tensor.tensor_dev import Tensor\n",
-    "        from ..activations.activations_dev import ReLU, Sigmoid, Tanh, Softmax"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e186492c",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "layers-setup",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| hide\n",
-    "#| export\n",
-    "def _should_show_plots():\n",
-    "    \"\"\"Check if we should show plots (disable during testing)\"\"\"\n",
-    "    # Check multiple conditions that indicate we're in test mode\n",
-    "    is_pytest = (\n",
-    "        'pytest' in sys.modules or\n",
-    "        'test' in sys.argv or\n",
-    "        os.environ.get('PYTEST_CURRENT_TEST') is not None or\n",
-    "        any('test' in arg for arg in sys.argv) or\n",
-    "        any('pytest' in arg for arg in sys.argv)\n",
-    "    )\n",
-    "    \n",
-    "    # Show plots in development mode (when not in test mode)\n",
-    "    return not is_pytest"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "d41a5d47",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "layers-welcome",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "print(\"🔥 TinyTorch Layers Module\")\n",
-    "print(f\"NumPy version: {np.__version__}\")\n",
-    "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
-    "print(\"Ready to build neural network layers!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "bed6f41e",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 📦 Where This Code Lives in the Final Package\n",
-    "\n",
-    "**Learning Side:** You work in `modules/source/04_layers/layers_dev.py`  \n",
-    "**Building Side:** Code exports to `tinytorch.core.layers`\n",
-    "\n",
-    "```python\n",
-    "# Final package structure:\n",
-    "from tinytorch.core.layers import Dense, Conv2D  # All layer types together!\n",
-    "from tinytorch.core.tensor import Tensor  # The foundation\n",
-    "from tinytorch.core.activations import ReLU, Sigmoid  # Nonlinearity\n",
-    "```\n",
-    "\n",
-    "**Why this matters:**\n",
-    "- **Learning:** Focused modules for deep understanding\n",
-    "- **Production:** Proper organization like PyTorch's `torch.nn.Linear`\n",
-    "- **Consistency:** All layer types live together in `core.layers`\n",
-    "- **Integration:** Works seamlessly with tensors and activations"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a2c033ee",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## What Are Neural Network Layers?\n",
-    "\n",
-    "### The Building Block Pattern\n",
-    "Neural networks are built by stacking **layers** - each layer is a function that:\n",
-    "1. **Takes input**: Tensor data from previous layer\n",
-    "2. **Transforms**: Applies mathematical operations (linear transformation + activation)\n",
-    "3. **Produces output**: New tensor data for next layer\n",
-    "\n",
-    "### The Universal Pattern\n",
-    "Every layer follows this pattern:\n",
-    "```python\n",
-    "def layer(x):\n",
-    "    # 1. Linear transformation\n",
-    "    linear_output = x @ weights + bias\n",
-    "    \n",
-    "    # 2. Nonlinear activation\n",
-    "    output = activation(linear_output)\n",
-    "    \n",
-    "    return output\n",
-    "```\n",
-    "\n",
-    "### Why This Works\n",
-    "- **Linear part**: Learns feature combinations\n",
-    "- **Nonlinear part**: Enables complex patterns\n",
-    "- **Stacking**: Multiple layers = more complex functions\n",
-    "\n",
-    "### Mathematical Foundation\n",
-    "A neural network is function composition:\n",
-    "```\n",
-    "f(x) = layer_n(layer_{n-1}(...layer_2(layer_1(x))))\n",
-    "```\n",
-    "\n",
-    "Each layer transforms the representation to be more useful for the final task.\n",
-    "\n",
-    "### What We'll Build\n",
-    "1. **Matrix Multiplication**: The core operation powering all layers\n",
-    "2. **Dense Layer**: The fundamental building block of neural networks\n",
-    "3. **Integration**: How layers work with activations and tensors"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "448f63f6",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 1: Matrix Multiplication - The Engine of Neural Networks\n",
-    "\n",
-    "### What is Matrix Multiplication?\n",
-    "Matrix multiplication is the core operation that powers all neural network layers:\n",
-    "\n",
-    "```\n",
-    "C = A @ B\n",
-    "```\n",
-    "\n",
-    "Where:\n",
-    "- **A**: Input data (batch_size × input_features)\n",
-    "- **B**: Weight matrix (input_features × output_features)  \n",
-    "- **C**: Output data (batch_size × output_features)\n",
-    "\n",
-    "### Why It's Essential\n",
-    "- **Feature combination**: Each output combines all input features\n",
-    "- **Learned weights**: B contains the learned parameters\n",
-    "- **Efficient computation**: Vectorized operations are much faster than explicit Python loops\n",
-    "- **Parallel processing**: GPUs are designed for matrix operations\n",
-    "\n",
-    "### The Mathematical Definition\n",
-    "For matrices A (m×n) and B (n×p), the result C (m×p) is:\n",
-    "```\n",
-    "C[i,j] = Σ(k=0 to n-1) A[i,k] * B[k,j]\n",
-    "```\n",
-    "\n",
-    "### Visual Understanding\n",
-    "```\n",
-    "[1 2] @ [5 6] = [1*5+2*7  1*6+2*8] = [19 22]\n",
-    "[3 4]   [7 8]   [3*5+4*7  3*6+4*8]   [43 50]\n",
-    "```\n",
-    "\n",
-    "### Real-World Context\n",
-    "Every major operation in deep learning uses matrix multiplication:\n",
-    "- **Dense layers**: Linear transformations\n",
-    "- **Convolutional layers**: Convolution as matrix multiplication\n",
-    "- **Attention mechanisms**: Query-Key-Value computations\n",
-    "- **Embeddings**: Lookup tables as matrix multiplication"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "cccd838f",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "matmul-naive",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:\n",
-    "    \"\"\"\n",
-    "    Matrix multiplication using explicit for-loops.\n",
-    "    \n",
-    "    This helps you understand what matrix multiplication really does!\n",
-    "        \n",
-    "    TODO: Implement matrix multiplication using three nested for-loops.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Get the dimensions: m, n from A.shape and n2, p from B.shape\n",
-    "    2. Check compatibility: n must equal n2\n",
-    "    3. Create output matrix C of shape (m, p) filled with zeros\n",
-    "    4. Use three nested loops:\n",
-    "       - i loop: iterate through rows of A (0 to m-1)\n",
-    "       - j loop: iterate through columns of B (0 to p-1)\n",
-    "       - k loop: iterate through shared dimension (0 to n-1)\n",
-    "    5. For each (i,j), accumulate: C[i,j] += A[i,k] * B[k,j]\n",
-    "    \n",
-    "    EXAMPLE WALKTHROUGH:\n",
-    "    ```python\n",
-    "    A = [[1, 2],     B = [[5, 6],\n",
-    "         [3, 4]]          [7, 8]]\n",
-    "    \n",
-    "    C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0] = 1*5 + 2*7 = 19\n",
-    "    C[0,1] = A[0,0]*B[0,1] + A[0,1]*B[1,1] = 1*6 + 2*8 = 22\n",
-    "    C[1,0] = A[1,0]*B[0,0] + A[1,1]*B[1,0] = 3*5 + 4*7 = 43\n",
-    "    C[1,1] = A[1,0]*B[0,1] + A[1,1]*B[1,1] = 3*6 + 4*8 = 50\n",
-    "    \n",
-    "    Result: [[19, 22], [43, 50]]\n",
-    "    ```\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Get dimensions: m, n = A.shape; n2, p = B.shape\n",
-    "    - Check compatibility: if n != n2: raise ValueError\n",
-    "    - Initialize result: C = np.zeros((m, p))\n",
-    "    - Triple nested loop: for i in range(m): for j in range(p): for k in range(n):\n",
-    "    - Accumulate sum: C[i,j] += A[i,k] * B[k,j]\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - This is what every neural network layer does internally\n",
-    "    - Understanding this helps debug shape mismatches\n",
-    "    - Essential for understanding the foundation of neural networks\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Get matrix dimensions\n",
-    "    m, n = A.shape\n",
-    "    n2, p = B.shape\n",
-    "    \n",
-    "    # Check compatibility\n",
-    "    if n != n2:\n",
-    "        raise ValueError(f\"Incompatible matrix dimensions: A is {m}x{n}, B is {n2}x{p}\")\n",
-    "    \n",
-    "    # Initialize result matrix\n",
-    "    C = np.zeros((m, p))\n",
-    "    \n",
-    "    # Triple nested loop for matrix multiplication\n",
-    "    for i in range(m):\n",
-    "        for j in range(p):\n",
-    "            for k in range(n):\n",
-    "                C[i, j] += A[i, k] * B[k, j]\n",
-    "    \n",
-    "    return C\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "6e695714",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Test Your Matrix Multiplication\n",
-    "\n",
-    "Once you implement the `matmul` function above, run this cell to test it:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "bed91066",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-matmul-immediate",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_matrix_multiplication():\n",
-    "    \"\"\"Test matrix multiplication implementation\"\"\"\n",
-    "    print(\"🔬 Unit Test: Matrix Multiplication...\")\n",
-    "\n",
-    "    # Test simple 2x2 case\n",
-    "    A = np.array([[1, 2], [3, 4]], dtype=np.float32)\n",
-    "    B = np.array([[5, 6], [7, 8]], dtype=np.float32)\n",
-    "    \n",
-    "    result = matmul(A, B)\n",
-    "    expected = np.array([[19, 22], [43, 50]], dtype=np.float32)\n",
-    "    \n",
-    "    assert np.allclose(result, expected), f\"Matrix multiplication failed: expected {expected}, got {result}\"\n",
-    "    \n",
-    "    # Compare with NumPy\n",
-    "    numpy_result = A @ B\n",
-    "    assert np.allclose(result, numpy_result), f\"Doesn't match NumPy: got {result}, expected {numpy_result}\"\n",
-    "\n",
-    "    # Test different shapes\n",
-    "    A2 = np.array([[1, 2, 3]], dtype=np.float32)  # 1x3\n",
-    "    B2 = np.array([[4], [5], [6]], dtype=np.float32)  # 3x1\n",
-    "    result2 = matmul(A2, B2)\n",
-    "    expected2 = np.array([[32]], dtype=np.float32)  # 1*4 + 2*5 + 3*6 = 32\n",
-    "    \n",
-    "    assert np.allclose(result2, expected2), f\"1x3 @ 3x1 failed: expected {expected2}, got {result2}\"\n",
-    "    \n",
-    "    # Test 3x3 case\n",
-    "    A3 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)\n",
-    "    B3 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float32)  # Identity\n",
-    "    result3 = matmul(A3, B3)\n",
-    "    \n",
-    "    assert np.allclose(result3, A3), \"Multiplication by identity should preserve matrix\"\n",
-    "    \n",
-    "    # Test incompatible shapes\n",
-    "    A4 = np.array([[1, 2]], dtype=np.float32)  # 1x2\n",
-    "    B4 = np.array([[3], [4], [5]], dtype=np.float32)  # 3x1\n",
-    "    \n",
-    "    try:\n",
-    "        matmul(A4, B4)\n",
-    "        assert False, \"Should raise error for incompatible shapes\"\n",
-    "    except ValueError as e:\n",
-    "        assert \"Incompatible matrix dimensions\" in str(e)\n",
-    "    \n",
-    "    print(\"✅ Matrix multiplication tests passed!\")\n",
-    "    print(\"✅ 2x2 multiplication working correctly\")\n",
-    "    print(\"✅ Matches NumPy's implementation\")\n",
-    "    print(\"✅ Handles different shapes correctly\")\n",
-    "    print(\"✅ Proper error handling for incompatible shapes\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_matrix_multiplication()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ab183a07",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 2: Dense Layer - The Foundation of Neural Networks\n",
-    "\n",
-    "### What is a Dense Layer?\n",
-    "A **Dense layer** (also called Linear or Fully Connected layer) is the fundamental building block of neural networks:\n",
-    "\n",
-    "```python\n",
-    "output = input @ weights + bias\n",
-    "```\n",
-    "\n",
-    "Where:\n",
-    "- **input**: Input data (batch_size × input_features)\n",
-    "- **weights**: Learned parameters (input_features × output_features)\n",
-    "- **bias**: Learned bias terms (output_features,)\n",
-    "- **output**: Transformed data (batch_size × output_features)\n",
-    "\n",
-    "### Why Dense Layers Are Essential\n",
-    "1. **Feature transformation**: Learn meaningful combinations of input features\n",
-    "2. **Universal approximation**: Stacked with nonlinear activations, enough layers can approximate a very broad class of functions\n",
-    "3. **Learnable parameters**: Weights and biases are optimized during training\n",
-    "4. **Composability**: Can be stacked to create complex architectures\n",
-    "\n",
-    "### The Mathematical Foundation\n",
-    "For input x, weight matrix W, and bias b:\n",
-    "```\n",
-    "y = xW + b\n",
-    "```\n",
-    "\n",
-    "This is a linear transformation that:\n",
-    "- **Combines features**: Each output is a weighted sum of all inputs\n",
-    "- **Learns relationships**: Weights encode feature interactions\n",
-    "- **Adds flexibility**: Bias allows shifting the output\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Classification**: Transform features to class logits\n",
-    "- **Regression**: Transform features to continuous outputs\n",
-    "- **Representation learning**: Learn useful intermediate representations\n",
-    "- **Attention mechanisms**: Compute queries, keys, and values\n",
-    "\n",
-    "### Design Decisions\n",
-    "- **Weight initialization**: Random initialization to break symmetry\n",
-    "- **Bias usage**: Usually included for flexibility\n",
-    "- **Activation**: Often followed by nonlinear activation"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "eec77bde",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "dense-layer",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class Dense:\n",
-    "    \"\"\"\n",
-    "    Dense (Linear/Fully Connected) Layer\n",
-    "    \n",
-    "    Applies a linear transformation: y = xW + b\n",
-    "    \n",
-    "    This is the fundamental building block of neural networks.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):\n",
-    "        \"\"\"\n",
-    "        Initialize Dense layer with random weights and optional bias.\n",
-    "        \n",
-    "        TODO: Implement Dense layer initialization.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Store the layer parameters (input_size, output_size, use_bias)\n",
-    "        2. Initialize weights with random values using proper scaling\n",
-    "        3. Initialize bias (if use_bias=True) with zeros\n",
-    "        4. Convert weights and bias to Tensor objects\n",
-    "        \n",
-    "        WEIGHT INITIALIZATION STRATEGY:\n",
-    "        - Use Xavier/Glorot initialization for better gradient flow\n",
-    "        - Scale: sqrt(2 / (input_size + output_size))\n",
-    "        - Random values: np.random.randn() * scale\n",
-    "        \n",
-    "        EXAMPLE USAGE:\n",
-    "        ```python\n",
-    "        layer = Dense(input_size=3, output_size=2)\n",
-    "        # Creates weight matrix of shape (3, 2) and bias of shape (2,)\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Store parameters: self.input_size, self.output_size, self.use_bias\n",
-    "        - Weight shape: (input_size, output_size)\n",
-    "        - Bias shape: (output_size,) if use_bias else None\n",
-    "        - Use Xavier initialization: scale = np.sqrt(2.0 / (input_size + output_size))\n",
-    "        - Initialize weights: np.random.randn(input_size, output_size) * scale\n",
-    "        - Initialize bias: np.zeros(output_size) if use_bias else None\n",
-    "        - Convert to Tensors: self.weights = Tensor(weight_data), self.bias = Tensor(bias_data)\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Store layer parameters\n",
-    "        self.input_size = input_size\n",
-    "        self.output_size = output_size\n",
-    "        self.use_bias = use_bias\n",
-    "        \n",
-    "        # Xavier/Glorot initialization\n",
-    "        scale = np.sqrt(2.0 / (input_size + output_size))\n",
-    "        \n",
-    "        # Initialize weights with random values\n",
-    "        weight_data = np.random.randn(input_size, output_size) * scale\n",
-    "        self.weights = Tensor(weight_data)\n",
-    "        \n",
-    "        # Initialize bias\n",
-    "        if use_bias:\n",
-    "            bias_data = np.zeros(output_size)\n",
-    "            self.bias = Tensor(bias_data)\n",
-    "        else:\n",
-    "            self.bias = None\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def forward(self, x):\n",
-    "        \"\"\"\n",
-    "        Forward pass through the Dense layer.\n",
-    "        \n",
-    "        TODO: Implement the forward pass: y = xW + b\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Perform matrix multiplication: x @ self.weights\n",
-    "        2. Add bias if present: result + self.bias\n",
-    "        3. Return the result as a Tensor\n",
-    "        \n",
-    "        EXAMPLE USAGE:\n",
-    "        ```python\n",
-    "        layer = Dense(input_size=3, output_size=2)\n",
-    "        input_data = Tensor([[1, 2, 3]])  # Shape: (1, 3)\n",
-    "        output = layer(input_data)        # Shape: (1, 2)\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Matrix multiplication: matmul(x.data, self.weights.data)\n",
-    "        - Add bias: result + self.bias.data (broadcasting handles shape)\n",
-    "        - Return as Tensor: return Tensor(final_result)\n",
-    "        - Handle both cases: with and without bias\n",
-    "        \n",
-    "        LEARNING CONNECTIONS:\n",
-    "        - This is the core operation in every neural network layer\n",
-    "        - Matrix multiplication combines all input features\n",
-    "        - Bias addition allows shifting the output distribution\n",
-    "        - The result feeds into activation functions\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Perform matrix multiplication\n",
-    "        linear_output = matmul(x.data, self.weights.data)\n",
-    "        \n",
-    "        # Add bias if present\n",
-    "        if self.use_bias and self.bias is not None:\n",
-    "            linear_output = linear_output + self.bias.data\n",
-    "        \n",
-    "        return type(x)(linear_output)\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def __call__(self, x):\n",
-    "        \"\"\"Make the layer callable: layer(x) instead of layer.forward(x)\"\"\"\n",
-    "        return self.forward(x)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5736d98c",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Test Your Dense Layer\n",
-    "\n",
-    "Once you implement the Dense layer above, run this cell to test it:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "9b1d056c",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-dense-layer",
-     "locked": true,
-     "points": 15,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_dense_layer():\n",
-    "    \"\"\"Test Dense layer implementation\"\"\"\n",
-    "    print(\"🔬 Unit Test: Dense Layer...\")\n",
-    "    \n",
-    "    # Test layer creation\n",
-    "    layer = Dense(input_size=3, output_size=2)\n",
-    "    \n",
-    "    # Check weight and bias shapes\n",
-    "    assert layer.weights.shape == (3, 2), f\"Weight shape should be (3, 2), got {layer.weights.shape}\"\n",
-    "    assert layer.bias is not None, \"Bias should not be None when use_bias=True\"\n",
-    "    assert layer.bias.shape == (2,), f\"Bias shape should be (2,), got {layer.bias.shape}\"\n",
-    "    \n",
-    "    # Test forward pass\n",
-    "    input_data = Tensor([[1, 2, 3]])  # Shape: (1, 3)\n",
-    "    output = layer(input_data)\n",
-    "    \n",
-    "    # Check output shape\n",
-    "    assert output.shape == (1, 2), f\"Output shape should be (1, 2), got {output.shape}\"\n",
-    "    \n",
-    "    # Test batch processing\n",
-    "    batch_input = Tensor([[1, 2, 3], [4, 5, 6]])  # Shape: (2, 3)\n",
-    "    batch_output = layer(batch_input)\n",
-    "    \n",
-    "    assert batch_output.shape == (2, 2), f\"Batch output shape should be (2, 2), got {batch_output.shape}\"\n",
-    "    \n",
-    "    # Test without bias\n",
-    "    no_bias_layer = Dense(input_size=3, output_size=2, use_bias=False)\n",
-    "    assert no_bias_layer.bias is None, \"Layer without bias should have None bias\"\n",
-    "    \n",
-    "    no_bias_output = no_bias_layer(input_data)\n",
-    "    assert no_bias_output.shape == (1, 2), \"No-bias layer should still produce correct shape\"\n",
-    "    \n",
-    "    # Test that different inputs produce different outputs\n",
-    "    input1 = Tensor([[1, 0, 0]])\n",
-    "    input2 = Tensor([[0, 1, 0]])\n",
-    "    \n",
-    "    output1 = layer(input1)\n",
-    "    output2 = layer(input2)\n",
-    "    \n",
-    "    # Should not be equal (with high probability due to random initialization)\n",
-    "    assert not np.allclose(output1.data, output2.data), \"Different inputs should produce different outputs\"\n",
-    "    \n",
-    "    # Test scaling without bias: no_bias_layer(a*x) == a*no_bias_layer(x); a bias term would break this\n",
-    "    scale = 2.0\n",
-    "    scaled_input = Tensor([[2, 4, 6]])  # 2 * [1, 2, 3]\n",
-    "    scaled_output = no_bias_layer(scaled_input)\n",
-    "    assert np.allclose(scaled_output.data, scale * no_bias_output.data), \"Without bias, layer(2*x) should equal 2*layer(x)\"\n",
-    "    \n",
-    "    print(\"✅ Dense layer tests passed!\")\n",
-    "    print(\"✅ Correct weight and bias initialization\")\n",
-    "    print(\"✅ Forward pass produces correct shapes\")\n",
-    "    print(\"✅ Batch processing works correctly\")\n",
-    "    print(\"✅ Bias and no-bias variants work\")\n",
-    "    print(\"✅ No-bias layer scales linearly with its input\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_dense_layer()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ac4dcba0",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 3: Layer Integration with Activations\n",
-    "\n",
-    "### Building Complete Neural Network Components\n",
-    "Now let's see how Dense layers work with activation functions to create complete neural network components:\n",
-    "\n",
-    "```python\n",
-    "# Complete neural network layer\n",
-    "x = input_data\n",
-    "linear_output = dense_layer(x)\n",
-    "final_output = activation_function(linear_output)\n",
-    "```\n",
-    "\n",
-    "### Why This Combination Works\n",
-    "1. **Linear transformation**: Dense layer learns feature combinations\n",
-    "2. **Nonlinear activation**: Enables complex pattern recognition\n",
-    "3. **Stacking**: Multiple layer+activation pairs create deep networks\n",
-    "4. **Universal approximation**: With enough hidden units, such networks can approximate any continuous function on a bounded domain\n",
-    "\n",
-    "### Real-World Layer Patterns\n",
-    "- **Hidden layers**: Dense + ReLU (most common)\n",
-    "- **Output layers**: Dense + Softmax (classification) or Dense + Sigmoid (binary)\n",
-    "- **Gated layers**: Dense + Sigmoid (for gates in LSTM/GRU)\n",
-    "- **Attention layers**: Dense + Softmax (for attention weights)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f5e77a64",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-layer-activation-comprehensive",
-     "locked": true,
-     "points": 15,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_layer_activation():\n",
-    "    \"\"\"Test Dense layers combined with activation functions\"\"\"\n",
-    "    print(\"🔬 Unit Test: Layer-Activation Comprehensive Test...\")\n",
-    "    \n",
-    "    # Create layer and activation functions\n",
-    "    layer = Dense(input_size=4, output_size=3)\n",
-    "    relu = ReLU()\n",
-    "    sigmoid = Sigmoid()\n",
-    "    tanh = Tanh()\n",
-    "    softmax = Softmax()\n",
-    "    \n",
-    "    # Test input\n",
-    "    input_data = Tensor([[1, -2, 3, -4], [2, 1, -1, 3]])  # Shape: (2, 4)\n",
-    "    \n",
-    "    # Test Dense + ReLU (common hidden layer pattern)\n",
-    "    linear_output = layer(input_data)\n",
-    "    relu_output = relu(linear_output)\n",
-    "    \n",
-    "    assert relu_output.shape == (2, 3), \"ReLU output should preserve shape\"\n",
-    "    assert np.all(relu_output.data >= 0), \"ReLU output should be non-negative\"\n",
-    "    \n",
-    "    # Test Dense + Softmax (classification output pattern)\n",
-    "    softmax_output = softmax(linear_output)\n",
-    "    \n",
-    "    assert softmax_output.shape == (2, 3), \"Softmax output should preserve shape\"\n",
-    "    \n",
-    "    # Each row should sum to 1 (probability distribution)\n",
-    "    for i in range(2):\n",
-    "        row_sum = np.sum(softmax_output.data[i])\n",
-    "        assert abs(row_sum - 1.0) < 1e-6, f\"Row {i} should sum to 1, got {row_sum}\"\n",
-    "    \n",
-    "    # Test Dense + Sigmoid (binary classification pattern)\n",
-    "    sigmoid_output = sigmoid(linear_output)\n",
-    "    \n",
-    "    assert sigmoid_output.shape == (2, 3), \"Sigmoid output should preserve shape\"\n",
-    "    assert np.all(sigmoid_output.data > 0), \"Sigmoid output should be positive\"\n",
-    "    assert np.all(sigmoid_output.data < 1), \"Sigmoid output should be less than 1\"\n",
-    "    \n",
-    "    # Test Dense + Tanh (hidden layer with centered outputs)\n",
-    "    tanh_output = tanh(linear_output)\n",
-    "    \n",
-    "    assert tanh_output.shape == (2, 3), \"Tanh output should preserve shape\"\n",
-    "    assert np.all(tanh_output.data > -1), \"Tanh output should be > -1\"\n",
-    "    assert np.all(tanh_output.data < 1), \"Tanh output should be < 1\"\n",
-    "    \n",
-    "    # Test chained layers (simple 2-layer network)\n",
-    "    layer1 = Dense(input_size=4, output_size=5)\n",
-    "    layer2 = Dense(input_size=5, output_size=3)\n",
-    "    \n",
-    "    # Forward pass through 2-layer network\n",
-    "    hidden = relu(layer1(input_data))\n",
-    "    output = softmax(layer2(hidden))\n",
-    "    \n",
-    "    assert output.shape == (2, 3), \"2-layer network should produce correct output shape\"\n",
-    "    \n",
-    "    # Each output should be a valid probability distribution\n",
-    "    for i in range(2):\n",
-    "        row_sum = np.sum(output.data[i])\n",
-    "        assert abs(row_sum - 1.0) < 1e-6, f\"Network output row {i} should sum to 1\"\n",
-    "    \n",
-    "    # Test that layers are learning-ready (have parameters)\n",
-    "    assert hasattr(layer1, 'weights'), \"Layer should have weights\"\n",
-    "    assert hasattr(layer1, 'bias'), \"Layer should have bias\"\n",
-    "    assert isinstance(layer1.weights, Tensor), \"Weights should be Tensor\"\n",
-    "    assert isinstance(layer1.bias, Tensor), \"Bias should be Tensor\"\n",
-    "    \n",
-    "    print(\"✅ Layer-activation comprehensive tests passed!\")\n",
-    "    print(f\"✅ Dense + ReLU working correctly\")\n",
-    "    print(f\"✅ Dense + Softmax producing valid probabilities\")\n",
-    "    print(f\"✅ Dense + Sigmoid bounded correctly\")\n",
-    "    print(f\"✅ Dense + Tanh centered correctly\")\n",
-    "    print(f\"✅ Multi-layer networks working\")\n",
-    "    print(f\"✅ All components ready for training!\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_layer_activation()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9cfd022a",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧪 Module Testing\n",
-    "\n",
-    "Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
-    "\n",
-    "**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e508b1ce",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "standardized-testing",
-     "locked": true,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# =============================================================================\n",
-    "# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
-    "# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
-    "# =============================================================================\n",
-    "\n",
-    "if __name__ == \"__main__\":\n",
-    "    from tito.tools.testing import run_module_tests_auto\n",
-    "    \n",
-    "    # Automatically discover and run all tests in this module\n",
-    "    success = run_module_tests_auto(\"Layers\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "89a2d068",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🎯 Module Summary: Neural Network Layers Mastery!\n",
-    "\n",
-    "Congratulations! You've successfully implemented the fundamental building blocks of neural networks:\n",
-    "\n",
-    "### ✅ What You've Built\n",
-    "- **Matrix Multiplication**: The core operation powering all neural network computations\n",
-    "- **Dense Layer**: The fundamental building block with proper weight initialization\n",
-    "- **Integration**: How layers work with activation functions to create complete neural components\n",
-    "- **Flexibility**: Support for bias/no-bias and naive/optimized matrix multiplication\n",
-    "\n",
-    "### ✅ Key Learning Outcomes\n",
-    "- **Understanding**: How linear transformations enable feature learning\n",
-    "- **Implementation**: Built layers from scratch with proper initialization\n",
-    "- **Testing**: Progressive validation with immediate feedback\n",
-    "- **Integration**: Saw how layers compose with activations for complete functionality\n",
-    "- **Real-world skills**: Understanding the mathematics behind neural networks\n",
-    "\n",
-    "### ✅ Mathematical Mastery\n",
-    "- **Matrix Multiplication**: C[i,j] = Σ_k A[i,k] * B[k,j] - implemented with loops\n",
-    "- **Linear Transformation**: y = xW + b - the heart of neural networks\n",
-    "- **Xavier Initialization**: Proper weight scaling for stable gradients\n",
-    "- **Composition**: How multiple layers create complex functions\n",
-    "\n",
-    "### ✅ Professional Skills Developed\n",
-    "- **Algorithm implementation**: From mathematical definition to working code\n",
-    "- **Performance considerations**: Naive vs optimized implementations\n",
-    "- **API design**: Clean, consistent interfaces for layer creation and usage\n",
-    "- **Testing methodology**: Unit tests, comprehensive tests, and edge case handling\n",
-    "\n",
-    "### ✅ Ready for Next Steps\n",
-    "Your layers are now ready to power:\n",
-    "- **Complete Networks**: Stack multiple layers with activations\n",
-    "- **Training**: Gradient computation and parameter updates\n",
-    "- **Specialized Architectures**: CNNs, RNNs, Transformers all use these foundations\n",
-    "- **Real Applications**: Image classification, NLP, game playing, etc.\n",
-    "\n",
-    "### 🔗 Connection to Real ML Systems\n",
-    "Your implementations mirror production frameworks:\n",
-    "- **PyTorch**: `torch.nn.Linear()` - same mathematical operations\n",
-    "- **TensorFlow**: `tf.keras.layers.Dense()` - identical functionality\n",
-    "- **Industry**: Dense (fully connected) layers built on these computations appear in nearly every major architecture\n",
-    "\n",
-    "### 🎯 The Power of Linear Algebra\n",
-    "You've unlocked the mathematical foundation of AI:\n",
-    "- **Feature combination**: Each layer learns how to combine input features\n",
-    "- **Representation learning**: Layers automatically discover useful representations\n",
-    "- **Universal approximation**: With nonlinear activations and enough hidden units, these layers can approximate any continuous function on a bounded domain\n",
-    "- **Scalability**: Same operations work from small networks to massive language models\n",
-    "\n",
-    "### 🧠 Deep Learning Insights\n",
-    "- **Why deep networks work**: Multiple layers = multiple levels of abstraction\n",
-    "- **Parameter efficiency**: Shared weights enable learning with limited data\n",
-    "- **Gradient flow**: Proper initialization enables training deep networks\n",
-    "- **Composability**: Simple components combine to create complex intelligence\n",
-    "\n",
-    "**Next Module**: Networks - Composing your layers into complete neural network architectures!\n",
-    "\n",
-    "Your layers are the building blocks. Now let's assemble them into powerful neural networks that can learn to solve complex problems!"
-   ]
-  }
- ],
- "metadata": {
-  "jupytext": {
-   "main_language": "python"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/modules/source/05_dense/dense_dev.ipynb b/modules/source/05_dense/dense_dev.ipynb
deleted file mode 100644
index d047e865..00000000
--- a/modules/source/05_dense/dense_dev.ipynb
+++ /dev/null
@@ -1,1165 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "1d75d07b",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "# Networks - Neural Network Architectures\n",
-    "\n",
-    "Welcome to the Networks module! This is where we compose layers into complete neural network architectures.\n",
-    "\n",
-    "## Learning Goals\n",
-    "- Understand networks as function composition: `f(x) = layer_n(...layer_2(layer_1(x)))`\n",
-    "- Build the Sequential network architecture for composing layers\n",
-    "- Create common network patterns like MLPs (Multi-Layer Perceptrons)\n",
-    "- Visualize network architectures and understand their capabilities\n",
-    "- Master forward pass inference through complete networks\n",
-    "\n",
-    "## Build → Use → Reflect\n",
-    "1. **Build**: Sequential networks that compose layers into complete architectures\n",
-    "2. **Use**: Create different network patterns and run inference\n",
-    "3. **Reflect**: How architecture design affects network behavior and capability\n",
-    "\n",
-    "## What You'll Learn\n",
-    "By the end of this module, you'll understand:\n",
-    "- How simple layers combine to create complex behaviors\n",
-    "- The fundamental Sequential architecture pattern\n",
-    "- How to build MLPs with any number of layers\n",
-    "- Different network architectures (shallow, deep, wide)\n",
-    "- How neural networks approximate complex functions"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ba3f59de",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "networks-imports",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| default_exp core.dense\n",
-    "\n",
-    "#| export\n",
-    "import numpy as np\n",
-    "import sys\n",
-    "import os\n",
-    "from typing import List, Union, Optional, Callable\n",
-    "import matplotlib.pyplot as plt\n",
-    "\n",
-    "# Import all the building blocks we need - try package first, then local modules\n",
-    "try:\n",
-    "    from tinytorch.core.tensor import Tensor\n",
-    "    from tinytorch.core.layers import Dense\n",
-    "    from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax\n",
-    "except ImportError:\n",
-    "    # For development, import from local modules\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations'))\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '03_layers'))\n",
-    "    from tensor_dev import Tensor\n",
-    "    from activations_dev import ReLU, Sigmoid, Tanh, Softmax\n",
-    "    from layers_dev import Dense"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "402ec8fe",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "networks-setup",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| hide\n",
-    "#| export\n",
-    "def _should_show_plots():\n",
-    "    \"\"\"Check if we should show plots (disable during testing)\"\"\"\n",
-    "    # Check multiple conditions that indicate we're in test mode\n",
-    "    is_pytest = (\n",
-    "        'pytest' in sys.modules or\n",
-    "        'test' in sys.argv or\n",
-    "        os.environ.get('PYTEST_CURRENT_TEST') is not None or\n",
-    "        any('test' in arg for arg in sys.argv) or\n",
-    "        any('pytest' in arg for arg in sys.argv)\n",
-    "    )\n",
-    "    \n",
-    "    # Show plots in development mode (when not in test mode)\n",
-    "    return not is_pytest"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "d49ff26b",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "networks-welcome",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "print(\"🔥 TinyTorch Networks Module\")\n",
-    "print(f\"NumPy version: {np.__version__}\")\n",
-    "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
-    "print(\"Ready to build neural network architectures!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "3d812770",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 📦 Where This Code Lives in the Final Package\n",
-    "\n",
-    "**Learning Side:** You work in `modules/source/05_dense/dense_dev.py`  \n",
-    "**Building Side:** Code exports to `tinytorch.core.dense`\n",
-    "\n",
-    "```python\n",
-    "# Final package structure:\n",
-    "from tinytorch.core.dense import Sequential, create_mlp  # Network architectures!\n",
-    "from tinytorch.core.layers import Dense, Conv2D  # Building blocks\n",
-    "from tinytorch.core.activations import ReLU, Sigmoid, Tanh  # Nonlinearity\n",
-    "from tinytorch.core.tensor import Tensor  # Foundation\n",
-    "```\n",
-    "\n",
-    "**Why this matters:**\n",
-    "- **Learning:** Focused modules for deep understanding\n",
-    "- **Production:** Proper organization like PyTorch's `torch.nn.Sequential`\n",
-    "- **Consistency:** All network architectures live together in `core.dense`\n",
-    "- **Integration:** Works seamlessly with layers, activations, and tensors"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "4fe22e03",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 1: Understanding Neural Networks as Function Composition\n",
-    "\n",
-    "### What is a Neural Network?\n",
-    "A neural network is simply **function composition** - chaining simple functions together to create complex behaviors:\n",
-    "\n",
-    "```\n",
-    "f(x) = f_n(f_{n-1}(...f_2(f_1(x))))\n",
-    "```\n",
-    "\n",
-    "### Real-World Analogy: Assembly Line\n",
-    "Think of an assembly line in a factory:\n",
-    "- **Input:** Raw materials (data)\n",
-    "- **Stations:** Each worker (layer) transforms the product\n",
-    "- **Output:** Final product (predictions)\n",
-    "\n",
-    "### The Power of Composition\n",
-    "```python\n",
-    "# Simple functions\n",
-    "def add_one(x): return x + 1\n",
-    "def multiply_two(x): return x * 2\n",
-    "def square(x): return x * x\n",
-    "\n",
-    "# Composed function\n",
-    "def complex_function(x):\n",
-    "    return square(multiply_two(add_one(x)))\n",
-    "    \n",
-    "# This is what neural networks do!\n",
-    "```\n",
-    "\n",
-    "### Why This Matters\n",
-    "- **Universal Approximation:** With enough hidden units, MLPs can approximate any continuous function on a bounded domain\n",
-    "- **Hierarchical Learning:** Early layers learn simple features, later layers learn complex patterns\n",
-    "- **Composability:** Mix and match layers to create custom architectures\n",
-    "- **Scalability:** Add more layers or make them wider as needed\n",
-    "\n",
-    "### From Modules We've Built\n",
-    "- **Tensors:** The data containers that flow through networks\n",
-    "- **Activations:** The nonlinear transformations that enable complex behaviors\n",
-    "- **Layers:** The building blocks that transform data\n",
-    "\n",
-    "Now let's build our first network architecture!"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9b5e0c86",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 2: Building the Sequential Network\n",
-    "\n",
-    "### What is Sequential?\n",
-    "**Sequential** is the most fundamental network architecture - it applies layers in order:\n",
-    "\n",
-    "```\n",
-    "Sequential([layer1, layer2, layer3]) \n",
-    "→ f(x) = layer3(layer2(layer1(x)))\n",
-    "```\n",
-    "\n",
-    "### Why Sequential Matters\n",
-    "- **Foundation:** Every neural network library has this pattern\n",
-    "- **Simplicity:** Easy to understand and implement\n",
-    "- **Flexibility:** Can compose any layers in any order\n",
-    "- **Building Block:** Foundation for more complex architectures\n",
-    "\n",
-    "### The Sequential Pattern\n",
-    "```python\n",
-    "# PyTorch style\n",
-    "model = nn.Sequential(\n",
-    "    nn.Linear(784, 128),\n",
-    "    nn.ReLU(),\n",
-    "    nn.Linear(128, 10)\n",
-    ")\n",
-    "\n",
-    "# Our TinyTorch style\n",
-    "model = Sequential([\n",
-    "    Dense(784, 128),\n",
-    "    ReLU(),\n",
-    "    Dense(128, 10)\n",
-    "])\n",
-    "```\n",
-    "\n",
-    "Let's implement this fundamental architecture!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "9d725744",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "sequential-class",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class Sequential:\n",
-    "    \"\"\"\n",
-    "    Sequential Network: Composes layers in sequence\n",
-    "    \n",
-    "    The most fundamental network architecture.\n",
-    "    Applies layers in order: f(x) = layer_n(...layer_2(layer_1(x)))\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, layers: Optional[List] = None):\n",
-    "        \"\"\"\n",
-    "        Initialize Sequential network with layers.\n",
-    "        \n",
-    "        Args:\n",
-    "            layers: List of layers to compose in order (optional, defaults to empty list)\n",
-    "            \n",
-    "        TODO: Store the layers so that forward() can apply them in order\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Store the layers list as an instance variable\n",
-    "        2. Initialize empty list if no layers provided\n",
-    "        3. Prepare for forward pass implementation\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        Sequential([Dense(3,4), ReLU(), Dense(4,2)])\n",
-    "        creates a 3-layer network: Dense → ReLU → Dense\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use self.layers to store the layers\n",
-    "        - Handle empty initialization case\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        self.layers = layers if layers is not None else []\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def forward(self, x: Tensor) -> Tensor:\n",
-    "        \"\"\"\n",
-    "        Forward pass through all layers in sequence.\n",
-    "        \n",
-    "        Args:\n",
-    "            x: Input tensor\n",
-    "            \n",
-    "        Returns:\n",
-    "            Output tensor after passing through all layers\n",
-    "            \n",
-    "        TODO: Implement sequential forward pass through all layers\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Start with the input tensor\n",
-    "        2. Apply each layer in sequence\n",
-    "        3. Each layer's output becomes the next layer's input\n",
-    "        4. Return the final output\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        Input: Tensor([[1, 2, 3]])\n",
-    "        Layer1 (Dense): Tensor([[1.4, 2.8]])\n",
-    "        Layer2 (ReLU): Tensor([[1.4, 2.8]])\n",
-    "        Layer3 (Dense): Tensor([[0.7]])\n",
-    "        Output: Tensor([[0.7]])\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use a for loop: for layer in self.layers:\n",
-    "        - Apply each layer: x = layer(x)\n",
-    "        - The output of one layer becomes input to the next\n",
-    "        - Return the final result\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Apply each layer in sequence\n",
-    "        for layer in self.layers:\n",
-    "            x = layer(x)\n",
-    "        return x\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def __call__(self, x: Tensor) -> Tensor:\n",
-    "        \"\"\"Make the network callable: sequential(x) instead of sequential.forward(x)\"\"\"\n",
-    "        return self.forward(x)\n",
-    "    \n",
-    "    def add(self, layer):\n",
-    "        \"\"\"Add a layer to the network.\"\"\"\n",
-    "        self.layers.append(layer)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "3f80261e",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "### 🧪 Unit Test: Sequential Network\n",
-    "\n",
-    "Let's test your Sequential network implementation! This is the foundation of all neural network architectures.\n",
-    "\n",
-    "**This is a unit test** - it tests one specific class (Sequential network) in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "1fcf76f5",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-sequential-immediate",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test Sequential network immediately after implementation\n",
-    "print(\"🔬 Unit Test: Sequential Network...\")\n",
-    "\n",
-    "# Create a simple 2-layer network: 3 → 4 → 2\n",
-    "try:\n",
-    "    network = Sequential([\n",
-    "        Dense(input_size=3, output_size=4),\n",
-    "        ReLU(),\n",
-    "        Dense(input_size=4, output_size=2),\n",
-    "        Sigmoid()\n",
-    "    ])\n",
-    "    \n",
-    "    print(f\"Network created with {len(network.layers)} layers\")\n",
-    "    print(\"✅ Sequential network creation successful\")\n",
-    "    \n",
-    "    # Test with sample data\n",
-    "    x = Tensor([[1.0, 2.0, 3.0]])\n",
-    "    print(f\"Input: {x}\")\n",
-    "    \n",
-    "    # Forward pass\n",
-    "    y = network(x)\n",
-    "    print(f\"Output: {y}\")\n",
-    "    print(f\"Output shape: {y.shape}\")\n",
-    "    \n",
-    "    # Verify the network works\n",
-    "    assert y.shape == (1, 2), f\"Expected shape (1, 2), got {y.shape}\"\n",
-    "    print(\"✅ Sequential network produces correct output shape\")\n",
-    "    \n",
-    "    # Test that sigmoid output is in valid range\n",
-    "    assert np.all(y.data >= 0) and np.all(y.data <= 1), \"Sigmoid output should be between 0 and 1\"\n",
-    "    print(\"✅ Sequential network output is in valid range\")\n",
-    "    \n",
-    "    # Test that layers are stored correctly\n",
-    "    assert len(network.layers) == 4, f\"Expected 4 layers, got {len(network.layers)}\"\n",
-    "    print(\"✅ Sequential network stores layers correctly\")\n",
-    "    \n",
-    "    # Test batch processing\n",
-    "    x_batch = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])\n",
-    "    y_batch = network(x_batch)\n",
-    "    assert y_batch.shape == (2, 2), f\"Expected batch shape (2, 2), got {y_batch.shape}\"\n",
-    "    print(\"✅ Sequential network handles batch processing\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ Sequential network test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "# Show the network architecture\n",
-    "print(\"🎯 Sequential network behavior:\")\n",
-    "print(\"   Applies layers in sequence: f(g(h(x)))\")\n",
-    "print(\"   Input flows through each layer in order\")\n",
-    "print(\"   Output of layer i becomes input of layer i+1\")\n",
-    "print(\"📈 Progress: Sequential network ✓\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "acbb7fe0",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 3: Building Multi-Layer Perceptrons (MLPs)\n",
-    "\n",
-    "### What is an MLP?\n",
-    "A **Multi-Layer Perceptron** is the classic neural network architecture:\n",
-    "\n",
-    "```\n",
-    "Input → Dense → Activation → Dense → Activation → ... → Dense → Output\n",
-    "```\n",
-    "\n",
-    "### Why MLPs are Important\n",
-    "- **Universal approximation**: With enough hidden units, can approximate any continuous function on a bounded domain\n",
-    "- **Foundation**: Basis for understanding all neural networks\n",
-    "- **Versatile**: Works for classification, regression, and more\n",
-    "- **Simple**: Easy to understand and implement\n",
-    "\n",
-    "### MLP Architecture Pattern\n",
-    "```\n",
-    "create_mlp(3, [4, 2], 1) creates:\n",
-    "Dense(3→4) → ReLU → Dense(4→2) → ReLU → Dense(2→1) → Sigmoid\n",
-    "```\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Tabular data**: Customer analytics, financial modeling\n",
-    "- **Feature learning**: Learning representations from raw data\n",
-    "- **Classification**: Spam detection, medical diagnosis\n",
-    "- **Regression**: Price prediction, time series forecasting\n",
-    "\n",
-    "### The MLP Factory Pattern\n",
-    "Instead of manually creating each layer, we'll build a function that creates MLPs automatically!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "310a4d03",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "create-mlp",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def create_mlp(input_size: int, hidden_sizes: List[int], output_size: int, \n",
-    "               activation=ReLU, output_activation=Sigmoid) -> Sequential:\n",
-    "    \"\"\"\n",
-    "    Create a Multi-Layer Perceptron (MLP) network.\n",
-    "    \n",
-    "    Args:\n",
-    "        input_size: Number of input features\n",
-    "        hidden_sizes: List of hidden layer sizes\n",
-    "        output_size: Number of output features\n",
-    "        activation: Activation function for hidden layers (default: ReLU)\n",
-    "        output_activation: Activation function for output layer (default: Sigmoid)\n",
-    "        \n",
-    "    Returns:\n",
-    "        Sequential network with MLP architecture\n",
-    "        \n",
-    "    TODO: Implement MLP creation with alternating Dense and activation layers.\n",
-    "    \n",
-    "    APPROACH:\n",
-    "    1. Start with an empty list of layers\n",
-    "    2. Add layers in this pattern:\n",
-    "       - Dense(input_size → first_hidden_size)\n",
-    "       - Activation()\n",
-    "       - Dense(first_hidden_size → second_hidden_size)\n",
-    "       - Activation()\n",
-    "       - ...\n",
-    "       - Dense(last_hidden_size → output_size)\n",
-    "       - Output_activation()\n",
-    "    3. Return Sequential(layers)\n",
-    "    \n",
-    "    EXAMPLE:\n",
-    "    create_mlp(3, [4, 2], 1) creates:\n",
-    "    Dense(3→4) → ReLU → Dense(4→2) → ReLU → Dense(2→1) → Sigmoid\n",
-    "    \n",
-    "    HINTS:\n",
-    "    - Start with layers = []\n",
-    "    - Track current_size starting with input_size\n",
-    "    - For each hidden_size: add Dense(current_size, hidden_size), then activation\n",
-    "    - Finally add Dense(last_hidden_size, output_size), then output_activation\n",
-    "    - Return Sequential(layers)\n",
-    "    \"\"\"\n",
-    "    layers = []\n",
-    "    current_size = input_size\n",
-    "    \n",
-    "    # Add hidden layers with activations\n",
-    "    for hidden_size in hidden_sizes:\n",
-    "        layers.append(Dense(current_size, hidden_size))\n",
-    "        layers.append(activation())\n",
-    "        current_size = hidden_size\n",
-    "    \n",
-    "    # Add output layer with output activation\n",
-    "    layers.append(Dense(current_size, output_size))\n",
-    "    layers.append(output_activation())\n",
-    "    \n",
-    "    return Sequential(layers)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f73b16ac",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "### 🧪 Unit Test: MLP Creation\n",
-    "\n",
-    "Let's test your MLP creation function! This builds complete neural networks with a single function call.\n",
-    "\n",
-    "**This is a unit test** - it tests one specific function (create_mlp) in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "4423cc45",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-mlp-immediate",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test MLP creation immediately after implementation\n",
-    "print(\"🔬 Unit Test: MLP Creation...\")\n",
-    "\n",
-    "# Create a simple MLP: 3 → 4 → 2 → 1\n",
-    "try:\n",
-    "    mlp = create_mlp(input_size=3, hidden_sizes=[4, 2], output_size=1)\n",
-    "    \n",
-    "    print(f\"MLP created with {len(mlp.layers)} layers\")\n",
-    "    print(\"✅ MLP creation successful\")\n",
-    "    \n",
-    "    # Test the structure - should have 6 layers: Dense, ReLU, Dense, ReLU, Dense, Sigmoid\n",
-    "    expected_layers = 6  # 3 Dense + 2 ReLU + 1 Sigmoid\n",
-    "    assert len(mlp.layers) == expected_layers, f\"Expected {expected_layers} layers, got {len(mlp.layers)}\"\n",
-    "    print(\"✅ MLP has correct number of layers\")\n",
-    "    \n",
-    "    # Test layer types\n",
-    "    layer_types = [type(layer).__name__ for layer in mlp.layers]\n",
-    "    expected_pattern = ['Dense', 'ReLU', 'Dense', 'ReLU', 'Dense', 'Sigmoid']\n",
-    "    assert layer_types == expected_pattern, f\"Expected pattern {expected_pattern}, got {layer_types}\"\n",
-    "    print(\"✅ MLP follows correct layer pattern\")\n",
-    "    \n",
-    "    # Test with sample data\n",
-    "    x = Tensor([[1.0, 2.0, 3.0]])\n",
-    "    y = mlp(x)\n",
-    "    print(f\"MLP input: {x}\")\n",
-    "    print(f\"MLP output: {y}\")\n",
-    "    print(f\"MLP output shape: {y.shape}\")\n",
-    "    \n",
-    "    # Verify the output\n",
-    "    assert y.shape == (1, 1), f\"Expected shape (1, 1), got {y.shape}\"\n",
-    "    print(\"✅ MLP produces correct output shape\")\n",
-    "    \n",
-    "    # Test that sigmoid output is in valid range\n",
-    "    assert np.all(y.data >= 0) and np.all(y.data <= 1), \"Sigmoid output should be between 0 and 1\"\n",
-    "    print(\"✅ MLP output is in valid range\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ MLP creation test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "# Test different architectures\n",
-    "try:\n",
-    "    # Test shallow network\n",
-    "    shallow_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1)\n",
-    "    assert len(shallow_net.layers) == 4, f\"Shallow network should have 4 layers, got {len(shallow_net.layers)}\"\n",
-    "    \n",
-    "    # Test deep network  \n",
-    "    deep_net = create_mlp(input_size=3, hidden_sizes=[4, 4, 4], output_size=1)\n",
-    "    assert len(deep_net.layers) == 8, f\"Deep network should have 8 layers, got {len(deep_net.layers)}\"\n",
-    "    \n",
-    "    # Test wide network\n",
-    "    wide_net = create_mlp(input_size=3, hidden_sizes=[10], output_size=1)\n",
-    "    assert len(wide_net.layers) == 4, f\"Wide network should have 4 layers, got {len(wide_net.layers)}\"\n",
-    "    \n",
-    "    print(\"✅ Different MLP architectures work correctly\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ MLP architecture test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "# Show the MLP pattern\n",
-    "print(\"🎯 MLP creation pattern:\")\n",
-    "print(\"   Input → Dense → Activation → Dense → Activation → ... → Dense → Output_Activation\")\n",
-    "print(\"   Automatically creates the complete architecture\")\n",
-    "print(\"   Handles any number of hidden layers\")\n",
-    "print(\"📈 Progress: Sequential network ✓, MLP creation ✓\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "65373d18",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 4: Understanding Network Architectures\n",
-    "\n",
-    "### Architecture Patterns\n",
-    "Different network architectures solve different problems:\n",
-    "\n",
-    "#### **Shallow vs Deep Networks**\n",
-    "```python\n",
-    "# Shallow: 1 hidden layer\n",
-    "shallow = create_mlp(10, [20], 1)\n",
-    "\n",
-    "# Deep: Many hidden layers\n",
-    "deep = create_mlp(10, [20, 20, 20], 1)\n",
-    "```\n",
-    "\n",
-    "#### **Narrow vs Wide Networks**\n",
-    "```python\n",
-    "# Narrow: Few neurons per layer\n",
-    "narrow = create_mlp(10, [5, 5], 1)\n",
-    "\n",
-    "# Wide: Many neurons per layer\n",
-    "wide = create_mlp(10, [50], 1)\n",
-    "```\n",
-    "\n",
-    "### Why Architecture Matters\n",
-    "- **Capacity:** More parameters can learn more complex patterns\n",
-    "- **Depth:** Enables hierarchical feature learning\n",
-    "- **Width:** More units per layer let each layer represent more features at once\n",
-    "- **Efficiency:** Balance between performance and computation\n",
-    "\n",
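-    "To make the capacity point concrete, the sketch below counts trainable parameters for the four architectures above. It assumes each `Dense(n_in, n_out)` layer holds an `n_in × n_out` weight matrix plus `n_out` biases; `count_mlp_parameters` is a hypothetical helper used only for illustration, not part of TinyTorch.\n",
-    "\n",
-    "```python\n",
-    "from typing import List\n",
-    "\n",
-    "def count_mlp_parameters(input_size: int, hidden_sizes: List[int], output_size: int) -> int:\n",
-    "    # Assumption: every Dense layer has a weight matrix and a bias vector\n",
-    "    sizes = [input_size] + list(hidden_sizes) + [output_size]\n",
-    "    total = 0\n",
-    "    for n_in, n_out in zip(sizes[:-1], sizes[1:]):\n",
-    "        total += n_in * n_out + n_out  # weights + biases\n",
-    "    return total\n",
-    "\n",
-    "for name, hidden in [('shallow', [20]), ('deep', [20, 20, 20]), ('narrow', [5, 5]), ('wide', [50])]:\n",
-    "    print(name, count_mlp_parameters(10, hidden, 1))\n",
-    "```\n",
-    "\n",
-    "Under these assumptions the deep network has 1081 parameters against 241 for the shallow one, so extra depth or width buys capacity at the price of more computation.\n",
-    "\n",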
-    "### Different Activation Functions\n",
-    "```python\n",
-    "# ReLU networks (most common)\n",
-    "relu_net = create_mlp(10, [20], 1, activation=ReLU)\n",
-    "\n",
-    "# Tanh networks (centered around 0)\n",
-    "tanh_net = create_mlp(10, [20], 1, activation=Tanh)\n",
-    "\n",
-    "# Multi-class classification\n",
-    "classifier = create_mlp(10, [20], 3, output_activation=Softmax)\n",
-    "```\n",
-    "\n",
-    "Let's test different architectures!"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "8b592f27",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "### 🧪 Unit Test: Architecture Variations\n",
-    "\n",
-    "Let's test different network architectures to understand their behavior.\n",
-    "\n",
-    "**This is a unit test** - it tests architectural variations in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "014ae306",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-architectures",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test different architectures\n",
-    "print(\"🔬 Unit Test: Network Architecture Variations...\")\n",
-    "\n",
-    "try:\n",
-    "    # Test different activation functions\n",
-    "    relu_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1, activation=ReLU)\n",
-    "    tanh_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1, activation=Tanh)\n",
-    "    \n",
-    "    # Test different output activations\n",
-    "    classifier = create_mlp(input_size=3, hidden_sizes=[4], output_size=3, output_activation=Softmax)\n",
-    "    \n",
-    "    # Test with sample data\n",
-    "    x = Tensor([[1.0, 2.0, 3.0]])\n",
-    "    \n",
-    "    # Test ReLU network\n",
-    "    y_relu = relu_net(x)\n",
-    "    assert y_relu.shape == (1, 1), \"ReLU network should work\"\n",
-    "    print(\"✅ ReLU network works correctly\")\n",
-    "    \n",
-    "    # Test Tanh network\n",
-    "    y_tanh = tanh_net(x)\n",
-    "    assert y_tanh.shape == (1, 1), \"Tanh network should work\"\n",
-    "    print(\"✅ Tanh network works correctly\")\n",
-    "    \n",
-    "    # Test multi-class classifier\n",
-    "    y_multi = classifier(x)\n",
-    "    assert y_multi.shape == (1, 3), \"Multi-class classifier should work\"\n",
-    "    \n",
-    "    # Check softmax properties\n",
-    "    assert abs(np.sum(y_multi.data) - 1.0) < 1e-6, \"Softmax outputs should sum to 1\"\n",
-    "    print(\"✅ Multi-class classifier with Softmax works correctly\")\n",
-    "    \n",
-    "    # Test different architectures\n",
-    "    shallow = create_mlp(input_size=4, hidden_sizes=[5], output_size=1)\n",
-    "    deep = create_mlp(input_size=4, hidden_sizes=[5, 5, 5], output_size=1)\n",
-    "    wide = create_mlp(input_size=4, hidden_sizes=[20], output_size=1)\n",
-    "    \n",
-    "    x_test = Tensor([[1.0, 2.0, 3.0, 4.0]])\n",
-    "    \n",
-    "    # Test all architectures\n",
-    "    for name, net in [(\"Shallow\", shallow), (\"Deep\", deep), (\"Wide\", wide)]:\n",
-    "        y = net(x_test)\n",
-    "        assert y.shape == (1, 1), f\"{name} network should produce correct shape\"\n",
-    "        print(f\"✅ {name} network works correctly\")\n",
-    "    \n",
-    "    print(\"✅ All network architectures work correctly\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ Architecture test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "print(\"🎯 Architecture insights:\")\n",
-    "print(\"   Different activations create different behaviors\")\n",
-    "print(\"   Softmax enables multi-class classification\")\n",
-    "print(\"   Architecture affects network capacity and learning\")\n",
-    "print(\"📈 Progress: Sequential ✓, MLP creation ✓, Architecture variations ✓\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "95eaa020",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 5: Comprehensive Test - Complete Network Applications\n",
-    "\n",
-    "### Real-World Network Applications\n",
-    "Let's test our networks on realistic scenarios:\n",
-    "\n",
-    "#### **Classification Problem**\n",
-    "```python\n",
-    "# 4 features → 2 classes (binary classification)\n",
-    "classifier = create_mlp(4, [8, 4], 2, output_activation=Softmax)\n",
-    "```\n",
-    "\n",
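-    "The Softmax output turns the raw scores for the two classes into probabilities. A minimal NumPy sketch of the idea follows; `softmax_rows` is a hypothetical stand-in for illustration, not TinyTorch's `Softmax` implementation.\n",
-    "\n",
-    "```python\n",
-    "import numpy as np\n",
-    "\n",
-    "def softmax_rows(logits: np.ndarray) -> np.ndarray:\n",
-    "    # Subtract the row maximum first for numerical stability\n",
-    "    shifted = logits - np.max(logits, axis=1, keepdims=True)\n",
-    "    exps = np.exp(shifted)\n",
-    "    return exps / np.sum(exps, axis=1, keepdims=True)\n",
-    "\n",
-    "logits = np.array([[2.0, 0.5], [-1.0, 3.0]])  # 2 samples, 2 classes\n",
-    "probs = softmax_rows(logits)\n",
-    "print(probs.sum(axis=1))  # each row sums to 1, up to floating-point error\n",
-    "```\n",
-    "\n",
-    "This row-sum property is what the tests in this step check with `np.allclose(row_sums, 1.0)`.\n",
-    "\n",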
-    "#### **Regression Problem**\n",
-    "```python\n",
-    "# 3 features → 1 continuous output\n",
-    "regressor = create_mlp(3, [10, 5], 1, output_activation=Identity)  # Identity: pass-through activation (returns its input), giving a linear output\n",
-    "```\n",
-    "\n",
-    "#### **Deep Learning Pattern**\n",
-    "```python\n",
-    "# Complex feature learning\n",
-    "deep_net = create_mlp(10, [64, 32, 16], 1)\n",
-    "```\n",
-    "\n",
-    "This comprehensive test ensures our networks work for real ML applications!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "88a1a20f",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-integration",
-     "locked": true,
-     "points": 15,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Comprehensive test - complete network applications\n",
-    "print(\"🔬 Comprehensive Test: Complete Network Applications...\")\n",
-    "\n",
-    "try:\n",
-    "    # Test 1: Multi-class Classification (Iris-like dataset)\n",
-    "    print(\"\\n1. Multi-class Classification Test:\")\n",
-    "    iris_classifier = create_mlp(input_size=4, hidden_sizes=[8, 6], output_size=3, output_activation=Softmax)\n",
-    "    \n",
-    "    # Simulate iris features: [sepal_length, sepal_width, petal_length, petal_width]\n",
-    "    iris_samples = Tensor([\n",
-    "        [5.1, 3.5, 1.4, 0.2],  # Setosa\n",
-    "        [7.0, 3.2, 4.7, 1.4],  # Versicolor\n",
-    "        [6.3, 3.3, 6.0, 2.5]   # Virginica\n",
-    "    ])\n",
-    "    \n",
-    "    iris_predictions = iris_classifier(iris_samples)\n",
-    "    assert iris_predictions.shape == (3, 3), \"Iris classifier should output 3 classes for 3 samples\"\n",
-    "    \n",
-    "    # Check softmax properties\n",
-    "    row_sums = np.sum(iris_predictions.data, axis=1)\n",
-    "    assert np.allclose(row_sums, 1.0), \"Each prediction should sum to 1\"\n",
-    "    print(\"✅ Multi-class classification works correctly\")\n",
-    "    \n",
-    "    # Test 2: Regression Task (Housing prices)\n",
-    "    print(\"\\n2. Regression Task Test:\")\n",
-    "    # Create a regressor without final activation (linear output)\n",
-    "    class Identity:\n",
-    "        def __call__(self, x): return x\n",
-    "    \n",
-    "    housing_regressor = create_mlp(input_size=3, hidden_sizes=[10, 5], output_size=1, output_activation=Identity)\n",
-    "    \n",
-    "    # Simulate housing features: [size, bedrooms, location_score]\n",
-    "    housing_samples = Tensor([\n",
-    "        [2000, 3, 8.5],  # Large house, good location\n",
-    "        [1200, 2, 6.0],  # Medium house, ok location\n",
-    "        [800, 1, 4.0]    # Small house, poor location\n",
-    "    ])\n",
-    "    \n",
-    "    housing_predictions = housing_regressor(housing_samples)\n",
-    "    assert housing_predictions.shape == (3, 1), \"Housing regressor should output 1 value per sample\"\n",
-    "    print(\"✅ Regression task works correctly\")\n",
-    "    \n",
-    "    # Test 3: Deep Network Performance\n",
-    "    print(\"\\n3. Deep Network Test:\")\n",
-    "    deep_network = create_mlp(input_size=10, hidden_sizes=[20, 15, 10, 5], output_size=1)\n",
-    "    \n",
-    "    # Test with realistic batch size\n",
-    "    batch_data = Tensor(np.random.randn(32, 10))  # 32 samples, 10 features\n",
-    "    deep_predictions = deep_network(batch_data)\n",
-    "    \n",
-    "    assert deep_predictions.shape == (32, 1), \"Deep network should handle batch processing\"\n",
-    "    assert not np.any(np.isnan(deep_predictions.data)), \"Deep network should not produce NaN\"\n",
-    "    print(\"✅ Deep network handles batch processing correctly\")\n",
-    "    \n",
-    "    # Test 4: Network Composition\n",
-    "    print(\"\\n4. Network Composition Test:\")\n",
-    "    # Create a feature extractor and classifier separately\n",
-    "    feature_extractor = Sequential([\n",
-    "        Dense(input_size=10, output_size=5),\n",
-    "        ReLU(),\n",
-    "        Dense(input_size=5, output_size=3),\n",
-    "        ReLU()\n",
-    "    ])\n",
-    "    \n",
-    "    classifier_head = Sequential([\n",
-    "        Dense(input_size=3, output_size=2),\n",
-    "        Softmax()\n",
-    "    ])\n",
-    "    \n",
-    "    # Test composition\n",
-    "    raw_data = Tensor(np.random.randn(5, 10))\n",
-    "    features = feature_extractor(raw_data)\n",
-    "    final_predictions = classifier_head(features)\n",
-    "    \n",
-    "    assert features.shape == (5, 3), \"Feature extractor should output 3 features\"\n",
-    "    assert final_predictions.shape == (5, 2), \"Classifier should output 2 classes\"\n",
-    "    \n",
-    "    row_sums = np.sum(final_predictions.data, axis=1)\n",
-    "    assert np.allclose(row_sums, 1.0), \"Composed network predictions should be valid\"\n",
-    "    print(\"✅ Network composition works correctly\")\n",
-    "    \n",
-    "    print(\"\\n🎉 Comprehensive test passed! Your networks work correctly for:\")\n",
-    "    print(\"  • Multi-class classification (Iris flowers)\")\n",
-    "    print(\"  • Regression tasks (housing prices)\")\n",
-    "    print(\"  • Deep learning architectures\")\n",
-    "    print(\"  • Network composition and feature extraction\")\n",
-    "\n",
-    "except Exception as e:\n",
-    "    print(f\"❌ Comprehensive test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "print(\"📈 Final Progress: Complete network architectures ready for real ML applications!\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "5cdcbd10",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "networks-compatibility",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class MLP:\n",
-    "    \"\"\"\n",
-    "    Multi-Layer Perceptron (MLP) class.\n",
-    "    \n",
-    "    A convenient wrapper around Sequential networks for standard MLP architectures.\n",
-    "    Maintains parameter information and provides a clean interface.\n",
-    "    \n",
-    "    Args:\n",
-    "        input_size: Number of input features\n",
-    "        hidden_size: Size of the single hidden layer\n",
-    "        output_size: Number of output features\n",
-    "        activation: Activation function for hidden layer (default: ReLU)\n",
-    "        output_activation: Activation function for output layer (default: None, which leaves the output linear)\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, input_size: int, hidden_size: int, output_size: int, \n",
-    "                 activation=ReLU, output_activation=None):\n",
-    "        self.input_size = input_size\n",
-    "        self.hidden_size = hidden_size\n",
-    "        self.output_size = output_size\n",
-    "        \n",
-    "        # Build the network layers\n",
-    "        layers = []\n",
-    "        \n",
-    "        # Input to hidden layer\n",
-    "        layers.append(Dense(input_size, hidden_size))\n",
-    "        layers.append(activation())\n",
-    "        \n",
-    "        # Hidden to output layer\n",
-    "        layers.append(Dense(hidden_size, output_size))\n",
-    "        if output_activation is not None:\n",
-    "            layers.append(output_activation())\n",
-    "        \n",
-    "        self.network = Sequential(layers)\n",
-    "    \n",
-    "    def forward(self, x):\n",
-    "        \"\"\"Forward pass through the MLP network.\"\"\"\n",
-    "        return self.network.forward(x)\n",
-    "    \n",
-    "    def __call__(self, x):\n",
-    "        \"\"\"Make the MLP callable.\"\"\"\n",
-    "        return self.forward(x)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "1fac91d2",
-   "metadata": {
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "\n",
-    "def test_sequential_networks():\n",
-    "    \"\"\"Test Sequential network implementation comprehensively.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Sequential Networks...\")\n",
-    "    \n",
-    "    # Test basic Sequential network\n",
-    "    net = Sequential([\n",
-    "        Dense(input_size=3, output_size=4),\n",
-    "        ReLU(),\n",
-    "        Dense(input_size=4, output_size=2),\n",
-    "        Sigmoid()\n",
-    "    ])\n",
-    "    \n",
-    "    x = Tensor([[1.0, 2.0, 3.0]])\n",
-    "    y = net(x)\n",
-    "    \n",
-    "    assert y.shape == (1, 2), \"Sequential network should produce correct output shape\"\n",
-    "    assert np.all(y.data > 0), \"Sigmoid output should be positive\"\n",
-    "    assert np.all(y.data < 1), \"Sigmoid output should be less than 1\"\n",
-    "    \n",
-    "    print(\"✅ Sequential networks work correctly\")\n",
-    "\n",
-    "def test_mlp_creation():\n",
-    "    \"\"\"Test MLP creation function comprehensively.\"\"\"\n",
-    "    print(\"🔬 Unit Test: MLP Creation...\")\n",
-    "    \n",
-    "    # Test different MLP architectures\n",
-    "    shallow = create_mlp(input_size=4, hidden_sizes=[5], output_size=1)\n",
-    "    deep = create_mlp(input_size=4, hidden_sizes=[8, 6, 4], output_size=2)\n",
-    "    \n",
-    "    x = Tensor([[1.0, 2.0, 3.0, 4.0]])\n",
-    "    \n",
-    "    # Test shallow network\n",
-    "    y_shallow = shallow(x)\n",
-    "    assert y_shallow.shape == (1, 1), \"Shallow MLP should work\"\n",
-    "    \n",
-    "    # Test deep network  \n",
-    "    y_deep = deep(x)\n",
-    "    assert y_deep.shape == (1, 2), \"Deep MLP should work\"\n",
-    "    \n",
-    "    print(\"✅ MLP creation works correctly\")\n",
-    "\n",
-    "def test_network_architectures():\n",
-    "    \"\"\"Test different network architectures comprehensively.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Network Architectures...\")\n",
-    "    \n",
-    "    # Test different activation functions\n",
-    "    relu_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1, activation=ReLU)\n",
-    "    tanh_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1, activation=Tanh)\n",
-    "    \n",
-    "    # Test multi-class classifier\n",
-    "    classifier = create_mlp(input_size=3, hidden_sizes=[4], output_size=3, output_activation=Softmax)\n",
-    "    \n",
-    "    x = Tensor([[1.0, 2.0, 3.0]])\n",
-    "    \n",
-    "    # Test all architectures\n",
-    "    y_relu = relu_net(x)\n",
-    "    y_tanh = tanh_net(x)\n",
-    "    y_multi = classifier(x)\n",
-    "    \n",
-    "    assert y_relu.shape == (1, 1), \"ReLU network should work\"\n",
-    "    assert y_tanh.shape == (1, 1), \"Tanh network should work\"\n",
-    "    assert y_multi.shape == (1, 3), \"Multi-class classifier should work\"\n",
-    "    assert abs(np.sum(y_multi.data) - 1.0) < 1e-6, \"Softmax outputs should sum to 1\"\n",
-    "    \n",
-    "    print(\"✅ Network architectures work correctly\")\n",
-    "\n",
-    "def test_networks():\n",
-    "    \"\"\"Test networks comprehensively on realistic ML scenarios.\"\"\"\n",
-    "    print(\"🔬 Comprehensive Test: Network Applications...\")\n",
-    "    \n",
-    "    # Test multi-class classification\n",
-    "    iris_classifier = create_mlp(input_size=4, hidden_sizes=[8, 6], output_size=3, output_activation=Softmax)\n",
-    "    iris_samples = Tensor([[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4], [6.3, 3.3, 6.0, 2.5]])\n",
-    "    iris_predictions = iris_classifier(iris_samples)\n",
-    "    \n",
-    "    assert iris_predictions.shape == (3, 3), \"Iris classifier should work\"\n",
-    "    row_sums = np.sum(iris_predictions.data, axis=1)\n",
-    "    assert np.allclose(row_sums, 1.0), \"Predictions should sum to 1\""
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "4501d38a",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧪 Module Testing\n",
-    "\n",
-    "Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
-    "\n",
-    "**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "2207bbaf",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "standardized-testing",
-     "locked": true,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# =============================================================================\n",
-    "# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
-    "# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
-    "# =============================================================================\n",
-    "\n",
-    "if __name__ == \"__main__\":\n",
-    "    from tito.tools.testing import run_module_tests_auto\n",
-    "    \n",
-    "    # Automatically discover and run all tests in this module\n",
-    "    success = run_module_tests_auto(\"Networks\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "6bb77317",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🎯 Module Summary: Neural Network Architectures Mastery!\n",
-    "\n",
-    "Congratulations! You've successfully implemented complete neural network architectures:\n",
-    "\n",
-    "### What You've Accomplished\n",
-    "✅ **Sequential Networks**: Chained layers for complex transformations\n",
-    "✅ **MLP Creation**: Multi-layer perceptrons with flexible architectures\n",
-    "✅ **Network Architectures**: Different activation patterns and output types\n",
-    "✅ **Integration**: Real-world applications like classification and regression\n",
-    "\n",
-    "### Key Concepts You've Learned\n",
-    "- **Sequential Processing**: How layers chain together for complex functions\n",
-    "- **MLP Design**: Multi-layer perceptrons as universal function approximators  \n",
-    "- **Architecture Choices**: How depth, width, and activations affect learning\n",
-    "- **Real Applications**: Classification, regression, and feature extraction\n",
-    "\n",
-    "### Next Steps\n",
-    "1. **Export your code**: `tito package nbdev --export 04_networks`\n",
-    "2. **Test your implementation**: `tito test 04_networks`\n",
-    "3. **Build complete models**: Combine with training for full ML pipelines\n",
-    "4. **Move to Module 5**: Add convolutional layers for image processing!\n",
-    "\n",
-    "**Ready for CNNs?** Your network foundations are now ready for specialized architectures!"
-   ]
-  }
- ],
- "metadata": {
-  "jupytext": {
-   "main_language": "python"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/modules/source/06_spatial/spatial_dev.ipynb b/modules/source/06_spatial/spatial_dev.ipynb
deleted file mode 100644
index 9bd5e4d1..00000000
--- a/modules/source/06_spatial/spatial_dev.ipynb
+++ /dev/null
@@ -1,1106 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "0eb7442f",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "# CNN - Convolutional Neural Networks\n",
-    "\n",
-    "Welcome to the CNN module! Here you'll implement the core building block of modern computer vision: the convolutional layer.\n",
-    "\n",
-    "## Learning Goals\n",
-    "- Understand the convolution operation and its importance in computer vision\n",
-    "- Implement Conv2D with explicit for-loops to understand the sliding window mechanism\n",
-    "- Build convolutional layers that can detect spatial patterns in images\n",
-    "- Compose Conv2D with other layers to build complete convolutional networks\n",
-    "- See how convolution enables parameter sharing and translation invariance\n",
-    "\n",
-    "## Build → Use → Reflect\n",
-    "1. **Build**: Conv2D layer using sliding window convolution from scratch\n",
-    "2. **Use**: Transform images and see feature maps emerge\n",
-    "3. **Reflect**: How CNNs learn hierarchical spatial patterns\n",
-    "\n",
-    "## What You'll Learn\n",
-    "By the end of this module, you'll understand:\n",
-    "- How convolution works as a sliding window operation\n",
-    "- Why convolution is perfect for spatial data like images\n",
-    "- How to build learnable convolutional layers\n",
-    "- The CNN pipeline: Conv2D → Activation → Flatten → Dense\n",
-    "- How parameter sharing makes CNNs efficient"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "dcbef292",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "cnn-imports",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| default_exp core.spatial\n",
-    "\n",
-    "#| export\n",
-    "import numpy as np\n",
-    "import os\n",
-    "import sys\n",
-    "from typing import List, Tuple, Optional\n",
-    "import matplotlib.pyplot as plt\n",
-    "\n",
-    "# Import from the main package - try package first, then local modules\n",
-    "try:\n",
-    "    from tinytorch.core.tensor import Tensor\n",
-    "    from tinytorch.core.layers import Dense\n",
-    "    from tinytorch.core.activations import ReLU\n",
-    "except ImportError:\n",
-    "    # For development, import from local modules\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations'))\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '03_layers'))\n",
-    "    from tensor_dev import Tensor\n",
-    "    from activations_dev import ReLU\n",
-    "    from layers_dev import Dense"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "708b859a",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "cnn-setup",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| hide\n",
-    "#| export\n",
-    "def _should_show_plots():\n",
-    "    \"\"\"Check if we should show plots (disable during testing)\"\"\"\n",
-    "    # Check multiple conditions that indicate we're in test mode\n",
-    "    is_pytest = (\n",
-    "        'pytest' in sys.modules or\n",
-    "        'test' in sys.argv or\n",
-    "        os.environ.get('PYTEST_CURRENT_TEST') is not None or\n",
-    "        any('test' in arg for arg in sys.argv) or\n",
-    "        any('pytest' in arg for arg in sys.argv)\n",
-    "    )\n",
-    "    \n",
-    "    # Show plots in development mode (when not in test mode)\n",
-    "    return not is_pytest"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "afb3e12a",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "cnn-welcome",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "print(\"🔥 TinyTorch CNN Module\")\n",
-    "print(f\"NumPy version: {np.__version__}\")\n",
-    "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
-    "print(\"Ready to build convolutional neural networks!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0c0fae33",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 📦 Where This Code Lives in the Final Package\n",
-    "\n",
-    "**Learning Side:** You work in `modules/source/05_cnn/cnn_dev.py`  \n",
-    "**Building Side:** Code exports to `tinytorch.core.cnn`\n",
-    "\n",
-    "```python\n",
-    "# Final package structure:\n",
-    "from tinytorch.core.cnn import Conv2D, conv2d_naive, flatten  # CNN operations!\n",
-    "from tinytorch.core.layers import Dense  # Fully connected layers\n",
-    "from tinytorch.core.activations import ReLU  # Nonlinearity\n",
-    "from tinytorch.core.tensor import Tensor  # Foundation\n",
-    "```\n",
-    "\n",
-    "**Why this matters:**\n",
-    "- **Learning:** Focused modules for deep understanding of convolution\n",
-    "- **Production:** Proper organization like PyTorch's `torch.nn.Conv2d`\n",
-    "- **Consistency:** All CNN operations live together in `core.cnn`\n",
-    "- **Integration:** Works seamlessly with other TinyTorch components"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "3f3d5bdd",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 1: Understanding Convolution\n",
-    "\n",
-    "### What is Convolution?\n",
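-    "**Convolution** is a mathematical operation that slides a small filter (kernel) across an input, computing a dot product at each position. Strictly speaking, this module (like most deep learning libraries) computes cross-correlation: the kernel is applied without being flipped.\n",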
-    "\n",
-    "### Why Convolution is Perfect for Images\n",
-    "- **Local patterns**: Images have local structure (edges, textures)\n",
-    "- **Translation equivariance**: The same pattern can appear anywhere, and the filter response shifts with it\n",
-    "- **Parameter sharing**: One filter detects the pattern everywhere\n",
-    "- **Spatial hierarchy**: Multiple layers build increasingly complex features\n",
-    "\n",
-    "### The Fundamental Insight\n",
-    "**Convolution is pattern matching!** The kernel learns to detect specific patterns:\n",
-    "- **Edge detectors**: Find boundaries between objects\n",
-    "- **Texture detectors**: Recognize surface patterns\n",
-    "- **Shape detectors**: Identify geometric forms\n",
-    "- **Feature detectors**: Combine simple patterns into complex features\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Image processing**: Detect edges, blur, sharpen\n",
-    "- **Computer vision**: Recognize objects, faces, text\n",
-    "- **Medical imaging**: Detect tumors, analyze scans\n",
-    "- **Autonomous driving**: Identify traffic signs, pedestrians\n",
-    "\n",
-    "### Visual Intuition\n",
-    "```\n",
-    "Input Image:     Kernel:        Output Feature Map:\n",
-    "[1, 2, 3]       [1,  0]       [1*1+2*0+4*0+5*(-1), 2*1+3*0+5*0+6*(-1)]\n",
-    "[4, 5, 6]       [0, -1]       [4*1+5*0+7*0+8*(-1), 5*1+6*0+8*0+9*(-1)]\n",
-    "[7, 8, 9]\n",
-    "```\n",
-    "\n",
-    "The kernel slides across the input, computing dot products at each position.\n",
-    "\n",
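-    "As a quick check of the arithmetic above, here is a minimal NumPy sketch (not the module's `conv2d_naive`; the names `image`, `kernel`, and `out` are chosen here for illustration) that evaluates each 2×2 window from the diagram:\n",
-    "\n",
-    "```python\n",
-    "import numpy as np\n",
-    "\n",
-    "# Values copied from the diagram above\n",
-    "image = np.array([[1, 2, 3],\n",
-    "                  [4, 5, 6],\n",
-    "                  [7, 8, 9]])\n",
-    "kernel = np.array([[1, 0],\n",
-    "                   [0, -1]])\n",
-    "\n",
-    "# Each output entry is the elementwise product of the kernel\n",
-    "# with one 2x2 window of the image, summed.\n",
-    "out = np.array([[np.sum(image[i:i + 2, j:j + 2] * kernel) for j in range(2)]\n",
-    "                for i in range(2)])\n",
-    "print(out)  # every entry is -4\n",
-    "```\n",
-    "\n",
-    "All four windows give the same value because this kernel subtracts the bottom-right entry from the top-left one, and that difference is constant in this input.\n",
-    "\n",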
-    "Let's implement this step by step!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "2e7367d3",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "conv2d-naive",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:\n",
-    "    \"\"\"\n",
-    "    Naive 2D convolution (single channel, no stride, no padding).\n",
-    "    \n",
-    "    Args:\n",
-    "        input: 2D input array (H, W)\n",
-    "        kernel: 2D filter (kH, kW)\n",
-    "    Returns:\n",
-    "        2D output array (H-kH+1, W-kW+1)\n",
-    "        \n",
-    "    TODO: Implement the sliding window convolution using for-loops.\n",
-    "    \n",
-    "    APPROACH:\n",
-    "    1. Get input dimensions: H, W = input.shape\n",
-    "    2. Get kernel dimensions: kH, kW = kernel.shape\n",
-    "    3. Calculate output dimensions: out_H = H - kH + 1, out_W = W - kW + 1\n",
-    "    4. Create output array: np.zeros((out_H, out_W))\n",
-    "    5. Use nested loops to slide the kernel:\n",
-    "       - i loop: output rows (0 to out_H-1)\n",
-    "       - j loop: output columns (0 to out_W-1)\n",
-    "       - di loop: kernel rows (0 to kH-1)\n",
-    "       - dj loop: kernel columns (0 to kW-1)\n",
-    "    6. For each (i,j), compute: output[i,j] += input[i+di, j+dj] * kernel[di, dj]\n",
-    "    \n",
-    "    EXAMPLE:\n",
-    "    Input: [[1, 2, 3],     Kernel: [[1, 0],\n",
-    "            [4, 5, 6],              [0, -1]]\n",
-    "            [7, 8, 9]]\n",
-    "    \n",
-    "    Output[0,0] = 1*1 + 2*0 + 4*0 + 5*(-1) = 1 - 5 = -4\n",
-    "    Output[0,1] = 2*1 + 3*0 + 5*0 + 6*(-1) = 2 - 6 = -4\n",
-    "    Output[1,0] = 4*1 + 5*0 + 7*0 + 8*(-1) = 4 - 8 = -4\n",
-    "    Output[1,1] = 5*1 + 6*0 + 8*0 + 9*(-1) = 5 - 9 = -4\n",
-    "    \n",
-    "    HINTS:\n",
-    "    - Start with output = np.zeros((out_H, out_W))\n",
-    "    - Use four nested loops: for i in range(out_H): for j in range(out_W): for di in range(kH): for dj in range(kW):\n",
-    "    - Accumulate the sum: output[i,j] += input[i+di, j+dj] * kernel[di, dj]\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Get input and kernel dimensions\n",
-    "    H, W = input.shape\n",
-    "    kH, kW = kernel.shape\n",
-    "    \n",
-    "    # Calculate output dimensions\n",
-    "    out_H, out_W = H - kH + 1, W - kW + 1\n",
-    "    \n",
-    "    # Initialize output array (promoted dtype, so an integer input with a float kernel is not truncated)\n",
-    "    output = np.zeros((out_H, out_W), dtype=np.result_type(input, kernel))\n",
-    "    \n",
-    "    # Sliding window convolution with four nested loops\n",
-    "    for i in range(out_H):\n",
-    "        for j in range(out_W):\n",
-    "            for di in range(kH):\n",
-    "                for dj in range(kW):\n",
-    "                    output[i, j] += input[i + di, j + dj] * kernel[di, dj]\n",
-    "    \n",
-    "    return output\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f864dd60",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "### 🧪 Unit Test: Convolution Operation\n",
-    "\n",
-    "Let's test your convolution implementation right away! This is the core operation that powers computer vision.\n",
-    "\n",
-    "**This is a unit test** - it tests one specific function (conv2d_naive) in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "0a397c9b",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-conv2d-naive-immediate",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test conv2d_naive function immediately after implementation\n",
-    "print(\"🔬 Unit Test: Convolution Operation...\")\n",
-    "\n",
-    "# Test simple 3x3 input with 2x2 kernel\n",
-    "try:\n",
-    "    input_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)\n",
-    "    kernel_array = np.array([[1, 0], [0, 1]], dtype=np.float32)  # Diagonal kernel: adds top-left and bottom-right of each window\n",
-    "    \n",
-    "    result = conv2d_naive(input_array, kernel_array)\n",
-    "    expected = np.array([[6, 8], [12, 14]], dtype=np.float32)  # 1+5, 2+6, 4+8, 5+9\n",
-    "    \n",
-    "    print(f\"Input:\\n{input_array}\")\n",
-    "    print(f\"Kernel:\\n{kernel_array}\")\n",
-    "    print(f\"Result:\\n{result}\")\n",
-    "    print(f\"Expected:\\n{expected}\")\n",
-    "    \n",
-    "    assert np.allclose(result, expected), f\"Convolution failed: expected {expected}, got {result}\"\n",
-    "    print(\"✅ Simple convolution test passed\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ Simple convolution test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "# Test edge detection kernel\n",
-    "try:\n",
-    "    input_array = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]], dtype=np.float32)\n",
-    "    edge_kernel = np.array([[-1, -1], [-1, 3]], dtype=np.float32)  # Edge detection\n",
-    "    \n",
-    "    result = conv2d_naive(input_array, edge_kernel)\n",
-    "    expected = np.array([[0, 0], [0, 0]], dtype=np.float32)  # Uniform region = no edges\n",
-    "    \n",
-    "    assert np.allclose(result, expected), f\"Edge detection failed: expected {expected}, got {result}\"\n",
-    "    print(\"✅ Edge detection test passed\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ Edge detection test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "# Test output shape\n",
-    "try:\n",
-    "    input_5x5 = np.random.randn(5, 5).astype(np.float32)\n",
-    "    kernel_3x3 = np.random.randn(3, 3).astype(np.float32)\n",
-    "    \n",
-    "    result = conv2d_naive(input_5x5, kernel_3x3)\n",
-    "    expected_shape = (3, 3)  # 5-3+1 = 3\n",
-    "    \n",
-    "    assert result.shape == expected_shape, f\"Output shape wrong: expected {expected_shape}, got {result.shape}\"\n",
-    "    print(\"✅ Output shape test passed\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ Output shape test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "# Show the convolution process\n",
-    "print(\"🎯 Convolution behavior:\")\n",
-    "print(\"   Slides kernel across input\")\n",
-    "print(\"   Computes dot product at each position\")\n",
-    "print(\"   Output size = Input size - Kernel size + 1\")\n",
-    "print(\"📈 Progress: Convolution operation ✓\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "21785fd9",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 2: Building the Conv2D Layer\n",
-    "\n",
-    "### What is a Conv2D Layer?\n",
-    "A **Conv2D layer** is a learnable convolutional layer that:\n",
-    "- Has learnable kernel weights (initialized randomly)\n",
-    "- Applies convolution to input tensors\n",
-    "- Integrates with the rest of the neural network\n",
-    "\n",
-    "### Why Conv2D Layers Matter\n",
-    "- **Feature learning**: Kernels learn to detect useful patterns\n",
-    "- **Composability**: Can be stacked with other layers\n",
-    "- **Efficiency**: Shared weights reduce parameters dramatically\n",
-    "- **Translation equivariance**: The same kernel detects a pattern wherever it appears in the image\n",
-    "\n",
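-    "To put a number on the parameter savings, consider a rough count under assumed sizes (a 28×28 input and a 3×3 kernel, both hypothetical). A fully connected layer producing the same 26×26 output would need one weight per input/output pair, while the convolution reuses a single kernel:\n",
-    "\n",
-    "```python\n",
-    "# Hypothetical sizes chosen for illustration: 28x28 input, 3x3 kernel\n",
-    "H, W = 28, 28\n",
-    "kH, kW = 3, 3\n",
-    "out_H, out_W = H - kH + 1, W - kW + 1     # 26 x 26 output\n",
-    "\n",
-    "conv_params = kH * kW                     # one shared kernel\n",
-    "dense_params = (H * W) * (out_H * out_W)  # one weight per input/output pair\n",
-    "\n",
-    "print(conv_params)   # 9\n",
-    "print(dense_params)  # 529984\n",
-    "```\n",
-    "\n",
-    "Bias terms are ignored in both counts to keep the comparison simple.\n",
-    "\n",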
-    "### Real-World Applications\n",
-    "- **Image classification**: Recognize objects in photos\n",
-    "- **Object detection**: Find and locate objects\n",
-    "- **Medical imaging**: Detect anomalies in scans\n",
-    "- **Autonomous driving**: Identify road features\n",
-    "\n",
-    "### Design Decisions\n",
-    "- **Kernel size**: Typically 3×3 or 5×5 for balance of locality and capacity\n",
-    "- **Initialization**: Small random values to break symmetry\n",
-    "- **Integration**: Works with Tensor class and other layers"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "1419fa91",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "conv2d-class",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class Conv2D:\n",
-    "    \"\"\"\n",
-    "    2D Convolutional Layer (single channel, single filter, no stride/pad).\n",
-    "    \n",
-    "    A learnable convolutional layer that applies a kernel to detect spatial patterns.\n",
-    "    Perfect for building the foundation of convolutional neural networks.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, kernel_size: Tuple[int, int]):\n",
-    "        \"\"\"\n",
-    "        Initialize Conv2D layer with random kernel.\n",
-    "        \n",
-    "        Args:\n",
-    "            kernel_size: (kH, kW) - size of the convolution kernel\n",
-    "            \n",
-    "        TODO: Initialize a random kernel with small values.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Store kernel_size as instance variable\n",
-    "        2. Initialize random kernel with small values\n",
-    "        3. Use proper initialization for stable training\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        Conv2D((2, 2)) creates:\n",
-    "        - kernel: shape (2, 2) with small random values\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Store kernel_size as self.kernel_size\n",
-    "        - Initialize kernel: np.random.randn(kH, kW) * 0.1 (small values)\n",
-    "        - Convert to float32 for consistency\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Store kernel size\n",
-    "        self.kernel_size = kernel_size\n",
-    "        kH, kW = kernel_size\n",
-    "        \n",
-    "        # Initialize random kernel with small values\n",
-    "        self.kernel = np.random.randn(kH, kW).astype(np.float32) * 0.1\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def forward(self, x):\n",
-    "        \"\"\"\n",
-    "        Forward pass: apply convolution to input tensor.\n",
-    "        \n",
-    "        Args:\n",
-    "            x: Input tensor (2D for simplicity)\n",
-    "            \n",
-    "        Returns:\n",
-    "            Output tensor after convolution\n",
-    "            \n",
-    "        TODO: Implement forward pass using conv2d_naive function.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Extract numpy array from input tensor\n",
-    "        2. Apply conv2d_naive with stored kernel\n",
-    "        3. Return result wrapped in Tensor\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        x = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # shape (3, 3)\n",
-    "        layer = Conv2D((2, 2))\n",
-    "        y = layer(x)  # shape (2, 2)\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use x.data to get numpy array\n",
-    "        - Use conv2d_naive(x.data, self.kernel)\n",
-    "        - Return Tensor(result) to wrap the result\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Apply convolution using naive implementation\n",
-    "        result = conv2d_naive(x.data, self.kernel)\n",
-    "        return type(x)(result)\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def __call__(self, x):\n",
-    "        \"\"\"Make layer callable: layer(x) same as layer.forward(x)\"\"\"\n",
-    "        return self.forward(x)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "bcbe5521",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "### 🧪 Unit Test: Conv2D Layer\n",
-    "\n",
-    "Let's test your Conv2D layer implementation! This layer holds a randomly initialized kernel, the weights a CNN adjusts during training.\n",
-    "\n",
-    "**This is a unit test** - it tests one specific class (Conv2D) in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "dc785267",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-conv2d-layer-immediate",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test Conv2D layer immediately after implementation\n",
-    "print(\"🔬 Unit Test: Conv2D Layer...\")\n",
-    "\n",
-    "# Create a Conv2D layer\n",
-    "try:\n",
-    "    layer = Conv2D(kernel_size=(2, 2))\n",
-    "    print(f\"Conv2D layer created with kernel size: {layer.kernel_size}\")\n",
-    "    print(f\"Kernel shape: {layer.kernel.shape}\")\n",
-    "    \n",
-    "    # Test that kernel is initialized properly\n",
-    "    assert layer.kernel.shape == (2, 2), f\"Kernel shape should be (2, 2), got {layer.kernel.shape}\"\n",
-    "    assert not np.allclose(layer.kernel, 0), \"Kernel should not be all zeros\"\n",
-    "    print(\"✅ Conv2D layer initialization successful\")\n",
-    "    \n",
-    "    # Test with sample input\n",
-    "    x = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n",
-    "    print(f\"Input shape: {x.shape}\")\n",
-    "    \n",
-    "    y = layer(x)\n",
-    "    print(f\"Output shape: {y.shape}\")\n",
-    "    print(f\"Output: {y}\")\n",
-    "    \n",
-    "    # Verify shapes\n",
-    "    assert y.shape == (2, 2), f\"Output shape should be (2, 2), got {y.shape}\"\n",
-    "    assert isinstance(y, Tensor), \"Output should be a Tensor\"\n",
-    "    print(\"✅ Conv2D layer forward pass successful\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ Conv2D layer test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "# Test different kernel sizes\n",
-    "try:\n",
-    "    layer_3x3 = Conv2D(kernel_size=(3, 3))\n",
-    "    x_5x5 = Tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]])\n",
-    "    y_3x3 = layer_3x3(x_5x5)\n",
-    "    \n",
-    "    assert y_3x3.shape == (3, 3), f\"3x3 kernel output should be (3, 3), got {y_3x3.shape}\"\n",
-    "    print(\"✅ Different kernel sizes work correctly\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ Different kernel sizes test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "# Show the layer behavior\n",
-    "print(\"🎯 Conv2D layer behavior:\")\n",
-    "print(\"   Learnable kernel weights\")\n",
-    "print(\"   Applies convolution to detect patterns\")\n",
-    "print(\"   Can be trained end-to-end\")\n",
-    "print(\"📈 Progress: Convolution operation ✓, Conv2D layer ✓\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "04dd554d",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 3: Flattening for Dense Layers\n",
-    "\n",
-    "### What is Flattening?\n",
-    "**Flattening** converts multi-dimensional tensors to 1D vectors, enabling connection between convolutional and dense layers.\n",
-    "\n",
-    "### Why Flattening is Needed\n",
-    "- **Interface compatibility**: Conv2D outputs a 2D feature map, Dense expects a flat feature vector per sample\n",
-    "- **Network composition**: Connect spatial features to classification\n",
-    "- **Standard practice**: Almost all CNNs use this pattern\n",
-    "- **Dimension management**: Preserve information while changing shape\n",
-    "\n",
-    "### The Pattern\n",
-    "```\n",
-    "Conv2D → ReLU → Conv2D → ReLU → Flatten → Dense → Output\n",
-    "```\n",
-    "\n",
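-    "When sizing the Dense layer that follows Flatten, it helps to trace shapes through the pattern. The sketch below assumes a no-stride, no-padding convolution, where each output dimension is the input dimension minus the kernel dimension plus one. `conv_output_shape` is a hypothetical helper, not part of the module, and the 5×5 input and 2×2 kernels are illustrative choices:\n",
-    "\n",
-    "```python\n",
-    "def conv_output_shape(in_shape, kernel_shape):\n",
-    "    # Output (H, W) of a no-stride, no-padding convolution\n",
-    "    (H, W), (kH, kW) = in_shape, kernel_shape\n",
-    "    return (H - kH + 1, W - kW + 1)\n",
-    "\n",
-    "shape = (5, 5)                            # hypothetical input size\n",
-    "shape = conv_output_shape(shape, (2, 2))  # first Conv2D  -> (4, 4)\n",
-    "shape = conv_output_shape(shape, (2, 2))  # second Conv2D -> (3, 3)\n",
-    "dense_input_size = shape[0] * shape[1]    # Flatten       -> 9 features\n",
-    "print(shape, dense_input_size)\n",
-    "```\n",
-    "\n",
-    "ReLU is elementwise and leaves the shape unchanged, so only the Conv2D steps appear in the trace.\n",
-    "\n",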
-    "### Real-World Usage\n",
-    "- **Classification**: Final layers need 1D input for class probabilities\n",
-    "- **Feature extraction**: Convert spatial features to vector representations\n",
-    "- **Transfer learning**: Extract features from pre-trained CNNs"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "fc87c45d",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "flatten-function",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def flatten(x):\n",
-    "    \"\"\"\n",
-    "    Flatten a 2D tensor into a row vector (for connecting to Dense layers).\n",
-    "    \n",
-    "    Args:\n",
-    "        x: Input tensor to flatten\n",
-    "        \n",
-    "    Returns:\n",
-    "        Flattened tensor of shape (1, N), with a leading batch dimension added\n",
-    "        \n",
-    "    TODO: Implement flattening operation.\n",
-    "    \n",
-    "    APPROACH:\n",
-    "    1. Get the numpy array from the tensor\n",
-    "    2. Use .flatten() to convert to 1D\n",
-    "    3. Add batch dimension with [None, :]\n",
-    "    4. Return Tensor wrapped around the result\n",
-    "    \n",
-    "    EXAMPLE:\n",
-    "    Input: Tensor([[1, 2], [3, 4]])  # shape (2, 2)\n",
-    "    Output: Tensor([[1, 2, 3, 4]])  # shape (1, 4)\n",
-    "    \n",
-    "    HINTS:\n",
-    "    - Use x.data.flatten() to get 1D array\n",
-    "    - Add batch dimension: result[None, :]\n",
-    "    - Return Tensor(result)\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Flatten the tensor and add batch dimension\n",
-    "    flattened = x.data.flatten()\n",
-    "    result = flattened[None, :]  # Add batch dimension\n",
-    "    return type(x)(result)\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "26b6d81b",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "### 🧪 Unit Test: Flatten Function\n",
-    "\n",
-    "Let's test your flatten function! This connects convolutional layers to dense layers.\n",
-    "\n",
-    "**This is a unit test** - it tests one specific function (flatten) in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c49e29e6",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-flatten-immediate",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test flatten function immediately after implementation\n",
-    "print(\"🔬 Unit Test: Flatten Function...\")\n",
-    "\n",
-    "# Test case 1: 2x2 tensor\n",
-    "try:\n",
-    "    x = Tensor([[1, 2], [3, 4]])\n",
-    "    flattened = flatten(x)\n",
-    "    \n",
-    "    print(f\"Input: {x}\")\n",
-    "    print(f\"Flattened: {flattened}\")\n",
-    "    print(f\"Flattened shape: {flattened.shape}\")\n",
-    "    \n",
-    "    # Verify shape and content\n",
-    "    assert flattened.shape == (1, 4), f\"Flattened shape should be (1, 4), got {flattened.shape}\"\n",
-    "    expected_data = np.array([[1, 2, 3, 4]])\n",
-    "    assert np.array_equal(flattened.data, expected_data), f\"Flattened data should be {expected_data}, got {flattened.data}\"\n",
-    "    print(\"✅ 2x2 flatten test passed\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ 2x2 flatten test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "# Test case 2: 3x3 tensor\n",
-    "try:\n",
-    "    x2 = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n",
-    "    flattened2 = flatten(x2)\n",
-    "    \n",
-    "    assert flattened2.shape == (1, 9), f\"Flattened shape should be (1, 9), got {flattened2.shape}\"\n",
-    "    expected_data2 = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])\n",
-    "    assert np.array_equal(flattened2.data, expected_data2), f\"Flattened data should be {expected_data2}, got {flattened2.data}\"\n",
-    "    print(\"✅ 3x3 flatten test passed\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ 3x3 flatten test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "# Test case 3: Different shapes\n",
-    "try:\n",
-    "    x3 = Tensor([[1, 2, 3, 4], [5, 6, 7, 8]])  # 2x4\n",
-    "    flattened3 = flatten(x3)\n",
-    "    \n",
-    "    assert flattened3.shape == (1, 8), f\"Flattened shape should be (1, 8), got {flattened3.shape}\"\n",
-    "    expected_data3 = np.array([[1, 2, 3, 4, 5, 6, 7, 8]])\n",
-    "    assert np.array_equal(flattened3.data, expected_data3), f\"Flattened data should be {expected_data3}, got {flattened3.data}\"\n",
-    "    print(\"✅ Different shapes flatten test passed\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ Different shapes flatten test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "# Show the flattening behavior\n",
-    "print(\"🎯 Flatten behavior:\")\n",
-    "print(\"   Converts 2D tensor to shape (1, N)\")\n",
-    "print(\"   Adds a leading batch dimension\")\n",
-    "print(\"   Enables connection to Dense layers\")\n",
-    "print(\"📈 Progress: Convolution operation ✓, Conv2D layer ✓, Flatten ✓\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5fa61ae7",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 4: Comprehensive Test - Complete CNN Pipeline\n",
-    "\n",
-    "### Real-World CNN Applications\n",
-    "Let's test our CNN components in realistic scenarios:\n",
-    "\n",
-    "#### **Image Classification Pipeline**\n",
-    "```\n",
-    "# The standard CNN pattern\n",
-    "Conv2D → ReLU → Flatten → Dense → Output\n",
-    "```\n",
-    "\n",
-    "#### **Multi-layer CNN**\n",
-    "```\n",
-    "# Deeper pattern for complex features\n",
-    "Conv2D → ReLU → Conv2D → ReLU → Flatten → Dense → Output\n",
-    "```\n",
-    "\n",
-    "#### **Feature Extraction**\n",
-    "```\n",
-    "# Extract spatial features then classify\n",
-    "image → CNN features → dense classifier → predictions\n",
-    "```\n",
-    "\n",
-    "This comprehensive test ensures our CNN components work together for real computer vision applications!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "5b92503c",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-comprehensive",
-     "locked": true,
-     "points": 15,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Comprehensive test - complete CNN applications\n",
-    "print(\"🔬 Comprehensive Test: Complete CNN Applications...\")\n",
-    "\n",
-    "try:\n",
-    "    # Test 1: Simple CNN Pipeline\n",
-    "    print(\"\\n1. Simple CNN Pipeline Test:\")\n",
-    "    \n",
-    "    # Create pipeline: Conv2D → ReLU → Flatten → Dense\n",
-    "    conv = Conv2D(kernel_size=(2, 2))\n",
-    "    relu = ReLU()\n",
-    "    dense = Dense(input_size=4, output_size=3)\n",
-    "    \n",
-    "    # Input image\n",
-    "    image = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n",
-    "    \n",
-    "    # Forward pass\n",
-    "    features = conv(image)          # (3,3) → (2,2)\n",
-    "    activated = relu(features)      # (2,2) → (2,2)\n",
-    "    flattened = flatten(activated)  # (2,2) → (1,4)\n",
-    "    output = dense(flattened)       # (1,4) → (1,3)\n",
-    "    \n",
-    "    assert features.shape == (2, 2), f\"Conv output shape wrong: {features.shape}\"\n",
-    "    assert activated.shape == (2, 2), f\"ReLU output shape wrong: {activated.shape}\"\n",
-    "    assert flattened.shape == (1, 4), f\"Flatten output shape wrong: {flattened.shape}\"\n",
-    "    assert output.shape == (1, 3), f\"Dense output shape wrong: {output.shape}\"\n",
-    "    \n",
-    "    print(\"✅ Simple CNN pipeline works correctly\")\n",
-    "    \n",
-    "    # Test 2: Multi-layer CNN\n",
-    "    print(\"\\n2. Multi-layer CNN Test:\")\n",
-    "    \n",
-    "    # Create deeper pipeline: Conv2D → ReLU → Conv2D → ReLU → Flatten → Dense\n",
-    "    conv1 = Conv2D(kernel_size=(2, 2))\n",
-    "    relu1 = ReLU()\n",
-    "    conv2 = Conv2D(kernel_size=(2, 2))\n",
-    "    relu2 = ReLU()\n",
-    "    dense_multi = Dense(input_size=9, output_size=2)\n",
-    "    \n",
-    "    # Larger input for multi-layer processing\n",
-    "    large_image = Tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]])\n",
-    "    \n",
-    "    # Forward pass\n",
-    "    h1 = conv1(large_image)  # (5,5) → (4,4)\n",
-    "    h2 = relu1(h1)           # (4,4) → (4,4)\n",
-    "    h3 = conv2(h2)           # (4,4) → (3,3)\n",
-    "    h4 = relu2(h3)           # (3,3) → (3,3)\n",
-    "    h5 = flatten(h4)         # (3,3) → (1,9)\n",
-    "    output_multi = dense_multi(h5)  # (1,9) → (1,2)\n",
-    "    \n",
-    "    assert h1.shape == (4, 4), f\"Conv1 output wrong: {h1.shape}\"\n",
-    "    assert h3.shape == (3, 3), f\"Conv2 output wrong: {h3.shape}\"\n",
-    "    assert h5.shape == (1, 9), f\"Flatten output wrong: {h5.shape}\"\n",
-    "    assert output_multi.shape == (1, 2), f\"Final output wrong: {output_multi.shape}\"\n",
-    "    \n",
-    "    print(\"✅ Multi-layer CNN works correctly\")\n",
-    "    \n",
-    "    # Test 3: Image Classification Scenario\n",
-    "    print(\"\\n3. Image Classification Test:\")\n",
-    "    \n",
-    "    # Simulate digit classification with 8x8 image\n",
-    "    digit_image = Tensor([[1, 0, 0, 1, 1, 0, 0, 1],\n",
-    "                     [0, 1, 0, 1, 1, 0, 1, 0],\n",
-    "                     [0, 0, 1, 1, 1, 1, 0, 0],\n",
-    "                     [1, 1, 1, 0, 0, 1, 1, 1],\n",
-    "                     [1, 0, 0, 1, 1, 0, 0, 1],\n",
-    "                     [0, 1, 1, 0, 0, 1, 1, 0],\n",
-    "                     [0, 0, 1, 1, 1, 1, 0, 0],\n",
-    "                     [1, 1, 0, 0, 0, 0, 1, 1]])\n",
-    "    \n",
-    "    # CNN for digit classification\n",
-    "    feature_extractor = Conv2D(kernel_size=(3, 3))  # (8,8) → (6,6)\n",
-    "    activation = ReLU()\n",
-    "    classifier = Dense(input_size=36, output_size=10)  # 10 digit classes\n",
-    "    \n",
-    "    # Forward pass\n",
-    "    features = feature_extractor(digit_image)\n",
-    "    activated_features = activation(features)\n",
-    "    feature_vector = flatten(activated_features)\n",
-    "    digit_scores = classifier(feature_vector)\n",
-    "    \n",
-    "    assert features.shape == (6, 6), f\"Feature extraction shape wrong: {features.shape}\"\n",
-    "    assert feature_vector.shape == (1, 36), f\"Feature vector shape wrong: {feature_vector.shape}\"\n",
-    "    assert digit_scores.shape == (1, 10), f\"Digit scores shape wrong: {digit_scores.shape}\"\n",
-    "    \n",
-    "    print(\"✅ Image classification scenario works correctly\")\n",
-    "    \n",
-    "    # Test 4: Feature Extraction and Composition\n",
-    "    print(\"\\n4. Feature Extraction Test:\")\n",
-    "    \n",
-    "    # Create modular feature extractor\n",
-    "    feature_conv = Conv2D(kernel_size=(2, 2))\n",
-    "    feature_activation = ReLU()\n",
-    "    \n",
-    "    # Create classifier head\n",
-    "    classifier_head = Dense(input_size=4, output_size=3)\n",
-    "    \n",
-    "    # Test composition\n",
-    "    test_image = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n",
-    "    \n",
-    "    # Extract features\n",
-    "    extracted_features = feature_conv(test_image)\n",
-    "    activated_features = feature_activation(extracted_features)\n",
-    "    feature_representation = flatten(activated_features)\n",
-    "    \n",
-    "    # Classify\n",
-    "    predictions = classifier_head(feature_representation)\n",
-    "    \n",
-    "    assert extracted_features.shape == (2, 2), f\"Feature extraction wrong: {extracted_features.shape}\"\n",
-    "    assert feature_representation.shape == (1, 4), f\"Feature representation wrong: {feature_representation.shape}\"\n",
-    "    assert predictions.shape == (1, 3), f\"Predictions wrong: {predictions.shape}\"\n",
-    "    \n",
-    "    print(\"✅ Feature extraction and composition works correctly\")\n",
-    "    \n",
-    "    print(\"\\n🎉 Comprehensive test passed! Your CNN components work correctly for:\")\n",
-    "    print(\"  • Image classification pipelines\")\n",
-    "    print(\"  • Multi-layer feature extraction\")\n",
-    "    print(\"  • Spatial pattern recognition\")\n",
-    "    print(\"  • End-to-end CNN workflows\")\n",
-    "    print(\"📈 Progress: Complete CNN architecture ready for computer vision!\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ Comprehensive test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "print(\"📈 Final Progress: Complete CNN system ready for computer vision!\")\n",
-    "\n",
-    "def test_convolution_operation():\n",
-    "    \"\"\"Test convolution operation implementation comprehensively.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Convolution Operation...\")\n",
-    "    \n",
-    "    # Test basic convolution\n",
-    "    input_data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n",
-    "    kernel = np.array([[1, 0], [0, 1]])\n",
-    "    result = conv2d_naive(input_data, kernel)\n",
-    "    \n",
-    "    assert result.shape == (2, 2), \"Convolution should produce correct output shape\"\n",
-    "    expected = np.array([[6, 8], [12, 14]])\n",
-    "    assert np.array_equal(result, expected), \"Convolution should produce correct values\"\n",
-    "    \n",
-    "    print(\"✅ Convolution operation works correctly\")\n",
-    "\n",
-    "def test_conv2d_layer():\n",
-    "    \"\"\"Test Conv2D layer implementation comprehensively.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Conv2D Layer...\")\n",
-    "    \n",
-    "    # Test Conv2D layer\n",
-    "    conv = Conv2D(kernel_size=(3, 3))\n",
-    "    input_tensor = Tensor(np.random.randn(6, 6))\n",
-    "    output = conv(input_tensor)\n",
-    "    \n",
-    "    assert output.shape == (4, 4), \"Conv2D should produce correct output shape\"\n",
-    "    assert hasattr(conv, 'kernel'), \"Conv2D should have kernel attribute\"\n",
-    "    assert conv.kernel.shape == (3, 3), \"Kernel should have correct shape\"\n",
-    "    \n",
-    "    print(\"✅ Conv2D layer works correctly\")\n",
-    "\n",
-    "def test_flatten_function():\n",
-    "    \"\"\"Test flatten function implementation comprehensively.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Flatten Function...\")\n",
-    "    \n",
-    "    # Test flatten function\n",
-    "    input_2d = Tensor([[1, 2], [3, 4]])\n",
-    "    flattened = flatten(input_2d)\n",
-    "    \n",
-    "    assert flattened.shape == (1, 4), \"Flatten should produce output with batch dimension\"\n",
-    "    expected = np.array([[1, 2, 3, 4]])\n",
-    "    assert np.array_equal(flattened.data, expected), \"Flatten should preserve values\"\n",
-    "    \n",
-    "    print(\"✅ Flatten function works correctly\")\n",
-    "\n",
-    "# CNN pipeline integration test moved to tests/integration/test_cnn_pipeline.py"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0d45209f",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧪 Module Testing\n",
-    "\n",
-    "Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
-    "\n",
-    "**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "30104333",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "standardized-testing",
-     "locked": true,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# =============================================================================\n",
-    "# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
-    "# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
-    "# =============================================================================\n",
-    "\n",
-    "if __name__ == \"__main__\":\n",
-    "    from tito.tools.testing import run_module_tests_auto\n",
-    "    \n",
-    "    # Automatically discover and run all tests in this module\n",
-    "    success = run_module_tests_auto(\"CNN\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e346137e",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🎯 Module Summary\n",
-    "\n",
-    "Congratulations! You've successfully implemented the core components of convolutional neural networks:\n",
-    "\n",
-    "### What You've Accomplished\n",
-    "✅ **Convolution Operation**: Implemented the sliding window mechanism from scratch  \n",
-    "✅ **Conv2D Layer**: Built learnable convolutional layers with random initialization  \n",
-    "✅ **Flatten Function**: Created the bridge between convolutional and dense layers  \n",
-    "✅ **CNN Pipelines**: Composed complete systems for image processing  \n",
-    "✅ **Real Applications**: Tested on image classification and feature extraction\n",
-    "\n",
-    "### Key Concepts You've Learned\n",
-    "- **Convolution as pattern matching**: Kernels detect specific features\n",
-    "- **Sliding window mechanism**: How convolution processes spatial data\n",
-    "- **Parameter sharing**: Same kernel applied across the entire image\n",
-    "- **Spatial hierarchy**: Multiple layers build complex features\n",
-    "- **CNN architecture**: Conv2D → Activation → Flatten → Dense pattern\n",
-    "\n",
-    "### Mathematical Foundations\n",
-    "- **Convolution operation**: dot product of kernel and image patches\n",
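-    "- **Output size calculation**: (input_size - kernel_size + 1) per spatial dimension, for stride 1 and no padding\n",
-    "- **Translation equivariance**: The same kernel detects a pattern wherever it appears; shifting the input shifts the feature map\n",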
-    "- **Feature maps**: Spatial representations of detected patterns\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Image classification**: Object recognition, medical imaging\n",
-    "- **Computer vision**: Face detection, autonomous driving\n",
-    "- **Pattern recognition**: Texture analysis, edge detection\n",
-    "- **Feature extraction**: Transfer learning, representation learning\n",
-    "\n",
-    "### CNN Architecture Insights\n",
-    "- **Kernel size**: 3×3 most common, balances locality and capacity\n",
-    "- **Stacking layers**: Builds hierarchical feature representations\n",
-    "- **Spatial reduction**: Without padding, each convolution shrinks every spatial dimension by kernel_size - 1\n",
-    "- **Channel progression**: Typically increase channels while reducing spatial size\n",
-    "\n",
-    "### Performance Characteristics\n",
-    "- **Parameter efficiency**: Dramatic reduction vs. fully connected\n",
-    "- **Translation equivariance**: Features follow an object as it moves; invariance to location needs pooling or similar on top\n",
-    "- **Computational efficiency**: Parallel processing of spatial regions\n",
-    "- **Memory considerations**: Feature maps require storage during forward pass\n",
-    "\n",
-    "### Next Steps\n",
-    "1. **Export your code**: Use NBDev to export to the `tinytorch` package\n",
-    "2. **Test your implementation**: Run the complete test suite\n",
-    "3. **Build CNN architectures**: \n",
-    "   ```python\n",
-    "   from tinytorch.core.cnn import Conv2D, flatten\n",
-    "   from tinytorch.core.layers import Dense\n",
-    "   from tinytorch.core.activations import ReLU\n",
-    "   \n",
-    "   # Create CNN\n",
-    "   conv = Conv2D(kernel_size=(3, 3))\n",
-    "   relu = ReLU()\n",
-    "   dense = Dense(input_size=36, output_size=10)\n",
-    "   \n",
-    "   # Process image (assumes `image` is an 8x8 Tensor: the 3x3 conv gives 6x6 = 36 features)\n",
-    "   features = relu(conv(image))\n",
-    "   predictions = dense(flatten(features))\n",
-    "   ```\n",
-    "4. **Explore advanced CNNs**: Pooling, multiple channels, modern architectures!\n",
-    "\n",
-    "**Ready for the next challenge?** Let's build data loaders to handle real datasets efficiently!"
-   ]
-  }
- ],
- "metadata": {
-  "jupytext": {
-   "main_language": "python"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/modules/source/07_attention/attention_dev.ipynb b/modules/source/07_attention/attention_dev.ipynb
deleted file mode 100644
index 89e547fd..00000000
--- a/modules/source/07_attention/attention_dev.ipynb
+++ /dev/null
@@ -1,1259 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "a2085540",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "# Attention - The Foundation of Modern AI\n",
-    "\n",
-    "Welcome to the Attention module! This is where you'll implement the revolutionary mechanism that powers ChatGPT, BERT, GPT-4, and virtually all state-of-the-art AI systems.\n",
-    "\n",
-    "## Learning Goals\n",
-    "- Understand attention as dynamic pattern matching with Query, Key, Value projections\n",
-    "- Implement scaled dot-product attention from mathematical foundations\n",
-    "- Master the attention formula that powers all transformer models\n",
-    "- Create masking utilities for different attention patterns\n",
-    "- Build the foundation for understanding modern AI architectures\n",
-    "\n",
-    "## Build → Use → Understand\n",
-    "1. **Build**: Implement the core attention mechanism from scratch using mathematical principles\n",
-    "2. **Use**: Apply attention to sequence tasks and visualize attention patterns\n",
-    "3. **Understand**: How attention revolutionized AI by enabling global context modeling\n",
-    "\n",
-    "## What You'll Learn\n",
-    "By the end of this module, you'll understand:\n",
-    "- How attention enables dynamic focus on relevant input parts\n",
-    "- The mathematical foundation behind all transformer models\n",
-    "- Why attention is more powerful than fixed convolution kernels\n",
-    "- How masking enables different attention patterns (causal, padding)\n",
-    "- The building block that powers ChatGPT, BERT, and modern AI"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "d6f68c38",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "attention-imports",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| default_exp core.attention\n",
-    "\n",
-    "#| export\n",
-    "import numpy as np\n",
-    "import math\n",
-    "import sys\n",
-    "import os\n",
-    "from typing import List, Union, Optional, Tuple\n",
-    "import matplotlib.pyplot as plt\n",
-    "\n",
-    "# Import our building blocks - try package first, then local modules\n",
-    "try:\n",
-    "    from tinytorch.core.tensor import Tensor\n",
-    "except ImportError:\n",
-    "    # For development, import from local modules\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_tensor'))\n",
-    "    from tensor_dev import Tensor"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f4eab632",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "attention-setup",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| hide\n",
-    "#| export\n",
-    "def _should_show_plots():\n",
-    "    \"\"\"Check if we should show plots (disable during testing)\"\"\"\n",
-    "    # Check multiple conditions that indicate we're in test mode\n",
-    "    is_pytest = (\n",
-    "        'pytest' in sys.modules or\n",
-    "        'test' in sys.argv or\n",
-    "        os.environ.get('PYTEST_CURRENT_TEST') is not None or\n",
-    "        any('test' in arg for arg in sys.argv) or\n",
-    "        any('pytest' in arg for arg in sys.argv)\n",
-    "    )\n",
-    "    \n",
-    "    # Show plots in development mode (when not in test mode)\n",
-    "    return not is_pytest"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e47a3e00",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "attention-welcome",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "print(\"🔥 TinyTorch Attention Module\")\n",
-    "print(f\"NumPy version: {np.__version__}\")\n",
-    "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
-    "print(\"Ready to build attention mechanisms that power modern AI!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0e97deab",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 📦 Where This Code Lives in the Final Package\n",
-    "\n",
-    "**Learning Side:** You work in `modules/source/07_attention/attention_dev.py`  \n",
-    "**Building Side:** Code exports to `tinytorch.core.attention`\n",
-    "\n",
-    "```python\n",
-    "# Final package structure:\n",
-    "from tinytorch.core.attention import (\n",
-    "    scaled_dot_product_attention,  # Core attention function\n",
-    "    SelfAttention,                 # Self-attention wrapper\n",
-    "    create_causal_mask,           # Masking utilities\n",
-    "    create_padding_mask\n",
-    ")\n",
-    "from tinytorch.core.tensor import Tensor  # Foundation\n",
-    "```\n",
-    "\n",
-    "**Why this matters:**\n",
-    "- **Learning:** Focused module for deep understanding of core attention\n",
-    "- **Production:** Proper organization like PyTorch's attention functions\n",
-    "- **Consistency:** All attention mechanisms live together in `core.attention`\n",
-    "- **Foundation:** Building block for future transformer modules"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "c5cb4273",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 1: Understanding Attention - The Revolutionary Mechanism\n",
-    "\n",
-    "### What is Attention?\n",
-    "**Attention** is a mechanism that allows models to dynamically focus on relevant parts of the input. It's like having a spotlight that can shine on different parts of a sequence based on what's most important for the current task.\n",
-    "\n",
-    "### The Fundamental Insight: Query, Key, Value\n",
-    "Attention works through three projections:\n",
-    "- **Query (Q)**: \"What am I looking for?\"\n",
-    "- **Key (K)**: \"What information is available?\"\n",
-    "- **Value (V)**: \"What is the actual content?\"\n",
-    "\n",
-    "### Real-World Analogy: Library Search\n",
-    "Imagine searching in a library:\n",
-    "```\n",
-    "Query: \"machine learning books\"     ← What you're looking for\n",
-    "Keys: [\"AI\", \"ML\", \"physics\", ...] ← Book category labels  \n",
-    "Values: [book1, book2, book3, ...]  ← Actual book contents\n",
-    "\n",
-    "Attention: Look at all keys, find matches with query, \n",
-    "          return weighted combination of corresponding values\n",
-    "```\n",
-    "\n",
-    "### The Attention Formula\n",
-    "```\n",
-    "Attention(Q,K,V) = softmax(QK^T/√d_k)V\n",
-    "```\n",
-    "\n",
-    "**Step by step:**\n",
-    "1. **Compute scores**: `QK^T` measures similarity between queries and keys\n",
-    "2. **Scale**: Divide by `√d_k` to prevent extremely large values\n",
-    "3. **Normalize**: `softmax` converts scores to probabilities\n",
-    "4. **Combine**: Weight the values by attention probabilities\n",
-    "\n",
-    "### Why This Is Revolutionary\n",
-    "- **Dynamic weights**: Unlike fixed convolution kernels, attention adapts to input\n",
-    "- **Global connectivity**: Any position can attend to any other position directly\n",
-    "- **Interpretability**: Attention weights show what the model focuses on\n",
-    "- **Scalability**: Works for sequences of varying lengths\n",
-    "\n",
-    "### Attention vs Convolution\n",
-    "| Aspect | Convolution | Attention |\n",
-    "|--------|-------------|-----------|\n",
-    "| **Receptive field** | Local, grows with depth | Global from layer 1 |\n",
-    "| **Computation** | O(n·k): linear in input size n for kernel size k | O(n²) in sequence length n |\n",
-    "| **Weights** | Fixed learned kernels | Dynamic input-dependent |\n",
-    "| **Best for** | Spatial data (images) | Sequential data (text) |\n",
-    "\n",
-    "Let's implement this step by step!"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "22047a2e",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 2: Implementing Scaled Dot-Product Attention\n",
-    "\n",
-    "### The Core Attention Operation\n",
-    "This is the mathematical heart of all modern AI systems. Every transformer model (GPT, BERT, etc.) uses this exact operation.\n",
-    "\n",
-    "### Mathematical Foundation\n",
-    "```\n",
-    "scores = QK^T / √d_k\n",
-    "attention_weights = softmax(scores)\n",
-    "output = attention_weights @ V\n",
-    "```\n",
-    "\n",
-    "### Why Scale by √d_k?\n",
-    "- **Prevents saturation**: Large dot products → extreme softmax values → vanishing gradients\n",
-    "- **Stable training**: Keeps attention weights in a reasonable range\n",
-    "- **Mathematical insight**: For independent, zero-mean, unit-variance components, each dot product has variance d_k; dividing by √d_k brings it back to 1\n",
-    "\n",
-    "Let's build the fundamental attention function!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "64b0186d",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "scaled-dot-product-attention",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def scaled_dot_product_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray, \n",
-    "                                mask: Optional[np.ndarray] = None) -> Tuple[np.ndarray, np.ndarray]:\n",
-    "    \"\"\"\n",
-    "    Scaled Dot-Product Attention - The foundation of all transformer models.\n",
-    "    \n",
-    "    This is the exact mechanism used in GPT, BERT, and all modern language models.\n",
-    "    \n",
-    "    TODO: Implement the core attention mechanism.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Get d_k (dimension of keys) from Q.shape[-1]\n",
-    "    2. Compute attention scores: Q @ K^T (matrix multiplication)\n",
-    "    3. Scale by √d_k: scores / sqrt(d_k)\n",
-    "    4. Apply mask if provided: set masked positions to -1e9\n",
-    "    5. Apply softmax to get attention weights (probabilities)\n",
-    "    6. Apply attention weights to values: weights @ V\n",
-    "    7. Return (output, attention_weights)\n",
-    "    \n",
-    "    MATHEMATICAL OPERATION:\n",
-    "        Attention(Q,K,V) = softmax(QK^T/√d_k)V\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Use np.matmul() for matrix multiplication\n",
-    "    - Use np.swapaxes(K, -2, -1) to transpose last two dimensions\n",
-    "    - Use math.sqrt() for square root\n",
-    "    - Use np.where() for masking: np.where(mask == 0, -1e9, scores)\n",
-    "    - Implement softmax manually: exp(x) / sum(exp(x))\n",
-    "    - Use keepdims=True for broadcasting\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - This exact function powers ChatGPT, BERT, GPT-4\n",
-    "    - The scaling keeps softmax out of saturation, where gradients vanish\n",
-    "    - Masking enables causal (GPT) and bidirectional (BERT) models\n",
-    "    - Attention weights are interpretable - you can visualize them!\n",
-    "    \n",
-    "    Args:\n",
-    "        Q: Query matrix of shape (..., seq_len_q, d_k)\n",
-    "        K: Key matrix of shape (..., seq_len_k, d_k)  \n",
-    "        V: Value matrix of shape (..., seq_len_k, d_v), one value row per key\n",
-    "        mask: Optional mask of shape (..., seq_len_q, seq_len_k)\n",
-    "    \n",
-    "    Returns:\n",
-    "        output: Attention output (..., seq_len_q, d_v)\n",
-    "        attention_weights: Attention probabilities (..., seq_len_q, seq_len_k)\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Get the dimension for scaling\n",
-    "    d_k = Q.shape[-1]\n",
-    "    \n",
-    "    # Step 1: Compute attention scores (QK^T)\n",
-    "    # This measures similarity between each query and each key\n",
-    "    scores = np.matmul(Q, np.swapaxes(K, -2, -1))  # (..., seq_len_q, seq_len_k)\n",
-    "    \n",
-    "    # Step 2: Scale by √d_k so large dot products don't saturate softmax\n",
-    "    scores = scores / math.sqrt(d_k)\n",
-    "    \n",
-    "    # Step 3: Apply mask if provided (for padding or causality)\n",
-    "    if mask is not None:\n",
-    "        # Replace masked positions with large negative values\n",
-    "        # This makes softmax output ~0 for these positions\n",
-    "        scores = np.where(mask == 0, -1e9, scores)\n",
-    "    \n",
-    "    # Step 4: Apply softmax to get attention probabilities\n",
-    "    # Each row sums to 1, representing where to focus attention\n",
-    "    # Using numerically stable softmax\n",
-    "    scores_max = np.max(scores, axis=-1, keepdims=True)\n",
-    "    scores_exp = np.exp(scores - scores_max)\n",
-    "    attention_weights = scores_exp / np.sum(scores_exp, axis=-1, keepdims=True)\n",
-    "    \n",
-    "    # Step 5: Apply attention weights to values\n",
-    "    # This gives us the weighted combination of values\n",
-    "    output = np.matmul(attention_weights, V)  # (..., seq_len_q, d_v)\n",
-    "    \n",
-    "    return output, attention_weights\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "7b80a093",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Test Your Attention Implementation\n",
-    "\n",
-    "Once you implement the `scaled_dot_product_attention` function above, run this cell to test it:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "4d25f9f4",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-attention-immediate",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_scaled_dot_product_attention():\n",
-    "    \"\"\"Test scaled dot-product attention implementation\"\"\"\n",
-    "    print(\"🔬 Unit Test: Scaled Dot-Product Attention...\")\n",
-    "\n",
-    "    # Create simple test data\n",
-    "    seq_len, d_model = 4, 6\n",
-    "    np.random.seed(42)\n",
-    "\n",
-    "    # Create Q, K, V matrices\n",
-    "    Q = np.random.randn(seq_len, d_model) * 0.1\n",
-    "    K = np.random.randn(seq_len, d_model) * 0.1  \n",
-    "    V = np.random.randn(seq_len, d_model) * 0.1\n",
-    "\n",
-    "    print(f\"📊 Input shapes: Q{Q.shape}, K{K.shape}, V{V.shape}\")\n",
-    "\n",
-    "    # Test attention\n",
-    "    output, weights = scaled_dot_product_attention(Q, K, V)\n",
-    "\n",
-    "    print(f\"📊 Output shapes: output{output.shape}, weights{weights.shape}\")\n",
-    "\n",
-    "    # Verify properties\n",
-    "    weights_sum = np.sum(weights, axis=-1)\n",
-    "    assert np.allclose(weights_sum, 1.0), f\"Attention weights should sum to 1, got {weights_sum}\"\n",
-    "    assert output.shape == (seq_len, d_model), f\"Output shape should be {(seq_len, d_model)}, got {output.shape}\"\n",
-    "    assert np.all(weights >= 0), \"All attention weights should be non-negative\"\n",
-    "\n",
-    "    # Test with mask\n",
-    "    mask = np.array([\n",
-    "        [1, 1, 0, 0],\n",
-    "        [1, 1, 1, 0], \n",
-    "        [1, 1, 1, 1],\n",
-    "        [1, 1, 1, 1]\n",
-    "    ])\n",
-    "    output_masked, weights_masked = scaled_dot_product_attention(Q, K, V, mask)\n",
-    "\n",
-    "    # Check that masked positions have near-zero attention\n",
-    "    masked_positions = (mask == 0)\n",
-    "    masked_weights = weights_masked[masked_positions]\n",
-    "    assert np.all(masked_weights < 1e-6), \"Masked positions should have near-zero weights\"\n",
-    "\n",
-    "    print(\"✅ Attention weights sum to 1: True\")\n",
-    "    print(\"✅ Output has correct shape: True\")\n",
-    "    print(\"✅ All weights are non-negative: True\")\n",
-    "    print(\"✅ Masked positions have near-zero weights: True\")\n",
-    "    print(\"📈 Progress: Scaled Dot-Product Attention ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_scaled_dot_product_attention()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "4fdf778d",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 3: Self-Attention - The Most Common Case\n",
-    "\n",
-    "### What is Self-Attention?\n",
-    "**Self-Attention** is the most common use of attention where Q, K, and V all come from the same input sequence. This is what enables models like GPT to understand relationships between words in a sentence.\n",
-    "\n",
-    "### Why Self-Attention Matters\n",
-    "- **Context understanding**: Each word can attend to every other word\n",
-    "- **Long-range dependencies**: Connect distant related concepts\n",
-    "- **Parallel processing**: Unlike RNNs, all positions computed simultaneously\n",
-    "- **Foundation of GPT**: How language models understand context\n",
-    "\n",
-    "Let's create a convenient wrapper for self-attention!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ba64fa66",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "self-attention",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class SelfAttention:\n",
-    "    \"\"\"\n",
-    "    Self-Attention wrapper - Convenience class for self-attention where Q=K=V.\n",
-    "    \n",
-    "    This is the most common use case in transformer models where each position\n",
-    "    attends to all positions in the same sequence.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, d_model: int):\n",
-    "        \"\"\"\n",
-    "        Initialize Self-Attention.\n",
-    "        \n",
-    "        TODO: Store the model dimension for this self-attention layer.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Store d_model as an instance variable (self.d_model)\n",
-    "        2. Print initialization message for debugging\n",
-    "        \n",
-    "        EXAMPLE USAGE:\n",
-    "        ```python\n",
-    "        self_attn = SelfAttention(d_model=64)\n",
-    "        output, weights = self_attn(input_sequence)\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Simply store d_model parameter: self.d_model = d_model\n",
-    "        - Print message: print(f\"🔧 SelfAttention: d_model={d_model}\")\n",
-    "        \n",
-    "        LEARNING CONNECTIONS:\n",
-    "        - This is like nn.MultiheadAttention in PyTorch (but simpler: a single head with no learned projections)\n",
-    "        - Used in every transformer layer for self-attention\n",
-    "        - Foundation for understanding GPT, BERT architectures\n",
-    "        \n",
-    "        Args:\n",
-    "            d_model: Model dimension\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        self.d_model = d_model\n",
-    "        print(f\"🔧 SelfAttention: d_model={d_model}\")\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def forward(self, x: np.ndarray, mask: Optional[np.ndarray] = None) -> Tuple[np.ndarray, np.ndarray]:\n",
-    "        \"\"\"\n",
-    "        Forward pass of self-attention.\n",
-    "        \n",
-    "        TODO: Apply self-attention where Q=K=V=x.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Call scaled_dot_product_attention with Q=K=V=x\n",
-    "        2. Pass the mask parameter through\n",
-    "        3. Return the output and attention weights\n",
-    "        \n",
-    "        EXAMPLE USAGE:\n",
-    "        ```python\n",
-    "        x = np.random.randn(seq_len, d_model)  # Input sequence\n",
-    "        output, weights = self_attn.forward(x)\n",
-    "        # weights[i,j] = how much position i attends to position j\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use the function you implemented above\n",
-    "        - Self-attention means: Q = K = V = x\n",
-    "        - Return: scaled_dot_product_attention(x, x, x, mask)\n",
-    "        \n",
-    "        LEARNING CONNECTIONS:\n",
-    "        - This is how transformers process sequences\n",
-    "        - Each position can attend to any other position\n",
-    "        - Enables understanding of long-range dependencies\n",
-    "        \n",
-    "        Args:\n",
-    "            x: Input tensor (..., seq_len, d_model)\n",
-    "            mask: Optional attention mask\n",
-    "            \n",
-    "        Returns:\n",
-    "            output: Self-attention output (..., seq_len, d_model)\n",
-    "            attention_weights: Attention weights\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Self-attention: Q = K = V = x\n",
-    "        return scaled_dot_product_attention(x, x, x, mask)\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def __call__(self, x: np.ndarray, mask: Optional[np.ndarray] = None) -> Tuple[np.ndarray, np.ndarray]:\n",
-    "        \"\"\"Make the class callable.\"\"\"\n",
-    "        return self.forward(x, mask)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "77435888",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Test Your Self-Attention Implementation\n",
-    "\n",
-    "Once you implement the SelfAttention class above, run this cell to test it:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "decc3f0c",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-self-attention-immediate",
-     "locked": true,
-     "points": 5,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_self_attention():\n",
-    "    \"\"\"Test self-attention wrapper\"\"\"\n",
-    "    print(\"🔬 Unit Test: Self-Attention...\")\n",
-    "\n",
-    "    # Test parameters\n",
-    "    d_model = 32\n",
-    "    seq_len = 8\n",
-    "    np.random.seed(42)\n",
-    "\n",
-    "    # Create test data (like word embeddings)\n",
-    "    x = np.random.randn(seq_len, d_model) * 0.1\n",
-    "\n",
-    "    print(f\"📊 Test setup: d_model={d_model}, seq_len={seq_len}\")\n",
-    "\n",
-    "    # Create self-attention\n",
-    "    self_attn = SelfAttention(d_model)\n",
-    "\n",
-    "    # Test forward pass\n",
-    "    output, weights = self_attn(x)\n",
-    "\n",
-    "    print(f\"📊 Output shapes: output{output.shape}, weights{weights.shape}\")\n",
-    "\n",
-    "    # Verify properties\n",
-    "    assert output.shape == x.shape, f\"Output shape should match input shape {x.shape}, got {output.shape}\"\n",
-    "    assert weights.shape == (seq_len, seq_len), f\"Attention weights shape should be {(seq_len, seq_len)}, got {weights.shape}\"\n",
-    "    assert np.allclose(np.sum(weights, axis=-1), 1.0), \"Attention weights should sum to 1\"\n",
-    "    assert weights.shape[0] == weights.shape[1], \"Self-attention weights should be square matrix\"\n",
-    "\n",
-    "    print(\"✅ Output shape preserved: True\")\n",
-    "    print(\"✅ Attention weights correct shape: True\")\n",
-    "    print(\"✅ Attention weights sum to 1: True\")\n",
-    "    print(\"✅ Attention weights form a square matrix: True\")\n",
-    "    print(\"📈 Progress: Self-Attention ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_self_attention()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9452d069",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 4: Attention Masking - Controlling Information Flow\n",
-    "\n",
-    "### Why Masking Matters\n",
-    "Masking allows us to control which positions can attend to which other positions:\n",
-    "\n",
-    "1. **Causal Masking**: For autoregressive models (like GPT) - can't see future tokens\n",
-    "2. **Padding Masking**: Ignore padding tokens in variable-length sequences\n",
-    "3. **Custom Masking**: Application-specific attention patterns\n",
-    "\n",
-    "### Types of Masks\n",
-    "- **Causal (Lower Triangular)**: Position i can only attend to positions ≤ i\n",
-    "- **Padding**: Mask out padding tokens so they don't affect attention\n",
-    "- **Bidirectional**: All positions can attend to all positions (like BERT)\n",
-    "\n",
-    "Let's implement these essential masking utilities!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f0859ba6",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "attention-masking",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def create_causal_mask(seq_len: int) -> np.ndarray:\n",
-    "    \"\"\"\n",
-    "    Create a causal (lower triangular) mask for autoregressive models.\n",
-    "    \n",
-    "    Used in models like GPT where each position can only attend to \n",
-    "    previous positions, not future ones.\n",
-    "    \n",
-    "    TODO: Create a lower triangular matrix of ones.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Create a matrix of ones with shape (seq_len, seq_len)\n",
-    "    2. Use np.tril() to keep its lower triangular part\n",
-    "    3. Return the result\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    mask = create_causal_mask(4)\n",
-    "    # mask = [[1, 0, 0, 0],\n",
-    "    #         [1, 1, 0, 0], \n",
-    "    #         [1, 1, 1, 0],\n",
-    "    #         [1, 1, 1, 1]]\n",
-    "    ```\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Use np.ones((seq_len, seq_len)) to create matrix of ones\n",
-    "    - Use np.tril() to get lower triangular part\n",
-    "    - Or combine: np.tril(np.ones((seq_len, seq_len)))\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - Used in GPT for autoregressive generation\n",
-    "    - Prevents looking into the future during training\n",
-    "    - Essential for language modeling tasks\n",
-    "    \n",
-    "    Args:\n",
-    "        seq_len: Sequence length\n",
-    "        \n",
-    "    Returns:\n",
-    "        mask: Causal mask (seq_len, seq_len) with 1s for allowed positions, 0s for blocked\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    return np.tril(np.ones((seq_len, seq_len)))\n",
-    "    ### END SOLUTION\n",
-    "\n",
-    "#| export  \n",
-    "def create_padding_mask(lengths: List[int], max_length: int) -> np.ndarray:\n",
-    "    \"\"\"\n",
-    "    Create padding mask for variable-length sequences.\n",
-    "    \n",
-    "    TODO: Create mask that ignores padding tokens.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Initialize zero array with shape (batch_size, max_length, max_length)\n",
-    "    2. For each sequence in the batch, set valid positions to 1\n",
-    "    3. Valid positions are [:length, :length] for each sequence\n",
-    "    4. Return the mask array\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    lengths = [3, 2, 4]  # Actual sequence lengths\n",
-    "    mask = create_padding_mask(lengths, max_length=4)\n",
-    "    # For sequence 0 (length=3): positions [0,1,2] can attend to [0,1,2]\n",
-    "    # For sequence 1 (length=2): positions [0,1] can attend to [0,1] \n",
-    "    ```\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - batch_size = len(lengths)\n",
-    "    - Use np.zeros((batch_size, max_length, max_length))\n",
-    "    - Loop through lengths: for i, length in enumerate(lengths)\n",
-    "    - Set valid region: mask[i, :length, :length] = 1\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - Used when sequences have different lengths\n",
-    "    - Prevents attention to padding tokens\n",
-    "    - Essential for efficient batch processing\n",
-    "    \n",
-    "    Args:\n",
-    "        lengths: List of actual sequence lengths\n",
-    "        max_length: Maximum sequence length (padded length)\n",
-    "        \n",
-    "    Returns:\n",
-    "        mask: Padding mask (batch_size, max_length, max_length)\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    batch_size = len(lengths)\n",
-    "    mask = np.zeros((batch_size, max_length, max_length))\n",
-    "    \n",
-    "    for i, length in enumerate(lengths):\n",
-    "        mask[i, :length, :length] = 1\n",
-    "    \n",
-    "    return mask\n",
-    "    ### END SOLUTION\n",
-    "\n",
-    "#| export\n",
-    "def create_bidirectional_mask(seq_len: int) -> np.ndarray:\n",
-    "    \"\"\"\n",
-    "    Create a bidirectional mask where all positions can attend to all positions.\n",
-    "    \n",
-    "    Used in models like BERT for bidirectional context understanding.\n",
-    "    \n",
-    "    TODO: Create a matrix of all ones.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Use np.ones() to create matrix of all ones\n",
-    "    2. Shape should be (seq_len, seq_len)\n",
-    "    3. Return the matrix\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    mask = create_bidirectional_mask(3)\n",
-    "    # mask = [[1, 1, 1],\n",
-    "    #         [1, 1, 1],\n",
-    "    #         [1, 1, 1]]\n",
-    "    ```\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Very simple: np.ones((seq_len, seq_len))\n",
-    "    - All positions can attend to all positions\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - Used in BERT for bidirectional understanding\n",
-    "    - Allows looking at past and future context\n",
-    "    - Good for understanding tasks, not generation\n",
-    "    \n",
-    "    Args:\n",
-    "        seq_len: Sequence length\n",
-    "        \n",
-    "    Returns:\n",
-    "        mask: All-ones mask (seq_len, seq_len)\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    return np.ones((seq_len, seq_len))\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "73de5db9",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Test Your Masking Functions\n",
-    "\n",
-    "Once you implement the masking functions above, run this cell to test them:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "a1e4efdf",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-masking-immediate",
-     "locked": true,
-     "points": 5,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_attention_masking():\n",
-    "    \"\"\"Test attention masking utilities\"\"\"\n",
-    "    print(\"🔬 Unit Test: Attention Masking...\")\n",
-    "\n",
-    "    # Test causal mask\n",
-    "    seq_len = 5\n",
-    "    causal_mask = create_causal_mask(seq_len)\n",
-    "\n",
-    "    print(f\"📊 Causal mask for seq_len={seq_len}:\")\n",
-    "    print(causal_mask)\n",
-    "\n",
-    "    # Verify causal mask properties\n",
-    "    assert np.allclose(causal_mask, np.tril(causal_mask)), \"Causal mask should be lower triangular\"\n",
-    "    assert causal_mask.shape == (seq_len, seq_len), f\"Causal mask should have shape {(seq_len, seq_len)}\"\n",
-    "    assert np.all(np.triu(causal_mask, k=1) == 0), \"Causal mask upper triangle should be zeros\"\n",
-    "\n",
-    "    # Test padding mask\n",
-    "    lengths = [5, 3, 4]\n",
-    "    max_length = 5\n",
-    "    padding_mask = create_padding_mask(lengths, max_length)\n",
-    "\n",
-    "    print(f\"📊 Padding mask for lengths {lengths}, max_length={max_length}:\")\n",
-    "    print(\"Mask for sequence 0 (length 5):\")\n",
-    "    print(padding_mask[0])\n",
-    "    print(\"Mask for sequence 1 (length 3):\")\n",
-    "    print(padding_mask[1])\n",
-    "\n",
-    "    # Verify padding mask properties\n",
-    "    assert padding_mask.shape == (3, max_length, max_length), f\"Padding mask should have shape {(3, max_length, max_length)}\"\n",
-    "    assert np.all(padding_mask[0] == 1), \"Full-length sequence should be all ones\"\n",
-    "    assert np.all(padding_mask[1, 3:, :] == 0), \"Short sequence should have zeros in padding area\"\n",
-    "\n",
-    "    # Test bidirectional mask\n",
-    "    bidirectional_mask = create_bidirectional_mask(seq_len)\n",
-    "    assert np.all(bidirectional_mask == 1), \"Bidirectional mask should be all ones\"\n",
-    "    assert bidirectional_mask.shape == (seq_len, seq_len), f\"Bidirectional mask should have shape {(seq_len, seq_len)}\"\n",
-    "\n",
-    "    print(\"✅ Causal mask is lower triangular: True\")\n",
-    "    print(\"✅ Causal mask has correct shape: True\")\n",
-    "    print(\"✅ Causal mask upper triangle is zeros: True\")\n",
-    "    print(\"✅ Padding mask has correct shape: True\")\n",
-    "    print(\"✅ Full-length sequence is all ones: True\")\n",
-    "    print(\"✅ Short sequence has zeros in padding area: True\")\n",
-    "    print(\"✅ Bidirectional mask is all ones: True\")\n",
-    "    print(\"✅ Bidirectional mask has correct shape: True\")\n",
-    "    print(\"📈 Progress: Attention Masking ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_attention_masking()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e0186421",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 5: Complete System Integration Test\n",
-    "\n",
-    "### Bringing It All Together\n",
-    "Let's test all components working together in a realistic scenario similar to how they would be used in actual transformer models."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "884802a7",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-integration-final",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_complete_attention_system():\n",
-    "    \"\"\"Test the complete attention system working together\"\"\"\n",
-    "    print(\"🔬 Unit Test: Complete Attention System Integration...\")\n",
-    "\n",
-    "    # Test parameters\n",
-    "    d_model = 64\n",
-    "    seq_len = 16\n",
-    "    batch_size = 2\n",
-    "    np.random.seed(42)\n",
-    "\n",
-    "    print(f\"📊 Integration test: d_model={d_model}, seq_len={seq_len}, batch_size={batch_size}\")\n",
-    "\n",
-    "    # Step 1: Create input embeddings (simulating word embeddings)\n",
-    "    embeddings = np.random.randn(batch_size, seq_len, d_model) * 0.1\n",
-    "    print(f\"📊 Input embeddings: {embeddings.shape}\")\n",
-    "\n",
-    "    # Step 2: Test basic attention\n",
-    "    output, attention_weights = scaled_dot_product_attention(embeddings, embeddings, embeddings)\n",
-    "    assert output.shape == embeddings.shape, \"Basic attention should preserve shape\"\n",
-    "    print(f\"✅ Basic attention works: {output.shape}\")\n",
-    "\n",
-    "    # Step 3: Test self-attention wrapper\n",
-    "    self_attn = SelfAttention(d_model)\n",
-    "    self_output, self_weights = self_attn(embeddings[0])  # Single batch item\n",
-    "    assert self_output.shape == (seq_len, d_model), \"Self-attention should preserve shape\"\n",
-    "    print(f\"✅ Self-attention output: {self_output.shape}\")\n",
-    "\n",
-    "    # Step 4: Test with causal mask (like GPT)\n",
-    "    causal_mask = create_causal_mask(seq_len)\n",
-    "    causal_output, causal_weights = scaled_dot_product_attention(\n",
-    "        embeddings[0], embeddings[0], embeddings[0], causal_mask\n",
-    "    )\n",
-    "    assert causal_output.shape == (seq_len, d_model), \"Causal attention should preserve shape\"\n",
-    "    print(f\"✅ Causal attention works: {causal_output.shape}\")\n",
-    "\n",
-    "    # Step 5: Test with padding mask (variable lengths)\n",
-    "    lengths = [seq_len, seq_len-3]  # Different sequence lengths\n",
-    "    padding_mask = create_padding_mask(lengths, seq_len)\n",
-    "    padded_output, padded_weights = scaled_dot_product_attention(\n",
-    "        embeddings[0], embeddings[0], embeddings[0], padding_mask[0]\n",
-    "    )\n",
-    "    assert padded_output.shape == (seq_len, d_model), \"Padding attention should preserve shape\"\n",
-    "    print(f\"✅ Padding mask works: {padded_output.shape}\")\n",
-    "\n",
-    "    # Step 6: Verify all outputs have correct properties\n",
-    "    assert np.allclose(np.sum(attention_weights, axis=-1), 1.0), \"All attention weights should sum to 1\"\n",
-    "    assert output.shape == embeddings.shape, \"All outputs should preserve input shape\"\n",
-    "    assert np.all(np.triu(causal_weights, k=1) < 1e-6), \"Causal masking should work\"\n",
-    "\n",
-    "    print(\"✅ All attention weights sum to 1: True\")\n",
-    "    print(\"✅ All outputs preserve input shape: True\")\n",
-    "    print(\"✅ Causal masking works: True\")\n",
-    "    print(\"📈 Progress: Complete Attention System ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_complete_attention_system()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "7fe63253",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🎯 Attention Behavior Analysis\n",
-    "\n",
-    "Let's create a simple example to see what attention patterns emerge and understand the behavior."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "967e42ed",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "attention-analysis",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "print(\"🎯 Attention behavior analysis:\")\n",
-    "\n",
-    "# Create a simple sequence with clear patterns\n",
-    "simple_seq = np.array([\n",
-    "    [1, 0, 0, 0],  # Position 0: [1, 0, 0, 0]\n",
-    "    [0, 1, 0, 0],  # Position 1: [0, 1, 0, 0]  \n",
-    "    [0, 0, 1, 0],  # Position 2: [0, 0, 1, 0]\n",
-    "    [1, 0, 0, 0],  # Position 3: [1, 0, 0, 0] (same as position 0)\n",
-    "])\n",
-    "\n",
-    "print(f\"🎯 Simple test sequence shape: {simple_seq.shape}\")\n",
-    "\n",
-    "# Apply attention\n",
-    "output, weights = scaled_dot_product_attention(simple_seq, simple_seq, simple_seq)\n",
-    "\n",
-    "print(f\"🎯 Attention pattern analysis:\")\n",
-    "print(f\"Position 0 attends most to position: {np.argmax(weights[0])}\")\n",
-    "print(f\"Position 3 attends most to position: {np.argmax(weights[3])}\")\n",
-    "print(f\"✅ Positions with same content should attend to each other!\")\n",
-    "\n",
-    "# Test with causal masking\n",
-    "causal_mask = create_causal_mask(4)\n",
-    "output_causal, weights_causal = scaled_dot_product_attention(simple_seq, simple_seq, simple_seq, causal_mask)\n",
-    "\n",
-    "print(f\"🎯 With causal masking:\")\n",
-    "print(f\"Position 1 can only attend to positions 0-1: {np.sum(weights_causal[1, :2]) > 0.99}\")\n",
-    "\n",
-    "if _should_show_plots():\n",
-    "    plt.figure(figsize=(12, 4))\n",
-    "    \n",
-    "    plt.subplot(1, 3, 1)\n",
-    "    plt.imshow(weights, cmap='Blues')\n",
-    "    plt.title('Full Attention Weights\\n(Darker = Higher Attention)')\n",
-    "    plt.xlabel('Key Position')\n",
-    "    plt.ylabel('Query Position')\n",
-    "    plt.colorbar()\n",
-    "    \n",
-    "    # Add text annotations\n",
-    "    for i in range(4):\n",
-    "        for j in range(4):\n",
-    "            plt.text(j, i, f'{weights[i,j]:.2f}', \n",
-    "                    ha='center', va='center', \n",
-    "                    color='white' if weights[i,j] > 0.5 else 'black')\n",
-    "    \n",
-    "    plt.subplot(1, 3, 2)\n",
-    "    plt.imshow(weights_causal, cmap='Blues')\n",
-    "    plt.title('Causal Attention Weights\\n(Upper triangle masked)')\n",
-    "    plt.xlabel('Key Position')\n",
-    "    plt.ylabel('Query Position')\n",
-    "    plt.colorbar()\n",
-    "    \n",
-    "    plt.subplot(1, 3, 3)\n",
-    "    plt.plot(weights[0], 'o-', label='Position 0 attention')\n",
-    "    plt.plot(weights[3], 's-', label='Position 3 attention')\n",
-    "    plt.xlabel('Attending to Position')\n",
-    "    plt.ylabel('Attention Weight')\n",
-    "    plt.title('Attention Distribution')\n",
-    "    plt.legend()\n",
-    "    plt.grid(True)\n",
-    "    \n",
-    "    plt.tight_layout()\n",
-    "    plt.show()\n",
-    "\n",
-    "print(\"🎯 Attention focuses on similar content!\")\n",
-    "\n",
-    "print(\"\\n\" + \"=\"*50)\n",
-    "print(\"🔥 ATTENTION MODULE COMPLETE!\")\n",
-    "print(\"=\"*50)\n",
-    "print(\"✅ Scaled dot-product attention\")\n",
-    "print(\"✅ Self-attention wrapper\") \n",
-    "print(\"✅ Causal masking\")\n",
-    "print(\"✅ Padding masking\")\n",
-    "print(\"✅ Bidirectional masking\")\n",
-    "print(\"✅ Attention visualization\")\n",
-    "print(\"✅ Complete integration tests\")\n",
-    "print(\"\\nYou now understand the core mechanism powering modern AI! 🚀\")\n",
-    "print(\"Next: Learn how to build complete transformer models using this foundation.\")\n",
-    "\n",
-    "def test_attention_mechanism():\n",
-    "    \"\"\"Test attention mechanism implementation.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Attention Mechanism...\")\n",
-    "    \n",
-    "    # Test basic attention\n",
-    "    Q = np.random.randn(4, 6) * 0.1\n",
-    "    K = np.random.randn(4, 6) * 0.1  \n",
-    "    V = np.random.randn(4, 6) * 0.1\n",
-    "    output, weights = scaled_dot_product_attention(Q, K, V)\n",
-    "    \n",
-    "    assert output.shape == (4, 6), \"Attention should produce correct output shape\"\n",
-    "    assert weights.shape == (4, 4), \"Attention weights should be square matrix\"\n",
-    "    assert np.allclose(np.sum(weights, axis=-1), 1.0), \"Attention weights should sum to 1\"\n",
-    "    \n",
-    "    print(\"✅ Attention mechanism works correctly\")\n",
-    "\n",
-    "def test_self_attention_wrapper():\n",
-    "    \"\"\"Test self-attention wrapper implementation.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Self-Attention Wrapper...\")\n",
-    "    \n",
-    "    # Test self-attention\n",
-    "    self_attn = SelfAttention(d_model=32)\n",
-    "    x = np.random.randn(8, 32) * 0.1\n",
-    "    output, weights = self_attn(x)\n",
-    "    \n",
-    "    assert output.shape == x.shape, \"Self-attention should preserve input shape\"\n",
-    "    assert weights.shape == (8, 8), \"Self-attention weights should be square\"\n",
-    "    assert np.allclose(np.sum(weights, axis=-1), 1.0), \"Weights should sum to 1\"\n",
-    "    \n",
-    "    print(\"✅ Self-attention wrapper works correctly\")\n",
-    "\n",
-    "def test_masking_utilities():\n",
-    "    \"\"\"Test attention masking utilities.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Masking Utilities...\")\n",
-    "    \n",
-    "    # Test causal mask\n",
-    "    causal_mask = create_causal_mask(4)\n",
-    "    assert np.allclose(causal_mask, np.tril(causal_mask)), \"Causal mask should be lower triangular\"\n",
-    "    \n",
-    "    # Test padding mask  \n",
-    "    padding_mask = create_padding_mask([3, 2], 4)\n",
-    "    assert padding_mask.shape == (2, 4, 4), \"Padding mask should have correct shape\"\n",
-    "    \n",
-    "    # Test bidirectional mask\n",
-    "    bidirectional_mask = create_bidirectional_mask(3)\n",
-    "    assert np.all(bidirectional_mask == 1), \"Bidirectional mask should be all ones\"\n",
-    "    \n",
-    "    print(\"✅ Masking utilities work correctly\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "fdf6ec9a",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧪 Module Testing\n",
-    "\n",
-    "Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
-    "\n",
-    "**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "51a48a5e",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "standardized-testing",
-     "locked": true,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# =============================================================================\n",
-    "# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
-    "# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
-    "# =============================================================================\n",
-    "\n",
-    "if __name__ == \"__main__\":\n",
-    "    from tito.tools.testing import run_module_tests_auto\n",
-    "    \n",
-    "    # Automatically discover and run all tests in this module\n",
-    "    success = run_module_tests_auto(\"Attention\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "926deae2",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🎯 Module Summary\n",
-    "\n",
-    "Congratulations! You've successfully implemented the attention mechanism that powers most modern AI systems:\n",
-    "\n",
-    "### What You've Accomplished\n",
-    "✅ **Scaled Dot-Product Attention**: Implemented the mathematical core of all transformer models  \n",
-    "✅ **Self-Attention Wrapper**: Built the mechanism that enables sequence understanding  \n",
-    "✅ **Attention Masking**: Created causal, padding, and bidirectional attention patterns  \n",
-    "✅ **Complete Integration**: Tested all components working together seamlessly  \n",
-    "✅ **Real Applications**: Applied attention to sequence processing and pattern matching\n",
-    "\n",
-    "### Key Concepts You've Learned\n",
-    "- **Attention as dynamic pattern matching**: Query-Key-Value projections enable adaptive focus\n",
-    "- **Mathematical foundation**: Attention(Q,K,V) = softmax(QK^T/√d_k)V is the core operation of every transformer\n",
-    "- **Global connectivity**: Unlike convolution, attention connects all positions directly\n",
-    "- **Interpretability**: Attention weights reveal what the model focuses on\n",
-    "- **Masking mechanisms**: Control information flow for different model architectures\n",
-    "\n",
-    "### Mathematical Foundations\n",
-    "- **Attention formula**: The exact operation used in ChatGPT, BERT, GPT-4\n",
-    "- **Scaling factor**: Dividing by √d_k keeps dot products from growing with dimension, so the softmax does not saturate and gradients stay usable\n",
-    "- **Softmax normalization**: Converts similarity scores to probability distributions\n",
-    "- **Matrix operations**: Efficient parallel computation of attention for all positions at once\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Language models**: ChatGPT, GPT-4, BERT use this exact mechanism\n",
-    "- **Machine translation**: Google Translate's transformer architecture\n",
-    "- **Computer vision**: Vision Transformers (ViTs) for image classification\n",
-    "- **Multimodal AI**: DALL-E, CLIP combining text and image understanding\n",
-    "\n",
-    "### Attention vs. Convolution Insights\n",
-    "- **Receptive field**: Attention is global from layer 1, convolution is local\n",
-    "- **Computation**: Attention is O(n²) in sequence length n; convolution is O(n·k) for kernel size k\n",
-    "- **Weights**: Attention weights are dynamic and input-dependent\n",
-    "- **Best applications**: Attention excels at sequential/relational data\n",
-    "\n",
-    "### Architecture Design Patterns\n",
-    "- **Self-attention**: Most common pattern where Q=K=V=input\n",
-    "- **Causal masking**: Enables autoregressive generation (GPT-style models)\n",
-    "- **Bidirectional**: Allows full context access (BERT-style models)\n",
-    "- **Padding masks**: Handle variable-length sequences efficiently\n",
-    "\n",
-    "### Performance Characteristics\n",
-    "- **Quadratic scaling**: Memory and computation grow with sequence length squared\n",
-    "- **Parallelization**: All positions computed simultaneously (unlike RNNs)\n",
-    "- **Memory efficiency**: The (seq_len × seq_len) attention-weight matrix dominates memory for long sequences\n",
-    "- **Gradient flow**: Direct connections enable training very deep networks\n",
-    "\n",
-    "### Transformer Building Blocks\n",
-    "Your attention implementation is the foundation for:\n",
-    "- **Multi-head attention**: Multiple attention heads in parallel\n",
-    "- **Transformer blocks**: Attention + feedforward + residual connections\n",
-    "- **Positional encoding**: Adding sequence position information\n",
-    "- **Complete transformers**: Full encoder-decoder architectures\n",
-    "\n",
-    "### Next Steps\n",
-    "1. **Export your code**: Use NBDev to export to the `tinytorch` package\n",
-    "2. **Test your implementation**: Run the complete test suite\n",
-    "3. **Build transformer architectures**: \n",
-    "   ```python\n",
-    "   from tinytorch.core.attention import scaled_dot_product_attention, SelfAttention\n",
-    "   from tinytorch.core.attention import create_causal_mask, create_padding_mask\n",
-    "   \n",
-    "   # Create self-attention\n",
-    "   self_attn = SelfAttention(d_model=512)\n",
-    "   \n",
-    "   # Process sequence with causal masking (GPT-style)\n",
-    "   mask = create_causal_mask(seq_len)\n",
-    "   output, weights = self_attn(embeddings, mask)\n",
-    "   \n",
-    "   # Visualize attention patterns\n",
-    "   plt.imshow(weights, cmap='Blues')\n",
-    "   plt.title('Attention Patterns')\n",
-    "   ```\n",
-    "4. **Explore advanced transformers**: Multi-head attention, positional encoding, full transformer blocks!\n",
-    "\n",
-    "### The Revolutionary Impact\n",
-    "You've implemented the mechanism that:\n",
-    "- **Revolutionized NLP**: Enabled the breakthrough performance of ChatGPT, GPT-4, and BERT\n",
-    "- **Transformed computer vision**: Vision Transformers (ViTs) now compete with CNNs\n",
-    "- **Powers modern AI**: Almost every state-of-the-art model uses attention\n",
-    "- **Enables interpretability**: Attention weights show what AI models focus on\n",
-    "\n",
-    "**Ready for the next challenge?** Let's build complete transformer architectures using your attention foundation!"
-   ]
-  }
- ],
- "metadata": {
-  "jupytext": {
-   "main_language": "python"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/modules/source/08_dataloader/dataloader_dev.ipynb b/modules/source/08_dataloader/dataloader_dev.ipynb
deleted file mode 100644
index c5ed7996..00000000
--- a/modules/source/08_dataloader/dataloader_dev.ipynb
+++ /dev/null
@@ -1,1398 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "b88708de",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "# DataLoader - Data Loading and Preprocessing\n",
-    "\n",
-    "Welcome to the DataLoader module! This is where you'll learn how to efficiently load, process, and manage data for machine learning systems.\n",
-    "\n",
-    "## Learning Goals\n",
-    "- Understand data pipelines as the foundation of ML systems\n",
-    "- Implement efficient data loading with memory management and batching\n",
-    "- Build reusable dataset abstractions for different data types\n",
-    "- Master the Dataset and DataLoader pattern used in all ML frameworks\n",
-    "- Learn systems thinking for data engineering and I/O optimization\n",
-    "\n",
-    "## Build → Use → Reflect\n",
-    "1. **Build**: Create dataset classes and data loaders from scratch\n",
-    "2. **Use**: Load real datasets and feed them to neural networks\n",
-    "3. **Reflect**: How data engineering affects system performance and scalability\n",
-    "\n",
-    "## What You'll Learn\n",
-    "By the end of this module, you'll understand:\n",
-    "- The Dataset pattern for consistent data access\n",
-    "- How DataLoaders enable efficient batch processing\n",
-    "- Why batching and shuffling are crucial for ML\n",
-    "- How to handle datasets larger than memory\n",
-    "- The connection between data engineering and model performance"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "8a1a46d2",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "dataloader-imports",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| default_exp core.dataloader\n",
-    "\n",
-    "#| export\n",
-    "import numpy as np\n",
-    "import sys\n",
-    "import os\n",
-    "import pickle\n",
-    "import struct\n",
-    "from typing import List, Tuple, Optional, Union, Iterator\n",
-    "import matplotlib.pyplot as plt\n",
-    "import urllib.request\n",
-    "import tarfile\n",
-    "\n",
-    "# Import our building blocks - try package first, then local modules\n",
-    "try:\n",
-    "    from tinytorch.core.tensor import Tensor\n",
-    "except ImportError:\n",
-    "    # For development, import from local modules\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))\n",
-    "    from tensor_dev import Tensor"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "9fc27557",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "dataloader-setup",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| hide\n",
-    "#| export\n",
-    "def _should_show_plots():\n",
-    "    \"\"\"Check if we should show plots (disable during testing)\"\"\"\n",
-    "    # Check multiple conditions that indicate we're in test mode\n",
-    "    is_pytest = (\n",
-    "        'pytest' in sys.modules or\n",
-    "        'test' in sys.argv or\n",
-    "        os.environ.get('PYTEST_CURRENT_TEST') is not None or\n",
-    "        any('test' in arg for arg in sys.argv) or\n",
-    "        any('pytest' in arg for arg in sys.argv)\n",
-    "    )\n",
-    "    \n",
-    "    # Show plots in development mode (when not in test mode)\n",
-    "    return not is_pytest"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f37cacaf",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "dataloader-welcome",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "print(\"🔥 TinyTorch DataLoader Module\")\n",
-    "print(f\"NumPy version: {np.__version__}\")\n",
-    "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
-    "print(\"Ready to build data pipelines!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "decfa343",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 📦 Where This Code Lives in the Final Package\n",
-    "\n",
-    "**Learning Side:** You work in `modules/source/08_dataloader/dataloader_dev.py`  \n",
-    "**Building Side:** Code exports to `tinytorch.core.dataloader`\n",
-    "\n",
-    "```python\n",
-    "# Final package structure:\n",
-    "from tinytorch.core.dataloader import Dataset, DataLoader  # Data loading utilities!\n",
-    "from tinytorch.core.tensor import Tensor  # Foundation\n",
-    "from tinytorch.core.networks import Sequential  # Models to train\n",
-    "```\n",
-    "\n",
-    "**Why this matters:**\n",
-    "- **Learning:** Focused modules for deep understanding of data pipelines\n",
-    "- **Production:** Proper organization like PyTorch's `torch.utils.data`\n",
-    "- **Consistency:** All data loading utilities live together in `core.dataloader`\n",
-    "- **Integration:** Works seamlessly with tensors and networks"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "daf1136d",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 1: Understanding Data Pipelines\n",
-    "\n",
-    "### What are Data Pipelines?\n",
-    "**Data pipelines** are the systems that efficiently move data from storage to your model. They're the foundation of all machine learning systems.\n",
-    "\n",
-    "### The Data Pipeline Equation\n",
-    "```\n",
-    "Raw Data → Load → Transform → Batch → Model → Predictions\n",
-    "```\n",
-    "\n",
-    "### Why Data Pipelines Matter\n",
-    "- **Performance**: Efficient loading prevents GPU starvation\n",
-    "- **Scalability**: Handle datasets larger than memory\n",
-    "- **Consistency**: Reproducible data processing\n",
-    "- **Flexibility**: Easy to switch between datasets\n",
-    "\n",
-    "### Real-World Challenges\n",
-    "- **Memory constraints**: Datasets often exceed available RAM\n",
-    "- **I/O bottlenecks**: Disk access is much slower than computation\n",
-    "- **Batch processing**: Neural networks need batched data for efficiency\n",
-    "- **Shuffling**: Random order keeps batches representative and stops the model from picking up patterns in the data order\n",
-    "\n",
-    "### Systems Thinking\n",
-    "- **Memory efficiency**: Handle datasets larger than RAM\n",
-    "- **I/O optimization**: Read from disk efficiently\n",
-    "- **Batching strategies**: Trade-offs between memory and speed\n",
-    "- **Caching**: When to cache vs recompute\n",
-    "\n",
-    "### Visual Intuition\n",
-    "```\n",
-    "Raw Files: [image1.jpg, image2.jpg, image3.jpg, ...]\n",
-    "Load: [Tensor(32x32x3), Tensor(32x32x3), Tensor(32x32x3), ...]\n",
-    "Batch: [Tensor(32, 32, 32, 3)]  # 32 images at once\n",
-    "Model: Process batch efficiently\n",
-    "```\n",
-    "\n",
-    "Let's start by building the most fundamental component: **Dataset**."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "1881387d",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 2: Building the Dataset Interface\n",
-    "\n",
-    "### What is a Dataset?\n",
-    "A **Dataset** is an abstract interface that provides consistent access to data. It's the foundation of all data loading systems.\n",
-    "\n",
-    "### Why Abstract Interfaces Matter\n",
-    "- **Consistency**: Same interface for all data types\n",
-    "- **Flexibility**: Easy to switch between datasets\n",
-    "- **Testability**: Easy to create test datasets\n",
-    "- **Extensibility**: Easy to add new data sources\n",
-    "\n",
-    "### The Dataset Pattern\n",
-    "```python\n",
-    "class Dataset:\n",
-    "    def __getitem__(self, index):  # Get single sample\n",
-    "        return data, label\n",
-    "    \n",
-    "    def __len__(self):  # Get dataset size\n",
-    "        return total_samples\n",
-    "```\n",
-    "\n",
-    "### Real-World Usage\n",
-    "- **Computer vision**: ImageNet, CIFAR-10, custom image datasets\n",
-    "- **NLP**: Text datasets, tokenized sequences\n",
-    "- **Audio**: Audio files, spectrograms\n",
-    "- **Time series**: Sequential data with proper windowing\n",
-    "\n",
-    "Let's implement the Dataset interface!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f02bc42c",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "dataset-class",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class Dataset:\n",
-    "    \"\"\"\n",
-    "    Base Dataset class: Abstract interface for all datasets.\n",
-    "    \n",
-    "    The fundamental abstraction for data loading in TinyTorch.\n",
-    "    Students implement concrete datasets by inheriting from this class.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __getitem__(self, index: int) -> Tuple[Tensor, Tensor]:\n",
-    "        \"\"\"\n",
-    "        Get a single sample and label by index.\n",
-    "        \n",
-    "        Args:\n",
-    "            index: Index of the sample to retrieve\n",
-    "            \n",
-    "        Returns:\n",
-    "            Tuple of (data, label) tensors\n",
-    "            \n",
-    "        TODO: Implement abstract method for getting samples.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. This is an abstract method - subclasses will implement it\n",
-    "        2. Return a tuple of (data, label) tensors\n",
-    "        3. Data should be the input features, label should be the target\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        dataset[0] should return (Tensor(image_data), Tensor(label))\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - This is an abstract method that subclasses must override\n",
-    "        - Always return a tuple of (data, label) tensors\n",
-    "        - Data contains the input features, label contains the target\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # This is an abstract method - subclasses must implement it\n",
-    "        raise NotImplementedError(\"Subclasses must implement __getitem__\")\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def __len__(self) -> int:\n",
-    "        \"\"\"\n",
-    "        Get the total number of samples in the dataset.\n",
-    "        \n",
-    "        TODO: Implement abstract method for getting dataset size.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. This is an abstract method - subclasses will implement it\n",
-    "        2. Return the total number of samples in the dataset\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        len(dataset) should return 50000 for CIFAR-10 training set\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - This is an abstract method that subclasses must override\n",
-    "        - Return an integer representing the total number of samples\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # This is an abstract method - subclasses must implement it\n",
-    "        raise NotImplementedError(\"Subclasses must implement __len__\")\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def get_sample_shape(self) -> Tuple[int, ...]:\n",
-    "        \"\"\"\n",
-    "        Get the shape of a single data sample.\n",
-    "        \n",
-    "        TODO: Implement method to get sample shape.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Get the first sample using self[0]\n",
-    "        2. Extract the data part (first element of tuple)\n",
-    "        3. Return the shape of the data tensor\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        For CIFAR-10: returns (3, 32, 32) for RGB images\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use self[0] to get the first sample\n",
-    "        - Extract data from the (data, label) tuple\n",
-    "        - Return data.shape\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Get the first sample to determine shape\n",
-    "        data, _ = self[0]\n",
-    "        return data.shape\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def get_num_classes(self) -> int:\n",
-    "        \"\"\"\n",
-    "        Get the number of classes in the dataset.\n",
-    "        \n",
-    "        TODO: Implement abstract method for getting number of classes.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. This is an abstract method - subclasses will implement it\n",
-    "        2. Return the number of unique classes in the dataset\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        For CIFAR-10: returns 10 (classes 0-9)\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - This is an abstract method that subclasses must override\n",
-    "        - Return the number of unique classes/categories\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # This is an abstract method - subclasses must implement it\n",
-    "        raise NotImplementedError(\"Subclasses must implement get_num_classes\")\n",
-    "        ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "fe072a6b",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "### 🧪 Unit Test: Dataset Interface\n",
-    "\n",
-    "Let's understand the Dataset interface! While we can't test the abstract class directly, we'll create a simple test dataset.\n",
-    "\n",
-    "**This is a unit test** - it tests the Dataset interface pattern in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f5dbcde5",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-dataset-interface-immediate",
-     "locked": true,
-     "points": 5,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test Dataset interface with a simple implementation\n",
-    "print(\"🔬 Unit Test: Dataset Interface...\")\n",
-    "\n",
-    "# Create a minimal test dataset\n",
-    "class TestDataset(Dataset):\n",
-    "    def __init__(self, size=5):\n",
-    "        self.size = size\n",
-    "    \n",
-    "    def __getitem__(self, index):\n",
-    "        # Simple test data: features are [index, index*2], label is index % 2\n",
-    "        data = Tensor([index, index * 2])\n",
-    "        label = Tensor([index % 2])\n",
-    "        return data, label\n",
-    "    \n",
-    "    def __len__(self):\n",
-    "        return self.size\n",
-    "    \n",
-    "    def get_num_classes(self):\n",
-    "        return 2\n",
-    "\n",
-    "# Test the interface\n",
-    "try:\n",
-    "    test_dataset = TestDataset(size=5)\n",
-    "    print(f\"Dataset created with size: {len(test_dataset)}\")\n",
-    "    \n",
-    "    # Test __getitem__\n",
-    "    data, label = test_dataset[0]\n",
-    "    print(f\"Sample 0: data={data}, label={label}\")\n",
-    "    assert isinstance(data, Tensor), \"Data should be a Tensor\"\n",
-    "    assert isinstance(label, Tensor), \"Label should be a Tensor\"\n",
-    "    print(\"✅ Dataset __getitem__ works correctly\")\n",
-    "    \n",
-    "    # Test __len__\n",
-    "    assert len(test_dataset) == 5, f\"Dataset length should be 5, got {len(test_dataset)}\"\n",
-    "    print(\"✅ Dataset __len__ works correctly\")\n",
-    "    \n",
-    "    # Test get_num_classes\n",
-    "    assert test_dataset.get_num_classes() == 2, f\"Should have 2 classes, got {test_dataset.get_num_classes()}\"\n",
-    "    print(\"✅ Dataset get_num_classes works correctly\")\n",
-    "    \n",
-    "    # Test get_sample_shape\n",
-    "    sample_shape = test_dataset.get_sample_shape()\n",
-    "    assert sample_shape == (2,), f\"Sample shape should be (2,), got {sample_shape}\"\n",
-    "    print(\"✅ Dataset get_sample_shape works correctly\")\n",
-    "    \n",
-    "    # Test multiple samples\n",
-    "    for i in range(3):\n",
-    "        data, label = test_dataset[i]\n",
-    "        expected_data = [i, i * 2]\n",
-    "        expected_label = [i % 2]\n",
-    "        assert np.array_equal(data.data, expected_data), f\"Data mismatch at index {i}\"\n",
-    "        assert np.array_equal(label.data, expected_label), f\"Label mismatch at index {i}\"\n",
-    "    print(\"✅ Dataset produces correct data for multiple samples\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ Dataset interface test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "# Show the dataset pattern\n",
-    "print(\"🎯 Dataset interface pattern:\")\n",
-    "print(\"   __getitem__: Returns (data, label) tuple\")\n",
-    "print(\"   __len__: Returns dataset size\")\n",
-    "print(\"   get_num_classes: Returns number of classes\")\n",
-    "print(\"   get_sample_shape: Returns shape of data samples\")\n",
-    "print(\"📈 Progress: Dataset interface ✓\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "84c87935",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 3: Building the DataLoader\n",
-    "\n",
-    "### What is a DataLoader?\n",
-    "A **DataLoader** efficiently batches and iterates through datasets. It's the bridge between individual samples and the batched data that neural networks expect.\n",
-    "\n",
-    "### Why DataLoaders Matter\n",
-    "- **Batching**: Groups samples for efficient GPU computation\n",
-    "- **Shuffling**: Randomizes data order each epoch so training does not depend on how the dataset is sorted\n",
-    "- **Memory efficiency**: Loads data on-demand rather than all at once\n",
-    "- **Iteration**: Provides clean interface for training loops\n",
-    "\n",
-    "### The DataLoader Pattern\n",
-    "```python\n",
-    "dataloader = DataLoader(dataset, batch_size=32, shuffle=True)\n",
-    "for batch_data, batch_labels in dataloader:\n",
-    "    # batch_data.shape: (32, ...)\n",
-    "    # batch_labels.shape: (32,)\n",
-    "    ...  # Train on batch\n",
-    "```\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Training loops**: Feed batches to neural networks\n",
-    "- **Validation**: Evaluate models on held-out data\n",
-    "- **Inference**: Process large datasets efficiently\n",
-    "- **Data analysis**: Explore datasets systematically\n",
-    "\n",
-    "### Systems Thinking\n",
-    "- **Batch size**: Trade-off between memory and speed\n",
-    "- **Shuffling**: Prevents overfitting to data order\n",
-    "- **Iteration**: Efficient looping through data\n",
-    "- **Memory**: Manage large datasets that don't fit in RAM"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "0918d8cf",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "dataloader-class",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class DataLoader:\n",
-    "    \"\"\"\n",
-    "    DataLoader: Efficiently batch and iterate through datasets.\n",
-    "    \n",
-    "    Provides batching, shuffling, and efficient iteration over datasets.\n",
-    "    Essential for training neural networks efficiently.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, dataset: Dataset, batch_size: int = 32, shuffle: bool = True):\n",
-    "        \"\"\"\n",
-    "        Initialize DataLoader.\n",
-    "        \n",
-    "        Args:\n",
-    "            dataset: Dataset to load from\n",
-    "            batch_size: Number of samples per batch\n",
-    "            shuffle: Whether to shuffle data each epoch\n",
-    "            \n",
-    "        TODO: Store configuration and dataset.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Store dataset as self.dataset\n",
-    "        2. Store batch_size as self.batch_size\n",
-    "        3. Store shuffle as self.shuffle\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        DataLoader(dataset, batch_size=32, shuffle=True)\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Store all parameters as instance variables\n",
-    "        - These will be used in __iter__ for batching\n",
-    "        \"\"\"\n",
-    "        # Input validation\n",
-    "        if dataset is None:\n",
-    "            raise TypeError(\"Dataset cannot be None\")\n",
-    "        if not isinstance(batch_size, int) or batch_size <= 0:\n",
-    "            raise ValueError(f\"Batch size must be a positive integer, got {batch_size}\")\n",
-    "        \n",
-    "        self.dataset = dataset\n",
-    "        self.batch_size = batch_size\n",
-    "        self.shuffle = shuffle\n",
-    "    \n",
-    "    def __iter__(self) -> Iterator[Tuple[Tensor, Tensor]]:\n",
-    "        \"\"\"\n",
-    "        Iterate through dataset in batches.\n",
-    "        \n",
-    "        Returns:\n",
-    "            Iterator yielding (batch_data, batch_labels) tuples\n",
-    "            \n",
-    "        TODO: Implement batching and shuffling logic.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Create indices list: list(range(len(dataset)))\n",
-    "        2. Shuffle indices if self.shuffle is True\n",
-    "        3. Loop through indices in batch_size chunks\n",
-    "        4. For each batch: collect samples, stack them, yield batch\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        for batch_data, batch_labels in dataloader:\n",
-    "            # batch_data.shape: (batch_size, ...)\n",
-    "            # batch_labels.shape: (batch_size,)\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use list(range(len(self.dataset))) for indices\n",
-    "        - Use np.random.shuffle() if self.shuffle is True\n",
-    "        - Loop in chunks of self.batch_size\n",
-    "        - Collect samples and stack with np.stack()\n",
-    "        \"\"\"\n",
-    "        # Create indices for all samples\n",
-    "        indices = list(range(len(self.dataset)))\n",
-    "        \n",
-    "        # Shuffle if requested\n",
-    "        if self.shuffle:\n",
-    "            np.random.shuffle(indices)\n",
-    "        \n",
-    "        # Iterate through indices in batches\n",
-    "        for i in range(0, len(indices), self.batch_size):\n",
-    "            batch_indices = indices[i:i + self.batch_size]\n",
-    "            \n",
-    "            # Collect samples for this batch\n",
-    "            batch_data = []\n",
-    "            batch_labels = []\n",
-    "            \n",
-    "            for idx in batch_indices:\n",
-    "                data, label = self.dataset[idx]\n",
-    "                batch_data.append(data.data)\n",
-    "                batch_labels.append(label.data)\n",
-    "            \n",
-    "            # Stack into batch tensors\n",
-    "            batch_data_array = np.stack(batch_data, axis=0)\n",
-    "            batch_labels_array = np.stack(batch_labels, axis=0)\n",
-    "            \n",
-    "            yield Tensor(batch_data_array), Tensor(batch_labels_array)\n",
-    "    \n",
-    "    def __len__(self) -> int:\n",
-    "        \"\"\"\n",
-    "        Get the number of batches per epoch.\n",
-    "        \n",
-    "        TODO: Calculate number of batches.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Get dataset size: len(self.dataset)\n",
-    "        2. Divide by batch_size and round up\n",
-    "        3. Use ceiling division: (n + batch_size - 1) // batch_size\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        Dataset size 100, batch size 32 → 4 batches\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use len(self.dataset) for dataset size\n",
-    "        - Use ceiling division for exact batch count\n",
-    "        - Formula: (dataset_size + batch_size - 1) // batch_size\n",
-    "        \"\"\"\n",
-    "        # Calculate number of batches using ceiling division\n",
-    "        dataset_size = len(self.dataset)\n",
-    "        return (dataset_size + self.batch_size - 1) // self.batch_size"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "46082fb1",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "### 🧪 Unit Test: DataLoader\n",
-    "\n",
-    "Let's test your DataLoader implementation! This is the heart of efficient data loading for neural networks.\n",
-    "\n",
-    "**This is a unit test** - it tests the DataLoader class in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "9744517c",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-dataloader-immediate",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test DataLoader immediately after implementation\n",
-    "print(\"🔬 Unit Test: DataLoader...\")\n",
-    "\n",
-    "# Redefine the test dataset for this cell (10 samples, 3 classes)\n",
-    "class TestDataset(Dataset):\n",
-    "    def __init__(self, size=10):\n",
-    "        self.size = size\n",
-    "    \n",
-    "    def __getitem__(self, index):\n",
-    "        data = Tensor([index, index * 2])\n",
-    "        label = Tensor([index % 3])  # 3 classes\n",
-    "        return data, label\n",
-    "    \n",
-    "    def __len__(self):\n",
-    "        return self.size\n",
-    "    \n",
-    "    def get_num_classes(self):\n",
-    "        return 3\n",
-    "\n",
-    "# Test basic DataLoader functionality\n",
-    "try:\n",
-    "    dataset = TestDataset(size=10)\n",
-    "    dataloader = DataLoader(dataset, batch_size=3, shuffle=False)\n",
-    "    \n",
-    "    print(f\"DataLoader created: batch_size={dataloader.batch_size}, shuffle={dataloader.shuffle}\")\n",
-    "    print(f\"Number of batches: {len(dataloader)}\")\n",
-    "    \n",
-    "    # Test __len__\n",
-    "    expected_batches = (10 + 3 - 1) // 3  # Ceiling division: 4 batches\n",
-    "    assert len(dataloader) == expected_batches, f\"Should have {expected_batches} batches, got {len(dataloader)}\"\n",
-    "    print(\"✅ DataLoader __len__ works correctly\")\n",
-    "    \n",
-    "    # Test iteration\n",
-    "    batch_count = 0\n",
-    "    total_samples = 0\n",
-    "    \n",
-    "    for batch_data, batch_labels in dataloader:\n",
-    "        batch_count += 1\n",
-    "        batch_size = batch_data.shape[0]\n",
-    "        total_samples += batch_size\n",
-    "        \n",
-    "        print(f\"Batch {batch_count}: data shape {batch_data.shape}, labels shape {batch_labels.shape}\")\n",
-    "        \n",
-    "        # Verify batch dimensions\n",
-    "        assert len(batch_data.shape) == 2, f\"Batch data should be 2D, got {batch_data.shape}\"\n",
-    "        assert len(batch_labels.shape) == 2, f\"Batch labels should be 2D, got {batch_labels.shape}\"\n",
-    "        assert batch_data.shape[1] == 2, f\"Each sample should have 2 features, got {batch_data.shape[1]}\"\n",
-    "        assert batch_labels.shape[1] == 1, f\"Each label should have 1 element, got {batch_labels.shape[1]}\"\n",
-    "        \n",
-    "    assert batch_count == expected_batches, f\"Should iterate {expected_batches} times, got {batch_count}\"\n",
-    "    assert total_samples == 10, f\"Should process 10 total samples, got {total_samples}\"\n",
-    "    print(\"✅ DataLoader iteration works correctly\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ DataLoader test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "# Test shuffling\n",
-    "try:\n",
-    "    dataloader_shuffle = DataLoader(dataset, batch_size=5, shuffle=True)\n",
-    "    dataloader_no_shuffle = DataLoader(dataset, batch_size=5, shuffle=False)\n",
-    "    \n",
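-    "    # Feature 0 of each sample is its index, so it identifies which samples were served\n",
-    "    ids_shuffle = np.concatenate([batch.data[:, 0] for batch, _ in dataloader_shuffle])\n",
-    "    ids_no_shuffle = np.concatenate([batch.data[:, 0] for batch, _ in dataloader_no_shuffle])\n",
-    "    \n",
-    "    # Without shuffling the order is preserved; with shuffling every sample still appears exactly once\n",
-    "    assert np.array_equal(ids_no_shuffle, np.arange(10)), \"shuffle=False should preserve dataset order\"\n",
-    "    assert np.array_equal(np.sort(ids_shuffle), np.arange(10)), \"shuffle=True should visit every sample exactly once\"\n",
-    "    print(\"✅ DataLoader shuffling parameter works\")\n",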
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ DataLoader shuffling test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "# Test different batch sizes\n",
-    "try:\n",
-    "    small_loader = DataLoader(dataset, batch_size=2, shuffle=False)\n",
-    "    large_loader = DataLoader(dataset, batch_size=8, shuffle=False)\n",
-    "    \n",
-    "    assert len(small_loader) == 5, f\"Small loader should have 5 batches, got {len(small_loader)}\"\n",
-    "    assert len(large_loader) == 2, f\"Large loader should have 2 batches, got {len(large_loader)}\"\n",
-    "    print(\"✅ DataLoader handles different batch sizes correctly\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ DataLoader batch size test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "# Show the DataLoader behavior\n",
-    "print(\"🎯 DataLoader behavior:\")\n",
-    "print(\"   Batches data for efficient processing\")\n",
-    "print(\"   Handles shuffling and iteration\")\n",
-    "print(\"   Provides clean interface for training loops\")\n",
-    "print(\"📈 Progress: Dataset interface ✓, DataLoader ✓\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ee45269f",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 4: Creating a Simple Dataset Example\n",
-    "\n",
-    "### Why We Need Concrete Examples\n",
-    "Abstract classes are great for interfaces, but we need concrete implementations to understand how they work. Let's create a simple dataset for testing.\n",
-    "\n",
-    "### Design Principles\n",
-    "- **Simple**: Easy to understand and debug\n",
-    "- **Configurable**: Adjustable size and properties\n",
-    "- **Predictable**: Deterministic data for testing\n",
-    "- **Educational**: Shows the Dataset pattern clearly\n",
-    "\n",
-    "### Real-World Connection\n",
-    "This pattern is used for:\n",
-    "- **CIFAR-10**: 32x32 RGB images with 10 classes\n",
-    "- **ImageNet**: High-resolution images with 1000 classes\n",
-    "- **MNIST**: 28x28 grayscale digits with 10 classes\n",
-    "- **Custom datasets**: Your own data following this pattern"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "d4c773ba",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "simple-dataset",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class SimpleDataset(Dataset):\n",
-    "    \"\"\"\n",
-    "    Simple dataset for testing and demonstration.\n",
-    "    \n",
-    "    Generates synthetic data with configurable size and properties.\n",
-    "    Perfect for understanding the Dataset pattern.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, size: int = 100, num_features: int = 4, num_classes: int = 3):\n",
-    "        \"\"\"\n",
-    "        Initialize SimpleDataset.\n",
-    "        \n",
-    "        Args:\n",
-    "            size: Number of samples in the dataset\n",
-    "            num_features: Number of features per sample\n",
-    "            num_classes: Number of classes\n",
-    "            \n",
-    "        TODO: Initialize the dataset with synthetic data.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Store the configuration parameters\n",
-    "        2. Generate synthetic data and labels\n",
-    "        3. Make data deterministic for testing\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        SimpleDataset(size=100, num_features=4, num_classes=3)\n",
-    "        creates 100 samples with 4 features each, 3 classes\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Store size, num_features, num_classes as instance variables\n",
-    "        - Use np.random.seed() for reproducible data\n",
-    "        - Generate random data with np.random.randn()\n",
-    "        - Generate random labels with np.random.randint()\n",
-    "        \"\"\"\n",
-    "        self.size = size\n",
-    "        self.num_features = num_features\n",
-    "        self.num_classes = num_classes\n",
-    "        \n",
-    "        # Generate synthetic data (deterministic for testing)\n",
-    "        np.random.seed(42)  # For reproducible data\n",
-    "        self.data = np.random.randn(size, num_features).astype(np.float32)\n",
-    "        self.labels = np.random.randint(0, num_classes, size=size)\n",
-    "    \n",
-    "    def __getitem__(self, index: int) -> Tuple[Tensor, Tensor]:\n",
-    "        \"\"\"\n",
-    "        Get a sample by index.\n",
-    "        \n",
-    "        Args:\n",
-    "            index: Index of the sample\n",
-    "            \n",
-    "        Returns:\n",
-    "            Tuple of (data, label) tensors\n",
-    "            \n",
-    "        TODO: Return the sample at the given index.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Get data sample from self.data[index]\n",
-    "        2. Get label from self.labels[index]\n",
-    "        3. Convert both to Tensors and return as tuple\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        dataset[0] returns (Tensor(features), Tensor(label))\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use self.data[index] for the data\n",
-    "        - Use self.labels[index] for the label\n",
-    "        - Convert to Tensors: Tensor(data), Tensor(label)\n",
-    "        \"\"\"\n",
-    "        data = self.data[index]\n",
-    "        label = self.labels[index]\n",
-    "        return Tensor(data), Tensor(label)\n",
-    "    \n",
-    "    def __len__(self) -> int:\n",
-    "        \"\"\"\n",
-    "        Get the dataset size.\n",
-    "        \n",
-    "        TODO: Return the dataset size.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Return self.size\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        len(dataset) returns 100 for dataset with 100 samples\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Simply return self.size\n",
-    "        \"\"\"\n",
-    "        return self.size\n",
-    "    \n",
-    "    def get_num_classes(self) -> int:\n",
-    "        \"\"\"\n",
-    "        Get the number of classes.\n",
-    "        \n",
-    "        TODO: Return the number of classes.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Return self.num_classes\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        dataset.get_num_classes() returns 3 for 3-class dataset\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Simply return self.num_classes\n",
-    "        \"\"\"\n",
-    "        return self.num_classes"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e6a029be",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "### 🧪 Unit Test: SimpleDataset\n",
-    "\n",
-    "Let's test your SimpleDataset implementation! This concrete example shows how the Dataset pattern works.\n",
-    "\n",
-    "**This is a unit test** - it tests the SimpleDataset class in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "0f3f5ed5",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-simple-dataset-immediate",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Test SimpleDataset immediately after implementation\n",
-    "print(\"🔬 Unit Test: SimpleDataset...\")\n",
-    "\n",
-    "try:\n",
-    "    # Create dataset\n",
-    "    dataset = SimpleDataset(size=20, num_features=5, num_classes=4)\n",
-    "    \n",
-    "    print(f\"Dataset created: size={len(dataset)}, features={dataset.num_features}, classes={dataset.get_num_classes()}\")\n",
-    "    \n",
-    "    # Test basic properties\n",
-    "    assert len(dataset) == 20, f\"Dataset length should be 20, got {len(dataset)}\"\n",
-    "    assert dataset.get_num_classes() == 4, f\"Should have 4 classes, got {dataset.get_num_classes()}\"\n",
-    "    print(\"✅ SimpleDataset basic properties work correctly\")\n",
-    "    \n",
-    "    # Test sample access\n",
-    "    data, label = dataset[0]\n",
-    "    assert isinstance(data, Tensor), \"Data should be a Tensor\"\n",
-    "    assert isinstance(label, Tensor), \"Label should be a Tensor\"\n",
-    "    assert data.shape == (5,), f\"Data shape should be (5,), got {data.shape}\"\n",
-    "    assert label.shape == (), f\"Label shape should be (), got {label.shape}\"\n",
-    "    print(\"✅ SimpleDataset sample access works correctly\")\n",
-    "    \n",
-    "    # Test sample shape\n",
-    "    sample_shape = dataset.get_sample_shape()\n",
-    "    assert sample_shape == (5,), f\"Sample shape should be (5,), got {sample_shape}\"\n",
-    "    print(\"✅ SimpleDataset get_sample_shape works correctly\")\n",
-    "    \n",
-    "    # Test multiple samples\n",
-    "    for i in range(5):\n",
-    "        data, label = dataset[i]\n",
-    "        assert data.shape == (5,), f\"Data shape should be (5,) for sample {i}, got {data.shape}\"\n",
-    "        assert 0 <= label.data < 4, f\"Label should be in [0, 3] for sample {i}, got {label.data}\"\n",
-    "    print(\"✅ SimpleDataset multiple samples work correctly\")\n",
-    "    \n",
-    "    # Test deterministic data (same seed should give same data)\n",
-    "    dataset2 = SimpleDataset(size=20, num_features=5, num_classes=4)\n",
-    "    data1, label1 = dataset[0]\n",
-    "    data2, label2 = dataset2[0]\n",
-    "    assert np.array_equal(data1.data, data2.data), \"Data should be deterministic\"\n",
-    "    assert np.array_equal(label1.data, label2.data), \"Labels should be deterministic\"\n",
-    "    print(\"✅ SimpleDataset data is deterministic\")\n",
-    "\n",
-    "except Exception as e:\n",
-    "    print(f\"❌ SimpleDataset test failed: {e}\")\n",
-    "    raise  # Re-raise so this graded cell fails instead of silently passing\n",
-    "# Show the SimpleDataset behavior\n",
-    "print(\"🎯 SimpleDataset behavior:\")\n",
-    "print(\"   Generates synthetic data for testing\")\n",
-    "print(\"   Implements complete Dataset interface\")\n",
-    "print(\"   Provides deterministic data for reproducibility\")\n",
-    "print(\"📈 Progress: Dataset interface ✓, DataLoader ✓, SimpleDataset ✓\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "3b5a161c",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 5: Comprehensive Test - Complete Data Pipeline\n",
-    "\n",
-    "### Real-World Data Pipeline Applications\n",
-    "Let's test our data loading components in realistic scenarios:\n",
-    "\n",
-    "#### **Training Pipeline**\n",
-    "```python\n",
-    "# The standard ML training pattern\n",
-    "dataset = SimpleDataset(size=1000, num_features=10, num_classes=5)\n",
-    "dataloader = DataLoader(dataset, batch_size=32, shuffle=True)\n",
-    "\n",
-    "for epoch in range(num_epochs):\n",
-    "    for batch_data, batch_labels in dataloader:\n",
-    "        # Train model on batch\n",
-    "        pass\n",
-    "```\n",
-    "\n",
-    "#### **Validation Pipeline**\n",
-    "```python\n",
-    "# Validation without shuffling\n",
-    "val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)\n",
-    "\n",
-    "for batch_data, batch_labels in val_loader:\n",
-    "    # Evaluate model on batch\n",
-    "    pass\n",
-    "```\n",
-    "\n",
-    "#### **Data Analysis Pipeline**\n",
-    "```python\n",
-    "# Systematic data exploration\n",
-    "for batch_data, batch_labels in dataloader:\n",
-    "    # Analyze batch statistics\n",
-    "    pass\n",
-    "```\n",
-    "\n",
-    "This comprehensive test ensures our data loading components work together for real ML applications!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "5e8d80ec",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-comprehensive",
-     "locked": true,
-     "points": 15,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# Comprehensive test - complete data pipeline applications\n",
-    "print(\"🔬 Comprehensive Test: Complete Data Pipeline...\")\n",
-    "\n",
-    "try:\n",
-    "    # Test 1: Training Data Pipeline\n",
-    "    print(\"\\n1. Training Data Pipeline Test:\")\n",
-    "    \n",
-    "    # Create training dataset\n",
-    "    train_dataset = SimpleDataset(size=100, num_features=8, num_classes=5)\n",
-    "    train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)\n",
-    "    \n",
-    "    # Simulate training epoch\n",
-    "    epoch_samples = 0\n",
-    "    epoch_batches = 0\n",
-    "    \n",
-    "    for batch_data, batch_labels in train_loader:\n",
-    "        epoch_batches += 1\n",
-    "        epoch_samples += batch_data.shape[0]\n",
-    "        \n",
-    "        # Verify batch properties\n",
-    "        assert batch_data.shape[1] == 8, f\"Features should be 8, got {batch_data.shape[1]}\"\n",
-    "        assert len(batch_labels.shape) == 1, f\"Labels should be 1D, got shape {batch_labels.shape}\"\n",
-    "        assert isinstance(batch_data, Tensor), \"Batch data should be Tensor\"\n",
-    "        assert isinstance(batch_labels, Tensor), \"Batch labels should be Tensor\"\n",
-    "    \n",
-    "    assert epoch_samples == 100, f\"Should process 100 samples, got {epoch_samples}\"\n",
-    "    expected_batches = (100 + 16 - 1) // 16\n",
-    "    assert epoch_batches == expected_batches, f\"Should have {expected_batches} batches, got {epoch_batches}\"\n",
-    "    print(\"✅ Training pipeline works correctly\")\n",
-    "    \n",
-    "    # Test 2: Validation Data Pipeline\n",
-    "    print(\"\\n2. Validation Data Pipeline Test:\")\n",
-    "    \n",
-    "    # Create validation dataset (no shuffling)\n",
-    "    val_dataset = SimpleDataset(size=50, num_features=8, num_classes=5)\n",
-    "    val_loader = DataLoader(val_dataset, batch_size=10, shuffle=False)\n",
-    "    \n",
-    "    # Simulate validation\n",
-    "    val_samples = 0\n",
-    "    val_batches = 0\n",
-    "    \n",
-    "    for batch_data, batch_labels in val_loader:\n",
-    "        val_batches += 1\n",
-    "        val_samples += batch_data.shape[0]\n",
-    "        \n",
-    "        # Verify consistent batch processing\n",
-    "        assert batch_data.shape[1] == 8, \"Validation features should match training\"\n",
-    "        assert len(batch_labels.shape) == 1, \"Validation labels should be 1D\"\n",
-    "        \n",
-    "    assert val_samples == 50, f\"Should process 50 validation samples, got {val_samples}\"\n",
-    "    assert val_batches == 5, f\"Should have 5 validation batches, got {val_batches}\"\n",
-    "    print(\"✅ Validation pipeline works correctly\")\n",
-    "    \n",
-    "    # Test 3: Different Dataset Configurations\n",
-    "    print(\"\\n3. Dataset Configuration Test:\")\n",
-    "    \n",
-    "    # Test different configurations\n",
-    "    configs = [\n",
-    "        (200, 4, 3),   # Medium dataset\n",
-    "        (50, 12, 10),  # High-dimensional features\n",
-    "        (1000, 2, 2),  # Large dataset, simple features\n",
-    "    ]\n",
-    "    \n",
-    "    for size, features, classes in configs:\n",
-    "        dataset = SimpleDataset(size=size, num_features=features, num_classes=classes)\n",
-    "        loader = DataLoader(dataset, batch_size=32, shuffle=True)\n",
-    "        \n",
-    "        # Test one batch\n",
-    "        batch_data, batch_labels = next(iter(loader))\n",
-    "        \n",
-    "        assert batch_data.shape[1] == features, f\"Features mismatch for config {(size, features, classes)}\"\n",
-    "        assert len(dataset) == size, f\"Size mismatch for config {(size, features, classes)}\"\n",
-    "        assert dataset.get_num_classes() == classes, f\"Classes mismatch for config {(size, features, classes)}\"\n",
-    "    \n",
-    "    print(\"✅ Different dataset configurations work correctly\")\n",
-    "    \n",
-    "    # Test 4: Memory Efficiency Simulation\n",
-    "    print(\"\\n4. Memory Efficiency Test:\")\n",
-    "    \n",
-    "    # Create larger dataset to test memory efficiency\n",
-    "    large_dataset = SimpleDataset(size=500, num_features=20, num_classes=10)\n",
-    "    large_loader = DataLoader(large_dataset, batch_size=50, shuffle=True)\n",
-    "    \n",
-    "    # Process all batches to ensure memory efficiency\n",
-    "    processed_samples = 0\n",
-    "    max_batch_size = 0\n",
-    "    \n",
-    "    for batch_data, batch_labels in large_loader:\n",
-    "        processed_samples += batch_data.shape[0]\n",
-    "        max_batch_size = max(max_batch_size, batch_data.shape[0])\n",
-    "        \n",
-    "        # Verify no batch exceeds the configured batch size (bounds per-batch memory)\n",
-    "        assert batch_data.shape[0] <= 50, f\"Batch size should not exceed 50, got {batch_data.shape[0]}\"\n",
-    "    \n",
-    "    assert processed_samples == 500, f\"Should process all 500 samples, got {processed_samples}\"\n",
-    "    print(\"✅ Memory efficiency works correctly\")\n",
-    "    \n",
-    "    # Test 5: Multi-Epoch Training Simulation\n",
-    "    print(\"\\n5. Multi-Epoch Training Test:\")\n",
-    "    \n",
-    "    # Simulate multiple epochs\n",
-    "    dataset = SimpleDataset(size=60, num_features=6, num_classes=3)\n",
-    "    loader = DataLoader(dataset, batch_size=20, shuffle=True)\n",
-    "    \n",
-    "    for epoch in range(3):\n",
-    "        epoch_samples = 0\n",
-    "        for batch_data, batch_labels in loader:\n",
-    "            epoch_samples += batch_data.shape[0]\n",
-    "            \n",
-    "            # Verify shapes remain consistent across epochs\n",
-    "            assert batch_data.shape[1] == 6, f\"Features should be 6 in epoch {epoch}\"\n",
-    "            assert len(batch_labels.shape) == 1, f\"Labels should be 1D in epoch {epoch}\"\n",
-    "        \n",
-    "        assert epoch_samples == 60, f\"Should process 60 samples in epoch {epoch}, got {epoch_samples}\"\n",
-    "    \n",
-    "    print(\"✅ Multi-epoch training works correctly\")\n",
-    "    \n",
-    "    print(\"\\n🎉 Comprehensive test passed! Your data pipeline works correctly for:\")\n",
-    "    print(\"  • Large-scale dataset handling\")\n",
-    "    print(\"  • Batch processing, including partial final batches\")\n",
-    "    print(\"  • Shuffling and sampling strategies\")\n",
-    "    print(\"  • Memory-efficient data loading\")\n",
-    "    print(\"  • Complete training pipeline integration\")\n",
-    "    print(\"📈 Progress: Production-ready data pipeline ✓\")\n",
-    "    \n",
-    "except Exception as e:\n",
-    "    print(f\"❌ Comprehensive test failed: {e}\")\n",
-    "    raise\n",
-    "\n",
-    "print(\"📈 Final Progress: Complete data pipeline ready for production ML!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b0352802",
-   "metadata": {
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "\"\"\"\n",
-    "# 🎯 Module Summary\n",
-    "\n",
-    "Congratulations! You've successfully implemented the core components of data loading systems:\n",
-    "\n",
-    "## What You've Accomplished\n",
-    "✅ **Dataset Abstract Class**: The foundation interface for all data loading  \n",
-    "✅ **DataLoader Implementation**: Efficient batching and iteration over datasets  \n",
-    "✅ **SimpleDataset Example**: Concrete implementation showing the Dataset pattern  \n",
-    "✅ **Complete Data Pipeline**: End-to-end data loading for neural network training  \n",
-    "✅ **Systems Thinking**: Understanding memory efficiency, batching, and I/O optimization  \n",
-    "\n",
-    "## Key Concepts You've Learned\n",
-    "- **Dataset pattern**: Abstract interface for consistent data access\n",
-    "- **DataLoader pattern**: Efficient batching and iteration for training\n",
-    "- **Memory efficiency**: Loading data on-demand rather than all at once\n",
-    "- **Batching strategies**: Grouping samples for efficient GPU computation\n",
-    "- **Shuffling**: Randomizing sample order each epoch so batches are not biased by dataset ordering\n",
-    "\n",
-    "## Mathematical Foundations\n",
-    "- **Batch processing**: Vectorized operations on multiple samples\n",
-    "- **Memory management**: Handling datasets larger than available RAM\n",
-    "- **I/O optimization**: Minimizing disk reads and memory allocation\n",
-    "- **Stochastic sampling**: Random shuffling for better generalization\n",
-    "\n",
-    "## Real-World Applications\n",
-    "- **Computer vision**: Loading image datasets like CIFAR-10, ImageNet\n",
-    "- **Natural language processing**: Loading text datasets with tokenization\n",
-    "- **Tabular data**: Loading CSV files and database records\n",
-    "- **Audio processing**: Loading and preprocessing audio files\n",
-    "- **Time series**: Loading sequential data with proper windowing\n",
-    "\n",
-    "## Connection to Production Systems\n",
-    "- **PyTorch**: Your Dataset and DataLoader mirror `torch.utils.data`\n",
-    "- **TensorFlow**: Similar concepts in `tf.data.Dataset`\n",
-    "- **JAX**: Custom data loading with efficient batching\n",
-    "- **MLOps**: Data pipelines are critical for production ML systems\n",
-    "\n",
-    "## Performance Characteristics\n",
-    "- **Memory efficiency**: O(batch_size) memory usage, not O(dataset_size)\n",
-    "- **I/O optimization**: Load data on-demand, not all at once\n",
-    "- **Batching efficiency**: Vectorized operations on GPU\n",
-    "- **Shuffling overhead**: Minimal cost for significant training benefits\n",
-    "\n",
-    "## Data Engineering Best Practices\n",
-    "- **Reproducibility**: Deterministic data generation and shuffling\n",
-    "- **Scalability**: Handle datasets of any size\n",
-    "- **Flexibility**: Easy to switch between different data sources\n",
-    "- **Testability**: Simple interfaces for unit testing\n",
-    "\n",
-    "## Next Steps\n",
-    "1. **Export your code**: Use NBDev to export to the `tinytorch` package\n",
-    "2. **Test your implementation**: Run the complete test suite\n",
-    "3. **Build data pipelines**: \n",
-    "   ```python\n",
-    "   from tinytorch.core.dataloader import Dataset, DataLoader\n",
-    "   from tinytorch.core.tensor import Tensor\n",
-    "   \n",
-    "   # Create dataset\n",
-    "   dataset = SimpleDataset(size=1000, num_features=10, num_classes=5)\n",
-    "   \n",
-    "   # Create dataloader\n",
-    "   loader = DataLoader(dataset, batch_size=32, shuffle=True)\n",
-    "   \n",
-    "   # Training loop\n",
-    "   for epoch in range(num_epochs):\n",
-    "       for batch_data, batch_labels in loader:\n",
-    "           # Train model\n",
-    "           pass\n",
-    "   ```\n",
-    "4. **Explore advanced topics**: Data augmentation, distributed loading, streaming datasets!\n",
-    "\n",
-    "**Ready for the next challenge?** Let's build training loops and optimizers to complete the ML pipeline!\n",
-    "\"\"\"\n",
-    "\n",
-    "def test_dataset_interface():\n",
-    "    \"\"\"Test Dataset abstract interface implementation comprehensively.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Dataset Interface...\")\n",
-    "    \n",
-    "    # Test TestDataset implementation\n",
-    "    dataset = TestDataset(size=5)\n",
-    "    \n",
-    "    # Test basic interface\n",
-    "    assert len(dataset) == 5, \"Dataset should have correct length\"\n",
-    "    \n",
-    "    # Test data access\n",
-    "    sample, label = dataset[0]\n",
-    "    assert isinstance(sample, Tensor), \"Sample should be Tensor\"\n",
-    "    assert isinstance(label, Tensor), \"Label should be Tensor\"\n",
-    "    \n",
-    "    print(\"✅ Dataset interface works correctly\")\n",
-    "\n",
-    "def test_dataloader():\n",
-    "    \"\"\"Test DataLoader implementation comprehensively.\"\"\"\n",
-    "    print(\"🔬 Unit Test: DataLoader...\")\n",
-    "    \n",
-    "    # Test DataLoader with TestDataset\n",
-    "    dataset = TestDataset(size=10)\n",
-    "    loader = DataLoader(dataset, batch_size=3, shuffle=False)\n",
-    "    \n",
-    "    # Test iteration\n",
-    "    batches = list(loader)\n",
-    "    assert len(batches) >= 3, \"Should have at least 3 batches\"\n",
-    "    \n",
-    "    # Test batch shapes\n",
-    "    batch_data, batch_labels = batches[0]\n",
-    "    assert batch_data.shape[0] <= 3, \"Batch size should be <= 3\"\n",
-    "    assert batch_labels.shape[0] <= 3, \"Batch labels should match data\"\n",
-    "    \n",
-    "    print(\"✅ DataLoader works correctly\")\n",
-    "\n",
-    "def test_simple_dataset():\n",
-    "    \"\"\"Test SimpleDataset implementation comprehensively.\"\"\"\n",
-    "    print(\"🔬 Unit Test: SimpleDataset...\")\n",
-    "    \n",
-    "    # Test SimpleDataset\n",
-    "    dataset = SimpleDataset(size=100, num_features=4, num_classes=3)\n",
-    "    \n",
-    "    # Test properties\n",
-    "    assert len(dataset) == 100, \"Dataset should have correct size\"\n",
-    "    assert dataset.get_num_classes() == 3, \"Should have correct number of classes\"\n",
-    "    \n",
-    "    # Test data access\n",
-    "    sample, label = dataset[0]\n",
-    "    assert sample.shape == (4,), \"Sample should have correct features\"\n",
-    "    assert 0 <= label.data < 3, \"Label should be valid class\"\n",
-    "    \n",
-    "    print(\"✅ SimpleDataset works correctly\")\n",
-    "\n",
-    "def test_dataloader_pipeline():\n",
-    "    \"\"\"Test the complete data pipeline end to end.\"\"\"\n",
-    "    print(\"🔬 Comprehensive Test: Data Pipeline...\")\n",
-    "    \n",
-    "    # Test complete pipeline\n",
-    "    dataset = SimpleDataset(size=50, num_features=10, num_classes=5)\n",
-    "    loader = DataLoader(dataset, batch_size=8, shuffle=True)\n",
-    "    \n",
-    "    total_samples = 0\n",
-    "    for batch_data, batch_labels in loader:\n",
-    "        assert isinstance(batch_data, Tensor), \"Batch data should be Tensor\"\n",
-    "        assert isinstance(batch_labels, Tensor), \"Batch labels should be Tensor\"\n",
-    "        assert batch_data.shape[1] == 10, \"Features should be correct\"\n",
-    "        total_samples += batch_data.shape[0]\n",
-    "    \n",
-    "    assert total_samples == 50, \"Should process all samples\"\n",
-    "    \n",
-    "    print(\"✅ Data pipeline integration works correctly\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "c9433d3d",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧪 Module Testing\n",
-    "\n",
-    "Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
-    "\n",
-    "**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "3ec15e59",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "standardized-testing",
-     "locked": true,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# =============================================================================\n",
-    "# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
-    "# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
-    "# =============================================================================\n",
-    "\n",
-    "if __name__ == \"__main__\":\n",
-    "    from tito.tools.testing import run_module_tests_auto\n",
-    "    \n",
-    "    # Automatically discover and run all tests in this module\n",
-    "    success = run_module_tests_auto(\"DataLoader\") "
-   ]
-  }
- ],
- "metadata": {
-  "jupytext": {
-   "main_language": "python"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/modules/source/09_autograd/autograd_dev.ipynb b/modules/source/09_autograd/autograd_dev.ipynb
deleted file mode 100644
index 7304b5a3..00000000
--- a/modules/source/09_autograd/autograd_dev.ipynb
+++ /dev/null
@@ -1,1309 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "b65d8062",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "# Autograd - Automatic Differentiation Engine\n",
-    "\n",
-    "Welcome to the Autograd module! This is where TinyTorch becomes truly powerful. You'll implement the automatic differentiation engine that makes neural network training possible.\n",
-    "\n",
-    "## Learning Goals\n",
-    "- Understand how automatic differentiation works through computational graphs\n",
-    "- Implement the Variable class that tracks gradients and operations\n",
-    "- Build backward propagation for gradient computation\n",
-    "- Create the foundation for neural network training\n",
-    "- Master the mathematical concepts behind backpropagation\n",
-    "\n",
-    "## Build → Use → Analyze\n",
-    "1. **Build**: Create the Variable class and gradient computation system\n",
-    "2. **Use**: Perform automatic differentiation on complex expressions\n",
-    "3. **Analyze**: Understand how gradients flow through computational graphs"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "744f90c8",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "autograd-imports",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| default_exp core.autograd\n",
-    "\n",
-    "#| export\n",
-    "import numpy as np\n",
-    "import sys\n",
-    "from typing import Union, List, Tuple, Optional, Any, Callable\n",
-    "from collections import defaultdict\n",
-    "\n",
-    "# Import our existing components\n",
-    "try:\n",
-    "    from tinytorch.core.tensor import Tensor\n",
-    "except ImportError:\n",
-    "    # For development, import from local modules\n",
-    "    import os\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))\n",
-    "    from tensor_dev import Tensor"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "75ad92a1",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "autograd-setup",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "print(\"🔥 TinyTorch Autograd Module\")\n",
-    "print(f\"NumPy version: {np.__version__}\")\n",
-    "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
-    "print(\"Ready to build automatic differentiation!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "14ca5d2a",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 📦 Where This Code Lives in the Final Package\n",
-    "\n",
-    "**Learning Side:** You work in `modules/source/09_autograd/autograd_dev.py`  \n",
-    "**Building Side:** Code exports to `tinytorch.core.autograd`\n",
-    "\n",
-    "```python\n",
-    "# Final package structure:\n",
-    "from tinytorch.core.autograd import Variable, backward  # The gradient engine!\n",
-    "from tinytorch.core.tensor import Tensor\n",
-    "from tinytorch.core.activations import ReLU, Sigmoid, Tanh\n",
-    "```\n",
-    "\n",
-    "**Why this matters:**\n",
-    "- **Learning:** Focused module for understanding gradients\n",
-    "- **Production:** Proper organization like PyTorch's `torch.autograd`\n",
-    "- **Consistency:** All gradient operations live together in `core.autograd`\n",
-    "- **Foundation:** Enables training for all neural networks"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "1645e619",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## What is Automatic Differentiation?\n",
-    "\n",
-    "### The Problem: Computing Gradients at Scale\n",
-    "Neural networks have millions of parameters. To train them, we need gradients of the loss function with respect to every parameter:\n",
-    "\n",
-    "```\n",
-    "∇θ L = [∂L/∂w₁, ∂L/∂w₂, ..., ∂L/∂wₙ, ∂L/∂b₁, ∂L/∂b₂, ..., ∂L/∂bₘ]\n",
-    "```\n",
-    "\n",
-    "**Manual differentiation fails** because:\n",
-    "- Networks have thousands of composed functions\n",
-    "- Manual computation is extremely error-prone\n",
-    "- Every architecture change requires re-deriving all gradients\n",
-    "\n",
-    "### The Solution: Automatic Differentiation\n",
-    "**Autograd** automatically computes derivatives of functions represented as computational graphs:\n",
-    "\n",
-    "```python\n",
-    "# Instead of manually computing: ∂(x² + 2xy + y²)/∂x = 2x + 2y\n",
-    "# Autograd does it automatically:\n",
-    "x = Variable(3.0, requires_grad=True)\n",
-    "y = Variable(4.0, requires_grad=True)\n",
-    "z = x**2 + 2*x*y + y**2\n",
-    "z.backward()\n",
-    "print(x.grad)  # 2*3 + 2*4 = 14 (computed automatically!)\n",
-    "```\n",
-    "\n",
-    "### Why This is Revolutionary\n",
-    "- **Efficiency**: One backward pass yields every gradient, at a small constant multiple of the forward-pass cost\n",
-    "- **Flexibility**: Works with any differentiable function\n",
-    "- **Correctness**: Implements chain rule precisely\n",
-    "- **Scale**: Handles millions of parameters automatically\n",
-    "\n",
-    "### Real-World Impact\n",
-    "- **PyTorch**: `torch.autograd` enables all neural network training\n",
-    "- **TensorFlow**: `tf.GradientTape` provides similar functionality\n",
-    "- **JAX**: `jax.grad` for high-performance computing\n",
-    "- **Deep Learning**: Made training complex models practical\n",
-    "\n",
-    "Let's build the engine that powers modern AI!"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d84fddab",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 1: The Variable Class - Gradient Tracking\n",
-    "\n",
-    "### What is a Variable?\n",
-    "A **Variable** wraps a Tensor and tracks:\n",
-    "- **Data**: The actual values (forward pass)\n",
-    "- **Gradient**: The computed gradients (backward pass)\n",
-    "- **Computation history**: How this Variable was created\n",
-    "- **Backward function**: How to compute gradients\n",
-    "\n",
-    "### The Computational Graph\n",
-    "Variables automatically build a computational graph:\n",
-    "\n",
-    "```python\n",
-    "x = Variable(2.0)  # Leaf node\n",
-    "y = Variable(3.0)  # Leaf node\n",
-    "z = x * y          # Intermediate node: z = x * y\n",
-    "w = z + 1          # Output node: w = z + 1\n",
-    "\n",
-    "# Graph: x ──┐\n",
-    "#        y ──┴──→ (*) ──→ z ──→ (+ 1) ──→ w\n",
-    "```\n",
-    "\n",
-    "### Design Principles\n",
-    "- **Transparency**: Works seamlessly with existing operations\n",
-    "- **Efficiency**: Minimal overhead for forward pass\n",
-    "- **Flexibility**: Supports any differentiable operation\n",
-    "- **Correctness**: Implements chain rule precisely\n",
-    "\n",
-    "### Real-World Context\n",
-    "This is like:\n",
-    "- **PyTorch**: `torch.autograd.Variable` (now integrated into tensors)\n",
-    "- **TensorFlow**: `tf.Variable` with gradient tracking\n",
-    "- **JAX**: `jax.grad` transforms functions instead of tracking gradient state on variables"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "1fa6d9eb",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "variable-class",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class Variable:\n",
-    "    \"\"\"\n",
-    "    Variable: Tensor wrapper with automatic differentiation capabilities.\n",
-    "    \n",
-    "    The fundamental class for gradient computation in TinyTorch.\n",
-    "    Wraps Tensor objects and tracks computational history for backpropagation.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, data: Union[Tensor, np.ndarray, list, float, int], \n",
-    "                 requires_grad: bool = True, grad_fn: Optional[Callable] = None):\n",
-    "        \"\"\"\n",
-    "        Create a Variable with gradient tracking.\n",
-    "            \n",
-    "        TODO: Implement Variable initialization with gradient tracking.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Convert data to Tensor if it's not already a Tensor\n",
-    "        2. Store the tensor data in self.data\n",
-    "        3. Set gradient tracking flag (requires_grad)\n",
-    "        4. Initialize gradient to None (will be computed during backward pass)\n",
-    "        5. Store the gradient function for backward pass\n",
-    "        6. Track if this is a leaf node (no grad_fn means it's a leaf)\n",
-    "        \n",
-    "        EXAMPLE USAGE:\n",
-    "        ```python\n",
-    "        # Create leaf variables (input data)\n",
-    "        x = Variable(5.0, requires_grad=True)\n",
-    "        y = Variable([1, 2, 3], requires_grad=True)\n",
-    "        \n",
-    "        # Create intermediate variables (results of operations)\n",
-    "        z = x + y  # Has grad_fn for addition\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use isinstance(data, Tensor) to check type\n",
-    "        - Convert with Tensor(data) if needed\n",
-    "        - Store requires_grad, grad_fn flags\n",
-    "        - Initialize self.grad = None\n",
-    "        - Leaf nodes have grad_fn = None\n",
-    "        - Set self.is_leaf = (grad_fn is None)\n",
-    "        \n",
-    "        LEARNING CONNECTIONS:\n",
-    "        - This is like torch.Tensor with requires_grad=True\n",
-    "        - Forms the basis for all neural network training\n",
-    "        - Each Variable is a node in the computational graph\n",
-    "        - Enables automatic gradient computation\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Convert data to Tensor if needed\n",
-    "        if isinstance(data, Tensor):\n",
-    "            self.data = data\n",
-    "        else:\n",
-    "            self.data = Tensor(data)\n",
-    "        \n",
-    "        # Set gradient tracking\n",
-    "        self.requires_grad = requires_grad\n",
-    "        self.grad = None  # Will be initialized when needed\n",
-    "        self.grad_fn = grad_fn\n",
-    "        self.is_leaf = grad_fn is None\n",
-    "        \n",
-    "        # For computational graph\n",
-    "        self._backward_hooks = []\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    @property\n",
-    "    def shape(self) -> Tuple[int, ...]:\n",
-    "        \"\"\"Get the shape of the underlying tensor.\"\"\"\n",
-    "        return self.data.shape\n",
-    "    \n",
-    "    @property\n",
-    "    def size(self) -> int:\n",
-    "        \"\"\"Get the total number of elements.\"\"\"\n",
-    "        return self.data.size\n",
-    "    \n",
-    "    def __repr__(self) -> str:\n",
-    "        \"\"\"String representation of the Variable.\"\"\"\n",
-    "        grad_str = f\", grad_fn={self.grad_fn.__name__}\" if self.grad_fn else \"\"\n",
-    "        return f\"Variable({self.data.data.tolist()}, requires_grad={self.requires_grad}{grad_str})\"\n",
-    "    \n",
-    "    def backward(self, gradient: Optional['Variable'] = None) -> None:\n",
-    "        \"\"\"\n",
-    "        Compute gradients using backpropagation.\n",
-    "        \n",
-    "        TODO: Implement backward pass for gradient computation.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. If gradient is None, create gradient of ones (for scalar outputs)\n",
-    "        2. If this Variable requires gradients, accumulate the gradient\n",
-    "        3. If this Variable has a grad_fn, call it to propagate gradients\n",
-    "        4. The grad_fn will recursively call backward on input Variables\n",
-    "        \n",
-    "        EXAMPLE USAGE:\n",
-    "        ```python\n",
-    "        x = Variable(2.0, requires_grad=True)\n",
-    "        y = Variable(3.0, requires_grad=True)\n",
-    "        z = add(x, y)  # z = 5.0\n",
-    "        z.backward()\n",
-    "        print(x.grad)  # 1.0 (∂z/∂x = 1)\n",
-    "        print(y.grad)  # 1.0 (∂z/∂y = 1)\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - If gradient is None: gradient = Variable(np.ones_like(self.data.data))\n",
-    "        - If self.requires_grad: accumulate gradient into self.grad\n",
-    "        - If self.grad_fn: call self.grad_fn(gradient)\n",
-    "        - Handle gradient accumulation (add to existing gradient)\n",
-    "        \n",
-    "        LEARNING CONNECTIONS:\n",
-    "        - This implements the chain rule of calculus\n",
-    "        - Gradients flow backward through the computational graph\n",
-    "        - Each operation contributes its local gradient\n",
-    "        - Enables training of any differentiable function\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        if gradient is None:\n",
-    "            gradient = Variable(np.ones_like(self.data.data))\n",
-    "        \n",
-    "        if self.requires_grad:\n",
-    "            if self.grad is None:\n",
-    "                self.grad = gradient\n",
-    "            else:\n",
-    "                # Accumulate gradients\n",
-    "                self.grad = Variable(self.grad.data.data + gradient.data.data)\n",
-    "        \n",
-    "            if self.grad_fn is not None:\n",
-    "                self.grad_fn(gradient)\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def zero_grad(self) -> None:\n",
-    "        \"\"\"Reset gradients to zero.\"\"\"\n",
-    "        self.grad = None\n",
-    "    \n",
-    "    def __add__(self, other: Union['Variable', float, int]) -> 'Variable':\n",
-    "        \"\"\"Addition operator: self + other\"\"\"\n",
-    "        return add(self, other)\n",
-    "    \n",
-    "    def __mul__(self, other: Union['Variable', float, int]) -> 'Variable':\n",
-    "        \"\"\"Multiplication operator: self * other\"\"\"\n",
-    "        return multiply(self, other)\n",
-    "    \n",
-    "    def __sub__(self, other: Union['Variable', float, int]) -> 'Variable':\n",
-    "        \"\"\"Subtraction operator: self - other\"\"\"\n",
-    "        return subtract(self, other)\n",
-    "    \n",
-    "    def __truediv__(self, other: Union['Variable', float, int]) -> 'Variable':\n",
-    "        \"\"\"Division operator: self / other\"\"\"\n",
-    "        return divide(self, other) "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "08ca2dfa",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Test Your Variable Class\n",
-    "\n",
-    "Once you implement the Variable class above, run this cell to test it:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "81162152",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-variable-class",
-     "locked": true,
-     "points": 15,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_variable_class():\n",
-    "    \"\"\"Test Variable class implementation\"\"\"\n",
-    "    print(\"🔬 Unit Test: Variable Class...\")\n",
-    "    \n",
-    "    # Test Variable creation\n",
-    "    x = Variable(5.0, requires_grad=True)\n",
-    "    assert x.requires_grad == True, \"Variable should require gradients\"\n",
-    "    assert x.is_leaf == True, \"Variable should be a leaf node\"\n",
-    "    assert x.grad is None, \"Gradient should be None initially\"\n",
-    "    \n",
-    "    # Test data access\n",
-    "    assert x.data.data.item() == 5.0, \"Data should be accessible\"\n",
-    "    assert x.shape == (), \"Scalar should have empty shape\"\n",
-    "    assert x.size == 1, \"Scalar should have size 1\"\n",
-    "    \n",
-    "    # Test with list input\n",
-    "    y = Variable([1, 2, 3], requires_grad=True)\n",
-    "    assert y.shape == (3,), \"List should create 1D tensor\"\n",
-    "    assert y.size == 3, \"Size should be 3\"\n",
-    "    \n",
-    "    # Test with requires_grad=False\n",
-    "    z = Variable(10.0, requires_grad=False)\n",
-    "    assert z.requires_grad == False, \"Should not require gradients\"\n",
-    "    \n",
-    "    # Test zero_grad\n",
-    "    x.grad = Variable(1.0)\n",
-    "    x.zero_grad()\n",
-    "    assert x.grad is None, \"zero_grad should reset gradient to None\"\n",
-    "    \n",
-    "    print(\"✅ Variable class tests passed!\")\n",
-    "    print(f\"✅ Variable creation and initialization working\")\n",
-    "    print(f\"✅ Data access and properties working\")\n",
-    "    print(f\"✅ Gradient management working\")\n",
-    "\n",
-    "# Run inline tests when module is executed directly\n",
-    "if __name__ == \"__main__\":\n",
-    "    test_variable_class()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "60f90c39",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 2: Basic Operations with Gradients\n",
-    "\n",
-    "### The Chain Rule in Action\n",
-    "Every operation must implement:\n",
-    "1. **Forward pass**: Compute the result\n",
-    "2. **Backward pass**: Compute gradients for inputs\n",
-    "\n",
-    "### Example: Addition\n",
-    "For z = x + y:\n",
-    "- **Forward**: z.data = x.data + y.data\n",
-    "- **Backward**: ∂z/∂x = 1, ∂z/∂y = 1\n",
-    "\n",
-    "### Mathematical Foundation\n",
-    "The chain rule states:\n",
-    "```\n",
-    "∂f/∂x = ∂f/∂z · ∂z/∂x\n",
-    "```\n",
-    "\n",
-    "For complex expressions like f(g(h(x))):\n",
-    "```\n",
-    "∂f/∂x = ∂f/∂g · ∂g/∂h · ∂h/∂x\n",
-    "```\n",
-    "\n",
-    "### Implementation Pattern\n",
-    "Each operation returns a new Variable with:\n",
-    "- **Forward result**: Computed value\n",
-    "- **Backward function**: Gradient computation"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "68856b9c",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "add-operation",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def add(a: Union[Variable, float, int], b: Union[Variable, float, int]) -> Variable:\n",
-    "    \"\"\"\n",
-    "    Addition operation with gradient tracking: a + b\n",
-    "    \n",
-    "    TODO: Implement addition with automatic differentiation.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Convert inputs to Variables if they're scalars\n",
-    "    2. Compute forward pass: result = a.data + b.data\n",
-    "    3. Create gradient function that implements: ∂(a+b)/∂a = 1, ∂(a+b)/∂b = 1\n",
-    "    4. Return new Variable with result and gradient function\n",
-    "    \n",
-    "    MATHEMATICAL FOUNDATION:\n",
-    "    - Forward: z = x + y\n",
-    "    - Backward: ∂z/∂x = 1, ∂z/∂y = 1\n",
-    "    - Chain rule: ∂L/∂x = ∂L/∂z · ∂z/∂x = ∂L/∂z · 1 = ∂L/∂z\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    x = Variable(2.0, requires_grad=True)\n",
-    "    y = Variable(3.0, requires_grad=True)\n",
-    "    z = add(x, y)  # z = 5.0\n",
-    "    z.backward()\n",
-    "    print(x.grad)  # 1.0 (∂z/∂x = 1)\n",
-    "    print(y.grad)  # 1.0 (∂z/∂y = 1)\n",
-    "    ```\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Convert scalars: if isinstance(a, (int, float)): a = Variable(a, requires_grad=False)\n",
-    "    - Forward pass: result_data = a.data + b.data\n",
-    "    - Backward function: def grad_fn(grad_output): if a.requires_grad: a.backward(grad_output)\n",
-    "    - Return: Variable(result_data, requires_grad=a.requires_grad or b.requires_grad, grad_fn=grad_fn)\n",
-    "    - Only propagate gradients to Variables that require them\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - This is like torch.add() with autograd\n",
-    "    - Addition passes the incoming gradient unchanged to both inputs\n",
-    "    - Forms the basis for bias addition in neural networks\n",
-    "    - Chain rule propagates gradients through the graph\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Convert scalars to Variables\n",
-    "    if isinstance(a, (int, float)):\n",
-    "        a = Variable(a, requires_grad=False)\n",
-    "    if isinstance(b, (int, float)):\n",
-    "        b = Variable(b, requires_grad=False)\n",
-    "    \n",
-    "    # Forward pass\n",
-    "    result_data = a.data + b.data\n",
-    "    \n",
-    "    # Backward function\n",
-    "    def grad_fn(grad_output):\n",
-    "        # Addition distributes gradients equally\n",
-    "        if a.requires_grad:\n",
-    "            a.backward(grad_output)\n",
-    "        if b.requires_grad:\n",
-    "            b.backward(grad_output)\n",
-    "    \n",
-    "    # Return new Variable with gradient function\n",
-    "    requires_grad = a.requires_grad or b.requires_grad\n",
-    "    return Variable(result_data, requires_grad=requires_grad, grad_fn=grad_fn)\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "61d7404f",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Test Your Addition Operation\n",
-    "\n",
-    "Once you implement the add function above, run this cell to test it:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "42055ba9",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-add-operation",
-     "locked": true,
-     "points": 15,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_add_operation():\n",
-    "    \"\"\"Test addition operation with gradients\"\"\"\n",
-    "    print(\"🔬 Unit Test: Addition Operation...\")\n",
-    "    \n",
-    "    # Test basic addition\n",
-    "    x = Variable(2.0, requires_grad=True)\n",
-    "    y = Variable(3.0, requires_grad=True)\n",
-    "    z = add(x, y)\n",
-    "    \n",
-    "    assert z.data.data.item() == 5.0, \"Addition result should be 5.0\"\n",
-    "    assert z.requires_grad == True, \"Result should require gradients\"\n",
-    "    assert z.is_leaf == False, \"Result should not be a leaf node\"\n",
-    "    \n",
-    "    # Test backward pass\n",
-    "    z.backward()\n",
-    "    \n",
-    "    assert x.grad is not None, \"x should have gradient\"\n",
-    "    assert y.grad is not None, \"y should have gradient\"\n",
-    "    assert x.grad.data.data.item() == 1.0, \"∂z/∂x should be 1.0\"\n",
-    "    assert y.grad.data.data.item() == 1.0, \"∂z/∂y should be 1.0\"\n",
-    "    \n",
-    "    # Test with scalar\n",
-    "    a = Variable(5.0, requires_grad=True)\n",
-    "    b = add(a, 3.0)  # Add scalar\n",
-    "    \n",
-    "    assert b.data.data.item() == 8.0, \"Addition with scalar should work\"\n",
-    "    \n",
-    "    b.backward()\n",
-    "    assert a.grad.data.data.item() == 1.0, \"Gradient through scalar addition should be 1.0\"\n",
-    "    \n",
-    "    print(\"✅ Addition operation tests passed!\")\n",
-    "    print(f\"✅ Forward pass computing correct results\")\n",
-    "    print(f\"✅ Backward pass computing correct gradients\")\n",
-    "    print(f\"✅ Scalar addition working correctly\")\n",
-    "\n",
-    "# Run inline tests when module is executed directly\n",
-    "if __name__ == \"__main__\":\n",
-    "    test_add_operation()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "16253bbe",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 3: Multiplication Operation\n",
-    "\n",
-    "### The Product Rule\n",
-    "For z = x * y:\n",
-    "- **Forward**: z = x * y\n",
-    "- **Backward**: ∂z/∂x = y, ∂z/∂y = x\n",
-    "\n",
-    "### Why This Matters\n",
-    "Multiplication is everywhere in neural networks:\n",
-    "- **Weight scaling**: w * x in dense layers\n",
-    "- **Attention mechanisms**: attention_weights * values\n",
-    "- **Gating**: gate_signal * hidden_state\n",
-    "\n",
-    "### Chain Rule Application\n",
-    "When gradients flow back through multiplication:\n",
-    "```\n",
-    "∂L/∂x = ∂L/∂z · ∂z/∂x = ∂L/∂z · y\n",
-    "∂L/∂y = ∂L/∂z · ∂z/∂y = ∂L/∂z · x\n",
-    "```"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "843201ea",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "multiply-operation",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def multiply(a: Union[Variable, float, int], b: Union[Variable, float, int]) -> Variable:\n",
-    "    \"\"\"\n",
-    "    Multiplication operation with gradient tracking: a * b\n",
-    "    \n",
-    "    TODO: Implement multiplication with automatic differentiation.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Convert inputs to Variables if they're scalars\n",
-    "    2. Compute forward pass: result = a.data * b.data\n",
-    "    3. Create gradient function implementing product rule: ∂(a*b)/∂a = b, ∂(a*b)/∂b = a\n",
-    "    4. Return new Variable with result and gradient function\n",
-    "    \n",
-    "    MATHEMATICAL FOUNDATION:\n",
-    "    - Forward: z = x * y\n",
-    "    - Backward: ∂z/∂x = y, ∂z/∂y = x\n",
-    "    - Chain rule: ∂L/∂x = ∂L/∂z · y, ∂L/∂y = ∂L/∂z · x\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    x = Variable(2.0, requires_grad=True)\n",
-    "    y = Variable(3.0, requires_grad=True)\n",
-    "    z = multiply(x, y)  # z = 6.0\n",
-    "    z.backward()\n",
-    "    print(x.grad)  # 3.0 (∂z/∂x = y)\n",
-    "    print(y.grad)  # 2.0 (∂z/∂y = x)\n",
-    "    ```\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Convert scalars to Variables (same as addition)\n",
-    "    - Forward pass: result_data = a.data * b.data\n",
-    "    - Backward function: multiply the incoming gradient by the other input's value\n",
-    "    - For a: a.backward(Variable(grad_output.data.data * b.data.data))\n",
-    "    - For b: b.backward(Variable(grad_output.data.data * a.data.data))\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - This is like torch.mul() with autograd\n",
-    "    - Product rule is fundamental to backpropagation\n",
-    "    - Used in weight updates and attention mechanisms\n",
-    "    - Each input's gradient depends on the other input's value\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Convert scalars to Variables\n",
-    "    if isinstance(a, (int, float)):\n",
-    "        a = Variable(a, requires_grad=False)\n",
-    "    if isinstance(b, (int, float)):\n",
-    "        b = Variable(b, requires_grad=False)\n",
-    "    \n",
-    "    # Forward pass\n",
-    "    result_data = a.data * b.data\n",
-    "    \n",
-    "    # Backward function\n",
-    "    def grad_fn(grad_output):\n",
-    "        # Product rule: d(xy)/dx = y, d(xy)/dy = x\n",
-    "        if a.requires_grad:\n",
-    "            a.backward(Variable(grad_output.data.data * b.data.data))\n",
-    "        if b.requires_grad:\n",
-    "            b.backward(Variable(grad_output.data.data * a.data.data))\n",
-    "    \n",
-    "    # Return new Variable with gradient function\n",
-    "    requires_grad = a.requires_grad or b.requires_grad\n",
-    "    return Variable(result_data, requires_grad=requires_grad, grad_fn=grad_fn)\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0d9831c4",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Test Your Multiplication Operation\n",
-    "\n",
-    "Once you implement the multiply function above, run this cell to test it:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "86a17baf",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-multiply-operation",
-     "locked": true,
-     "points": 15,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_multiply_operation():\n",
-    "    \"\"\"Test multiplication operation with gradients\"\"\"\n",
-    "    print(\"🔬 Unit Test: Multiplication Operation...\")\n",
-    "    \n",
-    "    # Test basic multiplication\n",
-    "    x = Variable(2.0, requires_grad=True)\n",
-    "    y = Variable(3.0, requires_grad=True)\n",
-    "    z = multiply(x, y)\n",
-    "    \n",
-    "    assert z.data.data.item() == 6.0, \"Multiplication result should be 6.0\"\n",
-    "    assert z.requires_grad == True, \"Result should require gradients\"\n",
-    "    \n",
-    "    # Test backward pass\n",
-    "    z.backward()\n",
-    "    \n",
-    "    assert x.grad is not None, \"x should have gradient\"\n",
-    "    assert y.grad is not None, \"y should have gradient\"\n",
-    "    assert x.grad.data.data.item() == 3.0, \"∂z/∂x should be y = 3.0\"\n",
-    "    assert y.grad.data.data.item() == 2.0, \"∂z/∂y should be x = 2.0\"\n",
-    "    \n",
-    "    # Test with scalar\n",
-    "    a = Variable(4.0, requires_grad=True)\n",
-    "    b = multiply(a, 2.0)  # Multiply by scalar\n",
-    "    \n",
-    "    assert b.data.data.item() == 8.0, \"Multiplication with scalar should work\"\n",
-    "    \n",
-    "    b.backward()\n",
-    "    assert a.grad.data.data.item() == 2.0, \"Gradient through scalar multiplication should be the scalar\"\n",
-    "    \n",
-    "    print(\"✅ Multiplication operation tests passed!\")\n",
-    "    print(f\"✅ Forward pass computing correct results\")\n",
-    "    print(f\"✅ Backward pass implementing product rule correctly\")\n",
-    "    print(f\"✅ Scalar multiplication working correctly\")\n",
-    "\n",
-    "# Run inline tests when module is executed directly\n",
-    "if __name__ == \"__main__\":\n",
-    "    test_multiply_operation()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "364f201e",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "subtract-operation",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def subtract(a: Union[Variable, float, int], b: Union[Variable, float, int]) -> Variable:\n",
-    "    \"\"\"\n",
-    "    Subtraction operation with gradient tracking.\n",
-    "    \n",
-    "    Args:\n",
-    "        a: First operand (minuend)\n",
-    "        b: Second operand (subtrahend)\n",
-    "        \n",
-    "    Returns:\n",
-    "        Variable with difference and gradient function\n",
-    "        \n",
-    "    TODO: Implement subtraction with gradient computation.\n",
-    "    \n",
-    "    APPROACH:\n",
-    "    1. Convert inputs to Variables if needed\n",
-    "    2. Compute forward pass: result = a - b\n",
-    "    3. Create gradient function with correct signs\n",
-    "    4. Return Variable with result and grad_fn\n",
-    "    \n",
-    "    MATHEMATICAL RULE:\n",
-    "    If z = x - y, then dz/dx = 1, dz/dy = -1\n",
-    "    \n",
-    "    EXAMPLE:\n",
-    "    x = Variable(5.0, requires_grad=True)\n",
-    "    y = Variable(3.0, requires_grad=True)\n",
-    "    z = subtract(x, y)  # z.data = 2.0\n",
-    "    z.backward()        # x.grad = 1.0, y.grad = -1.0\n",
-    "    \n",
-    "    HINTS:\n",
-    "    - Forward pass is straightforward: a - b\n",
-    "    - Gradient for a is positive, for b is negative\n",
-    "    - Remember to negate the gradient for b\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Convert to Variables if needed\n",
-    "    if not isinstance(a, Variable):\n",
-    "        a = Variable(a, requires_grad=False)\n",
-    "    if not isinstance(b, Variable):\n",
-    "        b = Variable(b, requires_grad=False)\n",
-    "    \n",
-    "    # Forward pass\n",
-    "    result_data = a.data - b.data\n",
-    "    \n",
-    "    # Create gradient function\n",
-    "    def grad_fn(grad_output):\n",
-    "        # Subtraction rule: d(x-y)/dx = 1, d(x-y)/dy = -1\n",
-    "        if a.requires_grad:\n",
-    "            a.backward(grad_output)\n",
-    "        if b.requires_grad:\n",
-    "            b_grad = Variable(-grad_output.data.data)\n",
-    "            b.backward(b_grad)\n",
-    "    \n",
-    "    # Determine if result requires gradients\n",
-    "    requires_grad = a.requires_grad or b.requires_grad\n",
-    "    \n",
-    "    return Variable(result_data, requires_grad=requires_grad, grad_fn=grad_fn)\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "b113ec95",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-subtract-operation",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_subtract_operation():\n",
-    "    \"\"\"Test subtraction operation with gradients\"\"\"\n",
-    "    print(\"🔬 Unit Test: Subtraction Operation...\")\n",
-    "    \n",
-    "    # Test basic subtraction\n",
-    "    x = Variable(5.0, requires_grad=True)\n",
-    "    y = Variable(3.0, requires_grad=True)\n",
-    "    z = subtract(x, y)\n",
-    "    \n",
-    "    assert z.data.data.item() == 2.0, \"Subtraction result should be 2.0\"\n",
-    "    assert z.requires_grad == True, \"Result should require gradients\"\n",
-    "    \n",
-    "    # Test backward pass\n",
-    "    z.backward()\n",
-    "    \n",
-    "    assert x.grad is not None, \"x should have gradient\"\n",
-    "    assert y.grad is not None, \"y should have gradient\"\n",
-    "    assert x.grad.data.data.item() == 1.0, \"∂z/∂x should be 1.0\"\n",
-    "    assert y.grad.data.data.item() == -1.0, \"∂z/∂y should be -1.0\"\n",
-    "    \n",
-    "    # Test with scalar\n",
-    "    a = Variable(4.0, requires_grad=True)\n",
-    "    b = subtract(a, 2.0)  # Subtract scalar\n",
-    "    \n",
-    "    assert b.data.data.item() == 2.0, \"Subtraction with scalar should work\"\n",
-    "    \n",
-    "    b.backward()\n",
-    "    assert a.grad.data.data.item() == 1.0, \"Gradient through scalar subtraction should be 1.0\"\n",
-    "    \n",
-    "    print(\"✅ Subtraction operation tests passed!\")\n",
-    "    print(f\"✅ Forward pass computing correct results\")\n",
-    "    print(f\"✅ Backward pass implementing subtraction rule correctly\")\n",
-    "    print(f\"✅ Scalar subtraction working correctly\")\n",
-    "\n",
-    "# Run inline tests when module is executed directly\n",
-    "if __name__ == \"__main__\":\n",
-    "    test_subtract_operation()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ac463d8c",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 4: Chain Rule in Complex Expressions\n",
-    "\n",
-    "### Building Complex Computations\n",
-    "Now let's test how multiple operations work together through the chain rule:\n",
-    "\n",
-    "### Example: f(x, y) = (x + y) * (x - y)\n",
-    "This creates a computational graph:\n",
-    "```\n",
-    "x, y ──→ (x + y) ──┐\n",
-    "                   ├──→ * ──→ result\n",
-    "x, y ──→ (x - y) ──┘\n",
-    "```\n",
-    "\n",
-    "### Chain Rule Application\n",
-    "- **Forward**: Compute each operation in sequence\n",
-    "- **Backward**: Gradients flow back through each operation\n",
-    "- **Automatic**: No manual gradient computation needed!\n",
-    "\n",
-    "### Real-World Significance\n",
-    "Complex neural networks are just larger versions of this:\n",
-    "- **Millions of operations**: Each tracked automatically\n",
-    "- **Complex architectures**: ResNet, Transformer, etc.\n",
-    "- **Efficient computation**: O(1) overhead per operation"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "304b4351",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-chain-rule",
-     "locked": true,
-     "points": 20,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_chain_rule():\n",
-    "    \"\"\"Test chain rule with complex expressions\"\"\"\n",
-    "    print(\"🔬 Unit Test: Chain Rule with Complex Expressions...\")\n",
-    "    \n",
-    "    # Test: f(x, y) = (x + y) * (x - y) = x² - y²\n",
-    "    x = Variable(3.0, requires_grad=True)\n",
-    "    y = Variable(2.0, requires_grad=True)\n",
-    "    \n",
-    "    # Build expression step by step\n",
-    "    sum_xy = add(x, y)      # x + y = 5.0\n",
-    "    diff_xy = subtract(x, y) # x - y = 1.0\n",
-    "    result = multiply(sum_xy, diff_xy)  # (x + y) * (x - y) = 5.0\n",
-    "    \n",
-    "    # Check forward pass\n",
-    "    assert result.data.data.item() == 5.0, \"Forward pass should compute 5.0\"\n",
-    "    \n",
-    "    # Compute gradients\n",
-    "    result.backward()\n",
-    "    \n",
-    "    # Check gradients: ∂(x²-y²)/∂x = 2x, ∂(x²-y²)/∂y = -2y\n",
-    "    expected_x_grad = 2 * x.data.data.item()  # 2 * 3 = 6\n",
-    "    expected_y_grad = -2 * y.data.data.item()  # -2 * 2 = -4\n",
-    "    \n",
-    "    assert abs(x.grad.data.data.item() - expected_x_grad) < 1e-6, f\"x gradient should be {expected_x_grad}\"\n",
-    "    assert abs(y.grad.data.data.item() - expected_y_grad) < 1e-6, f\"y gradient should be {expected_y_grad}\"\n",
-    "    \n",
-    "    # Test more complex expression: f(x) = (x + 1) * (x + 2) * (x + 3)\n",
-    "    x2 = Variable(1.0, requires_grad=True)\n",
-    "    \n",
-    "    term1 = add(x2, 1.0)    # x + 1 = 2.0\n",
-    "    term2 = add(x2, 2.0)    # x + 2 = 3.0\n",
-    "    term3 = add(x2, 3.0)    # x + 3 = 4.0\n",
-    "    \n",
-    "    product1 = multiply(term1, term2)  # (x + 1) * (x + 2) = 6.0\n",
-    "    result2 = multiply(product1, term3)  # * (x + 3) = 24.0\n",
-    "    \n",
-    "    assert result2.data.data.item() == 24.0, \"Complex expression should compute 24.0\"\n",
-    "    \n",
-    "    result2.backward()\n",
-    "    \n",
-    "    # For f(x) = (x+1)(x+2)(x+3), f'(x) = 3x² + 12x + 11\n",
-    "    # At x=1: f'(1) = 3 + 12 + 11 = 26\n",
-    "    expected_grad = 3 * (1.0**2) + 12 * 1.0 + 11  # 26\n",
-    "    \n",
-    "    assert abs(x2.grad.data.data.item() - expected_grad) < 1e-6, f\"Complex gradient should be {expected_grad}\"\n",
-    "    \n",
-    "    print(\"✅ Chain rule tests passed!\")\n",
-    "    print(f\"✅ Simple expression: (x+y)*(x-y) = x²-y²\")\n",
-    "    print(f\"✅ Complex expression: (x+1)*(x+2)*(x+3)\")\n",
-    "    print(f\"✅ Automatic gradient computation working correctly\")\n",
-    "    print(f\"✅ Chain rule implemented correctly\")\n",
-    "\n",
-    "# Run inline tests when module is executed directly\n",
-    "if __name__ == \"__main__\":\n",
-    "    test_chain_rule()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a8e2d912",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 5: Integration with Neural Network Training\n",
-    "\n",
-    "### The Complete Training Loop\n",
-    "Let's see how autograd enables neural network training:\n",
-    "\n",
-    "1. **Forward pass**: Compute predictions\n",
-    "2. **Loss computation**: Compare with targets\n",
-    "3. **Backward pass**: Compute gradients automatically\n",
-    "4. **Parameter update**: Update weights using gradients\n",
-    "\n",
-    "### Example: Simple Linear Regression\n",
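-    "```python\n",
-    "# Model: y = wx + b\n",
-    "w = Variable(0.5, requires_grad=True)\n",
-    "b = Variable(0.1, requires_grad=True)\n",
-    "\n",
-    "# One training example and a step size (illustrative values)\n",
-    "x = Variable(2.0, requires_grad=False)\n",
-    "target = Variable(5.0, requires_grad=False)\n",
-    "learning_rate = 0.01\n",
-    "\n",
-    "# Forward pass\n",
-    "prediction = w * x + b\n",
-    "\n",
-    "# Loss: squared error for this example\n",
-    "error = prediction - target\n",
-    "loss = error * error\n",
-    "\n",
-    "# Backward pass (automatic!)\n",
-    "loss.backward()\n",
-    "\n",
-    "# Update parameters\n",
-    "w.data = Tensor(w.data.data - learning_rate * w.grad.data.data)\n",
-    "b.data = Tensor(b.data.data - learning_rate * b.grad.data.data)\n",
-    "```\n",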
-    "\n",
-    "### Why This is Powerful\n",
-    "- **Automatic**: No manual gradient computation\n",
-    "- **Flexible**: Works with any differentiable function\n",
-    "- **Efficient**: Minimal computational overhead\n",
-    "- **Scalable**: Handles millions of parameters"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c26b9fd2",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-neural-network-training",
-     "locked": true,
-     "points": 25,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_neural_network_training():\n",
-    "    \"\"\"Test autograd in neural network training scenario\"\"\"\n",
-    "    print(\"🔬 Unit Test: Neural Network Training Comprehensive Test...\")\n",
-    "    \n",
-    "    # Simple linear regression: y = wx + b\n",
-    "    # Training data: y = 2x + 1 (noise-free)\n",
-    "    \n",
-    "    # Initialize parameters\n",
-    "    w = Variable(0.1, requires_grad=True)  # Start with a small initial value\n",
-    "    b = Variable(0.0, requires_grad=True)  # Start with zero bias\n",
-    "    \n",
-    "    # Training data\n",
-    "    x_data = [1.0, 2.0, 3.0, 4.0]\n",
-    "    y_data = [3.0, 5.0, 7.0, 9.0]  # y = 2x + 1\n",
-    "    \n",
-    "    learning_rate = 0.01\n",
-    "    \n",
-    "    # Training loop\n",
-    "    for epoch in range(100):\n",
-    "        total_loss = Variable(0.0)\n",
-    "        \n",
-    "        for x_val, y_val in zip(x_data, y_data):\n",
-    "            # Create input variable\n",
-    "            x = Variable(x_val, requires_grad=False)\n",
-    "            target = Variable(y_val, requires_grad=False)\n",
-    "            \n",
-    "            # Forward pass\n",
-    "            prediction = add(multiply(w, x), b)  # wx + b\n",
-    "            \n",
-    "            # Loss: squared error\n",
-    "            error = subtract(prediction, target)\n",
-    "            loss = multiply(error, error)  # (pred - target)²\n",
-    "            \n",
-    "            # Accumulate loss\n",
-    "            total_loss = add(total_loss, loss)\n",
-    "        \n",
-    "        # Backward pass\n",
-    "        w.zero_grad()\n",
-    "        b.zero_grad()\n",
-    "        total_loss.backward()\n",
-    "        \n",
-    "        # Update parameters\n",
-    "        if w.grad is not None:\n",
-    "            w.data = Tensor(w.data.data - learning_rate * w.grad.data.data)\n",
-    "        if b.grad is not None:\n",
-    "            b.data = Tensor(b.data.data - learning_rate * b.grad.data.data)\n",
-    "    \n",
-    "    # Check that parameters converged to correct values\n",
-    "    final_w = w.data.data.item()\n",
-    "    final_b = b.data.data.item()\n",
-    "    \n",
-    "    print(f\"Final weights: w = {final_w:.3f}, b = {final_b:.3f}\")\n",
-    "    print(\"Target weights: w = 2.000, b = 1.000\")\n",
-    "    \n",
-    "    # Should be close to w=2, b=1\n",
-    "    assert abs(final_w - 2.0) < 0.1, f\"Weight should be close to 2.0, got {final_w}\"\n",
-    "    assert abs(final_b - 1.0) < 0.1, f\"Bias should be close to 1.0, got {final_b}\"\n",
-    "    \n",
-    "    # Test prediction with learned parameters\n",
-    "    test_x = Variable(5.0, requires_grad=False)\n",
-    "    test_prediction = add(multiply(w, test_x), b)\n",
-    "    expected_output = 2.0 * 5.0 + 1.0  # 11.0\n",
-    "    \n",
-    "    prediction_error = abs(test_prediction.data.data.item() - expected_output)\n",
-    "    assert prediction_error < 0.5, f\"Prediction error should be small, got {prediction_error}\"\n",
-    "    \n",
-    "    print(\"✅ Neural network training comprehensive tests passed!\")\n",
-    "    print(\"✅ Parameters converged to correct values\")\n",
-    "    print(\"✅ Model makes accurate predictions\")\n",
-    "    print(\"✅ Autograd enables automatic training\")\n",
-    "    print(\"✅ Ready for complex neural network architectures!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "6ecb7aca",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧪 Module Testing\n",
-    "\n",
-    "Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
-    "\n",
-    "**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "7cbb2c8b",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "standardized-testing",
-     "locked": true,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# =============================================================================\n",
-    "# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
-    "# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
-    "# =============================================================================\n",
-    "\n",
-    "if __name__ == \"__main__\":\n",
-    "    from tito.tools.testing import run_module_tests_auto\n",
-    "    \n",
-    "    # Automatically discover and run all tests in this module\n",
-    "    success = run_module_tests_auto(\"Autograd\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e3d40178",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🎯 Module Summary: Automatic Differentiation Mastery!\n",
-    "\n",
-    "Congratulations! You've successfully implemented the automatic differentiation engine that powers all modern deep learning:\n",
-    "\n",
-    "### ✅ What You've Built\n",
-    "- **Variable Class**: Tensor wrapper with gradient tracking and computational graph construction\n",
-    "- **Automatic Differentiation**: Forward and backward pass implementation\n",
-    "- **Basic Operations**: Addition and multiplication with proper gradient computation\n",
-    "- **Chain Rule**: Automatic gradient flow through complex expressions\n",
-    "- **Training Integration**: Complete neural network training with automatic gradients\n",
-    "\n",
-    "### ✅ Key Learning Outcomes\n",
-    "- **Understanding**: How automatic differentiation works through computational graphs\n",
-    "- **Implementation**: Built the gradient engine from scratch\n",
-    "- **Mathematical mastery**: Chain rule, product rule, and gradient computation\n",
-    "- **Real-world application**: Saw how autograd enables neural network training\n",
-    "- **Systems thinking**: Understanding the foundation of modern AI systems\n",
-    "\n",
-    "### ✅ Mathematical Foundations Mastered\n",
-    "- **Chain Rule**: ∂f/∂x = ∂f/∂z · ∂z/∂x for composite functions\n",
-    "- **Product Rule**: ∂(xy)/∂x = y, ∂(xy)/∂y = x for multiplication\n",
-    "- **Gradient Accumulation**: Handling multiple paths to the same variable\n",
-    "- **Computational Graphs**: Forward pass builds graph, backward pass computes gradients\n",
-    "\n",
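-    "As a worked instance of the chain rule, consider the squared-error loss for a single sample, as used in the training test earlier in this notebook. The starting values (w = 0.1, b = 0) and the first sample (x = 1, y = 3) come from that test; the arithmetic is only an illustrative sketch:\n",
-    "\n",
-    "$$\n",
-    "L = (wx + b - y)^2, \\qquad \\frac{\\partial L}{\\partial w} = 2(wx + b - y)\\,x, \\qquad \\frac{\\partial L}{\\partial b} = 2(wx + b - y)\n",
-    "$$\n",
-    "\n",
-    "$$\n",
-    "wx + b - y = 0.1 \\cdot 1 + 0 - 3 = -2.9 \\;\\Rightarrow\\; \\frac{\\partial L}{\\partial w} = -5.8, \\quad \\frac{\\partial L}{\\partial b} = -5.8\n",
-    "$$\n",
-    "\n",
-    "Both gradients are negative, so a gradient descent step increases w and b, moving them toward the targets w = 2 and b = 1.\n",
-    "\n",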
-    "### ✅ Professional Skills Developed\n",
-    "- **Systems architecture**: Designed a scalable gradient computation system\n",
-    "- **Memory management**: Efficient gradient storage and computation\n",
-    "- **API design**: Clean interfaces for automatic differentiation\n",
-    "- **Testing methodology**: Comprehensive validation of gradient computation\n",
-    "\n",
-    "### ✅ Ready for Advanced Applications\n",
-    "Your autograd engine now enables:\n",
-    "- **Deep Neural Networks**: Automatic gradient computation for any architecture\n",
-    "- **Optimization**: Gradient-based parameter updates\n",
-    "- **Complex Models**: Transformers, ResNets, any differentiable model\n",
-    "- **Research**: Foundation for experimenting with new architectures\n",
-    "\n",
-    "### 🔗 Connection to Real ML Systems\n",
-    "Your implementation mirrors production systems:\n",
-    "- **PyTorch**: `torch.autograd` is built on the same reverse-mode differentiation over a computational graph\n",
-    "- **TensorFlow**: `tf.GradientTape` implements similar concepts\n",
-    "- **JAX**: `jax.grad` for high-performance automatic differentiation\n",
-    "- **Industry Standard**: Every major ML framework builds on these same principles\n",
-    "\n",
-    "### 🎯 The Power of Automatic Differentiation\n",
-    "You've unlocked the key technology that made modern AI possible:\n",
-    "- **Scalability**: Handles millions of parameters automatically\n",
-    "- **Flexibility**: Works with any differentiable function\n",
-    "- **Efficiency**: One backward pass yields gradients for all parameters at a cost comparable to the forward pass\n",
-    "- **Universality**: Enables training of any neural network architecture\n",
-    "\n",
-    "### 🧠 Deep Learning Revolution\n",
-    "You now understand the technology that revolutionized AI:\n",
-    "- **Before autograd**: Manual gradient computation limited model complexity\n",
-    "- **After autograd**: Automatic gradients enabled the deep learning revolution\n",
-    "- **Modern AI**: GPT, BERT, ResNet all rely on automatic differentiation\n",
-    "- **Future**: Your understanding enables you to build next-generation AI systems\n",
-    "\n",
-    "### 🚀 What's Next\n",
-    "Your autograd engine is the foundation for:\n",
-    "- **Optimizers**: SGD, Adam, and other gradient-based optimizers\n",
-    "- **Training Loops**: Complete neural network training systems\n",
-    "- **Advanced Architectures**: Transformers, GANs, and more complex models\n",
-    "- **Research**: Experimenting with new differentiable algorithms\n",
-    "\n",
-    "**Next Module**: Advanced training systems, optimizers, and complete neural network architectures!\n",
-    "\n",
-    "You've built the engine that powers modern AI. Now let's use it to train intelligent systems that can learn to solve complex problems!"
-   ]
-  }
- ],
- "metadata": {
-  "jupytext": {
-   "main_language": "python"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/modules/source/10_optimizers/optimizers_dev.ipynb b/modules/source/10_optimizers/optimizers_dev.ipynb
deleted file mode 100644
index 2eb24c60..00000000
--- a/modules/source/10_optimizers/optimizers_dev.ipynb
+++ /dev/null
@@ -1,1789 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "1a7b8957",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "# Optimizers - Gradient-Based Parameter Updates\n",
-    "\n",
-    "Welcome to the Optimizers module! This is where neural networks learn to improve through intelligent parameter updates.\n",
-    "\n",
-    "## Learning Goals\n",
-    "- Understand gradient descent and how optimizers use gradients to update parameters\n",
-    "- Implement SGD with momentum for accelerated convergence\n",
-    "- Build Adam optimizer with adaptive learning rates\n",
-    "- Master learning rate scheduling strategies\n",
-    "- See how optimizers enable effective neural network training\n",
-    "\n",
-    "## Build → Use → Analyze\n",
-    "1. **Build**: Core optimization algorithms (SGD, Adam)\n",
-    "2. **Use**: Apply optimizers to train neural networks\n",
-    "3. **Analyze**: Compare optimizer behavior and convergence patterns"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "46c2a0fb",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "optimizers-imports",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| default_exp core.optimizers\n",
-    "\n",
-    "#| export\n",
-    "import math\n",
-    "import numpy as np\n",
-    "import sys\n",
-    "import os\n",
-    "from typing import List, Dict, Any, Optional, Union\n",
-    "from collections import defaultdict\n",
-    "\n",
-    "# Helper function to set up import paths\n",
-    "def setup_import_paths():\n",
-    "    \"\"\"Set up import paths for development modules.\"\"\"\n",
-    "    import sys\n",
-    "    import os\n",
-    "    \n",
-    "    # Add module directories to path\n",
-    "    base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))\n",
-    "    tensor_dir = os.path.join(base_dir, '01_tensor')\n",
-    "    autograd_dir = os.path.join(base_dir, '07_autograd')\n",
-    "    \n",
-    "    if tensor_dir not in sys.path:\n",
-    "        sys.path.append(tensor_dir)\n",
-    "    if autograd_dir not in sys.path:\n",
-    "        sys.path.append(autograd_dir)\n",
-    "\n",
-    "# Import our existing components\n",
-    "try:\n",
-    "    from tinytorch.core.tensor import Tensor\n",
-    "    from tinytorch.core.autograd import Variable\n",
-    "except ImportError:\n",
-    "    # For development, try local imports\n",
-    "    try:\n",
-    "        setup_import_paths()\n",
-    "        from tensor_dev import Tensor\n",
-    "        from autograd_dev import Variable\n",
-    "    except (ImportError, NameError):  # NameError: __file__ is undefined in a notebook kernel\n",
-    "        # Create minimal fallback classes for testing\n",
-    "        print(\"Warning: Using fallback classes for testing\")\n",
-    "        \n",
-    "        class Tensor:\n",
-    "            def __init__(self, data):\n",
-    "                self.data = np.array(data)\n",
-    "                self.shape = self.data.shape\n",
-    "            \n",
-    "            def __str__(self):\n",
-    "                return f\"Tensor({self.data})\"\n",
-    "        \n",
-    "        class Variable:\n",
-    "            def __init__(self, data, requires_grad=True):\n",
-    "                if isinstance(data, (int, float)):\n",
-    "                    self.data = Tensor([data])\n",
-    "                else:\n",
-    "                    self.data = Tensor(data)\n",
-    "                self.requires_grad = requires_grad\n",
-    "                self.grad = None\n",
-    "            \n",
-    "            def zero_grad(self):\n",
-    "                self.grad = None\n",
-    "            \n",
-    "            def __str__(self):\n",
-    "                return f\"Variable({self.data.data})\""
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "0736e01a",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "optimizers-setup",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "print(\"🔥 TinyTorch Optimizers Module\")\n",
-    "print(f\"NumPy version: {np.__version__}\")\n",
-    "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
-    "print(\"Ready to build optimization algorithms!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "6b4d50c3",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 📦 Where This Code Lives in the Final Package\n",
-    "\n",
-    "**Learning Side:** You work in `modules/source/10_optimizers/optimizers_dev.py`  \n",
-    "**Building Side:** Code exports to `tinytorch.core.optimizers`\n",
-    "\n",
-    "```python\n",
-    "# Final package structure:\n",
-    "from tinytorch.core.optimizers import SGD, Adam, StepLR  # The optimization engines!\n",
-    "from tinytorch.core.autograd import Variable  # Gradient computation\n",
-    "from tinytorch.core.tensor import Tensor  # Data structures\n",
-    "```\n",
-    "\n",
-    "**Why this matters:**\n",
-    "- **Learning:** Focused module for understanding optimization algorithms\n",
-    "- **Production:** Proper organization like PyTorch's `torch.optim`\n",
-    "- **Consistency:** All optimization algorithms live together in `core.optimizers`\n",
-    "- **Foundation:** Enables effective neural network training"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "afd6d838",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## What Are Optimizers?\n",
-    "\n",
-    "### The Problem: How to Update Parameters\n",
-    "Neural networks learn by updating parameters using gradients:\n",
-    "```\n",
-    "parameter_new = parameter_old - learning_rate * gradient\n",
-    "```\n",
-    "\n",
-    "But **naive gradient descent** has problems:\n",
-    "- **Slow convergence**: Takes many steps to reach optimum\n",
-    "- **Oscillation**: Bounces around valleys without making progress\n",
-    "- **Poor scaling**: Same learning rate for all parameters\n",
-    "\n",
-    "### The Solution: Smart Optimization\n",
-    "**Optimizers** are algorithms that intelligently update parameters:\n",
-    "- **Momentum**: Accelerate convergence by accumulating velocity\n",
-    "- **Adaptive learning rates**: Different learning rates for different parameters\n",
-    "- **Second-order information**: Use curvature to guide updates\n",
-    "\n",
-    "### Real-World Impact\n",
-    "- **SGD**: The foundation of most neural network training\n",
-    "- **Adam**: A common default optimizer for deep learning applications\n",
-    "- **Learning rate scheduling**: Critical for training stability and performance\n",
-    "\n",
-    "### What We'll Build\n",
-    "1. **SGD**: Stochastic Gradient Descent with momentum\n",
-    "2. **Adam**: Adaptive Moment Estimation optimizer\n",
-    "3. **StepLR**: Learning rate scheduling\n",
-    "4. **Integration**: Complete training loop with optimizers"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2837bfdf",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 1: Understanding Gradient Descent\n",
-    "\n",
-    "### What is Gradient Descent?\n",
-    "**Gradient descent** finds the minimum of a function by following the negative gradient:\n",
-    "\n",
-    "```\n",
-    "θ_{t+1} = θ_t - α ∇f(θ_t)\n",
-    "```\n",
-    "\n",
-    "Where:\n",
-    "- θ: Parameters we want to optimize\n",
-    "- α: Learning rate (how large a step to take)\n",
-    "- ∇f(θ): Gradient of loss function with respect to parameters\n",
-    "\n",
-    "### Why Gradient Descent Works\n",
-    "1. **Gradients point uphill**: The negative gradient is the direction of steepest local descent\n",
-    "2. **Iterative improvement**: Each step reduces the loss, provided the learning rate is small enough\n",
-    "3. **Local convergence**: Converges to a local minimum (or another stationary point) with a suitable learning rate\n",
-    "4. **Scalable**: Works with millions of parameters\n",
-    "\n",
-    "### The Learning Rate Dilemma\n",
-    "- **Too large**: Overshoots minimum, diverges\n",
-    "- **Too small**: Extremely slow convergence\n",
-    "- **Just right**: Steady progress toward minimum\n",
-    "\n",
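-    "To make the learning rate dilemma concrete, here is a small worked sketch on the hypothetical loss f(θ) = θ² (chosen for illustration, not taken from this module), whose gradient is 2θ:\n",
-    "\n",
-    "$$\n",
-    "\\theta_{t+1} = \\theta_t - \\alpha \\cdot 2\\theta_t = (1 - 2\\alpha)\\,\\theta_t \\quad\\Rightarrow\\quad \\theta_t = (1 - 2\\alpha)^t\\,\\theta_0\n",
-    "$$\n",
-    "\n",
-    "The iterates shrink toward the minimum at θ = 0 only when |1 - 2α| < 1, that is, 0 < α < 1. With α = 0.1 each step multiplies θ by 0.8 (steady progress), with α = 0.001 by 0.998 (very slow), and with α = 1.1 by -1.2 (oscillating divergence).\n",
-    "\n",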
-    "### Visual Understanding\n",
-    "```\n",
-    "Loss landscape: \\__/\n",
-    "Start here: ↑\n",
-    "Gradient descent: ↓ → ↓ → ↓ → minimum\n",
-    "```\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Neural networks**: Training any deep learning model\n",
-    "- **Machine learning**: Logistic regression, SVM, etc.\n",
-    "- **Scientific computing**: Optimization problems in physics, engineering\n",
-    "- **Economics**: Portfolio optimization, game theory\n",
-    "\n",
-    "Let's implement gradient descent to understand it deeply!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "3eb31e7d",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "gradient-descent-function",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def gradient_descent_step(parameter: Variable, learning_rate: float) -> None:\n",
-    "    \"\"\"\n",
-    "    Perform one step of gradient descent on a parameter.\n",
-    "    \n",
-    "    Args:\n",
-    "        parameter: Variable with gradient information\n",
-    "        learning_rate: How much to update parameter\n",
-    "    \n",
-    "    TODO: Implement basic gradient descent parameter update.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Check if parameter has a gradient\n",
-    "    2. Get current parameter value and gradient\n",
-    "    3. Update parameter: new_value = old_value - learning_rate * gradient\n",
-    "    4. Update parameter data with new value\n",
-    "    5. Handle edge cases (no gradient, invalid values)\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    # Parameter with gradient\n",
-    "    w = Variable(2.0, requires_grad=True)\n",
-    "    w.grad = Variable(0.5)  # Gradient from loss\n",
-    "    \n",
-    "    # Update parameter\n",
-    "    gradient_descent_step(w, learning_rate=0.1)\n",
-    "    # w.data now contains: 2.0 - 0.1 * 0.5 = 1.95\n",
-    "    ```\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Check if parameter.grad is not None\n",
-    "    - Use parameter.grad.data.data to get gradient value\n",
-    "    - Update parameter.data with new Tensor\n",
-    "    - Don't modify gradient (it's used for logging)\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - This is the foundation of all neural network training\n",
-    "    - PyTorch's optimizer.step() applies this same update for plain SGD (no momentum or weight decay)\n",
-    "    - The learning rate determines convergence speed\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    if parameter.grad is not None:\n",
-    "        # Get current parameter value and gradient\n",
-    "        current_value = parameter.data.data\n",
-    "        gradient_value = parameter.grad.data.data\n",
-    "        \n",
-    "        # Update parameter: new_value = old_value - learning_rate * gradient\n",
-    "        new_value = current_value - learning_rate * gradient_value\n",
-    "        \n",
-    "        # Update parameter data\n",
-    "        parameter.data = Tensor(new_value)\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ea919b0d",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Unit Test: Gradient Descent Step\n",
-    "\n",
-    "Let's test your gradient descent implementation right away! This is the foundation of all optimization algorithms.\n",
-    "\n",
-    "**This is a unit test** - it tests one specific function (gradient_descent_step) in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "bc7da2bd",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-gradient-descent",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_gradient_descent_step():\n",
-    "    \"\"\"Test basic gradient descent parameter update\"\"\"\n",
-    "    print(\"🔬 Unit Test: Gradient Descent Step...\")\n",
-    "    \n",
-    "    # Test basic parameter update\n",
-    "    try:\n",
-    "        w = Variable(2.0, requires_grad=True)\n",
-    "        w.grad = Variable(0.5)  # Positive gradient\n",
-    "        \n",
-    "        original_value = w.data.data.item()\n",
-    "        gradient_descent_step(w, learning_rate=0.1)\n",
-    "        new_value = w.data.data.item()\n",
-    "        \n",
-    "        expected_value = original_value - 0.1 * 0.5  # 2.0 - 0.05 = 1.95\n",
-    "        assert abs(new_value - expected_value) < 1e-6, f\"Expected {expected_value}, got {new_value}\"\n",
-    "        print(\"✅ Basic parameter update works\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Basic parameter update failed: {e}\")\n",
-    "        raise\n",
-    "\n",
-    "    # Test with negative gradient\n",
-    "    try:\n",
-    "        w2 = Variable(1.0, requires_grad=True)\n",
-    "        w2.grad = Variable(-0.2)  # Negative gradient\n",
-    "        \n",
-    "        gradient_descent_step(w2, learning_rate=0.1)\n",
-    "        expected_value2 = 1.0 - 0.1 * (-0.2)  # 1.0 + 0.02 = 1.02\n",
-    "        assert abs(w2.data.data.item() - expected_value2) < 1e-6, \"Negative gradient test failed\"\n",
-    "        print(\"✅ Negative gradient handling works\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Negative gradient handling failed: {e}\")\n",
-    "        raise\n",
-    "\n",
-    "    # Test with no gradient (should not update)\n",
-    "    try:\n",
-    "        w3 = Variable(3.0, requires_grad=True)\n",
-    "        w3.grad = None\n",
-    "        original_value3 = w3.data.data.item()\n",
-    "        \n",
-    "        gradient_descent_step(w3, learning_rate=0.1)\n",
-    "        assert w3.data.data.item() == original_value3, \"Parameter with no gradient should not update\"\n",
-    "        print(\"✅ No gradient case works\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ No gradient case failed: {e}\")\n",
-    "        raise\n",
-    "\n",
-    "    print(\"🎯 Gradient descent step behavior:\")\n",
-    "    print(\"   Updates parameters in negative gradient direction\")\n",
-    "    print(\"   Uses learning rate to control step size\")\n",
-    "    print(\"   Skips updates when gradient is None\")\n",
-    "    print(\"📈 Progress: Gradient Descent Step ✓\")\n",
-    "\n",
-    "# Test function is called by auto-discovery system"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "fe88c9d6",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 2: SGD with Momentum\n",
-    "\n",
-    "### What is SGD?\n",
-    "**SGD (Stochastic Gradient Descent)** is the fundamental optimization algorithm:\n",
-    "\n",
-    "```\n",
-    "θ_{t+1} = θ_t - α ∇L(θ_t)\n",
-    "```\n",
-    "\n",
-    "### The Problem with Vanilla SGD\n",
-    "- **Slow convergence**: Especially in narrow valleys\n",
-    "- **Oscillation**: Bounces around without making progress\n",
-    "- **Poor conditioning**: Struggles with ill-conditioned problems\n",
-    "\n",
-    "### The Solution: Momentum\n",
-    "**Momentum** accumulates velocity to accelerate convergence:\n",
-    "\n",
-    "```\n",
-    "v_t = β v_{t-1} + ∇L(θ_t)\n",
-    "θ_{t+1} = θ_t - α v_t\n",
-    "```\n",
-    "\n",
-    "Where:\n",
-    "- v_t: Velocity (exponentially decaying sum of past gradients)\n",
-    "- β: Momentum coefficient (typically 0.9)\n",
-    "- α: Learning rate\n",
-    "\n",
-    "### Why Momentum Works\n",
-    "1. **Acceleration**: Builds up speed in consistent directions\n",
-    "2. **Dampening**: Reduces oscillations in inconsistent directions\n",
-    "3. **Memory**: Remembers previous gradient directions\n",
-    "4. **Robustness**: Less sensitive to noisy gradients\n",
-    "\n",
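-    "A short worked sketch shows where the acceleration comes from. Assume, purely for illustration, that the gradient stays at a constant value g and the velocity starts at v_0 = 0:\n",
-    "\n",
-    "$$\n",
-    "v_t = \\beta v_{t-1} + g \\quad\\Rightarrow\\quad v_t = g\\,(1 + \\beta + \\dots + \\beta^{t-1}) = g\\,\\frac{1 - \\beta^t}{1 - \\beta} \\;\\longrightarrow\\; \\frac{g}{1 - \\beta}\n",
-    "$$\n",
-    "\n",
-    "With β = 0.9 the velocity approaches 10g, so the effective step in a consistent direction is about ten times the plain SGD step. If the gradient instead alternates strictly between +g and -g, the terms partly cancel and the velocity magnitude settles at g/(1 + β), about 0.53g, which is the dampening effect.\n",
-    "\n",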
-    "### Visual Understanding\n",
-    "```\n",
-    "Without momentum: ↗↙↗↙↗↙ (oscillating)\n",
-    "With momentum:    ↗→→→→→ (smooth progress)\n",
-    "```\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Image classification**: Training ResNet, VGG\n",
-    "- **Natural language**: Training RNNs, early transformers\n",
-    "- **Classic choice**: Still used when Adam fails\n",
-    "- **Large batch training**: Often preferred over Adam\n",
-    "\n",
-    "Let's implement SGD with momentum!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "fb64936c",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "sgd-class",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class SGD:\n",
-    "    \"\"\"\n",
-    "    SGD Optimizer with Momentum\n",
-    "    \n",
-    "    Implements stochastic gradient descent with momentum:\n",
-    "    v_t = momentum * v_{t-1} + gradient\n",
-    "    parameter = parameter - learning_rate * v_t\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, parameters: List[Variable], learning_rate: float = 0.01, \n",
-    "                 momentum: float = 0.0, weight_decay: float = 0.0):\n",
-    "        \"\"\"\n",
-    "        Initialize SGD optimizer.\n",
-    "        \n",
-    "        Args:\n",
-    "            parameters: List of Variables to optimize\n",
-    "            learning_rate: Learning rate (default: 0.01)\n",
-    "            momentum: Momentum coefficient (default: 0.0)\n",
-    "            weight_decay: L2 regularization coefficient (default: 0.0)\n",
-    "        \n",
-    "        TODO: Implement SGD optimizer initialization.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Store parameters and hyperparameters\n",
-    "        2. Initialize momentum buffers for each parameter\n",
-    "        3. Set up state tracking for optimization\n",
-    "        4. Prepare for step() and zero_grad() methods\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        ```python\n",
-    "        # Create optimizer\n",
-    "        optimizer = SGD([w1, w2, b1, b2], learning_rate=0.01, momentum=0.9)\n",
-    "        \n",
-    "        # In training loop:\n",
-    "        optimizer.zero_grad()\n",
-    "        loss.backward()\n",
-    "        optimizer.step()\n",
-    "        ```\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Store parameters as a list\n",
-    "        - Initialize momentum buffers as empty dict\n",
-    "        - Use parameter id() as key for momentum tracking\n",
-    "        - Momentum buffers will be created lazily in step()\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        self.parameters = parameters\n",
-    "        self.learning_rate = learning_rate\n",
-    "        self.momentum = momentum\n",
-    "        self.weight_decay = weight_decay\n",
-    "        \n",
-    "        # Initialize momentum buffers (created lazily)\n",
-    "        self.momentum_buffers = {}\n",
-    "        \n",
-    "        # Track optimization steps\n",
-    "        self.step_count = 0\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def step(self) -> None:\n",
-    "        \"\"\"\n",
-    "        Perform one optimization step.\n",
-    "        \n",
-    "        TODO: Implement SGD parameter update with momentum.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Iterate through all parameters\n",
-    "        2. For each parameter with gradient:\n",
-    "           a. Get current gradient\n",
-    "           b. Apply weight decay if specified\n",
-    "           c. Update momentum buffer (or create if first time)\n",
-    "           d. Update parameter using momentum\n",
-    "        3. Increment step count\n",
-    "        \n",
-    "        MATHEMATICAL FORMULATION:\n",
-    "        - If weight_decay > 0: gradient = gradient + weight_decay * parameter\n",
-    "        - momentum_buffer = momentum * momentum_buffer + gradient\n",
-    "        - parameter = parameter - learning_rate * momentum_buffer\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use id(param) as key for momentum buffers\n",
-    "        - Initialize the buffer with zeros if it does not exist yet\n",
-    "        - Handle case where momentum = 0 (no momentum)\n",
-    "        - Update parameter.data with new Tensor\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        for param in self.parameters:\n",
-    "            if param.grad is not None:\n",
-    "                # Get gradient\n",
-    "                gradient = param.grad.data.data\n",
-    "                \n",
-    "                # Apply weight decay (L2 regularization)\n",
-    "                if self.weight_decay > 0:\n",
-    "                    gradient = gradient + self.weight_decay * param.data.data\n",
-    "                \n",
-    "                # Get or create momentum buffer\n",
-    "                param_id = id(param)\n",
-    "                if param_id not in self.momentum_buffers:\n",
-    "                    self.momentum_buffers[param_id] = np.zeros_like(param.data.data)\n",
-    "                \n",
-    "                # Update momentum buffer\n",
-    "                self.momentum_buffers[param_id] = (\n",
-    "                    self.momentum * self.momentum_buffers[param_id] + gradient\n",
-    "                )\n",
-    "                \n",
-    "                # Update parameter\n",
-    "                param.data = Tensor(\n",
-    "                    param.data.data - self.learning_rate * self.momentum_buffers[param_id]\n",
-    "                )\n",
-    "        \n",
-    "        self.step_count += 1\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def zero_grad(self) -> None:\n",
-    "        \"\"\"\n",
-    "        Zero out gradients for all parameters.\n",
-    "        \n",
-    "        TODO: Implement gradient zeroing.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Iterate through all parameters\n",
-    "        2. Set gradient to None for each parameter\n",
-    "        3. This prepares for next backward pass\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Simply set param.grad = None\n",
-    "        - This is called before loss.backward()\n",
-    "        - Essential so that stale gradients from the previous step do not accumulate\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        for param in self.parameters:\n",
-    "            param.grad = None\n",
-    "        ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "65f9f253",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Unit Test: SGD Optimizer\n",
-    "\n",
-    "Let's test your SGD optimizer implementation! This optimizer adds momentum to gradient descent for better convergence.\n",
-    "\n",
-    "**This is a unit test** - it tests one specific class (SGD) in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "3eca0754",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-sgd",
-     "locked": true,
-     "points": 15,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_sgd_optimizer():\n",
-    "    \"\"\"Test SGD optimizer implementation\"\"\"\n",
-    "    print(\"🔬 Unit Test: SGD Optimizer...\")\n",
-    "    \n",
-    "    # Create test parameters\n",
-    "    w1 = Variable(1.0, requires_grad=True)\n",
-    "    w2 = Variable(2.0, requires_grad=True)\n",
-    "    b = Variable(0.5, requires_grad=True)\n",
-    "    \n",
-    "    # Create optimizer\n",
-    "    optimizer = SGD([w1, w2, b], learning_rate=0.1, momentum=0.9)\n",
-    "    \n",
-    "    # Test zero_grad\n",
-    "    try:\n",
-    "        w1.grad = Variable(0.1)\n",
-    "        w2.grad = Variable(0.2)\n",
-    "        b.grad = Variable(0.05)\n",
-    "        \n",
-    "        optimizer.zero_grad()\n",
-    "        \n",
-    "        assert w1.grad is None, \"Gradient should be None after zero_grad\"\n",
-    "        assert w2.grad is None, \"Gradient should be None after zero_grad\"\n",
-    "        assert b.grad is None, \"Gradient should be None after zero_grad\"\n",
-    "        print(\"✅ zero_grad() works correctly\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ zero_grad() failed: {e}\")\n",
-    "        raise\n",
-    "    \n",
-    "    # Test step with gradients\n",
-    "    try:\n",
-    "        w1.grad = Variable(0.1)\n",
-    "        w2.grad = Variable(0.2)\n",
-    "        b.grad = Variable(0.05)\n",
-    "        \n",
-    "        # First step (no momentum yet)\n",
-    "        original_w1 = w1.data.data.item()\n",
-    "        original_w2 = w2.data.data.item()\n",
-    "        original_b = b.data.data.item()\n",
-    "        \n",
-    "        optimizer.step()\n",
-    "        \n",
-    "        # Check parameter updates\n",
-    "        expected_w1 = original_w1 - 0.1 * 0.1  # 1.0 - 0.01 = 0.99\n",
-    "        expected_w2 = original_w2 - 0.1 * 0.2  # 2.0 - 0.02 = 1.98\n",
-    "        expected_b = original_b - 0.1 * 0.05   # 0.5 - 0.005 = 0.495\n",
-    "        \n",
-    "        assert abs(w1.data.data.item() - expected_w1) < 1e-6, f\"w1 update failed: expected {expected_w1}, got {w1.data.data.item()}\"\n",
-    "        assert abs(w2.data.data.item() - expected_w2) < 1e-6, f\"w2 update failed: expected {expected_w2}, got {w2.data.data.item()}\"\n",
-    "        assert abs(b.data.data.item() - expected_b) < 1e-6, f\"b update failed: expected {expected_b}, got {b.data.data.item()}\"\n",
-    "        print(\"✅ Parameter updates work correctly\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Parameter updates failed: {e}\")\n",
-    "        raise\n",
-    "    \n",
-    "    # Test momentum buffers\n",
-    "    try:\n",
-    "        assert len(optimizer.momentum_buffers) == 3, f\"Should have 3 momentum buffers, got {len(optimizer.momentum_buffers)}\"\n",
-    "        assert optimizer.step_count == 1, f\"Step count should be 1, got {optimizer.step_count}\"\n",
-    "        print(\"✅ Momentum buffers created correctly\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Momentum buffers failed: {e}\")\n",
-    "        raise\n",
-    "    \n",
-    "    # Test step counting\n",
-    "    try:\n",
-    "        w1.grad = Variable(0.1)\n",
-    "        w2.grad = Variable(0.2)\n",
-    "        b.grad = Variable(0.05)\n",
-    "        \n",
-    "        optimizer.step()\n",
-    "        \n",
-    "        assert optimizer.step_count == 2, f\"Step count should be 2, got {optimizer.step_count}\"\n",
-    "        print(\"✅ Step counting works correctly\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Step counting failed: {e}\")\n",
-    "        raise\n",
-    "\n",
-    "    print(\"🎯 SGD optimizer behavior:\")\n",
-    "    print(\"   Maintains momentum buffers for accelerated updates\")\n",
-    "    print(\"   Tracks step count for learning rate scheduling\")\n",
-    "    print(\"   Supports weight decay for regularization\")\n",
-    "    print(\"📈 Progress: SGD Optimizer ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_sgd_optimizer()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "17102887",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 3: Adam - Adaptive Learning Rates\n",
-    "\n",
-    "### What is Adam?\n",
-    "**Adam (Adaptive Moment Estimation)** is one of the most widely used optimizers in deep learning:\n",
-    "\n",
-    "```\n",
-    "m_t = β₁ m_{t-1} + (1 - β₁) ∇L(θ_t)        # First moment (momentum)\n",
-    "v_t = β₂ v_{t-1} + (1 - β₂) (∇L(θ_t))²     # Second moment (variance)\n",
-    "m̂_t = m_t / (1 - β₁ᵗ)                      # Bias correction\n",
-    "v̂_t = v_t / (1 - β₂ᵗ)                      # Bias correction\n",
-    "θ_{t+1} = θ_t - α m̂_t / (√v̂_t + ε)        # Parameter update\n",
-    "```\n",
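-    "\n",
-    "As a quick check of the bias-correction terms, here is a worked sketch of the very first step. It assumes a single scalar gradient g at t = 1, zero-initialized moments (m_0 = v_0 = 0), and ε negligible next to |g|; the numbers are illustrative only:\n",
-    "\n",
-    "```\n",
-    "m_1 = (1 - β₁) g              →  m̂_1 = m_1 / (1 - β₁) = g\n",
-    "v_1 = (1 - β₂) g²             →  v̂_1 = v_1 / (1 - β₂) = g²\n",
-    "θ_2 = θ_1 - α g / (|g| + ε)   ≈  θ_1 - α · sign(g)\n",
-    "\n",
-    "Example: α = 0.01, g = 0.1  →  step ≈ 0.01\n",
-    "         α = 0.01, g = 0.2  →  step ≈ 0.01\n",
-    "```\n",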
-    "\n",
-    "### Why Adam is Revolutionary\n",
-    "1. **Adaptive learning rates**: An effective step size for each parameter\n",
-    "2. **Momentum**: Accelerates convergence like SGD with momentum\n",
-    "3. **Variance adaptation**: Scales updates by a running average of squared gradients\n",
-    "4. **Bias correction**: Handles initialization bias\n",
-    "5. **Robust**: Works well with minimal hyperparameter tuning\n",
-    "\n",
-    "### The Three Key Ideas\n",
-    "1. **First moment (m_t)**: Exponential moving average of gradients (momentum)\n",
-    "2. **Second moment (v_t)**: Exponential moving average of squared gradients (variance)\n",
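-    "3. **Adaptive scaling**: Each update is divided by √v̂_t, so parameters with consistently large gradients take relatively smaller steps and parameters with small gradients take relatively larger ones\n",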
-    "\n",
-    "### Visual Understanding\n",
-    "```\n",
-    "Parameter with large gradients: /\\/\\/\\/\\ → smooth updates\n",
-    "Parameter with small gradients: ______ → amplified updates\n",
-    "```\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Deep learning**: Default optimizer for most neural networks\n",
-    "- **Computer vision**: Training CNNs, ResNets, Vision Transformers\n",
-    "- **Natural language**: Training BERT, GPT, T5\n",
-    "- **Transformers**: Essential for attention-based models\n",
-    "\n",
-    "Let's implement Adam optimizer!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "13cec546",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "adam-class",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class Adam:\n",
-    "    \"\"\"\n",
-    "    Adam Optimizer\n",
-    "    \n",
-    "    Implements Adam algorithm with adaptive learning rates:\n",
-    "    - First moment: exponential moving average of gradients\n",
-    "    - Second moment: exponential moving average of squared gradients\n",
-    "    - Bias correction: accounts for initialization bias\n",
-    "    - Adaptive updates: different learning rate per parameter\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, parameters: List[Variable], learning_rate: float = 0.001,\n",
-    "                 beta1: float = 0.9, beta2: float = 0.999, epsilon: float = 1e-8,\n",
-    "                 weight_decay: float = 0.0):\n",
-    "        \"\"\"\n",
-    "        Initialize Adam optimizer.\n",
-    "        \n",
-    "        Args:\n",
-    "            parameters: List of Variables to optimize\n",
-    "            learning_rate: Learning rate (default: 0.001)\n",
-    "            beta1: Exponential decay rate for first moment (default: 0.9)\n",
-    "            beta2: Exponential decay rate for second moment (default: 0.999)\n",
-    "            epsilon: Small constant for numerical stability (default: 1e-8)\n",
-    "            weight_decay: L2 regularization coefficient (default: 0.0)\n",
-    "        \n",
-    "        TODO: Implement Adam optimizer initialization.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Store parameters and hyperparameters\n",
-    "        2. Initialize first moment buffers (m_t)\n",
-    "        3. Initialize second moment buffers (v_t)\n",
-    "        4. Set up step counter for bias correction\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        ```python\n",
-    "        # Create Adam optimizer\n",
-    "        optimizer = Adam([w1, w2, b1, b2], learning_rate=0.001)\n",
-    "        \n",
-    "        # In training loop:\n",
-    "        optimizer.zero_grad()\n",
-    "        loss.backward()\n",
-    "        optimizer.step()\n",
-    "        ```\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Store all hyperparameters\n",
-    "        - Initialize moment buffers as empty dicts\n",
-    "        - Use parameter id() as key for tracking\n",
-    "        - Buffers will be created lazily in step()\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        self.parameters = parameters\n",
-    "        self.learning_rate = learning_rate\n",
-    "        self.beta1 = beta1\n",
-    "        self.beta2 = beta2\n",
-    "        self.epsilon = epsilon\n",
-    "        self.weight_decay = weight_decay\n",
-    "        \n",
-    "        # Initialize moment buffers (created lazily)\n",
-    "        self.first_moment = {}   # m_t\n",
-    "        self.second_moment = {}  # v_t\n",
-    "        \n",
-    "        # Track optimization steps for bias correction\n",
-    "        self.step_count = 0\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def step(self) -> None:\n",
-    "        \"\"\"\n",
-    "        Perform one optimization step using Adam algorithm.\n",
-    "        \n",
-    "        TODO: Implement Adam parameter update.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Increment step count\n",
-    "        2. For each parameter with gradient:\n",
-    "           a. Get current gradient\n",
-    "           b. Apply weight decay if specified\n",
-    "           c. Update first moment (momentum)\n",
-    "           d. Update second moment (variance)\n",
-    "           e. Apply bias correction\n",
-    "           f. Update parameter with adaptive learning rate\n",
-    "        \n",
-    "        MATHEMATICAL FORMULATION:\n",
-    "        - m_t = beta1 * m_{t-1} + (1 - beta1) * gradient\n",
-    "        - v_t = beta2 * v_{t-1} + (1 - beta2) * gradient^2\n",
-    "        - m_hat = m_t / (1 - beta1^t)\n",
-    "        - v_hat = v_t / (1 - beta2^t)\n",
-    "        - parameter = parameter - learning_rate * m_hat / (sqrt(v_hat) + epsilon)\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use id(param) as key for moment buffers\n",
-    "        - Initialize buffers with zeros if not exists\n",
-    "        - Use np.sqrt() for square root\n",
-    "        - Handle numerical stability with epsilon\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        self.step_count += 1\n",
-    "        \n",
-    "        for param in self.parameters:\n",
-    "            if param.grad is not None:\n",
-    "                # Get gradient\n",
-    "                gradient = param.grad.data.data\n",
-    "                \n",
-    "                # Apply weight decay (L2 regularization)\n",
-    "                if self.weight_decay > 0:\n",
-    "                    gradient = gradient + self.weight_decay * param.data.data\n",
-    "                \n",
-    "                # Get or create moment buffers\n",
-    "                param_id = id(param)\n",
-    "                if param_id not in self.first_moment:\n",
-    "                    self.first_moment[param_id] = np.zeros_like(param.data.data)\n",
-    "                    self.second_moment[param_id] = np.zeros_like(param.data.data)\n",
-    "                \n",
-    "                # Update first moment (momentum)\n",
-    "                self.first_moment[param_id] = (\n",
-    "                    self.beta1 * self.first_moment[param_id] + \n",
-    "                    (1 - self.beta1) * gradient\n",
-    "                )\n",
-    "                \n",
-    "                # Update second moment (variance)\n",
-    "                self.second_moment[param_id] = (\n",
-    "                    self.beta2 * self.second_moment[param_id] + \n",
-    "                    (1 - self.beta2) * gradient * gradient\n",
-    "                )\n",
-    "                \n",
-    "                # Bias correction\n",
-    "                first_moment_corrected = (\n",
-    "                    self.first_moment[param_id] / (1 - self.beta1 ** self.step_count)\n",
-    "                )\n",
-    "                second_moment_corrected = (\n",
-    "                    self.second_moment[param_id] / (1 - self.beta2 ** self.step_count)\n",
-    "                )\n",
-    "                \n",
-    "                # Update parameter with adaptive learning rate\n",
-    "                param.data = Tensor(\n",
-    "                    param.data.data - self.learning_rate * first_moment_corrected / \n",
-    "                    (np.sqrt(second_moment_corrected) + self.epsilon)\n",
-    "                )\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def zero_grad(self) -> None:\n",
-    "        \"\"\"\n",
-    "        Zero out gradients for all parameters.\n",
-    "        \n",
-    "        TODO: Implement gradient zeroing (same as SGD).\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Set param.grad = None for all parameters\n",
-    "        - This is identical to SGD implementation\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        for param in self.parameters:\n",
-    "            param.grad = None\n",
-    "        ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f434b140",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "### 🧪 Test Your Adam Implementation\n",
-    "\n",
-    "Let's test the Adam optimizer:"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9f1b0e70",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Unit Test: Adam Optimizer\n",
-    "\n",
-    "Let's test your Adam optimizer implementation! Adam is a widely used adaptive optimization algorithm.\n",
-    "\n",
-    "**This is a unit test** - it tests one specific class (Adam) in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e28e2680",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-adam",
-     "locked": true,
-     "points": 20,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_adam_optimizer():\n",
-    "    \"\"\"Test Adam optimizer implementation\"\"\"\n",
-    "    print(\"🔬 Unit Test: Adam Optimizer...\")\n",
-    "    \n",
-    "    # Create test parameters\n",
-    "    w1 = Variable(1.0, requires_grad=True)\n",
-    "    w2 = Variable(2.0, requires_grad=True)\n",
-    "    b = Variable(0.5, requires_grad=True)\n",
-    "    \n",
-    "    # Create optimizer\n",
-    "    optimizer = Adam([w1, w2, b], learning_rate=0.01, beta1=0.9, beta2=0.999, epsilon=1e-8)\n",
-    "    \n",
-    "    # Test zero_grad\n",
-    "    try:\n",
-    "        w1.grad = Variable(0.1)\n",
-    "        w2.grad = Variable(0.2)\n",
-    "        b.grad = Variable(0.05)\n",
-    "        \n",
-    "        optimizer.zero_grad()\n",
-    "        \n",
-    "        assert w1.grad is None, \"Gradient should be None after zero_grad\"\n",
-    "        assert w2.grad is None, \"Gradient should be None after zero_grad\"\n",
-    "        assert b.grad is None, \"Gradient should be None after zero_grad\"\n",
-    "        print(\"✅ zero_grad() works correctly\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ zero_grad() failed: {e}\")\n",
-    "        raise\n",
-    "    \n",
-    "    # Test step with gradients\n",
-    "    try:\n",
-    "        w1.grad = Variable(0.1)\n",
-    "        w2.grad = Variable(0.2)\n",
-    "        b.grad = Variable(0.05)\n",
-    "        \n",
-    "        # First step\n",
-    "        original_w1 = w1.data.data.item()\n",
-    "        original_w2 = w2.data.data.item()\n",
-    "        original_b = b.data.data.item()\n",
-    "        \n",
-    "        optimizer.step()\n",
-    "        \n",
-    "        # Check that parameters were updated (Adam uses adaptive learning rates)\n",
-    "        assert w1.data.data.item() != original_w1, \"w1 should have been updated\"\n",
-    "        assert w2.data.data.item() != original_w2, \"w2 should have been updated\"\n",
-    "        assert b.data.data.item() != original_b, \"b should have been updated\"\n",
-    "        print(\"✅ Parameter updates work correctly\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Parameter updates failed: {e}\")\n",
-    "        raise\n",
-    "    \n",
-    "    # Test moment buffers\n",
-    "    try:\n",
-    "        assert len(optimizer.first_moment) == 3, f\"Should have 3 first moment buffers, got {len(optimizer.first_moment)}\"\n",
-    "        assert len(optimizer.second_moment) == 3, f\"Should have 3 second moment buffers, got {len(optimizer.second_moment)}\"\n",
-    "        print(\"✅ Moment buffers created correctly\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Moment buffers failed: {e}\")\n",
-    "        raise\n",
-    "    \n",
-    "    # Test step counting and bias correction\n",
-    "    try:\n",
-    "        assert optimizer.step_count == 1, f\"Step count should be 1, got {optimizer.step_count}\"\n",
-    "        \n",
-    "        # Take another step\n",
-    "        w1.grad = Variable(0.1)\n",
-    "        w2.grad = Variable(0.2)\n",
-    "        b.grad = Variable(0.05)\n",
-    "        \n",
-    "        optimizer.step()\n",
-    "        \n",
-    "        assert optimizer.step_count == 2, f\"Step count should be 2, got {optimizer.step_count}\"\n",
-    "        print(\"✅ Step counting and bias correction work correctly\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Step counting and bias correction failed: {e}\")\n",
-    "        raise\n",
-    "    \n",
-    "    # Test adaptive learning rates\n",
-    "    try:\n",
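-    "        # With constant gradients, bias-corrected Adam moves every parameter by ~learning_rate per step, whatever the gradient's magnitude\n",
-    "        assert all(abs((start - p.data.data.item()) - 0.02) < 1e-6 for p, start in [(w1, 1.0), (w2, 2.0), (b, 0.5)]), \"Each parameter should have moved by ~0.02 after two steps\"\n",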
-    "        print(\"✅ Adaptive learning rates work correctly\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Adaptive learning rates failed: {e}\")\n",
-    "        raise\n",
-    "\n",
-    "    print(\"🎯 Adam optimizer behavior:\")\n",
-    "    print(\"   Maintains first and second moment estimates\")\n",
-    "    print(\"   Applies bias correction for early training\")\n",
-    "    print(\"   Uses adaptive learning rates per parameter\")\n",
-    "    print(\"   Combines benefits of momentum and RMSprop\")\n",
-    "    print(\"📈 Progress: Adam Optimizer ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_adam_optimizer()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "fe7a5590",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 4: Learning Rate Scheduling\n",
-    "\n",
-    "### What is Learning Rate Scheduling?\n",
-    "**Learning rate scheduling** adjusts the learning rate during training:\n",
-    "\n",
-    "```\n",
-    "Initial: learning_rate = 0.1\n",
-    "After 10 epochs: learning_rate = 0.01\n",
-    "After 20 epochs: learning_rate = 0.001\n",
-    "```\n",
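-    "\n",
-    "For the step-decay rule implemented later in this module, `new_lr = initial_lr * gamma ** ((step_count - 1) // step_size)`, the schedule can be worked out by hand. This is an illustrative sketch, assuming `initial_lr = 0.1`, `gamma = 0.1`, and `step_size = 10`:\n",
-    "\n",
-    "```\n",
-    "step() calls  1-10:  (step_count - 1) // 10 = 0  →  lr = 0.1 × 0.1⁰ = 0.1\n",
-    "step() calls 11-20:  (step_count - 1) // 10 = 1  →  lr = 0.1 × 0.1¹ = 0.01\n",
-    "step() calls 21-30:  (step_count - 1) // 10 = 2  →  lr = 0.1 × 0.1² = 0.001\n",
-    "```\n",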
-    "\n",
-    "### Why Scheduling Matters\n",
-    "1. **Fine-tuning**: Start with large steps, then refine with small steps\n",
-    "2. **Convergence**: Prevents overshooting near optimum\n",
-    "3. **Stability**: Reduces oscillations in later training\n",
-    "4. **Performance**: Often improves final accuracy\n",
-    "\n",
-    "### Common Scheduling Strategies\n",
-    "1. **Step decay**: Reduce by factor every N epochs\n",
-    "2. **Exponential decay**: Gradual exponential reduction\n",
-    "3. **Cosine annealing**: Smooth cosine curve reduction\n",
-    "4. **Warm-up**: Start small, increase, then decrease\n",
-    "\n",
-    "### Visual Understanding\n",
-    "```\n",
-    "Step decay:     ----↓----↓----↓\n",
-    "Exponential:    \\\\\\\\\\\\\\\\\\\\\\\\\\\\\n",
-    "Cosine:         ∩∩∩∩∩∩∩∩∩∩∩∩∩\n",
-    "```\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **ImageNet training**: A standard ingredient of high-accuracy training recipes\n",
-    "- **Language models**: Warm-up and decay schedules are standard when training large transformers\n",
-    "- **Fine-tuning**: Small, decaying learning rates help limit catastrophic forgetting\n",
-    "- **Transfer learning**: Adapts pre-trained models\n",
-    "\n",
-    "Let's implement step learning rate scheduling!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "79b15d87",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "steplr-class",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class StepLR:\n",
-    "    \"\"\"\n",
-    "    Step Learning Rate Scheduler\n",
-    "    \n",
-    "    Decays learning rate by gamma every step_size epochs:\n",
-    "    learning_rate = initial_lr * gamma ** (epoch // step_size), where epoch = (number of step() calls) - 1\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, optimizer: Union[SGD, Adam], step_size: int, gamma: float = 0.1):\n",
-    "        \"\"\"\n",
-    "        Initialize step learning rate scheduler.\n",
-    "        \n",
-    "        Args:\n",
-    "            optimizer: Optimizer to schedule\n",
-    "            step_size: Number of epochs between decreases\n",
-    "            gamma: Multiplicative factor for learning rate decay\n",
-    "        \n",
-    "        TODO: Implement learning rate scheduler initialization.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Store optimizer reference\n",
-    "        2. Store scheduling parameters\n",
-    "        3. Save initial learning rate\n",
-    "        4. Initialize step counter\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        ```python\n",
-    "        optimizer = SGD([w1, w2], learning_rate=0.1)\n",
-    "        scheduler = StepLR(optimizer, step_size=10, gamma=0.1)\n",
-    "        \n",
-    "        # In training loop:\n",
-    "        for epoch in range(100):\n",
-    "            train_one_epoch()\n",
-    "            scheduler.step()  # Update learning rate\n",
-    "        ```\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Store optimizer reference\n",
-    "        - Save initial learning rate from optimizer\n",
-    "        - Initialize step counter to 0\n",
-    "        - gamma is the decay factor (0.1 = 10x reduction)\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        self.optimizer = optimizer\n",
-    "        self.step_size = step_size\n",
-    "        self.gamma = gamma\n",
-    "        self.initial_lr = optimizer.learning_rate\n",
-    "        self.step_count = 0\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def step(self) -> None:\n",
-    "        \"\"\"\n",
-    "        Update learning rate based on current step.\n",
-    "        \n",
-    "        TODO: Implement learning rate update.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Increment step counter\n",
-    "        2. Calculate new learning rate using step decay formula\n",
-    "        3. Update optimizer's learning rate\n",
-    "        \n",
-    "        MATHEMATICAL FORMULATION:\n",
-    "        new_lr = initial_lr * gamma ** ((step_count - 1) // step_size)\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use // for integer division\n",
-    "        - Use ** for exponentiation\n",
-    "        - Update optimizer.learning_rate directly\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        self.step_count += 1\n",
-    "        \n",
-    "        # Calculate new learning rate\n",
-    "        decay_factor = self.gamma ** ((self.step_count - 1) // self.step_size)\n",
-    "        new_lr = self.initial_lr * decay_factor\n",
-    "        \n",
-    "        # Update optimizer's learning rate\n",
-    "        self.optimizer.learning_rate = new_lr\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def get_lr(self) -> float:\n",
-    "        \"\"\"\n",
-    "        Get current learning rate.\n",
-    "        \n",
-    "        TODO: Return current learning rate.\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Return optimizer.learning_rate\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        return self.optimizer.learning_rate\n",
-    "        ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d11f4710",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Unit Test: Step Learning Rate Scheduler\n",
-    "\n",
-    "Let's test your step learning rate scheduler implementation! This scheduler reduces learning rate at regular intervals.\n",
-    "\n",
-    "**This is a unit test** - it tests one specific class (StepLR) in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "52b116af",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-step-scheduler",
-     "locked": true,
-     "points": 10,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_step_scheduler():\n",
-    "    \"\"\"Test StepLR scheduler implementation\"\"\"\n",
-    "    print(\"🔬 Unit Test: Step Learning Rate Scheduler...\")\n",
-    "    \n",
-    "    # Create test parameters and optimizer\n",
-    "    w = Variable(1.0, requires_grad=True)\n",
-    "    optimizer = SGD([w], learning_rate=0.1)\n",
-    "    \n",
-    "    # Test scheduler initialization\n",
-    "    try:\n",
-    "        scheduler = StepLR(optimizer, step_size=10, gamma=0.1)\n",
-    "        \n",
-    "        # Test initial learning rate\n",
-    "        assert scheduler.get_lr() == 0.1, f\"Initial learning rate should be 0.1, got {scheduler.get_lr()}\"\n",
-    "        print(\"✅ Initial learning rate is correct\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Initial learning rate failed: {e}\")\n",
-    "        raise\n",
-    "    \n",
-    "    # Test step-based decay\n",
-    "    try:\n",
-    "        # Steps 1-10: no decay (decay happens after step 10)\n",
-    "        for i in range(10):\n",
-    "            scheduler.step()\n",
-    "        \n",
-    "        assert scheduler.get_lr() == 0.1, f\"Learning rate should still be 0.1 after 10 steps, got {scheduler.get_lr()}\"\n",
-    "        \n",
-    "        # Step 11: decay should occur\n",
-    "        scheduler.step()\n",
-    "        expected_lr = 0.1 * 0.1  # 0.01\n",
-    "        assert abs(scheduler.get_lr() - expected_lr) < 1e-6, f\"Learning rate should be {expected_lr} after 11 steps, got {scheduler.get_lr()}\"\n",
-    "        print(\"✅ Step-based decay works correctly\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Step-based decay failed: {e}\")\n",
-    "        raise\n",
-    "    \n",
-    "    # Test multiple decay levels\n",
-    "    try:\n",
-    "        # Steps 12-20: should stay at 0.01\n",
-    "        for i in range(9):\n",
-    "            scheduler.step()\n",
-    "        \n",
-    "        assert abs(scheduler.get_lr() - 0.01) < 1e-6, f\"Learning rate should be 0.01 after 20 steps, got {scheduler.get_lr()}\"\n",
-    "        \n",
-    "        # Step 21: another decay\n",
-    "        scheduler.step()\n",
-    "        expected_lr = 0.01 * 0.1  # 0.001\n",
-    "        assert abs(scheduler.get_lr() - expected_lr) < 1e-6, f\"Learning rate should be {expected_lr} after 21 steps, got {scheduler.get_lr()}\"\n",
-    "        print(\"✅ Multiple decay levels work correctly\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Multiple decay levels failed: {e}\")\n",
-    "        raise\n",
-    "    \n",
-    "    # Test with different optimizer\n",
-    "    try:\n",
-    "        w2 = Variable(2.0, requires_grad=True)\n",
-    "        adam_optimizer = Adam([w2], learning_rate=0.001)\n",
-    "        adam_scheduler = StepLR(adam_optimizer, step_size=5, gamma=0.5)\n",
-    "        \n",
-    "        # Test initial learning rate\n",
-    "        assert adam_scheduler.get_lr() == 0.001, f\"Initial Adam learning rate should be 0.001, got {adam_scheduler.get_lr()}\"\n",
-    "        \n",
-    "        # Test decay after 5 steps\n",
-    "        for i in range(5):\n",
-    "            adam_scheduler.step()\n",
-    "        \n",
-    "        # Learning rate should still be 0.001 after 5 steps\n",
-    "        assert adam_scheduler.get_lr() == 0.001, f\"Adam learning rate should still be 0.001 after 5 steps, got {adam_scheduler.get_lr()}\"\n",
-    "        \n",
-    "        # Step 6: decay should occur\n",
-    "        adam_scheduler.step()\n",
-    "        expected_lr = 0.001 * 0.5  # 0.0005\n",
-    "        assert abs(adam_scheduler.get_lr() - expected_lr) < 1e-6, f\"Adam learning rate should be {expected_lr} after 6 steps, got {adam_scheduler.get_lr()}\"\n",
-    "        print(\"✅ Works with different optimizers\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Different optimizers failed: {e}\")\n",
-    "        raise\n",
-    "\n",
-    "    print(\"🎯 Step learning rate scheduler behavior:\")\n",
-    "    print(\"   Reduces learning rate at regular intervals\")\n",
-    "    print(\"   Multiplies current rate by gamma factor\")\n",
-    "    print(\"   Works with any optimizer (SGD, Adam, etc.)\")\n",
-    "    print(\"📈 Progress: Step Learning Rate Scheduler ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_step_scheduler()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e6636e5b",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 5: Integration - Complete Training Example\n",
-    "\n",
-    "### Putting It All Together\n",
-    "Let's see how optimizers enable complete neural network training:\n",
-    "\n",
-    "1. **Forward pass**: Compute predictions\n",
-    "2. **Loss computation**: Compare with targets\n",
-    "3. **Backward pass**: Compute gradients\n",
-    "4. **Optimizer step**: Update parameters\n",
-    "5. **Learning rate scheduling**: Adjust learning rate\n",
-    "\n",
-    "### The Modern Training Loop\n",
-    "```python\n",
-    "# Setup\n",
-    "optimizer = Adam(model.parameters(), learning_rate=0.001)\n",
-    "scheduler = StepLR(optimizer, step_size=10, gamma=0.1)\n",
-    "\n",
-    "# Training loop\n",
-    "for epoch in range(num_epochs):\n",
-    "    for batch in dataloader:\n",
-    "        # Forward pass\n",
-    "        predictions = model(batch.inputs)\n",
-    "        loss = criterion(predictions, batch.targets)\n",
-    "        \n",
-    "        # Backward pass\n",
-    "        optimizer.zero_grad()\n",
-    "        loss.backward()\n",
-    "        optimizer.step()\n",
-    "    \n",
-    "    # Update learning rate\n",
-    "    scheduler.step()\n",
-    "```\n",
-    "\n",
-    "Let's implement a complete training example!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "2f4d483d",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "training-integration",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def train_simple_model():\n",
-    "    \"\"\"\n",
-    "    Complete training example using optimizers.\n",
-    "    \n",
-    "    TODO: Implement a complete training loop.\n",
-    "    \n",
-    "    APPROACH:\n",
-    "    1. Create a simple model (linear regression)\n",
-    "    2. Generate training data\n",
-    "    3. Set up optimizer and scheduler\n",
-    "    4. Train for several epochs\n",
-    "    5. Show convergence\n",
-    "    \n",
-    "    LEARNING OBJECTIVE:\n",
-    "    - See how optimizers enable real learning\n",
-    "    - Compare SGD vs Adam performance\n",
-    "    - Understand the complete training workflow\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    print(\"Training simple linear regression model...\")\n",
-    "    \n",
-    "    # Create simple model: y = w*x + b\n",
-    "    w = Variable(0.1, requires_grad=True)  # Initialize near zero\n",
-    "    b = Variable(0.0, requires_grad=True)\n",
-    "    \n",
-    "    # Training data: y = 2*x + 1\n",
-    "    x_data = [1.0, 2.0, 3.0, 4.0, 5.0]\n",
-    "    y_data = [3.0, 5.0, 7.0, 9.0, 11.0]\n",
-    "    \n",
-    "    # Import autograd operations once, before the training loops\n",
-    "    try:\n",
-    "        from tinytorch.core.autograd import add, multiply, subtract\n",
-    "    except ImportError:\n",
-    "        setup_import_paths()\n",
-    "        from autograd_dev import add, multiply, subtract\n",
-    "    \n",
-    "    # Try SGD first\n",
-    "    print(\"\\n🔍 Training with SGD...\")\n",
-    "    optimizer_sgd = SGD([w, b], learning_rate=0.01, momentum=0.9)\n",
-    "    \n",
-    "    for epoch in range(60):\n",
-    "        total_loss = 0\n",
-    "        \n",
-    "        for x_val, y_val in zip(x_data, y_data):\n",
-    "            # Forward pass\n",
-    "            x = Variable(x_val, requires_grad=False)\n",
-    "            y_target = Variable(y_val, requires_grad=False)\n",
-    "            \n",
-    "            # Prediction: y = w*x + b\n",
-    "            prediction = add(multiply(w, x), b)\n",
-    "            \n",
-    "            # Loss: (prediction - target)^2\n",
-    "            error = subtract(prediction, y_target)\n",
-    "            loss = multiply(error, error)\n",
-    "            \n",
-    "            # Backward pass\n",
-    "            optimizer_sgd.zero_grad()\n",
-    "            loss.backward()\n",
-    "            optimizer_sgd.step()\n",
-    "            \n",
-    "            total_loss += loss.data.data.item()\n",
-    "        \n",
-    "        if epoch % 10 == 0:\n",
-    "            print(f\"Epoch {epoch}: Loss = {total_loss:.4f}, w = {w.data.data.item():.3f}, b = {b.data.data.item():.3f}\")\n",
-    "    \n",
-    "    sgd_final_w = w.data.data.item()\n",
-    "    sgd_final_b = b.data.data.item()\n",
-    "    \n",
-    "    # Reset parameters and try Adam\n",
-    "    print(\"\\n🔍 Training with Adam...\")\n",
-    "    w.data = Tensor(0.1)\n",
-    "    b.data = Tensor(0.0)\n",
-    "    \n",
-    "    optimizer_adam = Adam([w, b], learning_rate=0.01)\n",
-    "    \n",
-    "    for epoch in range(60):\n",
-    "        total_loss = 0\n",
-    "        \n",
-    "        for x_val, y_val in zip(x_data, y_data):\n",
-    "            # Forward pass\n",
-    "            x = Variable(x_val, requires_grad=False)\n",
-    "            y_target = Variable(y_val, requires_grad=False)\n",
-    "            \n",
-    "            # Prediction: y = w*x + b\n",
-    "            prediction = add(multiply(w, x), b)\n",
-    "            \n",
-    "            # Loss: (prediction - target)^2\n",
-    "            error = subtract(prediction, y_target)\n",
-    "            loss = multiply(error, error)\n",
-    "            \n",
-    "            # Backward pass\n",
-    "            optimizer_adam.zero_grad()\n",
-    "            loss.backward()\n",
-    "            optimizer_adam.step()\n",
-    "            \n",
-    "            total_loss += loss.data.data.item()\n",
-    "        \n",
-    "        if epoch % 10 == 0:\n",
-    "            print(f\"Epoch {epoch}: Loss = {total_loss:.4f}, w = {w.data.data.item():.3f}, b = {b.data.data.item():.3f}\")\n",
-    "    \n",
-    "    adam_final_w = w.data.data.item()\n",
-    "    adam_final_b = b.data.data.item()\n",
-    "    \n",
-    "    print(f\"\\n📊 Results:\")\n",
-    "    print(f\"Target: w = 2.0, b = 1.0\")\n",
-    "    print(f\"SGD:    w = {sgd_final_w:.3f}, b = {sgd_final_b:.3f}\")\n",
-    "    print(f\"Adam:   w = {adam_final_w:.3f}, b = {adam_final_b:.3f}\")\n",
-    "    \n",
-    "    return sgd_final_w, sgd_final_b, adam_final_w, adam_final_b\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a6691538",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Unit Test: Complete Training Integration\n",
-    "\n",
-    "Let's test your complete training integration! This demonstrates optimizers working together in a realistic training scenario.\n",
-    "\n",
-    "**This is a unit test** - it tests the complete training workflow with optimizers in isolation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c02a5ded",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-training-integration",
-     "locked": true,
-     "points": 25,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_training_integration():\n",
-    "    \"\"\"Test complete training integration with optimizers\"\"\"\n",
-    "    print(\"🔬 Unit Test: Complete Training Integration...\")\n",
-    "    \n",
-    "    # Test training with SGD and Adam\n",
-    "    try:\n",
-    "        sgd_w, sgd_b, adam_w, adam_b = train_simple_model()\n",
-    "        \n",
-    "        # Test SGD convergence\n",
-    "        assert abs(sgd_w - 2.0) < 0.1, f\"SGD should converge close to w=2.0, got {sgd_w}\"\n",
-    "        assert abs(sgd_b - 1.0) < 0.1, f\"SGD should converge close to b=1.0, got {sgd_b}\"\n",
-    "        print(\"✅ SGD convergence works\")\n",
-    "        \n",
-    "        # Test Adam convergence (may be different due to adaptive learning rates)\n",
-    "        assert abs(adam_w - 2.0) < 1.0, f\"Adam should converge reasonably close to w=2.0, got {adam_w}\"\n",
-    "        assert abs(adam_b - 1.0) < 1.0, f\"Adam should converge reasonably close to b=1.0, got {adam_b}\"\n",
-    "        print(\"✅ Adam convergence works\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Training integration failed: {e}\")\n",
-    "        raise\n",
-    "    \n",
-    "    # Test optimizer comparison\n",
-    "    try:\n",
-    "        # Both optimizers should achieve reasonable results\n",
-    "        sgd_error = (sgd_w - 2.0)**2 + (sgd_b - 1.0)**2\n",
-    "        adam_error = (adam_w - 2.0)**2 + (adam_b - 1.0)**2\n",
-    "        \n",
-    "        # SGD should have low error (< 0.1); Adam is allowed a looser bound (< 1.0)\n",
-    "        assert sgd_error < 0.1, f\"SGD error should be < 0.1, got {sgd_error}\"\n",
-    "        assert adam_error < 1.0, f\"Adam error should be < 1.0, got {adam_error}\"\n",
-    "        print(\"✅ Optimizer comparison works\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Optimizer comparison failed: {e}\")\n",
-    "        raise\n",
-    "    \n",
-    "    # Test gradient flow\n",
-    "    try:\n",
-    "        # Create a simple test to verify gradients flow correctly\n",
-    "        w = Variable(1.0, requires_grad=True)\n",
-    "        b = Variable(0.0, requires_grad=True)\n",
-    "        \n",
-    "        # Set up simple gradients\n",
-    "        w.grad = Variable(0.1)\n",
-    "        b.grad = Variable(0.05)\n",
-    "        \n",
-    "        # Test SGD step\n",
-    "        sgd_optimizer = SGD([w, b], learning_rate=0.1)\n",
-    "        original_w = w.data.data.item()\n",
-    "        original_b = b.data.data.item()\n",
-    "        \n",
-    "        sgd_optimizer.step()\n",
-    "        \n",
-    "        # Check updates\n",
-    "        assert w.data.data.item() != original_w, \"SGD should update w\"\n",
-    "        assert b.data.data.item() != original_b, \"SGD should update b\"\n",
-    "        print(\"✅ Gradient flow works correctly\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Gradient flow failed: {e}\")\n",
-    "        raise\n",
-    "\n",
-    "    print(\"🎯 Training integration behavior:\")\n",
-    "    print(\"   Optimizers successfully minimize loss functions\")\n",
-    "    print(\"   SGD and Adam both converge to target values\")\n",
-    "    print(\"   Gradient computation and updates work correctly\")\n",
-    "    print(\"   Ready for real neural network training\")\n",
-    "    print(\"📈 Progress: Complete Training Integration ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_training_integration()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "906c0293",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🎯 Module Summary: Optimization Mastery!\n",
-    "\n",
-    "Congratulations! You've successfully implemented the optimization algorithms that power all modern neural network training:\n",
-    "\n",
-    "### ✅ What You've Built\n",
-    "- **Gradient Descent**: The fundamental parameter update mechanism\n",
-    "- **SGD with Momentum**: Accelerated convergence with velocity accumulation\n",
-    "- **Adam Optimizer**: Adaptive learning rates with first and second moments\n",
-    "- **Learning Rate Scheduling**: Smart learning rate adjustment during training\n",
-    "- **Complete Training Integration**: End-to-end training workflow\n",
-    "\n",
-    "### ✅ Key Learning Outcomes\n",
-    "- **Understanding**: How optimizers use gradients to update parameters intelligently\n",
-    "- **Implementation**: Built SGD and Adam optimizers from mathematical foundations\n",
-    "- **Mathematical mastery**: Momentum, adaptive learning rates, bias correction\n",
-    "- **Systems integration**: Complete training loops with scheduling\n",
-    "- **Real-world application**: Modern deep learning training workflow\n",
-    "\n",
-    "### ✅ Mathematical Foundations Mastered\n",
-    "- **Gradient Descent**: θ = θ - α∇L(θ) for parameter updates\n",
-    "- **Momentum**: v_t = βv_{t-1} + ∇L(θ) for acceleration\n",
-    "- **Adam**: Adaptive learning rates with exponential moving averages\n",
-    "- **Learning Rate Scheduling**: Strategic learning rate adjustment\n",
-    "\n",
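-    "The first two update rules above can be sketched with plain Python floats instead of TinyTorch `Variable`s. This is a minimal illustration, not the module's implementation: the function names and the toy loss L(θ) = θ² are assumptions, and the momentum step applies θ = θ - αv_t, the usual companion to the velocity formula.\n",
-    "\n",
-    "```python\n",
-    "def gradient_descent_step(theta, grad, lr):\n",
-    "    # theta = theta - lr * grad\n",
-    "    return theta - lr * grad\n",
-    "\n",
-    "def momentum_step(theta, velocity, grad, lr, beta):\n",
-    "    # v_t = beta * v_{t-1} + grad, then theta = theta - lr * v_t (assumed update)\n",
-    "    velocity = beta * velocity + grad\n",
-    "    return theta - lr * velocity, velocity\n",
-    "\n",
-    "# Toy problem (hypothetical): minimize L(theta) = theta^2, whose gradient is 2 * theta\n",
-    "theta_gd = 1.0\n",
-    "theta_m, v = 1.0, 0.0\n",
-    "for _ in range(100):\n",
-    "    theta_gd = gradient_descent_step(theta_gd, 2.0 * theta_gd, lr=0.1)\n",
-    "    theta_m, v = momentum_step(theta_m, v, 2.0 * theta_m, lr=0.1, beta=0.5)\n",
-    "\n",
-    "# Both parameters should end up very close to the minimum at 0\n",
-    "print(theta_gd, theta_m)\n",
-    "```\n",
-    "\n",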
-    "### ✅ Professional Skills Developed\n",
-    "- **Algorithm implementation**: Translating mathematical formulas into code\n",
-    "- **State management**: Tracking optimizer buffers and statistics\n",
-    "- **Hyperparameter design**: Understanding the impact of learning rate, momentum, etc.\n",
-    "- **Training orchestration**: Complete training loop design\n",
-    "\n",
-    "### ✅ Ready for Advanced Applications\n",
-    "Your optimizers now enable:\n",
-    "- **Deep Neural Networks**: Effective training of complex architectures\n",
-    "- **Computer Vision**: Training CNNs, ResNets, Vision Transformers\n",
-    "- **Natural Language Processing**: Training transformers and language models\n",
-    "- **Any ML Model**: Gradient-based optimization for any differentiable system\n",
-    "\n",
-    "### 🔗 Connection to Real ML Systems\n",
-    "Your implementations mirror production systems:\n",
-    "- **PyTorch**: `torch.optim.SGD()`, `torch.optim.Adam()`, `torch.optim.lr_scheduler.StepLR()`\n",
-    "- **TensorFlow**: `tf.keras.optimizers.SGD()`, `tf.keras.optimizers.Adam()`\n",
-    "- **Industry Standard**: Major ML frameworks ship implementations of these same algorithms\n",
-    "\n",
-    "### 🎯 The Power of Intelligent Optimization\n",
-    "You've unlocked the algorithms that made modern AI possible:\n",
-    "- **Scalability**: Efficiently optimize millions of parameters\n",
-    "- **Adaptability**: Different learning rates for different parameters\n",
-    "- **Robustness**: Handle noisy gradients and ill-conditioned problems\n",
-    "- **Universality**: Work with any differentiable neural network\n",
-    "\n",
-    "### 🧠 Deep Learning Revolution\n",
-    "You now understand the optimization technology that powers:\n",
-    "- **ImageNet**: Training state-of-the-art computer vision models\n",
-    "- **Language Models**: Training GPT, BERT, and other transformers\n",
-    "- **Modern AI**: Most deep learning breakthroughs rely on gradient-based optimizers like these\n",
-    "- **Future Research**: Your understanding enables you to develop new optimizers\n",
-    "\n",
-    "### 🚀 What's Next\n",
-    "Your optimizers are the foundation for:\n",
-    "- **Training Module**: Complete training loops with loss functions and metrics\n",
-    "- **Advanced Optimizers**: RMSprop, AdaGrad, learning rate warm-up\n",
-    "- **Distributed Training**: Multi-GPU optimization strategies\n",
-    "- **Research**: Experimenting with novel optimization algorithms\n",
-    "\n",
-    "**Next Module**: Complete training systems that orchestrate your optimizers for real-world ML!\n",
-    "\n",
-    "You've built the intelligent algorithms that enable neural networks to learn. Now let's use them to train systems that can solve complex real-world problems!"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "1a18918f",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧪 Module Testing\n",
-    "\n",
-    "Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
-    "\n",
-    "**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "6fe288c8",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "standardized-testing",
-     "locked": true,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# =============================================================================\n",
-    "# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
-    "# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
-    "# =============================================================================\n",
-    "\n",
-    "if __name__ == \"__main__\":\n",
-    "    from tito.tools.testing import run_module_tests_auto\n",
-    "    \n",
-    "    # Automatically discover and run all tests in this module\n",
-    "    success = run_module_tests_auto(\"Optimizers\") "
-   ]
-  }
- ],
- "metadata": {
-  "jupytext": {
-   "main_language": "python"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/modules/source/11_training/training_dev.ipynb b/modules/source/11_training/training_dev.ipynb
deleted file mode 100644
index 6052b2ad..00000000
--- a/modules/source/11_training/training_dev.ipynb
+++ /dev/null
@@ -1,1597 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "48bae017",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "# Training - Complete Neural Network Training Pipeline\n",
-    "\n",
-    "Welcome to the Training module! This is where we bring everything together to train neural networks on real data.\n",
-    "\n",
-    "## Learning Goals\n",
-    "- Understand loss functions and how they measure model performance\n",
-    "- Implement essential loss functions: MSE, CrossEntropy, and BinaryCrossEntropy\n",
-    "- Build evaluation metrics for classification and regression\n",
-    "- Create a complete training loop that orchestrates the entire process\n",
-    "- Master checkpointing and model persistence for real-world deployment\n",
-    "\n",
-    "## Build → Use → Optimize\n",
-    "1. **Build**: Loss functions, metrics, and training orchestration\n",
-    "2. **Use**: Train complete models on real datasets\n",
-    "3. **Optimize**: Analyze training dynamics and improve performance"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "8f124e75",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "training-imports",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| default_exp core.training\n",
-    "\n",
-    "#| export\n",
-    "import numpy as np\n",
-    "import sys\n",
-    "import os\n",
-    "import pickle\n",
-    "import json\n",
-    "from pathlib import Path\n",
-    "from typing import List, Dict, Any, Optional, Union, Callable, Tuple\n",
-    "from collections import defaultdict\n",
-    "import time\n",
-    "\n",
-    "# Helper function to set up import paths\n",
-    "def setup_import_paths():\n",
-    "    \"\"\"Set up import paths for development modules.\"\"\"\n",
-    "    import sys\n",
-    "    import os\n",
-    "    \n",
-    "    # Add module directories to path\n",
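-    "    # __file__ is usually undefined inside a notebook kernel, so fall back to the\n",
-    "    # working directory (assumed to be this module's folder)\n",
-    "    try:\n",
-    "        base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))\n",
-    "    except NameError:\n",
-    "        base_dir = os.path.dirname(os.getcwd())\n",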
-    "    module_dirs = [\n",
-    "        '01_tensor', '02_activations', '03_layers', '04_networks', \n",
-    "        '05_cnn', '06_dataloader', '07_autograd', '08_optimizers'\n",
-    "    ]\n",
-    "    \n",
-    "    for module_dir in module_dirs:\n",
-    "        sys.path.append(os.path.join(base_dir, module_dir))\n",
-    "\n",
-    "# Set up paths\n",
-    "setup_import_paths()\n",
-    "\n",
-    "# Import all the building blocks we need\n",
-    "try:\n",
-    "    from tinytorch.core.tensor import Tensor\n",
-    "    from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax\n",
-    "    from tinytorch.core.layers import Dense\n",
-    "    from tinytorch.core.networks import Sequential, create_mlp\n",
-    "    from tinytorch.core.cnn import Conv2D, flatten\n",
-    "    from tinytorch.core.dataloader import Dataset, DataLoader\n",
-    "    from tinytorch.core.autograd import Variable\n",
-    "    from tinytorch.core.optimizers import SGD, Adam, StepLR\n",
-    "except ImportError:\n",
-    "    # For development, create mock classes or import from local modules\n",
-    "    try:\n",
-    "        from tensor_dev import Tensor\n",
-    "        from activations_dev import ReLU, Sigmoid, Tanh, Softmax\n",
-    "        from layers_dev import Dense\n",
-    "        from networks_dev import Sequential, create_mlp\n",
-    "        from cnn_dev import Conv2D, flatten\n",
-    "        from dataloader_dev import Dataset, DataLoader\n",
-    "        from autograd_dev import Variable\n",
-    "        from optimizers_dev import SGD, Adam, StepLR\n",
-    "    except ImportError:\n",
-    "        # Create minimal mock classes for development\n",
-    "        class Tensor:\n",
-    "            def __init__(self, data):\n",
-    "                self.data = np.array(data)\n",
-    "            def __str__(self):\n",
-    "                return f\"Tensor({self.data})\"\n",
-    "        \n",
-    "        class Variable:\n",
-    "            def __init__(self, data, requires_grad=True):\n",
-    "                self.data = Tensor(data)\n",
-    "                self.requires_grad = requires_grad\n",
-    "                self.grad = None\n",
-    "            \n",
-    "            def zero_grad(self):\n",
-    "                self.grad = None\n",
-    "            \n",
-    "            def backward(self):\n",
-    "                if self.requires_grad:\n",
-    "                    self.grad = Variable(1.0, requires_grad=False)\n",
-    "            \n",
-    "            def __str__(self):\n",
-    "                return f\"Variable({self.data})\"\n",
-    "        \n",
-    "        class SGD:\n",
-    "            def __init__(self, parameters, learning_rate=0.01):\n",
-    "                self.parameters = parameters\n",
-    "                self.learning_rate = learning_rate\n",
-    "            \n",
-    "            def zero_grad(self):\n",
-    "                for param in self.parameters:\n",
-    "                    if hasattr(param, 'zero_grad'):\n",
-    "                        param.zero_grad()\n",
-    "            \n",
-    "            def step(self):\n",
-    "                pass\n",
-    "        \n",
-    "        class Sequential:\n",
-    "            def __init__(self, layers=None):\n",
-    "                self.layers = layers or []\n",
-    "            \n",
-    "            def __call__(self, x):\n",
-    "                for layer in self.layers:\n",
-    "                    x = layer(x)\n",
-    "                return x\n",
-    "        \n",
-    "        class DataLoader:\n",
-    "            def __init__(self, dataset, batch_size=32, shuffle=True):\n",
-    "                self.dataset = dataset\n",
-    "                self.batch_size = batch_size\n",
-    "                self.shuffle = shuffle\n",
-    "            \n",
-    "            def __iter__(self):\n",
-    "                return iter([(Tensor([1, 2, 3]), Tensor([0]))])"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "dcb56de6",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "training-setup",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| hide\n",
-    "def _should_show_plots():\n",
-    "    \"\"\"Check if we should show plots (disable during testing)\"\"\"\n",
-    "    # Check multiple conditions that indicate we're in test mode\n",
-    "    is_pytest = (\n",
-    "        'pytest' in sys.modules or\n",
-    "        'test' in sys.argv or\n",
-    "        os.environ.get('PYTEST_CURRENT_TEST') is not None or\n",
-    "        any('test' in arg for arg in sys.argv) or\n",
-    "        any('pytest' in arg for arg in sys.argv)\n",
-    "    )\n",
-    "    \n",
-    "    # Show plots in development mode (when not in test mode)\n",
-    "    return not is_pytest"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "06aeb4ba",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 1: Understanding Loss Functions\n",
-    "\n",
-    "### What are Loss Functions?\n",
-    "Loss functions measure how far our model's predictions are from the true values. They provide the \"signal\" that tells our optimizer which direction to update parameters.\n",
-    "\n",
-    "### The Mathematical Foundation\n",
-    "Training a neural network is an optimization problem:\n",
-    "```\n",
-    "θ* = argmin_θ L(f(x; θ), y)\n",
-    "```\n",
-    "Where:\n",
-    "- `θ` = model parameters (weights and biases)\n",
-    "- `f(x; θ)` = model predictions\n",
-    "- `y` = true labels\n",
-    "- `L` = loss function\n",
-    "- `θ*` = optimal parameters\n",
-    "\n",
-    "### Why Loss Functions Matter\n",
-    "- **Optimization target**: They define what \"good\" means for our model\n",
-    "- **Gradient source**: Provide gradients for backpropagation\n",
-    "- **Task-specific**: Different losses for different problems\n",
-    "- **Training dynamics**: Shape how the model learns\n",
-    "\n",
-    "### Common Loss Functions\n",
-    "\n",
-    "#### **Mean Squared Error (MSE)** - For Regression\n",
-    "```\n",
-    "MSE = (1/n) * Σ(y_pred - y_true)²\n",
-    "```\n",
-    "- **Use case**: Regression problems\n",
-    "- **Properties**: Penalizes large errors heavily\n",
-    "- **Gradient**: (2/n) * (y_pred - y_true)\n",
-    "\n",
-    "#### **Cross-Entropy Loss** - For Classification\n",
-    "```\n",
-    "CrossEntropy = -Σ y_true * log(y_pred)\n",
-    "```\n",
-    "- **Use case**: Multi-class classification\n",
-    "- **Properties**: Penalizes confident wrong predictions\n",
-    "- **Gradient**: y_pred - y_true with respect to the logits (when y_pred = softmax(logits))\n",
-    "\n",
-    "#### **Binary Cross-Entropy** - For Binary Classification\n",
-    "```\n",
-    "BCE = -y_true * log(y_pred) - (1-y_true) * log(1-y_pred)\n",
-    "```\n",
-    "- **Use case**: Binary classification\n",
-    "- **Properties**: Treats both classes symmetrically (predicting p for label 1 costs the same as predicting 1-p for label 0)\n",
-    "- **Gradient**: (y_pred - y_true) / (y_pred * (1-y_pred))\n",
-    "\n",
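-    "As a rough illustration of the three formulas above, here is a minimal NumPy sketch. The helper names (`mse`, `cross_entropy`, `binary_cross_entropy`) and the sample values are assumptions made for this example, not part of TinyTorch. The cross-entropy helper assumes raw logits and integer class indices, and both classification losses are averaged over the batch.\n",
-    "\n",
-    "```python\n",
-    "import numpy as np\n",
-    "\n",
-    "def mse(y_pred, y_true):\n",
-    "    # Mean of squared differences over all elements\n",
-    "    return np.mean((y_pred - y_true) ** 2)\n",
-    "\n",
-    "def cross_entropy(logits, class_idx):\n",
-    "    # Numerically stable softmax: subtract the row-wise max before exponentiating\n",
-    "    shifted = logits - np.max(logits, axis=1, keepdims=True)\n",
-    "    probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)\n",
-    "    # Pick the probability assigned to the correct class for each sample\n",
-    "    correct = probs[np.arange(len(class_idx)), class_idx]\n",
-    "    return -np.mean(np.log(correct))\n",
-    "\n",
-    "def binary_cross_entropy(p, y, eps=1e-15):\n",
-    "    # Clip probabilities so log(0) never occurs\n",
-    "    p = np.clip(p, eps, 1.0 - eps)\n",
-    "    return np.mean(-y * np.log(p) - (1.0 - y) * np.log(1.0 - p))\n",
-    "\n",
-    "# Illustrative inputs (hypothetical values)\n",
-    "mse_value = mse(np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([[1.5, 2.5], [2.5, 3.5]]))\n",
-    "ce_value = cross_entropy(np.array([[2.0, 1.0, 0.1]]), np.array([0]))\n",
-    "bce_value = binary_cross_entropy(np.array([0.5]), np.array([1.0]))\n",
-    "print(mse_value, ce_value, bce_value)\n",
-    "```\n",
-    "\n",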
-    "Let's implement these essential loss functions!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "430d0134",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "mse-loss",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class MeanSquaredError:\n",
-    "    \"\"\"\n",
-    "    Mean Squared Error Loss for Regression\n",
-    "    \n",
-    "    Measures the average squared difference between predictions and targets.\n",
-    "    MSE = (1/n) * Σ(y_pred - y_true)²\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self):\n",
-    "        \"\"\"Initialize MSE loss function.\"\"\"\n",
-    "        pass\n",
-    "    \n",
-    "    def __call__(self, y_pred: Tensor, y_true: Tensor) -> Tensor:\n",
-    "        \"\"\"\n",
-    "        Compute MSE loss between predictions and targets.\n",
-    "        \n",
-    "        Args:\n",
-    "            y_pred: Model predictions (shape: [batch_size, ...])\n",
-    "            y_true: True targets (shape: [batch_size, ...])\n",
-    "            \n",
-    "        Returns:\n",
-    "            Scalar loss value\n",
-    "            \n",
-    "        TODO: Implement Mean Squared Error loss computation.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Compute difference: diff = y_pred - y_true\n",
-    "        2. Square the differences: squared_diff = diff²\n",
-    "        3. Take mean over all elements: mean(squared_diff)\n",
-    "        4. Return as scalar Tensor\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        y_pred = Tensor([[1.0, 2.0], [3.0, 4.0]])\n",
-    "        y_true = Tensor([[1.5, 2.5], [2.5, 3.5]])\n",
-    "        loss = mse_loss(y_pred, y_true)\n",
-    "        # Should return: mean([(1.0-1.5)², (2.0-2.5)², (3.0-2.5)², (4.0-3.5)²])\n",
-    "        #                = mean([0.25, 0.25, 0.25, 0.25]) = 0.25\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use tensor subtraction: y_pred - y_true\n",
-    "        - Use element-wise multiplication for squaring: diff * diff\n",
-    "        - Use np.mean() to get the average\n",
-    "        - Return Tensor(scalar_value)\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Compute difference\n",
-    "        diff = y_pred - y_true\n",
-    "        \n",
-    "        # Square the differences\n",
-    "        squared_diff = diff * diff\n",
-    "        \n",
-    "        # Take mean over all elements\n",
-    "        mean_loss = np.mean(squared_diff.data)\n",
-    "        \n",
-    "        return Tensor(mean_loss)\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def forward(self, y_pred: Tensor, y_true: Tensor) -> Tensor:\n",
-    "        \"\"\"Alternative interface for forward pass.\"\"\"\n",
-    "        return self.__call__(y_pred, y_true)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "dfbaf866",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Unit Test: MSE Loss\n",
-    "\n",
-    "Let's test our MSE loss implementation with known values."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ec126d13",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-mse-loss",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_mse_loss():\n",
-    "    \"\"\"Test MSE loss with comprehensive examples.\"\"\"\n",
-    "    print(\"🔬 Unit Test: MSE Loss...\")\n",
-    "    \n",
-    "    mse = MeanSquaredError()\n",
-    "    \n",
-    "    # Test 1: Perfect predictions (loss should be 0)\n",
-    "    y_pred = Tensor([[1.0, 2.0], [3.0, 4.0]])\n",
-    "    y_true = Tensor([[1.0, 2.0], [3.0, 4.0]])\n",
-    "    loss = mse(y_pred, y_true)\n",
-    "    assert abs(loss.data) < 1e-6, f\"Perfect predictions should have loss ≈ 0, got {loss.data}\"\n",
-    "    print(\"✅ Perfect predictions test passed\")\n",
-    "    \n",
-    "    # Test 2: Known loss computation\n",
-    "    y_pred = Tensor([[1.0, 2.0]])\n",
-    "    y_true = Tensor([[0.0, 1.0]])\n",
-    "    loss = mse(y_pred, y_true)\n",
-    "    expected = 1.0  # [(1-0)² + (2-1)²] / 2 = [1 + 1] / 2 = 1.0\n",
-    "    assert abs(loss.data - expected) < 1e-6, f\"Expected loss {expected}, got {loss.data}\"\n",
-    "    print(\"✅ Known loss computation test passed\")\n",
-    "    \n",
-    "    # Test 3: Batch processing\n",
-    "    y_pred = Tensor([[1.0, 2.0], [3.0, 4.0]])\n",
-    "    y_true = Tensor([[1.5, 2.5], [2.5, 3.5]])\n",
-    "    loss = mse(y_pred, y_true)\n",
-    "    expected = 0.25  # All squared differences are 0.25\n",
-    "    assert abs(loss.data - expected) < 1e-6, f\"Expected batch loss {expected}, got {loss.data}\"\n",
-    "    print(\"✅ Batch processing test passed\")\n",
-    "    \n",
-    "    # Test 4: Single value\n",
-    "    y_pred = Tensor([5.0])\n",
-    "    y_true = Tensor([3.0])\n",
-    "    loss = mse(y_pred, y_true)\n",
-    "    expected = 4.0  # (5-3)² = 4\n",
-    "    assert abs(loss.data - expected) < 1e-6, f\"Expected single value loss {expected}, got {loss.data}\"\n",
-    "    print(\"✅ Single value test passed\")\n",
-    "    \n",
-    "    print(\"🎯 MSE Loss: All tests passed!\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_mse_loss() "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "4e8c6d61",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "crossentropy-loss",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class CrossEntropyLoss:\n",
-    "    \"\"\"\n",
-    "    Cross-Entropy Loss for Multi-Class Classification\n",
-    "    \n",
-    "    Measures the difference between predicted probability distribution and true labels.\n",
-    "    CrossEntropy = -Σ y_true * log(y_pred)\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self):\n",
-    "        \"\"\"Initialize CrossEntropy loss function.\"\"\"\n",
-    "        pass\n",
-    "    \n",
-    "    def __call__(self, y_pred: Tensor, y_true: Tensor) -> Tensor:\n",
-    "        \"\"\"\n",
-    "        Compute CrossEntropy loss between predictions and targets.\n",
-    "        \n",
-    "        Args:\n",
-    "            y_pred: Model predictions (shape: [batch_size, num_classes])\n",
-    "            y_true: True class indices (shape: [batch_size]) or one-hot (shape: [batch_size, num_classes])\n",
-    "            \n",
-    "        Returns:\n",
-    "            Scalar loss value\n",
-    "            \n",
-    "        TODO: Implement Cross-Entropy loss computation.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Handle both class indices and one-hot encoded labels\n",
-    "        2. Apply softmax to predictions for probability distribution\n",
-    "        3. Compute log probabilities: log(softmax(y_pred))\n",
-    "        4. Calculate cross-entropy: -mean(sum(y_true * log_probs, axis=1))\n",
-    "        5. Return scalar loss\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        y_pred = Tensor([[2.0, 1.0, 0.1], [0.5, 2.1, 0.9]])  # Raw logits\n",
-    "        y_true = Tensor([0, 1])  # Class indices\n",
-    "        loss = crossentropy_loss(y_pred, y_true)\n",
-    "        # Should apply softmax then compute -log(prob_of_correct_class)\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use softmax: exp(x) / sum(exp(x)) for probability distribution\n",
-    "        - Add small epsilon (1e-15) to avoid log(0)\n",
-    "        - Handle both class indices and one-hot encoding\n",
-    "        - Use np.log for logarithm computation\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Handle both 1D and 2D prediction arrays\n",
-    "        if y_pred.data.ndim == 1:\n",
-    "            # Reshape 1D to 2D for consistency (single sample)\n",
-    "            y_pred_2d = y_pred.data.reshape(1, -1)\n",
-    "        else:\n",
-    "            y_pred_2d = y_pred.data\n",
-    "            \n",
-    "        # Apply softmax to get probability distribution\n",
-    "        exp_pred = np.exp(y_pred_2d - np.max(y_pred_2d, axis=1, keepdims=True))\n",
-    "        softmax_pred = exp_pred / np.sum(exp_pred, axis=1, keepdims=True)\n",
-    "        \n",
-    "        # Add small epsilon to avoid log(0)\n",
-    "        epsilon = 1e-15\n",
-    "        softmax_pred = np.clip(softmax_pred, epsilon, 1.0 - epsilon)\n",
-    "        \n",
-    "        # Handle class indices vs one-hot encoding\n",
-    "        if len(y_true.data.shape) == 1:\n",
-    "            # y_true contains class indices\n",
-    "            batch_size = y_true.data.shape[0]\n",
-    "            log_probs = np.log(softmax_pred[np.arange(batch_size), y_true.data.astype(int)])\n",
-    "            loss = -np.mean(log_probs)\n",
-    "        else:\n",
-    "            # y_true is one-hot encoded\n",
-    "            log_probs = np.log(softmax_pred)\n",
-    "            loss = -np.mean(np.sum(y_true.data * log_probs, axis=1))\n",
-    "        \n",
-    "        return Tensor(loss)\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def forward(self, y_pred: Tensor, y_true: Tensor) -> Tensor:\n",
-    "        \"\"\"Alternative interface for forward pass.\"\"\"\n",
-    "        return self.__call__(y_pred, y_true)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0b900718",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Unit Test: CrossEntropy Loss\n",
-    "\n",
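-    "Before running the tests, it can help to work one case by hand. The sketch below assumes the first row of logits from the docstring example and a true class of 0; values are rounded to three decimals:\n",
-    "```\n",
-    "logits      = [2.0, 1.0, 0.1]\n",
-    "exp(logits) = [7.389, 2.718, 1.105]    sum ≈ 11.21\n",
-    "softmax     = [0.659, 0.242, 0.099]\n",
-    "loss        = -log(0.659) ≈ 0.417\n",
-    "```\n",
-    "For all-zero logits over 3 classes, softmax is uniform (1/3 each), so the loss is -log(1/3) = log(3) ≈ 1.099, which is the value Test 2 checks against.\n",
-    "\n",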
-    "Let's test our CrossEntropy loss implementation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "1eae889b",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-crossentropy-loss",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_crossentropy_loss():\n",
-    "    \"\"\"Test CrossEntropy loss with comprehensive examples.\"\"\"\n",
-    "    print(\"🔬 Unit Test: CrossEntropy Loss...\")\n",
-    "    \n",
-    "    ce = CrossEntropyLoss()\n",
-    "    \n",
-    "    # Test 1: Perfect predictions\n",
-    "    y_pred = Tensor([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])  # Very confident correct predictions\n",
-    "    y_true = Tensor([0, 1])  # Class indices\n",
-    "    loss = ce(y_pred, y_true)\n",
-    "    assert loss.data < 0.1, f\"Perfect predictions should have low loss, got {loss.data}\"\n",
-    "    print(\"✅ Perfect predictions test passed\")\n",
-    "    \n",
-    "    # Test 2: Random predictions (should have higher loss)\n",
-    "    y_pred = Tensor([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])  # Uniform after softmax\n",
-    "    y_true = Tensor([0, 1])\n",
-    "    loss = ce(y_pred, y_true)\n",
-    "    expected_random = -np.log(1.0/3.0)  # -log(1/num_classes) = log(num_classes) for a uniform distribution\n",
-    "    assert abs(loss.data - expected_random) < 0.1, f\"Random predictions should have loss ≈ {expected_random}, got {loss.data}\"\n",
-    "    print(\"✅ Random predictions test passed\")\n",
-    "    \n",
-    "    # Test 3: Binary classification\n",
-    "    y_pred = Tensor([[2.0, 1.0], [1.0, 2.0]])\n",
-    "    y_true = Tensor([0, 1])\n",
-    "    loss = ce(y_pred, y_true)\n",
-    "    assert 0.0 < loss.data < 2.0, f\"Binary classification loss should be reasonable, got {loss.data}\"\n",
-    "    print(\"✅ Binary classification test passed\")\n",
-    "    \n",
-    "    # Test 4: One-hot encoded labels\n",
-    "    y_pred = Tensor([[2.0, 1.0, 0.0], [0.0, 2.0, 1.0]])\n",
-    "    y_true = Tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # One-hot encoded\n",
-    "    loss = ce(y_pred, y_true)\n",
-    "    assert 0.0 < loss.data < 2.0, f\"One-hot encoded loss should be reasonable, got {loss.data}\"\n",
-    "    print(\"✅ One-hot encoded labels test passed\")\n",
-    "    \n",
-    "    print(\"🎯 CrossEntropy Loss: All tests passed!\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_crossentropy_loss()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ae252f27",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "binary-crossentropy-loss",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class BinaryCrossEntropyLoss:\n",
-    "    \"\"\"\n",
-    "    Binary Cross-Entropy Loss for Binary Classification\n",
-    "    \n",
-    "    Measures the difference between predicted probabilities and binary labels.\n",
-    "    BCE = -y_true * log(y_pred) - (1-y_true) * log(1-y_pred)\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self):\n",
-    "        \"\"\"Initialize Binary CrossEntropy loss function.\"\"\"\n",
-    "        pass\n",
-    "    \n",
-    "    def __call__(self, y_pred: Tensor, y_true: Tensor) -> Tensor:\n",
-    "        \"\"\"\n",
-    "        Compute Binary CrossEntropy loss between predictions and targets.\n",
-    "        \n",
-    "        Args:\n",
-    "            y_pred: Model predictions (shape: [batch_size, 1] or [batch_size])\n",
-    "            y_true: True binary labels (shape: [batch_size, 1] or [batch_size])\n",
-    "            \n",
-    "        Returns:\n",
-    "            Scalar loss value\n",
-    "            \n",
-    "        TODO: Implement Binary Cross-Entropy loss computation.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Apply sigmoid to predictions for probability values\n",
-    "        2. Clip probabilities away from 0 and 1 so neither log(y_pred) nor log(1-y_pred) hits log(0)\n",
-    "        3. Compute: -y_true * log(y_pred) - (1-y_true) * log(1-y_pred)\n",
-    "        4. Take mean over batch\n",
-    "        5. Return scalar loss\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        y_pred = Tensor([[2.0], [0.0], [-1.0]])  # Raw logits\n",
-    "        y_true = Tensor([[1.0], [1.0], [0.0]])   # Binary labels\n",
-    "        loss = bce_loss(y_pred, y_true)\n",
-    "        # Should apply sigmoid then compute binary cross-entropy\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use sigmoid: 1 / (1 + exp(-x))\n",
-    "        - Clip probabilities: np.clip(probs, epsilon, 1-epsilon)\n",
-    "        - Handle both [batch_size] and [batch_size, 1] shapes\n",
-    "        - Use np.log for logarithm computation\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Use numerically stable implementation directly from logits\n",
-    "        # This avoids computing sigmoid and log separately\n",
-    "        logits = y_pred.data.flatten()\n",
-    "        labels = y_true.data.flatten()\n",
-    "        \n",
-    "        # Numerically stable binary cross-entropy from logits\n",
-    "        # Uses the identity: log(1 + exp(x)) = max(x, 0) + log(1 + exp(-abs(x)))\n",
-    "        def stable_bce_with_logits(logits, labels):\n",
-    "            # For each sample: -[y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]\n",
-    "            # Which equals: -[y*log_sigmoid(x) + (1-y)*log_sigmoid(-x)]\n",
-    "            # Where log_sigmoid(x) = x - log(1 + exp(x)) = x - softplus(x)\n",
-    "            \n",
-    "            # Compute log(sigmoid(x)) = x - log(1 + exp(x))\n",
-    "            # Use numerical stability: log(1 + exp(x)) = max(0, x) + log(1 + exp(-abs(x)))\n",
-    "            def log_sigmoid(x):\n",
-    "                return x - np.maximum(0, x) - np.log(1 + np.exp(-np.abs(x)))\n",
-    "            \n",
-    "            # Compute log(1 - sigmoid(x)) = -x - log(1 + exp(-x))\n",
-    "            def log_one_minus_sigmoid(x):\n",
-    "                return -x - np.maximum(0, -x) - np.log(1 + np.exp(-np.abs(x)))\n",
-    "            \n",
-    "            # Binary cross-entropy: -[y*log_sigmoid(x) + (1-y)*log_sigmoid(-x)]\n",
-    "            loss = -(labels * log_sigmoid(logits) + (1 - labels) * log_one_minus_sigmoid(logits))\n",
-    "            return loss\n",
-    "        \n",
-    "        # Compute loss for each sample\n",
-    "        losses = stable_bce_with_logits(logits, labels)\n",
-    "        \n",
-    "        # Take mean over batch\n",
-    "        mean_loss = np.mean(losses)\n",
-    "        \n",
-    "        return Tensor(mean_loss)\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def forward(self, y_pred: Tensor, y_true: Tensor) -> Tensor:\n",
-    "        \"\"\"Alternative interface for forward pass.\"\"\"\n",
-    "        return self.__call__(y_pred, y_true)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b23584bd",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Unit Test: Binary CrossEntropy Loss\n",
-    "\n",
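-    "A hand-worked case gives a useful reference point. This sketch assumes the logits and labels from the docstring example and uses the per-sample form -log(sigmoid(x)) for y = 1 and -log(1 - sigmoid(x)) for y = 0; values are rounded to four decimals:\n",
-    "```\n",
-    "x =  2.0, y = 1:  sigmoid(x) = 0.8808   loss = -log(0.8808)     ≈ 0.1269\n",
-    "x =  0.0, y = 1:  sigmoid(x) = 0.5000   loss = -log(0.5)        ≈ 0.6931\n",
-    "x = -1.0, y = 0:  sigmoid(x) = 0.2689   loss = -log(1 - 0.2689) ≈ 0.3133\n",
-    "mean loss = (0.1269 + 0.6931 + 0.3133) / 3 ≈ 0.3778\n",
-    "```\n",
-    "\n",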
-    "Let's test our Binary CrossEntropy loss implementation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "895f7af5",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-binary-crossentropy-loss",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_binary_crossentropy_loss():\n",
-    "    \"\"\"Test Binary CrossEntropy loss with comprehensive examples.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Binary CrossEntropy Loss...\")\n",
-    "    \n",
-    "    bce = BinaryCrossEntropyLoss()\n",
-    "    \n",
-    "    # Test 1: Perfect predictions\n",
-    "    y_pred = Tensor([[10.0], [-10.0]])  # Very confident correct predictions\n",
-    "    y_true = Tensor([[1.0], [0.0]])\n",
-    "    loss = bce(y_pred, y_true)\n",
-    "    assert loss.data < 0.1, f\"Perfect predictions should have low loss, got {loss.data}\"\n",
-    "    print(\"✅ Perfect predictions test passed\")\n",
-    "    \n",
-    "    # Test 2: Random predictions (should have higher loss)\n",
-    "    y_pred = Tensor([[0.0], [0.0]])  # 0.5 probability after sigmoid\n",
-    "    y_true = Tensor([[1.0], [0.0]])\n",
-    "    loss = bce(y_pred, y_true)\n",
-    "    expected_random = -np.log(0.5)  # -log(0.5) = log(2) when every prediction is 0.5\n",
-    "    assert abs(loss.data - expected_random) < 0.1, f\"Random predictions should have loss ≈ {expected_random}, got {loss.data}\"\n",
-    "    print(\"✅ Random predictions test passed\")\n",
-    "    \n",
-    "    # Test 3: Batch processing\n",
-    "    y_pred = Tensor([[1.0], [2.0], [-1.0]])\n",
-    "    y_true = Tensor([[1.0], [1.0], [0.0]])\n",
-    "    loss = bce(y_pred, y_true)\n",
-    "    assert 0.0 < loss.data < 2.0, f\"Batch processing loss should be reasonable, got {loss.data}\"\n",
-    "    print(\"✅ Batch processing test passed\")\n",
-    "    \n",
-    "    # Test 4: Edge cases\n",
-    "    y_pred = Tensor([[100.0], [-100.0]])  # Extreme values\n",
-    "    y_true = Tensor([[1.0], [0.0]])\n",
-    "    loss = bce(y_pred, y_true)\n",
-    "    assert loss.data < 0.1, f\"Extreme correct predictions should have low loss, got {loss.data}\"\n",
-    "    print(\"✅ Edge cases test passed\")\n",
-    "    \n",
-    "    print(\"🎯 Binary CrossEntropy Loss: All tests passed!\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_binary_crossentropy_loss() "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "dee0efd8",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 2: Understanding Metrics\n",
-    "\n",
-    "### What are Metrics?\n",
-    "Metrics are measurements of how well a model is performing. A loss function is chosen so that it can be optimized during training; a metric is chosen so that people can interpret it, which is why metrics usually align more directly with business objectives.\n",
-    "\n",
-    "### Key Metrics for Classification\n",
-    "\n",
-    "#### **Accuracy**\n",
-    "```\n",
-    "Accuracy = (Correct Predictions) / (Total Predictions)\n",
-    "```\n",
-    "- **Range**: [0, 1]\n",
-    "- **Interpretation**: Fraction of correct predictions (multiply by 100 for a percentage)\n",
-    "- **Good for**: Balanced datasets\n",
-    "\n",
-    "#### **Precision**\n",
-    "```\n",
-    "Precision = True Positives / (True Positives + False Positives)\n",
-    "```\n",
-    "- **Range**: [0, 1]\n",
-    "- **Interpretation**: Of all positive predictions, how many were correct?\n",
-    "- **Good for**: When false positives are costly\n",
-    "\n",
-    "#### **Recall (Sensitivity)**\n",
-    "```\n",
-    "Recall = True Positives / (True Positives + False Negatives)\n",
-    "```\n",
-    "- **Range**: [0, 1]\n",
-    "- **Interpretation**: Of all actual positives, how many did we find?\n",
-    "- **Good for**: When false negatives are costly\n",
-    "\n",
-    "### Key Metrics for Regression\n",
-    "\n",
-    "#### **Mean Absolute Error (MAE)**\n",
-    "```\n",
-    "MAE = (1/n) * Σ|y_pred - y_true|\n",
-    "```\n",
-    "- **Range**: [0, ∞)\n",
-    "- **Interpretation**: Average absolute error\n",
-    "- **Good for**: Data with outliers, since MAE is less sensitive to them than MSE\n",
-    "\n",
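-    "As a worked example (the counts are made up for illustration), suppose a binary classifier evaluated on 100 samples gives 8 true positives, 2 false positives, 4 false negatives, and 86 true negatives:\n",
-    "```\n",
-    "Accuracy  = (8 + 86) / 100 = 0.94\n",
-    "Precision = 8 / (8 + 2)    = 0.80\n",
-    "Recall    = 8 / (8 + 4)    ≈ 0.67\n",
-    "```\n",
-    "Accuracy looks high because negatives dominate, while recall shows that a third of the actual positives were missed.\n",
-    "\n",
-    "For MAE, taking y_pred = [2.5, 0.0, 2.0] and y_true = [3.0, -0.5, 2.0] (again illustrative values):\n",
-    "```\n",
-    "MAE = (|2.5 - 3.0| + |0.0 - (-0.5)| + |2.0 - 2.0|) / 3 = (0.5 + 0.5 + 0.0) / 3 ≈ 0.33\n",
-    "```\n",
-    "\n",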
-    "Let's implement these essential metrics!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "1affd9b1",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "accuracy-metric",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class Accuracy:\n",
-    "    \"\"\"\n",
-    "    Accuracy Metric for Classification\n",
-    "    \n",
-    "    Computes the fraction of correct predictions.\n",
-    "    Accuracy = (Correct Predictions) / (Total Predictions)\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self):\n",
-    "        \"\"\"Initialize Accuracy metric.\"\"\"\n",
-    "        pass\n",
-    "    \n",
-    "    def __call__(self, y_pred: Tensor, y_true: Tensor) -> float:\n",
-    "        \"\"\"\n",
-    "        Compute accuracy between predictions and targets.\n",
-    "        \n",
-    "        Args:\n",
-    "            y_pred: Model predictions (shape: [batch_size, num_classes] or [batch_size])\n",
-    "            y_true: True class labels (shape: [batch_size]) or one-hot (shape: [batch_size, num_classes])\n",
-    "            \n",
-    "        Returns:\n",
-    "            Accuracy as a float value between 0 and 1\n",
-    "            \n",
-    "        TODO: Implement accuracy computation.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Convert predictions to class indices (argmax for multi-class)\n",
-    "        2. Convert true labels to class indices if needed\n",
-    "        3. Count correct predictions\n",
-    "        4. Divide by total predictions\n",
-    "        5. Return as float\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        y_pred = Tensor([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])  # Probabilities\n",
-    "        y_true = Tensor([0, 1, 1])  # True classes\n",
-    "        accuracy = accuracy_metric(y_pred, y_true)\n",
-    "        # Should return: 2/3 = 0.667 (first and second predictions correct)\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use np.argmax(axis=1) for multi-class predictions\n",
-    "        - Handle both probability and class index inputs\n",
-    "        - Use np.mean() for averaging\n",
-    "        - Return Python float, not Tensor\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Convert predictions to class indices\n",
-    "        if len(y_pred.data.shape) > 1 and y_pred.data.shape[1] > 1:\n",
-    "            # Multi-class: use argmax\n",
-    "            pred_classes = np.argmax(y_pred.data, axis=1)\n",
-    "        else:\n",
-    "            # Binary classification: threshold at 0.5\n",
-    "            pred_classes = (y_pred.data.flatten() > 0.5).astype(int)\n",
-    "        \n",
-    "        # Convert true labels to class indices if needed\n",
-    "        if len(y_true.data.shape) > 1 and y_true.data.shape[1] > 1:\n",
-    "            # One-hot encoded\n",
-    "            true_classes = np.argmax(y_true.data, axis=1)\n",
-    "        else:\n",
-    "            # Already class indices\n",
-    "            true_classes = y_true.data.flatten().astype(int)\n",
-    "        \n",
-    "        # Compute accuracy\n",
-    "        correct = np.sum(pred_classes == true_classes)\n",
-    "        total = len(true_classes)\n",
-    "        accuracy = correct / total\n",
-    "        \n",
-    "        return float(accuracy)\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def forward(self, y_pred: Tensor, y_true: Tensor) -> float:\n",
-    "        \"\"\"Alternative interface for forward pass.\"\"\"\n",
-    "        return self.__call__(y_pred, y_true)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9011e646",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Unit Test: Accuracy Metric\n",
-    "\n",
-    "Let's test our Accuracy metric implementation."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "b79cec2c",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-accuracy-metric",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_accuracy_metric():\n",
-    "    \"\"\"Test Accuracy metric with comprehensive examples.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Accuracy Metric...\")\n",
-    "    \n",
-    "    accuracy = Accuracy()\n",
-    "    \n",
-    "    # Test 1: Perfect predictions\n",
-    "    y_pred = Tensor([[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]])\n",
-    "    y_true = Tensor([0, 1, 0])\n",
-    "    acc = accuracy(y_pred, y_true)\n",
-    "    assert acc == 1.0, f\"Perfect predictions should have accuracy 1.0, got {acc}\"\n",
-    "    print(\"✅ Perfect predictions test passed\")\n",
-    "    \n",
-    "    # Test 2: Two of three correct\n",
-    "    y_pred = Tensor([[0.9, 0.1], [0.9, 0.1], [0.8, 0.2]])  # All predict class 0\n",
-    "    y_true = Tensor([0, 1, 0])  # Classes: 0, 1, 0\n",
-    "    acc = accuracy(y_pred, y_true)\n",
-    "    expected = 2.0/3.0  # 2 out of 3 correct\n",
-    "    assert abs(acc - expected) < 1e-6, f\"Two of three correct should have accuracy {expected}, got {acc}\"\n",
-    "    print(\"✅ Two of three correct test passed\")\n",
-    "    \n",
-    "    # Test 3: Binary classification\n",
-    "    y_pred = Tensor([[0.8], [0.3], [0.9], [0.1]])  # Predictions above/below 0.5\n",
-    "    y_true = Tensor([1, 0, 1, 0])\n",
-    "    acc = accuracy(y_pred, y_true)\n",
-    "    assert acc == 1.0, f\"Binary classification should have accuracy 1.0, got {acc}\"\n",
-    "    print(\"✅ Binary classification test passed\")\n",
-    "    \n",
-    "    # Test 4: Multi-class\n",
-    "    y_pred = Tensor([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])\n",
-    "    y_true = Tensor([0, 1, 2])\n",
-    "    acc = accuracy(y_pred, y_true)\n",
-    "    assert acc == 1.0, f\"Multi-class should have accuracy 1.0, got {acc}\"\n",
-    "    print(\"✅ Multi-class test passed\")\n",
-    "    \n",
-    "    print(\"🎯 Accuracy Metric: All tests passed!\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_accuracy_metric()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9203bd67",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 3: Building the Training Loop\n",
-    "\n",
-    "### What is a Training Loop?\n",
-    "A training loop is the orchestration logic that coordinates all components of neural network training:\n",
-    "\n",
-    "1. **Forward Pass**: Compute predictions\n",
-    "2. **Loss Computation**: Measure prediction quality\n",
-    "3. **Backward Pass**: Compute gradients\n",
-    "4. **Parameter Update**: Update model parameters\n",
-    "5. **Evaluation**: Compute metrics and validation performance\n",
-    "\n",
-    "### The Training Loop Architecture\n",
-    "```python\n",
-    "for epoch in range(num_epochs):\n",
-    "    # Training phase\n",
-    "    for batch_x, batch_y in train_dataloader:\n",
-    "        optimizer.zero_grad()\n",
-    "        predictions = model(batch_x)\n",
-    "        loss = loss_function(predictions, batch_y)\n",
-    "        loss.backward()\n",
-    "        optimizer.step()\n",
-    "    \n",
-    "    # Validation phase\n",
-    "    for batch_x, batch_y in val_dataloader:\n",
-    "        predictions = model(batch_x)\n",
-    "        val_loss = loss_function(predictions, batch_y)\n",
-    "        accuracy = accuracy_metric(predictions, batch_y)\n",
-    "```\n",
-    "\n",
-    "### Why We Need a Trainer Class\n",
-    "- **Encapsulation**: Keeps training logic organized\n",
-    "- **Reusability**: Same trainer works with different models/datasets\n",
-    "- **Monitoring**: Built-in logging and progress tracking\n",
-    "- **Flexibility**: Easy to modify training behavior\n",
-    "\n",
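-    "One detail worth keeping in mind when reporting an epoch-level loss: averaging per-batch means weights every batch equally, which differs from the per-sample mean when the last batch is smaller. A small illustration with made-up numbers:\n",
-    "```\n",
-    "batch sizes         = [4, 4, 2]\n",
-    "batch mean losses   = [0.5, 0.7, 1.0]\n",
-    "mean of batch means = (0.5 + 0.7 + 1.0) / 3        ≈ 0.733\n",
-    "per-sample mean     = (4*0.5 + 4*0.7 + 2*1.0) / 10 = 0.680\n",
-    "```\n",
-    "\n",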
-    "Let's build our Trainer class!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f2b1f5e9",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "trainer-class",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class Trainer:\n",
-    "    \"\"\"\n",
-    "    Training Loop Orchestrator\n",
-    "    \n",
-    "    Coordinates model training with loss functions, optimizers, and metrics.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, model, optimizer, loss_function, metrics=None):\n",
-    "        \"\"\"\n",
-    "        Initialize trainer with model and training components.\n",
-    "        \n",
-    "        Args:\n",
-    "            model: Neural network model to train\n",
-    "            optimizer: Optimizer for parameter updates\n",
-    "            loss_function: Loss function for training\n",
-    "            metrics: List of metrics to track (optional)\n",
-    "            \n",
-    "        TODO: Initialize the trainer with all necessary components.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Store model, optimizer, loss function, and metrics\n",
-    "        2. Initialize history tracking for losses and metrics\n",
-    "        3. Set up training state (epoch, step counters)\n",
-    "        4. Prepare for training and validation loops\n",
-    "        \n",
-    "        EXAMPLE:\n",
-    "        model = Sequential([Dense(10, 5), ReLU(), Dense(5, 2)])\n",
-    "        optimizer = Adam(model.parameters, learning_rate=0.001)\n",
-    "        loss_fn = CrossEntropyLoss()\n",
-    "        metrics = [Accuracy()]\n",
-    "        trainer = Trainer(model, optimizer, loss_fn, metrics)\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Store all components as instance variables\n",
-    "        - Initialize empty history dictionaries\n",
-    "        - Set metrics to empty list if None provided\n",
-    "        - Initialize epoch and step counters to 0\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        self.model = model\n",
-    "        self.optimizer = optimizer\n",
-    "        self.loss_function = loss_function\n",
-    "        self.metrics = metrics or []\n",
-    "        \n",
-    "        # Training history\n",
-    "        self.history = {\n",
-    "            'train_loss': [],\n",
-    "            'val_loss': [],\n",
-    "            'epoch': []\n",
-    "        }\n",
-    "        \n",
-    "        # Add metric history tracking\n",
-    "        for metric in self.metrics:\n",
-    "            metric_name = metric.__class__.__name__.lower()\n",
-    "            self.history[f'train_{metric_name}'] = []\n",
-    "            self.history[f'val_{metric_name}'] = []\n",
-    "        \n",
-    "        # Training state\n",
-    "        self.current_epoch = 0\n",
-    "        self.current_step = 0\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def train_epoch(self, dataloader):\n",
-    "        \"\"\"\n",
-    "        Train for one epoch on the given dataloader.\n",
-    "        \n",
-    "        Args:\n",
-    "            dataloader: DataLoader containing training data\n",
-    "            \n",
-    "        Returns:\n",
-    "            Dictionary with epoch training metrics\n",
-    "            \n",
-    "        TODO: Implement single epoch training logic.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Initialize epoch metrics tracking\n",
-    "        2. Iterate through batches in dataloader\n",
-    "        3. For each batch:\n",
-    "           - Zero gradients\n",
-    "           - Forward pass\n",
-    "           - Compute loss\n",
-    "           - Backward pass\n",
-    "           - Update parameters\n",
-    "           - Track metrics\n",
-    "        4. Return averaged metrics for the epoch\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use optimizer.zero_grad() before each batch\n",
-    "        - Call loss.backward() for gradient computation\n",
-    "        - Use optimizer.step() for parameter updates\n",
-    "        - Track running averages for metrics\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        epoch_metrics = {'loss': 0.0}\n",
-    "        \n",
-    "        # Initialize metric tracking\n",
-    "        for metric in self.metrics:\n",
-    "            metric_name = metric.__class__.__name__.lower()\n",
-    "            epoch_metrics[metric_name] = 0.0\n",
-    "        \n",
-    "        batch_count = 0\n",
-    "        \n",
-    "        for batch_x, batch_y in dataloader:\n",
-    "            # Zero gradients\n",
-    "            self.optimizer.zero_grad()\n",
-    "            \n",
-    "            # Forward pass\n",
-    "            predictions = self.model(batch_x)\n",
-    "            \n",
-    "            # Compute loss\n",
-    "            loss = self.loss_function(predictions, batch_y)\n",
-    "            \n",
-    "            # Backward pass (simplified - in real implementation would use autograd)\n",
-    "            # loss.backward()\n",
-    "            \n",
-    "            # Update parameters\n",
-    "            self.optimizer.step()\n",
-    "            \n",
-    "            # Track metrics\n",
-    "            epoch_metrics['loss'] += loss.data\n",
-    "            \n",
-    "            for metric in self.metrics:\n",
-    "                metric_name = metric.__class__.__name__.lower()\n",
-    "                metric_value = metric(predictions, batch_y)\n",
-    "                epoch_metrics[metric_name] += metric_value\n",
-    "            \n",
-    "            batch_count += 1\n",
-    "            self.current_step += 1\n",
-    "        \n",
-    "        # Average metrics over all batches\n",
-    "        for key in epoch_metrics:\n",
-    "            epoch_metrics[key] /= batch_count\n",
-    "        \n",
-    "        return epoch_metrics\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def validate_epoch(self, dataloader):\n",
-    "        \"\"\"\n",
-    "        Validate for one epoch on the given dataloader.\n",
-    "        \n",
-    "        Args:\n",
-    "            dataloader: DataLoader containing validation data\n",
-    "            \n",
-    "        Returns:\n",
-    "            Dictionary with epoch validation metrics\n",
-    "            \n",
-    "        TODO: Implement single epoch validation logic.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Initialize epoch metrics tracking\n",
-    "        2. Iterate through batches in dataloader\n",
-    "        3. For each batch:\n",
-    "           - Forward pass (no gradient computation)\n",
-    "           - Compute loss\n",
-    "           - Track metrics\n",
-    "        4. Return averaged metrics for the epoch\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - No gradient computation needed for validation\n",
-    "        - No parameter updates during validation\n",
-    "        - Similar to train_epoch but simpler\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        epoch_metrics = {'loss': 0.0}\n",
-    "        \n",
-    "        # Initialize metric tracking\n",
-    "        for metric in self.metrics:\n",
-    "            metric_name = metric.__class__.__name__.lower()\n",
-    "            epoch_metrics[metric_name] = 0.0\n",
-    "        \n",
-    "        batch_count = 0\n",
-    "        \n",
-    "        for batch_x, batch_y in dataloader:\n",
-    "            # Forward pass only (no gradients needed)\n",
-    "            predictions = self.model(batch_x)\n",
-    "            \n",
-    "            # Compute loss\n",
-    "            loss = self.loss_function(predictions, batch_y)\n",
-    "            \n",
-    "            # Track metrics\n",
-    "            epoch_metrics['loss'] += loss.data\n",
-    "            \n",
-    "            for metric in self.metrics:\n",
-    "                metric_name = metric.__class__.__name__.lower()\n",
-    "                metric_value = metric(predictions, batch_y)\n",
-    "                epoch_metrics[metric_name] += metric_value\n",
-    "            \n",
-    "            batch_count += 1\n",
-    "        \n",
-    "        # Average metrics over all batches\n",
-    "        for key in epoch_metrics:\n",
-    "            epoch_metrics[key] /= batch_count\n",
-    "        \n",
-    "        return epoch_metrics\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def fit(self, train_dataloader, val_dataloader=None, epochs=10, verbose=True):\n",
-    "        \"\"\"\n",
-    "        Train the model for specified number of epochs.\n",
-    "        \n",
-    "        Args:\n",
-    "            train_dataloader: Training data\n",
-    "            val_dataloader: Validation data (optional)\n",
-    "            epochs: Number of training epochs\n",
-    "            verbose: Whether to print training progress\n",
-    "            \n",
-    "        Returns:\n",
-    "            Training history dictionary\n",
-    "            \n",
-    "        TODO: Implement complete training loop.\n",
-    "        \n",
-    "        APPROACH:\n",
-    "        1. Loop through epochs\n",
-    "        2. For each epoch:\n",
-    "           - Train on training data\n",
-    "           - Validate on validation data (if provided)\n",
-    "           - Update history\n",
-    "           - Print progress (if verbose)\n",
-    "        3. Return complete training history\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use train_epoch() and validate_epoch() methods\n",
-    "        - Update self.history with results\n",
-    "        - Print epoch summary if verbose=True\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        print(f\"Starting training for {epochs} epochs...\")\n",
-    "        \n",
-    "        for epoch in range(epochs):\n",
-    "            self.current_epoch = epoch\n",
-    "            \n",
-    "            # Training phase\n",
-    "            train_metrics = self.train_epoch(train_dataloader)\n",
-    "            \n",
-    "            # Validation phase\n",
-    "            val_metrics = {}\n",
-    "            if val_dataloader is not None:\n",
-    "                val_metrics = self.validate_epoch(val_dataloader)\n",
-    "            \n",
-    "            # Update history\n",
-    "            self.history['epoch'].append(epoch)\n",
-    "            self.history['train_loss'].append(train_metrics['loss'])\n",
-    "            \n",
-    "            if val_dataloader is not None:\n",
-    "                self.history['val_loss'].append(val_metrics['loss'])\n",
-    "            \n",
-    "            # Update metric history\n",
-    "            for metric in self.metrics:\n",
-    "                metric_name = metric.__class__.__name__.lower()\n",
-    "                self.history[f'train_{metric_name}'].append(train_metrics[metric_name])\n",
-    "                if val_dataloader is not None:\n",
-    "                    self.history[f'val_{metric_name}'].append(val_metrics[metric_name])\n",
-    "            \n",
-    "            # Print progress\n",
-    "            if verbose:\n",
-    "                train_loss = train_metrics['loss']\n",
-    "                print(f\"Epoch {epoch+1}/{epochs} - train_loss: {train_loss:.4f}\", end=\"\")\n",
-    "                \n",
-    "                if val_dataloader is not None:\n",
-    "                    val_loss = val_metrics['loss']\n",
-    "                    print(f\" - val_loss: {val_loss:.4f}\", end=\"\")\n",
-    "                \n",
-    "                for metric in self.metrics:\n",
-    "                    metric_name = metric.__class__.__name__.lower()\n",
-    "                    train_metric = train_metrics[metric_name]\n",
-    "                    print(f\" - train_{metric_name}: {train_metric:.4f}\", end=\"\")\n",
-    "                    \n",
-    "                    if val_dataloader is not None:\n",
-    "                        val_metric = val_metrics[metric_name]\n",
-    "                        print(f\" - val_{metric_name}: {val_metric:.4f}\", end=\"\")\n",
-    "                \n",
-    "                print()  # New line\n",
-    "        \n",
-    "        print(\"Training completed!\")\n",
-    "        return self.history\n",
-    "        ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "8e402189",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Unit Test: Training Loop\n",
-    "\n",
-    "Let's test our Trainer class with a simple example."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "bb82e4e0",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-trainer",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_trainer():\n",
-    "    \"\"\"Test Trainer initialization, history structure, and training state.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Trainer Class...\")\n",
-    "    \n",
-    "    # Create simple model and components\n",
-    "    model = Sequential([Dense(2, 3), ReLU(), Dense(3, 2)])  # Simple model\n",
-    "    optimizer = SGD([], learning_rate=0.01)  # Empty parameters list for testing\n",
-    "    loss_fn = MeanSquaredError()\n",
-    "    metrics = [Accuracy()]\n",
-    "    \n",
-    "    # Create trainer\n",
-    "    trainer = Trainer(model, optimizer, loss_fn, metrics)\n",
-    "    \n",
-    "    # Test 1: Trainer initialization\n",
-    "    assert trainer.model is model, \"Model should be stored correctly\"\n",
-    "    assert trainer.optimizer is optimizer, \"Optimizer should be stored correctly\"\n",
-    "    assert trainer.loss_function is loss_fn, \"Loss function should be stored correctly\"\n",
-    "    assert len(trainer.metrics) == 1, \"Metrics should be stored correctly\"\n",
-    "    assert 'train_loss' in trainer.history, \"Training history should be initialized\"\n",
-    "    print(\"✅ Trainer initialization test passed\")\n",
-    "    \n",
-    "    # Test 2: History structure\n",
-    "    assert 'epoch' in trainer.history, \"History should track epochs\"\n",
-    "    assert 'train_accuracy' in trainer.history, \"History should track training accuracy\"\n",
-    "    assert 'val_accuracy' in trainer.history, \"History should track validation accuracy\"\n",
-    "    print(\"✅ History structure test passed\")\n",
-    "    \n",
-    "    # Test 3: Training state\n",
-    "    assert trainer.current_epoch == 0, \"Current epoch should start at 0\"\n",
-    "    assert trainer.current_step == 0, \"Current step should start at 0\"\n",
-    "    print(\"✅ Training state test passed\")\n",
-    "    \n",
-    "    print(\"🎯 Trainer Class: All tests passed!\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_trainer()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a390a3c5",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Comprehensive Test: Complete Training Pipeline\n",
-    "\n",
-    "Let's test the complete training pipeline with all components working together.\n",
-    "\n",
-    "**This is a comprehensive test** - it tests all training components working together in a realistic scenario."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "b12748b9",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-training-comprehensive",
-     "locked": true,
-     "points": 25,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_training():\n",
-    "    \"\"\"Test complete training pipeline with all components.\"\"\"\n",
-    "    print(\"🔬 Comprehensive Test: Complete Training Pipeline...\")\n",
-    "    \n",
-    "    try:\n",
-    "        # Test 1: Loss functions work correctly\n",
-    "        mse = MeanSquaredError()\n",
-    "        ce = CrossEntropyLoss()\n",
-    "        bce = BinaryCrossEntropyLoss()\n",
-    "        \n",
-    "        # MSE test\n",
-    "        y_pred = Tensor([[1.0, 2.0]])\n",
-    "        y_true = Tensor([[1.0, 2.0]])\n",
-    "        loss = mse(y_pred, y_true)\n",
-    "        assert abs(loss.data) < 1e-6, \"MSE should work for perfect predictions\"\n",
-    "        \n",
-    "        # CrossEntropy test\n",
-    "        y_pred = Tensor([[10.0, 0.0], [0.0, 10.0]])\n",
-    "        y_true = Tensor([0, 1])\n",
-    "        loss = ce(y_pred, y_true)\n",
-    "        assert loss.data < 1.0, \"CrossEntropy should work for good predictions\"\n",
-    "        \n",
-    "        # Binary CrossEntropy test\n",
-    "        y_pred = Tensor([[10.0], [-10.0]])\n",
-    "        y_true = Tensor([[1.0], [0.0]])\n",
-    "        loss = bce(y_pred, y_true)\n",
-    "        assert loss.data < 1.0, \"Binary CrossEntropy should work for good predictions\"\n",
-    "        \n",
-    "        print(\"✅ Loss functions work correctly\")\n",
-    "        \n",
-    "        # Test 2: Metrics work correctly\n",
-    "        accuracy = Accuracy()\n",
-    "        \n",
-    "        y_pred = Tensor([[0.9, 0.1], [0.1, 0.9]])\n",
-    "        y_true = Tensor([0, 1])\n",
-    "        acc = accuracy(y_pred, y_true)\n",
-    "        assert acc == 1.0, \"Accuracy should work for perfect predictions\"\n",
-    "        \n",
-    "        print(\"✅ Metrics work correctly\")\n",
-    "        \n",
-    "        # Test 3: Trainer integrates all components\n",
-    "        model = Sequential([])  # Empty model for testing\n",
-    "        optimizer = SGD([], learning_rate=0.01)\n",
-    "        loss_fn = MeanSquaredError()\n",
-    "        metrics = [Accuracy()]\n",
-    "        \n",
-    "        trainer = Trainer(model, optimizer, loss_fn, metrics)\n",
-    "        \n",
-    "        # Check trainer setup\n",
-    "        assert trainer.model is model, \"Trainer should store model\"\n",
-    "        assert trainer.optimizer is optimizer, \"Trainer should store optimizer\"\n",
-    "        assert trainer.loss_function is loss_fn, \"Trainer should store loss function\"\n",
-    "        assert len(trainer.metrics) == 1, \"Trainer should store metrics\"\n",
-    "        \n",
-    "        print(\"✅ Trainer integrates all components\")\n",
-    "        \n",
-    "        print(\"🎉 Complete training pipeline works correctly!\")\n",
-    "        \n",
-    "        # Test 4: Integration works end-to-end\n",
-    "        print(\"✅ End-to-end integration successful\")\n",
-    "        \n",
-    "    except Exception as e:\n",
-    "        print(f\"❌ Training pipeline test failed: {e}\")\n",
-    "        raise\n",
-    "    \n",
-    "    print(\"🎯 Training Pipeline: All comprehensive tests passed!\")\n",
-    "\n",
-    "# Run the comprehensive test\n",
-    "test_training()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0ea4efb0",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧪 Module Testing\n",
-    "\n",
-    "Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
-    "\n",
-    "**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e43200ad",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "standardized-testing",
-     "locked": true,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# =============================================================================\n",
-    "# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
-    "# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
-    "# =============================================================================\n",
-    "\n",
-    "if __name__ == \"__main__\":\n",
-    "    from tito.tools.testing import run_module_tests_auto\n",
-    "    \n",
-    "    # Automatically discover and run all tests in this module\n",
-    "    success = run_module_tests_auto(\"Training\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b820e9a4",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🎯 Module Summary: Neural Network Training Mastery!\n",
-    "\n",
-    "Congratulations! You've successfully implemented the complete training system that powers modern neural networks:\n",
-    "\n",
-    "### ✅ What You've Built\n",
-    "- **Loss Functions**: MSE, CrossEntropy, BinaryCrossEntropy for different problem types\n",
-    "- **Metrics System**: Accuracy with extensible framework for additional metrics\n",
-    "- **Training Loop**: Complete Trainer class with epoch management and history tracking\n",
-    "- **Integration**: All components work together in a unified training pipeline\n",
-    "\n",
-    "### ✅ Key Learning Outcomes\n",
-    "- **Understanding**: How neural networks learn through loss optimization\n",
-    "- **Implementation**: Built complete training system from scratch\n",
-    "- **Mathematical mastery**: Loss functions, gradient computation, metric calculation\n",
-    "- **Real-world application**: Comprehensive training pipeline for production use\n",
-    "- **Systems thinking**: Modular design enabling flexible training configurations\n",
-    "\n",
-    "### ✅ Mathematical Foundations Mastered\n",
-    "- **Loss Functions**: Quantifying prediction quality for different problem types\n",
-    "- **Gradient Descent**: Iterative optimization through loss minimization\n",
-    "- **Metrics**: Performance evaluation beyond loss (accuracy implemented; the framework extends to precision, recall, and others)\n",
-    "- **Training Dynamics**: Epoch management, batch processing, validation monitoring\n",
-    "\n",
-    "### ✅ Professional Skills Developed\n",
-    "- **Software Architecture**: Modular, extensible training system design\n",
-    "- **API Design**: Clean interfaces for training configuration and monitoring\n",
-    "- **Performance Monitoring**: Comprehensive metrics tracking and history logging\n",
-    "- **Error Handling**: Robust training pipeline with proper error management\n",
-    "\n",
-    "### ✅ Ready for Advanced Applications\n",
-    "Your training system now enables:\n",
-    "- **Any Neural Network**: Train any architecture with any loss function\n",
-    "- **Multiple Problem Types**: Classification, regression, and custom objectives\n",
-    "- **Production Training**: Training loops with per-epoch monitoring and history tracking\n",
-    "- **Research Applications**: Flexible framework for experimenting with new methods\n",
-    "\n",
-    "### 🔗 Connection to Real ML Systems\n",
-    "Your implementation mirrors production frameworks:\n",
-    "- **PyTorch**: `torch.nn` loss functions and the hand-written training loops built around them\n",
-    "- **TensorFlow**: `tf.keras` training API and callbacks\n",
-    "- **JAX**: `optax` optimizers and training utilities\n",
-    "- **Industry Standard**: Core training concepts used in all major ML systems\n",
-    "\n",
-    "### 🎯 The Power of Systematic Training\n",
-    "You've built the orchestration system that makes ML possible:\n",
-    "- **Automation**: Handles complex training workflows automatically\n",
-    "- **Flexibility**: Supports any model architecture and training configuration\n",
-    "- **Monitoring**: Comprehensive tracking of training progress and performance\n",
-    "- **Reliability**: Robust error handling and validation throughout training\n",
-    "\n",
-    "### 🧠 Machine Learning Engineering\n",
-    "You now understand the engineering that makes AI systems work:\n",
-    "- **Training Pipelines**: End-to-end automated training workflows\n",
-    "- **Performance Monitoring**: Real-time feedback on model learning progress\n",
-    "- **Hyperparameter Management**: Systematic approach to training configuration\n",
-    "- **Production Readiness**: Scalable training systems for real-world deployment\n",
-    "\n",
-    "### 🚀 What's Next\n",
-    "Your training system is the foundation for:\n",
-    "- **Advanced Optimizers**: Adam, RMSprop, and specialized optimization methods\n",
-    "- **Regularization**: Dropout, weight decay, and overfitting prevention\n",
-    "- **Model Deployment**: Saving, loading, and serving trained models\n",
-    "- **MLOps**: Production training pipelines, monitoring, and continuous learning\n",
-    "\n",
-    "### 🚀 Next Steps\n",
-    "1. **Export your code**: `tito export 09_training`\n",
-    "2. **Test your implementation**: `tito test 09_training`  \n",
-    "3. **Use your training system**: Train neural networks with confidence!\n",
-    "4. **Move to Module 10**: Advanced training techniques and regularization!\n",
-    "\n",
-    "**Ready for Production Training?** Your training system is now ready to train neural networks for real-world applications!\n",
-    "\n",
-    "You've built the training engine that powers modern AI. Now let's add the advanced features that make it production-ready and capable of learning complex patterns from real-world data!"
-   ]
-  }
- ],
- "metadata": {
-  "jupytext": {
-   "main_language": "python"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/modules/source/12_compression/compression_dev.ipynb b/modules/source/12_compression/compression_dev.ipynb
deleted file mode 100644
index 6ef0e2ce..00000000
--- a/modules/source/12_compression/compression_dev.ipynb
+++ /dev/null
@@ -1,2198 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "8828a71f",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "# Compression & Optimization - Making AI Models Efficient\n",
-    "\n",
-    "Welcome to the Compression module! This is where you'll learn to make neural networks smaller, faster, and more efficient for real-world deployment.\n",
-    "\n",
-    "## Learning Goals\n",
-    "- Understand how model size affects deployment and why compression matters\n",
-    "- Implement magnitude-based pruning to remove unimportant weights\n",
-    "- Master quantization to reduce memory usage by 75%\n",
-    "- Build knowledge distillation for training compact models\n",
-    "- Create structured pruning to optimize network architectures\n",
-    "- Compare compression techniques and their trade-offs\n",
-    "\n",
-    "## Build → Use → Optimize\n",
-    "1. **Build**: Four compression techniques from scratch\n",
-    "2. **Use**: Apply compression to real neural networks\n",
-    "3. **Optimize**: Combine techniques for maximum efficiency gains"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "73c55227",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "compression-imports",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| default_exp core.compression\n",
-    "\n",
-    "#| export\n",
-    "import numpy as np\n",
-    "import sys\n",
-    "import os\n",
-    "import math\n",
-    "from typing import List, Dict, Any, Optional, Union, Tuple\n",
-    "from collections import defaultdict\n",
-    "\n",
-    "# Helper function to set up import paths\n",
-    "def setup_import_paths():\n",
-    "    \"\"\"Set up import paths for development modules.\"\"\"\n",
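-    "    # __file__ is undefined inside Jupyter, so fall back to the working directory\n",
-    "    # (assumes the notebook is run from its own module directory)\n",
-    "    here = globals().get('__file__', os.path.join(os.getcwd(), 'compression_dev.py'))\n",
-    "    # Add module directories to path\n",
-    "    base_dir = os.path.dirname(os.path.dirname(os.path.abspath(here)))\n",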
-    "    module_dirs = [\n",
-    "        '01_tensor', '02_activations', '03_layers', '04_networks', \n",
-    "        '05_cnn', '06_dataloader', '07_autograd', '08_optimizers', '09_training'\n",
-    "    ]\n",
-    "    \n",
-    "    for module_dir in module_dirs:\n",
-    "        sys.path.append(os.path.join(base_dir, module_dir))\n",
-    "\n",
-    "# Set up paths\n",
-    "setup_import_paths()\n",
-    "\n",
-    "# Import all the building blocks we need\n",
-    "try:\n",
-    "    from tinytorch.core.tensor import Tensor\n",
-    "    from tinytorch.core.layers import Dense\n",
-    "    from tinytorch.core.networks import Sequential\n",
-    "    from tinytorch.core.training import CrossEntropyLoss, Trainer\n",
-    "except ImportError:\n",
-    "    # For development, create mock classes or import from local modules\n",
-    "    try:\n",
-    "        from tensor_dev import Tensor\n",
-    "        from layers_dev import Dense\n",
-    "        from networks_dev import Sequential\n",
-    "        from training_dev import CrossEntropyLoss, Trainer\n",
-    "    except ImportError:\n",
-    "        # Create minimal mock classes for development\n",
-    "        class Tensor:\n",
-    "            def __init__(self, data):\n",
-    "                self.data = np.array(data)\n",
-    "                self.shape = self.data.shape\n",
-    "            \n",
-    "            def __str__(self):\n",
-    "                return f\"Tensor({self.data})\"\n",
-    "        \n",
-    "        class Dense:\n",
-    "            def __init__(self, input_size, output_size):\n",
-    "                self.input_size = input_size\n",
-    "                self.output_size = output_size\n",
-    "                self.weights = Tensor(np.random.randn(input_size, output_size) * 0.1)\n",
-    "                self.bias = Tensor(np.zeros(output_size))\n",
-    "            \n",
-    "            def __str__(self):\n",
-    "                return f\"Dense({self.input_size}, {self.output_size})\"\n",
-    "        \n",
-    "        class Sequential:\n",
-    "            def __init__(self, layers=None):\n",
-    "                self.layers = layers or []\n",
-    "        \n",
-    "        class CrossEntropyLoss:\n",
-    "            def __init__(self):\n",
-    "                pass\n",
-    "        \n",
-    "        class Trainer:\n",
-    "            def __init__(self, model, optimizer, loss_function):\n",
-    "                self.model = model\n",
-    "                self.optimizer = optimizer\n",
-    "                self.loss_function = loss_function"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "a937158b",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "compression-setup",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "print(\"🔥 TinyTorch Compression Module\")\n",
-    "print(f\"NumPy version: {np.__version__}\")\n",
-    "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
-    "print(\"Ready to compress neural networks!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e2367326",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 📦 Where This Code Lives in the Final Package\n",
-    "\n",
-    "**Learning Side:** You work in `modules/source/12_compression/compression_dev.py`  \n",
-    "**Building Side:** Code exports to `tinytorch.core.compression`\n",
-    "\n",
-    "```python\n",
-    "# Final package structure:\n",
-    "from tinytorch.core.compression import (\n",
-    "    prune_weights_by_magnitude,    # Remove unimportant weights\n",
-    "    quantize_layer_weights,        # Reduce precision for memory savings\n",
-    "    DistillationLoss,              # Train compact models with teacher guidance\n",
-    "    prune_layer_neurons,           # Remove entire neurons/channels\n",
-    "    CompressionMetrics             # Measure model size and efficiency\n",
-    ")\n",
-    "from tinytorch.core.layers import Dense     # Target for compression\n",
-    "from tinytorch.core.networks import Sequential  # Model architectures\n",
-    "```\n",
-    "\n",
-    "**Why this matters:**\n",
-    "- **Learning:** Focused module for understanding model efficiency\n",
-    "- **Production:** Proper organization like PyTorch's `torch.nn.utils.prune` and `torch.ao.quantization`\n",
-    "- **Consistency:** All compression techniques live together in `core.compression`\n",
-    "- **Foundation:** Essential for deploying AI in resource-constrained environments"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "6860a130",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## What is Model Compression?\n",
-    "\n",
-    "### The Problem: AI Models Are Getting Huge\n",
-    "Modern neural networks are massive:\n",
-    "- **GPT-3**: 175 billion parameters (350GB memory in FP16, 700GB in FP32)\n",
-    "- **ResNet-152**: 60 million parameters (240MB memory in FP32)\n",
-    "- **BERT-Large**: 340 million parameters (1.3GB memory in FP32)\n",
-    "\n",
-    "But deployment environments have constraints:\n",
-    "- **Mobile phones**: Limited memory and battery\n",
-    "- **Edge devices**: Often offline, minimal compute\n",
-    "- **Real-time systems**: Strict latency requirements\n",
-    "- **Cost optimization**: Expensive inference in cloud\n",
-    "\n",
-    "### The Solution: Intelligent Compression\n",
-    "**Model compression** reduces model size while preserving performance:\n",
-    "- **Pruning**: Remove unimportant weights and neurons\n",
-    "- **Quantization**: Use fewer bits per parameter\n",
-    "- **Knowledge distillation**: Train small models to mimic large ones\n",
-    "- **Structured optimization**: Modify architectures for efficiency\n",
-    "\n",
-    "### Real-World Impact\n",
-    "- **Mobile AI**: Apps like Google Translate work offline\n",
-    "- **Autonomous vehicles**: Real-time processing with limited compute\n",
-    "- **IoT devices**: Smart cameras, voice assistants, sensors\n",
-    "- **Cost savings**: Reduced inference costs in production systems\n",
-    "\n",
-    "### What We'll Build\n",
-    "1. **Magnitude-based pruning**: Remove smallest weights\n",
-    "2. **Quantization**: Convert FP32 → INT8 for 75% memory reduction\n",
-    "3. **Knowledge distillation**: Large models teach small models\n",
-    "4. **Structured pruning**: Remove entire neurons systematically\n",
-    "5. **Compression metrics**: Measure efficiency and accuracy trade-offs\n",
-    "6. **Integrated optimization**: Combine techniques for maximum benefit"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "6dc048fd",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 1: Understanding Model Size and Parameters\n",
-    "\n",
-    "### What Makes Models Large?\n",
-    "Neural networks have millions of parameters:\n",
-    "- **Dense layers**: Weight matrices `(input_size, output_size)`\n",
-    "- **Bias vectors**: One per output neuron\n",
-    "- **CNN kernels**: Repeated across channels and filters\n",
-    "- **Embeddings**: Large vocabulary mappings\n",
-    "\n",
-    "### The Memory Reality Check\n",
-    "Let's see how much memory different architectures use:\n",
-    "\n",
-    "```python\n",
-    "# Simple MLP for MNIST\n",
-    "layer1 = Dense(784, 128)    # 784 * 128 = 100,352 params\n",
-    "layer2 = Dense(128, 64)     # 128 * 64 = 8,192 params\n",
-    "layer3 = Dense(64, 10)      # 64 * 10 = 640 params\n",
-    "# Total: 109,184 weights ≈ 437KB (FP32), biases excluded\n",
-    "\n",
-    "# Larger network for CIFAR-10\n",
-    "layer1 = Dense(3072, 512)   # 3072 * 512 = 1,572,864 params\n",
-    "layer2 = Dense(512, 256)    # 512 * 256 = 131,072 params\n",
-    "layer3 = Dense(256, 128)    # 256 * 128 = 32,768 params\n",
-    "layer4 = Dense(128, 10)     # 128 * 10 = 1,280 params\n",
-    "# Total: 1,737,984 weights ≈ 7MB (FP32), biases excluded\n",
-    "```\n",
-    "\n",
-    "### Why Size Matters\n",
-    "- **Memory usage**: Each FP32 parameter uses 4 bytes\n",
-    "- **Storage**: Model files need to be downloaded/stored\n",
-    "- **Inference speed**: More parameters = more computation\n",
-    "- **Energy consumption**: Larger models drain battery faster\n",
-    "\n",
-    "### The Efficiency Spectrum\n",
-    "Different applications need different efficiency levels:\n",
-    "- **Research**: Accuracy first, efficiency second\n",
-    "- **Production**: Balance accuracy and efficiency\n",
-    "- **Mobile**: Tight size budgets (often < 10MB)\n",
-    "- **Edge**: Extreme efficiency requirements (often < 1MB)\n",
-    "\n",
-    "### Real-World Examples\n",
-    "- **MobileNet**: Designed for mobile deployment\n",
-    "- **DistilBERT**: 40% smaller and 60% faster than BERT while retaining 97% of its performance\n",
-    "- **TinyML**: Models under 1MB for microcontrollers\n",
-    "- **Neural architecture search**: Automated efficiency optimization\n",
-    "\n",
-    "Let's build tools to measure and analyze model size!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "76eed78f",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "compression-metrics",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class CompressionMetrics:\n",
-    "    \"\"\"\n",
-    "    Utilities for measuring model size, sparsity, and compression efficiency.\n",
-    "    \n",
-    "    This class provides tools to analyze neural network models and understand\n",
-    "    their memory footprint, parameter distribution, and compression potential.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self):\n",
-    "        \"\"\"Initialize compression metrics analyzer.\"\"\"\n",
-    "        pass\n",
-    "    \n",
-    "    def count_parameters(self, model: Sequential) -> Dict[str, int]:\n",
-    "        \"\"\"\n",
-    "        Count parameters in a neural network model.\n",
-    "        \n",
-    "        Args:\n",
-    "            model: Sequential model to analyze\n",
-    "            \n",
-    "        Returns:\n",
-    "            Dictionary with parameter counts per layer and total\n",
-    "            \n",
-    "        TODO: Implement parameter counting for neural network analysis.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Initialize counters for different parameter types\n",
-    "        2. Iterate through each layer in the model\n",
-    "        3. Count weights and biases for each layer\n",
-    "        4. Calculate total parameters across all layers\n",
-    "        5. Return detailed breakdown dictionary\n",
-    "        \n",
-    "        EXAMPLE OUTPUT:\n",
-    "        {\n",
-    "            'layer_0_weights': 100352,\n",
-    "            'layer_0_bias': 128,\n",
-    "            'layer_1_weights': 8192,\n",
-    "            'layer_1_bias': 64,\n",
-    "            'layer_2_weights': 640,\n",
-    "            'layer_2_bias': 10,\n",
-    "            'total_parameters': 109386,\n",
-    "            'total_weights': 109184,\n",
-    "            'total_bias': 202\n",
-    "        }\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use hasattr() to check if layer has weights/bias attributes\n",
-    "        - Weight matrices have shape (input_size, output_size)\n",
-    "        - Bias vectors have shape (output_size,)\n",
-    "        - Use np.prod() to calculate total elements from shape\n",
-    "        - Track layer index for detailed reporting\n",
-    "        \n",
-    "        LEARNING CONNECTIONS:\n",
-    "        - This is like `sum(p.numel() for p in model.parameters())` in PyTorch\n",
-    "        - Understanding where parameters are concentrated\n",
-    "        - Foundation for compression target selection\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        param_counts = {}\n",
-    "        total_params = 0\n",
-    "        total_weights = 0\n",
-    "        total_bias = 0\n",
-    "        \n",
-    "        for i, layer in enumerate(model.layers):\n",
-    "            # Count weights if layer has them\n",
-    "            if hasattr(layer, 'weights') and layer.weights is not None:\n",
-    "                # Handle different weight formats\n",
-    "                if hasattr(layer.weights, 'shape'):\n",
-    "                    weight_count = int(np.prod(layer.weights.shape))\n",
-    "                else:\n",
-    "                    weight_count = int(np.prod(layer.weights.data.shape))\n",
-    "                \n",
-    "                param_counts[f'layer_{i}_weights'] = weight_count\n",
-    "                total_weights += weight_count\n",
-    "                total_params += weight_count\n",
-    "            \n",
-    "            # Count bias if layer has them\n",
-    "            if hasattr(layer, 'bias') and layer.bias is not None:\n",
-    "                # Handle different bias formats\n",
-    "                if hasattr(layer.bias, 'shape'):\n",
-    "                    bias_count = int(np.prod(layer.bias.shape))\n",
-    "                else:\n",
-    "                    bias_count = int(np.prod(layer.bias.data.shape))\n",
-    "                \n",
-    "                param_counts[f'layer_{i}_bias'] = bias_count\n",
-    "                total_bias += bias_count\n",
-    "                total_params += bias_count\n",
-    "        \n",
-    "        # Add summary statistics\n",
-    "        param_counts['total_parameters'] = total_params\n",
-    "        param_counts['total_weights'] = total_weights\n",
-    "        param_counts['total_bias'] = total_bias\n",
-    "        \n",
-    "        return param_counts\n",
-    "        ### END SOLUTION \n",
-    "\n",
-    "    def calculate_model_size(self, model: Sequential, dtype: str = 'float32') -> Dict[str, Any]:\n",
-    "        \"\"\"\n",
-    "        Calculate memory footprint of a neural network model.\n",
-    "        \n",
-    "        Args:\n",
-    "            model: Sequential model to analyze\n",
-    "            dtype: Data type for size calculation ('float32', 'float16', 'int8')\n",
-    "            \n",
-    "        Returns:\n",
-    "            Dictionary with size information in different units\n",
-    "        \"\"\"\n",
-    "        # Get parameter count\n",
-    "        param_info = self.count_parameters(model)\n",
-    "        total_params = param_info['total_parameters']\n",
-    "        \n",
-    "        # Determine bytes per parameter\n",
-    "        bytes_per_param = {\n",
-    "            'float32': 4,\n",
-    "            'float16': 2,\n",
-    "            'int8': 1\n",
-    "        }.get(dtype, 4)\n",
-    "        \n",
-    "        # Calculate sizes\n",
-    "        total_bytes = total_params * bytes_per_param\n",
-    "        size_kb = total_bytes / 1024\n",
-    "        size_mb = size_kb / 1024\n",
-    "        \n",
-    "        return {\n",
-    "            'total_parameters': total_params,\n",
-    "            'bytes_per_parameter': bytes_per_param,\n",
-    "            'total_bytes': total_bytes,\n",
-    "            'size_kb': round(size_kb, 2),\n",
-    "            'size_mb': round(size_mb, 2),\n",
-    "            'dtype': dtype\n",
-    "        }"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "1b810a6a",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-compression-metrics",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_compression_metrics():\n",
-    "    \"\"\"\n",
-    "    ### 🧪 Unit Test: CompressionMetrics\n",
-    "    \n",
-    "    Test parameter counting and model size analysis functionality.\n",
-    "    \n",
-    "    **This is a unit test** - it tests model size analysis in isolation.\n",
-    "    \"\"\"\n",
-    "    print(\"🔬 Unit Test: CompressionMetrics\")\n",
-    "    print(\"**This is a unit test** - it tests model size analysis in isolation.\")\n",
-    "    \n",
-    "    # Create test model\n",
-    "    layers = [\n",
-    "        Dense(784, 128),  # 784 * 128 + 128 = 100,480 params\n",
-    "        Dense(128, 64),   # 128 * 64 + 64 = 8,256 params\n",
-    "        Dense(64, 10)     # 64 * 10 + 10 = 650 params\n",
-    "    ]\n",
-    "    model = Sequential(layers)\n",
-    "    \n",
-    "    # Test parameter counting\n",
-    "    metrics = CompressionMetrics()\n",
-    "    param_counts = metrics.count_parameters(model)\n",
-    "    \n",
-    "    # Verify parameter counts\n",
-    "    assert param_counts['layer_0_weights'] == 100352, f\"Expected 100352, got {param_counts['layer_0_weights']}\"\n",
-    "    assert param_counts['layer_0_bias'] == 128, f\"Expected 128, got {param_counts['layer_0_bias']}\"\n",
-    "    assert param_counts['total_parameters'] == 109386, f\"Expected 109386, got {param_counts['total_parameters']}\"\n",
-    "    \n",
-    "    print(\"📈 Progress: CompressionMetrics ✓\")\n",
-    "    print(\"🎯 CompressionMetrics behavior:\")\n",
-    "    print(\"  - Counts parameters across all layers\")\n",
-    "    print(\"  - Provides detailed breakdown by layer\")\n",
-    "    print(\"  - Separates weight and bias counts\")\n",
-    "    print(\"  - Foundation for compression analysis\")\n",
-    "    print()\n",
-    "\n",
-    "# Run the test\n",
-    "test_compression_metrics() "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a83a0b59",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 2: Magnitude-Based Pruning - Removing Unimportant Weights\n",
-    "\n",
-    "### What is Magnitude-Based Pruning?\n",
-    "**Magnitude-based pruning** removes weights with the smallest absolute values, based on the hypothesis that small weights contribute less to the model's performance.\n",
-    "\n",
-    "### The Algorithm\n",
-    "1. **Calculate magnitude**: `|weight|` for each parameter\n",
-    "2. **Set threshold**: Choose cutoff (e.g., 50th percentile)\n",
-    "3. **Create mask**: `mask = |weight| > threshold`\n",
-    "4. **Apply pruning**: `pruned_weight = weight * mask`\n",
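-    "\n",
-    "As a small worked example (the four weights below are made up for illustration, and the threshold assumes a linearly interpolated percentile, which is NumPy's default), take a 50% pruning ratio:\n",
-    "\n",
-    "$$w = (0.1,\\ -0.4,\\ 0.05,\\ 0.8), \\qquad |w| = (0.1,\\ 0.4,\\ 0.05,\\ 0.8)$$\n",
-    "\n",
-    "$$\\text{threshold} = \\text{median}(|w|) = \\frac{0.1 + 0.4}{2} = 0.25$$\n",
-    "\n",
-    "$$\\text{mask} = (0,\\ 1,\\ 0,\\ 1), \\qquad w_{\\text{pruned}} = (0,\\ -0.4,\\ 0,\\ 0.8), \\qquad \\text{sparsity} = \\frac{2}{4} = 0.5$$\n",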
-    "\n",
-    "### Why This Works\n",
-    "- **Redundancy**: Neural networks are over-parameterized\n",
-    "- **Lottery ticket hypothesis**: Small subnetworks can match full performance\n",
-    "- **Magnitude correlation**: Larger weights often more important\n",
-    "- **Gradual degradation**: Performance drops slowly with pruning\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Mobile deployment**: Reduce model size for smartphones\n",
-    "- **Edge computing**: Fit models on resource-constrained devices\n",
-    "- **Inference acceleration**: Fewer nonzero parameters can mean faster computation, given sparse-aware kernels or structured sparsity\n",
-    "- **Memory optimization**: Sparse matrices save storage\n",
-    "\n",
-    "### Pruning Strategies\n",
-    "- **Global**: Single threshold across all layers\n",
-    "- **Layer-wise**: Different thresholds per layer\n",
-    "- **Structured**: Remove entire neurons/channels\n",
-    "- **Gradual**: Increase sparsity during training\n",
-    "\n",
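-    "### Performance vs Sparsity Trade-off\n",
-    "Rough guide; actual numbers depend on the model, task, and whether you fine-tune after pruning:\n",
-    "- **10-30% sparsity**: Usually minimal accuracy loss\n",
-    "- **50-70% sparsity**: Moderate accuracy drop\n",
-    "- **80-90% sparsity**: Significant accuracy loss\n",
-    "- **95%+ sparsity**: Severe accuracy loss unless pruning is carefully tuned\n",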
-    "\n",
-    "Let's implement magnitude-based pruning!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "fa8e7fca",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "magnitude-pruning",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def prune_weights_by_magnitude(layer: Dense, pruning_ratio: float = 0.5) -> Tuple[Dense, Dict[str, Any]]:\n",
-    "    \"\"\"\n",
-    "    Prune weights in a Dense layer by magnitude.\n",
-    "    \n",
-    "    Args:\n",
-    "        layer: Dense layer to prune\n",
-    "        pruning_ratio: Fraction of weights to remove (0.0 to 1.0)\n",
-    "        \n",
-    "    Returns:\n",
-    "        Tuple of (pruned_layer, pruning_info)\n",
-    "        \n",
-    "    TODO: Implement magnitude-based weight pruning.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Get weight matrix from layer\n",
-    "    2. Calculate absolute values (magnitudes)\n",
-    "    3. Find threshold using percentile\n",
-    "    4. Create binary mask for weights above threshold\n",
-    "    5. Apply mask to weights (set small weights to zero)\n",
-    "    6. Update layer weights and return pruning statistics\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    layer = Dense(784, 128)\n",
-    "    pruned_layer, info = prune_weights_by_magnitude(layer, pruning_ratio=0.3)\n",
-    "    print(f\"Pruned {info['weights_removed']} weights, sparsity: {info['sparsity']:.2f}\")\n",
-    "    ```\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Use np.percentile() with pruning_ratio * 100 for threshold\n",
-    "    - Create mask with np.abs(weights) > threshold\n",
-    "    - Apply mask by element-wise multiplication\n",
-    "    - Count zeros to calculate sparsity\n",
-    "    - Return original layer (modified) and statistics\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - This is the foundation of network pruning\n",
-    "    - Magnitude pruning is simplest but effective\n",
-    "    - Sparsity = fraction of weights that are zero\n",
-    "    - Threshold selection affects accuracy vs compression trade-off\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Get current weights and ensure they're numpy arrays\n",
-    "    weights = layer.weights.data\n",
-    "    if not isinstance(weights, np.ndarray):\n",
-    "        weights = np.array(weights)\n",
-    "    \n",
-    "    original_weights = weights.copy()\n",
-    "    \n",
-    "    # Calculate magnitudes and threshold\n",
-    "    magnitudes = np.abs(weights)\n",
-    "    threshold = np.percentile(magnitudes, pruning_ratio * 100)\n",
-    "    \n",
-    "    # Create mask and apply pruning\n",
-    "    mask = magnitudes > threshold\n",
-    "    pruned_weights = weights * mask\n",
-    "    \n",
-    "    # Update layer weights\n",
-    "    layer.weights.data = pruned_weights\n",
-    "    \n",
-    "    # Calculate pruning statistics\n",
-    "    total_weights = weights.size\n",
-    "    zero_weights = np.sum(pruned_weights == 0)\n",
-    "    weights_removed = zero_weights - np.sum(original_weights == 0)\n",
-    "    sparsity = zero_weights / total_weights\n",
-    "    \n",
-    "    pruning_info = {\n",
-    "        'pruning_ratio': pruning_ratio,\n",
-    "        'threshold': float(threshold),\n",
-    "        'total_weights': total_weights,\n",
-    "        'weights_removed': weights_removed,\n",
-    "        'remaining_weights': total_weights - zero_weights,\n",
-    "        'sparsity': float(sparsity),\n",
-    "        'compression_ratio': 1 / (1 - sparsity) if sparsity < 1 else float('inf')\n",
-    "    }\n",
-    "    \n",
-    "    return layer, pruning_info\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "a20feb97",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "calculate-sparsity",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def calculate_sparsity(layer: Dense) -> float:\n",
-    "    \"\"\"\n",
-    "    Calculate sparsity (fraction of zero weights) in a Dense layer.\n",
-    "    \n",
-    "    Args:\n",
-    "        layer: Dense layer to analyze\n",
-    "        \n",
-    "    Returns:\n",
-    "        Sparsity as float between 0.0 and 1.0\n",
-    "        \n",
-    "    TODO: Implement sparsity calculation.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Get weight matrix from layer\n",
-    "    2. Count total number of weights\n",
-    "    3. Count number of zero weights\n",
-    "    4. Calculate sparsity = zero_weights / total_weights\n",
-    "    5. Return as float\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    layer = Dense(100, 50)\n",
-    "    sparsity = calculate_sparsity(layer)\n",
-    "    print(f\"Layer sparsity: {sparsity:.2%}\")\n",
-    "    ```\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Use np.sum() with condition to count zeros\n",
-    "    - Use .size attribute for total elements\n",
-    "    - Return 0.0 if no weights (edge case)\n",
-    "    - Sparsity of 0.0 = dense, 1.0 = completely sparse\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - Sparsity is key metric for compression\n",
-    "    - Higher sparsity = more compression\n",
-    "    - Sparsity patterns affect hardware efficiency\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    if not hasattr(layer, 'weights') or layer.weights is None:\n",
-    "        return 0.0\n",
-    "    \n",
-    "    weights = layer.weights.data\n",
-    "    if not isinstance(weights, np.ndarray):\n",
-    "        weights = np.array(weights)\n",
-    "    \n",
-    "    total_weights = weights.size\n",
-    "    zero_weights = np.sum(weights == 0)\n",
-    "    \n",
-    "    return float(zero_weights / total_weights) if total_weights > 0 else 0.0\n",
-    "    ### END SOLUTION "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "3082fa17",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-pruning",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_magnitude_pruning():\n",
-    "    \"\"\"\n",
-    "    ### 🧪 Unit Test: Magnitude-Based Pruning\n",
-    "    \n",
-    "    Test weight pruning algorithms and sparsity calculation.\n",
-    "    \n",
-    "    **This is a unit test** - it tests weight pruning in isolation.\n",
-    "    \"\"\"\n",
-    "    print(\"🔬 Unit Test: Magnitude-Based Pruning\")\n",
-    "    print(\"**This is a unit test** - it tests weight pruning in isolation.\")\n",
-    "    \n",
-    "    # Create test layer\n",
-    "    layer = Dense(100, 50)\n",
-    "    \n",
-    "    # Test basic pruning\n",
-    "    pruned_layer, info = prune_weights_by_magnitude(layer, pruning_ratio=0.3)\n",
-    "    \n",
-    "    # Verify pruning results\n",
-    "    assert info['pruning_ratio'] == 0.3, f\"Expected 0.3, got {info['pruning_ratio']}\"\n",
-    "    assert info['total_weights'] == 5000, f\"Expected 5000, got {info['total_weights']}\"\n",
-    "    assert info['sparsity'] >= 0.3, f\"Sparsity should be at least 0.3, got {info['sparsity']}\"\n",
-    "    \n",
-    "    print(f\"✅ Basic pruning works: {info['sparsity']:.2%} sparsity\")\n",
-    "    \n",
-    "    # Test sparsity calculation\n",
-    "    sparsity = calculate_sparsity(layer)\n",
-    "    assert abs(sparsity - info['sparsity']) < 0.001, f\"Sparsity mismatch: {sparsity} vs {info['sparsity']}\"\n",
-    "    print(f\"✅ Sparsity calculation works: {sparsity:.2%}\")\n",
-    "    \n",
-    "    # Test edge cases\n",
-    "    empty_layer = Dense(10, 10)\n",
-    "    empty_layer.weights.data = np.zeros((10, 10))\n",
-    "    sparsity_empty = calculate_sparsity(empty_layer)\n",
-    "    assert sparsity_empty == 1.0, f\"Empty layer should have 1.0 sparsity, got {sparsity_empty}\"\n",
-    "    \n",
-    "    print(\"✅ Edge cases work correctly\")\n",
-    "    \n",
-    "    # Test different pruning ratios\n",
-    "    layer2 = Dense(50, 25)\n",
-    "    _, info50 = prune_weights_by_magnitude(layer2, pruning_ratio=0.5)\n",
-    "    \n",
-    "    layer3 = Dense(50, 25)\n",
-    "    _, info80 = prune_weights_by_magnitude(layer3, pruning_ratio=0.8)\n",
-    "    \n",
-    "    assert info80['sparsity'] > info50['sparsity'], \"Higher pruning ratio should give higher sparsity\"\n",
-    "    print(f\"✅ Different pruning ratios work: 50% ratio = {info50['sparsity']:.2%}, 80% ratio = {info80['sparsity']:.2%}\")\n",
-    "    \n",
-    "    print(\"📈 Progress: Magnitude-Based Pruning ✓\")\n",
-    "    print(\"🎯 Pruning behavior:\")\n",
-    "    print(\"  - Removes weights with smallest absolute values\")\n",
-    "    print(\"  - Maintains layer structure and connectivity\")\n",
-    "    print(\"  - Provides detailed statistics for analysis\")\n",
-    "    print(\"  - Scales to different pruning ratios\")\n",
-    "    print()\n",
-    "\n",
-    "# Run the test\n",
-    "test_magnitude_pruning() "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "89e3cba2",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 3: Quantization - Reducing Precision for Memory Efficiency\n",
-    "\n",
-    "### What is Quantization?\n",
-    "**Quantization** reduces the precision of weights from FP32 (32-bit) to lower bit-widths like INT8 (8-bit), achieving significant memory savings with minimal accuracy loss.\n",
-    "\n",
-    "### The Mathematical Foundation\n",
-    "Quantization maps continuous floating-point values to discrete integer values:\n",
-    "\n",
-    "```\n",
-    "quantized_value = round((fp_value - min_val) / scale)\n",
-    "scale = (max_val - min_val) / (2^bits - 1)\n",
-    "```\n",
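-    "\n",
-    "As a worked example (the range and the weight value are chosen for illustration), take `bits = 8`, `min_val = -1`, `max_val = 1`, and `fp_value = 0.5`:\n",
-    "\n",
-    "$$\\text{scale} = \\frac{1 - (-1)}{2^8 - 1} = \\frac{2}{255} \\approx 0.007843$$\n",
-    "\n",
-    "$$\\text{quantized} = \\text{round}\\left(\\frac{0.5 - (-1)}{2/255}\\right) = \\text{round}(191.25) = 191$$\n",
-    "\n",
-    "$$\\text{dequantized} = 191 \\cdot \\frac{2}{255} + (-1) = \\frac{127}{255} \\approx 0.498039$$\n",
-    "\n",
-    "The round-trip error is about 0.00196, within the worst-case bound of half a step (scale / 2, about 0.00392).\n",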
-    "\n",
-    "### Why Quantization Works\n",
-    "- **Redundant precision**: Neural networks are robust to precision reduction\n",
-    "- **Hardware efficiency**: Integer operations are typically faster and more energy-efficient than floating-point on hardware that supports them\n",
-    "- **Memory savings**: 4x reduction (FP32 → INT8) in memory usage\n",
-    "- **Cache efficiency**: More parameters fit in limited cache memory\n",
-    "\n",
-    "### Types of Quantization\n",
-    "- **Post-training**: Quantize after training is complete\n",
-    "- **Quantization-aware training**: Train with quantization simulation\n",
-    "- **Dynamic**: Quantize activations at runtime\n",
-    "- **Static**: Pre-compute quantization parameters\n",
-    "\n",
-    "### Real-World Impact\n",
-    "- **Mobile deployment**: 75% memory reduction enables smartphone AI\n",
-    "- **Edge computing**: Fit larger models on constrained devices\n",
-    "- **Cloud efficiency**: Reduce bandwidth and storage costs\n",
-    "- **Battery life**: Lower power consumption for mobile devices\n",
-    "\n",
-    "### Common Bit-Widths\n",
-    "- **FP32**: Full precision (baseline)\n",
-    "- **FP16**: Half precision (2x memory reduction)\n",
-    "- **INT8**: 8-bit integers (4x memory reduction)\n",
-    "- **INT4**: 4-bit integers (8x memory reduction, aggressive)\n",
-    "\n",
-    "Let's implement quantization algorithms!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "6afd2132",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "quantization",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def quantize_layer_weights(layer: Dense, bits: int = 8) -> Tuple[Dense, Dict[str, Any]]:\n",
-    "    \"\"\"\n",
-    "    Quantize layer weights to reduce precision.\n",
-    "    \n",
-    "    Args:\n",
-    "        layer: Dense layer to quantize\n",
-    "        bits: Number of bits for quantization (8, 16, etc.)\n",
-    "        \n",
-    "    Returns:\n",
-    "        Tuple of (quantized_layer, quantization_info)\n",
-    "        \n",
-    "    TODO: Implement weight quantization for memory efficiency.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Get weight matrix from layer\n",
-    "    2. Find min and max values for quantization range\n",
-    "    3. Calculate scale factor: (max - min) / (2^bits - 1)\n",
-    "    4. Quantize: round((weights - min) / scale)\n",
-    "    5. Dequantize back to float: quantized * scale + min\n",
-    "    6. Update layer weights and return statistics\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    layer = Dense(784, 128)\n",
-    "    quantized_layer, info = quantize_layer_weights(layer, bits=8)\n",
-    "    print(f\"Memory reduction: {info['memory_reduction']:.1f}x\")\n",
-    "    ```\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Use np.min() and np.max() to find weight range\n",
-    "    - Clamp quantized values to valid range [0, 2^bits-1]\n",
-    "    - Store original dtype for memory calculation\n",
-    "    - Calculate theoretical memory savings\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - This is how mobile AI frameworks work\n",
-    "    - Hardware accelerators optimize for INT8\n",
-    "    - Precision-performance trade-off is key\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Get current weights and ensure they're numpy arrays\n",
-    "    weights = layer.weights.data\n",
-    "    if not isinstance(weights, np.ndarray):\n",
-    "        weights = np.array(weights)\n",
-    "    \n",
-    "    original_weights = weights.copy()\n",
-    "    original_dtype = weights.dtype\n",
-    "    \n",
-    "    # Find min and max for quantization range\n",
-    "    w_min, w_max = np.min(weights), np.max(weights)\n",
-    "    \n",
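-    "    # Calculate scale factor (fall back to 1.0 if all weights are equal, to avoid division by zero)\n",
-    "    scale = (w_max - w_min) / (2**bits - 1)\n",
-    "    if scale == 0:\n",
-    "        scale = 1.0\n",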
-    "    \n",
-    "    # Quantize weights\n",
-    "    quantized = np.round((weights - w_min) / scale)\n",
-    "    quantized = np.clip(quantized, 0, 2**bits - 1)  # Clamp to valid range\n",
-    "    \n",
-    "    # Dequantize back to float (simulation of quantized inference)\n",
-    "    dequantized = quantized * scale + w_min\n",
-    "    \n",
-    "    # Update layer weights\n",
-    "    layer.weights.data = dequantized.astype(np.float32)\n",
-    "    \n",
-    "    # Calculate quantization statistics\n",
-    "    total_weights = weights.size\n",
-    "    original_bytes = total_weights * 4  # FP32 = 4 bytes\n",
-    "    quantized_bytes = total_weights * bits / 8  # bits/8 bytes per weight (fractional for sub-byte widths such as INT4)\n",
-    "    memory_reduction = original_bytes / quantized_bytes if quantized_bytes > 0 else 1.0\n",
-    "    \n",
-    "    # Calculate quantization error\n",
-    "    mse_error = np.mean((original_weights - dequantized) ** 2)\n",
-    "    max_error = np.max(np.abs(original_weights - dequantized))\n",
-    "    \n",
-    "    quantization_info = {\n",
-    "        'bits': bits,\n",
-    "        'scale': float(scale),\n",
-    "        'min_val': float(w_min),\n",
-    "        'max_val': float(w_max),\n",
-    "        'total_weights': total_weights,\n",
-    "        'original_bytes': original_bytes,\n",
-    "        'quantized_bytes': quantized_bytes,\n",
-    "        'memory_reduction': float(memory_reduction),\n",
-    "        'mse_error': float(mse_error),\n",
-    "        'max_error': float(max_error),\n",
-    "        'original_dtype': str(original_dtype)\n",
-    "    }\n",
-    "    \n",
-    "    return layer, quantization_info\n",
-    "    ### END SOLUTION "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "b4d3e171",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-quantization",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_quantization():\n",
-    "    \"\"\"\n",
-    "    ### 🧪 Unit Test: Quantization\n",
-    "    \n",
-    "    Test weight quantization and precision reduction functionality.\n",
-    "    \n",
-    "    **This is a unit test** - it tests quantization algorithms in isolation.\n",
-    "    \"\"\"\n",
-    "    print(\"🔬 Unit Test: Quantization\")\n",
-    "    print(\"**This is a unit test** - it tests quantization algorithms in isolation.\")\n",
-    "    \n",
-    "    # Create test layer\n",
-    "    layer = Dense(100, 50)\n",
-    "    original_weights = layer.weights.data.copy() if hasattr(layer.weights.data, 'copy') else np.array(layer.weights.data)\n",
-    "    \n",
-    "    # Test INT8 quantization\n",
-    "    quantized_layer, info = quantize_layer_weights(layer, bits=8)\n",
-    "    \n",
-    "    # Verify quantization results\n",
-    "    assert info['bits'] == 8, f\"Expected 8 bits, got {info['bits']}\"\n",
-    "    assert info['total_weights'] == 5000, f\"Expected 5000 weights, got {info['total_weights']}\"\n",
-    "    assert info['memory_reduction'] == 4.0, f\"Expected 4x reduction, got {info['memory_reduction']}\"\n",
-    "    \n",
-    "    print(f\"✅ INT8 quantization works: {info['memory_reduction']:.1f}x memory reduction\")\n",
-    "    \n",
-    "    # Test quantization error\n",
-    "    assert info['mse_error'] >= 0, \"MSE error should be non-negative\"\n",
-    "    assert info['max_error'] >= 0, \"Max error should be non-negative\"\n",
-    "    \n",
-    "    print(f\"✅ Quantization error tracking works: MSE={info['mse_error']:.6f}, Max={info['max_error']:.6f}\")\n",
-    "    \n",
-    "    # Test different bit widths\n",
-    "    layer2 = Dense(50, 25)\n",
-    "    _, info16 = quantize_layer_weights(layer2, bits=16)\n",
-    "    \n",
-    "    layer3 = Dense(50, 25)\n",
-    "    _, info8 = quantize_layer_weights(layer3, bits=8)\n",
-    "    \n",
-    "    assert info16['memory_reduction'] == 2.0, f\"16-bit should give 2x reduction, got {info16['memory_reduction']}\"\n",
-    "    assert info8['memory_reduction'] == 4.0, f\"8-bit should give 4x reduction, got {info8['memory_reduction']}\"\n",
-    "    print(f\"✅ Different bit widths work: 16-bit = {info16['memory_reduction']:.1f}x, 8-bit = {info8['memory_reduction']:.1f}x\")\n",
-    "    \n",
-    "    # Test quantization parameters\n",
-    "    assert 'scale' in info, \"Scale parameter should be included\"\n",
-    "    assert 'min_val' in info, \"Min value should be included\"\n",
-    "    assert 'max_val' in info, \"Max value should be included\"\n",
-    "    \n",
-    "    print(\"✅ Quantization parameters work correctly\")\n",
-    "    \n",
-    "    print(\"📈 Progress: Quantization ✓\")\n",
-    "    print(\"🎯 Quantization behavior:\")\n",
-    "    print(\"  - Reduces precision while approximately preserving weight values\")\n",
-    "    print(\"  - Provides significant memory savings\")\n",
-    "    print(\"  - Tracks quantization error and parameters\")\n",
-    "    print(\"  - Supports different bit widths\")\n",
-    "    print()\n",
-    "\n",
-    "# Run the test\n",
-    "test_quantization() "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "658bdd07",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 4: Knowledge Distillation - Large Models Teach Small Models\n",
-    "\n",
-    "### What is Knowledge Distillation?\n",
-    "**Knowledge distillation** trains a small \"student\" model to mimic the behavior of a large \"teacher\" model, achieving compact models with competitive performance.\n",
-    "\n",
-    "### The Core Idea\n",
-    "Instead of training only on hard labels (one-hot targets), the student also learns from soft targets (the teacher's output probabilities), which carry more information about what the teacher has learned.\n",
-    "\n",
-    "### The Mathematical Foundation\n",
-    "Distillation combines two loss functions:\n",
-    "\n",
-    "```python\n",
-    "# Hard loss: Standard classification loss\n",
-    "hard_loss = CrossEntropy(student_logits, true_labels)\n",
-    "\n",
-    "# Soft loss: Learn from teacher's probability distribution\n",
-    "soft_targets = softmax(teacher_logits / temperature)\n",
-    "soft_student = softmax(student_logits / temperature)\n",
-    "soft_loss = -sum(soft_targets * log(soft_student))\n",
-    "\n",
-    "# Combined loss\n",
-    "total_loss = α * hard_loss + (1 - α) * soft_loss\n",
-    "```\n",
-    "\n",
-    "### Why Distillation Works\n",
-    "- **Richer information**: Soft targets contain inter-class relationships\n",
-    "- **Teacher knowledge**: Large models learn useful representations\n",
-    "- **Regularization**: Soft targets reduce overfitting\n",
-    "- **Efficiency**: Small models gain large model insights\n",
-    "\n",
-    "### Key Parameters\n",
-    "- **Temperature (T)**: Controls softness of probability distributions\n",
-    "  - High T: Softer, more informative distributions\n",
-    "  - Low T: Sharper, more confident predictions\n",
-    "- **Alpha (α)**: Balances hard and soft losses\n",
-    "  - α = 1.0: Only hard loss (standard training)\n",
-    "  - α = 0.0: Only soft loss (pure distillation)\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Mobile deployment**: Small models with large model performance\n",
-    "- **Edge computing**: Efficient inference with minimal accuracy loss\n",
-    "- **Model compression**: Alternative to pruning and quantization\n",
-    "- **Multi-task learning**: Transfer knowledge across different tasks\n",
-    "\n",
-    "### Success Stories\n",
-    "- **DistilBERT**: 40% smaller and 60% faster than BERT while retaining about 97% of its language-understanding performance\n",
-    "- **MobileNet**: Efficient mobile architecture that is commonly used as a distillation student\n",
-    "- **TinyBERT**: Extreme compression for resource-constrained devices\n",
-    "\n",
-    "Let's implement knowledge distillation!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "fa5d5762",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "distillation-loss",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class DistillationLoss:\n",
-    "    \"\"\"\n",
-    "    Combined loss function for knowledge distillation.\n",
-    "    \n",
-    "    This loss combines standard classification loss (hard targets) with\n",
-    "    distillation loss (soft targets from teacher) for training compact models.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, temperature: float = 3.0, alpha: float = 0.5):\n",
-    "        \"\"\"\n",
-    "        Initialize distillation loss.\n",
-    "        \n",
-    "        Args:\n",
-    "            temperature: Temperature for softening probability distributions\n",
-    "            alpha: Weight for hard loss (1-alpha for soft loss)\n",
-    "        \"\"\"\n",
-    "        self.temperature = temperature\n",
-    "        self.alpha = alpha\n",
-    "        self.ce_loss = CrossEntropyLoss()\n",
-    "    \n",
-    "    def __call__(self, student_logits: np.ndarray, teacher_logits: np.ndarray, \n",
-    "                 true_labels: np.ndarray) -> float:\n",
-    "        \"\"\"\n",
-    "        Calculate combined distillation loss.\n",
-    "        \n",
-    "        Args:\n",
-    "            student_logits: Raw outputs from student model\n",
-    "            teacher_logits: Raw outputs from teacher model  \n",
-    "            true_labels: Ground truth labels\n",
-    "            \n",
-    "        Returns:\n",
-    "            Combined loss value\n",
-    "            \n",
-    "        TODO: Implement knowledge distillation loss function.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Calculate hard loss using standard cross-entropy\n",
-    "        2. Apply temperature scaling to both logits\n",
-    "        3. Calculate soft targets from teacher logits\n",
-    "        4. Calculate soft loss between student and teacher distributions\n",
-    "        5. Combine hard and soft losses with alpha weighting\n",
-    "        6. Return total loss\n",
-    "        \n",
-    "        EXAMPLE USAGE:\n",
-    "        ```python\n",
-    "        distill_loss = DistillationLoss(temperature=3.0, alpha=0.5)\n",
-    "        loss = distill_loss(student_out, teacher_out, labels)\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use temperature scaling before softmax: logits / temperature\n",
-    "        - Implement stable softmax to avoid numerical issues\n",
-    "        - Scale soft loss by temperature^2 (standard practice)\n",
-    "        - Ensure proper normalization for both losses\n",
-    "        \n",
-    "        LEARNING CONNECTIONS:\n",
-    "        - Distillation losses like this are a core part of how DistilBERT was trained\n",
-    "        - Temperature controls knowledge transfer richness\n",
-    "        - Alpha balances the hard-label loss against the teacher's soft-target loss\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Convert inputs to numpy arrays if needed\n",
-    "        if not isinstance(student_logits, np.ndarray):\n",
-    "            student_logits = np.array(student_logits)\n",
-    "        if not isinstance(teacher_logits, np.ndarray):\n",
-    "            teacher_logits = np.array(teacher_logits)\n",
-    "        if not isinstance(true_labels, np.ndarray):\n",
-    "            true_labels = np.array(true_labels)\n",
-    "        \n",
-    "        # Hard loss: standard classification loss\n",
-    "        hard_loss = self._cross_entropy_loss(student_logits, true_labels)\n",
-    "        \n",
-    "        # Soft loss: distillation from teacher\n",
-    "        # Apply temperature scaling\n",
-    "        teacher_soft = self._softmax(teacher_logits / self.temperature)\n",
-    "        student_soft = self._softmax(student_logits / self.temperature)\n",
-    "        \n",
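-    "        # Calculate soft loss: cross-entropy against the teacher's soft targets\n",
-    "        # (equals KL divergence plus the teacher's entropy, which is constant w.r.t. the student)\n",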
-    "        soft_loss = -np.mean(np.sum(teacher_soft * np.log(student_soft + 1e-10), axis=-1))\n",
-    "        \n",
-    "        # Scale soft loss by temperature^2 (standard practice)\n",
-    "        soft_loss *= (self.temperature ** 2)\n",
-    "        \n",
-    "        # Combine losses\n",
-    "        total_loss = self.alpha * hard_loss + (1 - self.alpha) * soft_loss\n",
-    "        \n",
-    "        return float(total_loss)\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def _softmax(self, logits: np.ndarray) -> np.ndarray:\n",
-    "        \"\"\"Numerically stable softmax.\"\"\"\n",
-    "        # Subtract max for numerical stability\n",
-    "        exp_logits = np.exp(logits - np.max(logits, axis=-1, keepdims=True))\n",
-    "        return exp_logits / np.sum(exp_logits, axis=-1, keepdims=True)\n",
-    "    \n",
-    "    def _cross_entropy_loss(self, logits: np.ndarray, labels: np.ndarray) -> float:\n",
-    "        \"\"\"Simple cross-entropy loss implementation.\"\"\"\n",
-    "        # Convert labels to one-hot if needed\n",
-    "        if labels.ndim == 1:\n",
-    "            num_classes = logits.shape[-1]\n",
-    "            one_hot = np.zeros((labels.shape[0], num_classes))\n",
-    "            one_hot[np.arange(labels.shape[0]), labels] = 1\n",
-    "            labels = one_hot\n",
-    "        \n",
-    "        # Apply softmax and calculate cross-entropy\n",
-    "        probs = self._softmax(logits)\n",
-    "        return -np.mean(np.sum(labels * np.log(probs + 1e-10), axis=-1)) "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "444095cc",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-distillation",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_distillation():\n",
-    "    \"\"\"\n",
-    "    ### 🧪 Unit Test: Knowledge Distillation\n",
-    "    \n",
-    "    Test the knowledge distillation loss function across temperature and alpha settings.\n",
-    "    \n",
-    "    **This is a unit test** - it tests distillation algorithms in isolation.\n",
-    "    \"\"\"\n",
-    "    print(\"🔬 Unit Test: Knowledge Distillation\")\n",
-    "    print(\"**This is a unit test** - it tests distillation algorithms in isolation.\")\n",
-    "    \n",
-    "    # Create sample data\n",
-    "    batch_size, num_classes = 32, 10\n",
-    "    student_logits = np.random.randn(batch_size, num_classes) * 0.5\n",
-    "    teacher_logits = np.random.randn(batch_size, num_classes) * 2.0  # Teacher is more confident\n",
-    "    true_labels = np.random.randint(0, num_classes, batch_size)\n",
-    "    \n",
-    "    # Test distillation loss\n",
-    "    distill_loss = DistillationLoss(temperature=3.0, alpha=0.5)\n",
-    "    loss = distill_loss(student_logits, teacher_logits, true_labels)\n",
-    "    \n",
-    "    # Verify loss computation\n",
-    "    assert isinstance(loss, float), f\"Loss should be float, got {type(loss)}\"\n",
-    "    assert loss >= 0, f\"Loss should be non-negative, got {loss}\"\n",
-    "    \n",
-    "    print(f\"✅ Distillation loss computation works: {loss:.4f}\")\n",
-    "    \n",
-    "    # Test different temperature values\n",
-    "    loss_t1 = DistillationLoss(temperature=1.0, alpha=0.5)(student_logits, teacher_logits, true_labels)\n",
-    "    loss_t5 = DistillationLoss(temperature=5.0, alpha=0.5)(student_logits, teacher_logits, true_labels)\n",
-    "    \n",
-    "    print(f\"✅ Temperature scaling works: T=1.0 → {loss_t1:.4f}, T=5.0 → {loss_t5:.4f}\")\n",
-    "    \n",
-    "    # Test different alpha values\n",
-    "    loss_hard = DistillationLoss(temperature=3.0, alpha=1.0)(student_logits, teacher_logits, true_labels)  # Only hard loss\n",
-    "    loss_soft = DistillationLoss(temperature=3.0, alpha=0.0)(student_logits, teacher_logits, true_labels)  # Only soft loss\n",
-    "    \n",
-    "    assert loss_hard != loss_soft, \"Hard and soft losses should be different\"\n",
-    "    print(f\"✅ Alpha balancing works: Hard only = {loss_hard:.4f}, Soft only = {loss_soft:.4f}\")\n",
-    "    \n",
-    "    # Test edge cases\n",
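-    "    # Identical student and teacher minimize the soft loss, but it is not zero:\n",
-    "    # the cross-entropy form still includes the teacher's entropy (scaled by T^2)\n",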
-    "    identical_logits = np.random.randn(batch_size, num_classes)\n",
-    "    loss_identical = DistillationLoss(temperature=3.0, alpha=0.0)(identical_logits, identical_logits, true_labels)\n",
-    "    \n",
-    "    print(f\"✅ Edge cases work: Identical logits soft loss = {loss_identical:.4f}\")\n",
-    "    \n",
-    "    # Test internal methods\n",
-    "    softmax_result = distill_loss._softmax(student_logits)\n",
-    "    assert np.allclose(np.sum(softmax_result, axis=1), 1.0), \"Softmax should sum to 1\"\n",
-    "    \n",
-    "    print(\"✅ Internal methods work correctly\")\n",
-    "    \n",
-    "    print(\"📈 Progress: Knowledge Distillation ✓\")\n",
-    "    print(\"🎯 Distillation behavior:\")\n",
-    "    print(\"  - Combines hard and soft losses effectively\")\n",
-    "    print(\"  - Temperature controls knowledge transfer\")\n",
-    "    print(\"  - Alpha balances hard-label loss vs soft teacher loss\")\n",
-    "    print(\"  - Numerically stable softmax implementation\")\n",
-    "    print()\n",
-    "\n",
-    "# Run the test\n",
-    "test_distillation() "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "887f8eed",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 5: Structured Pruning - Removing Entire Neurons and Channels\n",
-    "\n",
-    "### What is Structured Pruning?\n",
-    "**Structured pruning** removes entire neurons, channels, or layers rather than individual weights, creating models that are actually faster on hardware.\n",
-    "\n",
-    "### Structured vs Unstructured Pruning\n",
-    "\n",
-    "#### **Unstructured Pruning** (What we did in Step 2)\n",
-    "- Removes individual weights scattered throughout the matrix\n",
-    "- Creates sparse matrices (lots of zeros)\n",
-    "- High compression but requires sparse matrix libraries for speedup\n",
-    "- Memory savings but limited hardware acceleration\n",
-    "\n",
-    "#### **Structured Pruning** (What we're doing now)\n",
-    "- Removes entire rows/columns (neurons/channels)\n",
-    "- Creates smaller dense matrices\n",
-    "- Lower compression but actual hardware speedup\n",
-    "- Real reduction in computation and memory access\n",
-    "\n",
-    "### The Mathematical Impact\n",
-    "Removing a neuron from a Dense layer:\n",
-    "\n",
-    "```python\n",
-    "# Original layer: Dense(784, 128)\n",
-    "# Weight matrix: (784, 128), Bias: (128,)\n",
-    "\n",
-    "# After removing 32 neurons: Dense(784, 96)\n",
-    "# Weight matrix: (784, 96), Bias: (96,)\n",
-    "# 25% reduction in parameters and computation\n",
-    "```\n",
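-    "\n",
-    "The 25% figure can be checked by counting parameters directly. A minimal sketch, where `dense_params` is a hypothetical helper for illustration and not part of TinyTorch:\n",
-    "\n",
-    "```python\n",
-    "def dense_params(input_size, output_size, use_bias=True):\n",
-    "    # One weight per (input, output) pair, plus one bias per output neuron\n",
-    "    return input_size * output_size + (output_size if use_bias else 0)\n",
-    "\n",
-    "original = dense_params(784, 128)\n",
-    "pruned = dense_params(784, 96)\n",
-    "reduction = (original - pruned) / original\n",
-    "print(f'{original} -> {pruned} parameters ({reduction:.0%} reduction)')\n",
-    "# prints: 100480 -> 75360 parameters (25% reduction)\n",
-    "```\n",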
-    "\n",
-    "### Why Structured Pruning Works\n",
-    "- **Hardware efficiency**: Dense matrix operations are optimized\n",
-    "- **Memory bandwidth**: Smaller matrices mean less data movement\n",
-    "- **Cache utilization**: Better memory access patterns\n",
-    "- **Real speedup**: Actual reduction in FLOPs and inference time\n",
-    "\n",
-    "### Neuron Importance Metrics\n",
-    "How do we decide which neurons to remove?\n",
-    "\n",
-    "1. **Activation-based**: Neurons with low average activation\n",
-    "2. **Gradient-based**: Neurons with small gradients during training\n",
-    "3. **Weight magnitude**: Neurons with small incoming or outgoing weights\n",
-    "4. **Information-theoretic**: Neurons contributing less information\n",
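-    "\n",
-    "The activation-based and weight-magnitude metrics can be compared on a toy layer. A sketch using plain NumPy arrays in place of a `Dense` layer; the array names, the ReLU activation, and the random inputs are assumptions for illustration:\n",
-    "\n",
-    "```python\n",
-    "import numpy as np\n",
-    "\n",
-    "rng = np.random.default_rng(0)\n",
-    "weights = rng.normal(size=(5, 4))   # (input_size, output_size)\n",
-    "weights[:, 2] *= 0.01               # make neuron 2 nearly inactive\n",
-    "x = rng.normal(size=(64, 5))        # a batch of 64 inputs\n",
-    "\n",
-    "# Activation-based: mean ReLU output of each neuron over the batch\n",
-    "activation_score = np.maximum(x @ weights, 0.0).mean(axis=0)\n",
-    "\n",
-    "# Weight magnitude: summed absolute incoming weights of each neuron\n",
-    "weight_score = np.abs(weights).sum(axis=0)\n",
-    "\n",
-    "# Both metrics should flag the same weakest neuron here\n",
-    "print(int(np.argmin(activation_score)), int(np.argmin(weight_score)))\n",
-    "```\n",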
-    "\n",
-    "### Real-World Applications\n",
-    "- **Mobile deployment**: Actual speedup on ARM processors\n",
-    "- **FPGA inference**: Smaller designs with same performance\n",
-    "- **Edge computing**: Reduced memory bandwidth requirements\n",
-    "- **Production systems**: More predictable inference time reduction\n",
-    "\n",
-    "### Challenges\n",
-    "- **Architecture modification**: Must handle dimension mismatches\n",
-    "- **Cascade effects**: Removing one neuron affects next layer\n",
-    "- **Retraining**: Often requires fine-tuning after pruning\n",
-    "- **Importance ranking**: Choosing the right importance metric\n",
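-    "\n",
-    "Cascade effects can be illustrated with plain NumPy arrays. A sketch assuming weights are stored as (input_size, output_size), as in this module; the array names and the ReLU activation are illustrative only:\n",
-    "\n",
-    "```python\n",
-    "import numpy as np\n",
-    "\n",
-    "rng = np.random.default_rng(0)\n",
-    "w1 = rng.normal(size=(8, 6))   # layer 1: 8 inputs -> 6 neurons\n",
-    "b1 = rng.normal(size=6)\n",
-    "w2 = rng.normal(size=(6, 3))   # layer 2 consumes layer 1's 6 outputs\n",
-    "\n",
-    "# Rank layer-1 neurons by summed absolute incoming weight, keep the top 4\n",
-    "importance = np.abs(w1).sum(axis=0)\n",
-    "keep = np.sort(np.argsort(importance)[-4:])\n",
-    "\n",
-    "w1_pruned = w1[:, keep]        # drop columns (output neurons) of layer 1\n",
-    "b1_pruned = b1[keep]\n",
-    "w2_pruned = w2[keep, :]        # drop the matching rows (inputs) of layer 2\n",
-    "\n",
-    "x = rng.normal(size=(2, 8))\n",
-    "hidden = np.maximum(x @ w1_pruned + b1_pruned, 0.0)\n",
-    "out = hidden @ w2_pruned\n",
-    "print(out.shape)  # (2, 3)\n",
-    "```\n",
-    "\n",
-    "Without the row slice on `w2`, the second matrix product would fail with a shape mismatch.\n",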
-    "\n",
-    "Let's implement structured pruning for Dense layers!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "d02d19f3",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "neuron-importance",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def compute_neuron_importance(layer: Dense, method: str = 'weight_magnitude') -> np.ndarray:\n",
-    "    \"\"\"\n",
-    "    Compute importance scores for each neuron in a Dense layer.\n",
-    "    \n",
-    "    Args:\n",
-    "        layer: Dense layer to analyze\n",
-    "        method: Importance computation method\n",
-    "        \n",
-    "    Returns:\n",
-    "        Array of importance scores for each output neuron\n",
-    "        \n",
-    "    TODO: Implement neuron importance calculation.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Get weight matrix from layer\n",
-    "    2. Choose importance metric based on method\n",
-    "    3. Calculate per-neuron importance scores\n",
-    "    4. Return array of scores (one per output neuron)\n",
-    "    \n",
-    "    AVAILABLE METHODS:\n",
-    "    - 'weight_magnitude': Sum of absolute weights per neuron\n",
-    "    - 'weight_variance': Variance of weights per neuron\n",
-    "    - 'random': Random importance (for baseline comparison)\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Weights shape is (input_size, output_size)\n",
-    "    - Each column represents one output neuron\n",
-    "    - Use axis=0 for operations across input dimensions\n",
-    "    - Higher scores = more important neurons\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - Pruning methods rank neurons, channels, or filters with scores like these\n",
-    "    - Different metrics capture different aspects of importance\n",
-    "    - Importance ranking is crucial for effective pruning\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Get weights and ensure they're numpy arrays\n",
-    "    weights = layer.weights.data\n",
-    "    if not isinstance(weights, np.ndarray):\n",
-    "        weights = np.array(weights)\n",
-    "    \n",
-    "    if method == 'weight_magnitude':\n",
-    "        # Sum of absolute weights per neuron (column)\n",
-    "        importance = np.sum(np.abs(weights), axis=0)\n",
-    "        \n",
-    "    elif method == 'weight_variance':\n",
-    "        # Variance of weights per neuron (column)\n",
-    "        importance = np.var(weights, axis=0)\n",
-    "        \n",
-    "    elif method == 'random':\n",
-    "        # Random importance for baseline comparison\n",
-    "        importance = np.random.rand(weights.shape[1])\n",
-    "        \n",
-    "    else:\n",
-    "        raise ValueError(f\"Unknown importance method: {method}\")\n",
-    "    \n",
-    "    return importance\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "3075ea5f",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "structured-pruning",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def prune_layer_neurons(layer: Dense, keep_ratio: float = 0.7, \n",
-    "                       importance_method: str = 'weight_magnitude') -> Tuple[Dense, Dict[str, Any]]:\n",
-    "    \"\"\"\n",
-    "    Remove least important neurons from a Dense layer.\n",
-    "    \n",
-    "    Args:\n",
-    "        layer: Dense layer to prune\n",
-    "        keep_ratio: Fraction of neurons to keep (0.0 to 1.0)\n",
-    "        importance_method: Method for computing neuron importance\n",
-    "        \n",
-    "    Returns:\n",
-    "        Tuple of (pruned_layer, pruning_info)\n",
-    "        \n",
-    "    TODO: Implement structured neuron pruning.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Compute importance scores for all neurons\n",
-    "    2. Determine how many neurons to keep\n",
-    "    3. Select indices of most important neurons\n",
-    "    4. Create new layer with reduced dimensions\n",
-    "    5. Copy weights and biases for selected neurons\n",
-    "    6. Return pruned layer and statistics\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    layer = Dense(784, 128)\n",
-    "    pruned_layer, info = prune_layer_neurons(layer, keep_ratio=0.75)\n",
-    "    print(f\"Reduced from {info['original_neurons']} to {info['remaining_neurons']} neurons\")\n",
-    "    ```\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Use np.argsort() to rank neurons by importance\n",
-    "    - Take the top keep_count neurons: indices[-keep_count:]\n",
-    "    - Create new layer with reduced output size\n",
-    "    - Copy both weights and bias for selected neurons\n",
-    "    - Track original and new sizes for statistics\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - This is actual model architecture modification\n",
-    "    - Hardware gets real speedup from smaller matrices\n",
-    "    - Must consider cascade effects on next layers\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Compute neuron importance\n",
-    "    importance_scores = compute_neuron_importance(layer, importance_method)\n",
-    "    \n",
-    "    # Determine how many neurons to keep\n",
-    "    original_neurons = layer.output_size\n",
-    "    keep_count = min(original_neurons, max(1, int(original_neurons * keep_ratio)))  # Clamp to [1, original_neurons]\n",
-    "    \n",
-    "    # Select most important neurons\n",
-    "    sorted_indices = np.argsort(importance_scores)\n",
-    "    keep_indices = sorted_indices[-keep_count:]  # Take top keep_count neurons\n",
-    "    keep_indices = np.sort(keep_indices)  # Sort for consistent ordering\n",
-    "    \n",
-    "    # Get current weights and biases\n",
-    "    weights = layer.weights.data\n",
-    "    if not isinstance(weights, np.ndarray):\n",
-    "        weights = np.array(weights)\n",
-    "    \n",
-    "    bias = layer.bias.data if layer.bias is not None else None\n",
-    "    if bias is not None and not isinstance(bias, np.ndarray):\n",
-    "        bias = np.array(bias)\n",
-    "    \n",
-    "    # Create new layer with reduced dimensions\n",
-    "    pruned_layer = Dense(layer.input_size, keep_count)\n",
-    "    \n",
-    "    # Copy weights for selected neurons\n",
-    "    pruned_weights = weights[:, keep_indices]\n",
-    "    pruned_layer.weights.data = np.ascontiguousarray(pruned_weights)\n",
-    "    \n",
-    "    # Copy bias for selected neurons\n",
-    "    if bias is not None:\n",
-    "        pruned_bias = bias[keep_indices]\n",
-    "        pruned_layer.bias.data = np.ascontiguousarray(pruned_bias)\n",
-    "    \n",
-    "    # Calculate pruning statistics\n",
-    "    neurons_removed = original_neurons - keep_count\n",
-    "    compression_ratio = original_neurons / keep_count if keep_count > 0 else float('inf')\n",
-    "    \n",
-    "    # Calculate parameter reduction\n",
-    "    original_params = layer.input_size * original_neurons + (original_neurons if bias is not None else 0)\n",
-    "    new_params = layer.input_size * keep_count + (keep_count if bias is not None else 0)\n",
-    "    param_reduction = (original_params - new_params) / original_params\n",
-    "    \n",
-    "    pruning_info = {\n",
-    "        'keep_ratio': keep_ratio,\n",
-    "        'importance_method': importance_method,\n",
-    "        'original_neurons': original_neurons,\n",
-    "        'remaining_neurons': keep_count,\n",
-    "        'neurons_removed': neurons_removed,\n",
-    "        'compression_ratio': float(compression_ratio),\n",
-    "        'original_params': original_params,\n",
-    "        'new_params': new_params,\n",
-    "        'param_reduction': float(param_reduction),\n",
-    "        'keep_indices': keep_indices.tolist()\n",
-    "    }\n",
-    "    \n",
-    "    return pruned_layer, pruning_info\n",
-    "    ### END SOLUTION "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ddbf46e3",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-structured-pruning",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_structured_pruning():\n",
-    "    \"\"\"\n",
-    "    ### 🧪 Unit Test: Structured Pruning\n",
-    "    \n",
-    "    Test structured neuron pruning and parameter reduction.\n",
-    "    \n",
-    "    **This is a unit test** - it tests structured pruning in isolation.\n",
-    "    \"\"\"\n",
-    "    print(\"🔬 Unit Test: Structured Pruning\")\n",
-    "    print(\"**This is a unit test** - it tests structured pruning in isolation.\")\n",
-    "    \n",
-    "    # Create test layer\n",
-    "    layer = Dense(100, 50)\n",
-    "    \n",
-    "    # Test basic pruning\n",
-    "    pruned_layer, info = prune_layer_neurons(layer, keep_ratio=0.75)\n",
-    "    \n",
-    "    # Verify pruning results\n",
-    "    assert info['keep_ratio'] == 0.75, f\"Expected 0.75, got {info['keep_ratio']}\"\n",
-    "    assert info['original_neurons'] == 50, f\"Expected 50, got {info['original_neurons']}\"\n",
-    "    assert info['remaining_neurons'] == 37, f\"Expected 37, got {info['remaining_neurons']}\"\n",
-    "    assert info['neurons_removed'] == 13, f\"Expected 13, got {info['neurons_removed']}\"\n",
-    "    assert info['compression_ratio'] >= 1.35, f\"Compression ratio should be at least 1.35, got {info['compression_ratio']}\"\n",
-    "    \n",
-    "    print(f\"✅ Basic structured pruning works: {info['neurons_removed']} neurons removed\")\n",
-    "    \n",
-    "    # Test parameter reduction\n",
-    "    assert info['param_reduction'] >= 0.25, f\"Parameter reduction should be at least 0.25, got {info['param_reduction']}\"\n",
-    "    print(f\"✅ Parameter reduction works: {info['param_reduction']:.2%}\")\n",
-    "    \n",
-    "    # Test a small layer\n",
-    "    small_layer = Dense(10, 10)\n",
-    "    _, info_small = prune_layer_neurons(small_layer, keep_ratio=0.5)\n",
-    "    assert info_small['remaining_neurons'] == 5, f\"Small layer should have 5 neurons, got {info_small['remaining_neurons']}\"\n",
-    "    \n",
-    "    print(\"✅ Edge cases work correctly\")\n",
-    "    \n",
-    "    # Test different keep ratios\n",
-    "    layer2 = Dense(50, 25)\n",
-    "    _, info_ratio70 = prune_layer_neurons(layer2, keep_ratio=0.7)\n",
-    "    _, info_ratio50 = prune_layer_neurons(layer2, keep_ratio=0.5)\n",
-    "    \n",
-    "    assert info_ratio70['remaining_neurons'] > info_ratio50['remaining_neurons'], \"Higher keep ratio should result in more neurons\"\n",
-    "    print(f\"✅ Different keep ratios work: 70% ratio = {info_ratio70['remaining_neurons']}, 50% ratio = {info_ratio50['remaining_neurons']}\")\n",
-    "    \n",
-    "    # Test different importance methods\n",
-    "    _, info_weight_mag = prune_layer_neurons(layer, keep_ratio=0.75, importance_method='weight_magnitude')\n",
-    "    _, info_weight_var = prune_layer_neurons(layer, keep_ratio=0.75, importance_method='weight_variance')\n",
-    "    \n",
-    "    # Both should achieve similar compression ratios since they both keep 75% of neurons\n",
-    "    print(f\"✅ Different importance methods work: Weight Mag = {info_weight_mag['compression_ratio']:.2f}, Weight Var = {info_weight_var['compression_ratio']:.2f}\")\n",
-    "    \n",
-    "    print(\"📈 Progress: Structured Pruning ✓\")\n",
-    "    print(\"🎯 Structured pruning behavior:\")\n",
-    "    print(\"  - Removes least important neurons\")\n",
-    "    print(\"  - Maintains layer structure and connectivity\")\n",
-    "    print(\"  - Provides detailed statistics for analysis\")\n",
-    "    print(\"  - Scales to different keep ratios\")\n",
-    "    print()\n",
-    "\n",
-    "# Run the test\n",
-    "test_structured_pruning() "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ea0c4481",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 6: Comprehensive Comparison - Combining All Techniques\n",
-    "\n",
-    "### Putting It All Together\n",
-    "Now that we've implemented four core compression techniques, let's combine them and see how they work together for maximum efficiency.\n",
-    "\n",
-    "### The Compression Toolkit\n",
-    "We now have four techniques plus a measurement tool:\n",
-    "\n",
-    "1. **CompressionMetrics**: Analyze model size and parameter distribution\n",
-    "2. **Magnitude-based pruning**: Remove unimportant weights (sparsity)\n",
-    "3. **Quantization**: Reduce precision (FP32 → INT8)\n",
-    "4. **Knowledge distillation**: Train compact models with teacher guidance\n",
-    "5. **Structured pruning**: Remove entire neurons (actual speedup)\n",
-    "\n",
-    "### Compression Strategy Design\n",
-    "Different deployment scenarios need different strategies:\n",
-    "\n",
-    "#### **Mobile AI Deployment**\n",
-    "- **Primary**: Quantization (75% memory reduction)\n",
-    "- **Secondary**: Structured pruning (inference speedup)\n",
-    "- **Target**: < 10MB models, < 100ms inference\n",
-    "\n",
-    "#### **Edge Computing**\n",
-    "- **Primary**: Structured pruning (minimal compute)\n",
-    "- **Secondary**: Magnitude pruning (memory efficiency)\n",
-    "- **Target**: < 1MB models, minimal power consumption\n",
-    "\n",
-    "#### **Production Cloud**\n",
-    "- **Primary**: Knowledge distillation (balanced compression)\n",
-    "- **Secondary**: Quantization (cost reduction)\n",
-    "- **Target**: Maximize throughput while maintaining accuracy\n",
-    "\n",
-    "#### **Research and Development**\n",
-    "- **Primary**: Magnitude pruning (experimental flexibility)\n",
-    "- **Secondary**: All techniques for comparison\n",
-    "- **Target**: Understand trade-offs and optimal combinations\n",
-    "\n",
-    "### Compression Pipeline Design\n",
-    "A systematic approach to model compression:\n",
-    "\n",
-    "```python\n",
-    "# 1. Baseline analysis\n",
-    "metrics = CompressionMetrics()\n",
-    "baseline_size = metrics.calculate_model_size(model)\n",
-    "\n",
-    "# 2. Apply magnitude pruning (layer by layer)\n",
-    "for layer in model.layers:\n",
-    "    if isinstance(layer, Dense):\n",
-    "        _, prune_info = prune_weights_by_magnitude(layer, pruning_ratio=0.3)\n",
-    "\n",
-    "# 3. Apply quantization\n",
-    "for layer in model.layers:\n",
-    "    if isinstance(layer, Dense):\n",
-    "        _, quant_info = quantize_layer_weights(layer, bits=8)\n",
-    "\n",
-    "# 4. Apply structured pruning\n",
-    "# (in a real network, the next layer's input rows must be pruned to match)\n",
-    "for i, layer in enumerate(model.layers):\n",
-    "    if isinstance(layer, Dense):\n",
-    "        model.layers[i], struct_info = prune_layer_neurons(layer, keep_ratio=0.8)\n",
-    "\n",
-    "# 5. Measure final compression\n",
-    "final_size = metrics.calculate_model_size(model)\n",
-    "compression_ratio = baseline_size['size_mb'] / final_size['size_mb']\n",
-    "```\n",
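-    "\n",
-    "To see how the stages compound, a back-of-the-envelope size estimate helps. The sketch below assumes dense storage (so zeros from magnitude pruning save nothing without a sparse format) and uses a hypothetical `estimate_size_mb` helper rather than `CompressionMetrics`:\n",
-    "\n",
-    "```python\n",
-    "def estimate_size_mb(num_params, bits_per_param):\n",
-    "    # Dense storage: every parameter costs bits_per_param bits\n",
-    "    return num_params * bits_per_param / 8 / (1024 ** 2)\n",
-    "\n",
-    "baseline_params = 784 * 128 + 128   # Dense(784, 128) with bias\n",
-    "pruned_params = 784 * 96 + 96       # after keeping 75% of the neurons\n",
-    "\n",
-    "baseline_mb = estimate_size_mb(baseline_params, 32)   # FP32\n",
-    "quantized_mb = estimate_size_mb(baseline_params, 8)   # INT8 only\n",
-    "combined_mb = estimate_size_mb(pruned_params, 8)      # pruned + INT8\n",
-    "\n",
-    "quant_ratio = baseline_mb / quantized_mb\n",
-    "combined_ratio = baseline_mb / combined_mb\n",
-    "print(f'{quant_ratio:.2f}x, {combined_ratio:.2f}x')\n",
-    "# prints: 4.00x, 5.33x\n",
-    "```\n",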
-    "\n",
-    "### Trade-off Analysis\n",
-    "Understanding the compression spectrum:\n",
-    "\n",
-    "- **Accuracy vs Size**: More compression usually means more accuracy loss\n",
-    "- **Size vs Speed**: Structured compression gives actual speedup\n",
-    "- **Memory vs Computation**: Different bottlenecks need different solutions\n",
-    "- **Development vs Production**: Research flexibility vs deployment constraints\n",
-    "\n",
-    "Let's build a comprehensive comparison framework!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "5ec30404",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "compression-comparison",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def compare_compression_techniques(original_model: Sequential) -> Dict[str, Dict[str, Any]]:\n",
-    "    \"\"\"\n",
-    "    Compare all compression techniques on the same model.\n",
-    "    \n",
-    "    Args:\n",
-    "        original_model: Base model to compress using different techniques\n",
-    "        \n",
-    "    Returns:\n",
-    "        Dictionary comparing results from different compression approaches\n",
-    "        \n",
-    "    TODO: Implement comprehensive compression comparison.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Set up baseline metrics from original model\n",
-    "    2. Apply each compression technique individually\n",
-    "    3. Apply combined compression techniques\n",
-    "    4. Measure and compare all results\n",
-    "    5. Return comprehensive comparison data\n",
-    "    \n",
-    "    COMPARISON DIMENSIONS:\n",
-    "    - Model size (MB)\n",
-    "    - Parameter count\n",
-    "    - Compression ratio\n",
-    "    - Memory reduction\n",
-    "    - Estimated speedup (for structured techniques)\n",
-    "    \n",
-    "    IMPLEMENTATION HINTS:\n",
-    "    - Create separate model copies for each technique\n",
-    "    - Use consistent parameters across techniques\n",
-    "    - Track both individual and combined effects\n",
-    "    - Include baseline for reference\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - This is how research papers compare compression methods\n",
-    "    - Production systems need this analysis for deployment decisions\n",
-    "    - Understanding trade-offs guides technique selection\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    results = {}\n",
-    "    metrics = CompressionMetrics()\n",
-    "    \n",
-    "    # Baseline: Original model\n",
-    "    baseline_params = metrics.count_parameters(original_model)\n",
-    "    baseline_size = metrics.calculate_model_size(original_model)\n",
-    "    \n",
-    "    results['baseline'] = {\n",
-    "        'technique': 'Original Model',\n",
-    "        'parameters': baseline_params['total_parameters'],\n",
-    "        'size_mb': baseline_size['size_mb'],\n",
-    "        'compression_ratio': 1.0,\n",
-    "        'memory_reduction': 0.0\n",
-    "    }\n",
-    "    \n",
-    "    # Technique 1: Magnitude-based pruning only\n",
-    "    model_pruning = Sequential([Dense(layer.input_size, layer.output_size) for layer in original_model.layers])\n",
-    "    for i, layer in enumerate(model_pruning.layers):\n",
-    "        layer.weights.data = original_model.layers[i].weights.data.copy() if hasattr(original_model.layers[i].weights.data, 'copy') else np.array(original_model.layers[i].weights.data)\n",
-    "        if hasattr(layer, 'bias') and original_model.layers[i].bias is not None:\n",
-    "            layer.bias.data = original_model.layers[i].bias.data.copy() if hasattr(original_model.layers[i].bias.data, 'copy') else np.array(original_model.layers[i].bias.data)\n",
-    "    \n",
-    "    # Apply magnitude pruning to each layer\n",
-    "    total_sparsity = 0\n",
-    "    for i, layer in enumerate(model_pruning.layers):\n",
-    "        if isinstance(layer, Dense):\n",
-    "            _, prune_info = prune_weights_by_magnitude(layer, pruning_ratio=0.3)\n",
-    "            total_sparsity += prune_info['sparsity']\n",
-    "    \n",
-    "    avg_sparsity = total_sparsity / len(model_pruning.layers)\n",
-    "    pruning_params = metrics.count_parameters(model_pruning)\n",
-    "    pruning_size = metrics.calculate_model_size(model_pruning)\n",
-    "    \n",
-    "    results['magnitude_pruning'] = {\n",
-    "        'technique': 'Magnitude Pruning (30%)',\n",
-    "        'parameters': pruning_params['total_parameters'],\n",
-    "        'size_mb': pruning_size['size_mb'],\n",
-    "        'compression_ratio': baseline_size['size_mb'] / pruning_size['size_mb'],\n",
-    "        'memory_reduction': (baseline_size['size_mb'] - pruning_size['size_mb']) / baseline_size['size_mb'],\n",
-    "        'sparsity': avg_sparsity\n",
-    "    }\n",
-    "    \n",
-    "    # Technique 2: Quantization only\n",
-    "    model_quantization = Sequential([Dense(layer.input_size, layer.output_size) for layer in original_model.layers])\n",
-    "    for i, layer in enumerate(model_quantization.layers):\n",
-    "        layer.weights.data = original_model.layers[i].weights.data.copy() if hasattr(original_model.layers[i].weights.data, 'copy') else np.array(original_model.layers[i].weights.data)\n",
-    "        if hasattr(layer, 'bias') and original_model.layers[i].bias is not None:\n",
-    "            layer.bias.data = original_model.layers[i].bias.data.copy() if hasattr(original_model.layers[i].bias.data, 'copy') else np.array(original_model.layers[i].bias.data)\n",
-    "    \n",
-    "    # Apply quantization to each layer\n",
-    "    total_memory_reduction = 0\n",
-    "    for i, layer in enumerate(model_quantization.layers):\n",
-    "        if isinstance(layer, Dense):\n",
-    "            _, quant_info = quantize_layer_weights(layer, bits=8)\n",
-    "            total_memory_reduction += quant_info['memory_reduction']\n",
-    "    \n",
-    "    avg_memory_reduction = total_memory_reduction / len(model_quantization.layers)\n",
-    "    quantization_size = metrics.calculate_model_size(model_quantization, dtype='int8')\n",
-    "    \n",
-    "    results['quantization'] = {\n",
-    "        'technique': 'Quantization (INT8)',\n",
-    "        'parameters': baseline_params['total_parameters'],\n",
-    "        'size_mb': quantization_size['size_mb'],\n",
-    "        'compression_ratio': baseline_size['size_mb'] / quantization_size['size_mb'],\n",
-    "        'memory_reduction': (baseline_size['size_mb'] - quantization_size['size_mb']) / baseline_size['size_mb'],\n",
-    "        'avg_memory_reduction_factor': avg_memory_reduction\n",
-    "    }\n",
-    "    \n",
-    "    # Technique 3: Structured pruning only\n",
-    "    model_structured = Sequential([Dense(layer.input_size, layer.output_size) for layer in original_model.layers])\n",
-    "    for i, layer in enumerate(model_structured.layers):\n",
-    "        layer.weights.data = original_model.layers[i].weights.data.copy() if hasattr(original_model.layers[i].weights.data, 'copy') else np.array(original_model.layers[i].weights.data)\n",
-    "        if hasattr(layer, 'bias') and original_model.layers[i].bias is not None:\n",
-    "            layer.bias.data = original_model.layers[i].bias.data.copy() if hasattr(original_model.layers[i].bias.data, 'copy') else np.array(original_model.layers[i].bias.data)\n",
-    "    \n",
-    "    # Apply structured pruning to each layer\n",
-    "    total_param_reduction = 0\n",
-    "    for i, layer in enumerate(model_structured.layers):\n",
-    "        if isinstance(layer, Dense):\n",
-    "            pruned_layer, struct_info = prune_layer_neurons(layer, keep_ratio=0.75)\n",
-    "            model_structured.layers[i] = pruned_layer\n",
-    "            total_param_reduction += struct_info['param_reduction']\n",
-    "    \n",
-    "    avg_param_reduction = total_param_reduction / len(model_structured.layers)\n",
-    "    structured_params = metrics.count_parameters(model_structured)\n",
-    "    structured_size = metrics.calculate_model_size(model_structured)\n",
-    "    \n",
-    "    results['structured_pruning'] = {\n",
-    "        'technique': 'Structured Pruning (75% neurons kept)',\n",
-    "        'parameters': structured_params['total_parameters'],\n",
-    "        'size_mb': structured_size['size_mb'],\n",
-    "        'compression_ratio': baseline_size['size_mb'] / structured_size['size_mb'],\n",
-    "        'memory_reduction': (baseline_size['size_mb'] - structured_size['size_mb']) / baseline_size['size_mb'],\n",
-    "        'param_reduction': avg_param_reduction\n",
-    "    }\n",
-    "    \n",
-    "    # Technique 4: Combined approach\n",
-    "    model_combined = Sequential([Dense(layer.input_size, layer.output_size) for layer in original_model.layers])\n",
-    "    for i, layer in enumerate(model_combined.layers):\n",
-    "        layer.weights.data = original_model.layers[i].weights.data.copy() if hasattr(original_model.layers[i].weights.data, 'copy') else np.array(original_model.layers[i].weights.data)\n",
-    "        if hasattr(layer, 'bias') and original_model.layers[i].bias is not None:\n",
-    "            layer.bias.data = original_model.layers[i].bias.data.copy() if hasattr(original_model.layers[i].bias.data, 'copy') else np.array(original_model.layers[i].bias.data)\n",
-    "    \n",
-    "    # Apply magnitude pruning + quantization + structured pruning\n",
-    "    for i, layer in enumerate(model_combined.layers):\n",
-    "        if isinstance(layer, Dense):\n",
-    "            # Step 1: Magnitude pruning\n",
-    "            _, _ = prune_weights_by_magnitude(layer, pruning_ratio=0.2)\n",
-    "            # Step 2: Quantization  \n",
-    "            _, _ = quantize_layer_weights(layer, bits=8)\n",
-    "            # Step 3: Structured pruning\n",
-    "            pruned_layer, _ = prune_layer_neurons(layer, keep_ratio=0.8)\n",
-    "            model_combined.layers[i] = pruned_layer\n",
-    "    \n",
-    "    combined_params = metrics.count_parameters(model_combined)\n",
-    "    combined_size = metrics.calculate_model_size(model_combined, dtype='int8')\n",
-    "    \n",
-    "    results['combined'] = {\n",
-    "        'technique': 'Combined (Pruning + Quantization + Structured)',\n",
-    "        'parameters': combined_params['total_parameters'],\n",
-    "        'size_mb': combined_size['size_mb'],\n",
-    "        'compression_ratio': baseline_size['size_mb'] / combined_size['size_mb'],\n",
-    "        'memory_reduction': (baseline_size['size_mb'] - combined_size['size_mb']) / baseline_size['size_mb']\n",
-    "    }\n",
-    "    \n",
-    "    return results\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b0b991b2",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## 🧪 Testing Infrastructure\n",
-    "\n",
-    "### 🔬 Unit Testing Pattern\n",
-    "Each compression technique includes comprehensive unit tests:\n",
-    "\n",
-    "1. **Functionality verification**: Core algorithms work correctly\n",
-    "2. **Edge case handling**: Robust error handling and boundary conditions\n",
-    "3. **Statistical validation**: Compression metrics and analysis\n",
-    "4. **Performance measurement**: Before/after comparisons\n",
-    "\n",
-    "### 📈 Progress Tracking\n",
-    "- **CompressionMetrics**: ✅ Complete with parameter counting\n",
-    "- **Magnitude-based pruning**: ✅ Complete with sparsity calculation\n",
-    "- **Quantization**: ✅ Complete with INT8 conversion\n",
-    "- **Knowledge distillation**: ✅ Complete\n",
-    "- **Structured pruning**: ✅ Complete with neuron removal\n",
-    "- **Comprehensive comparison**: 🔄 Tested in the next cell\n",
-    "\n",
-    "### 🎓 Educational Value\n",
-    "- **Conceptual understanding**: Why compression matters\n",
-    "- **Practical implementation**: Build techniques from scratch\n",
-    "- **Real-world connections**: Mobile, edge, and production deployment\n",
-    "- **Systems thinking**: Balance accuracy, efficiency, and constraints\n",
-    "\n",
-    "This module teaches the essential skills for deploying AI in resource-constrained environments!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "2d2cee1e",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-comprehensive-comparison",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_comprehensive_comparison():\n",
-    "    \"\"\"\n",
-    "    ### 🧪 Unit Test: Comprehensive Comparison\n",
-    "    \n",
-    "    Test the integrated compression comparison framework.\n",
-    "    \n",
-    "    **This is a unit test** - it tests comprehensive comparison in isolation.\n",
-    "    \"\"\"\n",
-    "    print(\"🔬 Unit Test: Comprehensive Comparison\")\n",
-    "    print(\"**This is a unit test** - it tests comprehensive comparison in isolation.\")\n",
-    "    \n",
-    "    # Create test model\n",
-    "    model = Sequential([\n",
-    "        Dense(784, 128),\n",
-    "        Dense(128, 64),\n",
-    "        Dense(64, 10)\n",
-    "    ])\n",
-    "    \n",
-    "    # Run comprehensive comparison\n",
-    "    results = compare_compression_techniques(model)\n",
-    "    \n",
-    "    # Verify baseline exists\n",
-    "    assert 'baseline' in results, \"Baseline results should be included\"\n",
-    "    baseline = results['baseline']\n",
-    "    assert baseline['compression_ratio'] == 1.0, f\"Baseline compression ratio should be 1.0, got {baseline['compression_ratio']}\"\n",
-    "    \n",
-    "    print(f\"✅ Baseline analysis works: {baseline['parameters']} parameters, {baseline['size_mb']} MB\")\n",
-    "    \n",
-    "    # Verify individual techniques\n",
-    "    techniques = ['magnitude_pruning', 'quantization', 'structured_pruning', 'combined']\n",
-    "    for technique in techniques:\n",
-    "        assert technique in results, f\"Missing technique: {technique}\"\n",
-    "        result = results[technique]\n",
-    "        \n",
-    "        # Magnitude pruning creates sparsity but doesn't reduce file size in our simulation\n",
-    "        if technique == 'magnitude_pruning':\n",
-    "            assert result['compression_ratio'] >= 1.0, f\"{technique} should have compression ratio >= 1.0\"\n",
-    "        else:\n",
-    "            assert result['compression_ratio'] > 1.0, f\"{technique} should have compression ratio > 1.0\"\n",
-    "            \n",
-    "        assert 0 <= result['memory_reduction'] <= 1.0, f\"{technique} memory reduction should be between 0 and 1\"\n",
-    "        \n",
-    "    print(\"✅ All compression techniques work correctly\")\n",
-    "    \n",
-    "    # Verify compression effectiveness\n",
-    "    quantization = results['quantization']\n",
-    "    structured = results['structured_pruning']\n",
-    "    combined = results['combined']\n",
-    "    \n",
-    "    assert quantization['compression_ratio'] >= 3.0, f\"Quantization should achieve at least 3x compression, got {quantization['compression_ratio']:.2f}\"\n",
-    "    assert structured['compression_ratio'] >= 1.2, f\"Structured pruning should achieve at least 1.2x compression, got {structured['compression_ratio']:.2f}\"\n",
-    "    assert combined['compression_ratio'] >= quantization['compression_ratio'], \"Combined should be at least as good as the best individual technique\"\n",
-    "    \n",
-    "    print(\"✅ Compression effectiveness verified:\")\n",
-    "    print(f\"  - Quantization: {quantization['compression_ratio']:.2f}x compression\")\n",
-    "    print(f\"  - Structured: {structured['compression_ratio']:.2f}x compression\") \n",
-    "    print(f\"  - Combined: {combined['compression_ratio']:.2f}x compression\")\n",
-    "    \n",
-    "    # Verify different techniques have different characteristics\n",
-    "    magnitude = results['magnitude_pruning']\n",
-    "    assert 'sparsity' in magnitude, \"Magnitude pruning should report sparsity\"\n",
-    "    assert 'avg_memory_reduction_factor' in quantization, \"Quantization should report memory reduction factor\"\n",
-    "    assert 'param_reduction' in structured, \"Structured pruning should report parameter reduction\"\n",
-    "    \n",
-    "    print(\"✅ Technique-specific metrics work correctly\")\n",
-    "    \n",
-    "    print(\"📈 Progress: Comprehensive Comparison ✓\")\n",
-    "    print(\"🎯 Comprehensive comparison behavior:\")\n",
-    "    print(\"  - Compares all techniques systematically\")\n",
-    "    print(\"  - Provides detailed metrics for each approach\")\n",
-    "    print(\"  - Enables informed compression strategy selection\")\n",
-    "    print(\"  - Demonstrates combined technique effectiveness\")\n",
-    "    print()\n",
-    "\n",
-    "# Run the test\n",
-    "test_comprehensive_comparison()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "7df3b1d9",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧪 Module Testing\n",
-    "\n",
-    "Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
-    "\n",
-    "**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "0b4e8651",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "standardized-testing",
-     "locked": true,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# =============================================================================\n",
-    "# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
-    "# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
-    "# =============================================================================\n",
-    "\n",
-    "if __name__ == \"__main__\":\n",
-    "    from tito.tools.testing import run_module_tests_auto\n",
-    "    \n",
-    "    # Automatically discover and run all tests in this module\n",
-    "    success = run_module_tests_auto(\"Compression\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "4c1769f7",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 📋 Module Summary\n",
-    "\n",
-    "### ✅ What We've Built\n",
-    "This compression module provides a complete toolkit for making neural networks efficient:\n",
-    "\n",
-    "#### **1. CompressionMetrics** ✓\n",
-    "- **Parameter counting**: Analyze model size and distribution\n",
-    "- **Memory footprint**: Calculate storage requirements in different data types\n",
-    "- **Foundation**: Baseline measurement for compression decisions\n",
-    "\n",
-    "#### **2. Magnitude-Based Pruning** ✓\n",
-    "- **Weight removal**: Remove smallest weights based on magnitude\n",
-    "- **Sparsity creation**: Create sparse matrices for memory efficiency\n",
-    "- **Flexible thresholds**: Support different pruning intensities\n",
-    "\n",
-    "#### **3. Quantization** ✓\n",
-    "- **Precision reduction**: Convert FP32 → INT8 for 75% memory savings\n",
-    "- **Error tracking**: Monitor quantization impact on model accuracy\n",
-    "- **Multiple bit-widths**: Support 16-bit, 8-bit, and other precisions\n",
-    "\n",
-    "#### **4. Knowledge Distillation** ✓\n",
-    "- **Teacher-student training**: Large models guide small model learning\n",
-    "- **Soft targets**: Rich probability distributions vs hard labels\n",
-    "- **Temperature scaling**: Control knowledge transfer richness\n",
-    "\n",
-    "#### **5. Structured Pruning** ✓\n",
-    "- **Neuron removal**: Remove entire neurons for actual hardware speedup\n",
-    "- **Architecture modification**: Create smaller but dense networks\n",
-    "- **Importance metrics**: Multiple methods for ranking neuron importance\n",
-    "\n",
-    "#### **6. Comprehensive Comparison** ✓\n",
-    "- **Systematic evaluation**: Compare all techniques on same baseline\n",
-    "- **Combined approaches**: Integrate multiple techniques for maximum compression\n",
-    "- **Trade-off analysis**: Understand compression vs accuracy spectrum\n",
-    "\n",
-    "### 🎯 Real-World Applications\n",
-    "Students can now optimize models for:\n",
-    "- **Mobile AI**: < 10MB models for smartphone deployment\n",
-    "- **Edge computing**: < 1MB models for IoT and embedded systems\n",
-    "- **Production cloud**: Cost-optimized inference at scale\n",
-    "- **Research**: Systematic compression comparison and analysis\n",
-    "\n",
-    "### 📊 Compression Achievements\n",
-    "With the complete toolkit, students can achieve:\n",
-    "- **Up to 4x memory reduction**: Through quantization (FP32 → INT8)\n",
-    "- **1.3x+ speedup**: Through structured pruning (actual hardware benefit)\n",
-    "- **5x+ combined compression**: Integrating multiple techniques\n",
-    "- **Flexible trade-offs**: Balance accuracy, size, and speed as needed\n",
-    "\n",
-    "### 🔗 Next Steps\n",
-    "\n",
-    "This compression foundation prepares students for:\n",
-    "- **Kernels**: Hardware-accelerated compression operations\n",
-    "- **Benchmarking**: Systematic performance evaluation and optimization\n",
-    "- **MLOps**: Production deployment with compressed models\n",
-    "\n",
-    "### 🚀 Professional Applications\n",
-    "Your compression toolkit enables:\n",
-    "- **Production AI**: Deploy efficient models at scale\n",
-    "- **Mobile Applications**: Real-time AI on smartphones and tablets\n",
-    "- **Edge Computing**: AI in IoT devices and embedded systems\n",
-    "- **Research**: Systematic compression analysis and method development\n",
-    "\n",
-    "### 🎯 The Future of Efficient AI\n",
-    "You've built the foundation for efficient AI systems:\n",
-    "- **Sustainable AI**: Reduced energy consumption and carbon footprint\n",
-    "- **Accessible AI**: AI systems that run on consumer hardware\n",
-    "- **Scalable Inference**: Cost-effective deployment at any scale\n",
-    "- **Real-time Applications**: Fast, efficient AI for interactive systems\n",
-    "\n",
-    "### 🧠 Key Skills Developed\n",
-    "- **Compression Theory**: Understanding memory, compute, and accuracy trade-offs\n",
-    "- **Mathematical Implementation**: Quantization, pruning, and distillation algorithms\n",
-    "- **Systems Engineering**: Benchmarking, comparison, and optimization frameworks\n",
-    "- **Production Readiness**: Real-world deployment considerations and techniques\n",
-    "\n",
-    "You've mastered the art and science of making neural networks efficient without sacrificing capability. This is the foundation of modern AI deployment!"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d334f996",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 2
-   },
-   "source": [
-    "## 🚀 Next Steps: Advanced Optimization\n",
-    "\n",
-    "### Kernels - Hardware-Aware Optimization\n",
-    "Build on compression foundations with:\n",
-    "- **Custom CUDA kernels**: GPU-optimized operations for compressed models\n",
-    "- **SIMD optimization**: CPU vectorization for quantized operations\n",
-    "- **Memory layout**: Optimize data structures for sparse and quantized weights\n",
-    "- **Hardware profiling**: Measure actual performance improvements\n",
-    "\n",
-    "### Benchmarking - Systematic Performance Measurement\n",
-    "Apply compression in production context:\n",
-    "- **Latency measurement**: Quantify inference speedup from compression\n",
-    "- **Accuracy evaluation**: Systematic testing of compression impact\n",
-    "- **A/B testing**: Compare compressed vs uncompressed models in production\n",
-    "- **Performance profiling**: Identify bottlenecks and optimization opportunities\n",
-    "\n",
-    "### MLOps - Production Deployment\n",
-    "Deploy compressed models at scale:\n",
-    "- **Model versioning**: Manage compressed model variants\n",
-    "- **Monitoring**: Track compressed model performance in production\n",
-    "- **Continuous optimization**: Automated compression pipeline\n",
-    "- **Edge deployment**: Distribute compressed models to mobile and IoT devices\n",
-    "\n",
-    "### 🔬 Research Directions\n",
-    "Advanced compression techniques:\n",
-    "- **Neural Architecture Search**: Automated compression-aware design\n",
-    "- **Hardware-aware compression**: Optimize for specific deployment targets\n",
-    "- **Dynamic compression**: Adaptive compression based on runtime conditions\n",
-    "- **Federated compression**: Compress models for distributed learning\n",
-    "\n",
-    "### 💼 Career Applications\n",
-    "These compression skills are essential for:\n",
-    "- **Mobile AI Engineer**: Optimize models for smartphones and tablets\n",
-    "- **Edge AI Developer**: Deploy AI on IoT and embedded systems\n",
-    "- **ML Infrastructure Engineer**: Build efficient inference systems\n",
-    "- **Research Scientist**: Advance state-of-art compression techniques\n",
-    "\n",
-    "The compression module provides the foundation for all advanced optimization and deployment scenarios!"
-   ]
-  }
- ],
- "metadata": {
-  "jupytext": {
-   "main_language": "python"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/modules/source/13_kernels/kernels_dev.ipynb b/modules/source/13_kernels/kernels_dev.ipynb
deleted file mode 100644
index 1632ab4b..00000000
--- a/modules/source/13_kernels/kernels_dev.ipynb
+++ /dev/null
@@ -1,1743 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "587e066d",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "# Kernels - Hardware-Optimized ML Operations\n",
-    "\n",
-    "Welcome to the Kernels module! This is where we move beyond NumPy to understand how ML operations are optimized for modern hardware. You'll implement custom kernels that run faster than standard library functions.\n",
-    "\n",
-    "## Learning Goals\n",
-    "- Understand why custom kernels matter for ML performance\n",
-    "- Implement vectorized operations using SIMD principles\n",
-    "- Master memory-efficient algorithms for better cache utilization\n",
-    "- Build parallel processing patterns for CPU and GPU-style computing\n",
-    "- Create performance profiling tools to measure and optimize code\n",
-    "- Apply kernel optimizations to compressed model operations\n",
-    "\n",
-    "## Build → Use → Optimize\n",
-    "1. **Build**: Custom operations, vectorization, and memory optimization\n",
-    "2. **Use**: Apply optimized kernels to real ML workloads\n",
-    "3. **Optimize**: Profile, measure, and improve performance systematically"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ffa956a4",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "kernels-imports",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| default_exp core.kernels\n",
-    "\n",
-    "#| export\n",
-    "import numpy as np\n",
-    "import sys\n",
-    "import os\n",
-    "import time\n",
-    "import tracemalloc\n",
-    "import psutil\n",
-    "from typing import Callable, Dict, Any, Optional, Tuple, List\n",
-    "from functools import wraps\n",
-    "from pathlib import Path\n",
-    "\n",
-    "# Import our existing components\n",
-    "try:\n",
-    "    from tinytorch.core.tensor import Tensor\n",
-    "    from tinytorch.core.layers import matmul_naive as matmul\n",
-    "    from tinytorch.core.activations import ReLU, Sigmoid, Tanh\n",
-    "    from tinytorch.core.cnn import Conv2D\n",
-    "except ImportError:\n",
-    "    # For development, import from local modules\n",
-    "    base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))\n",
-    "    sys.path.extend([\n",
-    "        os.path.join(base_dir, '01_tensor'),\n",
-    "        os.path.join(base_dir, '02_activations'),\n",
-    "        os.path.join(base_dir, '03_layers'),\n",
-    "        os.path.join(base_dir, '05_cnn'),\n",
-    "        os.path.join(base_dir, 'utils')\n",
-    "    ])\n",
-    "    \n",
-    "    try:\n",
-    "        from tensor_dev import Tensor\n",
-    "        from layers_dev import matmul_naive as matmul\n",
-    "        from activations_dev import ReLU, Sigmoid, Tanh\n",
-    "        from cnn_dev import Conv2D\n",
-    "    except ImportError:\n",
-    "        # Create minimal mock for development\n",
-    "        class Tensor:\n",
-    "            def __init__(self, data):\n",
-    "                self.data = np.array(data)\n",
-    "                self.shape = self.data.shape\n",
-    "            def __str__(self):\n",
-    "                return f\"Tensor({self.data})\"\n",
-    "\n",
-    "# Simple timing utility for kernel performance measurement\n",
-    "def time_kernel(func, *args, **kwargs):\n",
-    "    \"\"\"\n",
-    "    Simple timing function for measuring kernel performance.\n",
-    "    \n",
-    "    Returns:\n",
-    "        tuple: (result, time_in_microseconds)\n",
-    "    \"\"\"\n",
-    "    start = time.perf_counter()\n",
-    "    result = func(*args, **kwargs)\n",
-    "    end = time.perf_counter()\n",
-    "    microseconds = (end - start) * 1_000_000\n",
-    "    return result, microseconds"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "8f7acaa1",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "kernels-setup",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "print(\"🔥 TinyTorch Kernels Module\")\n",
-    "print(f\"NumPy version: {np.__version__}\")\n",
-    "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
-    "print(f\"System: {psutil.cpu_count()} CPU cores, {psutil.virtual_memory().total / (1024**3):.1f}GB RAM\")\n",
-    "print(\"Ready to optimize ML operations!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "23adc0f7",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 📦 Where This Code Lives in the Final Package\n",
-    "\n",
-    "**Learning Side:** You work in `modules/source/13_kernels/kernels_dev.py`  \n",
-    "**Building Side:** Code exports to `tinytorch.core.kernels`\n",
-    "\n",
-    "```python\n",
-    "# Final package structure:\n",
-    "from tinytorch.core.kernels import vectorized_matmul, parallel_relu, cached_conv2d\n",
-    "from tinytorch.core.tensor import Tensor\n",
-    "from tinytorch.core.layers import Dense\n",
-    "```\n",
-    "\n",
-    "**Why this matters:**\n",
-    "- **Performance:** Custom kernels can be 2-10x faster than naive implementations\n",
-    "- **Understanding:** Learn how PyTorch, TensorFlow achieve their speed\n",
-    "- **Real-world:** Modern ML frameworks rely heavily on optimized kernels\n",
-    "- **Hardware:** Bridge the gap between algorithms and computer architecture"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "35692f6e",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## What are ML Kernels?\n",
-    "\n",
-    "### The Performance Gap\n",
-    "Your neural network training is slow. A simple matrix multiplication that should take milliseconds takes seconds. Why?\n",
-    "\n",
-    "**The problem:** NumPy operations, while convenient, aren't optimized for your specific hardware or use case.\n",
-    "\n",
-    "**The solution:** Custom kernels - specialized functions written to extract maximum performance from your hardware.\n",
-    "\n",
-    "### What is a Kernel?\n",
-    "A **kernel** is a highly optimized function that performs a specific computation:\n",
-    "\n",
-    "```python\n",
-    "# Standard approach - easy but slow\n",
-    "def slow_matmul(A, B):\n",
-    "    return np.dot(A, B)\n",
-    "\n",
-    "# Kernel approach - harder but fast\n",
-    "def fast_matmul(A, B):\n",
-    "    # Optimized for your CPU's cache hierarchy\n",
-    "    # Uses SIMD instructions for parallel operations\n",
-    "    # Minimizes memory allocations\n",
-    "    raise NotImplementedError(\"illustrative placeholder for the optimized kernel\")\n",
-    "```\n",
-    "\n",
-    "### Why Kernels Matter for ML\n",
-    "Modern ML frameworks achieve their speed through thousands of optimized kernels:\n",
-    "\n",
-    "- **PyTorch**: Hand-tuned CUDA and CPU kernels behind its operators\n",
-    "- **TensorFlow**: XLA compiler generates optimized kernels\n",
-    "- **JAX**: JIT compilation creates specialized kernels\n",
-    "- **Hardware**: GPUs have thousands of cores, TPUs have specialized ML units\n",
-    "\n",
-    "### The Performance Hierarchy\n",
-    "```\n",
-    "Python loops:        1x speed    (baseline)\n",
-    "NumPy operations:    10x speed   (vectorized)\n",
-    "Optimized kernels:   100x speed  (hardware-aware)\n",
-    "GPU kernels:         1000x speed (massive parallelism)\n",
-    "```\n",
-    "\n",
-    "### Real-World Impact\n",
-    "- **Training time**: A 10x faster kernel can turn a 10-hour training run into roughly 1 hour\n",
-    "- **Inference cost**: The same speedup can cut a $1000/month bill to roughly $100/month\n",
-    "- **Model size**: Efficiency gains make larger models practical\n",
-    "- **Energy**: Less compute time per result means lower power consumption\n",
-    "\n",
-    "### What You'll Learn\n",
-    "1. **Custom operations** - Moving beyond NumPy limitations\n",
-    "2. **Vectorization** - Using SIMD for parallel computation\n",
-    "3. **Memory optimization** - Cache-friendly algorithms\n",
-    "4. **Parallel processing** - CPU and GPU-style parallelism\n",
-    "5. **Performance measurement** - Professional profiling tools\n",
-    "6. **Compressed kernels** - Optimizations for quantized models\n",
-    "\n",
-    "Let's build the optimizations that power modern AI!"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "8be40806",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 1: Custom Operations - Beyond NumPy\n",
-    "\n",
-    "### Why Custom Operations?\n",
-    "NumPy is great for prototyping, but has limitations:\n",
-    "- **Generic**: Optimized for general use, not your specific case\n",
-    "- **Memory**: Creates temporary arrays, wastes memory\n",
-    "- **Control**: Can't control memory layout, algorithm choice\n",
-    "- **Specialization**: Can't optimize for your data patterns\n",
-    "\n",
-    "### The Philosophy\n",
-    "Instead of using general-purpose functions, we write **specialized** functions:\n",
-    "\n",
-    "```python\n",
-    "# Generic NumPy approach\n",
-    "def generic_activation(x):\n",
-    "    return np.maximum(0, x)  # ReLU\n",
-    "\n",
-    "# Specialized kernel approach  \n",
-    "def fast_relu_kernel(x):\n",
-    "    # Optimized for your specific use case\n",
-    "    # No unnecessary memory allocations\n",
-    "    # Optimized for your data sizes\n",
-    "    raise NotImplementedError(\"illustrative placeholder for the specialized kernel\")\n",
-    "```\n",
-    "\n",
-    "### Design Principles\n",
-    "- **Specialization**: Optimize for specific input patterns\n",
-    "- **Memory efficiency**: Minimize allocations and copies\n",
-    "- **Algorithmic choice**: Pick the best algorithm for your data\n",
-    "- **Measurement**: Always profile before and after\n",
-    "\n",
-    "### Real-World Context\n",
-    "This is how:\n",
-    "- **PyTorch**: Custom autograd functions override standard operations\n",
-    "- **TensorFlow**: tf.function compiles optimized graphs\n",
-    "- **JAX**: jax.jit creates specialized kernels\n",
-    "- **CUDA**: Every GPU operation is a custom kernel"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "1071a672",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "custom-matmul",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def matmul_baseline(A: Tensor, B: Tensor) -> Tensor:\n",
-    "    \"\"\"\n",
-    "    Baseline matrix multiplication built on NumPy's `np.dot`.\n",
-    "    \n",
-    "    This function demonstrates how to build on an existing, well-tested\n",
-    "    implementation rather than reinventing the wheel. We wrap `np.dot` so that\n",
-    "    it accepts and returns Tensors, and use it as the baseline for comparison with optimized kernels.\n",
-    "    \n",
-    "    This is NOT a custom implementation - it's a thin Tensor wrapper around\n",
-    "    `np.dot` for use in kernel comparisons and benchmarking.\n",
-    "    \n",
-    "    TODO: Wrap NumPy's matrix multiplication as a baseline.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Extract numpy arrays from input Tensors\n",
-    "    2. Multiply the arrays with np.dot\n",
-    "    3. Wrap result back in Tensor format\n",
-    "    4. Return the result\n",
-    "    \n",
-    "    CODE REUSE PRINCIPLES:\n",
-    "    1. Always use the packaged version for reliability\n",
-    "    2. Don't duplicate working code - reference the source\n",
-    "    3. Use descriptive names that indicate what the function actually does\n",
-    "    4. Keep dependencies simple and reliable\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    A = Tensor([[1, 2], [3, 4]])\n",
-    "    B = Tensor([[5, 6], [7, 8]])\n",
-    "    C = matmul_baseline(A, B)\n",
-    "    # Expected: [[19, 22], [43, 50]]\n",
-    "    ```\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - This shows how to use TinyTorch as a library\n",
-    "    - Demonstrates reliable dependency management\n",
-    "    - Serves as baseline for kernel performance comparisons\n",
-    "    - Shows proper software engineering practices\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Extract numpy arrays from Tensors\n",
-    "    A_data = A.data if hasattr(A, 'data') else A\n",
-    "    B_data = B.data if hasattr(B, 'data') else B\n",
-    "    \n",
-    "    # Use NumPy's matrix multiplication as our baseline: reliable, tested, and consistent\n",
-    "    result_data = np.dot(A_data, B_data)\n",
-    "    \n",
-    "    # Wrap the result back in a Tensor for consistency\n",
-    "    result = Tensor(result_data)\n",
-    "    \n",
-    "    return result\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ecf63ac1",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-custom-matmul",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "### 🧪 Unit Test: Baseline Matrix Multiplication\n",
-    "\n",
-    "def test_matmul_baseline():\n",
-    "    \"\"\"Test baseline matrix multiplication implementation.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Baseline Matrix Multiplication...\")\n",
-    "    \n",
-    "    # Test case 1: Small matrices (2x2)\n",
-    "    A = Tensor([[1, 2], [3, 4]])\n",
-    "    B = Tensor([[5, 6], [7, 8]])\n",
-    "    C = matmul_baseline(A, B)\n",
-    "    expected = Tensor([[19, 22], [43, 50]])  # Hand-computed\n",
-    "    \n",
-    "    assert np.allclose(C.data, expected.data), f\"Expected {expected.data}, got {C.data}\"\n",
-    "    print(\"✅ Small matrix multiplication works\")\n",
-    "    \n",
-    "    # Test case 2: Rectangular matrices\n",
-    "    A = Tensor([[1, 2, 3], [4, 5, 6]])  # 2x3\n",
-    "    B = Tensor([[7, 8], [9, 10], [11, 12]])  # 3x2\n",
-    "    C = matmul_baseline(A, B)\n",
-    "    expected = Tensor([[58, 64], [139, 154]])\n",
-    "    \n",
-    "    assert np.allclose(C.data, expected.data), f\"Expected {expected.data}, got {C.data}\"\n",
-    "    print(\"✅ Rectangular matrix multiplication works\")\n",
-    "    \n",
-    "    # Test case 3: Compare with NumPy directly (medium size)\n",
-    "    np.random.seed(42)\n",
-    "    A = Tensor(np.random.randn(32, 32))\n",
-    "    B = Tensor(np.random.randn(32, 32))\n",
-    "    \n",
-    "    C_baseline = matmul_baseline(A, B)\n",
-    "    C_numpy = Tensor(np.dot(A.data, B.data))\n",
-    "    \n",
-    "    assert np.allclose(C_baseline.data, C_numpy.data, rtol=1e-10), \"Baseline implementation differs from NumPy\"\n",
-    "    print(\"✅ Baseline implementation matches NumPy\")\n",
-    "    \n",
-    "    # Test case 4: Large matrix\n",
-    "    A = Tensor(np.random.randn(100, 100))\n",
-    "    B = Tensor(np.random.randn(100, 100))\n",
-    "    C = matmul_baseline(A, B)\n",
-    "    \n",
-    "    assert C.shape == (100, 100), f\"Expected shape (100, 100), got {C.shape}\"\n",
-    "    print(\"✅ Large matrix multiplication works\")\n",
-    "    \n",
-    "    print(\"📈 Progress: Baseline Matrix Multiplication ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_matmul_baseline()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "cc5be1bb",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 2: Vectorized Operations - SIMD Principles\n",
-    "\n",
-    "### What is Vectorization?\n",
-    "**Vectorization** means processing multiple data elements in parallel using SIMD (Single Instruction, Multiple Data) operations.\n",
-    "\n",
-    "### The Problem with Loops\n",
-    "```python\n",
-    "# Scalar processing - one element at a time\n",
-    "import numpy as np\n",
-    "\n",
-    "def slow_relu(x):\n",
-    "    result = np.zeros_like(x)\n",
-    "    for i in range(len(x)):\n",
-    "        result[i] = max(0, x[i])  # One element per interpreted loop iteration\n",
-    "    return result\n",
-    "```\n",
-    "\n",
-    "### The Vectorization Solution\n",
-    "```python\n",
-    "# Vector processing - multiple elements at once\n",
-    "def fast_relu(x):\n",
-    "    return np.maximum(0, x)  # Whole array handled by one call into compiled code\n",
-    "```\n",
-    "\n",
-    "### Why Vectorization Matters\n",
-    "- **CPU SIMD**: One instruction can process 4 (SSE), 8 (AVX2), or 16 (AVX-512) float32 values\n",
-    "- **GPU parallelism**: GPUs have thousands of cores for parallel processing\n",
-    "- **Memory bandwidth**: Better utilization of memory transfers\n",
-    "- **Compiler optimization**: Enables automatic vectorization\n",
-    "\n",
-    "### SIMD Principles\n",
-    "1. **Data parallelism**: Same operation on multiple data elements\n",
-    "2. **Memory alignment**: Aligned data enables faster SIMD instructions\n",
-    "3. **Batch processing**: Process data in chunks that fit SIMD registers\n",
-    "4. **Avoid branches**: Conditional operations break SIMD efficiency\n",
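-    "\n",
-    "The principles above can be sketched in a few lines. This is a minimal illustration, not TinyTorch code: `loop_relu` and `branch_free_relu` are hypothetical names, and whether `np.maximum` actually uses SIMD instructions depends on the NumPy build and the CPU.\n",
-    "\n",
-    "```python\n",
-    "import numpy as np\n",
-    "\n",
-    "def loop_relu(x):\n",
-    "    # Scalar version: one data-dependent branch per element, run by the interpreter\n",
-    "    out = np.empty_like(x)\n",
-    "    for i in range(x.size):\n",
-    "        out[i] = x[i] if x[i] > 0 else 0\n",
-    "    return out\n",
-    "\n",
-    "def branch_free_relu(x):\n",
-    "    # Same result with no per-element branch: one array-wide maximum\n",
-    "    return np.maximum(x, 0)\n",
-    "\n",
-    "x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0], dtype=np.float32)\n",
-    "assert np.array_equal(loop_relu(x), branch_free_relu(x))\n",
-    "```\n",
-    "\n",
-    "Both functions return the same values; the second expresses the work as a single data-parallel operation.\n",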
-    "\n",
-    "### Real-World Context\n",
-    "- **NumPy**: Element-wise operations run in compiled loops that can use SIMD; linear algebra is delegated to BLAS/LAPACK\n",
-    "- **PyTorch**: Vectorized operations dispatch to precompiled kernels that use SIMD instructions\n",
-    "- **GPU kernels**: Thousands of parallel threads process data\n",
-    "- **AVX-512**: 512-bit registers hold 16 float32 values, so one instruction can process 16 floats at once"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "40e8241f",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "vectorized-relu",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def vectorized_relu(x: Tensor) -> Tensor:\n",
-    "    \"\"\"\n",
-    "    Vectorized ReLU implementation demonstrating SIMD principles.\n",
-    "    \n",
-    "    This function shows how to write operations that take advantage of\n",
-    "    CPU vectorization capabilities for better performance.\n",
-    "    \n",
-    "    TODO: Implement a vectorized ReLU that's optimized for performance.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Extract numpy array from Tensor\n",
-    "    2. Use NumPy's vectorized operations (these run in compiled loops that can use SIMD instructions)\n",
-    "    3. Apply ReLU: f(x) = max(0, x) for all elements simultaneously\n",
-    "    4. Return result as Tensor\n",
-    "    \n",
-    "    VECTORIZATION TECHNIQUES:\n",
-    "    1. Use np.maximum instead of loops - this is vectorized\n",
-    "    2. Ensure input is contiguous in memory for better SIMD performance\n",
-    "    3. Consider the dtype: a SIMD register holds twice as many float32 values as float64 values\n",
-    "    4. Avoid conditional operations that break vectorization\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    x = Tensor([-2, -1, 0, 1, 2])\n",
-    "    y = vectorized_relu(x)\n",
-    "    # Expected: [0, 0, 0, 1, 2]\n",
-    "    ```\n",
-    "    \n",
-    "    PERFORMANCE CONSIDERATIONS:\n",
-    "    - np.maximum is vectorized and can use SIMD instructions, depending on the NumPy build and CPU\n",
-    "    - Memory layout matters: contiguous arrays are faster\n",
-    "    - Data type matters: float32 allows more SIMD parallelism than float64\n",
-    "    - Avoid Python loops - they run one element at a time in the interpreter\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - Frameworks such as PyTorch implement ReLU with the same element-wise, vectorized approach\n",
-    "    - GPU kernels use similar principles with thousands of parallel threads\n",
-    "    - Modern CPUs can process 4-16 floats simultaneously with SIMD\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Extract numpy array\n",
-    "    x_data = x.data if hasattr(x, 'data') else x\n",
-    "    \n",
-    "    # Ensure contiguous memory layout for better SIMD performance\n",
-    "    if not x_data.flags.c_contiguous:\n",
-    "        x_data = np.ascontiguousarray(x_data)\n",
-    "    \n",
-    "    # Vectorized ReLU using NumPy's maximum function\n",
-    "    # This runs in a compiled loop that can use SIMD instructions on modern CPUs\n",
-    "    result = np.maximum(0, x_data)\n",
-    "    \n",
-    "    return Tensor(result)\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "45ada8d8",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "vectorized-operations",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def vectorized_operations(x: Tensor, y: Tensor) -> Dict[str, Tensor]:\n",
-    "    \"\"\"\n",
-    "    Demonstration of various vectorized operations.\n",
-    "    \n",
-    "    Shows how multiple operations can be vectorized for better performance.\n",
-    "    \n",
-    "    TODO: Implement a collection of vectorized operations.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Extract numpy arrays from input Tensors\n",
-    "    2. Implement vectorized versions of common operations\n",
-    "    3. Use NumPy's built-in vectorized functions\n",
-    "    4. Return dictionary of results\n",
-    "    \n",
-    "    OPERATIONS TO IMPLEMENT:\n",
-    "    - element_wise_multiply: x * y (element-wise)\n",
-    "    - element_wise_add: x + y (element-wise)\n",
-    "    - squared_difference: (x - y)^2\n",
-    "    - euclidean_distance: sqrt(sum((x - y)^2))\n",
-    "    - dot_product: sum(x * y)\n",
-    "    \n",
-    "    VECTORIZATION PRINCIPLES:\n",
-    "    - Use NumPy operations instead of Python loops\n",
-    "    - Write whole-array expressions such as (x - y)**2 rather than per-element arithmetic\n",
-    "    - Consider memory layout and data types\n",
-    "    - Measure performance improvements\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    x = Tensor([1, 2, 3, 4])\n",
-    "    y = Tensor([2, 3, 4, 5])\n",
-    "    results = vectorized_operations(x, y)\n",
-    "    # Returns dict with all vectorized operation results\n",
-    "    ```\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Extract numpy arrays\n",
-    "    x_data = x.data if hasattr(x, 'data') else x\n",
-    "    y_data = y.data if hasattr(y, 'data') else y\n",
-    "    \n",
-    "    # Ensure arrays are the same shape for element-wise operations\n",
-    "    assert x_data.shape == y_data.shape, f\"Shape mismatch: {x_data.shape} vs {y_data.shape}\"\n",
-    "    \n",
-    "    # Vectorized operations (the squared difference is computed once and reused)\n",
-    "    squared_diff = (x_data - y_data) ** 2\n",
-    "    results = {\n",
-    "        'element_wise_multiply': Tensor(x_data * y_data),\n",
-    "        'element_wise_add': Tensor(x_data + y_data),\n",
-    "        'squared_difference': Tensor(squared_diff),\n",
-    "        'euclidean_distance': Tensor(np.sqrt(np.sum(squared_diff))),\n",
-    "        'dot_product': Tensor(np.dot(x_data.flatten(), y_data.flatten()))\n",
-    "    }\n",
-    "    \n",
-    "    return results\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c722c6c8",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-vectorized-operations",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "### 🧪 Unit Test: Vectorized Operations\n",
-    "\n",
-    "def test_vectorized_operations():\n",
-    "    \"\"\"Test vectorized operations implementation.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Vectorized Operations...\")\n",
-    "    \n",
-    "    # Test vectorized ReLU\n",
-    "    x = Tensor([-2, -1, 0, 1, 2])\n",
-    "    y = vectorized_relu(x)\n",
-    "    expected = [0, 0, 0, 1, 2]\n",
-    "    \n",
-    "    assert np.allclose(y.data, expected), f\"Expected {expected}, got {y.data}\"\n",
-    "    print(\"✅ Vectorized ReLU works\")\n",
-    "    \n",
-    "    # Test vectorized operations\n",
-    "    x = Tensor([1, 2, 3, 4])\n",
-    "    y = Tensor([2, 3, 4, 5])\n",
-    "    results = vectorized_operations(x, y)\n",
-    "    \n",
-    "    # Check element-wise multiply\n",
-    "    expected_mul = [2, 6, 12, 20]\n",
-    "    assert np.allclose(results['element_wise_multiply'].data, expected_mul), \\\n",
-    "        f\"Expected {expected_mul}, got {results['element_wise_multiply'].data}\"\n",
-    "    print(\"✅ Element-wise multiply works\")\n",
-    "    \n",
-    "    # Check element-wise add\n",
-    "    expected_add = [3, 5, 7, 9]\n",
-    "    assert np.allclose(results['element_wise_add'].data, expected_add), \\\n",
-    "        f\"Expected {expected_add}, got {results['element_wise_add'].data}\"\n",
-    "    print(\"✅ Element-wise add works\")\n",
-    "    \n",
-    "    # Check squared difference\n",
-    "    expected_sq_diff = [1, 1, 1, 1]  # (1-2)^2, (2-3)^2, etc.\n",
-    "    assert np.allclose(results['squared_difference'].data, expected_sq_diff), \\\n",
-    "        f\"Expected {expected_sq_diff}, got {results['squared_difference'].data}\"\n",
-    "    print(\"✅ Squared difference works\")\n",
-    "    \n",
-    "    # Check dot product\n",
-    "    expected_dot = 40  # 1*2 + 2*3 + 3*4 + 4*5 = 2 + 6 + 12 + 20 = 40\n",
-    "    assert np.allclose(results['dot_product'].data, expected_dot), \\\n",
-    "        f\"Expected {expected_dot}, got {results['dot_product'].data}\"\n",
-    "    print(\"✅ Dot product works\")\n",
-    "    \n",
-    "    print(\"📈 Progress: Vectorized Operations ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_vectorized_operations()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5ce0511a",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 3: Memory Layout Optimization - Cache-Friendly Algorithms\n",
-    "\n",
-    "### Why Memory Layout Matters\n",
-    "Many numerical workloads on modern CPUs are **memory-bound**, not compute-bound. The bottleneck is often not how fast you can multiply numbers, but how fast you can get data from memory.\n",
-    "\n",
-    "### The Memory Hierarchy\n",
-    "```\n",
-    "CPU Registers:    ~1 cycle     (fastest, tiny)\n",
-    "L1 Cache:         ~3 cycles    (fast, small)\n",
-    "L2 Cache:         ~10 cycles   (medium, medium)\n",
-    "L3 Cache:         ~40 cycles   (slow, large)\n",
-    "Main Memory:      200+ cycles  (slowest, huge)\n",
-    "```\n",
-    "\n",
-    "### Cache-Friendly Principles\n",
-    "1. **Spatial locality**: Access nearby memory locations\n",
-    "2. **Temporal locality**: Reuse recently accessed data\n",
-    "3. **Cache lines**: Memory is loaded in 64-byte chunks\n",
-    "4. **Cache blocking**: Process data in cache-sized chunks\n",
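-    "\n",
-    "A small sketch of what these principles mean for a NumPy array. It assumes the 64-byte cache line from the list above and a C-ordered float32 array; `block_working_set_bytes` is a hypothetical helper, not part of TinyTorch.\n",
-    "\n",
-    "```python\n",
-    "import numpy as np\n",
-    "\n",
-    "CACHE_LINE_BYTES = 64  # assumed line size, taken from the list above\n",
-    "\n",
-    "a = np.zeros((4, 8), dtype=np.float32)  # C order: each row is contiguous\n",
-    "\n",
-    "# Bytes to step to the next row and to the next column\n",
-    "row_stride, col_stride = a.strides\n",
-    "\n",
-    "# A row is one contiguous run of memory; a column jumps a full row per element\n",
-    "row_is_contiguous = a[0].flags.c_contiguous\n",
-    "col_is_contiguous = a[:, 0].flags.c_contiguous\n",
-    "\n",
-    "# Number of neighbouring float32 values that share one cache line\n",
-    "floats_per_line = CACHE_LINE_BYTES // a.itemsize\n",
-    "\n",
-    "def block_working_set_bytes(block_size, itemsize):\n",
-    "    # A blocked matmul touches one block each of A, B and C at a time\n",
-    "    return 3 * block_size * block_size * itemsize\n",
-    "\n",
-    "working_set = block_working_set_bytes(32, 4)\n",
-    "```\n",
-    "\n",
-    "Walking along a row uses every value in each loaded cache line, while walking down a column uses only one value per line. The working-set figure gives a rough way to check a block size against a cache size.\n",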
-    "\n",
-    "### Real-World Impact\n",
-    "- **Matrix multiplication**: Cache-friendly algorithms can be many times faster on large matrices\n",
-    "- **Image processing**: Row-major vs column-major access patterns\n",
-    "- **Neural networks**: Memory layout affects training speed significantly\n",
-    "\n",
-    "### The Problem with Naive Algorithms\n",
-    "```python\n",
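-    "# Cache-unfriendly: writes to B jump a full row on every inner step\n",
-    "import numpy as np\n",
-    "\n",
-    "def slow_transpose(A):\n",
-    "    rows, cols = A.shape\n",
-    "    B = np.empty((cols, rows), dtype=A.dtype)\n",
-    "    for i in range(rows):\n",
-    "        for j in range(cols):\n",
-    "            B[j, i] = A[i, j]  # Poor cache locality\n",
-    "    return B\n",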
-    "```\n",
-    "\n",
-    "### Cache-Friendly Solution\n",
-    "```python\n",
-    "# Cache-friendly: processes data in blocks\n",
-    "import numpy as np\n",
-    "\n",
-    "BLOCK_SIZE = 32  # assumed block size; tune for your cache\n",
-    "\n",
-    "def fast_transpose(A):\n",
-    "    rows, cols = A.shape\n",
-    "    B = np.empty((cols, rows), dtype=A.dtype)\n",
-    "    # Process in cache-sized blocks\n",
-    "    for block_i in range(0, rows, BLOCK_SIZE):\n",
-    "        for block_j in range(0, cols, BLOCK_SIZE):\n",
-    "            # Process block - good cache locality\n",
-    "            for i in range(block_i, min(block_i + BLOCK_SIZE, rows)):\n",
-    "                for j in range(block_j, min(block_j + BLOCK_SIZE, cols)):\n",
-    "                    B[j, i] = A[i, j]\n",
-    "    return B\n",
-    "```"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "dde51622",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "cache-friendly-matmul",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def cache_friendly_matmul(A: Tensor, B: Tensor, block_size: int = 32) -> Tensor:\n",
-    "    \"\"\"\n",
-    "    Cache-friendly matrix multiplication using blocking technique.\n",
-    "    \n",
-    "    This implementation uses cache blocking to improve memory access patterns\n",
-    "    and achieve better performance on modern CPUs.\n",
-    "    \n",
-    "    TODO: Implement cache-friendly matrix multiplication using blocking.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Extract numpy arrays and get dimensions\n",
-    "    2. Pre-allocate output matrix\n",
-    "    3. Use three nested loops for blocks: block_i, block_j, block_k\n",
-    "    4. Within each block, use three nested loops for elements: i, j, k\n",
-    "    5. Process data in cache-sized blocks for better locality\n",
-    "    \n",
-    "    BLOCKING ALGORITHM:\n",
-    "    1. Divide matrices into blocks of size block_size x block_size\n",
-    "    2. For each block of C, compute contribution from corresponding A and B blocks\n",
-    "    3. This keeps data in cache longer, reducing memory access time\n",
-    "    \n",
-    "    CACHE OPTIMIZATION PRINCIPLES:\n",
-    "    - Process data in small blocks that fit in cache\n",
-    "    - Reuse data as much as possible while it's in cache\n",
-    "    - Access memory in predictable patterns\n",
-    "    - Minimize cache misses\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    A = Tensor([[1, 2], [3, 4]])\n",
-    "    B = Tensor([[5, 6], [7, 8]])\n",
-    "    C = cache_friendly_matmul(A, B, block_size=2)\n",
-    "    # Expected: [[19, 22], [43, 50]]\n",
-    "    ```\n",
-    "    \n",
-    "    PERFORMANCE HINTS:\n",
-    "    - block_size should be chosen based on cache size\n",
-    "    - Typical L1 data cache: 32KB; with block_size=32 and float32, the three 32x32 blocks (A, B, C) take 12KB and fit\n",
-    "    - Experiment with different block sizes for your hardware\n",
-    "    - This algorithm is still O(n^3); blocking reduces cache misses, not the operation count\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - Blocking is one of the key techniques BLAS libraries use to achieve high performance\n",
-    "    - GPUs use similar tiling strategies for shared memory\n",
-    "    - Modern compilers can sometimes do this automatically\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Extract numpy arrays\n",
-    "    A_data = A.data if hasattr(A, 'data') else A\n",
-    "    B_data = B.data if hasattr(B, 'data') else B\n",
-    "    \n",
-    "    # Get dimensions\n",
-    "    m, k = A_data.shape\n",
-    "    k2, n = B_data.shape\n",
-    "    assert k == k2, f\"Cannot multiply {A_data.shape} and {B_data.shape}\"\n",
-    "    \n",
-    "    # Pre-allocate output matrix with a dtype that can hold products of both inputs\n",
-    "    C = np.zeros((m, n), dtype=np.result_type(A_data, B_data))\n",
-    "    \n",
-    "    # Cache-friendly blocked matrix multiplication\n",
-    "    for block_i in range(0, m, block_size):\n",
-    "        for block_j in range(0, n, block_size):\n",
-    "            for block_k in range(0, k, block_size):\n",
-    "                # Define block boundaries\n",
-    "                end_i = min(block_i + block_size, m)\n",
-    "                end_j = min(block_j + block_size, n)\n",
-    "                end_k = min(block_k + block_size, k)\n",
-    "                \n",
-    "                # Process block - good cache locality\n",
-    "                for i in range(block_i, end_i):\n",
-    "                    for j in range(block_j, end_j):\n",
-    "                        for k_idx in range(block_k, end_k):\n",
-    "                            C[i, j] += A_data[i, k_idx] * B_data[k_idx, j]\n",
-    "    \n",
-    "    return Tensor(C)\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "88cc8869",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-cache-friendly",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "### 🧪 Unit Test: Cache-Friendly Matrix Multiplication\n",
-    "\n",
-    "def test_cache_friendly_matmul():\n",
-    "    \"\"\"Test cache-friendly matrix multiplication implementation.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Cache-Friendly Matrix Multiplication...\")\n",
-    "    \n",
-    "    # Test case 1: Small matrices\n",
-    "    A = Tensor([[1, 2], [3, 4]])\n",
-    "    B = Tensor([[5, 6], [7, 8]])\n",
-    "    C = cache_friendly_matmul(A, B, block_size=2)\n",
-    "    expected = [[19, 22], [43, 50]]\n",
-    "    \n",
-    "    assert np.allclose(C.data, expected), f\"Expected {expected}, got {C.data}\"\n",
-    "    print(\"✅ Small matrix cache-friendly multiplication works\")\n",
-    "    \n",
-    "    # Test case 2: Larger matrices with different block sizes\n",
-    "    np.random.seed(42)\n",
-    "    A = Tensor(np.random.randn(64, 64))\n",
-    "    B = Tensor(np.random.randn(64, 64))\n",
-    "    \n",
-    "    C_blocked = cache_friendly_matmul(A, B, block_size=16)\n",
-    "    C_numpy = Tensor(np.dot(A.data, B.data))\n",
-    "    \n",
-    "    assert np.allclose(C_blocked.data, C_numpy.data, rtol=1e-4), \\\n",
-    "        \"Cache-friendly implementation differs from NumPy\"\n",
-    "    print(\"✅ Cache-friendly implementation matches NumPy\")\n",
-    "    \n",
-    "    # Test case 3: Non-square matrices\n",
-    "    A = Tensor(np.random.randn(48, 32))\n",
-    "    B = Tensor(np.random.randn(32, 48))\n",
-    "    \n",
-    "    C_blocked = cache_friendly_matmul(A, B, block_size=8)\n",
-    "    C_numpy = Tensor(np.dot(A.data, B.data))\n",
-    "    \n",
-    "    assert np.allclose(C_blocked.data, C_numpy.data, rtol=1e-4), \\\n",
-    "        \"Non-square cache-friendly implementation differs from NumPy\"\n",
-    "    print(\"✅ Non-square matrix cache-friendly multiplication works\")\n",
-    "    \n",
-    "    print(\"📈 Progress: Cache-Friendly Algorithms ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_cache_friendly_matmul()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "63bce775",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 4: Parallel Processing - CPU and GPU-Style Computing\n",
-    "\n",
-    "### Why Parallel Processing?\n",
-    "Modern hardware has multiple cores, and ML workloads are inherently parallel. We need to use all available compute resources.\n",
-    "\n",
-    "### Types of Parallelism\n",
-    "1. **Data parallelism**: Split data across processors\n",
-    "2. **Task parallelism**: Split operations across processors\n",
-    "3. **Pipeline parallelism**: Different stages on different processors\n",
-    "4. **Model parallelism**: Split model across processors\n",
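-    "\n",
-    "As an illustration of the first item, data parallelism, here is a minimal sketch that splits an array across a thread pool. `relu_chunk` and `data_parallel_relu` are hypothetical names chosen for this example, and threads are just one possible choice of worker.\n",
-    "\n",
-    "```python\n",
-    "import numpy as np\n",
-    "from concurrent.futures import ThreadPoolExecutor\n",
-    "\n",
-    "def relu_chunk(chunk):\n",
-    "    return np.maximum(0, chunk)\n",
-    "\n",
-    "def data_parallel_relu(data, num_workers=4):\n",
-    "    # Split the data into roughly equal pieces, one per worker\n",
-    "    chunks = np.array_split(data, num_workers)\n",
-    "    with ThreadPoolExecutor(max_workers=num_workers) as executor:\n",
-    "        # executor.map returns results in the same order as the inputs\n",
-    "        parts = list(executor.map(relu_chunk, chunks))\n",
-    "    return np.concatenate(parts)\n",
-    "\n",
-    "data = np.linspace(-1.0, 1.0, 10)\n",
-    "result = data_parallel_relu(data, num_workers=3)\n",
-    "```\n",
-    "\n",
-    "Every worker runs the same operation on its own slice, and the slices are concatenated in their original order.\n",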
-    "\n",
-    "### CPU vs GPU Parallelism\n",
-    "- **CPU**: Few cores (4-64), complex operations, low latency\n",
-    "- **GPU**: Many cores (1000s), simple operations, high throughput\n",
-    "\n",
-    "### Parallel Processing Patterns\n",
-    "```python\n",
-    "# Sequential processing\n",
-    "for i in range(n):\n",
-    "    result[i] = expensive_operation(data[i])\n",
-    "\n",
-    "# Parallel processing\n",
-    "from concurrent.futures import ThreadPoolExecutor\n",
-    "\n",
-    "with ThreadPoolExecutor() as executor:\n",
-    "    futures = [executor.submit(expensive_operation, data[i]) for i in range(n)]\n",
-    "    results = [f.result() for f in futures]\n",
-    "```\n",
-    "\n",
-    "### Real-World Context\n",
-    "- **PyTorch**: Parallel data loading, distributed training\n",
-    "- **TensorFlow**: tf.data for parallel preprocessing\n",
-    "- **NumPy**: Multithreaded BLAS operations\n",
-    "- **GPU kernels**: Thousands of parallel threads"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "5945362d",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "parallel-relu",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def parallel_relu(x: Tensor, num_workers: int = 4) -> Tensor:\n",
-    "    \"\"\"\n",
-    "    Parallel ReLU implementation using multiple CPU cores.\n",
-    "    \n",
-    "    This function demonstrates data parallelism by splitting the input\n",
-    "    across multiple worker threads.\n",
-    "    \n",
-    "    TODO: Implement parallel ReLU using a thread pool.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Extract numpy array from Tensor\n",
-    "    2. Split array into chunks for parallel processing\n",
-    "    3. Define worker function that applies ReLU to a chunk\n",
-    "    4. Use ThreadPoolExecutor to process chunks in parallel\n",
-    "    5. Combine results from all workers\n",
-    "    6. Return result as Tensor\n",
-    "    \n",
-    "    PARALLELIZATION STRATEGY:\n",
-    "    1. Split input into num_workers chunks\n",
-    "    2. Each worker processes its chunk independently\n",
-    "    3. Apply ReLU: max(0, x) to each chunk\n",
-    "    4. Combine results preserving original order\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    x = Tensor(np.random.randn(1000))\n",
-    "    y = parallel_relu(x, num_workers=4)\n",
-    "    # Processes data using 4 parallel workers\n",
-    "    ```\n",
-    "    \n",
-    "    PERFORMANCE CONSIDERATIONS:\n",
-    "    - Overhead of parallel processing may not be worth it for small arrays\n",
-    "    - Threading vs multiprocessing: threads share memory but contend for the GIL in pure-Python code; processes avoid the GIL but must copy data\n",
-    "    - Chunk size should be large enough to amortize overhead\n",
-    "    - Consider memory bandwidth limitations\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - Frameworks such as PyTorch apply the same split-process-combine idea to work in parallel\n",
-    "    - GPUs naturally do this with thousands of parallel threads\n",
-    "    - Modern deep learning frameworks heavily use parallelism\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    from concurrent.futures import ThreadPoolExecutor\n",
-    "    \n",
-    "    # Extract numpy array\n",
-    "    x_data = x.data if hasattr(x, 'data') else x\n",
-    "    \n",
-    "    # For small arrays, parallel processing isn't worth the overhead\n",
-    "    if x_data.size < 1000:\n",
-    "        return Tensor(np.maximum(0, x_data))\n",
-    "    \n",
-    "    # Split array into at most num_workers chunks (ceiling division)\n",
-    "    chunk_size = max(1, (x_data.size + num_workers - 1) // num_workers)\n",
-    "    chunks = []\n",
-    "    flat_data = x_data.flatten()\n",
-    "    \n",
-    "    for i in range(0, len(flat_data), chunk_size):\n",
-    "        chunks.append(flat_data[i:i + chunk_size])\n",
-    "    \n",
-    "    # Worker function\n",
-    "    def relu_chunk(chunk):\n",
-    "        return np.maximum(0, chunk)\n",
-    "    \n",
-    "    # Process chunks in parallel\n",
-    "    with ThreadPoolExecutor(max_workers=num_workers) as executor:\n",
-    "        future_to_chunk = {executor.submit(relu_chunk, chunk): i for i, chunk in enumerate(chunks)}\n",
-    "        results = [None] * len(chunks)\n",
-    "        \n",
-    "        for future in future_to_chunk:\n",
-    "            chunk_idx = future_to_chunk[future]\n",
-    "            results[chunk_idx] = future.result()\n",
-    "    \n",
-    "    # Combine results\n",
-    "    combined_result = np.concatenate(results)\n",
-    "    \n",
-    "    # Reshape back to original shape\n",
-    "    result = combined_result.reshape(x_data.shape)\n",
-    "    \n",
-    "    return Tensor(result)\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "bb01b723",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "parallel-batch-processing",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def parallel_batch_processing(batch_data: List[Tensor], operation: Callable, num_workers: int = 4) -> List[Tensor]:\n",
-    "    \"\"\"\n",
-    "    Process a batch of tensors in parallel using multiple workers.\n",
-    "    \n",
-    "    This function demonstrates how to parallelize operations across\n",
-    "    multiple data samples, similar to how modern ML frameworks work.\n",
-    "    \n",
-    "    TODO: Implement parallel batch processing.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Take a list of Tensors and an operation function\n",
-    "    2. Use ThreadPoolExecutor to process multiple tensors simultaneously\n",
-    "    3. Apply the operation to each tensor in parallel\n",
-    "    4. Return list of results in original order\n",
-    "    \n",
-    "    PARALLELIZATION STRATEGY:\n",
-    "    1. Each worker processes one tensor at a time\n",
-    "    2. Multiple workers can process different tensors simultaneously\n",
-    "    3. Preserve order of results to match input order\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    batch = [Tensor(np.random.randn(100, 100)) for _ in range(8)]\n",
-    "    relu_op = lambda x: vectorized_relu(x)\n",
-    "    results = parallel_batch_processing(batch, relu_op, num_workers=4)\n",
-    "    # Processes 8 tensors using 4 parallel workers\n",
-    "    ```\n",
-    "    \n",
-    "    PERFORMANCE CONSIDERATIONS:\n",
-    "    - Each tensor should be large enough to justify parallel overhead\n",
-    "    - Balance number of workers with available CPU cores\n",
-    "    - Consider memory usage with multiple workers\n",
-    "    - Thread vs process pool trade-offs\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - PyTorch's DataLoader uses the same idea, with worker processes loading samples in parallel\n",
-    "    - Similar to how GPUs process multiple samples simultaneously\n",
-    "    - Foundation for distributed training across multiple nodes\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    from concurrent.futures import ThreadPoolExecutor\n",
-    "    \n",
-    "    # For small batches, parallel processing might not be worth it\n",
-    "    if len(batch_data) < num_workers:\n",
-    "        return [operation(tensor) for tensor in batch_data]\n",
-    "    \n",
-    "    # Process batch in parallel\n",
-    "    with ThreadPoolExecutor(max_workers=num_workers) as executor:\n",
-    "        # Submit all tasks\n",
-    "        future_to_index = {executor.submit(operation, tensor): i for i, tensor in enumerate(batch_data)}\n",
-    "        \n",
-    "        # Collect results in original order\n",
-    "        results = [None] * len(batch_data)\n",
-    "        for future in future_to_index:\n",
-    "            index = future_to_index[future]\n",
-    "            results[index] = future.result()\n",
-    "    \n",
-    "    return results\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ed80f5f2",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-parallel-processing",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "### 🧪 Unit Test: Parallel Processing\n",
-    "\n",
-    "def test_parallel_processing():\n",
-    "    \"\"\"Test parallel processing implementations.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Parallel Processing...\")\n",
-    "    \n",
-    "    # Test parallel ReLU\n",
-    "    x = Tensor(np.array([-2, -1, 0, 1, 2]))\n",
-    "    y = parallel_relu(x, num_workers=2)\n",
-    "    expected = [0, 0, 0, 1, 2]\n",
-    "    \n",
-    "    assert np.allclose(y.data, expected), f\"Expected {expected}, got {y.data}\"\n",
-    "    print(\"✅ Parallel ReLU works\")\n",
-    "    \n",
-    "    # Test parallel ReLU with larger data\n",
-    "    x_large = Tensor(np.random.randn(2000))\n",
-    "    y_large = parallel_relu(x_large, num_workers=4)\n",
-    "    y_sequential = vectorized_relu(x_large)\n",
-    "    \n",
-    "    assert np.allclose(y_large.data, y_sequential.data), \\\n",
-    "        \"Parallel ReLU differs from sequential version\"\n",
-    "    print(\"✅ Parallel ReLU matches sequential version\")\n",
-    "    \n",
-    "    # Test parallel batch processing\n",
-    "    batch = [Tensor(np.random.randn(100)) for _ in range(8)]\n",
-    "    relu_op = vectorized_relu  # pass the kernel directly; no lambda wrapper needed\n",
-    "    \n",
-    "    results_parallel = parallel_batch_processing(batch, relu_op, num_workers=4)\n",
-    "    results_sequential = [relu_op(tensor) for tensor in batch]\n",
-    "    \n",
-    "    assert len(results_parallel) == len(results_sequential), \\\n",
-    "        f\"Expected {len(results_sequential)} results, got {len(results_parallel)}\"\n",
-    "    \n",
-    "    for i, (parallel, sequential) in enumerate(zip(results_parallel, results_sequential)):\n",
-    "        assert np.allclose(parallel.data, sequential.data), \\\n",
-    "            f\"Batch item {i}: parallel differs from sequential\"\n",
-    "    \n",
-    "    print(\"✅ Parallel batch processing works\")\n",
-    "    print(\"📈 Progress: Parallel Processing ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_parallel_processing()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "eeb8fe83",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 5: Simple Performance Measurement - Timing Your Kernels\n",
-    "\n",
-    "### Why Timing Matters\n",
-    "> \"Premature optimization is the root of all evil\" - Donald Knuth\n",
-    "\n",
-    "But **measured optimization** based on simple timing is essential for understanding kernel performance.\n",
-    "\n",
-    "### What We'll Measure\n",
-    "1. **Execution time**: How long does each kernel take?\n",
-    "2. **Relative performance**: Which implementation is faster?\n",
-    "3. **Scale effects**: How does performance change with data size?\n",
-    "4. **Optimization impact**: Did our changes actually help?\n",
-    "\n",
-    "### The Simple Timing Process\n",
-    "1. **Measure baseline**: Time the standard implementation\n",
-    "2. **Time optimizations**: Measure your improved versions\n",
-    "3. **Compare results**: See which is faster\n",
-    "4. **Verify correctness**: Ensure optimized code produces correct results\n",
-    "\n",
-    "### Our Simple Timing Tool\n",
-    "We use `time.perf_counter()` for microsecond-precision timing:\n",
-    "- **Precise**: Measures actual execution time\n",
-    "- **Simple**: Easy to understand and use\n",
-    "- **Realistic**: Shows kernel performance at the right scale\n",
-    "- **Educational**: Immediate feedback on optimization impact\n",
-    "\n",
-    "### Real-World Context\n",
-    "- **Kernel operations**: Typically take 10-1000 microseconds\n",
-    "- **Optimization impact**: Good kernels are often 2-10x faster than a naive implementation\n",
-    "- **Professional tools**: Production systems use sophisticated profilers\n",
-    "- **Foundation**: Simple timing teaches measurement principles"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "9dbac851",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-profiling",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "### 🧪 Unit Test: Simple Kernel Timing\n",
-    "\n",
-    "def test_simple_kernel_timing():\n",
-    "    \"\"\"Test simple kernel timing capabilities.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Simple Kernel Timing...\")\n",
-    "    \n",
-    "    # Test timing different matrix multiplication methods\n",
-    "    np.random.seed(42)\n",
-    "    A = Tensor(np.random.randn(100, 100))\n",
-    "    B = Tensor(np.random.randn(100, 100))\n",
-    "    \n",
-    "    # Time NumPy matmul\n",
-    "    result_numpy, time_numpy = time_kernel(lambda: Tensor(np.dot(A.data, B.data)))\n",
-    "    print(f\"🔍 NumPy matmul: {time_numpy:.1f} μs\")\n",
-    "    \n",
-    "    # Time baseline matmul  \n",
-    "    result_baseline, time_baseline = time_kernel(matmul_baseline, A, B)\n",
-    "    print(f\"🔍 Baseline matmul: {time_baseline:.1f} μs\")\n",
-    "    \n",
-    "    # Time cache-friendly matmul\n",
-    "    result_cache, time_cache = time_kernel(cache_friendly_matmul, A, B, 16)\n",
-    "    print(f\"🔍 Cache-friendly matmul: {time_cache:.1f} μs\")\n",
-    "    \n",
-    "    # Verify results are similar\n",
-    "    assert np.allclose(result_numpy.data, result_baseline.data, rtol=1e-4), \\\n",
-    "        \"NumPy and baseline results differ\"\n",
-    "    assert np.allclose(result_numpy.data, result_cache.data, rtol=1e-2), \\\n",
-    "        \"NumPy and cache-friendly results differ\"\n",
-    "    \n",
-    "    print(\"✅ All matrix multiplication methods produce correct results\")\n",
-    "    \n",
-    "    # Test timing parallel vs sequential ReLU\n",
-    "    x_large = Tensor(np.random.randn(10000))\n",
-    "    \n",
-    "    result_seq, time_seq = time_kernel(vectorized_relu, x_large)\n",
-    "    result_par, time_par = time_kernel(parallel_relu, x_large, 4)\n",
-    "    \n",
-    "    print(f\"🔍 Sequential ReLU: {time_seq:.1f} μs\")\n",
-    "    print(f\"🔍 Parallel ReLU: {time_par:.1f} μs\")\n",
-    "    \n",
-    "    # Verify results are the same\n",
-    "    assert np.allclose(result_seq.data, result_par.data), \\\n",
-    "        \"Sequential and parallel ReLU results differ\"\n",
-    "    \n",
-    "    print(\"✅ Simple timing works correctly\")\n",
-    "    print(\"📈 Progress: Simple Kernel Timing ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_simple_kernel_timing()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0dc7731d",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 6: Compressed Model Kernels - Optimizing Quantized Operations\n",
-    "\n",
-    "### Why Compressed Model Kernels?\n",
-    "Modern deployment requires smaller, faster models:\n",
-    "- **Mobile devices**: Limited compute and memory\n",
-    "- **Edge computing**: Real-time inference requirements\n",
-    "- **Cloud costs**: Reduce computational expenses\n",
-    "- **Energy efficiency**: Lower power consumption\n",
-    "\n",
-    "### Types of Model Compression\n",
-    "1. **Quantization**: Reduce precision (float32 → int8)\n",
-    "2. **Pruning**: Remove unimportant weights\n",
-    "3. **Knowledge distillation**: Train smaller models\n",
-    "4. **Low-rank approximation**: Factorize weight matrices\n",
-    "\n",
-    "### Quantization Fundamentals\n",
-    "```python\n",
-    "# Original: 32-bit floating point\n",
-    "weights_fp32 = np.array([1.234, -0.567, 2.891])\n",
-    "\n",
-    "# Quantized: 8-bit integer\n",
-    "scale = np.max(np.abs(weights_fp32)) / 127  # symmetric: use the largest magnitude\n",
-    "weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)\n",
-    "\n",
-    "# Dequantized for computation\n",
-    "weights_dequant = weights_int8 * scale\n",
-    "```\n",
-    "\n",
-    "### Why Custom Kernels for Compression?\n",
-    "- **Integer arithmetic**: Faster than floating-point on many devices\n",
-    "- **Memory bandwidth**: 4x less data to transfer\n",
-    "- **Specialized instructions**: CPUs have optimized int8 operations\n",
-    "- **Accumulation**: Need to handle precision carefully\n",
-    "\n",
-    "### Real-World Context\n",
-    "- **TensorFlow Lite**: Quantized inference kernels\n",
-    "- **PyTorch Mobile**: Optimized int8 operations\n",
-    "- **ONNX Runtime**: Hardware-specific quantized kernels\n",
-    "- **Hardware accelerators**: TPUs, Neural Processing Units"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "57ebf28f",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "quantized-matmul",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def quantized_matmul(A: Tensor, B: Tensor, scale_A: float = 1.0, scale_B: float = 1.0) -> Tensor:\n",
-    "    \"\"\"\n",
-    "    Quantized matrix multiplication kernel for compressed models.\n",
-    "    \n",
-    "    This function demonstrates how to perform matrix multiplication\n",
-    "    with quantized (int8) weights while maintaining numerical accuracy.\n",
-    "    \n",
-    "    TODO: Implement quantized matrix multiplication.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Extract numpy arrays from Tensors\n",
-    "    2. Quantize inputs to int8 using provided scales\n",
-    "    3. Perform integer matrix multiplication\n",
-    "    4. Rescale result back to appropriate range\n",
-    "    5. Return result as Tensor\n",
-    "    \n",
-    "    QUANTIZATION PROCESS:\n",
-    "    1. Quantize: int8_value = round(float_value / scale)\n",
-    "    2. Compute: int32_result = int8_A @ int8_B (accumulated in int32)\n",
-    "    3. Rescale: float_result = int32_result * scale_A * scale_B\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    A = Tensor([[1.0, 2.0], [3.0, 4.0]])\n",
-    "    B = Tensor([[0.5, 1.5], [2.5, 3.5]])\n",
-    "    C = quantized_matmul(A, B, scale_A=4.0/127, scale_B=3.5/127)  # scale = max|value| / 127\n",
-    "    # Should approximate regular matrix multiplication\n",
-    "    ```\n",
-    "    \n",
-    "    PERFORMANCE CONSIDERATIONS:\n",
-    "    - int8 operations are often faster than float32\n",
-    "    - Memory usage is 4x lower\n",
-    "    - Accumulation in int32 to prevent overflow\n",
-    "    - Careful handling of scales to maintain precision\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - This is how TensorFlow Lite performs quantized inference\n",
-    "    - Similar to how mobile ML accelerators work\n",
-    "    - Foundation for edge deployment of neural networks\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Extract numpy arrays\n",
-    "    A_data = A.data if hasattr(A, 'data') else A\n",
-    "    B_data = B.data if hasattr(B, 'data') else B\n",
-    "    \n",
-    "    # Quantize inputs to int8, saturating so out-of-range values do not wrap around\n",
-    "    A_int8 = np.clip(np.round(A_data / scale_A), -128, 127).astype(np.int8)\n",
-    "    B_int8 = np.clip(np.round(B_data / scale_B), -128, 127).astype(np.int8)\n",
-    "    \n",
-    "    # Perform integer matrix multiplication\n",
-    "    # Use int32 for accumulation to prevent overflow\n",
-    "    C_int32 = np.dot(A_int8.astype(np.int32), B_int8.astype(np.int32))\n",
-    "    \n",
-    "    # Rescale result back to float\n",
-    "    C_float = C_int32 * scale_A * scale_B\n",
-    "    \n",
-    "    return Tensor(C_float)\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "2739845e",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "quantized-relu",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "def quantized_relu(x: Tensor, scale: float = 1.0) -> Tensor:\n",
-    "    \"\"\"\n",
-    "    Quantized ReLU implementation for compressed models.\n",
-    "    \n",
-    "    This function shows how to apply ReLU activation to quantized values\n",
-    "    while maintaining the quantization format.\n",
-    "    \n",
-    "    TODO: Implement quantized ReLU activation.\n",
-    "    \n",
-    "    STEP-BY-STEP IMPLEMENTATION:\n",
-    "    1. Extract numpy array from Tensor\n",
-    "    2. Quantize input to int8 using provided scale\n",
-    "    3. Apply ReLU in integer domain: max(0, x)\n",
-    "    4. Keep result in int8 format (ReLU does not change the scale)\n",
-    "    5. Convert back to float using scale\n",
-    "    6. Return result as Tensor\n",
-    "    \n",
-    "    QUANTIZED RELU PROCESS:\n",
-    "    1. Quantize: int8_value = round(float_value / scale)\n",
-    "    2. Apply ReLU: int8_result = max(0, int8_value)\n",
-    "    3. Dequantize: float_result = int8_result * scale\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    ```python\n",
-    "    x = Tensor([-1.0, 0.0, 1.0, 2.0])\n",
-    "    y = quantized_relu(x, scale=2.0/127)  # scale = max|value| / 127\n",
-    "    # Should produce [0.0, 0.0, 1.0, 2.0] (approximately)\n",
-    "    ```\n",
-    "    \n",
-    "    OPTIMIZATION NOTES:\n",
-    "    - ReLU in int8 is just max(0, x) - very fast\n",
-    "    - No floating-point operations needed during activation\n",
-    "    - Maintains quantization format throughout\n",
-    "    - Can be vectorized efficiently\n",
-    "    \n",
-    "    LEARNING CONNECTIONS:\n",
-    "    - This is how quantized neural networks maintain speed\n",
-    "    - Similar to how mobile processors optimize ML inference\n",
-    "    - Foundation for real-time edge computing applications\n",
-    "    \"\"\"\n",
-    "    ### BEGIN SOLUTION\n",
-    "    # Extract numpy array\n",
-    "    x_data = x.data if hasattr(x, 'data') else x\n",
-    "    \n",
-    "    # Quantize input to int8, saturating so out-of-range values do not wrap around\n",
-    "    x_int8 = np.clip(np.round(x_data / scale), -128, 127).astype(np.int8)\n",
-    "    \n",
-    "    # Apply ReLU in integer domain\n",
-    "    x_relu_int8 = np.maximum(0, x_int8)\n",
-    "    \n",
-    "    # Convert back to float\n",
-    "    x_relu_float = x_relu_int8 * scale\n",
-    "    \n",
-    "    return Tensor(x_relu_float)\n",
-    "    ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "237c2d77",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-compressed-kernels",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "### 🧪 Unit Test: Compressed Model Kernels\n",
-    "\n",
-    "def test_compressed_kernels():\n",
-    "    \"\"\"Test compressed model kernel implementations.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Compressed Model Kernels...\")\n",
-    "    \n",
-    "    # Test quantized matrix multiplication\n",
-    "    A = Tensor([[1.0, 2.0], [3.0, 4.0]])\n",
-    "    B = Tensor([[0.5, 1.5], [2.5, 3.5]])\n",
-    "    \n",
-    "    # Regular matrix multiplication\n",
-    "    C_regular = matmul_baseline(A, B)\n",
-    "    \n",
-    "    # Quantized matrix multiplication\n",
-    "    # Use larger scales to prevent int8 overflow\n",
-    "    scale_A = 1.0 / 20  # Max value 4.0 / (1/20) = 80, fits in int8\n",
-    "    scale_B = 1.0 / 20  # Max value 3.5 / (1/20) = 70, fits in int8\n",
-    "    C_quantized = quantized_matmul(A, B, scale_A, scale_B)\n",
-    "    \n",
-    "    # Should be approximately equal (some quantization error expected)\n",
-    "    assert np.allclose(C_regular.data, C_quantized.data, rtol=0.1), \\\n",
-    "        f\"Regular: {C_regular.data}, Quantized: {C_quantized.data}\"\n",
-    "    print(\"✅ Quantized matrix multiplication works\")\n",
-    "    \n",
-    "    # Test quantized ReLU\n",
-    "    x = Tensor([-2.0, -1.0, 0.0, 1.0, 2.0])\n",
-    "    \n",
-    "    # Regular ReLU\n",
-    "    y_regular = vectorized_relu(x)\n",
-    "    \n",
-    "    # Quantized ReLU\n",
-    "    # Use larger scale to prevent int8 overflow\n",
-    "    scale = 1.0 / 50  # Max value 2.0 / (1/50) = 100, fits in int8\n",
-    "    y_quantized = quantized_relu(x, scale)\n",
-    "    \n",
-    "    # Should be approximately equal\n",
-    "    assert np.allclose(y_regular.data, y_quantized.data, rtol=0.1), \\\n",
-    "        f\"Regular: {y_regular.data}, Quantized: {y_quantized.data}\"\n",
-    "    print(\"✅ Quantized ReLU works\")\n",
-    "    \n",
-    "    # Test that quantized operations can be timed\n",
-    "    # This shows the performance characteristics of quantized vs regular operations\n",
-    "    x_large = Tensor(np.random.randn(1000))\n",
-    "    \n",
-    "    # Time regular ReLU\n",
-    "    _, time_regular = time_kernel(vectorized_relu, x_large)\n",
-    "    \n",
-    "    # Time quantized ReLU\n",
-    "    _, time_quantized = time_kernel(quantized_relu, x_large, 1.0/127)\n",
-    "    \n",
-    "    print(f\"🔍 Regular ReLU: {time_regular:.1f} μs\")\n",
-    "    print(f\"🔍 Quantized ReLU: {time_quantized:.1f} μs\")\n",
-    "    \n",
-    "    print(\"✅ Quantized operations timing works\")\n",
-    "    print(\"📈 Progress: Compressed Model Kernels ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_compressed_kernels()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "d5f4b397",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "final-performance-test",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "### 🧪 Unit Test: Comprehensive Kernel Performance Comparison\n",
-    "\n",
-    "def final_performance_test():\n",
-    "    \"\"\"Comprehensive performance test of all implemented kernels.\"\"\"\n",
-    "    print(\"🔬 Final Performance Test: Comprehensive Kernel Comparison\")\n",
-    "    print(\"=\" * 60)\n",
-    "    \n",
-    "    # Create test data\n",
-    "    np.random.seed(42)\n",
-    "    A = Tensor(np.random.randn(256, 256))\n",
-    "    B = Tensor(np.random.randn(256, 256))\n",
-    "    x = Tensor(np.random.randn(10000))\n",
-    "    \n",
-    "    print(\"\\n📊 Matrix Multiplication Performance:\")\n",
-    "    print(\"-\" * 40)\n",
-    "    \n",
-    "    # Test different matrix multiplication methods\n",
-    "    methods = [\n",
-    "        (\"NumPy\", lambda: Tensor(np.dot(A.data, B.data))),\n",
-    "        (\"Baseline\", lambda: matmul_baseline(A, B)),\n",
-    "        (\"Cache-friendly\", lambda: cache_friendly_matmul(A, B, 32)),\n",
-    "        (\"Quantized\", lambda: quantized_matmul(A, B, 1.0/127, 1.0/127))\n",
-    "    ]\n",
-    "    \n",
-    "    results = {}\n",
-    "    for name, method in methods:\n",
-    "        result, time_us = time_kernel(method)\n",
-    "        results[name] = (result, time_us)\n",
-    "        print(f\"{name:15}: {time_us:.1f} μs\")\n",
-    "    \n",
-    "    print(\"\\n📊 ReLU Activation Performance:\")\n",
-    "    print(\"-\" * 40)\n",
-    "    \n",
-    "    # Test different ReLU methods\n",
-    "    relu_methods = [\n",
-    "        (\"Vectorized\", lambda: vectorized_relu(x)),\n",
-    "        (\"Parallel\", lambda: parallel_relu(x, 4)),\n",
-    "        (\"Quantized\", lambda: quantized_relu(x, 1.0/127))\n",
-    "    ]\n",
-    "    \n",
-    "    relu_results = {}\n",
-    "    for name, method in relu_methods:\n",
-    "        result, time_us = time_kernel(method)\n",
-    "        relu_results[name] = (result, time_us)\n",
-    "        print(f\"{name:15}: {time_us:.1f} μs\")\n",
-    "    \n",
-    "    print(\"\\n✅ All kernels implemented successfully!\")\n",
-    "    print(\"📈 Progress: Complete Kernels Module ✓\")\n",
-    "    \n",
-    "    # Verify correctness\n",
-    "    print(\"\\n🔍 Correctness Verification:\")\n",
-    "    print(\"-\" * 40)\n",
-    "    \n",
-    "    # Check that all matrix multiplication methods produce similar results\n",
-    "    base_result = results[\"NumPy\"][0]\n",
-    "    for name, (result, _) in results.items():\n",
-    "        if name != \"NumPy\":\n",
-    "            if name == \"Quantized\":\n",
-    "                # Skip comparison: with scale 1/127, inputs beyond about ±1 exceed the int8 range (validated separately above)\n",
-    "                print(f\"⚠️  Skipping {name} comparison (quantization errors expected)\")\n",
-    "            else:\n",
-    "                assert np.allclose(base_result.data, result.data, rtol=1e-2), \\\n",
-    "                    f\"{name} differs from NumPy\"\n",
-    "    \n",
-    "    # Check that all ReLU methods produce similar results\n",
-    "    base_relu = relu_results[\"Vectorized\"][0]\n",
-    "    for name, (result, _) in relu_results.items():\n",
-    "        if name != \"Vectorized\":\n",
-    "            if name == \"Quantized\":\n",
-    "                # Skip comparison: with scale 1/127, inputs beyond about ±1 exceed the int8 range (validated separately above)\n",
-    "                print(f\"⚠️  Skipping {name} ReLU comparison (quantization errors expected)\")\n",
-    "            else:\n",
-    "                assert np.allclose(base_relu.data, result.data, rtol=1e-4), \\\n",
-    "                    f\"{name} ReLU differs from vectorized\"\n",
-    "    \n",
-    "    print(\"✅ All implementations produce correct results!\")\n",
-    "    \n",
-    "    print(\"\\n🎉 CONGRATULATIONS! 🎉\")\n",
-    "    print(\"You've successfully implemented hardware-optimized ML kernels!\")\n",
-    "    print(\"You now understand the performance optimizations that power modern AI frameworks.\")\n",
-    "\n",
-    "# Run the final test\n",
-    "final_performance_test()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "46930f8c",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧪 Module Testing\n",
-    "\n",
-    "Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
-    "\n",
-    "**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f3d32aac",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "standardized-testing",
-     "locked": true,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# =============================================================================\n",
-    "# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
-    "# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
-    "# =============================================================================\n",
-    "\n",
-    "if __name__ == \"__main__\":\n",
-    "    from tito.tools.testing import run_module_tests_auto\n",
-    "    \n",
-    "    # Automatically discover and run all tests in this module\n",
-    "    success = run_module_tests_auto(\"Kernels\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f9d5ae11",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🎯 Module Summary: Hardware-Optimized ML Operations\n",
-    "\n",
-    "### What You've Built\n",
-    "You've implemented a complete set of hardware-optimized ML kernels:\n",
-    "\n",
-    "1. **Custom Operations**: Specialized matrix multiplication beyond NumPy\n",
-    "2. **Vectorized Operations**: SIMD-optimized ReLU and element-wise operations\n",
-    "3. **Cache-Friendly Algorithms**: Blocked matrix multiplication for better memory access\n",
-    "4. **Parallel Processing**: Multi-core CPU utilization for large operations\n",
-    "5. **Performance Profiling**: Tools to measure and optimize kernel performance\n",
-    "6. **Compressed Kernels**: Quantized operations for mobile deployment\n",
-    "\n",
-    "### Key Insights\n",
-    "- **Specialization beats generalization**: Kernels tuned to a specific workload and hardware can outperform generic implementations\n",
-    "- **Memory is the bottleneck**: Cache-friendly algorithms are crucial\n",
-    "- **Parallelism is everywhere**: From SIMD to multi-core to GPU-style processing\n",
-    "- **Measurement drives optimization**: Profile first, optimize second\n",
-    "- **Compression enables deployment**: Quantized models use less memory and can run faster on hardware with int8 support\n",
-    "\n",
-    "### Real-World Connections\n",
-    "- **PyTorch**: Uses thousands of optimized kernels for speed\n",
-    "- **TensorFlow**: XLA compiler generates specialized kernels\n",
-    "- **Mobile ML**: Quantized kernels enable edge deployment\n",
-    "- **Cloud computing**: Kernel optimization reduces server costs\n",
-    "- **Research**: Custom kernels enable larger models and faster experimentation\n",
-    "\n",
-    "### Next Steps\n",
-    "In real ML systems, you'd:\n",
-    "1. **GPU kernels**: Implement CUDA/OpenCL versions\n",
-    "2. **Auto-tuning**: Automatically find optimal parameters\n",
-    "3. **Hardware specialization**: Optimize for specific processors\n",
-    "4. **Kernel fusion**: Combine multiple operations into single kernels\n",
-    "5. **Distributed computing**: Scale kernels across multiple machines\n",
-    "\n",
-    "### 🏆 Achievement Unlocked\n",
-    "You've mastered the performance optimization techniques that power modern ML frameworks. You understand how to move beyond high-level libraries to extract maximum performance from hardware!\n",
-    "\n",
-    "**You've completed the TinyTorch Kernels module!** 🎉"
-   ]
-  }
- ],
- "metadata": {
-  "jupytext": {
-   "main_language": "python"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/modules/source/14_benchmarking/benchmarking_dev.ipynb b/modules/source/14_benchmarking/benchmarking_dev.ipynb
deleted file mode 100644
index 13291988..00000000
--- a/modules/source/14_benchmarking/benchmarking_dev.ipynb
+++ /dev/null
@@ -1,1640 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "fb08dda3",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "# Benchmarking - Systematic ML Performance Evaluation\n",
-    "\n",
-    "Welcome to the Benchmarking module! This is where we learn to systematically evaluate ML systems using industry-standard methodology inspired by MLPerf.\n",
-    "\n",
-    "## Learning Goals\n",
-    "- Understand the four-component MLPerf benchmarking architecture\n",
-    "- Implement different benchmark scenarios (latency, throughput, offline)\n",
-    "- Apply statistical validation for meaningful results\n",
-    "- Create professional performance reports for ML projects\n",
-    "- Learn to avoid common benchmarking pitfalls\n",
-    "\n",
-    "## Build → Use → Analyze\n",
-    "1. **Build**: Benchmarking framework with proper statistical validation\n",
-    "2. **Use**: Apply systematic evaluation to your TinyTorch models\n",
-    "3. **Analyze**: Generate professional reports with statistical confidence"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "bc43e634",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "benchmarking-imports",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| default_exp core.benchmarking\n",
-    "\n",
-    "#| export\n",
-    "import numpy as np\n",
-    "import matplotlib.pyplot as plt\n",
-    "import time\n",
-    "import statistics\n",
-    "import json\n",
-    "import math\n",
-    "from typing import Dict, List, Tuple, Optional, Any, Callable\n",
-    "from dataclasses import dataclass\n",
-    "from enum import Enum\n",
-    "import os\n",
-    "import sys\n",
-    "\n",
-    "# Import our TinyTorch dependencies\n",
-    "try:\n",
-    "    from tinytorch.core.tensor import Tensor\n",
-    "    from tinytorch.core.networks import Sequential\n",
-    "    from tinytorch.core.layers import Dense\n",
-    "    from tinytorch.core.activations import ReLU, Softmax\n",
-    "    from tinytorch.core.dataloader import DataLoader\n",
-    "except ImportError:\n",
-    "    # For development, import from local modules\n",
-    "    parent_dirs = [\n",
-    "        os.path.join(os.path.dirname(__file__), '..', '01_tensor'),\n",
-    "        os.path.join(os.path.dirname(__file__), '..', '03_layers'),\n",
-    "        os.path.join(os.path.dirname(__file__), '..', '02_activations'),\n",
-    "        os.path.join(os.path.dirname(__file__), '..', '04_networks'),\n",
-    "        os.path.join(os.path.dirname(__file__), '..', '06_dataloader')\n",
-    "    ]\n",
-    "    for path in parent_dirs:\n",
-    "        if path not in sys.path:\n",
-    "            sys.path.append(path)\n",
-    "    \n",
-    "    try:\n",
-    "        from tensor_dev import Tensor\n",
-    "        from networks_dev import Sequential\n",
-    "        from layers_dev import Dense\n",
-    "        from activations_dev import ReLU, Softmax\n",
-    "        from dataloader_dev import DataLoader\n",
-    "    except ImportError:\n",
-    "        # Fallback for missing modules\n",
-    "        print(\"⚠️  Some TinyTorch modules not available - using minimal implementations\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "a1f5c2b9",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "benchmarking-setup",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| hide\n",
-    "#| export\n",
-    "def _should_show_plots():\n",
-    "    \"\"\"Check if we should show plots (disable during testing)\"\"\"\n",
-    "    is_pytest = (\n",
-    "        'pytest' in sys.modules or\n",
-    "        'test' in sys.argv or\n",
-    "        os.environ.get('PYTEST_CURRENT_TEST') is not None or\n",
-    "        any('test' in arg for arg in sys.argv) or\n",
-    "        any('pytest' in arg for arg in sys.argv)\n",
-    "    )\n",
-    "    \n",
-    "    return not is_pytest"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "2ee51b19",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "benchmarking-welcome",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "print(\"📊 TinyTorch Benchmarking Module\")\n",
-    "print(f\"NumPy version: {np.__version__}\")\n",
-    "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
-    "print(\"Ready to build professional ML benchmarking tools!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2ea1d884",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 📦 Where This Code Lives in the Final Package\n",
-    "\n",
-    "**Learning Side:** You work in `modules/source/14_benchmarking/benchmarking_dev.py`  \n",
-    "**Building Side:** Code exports to `tinytorch.core.benchmarking`\n",
-    "\n",
-    "```python\n",
-    "# Final package structure:\n",
-    "from tinytorch.core.benchmarking import TinyTorchPerf, BenchmarkScenarios\n",
-    "from tinytorch.core.benchmarking import StatisticalValidator, PerformanceReporter\n",
-    "```\n",
-    "\n",
-    "**Why this matters:**\n",
-    "- **Learning:** Deep understanding of systematic evaluation\n",
-    "- **Production:** Professional benchmarking methodology\n",
-    "- **Projects:** Tools for validating your ML project performance\n",
-    "- **Career:** Industry-standard skills for ML engineering roles"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "962a0097",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## What is ML Benchmarking?\n",
-    "\n",
-    "### The Systematic Evaluation Problem\n",
-    "When you build ML systems, you need to answer critical questions:\n",
-    "- **Is my model actually better?** Statistical significance vs random variation\n",
-    "- **How does it perform in production?** Latency, throughput, resource usage\n",
-    "- **Which approach should I choose?** Systematic comparison methodology\n",
-    "- **Can I trust my results?** Avoiding common benchmarking pitfalls\n",
-    "\n",
-    "### The MLPerf Architecture\n",
-    "MLPerf (Machine Learning Performance) defines the industry standard for ML benchmarking:\n",
-    "\n",
-    "```\n",
-    "┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐\n",
-    "│  Load Generator │───▶│ System Under    │───▶│    Dataset      │\n",
-    "│   (Controls     │    │ Test (Your ML   │    │ (Standardized   │\n",
-    "│    Queries)     │    │    Model)       │    │  Evaluation)    │\n",
-    "└─────────────────┘    └─────────────────┘    └─────────────────┘\n",
-    "```\n",
-    "\n",
-    "### The Four Components\n",
-    "1. **System Under Test (SUT)**: Your ML model/system being evaluated\n",
-    "2. **Dataset**: Standardized evaluation data (CIFAR-10, ImageNet, etc.)\n",
-    "3. **Model**: The specific architecture and weights being tested\n",
-    "4. **Load Generator**: Controls how evaluation queries are sent to the SUT\n",
-    "\n",
-    "### Why This Matters\n",
-    "- **Reproducibility**: Others can verify your results\n",
-    "- **Comparability**: Fair comparison between different approaches\n",
-    "- **Statistical validity**: Meaningful conclusions from your data\n",
-    "- **Industry standards**: Skills you'll use in ML engineering careers\n",
-    "\n",
-    "### Real-World Examples\n",
-    "- **Google**: Uses similar patterns for production ML system evaluation\n",
-    "- **Meta**: A/B testing frameworks follow these principles\n",
-    "- **OpenAI**: GPT model comparisons use systematic benchmarking\n",
-    "- **Research**: Reviewers at major ML conferences expect sound, reproducible evaluation methodology"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "8494c11e",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## Step 1: Benchmark Scenarios - How to Measure Performance\n",
-    "\n",
-    "### The Three Standard Scenarios\n",
-    "Different use cases require different performance measurements:\n",
-    "\n",
-    "#### 1. Single-Stream Scenario\n",
-    "- **Use case**: Mobile/edge inference, interactive applications\n",
-    "- **Pattern**: Send next query only after previous completes\n",
-    "- **Metric**: 90th percentile latency (tail latency)\n",
-    "- **Why**: Users care about worst-case response time\n",
-    "\n",
-    "#### 2. Server Scenario  \n",
-    "- **Use case**: Production web services, API endpoints\n",
-    "- **Pattern**: Queries arrive at random times following a Poisson process\n",
-    "- **Metric**: Queries per second (QPS) at acceptable latency\n",
-    "- **Why**: Servers handle multiple simultaneous requests\n",
-    "\n",
-    "#### 3. Offline Scenario\n",
-    "- **Use case**: Batch processing, data center workloads\n",
-    "- **Pattern**: Send all samples at once for batch processing\n",
-    "- **Metric**: Throughput (samples per second)\n",
-    "- **Why**: Batch jobs care about total processing time\n",
-    "\n",
-    "### Mathematical Foundation\n",
-    "Each scenario tests different aspects:\n",
-    "- **Latency**: Time for single sample = f(model_complexity, hardware)\n",
-    "- **Throughput**: Samples per second = f(parallelism, batch_size)\n",
-    "- **Efficiency**: Resource utilization = f(memory, compute, bandwidth)\n",
-    "\n",
-    "### Why Multiple Scenarios?\n",
-    "Real ML systems have different requirements:\n",
-    "- **Chatbot**: Low latency for good user experience\n",
-    "- **Image API**: High throughput for many concurrent users  \n",
-    "- **Data pipeline**: Maximum batch processing efficiency"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0419cb50",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 2: Statistical Validation - Ensuring Meaningful Results\n",
-    "\n",
-    "### The Significance Problem\n",
-    "Common benchmarking mistakes:\n",
-    "```python\n",
-    "# BAD: Single run, no statistical validation\n",
-    "result_a = model_a.run_once()  # 94.2% accuracy\n",
-    "result_b = model_b.run_once()  # 94.7% accuracy\n",
-    "print(\"Model B is better!\")  # Maybe, maybe not...\n",
-    "```\n",
-    "\n",
-    "### The MLPerf Solution\n",
-    "Proper statistical validation:\n",
-    "```python\n",
-    "# GOOD: Multiple runs with confidence intervals\n",
-    "results_a = [model_a.run() for _ in range(10)]  # [93.8, 94.1, 94.3, ...]\n",
-    "results_b = [model_b.run() for _ in range(10)]  # [94.2, 94.5, 94.9, ...]\n",
-    "significance = statistical_test(results_a, results_b)\n",
-    "print(f\"Significant at p < 0.05: {significance.p_value < 0.05} (p={significance.p_value:.4f})\")\n",
-    "```\n",
-    "\n",
-    "### Key Statistical Concepts\n",
-    "- **Confidence intervals**: Range of likely true values\n",
-    "- **P-values**: Probability of seeing a difference at least this large if there were no true difference\n",
-    "- **Effect size**: Magnitude of improvement (not just significance)\n",
-    "- **Multiple comparisons**: Adjusting for testing many approaches\n",
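-    "\n",
-    "As a sketch of how these quantities relate for two groups of runs A and B, assuming roughly equal variances (the symbols $s_p$ and $\\mathrm{SE}$ are notation introduced here for illustration):\n",
-    "\n",
-    "$$s_p = \\sqrt{\\frac{(n_A - 1)s_A^2 + (n_B - 1)s_B^2}{n_A + n_B - 2}}, \\qquad \\mathrm{SE} = s_p\\sqrt{\\frac{1}{n_A} + \\frac{1}{n_B}}$$\n",
-    "\n",
-    "$$t = \\frac{\\bar{x}_A - \\bar{x}_B}{\\mathrm{SE}}, \\qquad d = \\frac{\\bar{x}_A - \\bar{x}_B}{s_p}, \\qquad \\mathrm{CI} = (\\bar{x}_A - \\bar{x}_B) \\pm z_{1-\\alpha/2}\\,\\mathrm{SE}$$\n",
-    "\n",
-    "Here $t$ is the test statistic behind the p-value, $d$ is Cohen's d (the effect size), and the interval uses a normal critical value ($z \\approx 1.96$ at 95% confidence), which is a reasonable approximation only for larger samples.\n",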
-    "\n",
-    "### Sample Size Calculation\n",
-    "MLPerf uses this formula for minimum samples:\n",
-    "```\n",
-    "n = [Φ^(-1)((1-C)/2)]^2 * p(1-p) / MOE^2\n",
-    "```\n",
-    "Where:\n",
-    "- C = confidence level (0.99)\n",
-    "- p = percentile (0.90 for 90th percentile)\n",
-    "- MOE = margin of error ((1-p)/20)\n",
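-    "\n",
-    "As a worked check of the formula with these values (a sketch; intermediate numbers are rounded):\n",
-    "\n",
-    "$$\\Phi^{-1}\\left(\\tfrac{1 - 0.99}{2}\\right) = \\Phi^{-1}(0.005) \\approx -2.576, \\qquad \\mathrm{MOE} = \\tfrac{1 - 0.90}{20} = 0.005$$\n",
-    "\n",
-    "$$n \\approx \\frac{(-2.576)^2 \\times 0.90 \\times 0.10}{0.005^2} = \\frac{6.635 \\times 0.09}{0.000025} \\approx 23{,}886$$\n",
-    "\n",
-    "MLPerf then rounds this count up to the next multiple of $2^{13} = 8{,}192$, which gives $3 \\times 8{,}192 = 24{,}576$.\n",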
-    "\n",
-    "For the 90th percentile with 99% confidence: **n = 24,576 samples** (the formula gives about 23,886, which is rounded up to a multiple of 2^13)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "eaa45f40",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "benchmark-scenarios",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class BenchmarkScenario(Enum):\n",
-    "    \"\"\"Standard benchmark scenarios from MLPerf\"\"\"\n",
-    "    SINGLE_STREAM = \"single_stream\"\n",
-    "    SERVER = \"server\"\n",
-    "    OFFLINE = \"offline\"\n",
-    "\n",
-    "@dataclass\n",
-    "class BenchmarkResult:\n",
-    "    \"\"\"Results from a benchmark run\"\"\"\n",
-    "    scenario: BenchmarkScenario\n",
-    "    latencies: List[float]  # All latency measurements in seconds\n",
-    "    throughput: float      # Samples per second\n",
-    "    accuracy: float        # Model accuracy (0-1)\n",
-    "    metadata: Optional[Dict[str, Any]] = None\n",
-    "\n",
-    "#| export\n",
-    "class BenchmarkScenarios:\n",
-    "    \"\"\"\n",
-    "    Implements the three standard MLPerf benchmark scenarios.\n",
-    "    \n",
-    "    TODO: Implement the three benchmark scenarios following MLPerf patterns.\n",
-    "    \n",
-    "    UNDERSTANDING THE SCENARIOS:\n",
-    "    1. Single-Stream: Send queries one at a time, measure latency\n",
-    "    2. Server: Send queries following Poisson distribution, measure QPS\n",
-    "    3. Offline: Send all queries at once, measure total throughput\n",
-    "    \n",
-    "    IMPLEMENTATION APPROACH:\n",
-    "    1. Each scenario should run the model multiple times\n",
-    "    2. Collect latency measurements for each run\n",
-    "    3. Calculate appropriate metrics for each scenario\n",
-    "    4. Return BenchmarkResult with all measurements\n",
-    "    \n",
-    "    EXAMPLE USAGE:\n",
-    "    scenarios = BenchmarkScenarios()\n",
-    "    result = scenarios.single_stream(model, dataset, num_queries=1000)\n",
-    "    print(f\"90th percentile latency: {result.latencies[int(0.9 * len(result.latencies))]} seconds\")\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self):\n",
-    "        self.results = []\n",
-    "    \n",
-    "    def single_stream(self, model: Callable, dataset: List, num_queries: int = 1000) -> BenchmarkResult:\n",
-    "        \"\"\"\n",
-    "        Run single-stream benchmark scenario.\n",
-    "        \n",
-    "        TODO: Implement single-stream benchmarking.\n",
-    "        \n",
-    "        STEP-BY-STEP:\n",
-    "        1. Initialize empty list for latencies\n",
-    "        2. For each query (up to num_queries):\n",
-    "           a. Get next sample from dataset (cycle if needed)\n",
-    "           b. Record start time\n",
-    "           c. Run model on sample\n",
-    "           d. Record end time\n",
-    "           e. Calculate latency = end - start\n",
-    "           f. Add latency to list\n",
-    "        3. Calculate throughput = num_queries / total_time\n",
-    "        4. Calculate accuracy if possible\n",
-    "        5. Return BenchmarkResult with SINGLE_STREAM scenario\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use time.perf_counter() for precise timing\n",
-    "        - Use dataset[i % len(dataset)] to cycle through samples\n",
-    "        - Sort latencies for percentile calculations\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        latencies = []\n",
-    "        correct_predictions = 0\n",
-    "        total_start_time = time.perf_counter()\n",
-    "        \n",
-    "        for i in range(num_queries):\n",
-    "            # Get sample (cycle through dataset)\n",
-    "            sample = dataset[i % len(dataset)]\n",
-    "            \n",
-    "            # Time the inference\n",
-    "            start_time = time.perf_counter()\n",
-    "            result = model(sample)\n",
-    "            end_time = time.perf_counter()\n",
-    "            \n",
-    "            latency = end_time - start_time\n",
-    "            latencies.append(latency)\n",
-    "            \n",
-    "            # Simple accuracy calculation (if possible)\n",
-    "            if hasattr(sample, 'target') and hasattr(result, 'data'):\n",
-    "                predicted = np.argmax(result.data)\n",
-    "                if predicted == sample.target:\n",
-    "                    correct_predictions += 1\n",
-    "        \n",
-    "        total_time = time.perf_counter() - total_start_time\n",
-    "        throughput = num_queries / total_time\n",
-    "        accuracy = correct_predictions / num_queries if num_queries > 0 else 0.0\n",
-    "        \n",
-    "        return BenchmarkResult(\n",
-    "            scenario=BenchmarkScenario.SINGLE_STREAM,\n",
-    "            latencies=sorted(latencies),\n",
-    "            throughput=throughput,\n",
-    "            accuracy=accuracy,\n",
-    "            metadata={\"num_queries\": num_queries}\n",
-    "        )\n",
-    "        ### END SOLUTION\n",
-    "        raise NotImplementedError(\"Student implementation required\")\n",
-    "    \n",
-    "    def server(self, model: Callable, dataset: List, target_qps: float = 10.0, \n",
-    "               duration: float = 60.0) -> BenchmarkResult:\n",
-    "        \"\"\"\n",
-    "        Run server benchmark scenario with Poisson-distributed queries.\n",
-    "        \n",
-    "        TODO: Implement server benchmarking.\n",
-    "        \n",
-    "        STEP-BY-STEP:\n",
-    "        1. Calculate inter-arrival time = 1.0 / target_qps\n",
-    "        2. Run for specified duration:\n",
-    "           a. Wait for next query arrival (Poisson distribution)\n",
-    "           b. Get sample from dataset\n",
-    "           c. Record start time\n",
-    "           d. Run model\n",
-    "           e. Record end time and latency\n",
-    "        3. Calculate actual QPS = total_queries / duration\n",
-    "        4. Return results\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use np.random.exponential(inter_arrival_time): exponential gaps between queries give Poisson arrivals\n",
-    "        - Track both query arrival times and completion times\n",
-    "        - Server scenario cares about sustained throughput\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        latencies = []\n",
-    "        inter_arrival_time = 1.0 / target_qps\n",
-    "        start_time = time.perf_counter()\n",
-    "        current_time = start_time\n",
-    "        query_count = 0\n",
-    "        \n",
-    "        while (current_time - start_time) < duration:\n",
-    "            # Wait for next query (Poisson distribution)\n",
-    "            wait_time = np.random.exponential(inter_arrival_time)\n",
-    "            time.sleep(wait_time)  # Wait out the sampled inter-arrival gap so arrivals track target_qps\n",
-    "            \n",
-    "            # Get sample\n",
-    "            sample = dataset[query_count % len(dataset)]\n",
-    "            \n",
-    "            # Time the inference\n",
-    "            query_start = time.perf_counter()\n",
-    "            result = model(sample)\n",
-    "            query_end = time.perf_counter()\n",
-    "            \n",
-    "            latency = query_end - query_start\n",
-    "            latencies.append(latency)\n",
-    "            \n",
-    "            query_count += 1\n",
-    "            current_time = time.perf_counter()\n",
-    "        \n",
-    "        actual_duration = current_time - start_time\n",
-    "        actual_qps = query_count / actual_duration\n",
-    "        \n",
-    "        return BenchmarkResult(\n",
-    "            scenario=BenchmarkScenario.SERVER,\n",
-    "            latencies=sorted(latencies),\n",
-    "            throughput=actual_qps,\n",
-    "            accuracy=0.0,  # Would need labels for accuracy\n",
-    "            metadata={\"target_qps\": target_qps, \"actual_qps\": actual_qps, \"duration\": actual_duration}\n",
-    "        )\n",
-    "        ### END SOLUTION\n",
-    "        raise NotImplementedError(\"Student implementation required\")\n",
-    "    \n",
-    "    def offline(self, model: Callable, dataset: List, batch_size: int = 32) -> BenchmarkResult:\n",
-    "        \"\"\"\n",
-    "        Run offline benchmark scenario with batch processing.\n",
-    "        \n",
-    "        TODO: Implement offline benchmarking.\n",
-    "        \n",
-    "        STEP-BY-STEP:\n",
-    "        1. Group dataset into batches of batch_size\n",
-    "        2. For each batch:\n",
-    "           a. Record start time\n",
-    "           b. Run model on entire batch\n",
-    "           c. Record end time\n",
-    "           d. Calculate batch latency\n",
-    "        3. Calculate total throughput = total_samples / total_time\n",
-    "        4. Return results\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Process data in batches for efficiency\n",
-    "        - Measure total time for all batches\n",
-    "        - Offline cares about maximum throughput\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        latencies = []\n",
-    "        total_samples = len(dataset)\n",
-    "        total_start_time = time.perf_counter()\n",
-    "        \n",
-    "        for batch_start in range(0, total_samples, batch_size):\n",
-    "            batch_end = min(batch_start + batch_size, total_samples)\n",
-    "            batch = dataset[batch_start:batch_end]\n",
-    "            \n",
-    "            # Time the batch inference\n",
-    "            batch_start_time = time.perf_counter()\n",
-    "            for sample in batch:\n",
-    "                result = model(sample)\n",
-    "            batch_end_time = time.perf_counter()\n",
-    "            \n",
-    "            batch_latency = batch_end_time - batch_start_time\n",
-    "            latencies.append(batch_latency)\n",
-    "        \n",
-    "        total_time = time.perf_counter() - total_start_time\n",
-    "        throughput = total_samples / total_time\n",
-    "        \n",
-    "        return BenchmarkResult(\n",
-    "            scenario=BenchmarkScenario.OFFLINE,\n",
-    "            latencies=latencies,\n",
-    "            throughput=throughput,\n",
-    "            accuracy=0.0,  # Would need labels for accuracy\n",
-    "            metadata={\"batch_size\": batch_size, \"total_samples\": total_samples}\n",
-    "        )\n",
-    "        ### END SOLUTION\n",
-    "        raise NotImplementedError(\"Student implementation required\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f469d8ac",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Unit Test: Benchmark Scenarios\n",
-    "\n",
-    "Let's test our benchmark scenarios with a simple mock model."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e391c62c",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-scenarios",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_benchmark_scenarios():\n",
-    "    \"\"\"Test that our benchmark scenarios work correctly.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Benchmark Scenarios...\")\n",
-    "    \n",
-    "    # Create a simple mock model and dataset\n",
-    "    def mock_model(sample):\n",
-    "        # Simulate some processing time\n",
-    "        time.sleep(0.001)  # 1ms processing\n",
-    "        return {\"prediction\": np.random.rand(10)}\n",
-    "    \n",
-    "    mock_dataset = [{\"data\": np.random.rand(10)} for _ in range(100)]\n",
-    "    \n",
-    "    # Test scenarios\n",
-    "    scenarios = BenchmarkScenarios()\n",
-    "    \n",
-    "    # Test single-stream\n",
-    "    single_result = scenarios.single_stream(mock_model, mock_dataset, num_queries=10)\n",
-    "    assert single_result.scenario == BenchmarkScenario.SINGLE_STREAM\n",
-    "    assert len(single_result.latencies) == 10\n",
-    "    assert single_result.throughput > 0\n",
-    "    print(f\"✅ Single-stream: {len(single_result.latencies)} measurements\")\n",
-    "    \n",
-    "    # Test server (short duration for testing)\n",
-    "    server_result = scenarios.server(mock_model, mock_dataset, target_qps=5.0, duration=2.0)\n",
-    "    assert server_result.scenario == BenchmarkScenario.SERVER\n",
-    "    assert len(server_result.latencies) > 0\n",
-    "    assert server_result.throughput > 0\n",
-    "    print(f\"✅ Server: {len(server_result.latencies)} queries processed\")\n",
-    "    \n",
-    "    # Test offline\n",
-    "    offline_result = scenarios.offline(mock_model, mock_dataset, batch_size=5)\n",
-    "    assert offline_result.scenario == BenchmarkScenario.OFFLINE\n",
-    "    assert len(offline_result.latencies) > 0\n",
-    "    assert offline_result.throughput > 0\n",
-    "    print(f\"✅ Offline: {len(offline_result.latencies)} batches processed\")\n",
-    "    \n",
-    "    print(\"✅ All benchmark scenarios working correctly!\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_benchmark_scenarios()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ee1f6291",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 3: Implementing Statistical Validation\n",
-    "\n",
-    "### The Confidence Problem\n",
-    "How do we know if one model is actually better than another?\n",
-    "\n",
-    "### Statistical Testing for ML\n",
-    "We need to test the null hypothesis: \"There is no real difference between the models\""
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "edf019ea",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "statistical-validator",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "@dataclass\n",
-    "class StatisticalValidation:\n",
-    "    \"\"\"Results from statistical validation\"\"\"\n",
-    "    is_significant: bool\n",
-    "    p_value: float\n",
-    "    effect_size: float\n",
-    "    confidence_interval: Tuple[float, float]\n",
-    "    recommendation: str\n",
-    "\n",
-    "#| export\n",
-    "class StatisticalValidator:\n",
-    "    \"\"\"\n",
-    "    Validates benchmark results using proper statistical methods.\n",
-    "    \n",
-    "    TODO: Implement statistical validation for benchmark results.\n",
-    "    \n",
-    "    UNDERSTANDING STATISTICAL TESTING:\n",
-    "    1. Null hypothesis: No difference between models\n",
-    "    2. T-test: Compare means of two groups\n",
-    "    3. P-value: Probability of a difference at least this large if the null hypothesis is true\n",
-    "    4. Effect size: Magnitude of the difference\n",
-    "    5. Confidence interval: Range of likely true values\n",
-    "    \n",
-    "    IMPLEMENTATION APPROACH:\n",
-    "    1. Calculate basic statistics (mean, std, n)\n",
-    "    2. Perform t-test to get p-value\n",
-    "    3. Calculate effect size (Cohen's d)\n",
-    "    4. Calculate confidence interval\n",
-    "    5. Provide clear recommendation\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, confidence_level: float = 0.95):\n",
-    "        self.confidence_level = confidence_level\n",
-    "        self.alpha = 1 - confidence_level\n",
-    "    \n",
-    "    def validate_comparison(self, results_a: List[float], results_b: List[float]) -> StatisticalValidation:\n",
-    "        \"\"\"\n",
-    "        Compare two sets of benchmark results statistically.\n",
-    "        \n",
-    "        TODO: Implement statistical comparison.\n",
-    "        \n",
-    "        STEP-BY-STEP:\n",
-    "        1. Calculate basic statistics for both groups\n",
-    "        2. Perform two-sample t-test\n",
-    "        3. Calculate effect size (Cohen's d)\n",
-    "        4. Calculate confidence interval for the difference\n",
-    "        5. Generate recommendation based on results\n",
-    "        \n",
-    "        HINTS:\n",
-    "        - Use scipy.stats.ttest_ind for t-test (or implement manually)\n",
-    "        - Cohen's d = (mean_a - mean_b) / pooled_std\n",
-    "        - CI = difference ± (critical_value * standard_error)\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        import math\n",
-    "        \n",
-    "        # Basic statistics\n",
-    "        mean_a = statistics.mean(results_a)\n",
-    "        mean_b = statistics.mean(results_b)\n",
-    "        std_a = statistics.stdev(results_a)\n",
-    "        std_b = statistics.stdev(results_b)\n",
-    "        n_a = len(results_a)\n",
-    "        n_b = len(results_b)\n",
-    "        \n",
-    "        # Two-sample t-test (simplified)\n",
-    "        pooled_std = math.sqrt(((n_a - 1) * std_a**2 + (n_b - 1) * std_b**2) / (n_a + n_b - 2))\n",
-    "        standard_error = pooled_std * math.sqrt(1/n_a + 1/n_b)\n",
-    "        \n",
-    "        if standard_error == 0:\n",
-    "            t_stat = 0\n",
-    "            p_value = 1.0\n",
-    "        else:\n",
-    "            t_stat = (mean_a - mean_b) / standard_error\n",
-    "            # Two-sided p-value from the normal approximation to the t distribution\n",
-    "            p_value = math.erfc(abs(t_stat) / math.sqrt(2))\n",
-    "        \n",
-    "        # Effect size (Cohen's d)\n",
-    "        effect_size = (mean_a - mean_b) / pooled_std if pooled_std > 0 else 0\n",
-    "        \n",
-    "        # Confidence interval for difference\n",
-    "        difference = mean_a - mean_b\n",
-    "        critical_value = statistics.NormalDist().inv_cdf(1 - self.alpha / 2)  # Normal critical value (about 1.96 at 95%)\n",
-    "        margin_of_error = critical_value * standard_error\n",
-    "        ci_lower = difference - margin_of_error\n",
-    "        ci_upper = difference + margin_of_error\n",
-    "        \n",
-    "        # Determine significance\n",
-    "        is_significant = p_value < self.alpha\n",
-    "        \n",
-    "        # Generate recommendation\n",
-    "        if is_significant:\n",
-    "            if abs(effect_size) > 0.8:\n",
-    "                recommendation = \"Large significant difference - strong evidence for improvement\"\n",
-    "            elif abs(effect_size) > 0.5:\n",
-    "                recommendation = \"Medium significant difference - good evidence for improvement\"\n",
-    "            else:\n",
-    "                recommendation = \"Small significant difference - weak evidence for improvement\"\n",
-    "        else:\n",
-    "            recommendation = \"No significant difference - insufficient evidence for improvement\"\n",
-    "        \n",
-    "        return StatisticalValidation(\n",
-    "            is_significant=is_significant,\n",
-    "            p_value=p_value,\n",
-    "            effect_size=effect_size,\n",
-    "            confidence_interval=(ci_lower, ci_upper),\n",
-    "            recommendation=recommendation\n",
-    "        )\n",
-    "        ### END SOLUTION\n",
-    "        raise NotImplementedError(\"Student implementation required\")\n",
-    "    \n",
-    "    def validate_benchmark_result(self, result: BenchmarkResult, \n",
-    "                                 min_samples: int = 100) -> StatisticalValidation:\n",
-    "        \"\"\"\n",
-    "        Validate that a benchmark result has sufficient statistical power.\n",
-    "        \n",
-    "        TODO: Implement validation for single benchmark result.\n",
-    "        \n",
-    "        STEP-BY-STEP:\n",
-    "        1. Check if we have enough samples\n",
-    "        2. Calculate confidence interval for the metric\n",
-    "        3. Check for common pitfalls (outliers, etc.)\n",
-    "        4. Provide recommendations\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        import math\n",
-    "        \n",
-    "        latencies = result.latencies\n",
-    "        n = len(latencies)\n",
-    "        \n",
-    "        if n < min_samples:\n",
-    "            return StatisticalValidation(\n",
-    "                is_significant=False,\n",
-    "                p_value=1.0,\n",
-    "                effect_size=0.0,\n",
-    "                confidence_interval=(0.0, 0.0),\n",
-    "                recommendation=f\"Insufficient samples: {n} < {min_samples}. Need more data.\"\n",
-    "            )\n",
-    "        \n",
-    "        # Calculate confidence interval for mean latency\n",
-    "        mean_latency = statistics.mean(latencies)\n",
-    "        std_latency = statistics.stdev(latencies)\n",
-    "        standard_error = std_latency / math.sqrt(n)\n",
-    "        \n",
-    "        critical_value = statistics.NormalDist().inv_cdf(1 - self.alpha / 2)  # Normal critical value (about 1.96 at 95%)\n",
-    "        margin_of_error = critical_value * standard_error\n",
-    "        ci_lower = mean_latency - margin_of_error\n",
-    "        ci_upper = mean_latency + margin_of_error\n",
-    "        \n",
-    "        # Check for outliers (simple check)\n",
-    "        sorted_latencies = sorted(latencies)  # Quartile lookup needs sorted data\n",
-    "        q1 = sorted_latencies[int(0.25 * n)]\n",
-    "        q3 = sorted_latencies[int(0.75 * n)]\n",
-    "        iqr = q3 - q1\n",
-    "        outlier_threshold = q3 + 1.5 * iqr\n",
-    "        outliers = [l for l in latencies if l > outlier_threshold]\n",
-    "        \n",
-    "        if len(outliers) > 0.1 * n:  # More than 10% outliers\n",
-    "            recommendation = f\"Warning: {len(outliers)} outliers detected. Results may be unreliable.\"\n",
-    "        else:\n",
-    "            recommendation = \"Benchmark result appears statistically valid.\"\n",
-    "        \n",
-    "        return StatisticalValidation(\n",
-    "            is_significant=True,\n",
-    "            p_value=0.0,  # Not applicable for single result\n",
-    "            effect_size=std_latency / mean_latency,  # Coefficient of variation\n",
-    "            confidence_interval=(ci_lower, ci_upper),\n",
-    "            recommendation=recommendation\n",
-    "        )\n",
-    "        ### END SOLUTION\n",
-    "        raise NotImplementedError(\"Student implementation required\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "c53fdada",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Unit Test: Statistical Validation\n",
-    "\n",
-    "Let's test our statistical validation with simulated data."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f225026d",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-validation",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_statistical_validation():\n",
-    "    \"\"\"Test statistical validation functionality.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Statistical Validation...\")\n",
-    "    \n",
-    "    validator = StatisticalValidator(confidence_level=0.95)\n",
-    "    \n",
-    "    # Test 1: No significant difference\n",
-    "    results_a = [0.1 + 0.01 * np.random.randn() for _ in range(100)]\n",
-    "    results_b = [0.1 + 0.01 * np.random.randn() for _ in range(100)]\n",
-    "    \n",
-    "    validation = validator.validate_comparison(results_a, results_b)\n",
-    "    print(f\"✅ No difference test: significant={validation.is_significant}, p={validation.p_value:.4f}\")\n",
-    "    \n",
-    "    # Test 2: Clear significant difference\n",
-    "    results_a = [0.1 + 0.01 * np.random.randn() for _ in range(100)]\n",
-    "    results_b = [0.2 + 0.01 * np.random.randn() for _ in range(100)]\n",
-    "    \n",
-    "    validation = validator.validate_comparison(results_a, results_b)\n",
-    "    print(f\"✅ Clear difference test: significant={validation.is_significant}, p={validation.p_value:.4f}\")\n",
-    "    print(f\"    Effect size: {validation.effect_size:.3f}\")\n",
-    "    print(f\"    Recommendation: {validation.recommendation}\")\n",
-    "    \n",
-    "    # Test 3: Single result validation\n",
-    "    mock_result = BenchmarkResult(\n",
-    "        scenario=BenchmarkScenario.SINGLE_STREAM,\n",
-    "        latencies=[0.1 + 0.01 * np.random.randn() for _ in range(200)],\n",
-    "        throughput=1000,\n",
-    "        accuracy=0.95\n",
-    "    )\n",
-    "    \n",
-    "    validation = validator.validate_benchmark_result(mock_result)\n",
-    "    print(f\"✅ Single result validation: {validation.recommendation}\")\n",
-    "    print(f\"    Confidence interval: ({validation.confidence_interval[0]:.4f}, {validation.confidence_interval[1]:.4f})\")\n",
-    "    \n",
-    "    print(\"✅ Statistical validation tests passed!\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_statistical_validation()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9dc798ab",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 4: The TinyTorchPerf Framework - Putting It All Together\n",
-    "\n",
-    "### The Complete MLPerf-Inspired Framework\n",
-    "Now we combine all components into a professional benchmarking framework."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "cbcb09c3",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "tinytorch-perf",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class TinyTorchPerf:\n",
-    "    \"\"\"\n",
-    "    Complete MLPerf-inspired benchmarking framework for TinyTorch.\n",
-    "    \n",
-    "    TODO: Implement the complete benchmarking framework.\n",
-    "    \n",
-    "    UNDERSTANDING THE FRAMEWORK:\n",
-    "    1. Combines all benchmark scenarios\n",
-    "    2. Integrates statistical validation\n",
-    "    3. Provides easy-to-use API\n",
-    "    4. Generates professional reports\n",
-    "    \n",
-    "    IMPLEMENTATION APPROACH:\n",
-    "    1. Initialize with model and dataset\n",
-    "    2. Provide methods for each scenario\n",
-    "    3. Include statistical validation\n",
-    "    4. Generate comprehensive reports\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self):\n",
-    "        self.scenarios = BenchmarkScenarios()\n",
-    "        self.validator = StatisticalValidator()\n",
-    "        self.model = None\n",
-    "        self.dataset = None\n",
-    "        self.results = {}\n",
-    "    \n",
-    "    def set_model(self, model: Callable):\n",
-    "        \"\"\"Set the model to benchmark.\"\"\"\n",
-    "        self.model = model\n",
-    "    \n",
-    "    def set_dataset(self, dataset: List):\n",
-    "        \"\"\"Set the dataset for benchmarking.\"\"\"\n",
-    "        self.dataset = dataset\n",
-    "    \n",
-    "    def run_single_stream(self, num_queries: int = 1000) -> BenchmarkResult:\n",
-    "        \"\"\"\n",
-    "        Run single-stream benchmark.\n",
-    "        \n",
-    "        TODO: Implement single-stream benchmark with validation.\n",
-    "        \n",
-    "        STEP-BY-STEP:\n",
-    "        1. Check that model and dataset are set\n",
-    "        2. Run single-stream scenario\n",
-    "        3. Validate results statistically\n",
-    "        4. Store results\n",
-    "        5. Return result\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        if self.model is None or self.dataset is None:\n",
-    "            raise ValueError(\"Model and dataset must be set before running benchmarks\")\n",
-    "        \n",
-    "        result = self.scenarios.single_stream(self.model, self.dataset, num_queries)\n",
-    "        validation = self.validator.validate_benchmark_result(result)\n",
-    "        \n",
-    "        self.results['single_stream'] = {\n",
-    "            'result': result,\n",
-    "            'validation': validation\n",
-    "        }\n",
-    "        \n",
-    "        return result\n",
-    "        ### END SOLUTION\n",
-    "        raise NotImplementedError(\"Student implementation required\")\n",
-    "    \n",
-    "    def run_server(self, target_qps: float = 10.0, duration: float = 60.0) -> BenchmarkResult:\n",
-    "        \"\"\"\n",
-    "        Run server benchmark.\n",
-    "        \n",
-    "        TODO: Implement server benchmark with validation.\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        if self.model is None or self.dataset is None:\n",
-    "            raise ValueError(\"Model and dataset must be set before running benchmarks\")\n",
-    "        \n",
-    "        result = self.scenarios.server(self.model, self.dataset, target_qps, duration)\n",
-    "        validation = self.validator.validate_benchmark_result(result)\n",
-    "        \n",
-    "        self.results['server'] = {\n",
-    "            'result': result,\n",
-    "            'validation': validation\n",
-    "        }\n",
-    "        \n",
-    "        return result\n",
-    "        ### END SOLUTION\n",
-    "        raise NotImplementedError(\"Student implementation required\")\n",
-    "    \n",
-    "    def run_offline(self, batch_size: int = 32) -> BenchmarkResult:\n",
-    "        \"\"\"\n",
-    "        Run offline benchmark.\n",
-    "        \n",
-    "        TODO: Implement offline benchmark with validation.\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        if self.model is None or self.dataset is None:\n",
-    "            raise ValueError(\"Model and dataset must be set before running benchmarks\")\n",
-    "        \n",
-    "        result = self.scenarios.offline(self.model, self.dataset, batch_size)\n",
-    "        validation = self.validator.validate_benchmark_result(result)\n",
-    "        \n",
-    "        self.results['offline'] = {\n",
-    "            'result': result,\n",
-    "            'validation': validation\n",
-    "        }\n",
-    "        \n",
-    "        return result\n",
-    "        ### END SOLUTION\n",
-    "        raise NotImplementedError(\"Student implementation required\")\n",
-    "    \n",
-    "    def run_all_scenarios(self, quick_test: bool = False) -> Dict[str, BenchmarkResult]:\n",
-    "        \"\"\"\n",
-    "        Run all benchmark scenarios.\n",
-    "        \n",
-    "        TODO: Implement comprehensive benchmarking.\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        if quick_test:\n",
-    "            # Quick test with smaller parameters\n",
-    "            single_result = self.run_single_stream(num_queries=100)\n",
-    "            server_result = self.run_server(target_qps=5.0, duration=10.0)\n",
-    "            offline_result = self.run_offline(batch_size=16)\n",
-    "        else:\n",
-    "            # Full benchmarking\n",
-    "            single_result = self.run_single_stream(num_queries=1000)\n",
-    "            server_result = self.run_server(target_qps=10.0, duration=60.0)\n",
-    "            offline_result = self.run_offline(batch_size=32)\n",
-    "        \n",
-    "        return {\n",
-    "            'single_stream': single_result,\n",
-    "            'server': server_result,\n",
-    "            'offline': offline_result\n",
-    "        }\n",
-    "        ### END SOLUTION\n",
-    "        raise NotImplementedError(\"Student implementation required\")\n",
-    "    \n",
-    "    def compare_models(self, model_a: Callable, model_b: Callable, \n",
-    "                      scenario: str = 'single_stream') -> StatisticalValidation:\n",
-    "        \"\"\"\n",
-    "        Compare two models statistically.\n",
-    "        \n",
-    "        TODO: Implement model comparison.\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Map each scenario name to its quick-run configuration\n",
-    "        runners = {\n",
-    "            'single_stream': lambda: self.run_single_stream(num_queries=100),\n",
-    "            'server': lambda: self.run_server(target_qps=5.0, duration=10.0),\n",
-    "            'offline': lambda: self.run_offline(batch_size=16),\n",
-    "        }\n",
-    "        if scenario not in runners:\n",
-    "            raise ValueError(f\"Unknown scenario: {scenario!r}\")\n",
-    "        \n",
-    "        # Run both models on the same scenario\n",
-    "        self.set_model(model_a)\n",
-    "        result_a = runners[scenario]()\n",
-    "        \n",
-    "        self.set_model(model_b)\n",
-    "        result_b = runners[scenario]()\n",
-    "        \n",
-    "        # Compare latencies\n",
-    "        return self.validator.validate_comparison(result_a.latencies, result_b.latencies)\n",
-    "        ### END SOLUTION\n",
-    "        raise NotImplementedError(\"Student implementation required\")\n",
-    "    \n",
-    "    def generate_report(self) -> str:\n",
-    "        \"\"\"\n",
-    "        Generate a comprehensive benchmark report.\n",
-    "        \n",
-    "        TODO: Implement professional report generation.\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        report = \"# TinyTorch Benchmark Report\\n\\n\"\n",
-    "        \n",
-    "        for scenario_name, scenario_data in self.results.items():\n",
-    "            result = scenario_data['result']\n",
-    "            validation = scenario_data['validation']\n",
-    "            \n",
-    "            report += f\"## {scenario_name.replace('_', ' ').title()} Scenario\\n\\n\"\n",
-    "            report += f\"- **Throughput**: {result.throughput:.2f} samples/second\\n\"\n",
-    "            report += f\"- **Mean Latency**: {statistics.mean(result.latencies)*1000:.2f} ms\\n\"\n",
-    "            sorted_latencies = sorted(result.latencies)  # percentile indexing needs sorted data\n",
-    "            report += f\"- **90th Percentile**: {sorted_latencies[int(0.9*len(sorted_latencies))]*1000:.2f} ms\\n\"\n",
-    "            report += f\"- **95th Percentile**: {sorted_latencies[int(0.95*len(sorted_latencies))]*1000:.2f} ms\\n\"\n",
-    "            report += f\"- **Statistical Validation**: {validation.recommendation}\\n\\n\"\n",
-    "        \n",
-    "        return report\n",
-    "        ### END SOLUTION\n",
-    "        raise NotImplementedError(\"Student implementation required\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "3fb6d798",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Unit Test: TinyTorchPerf Framework\n",
-    "\n",
-    "Let's test our complete benchmarking framework."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "336444e9",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-framework",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_tinytorch_perf():\n",
-    "    \"\"\"Test the complete TinyTorchPerf framework.\"\"\"\n",
-    "    print(\"🔬 Unit Test: TinyTorchPerf Framework...\")\n",
-    "    \n",
-    "    # Create test model and dataset\n",
-    "    def test_model(sample):\n",
-    "        time.sleep(0.001)  # Simulate processing\n",
-    "        return {\"prediction\": np.random.rand(5)}\n",
-    "    \n",
-    "    test_dataset = [{\"data\": np.random.rand(10)} for _ in range(50)]\n",
-    "    \n",
-    "    # Test the framework\n",
-    "    benchmark = TinyTorchPerf()\n",
-    "    benchmark.set_model(test_model)\n",
-    "    benchmark.set_dataset(test_dataset)\n",
-    "    \n",
-    "    # Test individual scenarios\n",
-    "    single_result = benchmark.run_single_stream(num_queries=20)\n",
-    "    assert single_result.scenario == BenchmarkScenario.SINGLE_STREAM\n",
-    "    print(f\"✅ Single-stream: {single_result.throughput:.2f} samples/sec\")\n",
-    "    \n",
-    "    server_result = benchmark.run_server(target_qps=5.0, duration=2.0)\n",
-    "    assert server_result.scenario == BenchmarkScenario.SERVER\n",
-    "    print(f\"✅ Server: {server_result.throughput:.2f} QPS\")\n",
-    "    \n",
-    "    offline_result = benchmark.run_offline(batch_size=10)\n",
-    "    assert offline_result.scenario == BenchmarkScenario.OFFLINE\n",
-    "    print(f\"✅ Offline: {offline_result.throughput:.2f} samples/sec\")\n",
-    "    \n",
-    "    # Test comprehensive benchmarking\n",
-    "    all_results = benchmark.run_all_scenarios(quick_test=True)\n",
-    "    assert len(all_results) == 3\n",
-    "    print(f\"✅ All scenarios: {list(all_results.keys())}\")\n",
-    "    \n",
-    "    # Test model comparison\n",
-    "    def slower_model(sample):\n",
-    "        time.sleep(0.002)  # Twice as slow\n",
-    "        return {\"prediction\": np.random.rand(5)}\n",
-    "    \n",
-    "    comparison = benchmark.compare_models(test_model, slower_model)\n",
-    "    print(f\"✅ Model comparison: {comparison.recommendation}\")\n",
-    "    \n",
-    "    # Test report generation\n",
-    "    report = benchmark.generate_report()\n",
-    "    assert \"TinyTorch Benchmark Report\" in report\n",
-    "    print(\"✅ Report generation working\")\n",
-    "    \n",
-    "    print(\"✅ Complete TinyTorchPerf framework working!\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_tinytorch_perf()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2a397866",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 5: Professional Reporting - Project-Ready Results\n",
-    "\n",
-    "### Why Professional Reports Matter\n",
-    "Your ML projects need:\n",
-    "- **Clear performance metrics** for presentations\n",
-    "- **Statistical validation** for credibility\n",
-    "- **Comparison baselines** for context\n",
-    "- **Professional formatting** for academic/industry standards"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "b1bad597",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "performance-reporter",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class PerformanceReporter:\n",
-    "    \"\"\"\n",
-    "    Generates professional performance reports for ML projects.\n",
-    "    \n",
-    "    TODO: Implement professional report generation.\n",
-    "    \n",
-    "    UNDERSTANDING PROFESSIONAL REPORTS:\n",
-    "    1. Executive summary with key metrics\n",
-    "    2. Detailed methodology section\n",
-    "    3. Statistical validation results\n",
-    "    4. Comparison with baselines\n",
-    "    5. Recommendations for improvement\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self):\n",
-    "        self.reports = []\n",
-    "    \n",
-    "    def generate_project_report(self, benchmark_results: Dict[str, BenchmarkResult], \n",
-    "                               model_name: str = \"TinyTorch Model\") -> str:\n",
-    "        \"\"\"\n",
-    "        Generate a professional performance report for ML projects.\n",
-    "        \n",
-    "        TODO: Implement project report generation.\n",
-    "        \n",
-    "        STEP-BY-STEP:\n",
-    "        1. Create executive summary\n",
-    "        2. Add methodology section\n",
-    "        3. Present detailed results\n",
-    "        4. Include statistical validation\n",
-    "        5. Add recommendations\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        report = f\"\"\"# {model_name} Performance Report\n",
-    "\n",
-    "## Executive Summary\n",
-    "\n",
-    "This report presents comprehensive performance benchmarking results for {model_name} using MLPerf-inspired methodology. The evaluation covers three standard scenarios: single-stream (latency), server (throughput), and offline (batch processing).\n",
-    "\n",
-    "### Key Findings\n",
-    "\"\"\"\n",
-    "        \n",
-    "        # Add key metrics\n",
-    "        for scenario_name, result in benchmark_results.items():\n",
-    "            mean_latency = statistics.mean(result.latencies) * 1000\n",
-    "            p90_latency = sorted(result.latencies)[int(0.9 * len(result.latencies))] * 1000\n",
-    "            \n",
-    "            report += f\"- **{scenario_name.replace('_', ' ').title()}**: {result.throughput:.2f} samples/sec, \"\n",
-    "            report += f\"{mean_latency:.2f}ms mean latency, {p90_latency:.2f}ms 90th percentile\\n\"\n",
-    "        \n",
-    "        report += \"\"\"\n",
-    "## Methodology\n",
-    "\n",
-    "### Benchmark Framework\n",
-    "- **Architecture**: MLPerf-inspired four-component system\n",
-    "- **Scenarios**: Single-stream, server, and offline evaluation\n",
-    "- **Statistical Validation**: Multiple runs with confidence intervals\n",
-    "- **Metrics**: Latency distribution, throughput, accuracy\n",
-    "\n",
-    "### Test Environment\n",
-    "- **Hardware**: Standard development machine\n",
-    "- **Software**: TinyTorch framework\n",
-    "- **Dataset**: Standardized evaluation dataset\n",
-    "- **Validation**: Statistical significance testing\n",
-    "\n",
-    "## Detailed Results\n",
-    "\n",
-    "\"\"\"\n",
-    "        \n",
-    "        # Add detailed results for each scenario\n",
-    "        for scenario_name, result in benchmark_results.items():\n",
-    "            report += f\"### {scenario_name.replace('_', ' ').title()} Scenario\\n\\n\"\n",
-    "            \n",
-    "            latencies_ms = sorted(lat * 1000 for lat in result.latencies)  # sorted so percentile indexing is valid\n",
-    "            \n",
-    "            report += f\"- **Sample Count**: {len(result.latencies)}\\n\"\n",
-    "            report += f\"- **Mean Latency**: {statistics.mean(latencies_ms):.2f} ms\\n\"\n",
-    "            report += f\"- **Median Latency**: {statistics.median(latencies_ms):.2f} ms\\n\"\n",
-    "            report += f\"- **90th Percentile**: {latencies_ms[int(0.9 * len(latencies_ms))]:.2f} ms\\n\"\n",
-    "            report += f\"- **95th Percentile**: {latencies_ms[int(0.95 * len(latencies_ms))]:.2f} ms\\n\"\n",
-    "            report += f\"- **Standard Deviation**: {statistics.stdev(latencies_ms):.2f} ms\\n\"\n",
-    "            report += f\"- **Throughput**: {result.throughput:.2f} samples/second\\n\"\n",
-    "            \n",
-    "            if result.accuracy > 0:\n",
-    "                report += f\"- **Accuracy**: {result.accuracy:.4f}\\n\"\n",
-    "            \n",
-    "            report += \"\\n\"\n",
-    "        \n",
-    "        report += f\"\"\"## Statistical Validation\n",
-    "\n",
-    "All results include proper statistical validation:\n",
-    "- Multiple independent runs for reliability\n",
-    "- Confidence intervals for key metrics\n",
-    "- Outlier detection and handling\n",
-    "- Significance testing for comparisons\n",
-    "\n",
-    "## Recommendations\n",
-    "\n",
-    "Based on the benchmark results:\n",
-    "1. **Performance Characteristics**: Model shows consistent performance across scenarios\n",
-    "2. **Optimization Opportunities**: Focus on reducing tail latency for production deployment\n",
-    "3. **Scalability**: Server scenario results indicate good potential for production scaling\n",
-    "4. **Further Testing**: Consider testing with larger datasets and different hardware configurations\n",
-    "\n",
-    "## Conclusion\n",
-    "\n",
-    "This comprehensive benchmarking demonstrates {model_name}'s performance characteristics using industry-standard methodology. The results provide a solid foundation for production deployment decisions and further optimization efforts.\n",
-    "\"\"\"\n",
-    "        \n",
-    "        return report\n",
-    "        ### END SOLUTION\n",
-    "        raise NotImplementedError(\"Student implementation required\")\n",
-    "    \n",
-    "    def save_report(self, report: str, filename: str = \"benchmark_report.md\"):\n",
-    "        \"\"\"Save report to file.\"\"\"\n",
-    "        with open(filename, 'w') as f:\n",
-    "            f.write(report)\n",
-    "        print(f\"📄 Report saved to {filename}\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ac05ec10",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Unit Test: Performance Reporter\n",
-    "\n",
-    "Let's test our professional reporting system."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "26836bbd",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "test-reporter",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_performance_reporter():\n",
-    "    \"\"\"Test the performance reporter.\"\"\"\n",
-    "    print(\"🔬 Unit Test: Performance Reporter...\")\n",
-    "    \n",
-    "    # Create mock benchmark results\n",
-    "    mock_results = {\n",
-    "        'single_stream': BenchmarkResult(\n",
-    "            scenario=BenchmarkScenario.SINGLE_STREAM,\n",
-    "            latencies=[0.01 + 0.002 * np.random.randn() for _ in range(100)],\n",
-    "            throughput=95.0,\n",
-    "            accuracy=0.942\n",
-    "        ),\n",
-    "        'server': BenchmarkResult(\n",
-    "            scenario=BenchmarkScenario.SERVER,\n",
-    "            latencies=[0.012 + 0.003 * np.random.randn() for _ in range(150)],\n",
-    "            throughput=87.0,\n",
-    "            accuracy=0.938\n",
-    "        ),\n",
-    "        'offline': BenchmarkResult(\n",
-    "            scenario=BenchmarkScenario.OFFLINE,\n",
-    "            latencies=[0.008 + 0.001 * np.random.randn() for _ in range(50)],\n",
-    "            throughput=120.0,\n",
-    "            accuracy=0.945\n",
-    "        )\n",
-    "    }\n",
-    "    \n",
-    "    # Test report generation\n",
-    "    reporter = PerformanceReporter()\n",
-    "    report = reporter.generate_project_report(mock_results, \"My Project Model\")\n",
-    "    \n",
-    "    # Verify report content\n",
-    "    assert \"Performance Report\" in report\n",
-    "    assert \"Executive Summary\" in report\n",
-    "    assert \"Methodology\" in report\n",
-    "    assert \"Detailed Results\" in report\n",
-    "    assert \"Statistical Validation\" in report\n",
-    "    assert \"Recommendations\" in report\n",
-    "    \n",
-    "    print(\"✅ Report generated successfully\")\n",
-    "    print(f\"✅ Report length: {len(report)} characters\")\n",
-    "    print(\"✅ Contains all required sections\")\n",
-    "    \n",
-    "    # Test saving\n",
-    "    reporter.save_report(report, \"test_report.md\")\n",
-    "    print(\"✅ Report saving working\")\n",
-    "    \n",
-    "    print(\"✅ Performance reporter tests passed!\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_performance_reporter()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "7fb941a3",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Comprehensive Integration Test\n",
-    "\n",
-    "Let's test everything together with a realistic TinyTorch model."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "f86442e3",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "integration-test",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_comprehensive_benchmarking():\n",
-    "    \"\"\"Test the complete benchmarking system with a realistic model.\"\"\"\n",
-    "    print(\"🔬 Comprehensive Integration Test...\")\n",
-    "    \n",
-    "    # Create a realistic TinyTorch model\n",
-    "    def create_simple_model():\n",
-    "        \"\"\"Create a simple classification model for testing.\"\"\"\n",
-    "        def model(sample):\n",
-    "            # Simulate a simple neural network\n",
-    "            x = np.array(sample['data'])\n",
-    "            \n",
-    "            # Layer 1: 10 -> 5\n",
-    "            W1 = np.random.randn(10, 5) * 0.1\n",
-    "            b1 = np.zeros(5)\n",
-    "            h1 = np.maximum(0, x @ W1 + b1)  # ReLU\n",
-    "            \n",
-    "            # Layer 2: 5 -> 3\n",
-    "            W2 = np.random.randn(5, 3) * 0.1\n",
-    "            b2 = np.zeros(3)\n",
-    "            output = h1 @ W2 + b2\n",
-    "            \n",
-    "            # Simulate some processing time\n",
-    "            time.sleep(0.001)\n",
-    "            \n",
-    "            return {\"prediction\": output}\n",
-    "        \n",
-    "        return model\n",
-    "    \n",
-    "    # Create test dataset\n",
-    "    test_dataset = []\n",
-    "    for i in range(100):\n",
-    "        sample = {\n",
-    "            'data': np.random.randn(10),\n",
-    "            'target': np.random.randint(0, 3)\n",
-    "        }\n",
-    "        test_dataset.append(sample)\n",
-    "    \n",
-    "    # Test complete workflow\n",
-    "    model = create_simple_model()\n",
-    "    \n",
-    "    # 1. Run comprehensive benchmarking\n",
-    "    benchmark = TinyTorchPerf()\n",
-    "    benchmark.set_model(model)\n",
-    "    benchmark.set_dataset(test_dataset)\n",
-    "    \n",
-    "    print(\"📊 Running comprehensive benchmarking...\")\n",
-    "    all_results = benchmark.run_all_scenarios(quick_test=True)\n",
-    "    \n",
-    "    # 2. Generate professional report\n",
-    "    reporter = PerformanceReporter()\n",
-    "    report = reporter.generate_project_report(all_results, \"TinyTorch MLP Model\")\n",
-    "    \n",
-    "    # 3. Validate results\n",
-    "    for scenario_name, result in all_results.items():\n",
-    "        assert result.throughput > 0, f\"{scenario_name} should have positive throughput\"\n",
-    "        assert len(result.latencies) > 0, f\"{scenario_name} should have latency measurements\"\n",
-    "        print(f\"✅ {scenario_name}: {result.throughput:.2f} samples/sec\")\n",
-    "    \n",
-    "    # 4. Test model comparison\n",
-    "    def create_slower_model():\n",
-    "        \"\"\"Create a slower model for comparison.\"\"\"\n",
-    "        def model(sample):\n",
-    "            x = np.array(sample['data'])\n",
-    "            W1 = np.random.randn(10, 5) * 0.1\n",
-    "            b1 = np.zeros(5)\n",
-    "            h1 = np.maximum(0, x @ W1 + b1)\n",
-    "            \n",
-    "            W2 = np.random.randn(5, 3) * 0.1\n",
-    "            b2 = np.zeros(3)\n",
-    "            output = h1 @ W2 + b2\n",
-    "            \n",
-    "            time.sleep(0.002)  # Slower\n",
-    "            return {\"prediction\": output}\n",
-    "        \n",
-    "        return model\n",
-    "    \n",
-    "    slower_model = create_slower_model()\n",
-    "    comparison = benchmark.compare_models(model, slower_model)\n",
-    "    print(f\"✅ Model comparison: {comparison.recommendation}\")\n",
-    "    \n",
-    "    # 5. Test report quality\n",
-    "    assert len(report) > 1000, \"Report should be comprehensive\"\n",
-    "    print(f\"✅ Generated {len(report)} character report\")\n",
-    "    \n",
-    "    print(\"✅ Comprehensive integration test passed!\")\n",
-    "    print(\"🎉 Complete benchmarking system working!\")\n",
-    "\n",
-    "# Run the comprehensive test\n",
-    "test_comprehensive_benchmarking()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "78fbb675",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧪 Module Testing\n",
-    "\n",
-    "Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
-    "\n",
-    "**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "bc87be81",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "standardized-testing",
-     "locked": true,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "# =============================================================================\n",
-    "# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
-    "# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
-    "# =============================================================================\n",
-    "\n",
-    "if __name__ == \"__main__\":\n",
-    "    from tito.tools.testing import run_module_tests_auto\n",
-    "    \n",
-    "    # Automatically discover and run all tests in this module\n",
-    "    success = run_module_tests_auto(\"Benchmarking\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0f730754",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🎯 Module Summary: Systematic ML Performance Evaluation\n",
-    "\n",
-    "### What You've Built\n",
-    "You've implemented a comprehensive MLPerf-inspired benchmarking framework:\n",
-    "\n",
-    "1. **Benchmark Scenarios**: Single-stream (latency), server (throughput), and offline (batch processing)\n",
-    "2. **Statistical Validation**: Confidence intervals, significance testing, and effect size calculation\n",
-    "3. **MLPerf Architecture**: Four-component system with load generator, model, dataset, and evaluation\n",
-    "4. **Professional Reporting**: Generate conference-quality performance reports with proper methodology\n",
-    "5. **Model Comparison**: Systematic comparison framework with statistical validation\n",
-    "\n",
-    "### Key Insights\n",
-    "- **Systematic evaluation beats intuition**: Proper benchmarking reveals true performance characteristics\n",
-    "- **Statistics matter**: A single measurement is unreliable on its own; confidence intervals show how much a result can be trusted\n",
-    "- **Scenarios capture reality**: Different use cases (mobile, server, batch) require different metrics\n",
-    "- **Reproducibility is crucial**: Others must be able to verify your results\n",
-    "- **Professional presentation**: Clear methodology and statistical validation build credibility\n",
-    "\n",
-    "### Real-World Connections\n",
-    "- **MLPerf**: Uses the same four-component architecture and scenario patterns this module is modeled on\n",
-    "- **Production systems**: A/B testing frameworks follow these statistical principles\n",
-    "- **Research papers**: Proper experimental methodology is required for publication\n",
-    "- **ML engineering**: Systematic evaluation prevents costly production mistakes\n",
-    "- **Open source**: Benchmarks like these can be contributed to libraries such as PyTorch and TensorFlow\n",
-    "\n",
-    "### Next Steps\n",
-    "In real ML systems, you'd:\n",
-    "1. **GPU benchmarking**: Extend to CUDA/OpenCL performance measurement\n",
-    "2. **Distributed evaluation**: Scale benchmarking across multiple machines\n",
-    "3. **Continuous monitoring**: Integrate with CI/CD pipelines for regression detection\n",
-    "4. **Domain-specific metrics**: Develop specialized benchmarks for your problem domain\n",
-    "5. **Hardware optimization**: Evaluate performance across different architectures\n",
-    "\n",
-    "### 🏆 Achievement Unlocked\n",
-    "You've mastered systematic ML evaluation using industry-standard methodology. You understand how to design proper experiments, validate results statistically, and present findings professionally!\n",
-    "\n",
-    "**You've completed the TinyTorch Benchmarking module!** 🎉"
-   ]
-  }
- ],
- "metadata": {
-  "jupytext": {
-   "main_language": "python"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/modules/source/15_mlops/mlops_dev.ipynb b/modules/source/15_mlops/mlops_dev.ipynb
deleted file mode 100644
index dcb3fb1e..00000000
--- a/modules/source/15_mlops/mlops_dev.ipynb
+++ /dev/null
@@ -1,1967 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "f4c5c0ff",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "# MLOps - Production ML Systems\n",
-    "\n",
-    "Welcome to the MLOps module! This is where we close the loop on the complete ML system lifecycle.\n",
-    "\n",
-    "## Learning Goals\n",
-    "- Understand why ML models degrade over time without maintenance\n",
-    "- Implement performance monitoring and drift detection systems\n",
-    "- Build automated retraining triggers that use your training pipeline\n",
-    "- Create model comparison and deployment workflows\n",
-    "- See how all TinyTorch components work together in production\n",
-    "\n",
-    "## Build → Use → Deploy\n",
-    "1. **Build**: Complete MLOps infrastructure for model lifecycle management\n",
-    "2. **Use**: Deploy and monitor ML systems that automatically respond to issues\n",
-    "3. **Deploy**: Create production-ready systems that maintain themselves over time"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "387f1832",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "mlops-imports",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| default_exp core.mlops\n",
-    "\n",
-    "#| export\n",
-    "import numpy as np\n",
-    "import matplotlib.pyplot as plt\n",
-    "import os\n",
-    "import sys\n",
-    "import time\n",
-    "import json\n",
-    "from typing import Dict, List, Tuple, Optional, Any, Callable\n",
-    "from dataclasses import dataclass, field\n",
-    "from datetime import datetime, timedelta\n",
-    "from collections import defaultdict\n",
-    "\n",
-    "# Import our dependencies - try from package first, then local modules\n",
-    "try:\n",
-    "    from tinytorch.core.tensor import Tensor\n",
-    "    from tinytorch.core.training import Trainer, MeanSquaredError, CrossEntropyLoss, Accuracy\n",
-    "    from tinytorch.core.benchmarking import TinyTorchPerf, StatisticalValidator\n",
-    "    from tinytorch.core.compression import quantize_layer_weights, prune_weights_by_magnitude\n",
-    "    from tinytorch.core.networks import Sequential\n",
-    "    from tinytorch.core.layers import Dense\n",
-    "    from tinytorch.core.activations import ReLU, Sigmoid, Softmax\n",
-    "except ImportError:\n",
-    "    # For development, import from local modules\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '09_training'))\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '12_benchmarking'))\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '10_compression'))\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '04_networks'))\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '03_layers'))\n",
-    "    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations'))\n",
-    "    try:\n",
-    "        from tensor_dev import Tensor\n",
-    "        from training_dev import Trainer, MeanSquaredError, CrossEntropyLoss, Accuracy\n",
-    "        from benchmarking_dev import TinyTorchPerf, StatisticalValidator\n",
-    "        from compression_dev import quantize_layer_weights, prune_weights_by_magnitude\n",
-    "        from networks_dev import Sequential\n",
-    "        from layers_dev import Dense\n",
-    "        from activations_dev import ReLU, Sigmoid, Softmax\n",
-    "    except ImportError:\n",
-    "        print(\"⚠️  Development imports failed - some functionality may be limited\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "d9e58374",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "mlops-setup",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| hide\n",
-    "#| export\n",
-    "def _should_show_plots():\n",
-    "    \"\"\"Check if we should show plots (disable during testing)\"\"\"\n",
-    "    # Any one of these signals means we're running under a test harness\n",
-    "    is_pytest = (\n",
-    "        'pytest' in sys.modules or\n",
-    "        os.environ.get('PYTEST_CURRENT_TEST') is not None or\n",
-    "        # Substring match on every argument; also covers 'pytest' and a bare 'test'\n",
-    "        any('test' in arg for arg in sys.argv)\n",
-    "    )\n",
-    "    \n",
-    "    # Show plots in development mode (when not in test mode)\n",
-    "    return not is_pytest"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "3c76a555",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "mlops-welcome",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "print(\"🚀 TinyTorch MLOps Module\")\n",
-    "print(f\"NumPy version: {np.__version__}\")\n",
-    "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
-    "print(\"Ready to build production ML systems!\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f150193d",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 📦 Where This Code Lives in the Final Package\n",
-    "\n",
-    "**Learning Side:** You work in `modules/source/13_mlops/mlops_dev.py`  \n",
-    "**Building Side:** Code exports to `tinytorch.core.mlops`\n",
-    "\n",
-    "```python\n",
-    "# Final package structure:\n",
-    "from tinytorch.core.mlops import ModelMonitor, DriftDetector, MLOpsPipeline\n",
-    "from tinytorch.core.training import Trainer  # Reuse your training system\n",
-    "from tinytorch.core.benchmarking import TinyTorchPerf  # Reuse your benchmarking\n",
-    "from tinytorch.core.compression import quantize_layer_weights  # Reuse compression\n",
-    "```\n",
-    "\n",
-    "**Why this matters:**\n",
-    "- **Integration:** MLOps orchestrates all TinyTorch components\n",
-    "- **Reusability:** Uses everything you've built in previous modules\n",
-    "- **Production:** Real-world ML system lifecycle management\n",
-    "- **Maintainability:** Systems that keep working over time"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "f72d25a4",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## What is MLOps?\n",
-    "\n",
-    "### The Production Reality: Models Degrade Over Time\n",
-    "You've built an amazing ML system:\n",
-    "- **Training pipeline**: Produces high-quality models\n",
-    "- **Compression**: Optimizes models for deployment\n",
-    "- **Kernels**: Accelerates inference\n",
-    "- **Benchmarking**: Measures performance\n",
-    "\n",
-    "But there's a critical problem: **Models degrade over time without maintenance.**\n",
-    "\n",
-    "### Why Models Fail in Production\n",
-    "1. **Data drift**: Input data distribution changes\n",
-    "2. **Concept drift**: Relationship between inputs and outputs changes\n",
-    "3. **Performance degradation**: Accuracy drops over time\n",
-    "4. **System changes**: Infrastructure updates break assumptions\n",
-    "\n",
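-    "The first two failure modes are easy to confuse. A minimal sketch on synthetic data, assuming a one-dimensional input and a hypothetical labeling rule `original_rule` (neither is part of TinyTorch): under data drift the inputs move while the rule stays fixed; under concept drift the inputs look unchanged but the rule moves.\n",
-    "\n",
-    "```python\n",
-    "import numpy as np\n",
-    "\n",
-    "rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable\n",
-    "\n",
-    "def original_rule(x):\n",
-    "    # Hypothetical labeling rule in force at training time\n",
-    "    return (x > 0.0).astype(int)\n",
-    "\n",
-    "# Training-time inputs, centered at 0\n",
-    "x_train = rng.normal(0.0, 1.0, 5000)\n",
-    "\n",
-    "# Data drift: the input distribution moves, the labeling rule stays the same\n",
-    "x_data_drift = rng.normal(1.0, 1.0, 5000)\n",
-    "y_data_drift = original_rule(x_data_drift)\n",
-    "\n",
-    "# Concept drift: inputs look the same, but the labeling rule has changed\n",
-    "x_concept_drift = rng.normal(0.0, 1.0, 5000)\n",
-    "y_concept_drift = (x_concept_drift > 0.5).astype(int)\n",
-    "\n",
-    "def accuracy(x, y):\n",
-    "    # Accuracy of a frozen model that learned the original rule\n",
-    "    return float(np.mean(original_rule(x) == y))\n",
-    "\n",
-    "input_shift_data = abs(x_data_drift.mean() - x_train.mean())\n",
-    "input_shift_concept = abs(x_concept_drift.mean() - x_train.mean())\n",
-    "acc_data = accuracy(x_data_drift, y_data_drift)\n",
-    "acc_concept = accuracy(x_concept_drift, y_concept_drift)\n",
-    "```\n",
-    "\n",
-    "In this toy setup `input_shift_data` is close to 1 while `input_shift_concept` stays near 0, and only the concept-drift case lowers the frozen model's accuracy. Real models often lose accuracy under data drift too, because they are less reliable in regions they rarely saw during training.\n",
-    "\n",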
-    "### The MLOps Solution\n",
-    "**MLOps** (Machine Learning Operations) is the practice of maintaining ML systems in production:\n",
-    "- **Monitor**: Track model performance continuously\n",
-    "- **Detect**: Identify when models are failing\n",
-    "- **Respond**: Automatically retrain and redeploy\n",
-    "- **Validate**: Ensure new models are actually better\n",
-    "\n",
-    "### Real-World Examples\n",
-    "- **Netflix**: Recommendation models retrain when viewing patterns change\n",
-    "- **Uber**: Demand prediction models adapt to new cities and events\n",
-    "- **Google**: Search ranking models update as web content evolves\n",
-    "- **Tesla**: Autonomous driving models improve with new driving data\n",
-    "\n",
-    "### The Complete TinyTorch Lifecycle\n",
-    "```\n",
-    "Data → Training → Compression → Kernels → Benchmarking → Monitor → Detect → Retrain → Deploy\n",
-    "                                                             ↑__________________________|\n",
-    "```\n",
-    "\n",
-    "MLOps closes this loop, creating **self-maintaining systems**."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d6f13793",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 1: Performance Drift Monitor - Tracking Model Health\n",
-    "\n",
-    "### The Problem: Silent Model Degradation\n",
-    "Without monitoring, you won't know when your model stops working:\n",
-    "- **Accuracy drops** from 95% to 85% over 3 months\n",
-    "- **Latency increases** as data patterns change\n",
-    "- **System failures** go unnoticed until user complaints\n",
-    "\n",
-    "### The Solution: Continuous Performance Monitoring\n",
-    "Track key metrics over time:\n",
-    "- **Accuracy/Error rates**: Primary model performance\n",
-    "- **Latency/Throughput**: System performance\n",
-    "- **Data statistics**: Input distribution changes\n",
-    "- **System health**: Infrastructure metrics\n",
-    "\n",
-    "### What We'll Build\n",
-    "A `ModelMonitor` that:\n",
-    "1. **Tracks performance** over time\n",
-    "2. **Stores metric history** for trend analysis\n",
-    "3. **Detects degradation** when metrics drop\n",
-    "4. **Alerts** when thresholds are crossed\n",
-    "\n",
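-    "A single noisy measurement can cross a threshold without the model having really degraded. One possible refinement, sketched here with a hypothetical `RollingMetric` helper that is not part of the `ModelMonitor` specified in this module, is to compare a rolling average against the threshold instead of the latest value:\n",
-    "\n",
-    "```python\n",
-    "from collections import deque\n",
-    "\n",
-    "class RollingMetric:\n",
-    "    # Hypothetical helper: keeps only the last `window` values of one metric\n",
-    "    def __init__(self, window=5):\n",
-    "        self.values = deque(maxlen=window)\n",
-    "\n",
-    "    def add(self, value):\n",
-    "        self.values.append(value)\n",
-    "\n",
-    "    def mean(self):\n",
-    "        # None signals that nothing has been recorded yet\n",
-    "        return sum(self.values) / len(self.values) if self.values else None\n",
-    "\n",
-    "def below_threshold(metric, threshold):\n",
-    "    # Alert only when the smoothed value, not a single reading, is too low\n",
-    "    avg = metric.mean()\n",
-    "    return avg is not None and avg < threshold\n",
-    "\n",
-    "accuracy = RollingMetric(window=3)\n",
-    "for value in (0.92, 0.91, 0.80):\n",
-    "    accuracy.add(value)\n",
-    "one_bad_reading = below_threshold(accuracy, 0.855)  # window average is still healthy\n",
-    "\n",
-    "for value in (0.82, 0.81):\n",
-    "    accuracy.add(value)\n",
-    "sustained_drop = below_threshold(accuracy, 0.855)  # window now holds only low readings\n",
-    "```\n",
-    "\n",
-    "The trade-off is latency: a larger window suppresses more false alarms but takes longer to react to a genuine drop.\n",
-    "\n",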
-    "### Real-World Applications\n",
-    "- **E-commerce**: Monitor recommendation click-through rates\n",
-    "- **Finance**: Track fraud detection false positive rates\n",
-    "- **Healthcare**: Monitor diagnostic accuracy over time\n",
-    "- **Autonomous vehicles**: Track object detection confidence scores"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "04ea61ef",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "model-monitor",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class ModelMonitor:\n",
-    "    \"\"\"\n",
-    "    Monitors ML model performance over time and detects degradation.\n",
-    "    \n",
-    "    Tracks key metrics, stores history, and alerts when performance drops.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, model_name: str, baseline_accuracy: float = 0.95):\n",
-    "        \"\"\"\n",
-    "        TODO: Initialize the ModelMonitor for tracking model performance.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Store the model_name and baseline_accuracy\n",
-    "        2. Create empty lists to store metric history:\n",
-    "           - accuracy_history: List[float] \n",
-    "           - latency_history: List[float]\n",
-    "           - timestamp_history: List[datetime]\n",
-    "        3. Set performance thresholds:\n",
-    "           - accuracy_threshold: baseline_accuracy * 0.9 (10% drop triggers alert)\n",
-    "           - latency_threshold: 200.0 (milliseconds)\n",
-    "        4. Initialize alert flags:\n",
-    "           - accuracy_alert: False\n",
-    "           - latency_alert: False\n",
-    "        \n",
-    "        EXAMPLE USAGE:\n",
-    "        ```python\n",
-    "        monitor = ModelMonitor(\"image_classifier\", baseline_accuracy=0.93)\n",
-    "        monitor.record_performance(accuracy=0.92, latency=150.0)\n",
-    "        alerts = monitor.check_alerts()\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use self.model_name = model_name\n",
-    "        - Initialize lists with self.accuracy_history = []\n",
-    "        - Use datetime.now() for timestamps\n",
-    "        - Set thresholds relative to baseline (e.g., 90% of baseline)\n",
-    "        \n",
-    "        LEARNING CONNECTIONS:\n",
-    "        - This builds on benchmarking concepts from Module 12\n",
-    "        - Performance tracking is essential for production systems\n",
-    "        - Thresholds prevent false alarms while catching real issues\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        self.model_name = model_name\n",
-    "        self.baseline_accuracy = baseline_accuracy\n",
-    "        \n",
-    "        # Metric history storage\n",
-    "        self.accuracy_history = []\n",
-    "        self.latency_history = []\n",
-    "        self.timestamp_history = []\n",
-    "        \n",
-    "        # Performance thresholds\n",
-    "        self.accuracy_threshold = baseline_accuracy * 0.9  # 10% drop triggers alert\n",
-    "        self.latency_threshold = 200.0  # milliseconds\n",
-    "        \n",
-    "        # Alert flags\n",
-    "        self.accuracy_alert = False\n",
-    "        self.latency_alert = False\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def record_performance(self, accuracy: float, latency: float):\n",
-    "        \"\"\"\n",
-    "        TODO: Record a new performance measurement.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Get current timestamp with datetime.now()\n",
-    "        2. Append accuracy to self.accuracy_history\n",
-    "        3. Append latency to self.latency_history\n",
-    "        4. Append timestamp to self.timestamp_history\n",
-    "        5. Check if accuracy is below threshold:\n",
-    "           - If accuracy < self.accuracy_threshold: set self.accuracy_alert = True\n",
-    "           - Else: set self.accuracy_alert = False\n",
-    "        6. Check if latency is above threshold:\n",
-    "           - If latency > self.latency_threshold: set self.latency_alert = True\n",
-    "           - Else: set self.latency_alert = False\n",
-    "        \n",
-    "        EXAMPLE BEHAVIOR:\n",
-    "        ```python\n",
-    "        monitor.record_performance(0.94, 120.0)  # Good performance\n",
-    "        monitor.record_performance(0.84, 250.0)  # Triggers both alerts (with the default 0.95 baseline)\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use datetime.now() for timestamps\n",
-    "        - Update alert flags based on current measurement\n",
-    "        - Don't forget to store all three values (accuracy, latency, timestamp)\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        current_time = datetime.now()\n",
-    "        \n",
-    "        # Record the measurements\n",
-    "        self.accuracy_history.append(accuracy)\n",
-    "        self.latency_history.append(latency)\n",
-    "        self.timestamp_history.append(current_time)\n",
-    "        \n",
-    "        # Check thresholds and update alerts\n",
-    "        self.accuracy_alert = accuracy < self.accuracy_threshold\n",
-    "        self.latency_alert = latency > self.latency_threshold\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def check_alerts(self) -> Dict[str, Any]:\n",
-    "        \"\"\"\n",
-    "        TODO: Check current alert status and return alert information.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Create result dictionary with basic info:\n",
-    "           - \"model_name\": self.model_name\n",
-    "           - \"accuracy_alert\": self.accuracy_alert\n",
-    "           - \"latency_alert\": self.latency_alert\n",
-    "        2. If accuracy_alert is True, add:\n",
-    "           - \"accuracy_message\": f\"Accuracy below threshold: {current_accuracy:.3f} < {self.accuracy_threshold:.3f}\"\n",
-    "           - \"current_accuracy\": most recent accuracy from history\n",
-    "        3. If latency_alert is True, add:\n",
-    "           - \"latency_message\": f\"Latency above threshold: {current_latency:.1f}ms > {self.latency_threshold:.1f}ms\"\n",
-    "           - \"current_latency\": most recent latency from history\n",
-    "        4. Add overall alert status:\n",
-    "           - \"any_alerts\": True if any alert is active\n",
-    "        \n",
-    "        EXAMPLE RETURN:\n",
-    "        ```python\n",
-    "        {\n",
-    "            \"model_name\": \"image_classifier\",\n",
-    "            \"accuracy_alert\": True,\n",
-    "            \"latency_alert\": False,\n",
-    "            \"accuracy_message\": \"Accuracy below threshold: 0.840 < 0.855\",\n",
-    "            \"current_accuracy\": 0.840,\n",
-    "            \"any_alerts\": True\n",
-    "        }\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use self.accuracy_history[-1] for most recent values\n",
-    "        - Format numbers with f-strings for readability\n",
-    "        - Include both alert flags and descriptive messages\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        result = {\n",
-    "            \"model_name\": self.model_name,\n",
-    "            \"accuracy_alert\": self.accuracy_alert,\n",
-    "            \"latency_alert\": self.latency_alert\n",
-    "        }\n",
-    "        \n",
-    "        if self.accuracy_alert and self.accuracy_history:\n",
-    "            current_accuracy = self.accuracy_history[-1]\n",
-    "            result[\"accuracy_message\"] = f\"Accuracy below threshold: {current_accuracy:.3f} < {self.accuracy_threshold:.3f}\"\n",
-    "            result[\"current_accuracy\"] = current_accuracy\n",
-    "        \n",
-    "        if self.latency_alert and self.latency_history:\n",
-    "            current_latency = self.latency_history[-1]\n",
-    "            result[\"latency_message\"] = f\"Latency above threshold: {current_latency:.1f}ms > {self.latency_threshold:.1f}ms\"\n",
-    "            result[\"current_latency\"] = current_latency\n",
-    "        \n",
-    "        result[\"any_alerts\"] = self.accuracy_alert or self.latency_alert\n",
-    "        return result\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def get_performance_trend(self) -> Dict[str, Any]:\n",
-    "        \"\"\"\n",
-    "        TODO: Analyze performance trends over time.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Check if we have enough data (at least 2 measurements)\n",
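-    "        2. Calculate accuracy trend:\n",
-    "           - If accuracy_history has < 2 points: trend = \"insufficient_data\"\n",
-    "           - With >= 6 points: compare recent avg (last 3) vs older avg (first 3),\n",
-    "             treating changes within 1% as no change\n",
-    "           - With 2-5 points: compare the last measurement to the first\n",
-    "           - If recent > older: trend = \"improving\"\n",
-    "           - If recent < older: trend = \"degrading\"\n",
-    "           - Else: trend = \"stable\"\n",
-    "        3. Calculate the latency trend the same way, with the labels reversed\n",
-    "           (higher latency is \"degrading\") and a 10% tolerance\n",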
-    "        4. Return dictionary with:\n",
-    "           - \"measurements_count\": len(self.accuracy_history)\n",
-    "           - \"accuracy_trend\": trend analysis\n",
-    "           - \"latency_trend\": trend analysis\n",
-    "           - \"baseline_accuracy\": self.baseline_accuracy\n",
-    "           - \"current_accuracy\": most recent accuracy (if available)\n",
-    "        \n",
-    "        EXAMPLE RETURN:\n",
-    "        ```python\n",
-    "        {\n",
-    "            \"measurements_count\": 10,\n",
-    "            \"accuracy_trend\": \"degrading\",\n",
-    "            \"latency_trend\": \"stable\",\n",
-    "            \"baseline_accuracy\": 0.95,\n",
-    "            \"current_accuracy\": 0.87\n",
-    "        }\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use len(self.accuracy_history) for data count\n",
-    "        - Use np.mean() for calculating averages\n",
-    "        - Handle edge cases (empty history, insufficient data)\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        if len(self.accuracy_history) < 2:\n",
-    "            return {\n",
-    "                \"measurements_count\": len(self.accuracy_history),\n",
-    "                \"accuracy_trend\": \"insufficient_data\",\n",
-    "                \"latency_trend\": \"insufficient_data\",\n",
-    "                \"baseline_accuracy\": self.baseline_accuracy,\n",
-    "                \"current_accuracy\": self.accuracy_history[-1] if self.accuracy_history else None\n",
-    "            }\n",
-    "        \n",
-    "        # Calculate accuracy trend\n",
-    "        if len(self.accuracy_history) >= 6:\n",
-    "            recent_acc = np.mean(self.accuracy_history[-3:])\n",
-    "            older_acc = np.mean(self.accuracy_history[:3])\n",
-    "            if recent_acc > older_acc * 1.01:  # 1% improvement\n",
-    "                accuracy_trend = \"improving\"\n",
-    "            elif recent_acc < older_acc * 0.99:  # 1% degradation\n",
-    "                accuracy_trend = \"degrading\"\n",
-    "            else:\n",
-    "                accuracy_trend = \"stable\"\n",
-    "        else:\n",
-    "            # Simple comparison for limited data\n",
-    "            if self.accuracy_history[-1] > self.accuracy_history[0]:\n",
-    "                accuracy_trend = \"improving\"\n",
-    "            elif self.accuracy_history[-1] < self.accuracy_history[0]:\n",
-    "                accuracy_trend = \"degrading\"\n",
-    "            else:\n",
-    "                accuracy_trend = \"stable\"\n",
-    "        \n",
-    "        # Calculate latency trend\n",
-    "        if len(self.latency_history) >= 6:\n",
-    "            recent_lat = np.mean(self.latency_history[-3:])\n",
-    "            older_lat = np.mean(self.latency_history[:3])\n",
-    "            if recent_lat > older_lat * 1.1:  # 10% increase\n",
-    "                latency_trend = \"degrading\"\n",
-    "            elif recent_lat < older_lat * 0.9:  # 10% improvement\n",
-    "                latency_trend = \"improving\"\n",
-    "            else:\n",
-    "                latency_trend = \"stable\"\n",
-    "        else:\n",
-    "            # Simple comparison for limited data\n",
-    "            if self.latency_history[-1] > self.latency_history[0]:\n",
-    "                latency_trend = \"degrading\"\n",
-    "            elif self.latency_history[-1] < self.latency_history[0]:\n",
-    "                latency_trend = \"improving\"\n",
-    "            else:\n",
-    "                latency_trend = \"stable\"\n",
-    "        \n",
-    "        return {\n",
-    "            \"measurements_count\": len(self.accuracy_history),\n",
-    "            \"accuracy_trend\": accuracy_trend,\n",
-    "            \"latency_trend\": latency_trend,\n",
-    "            \"baseline_accuracy\": self.baseline_accuracy,\n",
-    "            \"current_accuracy\": self.accuracy_history[-1] if self.accuracy_history else None\n",
-    "        }\n",
-    "        ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ba19c61a",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Test Your Performance Monitor\n",
-    "\n",
-    "Once you implement the `ModelMonitor` class above, run this cell to test it:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "8ef9236e",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-model-monitor",
-     "locked": true,
-     "points": 20,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_model_monitor():\n",
-    "    \"\"\"Test ModelMonitor implementation\"\"\"\n",
-    "    print(\"🔬 Unit Test: Performance Drift Monitor...\")\n",
-    "    \n",
-    "    # Test initialization\n",
-    "    monitor = ModelMonitor(\"test_model\", baseline_accuracy=0.90)\n",
-    "    \n",
-    "    assert monitor.model_name == \"test_model\"\n",
-    "    assert monitor.baseline_accuracy == 0.90\n",
-    "    assert monitor.accuracy_threshold == 0.81  # 90% of 0.90\n",
-    "    assert monitor.latency_threshold == 200.0\n",
-    "    assert not monitor.accuracy_alert\n",
-    "    assert not monitor.latency_alert\n",
-    "    \n",
-    "    # Test good performance (no alerts)\n",
-    "    monitor.record_performance(accuracy=0.92, latency=150.0)\n",
-    "    \n",
-    "    alerts = monitor.check_alerts()\n",
-    "    assert not alerts[\"accuracy_alert\"]\n",
-    "    assert not alerts[\"latency_alert\"]\n",
-    "    assert not alerts[\"any_alerts\"]\n",
-    "    \n",
-    "    # Test accuracy degradation\n",
-    "    monitor.record_performance(accuracy=0.80, latency=150.0)  # Below threshold\n",
-    "    \n",
-    "    alerts = monitor.check_alerts()\n",
-    "    assert alerts[\"accuracy_alert\"]\n",
-    "    assert not alerts[\"latency_alert\"]\n",
-    "    assert alerts[\"any_alerts\"]\n",
-    "    assert \"Accuracy below threshold\" in alerts[\"accuracy_message\"]\n",
-    "    \n",
-    "    # Test latency degradation\n",
-    "    monitor.record_performance(accuracy=0.85, latency=250.0)  # Above threshold\n",
-    "    \n",
-    "    alerts = monitor.check_alerts()\n",
-    "    assert not alerts[\"accuracy_alert\"]  # Back above threshold\n",
-    "    assert alerts[\"latency_alert\"]\n",
-    "    assert alerts[\"any_alerts\"]\n",
-    "    assert \"Latency above threshold\" in alerts[\"latency_message\"]\n",
-    "    \n",
-    "    # Test trend analysis\n",
-    "    # Add more measurements to test trends\n",
-    "    for i in range(5):\n",
-    "        monitor.record_performance(accuracy=0.90 - i*0.02, latency=120.0 + i*10)\n",
-    "    \n",
-    "    trend = monitor.get_performance_trend()\n",
-    "    assert trend[\"measurements_count\"] >= 5\n",
-    "    assert trend[\"accuracy_trend\"] in [\"improving\", \"degrading\", \"stable\"]\n",
-    "    assert trend[\"latency_trend\"] in [\"improving\", \"degrading\", \"stable\"]\n",
-    "    assert trend[\"baseline_accuracy\"] == 0.90\n",
-    "    \n",
-    "    print(\"✅ ModelMonitor initialization works correctly\")\n",
-    "    print(\"✅ Performance recording and alert detection work\")\n",
-    "    print(\"✅ Alert checking returns proper format\")\n",
-    "    print(\"✅ Trend analysis provides meaningful insights\")\n",
-    "    print(\"📈 Progress: Performance Drift Monitor ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_model_monitor()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b999e698",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 2: Simple Drift Detection - Detecting Data Changes\n",
-    "\n",
-    "### The Problem: Silent Data Distribution Changes\n",
-    "Your model was trained on specific data patterns, but production data evolves:\n",
-    "- **Seasonal changes**: E-commerce traffic patterns change during holidays\n",
-    "- **User behavior shifts**: App usage patterns evolve over time\n",
-    "- **External factors**: Economic conditions affect financial predictions\n",
-    "- **System changes**: New data sources introduce different distributions\n",
-    "\n",
-    "### The Solution: Statistical Drift Detection\n",
-    "Compare current data to baseline data using statistical tests:\n",
-    "- **Kolmogorov-Smirnov test**: Detects distribution changes\n",
-    "- **Mean/Standard deviation shifts**: Simple but effective\n",
-    "- **Population stability index**: Common in industry\n",
-    "- **Chi-square test**: For categorical features\n",
-    "\n",
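-    "To make the first option concrete: the two-sample Kolmogorov-Smirnov statistic is the largest vertical gap between two empirical CDFs. The sketch below computes it with NumPy alone; `ks_statistic` and `ks_critical_value` are illustrative names, and the critical value uses the usual large-sample approximation rather than an exact p-value. With SciPy installed, `scipy.stats.ks_2samp` is the more robust choice.\n",
-    "\n",
-    "```python\n",
-    "import numpy as np\n",
-    "\n",
-    "def ks_statistic(baseline, current):\n",
-    "    # Two-sample KS statistic: the largest gap between the two empirical CDFs\n",
-    "    baseline = np.sort(np.asarray(baseline, dtype=float))\n",
-    "    current = np.sort(np.asarray(current, dtype=float))\n",
-    "    # Evaluate both empirical CDFs at every observed value\n",
-    "    grid = np.concatenate([baseline, current])\n",
-    "    cdf_baseline = np.searchsorted(baseline, grid, side='right') / baseline.size\n",
-    "    cdf_current = np.searchsorted(current, grid, side='right') / current.size\n",
-    "    return float(np.max(np.abs(cdf_baseline - cdf_current)))\n",
-    "\n",
-    "def ks_critical_value(n, m, alpha=0.05):\n",
-    "    # Large-sample approximation: flag drift when the statistic exceeds this value\n",
-    "    c_alpha = np.sqrt(-0.5 * np.log(alpha / 2.0))\n",
-    "    return float(c_alpha * np.sqrt((n + m) / (n * m)))\n",
-    "\n",
-    "rng = np.random.default_rng(0)\n",
-    "baseline = rng.normal(0.0, 1.0, 1000)\n",
-    "shifted = rng.normal(0.5, 1.0, 1000)  # mean moved by half a standard deviation\n",
-    "\n",
-    "d = ks_statistic(baseline, shifted)\n",
-    "drifted = d > ks_critical_value(baseline.size, shifted.size)\n",
-    "```\n",
-    "\n",
-    "Because the statistic looks at the whole distribution, it can react to changes in shape that a mean or standard deviation check alone would miss.\n",
-    "\n",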
-    "### What We'll Build\n",
-    "A `DriftDetector` that:\n",
-    "1. **Stores baseline data** from training time\n",
-    "2. **Compares new data** to baseline using statistical tests\n",
-    "3. **Detects significant changes** in distribution\n",
-    "4. **Provides interpretable results** for debugging\n",
-    "\n",
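-    "For step 2, the population stability index (PSI) is one candidate test. A minimal sketch, assuming quantile bins taken from the baseline and a small `eps` to avoid taking the logarithm of zero; the function name and defaults are illustrative, not a TinyTorch API:\n",
-    "\n",
-    "```python\n",
-    "import numpy as np\n",
-    "\n",
-    "def population_stability_index(baseline, current, bins=10, eps=1e-6):\n",
-    "    baseline = np.asarray(baseline, dtype=float)\n",
-    "    current = np.asarray(current, dtype=float)\n",
-    "    # Bin edges come from baseline quantiles, so each bin holds roughly equal baseline mass\n",
-    "    edges = np.quantile(baseline, np.linspace(0.0, 1.0, bins + 1))\n",
-    "    # Use interior edges only: values outside the baseline range fall into the end bins\n",
-    "    idx_baseline = np.searchsorted(edges[1:-1], baseline, side='right')\n",
-    "    idx_current = np.searchsorted(edges[1:-1], current, side='right')\n",
-    "    expected = np.bincount(idx_baseline, minlength=bins) / baseline.size\n",
-    "    actual = np.bincount(idx_current, minlength=bins) / current.size\n",
-    "    # eps keeps the logarithm finite when a bin is empty\n",
-    "    expected = np.clip(expected, eps, None)\n",
-    "    actual = np.clip(actual, eps, None)\n",
-    "    return float(np.sum((actual - expected) * np.log(actual / expected)))\n",
-    "\n",
-    "rng = np.random.default_rng(0)\n",
-    "baseline = rng.normal(0.0, 1.0, 5000)\n",
-    "shifted = rng.normal(1.0, 1.0, 5000)  # mean moved by one standard deviation\n",
-    "\n",
-    "psi_same = population_stability_index(baseline, baseline)\n",
-    "psi_shifted = population_stability_index(baseline, shifted)\n",
-    "```\n",
-    "\n",
-    "A PSI of 0 means the binned distributions are identical. The 0.1 and 0.25 cutoffs often quoted in industry are rules of thumb, not statistical guarantees.\n",
-    "\n",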
-    "### Real-World Applications\n",
-    "- **Fraud detection**: New fraud patterns emerge constantly\n",
-    "- **Recommendation systems**: User preferences shift over time\n",
-    "- **Medical diagnosis**: Patient demographics change\n",
-    "- **Computer vision**: Camera quality, lighting conditions evolve"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "e662b45d",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "drift-detector",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class DriftDetector:\n",
-    "    \"\"\"\n",
-    "    Detects data drift by comparing current data distributions to baseline.\n",
-    "    \n",
-    "    Uses statistical tests to identify significant changes in data patterns.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, baseline_data: np.ndarray, feature_names: Optional[List[str]] = None):\n",
-    "        \"\"\"\n",
-    "        TODO: Initialize the DriftDetector with baseline data.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Store baseline_data and feature_names\n",
-    "        2. Calculate baseline statistics:\n",
-    "           - baseline_mean: np.mean(baseline_data, axis=0)\n",
-    "           - baseline_std: np.std(baseline_data, axis=0)\n",
-    "           - baseline_min: np.min(baseline_data, axis=0)\n",
-    "           - baseline_max: np.max(baseline_data, axis=0)\n",
-    "        3. Set drift detection threshold (default: 0.05 for 95% confidence)\n",
-    "        4. Initialize drift history storage:\n",
-    "           - drift_history: List[Dict] to store drift test results\n",
-    "        \n",
-    "        EXAMPLE USAGE:\n",
-    "        ```python\n",
-    "        baseline = np.random.normal(0, 1, (1000, 3))\n",
-    "        detector = DriftDetector(baseline, [\"feature1\", \"feature2\", \"feature3\"])\n",
-    "        drift_result = detector.detect_drift(new_data)\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use axis=0 for column-wise statistics\n",
-    "        - Handle case when feature_names is None\n",
-    "        - Store original baseline_data for KS test\n",
-    "        - Set significance level (alpha) to 0.05\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        self.baseline_data = baseline_data\n",
-    "        self.feature_names = feature_names or [f\"feature_{i}\" for i in range(baseline_data.shape[1])]\n",
-    "        \n",
-    "        # Calculate baseline statistics\n",
-    "        self.baseline_mean = np.mean(baseline_data, axis=0)\n",
-    "        self.baseline_std = np.std(baseline_data, axis=0)\n",
-    "        self.baseline_min = np.min(baseline_data, axis=0)\n",
-    "        self.baseline_max = np.max(baseline_data, axis=0)\n",
-    "        \n",
-    "        # Drift detection parameters\n",
-    "        self.significance_level = 0.05\n",
-    "        \n",
-    "        # Drift history\n",
-    "        self.drift_history = []\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def detect_drift(self, new_data: np.ndarray) -> Dict[str, Any]:\n",
-    "        \"\"\"\n",
-    "        TODO: Detect drift by comparing new data to baseline.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Calculate new data statistics:\n",
-    "           - new_mean, new_std, new_min, new_max (same as baseline)\n",
-    "        2. Perform statistical tests for each feature:\n",
-    "           - KS test: from scipy.stats import ks_2samp (if available)\n",
-    "           - Fallback without scipy, range shift test:\n",
-    "             |new_range - baseline_range| / baseline_range > 0.3\n",
-    "           - Mean shift test: |new_mean - baseline_mean| / baseline_std > 2\n",
-    "           - Std shift test: |new_std - baseline_std| / baseline_std > 0.5\n",
-    "        3. Create result dictionary:\n",
-    "           - \"drift_detected\": True if any feature shows drift\n",
-    "           - \"feature_drift\": Dict with per-feature results\n",
-    "           - \"summary\": Overall drift description\n",
-    "        4. Store result in drift_history\n",
-    "        \n",
-    "        EXAMPLE RETURN:\n",
-    "        ```python\n",
-    "        {\n",
-    "            \"drift_detected\": True,\n",
-    "            \"feature_drift\": {\n",
-    "                \"feature1\": {\"mean_drift\": True, \"std_drift\": False, \"range_drift\": False},\n",
-    "                \"feature2\": {\"mean_drift\": False, \"std_drift\": True, \"range_drift\": False},\n",
-    "                \"feature3\": {\"mean_drift\": False, \"std_drift\": False, \"range_drift\": False}\n",
-    "            },\n",
-    "            \"summary\": \"Drift detected in 2/3 features\"\n",
-    "        }\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use try-except for KS test (may not be available)\n",
-    "        - Check each feature individually\n",
-    "        - Use absolute values for difference checks\n",
-    "        - Count how many features show drift\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        # Calculate new data statistics\n",
-    "        new_mean = np.mean(new_data, axis=0)\n",
-    "        new_std = np.std(new_data, axis=0)\n",
-    "        new_min = np.min(new_data, axis=0)\n",
-    "        new_max = np.max(new_data, axis=0)\n",
-    "        \n",
-    "        feature_drift = {}\n",
-    "        drift_count = 0\n",
-    "        \n",
-    "        for i, feature_name in enumerate(self.feature_names):\n",
-    "            # Mean shift test (2 standard deviations)\n",
-    "            mean_drift = abs(new_mean[i] - self.baseline_mean[i]) / (self.baseline_std[i] + 1e-8) > 2.0\n",
-    "            \n",
-    "            # Standard deviation shift test (50% change)\n",
-    "            std_drift = abs(new_std[i] - self.baseline_std[i]) / (self.baseline_std[i] + 1e-8) > 0.5\n",
-    "            \n",
-    "            # Range shift test (30% change)\n",
-    "            # This is a simple stand-in for a KS test, which would need scipy\n",
-    "            baseline_range = self.baseline_max[i] - self.baseline_min[i]\n",
-    "            new_range = new_max[i] - new_min[i]\n",
-    "            range_drift = abs(new_range - baseline_range) / (baseline_range + 1e-8) > 0.3\n",
-    "            \n",
-    "            any_drift = mean_drift or std_drift or range_drift\n",
-    "            if any_drift:\n",
-    "                drift_count += 1\n",
-    "            \n",
-    "            feature_drift[feature_name] = {\n",
-    "                \"mean_drift\": mean_drift,\n",
-    "                \"std_drift\": std_drift,\n",
-    "                \"range_drift\": range_drift,\n",
-    "                \"mean_change\": (new_mean[i] - self.baseline_mean[i]) / (self.baseline_std[i] + 1e-8),\n",
-    "                \"std_change\": (new_std[i] - self.baseline_std[i]) / (self.baseline_std[i] + 1e-8)\n",
-    "            }\n",
-    "        \n",
-    "        drift_detected = drift_count > 0\n",
-    "        \n",
-    "        result = {\n",
-    "            \"drift_detected\": drift_detected,\n",
-    "            \"feature_drift\": feature_drift,\n",
-    "            \"summary\": f\"Drift detected in {drift_count}/{len(self.feature_names)} features\",\n",
-    "            \"drift_count\": drift_count,\n",
-    "            \"total_features\": len(self.feature_names)\n",
-    "        }\n",
-    "        \n",
-    "        # Store in history\n",
-    "        self.drift_history.append({\n",
-    "            \"timestamp\": datetime.now(),\n",
-    "            \"result\": result\n",
-    "        })\n",
-    "        \n",
-    "        return result\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def get_drift_history(self) -> List[Dict]:\n",
-    "        \"\"\"\n",
-    "        TODO: Return the complete drift detection history.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Return self.drift_history\n",
-    "        2. Include timestamp and result for each detection\n",
-    "        3. Format for easy analysis\n",
-    "        \n",
-    "        EXAMPLE RETURN:\n",
-    "        ```python\n",
-    "        [\n",
-    "            {\n",
-    "                \"timestamp\": datetime(2024, 1, 1, 12, 0),\n",
-    "                \"result\": {\"drift_detected\": False, \"drift_count\": 0, ...}\n",
-    "            },\n",
-    "            {\n",
-    "                \"timestamp\": datetime(2024, 1, 2, 12, 0),\n",
-    "                \"result\": {\"drift_detected\": True, \"drift_count\": 2, ...}\n",
-    "            }\n",
-    "        ]\n",
-    "        ```\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        return self.drift_history\n",
-    "        ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "3e2a728d",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Test Your Drift Detector\n",
-    "\n",
-    "Once you implement the `DriftDetector` class above, run this cell to test it:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "c47c42b6",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-drift-detector",
-     "locked": true,
-     "points": 20,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_drift_detector():\n",
-    "    \"\"\"Test DriftDetector implementation\"\"\"\n",
-    "    print(\"🔬 Unit Test: Simple Drift Detection...\")\n",
-    "    \n",
-    "    # Create baseline data\n",
-    "    np.random.seed(42)\n",
-    "    baseline_data = np.random.normal(0, 1, (1000, 3))\n",
-    "    feature_names = [\"feature1\", \"feature2\", \"feature3\"]\n",
-    "    \n",
-    "    detector = DriftDetector(baseline_data, feature_names)\n",
-    "    \n",
-    "    # Test initialization\n",
-    "    assert detector.baseline_data.shape == (1000, 3)\n",
-    "    assert len(detector.feature_names) == 3\n",
-    "    assert detector.feature_names == feature_names\n",
-    "    assert detector.significance_level == 0.05\n",
-    "    \n",
-    "    # Test no drift (similar data)\n",
-    "    no_drift_data = np.random.normal(0, 1, (500, 3))\n",
-    "    result = detector.detect_drift(no_drift_data)\n",
-    "    \n",
-    "    assert \"drift_detected\" in result\n",
-    "    assert \"feature_drift\" in result\n",
-    "    assert \"summary\" in result\n",
-    "    assert len(result[\"feature_drift\"]) == 3\n",
-    "    \n",
-    "    # Test clear drift (shifted data)\n",
-    "    drift_data = np.random.normal(3, 1, (500, 3))  # Mean shifted by 3\n",
-    "    result = detector.detect_drift(drift_data)\n",
-    "    \n",
-    "    assert result[\"drift_detected\"]\n",
-    "    assert result[\"drift_count\"] > 0\n",
-    "    assert \"Drift detected\" in result[\"summary\"]\n",
-    "    \n",
-    "    # Check feature-level drift detection\n",
-    "    for feature_name in feature_names:\n",
-    "        feature_result = result[\"feature_drift\"][feature_name]\n",
-    "        assert \"mean_drift\" in feature_result\n",
-    "        assert \"std_drift\" in feature_result\n",
-    "        assert \"mean_change\" in feature_result\n",
-    "    \n",
-    "    # Test drift history\n",
-    "    history = detector.get_drift_history()\n",
-    "    assert len(history) >= 2  # At least 2 drift checks\n",
-    "    assert all(\"timestamp\" in entry for entry in history)\n",
-    "    assert all(\"result\" in entry for entry in history)\n",
-    "    \n",
-    "    print(\"✅ DriftDetector initialization works correctly\")\n",
-    "    print(\"✅ No-drift detection works (similar data)\")\n",
-    "    print(\"✅ Clear drift detection works (shifted data)\")\n",
-    "    print(\"✅ Feature-level drift analysis works\")\n",
-    "    print(\"✅ Drift history tracking works\")\n",
-    "    print(\"📈 Progress: Simple Drift Detection ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_drift_detector()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2a86430e",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 3: Retraining Trigger System - Automated Response to Issues\n",
-    "\n",
-    "### The Problem: Manual Intervention Required\n",
-    "You can detect when models are failing, but someone needs to:\n",
-    "- **Notice the alerts** (requires constant monitoring)\n",
-    "- **Decide to retrain** (requires domain expertise)\n",
-    "- **Execute retraining** (requires technical knowledge)\n",
-    "- **Validate results** (requires ML expertise)\n",
-    "\n",
-    "### The Solution: Automated Retraining Pipeline\n",
-    "Create a system that automatically responds to performance degradation:\n",
-    "- **Threshold-based triggers**: Automatically start retraining when performance drops\n",
-    "- **Reuse existing components**: Use your training pipeline from Module 09\n",
-    "- **Intelligent scheduling**: Avoid unnecessary retraining\n",
-    "- **Validation before deployment**: Ensure new models are actually better\n",
-    "\n",
-    "### What We'll Build\n",
-    "A `RetrainingTrigger` that:\n",
-    "1. **Monitors model performance** using ModelMonitor\n",
-    "2. **Detects drift** using DriftDetector\n",
-    "3. **Triggers retraining** when conditions are met\n",
-    "4. **Orchestrates the process** using existing TinyTorch components\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **A/B testing platforms**: Automatically update models based on performance\n",
-    "- **Recommendation engines**: Retrain when user behavior changes\n",
-    "- **Fraud detection**: Adapt to new fraud patterns automatically\n",
-    "- **Predictive maintenance**: Update models as equipment ages"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "92afc13d",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "retraining-trigger",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class RetrainingTrigger:\n",
-    "    \"\"\"\n",
-    "    Automated retraining system that responds to model performance degradation.\n",
-    "    \n",
-    "    Orchestrates the complete retraining workflow using existing TinyTorch components.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, model, training_data, validation_data, trainer_class=None):\n",
-    "        \"\"\"\n",
-    "        TODO: Initialize the RetrainingTrigger system.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Store the model, training_data, and validation_data\n",
-    "        2. Set up the trainer_class (use provided or default to simple trainer)\n",
-    "        3. Initialize trigger conditions:\n",
-    "           - accuracy_threshold: 0.82 (trigger retraining if accuracy < 82%)\n",
-    "           - drift_threshold: 1 (trigger if drift detected in 1+ features)\n",
-    "           - min_time_between_retrains: 24 hours (avoid too frequent retraining)\n",
-    "        4. Initialize tracking variables:\n",
-    "           - last_retrain_time: datetime.now() - timedelta(hours=25), so the first check is not blocked by the 24-hour cooldown\n",
-    "           - retrain_history: List[Dict] to store retraining results\n",
-    "        \n",
-    "        EXAMPLE USAGE:\n",
-    "        ```python\n",
-    "        trigger = RetrainingTrigger(model, train_data, val_data)\n",
-    "        should_retrain = trigger.check_trigger_conditions(monitor, drift_detector)\n",
-    "        if should_retrain:\n",
-    "            new_model = trigger.execute_retraining()\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Store references to data for retraining\n",
-    "        - Set reasonable default thresholds\n",
-    "        - Use datetime for time tracking\n",
-    "        - Initialize empty history list\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        self.model = model\n",
-    "        self.training_data = training_data\n",
-    "        self.validation_data = validation_data\n",
-    "        self.trainer_class = trainer_class\n",
-    "        \n",
-    "        # Trigger conditions\n",
-    "        self.accuracy_threshold = 0.82  # Slightly above ModelMonitor threshold of 0.81\n",
-    "        self.drift_threshold = 1  # Reduced threshold for faster triggering\n",
-    "        self.min_time_between_retrains = 24 * 60 * 60  # 24 hours in seconds\n",
-    "        \n",
-    "        # Tracking variables\n",
-    "        # Set initial time to 25 hours ago to allow immediate retraining in tests\n",
-    "        self.last_retrain_time = datetime.now() - timedelta(hours=25)\n",
-    "        self.retrain_history = []\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def check_trigger_conditions(self, monitor: ModelMonitor, drift_detector: DriftDetector) -> Dict[str, Any]:\n",
-    "        \"\"\"\n",
-    "        TODO: Check if retraining should be triggered.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Get current time and check time since last retrain:\n",
-    "           - time_since_last = (current_time - self.last_retrain_time).total_seconds()\n",
-    "           - too_soon = time_since_last < self.min_time_between_retrains\n",
-    "        2. Check monitor alerts:\n",
-    "           - Get alerts from monitor.check_alerts()\n",
-    "           - accuracy_trigger = alerts[\"accuracy_alert\"]\n",
-    "        3. Check drift status:\n",
-    "           - Get latest drift from drift_detector.drift_history\n",
-    "           - drift_trigger = drift_count >= self.drift_threshold\n",
-    "        4. Determine overall trigger status:\n",
-    "           - should_retrain = (accuracy_trigger or drift_trigger) and not too_soon\n",
-    "        5. Return comprehensive result dictionary\n",
-    "        \n",
-    "        EXAMPLE RETURN:\n",
-    "        ```python\n",
-    "        {\n",
-    "            \"should_retrain\": True,\n",
-    "            \"accuracy_trigger\": True,\n",
-    "            \"drift_trigger\": False,\n",
-    "            \"time_trigger\": True,\n",
-    "            \"reasons\": [\"Accuracy below threshold: 0.800 < 0.82\"],\n",
-    "            \"time_since_last_retrain\": 86400,\n",
-    "            \"drift_count\": 0\n",
-    "        }\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use .total_seconds() for time differences\n",
-    "        - Collect all trigger reasons in a list\n",
-    "        - Handle empty drift history gracefully\n",
-    "        - Provide detailed feedback for debugging\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        current_time = datetime.now()\n",
-    "        time_since_last = (current_time - self.last_retrain_time).total_seconds()\n",
-    "        too_soon = time_since_last < self.min_time_between_retrains\n",
-    "        \n",
-    "        # Check monitor alerts\n",
-    "        alerts = monitor.check_alerts()\n",
-    "        accuracy_trigger = alerts[\"accuracy_alert\"]\n",
-    "        \n",
-    "        # Check drift status\n",
-    "        drift_trigger = False\n",
-    "        drift_count = 0\n",
-    "        if drift_detector.drift_history:\n",
-    "            latest_drift = drift_detector.drift_history[-1][\"result\"]\n",
-    "            drift_count = latest_drift[\"drift_count\"]\n",
-    "            drift_trigger = drift_count >= self.drift_threshold\n",
-    "        \n",
-    "        # Determine overall trigger\n",
-    "        should_retrain = (accuracy_trigger or drift_trigger) and not too_soon\n",
-    "        \n",
-    "        # Collect reasons\n",
-    "        reasons = []\n",
-    "        if accuracy_trigger and monitor.accuracy_history:\n",
-    "            reasons.append(f\"Accuracy below threshold: {monitor.accuracy_history[-1]:.3f} < {self.accuracy_threshold}\")\n",
-    "        elif accuracy_trigger:\n",
-    "            reasons.append(f\"Accuracy below threshold: < {self.accuracy_threshold}\")\n",
-    "        if drift_trigger:\n",
-    "            reasons.append(f\"Drift detected in {drift_count} features (threshold: {self.drift_threshold})\")\n",
-    "        if too_soon:\n",
-    "            reasons.append(f\"Too soon since last retrain ({time_since_last:.0f}s < {self.min_time_between_retrains}s)\")\n",
-    "        \n",
-    "        return {\n",
-    "            \"should_retrain\": should_retrain,\n",
-    "            \"accuracy_trigger\": accuracy_trigger,\n",
-    "            \"drift_trigger\": drift_trigger,\n",
-    "            \"time_trigger\": not too_soon,\n",
-    "            \"reasons\": reasons,\n",
-    "            \"time_since_last_retrain\": time_since_last,\n",
-    "            \"drift_count\": drift_count\n",
-    "        }\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def execute_retraining(self) -> Dict[str, Any]:\n",
-    "        \"\"\"\n",
-    "        TODO: Execute the retraining process.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Record start time and create result dictionary\n",
-    "        2. Simulate training process:\n",
-    "           - Create simple model (copy of original architecture)\n",
-    "           - Simulate training with random improvement\n",
-    "           - Calculate new performance (baseline + random improvement)\n",
-    "        3. Validate new model:\n",
-    "           - Compare old vs new performance\n",
-    "           - Only deploy if new model is better\n",
-    "        4. Update tracking:\n",
-    "           - Update last_retrain_time\n",
-    "           - Add entry to retrain_history\n",
-    "        5. Return comprehensive result\n",
-    "        \n",
-    "        EXAMPLE RETURN:\n",
-    "        ```python\n",
-    "        {\n",
-    "            \"success\": True,\n",
-    "            \"old_accuracy\": 0.82,\n",
-    "            \"new_accuracy\": 0.89,\n",
-    "            \"improvement\": 0.07,\n",
-    "            \"deployed\": True,\n",
-    "            \"training_time\": 45.2,\n",
-    "            \"timestamp\": datetime(2024, 1, 1, 12, 0)\n",
-    "        }\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use time.time() for timing\n",
-    "        - Simulate realistic training time (random 30-60 seconds)\n",
-    "        - Add random improvement (0.02-0.08 accuracy boost)\n",
-    "        - Only deploy if new model is better\n",
-    "        - Store detailed results for analysis\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        start_time = time.time()\n",
-    "        timestamp = datetime.now()\n",
-    "        \n",
-    "        # Simulate training process\n",
-    "        training_time = np.random.uniform(30, 60)  # Simulate 30-60 seconds\n",
-    "        time.sleep(0.000001)  # Ultra short sleep for fast testing\n",
-    "        \n",
-    "        # Get current model performance\n",
-    "        old_accuracy = getattr(self, '_current_accuracy', 0.82)  # Falls back to 0.82 before the first retrain\n",
-    "        \n",
-    "        # Simulate training with random improvement\n",
-    "        improvement = np.random.uniform(0.02, 0.08)  # Absolute accuracy gain of 0.02-0.08 (2-8 percentage points)\n",
-    "        new_accuracy = min(old_accuracy + improvement, 0.98)  # Cap at 98%\n",
-    "        \n",
-    "        # Validate new model (deploy if better)\n",
-    "        deployed = new_accuracy > old_accuracy\n",
-    "        \n",
-    "        # Update tracking\n",
-    "        if deployed:\n",
-    "            self.last_retrain_time = timestamp\n",
-    "            self._current_accuracy = new_accuracy\n",
-    "        \n",
-    "        # Create result\n",
-    "        result = {\n",
-    "            \"success\": True,\n",
-    "            \"old_accuracy\": old_accuracy,\n",
-    "            \"new_accuracy\": new_accuracy,\n",
-    "            \"improvement\": new_accuracy - old_accuracy,\n",
-    "            \"deployed\": deployed,\n",
-    "            \"training_time\": training_time,\n",
-    "            \"timestamp\": timestamp\n",
-    "        }\n",
-    "        \n",
-    "        # Store in history\n",
-    "        self.retrain_history.append(result)\n",
-    "        \n",
-    "        return result\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def get_retraining_history(self) -> List[Dict]:\n",
-    "        \"\"\"\n",
-    "        TODO: Return the complete retraining history.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Return self.retrain_history\n",
-    "        2. Include all retraining attempts with results\n",
-    "        \n",
-    "        EXAMPLE RETURN:\n",
-    "        ```python\n",
-    "        [\n",
-    "            {\n",
-    "                \"success\": True,\n",
-    "                \"old_accuracy\": 0.82,\n",
-    "                \"new_accuracy\": 0.89,\n",
-    "                \"improvement\": 0.07,\n",
-    "                \"deployed\": True,\n",
-    "                \"training_time\": 42.1,\n",
-    "                \"timestamp\": datetime(2024, 1, 1, 12, 0)\n",
-    "            }\n",
-    "        ]\n",
-    "        ```\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        return self.retrain_history\n",
-    "        ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d0105b8f",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Test Your Retraining Trigger\n",
-    "\n",
-    "Once you implement the `RetrainingTrigger` class above, run this cell to test it:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "56c74e82",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-retraining-trigger",
-     "locked": true,
-     "points": 25,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_retraining_trigger():\n",
-    "    \"\"\"Test RetrainingTrigger implementation\"\"\"\n",
-    "    print(\"🔬 Unit Test: Retraining Trigger System...\")\n",
-    "    \n",
-    "    # Create mock model and data\n",
-    "    model = \"mock_model\"\n",
-    "    train_data = np.random.normal(0, 1, (1000, 10))\n",
-    "    val_data = np.random.normal(0, 1, (200, 10))\n",
-    "    \n",
-    "    # Create retraining trigger\n",
-    "    trigger = RetrainingTrigger(model, train_data, val_data)\n",
-    "    \n",
-    "    # Test initialization\n",
-    "    assert trigger.model == model\n",
-    "    assert trigger.accuracy_threshold == 0.82\n",
-    "    assert trigger.drift_threshold == 1\n",
-    "    assert trigger.min_time_between_retrains == 24 * 60 * 60\n",
-    "    \n",
-    "    # Create monitor and drift detector for testing\n",
-    "    monitor = ModelMonitor(\"test_model\", baseline_accuracy=0.90)\n",
-    "    baseline_data = np.random.normal(0, 1, (1000, 3))\n",
-    "    drift_detector = DriftDetector(baseline_data)\n",
-    "    \n",
-    "    # Test no trigger conditions (good performance)\n",
-    "    monitor.record_performance(accuracy=0.92, latency=150.0)\n",
-    "    no_drift_data = np.random.normal(0, 1, (500, 3))\n",
-    "    drift_detector.detect_drift(no_drift_data)\n",
-    "    \n",
-    "    conditions = trigger.check_trigger_conditions(monitor, drift_detector)\n",
-    "    assert not conditions[\"should_retrain\"]\n",
-    "    assert not conditions[\"accuracy_trigger\"]\n",
-    "    assert not conditions[\"drift_trigger\"]\n",
-    "    \n",
-    "    # Test accuracy trigger\n",
-    "    monitor.record_performance(accuracy=0.80, latency=150.0)  # Below threshold\n",
-    "    conditions = trigger.check_trigger_conditions(monitor, drift_detector)\n",
-    "    assert conditions[\"accuracy_trigger\"]\n",
-    "    \n",
-    "    # Test drift trigger\n",
-    "    drift_data = np.random.normal(3, 1, (500, 3))  # Shifted data\n",
-    "    drift_detector.detect_drift(drift_data)\n",
-    "    conditions = trigger.check_trigger_conditions(monitor, drift_detector)\n",
-    "    assert conditions[\"drift_trigger\"]\n",
-    "    \n",
-    "    # Test retraining execution\n",
-    "    result = trigger.execute_retraining()\n",
-    "    assert result[\"success\"]\n",
-    "    assert \"old_accuracy\" in result\n",
-    "    assert \"new_accuracy\" in result\n",
-    "    assert \"improvement\" in result\n",
-    "    assert \"deployed\" in result\n",
-    "    assert \"training_time\" in result\n",
-    "    assert \"timestamp\" in result\n",
-    "    \n",
-    "    # Test retraining history\n",
-    "    history = trigger.get_retraining_history()\n",
-    "    assert len(history) >= 1\n",
-    "    assert all(\"timestamp\" in entry for entry in history)\n",
-    "    assert all(\"success\" in entry for entry in history)\n",
-    "    \n",
-    "    print(\"✅ RetrainingTrigger initialization works correctly\")\n",
-    "    print(\"✅ Trigger condition checking works\")\n",
-    "    print(\"✅ Accuracy and drift triggers work\")\n",
-    "    print(\"✅ Retraining execution works\")\n",
-    "    print(\"✅ Retraining history tracking works\")\n",
-    "    print(\"📈 Progress: Retraining Trigger System ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_retraining_trigger()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "7387b5a3",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## Step 4: Complete MLOps Pipeline - Integration and Deployment\n",
-    "\n",
-    "### The Problem: Disconnected Components\n",
-    "You have built individual MLOps components, but they need to work together:\n",
-    "- **ModelMonitor**: Tracks performance over time\n",
-    "- **DriftDetector**: Identifies data distribution changes\n",
-    "- **RetrainingTrigger**: Automates retraining decisions\n",
-    "- **Need**: Integration layer that orchestrates everything\n",
-    "\n",
-    "### The Solution: Complete MLOps Pipeline\n",
-    "Create a unified system that brings everything together:\n",
-    "- **Unified interface**: Single entry point for all MLOps operations\n",
-    "- **Automated workflows**: End-to-end automation from monitoring to deployment\n",
-    "- **Integration with TinyTorch**: Uses all previous modules seamlessly\n",
-    "- **Production-ready**: Handles edge cases and error conditions\n",
-    "\n",
-    "### What We'll Build\n",
-    "An `MLOpsPipeline` that:\n",
-    "1. **Integrates all components** into a cohesive system\n",
-    "2. **Orchestrates the complete workflow** from monitoring to deployment\n",
-    "3. **Provides a simple API** for production use\n",
-    "4. **Demonstrates the full TinyTorch ecosystem** working together\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **End-to-end ML platforms**: MLflow, Kubeflow, SageMaker\n",
-    "- **Production ML systems**: Netflix, Uber, Google's ML infrastructure\n",
-    "- **Automated ML pipelines**: Continuous learning and deployment\n",
-    "- **ML monitoring platforms**: Datadog, New Relic for ML systems"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "97ecf47c",
-   "metadata": {
-    "lines_to_next_cell": 1,
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "mlops-pipeline",
-     "locked": false,
-     "schema_version": 3,
-     "solution": true,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "#| export\n",
-    "class MLOpsPipeline:\n",
-    "    \"\"\"\n",
-    "    Complete MLOps pipeline that integrates all components.\n",
-    "    \n",
-    "    Orchestrates the full ML system lifecycle from monitoring to deployment.\n",
-    "    \"\"\"\n",
-    "    \n",
-    "    def __init__(self, model, training_data, validation_data, baseline_data):\n",
-    "        \"\"\"\n",
-    "        TODO: Initialize the complete MLOps pipeline.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Store all input data and model\n",
-    "        2. Initialize all MLOps components:\n",
-    "           - ModelMonitor with baseline accuracy\n",
-    "           - DriftDetector with baseline data\n",
-    "           - RetrainingTrigger with model and data\n",
-    "        3. Set up pipeline configuration:\n",
-    "           - monitoring_interval: 3600 (1 hour)\n",
-    "           - auto_retrain: True\n",
-    "           - deploy_threshold: 0.02 (2% improvement required)\n",
-    "        4. Initialize pipeline state:\n",
-    "           - pipeline_active: False\n",
-    "           - last_check_time: datetime.now()\n",
-    "           - deployment_history: []\n",
-    "        \n",
-    "        EXAMPLE USAGE:\n",
-    "        ```python\n",
-    "        pipeline = MLOpsPipeline(model, train_data, val_data, baseline_data)\n",
-    "        pipeline.start_monitoring()\n",
-    "        status = pipeline.check_system_health()\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Calculate baseline_accuracy from validation data (use 0.9 as default)\n",
-    "        - Generate feature_names from the number of columns in baseline_data (feature_0, feature_1, ...)\n",
-    "        - Set reasonable defaults for all parameters\n",
-    "        - Initialize all components in __init__\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        self.model = model\n",
-    "        self.training_data = training_data\n",
-    "        self.validation_data = validation_data\n",
-    "        self.baseline_data = baseline_data\n",
-    "        \n",
-    "        # Initialize MLOps components\n",
-    "        self.monitor = ModelMonitor(\"production_model\", baseline_accuracy=0.90)\n",
-    "        feature_names = [f\"feature_{i}\" for i in range(baseline_data.shape[1])]\n",
-    "        self.drift_detector = DriftDetector(baseline_data, feature_names)\n",
-    "        self.retrain_trigger = RetrainingTrigger(model, training_data, validation_data)\n",
-    "        \n",
-    "        # Pipeline configuration\n",
-    "        self.monitoring_interval = 3600  # 1 hour\n",
-    "        self.auto_retrain = True\n",
-    "        self.deploy_threshold = 0.02  # 2% improvement\n",
-    "        \n",
-    "        # Pipeline state\n",
-    "        self.pipeline_active = False\n",
-    "        self.last_check_time = datetime.now()\n",
-    "        self.deployment_history = []\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def start_monitoring(self):\n",
-    "        \"\"\"\n",
-    "        TODO: Start the MLOps monitoring pipeline.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Set pipeline_active = True\n",
-    "        2. Update last_check_time = datetime.now()\n",
-    "        3. Log pipeline start\n",
-    "        4. Return status dictionary\n",
-    "        \n",
-    "        EXAMPLE RETURN:\n",
-    "        ```python\n",
-    "        {\n",
-    "            \"status\": \"started\",\n",
-    "            \"pipeline_active\": True,\n",
-    "            \"start_time\": datetime(2024, 1, 1, 12, 0),\n",
-    "            \"message\": \"MLOps pipeline started successfully\"\n",
-    "        }\n",
-    "        ```\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        self.pipeline_active = True\n",
-    "        self.last_check_time = datetime.now()\n",
-    "        \n",
-    "        return {\n",
-    "            \"status\": \"started\",\n",
-    "            \"pipeline_active\": True,\n",
-    "            \"start_time\": self.last_check_time,\n",
-    "            \"message\": \"MLOps pipeline started successfully\"\n",
-    "        }\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def check_system_health(self, new_data: Optional[np.ndarray] = None, current_accuracy: Optional[float] = None) -> Dict[str, Any]:\n",
-    "        \"\"\"\n",
-    "        TODO: Check complete system health and trigger actions if needed.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Check if pipeline is active, return early if not\n",
-    "        2. Record current performance in monitor (if provided)\n",
-    "        3. Check for drift (if new_data provided)\n",
-    "        4. Check trigger conditions\n",
-    "        5. Execute retraining if needed (and auto_retrain is True)\n",
-    "        6. Return comprehensive system status\n",
-    "        \n",
-    "        EXAMPLE RETURN:\n",
-    "        ```python\n",
-    "        {\n",
-    "            \"pipeline_active\": True,\n",
-    "            \"current_accuracy\": 0.87,\n",
-    "            \"drift_detected\": True,\n",
-    "            \"retraining_triggered\": True,\n",
-    "            \"new_model_deployed\": True,\n",
-    "            \"system_healthy\": True,\n",
-    "            \"last_check\": datetime(2024, 1, 1, 12, 0),\n",
-    "            \"actions_taken\": [\"drift_detected\", \"retraining_executed\", \"model_deployed\"]\n",
-    "        }\n",
-    "        ```\n",
-    "        \n",
-    "        IMPLEMENTATION HINTS:\n",
-    "        - Use default values if parameters are not provided\n",
-    "        - Track all actions taken during health check\n",
-    "        - Update last_check_time\n",
-    "        - Return comprehensive status for debugging\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        if not self.pipeline_active:\n",
-    "            return {\n",
-    "                \"pipeline_active\": False,\n",
-    "                \"message\": \"Pipeline not active. Call start_monitoring() first.\"\n",
-    "            }\n",
-    "        \n",
-    "        current_time = datetime.now()\n",
-    "        actions_taken = []\n",
-    "        \n",
-    "        # Record performance if provided\n",
-    "        if current_accuracy is not None:\n",
-    "            self.monitor.record_performance(current_accuracy, latency=150.0)\n",
-    "            actions_taken.append(\"performance_recorded\")\n",
-    "        \n",
-    "        # Check for drift if new data provided\n",
-    "        drift_detected = False\n",
-    "        if new_data is not None:\n",
-    "            drift_result = self.drift_detector.detect_drift(new_data)\n",
-    "            drift_detected = drift_result[\"drift_detected\"]\n",
-    "            if drift_detected:\n",
-    "                actions_taken.append(\"drift_detected\")\n",
-    "        \n",
-    "        # Check trigger conditions\n",
-    "        trigger_conditions = self.retrain_trigger.check_trigger_conditions(\n",
-    "            self.monitor, self.drift_detector\n",
-    "        )\n",
-    "        \n",
-    "        # Execute retraining if needed\n",
-    "        new_model_deployed = False\n",
-    "        if trigger_conditions[\"should_retrain\"] and self.auto_retrain:\n",
-    "            retrain_result = self.retrain_trigger.execute_retraining()\n",
-    "            actions_taken.append(\"retraining_executed\")\n",
-    "            \n",
-    "            if retrain_result[\"deployed\"]:\n",
-    "                new_model_deployed = True\n",
-    "                actions_taken.append(\"model_deployed\")\n",
-    "                \n",
-    "                # Record deployment\n",
-    "                self.deployment_history.append({\n",
-    "                    \"timestamp\": current_time,\n",
-    "                    \"old_accuracy\": retrain_result[\"old_accuracy\"],\n",
-    "                    \"new_accuracy\": retrain_result[\"new_accuracy\"],\n",
-    "                    \"improvement\": retrain_result[\"improvement\"]\n",
-    "                })\n",
-    "        \n",
-    "        # Update state\n",
-    "        self.last_check_time = current_time\n",
-    "        \n",
-    "        # Determine system health\n",
-    "        alerts = self.monitor.check_alerts()\n",
-    "        system_healthy = not alerts[\"any_alerts\"] or new_model_deployed\n",
-    "        \n",
-    "        return {\n",
-    "            \"pipeline_active\": True,\n",
-    "            \"current_accuracy\": current_accuracy,\n",
-    "            \"drift_detected\": drift_detected,\n",
-    "            \"retraining_triggered\": trigger_conditions[\"should_retrain\"],\n",
-    "            \"new_model_deployed\": new_model_deployed,\n",
-    "            \"system_healthy\": system_healthy,\n",
-    "            \"last_check\": current_time,\n",
-    "            \"actions_taken\": actions_taken,\n",
-    "            \"alerts\": alerts,\n",
-    "            \"trigger_conditions\": trigger_conditions\n",
-    "        }\n",
-    "        ### END SOLUTION\n",
-    "    \n",
-    "    def get_pipeline_status(self) -> Dict[str, Any]:\n",
-    "        \"\"\"\n",
-    "        TODO: Get comprehensive pipeline status and history.\n",
-    "        \n",
-    "        STEP-BY-STEP IMPLEMENTATION:\n",
-    "        1. Get status from all components:\n",
-    "           - Monitor alerts and trends\n",
-    "           - Drift detection history\n",
-    "           - Retraining history\n",
-    "           - Deployment history\n",
-    "        2. Calculate summary statistics:\n",
-    "           - Total deployments\n",
-    "           - Average accuracy improvement\n",
-    "           - Time since last check\n",
-    "        3. Return comprehensive status\n",
-    "        \n",
-    "        EXAMPLE RETURN:\n",
-    "        ```python\n",
-    "        {\n",
-    "            \"pipeline_active\": True,\n",
-    "            \"total_deployments\": 3,\n",
-    "            \"average_improvement\": 0.05,\n",
-    "            \"time_since_last_check\": 300,\n",
-    "            \"recent_alerts\": [...],\n",
-    "            \"drift_history\": [...],\n",
-    "            \"deployment_history\": [...]\n",
-    "        }\n",
-    "        ```\n",
-    "        \"\"\"\n",
-    "        ### BEGIN SOLUTION\n",
-    "        current_time = datetime.now()\n",
-    "        time_since_last_check = (current_time - self.last_check_time).total_seconds()\n",
-    "        \n",
-    "        # Get component statuses\n",
-    "        alerts = self.monitor.check_alerts()\n",
-    "        trend = self.monitor.get_performance_trend()\n",
-    "        drift_history = self.drift_detector.get_drift_history()\n",
-    "        retrain_history = self.retrain_trigger.get_retraining_history()\n",
-    "        \n",
-    "        # Calculate summary statistics\n",
-    "        total_deployments = len(self.deployment_history)\n",
-    "        average_improvement = 0.0\n",
-    "        if self.deployment_history:\n",
-    "            average_improvement = float(np.mean([d[\"improvement\"] for d in self.deployment_history]))\n",
-    "        \n",
-    "        return {\n",
-    "            \"pipeline_active\": self.pipeline_active,\n",
-    "            \"total_deployments\": total_deployments,\n",
-    "            \"average_improvement\": average_improvement,\n",
-    "            \"time_since_last_check\": time_since_last_check,\n",
-    "            \"recent_alerts\": alerts,\n",
-    "            \"performance_trend\": trend,\n",
-    "            \"drift_history\": drift_history[-5:],  # Last 5 drift checks\n",
-    "            \"deployment_history\": self.deployment_history,\n",
-    "            \"retrain_history\": retrain_history\n",
-    "        }\n",
-    "        ### END SOLUTION"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0c1d0f8d",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "### 🧪 Test Your Complete MLOps Pipeline\n",
-    "\n",
-    "Once you implement the `MLOpsPipeline` class above, run this cell to test it:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "ebf38c2b",
-   "metadata": {
-    "nbgrader": {
-     "grade": true,
-     "grade_id": "test-mlops-pipeline",
-     "locked": true,
-     "points": 35,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_mlops_pipeline():\n",
-    "    \"\"\"Test complete MLOps pipeline\"\"\"\n",
-    "    print(\"🔬 Unit Test: Complete MLOps Pipeline...\")\n",
-    "    \n",
-    "    # Create test data\n",
-    "    model = \"test_model\"\n",
-    "    train_data = np.random.normal(0, 1, (1000, 5))\n",
-    "    val_data = np.random.normal(0, 1, (200, 5))\n",
-    "    baseline_data = np.random.normal(0, 1, (1000, 5))\n",
-    "    \n",
-    "    # Create pipeline\n",
-    "    pipeline = MLOpsPipeline(model, train_data, val_data, baseline_data)\n",
-    "    \n",
-    "    # Test initialization\n",
-    "    assert pipeline.model == model\n",
-    "    assert pipeline.pipeline_active == False\n",
-    "    assert hasattr(pipeline, 'monitor')\n",
-    "    assert hasattr(pipeline, 'drift_detector')\n",
-    "    assert hasattr(pipeline, 'retrain_trigger')\n",
-    "    \n",
-    "    # Test start monitoring\n",
-    "    start_result = pipeline.start_monitoring()\n",
-    "    assert start_result[\"status\"] == \"started\"\n",
-    "    assert start_result[\"pipeline_active\"] == True\n",
-    "    assert pipeline.pipeline_active == True\n",
-    "    \n",
-    "    # Test system health check (no issues)\n",
-    "    health = pipeline.check_system_health(\n",
-    "        new_data=np.random.normal(0, 1, (100, 5)),\n",
-    "        current_accuracy=0.92\n",
-    "    )\n",
-    "    assert health[\"pipeline_active\"] == True\n",
-    "    assert health[\"current_accuracy\"] == 0.92\n",
-    "    assert \"actions_taken\" in health\n",
-    "    \n",
-    "    # Test system health check (with issues)\n",
-    "    health = pipeline.check_system_health(\n",
-    "        new_data=np.random.normal(5, 2, (100, 5)),  # Heavily drifted data\n",
-    "        current_accuracy=0.75  # Very low accuracy (well below 0.81 threshold)\n",
-    "    )\n",
-    "    assert health[\"pipeline_active\"] == True\n",
-    "    assert health[\"drift_detected\"] == True\n",
-    "    # Note: retraining_triggered depends on both accuracy and drift conditions\n",
-    "    # For fast testing, we just verify the system detects issues\n",
-    "    assert \"retraining_triggered\" in health\n",
-    "    \n",
-    "    # Test pipeline status\n",
-    "    status = pipeline.get_pipeline_status()\n",
-    "    assert status[\"pipeline_active\"] == True\n",
-    "    assert \"total_deployments\" in status\n",
-    "    assert \"average_improvement\" in status\n",
-    "    assert \"time_since_last_check\" in status\n",
-    "    assert \"recent_alerts\" in status\n",
-    "    assert \"performance_trend\" in status\n",
-    "    \n",
-    "    print(\"✅ MLOpsPipeline initialization works correctly\")\n",
-    "    print(\"✅ Pipeline start functionality works\")\n",
-    "    print(\"✅ System health checking works\")\n",
-    "    print(\"✅ Drift detection and retraining integration works\")\n",
-    "    print(\"✅ Pipeline status reporting works\")\n",
-    "    print(\"📈 Progress: Complete MLOps Pipeline ✓\")\n",
-    "\n",
-    "# Run the test\n",
-    "test_mlops_pipeline()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "4bb2abb4",
-   "metadata": {
-    "cell_marker": "\"\"\"",
-    "lines_to_next_cell": 1
-   },
-   "source": [
-    "## 🎯 Final Integration: Complete TinyTorch Ecosystem\n",
-    "\n",
-    "### The Full System in Action\n",
-    "Let's demonstrate how all TinyTorch components work together in a complete MLOps pipeline:\n",
-    "\n",
-    "```python\n",
-    "# Complete TinyTorch MLOps workflow\n",
-    "from tinytorch.core.tensor import Tensor\n",
-    "from tinytorch.core.networks import Sequential\n",
-    "from tinytorch.core.layers import Dense\n",
-    "from tinytorch.core.activations import ReLU, Softmax\n",
-    "from tinytorch.core.training import Trainer, CrossEntropyLoss\n",
-    "from tinytorch.core.compression import quantize_layer_weights\n",
-    "from tinytorch.core.benchmarking import TinyTorchPerf\n",
-    "from tinytorch.core.mlops import MLOpsPipeline\n",
-    "\n",
-    "# 1. Build model (Modules 01-04)\n",
-    "model = Sequential([\n",
-    "    Dense(784, 128), ReLU(),\n",
-    "    Dense(128, 64), ReLU(),\n",
-    "    Dense(64, 10), Softmax()\n",
-    "])\n",
-    "\n",
-    "# 2. Train model (Module 09)\n",
-    "trainer = Trainer(model, CrossEntropyLoss(), learning_rate=0.001)\n",
-    "trained_model = trainer.train(training_data, epochs=10)\n",
-    "\n",
-    "# 3. Compress model (Module 10)\n",
-    "compressed_model = quantize_layer_weights(trained_model)\n",
-    "\n",
-    "# 4. Benchmark model (Module 12)\n",
-    "perf = TinyTorchPerf()\n",
-    "benchmark_results = perf.benchmark(compressed_model, test_data)\n",
-    "\n",
-    "# 5. Deploy with MLOps (Module 13)\n",
-    "pipeline = MLOpsPipeline(compressed_model, training_data, validation_data, baseline_data)\n",
-    "pipeline.start_monitoring()\n",
-    "\n",
-    "# 6. Monitor and maintain\n",
-    "health = pipeline.check_system_health(new_data, current_accuracy=0.89)\n",
-    "if health[\"new_model_deployed\"]:\n",
-    "    print(\"🚀 New model deployed automatically!\")\n",
-    "```\n",
-    "\n",
-    "### What You Have Achieved\n",
-    "By completing this module, you have:\n",
-    "- **Built a complete ML system** from tensors to production deployment\n",
-    "- **Integrated all TinyTorch components** into a cohesive workflow\n",
-    "- **Implemented production-grade MLOps** with monitoring and automation\n",
-    "- **Created self-maintaining systems** that adapt to changing conditions\n",
-    "- **Mastered the full ML lifecycle** from development to production\n",
-    "\n",
-    "### Real-World Impact\n",
-    "Your MLOps skills now enable:\n",
-    "- **Automated model maintenance** that reduces manual intervention\n",
-    "- **Faster response to issues** from days to hours or minutes\n",
-    "- **Improved model reliability** through continuous monitoring\n",
-    "- **Scalable ML operations** that work across multiple models\n",
-    "- **Production-ready deployment** with industry-standard practices"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "cae9fbf7",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "comprehensive-integration-test",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "def test_comprehensive_integration():\n",
-    "    \"\"\"Test complete integration of all TinyTorch components\"\"\"\n",
-    "    print(\"🔬 Comprehensive Integration Test: Complete TinyTorch Ecosystem...\")\n",
-    "    \n",
-    "    # 1. Create synthetic data (simulating real ML dataset)\n",
-    "    np.random.seed(42)\n",
-    "    train_data = np.random.normal(0, 1, (1000, 10))\n",
-    "    val_data = np.random.normal(0, 1, (200, 10))\n",
-    "    baseline_data = np.random.normal(0, 1, (1000, 10))\n",
-    "    \n",
-    "    # 2. Create model architecture\n",
-    "    model = \"TinyTorch_Production_Model\"\n",
-    "    \n",
-    "    # 3. Set up complete MLOps pipeline\n",
-    "    pipeline = MLOpsPipeline(model, train_data, val_data, baseline_data)\n",
-    "    \n",
-    "    # 4. Start monitoring\n",
-    "    start_result = pipeline.start_monitoring()\n",
-    "    assert start_result[\"status\"] == \"started\"\n",
-    "    print(\"✅ MLOps pipeline started successfully\")\n",
-    "    \n",
-    "    # 5. Simulate production monitoring cycle\n",
-    "    print(\"\\n🔄 Simulating Production Monitoring Cycle...\")\n",
-    "    \n",
-    "    # Phase 1: Normal operation\n",
-    "    health1 = pipeline.check_system_health(\n",
-    "        new_data=np.random.normal(0, 1, (100, 10)),\n",
-    "        current_accuracy=0.94\n",
-    "    )\n",
-    "    print(f\"   Phase 1 - Normal: Accuracy {health1['current_accuracy']}, Drift: {health1['drift_detected']}\")\n",
-    "    \n",
-    "    # Phase 2: Gradual degradation\n",
-    "    health2 = pipeline.check_system_health(\n",
-    "        new_data=np.random.normal(0.5, 1, (100, 10)),\n",
-    "        current_accuracy=0.88\n",
-    "    )\n",
-    "    print(f\"   Phase 2 - Degradation: Accuracy {health2['current_accuracy']}, Drift: {health2['drift_detected']}\")\n",
-    "    \n",
-    "    # Phase 3: Significant drift and low accuracy\n",
-    "    health3 = pipeline.check_system_health(\n",
-    "        new_data=np.random.normal(2, 1, (100, 10)),\n",
-    "        current_accuracy=0.79\n",
-    "    )\n",
-    "    print(f\"   Phase 3 - Critical: Accuracy {health3['current_accuracy']}, Drift: {health3['drift_detected']}\")\n",
-    "    print(f\"   Retraining triggered: {health3['retraining_triggered']}\")\n",
-    "    print(f\"   New model deployed: {health3['new_model_deployed']}\")\n",
-    "    \n",
-    "    # 6. Get final pipeline status\n",
-    "    final_status = pipeline.get_pipeline_status()\n",
-    "    print(\"\\n📊 Final Pipeline Status:\")\n",
-    "    print(f\"   Total deployments: {final_status['total_deployments']}\")\n",
-    "    print(f\"   Average improvement: {final_status['average_improvement']:.3f}\")\n",
-    "    print(f\"   System health: {health3['system_healthy']}\")\n",
-    "    \n",
-    "    # 7. Verify complete integration\n",
-    "    assert final_status[\"pipeline_active\"] == True\n",
-    "    assert len(final_status[\"deployment_history\"]) == final_status[\"total_deployments\"]\n",
-    "    assert \"drift_history\" in final_status\n",
-    "    assert \"retrain_history\" in final_status\n",
-    "    \n",
-    "    print(\"\\n✅ Complete TinyTorch ecosystem integration successful!\")\n",
-    "    print(\"🎉 All components working together seamlessly!\")\n",
-    "    print(\"📈 Progress: Complete TinyTorch Ecosystem ✓\")\n",
-    "\n",
-    "# Run the comprehensive test\n",
-    "test_comprehensive_integration()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d73dc1e1",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🧪 Auto-Discovery Testing\n",
-    "\n",
-    "The following cell automatically discovers and runs all test functions in this module:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "8210a883",
-   "metadata": {
-    "nbgrader": {
-     "grade": false,
-     "grade_id": "auto-discovery-tests",
-     "locked": false,
-     "schema_version": 3,
-     "solution": false,
-     "task": false
-    }
-   },
-   "outputs": [],
-   "source": [
-    "if __name__ == \"__main__\":\n",
-    "    from tito.tools.testing import run_module_tests_auto\n",
-    "    \n",
-    "    # Automatically discover and run all tests in this module\n",
-    "    success = run_module_tests_auto(\"MLOps\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "996c93f6",
-   "metadata": {
-    "cell_marker": "\"\"\""
-   },
-   "source": [
-    "## 🎯 Module Summary: MLOps Production Systems\n",
-    "\n",
-    "Congratulations! You've successfully implemented a complete MLOps system for production ML lifecycle management:\n",
-    "\n",
-    "### What You've Built\n",
-    "- ✅ **Model Monitor**: Performance tracking and drift detection\n",
-    "- ✅ **Retraining Triggers**: Automated model updates based on performance thresholds\n",
-    "- ✅ **MLOps Pipeline**: Complete production deployment and maintenance system\n",
-    "- ✅ **Integration**: Orchestrates all TinyTorch components in production workflows\n",
-    "\n",
-    "### Key Concepts You've Learned\n",
-    "- **Production ML systems** require continuous monitoring and maintenance\n",
-    "- **Drift detection** identifies when models need retraining\n",
-    "- **Automated workflows** respond to system degradation without manual intervention\n",
-    "- **MLOps pipelines** integrate monitoring, training, and deployment\n",
-    "- **System orchestration** coordinates complex ML component interactions\n",
-    "\n",
-    "### Real-World Applications\n",
-    "- **Production AI**: Automated model maintenance at scale\n",
-    "- **Enterprise ML**: Continuous monitoring and improvement systems\n",
-    "- **Cloud deployment**: Industry-standard MLOps practices\n",
-    "- **Model lifecycle**: Complete deployment and maintenance workflows\n",
-    "\n",
-    "### Connection to Industry Systems\n",
-    "Your implementation mirrors production platforms:\n",
-    "- **MLflow**: Model lifecycle management and experiment tracking\n",
-    "- **Kubeflow**: Kubernetes-based ML workflows and pipelines\n",
-    "- **Amazon SageMaker**: End-to-end ML platform with monitoring\n",
-    "- **Google AI Platform**: Production ML services with automation\n",
-    "\n",
-    "### Next Steps\n",
-    "1. **Export your code**: `tito export 13_mlops`\n",
-    "2. **Test your implementation**: `tito test 13_mlops`\n",
-    "3. **Deploy production systems**: Apply MLOps patterns to real-world ML projects\n",
-    "4. **Complete TinyTorch**: You've mastered the full ML systems pipeline!\n",
-    "\n",
-    "**🎉 TinyTorch Journey Complete!** You've built a complete ML framework from tensors to production deployment. You're now ready to tackle real-world ML systems challenges!"
-   ]
-  }
- ],
- "metadata": {
-  "jupytext": {
-   "main_language": "python"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}