{
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "e8d2a035",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\""
|
|
},
|
|
"source": [
|
|
"# Networks - Neural Network Architectures\n",
|
|
"\n",
|
|
"Welcome to the Networks module! This is where we compose layers into complete neural network architectures.\n",
|
|
"\n",
|
|
"## Learning Goals\n",
|
|
"- Understand networks as function composition: `f(x) = layer_n(...layer_2(layer_1(x)))`\n",
|
|
"- Build the Sequential network architecture for composing layers\n",
|
|
"- Create common network patterns like MLPs (Multi-Layer Perceptrons)\n",
|
|
"- Visualize network architectures and understand their capabilities\n",
|
|
"- Master forward pass inference through complete networks\n",
|
|
"\n",
|
|
"## Build → Use → Reflect\n",
|
|
"1. **Build**: Sequential networks that compose layers into complete architectures\n",
|
|
"2. **Use**: Create different network patterns and run inference\n",
|
|
"3. **Reflect**: How architecture design affects network behavior and capability\n",
|
|
"\n",
|
|
"## What You'll Learn\n",
|
|
"By the end of this module, you'll understand:\n",
|
|
"- How simple layers combine to create complex behaviors\n",
|
|
"- The fundamental Sequential architecture pattern\n",
|
|
"- How to build MLPs with any number of layers\n",
|
|
"- Different network architectures (shallow, deep, wide)\n",
|
|
"- How neural networks approximate complex functions"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "85869d04",
|
|
"metadata": {
|
|
"lines_to_next_cell": 1,
|
|
"nbgrader": {
|
|
"grade": false,
|
|
"grade_id": "networks-imports",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": false,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"#| default_exp core.networks\n",
|
|
"\n",
|
|
"#| export\n",
|
|
"import numpy as np\n",
|
|
"import sys\n",
|
|
"import os\n",
|
|
"from typing import List, Union, Optional, Callable\n",
|
|
"import matplotlib.pyplot as plt\n",
|
|
"\n",
|
|
"# Import all the building blocks we need - try package first, then local modules\n",
|
|
"try:\n",
|
|
" from tinytorch.core.tensor import Tensor\n",
|
|
" from tinytorch.core.layers import Dense\n",
|
|
" from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax\n",
|
|
"except ImportError:\n",
|
|
" # For development, import from local modules\n",
|
|
" sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))\n",
|
|
" sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations'))\n",
|
|
" sys.path.append(os.path.join(os.path.dirname(__file__), '..', '03_layers'))\n",
|
|
" from tensor_dev import Tensor\n",
|
|
" from activations_dev import ReLU, Sigmoid, Tanh, Softmax\n",
|
|
" from layers_dev import Dense"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "2a0a2310",
|
|
"metadata": {
|
|
"lines_to_next_cell": 1,
|
|
"nbgrader": {
|
|
"grade": false,
|
|
"grade_id": "networks-setup",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": false,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"#| hide\n",
|
|
"#| export\n",
|
|
"def _should_show_plots():\n",
|
|
" \"\"\"Check if we should show plots (disable during testing)\"\"\"\n",
|
|
" # Check multiple conditions that indicate we're in test mode\n",
|
|
" is_pytest = (\n",
|
|
" 'pytest' in sys.modules or\n",
|
|
" 'test' in sys.argv or\n",
|
|
" os.environ.get('PYTEST_CURRENT_TEST') is not None or\n",
|
|
" any('test' in arg for arg in sys.argv) or\n",
|
|
" any('pytest' in arg for arg in sys.argv)\n",
|
|
" )\n",
|
|
" \n",
|
|
" # Show plots in development mode (when not in test mode)\n",
|
|
" return not is_pytest"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "83de2607",
|
|
"metadata": {
|
|
"nbgrader": {
|
|
"grade": false,
|
|
"grade_id": "networks-welcome",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": false,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"print(\"🔥 TinyTorch Networks Module\")\n",
|
|
"print(f\"NumPy version: {np.__version__}\")\n",
|
|
"print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
|
|
"print(\"Ready to build neural network architectures!\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "eb8d5033",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\""
|
|
},
|
|
"source": [
|
|
"## 📦 Where This Code Lives in the Final Package\n",
|
|
"\n",
|
|
"**Learning Side:** You work in `modules/source/04_networks/networks_dev.py` \n",
|
|
"**Building Side:** Code exports to `tinytorch.core.networks`\n",
|
|
"\n",
|
|
"```python\n",
|
|
"# Final package structure:\n",
|
|
"from tinytorch.core.networks import Sequential, create_mlp # Network architectures!\n",
|
|
"from tinytorch.core.layers import Dense, Conv2D # Building blocks\n",
|
|
"from tinytorch.core.activations import ReLU, Sigmoid, Tanh # Nonlinearity\n",
|
|
"from tinytorch.core.tensor import Tensor # Foundation\n",
|
|
"```\n",
|
|
"\n",
|
|
"**Why this matters:**\n",
|
|
"- **Learning:** Focused modules for deep understanding\n",
|
|
"- **Production:** Proper organization like PyTorch's `torch.nn.Sequential`\n",
|
|
"- **Consistency:** All network architectures live together in `core.networks`\n",
|
|
"- **Integration:** Works seamlessly with layers, activations, and tensors"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d7ef9807",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\""
|
|
},
|
|
"source": [
|
|
"## Step 1: Understanding Neural Networks as Function Composition\n",
|
|
"\n",
|
|
"### What is a Neural Network?\n",
|
|
"A neural network is simply **function composition** - chaining simple functions together to create complex behaviors:\n",
|
|
"\n",
|
|
"```\n",
|
|
"f(x) = f_n(f_{n-1}(...f_2(f_1(x))))\n",
|
|
"```\n",
|
|
"\n",
|
|
"### Real-World Analogy: Assembly Line\n",
|
|
"Think of an assembly line in a factory:\n",
|
|
"- **Input:** Raw materials (data)\n",
|
|
"- **Stations:** Each worker (layer) transforms the product\n",
|
|
"- **Output:** Final product (predictions)\n",
|
|
"\n",
|
|
"### The Power of Composition\n",
|
|
"```python\n",
|
|
"# Simple functions\n",
|
|
"def add_one(x): return x + 1\n",
|
|
"def multiply_two(x): return x * 2\n",
|
|
"def square(x): return x * x\n",
|
|
"\n",
|
|
"# Composed function\n",
|
|
"def complex_function(x):\n",
|
|
" return square(multiply_two(add_one(x)))\n",
|
|
" \n",
|
|
"# This is what neural networks do!\n",
|
|
"```\n",
"\n",
"### Why This Matters\n",
"- **Universal Approximation:** MLPs with enough hidden units can approximate any continuous function on a compact domain\n",
"- **Hierarchical Learning:** Early layers learn simple features, later layers learn complex patterns\n",
|
|
"- **Composability:** Mix and match layers to create custom architectures\n",
|
|
"- **Scalability:** Add more layers or make them wider as needed\n",
|
|
"\n",
|
|
"### From Modules We've Built\n",
|
|
"- **Tensors:** The data containers that flow through networks\n",
|
|
"- **Activations:** The nonlinear transformations that enable complex behaviors\n",
|
|
"- **Layers:** The building blocks that transform data\n",
|
|
"\n",
|
|
"Now let's build our first network architecture!"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "d761b0e8",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\"",
|
|
"lines_to_next_cell": 1
|
|
},
|
|
"source": [
|
|
"## Step 2: Building the Sequential Network\n",
|
|
"\n",
|
|
"### What is Sequential?\n",
|
|
"**Sequential** is the most fundamental network architecture - it applies layers in order:\n",
|
|
"\n",
|
|
"```\n",
|
|
"Sequential([layer1, layer2, layer3]) \n",
|
|
"→ f(x) = layer3(layer2(layer1(x)))\n",
|
|
"```\n",
|
|
"\n",
|
|
"### Why Sequential Matters\n",
|
|
"- **Foundation:** Every neural network library has this pattern\n",
|
|
"- **Simplicity:** Easy to understand and implement\n",
|
|
"- **Flexibility:** Can compose any layers in any order\n",
|
|
"- **Building Block:** Foundation for more complex architectures\n",
|
|
"\n",
|
|
"### The Sequential Pattern\n",
|
|
"```python\n",
|
|
"# PyTorch style\n",
|
|
"model = nn.Sequential(\n",
|
|
" nn.Linear(784, 128),\n",
|
|
" nn.ReLU(),\n",
|
|
" nn.Linear(128, 10)\n",
|
|
")\n",
|
|
"\n",
|
|
"# Our TinyTorch style\n",
|
|
"model = Sequential([\n",
|
|
" Dense(784, 128),\n",
|
|
" ReLU(),\n",
|
|
" Dense(128, 10)\n",
|
|
"])\n",
|
|
"```\n",
|
|
"\n",
|
|
"Let's implement this fundamental architecture!"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "442a13a0",
|
|
"metadata": {
|
|
"lines_to_next_cell": 1,
|
|
"nbgrader": {
|
|
"grade": false,
|
|
"grade_id": "sequential-class",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"#| export\n",
|
|
"class Sequential:\n",
|
|
" \"\"\"\n",
|
|
" Sequential Network: Composes layers in sequence\n",
|
|
" \n",
|
|
" The most fundamental network architecture.\n",
|
|
" Applies layers in order: f(x) = layer_n(...layer_2(layer_1(x)))\n",
|
|
" \"\"\"\n",
|
|
" \n",
|
|
" def __init__(self, layers: Optional[List] = None):\n",
|
|
" \"\"\"\n",
|
|
" Initialize Sequential network with layers.\n",
|
|
" \n",
|
|
" Args:\n",
|
|
" layers: List of layers to compose in order (optional, defaults to empty list)\n",
|
|
" \n",
|
|
" TODO: Store the layers and implement forward pass\n",
|
|
" \n",
|
|
" APPROACH:\n",
|
|
" 1. Store the layers list as an instance variable\n",
|
|
" 2. Initialize empty list if no layers provided\n",
|
|
" 3. Prepare for forward pass implementation\n",
|
|
" \n",
|
|
" EXAMPLE:\n",
|
|
" Sequential([Dense(3,4), ReLU(), Dense(4,2)])\n",
|
|
" creates a 3-layer network: Dense → ReLU → Dense\n",
|
|
" \n",
|
|
" HINTS:\n",
|
|
" - Use self.layers to store the layers\n",
|
|
" - Handle empty initialization case\n",
|
|
" \"\"\"\n",
|
|
" ### BEGIN SOLUTION\n",
|
|
" self.layers = layers if layers is not None else []\n",
|
|
" ### END SOLUTION\n",
|
|
" \n",
|
|
" def forward(self, x: Tensor) -> Tensor:\n",
|
|
" \"\"\"\n",
|
|
" Forward pass through all layers in sequence.\n",
|
|
" \n",
|
|
" Args:\n",
|
|
" x: Input tensor\n",
|
|
" \n",
|
|
" Returns:\n",
|
|
" Output tensor after passing through all layers\n",
|
|
" \n",
|
|
" TODO: Implement sequential forward pass through all layers\n",
|
|
" \n",
|
|
" APPROACH:\n",
|
|
" 1. Start with the input tensor\n",
|
|
" 2. Apply each layer in sequence\n",
|
|
" 3. Each layer's output becomes the next layer's input\n",
|
|
" 4. Return the final output\n",
|
|
" \n",
|
|
" EXAMPLE:\n",
|
|
" Input: Tensor([[1, 2, 3]])\n",
|
|
" Layer1 (Dense): Tensor([[1.4, 2.8]])\n",
|
|
" Layer2 (ReLU): Tensor([[1.4, 2.8]])\n",
|
|
" Layer3 (Dense): Tensor([[0.7]])\n",
|
|
" Output: Tensor([[0.7]])\n",
|
|
" \n",
|
|
" HINTS:\n",
|
|
" - Use a for loop: for layer in self.layers:\n",
|
|
" - Apply each layer: x = layer(x)\n",
|
|
" - The output of one layer becomes input to the next\n",
|
|
" - Return the final result\n",
|
|
" \"\"\"\n",
|
|
" ### BEGIN SOLUTION\n",
|
|
" # Apply each layer in sequence\n",
|
|
" for layer in self.layers:\n",
|
|
" x = layer(x)\n",
|
|
" return x\n",
|
|
" ### END SOLUTION\n",
|
|
" \n",
|
|
" def __call__(self, x: Tensor) -> Tensor:\n",
|
|
" \"\"\"Make the network callable: sequential(x) instead of sequential.forward(x)\"\"\"\n",
|
|
" return self.forward(x)\n",
|
|
" \n",
|
|
" def add(self, layer):\n",
|
|
" \"\"\"Add a layer to the network.\"\"\"\n",
|
|
" self.layers.append(layer)"
|
|
]
|
|
},
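{
"cell_type": "markdown",
"id": "seq-add-sketch",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"### Sketch: Growing a Network with `add`\n",
"\n",
"Besides passing every layer up front, `Sequential` also exposes an `add` method. A minimal sketch of incremental construction, assuming the `Sequential`, `Dense`, and `ReLU` implementations above:\n",
"\n",
"```python\n",
"net = Sequential()        # start empty\n",
"net.add(Dense(3, 4))      # append layers one at a time\n",
"net.add(ReLU())\n",
"net.add(Dense(4, 2))\n",
"\n",
"y = net(Tensor([[1.0, 2.0, 3.0]]))  # same forward pass as before\n",
"```\n",
"\n",
"Building incrementally is handy when the architecture is decided at runtime, e.g. when the number of layers comes from a config file."
]
},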
{
"cell_type": "markdown",
"id": "8d0cb245",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"### 🧪 Unit Test: Sequential Network\n",
"\n",
"Let's test your Sequential network implementation! This is the foundation of all neural network architectures.\n",
"\n",
"**This is a unit test** - it tests one specific class (Sequential network) in isolation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "58bde5f1",
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "test-sequential-immediate",
"locked": true,
"points": 10,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"# Test Sequential network immediately after implementation\n",
"print(\"🔬 Unit Test: Sequential Network...\")\n",
"\n",
"# Create a simple 2-layer network: 3 → 4 → 2\n",
|
|
"try:\n",
|
|
" network = Sequential([\n",
|
|
" Dense(input_size=3, output_size=4),\n",
|
|
" ReLU(),\n",
|
|
" Dense(input_size=4, output_size=2),\n",
|
|
" Sigmoid()\n",
|
|
" ])\n",
|
|
" \n",
|
|
" print(f\"Network created with {len(network.layers)} layers\")\n",
|
|
" print(\"✅ Sequential network creation successful\")\n",
|
|
" \n",
|
|
" # Test with sample data\n",
|
|
" x = Tensor([[1.0, 2.0, 3.0]])\n",
|
|
" print(f\"Input: {x}\")\n",
|
|
" \n",
|
|
" # Forward pass\n",
|
|
" y = network(x)\n",
|
|
" print(f\"Output: {y}\")\n",
|
|
" print(f\"Output shape: {y.shape}\")\n",
|
|
" \n",
|
|
" # Verify the network works\n",
|
|
" assert y.shape == (1, 2), f\"Expected shape (1, 2), got {y.shape}\"\n",
|
|
" print(\"✅ Sequential network produces correct output shape\")\n",
|
|
" \n",
|
|
" # Test that sigmoid output is in valid range\n",
|
|
" assert np.all(y.data >= 0) and np.all(y.data <= 1), \"Sigmoid output should be between 0 and 1\"\n",
|
|
" print(\"✅ Sequential network output is in valid range\")\n",
|
|
" \n",
|
|
" # Test that layers are stored correctly\n",
|
|
" assert len(network.layers) == 4, f\"Expected 4 layers, got {len(network.layers)}\"\n",
|
|
" print(\"✅ Sequential network stores layers correctly\")\n",
|
|
" \n",
|
|
" # Test batch processing\n",
|
|
" x_batch = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])\n",
|
|
" y_batch = network(x_batch)\n",
|
|
" assert y_batch.shape == (2, 2), f\"Expected batch shape (2, 2), got {y_batch.shape}\"\n",
|
|
" print(\"✅ Sequential network handles batch processing\")\n",
|
|
" \n",
|
|
"except Exception as e:\n",
|
|
" print(f\"❌ Sequential network test failed: {e}\")\n",
|
|
" raise\n",
|
|
"\n",
|
|
"# Show the network architecture\n",
|
|
"print(\"🎯 Sequential network behavior:\")\n",
|
|
"print(\" Applies layers in sequence: f(g(h(x)))\")\n",
|
|
"print(\" Input flows through each layer in order\")\n",
|
|
"print(\" Output of layer i becomes input of layer i+1\")\n",
|
|
"print(\"📈 Progress: Sequential network ✓\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "86f50a55",
|
|
"metadata": {
|
|
"cell_marker": "\"\"\"",
|
|
"lines_to_next_cell": 1
|
|
},
|
|
"source": [
|
|
"## Step 3: Building Multi-Layer Perceptrons (MLPs)\n",
|
|
"\n",
|
|
"### What is an MLP?\n",
|
|
"A **Multi-Layer Perceptron** is the classic neural network architecture:\n",
|
|
"\n",
|
|
"```\n",
|
|
"Input → Dense → Activation → Dense → Activation → ... → Dense → Output\n",
|
|
"```\n",
|
|
"\n",
|
|
"### Why MLPs are Important\n",
|
|
"- **Universal approximation**: Can approximate any continuous function\n",
|
|
"- **Foundation**: Basis for understanding all neural networks\n",
|
|
"- **Versatile**: Works for classification, regression, and more\n",
|
|
"- **Simple**: Easy to understand and implement\n",
|
|
"\n",
|
|
"### MLP Architecture Pattern\n",
|
|
"```\n",
|
|
"create_mlp(3, [4, 2], 1) creates:\n",
|
|
"Dense(3→4) → ReLU → Dense(4→2) → ReLU → Dense(2→1) → Sigmoid\n",
|
|
"```\n",
|
|
"\n",
|
|
"### Real-World Applications\n",
|
|
"- **Tabular data**: Customer analytics, financial modeling\n",
|
|
"- **Feature learning**: Learning representations from raw data\n",
|
|
"- **Classification**: Spam detection, medical diagnosis\n",
|
|
"- **Regression**: Price prediction, time series forecasting\n",
|
|
"\n",
|
|
"### The MLP Factory Pattern\n",
|
|
"Instead of manually creating each layer, we'll build a function that creates MLPs automatically!"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "39d18c19",
|
|
"metadata": {
|
|
"lines_to_next_cell": 1,
|
|
"nbgrader": {
|
|
"grade": false,
|
|
"grade_id": "create-mlp",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": true,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"#| export\n",
|
|
"def create_mlp(input_size: int, hidden_sizes: List[int], output_size: int, \n",
|
|
" activation=ReLU, output_activation=Sigmoid) -> Sequential:\n",
|
|
" \"\"\"\n",
|
|
" Create a Multi-Layer Perceptron (MLP) network.\n",
|
|
" \n",
|
|
" Args:\n",
|
|
" input_size: Number of input features\n",
|
|
" hidden_sizes: List of hidden layer sizes\n",
|
|
" output_size: Number of output features\n",
|
|
" activation: Activation function for hidden layers (default: ReLU)\n",
|
|
" output_activation: Activation function for output layer (default: Sigmoid)\n",
|
|
" \n",
|
|
" Returns:\n",
|
|
" Sequential network with MLP architecture\n",
|
|
" \n",
|
|
" TODO: Implement MLP creation with alternating Dense and activation layers.\n",
|
|
" \n",
|
|
" APPROACH:\n",
|
|
" 1. Start with an empty list of layers\n",
|
|
" 2. Add layers in this pattern:\n",
|
|
" - Dense(input_size → first_hidden_size)\n",
|
|
" - Activation()\n",
|
|
" - Dense(first_hidden_size → second_hidden_size)\n",
|
|
" - Activation()\n",
|
|
" - ...\n",
|
|
" - Dense(last_hidden_size → output_size)\n",
|
|
" - Output_activation()\n",
|
|
" 3. Return Sequential(layers)\n",
|
|
" \n",
|
|
" EXAMPLE:\n",
|
|
" create_mlp(3, [4, 2], 1) creates:\n",
|
|
" Dense(3→4) → ReLU → Dense(4→2) → ReLU → Dense(2→1) → Sigmoid\n",
|
|
" \n",
|
|
" HINTS:\n",
|
|
" - Start with layers = []\n",
|
|
" - Track current_size starting with input_size\n",
|
|
" - For each hidden_size: add Dense(current_size, hidden_size), then activation\n",
|
|
" - Finally add Dense(last_hidden_size, output_size), then output_activation\n",
|
|
" - Return Sequential(layers)\n",
|
|
" \"\"\"\n",
|
|
" layers = []\n",
|
|
" current_size = input_size\n",
|
|
" \n",
|
|
" # Add hidden layers with activations\n",
|
|
" for hidden_size in hidden_sizes:\n",
|
|
" layers.append(Dense(current_size, hidden_size))\n",
|
|
" layers.append(activation())\n",
|
|
" current_size = hidden_size\n",
|
|
" \n",
|
|
" # Add output layer with output activation\n",
|
|
" layers.append(Dense(current_size, output_size))\n",
|
|
" layers.append(output_activation())\n",
|
|
" \n",
|
|
" return Sequential(layers)"
|
|
]
|
|
},
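{
"cell_type": "markdown",
"id": "mlp-arith-sketch",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"### Checking the Arithmetic: Layers and Parameters\n",
"\n",
"A quick sanity check on `create_mlp`: with `H` hidden layers it builds `H + 1` Dense layers, `H` hidden activations, and one output activation, so `len(layers) == 2 * (H + 1)`. Assuming each `Dense(n, m)` holds `n*m` weights plus `m` biases, a hedged sketch of the counts for `create_mlp(3, [4, 2], 1)`:\n",
"\n",
"```python\n",
"mlp = create_mlp(input_size=3, hidden_sizes=[4, 2], output_size=1)\n",
"assert len(mlp.layers) == 2 * (2 + 1)  # 6 layers for H = 2 hidden layers\n",
"\n",
"# Parameter count by hand: Dense(3→4) + Dense(4→2) + Dense(2→1)\n",
"params = (3*4 + 4) + (4*2 + 2) + (2*1 + 1)\n",
"print(params)  # 29 trainable parameters\n",
"```\n",
"\n",
"Counting parameters this way is a good habit: they grow with the product of adjacent layer widths, so wide layers dominate a network's memory footprint."
]
},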
{
"cell_type": "markdown",
"id": "10852ab7",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"### 🧪 Unit Test: MLP Creation\n",
"\n",
"Let's test your MLP creation function! This builds complete neural networks with a single function call.\n",
"\n",
"**This is a unit test** - it tests one specific function (create_mlp) in isolation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f8a67516",
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "test-mlp-immediate",
"locked": true,
"points": 10,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"# Test MLP creation immediately after implementation\n",
"print(\"🔬 Unit Test: MLP Creation...\")\n",
"\n",
"# Create a simple MLP: 3 → 4 → 2 → 1\n",
"try:\n",
"    mlp = create_mlp(input_size=3, hidden_sizes=[4, 2], output_size=1)\n",
"\n",
"    print(f\"MLP created with {len(mlp.layers)} layers\")\n",
"    print(\"✅ MLP creation successful\")\n",
"\n",
"    # Test the structure - should have 6 layers: Dense, ReLU, Dense, ReLU, Dense, Sigmoid\n",
"    expected_layers = 6  # 3 Dense + 2 ReLU + 1 Sigmoid\n",
"    assert len(mlp.layers) == expected_layers, f\"Expected {expected_layers} layers, got {len(mlp.layers)}\"\n",
"    print(\"✅ MLP has correct number of layers\")\n",
"\n",
"    # Test layer types\n",
"    layer_types = [type(layer).__name__ for layer in mlp.layers]\n",
"    expected_pattern = ['Dense', 'ReLU', 'Dense', 'ReLU', 'Dense', 'Sigmoid']\n",
"    assert layer_types == expected_pattern, f\"Expected pattern {expected_pattern}, got {layer_types}\"\n",
"    print(\"✅ MLP follows correct layer pattern\")\n",
"\n",
"    # Test with sample data\n",
"    x = Tensor([[1.0, 2.0, 3.0]])\n",
"    y = mlp(x)\n",
"    print(f\"MLP input: {x}\")\n",
"    print(f\"MLP output: {y}\")\n",
"    print(f\"MLP output shape: {y.shape}\")\n",
"\n",
"    # Verify the output\n",
"    assert y.shape == (1, 1), f\"Expected shape (1, 1), got {y.shape}\"\n",
"    print(\"✅ MLP produces correct output shape\")\n",
"\n",
"    # Test that sigmoid output is in valid range\n",
"    assert np.all(y.data >= 0) and np.all(y.data <= 1), \"Sigmoid output should be between 0 and 1\"\n",
"    print(\"✅ MLP output is in valid range\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"❌ MLP creation test failed: {e}\")\n",
"    raise\n",
"\n",
"# Test different architectures\n",
"try:\n",
"    # Test shallow network\n",
"    shallow_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1)\n",
"    assert len(shallow_net.layers) == 4, f\"Shallow network should have 4 layers, got {len(shallow_net.layers)}\"\n",
"\n",
"    # Test deep network\n",
"    deep_net = create_mlp(input_size=3, hidden_sizes=[4, 4, 4], output_size=1)\n",
"    assert len(deep_net.layers) == 8, f\"Deep network should have 8 layers, got {len(deep_net.layers)}\"\n",
"\n",
"    # Test wide network\n",
"    wide_net = create_mlp(input_size=3, hidden_sizes=[10], output_size=1)\n",
"    assert len(wide_net.layers) == 4, f\"Wide network should have 4 layers, got {len(wide_net.layers)}\"\n",
"\n",
"    print(\"✅ Different MLP architectures work correctly\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"❌ MLP architecture test failed: {e}\")\n",
"    raise\n",
"\n",
"# Show the MLP pattern\n",
"print(\"🎯 MLP creation pattern:\")\n",
"print(\"   Input → Dense → Activation → Dense → Activation → ... → Dense → Output_Activation\")\n",
"print(\"   Automatically creates the complete architecture\")\n",
"print(\"   Handles any number of hidden layers\")\n",
"print(\"📈 Progress: Sequential network ✓, MLP creation ✓\")"
]
},
{
"cell_type": "markdown",
"id": "67d76916",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## Step 4: Understanding Network Architectures\n",
"\n",
"### Architecture Patterns\n",
"Different network architectures solve different problems:\n",
"\n",
"#### **Shallow vs Deep Networks**\n",
"```python\n",
"# Shallow: 1 hidden layer\n",
"shallow = create_mlp(10, [20], 1)\n",
"\n",
"# Deep: Many hidden layers\n",
"deep = create_mlp(10, [20, 20, 20], 1)\n",
"```\n",
"\n",
"#### **Narrow vs Wide Networks**\n",
"```python\n",
"# Narrow: Few neurons per layer\n",
"narrow = create_mlp(10, [5, 5], 1)\n",
"\n",
"# Wide: Many neurons per layer\n",
"wide = create_mlp(10, [50], 1)\n",
"```\n",
"\n",
"### Why Architecture Matters\n",
"- **Capacity:** More parameters can learn more complex patterns\n",
"- **Depth:** Enables hierarchical feature learning\n",
"- **Width:** Allows parallel processing of features\n",
"- **Efficiency:** Balance between performance and computation\n",
"\n",
"### Different Activation Functions\n",
" ```python\n",
|
|
"# ReLU networks (most common)\n",
|
|
"relu_net = create_mlp(10, [20], 1, activation=ReLU)\n",
|
|
" \n",
|
|
"# Tanh networks (centered around 0)\n",
|
|
"tanh_net = create_mlp(10, [20], 1, activation=Tanh)\n",
|
|
" \n",
|
|
"# Multi-class classification\n",
|
|
"classifier = create_mlp(10, [20], 3, output_activation=Softmax)\n",
|
|
" ```\n",
|
|
"\n",
|
|
"Let's test different architectures!"
|
|
]
|
|
},
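{
"cell_type": "markdown",
"id": "capacity-sketch",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"### Sketch: Comparing Capacity\n",
"\n",
"To make the capacity claim concrete, here is a hedged back-of-the-envelope comparison of the three patterns above, counting each `Dense(n, m)` as `n*m + m` parameters:\n",
"\n",
"```python\n",
"# shallow = create_mlp(10, [20], 1):      Dense(10→20) + Dense(20→1)\n",
"print((10*20 + 20) + (20*1 + 1))          # 241 parameters\n",
"\n",
"# deep = create_mlp(10, [20, 20, 20], 1): three hidden layers\n",
"print((10*20 + 20) + 2*(20*20 + 20) + (20*1 + 1))  # 1081 parameters\n",
"\n",
"# wide = create_mlp(10, [50], 1):         one big hidden layer\n",
"print((10*50 + 50) + (50*1 + 1))          # 601 parameters\n",
"```\n",
"\n",
"Depth adds parameters between hidden layers; width adds them at the input and output boundaries. Neither is free, and the right balance depends on the problem."
]
},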
{
"cell_type": "markdown",
"id": "ed72681d",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"### 🧪 Unit Test: Architecture Variations\n",
"\n",
"Let's test different network architectures to understand their behavior.\n",
"\n",
"**This is a unit test** - it tests architectural variations in isolation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9daa8d0d",
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "test-architectures",
"locked": true,
"points": 10,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"# Test different architectures\n",
"print(\"🔬 Unit Test: Network Architecture Variations...\")\n",
"\n",
"try:\n",
"    # Test different activation functions\n",
"    relu_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1, activation=ReLU)\n",
"    tanh_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1, activation=Tanh)\n",
"\n",
"    # Test different output activations\n",
"    classifier = create_mlp(input_size=3, hidden_sizes=[4], output_size=3, output_activation=Softmax)\n",
"\n",
"    # Test with sample data\n",
"    x = Tensor([[1.0, 2.0, 3.0]])\n",
"\n",
"    # Test ReLU network\n",
"    y_relu = relu_net(x)\n",
"    assert y_relu.shape == (1, 1), \"ReLU network should work\"\n",
"    print(\"✅ ReLU network works correctly\")\n",
"\n",
"    # Test Tanh network\n",
"    y_tanh = tanh_net(x)\n",
"    assert y_tanh.shape == (1, 1), \"Tanh network should work\"\n",
"    print(\"✅ Tanh network works correctly\")\n",
"\n",
"    # Test multi-class classifier\n",
"    y_multi = classifier(x)\n",
"    assert y_multi.shape == (1, 3), \"Multi-class classifier should work\"\n",
"\n",
"    # Check softmax properties\n",
"    assert abs(np.sum(y_multi.data) - 1.0) < 1e-6, \"Softmax outputs should sum to 1\"\n",
"    print(\"✅ Multi-class classifier with Softmax works correctly\")\n",
"\n",
"    # Test different architectures\n",
"    shallow = create_mlp(input_size=4, hidden_sizes=[5], output_size=1)\n",
"    deep = create_mlp(input_size=4, hidden_sizes=[5, 5, 5], output_size=1)\n",
"    wide = create_mlp(input_size=4, hidden_sizes=[20], output_size=1)\n",
"\n",
"    x_test = Tensor([[1.0, 2.0, 3.0, 4.0]])\n",
"\n",
"    # Test all architectures\n",
"    for name, net in [(\"Shallow\", shallow), (\"Deep\", deep), (\"Wide\", wide)]:\n",
"        y = net(x_test)\n",
"        assert y.shape == (1, 1), f\"{name} network should produce correct shape\"\n",
"        print(f\"✅ {name} network works correctly\")\n",
"\n",
"    print(\"✅ All network architectures work correctly\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"❌ Architecture test failed: {e}\")\n",
"    raise\n",
"\n",
"print(\"🎯 Architecture insights:\")\n",
"print(\"   Different activations create different behaviors\")\n",
"print(\"   Softmax enables multi-class classification\")\n",
"print(\"   Architecture affects network capacity and learning\")\n",
"print(\"📈 Progress: Sequential ✓, MLP creation ✓, Architecture variations ✓\")"
]
},
{
"cell_type": "markdown",
"id": "8df67be5",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## Step 5: Comprehensive Test - Complete Network Applications\n",
"\n",
"### Real-World Network Applications\n",
"Let's test our networks on realistic scenarios:\n",
"\n",
"#### **Classification Problem**\n",
"```python\n",
"# 4 features → 2 classes (binary classification)\n",
"classifier = create_mlp(4, [8, 4], 2, output_activation=Softmax)\n",
"```\n",
"\n",
"#### **Regression Problem**\n",
"```python\n",
"# 3 features → 1 continuous output\n",
"regressor = create_mlp(3, [10, 5], 1, output_activation=lambda: Dense(0, 0)) # Linear output\n",
|
|
"```\n",
|
|
"\n",
|
|
"#### **Deep Learning Pattern**\n",
|
|
"```python\n",
|
|
"# Complex feature learning\n",
|
|
"deep_net = create_mlp(10, [64, 32, 16], 1)\n",
|
|
"```\n",
|
|
"\n",
|
|
"This comprehensive test ensures our networks work for real ML applications!"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "011cd928",
|
|
"metadata": {
|
|
"lines_to_next_cell": 1,
|
|
"nbgrader": {
|
|
"grade": true,
|
|
"grade_id": "test-integration",
|
|
"locked": true,
|
|
"points": 15,
|
|
"schema_version": 3,
|
|
"solution": false,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"# Comprehensive test - complete network applications\n",
|
|
"print(\"🔬 Comprehensive Test: Complete Network Applications...\")\n",
|
|
"\n",
|
|
"try:\n",
|
|
" # Test 1: Multi-class Classification (Iris-like dataset)\n",
|
|
" print(\"\\n1. Multi-class Classification Test:\")\n",
|
|
" iris_classifier = create_mlp(input_size=4, hidden_sizes=[8, 6], output_size=3, output_activation=Softmax)\n",
|
|
" \n",
|
|
" # Simulate iris features: [sepal_length, sepal_width, petal_length, petal_width]\n",
|
|
" iris_samples = Tensor([\n",
|
|
" [5.1, 3.5, 1.4, 0.2], # Setosa\n",
|
|
" [7.0, 3.2, 4.7, 1.4], # Versicolor\n",
|
|
" [6.3, 3.3, 6.0, 2.5] # Virginica\n",
|
|
" ])\n",
|
|
" \n",
|
|
" iris_predictions = iris_classifier(iris_samples)\n",
|
|
" assert iris_predictions.shape == (3, 3), \"Iris classifier should output 3 classes for 3 samples\"\n",
|
|
" \n",
|
|
" # Check softmax properties\n",
|
|
" row_sums = np.sum(iris_predictions.data, axis=1)\n",
|
|
" assert np.allclose(row_sums, 1.0), \"Each prediction should sum to 1\"\n",
|
|
" print(\"✅ Multi-class classification works correctly\")\n",
|
|
" \n",
|
|
" # Test 2: Regression Task (Housing prices)\n",
|
|
" print(\"\\n2. Regression Task Test:\")\n",
|
|
" # Create a regressor without final activation (linear output)\n",
|
|
" class Identity:\n",
|
|
" def __call__(self, x): return x\n",
|
|
" \n",
|
|
" housing_regressor = create_mlp(input_size=3, hidden_sizes=[10, 5], output_size=1, output_activation=Identity)\n",
|
|
" \n",
|
|
" # Simulate housing features: [size, bedrooms, location_score]\n",
|
|
" housing_samples = Tensor([\n",
|
|
" [2000, 3, 8.5], # Large house, good location\n",
|
|
" [1200, 2, 6.0], # Medium house, ok location\n",
|
|
" [800, 1, 4.0] # Small house, poor location\n",
|
|
" ])\n",
|
|
" \n",
|
|
" housing_predictions = housing_regressor(housing_samples)\n",
|
|
" assert housing_predictions.shape == (3, 1), \"Housing regressor should output 1 value per sample\"\n",
|
|
" print(\"✅ Regression task works correctly\")\n",
|
|
" \n",
|
|
" # Test 3: Deep Network Performance\n",
|
|
" print(\"\\n3. Deep Network Test:\")\n",
|
|
" deep_network = create_mlp(input_size=10, hidden_sizes=[20, 15, 10, 5], output_size=1)\n",
|
|
" \n",
|
|
" # Test with realistic batch size\n",
|
|
" batch_data = Tensor(np.random.randn(32, 10)) # 32 samples, 10 features\n",
|
|
" deep_predictions = deep_network(batch_data)\n",
|
|
" \n",
|
|
" assert deep_predictions.shape == (32, 1), \"Deep network should handle batch processing\"\n",
|
|
" assert not np.any(np.isnan(deep_predictions.data)), \"Deep network should not produce NaN\"\n",
|
|
" print(\"✅ Deep network handles batch processing correctly\")\n",
|
|
" \n",
|
|
" # Test 4: Network Composition\n",
|
|
" print(\"\\n4. Network Composition Test:\")\n",
|
|
" # Create a feature extractor and classifier separately\n",
|
|
" feature_extractor = Sequential([\n",
|
|
" Dense(input_size=10, output_size=5),\n",
|
|
" ReLU(),\n",
|
|
" Dense(input_size=5, output_size=3),\n",
|
|
" ReLU()\n",
|
|
" ])\n",
|
|
" \n",
|
|
" classifier_head = Sequential([\n",
|
|
" Dense(input_size=3, output_size=2),\n",
|
|
" Softmax()\n",
|
|
" ])\n",
|
|
" \n",
|
|
" # Test composition\n",
|
|
" raw_data = Tensor(np.random.randn(5, 10))\n",
|
|
" features = feature_extractor(raw_data)\n",
|
|
" final_predictions = classifier_head(features)\n",
|
|
" \n",
|
|
" assert features.shape == (5, 3), \"Feature extractor should output 3 features\"\n",
|
|
" assert final_predictions.shape == (5, 2), \"Classifier should output 2 classes\"\n",
|
|
" \n",
|
|
" row_sums = np.sum(final_predictions.data, axis=1)\n",
|
|
" assert np.allclose(row_sums, 1.0), \"Composed network predictions should be valid\"\n",
|
|
" print(\"✅ Network composition works correctly\")\n",
|
|
" \n",
|
|
" print(\"\\n🎉 Comprehensive test passed! Your networks work correctly for:\")\n",
|
|
" print(\" • Multi-class classification (Iris flowers)\")\n",
|
|
" print(\" • Regression tasks (housing prices)\")\n",
|
|
" print(\" • Deep learning architectures\")\n",
|
|
" print(\" • Network composition and feature extraction\")\n",
|
|
"\n",
|
|
"except Exception as e:\n",
|
|
" print(f\"❌ Comprehensive test failed: {e}\")\n",
|
|
"\n",
|
|
"print(\"📈 Final Progress: Complete network architectures ready for real ML applications!\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "cb906ad7",
|
|
"metadata": {
|
|
"lines_to_next_cell": 1,
|
|
"nbgrader": {
|
|
"grade": false,
|
|
"grade_id": "networks-compatibility",
|
|
"locked": false,
|
|
"schema_version": 3,
|
|
"solution": false,
|
|
"task": false
|
|
}
|
|
},
|
|
"outputs": [],
|
|
"source": [
|
|
"#| export\n",
|
|
"class MLP:\n",
|
|
" \"\"\"\n",
|
|
" Multi-Layer Perceptron (MLP) class.\n",
|
|
" \n",
|
|
" A convenient wrapper around Sequential networks for standard MLP architectures.\n",
|
|
" Maintains parameter information and provides a clean interface.\n",
|
|
" \n",
|
|
" Args:\n",
|
|
" input_size: Number of input features\n",
|
|
" hidden_size: Size of the single hidden layer\n",
|
|
" output_size: Number of output features\n",
|
|
" activation: Activation function for hidden layer (default: ReLU)\n",
|
|
" output_activation: Activation function for output layer (default: Sigmoid)\n",
|
|
" \"\"\"\n",
|
|
" \n",
|
|
" def __init__(self, input_size: int, hidden_size: int, output_size: int, \n",
|
|
" activation=ReLU, output_activation=None):\n",
|
|
" self.input_size = input_size\n",
|
|
" self.hidden_size = hidden_size\n",
|
|
" self.output_size = output_size\n",
|
|
" \n",
|
|
" # Build the network layers\n",
|
|
" layers = []\n",
|
|
" \n",
|
|
" # Input to hidden layer\n",
|
|
" layers.append(Dense(input_size, hidden_size))\n",
|
|
" layers.append(activation())\n",
|
|
" \n",
|
|
" # Hidden to output layer\n",
|
|
" layers.append(Dense(hidden_size, output_size))\n",
|
|
" if output_activation is not None:\n",
|
|
" layers.append(output_activation())\n",
|
|
" \n",
|
|
" self.network = Sequential(layers)\n",
|
|
" \n",
|
|
" def forward(self, x):\n",
|
|
" \"\"\"Forward pass through the MLP network.\"\"\"\n",
|
|
" return self.network.forward(x)\n",
|
|
" \n",
|
|
" def __call__(self, x):\n",
|
|
" \"\"\"Make the MLP callable.\"\"\"\n",
|
|
" return self.forward(x)"
|
|
]
|
|
},
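{
"cell_type": "markdown",
"id": "mlp-wrapper-sketch",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"### Sketch: Using the MLP Wrapper\n",
"\n",
"`MLP` is a thin convenience over `Sequential` for single-hidden-layer networks. A minimal usage sketch, assuming the implementations above:\n",
"\n",
"```python\n",
"model = MLP(input_size=3, hidden_size=4, output_size=2, output_activation=Sigmoid)\n",
"y = model(Tensor([[1.0, 2.0, 3.0]]))\n",
"print(y.shape)                    # (1, 2)\n",
"print(len(model.network.layers))  # 4: Dense, ReLU, Dense, Sigmoid\n",
"```\n",
"\n",
"With `output_activation=None` (the default) the final layer stays linear, which is the usual choice for regression."
]
},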
{
"cell_type": "code",
"execution_count": null,
"id": "7c811b23",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"\n",
|
|
"def test_sequential_networks():\n",
|
|
" \"\"\"Test Sequential network implementation comprehensively.\"\"\"\n",
|
|
" print(\"🔬 Unit Test: Sequential Networks...\")\n",
|
|
" \n",
|
|
" # Test basic Sequential network\n",
|
|
" net = Sequential([\n",
|
|
" Dense(input_size=3, output_size=4),\n",
|
|
" ReLU(),\n",
|
|
" Dense(input_size=4, output_size=2),\n",
|
|
" Sigmoid()\n",
|
|
" ])\n",
|
|
" \n",
|
|
" x = Tensor([[1.0, 2.0, 3.0]])\n",
|
|
" y = net(x)\n",
|
|
" \n",
|
|
" assert y.shape == (1, 2), \"Sequential network should produce correct output shape\"\n",
|
|
" assert np.all(y.data > 0), \"Sigmoid output should be positive\"\n",
|
|
" assert np.all(y.data < 1), \"Sigmoid output should be less than 1\"\n",
|
|
" \n",
|
|
" print(\"✅ Sequential networks work correctly\")\n",
|
|
"\n",
|
|
"def test_mlp_creation():\n",
|
|
" \"\"\"Test MLP creation function comprehensively.\"\"\"\n",
|
|
" print(\"🔬 Unit Test: MLP Creation...\")\n",
|
|
" \n",
|
|
" # Test different MLP architectures\n",
|
|
" shallow = create_mlp(input_size=4, hidden_sizes=[5], output_size=1)\n",
|
|
" deep = create_mlp(input_size=4, hidden_sizes=[8, 6, 4], output_size=2)\n",
|
|
" \n",
|
|
" x = Tensor([[1.0, 2.0, 3.0, 4.0]])\n",
|
|
" \n",
|
|
" # Test shallow network\n",
|
|
" y_shallow = shallow(x)\n",
|
|
" assert y_shallow.shape == (1, 1), \"Shallow MLP should work\"\n",
|
|
" \n",
|
|
" # Test deep network \n",
|
|
" y_deep = deep(x)\n",
|
|
" assert y_deep.shape == (1, 2), \"Deep MLP should work\"\n",
|
|
" \n",
|
|
" print(\"✅ MLP creation works correctly\")\n",
|
|
"\n",
|
|
"def test_network_architectures():\n",
|
|
" \"\"\"Test different network architectures comprehensively.\"\"\"\n",
|
|
" print(\"🔬 Unit Test: Network Architectures...\")\n",
|
|
" \n",
|
|
" # Test different activation functions\n",
|
|
" relu_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1, activation=ReLU)\n",
|
|
" tanh_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1, activation=Tanh)\n",
|
|
" \n",
|
|
" # Test multi-class classifier\n",
|
|
" classifier = create_mlp(input_size=3, hidden_sizes=[4], output_size=3, output_activation=Softmax)\n",
|
|
" \n",
|
|
" x = Tensor([[1.0, 2.0, 3.0]])\n",
|
|
" \n",
|
|
" # Test all architectures\n",
|
|
" y_relu = relu_net(x)\n",
|
|
" y_tanh = tanh_net(x)\n",
|
|
" y_multi = classifier(x)\n",
|
|
" \n",
|
|
" assert y_relu.shape == (1, 1), \"ReLU network should work\"\n",
|
|
" assert y_tanh.shape == (1, 1), \"Tanh network should work\"\n",
|
|
" assert y_multi.shape == (1, 3), \"Multi-class classifier should work\"\n",
|
|
" assert abs(np.sum(y_multi.data) - 1.0) < 1e-6, \"Softmax outputs should sum to 1\"\n",
|
|
" \n",
|
|
" print(\"✅ Network architectures work correctly\")\n",
|
|
"\n",
|
|
"def test_networks():\n",
|
|
" \"\"\"Test network comprehensive testing with real ML scenarios.\"\"\"\n",
|
|
" print(\"🔬 Comprehensive Test: Network Applications...\")\n",
|
|
" \n",
|
|
" # Test multi-class classification\n",
|
|
" iris_classifier = create_mlp(input_size=4, hidden_sizes=[8, 6], output_size=3, output_activation=Softmax)\n",
|
|
" iris_samples = Tensor([[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4], [6.3, 3.3, 6.0, 2.5]])\n",
|
|
" iris_predictions = iris_classifier(iris_samples)\n",
|
|
" \n",
|
|
" assert iris_predictions.shape == (3, 3), \"Iris classifier should work\"\n",
|
|
" row_sums = np.sum(iris_predictions.data, axis=1)\n",
|
|
" assert np.allclose(row_sums, 1.0), \"Predictions should sum to 1\""
|
|
]
},
{
"cell_type": "markdown",
"id": "d8035240",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 🧪 Module Testing\n",
"\n",
"Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
"\n",
"**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "001981f5",
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "standardized-testing",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"# =============================================================================\n",
"# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
"# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
"# =============================================================================\n",
"\n",
"if __name__ == \"__main__\":\n",
"    from tito.tools.testing import run_module_tests_auto\n",
"\n",
"    # Automatically discover and run all tests in this module\n",
"    success = run_module_tests_auto(\"Networks\")"
]
},
{
"cell_type": "markdown",
"id": "5ee20f1b",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 🎯 Module Summary: Neural Network Architectures Mastery!\n",
"\n",
"Congratulations! You've successfully implemented complete neural network architectures:\n",
"\n",
"### What You've Accomplished\n",
"✅ **Sequential Networks**: Chained layers for complex transformations  \n",
"✅ **MLP Creation**: Multi-layer perceptrons with flexible architectures  \n",
"✅ **Network Architectures**: Different activation patterns and output types  \n",
"✅ **Integration**: Real-world applications like classification and regression\n",
"\n",
"### Key Concepts You've Learned\n",
"- **Sequential Processing**: How layers chain together for complex functions\n",
"- **MLP Design**: Multi-layer perceptrons as universal function approximators\n",
"- **Architecture Choices**: How depth, width, and activations affect learning\n",
"- **Real Applications**: Classification, regression, and feature extraction\n",
"\n",
"### Next Steps\n",
"1. **Export your code**: `tito package nbdev --export 04_networks`\n",
"2. **Test your implementation**: `tito test 04_networks`\n",
"3. **Build complete models**: Combine with training for full ML pipelines\n",
"4. **Move to Module 5**: Add convolutional layers for image processing!\n",
"\n",
"**Ready for CNNs?** Your network foundations are now ready for specialized architectures!"
]
}
],
"metadata": {
"jupytext": {
"main_language": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}