TinyTorch/modules/layers/layers_dev.ipynb

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "---\n",
        "jupyter:\n",
        "  jupytext:\n",
        "    text_representation:\n",
        "      extension: .py\n",
        "      format_name: percent\n",
        "      format_version: '1.3'\n",
        "      jupytext_version: 1.17.1\n",
        "---\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "\"\"\"\n",
        "# Module 2: Layers - Neural Network Building Blocks\n",
        "\n",
        "Welcome to the Layers module! This is where neural networks begin. You'll implement the fundamental building blocks that transform tensors.\n",
        "\n",
        "## Learning Goals\n",
        "- Understand layers as functions that transform tensors: `y = f(x)`\n",
        "- Implement Dense layers with linear transformations: `y = Wx + b`\n",
        "- Use activation functions from the activations module for nonlinearity\n",
        "- See how neural networks are just function composition\n",
        "- Build intuition before diving into training\n",
        "\n",
        "## Build \u2192 Use \u2192 Understand\n",
        "1. **Build**: Dense layers using activation functions as building blocks\n",
        "2. **Use**: Transform tensors and see immediate results\n",
        "3. **Understand**: How neural networks transform information\n",
        "\n",
        "## Module Dependencies\n",
        "This module builds on the **activations** module:\n",
        "- **activations** \u2192 **layers** \u2192 **networks**\n",
        "- Clean separation of concerns: math functions \u2192 layer building blocks \u2192 full networks\n",
        "\n",
        "## Module \u2192 Package Structure\n",
        "**\ud83c\udf93 Teaching vs. \ud83d\udd27 Building**: \n",
        "- **Learning side**: Work in `modules/layers/layers_dev.py`  \n",
        "- **Building side**: Exports to `tinytorch/core/layers.py`\n",
        "\n",
        "This module builds the fundamental transformations that compose into neural networks.\n",
        "\"\"\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#| default_exp core.layers\n",
        "\n",
        "# Setup and imports\n",
        "import numpy as np\n",
        "import sys\n",
        "from typing import Union, Optional, Callable\n",
        "import math"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#| export\n",
        "import numpy as np\n",
        "import math\n",
        "import sys\n",
        "from typing import Union, Optional, Callable\n",
        "from tinytorch.core.tensor import Tensor\n",
        "\n",
        "# Import activation functions from the activations module\n",
        "from tinytorch.core.activations import ReLU, Sigmoid, Tanh\n",
        "\n",
        "# Import our Tensor class\n",
        "# sys.path.append('../../')\n",
        "# from modules.tensor.tensor_dev import Tensor\n",
        "\n",
        "# print(\"\ud83d\udd25 TinyTorch Layers Module\")\n",
        "# print(f\"NumPy version: {np.__version__}\")\n",
        "# print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
        "# print(\"Ready to build neural network layers!\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "\"\"\"\n",
        "## Step 1: What is a Layer?\n",
        "\n",
        "A **layer** is a function that transforms tensors. Think of it as:\n",
        "- **Input**: Tensor with some shape\n",
        "- **Transformation**: Mathematical operation (linear, nonlinear, etc.)\n",
        "- **Output**: Tensor with possibly different shape\n",
        "\n",
        "**The fundamental insight**: Neural networks are just function composition!\n",
        "```\n",
        "x \u2192 Layer1 \u2192 Layer2 \u2192 Layer3 \u2192 y\n",
        "```\n",
        "\n",
        "**Why layers matter**:\n",
        "- They're the building blocks of all neural networks\n",
        "- Each layer learns a different transformation\n",
        "- Composing layers creates complex functions\n",
        "- Understanding layers = understanding neural networks\n",
        "\n",
        "Let's start with the most important layer: **Dense** (also called Linear or Fully Connected).\n",
        "\"\"\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#| export\n",
        "class Dense:\n",
        "    \"\"\"\n",
        "    Dense (Linear) Layer: y = Wx + b\n",
        "    \n",
        "    The fundamental building block of neural networks.\n",
        "    Performs linear transformation: matrix multiplication + bias addition.\n",
        "    \n",
        "    Args:\n",
        "        input_size: Number of input features\n",
        "        output_size: Number of output features\n",
        "        use_bias: Whether to include bias term (default: True)\n",
        "        \n",
        "    TODO: Implement the Dense layer with weight initialization and forward pass.\n",
        "    \"\"\"\n",
        "    \n",
        "    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):\n",
        "        \"\"\"\n",
        "        Initialize Dense layer with random weights.\n",
        "        \n",
        "        TODO: \n",
        "        1. Store layer parameters (input_size, output_size, use_bias)\n",
        "        2. Initialize weights with small random values\n",
        "        3. Initialize bias to zeros (if use_bias=True)\n",
        "        \"\"\"\n",
        "        raise NotImplementedError(\"Student implementation required\")\n",
        "    \n",
        "    def forward(self, x: Tensor) -> Tensor:\n",
        "        \"\"\"\n",
        "        Forward pass: y = Wx + b\n",
        "        \n",
        "        Args:\n",
        "            x: Input tensor of shape (batch_size, input_size)\n",
        "            \n",
        "        Returns:\n",
        "            Output tensor of shape (batch_size, output_size)\n",
        "            \n",
        "        TODO: Implement matrix multiplication and bias addition\n",
        "        \"\"\"\n",
        "        raise NotImplementedError(\"Student implementation required\")\n",
        "    \n",
        "    def __call__(self, x: Tensor) -> Tensor:\n",
        "        \"\"\"Make layer callable: layer(x) same as layer.forward(x)\"\"\"\n",
        "        return self.forward(x)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "#| hide\n",
        "#| export\n",
        "class Dense:\n",
        "    \"\"\"\n",
        "    Dense (Linear) Layer: y = Wx + b\n",
        "    \n",
        "    The fundamental building block of neural networks.\n",
        "    Performs linear transformation: matrix multiplication + bias addition.\n",
        "    \"\"\"\n",
        "    \n",
        "    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):\n",
        "        \"\"\"Initialize Dense layer with random weights.\"\"\"\n",
        "        self.input_size = input_size\n",
        "        self.output_size = output_size\n",
        "        self.use_bias = use_bias\n",
        "        \n",
        "        # Initialize weights with Xavier/Glorot initialization\n",
        "        # This helps with gradient flow during training\n",
        "        limit = math.sqrt(6.0 / (input_size + output_size))\n",
        "        self.weights = Tensor(\n",
        "            np.random.uniform(-limit, limit, (input_size, output_size)).astype(np.float32)\n",
        "        )\n",
        "        \n",
        "        # Initialize bias to zeros\n",
        "        if use_bias:\n",
        "            self.bias = Tensor(np.zeros(output_size, dtype=np.float32))\n",
        "        else:\n",
        "            self.bias = None\n",
        "    \n",
        "    def forward(self, x: Tensor) -> Tensor:\n",
        "        \"\"\"Forward pass: y = Wx + b\"\"\"\n",
        "        # Matrix multiplication: x @ weights\n",
        "        # x shape: (batch_size, input_size)\n",
        "        # weights shape: (input_size, output_size)\n",
        "        # result shape: (batch_size, output_size)\n",
        "        output = Tensor(x.data @ self.weights.data)\n",
        "        \n",
        "        # Add bias if present\n",
        "        if self.bias is not None:\n",
        "            output = Tensor(output.data + self.bias.data)\n",
        "        \n",
        "        return output\n",
        "    \n",
        "    def __call__(self, x: Tensor) -> Tensor:\n",
        "        \"\"\"Make layer callable: layer(x) same as layer.forward(x)\"\"\"\n",
        "        return self.forward(x)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "\"\"\"\n",
        "### \ud83e\uddea Test Your Dense Layer\n",
        "\n",
        "Once you implement the Dense layer above, run this cell to test it:\n",
        "\"\"\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Test the Dense layer\n",
        "try:\n",
        "    print(\"=== Testing Dense Layer ===\")\n",
        "    \n",
        "    # Create a simple Dense layer: 3 inputs \u2192 2 outputs\n",
        "    layer = Dense(input_size=3, output_size=2)\n",
        "    print(f\"Created Dense layer: {layer.input_size} \u2192 {layer.output_size}\")\n",
        "    print(f\"Weights shape: {layer.weights.shape}\")\n",
        "    print(f\"Bias shape: {layer.bias.shape if layer.bias else 'No bias'}\")\n",
        "    \n",
        "    # Test with a single example\n",
        "    x = Tensor([[1.0, 2.0, 3.0]])  # Shape: (1, 3)\n",
        "    y = layer(x)\n",
        "    print(f\"Input shape: {x.shape}\")\n",
        "    print(f\"Output shape: {y.shape}\")\n",
        "    print(f\"Input: {x.data}\")\n",
        "    print(f\"Output: {y.data}\")\n",
        "    \n",
        "    # Test with batch\n",
        "    x_batch = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # Shape: (2, 3)\n",
        "    y_batch = layer(x_batch)\n",
        "    print(f\"\\nBatch input shape: {x_batch.shape}\")\n",
        "    print(f\"Batch output shape: {y_batch.shape}\")\n",
        "    \n",
        "    print(\"\u2705 Dense layer working!\")\n",
        "    \n",
        "except Exception as e:\n",
        "    print(f\"\u274c Error: {e}\")\n",
        "    print(\"Make sure to implement the Dense layer above!\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "\"\"\"\n",
        "## Step 2: Activation Functions - Adding Nonlinearity\n",
        "\n",
        "Now we'll use the activation functions from the **activations** module! \n",
        "\n",
        "**Clean Architecture**: We import the activation functions rather than redefining them:\n",
        "```python\n",
        "from tinytorch.core.activations import ReLU, Sigmoid, Tanh\n",
        "```\n",
        "\n",
        "**Why this matters**:\n",
        "- **Separation of concerns**: Math functions vs. layer building blocks\n",
        "- **Reusability**: Activations can be used anywhere in the system\n",
        "- **Maintainability**: One place to update activation implementations\n",
        "- **Composability**: Clean imports make neural networks easier to build\n",
        "\n",
        "**Why nonlinearity matters**: Without it, stacking layers is pointless!\n",
        "```\n",
        "Linear \u2192 Linear \u2192 Linear = Just one big Linear transformation\n",
        "Linear \u2192 NonLinear \u2192 Linear = Can learn complex patterns\n",
        "```\n",
        "\"\"\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "\"\"\"\n",
        "### \ud83e\uddea Test Activation Functions from Activations Module\n",
        "\n",
        "Let's test that we can use the activation functions from the activations module:\n",
        "\"\"\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Test activation functions from activations module\n",
        "try:\n",
        "    print(\"=== Testing Activation Functions from Activations Module ===\")\n",
        "    \n",
        "    # Test data: mix of positive, negative, and zero\n",
        "    x = Tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]])\n",
        "    print(f\"Input: {x.data}\")\n",
        "    \n",
        "    # Test ReLU from activations module\n",
        "    relu = ReLU()\n",
        "    y_relu = relu(x)\n",
        "    print(f\"ReLU output: {y_relu.data}\")\n",
        "    \n",
        "    # Test Sigmoid from activations module\n",
        "    sigmoid = Sigmoid()\n",
        "    y_sigmoid = sigmoid(x)\n",
        "    print(f\"Sigmoid output: {y_sigmoid.data}\")\n",
        "    \n",
        "    # Test Tanh from activations module\n",
        "    tanh = Tanh()\n",
        "    y_tanh = tanh(x)\n",
        "    print(f\"Tanh output: {y_tanh.data}\")\n",
        "    \n",
        "    print(\"\u2705 Activation functions from activations module working!\")\n",
        "    print(\"\ud83c\udf89 Clean architecture: layers module uses activations module!\")\n",
        "    \n",
        "except Exception as e:\n",
        "    print(f\"\u274c Error: {e}\")\n",
        "    print(\"Make sure the activations module is properly exported!\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "\"\"\"\n",
        "## Step 3: Layer Composition - Building Neural Networks\n",
        "\n",
        "Now comes the magic! We can **compose** layers to build neural networks:\n",
        "\n",
        "```\n",
        "Input \u2192 Dense \u2192 ReLU \u2192 Dense \u2192 Sigmoid \u2192 Output\n",
        "```\n",
        "\n",
        "This is a 2-layer neural network that can learn complex nonlinear patterns!\n",
        "\n",
        "**Notice the clean architecture**:\n",
        "- Dense layers handle linear transformations\n",
        "- Activation functions (from activations module) handle nonlinearity\n",
        "- Composition creates complex behaviors from simple building blocks\n",
        "\"\"\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Build a simple 2-layer neural network\n",
        "try:\n",
        "    print(\"=== Building a 2-Layer Neural Network ===\")\n",
        "    \n",
        "    # Network architecture: 3 \u2192 4 \u2192 2\n",
        "    # Input: 3 features\n",
        "    # Hidden: 4 neurons with ReLU\n",
        "    # Output: 2 neurons with Sigmoid\n",
        "    \n",
        "    layer1 = Dense(input_size=3, output_size=4)\n",
        "    activation1 = ReLU()  # From activations module\n",
        "    layer2 = Dense(input_size=4, output_size=2)\n",
        "    activation2 = Sigmoid()  # From activations module\n",
        "    \n",
        "    print(\"Network architecture:\")\n",
        "    print(f\"  Input: 3 features\")\n",
        "    print(f\"  Hidden: {layer1.input_size} \u2192 {layer1.output_size} (Dense + ReLU)\")\n",
        "    print(f\"  Output: {layer2.input_size} \u2192 {layer2.output_size} (Dense + Sigmoid)\")\n",
        "    \n",
        "    # Test with sample data\n",
        "    x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # 2 examples, 3 features each\n",
        "    print(f\"\\nInput shape: {x.shape}\")\n",
        "    print(f\"Input data: {x.data}\")\n",
        "    \n",
        "    # Forward pass through the network\n",
        "    h1 = layer1(x)           # Dense layer 1\n",
        "    h1_activated = activation1(h1)  # ReLU activation\n",
        "    h2 = layer2(h1_activated)       # Dense layer 2  \n",
        "    output = activation2(h2)        # Sigmoid activation\n",
        "    \n",
        "    print(f\"\\nAfter layer 1: {h1.shape}\")\n",
        "    print(f\"After ReLU: {h1_activated.shape}\")\n",
        "    print(f\"After layer 2: {h2.shape}\")\n",
        "    print(f\"Final output: {output.shape}\")\n",
        "    print(f\"Output values: {output.data}\")\n",
        "    \n",
        "    print(\"\\n\ud83c\udf89 Neural network working! You just built your first neural network!\")\n",
        "    print(\"\ud83c\udfd7\ufe0f  Clean architecture: Dense layers + Activations module = Neural Network\")\n",
        "    print(\"Notice how the network transforms 3D input into 2D output through learned transformations.\")\n",
        "    \n",
        "except Exception as e:\n",
        "    print(f\"\u274c Error: {e}\")\n",
        "    print(\"Make sure to implement the layers and check activations module!\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "\"\"\"\n",
        "## Step 4: Understanding What We Built\n",
        "\n",
        "Congratulations! You just implemented a clean, modular neural network architecture:\n",
        "\n",
        "### \ud83e\uddf1 **What You Built**\n",
        "1. **Dense Layer**: Linear transformation `y = Wx + b`\n",
        "2. **Activation Functions**: Imported from activations module (ReLU, Sigmoid, Tanh)\n",
        "3. **Layer Composition**: Chaining layers to build networks\n",
        "\n",
        "### \ud83c\udfd7\ufe0f **Clean Architecture Benefits**\n",
        "- **Separation of concerns**: Math functions vs. layer building blocks\n",
        "- **Reusability**: Activations can be used across different modules\n",
        "- **Maintainability**: One place to update activation implementations\n",
        "- **Composability**: Clean imports make complex networks easier to build\n",
        "\n",
        "### \ud83c\udfaf **Key Insights**\n",
        "- **Layers are functions**: They transform tensors from one space to another\n",
        "- **Composition creates complexity**: Simple layers \u2192 complex networks\n",
        "- **Nonlinearity is crucial**: Without it, deep networks are just linear transformations\n",
        "- **Neural networks are function approximators**: They learn to map inputs to outputs\n",
        "- **Modular design**: Building blocks can be combined in many ways\n",
        "\n",
        "### \ud83d\ude80 **What's Next**\n",
        "In the next modules, you'll learn:\n",
        "- **Training**: How networks learn from data (backpropagation, optimizers)\n",
        "- **Architectures**: Specialized layers for different problems (CNNs, RNNs)\n",
        "- **Applications**: Using networks for real problems\n",
        "\n",
        "### \ud83d\udd27 **Export to Package**\n",
        "Run this to export your layers to the TinyTorch package:\n",
        "```bash\n",
        "python bin/tito.py sync\n",
        "```\n",
        "\n",
        "Then test your implementation:\n",
        "```bash\n",
        "python bin/tito.py test --module layers\n",
        "```\n",
        "\n",
        "**Great job! You've built a clean, modular foundation for neural networks!** \ud83c\udf89\n",
        "\"\"\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Final demonstration: A more complex example\n",
        "try:\n",
        "    print(\"=== Final Demo: Image Classification Network ===\")\n",
        "    \n",
        "    # Simulate a small image: 28x28 pixels flattened to 784 features\n",
        "    # This is like a tiny MNIST digit\n",
        "    image_size = 28 * 28  # 784 pixels\n",
        "    num_classes = 10      # 10 digits (0-9)\n",
        "    \n",
        "    # Build a 3-layer network for digit classification\n",
        "    # 784 \u2192 128 \u2192 64 \u2192 10\n",
        "    layer1 = Dense(input_size=image_size, output_size=128)\n",
        "    relu1 = ReLU()  # From activations module\n",
        "    layer2 = Dense(input_size=128, output_size=64)\n",
        "    relu2 = ReLU()  # From activations module\n",
        "    layer3 = Dense(input_size=64, output_size=num_classes)\n",
        "    softmax = Sigmoid()  # Using Sigmoid as a simple \"probability-like\" output\n",
        "    \n",
        "    print(f\"Image classification network:\")\n",
        "    print(f\"  Input: {image_size} pixels (28x28 image)\")\n",
        "    print(f\"  Hidden 1: {layer1.input_size} \u2192 {layer1.output_size} (Dense + ReLU)\")\n",
        "    print(f\"  Hidden 2: {layer2.input_size} \u2192 {layer2.output_size} (Dense + ReLU)\")\n",
        "    print(f\"  Output: {layer3.input_size} \u2192 {layer3.output_size} (Dense + Sigmoid)\")\n",
        "    \n",
        "    # Simulate a batch of 5 images\n",
        "    batch_size = 5\n",
        "    fake_images = Tensor(np.random.randn(batch_size, image_size).astype(np.float32))\n",
        "    \n",
        "    # Forward pass\n",
        "    h1 = relu1(layer1(fake_images))\n",
        "    h2 = relu2(layer2(h1))\n",
        "    predictions = softmax(layer3(h2))\n",
        "    \n",
        "    print(f\"\\nBatch processing:\")\n",
        "    print(f\"  Input batch shape: {fake_images.shape}\")\n",
        "    print(f\"  Predictions shape: {predictions.shape}\")\n",
        "    print(f\"  Sample predictions: {predictions.data[0]}\")  # First image predictions\n",
        "    \n",
        "    print(\"\\n\ud83c\udf89 You built a neural network that could classify images!\")\n",
        "    print(\"\ud83c\udfd7\ufe0f  Clean architecture: Dense layers + Activations module = Image Classifier\")\n",
        "    print(\"With training, this network could learn to recognize handwritten digits!\")\n",
        "    \n",
        "except Exception as e:\n",
        "    print(f\"\u274c Error: {e}\")\n",
        "    print(\"Check your layer implementations and activations module!\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "\"\"\"\n",
        "## \ud83c\udf93 Module Summary\n",
        "\n",
        "### What You Learned\n",
        "1. **Layer Architecture**: Dense layers as linear transformations\n",
        "2. **Clean Dependencies**: Layers module uses activations module\n",
        "3. **Function Composition**: Simple building blocks \u2192 complex networks\n",
        "4. **Modular Design**: Separation of concerns for maintainable code\n",
        "\n",
        "### Key Architectural Insight\n",
        "```\n",
        "activations (math functions) \u2192 layers (building blocks) \u2192 networks (applications)\n",
        "```\n",
        "\n",
        "This clean dependency graph makes the system:\n",
        "- **Understandable**: Each module has a clear purpose\n",
        "- **Testable**: Each module can be tested independently\n",
        "- **Reusable**: Components can be used across different contexts\n",
        "- **Maintainable**: Changes are localized to appropriate modules\n",
        "\n",
        "### Next Steps\n",
        "- **Training**: Learn how networks learn from data\n",
        "- **Advanced Architectures**: CNNs, RNNs, Transformers\n",
        "- **Applications**: Real-world machine learning problems\n",
        "\n",
        "**Congratulations on building a clean, modular neural network foundation!** \ud83d\ude80\n",
        "\"\"\""
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.8.0"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 4
}