TinyTorch/modules/layers/layers_dev.ipynb
Vijay Janapa Reddi b785b706f2 Refactor notebook generation to use separate files for better architecture
- Restored tools/py_to_notebook.py as a focused, standalone tool
- Updated tito notebooks command to use subprocess to call the separate tool
- Maintains clean separation of concerns: tito.py for CLI orchestration, py_to_notebook.py for conversion logic
- Updated documentation to use 'tito notebooks' command instead of direct tool calls
- Benefits: easier debugging, better maintainability, focused single-responsibility modules
2025-07-10 21:57:09 -04:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"jupyter:\n",
"  jupytext:\n",
"    text_representation:\n",
"      extension: .py\n",
"      format_name: percent\n",
"      format_version: '1.3'\n",
"      jupytext_version: 1.17.1\n",
"---\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"# Module 2: Layers - Neural Network Building Blocks\n",
"\n",
"Welcome to the Layers module! This is where neural networks begin. You'll implement the fundamental building blocks that transform tensors.\n",
"\n",
"## Learning Goals\n",
"- Understand layers as functions that transform tensors: `y = f(x)`\n",
"- Implement Dense layers with linear transformations: `y = Wx + b`\n",
"- Use activation functions from the activations module for nonlinearity\n",
"- See how neural networks are just function composition\n",
"- Build intuition before diving into training\n",
"\n",
"## Build \u2192 Use \u2192 Understand\n",
"1. **Build**: Dense layers using activation functions as building blocks\n",
"2. **Use**: Transform tensors and see immediate results\n",
"3. **Understand**: How neural networks transform information\n",
"\n",
"## Module Dependencies\n",
"This module builds on the **activations** module:\n",
"- **activations** \u2192 **layers** \u2192 **networks**\n",
"- Clean separation of concerns: math functions \u2192 layer building blocks \u2192 full networks\n",
"\n",
"## Module \u2192 Package Structure\n",
"**\ud83c\udf93 Teaching vs. \ud83d\udd27 Building**: \n",
"- **Learning side**: Work in `modules/layers/layers_dev.py` \n",
"- **Building side**: Exports to `tinytorch/core/layers.py`\n",
"\n",
"This module builds the fundamental transformations that compose into neural networks.\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| default_exp core.layers\n",
"\n",
"# Setup and imports\n",
"import numpy as np\n",
"import sys\n",
"from typing import Union, Optional, Callable\n",
"import math"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| export\n",
"import numpy as np\n",
"import math\n",
"import sys\n",
"from typing import Union, Optional, Callable\n",
"from tinytorch.core.tensor import Tensor\n",
"\n",
"# Import activation functions from the activations module\n",
"from tinytorch.core.activations import ReLU, Sigmoid, Tanh\n",
"\n",
"# Import our Tensor class\n",
"# sys.path.append('../../')\n",
"# from modules.tensor.tensor_dev import Tensor\n",
"\n",
"# print(\"\ud83d\udd25 TinyTorch Layers Module\")\n",
"# print(f\"NumPy version: {np.__version__}\")\n",
"# print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
"# print(\"Ready to build neural network layers!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"## Step 1: What is a Layer?\n",
"\n",
"A **layer** is a function that transforms tensors. Think of it as:\n",
"- **Input**: Tensor with some shape\n",
"- **Transformation**: Mathematical operation (linear, nonlinear, etc.)\n",
"- **Output**: Tensor with possibly different shape\n",
"\n",
"**The fundamental insight**: Neural networks are just function composition!\n",
"```\n",
"x \u2192 Layer1 \u2192 Layer2 \u2192 Layer3 \u2192 y\n",
"```\n",
"\n",
"**Why layers matter**:\n",
"- They're the building blocks of all neural networks\n",
"- Each layer learns a different transformation\n",
"- Composing layers creates complex functions\n",
"- Understanding layers = understanding neural networks\n",
"\n",
"Let's start with the most important layer: **Dense** (also called Linear or Fully Connected).\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| export\n",
"class Dense:\n",
"    \"\"\"\n",
"    Dense (Linear) Layer: y = Wx + b\n",
"\n",
"    The fundamental building block of neural networks.\n",
"    Performs linear transformation: matrix multiplication + bias addition.\n",
"\n",
"    Args:\n",
"        input_size: Number of input features\n",
"        output_size: Number of output features\n",
"        use_bias: Whether to include bias term (default: True)\n",
"\n",
"    TODO: Implement the Dense layer with weight initialization and forward pass.\n",
"    \"\"\"\n",
"\n",
"    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):\n",
"        \"\"\"\n",
"        Initialize Dense layer with random weights.\n",
"\n",
"        TODO:\n",
"        1. Store layer parameters (input_size, output_size, use_bias)\n",
"        2. Initialize weights with small random values\n",
"        3. Initialize bias to zeros (if use_bias=True)\n",
"        \"\"\"\n",
"        raise NotImplementedError(\"Student implementation required\")\n",
"\n",
"    def forward(self, x: Tensor) -> Tensor:\n",
"        \"\"\"\n",
"        Forward pass: y = Wx + b\n",
"\n",
"        Args:\n",
"            x: Input tensor of shape (batch_size, input_size)\n",
"\n",
"        Returns:\n",
"            Output tensor of shape (batch_size, output_size)\n",
"\n",
"        TODO: Implement matrix multiplication and bias addition\n",
"        \"\"\"\n",
"        raise NotImplementedError(\"Student implementation required\")\n",
"\n",
"    def __call__(self, x: Tensor) -> Tensor:\n",
"        \"\"\"Make layer callable: layer(x) same as layer.forward(x)\"\"\"\n",
"        return self.forward(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"#| hide\n",
"#| export\n",
"class Dense:\n",
"    \"\"\"\n",
"    Dense (Linear) Layer: y = Wx + b\n",
"\n",
"    The fundamental building block of neural networks.\n",
"    Performs linear transformation: matrix multiplication + bias addition.\n",
"    \"\"\"\n",
"\n",
"    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):\n",
"        \"\"\"Initialize Dense layer with random weights.\"\"\"\n",
"        self.input_size = input_size\n",
"        self.output_size = output_size\n",
"        self.use_bias = use_bias\n",
"\n",
"        # Xavier/Glorot uniform initialization: sample from U(-limit, limit)\n",
"        # with limit = sqrt(6 / (fan_in + fan_out)); this helps gradient flow during training\n",
"        limit = math.sqrt(6.0 / (input_size + output_size))\n",
"        self.weights = Tensor(\n",
"            np.random.uniform(-limit, limit, (input_size, output_size)).astype(np.float32)\n",
"        )\n",
"\n",
"        # Initialize bias to zeros\n",
"        if use_bias:\n",
"            self.bias = Tensor(np.zeros(output_size, dtype=np.float32))\n",
"        else:\n",
"            self.bias = None\n",
"\n",
"    def forward(self, x: Tensor) -> Tensor:\n",
"        \"\"\"Forward pass: y = Wx + b\"\"\"\n",
"        # Inputs are stored as rows, so Wx is computed in batched form as x @ weights\n",
"        # x shape: (batch_size, input_size)\n",
"        # weights shape: (input_size, output_size)\n",
"        # result shape: (batch_size, output_size)\n",
"        output = Tensor(x.data @ self.weights.data)\n",
"\n",
"        # Add bias if present (broadcast across the batch dimension)\n",
"        if self.bias is not None:\n",
"            output = Tensor(output.data + self.bias.data)\n",
"\n",
"        return output\n",
"\n",
"    def __call__(self, x: Tensor) -> Tensor:\n",
"        \"\"\"Make layer callable: layer(x) same as layer.forward(x)\"\"\"\n",
"        return self.forward(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"### \ud83e\uddea Test Your Dense Layer\n",
"\n",
"Once you implement the Dense layer above, run this cell to test it:\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test the Dense layer\n",
"try:\n",
"    print(\"=== Testing Dense Layer ===\")\n",
"\n",
"    # Create a simple Dense layer: 3 inputs \u2192 2 outputs\n",
"    layer = Dense(input_size=3, output_size=2)\n",
"    print(f\"Created Dense layer: {layer.input_size} \u2192 {layer.output_size}\")\n",
"    print(f\"Weights shape: {layer.weights.shape}\")\n",
"    # Compare against None explicitly rather than relying on the truthiness of a Tensor\n",
"    print(f\"Bias shape: {layer.bias.shape if layer.bias is not None else 'No bias'}\")\n",
"\n",
"    # Test with a single example\n",
"    x = Tensor([[1.0, 2.0, 3.0]])  # Shape: (1, 3)\n",
"    y = layer(x)\n",
"    print(f\"Input shape: {x.shape}\")\n",
"    print(f\"Output shape: {y.shape}\")\n",
"    print(f\"Input: {x.data}\")\n",
"    print(f\"Output: {y.data}\")\n",
"\n",
"    # Test with batch\n",
"    x_batch = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # Shape: (2, 3)\n",
"    y_batch = layer(x_batch)\n",
"    print(f\"\\nBatch input shape: {x_batch.shape}\")\n",
"    print(f\"Batch output shape: {y_batch.shape}\")\n",
"\n",
"    print(\"\u2705 Dense layer working!\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"\u274c Error: {e}\")\n",
"    print(\"Make sure to implement the Dense layer above!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"## Step 2: Activation Functions - Adding Nonlinearity\n",
"\n",
"Now we'll use the activation functions from the **activations** module! \n",
"\n",
"**Clean Architecture**: We import the activation functions rather than redefining them:\n",
"```python\n",
"from tinytorch.core.activations import ReLU, Sigmoid, Tanh\n",
"```\n",
"\n",
"**Why this matters**:\n",
"- **Separation of concerns**: Math functions vs. layer building blocks\n",
"- **Reusability**: Activations can be used anywhere in the system\n",
"- **Maintainability**: One place to update activation implementations\n",
"- **Composability**: Clean imports make neural networks easier to build\n",
"\n",
"**Why nonlinearity matters**: Without it, stacking layers adds no expressive power, because composing linear (affine) maps yields another linear (affine) map:\n",
"```\n",
"Linear \u2192 Linear \u2192 Linear = Just one big Linear transformation\n",
"Linear \u2192 NonLinear \u2192 Linear = Can learn complex patterns\n",
"```\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"### \ud83e\uddea Test Activation Functions from Activations Module\n",
"\n",
"Let's test that we can use the activation functions from the activations module:\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test activation functions from activations module\n",
"try:\n",
"    print(\"=== Testing Activation Functions from Activations Module ===\")\n",
"\n",
"    # Test data: mix of positive, negative, and zero\n",
"    x = Tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]])\n",
"    print(f\"Input: {x.data}\")\n",
"\n",
"    # Test ReLU from activations module\n",
"    relu = ReLU()\n",
"    y_relu = relu(x)\n",
"    print(f\"ReLU output: {y_relu.data}\")\n",
"\n",
"    # Test Sigmoid from activations module\n",
"    sigmoid = Sigmoid()\n",
"    y_sigmoid = sigmoid(x)\n",
"    print(f\"Sigmoid output: {y_sigmoid.data}\")\n",
"\n",
"    # Test Tanh from activations module\n",
"    tanh = Tanh()\n",
"    y_tanh = tanh(x)\n",
"    print(f\"Tanh output: {y_tanh.data}\")\n",
"\n",
"    print(\"\u2705 Activation functions from activations module working!\")\n",
"    print(\"\ud83c\udf89 Clean architecture: layers module uses activations module!\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"\u274c Error: {e}\")\n",
"    print(\"Make sure the activations module is properly exported!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"## Step 3: Layer Composition - Building Neural Networks\n",
"\n",
"Now comes the magic! We can **compose** layers to build neural networks:\n",
"\n",
"```\n",
"Input \u2192 Dense \u2192 ReLU \u2192 Dense \u2192 Sigmoid \u2192 Output\n",
"```\n",
"\n",
"This is a 2-layer neural network that can learn complex nonlinear patterns!\n",
"\n",
"**Notice the clean architecture**:\n",
"- Dense layers handle linear transformations\n",
"- Activation functions (from activations module) handle nonlinearity\n",
"- Composition creates complex behaviors from simple building blocks\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Build a simple 2-layer neural network\n",
"try:\n",
"    print(\"=== Building a 2-Layer Neural Network ===\")\n",
"\n",
"    # Network architecture: 3 \u2192 4 \u2192 2\n",
"    # Input: 3 features\n",
"    # Hidden: 4 neurons with ReLU\n",
"    # Output: 2 neurons with Sigmoid\n",
"\n",
"    layer1 = Dense(input_size=3, output_size=4)\n",
"    activation1 = ReLU()  # From activations module\n",
"    layer2 = Dense(input_size=4, output_size=2)\n",
"    activation2 = Sigmoid()  # From activations module\n",
"\n",
"    print(\"Network architecture:\")\n",
"    print(f\" Input: 3 features\")\n",
"    print(f\" Hidden: {layer1.input_size} \u2192 {layer1.output_size} (Dense + ReLU)\")\n",
"    print(f\" Output: {layer2.input_size} \u2192 {layer2.output_size} (Dense + Sigmoid)\")\n",
"\n",
"    # Test with sample data\n",
"    x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # 2 examples, 3 features each\n",
"    print(f\"\\nInput shape: {x.shape}\")\n",
"    print(f\"Input data: {x.data}\")\n",
"\n",
"    # Forward pass through the network\n",
"    h1 = layer1(x)                  # Dense layer 1\n",
"    h1_activated = activation1(h1)  # ReLU activation\n",
"    h2 = layer2(h1_activated)       # Dense layer 2\n",
"    output = activation2(h2)        # Sigmoid activation\n",
"\n",
"    print(f\"\\nAfter layer 1: {h1.shape}\")\n",
"    print(f\"After ReLU: {h1_activated.shape}\")\n",
"    print(f\"After layer 2: {h2.shape}\")\n",
"    print(f\"Final output: {output.shape}\")\n",
"    print(f\"Output values: {output.data}\")\n",
"\n",
"    print(\"\\n\ud83c\udf89 Neural network working! You just built your first neural network!\")\n",
"    print(\"\ud83c\udfd7\ufe0f Clean architecture: Dense layers + Activations module = Neural Network\")\n",
"    print(\"Notice how the network maps 3-feature inputs to 2-feature outputs. The weights are still random; training is what will make these transformations learned.\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"\u274c Error: {e}\")\n",
"    print(\"Make sure to implement the layers and check activations module!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"## Step 4: Understanding What We Built\n",
"\n",
"Congratulations! You just implemented a clean, modular neural network architecture:\n",
"\n",
"### \ud83e\uddf1 **What You Built**\n",
"1. **Dense Layer**: Linear transformation `y = Wx + b`\n",
"2. **Activation Functions**: Imported from activations module (ReLU, Sigmoid, Tanh)\n",
"3. **Layer Composition**: Chaining layers to build networks\n",
"\n",
"### \ud83c\udfd7\ufe0f **Clean Architecture Benefits**\n",
"- **Separation of concerns**: Math functions vs. layer building blocks\n",
"- **Reusability**: Activations can be used across different modules\n",
"- **Maintainability**: One place to update activation implementations\n",
"- **Composability**: Clean imports make complex networks easier to build\n",
"\n",
"### \ud83c\udfaf **Key Insights**\n",
"- **Layers are functions**: They transform tensors from one space to another\n",
"- **Composition creates complexity**: Simple layers \u2192 complex networks\n",
"- **Nonlinearity is crucial**: Without it, deep networks are just linear transformations\n",
"- **Neural networks are function approximators**: They learn to map inputs to outputs\n",
"- **Modular design**: Building blocks can be combined in many ways\n",
"\n",
"### \ud83d\ude80 **What's Next**\n",
"In the next modules, you'll learn:\n",
"- **Training**: How networks learn from data (backpropagation, optimizers)\n",
"- **Architectures**: Specialized layers for different problems (CNNs, RNNs)\n",
"- **Applications**: Using networks for real problems\n",
"\n",
"### \ud83d\udd27 **Export to Package**\n",
"Run this to export your layers to the TinyTorch package:\n",
"```bash\n",
"python bin/tito.py sync\n",
"```\n",
"\n",
"Then test your implementation:\n",
"```bash\n",
"python bin/tito.py test --module layers\n",
"```\n",
"\n",
"**Great job! You've built a clean, modular foundation for neural networks!** \ud83c\udf89\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Final demonstration: A more complex example\n",
"try:\n",
"    print(\"=== Final Demo: Image Classification Network ===\")\n",
"\n",
"    # Simulate a small image: 28x28 pixels flattened to 784 features\n",
"    # (the same size as an MNIST digit)\n",
"    image_size = 28 * 28  # 784 pixels\n",
"    num_classes = 10      # 10 digits (0-9)\n",
"\n",
"    # Build a 3-layer network for digit classification\n",
"    # 784 \u2192 128 \u2192 64 \u2192 10\n",
"    layer1 = Dense(input_size=image_size, output_size=128)\n",
"    relu1 = ReLU()  # From activations module\n",
"    layer2 = Dense(input_size=128, output_size=64)\n",
"    relu2 = ReLU()  # From activations module\n",
"    layer3 = Dense(input_size=64, output_size=num_classes)\n",
"    softmax = Sigmoid()  # Stand-in for softmax: Sigmoid gives independent per-class scores in (0, 1) that need not sum to 1\n",
"\n",
"    print(f\"Image classification network:\")\n",
"    print(f\" Input: {image_size} pixels (28x28 image)\")\n",
"    print(f\" Hidden 1: {layer1.input_size} \u2192 {layer1.output_size} (Dense + ReLU)\")\n",
"    print(f\" Hidden 2: {layer2.input_size} \u2192 {layer2.output_size} (Dense + ReLU)\")\n",
"    print(f\" Output: {layer3.input_size} \u2192 {layer3.output_size} (Dense + Sigmoid)\")\n",
"\n",
"    # Simulate a batch of 5 images\n",
"    batch_size = 5\n",
"    fake_images = Tensor(np.random.randn(batch_size, image_size).astype(np.float32))\n",
"\n",
"    # Forward pass\n",
"    h1 = relu1(layer1(fake_images))\n",
"    h2 = relu2(layer2(h1))\n",
"    predictions = softmax(layer3(h2))\n",
"\n",
"    print(f\"\\nBatch processing:\")\n",
"    print(f\" Input batch shape: {fake_images.shape}\")\n",
"    print(f\" Predictions shape: {predictions.shape}\")\n",
"    print(f\" Sample predictions: {predictions.data[0]}\")  # First image predictions\n",
"\n",
"    print(\"\\n\ud83c\udf89 You built a neural network that could classify images!\")\n",
"    print(\"\ud83c\udfd7\ufe0f Clean architecture: Dense layers + Activations module = Image Classifier\")\n",
"    print(\"With training, this network could learn to recognize handwritten digits!\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"\u274c Error: {e}\")\n",
"    print(\"Check your layer implementations and activations module!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"\"\"\"\n",
"## \ud83c\udf93 Module Summary\n",
"\n",
"### What You Learned\n",
"1. **Layer Architecture**: Dense layers as linear transformations\n",
"2. **Clean Dependencies**: Layers module uses activations module\n",
"3. **Function Composition**: Simple building blocks \u2192 complex networks\n",
"4. **Modular Design**: Separation of concerns for maintainable code\n",
"\n",
"### Key Architectural Insight\n",
"```\n",
"activations (math functions) \u2192 layers (building blocks) \u2192 networks (applications)\n",
"```\n",
"\n",
"This clean dependency graph makes the system:\n",
"- **Understandable**: Each module has a clear purpose\n",
"- **Testable**: Each module can be tested independently\n",
"- **Reusable**: Components can be used across different contexts\n",
"- **Maintainable**: Changes are localized to appropriate modules\n",
"\n",
"### Next Steps\n",
"- **Training**: Learn how networks learn from data\n",
"- **Advanced Architectures**: CNNs, RNNs, Transformers\n",
"- **Applications**: Real-world machine learning problems\n",
"\n",
"**Congratulations on building a clean, modular neural network foundation!** \ud83d\ude80\n",
"\"\"\""
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.8.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}