Mirror of https://github.com/MLSysBook/TinyTorch.git, synced 2026-03-11 20:55:19 -05:00
- Migrated all Python source files to assignments/source/ structure
- Updated nbdev configuration to use assignments/source as nbs_path
- Updated all tito commands (nbgrader, export, test) to use new structure
- Fixed hardcoded paths in Python files and documentation
- Updated config.py to use assignments/source instead of modules
- Fixed test command to use correct file naming (short names vs full module names)
- Regenerated all notebook files with clean metadata
- Verified complete workflow: Python source → NBGrader → nbdev export → testing

All systems now working: NBGrader (14 source assignments, 1 released), nbdev export (7 generated files), and pytest integration. The modules/ directory has been retired and replaced with standard NBGrader structure.
798 lines
29 KiB
Plaintext
{
"cells": [
{
"cell_type": "markdown",
"id": "2668bc45",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Module 2: Layers - Neural Network Building Blocks\n",
"\n",
"Welcome to the Layers module! This is where neural networks begin. You'll implement the fundamental building blocks that transform tensors.\n",
"\n",
"## Learning Goals\n",
"- Understand layers as functions that transform tensors: `y = f(x)`\n",
"- Implement Dense layers with linear transformations: `y = Wx + b`\n",
"- Use activation functions from the activations module for nonlinearity\n",
"- See how neural networks are just function composition\n",
"- Build intuition before diving into training\n",
"\n",
"## Build → Use → Understand\n",
"1. **Build**: Dense layers using activation functions as building blocks\n",
"2. **Use**: Transform tensors and see immediate results\n",
"3. **Understand**: How neural networks transform information\n",
"\n",
"## Module Dependencies\n",
"This module builds on the **activations** module:\n",
"- **activations** → **layers** → **networks**\n",
"- Clean separation of concerns: math functions → layer building blocks → full networks"
]
},
{
"cell_type": "markdown",
"id": "530716e8",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 📦 Where This Code Lives in the Final Package\n",
"\n",
"**Learning Side:** You work in `assignments/source/03_layers/layers_dev.py` \n",
"**Building Side:** Code exports to `tinytorch.core.layers`\n",
"\n",
"```python\n",
"# Final package structure:\n",
"from tinytorch.core.layers import Dense, Conv2D  # All layers together!\n",
"from tinytorch.core.activations import ReLU, Sigmoid, Tanh\n",
"from tinytorch.core.tensor import Tensor\n",
"```\n",
"\n",
"**Why this matters:**\n",
"- **Learning:** Focused modules for deep understanding\n",
"- **Production:** Proper organization like PyTorch's `torch.nn`\n",
"- **Consistency:** All layers (Dense, Conv2D) live together in `core.layers`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4f63809e",
"metadata": {},
"outputs": [],
"source": [
"#| default_exp core.layers\n",
"\n",
"# Setup and imports\n",
"import numpy as np\n",
"import sys\n",
"from typing import Union, Optional, Callable\n",
"import math"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "00a72b7c",
"metadata": {},
"outputs": [],
"source": [
"#| export\n",
"import numpy as np\n",
"import math\n",
"import sys\n",
"from typing import Union, Optional, Callable\n",
"\n",
"# Import from the main package (rock solid foundation)\n",
"from tinytorch.core.tensor import Tensor\n",
"from tinytorch.core.activations import ReLU, Sigmoid, Tanh\n",
"\n",
"# print(\"🔥 TinyTorch Layers Module\")\n",
"# print(f\"NumPy version: {np.__version__}\")\n",
"# print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
"# print(\"Ready to build neural network layers!\")"
]
},
{
"cell_type": "markdown",
"id": "a0ad08ea",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## Step 1: What is a Layer?\n",
"\n",
"### Definition\n",
"A **layer** is a function that transforms tensors. Think of it as a mathematical operation that takes input data and produces output data:\n",
"\n",
"```\n",
"Input Tensor → Layer → Output Tensor\n",
"```\n",
"\n",
"### Why Layers Matter in Neural Networks\n",
"Layers are the fundamental building blocks of all neural networks because:\n",
"- **Modularity**: Each layer has a specific job (linear transformation, nonlinearity, etc.)\n",
"- **Composability**: Layers can be combined to create complex functions\n",
"- **Learnability**: Each layer has parameters that can be learned from data\n",
"- **Interpretability**: Different layers learn different features\n",
"\n",
"### The Fundamental Insight\n",
"**Neural networks are just function composition!**\n",
"```\n",
"x → Layer1 → Layer2 → Layer3 → y\n",
"```\n",
"\n",
"Each layer transforms the data, and the final output is the composition of all these transformations.\n",
"\n",
"### Real-World Examples\n",
"- **Dense Layer**: Learns linear relationships between features\n",
"- **Convolutional Layer**: Learns spatial patterns in images\n",
"- **Recurrent Layer**: Learns temporal patterns in sequences\n",
"- **Activation Layer**: Adds nonlinearity to make networks powerful\n",
"\n",
"### Visual Intuition\n",
"```\n",
"Input: [1, 2, 3] (3 features)\n",
"Dense Layer: y = Wx + b\n",
"Weights W: [[0.1, 0.2, 0.3],\n",
"            [0.4, 0.5, 0.6]]  (2×3 matrix)\n",
"Bias b: [0.1, 0.2] (2 values)\n",
"Output: [0.1*1 + 0.2*2 + 0.3*3 + 0.1,\n",
"         0.4*1 + 0.5*2 + 0.6*3 + 0.2] = [1.5, 3.4]\n",
"```\n",
"\n",
"Let's start with the most important layer: **Dense** (also called Linear or Fully Connected)."
]
},
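{
"cell_type": "markdown",
"id": "f0e1a2b3",
"metadata": {},
"source": [
"The arithmetic in the Visual Intuition above is easy to check with plain NumPy. This cell is only a quick sketch, not part of the exported package; the array values are copied from the example and the cell id is arbitrary:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f0e1a2b4",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Values copied from the Visual Intuition example above\n",
"W = np.array([[0.1, 0.2, 0.3],\n",
"              [0.4, 0.5, 0.6]])  # (2, 3) weight matrix\n",
"b = np.array([0.1, 0.2])         # (2,) bias\n",
"x = np.array([1.0, 2.0, 3.0])    # (3,) input\n",
"\n",
"print(W @ x + b)  # approx. [1.5, 3.4]"
]
},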
{
"cell_type": "markdown",
"id": "5d63d076",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"## Step 2: Understanding Matrix Multiplication\n",
"\n",
"Before we build layers, let's understand the core operation: **matrix multiplication**. This is what powers all neural network computations.\n",
"\n",
"### Why Matrix Multiplication Matters\n",
"- **Efficiency**: Process multiple inputs at once\n",
"- **Parallelization**: GPU acceleration works great with matrix operations\n",
"- **Batch processing**: Handle multiple samples simultaneously\n",
"- **Mathematical foundation**: Linear algebra is the language of neural networks\n",
"\n",
"### The Math Behind It\n",
"For matrices A (m×n) and B (n×p), the result C (m×p) is:\n",
"```\n",
"C[i,j] = sum(A[i,k] * B[k,j] for k in range(n))\n",
"```\n",
"\n",
"### Visual Example\n",
"```\n",
"A = [[1, 2],    B = [[5, 6],\n",
"     [3, 4]]         [7, 8]]\n",
"\n",
"C = A @ B = [[1*5 + 2*7, 1*6 + 2*8],\n",
"             [3*5 + 4*7, 3*6 + 4*8]]\n",
"          = [[19, 22],\n",
"             [43, 50]]\n",
"```\n",
"\n",
"Let's implement this step by step!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "82cc8565",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"#| export\n",
"def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray:\n",
"    \"\"\"\n",
"    Naive matrix multiplication using explicit for-loops.\n",
"\n",
"    This helps you understand what matrix multiplication really does!\n",
"\n",
"    Args:\n",
"        A: Matrix of shape (m, n)\n",
"        B: Matrix of shape (n, p)\n",
"\n",
"    Returns:\n",
"        Matrix of shape (m, p) where C[i,j] = sum(A[i,k] * B[k,j] for k in range(n))\n",
"\n",
"    TODO: Implement matrix multiplication using three nested for-loops.\n",
"\n",
"    APPROACH:\n",
"    1. Get the dimensions: m, n from A and n2, p from B\n",
"    2. Check that n == n2 (matrices must be compatible)\n",
"    3. Create output matrix C of shape (m, p) filled with zeros\n",
"    4. Use three nested loops:\n",
"       - i loop: rows of A (0 to m-1)\n",
"       - j loop: columns of B (0 to p-1)\n",
"       - k loop: shared dimension (0 to n-1)\n",
"    5. For each (i,j), compute: C[i,j] += A[i,k] * B[k,j]\n",
"\n",
"    EXAMPLE:\n",
"    A = [[1, 2],    B = [[5, 6],\n",
"         [3, 4]]         [7, 8]]\n",
"\n",
"    C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0] = 1*5 + 2*7 = 19\n",
"    C[0,1] = A[0,0]*B[0,1] + A[0,1]*B[1,1] = 1*6 + 2*8 = 22\n",
"    C[1,0] = A[1,0]*B[0,0] + A[1,1]*B[1,0] = 3*5 + 4*7 = 43\n",
"    C[1,1] = A[1,0]*B[0,1] + A[1,1]*B[1,1] = 3*6 + 4*8 = 50\n",
"\n",
"    HINTS:\n",
"    - Start with C = np.zeros((m, p))\n",
"    - Use three nested for loops: for i in range(m): for j in range(p): for k in range(n):\n",
"    - Accumulate the sum: C[i,j] += A[i,k] * B[k,j]\n",
"    \"\"\"\n",
"    raise NotImplementedError(\"Student implementation required\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ea923f30",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"#| hide\n",
"#| export\n",
"def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray:\n",
"    \"\"\"\n",
"    Naive matrix multiplication using explicit for-loops.\n",
"\n",
"    This helps you understand what matrix multiplication really does!\n",
"    \"\"\"\n",
"    m, n = A.shape\n",
"    n2, p = B.shape\n",
"    assert n == n2, f\"Matrix shapes don't match: A({m},{n}) @ B({n2},{p})\"\n",
"\n",
"    C = np.zeros((m, p))\n",
"    for i in range(m):\n",
"        for j in range(p):\n",
"            for k in range(n):\n",
"                C[i, j] += A[i, k] * B[k, j]\n",
"    return C"
]
},
{
"cell_type": "markdown",
"id": "60fb8544",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"### 🧪 Test Your Matrix Multiplication"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "28898e45",
"metadata": {},
"outputs": [],
"source": [
"# Test matrix multiplication\n",
"print(\"Testing matrix multiplication...\")\n",
"\n",
"try:\n",
"    # Test case 1: Simple 2x2 matrices\n",
"    A = np.array([[1, 2], [3, 4]], dtype=np.float32)\n",
"    B = np.array([[5, 6], [7, 8]], dtype=np.float32)\n",
"\n",
"    result = matmul_naive(A, B)\n",
"    expected = np.array([[19, 22], [43, 50]], dtype=np.float32)\n",
"\n",
"    print(f\"✅ Matrix A:\\n{A}\")\n",
"    print(f\"✅ Matrix B:\\n{B}\")\n",
"    print(f\"✅ Your result:\\n{result}\")\n",
"    print(f\"✅ Expected:\\n{expected}\")\n",
"\n",
"    assert np.allclose(result, expected), \"❌ Result doesn't match expected!\"\n",
"    print(\"🎉 Matrix multiplication works!\")\n",
"\n",
"    # Test case 2: Compare with NumPy\n",
"    numpy_result = A @ B\n",
"    assert np.allclose(result, numpy_result), \"❌ Doesn't match NumPy result!\"\n",
"    print(\"✅ Matches NumPy implementation!\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"❌ Error: {e}\")\n",
"    print(\"Make sure to implement matmul_naive above!\")"
]
},
{
"cell_type": "markdown",
"id": "d8176801",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"## Step 3: Building the Dense Layer\n",
"\n",
"Now let's build the **Dense layer**, the most fundamental building block of neural networks. A Dense layer performs a linear transformation: `y = Wx + b`\n",
"\n",
"### What is a Dense Layer?\n",
"- **Linear transformation**: `y = Wx + b`\n",
"- **W**: Weight matrix (learnable parameters)\n",
"- **x**: Input tensor\n",
"- **b**: Bias vector (learnable parameters)\n",
"- **y**: Output tensor\n",
"\n",
"### Why Dense Layers Matter\n",
"- **Universal approximation**: Can approximate any function with enough neurons\n",
"- **Feature learning**: Each neuron learns a different feature\n",
"- **Nonlinearity**: When combined with activation functions, becomes very powerful\n",
"- **Foundation**: All other layers build on this concept\n",
"\n",
"### The Math\n",
"For input x of shape (batch_size, input_size):\n",
"- **W**: Weight matrix of shape (input_size, output_size)\n",
"- **b**: Bias vector of shape (output_size,)\n",
"- **y**: Output of shape (batch_size, output_size)\n",
"\n",
"### Visual Example\n",
"```\n",
"Input:   x = [1, 2, 3] (3 features)\n",
"Weights: W = [[0.1, 0.2],    Bias: b = [0.1, 0.2]\n",
"              [0.3, 0.4],\n",
"              [0.5, 0.6]]\n",
"\n",
"Step 1: xW = [0.1*1 + 0.3*2 + 0.5*3, 0.2*1 + 0.4*2 + 0.6*3]\n",
"           = [2.2, 2.8]\n",
"\n",
"Step 2: y = xW + b = [2.2 + 0.1, 2.8 + 0.2] = [2.3, 3.0]\n",
"```\n",
"\n",
"(With a row-vector input, `Wx` is computed as `x @ W`.)\n",
"\n",
"Let's implement this!"
]
},
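{
"cell_type": "markdown",
"id": "c4d5e6f7",
"metadata": {},
"source": [
"Before implementing the layer, the Visual Example above can be verified the same way. A quick sketch only; the values are taken from the example and the cell id is arbitrary:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c4d5e6f8",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Values copied from the Visual Example above\n",
"x = np.array([[1.0, 2.0, 3.0]])  # (1, 3) input row\n",
"W = np.array([[0.1, 0.2],\n",
"              [0.3, 0.4],\n",
"              [0.5, 0.6]])       # (3, 2) weight matrix\n",
"b = np.array([0.1, 0.2])         # (2,) bias\n",
"\n",
"print(x @ W + b)  # approx. [[2.3, 3.0]]"
]
},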
{
"cell_type": "code",
"execution_count": null,
"id": "4a916c67",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"#| export\n",
"class Dense:\n",
"    \"\"\"\n",
"    Dense (Linear) Layer: y = Wx + b\n",
"\n",
"    The fundamental building block of neural networks.\n",
"    Performs linear transformation: matrix multiplication + bias addition.\n",
"\n",
"    Args:\n",
"        input_size: Number of input features\n",
"        output_size: Number of output features\n",
"        use_bias: Whether to include bias term (default: True)\n",
"        use_naive_matmul: Whether to use naive matrix multiplication (for learning)\n",
"\n",
"    TODO: Implement the Dense layer with weight initialization and forward pass.\n",
"\n",
"    APPROACH:\n",
"    1. Store layer parameters (input_size, output_size, use_bias, use_naive_matmul)\n",
"    2. Initialize weights with small random values (Xavier/Glorot initialization)\n",
"    3. Initialize bias to zeros (if use_bias=True)\n",
"    4. Implement forward pass using matrix multiplication and bias addition\n",
"\n",
"    EXAMPLE:\n",
"        layer = Dense(input_size=3, output_size=2)\n",
"        x = Tensor([[1, 2, 3]])  # batch_size=1, input_size=3\n",
"        y = layer(x)  # shape: (1, 2)\n",
"\n",
"    HINTS:\n",
"    - Use np.random.randn() for random initialization\n",
"    - Scale weights by sqrt(2/(input_size + output_size)) for Xavier init\n",
"    - Store weights and bias as numpy arrays\n",
"    - Use matmul_naive or @ operator based on use_naive_matmul flag\n",
"    \"\"\"\n",
"\n",
"    def __init__(self, input_size: int, output_size: int, use_bias: bool = True,\n",
"                 use_naive_matmul: bool = False):\n",
"        \"\"\"\n",
"        Initialize Dense layer with random weights.\n",
"\n",
"        Args:\n",
"            input_size: Number of input features\n",
"            output_size: Number of output features\n",
"            use_bias: Whether to include bias term\n",
"            use_naive_matmul: Use naive matrix multiplication (for learning)\n",
"\n",
"        TODO:\n",
"        1. Store layer parameters (input_size, output_size, use_bias, use_naive_matmul)\n",
"        2. Initialize weights with small random values\n",
"        3. Initialize bias to zeros (if use_bias=True)\n",
"\n",
"        STEP-BY-STEP:\n",
"        1. Store the parameters as instance variables\n",
"        2. Calculate scale factor for Xavier initialization: sqrt(2/(input_size + output_size))\n",
"        3. Initialize weights: np.random.randn(input_size, output_size) * scale\n",
"        4. If use_bias=True, initialize bias: np.zeros(output_size)\n",
"        5. If use_bias=False, set bias to None\n",
"\n",
"        EXAMPLE:\n",
"        Dense(3, 2) creates:\n",
"        - weights: shape (3, 2) with small random values\n",
"        - bias: shape (2,) with zeros\n",
"        \"\"\"\n",
"        raise NotImplementedError(\"Student implementation required\")\n",
"\n",
"    def forward(self, x: Tensor) -> Tensor:\n",
"        \"\"\"\n",
"        Forward pass: y = Wx + b\n",
"\n",
"        Args:\n",
"            x: Input tensor of shape (batch_size, input_size)\n",
"\n",
"        Returns:\n",
"            Output tensor of shape (batch_size, output_size)\n",
"\n",
"        TODO: Implement matrix multiplication and bias addition\n",
"        - Use self.use_naive_matmul to choose between NumPy and naive implementation\n",
"        - If use_naive_matmul=True, use matmul_naive(x.data, self.weights)\n",
"        - If use_naive_matmul=False, use x.data @ self.weights\n",
"        - Add bias if self.use_bias=True\n",
"\n",
"        STEP-BY-STEP:\n",
"        1. Perform matrix multiplication: Wx\n",
"           - If use_naive_matmul: result = matmul_naive(x.data, self.weights)\n",
"           - Else: result = x.data @ self.weights\n",
"        2. Add bias if use_bias: result += self.bias\n",
"        3. Return Tensor(result)\n",
"\n",
"        EXAMPLE:\n",
"        Input x: Tensor([[1, 2, 3]])  # shape (1, 3)\n",
"        Weights: shape (3, 2)\n",
"        Output: Tensor([[val1, val2]])  # shape (1, 2)\n",
"\n",
"        HINTS:\n",
"        - x.data gives you the numpy array\n",
"        - self.weights is your weight matrix\n",
"        - Use broadcasting for bias addition: result + self.bias\n",
"        - Return Tensor(result) to wrap the result\n",
"        \"\"\"\n",
"        raise NotImplementedError(\"Student implementation required\")\n",
"\n",
"    def __call__(self, x: Tensor) -> Tensor:\n",
"        \"\"\"Make layer callable: layer(x) same as layer.forward(x)\"\"\"\n",
"        return self.forward(x)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8570d026",
"metadata": {
"lines_to_next_cell": 1
},
"outputs": [],
"source": [
"#| hide\n",
"#| export\n",
"class Dense:\n",
"    \"\"\"\n",
"    Dense (Linear) Layer: y = Wx + b\n",
"\n",
"    The fundamental building block of neural networks.\n",
"    Performs linear transformation: matrix multiplication + bias addition.\n",
"    \"\"\"\n",
"\n",
"    def __init__(self, input_size: int, output_size: int, use_bias: bool = True,\n",
"                 use_naive_matmul: bool = False):\n",
"        \"\"\"\n",
"        Initialize Dense layer with random weights.\n",
"\n",
"        Args:\n",
"            input_size: Number of input features\n",
"            output_size: Number of output features\n",
"            use_bias: Whether to include bias term\n",
"            use_naive_matmul: Use naive matrix multiplication (for learning)\n",
"        \"\"\"\n",
"        # Store parameters\n",
"        self.input_size = input_size\n",
"        self.output_size = output_size\n",
"        self.use_bias = use_bias\n",
"        self.use_naive_matmul = use_naive_matmul\n",
"\n",
"        # Xavier/Glorot initialization\n",
"        scale = np.sqrt(2.0 / (input_size + output_size))\n",
"        self.weights = np.random.randn(input_size, output_size).astype(np.float32) * scale\n",
"\n",
"        # Initialize bias\n",
"        if use_bias:\n",
"            self.bias = np.zeros(output_size, dtype=np.float32)\n",
"        else:\n",
"            self.bias = None\n",
"\n",
"    def forward(self, x: Tensor) -> Tensor:\n",
"        \"\"\"\n",
"        Forward pass: y = Wx + b\n",
"\n",
"        Args:\n",
"            x: Input tensor of shape (batch_size, input_size)\n",
"\n",
"        Returns:\n",
"            Output tensor of shape (batch_size, output_size)\n",
"        \"\"\"\n",
"        # Matrix multiplication\n",
"        if self.use_naive_matmul:\n",
"            result = matmul_naive(x.data, self.weights)\n",
"        else:\n",
"            result = x.data @ self.weights\n",
"\n",
"        # Add bias\n",
"        if self.use_bias:\n",
"            result += self.bias\n",
"\n",
"        return Tensor(result)\n",
"\n",
"    def __call__(self, x: Tensor) -> Tensor:\n",
"        \"\"\"Make layer callable: layer(x) same as layer.forward(x)\"\"\"\n",
"        return self.forward(x)"
]
},
{
"cell_type": "markdown",
"id": "90197c65",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"### 🧪 Test Your Dense Layer"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d9e4d64",
"metadata": {},
"outputs": [],
"source": [
"# Test Dense layer\n",
"print(\"Testing Dense layer...\")\n",
"\n",
"try:\n",
"    # Test basic Dense layer\n",
"    layer = Dense(input_size=3, output_size=2, use_bias=True)\n",
"    x = Tensor([[1, 2, 3]])  # batch_size=1, input_size=3\n",
"\n",
"    print(f\"✅ Input shape: {x.shape}\")\n",
"    print(f\"✅ Layer weights shape: {layer.weights.shape}\")\n",
"    print(f\"✅ Layer bias shape: {layer.bias.shape}\")\n",
"\n",
"    y = layer(x)\n",
"    print(f\"✅ Output shape: {y.shape}\")\n",
"    print(f\"✅ Output: {y}\")\n",
"\n",
"    # Test without bias\n",
"    layer_no_bias = Dense(input_size=2, output_size=1, use_bias=False)\n",
"    x2 = Tensor([[1, 2]])\n",
"    y2 = layer_no_bias(x2)\n",
"    print(f\"✅ No bias output: {y2}\")\n",
"\n",
"    # Test naive matrix multiplication\n",
"    layer_naive = Dense(input_size=2, output_size=2, use_naive_matmul=True)\n",
"    x3 = Tensor([[1, 2]])\n",
"    y3 = layer_naive(x3)\n",
"    print(f\"✅ Naive matmul output: {y3}\")\n",
"\n",
"    print(\"\\n🎉 All Dense layer tests passed!\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"❌ Error: {e}\")\n",
"    print(\"Make sure to implement the Dense layer above!\")"
]
},
{
"cell_type": "markdown",
"id": "37532e4d",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## Step 4: Composing Layers with Activations\n",
"\n",
"Now let's see how layers work together! A neural network is just layers composed with activation functions.\n",
"\n",
"### Why Layer Composition Matters\n",
"- **Nonlinearity**: Activation functions make networks powerful\n",
"- **Feature learning**: Each layer learns different levels of features\n",
"- **Universal approximation**: Can approximate any function\n",
"- **Modularity**: Easy to experiment with different architectures\n",
"\n",
"### The Pattern\n",
"```\n",
"Input → Dense → Activation → Dense → Activation → Output\n",
"```\n",
"\n",
"### Real-World Example\n",
"```\n",
"Input:      [1, 2, 3] (3 features)\n",
"Dense(3→2): [1.4, 2.8] (linear transformation)\n",
"ReLU:       [1.4, 2.8] (nonlinearity)\n",
"Dense(2→1): [3.2] (final prediction)\n",
"```\n",
"\n",
"Let's build a simple network!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d6e1d85c",
"metadata": {},
"outputs": [],
"source": [
"# Test layer composition\n",
"print(\"Testing layer composition...\")\n",
"\n",
"try:\n",
"    # Create a simple network: Dense → ReLU → Dense\n",
"    dense1 = Dense(input_size=3, output_size=2)\n",
"    relu = ReLU()\n",
"    dense2 = Dense(input_size=2, output_size=1)\n",
"\n",
"    # Test input\n",
"    x = Tensor([[1, 2, 3]])\n",
"    print(f\"✅ Input: {x}\")\n",
"\n",
"    # Forward pass through the network\n",
"    h1 = dense1(x)\n",
"    print(f\"✅ After Dense1: {h1}\")\n",
"\n",
"    h2 = relu(h1)\n",
"    print(f\"✅ After ReLU: {h2}\")\n",
"\n",
"    y = dense2(h2)\n",
"    print(f\"✅ Final output: {y}\")\n",
"\n",
"    print(\"\\n🎉 Layer composition works!\")\n",
"    print(\"This is how neural networks work: layers + activations!\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"❌ Error: {e}\")\n",
"    print(\"Make sure all your layers and activations are working!\")"
]
},
{
"cell_type": "markdown",
"id": "5f2f8a48",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## Step 5: Performance Comparison\n",
"\n",
"Let's compare our naive matrix multiplication with NumPy's optimized version to understand why optimization matters in ML.\n",
"\n",
"### Why Performance Matters\n",
"- **Training time**: Neural networks train for hours/days\n",
"- **Inference speed**: Real-time applications need fast predictions\n",
"- **GPU utilization**: Optimized operations use hardware efficiently\n",
"- **Scalability**: Large models need efficient implementations"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b6f490a2",
"metadata": {},
"outputs": [],
"source": [
"# Performance comparison\n",
"print(\"Comparing naive vs NumPy matrix multiplication...\")\n",
"\n",
"try:\n",
"    import time\n",
"\n",
"    # Create test matrices\n",
"    A = np.random.randn(100, 100).astype(np.float32)\n",
"    B = np.random.randn(100, 100).astype(np.float32)\n",
"\n",
"    # Time naive implementation\n",
"    start_time = time.time()\n",
"    result_naive = matmul_naive(A, B)\n",
"    naive_time = time.time() - start_time\n",
"\n",
"    # Time NumPy implementation\n",
"    start_time = time.time()\n",
"    result_numpy = A @ B\n",
"    numpy_time = time.time() - start_time\n",
"\n",
"    print(f\"✅ Naive time: {naive_time:.4f} seconds\")\n",
"    print(f\"✅ NumPy time: {numpy_time:.4f} seconds\")\n",
"    print(f\"✅ Speedup: {naive_time/numpy_time:.1f}x faster\")\n",
"\n",
"    # Verify correctness (loose atol: float32 matmul vs float64 accumulation)\n",
"    assert np.allclose(result_naive, result_numpy, atol=1e-4), \"Results don't match!\"\n",
"    print(\"✅ Results match!\")\n",
"\n",
"    print(\"\\n💡 This is why we use optimized libraries in production!\")\n",
"\n",
"except Exception as e:\n",
"    print(f\"❌ Error: {e}\")"
]
},
{
"cell_type": "markdown",
"id": "35efc1ca",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 🎯 Module Summary\n",
"\n",
"Congratulations! You've built the foundation of neural network layers:\n",
"\n",
"### What You've Accomplished\n",
"✅ **Matrix Multiplication**: Understanding the core operation \n",
"✅ **Dense Layer**: Linear transformation with weights and bias \n",
"✅ **Layer Composition**: Combining layers with activations \n",
"✅ **Performance Awareness**: Understanding optimization importance \n",
"✅ **Testing**: Immediate feedback on your implementations \n",
"\n",
"### Key Concepts You've Learned\n",
"- **Layers** are functions that transform tensors\n",
"- **Matrix multiplication** powers all neural network computations\n",
"- **Dense layers** perform linear transformations: `y = Wx + b`\n",
"- **Layer composition** creates complex functions from simple building blocks\n",
"- **Performance** matters for real-world ML applications\n",
"\n",
"### What's Next\n",
"In the next modules, you'll build on this foundation:\n",
"- **Networks**: Compose layers into complete models\n",
"- **Training**: Learn parameters with gradients and optimization\n",
"- **Convolutional layers**: Process spatial data like images\n",
"- **Recurrent layers**: Process sequential data like text\n",
"\n",
"### Real-World Connection\n",
"Your Dense layer is now ready to:\n",
"- Learn patterns in data through weight updates\n",
"- Transform features for classification and regression\n",
"- Serve as building blocks for complex architectures\n",
"- Integrate with the rest of the TinyTorch ecosystem\n",
"\n",
"**Ready for the next challenge?** Let's move on to building complete neural networks!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c9187ca",
"metadata": {},
"outputs": [],
"source": [
"# Final verification\n",
"print(\"\\n\" + \"=\"*50)\n",
"print(\"🎉 LAYERS MODULE COMPLETE!\")\n",
"print(\"=\"*50)\n",
"print(\"✅ Matrix multiplication understanding\")\n",
"print(\"✅ Dense layer implementation\")\n",
"print(\"✅ Layer composition with activations\")\n",
"print(\"✅ Performance awareness\")\n",
"print(\"✅ Comprehensive testing\")\n",
"print(\"\\n🚀 Ready to build networks in the next module!\")"
]
}
],
"metadata": {
"jupytext": {
"main_language": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}