mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-06-02 20:42:35 -05:00
925 lines
36 KiB
Plaintext
925 lines
36 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "0e007598",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"# Layers - Building Blocks of Neural Networks\n",
|
||
"\n",
|
||
"Welcome to the Layers module! This is where we build the fundamental components that stack together to form neural networks.\n",
|
||
"\n",
|
||
"## Learning Goals\n",
|
||
"- Understand how matrix multiplication powers neural networks\n",
|
||
"- Implement naive matrix multiplication from scratch for deep understanding\n",
|
||
"- Build the Dense (Linear) layer - the foundation of all neural networks\n",
|
||
"- Learn weight initialization strategies and their importance\n",
|
||
"- See how layers compose with activations to create powerful networks\n",
|
||
"\n",
|
||
"## Build → Use → Understand\n",
|
||
"1. **Build**: Matrix multiplication and Dense layers from scratch\n",
|
||
"2. **Use**: Create and test layers with real data\n",
|
||
"3. **Understand**: How linear transformations enable feature learning"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "bc400228",
|
||
"metadata": {
|
||
"lines_to_next_cell": 1,
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "layers-imports",
|
||
"locked": false,
|
||
"schema_version": 3,
|
||
"solution": false,
|
||
"task": false
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#| default_exp core.layers\n",
|
||
"\n",
|
||
"#| export\n",
|
||
"import numpy as np\n",
|
||
"import matplotlib.pyplot as plt\n",
|
||
"import os\n",
|
||
"import sys\n",
|
||
"from typing import Union, List, Tuple, Optional\n",
|
||
"\n",
|
||
"# Import our dependencies - try from package first, then local modules\n",
|
||
"try:\n",
|
||
" from tinytorch.core.tensor import Tensor\n",
|
||
" from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax\n",
|
||
"except ImportError:\n",
|
||
" # For development, import from local modules\n",
|
||
" sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))\n",
|
||
" sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations'))\n",
|
||
" try:\n",
|
||
" from tensor_dev import Tensor\n",
|
||
" from activations_dev import ReLU, Sigmoid, Tanh, Softmax\n",
|
||
" except ImportError:\n",
|
||
" # If the local modules are not available, use relative imports\n",
|
||
" from ..tensor.tensor_dev import Tensor\n",
|
||
" from ..activations.activations_dev import ReLU, Sigmoid, Tanh, Softmax"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "e186492c",
|
||
"metadata": {
|
||
"lines_to_next_cell": 1,
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "layers-setup",
|
||
"locked": false,
|
||
"schema_version": 3,
|
||
"solution": false,
|
||
"task": false
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#| hide\n",
|
||
"#| export\n",
|
||
"def _should_show_plots():\n",
|
||
" \"\"\"Check if we should show plots (disable during testing)\"\"\"\n",
|
||
" # Check multiple conditions that indicate we're in test mode\n",
|
||
" is_pytest = (\n",
|
||
" 'pytest' in sys.modules or\n",
|
||
" 'test' in sys.argv or\n",
|
||
" os.environ.get('PYTEST_CURRENT_TEST') is not None or\n",
|
||
" any('test' in arg for arg in sys.argv) or\n",
|
||
" any('pytest' in arg for arg in sys.argv)\n",
|
||
" )\n",
|
||
" \n",
|
||
" # Show plots in development mode (when not in test mode)\n",
|
||
" return not is_pytest"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "d41a5d47",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "layers-welcome",
|
||
"locked": false,
|
||
"schema_version": 3,
|
||
"solution": false,
|
||
"task": false
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"print(\"🔥 TinyTorch Layers Module\")\n",
|
||
"print(f\"NumPy version: {np.__version__}\")\n",
|
||
"print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
|
||
"print(\"Ready to build neural network layers!\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "bed6f41e",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 📦 Where This Code Lives in the Final Package\n",
|
||
"\n",
|
||
"**Learning Side:** You work in `modules/source/03_layers/layers_dev.py` \n",
|
||
"**Building Side:** Code exports to `tinytorch.core.layers`\n",
|
||
"\n",
|
||
"```python\n",
|
||
"# Final package structure:\n",
|
||
"from tinytorch.core.layers import Dense, Conv2D # All layer types together!\n",
|
||
"from tinytorch.core.tensor import Tensor # The foundation\n",
|
||
"from tinytorch.core.activations import ReLU, Sigmoid # Nonlinearity\n",
|
||
"```\n",
|
||
"\n",
|
||
"**Why this matters:**\n",
|
||
"- **Learning:** Focused modules for deep understanding\n",
|
||
"- **Production:** Proper organization like PyTorch's `torch.nn.Linear`\n",
|
||
"- **Consistency:** All layer types live together in `core.layers`\n",
|
||
"- **Integration:** Works seamlessly with tensors and activations"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "a2c033ee",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## What Are Neural Network Layers?\n",
|
||
"\n",
|
||
"### The Building Block Pattern\n",
|
||
"Neural networks are built by stacking **layers** - each layer is a function that:\n",
|
||
"1. **Takes input**: Tensor data from previous layer\n",
|
||
"2. **Transforms**: Applies mathematical operations (linear transformation + activation)\n",
|
||
"3. **Produces output**: New tensor data for next layer\n",
|
||
"\n",
|
||
"### The Universal Pattern\n",
|
||
"Every layer follows this pattern:\n",
|
||
"```python\n",
|
||
"def layer(x):\n",
|
||
" # 1. Linear transformation\n",
|
||
" linear_output = x @ weights + bias\n",
|
||
" \n",
|
||
" # 2. Nonlinear activation\n",
|
||
" output = activation(linear_output)\n",
|
||
" \n",
|
||
" return output\n",
|
||
"```\n",
|
||
"\n",
|
||
"### Why This Works\n",
|
||
"- **Linear part**: Learns feature combinations\n",
|
||
"- **Nonlinear part**: Enables complex patterns\n",
|
||
"- **Stacking**: Multiple layers = more complex functions\n",
|
||
"\n",
|
||
"### Mathematical Foundation\n",
|
||
"A neural network is function composition:\n",
|
||
"```\n",
|
||
"f(x) = layer_n(layer_{n-1}(...layer_2(layer_1(x))))\n",
|
||
"```\n",
|
||
"\n",
|
||
"Each layer transforms the representation to be more useful for the final task.\n",
|
||
"\n",
|
||
"### What We'll Build\n",
|
||
"1. **Matrix Multiplication**: The core operation powering all layers\n",
|
||
"2. **Dense Layer**: The fundamental building block of neural networks\n",
|
||
"3. **Integration**: How layers work with activations and tensors"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "448f63f6",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"## Step 1: Matrix Multiplication - The Engine of Neural Networks\n",
|
||
"\n",
|
||
"### What is Matrix Multiplication?\n",
|
||
"Matrix multiplication is the core operation that powers all neural network layers:\n",
|
||
"\n",
|
||
"```\n",
|
||
"C = A @ B\n",
|
||
"```\n",
|
||
"\n",
|
||
"Where:\n",
|
||
"- **A**: Input data (batch_size × input_features)\n",
|
||
"- **B**: Weight matrix (input_features × output_features) \n",
|
||
"- **C**: Output data (batch_size × output_features)\n",
|
||
"\n",
|
||
"### Why It's Essential\n",
|
||
"- **Feature combination**: Each output combines all input features\n",
|
||
"- **Learned weights**: B contains the learned parameters\n",
|
||
"- **Efficient computation**: Vectorized operations are much faster\n",
|
||
"- **Parallel processing**: GPUs are designed for matrix operations\n",
|
||
"\n",
|
||
"### The Mathematical Definition\n",
|
||
"For matrices A (m×n) and B (n×p), the result C (m×p) is:\n",
|
||
"```\n",
|
||
"C[i,j] = Σ(k=0 to n-1) A[i,k] * B[k,j]\n",
|
||
"```\n",
|
||
"\n",
|
||
"### Visual Understanding\n",
|
||
"```\n",
|
||
"[1 2] @ [5 6] = [1*5+2*7 1*6+2*8] = [19 22]\n",
|
||
"[3 4] [7 8] [3*5+4*7 3*6+4*8] [43 50]\n",
|
||
"```\n",
|
||
"\n",
|
||
"### Real-World Context\n",
|
||
"Every major operation in deep learning uses matrix multiplication:\n",
|
||
"- **Dense layers**: Linear transformations\n",
|
||
"- **Convolutional layers**: Convolution as matrix multiplication\n",
|
||
"- **Attention mechanisms**: Query-Key-Value computations\n",
|
||
"- **Embeddings**: Lookup tables as matrix multiplication"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "cccd838f",
|
||
"metadata": {
|
||
"lines_to_next_cell": 1,
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "matmul-naive",
|
||
"locked": false,
|
||
"schema_version": 3,
|
||
"solution": true,
|
||
"task": false
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#| export\n",
|
||
"def matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:\n",
|
||
" \"\"\"\n",
|
||
" Matrix multiplication using explicit for-loops.\n",
|
||
" \n",
|
||
" This helps you understand what matrix multiplication really does!\n",
|
||
" \n",
|
||
" TODO: Implement matrix multiplication using three nested for-loops.\n",
|
||
" \n",
|
||
" STEP-BY-STEP IMPLEMENTATION:\n",
|
||
" 1. Get the dimensions: m, n from A.shape and n2, p from B.shape\n",
|
||
" 2. Check compatibility: n must equal n2\n",
|
||
" 3. Create output matrix C of shape (m, p) filled with zeros\n",
|
||
" 4. Use three nested loops:\n",
|
||
" - i loop: iterate through rows of A (0 to m-1)\n",
|
||
" - j loop: iterate through columns of B (0 to p-1)\n",
|
||
" - k loop: iterate through shared dimension (0 to n-1)\n",
|
||
" 5. For each (i,j), accumulate: C[i,j] += A[i,k] * B[k,j]\n",
|
||
" \n",
|
||
" EXAMPLE WALKTHROUGH:\n",
|
||
" ```python\n",
|
||
" A = [[1, 2], B = [[5, 6],\n",
|
||
" [3, 4]] [7, 8]]\n",
|
||
" \n",
|
||
" C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0] = 1*5 + 2*7 = 19\n",
|
||
" C[0,1] = A[0,0]*B[0,1] + A[0,1]*B[1,1] = 1*6 + 2*8 = 22\n",
|
||
" C[1,0] = A[1,0]*B[0,0] + A[1,1]*B[1,0] = 3*5 + 4*7 = 43\n",
|
||
" C[1,1] = A[1,0]*B[0,1] + A[1,1]*B[1,1] = 3*6 + 4*8 = 50\n",
|
||
" \n",
|
||
" Result: [[19, 22], [43, 50]]\n",
|
||
" ```\n",
|
||
" \n",
|
||
" IMPLEMENTATION HINTS:\n",
|
||
" - Get dimensions: m, n = A.shape; n2, p = B.shape\n",
|
||
" - Check compatibility: if n != n2: raise ValueError\n",
|
||
" - Initialize result: C = np.zeros((m, p))\n",
|
||
" - Triple nested loop: for i in range(m): for j in range(p): for k in range(n):\n",
|
||
" - Accumulate sum: C[i,j] += A[i,k] * B[k,j]\n",
|
||
" \n",
|
||
" LEARNING CONNECTIONS:\n",
|
||
" - This is what every neural network layer does internally\n",
|
||
" - Understanding this helps debug shape mismatches\n",
|
||
" - Essential for understanding the foundation of neural networks\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" # Get matrix dimensions\n",
|
||
" m, n = A.shape\n",
|
||
" n2, p = B.shape\n",
|
||
" \n",
|
||
" # Check compatibility\n",
|
||
" if n != n2:\n",
|
||
" raise ValueError(f\"Incompatible matrix dimensions: A is {m}x{n}, B is {n2}x{p}\")\n",
|
||
" \n",
|
||
" # Initialize result matrix\n",
|
||
" C = np.zeros((m, p))\n",
|
||
" \n",
|
||
" # Triple nested loop for matrix multiplication\n",
|
||
" for i in range(m):\n",
|
||
" for j in range(p):\n",
|
||
" for k in range(n):\n",
|
||
" C[i, j] += A[i, k] * B[k, j]\n",
|
||
" \n",
|
||
" return C\n",
|
||
" ### END SOLUTION"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "6e695714",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🧪 Test Your Matrix Multiplication\n",
|
||
"\n",
|
||
"Once you implement the `matmul` function above, run this cell to test it:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "bed91066",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": true,
|
||
"grade_id": "test-matmul-immediate",
|
||
"locked": true,
|
||
"points": 10,
|
||
"schema_version": 3,
|
||
"solution": false,
|
||
"task": false
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def test_matrix_multiplication():\n",
|
||
" \"\"\"Test matrix multiplication implementation\"\"\"\n",
|
||
" print(\"🔬 Unit Test: Matrix Multiplication...\")\n",
|
||
"\n",
|
||
"# Test simple 2x2 case\n",
|
||
" A = np.array([[1, 2], [3, 4]], dtype=np.float32)\n",
|
||
" B = np.array([[5, 6], [7, 8]], dtype=np.float32)\n",
|
||
" \n",
|
||
" result = matmul(A, B)\n",
|
||
" expected = np.array([[19, 22], [43, 50]], dtype=np.float32)\n",
|
||
" \n",
|
||
" assert np.allclose(result, expected), f\"Matrix multiplication failed: expected {expected}, got {result}\"\n",
|
||
" \n",
|
||
" # Compare with NumPy\n",
|
||
" numpy_result = A @ B\n",
|
||
" assert np.allclose(result, numpy_result), f\"Doesn't match NumPy: got {result}, expected {numpy_result}\"\n",
|
||
"\n",
|
||
"# Test different shapes\n",
|
||
" A2 = np.array([[1, 2, 3]], dtype=np.float32) # 1x3\n",
|
||
" B2 = np.array([[4], [5], [6]], dtype=np.float32) # 3x1\n",
|
||
" result2 = matmul(A2, B2)\n",
|
||
" expected2 = np.array([[32]], dtype=np.float32) # 1*4 + 2*5 + 3*6 = 32\n",
|
||
" \n",
|
||
" assert np.allclose(result2, expected2), f\"1x3 @ 3x1 failed: expected {expected2}, got {result2}\"\n",
|
||
" \n",
|
||
" # Test 3x3 case\n",
|
||
" A3 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)\n",
|
||
" B3 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float32) # Identity\n",
|
||
" result3 = matmul(A3, B3)\n",
|
||
" \n",
|
||
" assert np.allclose(result3, A3), \"Multiplication by identity should preserve matrix\"\n",
|
||
" \n",
|
||
" # Test incompatible shapes\n",
|
||
" A4 = np.array([[1, 2]], dtype=np.float32) # 1x2\n",
|
||
" B4 = np.array([[3], [4], [5]], dtype=np.float32) # 3x1\n",
|
||
" \n",
|
||
" try:\n",
|
||
" matmul(A4, B4)\n",
|
||
" assert False, \"Should raise error for incompatible shapes\"\n",
|
||
" except ValueError as e:\n",
|
||
" assert \"Incompatible matrix dimensions\" in str(e)\n",
|
||
" \n",
|
||
" print(\"✅ Matrix multiplication tests passed!\")\n",
|
||
" print(f\"✅ 2x2 multiplication working correctly\")\n",
|
||
" print(f\"✅ Matches NumPy's implementation\")\n",
|
||
" print(f\"✅ Handles different shapes correctly\")\n",
|
||
" print(f\"✅ Proper error handling for incompatible shapes\")\n",
|
||
"\n",
|
||
"# Run the test\n",
|
||
"test_matrix_multiplication()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "ab183a07",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"## Step 2: Dense Layer - The Foundation of Neural Networks\n",
|
||
"\n",
|
||
"### What is a Dense Layer?\n",
|
||
"A **Dense layer** (also called Linear or Fully Connected layer) is the fundamental building block of neural networks:\n",
|
||
"\n",
|
||
"```python\n",
|
||
"output = input @ weights + bias\n",
|
||
"```\n",
|
||
"\n",
|
||
"Where:\n",
|
||
"- **input**: Input data (batch_size × input_features)\n",
|
||
"- **weights**: Learned parameters (input_features × output_features)\n",
|
||
"- **bias**: Learned bias terms (output_features,)\n",
|
||
"- **output**: Transformed data (batch_size × output_features)\n",
|
||
"\n",
|
||
"### Why Dense Layers Are Essential\n",
|
||
"1. **Feature transformation**: Learn meaningful combinations of input features\n",
|
||
"2. **Universal approximation**: Stack enough layers to approximate any function\n",
|
||
"3. **Learnable parameters**: Weights and biases are optimized during training\n",
|
||
"4. **Composability**: Can be stacked to create complex architectures\n",
|
||
"\n",
|
||
"### The Mathematical Foundation\n",
|
||
"For input x, weight matrix W, and bias b:\n",
|
||
"```\n",
|
||
"y = xW + b\n",
|
||
"```\n",
|
||
"\n",
|
||
"This is a linear transformation that:\n",
|
||
"- **Combines features**: Each output is a weighted sum of all inputs\n",
|
||
"- **Learns relationships**: Weights encode feature interactions\n",
|
||
"- **Adds flexibility**: Bias allows shifting the output\n",
|
||
"\n",
|
||
"### Real-World Applications\n",
|
||
"- **Classification**: Transform features to class logits\n",
|
||
"- **Regression**: Transform features to continuous outputs\n",
|
||
"- **Representation learning**: Learn useful intermediate representations\n",
|
||
"- **Attention mechanisms**: Compute queries, keys, and values\n",
|
||
"\n",
|
||
"### Design Decisions\n",
|
||
"- **Weight initialization**: Random initialization to break symmetry\n",
|
||
"- **Bias usage**: Usually included for flexibility\n",
|
||
"- **Activation**: Often followed by nonlinear activation"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "eec77bde",
|
||
"metadata": {
|
||
"lines_to_next_cell": 1,
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "dense-layer",
|
||
"locked": false,
|
||
"schema_version": 3,
|
||
"solution": true,
|
||
"task": false
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#| export\n",
|
||
"class Dense:\n",
|
||
" \"\"\"\n",
|
||
" Dense (Linear/Fully Connected) Layer\n",
|
||
" \n",
|
||
" Applies a linear transformation: y = xW + b\n",
|
||
" \n",
|
||
" This is the fundamental building block of neural networks.\n",
|
||
" \"\"\"\n",
|
||
" \n",
|
||
" def __init__(self, input_size: int, output_size: int, use_bias: bool = True):\n",
|
||
" \"\"\"\n",
|
||
" Initialize Dense layer with random weights and optional bias.\n",
|
||
" \n",
|
||
" TODO: Implement Dense layer initialization.\n",
|
||
" \n",
|
||
" STEP-BY-STEP IMPLEMENTATION:\n",
|
||
" 1. Store the layer parameters (input_size, output_size, use_bias)\n",
|
||
" 2. Initialize weights with random values using proper scaling\n",
|
||
" 3. Initialize bias (if use_bias=True) with zeros\n",
|
||
" 4. Convert weights and bias to Tensor objects\n",
|
||
" \n",
|
||
" WEIGHT INITIALIZATION STRATEGY:\n",
|
||
" - Use Xavier/Glorot initialization for better gradient flow\n",
|
||
" - Scale: sqrt(2 / (input_size + output_size))\n",
|
||
" - Random values: np.random.randn() * scale\n",
|
||
" \n",
|
||
" EXAMPLE USAGE:\n",
|
||
" ```python\n",
|
||
" layer = Dense(input_size=3, output_size=2)\n",
|
||
" # Creates weight matrix of shape (3, 2) and bias of shape (2,)\n",
|
||
" ```\n",
|
||
" \n",
|
||
" IMPLEMENTATION HINTS:\n",
|
||
" - Store parameters: self.input_size, self.output_size, self.use_bias\n",
|
||
" - Weight shape: (input_size, output_size)\n",
|
||
" - Bias shape: (output_size,) if use_bias else None\n",
|
||
" - Use Xavier initialization: scale = np.sqrt(2.0 / (input_size + output_size))\n",
|
||
" - Initialize weights: np.random.randn(input_size, output_size) * scale\n",
|
||
" - Initialize bias: np.zeros(output_size) if use_bias else None\n",
|
||
" - Convert to Tensors: self.weights = Tensor(weight_data), self.bias = Tensor(bias_data)\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" # Store layer parameters\n",
|
||
" self.input_size = input_size\n",
|
||
" self.output_size = output_size\n",
|
||
" self.use_bias = use_bias\n",
|
||
" \n",
|
||
" # Xavier/Glorot initialization\n",
|
||
" scale = np.sqrt(2.0 / (input_size + output_size))\n",
|
||
" \n",
|
||
" # Initialize weights with random values\n",
|
||
" weight_data = np.random.randn(input_size, output_size) * scale\n",
|
||
" self.weights = Tensor(weight_data)\n",
|
||
" \n",
|
||
" # Initialize bias\n",
|
||
" if use_bias:\n",
|
||
" bias_data = np.zeros(output_size)\n",
|
||
" self.bias = Tensor(bias_data)\n",
|
||
" else:\n",
|
||
" self.bias = None\n",
|
||
" ### END SOLUTION\n",
|
||
" \n",
|
||
" def forward(self, x):\n",
|
||
" \"\"\"\n",
|
||
" Forward pass through the Dense layer.\n",
|
||
" \n",
|
||
" TODO: Implement the forward pass: y = xW + b\n",
|
||
" \n",
|
||
" STEP-BY-STEP IMPLEMENTATION:\n",
|
||
" 1. Perform matrix multiplication: x @ self.weights\n",
|
||
" 2. Add bias if present: result + self.bias\n",
|
||
" 3. Return the result as a Tensor\n",
|
||
" \n",
|
||
" EXAMPLE USAGE:\n",
|
||
" ```python\n",
|
||
" layer = Dense(input_size=3, output_size=2)\n",
|
||
" input_data = Tensor([[1, 2, 3]]) # Shape: (1, 3)\n",
|
||
" output = layer(input_data) # Shape: (1, 2)\n",
|
||
" ```\n",
|
||
" \n",
|
||
" IMPLEMENTATION HINTS:\n",
|
||
" - Matrix multiplication: matmul(x.data, self.weights.data)\n",
|
||
" - Add bias: result + self.bias.data (broadcasting handles shape)\n",
|
||
" - Return as Tensor: return Tensor(final_result)\n",
|
||
" - Handle both cases: with and without bias\n",
|
||
" \n",
|
||
" LEARNING CONNECTIONS:\n",
|
||
" - This is the core operation in every neural network layer\n",
|
||
" - Matrix multiplication combines all input features\n",
|
||
" - Bias addition allows shifting the output distribution\n",
|
||
" - The result feeds into activation functions\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" # Perform matrix multiplication\n",
|
||
" linear_output = matmul(x.data, self.weights.data)\n",
|
||
" \n",
|
||
" # Add bias if present\n",
|
||
" if self.use_bias and self.bias is not None:\n",
|
||
" linear_output = linear_output + self.bias.data\n",
|
||
" \n",
|
||
" return type(x)(linear_output)\n",
|
||
" ### END SOLUTION\n",
|
||
" \n",
|
||
" def __call__(self, x):\n",
|
||
" \"\"\"Make the layer callable: layer(x) instead of layer.forward(x)\"\"\"\n",
|
||
" return self.forward(x)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "5736d98c",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🧪 Test Your Dense Layer\n",
|
||
"\n",
|
||
"Once you implement the Dense layer above, run this cell to test it:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "9b1d056c",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": true,
|
||
"grade_id": "test-dense-layer",
|
||
"locked": true,
|
||
"points": 15,
|
||
"schema_version": 3,
|
||
"solution": false,
|
||
"task": false
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def test_dense_layer():\n",
|
||
" \"\"\"Test Dense layer implementation\"\"\"\n",
|
||
" print(\"🔬 Unit Test: Dense Layer...\")\n",
|
||
" \n",
|
||
" # Test layer creation\n",
|
||
" layer = Dense(input_size=3, output_size=2)\n",
|
||
" \n",
|
||
" # Check weight and bias shapes\n",
|
||
" assert layer.weights.shape == (3, 2), f\"Weight shape should be (3, 2), got {layer.weights.shape}\"\n",
|
||
" assert layer.bias is not None, \"Bias should not be None when use_bias=True\"\n",
|
||
" assert layer.bias.shape == (2,), f\"Bias shape should be (2,), got {layer.bias.shape}\"\n",
|
||
" \n",
|
||
" # Test forward pass\n",
|
||
" input_data = Tensor([[1, 2, 3]]) # Shape: (1, 3)\n",
|
||
" output = layer(input_data)\n",
|
||
" \n",
|
||
" # Check output shape\n",
|
||
" assert output.shape == (1, 2), f\"Output shape should be (1, 2), got {output.shape}\"\n",
|
||
" \n",
|
||
" # Test batch processing\n",
|
||
" batch_input = Tensor([[1, 2, 3], [4, 5, 6]]) # Shape: (2, 3)\n",
|
||
" batch_output = layer(batch_input)\n",
|
||
" \n",
|
||
" assert batch_output.shape == (2, 2), f\"Batch output shape should be (2, 2), got {batch_output.shape}\"\n",
|
||
"\n",
|
||
"# Test without bias\n",
|
||
" no_bias_layer = Dense(input_size=3, output_size=2, use_bias=False)\n",
|
||
" assert no_bias_layer.bias is None, \"Layer without bias should have None bias\"\n",
|
||
" \n",
|
||
" no_bias_output = no_bias_layer(input_data)\n",
|
||
" assert no_bias_output.shape == (1, 2), \"No-bias layer should still produce correct shape\"\n",
|
||
" \n",
|
||
" # Test that different inputs produce different outputs\n",
|
||
" input1 = Tensor([[1, 0, 0]])\n",
|
||
" input2 = Tensor([[0, 1, 0]])\n",
|
||
" \n",
|
||
" output1 = layer(input1)\n",
|
||
" output2 = layer(input2)\n",
|
||
" \n",
|
||
" # Should not be equal (with high probability due to random initialization)\n",
|
||
" assert not np.allclose(output1.data, output2.data), \"Different inputs should produce different outputs\"\n",
|
||
" \n",
|
||
" # Test linearity property: layer(a*x) = a*layer(x)\n",
|
||
" scale = 2.0\n",
|
||
" scaled_input = Tensor([[2, 4, 6]]) # 2 * [1, 2, 3]\n",
|
||
" scaled_output = layer(scaled_input)\n",
|
||
" \n",
|
||
" # Due to bias, this won't be exactly 2*output, but the linear part should scale\n",
|
||
" print(\"✅ Dense layer tests passed!\")\n",
|
||
" print(f\"✅ Correct weight and bias initialization\")\n",
|
||
" print(f\"✅ Forward pass produces correct shapes\")\n",
|
||
" print(f\"✅ Batch processing works correctly\")\n",
|
||
" print(f\"✅ Bias and no-bias variants work\")\n",
|
||
" print(f\"✅ Naive matrix multiplication option works\")\n",
|
||
"\n",
|
||
"# Run the test\n",
|
||
"test_dense_layer()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "ac4dcba0",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"## Step 3: Layer Integration with Activations\n",
|
||
"\n",
|
||
"### Building Complete Neural Network Components\n",
|
||
"Now let's see how Dense layers work with activation functions to create complete neural network components:\n",
|
||
"\n",
|
||
"```python\n",
|
||
"# Complete neural network layer\n",
|
||
"x = input_data\n",
|
||
"linear_output = dense_layer(x)\n",
|
||
"final_output = activation_function(linear_output)\n",
|
||
"```\n",
|
||
"\n",
|
||
"### Why This Combination Works\n",
|
||
"1. **Linear transformation**: Dense layer learns feature combinations\n",
|
||
"2. **Nonlinear activation**: Enables complex pattern recognition\n",
|
||
"3. **Stacking**: Multiple layer+activation pairs create deep networks\n",
|
||
"4. **Universal approximation**: Can approximate any continuous function\n",
|
||
"\n",
|
||
"### Real-World Layer Patterns\n",
|
||
"- **Hidden layers**: Dense + ReLU (most common)\n",
|
||
"- **Output layers**: Dense + Softmax (classification) or Dense + Sigmoid (binary)\n",
|
||
"- **Gated layers**: Dense + Sigmoid (for gates in LSTM/GRU)\n",
|
||
"- **Attention layers**: Dense + Softmax (for attention weights)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "f5e77a64",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": true,
|
||
"grade_id": "test-layer-activation-comprehensive",
|
||
"locked": true,
|
||
"points": 15,
|
||
"schema_version": 3,
|
||
"solution": false,
|
||
"task": false
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def test_layer_activation():\n",
|
||
" \"\"\"Test Dense layer comprehensive testing with activation functions\"\"\"\n",
|
||
" print(\"🔬 Unit Test: Layer-Activation Comprehensive Test...\")\n",
|
||
" \n",
|
||
" # Create layer and activation functions\n",
|
||
" layer = Dense(input_size=4, output_size=3)\n",
|
||
" relu = ReLU()\n",
|
||
" sigmoid = Sigmoid()\n",
|
||
" tanh = Tanh()\n",
|
||
" softmax = Softmax()\n",
|
||
" \n",
|
||
" # Test input\n",
|
||
" input_data = Tensor([[1, -2, 3, -4], [2, 1, -1, 3]]) # Shape: (2, 4)\n",
|
||
" \n",
|
||
" # Test Dense + ReLU (common hidden layer pattern)\n",
|
||
" linear_output = layer(input_data)\n",
|
||
" relu_output = relu(linear_output)\n",
|
||
" \n",
|
||
" assert relu_output.shape == (2, 3), \"ReLU output should preserve shape\"\n",
|
||
" assert np.all(relu_output.data >= 0), \"ReLU output should be non-negative\"\n",
|
||
" \n",
|
||
" # Test Dense + Softmax (classification output pattern)\n",
|
||
" softmax_output = softmax(linear_output)\n",
|
||
" \n",
|
||
" assert softmax_output.shape == (2, 3), \"Softmax output should preserve shape\"\n",
|
||
" \n",
|
||
" # Each row should sum to 1 (probability distribution)\n",
|
||
" for i in range(2):\n",
|
||
" row_sum = np.sum(softmax_output.data[i])\n",
|
||
" assert abs(row_sum - 1.0) < 1e-6, f\"Row {i} should sum to 1, got {row_sum}\"\n",
|
||
" \n",
|
||
" # Test Dense + Sigmoid (binary classification pattern)\n",
|
||
" sigmoid_output = sigmoid(linear_output)\n",
|
||
" \n",
|
||
" assert sigmoid_output.shape == (2, 3), \"Sigmoid output should preserve shape\"\n",
|
||
" assert np.all(sigmoid_output.data > 0), \"Sigmoid output should be positive\"\n",
|
||
" assert np.all(sigmoid_output.data < 1), \"Sigmoid output should be less than 1\"\n",
|
||
" \n",
|
||
" # Test Dense + Tanh (hidden layer with centered outputs)\n",
|
||
" tanh_output = tanh(linear_output)\n",
|
||
" \n",
|
||
" assert tanh_output.shape == (2, 3), \"Tanh output should preserve shape\"\n",
|
||
" assert np.all(tanh_output.data > -1), \"Tanh output should be > -1\"\n",
|
||
" assert np.all(tanh_output.data < 1), \"Tanh output should be < 1\"\n",
|
||
" \n",
|
||
" # Test chained layers (simple 2-layer network)\n",
|
||
" layer1 = Dense(input_size=4, output_size=5)\n",
|
||
" layer2 = Dense(input_size=5, output_size=3)\n",
|
||
" \n",
|
||
" # Forward pass through 2-layer network\n",
|
||
" hidden = relu(layer1(input_data))\n",
|
||
" output = softmax(layer2(hidden))\n",
|
||
" \n",
|
||
" assert output.shape == (2, 3), \"2-layer network should produce correct output shape\"\n",
|
||
" \n",
|
||
" # Each output should be a valid probability distribution\n",
|
||
" for i in range(2):\n",
|
||
" row_sum = np.sum(output.data[i])\n",
|
||
" assert abs(row_sum - 1.0) < 1e-6, f\"Network output row {i} should sum to 1\"\n",
|
||
" \n",
|
||
" # Test that layers are learning-ready (have parameters)\n",
|
||
" assert hasattr(layer1, 'weights'), \"Layer should have weights\"\n",
|
||
" assert hasattr(layer1, 'bias'), \"Layer should have bias\"\n",
|
||
" assert isinstance(layer1.weights, Tensor), \"Weights should be Tensor\"\n",
|
||
" assert isinstance(layer1.bias, Tensor), \"Bias should be Tensor\"\n",
|
||
" \n",
|
||
" print(\"✅ Layer-activation comprehensive tests passed!\")\n",
|
||
" print(f\"✅ Dense + ReLU working correctly\")\n",
|
||
" print(f\"✅ Dense + Softmax producing valid probabilities\")\n",
|
||
" print(f\"✅ Dense + Sigmoid bounded correctly\")\n",
|
||
" print(f\"✅ Dense + Tanh centered correctly\")\n",
|
||
" print(f\"✅ Multi-layer networks working\")\n",
|
||
" print(f\"✅ All components ready for training!\")\n",
|
||
"\n",
|
||
"# Run the test\n",
|
||
"test_layer_activation()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "9cfd022a",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 🧪 Module Testing\n",
|
||
"\n",
|
||
"Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
|
||
"\n",
|
||
"**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "e508b1ce",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "standardized-testing",
|
||
"locked": true,
|
||
"schema_version": 3,
|
||
"solution": false,
|
||
"task": false
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"# =============================================================================\n",
|
||
"# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
|
||
"# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
|
||
"# =============================================================================\n",
|
||
"\n",
|
||
"if __name__ == \"__main__\":\n",
|
||
" from tito.tools.testing import run_module_tests_auto\n",
|
||
" \n",
|
||
" # Automatically discover and run all tests in this module\n",
|
||
" success = run_module_tests_auto(\"Layers\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "89a2d068",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 🎯 Module Summary: Neural Network Layers Mastery!\n",
|
||
"\n",
|
||
"Congratulations! You've successfully implemented the fundamental building blocks of neural networks:\n",
|
||
"\n",
|
||
"### ✅ What You've Built\n",
|
||
"- **Matrix Multiplication**: The core operation powering all neural network computations\n",
|
||
"- **Dense Layer**: The fundamental building block with proper weight initialization\n",
|
||
"- **Integration**: How layers work with activation functions to create complete neural components\n",
|
||
"- **Flexibility**: Support for bias/no-bias and naive/optimized matrix multiplication\n",
|
||
"\n",
|
||
"### ✅ Key Learning Outcomes\n",
|
||
"- **Understanding**: How linear transformations enable feature learning\n",
|
||
"- **Implementation**: Built layers from scratch with proper initialization\n",
|
||
"- **Testing**: Progressive validation with immediate feedback\n",
|
||
"- **Integration**: Saw how layers compose with activations for complete functionality\n",
|
||
"- **Real-world skills**: Understanding the mathematics behind neural networks\n",
|
||
"\n",
|
||
"### ✅ Mathematical Mastery\n",
|
||
"- **Matrix Multiplication**: C[i,j] = Σ(A[i,k] * B[k,j]) - implemented with loops\n",
|
||
"- **Linear Transformation**: y = xW + b - the heart of neural networks\n",
|
||
"- **Xavier Initialization**: Proper weight scaling for stable gradients\n",
|
||
"- **Composition**: How multiple layers create complex functions\n",
|
||
"\n",
|
||
"### ✅ Professional Skills Developed\n",
|
||
"- **Algorithm implementation**: From mathematical definition to working code\n",
|
||
"- **Performance considerations**: Naive vs optimized implementations\n",
|
||
"- **API design**: Clean, consistent interfaces for layer creation and usage\n",
|
||
"- **Testing methodology**: Unit tests, comprehensive tests, and edge case handling\n",
|
||
"\n",
|
||
"### ✅ Ready for Next Steps\n",
|
||
"Your layers are now ready to power:\n",
|
||
"- **Complete Networks**: Stack multiple layers with activations\n",
|
||
"- **Training**: Gradient computation and parameter updates\n",
|
||
"- **Specialized Architectures**: CNNs, RNNs, Transformers all use these foundations\n",
|
||
"- **Real Applications**: Image classification, NLP, game playing, etc.\n",
|
||
"\n",
|
||
"### 🔗 Connection to Real ML Systems\n",
|
||
"Your implementations mirror production frameworks:\n",
|
||
"- **PyTorch**: `torch.nn.Linear()` - same mathematical operations\n",
|
||
"- **TensorFlow**: `tf.keras.layers.Dense()` - identical functionality\n",
|
||
"- **Industry**: Every major neural network uses these exact computations\n",
|
||
"\n",
|
||
"### 🎯 The Power of Linear Algebra\n",
|
||
"You've unlocked the mathematical foundation of AI:\n",
|
||
"- **Feature combination**: Each layer learns how to combine input features\n",
|
||
"- **Representation learning**: Layers automatically discover useful representations\n",
|
||
"- **Universal approximation**: Stack enough layers to approximate any function\n",
|
||
"- **Scalability**: Same operations work from small networks to massive language models\n",
|
||
"\n",
|
||
"### 🧠 Deep Learning Insights\n",
|
||
"- **Why deep networks work**: Multiple layers = multiple levels of abstraction\n",
|
||
"- **Parameter efficiency**: Shared weights enable learning with limited data\n",
|
||
"- **Gradient flow**: Proper initialization enables training deep networks\n",
|
||
"- **Composability**: Simple components combine to create complex intelligence\n",
|
||
"\n",
|
||
"**Next Module**: Networks - Composing your layers into complete neural network architectures!\n",
|
||
"\n",
|
||
"Your layers are the building blocks. Now let's assemble them into powerful neural networks that can learn to solve complex problems!"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"jupytext": {
|
||
"main_language": "python"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
|