🧹 Remove Jupyter notebooks from modules/source - Python-first workflow

- Delete all 15 .ipynb files from modules/source directories
- Align with TinyTorch's Python-first development philosophy
- .py files are the source of truth; .ipynb files are temporary outputs
- Prevents version control conflicts with notebook metadata
- Students work directly with .py files using Jupytext format
- Notebooks can be regenerated when needed via 'tito nbdev generate'

Removed files:
- All *_dev.ipynb files across modules 01-15
- Keeps repository clean and focused on source code
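
To illustrate why the paired .py file is enough to rebuild a notebook, here is a minimal sketch that splits a percent-format script into cells and emits nbformat 4 JSON using only the standard library. It assumes the `# %%` and `# %% [markdown]` cell markers of Jupytext's percent format with markdown stored as comment lines; the helper name `percent_to_notebook` is hypothetical, and this is not how `tito nbdev generate` is implemented.

```python
import json


def percent_to_notebook(source: str) -> dict:
    """Split a percent-format script into notebook cells (simplified sketch)."""
    cells = []
    kind, lines = None, []

    def flush():
        # Emit the cell collected so far, if any.
        if kind is None:
            return
        text = "\n".join(lines).strip("\n")
        if kind == "markdown":
            # Markdown is stored as comments; drop the leading "# " marker.
            text = "\n".join(
                l[2:] if l.startswith("# ") else l.lstrip("#")
                for l in text.split("\n")
            )
            cells.append({"cell_type": "markdown", "metadata": {}, "source": text})
        else:
            cells.append({"cell_type": "code", "execution_count": None,
                          "metadata": {}, "outputs": [], "source": text})

    for line in source.splitlines():
        if line.startswith("# %%"):
            flush()
            kind = "markdown" if "[markdown]" in line else "code"
            lines = []
        elif kind is not None:
            lines.append(line)
    flush()
    # nbformat_minor 4 is used so that per-cell ids are not required.
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


script = """# %% [markdown]
# # Layers
# Build a Dense layer.

# %%
import math
scale = math.sqrt(2.0 / (3 + 2))
"""

nb = percent_to_notebook(script)
notebook_json = json.dumps(nb, indent=1)
```

Because the notebook is derived entirely from the script, only the .py file needs to be tracked; execution counts, outputs, and cell ids never reach version control.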
Vijay Janapa Reddi
2025-07-20 08:41:26 -04:00
parent 14a4e966ab
commit cec401af65
15 changed files with 0 additions and 22928 deletions

File diff suppressed because it is too large Load Diff

@@ -1,924 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "0e007598",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Layers - Building Blocks of Neural Networks\n",
"\n",
"Welcome to the Layers module! This is where we build the fundamental components that stack together to form neural networks.\n",
"\n",
"## Learning Goals\n",
"- Understand how matrix multiplication powers neural networks\n",
"- Implement naive matrix multiplication from scratch for deep understanding\n",
"- Build the Dense (Linear) layer - the foundation of all neural networks\n",
"- Learn weight initialization strategies and their importance\n",
"- See how layers compose with activations to create powerful networks\n",
"\n",
"## Build → Use → Understand\n",
"1. **Build**: Matrix multiplication and Dense layers from scratch\n",
"2. **Use**: Create and test layers with real data\n",
"3. **Understand**: How linear transformations enable feature learning"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bc400228",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
"grade": false,
"grade_id": "layers-imports",
"locked": false,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"#| default_exp core.layers\n",
"\n",
"#| export\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import os\n",
"import sys\n",
"from typing import Union, List, Tuple, Optional\n",
"\n",
"# Import our dependencies - try from package first, then local modules\n",
"try:\n",
"    from tinytorch.core.tensor import Tensor\n",
"    from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax\n",
"except ImportError:\n",
"    # For development, import from local modules\n",
"    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))\n",
"    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations'))\n",
"    try:\n",
"        from tensor_dev import Tensor\n",
"        from activations_dev import ReLU, Sigmoid, Tanh, Softmax\n",
"    except ImportError:\n",
"        # If the local modules are not available, use relative imports\n",
"        from ..tensor.tensor_dev import Tensor\n",
"        from ..activations.activations_dev import ReLU, Sigmoid, Tanh, Softmax"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e186492c",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
"grade": false,
"grade_id": "layers-setup",
"locked": false,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"#| hide\n",
"#| export\n",
"def _should_show_plots():\n",
"    \"\"\"Check if we should show plots (disable during testing)\"\"\"\n",
"    # Check multiple conditions that indicate we're in test mode\n",
"    is_pytest = (\n",
"        'pytest' in sys.modules or\n",
"        'test' in sys.argv or\n",
"        os.environ.get('PYTEST_CURRENT_TEST') is not None or\n",
"        any('test' in arg for arg in sys.argv) or\n",
"        any('pytest' in arg for arg in sys.argv)\n",
"    )\n",
"\n",
"    # Show plots in development mode (when not in test mode)\n",
"    return not is_pytest"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d41a5d47",
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "layers-welcome",
"locked": false,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"print(\"🔥 TinyTorch Layers Module\")\n",
"print(f\"NumPy version: {np.__version__}\")\n",
"print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
"print(\"Ready to build neural network layers!\")"
]
},
{
"cell_type": "markdown",
"id": "bed6f41e",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 📦 Where This Code Lives in the Final Package\n",
"\n",
"**Learning Side:** You work in `modules/source/03_layers/layers_dev.py` \n",
"**Building Side:** Code exports to `tinytorch.core.layers`\n",
"\n",
"```python\n",
"# Final package structure:\n",
"from tinytorch.core.layers import Dense, Conv2D # All layer types together!\n",
"from tinytorch.core.tensor import Tensor # The foundation\n",
"from tinytorch.core.activations import ReLU, Sigmoid # Nonlinearity\n",
"```\n",
"\n",
"**Why this matters:**\n",
"- **Learning:** Focused modules for deep understanding\n",
"- **Production:** Proper organization like PyTorch's `torch.nn.Linear`\n",
"- **Consistency:** All layer types live together in `core.layers`\n",
"- **Integration:** Works seamlessly with tensors and activations"
]
},
{
"cell_type": "markdown",
"id": "a2c033ee",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## What Are Neural Network Layers?\n",
"\n",
"### The Building Block Pattern\n",
"Neural networks are built by stacking **layers** - each layer is a function that:\n",
"1. **Takes input**: Tensor data from previous layer\n",
"2. **Transforms**: Applies mathematical operations (linear transformation + activation)\n",
"3. **Produces output**: New tensor data for next layer\n",
"\n",
"### The Universal Pattern\n",
"Every layer follows this pattern:\n",
"```python\n",
"def layer(x):\n",
" # 1. Linear transformation\n",
" linear_output = x @ weights + bias\n",
" \n",
" # 2. Nonlinear activation\n",
" output = activation(linear_output)\n",
" \n",
" return output\n",
"```\n",
"\n",
"### Why This Works\n",
"- **Linear part**: Learns feature combinations\n",
"- **Nonlinear part**: Enables complex patterns\n",
"- **Stacking**: Multiple layers = more complex functions\n",
"\n",
"### Mathematical Foundation\n",
"A neural network is function composition:\n",
"```\n",
"f(x) = layer_n(layer_{n-1}(...layer_2(layer_1(x))))\n",
"```\n",
"\n",
"Each layer transforms the representation to be more useful for the final task.\n",
"\n",
"### What We'll Build\n",
"1. **Matrix Multiplication**: The core operation powering all layers\n",
"2. **Dense Layer**: The fundamental building block of neural networks\n",
"3. **Integration**: How layers work with activations and tensors"
]
},
{
"cell_type": "markdown",
"id": "448f63f6",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"## Step 1: Matrix Multiplication - The Engine of Neural Networks\n",
"\n",
"### What is Matrix Multiplication?\n",
"Matrix multiplication is the core operation that powers all neural network layers:\n",
"\n",
"```\n",
"C = A @ B\n",
"```\n",
"\n",
"Where:\n",
"- **A**: Input data (batch_size × input_features)\n",
"- **B**: Weight matrix (input_features × output_features) \n",
"- **C**: Output data (batch_size × output_features)\n",
"\n",
"### Why It's Essential\n",
"- **Feature combination**: Each output combines all input features\n",
"- **Learned weights**: B contains the learned parameters\n",
"- **Efficient computation**: Vectorized operations are much faster\n",
"- **Parallel processing**: GPUs are designed for matrix operations\n",
"\n",
"### The Mathematical Definition\n",
"For matrices A (m×n) and B (n×p), the result C (m×p) is:\n",
"```\n",
"C[i,j] = Σ(k=0 to n-1) A[i,k] * B[k,j]\n",
"```\n",
"\n",
"### Visual Understanding\n",
"```\n",
"[1 2] @ [5 6] = [1*5+2*7 1*6+2*8] = [19 22]\n",
"[3 4] [7 8] [3*5+4*7 3*6+4*8] [43 50]\n",
"```\n",
"\n",
"### Real-World Context\n",
"Every major operation in deep learning uses matrix multiplication:\n",
"- **Dense layers**: Linear transformations\n",
"- **Convolutional layers**: Convolution as matrix multiplication\n",
"- **Attention mechanisms**: Query-Key-Value computations\n",
"- **Embeddings**: Lookup tables as matrix multiplication"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cccd838f",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
"grade": false,
"grade_id": "matmul-naive",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"#| export\n",
"def matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:\n",
"    \"\"\"\n",
"    Matrix multiplication using explicit for-loops.\n",
"\n",
"    This helps you understand what matrix multiplication really does!\n",
"\n",
"    TODO: Implement matrix multiplication using three nested for-loops.\n",
"\n",
"    STEP-BY-STEP IMPLEMENTATION:\n",
"    1. Get the dimensions: m, n from A.shape and n2, p from B.shape\n",
"    2. Check compatibility: n must equal n2\n",
"    3. Create output matrix C of shape (m, p) filled with zeros\n",
"    4. Use three nested loops:\n",
"       - i loop: iterate through rows of A (0 to m-1)\n",
"       - j loop: iterate through columns of B (0 to p-1)\n",
"       - k loop: iterate through shared dimension (0 to n-1)\n",
"    5. For each (i,j), accumulate: C[i,j] += A[i,k] * B[k,j]\n",
"\n",
"    EXAMPLE WALKTHROUGH:\n",
"    ```python\n",
"    A = [[1, 2],     B = [[5, 6],\n",
"         [3, 4]]          [7, 8]]\n",
"\n",
"    C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0] = 1*5 + 2*7 = 19\n",
"    C[0,1] = A[0,0]*B[0,1] + A[0,1]*B[1,1] = 1*6 + 2*8 = 22\n",
"    C[1,0] = A[1,0]*B[0,0] + A[1,1]*B[1,0] = 3*5 + 4*7 = 43\n",
"    C[1,1] = A[1,0]*B[0,1] + A[1,1]*B[1,1] = 3*6 + 4*8 = 50\n",
"\n",
"    Result: [[19, 22], [43, 50]]\n",
"    ```\n",
"\n",
"    IMPLEMENTATION HINTS:\n",
"    - Get dimensions: m, n = A.shape; n2, p = B.shape\n",
"    - Check compatibility: if n != n2: raise ValueError\n",
"    - Initialize result: C = np.zeros((m, p))\n",
"    - Triple nested loop: for i in range(m): for j in range(p): for k in range(n):\n",
"    - Accumulate sum: C[i,j] += A[i,k] * B[k,j]\n",
"\n",
"    LEARNING CONNECTIONS:\n",
"    - This is what every neural network layer does internally\n",
"    - Understanding this helps debug shape mismatches\n",
"    - Essential for understanding the foundation of neural networks\n",
"    \"\"\"\n",
"    ### BEGIN SOLUTION\n",
"    # Get matrix dimensions\n",
"    m, n = A.shape\n",
"    n2, p = B.shape\n",
"\n",
"    # Check compatibility\n",
"    if n != n2:\n",
"        raise ValueError(f\"Incompatible matrix dimensions: A is {m}x{n}, B is {n2}x{p}\")\n",
"\n",
"    # Initialize result matrix\n",
"    C = np.zeros((m, p))\n",
"\n",
"    # Triple nested loop for matrix multiplication\n",
"    for i in range(m):\n",
"        for j in range(p):\n",
"            for k in range(n):\n",
"                C[i, j] += A[i, k] * B[k, j]\n",
"\n",
"    return C\n",
"    ### END SOLUTION"
]
},
{
"cell_type": "markdown",
"id": "6e695714",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"### 🧪 Test Your Matrix Multiplication\n",
"\n",
"Once you implement the `matmul` function above, run this cell to test it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bed91066",
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "test-matmul-immediate",
"locked": true,
"points": 10,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"def test_matrix_multiplication():\n",
"    \"\"\"Test matrix multiplication implementation\"\"\"\n",
"    print(\"🔬 Unit Test: Matrix Multiplication...\")\n",
"\n",
"    # Test simple 2x2 case\n",
"    A = np.array([[1, 2], [3, 4]], dtype=np.float32)\n",
"    B = np.array([[5, 6], [7, 8]], dtype=np.float32)\n",
"\n",
"    result = matmul(A, B)\n",
"    expected = np.array([[19, 22], [43, 50]], dtype=np.float32)\n",
"\n",
"    assert np.allclose(result, expected), f\"Matrix multiplication failed: expected {expected}, got {result}\"\n",
"\n",
"    # Compare with NumPy\n",
"    numpy_result = A @ B\n",
"    assert np.allclose(result, numpy_result), f\"Doesn't match NumPy: got {result}, expected {numpy_result}\"\n",
"\n",
"    # Test different shapes\n",
"    A2 = np.array([[1, 2, 3]], dtype=np.float32)  # 1x3\n",
"    B2 = np.array([[4], [5], [6]], dtype=np.float32)  # 3x1\n",
"    result2 = matmul(A2, B2)\n",
"    expected2 = np.array([[32]], dtype=np.float32)  # 1*4 + 2*5 + 3*6 = 32\n",
"\n",
"    assert np.allclose(result2, expected2), f\"1x3 @ 3x1 failed: expected {expected2}, got {result2}\"\n",
"\n",
"    # Test 3x3 case\n",
"    A3 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)\n",
"    B3 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float32)  # Identity\n",
"    result3 = matmul(A3, B3)\n",
"\n",
"    assert np.allclose(result3, A3), \"Multiplication by identity should preserve matrix\"\n",
"\n",
"    # Test incompatible shapes\n",
"    A4 = np.array([[1, 2]], dtype=np.float32)  # 1x2\n",
"    B4 = np.array([[3], [4], [5]], dtype=np.float32)  # 3x1\n",
"\n",
"    try:\n",
"        matmul(A4, B4)\n",
"        assert False, \"Should raise error for incompatible shapes\"\n",
"    except ValueError as e:\n",
"        assert \"Incompatible matrix dimensions\" in str(e)\n",
"\n",
"    print(\"✅ Matrix multiplication tests passed!\")\n",
"    print(f\"✅ 2x2 multiplication working correctly\")\n",
"    print(f\"✅ Matches NumPy's implementation\")\n",
"    print(f\"✅ Handles different shapes correctly\")\n",
"    print(f\"✅ Proper error handling for incompatible shapes\")\n",
"\n",
"# Run the test\n",
"test_matrix_multiplication()"
]
},
{
"cell_type": "markdown",
"id": "ab183a07",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"## Step 2: Dense Layer - The Foundation of Neural Networks\n",
"\n",
"### What is a Dense Layer?\n",
"A **Dense layer** (also called Linear or Fully Connected layer) is the fundamental building block of neural networks:\n",
"\n",
"```python\n",
"output = input @ weights + bias\n",
"```\n",
"\n",
"Where:\n",
"- **input**: Input data (batch_size × input_features)\n",
"- **weights**: Learned parameters (input_features × output_features)\n",
"- **bias**: Learned bias terms (output_features,)\n",
"- **output**: Transformed data (batch_size × output_features)\n",
"\n",
"### Why Dense Layers Are Essential\n",
"1. **Feature transformation**: Learn meaningful combinations of input features\n",
"2. **Universal approximation**: Stack enough layers to approximate any function\n",
"3. **Learnable parameters**: Weights and biases are optimized during training\n",
"4. **Composability**: Can be stacked to create complex architectures\n",
"\n",
"### The Mathematical Foundation\n",
"For input x, weight matrix W, and bias b:\n",
"```\n",
"y = xW + b\n",
"```\n",
"\n",
"This is a linear transformation that:\n",
"- **Combines features**: Each output is a weighted sum of all inputs\n",
"- **Learns relationships**: Weights encode feature interactions\n",
"- **Adds flexibility**: Bias allows shifting the output\n",
"\n",
"### Real-World Applications\n",
"- **Classification**: Transform features to class logits\n",
"- **Regression**: Transform features to continuous outputs\n",
"- **Representation learning**: Learn useful intermediate representations\n",
"- **Attention mechanisms**: Compute queries, keys, and values\n",
"\n",
"### Design Decisions\n",
"- **Weight initialization**: Random initialization to break symmetry\n",
"- **Bias usage**: Usually included for flexibility\n",
"- **Activation**: Often followed by nonlinear activation"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eec77bde",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
"grade": false,
"grade_id": "dense-layer",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"#| export\n",
"class Dense:\n",
"    \"\"\"\n",
"    Dense (Linear/Fully Connected) Layer\n",
"\n",
"    Applies a linear transformation: y = xW + b\n",
"\n",
"    This is the fundamental building block of neural networks.\n",
"    \"\"\"\n",
"\n",
"    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):\n",
"        \"\"\"\n",
"        Initialize Dense layer with random weights and optional bias.\n",
"\n",
"        TODO: Implement Dense layer initialization.\n",
"\n",
"        STEP-BY-STEP IMPLEMENTATION:\n",
"        1. Store the layer parameters (input_size, output_size, use_bias)\n",
"        2. Initialize weights with random values using proper scaling\n",
"        3. Initialize bias (if use_bias=True) with zeros\n",
"        4. Convert weights and bias to Tensor objects\n",
"\n",
"        WEIGHT INITIALIZATION STRATEGY:\n",
"        - Use Xavier/Glorot initialization for better gradient flow\n",
"        - Scale: sqrt(2 / (input_size + output_size))\n",
"        - Random values: np.random.randn() * scale\n",
"\n",
"        EXAMPLE USAGE:\n",
"        ```python\n",
"        layer = Dense(input_size=3, output_size=2)\n",
"        # Creates weight matrix of shape (3, 2) and bias of shape (2,)\n",
"        ```\n",
"\n",
"        IMPLEMENTATION HINTS:\n",
"        - Store parameters: self.input_size, self.output_size, self.use_bias\n",
"        - Weight shape: (input_size, output_size)\n",
"        - Bias shape: (output_size,) if use_bias else None\n",
"        - Use Xavier initialization: scale = np.sqrt(2.0 / (input_size + output_size))\n",
"        - Initialize weights: np.random.randn(input_size, output_size) * scale\n",
"        - Initialize bias: np.zeros(output_size) if use_bias else None\n",
"        - Convert to Tensors: self.weights = Tensor(weight_data), self.bias = Tensor(bias_data)\n",
"        \"\"\"\n",
"        ### BEGIN SOLUTION\n",
"        # Store layer parameters\n",
"        self.input_size = input_size\n",
"        self.output_size = output_size\n",
"        self.use_bias = use_bias\n",
"\n",
"        # Xavier/Glorot initialization\n",
"        scale = np.sqrt(2.0 / (input_size + output_size))\n",
"\n",
"        # Initialize weights with random values\n",
"        weight_data = np.random.randn(input_size, output_size) * scale\n",
"        self.weights = Tensor(weight_data)\n",
"\n",
"        # Initialize bias\n",
"        if use_bias:\n",
"            bias_data = np.zeros(output_size)\n",
"            self.bias = Tensor(bias_data)\n",
"        else:\n",
"            self.bias = None\n",
"        ### END SOLUTION\n",
"\n",
"    def forward(self, x):\n",
"        \"\"\"\n",
"        Forward pass through the Dense layer.\n",
"\n",
"        TODO: Implement the forward pass: y = xW + b\n",
"\n",
"        STEP-BY-STEP IMPLEMENTATION:\n",
"        1. Perform matrix multiplication: x @ self.weights\n",
"        2. Add bias if present: result + self.bias\n",
"        3. Return the result as a Tensor\n",
"\n",
"        EXAMPLE USAGE:\n",
"        ```python\n",
"        layer = Dense(input_size=3, output_size=2)\n",
"        input_data = Tensor([[1, 2, 3]])  # Shape: (1, 3)\n",
"        output = layer(input_data)  # Shape: (1, 2)\n",
"        ```\n",
"\n",
"        IMPLEMENTATION HINTS:\n",
"        - Matrix multiplication: matmul(x.data, self.weights.data)\n",
"        - Add bias: result + self.bias.data (broadcasting handles shape)\n",
"        - Return as Tensor: return Tensor(final_result)\n",
"        - Handle both cases: with and without bias\n",
"\n",
"        LEARNING CONNECTIONS:\n",
"        - This is the core operation in every neural network layer\n",
"        - Matrix multiplication combines all input features\n",
"        - Bias addition allows shifting the output distribution\n",
"        - The result feeds into activation functions\n",
"        \"\"\"\n",
"        ### BEGIN SOLUTION\n",
"        # Perform matrix multiplication\n",
"        linear_output = matmul(x.data, self.weights.data)\n",
"\n",
"        # Add bias if present\n",
"        if self.use_bias and self.bias is not None:\n",
"            linear_output = linear_output + self.bias.data\n",
"\n",
"        return type(x)(linear_output)\n",
"        ### END SOLUTION\n",
"\n",
"    def __call__(self, x):\n",
"        \"\"\"Make the layer callable: layer(x) instead of layer.forward(x)\"\"\"\n",
"        return self.forward(x)"
]
},
{
"cell_type": "markdown",
"id": "5736d98c",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"### 🧪 Test Your Dense Layer\n",
"\n",
"Once you implement the Dense layer above, run this cell to test it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b1d056c",
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "test-dense-layer",
"locked": true,
"points": 15,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"def test_dense_layer():\n",
"    \"\"\"Test Dense layer implementation\"\"\"\n",
"    print(\"🔬 Unit Test: Dense Layer...\")\n",
"\n",
"    # Test layer creation\n",
"    layer = Dense(input_size=3, output_size=2)\n",
"\n",
"    # Check weight and bias shapes\n",
"    assert layer.weights.shape == (3, 2), f\"Weight shape should be (3, 2), got {layer.weights.shape}\"\n",
"    assert layer.bias is not None, \"Bias should not be None when use_bias=True\"\n",
"    assert layer.bias.shape == (2,), f\"Bias shape should be (2,), got {layer.bias.shape}\"\n",
"\n",
"    # Test forward pass\n",
"    input_data = Tensor([[1, 2, 3]])  # Shape: (1, 3)\n",
"    output = layer(input_data)\n",
"\n",
"    # Check output shape\n",
"    assert output.shape == (1, 2), f\"Output shape should be (1, 2), got {output.shape}\"\n",
"\n",
"    # Test batch processing\n",
"    batch_input = Tensor([[1, 2, 3], [4, 5, 6]])  # Shape: (2, 3)\n",
"    batch_output = layer(batch_input)\n",
"\n",
"    assert batch_output.shape == (2, 2), f\"Batch output shape should be (2, 2), got {batch_output.shape}\"\n",
"\n",
"    # Test without bias\n",
"    no_bias_layer = Dense(input_size=3, output_size=2, use_bias=False)\n",
"    assert no_bias_layer.bias is None, \"Layer without bias should have None bias\"\n",
"\n",
"    no_bias_output = no_bias_layer(input_data)\n",
"    assert no_bias_output.shape == (1, 2), \"No-bias layer should still produce correct shape\"\n",
"\n",
"    # Test that different inputs produce different outputs\n",
"    input1 = Tensor([[1, 0, 0]])\n",
"    input2 = Tensor([[0, 1, 0]])\n",
"\n",
"    output1 = layer(input1)\n",
"    output2 = layer(input2)\n",
"\n",
"    # Should not be equal (with high probability due to random initialization)\n",
"    assert not np.allclose(output1.data, output2.data), \"Different inputs should produce different outputs\"\n",
"\n",
"    # Test linearity of the transformation: layer(a*x) - b == a * (layer(x) - b)\n",
"    scale = 2.0\n",
"    scaled_input = Tensor([[2, 4, 6]])  # 2 * [1, 2, 3]\n",
"    scaled_output = layer(scaled_input)\n",
"\n",
"    # The bias is added once regardless of the input scale, so remove it before comparing\n",
"    bias = layer.bias.data\n",
"    assert np.allclose(scaled_output.data - bias, scale * (output.data - bias)), \"Linear part should scale with the input\"\n",
"\n",
"    print(\"✅ Dense layer tests passed!\")\n",
"    print(f\"✅ Correct weight and bias initialization\")\n",
"    print(f\"✅ Forward pass produces correct shapes\")\n",
"    print(f\"✅ Batch processing works correctly\")\n",
"    print(f\"✅ Bias and no-bias variants work\")\n",
"    print(f\"✅ Linear part scales with the input\")\n",
"\n",
"# Run the test\n",
"test_dense_layer()"
]
},
{
"cell_type": "markdown",
"id": "ac4dcba0",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"## Step 3: Layer Integration with Activations\n",
"\n",
"### Building Complete Neural Network Components\n",
"Now let's see how Dense layers work with activation functions to create complete neural network components:\n",
"\n",
"```python\n",
"# Complete neural network layer\n",
"x = input_data\n",
"linear_output = dense_layer(x)\n",
"final_output = activation_function(linear_output)\n",
"```\n",
"\n",
"### Why This Combination Works\n",
"1. **Linear transformation**: Dense layer learns feature combinations\n",
"2. **Nonlinear activation**: Enables complex pattern recognition\n",
"3. **Stacking**: Multiple layer+activation pairs create deep networks\n",
"4. **Universal approximation**: Can approximate any continuous function\n",
"\n",
"### Real-World Layer Patterns\n",
"- **Hidden layers**: Dense + ReLU (most common)\n",
"- **Output layers**: Dense + Softmax (classification) or Dense + Sigmoid (binary)\n",
"- **Gated layers**: Dense + Sigmoid (for gates in LSTM/GRU)\n",
"- **Attention layers**: Dense + Softmax (for attention weights)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f5e77a64",
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "test-layer-activation-comprehensive",
"locked": true,
"points": 15,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"def test_layer_activation():\n",
"    \"\"\"Comprehensive test of Dense layers combined with activation functions\"\"\"\n",
"    print(\"🔬 Unit Test: Layer-Activation Comprehensive Test...\")\n",
"\n",
"    # Create layer and activation functions\n",
"    layer = Dense(input_size=4, output_size=3)\n",
"    relu = ReLU()\n",
"    sigmoid = Sigmoid()\n",
"    tanh = Tanh()\n",
"    softmax = Softmax()\n",
"\n",
"    # Test input\n",
"    input_data = Tensor([[1, -2, 3, -4], [2, 1, -1, 3]])  # Shape: (2, 4)\n",
"\n",
"    # Test Dense + ReLU (common hidden layer pattern)\n",
"    linear_output = layer(input_data)\n",
"    relu_output = relu(linear_output)\n",
"\n",
"    assert relu_output.shape == (2, 3), \"ReLU output should preserve shape\"\n",
"    assert np.all(relu_output.data >= 0), \"ReLU output should be non-negative\"\n",
"\n",
"    # Test Dense + Softmax (classification output pattern)\n",
"    softmax_output = softmax(linear_output)\n",
"\n",
"    assert softmax_output.shape == (2, 3), \"Softmax output should preserve shape\"\n",
"\n",
"    # Each row should sum to 1 (probability distribution)\n",
"    for i in range(2):\n",
"        row_sum = np.sum(softmax_output.data[i])\n",
"        assert abs(row_sum - 1.0) < 1e-6, f\"Row {i} should sum to 1, got {row_sum}\"\n",
"\n",
"    # Test Dense + Sigmoid (binary classification pattern)\n",
"    sigmoid_output = sigmoid(linear_output)\n",
"\n",
"    assert sigmoid_output.shape == (2, 3), \"Sigmoid output should preserve shape\"\n",
"    assert np.all(sigmoid_output.data > 0), \"Sigmoid output should be positive\"\n",
"    assert np.all(sigmoid_output.data < 1), \"Sigmoid output should be less than 1\"\n",
"\n",
"    # Test Dense + Tanh (hidden layer with centered outputs)\n",
"    tanh_output = tanh(linear_output)\n",
"\n",
"    assert tanh_output.shape == (2, 3), \"Tanh output should preserve shape\"\n",
"    assert np.all(tanh_output.data > -1), \"Tanh output should be > -1\"\n",
"    assert np.all(tanh_output.data < 1), \"Tanh output should be < 1\"\n",
"\n",
"    # Test chained layers (simple 2-layer network)\n",
"    layer1 = Dense(input_size=4, output_size=5)\n",
"    layer2 = Dense(input_size=5, output_size=3)\n",
"\n",
"    # Forward pass through 2-layer network\n",
"    hidden = relu(layer1(input_data))\n",
"    output = softmax(layer2(hidden))\n",
"\n",
"    assert output.shape == (2, 3), \"2-layer network should produce correct output shape\"\n",
"\n",
"    # Each output should be a valid probability distribution\n",
"    for i in range(2):\n",
"        row_sum = np.sum(output.data[i])\n",
"        assert abs(row_sum - 1.0) < 1e-6, f\"Network output row {i} should sum to 1\"\n",
"\n",
"    # Test that layers are learning-ready (have parameters)\n",
"    assert hasattr(layer1, 'weights'), \"Layer should have weights\"\n",
"    assert hasattr(layer1, 'bias'), \"Layer should have bias\"\n",
"    assert isinstance(layer1.weights, Tensor), \"Weights should be Tensor\"\n",
"    assert isinstance(layer1.bias, Tensor), \"Bias should be Tensor\"\n",
"\n",
"    print(\"✅ Layer-activation comprehensive tests passed!\")\n",
"    print(f\"✅ Dense + ReLU working correctly\")\n",
"    print(f\"✅ Dense + Softmax producing valid probabilities\")\n",
"    print(f\"✅ Dense + Sigmoid bounded correctly\")\n",
"    print(f\"✅ Dense + Tanh centered correctly\")\n",
"    print(f\"✅ Multi-layer networks working\")\n",
"    print(f\"✅ All components ready for training!\")\n",
"\n",
"# Run the test\n",
"test_layer_activation()"
]
},
{
"cell_type": "markdown",
"id": "9cfd022a",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 🧪 Module Testing\n",
"\n",
"Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.\n",
"\n",
"**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e508b1ce",
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "standardized-testing",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"# =============================================================================\n",
"# STANDARDIZED MODULE TESTING - DO NOT MODIFY\n",
"# This cell is locked to ensure consistent testing across all TinyTorch modules\n",
"# =============================================================================\n",
"\n",
"if __name__ == \"__main__\":\n",
"    from tito.tools.testing import run_module_tests_auto\n",
"\n",
"    # Automatically discover and run all tests in this module\n",
"    success = run_module_tests_auto(\"Layers\")"
]
},
{
"cell_type": "markdown",
"id": "89a2d068",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 🎯 Module Summary: Neural Network Layers Mastery!\n",
"\n",
"Congratulations! You've successfully implemented the fundamental building blocks of neural networks:\n",
"\n",
"### ✅ What You've Built\n",
"- **Matrix Multiplication**: The core operation powering all neural network computations\n",
"- **Dense Layer**: The fundamental building block with proper weight initialization\n",
"- **Integration**: How layers work with activation functions to create complete neural components\n",
"- **Flexibility**: Support for bias/no-bias and naive/optimized matrix multiplication\n",
"\n",
"### ✅ Key Learning Outcomes\n",
"- **Understanding**: How linear transformations enable feature learning\n",
"- **Implementation**: Built layers from scratch with proper initialization\n",
"- **Testing**: Progressive validation with immediate feedback\n",
"- **Integration**: Saw how layers compose with activations for complete functionality\n",
"- **Real-world skills**: Understanding the mathematics behind neural networks\n",
"\n",
"### ✅ Mathematical Mastery\n",
"- **Matrix Multiplication**: C[i,j] = Σ(A[i,k] * B[k,j]) - implemented with loops\n",
"- **Linear Transformation**: y = xW + b - the heart of neural networks\n",
"- **Xavier Initialization**: Proper weight scaling for stable gradients\n",
"- **Composition**: How multiple layers create complex functions\n",
"\n",
"### ✅ Professional Skills Developed\n",
"- **Algorithm implementation**: From mathematical definition to working code\n",
"- **Performance considerations**: Naive vs optimized implementations\n",
"- **API design**: Clean, consistent interfaces for layer creation and usage\n",
"- **Testing methodology**: Unit tests, comprehensive tests, and edge case handling\n",
"\n",
"### ✅ Ready for Next Steps\n",
"Your layers are now ready to power:\n",
"- **Complete Networks**: Stack multiple layers with activations\n",
"- **Training**: Gradient computation and parameter updates\n",
"- **Specialized Architectures**: CNNs, RNNs, Transformers all use these foundations\n",
"- **Real Applications**: Image classification, NLP, game playing, etc.\n",
"\n",
"### 🔗 Connection to Real ML Systems\n",
"Your implementations mirror production frameworks:\n",
"- **PyTorch**: `torch.nn.Linear()` - same mathematical operations\n",
"- **TensorFlow**: `tf.keras.layers.Dense()` - identical functionality\n",
"- **Industry**: Every major neural network uses these exact computations\n",
"\n",
"### 🎯 The Power of Linear Algebra\n",
"You've unlocked the mathematical foundation of AI:\n",
"- **Feature combination**: Each layer learns how to combine input features\n",
"- **Representation learning**: Layers automatically discover useful representations\n",
"- **Universal approximation**: Stack enough layers to approximate any function\n",
"- **Scalability**: Same operations work from small networks to massive language models\n",
"\n",
"### 🧠 Deep Learning Insights\n",
"- **Why deep networks work**: Multiple layers = multiple levels of abstraction\n",
"- **Parameter efficiency**: Shared weights enable learning with limited data\n",
"- **Gradient flow**: Proper initialization enables training deep networks\n",
"- **Composability**: Simple components combine to create complex intelligence\n",
"\n",
"**Next Module**: Networks - Composing your layers into complete neural network architectures!\n",
"\n",
"Your layers are the building blocks. Now let's assemble them into powerful neural networks that can learn to solve complex problems!"
]
}
],
"metadata": {
"jupytext": {
"main_language": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
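
The notebook above states that the nonlinear part is what enables complex patterns. A minimal sketch of why, using plain Python lists instead of NumPy or the TinyTorch `Tensor` class: two Dense layers stacked with no activation between them compute the same function as a single Dense layer with W = W1 W2 and b = b1 W2 + b2. The helper names `dense` and `random_matrix` are hypothetical and exist only for this illustration.

```python
import random


def matmul(A, B):
    """Triple-loop matrix product on nested lists (mirrors the notebook's loop version)."""
    m, n, p = len(A), len(B), len(B[0])
    if len(A[0]) != n:
        raise ValueError(f"Incompatible matrix dimensions: A is {m}x{len(A[0])}, B is {n}x{p}")
    C = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def dense(x, W, b):
    """Affine map y = xW + b, with the bias added to every row."""
    return [[v + bj for v, bj in zip(row, b)] for row in matmul(x, W)]


def random_matrix(rows, cols, rng):
    """Matrix of standard-normal samples as nested lists."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)  # fixed seed so the run is repeatable
W1, b1 = random_matrix(4, 5, rng), random_matrix(1, 5, rng)[0]
W2, b2 = random_matrix(5, 3, rng), random_matrix(1, 3, rng)[0]
x = random_matrix(2, 4, rng)

# Two Dense layers applied back to back, with no activation in between.
stacked = dense(dense(x, W1, b1), W2, b2)

# The equivalent single layer: W = W1 W2 and b = b1 W2 + b2.
W = matmul(W1, W2)
b = [v + bj for v, bj in zip(matmul([b1], W2)[0], b2)]
single = dense(x, W, b)

# Largest elementwise gap between the two results (floating-point noise only).
max_diff = max(abs(s - t) for rs, rt in zip(stacked, single) for s, t in zip(rs, rt))
```

Since the two results agree up to rounding, depth adds expressive power only when an activation such as ReLU sits between the layers.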
