mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-05-28 15:49:31 -05:00
- Change np.dot to np.matmul for proper batched 3D tensor multiplication - Add requires_grad preservation in transpose() operation - Fixes attention mechanism gradient flow issues Regression tests added in tests/regression/test_gradient_flow_fixes.py
1839 lines
82 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "e991dad5",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"# Module 01: Tensor Foundation - Building Blocks of ML\n",
|
||
"\n",
|
||
"Welcome to Module 01! You're about to build the foundational Tensor class that powers all machine learning operations.\n",
|
||
"\n",
|
||
"## 🔗 Prerequisites & Progress\n",
|
||
"**You've Built**: Nothing - this is our foundation!\n",
|
||
"**You'll Build**: A complete Tensor class with arithmetic, matrix operations, and shape manipulation\n",
|
||
"**You'll Enable**: Foundation for activations, layers, and all future neural network components\n",
|
||
"\n",
|
||
"**Connection Map**:\n",
|
||
"```\n",
|
||
"NumPy Arrays → Tensor → Activations (Module 02)\n",
|
||
"(raw data) (ML ops) (intelligence)\n",
|
||
"```\n",
|
||
"\n",
|
||
"## Learning Objectives\n",
|
||
"By the end of this module, you will:\n",
|
||
"1. Implement a complete Tensor class with fundamental operations\n",
|
||
"2. Understand tensors as the universal data structure in ML\n",
|
||
"3. Test tensor operations with immediate validation\n",
|
||
"4. Prepare for gradient computation in Module 05\n",
|
||
"\n",
|
||
"Let's get started!\n",
|
||
"\n",
|
||
"## 📦 Where This Code Lives in the Final Package\n",
|
||
"\n",
|
||
"**Learning Side:** You work in modules/01_tensor/tensor_dev.py\n",
|
||
"**Building Side:** Code exports to tinytorch.core.tensor\n",
|
||
"\n",
|
||
"```python\n",
|
||
"# Final package structure:\n",
|
||
"# Future modules will import and extend this Tensor\n",
|
||
"```\n",
|
||
"\n",
|
||
"**Why this matters:**\n",
|
||
"- **Learning:** Complete tensor system in one focused module for deep understanding\n",
|
||
"- **Production:** Proper organization like PyTorch's torch.Tensor with all core operations together\n",
|
||
"- **Consistency:** All tensor operations and data manipulation in core.tensor\n",
|
||
"- **Integration:** Foundation that every other module will build upon"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "bed71914",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "imports",
|
||
"solution": true
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#| default_exp core.tensor\n",
|
||
"#| export\n",
|
||
"\n",
|
||
"import numpy as np"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "25222aa1",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 1. Introduction: What is a Tensor?\n",
|
||
"\n",
|
||
"A tensor is a multi-dimensional array that serves as the fundamental data structure in machine learning. Think of it as a universal container that can hold data in different dimensions:\n",
|
||
"\n",
|
||
"```\n",
|
||
"Tensor Dimensions:\n",
|
||
"┌─────────────┐\n",
|
||
"│ 0D: Scalar │ 5.0 (just a number)\n",
|
||
"│ 1D: Vector │ [1, 2, 3] (list of numbers)\n",
|
||
"│ 2D: Matrix │ [[1, 2] (grid of numbers)\n",
|
||
"│ │ [3, 4]]\n",
|
||
"│ 3D: Cube │ [[[... (stack of matrices)\n",
|
||
"└─────────────┘\n",
|
||
"```\n",
|
||
"\n",
|
||
"In machine learning, tensors flow through operations like water through pipes:\n",
|
||
"\n",
|
||
"```\n",
|
||
"Neural Network Data Flow:\n",
|
||
"Input Tensor → Layer 1 → Activation → Layer 2 → ... → Output Tensor\n",
|
||
" [batch, [batch, [batch, [batch, [batch,\n",
|
||
" features] hidden] hidden] hidden2] classes]\n",
|
||
"```\n",
|
||
"\n",
|
||
"Every neural network, from simple linear regression to modern transformers, processes tensors. Understanding tensors means understanding the foundation of all ML computations.\n",
|
||
"\n",
|
||
"### Why Tensors Matter in ML Systems\n",
|
||
"\n",
|
||
"In production ML systems, tensors carry more than just data - they carry the computational graph, memory layout information, and execution context:\n",
|
||
"\n",
|
||
"```\n",
|
||
"Real ML Pipeline:\n",
|
||
"Raw Data → Preprocessing → Tensor Creation → Model Forward Pass → Loss Computation\n",
|
||
" ↓ ↓ ↓ ↓ ↓\n",
|
||
" Files NumPy Arrays Tensors GPU Tensors Scalar Loss\n",
|
||
"```\n",
|
||
"\n",
|
||
"**Key Insight**: Tensors bridge the gap between mathematical concepts and efficient computation on modern hardware."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "2cd44f52",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 2. Foundations: Mathematical Background\n",
|
||
"\n",
|
||
"### Core Operations We'll Implement\n",
|
||
"\n",
|
||
"Our Tensor class will support all fundamental operations that neural networks need:\n",
|
||
"\n",
|
||
"```\n",
|
||
"Operation Types:\n",
|
||
"┌─────────────────┬─────────────────┬─────────────────┐\n",
|
||
"│ Element-wise │ Matrix Ops │ Shape Ops │\n",
|
||
"├─────────────────┼─────────────────┼─────────────────┤\n",
|
||
"│ + Addition │ @ Matrix Mult │ .reshape() │\n",
|
||
"│ - Subtraction │ .transpose() │ .sum() │\n",
|
||
"│ * Multiplication│ │ .mean() │\n",
|
||
"│ / Division │ │ .max() │\n",
|
||
"└─────────────────┴─────────────────┴─────────────────┘\n",
|
||
"```\n",
|
||
"\n",
|
||
"### Broadcasting: Making Tensors Work Together\n",
|
||
"\n",
|
||
"Broadcasting automatically aligns tensors of different shapes for operations:\n",
|
||
"\n",
|
||
"```\n",
|
||
"Broadcasting Examples:\n",
|
||
"┌─────────────────────────────────────────────────────────┐\n",
|
||
"│ Scalar + Vector: │\n",
|
||
"│ 5 + [1, 2, 3] → [5, 5, 5] + [1, 2, 3] = [6, 7, 8]│\n",
|
||
"│ │\n",
|
||
"│ Matrix + Vector (row-wise): │\n",
|
||
"│ [[1, 2]] [10] [[1, 2]] [[10, 10]] [[11, 12]] │\n",
|
||
"│ [[3, 4]] + [10] = [[3, 4]] + [[10, 10]] = [[13, 14]] │\n",
|
||
"└─────────────────────────────────────────────────────────┘\n",
|
||
"```\n",
|
||
"\n",
|
||
"**Memory Layout**: NumPy uses row-major (C-style) storage where elements are stored row by row in memory for cache efficiency:\n",
|
||
"\n",
|
||
"```\n",
|
||
"Memory Layout (2×3 matrix):\n",
|
||
"Matrix: Memory:\n",
|
||
"[[1, 2, 3] [1][2][3][4][5][6]\n",
|
||
" [4, 5, 6]] ↑ Row 1 ↑ Row 2\n",
|
||
"\n",
|
||
"Cache Behavior:\n",
|
||
"Sequential Access: Fast (uses cache lines efficiently)\n",
|
||
" Row access: [1][2][3] → cache hit, hit, hit\n",
|
||
"Random Access: Slow (cache misses)\n",
|
||
" Column access: [1][4] → cache hit, miss\n",
|
||
"```\n",
|
||
"\n",
|
||
"This memory layout affects performance in real ML workloads - algorithms that access data sequentially run faster than those that access randomly."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "852b2eb6",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 3. Implementation: Building Tensor Foundation\n",
|
||
"\n",
|
||
"Let's build our Tensor class step by step, testing each component as we go.\n",
|
||
"\n",
|
||
"**Key Design Decision**: We'll include gradient-related attributes from the start, but they'll remain dormant until Module 05. This ensures a consistent interface throughout the course while keeping the cognitive load manageable.\n",
|
||
"\n",
|
||
"### Tensor Class Architecture\n",
|
||
"\n",
|
||
"```\n",
|
||
"Tensor Class Structure:\n",
|
||
"┌─────────────────────────────────┐\n",
|
||
"│ Core Attributes: │\n",
|
||
"│ • data: np.array (the numbers) │\n",
|
||
"│ • shape: tuple (dimensions) │\n",
|
||
"│ • size: int (total elements) │\n",
|
||
"│ • dtype: type (float32, int64) │\n",
|
||
"├─────────────────────────────────┤\n",
|
||
"│ Gradient Attributes (dormant): │\n",
|
||
"│ • requires_grad: bool │\n",
|
||
"│ • grad: None (until Module 05) │\n",
|
||
"├─────────────────────────────────┤\n",
|
||
"│ Operations: │\n",
|
||
"│ • __add__, __sub__, __mul__ │\n",
|
||
"│ • matmul(), reshape() │\n",
|
||
"│ • sum(), mean(), max() │\n",
|
||
"│ • __repr__(), __str__() │\n",
|
||
"└─────────────────────────────────┘\n",
|
||
"```\n",
|
||
"\n",
|
||
"The beauty of this design: **all methods are defined inside the class from day one**. No monkey-patching, no dynamic attribute addition. Clean, consistent, debugger-friendly."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "79fe2a61",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### Tensor Creation and Initialization\n",
|
||
"\n",
|
||
"Before we implement operations, let's understand how tensors store data and manage their attributes. This initialization is the foundation that everything else builds upon.\n",
|
||
"\n",
|
||
"```\n",
|
||
"Tensor Initialization Process:\n",
|
||
"Input Data → Validation → NumPy Array → Tensor Wrapper → Ready for Operations\n",
|
||
" [1,2,3] → types → np.array → shape=(3,) → + - * / @ ...\n",
|
||
" ↓ ↓ ↓ ↓\n",
|
||
" List/Array Type Check Memory Attributes Set\n",
|
||
" (optional) Allocation\n",
|
||
"\n",
|
||
"Memory Allocation Example:\n",
|
||
"Input: [[1, 2, 3], [4, 5, 6]]\n",
|
||
" ↓\n",
|
||
"NumPy allocates: [1][2][3][4][5][6] in contiguous memory\n",
|
||
" ↓\n",
|
||
"Tensor wraps with: shape=(2,3), size=6, dtype=int64\n",
|
||
"```\n",
|
||
"\n",
|
||
"**Key Design Principle**: Our Tensor is a wrapper around NumPy arrays that adds ML-specific functionality. We leverage NumPy's battle-tested memory management and computation kernels while adding the gradient tracking and operation chaining needed for deep learning.\n",
|
||
"\n",
|
||
"**Why This Approach?**\n",
|
||
"- **Performance**: NumPy's C implementations are highly optimized\n",
|
||
"- **Compatibility**: Easy integration with scientific Python ecosystem\n",
|
||
"- **Memory Efficiency**: No unnecessary data copying\n",
|
||
"- **Future-Proof**: Easy transition to GPU tensors in advanced modules"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "ea76431d",
|
||
"metadata": {
|
||
"lines_to_next_cell": 1,
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "tensor-class",
|
||
"solution": true
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"#| export\n",
|
||
"class Tensor:\n",
|
||
" \"\"\"Educational tensor that grows with student knowledge.\n",
|
||
"\n",
|
||
" This class starts simple but includes dormant features for future modules:\n",
|
||
" - requires_grad: Will be used for automatic differentiation (Module 05)\n",
|
||
" - grad: Will store computed gradients (Module 05)\n",
|
||
" - backward(): Will compute gradients (Module 05)\n",
|
||
"\n",
|
||
" For now, focus on: data, shape, and basic operations.\n",
|
||
" \"\"\"\n",
|
||
"\n",
|
||
" def __init__(self, data, requires_grad=False):\n",
|
||
" \"\"\"\n",
|
||
" Create a new tensor from data.\n",
|
||
"\n",
|
||
" TODO: Initialize tensor attributes\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Convert data to NumPy array - handles lists, scalars, etc.\n",
|
||
" 2. Store shape and size for quick access\n",
|
||
" 3. Set up gradient tracking (dormant until Module 05)\n",
|
||
"\n",
|
||
" EXAMPLE:\n",
|
||
" >>> tensor = Tensor([1, 2, 3])\n",
|
||
" >>> print(tensor.data)\n",
|
||
" [1 2 3]\n",
|
||
" >>> print(tensor.shape)\n",
|
||
" (3,)\n",
|
||
"\n",
|
||
" HINT: np.array() handles type conversion automatically\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" # Core tensor data - always present\n",
|
||
" self.data = np.array(data, dtype=np.float32) # Consistent float32 for ML\n",
|
||
" self.shape = self.data.shape\n",
|
||
" self.size = self.data.size\n",
|
||
" self.dtype = self.data.dtype\n",
|
||
"\n",
|
||
" # Gradient features (dormant until Module 05)\n",
|
||
" self.requires_grad = requires_grad\n",
|
||
" self.grad = None\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def __repr__(self):\n",
|
||
" \"\"\"String representation of tensor for debugging.\"\"\"\n",
|
||
" grad_info = f\", requires_grad={self.requires_grad}\" if self.requires_grad else \"\"\n",
|
||
" return f\"Tensor(data={self.data}, shape={self.shape}{grad_info})\"\n",
|
||
"\n",
|
||
" def __str__(self):\n",
|
||
" \"\"\"Human-readable string representation.\"\"\"\n",
|
||
" return f\"Tensor({self.data})\"\n",
|
||
"\n",
|
||
" def numpy(self):\n",
|
||
" \"\"\"Return the underlying NumPy array.\"\"\"\n",
|
||
" return self.data\n",
|
||
"\n",
|
||
" # nbgrader={\\\"grade\\\": false, \\\"grade_id\\\": \\\"addition-impl\\\", \\\"solution\\\": true}\n",
|
||
" def __add__(self, other):\n",
|
||
" \"\"\"\n",
|
||
" Add two tensors element-wise with broadcasting support.\n",
|
||
"\n",
|
||
" TODO: Implement tensor addition with automatic broadcasting\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Handle both Tensor and scalar inputs\n",
|
||
" 2. Use NumPy's broadcasting for automatic shape alignment\n",
|
||
" 3. Return new Tensor with result (don't modify self)\n",
|
||
"\n",
|
||
" EXAMPLE:\n",
|
||
" >>> a = Tensor([1, 2, 3])\n",
|
||
" >>> b = Tensor([4, 5, 6])\n",
|
||
" >>> result = a + b\n",
|
||
" >>> print(result.data)\n",
|
||
" [5. 7. 9.]\n",
|
||
"\n",
|
||
" BROADCASTING EXAMPLE:\n",
|
||
" >>> matrix = Tensor([[1, 2], [3, 4]]) # Shape: (2, 2)\n",
|
||
" >>> vector = Tensor([10, 20]) # Shape: (2,)\n",
|
||
" >>> result = matrix + vector # Broadcasting: (2,2) + (2,) → (2,2)\n",
|
||
" >>> print(result.data)\n",
|
||
" [[11. 22.]\n",
|
||
" [13. 24.]]\n",
|
||
"\n",
|
||
" HINTS:\n",
|
||
" - Use isinstance() to check if other is a Tensor\n",
|
||
" - NumPy handles broadcasting automatically with +\n",
|
||
" - Always return a new Tensor, don't modify self\n",
|
||
" - Preserve gradient tracking for future modules\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" if isinstance(other, Tensor):\n",
|
||
" # Tensor + Tensor: let NumPy handle broadcasting\n",
|
||
" return Tensor(self.data + other.data)\n",
|
||
" else:\n",
|
||
" # Tensor + scalar: NumPy broadcasts automatically\n",
|
||
" return Tensor(self.data + other)\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" # nbgrader={\"grade\": false, \"grade_id\": \"more-arithmetic\", \"solution\": true}\n",
|
||
" def __sub__(self, other):\n",
|
||
" \"\"\"\n",
|
||
" Subtract two tensors element-wise.\n",
|
||
"\n",
|
||
" Common use: Centering data (x - mean), computing differences for loss functions.\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" if isinstance(other, Tensor):\n",
|
||
" return Tensor(self.data - other.data)\n",
|
||
" else:\n",
|
||
" return Tensor(self.data - other)\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def __mul__(self, other):\n",
|
||
" \"\"\"\n",
|
||
" Multiply two tensors element-wise (NOT matrix multiplication).\n",
|
||
"\n",
|
||
" Common use: Scaling features, applying masks, gating mechanisms in neural networks.\n",
|
||
" Note: This is * operator, not @ (which will be matrix multiplication).\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" if isinstance(other, Tensor):\n",
|
||
" return Tensor(self.data * other.data)\n",
|
||
" else:\n",
|
||
" return Tensor(self.data * other)\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def __truediv__(self, other):\n",
|
||
" \"\"\"\n",
|
||
" Divide two tensors element-wise.\n",
|
||
"\n",
|
||
" Common use: Normalization (x / std), converting counts to probabilities.\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" if isinstance(other, Tensor):\n",
|
||
" return Tensor(self.data / other.data)\n",
|
||
" else:\n",
|
||
" return Tensor(self.data / other)\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" # nbgrader={\"grade\": false, \"grade_id\": \"matmul-impl\", \"solution\": true}\n",
|
||
" def matmul(self, other):\n",
|
||
" \"\"\"\n",
|
||
" Matrix multiplication of two tensors.\n",
|
||
"\n",
|
||
" TODO: Implement matrix multiplication using np.dot with proper validation\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Validate inputs are Tensors\n",
|
||
" 2. Check dimension compatibility (inner dimensions must match)\n",
|
||
" 3. Use np.dot for optimized computation\n",
|
||
" 4. Return new Tensor with result\n",
|
||
"\n",
|
||
" EXAMPLE:\n",
|
||
" >>> a = Tensor([[1, 2], [3, 4]]) # 2×2\n",
|
||
" >>> b = Tensor([[5, 6], [7, 8]]) # 2×2\n",
|
||
" >>> result = a.matmul(b) # 2×2 result\n",
|
||
" >>> # Result: [[1×5+2×7, 1×6+2×8], [3×5+4×7, 3×6+4×8]] = [[19, 22], [43, 50]]\n",
|
||
"\n",
|
||
" SHAPE RULES:\n",
|
||
" - (M, K) @ (K, N) → (M, N) ✓ Valid\n",
|
||
" - (M, K) @ (J, N) → Error ✗ K ≠ J\n",
|
||
"\n",
|
||
" COMPLEXITY: O(M×N×K) for (M×K) @ (K×N) matrices\n",
|
||
"\n",
|
||
" HINTS:\n",
|
||
" - np.dot handles the optimization for us\n",
|
||
" - Check self.shape[-1] == other.shape[-2] for compatibility\n",
|
||
" - Provide clear error messages for debugging\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" if not isinstance(other, Tensor):\n",
|
||
" raise TypeError(f\"Expected Tensor for matrix multiplication, got {type(other)}\")\n",
|
||
"\n",
|
||
" # Handle edge cases\n",
|
||
" if self.shape == () or other.shape == ():\n",
|
||
" # Scalar multiplication\n",
|
||
" return Tensor(self.data * other.data)\n",
|
||
"\n",
|
||
" # For matrix multiplication, we need at least 1D tensors\n",
|
||
" if len(self.shape) == 0 or len(other.shape) == 0:\n",
|
||
" return Tensor(self.data * other.data)\n",
|
||
"\n",
|
||
" # Check dimension compatibility for matrix multiplication\n",
|
||
" if len(self.shape) >= 2 and len(other.shape) >= 2:\n",
|
||
" if self.shape[-1] != other.shape[-2]:\n",
|
||
" raise ValueError(\n",
|
||
" f\"Cannot perform matrix multiplication: {self.shape} @ {other.shape}. \"\n",
|
||
" f\"Inner dimensions must match: {self.shape[-1]} ≠ {other.shape[-2]}. \"\n",
|
||
" f\"💡 HINT: For (M,K) @ (K,N) → (M,N), the K dimensions must be equal.\"\n",
|
||
" )\n",
|
||
" elif len(self.shape) == 1 and len(other.shape) == 2:\n",
|
||
" # Vector @ Matrix\n",
|
||
" if self.shape[0] != other.shape[0]:\n",
|
||
" raise ValueError(\n",
|
||
" f\"Cannot multiply vector {self.shape} with matrix {other.shape}. \"\n",
|
||
" f\"Vector length {self.shape[0]} must match matrix rows {other.shape[0]}.\"\n",
|
||
" )\n",
|
||
" elif len(self.shape) == 2 and len(other.shape) == 1:\n",
|
||
" # Matrix @ Vector\n",
|
||
" if self.shape[1] != other.shape[0]:\n",
|
||
" raise ValueError(\n",
|
||
" f\"Cannot multiply matrix {self.shape} with vector {other.shape}. \"\n",
|
||
" f\"Matrix columns {self.shape[1]} must match vector length {other.shape[0]}.\"\n",
|
||
" )\n",
|
||
"\n",
|
||
" # Perform optimized matrix multiplication\n",
|
||
" # Use np.matmul (not np.dot) for proper batched matrix multiplication with 3D+ tensors\n",
|
||
" result_data = np.matmul(self.data, other.data)\n",
|
||
" return Tensor(result_data)\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" # nbgrader={\"grade\": false, \"grade_id\": \"shape-ops\", \"solution\": true}\n",
|
||
" def reshape(self, *shape):\n",
|
||
" \"\"\"\n",
|
||
" Reshape tensor to new dimensions.\n",
|
||
"\n",
|
||
" TODO: Implement tensor reshaping with validation\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Handle different calling conventions: reshape(2, 3) vs reshape((2, 3))\n",
|
||
" 2. Validate total elements remain the same\n",
|
||
" 3. Use NumPy's reshape for the actual operation\n",
|
||
" 4. Return new Tensor (keep immutability)\n",
|
||
"\n",
|
||
" EXAMPLE:\n",
|
||
" >>> tensor = Tensor([1, 2, 3, 4, 5, 6]) # Shape: (6,)\n",
|
||
" >>> reshaped = tensor.reshape(2, 3) # Shape: (2, 3)\n",
|
||
" >>> print(reshaped.data)\n",
|
||
" [[1. 2. 3.]\n",
|
||
" [4. 5. 6.]]\n",
|
||
"\n",
|
||
" COMMON USAGE:\n",
|
||
" >>> # Flatten for MLP input\n",
|
||
" >>> image = Tensor(np.random.rand(3, 32, 32)) # (channels, height, width)\n",
|
||
" >>> flattened = image.reshape(-1) # (3072,) - all pixels in vector\n",
|
||
" >>>\n",
|
||
" >>> # Prepare batch for convolution\n",
|
||
" >>> batch = Tensor(np.random.rand(32, 784)) # (batch, features)\n",
|
||
" >>> images = batch.reshape(32, 1, 28, 28) # (batch, channels, height, width)\n",
|
||
"\n",
|
||
" HINTS:\n",
|
||
" - Handle both reshape(2, 3) and reshape((2, 3)) calling styles\n",
|
||
" - Check np.prod(new_shape) == self.size for validation\n",
|
||
" - Use descriptive error messages for debugging\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" # Handle both reshape(2, 3) and reshape((2, 3)) calling conventions\n",
|
||
" if len(shape) == 1 and isinstance(shape[0], (tuple, list)):\n",
|
||
" new_shape = tuple(shape[0])\n",
|
||
" else:\n",
|
||
" new_shape = shape\n",
|
||
"\n",
|
||
" # Handle -1 for automatic dimension inference (like NumPy)\n",
|
||
" if -1 in new_shape:\n",
|
||
" if new_shape.count(-1) > 1:\n",
|
||
" raise ValueError(\"Can only specify one unknown dimension with -1\")\n",
|
||
"\n",
|
||
" # Calculate the unknown dimension\n",
|
||
" known_size = 1\n",
|
||
" unknown_idx = new_shape.index(-1)\n",
|
||
" for i, dim in enumerate(new_shape):\n",
|
||
" if i != unknown_idx:\n",
|
||
" known_size *= dim\n",
|
||
"\n",
|
||
" unknown_dim = self.size // known_size\n",
|
||
" new_shape = list(new_shape)\n",
|
||
" new_shape[unknown_idx] = unknown_dim\n",
|
||
" new_shape = tuple(new_shape)\n",
|
||
"\n",
|
||
" # Validate total elements remain the same\n",
|
||
" if np.prod(new_shape) != self.size:\n",
|
||
" raise ValueError(\n",
|
||
" f\"Cannot reshape tensor of size {self.size} to shape {new_shape}. \"\n",
|
||
" f\"Total elements must match: {self.size} ≠ {np.prod(new_shape)}. \"\n",
|
||
" f\"💡 HINT: Make sure new_shape dimensions multiply to {self.size}\"\n",
|
||
" )\n",
|
||
"\n",
|
||
" # Reshape the data (NumPy handles the memory layout efficiently)\n",
|
||
" reshaped_data = np.reshape(self.data, new_shape)\n",
|
||
" # Preserve gradient tracking from the original tensor (important for autograd!)\n",
|
||
" result = Tensor(reshaped_data, requires_grad=self.requires_grad)\n",
|
||
" return result\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def transpose(self, dim0=None, dim1=None):\n",
|
||
" \"\"\"\n",
|
||
" Transpose tensor dimensions.\n",
|
||
"\n",
|
||
" TODO: Implement tensor transposition\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Handle default case (transpose last two dimensions)\n",
|
||
" 2. Handle specific dimension swapping\n",
|
||
" 3. Use NumPy's transpose with proper axis specification\n",
|
||
" 4. Return new Tensor\n",
|
||
"\n",
|
||
" EXAMPLE:\n",
|
||
" >>> matrix = Tensor([[1, 2, 3], [4, 5, 6]]) # (2, 3)\n",
|
||
" >>> transposed = matrix.transpose() # (3, 2)\n",
|
||
" >>> print(transposed.data)\n",
|
||
" [[1. 4.]\n",
|
||
" [2. 5.]\n",
|
||
" [3. 6.]]\n",
|
||
"\n",
|
||
" NEURAL NETWORK USAGE:\n",
|
||
" >>> # Weight matrix transpose for backward pass\n",
|
||
" >>> W = Tensor([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]) # (3, 2)\n",
|
||
" >>> W_T = W.transpose() # (2, 3) - for gradient computation\n",
|
||
" >>>\n",
|
||
" >>> # Attention mechanism\n",
|
||
" >>> Q = Tensor([[1, 2], [3, 4]]) # queries (2, 2)\n",
|
||
" >>> K = Tensor([[5, 6], [7, 8]]) # keys (2, 2)\n",
|
||
" >>> attention_scores = Q.matmul(K.transpose()) # Q @ K^T\n",
|
||
"\n",
|
||
" HINTS:\n",
|
||
" - Default: transpose last two dimensions (most common case)\n",
|
||
" - Use np.transpose() with axes parameter\n",
|
||
" - Handle 1D tensors gracefully (transpose is identity)\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" if dim0 is None and dim1 is None:\n",
|
||
" # Default: transpose last two dimensions\n",
|
||
" if len(self.shape) < 2:\n",
|
||
" # For 1D tensors, transpose is identity operation\n",
|
||
" return Tensor(self.data.copy())\n",
|
||
" else:\n",
|
||
" # Transpose last two dimensions (most common in ML)\n",
|
||
" axes = list(range(len(self.shape)))\n",
|
||
" axes[-2], axes[-1] = axes[-1], axes[-2]\n",
|
||
" transposed_data = np.transpose(self.data, axes)\n",
|
||
" else:\n",
|
||
" # Specific dimensions to transpose\n",
|
||
" if dim0 is None or dim1 is None:\n",
|
||
" raise ValueError(\"Both dim0 and dim1 must be specified for specific dimension transpose\")\n",
|
||
"\n",
|
||
" # Validate dimensions exist\n",
|
||
" if dim0 >= len(self.shape) or dim1 >= len(self.shape) or dim0 < 0 or dim1 < 0:\n",
|
||
" raise ValueError(\n",
|
||
" f\"Dimension out of range for tensor with shape {self.shape}. \"\n",
|
||
" f\"Got dim0={dim0}, dim1={dim1}, but tensor has {len(self.shape)} dimensions.\"\n",
|
||
" )\n",
|
||
"\n",
|
||
" # Create axes list and swap the specified dimensions\n",
|
||
" axes = list(range(len(self.shape)))\n",
|
||
" axes[dim0], axes[dim1] = axes[dim1], axes[dim0]\n",
|
||
" transposed_data = np.transpose(self.data, axes)\n",
|
||
"\n",
|
||
" # Preserve requires_grad for gradient tracking (Module 05 will add _grad_fn)\n",
|
||
" result = Tensor(transposed_data, requires_grad=self.requires_grad if hasattr(self, 'requires_grad') else False)\n",
|
||
" return result\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" # nbgrader={\"grade\": false, \"grade_id\": \"reduction-ops\", \"solution\": true}\n",
|
||
" def sum(self, axis=None, keepdims=False):\n",
|
||
" \"\"\"\n",
|
||
" Sum tensor along specified axis.\n",
|
||
"\n",
|
||
" TODO: Implement tensor sum with axis control\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Use NumPy's sum with axis parameter\n",
|
||
" 2. Handle axis=None (sum all elements) vs specific axis\n",
|
||
" 3. Support keepdims to maintain shape for broadcasting\n",
|
||
" 4. Return new Tensor with result\n",
|
||
"\n",
|
||
" EXAMPLE:\n",
|
||
" >>> tensor = Tensor([[1, 2], [3, 4]])\n",
|
||
" >>> total = tensor.sum() # Sum all elements: 10\n",
|
||
" >>> col_sum = tensor.sum(axis=0) # Sum columns: [4, 6]\n",
|
||
" >>> row_sum = tensor.sum(axis=1) # Sum rows: [3, 7]\n",
|
||
"\n",
|
||
" NEURAL NETWORK USAGE:\n",
|
||
" >>> # Batch loss computation\n",
|
||
" >>> batch_losses = Tensor([0.1, 0.3, 0.2, 0.4]) # Individual losses\n",
|
||
" >>> total_loss = batch_losses.sum() # Total: 1.0\n",
|
||
" >>> avg_loss = batch_losses.mean() # Average: 0.25\n",
|
||
" >>>\n",
|
||
" >>> # Global average pooling\n",
|
||
" >>> feature_maps = Tensor(np.random.rand(32, 256, 7, 7)) # (batch, channels, h, w)\n",
|
||
>> global_features">
" >>> global_features = feature_maps.sum(axis=(2, 3)) / (7 * 7) # average over h, w → (batch, channels)\n",
|
||
"\n",
|
||
" HINTS:\n",
|
||
" - np.sum handles all the complexity for us\n",
|
||
" - axis=None sums all elements (returns scalar)\n",
|
||
" - axis=0 sums along first dimension, axis=1 along second, etc.\n",
|
||
" - keepdims=True preserves dimensions for broadcasting\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" result = np.sum(self.data, axis=axis, keepdims=keepdims)\n",
|
||
" return Tensor(result)\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def mean(self, axis=None, keepdims=False):\n",
|
||
" \"\"\"\n",
|
||
" Compute mean of tensor along specified axis.\n",
|
||
"\n",
|
||
" Common usage: Batch normalization, loss averaging, global pooling.\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" result = np.mean(self.data, axis=axis, keepdims=keepdims)\n",
|
||
" return Tensor(result)\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def max(self, axis=None, keepdims=False):\n",
|
||
" \"\"\"\n",
|
||
" Find maximum values along specified axis.\n",
|
||
"\n",
|
||
" Common usage: Max pooling, finding best predictions, activation clipping.\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" result = np.max(self.data, axis=axis, keepdims=keepdims)\n",
|
||
" return Tensor(result)\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" # nbgrader={\"grade\": false, \"grade_id\": \"gradient-placeholder\", \"solution\": true}\n",
|
||
" def backward(self):\n",
|
||
" \"\"\"\n",
|
||
" Compute gradients (implemented in Module 05: Autograd).\n",
|
||
"\n",
|
||
" TODO: Placeholder implementation for gradient computation\n",
|
||
"\n",
|
||
" STUDENT NOTE:\n",
|
||
" This method exists but does nothing until Module 05: Autograd.\n",
|
||
" Don't worry about it for now - focus on the basic tensor operations.\n",
|
||
"\n",
|
||
" In Module 05, we'll implement:\n",
|
||
" - Gradient computation via chain rule\n",
|
||
" - Automatic differentiation\n",
|
||
" - Backpropagation through operations\n",
|
||
" - Computation graph construction\n",
|
||
"\n",
|
||
" FUTURE IMPLEMENTATION PREVIEW:\n",
|
||
" ```python\n",
|
||
" def backward(self, gradient=None):\n",
|
||
" # Module 05 will implement:\n",
|
||
" # 1. Set gradient for this tensor\n",
|
||
" # 2. Propagate to parent operations\n",
|
||
" # 3. Apply chain rule recursively\n",
|
||
" # 4. Accumulate gradients properly\n",
|
||
" pass\n",
|
||
" ```\n",
|
||
"\n",
|
||
" CURRENT BEHAVIOR:\n",
|
||
" >>> x = Tensor([1, 2, 3], requires_grad=True)\n",
|
||
" >>> y = x * 2\n",
|
||
" >>> y.sum().backward() # Calls this method - does nothing\n",
|
||
" >>> print(x.grad) # Still None\n",
|
||
" None\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" # Placeholder - will be implemented in Module 05\n",
|
||
" # For now, just ensure it doesn't crash when called\n",
|
||
" # This allows students to experiment with gradient syntax\n",
|
||
" # without getting confusing errors about missing methods\n",
|
||
" pass\n",
|
||
" ### END SOLUTION"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "28e76b8d",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🧪 Unit Test: Tensor Creation\n",
|
||
"\n",
|
||
"This test validates our Tensor constructor works correctly with various data types and properly initializes all attributes.\n",
|
||
"\n",
|
||
"**What we're testing**: Basic tensor creation and attribute setting\n",
|
||
"**Why it matters**: Foundation for all other operations - if creation fails, nothing works\n",
|
||
"**Expected**: Tensor wraps data correctly with proper attributes and consistent dtype"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "cfac36f6",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": true,
|
||
"grade_id": "test-tensor-creation",
|
||
"locked": true,
|
||
"points": 10
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def test_unit_tensor_creation():\n",
|
||
" \"\"\"🧪 Test Tensor creation with various data types.\"\"\"\n",
|
||
" print(\"🧪 Unit Test: Tensor Creation...\")\n",
|
||
"\n",
|
||
" # Test scalar creation\n",
|
||
" scalar = Tensor(5.0)\n",
|
||
" assert scalar.data == 5.0\n",
|
||
" assert scalar.shape == ()\n",
|
||
" assert scalar.size == 1\n",
|
||
" assert scalar.requires_grad == False\n",
|
||
" assert scalar.grad is None\n",
|
||
" assert scalar.dtype == np.float32\n",
|
||
"\n",
|
||
" # Test vector creation\n",
|
||
" vector = Tensor([1, 2, 3])\n",
|
||
" assert np.array_equal(vector.data, np.array([1, 2, 3], dtype=np.float32))\n",
|
||
" assert vector.shape == (3,)\n",
|
||
" assert vector.size == 3\n",
|
||
"\n",
|
||
" # Test matrix creation\n",
|
||
" matrix = Tensor([[1, 2], [3, 4]])\n",
|
||
" assert np.array_equal(matrix.data, np.array([[1, 2], [3, 4]], dtype=np.float32))\n",
|
||
" assert matrix.shape == (2, 2)\n",
|
||
" assert matrix.size == 4\n",
|
||
"\n",
|
||
" # Test gradient flag (dormant feature)\n",
|
||
" grad_tensor = Tensor([1, 2], requires_grad=True)\n",
|
||
" assert grad_tensor.requires_grad == True\n",
|
||
" assert grad_tensor.grad is None # Still None until Module 05\n",
|
||
"\n",
|
||
" print(\"✅ Tensor creation works correctly!\")\n",
|
||
"\n",
|
||
"if __name__ == \"__main__\":\n",
|
||
" test_unit_tensor_creation()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "c23e49bc",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## Element-wise Arithmetic Operations\n",
|
||
"\n",
|
||
"Element-wise operations are the workhorses of neural network computation. They apply the same operation to corresponding elements in tensors, often with broadcasting to handle different shapes elegantly.\n",
|
||
"\n",
|
||
"### Why Element-wise Operations Matter\n",
|
||
"\n",
|
||
"In neural networks, element-wise operations appear everywhere:\n",
"- **Activation functions**: Apply ReLU or sigmoid to every element\n",
"- **Batch normalization**: Subtract the per-feature mean, then divide by the per-feature std\n",
|
||
"- **Loss computation**: Compare predictions vs. targets element-wise\n",
|
||
"- **Gradient updates**: Add scaled gradients to parameters element-wise\n",
|
||
"\n",
|
||
"### Element-wise Addition: The Foundation\n",
|
||
"\n",
|
||
"Addition is the simplest and most fundamental operation. Understanding it deeply helps with all others.\n",
|
||
"\n",
|
||
"```\n",
|
||
"Element-wise Addition Visual:\n",
|
||
"[1, 2, 3] + [4, 5, 6] = [1+4, 2+5, 3+6] = [5, 7, 9]\n",
|
||
"\n",
"Matrix Addition:\n",
"[[1, 2],     [[5, 6],     [[1+5, 2+6],     [[ 6,  8],\n",
" [3, 4]]  +   [7, 8]]  =   [3+7, 4+8]]  =   [10, 12]]\n",
"\n",
"Broadcasting Addition (Matrix + Column Vector):\n",
"[[1, 2],     [[10],     [[1, 2],     [[10, 10],     [[11, 12],\n",
" [3, 4]]  +   [20]]  =   [3, 4]]  +   [20, 20]]  =   [23, 24]]\n",
"   ↑           ↑           ↑             ↑              ↑\n",
" (2,2)       (2,1)       (2,2)      (2,1)→(2,2)      result\n",
|
||
"\n",
|
||
"Broadcasting Rules:\n",
|
||
"1. Start from rightmost dimension\n",
|
||
"2. Dimensions must be equal OR one must be 1 OR one must be missing\n",
|
||
"3. Missing dimensions are assumed to be 1\n",
|
||
"```\n",
|
||
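"\n",
"The three broadcasting rules can be sketched as a small shape-checking function. `broadcast_shape` is a hypothetical helper written for illustration (not part of TinyTorch). It assumes NumPy is available and checks its answers against `np.broadcast`:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def broadcast_shape(shape_a, shape_b):\n",
"    # Hypothetical helper: apply the broadcasting rules to two shapes\n",
"    ndim = max(len(shape_a), len(shape_b))\n",
"    # Rules 1 and 3: left-pad with 1s so the shapes line up from the right\n",
"    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)\n",
"    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)\n",
"    result = []\n",
"    for da, db in zip(a, b):\n",
"        # Rule 2: sizes must be equal, or one of them must be 1\n",
"        if da == db or db == 1:\n",
"            result.append(da)\n",
"        elif da == 1:\n",
"            result.append(db)\n",
"        else:\n",
"            raise ValueError(f'cannot broadcast {shape_a} with {shape_b}')\n",
"    return tuple(result)\n",
"\n",
"# Cross-check against NumPy's own broadcasting\n",
"for sa, sb in [((2, 2), (2, 1)), ((2, 2), (2,)), ((3, 2), (2,))]:\n",
"    expected = np.broadcast(np.empty(sa), np.empty(sb)).shape\n",
"    assert broadcast_shape(sa, sb) == expected\n",
"    print(sa, '+', sb, '->', expected)\n",
"```\n",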
"\n",
|
||
"**Key Insight**: Broadcasting makes tensors of different shapes compatible by automatically expanding dimensions. This is crucial for batch processing where you often add a single bias vector to an entire batch of data.\n",
|
||
"\n",
"**Memory Efficiency**: Broadcasting never builds an expanded copy of the smaller operand. NumPy reuses its values in place (a stride of 0 along each broadcast axis), so only the output array is allocated at full size."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "ad4a3f8b",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 2
|
||
},
|
||
"source": [
|
||
"### Subtraction, Multiplication, and Division\n",
|
||
"\n",
|
||
"These operations follow the same pattern as addition, working element-wise with broadcasting support. Each serves specific purposes in neural networks:\n",
|
||
"\n",
|
||
"```\n",
|
||
"Element-wise Operations in Neural Networks:\n",
|
||
"\n",
|
||
"┌─────────────────┬─────────────────┬─────────────────┬─────────────────┐\n",
|
||
"│ Subtraction │ Multiplication │ Division │ Use Cases │\n",
|
||
"├─────────────────┼─────────────────┼─────────────────┼─────────────────┤\n",
|
||
"│ [6,8] - [1,2] │ [2,3] * [4,5] │ [8,9] / [2,3] │ • Gradient │\n",
|
||
"│ = [5,6] │ = [8,15] │ = [4.0, 3.0] │ computation │\n",
|
||
"│ │ │ │ • Normalization │\n",
|
||
"│ Center data: │ Gate values: │ Scale features: │ • Loss functions│\n",
|
||
"│ x - mean │ x * mask │ x / std │ • Attention │\n",
|
||
"└─────────────────┴─────────────────┴─────────────────┴─────────────────┘\n",
|
||
"\n",
|
||
"Broadcasting with Scalars (very common in ML):\n",
|
||
"[1, 2, 3] * 2 = [2, 4, 6] (scale all values)\n",
|
||
"[1, 2, 3] - 1 = [0, 1, 2] (shift all values)\n",
|
||
"[2, 4, 6] / 2 = [1, 2, 3] (normalize all values)\n",
|
||
"\n",
|
||
"Real ML Example - Batch Normalization:\n",
|
||
"batch_data = [[1, 2], [3, 4], [5, 6]] # Shape: (3, 2)\n",
|
||
"mean = [3, 4] # Shape: (2,)\n",
"std = [2, 2] # Shape: (2,), sample std (ddof=1); np.std's default ddof=0 gives ≈1.63\n",
|
||
"\n",
|
||
"# Normalize: (x - mean) / std\n",
|
||
"normalized = (batch_data - mean) / std\n",
|
||
"# Broadcasting: (3,2) - (2,) = (3,2), then (3,2) / (2,) = (3,2)\n",
|
||
"```\n",
|
||
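"\n",
"The batch-normalization snippet above uses plain lists for readability. A runnable NumPy version (a sketch, assuming NumPy and float32 data) shows the same broadcasting and how the `ddof` argument changes the standard deviation. Real batch-norm layers also add a small epsilon and learned scale/shift parameters. Only the epsilon is shown here:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"batch_data = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.float32)  # (3, 2)\n",
"\n",
"mean = batch_data.mean(axis=0)               # (2,): [3, 4]\n",
"std_pop = batch_data.std(axis=0)             # (2,): population std (ddof=0), about 1.63\n",
"std_sample = batch_data.std(axis=0, ddof=1)  # (2,): sample std (ddof=1), [2, 2]\n",
"\n",
"# Broadcasting: (3, 2) - (2,) -> (3, 2), then (3, 2) / (2,) -> (3, 2)\n",
"normalized = (batch_data - mean) / std_sample\n",
"print(normalized)  # rows: [-1, -1], [0, 0], [1, 1]\n",
"\n",
"# Batch-norm layers typically divide by sqrt(population variance + eps)\n",
"eps = 1e-5\n",
"normalized_bn = (batch_data - mean) / np.sqrt(batch_data.var(axis=0) + eps)\n",
"print(normalized_bn.shape)  # (3, 2)\n",
"```\n",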
"\n",
|
||
"**Performance Note**: Element-wise operations are highly optimized in NumPy and run efficiently on modern CPUs with vectorization (SIMD instructions)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "6f8fd64f",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🧪 Unit Test: Arithmetic Operations\n",
|
||
"\n",
|
||
"This test validates our arithmetic operations work correctly with both tensor-tensor and tensor-scalar operations, including broadcasting behavior.\n",
|
||
"\n",
|
||
"**What we're testing**: Addition, subtraction, multiplication, division with broadcasting\n",
|
||
"**Why it matters**: Foundation for neural network forward passes, batch processing, normalization\n",
|
||
"**Expected**: Operations work with both tensors and scalars, proper broadcasting alignment"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "ce89898f",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": true,
|
||
"grade_id": "test-arithmetic",
|
||
"locked": true,
|
||
"points": 15
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def test_unit_arithmetic_operations():\n",
|
||
" \"\"\"🧪 Test arithmetic operations with broadcasting.\"\"\"\n",
|
||
" print(\"🧪 Unit Test: Arithmetic Operations...\")\n",
|
||
"\n",
|
||
" # Test tensor + tensor\n",
|
||
" a = Tensor([1, 2, 3])\n",
|
||
" b = Tensor([4, 5, 6])\n",
|
||
" result = a + b\n",
|
||
" assert np.array_equal(result.data, np.array([5, 7, 9], dtype=np.float32))\n",
|
||
"\n",
|
||
" # Test tensor + scalar (very common in ML)\n",
|
||
" result = a + 10\n",
|
||
" assert np.array_equal(result.data, np.array([11, 12, 13], dtype=np.float32))\n",
|
||
"\n",
|
||
" # Test broadcasting with different shapes (matrix + vector)\n",
|
||
" matrix = Tensor([[1, 2], [3, 4]])\n",
|
||
" vector = Tensor([10, 20])\n",
|
||
" result = matrix + vector\n",
|
||
" expected = np.array([[11, 22], [13, 24]], dtype=np.float32)\n",
|
||
" assert np.array_equal(result.data, expected)\n",
|
||
"\n",
|
||
" # Test subtraction (data centering)\n",
|
||
" result = b - a\n",
|
||
" assert np.array_equal(result.data, np.array([3, 3, 3], dtype=np.float32))\n",
|
||
"\n",
|
||
" # Test multiplication (scaling)\n",
|
||
" result = a * 2\n",
|
||
" assert np.array_equal(result.data, np.array([2, 4, 6], dtype=np.float32))\n",
|
||
"\n",
|
||
" # Test division (normalization)\n",
|
||
" result = b / 2\n",
|
||
" assert np.array_equal(result.data, np.array([2.0, 2.5, 3.0], dtype=np.float32))\n",
|
||
"\n",
|
||
" # Test chaining operations (common in ML pipelines)\n",
|
||
" normalized = (a - 2) / 2 # Center and scale\n",
|
||
" expected = np.array([-0.5, 0.0, 0.5], dtype=np.float32)\n",
|
||
" assert np.allclose(normalized.data, expected)\n",
|
||
"\n",
|
||
" print(\"✅ Arithmetic operations work correctly!\")\n",
|
||
"\n",
|
||
"if __name__ == \"__main__\":\n",
|
||
" test_unit_arithmetic_operations()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "55918cd3",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 2
|
||
},
|
||
"source": [
|
||
"## Matrix Multiplication: The Heart of Neural Networks\n",
|
||
"\n",
|
||
"Matrix multiplication is fundamentally different from element-wise multiplication. It's the operation that gives neural networks their power to transform and combine information across features.\n",
|
||
"\n",
|
||
"### Why Matrix Multiplication is Central to ML\n",
|
||
"\n",
"Every fully connected (linear) layer performs a matrix multiplication, and convolutions and attention are commonly implemented with matrix multiplications as well:\n",
|
||
"\n",
|
||
"```\n",
|
||
"Linear Layer (the building block of neural networks):\n",
|
||
"Input Features × Weight Matrix = Output Features\n",
|
||
" (N, D_in) × (D_in, D_out) = (N, D_out)\n",
|
||
"\n",
|
||
"Real Example - Image Classification:\n",
|
||
"Flattened Image × Hidden Weights = Hidden Features\n",
|
||
" (32, 784) × (784, 256) = (32, 256)\n",
|
||
" ↑ ↑ ↑\n",
|
||
" 32 images 784→256 transform 32 feature vectors\n",
|
||
"```\n",
|
||
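"\n",
"This quick NumPy check reproduces the shapes in the image-classification example. It is a sketch with random data; NumPy is assumed, and the bias is omitted so only the matrix multiply is shown:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"images = rng.standard_normal((32, 784)).astype(np.float32)    # 32 flattened 28x28 images\n",
"weights = rng.standard_normal((784, 256)).astype(np.float32)  # 784 -> 256 transform\n",
"\n",
"hidden = images @ weights  # same as np.matmul(images, weights)\n",
"print(hidden.shape)        # (32, 256)\n",
"\n",
"# Each output entry is the dot product of one image row with one weight column\n",
"assert np.isclose(hidden[0, 0], images[0] @ weights[:, 0], rtol=1e-4, atol=1e-3)\n",
"```\n",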
"\n",
|
||
"### Matrix Multiplication Visualization\n",
|
||
"\n",
|
||
"```\n",
|
||
"Matrix Multiplication Process:\n",
"    A (2×3)       B (3×2)                 C (2×2)\n",
"  ┌         ┐     ┌      ┐     ┌                           ┐     ┌        ┐\n",
"  │ 1  2  3 │     │ 7  8 │     │ 1×7+2×9+3×1   1×8+2×1+3×2 │     │ 28  16 │\n",
"  │         │  ×  │ 9  1 │  =  │                           │  =  │        │\n",
"  │ 4  5  6 │     │ 1  2 │     │ 4×7+5×9+6×1   4×8+5×1+6×2 │     │ 79  49 │\n",
"  └         ┘     └      ┘     └                           ┘     └        ┘\n",
|
||
"\n",
|
||
"Computation Breakdown:\n",
|
||
"C[0,0] = A[0,:] · B[:,0] = [1,2,3] · [7,9,1] = 1×7 + 2×9 + 3×1 = 28\n",
"C[0,1] = A[0,:] · B[:,1] = [1,2,3] · [8,1,2] = 1×8 + 2×1 + 3×2 = 16\n",
|
||
"C[1,0] = A[1,:] · B[:,0] = [4,5,6] · [7,9,1] = 4×7 + 5×9 + 6×1 = 79\n",
"C[1,1] = A[1,:] · B[:,1] = [4,5,6] · [8,1,2] = 4×8 + 5×1 + 6×2 = 49\n",
|
||
"\n",
|
||
"Key Rule: Inner dimensions must match!\n",
|
||
"A(m,n) @ B(n,p) = C(m,p)\n",
|
||
" ↑ ↑\n",
|
||
" these must be equal\n",
|
||
"```\n",
|
||
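"\n",
"The worked example can be checked with a naive triple loop that applies the row-times-column rule directly, then compared against NumPy's `@` operator. `naive_matmul` is an illustrative helper, not TinyTorch code:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def naive_matmul(A, B):\n",
"    # C[i, j] = sum over k of A[i, k] * B[k, j]\n",
"    m, n = A.shape\n",
"    n2, p = B.shape\n",
"    if n != n2:\n",
"        raise ValueError(f'Inner dimensions must match: {n} != {n2}')\n",
"    C = np.zeros((m, p), dtype=np.result_type(A, B))\n",
"    for i in range(m):\n",
"        for j in range(p):\n",
"            for k in range(n):\n",
"                C[i, j] += A[i, k] * B[k, j]\n",
"    return C\n",
"\n",
"A = np.array([[1, 2, 3], [4, 5, 6]])\n",
"B = np.array([[7, 8], [9, 1], [1, 2]])\n",
"\n",
"print(naive_matmul(A, B))  # [[28 16]\n",
"                           #  [79 49]]\n",
"print(A @ B)               # same values from NumPy\n",
"```\n",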
"\n",
|
||
"### Computational Complexity and Performance\n",
|
||
"\n",
|
||
"```\n",
|
||
"Computational Cost:\n",
|
||
"For C = A @ B where A is (M×K), B is (K×N):\n",
|
||
"- Multiplications: M × N × K\n",
|
||
"- Additions: M × N × (K-1) ≈ M × N × K\n",
|
||
"- Total FLOPs: ≈ 2 × M × N × K\n",
|
||
"\n",
|
||
"Example: (1000×1000) @ (1000×1000)\n",
|
||
"- FLOPs: 2 × 1000³ = 2 billion operations\n",
"- At one FLOP per cycle on a 1 GHz core: ~2 seconds\n",
"- Optimized BLAS (blocking, SIMD, multiple cores): typically well under 0.1 seconds on a modern multi-core CPU\n",
|
||
"\n",
"Memory Access Pattern (naive i-j-k loop, row-major storage):\n",
|
||
"A: M×K (row-wise access) ✓ Good cache locality\n",
|
||
"B: K×N (column-wise) ✗ Poor cache locality\n",
|
||
"C: M×N (row-wise write) ✓ Good cache locality\n",
|
||
"\n",
|
||
"This is why optimized libraries like OpenBLAS, Intel MKL use:\n",
|
||
"- Blocking algorithms (process in cache-sized chunks)\n",
|
||
"- Vectorization (SIMD instructions)\n",
|
||
"- Parallelization (multiple cores)\n",
|
||
"```\n",
|
||
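"\n",
"To make the access-pattern point concrete, here is a sketch of the common i-k-j loop reordering. The inner step reads a whole row of `B` and updates a whole row of `C`, and both rows are contiguous in row-major order. In Python, interpreter overhead hides the cache effect, so this illustrates the pattern rather than the speedup. `matmul_ikj` and `matmul_flops` are hypothetical helpers:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def matmul_flops(M, K, N):\n",
"    # M*N*K multiplications plus roughly M*N*K additions\n",
"    return 2 * M * N * K\n",
"\n",
"def matmul_ikj(A, B):\n",
"    M, K = A.shape\n",
"    K2, N = B.shape\n",
"    if K != K2:\n",
"        raise ValueError('inner dimensions must match')\n",
"    C = np.zeros((M, N), dtype=np.result_type(A, B))\n",
"    for i in range(M):\n",
"        for k in range(K):\n",
"            # Row k of B and row i of C are contiguous in row-major storage\n",
"            C[i, :] += A[i, k] * B[k, :]\n",
"    return C\n",
"\n",
"rng = np.random.default_rng(0)\n",
"A = rng.standard_normal((4, 5))\n",
"B = rng.standard_normal((5, 3))\n",
"print(np.allclose(matmul_ikj(A, B), A @ B))  # True\n",
"print(matmul_flops(1000, 1000, 1000))        # 2000000000\n",
"```\n",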
"\n",
|
||
"### Neural Network Context\n",
|
||
"\n",
|
||
"```\n",
|
||
"Multi-layer Neural Network:\n",
|
||
"Input (batch=32, features=784)\n",
|
||
" ↓ W1: (784, 256)\n",
|
||
"Hidden1 (batch=32, features=256)\n",
|
||
" ↓ W2: (256, 128)\n",
|
||
"Hidden2 (batch=32, features=128)\n",
|
||
" ↓ W3: (128, 10)\n",
|
||
"Output (batch=32, classes=10)\n",
|
||
"\n",
|
||
"Each arrow represents a matrix multiplication:\n",
|
||
"- Forward pass: 3 matrix multiplications\n",
"- Backward pass: 5 more (∂L/∂W = X^T @ ∂L/∂Y for all 3 layers, plus\n",
"  ∂L/∂X = ∂L/∂Y @ W^T for layers 2 and 3; layer 1 needs no input gradient)\n",
"- Total: 8 matrix mults per forward+backward pass\n",
|
||
"\n",
"Forward pass for one batch: 32 × (784×256 + 256×128 + 128×10) multiply-adds\n",
"= 32 × (200,704 + 32,768 + 1,280) = 32 × 234,752 ≈ 7.5M multiply-adds ≈ 15M FLOPs\n",
|
||
"```\n",
|
||
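"\n",
"This NumPy sketch of the three-layer network above counts matrix multiplications and multiply-adds. It assumes no biases or activations (neither adds a matmul) and uses a stand-in upstream gradient of ones. It illustrates shapes, not a training loop:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"sizes = [784, 256, 128, 10]\n",
"batch = 32\n",
"Ws = [rng.standard_normal((d_in, d_out)) * 0.01 for d_in, d_out in zip(sizes, sizes[1:])]\n",
"\n",
"# Forward pass: one matmul per layer, keeping each layer's input for backward\n",
"x = rng.standard_normal((batch, sizes[0]))\n",
"inputs = []\n",
"for W in Ws:\n",
"    inputs.append(x)\n",
"    x = x @ W\n",
"matmuls = len(Ws)\n",
"\n",
"# Backward pass with a stand-in upstream gradient dL/dY\n",
"grad = np.ones_like(x)\n",
"for layer in reversed(range(len(Ws))):\n",
"    dW = inputs[layer].T @ grad        # dL/dW = X^T @ dL/dY\n",
"    matmuls += 1\n",
"    assert dW.shape == Ws[layer].shape\n",
"    if layer > 0:                      # the first layer needs no input gradient\n",
"        grad = grad @ Ws[layer].T      # dL/dX = dL/dY @ W^T\n",
"        matmuls += 1\n",
"\n",
"macs = batch * sum(d_in * d_out for d_in, d_out in zip(sizes, sizes[1:]))\n",
"print(x.shape, matmuls, macs, 2 * macs)  # (32, 10) 8 7512064 15024128\n",
"```\n",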
"\n",
|
||
"This is why GPU acceleration matters - modern GPUs can perform thousands of these operations in parallel!"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "d33d261d",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🧪 Unit Test: Matrix Multiplication\n",
|
||
"\n",
|
||
"This test validates matrix multiplication works correctly with proper shape checking and error handling.\n",
|
||
"\n",
|
||
"**What we're testing**: Matrix multiplication with shape validation and edge cases\n",
|
||
"**Why it matters**: Core operation in neural networks (linear layers, attention mechanisms)\n",
|
||
"**Expected**: Correct results for valid shapes, clear error messages for invalid shapes"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "93279707",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": true,
|
||
"grade_id": "test-matmul",
|
||
"locked": true,
|
||
"points": 15
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def test_unit_matrix_multiplication():\n",
|
||
" \"\"\"🧪 Test matrix multiplication operations.\"\"\"\n",
|
||
" print(\"🧪 Unit Test: Matrix Multiplication...\")\n",
|
||
"\n",
|
||
" # Test 2×2 matrix multiplication (basic case)\n",
|
||
" a = Tensor([[1, 2], [3, 4]]) # 2×2\n",
|
||
" b = Tensor([[5, 6], [7, 8]]) # 2×2\n",
|
||
" result = a.matmul(b)\n",
|
||
" # Expected: [[1×5+2×7, 1×6+2×8], [3×5+4×7, 3×6+4×8]] = [[19, 22], [43, 50]]\n",
|
||
" expected = np.array([[19, 22], [43, 50]], dtype=np.float32)\n",
|
||
" assert np.array_equal(result.data, expected)\n",
|
||
"\n",
|
||
" # Test rectangular matrices (common in neural networks)\n",
|
||
" c = Tensor([[1, 2, 3], [4, 5, 6]]) # 2×3 (like batch_size=2, features=3)\n",
|
||
" d = Tensor([[7, 8], [9, 10], [11, 12]]) # 3×2 (like features=3, outputs=2)\n",
|
||
" result = c.matmul(d)\n",
|
||
" # Expected: [[1×7+2×9+3×11, 1×8+2×10+3×12], [4×7+5×9+6×11, 4×8+5×10+6×12]]\n",
|
||
" expected = np.array([[58, 64], [139, 154]], dtype=np.float32)\n",
|
||
" assert np.array_equal(result.data, expected)\n",
|
||
"\n",
|
||
" # Test matrix-vector multiplication (common in forward pass)\n",
|
||
" matrix = Tensor([[1, 2, 3], [4, 5, 6]]) # 2×3\n",
|
||
" vector = Tensor([1, 2, 3]) # 3×1 (conceptually)\n",
|
||
" result = matrix.matmul(vector)\n",
|
||
" # Expected: [1×1+2×2+3×3, 4×1+5×2+6×3] = [14, 32]\n",
|
||
" expected = np.array([14, 32], dtype=np.float32)\n",
|
||
" assert np.array_equal(result.data, expected)\n",
|
||
"\n",
|
||
" # Test shape validation - should raise clear error\n",
|
||
" try:\n",
"     incompatible_a = Tensor([[1, 2]]) # 1×2\n",
"     incompatible_b = Tensor([[1], [2], [3]]) # 3×1\n",
"     incompatible_a.matmul(incompatible_b) # 1×2 @ 3×1 should fail (2 ≠ 3)\n",
"     assert False, \"Should have raised ValueError for incompatible shapes\"\n",
" except ValueError as e:\n",
"     assert \"Inner dimensions must match\" in str(e)\n",
"     assert \"2 ≠ 3\" in str(e) # Should show specific dimensions\n",
|
||
"\n",
|
||
" print(\"✅ Matrix multiplication works correctly!\")\n",
|
||
"\n",
|
||
"if __name__ == \"__main__\":\n",
|
||
" test_unit_matrix_multiplication()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "2439ca3e",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 2
|
||
},
|
||
"source": [
|
||
"## Shape Manipulation: Reshape and Transpose\n",
|
||
"\n",
|
||
"Neural networks constantly change tensor shapes to match layer requirements. Understanding these operations is crucial for data flow through networks.\n",
|
||
"\n",
|
||
"### Why Shape Manipulation Matters\n",
|
||
"\n",
|
||
"Real neural networks require constant shape changes:\n",
|
||
"\n",
|
||
"```\n",
|
||
"CNN Data Flow Example:\n",
|
||
"Input Image: (32, 3, 224, 224) # batch, channels, height, width\n",
|
||
" ↓ Convolutional layers\n",
|
||
"Feature Maps: (32, 512, 7, 7) # batch, features, spatial\n",
|
||
" ↓ Global Average Pool\n",
|
||
"Pooled: (32, 512, 1, 1) # batch, features, 1, 1\n",
|
||
" ↓ Flatten for classifier\n",
|
||
"Flattened: (32, 512) # batch, features\n",
|
||
" ↓ Linear classifier\n",
|
||
"Output: (32, 1000) # batch, classes\n",
|
||
"\n",
"Only the flatten step is a pure reshape; the other steps compute new values.\n",
|
||
"```\n",
|
||
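"\n",
"The last two shape changes can be reproduced in NumPy with random feature maps. This is a sketch that assumes NumPy and does not model the convolutions or the classifier. Global average pooling is a reduction, while flattening is a pure reshape that copies nothing:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"feature_maps = np.random.rand(32, 512, 7, 7).astype(np.float32)  # batch, channels, H, W\n",
"\n",
"pooled = feature_maps.mean(axis=(2, 3), keepdims=True)  # (32, 512, 1, 1): a reduction\n",
"flattened = pooled.reshape(32, -1)                      # (32, 512): a pure reshape\n",
"print(pooled.shape, flattened.shape)\n",
"\n",
"# The flattened result shares memory with pooled (no copy was made)\n",
"print(np.shares_memory(pooled, flattened))  # True\n",
"```\n",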
"\n",
|
||
"### Reshape: Changing Interpretation of the Same Data\n",
|
||
"\n",
|
||
"```\n",
|
||
"Reshaping (changing dimensions without changing data):\n",
|
||
"Original: [1, 2, 3, 4, 5, 6] (shape: (6,))\n",
|
||
" ↓ reshape(2, 3)\n",
|
||
"Result: [[1, 2, 3], (shape: (2, 3))\n",
|
||
" [4, 5, 6]]\n",
|
||
"\n",
|
||
"Memory Layout (unchanged):\n",
|
||
"Before: [1][2][3][4][5][6]\n",
|
||
"After: [1][2][3][4][5][6] ← Same memory, different interpretation\n",
|
||
"\n",
"Key Insight: For a contiguous array, reshape is O(1): NumPy returns a view\n",
"that reinterprets the same memory. A non-contiguous input (e.g., a transposed\n",
"array) may force a copy.\n",
|
||
"\n",
|
||
"Common ML Reshapes:\n",
|
||
"┌─────────────────────┬─────────────────────┬─────────────────────┐\n",
|
||
"│ Flatten for MLP │ Unflatten for CNN │ Batch Dimension │\n",
|
||
"├─────────────────────┼─────────────────────┼─────────────────────┤\n",
|
||
"│ (N,H,W,C) → (N,H×W×C) │ (N,D) → (N,H,W,C) │ (H,W) → (1,H,W) │\n",
|
||
"│ Images to vectors │ Vectors to images │ Add batch dimension │\n",
|
||
"└─────────────────────┴─────────────────────┴─────────────────────┘\n",
|
||
"```\n",
|
||
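"\n",
"NumPy makes the view-versus-copy distinction observable with `np.shares_memory`. This sketch (NumPy assumed) shows that reshaping a contiguous array shares memory, while flattening a transposed, non-contiguous array has to copy:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"flat = np.arange(1, 7, dtype=np.float32)  # [1, 2, 3, 4, 5, 6]\n",
"matrix = flat.reshape(2, 3)               # view: same buffer, new shape\n",
"print(np.shares_memory(flat, matrix))     # True\n",
"\n",
"transposed = matrix.T                     # (3, 2) view with swapped strides\n",
"refolded = transposed.reshape(6)          # order 1, 4, 2, 5, 3, 6 needs new memory\n",
"print(refolded)                           # [1. 4. 2. 5. 3. 6.]\n",
"print(np.shares_memory(flat, refolded))   # False\n",
"```\n",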
"\n",
|
||
"### Transpose: Swapping Dimensions\n",
|
||
"\n",
|
||
"```\n",
|
||
"Transposing (swapping dimensions - data rearrangement):\n",
|
||
"Original: [[1, 2, 3], (shape: (2, 3))\n",
|
||
" [4, 5, 6]]\n",
|
||
" ↓ transpose()\n",
|
||
"Result: [[1, 4], (shape: (3, 2))\n",
|
||
" [2, 5],\n",
|
||
" [3, 6]]\n",
|
||
"\n",
"Memory Layout (contiguous copy of the result):\n",
"Before: [1][2][3][4][5][6]\n",
"After:  [1][4][2][5][3][6] ← Elements are reordered when the result is copied\n",
"\n",
"Key Insight: NumPy's transpose() is O(1): it returns a view with swapped strides.\n",
"Data moves only when a contiguous copy is made, which costs O(N).\n",
|
||
"\n",
|
||
"Neural Network Usage:\n",
|
||
"┌─────────────────────┬─────────────────────┬─────────────────────┐\n",
|
||
"│ Weight Matrices │ Attention Mechanism │ Gradient Computation│\n",
|
||
"├─────────────────────┼─────────────────────┼─────────────────────┤\n",
|
||
"│ Forward: X @ W │ Q @ K^T attention │ ∂L/∂W = X^T @ ∂L/∂Y│\n",
"│ Backward: ∂L/∂Y @ W^T │ scores │ │\n",
|
||
"└─────────────────────┴─────────────────────┴─────────────────────┘\n",
|
||
"```\n",
|
||
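"\n",
"In NumPy, `transpose()` returns a view with swapped strides, and the reordered `[1][4][2][5][3][6]` layout exists only after a contiguous copy is made. The sketch below assumes NumPy and float32 data (4 bytes per element):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)\n",
"t = a.T\n",
"\n",
"print(a.strides, t.strides)         # (12, 4) (4, 12): same buffer, swapped strides\n",
"print(np.shares_memory(a, t))       # True\n",
"print(t.flags['C_CONTIGUOUS'])      # False\n",
"\n",
"t_copy = np.ascontiguousarray(t)    # materialize the transposed layout\n",
"print(t_copy.ravel())               # [1. 4. 2. 5. 3. 6.]\n",
"print(np.shares_memory(a, t_copy))  # False\n",
"```\n",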
"\n",
|
||
"### Performance Implications\n",
|
||
"\n",
|
||
"```\n",
"Operation Cost (NumPy semantics):\n",
"┌─────────────────┬──────────────┬─────────────────┬─────────────────┐\n",
"│ Operation       │ Cost         │ Memory Access   │ Cache Behavior  │\n",
"├─────────────────┼──────────────┼─────────────────┼─────────────────┤\n",
"│ reshape()       │ O(1) view*   │ No data copy    │ No cache impact │\n",
"│ transpose()     │ O(1) view    │ Strides swapped │ No cache impact │\n",
"│ contiguous copy │ O(N)         │ Full data copy  │ Strided reads   │\n",
"│ view() (future) │ O(1)         │ No data copy    │ No cache impact │\n",
"└─────────────────┴──────────────┴─────────────────┴─────────────────┘\n",
"* reshape copies when its input is not contiguous (e.g., after transpose)\n",
"\n",
"Why materializing a transpose is slower:\n",
"- Every element is copied to a new position\n",
"- Either the reads or the writes are strided (poor cache locality)\n",
"- Later ops that read a non-contiguous view also access memory with strides\n",
|
||
"```\n",
|
||
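"\n",
"Because timings depend on hardware, the costs are best measured locally. This sketch uses NumPy and the standard-library `timeit` module. No particular numbers are implied:\n",
"\n",
"```python\n",
"import timeit\n",
"import numpy as np\n",
"\n",
"x = np.random.rand(1000, 1000).astype(np.float32)\n",
"\n",
"def time_ms(fn, repeat=20):\n",
"    # Best-of-repeat wall time for a single call, in milliseconds\n",
"    return min(timeit.repeat(fn, number=1, repeat=repeat)) * 1000\n",
"\n",
"t_reshape = time_ms(lambda: x.reshape(-1))\n",
"t_view = time_ms(lambda: x.T)\n",
"t_copy = time_ms(lambda: np.ascontiguousarray(x.T))\n",
"\n",
"print(f'reshape (view):              {t_reshape:.4f} ms')\n",
"print(f'transpose (view):            {t_view:.4f} ms')\n",
"print(f'transpose + contiguous copy: {t_copy:.4f} ms')\n",
"```\n",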
"\n",
"This is why frameworks like PyTorch make `transpose()` return a strided view and defer the actual data movement until a contiguous layout is required (for example, via `.contiguous()`)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "30ef42fb",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🧪 Unit Test: Shape Manipulation\n",
|
||
"\n",
|
||
"This test validates reshape and transpose operations work correctly with validation and edge cases.\n",
|
||
"\n",
|
||
"**What we're testing**: Reshape and transpose operations with proper error handling\n",
|
||
"**Why it matters**: Essential for data flow in neural networks, CNN/RNN architectures\n",
|
||
"**Expected**: Correct shape changes, proper error handling for invalid operations"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "8ff5e144",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": true,
|
||
"grade_id": "test-shape-ops",
|
||
"locked": true,
|
||
"points": 15
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def test_unit_shape_manipulation():\n",
|
||
" \"\"\"🧪 Test reshape and transpose operations.\"\"\"\n",
|
||
" print(\"🧪 Unit Test: Shape Manipulation...\")\n",
|
||
"\n",
|
||
" # Test basic reshape (flatten → matrix)\n",
|
||
" tensor = Tensor([1, 2, 3, 4, 5, 6]) # Shape: (6,)\n",
|
||
" reshaped = tensor.reshape(2, 3) # Shape: (2, 3)\n",
|
||
" assert reshaped.shape == (2, 3)\n",
|
||
" expected = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)\n",
|
||
" assert np.array_equal(reshaped.data, expected)\n",
|
||
"\n",
|
||
" # Test reshape with tuple (alternative calling style)\n",
|
||
" reshaped2 = tensor.reshape((3, 2)) # Shape: (3, 2)\n",
|
||
" assert reshaped2.shape == (3, 2)\n",
|
||
" expected2 = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.float32)\n",
|
||
" assert np.array_equal(reshaped2.data, expected2)\n",
|
||
"\n",
|
||
" # Test reshape with -1 (automatic dimension inference)\n",
|
||
" auto_reshaped = tensor.reshape(2, -1) # Should infer -1 as 3\n",
|
||
" assert auto_reshaped.shape == (2, 3)\n",
|
||
"\n",
|
||
" # Test reshape validation - should raise error for incompatible sizes\n",
|
||
" try:\n",
"     tensor.reshape(2, 2) # 6 elements can't fit in 2×2=4\n",
"     assert False, \"Should have raised ValueError\"\n",
" except ValueError as e:\n",
"     assert \"Total elements must match\" in str(e)\n",
"     assert \"6 ≠ 4\" in str(e)\n",
|
||
"\n",
|
||
" # Test matrix transpose (most common case)\n",
|
||
" matrix = Tensor([[1, 2, 3], [4, 5, 6]]) # (2, 3)\n",
|
||
" transposed = matrix.transpose() # (3, 2)\n",
|
||
" assert transposed.shape == (3, 2)\n",
|
||
" expected = np.array([[1, 4], [2, 5], [3, 6]], dtype=np.float32)\n",
|
||
" assert np.array_equal(transposed.data, expected)\n",
|
||
"\n",
|
||
" # Test 1D transpose (should be identity)\n",
|
||
" vector = Tensor([1, 2, 3])\n",
|
||
" vector_t = vector.transpose()\n",
|
||
" assert np.array_equal(vector.data, vector_t.data)\n",
|
||
"\n",
|
||
" # Test specific dimension transpose\n",
|
||
" tensor_3d = Tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) # (2, 2, 2)\n",
|
||
" swapped = tensor_3d.transpose(0, 2) # Swap first and last dimensions\n",
|
||
" assert swapped.shape == (2, 2, 2) # Same shape but data rearranged\n",
|
||
"\n",
|
||
" # Test neural network reshape pattern (flatten for MLP)\n",
|
||
" batch_images = Tensor(np.random.rand(2, 3, 4)) # (batch=2, height=3, width=4)\n",
|
||
" flattened = batch_images.reshape(2, -1) # (batch=2, features=12)\n",
|
||
" assert flattened.shape == (2, 12)\n",
|
||
"\n",
|
||
" print(\"✅ Shape manipulation works correctly!\")\n",
|
||
"\n",
|
||
"if __name__ == \"__main__\":\n",
|
||
" test_unit_shape_manipulation()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "5be42959",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 2
|
||
},
|
||
"source": [
|
||
"## Reduction Operations: Aggregating Information\n",
|
||
"\n",
|
||
"Reduction operations collapse dimensions by aggregating data, which is essential for computing statistics, losses, and preparing data for different layers.\n",
|
||
"\n",
|
||
"### Why Reductions are Crucial in ML\n",
|
||
"\n",
|
||
"Reduction operations appear throughout neural networks:\n",
|
||
"\n",
|
||
"```\n",
|
||
"Common ML Reduction Patterns:\n",
|
||
"\n",
|
||
"┌─────────────────────┬─────────────────────┬─────────────────────┐\n",
|
||
"│ Loss Computation │ Batch Normalization │ Global Pooling │\n",
|
||
"├─────────────────────┼─────────────────────┼─────────────────────┤\n",
|
||
"│ Per-sample losses → │ Batch statistics → │ Feature maps → │\n",
|
||
"│ Single batch loss │ Normalization │ Single features │\n",
|
||
"│ │ │ │\n",
|
||
"│ losses.mean() │ batch.mean(axis=0) │ fmaps.mean(axis=(2,3))│\n",
|
||
"│ (N,) → scalar │ (N,D) → (D,) │ (N,C,H,W) → (N,C) │\n",
|
||
"└─────────────────────┴─────────────────────┴─────────────────────┘\n",
|
||
"\n",
|
||
"Real Examples:\n",
"• Cross-entropy loss: -log(prob_of_true_class).mean() [average over batch]\n",
"• Batch norm: (x - x.mean(axis=0)) / x.std(axis=0) [normalize each feature]\n",
"• Global avg pool: features.mean(axis=(2,3)) [spatial → scalar per channel]\n",
|
||
"```\n",
|
||
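"\n",
"The three patterns can be written directly in NumPy. This sketch uses made-up probabilities and labels as illustrative values only, and the batch-norm line omits epsilon and learned parameters:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Cross-entropy: mean of -log(probability assigned to the true class)\n",
"probs = np.array([[0.7, 0.2, 0.1],\n",
"                  [0.1, 0.8, 0.1]], dtype=np.float32)  # (N=2, classes=3)\n",
"labels = np.array([0, 1])\n",
"per_sample = -np.log(probs[np.arange(len(labels)), labels])  # (N,)\n",
"loss = per_sample.mean()                                     # scalar\n",
"\n",
"# Batch norm statistics: one mean/std per feature, reduced over the batch axis\n",
"batch = np.random.rand(8, 4).astype(np.float32)        # (N, D)\n",
"normalized = (batch - batch.mean(axis=0)) / batch.std(axis=0)\n",
"\n",
"# Global average pooling: average each channel over its spatial positions\n",
"fmaps = np.random.rand(2, 3, 5, 5).astype(np.float32)  # (N, C, H, W)\n",
"pooled = fmaps.mean(axis=(2, 3))                        # (N, C)\n",
"\n",
"print(loss, normalized.shape, pooled.shape)\n",
"```\n",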
"\n",
|
||
"### Understanding Axis Operations\n",
|
||
"\n",
|
||
"```\n",
|
||
"Visual Axis Understanding:\n",
|
||
"Matrix: [[1, 2, 3], All reductions operate on this data\n",
|
||
" [4, 5, 6]] Shape: (2, 3)\n",
|
||
"\n",
|
||
" axis=0 (↓)\n",
|
||
" ┌─────────┐\n",
|
||
"axis=1 │ 1 2 3 │ → axis=1 reduces across columns (→)\n",
|
||
" (→) │ 4 5 6 │ → Result shape: (2,) [one value per row]\n",
|
||
" └─────────┘\n",
|
||
" ↓ ↓ ↓\n",
|
||
" axis=0 reduces down rows (↓)\n",
|
||
" Result shape: (3,) [one value per column]\n",
|
||
"\n",
|
||
"Reduction Results:\n",
|
||
"├─ .sum() → 21 (sum all: 1+2+3+4+5+6)\n",
|
||
"├─ .sum(axis=0) → [5, 7, 9] (sum columns: [1+4, 2+5, 3+6])\n",
|
||
"├─ .sum(axis=1) → [6, 15] (sum rows: [1+2+3, 4+5+6])\n",
|
||
"├─ .mean() → 3.5 (average all: 21/6)\n",
|
||
"├─ .mean(axis=0) → [2.5, 3.5, 4.5] (average columns)\n",
|
||
"└─ .max() → 6 (maximum element)\n",
|
||
"\n",
|
||
"3D Tensor Example (batch, height, width):\n",
|
||
"data.shape = (2, 3, 4) # 2 samples, 3×4 images\n",
|
||
"│\n",
|
||
"├─ .sum(axis=0) → (3, 4) # Sum across batch dimension\n",
|
||
"├─ .sum(axis=1) → (2, 4) # Sum across height dimension\n",
|
||
"├─ .sum(axis=2) → (2, 3) # Sum across width dimension\n",
|
||
"└─ .sum(axis=(1,2)) → (2,) # Sum across both spatial dims (global pool)\n",
|
||
"```\n",
|
||
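"\n",
"The results above can be checked in NumPy. The sketch also shows why `keepdims=True` matters. Keeping the reduced axis as size 1 lets the result broadcast back against the original array, here to scale each row so it sums to 1:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"m = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)\n",
"print(m.sum(), m.sum(axis=0), m.sum(axis=1))  # 21.0 [5. 7. 9.] [ 6. 15.]\n",
"print(m.mean(axis=0), m.max())                # [2.5 3.5 4.5] 6.0\n",
"\n",
"# keepdims=True gives shape (2, 1), which broadcasts against (2, 3)\n",
"row_sums = m.sum(axis=1, keepdims=True)\n",
"row_normalized = m / row_sums\n",
"print(row_normalized.sum(axis=1))             # each row sums to about 1\n",
"\n",
"data = np.zeros((2, 3, 4))\n",
"print([data.sum(axis=ax).shape for ax in (0, 1, 2, (1, 2))])\n",
"# [(3, 4), (2, 4), (2, 3), (2,)]\n",
"```\n",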
"\n",
|
||
"### Memory and Performance Considerations\n",
|
||
"\n",
|
||
"```\n",
|
||
"Reduction Performance:\n",
|
||
"┌─────────────────┬──────────────┬─────────────────┬─────────────────┐\n",
|
||
"│ Operation │ Time Complex │ Memory Access │ Cache Behavior │\n",
|
||
"├─────────────────┼──────────────┼─────────────────┼─────────────────┤\n",
|
||
"│ .sum() │ O(N) │ Sequential read │ Excellent │\n",
|
||
"│ .sum(axis=0) │ O(N) │ Column access │ Poor (strided) │\n",
|
||
"│ .sum(axis=1) │ O(N) │ Row access │ Excellent │\n",
|
||
"│ .mean() │ O(N) │ Sequential read │ Excellent │\n",
|
||
"│ .max() │ O(N) │ Sequential read │ Excellent │\n",
|
||
"└─────────────────┴──────────────┴─────────────────┴─────────────────┘\n",
|
||
"\n",
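"Why axis=0 can be slower (naive column-by-column loop, row-major array):\n",
"- Consecutive elements of a column are a full row apart (large stride)\n",
"- Poor cache locality (every access jumps to a different row)\n",
"- Harder to vectorize than contiguous reads\n",
"- NumPy's reduction machinery may reorder loops to avoid this, so measure first\n",
"\n",
"Optimization strategies:\n",
"- Reduce over the last (contiguous) axis when you control the layout\n",
"- Use keepdims=True to maintain shape for broadcasting\n",
"- For a reduction repeated many times, consider one contiguous copy with the\n",
"  reduced axis last\n",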
|
||
"```"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "e5824871",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🧪 Unit Test: Reduction Operations\n",
|
||
"\n",
|
||
"This test validates reduction operations work correctly with axis control and maintain proper shapes.\n",
|
||
"\n",
|
||
"**What we're testing**: Sum, mean, max operations with axis parameter and keepdims\n",
|
||
"**Why it matters**: Essential for loss computation, batch processing, and pooling operations\n",
|
||
"**Expected**: Correct reduction along specified axes with proper shape handling"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "e35f8cc5",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": true,
|
||
"grade_id": "test-reductions",
|
||
"locked": true,
|
||
"points": 10
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def test_unit_reduction_operations():\n",
|
||
" \"\"\"🧪 Test reduction operations.\"\"\"\n",
|
||
" print(\"🧪 Unit Test: Reduction Operations...\")\n",
|
||
"\n",
|
||
" matrix = Tensor([[1, 2, 3], [4, 5, 6]]) # Shape: (2, 3)\n",
|
||
"\n",
|
||
" # Test sum all elements (common for loss computation)\n",
|
||
" total = matrix.sum()\n",
|
||
" assert total.data == 21.0 # 1+2+3+4+5+6\n",
|
||
" assert total.shape == () # Scalar result\n",
|
||
"\n",
|
||
" # Test sum along axis 0 (columns) - batch dimension reduction\n",
|
||
" col_sum = matrix.sum(axis=0)\n",
|
||
" expected_col = np.array([5, 7, 9], dtype=np.float32) # [1+4, 2+5, 3+6]\n",
|
||
" assert np.array_equal(col_sum.data, expected_col)\n",
|
||
" assert col_sum.shape == (3,)\n",
|
||
"\n",
|
||
" # Test sum along axis 1 (rows) - feature dimension reduction\n",
|
||
" row_sum = matrix.sum(axis=1)\n",
|
||
" expected_row = np.array([6, 15], dtype=np.float32) # [1+2+3, 4+5+6]\n",
|
||
" assert np.array_equal(row_sum.data, expected_row)\n",
|
||
" assert row_sum.shape == (2,)\n",
|
||
"\n",
|
||
" # Test mean (average loss computation)\n",
|
||
" avg = matrix.mean()\n",
|
||
" assert np.isclose(avg.data, 3.5) # 21/6\n",
|
||
" assert avg.shape == ()\n",
|
||
"\n",
|
||
" # Test mean along axis (batch normalization pattern)\n",
|
||
" col_mean = matrix.mean(axis=0)\n",
|
||
" expected_mean = np.array([2.5, 3.5, 4.5], dtype=np.float32) # [5/2, 7/2, 9/2]\n",
|
||
" assert np.allclose(col_mean.data, expected_mean)\n",
|
||
"\n",
|
||
" # Test max (finding best predictions)\n",
|
||
" maximum = matrix.max()\n",
|
||
" assert maximum.data == 6.0\n",
|
||
" assert maximum.shape == ()\n",
|
||
"\n",
" # Test max along axis (per-row maximum values, not argmax indices)\n",
|
||
" row_max = matrix.max(axis=1)\n",
|
||
" expected_max = np.array([3, 6], dtype=np.float32) # [max(1,2,3), max(4,5,6)]\n",
|
||
" assert np.array_equal(row_max.data, expected_max)\n",
|
||
"\n",
|
||
" # Test keepdims (important for broadcasting)\n",
|
||
" sum_keepdims = matrix.sum(axis=1, keepdims=True)\n",
|
||
" assert sum_keepdims.shape == (2, 1) # Maintains 2D shape\n",
|
||
" expected_keepdims = np.array([[6], [15]], dtype=np.float32)\n",
|
||
" assert np.array_equal(sum_keepdims.data, expected_keepdims)\n",
|
||
"\n",
|
||
" # Test 3D reduction (simulating global average pooling)\n",
|
||
" tensor_3d = Tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) # (2, 2, 2)\n",
|
||
" spatial_mean = tensor_3d.mean(axis=(1, 2)) # Average across spatial dimensions\n",
|
||
" assert spatial_mean.shape == (2,) # One value per batch item\n",
|
||
"\n",
|
||
" print(\"✅ Reduction operations work correctly!\")\n",
|
||
"\n",
|
||
"if __name__ == \"__main__\":\n",
|
||
" test_unit_reduction_operations()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "cf6df213",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 2
|
||
},
|
||
"source": [
|
||
"## Gradient Features: Preparing for Module 05\n",
|
||
"\n",
|
||
"Our Tensor includes dormant gradient features that will spring to life in Module 05. For now, they exist but do nothing - this design choice ensures a consistent interface throughout the course.\n",
|
||
"\n",
|
||
"### Why Include Gradient Features Now?\n",
|
||
"\n",
|
||
"```\n",
|
||
"Gradient System Evolution:\n",
|
||
"Module 01: Tensor with dormant gradients\n",
|
||
" ┌─────────────────────────────────┐\n",
|
||
" │ Tensor │\n",
|
||
" │ • data: actual values │\n",
|
||
" │ • requires_grad: False │ ← Present but unused\n",
|
||
" │ • grad: None │ ← Present but stays None\n",
|
||
" │ • backward(): pass │ ← Present but does nothing\n",
|
||
" └─────────────────────────────────┘\n",
|
||
" ↓ Module 05 activates these\n",
|
||
"Module 05: Tensor with active gradients\n",
|
||
" ┌─────────────────────────────────┐\n",
|
||
" │ Tensor │\n",
|
||
" │ • data: actual values │\n",
|
||
" │ • requires_grad: True │ ← Now controls gradient tracking\n",
|
||
" │ • grad: computed gradients │ ← Now accumulates gradients\n",
|
||
" │ • backward(): computes grads │ ← Now implements chain rule\n",
|
||
" └─────────────────────────────────┘\n",
|
||
"```\n",
|
||
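"\n",
"The dormant-attribute design can be sketched in a few lines. `DormantTensor` is a hypothetical stand-in written for illustration, not TinyTorch's actual `Tensor`. It only shows how the interface can exist before the behavior does:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"class DormantTensor:\n",
"    # Hypothetical sketch: gradient fields exist from day one but do nothing yet\n",
"    def __init__(self, data, requires_grad=False):\n",
"        self.data = np.array(data, dtype=np.float32)\n",
"        self.requires_grad = requires_grad  # stored, not yet used\n",
"        self.grad = None                    # stays None until autograd exists\n",
"\n",
"    def backward(self):\n",
"        # No-op placeholder; a later module would implement the chain rule here\n",
"        pass\n",
"\n",
"t = DormantTensor([1, 2], requires_grad=True)\n",
"t.backward()                    # runs without error, changes nothing\n",
"print(t.requires_grad, t.grad)  # True None\n",
"```\n",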
"\n",
|
||
"### Design Benefits\n",
|
||
"\n",
|
||
"**Consistency**: Same Tensor class interface throughout all modules\n",
|
||
"- No confusing Variable vs. Tensor distinction (unlike early PyTorch)\n",
|
||
"- Students never need to learn a \"new\" Tensor class\n",
|
||
"- IDE autocomplete works from day one\n",
|
||
"\n",
|
||
"**Gradual Complexity**: Features activate when students are ready\n",
"- Modules 01-04: Ignore gradient features, focus on operations\n",
|
||
"- Module 05: Gradient features \"turn on\" magically\n",
|
||
"- No cognitive overload in early modules\n",
|
||
"\n",
|
||
"**Future-Proof**: Easy to extend without breaking changes\n",
|
||
"- Additional features can be added as dormant initially\n",
|
||
"- No monkey-patching or dynamic class modification\n",
|
||
"- Clean evolution path\n",
|
||
"\n",
|
||
"### Current State (Module 01)\n",
|
||
"\n",
|
||
"```\n",
|
||
"Gradient Features - Current Behavior:\n",
|
||
"┌─────────────────────────────────────────────────────────┐\n",
|
||
"│ Feature │ Current State │ Module 05 State │\n",
|
||
"├─────────────────────────────────────────────────────────┤\n",
|
||
"│ requires_grad │ False │ True (when needed) │\n",
|
||
"│ grad │ None │ np.array(...) │\n",
|
||
"│ backward() │ pass (no-op) │ Chain rule impl │\n",
|
||
"│ Operation chaining│ Not tracked │ Computation graph │\n",
|
||
"└─────────────────────────────────────────────────────────┘\n",
|
||
"\n",
|
||
"Student Experience:\n",
|
||
"• Can call .backward() without errors (just does nothing)\n",
|
||
"• Can set requires_grad=True (just gets stored)\n",
|
||
"• Focus on understanding tensor operations first\n",
|
||
"• Gradients remain \"mysterious\" until Module 05 reveals them\n",
|
||
"```\n",
|
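"\n",
"The dormant-feature pattern from the table can be sketched in a few lines. `DormantTensor` below is a hypothetical, stripped-down class written only for illustration (it is not this module's `Tensor`) and assumes NumPy for storage: it stores `requires_grad` and `grad` without using them, and `backward()` is a deliberate no-op.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"\n",
"class DormantTensor:\n",
"    # Hypothetical illustration of dormant gradient attributes (not the real Tensor)\n",
"    def __init__(self, data, requires_grad=False):\n",
"        self.data = np.array(data, dtype=np.float32)\n",
"        self.requires_grad = requires_grad  # stored, but nothing reads it yet\n",
"        self.grad = None                    # stays None until autograd exists\n",
"\n",
"    def __add__(self, other):\n",
"        other_data = other.data if isinstance(other, DormantTensor) else other\n",
"        # Returns a new tensor; no computation graph is recorded\n",
"        return DormantTensor(self.data + other_data)\n",
"\n",
"    def backward(self):\n",
"        # Dormant: callable without error, but does nothing yet\n",
"        pass\n",
"\n",
"\n",
"t = DormantTensor([1, 2, 3], requires_grad=True)\n",
"out = t + 5\n",
"t.backward()\n",
"print(t.requires_grad, t.grad)  # True None\n",
"print(out.data)                 # [6. 7. 8.]\n",
"```\n",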
||
"\n",
|
||
"This approach matches the pedagogical principle of \"progressive disclosure\" - reveal complexity only when students are ready to handle it."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "6d368af1",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 2
|
||
},
|
||
"source": [
|
||
"## 4. Integration: Bringing It Together\n",
|
||
"\n",
|
||
"Let's test how our Tensor operations work together in realistic scenarios that mirror neural network computations. This integration demonstrates that our individual operations combine correctly for complex ML workflows.\n",
|
||
"\n",
|
||
"### Neural Network Layer Simulation\n",
|
||
"\n",
|
||
"The fundamental building block of neural networks is the linear transformation: **y = xW + b**\n",
|
||
"\n",
|
||
"```\n",
|
||
"Linear Layer Forward Pass: y = xW + b\n",
|
||
"\n",
|
||
"Input Features → Weight Matrix → Matrix Multiply → Add Bias → Output Features\n",
|
||
" (batch, in) (in, out) (batch, out) (batch, out) (batch, out)\n",
|
||
"\n",
|
||
"Step-by-Step Breakdown:\n",
|
||
"1. Input: X shape (batch_size, input_features)\n",
|
||
"2. Weight: W shape (input_features, output_features)\n",
|
||
"3. Matmul: XW shape (batch_size, output_features)\n",
|
||
"4. Bias: b shape (output_features,)\n",
|
||
"5. Result: XW + b shape (batch_size, output_features)\n",
|
||
"\n",
|
||
"Example Flow:\n",
|
||
"Input: [[1, 2, 3], Weight: [[0.1, 0.2], Bias: [0.1, 0.2]\n",
|
||
" [4, 5, 6]] [0.3, 0.4],\n",
|
||
" (2, 3) [0.5, 0.6]]\n",
|
||
" (3, 2)\n",
|
||
"\n",
|
"Step 1: Matrix Multiply\n",
"[[1, 2, 3],  @  [[0.1, 0.2],  =  [[1×0.1+2×0.3+3×0.5, 1×0.2+2×0.4+3×0.6],\n",
" [4, 5, 6]]      [0.3, 0.4],      [4×0.1+5×0.3+6×0.5, 4×0.2+5×0.4+6×0.6]]\n",
"                 [0.5, 0.6]]\n",
"                              =  [[2.2, 2.8],\n",
"                                  [4.9, 6.4]]\n",
"\n",
"Step 2: Add Bias (Broadcasting)\n",
"[[2.2, 2.8],  +  [0.1, 0.2]  =  [[2.3, 3.0],\n",
" [4.9, 6.4]]                     [5.0, 6.6]]\n",
|
||
"\n",
|
||
"This is the foundation of every neural network layer!\n",
|
||
"```\n",
|
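"\n",
"The arithmetic in the example can be checked with plain NumPy. This sketch deliberately uses `np.ndarray` instead of this module's `Tensor` so it runs on its own; with `Tensor` the same step is written `x.matmul(W) + b`, as in `test_module()` below.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Inputs from the example: 2 samples with 3 features each\n",
"x = np.array([[1, 2, 3],\n",
"              [4, 5, 6]], dtype=np.float32)   # (batch=2, in=3)\n",
"W = np.array([[0.1, 0.2],\n",
"              [0.3, 0.4],\n",
"              [0.5, 0.6]], dtype=np.float32)  # (in=3, out=2)\n",
"b = np.array([0.1, 0.2], dtype=np.float32)    # (out=2,)\n",
"\n",
"xw = x @ W   # Step 1: matrix multiply, shape (2, 2)\n",
"y = xw + b   # Step 2: b broadcasts across the batch axis, shape (2, 2)\n",
"\n",
"print(xw)  # approximately [[2.2, 2.8], [4.9, 6.4]]\n",
"print(y)   # approximately [[2.3, 3.0], [5.0, 6.6]]\n",
"```\n",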
||
"\n",
|
||
"### Why This Integration Matters\n",
|
||
"\n",
|
||
"This simulation shows how our basic operations combine to create the computational building blocks of neural networks:\n",
|
||
"\n",
|
||
"- **Matrix Multiplication**: Transforms input features into new feature space\n",
|
||
"- **Broadcasting Addition**: Applies learned biases efficiently across batches\n",
|
||
"- **Shape Handling**: Ensures data flows correctly through layers\n",
|
"- **Memory Management**: Returns new tensors and leaves the inputs unmodified\n",
|
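"\n",
"The shape-handling and memory points can be seen directly in NumPy, whose arrays the `Tensor` class wraps: `(batch, in) @ (in, out)` yields `(batch, out)`, the input comes back unmodified, and mismatched inner dimensions are rejected. This is an illustrative sketch with arbitrary all-ones data, not part of the module's tests.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"x = np.ones((2, 3), dtype=np.float32)   # (batch, in)\n",
"W = np.ones((3, 4), dtype=np.float32)   # (in, out)\n",
"b = np.zeros(4, dtype=np.float32)       # (out,)\n",
"x_before = x.copy()\n",
"\n",
"h = x @ W + b                           # (2, 3) @ (3, 4) + (4,) gives (2, 4)\n",
"print(h.shape)                          # (2, 4)\n",
"print(np.array_equal(x, x_before))      # True: the input was not modified\n",
"\n",
"try:\n",
"    x @ np.ones((2, 4), dtype=np.float32)   # inner dimensions 3 and 2 do not match\n",
"except ValueError as err:\n",
"    print('shape mismatch rejected:', err)\n",
"```\n",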
||
"\n",
|
"Every linear (fully connected) layer, from simple MLPs to the projection layers inside transformers, uses this same xW + b pattern."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "a5c6349f",
|
||
"metadata": {
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"\"\"\"\n",
|
||
"# 🧪 Module Integration Test\n",
|
||
"\n",
|
||
"Final validation that everything works together correctly before module completion.\n",
|
||
"\"\"\"\n",
|
||
"\n",
|
||
"def import_previous_module(module_name: str, component_name: str):\n",
|
||
" import sys\n",
|
||
" import os\n",
|
||
" sys.path.append(os.path.join(os.path.dirname(__file__), '..', module_name))\n",
|
||
" module = __import__(f\"{module_name.split('_')[1]}_dev\")\n",
|
||
" return getattr(module, component_name)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "a6a6b03a",
|
||
"metadata": {
|
||
"lines_to_next_cell": 2,
|
||
"nbgrader": {
|
||
"grade": true,
|
||
"grade_id": "module-integration",
|
||
"locked": true,
|
||
"points": 20
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"def test_module():\n",
|
||
" \"\"\"\n",
|
||
" Comprehensive test of entire module functionality.\n",
|
||
"\n",
|
" This final test runs before the module summary to ensure:\n",
|
||
" - All unit tests pass\n",
|
||
" - Functions work together correctly\n",
|
||
" - Module is ready for integration with TinyTorch\n",
|
||
" \"\"\"\n",
|
||
" print(\"🧪 RUNNING MODULE INTEGRATION TEST\")\n",
|
||
" print(\"=\" * 50)\n",
|
||
"\n",
|
||
" # Run all unit tests\n",
|
||
" print(\"Running unit tests...\")\n",
|
||
" test_unit_tensor_creation()\n",
|
||
" test_unit_arithmetic_operations()\n",
|
||
" test_unit_matrix_multiplication()\n",
|
||
" test_unit_shape_manipulation()\n",
|
||
" test_unit_reduction_operations()\n",
|
||
"\n",
|
||
" print(\"\\nRunning integration scenarios...\")\n",
|
||
"\n",
|
||
" # Test realistic neural network computation\n",
|
||
" print(\"🧪 Integration Test: Two-Layer Neural Network...\")\n",
|
||
"\n",
|
||
" # Create input data (2 samples, 3 features)\n",
|
||
" x = Tensor([[1, 2, 3], [4, 5, 6]])\n",
|
||
"\n",
|
||
" # First layer: 3 inputs → 4 hidden units\n",
|
||
" W1 = Tensor([[0.1, 0.2, 0.3, 0.4],\n",
|
||
" [0.5, 0.6, 0.7, 0.8],\n",
|
||
" [0.9, 1.0, 1.1, 1.2]])\n",
|
||
" b1 = Tensor([0.1, 0.2, 0.3, 0.4])\n",
|
||
"\n",
|
" # Forward pass: hidden = x @ W1 + b1\n",
|
||
" hidden = x.matmul(W1) + b1\n",
|
||
" assert hidden.shape == (2, 4), f\"Expected (2, 4), got {hidden.shape}\"\n",
|
||
"\n",
|
||
" # Second layer: 4 hidden → 2 outputs\n",
|
||
" W2 = Tensor([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]])\n",
|
||
" b2 = Tensor([0.1, 0.2])\n",
|
||
"\n",
|
" # Output layer: output = hidden @ W2 + b2\n",
|
||
" output = hidden.matmul(W2) + b2\n",
|
||
" assert output.shape == (2, 2), f\"Expected (2, 2), got {output.shape}\"\n",
|
||
"\n",
|
||
" # Verify data flows correctly (no NaN, reasonable values)\n",
|
||
" assert not np.isnan(output.data).any(), \"Output contains NaN values\"\n",
|
||
" assert np.isfinite(output.data).all(), \"Output contains infinite values\"\n",
|
||
"\n",
|
||
" print(\"✅ Two-layer neural network computation works!\")\n",
|
||
"\n",
|
||
" # Test gradient attributes are preserved and functional\n",
|
||
" print(\"🧪 Integration Test: Gradient System Readiness...\")\n",
|
||
" grad_tensor = Tensor([1, 2, 3], requires_grad=True)\n",
|
||
" result = grad_tensor + 5\n",
|
" assert grad_tensor.requires_grad is True, \"requires_grad not preserved\"\n",
|
||
" assert grad_tensor.grad is None, \"grad should still be None\"\n",
|
||
"\n",
|
||
" # Test backward() doesn't crash (even though it does nothing)\n",
|
||
" grad_tensor.backward() # Should not raise any exception\n",
|
||
"\n",
|
||
" print(\"✅ Gradient system ready for Module 05!\")\n",
|
||
"\n",
|
||
" # Test complex shape manipulations\n",
|
||
" print(\"🧪 Integration Test: Complex Shape Operations...\")\n",
|
||
" data = Tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])\n",
|
||
"\n",
|
||
" # Reshape to 3D tensor (simulating batch processing)\n",
|
||
" tensor_3d = data.reshape(2, 2, 3) # (batch=2, height=2, width=3)\n",
|
||
" assert tensor_3d.shape == (2, 2, 3)\n",
|
||
"\n",
|
||
" # Global average pooling simulation\n",
|
||
" pooled = tensor_3d.mean(axis=(1, 2)) # Average across spatial dimensions\n",
|
||
" assert pooled.shape == (2,), f\"Expected (2,), got {pooled.shape}\"\n",
|
||
"\n",
|
||
" # Flatten for MLP\n",
|
||
" flattened = tensor_3d.reshape(2, -1) # (batch, features)\n",
|
||
" assert flattened.shape == (2, 6)\n",
|
||
"\n",
|
||
" # Transpose for different operations\n",
|
||
" transposed = tensor_3d.transpose() # Should transpose last two dims\n",
|
||
" assert transposed.shape == (2, 3, 2)\n",
|
||
"\n",
|
||
" print(\"✅ Complex shape operations work!\")\n",
|
||
"\n",
|
||
" # Test broadcasting edge cases\n",
|
||
" print(\"🧪 Integration Test: Broadcasting Edge Cases...\")\n",
|
||
"\n",
|
||
" # Scalar broadcasting\n",
|
||
" scalar = Tensor(5.0)\n",
|
||
" vector = Tensor([1, 2, 3])\n",
|
||
" result = scalar + vector # Should broadcast scalar to vector shape\n",
|
||
" expected = np.array([6, 7, 8], dtype=np.float32)\n",
|
||
" assert np.array_equal(result.data, expected)\n",
|
||
"\n",
|
||
" # Matrix + vector broadcasting\n",
|
||
" matrix = Tensor([[1, 2], [3, 4]])\n",
|
||
" vec = Tensor([10, 20])\n",
|
||
" result = matrix + vec\n",
|
||
" expected = np.array([[11, 22], [13, 24]], dtype=np.float32)\n",
|
||
" assert np.array_equal(result.data, expected)\n",
|
||
"\n",
|
||
" print(\"✅ Broadcasting edge cases work!\")\n",
|
||
"\n",
|
||
" print(\"\\n\" + \"=\" * 50)\n",
|
||
" print(\"🎉 ALL TESTS PASSED! Module ready for export.\")\n",
|
||
" print(\"Run: tito module complete 01_tensor\")\n",
|
||
"\n",
|
||
"# Run comprehensive module test\n",
|
||
"if __name__ == \"__main__\":\n",
|
||
" test_module()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "0529e454",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 🎯 MODULE SUMMARY: Tensor Foundation\n",
|
||
"\n",
|
||
"Congratulations! You've built the foundational Tensor class that powers all machine learning operations!\n",
|
||
"\n",
|
||
"### Key Accomplishments\n",
|
||
"- **Built a complete Tensor class** with arithmetic operations, matrix multiplication, and shape manipulation\n",
|
||
"- **Implemented broadcasting semantics** that match NumPy for automatic shape alignment\n",
|
||
"- **Created dormant gradient features** that will activate in Module 05 (autograd)\n",
|
||
"- **Added comprehensive ASCII diagrams** showing tensor operations visually\n",
|
||
"- **All methods defined INSIDE the class** (no monkey-patching) for clean, maintainable code\n",
|
||
"- **All tests pass ✅** (validated by `test_module()`)\n",
|
||
"\n",
|
||
"### Systems Insights Discovered\n",
|
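"- **Memory scaling**: Operations return new tensors instead of modifying inputs, so a matrix multiply holds both operands and the result in memory at once (about 3× one operand when all three have the same size)\n",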
|
"- **Broadcasting efficiency**: NumPy broadcasting aligns shapes without materializing tiled copies of the smaller operand\n",
|
||
"- **Shape validation trade-offs**: Clear errors vs. performance in tight loops\n",
|
||
"- **Architecture decisions**: Dormant features vs. inheritance for clean evolution\n",
|
||
"\n",
|
||
"### Ready for Next Steps\n",
|
||
"Your Tensor implementation enables all future modules! The dormant gradient features will spring to life in Module 05, and every neural network component will build on this foundation.\n",
|
||
"\n",
|
||
"Export with: `tito module complete 01_tensor`\n",
|
||
"\n",
|
||
"**Next**: Module 02 will add activation functions (ReLU, Sigmoid, GELU) that bring intelligence to neural networks by introducing nonlinearity!"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "Python 3 (ipykernel)",
|
||
"language": "python",
|
||
"name": "python3"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
|