mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-05-28 06:16:13 -05:00
- Enhanced module-developer agent with Dr. Sarah Rodriguez persona - Added comprehensive educational frameworks and Golden Rules - Implemented Progressive Disclosure Principle (no forward references) - Added Immediate Testing Pattern (test after each implementation) - Integrated package structure template (📦 where code exports to) - Applied clean NBGrader structure with proper scaffolding - Fixed tensor module formatting and scope boundaries - Removed confusing transparent analysis patterns - Added visual impact icons system for consistent motivation 🎯 Ready to apply these proven educational principles to all modules
1712 lines
68 KiB
Plaintext
{
|
||
"cells": [
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "c8575dba",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"# Tensor - The Foundation of Machine Learning\n",
|
||
"\n",
|
||
"Welcome to Tensor! You'll build the fundamental data structure that powers every neural network.\n",
|
||
"\n",
|
||
"## 🔗 Building on Previous Learning\n",
|
||
"**What You Built Before**:\n",
|
||
"- Module 01 (Setup): Python environment with NumPy, the foundation for numerical computing\n",
|
||
"\n",
|
||
"**What's Working**: You have a complete development environment with all the tools needed for machine learning!\n",
|
||
"\n",
|
||
"**The Gap**: You can import NumPy, but you need to understand how to build the core data structure that makes ML possible.\n",
|
||
"\n",
|
||
"**This Module's Solution**: Build a complete Tensor class that wraps NumPy arrays with ML-specific operations and memory management.\n",
|
||
"\n",
|
||
"**Connection Map**:\n",
|
||
"```\n",
|
||
"Setup → Tensor → Activations\n",
|
||
"(tools) (data) (nonlinearity)\n",
|
||
"```\n",
|
||
"\n",
|
||
"## Learning Objectives\n",
|
||
"\n",
|
||
"By completing this module, you will:\n",
|
||
"\n",
|
||
"1. **Implement tensor operations** - Build a complete N-dimensional array system with arithmetic, broadcasting, and matrix multiplication\n",
|
||
"2. **Master memory efficiency** - Understand how memory layout (contiguity and strides) affects performance, sometimes as much as the choice of algorithm\n",
|
||
"3. **Create ML-ready APIs** - Design clean interfaces that mirror PyTorch and TensorFlow patterns\n",
|
||
"4. **Enable neural networks** - Build the foundation that supports weights, biases, and data in all ML models\n",
|
||
"\n",
|
||
"## Build → Test → Use\n",
|
||
"\n",
|
||
"1. **Build**: Implement Tensor class with creation, arithmetic, and advanced operations\n",
|
||
"2. **Test**: Validate each component immediately to ensure correctness and performance\n",
|
||
"3. **Use**: Apply tensors to real multi-dimensional data operations that neural networks require"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "68dcb6b0",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"\n",
|
||
"#| default_exp core.tensor\n",
|
||
"\n",
|
||
"#| export\n",
|
||
"import numpy as np\n",
|
||
"import sys\n",
|
||
"from typing import Union, Tuple, Optional, Any\n",
|
||
"import warnings"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "74cad3a4",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"\n",
|
||
"print(\"🔥 TinyTorch Tensor Module\")\n",
|
||
"print(f\"NumPy version: {np.__version__}\")\n",
|
||
"print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
|
||
"print(\"Ready to build tensors!\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "285c53b1",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## Understanding Tensors: Visual Guide\n",
|
||
"\n",
|
||
"### What Are Tensors? A Visual Journey\n",
|
||
"\n",
|
||
"**The Story**: Think of tensors as smart containers that know their shape and can efficiently store numbers for machine learning. Unlike regular Python lists, they hold numbers of a single data type in one block of memory, which is what makes fast vectorized math possible.\n",
|
||
"\n",
|
||
"```\n",
|
||
"Scalar (0D Tensor):   Vector (1D Tensor):   Matrix (2D Tensor):\n",
|
||
"         5                 [1, 2, 3]             ┌ 1 2 3 ┐\n",
|
||
"                                                 │ 4 5 6 │\n",
|
||
"                                                 └ 7 8 9 ┘\n",
|
||
"\n",
|
||
"3D Tensor (RGB Image):      4D Tensor (Batch of Images):\n",
|
||
"┌─────────────┐             ┌─────────────┐ ┌─────────────┐\n",
|
||
"│ Red Channel │             │   Image 1   │ │   Image 2   │  ...\n",
|
||
"│             │             │             │ │             │\n",
|
||
"└─────────────┘             └─────────────┘ └─────────────┘\n",
|
||
"┌─────────────┐\n",
|
||
"│Green Channel│\n",
|
||
"│             │\n",
|
||
"└─────────────┘\n",
|
||
"┌─────────────┐\n",
|
||
"│Blue Channel │\n",
|
||
"│             │\n",
|
||
"└─────────────┘\n",
|
||
"```\n",
|
||
"\n",
|
||
"**What's happening step-by-step**: As we add dimensions, tensors represent more complex data. A single number becomes a list, a list becomes a grid, a grid becomes a volume (like an image with red/green/blue channels), and a volume becomes a collection (like a batch of images for training). Each dimension adds a new way to organize and access the data."
|
||
]
|
||
},
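{
"cell_type": "markdown",
"id": "sketch-rank-shapes",
"metadata": {},
"source": [
"A minimal NumPy sketch of the ranks in the diagram above. The 32×32 image size and the batch of two are arbitrary choices for illustration, and the image is stored channels-first `(channels, height, width)` to match the drawing of stacked Red/Green/Blue planes:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"scalar = np.array(5.0)                   # 0D: a single number\n",
"vector = np.array([1, 2, 3])             # 1D: an ordered list\n",
"matrix = np.arange(1, 10).reshape(3, 3)  # 2D: the 3×3 grid from the diagram\n",
"\n",
"# 3D: one RGB image as three stacked 32×32 channel planes\n",
"channels = [np.zeros((32, 32)) for _ in range(3)]  # red, green, blue\n",
"image = np.stack(channels)               # shape (3, 32, 32)\n",
"\n",
"# 4D: a batch adds one more leading dimension\n",
"batch = np.stack([image, image])         # shape (2, 3, 32, 32)\n",
"\n",
"for name, t in [('scalar', scalar), ('vector', vector), ('matrix', matrix),\n",
"                ('image', image), ('batch', batch)]:\n",
"    print(f'{name:>6}: ndim={t.ndim}, shape={t.shape}')\n",
"```\n",
"\n",
"Each new dimension is one more index needed to pick out a single number: `batch[0, 2, 5, 7]` is the blue value of pixel (5, 7) in the first image."
]
},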
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "840238d6",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"### Memory Layout: Why Performance Matters\n",
|
||
"\n",
|
||
"**The Story**: Imagine your computer's memory as a long street with numbered houses. When your CPU needs data, it doesn't fetch just one house: it loads the whole surrounding block (a cache line, typically 64 bytes) into its cache.\n",
|
||
"\n",
|
||
"```\n",
|
||
"Contiguous Memory (FAST):\n",
|
||
"[1][2][3][4][5][6] ──> Cache-friendly, vectorized operations\n",
|
||
" ↑ ↑ ↑ ↑ ↑ ↑\n",
|
||
" Sequential access pattern\n",
|
||
"\n",
|
||
"Non-contiguous Memory (SLOW):\n",
|
||
"[1]...[2].....[3] ──> Cache misses, scattered access\n",
|
||
" ↑ ↑ ↑\n",
|
||
" Random access pattern\n",
|
||
"```\n",
|
||
"\n",
|
||
"**What's happening step-by-step**: When you access element [1], the CPU loads the cache line that contains it, which here also holds [2] through [6]. The following accesses ([2], [3], [4]...) are then served from the cache with no extra trips to main memory. With scattered data, each access may land on a different cache line and need its own, much slower, fetch from main memory.\n",
|
||
"\n",
|
||
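"**The Performance Impact**: Sequential access spreads the cost of each memory fetch over every element in the cache line, and it lets the CPU use vectorized (SIMD) instructions. For large arrays this can make contiguous loops several times faster than strided or scattered ones; the exact gap depends on element size, stride, and hardware. It's like getting 6 books from the library for the effort of finding just 1."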
|
||
]
|
||
},
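{
"cell_type": "markdown",
"id": "sketch-memory-strides",
"metadata": {},
"source": [
"A short NumPy sketch of contiguous versus strided layout, using NumPy's `strides`, `flags`, `shares_memory`, and `ascontiguousarray`. The 2×3 int64 matrix is an arbitrary example; strides are measured in bytes, so each int64 step is 8:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# A 2×3 row-major (C-order) matrix of 8-byte integers\n",
"a = np.arange(6, dtype=np.int64).reshape(2, 3)\n",
"print(a.strides)                # (24, 8): next row is 24 bytes away, next column 8\n",
"print(a.flags['C_CONTIGUOUS'])  # True: rows sit back to back in one buffer\n",
"\n",
"# Transposing returns a view that walks the same buffer with swapped strides\n",
"t = a.T\n",
"print(t.strides)                # (8, 24)\n",
"print(t.flags['C_CONTIGUOUS'])  # False\n",
"print(np.shares_memory(a, t))   # True: nothing was copied\n",
"\n",
"# ascontiguousarray copies into a fresh row-major buffer when needed\n",
"c = np.ascontiguousarray(t)\n",
"print(c.strides, c.flags['C_CONTIGUOUS'], np.shares_memory(a, c))  # (16, 8) True False\n",
"```\n",
"\n",
"Transposing is cheap because only the strides change, but a row-by-row loop over `t` now jumps 24 bytes per step; `ascontiguousarray` pays for one copy to restore sequential access."
]
},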
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "86cb7d01",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"### Tensor Operations: Broadcasting Magic\n",
|
||
"\n",
|
||
"**The Story**: Broadcasting is like having a smart photocopier that automatically copies data to match different shapes without actually using extra memory. It's NumPy's way of making operations \"just work\" between tensors of different sizes.\n",
|
||
"\n",
|
||
"```\n",
|
||
"Broadcasting Example:\n",
|
||
"  Matrix (2×3)  +  Scalar  =  Result (2×3)\n",
|
||
"   ┌ 1 2 3 ┐        [10]      ┌ 11 12 13 ┐\n",
|
||
"   └ 4 5 6 ┘                  └ 14 15 16 ┘\n",
|
||
"\n",
|
||
"Broadcasting Rules:\n",
|
||
"1. Align shapes from right to left\n",
|
||
"2. Dimensions of size 1 stretch to match\n",
|
||
"3. Missing dimensions assume size 1\n",
|
||
"4. Any other mismatch (sizes differ and neither is 1) raises an error\n",
|
||
"\n",
|
||
"Vector + Matrix Broadcasting:\n",
|
||
"  [1, 2, 3]  +  [[10],   =  [[11, 12, 13],\n",
|
||
"    (1×3)        [20]]       [21, 22, 23]]\n",
|
||
"                 (2×1)          (2×3)\n",
|
||
"```\n",
|
||
"\n",
|
||
"**What's happening step-by-step**: NumPy aligns shapes from right to left, like comparing numbers by their ones place first. When shapes don't match, dimensions of size 1 \"stretch\" to match the larger dimension, but no data is actually copied: the operation behaves as if the data were copied while reading from the original memory locations.\n",
|
||
"\n",
|
||
"**Why this matters for ML**: Without broadcasting, adding a 1000-element bias vector to a 1000×1000 matrix would first require tiling the vector into its own 1000×1000 array (a million extra values). Broadcasting skips that intermediate copy; only the result array is allocated."
|
||
]
|
||
},
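{
"cell_type": "markdown",
"id": "sketch-broadcasting",
"metadata": {},
"source": [
"A NumPy sketch of the rules and examples above. `np.broadcast_to` makes the virtual stretching visible: the stretched axis gets a stride of 0, so every row reads the same three numbers:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"m = np.array([[1, 2, 3],\n",
"              [4, 5, 6]])        # shape (2, 3)\n",
"print(m + 10)                     # scalar stretches to (2, 3): [[11 12 13] [14 15 16]]\n",
"\n",
"row = np.array([[1, 2, 3]])       # shape (1, 3)\n",
"col = np.array([[10], [20]])      # shape (2, 1)\n",
"print(row + col)                  # both stretch to (2, 3): [[11 12 13] [21 22 23]]\n",
"\n",
"# The stretch is virtual: broadcast_to returns a read-only view whose stretched axis has stride 0\n",
"bias = np.arange(3.0)             # shape (3,)\n",
"tiled = np.broadcast_to(bias, (1000, 3))\n",
"print(tiled.strides, np.shares_memory(bias, tiled))  # (0, 8) True\n",
"\n",
"# Trailing sizes 3 and 2 differ and neither is 1, so this is an error\n",
"try:\n",
"    np.ones((2, 3)) + np.ones(2)\n",
"except ValueError as err:\n",
"    print('shape mismatch:', err)\n",
"```"
]
},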
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "37bb2239",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"### Neural Network Data Flow\n",
|
||
"\n",
|
||
"```\n",
|
||
"Batch Processing in Neural Networks:\n",
|
||
"\n",
|
||
"Input Batch (32 images, 28×28 pixels):\n",
|
||
"┌─────────────────────────────────┐\n",
|
||
"│ [Batch=32, Height=28, Width=28] │\n",
|
||
"└─────────────────────────────────┘\n",
|
||
"                ↓ Flatten\n",
|
||
"┌─────────────────────────────────┐\n",
|
||
"│ [Batch=32, Features=784]        │ ← Matrix multiplication ready\n",
|
||
"└─────────────────────────────────┘\n",
|
||
"                ↓ Linear Layer\n",
|
||
"┌─────────────────────────────────┐\n",
|
||
"│ [Batch=32, Hidden=128]          │ ← Hidden layer activations\n",
|
||
"└─────────────────────────────────┘\n",
|
||
"\n",
|
||
"Why batching matters:\n",
|
||
"- Single image: 784 × 128 = 100,352 multiply-adds\n",
|
||
"- Batch of 32: 32 × 100,352 = 3,211,264 multiply-adds, all reusing one 784×128 weight matrix\n",
|
||
"- Hardware utilization: one large matrix product keeps parallel units busier than 32 small ones\n",
|
||
"```"
|
||
]
|
||
},
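{
"cell_type": "markdown",
"id": "sketch-batch-matmul",
"metadata": {},
"source": [
"A NumPy sketch of the flow in this diagram. The random weights and zero bias are hypothetical stand-ins for a trained layer; only the shapes and the batching behavior matter here:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"images = rng.random((32, 28, 28), dtype=np.float32)   # [Batch, Height, Width]\n",
"weights = rng.random((784, 128), dtype=np.float32)    # hypothetical stand-in for trained weights\n",
"bias = np.zeros(128, dtype=np.float32)                # hypothetical stand-in for a trained bias\n",
"\n",
"flat = images.reshape(32, -1)     # Flatten: (32, 28, 28) -> (32, 784)\n",
"hidden = flat @ weights + bias    # Linear: (32, 784) @ (784, 128) -> (32, 128); bias broadcasts\n",
"print(flat.shape, hidden.shape)   # (32, 784) (32, 128)\n",
"\n",
"# One batched matmul gives the same rows as 32 single-image products\n",
"one_by_one = np.stack([img.reshape(-1) @ weights + bias for img in images])\n",
"print(np.allclose(hidden, one_by_one, rtol=1e-4))\n",
"\n",
"# Multiply-adds: 784 * 128 per image, times 32 images\n",
"print(784 * 128, 32 * 784 * 128)  # 100352 3211264\n",
"```\n",
"\n",
"The batch does 32× the arithmetic of one image, but it runs as a single large matrix product over the same weight matrix, which is the kind of work optimized BLAS libraries and GPUs handle well."
]
},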
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "2e97ea75",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## The Mathematical Foundation\n",
|
||
"\n",
|
||
"Before we implement, let's understand the mathematical concepts:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "5a2597fa",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"### Scalars to Tensors: Building Complexity\n",
|
||
"\n",
|
||
"**Scalar (Rank 0)**:\n",
|
||
"- A single number: `5.0` or `temperature`\n",
|
||
"- Shape: `()` (empty tuple)\n",
|
||
"- ML examples: loss values, learning rates\n",
|
||
"\n",
|
||
"**Vector (Rank 1)**:\n",
|
||
"- Ordered list of numbers: `[1, 2, 3]`\n",
|
||
"- Shape: `(3,)` (one dimension)\n",
|
||
"- ML examples: word embeddings, gradients\n",
|
||
"\n",
|
||
"**Matrix (Rank 2)**:\n",
|
||
"- 2D array: `[[1, 2], [3, 4]]`\n",
|
||
"- Shape: `(2, 2)` (rows, columns)\n",
|
||
"- ML examples: weight matrices, grayscale images\n",
|
||
"\n",
|
||
"**Higher-Order Tensors**:\n",
|
||
"- 3D: RGB images `(height, width, channels)` (channels-last; PyTorch uses channels-first `(channels, height, width)`)\n",
|
||
"- 4D: Image batches `(batch, height, width, channels)`\n",
|
||
"- 5D: Video batches `(batch, time, height, width, channels)`"
|
||
]
|
||
},
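{
"cell_type": "markdown",
"id": "sketch-scalar-shapes",
"metadata": {},
"source": [
"A quick NumPy check of the shapes listed above, with arbitrary values. A scalar, a one-element vector, and a 1×1 matrix hold the same number but have different shapes, which starts to matter once operations check shapes:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"loss = np.array(0.25)        # rank 0: shape ()\n",
"one_elem = np.array([0.25])  # rank 1, one element: shape (1,)\n",
"grid = np.array([[0.25]])    # rank 2, one element: shape (1, 1)\n",
"\n",
"for t in (loss, one_elem, grid):\n",
"    print(t.ndim, t.shape, t.size)   # size is 1 each time; only the rank differs\n",
"\n",
"# .item() pulls a plain Python number out of any single-element array\n",
"print(loss.item(), type(loss.item()).__name__)   # 0.25 float\n",
"\n",
"# len(shape) is the rank; the product of shape is the element count\n",
"w = np.zeros((2, 2))\n",
"print(len(w.shape), int(np.prod(w.shape)) == w.size)   # 2 True\n",
"```"
]
},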
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "51dbe323",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"### Why Not Just Use NumPy?\n",
|
||
"\n",
|
||
"While NumPy is excellent, our Tensor class adds ML-specific features:\n",
|
||
"\n",
|
||
"**Future Extensions** (coming in later modules):\n",
|
||
"- **Automatic gradients**: Track operations for backpropagation\n",
|
||
"- **GPU acceleration**: Move computations to graphics cards\n",
|
||
"- **Lazy evaluation**: Build computation graphs for optimization\n",
|
||
"\n",
|
||
"**Educational Value**:\n",
|
||
"- **Understanding**: See how PyTorch/TensorFlow work internally\n",
|
||
"- **Debugging**: Trace operations step by step\n",
|
||
"- **Customization**: Add domain-specific operations"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "076ad694",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"## Implementation Overview\n",
|
||
"\n",
|
||
"Our Tensor class design:\n",
|
||
"\n",
|
||
"```python\n",
|
||
"class Tensor:\n",
|
||
" def __init__(self, data) # Create from any data type\n",
|
||
"\n",
|
||
" # Properties\n",
|
||
" .shape # Dimensions tuple\n",
|
||
" .size # Total element count\n",
|
||
" .dtype # Data type\n",
|
||
" .data # Access underlying NumPy array\n",
|
||
"\n",
|
||
" # Arithmetic Operations\n",
|
||
" def __add__(self, other) # tensor + tensor\n",
|
||
" def __mul__(self, other) # tensor * tensor\n",
|
||
" def __sub__(self, other) # tensor - tensor\n",
|
||
" def __truediv__(self, other) # tensor / tensor\n",
|
||
"\n",
|
||
" # Advanced Operations\n",
|
||
" def matmul(self, other) # Matrix multiplication\n",
|
||
" def sum(self, axis=None) # Sum along axes\n",
|
||
" def reshape(self, *shape) # Change shape\n",
|
||
"```"
|
||
]
|
||
},
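{
"cell_type": "markdown",
"id": "sketch-dtype-memory",
"metadata": {},
"source": [
"The `.size` and `.dtype` properties together determine how much memory a tensor needs (`nbytes = size × itemsize`). A NumPy sketch of why ML code usually downcasts float64 to float32; the arrays are arbitrary examples:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"x64 = np.array([1.0, 2.0, 3.0])   # Python floats default to float64\n",
"x32 = x64.astype(np.float32)       # the ML-style downcast\n",
"print(x64.dtype, x64.nbytes)       # float64 24\n",
"print(x32.dtype, x32.nbytes)       # float32 12\n",
"\n",
"# nbytes == size * itemsize, so a 1000×1000 float32 tensor needs about 4 MB\n",
"big = np.zeros((1000, 1000), dtype=np.float32)\n",
"print(big.nbytes == big.size * big.itemsize, big.nbytes)   # True 4000000\n",
"\n",
"# The default integer dtype is platform dependent (int64 on most 64-bit systems)\n",
"print(np.array([1, 2, 3]).dtype)\n",
"```"
]
},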
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "fc9cadb3",
|
||
"metadata": {
|
||
"lines_to_next_cell": 1,
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "tensor-init",
|
||
"solution": true
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"\n",
|
||
"#| export\n",
|
||
"class Tensor:\n",
|
||
" \"\"\"\n",
|
||
" TinyTorch Tensor: N-dimensional array with ML operations.\n",
|
||
"\n",
|
||
" The fundamental data structure for all TinyTorch operations.\n",
|
||
" Wraps NumPy arrays with ML-specific functionality.\n",
|
||
" \"\"\"\n",
|
||
"\n",
|
||
" def __init__(self, data: Any, dtype: Optional[str] = None, requires_grad: bool = False):\n",
|
||
" \"\"\"\n",
|
||
" Create a new tensor from data.\n",
|
||
"\n",
|
||
" Args:\n",
|
||
" data: Input data (scalar, list, or numpy array)\n",
|
||
"            dtype: Data type ('float32', 'int32', etc.). Defaults to NumPy's inferred dtype, with float64 downcast to float32.\n",
|
||
" requires_grad: Whether this tensor needs gradients for training. Defaults to False.\n",
|
||
"\n",
|
||
" TODO: Implement tensor creation with simple, clear type handling.\n",
|
||
"\n",
|
||
"        APPROACH:\n",
|
||
" 1. Convert input data to numpy array - NumPy handles conversions\n",
|
||
" 2. Apply dtype if specified - common string types like 'float32'\n",
|
||
" 3. Set default float32 for float64 arrays - ML convention for efficiency\n",
|
||
" 4. Store the result in self._data - internal storage for numpy array\n",
|
||
" 5. Initialize gradient tracking - prepares for automatic differentiation\n",
|
||
"\n",
|
||
" EXAMPLE:\n",
|
||
" >>> Tensor(5)\n",
|
||
"        # Creates: np.array(5) with NumPy's default integer dtype (typically int64)\n",
|
||
" >>> Tensor([1.0, 2.0, 3.0])\n",
|
||
" # Creates: np.array([1.0, 2.0, 3.0], dtype='float32')\n",
|
||
" >>> Tensor([1, 2, 3], dtype='float32')\n",
|
||
" # Creates: np.array([1, 2, 3], dtype='float32')\n",
|
||
"\n",
|
||
" PRODUCTION CONTEXT:\n",
|
||
"        PyTorch supports many more dtypes (e.g. float16, bfloat16, complex64) with stricter validation.\n",
|
||
" Our version teaches the core concept that transfers directly.\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" # Convert input to numpy array - let NumPy handle most conversions\n",
|
||
" if isinstance(data, Tensor):\n",
|
||
" # Input is another Tensor - copy data efficiently\n",
|
||
" self._data = data.data.copy()\n",
|
||
" else:\n",
|
||
" # Convert to numpy array\n",
|
||
" self._data = np.array(data)\n",
|
||
"\n",
|
||
" # Apply dtype if specified\n",
|
||
" if dtype is not None:\n",
|
||
" self._data = self._data.astype(dtype)\n",
|
||
" elif self._data.dtype == np.float64:\n",
|
||
" # ML convention: prefer float32 for memory and GPU efficiency\n",
|
||
" self._data = self._data.astype(np.float32)\n",
|
||
"\n",
|
||
" # Initialize gradient tracking attributes (used in Module 9 - Autograd)\n",
|
||
" self.requires_grad = requires_grad\n",
|
||
" self.grad = None\n",
|
||
" self._grad_fn = None\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" @property\n",
|
||
" def data(self) -> np.ndarray:\n",
|
||
" \"\"\"\n",
|
||
" Access underlying numpy array.\n",
|
||
"\n",
|
||
" TODO: Return the stored numpy array.\n",
|
||
"\n",
|
||
"        APPROACH:\n",
|
||
" 1. Access the internal _data attribute\n",
|
||
" 2. Return the numpy array directly - enables NumPy integration\n",
|
||
" 3. This provides access to underlying data for visualization/analysis\n",
|
||
"\n",
|
||
" PRODUCTION CONNECTION:\n",
|
||
" - PyTorch: tensor.numpy() converts to NumPy for scientific computing\n",
|
||
" - TensorFlow: tensor.numpy() enables integration with matplotlib/scipy\n",
|
||
" - Production use: Data scientists need raw arrays for debugging/visualization\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" return self._data\n",
|
||
" ### END SOLUTION\n",
|
||
" \n",
|
||
" @data.setter\n",
|
||
" def data(self, value: Union[np.ndarray, 'Tensor']) -> None:\n",
|
||
" \"\"\"Set the underlying data of the tensor.\"\"\"\n",
|
||
" if isinstance(value, Tensor):\n",
|
||
" self._data = value._data.copy()\n",
|
||
" else:\n",
|
||
" self._data = np.array(value)\n",
|
||
"\n",
|
||
" @property\n",
|
||
" def shape(self) -> Tuple[int, ...]:\n",
|
||
" \"\"\"\n",
|
||
" Get tensor shape.\n",
|
||
"\n",
|
||
" TODO: Return the shape of the stored numpy array.\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Access the _data attribute (the NumPy array)\n",
|
||
" 2. Get the shape property from the NumPy array\n",
|
||
" 3. Return the shape tuple directly\n",
|
||
"\n",
|
||
" PRODUCTION CONNECTION:\n",
|
||
" - Neural networks: Layer compatibility requires matching shapes\n",
|
||
" - Computer vision: Image shape (height, width, channels) determines architecture\n",
|
||
"        - Debugging: Shape mismatches are among the most common errors in ML code\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" return self._data.shape\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" @property\n",
|
||
" def size(self) -> int:\n",
|
||
" \"\"\"\n",
|
||
" Get total number of elements.\n",
|
||
"\n",
|
||
" TODO: Return the total number of elements in the tensor.\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Access the _data attribute (the NumPy array)\n",
|
||
" 2. Get the size property from the NumPy array\n",
|
||
" 3. Return the total element count as an integer\n",
|
||
"\n",
|
||
" PRODUCTION CONNECTION:\n",
|
||
" - Memory planning: Calculate RAM requirements for large tensors\n",
|
||
" - Model architecture: Determine parameter counts for layers\n",
|
||
" - Performance: Size affects computation time and vectorization efficiency\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" return self._data.size\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" @property\n",
|
||
" def dtype(self) -> np.dtype:\n",
|
||
" \"\"\"\n",
|
||
" Get data type as numpy dtype.\n",
|
||
"\n",
|
||
" TODO: Return the data type of the stored numpy array.\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Access the _data attribute\n",
|
||
" 2. Get the dtype property\n",
|
||
" 3. Return the NumPy dtype object\n",
|
||
"\n",
|
||
" PRODUCTION CONNECTION:\n",
|
||
"        - Precision vs speed: float32 is usually faster and half the size; float64 is more precise\n",
|
||
" - Memory optimization: int8 uses 1/4 memory of int32\n",
|
||
" - GPU compatibility: Some operations only work with specific types\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" return self._data.dtype\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" @property\n",
|
||
" def strides(self) -> Tuple[int, ...]:\n",
|
||
" \"\"\"\n",
|
||
" Get memory stride pattern of the tensor.\n",
|
||
" \n",
|
||
" Returns:\n",
|
||
" Tuple of byte strides for each dimension\n",
|
||
" \n",
|
||
" PRODUCTION CONNECTION:\n",
|
||
" - Memory layout analysis: Understanding cache efficiency\n",
|
||
"        - Performance debugging: Non-standard strides reveal a view (e.g. a transpose or slice), not a fresh contiguous copy\n",
|
||
" - Advanced operations: Enables efficient transpose and reshape operations\n",
|
||
" \"\"\"\n",
|
||
" return self._data.strides\n",
|
||
" \n",
|
||
" @property\n",
|
||
" def is_contiguous(self) -> bool:\n",
|
||
" \"\"\"\n",
|
||
" Check if tensor data is stored in contiguous memory.\n",
|
||
" \n",
|
||
" Returns:\n",
|
||
" True if data is contiguous in C-order (row-major)\n",
|
||
" \n",
|
||
" PRODUCTION CONNECTION:\n",
|
||
" - Performance critical: Contiguous data enables vectorization\n",
|
||
"        - Memory efficiency: Contiguous loops can be several times faster than strided ones, depending on hardware\n",
|
||
" - GPU transfers: Contiguous data transfers more efficiently\n",
|
||
" \"\"\"\n",
|
||
" return self._data.flags['C_CONTIGUOUS']\n",
|
||
"\n",
|
||
" def __repr__(self) -> str:\n",
|
||
" \"\"\"\n",
|
||
" String representation with size limits for readability.\n",
|
||
"\n",
|
||
" TODO: Create a clear string representation of the tensor.\n",
|
||
"\n",
|
||
"        APPROACH:\n",
|
||
" 1. Check tensor size - if large, show shape/dtype only\n",
|
||
" 2. For small tensors, convert numpy array to list using .tolist()\n",
|
||
" 3. Format appropriately and return string\n",
|
||
"\n",
|
||
" EXAMPLE:\n",
|
||
"        Tensor([1, 2, 3]) → \"Tensor([1, 2, 3], shape=(3,), dtype=int64)\"  (integer dtype is platform dependent)\n",
|
||
" Large tensor → \"Tensor(shape=(1000, 1000), dtype=float32)\"\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" if self.size > 20:\n",
|
||
" # Large tensors: show shape and dtype only for readability\n",
|
||
" return f\"Tensor(shape={self.shape}, dtype={self.dtype})\"\n",
|
||
" else:\n",
|
||
" # Small tensors: show data, shape, and dtype\n",
|
||
" return f\"Tensor({self._data.tolist()}, shape={self.shape}, dtype={self.dtype})\"\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def item(self) -> Union[int, float]:\n",
|
||
" \"\"\"Extract a scalar value from a single-element tensor.\"\"\"\n",
|
||
" if self._data.size != 1:\n",
|
||
" raise ValueError(f\"item() can only be called on tensors with exactly one element, got {self._data.size} elements\")\n",
|
||
" return self._data.item()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "91b993b2",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "tensor-arithmetic",
|
||
"solution": true
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
" def add(self, other: 'Tensor') -> 'Tensor':\n",
|
||
" \"\"\"\n",
|
||
" Add two tensors element-wise.\n",
|
||
"\n",
|
||
" TODO: Implement tensor addition.\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Extract numpy arrays from both tensors\n",
|
||
" 2. Use NumPy's + operator for element-wise addition\n",
|
||
" 3. Create new Tensor object with result\n",
|
||
" 4. Return the new tensor\n",
|
||
"\n",
|
||
" PRODUCTION CONNECTION:\n",
|
||
" - Neural networks: Adding bias terms to linear layer outputs\n",
|
||
"        - Residual connections: Skip connections in ResNet architectures\n",
|
||
" - Gradient updates: Adding computed gradients to parameters\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" result_data = self._data + other._data\n",
|
||
" result = Tensor(result_data)\n",
|
||
" \n",
|
||
" # TODO: Gradient tracking will be added in Module 9 (Autograd)\n",
|
||
" # This enables automatic differentiation for neural network training\n",
|
||
" # For now, we focus on the core tensor operation\n",
|
||
" \n",
|
||
" return result\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def multiply(self, other: 'Tensor') -> 'Tensor':\n",
|
||
" \"\"\"\n",
|
||
" Multiply two tensors element-wise.\n",
|
||
"\n",
|
||
" TODO: Implement tensor multiplication.\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Extract numpy arrays from both tensors\n",
|
||
" 2. Use NumPy's * operator for element-wise multiplication\n",
|
||
" 3. Create new Tensor object with result\n",
|
||
" 4. Return the new tensor\n",
|
||
"\n",
|
||
" PRODUCTION CONNECTION:\n",
|
||
" - Activation functions: Element-wise operations like ReLU masking\n",
|
||
" - Attention mechanisms: Element-wise scaling in transformer models\n",
|
||
" - Feature scaling: Multiplying features by learned scaling factors\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" result_data = self._data * other._data\n",
|
||
" result = Tensor(result_data)\n",
|
||
" \n",
|
||
" # TODO: Gradient tracking will be added in Module 9 (Autograd)\n",
|
||
" # This enables automatic differentiation for neural network training\n",
|
||
" # For now, we focus on the core tensor operation\n",
|
||
" \n",
|
||
" return result\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def __add__(self, other: Union['Tensor', int, float]) -> 'Tensor':\n",
|
||
" \"\"\"\n",
|
||
" Addition operator: tensor + other\n",
|
||
"\n",
|
||
" TODO: Implement + operator for tensors.\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Check if other is a Tensor object\n",
|
||
" 2. If Tensor, call the add() method directly\n",
|
||
" 3. If scalar, convert to Tensor then call add()\n",
|
||
" 4. Return the result from add() method\n",
|
||
"\n",
|
||
" PRODUCTION CONNECTION:\n",
|
||
" - Natural syntax: tensor + scalar enables intuitive code\n",
|
||
" - Broadcasting: Adding scalars to tensors is common in ML\n",
|
||
" - API design: Clean interfaces reduce cognitive load for researchers\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" if isinstance(other, Tensor):\n",
|
||
" return self.add(other)\n",
|
||
" else:\n",
|
||
" return self.add(Tensor(other))\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def __mul__(self, other: Union['Tensor', int, float]) -> 'Tensor':\n",
|
||
" \"\"\"\n",
|
||
" Multiplication operator: tensor * other\n",
|
||
"\n",
|
||
" TODO: Implement * operator for tensors.\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Check if other is a Tensor object\n",
|
||
" 2. If Tensor, call the multiply() method directly\n",
|
||
" 3. If scalar, convert to Tensor then call multiply()\n",
|
||
" 4. Return the result from multiply() method\n",
|
||
"\n",
|
||
" PRODUCTION CONNECTION:\n",
|
||
"        - Scaling gradients: gradient * learning_rate in parameter updates\n",
|
||
" - Masking: tensor * mask for attention mechanisms\n",
|
||
" - Regularization: tensor * dropout_mask during training\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" if isinstance(other, Tensor):\n",
|
||
" return self.multiply(other)\n",
|
||
" else:\n",
|
||
" return self.multiply(Tensor(other))\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def __sub__(self, other: Union['Tensor', int, float]) -> 'Tensor':\n",
|
||
" \"\"\"\n",
|
||
" Subtraction operator: tensor - other\n",
|
||
"\n",
|
||
" TODO: Implement - operator for tensors.\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Check if other is a Tensor object\n",
|
||
" 2. If Tensor, subtract other._data from self._data\n",
|
||
" 3. If scalar, subtract scalar directly from self._data\n",
|
||
" 4. Create new Tensor with result and return\n",
|
||
"\n",
|
||
" PRODUCTION CONNECTION:\n",
|
||
"        - Gradient descent updates: parameter - learning_rate * gradient\n",
|
||
" - Error calculation: predicted - actual for loss computation\n",
|
||
" - Centering data: tensor - mean for zero-centered inputs\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" if isinstance(other, Tensor):\n",
|
||
" result = self._data - other._data\n",
|
||
" else:\n",
|
||
" result = self._data - other\n",
|
||
" return Tensor(result)\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def __truediv__(self, other: Union['Tensor', int, float]) -> 'Tensor':\n",
|
||
" \"\"\"\n",
|
||
" Division operator: tensor / other\n",
|
||
"\n",
|
||
" TODO: Implement / operator for tensors.\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Check if other is a Tensor object\n",
|
||
" 2. If Tensor, divide self._data by other._data\n",
|
||
" 3. If scalar, divide self._data by scalar directly\n",
|
||
" 4. Create new Tensor with result and return\n",
|
||
"\n",
|
||
" PRODUCTION CONNECTION:\n",
|
||
" - Normalization: tensor / std_deviation for standard scaling\n",
|
||
"        - Learning rate decay: learning_rate / decay_factor over time\n",
|
||
" - Probability computation: counts / total_counts for frequencies\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" if isinstance(other, Tensor):\n",
|
||
" result = self._data / other._data\n",
|
||
" else:\n",
|
||
" result = self._data / other\n",
|
||
" return Tensor(result)\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def mean(self) -> 'Tensor':\n",
|
||
" \"\"\"Computes the mean of the tensor's elements.\"\"\"\n",
|
||
" return Tensor(np.mean(self.data))\n",
|
||
" \n",
|
||
" def sum(self, axis=None, keepdims=False) -> 'Tensor':\n",
|
||
" \"\"\"\n",
|
||
" Sum tensor elements along specified axes.\n",
|
||
" \n",
|
||
" Args:\n",
|
||
" axis: Axis or axes to sum over. If None, sum all elements.\n",
|
||
" keepdims: Whether to keep dimensions of size 1 in output.\n",
|
||
" \n",
|
||
" Returns:\n",
|
||
" New tensor with summed values.\n",
|
||
" \"\"\"\n",
|
||
" result_data = np.sum(self._data, axis=axis, keepdims=keepdims)\n",
|
||
" result = Tensor(result_data)\n",
|
||
" \n",
|
||
" if self.requires_grad:\n",
|
||
" result.requires_grad = True\n",
|
||
" \n",
|
||
" def grad_fn(grad):\n",
|
||
" # Sum gradient: broadcast gradient back to original shape\n",
|
||
" grad_data = grad.data\n",
|
||
" if axis is None:\n",
|
||
" # Sum over all axes - gradient is broadcast to full shape\n",
|
||
" grad_data = np.full(self.shape, grad_data)\n",
|
||
" else:\n",
|
||
"                    # Sum over specific axes - re-insert the reduced dimensions\n",
|
||
"                    axis_tuple = axis if isinstance(axis, tuple) else (axis,)\n",
|
||
"                    # Normalize negative axes before sorting so each insertion index is valid\n",
|
||
"                    axis_tuple = sorted(ax % len(self.shape) for ax in axis_tuple)\n",
|
||
"                    \n",
|
||
"                    # With keepdims=True the reduced axes are already present with size 1\n",
|
||
"                    if not keepdims:\n",
|
||
"                        for ax in axis_tuple:\n",
|
||
"                            grad_data = np.expand_dims(grad_data, axis=ax)\n",
|
||
" \n",
|
||
" # Broadcast to original shape\n",
|
||
" grad_data = np.broadcast_to(grad_data, self.shape)\n",
|
||
" \n",
|
||
" self.backward(Tensor(grad_data))\n",
|
||
" \n",
|
||
" result._grad_fn = grad_fn\n",
|
||
" \n",
|
||
" return result"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "5c4b5e57",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "tensor-matmul",
|
||
"solution": true
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
" def matmul(self, other: 'Tensor') -> 'Tensor':\n",
|
||
" \"\"\"\n",
|
||
" Matrix multiplication using NumPy's optimized implementation.\n",
|
||
"\n",
|
||
" TODO: Implement matrix multiplication.\n",
|
||
"\n",
|
||
" APPROACH:\n",
|
||
" 1. Extract numpy arrays from both tensors\n",
|
||
" 2. Check tensor shapes for compatibility\n",
|
||
" 3. Use NumPy's optimized dot product\n",
|
||
" 4. Create new Tensor object with the result\n",
|
||
" 5. Return the new tensor\n",
|
||
" \"\"\"\n",
|
||
" ### BEGIN SOLUTION\n",
|
||
" a_data = self._data\n",
|
||
" b_data = other._data\n",
|
||
"\n",
|
||
" # Validate tensor shapes\n",
|
||
" if len(a_data.shape) != 2 or len(b_data.shape) != 2:\n",
|
||
" raise ValueError(\"matmul requires 2D tensors\")\n",
|
||
"\n",
|
||
" m, k = a_data.shape\n",
|
||
" k2, n = b_data.shape\n",
|
||
"\n",
|
||
" if k != k2:\n",
|
||
" raise ValueError(f\"Inner dimensions must match: {k} != {k2}\")\n",
|
||
"\n",
|
||
" # Use NumPy's optimized implementation\n",
|
||
" result_data = np.dot(a_data, b_data)\n",
|
||
" return Tensor(result_data)\n",
|
||
" ### END SOLUTION\n",
|
||
"\n",
|
||
" def __matmul__(self, other: 'Tensor') -> 'Tensor':\n",
|
||
" \"\"\"\n",
|
||
" Matrix multiplication operator: tensor @ other\n",
|
||
"\n",
|
||
" Enables the @ operator for matrix multiplication, providing\n",
|
||
" clean syntax for neural network operations.\n",
|
||
" \"\"\"\n",
|
||
" return self.matmul(other)\n",
|
||
"\n",
|
||
" def backward(self, gradient=None):\n",
|
||
" \"\"\"\n",
|
||
" Compute gradients for this tensor and propagate backward.\n",
|
||
"\n",
|
||
" Basic backward pass - accumulates gradients and propagates to dependencies.\n",
|
||
" This enables simple gradient computation for basic operations.\n",
|
||
"\n",
|
||
" Args:\n",
|
||
"            gradient: Gradient from upstream. If None, uses ones with this tensor's shape (the usual seed for a scalar loss)\n",
|
||
" \"\"\"\n",
|
||
" if not self.requires_grad:\n",
|
||
" return\n",
|
||
"\n",
|
||
" if gradient is None:\n",
|
||
" # Scalar case - gradient is 1\n",
|
||
" gradient = Tensor(np.ones_like(self._data))\n",
|
||
"\n",
|
||
" # Accumulate gradients\n",
|
||
" if self.grad is None:\n",
|
||
" self.grad = gradient\n",
|
||
" else:\n",
|
||
" self.grad = self.grad + gradient\n",
|
||
"\n",
|
||
" # Propagate to dependencies via grad_fn\n",
|
||
" if self._grad_fn is not None:\n",
|
||
" self._grad_fn(gradient)\n",
|
||
" \n",
|
||
" def zero_grad(self):\n",
|
||
" \"\"\"Reset gradients to None. Used by optimizers before backward pass.\"\"\"\n",
|
||
" self.grad = None"
|
||
]
|
||
},
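The sketch below shows how `backward`, gradient accumulation and a `grad_fn` closure fit together. `MiniTensor` is a hypothetical stripped-down class written only for this illustration:

- It stores gradients as plain NumPy arrays, whereas the module's Tensor stores them as Tensors.
- It implements only `backward`, `reshape` and `zero_grad`.
- Those three methods follow the same logic as the ones in this module.

```python
import numpy as np

class MiniTensor:
    """Hypothetical stripped-down tensor mirroring backward()/reshape()/zero_grad()."""

    def __init__(self, data, requires_grad=False):
        self._data = np.asarray(data, dtype=np.float64)
        self.requires_grad = requires_grad
        self.grad = None       # accumulated gradient (a NumPy array in this sketch)
        self._grad_fn = None   # closure that routes a gradient to this tensor's input

    def backward(self, gradient=None):
        if not self.requires_grad:
            return
        if gradient is None:
            gradient = np.ones_like(self._data)  # seed with ones, like the module's backward()
        # Accumulate instead of overwriting
        self.grad = gradient if self.grad is None else self.grad + gradient
        if self._grad_fn is not None:
            self._grad_fn(gradient)

    def reshape(self, *shape):
        out = MiniTensor(self._data.reshape(*shape), self.requires_grad)
        if self.requires_grad:
            # Send the gradient back in the input's original shape
            out._grad_fn = lambda g: self.backward(g.reshape(self._data.shape))
        return out

    def zero_grad(self):
        self.grad = None

x = MiniTensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = x.reshape(4)
y.backward()
print(x.grad.shape)   # (2, 2)
y.backward()          # second pass without zero_grad(): gradients add up
print(x.grad)         # every entry is now 2.0
x.zero_grad()
print(x.grad)         # None
```

`backward` adds to any existing `grad`. A second backward pass without `zero_grad()` therefore doubles the stored gradient, which is why optimizers reset gradients before each backward pass.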
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "a8f6f7d5",
|
||
"metadata": {
|
||
"nbgrader": {
|
||
"grade": false,
|
||
"grade_id": "tensor-reshape",
|
||
"solution": true
|
||
}
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
" def reshape(self, *shape: int) -> 'Tensor':\n",
|
||
" \"\"\"\n",
|
||
" Return a new tensor with the same data but different shape.\n",
|
||
"\n",
|
||
" Args:\n",
|
||
" *shape: New shape dimensions. Use -1 for automatic sizing.\n",
|
||
"\n",
|
||
" Returns:\n",
|
||
" New Tensor with reshaped data\n",
|
||
" \n",
|
||
" Note:\n",
|
||
" This returns a view when possible (no copying), or a copy when necessary.\n",
|
||
" Use .contiguous() after reshape if you need guaranteed contiguous memory.\n",
|
||
" \"\"\"\n",
|
||
" reshaped_data = self._data.reshape(*shape)\n",
|
||
" result = Tensor(reshaped_data)\n",
|
||
" \n",
|
||
" # Preserve gradient tracking\n",
|
||
" if self.requires_grad:\n",
|
||
" result.requires_grad = True\n",
|
||
" \n",
|
||
" def grad_fn(grad):\n",
|
||
" # Reshape gradient back to original shape\n",
|
||
" orig_grad = grad.reshape(*self.shape)\n",
|
||
" self.backward(orig_grad)\n",
|
||
" \n",
|
||
" result._grad_fn = grad_fn\n",
|
||
" \n",
|
||
" return result\n",
|
||
" \n",
|
||
" def view(self, *shape: int) -> 'Tensor':\n",
|
||
" \"\"\"\n",
|
||
" Return a view of the tensor with a new shape. Alias for reshape.\n",
|
||
" \n",
|
||
" Args:\n",
|
||
" *shape: New shape dimensions. Use -1 for automatic sizing.\n",
|
||
" \n",
|
||
" Returns:\n",
|
||
" New Tensor sharing the same data (view when possible)\n",
|
||
" \n",
|
||
" PRODUCTION CONNECTION:\n",
|
" - PyTorch compatibility: mirrors torch.Tensor.view(), though PyTorch raises an error rather than copying when a view is impossible\n",
|
||
" - Memory efficiency: Views avoid copying data when possible\n",
|
||
" - Performance critical: Views enable efficient transformations\n",
|
||
" \"\"\"\n",
|
||
" return self.reshape(*shape)\n",
|
||
" \n",
|
||
" def clone(self) -> 'Tensor':\n",
|
||
" \"\"\"\n",
|
||
" Create a deep copy of the tensor.\n",
|
||
" \n",
|
||
" Returns:\n",
|
||
" New Tensor with copied data\n",
|
||
" \n",
|
||
" PRODUCTION CONNECTION:\n",
|
||
" - Memory isolation: Ensures modifications don't affect original\n",
|
" - Gradient tracking: keeps requires_grad but has no link to the original's graph (unlike PyTorch's clone(), which autograd records)\n",
|
||
" - Safe operations: Use when you need guaranteed data independence\n",
|
||
" \"\"\"\n",
|
||
" cloned_data = self._data.copy()\n",
|
||
" result = Tensor(cloned_data)\n",
|
||
" \n",
|
||
" # Clone preserves gradient requirements but starts fresh grad tracking\n",
|
||
" result.requires_grad = self.requires_grad\n",
|
||
" # Note: grad and grad_fn are NOT copied - clone starts fresh\n",
|
||
" \n",
|
||
" return result\n",
|
||
" \n",
|
||
" def contiguous(self) -> 'Tensor':\n",
|
||
" \"\"\"\n",
|
||
" Return a contiguous tensor with the same data.\n",
|
||
" \n",
|
||
" Returns:\n",
|
||
" Tensor with contiguous memory layout (may be a copy)\n",
|
||
" \n",
|
||
" PRODUCTION CONNECTION:\n",
|
||
" - Performance optimization: Ensures optimal memory layout\n",
|
||
" - GPU operations: Many CUDA operations require contiguous data\n",
|
||
" - Cache efficiency: Contiguous data maximizes CPU cache utilization\n",
|
||
" \"\"\"\n",
|
||
" if self.is_contiguous:\n",
|
||
" return self # Already contiguous, return self\n",
|
||
" \n",
|
||
" # Make contiguous copy\n",
|
||
" contiguous_data = np.ascontiguousarray(self._data)\n",
|
||
" result = Tensor(contiguous_data)\n",
|
||
" \n",
|
||
" # Preserve gradient tracking\n",
|
||
" result.requires_grad = self.requires_grad\n",
|
||
" if self.requires_grad:\n",
|
||
" def grad_fn(grad):\n",
|
||
" self.backward(grad)\n",
|
||
" result._grad_fn = grad_fn\n",
|
||
" \n",
|
||
" return result\n",
|
||
"\n",
|
||
" def numpy(self) -> np.ndarray:\n",
|
||
" \"\"\"\n",
|
||
" Convert tensor to NumPy array.\n",
|
||
" \n",
|
||
" This is the PyTorch-inspired method for tensor-to-numpy conversion.\n",
|
" Provides a clean interface for NumPy interoperability. The underlying array is returned without copying, so in-place edits to it also change this tensor.\n",
|
||
" \"\"\"\n",
|
||
" return self._data\n",
|
||
" \n",
|
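" def __array__(self, dtype=None, copy=None) -> np.ndarray:\n",
" \"\"\"Enable np.array(tensor) and np.allclose(tensor, array).\n",
" \n",
" Accepts the `copy` keyword that NumPy 2.x passes to __array__.\n",
" \"\"\"\n",
" if dtype is not None:\n",
" return self._data.astype(dtype) # astype returns a new array\n",
" if copy:\n",
" return self._data.copy()\n",
" return self._data\n",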
|
||
" \n",
|
||
" def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):\n",
|
||
" \"\"\"Enable NumPy universal functions with Tensor objects.\"\"\"\n",
|
||
" # Convert Tensor inputs to NumPy arrays\n",
|
||
" args = []\n",
|
||
" for input_ in inputs:\n",
|
||
" if isinstance(input_, Tensor):\n",
|
||
" args.append(input_._data)\n",
|
||
" else:\n",
|
||
" args.append(input_)\n",
|
||
" \n",
|
||
" # Call the ufunc on NumPy arrays\n",
|
||
" outputs = getattr(ufunc, method)(*args, **kwargs)\n",
|
||
" \n",
|
||
" # If method returns NotImplemented, let NumPy handle it\n",
|
||
" if outputs is NotImplemented:\n",
|
||
" return NotImplemented\n",
|
||
" \n",
|
||
" # Wrap result back in Tensor if appropriate\n",
|
||
" if method == '__call__':\n",
|
||
" if isinstance(outputs, np.ndarray):\n",
|
||
" return Tensor(outputs)\n",
|
||
" elif isinstance(outputs, tuple):\n",
|
||
" return tuple(Tensor(output) if isinstance(output, np.ndarray) else output \n",
|
||
" for output in outputs)\n",
|
||
" \n",
|
||
" return outputs\n",
|
||
"\n",
|
||
"\n"
|
||
]
|
||
},
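The view/copy distinction behind `reshape`, `view`, `clone` and `contiguous` comes from NumPy. This minimal sketch uses plain NumPy arrays only. It therefore does not depend on whether `Tensor.__init__` keeps or copies the array it receives. The stride values assume 8-byte `int64` elements.

```python
import numpy as np

a = np.arange(6, dtype=np.int64).reshape(2, 3)  # C-contiguous, strides (24, 8)
flat = a.reshape(6)                              # contiguous input -> NumPy returns a view
print(np.shares_memory(a, flat))                 # True

t = a.T                                          # transpose swaps strides -> (8, 24)
print(t.strides, t.flags["C_CONTIGUOUS"])        # (8, 24) False
t_flat = t.reshape(6)                            # no single stride fits -> NumPy copies
print(np.shares_memory(a, t_flat))               # False

c = np.ascontiguousarray(t)                      # the call contiguous() relies on
print(c.flags["C_CONTIGUOUS"], c.strides)        # True (16, 8)

clone = a.copy()                                 # the call clone() relies on
clone[0, 0] = 99                                 # original is untouched
flat[0] = -1                                     # writing through the view changes a
print(a[0, 0])                                   # -1
```

The transposed case is why `contiguous()` exists. After a transpose, row-major traversal no longer walks memory in order, so reshaping it forces a copy.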
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "0ce24a6f",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 2
|
||
},
|
||
"source": [
|
||
"## Testing Your Tensor Implementation\n",
|
||
"\n",
|
||
"Let's validate each component immediately to ensure everything works correctly:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "37e009e2",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🧪 Unit Test: Tensor Creation\n",
|
||
"\n",
|
||
"Let's test your tensor creation implementation right away! This gives you immediate feedback on whether your `__init__` method works correctly."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "eff5b3e5",
|
||
"metadata": {
|
||
"lines_to_next_cell": 2
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"\n",
|
||
"def test_unit_tensor_creation():\n",
|
||
" \"\"\"Test tensor creation with all data types and shapes.\"\"\"\n",
|
||
" print(\"🔬 Unit Test: Tensor Creation...\")\n",
|
||
" \n",
|
||
" try:\n",
|
||
" # Test scalar\n",
|
||
" scalar = Tensor(5.0)\n",
|
||
" assert hasattr(scalar, '_data'), \"Tensor should have _data attribute\"\n",
|
||
" assert scalar._data.shape == (), f\"Scalar should have shape (), got {scalar._data.shape}\"\n",
|
||
" print(\"✅ Scalar creation works\")\n",
|
||
"\n",
|
||
" # Test vector\n",
|
||
" vector = Tensor([1, 2, 3])\n",
|
||
" assert vector._data.shape == (3,), f\"Vector should have shape (3,), got {vector._data.shape}\"\n",
|
||
" print(\"✅ Vector creation works\")\n",
|
||
"\n",
|
||
" # Test matrix\n",
|
||
" matrix = Tensor([[1, 2], [3, 4]])\n",
|
||
" assert matrix._data.shape == (2, 2), f\"Matrix should have shape (2, 2), got {matrix._data.shape}\"\n",
|
||
" print(\"✅ Matrix creation works\")\n",
|
||
"\n",
|
||
" print(\"📈 Progress: Tensor Creation ✓\")\n",
|
||
"\n",
|
||
" except Exception as e:\n",
|
||
" print(f\"❌ Tensor creation test failed: {e}\")\n",
|
||
" raise\n",
|
||
"\n",
|
||
" print(\"🎯 Tensor creation behavior:\")\n",
|
||
" print(\" Converts data to NumPy arrays\")\n",
|
||
" print(\" Preserves shape and data type\")\n",
|
||
" print(\" Stores in _data attribute\")\n",
|
||
"\n",
|
||
"test_unit_tensor_creation()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "0abae867",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🧪 Unit Test: Tensor Properties\n",
|
||
"\n",
|
||
"Now let's test that your tensor properties work correctly. This tests the @property methods you implemented."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "05c92150",
|
||
"metadata": {
|
||
"lines_to_next_cell": 2
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"\n",
|
||
"def test_unit_tensor_properties():\n",
|
||
" \"\"\"Test tensor properties (shape, size, dtype, data access).\"\"\"\n",
|
||
" print(\"🔬 Unit Test: Tensor Properties...\")\n",
|
||
" \n",
|
||
" try:\n",
|
||
" # Test with a simple matrix\n",
|
||
" tensor = Tensor([[1, 2, 3], [4, 5, 6]])\n",
|
||
"\n",
|
||
" # Test shape property\n",
|
||
" assert tensor.shape == (2, 3), f\"Shape should be (2, 3), got {tensor.shape}\"\n",
|
||
" print(\"✅ Shape property works\")\n",
|
||
"\n",
|
||
" # Test size property\n",
|
||
" assert tensor.size == 6, f\"Size should be 6, got {tensor.size}\"\n",
|
||
" print(\"✅ Size property works\")\n",
|
||
"\n",
|
||
" # Test data property\n",
|
||
" assert np.array_equal(tensor.data, np.array([[1, 2, 3], [4, 5, 6]])), \"Data property should return numpy array\"\n",
|
||
" print(\"✅ Data property works\")\n",
|
||
"\n",
|
||
" # Test dtype property\n",
|
||
" assert tensor.dtype in [np.int32, np.int64], f\"Dtype should be int32 or int64, got {tensor.dtype}\"\n",
|
||
" print(\"✅ Dtype property works\")\n",
|
||
"\n",
|
||
" print(\"📈 Progress: Tensor Properties ✓\")\n",
|
||
"\n",
|
||
" except Exception as e:\n",
|
||
" print(f\"❌ Tensor properties test failed: {e}\")\n",
|
||
" raise\n",
|
||
"\n",
|
||
" print(\"🎯 Tensor properties behavior:\")\n",
|
||
" print(\" shape: Returns tuple of dimensions\")\n",
|
||
" print(\" size: Returns total number of elements\")\n",
|
||
" print(\" data: Returns underlying NumPy array\")\n",
|
||
" print(\" dtype: Returns NumPy data type\")\n",
|
||
"\n",
|
||
"test_unit_tensor_properties()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "94247bc9",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🧪 Unit Test: Tensor Arithmetic\n",
|
||
"\n",
|
"Let's test your tensor arithmetic operations. This tests the `__add__`, `__mul__`, `__sub__`, and `__truediv__` methods."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "2704d05a",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"\n",
|
||
"def test_unit_tensor_arithmetic():\n",
|
||
" \"\"\"Test tensor arithmetic operations.\"\"\"\n",
|
||
" print(\"🔬 Unit Test: Tensor Arithmetic...\")\n",
|
||
" \n",
|
||
" try:\n",
|
||
" # Test addition\n",
|
||
" a = Tensor([1, 2, 3])\n",
|
||
" b = Tensor([4, 5, 6])\n",
|
||
" result = a + b\n",
|
||
" expected = np.array([5, 7, 9])\n",
|
||
" assert np.array_equal(result.data, expected), f\"Addition failed: expected {expected}, got {result.data}\"\n",
|
||
" print(\"✅ Addition works\")\n",
|
||
"\n",
|
||
" # Test scalar addition\n",
|
||
" result_scalar = a + 10\n",
|
||
" expected_scalar = np.array([11, 12, 13])\n",
|
||
" assert np.array_equal(result_scalar.data, expected_scalar), f\"Scalar addition failed: expected {expected_scalar}, got {result_scalar.data}\"\n",
|
||
" print(\"✅ Scalar addition works\")\n",
|
||
"\n",
|
||
" # Test multiplication\n",
|
||
" result_mul = a * b\n",
|
||
" expected_mul = np.array([4, 10, 18])\n",
|
||
" assert np.array_equal(result_mul.data, expected_mul), f\"Multiplication failed: expected {expected_mul}, got {result_mul.data}\"\n",
|
||
" print(\"✅ Multiplication works\")\n",
|
||
"\n",
|
||
" # Test scalar multiplication\n",
|
||
" result_scalar_mul = a * 2\n",
|
||
" expected_scalar_mul = np.array([2, 4, 6])\n",
|
||
" assert np.array_equal(result_scalar_mul.data, expected_scalar_mul), f\"Scalar multiplication failed: expected {expected_scalar_mul}, got {result_scalar_mul.data}\"\n",
|
||
" print(\"✅ Scalar multiplication works\")\n",
|
||
"\n",
|
||
" # Test subtraction\n",
|
||
" result_sub = b - a\n",
|
||
" expected_sub = np.array([3, 3, 3])\n",
|
||
" assert np.array_equal(result_sub.data, expected_sub), f\"Subtraction failed: expected {expected_sub}, got {result_sub.data}\"\n",
|
||
" print(\"✅ Subtraction works\")\n",
|
||
"\n",
|
||
" # Test division\n",
|
||
" result_div = b / a\n",
|
||
" expected_div = np.array([4.0, 2.5, 2.0])\n",
|
||
" assert np.allclose(result_div.data, expected_div), f\"Division failed: expected {expected_div}, got {result_div.data}\"\n",
|
||
" print(\"✅ Division works\")\n",
|
||
"\n",
|
||
" print(\"📈 Progress: Tensor Arithmetic ✓\")\n",
|
||
"\n",
|
||
" except Exception as e:\n",
|
||
" print(f\"❌ Tensor arithmetic test failed: {e}\")\n",
|
||
" raise\n",
|
||
"\n",
|
||
" print(\"🎯 Tensor arithmetic behavior:\")\n",
|
||
" print(\" Element-wise operations on tensors\")\n",
|
||
" print(\" Broadcasting with scalars\")\n",
|
||
" print(\" Returns new Tensor objects\")\n",
|
||
" print(\" Preserves numerical precision\")\n",
|
||
"\n",
|
||
"test_unit_tensor_arithmetic()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "1da8fe1f",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🧪 Unit Test: Matrix Multiplication\n",
|
||
"\n",
|
"Test matrix multiplication on a small, hand-checkable case and on larger random matrices verified against NumPy."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "66806e77",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"\n",
|
||
"def test_unit_matrix_multiplication():\n",
|
" \"\"\"Test matrix multiplication on small and large inputs.\"\"\"\n",
" print(\"🔬 Unit Test: Matrix Multiplication...\")\n",
" \n",
" try:\n",
" # Small matrices: result can be checked by hand\n",
" small_a = Tensor([[1, 2], [3, 4]])\n",
" small_b = Tensor([[5, 6], [7, 8]])\n",
" small_result = small_a @ small_b\n",
" small_expected = np.array([[19, 22], [43, 50]])\n",
" assert np.array_equal(small_result.data, small_expected), f\"Small matmul failed: expected {small_expected}, got {small_result.data}\"\n",
" print(\"✅ Small matrix multiplication works\")\n",
"\n",
" # Larger random matrices: check shape and values against NumPy\n",
" large_a = Tensor(np.random.randn(100, 50))\n",
" large_b = Tensor(np.random.randn(50, 80))\n",
" large_result = large_a @ large_b\n",
" assert large_result.shape == (100, 80), f\"Large matmul shape wrong: expected (100, 80), got {large_result.shape}\"\n",
" \n",
" # Verify with NumPy\n",
" expected_large = np.dot(large_a.data, large_b.data)\n",
" assert np.allclose(large_result.data, expected_large), \"Large matmul results don't match NumPy\"\n",
" print(\"✅ Large matrix multiplication works\")\n",
"\n",
" print(\"📈 Progress: Matrix Multiplication ✓\")\n",
"\n",
" except Exception as e:\n",
" print(f\"❌ Matrix multiplication test failed: {e}\")\n",
" raise\n",
"\n",
" print(\"🎯 Matrix multiplication behavior:\")\n",
" print(\" Correct results on small, hand-checkable inputs\")\n",
" print(\" Delegates to NumPy's optimized np.dot for speed\")\n",
|
||
" print(\" Proper shape validation and error handling\")\n",
|
||
" print(\" Foundation for neural network linear layers\")\n",
|
||
"\n",
|
||
"test_unit_matrix_multiplication()"
|
||
]
|
||
},
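The small test case above can be checked by hand with the definition C[i, j] = Σₖ A[i, k] · B[k, j]. A naive triple-loop version makes that definition explicit. `naive_matmul` is a hypothetical reference helper for illustration only. It is not part of the module and is far slower than `np.dot`, which is why the module delegates to NumPy.

```python
import numpy as np

def naive_matmul(a, b):
    """Hypothetical reference matmul: C[i, j] = sum over p of A[i, p] * B[p, j]."""
    a, b = np.asarray(a), np.asarray(b)
    if a.ndim != 2 or b.ndim != 2:
        raise ValueError("naive_matmul requires 2D inputs")
    m, k = a.shape
    k2, n = b.shape
    if k != k2:
        raise ValueError(f"Inner dimensions must match: {k} != {k2}")
    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for i in range(m):          # each output row
        for j in range(n):      # each output column
            for p in range(k):  # dot product of row i of A with column j of B
                out[i, j] += a[i, p] * b[p, j]
    return out

small = naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(small)  # same values as the hand-checked case: 19, 22, 43, 50

rng = np.random.default_rng(0)
A, B = rng.standard_normal((5, 3)), rng.standard_normal((3, 4))
matches_numpy = np.allclose(naive_matmul(A, B), np.dot(A, B))
print(matches_numpy)  # True
```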
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "76025783",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🧪 Unit Test: Advanced Tensor Operations\n",
|
||
"\n",
|
"Test dtype handling, view/copy semantics, and memory layout properties (strides and contiguity)."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "564575fd",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"\n",
|
||
"def test_unit_advanced_tensor_operations():\n",
|
||
" \"\"\"Test advanced tensor operations: view, clone, contiguous, strides.\"\"\"\n",
|
||
" print(\"🔬 Unit Test: Advanced Tensor Operations...\")\n",
|
||
" \n",
|
||
" try:\n",
|
||
" # Test dtype handling improvements\n",
|
||
" tensor_str = Tensor([1, 2, 3], dtype=\"float32\")\n",
|
||
" tensor_np = Tensor([1, 2, 3], dtype=np.float64)\n",
|
||
" assert tensor_str.dtype == np.float32, f\"String dtype failed: {tensor_str.dtype}\"\n",
|
||
" assert tensor_np.dtype == np.float64, f\"NumPy dtype failed: {tensor_np.dtype}\"\n",
|
||
" print(\"✅ Enhanced dtype handling works\")\n",
|
||
"\n",
|
||
" # Test stride and contiguity properties\n",
|
||
" matrix = Tensor([[1, 2, 3], [4, 5, 6]])\n",
|
||
" assert hasattr(matrix, 'strides'), \"Should have strides property\"\n",
|
||
" assert hasattr(matrix, 'is_contiguous'), \"Should have is_contiguous property\"\n",
|
" assert matrix.is_contiguous, \"New tensor should be contiguous\"\n",
|
||
" print(\"✅ Stride and contiguity properties work\")\n",
|
||
"\n",
|
||
" # Test view vs clone semantics\n",
|
||
" original = Tensor([[1, 2], [3, 4]])\n",
|
||
" view_tensor = original.view(4) # Should share data\n",
|
||
" clone_tensor = original.clone() # Should copy data\n",
|
||
" \n",
|
||
" assert view_tensor.shape == (4,), f\"View shape wrong: {view_tensor.shape}\"\n",
|
||
" assert clone_tensor.shape == (2, 2), f\"Clone shape wrong: {clone_tensor.shape}\"\n",
|
||
" print(\"✅ View and clone semantics work\")\n",
|
||
"\n",
|
||
" # Test contiguous operation\n",
|
||
" non_contiguous = Tensor(np.ones((10, 10)).T) # Transpose creates non-contiguous\n",
|
||
" contiguous_result = non_contiguous.contiguous()\n",
|
||
" \n",
|
||
" if not non_contiguous.is_contiguous: # Only test if actually non-contiguous\n",
|
||
" assert contiguous_result.is_contiguous == True, \"contiguous() should make data contiguous\"\n",
|
||
" print(\"✅ Contiguous operation works\")\n",
|
||
"\n",
|
||
" # Test error handling for invalid dtype\n",
|
||
" try:\n",
|
||
" Tensor([1, 2, 3], dtype=123) # Invalid dtype\n",
|
" raise AssertionError(\"Tensor should raise TypeError for an invalid dtype\")\n",
|
||
" except TypeError:\n",
|
||
" print(\"✅ Proper error handling for invalid dtype\")\n",
|
||
"\n",
|
||
" print(\"📈 Progress: Advanced Tensor Operations ✓\")\n",
|
||
"\n",
|
||
" except Exception as e:\n",
|
||
" print(f\"❌ Advanced tensor operations test failed: {e}\")\n",
|
||
" raise\n",
|
||
"\n",
|
||
" print(\"🎯 Advanced tensor operations behavior:\")\n",
|
||
" print(\" Enhanced dtype handling (str and np.dtype)\")\n",
|
||
" print(\" Memory layout analysis with strides\")\n",
|
||
" print(\" View vs copy semantics for memory efficiency\")\n",
|
||
" print(\" Contiguous memory optimization\")\n",
|
||
"\n",
|
||
"test_unit_advanced_tensor_operations()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "674989ac",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"### 🧪 Integration Test: Tensor-NumPy Integration\n",
|
||
"\n",
|
||
"This integration test validates that your tensor system works seamlessly with NumPy, the foundation of the scientific Python ecosystem."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "79dc850b",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"\n",
|
||
"def test_module_tensor_numpy_integration():\n",
|
||
" \"\"\"\n",
|
||
" Integration test for tensor operations with NumPy arrays.\n",
|
||
"\n",
|
||
" Tests that tensors properly integrate with NumPy operations and maintain\n",
|
||
" compatibility with the scientific Python ecosystem.\n",
|
||
" \"\"\"\n",
|
||
" print(\"🔬 Integration Test: Tensor-NumPy Integration...\")\n",
|
||
"\n",
|
||
" try:\n",
|
||
" # Test 1: Tensor from NumPy array\n",
|
||
" numpy_array = np.array([[1, 2, 3], [4, 5, 6]])\n",
|
||
" tensor_from_numpy = Tensor(numpy_array)\n",
|
||
"\n",
|
||
" assert tensor_from_numpy.shape == (2, 3), \"Tensor should preserve NumPy array shape\"\n",
|
||
" assert np.array_equal(tensor_from_numpy.data, numpy_array), \"Tensor should preserve NumPy array data\"\n",
|
||
" print(\"✅ Tensor from NumPy array works\")\n",
|
||
"\n",
|
||
" # Test 2: Tensor arithmetic with NumPy-compatible operations\n",
|
||
" a = Tensor([1.0, 2.0, 3.0])\n",
|
||
" b = Tensor([4.0, 5.0, 6.0])\n",
|
||
"\n",
|
||
" # Test operations that would be used in neural networks\n",
|
||
" dot_product_result = np.dot(a.data, b.data) # Common in layers\n",
|
||
" assert np.isclose(dot_product_result, 32.0), \"Dot product should work with tensor data\"\n",
|
||
" print(\"✅ NumPy operations on tensor data work\")\n",
|
||
"\n",
|
||
" # Test 3: Broadcasting compatibility\n",
|
||
" matrix = Tensor([[1, 2], [3, 4]])\n",
|
||
" scalar = Tensor(10)\n",
|
||
"\n",
|
||
" result = matrix + scalar\n",
|
||
" expected = np.array([[11, 12], [13, 14]])\n",
|
||
" assert np.array_equal(result.data, expected), \"Broadcasting should work like NumPy\"\n",
|
||
" print(\"✅ Broadcasting compatibility works\")\n",
|
||
"\n",
|
||
" # Test 4: Integration with scientific computing patterns\n",
|
||
" data = Tensor([1, 4, 9, 16, 25])\n",
|
||
" sqrt_result = Tensor(np.sqrt(data.data)) # Using NumPy functions on tensor data\n",
|
||
" expected_sqrt = np.array([1., 2., 3., 4., 5.])\n",
|
||
" assert np.allclose(sqrt_result.data, expected_sqrt), \"Should integrate with NumPy functions\"\n",
|
||
" print(\"✅ Scientific computing integration works\")\n",
|
||
"\n",
|
||
" print(\"📈 Progress: Tensor-NumPy Integration ✓\")\n",
|
||
"\n",
|
||
" except Exception as e:\n",
|
||
" print(f\"❌ Integration test failed: {e}\")\n",
|
||
" raise\n",
|
||
"\n",
|
||
" print(\"🎯 Integration test validates:\")\n",
|
||
" print(\" Seamless NumPy array conversion\")\n",
|
||
" print(\" Compatible arithmetic operations\")\n",
|
||
" print(\" Proper broadcasting behavior\")\n",
|
||
" print(\" Scientific computing workflow integration\")\n",
|
||
"\n",
|
||
"test_module_tensor_numpy_integration()"
|
||
]
|
||
},
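Test 4 above calls `np.sqrt` on `data.data`. The `__array__` and `__array_ufunc__` hooks defined on Tensor are what let NumPy functions accept a tensor directly. This is a minimal standalone sketch of that protocol. It uses a hypothetical `Wrapped` class (not part of TinyTorch) that follows the same pattern: unwrap the inputs, compute with NumPy, then re-wrap the result.

```python
import numpy as np

class Wrapped:
    """Hypothetical minimal array wrapper using the same NumPy protocol hooks as Tensor."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # Reached by np.array(w) / np.asarray(w)
        if dtype is not None:
            return self._data.astype(dtype)
        return self._data.copy() if copy else self._data

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap Wrapped inputs and run the ufunc on plain arrays
        args = [x._data if isinstance(x, Wrapped) else x for x in inputs]
        out = getattr(ufunc, method)(*args, **kwargs)
        # Re-wrap array results of ordinary calls such as np.sqrt(w)
        if method == "__call__" and isinstance(out, np.ndarray):
            return Wrapped(out)
        return out

w = Wrapped([1.0, 4.0, 9.0])
r = np.sqrt(w)                 # dispatched through __array_ufunc__
print(type(r).__name__)        # Wrapped
print(np.add(w, 1)._data)      # element-wise add, still wrapped
s = np.add.reduce(w)           # method == 'reduce': result is not re-wrapped
print(s)                       # 14.0
```

Reductions such as `np.add.reduce` reach `__array_ufunc__` with `method='reduce'`. This sketch therefore returns their result unwrapped, just as the Tensor version only re-wraps results of `'__call__'`.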
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "3ba2c701",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"## Parameter Helper Function\n",
|
||
"\n",
|
||
"Now that we have Tensor with gradient support, let's add a convenient helper function for creating trainable parameters:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "8039d2e4",
|
||
"metadata": {
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"\n",
|
||
"#| export\n",
|
||
"def Parameter(data, dtype=None):\n",
|
||
" \"\"\"\n",
|
||
" Convenience function for creating trainable tensors.\n",
|
||
"\n",
|
||
" This is equivalent to Tensor(data, requires_grad=True) but provides\n",
|
||
" cleaner syntax for neural network parameters.\n",
|
||
"\n",
|
||
" Args:\n",
|
||
" data: Input data (scalar, list, or numpy array)\n",
|
||
" dtype: Data type ('float32', 'int32', etc.). Defaults to auto-detect.\n",
|
||
"\n",
|
||
" Returns:\n",
|
||
" Tensor with requires_grad=True\n",
|
||
"\n",
|
||
" Examples:\n",
|
||
" weight = Parameter(np.random.randn(784, 128)) # Neural network weight\n",
|
||
" bias = Parameter(np.zeros(128)) # Neural network bias\n",
|
||
" \"\"\"\n",
|
||
" return Tensor(data, dtype=dtype, requires_grad=True)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "94412986",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\"",
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"source": [
|
||
"## Comprehensive Testing Function\n",
|
||
"\n",
|
||
"Let's create a comprehensive test that runs all our unit tests together:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "71d471d8",
|
||
"metadata": {
|
||
"lines_to_next_cell": 1
|
||
},
|
||
"outputs": [],
|
||
"source": [
|
||
"\n",
|
||
"def test_unit_all():\n",
|
||
" \"\"\"Run complete tensor module validation.\"\"\"\n",
|
||
" print(\"🧪 Running all unit tests...\")\n",
|
||
" \n",
|
||
" # Call every individual test function\n",
|
||
" test_unit_tensor_creation()\n",
|
||
" test_unit_tensor_properties() \n",
|
||
" test_unit_tensor_arithmetic()\n",
|
||
" test_unit_matrix_multiplication()\n",
|
||
" test_unit_advanced_tensor_operations()\n",
|
||
" test_module_tensor_numpy_integration()\n",
|
||
" \n",
|
||
" print(\"✅ All tests passed! Tensor module ready for integration.\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "adbef893",
|
||
"metadata": {
|
||
"lines_to_next_cell": 2
|
||
},
|
||
"source": [
|
||
"\"\"\"\n",
|
||
"# Main Execution Block\n",
|
||
"\"\"\"\n",
|
||
"\n",
|
||
"if __name__ == \"__main__\":\n",
|
||
" # Run all tensor tests\n",
|
||
" test_unit_all()\n",
|
||
" \n",
|
||
" print(\"\\n🎉 Tensor module implementation complete!\")\n",
|
||
" print(\"📦 Ready to export to tinytorch.core.tensor\")\n",
|
||
" \n",
|
" # Demonstrate additional Tensor features\n",
|
||
" print(\"\\n🚀 New Features Demonstration:\")\n",
|
||
" \n",
|
||
" # 1. Enhanced dtype handling\n",
|
||
" t1 = Tensor([1, 2, 3], dtype=\"float32\")\n",
|
||
" t2 = Tensor([1, 2, 3], dtype=np.float64)\n",
|
||
" t3 = Tensor([1, 2, 3], dtype=np.int32)\n",
|
" print(f\"✅ Enhanced dtype support: str={t1.dtype}, np.float64={t2.dtype}, np.int32={t3.dtype}\")\n",
|
||
" \n",
|
||
" # 2. Memory layout analysis\n",
|
||
" matrix = Tensor([[1, 2, 3], [4, 5, 6]])\n",
|
||
" print(f\"✅ Memory analysis: strides={matrix.strides}, contiguous={matrix.is_contiguous}\")\n",
|
||
" \n",
|
||
" # 3. View/copy semantics\n",
|
||
" view = matrix.view(6)\n",
|
||
" clone = matrix.clone()\n",
|
||
" print(f\"✅ View/copy semantics: view_shape={view.shape}, clone_shape={clone.shape}\")\n",
|
||
" \n",
|
||
" # 4. Broadcasting failure demonstration with clear error messages\n",
|
||
" try:\n",
|
||
" bad_a = Tensor([[1, 2], [3, 4]]) # (2, 2)\n",
|
||
" bad_b = Tensor([1, 2, 3]) # (3,)\n",
|
||
" result = bad_a + bad_b\n",
|
||
" except ValueError as e:\n",
|
||
" print(f\"✅ Clear broadcasting error: {str(e)[:50]}...\")\n",
|
||
" \n",
|
||
" print(\"\\n🎯 Core tensor implementation complete!\")\n",
|
||
" print(\" ✓ Simple, clear tensor creation and operations\")\n",
|
||
" print(\" ✓ Memory layout analysis and performance insights\")\n",
|
||
" print(\" ✓ Broadcasting with comprehensive error handling\")\n",
|
||
" print(\" ✓ View/copy semantics for memory efficiency\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "eec96153",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 🤔 ML Systems Thinking\n",
|
||
"\n",
|
||
"Now that you've built a complete tensor system, let's connect your implementation to real ML challenges:"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "ddedb4f4",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"### Question 1: Memory Efficiency at Scale\n",
|
||
"\n",
|
"**Challenge**: Your work on memory layout showed that contiguous memory access can be far faster than scattered access. Consider a language model with 7 billion parameters (28GB at float32, since 7×10⁹ parameters × 4 bytes = 28×10⁹ bytes). How would you modify your memory layout strategies to handle training with limited GPU memory (16GB)?\n",
|
||
"\n",
|
||
"Calculate the memory requirements for parameters, gradients, and optimizer states, then propose specific optimizations to your Tensor implementation."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "1a53526a",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"\n",
|
||
"\"\"\"\n",
|
||
"YOUR ANALYSIS:\n",
|
||
"\n",
|
||
"[Write your response here - consider memory layout, cache efficiency,\n",
|
||
"and optimization strategies for large-scale tensor operations]\n",
|
||
"\"\"\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "9645ace4",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"### Question 2: Production Broadcasting\n",
|
||
"\n",
|
||
"**Challenge**: Your broadcasting implementation handles basic cases. In transformer models, you need operations like:\n",
|
"- Query (32, 512, 768) @ Keyᵀ (32, 768, 512) → Attention scores (32, 512, 512), a batched matrix multiplication\n",
"- Attention (32, 8, 512, 512) + Bias (1, 1, 512, 512), a broadcast addition\n",
"\n",
"How would you extend your `matmul` (currently limited to 2D tensors) and your `__add__` and `__mul__` methods to handle these shapes while providing clear error messages when shapes are incompatible?"
|
||
]
|
||
},
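Both shape rules in the question can be checked with NumPy before writing any Tensor code. This sketch has two parts:

- `np.broadcast_shapes` (NumPy 1.20+) applies the broadcasting rules to shapes alone.
- A batched matmul uses toy sizes so that it runs quickly.

It is an aid for exploring the question, not a full answer to it.

```python
import numpy as np

# Shapes only: NumPy aligns trailing dimensions; a size-1 dimension stretches.
bias_result = np.broadcast_shapes((32, 8, 512, 512), (1, 1, 512, 512))
print(bias_result)  # (32, 8, 512, 512)

# Q @ K^T as a batched matmul, with toy sizes (batch=2, seq=4, d_model=3)
rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 3))
k = rng.standard_normal((2, 4, 3))
scores = q @ k.swapaxes(-1, -2)  # (2, 4, 3) @ (2, 3, 4) -> (2, 4, 4)
print(scores.shape)  # (2, 4, 4)

# Incompatible trailing dimensions (2 vs 3) raise ValueError
try:
    np.broadcast_shapes((2, 2), (3,))
    incompatible = False
except ValueError as err:
    incompatible = True
    print("Incompatible shapes:", err)
```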
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "20aee275",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"\n",
|
||
"\"\"\"\n",
|
||
"YOUR ANALYSIS:\n",
|
||
"\n",
|
||
"[Write your response here - consider broadcasting rules, error handling,\n",
|
||
"and complex shape operations in transformer architectures]\n",
|
||
"\"\"\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "a4e71b43",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"### Question 3: Gradient Compatibility\n",
|
||
"\n",
|
||
"**Challenge**: Your Tensor class includes `requires_grad` and basic gradient tracking. When you implement automatic differentiation (Module 09), how will your current design support gradient computation?\n",
|
||
"\n",
|
||
"Consider how operations like `c = a * b` need to track both forward computation and backward gradient flow. What modifications would your Tensor methods need to support this?"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "32c157fe",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
|
||
"\n",
|
||
"\"\"\"\n",
|
||
"YOUR ANALYSIS:\n",
|
||
"\n",
|
||
"[Write your response here - consider gradient tracking, computational graphs,\n",
|
||
"and how your tensor operations will support automatic differentiation]\n",
|
||
"\"\"\""
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "9b4d9bff",
|
||
"metadata": {
|
||
"cell_marker": "\"\"\""
|
||
},
|
||
"source": [
|
||
"## 🎯 MODULE SUMMARY: Tensor Foundation\n",
|
||
"\n",
|
||
"Congratulations! You've built the fundamental data structure that powers all machine learning!\n",
|
||
"\n",
|
||
"### Key Learning Outcomes\n",
|
||
"- **Complete Tensor System**: Built a 400+ line implementation with 15 methods supporting all essential tensor operations\n",
|
"- **Memory Efficiency Mastery**: Saw how memory layout (strides, contiguity, views vs. copies) affects both speed and memory use\n",
|
||
"- **Broadcasting Implementation**: Created automatic shape matching that saves memory and enables flexible operations\n",
|
||
"- **Production-Ready API**: Designed interfaces that mirror PyTorch and TensorFlow patterns\n",
|
||
"\n",
|
||
"### Ready for Next Steps\n",
|
||
"Your tensor implementation now enables:\n",
|
||
"- **Module 03 (Activations)**: Add nonlinear functions that make neural networks powerful\n",
|
||
"- **Neural network operations**: Matrix multiplication, broadcasting, and gradient preparation\n",
|
||
"- **Real data processing**: Handle images, text, and complex multi-dimensional datasets\n",
|
||
"\n",
|
||
"### Export Your Work\n",
|
||
"1. **Export to package**: `tito module complete 02_tensor`\n",
|
||
"2. **Verify integration**: Your Tensor class will be available as `tinytorch.core.tensor.Tensor`\n",
|
||
"3. **Enable next module**: Activations build on your tensor foundation\n",
|
||
"\n",
|
||
"**Achievement unlocked**: You've built the universal data structure of modern AI! Every neural network, from simple classifiers to ChatGPT, relies on the tensor concepts you've just implemented."
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"jupytext": {
|
||
"main_language": "python"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.13.3"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
|