diff --git a/MODULE_ANALYSIS_SUMMARY.md b/MODULE_ANALYSIS_SUMMARY.md new file mode 100644 index 00000000..4a6065cc --- /dev/null +++ b/MODULE_ANALYSIS_SUMMARY.md @@ -0,0 +1,152 @@ +# TinyTorch Module Analysis Summary + +## Key Findings + +### ✅ **Excellent Foundation (setup_dev.py)** +- **Perfect structure**: Follows explain → code → test → repeat pattern +- **Rich scaffolding**: Every TODO has step-by-step guidance +- **Immediate feedback**: Tests run after each concept +- **Educational flow**: Concepts build logically with real-world connections + +### ⚠️ **Structural Issues (Modules 01-07)** +- **Content quality**: Excellent mathematical explanations and implementations +- **Testing pattern**: All tests at end instead of progressive testing +- **TODO scaffolding**: Generic `NotImplementedError` without guidance +- **Student experience**: Large amounts of code before getting feedback + +### ❌ **Missing Modules (08-13)** +- **Empty directories**: 6 of the 13 modules (08-13) are completely empty +- **Critical gaps**: Optimizers, training, MLOps missing + +## Immediate Action Items + +### 1. **Fix Testing Pattern (High Priority)** +Transform this poor pattern: +```python +# All implementations +def concept_1(): pass +def concept_2(): pass +def concept_3(): pass + +# All tests at end +def test_everything(): pass +``` + +To this excellent pattern: +```python +# Concept 1 +def concept_1(): pass +def test_concept_1(): pass +print("✅ Concept 1 tests passed!") + +# Concept 2 +def concept_2(): pass +def test_concept_2(): pass +print("✅ Concept 2 tests passed!") +``` + +### 2. **Enhance TODO Blocks (High Priority)** +Replace generic todos: +```python +def add(self, other): + """Add two tensors.""" + raise NotImplementedError("Student implementation required") +``` + +With rich scaffolding: +```python +def add(self, other): + """ + TODO: Implement tensor addition. + + STEP-BY-STEP IMPLEMENTATION: + 1. Get numpy data from both tensors + 2. Use numpy's + operator + 3. 
Create new Tensor with result + 4. Return the new tensor + + EXAMPLE USAGE: + t1 = Tensor([[1, 2], [3, 4]]) + t2 = Tensor([[5, 6], [7, 8]]) + result = t1.add(t2) # [[6, 8], [10, 12]] + + IMPLEMENTATION HINTS: + - Use self._data + other._data + - Wrap result in new Tensor + - NumPy handles broadcasting + """ +``` + +### 3. **Module Priority for Fixes** +1. **01_tensor** (Highest) - Foundation for everything +2. **02_activations** (High) - Used in all networks +3. **03_layers** (High) - Core building blocks +4. **07_autograd** (High) - Enables training +5. **04_networks** (Medium) - Compositions +6. **05_cnn** (Medium) - Specialized operations +7. **06_dataloader** (Medium) - Data handling + +## Implementation Strategy + +### Phase 1: Transform Existing Modules (Weeks 1-2) +For each module (01-07): +1. **Identify breakpoints**: Find natural concept boundaries +2. **Reorganize structure**: Create Step 1, Step 2, etc. with explanations +3. **Add immediate testing**: Test after each major concept +4. **Enhance TODO blocks**: Add step-by-step guidance +5. 
**Include success messages**: Clear progress indicators + +### Phase 2: Create Missing Modules (Weeks 3-4) +Using the improved structure: +- **08_optimizers**: SGD, Adam, learning rate scheduling +- **09_training**: Training loops, loss functions, metrics +- **10_compression**: Pruning, quantization, knowledge distillation +- **11_kernels**: Custom operations, CUDA kernels +- **12_benchmarking**: Performance measurement, profiling +- **13_mlops**: Model deployment, monitoring, versioning + +## Success Metrics + +### Student Experience +- **Immediate feedback**: Results after each concept +- **Clear guidance**: Step-by-step implementation instructions +- **Progressive complexity**: Each step builds on previous success +- **Debugging support**: Clear error messages and examples + +### Educational Quality +- **Consistent structure**: All modules follow same pattern +- **Rich scaffolding**: Every function has detailed guidance +- **Real-world connections**: Theory linked to practice +- **Integration**: Modules work together seamlessly + +## Next Steps + +### Week 1: Start with Tensor Module +1. **Backup current**: Create `tensor_dev_backup.py` +2. **Reorganize structure**: Break into progressive steps +3. **Add immediate testing**: Test after each operation type +4. **Test with students**: Validate improved experience + +### Week 2: Apply to Activations & Layers +1. **Apply same pattern**: Use tensor module as template +2. **Focus on scaffolding**: Rich TODO blocks +3. **Add visualizations**: Where helpful for understanding +4. **Progressive testing**: After each activation/layer type + +### Week 3-4: Complete Missing Modules +1. **Use proven pattern**: Follow successful structure +2. **Real-world focus**: Production-ready implementations +3. **Integration testing**: Ensure modules work together +4. 
**Documentation**: Clear learning outcomes + +## Key Principle + +**Always follow: Explain → Code → Test → Repeat** + +This pattern maximizes student success through: +- Immediate feedback prevents confusion +- Rich scaffolding reduces frustration +- Progressive complexity builds confidence +- Clear connections show the bigger picture + +The goal is to transform TinyTorch from reference material into a guided learning experience that creates deep understanding of ML systems. \ No newline at end of file diff --git a/Module_Improvement_Guide.md b/Module_Improvement_Guide.md new file mode 100644 index 00000000..836bc7e4 --- /dev/null +++ b/Module_Improvement_Guide.md @@ -0,0 +1,592 @@ +# Module Improvement Guide: From Poor to Excellent Structure + +## Example: Transforming 01_tensor Module + +This guide shows how to transform the tensor module from its current structure to follow the **explain → code → test → repeat** pattern exemplified by `setup_dev.py`. + +## Current Problem Structure + +```python +# Current tensor_dev.py structure (POOR) +# Lines 1-300: All explanations +# Lines 300-700: All implementations +# Lines 700-1536: All tests at the end + +class Tensor: + def __init__(self): + raise NotImplementedError("Student implementation required") + + def add(self): + raise NotImplementedError("Student implementation required") + + def multiply(self): + raise NotImplementedError("Student implementation required") + +# Much later... +def test_tensor_creation_comprehensive(): + # Tests everything at once + pass + +def test_tensor_arithmetic_comprehensive(): + # Tests everything at once + pass +``` + +## Improved Structure (EXCELLENT) + +```python +# Improved tensor_dev.py structure (EXCELLENT) +# Following: Explain → Code → Test → Repeat + +# %% [markdown] +""" +## Step 1: What is a Tensor? + +### Definition +A **tensor** is an N-dimensional array with ML-specific operations. 
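Before explaining why tensors matter, one concrete NumPy-only illustration of the rank/shape idea may help (a sketch for intuition; it uses plain `np.ndarray`, not the `Tensor` class being built in this guide):

```python
import numpy as np

# Rank = number of dimensions; shape = size along each dimension
scalar = np.array(5.0)                # rank 0, shape ()
vector = np.array([1, 2, 3])          # rank 1, shape (3,)
matrix = np.array([[1, 2], [3, 4]])   # rank 2, shape (2, 2)

print(scalar.ndim, scalar.shape)  # 0 ()
print(vector.ndim, vector.shape)  # 1 (3,)
print(matrix.ndim, matrix.shape)  # 2 (2, 2)
```

The same idea extends to rank 3 and beyond (e.g. image batches), which is exactly what the `Tensor` class will wrap.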
+ +### Why Tensors Matter +- **Foundation**: Every ML framework uses tensors +- **Efficiency**: Vectorized operations are faster +- **Flexibility**: Same operations work on scalars, vectors, matrices + +### Real-World Examples +```python +# Scalar (0D): A single number +temperature = Tensor(25.0) + +# Vector (1D): A list of numbers +rgb_color = Tensor([255, 128, 0]) + +# Matrix (2D): Image pixels +image = Tensor([[100, 150], [200, 250]]) +``` + +Let's build this step by step! +""" + +# %% [markdown] +""" +## Step 1A: Tensor Creation + +### The Foundation Operation +Creating tensors is the first thing you'll do in any ML system. Our Tensor class needs to: +1. Accept various input types (lists, numpy arrays, scalars) +2. Store data efficiently +3. Track shape and type information +""" + +# %% nbgrader={"grade": false, "grade_id": "tensor-creation", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +class Tensor: + def __init__(self, data: Union[int, float, List, np.ndarray], dtype: Optional[str] = None): + """ + Create a tensor from various input types. + + TODO: Implement tensor creation with proper data handling. + + STEP-BY-STEP IMPLEMENTATION: + 1. Convert input data to numpy array using np.array() + 2. Handle dtype conversion if specified + 3. Store the numpy array in self._data + 4. Validate that data is numeric (not strings, objects, etc.) 
+ + EXAMPLE USAGE: + ```python + # From scalar + t1 = Tensor(5.0) + + # From list + t2 = Tensor([1, 2, 3]) + + # From nested list (matrix) + t3 = Tensor([[1, 2], [3, 4]]) + ``` + + IMPLEMENTATION HINTS: + - Use np.array(data) to convert input + - Check dtype parameter: if provided, use np.array(data, dtype=dtype) + - Validate: ensure data is numeric (int, float, complex) + - Store in self._data for internal use + + LEARNING CONNECTIONS: + - This is like torch.tensor() in PyTorch + - Similar to tf.constant() in TensorFlow + - Foundation for all other tensor operations + """ + ### BEGIN SOLUTION + if dtype is not None: + self._data = np.array(data, dtype=dtype) + else: + self._data = np.array(data) + + # Validate numeric data + if not np.issubdtype(self._data.dtype, np.number): + raise ValueError(f"Tensor data must be numeric, got {self._data.dtype}") + ### END SOLUTION + +# %% [markdown] +""" +### 🧪 Test Your Tensor Creation + +Once you implement the `__init__` method above, run this cell to test it: +""" + +# %% nbgrader={"grade": true, "grade_id": "test-tensor-creation", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false} +def test_tensor_creation(): + """Test tensor creation with various input types""" + print("Testing tensor creation...") + + # Test scalar creation + t1 = Tensor(5.0) + assert t1._data.shape == (), "Scalar tensor should have empty shape" + assert t1._data.item() == 5.0, "Scalar value should be 5.0" + + # Test list creation + t2 = Tensor([1, 2, 3]) + assert t2._data.shape == (3,), "1D tensor should have shape (3,)" + assert np.array_equal(t2._data, [1, 2, 3]), "1D tensor values should match" + + # Test matrix creation + t3 = Tensor([[1, 2], [3, 4]]) + assert t3._data.shape == (2, 2), "2D tensor should have shape (2, 2)" + assert np.array_equal(t3._data, [[1, 2], [3, 4]]), "2D tensor values should match" + + # Test dtype specification + t4 = Tensor([1, 2, 3], dtype='float32') + assert t4._data.dtype == np.float32, 
"Specified dtype should be respected" + + print("✅ Tensor creation tests passed!") + print(f"✅ Created tensors: scalar, vector, matrix") + print(f"✅ Handled data types correctly") + +# Run the test +test_tensor_creation() + +# %% [markdown] +""" +## Step 1B: Tensor Properties + +### Essential Information Access +Every tensor needs to provide basic information about itself: +- **Shape**: Dimensions of the tensor +- **Size**: Total number of elements +- **Data access**: Get the underlying data + +### Why Properties Matter +- **Debugging**: Quickly see tensor dimensions +- **Validation**: Check compatibility for operations +- **Integration**: Interface with other libraries +""" + +# %% nbgrader={"grade": false, "grade_id": "tensor-properties", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export + @property + def data(self) -> np.ndarray: + """ + Get the underlying numpy array data. + + TODO: Implement data property access. + + STEP-BY-STEP IMPLEMENTATION: + 1. Return self._data directly + 2. This gives users access to the numpy array + + EXAMPLE USAGE: + ```python + t = Tensor([[1, 2], [3, 4]]) + print(t.data) # [[1 2] + # [3 4]] + ``` + + IMPLEMENTATION HINTS: + - Simple property: just return self._data + - No validation needed here + - This is like tensor.numpy() in PyTorch + """ + ### BEGIN SOLUTION + return self._data + ### END SOLUTION + + @property + def shape(self) -> Tuple[int, ...]: + """ + Get the shape (dimensions) of the tensor. + + TODO: Implement shape property. + + STEP-BY-STEP IMPLEMENTATION: + 1. Return self._data.shape + 2. 
This gives the dimensions as a tuple + + EXAMPLE USAGE: + ```python + t = Tensor([[1, 2], [3, 4]]) + print(t.shape) # (2, 2) + ``` + + IMPLEMENTATION HINTS: + - NumPy arrays have a .shape attribute + - Return self._data.shape + - This is like tensor.shape in PyTorch + """ + ### BEGIN SOLUTION + return self._data.shape + ### END SOLUTION + + @property + def size(self) -> int: + """ + Get the total number of elements in the tensor. + + TODO: Implement size property. + + STEP-BY-STEP IMPLEMENTATION: + 1. Return self._data.size + 2. This gives total elements across all dimensions + + EXAMPLE USAGE: + ```python + t = Tensor([[1, 2], [3, 4]]) + print(t.size) # 4 (2×2 = 4 elements) + ``` + + IMPLEMENTATION HINTS: + - NumPy arrays have a .size attribute + - Return self._data.size + - This is like tensor.numel() in PyTorch + """ + ### BEGIN SOLUTION + return self._data.size + ### END SOLUTION + +# %% [markdown] +""" +### 🧪 Test Your Tensor Properties + +Once you implement the properties above, run this cell to test them: +""" + +# %% nbgrader={"grade": true, "grade_id": "test-tensor-properties", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false} +def test_tensor_properties(): + """Test tensor properties: data, shape, size""" + print("Testing tensor properties...") + + # Test scalar properties + t1 = Tensor(5.0) + assert t1.shape == (), "Scalar shape should be empty tuple" + assert t1.size == 1, "Scalar size should be 1" + assert t1.data.item() == 5.0, "Scalar data should be accessible" + + # Test vector properties + t2 = Tensor([1, 2, 3, 4]) + assert t2.shape == (4,), "Vector shape should be (4,)" + assert t2.size == 4, "Vector size should be 4" + assert np.array_equal(t2.data, [1, 2, 3, 4]), "Vector data should match" + + # Test matrix properties + t3 = Tensor([[1, 2, 3], [4, 5, 6]]) + assert t3.shape == (2, 3), "Matrix shape should be (2, 3)" + assert t3.size == 6, "Matrix size should be 6" + assert np.array_equal(t3.data, [[1, 2, 3], [4, 
5, 6]]), "Matrix data should match" + + print("✅ Tensor properties tests passed!") + print(f"✅ Shape, size, and data access working correctly") + +# Run the test +test_tensor_properties() + +# %% [markdown] +""" +## Step 2: Tensor Arithmetic + +### The Heart of ML: Mathematical Operations +Now we implement the core mathematical operations that make ML possible: +- **Addition**: Element-wise addition of tensors +- **Multiplication**: Element-wise multiplication +- **Subtraction**: Element-wise subtraction +- **Division**: Element-wise division + +### Why Arithmetic Matters +- **Neural networks**: Every layer uses tensor arithmetic +- **Optimization**: Gradient updates use arithmetic +- **Data processing**: Normalization, scaling, transformations +""" + +# %% [markdown] +""" +## Step 2A: Tensor Addition + +### The Foundation Operation +Addition is the most basic and important tensor operation: +- **Element-wise**: Each element adds to corresponding element +- **Broadcasting**: Smaller tensors can add to larger ones +- **Commutative**: a + b = b + a +""" + +# %% nbgrader={"grade": false, "grade_id": "tensor-addition", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export + def add(self, other: 'Tensor') -> 'Tensor': + """ + Add two tensors element-wise. + + TODO: Implement tensor addition. + + STEP-BY-STEP IMPLEMENTATION: + 1. Get the numpy data from both tensors + 2. Use numpy's + operator for element-wise addition + 3. Create a new Tensor with the result + 4. 
Return the new tensor + + EXAMPLE USAGE: + ```python + t1 = Tensor([[1, 2], [3, 4]]) + t2 = Tensor([[5, 6], [7, 8]]) + result = t1.add(t2) + print(result.data) # [[6, 8], [10, 12]] + ``` + + IMPLEMENTATION HINTS: + - Use self._data + other._data for numpy addition + - Wrap result in new Tensor: return Tensor(result) + - NumPy handles broadcasting automatically + - This is like torch.add() in PyTorch + + LEARNING CONNECTIONS: + - This is used in every neural network layer + - Gradient descent updates parameters: params = params - learning_rate * gradients + - Data preprocessing: adding bias, normalization + """ + ### BEGIN SOLUTION + result = self._data + other._data + return Tensor(result) + ### END SOLUTION + +# %% [markdown] +""" +### 🧪 Test Your Tensor Addition + +Once you implement the `add` method above, run this cell to test it: +""" + +# %% nbgrader={"grade": true, "grade_id": "test-tensor-addition", "locked": true, "points": 15, "schema_version": 3, "solution": false, "task": false} +def test_tensor_addition(): + """Test tensor addition with various shapes""" + print("Testing tensor addition...") + + # Test same-shape addition + t1 = Tensor([[1, 2], [3, 4]]) + t2 = Tensor([[5, 6], [7, 8]]) + result = t1.add(t2) + expected = np.array([[6, 8], [10, 12]]) + assert np.array_equal(result.data, expected), "Same-shape addition failed" + + # Test scalar addition (broadcasting) + t3 = Tensor([[1, 2], [3, 4]]) + t4 = Tensor(10) + result = t3.add(t4) + expected = np.array([[11, 12], [13, 14]]) + assert np.array_equal(result.data, expected), "Scalar addition failed" + + # Test vector addition (broadcasting) + t5 = Tensor([[1, 2], [3, 4]]) + t6 = Tensor([10, 20]) + result = t5.add(t6) + expected = np.array([[11, 22], [13, 24]]) + assert np.array_equal(result.data, expected), "Vector addition failed" + + print("✅ Tensor addition tests passed!") + print(f"✅ Same-shape, scalar, and vector addition working") + +# Run the test +test_tensor_addition() + +# %% [markdown] +""" +## 
Step 2B: Tensor Multiplication + +### Scaling and Element-wise Products +Multiplication is crucial for scaling values and computing element-wise products: +- **Element-wise**: Each element multiplies with corresponding element +- **Broadcasting**: Works with different shapes +- **Commutative**: a * b = b * a +""" + +# %% nbgrader={"grade": false, "grade_id": "tensor-multiplication", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export + def multiply(self, other: 'Tensor') -> 'Tensor': + """ + Multiply two tensors element-wise. + + TODO: Implement tensor multiplication. + + STEP-BY-STEP IMPLEMENTATION: + 1. Get the numpy data from both tensors + 2. Use numpy's * operator for element-wise multiplication + 3. Create a new Tensor with the result + 4. Return the new tensor + + EXAMPLE USAGE: + ```python + t1 = Tensor([[1, 2], [3, 4]]) + t2 = Tensor([[2, 3], [4, 5]]) + result = t1.multiply(t2) + print(result.data) # [[2, 6], [12, 20]] + ``` + + IMPLEMENTATION HINTS: + - Use self._data * other._data for numpy multiplication + - Wrap result in new Tensor: return Tensor(result) + - NumPy handles broadcasting automatically + - This is like torch.mul() in PyTorch + + LEARNING CONNECTIONS: + - Used in activation functions: ReLU masks + - Attention mechanisms: attention weights * values + - Scaling: learning_rate * gradients + """ + ### BEGIN SOLUTION + result = self._data * other._data + return Tensor(result) + ### END SOLUTION + +# %% [markdown] +""" +### 🧪 Test Your Tensor Multiplication + +Once you implement the `multiply` method above, run this cell to test it: +""" + +# %% nbgrader={"grade": true, "grade_id": "test-tensor-multiplication", "locked": true, "points": 15, "schema_version": 3, "solution": false, "task": false} +def test_tensor_multiplication(): + """Test tensor multiplication with various shapes""" + print("Testing tensor multiplication...") + + # Test same-shape multiplication + t1 = Tensor([[1, 2], [3, 4]]) + t2 = Tensor([[2, 3], 
[4, 5]]) + result = t1.multiply(t2) + expected = np.array([[2, 6], [12, 20]]) + assert np.array_equal(result.data, expected), "Same-shape multiplication failed" + + # Test scalar multiplication (broadcasting) + t3 = Tensor([[1, 2], [3, 4]]) + t4 = Tensor(2) + result = t3.multiply(t4) + expected = np.array([[2, 4], [6, 8]]) + assert np.array_equal(result.data, expected), "Scalar multiplication failed" + + # Test vector multiplication (broadcasting) + t5 = Tensor([[1, 2], [3, 4]]) + t6 = Tensor([2, 3]) + result = t5.multiply(t6) + expected = np.array([[2, 6], [6, 12]]) + assert np.array_equal(result.data, expected), "Vector multiplication failed" + + print("✅ Tensor multiplication tests passed!") + print(f"✅ Same-shape, scalar, and vector multiplication working") + +# Run the test +test_tensor_multiplication() + +# %% [markdown] +""" +## 🎯 Step 3: Integration Test + +### Putting It All Together +Now let's test that all our tensor operations work together in realistic scenarios: +""" + +# %% nbgrader={"grade": true, "grade_id": "test-tensor-integration", "locked": true, "points": 20, "schema_version": 3, "solution": false, "task": false} +def test_tensor_integration(): + """Test complete tensor functionality together""" + print("Testing tensor integration...") + + # Create test tensors + t1 = Tensor([[1, 2], [3, 4]]) + t2 = Tensor([[2, 1], [1, 2]]) + scalar = Tensor(0.5) + + # Test chained operations + result = t1.add(t2).multiply(scalar) + expected = np.array([[1.5, 1.5], [2.0, 3.0]]) + assert np.array_equal(result.data, expected), "Chained operations failed" + + # Test properties after operations + assert result.shape == (2, 2), "Result shape should be (2, 2)" + assert result.size == 4, "Result size should be 4" + + # Test with different shapes (broadcasting) + t3 = Tensor([1, 2, 3]) + t4 = Tensor([[1], [2], [3]]) + result = t3.add(t4) + assert result.shape == (3, 3), "Broadcasting result should be (3, 3)" + + print("✅ Tensor integration tests passed!") + print(f"✅ 
All tensor operations work together correctly") + print(f"✅ Ready to build neural networks!") + +# Run the integration test +test_tensor_integration() + +# %% [markdown] +""" +## 🎯 Module Summary: Tensor Mastery Achieved! + +Congratulations! You've successfully implemented the core Tensor class with: + +### ✅ What You've Built +- **Tensor Creation**: Handle various input types (scalars, lists, arrays) +- **Properties**: Access shape, size, and data efficiently +- **Arithmetic**: Add and multiply tensors with broadcasting support +- **Integration**: Operations work together seamlessly + +### ✅ Key Learning Outcomes +- **Understanding**: Tensors as the foundation of ML systems +- **Implementation**: Built tensor operations from scratch +- **Testing**: Comprehensive validation at each step +- **Integration**: Chained operations for complex computations + +### ✅ Ready for Next Steps +Your tensor implementation is now ready to power: +- **Activations**: ReLU, Sigmoid, Tanh will operate on your tensors +- **Layers**: Dense layers will use tensor arithmetic +- **Networks**: Complete neural networks built on your foundation + +**Next Module**: Activations - Adding nonlinearity to enable complex learning! +""" +``` + +## Key Improvements Demonstrated + +### 1. **Progressive Structure** +- Each concept is explained, implemented, and tested before moving on +- Students get immediate feedback after each step +- No overwhelming amount of code without validation + +### 2. **Rich Scaffolding** +- Every TODO has step-by-step implementation guidance +- Example usage shows exactly what the function should do +- Implementation hints provide specific technical guidance +- Learning connections show how concepts fit together + +### 3. **Immediate Testing** +- Each function is tested immediately after implementation +- Tests provide clear success messages and specific achievements +- Integration tests show how concepts work together + +### 4. 
**Educational Flow** +- Concepts build logically from simple to complex +- Real-world motivation before technical implementation +- Visual examples and concrete cases before abstract theory + +## Implementation Steps for Other Modules + +1. **Identify natural breakpoints** in the current module +2. **Reorganize** into Step 1, Step 2, etc. with explanations +3. **Add rich TODO blocks** with step-by-step guidance +4. **Insert immediate testing** after each major concept +5. **Add success messages** and progress indicators +6. **Include learning connections** between concepts + +This transformation turns modules from reference material into guided learning experiences that maximize student success through immediate feedback and clear progression. \ No newline at end of file diff --git a/docs/development/testing-design.md b/docs/development/testing-design.md index 21814ace..fe5a3269 100644 --- a/docs/development/testing-design.md +++ b/docs/development/testing-design.md @@ -283,7 +283,7 @@ class TestBasicMLPipeline: ### Test Organization ``` modules/source/{module}/{module}_dev.py # Implementation + comprehensive inline tests -tests/test_{module}.py # Module tests with mocks (for grading) +tests/test_{module}.py # Package tests for exported functionality tests/integration/ # Cross-module tests with vetted solutions ``` diff --git a/modules/source/00_setup/README.md b/modules/source/00_setup/README.md index bef90efe..8d05c48c 100644 --- a/modules/source/00_setup/README.md +++ b/modules/source/00_setup/README.md @@ -77,7 +77,7 @@ Run the comprehensive test suite using pytest: tito test --module setup # Or directly with pytest -python -m pytest modules/setup/tests/test_setup.py -v +python -m pytest tests/test_setup.py -v ``` ### Test Coverage diff --git a/modules/source/01_tensor/tensor_dev_backup.py b/modules/source/01_tensor/tensor_dev_backup.py new file mode 100644 index 00000000..671aaf3a --- /dev/null +++ b/modules/source/01_tensor/tensor_dev_backup.py @@ -0,0 
+1,1536 @@ +# --- +# jupyter: +# jupytext: +# text_representation: +# extension: .py +# format_name: percent +# format_version: '1.3' +# jupytext_version: 1.17.1 +# --- + +# %% [markdown] +""" +# Module 1: Tensor - Core Data Structure + +Welcome to the Tensor module! This is where TinyTorch really begins. You'll implement the fundamental data structure that powers all ML systems. + +## Learning Goals +- Understand tensors as N-dimensional arrays with ML-specific operations +- Implement a complete Tensor class with arithmetic operations +- Handle shape management, data types, and memory layout +- Build the foundation for neural networks and automatic differentiation +- Master the NBGrader workflow with comprehensive testing + +## Build → Use → Understand +1. **Build**: Create the Tensor class with core operations +2. **Use**: Perform tensor arithmetic and transformations +3. **Understand**: How tensors form the foundation of ML systems +""" + +# %% nbgrader={"grade": false, "grade_id": "tensor-imports", "locked": false, "schema_version": 3, "solution": false, "task": false} +#| default_exp core.tensor + +#| export +import numpy as np +import sys +from typing import Union, List, Tuple, Optional, Any + +# %% nbgrader={"grade": false, "grade_id": "tensor-setup", "locked": false, "schema_version": 3, "solution": false, "task": false} +print("🔥 TinyTorch Tensor Module") +print(f"NumPy version: {np.__version__}") +print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}") +print("Ready to build tensors!") + +# %% [markdown] +""" +## 📦 Where This Code Lives in the Final Package + +**Learning Side:** You work in `modules/source/01_tensor/tensor_dev.py` +**Building Side:** Code exports to `tinytorch.core.tensor` + +```python +# Final package structure: +from tinytorch.core.tensor import Tensor # The foundation of everything! 
+from tinytorch.core.activations import ReLU, Sigmoid, Tanh +from tinytorch.core.layers import Dense, Conv2D +``` + +**Why this matters:** +- **Learning:** Focused modules for deep understanding +- **Production:** Proper organization like PyTorch's `torch.Tensor` +- **Consistency:** All tensor operations live together in `core.tensor` +- **Foundation:** Every other module depends on Tensor +""" + +# %% [markdown] +""" +## Step 1: What is a Tensor? + +### Definition +A **tensor** is an N-dimensional array with ML-specific operations. Think of it as a container that can hold data in multiple dimensions: + +- **Scalar** (0D): A single number - `5.0` +- **Vector** (1D): A list of numbers - `[1, 2, 3]` +- **Matrix** (2D): A 2D array - `[[1, 2], [3, 4]]` +- **Higher dimensions**: 3D, 4D, etc. for images, video, batches + +### The Mathematical Foundation: From Scalars to Tensors +Understanding tensors requires building from mathematical fundamentals: + +#### **Scalars (Rank 0)** +- **Definition**: A single number with no direction +- **Examples**: Temperature (25°C), mass (5.2 kg), probability (0.7) +- **Operations**: Addition, multiplication, comparison +- **ML Context**: Loss values, learning rates, regularization parameters + +#### **Vectors (Rank 1)** +- **Definition**: An ordered list of numbers with direction and magnitude +- **Examples**: Position [x, y, z], RGB color [255, 128, 0], word embedding [0.1, -0.5, 0.8] +- **Operations**: Dot product, cross product, norm calculation +- **ML Context**: Feature vectors, gradients, model parameters + +#### **Matrices (Rank 2)** +- **Definition**: A 2D array organizing data in rows and columns +- **Examples**: Image (height × width), weight matrix (input × output), covariance matrix +- **Operations**: Matrix multiplication, transpose, inverse, eigendecomposition +- **ML Context**: Linear layer weights, attention matrices, batch data + +#### **Higher-Order Tensors (Rank 3+)** +- **Definition**: Multi-dimensional arrays 
extending matrices +- **Examples**: + - **3D**: Video frames (time × height × width), RGB images (height × width × channels) + - **4D**: Image batches (batch × height × width × channels) + - **5D**: Video batches (batch × time × height × width × channels) +- **Operations**: Tensor products, contractions, decompositions +- **ML Context**: Convolutional features, RNN states, transformer attention + +### Why Tensors Matter in ML: The Computational Foundation + +#### **1. Unified Data Representation** +Tensors provide a consistent way to represent all ML data: +```python +# All of these are tensors with different shapes +scalar_loss = Tensor(0.5) # Shape: () +feature_vector = Tensor([1, 2, 3]) # Shape: (3,) +weight_matrix = Tensor([[1, 2], [3, 4]]) # Shape: (2, 2) +image_batch = Tensor(np.random.rand(32, 224, 224, 3)) # Shape: (32, 224, 224, 3) +``` + +#### **2. Efficient Batch Processing** +ML systems process multiple samples simultaneously: +```python +# Instead of processing one image at a time: +for image in images: + result = model(image) # Slow: 1000 separate operations + +# Process entire batch at once: +batch_result = model(image_batch) # Fast: 1 vectorized operation +``` + +#### **3. Hardware Acceleration** +Modern hardware (GPUs, TPUs) excels at tensor operations: +- **Parallel processing**: Multiple operations simultaneously +- **Vectorization**: SIMD (Single Instruction, Multiple Data) operations +- **Memory optimization**: Contiguous memory layout for cache efficiency + +#### **4. 
Automatic Differentiation** +Tensors enable gradient computation through computational graphs: +```python +# Each tensor operation creates a node in the computation graph +x = Tensor([1, 2, 3]) +y = x * 2 # Node: multiplication +z = y + 1 # Node: addition +loss = z.sum() # Node: summation +# Gradients flow backward through this graph +``` + +### Real-World Examples: Tensors in Action + +#### **Computer Vision** +- **Grayscale image**: 2D tensor `(height, width)` - `(28, 28)` for MNIST +- **Color image**: 3D tensor `(height, width, channels)` - `(224, 224, 3)` for RGB +- **Image batch**: 4D tensor `(batch, height, width, channels)` - `(32, 224, 224, 3)` +- **Video**: 5D tensor `(batch, time, height, width, channels)` + +#### **Natural Language Processing** +- **Word embedding**: 1D tensor `(embedding_dim,)` - `(300,)` for Word2Vec +- **Sentence**: 2D tensor `(sequence_length, embedding_dim)` - `(50, 768)` for BERT +- **Batch of sentences**: 3D tensor `(batch, sequence_length, embedding_dim)` + +#### **Audio Processing** +- **Audio signal**: 1D tensor `(time_steps,)` - `(16000,)` for 1 second at 16kHz +- **Spectrogram**: 2D tensor `(time_frames, frequency_bins)` +- **Batch of audio**: 3D tensor `(batch, time_steps, features)` + +#### **Time Series** +- **Single series**: 2D tensor `(time_steps, features)` +- **Multiple series**: 3D tensor `(batch, time_steps, features)` +- **Multivariate forecasting**: 4D tensor `(batch, time_steps, features, predictions)` + +### Why Not Just Use NumPy? + +While we use NumPy internally, our Tensor class adds ML-specific functionality: + +#### **1. ML-Specific Operations** +- **Gradient tracking**: For automatic differentiation (coming in Module 7) +- **GPU support**: For hardware acceleration (future extension) +- **Broadcasting semantics**: ML-friendly dimension handling + +#### **2. 
Consistent API** +- **Type safety**: Predictable behavior across operations +- **Error checking**: Clear error messages for debugging +- **Integration**: Seamless work with other TinyTorch components + +#### **3. Educational Value** +- **Conceptual clarity**: Understand what tensors really are +- **Implementation insight**: See how frameworks work internally +- **Debugging skills**: Trace through tensor operations step by step + +#### **4. Extensibility** +- **Future features**: Ready for gradients, GPU, distributed computing +- **Customization**: Add domain-specific operations +- **Optimization**: Profile and optimize specific use cases + +### Performance Considerations: Building Efficient Tensors + +#### **Memory Layout** +- **Contiguous arrays**: Better cache locality and performance +- **Data types**: `float32` vs `float64` trade-offs +- **Memory sharing**: Avoid unnecessary copies + +#### **Vectorization** +- **SIMD operations**: Single Instruction, Multiple Data +- **Broadcasting**: Efficient operations on different shapes +- **Batch operations**: Process multiple samples simultaneously + +#### **Numerical Stability** +- **Precision**: Balancing speed and accuracy +- **Overflow/underflow**: Handling extreme values +- **Gradient flow**: Maintaining numerical stability for training + +Let's start building our tensor foundation! 
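Before we do, the precision and memory-layout claims above are easy to verify with plain NumPy (the library our Tensor will wrap). A quick, self-contained check:

```python
import numpy as np

# float32 halves memory relative to float64 -- the precision trade-off above
a64 = np.ones((1000, 1000), dtype=np.float64)
a32 = a64.astype(np.float32)
print(a64.nbytes, a32.nbytes)  # 8000000 4000000

# Contiguous (row-major) layout is what makes cache-friendly access possible
print(a32.flags['C_CONTIGUOUS'])    # True
print(a32.T.flags['C_CONTIGUOUS'])  # False: the transpose is a strided view
```

This is why frameworks default to `float32` for training: half the memory and bandwidth of `float64`, with precision that is almost always sufficient for gradient-based optimization.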
+""" + +# %% [markdown] +""" +## 🧠 The Mathematical Foundation + +### Linear Algebra Refresher +Tensors are generalizations of scalars, vectors, and matrices: + +``` +Scalar (0D): 5 +Vector (1D): [1, 2, 3] +Matrix (2D): [[1, 2], [3, 4]] +Tensor (3D): [[[1, 2], [3, 4]], [[5, 6], [7, 8]]] +``` + +### Why This Matters for Neural Networks +- **Forward Pass**: Matrix multiplication between layers +- **Batch Processing**: Multiple samples processed simultaneously +- **Convolutions**: 3D operations on image data +- **Gradients**: Derivatives computed across all dimensions + +### Connection to Real ML Systems +Every major ML framework uses tensors: +- **PyTorch**: `torch.Tensor` +- **TensorFlow**: `tf.Tensor` +- **JAX**: `jax.numpy.ndarray` +- **TinyTorch**: `tinytorch.core.tensor.Tensor` (what we're building!) + +### Performance Considerations +- **Memory Layout**: Contiguous arrays for cache efficiency +- **Vectorization**: SIMD operations for speed +- **Broadcasting**: Efficient operations on different shapes +- **Type Consistency**: Avoiding unnecessary conversions +""" + +# %% [markdown] +""" +## Step 2: The Tensor Class Foundation + +### Core Concept: Wrapping NumPy with ML Intelligence +Our Tensor class wraps NumPy arrays with ML-specific functionality. This design pattern is used by all major ML frameworks: + +- **PyTorch**: `torch.Tensor` wraps ATen (C++ tensor library) +- **TensorFlow**: `tf.Tensor` wraps Eigen (C++ linear algebra library) +- **JAX**: `jax.numpy.ndarray` wraps XLA (Google's linear algebra compiler) +- **TinyTorch**: `Tensor` wraps NumPy (Python's numerical computing library) + +### Design Requirements Analysis + +#### **1. 
Input Flexibility** +Our tensor must handle diverse input types: +```python +# Scalars (Python numbers) +t1 = Tensor(5) # int → numpy array +t2 = Tensor(3.14) # float → numpy array + +# Lists (Python sequences) +t3 = Tensor([1, 2, 3]) # list → numpy array +t4 = Tensor([[1, 2], [3, 4]]) # nested list → 2D array + +# NumPy arrays (existing arrays) +t5 = Tensor(np.array([1, 2, 3])) # array → tensor wrapper +``` + +#### **2. Type Management** +ML systems need consistent, predictable types: +- **Default behavior**: Auto-detect appropriate types +- **Explicit control**: Allow manual type specification +- **Performance optimization**: Prefer `float32` over `float64` +- **Memory efficiency**: Use appropriate precision + +#### **3. Property Access** +Essential tensor properties for ML operations: +- **Shape**: Dimensions for compatibility checking +- **Size**: Total elements for memory estimation +- **Data type**: For numerical computation planning +- **Data access**: For integration with other libraries + +#### **4. Arithmetic Operations** +Support for mathematical operations: +- **Element-wise**: Addition, multiplication, subtraction, division +- **Broadcasting**: Operations on different shapes +- **Type promotion**: Consistent result types +- **Error handling**: Clear messages for incompatible operations + +### Implementation Strategy + +#### **Memory Management** +- **Copy vs. Reference**: When to copy data vs. 
share memory +- **Type conversion**: Efficient dtype changes +- **Contiguous layout**: Ensure optimal memory access patterns + +#### **Error Handling** +- **Input validation**: Check for valid input types +- **Shape compatibility**: Verify operations are mathematically valid +- **Informative messages**: Help users debug issues quickly + +#### **Performance Optimization** +- **Lazy evaluation**: Defer expensive operations when possible +- **Vectorization**: Use NumPy's optimized operations +- **Memory reuse**: Minimize unnecessary allocations + +### Learning Objectives for Implementation + +By implementing this Tensor class, you'll learn: +1. **Wrapper pattern**: How to extend existing libraries +2. **Type system design**: Managing data types in numerical computing +3. **API design**: Creating intuitive, consistent interfaces +4. **Performance considerations**: Balancing flexibility and speed +5. **Error handling**: Providing helpful feedback to users + +Let's implement our tensor foundation! +""" + +# %% nbgrader={"grade": false, "grade_id": "tensor-class", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +class Tensor: + """ + TinyTorch Tensor: N-dimensional array with ML operations. + + The fundamental data structure for all TinyTorch operations. + Wraps NumPy arrays with ML-specific functionality. + """ + + def __init__(self, data: Union[int, float, List, np.ndarray], dtype: Optional[str] = None): + """ + Create a new tensor from data. + + Args: + data: Input data (scalar, list, or numpy array) + dtype: Data type ('float32', 'int32', etc.). Defaults to auto-detect. + + TODO: Implement tensor creation with proper type handling. + + STEP-BY-STEP: + 1. Check if data is a scalar (int/float) - convert to numpy array + 2. Check if data is a list - convert to numpy array + 3. Check if data is already a numpy array - use as-is + 4. Apply dtype conversion if specified + 5. 
Store the result in self._data + + EXAMPLE: + Tensor(5) → stores np.array(5) + Tensor([1, 2, 3]) → stores np.array([1, 2, 3]) + Tensor(np.array([1, 2, 3])) → stores the array directly + + HINTS: + - Use isinstance() to check data types + - Use np.array() for conversion + - Handle dtype parameter for type conversion + - Store the array in self._data + """ + ### BEGIN SOLUTION + # Convert input to numpy array + if isinstance(data, (int, float, np.number)): + # Handle Python and NumPy scalars + if dtype is None: + # Auto-detect type: int for integers, float32 for floats + if isinstance(data, int) or (isinstance(data, np.number) and np.issubdtype(type(data), np.integer)): + dtype = 'int32' + else: + dtype = 'float32' + self._data = np.array(data, dtype=dtype) + elif isinstance(data, list): + # Let NumPy auto-detect type, then convert if needed + temp_array = np.array(data) + if dtype is None: + # Use NumPy's auto-detected type, but prefer float32 for floats + if temp_array.dtype == np.float64: + dtype = 'float32' + else: + dtype = str(temp_array.dtype) + self._data = np.array(data, dtype=dtype) + elif isinstance(data, np.ndarray): + # Already a numpy array + if dtype is None: + # Keep existing dtype, but prefer float32 for float64 + if data.dtype == np.float64: + dtype = 'float32' + else: + dtype = str(data.dtype) + self._data = data.astype(dtype) if dtype != data.dtype else data.copy() + else: + # Try to convert unknown types + self._data = np.array(data, dtype=dtype) + ### END SOLUTION + + @property + def data(self) -> np.ndarray: + """ + Access underlying numpy array. + + TODO: Return the stored numpy array. + + HINT: Return self._data (the array you stored in __init__) + """ + ### BEGIN SOLUTION + return self._data + ### END SOLUTION + + @property + def shape(self) -> Tuple[int, ...]: + """ + Get tensor shape. + + TODO: Return the shape of the stored numpy array. 
+ + HINT: Use .shape attribute of the numpy array + EXAMPLE: Tensor([1, 2, 3]).shape should return (3,) + """ + ### BEGIN SOLUTION + return self._data.shape + ### END SOLUTION + + @property + def size(self) -> int: + """ + Get total number of elements. + + TODO: Return the total number of elements in the tensor. + + HINT: Use .size attribute of the numpy array + EXAMPLE: Tensor([1, 2, 3]).size should return 3 + """ + ### BEGIN SOLUTION + return self._data.size + ### END SOLUTION + + @property + def dtype(self) -> np.dtype: + """ + Get data type as numpy dtype. + + TODO: Return the data type of the stored numpy array. + + HINT: Use .dtype attribute of the numpy array + EXAMPLE: Tensor([1, 2, 3]).dtype should return dtype('int32') + """ + ### BEGIN SOLUTION + return self._data.dtype + ### END SOLUTION + + def __repr__(self) -> str: + """ + String representation. + + TODO: Create a clear string representation of the tensor. + + APPROACH: + 1. Convert the numpy array to a list for readable output + 2. Include the shape and dtype information + 3. Format: "Tensor([data], shape=shape, dtype=dtype)" + + EXAMPLE: + Tensor([1, 2, 3]) → "Tensor([1, 2, 3], shape=(3,), dtype=int32)" + + HINTS: + - Use .tolist() to convert numpy array to list + - Include shape and dtype information + - Keep format consistent and readable + """ + ### BEGIN SOLUTION + return f"Tensor({self._data.tolist()}, shape={self.shape}, dtype={self.dtype})" + ### END SOLUTION + + def add(self, other: 'Tensor') -> 'Tensor': + """ + Add two tensors element-wise. + + TODO: Implement tensor addition. + + APPROACH: + 1. Add the numpy arrays using + + 2. Return a new Tensor with the result + 3. 
Handle broadcasting automatically + + EXAMPLE: + Tensor([1, 2]) + Tensor([3, 4]) → Tensor([4, 6]) + + HINTS: + - Use self._data + other._data + - Return Tensor(result) + - NumPy handles broadcasting automatically + """ + ### BEGIN SOLUTION + result = self._data + other._data + return Tensor(result) + ### END SOLUTION + + def multiply(self, other: 'Tensor') -> 'Tensor': + """ + Multiply two tensors element-wise. + + TODO: Implement tensor multiplication. + + APPROACH: + 1. Multiply the numpy arrays using * + 2. Return a new Tensor with the result + 3. Handle broadcasting automatically + + EXAMPLE: + Tensor([1, 2]) * Tensor([3, 4]) → Tensor([3, 8]) + + HINTS: + - Use self._data * other._data + - Return Tensor(result) + - This is element-wise, not matrix multiplication + """ + ### BEGIN SOLUTION + result = self._data * other._data + return Tensor(result) + ### END SOLUTION + + def __add__(self, other: Union['Tensor', int, float]) -> 'Tensor': + """ + Addition operator: tensor + other + + TODO: Implement + operator for tensors. + + APPROACH: + 1. If other is a Tensor, use tensor addition + 2. If other is a scalar, convert to Tensor first + 3. Return the result + + EXAMPLE: + Tensor([1, 2]) + Tensor([3, 4]) → Tensor([4, 6]) + Tensor([1, 2]) + 5 → Tensor([6, 7]) + """ + ### BEGIN SOLUTION + if isinstance(other, Tensor): + return self.add(other) + else: + return self.add(Tensor(other)) + ### END SOLUTION + + def __mul__(self, other: Union['Tensor', int, float]) -> 'Tensor': + """ + Multiplication operator: tensor * other + + TODO: Implement * operator for tensors. + + APPROACH: + 1. If other is a Tensor, use tensor multiplication + 2. If other is a scalar, convert to Tensor first + 3. 
Return the result + + EXAMPLE: + Tensor([1, 2]) * Tensor([3, 4]) → Tensor([3, 8]) + Tensor([1, 2]) * 3 → Tensor([3, 6]) + """ + ### BEGIN SOLUTION + if isinstance(other, Tensor): + return self.multiply(other) + else: + return self.multiply(Tensor(other)) + ### END SOLUTION + + def __sub__(self, other: Union['Tensor', int, float]) -> 'Tensor': + """ + Subtraction operator: tensor - other + + TODO: Implement - operator for tensors. + + APPROACH: + 1. Convert other to Tensor if needed + 2. Subtract using numpy arrays + 3. Return new Tensor with result + + EXAMPLE: + Tensor([5, 6]) - Tensor([1, 2]) → Tensor([4, 4]) + Tensor([5, 6]) - 1 → Tensor([4, 5]) + """ + ### BEGIN SOLUTION + if isinstance(other, Tensor): + result = self._data - other._data + else: + result = self._data - other + return Tensor(result) + ### END SOLUTION + + def __truediv__(self, other: Union['Tensor', int, float]) -> 'Tensor': + """ + Division operator: tensor / other + + TODO: Implement / operator for tensors. + + APPROACH: + 1. Convert other to Tensor if needed + 2. Divide using numpy arrays + 3. Return new Tensor with result + + EXAMPLE: + Tensor([6, 8]) / Tensor([2, 4]) → Tensor([3, 2]) + Tensor([6, 8]) / 2 → Tensor([3, 4]) + """ + ### BEGIN SOLUTION + if isinstance(other, Tensor): + result = self._data / other._data + else: + result = self._data / other + return Tensor(result) + ### END SOLUTION + +# %% [markdown] +""" +### 🧪 Unit Test: Tensor Creation + +Let's test your tensor creation implementation right away! This gives you immediate feedback on whether your `__init__` method works correctly. + +**This is a unit test** - it tests one specific function (tensor creation) in isolation. 
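The conversion rules your `__init__` leans on come straight from `np.array`. A quick sanity check of the NumPy behavior (illustration only, not graded):

```python
import numpy as np

# Shapes: scalars become 0-d arrays, nested lists become matrices
print(np.array(5).shape)                 # ()
print(np.array([1, 2, 3]).shape)         # (3,)
print(np.array([[1, 2], [3, 4]]).shape)  # (2, 2)

# dtype auto-detection: a single float promotes the whole array
print(np.array([1, 2, 3]).dtype.kind)    # 'i' (integer)
print(np.array([1, 2.5, 3]).dtype)       # float64 -- which Tensor downcasts to float32
```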
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-tensor-creation-immediate", "locked": true, "points": 5, "schema_version": 3, "solution": false, "task": false} +# Test tensor creation immediately after implementation +print("🔬 Unit Test: Tensor Creation...") + +# Test basic tensor creation +try: + # Test scalar + scalar = Tensor(5.0) + assert hasattr(scalar, '_data'), "Tensor should have _data attribute" + assert scalar._data.shape == (), f"Scalar should have shape (), got {scalar._data.shape}" + print("✅ Scalar creation works") + + # Test vector + vector = Tensor([1, 2, 3]) + assert vector._data.shape == (3,), f"Vector should have shape (3,), got {vector._data.shape}" + print("✅ Vector creation works") + + # Test matrix + matrix = Tensor([[1, 2], [3, 4]]) + assert matrix._data.shape == (2, 2), f"Matrix should have shape (2, 2), got {matrix._data.shape}" + print("✅ Matrix creation works") + + print("📈 Progress: Tensor Creation ✓") + +except Exception as e: + print(f"❌ Tensor creation test failed: {e}") + raise + +print("🎯 Tensor creation behavior:") +print(" Converts data to NumPy arrays") +print(" Preserves shape and data type") +print(" Stores in _data attribute") + +# %% [markdown] +""" +### 🧪 Unit Test: Tensor Properties + +Now let's test that your tensor properties work correctly. This tests the @property methods you implemented. + +**This is a unit test** - it tests specific properties (shape, size, dtype, data) in isolation. 
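Each property under test is a thin wrapper; the underlying NumPy attributes it delegates to behave like this:

```python
import numpy as np

# The attributes that Tensor.shape / .size / .dtype delegate to
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)   # (2, 3)
print(arr.size)    # 6
print(arr.dtype)   # int64 on most platforms (int32 on Windows)

# Invariant the later consistency test relies on: size is the product of the shape
print(arr.size == int(np.prod(arr.shape)))  # True
```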
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-tensor-properties-immediate", "locked": true, "points": 5, "schema_version": 3, "solution": false, "task": false} +# Test tensor properties immediately after implementation +print("🔬 Unit Test: Tensor Properties...") + +# Test properties with simple examples +try: + # Test with a simple matrix + tensor = Tensor([[1, 2, 3], [4, 5, 6]]) + + # Test shape property + assert tensor.shape == (2, 3), f"Shape should be (2, 3), got {tensor.shape}" + print("✅ Shape property works") + + # Test size property + assert tensor.size == 6, f"Size should be 6, got {tensor.size}" + print("✅ Size property works") + + # Test data property + assert np.array_equal(tensor.data, np.array([[1, 2, 3], [4, 5, 6]])), "Data property should return numpy array" + print("✅ Data property works") + + # Test dtype property + assert tensor.dtype in [np.int32, np.int64], f"Dtype should be int32 or int64, got {tensor.dtype}" + print("✅ Dtype property works") + + print("📈 Progress: Tensor Properties ✓") + +except Exception as e: + print(f"❌ Tensor properties test failed: {e}") + raise + +print("🎯 Tensor properties behavior:") +print(" shape: Returns tuple of dimensions") +print(" size: Returns total number of elements") +print(" data: Returns underlying NumPy array") +print(" dtype: Returns NumPy data type") + +# %% [markdown] +""" +### 🧪 Unit Test: Tensor Arithmetic + +Let's test your tensor arithmetic operations. This tests the __add__, __mul__, __sub__, __truediv__ methods. + +**This is a unit test** - it tests specific arithmetic operations in isolation. 
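The scalar cases below work because of NumPy broadcasting, which your operators inherit for free. The rules in miniature:

```python
import numpy as np

# A scalar broadcasts against any shape
v = np.array([1, 2, 3])
print(v + 10)            # [11 12 13]

# Size-1 dimensions stretch to match: (2, 1) with (3,) -> (2, 3)
col = np.array([[1], [2]])
print((col + v).shape)   # (2, 3)
print(col + v)           # [[2 3 4]
                         #  [3 4 5]]
```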
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-tensor-arithmetic-immediate", "locked": true, "points": 5, "schema_version": 3, "solution": false, "task": false} +# Test tensor arithmetic immediately after implementation +print("🔬 Unit Test: Tensor Arithmetic...") + +# Test basic arithmetic with simple examples +try: + # Test addition + a = Tensor([1, 2, 3]) + b = Tensor([4, 5, 6]) + result = a + b + expected = np.array([5, 7, 9]) + assert np.array_equal(result.data, expected), f"Addition failed: expected {expected}, got {result.data}" + print("✅ Addition works") + + # Test scalar addition + result_scalar = a + 10 + expected_scalar = np.array([11, 12, 13]) + assert np.array_equal(result_scalar.data, expected_scalar), f"Scalar addition failed: expected {expected_scalar}, got {result_scalar.data}" + print("✅ Scalar addition works") + + # Test multiplication + result_mul = a * b + expected_mul = np.array([4, 10, 18]) + assert np.array_equal(result_mul.data, expected_mul), f"Multiplication failed: expected {expected_mul}, got {result_mul.data}" + print("✅ Multiplication works") + + # Test scalar multiplication + result_scalar_mul = a * 2 + expected_scalar_mul = np.array([2, 4, 6]) + assert np.array_equal(result_scalar_mul.data, expected_scalar_mul), f"Scalar multiplication failed: expected {expected_scalar_mul}, got {result_scalar_mul.data}" + print("✅ Scalar multiplication works") + + print("📈 Progress: Tensor Arithmetic ✓") + +except Exception as e: + print(f"❌ Tensor arithmetic test failed: {e}") + raise + +print("🎯 Tensor arithmetic behavior:") +print(" Element-wise operations on tensors") +print(" Broadcasting with scalars") +print(" Returns new Tensor objects") + +# %% [markdown] +""" +### 🧪 Comprehensive Test: Tensor Creation + +Let's thoroughly test your tensor creation to make sure it handles all the cases you'll encounter in ML. +This tests the foundation of everything else we'll build. 
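Two edge cases worth previewing before you run it, shown with plain NumPy (the behavior your Tensor inherits):

```python
import numpy as np

# Empty input is valid: a length-0 vector, not an error
print(np.array([]).shape)              # (0,)

# Extreme magnitudes and mixed signs round-trip cleanly
print(np.array([1e6, 1e-6]).shape)     # (2,)
print(np.array([-1, 0, 1]).tolist())   # [-1, 0, 1]
```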
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-tensor-creation-comprehensive", "locked": true, "points": 15, "schema_version": 3, "solution": false, "task": false} +def test_tensor_creation_comprehensive(): + """Comprehensive test of tensor creation with all data types and shapes.""" + print("🔬 Testing comprehensive tensor creation...") + + tests_passed = 0 + total_tests = 8 + + # Test 1: Scalar creation (0D tensor) + try: + scalar_int = Tensor(42) + scalar_float = Tensor(3.14) + scalar_zero = Tensor(0) + + assert hasattr(scalar_int, '_data'), "Tensor should have _data attribute" + assert scalar_int._data.shape == (), f"Scalar should have shape (), got {scalar_int._data.shape}" + assert scalar_float._data.shape == (), f"Float scalar should have shape (), got {scalar_float._data.shape}" + assert scalar_zero._data.shape == (), f"Zero scalar should have shape (), got {scalar_zero._data.shape}" + + print("✅ Scalar creation: integers, floats, and zero") + tests_passed += 1 + except Exception as e: + print(f"❌ Scalar creation failed: {e}") + + # Test 2: Vector creation (1D tensor) + try: + vector_int = Tensor([1, 2, 3, 4, 5]) + vector_float = Tensor([1.0, 2.5, 3.7]) + vector_single = Tensor([42]) + vector_empty = Tensor([]) + + assert vector_int._data.shape == (5,), f"Int vector should have shape (5,), got {vector_int._data.shape}" + assert vector_float._data.shape == (3,), f"Float vector should have shape (3,), got {vector_float._data.shape}" + assert vector_single._data.shape == (1,), f"Single element vector should have shape (1,), got {vector_single._data.shape}" + assert vector_empty._data.shape == (0,), f"Empty vector should have shape (0,), got {vector_empty._data.shape}" + + print("✅ Vector creation: integers, floats, single element, and empty") + tests_passed += 1 + except Exception as e: + print(f"❌ Vector creation failed: {e}") + + # Test 3: Matrix creation (2D tensor) + try: + matrix_2x2 = Tensor([[1, 2], [3, 4]]) + matrix_3x2 = Tensor([[1, 2], [3, 
4], [5, 6]]) + matrix_1x3 = Tensor([[1, 2, 3]]) + + assert matrix_2x2._data.shape == (2, 2), f"2x2 matrix should have shape (2, 2), got {matrix_2x2._data.shape}" + assert matrix_3x2._data.shape == (3, 2), f"3x2 matrix should have shape (3, 2), got {matrix_3x2._data.shape}" + assert matrix_1x3._data.shape == (1, 3), f"1x3 matrix should have shape (1, 3), got {matrix_1x3._data.shape}" + + print("✅ Matrix creation: 2x2, 3x2, and 1x3 matrices") + tests_passed += 1 + except Exception as e: + print(f"❌ Matrix creation failed: {e}") + + # Test 4: Data type handling + try: + int_tensor = Tensor([1, 2, 3]) + float_tensor = Tensor([1.0, 2.0, 3.0]) + mixed_tensor = Tensor([1, 2.5, 3]) # Should convert to float + + # Check that data types are reasonable + assert int_tensor._data.dtype in [np.int32, np.int64], f"Int tensor has unexpected dtype: {int_tensor._data.dtype}" + assert float_tensor._data.dtype in [np.float32, np.float64], f"Float tensor has unexpected dtype: {float_tensor._data.dtype}" + assert mixed_tensor._data.dtype in [np.float32, np.float64], f"Mixed tensor should be float, got: {mixed_tensor._data.dtype}" + + print("✅ Data type handling: integers, floats, and mixed types") + tests_passed += 1 + except Exception as e: + print(f"❌ Data type handling failed: {e}") + + # Test 5: NumPy array input + try: + np_array = np.array([1, 2, 3, 4]) + tensor_from_np = Tensor(np_array) + + assert tensor_from_np._data.shape == (4,), f"Tensor from NumPy should have shape (4,), got {tensor_from_np._data.shape}" + assert np.array_equal(tensor_from_np._data, np_array), "Tensor from NumPy should preserve data" + + print("✅ NumPy array input: conversion works correctly") + tests_passed += 1 + except Exception as e: + print(f"❌ NumPy array input failed: {e}") + + # Test 6: Large tensor creation + try: + large_tensor = Tensor(list(range(1000))) + assert large_tensor._data.shape == (1000,), f"Large tensor should have shape (1000,), got {large_tensor._data.shape}" + assert 
large_tensor._data[0] == 0, "Large tensor should start with 0" + assert large_tensor._data[-1] == 999, "Large tensor should end with 999" + + print("✅ Large tensor creation: 1000 elements") + tests_passed += 1 + except Exception as e: + print(f"❌ Large tensor creation failed: {e}") + + # Test 7: Negative numbers + try: + negative_tensor = Tensor([-1, -2, -3]) + mixed_signs = Tensor([-1, 0, 1]) + + assert negative_tensor._data.shape == (3,), f"Negative tensor should have shape (3,), got {negative_tensor._data.shape}" + assert np.array_equal(negative_tensor._data, np.array([-1, -2, -3])), "Negative numbers should be preserved" + assert np.array_equal(mixed_signs._data, np.array([-1, 0, 1])), "Mixed signs should be preserved" + + print("✅ Negative numbers: handled correctly") + tests_passed += 1 + except Exception as e: + print(f"❌ Negative numbers failed: {e}") + + # Test 8: Edge cases + try: + # Very large numbers + big_tensor = Tensor([1e6, 1e-6]) + assert big_tensor._data.shape == (2,), "Big numbers tensor should have correct shape" + + # Zero tensor + zero_tensor = Tensor([0, 0, 0]) + assert np.all(zero_tensor._data == 0), "Zero tensor should contain all zeros" + + print("✅ Edge cases: large numbers and zeros") + tests_passed += 1 + except Exception as e: + print(f"❌ Edge cases failed: {e}") + + # Results summary + print(f"\n📊 Tensor Creation Results: {tests_passed}/{total_tests} tests passed") + + if tests_passed == total_tests: + print("🎉 All tensor creation tests passed! Your Tensor class can handle:") + print(" • Scalars, vectors, and matrices") + print(" • Different data types (int, float)") + print(" • NumPy arrays") + print(" • Large tensors and edge cases") + print("📈 Progress: Tensor Creation ✓") + return True + else: + print("⚠️ Some tensor creation tests failed. 
Common issues:") + print(" • Check your __init__ method implementation") + print(" • Make sure you're storing data in self._data") + print(" • Verify NumPy array conversion works correctly") + print(" • Test with different input types (int, float, list, np.array)") + return False + +# Run the comprehensive test +success = test_tensor_creation_comprehensive() + +# %% [markdown] +""" +### 🧪 Comprehensive Test: Tensor Properties + +Now let's test all the properties your tensor should have. These properties are essential for ML operations. +""" + +# %% nbgrader={"grade": true, "grade_id": "test-tensor-properties-comprehensive", "locked": true, "points": 15, "schema_version": 3, "solution": false, "task": false} +def test_tensor_properties_comprehensive(): + """Comprehensive test of tensor properties (shape, size, dtype, data access).""" + print("🔬 Testing comprehensive tensor properties...") + + tests_passed = 0 + total_tests = 6 + + # Test 1: Shape property + try: + scalar = Tensor(5.0) + vector = Tensor([1, 2, 3]) + matrix = Tensor([[1, 2], [3, 4]]) + tensor_3d = Tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) + + assert scalar.shape == (), f"Scalar shape should be (), got {scalar.shape}" + assert vector.shape == (3,), f"Vector shape should be (3,), got {vector.shape}" + assert matrix.shape == (2, 2), f"Matrix shape should be (2, 2), got {matrix.shape}" + assert tensor_3d.shape == (2, 2, 2), f"3D tensor shape should be (2, 2, 2), got {tensor_3d.shape}" + + print("✅ Shape property: scalar, vector, matrix, and 3D tensor") + tests_passed += 1 + except Exception as e: + print(f"❌ Shape property failed: {e}") + + # Test 2: Size property + try: + scalar = Tensor(5.0) + vector = Tensor([1, 2, 3]) + matrix = Tensor([[1, 2], [3, 4]]) + empty = Tensor([]) + + assert scalar.size == 1, f"Scalar size should be 1, got {scalar.size}" + assert vector.size == 3, f"Vector size should be 3, got {vector.size}" + assert matrix.size == 4, f"Matrix size should be 4, got {matrix.size}" + 
assert empty.size == 0, f"Empty tensor size should be 0, got {empty.size}" + + print("✅ Size property: scalar, vector, matrix, and empty tensor") + tests_passed += 1 + except Exception as e: + print(f"❌ Size property failed: {e}") + + # Test 3: Data type property + try: + int_tensor = Tensor([1, 2, 3]) + float_tensor = Tensor([1.0, 2.0, 3.0]) + + # Check that dtype is accessible and reasonable + assert hasattr(int_tensor, 'dtype'), "Tensor should have dtype property" + assert hasattr(float_tensor, 'dtype'), "Tensor should have dtype property" + + # Data types should be NumPy dtypes + assert isinstance(int_tensor.dtype, np.dtype), f"dtype should be np.dtype, got {type(int_tensor.dtype)}" + assert isinstance(float_tensor.dtype, np.dtype), f"dtype should be np.dtype, got {type(float_tensor.dtype)}" + + print(f"✅ Data type property: int tensor is {int_tensor.dtype}, float tensor is {float_tensor.dtype}") + tests_passed += 1 + except Exception as e: + print(f"❌ Data type property failed: {e}") + + # Test 4: Data access property + try: + scalar = Tensor(5.0) + vector = Tensor([1, 2, 3]) + matrix = Tensor([[1, 2], [3, 4]]) + + # Test data access + assert hasattr(scalar, 'data'), "Tensor should have data property" + assert hasattr(vector, 'data'), "Tensor should have data property" + assert hasattr(matrix, 'data'), "Tensor should have data property" + + # Test data content + assert scalar.data.item() == 5.0, f"Scalar data should be 5.0, got {scalar.data.item()}" + assert np.array_equal(vector.data, np.array([1, 2, 3])), "Vector data mismatch" + assert np.array_equal(matrix.data, np.array([[1, 2], [3, 4]])), "Matrix data mismatch" + + print("✅ Data access: scalar, vector, and matrix data retrieval") + tests_passed += 1 + except Exception as e: + print(f"❌ Data access failed: {e}") + + # Test 5: String representation + try: + scalar = Tensor(5.0) + vector = Tensor([1, 2, 3]) + + # Test that __repr__ works + scalar_str = str(scalar) + vector_str = str(vector) + + assert 
isinstance(scalar_str, str), "Tensor string representation should be a string" + assert isinstance(vector_str, str), "Tensor string representation should be a string" + assert len(scalar_str) > 0, "Tensor string representation should not be empty" + assert len(vector_str) > 0, "Tensor string representation should not be empty" + + print(f"✅ String representation: scalar={scalar_str[:50]}{'...' if len(scalar_str) > 50 else ''}") + tests_passed += 1 + except Exception as e: + print(f"❌ String representation failed: {e}") + + # Test 6: Property consistency + try: + test_cases = [ + Tensor(42), + Tensor([1, 2, 3, 4, 5]), + Tensor([[1, 2, 3], [4, 5, 6]]), + Tensor([]) + ] + + for i, tensor in enumerate(test_cases): + # Size should equal product of shape + expected_size = np.prod(tensor.shape) if tensor.shape else 1 + assert tensor.size == expected_size, f"Test case {i}: size {tensor.size} doesn't match shape {tensor.shape}" + + # Data shape should match tensor shape + assert tensor.data.shape == tensor.shape, f"Test case {i}: data shape {tensor.data.shape} doesn't match tensor shape {tensor.shape}" + + print("✅ Property consistency: size matches shape, data shape matches tensor shape") + tests_passed += 1 + except Exception as e: + print(f"❌ Property consistency failed: {e}") + + # Results summary + print(f"\n📊 Tensor Properties Results: {tests_passed}/{total_tests} tests passed") + + if tests_passed == total_tests: + print("🎉 All tensor property tests passed! Your tensor has:") + print(" • Correct shape property for all dimensions") + print(" • Accurate size calculation") + print(" • Proper data type handling") + print(" • Working data access") + print(" • Good string representation") + print("📈 Progress: Tensor Creation ✓, Properties ✓") + return True + else: + print("⚠️ Some property tests failed. 
Common issues:") + print(" • Check your @property decorators") + print(" • Verify shape returns self._data.shape") + print(" • Make sure size returns self._data.size") + print(" • Ensure dtype returns self._data.dtype") + print(" • Test your __repr__ method") + return False + +# Run the comprehensive test +success = test_tensor_properties_comprehensive() and success + +# %% [markdown] +""" +### 🧪 Comprehensive Test: Tensor Arithmetic + +Let's test all arithmetic operations. These are the foundation of neural network computations! +""" + +# %% nbgrader={"grade": true, "grade_id": "test-tensor-arithmetic-comprehensive", "locked": true, "points": 20, "schema_version": 3, "solution": false, "task": false} +def test_tensor_arithmetic_comprehensive(): + """Comprehensive test of tensor arithmetic operations.""" + print("🔬 Testing comprehensive tensor arithmetic...") + + tests_passed = 0 + total_tests = 8 + + # Test 1: Basic addition method + try: + a = Tensor([1, 2, 3]) + b = Tensor([4, 5, 6]) + c = a.add(b) + + expected = np.array([5, 7, 9]) + assert np.array_equal(c.data, expected), f"Addition method failed: expected {expected}, got {c.data}" + assert isinstance(c, Tensor), "Addition should return a Tensor" + + print(f"✅ Addition method: {a.data} + {b.data} = {c.data}") + tests_passed += 1 + except Exception as e: + print(f"❌ Addition method failed: {e}") + + # Test 2: Basic multiplication method + try: + a = Tensor([1, 2, 3]) + b = Tensor([4, 5, 6]) + c = a.multiply(b) + + expected = np.array([4, 10, 18]) + assert np.array_equal(c.data, expected), f"Multiplication method failed: expected {expected}, got {c.data}" + assert isinstance(c, Tensor), "Multiplication should return a Tensor" + + print(f"✅ Multiplication method: {a.data} * {b.data} = {c.data}") + tests_passed += 1 + except Exception as e: + print(f"❌ Multiplication method failed: {e}") + + # Test 3: Addition operator (+) + try: + a = Tensor([1, 2, 3]) + b = Tensor([4, 5, 6]) + c = a + b + + expected = 
np.array([5, 7, 9]) + assert np.array_equal(c.data, expected), f"+ operator failed: expected {expected}, got {c.data}" + assert isinstance(c, Tensor), "+ operator should return a Tensor" + + print(f"✅ + operator: {a.data} + {b.data} = {c.data}") + tests_passed += 1 + except Exception as e: + print(f"❌ + operator failed: {e}") + + # Test 4: Multiplication operator (*) + try: + a = Tensor([1, 2, 3]) + b = Tensor([4, 5, 6]) + c = a * b + + expected = np.array([4, 10, 18]) + assert np.array_equal(c.data, expected), f"* operator failed: expected {expected}, got {c.data}" + assert isinstance(c, Tensor), "* operator should return a Tensor" + + print(f"✅ * operator: {a.data} * {b.data} = {c.data}") + tests_passed += 1 + except Exception as e: + print(f"❌ * operator failed: {e}") + + # Test 5: Subtraction operator (-) + try: + a = Tensor([1, 2, 3]) + b = Tensor([4, 5, 6]) + c = b - a + + expected = np.array([3, 3, 3]) + assert np.array_equal(c.data, expected), f"- operator failed: expected {expected}, got {c.data}" + assert isinstance(c, Tensor), "- operator should return a Tensor" + + print(f"✅ - operator: {b.data} - {a.data} = {c.data}") + tests_passed += 1 + except Exception as e: + print(f"❌ - operator failed: {e}") + + # Test 6: Division operator (/) + try: + a = Tensor([1, 2, 4]) + b = Tensor([2, 4, 8]) + c = b / a + + expected = np.array([2.0, 2.0, 2.0]) + assert np.allclose(c.data, expected), f"/ operator failed: expected {expected}, got {c.data}" + assert isinstance(c, Tensor), "/ operator should return a Tensor" + + print(f"✅ / operator: {b.data} / {a.data} = {c.data}") + tests_passed += 1 + except Exception as e: + print(f"❌ / operator failed: {e}") + + # Test 7: Scalar operations + try: + a = Tensor([1, 2, 3]) + + # Addition with scalar + b = a + 10 + expected_add = np.array([11, 12, 13]) + assert np.array_equal(b.data, expected_add), f"Scalar addition failed: expected {expected_add}, got {b.data}" + + # Multiplication with scalar + c = a * 2 + expected_mul = 
np.array([2, 4, 6])
+        assert np.array_equal(c.data, expected_mul), f"Scalar multiplication failed: expected {expected_mul}, got {c.data}"
+
+        # Subtraction with scalar
+        d = a - 1
+        expected_sub = np.array([0, 1, 2])
+        assert np.array_equal(d.data, expected_sub), f"Scalar subtraction failed: expected {expected_sub}, got {d.data}"
+
+        # Division with scalar
+        e = a / 2
+        expected_div = np.array([0.5, 1.0, 1.5])
+        assert np.allclose(e.data, expected_div), f"Scalar division failed: expected {expected_div}, got {e.data}"
+
+        print("✅ Scalar operations: +10, *2, -1, /2 all work correctly")
+        tests_passed += 1
+    except Exception as e:
+        print(f"❌ Scalar operations failed: {e}")
+
+    # Test 8: Matrix operations
+    try:
+        matrix_a = Tensor([[1, 2], [3, 4]])
+        matrix_b = Tensor([[5, 6], [7, 8]])
+
+        # Matrix addition
+        c = matrix_a + matrix_b
+        expected = np.array([[6, 8], [10, 12]])
+        assert np.array_equal(c.data, expected), f"Matrix addition failed: expected {expected}, got {c.data}"
+        assert c.shape == (2, 2), f"Matrix addition should preserve shape, got {c.shape}"
+
+        # Element-wise multiplication (note: * is element-wise, NOT matrix multiplication)
+        d = matrix_a * matrix_b
+        expected_mul = np.array([[5, 12], [21, 32]])
+        assert np.array_equal(d.data, expected_mul), f"Element-wise multiplication failed: expected {expected_mul}, got {d.data}"
+
+        print("✅ Matrix operations: 2x2 matrix addition and element-wise multiplication")
+        tests_passed += 1
+    except Exception as e:
+        print(f"❌ Matrix operations failed: {e}")
+
+    # Results summary
+    print(f"\n📊 Tensor Arithmetic Results: {tests_passed}/{total_tests} tests passed")
+
+    if tests_passed == total_tests:
+        print("🎉 All tensor arithmetic tests passed! 
Your tensor supports:") + print(" • Basic methods: add(), multiply()") + print(" • Python operators: +, -, *, /") + print(" • Scalar operations: tensor + number") + print(" • Matrix operations: element-wise operations") + print("📈 Progress: Tensor Creation ✓, Properties ✓, Arithmetic ✓") + return True + else: + print("⚠️ Some arithmetic tests failed. Common issues:") + print(" • Check your add() and multiply() methods") + print(" • Verify operator overloading (__add__, __mul__, __sub__, __truediv__)") + print(" • Make sure scalar operations work (convert scalar to Tensor)") + print(" • Test with different tensor shapes") + return False + +# Run the comprehensive test +success = test_tensor_arithmetic_comprehensive() and success + +# %% [markdown] +""" +### 🧪 Final Integration Test: Real ML Scenario + +Let's test your tensor with a realistic machine learning scenario to make sure everything works together. +""" + +# %% nbgrader={"grade": true, "grade_id": "test-tensor-integration", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false} +def test_tensor_integration(): + """Integration test with realistic ML scenario.""" + print("🔬 Testing tensor integration with ML scenario...") + + try: + print("🧠 Simulating a simple neural network forward pass...") + + # Simulate input data (batch of 2 samples, 3 features each) + X = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]) + print(f"📊 Input data shape: {X.shape}") + + # Simulate weights (3 input features, 2 output neurons) + W = Tensor([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]) + print(f"🎯 Weights shape: {W.shape}") + + # Simulate bias (2 output neurons) + b = Tensor([0.1, 0.2]) + print(f"⚖️ Bias shape: {b.shape}") + + # Simple linear transformation: y = X * W + b + # Note: This is a simplified version - real matrix multiplication would be different + # But we can test element-wise operations + + # Test that we can do basic operations needed for ML + sample = Tensor([1.0, 2.0, 3.0]) # Single sample + 
weight_col = Tensor([0.1, 0.3, 0.5]) # First column of weights + + # Compute dot product manually using element-wise operations + products = sample * weight_col # Element-wise multiplication + print(f"✅ Element-wise multiplication works: {products.data}") + + # Test addition for bias + result = products + Tensor([0.1, 0.1, 0.1]) + print(f"✅ Bias addition works: {result.data}") + + # Test with different shapes + matrix_a = Tensor([[1, 2], [3, 4]]) + matrix_b = Tensor([[0.1, 0.2], [0.3, 0.4]]) + matrix_result = matrix_a * matrix_b + print(f"✅ Matrix operations work: {matrix_result.data}") + + # Test scalar operations (common in ML) + scaled = sample * 0.5 # Learning rate scaling + print(f"✅ Scalar scaling works: {scaled.data}") + + # Test normalization-like operations + mean_val = Tensor([2.0, 2.0, 2.0]) # Simulate mean + normalized = sample - mean_val + print(f"✅ Mean subtraction works: {normalized.data}") + + print("\n🎉 Integration test passed! Your tensor class can handle:") + print(" • Multi-dimensional data (batches, features)") + print(" • Element-wise operations needed for ML") + print(" • Scalar operations (learning rates, normalization)") + print(" • Matrix operations (weights, transformations)") + print("📈 Progress: All tensor functionality ✓") + print("🚀 Ready for neural network layers!") + + return True + + except Exception as e: + print(f"❌ Integration test failed: {e}") + print("\n💡 This suggests an issue with:") + print(" • Basic tensor operations not working together") + print(" • Shape handling problems") + print(" • Arithmetic operation implementation") + print(" • Check your tensor creation and arithmetic methods") + return False + +# Run the integration test +success = test_tensor_integration() and success + +# Print final summary +print(f"\n{'='*60}") +print("🎯 TENSOR MODULE TESTING COMPLETE") +print(f"{'='*60}") + +if success: + print("🎉 CONGRATULATIONS! 
All tensor tests passed!")
+    print("\n✅ Your Tensor class successfully implements:")
+    print("   • Comprehensive tensor creation (scalars, vectors, matrices)")
+    print("   • All essential properties (shape, size, dtype, data access)")
+    print("   • Complete arithmetic operations (methods and operators)")
+    print("   • Scalar and matrix operations")
+    print("   • Real ML scenario compatibility")
+    print("\n🚀 You're ready to move to the next module!")
+    print("📈 Final Progress: Tensor Module ✓ COMPLETE")
+else:
+    print("⚠️ Some tests failed. Please review the error messages above.")
+    print("\n🔧 To fix issues:")
+    print("   1. Check the specific test that failed")
+    print("   2. Review the error message and hints")
+    print("   3. Fix your implementation")
+    print("   4. Re-run the notebook cells")
+    print("\n💪 Don't give up! Debugging is part of learning.")
+
+# %% [markdown]
+"""
+## Step 3: Tensor Arithmetic Operations
+
+### Why Arithmetic Matters
+Tensor arithmetic is the foundation of all neural network operations:
+- **Forward pass**: Matrix multiplications and additions
+- **Activation functions**: Element-wise operations
+- **Loss computation**: Differences and squares
+- **Gradient computation**: Chain rule applications
+
+### Operations We'll Implement
+- **Addition**: Element-wise addition of tensors
+- **Multiplication**: Element-wise multiplication
+- **Python operators**: `+`, `-`, `*`, `/` for natural syntax
+- **Broadcasting**: Handle different shapes automatically
+"""
+
+# %% [markdown]
+"""
+### Step 3 Recap: Arithmetic Methods
+
+The arithmetic methods are now part of the Tensor class above. Let's test them!
+"""
+
+# %% [markdown]
+"""
+## Step 4: Python Operator Overloading
+
+### Why Operator Overloading?
+Python's magic methods allow us to use natural syntax:
+- `a + b` instead of `a.add(b)`
+- `a * b` instead of `a.multiply(b)`
+- `a - b` for subtraction
+- `a / b` for division
+
+This makes tensor operations feel natural and readable.
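To make the delegation concrete, here is a minimal, self-contained sketch of the pattern. It uses a stand-in `MiniTensor` class (not the real TinyTorch `Tensor`) just to show how the magic methods can forward to the named arithmetic methods:

```python
import numpy as np

class MiniTensor:
    """A tiny stand-in for the full Tensor class, just to illustrate operator overloading."""
    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float32)

    def add(self, other):
        # Accept either a MiniTensor or a plain scalar/array
        other = other if isinstance(other, MiniTensor) else MiniTensor(other)
        return MiniTensor(self.data + other.data)

    def multiply(self, other):
        other = other if isinstance(other, MiniTensor) else MiniTensor(other)
        return MiniTensor(self.data * other.data)

    # Magic methods simply delegate to the named methods above
    def __add__(self, other):
        return self.add(other)

    def __mul__(self, other):
        return self.multiply(other)

a = MiniTensor([1, 2, 3])
b = MiniTensor([4, 5, 6])
print((a + b).data)  # [5. 7. 9.]
print((a * 2).data)  # [2. 4. 6.]
```

Delegating from `__add__` to `add()` keeps the numeric logic in one place, so fixing a bug in `add()` automatically fixes `+` as well.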
+"""
+
+# %% [markdown]
+"""
+### Step 4 Recap: Operator Overloading
+
+The operator methods (`__add__`, `__mul__`, `__sub__`, `__truediv__`) are now part of the Tensor class above. This enables natural syntax like `a + b` and `a * b`.
+"""
+
+# %% [markdown]
+"""
+### 🧪 Test Your Tensor Implementation
+
+Once you implement the Tensor class above, run these cells to test your implementation:
+"""
+
+# %% nbgrader={"grade": true, "grade_id": "test-tensor-creation", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false}
+# Test tensor creation and properties
+print("Testing tensor creation...")
+
+# Test scalar creation
+scalar = Tensor(5.0)
+assert scalar.shape == (), f"Scalar shape should be (), got {scalar.shape}"
+assert scalar.size == 1, f"Scalar size should be 1, got {scalar.size}"
+assert scalar.data.item() == 5.0, f"Scalar value should be 5.0, got {scalar.data.item()}"
+
+# Test vector creation
+vector = Tensor([1, 2, 3])
+assert vector.shape == (3,), f"Vector shape should be (3,), got {vector.shape}"
+assert vector.size == 3, f"Vector size should be 3, got {vector.size}"
+assert np.array_equal(vector.data, np.array([1, 2, 3])), "Vector data mismatch"
+
+# Test matrix creation
+matrix = Tensor([[1, 2], [3, 4]])
+assert matrix.shape == (2, 2), f"Matrix shape should be (2, 2), got {matrix.shape}"
+assert matrix.size == 4, f"Matrix size should be 4, got {matrix.size}"
+assert np.array_equal(matrix.data, np.array([[1, 2], [3, 4]])), "Matrix data mismatch"
+
+# Test dtype handling
+float_tensor = Tensor([1.0, 2.0, 3.0])
+assert float_tensor.dtype == np.float32, f"Float tensor dtype should be float32, got {float_tensor.dtype}"
+
+int_tensor = Tensor([1, 2, 3])
+# Note: NumPy may default to int64 on some systems, so we check for integer types
+assert int_tensor.dtype in [np.int32, np.int64], f"Int tensor dtype should be int32 or int64, got {int_tensor.dtype}"
+
+print("✅ Tensor creation tests passed!")
+print(f"✅ Scalar: {scalar}")
+print(f"✅ Vector: 
{vector}") +print(f"✅ Matrix: {matrix}") + +# %% nbgrader={"grade": true, "grade_id": "test-tensor-arithmetic", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +# Test tensor arithmetic operations +print("Testing tensor arithmetic...") + +# Test addition +a = Tensor([1, 2, 3]) +b = Tensor([4, 5, 6]) +c = a + b +expected = np.array([5, 7, 9]) +assert np.array_equal(c.data, expected), f"Addition failed: expected {expected}, got {c.data}" + +# Test multiplication +d = a * b +expected = np.array([4, 10, 18]) +assert np.array_equal(d.data, expected), f"Multiplication failed: expected {expected}, got {d.data}" + +# Test subtraction +e = b - a +expected = np.array([3, 3, 3]) +assert np.array_equal(e.data, expected), f"Subtraction failed: expected {expected}, got {e.data}" + +# Test division +f = b / a +expected = np.array([4.0, 2.5, 2.0]) +assert np.allclose(f.data, expected), f"Division failed: expected {expected}, got {f.data}" + +# Test scalar operations +g = a + 10 +expected = np.array([11, 12, 13]) +assert np.array_equal(g.data, expected), f"Scalar addition failed: expected {expected}, got {g.data}" + +h = a * 2 +expected = np.array([2, 4, 6]) +assert np.array_equal(h.data, expected), f"Scalar multiplication failed: expected {expected}, got {h.data}" + +print("✅ Tensor arithmetic tests passed!") +print(f"✅ Addition: {a} + {b} = {c}") +print(f"✅ Multiplication: {a} * {b} = {d}") +print(f"✅ Subtraction: {b} - {a} = {e}") +print(f"✅ Division: {b} / {a} = {f}") + +# %% nbgrader={"grade": true, "grade_id": "test-tensor-broadcasting", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +# Test tensor broadcasting +print("Testing tensor broadcasting...") + +# Test scalar broadcasting +matrix = Tensor([[1, 2], [3, 4]]) +scalar = Tensor(10) +result = matrix + scalar +expected = np.array([[11, 12], [13, 14]]) +assert np.array_equal(result.data, expected), f"Scalar broadcasting failed: expected {expected}, 
got {result.data}"
+
+# Test vector broadcasting
+vector = Tensor([1, 2])
+result = matrix + vector
+expected = np.array([[2, 4], [4, 6]])
+assert np.array_equal(result.data, expected), f"Vector broadcasting failed: expected {expected}, got {result.data}"
+
+# Test different shapes
+a = Tensor([[1], [2], [3]])  # (3, 1)
+b = Tensor([10, 20])         # (2,)
+result = a + b
+expected = np.array([[11, 21], [12, 22], [13, 23]])
+assert np.array_equal(result.data, expected), f"Shape broadcasting failed: expected {expected}, got {result.data}"
+
+print("✅ Tensor broadcasting tests passed!")
+print(f"✅ Shape broadcasting: (3, 1) + (2,) gives shape {result.shape}")
+print("✅ Broadcasting works correctly!")
+
+# %% [markdown]
+"""
+## 🎯 Module Summary
+
+Congratulations! You've successfully implemented the core Tensor class for TinyTorch:
+
+### What You've Accomplished
+✅ **Tensor Creation**: Handle scalars, vectors, matrices, and higher-dimensional arrays
+✅ **Data Types**: Proper dtype handling with auto-detection and conversion
+✅ **Properties**: Shape, size, dtype, and data access
+✅ **Arithmetic**: Addition, multiplication, subtraction, division
+✅ **Operators**: Natural Python syntax with `+`, `-`, `*`, `/`
+✅ **Broadcasting**: Automatic shape compatibility like NumPy
+
+### Key Concepts You've Learned
+- **Tensors** are the fundamental data structure for ML systems
+- **NumPy backend** provides efficient computation with ML-friendly API
+- **Operator overloading** makes tensor operations feel natural
+- **Broadcasting** enables flexible operations between different shapes
+- **Type safety** ensures consistent behavior across operations
+
+### Next Steps
+1. **Export your code**: `tito package nbdev --export 01_tensor`
+2. **Test your implementation**: `tito module test 01_tensor`
+3. **Use your tensors**:
+   ```python
+   from tinytorch.core.tensor import Tensor
+   t = Tensor([1, 2, 3])
+   print(t + 5)  # Your tensor in action!
+   ```
+4. 
**Move to Module 2**: Start building activation functions! + +**Ready for the next challenge?** Let's add the mathematical functions that make neural networks powerful! +""" \ No newline at end of file diff --git a/modules/source/01_tensor/tests/test_tensor.py b/modules/source/01_tensor/tests/test_tensor.py deleted file mode 100644 index 1b182af8..00000000 --- a/modules/source/01_tensor/tests/test_tensor.py +++ /dev/null @@ -1,337 +0,0 @@ -""" -Test suite for the tensor module. -This tests the student implementations to ensure they work correctly. -""" - -import pytest -import numpy as np -import sys -import os - -# Import from the main package (rock solid foundation) -from tinytorch.core.tensor import Tensor - -def safe_numpy(tensor): - """Get numpy array from tensor, using .numpy() if available, otherwise .data""" - if hasattr(tensor, 'numpy'): - return tensor.numpy() - else: - return tensor.data - -def safe_item(tensor): - """Get scalar value from tensor, using .item() if available, otherwise .data""" - if hasattr(tensor, 'item'): - return tensor.item() - else: - return float(tensor.data) - -class TestTensorCreation: - """Test tensor creation from different data types.""" - - def test_scalar_creation(self): - """Test creating tensors from scalars.""" - # Float scalar - t1 = Tensor(5.0) - assert t1.shape == () - assert t1.size == 1 - assert safe_item(t1) == 5.0 - - # Integer scalar - t2 = Tensor(42) - assert t2.shape == () - assert t2.size == 1 - assert safe_item(t2) == 42.0 # Should convert to float32 - - def test_vector_creation(self): - """Test creating 1D tensors.""" - t = Tensor([1, 2, 3, 4]) - assert t.shape == (4,) - assert t.size == 4 - assert t.dtype == np.int32 # Integer list defaults to int32 - np.testing.assert_array_equal(safe_numpy(t), [1, 2, 3, 4]) - - def test_matrix_creation(self): - """Test creating 2D tensors.""" - t = Tensor([[1, 2], [3, 4]]) - assert t.shape == (2, 2) - assert t.size == 4 - expected = np.array([[1.0, 2.0], [3.0, 4.0]], 
dtype='float32') - np.testing.assert_array_equal(safe_numpy(t), expected) - - def test_numpy_array_creation(self): - """Test creating tensors from numpy arrays.""" - arr = np.array([1, 2, 3], dtype='int32') - t = Tensor(arr) - assert t.shape == (3,) - assert t.dtype in ['int32', 'float32'] # May convert - - def test_dtype_specification(self): - """Test explicit dtype specification.""" - t = Tensor([1, 2, 3], dtype='int32') - assert t.dtype == np.int32 - - def test_invalid_data_type(self): - """Test error handling for invalid data types.""" - with pytest.raises(TypeError): - Tensor("invalid") - with pytest.raises(TypeError): - Tensor({"dict": "invalid"}) - -class TestTensorProperties: - """Test tensor properties and methods.""" - - def test_shape_property(self): - """Test shape property for different dimensions.""" - assert Tensor(5).shape == () - assert Tensor([1, 2, 3]).shape == (3,) - assert Tensor([[1, 2], [3, 4]]).shape == (2, 2) - assert Tensor([[[1]]]).shape == (1, 1, 1) - - def test_size_property(self): - """Test size property.""" - assert Tensor(5).size == 1 - assert Tensor([1, 2, 3]).size == 3 - assert Tensor([[1, 2], [3, 4]]).size == 4 - assert Tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]).size == 8 - - def test_dtype_property(self): - """Test dtype property.""" - t1 = Tensor(5.0) - assert t1.dtype == np.float32 - - t2 = Tensor([1, 2, 3], dtype='int32') - assert t2.dtype == np.int32 - - def test_repr(self): - """Test string representation.""" - t = Tensor([1, 2, 3]) - repr_str = repr(t) - assert 'Tensor' in repr_str - assert 'shape=' in repr_str - assert 'dtype=' in repr_str - -class TestArithmeticOperations: - """Test tensor arithmetic operations.""" - - def test_tensor_addition(self): - """Test tensor + tensor addition.""" - a = Tensor([1, 2, 3]) - b = Tensor([4, 5, 6]) - result = a + b - expected = [5.0, 7.0, 9.0] - np.testing.assert_array_equal(safe_numpy(result), expected) - - def test_scalar_addition(self): - """Test tensor + scalar addition.""" - a 
= Tensor([1, 2, 3]) - result = a + 10 - expected = [11.0, 12.0, 13.0] - np.testing.assert_array_equal(safe_numpy(result), expected) - - def test_reverse_addition(self): - """Test scalar + tensor addition.""" - a = Tensor([1, 2, 3]) - result = 10 + a - expected = [11.0, 12.0, 13.0] - np.testing.assert_array_equal(safe_numpy(result), expected) - - def test_tensor_subtraction(self): - """Test tensor - tensor subtraction.""" - a = Tensor([5, 7, 9]) - b = Tensor([1, 2, 3]) - result = a - b - expected = [4.0, 5.0, 6.0] - np.testing.assert_array_equal(safe_numpy(result), expected) - - def test_scalar_subtraction(self): - """Test tensor - scalar subtraction.""" - a = Tensor([10, 20, 30]) - result = a - 5 - expected = [5.0, 15.0, 25.0] - np.testing.assert_array_equal(safe_numpy(result), expected) - - def test_tensor_multiplication(self): - """Test tensor * tensor multiplication.""" - a = Tensor([2, 3, 4]) - b = Tensor([5, 6, 7]) - result = a * b - expected = [10.0, 18.0, 28.0] - np.testing.assert_array_equal(safe_numpy(result), expected) - - def test_scalar_multiplication(self): - """Test tensor * scalar multiplication.""" - a = Tensor([1, 2, 3]) - result = a * 3 - expected = [3.0, 6.0, 9.0] - np.testing.assert_array_equal(safe_numpy(result), expected) - - def test_reverse_multiplication(self): - """Test scalar * tensor multiplication.""" - a = Tensor([1, 2, 3]) - result = 3 * a - expected = [3.0, 6.0, 9.0] - np.testing.assert_array_equal(safe_numpy(result), expected) - - def test_tensor_division(self): - """Test tensor / tensor division.""" - a = Tensor([6, 8, 10]) - b = Tensor([2, 4, 5]) - result = a / b - expected = [3.0, 2.0, 2.0] - np.testing.assert_array_equal(safe_numpy(result), expected) - - def test_scalar_division(self): - """Test tensor / scalar division.""" - a = Tensor([6, 8, 10]) - result = a / 2 - expected = [3.0, 4.0, 5.0] - np.testing.assert_array_equal(safe_numpy(result), expected) - -class TestUtilityMethods: - """Test tensor utility methods (stretch 
goals for students).""" - - def test_reshape(self): - """Test tensor reshaping (if implemented).""" - t = Tensor([[1, 2], [3, 4]]) - if hasattr(t, 'reshape'): - reshaped = t.reshape(4) - assert reshaped.shape == (4,) - expected = [1.0, 2.0, 3.0, 4.0] - np.testing.assert_array_equal(safe_numpy(reshaped), expected) - - # Reshape to 2D - reshaped2 = t.reshape(1, 4) - assert reshaped2.shape == (1, 4) - else: - pytest.skip("reshape method not implemented - stretch goal for students") - - def test_transpose(self): - """Test tensor transpose (if implemented).""" - t = Tensor([[1, 2, 3], [4, 5, 6]]) - if hasattr(t, 'transpose'): - transposed = t.transpose() - assert transposed.shape == (3, 2) - expected = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]] - np.testing.assert_array_equal(safe_numpy(transposed), expected) - else: - pytest.skip("transpose method not implemented - stretch goal for students") - - def test_sum_all(self): - """Test summing all elements (if implemented).""" - t = Tensor([[1, 2], [3, 4]]) - if hasattr(t, 'sum'): - result = t.sum() - expected = 10.0 - assert abs(safe_item(result) - expected) < 1e-6 - else: - pytest.skip("sum method not implemented - stretch goal for students") - - def test_sum_axis(self): - """Test summing along specific axes (if implemented).""" - t = Tensor([[1, 2], [3, 4]]) - if hasattr(t, 'sum'): - # Sum along axis 0 (columns) - sum0 = t.sum(axis=0) - expected0 = [4.0, 6.0] - np.testing.assert_array_equal(safe_numpy(sum0), expected0) - - # Sum along axis 1 (rows) - sum1 = t.sum(axis=1) - expected1 = [3.0, 7.0] - np.testing.assert_array_equal(safe_numpy(sum1), expected1) - else: - pytest.skip("sum method not implemented - stretch goal for students") - - def test_mean(self): - """Test mean calculation (if implemented).""" - t = Tensor([[1, 2], [3, 4]]) - if hasattr(t, 'mean'): - result = t.mean() - expected = 2.5 - assert abs(safe_item(result) - expected) < 1e-6 - else: - pytest.skip("mean method not implemented - stretch goal for students") - 
- def test_max(self): - """Test maximum value (if implemented).""" - t = Tensor([[1, 2], [3, 4]]) - if hasattr(t, 'max'): - result = t.max() - expected = 4.0 - assert abs(safe_item(result) - expected) < 1e-6 - else: - pytest.skip("max method not implemented - stretch goal for students") - - def test_min(self): - """Test minimum value (if implemented).""" - t = Tensor([[1, 2], [3, 4]]) - if hasattr(t, 'min'): - result = t.min() - expected = 1.0 - assert abs(safe_item(result) - expected) < 1e-6 - else: - pytest.skip("min method not implemented - stretch goal for students") - - def test_item_scalar(self): - """Test converting single-element tensor to scalar (if implemented).""" - t = Tensor(42.0) - if hasattr(t, 'item'): - assert t.item() == 42.0 - else: - pytest.skip("item method not implemented - stretch goal for students") - - def test_item_error(self): - """Test item() error for multi-element tensors (if implemented).""" - t = Tensor([1, 2, 3]) - if hasattr(t, 'item'): - with pytest.raises(ValueError): - t.item() - else: - pytest.skip("item method not implemented - stretch goal for students") - - def test_numpy_conversion(self): - """Test converting tensor to numpy array (if implemented).""" - t = Tensor([[1, 2], [3, 4]]) - if hasattr(t, 'numpy'): - arr = t.numpy() - assert isinstance(arr, np.ndarray) - expected = [[1.0, 2.0], [3.0, 4.0]] - np.testing.assert_array_equal(arr, expected) - else: - pytest.skip("numpy method not implemented - stretch goal for students") - -class TestEdgeCases: - """Test edge cases and error handling.""" - - def test_empty_list(self): - """Test creating tensor from empty list.""" - t = Tensor([]) - assert t.shape == (0,) - assert t.size == 0 - - def test_mixed_operations(self): - """Test combining different operations.""" - a = Tensor([[1, 2], [3, 4]]) - b = Tensor([[2, 2], [2, 2]]) - - # Complex expression - result = (a + b) * 2 - 1 - expected = [[5.0, 7.0], [9.0, 11.0]] - np.testing.assert_array_equal(safe_numpy(result), expected) - - 
def test_chained_operations(self):
-        """Test chaining multiple operations (if methods implemented)."""
-        t = Tensor([[1, 2, 3], [4, 5, 6]])
-        if hasattr(t, 'sum') and hasattr(t, 'mean'):
-            result = t.sum(axis=1).mean()
-            expected = 10.5  # (6 + 15) / 2
-            assert abs(safe_item(result) - expected) < 1e-6
-        else:
-            pytest.skip("Advanced methods not implemented - stretch goal for students")
-
-def run_tensor_tests():
-    """Run all tensor tests."""
-    pytest.main([__file__, "-v"])
-
-if __name__ == "__main__":
-    run_tensor_tests()
\ No newline at end of file
diff --git a/modules/source/02_activations/activations_dev.py b/modules/source/02_activations/activations_dev.py
index 517bd559..8acd9f64 100644
--- a/modules/source/02_activations/activations_dev.py
+++ b/modules/source/02_activations/activations_dev.py
@@ -230,11 +230,11 @@ Once you implement the ReLU forward method above, run this cell to test it:
 def test_relu_activation():
     """Test ReLU activation function"""
     print("Testing ReLU activation...")
-    
-    # Create ReLU instance
-    relu = ReLU()
-    
-    # Test with mixed positive/negative values
+
+    # Create ReLU instance
+    relu = ReLU()
+
+    # Test with mixed positive/negative values
     test_input = Tensor([[-2, -1, 0, 1, 2]])
     result = relu(test_input)
     expected = np.array([[0, 0, 0, 1, 2]])
@@ -368,10 +368,10 @@ Once you implement the Sigmoid forward method above, run this cell to test it:
 def test_sigmoid_activation():
     """Test Sigmoid activation function"""
     print("Testing Sigmoid activation...")
-    
-    # Create Sigmoid instance
-    sigmoid = Sigmoid()
-    
+
+    # Create Sigmoid instance
+    sigmoid = Sigmoid()
+
     # Test with known values
     test_input = Tensor([[0]])
     result = sigmoid(test_input)
@@ -514,10 +514,10 @@ Once you implement the Tanh forward method above, run this cell to test it:
 def test_tanh_activation():
     """Test Tanh activation function"""
     print("Testing Tanh activation...")
-    
-    # Create Tanh instance
-    tanh = Tanh()
-    
+
+    # Create Tanh instance
+    tanh = Tanh()
+
     # Test with zero (should
be 0)
 test_input = Tensor([[0]])
 result = tanh(test_input)
@@ -676,10 +676,10 @@ Once you implement the Softmax forward method above, run this cell to test it:
 def test_softmax_activation():
     """Test Softmax activation function"""
     print("Testing Softmax activation...")
-    
-    # Create Softmax instance
-    softmax = Softmax()
-    
+
+    # Create Softmax instance
+    softmax = Softmax()
+
     # Test with simple input
     test_input = Tensor([[1, 2, 3]])
     result = softmax(test_input)
@@ -718,8 +718,8 @@ def test_softmax_activation():
         large_sum = np.sum(large_result.data)
         assert abs(large_sum - 1.0) < 1e-6, "Large values should still sum to 1"
-    
-    # Test shape preservation
+
+    # Test shape preservation
     assert batch_result.shape == batch_input.shape, "Softmax should preserve shape"
 
     print("✅ Softmax activation tests passed!")
@@ -751,9 +751,9 @@ def test_activations_integration():
     print("Testing activation functions integration...")
 
     # Create instances of all activation functions
-    relu = ReLU()
-    sigmoid = Sigmoid()
-    tanh = Tanh()
+    relu = ReLU()
+    sigmoid = Sigmoid()
+    tanh = Tanh()
     softmax = Softmax()
 
     # Test data: simulating neural network layer outputs
@@ -791,7 +791,7 @@ def test_activations_integration():
     # Test Softmax properties
     softmax_sum = np.sum(softmax_result.data)
     assert abs(softmax_sum - 1.0) < 1e-6, "Softmax outputs should sum to 1"
-    
+
     # Test chaining activations (realistic neural network scenario)
     # Hidden layer with ReLU
     hidden_output = relu(test_data)
@@ -815,8 +815,8 @@ def test_activations_integration():
     ])
     batch_softmax = softmax(batch_data)
-    
-    # Each row should sum to 1
+
+    # Each row should sum to 1
     for i in range(batch_data.shape[0]):
         row_sum = np.sum(batch_softmax.data[i])
         assert abs(row_sum - 1.0) < 1e-6, f"Batch row {i} should sum to 1"
diff --git a/modules/source/02_activations/tests/test_activations.py b/modules/source/02_activations/tests/test_activations.py
deleted file mode 100644
index c7b5fbb4..00000000
--- 
a/modules/source/02_activations/tests/test_activations.py +++ /dev/null @@ -1,332 +0,0 @@ -""" -Test suite for the activations module. -This tests the student implementations to ensure they work correctly. -""" - -import pytest -import numpy as np -import sys -import os - -# Import from the main package (rock solid foundation) -from tinytorch.core.tensor import Tensor -from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax - - -class TestReLU: - """Test the ReLU activation function.""" - - def test_relu_basic_functionality(self): - """Test basic ReLU behavior: max(0, x)""" - relu = ReLU() - - # Test mixed positive/negative values - x = Tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]]) - y = relu(x) - expected = np.array([[0.0, 0.0, 0.0, 1.0, 2.0]]) - - assert np.allclose(y.data, expected), f"Expected {expected}, got {y.data}" - - def test_relu_all_positive(self): - """Test ReLU with all positive values (should be unchanged)""" - relu = ReLU() - - x = Tensor([[1.0, 2.5, 3.7, 10.0]]) - y = relu(x) - - assert np.allclose(y.data, x.data), "ReLU should preserve positive values" - - def test_relu_all_negative(self): - """Test ReLU with all negative values (should be zeros)""" - relu = ReLU() - - x = Tensor([[-1.0, -2.5, -3.7, -10.0]]) - y = relu(x) - expected = np.zeros_like(x.data) - - assert np.allclose(y.data, expected), "ReLU should zero out negative values" - - def test_relu_zero_input(self): - """Test ReLU with zero input""" - relu = ReLU() - - x = Tensor([[0.0]]) - y = relu(x) - - assert y.data[0, 0] == 0.0, "ReLU(0) should be 0" - - def test_relu_shape_preservation(self): - """Test that ReLU preserves tensor shape""" - relu = ReLU() - - # Test different shapes - shapes = [(1, 5), (2, 3), (4, 1), (3, 3)] - for shape in shapes: - x = Tensor(np.random.randn(*shape)) - y = relu(x) - assert y.shape == x.shape, f"Shape mismatch: expected {x.shape}, got {y.shape}" - - def test_relu_callable(self): - """Test that ReLU can be called directly""" - relu = ReLU() - x = 
Tensor([[1.0, -1.0]]) - - y1 = relu(x) - y2 = relu.forward(x) - - assert np.allclose(y1.data, y2.data), "Direct call should match forward method" - - -class TestSigmoid: - """Test the Sigmoid activation function.""" - - def test_sigmoid_basic_functionality(self): - """Test basic Sigmoid behavior""" - sigmoid = Sigmoid() - - # Test known values - x = Tensor([[0.0]]) - y = sigmoid(x) - assert abs(y.data[0, 0] - 0.5) < 1e-6, "Sigmoid(0) should be 0.5" - - def test_sigmoid_range(self): - """Test that Sigmoid outputs are in (0, 1)""" - sigmoid = Sigmoid() - - # Test wide range of inputs - x = Tensor([[-10.0, -5.0, -1.0, 0.0, 1.0, 5.0, 10.0]]) - y = sigmoid(x) - - assert np.all(y.data > 0), "Sigmoid outputs should be > 0" - assert np.all(y.data < 1), "Sigmoid outputs should be < 1" - - def test_sigmoid_numerical_stability(self): - """Test Sigmoid with extreme values (numerical stability)""" - sigmoid = Sigmoid() - - # Test extreme values that could cause overflow - x = Tensor([[-100.0, -50.0, 50.0, 100.0]]) - y = sigmoid(x) - - # Should not contain NaN or inf - assert not np.any(np.isnan(y.data)), "Sigmoid should not produce NaN" - assert not np.any(np.isinf(y.data)), "Sigmoid should not produce inf" - - # Should be close to 0 for very negative, close to 1 for very positive - assert y.data[0, 0] < 1e-10, "Sigmoid(-100) should be very close to 0" - assert y.data[0, 1] < 1e-10, "Sigmoid(-50) should be very close to 0" - assert y.data[0, 2] > 1 - 1e-10, "Sigmoid(50) should be very close to 1" - assert y.data[0, 3] > 1 - 1e-10, "Sigmoid(100) should be very close to 1" - - def test_sigmoid_monotonicity(self): - """Test that Sigmoid is monotonically increasing""" - sigmoid = Sigmoid() - - x = Tensor([[-3.0, -1.0, 0.0, 1.0, 3.0]]) - y = sigmoid(x) - - # Check that outputs are increasing - for i in range(len(y.data[0]) - 1): - assert y.data[0, i] < y.data[0, i + 1], "Sigmoid should be monotonically increasing" - - def test_sigmoid_shape_preservation(self): - """Test that Sigmoid 
preserves tensor shape""" - sigmoid = Sigmoid() - - shapes = [(1, 5), (2, 3), (4, 1)] - for shape in shapes: - x = Tensor(np.random.randn(*shape)) - y = sigmoid(x) - assert y.shape == x.shape, f"Shape mismatch: expected {x.shape}, got {y.shape}" - - def test_sigmoid_callable(self): - """Test that Sigmoid can be called directly""" - sigmoid = Sigmoid() - x = Tensor([[1.0, -1.0]]) - - y1 = sigmoid(x) - y2 = sigmoid.forward(x) - - assert np.allclose(y1.data, y2.data), "Direct call should match forward method" - - -class TestTanh: - """Test the Tanh activation function.""" - - def test_tanh_basic_functionality(self): - """Test basic Tanh behavior""" - tanh = Tanh() - - # Test known values - x = Tensor([[0.0]]) - y = tanh(x) - assert abs(y.data[0, 0] - 0.0) < 1e-6, "Tanh(0) should be 0" - - def test_tanh_range(self): - """Test that Tanh outputs are in [-1, 1]""" - tanh = Tanh() - - # Test wide range of inputs - x = Tensor([[-10.0, -5.0, -1.0, 0.0, 1.0, 5.0, 10.0]]) - y = tanh(x) - - assert np.all(y.data >= -1), "Tanh outputs should be >= -1" - assert np.all(y.data <= 1), "Tanh outputs should be <= 1" - - def test_tanh_symmetry(self): - """Test that Tanh is symmetric: tanh(-x) = -tanh(x)""" - tanh = Tanh() - - x = Tensor([[1.0, 2.0, 3.0]]) - x_neg = Tensor([[-1.0, -2.0, -3.0]]) - - y_pos = tanh(x) - y_neg = tanh(x_neg) - - assert np.allclose(y_neg.data, -y_pos.data), "Tanh should be symmetric" - - def test_tanh_monotonicity(self): - """Test that Tanh is monotonically increasing""" - tanh = Tanh() - - x = Tensor([[-3.0, -1.0, 0.0, 1.0, 3.0]]) - y = tanh(x) - - # Check that outputs are increasing - for i in range(len(y.data[0]) - 1): - assert y.data[0, i] < y.data[0, i + 1], "Tanh should be monotonically increasing" - - def test_tanh_extreme_values(self): - """Test Tanh with extreme values""" - tanh = Tanh() - - x = Tensor([[-100.0, 100.0]]) - y = tanh(x) - - # Should be close to -1 and 1 respectively - assert abs(y.data[0, 0] - (-1.0)) < 1e-10, "Tanh(-100) should be very 
close to -1" - assert abs(y.data[0, 1] - 1.0) < 1e-10, "Tanh(100) should be very close to 1" - - def test_tanh_shape_preservation(self): - """Test that Tanh preserves tensor shape""" - tanh = Tanh() - - shapes = [(1, 5), (2, 3), (4, 1)] - for shape in shapes: - x = Tensor(np.random.randn(*shape)) - y = tanh(x) - assert y.shape == x.shape, f"Shape mismatch: expected {x.shape}, got {y.shape}" - - def test_tanh_callable(self): - """Test that Tanh can be called directly""" - tanh = Tanh() - x = Tensor([[1.0, -1.0]]) - - y1 = tanh(x) - y2 = tanh.forward(x) - - assert np.allclose(y1.data, y2.data), "Direct call should match forward method" - - -class TestActivationComparison: - """Test interactions and comparisons between activation functions.""" - - def test_activation_consistency(self): - """Test that all activations work with the same input""" - relu = ReLU() - sigmoid = Sigmoid() - tanh = Tanh() - - x = Tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]]) - - # All should process without error - y_relu = relu(x) - y_sigmoid = sigmoid(x) - y_tanh = tanh(x) - - # All should preserve shape - assert y_relu.shape == x.shape - assert y_sigmoid.shape == x.shape - assert y_tanh.shape == x.shape - - def test_activation_ranges(self): - """Test that activations have expected output ranges""" - relu = ReLU() - sigmoid = Sigmoid() - tanh = Tanh() - - x = Tensor([[-5.0, -2.0, 0.0, 2.0, 5.0]]) - - y_relu = relu(x) - y_sigmoid = sigmoid(x) - y_tanh = tanh(x) - - # ReLU: [0, inf) - assert np.all(y_relu.data >= 0), "ReLU should be non-negative" - - # Sigmoid: (0, 1) - assert np.all(y_sigmoid.data > 0), "Sigmoid should be positive" - assert np.all(y_sigmoid.data < 1), "Sigmoid should be less than 1" - - # Tanh: (-1, 1) - assert np.all(y_tanh.data > -1), "Tanh should be greater than -1" - assert np.all(y_tanh.data < 1), "Tanh should be less than 1" - - -# Integration tests with edge cases -class TestActivationEdgeCases: - """Test edge cases and boundary conditions.""" - - def test_zero_tensor(self): - 
"""Test all activations with zero tensor""" - relu = ReLU() - sigmoid = Sigmoid() - tanh = Tanh() - - x = Tensor([[0.0, 0.0, 0.0]]) - - y_relu = relu(x) - y_sigmoid = sigmoid(x) - y_tanh = tanh(x) - - assert np.allclose(y_relu.data, [0.0, 0.0, 0.0]), "ReLU(0) should be 0" - assert np.allclose(y_sigmoid.data, [0.5, 0.5, 0.5]), "Sigmoid(0) should be 0.5" - assert np.allclose(y_tanh.data, [0.0, 0.0, 0.0]), "Tanh(0) should be 0" - - def test_single_element_tensor(self): - """Test all activations with single element tensor""" - relu = ReLU() - sigmoid = Sigmoid() - tanh = Tanh() - - x = Tensor([[1.0]]) - - y_relu = relu(x) - y_sigmoid = sigmoid(x) - y_tanh = tanh(x) - - assert y_relu.shape == (1, 1) - assert y_sigmoid.shape == (1, 1) - assert y_tanh.shape == (1, 1) - - def test_large_tensor(self): - """Test activations with larger tensors""" - relu = ReLU() - sigmoid = Sigmoid() - tanh = Tanh() - - # Create a 10x10 tensor - x = Tensor(np.random.randn(10, 10)) - - y_relu = relu(x) - y_sigmoid = sigmoid(x) - y_tanh = tanh(x) - - assert y_relu.shape == (10, 10) - assert y_sigmoid.shape == (10, 10) - assert y_tanh.shape == (10, 10) - - -if __name__ == "__main__": - # Run tests with pytest - pytest.main([__file__, "-v"]) \ No newline at end of file diff --git a/modules/source/03_layers/layers_dev.py b/modules/source/03_layers/layers_dev.py index 8a7e362f..bbc89ed5 100644 --- a/modules/source/03_layers/layers_dev.py +++ b/modules/source/03_layers/layers_dev.py @@ -46,8 +46,8 @@ except ImportError: sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor')) sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations')) try: - from tensor_dev import Tensor - from activations_dev import ReLU, Sigmoid, Tanh, Softmax + from tensor_dev import Tensor + from activations_dev import ReLU, Sigmoid, Tanh, Softmax except ImportError: # If the local modules are not available, use relative imports from ..tensor.tensor_dev import Tensor @@ -188,7 +188,7 @@ 
def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray: Naive matrix multiplication using explicit for-loops. This helps you understand what matrix multiplication really does! - + TODO: Implement matrix multiplication using three nested for-loops. STEP-BY-STEP IMPLEMENTATION: @@ -259,8 +259,8 @@ Once you implement the `matmul_naive` function above, run this cell to test it: def test_matrix_multiplication(): """Test matrix multiplication implementation""" print("Testing matrix multiplication...") - - # Test simple 2x2 case + + # Test simple 2x2 case A = np.array([[1, 2], [3, 4]], dtype=np.float32) B = np.array([[5, 6], [7, 8]], dtype=np.float32) @@ -272,8 +272,8 @@ def test_matrix_multiplication(): # Compare with NumPy numpy_result = A @ B assert np.allclose(result, numpy_result), f"Doesn't match NumPy: got {result}, expected {numpy_result}" - - # Test different shapes + + # Test different shapes A2 = np.array([[1, 2, 3]], dtype=np.float32) # 1x3 B2 = np.array([[4], [5], [6]], dtype=np.float32) # 3x1 result2 = matmul_naive(A2, B2) @@ -423,7 +423,7 @@ class Dense: else: self.bias = None ### END SOLUTION - + def forward(self, x: Tensor) -> Tensor: """ Forward pass through the Dense layer. 
@@ -472,7 +472,7 @@ class Dense: return Tensor(linear_output) ### END SOLUTION - + def __call__(self, x: Tensor) -> Tensor: """Make the layer callable: layer(x) instead of layer.forward(x)""" return self.forward(x) @@ -509,8 +509,8 @@ def test_dense_layer(): batch_output = layer(batch_input) assert batch_output.shape == (2, 2), f"Batch output shape should be (2, 2), got {batch_output.shape}" - - # Test without bias + + # Test without bias no_bias_layer = Dense(input_size=3, output_size=2, use_bias=False) assert no_bias_layer.bias is None, "Layer without bias should have None bias" @@ -538,7 +538,7 @@ def test_dense_layer(): scaled_output = layer(scaled_input) # Due to bias, this won't be exactly 2*output, but the linear part should scale - print("✅ Dense layer tests passed!") + print("✅ Dense layer tests passed!") print(f"✅ Correct weight and bias initialization") print(f"✅ Forward pass produces correct shapes") print(f"✅ Batch processing works correctly") @@ -582,7 +582,7 @@ def test_layer_activation_integration(): # Create layer and activation functions layer = Dense(input_size=4, output_size=3) - relu = ReLU() + relu = ReLU() sigmoid = Sigmoid() tanh = Tanh() softmax = Softmax() diff --git a/modules/source/03_layers/layers_dev_backup.py b/modules/source/03_layers/layers_dev_backup.py new file mode 100644 index 00000000..576016f5 --- /dev/null +++ b/modules/source/03_layers/layers_dev_backup.py @@ -0,0 +1,1286 @@ +# --- +# jupyter: +# jupytext: +# text_representation: +# extension: .py +# format_name: percent +# format_version: '1.3' +# jupytext_version: 1.17.1 +# --- + +# %% [markdown] +""" +# Module 3: Layers - Building Blocks of Neural Networks + +Welcome to the Layers module! This is where we build the fundamental components that stack together to form neural networks. 
+ +## Learning Goals +- Understand how matrix multiplication powers neural networks +- Implement naive matrix multiplication from scratch for deep understanding +- Build the Dense (Linear) layer - the foundation of all neural networks +- Learn weight initialization strategies and their importance +- See how layers compose with activations to create powerful networks + +## Build → Use → Understand +1. **Build**: Matrix multiplication and Dense layers from scratch +2. **Use**: Create and test layers with real data +3. **Understand**: How linear transformations enable feature learning +""" + +# %% nbgrader={"grade": false, "grade_id": "layers-imports", "locked": false, "schema_version": 3, "solution": false, "task": false} +#| default_exp core.layers + +#| export +import numpy as np +import matplotlib.pyplot as plt +import os +import sys +from typing import Union, List, Tuple, Optional + +# Import our dependencies - try from package first, then local modules +try: + from tinytorch.core.tensor import Tensor + from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax +except ImportError: + # For development, import from local modules + sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor')) + sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations')) + from tensor_dev import Tensor + from activations_dev import ReLU, Sigmoid, Tanh, Softmax + +# %% nbgrader={"grade": false, "grade_id": "layers-setup", "locked": false, "schema_version": 3, "solution": false, "task": false} +#| hide +#| export +def _should_show_plots(): + """Check if we should show plots (disable during testing)""" + # Check multiple conditions that indicate we're in test mode + is_pytest = ( + 'pytest' in sys.modules or + 'test' in sys.argv or + os.environ.get('PYTEST_CURRENT_TEST') is not None or + any('test' in arg for arg in sys.argv) or + any('pytest' in arg for arg in sys.argv) + ) + + # Show plots in development mode (when not in test mode) + 
return not is_pytest + +# %% nbgrader={"grade": false, "grade_id": "layers-welcome", "locked": false, "schema_version": 3, "solution": false, "task": false} +print("🔥 TinyTorch Layers Module") +print(f"NumPy version: {np.__version__}") +print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}") +print("Ready to build neural network layers!") + +# %% [markdown] +""" +## 📦 Where This Code Lives in the Final Package + +**Learning Side:** You work in `modules/source/03_layers/layers_dev.py` +**Building Side:** Code exports to `tinytorch.core.layers` + +```python +# Final package structure: +from tinytorch.core.layers import Dense, Conv2D # All layer types together! +from tinytorch.core.tensor import Tensor # The foundation +from tinytorch.core.activations import ReLU, Sigmoid # Nonlinearity +``` + +**Why this matters:** +- **Learning:** Focused modules for deep understanding +- **Production:** Proper organization like PyTorch's `torch.nn.Linear` +- **Consistency:** All layer types live together in `core.layers` +- **Integration:** Works seamlessly with tensors and activations +""" + +# %% [markdown] +""" +## 🧠 The Mathematical Foundation of Neural Layers + +### Linear Algebra at the Heart of ML +Neural networks are fundamentally about **linear transformations** followed by **nonlinear activations**: + +$$\text{Layer: } y = Wx + b \text{ (linear transformation)}$$ +$$\text{Activation: } z = \sigma(y) \text{ (nonlinear transformation)}$$ + +### Matrix Multiplication: The Engine of Deep Learning +Every forward pass in a neural network involves matrix multiplication: +- **Dense layers**: Matrix multiplication between inputs and weights +- **Convolutional layers**: Convolution as matrix multiplication +- **Attention**: Query-key-value matrix operations +- **Transformers**: Self-attention through matrix operations + +### Why Matrix Multiplication Matters +- **Parallel computation**: GPUs excel at matrix operations +- **Batch processing**: Handle multiple 
samples simultaneously +- **Feature learning**: Each row/column learns different patterns +- **Composability**: Layers stack naturally through matrix chains + +### Connection to Real ML Systems +Every framework optimizes matrix multiplication: +- **PyTorch**: `torch.nn.Linear` uses optimized BLAS +- **TensorFlow**: `tf.keras.layers.Dense` uses cuDNN +- **JAX**: `jax.numpy.dot` uses XLA compilation +- **TinyTorch**: `tinytorch.core.layers.Dense` (what we're building!) + +### Performance Considerations +- **Memory layout**: Contiguous arrays for cache efficiency +- **Vectorization**: SIMD operations for speed +- **Parallelization**: Multi-threading and GPU acceleration +- **Numerical stability**: Proper initialization and normalization +""" + +# %% [markdown] +""" +## Step 1: Understanding Matrix Multiplication + +### What is Matrix Multiplication? +Matrix multiplication is the **fundamental operation** that powers neural networks. When we multiply matrices A and B: + +$$C = A \times B$$ + +Each element $C_{i,j}$ is the **dot product** of row $i$ from A and column $j$ from B. 
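
The element-wise definition above can be checked directly: each output element is the dot product of one row of A with one column of B. Here is a minimal standalone NumPy sketch (independent of the TinyTorch code in this diff) that builds C that way and confirms it agrees with NumPy's built-in matmul:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

# Build C element by element: C[i, j] = dot(row i of A, column j of B)
C = np.empty((A.shape[0], B.shape[1]))
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        C[i, j] = np.dot(A[i, :], B[:, j])

assert np.allclose(C, A @ B)  # agrees with NumPy's optimized matmul
```

This is the same triple-loop idea the module's `matmul_naive` exercise asks for, with the innermost k-loop folded into `np.dot`.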
+ +### The Mathematical Foundation: Linear Algebra in Neural Networks + +#### **Why Matrix Multiplication in Neural Networks?** +Neural networks are fundamentally about **linear transformations** followed by **nonlinear activations**: + +```python +# The core neural network operation: +linear_output = weights @ input + bias # Linear transformation (matrix multiplication) +activation_output = activation_function(linear_output) # Nonlinear transformation +``` + +#### **The Geometric Interpretation** +Matrix multiplication represents **geometric transformations** in high-dimensional space: + +- **Rotation**: Changing the orientation of data +- **Scaling**: Stretching or compressing along certain dimensions +- **Projection**: Mapping to lower or higher dimensional spaces +- **Translation**: Shifting data (via bias terms) + +#### **Why This Matters for Learning** +Each layer learns to transform the input space to make the final task easier: + +```python +# Example: Image classification +raw_pixels → [Layer 1] → edges → [Layer 2] → shapes → [Layer 3] → objects → [Layer 4] → classes +``` + +### The Computational Perspective + +#### **Batch Processing Power** +Matrix multiplication enables efficient batch processing: + +```python +# Single sample (inefficient): +for sample in batch: + output = weights @ sample + bias # Process one at a time + +# Batch processing (efficient): +batch_output = weights @ batch + bias # Process all samples simultaneously +``` + +#### **Parallelization Benefits** +- **CPU**: Multiple cores can compute different parts simultaneously +- **GPU**: Thousands of cores excel at matrix operations +- **TPU**: Specialized hardware designed for matrix multiplication +- **Memory**: Contiguous memory access patterns improve cache efficiency + +#### **Computational Complexity** +For matrices A(m×n) and B(n×p): +- **Time complexity**: O(mnp) - cubic in the worst case +- **Space complexity**: O(mp) - for the output matrix +- **Optimization**: Modern libraries 
use optimized algorithms (Strassen, etc.) + +### Real-World Applications: Where Matrix Multiplication Shines + +#### **Computer Vision** +```python +# Convolutional layers can be expressed as matrix multiplication: +# Image patches → Matrix A +# Convolutional filters → Matrix B +# Feature maps → Matrix C = A @ B +``` + +#### **Natural Language Processing** +```python +# Transformer attention mechanism: +# Query matrix Q, Key matrix K, Value matrix V +# Attention weights = softmax(Q @ K.T / sqrt(d_k)) +# Output = Attention_weights @ V +``` + +#### **Recommendation Systems** +```python +# Matrix factorization: +# User-item matrix R ≈ User_factors @ Item_factors.T +# Collaborative filtering through matrix operations +``` + +### The Algorithm: Understanding Every Step + +For matrices A(m×n) and B(n×p) → C(m×p): +```python +for i in range(m): # For each row of A + for j in range(p): # For each column of B + for k in range(n): # Compute dot product + C[i,j] += A[i,k] * B[k,j] +``` + +#### **Visual Breakdown** +``` +A = [[1, 2], B = [[5, 6], C = [[19, 22], + [3, 4]] [7, 8]] [43, 50]] + +C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0] = 1*5 + 2*7 = 19 +C[0,1] = A[0,0]*B[0,1] + A[0,1]*B[1,1] = 1*6 + 2*8 = 22 +C[1,0] = A[1,0]*B[0,0] + A[1,1]*B[1,0] = 3*5 + 4*7 = 43 +C[1,1] = A[1,0]*B[0,1] + A[1,1]*B[1,1] = 3*6 + 4*8 = 50 +``` + +#### **Memory Access Pattern** +- **Row-major order**: Access elements row by row for cache efficiency +- **Cache locality**: Nearby elements are likely to be accessed together +- **Blocking**: Divide large matrices into blocks for better cache usage + +### Performance Considerations: Making It Fast + +#### **Optimization Strategies** +1. **Vectorization**: Use SIMD instructions for parallel element operations +2. **Blocking**: Divide matrices into cache-friendly blocks +3. **Loop unrolling**: Reduce loop overhead +4. 
**Memory alignment**: Ensure data is aligned for optimal access + +#### **Modern Libraries** +- **BLAS (Basic Linear Algebra Subprograms)**: Optimized matrix operations +- **Intel MKL**: Highly optimized for Intel processors +- **OpenBLAS**: Open-source optimized BLAS +- **cuBLAS**: GPU-accelerated BLAS from NVIDIA + +#### **Why We Implement Naive Version** +Understanding the basic algorithm helps you: +- **Debug performance issues**: Know what's happening under the hood +- **Optimize for specific cases**: Custom implementations for special matrices +- **Understand complexity**: Appreciate the optimizations in modern libraries +- **Educational value**: See the mathematical foundation clearly + +### Connection to Neural Network Architecture + +#### **Layer Composition** +```python +# Each layer is a matrix multiplication: +layer1_output = W1 @ input + b1 +layer2_output = W2 @ layer1_output + b2 +layer3_output = W3 @ layer2_output + b3 + +# This is equivalent to: +final_output = W3 @ (W2 @ (W1 @ input + b1) + b2) + b3 +``` + +#### **Gradient Flow** +During backpropagation, gradients flow through matrix operations: +```python +# Forward: y = W @ x + b +# Backward: +# dW = dy @ x.T +# dx = W.T @ dy +# db = dy.sum(axis=0) +``` + +#### **Weight Initialization** +Matrix multiplication behavior depends on weight initialization: +- **Xavier/Glorot**: Maintains variance across layers +- **He initialization**: Optimized for ReLU activations +- **Orthogonal**: Preserves gradient norms + +Let's implement matrix multiplication to truly understand it! +""" + +# %% nbgrader={"grade": false, "grade_id": "matmul-naive", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray: + """ + Naive matrix multiplication using explicit for-loops. + + This helps you understand what matrix multiplication really does! 
+ + Args: + A: Matrix of shape (m, n) + B: Matrix of shape (n, p) + + Returns: + Matrix of shape (m, p) where C[i,j] = sum(A[i,k] * B[k,j] for k in range(n)) + + TODO: Implement matrix multiplication using three nested for-loops. + + APPROACH: + 1. Get the dimensions: m, n from A and n2, p from B + 2. Check that n == n2 (matrices must be compatible) + 3. Create output matrix C of shape (m, p) filled with zeros + 4. Use three nested loops: + - i loop: rows of A (0 to m-1) + - j loop: columns of B (0 to p-1) + - k loop: shared dimension (0 to n-1) + 5. For each (i,j), compute: C[i,j] += A[i,k] * B[k,j] + + EXAMPLE: + A = [[1, 2], B = [[5, 6], + [3, 4]] [7, 8]] + + C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0] = 1*5 + 2*7 = 19 + C[0,1] = A[0,0]*B[0,1] + A[0,1]*B[1,1] = 1*6 + 2*8 = 22 + C[1,0] = A[1,0]*B[0,0] + A[1,1]*B[1,0] = 3*5 + 4*7 = 43 + C[1,1] = A[1,0]*B[0,1] + A[1,1]*B[1,1] = 3*6 + 4*8 = 50 + + HINTS: + - Start with C = np.zeros((m, p)) + - Use three nested for loops: for i in range(m): for j in range(p): for k in range(n): + - Accumulate the sum: C[i,j] += A[i,k] * B[k,j] + """ + ### BEGIN SOLUTION + # Get matrix dimensions + m, n = A.shape + n2, p = B.shape + + # Check compatibility + if n != n2: + raise ValueError(f"Incompatible matrix dimensions: A is {m}x{n}, B is {n2}x{p}") + + # Initialize result matrix + C = np.zeros((m, p)) + + # Triple nested loop for matrix multiplication + for i in range(m): + for j in range(p): + for k in range(n): + C[i, j] += A[i, k] * B[k, j] + + return C + ### END SOLUTION + +# %% [markdown] +""" +### 🧪 Unit Test: Matrix Multiplication + +Let's test your matrix multiplication implementation right away! This is the foundation of neural networks. + +**This is a unit test** - it tests one specific function (matmul_naive) in isolation. 
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-matmul-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false} +# Test matrix multiplication immediately after implementation +print("🔬 Unit Test: Matrix Multiplication...") + +# Test simple 2x2 case +try: + A = np.array([[1, 2], [3, 4]], dtype=np.float32) + B = np.array([[5, 6], [7, 8]], dtype=np.float32) + + result = matmul_naive(A, B) + expected = np.array([[19, 22], [43, 50]], dtype=np.float32) + + assert np.allclose(result, expected), f"Matrix multiplication failed: expected {expected}, got {result}" + print(f"✅ Simple 2x2 test: {A.tolist()} @ {B.tolist()} = {result.tolist()}") + + # Compare with NumPy + numpy_result = A @ B + assert np.allclose(result, numpy_result), f"Doesn't match NumPy: got {result}, expected {numpy_result}" + print("✅ Matches NumPy's result") + +except Exception as e: + print(f"❌ Matrix multiplication test failed: {e}") + raise + +# Test different shapes +try: + A2 = np.array([[1, 2, 3]], dtype=np.float32) # 1x3 + B2 = np.array([[4], [5], [6]], dtype=np.float32) # 3x1 + result2 = matmul_naive(A2, B2) + expected2 = np.array([[32]], dtype=np.float32) # 1*4 + 2*5 + 3*6 = 32 + + assert np.allclose(result2, expected2), f"Different shapes failed: got {result2}, expected {expected2}" + print(f"✅ Different shapes test: {A2.tolist()} @ {B2.tolist()} = {result2.tolist()}") + +except Exception as e: + print(f"❌ Different shapes test failed: {e}") + raise + +# Show the algorithm in action +print("🎯 Matrix multiplication algorithm:") +print(" C[i,j] = Σ(A[i,k] * B[k,j]) for all k") +print(" Triple nested loops compute each element") +print("📈 Progress: Matrix multiplication ✓") + +# %% [markdown] +""" +## Step 2: Building the Dense Layer + +Now let's build the **Dense layer**, the most fundamental building block of neural networks. A Dense layer performs a linear transformation: `y = Wx + b` + +### What is a Dense Layer? 
+- **Linear transformation**: `y = Wx + b` +- **W**: Weight matrix (learnable parameters) +- **x**: Input tensor +- **b**: Bias vector (learnable parameters) +- **y**: Output tensor + +### Why Dense Layers Matter +- **Universal approximation**: Can approximate any function with enough neurons +- **Feature learning**: Each neuron learns a different feature +- **Nonlinearity**: When combined with activation functions, becomes very powerful +- **Foundation**: All other layers build on this concept + +### The Math +For input x of shape (batch_size, input_size): +- **W**: Weight matrix of shape (input_size, output_size) +- **b**: Bias vector of shape (output_size) +- **y**: Output of shape (batch_size, output_size) + +### Visual Example +``` +Input: x = [1, 2, 3] (3 features) +Weights: W = [[0.1, 0.2], Bias: b = [0.1, 0.2] + [0.3, 0.4], + [0.5, 0.6]] + +Step 1: Wx = [0.1*1 + 0.3*2 + 0.5*3, 0.2*1 + 0.4*2 + 0.6*3] + = [2.2, 2.8] + +Step 2: y = Wx + b = [2.2 + 0.1, 2.8 + 0.2] = [2.3, 3.0] +``` + +Let's implement this! +""" + +# %% nbgrader={"grade": false, "grade_id": "dense-class", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +class Dense: + """ + Dense (Linear) Layer: y = Wx + b + + The fundamental building block of neural networks. + Performs linear transformation: matrix multiplication + bias addition. + """ + + def __init__(self, input_size: int, output_size: int, use_bias: bool = True, + use_naive_matmul: bool = False): + """ + Initialize Dense layer with random weights. + + Args: + input_size: Number of input features + output_size: Number of output features + use_bias: Whether to include bias term (default: True) + use_naive_matmul: Whether to use naive matrix multiplication (for learning) + + TODO: Implement Dense layer initialization with proper weight initialization. + + APPROACH: + 1. Store layer parameters (input_size, output_size, use_bias, use_naive_matmul) + 2. Initialize weights with Xavier/Glorot initialization + 3. 
Initialize bias to zeros (if use_bias=True) + 4. Convert to float32 for consistency + + EXAMPLE: + Dense(3, 2) creates: + - weights: shape (3, 2) with small random values + - bias: shape (2,) with zeros + + HINTS: + - Use np.random.randn() for random initialization + - Scale weights by sqrt(2/(input_size + output_size)) for Xavier init + - Use np.zeros() for bias initialization + - Convert to float32 with .astype(np.float32) + """ + ### BEGIN SOLUTION + # Store parameters + self.input_size = input_size + self.output_size = output_size + self.use_bias = use_bias + self.use_naive_matmul = use_naive_matmul + + # Xavier/Glorot initialization + scale = np.sqrt(2.0 / (input_size + output_size)) + self.weights = np.random.randn(input_size, output_size).astype(np.float32) * scale + + # Initialize bias + if use_bias: + self.bias = np.zeros(output_size, dtype=np.float32) + else: + self.bias = None + ### END SOLUTION + + def forward(self, x: Tensor) -> Tensor: + """ + Forward pass: y = Wx + b + + Args: + x: Input tensor of shape (batch_size, input_size) + + Returns: + Output tensor of shape (batch_size, output_size) + + TODO: Implement matrix multiplication and bias addition. + + APPROACH: + 1. Choose matrix multiplication method based on use_naive_matmul flag + 2. Perform matrix multiplication: Wx + 3. Add bias if use_bias=True + 4. 
Return result wrapped in Tensor + + EXAMPLE: + Input x: Tensor([[1, 2, 3]]) # shape (1, 3) + Weights: shape (3, 2) + Output: Tensor([[val1, val2]]) # shape (1, 2) + + HINTS: + - Use self.use_naive_matmul to choose between matmul_naive and @ + - x.data gives you the numpy array + - Use broadcasting for bias addition: result + self.bias + - Return Tensor(result) to wrap the result + """ + ### BEGIN SOLUTION + # Matrix multiplication + if self.use_naive_matmul: + result = matmul_naive(x.data, self.weights) + else: + result = x.data @ self.weights + + # Add bias + if self.use_bias: + result += self.bias + + return Tensor(result) + ### END SOLUTION + + def __call__(self, x: Tensor) -> Tensor: + """Make layer callable: layer(x) same as layer.forward(x)""" + return self.forward(x) + +# %% [markdown] +""" +### 🧪 Unit Test: Dense Layer + +Let's test your Dense layer implementation! This is the fundamental building block of neural networks. + +**This is a unit test** - it tests one specific class (Dense layer) in isolation. 
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-dense-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false} +# Test Dense layer immediately after implementation +print("🔬 Unit Test: Dense Layer...") + +# Test basic Dense layer +try: + layer = Dense(input_size=3, output_size=2, use_bias=True) + x = Tensor([[1, 2, 3]]) # batch_size=1, input_size=3 + + print(f"Input shape: {x.shape}") + print(f"Layer weights shape: {layer.weights.shape}") + if layer.bias is not None: + print(f"Layer bias shape: {layer.bias.shape}") + + y = layer(x) + print(f"Output shape: {y.shape}") + print(f"Output: {y}") + + # Test shape compatibility + assert y.shape == (1, 2), f"Output shape should be (1, 2), got {y.shape}" + print("✅ Dense layer produces correct output shape") + + # Test weights initialization + assert layer.weights.shape == (3, 2), f"Weights shape should be (3, 2), got {layer.weights.shape}" + if layer.bias is not None: + assert layer.bias.shape == (2,), f"Bias shape should be (2,), got {layer.bias.shape}" + print("✅ Dense layer has correct weight and bias shapes") + + # Test that weights are not all zeros (proper initialization) + assert not np.allclose(layer.weights, 0), "Weights should not be all zeros" + if layer.bias is not None: + assert np.allclose(layer.bias, 0), "Bias should be initialized to zeros" + print("✅ Dense layer has proper weight initialization") + +except Exception as e: + print(f"❌ Dense layer test failed: {e}") + raise + +# Test without bias +try: + layer_no_bias = Dense(input_size=2, output_size=1, use_bias=False) + x2 = Tensor([[1, 2]]) + y2 = layer_no_bias(x2) + + assert y2.shape == (1, 1), f"No bias output shape should be (1, 1), got {y2.shape}" + assert layer_no_bias.bias is None, "Bias should be None when use_bias=False" + print("✅ Dense layer works without bias") + +except Exception as e: + print(f"❌ Dense layer no-bias test failed: {e}") + raise + +# Test naive matrix multiplication +try: + 
layer_naive = Dense(input_size=2, output_size=2, use_naive_matmul=True) + x3 = Tensor([[1, 2]]) + y3 = layer_naive(x3) + + assert y3.shape == (1, 2), f"Naive matmul output shape should be (1, 2), got {y3.shape}" + print("✅ Dense layer works with naive matrix multiplication") + +except Exception as e: + print(f"❌ Dense layer naive matmul test failed: {e}") + raise + +# Show the linear transformation in action +print("🎯 Dense layer behavior:") +print(" y = Wx + b (linear transformation)") +print(" W: learnable weight matrix") +print(" b: learnable bias vector") +print("📈 Progress: Matrix multiplication ✓, Dense layer ✓") + +# %% [markdown] +""" +### 🧪 Test Your Implementations + +Once you implement the functions above, run these cells to test them: +""" + +# %% nbgrader={"grade": true, "grade_id": "test-matmul-naive", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +# Test matrix multiplication +print("Testing matrix multiplication...") + +# Test case 1: Simple 2x2 matrices +A = np.array([[1, 2], [3, 4]], dtype=np.float32) +B = np.array([[5, 6], [7, 8]], dtype=np.float32) + +result = matmul_naive(A, B) +expected = np.array([[19, 22], [43, 50]], dtype=np.float32) + +print(f"Matrix A:\n{A}") +print(f"Matrix B:\n{B}") +print(f"Your result:\n{result}") +print(f"Expected:\n{expected}") + +assert np.allclose(result, expected), f"Result doesn't match expected: got {result}, expected {expected}" + +# Test case 2: Compare with NumPy +numpy_result = A @ B +assert np.allclose(result, numpy_result), f"Doesn't match NumPy result: got {result}, expected {numpy_result}" + +# Test case 3: Different shapes +A2 = np.array([[1, 2, 3]], dtype=np.float32) # 1x3 +B2 = np.array([[4], [5], [6]], dtype=np.float32) # 3x1 +result2 = matmul_naive(A2, B2) +expected2 = np.array([[32]], dtype=np.float32) # 1*4 + 2*5 + 3*6 = 32 +assert np.allclose(result2, expected2), f"Different shapes failed: got {result2}, expected {expected2}" + +print("✅ Matrix 
multiplication tests passed!") + +# %% nbgrader={"grade": true, "grade_id": "test-dense-layer", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +# Test Dense layer +print("Testing Dense layer...") + +# Test basic Dense layer +layer = Dense(input_size=3, output_size=2, use_bias=True) +x = Tensor([[1, 2, 3]]) # batch_size=1, input_size=3 + +print(f"Input shape: {x.shape}") +print(f"Layer weights shape: {layer.weights.shape}") +if layer.bias is not None: + print(f"Layer bias shape: {layer.bias.shape}") +else: + print("Layer bias: None") + +y = layer(x) +print(f"Output shape: {y.shape}") +print(f"Output: {y}") + +# Test shape compatibility +assert y.shape == (1, 2), f"Output shape should be (1, 2), got {y.shape}" + +# Test without bias +layer_no_bias = Dense(input_size=2, output_size=1, use_bias=False) +x2 = Tensor([[1, 2]]) +y2 = layer_no_bias(x2) +assert y2.shape == (1, 1), f"No bias output shape should be (1, 1), got {y2.shape}" +assert layer_no_bias.bias is None, "Bias should be None when use_bias=False" + +# Test naive matrix multiplication +layer_naive = Dense(input_size=2, output_size=2, use_naive_matmul=True) +x3 = Tensor([[1, 2]]) +y3 = layer_naive(x3) +assert y3.shape == (1, 2), f"Naive matmul output shape should be (1, 2), got {y3.shape}" + +print("✅ Dense layer tests passed!") + +# %% nbgrader={"grade": true, "grade_id": "test-layer-composition", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +# Test layer composition +print("Testing layer composition...") + +# Create a simple network: Dense → ReLU → Dense +dense1 = Dense(input_size=3, output_size=2) +relu = ReLU() +dense2 = Dense(input_size=2, output_size=1) + +# Test input +x = Tensor([[1, 2, 3]]) +print(f"Input: {x}") + +# Forward pass through the network +h1 = dense1(x) +print(f"After Dense1: {h1}") + +h2 = relu(h1) +print(f"After ReLU: {h2}") + +h3 = dense2(h2) +print(f"After Dense2: {h3}") + +# Test shapes +assert h1.shape == 
(1, 2), f"Dense1 output should be (1, 2), got {h1.shape}" +assert h2.shape == (1, 2), f"ReLU output should be (1, 2), got {h2.shape}" +assert h3.shape == (1, 1), f"Dense2 output should be (1, 1), got {h3.shape}" + +# Test that ReLU actually applied (non-negative values) +assert np.all(h2.data >= 0), "ReLU should produce non-negative values" + +print("✅ Layer composition tests passed!") + +# %% [markdown] +""" +## 🧪 Comprehensive Testing: Matrix Multiplication and Dense Layers + +Let's thoroughly test your implementations to make sure they work correctly in all scenarios. +This comprehensive testing ensures your layers are robust and ready for real neural networks. +""" + +# %% nbgrader={"grade": true, "grade_id": "test-layers-comprehensive", "locked": true, "points": 30, "schema_version": 3, "solution": false, "task": false} +def test_layers_comprehensive(): + """Comprehensive test of matrix multiplication and Dense layers.""" + print("🔬 Testing matrix multiplication and Dense layers comprehensively...") + + tests_passed = 0 + total_tests = 10 + + # Test 1: Matrix Multiplication Basic Cases + try: + # Test 2x2 matrices + A = np.array([[1, 2], [3, 4]], dtype=np.float32) + B = np.array([[5, 6], [7, 8]], dtype=np.float32) + result = matmul_naive(A, B) + expected = np.array([[19, 22], [43, 50]], dtype=np.float32) + + assert np.allclose(result, expected), f"2x2 multiplication failed: expected {expected}, got {result}" + + # Compare with NumPy + numpy_result = A @ B + assert np.allclose(result, numpy_result), f"Doesn't match NumPy: expected {numpy_result}, got {result}" + + print(f"✅ Matrix multiplication 2x2: {A.shape} × {B.shape} = {result.shape}") + tests_passed += 1 + except Exception as e: + print(f"❌ Matrix multiplication basic failed: {e}") + + # Test 2: Matrix Multiplication Different Shapes + try: + # Test 1x3 × 3x1 = 1x1 + A1 = np.array([[1, 2, 3]], dtype=np.float32) + B1 = np.array([[4], [5], [6]], dtype=np.float32) + result1 = matmul_naive(A1, B1) + expected1 
= np.array([[32]], dtype=np.float32) # 1*4 + 2*5 + 3*6 = 32 + assert np.allclose(result1, expected1), f"1x3 × 3x1 failed: expected {expected1}, got {result1}" + + # Test 3x2 × 2x4 = 3x4 + A2 = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.float32) + B2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.float32) + result2 = matmul_naive(A2, B2) + expected2 = A2 @ B2 + assert np.allclose(result2, expected2), f"3x2 × 2x4 failed: expected {expected2}, got {result2}" + + print(f"✅ Matrix multiplication shapes: (1,3)×(3,1), (3,2)×(2,4)") + tests_passed += 1 + except Exception as e: + print(f"❌ Matrix multiplication shapes failed: {e}") + + # Test 3: Matrix Multiplication Edge Cases + try: + # Test with zeros + A_zero = np.zeros((2, 3), dtype=np.float32) + B_zero = np.zeros((3, 2), dtype=np.float32) + result_zero = matmul_naive(A_zero, B_zero) + expected_zero = np.zeros((2, 2), dtype=np.float32) + assert np.allclose(result_zero, expected_zero), "Zero matrix multiplication failed" + + # Test with identity + A_id = np.array([[1, 2]], dtype=np.float32) + B_id = np.array([[1, 0], [0, 1]], dtype=np.float32) + result_id = matmul_naive(A_id, B_id) + expected_id = np.array([[1, 2]], dtype=np.float32) + assert np.allclose(result_id, expected_id), "Identity matrix multiplication failed" + + # Test with negative values + A_neg = np.array([[-1, 2]], dtype=np.float32) + B_neg = np.array([[3], [-4]], dtype=np.float32) + result_neg = matmul_naive(A_neg, B_neg) + expected_neg = np.array([[-11]], dtype=np.float32) # -1*3 + 2*(-4) = -11 + assert np.allclose(result_neg, expected_neg), "Negative matrix multiplication failed" + + print("✅ Matrix multiplication edge cases: zeros, identity, negatives") + tests_passed += 1 + except Exception as e: + print(f"❌ Matrix multiplication edge cases failed: {e}") + + # Test 4: Dense Layer Initialization + try: + # Test with bias + layer_bias = Dense(input_size=3, output_size=2, use_bias=True) + assert layer_bias.weights.shape == (3, 2), f"Weights shape 
should be (3, 2), got {layer_bias.weights.shape}" + assert layer_bias.bias is not None, "Bias should not be None when use_bias=True" + assert layer_bias.bias.shape == (2,), f"Bias shape should be (2,), got {layer_bias.bias.shape}" + + # Check weight initialization (should not be all zeros) + assert not np.allclose(layer_bias.weights, 0), "Weights should not be all zeros" + assert np.allclose(layer_bias.bias, 0), "Bias should be initialized to zeros" + + # Test without bias + layer_no_bias = Dense(input_size=4, output_size=3, use_bias=False) + assert layer_no_bias.weights.shape == (4, 3), f"No-bias weights shape should be (4, 3), got {layer_no_bias.weights.shape}" + assert layer_no_bias.bias is None, "Bias should be None when use_bias=False" + + print("✅ Dense layer initialization: weights, bias, shapes") + tests_passed += 1 + except Exception as e: + print(f"❌ Dense layer initialization failed: {e}") + + # Test 5: Dense Layer Forward Pass + try: + layer = Dense(input_size=3, output_size=2, use_bias=True) + + # Test single sample + x_single = Tensor([[1, 2, 3]]) # shape: (1, 3) + y_single = layer(x_single) + assert y_single.shape == (1, 2), f"Single sample output should be (1, 2), got {y_single.shape}" + + # Test batch of samples + x_batch = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) # shape: (3, 3) + y_batch = layer(x_batch) + assert y_batch.shape == (3, 2), f"Batch output should be (3, 2), got {y_batch.shape}" + + # Verify computation manually for single sample + expected_single = np.dot(x_single.data, layer.weights) + layer.bias + assert np.allclose(y_single.data, expected_single), "Single sample computation incorrect" + + print("✅ Dense layer forward pass: single sample, batch processing") + tests_passed += 1 + except Exception as e: + print(f"❌ Dense layer forward pass failed: {e}") + + # Test 6: Dense Layer Without Bias + try: + layer_no_bias = Dense(input_size=2, output_size=3, use_bias=False) + x = Tensor([[1, 2]]) + y = layer_no_bias(x) + + assert y.shape == 
(1, 3), f"No-bias output should be (1, 3), got {y.shape}" + + # Verify computation (should be just matrix multiplication) + expected = np.dot(x.data, layer_no_bias.weights) + assert np.allclose(y.data, expected), "No-bias computation incorrect" + + print("✅ Dense layer without bias: correct computation") + tests_passed += 1 + except Exception as e: + print(f"❌ Dense layer without bias failed: {e}") + + # Test 7: Dense Layer with Naive Matrix Multiplication + try: + layer_naive = Dense(input_size=2, output_size=2, use_naive_matmul=True) + layer_optimized = Dense(input_size=2, output_size=2, use_naive_matmul=False) + + # Set same weights for comparison + layer_optimized.weights = layer_naive.weights.copy() + layer_optimized.bias = layer_naive.bias.copy() if layer_naive.bias is not None else None + + x = Tensor([[1, 2]]) + y_naive = layer_naive(x) + y_optimized = layer_optimized(x) + + # Both should give same results + assert np.allclose(y_naive.data, y_optimized.data), "Naive and optimized should give same results" + + print("✅ Dense layer naive vs optimized: consistent results") + tests_passed += 1 + except Exception as e: + print(f"❌ Dense layer naive matmul failed: {e}") + + # Test 8: Layer Composition + try: + # Create a simple network: Dense → ReLU → Dense + dense1 = Dense(input_size=3, output_size=4) + relu = ReLU() + dense2 = Dense(input_size=4, output_size=2) + + x = Tensor([[1, -2, 3]]) + + # Forward pass + h1 = dense1(x) + h2 = relu(h1) + h3 = dense2(h2) + + # Check shapes + assert h1.shape == (1, 4), f"Dense1 output should be (1, 4), got {h1.shape}" + assert h2.shape == (1, 4), f"ReLU output should be (1, 4), got {h2.shape}" + assert h3.shape == (1, 2), f"Dense2 output should be (1, 2), got {h3.shape}" + + # Check ReLU effect + assert np.all(h2.data >= 0), "ReLU should produce non-negative values" + + print("✅ Layer composition: Dense → ReLU → Dense pipeline") + tests_passed += 1 + except Exception as e: + print(f"❌ Layer composition failed: {e}") + + # 
Test 9: Different Layer Sizes + try: + # Test various layer sizes + test_configs = [ + (1, 1), # Minimal + (10, 5), # Medium + (100, 50), # Large + (784, 128) # MNIST-like + ] + + for input_size, output_size in test_configs: + layer = Dense(input_size=input_size, output_size=output_size) + + # Test with single sample + x = Tensor(np.random.randn(1, input_size)) + y = layer(x) + + assert y.shape == (1, output_size), f"Size ({input_size}, {output_size}) failed: got {y.shape}" + assert layer.weights.shape == (input_size, output_size), f"Weights shape wrong for ({input_size}, {output_size})" + + print("✅ Different layer sizes: (1,1), (10,5), (100,50), (784,128)") + tests_passed += 1 + except Exception as e: + print(f"❌ Different layer sizes failed: {e}") + + # Test 10: Real Neural Network Scenario + try: + # Simulate MNIST-like scenario: 784 → 128 → 64 → 10 + input_layer = Dense(input_size=784, output_size=128) + hidden_layer = Dense(input_size=128, output_size=64) + output_layer = Dense(input_size=64, output_size=10) + + relu1 = ReLU() + relu2 = ReLU() + softmax = Softmax() + + # Simulate flattened MNIST image + x = Tensor(np.random.randn(32, 784)) # Batch of 32 images + + # Forward pass through network + h1 = input_layer(x) + h1_activated = relu1(h1) + h2 = hidden_layer(h1_activated) + h2_activated = relu2(h2) + logits = output_layer(h2_activated) + probabilities = softmax(logits) + + # Check final output + assert probabilities.shape == (32, 10), f"Final output should be (32, 10), got {probabilities.shape}" + + # Check that probabilities sum to 1 for each sample + row_sums = np.sum(probabilities.data, axis=1) + assert np.allclose(row_sums, 1.0), "Each sample should have probabilities summing to 1" + + # Check that all intermediate shapes are correct + assert h1.shape == (32, 128), f"Hidden 1 shape should be (32, 128), got {h1.shape}" + assert h2.shape == (32, 64), f"Hidden 2 shape should be (32, 64), got {h2.shape}" + assert logits.shape == (32, 10), f"Logits shape 
should be (32, 10), got {logits.shape}" + + print("✅ Real neural network scenario: MNIST-like 784→128→64→10 classification") + tests_passed += 1 + except Exception as e: + print(f"❌ Real neural network scenario failed: {e}") + + # Results summary + print(f"\n📊 Layers Module Results: {tests_passed}/{total_tests} tests passed") + + if tests_passed == total_tests: + print("🎉 All layers tests passed! Your implementations support:") + print(" • Matrix multiplication: naive implementation from scratch") + print(" • Dense layers: linear transformations with learnable parameters") + print(" • Weight initialization: proper random initialization") + print(" • Bias handling: optional bias terms") + print(" • Batch processing: multiple samples at once") + print(" • Layer composition: building complete neural networks") + print(" • Real ML scenarios: MNIST-like classification networks") + print("📈 Progress: All Layer Functionality ✓") + return True + else: + print("⚠️ Some layers tests failed. Common issues:") + print(" • Check matrix multiplication implementation (triple nested loops)") + print(" • Verify Dense layer forward pass (y = Wx + b)") + print(" • Ensure proper weight initialization (not all zeros)") + print(" • Check shape handling for different input/output sizes") + print(" • Verify bias handling when use_bias=False") + return False + +# Run the comprehensive test +success = test_layers_comprehensive() + +# %% [markdown] +""" +### 🧪 Integration Test: Layers in Complete Neural Networks + +Let's test how your layers work in realistic neural network architectures. 
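As a debugging reference, here is a minimal sketch of what a triple-nested-loop `matmul_naive` can look like. This is an illustrative version only — your own implementation in this module is the one the tests actually call, and it may differ in details:

```python
import numpy as np

def matmul_naive(A, B):
    """Naive O(n^3) matrix multiplication: out[i, j] = sum_k A[i, k] * B[k, j]."""
    assert A.shape[1] == B.shape[0], "Inner dimensions must match"
    rows, inner, cols = A.shape[0], A.shape[1], B.shape[1]
    out = np.zeros((rows, cols), dtype=A.dtype)
    for i in range(rows):           # each output row
        for j in range(cols):       # each output column
            for k in range(inner):  # dot product of row i with column j
                out[i, j] += A[i, k] * B[k, j]
    return out

A = np.array([[1, 2], [3, 4]], dtype=np.float32)
B = np.array([[5, 6], [7, 8]], dtype=np.float32)
result = matmul_naive(A, B)  # matches A @ B: [[19, 22], [43, 50]]
```

If your implementation disagrees with `A @ B` on a small example like this, check the loop bounds and which index walks the shared inner dimension.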
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-layers-integration", "locked": true, "points": 20, "schema_version": 3, "solution": false, "task": false} +def test_layers_integration(): + """Integration test with complete neural network architectures.""" + print("🔬 Testing layers in complete neural network architectures...") + + try: + print("🧠 Building and testing different network architectures...") + + # Architecture 1: Simple Binary Classifier + print("\n📊 Architecture 1: Binary Classification Network") + binary_net = [ + Dense(input_size=4, output_size=8), + ReLU(), + Dense(input_size=8, output_size=4), + ReLU(), + Dense(input_size=4, output_size=1), + Sigmoid() + ] + + # Test with batch of samples + x_binary = Tensor(np.random.randn(10, 4)) # 10 samples, 4 features + + # Forward pass through network + current = x_binary + for i, layer in enumerate(binary_net): + current = layer(current) + print(f" Layer {i}: {current.shape}") + + # Verify final output is valid probabilities + assert current.shape == (10, 1), f"Binary classifier output should be (10, 1), got {current.shape}" + assert np.all((current.data >= 0) & (current.data <= 1)), "Binary probabilities should be in [0,1]" + + print("✅ Binary classification network: 4→8→4→1 with ReLU/Sigmoid") + + # Architecture 2: Multi-class Classifier + print("\n📊 Architecture 2: Multi-class Classification Network") + multiclass_net = [ + Dense(input_size=784, output_size=256), + ReLU(), + Dense(input_size=256, output_size=128), + ReLU(), + Dense(input_size=128, output_size=10), + Softmax() + ] + + # Simulate MNIST-like input + x_mnist = Tensor(np.random.randn(5, 784)) # 5 images, 784 pixels + + current = x_mnist + for i, layer in enumerate(multiclass_net): + current = layer(current) + print(f" Layer {i}: {current.shape}") + + # Verify final output is valid probability distribution + assert current.shape == (5, 10), f"Multi-class output should be (5, 10), got {current.shape}" + row_sums = np.sum(current.data, 
axis=1) + assert np.allclose(row_sums, 1.0), "Each sample should have probabilities summing to 1" + + print("✅ Multi-class classification network: 784→256→128→10 with Softmax") + + # Architecture 3: Deep Network + print("\n📊 Architecture 3: Deep Network (5 layers)") + deep_net = [ + Dense(input_size=100, output_size=80), + ReLU(), + Dense(input_size=80, output_size=60), + ReLU(), + Dense(input_size=60, output_size=40), + ReLU(), + Dense(input_size=40, output_size=20), + ReLU(), + Dense(input_size=20, output_size=3), + Softmax() + ] + + x_deep = Tensor(np.random.randn(8, 100)) # 8 samples, 100 features + + current = x_deep + for i, layer in enumerate(deep_net): + current = layer(current) + if i % 2 == 0: # Print every other layer to save space + print(f" Layer {i}: {current.shape}") + + assert current.shape == (8, 3), f"Deep network output should be (8, 3), got {current.shape}" + + print("✅ Deep network: 100→80→60→40→20→3 with multiple ReLU layers") + + # Test 4: Network with Different Activation Functions + print("\n📊 Architecture 4: Mixed Activation Functions") + mixed_net = [ + Dense(input_size=6, output_size=4), + Tanh(), # Zero-centered activation + Dense(input_size=4, output_size=3), + ReLU(), # Sparse activation + Dense(input_size=3, output_size=2), + Sigmoid() # Bounded activation + ] + + x_mixed = Tensor(np.random.randn(3, 6)) + + current = x_mixed + for i, layer in enumerate(mixed_net): + current = layer(current) + print(f" Layer {i}: {current.shape}, range: [{np.min(current.data):.3f}, {np.max(current.data):.3f}]") + + assert current.shape == (3, 2), f"Mixed network output should be (3, 2), got {current.shape}" + + print("✅ Mixed activations network: Tanh→ReLU→Sigmoid combinations") + + # Test 5: Parameter Counting + print("\n📊 Parameter Analysis") + + def count_parameters(layer): + """Count trainable parameters in a Dense layer.""" + if isinstance(layer, Dense): + weight_params = layer.weights.size + bias_params = layer.bias.size if layer.bias is not 
None else 0 + return weight_params + bias_params + return 0 + + # Count parameters in binary classifier + total_params = sum(count_parameters(layer) for layer in binary_net) + print(f"Binary classifier parameters: {total_params}") + + # Manual verification for first layer: 4*8 + 8 = 40 + first_dense = binary_net[0] + expected_first = 4 * 8 + 8 # weights + bias + actual_first = count_parameters(first_dense) + assert actual_first == expected_first, f"First layer params: expected {expected_first}, got {actual_first}" + + print("✅ Parameter counting: weight and bias parameters calculated correctly") + + # Test 6: Gradient Flow Preparation + print("\n📊 Gradient Flow Preparation") + + # Test that network can handle different input types + test_inputs = [ + Tensor(np.zeros((1, 4))), # All zeros + Tensor(np.ones((1, 4))), # All ones + Tensor(np.random.randn(1, 4)), # Random + Tensor(np.random.randn(1, 4) * 10) # Large values + ] + + for i, test_input in enumerate(test_inputs): + current = test_input + for layer in binary_net: + current = layer(current) + + # Check for numerical stability + assert not np.any(np.isnan(current.data)), f"Input {i} produced NaN" + assert not np.any(np.isinf(current.data)), f"Input {i} produced Inf" + + print("✅ Numerical stability: networks handle various input ranges") + + print("\n🎉 Integration test passed! 
Your layers work correctly in:") + print(" • Binary classification networks") + print(" • Multi-class classification networks") + print(" • Deep networks with multiple hidden layers") + print(" • Networks with mixed activation functions") + print(" • Parameter counting and analysis") + print(" • Numerical stability across input ranges") + print("📈 Progress: Layers ready for complete neural networks!") + + return True + + except Exception as e: + print(f"❌ Integration test failed: {e}") + print("\n💡 This suggests an issue with:") + print(" • Layer composition and chaining") + print(" • Shape compatibility between layers") + print(" • Activation function integration") + print(" • Numerical stability in deep networks") + print(" • Check your Dense layer and matrix multiplication") + return False + +# Run the integration test +success = test_layers_integration() and success + +# Print final summary +print(f"\n{'='*60}") +print("🎯 LAYERS MODULE TESTING COMPLETE") +print(f"{'='*60}") + +if success: + print("🎉 CONGRATULATIONS! All layers tests passed!") + print("\n✅ Your layers module successfully implements:") + print(" • Matrix multiplication: naive implementation from scratch") + print(" • Dense layers: y = Wx + b linear transformations") + print(" • Weight initialization: proper random weight setup") + print(" • Bias handling: optional bias terms") + print(" • Batch processing: efficient multi-sample computation") + print(" • Layer composition: building complete neural networks") + print(" • Integration: works with all activation functions") + print(" • Real ML scenarios: MNIST-like classification networks") + print("\n🚀 You're ready to build complete neural network architectures!") + print("📈 Final Progress: Layers Module ✓ COMPLETE") +else: + print("⚠️ Some tests failed. Please review the error messages above.") + print("\n🔧 To fix issues:") + print(" 1. Check your matrix multiplication implementation") + print(" 2. 
Verify Dense layer forward pass computation") + print(" 3. Ensure proper weight and bias initialization") + print(" 4. Test shape compatibility between layers") + print(" 5. Verify integration with activation functions") + print("\n💪 Keep building! These layers are the foundation of all neural networks.") + +# %% [markdown] +""" +## 🎯 Module Summary + +Congratulations! You've successfully implemented the core building blocks of neural networks: + +### What You've Accomplished +✅ **Matrix Multiplication**: Implemented from scratch with triple nested loops +✅ **Dense Layer**: The fundamental linear transformation y = Wx + b +✅ **Weight Initialization**: Xavier/Glorot initialization for stable training +✅ **Layer Composition**: Combining layers with activations +✅ **Flexible Implementation**: Support for both naive and optimized matrix multiplication + +### Key Concepts You've Learned +- **Matrix multiplication** is the engine of neural networks +- **Dense layers** perform linear transformations that learn features +- **Weight initialization** is crucial for stable training +- **Layer composition** creates powerful nonlinear functions +- **Batch processing** enables efficient computation + +### Mathematical Foundations +- **Linear algebra**: Matrix operations power all neural computations +- **Universal approximation**: Dense layers combined with nonlinear activations can approximate any continuous function +- **Feature learning**: Each neuron learns different patterns +- **Composability**: Simple operations combine to create complex behaviors + +### Next Steps +1. **Export your code**: `tito package nbdev --export 03_layers` +2. **Test your implementation**: `tito module test 03_layers` +3. **Use your layers**: + ```python + from tinytorch.core.layers import Dense + from tinytorch.core.activations import ReLU + layer = Dense(10, 5) + activation = ReLU() + ``` +4. **Move to Module 4**: Start building complete neural networks! 
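For reference, the Xavier/Glorot initialization mentioned above is commonly implemented with a uniform distribution. This is a sketch under that assumption — the exact formula your Dense layer uses may differ:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out):
    # Glorot/Xavier uniform: sample from U(-limit, limit) with
    # limit = sqrt(6 / (fan_in + fan_out)), which keeps the variance of
    # activations roughly stable across layers at initialization
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out)).astype(np.float32)

W = xavier_uniform(784, 128)  # MNIST-sized input layer weights, shape (784, 128)
```

The key property is that the scale shrinks as layers get wider, so large layers start with proportionally smaller weights.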
+ +**Ready for the next challenge?** Let's compose these layers into complete neural network architectures! +""" \ No newline at end of file diff --git a/modules/source/03_layers/tests/test_layers.py b/modules/source/03_layers/tests/test_layers.py deleted file mode 100644 index e2597b8f..00000000 --- a/modules/source/03_layers/tests/test_layers.py +++ /dev/null @@ -1,336 +0,0 @@ -""" -Test suite for the layers module. -This tests the student implementations to ensure they work correctly. -""" - -import pytest -import numpy as np -import sys -import os - -# Import from the main package (rock solid foundation) -from tinytorch.core.tensor import Tensor -from tinytorch.core.layers import Dense -from tinytorch.core.activations import ReLU, Sigmoid, Tanh - -def safe_numpy(tensor): - """Get numpy array from tensor, using .numpy() if available, otherwise .data""" - if hasattr(tensor, 'numpy'): - return tensor.numpy() - else: - return tensor.data - -class TestDenseLayer: - """Test Dense (Linear) layer functionality.""" - - def test_dense_creation(self): - """Test creating Dense layers with different configurations.""" - # Basic dense layer - layer = Dense(input_size=3, output_size=2) - assert layer.input_size == 3 - assert layer.output_size == 2 - assert layer.use_bias == True - assert layer.weights.shape == (3, 2) - assert layer.bias.shape == (2,) - - # Dense layer without bias - layer_no_bias = Dense(input_size=4, output_size=3, use_bias=False) - assert layer_no_bias.use_bias == False - assert layer_no_bias.bias is None - - def test_dense_forward_single(self): - """Test Dense layer forward pass with single input.""" - layer = Dense(input_size=3, output_size=2) - - # Single input - x = Tensor([[1.0, 2.0, 3.0]]) - y = layer(x) - - assert y.shape == (1, 2) - assert isinstance(y, Tensor) - - def test_dense_forward_batch(self): - """Test Dense layer forward pass with batch input.""" - layer = Dense(input_size=3, output_size=2) - - # Batch input - x = Tensor([[1.0, 2.0, 3.0], 
[4.0, 5.0, 6.0]]) - y = layer(x) - - assert y.shape == (2, 2) - assert isinstance(y, Tensor) - - def test_dense_no_bias(self): - """Test Dense layer without bias.""" - layer = Dense(input_size=2, output_size=1, use_bias=False) - - x = Tensor([[1.0, 2.0]]) - y = layer(x) - - assert y.shape == (1, 1) - # Should be just matrix multiplication without bias - expected = safe_numpy(x) @ safe_numpy(layer.weights) - np.testing.assert_array_almost_equal(safe_numpy(y), expected) - - def test_dense_callable(self): - """Test that Dense layer is callable.""" - layer = Dense(input_size=2, output_size=1) - x = Tensor([[1.0, 2.0]]) - - # Both should work - y1 = layer.forward(x) - y2 = layer(x) - - np.testing.assert_array_equal(safe_numpy(y1), safe_numpy(y2)) - -class TestActivationFunctions: - """Test activation function implementations.""" - - def test_relu_basic(self): - """Test ReLU activation function.""" - relu = ReLU() - x = Tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]]) - y = relu(x) - - expected = [[0.0, 0.0, 0.0, 1.0, 2.0]] - np.testing.assert_array_equal(safe_numpy(y), expected) - - def test_relu_callable(self): - """Test that ReLU is callable.""" - relu = ReLU() - x = Tensor([[1.0, -1.0]]) - - y1 = relu.forward(x) - y2 = relu(x) - - np.testing.assert_array_equal(safe_numpy(y1), safe_numpy(y2)) - - def test_sigmoid_basic(self): - """Test Sigmoid activation function.""" - sigmoid = Sigmoid() - x = Tensor([[0.0]]) # sigmoid(0) = 0.5 - y = sigmoid(x) - - np.testing.assert_array_almost_equal(safe_numpy(y), [[0.5]]) - - def test_sigmoid_range(self): - """Test Sigmoid output range.""" - sigmoid = Sigmoid() - x = Tensor([[-10.0, 0.0, 10.0]]) - y = sigmoid(x) - - # Should be in range [0, 1] - use reasonable bounds - assert np.all(safe_numpy(y) >= 0) - assert np.all(safe_numpy(y) <= 1) - # Check that extreme values are close to bounds - assert safe_numpy(y)[0][0] < 0.01 # Very small for -10 - assert safe_numpy(y)[0][2] > 0.99 # Very large for 10 - - def test_tanh_basic(self): - """Test 
Tanh activation function.""" - tanh = Tanh() - x = Tensor([[0.0]]) # tanh(0) = 0 - y = tanh(x) - - np.testing.assert_array_almost_equal(safe_numpy(y), [[0.0]]) - - def test_tanh_range(self): - """Test Tanh output range.""" - tanh = Tanh() - x = Tensor([[-10.0, 0.0, 10.0]]) - y = tanh(x) - - # Should be in range [-1, 1] - use reasonable bounds - assert np.all(safe_numpy(y) >= -1) - assert np.all(safe_numpy(y) <= 1) - # Check that extreme values are close to bounds - assert safe_numpy(y)[0][0] < -0.99 # Very negative for -10 - assert safe_numpy(y)[0][2] > 0.99 # Very positive for 10 - -class TestLayerComposition: - """Test composing layers into neural networks.""" - - def test_simple_network(self): - """Test a simple 2-layer network.""" - # 3 → 4 → 2 network - layer1 = Dense(input_size=3, output_size=4) - relu = ReLU() - layer2 = Dense(input_size=4, output_size=2) - sigmoid = Sigmoid() - - # Forward pass - x = Tensor([[1.0, 2.0, 3.0]]) - h1 = layer1(x) - h1_activated = relu(h1) - h2 = layer2(h1_activated) - output = sigmoid(h2) - - assert h1.shape == (1, 4) - assert h1_activated.shape == (1, 4) - assert h2.shape == (1, 2) - assert output.shape == (1, 2) - - # Output should be in sigmoid range - assert np.all(safe_numpy(output) >= 0) - assert np.all(safe_numpy(output) <= 1) - - def test_batch_network(self): - """Test network with batch processing.""" - layer1 = Dense(input_size=2, output_size=3) - relu = ReLU() - layer2 = Dense(input_size=3, output_size=1) - - # Batch of 4 examples - x = Tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]) - - h1 = layer1(x) - h1_activated = relu(h1) - output = layer2(h1_activated) - - assert output.shape == (4, 1) - - def test_deep_network(self): - """Test deeper network composition.""" - # 5-layer network - layers = [ - Dense(input_size=10, output_size=8), - ReLU(), - Dense(input_size=8, output_size=6), - ReLU(), - Dense(input_size=6, output_size=4), - ReLU(), - Dense(input_size=4, output_size=2), - Sigmoid() - ] - - x = 
Tensor([[1.0] * 10]) # 10 features - - # Forward pass through all layers - current = x - for layer in layers: - current = layer(current) - - assert current.shape == (1, 2) - # Final output should be in sigmoid range - assert np.all(safe_numpy(current) >= 0) - assert np.all(safe_numpy(current) <= 1) - -class TestEdgeCases: - """Test edge cases and error conditions.""" - - def test_zero_input(self): - """Test layers with zero input.""" - layer = Dense(input_size=3, output_size=2) - relu = ReLU() - - x = Tensor([[0.0, 0.0, 0.0]]) - y = layer(x) - y_relu = relu(y) - - assert y.shape == (1, 2) - assert y_relu.shape == (1, 2) - - def test_large_input(self): - """Test layers with large input values.""" - layer = Dense(input_size=2, output_size=1) - sigmoid = Sigmoid() - - x = Tensor([[1000.0, -1000.0]]) - y = layer(x) - y_sigmoid = sigmoid(y) - - # Should not overflow - assert not np.any(np.isnan(safe_numpy(y_sigmoid))) - assert not np.any(np.isinf(safe_numpy(y_sigmoid))) - - def test_single_neuron(self): - """Test single neuron layers.""" - layer = Dense(input_size=1, output_size=1) - x = Tensor([[5.0]]) - y = layer(x) - - assert y.shape == (1, 1) - -# Stretch goal tests (these will be skipped if methods don't exist) -class TestStretchGoals: - """Stretch goal tests for advanced features.""" - - @pytest.mark.skip(reason="Stretch goal: Weight initialization methods") - def test_weight_initialization_methods(self): - """Test different weight initialization strategies.""" - # Xavier initialization - layer_xavier = Dense(input_size=100, output_size=50, init_method='xavier') - weights_xavier = safe_numpy(layer_xavier.weights) - - # He initialization - layer_he = Dense(input_size=100, output_size=50, init_method='he') - weights_he = safe_numpy(layer_he.weights) - - # Check initialization ranges - xavier_limit = np.sqrt(6.0 / (100 + 50)) - assert np.all(np.abs(weights_xavier) <= xavier_limit) - - he_limit = np.sqrt(2.0 / 100) - assert np.std(weights_he) <= he_limit * 1.5 # Some 
tolerance - - @pytest.mark.skip(reason="Stretch goal: Layer parameter access") - def test_layer_parameters(self): - """Test accessing and modifying layer parameters.""" - layer = Dense(input_size=3, output_size=2) - - # Should be able to access parameters - assert hasattr(layer, 'parameters') - params = layer.parameters() - assert len(params) == 2 # weights and bias - - # Should be able to set parameters - new_weights = Tensor(np.ones((3, 2))) - layer.set_weights(new_weights) - np.testing.assert_array_equal(safe_numpy(layer.weights), safe_numpy(new_weights)) - - @pytest.mark.skip(reason="Stretch goal: Additional activation functions") - def test_additional_activations(self): - """Test additional activation functions.""" - # Leaky ReLU - leaky_relu = LeakyReLU(alpha=0.1) - x = Tensor([[-1.0, 0.0, 1.0]]) - y = leaky_relu(x) - expected = [[-0.1, 0.0, 1.0]] - np.testing.assert_array_almost_equal(safe_numpy(y), expected) - - # Softmax - softmax = Softmax() - x = Tensor([[1.0, 2.0, 3.0]]) - y = softmax(x) - # Should sum to 1 - assert np.allclose(np.sum(safe_numpy(y)), 1.0) - - @pytest.mark.skip(reason="Stretch goal: Dropout layer") - def test_dropout_layer(self): - """Test dropout layer implementation.""" - dropout = Dropout(p=0.5) - x = Tensor([[1.0, 2.0, 3.0, 4.0]]) - - # Training mode - dropout.train() - y_train = dropout(x) - - # Inference mode - dropout.eval() - y_eval = dropout(x) - - # In eval mode, should be same as input - np.testing.assert_array_equal(safe_numpy(y_eval), safe_numpy(x)) - - @pytest.mark.skip(reason="Stretch goal: Batch normalization") - def test_batch_normalization(self): - """Test batch normalization layer.""" - bn = BatchNorm1d(num_features=3) - x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]) - y = bn(x) - - # Should normalize across batch dimension - assert y.shape == x.shape - # Mean should be close to 0, std close to 1 - assert np.allclose(np.mean(safe_numpy(y), axis=0), 0.0, atol=1e-6) - assert np.allclose(np.std(safe_numpy(y), axis=0), 
1.0, atol=1e-6) \ No newline at end of file diff --git a/modules/source/04_networks/networks_dev.py b/modules/source/04_networks/networks_dev.py index a256bc9a..ff081724 100644 --- a/modules/source/04_networks/networks_dev.py +++ b/modules/source/04_networks/networks_dev.py @@ -524,19 +524,19 @@ wide = create_mlp(10, [50], 1) - **Efficiency:** Balance between performance and computation ### Different Activation Functions -```python + ```python # ReLU networks (most common) relu_net = create_mlp(10, [20], 1, activation=ReLU) - + # Tanh networks (centered around 0) tanh_net = create_mlp(10, [20], 1, activation=Tanh) - + # Multi-class classification classifier = create_mlp(10, [20], 3, output_activation=Softmax) -``` + ``` Let's test different architectures! -""" +""" # %% [markdown] """ @@ -560,7 +560,7 @@ try: classifier = create_mlp(input_size=3, hidden_sizes=[4], output_size=3, output_activation=Softmax) # Test with sample data - x = Tensor([[1.0, 2.0, 3.0]]) + x = Tensor([[1.0, 2.0, 3.0]]) # Test ReLU network y_relu = relu_net(x) @@ -575,9 +575,9 @@ try: # Test multi-class classifier y_multi = classifier(x) assert y_multi.shape == (1, 3), "Multi-class classifier should work" - - # Check softmax properties - assert abs(np.sum(y_multi.data) - 1.0) < 1e-6, "Softmax outputs should sum to 1" + + # Check softmax properties + assert abs(np.sum(y_multi.data) - 1.0) < 1e-6, "Softmax outputs should sum to 1" print("✅ Multi-class classifier with Softmax works correctly") # Test different architectures @@ -595,7 +595,7 @@ try: print("✅ All network architectures work correctly") -except Exception as e: + except Exception as e: print(f"❌ Architecture test failed: {e}") raise @@ -643,18 +643,18 @@ try: iris_classifier = create_mlp(input_size=4, hidden_sizes=[8, 6], output_size=3, output_activation=Softmax) # Simulate iris features: [sepal_length, sepal_width, petal_length, petal_width] - iris_samples = Tensor([ + iris_samples = Tensor([ [5.1, 3.5, 1.4, 0.2], # Setosa [7.0, 3.2, 
4.7, 1.4], # Versicolor [6.3, 3.3, 6.0, 2.5] # Virginica - ]) - - iris_predictions = iris_classifier(iris_samples) + ]) + + iris_predictions = iris_classifier(iris_samples) assert iris_predictions.shape == (3, 3), "Iris classifier should output 3 classes for 3 samples" - + # Check softmax properties - row_sums = np.sum(iris_predictions.data, axis=1) - assert np.allclose(row_sums, 1.0), "Each prediction should sum to 1" + row_sums = np.sum(iris_predictions.data, axis=1) + assert np.allclose(row_sums, 1.0), "Each prediction should sum to 1" print("✅ Multi-class classification works correctly") # Test 2: Regression Task (Housing prices) @@ -691,38 +691,38 @@ try: # Test 4: Network Composition print("\n4. Network Composition Test:") # Create a feature extractor and classifier separately - feature_extractor = Sequential([ + feature_extractor = Sequential([ Dense(input_size=10, output_size=5), - ReLU(), + ReLU(), Dense(input_size=5, output_size=3), - ReLU() - ]) - - classifier_head = Sequential([ + ReLU() + ]) + + classifier_head = Sequential([ Dense(input_size=3, output_size=2), - Softmax() - ]) - + Softmax() + ]) + # Test composition raw_data = Tensor(np.random.randn(5, 10)) - features = feature_extractor(raw_data) - final_predictions = classifier_head(features) + features = feature_extractor(raw_data) + final_predictions = classifier_head(features) assert features.shape == (5, 3), "Feature extractor should output 3 features" assert final_predictions.shape == (5, 2), "Classifier should output 2 classes" - - row_sums = np.sum(final_predictions.data, axis=1) + + row_sums = np.sum(final_predictions.data, axis=1) assert np.allclose(row_sums, 1.0), "Composed network predictions should be valid" print("✅ Network composition works correctly") - + print("\n🎉 Integration test passed! 
Your networks work correctly for:") - print(" • Multi-class classification (Iris flowers)") - print(" • Regression tasks (housing prices)") + print(" • Multi-class classification (Iris flowers)") + print(" • Regression tasks (housing prices)") print(" • Deep learning architectures") print(" • Network composition and feature extraction") - -except Exception as e: - print(f"❌ Integration test failed: {e}") + +except Exception as e: + print(f"❌ Integration test failed: {e}") raise print("📈 Final Progress: Complete network architectures ready for real ML applications!") diff --git a/modules/source/04_networks/networks_dev_backup.py b/modules/source/04_networks/networks_dev_backup.py new file mode 100644 index 00000000..79f8abbb --- /dev/null +++ b/modules/source/04_networks/networks_dev_backup.py @@ -0,0 +1,1418 @@ +# --- +# jupyter: +# jupytext: +# text_representation: +# extension: .py +# format_name: percent +# format_version: '1.3' +# jupytext_version: 1.17.1 +# --- + +# %% [markdown] +""" +# Module 4: Networks - Neural Network Architectures + +Welcome to the Networks module! This is where we compose layers into complete neural network architectures. + +## Learning Goals +- Understand networks as function composition: `f(x) = layer_n(...layer_2(layer_1(x)))` +- Build the Sequential network architecture for composing layers +- Create common network patterns like MLPs (Multi-Layer Perceptrons) +- Visualize network architectures and understand their capabilities +- Master forward pass inference through complete networks + +## Build → Use → Understand +1. **Build**: Sequential networks that compose layers into complete architectures +2. **Use**: Create different network patterns and run inference +3.
**Understand**: How architecture design affects network behavior and capability +""" + +# %% nbgrader={"grade": false, "grade_id": "networks-imports", "locked": false, "schema_version": 3, "solution": false, "task": false} +#| default_exp core.networks + +#| export +import numpy as np +import sys +import os +from typing import List, Union, Optional, Callable +import matplotlib.pyplot as plt +import matplotlib.patches as patches +from matplotlib.patches import FancyBboxPatch, ConnectionPatch +import seaborn as sns + +# Import all the building blocks we need - try package first, then local modules +try: + from tinytorch.core.tensor import Tensor + from tinytorch.core.layers import Dense + from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax +except ImportError: + # For development, import from local modules + sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor')) + sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations')) + sys.path.append(os.path.join(os.path.dirname(__file__), '..', '03_layers')) + from tensor_dev import Tensor + from activations_dev import ReLU, Sigmoid, Tanh, Softmax + from layers_dev import Dense + +# %% nbgrader={"grade": false, "grade_id": "networks-setup", "locked": false, "schema_version": 3, "solution": false, "task": false} +#| hide +#| export +def _should_show_plots(): + """Check if we should show plots (disable during testing)""" + # Check multiple conditions that indicate we're in test mode + is_pytest = ( + 'pytest' in sys.modules or + 'test' in sys.argv or + os.environ.get('PYTEST_CURRENT_TEST') is not None or + any('test' in arg for arg in sys.argv) or + any('pytest' in arg for arg in sys.argv) + ) + + # Show plots in development mode (when not in test mode) + return not is_pytest + +# %% nbgrader={"grade": false, "grade_id": "networks-welcome", "locked": false, "schema_version": 3, "solution": false, "task": false} +print("🔥 TinyTorch Networks Module") +print(f"NumPy 
version: {np.__version__}") +print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}") +print("Ready to build neural network architectures!") + +# %% [markdown] +""" +## 📦 Where This Code Lives in the Final Package + +**Learning Side:** You work in `modules/source/04_networks/networks_dev.py` +**Building Side:** Code exports to `tinytorch.core.networks` + +```python +# Final package structure: +from tinytorch.core.networks import Sequential, MLP # Network architectures! +from tinytorch.core.layers import Dense, Conv2D # Building blocks +from tinytorch.core.activations import ReLU, Sigmoid, Tanh # Nonlinearity +from tinytorch.core.tensor import Tensor # Foundation +``` + +**Why this matters:** +- **Learning:** Focused modules for deep understanding +- **Production:** Proper organization like PyTorch's `torch.nn.Sequential` +- **Consistency:** All network architectures live together in `core.networks` +- **Integration:** Works seamlessly with layers, activations, and tensors +""" + +# %% [markdown] +""" +## 🧠 The Mathematical Foundation of Neural Networks + +### Function Composition at Scale +Neural networks are fundamentally about **function composition**: + +$$f(x) = f_n(f_{n-1}(\ldots f_2(f_1(x)) \ldots))$$ + +Each layer is a function, and the network is the composition of all these functions. 
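As a minimal sketch of this idea in plain Python (the toy functions below are hypothetical stand-ins, not TinyTorch layers), sequential composition is just a loop that feeds each function's output into the next:

```python
# Toy sketch of sequential function composition:
# compose(f1, f2, ..., fn)(x) == fn(...f2(f1(x))...)
def compose(*fns):
    def composed(x):
        for fn in fns:  # each function's output becomes the next one's input
            x = fn(x)
        return x
    return composed

double = lambda x: 2 * x
increment = lambda x: x + 1

net = compose(double, increment)  # net(x) = (2 * x) + 1
print(net(3))  # -> 7
```

This is the same pattern the `Sequential.forward` loop uses, with `Tensor`-valued layers in place of the toy functions.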
+ +### Why Function Composition is Powerful +- **Modularity**: Each layer has a specific purpose +- **Composability**: Simple functions combine to create complex behaviors +- **Universal approximation**: Deep compositions can approximate any continuous function +- **Hierarchical learning**: Early layers learn simple features, later layers learn complex patterns + +### The Architecture Design Space +Different arrangements of layers create different capabilities: +- **Depth**: More layers → more complex representations +- **Width**: More neurons per layer → more capacity per layer +- **Connections**: How layers connect affects information flow +- **Activation functions**: Add nonlinearity for complex patterns + +### Connection to Real ML Systems +Every framework uses sequential composition: +- **PyTorch**: `torch.nn.Sequential(layer1, layer2, layer3)` +- **TensorFlow**: `tf.keras.Sequential([layer1, layer2, layer3])` +- **JAX (Flax)**: `flax.linen.Sequential([layer1, layer2, layer3])` +- **TinyTorch**: `tinytorch.core.networks.Sequential([layer1, layer2, layer3])` (what we're building!) + +### Performance and Design Considerations +- **Forward pass efficiency**: Sequential computation through layers +- **Memory management**: Intermediate activations storage +- **Gradient flow**: How information flows backward (for training) +- **Architecture search**: Finding optimal network structures +""" + +# %% [markdown] +""" +## Step 1: What is a Network? + +### Definition +A **network** is a composition of layers that transforms input data into output predictions.
Think of it as a pipeline of transformations: + +``` +Input → Layer1 → Layer2 → Layer3 → Output +``` + +### The Mathematical Foundation: Function Composition Theory + +#### **Function Composition in Mathematics** +In mathematics, function composition combines simple functions to create complex ones: + +$$(f \circ g)(x) = f(g(x))$$ + +Neural network composition: +$$h(x) = f_n(f_{n-1}(\ldots f_2(f_1(x)) \ldots))$$ + +#### **Why Composition is Powerful** +1. **Modularity**: Each layer has a specific, well-defined purpose +2. **Composability**: Simple functions combine to create arbitrarily complex behaviors +3. **Hierarchical learning**: Early layers learn simple features, later layers learn complex patterns +4. **Universal approximation**: Deep compositions can approximate any continuous function + +#### **The Emergence of Intelligence** +Complex behavior emerges from simple layer composition: + +```python +# Example: Image classification +raw_pixels → [Edge detectors] → [Shape detectors] → [Object detectors] → [Class predictor] + ↓ ↓ ↓ ↓ ↓ + [28x28] [64 features] [128 features] [256 features] [10 classes] +``` + +### Architectural Design Principles + +#### **1. Depth vs. Width Trade-offs** +- **Deep networks**: More layers → more complex representations + - **Advantages**: Better feature hierarchies, parameter efficiency + - **Disadvantages**: Harder to train, gradient problems +- **Wide networks**: More neurons per layer → more capacity per layer + - **Advantages**: Easier to train, parallel computation + - **Disadvantages**: More parameters, potential overfitting + +#### **2. Information Flow Patterns** +```python +# Sequential flow (what we're building): +x → layer1 → layer2 → layer3 → output + +# Residual flow (advanced): +x → layer1 → layer2 + x → layer3 → output + +# Attention flow (transformers): +x → attention(x, x, x) → feedforward → output +``` + +#### **3. 
Activation Function Placement** +```python +# Standard pattern: +linear_transformation → nonlinear_activation → next_layer + +# Why this works: +# Linear + Linear = Linear (no increase in expressiveness) +# Linear + Nonlinear + Linear = Nonlinear (exponential increase in expressiveness) +``` + +### Real-World Architecture Examples + +#### **Multi-Layer Perceptron (MLP)** +```python +# Classic feedforward network +input → dense(512) → relu → dense(256) → relu → dense(10) → softmax +``` +- **Use cases**: Tabular data, feature learning, classification +- **Strengths**: Universal approximation, well-understood +- **Weaknesses**: Doesn't exploit spatial/temporal structure + +#### **Convolutional Neural Network (CNN)** +```python +# Exploits spatial structure +input → conv2d → relu → pool → conv2d → relu → pool → dense → softmax +``` +- **Use cases**: Image processing, computer vision +- **Strengths**: Translation invariance, parameter sharing +- **Weaknesses**: Fixed receptive field, not great for sequences + +#### **Recurrent Neural Network (RNN)** +```python +# Processes sequences +input_t → rnn_cell(hidden_{t-1}) → hidden_t → output_t +``` +- **Use cases**: Natural language processing, time series +- **Strengths**: Variable length sequences, memory +- **Weaknesses**: Sequential computation, gradient problems + +#### **Transformer** +```python +# Attention-based processing +input → attention → feedforward → attention → feedforward → output +``` +- **Use cases**: Language models, machine translation +- **Strengths**: Parallelizable, long-range dependencies +- **Weaknesses**: Quadratic complexity, large memory requirements + +### The Network Design Process + +#### **1. Problem Analysis** +- **Data type**: Images, text, tabular, time series? +- **Task type**: Classification, regression, generation? +- **Constraints**: Latency, memory, accuracy requirements? + +#### **2. 
Architecture Selection** +- **Start simple**: Begin with basic MLP +- **Add structure**: Incorporate domain-specific inductive biases +- **Scale up**: Increase depth/width as needed + +#### **3. Component Design** +- **Input layer**: Match data dimensions +- **Hidden layers**: Gradual dimension reduction typical +- **Output layer**: Match task requirements (classes, regression targets) +- **Activation functions**: ReLU for hidden, task-specific for output + +#### **4. Optimization Considerations** +- **Gradient flow**: Ensure gradients can flow through the network +- **Computational efficiency**: Balance expressiveness with speed +- **Memory usage**: Consider intermediate activation storage + +### Performance Characteristics + +#### **Forward Pass Complexity** +For a network with L layers, each with n neurons: +- **Time complexity**: O(L × n²) for dense layers +- **Space complexity**: O(L × n) for activations +- **Parallelization**: Each layer can be parallelized + +#### **Memory Management** +```python +# Memory usage during forward pass: +input_memory = batch_size × input_size +hidden_memory = batch_size × hidden_size × num_layers +output_memory = batch_size × output_size +total_memory = input_memory + hidden_memory + output_memory +``` + +#### **Computational Optimization** +- **Batch processing**: Process multiple samples simultaneously +- **Vectorization**: Use optimized matrix operations +- **Hardware acceleration**: Leverage GPUs/TPUs for parallel computation + +### Connection to Previous Modules + +#### **From Module 1 (Tensor)** +- **Data flow**: Tensors flow through the network +- **Shape management**: Ensure compatible dimensions between layers + +#### **From Module 2 (Activations)** +- **Nonlinearity**: Activation functions between layers enable complex learning +- **Function choice**: Different activations for different purposes + +#### **From Module 3 (Layers)** +- **Building blocks**: Layers are the fundamental components +- **Composition**: Networks 
compose layers into complete architectures + +### Why Networks Matter: The Scaling Laws + +#### **Empirical Observations** +- **More parameters**: Generally better performance (up to a point) +- **More data**: Enables training of larger networks +- **More compute**: Allows exploration of larger architectures + +#### **The Deep Learning Revolution** +```python +# Pre-2012: Shallow networks +input → hidden(100) → output + +# Post-2012: Deep networks +input → hidden(512) → hidden(512) → hidden(512) → ... → output +``` + +The key insight: **Depth enables hierarchical feature learning** + +Let's start building our Sequential network architecture! +""" + +# %% nbgrader={"grade": false, "grade_id": "sequential-class", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +class Sequential: + """ + Sequential Network: Composes layers in sequence + + The most fundamental network architecture. + Applies layers in order: f(x) = layer_n(...layer_2(layer_1(x))) + """ + + def __init__(self, layers: List): + """ + Initialize Sequential network with layers. + + Args: + layers: List of layers to compose in order + + TODO: Store the layers and implement forward pass + + APPROACH: + 1. Store the layers list as an instance variable + 2. This creates the network architecture ready for forward pass + + EXAMPLE: + Sequential([Dense(3,4), ReLU(), Dense(4,2)]) + creates a 3-layer network: Dense → ReLU → Dense + + HINTS: + - Store layers in self.layers + - This is the foundation for all network architectures + """ + ### BEGIN SOLUTION + self.layers = layers + ### END SOLUTION + + def forward(self, x: Tensor) -> Tensor: + """ + Forward pass through all layers in sequence. + + Args: + x: Input tensor + + Returns: + Output tensor after passing through all layers + + TODO: Implement sequential forward pass through all layers + + APPROACH: + 1. Start with the input tensor + 2. Apply each layer in sequence + 3. Each layer's output becomes the next layer's input + 4. 
Return the final output + + EXAMPLE: + Input: Tensor([[1, 2, 3]]) + Layer1 (Dense): Tensor([[1.4, 2.8]]) + Layer2 (ReLU): Tensor([[1.4, 2.8]]) + Layer3 (Dense): Tensor([[0.7]]) + Output: Tensor([[0.7]]) + + HINTS: + - Use a for loop: for layer in self.layers: + - Apply each layer: x = layer(x) + - The output of one layer becomes input to the next + - Return the final result + """ + ### BEGIN SOLUTION + # Apply each layer in sequence + for layer in self.layers: + x = layer(x) + return x + ### END SOLUTION + + def __call__(self, x: Tensor) -> Tensor: + """Make network callable: network(x) same as network.forward(x)""" + return self.forward(x) + +# %% [markdown] +""" +### 🧪 Unit Test: Sequential Network + +Let's test your Sequential network implementation! This is the foundation of all neural network architectures. + +**This is a unit test** - it tests one specific class (Sequential network) in isolation. +""" + +# %% nbgrader={"grade": true, "grade_id": "test-sequential-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false} +# Test Sequential network immediately after implementation +print("🔬 Unit Test: Sequential Network...") + +# Create a simple 2-layer network: 3 → 4 → 2 +try: + network = Sequential([ + Dense(input_size=3, output_size=4), + ReLU(), + Dense(input_size=4, output_size=2), + Sigmoid() + ]) + + print(f"Network created with {len(network.layers)} layers") + print("✅ Sequential network creation successful") + + # Test with sample data + x = Tensor([[1.0, 2.0, 3.0]]) + print(f"Input: {x}") + + # Forward pass + y = network(x) + print(f"Output: {y}") + print(f"Output shape: {y.shape}") + + # Verify the network works + assert y.shape == (1, 2), f"Expected shape (1, 2), got {y.shape}" + print("✅ Sequential network produces correct output shape") + + # Test that sigmoid output is in valid range + assert np.all(y.data >= 0) and np.all(y.data <= 1), "Sigmoid output should be between 0 and 1" + print("✅ Sequential network 
output is in valid range") + + # Test that layers are stored correctly + assert len(network.layers) == 4, f"Expected 4 layers, got {len(network.layers)}" + print("✅ Sequential network stores layers correctly") + +except Exception as e: + print(f"❌ Sequential network test failed: {e}") + raise + +# Show the network architecture +print("🎯 Sequential network behavior:") +print(" Applies layers in sequence: f(g(h(x)))") +print(" Input flows through each layer in order") +print(" Output of layer i becomes input of layer i+1") +print("📈 Progress: Sequential network ✓") + +# %% [markdown] +""" +## Step 2: Building Multi-Layer Perceptrons (MLPs) + +### What is an MLP? +A **Multi-Layer Perceptron** is the classic neural network architecture: + +``` +Input → Dense → Activation → Dense → Activation → ... → Dense → Output +``` + +### Why MLPs are Important +- **Universal approximation**: Can approximate any continuous function +- **Foundation**: Basis for understanding all neural networks +- **Versatile**: Works for classification, regression, and more +- **Simple**: Easy to understand and implement + +### MLP Architecture Pattern +``` +create_mlp(3, [4, 2], 1) creates: +Dense(3→4) → ReLU → Dense(4→2) → ReLU → Dense(2→1) → Sigmoid +``` + +### Real-World Applications +- **Tabular data**: Customer analytics, financial modeling +- **Feature learning**: Learning representations from raw data +- **Classification**: Spam detection, medical diagnosis +- **Regression**: Price prediction, time series forecasting +""" + +# %% nbgrader={"grade": false, "grade_id": "create-mlp", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def create_mlp(input_size: int, hidden_sizes: List[int], output_size: int, + activation=ReLU, output_activation=Sigmoid) -> Sequential: + """ + Create a Multi-Layer Perceptron (MLP) network. 
+ + Args: + input_size: Number of input features + hidden_sizes: List of hidden layer sizes + output_size: Number of output features + activation: Activation function for hidden layers (default: ReLU) + output_activation: Activation function for output layer (default: Sigmoid) + + Returns: + Sequential network with MLP architecture + + TODO: Implement MLP creation with alternating Dense and activation layers. + + APPROACH: + 1. Start with an empty list of layers + 2. Add layers in this pattern: + - Dense(input_size → first_hidden_size) + - Activation() + - Dense(first_hidden_size → second_hidden_size) + - Activation() + - ... + - Dense(last_hidden_size → output_size) + - Output_activation() + 3. Return Sequential(layers) + + EXAMPLE: + create_mlp(3, [4, 2], 1) creates: + Dense(3→4) → ReLU → Dense(4→2) → ReLU → Dense(2→1) → Sigmoid + + HINTS: + - Start with layers = [] + - Track current_size starting with input_size + - For each hidden_size: add Dense(current_size, hidden_size), then activation + - Finally add Dense(last_hidden_size, output_size), then output_activation + - Return Sequential(layers) + """ + ### BEGIN SOLUTION + layers = [] + current_size = input_size + + # Add hidden layers with activations + for hidden_size in hidden_sizes: + layers.append(Dense(current_size, hidden_size)) + layers.append(activation()) + current_size = hidden_size + + # Add output layer with output activation + layers.append(Dense(current_size, output_size)) + layers.append(output_activation()) + + return Sequential(layers) + ### END SOLUTION + +# %% [markdown] +""" +### 🧪 Unit Test: MLP Creation + +Let's test your MLP creation function! This builds complete neural networks with a single function call. + +**This is a unit test** - it tests one specific function (create_mlp) in isolation. 
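The alternating pattern `create_mlp` is expected to build can be sketched with plain strings standing in for the real layer objects (`mlp_layer_plan` is a hypothetical helper, for illustration only):

```python
# Sketch of the layer sequence create_mlp(3, [4, 2], 1) should produce,
# using strings as stand-ins for the real Dense/ReLU/Sigmoid objects.
def mlp_layer_plan(input_size, hidden_sizes, output_size):
    plan, current = [], input_size
    for hidden in hidden_sizes:          # one Dense + one activation per hidden layer
        plan.append(f"Dense({current}->{hidden})")
        plan.append("ReLU")
        current = hidden
    plan.append(f"Dense({current}->{output_size})")  # output layer
    plan.append("Sigmoid")
    return plan

print(mlp_layer_plan(3, [4, 2], 1))
# -> ['Dense(3->4)', 'ReLU', 'Dense(4->2)', 'ReLU', 'Dense(2->1)', 'Sigmoid']
```

Counting the entries explains the `expected_layers = 6` assertion in the test below: three Dense layers plus three activations.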
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-mlp-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false} +# Test MLP creation immediately after implementation +print("🔬 Unit Test: MLP Creation...") + +# Create a simple MLP: 3 → 4 → 2 → 1 +try: + mlp = create_mlp(input_size=3, hidden_sizes=[4, 2], output_size=1) + + print(f"MLP created with {len(mlp.layers)} layers") + print("✅ MLP creation successful") + + # Test the structure - should have 6 layers: Dense, ReLU, Dense, ReLU, Dense, Sigmoid + expected_layers = 6 # 3 Dense + 2 ReLU + 1 Sigmoid + assert len(mlp.layers) == expected_layers, f"Expected {expected_layers} layers, got {len(mlp.layers)}" + print("✅ MLP has correct number of layers") + + # Test with sample data + x = Tensor([[1.0, 2.0, 3.0]]) + y = mlp(x) + print(f"MLP input: {x}") + print(f"MLP output: {y}") + print(f"MLP output shape: {y.shape}") + + # Verify the output + assert y.shape == (1, 1), f"Expected shape (1, 1), got {y.shape}" + print("✅ MLP produces correct output shape") + + # Test that sigmoid output is in valid range + assert np.all(y.data >= 0) and np.all(y.data <= 1), "Sigmoid output should be between 0 and 1" + print("✅ MLP output is in valid range") + +except Exception as e: + print(f"❌ MLP creation test failed: {e}") + raise + +# Test different architectures +try: + # Test shallow network + shallow_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1) + assert len(shallow_net.layers) == 4, f"Shallow network should have 4 layers, got {len(shallow_net.layers)}" + + # Test deep network + deep_net = create_mlp(input_size=3, hidden_sizes=[4, 4, 4], output_size=1) + assert len(deep_net.layers) == 8, f"Deep network should have 8 layers, got {len(deep_net.layers)}" + + # Test wide network + wide_net = create_mlp(input_size=3, hidden_sizes=[10], output_size=1) + assert len(wide_net.layers) == 4, f"Wide network should have 4 layers, got {len(wide_net.layers)}" + + print("✅ Different MLP 
architectures work correctly") + +except Exception as e: + print(f"❌ MLP architecture test failed: {e}") + raise + +# Show the MLP pattern +print("🎯 MLP creation pattern:") +print(" Input → Dense → Activation → Dense → Activation → ... → Dense → Output_Activation") +print(" Automatically creates the complete architecture") +print(" Handles any number of hidden layers") +print("📈 Progress: Sequential network ✓, MLP creation ✓") +print("🚀 Complete neural networks ready!") + +# %% [markdown] +""" +### 🧪 Test Your Network Implementations + +Once you implement the functions above, run these cells to test them: +""" + +# %% nbgrader={"grade": true, "grade_id": "test-sequential", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +# Test the Sequential network +print("Testing Sequential network...") + +# Create a simple 2-layer network: 3 → 4 → 2 +network = Sequential([ + Dense(input_size=3, output_size=4), + ReLU(), + Dense(input_size=4, output_size=2), + Sigmoid() +]) + +print(f"Network created with {len(network.layers)} layers") + +# Test with sample data +x = Tensor([[1.0, 2.0, 3.0]]) +print(f"Input: {x}") + +# Forward pass +y = network(x) +print(f"Output: {y}") +print(f"Output shape: {y.shape}") + +# Verify the network works +assert y.shape == (1, 2), f"Expected shape (1, 2), got {y.shape}" +assert np.all(y.data >= 0) and np.all(y.data <= 1), "Sigmoid output should be between 0 and 1" + +print("✅ Sequential network tests passed!") + +# %% nbgrader={"grade": true, "grade_id": "test-mlp", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +# Test MLP creation +print("Testing MLP creation...") + +# Create a simple MLP: 3 → 4 → 2 → 1 +mlp = create_mlp(input_size=3, hidden_sizes=[4, 2], output_size=1) + +print(f"MLP created with {len(mlp.layers)} layers") + +# Test the structure +expected_layers = [ + Dense, # 3 → 4 + ReLU, # activation + Dense, # 4 → 2 + ReLU, # activation + Dense, # 2 → 1 + Sigmoid # 
output activation +] + +assert len(mlp.layers) == 6, f"Expected 6 layers, got {len(mlp.layers)}" + +# Test with sample data +x = Tensor([[1.0, 2.0, 3.0]]) +y = mlp(x) +print(f"MLP output: {y}") +print(f"MLP output shape: {y.shape}") + +# Verify the output +assert y.shape == (1, 1), f"Expected shape (1, 1), got {y.shape}" +assert np.all(y.data >= 0) and np.all(y.data <= 1), "Sigmoid output should be between 0 and 1" + +print("✅ MLP creation tests passed!") + +# %% nbgrader={"grade": true, "grade_id": "test-network-comparison", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +# Test different network architectures +print("Testing different network architectures...") + +# Create networks with different architectures +shallow_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1) +deep_net = create_mlp(input_size=3, hidden_sizes=[4, 4, 4], output_size=1) +wide_net = create_mlp(input_size=3, hidden_sizes=[10], output_size=1) + +# Test input +x = Tensor([[1.0, 2.0, 3.0]]) + +# Test all networks +shallow_out = shallow_net(x) +deep_out = deep_net(x) +wide_out = wide_net(x) + +print(f"Shallow network output: {shallow_out}") +print(f"Deep network output: {deep_out}") +print(f"Wide network output: {wide_out}") + +# Verify all outputs are valid +for name, output in [("Shallow", shallow_out), ("Deep", deep_out), ("Wide", wide_out)]: + assert output.shape == (1, 1), f"{name} network output shape should be (1, 1), got {output.shape}" + assert np.all(output.data >= 0) and np.all(output.data <= 1), f"{name} network output should be between 0 and 1" + +print("✅ Network architecture comparison tests passed!") + +# %% [markdown] +""" +## 🎯 Module Summary + +Congratulations! 
You've successfully implemented complete neural network architectures: + +### What You've Accomplished +✅ **Sequential Networks**: The fundamental architecture for composing layers +✅ **Function Composition**: Understanding how layers combine to create complex behaviors +✅ **MLP Creation**: Building Multi-Layer Perceptrons with flexible architectures +✅ **Architecture Patterns**: Creating shallow, deep, and wide networks +✅ **Forward Pass**: Complete inference through multi-layer networks + +### Key Concepts You've Learned +- **Networks are function composition**: Complex behavior from simple building blocks +- **Sequential architecture**: The foundation of most neural networks +- **MLP patterns**: Dense → Activation → Dense → Activation → Output +- **Architecture design**: How depth and width affect network capability +- **Forward pass**: How data flows through complete networks + +### Mathematical Foundations +- **Function composition**: f(x) = f_n(...f_2(f_1(x))) +- **Universal approximation**: MLPs can approximate any continuous function +- **Hierarchical learning**: Early layers learn simple features, later layers learn complex patterns +- **Nonlinearity**: Activation functions enable complex decision boundaries + +### Real-World Applications +- **Classification**: Image recognition, spam detection, medical diagnosis +- **Regression**: Price prediction, time series forecasting +- **Feature learning**: Extracting meaningful representations from raw data +- **Transfer learning**: Using pre-trained networks for new tasks + +### Next Steps +1. **Export your code**: `tito package nbdev --export 04_networks` +2. **Test your implementation**: `tito module test 04_networks` +3. 
**Use your networks**: + ```python + from tinytorch.core.networks import Sequential, create_mlp + from tinytorch.core.layers import Dense + from tinytorch.core.activations import ReLU + + # Create custom network + network = Sequential([Dense(10, 5), ReLU(), Dense(5, 1)]) + + # Create MLP + mlp = create_mlp(10, [20, 10], 1) + ``` +4. **Move to Module 5**: Start building convolutional networks for images! + +**Ready for the next challenge?** Let's add convolutional layers for image processing and build CNNs! +""" + +# %% [markdown] +""" +## 🧪 Comprehensive Testing: Neural Network Architectures + +Let's thoroughly test your network implementations to ensure they work correctly in all scenarios. +This comprehensive testing ensures your networks are robust and ready for real ML applications. +""" + +# %% nbgrader={"grade": true, "grade_id": "test-networks-comprehensive", "locked": true, "points": 30, "schema_version": 3, "solution": false, "task": false} +def test_networks_comprehensive(): + """Comprehensive test of Sequential networks and MLP creation.""" + print("🔬 Testing neural network architectures comprehensively...") + + tests_passed = 0 + total_tests = 10 + + # Test 1: Sequential Network Creation and Structure + try: + # Create a simple 2-layer network + network = Sequential([ + Dense(input_size=3, output_size=4), + ReLU(), + Dense(input_size=4, output_size=2), + Sigmoid() + ]) + + assert len(network.layers) == 4, f"Expected 4 layers, got {len(network.layers)}" + + # Test layer types + assert isinstance(network.layers[0], Dense), "First layer should be Dense" + assert isinstance(network.layers[1], ReLU), "Second layer should be ReLU" + assert isinstance(network.layers[2], Dense), "Third layer should be Dense" + assert isinstance(network.layers[3], Sigmoid), "Fourth layer should be Sigmoid" + + print("✅ Sequential network creation and structure") + tests_passed += 1 + except Exception as e: + print(f"❌ Sequential network creation failed: {e}") + + # Test 2: 
Sequential Network Forward Pass + try: + network = Sequential([ + Dense(input_size=3, output_size=4), + ReLU(), + Dense(input_size=4, output_size=2), + Sigmoid() + ]) + + # Test single sample + x_single = Tensor([[1.0, 2.0, 3.0]]) + y_single = network(x_single) + + assert y_single.shape == (1, 2), f"Single sample output should be (1, 2), got {y_single.shape}" + assert np.all((y_single.data >= 0) & (y_single.data <= 1)), "Sigmoid output should be in [0,1]" + + # Test batch processing + x_batch = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]) + y_batch = network(x_batch) + + assert y_batch.shape == (3, 2), f"Batch output should be (3, 2), got {y_batch.shape}" + assert np.all((y_batch.data >= 0) & (y_batch.data <= 1)), "All batch outputs should be in [0,1]" + + print("✅ Sequential network forward pass: single and batch") + tests_passed += 1 + except Exception as e: + print(f"❌ Sequential network forward pass failed: {e}") + + # Test 3: MLP Creation Basic Functionality + try: + # Create simple MLP: 3 → 4 → 2 → 1 + mlp = create_mlp(input_size=3, hidden_sizes=[4, 2], output_size=1) + + # Should have 6 layers: Dense, ReLU, Dense, ReLU, Dense, Sigmoid + expected_layers = 6 + assert len(mlp.layers) == expected_layers, f"Expected {expected_layers} layers, got {len(mlp.layers)}" + + # Test layer pattern + layer_types = [type(layer).__name__ for layer in mlp.layers] + expected_pattern = ['Dense', 'ReLU', 'Dense', 'ReLU', 'Dense', 'Sigmoid'] + assert layer_types == expected_pattern, f"Expected pattern {expected_pattern}, got {layer_types}" + + # Test forward pass + x = Tensor([[1.0, 2.0, 3.0]]) + y = mlp(x) + + assert y.shape == (1, 1), f"MLP output should be (1, 1), got {y.shape}" + assert np.all((y.data >= 0) & (y.data <= 1)), "MLP output should be in [0,1]" + + print("✅ MLP creation basic functionality") + tests_passed += 1 + except Exception as e: + print(f"❌ MLP creation basic failed: {e}") + + # Test 4: Different MLP Architectures + try: + # Test shallow 
network (1 hidden layer) + shallow_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1) + assert len(shallow_net.layers) == 4, f"Shallow network should have 4 layers, got {len(shallow_net.layers)}" + + # Test deep network (3 hidden layers) + deep_net = create_mlp(input_size=3, hidden_sizes=[4, 4, 4], output_size=1) + assert len(deep_net.layers) == 8, f"Deep network should have 8 layers, got {len(deep_net.layers)}" + + # Test wide network (1 large hidden layer) + wide_net = create_mlp(input_size=3, hidden_sizes=[20], output_size=1) + assert len(wide_net.layers) == 4, f"Wide network should have 4 layers, got {len(wide_net.layers)}" + + # Test very deep network + very_deep_net = create_mlp(input_size=3, hidden_sizes=[5, 5, 5, 5, 5], output_size=1) + assert len(very_deep_net.layers) == 12, f"Very deep network should have 12 layers, got {len(very_deep_net.layers)}" + + # Test all networks work + x = Tensor([[1.0, 2.0, 3.0]]) + for name, net in [("Shallow", shallow_net), ("Deep", deep_net), ("Wide", wide_net), ("Very Deep", very_deep_net)]: + y = net(x) + assert y.shape == (1, 1), f"{name} network output shape should be (1, 1), got {y.shape}" + assert np.all((y.data >= 0) & (y.data <= 1)), f"{name} network output should be in [0,1]" + + print("✅ Different MLP architectures: shallow, deep, wide, very deep") + tests_passed += 1 + except Exception as e: + print(f"❌ Different MLP architectures failed: {e}") + + # Test 5: MLP with Different Activation Functions + try: + # Test with Tanh activation + mlp_tanh = create_mlp(input_size=3, hidden_sizes=[4], output_size=1, activation=Tanh, output_activation=Sigmoid) + + # Check layer types + layer_types = [type(layer).__name__ for layer in mlp_tanh.layers] + expected_pattern = ['Dense', 'Tanh', 'Dense', 'Sigmoid'] + assert layer_types == expected_pattern, f"Tanh MLP pattern should be {expected_pattern}, got {layer_types}" + + # Test forward pass + x = Tensor([[1.0, 2.0, 3.0]]) + y = mlp_tanh(x) + assert y.shape == (1, 
1), "Tanh MLP should work correctly"
+        
+        # Test a different output activation: Softmax with ReLU hidden layers
+        mlp_softmax_out = create_mlp(input_size=3, hidden_sizes=[4], output_size=3, activation=ReLU, output_activation=Softmax)
+        y_multi = mlp_softmax_out(x)
+        assert y_multi.shape == (1, 3), "Multi-output MLP should work"
+        
+        # Check softmax properties
+        assert abs(np.sum(y_multi.data) - 1.0) < 1e-6, "Softmax outputs should sum to 1"
+        
+        print("✅ MLP with different activation functions: Tanh, Softmax")
+        tests_passed += 1
+    except Exception as e:
+        print(f"❌ MLP with different activations failed: {e}")
+    
+    # Test 6: Network Layer Composition
+    try:
+        # Test that the network correctly chains layers
+        network = Sequential([
+            Dense(input_size=4, output_size=3),
+            ReLU(),
+            Dense(input_size=3, output_size=2),
+            Tanh(),
+            Dense(input_size=2, output_size=1),
+            Sigmoid()
+        ])
+        
+        x = Tensor([[1.0, -1.0, 2.0, -2.0]])
+        
+        # Manual forward pass to verify composition
+        h1 = network.layers[0](x)   # Dense
+        h2 = network.layers[1](h1)  # ReLU
+        h3 = network.layers[2](h2)  # Dense
+        h4 = network.layers[3](h3)  # Tanh
+        h5 = network.layers[4](h4)  # Dense
+        h6 = network.layers[5](h5)  # Sigmoid
+        
+        # Compare with network forward pass
+        y_network = network(x)
+        
+        assert np.allclose(h6.data, y_network.data), "Manual and network forward pass should match"
+        
+        # Check intermediate shapes
+        assert h1.shape == (1, 3), f"h1 shape should be (1, 3), got {h1.shape}"
+        assert h2.shape == (1, 3), f"h2 shape should be (1, 3), got {h2.shape}"
+        assert h3.shape == (1, 2), f"h3 shape should be (1, 2), got {h3.shape}"
+        assert h4.shape == (1, 2), f"h4 shape should be (1, 2), got {h4.shape}"
+        assert h5.shape == (1, 1), f"h5 shape should be (1, 1), got {h5.shape}"
+        assert h6.shape == (1, 1), f"h6 shape should be (1, 1), got {h6.shape}"
+        
+        # Check activation effects
+        assert np.all(h2.data >= 0), "ReLU should produce non-negative values"
+        assert np.all((h4.data >= -1) & (h4.data <= 1)), "Tanh should produce values in [-1,1]"
+        assert 
np.all((h6.data >= 0) & (h6.data <= 1)), "Sigmoid should produce values in [0,1]" + + print("✅ Network layer composition: correct chaining and shapes") + tests_passed += 1 + except Exception as e: + print(f"❌ Network layer composition failed: {e}") + + # Test 7: Edge Cases and Robustness + try: + # Test with minimal network (1 layer) + minimal_net = Sequential([Dense(input_size=2, output_size=1)]) + x_minimal = Tensor([[1.0, 2.0]]) + y_minimal = minimal_net(x_minimal) + assert y_minimal.shape == (1, 1), "Minimal network should work" + + # Test with single neuron layers + single_neuron_net = create_mlp(input_size=1, hidden_sizes=[1], output_size=1) + x_single = Tensor([[5.0]]) + y_single_neuron = single_neuron_net(x_single) + assert y_single_neuron.shape == (1, 1), "Single neuron network should work" + + # Test with large batch + large_net = create_mlp(input_size=10, hidden_sizes=[5], output_size=1) + x_large_batch = Tensor(np.random.randn(100, 10)) + y_large_batch = large_net(x_large_batch) + assert y_large_batch.shape == (100, 1), "Large batch should work" + assert not np.any(np.isnan(y_large_batch.data)), "Should not produce NaN" + assert not np.any(np.isinf(y_large_batch.data)), "Should not produce Inf" + + print("✅ Edge cases: minimal networks, single neurons, large batches") + tests_passed += 1 + except Exception as e: + print(f"❌ Edge cases failed: {e}") + + # Test 8: Multi-class Classification Networks + try: + # Create multi-class classifier + classifier = create_mlp(input_size=4, hidden_sizes=[8, 6], output_size=3, output_activation=Softmax) + + # Test with batch of samples + x_multi = Tensor(np.random.randn(5, 4)) + y_multi = classifier(x_multi) + + assert y_multi.shape == (5, 3), f"Multi-class output should be (5, 3), got {y_multi.shape}" + + # Check softmax properties for each sample + row_sums = np.sum(y_multi.data, axis=1) + assert np.allclose(row_sums, 1.0), "Each sample should have probabilities summing to 1" + assert np.all(y_multi.data > 0), "All 
probabilities should be positive" + + # Test that argmax gives valid class predictions + predictions = np.argmax(y_multi.data, axis=1) + assert np.all((predictions >= 0) & (predictions < 3)), "Predictions should be valid class indices" + + print("✅ Multi-class classification: softmax probabilities, valid predictions") + tests_passed += 1 + except Exception as e: + print(f"❌ Multi-class classification failed: {e}") + + # Test 9: Real ML Scenarios + try: + # Scenario 1: Binary classification (like spam detection) + spam_classifier = create_mlp(input_size=100, hidden_sizes=[50, 20], output_size=1, output_activation=Sigmoid) + + # Simulate email features + email_features = Tensor(np.random.randn(10, 100)) + spam_probabilities = spam_classifier(email_features) + + assert spam_probabilities.shape == (10, 1), "Spam classifier should output probabilities for each email" + assert np.all((spam_probabilities.data >= 0) & (spam_probabilities.data <= 1)), "Should output valid probabilities" + + # Scenario 2: Image classification (like MNIST) + mnist_classifier = create_mlp(input_size=784, hidden_sizes=[256, 128], output_size=10, output_activation=Softmax) + + # Simulate flattened images + images = Tensor(np.random.randn(32, 784)) # Batch of 32 images + class_probabilities = mnist_classifier(images) + + assert class_probabilities.shape == (32, 10), "MNIST classifier should output 10 class probabilities" + + # Check softmax properties + batch_sums = np.sum(class_probabilities.data, axis=1) + assert np.allclose(batch_sums, 1.0), "Each image should have class probabilities summing to 1" + + # Scenario 3: Regression (like house price prediction) + price_predictor = Sequential([ + Dense(input_size=8, output_size=16), + ReLU(), + Dense(input_size=16, output_size=8), + ReLU(), + Dense(input_size=8, output_size=1) # No activation for regression + ]) + + # Simulate house features + house_features = Tensor(np.random.randn(5, 8)) + predicted_prices = price_predictor(house_features) + + 
assert predicted_prices.shape == (5, 1), "Price predictor should output one price per house"
+        
+        print("✅ Real ML scenarios: spam detection, image classification, price prediction")
+        tests_passed += 1
+    except Exception as e:
+        print(f"❌ Real ML scenarios failed: {e}")
+    
+    # Test 10: Network Comparison and Analysis
+    try:
+        # Compare architectures with similar (but not identical) parameter budgets
+        x_test = Tensor([[1.0, 2.0, 3.0, 4.0]])
+        
+        # Wide network: 4 → 20 → 1 (parameters: 4*20 + 20 + 20*1 + 1 = 121)
+        wide_network = create_mlp(input_size=4, hidden_sizes=[20], output_size=1)
+        
+        # Deep network: 4 → 10 → 10 → 1 (parameters: 4*10 + 10 + 10*10 + 10 + 10*1 + 1 = 171)
+        deep_network = create_mlp(input_size=4, hidden_sizes=[10, 10], output_size=1)
+        
+        # Test both networks
+        wide_output = wide_network(x_test)
+        deep_output = deep_network(x_test)
+        
+        assert wide_output.shape == (1, 1), "Wide network should produce correct output"
+        assert deep_output.shape == (1, 1), "Deep network should produce correct output"
+        
+        # Both should be valid but potentially different
+        assert np.all((wide_output.data >= 0) & (wide_output.data <= 1)), "Wide network output should be valid"
+        assert np.all((deep_output.data >= 0) & (deep_output.data <= 1)), "Deep network output should be valid"
+        
+        # Test network complexity
+        def count_parameters(network):
+            total = 0
+            for layer in network.layers:
+                if isinstance(layer, Dense):
+                    total += layer.weights.size
+                    if layer.bias is not None:
+                        total += layer.bias.size
+            return total
+        
+        wide_params = count_parameters(wide_network)
+        deep_params = count_parameters(deep_network)
+        
+        assert wide_params > 0, "Wide network should have parameters"
+        assert deep_params > 0, "Deep network should have parameters"
+        
+        print(f"✅ Network comparison: wide ({wide_params} params) vs deep ({deep_params} params)")
+        tests_passed += 1
+    except Exception as e:
+        print(f"❌ Network comparison failed: {e}")
+    
+    # Results summary
+    print(f"\n📊 Networks Module 
Results: {tests_passed}/{total_tests} tests passed") + + if tests_passed == total_tests: + print("🎉 All network tests passed! Your implementations support:") + print(" • Sequential networks: layer composition and chaining") + print(" • MLP creation: flexible multi-layer perceptron architectures") + print(" • Different architectures: shallow, deep, wide networks") + print(" • Multiple activation functions: ReLU, Tanh, Sigmoid, Softmax") + print(" • Multi-class classification: softmax probability distributions") + print(" • Real ML scenarios: spam detection, image classification, regression") + print(" • Network analysis: parameter counting and architecture comparison") + print("📈 Progress: All Network Functionality ✓") + return True + else: + print("⚠️ Some network tests failed. Common issues:") + print(" • Check Sequential class layer composition") + print(" • Verify create_mlp function layer creation pattern") + print(" • Ensure proper activation function integration") + print(" • Test forward pass through complete networks") + print(" • Verify shape handling across all layers") + return False + +# Run the comprehensive test +success = test_networks_comprehensive() + +# %% [markdown] +""" +### 🧪 Integration Test: Complete Neural Network Applications + +Let's test your networks in realistic machine learning applications. 
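The "networks are function composition" idea that these integration tests exercise can be sketched without any TinyTorch machinery. The sketch below is illustrative only — `dense`, `relu`, `sigmoid`, and `sequential` are plain-list stand-ins for this module's Tensor-based `Dense`, `ReLU`, `Sigmoid`, and `Sequential`, with hand-picked weights:

```python
import math

def dense(weights, bias):
    # Return a layer function computing W @ x + b for a list input x.
    def layer(x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(weights, bias)]
    return layer

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]

def sequential(layers):
    # Compose layers left to right: network(x) = f_n(...f_2(f_1(x))).
    def network(x):
        for layer in layers:
            x = layer(x)
        return x
    return network

# A tiny 2 -> 2 network with fixed weights
net = sequential([
    dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    relu,
    sigmoid,
])
out = net([2.0, 1.0])
print(out)  # both values lie in (0, 1), as the tests expect of Sigmoid output
```

Here `sequential` plays the role of this module's `Sequential` class: the network is nothing more than each layer's output fed into the next.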
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-networks-integration", "locked": true, "points": 20, "schema_version": 3, "solution": false, "task": false} +def test_networks_integration(): + """Integration test with complete neural network applications.""" + print("🔬 Testing networks in complete ML applications...") + + try: + print("🧠 Building complete ML applications with neural networks...") + + # Application 1: Iris Classification + print("\n🌸 Application 1: Iris Classification (Multi-class)") + iris_classifier = create_mlp( + input_size=4, # 4 flower measurements + hidden_sizes=[8, 6], # Hidden layers + output_size=3, # 3 iris species + output_activation=Softmax + ) + + # Simulate iris data + iris_samples = Tensor([ + [5.1, 3.5, 1.4, 0.2], # Setosa-like + [7.0, 3.2, 4.7, 1.4], # Versicolor-like + [6.3, 3.3, 6.0, 2.5] # Virginica-like + ]) + + iris_predictions = iris_classifier(iris_samples) + + assert iris_predictions.shape == (3, 3), "Should predict 3 classes for 3 samples" + + # Check that predictions are valid probabilities + row_sums = np.sum(iris_predictions.data, axis=1) + assert np.allclose(row_sums, 1.0), "Each prediction should sum to 1" + + # Get predicted classes + predicted_classes = np.argmax(iris_predictions.data, axis=1) + print(f" Predicted classes: {predicted_classes}") + print(f" Confidence scores: {np.max(iris_predictions.data, axis=1)}") + + print("✅ Iris classification: valid multi-class predictions") + + # Application 2: Housing Price Prediction + print("\n🏠 Application 2: Housing Price Prediction (Regression)") + price_predictor = Sequential([ + Dense(input_size=8, output_size=16), # 8 house features + ReLU(), + Dense(input_size=16, output_size=8), + ReLU(), + Dense(input_size=8, output_size=1) # 1 price output (no activation for regression) + ]) + + # Simulate house features: [size, bedrooms, bathrooms, age, location_score, etc.] 
+ house_data = Tensor([ + [2000, 3, 2, 5, 8.5, 1, 0, 1], # Large, new house + [1200, 2, 1, 20, 6.0, 0, 1, 0], # Small, older house + [1800, 3, 2, 10, 7.5, 1, 0, 0] # Medium house + ]) + + predicted_prices = price_predictor(house_data) + + assert predicted_prices.shape == (3, 1), "Should predict 1 price for each house" + assert not np.any(np.isnan(predicted_prices.data)), "Prices should not be NaN" + + print(f" Predicted prices: {predicted_prices.data.flatten()}") + print("✅ Housing price prediction: valid regression outputs") + + # Application 3: Sentiment Analysis + print("\n💭 Application 3: Sentiment Analysis (Binary Classification)") + sentiment_analyzer = create_mlp( + input_size=100, # 100 text features (like TF-IDF) + hidden_sizes=[50, 25], # Deep network for text + output_size=1, # Binary sentiment (positive/negative) + output_activation=Sigmoid + ) + + # Simulate text features for different reviews + review_features = Tensor(np.random.randn(5, 100)) # 5 reviews + sentiment_scores = sentiment_analyzer(review_features) + + assert sentiment_scores.shape == (5, 1), "Should predict sentiment for each review" + assert np.all((sentiment_scores.data >= 0) & (sentiment_scores.data <= 1)), "Sentiment scores should be probabilities" + + # Convert to sentiment labels + sentiment_labels = (sentiment_scores.data > 0.5).astype(int) + print(f" Sentiment predictions: {sentiment_labels.flatten()}") + print(f" Confidence scores: {sentiment_scores.data.flatten()}") + + print("✅ Sentiment analysis: valid binary classification") + + # Application 4: MNIST-like Digit Recognition + print("\n🔢 Application 4: Digit Recognition (Image Classification)") + digit_classifier = create_mlp( + input_size=784, # 28x28 flattened images + hidden_sizes=[256, 128, 64], # Deep network for images + output_size=10, # 10 digits (0-9) + output_activation=Softmax + ) + + # Simulate flattened digit images + digit_images = Tensor(np.random.randn(8, 784)) # 8 digit images + digit_predictions = 
digit_classifier(digit_images) + + assert digit_predictions.shape == (8, 10), "Should predict 10 classes for each image" + + # Check softmax properties + row_sums = np.sum(digit_predictions.data, axis=1) + assert np.allclose(row_sums, 1.0), "Each prediction should sum to 1" + + # Get predicted digits + predicted_digits = np.argmax(digit_predictions.data, axis=1) + confidence_scores = np.max(digit_predictions.data, axis=1) + + print(f" Predicted digits: {predicted_digits}") + print(f" Confidence scores: {confidence_scores}") + + print("✅ Digit recognition: valid multi-class image classification") + + # Application 5: Network Architecture Comparison + print("\n📊 Application 5: Architecture Comparison Study") + + # Create different architectures for same task + architectures = { + "Shallow": create_mlp(4, [16], 3, output_activation=Softmax), + "Medium": create_mlp(4, [12, 8], 3, output_activation=Softmax), + "Deep": create_mlp(4, [8, 8, 8], 3, output_activation=Softmax), + "Wide": create_mlp(4, [24], 3, output_activation=Softmax) + } + + # Test all architectures on same data + test_data = Tensor([[1.0, 2.0, 3.0, 4.0]]) + + for name, network in architectures.items(): + prediction = network(test_data) + assert prediction.shape == (1, 3), f"{name} network should output 3 classes" + assert abs(np.sum(prediction.data) - 1.0) < 1e-6, f"{name} network should output valid probabilities" + + # Count parameters + param_count = sum(layer.weights.size + (layer.bias.size if hasattr(layer, 'bias') and layer.bias is not None else 0) + for layer in network.layers if hasattr(layer, 'weights')) + + print(f" {name} network: {param_count} parameters, prediction: {prediction.data.flatten()}") + + print("✅ Architecture comparison: all networks work with different complexities") + + # Application 6: Transfer Learning Simulation + print("\n🔄 Application 6: Transfer Learning Simulation") + + # Create "pre-trained" feature extractor + feature_extractor = Sequential([ + Dense(input_size=100, 
output_size=50), + ReLU(), + Dense(input_size=50, output_size=25), + ReLU() + ]) + + # Create task-specific classifier + classifier_head = Sequential([ + Dense(input_size=25, output_size=10), + ReLU(), + Dense(input_size=10, output_size=2), + Softmax() + ]) + + # Simulate transfer learning pipeline + raw_data = Tensor(np.random.randn(3, 100)) + + # Extract features + features = feature_extractor(raw_data) + assert features.shape == (3, 25), "Feature extractor should output 25 features" + + # Classify using extracted features + final_predictions = classifier_head(features) + assert final_predictions.shape == (3, 2), "Classifier should output 2 classes" + + row_sums = np.sum(final_predictions.data, axis=1) + assert np.allclose(row_sums, 1.0), "Transfer learning predictions should be valid" + + print("✅ Transfer learning simulation: modular network composition") + + print("\n🎉 Integration test passed! Your networks work correctly in:") + print(" • Multi-class classification (Iris flowers)") + print(" • Regression tasks (housing prices)") + print(" • Binary classification (sentiment analysis)") + print(" • Image classification (digit recognition)") + print(" • Architecture comparison studies") + print(" • Transfer learning scenarios") + print("📈 Progress: Networks ready for real ML applications!") + + return True + + except Exception as e: + print(f"❌ Integration test failed: {e}") + print("\n💡 This suggests an issue with:") + print(" • Network architecture composition") + print(" • Forward pass through complete networks") + print(" • Shape compatibility between layers") + print(" • Activation function integration") + print(" • Check your Sequential and create_mlp implementations") + return False + +# Run the integration test +success = test_networks_integration() and success + +# Print final summary +print(f"\n{'='*60}") +print("🎯 NETWORKS MODULE TESTING COMPLETE") +print(f"{'='*60}") + +if success: + print("🎉 CONGRATULATIONS! 
All network tests passed!") + print("\n✅ Your networks module successfully implements:") + print(" • Sequential networks: flexible layer composition") + print(" • MLP creation: automated multi-layer perceptron building") + print(" • Architecture flexibility: shallow, deep, wide networks") + print(" • Multiple activations: ReLU, Tanh, Sigmoid, Softmax") + print(" • Real ML applications: classification, regression, image recognition") + print(" • Network analysis: parameter counting and architecture comparison") + print(" • Transfer learning: modular network composition") + print("\n🚀 You're ready to tackle any neural network architecture!") + print("📈 Final Progress: Networks Module ✓ COMPLETE") +else: + print("⚠️ Some tests failed. Please review the error messages above.") + print("\n🔧 To fix issues:") + print(" 1. Check your Sequential class implementation") + print(" 2. Verify create_mlp function layer creation") + print(" 3. Ensure proper forward pass through all layers") + print(" 4. Test shape compatibility between layers") + print(" 5. Verify activation function integration") + print("\n💪 Keep building! These networks are the foundation of modern AI.") + +# %% [markdown] +""" +## 🎯 Module Summary + +Congratulations! 
You've successfully implemented complete neural network architectures: + +### What You've Accomplished +✅ **Sequential Networks**: The fundamental architecture for composing layers +✅ **Function Composition**: Understanding how layers combine to create complex behaviors +✅ **MLP Creation**: Building Multi-Layer Perceptrons with flexible architectures +✅ **Architecture Patterns**: Creating shallow, deep, and wide networks +✅ **Forward Pass**: Complete inference through multi-layer networks + +### Key Concepts You've Learned +- **Networks are function composition**: Complex behavior from simple building blocks +- **Sequential architecture**: The foundation of most neural networks +- **MLP patterns**: Dense → Activation → Dense → Activation → Output +- **Architecture design**: How depth and width affect network capability +- **Forward pass**: How data flows through complete networks + +### Mathematical Foundations +- **Function composition**: f(x) = f_n(...f_2(f_1(x))) +- **Universal approximation**: MLPs can approximate any continuous function +- **Hierarchical learning**: Early layers learn simple features, later layers learn complex patterns +- **Nonlinearity**: Activation functions enable complex decision boundaries + +### Real-World Applications +- **Classification**: Image recognition, spam detection, medical diagnosis +- **Regression**: Price prediction, time series forecasting +- **Feature learning**: Extracting meaningful representations from raw data +- **Transfer learning**: Using pre-trained networks for new tasks + +### Next Steps +1. **Export your code**: `tito package nbdev --export 04_networks` +2. **Test your implementation**: `tito module test 04_networks` +3. 
**Use your networks**: + ```python + from tinytorch.core.networks import Sequential, create_mlp + from tinytorch.core.layers import Dense + from tinytorch.core.activations import ReLU + + # Create custom network + network = Sequential([Dense(10, 5), ReLU(), Dense(5, 1)]) + + # Create MLP + mlp = create_mlp(10, [20, 10], 1) + ``` +4. **Move to Module 5**: Start building convolutional networks for images! + +**Ready for the next challenge?** Let's add convolutional layers for image processing and build CNNs! +""" \ No newline at end of file diff --git a/modules/source/04_networks/tests/test_networks.py b/modules/source/04_networks/tests/test_networks.py deleted file mode 100644 index 2ebcbfe0..00000000 --- a/modules/source/04_networks/tests/test_networks.py +++ /dev/null @@ -1,453 +0,0 @@ -""" -Tests for the Networks module. - -Tests network composition, visualization, and practical applications. -""" - -import pytest -import numpy as np -import sys -from pathlib import Path - -# Add the project root to the path -project_root = Path(__file__).parent.parent.parent.parent -sys.path.insert(0, str(project_root)) - -# Import the modules we're testing -from tinytorch.core.tensor import Tensor -from tinytorch.core.layers import Dense -from tinytorch.core.activations import ReLU, Sigmoid, Tanh - -# Import the networks module -try: - # Import from the exported package - from tinytorch.core.networks import ( - Sequential, - create_mlp - ) - # These functions may not be implemented yet - use fallback - try: - from tinytorch.core.networks import ( - create_classification_network, - create_regression_network, - visualize_network_architecture, - visualize_data_flow, - compare_networks, - analyze_network_behavior - ) - except ImportError: - # Create mock functions for missing functionality - def create_classification_network(*args, **kwargs): - """Mock implementation for testing""" - return create_mlp(*args, **kwargs) - - def create_regression_network(*args, **kwargs): - """Mock 
implementation for testing""" - return create_mlp(*args, **kwargs) - - def visualize_network_architecture(*args, **kwargs): - """Mock implementation for testing""" - return "Network visualization placeholder" - - def visualize_data_flow(*args, **kwargs): - """Mock implementation for testing""" - return "Data flow visualization placeholder" - - def compare_networks(*args, **kwargs): - """Mock implementation for testing""" - return "Network comparison placeholder" - - def analyze_network_behavior(*args, **kwargs): - """Mock implementation for testing""" - return "Network behavior analysis placeholder" - -except ImportError: - # Fallback for when module isn't exported yet - sys.path.append(str(project_root / "modules" / "source" / "04_networks")) - from networks_dev import ( - Sequential, - create_mlp, - create_classification_network, - create_regression_network, - visualize_network_architecture, - visualize_data_flow, - compare_networks, - analyze_network_behavior - ) - - -class TestSequentialNetwork: - """Test the Sequential network class.""" - - def test_sequential_initialization(self): - """Test Sequential network initialization.""" - layers = [Dense(3, 4), ReLU(), Dense(4, 2), Sigmoid()] - network = Sequential(layers) - - assert len(network.layers) == 4 - assert isinstance(network.layers[0], Dense) - assert isinstance(network.layers[1], ReLU) - assert isinstance(network.layers[2], Dense) - assert isinstance(network.layers[3], Sigmoid) - - def test_sequential_forward_pass(self): - """Test Sequential network forward pass.""" - network = Sequential([ - Dense(3, 4), - ReLU(), - Dense(4, 2), - Sigmoid() - ]) - - x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]) - output = network(x) - - assert output.shape == (2, 2) - assert isinstance(output, Tensor) - # Sigmoid output should be between 0 and 1 - assert np.all(output.data >= 0) and np.all(output.data <= 1) - - def test_sequential_callable(self): - """Test that Sequential network is callable.""" - network = 
Sequential([Dense(2, 3), ReLU()]) - x = Tensor([[1.0, 2.0]]) - - # Test both forward() and __call__() - output1 = network.forward(x) - output2 = network(x) - - assert np.allclose(output1.data, output2.data) - - def test_empty_sequential(self): - """Test Sequential network with no layers.""" - network = Sequential([]) - x = Tensor([[1.0, 2.0, 3.0]]) - - # Should return input unchanged - output = network(x) - assert np.allclose(output.data, x.data) - - -class TestMLPCreation: - """Test MLP creation functions.""" - - def test_create_mlp_basic(self): - """Test basic MLP creation.""" - mlp = create_mlp(input_size=3, hidden_sizes=[4], output_size=2) - - assert len(mlp.layers) == 4 # Dense + ReLU + Dense + Sigmoid - assert isinstance(mlp.layers[0], Dense) - assert mlp.layers[0].input_size == 3 - assert mlp.layers[0].output_size == 4 - assert isinstance(mlp.layers[1], ReLU) - assert isinstance(mlp.layers[2], Dense) - assert mlp.layers[2].input_size == 4 - assert mlp.layers[2].output_size == 2 - assert isinstance(mlp.layers[3], Sigmoid) - - def test_create_mlp_multiple_hidden(self): - """Test MLP creation with multiple hidden layers.""" - mlp = create_mlp(input_size=10, hidden_sizes=[16, 8, 4], output_size=3) - - assert len(mlp.layers) == 8 # 3 Dense + 3 ReLU + 1 Dense + 1 Sigmoid - - # Check Dense layers - dense_layers = [layer for layer in mlp.layers if isinstance(layer, Dense)] - assert len(dense_layers) == 4 - - assert dense_layers[0].input_size == 10 - assert dense_layers[0].output_size == 16 - assert dense_layers[1].input_size == 16 - assert dense_layers[1].output_size == 8 - assert dense_layers[2].input_size == 8 - assert dense_layers[2].output_size == 4 - assert dense_layers[3].input_size == 4 - assert dense_layers[3].output_size == 3 - - def test_create_mlp_no_hidden(self): - """Test MLP creation with no hidden layers.""" - mlp = create_mlp(input_size=5, hidden_sizes=[], output_size=2) - - assert len(mlp.layers) == 2 # Dense + Sigmoid - assert 
isinstance(mlp.layers[0], Dense) - assert mlp.layers[0].input_size == 5 - assert mlp.layers[0].output_size == 2 - assert isinstance(mlp.layers[1], Sigmoid) - - def test_create_mlp_custom_activation(self): - """Test MLP creation with custom activation functions.""" - mlp = create_mlp( - input_size=3, - hidden_sizes=[4], - output_size=2, - activation=Tanh, - output_activation=Tanh - ) - - assert len(mlp.layers) == 4 - assert isinstance(mlp.layers[1], Tanh) # Hidden activation - assert isinstance(mlp.layers[3], Tanh) # Output activation - - -class TestSpecializedNetworks: - """Test specialized network creation functions.""" - - def test_create_classification_network(self): - """Test classification network creation.""" - classifier = create_classification_network( - input_size=100, - num_classes=5, - hidden_sizes=[32, 16] - ) - - assert len(classifier.layers) == 6 # Dense(100→32) + ReLU + Dense(32→16) + ReLU + Dense(16→5) + Softmax - - # Check output layer - dense_layers = [layer for layer in classifier.layers if isinstance(layer, Dense)] - assert dense_layers[-1].output_size == 5 - # Should use Softmax for multi-class classification - from tinytorch.core.activations import Softmax - assert isinstance(classifier.layers[-1], Softmax) - - def test_create_classification_network_default(self): - """Test classification network with default hidden sizes.""" - classifier = create_classification_network(input_size=50, num_classes=3) - - # Should use default hidden size of input_size // 2 - expected_hidden = 50 // 2 - dense_layers = [layer for layer in classifier.layers if isinstance(layer, Dense)] - assert dense_layers[0].output_size == expected_hidden - assert dense_layers[1].output_size == 3 - - def test_create_regression_network(self): - """Test regression network creation.""" - regressor = create_regression_network( - input_size=13, - output_size=1, - hidden_sizes=[8, 4] - ) - - assert len(regressor.layers) == 6 # Dense(13→8) + ReLU + Dense(8→4) + ReLU + Dense(4→1) + Tanh 
- - # Check output layer - dense_layers = [layer for layer in regressor.layers if isinstance(layer, Dense)] - assert dense_layers[-1].output_size == 1 - assert isinstance(regressor.layers[-1], Tanh) - - def test_create_regression_network_default(self): - """Test regression network with default parameters.""" - regressor = create_regression_network(input_size=20) - - # Should use default output_size=1 and hidden_size=input_size//2 - expected_hidden = 20 // 2 - dense_layers = [layer for layer in regressor.layers if isinstance(layer, Dense)] - assert dense_layers[0].output_size == expected_hidden - assert dense_layers[1].output_size == 1 - - -class TestNetworkBehavior: - """Test network behavior and functionality.""" - - def test_network_shape_transformations(self): - """Test that networks properly transform tensor shapes.""" - network = Sequential([ - Dense(3, 4), - ReLU(), - Dense(4, 2), - Sigmoid() - ]) - - x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]) - output = network(x) - - assert x.shape == (2, 3) - assert output.shape == (2, 2) - - def test_network_activations(self): - """Test that activation functions are properly applied.""" - network = Sequential([ - Dense(2, 3), - ReLU(), - Dense(3, 1), - Sigmoid() - ]) - - x = Tensor([[-1.0, 1.0]]) - output = network(x) - - # ReLU should zero out negative values - # Sigmoid should output values between 0 and 1 - assert np.all(output.data >= 0) and np.all(output.data <= 1) - - def test_network_parameter_count(self): - """Test that networks have the expected number of parameters.""" - network = Sequential([ - Dense(3, 4), # 3*4 + 4 = 16 parameters - ReLU(), - Dense(4, 2), # 4*2 + 2 = 10 parameters - Sigmoid() - ]) - - # Count parameters (weights + biases) - total_params = 0 - for layer in network.layers: - if hasattr(layer, 'weights'): - total_params += layer.weights.data.size - if hasattr(layer, 'bias') and layer.bias is not None: - total_params += layer.bias.data.size - - assert total_params == 26 # 16 + 10 - - -class 
TestVisualizationFunctions: - """Test visualization functions (basic functionality, not visual output).""" - - def test_visualize_network_architecture_exists(self): - """Test that visualization function exists and is callable.""" - network = Sequential([Dense(3, 4), ReLU(), Dense(4, 2), Sigmoid()]) - - # Should not raise an error - try: - visualize_network_architecture(network, "Test Network") - except Exception as e: - pytest.fail(f"visualize_network_architecture raised {e}") - - def test_visualize_data_flow_exists(self): - """Test that data flow visualization function exists and is callable.""" - network = Sequential([Dense(3, 4), ReLU(), Dense(4, 2), Sigmoid()]) - x = Tensor([[1.0, 2.0, 3.0]]) - - # Should not raise an error - try: - visualize_data_flow(network, x, "Test Data Flow") - except Exception as e: - pytest.fail(f"visualize_data_flow raised {e}") - - def test_compare_networks_exists(self): - """Test that network comparison function exists and is callable.""" - network1 = Sequential([Dense(3, 4), ReLU(), Dense(4, 2), Sigmoid()]) - network2 = Sequential([Dense(3, 8), ReLU(), Dense(8, 2), Sigmoid()]) - x = Tensor([[1.0, 2.0, 3.0]]) - - # Should not raise an error - try: - compare_networks([network1, network2], ["Small", "Large"], x, "Test Comparison") - except Exception as e: - pytest.fail(f"compare_networks raised {e}") - - def test_analyze_network_behavior_exists(self): - """Test that behavior analysis function exists and is callable.""" - network = Sequential([Dense(3, 4), ReLU(), Dense(4, 2), Sigmoid()]) - x = Tensor([[1.0, 2.0, 3.0]]) - - # Should not raise an error - try: - analyze_network_behavior(network, x, "Test Behavior") - except Exception as e: - pytest.fail(f"analyze_network_behavior raised {e}") - - -class TestPracticalApplications: - """Test practical network applications.""" - - def test_digit_classification_network(self): - """Test creating a network for digit classification.""" - classifier = create_classification_network( - 
input_size=784, # 28x28 image - num_classes=10, # 10 digits - hidden_sizes=[128, 64] - ) - - # Test with fake image data - fake_image = Tensor(np.random.randn(1, 784).astype(np.float32)) - output = classifier(fake_image) - - assert output.shape == (1, 10) - assert np.all(output.data >= 0) and np.all(output.data <= 1) - # Should sum to approximately 1 (probability distribution) - assert np.abs(np.sum(output.data) - 1.0) < 0.1 - - def test_sentiment_analysis_network(self): - """Test creating a network for sentiment analysis.""" - classifier = create_classification_network( - input_size=100, # 100-dimensional embeddings - num_classes=2, # Positive/Negative - hidden_sizes=[32, 16] - ) - - # Test with fake text embeddings - fake_embeddings = Tensor(np.random.randn(1, 100).astype(np.float32)) - output = classifier(fake_embeddings) - - assert output.shape == (1, 2) - assert np.all(output.data >= 0) and np.all(output.data <= 1) - - def test_house_price_prediction_network(self): - """Test creating a network for house price prediction.""" - regressor = create_regression_network( - input_size=13, # 13 house features - output_size=1, # 1 price prediction - hidden_sizes=[8, 4] - ) - - # Test with fake house features - fake_features = Tensor(np.random.randn(1, 13).astype(np.float32)) - output = regressor(fake_features) - - assert output.shape == (1, 1) - # Tanh output should be between -1 and 1 - assert np.all(output.data >= -1) and np.all(output.data <= 1) - - -class TestNetworkIntegration: - """Test integration with other modules.""" - - def test_network_with_tensor_operations(self): - """Test that networks work with tensor operations.""" - network = Sequential([Dense(3, 4), ReLU(), Dense(4, 2), Sigmoid()]) - - # Create input using tensor operations - x1 = Tensor([[1.0, 2.0, 3.0]]) - x2 = Tensor([[4.0, 5.0, 6.0]]) - x_combined = Tensor(np.vstack([x1.data, x2.data])) - - output = network(x_combined) - assert output.shape == (2, 2) - - def 
test_network_with_activations_module(self): - """Test that networks properly use activations from the activations module.""" - # This test ensures we're using the activations from the activations module - # rather than re-implementing them - network = Sequential([ - Dense(2, 3), - ReLU(), # From activations module - Dense(3, 1), - Sigmoid() # From activations module - ]) - - x = Tensor([[-1.0, 1.0]]) - output = network(x) - - # Test that activations work correctly - assert np.all(output.data >= 0) and np.all(output.data <= 1) - - def test_network_with_layers_module(self): - """Test that networks properly use layers from the layers module.""" - # This test ensures we're using the Dense layers from the layers module - network = Sequential([ - Dense(3, 4), # From layers module - ReLU(), - Dense(4, 2), # From layers module - Sigmoid() - ]) - - x = Tensor([[1.0, 2.0, 3.0]]) - output = network(x) - - # Test that layers work correctly - assert output.shape == (1, 2) - - -if __name__ == "__main__": - # Run the tests - pytest.main([__file__, "-v"]) \ No newline at end of file diff --git a/modules/source/05_cnn/cnn_dev.py b/modules/source/05_cnn/cnn_dev.py index 39004edf..0600f9ff 100644 --- a/modules/source/05_cnn/cnn_dev.py +++ b/modules/source/05_cnn/cnn_dev.py @@ -607,50 +607,50 @@ try: print("\n1. 
Simple CNN Pipeline Test:") # Create pipeline: Conv2D → ReLU → Flatten → Dense - conv = Conv2D(kernel_size=(2, 2)) - relu = ReLU() - dense = Dense(input_size=4, output_size=3) - - # Input image - image = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) - - # Forward pass + conv = Conv2D(kernel_size=(2, 2)) + relu = ReLU() + dense = Dense(input_size=4, output_size=3) + + # Input image + image = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) + + # Forward pass features = conv(image) # (3,3) → (2,2) activated = relu(features) # (2,2) → (2,2) flattened = flatten(activated) # (2,2) → (1,4) output = dense(flattened) # (1,4) → (1,3) - - assert features.shape == (2, 2), f"Conv output shape wrong: {features.shape}" - assert activated.shape == (2, 2), f"ReLU output shape wrong: {activated.shape}" - assert flattened.shape == (1, 4), f"Flatten output shape wrong: {flattened.shape}" - assert output.shape == (1, 3), f"Dense output shape wrong: {output.shape}" - + + assert features.shape == (2, 2), f"Conv output shape wrong: {features.shape}" + assert activated.shape == (2, 2), f"ReLU output shape wrong: {activated.shape}" + assert flattened.shape == (1, 4), f"Flatten output shape wrong: {flattened.shape}" + assert output.shape == (1, 3), f"Dense output shape wrong: {output.shape}" + print("✅ Simple CNN pipeline works correctly") # Test 2: Multi-layer CNN print("\n2. 
Multi-layer CNN Test:") # Create deeper pipeline: Conv2D → ReLU → Conv2D → ReLU → Flatten → Dense - conv1 = Conv2D(kernel_size=(2, 2)) - relu1 = ReLU() - conv2 = Conv2D(kernel_size=(2, 2)) - relu2 = ReLU() + conv1 = Conv2D(kernel_size=(2, 2)) + relu1 = ReLU() + conv2 = Conv2D(kernel_size=(2, 2)) + relu2 = ReLU() dense_multi = Dense(input_size=9, output_size=2) - - # Larger input for multi-layer processing + + # Larger input for multi-layer processing large_image = Tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]) - - # Forward pass + + # Forward pass h1 = conv1(large_image) # (5,5) → (4,4) h2 = relu1(h1) # (4,4) → (4,4) h3 = conv2(h2) # (4,4) → (3,3) h4 = relu2(h3) # (3,3) → (3,3) h5 = flatten(h4) # (3,3) → (1,9) output_multi = dense_multi(h5) # (1,9) → (1,2) - - assert h1.shape == (4, 4), f"Conv1 output wrong: {h1.shape}" - assert h3.shape == (3, 3), f"Conv2 output wrong: {h3.shape}" - assert h5.shape == (1, 9), f"Flatten output wrong: {h5.shape}" + + assert h1.shape == (4, 4), f"Conv1 output wrong: {h1.shape}" + assert h3.shape == (3, 3), f"Conv2 output wrong: {h3.shape}" + assert h5.shape == (1, 9), f"Flatten output wrong: {h5.shape}" assert output_multi.shape == (1, 2), f"Final output wrong: {output_multi.shape}" print("✅ Multi-layer CNN works correctly") @@ -667,22 +667,22 @@ try: [0, 1, 1, 0, 0, 1, 1, 0], [0, 0, 1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 0, 0, 1, 1]]) - - # CNN for digit classification + + # CNN for digit classification feature_extractor = Conv2D(kernel_size=(3, 3)) # (8,8) → (6,6) - activation = ReLU() - classifier = Dense(input_size=36, output_size=10) # 10 digit classes - - # Forward pass - features = feature_extractor(digit_image) - activated_features = activation(features) + activation = ReLU() + classifier = Dense(input_size=36, output_size=10) # 10 digit classes + + # Forward pass + features = feature_extractor(digit_image) + activated_features = activation(features) feature_vector = 
flatten(activated_features) - digit_scores = classifier(feature_vector) - - assert features.shape == (6, 6), f"Feature extraction shape wrong: {features.shape}" - assert feature_vector.shape == (1, 36), f"Feature vector shape wrong: {feature_vector.shape}" - assert digit_scores.shape == (1, 10), f"Digit scores shape wrong: {digit_scores.shape}" - + digit_scores = classifier(feature_vector) + + assert features.shape == (6, 6), f"Feature extraction shape wrong: {features.shape}" + assert feature_vector.shape == (1, 36), f"Feature vector shape wrong: {feature_vector.shape}" + assert digit_scores.shape == (1, 10), f"Digit scores shape wrong: {digit_scores.shape}" + print("✅ Image classification scenario works correctly") # Test 4: Feature Extraction and Composition diff --git a/modules/source/05_cnn/cnn_dev_backup.py b/modules/source/05_cnn/cnn_dev_backup.py new file mode 100644 index 00000000..9ab7314e --- /dev/null +++ b/modules/source/05_cnn/cnn_dev_backup.py @@ -0,0 +1,1173 @@ +# --- +# jupyter: +# jupytext: +# text_representation: +# extension: .py +# format_name: percent +# format_version: '1.3' +# jupytext_version: 1.17.1 +# --- + +# %% [markdown] +""" +# Module 5: CNN - Convolutional Neural Networks + +Welcome to the CNN module! Here you'll implement the core building block of modern computer vision: the convolutional layer. + +## Learning Goals +- Understand the convolution operation and its importance in computer vision +- Implement Conv2D with explicit for-loops to understand the sliding window mechanism +- Build convolutional layers that can detect spatial patterns in images +- Compose Conv2D with other layers to build complete convolutional networks +- See how convolution enables parameter sharing and translation invariance + +## Build → Use → Understand +1. **Build**: Conv2D layer using sliding window convolution from scratch +2. **Use**: Transform images and see feature maps emerge +3. 
**Understand**: How CNNs learn hierarchical spatial patterns +""" + +# %% nbgrader={"grade": false, "grade_id": "cnn-imports", "locked": false, "schema_version": 3, "solution": false, "task": false} +#| default_exp core.cnn + +#| export +import numpy as np +import os +import sys +from typing import List, Tuple, Optional +import matplotlib.pyplot as plt + +# Import from the main package - try package first, then local modules +try: + from tinytorch.core.tensor import Tensor + from tinytorch.core.layers import Dense + from tinytorch.core.activations import ReLU +except ImportError: + # For development, import from local modules + sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor')) + sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations')) + sys.path.append(os.path.join(os.path.dirname(__file__), '..', '03_layers')) + from tensor_dev import Tensor + from activations_dev import ReLU + from layers_dev import Dense + +# %% nbgrader={"grade": false, "grade_id": "cnn-setup", "locked": false, "schema_version": 3, "solution": false, "task": false} +#| hide +#| export +def _should_show_plots(): + """Check if we should show plots (disable during testing)""" + # Check multiple conditions that indicate we're in test mode + is_pytest = ( + 'pytest' in sys.modules or + 'test' in sys.argv or + os.environ.get('PYTEST_CURRENT_TEST') is not None or + any('test' in arg for arg in sys.argv) or + any('pytest' in arg for arg in sys.argv) + ) + + # Show plots in development mode (when not in test mode) + return not is_pytest + +# %% nbgrader={"grade": false, "grade_id": "cnn-welcome", "locked": false, "schema_version": 3, "solution": false, "task": false} +print("🔥 TinyTorch CNN Module") +print(f"NumPy version: {np.__version__}") +print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}") +print("Ready to build convolutional neural networks!") + +# %% [markdown] +""" +## 📦 Where This Code Lives in the Final Package + 
+
+**Learning Side:** You work in `modules/source/05_cnn/cnn_dev.py`
+**Building Side:** Code exports to `tinytorch.core.cnn`
+
+```python
+# Final package structure:
+from tinytorch.core.cnn import Conv2D, conv2d_naive, flatten  # CNN operations!
+from tinytorch.core.layers import Dense  # Fully connected layers
+from tinytorch.core.activations import ReLU  # Nonlinearity
+from tinytorch.core.tensor import Tensor  # Foundation
+```
+
+**Why this matters:**
+- **Learning:** Focused modules for deep understanding of convolution
+- **Production:** Proper organization like PyTorch's `torch.nn.Conv2d`
+- **Consistency:** All CNN operations live together in `core.cnn`
+- **Integration:** Works seamlessly with other TinyTorch components
+"""
+
+# %% [markdown]
+"""
+## 🧠 The Mathematical Foundation of Convolution
+
+### The Convolution Operation
+Convolution is a mathematical operation that combines two functions to produce a third:
+
+```
+(f * g)(t) = ∫ f(τ)g(t - τ)dτ
+```
+
+In discrete 2D computer vision, this becomes:
+```
+(I * K)[i,j] = ΣΣ I[i+m, j+n] × K[m,n]
+```
+
+(Strictly speaking, this index pattern is cross-correlation: the kernel is not flipped as in the textbook definition of convolution. Every major deep learning framework uses this convention and still calls it "convolution", and so do we.)
+
+### Why Convolution is Perfect for Images
+- **Local connectivity**: Each output depends only on a small region of input
+- **Weight sharing**: Same filter applied everywhere (translation invariance)
+- **Spatial hierarchy**: Multiple layers build increasingly complex features
+- **Parameter efficiency**: Far fewer parameters than fully connected layers
+
+### The Three Core Principles
+1. **Sparse connectivity**: Each neuron connects to only a small region
+2. **Parameter sharing**: Same weights used across all spatial locations
+3. 
**Equivariant representation**: If the input shifts, the output shifts correspondingly
+
+### Connection to Real ML Systems
+Every vision framework uses convolution:
+- **PyTorch**: `torch.nn.Conv2d` with optimized CUDA kernels
+- **TensorFlow**: `tf.keras.layers.Conv2D` with cuDNN acceleration
+- **JAX**: `jax.lax.conv_general_dilated` with XLA compilation
+- **TinyTorch**: `tinytorch.core.cnn.Conv2D` (what we're building!)
+
+### Performance Considerations
+- **Memory layout**: Efficient data access patterns
+- **Vectorization**: SIMD operations for parallel computation
+- **Cache efficiency**: Spatial locality in memory access
+- **Optimization**: im2col, FFT-based convolution, Winograd algorithm
+"""
+
+# %% [markdown]
+"""
+## Step 1: Understanding Convolution
+
+### What is Convolution?
+A **convolutional layer** applies a small filter (kernel) across the input, producing a feature map. This operation captures local patterns and is the foundation of modern vision models.
+
+### Why Convolution Matters in Computer Vision
+- **Local connectivity**: Each output value depends only on a small region of the input
+- **Weight sharing**: The same filter is applied everywhere (translation invariance)
+- **Spatial hierarchy**: Multiple layers build increasingly complex features
+- **Parameter efficiency**: Far fewer parameters than fully connected layers
+
+### The Fundamental Insight
+**Convolution is pattern matching!** The kernel learns to detect specific patterns:
+- **Edge detectors**: Find boundaries between objects
+- **Texture detectors**: Recognize surface patterns
+- **Shape detectors**: Identify geometric forms
+- **Feature detectors**: Combine simple patterns into complex features
+
+### Real-World Examples
+- **Image processing**: Detect edges, blur, sharpen
+- **Computer vision**: Recognize objects, faces, text
+- **Medical imaging**: Detect tumors, analyze scans
+- **Autonomous driving**: Identify traffic signs, pedestrians
+
+### Visual Intuition
+```
+Input 
Image: Kernel: Output Feature Map: +[1, 2, 3] [1, 0] [1*1+2*0+4*0+5*(-1), 2*1+3*0+5*0+6*(-1)] +[4, 5, 6] [0, -1] [4*1+5*0+7*0+8*(-1), 5*1+6*0+8*0+9*(-1)] +[7, 8, 9] +``` + +The kernel slides across the input, computing dot products at each position. + +Let's implement this step by step! +""" + +# %% nbgrader={"grade": false, "grade_id": "conv2d-naive", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray: + """ + Naive 2D convolution (single channel, no stride, no padding). + + Args: + input: 2D input array (H, W) + kernel: 2D filter (kH, kW) + Returns: + 2D output array (H-kH+1, W-kW+1) + + TODO: Implement the sliding window convolution using for-loops. + + APPROACH: + 1. Get input dimensions: H, W = input.shape + 2. Get kernel dimensions: kH, kW = kernel.shape + 3. Calculate output dimensions: out_H = H - kH + 1, out_W = W - kW + 1 + 4. Create output array: np.zeros((out_H, out_W)) + 5. Use nested loops to slide the kernel: + - i loop: output rows (0 to out_H-1) + - j loop: output columns (0 to out_W-1) + - di loop: kernel rows (0 to kH-1) + - dj loop: kernel columns (0 to kW-1) + 6. 
For each (i,j), compute: output[i,j] += input[i+di, j+dj] * kernel[di, dj] + + EXAMPLE: + Input: [[1, 2, 3], Kernel: [[1, 0], + [4, 5, 6], [0, -1]] + [7, 8, 9]] + + Output[0,0] = 1*1 + 2*0 + 4*0 + 5*(-1) = 1 - 5 = -4 + Output[0,1] = 2*1 + 3*0 + 5*0 + 6*(-1) = 2 - 6 = -4 + Output[1,0] = 4*1 + 5*0 + 7*0 + 8*(-1) = 4 - 8 = -4 + Output[1,1] = 5*1 + 6*0 + 8*0 + 9*(-1) = 5 - 9 = -4 + + HINTS: + - Start with output = np.zeros((out_H, out_W)) + - Use four nested loops: for i in range(out_H): for j in range(out_W): for di in range(kH): for dj in range(kW): + - Accumulate the sum: output[i,j] += input[i+di, j+dj] * kernel[di, dj] + """ + ### BEGIN SOLUTION + # Get input and kernel dimensions + H, W = input.shape + kH, kW = kernel.shape + + # Calculate output dimensions + out_H, out_W = H - kH + 1, W - kW + 1 + + # Initialize output array + output = np.zeros((out_H, out_W), dtype=input.dtype) + + # Sliding window convolution with four nested loops + for i in range(out_H): + for j in range(out_W): + for di in range(kH): + for dj in range(kW): + output[i, j] += input[i + di, j + dj] * kernel[di, dj] + + return output + ### END SOLUTION + +# %% [markdown] +""" +### 🧪 Quick Test: Convolution Operation + +Let's test your convolution implementation right away! This is the core operation that powers computer vision. 
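If you want to check the worked example from the TODO outside the graded cells, the sketch below does so standalone. The helper names `conv2d_loops` and `conv2d_slices` are illustrative, not part of the module: the first mirrors the four-loop implementation, the second is an equivalent slice-based formulation of the same sliding window, used only as a cross-check.

```python
# Standalone sanity check for the sliding-window convolution above.
# conv2d_loops mirrors conv2d_naive; conv2d_slices is an equivalent
# vectorized formulation used only to cross-check the loop version.
import numpy as np

def conv2d_loops(inp, kernel):
    H, W = inp.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1), dtype=inp.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(inp[i:i + kH, j:j + kW] * kernel)
    return out

def conv2d_slices(inp, kernel):
    kH, kW = kernel.shape
    out_H, out_W = inp.shape[0] - kH + 1, inp.shape[1] - kW + 1
    out = np.zeros((out_H, out_W), dtype=inp.dtype)
    for di in range(kH):
        for dj in range(kW):
            # Each kernel entry weights a shifted view of the input.
            out += kernel[di, dj] * inp[di:di + out_H, dj:dj + out_W]
    return out

img = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)
k = np.array([[1, 0], [0, -1]], dtype=np.float32)
result = conv2d_loops(img, k)  # every entry is -4, as in the worked example
assert np.allclose(result, conv2d_slices(img, k))
```

Both formulations agree because they accumulate the same input-times-kernel products, just grouped differently: per output pixel vs. per kernel entry.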
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-conv2d-naive-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false} +# Test conv2d_naive function immediately after implementation +print("🔬 Testing convolution operation...") + +# Test simple 3x3 input with 2x2 kernel +try: + input_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32) + kernel_array = np.array([[1, 0], [0, 1]], dtype=np.float32) # Identity-like kernel + + result = conv2d_naive(input_array, kernel_array) + expected = np.array([[6, 8], [12, 14]], dtype=np.float32) # 1+5, 2+6, 4+8, 5+9 + + print(f"Input:\n{input_array}") + print(f"Kernel:\n{kernel_array}") + print(f"Result:\n{result}") + print(f"Expected:\n{expected}") + + assert np.allclose(result, expected), f"Convolution failed: expected {expected}, got {result}" + print("✅ Simple convolution test passed") + +except Exception as e: + print(f"❌ Simple convolution test failed: {e}") + raise + +# Test edge detection kernel +try: + input_array = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]], dtype=np.float32) + edge_kernel = np.array([[-1, -1], [-1, 3]], dtype=np.float32) # Edge detection + + result = conv2d_naive(input_array, edge_kernel) + expected = np.array([[0, 0], [0, 0]], dtype=np.float32) # Uniform region = no edges + + assert np.allclose(result, expected), f"Edge detection failed: expected {expected}, got {result}" + print("✅ Edge detection test passed") + +except Exception as e: + print(f"❌ Edge detection test failed: {e}") + raise + +# Test output shape +try: + input_5x5 = np.random.randn(5, 5).astype(np.float32) + kernel_3x3 = np.random.randn(3, 3).astype(np.float32) + + result = conv2d_naive(input_5x5, kernel_3x3) + expected_shape = (3, 3) # 5-3+1 = 3 + + assert result.shape == expected_shape, f"Output shape wrong: expected {expected_shape}, got {result.shape}" + print("✅ Output shape test passed") + +except Exception as e: + print(f"❌ Output shape test failed: {e}") + raise + +# 
Show the convolution process +print("🎯 Convolution behavior:") +print(" Slides kernel across input") +print(" Computes dot product at each position") +print(" Output size = Input size - Kernel size + 1") +print("📈 Progress: Convolution operation ✓") + +# %% [markdown] +""" +## Step 2: Building the Conv2D Layer + +### What is a Conv2D Layer? +A **Conv2D layer** is a learnable convolutional layer that: +- Has learnable kernel weights (initialized randomly) +- Applies convolution to input tensors +- Integrates with the rest of the neural network + +### Why Conv2D Layers Matter +- **Feature learning**: Kernels learn to detect useful patterns +- **Composability**: Can be stacked with other layers +- **Efficiency**: Shared weights reduce parameters dramatically +- **Translation invariance**: Same patterns detected anywhere in the image + +### Real-World Applications +- **Image classification**: Recognize objects in photos +- **Object detection**: Find and locate objects +- **Medical imaging**: Detect anomalies in scans +- **Autonomous driving**: Identify road features + +### Design Decisions +- **Kernel size**: Typically 3×3 or 5×5 for balance of locality and capacity +- **Initialization**: Small random values to break symmetry +- **Integration**: Works with Tensor class and other layers +""" + +# %% nbgrader={"grade": false, "grade_id": "conv2d-class", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +class Conv2D: + """ + 2D Convolutional Layer (single channel, single filter, no stride/pad). + + A learnable convolutional layer that applies a kernel to detect spatial patterns. + Perfect for building the foundation of convolutional neural networks. + """ + + def __init__(self, kernel_size: Tuple[int, int]): + """ + Initialize Conv2D layer with random kernel. + + Args: + kernel_size: (kH, kW) - size of the convolution kernel + + TODO: Initialize a random kernel with small values. + + APPROACH: + 1. 
Store kernel_size as instance variable + 2. Initialize random kernel with small values + 3. Use proper initialization for stable training + + EXAMPLE: + Conv2D((2, 2)) creates: + - kernel: shape (2, 2) with small random values + + HINTS: + - Store kernel_size as self.kernel_size + - Initialize kernel: np.random.randn(kH, kW) * 0.1 (small values) + - Convert to float32 for consistency + """ + ### BEGIN SOLUTION + # Store kernel size + self.kernel_size = kernel_size + kH, kW = kernel_size + + # Initialize random kernel with small values + self.kernel = np.random.randn(kH, kW).astype(np.float32) * 0.1 + ### END SOLUTION + + def forward(self, x: Tensor) -> Tensor: + """ + Forward pass: apply convolution to input tensor. + + Args: + x: Input tensor (2D for simplicity) + + Returns: + Output tensor after convolution + + TODO: Implement forward pass using conv2d_naive function. + + APPROACH: + 1. Extract numpy array from input tensor + 2. Apply conv2d_naive with stored kernel + 3. Return result wrapped in Tensor + + EXAMPLE: + x = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) # shape (3, 3) + layer = Conv2D((2, 2)) + y = layer(x) # shape (2, 2) + + HINTS: + - Use x.data to get numpy array + - Use conv2d_naive(x.data, self.kernel) + - Return Tensor(result) to wrap the result + """ + ### BEGIN SOLUTION + # Apply convolution using naive implementation + result = conv2d_naive(x.data, self.kernel) + return Tensor(result) + ### END SOLUTION + + def __call__(self, x: Tensor) -> Tensor: + """Make layer callable: layer(x) same as layer.forward(x)""" + return self.forward(x) + +# %% [markdown] +""" +### 🧪 Quick Test: Conv2D Layer + +Let's test your Conv2D layer implementation! This is a learnable convolutional layer that can be trained. 
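As a quick aside before the graded test: the layer's behavior can be sketched standalone with plain NumPy arrays in place of the module's `Tensor`. The `MiniConv2D` class below is an illustrative stand-in (not part of the package), showing both the small random initialization and how a hand-set kernel acts as a pattern detector.

```python
# Minimal stand-in for the Conv2D layer above, using plain NumPy
# arrays instead of the module's Tensor class so it runs standalone.
import numpy as np

class MiniConv2D:
    def __init__(self, kernel_size, seed=0):
        kH, kW = kernel_size
        rng = np.random.default_rng(seed)
        # Small random weights, as in the module's initializer.
        self.kernel = (rng.standard_normal((kH, kW)) * 0.1).astype(np.float32)

    def __call__(self, x):
        kH, kW = self.kernel.shape
        out_H, out_W = x.shape[0] - kH + 1, x.shape[1] - kW + 1
        out = np.zeros((out_H, out_W), dtype=np.float32)
        for i in range(out_H):
            for j in range(out_W):
                out[i, j] = np.sum(x[i:i + kH, j:j + kW] * self.kernel)
        return out

layer = MiniConv2D((2, 2))
x = np.arange(1, 10, dtype=np.float32).reshape(3, 3)
y = layer(x)
assert y.shape == (2, 2)  # (3,3) input with a (2,2) kernel -> (2,2) output

# Hand-set the kernel to show it acts as a pattern detector:
layer.kernel = np.array([[-1, 1], [-1, 1]], dtype=np.float32)  # vertical edge
edge_img = np.array([[0, 0, 1, 1]] * 3, dtype=np.float32)
assert (layer(edge_img)[:, 1] > 0).all()  # fires on the 0->1 boundary
```

In the real layer the kernel is not hand-set; training will push the random weights toward whatever detector reduces the loss.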
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-conv2d-layer-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false} +# Test Conv2D layer immediately after implementation +print("🔬 Testing Conv2D layer...") + +# Create a Conv2D layer +try: + layer = Conv2D(kernel_size=(2, 2)) + print(f"Conv2D layer created with kernel size: {layer.kernel_size}") + print(f"Kernel shape: {layer.kernel.shape}") + + # Test that kernel is initialized properly + assert layer.kernel.shape == (2, 2), f"Kernel shape should be (2, 2), got {layer.kernel.shape}" + assert not np.allclose(layer.kernel, 0), "Kernel should not be all zeros" + print("✅ Conv2D layer initialization successful") + + # Test with sample input + x = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) + print(f"Input shape: {x.shape}") + + y = layer(x) + print(f"Output shape: {y.shape}") + print(f"Output: {y}") + + # Verify shapes + assert y.shape == (2, 2), f"Output shape should be (2, 2), got {y.shape}" + assert isinstance(y, Tensor), "Output should be a Tensor" + print("✅ Conv2D layer forward pass successful") + +except Exception as e: + print(f"❌ Conv2D layer test failed: {e}") + raise + +# Test different kernel sizes +try: + layer_3x3 = Conv2D(kernel_size=(3, 3)) + x_5x5 = Tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]) + y_3x3 = layer_3x3(x_5x5) + + assert y_3x3.shape == (3, 3), f"3x3 kernel output should be (3, 3), got {y_3x3.shape}" + print("✅ Different kernel sizes work correctly") + +except Exception as e: + print(f"❌ Different kernel sizes test failed: {e}") + raise + +# Show the layer behavior +print("🎯 Conv2D layer behavior:") +print(" Learnable kernel weights") +print(" Applies convolution to detect patterns") +print(" Can be trained end-to-end") +print("📈 Progress: Convolution operation ✓, Conv2D layer ✓") + +# %% [markdown] +""" +## Step 3: Flattening for Dense Layers + +### What is Flattening? 
+**Flattening** converts multi-dimensional tensors to 1D vectors, enabling connection between convolutional and dense layers. + +### Why Flattening is Needed +- **Interface compatibility**: Conv2D outputs 2D, Dense expects 1D +- **Network composition**: Connect spatial features to classification +- **Standard practice**: Almost all CNNs use this pattern +- **Dimension management**: Preserve information while changing shape + +### The Pattern +``` +Conv2D → ReLU → Conv2D → ReLU → Flatten → Dense → Output +``` + +### Real-World Usage +- **Classification**: Final layers need 1D input for class probabilities +- **Feature extraction**: Convert spatial features to vector representations +- **Transfer learning**: Extract features from pre-trained CNNs +""" + +# %% nbgrader={"grade": false, "grade_id": "flatten-function", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def flatten(x: Tensor) -> Tensor: + """ + Flatten a 2D tensor to 1D (for connecting to Dense layers). + + Args: + x: Input tensor to flatten + + Returns: + Flattened tensor with batch dimension preserved + + TODO: Implement flattening operation. + + APPROACH: + 1. Get the numpy array from the tensor + 2. Use .flatten() to convert to 1D + 3. Add batch dimension with [None, :] + 4. Return Tensor wrapped around the result + + EXAMPLE: + Input: Tensor([[1, 2], [3, 4]]) # shape (2, 2) + Output: Tensor([[1, 2, 3, 4]]) # shape (1, 4) + + HINTS: + - Use x.data.flatten() to get 1D array + - Add batch dimension: result[None, :] + - Return Tensor(result) + """ + ### BEGIN SOLUTION + # Flatten the tensor and add batch dimension + flattened = x.data.flatten() + result = flattened[None, :] # Add batch dimension + return Tensor(result) + ### END SOLUTION + +# %% [markdown] +""" +### 🧪 Quick Test: Flatten Function + +Let's test your flatten function! This connects convolutional layers to dense layers. 
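One practical detail worth making explicit: the `input_size` of the Dense layer that follows a flatten must equal the element count of the conv feature map. A minimal NumPy sketch of that bookkeeping (the local `flatten` mirrors the module's function; the 8x8 image and 3x3 kernel sizes are illustrative):

```python
# Shape bookkeeping for wiring a Conv2D feature map into a Dense layer.
import numpy as np

def flatten(x):
    return x.flatten()[None, :]  # row-major 1D, with a batch dimension

H, W = 8, 8                                # input image size
kH, kW = 3, 3                              # conv kernel size
out_H, out_W = H - kH + 1, W - kW + 1      # 6 x 6 feature map
dense_input_size = out_H * out_W           # 36: what Dense must expect

feature_map = np.random.randn(out_H, out_W).astype(np.float32)
vec = flatten(feature_map)
assert vec.shape == (1, dense_input_size)  # ready for a Dense(36, ...) layer
```

Getting this arithmetic wrong is the most common cause of shape errors when composing CNN pipelines, so it is worth computing `out_H * out_W` explicitly rather than hard-coding it.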
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-flatten-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false} +# Test flatten function immediately after implementation +print("🔬 Testing flatten function...") + +# Test case 1: 2x2 tensor +try: + x = Tensor([[1, 2], [3, 4]]) + flattened = flatten(x) + + print(f"Input: {x}") + print(f"Flattened: {flattened}") + print(f"Flattened shape: {flattened.shape}") + + # Verify shape and content + assert flattened.shape == (1, 4), f"Flattened shape should be (1, 4), got {flattened.shape}" + expected_data = np.array([[1, 2, 3, 4]]) + assert np.array_equal(flattened.data, expected_data), f"Flattened data should be {expected_data}, got {flattened.data}" + print("✅ 2x2 flatten test passed") + +except Exception as e: + print(f"❌ 2x2 flatten test failed: {e}") + raise + +# Test case 2: 3x3 tensor +try: + x2 = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) + flattened2 = flatten(x2) + + assert flattened2.shape == (1, 9), f"Flattened shape should be (1, 9), got {flattened2.shape}" + expected_data2 = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9]]) + assert np.array_equal(flattened2.data, expected_data2), f"Flattened data should be {expected_data2}, got {flattened2.data}" + print("✅ 3x3 flatten test passed") + +except Exception as e: + print(f"❌ 3x3 flatten test failed: {e}") + raise + +# Test case 3: Different shapes +try: + x3 = Tensor([[1, 2, 3, 4], [5, 6, 7, 8]]) # 2x4 + flattened3 = flatten(x3) + + assert flattened3.shape == (1, 8), f"Flattened shape should be (1, 8), got {flattened3.shape}" + expected_data3 = np.array([[1, 2, 3, 4, 5, 6, 7, 8]]) + assert np.array_equal(flattened3.data, expected_data3), f"Flattened data should be {expected_data3}, got {flattened3.data}" + print("✅ Different shapes flatten test passed") + +except Exception as e: + print(f"❌ Different shapes flatten test failed: {e}") + raise + +# Show the flattening behavior +print("🎯 Flatten behavior:") +print(" Converts 2D 
tensor to 1D") +print(" Preserves batch dimension") +print(" Enables connection to Dense layers") +print("📈 Progress: Convolution operation ✓, Conv2D layer ✓, Flatten ✓") +print("🚀 CNN pipeline ready!") + +# %% [markdown] +""" +## 🧪 Comprehensive CNN Testing Suite + +Let's test all CNN components thoroughly with realistic computer vision scenarios! +""" + +# %% nbgrader={"grade": false, "grade_id": "test-cnn-comprehensive", "locked": false, "schema_version": 3, "solution": false, "task": false} +def test_convolution_operations(): + """Test 1: Comprehensive convolution operations testing""" + print("🔬 Testing Convolution Operations...") + + # Test 1.1: Basic convolution + try: + input_img = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32) + identity_kernel = np.array([[1, 0], [0, 1]], dtype=np.float32) + + result = conv2d_naive(input_img, identity_kernel) + expected = np.array([[6, 8], [12, 14]], dtype=np.float32) + + assert np.allclose(result, expected), f"Identity convolution failed: {result} vs {expected}" + print("✅ Basic convolution test passed") + except Exception as e: + print(f"❌ Basic convolution failed: {e}") + return False + + # Test 1.2: Edge detection kernel + try: + # Vertical edge detection + edge_input = np.array([[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]], dtype=np.float32) + vertical_edge = np.array([[-1, 1], [-1, 1]], dtype=np.float32) + + result = conv2d_naive(edge_input, vertical_edge) + # Should detect the vertical edge at position (0,1) and (1,1) + assert result[0, 1] > 0 and result[1, 1] > 0, "Vertical edge not detected" + print("✅ Edge detection test passed") + except Exception as e: + print(f"❌ Edge detection failed: {e}") + return False + + # Test 1.3: Blur kernel + try: + noise_input = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 1]], dtype=np.float32) + blur_kernel = np.array([[0.25, 0.25], [0.25, 0.25]], dtype=np.float32) + + result = conv2d_naive(noise_input, blur_kernel) + # Blur should smooth out the noise + assert 
np.all(result >= 0) and np.all(result <= 1), "Blur kernel failed" + print("✅ Blur kernel test passed") + except Exception as e: + print(f"❌ Blur kernel failed: {e}") + return False + + # Test 1.4: Different kernel sizes + try: + large_input = np.random.randn(10, 10).astype(np.float32) + + # Test 3x3 kernel + kernel_3x3 = np.random.randn(3, 3).astype(np.float32) + result_3x3 = conv2d_naive(large_input, kernel_3x3) + assert result_3x3.shape == (8, 8), f"3x3 kernel output shape wrong: {result_3x3.shape}" + + # Test 5x5 kernel + kernel_5x5 = np.random.randn(5, 5).astype(np.float32) + result_5x5 = conv2d_naive(large_input, kernel_5x5) + assert result_5x5.shape == (6, 6), f"5x5 kernel output shape wrong: {result_5x5.shape}" + + print("✅ Different kernel sizes test passed") + except Exception as e: + print(f"❌ Different kernel sizes failed: {e}") + return False + + print("🎯 Convolution operations: All tests passed!") + return True + +def test_conv2d_layer(): + """Test 2: Conv2D layer comprehensive testing""" + print("🔬 Testing Conv2D Layer...") + + # Test 2.1: Layer initialization + try: + layer_2x2 = Conv2D(kernel_size=(2, 2)) + assert layer_2x2.kernel.shape == (2, 2), f"2x2 kernel shape wrong: {layer_2x2.kernel.shape}" + assert not np.allclose(layer_2x2.kernel, 0), "Kernel should not be all zeros" + + layer_3x3 = Conv2D(kernel_size=(3, 3)) + assert layer_3x3.kernel.shape == (3, 3), f"3x3 kernel shape wrong: {layer_3x3.kernel.shape}" + + print("✅ Layer initialization test passed") + except Exception as e: + print(f"❌ Layer initialization failed: {e}") + return False + + # Test 2.2: Forward pass with different inputs + try: + layer = Conv2D(kernel_size=(2, 2)) + + # Small image + small_img = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) + output_small = layer(small_img) + assert output_small.shape == (2, 2), f"Small image output shape wrong: {output_small.shape}" + assert isinstance(output_small, Tensor), "Output should be Tensor" + + # Larger image + large_img = 
Tensor(np.random.randn(8, 8)) + output_large = layer(large_img) + assert output_large.shape == (7, 7), f"Large image output shape wrong: {output_large.shape}" + + print("✅ Forward pass test passed") + except Exception as e: + print(f"❌ Forward pass failed: {e}") + return False + + # Test 2.3: Learnable parameters + try: + layer1 = Conv2D(kernel_size=(2, 2)) + layer2 = Conv2D(kernel_size=(2, 2)) + + # Different layers should have different random kernels + assert not np.allclose(layer1.kernel, layer2.kernel), "Different layers should have different kernels" + + # Test that kernels are reasonable size (not too large) + assert np.max(np.abs(layer1.kernel)) < 1.0, "Kernel values should be small for stable training" + + print("✅ Learnable parameters test passed") + except Exception as e: + print(f"❌ Learnable parameters failed: {e}") + return False + + # Test 2.4: Real computer vision scenario - digit recognition + try: + # Simulate a simple 5x5 digit + digit_5x5 = Tensor([ + [0, 1, 1, 1, 0], + [1, 0, 0, 0, 1], + [1, 0, 1, 0, 1], + [1, 0, 0, 0, 1], + [0, 1, 1, 1, 0] + ]) + + # Edge detection layer + edge_layer = Conv2D(kernel_size=(3, 3)) + edge_layer.kernel = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=np.float32) + + edges = edge_layer(digit_5x5) + assert edges.shape == (3, 3), f"Edge detection output shape wrong: {edges.shape}" + + print("✅ Computer vision scenario test passed") + except Exception as e: + print(f"❌ Computer vision scenario failed: {e}") + return False + + print("🎯 Conv2D layer: All tests passed!") + return True + +def test_flatten_operations(): + """Test 3: Flatten operations comprehensive testing""" + print("🔬 Testing Flatten Operations...") + + # Test 3.1: Basic flattening + try: + # 2x2 tensor + x_2x2 = Tensor([[1, 2], [3, 4]]) + flat_2x2 = flatten(x_2x2) + + assert flat_2x2.shape == (1, 4), f"2x2 flatten shape wrong: {flat_2x2.shape}" + expected = np.array([[1, 2, 3, 4]]) + assert np.array_equal(flat_2x2.data, expected), f"2x2 
flatten data wrong: {flat_2x2.data}" + + # 3x3 tensor + x_3x3 = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) + flat_3x3 = flatten(x_3x3) + + assert flat_3x3.shape == (1, 9), f"3x3 flatten shape wrong: {flat_3x3.shape}" + expected = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9]]) + assert np.array_equal(flat_3x3.data, expected), f"3x3 flatten data wrong: {flat_3x3.data}" + + print("✅ Basic flattening test passed") + except Exception as e: + print(f"❌ Basic flattening failed: {e}") + return False + + # Test 3.2: Different aspect ratios + try: + # Wide tensor + x_wide = Tensor([[1, 2, 3, 4, 5, 6]]) # 1x6 + flat_wide = flatten(x_wide) + assert flat_wide.shape == (1, 6), f"Wide flatten shape wrong: {flat_wide.shape}" + + # Tall tensor + x_tall = Tensor([[1], [2], [3], [4], [5], [6]]) # 6x1 + flat_tall = flatten(x_tall) + assert flat_tall.shape == (1, 6), f"Tall flatten shape wrong: {flat_tall.shape}" + + print("✅ Different aspect ratios test passed") + except Exception as e: + print(f"❌ Different aspect ratios failed: {e}") + return False + + # Test 3.3: Preserve data order + try: + # Test that flattening preserves row-major order + x_ordered = Tensor([[1, 2, 3], [4, 5, 6]]) # 2x3 + flat_ordered = flatten(x_ordered) + + expected_order = np.array([[1, 2, 3, 4, 5, 6]]) + assert np.array_equal(flat_ordered.data, expected_order), "Flatten should preserve row-major order" + + print("✅ Data order preservation test passed") + except Exception as e: + print(f"❌ Data order preservation failed: {e}") + return False + + # Test 3.4: CNN to Dense connection scenario + try: + # Simulate CNN feature map -> Dense layer + feature_map = Tensor([[0.1, 0.2], [0.3, 0.4]]) # 2x2 feature map + flattened_features = flatten(feature_map) + + # Should be ready for Dense layer input + assert flattened_features.shape == (1, 4), "Feature map should flatten to (1, 4)" + assert isinstance(flattened_features, Tensor), "Should remain a Tensor" + + # Test with Dense layer + dense = Dense(input_size=4, 
output_size=2) + output = dense(flattened_features) + assert output.shape == (1, 2), f"Dense output shape wrong: {output.shape}" + + print("✅ CNN to Dense connection test passed") + except Exception as e: + print(f"❌ CNN to Dense connection failed: {e}") + return False + + print("🎯 Flatten operations: All tests passed!") + return True + +def test_cnn_pipelines(): + """Test 4: Complete CNN pipeline testing""" + print("🔬 Testing CNN Pipelines...") + + # Test 4.1: Simple CNN pipeline + try: + # Create pipeline: Conv2D -> ReLU -> Flatten -> Dense + conv = Conv2D(kernel_size=(2, 2)) + relu = ReLU() + dense = Dense(input_size=4, output_size=3) + + # Input image + image = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) + + # Forward pass + features = conv(image) # (3,3) -> (2,2) + activated = relu(features) # (2,2) -> (2,2) + flattened = flatten(activated) # (2,2) -> (1,4) + output = dense(flattened) # (1,4) -> (1,3) + + assert features.shape == (2, 2), f"Conv output shape wrong: {features.shape}" + assert activated.shape == (2, 2), f"ReLU output shape wrong: {activated.shape}" + assert flattened.shape == (1, 4), f"Flatten output shape wrong: {flattened.shape}" + assert output.shape == (1, 3), f"Dense output shape wrong: {output.shape}" + + print("✅ Simple CNN pipeline test passed") + except Exception as e: + print(f"❌ Simple CNN pipeline failed: {e}") + return False + + # Test 4.2: Multi-layer CNN + try: + # Create deeper pipeline: Conv2D -> ReLU -> Conv2D -> ReLU -> Flatten -> Dense + conv1 = Conv2D(kernel_size=(2, 2)) + relu1 = ReLU() + conv2 = Conv2D(kernel_size=(2, 2)) + relu2 = ReLU() + dense = Dense(input_size=1, output_size=2) + + # Larger input for multi-layer processing + large_image = Tensor(np.random.randn(5, 5)) + + # Forward pass + h1 = conv1(large_image) # (5,5) -> (4,4) + h2 = relu1(h1) # (4,4) -> (4,4) + h3 = conv2(h2) # (4,4) -> (3,3) + h4 = relu2(h3) # (3,3) -> (3,3) + h5 = flatten(h4) # (3,3) -> (1,9) + + # Adjust dense layer for correct input size + 
dense_adjusted = Dense(input_size=9, output_size=2) + output = dense_adjusted(h5) # (1,9) -> (1,2) + + assert h1.shape == (4, 4), f"Conv1 output wrong: {h1.shape}" + assert h3.shape == (3, 3), f"Conv2 output wrong: {h3.shape}" + assert h5.shape == (1, 9), f"Flatten output wrong: {h5.shape}" + assert output.shape == (1, 2), f"Final output wrong: {output.shape}" + + print("✅ Multi-layer CNN test passed") + except Exception as e: + print(f"❌ Multi-layer CNN failed: {e}") + return False + + # Test 4.3: Image classification scenario + try: + # Simulate MNIST-like 8x8 digit classification + digit_image = Tensor(np.random.randn(8, 8)) + + # CNN for digit classification + feature_extractor = Conv2D(kernel_size=(3, 3)) # (8,8) -> (6,6) + activation = ReLU() + classifier_prep = flatten # (6,6) -> (1,36) + classifier = Dense(input_size=36, output_size=10) # 10 digit classes + + # Forward pass + features = feature_extractor(digit_image) + activated_features = activation(features) + feature_vector = classifier_prep(activated_features) + digit_scores = classifier(feature_vector) + + assert features.shape == (6, 6), f"Feature extraction shape wrong: {features.shape}" + assert feature_vector.shape == (1, 36), f"Feature vector shape wrong: {feature_vector.shape}" + assert digit_scores.shape == (1, 10), f"Digit scores shape wrong: {digit_scores.shape}" + + print("✅ Image classification scenario test passed") + except Exception as e: + print(f"❌ Image classification scenario failed: {e}") + return False + + # Test 4.4: Real-world CNN architecture pattern + try: + # Simulate LeNet-like architecture pattern + input_img = Tensor(np.random.randn(32, 32)) # 32x32 input image + + # First conv block + conv1 = Conv2D(kernel_size=(5, 5)) # (32,32) -> (28,28) + relu1 = ReLU() + + # Second conv block + conv2 = Conv2D(kernel_size=(5, 5)) # (28,28) -> (24,24) + relu2 = ReLU() + + # Classifier + classifier = Dense(input_size=24*24, output_size=3) # 3 classes + + # Forward pass + h1 = 
relu1(conv1(input_img)) + h2 = relu2(conv2(h1)) + h3 = flatten(h2) + output = classifier(h3) + + assert h1.shape == (28, 28), f"First conv block output wrong: {h1.shape}" + assert h2.shape == (24, 24), f"Second conv block output wrong: {h2.shape}" + assert h3.shape == (1, 576), f"Flattened features wrong: {h3.shape}" # 24*24 = 576 + assert output.shape == (1, 3), f"Classification output wrong: {output.shape}" + + print("✅ Real-world CNN architecture test passed") + except Exception as e: + print(f"❌ Real-world CNN architecture failed: {e}") + return False + + print("🎯 CNN pipelines: All tests passed!") + return True + +# Run all comprehensive tests +def run_comprehensive_cnn_tests(): + """Run all comprehensive CNN tests""" + print("🧪 Running Comprehensive CNN Test Suite...") + print("=" * 50) + + test_results = [] + + # Run all test functions + test_results.append(test_convolution_operations()) + test_results.append(test_conv2d_layer()) + test_results.append(test_flatten_operations()) + test_results.append(test_cnn_pipelines()) + + # Summary + print("=" * 50) + print("📊 Test Results Summary:") + print(f"✅ Convolution Operations: {'PASSED' if test_results[0] else 'FAILED'}") + print(f"✅ Conv2D Layer: {'PASSED' if test_results[1] else 'FAILED'}") + print(f"✅ Flatten Operations: {'PASSED' if test_results[2] else 'FAILED'}") + print(f"✅ CNN Pipelines: {'PASSED' if test_results[3] else 'FAILED'}") + + all_passed = all(test_results) + print(f"\n🎯 Overall Result: {'ALL TESTS PASSED! 
🎉' if all_passed else 'SOME TESTS FAILED ❌'}") + + if all_passed: + print("\n🚀 CNN Module Implementation Complete!") + print(" ✓ Convolution operations working correctly") + print(" ✓ Conv2D layers ready for training") + print(" ✓ Flatten operations connecting conv to dense layers") + print(" ✓ Complete CNN pipelines functional") + print("\n🎓 Ready for real computer vision applications!") + + return all_passed + +# Run the comprehensive test suite +if __name__ == "__main__": + run_comprehensive_cnn_tests() + +# %% [markdown] +""" +### 🧪 Test Your CNN Implementations + +Once you implement the functions above, run these cells to test them: +""" + +# %% nbgrader={"grade": true, "grade_id": "test-conv2d-naive", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +# Test conv2d_naive function +print("Testing conv2d_naive function...") + +# Test case 1: Simple 3x3 input with 2x2 kernel +input_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32) +kernel_array = np.array([[1, 0], [0, -1]], dtype=np.float32) + +result = conv2d_naive(input_array, kernel_array) +expected = np.array([[-4, -4], [-4, -4]], dtype=np.float32) + +print(f"Input:\n{input_array}") +print(f"Kernel:\n{kernel_array}") +print(f"Result:\n{result}") +print(f"Expected:\n{expected}") + +assert np.allclose(result, expected), f"conv2d_naive failed: expected {expected}, got {result}" + +# Test case 2: Different kernel +kernel2 = np.array([[1, 1], [1, 1]], dtype=np.float32) +result2 = conv2d_naive(input_array, kernel2) +expected2 = np.array([[12, 16], [24, 28]], dtype=np.float32) + +assert np.allclose(result2, expected2), f"conv2d_naive failed: expected {expected2}, got {result2}" + +print("✅ conv2d_naive tests passed!") + +# %% nbgrader={"grade": true, "grade_id": "test-conv2d-layer", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +# Test Conv2D layer +print("Testing Conv2D layer...") + +# Create a Conv2D layer +layer = 
Conv2D(kernel_size=(2, 2)) +print(f"Kernel size: {layer.kernel_size}") +print(f"Kernel shape: {layer.kernel.shape}") + +# Test with sample input +x = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) +print(f"Input shape: {x.shape}") + +y = layer(x) +print(f"Output shape: {y.shape}") +print(f"Output: {y}") + +# Verify shapes +assert y.shape == (2, 2), f"Output shape should be (2, 2), got {y.shape}" +assert isinstance(y, Tensor), "Output should be a Tensor" + +print("✅ Conv2D layer tests passed!") + +# %% nbgrader={"grade": true, "grade_id": "test-flatten", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +# Test flatten function +print("Testing flatten function...") + +# Test case 1: 2x2 tensor +x = Tensor([[1, 2], [3, 4]]) +flattened = flatten(x) + +print(f"Input: {x}") +print(f"Flattened: {flattened}") +print(f"Flattened shape: {flattened.shape}") + +# Verify shape and content +assert flattened.shape == (1, 4), f"Flattened shape should be (1, 4), got {flattened.shape}" +expected_data = np.array([[1, 2, 3, 4]]) +assert np.array_equal(flattened.data, expected_data), f"Flattened data should be {expected_data}, got {flattened.data}" + +# Test case 2: 3x3 tensor +x2 = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) +flattened2 = flatten(x2) + +assert flattened2.shape == (1, 9), f"Flattened shape should be (1, 9), got {flattened2.shape}" +expected_data2 = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9]]) +assert np.array_equal(flattened2.data, expected_data2), f"Flattened data should be {expected_data2}, got {flattened2.data}" + +print("✅ Flatten tests passed!") + +# %% nbgrader={"grade": true, "grade_id": "test-cnn-pipeline", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +# Test complete CNN pipeline +print("Testing complete CNN pipeline...") + +# Create a simple CNN pipeline: Conv2D → ReLU → Flatten → Dense +conv_layer = Conv2D(kernel_size=(2, 2)) +relu = ReLU() +dense_layer = Dense(input_size=4, output_size=2) + +# 
Test input (3x3 image) +x = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) +print(f"Input shape: {x.shape}") + +# Forward pass through pipeline +h1 = conv_layer(x) +print(f"After Conv2D: {h1.shape}") + +h2 = relu(h1) +print(f"After ReLU: {h2.shape}") + +h3 = flatten(h2) +print(f"After Flatten: {h3.shape}") + +h4 = dense_layer(h3) +print(f"After Dense: {h4.shape}") + +# Verify pipeline works +assert h1.shape == (2, 2), f"Conv2D output should be (2, 2), got {h1.shape}" +assert h2.shape == (2, 2), f"ReLU output should be (2, 2), got {h2.shape}" +assert h3.shape == (1, 4), f"Flatten output should be (1, 4), got {h3.shape}" +assert h4.shape == (1, 2), f"Dense output should be (1, 2), got {h4.shape}" + +print("✅ CNN pipeline tests passed!") + +# %% [markdown] +""" +## 🎯 Module Summary + +Congratulations! You've successfully implemented the core components of convolutional neural networks: + +### What You've Accomplished +✅ **Convolution Operation**: Implemented conv2d_naive with sliding window from scratch +✅ **Conv2D Layer**: Built a learnable convolutional layer with random kernel initialization +✅ **Flattening**: Created the bridge between convolutional and dense layers +✅ **CNN Pipeline**: Composed Conv2D → ReLU → Flatten → Dense for complete networks +✅ **Spatial Pattern Detection**: Understanding how convolution detects local features + +### Key Concepts You've Learned +- **Convolution is pattern matching**: Kernels detect specific spatial patterns +- **Parameter sharing**: Same kernel applied everywhere for translation invariance +- **Local connectivity**: Each output depends only on a small input region +- **Spatial hierarchy**: Multiple layers build increasingly complex features +- **Dimension management**: Flattening connects spatial and vector representations + +### Mathematical Foundations +- **Convolution operation**: (I * K)[i,j] = ΣΣ I[i+m, j+n] × K[m,n] +- **Sliding window**: Kernel moves across input computing dot products +- **Feature maps**: Convolution 
outputs that highlight detected patterns +- **Translation invariance**: Same pattern detected regardless of position + +### Real-World Applications +- **Computer vision**: Object recognition, face detection, medical imaging +- **Image processing**: Edge detection, noise reduction, enhancement +- **Autonomous systems**: Traffic sign recognition, obstacle detection +- **Scientific imaging**: Satellite imagery, microscopy, astronomy + +### Next Steps +1. **Export your code**: `tito package nbdev --export 05_cnn` +2. **Test your implementation**: `tito module test 05_cnn` +3. **Use your CNN components**: + ```python + from tinytorch.core.cnn import Conv2D, conv2d_naive, flatten + from tinytorch.core.layers import Dense + from tinytorch.core.activations import ReLU + + # Create CNN pipeline + conv = Conv2D((3, 3)) + relu = ReLU() + dense = Dense(16, 10) + + # Process image + features = conv(image) + activated = relu(features) + flattened = flatten(activated) + output = dense(flattened) + ``` +4. **Move to Module 6**: Start building data loading and preprocessing pipelines! + +**Ready for the next challenge?** Let's build efficient data loading systems to feed our networks! +""" \ No newline at end of file diff --git a/modules/source/05_cnn/tests/test_cnn.py b/modules/source/05_cnn/tests/test_cnn.py deleted file mode 100644 index 93a7c4f1..00000000 --- a/modules/source/05_cnn/tests/test_cnn.py +++ /dev/null @@ -1,368 +0,0 @@ -""" -Test suite for the CNN module. -This tests the CNN implementations to ensure they work correctly. 
-""" - -import pytest -import numpy as np -import sys -from pathlib import Path - -# Add the CNN module to the path -sys.path.append(str(Path(__file__).parent.parent)) - -try: - # Import from the exported package - from tinytorch.core.cnn import conv2d_naive, Conv2D, flatten -except ImportError: - # Fallback for when module isn't exported yet - from cnn_dev import conv2d_naive, Conv2D, flatten - -from tinytorch.core.tensor import Tensor - -def safe_numpy(tensor): - """Get numpy array from tensor, using .data attribute""" - return tensor.data - - -class TestConv2DNaive: - """Test the naive convolution implementation.""" - - def test_conv2d_naive_small(self): - """Test basic convolution with small matrices.""" - input = np.array([ - [1, 2, 3], - [4, 5, 6], - [7, 8, 9] - ], dtype=np.float32) - kernel = np.array([ - [1, 0], - [0, -1] - ], dtype=np.float32) - expected = np.array([ - [1*1+2*0+4*0+5*(-1), 2*1+3*0+5*0+6*(-1)], - [4*1+5*0+7*0+8*(-1), 5*1+6*0+8*0+9*(-1)] - ], dtype=np.float32) - output = conv2d_naive(input, kernel) - assert np.allclose(output, expected), f"conv2d_naive output incorrect!\nExpected:\n{expected}\nGot:\n{output}" - - def test_conv2d_naive_edge_detection(self): - """Test convolution with edge detection kernel.""" - input = np.array([ - [0, 0, 0, 0, 0], - [0, 1, 1, 1, 0], - [0, 1, 1, 1, 0], - [0, 1, 1, 1, 0], - [0, 0, 0, 0, 0] - ], dtype=np.float32) - - # Vertical edge detection kernel - kernel = np.array([ - [-1, 0, 1], - [-2, 0, 2], - [-1, 0, 1] - ], dtype=np.float32) - - output = conv2d_naive(input, kernel) - assert output.shape == (3, 3), f"Expected shape (3, 3), got {output.shape}" - - # Should detect vertical edges - assert np.abs(output[1, 0]) > 0, "Should detect left edge" - assert np.abs(output[1, 2]) > 0, "Should detect right edge" - assert np.abs(output[1, 1]) < 1, "Should be small in center" - - def test_conv2d_naive_identity_kernel(self): - """Test convolution with identity kernel.""" - input = np.array([ - [1, 2, 3], - [4, 5, 6], - 
[7, 8, 9] - ], dtype=np.float32) - - # Identity kernel - kernel = np.array([ - [0, 0, 0], - [0, 1, 0], - [0, 0, 0] - ], dtype=np.float32) - - output = conv2d_naive(input, kernel) - expected = np.array([[5]], dtype=np.float32) # Only center value - assert np.allclose(output, expected), f"Identity kernel failed: got {output}, expected {expected}" - - def test_conv2d_naive_different_sizes(self): - """Test convolution with different input and kernel sizes.""" - # 4x4 input, 2x2 kernel - input = np.array([ - [1, 2, 3, 4], - [5, 6, 7, 8], - [9, 10, 11, 12], - [13, 14, 15, 16] - ], dtype=np.float32) - - kernel = np.array([ - [1, 1], - [1, 1] - ], dtype=np.float32) - - output = conv2d_naive(input, kernel) - assert output.shape == (3, 3), f"Expected shape (3, 3), got {output.shape}" - - # Check first element: 1+2+5+6 = 14 - assert np.isclose(output[0, 0], 14), f"First element should be 14, got {output[0, 0]}" - - def test_conv2d_naive_single_pixel(self): - """Test convolution with single pixel input.""" - input = np.array([[5]], dtype=np.float32) - kernel = np.array([[2]], dtype=np.float32) - - output = conv2d_naive(input, kernel) - expected = np.array([[10]], dtype=np.float32) - assert np.allclose(output, expected), f"Single pixel convolution failed: got {output}, expected {expected}" - - -class TestConv2DLayer: - """Test the Conv2D layer implementation.""" - - def test_conv2d_layer_creation(self): - """Test Conv2D layer creation.""" - conv = Conv2D((3, 3)) - assert conv.kernel_size == (3, 3), f"Kernel size should be (3, 3), got {conv.kernel_size}" - assert conv.kernel.shape == (3, 3), f"Kernel shape should be (3, 3), got {conv.kernel.shape}" - - def test_conv2d_layer_forward_pass(self): - """Test Conv2D layer forward pass.""" - conv = Conv2D((2, 2)) - x = Tensor(np.ones((4, 4), dtype=np.float32)) - - output = conv(x) - assert output.shape == (3, 3), f"Expected output shape (3, 3), got {output.shape}" - assert hasattr(output, 'data'), "Output should be a Tensor with data 
attribute" - - def test_conv2d_layer_different_sizes(self): - """Test Conv2D layer with different input sizes.""" - conv = Conv2D((2, 2)) - - # Test with 3x3 input - x1 = Tensor(np.ones((3, 3), dtype=np.float32)) - out1 = conv(x1) - assert out1.shape == (2, 2), f"3x3 input should give (2, 2) output, got {out1.shape}" - - # Test with 5x5 input - x2 = Tensor(np.ones((5, 5), dtype=np.float32)) - out2 = conv(x2) - assert out2.shape == (4, 4), f"5x5 input should give (4, 4) output, got {out2.shape}" - - def test_conv2d_layer_kernel_initialization(self): - """Test that Conv2D layer initializes kernel properly.""" - conv = Conv2D((3, 3)) - - # Kernel should not be all zeros - assert not np.allclose(conv.kernel, 0), "Kernel should not be all zeros" - - # Kernel should be reasonable size (not too large) - assert np.abs(conv.kernel).max() < 10, "Kernel values should be reasonable" - - def test_conv2d_layer_reproducibility(self): - """Test that Conv2D layer gives consistent results.""" - conv = Conv2D((2, 2)) - x = Tensor(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.float32)) - - # Multiple forward passes should give same result - out1 = conv(x) - out2 = conv(x) - - assert np.allclose(safe_numpy(out1), safe_numpy(out2)), "Conv2D should be deterministic" - - -class TestFlattenFunction: - """Test the flatten function implementation.""" - - def test_flatten_2d_matrix(self): - """Test flattening a 2D matrix.""" - x = Tensor(np.array([[1, 2], [3, 4]], dtype=np.float32)) - flattened = flatten(x) - - expected = np.array([[1, 2, 3, 4]], dtype=np.float32) - assert np.array_equal(safe_numpy(flattened), expected), f"Flatten failed: got {safe_numpy(flattened)}, expected {expected}" - assert flattened.shape == (1, 4), f"Expected shape (1, 4), got {flattened.shape}" - - def test_flatten_3d_tensor(self): - """Test flattening a 3D tensor.""" - x = Tensor(np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]], dtype=np.float32)) - flattened = flatten(x) - - expected = np.array([[1, 2, 3, 4, 5, 
6, 7, 8]], dtype=np.float32) - assert np.array_equal(safe_numpy(flattened), expected), f"3D flatten failed: got {safe_numpy(flattened)}, expected {expected}" - assert flattened.shape == (1, 8), f"Expected shape (1, 8), got {flattened.shape}" - - def test_flatten_1d_tensor(self): - """Test flattening a 1D tensor.""" - x = Tensor(np.array([1, 2, 3, 4], dtype=np.float32)) - flattened = flatten(x) - - expected = np.array([[1, 2, 3, 4]], dtype=np.float32) - assert np.array_equal(safe_numpy(flattened), expected), f"1D flatten failed: got {safe_numpy(flattened)}, expected {expected}" - assert flattened.shape == (1, 4), f"Expected shape (1, 4), got {flattened.shape}" - - def test_flatten_single_element(self): - """Test flattening a single element tensor.""" - x = Tensor(np.array([[[[5]]]], dtype=np.float32)) - flattened = flatten(x) - - expected = np.array([[5]], dtype=np.float32) - assert np.array_equal(safe_numpy(flattened), expected), f"Single element flatten failed: got {safe_numpy(flattened)}, expected {expected}" - assert flattened.shape == (1, 1), f"Expected shape (1, 1), got {flattened.shape}" - - def test_flatten_preserves_data_type(self): - """Test that flatten preserves data type.""" - x = Tensor(np.array([[1, 2], [3, 4]], dtype=np.float32)) - flattened = flatten(x) - - assert safe_numpy(flattened).dtype == np.float32, f"Data type should be preserved: got {safe_numpy(flattened).dtype}" - - -class TestCNNIntegration: - """Test integration between CNN components.""" - - def test_conv_then_flatten(self): - """Test convolution followed by flatten (typical CNN pattern).""" - # Create a simple input - x = Tensor(np.array([ - [1, 2, 3, 4], - [5, 6, 7, 8], - [9, 10, 11, 12], - [13, 14, 15, 16] - ], dtype=np.float32)) - - # Apply convolution - conv = Conv2D((2, 2)) - conv_out = conv(x) - assert conv_out.shape == (3, 3), f"Conv output should be (3, 3), got {conv_out.shape}" - - # Apply flatten - flat_out = flatten(conv_out) - assert flat_out.shape == (1, 9), f"Flatten 
output should be (1, 9), got {flat_out.shape}" - - # Check that data is preserved - assert safe_numpy(flat_out).size == 9, "Should have 9 elements after flatten" - - def test_multiple_conv_layers(self): - """Test multiple convolution layers (deeper CNN).""" - x = Tensor(np.ones((5, 5), dtype=np.float32)) - - # First conv layer - conv1 = Conv2D((2, 2)) - out1 = conv1(x) - assert out1.shape == (4, 4), f"First conv should give (4, 4), got {out1.shape}" - - # Second conv layer - conv2 = Conv2D((2, 2)) - out2 = conv2(out1) - assert out2.shape == (3, 3), f"Second conv should give (3, 3), got {out2.shape}" - - # Final flatten - final = flatten(out2) - assert final.shape == (1, 9), f"Final flatten should give (1, 9), got {final.shape}" - - def test_conv_output_range(self): - """Test that convolution outputs are in reasonable range.""" - # Create input with known range - x = Tensor(np.random.rand(4, 4).astype(np.float32)) # Values 0-1 - - conv = Conv2D((2, 2)) - output = conv(x) - - # Output should be finite - assert np.all(np.isfinite(safe_numpy(output))), "Conv output should be finite" - - # Output should not be extremely large - assert np.abs(safe_numpy(output)).max() < 100, "Conv output should not be extremely large" - - -class TestCNNEdgeCases: - """Test edge cases and error conditions.""" - - def test_conv2d_naive_minimum_size(self): - """Test convolution with minimum possible sizes.""" - # 1x1 input, 1x1 kernel - input = np.array([[1]], dtype=np.float32) - kernel = np.array([[2]], dtype=np.float32) - - output = conv2d_naive(input, kernel) - expected = np.array([[2]], dtype=np.float32) - assert np.allclose(output, expected), f"Minimum size convolution failed: got {output}, expected {expected}" - - def test_conv2d_layer_minimum_size(self): - """Test Conv2D layer with minimum input size.""" - conv = Conv2D((1, 1)) - x = Tensor(np.array([[5]], dtype=np.float32)) - - output = conv(x) - assert output.shape == (1, 1), f"Minimum size layer should give (1, 1), got 
{output.shape}" - - def test_flatten_empty_handling(self): - """Test flatten with various edge cases.""" - # Very small tensor - x = Tensor(np.array([1], dtype=np.float32)) - flattened = flatten(x) - assert flattened.shape == (1, 1), f"Single element should give (1, 1), got {flattened.shape}" - - def test_conv_with_zeros(self): - """Test convolution with zero inputs.""" - # All zeros input - x = Tensor(np.zeros((3, 3), dtype=np.float32)) - conv = Conv2D((2, 2)) - output = conv(x) - - # Should not crash and should produce valid output - assert output.shape == (2, 2), f"Zero input should give (2, 2), got {output.shape}" - assert np.all(np.isfinite(safe_numpy(output))), "Zero input should produce finite output" - - def test_conv_with_negative_values(self): - """Test convolution with negative inputs.""" - x = Tensor(np.array([[-1, -2], [-3, -4]], dtype=np.float32)) - conv = Conv2D((2, 2)) - output = conv(x) - - # Should handle negative values properly - assert output.shape == (1, 1), f"Negative input should give (1, 1), got {output.shape}" - assert np.all(np.isfinite(safe_numpy(output))), "Negative input should produce finite output" - - -class TestCNNPerformance: - """Test performance characteristics of CNN operations.""" - - def test_conv_reasonable_speed(self): - """Test that convolution completes in reasonable time.""" - import time - - # Medium-sized input - x = Tensor(np.random.rand(10, 10).astype(np.float32)) - conv = Conv2D((3, 3)) - - start_time = time.time() - output = conv(x) - end_time = time.time() - - # Should complete quickly (less than 1 second) - assert end_time - start_time < 1.0, "Convolution should complete quickly" - assert output.shape == (8, 8), f"Expected (8, 8), got {output.shape}" - - def test_flatten_preserves_size(self): - """Test that flatten preserves total number of elements.""" - shapes = [(2, 3), (4, 4), (1, 10), (5, 2, 3)] - - for shape in shapes: - x = Tensor(np.random.rand(*shape).astype(np.float32)) - flattened = flatten(x) - - 
original_size = np.prod(shape) - flattened_size = flattened.shape[1] # Second dimension since flatten returns (1, N) - - assert original_size == flattened_size, f"Size mismatch for shape {shape}: {original_size} != {flattened_size}" - - -if __name__ == "__main__": - # Run the tests - pytest.main([__file__, "-v"]) \ No newline at end of file diff --git a/modules/source/06_dataloader/dataloader_dev.py b/modules/source/06_dataloader/dataloader_dev.py index b73c51fe..e213d8cc 100644 --- a/modules/source/06_dataloader/dataloader_dev.py +++ b/modules/source/06_dataloader/dataloader_dev.py @@ -753,22 +753,22 @@ try: dataset = SimpleDataset(size=20, num_features=5, num_classes=4) print(f"Dataset created: size={len(dataset)}, features={dataset.num_features}, classes={dataset.get_num_classes()}") - - # Test basic properties + + # Test basic properties assert len(dataset) == 20, f"Dataset length should be 20, got {len(dataset)}" assert dataset.get_num_classes() == 4, f"Should have 4 classes, got {dataset.get_num_classes()}" print("✅ SimpleDataset basic properties work correctly") - + # Test sample access - data, label = dataset[0] - assert isinstance(data, Tensor), "Data should be a Tensor" - assert isinstance(label, Tensor), "Label should be a Tensor" + data, label = dataset[0] + assert isinstance(data, Tensor), "Data should be a Tensor" + assert isinstance(label, Tensor), "Label should be a Tensor" assert data.shape == (5,), f"Data shape should be (5,), got {data.shape}" assert label.shape == (), f"Label shape should be (), got {label.shape}" print("✅ SimpleDataset sample access works correctly") - + # Test sample shape - sample_shape = dataset.get_sample_shape() + sample_shape = dataset.get_sample_shape() assert sample_shape == (5,), f"Sample shape should be (5,), got {sample_shape}" print("✅ SimpleDataset get_sample_shape works correctly") @@ -787,7 +787,7 @@ try: assert np.array_equal(label1.data, label2.data), "Labels should be deterministic" print("✅ SimpleDataset data 
is deterministic") -except Exception as e: + except Exception as e: print(f"❌ SimpleDataset test failed: {e}") raise @@ -861,9 +861,9 @@ try: # Verify batch properties assert batch_data.shape[1] == 8, f"Features should be 8, got {batch_data.shape[1]}" assert len(batch_labels.shape) == 1, f"Labels should be 1D, got shape {batch_labels.shape}" - assert isinstance(batch_data, Tensor), "Batch data should be Tensor" - assert isinstance(batch_labels, Tensor), "Batch labels should be Tensor" - + assert isinstance(batch_data, Tensor), "Batch data should be Tensor" + assert isinstance(batch_labels, Tensor), "Batch labels should be Tensor" + assert epoch_samples == 100, f"Should process 100 samples, got {epoch_samples}" expected_batches = (100 + 16 - 1) // 16 assert epoch_batches == expected_batches, f"Should have {expected_batches} batches, got {epoch_batches}" @@ -943,11 +943,11 @@ try: dataset = SimpleDataset(size=60, num_features=6, num_classes=3) loader = DataLoader(dataset, batch_size=20, shuffle=True) - for epoch in range(3): - epoch_samples = 0 + for epoch in range(3): + epoch_samples = 0 for batch_data, batch_labels in loader: - epoch_samples += batch_data.shape[0] - + epoch_samples += batch_data.shape[0] + # Verify shapes remain consistent across epochs assert batch_data.shape[1] == 6, f"Features should be 6 in epoch {epoch}" assert len(batch_labels.shape) == 1, f"Labels should be 1D in epoch {epoch}" @@ -963,7 +963,7 @@ try: print(" • Memory-efficient processing") print(" • Multi-epoch training scenarios") -except Exception as e: + except Exception as e: print(f"❌ Integration test failed: {e}") raise @@ -1038,7 +1038,7 @@ Congratulations! You've successfully implemented the core components of data loa for epoch in range(num_epochs): for batch_data, batch_labels in loader: # Train model - pass + pass ``` 4. **Explore advanced topics**: Data augmentation, distributed loading, streaming datasets! 
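The hunks above adjust tests built around the Dataset/DataLoader contract: a loader over N samples with batch size B yields `(N + B - 1) // B` batches per epoch, the last batch possibly short, with shapes consistent across epochs. A minimal, framework-free sketch of that contract — the class names echo the module's `SimpleDataset` and `DataLoader`, but this NumPy-only version is an illustration, not the actual TinyTorch implementation:

```python
import numpy as np

class SimpleDataset:
    """Synthetic dataset: fixed random features and integer class labels."""
    def __init__(self, size, num_features, num_classes, seed=0):
        rng = np.random.default_rng(seed)  # fixed seed -> deterministic samples
        self.data = rng.standard_normal((size, num_features)).astype(np.float32)
        self.labels = rng.integers(0, num_classes, size=size)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

class DataLoader:
    """Iterate a dataset in (optionally shuffled) fixed-size batches."""
    def __init__(self, dataset, batch_size, shuffle=True):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        indices = np.arange(len(self.dataset))
        if self.shuffle:
            np.random.shuffle(indices)  # reshuffled each epoch
        # Yields ceil(N / batch_size) batches; the final batch may be smaller.
        for start in range(0, len(indices), self.batch_size):
            batch_idx = indices[start:start + self.batch_size]
            data = np.stack([self.dataset[i][0] for i in batch_idx])
            labels = np.array([self.dataset[i][1] for i in batch_idx])
            yield data, labels

dataset = SimpleDataset(size=100, num_features=8, num_classes=4)
loader = DataLoader(dataset, batch_size=16)
num_batches = sum(1 for _ in loader)
print(num_batches)  # (100 + 16 - 1) // 16 = 7
```

This mirrors what the edited tests assert: 100 samples at batch size 16 give 7 batches (six full, one of 4), every batch keeps 8 feature columns, and labels stay 1-D.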
diff --git a/modules/source/06_dataloader/dataloader_dev_backup.py b/modules/source/06_dataloader/dataloader_dev_backup.py new file mode 100644 index 00000000..bfc1f080 --- /dev/null +++ b/modules/source/06_dataloader/dataloader_dev_backup.py @@ -0,0 +1,1368 @@ +# --- +# jupyter: +# jupytext: +# text_representation: +# extension: .py +# format_name: percent +# format_version: '1.3' +# jupytext_version: 1.17.1 +# --- + +# %% [markdown] +""" +# Module 6: DataLoader - Data Loading and Preprocessing + +Welcome to the DataLoader module! This is where you'll learn how to efficiently load, process, and manage data for machine learning systems. + +## Learning Goals +- Understand data pipelines as the foundation of ML systems +- Implement efficient data loading with memory management and batching +- Build reusable dataset abstractions for different data types +- Master the Dataset and DataLoader pattern used in all ML frameworks +- Learn systems thinking for data engineering and I/O optimization + +## Build → Use → Understand +1. **Build**: Create dataset classes and data loaders from scratch +2. **Use**: Load real datasets and feed them to neural networks +3. 
**Understand**: How data engineering affects system performance and scalability +""" + +# %% nbgrader={"grade": false, "grade_id": "dataloader-imports", "locked": false, "schema_version": 3, "solution": false, "task": false} +#| default_exp core.dataloader + +#| export +import numpy as np +import sys +import os +import pickle +import struct +from typing import List, Tuple, Optional, Union, Iterator +import matplotlib.pyplot as plt +import urllib.request +import tarfile + +# Import our building blocks - try package first, then local modules +try: + from tinytorch.core.tensor import Tensor +except ImportError: + # For development, import from local modules + sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor')) + from tensor_dev import Tensor + +# %% nbgrader={"grade": false, "grade_id": "dataloader-setup", "locked": false, "schema_version": 3, "solution": false, "task": false} +#| hide +#| export +def _should_show_plots(): + """Check if we should show plots (disable during testing)""" + # Check multiple conditions that indicate we're in test mode + is_pytest = ( + 'pytest' in sys.modules or + 'test' in sys.argv or + os.environ.get('PYTEST_CURRENT_TEST') is not None or + any('test' in arg for arg in sys.argv) or + any('pytest' in arg for arg in sys.argv) + ) + + # Show plots in development mode (when not in test mode) + return not is_pytest + +# %% nbgrader={"grade": false, "grade_id": "dataloader-welcome", "locked": false, "schema_version": 3, "solution": false, "task": false} +print("🔥 TinyTorch DataLoader Module") +print(f"NumPy version: {np.__version__}") +print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}") +print("Ready to build data pipelines!") + +# %% [markdown] +""" +## 📦 Where This Code Lives in the Final Package + +**Learning Side:** You work in `modules/source/06_dataloader/dataloader_dev.py` +**Building Side:** Code exports to `tinytorch.core.dataloader` + +```python +# Final package structure: +from 
tinytorch.core.dataloader import Dataset, DataLoader # Data loading utilities! +from tinytorch.core.tensor import Tensor # Foundation +from tinytorch.core.networks import Sequential # Models to train +``` + +**Why this matters:** +- **Learning:** Focused modules for deep understanding of data pipelines +- **Production:** Proper organization like PyTorch's `torch.utils.data` +- **Consistency:** All data loading utilities live together in `core.dataloader` +- **Integration:** Works seamlessly with tensors and networks +""" + +# %% [markdown] +""" +## 🧠 The Mathematical Foundation of Data Engineering + +### The Data Pipeline Equation +Every machine learning system follows this fundamental equation: + +``` +Model Performance = f(Data Quality × Data Quantity × Data Efficiency) +``` + +### Why Data Engineering is Critical +- **Data is the fuel**: Without proper data pipelines, nothing else works +- **I/O bottlenecks**: Data loading is often the biggest performance bottleneck +- **Memory management**: How you handle data affects everything else +- **Production reality**: Data pipelines are critical in real ML systems + +### The Three Pillars of Data Engineering +1. **Abstraction**: Clean interfaces that hide complexity +2. **Efficiency**: Minimize I/O and memory overhead +3. **Scalability**: Handle datasets larger than memory + +### Connection to Real ML Systems +Every framework uses the Dataset/DataLoader pattern: +- **PyTorch**: `torch.utils.data.Dataset` and `torch.utils.data.DataLoader` +- **TensorFlow**: `tf.data.Dataset` with efficient data pipelines +- **JAX**: Custom data loading with `jax.numpy` integration +- **TinyTorch**: `tinytorch.core.dataloader.Dataset` and `DataLoader` (what we're building!) 
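The interface shared by all of these frameworks is small: a map-style dataset is just an object with `__getitem__` and `__len__`. A simplified, framework-agnostic sketch of that protocol (the `RangeDataset` toy class is hypothetical, not part of any framework):

```python
import numpy as np

class Dataset:
    """Minimal sketch of the map-style dataset protocol."""
    def __getitem__(self, index):
        raise NotImplementedError  # subclasses return a (data, label) pair
    def __len__(self):
        raise NotImplementedError  # subclasses return the sample count

class RangeDataset(Dataset):
    """Toy dataset: sample i is the feature vector [i, 2*i] with label i % 2."""
    def __init__(self, size):
        self.size = size
    def __getitem__(self, index):
        return np.array([index, 2 * index], dtype=np.float32), index % 2
    def __len__(self):
        return self.size

ds = RangeDataset(4)
print(len(ds))  # 4
data, label = ds[1]  # feature vector [1., 2.], label 1
```

Because a DataLoader only ever calls these two methods, any object implementing them — backed by files, a database, or generated on the fly — plugs into the same training loop.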
+ +### Performance Considerations +- **Memory efficiency**: Handle datasets larger than RAM +- **I/O optimization**: Read from disk efficiently with batching +- **Caching strategies**: When to cache vs recompute +- **Parallel processing**: Multi-threaded data loading +""" + +# %% [markdown] +""" +## Step 1: Understanding Data Engineering + +### What is Data Engineering? +**Data engineering** is the foundation of all machine learning systems. It involves loading, processing, and managing data efficiently so that models can learn from it. + +### The Fundamental Insight +**Data engineering is about managing the flow of information through your system:** +``` +Raw Data → Load → Preprocess → Batch → Feed to Model +``` + +### Real-World Examples +- **Image datasets**: CIFAR-10, ImageNet, MNIST +- **Text datasets**: Wikipedia, books, social media +- **Tabular data**: CSV files, databases, spreadsheets +- **Audio data**: Speech recordings, music files + +### Systems Thinking +- **Memory efficiency**: Handle datasets larger than RAM +- **I/O optimization**: Read from disk efficiently +- **Batching strategies**: Trade-offs between memory and speed +- **Caching**: When to cache vs recompute + +### Visual Intuition +``` +Raw Files: [image1.jpg, image2.jpg, image3.jpg, ...] +Load: [Tensor(32x32x3), Tensor(32x32x3), Tensor(32x32x3), ...] +Batch: [Tensor(32, 32, 32, 3)] # 32 images at once +Model: Process batch efficiently +``` + +Let's start by building the most fundamental component: **Dataset**. +""" + +# %% nbgrader={"grade": false, "grade_id": "dataset-class", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +class Dataset: + """ + Base Dataset class: Abstract interface for all datasets. + + The fundamental abstraction for data loading in TinyTorch. + Students implement concrete datasets by inheriting from this class. + """ + + def __getitem__(self, index: int) -> Tuple[Tensor, Tensor]: + """ + Get a single sample and label by index. 
+ + Args: + index: Index of the sample to retrieve + + Returns: + Tuple of (data, label) tensors + + TODO: Implement abstract method for getting samples. + + APPROACH: + 1. This is an abstract method - subclasses will implement it + 2. Return a tuple of (data, label) tensors + 3. Data should be the input features, label should be the target + + EXAMPLE: + dataset[0] should return (Tensor(image_data), Tensor(label)) + + HINTS: + - This is an abstract method that subclasses must override + - Always return a tuple of (data, label) tensors + - Data contains the input features, label contains the target + """ + ### BEGIN SOLUTION + # This is an abstract method - subclasses must implement it + raise NotImplementedError("Subclasses must implement __getitem__") + ### END SOLUTION + + def __len__(self) -> int: + """ + Get the total number of samples in the dataset. + + TODO: Implement abstract method for getting dataset size. + + APPROACH: + 1. This is an abstract method - subclasses will implement it + 2. Return the total number of samples in the dataset + + EXAMPLE: + len(dataset) should return 50000 for CIFAR-10 training set + + HINTS: + - This is an abstract method that subclasses must override + - Return an integer representing the total number of samples + """ + ### BEGIN SOLUTION + # This is an abstract method - subclasses must implement it + raise NotImplementedError("Subclasses must implement __len__") + ### END SOLUTION + + def get_sample_shape(self) -> Tuple[int, ...]: + """ + Get the shape of a single data sample. + + TODO: Implement method to get sample shape. + + APPROACH: + 1. Get the first sample using self[0] + 2. Extract the data part (first element of tuple) + 3. 
Return the shape of the data tensor + + EXAMPLE: + For CIFAR-10: returns (3, 32, 32) for RGB images + + HINTS: + - Use self[0] to get the first sample + - Extract data from the (data, label) tuple + - Return data.shape + """ + ### BEGIN SOLUTION + # Get the first sample to determine shape + data, _ = self[0] + return data.shape + ### END SOLUTION + + def get_num_classes(self) -> int: + """ + Get the number of classes in the dataset. + + TODO: Implement abstract method for getting number of classes. + + APPROACH: + 1. This is an abstract method - subclasses will implement it + 2. Return the number of unique classes in the dataset + + EXAMPLE: + For CIFAR-10: returns 10 (classes 0-9) + + HINTS: + - This is an abstract method that subclasses must override + - Return the number of unique classes/categories + """ + ### BEGIN SOLUTION + # This is an abstract method - subclasses must implement it + raise NotImplementedError("Subclasses must implement get_num_classes") + ### END SOLUTION + +# %% [markdown] +""" +### 🧪 Quick Test: Dataset Base Class + +Let's understand the Dataset interface! While we can't test the abstract class directly, we'll create a simple test dataset. 
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-dataset-interface-immediate", "locked": true, "points": 5, "schema_version": 3, "solution": false, "task": false} +# Test Dataset interface with a simple implementation +print("🔬 Testing Dataset interface...") + +# Create a minimal test dataset +class TestDataset(Dataset): + def __init__(self, size=5): + self.size = size + + def __getitem__(self, index): + # Simple test data: features are [index, index*2], label is index % 2 + data = Tensor([index, index * 2]) + label = Tensor([index % 2]) + return data, label + + def __len__(self): + return self.size + + def get_num_classes(self): + return 2 + +# Test the interface +try: + test_dataset = TestDataset(size=5) + print(f"Dataset created with size: {len(test_dataset)}") + + # Test __getitem__ + data, label = test_dataset[0] + print(f"Sample 0: data={data}, label={label}") + assert isinstance(data, Tensor), "Data should be a Tensor" + assert isinstance(label, Tensor), "Label should be a Tensor" + print("✅ Dataset __getitem__ works correctly") + + # Test __len__ + assert len(test_dataset) == 5, f"Dataset length should be 5, got {len(test_dataset)}" + print("✅ Dataset __len__ works correctly") + + # Test get_num_classes + assert test_dataset.get_num_classes() == 2, f"Should have 2 classes, got {test_dataset.get_num_classes()}" + print("✅ Dataset get_num_classes works correctly") + + # Test multiple samples + for i in range(3): + data, label = test_dataset[i] + expected_data = [i, i * 2] + expected_label = [i % 2] + assert np.array_equal(data.data, expected_data), f"Data mismatch at index {i}" + assert np.array_equal(label.data, expected_label), f"Label mismatch at index {i}" + print("✅ Dataset produces correct data for multiple samples") + +except Exception as e: + print(f"❌ Dataset interface test failed: {e}") + raise + +# Show the dataset pattern +print("🎯 Dataset interface pattern:") +print(" __getitem__: Returns (data, label) tuple") +print(" __len__: Returns 
dataset size") +print(" get_num_classes: Returns number of classes") +print("📈 Progress: Dataset interface ✓") + +# %% [markdown] +""" +## Step 2: Building the DataLoader + +### What is a DataLoader? +A **DataLoader** efficiently batches and iterates through datasets. It's the bridge between individual samples and the batched data that neural networks expect. + +### Why DataLoaders Matter +- **Batching**: Groups samples for efficient GPU computation +- **Shuffling**: Randomizes data order to prevent overfitting +- **Memory efficiency**: Loads data on-demand rather than all at once +- **Iteration**: Provides clean interface for training loops + +### The DataLoader Pattern +``` +DataLoader(dataset, batch_size=32, shuffle=True) +for batch_data, batch_labels in dataloader: + # batch_data.shape: (32, ...) + # batch_labels.shape: (32,) + # Train on batch +``` + +### Real-World Applications +- **Training loops**: Feed batches to neural networks +- **Validation**: Evaluate models on held-out data +- **Inference**: Process large datasets efficiently +- **Data analysis**: Explore datasets systematically + +### Systems Thinking +- **Batch size**: Trade-off between memory and speed +- **Shuffling**: Prevents overfitting to data order +- **Iteration**: Efficient looping through data +- **Memory**: Manage large datasets that don't fit in RAM +""" + +# %% nbgrader={"grade": false, "grade_id": "dataloader-class", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +class DataLoader: + """ + DataLoader: Efficiently batch and iterate through datasets. + + Provides batching, shuffling, and efficient iteration over datasets. + Essential for training neural networks efficiently. + """ + + def __init__(self, dataset: Dataset, batch_size: int = 32, shuffle: bool = True): + """ + Initialize DataLoader. 
+ + Args: + dataset: Dataset to load from + batch_size: Number of samples per batch + shuffle: Whether to shuffle data each epoch + + TODO: Store configuration and dataset. + + APPROACH: + 1. Store dataset as self.dataset + 2. Store batch_size as self.batch_size + 3. Store shuffle as self.shuffle + + EXAMPLE: + DataLoader(dataset, batch_size=32, shuffle=True) + + HINTS: + - Store all parameters as instance variables + - These will be used in __iter__ for batching + """ + ### BEGIN SOLUTION + self.dataset = dataset + self.batch_size = batch_size + self.shuffle = shuffle + ### END SOLUTION + + def __iter__(self) -> Iterator[Tuple[Tensor, Tensor]]: + """ + Iterate through dataset in batches. + + Returns: + Iterator yielding (batch_data, batch_labels) tuples + + TODO: Implement batching and shuffling logic. + + APPROACH: + 1. Create indices list: list(range(len(dataset))) + 2. Shuffle indices if self.shuffle is True + 3. Loop through indices in batch_size chunks + 4. For each batch: collect samples, stack them, yield batch + + EXAMPLE: + for batch_data, batch_labels in dataloader: + # batch_data.shape: (batch_size, ...) 
+ # batch_labels.shape: (batch_size,) + + HINTS: + - Use list(range(len(self.dataset))) for indices + - Use np.random.shuffle() if self.shuffle is True + - Loop in chunks of self.batch_size + - Collect samples and stack with np.stack() + """ + ### BEGIN SOLUTION + # Create indices for all samples + indices = list(range(len(self.dataset))) + + # Shuffle if requested + if self.shuffle: + np.random.shuffle(indices) + + # Iterate through indices in batches + for i in range(0, len(indices), self.batch_size): + batch_indices = indices[i:i + self.batch_size] + + # Collect samples for this batch + batch_data = [] + batch_labels = [] + + for idx in batch_indices: + data, label = self.dataset[idx] + batch_data.append(data.data) + batch_labels.append(label.data) + + # Stack into batch tensors + batch_data_array = np.stack(batch_data, axis=0) + batch_labels_array = np.stack(batch_labels, axis=0) + + yield Tensor(batch_data_array), Tensor(batch_labels_array) + ### END SOLUTION + + def __len__(self) -> int: + """ + Get the number of batches per epoch. + + TODO: Calculate number of batches. + + APPROACH: + 1. Get dataset size: len(self.dataset) + 2. Divide by batch_size and round up + 3. Use ceiling division: (n + batch_size - 1) // batch_size + + EXAMPLE: + Dataset size 100, batch size 32 → 4 batches + + HINTS: + - Use len(self.dataset) for dataset size + - Use ceiling division for exact batch count + - Formula: (dataset_size + batch_size - 1) // batch_size + """ + ### BEGIN SOLUTION + # Calculate number of batches using ceiling division + dataset_size = len(self.dataset) + return (dataset_size + self.batch_size - 1) // self.batch_size + ### END SOLUTION + +# %% [markdown] +""" +### 🧪 Quick Test: DataLoader + +Let's test your DataLoader implementation! This is the heart of efficient data loading for neural networks. 
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-dataloader-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false} +# Test DataLoader immediately after implementation +print("🔬 Testing DataLoader...") + +# Use the test dataset from before +class TestDataset(Dataset): + def __init__(self, size=10): + self.size = size + + def __getitem__(self, index): + data = Tensor([index, index * 2]) + label = Tensor([index % 3]) # 3 classes + return data, label + + def __len__(self): + return self.size + + def get_num_classes(self): + return 3 + +# Test basic DataLoader functionality +try: + dataset = TestDataset(size=10) + dataloader = DataLoader(dataset, batch_size=3, shuffle=False) + + print(f"DataLoader created: batch_size={dataloader.batch_size}, shuffle={dataloader.shuffle}") + print(f"Number of batches: {len(dataloader)}") + + # Test __len__ + expected_batches = (10 + 3 - 1) // 3 # Ceiling division: 4 batches + assert len(dataloader) == expected_batches, f"Should have {expected_batches} batches, got {len(dataloader)}" + print("✅ DataLoader __len__ works correctly") + + # Test iteration + batch_count = 0 + total_samples = 0 + + for batch_data, batch_labels in dataloader: + batch_count += 1 + batch_size = batch_data.shape[0] + total_samples += batch_size + + print(f"Batch {batch_count}: data shape {batch_data.shape}, labels shape {batch_labels.shape}") + + # Verify batch dimensions + assert len(batch_data.shape) == 2, f"Batch data should be 2D, got {batch_data.shape}" + assert len(batch_labels.shape) == 2, f"Batch labels should be 2D, got {batch_labels.shape}" + assert batch_data.shape[1] == 2, f"Each sample should have 2 features, got {batch_data.shape[1]}" + assert batch_labels.shape[1] == 1, f"Each label should have 1 element, got {batch_labels.shape[1]}" + + assert batch_count == expected_batches, f"Should iterate {expected_batches} times, got {batch_count}" + assert total_samples == 10, f"Should process 10 total samples, 
got {total_samples}" + print("✅ DataLoader iteration works correctly") + +except Exception as e: + print(f"❌ DataLoader test failed: {e}") + raise + +# Test shuffling +try: + dataloader_shuffle = DataLoader(dataset, batch_size=5, shuffle=True) + dataloader_no_shuffle = DataLoader(dataset, batch_size=5, shuffle=False) + + # Get first batch from each + batch1_shuffle = next(iter(dataloader_shuffle)) + batch1_no_shuffle = next(iter(dataloader_no_shuffle)) + + print("✅ DataLoader shuffling parameter works") + +except Exception as e: + print(f"❌ DataLoader shuffling test failed: {e}") + raise + +# Test different batch sizes +try: + small_loader = DataLoader(dataset, batch_size=2, shuffle=False) + large_loader = DataLoader(dataset, batch_size=8, shuffle=False) + + assert len(small_loader) == 5, f"Small loader should have 5 batches, got {len(small_loader)}" + assert len(large_loader) == 2, f"Large loader should have 2 batches, got {len(large_loader)}" + print("✅ DataLoader handles different batch sizes correctly") + +except Exception as e: + print(f"❌ DataLoader batch size test failed: {e}") + raise + +# Show the DataLoader behavior +print("🎯 DataLoader behavior:") +print(" Batches data for efficient processing") +print(" Handles shuffling and iteration") +print(" Provides clean interface for training loops") +print("📈 Progress: Dataset interface ✓, DataLoader ✓") + +# %% [markdown] +""" +## Step 3: Creating a Simple Dataset Example + +### Why We Need Concrete Examples +Abstract classes are great for interfaces, but we need concrete implementations to understand how they work. Let's create a simple dataset for testing. 
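The two pieces of DataLoader mechanics described above — ceiling division for the batch count, and chunked (optionally shuffled) index iteration — can be sketched independently of the class. This is a standalone re-implementation of the logic described in Step 2, not the module's exact code; the seeded `Generator` is an assumption added to keep the example reproducible:

```python
import numpy as np

def num_batches(dataset_size, batch_size):
    # Ceiling division: the last batch may be smaller than batch_size.
    return (dataset_size + batch_size - 1) // batch_size

def iter_index_batches(dataset_size, batch_size, shuffle=False, seed=0):
    indices = list(range(dataset_size))
    if shuffle:
        np.random.default_rng(seed).shuffle(indices)
    # Step through the index list in batch_size chunks.
    for start in range(0, dataset_size, batch_size):
        yield indices[start:start + batch_size]

print(num_batches(100, 32))  # 4  (3 full batches of 32, plus one of 4)
print(list(iter_index_batches(10, 3)))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```

Note that shuffling only reorders the indices; every sample still appears exactly once per epoch, which is why the sample-count assertions in the tests above hold regardless of the `shuffle` flag.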
+ +### Design Principles +- **Simple**: Easy to understand and debug +- **Configurable**: Adjustable size and properties +- **Predictable**: Deterministic data for testing +- **Educational**: Shows the Dataset pattern clearly +""" + +# %% nbgrader={"grade": false, "grade_id": "simple-dataset", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +class SimpleDataset(Dataset): + """ + Simple dataset for testing and demonstration. + + Generates synthetic data with configurable size and properties. + Perfect for understanding the Dataset pattern. + """ + + def __init__(self, size: int = 100, num_features: int = 4, num_classes: int = 3): + """ + Initialize SimpleDataset. + + Args: + size: Number of samples in the dataset + num_features: Number of features per sample + num_classes: Number of classes + + TODO: Initialize the dataset with synthetic data. + + APPROACH: + 1. Store the configuration parameters + 2. Generate synthetic data and labels + 3. Make data deterministic for testing + + EXAMPLE: + SimpleDataset(size=100, num_features=4, num_classes=3) + creates 100 samples with 4 features each, 3 classes + + HINTS: + - Store size, num_features, num_classes as instance variables + - Use np.random.seed() for reproducible data + - Generate random data with np.random.randn() + - Generate random labels with np.random.randint() + """ + ### BEGIN SOLUTION + self.size = size + self.num_features = num_features + self.num_classes = num_classes + + # Set seed for reproducible data + np.random.seed(42) + + # Generate synthetic data + self.data = np.random.randn(size, num_features).astype(np.float32) + self.labels = np.random.randint(0, num_classes, size=size) + ### END SOLUTION + + def __getitem__(self, index: int) -> Tuple[Tensor, Tensor]: + """ + Get a single sample and label by index. + + Args: + index: Index of the sample to retrieve + + Returns: + Tuple of (data, label) tensors + + TODO: Return the sample and label at the given index. 
+ + APPROACH: + 1. Get data at index from self.data + 2. Get label at index from self.labels + 3. Convert to tensors and return as tuple + + EXAMPLE: + dataset[0] returns (Tensor([1.2, -0.5, 0.8, 0.1]), Tensor(2)) + + HINTS: + - Use self.data[index] and self.labels[index] + - Convert to Tensor objects + - Return as tuple (data, label) + """ + ### BEGIN SOLUTION + data = Tensor(self.data[index]) + label = Tensor(self.labels[index]) + return data, label + ### END SOLUTION + + def __len__(self) -> int: + """ + Get the total number of samples in the dataset. + + TODO: Return the dataset size. + + HINTS: + - Return self.size + """ + ### BEGIN SOLUTION + return self.size + ### END SOLUTION + + def get_num_classes(self) -> int: + """ + Get the number of classes in the dataset. + + TODO: Return the number of classes. + + HINTS: + - Return self.num_classes + """ + ### BEGIN SOLUTION + return self.num_classes + ### END SOLUTION + +# %% [markdown] +""" +## 🧪 Comprehensive DataLoader Testing Suite + +Let's test all data loading components thoroughly with realistic ML data scenarios! 
+""" + +# %% nbgrader={"grade": false, "grade_id": "test-dataloader-comprehensive", "locked": false, "schema_version": 3, "solution": false, "task": false} +def test_dataset_interface(): + """Test 1: Dataset interface comprehensive testing""" + print("🔬 Testing Dataset Interface...") + + # Test 1.1: Abstract base class behavior + try: + # Test that we can't instantiate abstract Dataset + try: + base_dataset = Dataset() + base_dataset[0] # Should raise NotImplementedError + assert False, "Should not be able to call abstract methods" + except NotImplementedError: + print("✅ Abstract Dataset correctly raises NotImplementedError") + except Exception as e: + print(f"❌ Abstract Dataset test failed: {e}") + return False + + # Test 1.2: SimpleDataset implementation + try: + dataset = SimpleDataset(size=50, num_features=4, num_classes=3) + + # Test basic properties + assert len(dataset) == 50, f"Dataset length should be 50, got {len(dataset)}" + assert dataset.get_num_classes() == 3, f"Should have 3 classes, got {dataset.get_num_classes()}" + + # Test sample retrieval + data, label = dataset[0] + assert isinstance(data, Tensor), "Data should be a Tensor" + assert isinstance(label, Tensor), "Label should be a Tensor" + assert data.shape == (4,), f"Data shape should be (4,), got {data.shape}" + + # Test sample shape method + sample_shape = dataset.get_sample_shape() + assert sample_shape == (4,), f"Sample shape should be (4,), got {sample_shape}" + + print("✅ SimpleDataset implementation test passed") + except Exception as e: + print(f"❌ SimpleDataset implementation failed: {e}") + return False + + # Test 1.3: Different dataset configurations + try: + # Small dataset + small_dataset = SimpleDataset(size=5, num_features=2, num_classes=2) + assert len(small_dataset) == 5, "Small dataset length wrong" + assert small_dataset.get_num_classes() == 2, "Small dataset classes wrong" + + # Large dataset + large_dataset = SimpleDataset(size=1000, num_features=10, num_classes=5) + assert 
len(large_dataset) == 1000, "Large dataset length wrong" + assert large_dataset.get_num_classes() == 5, "Large dataset classes wrong" + + # Test data consistency (seeded random) + data1, _ = small_dataset[0] + data2, _ = small_dataset[0] + assert np.allclose(data1.data, data2.data), "Dataset should be deterministic" + + print("✅ Different dataset configurations test passed") + except Exception as e: + print(f"❌ Different dataset configurations failed: {e}") + return False + + # Test 1.4: Edge cases and robustness + try: + # Test edge case: single sample + single_dataset = SimpleDataset(size=1, num_features=1, num_classes=1) + data, label = single_dataset[0] + assert data.shape == (1,), "Single sample data shape wrong" + assert isinstance(label.data, (int, np.integer)) or label.data.shape == (), "Single sample label wrong" + + # Test boundary indices + dataset = SimpleDataset(size=10, num_features=3, num_classes=2) + first_data, first_label = dataset[0] + last_data, last_label = dataset[9] + assert first_data.shape == (3,), "First sample shape wrong" + assert last_data.shape == (3,), "Last sample shape wrong" + + print("✅ Edge cases and robustness test passed") + except Exception as e: + print(f"❌ Edge cases and robustness failed: {e}") + return False + + print("🎯 Dataset interface: All tests passed!") + return True + +def test_dataloader_functionality(): + """Test 2: DataLoader functionality comprehensive testing""" + print("🔬 Testing DataLoader Functionality...") + + # Test 2.1: Basic DataLoader operations + try: + dataset = SimpleDataset(size=32, num_features=4, num_classes=2) + dataloader = DataLoader(dataset, batch_size=8, shuffle=False) + + # Test initialization + assert dataloader.batch_size == 8, f"Batch size should be 8, got {dataloader.batch_size}" + assert dataloader.shuffle == False, f"Shuffle should be False, got {dataloader.shuffle}" + + # Test length calculation + expected_batches = (32 + 8 - 1) // 8 # Ceiling division: 4 batches + assert 
len(dataloader) == expected_batches, f"Should have {expected_batches} batches, got {len(dataloader)}" + + print("✅ Basic DataLoader operations test passed") + except Exception as e: + print(f"❌ Basic DataLoader operations failed: {e}") + return False + + # Test 2.2: Batch iteration and shapes + try: + dataset = SimpleDataset(size=25, num_features=3, num_classes=2) + dataloader = DataLoader(dataset, batch_size=10, shuffle=False) + + batch_count = 0 + total_samples = 0 + + for batch_data, batch_labels in dataloader: + batch_count += 1 + batch_size = batch_data.shape[0] + total_samples += batch_size + + # Check batch shapes + assert len(batch_data.shape) == 2, f"Batch data should be 2D, got {batch_data.shape}" + assert batch_data.shape[1] == 3, f"Should have 3 features, got {batch_data.shape[1]}" + assert batch_labels.shape[0] == batch_size, f"Labels should match batch size" + + # Check data types + assert isinstance(batch_data, Tensor), "Batch data should be Tensor" + assert isinstance(batch_labels, Tensor), "Batch labels should be Tensor" + + # Verify complete iteration + assert total_samples == 25, f"Should process 25 samples, got {total_samples}" + assert batch_count == 3, f"Should have 3 batches, got {batch_count}" # 25/10 = 3 batches + + print("✅ Batch iteration and shapes test passed") + except Exception as e: + print(f"❌ Batch iteration and shapes failed: {e}") + return False + + # Test 2.3: Different batch sizes + try: + dataset = SimpleDataset(size=100, num_features=5, num_classes=3) + + # Small batches + small_loader = DataLoader(dataset, batch_size=7, shuffle=False) + assert len(small_loader) == 15, f"Small loader should have 15 batches, got {len(small_loader)}" # 100/7 = 15 + + # Large batches + large_loader = DataLoader(dataset, batch_size=30, shuffle=False) + assert len(large_loader) == 4, f"Large loader should have 4 batches, got {len(large_loader)}" # 100/30 = 4 + + # Single sample batches + single_loader = DataLoader(dataset, batch_size=1, 
shuffle=False) + assert len(single_loader) == 100, f"Single loader should have 100 batches, got {len(single_loader)}" + + print("✅ Different batch sizes test passed") + except Exception as e: + print(f"❌ Different batch sizes failed: {e}") + return False + + # Test 2.4: Shuffling behavior + try: + dataset = SimpleDataset(size=20, num_features=2, num_classes=2) + + # Test with shuffling + loader_shuffle = DataLoader(dataset, batch_size=5, shuffle=True) + loader_no_shuffle = DataLoader(dataset, batch_size=5, shuffle=False) + + # Get multiple batches to test shuffling + shuffle_batches = list(loader_shuffle) + no_shuffle_batches = list(loader_no_shuffle) + + assert len(shuffle_batches) == len(no_shuffle_batches), "Should have same number of batches" + + # Test that all original samples are present (just reordered) + shuffle_all_data = np.concatenate([batch[0].data for batch in shuffle_batches]) + no_shuffle_all_data = np.concatenate([batch[0].data for batch in no_shuffle_batches]) + + assert shuffle_all_data.shape == no_shuffle_all_data.shape, "Should have same total data shape" + + print("✅ Shuffling behavior test passed") + except Exception as e: + print(f"❌ Shuffling behavior failed: {e}") + return False + + print("🎯 DataLoader functionality: All tests passed!") + return True + +def test_data_pipeline_scenarios(): + """Test 3: Real-world data pipeline scenarios""" + print("🔬 Testing Data Pipeline Scenarios...") + + # Test 3.1: Image classification scenario + try: + # Simulate CIFAR-10 like dataset: 32x32 RGB images, 10 classes + image_dataset = SimpleDataset(size=1000, num_features=32*32*3, num_classes=10) + image_loader = DataLoader(image_dataset, batch_size=64, shuffle=True) + + # Test one epoch of training + epoch_samples = 0 + for batch_data, batch_labels in image_loader: + epoch_samples += batch_data.shape[0] + + # Verify image batch properties + assert batch_data.shape[1] == 32*32*3, f"Should have 3072 features (32x32x3), got {batch_data.shape[1]}" + assert 
batch_data.shape[0] <= 64, f"Batch size should be <= 64, got {batch_data.shape[0]}" + + # Simulate forward pass + batch_size = batch_data.shape[0] + assert batch_labels.shape[0] == batch_size, "Labels should match batch size" + + assert epoch_samples == 1000, f"Should process 1000 samples, got {epoch_samples}" + print("✅ Image classification scenario test passed") + except Exception as e: + print(f"❌ Image classification scenario failed: {e}") + return False + + # Test 3.2: Text classification scenario + try: + # Simulate text classification: 512 token embeddings, 5 sentiment classes + text_dataset = SimpleDataset(size=500, num_features=512, num_classes=5) + text_loader = DataLoader(text_dataset, batch_size=32, shuffle=True) + + # Test batch processing + for batch_data, batch_labels in text_loader: + # Verify text batch properties + assert batch_data.shape[1] == 512, f"Should have 512 features, got {batch_data.shape[1]}" + + # Simulate text processing + batch_size = batch_data.shape[0] + assert batch_size <= 32, f"Batch size should be <= 32, got {batch_size}" + break # Just test first batch + + print("✅ Text classification scenario test passed") + except Exception as e: + print(f"❌ Text classification scenario failed: {e}") + return False + + # Test 3.3: Tabular data scenario + try: + # Simulate tabular data: house prices with 20 features, 3 price ranges + tabular_dataset = SimpleDataset(size=200, num_features=20, num_classes=3) + tabular_loader = DataLoader(tabular_dataset, batch_size=16, shuffle=False) + + # Test systematic processing (no shuffling for tabular data) + batch_count = 0 + for batch_data, batch_labels in tabular_loader: + batch_count += 1 + + # Verify tabular batch properties + assert batch_data.shape[1] == 20, f"Should have 20 features, got {batch_data.shape[1]}" + + # Simulate tabular processing + batch_size = batch_data.shape[0] + assert batch_size <= 16, f"Batch size should be <= 16, got {batch_size}" + + expected_batches = (200 + 16 - 1) // 16 # 
13 batches + assert batch_count == expected_batches, f"Should have {expected_batches} batches, got {batch_count}" + + print("✅ Tabular data scenario test passed") + except Exception as e: + print(f"❌ Tabular data scenario failed: {e}") + return False + + # Test 3.4: Small dataset scenario + try: + # Simulate small research dataset + small_dataset = SimpleDataset(size=50, num_features=10, num_classes=2) + small_loader = DataLoader(small_dataset, batch_size=8, shuffle=True) + + # Test multiple epochs + for epoch in range(3): + epoch_samples = 0 + for batch_data, batch_labels in small_loader: + epoch_samples += batch_data.shape[0] + + # Verify small dataset properties + assert batch_data.shape[1] == 10, f"Should have 10 features, got {batch_data.shape[1]}" + + assert epoch_samples == 50, f"Epoch {epoch}: should process 50 samples, got {epoch_samples}" + + print("✅ Small dataset scenario test passed") + except Exception as e: + print(f"❌ Small dataset scenario failed: {e}") + return False + + print("🎯 Data pipeline scenarios: All tests passed!") + return True + +def test_integration_with_ml_workflow(): + """Test 4: Integration with ML workflow""" + print("🔬 Testing Integration with ML Workflow...") + + # Test 4.1: Training loop integration + try: + # Create dataset for training + train_dataset = SimpleDataset(size=100, num_features=8, num_classes=3) + train_loader = DataLoader(train_dataset, batch_size=20, shuffle=True) + + # Simulate training loop + for epoch in range(2): + epoch_loss = 0 + batch_count = 0 + + for batch_data, batch_labels in train_loader: + batch_count += 1 + + # Simulate forward pass + batch_size = batch_data.shape[0] + assert batch_data.shape == (batch_size, 8), f"Batch data shape wrong: {batch_data.shape}" + assert batch_labels.shape[0] == batch_size, f"Batch labels shape wrong: {batch_labels.shape}" + + # Simulate loss computation + mock_loss = np.random.random() + epoch_loss += mock_loss + + # Verify we can iterate through all batches + assert 
batch_count <= 5, f"Too many batches: {batch_count}" # 100/20 = 5 + + assert batch_count == 5, f"Should have 5 batches per epoch, got {batch_count}" + + print("✅ Training loop integration test passed") + except Exception as e: + print(f"❌ Training loop integration failed: {e}") + return False + + # Test 4.2: Validation loop integration + try: + # Create dataset for validation + val_dataset = SimpleDataset(size=50, num_features=8, num_classes=3) + val_loader = DataLoader(val_dataset, batch_size=10, shuffle=False) # No shuffle for validation + + # Simulate validation loop + total_correct = 0 + total_samples = 0 + + for batch_data, batch_labels in val_loader: + batch_size = batch_data.shape[0] + total_samples += batch_size + + # Simulate prediction + mock_predictions = np.random.randint(0, 3, size=batch_size) + mock_correct = np.random.randint(0, batch_size + 1) + total_correct += mock_correct + + # Verify batch properties + assert batch_data.shape[1] == 8, f"Features should be 8, got {batch_data.shape[1]}" + assert batch_labels.shape[0] == batch_size, f"Labels should match batch size" + + assert total_samples == 50, f"Should validate 50 samples, got {total_samples}" + + print("✅ Validation loop integration test passed") + except Exception as e: + print(f"❌ Validation loop integration failed: {e}") + return False + + # Test 4.3: Model inference integration + try: + # Create dataset for inference + test_dataset = SimpleDataset(size=30, num_features=5, num_classes=2) + test_loader = DataLoader(test_dataset, batch_size=5, shuffle=False) + + # Simulate inference + all_predictions = [] + + for batch_data, batch_labels in test_loader: + batch_size = batch_data.shape[0] + + # Simulate model inference + mock_predictions = np.random.random((batch_size, 2)) # 2 classes + all_predictions.append(mock_predictions) + + # Verify inference batch properties + assert batch_data.shape[1] == 5, f"Features should be 5, got {batch_data.shape[1]}" + assert batch_size <= 5, f"Batch size 
should be <= 5, got {batch_size}" + + # Verify all predictions collected + total_predictions = np.concatenate(all_predictions, axis=0) + assert total_predictions.shape == (30, 2), f"Predictions shape should be (30, 2), got {total_predictions.shape}" + + print("✅ Model inference integration test passed") + except Exception as e: + print(f"❌ Model inference integration failed: {e}") + return False + + # Test 4.4: Cross-validation scenario + try: + # Create dataset for cross-validation + full_dataset = SimpleDataset(size=100, num_features=6, num_classes=4) + + # Simulate 5-fold cross-validation + fold_size = 20 + + for fold in range(5): + # Simulate a fold split (fresh datasets stand in for a real index split of full_dataset) + train_size = len(full_dataset) - fold_size # 4 folds for training + val_size = fold_size # 1 fold for validation + + train_dataset = SimpleDataset(size=train_size, num_features=6, num_classes=4) + val_dataset = SimpleDataset(size=val_size, num_features=6, num_classes=4) + + train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True) + val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False) + + # Verify fold setup + assert len(train_dataset) == train_size, f"Train size wrong for fold {fold}" + assert len(val_dataset) == val_size, f"Val size wrong for fold {fold}" + + # Test one iteration of each + train_batch = next(iter(train_loader)) + val_batch = next(iter(val_loader)) + + assert train_batch[0].shape[1] == 6, f"Train features wrong for fold {fold}" + assert val_batch[0].shape[1] == 6, f"Val features wrong for fold {fold}" + + print("✅ Cross-validation scenario test passed") + except Exception as e: + print(f"❌ Cross-validation scenario failed: {e}") + return False + + print("🎯 ML workflow integration: All tests passed!") + return True + +# Run all comprehensive tests +def run_comprehensive_dataloader_tests(): + """Run all comprehensive DataLoader tests""" + print("🧪 Running Comprehensive DataLoader Test Suite...") + print("=" * 60) + + test_results = [] + + # Run all test functions + 
test_results.append(test_dataset_interface()) + test_results.append(test_dataloader_functionality()) + test_results.append(test_data_pipeline_scenarios()) + test_results.append(test_integration_with_ml_workflow()) + + # Summary + print("=" * 60) + print("📊 Test Results Summary:") + print(f"{'✅' if test_results[0] else '❌'} Dataset Interface: {'PASSED' if test_results[0] else 'FAILED'}") + print(f"{'✅' if test_results[1] else '❌'} DataLoader Functionality: {'PASSED' if test_results[1] else 'FAILED'}") + print(f"{'✅' if test_results[2] else '❌'} Data Pipeline Scenarios: {'PASSED' if test_results[2] else 'FAILED'}") + print(f"{'✅' if test_results[3] else '❌'} ML Workflow Integration: {'PASSED' if test_results[3] else 'FAILED'}") + + all_passed = all(test_results) + print(f"\n🎯 Overall Result: {'ALL TESTS PASSED! 🎉' if all_passed else 'SOME TESTS FAILED ❌'}") + + if all_passed: + print("\n🚀 DataLoader Module Implementation Complete!") + print(" ✓ Dataset interface working correctly") + print(" ✓ DataLoader batching and iteration functional") + print(" ✓ Real-world data pipeline scenarios tested") + print(" ✓ ML workflow integration verified") + print("\n🎓 Ready for production ML data pipelines!") + + return all_passed + +# Run the comprehensive test suite +if __name__ == "__main__": + run_comprehensive_dataloader_tests() + +# %% [markdown] +""" +### 🧪 Test Your Data Loading Implementations + +Once you implement the classes above, run these cells to test them: +""" + +# %% nbgrader={"grade": true, "grade_id": "test-dataset", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +# Test Dataset abstract class +print("Testing Dataset abstract class...") + +# Create a simple dataset +dataset = SimpleDataset(size=10, num_features=3, num_classes=2) + +# Test basic functionality +assert len(dataset) == 10, f"Dataset length should be 10, got {len(dataset)}" +assert dataset.get_num_classes() == 2, f"Number of classes should be 2, got {dataset.get_num_classes()}" + +# Test sample retrieval +data, label = dataset[0] +assert isinstance(data, Tensor), "Data should be a Tensor"
+assert isinstance(label, Tensor), "Label should be a Tensor" +assert data.shape == (3,), f"Data shape should be (3,), got {data.shape}" +assert label.shape == (), f"Label shape should be (), got {label.shape}" + +# Test sample shape +sample_shape = dataset.get_sample_shape() +assert sample_shape == (3,), f"Sample shape should be (3,), got {sample_shape}" + +print("✅ Dataset tests passed!") + +# %% nbgrader={"grade": true, "grade_id": "test-dataloader", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +# Test DataLoader +print("Testing DataLoader...") + +# Create dataset and dataloader +dataset = SimpleDataset(size=50, num_features=4, num_classes=3) +dataloader = DataLoader(dataset, batch_size=8, shuffle=True) + +# Test dataloader length +expected_batches = (50 + 8 - 1) // 8 # Ceiling division +assert len(dataloader) == expected_batches, f"DataLoader length should be {expected_batches}, got {len(dataloader)}" + +# Test batch iteration +batch_count = 0 +total_samples = 0 + +for batch_data, batch_labels in dataloader: + batch_count += 1 + batch_size = batch_data.shape[0] + total_samples += batch_size + + # Check batch shapes + assert batch_data.shape[1] == 4, f"Batch data should have 4 features, got {batch_data.shape[1]}" + assert batch_labels.shape[0] == batch_size, f"Batch labels should match batch size, got {batch_labels.shape[0]}" + + # Check that we don't exceed expected batches + assert batch_count <= expected_batches, f"Too many batches: {batch_count} > {expected_batches}" + +# Verify we processed all samples +assert total_samples == 50, f"Should process 50 samples total, got {total_samples}" +assert batch_count == expected_batches, f"Should have {expected_batches} batches, got {batch_count}" + +print("✅ DataLoader tests passed!") + +# %% nbgrader={"grade": true, "grade_id": "test-dataloader-shuffle", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +# Test DataLoader shuffling 
+print("Testing DataLoader shuffling...") + +# Create dataset +dataset = SimpleDataset(size=20, num_features=2, num_classes=2) + +# Test with shuffling +dataloader_shuffle = DataLoader(dataset, batch_size=5, shuffle=True) +dataloader_no_shuffle = DataLoader(dataset, batch_size=5, shuffle=False) + +# Get first batch from each +batch_shuffle = next(iter(dataloader_shuffle)) +batch_no_shuffle = next(iter(dataloader_no_shuffle)) + +# With shuffling the first batches will usually differ between the two loaders, but +# comparing their contents would make this test flaky, so we only verify shapes here +shuffle_data = batch_shuffle[0].data +no_shuffle_data = batch_no_shuffle[0].data + +# Check that shapes are correct +assert shuffle_data.shape == (5, 2), f"Shuffled batch shape should be (5, 2), got {shuffle_data.shape}" +assert no_shuffle_data.shape == (5, 2), f"No-shuffle batch shape should be (5, 2), got {no_shuffle_data.shape}" + +print("✅ DataLoader shuffling tests passed!") + +# %% nbgrader={"grade": true, "grade_id": "test-integration", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +# Test complete data pipeline integration +print("Testing complete data pipeline integration...") + +# Create a larger dataset +dataset = SimpleDataset(size=100, num_features=8, num_classes=5) +dataloader = DataLoader(dataset, batch_size=16, shuffle=True) + +# Simulate training loop +epoch_samples = 0 +epoch_batches = 0 + +for batch_data, batch_labels in dataloader: + epoch_batches += 1 + epoch_samples += batch_data.shape[0] + + # Verify batch properties + assert batch_data.shape[1] == 8, f"Features should be 8, got {batch_data.shape[1]}" + assert len(batch_labels.shape) == 1, f"Labels should be 1D, got shape {batch_labels.shape}" + + # Verify data types + assert isinstance(batch_data, Tensor), "Batch data should be Tensor" + assert isinstance(batch_labels, Tensor), "Batch labels should be Tensor" + +# Verify we processed all data +assert epoch_samples == 100, f"Should process 100 samples, got 
{epoch_samples}" +expected_batches = (100 + 16 - 1) // 16 +assert epoch_batches == expected_batches, f"Should have {expected_batches} batches, got {epoch_batches}" + +print("✅ Complete data pipeline integration tests passed!") + +# %% [markdown] +""" +## 🎯 Module Summary + +Congratulations! You've successfully implemented the core components of data loading systems: + +### What You've Accomplished +✅ **Dataset Abstract Class**: The foundation interface for all data loading +✅ **DataLoader Implementation**: Efficient batching and iteration over datasets +✅ **SimpleDataset Example**: Concrete implementation showing the Dataset pattern +✅ **Complete Data Pipeline**: End-to-end data loading for neural network training +✅ **Systems Thinking**: Understanding memory efficiency, batching, and I/O optimization + +### Key Concepts You've Learned +- **Dataset pattern**: Abstract interface for consistent data access +- **DataLoader pattern**: Efficient batching and iteration for training +- **Memory efficiency**: Loading data on-demand rather than all at once +- **Batching strategies**: Grouping samples for efficient GPU computation +- **Shuffling**: Randomizing sample order each epoch so training doesn't depend on how the data happens to be stored + +### Mathematical Foundations +- **Batch processing**: Vectorized operations on multiple samples +- **Memory management**: Handling datasets larger than available RAM +- **I/O optimization**: Minimizing disk reads and memory allocation +- **Stochastic sampling**: Random shuffling for better generalization + +### Real-World Applications +- **Computer vision**: Loading image datasets like CIFAR-10, ImageNet +- **Natural language processing**: Loading text datasets with tokenization +- **Tabular data**: Loading CSV files and database records +- **Audio processing**: Loading and preprocessing audio files +- **Time series**: Loading sequential data with proper windowing + +### Connection to Production Systems +- **PyTorch**: Your Dataset and DataLoader mirror `torch.utils.data` +- 
**TensorFlow**: Similar concepts in `tf.data.Dataset` +- **JAX**: Custom data loading with efficient batching +- **MLOps**: Data pipelines are critical for production ML systems + +### Next Steps +1. **Export your code**: `tito package nbdev --export 06_dataloader` +2. **Test your implementation**: `tito module test 06_dataloader` +3. **Use your data loading**: + ```python + from tinytorch.core.dataloader import Dataset, DataLoader, SimpleDataset + + # Create dataset and dataloader + dataset = SimpleDataset(size=1000, num_features=10, num_classes=3) + dataloader = DataLoader(dataset, batch_size=32, shuffle=True) + + # Training loop + for batch_data, batch_labels in dataloader: + # Train your network on batch_data, batch_labels + pass + ``` +4. **Build real datasets**: Extend Dataset for your specific data types +5. **Optimize performance**: Add caching, parallel loading, and preprocessing + +**Ready for the next challenge?** You now have all the core components to build complete machine learning systems: tensors, activations, layers, networks, and data loading. The next modules will focus on training (autograd, optimizers) and advanced topics! +""" \ No newline at end of file diff --git a/modules/source/06_dataloader/tests/generate_test_dataloader.py b/modules/source/06_dataloader/tests/generate_test_dataloader.py deleted file mode 100644 index b9ced235..00000000 --- a/modules/source/06_dataloader/tests/generate_test_dataloader.py +++ /dev/null @@ -1,80 +0,0 @@ -#!/usr/bin/env python3 -""" -Generate small test data for data module testing. - -This creates a small mock dataset that mimics CIFAR-10 structure but is tiny -and doesn't require downloading anything. 
-""" - -import numpy as np -import pickle -import os -from pathlib import Path - -def generate_test_cifar10_data(): - """Generate small test data that mimics CIFAR-10 structure.""" - - # CIFAR-10 class names - class_names = [ - 'airplane', 'automobile', 'bird', 'cat', 'deer', - 'dog', 'frog', 'horse', 'ship', 'truck' - ] - - # Create small test dataset - train_size = 50 # Small training set - test_size = 20 # Small test set - - # Generate random image data (3x32x32, values 0-255) - train_data = np.random.randint(0, 256, size=(train_size, 3, 32, 32), dtype=np.uint8) - train_labels = np.random.randint(0, 10, size=(train_size,), dtype=np.uint8) - - test_data = np.random.randint(0, 256, size=(test_size, 3, 32, 32), dtype=np.uint8) - test_labels = np.random.randint(0, 10, size=(test_size,), dtype=np.uint8) - - # Create the data directory - data_dir = Path(__file__).parent / "test_data" - data_dir.mkdir(exist_ok=True) - - # Save training data (mimics CIFAR-10 format) - train_dict = { - b'data': train_data.reshape(train_size, -1), # Flatten to (N, 3072) - b'labels': train_labels.tolist(), - b'batch_label': b'training batch 1 of 1', - b'filenames': [f'train_image_{i}.png'.encode() for i in range(train_size)] - } - - with open(data_dir / "data_batch_1", "wb") as f: - pickle.dump(train_dict, f) - - # Save test data - test_dict = { - b'data': test_data.reshape(test_size, -1), # Flatten to (N, 3072) - b'labels': test_labels.tolist(), - b'batch_label': b'testing batch 1 of 1', - b'filenames': [f'test_image_{i}.png'.encode() for i in range(test_size)] - } - - with open(data_dir / "test_batch", "wb") as f: - pickle.dump(test_dict, f) - - # Save metadata - meta_dict = { - b'label_names': [name.encode() for name in class_names], - b'num_cases_per_batch': [train_size], - b'num_vis': 3072 # 32*32*3 - } - - with open(data_dir / "batches.meta", "wb") as f: - pickle.dump(meta_dict, f) - - print(f"✅ Generated test data:") - print(f" - Training samples: {train_size}") - print(f" - Test 
samples: {test_size}") - print(f" - Image shape: (3, 32, 32)") - print(f" - Classes: {len(class_names)}") - print(f" - Saved to: {data_dir}") - - return data_dir - -if __name__ == "__main__": - generate_test_cifar10_data() \ No newline at end of file diff --git a/modules/source/06_dataloader/tests/test_data/batches.meta b/modules/source/06_dataloader/tests/test_data/batches.meta deleted file mode 100644 index 9b3ddceb..00000000 Binary files a/modules/source/06_dataloader/tests/test_data/batches.meta and /dev/null differ diff --git a/modules/source/06_dataloader/tests/test_data/data_batch_1 b/modules/source/06_dataloader/tests/test_data/data_batch_1 deleted file mode 100644 index 3e2c33ed..00000000 Binary files a/modules/source/06_dataloader/tests/test_data/data_batch_1 and /dev/null differ diff --git a/modules/source/06_dataloader/tests/test_data/test_batch b/modules/source/06_dataloader/tests/test_data/test_batch deleted file mode 100644 index 7d3e1546..00000000 Binary files a/modules/source/06_dataloader/tests/test_data/test_batch and /dev/null differ diff --git a/modules/source/06_dataloader/tests/test_dataloader.py b/modules/source/06_dataloader/tests/test_dataloader.py deleted file mode 100644 index f6064744..00000000 --- a/modules/source/06_dataloader/tests/test_dataloader.py +++ /dev/null @@ -1,460 +0,0 @@ -""" -Test suite for the dataloader module. -This tests the student implementations to ensure they work correctly. 
-""" - -import pytest -import numpy as np -import sys -import os -import tempfile -import shutil -import pickle -from pathlib import Path -from unittest.mock import patch, MagicMock - -# Import from the main package (rock solid foundation) -try: - from tinytorch.core.dataloader import Dataset, DataLoader, SimpleDataset - # These may not be implemented yet - use fallback - try: - from tinytorch.core.dataloader import CIFAR10Dataset, Normalizer, create_data_pipeline - except ImportError: - # Create mock classes for missing functionality - class CIFAR10Dataset: - """Mock implementation for testing""" - def __init__(self, *args, **kwargs): - pass - def __len__(self): - return 100 - def __getitem__(self, idx): - return ([0.5] * 32 * 32 * 3, 1) - - class Normalizer: - """Mock implementation for testing""" - def __init__(self, *args, **kwargs): - pass - def __call__(self, x): - return x - - def create_data_pipeline(*args, **kwargs): - """Mock implementation for testing""" - return SimpleDataset([([0.5] * 10, 1)] * 100) - -except ImportError: - # Fallback for when module isn't exported yet - project_root = Path(__file__).parent.parent.parent - sys.path.append(str(project_root / "modules" / "source" / "06_dataloader")) - from dataloader_dev import Dataset, DataLoader, CIFAR10Dataset, Normalizer, create_data_pipeline - -from tinytorch.core.tensor import Tensor - -def safe_numpy(tensor): - """Get numpy array from tensor, using .data attribute""" - return tensor.data - -def safe_item(tensor): - """Get scalar value from tensor""" - return float(tensor.data) - -class TestCIFAR10Dataset(Dataset): - """Test dataset that uses local test data instead of downloading CIFAR-10.""" - - def __init__(self, root_dir: str, train: bool = True, download: bool = True): - """Initialize with local test data.""" - self.root_dir = root_dir - self.train = train - self.download = download - - # Use local test data - test_data_dir = Path(__file__).parent / "test_data" - if not test_data_dir.exists(): 
- raise FileNotFoundError(f"Test data not found at {test_data_dir}") - - self._load_test_data(test_data_dir) - - def _load_test_data(self, data_dir): - """Load the small test dataset.""" - # Load metadata - with open(data_dir / "batches.meta", "rb") as f: - meta_dict = pickle.load(f) - - self.class_names = [name.decode() for name in meta_dict[b'label_names']] - - # Load training or test data - if self.train: - with open(data_dir / "data_batch_1", "rb") as f: - data_dict = pickle.load(f) - else: - with open(data_dir / "test_batch", "rb") as f: - data_dict = pickle.load(f) - - # Reshape data from (N, 3072) to (N, 3, 32, 32) - self.data = data_dict[b'data'].reshape(-1, 3, 32, 32) - self.labels = data_dict[b'labels'] - - def __getitem__(self, index: int): - """Get a single sample and label.""" - image = self.data[index] - label = self.labels[index] - - return Tensor(image.astype(np.float32)), Tensor(np.array(label)) - - def __len__(self) -> int: - """Get the total number of samples.""" - return len(self.data) - - def get_num_classes(self) -> int: - """Get the number of classes.""" - return len(self.class_names) - -class TestDatasetInterface: - """Test the base Dataset class interface (abstract class behavior).""" - - def test_dataset_is_abstract(self): - """Test that Dataset base class is abstract.""" - dataset = Dataset() - - # Should raise NotImplementedError for abstract methods - with pytest.raises(NotImplementedError): - dataset[0] - - with pytest.raises(NotImplementedError): - len(dataset) - - with pytest.raises(NotImplementedError): - dataset.get_num_classes() - - def test_concrete_dataset_implementation(self): - """Test that concrete datasets work properly.""" - class TestDataset(Dataset): - def __init__(self, size=10): - self.size = size - self.data = [np.random.randn(3, 32, 32) for _ in range(size)] - self.labels = [i % 3 for i in range(size)] - - def __getitem__(self, index): - return Tensor(self.data[index]), Tensor(np.array(self.labels[index])) - - def 
__len__(self): - return self.size - - def get_num_classes(self): - return 3 - - dataset = TestDataset(5) - - # Test basic functionality - assert len(dataset) == 5 - assert dataset.get_num_classes() == 3 - - # Test indexing - sample, label = dataset[0] - assert sample.shape == (3, 32, 32) - assert label.shape == () - - # Test get_sample_shape - assert dataset.get_sample_shape() == (3, 32, 32) - -class TestLocalCIFAR10Dataset: - """Test CIFAR-10 dataset with local test data.""" - - def test_cifar10_train_set_load(self): - """Test loading training set from local test data.""" - with tempfile.TemporaryDirectory() as temp_dir: - # Use local test data - dataset = TestCIFAR10Dataset(temp_dir, train=True, download=True) - - # Verify basic properties - assert len(dataset) == 50 # Our test training set size - assert dataset.get_num_classes() == 10 - - # Test sample access - image, label = dataset[0] - assert image.shape == (3, 32, 32) # CIFAR-10 image shape - assert 0 <= safe_item(label) < 10 # Valid class label - - # Test class names - assert len(dataset.class_names) == 10 - assert 'airplane' in dataset.class_names - assert 'truck' in dataset.class_names - - def test_cifar10_test_set_load(self): - """Test loading test set from local test data.""" - with tempfile.TemporaryDirectory() as temp_dir: - # Use local test data - dataset = TestCIFAR10Dataset(temp_dir, train=False, download=True) - - # Verify test set properties - assert len(dataset) == 20 # Our test test set size - assert dataset.get_num_classes() == 10 - - # Test sample access - image, label = dataset[0] - assert image.shape == (3, 32, 32) - assert 0 <= safe_item(label) < 10 - - def test_cifar10_data_types(self): - """Test that test data has correct types and ranges.""" - with tempfile.TemporaryDirectory() as temp_dir: - dataset = TestCIFAR10Dataset(temp_dir, train=True, download=True) - - # Test first few samples - for i in range(5): - image, label = dataset[i] - - # Check data types - assert isinstance(image, 
Tensor) - assert isinstance(label, Tensor) - - # Check value ranges (our test data uses 0-255 range) - assert 0 <= safe_numpy(image).min() <= 255 - assert 0 <= safe_numpy(image).max() <= 255 - - # Check label is valid class - assert 0 <= safe_item(label) < 10 - -class TestDataLoader: - """Test DataLoader with local test data.""" - - def setup_method(self): - """Set up local test dataset for DataLoader tests.""" - self.temp_dir = tempfile.mkdtemp() - # Use local test data - self.dataset = TestCIFAR10Dataset(self.temp_dir, train=True, download=True) - - def teardown_method(self): - """Clean up temporary directory.""" - shutil.rmtree(self.temp_dir, ignore_errors=True) - - def test_dataloader_creation(self): - """Test DataLoader creation with local test data.""" - # Test with default parameters - loader = DataLoader(self.dataset, batch_size=16) - assert len(loader) == 4 # 50 samples / 16 batch_size = 4 batches (rounded up) - - # Test with custom batch size - loader = DataLoader(self.dataset, batch_size=10) - assert len(loader) == 5 # 50 samples / 10 batch_size = 5 batches - - def test_dataloader_iteration_test_data(self): - """Test DataLoader iteration with local test data.""" - loader = DataLoader(self.dataset, batch_size=8, shuffle=True) - - batch_count = 0 - total_samples = 0 - - for batch_data, batch_labels in loader: - batch_count += 1 - batch_size = batch_data.shape[0] - total_samples += batch_size - - # Check batch shapes - assert batch_data.shape[1:] == (3, 32, 32) # CIFAR-10 image shape - assert batch_labels.shape == (batch_size,) - - # Check data types - assert isinstance(batch_data, Tensor) - assert isinstance(batch_labels, Tensor) - - # Check test data properties - assert 0 <= safe_numpy(batch_data).min() <= 255 - assert 0 <= safe_numpy(batch_data).max() <= 255 - assert 0 <= safe_numpy(batch_labels).min() < 10 - assert 0 <= safe_numpy(batch_labels).max() < 10 - - # Check batch size - assert batch_size <= 8 - - if batch_count >= 3: # Test first few batches - 
break - - assert batch_count > 0 - assert total_samples <= len(self.dataset) - - def test_dataloader_shuffling_test_data(self): - """Test that shuffling works with test data.""" - loader1 = DataLoader(self.dataset, batch_size=10, shuffle=True) - loader2 = DataLoader(self.dataset, batch_size=10, shuffle=True) - - # Get first batch from each loader - batch1_data, batch1_labels = next(iter(loader1)) - batch2_data, batch2_labels = next(iter(loader2)) - - # With shuffling, batches should likely be different - # (This test might occasionally fail due to randomness, but very unlikely) - different = not np.array_equal(safe_numpy(batch1_labels), safe_numpy(batch2_labels)) - # Note: We don't assert this because random shuffling might occasionally produce same order - - def test_dataloader_no_shuffle_test_data(self): - """Test DataLoader without shuffling uses test data in order.""" - loader = DataLoader(self.dataset, batch_size=10, shuffle=False) - - # Get first batch - batch_data, batch_labels = next(iter(loader)) - - # Without shuffling, should get first 10 samples in order - expected_samples = [self.dataset[i] for i in range(10)] - expected_labels = [safe_item(sample[1]) for sample in expected_samples] - - np.testing.assert_array_equal(safe_numpy(batch_labels), expected_labels) - -class TestNormalizer: - """Test Normalizer with local test data.""" - - def setup_method(self): - """Set up local test data for normalization tests.""" - self.temp_dir = tempfile.mkdtemp() - dataset = TestCIFAR10Dataset(self.temp_dir, train=True, download=True) - - # Get first 20 samples for testing - self.test_data = [] - for i in range(20): - image, _ = dataset[i] - self.test_data.append(image) - - def teardown_method(self): - """Clean up temporary directory.""" - shutil.rmtree(self.temp_dir, ignore_errors=True) - - def test_normalizer_fit_test_data(self): - """Test Normalizer fit with local test data.""" - normalizer = Normalizer() - normalizer.fit(self.test_data) - - # Check computed 
statistics - assert normalizer.mean is not None - assert normalizer.std is not None - - # Our test data has pixel values 0-255, so mean should be reasonable - assert 0 <= normalizer.mean <= 255 - assert normalizer.std > 0 # Should have some variation - - def test_normalizer_transform_test_data(self): - """Test Normalizer transform with local test data.""" - normalizer = Normalizer() - normalizer.fit(self.test_data) - - # Transform single sample - sample = self.test_data[0] - normalized = normalizer.transform(sample) - - # Check that normalization changes the values - assert not np.allclose(safe_numpy(sample), safe_numpy(normalized)) - - # Check that normalized data has different statistics - original_mean = np.mean(safe_numpy(sample)) - normalized_mean = np.mean(safe_numpy(normalized)) - assert abs(normalized_mean) < abs(original_mean) # Should be closer to 0 - - def test_normalizer_transform_batch_test_data(self): - """Test Normalizer with batch of test data.""" - normalizer = Normalizer() - normalizer.fit(self.test_data) - - # Transform batch - batch = self.test_data[:5] - normalized_batch = normalizer.transform(batch) - - # Check that we get same number of samples - assert len(normalized_batch) == len(batch) - - # Check that each sample is normalized - for original, normalized in zip(batch, normalized_batch): - assert not np.allclose(safe_numpy(original), safe_numpy(normalized)) - -class TestDataPipeline: - """Test complete data pipeline with local test data.""" - - def test_create_data_pipeline_test_data(self): - """Test creating data pipeline with local test data.""" - with tempfile.TemporaryDirectory() as temp_dir: - # Copy test data to temp directory - test_data_dir = Path(__file__).parent / "test_data" - import shutil - shutil.copytree(test_data_dir, temp_dir + "/test_data") - - # Create pipeline (this would normally download CIFAR-10) - # For testing, we'll create a simple pipeline manually - dataset = TestCIFAR10Dataset(temp_dir, train=True, 
download=True) - dataloader = DataLoader(dataset, batch_size=8, shuffle=True) - - # Test pipeline components - assert len(dataset) == 50 # Our test training set - assert len(dataloader) == 7 # 50 samples / 8 batch_size = 7 batches - - # Test that we can iterate through the pipeline - batch_count = 0 - for batch_data, batch_labels in dataloader: - batch_count += 1 - assert batch_data.shape[1:] == (3, 32, 32) - assert batch_labels.shape[0] <= 8 - - if batch_count >= 3: # Test first few batches - break - - assert batch_count > 0 - - def test_pipeline_normalization_test_data(self): - """Test pipeline with normalization using local test data.""" - with tempfile.TemporaryDirectory() as temp_dir: - dataset = TestCIFAR10Dataset(temp_dir, train=True, download=True) - - # Get some samples for normalization - samples = [dataset[i][0] for i in range(10)] - - # Create and fit normalizer - normalizer = Normalizer() - normalizer.fit(samples) - - # Test that normalization works - normalized = normalizer.transform(samples[0]) - assert not np.allclose(safe_numpy(samples[0]), safe_numpy(normalized)) - - # Test with dataloader - dataloader = DataLoader(dataset, batch_size=5, shuffle=False) - batch_data, batch_labels = next(iter(dataloader)) - - # Normalize batch - normalized_batch = [] - for i in range(batch_data.shape[0]): - sample = Tensor(batch_data.data[i]) - normalized_sample = normalizer.transform(sample) - normalized_batch.append(normalized_sample.data) - - normalized_batch = Tensor(np.stack(normalized_batch)) - - # Check that batch normalization works - assert normalized_batch.shape == batch_data.shape - assert not np.allclose(safe_numpy(batch_data), safe_numpy(normalized_batch)) - -class TestEdgeCases: - """Test edge cases with local test data.""" - - def test_small_batch_size_test_data(self): - """Test with very small batch size using local test data.""" - with tempfile.TemporaryDirectory() as temp_dir: - # Create small dataset - dataset = TestCIFAR10Dataset(temp_dir, 
train=True, download=True) - - # Use batch size of 1 - loader = DataLoader(dataset, batch_size=1, shuffle=False) - - # Test first few batches - batch_count = 0 - for batch_data, batch_labels in loader: - assert batch_data.shape == (1, 3, 32, 32) - assert batch_labels.shape == (1,) - - batch_count += 1 - if batch_count >= 5: - break - - assert batch_count == 5 - -def run_data_tests(): - """Run all data tests.""" - pytest.main([__file__, "-v"]) - -if __name__ == "__main__": - run_data_tests() \ No newline at end of file diff --git a/modules/source/07_autograd/autograd_dev.py b/modules/source/07_autograd/autograd_dev.py index d8ae01c3..7e1dae31 100644 --- a/modules/source/07_autograd/autograd_dev.py +++ b/modules/source/07_autograd/autograd_dev.py @@ -38,7 +38,7 @@ from collections import defaultdict # Import our existing components try: - from tinytorch.core.tensor import Tensor +    from tinytorch.core.tensor import Tensor except ImportError: # For development, import from local modules import os @@ -123,7 +123,7 @@ Let's build the engine that powers modern AI! ### What is a Variable? A **Variable** wraps a Tensor and tracks: - **Data**: The actual values (forward pass) -- **Gradient**: The computed gradients (backward pass) +- **Gradient**: The computed gradients (backward pass) - **Computation history**: How this Variable was created - **Backward function**: How to compute gradients @@ -167,7 +167,7 @@ class Variable: requires_grad: bool = True, grad_fn: Optional[Callable] = None): """ Create a Variable with gradient tracking. - + TODO: Implement Variable initialization with gradient tracking. 
STEP-BY-STEP IMPLEMENTATION: @@ -275,33 +275,33 @@ class Variable: if self.requires_grad: if self.grad is None: self.grad = gradient - else: + else: # Accumulate gradients self.grad = Variable(self.grad.data.data + gradient.data.data) - if self.grad_fn is not None: - self.grad_fn(gradient) + if self.grad_fn is not None: + self.grad_fn(gradient) ### END SOLUTION - + def zero_grad(self) -> None: """Reset gradients to zero.""" self.grad = None - + def __add__(self, other: Union['Variable', float, int]) -> 'Variable': """Addition operator: self + other""" return add(self, other) - + def __mul__(self, other: Union['Variable', float, int]) -> 'Variable': """Multiplication operator: self * other""" return multiply(self, other) - + def __sub__(self, other: Union['Variable', float, int]) -> 'Variable': """Subtraction operator: self - other""" return subtract(self, other) - + def __truediv__(self, other: Union['Variable', float, int]) -> 'Variable': """Division operator: self / other""" - return divide(self, other) + return divide(self, other) # %% [markdown] """ @@ -817,12 +817,12 @@ Let's see how autograd enables neural network training: 4. 
**Parameter update**: Update weights using gradients ### Example: Simple Linear Regression -```python +```python # Model: y = wx + b w = Variable(0.5, requires_grad=True) b = Variable(0.1, requires_grad=True) -# Forward pass +# Forward pass prediction = w * x + b # Loss: mean squared error @@ -870,7 +870,7 @@ def test_neural_network_training(): x = Variable(x_val, requires_grad=False) target = Variable(y_val, requires_grad=False) - # Forward pass + # Forward pass prediction = add(multiply(w, x), b) # wx + b # Loss: squared error diff --git a/modules/source/07_autograd/autograd_dev_backup.py b/modules/source/07_autograd/autograd_dev_backup.py new file mode 100644 index 00000000..471939b6 --- /dev/null +++ b/modules/source/07_autograd/autograd_dev_backup.py @@ -0,0 +1,1672 @@ +# --- +# jupyter: +# jupytext: +# text_representation: +# extension: .py +# format_name: percent +# format_version: '1.3' +# jupytext_version: 1.17.1 +# --- + +# %% [markdown] +""" +# Module 7: Autograd - Automatic Differentiation Engine + +Welcome to the Autograd module! This is where TinyTorch becomes truly powerful. You'll implement the automatic differentiation engine that makes neural network training possible. + +## Learning Goals +- Understand how automatic differentiation works through computational graphs +- Implement the Variable class that tracks gradients and operations +- Build backward propagation for gradient computation +- Create the foundation for neural network training +- Master the mathematical concepts behind backpropagation + +## Build → Use → Analyze +1. **Build**: Create the Variable class and gradient computation system +2. **Use**: Perform automatic differentiation on complex expressions +3. 
**Analyze**: Understand how gradients flow through computational graphs +""" + +# %% nbgrader={"grade": false, "grade_id": "autograd-imports", "locked": false, "schema_version": 3, "solution": false, "task": false} +#| default_exp core.autograd + +#| export +import numpy as np +import sys +from typing import Union, List, Tuple, Optional, Any, Callable +from collections import defaultdict + +# Import our existing components +from tinytorch.core.tensor import Tensor + +# %% nbgrader={"grade": false, "grade_id": "autograd-setup", "locked": false, "schema_version": 3, "solution": false, "task": false} +print("🔥 TinyTorch Autograd Module") +print(f"NumPy version: {np.__version__}") +print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}") +print("Ready to build automatic differentiation!") + +# %% [markdown] +""" +## 📦 Where This Code Lives in the Final Package + +**Learning Side:** You work in `modules/source/07_autograd/autograd_dev.py` +**Building Side:** Code exports to `tinytorch.core.autograd` + +```python +# Final package structure: +from tinytorch.core.autograd import Variable, backward # The gradient engine! +from tinytorch.core.tensor import Tensor +from tinytorch.core.activations import ReLU, Sigmoid, Tanh +``` + +**Why this matters:** +- **Learning:** Focused module for understanding gradients +- **Production:** Proper organization like PyTorch's `torch.autograd` +- **Consistency:** All gradient operations live together in `core.autograd` +- **Foundation:** Enables training for all neural networks +""" + +# %% [markdown] +""" +## Step 1: What is Automatic Differentiation? + +### Definition +**Automatic differentiation (autograd)** is a technique that automatically computes derivatives of functions represented as computational graphs. It's the magic that makes neural network training possible. + +### The Fundamental Challenge: Computing Gradients at Scale + +#### **The Problem** +Neural networks have millions or billions of parameters. 
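To get a feel for that scale, here is a back-of-the-envelope parameter count for a small fully connected classifier (the layer widths below are purely illustrative, not from this module):

```python
# Illustrative layer widths: 32x32x3 input flattened, two hidden layers, 10 classes
layer_sizes = [3072, 1024, 512, 10]

def count_parameters(sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        total += fan_in * fan_out  # weight matrix entries
        total += fan_out           # bias vector entries
    return total

print(count_parameters(layer_sizes))  # 3676682 -- every one needs a gradient
```

Even this toy network has over 3.6 million parameters, and each one needs its own partial derivative of the loss on every training step.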
To train them, we need to compute the gradient of the loss function with respect to every single parameter: + +```python +# For a neural network with parameters θ = [w1, w2, ..., wn, b1, b2, ..., bm] +# We need to compute: ∇θ L = [∂L/∂w1, ∂L/∂w2, ..., ∂L/∂wn, ∂L/∂b1, ∂L/∂b2, ..., ∂L/∂bm] +``` + +#### **Why Manual Differentiation Fails** +- **Complexity**: Neural networks are compositions of thousands of operations +- **Error-prone**: Manual computation is extremely difficult and error-prone +- **Inflexible**: Every architecture change requires re-deriving gradients +- **Inefficient**: Manual computation doesn't exploit computational structure + +#### **Why Numerical Differentiation is Inadequate** +```python +# Numerical differentiation: f'(x) ≈ (f(x + h) - f(x)) / h +def numerical_gradient(f, x, h=1e-5): + return (f(x + h) - f(x)) / h +``` + +Problems: +- **Slow**: Requires 2 function evaluations per parameter +- **Imprecise**: Numerical errors accumulate +- **Unstable**: Sensitive to choice of h +- **Expensive**: O(n) cost for n parameters + +### The Solution: Computational Graphs + +#### **Key Insight: Every Computation is a Graph** +Any mathematical expression can be represented as a directed acyclic graph (DAG): + +```python +# Expression: f(x, y) = (x + y) * (x - y) +# Graph representation: +# x ──┐ ┌── add ──┐ +# │ │ │ +# ├─────┤ ├── multiply ── output +# │ │ │ +# y ──┘ └── sub ──┘ +``` + +#### **Forward Pass: Computing Values** +Traverse the graph from inputs to outputs, computing values at each node: + +```python +# Forward pass for f(x, y) = (x + y) * (x - y) +x = 3, y = 2 +add_result = x + y = 5 +sub_result = x - y = 1 +output = add_result * sub_result = 5 +``` + +#### **Backward Pass: Computing Gradients** +Traverse the graph from outputs to inputs, computing gradients using the chain rule: + +For $f(x, y) = (x + y) \cdot (x - y)$ with $x = 3, y = 2$: + +$$\frac{\partial \text{output}}{\partial \text{multiply}} = 1$$ + +$$\frac{\partial 
\text{output}}{\partial \text{add}} = \frac{\partial \text{output}}{\partial \text{multiply}} \cdot \frac{\partial \text{multiply}}{\partial \text{add}} = 1 \cdot \text{sub\_result} = 1$$ + +$$\frac{\partial \text{output}}{\partial \text{sub}} = \frac{\partial \text{output}}{\partial \text{multiply}} \cdot \frac{\partial \text{multiply}}{\partial \text{sub}} = 1 \cdot \text{add\_result} = 5$$ + +$$\frac{\partial \text{output}}{\partial x} = \frac{\partial \text{output}}{\partial \text{add}} \cdot \frac{\partial \text{add}}{\partial x} + \frac{\partial \text{output}}{\partial \text{sub}} \cdot \frac{\partial \text{sub}}{\partial x} = 1 \cdot 1 + 5 \cdot 1 = 6$$ + +$$\frac{\partial \text{output}}{\partial y} = \frac{\partial \text{output}}{\partial \text{add}} \cdot \frac{\partial \text{add}}{\partial y} + \frac{\partial \text{output}}{\partial \text{sub}} \cdot \frac{\partial \text{sub}}{\partial y} = 1 \cdot 1 + 5 \cdot (-1) = -4$$ + +### Mathematical Foundation: The Chain Rule + +#### **Single Variable Chain Rule** +For composite functions: If $z = f(g(x))$, then: + +$$\frac{dz}{dx} = \frac{dz}{dg} \cdot \frac{dg}{dx}$$ + +#### **Multivariable Chain Rule** +For functions of multiple variables: If $z = f(x, y)$ where $x = g(t)$ and $y = h(t)$, then: + +$$\frac{dz}{dt} = \frac{\partial z}{\partial x} \cdot \frac{dx}{dt} + \frac{\partial z}{\partial y} \cdot \frac{dy}{dt}$$ + +#### **Chain Rule in Computational Graphs** +For any path from input to output through intermediate nodes: + +$$\frac{\partial \text{output}}{\partial \text{input}} = \prod_{i} \frac{\partial \text{node}_{i+1}}{\partial \text{node}_i}$$ + +### Automatic Differentiation Modes + +#### **Forward Mode (Forward Accumulation)** +- **Process**: Compute derivatives alongside forward pass +- **Efficiency**: Efficient when #inputs << #outputs +- **Use case**: Jacobian-vector products, sensitivity analysis + +#### **Reverse Mode (Backpropagation)** +- **Process**: Compute derivatives in reverse pass 
after forward pass +- **Efficiency**: Efficient when #outputs << #inputs +- **Use case**: Neural network training (many parameters, few outputs) + +#### **Why Reverse Mode Dominates ML** +Neural networks typically have: +- **Many inputs**: Millions of parameters +- **Few outputs**: Single loss value or small output vector +- **Reverse mode**: O(1) cost per parameter vs O(n) for forward mode + +### The Computational Graph Abstraction + +#### **Nodes: Operations and Variables** +- **Variable nodes**: Store values and gradients +- **Operation nodes**: Define how to compute forward and backward passes + +#### **Edges: Data Dependencies** +- **Forward edges**: Data flow from inputs to outputs +- **Backward edges**: Gradient flow from outputs to inputs + +#### **Dynamic vs Static Graphs** +- **Static graphs**: Define once, execute many times (TensorFlow 1.x) +- **Dynamic graphs**: Build graph during execution (PyTorch, TensorFlow 2.x) + +### Real-World Impact: What Autograd Enables + +#### **Deep Learning Revolution** +```python +# Before autograd: Manual gradient computation +def manual_gradient(x, y, w1, w2, b1, b2): + # Forward pass + z1 = w1 * x + b1 + a1 = sigmoid(z1) + z2 = w2 * a1 + b2 + a2 = sigmoid(z2) + loss = (a2 - y) ** 2 + + # Backward pass (manual) + dloss_da2 = 2 * (a2 - y) + da2_dz2 = sigmoid_derivative(z2) + dz2_dw2 = a1 + dz2_db2 = 1 + dz2_da1 = w2 + da1_dz1 = sigmoid_derivative(z1) + dz1_dw1 = x + dz1_db1 = 1 + + # Chain rule application + dloss_dw2 = dloss_da2 * da2_dz2 * dz2_dw2 + dloss_db2 = dloss_da2 * da2_dz2 * dz2_db2 + dloss_dw1 = dloss_da2 * da2_dz2 * dz2_da1 * da1_dz1 * dz1_dw1 + dloss_db1 = dloss_da2 * da2_dz2 * dz2_da1 * da1_dz1 * dz1_db1 + + return dloss_dw1, dloss_db1, dloss_dw2, dloss_db2 + +# With autograd: Automatic gradient computation +def autograd_gradient(x, y, w1, w2, b1, b2): + # Forward pass with gradient tracking + z1 = w1 * x + b1 + a1 = sigmoid(z1) + z2 = w2 * a1 + b2 + a2 = sigmoid(z2) + loss = (a2 - y) ** 2 + + # Backward 
pass (automatic) + loss.backward() + + return w1.grad, b1.grad, w2.grad, b2.grad +``` + +#### **Scientific Computing** +- **Optimization**: Gradient-based optimization algorithms +- **Inverse problems**: Parameter estimation from observations +- **Sensitivity analysis**: How outputs change with input perturbations + +#### **Modern AI Applications** +- **Neural architecture search**: Differentiable architecture optimization +- **Meta-learning**: Learning to learn with gradient-based meta-algorithms +- **Differentiable programming**: Entire programs as differentiable functions + +### Performance Considerations + +#### **Memory Management** +- **Intermediate storage**: Must store forward pass results for backward pass +- **Memory optimization**: Checkpointing, gradient accumulation +- **Trade-offs**: Memory vs computation time + +#### **Computational Efficiency** +- **Graph optimization**: Fuse operations, eliminate redundancy +- **Parallelization**: Compute independent gradients simultaneously +- **Hardware acceleration**: Specialized gradient computation on GPUs/TPUs + +#### **Numerical Stability** +- **Gradient clipping**: Prevent exploding gradients +- **Numerical precision**: Balance between float16 and float32 +- **Accumulation order**: Minimize numerical errors + +### Connection to Neural Network Training + +#### **The Training Loop** +```python +for epoch in range(num_epochs): + for batch in dataloader: + # Forward pass + predictions = model(batch.inputs) + loss = criterion(predictions, batch.targets) + + # Backward pass (autograd) + loss.backward() + + # Parameter update + optimizer.step() + optimizer.zero_grad() +``` + +#### **Gradient-Based Optimization** +- **Stochastic Gradient Descent**: Use gradients to update parameters +- **Adaptive methods**: Adam, RMSprop use gradient statistics +- **Second-order methods**: Use gradient and Hessian information + +### Why Autograd is Revolutionary + +#### **Democratization of Deep Learning** +- **Research 
acceleration**: Focus on architecture, not gradient computation +- **Experimentation**: Easy to try new ideas and architectures +- **Accessibility**: Researchers don't need to be differentiation experts + +#### **Scalability** +- **Large models**: Handle millions/billions of parameters automatically +- **Complex architectures**: Support arbitrary computational graphs +- **Distributed training**: Coordinate gradients across multiple devices + +Let's implement the Variable class that makes this magic possible! +""" + +# %% [markdown] +""" +## Step 2: The Variable Class + +### Core Concept +A **Variable** wraps a Tensor and tracks: +- **Data**: The actual values (forward pass) +- **Gradient**: The computed gradients (backward pass) +- **Computation history**: How this Variable was created +- **Backward function**: How to compute gradients + +### Design Principles +- **Transparency**: Works seamlessly with existing Tensor operations +- **Efficiency**: Minimal overhead for forward pass +- **Flexibility**: Supports any differentiable operation +- **Correctness**: Implements the chain rule precisely +""" + +# %% nbgrader={"grade": false, "grade_id": "variable-class", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +class Variable: + """ + Variable: Tensor wrapper with automatic differentiation capabilities. + + The fundamental class for gradient computation in TinyTorch. + Wraps Tensor objects and tracks computational history for backpropagation. + """ + + def __init__(self, data: Union[Tensor, np.ndarray, list, float, int], + requires_grad: bool = True, grad_fn: Optional[Callable] = None): + """ + Create a Variable with gradient tracking. + + Args: + data: The data to wrap (will be converted to Tensor) + requires_grad: Whether to compute gradients for this Variable + grad_fn: Function to compute gradients (None for leaf nodes) + + TODO: Implement Variable initialization with gradient tracking. + + APPROACH: + 1. 
Convert data to Tensor if it's not already + 2. Store the tensor data + 3. Set gradient tracking flag + 4. Initialize gradient to None (will be computed later) + 5. Store the gradient function for backward pass + 6. Track if this is a leaf node (no grad_fn) + + EXAMPLE: + Variable(5.0) → Variable wrapping Tensor(5.0) + Variable([1, 2, 3]) → Variable wrapping Tensor([1, 2, 3]) + + HINTS: + - Use isinstance() to check if data is already a Tensor + - Store requires_grad, grad_fn, and is_leaf flags + - Initialize self.grad to None + - A leaf node has grad_fn=None + """ + ### BEGIN SOLUTION + # Convert data to Tensor if needed + if isinstance(data, Tensor): + self.data = data + else: + self.data = Tensor(data) + + # Set gradient tracking + self.requires_grad = requires_grad + self.grad = None # Will be initialized when needed + self.grad_fn = grad_fn + self.is_leaf = grad_fn is None + + # For computational graph + self._backward_hooks = [] + ### END SOLUTION + + @property + def shape(self) -> Tuple[int, ...]: + """Get the shape of the underlying tensor.""" + return self.data.shape + + @property + def size(self) -> int: + """Get the total number of elements.""" + return self.data.size + + def __repr__(self) -> str: + """String representation of the Variable.""" + grad_str = f", grad_fn={self.grad_fn.__name__}" if self.grad_fn else "" + return f"Variable({self.data.data.tolist()}, requires_grad={self.requires_grad}{grad_str})" + + def backward(self, gradient: Optional['Variable'] = None) -> None: + """ + Compute gradients using backpropagation. + + Args: + gradient: The gradient to backpropagate (defaults to ones) + + TODO: Implement backward propagation. + + APPROACH: + 1. If gradient is None, create a gradient of ones with same shape + 2. If this Variable doesn't require gradients, return early + 3. If this is a leaf node, accumulate the gradient + 4. 
If this has a grad_fn, call it to propagate gradients + + EXAMPLE: + x = Variable(5.0) + y = x * 2 + y.backward() # Computes x.grad = 2.0 + + HINTS: + - Use np.ones_like() to create default gradient + - Accumulate gradients with += for leaf nodes + - Call self.grad_fn(gradient) for non-leaf nodes + """ + ### BEGIN SOLUTION + # Default gradient is ones + if gradient is None: + gradient = Variable(np.ones_like(self.data.data)) + + # Skip if gradients not required + if not self.requires_grad: + return + + # Accumulate gradient for leaf nodes + if self.is_leaf: + if self.grad is None: + self.grad = Variable(np.zeros_like(self.data.data)) + self.grad.data._data += gradient.data.data + else: + # Propagate gradients through grad_fn + if self.grad_fn is not None: + self.grad_fn(gradient) + ### END SOLUTION + + def zero_grad(self) -> None: + """Zero out the gradient.""" + if self.grad is not None: + self.grad.data._data.fill(0) + + # Arithmetic operations with gradient tracking + def __add__(self, other: Union['Variable', float, int]) -> 'Variable': + """Addition with gradient tracking.""" + return add(self, other) + + def __mul__(self, other: Union['Variable', float, int]) -> 'Variable': + """Multiplication with gradient tracking.""" + return multiply(self, other) + + def __sub__(self, other: Union['Variable', float, int]) -> 'Variable': + """Subtraction with gradient tracking.""" + return subtract(self, other) + + def __truediv__(self, other: Union['Variable', float, int]) -> 'Variable': + """Division with gradient tracking.""" + return divide(self, other) + +# %% [markdown] +""" +## Step 3: Basic Operations with Gradients + +### The Pattern +Every differentiable operation follows the same pattern: +1. **Forward pass**: Compute the result +2. **Create grad_fn**: Function that knows how to compute gradients +3. 
**Return Variable**: With the result and grad_fn + +### Mathematical Rules +- **Addition**: $\frac{d(x + y)}{dx} = 1$, $\frac{d(x + y)}{dy} = 1$ +- **Multiplication**: $\frac{d(x \cdot y)}{dx} = y$, $\frac{d(x \cdot y)}{dy} = x$ +- **Subtraction**: $\frac{d(x - y)}{dx} = 1$, $\frac{d(x - y)}{dy} = -1$ +- **Division**: $\frac{d(x / y)}{dx} = \frac{1}{y}$, $\frac{d(x / y)}{dy} = -\frac{x}{y^2}$ + +### Implementation Strategy +Each operation creates a closure that captures the input variables and implements the gradient computation rule. +""" + +# %% nbgrader={"grade": false, "grade_id": "add-operation", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def add(a: Union[Variable, float, int], b: Union[Variable, float, int]) -> Variable: + """ + Addition operation with gradient tracking. + + Args: + a: First operand + b: Second operand + + Returns: + Variable with sum and gradient function + + TODO: Implement addition with gradient computation. + + APPROACH: + 1. Convert inputs to Variables if needed + 2. Compute forward pass: result = a + b + 3. Create gradient function that distributes gradients + 4. 
Return Variable with result and grad_fn + + MATHEMATICAL RULE: + If z = x + y, then dz/dx = 1, dz/dy = 1 + + EXAMPLE: + x = Variable(2.0), y = Variable(3.0) + z = add(x, y) # z.data = 5.0 + z.backward() # x.grad = 1.0, y.grad = 1.0 + + HINTS: + - Use isinstance() to check if inputs are Variables + - Create a closure that captures a and b + - In grad_fn, call a.backward() and b.backward() with appropriate gradients + """ + ### BEGIN SOLUTION + # Convert to Variables if needed + if not isinstance(a, Variable): + a = Variable(a, requires_grad=False) + if not isinstance(b, Variable): + b = Variable(b, requires_grad=False) + + # Forward pass + result_data = a.data + b.data + + # Create gradient function + def grad_fn(grad_output): + # Addition distributes gradients equally + if a.requires_grad: + a.backward(grad_output) + if b.requires_grad: + b.backward(grad_output) + + # Determine if result requires gradients + requires_grad = a.requires_grad or b.requires_grad + + return Variable(result_data, requires_grad=requires_grad, grad_fn=grad_fn) + ### END SOLUTION + +# %% nbgrader={"grade": false, "grade_id": "multiply-operation", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def multiply(a: Union[Variable, float, int], b: Union[Variable, float, int]) -> Variable: + """ + Multiplication operation with gradient tracking. + + Args: + a: First operand + b: Second operand + + Returns: + Variable with product and gradient function + + TODO: Implement multiplication with gradient computation. + + APPROACH: + 1. Convert inputs to Variables if needed + 2. Compute forward pass: result = a * b + 3. Create gradient function using product rule + 4. 
Return Variable with result and grad_fn + + MATHEMATICAL RULE: + If z = x * y, then dz/dx = y, dz/dy = x + + EXAMPLE: + x = Variable(2.0), y = Variable(3.0) + z = multiply(x, y) # z.data = 6.0 + z.backward() # x.grad = 3.0, y.grad = 2.0 + + HINTS: + - Store a.data and b.data for gradient computation + - In grad_fn, multiply incoming gradient by the other operand + - Handle broadcasting if shapes are different + """ + ### BEGIN SOLUTION + # Convert to Variables if needed + if not isinstance(a, Variable): + a = Variable(a, requires_grad=False) + if not isinstance(b, Variable): + b = Variable(b, requires_grad=False) + + # Forward pass + result_data = a.data * b.data + + # Create gradient function + def grad_fn(grad_output): + # Product rule: d(xy)/dx = y, d(xy)/dy = x + if a.requires_grad: + a_grad = Variable(grad_output.data * b.data) + a.backward(a_grad) + if b.requires_grad: + b_grad = Variable(grad_output.data * a.data) + b.backward(b_grad) + + # Determine if result requires gradients + requires_grad = a.requires_grad or b.requires_grad + + return Variable(result_data, requires_grad=requires_grad, grad_fn=grad_fn) + ### END SOLUTION + +# %% nbgrader={"grade": false, "grade_id": "subtract-operation", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def subtract(a: Union[Variable, float, int], b: Union[Variable, float, int]) -> Variable: + """ + Subtraction operation with gradient tracking. + + Args: + a: First operand (minuend) + b: Second operand (subtrahend) + + Returns: + Variable with difference and gradient function + + TODO: Implement subtraction with gradient computation. + + APPROACH: + 1. Convert inputs to Variables if needed + 2. Compute forward pass: result = a - b + 3. Create gradient function with correct signs + 4. 
Return Variable with result and grad_fn + + MATHEMATICAL RULE: + If z = x - y, then dz/dx = 1, dz/dy = -1 + + EXAMPLE: + x = Variable(5.0), y = Variable(3.0) + z = subtract(x, y) # z.data = 2.0 + z.backward() # x.grad = 1.0, y.grad = -1.0 + + HINTS: + - Forward pass is straightforward: a - b + - Gradient for a is positive, for b is negative + - Remember to negate the gradient for b + """ + ### BEGIN SOLUTION + # Convert to Variables if needed + if not isinstance(a, Variable): + a = Variable(a, requires_grad=False) + if not isinstance(b, Variable): + b = Variable(b, requires_grad=False) + + # Forward pass + result_data = a.data - b.data + + # Create gradient function + def grad_fn(grad_output): + # Subtraction rule: d(x-y)/dx = 1, d(x-y)/dy = -1 + if a.requires_grad: + a.backward(grad_output) + if b.requires_grad: + b_grad = Variable(-grad_output.data.data) + b.backward(b_grad) + + # Determine if result requires gradients + requires_grad = a.requires_grad or b.requires_grad + + return Variable(result_data, requires_grad=requires_grad, grad_fn=grad_fn) + ### END SOLUTION + +# %% nbgrader={"grade": false, "grade_id": "divide-operation", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def divide(a: Union[Variable, float, int], b: Union[Variable, float, int]) -> Variable: + """ + Division operation with gradient tracking. + + Args: + a: Numerator + b: Denominator + + Returns: + Variable with quotient and gradient function + + TODO: Implement division with gradient computation. + + APPROACH: + 1. Convert inputs to Variables if needed + 2. Compute forward pass: result = a / b + 3. Create gradient function using quotient rule + 4. 
Return Variable with result and grad_fn + + MATHEMATICAL RULE: + If z = x / y, then dz/dx = 1/y, dz/dy = -x/y² + + EXAMPLE: + x = Variable(6.0), y = Variable(2.0) + z = divide(x, y) # z.data = 3.0 + z.backward() # x.grad = 0.5, y.grad = -1.5 + + HINTS: + - Forward pass: a.data / b.data + - Gradient for a: grad_output / b.data + - Gradient for b: -grad_output * a.data / (b.data ** 2) + - Be careful with numerical stability + """ + ### BEGIN SOLUTION + # Convert to Variables if needed + if not isinstance(a, Variable): + a = Variable(a, requires_grad=False) + if not isinstance(b, Variable): + b = Variable(b, requires_grad=False) + + # Forward pass + result_data = a.data / b.data + + # Create gradient function + def grad_fn(grad_output): + # Quotient rule: d(x/y)/dx = 1/y, d(x/y)/dy = -x/y² + if a.requires_grad: + a_grad = Variable(grad_output.data.data / b.data.data) + a.backward(a_grad) + if b.requires_grad: + b_grad = Variable(-grad_output.data.data * a.data.data / (b.data.data ** 2)) + b.backward(b_grad) + + # Determine if result requires gradients + requires_grad = a.requires_grad or b.requires_grad + + return Variable(result_data, requires_grad=requires_grad, grad_fn=grad_fn) + ### END SOLUTION + +# %% [markdown] +""" +## Step 4: Testing Basic Operations + +Let's test our basic operations to ensure they compute gradients correctly. 
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-basic-operations", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +def test_basic_operations(): + """Test basic operations with gradient computation.""" + print("🔬 Testing basic operations...") + + # Test addition + print("📊 Testing addition...") + x = Variable(2.0, requires_grad=True) + y = Variable(3.0, requires_grad=True) + z = add(x, y) + + assert abs(z.data.data.item() - 5.0) < 1e-6, f"Addition failed: expected 5.0, got {z.data.data.item()}" + + z.backward() + assert abs(x.grad.data.data.item() - 1.0) < 1e-6, f"Addition gradient for x failed: expected 1.0, got {x.grad.data.data.item()}" + assert abs(y.grad.data.data.item() - 1.0) < 1e-6, f"Addition gradient for y failed: expected 1.0, got {y.grad.data.data.item()}" + print("✅ Addition test passed!") + + # Test multiplication + print("📊 Testing multiplication...") + x = Variable(2.0, requires_grad=True) + y = Variable(3.0, requires_grad=True) + z = multiply(x, y) + + assert abs(z.data.data.item() - 6.0) < 1e-6, f"Multiplication failed: expected 6.0, got {z.data.data.item()}" + + z.backward() + assert abs(x.grad.data.data.item() - 3.0) < 1e-6, f"Multiplication gradient for x failed: expected 3.0, got {x.grad.data.data.item()}" + assert abs(y.grad.data.data.item() - 2.0) < 1e-6, f"Multiplication gradient for y failed: expected 2.0, got {y.grad.data.data.item()}" + print("✅ Multiplication test passed!") + + # Test subtraction + print("📊 Testing subtraction...") + x = Variable(5.0, requires_grad=True) + y = Variable(3.0, requires_grad=True) + z = subtract(x, y) + + assert abs(z.data.data.item() - 2.0) < 1e-6, f"Subtraction failed: expected 2.0, got {z.data.data.item()}" + + z.backward() + assert abs(x.grad.data.data.item() - 1.0) < 1e-6, f"Subtraction gradient for x failed: expected 1.0, got {x.grad.data.data.item()}" + assert abs(y.grad.data.data.item() - (-1.0)) < 1e-6, f"Subtraction gradient for y failed: expected -1.0, 
got {y.grad.data.data.item()}" + print("✅ Subtraction test passed!") + + # Test division + print("📊 Testing division...") + x = Variable(6.0, requires_grad=True) + y = Variable(2.0, requires_grad=True) + z = divide(x, y) + + assert abs(z.data.data.item() - 3.0) < 1e-6, f"Division failed: expected 3.0, got {z.data.data.item()}" + + z.backward() + assert abs(x.grad.data.data.item() - 0.5) < 1e-6, f"Division gradient for x failed: expected 0.5, got {x.grad.data.data.item()}" + assert abs(y.grad.data.data.item() - (-1.5)) < 1e-6, f"Division gradient for y failed: expected -1.5, got {y.grad.data.data.item()}" + print("✅ Division test passed!") + + print("🎉 All basic operation tests passed!") + return True + +# Run the test +success = test_basic_operations() + +# %% [markdown] +""" +## Step 5: Chain Rule Testing + +Let's test more complex expressions to ensure the chain rule works correctly. +""" + +# %% nbgrader={"grade": true, "grade_id": "test-chain-rule", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +def test_chain_rule(): + """Test chain rule with complex expressions.""" + print("🔬 Testing chain rule...") + + # Test: f(x, y) = (x + y) * (x - y) = x² - y² + print("📊 Testing f(x, y) = (x + y) * (x - y)...") + x = Variable(3.0, requires_grad=True) + y = Variable(2.0, requires_grad=True) + + # Forward pass + sum_xy = add(x, y) # x + y = 5 + diff_xy = subtract(x, y) # x - y = 1 + result = multiply(sum_xy, diff_xy) # (x + y) * (x - y) = 5 + + assert abs(result.data.data.item() - 5.0) < 1e-6, f"Chain rule forward failed: expected 5.0, got {result.data.data.item()}" + + # Backward pass + result.backward() + + # Analytical gradients: df/dx = 2x = 6, df/dy = -2y = -4 + expected_x_grad = 2 * 3.0 # 6.0 + expected_y_grad = -2 * 2.0 # -4.0 + + assert abs(x.grad.data.data.item() - expected_x_grad) < 1e-6, f"Chain rule x gradient failed: expected {expected_x_grad}, got {x.grad.data.data.item()}" + assert abs(y.grad.data.data.item() - 
expected_y_grad) < 1e-6, f"Chain rule y gradient failed: expected {expected_y_grad}, got {y.grad.data.data.item()}" + print("✅ Chain rule test passed!") + + # Test: f(x) = x * x * x (x³) + print("📊 Testing f(x) = x³...") + x = Variable(2.0, requires_grad=True) + + # Forward pass + x_squared = multiply(x, x) # x² + x_cubed = multiply(x_squared, x) # x³ + + assert abs(x_cubed.data.data.item() - 8.0) < 1e-6, f"x³ forward failed: expected 8.0, got {x_cubed.data.data.item()}" + + # Backward pass + x_cubed.backward() + + # Analytical gradient: df/dx = 3x² = 12 + expected_grad = 3 * (2.0 ** 2) # 12.0 + + assert abs(x.grad.data.data.item() - expected_grad) < 1e-6, f"x³ gradient failed: expected {expected_grad}, got {x.grad.data.data.item()}" + print("✅ x³ test passed!") + + print("🎉 All chain rule tests passed!") + return True + +# Run the test +success = test_chain_rule() + +# %% [markdown] +""" +## Step 6: Activation Function Gradients + +Now let's implement gradients for activation functions to integrate with our existing modules. +""" + +# %% nbgrader={"grade": false, "grade_id": "relu-gradient", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def relu_with_grad(x: Variable) -> Variable: + """ + ReLU activation with gradient tracking. + + Args: + x: Input Variable + + Returns: + Variable with ReLU applied and gradient function + + TODO: Implement ReLU with gradient computation. + + APPROACH: + 1. Compute forward pass: max(0, x) + 2. Create gradient function using ReLU derivative + 3. 
Return Variable with result and grad_fn + + MATHEMATICAL RULE: + f(x) = max(0, x) + f'(x) = 1 if x > 0, else 0 + + EXAMPLE: + x = Variable([-1.0, 0.0, 1.0]) + y = relu_with_grad(x) # y.data = [0.0, 0.0, 1.0] + y.backward() # x.grad = [0.0, 0.0, 1.0] + + HINTS: + - Use np.maximum(0, x.data.data) for forward pass + - Use (x.data.data > 0) for gradient mask + - Only propagate gradients where input was positive + """ + ### BEGIN SOLUTION + # Forward pass + result_data = Tensor(np.maximum(0, x.data.data)) + + # Create gradient function + def grad_fn(grad_output): + if x.requires_grad: + # ReLU derivative: 1 if x > 0, else 0 + mask = (x.data.data > 0).astype(np.float32) + x_grad = Variable(grad_output.data.data * mask) + x.backward(x_grad) + + return Variable(result_data, requires_grad=x.requires_grad, grad_fn=grad_fn) + ### END SOLUTION + +# %% nbgrader={"grade": false, "grade_id": "sigmoid-gradient", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def sigmoid_with_grad(x: Variable) -> Variable: + """ + Sigmoid activation with gradient tracking. + + Args: + x: Input Variable + + Returns: + Variable with sigmoid applied and gradient function + + TODO: Implement sigmoid with gradient computation. + + APPROACH: + 1. Compute forward pass: 1 / (1 + exp(-x)) + 2. Create gradient function using sigmoid derivative + 3. 
Return Variable with result and grad_fn + + MATHEMATICAL RULE: + f(x) = 1 / (1 + exp(-x)) + f'(x) = f(x) * (1 - f(x)) + + EXAMPLE: + x = Variable(0.0) + y = sigmoid_with_grad(x) # y.data = 0.5 + y.backward() # x.grad = 0.25 + + HINTS: + - Use np.clip for numerical stability + - Store sigmoid output for gradient computation + - Gradient is sigmoid * (1 - sigmoid) + """ + ### BEGIN SOLUTION + # Forward pass with numerical stability + clipped = np.clip(x.data.data, -500, 500) + sigmoid_output = 1.0 / (1.0 + np.exp(-clipped)) + result_data = Tensor(sigmoid_output) + + # Create gradient function + def grad_fn(grad_output): + if x.requires_grad: + # Sigmoid derivative: sigmoid * (1 - sigmoid) + sigmoid_grad = sigmoid_output * (1.0 - sigmoid_output) + x_grad = Variable(grad_output.data.data * sigmoid_grad) + x.backward(x_grad) + + return Variable(result_data, requires_grad=x.requires_grad, grad_fn=grad_fn) + ### END SOLUTION + +# %% [markdown] +""" +## Step 7: Integration Testing + +Let's test our autograd system with a simple neural network scenario. 
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-integration", "locked": true, "points": 25, "schema_version": 3, "solution": false, "task": false} +def test_integration(): + """Test autograd integration with neural network scenario.""" + print("🔬 Testing autograd integration...") + + # Simple neural network: input -> linear -> ReLU -> output + print("📊 Testing simple neural network...") + + # Input + x = Variable(2.0, requires_grad=True) + + # Weights and bias + w1 = Variable(0.5, requires_grad=True) + b1 = Variable(0.1, requires_grad=True) + w2 = Variable(1.5, requires_grad=True) + + # Forward pass + linear1 = add(multiply(x, w1), b1) # x * w1 + b1 = 2*0.5 + 0.1 = 1.1 + activation1 = relu_with_grad(linear1) # ReLU(1.1) = 1.1 + output = multiply(activation1, w2) # 1.1 * 1.5 = 1.65 + + # Check forward pass + expected_output = 1.65 + assert abs(output.data.data.item() - expected_output) < 1e-6, f"Integration forward failed: expected {expected_output}, got {output.data.data.item()}" + + # Backward pass + output.backward() + + # Check gradients + # dL/dx = dL/doutput * doutput/dactivation1 * dactivation1/dlinear1 * dlinear1/dx + # = 1 * w2 * 1 * w1 = 1.5 * 0.5 = 0.75 + expected_x_grad = 0.75 + assert abs(x.grad.data.data.item() - expected_x_grad) < 1e-6, f"Integration x gradient failed: expected {expected_x_grad}, got {x.grad.data.data.item()}" + + # dL/dw1 = dL/doutput * doutput/dactivation1 * dactivation1/dlinear1 * dlinear1/dw1 + # = 1 * w2 * 1 * x = 1.5 * 2.0 = 3.0 + expected_w1_grad = 3.0 + assert abs(w1.grad.data.data.item() - expected_w1_grad) < 1e-6, f"Integration w1 gradient failed: expected {expected_w1_grad}, got {w1.grad.data.data.item()}" + + # dL/db1 = dL/doutput * doutput/dactivation1 * dactivation1/dlinear1 * dlinear1/db1 + # = 1 * w2 * 1 * 1 = 1.5 + expected_b1_grad = 1.5 + assert abs(b1.grad.data.data.item() - expected_b1_grad) < 1e-6, f"Integration b1 gradient failed: expected {expected_b1_grad}, got {b1.grad.data.data.item()}" + + # dL/dw2 = 
dL/doutput * doutput/dw2 = 1 * activation1 = 1.1 + expected_w2_grad = 1.1 + assert abs(w2.grad.data.data.item() - expected_w2_grad) < 1e-6, f"Integration w2 gradient failed: expected {expected_w2_grad}, got {w2.grad.data.data.item()}" + + print("✅ Integration test passed!") + print("🎉 All autograd tests passed!") + return True + +# Run the test +success = test_integration() + +# %% [markdown] +""" +## 🎯 Module Summary + +Congratulations! You've successfully implemented automatic differentiation for TinyTorch: + +### What You've Accomplished +✅ **Variable Class**: Tensor wrapper with gradient tracking and computational graph +✅ **Basic Operations**: Addition, multiplication, subtraction, division with gradients +✅ **Chain Rule**: Automatic gradient computation through complex expressions +✅ **Activation Functions**: ReLU and Sigmoid with proper gradient computation +✅ **Integration**: Works seamlessly with neural network scenarios + +### Key Concepts You've Learned +- **Computational graphs** represent mathematical expressions as directed graphs +- **Forward pass** computes function values following the graph +- **Backward pass** computes gradients using the chain rule in reverse +- **Gradient functions** capture how to compute gradients for each operation +- **Variable tracking** enables automatic differentiation of any expression + +### Mathematical Foundations +- **Chain rule**: The fundamental principle behind backpropagation +- **Partial derivatives**: How gradients flow through operations +- **Computational efficiency**: Reusing forward pass results in backward pass +- **Numerical stability**: Handling edge cases in gradient computation + +### Real-World Applications +- **Neural network training**: Backpropagation through layers +- **Optimization**: Gradient descent and advanced optimizers +- **Scientific computing**: Sensitivity analysis and inverse problems +- **Machine learning**: Any gradient-based learning algorithm + +### Next Steps +1. 
**Export your code**: `tito package nbdev --export 07_autograd` +2. **Test your implementation**: `tito module test 07_autograd` +3. **Use your autograd**: + ```python + from tinytorch.core.autograd import Variable + + x = Variable(2.0, requires_grad=True) + y = x**2 + 3*x + 1 + y.backward() + print(x.grad) # Your gradients in action! + ``` +4. **Move to Module 8**: Start building training loops and optimizers! + +**Ready for the next challenge?** Let's use your autograd system to build complete training pipelines! +""" + +# %% [markdown] +""" +## Step 8: Performance Optimizations and Advanced Features + +### Memory Management +- **Gradient Accumulation**: Efficient in-place gradient updates +- **Computational Graph Cleanup**: Release intermediate values when possible +- **Lazy Evaluation**: Compute gradients only when needed + +### Numerical Stability +- **Gradient Clipping**: Prevent exploding gradients +- **Numerical Precision**: Handle edge cases gracefully +- **Overflow Protection**: Clip extreme values + +### Advanced Features +- **Higher-Order Gradients**: Gradients of gradients +- **Gradient Checkpointing**: Memory-efficient backpropagation +- **Custom Operations**: Framework for user-defined differentiable functions +""" + +# %% nbgrader={"grade": false, "grade_id": "advanced-features", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def power(base: Variable, exponent: Union[float, int]) -> Variable: + """ + Power operation with gradient tracking: base^exponent. + + Args: + base: Base Variable + exponent: Exponent (scalar) + + Returns: + Variable with power applied and gradient function + + TODO: Implement power operation with gradient computation. + + APPROACH: + 1. Compute forward pass: base^exponent + 2. Create gradient function using power rule + 3. 
Return Variable with result and grad_fn + + MATHEMATICAL RULE: + If z = x^n, then dz/dx = n * x^(n-1) + + EXAMPLE: + x = Variable(2.0) + y = power(x, 3) # y.data = 8.0 + y.backward() # x.grad = 3 * 2^2 = 12.0 + + HINTS: + - Use np.power() for forward pass + - Power rule: gradient = exponent * base^(exponent-1) + - Handle edge cases like exponent=0 or base=0 + """ + ### BEGIN SOLUTION + # Forward pass + result_data = Tensor(np.power(base.data.data, exponent)) + + # Create gradient function + def grad_fn(grad_output): + if base.requires_grad: + # Power rule: d(x^n)/dx = n * x^(n-1) + if exponent == 0: + # Special case: derivative of constant is 0 + base_grad = Variable(np.zeros_like(base.data.data)) + else: + base_grad_data = exponent * np.power(base.data.data, exponent - 1) + base_grad = Variable(grad_output.data.data * base_grad_data) + base.backward(base_grad) + + return Variable(result_data, requires_grad=base.requires_grad, grad_fn=grad_fn) + ### END SOLUTION + +# %% nbgrader={"grade": false, "grade_id": "exp-operation", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def exp(x: Variable) -> Variable: + """ + Exponential operation with gradient tracking: e^x. + + Args: + x: Input Variable + + Returns: + Variable with exponential applied and gradient function + + TODO: Implement exponential operation with gradient computation. + + APPROACH: + 1. Compute forward pass: e^x + 2. Create gradient function using exponential derivative + 3. 
Return Variable with result and grad_fn + + MATHEMATICAL RULE: + If z = e^x, then dz/dx = e^x + + EXAMPLE: + x = Variable(1.0) + y = exp(x) # y.data = e^1 ≈ 2.718 + y.backward() # x.grad = e^1 ≈ 2.718 + + HINTS: + - Use np.exp() for forward pass + - Exponential derivative is itself: d(e^x)/dx = e^x + - Store result for gradient computation + """ + ### BEGIN SOLUTION + # Forward pass + exp_result = np.exp(x.data.data) + result_data = Tensor(exp_result) + + # Create gradient function + def grad_fn(grad_output): + if x.requires_grad: + # Exponential derivative: d(e^x)/dx = e^x + x_grad = Variable(grad_output.data.data * exp_result) + x.backward(x_grad) + + return Variable(result_data, requires_grad=x.requires_grad, grad_fn=grad_fn) + ### END SOLUTION + +# %% nbgrader={"grade": false, "grade_id": "log-operation", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def log(x: Variable) -> Variable: + """ + Natural logarithm operation with gradient tracking: ln(x). + + Args: + x: Input Variable + + Returns: + Variable with logarithm applied and gradient function + + TODO: Implement logarithm operation with gradient computation. + + APPROACH: + 1. Compute forward pass: ln(x) + 2. Create gradient function using logarithm derivative + 3. 
Return Variable with result and grad_fn + + MATHEMATICAL RULE: + If z = ln(x), then dz/dx = 1/x + + EXAMPLE: + x = Variable(2.0) + y = log(x) # y.data = ln(2) ≈ 0.693 + y.backward() # x.grad = 1/2 = 0.5 + + HINTS: + - Use np.log() for forward pass + - Logarithm derivative: d(ln(x))/dx = 1/x + - Handle numerical stability for small x + """ + ### BEGIN SOLUTION + # Forward pass with numerical stability + clipped_x = np.clip(x.data.data, 1e-8, np.inf) # Avoid log(0) + result_data = Tensor(np.log(clipped_x)) + + # Create gradient function + def grad_fn(grad_output): + if x.requires_grad: + # Logarithm derivative: d(ln(x))/dx = 1/x + x_grad = Variable(grad_output.data.data / clipped_x) + x.backward(x_grad) + + return Variable(result_data, requires_grad=x.requires_grad, grad_fn=grad_fn) + ### END SOLUTION + +# %% nbgrader={"grade": false, "grade_id": "sum-operation", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def sum_all(x: Variable) -> Variable: + """ + Sum all elements operation with gradient tracking. + + Args: + x: Input Variable + + Returns: + Variable with sum and gradient function + + TODO: Implement sum operation with gradient computation. + + APPROACH: + 1. Compute forward pass: sum of all elements + 2. Create gradient function that broadcasts gradient back + 3. 
Return Variable with result and grad_fn + + MATHEMATICAL RULE: + If z = sum(x), then dz/dx_i = 1 for all i + + EXAMPLE: + x = Variable([[1, 2], [3, 4]]) + y = sum_all(x) # y.data = 10 + y.backward() # x.grad = [[1, 1], [1, 1]] + + HINTS: + - Use np.sum() for forward pass + - Gradient is ones with same shape as input + - This is used for loss computation + """ + ### BEGIN SOLUTION + # Forward pass + result_data = Tensor(np.sum(x.data.data)) + + # Create gradient function + def grad_fn(grad_output): + if x.requires_grad: + # Sum gradient: broadcasts to all elements + x_grad = Variable(grad_output.data.data * np.ones_like(x.data.data)) + x.backward(x_grad) + + return Variable(result_data, requires_grad=x.requires_grad, grad_fn=grad_fn) + ### END SOLUTION + +# %% nbgrader={"grade": false, "grade_id": "mean-operation", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def mean(x: Variable) -> Variable: + """ + Mean operation with gradient tracking. + + Args: + x: Input Variable + + Returns: + Variable with mean and gradient function + + TODO: Implement mean operation with gradient computation. + + APPROACH: + 1. Compute forward pass: mean of all elements + 2. Create gradient function that distributes gradient evenly + 3. 
Return Variable with result and grad_fn + + MATHEMATICAL RULE: + If z = mean(x), then dz/dx_i = 1/n for all i (where n is number of elements) + + EXAMPLE: + x = Variable([[1, 2], [3, 4]]) + y = mean(x) # y.data = 2.5 + y.backward() # x.grad = [[0.25, 0.25], [0.25, 0.25]] + + HINTS: + - Use np.mean() for forward pass + - Gradient is 1/n for each element + - This is commonly used for loss computation + """ + ### BEGIN SOLUTION + # Forward pass + result_data = Tensor(np.mean(x.data.data)) + + # Create gradient function + def grad_fn(grad_output): + if x.requires_grad: + # Mean gradient: 1/n for each element + n = x.data.size + x_grad = Variable(grad_output.data.data * np.ones_like(x.data.data) / n) + x.backward(x_grad) + + return Variable(result_data, requires_grad=x.requires_grad, grad_fn=grad_fn) + ### END SOLUTION + +# %% [markdown] +""" +## Step 9: Gradient Utilities and Helper Functions + +### Gradient Management +- **Gradient Clipping**: Prevent exploding gradients +- **Gradient Checking**: Verify gradient correctness +- **Parameter Collection**: Gather all parameters for optimization + +### Debugging Tools +- **Gradient Visualization**: Inspect gradient flow +- **Computational Graph**: Visualize the computation graph +- **Gradient Statistics**: Monitor gradient magnitudes +""" + +# %% nbgrader={"grade": false, "grade_id": "gradient-utilities", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def clip_gradients(variables: List[Variable], max_norm: float = 1.0) -> None: + """ + Clip gradients to prevent exploding gradients. + + Args: + variables: List of Variables to clip gradients for + max_norm: Maximum gradient norm allowed + + TODO: Implement gradient clipping. + + APPROACH: + 1. Compute total gradient norm across all variables + 2. If norm exceeds max_norm, scale all gradients down + 3. 
Modify gradients in-place + + MATHEMATICAL RULE: + If ||g|| > max_norm, then g := g * (max_norm / ||g||) + + EXAMPLE: + variables = [w1, w2, b1, b2] + clip_gradients(variables, max_norm=1.0) + + HINTS: + - Compute L2 norm of all gradients combined + - Scale factor = max_norm / total_norm + - Only clip if total_norm > max_norm + """ + ### BEGIN SOLUTION + # Compute total gradient norm + total_norm = 0.0 + for var in variables: + if var.grad is not None: + total_norm += np.sum(var.grad.data.data ** 2) + total_norm = np.sqrt(total_norm) + + # Clip if necessary + if total_norm > max_norm: + scale_factor = max_norm / total_norm + for var in variables: + if var.grad is not None: + var.grad.data._data *= scale_factor + ### END SOLUTION + +# %% nbgrader={"grade": false, "grade_id": "collect-parameters", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def collect_parameters(*modules) -> List[Variable]: + """ + Collect all parameters from modules for optimization. + + Args: + *modules: Variable number of modules/objects with parameters + + Returns: + List of all Variables that require gradients + + TODO: Implement parameter collection. + + APPROACH: + 1. Iterate through all provided modules + 2. Find all Variable attributes that require gradients + 3. 
Return list of all such Variables + + EXAMPLE: + layer1 = SomeLayer() + layer2 = SomeLayer() + params = collect_parameters(layer1, layer2) + + HINTS: + - Use hasattr() and getattr() to find Variable attributes + - Check if attribute is Variable and requires_grad + - Handle different module types gracefully + """ + ### BEGIN SOLUTION + parameters = [] + for module in modules: + if hasattr(module, '__dict__'): + for attr_name, attr_value in module.__dict__.items(): + if isinstance(attr_value, Variable) and attr_value.requires_grad: + parameters.append(attr_value) + return parameters + ### END SOLUTION + +# %% nbgrader={"grade": false, "grade_id": "zero-gradients", "locked": false, "schema_version": 3, "solution": true, "task": false} +#| export +def zero_gradients(variables: List[Variable]) -> None: + """ + Zero out gradients for all variables. + + Args: + variables: List of Variables to zero gradients for + + TODO: Implement gradient zeroing. + + APPROACH: + 1. Iterate through all variables + 2. Call zero_grad() on each variable + 3. Handle None gradients gracefully + + EXAMPLE: + parameters = [w1, w2, b1, b2] + zero_gradients(parameters) + + HINTS: + - Use the zero_grad() method on each Variable + - Check if variable has gradients before zeroing + - This is typically called before each training step + """ + ### BEGIN SOLUTION + for var in variables: + if var.grad is not None: + var.zero_grad() + ### END SOLUTION + +# %% [markdown] +""" +## Step 10: Advanced Testing + +Let's test our advanced features and optimizations. 
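A useful companion to these tests is the gradient check mentioned in Step 9: compare each analytic rule against a central finite difference. The helper below is a standalone NumPy sketch (the name `numerical_gradient` is ours, not part of the module's API) covering the power, exponential, and logarithm rules implemented above.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of df/dx for a scalar function f at point x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Verify the analytic derivatives used by power(), exp(), and log()
checks = [
    (lambda v: v ** 3, lambda v: 3 * v ** 2, 2.0),  # power rule: d(x^3)/dx = 3x^2
    (np.exp,           np.exp,               1.0),  # d(e^x)/dx = e^x
    (np.log,           lambda v: 1.0 / v,    2.0),  # d(ln x)/dx = 1/x
]

for f, analytic, point in checks:
    estimate = numerical_gradient(f, point)
    assert abs(estimate - analytic(point)) < 1e-4, (estimate, analytic(point))
```

If an analytic gradient ever drifts from its numerical estimate by more than about `1e-4`, the corresponding `grad_fn` is the first place to look.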
+""" + +# %% nbgrader={"grade": true, "grade_id": "test-advanced-operations", "locked": true, "points": 20, "schema_version": 3, "solution": false, "task": false} +def test_advanced_operations(): + """Test advanced mathematical operations.""" + print("🔬 Testing advanced operations...") + + # Test power operation + print("📊 Testing power operation...") + x = Variable(2.0, requires_grad=True) + y = power(x, 3) # x^3 + + assert abs(y.data.data.item() - 8.0) < 1e-6, f"Power forward failed: expected 8.0, got {y.data.data.item()}" + + y.backward() + # Gradient: d(x^3)/dx = 3x^2 = 3 * 4 = 12 + assert abs(x.grad.data.data.item() - 12.0) < 1e-6, f"Power gradient failed: expected 12.0, got {x.grad.data.data.item()}" + print("✅ Power operation test passed!") + + # Test exponential operation + print("📊 Testing exponential operation...") + x = Variable(1.0, requires_grad=True) + y = exp(x) # e^x + + expected_exp = np.exp(1.0) + assert abs(y.data.data.item() - expected_exp) < 1e-6, f"Exp forward failed: expected {expected_exp}, got {y.data.data.item()}" + + y.backward() + # Gradient: d(e^x)/dx = e^x + assert abs(x.grad.data.data.item() - expected_exp) < 1e-6, f"Exp gradient failed: expected {expected_exp}, got {x.grad.data.data.item()}" + print("✅ Exponential operation test passed!") + + # Test logarithm operation + print("📊 Testing logarithm operation...") + x = Variable(2.0, requires_grad=True) + y = log(x) # ln(x) + + expected_log = np.log(2.0) + assert abs(y.data.data.item() - expected_log) < 1e-6, f"Log forward failed: expected {expected_log}, got {y.data.data.item()}" + + y.backward() + # Gradient: d(ln(x))/dx = 1/x = 1/2 = 0.5 + assert abs(x.grad.data.data.item() - 0.5) < 1e-6, f"Log gradient failed: expected 0.5, got {x.grad.data.data.item()}" + print("✅ Logarithm operation test passed!") + + # Test sum operation + print("📊 Testing sum operation...") + x = Variable([[1.0, 2.0], [3.0, 4.0]], requires_grad=True) + y = sum_all(x) # sum of all elements + + assert 
abs(y.data.data.item() - 10.0) < 1e-6, f"Sum forward failed: expected 10.0, got {y.data.data.item()}" + + y.backward() + # Gradient: all elements should be 1 + expected_grad = np.ones((2, 2)) + np.testing.assert_array_almost_equal(x.grad.data.data, expected_grad) + print("✅ Sum operation test passed!") + + # Test mean operation + print("📊 Testing mean operation...") + x = Variable([[1.0, 2.0], [3.0, 4.0]], requires_grad=True) + y = mean(x) # mean of all elements + + assert abs(y.data.data.item() - 2.5) < 1e-6, f"Mean forward failed: expected 2.5, got {y.data.data.item()}" + + y.backward() + # Gradient: all elements should be 1/4 = 0.25 + expected_grad = np.ones((2, 2)) * 0.25 + np.testing.assert_array_almost_equal(x.grad.data.data, expected_grad) + print("✅ Mean operation test passed!") + + print("🎉 All advanced operation tests passed!") + return True + +# Run the test +success = test_advanced_operations() + +# %% nbgrader={"grade": true, "grade_id": "test-gradient-utilities", "locked": true, "points": 15, "schema_version": 3, "solution": false, "task": false} +def test_gradient_utilities(): + """Test gradient utility functions.""" + print("🔬 Testing gradient utilities...") + + # Test gradient clipping + print("📊 Testing gradient clipping...") + x = Variable(1.0, requires_grad=True) + y = Variable(1.0, requires_grad=True) + + # Create large gradients + z = multiply(x, 10.0) # Large gradient for x + w = multiply(y, 10.0) # Large gradient for y + loss = add(z, w) + loss.backward() + + # Check gradients are large before clipping + assert abs(x.grad.data.data.item() - 10.0) < 1e-6 + assert abs(y.grad.data.data.item() - 10.0) < 1e-6 + + # Clip gradients + clip_gradients([x, y], max_norm=1.0) + + # Check gradients are clipped + total_norm = np.sqrt(x.grad.data.data.item()**2 + y.grad.data.data.item()**2) + assert abs(total_norm - 1.0) < 1e-6, f"Gradient clipping failed: total norm {total_norm}, expected 1.0" + print("✅ Gradient clipping test passed!") + + # Test zero 
gradients + print("📊 Testing zero gradients...") + # Gradients should be non-zero before zeroing + assert abs(x.grad.data.data.item()) > 1e-6 + assert abs(y.grad.data.data.item()) > 1e-6 + + # Zero gradients + zero_gradients([x, y]) + + # Check gradients are zero + assert abs(x.grad.data.data.item()) < 1e-6 + assert abs(y.grad.data.data.item()) < 1e-6 + print("✅ Zero gradients test passed!") + + print("🎉 All gradient utility tests passed!") + return True + +# Run the test +success = test_gradient_utilities() + +# %% [markdown] +""" +## Step 11: Complete ML Pipeline Example + +Let's demonstrate a complete machine learning pipeline using our autograd system. +""" + +# %% nbgrader={"grade": true, "grade_id": "test-complete-pipeline", "locked": true, "points": 20, "schema_version": 3, "solution": false, "task": false} +def test_complete_ml_pipeline(): + """Test complete ML pipeline with autograd.""" + print("🔬 Testing complete ML pipeline...") + + # Create a simple regression problem: y = 2x + 1 + noise + print("📊 Setting up regression problem...") + + # Training data + x_data = [1.0, 2.0, 3.0, 4.0, 5.0] + y_data = [3.1, 4.9, 7.2, 9.1, 10.8] # Approximately 2x + 1 with noise + + # Model parameters + w = Variable(0.1, requires_grad=True) # Weight + b = Variable(0.0, requires_grad=True) # Bias + + # Training loop + learning_rate = 0.01 + num_epochs = 100 + + print("📊 Training model...") + for epoch in range(num_epochs): + total_loss = Variable(0.0, requires_grad=False) + + # Forward pass for all data points + for x_val, y_val in zip(x_data, y_data): + x = Variable(x_val, requires_grad=False) + y_target = Variable(y_val, requires_grad=False) + + # Prediction: y_pred = w * x + b + y_pred = add(multiply(w, x), b) + + # Loss: MSE = (y_pred - y_target)^2 + diff = subtract(y_pred, y_target) + loss = multiply(diff, diff) + + # Accumulate loss + total_loss = add(total_loss, loss) + + # Backward pass + total_loss.backward() + + # Update parameters + w.data._data -= learning_rate 
* w.grad.data.data + b.data._data -= learning_rate * b.grad.data.data + + # Zero gradients for next iteration + zero_gradients([w, b]) + + # Print progress + if epoch % 20 == 0: + print(f" Epoch {epoch}: Loss = {total_loss.data.data.item():.4f}, w = {w.data.data.item():.4f}, b = {b.data.data.item():.4f}") + + # Check final parameters + print("📊 Checking final parameters...") + final_w = w.data.data.item() + final_b = b.data.data.item() + + # Should be close to true values: w=2, b=1 + assert abs(final_w - 2.0) < 0.5, f"Weight not learned correctly: expected ~2.0, got {final_w}" + assert abs(final_b - 1.0) < 0.5, f"Bias not learned correctly: expected ~1.0, got {final_b}" + + print(f"✅ Model learned: w = {final_w:.3f}, b = {final_b:.3f}") + print("✅ Complete ML pipeline test passed!") + + # Test prediction on new data + print("📊 Testing prediction on new data...") + x_test = Variable(6.0, requires_grad=False) + y_pred = add(multiply(w, x_test), b) + expected_pred = 2.0 * 6.0 + 1.0 # True function value + + print(f" Prediction for x=6: {y_pred.data.data.item():.3f} (expected ~{expected_pred})") + assert abs(y_pred.data.data.item() - expected_pred) < 1.0, "Prediction accuracy insufficient" + + print("🎉 Complete ML pipeline test passed!") + return True + +# Run the test +success = test_complete_ml_pipeline() \ No newline at end of file diff --git a/modules/source/07_autograd/tests/test_autograd.py b/modules/source/07_autograd/tests/test_autograd.py deleted file mode 100644 index b1f49810..00000000 --- a/modules/source/07_autograd/tests/test_autograd.py +++ /dev/null @@ -1,698 +0,0 @@ -""" -Test suite for the autograd module. -This tests the autograd implementations using mock classes to avoid cross-module dependencies. 
-""" - -import pytest -import numpy as np -import sys -import os - -# Add the module path for testing -sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..')) - -# Import the autograd module directly -from autograd_dev import Variable, add, multiply, subtract, divide, relu_with_grad, sigmoid_with_grad - - -class MockTensor: - """Mock Tensor class for testing autograd without dependencies.""" - - def __init__(self, data): - if isinstance(data, (int, float)): - self._data = np.array(data, dtype=np.float32) - elif isinstance(data, list): - self._data = np.array(data, dtype=np.float32) - elif isinstance(data, np.ndarray): - self._data = data.astype(np.float32) - else: - self._data = np.array(data, dtype=np.float32) - - @property - def data(self): - return self._data - - @property - def shape(self): - return self._data.shape - - @property - def size(self): - return self._data.size - - def __add__(self, other): - if isinstance(other, MockTensor): - return MockTensor(self._data + other._data) - else: - return MockTensor(self._data + other) - - def __mul__(self, other): - if isinstance(other, MockTensor): - return MockTensor(self._data * other._data) - else: - return MockTensor(self._data * other) - - def __sub__(self, other): - if isinstance(other, MockTensor): - return MockTensor(self._data - other._data) - else: - return MockTensor(self._data - other) - - def __truediv__(self, other): - if isinstance(other, MockTensor): - return MockTensor(self._data / other._data) - else: - return MockTensor(self._data / other) - - def item(self): - return self._data.item() - - -class TestVariableCreation: - """Test Variable creation and basic properties.""" - - def test_variable_from_scalar(self): - """Test creating Variable from scalar values.""" - # Float scalar - v1 = Variable(5.0) - assert v1.shape == () - assert v1.size == 1 - assert v1.requires_grad == True - assert v1.is_leaf == True - assert v1.grad is None - - # Integer scalar - v2 = Variable(42) - assert v2.shape 
== ()
-        assert v2.size == 1
-        assert abs(v2.data.data.item() - 42.0) < 1e-6
-
-    def test_variable_from_list(self):
-        """Test creating Variable from list."""
-        v = Variable([1.0, 2.0, 3.0])
-        assert v.shape == (3,)
-        assert v.size == 3
-        assert v.requires_grad == True
-        assert v.is_leaf == True
-        np.testing.assert_array_almost_equal(v.data.data, [1.0, 2.0, 3.0])
-
-    def test_variable_from_numpy(self):
-        """Test creating Variable from numpy array."""
-        arr = np.array([[1.0, 2.0], [3.0, 4.0]])
-        v = Variable(arr)
-        assert v.shape == (2, 2)
-        assert v.size == 4
-        np.testing.assert_array_almost_equal(v.data.data, arr)
-
-    def test_variable_requires_grad_flag(self):
-        """Test requires_grad flag functionality."""
-        v1 = Variable(5.0, requires_grad=True)
-        assert v1.requires_grad == True
-
-        v2 = Variable(5.0, requires_grad=False)
-        assert v2.requires_grad == False
-
-    def test_variable_with_grad_fn(self):
-        """Test Variable with gradient function (non-leaf)."""
-        def dummy_grad_fn(grad):
-            pass
-
-        v = Variable(5.0, requires_grad=True, grad_fn=dummy_grad_fn)
-        assert v.requires_grad == True
-        assert v.is_leaf == False
-        assert v.grad_fn == dummy_grad_fn
-
-    def test_variable_repr(self):
-        """Test string representation of Variable."""
-        v = Variable(5.0)
-        repr_str = repr(v)
-        assert 'Variable' in repr_str
-        assert 'requires_grad' in repr_str
-
-
-class TestBasicOperations:
-    """Test basic arithmetic operations with gradient tracking."""
-
-    def test_addition_operation(self):
-        """Test addition operation and gradients."""
-        x = Variable(2.0, requires_grad=True)
-        y = Variable(3.0, requires_grad=True)
-        z = add(x, y)
-
-        # Test forward pass
-        assert abs(z.data.data.item() - 5.0) < 1e-6
-        assert z.requires_grad == True
-        assert z.is_leaf == False
-
-        # Test backward pass
-        z.backward()
-        assert abs(x.grad.data.data.item() - 1.0) < 1e-6
-        assert abs(y.grad.data.data.item() - 1.0) < 1e-6
-
-    def test_multiplication_operation(self):
-        """Test multiplication operation and gradients."""
-        x = Variable(2.0, requires_grad=True)
-        y = Variable(3.0, requires_grad=True)
-        z = multiply(x, y)
-
-        # Test forward pass
-        assert abs(z.data.data.item() - 6.0) < 1e-6
-        assert z.requires_grad == True
-        assert z.is_leaf == False
-
-        # Test backward pass
-        z.backward()
-        assert abs(x.grad.data.data.item() - 3.0) < 1e-6  # dy/dx = y = 3
-        assert abs(y.grad.data.data.item() - 2.0) < 1e-6  # dy/dy = x = 2
-
-    def test_subtraction_operation(self):
-        """Test subtraction operation and gradients."""
-        x = Variable(5.0, requires_grad=True)
-        y = Variable(3.0, requires_grad=True)
-        z = subtract(x, y)
-
-        # Test forward pass
-        assert abs(z.data.data.item() - 2.0) < 1e-6
-        assert z.requires_grad == True
-        assert z.is_leaf == False
-
-        # Test backward pass
-        z.backward()
-        assert abs(x.grad.data.data.item() - 1.0) < 1e-6  # dz/dx = 1
-        assert abs(y.grad.data.data.item() - (-1.0)) < 1e-6  # dz/dy = -1
-
-    def test_division_operation(self):
-        """Test division operation and gradients."""
-        x = Variable(6.0, requires_grad=True)
-        y = Variable(2.0, requires_grad=True)
-        z = divide(x, y)
-
-        # Test forward pass
-        assert abs(z.data.data.item() - 3.0) < 1e-6
-        assert z.requires_grad == True
-        assert z.is_leaf == False
-
-        # Test backward pass
-        z.backward()
-        assert abs(x.grad.data.data.item() - 0.5) < 1e-6  # dz/dx = 1/y = 1/2
-        assert abs(y.grad.data.data.item() - (-1.5)) < 1e-6  # dz/dy = -x/y² = -6/4
-
-    def test_operations_with_constants(self):
-        """Test operations with constant values."""
-        x = Variable(2.0, requires_grad=True)
-
-        # Addition with constant
-        z1 = add(x, 3.0)
-        assert abs(z1.data.data.item() - 5.0) < 1e-6
-        z1.backward()
-        assert abs(x.grad.data.data.item() - 1.0) < 1e-6
-
-        # Reset gradient
-        x.zero_grad()
-
-        # Multiplication with constant
-        z2 = multiply(x, 4.0)
-        assert abs(z2.data.data.item() - 8.0) < 1e-6
-        z2.backward()
-        assert abs(x.grad.data.data.item() - 4.0) < 1e-6
-
-    def test_no_grad_propagation(self):
-        """Test that gradients don't propagate when requires_grad=False."""
-        x = Variable(2.0, requires_grad=False)
-        y = Variable(3.0, requires_grad=True)
-        z = add(x, y)
-
-        z.backward()
-        assert x.grad is None  # No gradient for x
-        assert abs(y.grad.data.data.item() - 1.0) < 1e-6
-
-
-class TestChainRule:
-    """Test chain rule implementation with complex expressions."""
-
-    def test_simple_chain_rule(self):
-        """Test f(x, y) = (x + y) * (x - y) = x² - y²."""
-        x = Variable(3.0, requires_grad=True)
-        y = Variable(2.0, requires_grad=True)
-
-        # Forward pass
-        sum_xy = add(x, y)
-        diff_xy = subtract(x, y)
-        result = multiply(sum_xy, diff_xy)
-
-        # Check forward pass
-        assert abs(result.data.data.item() - 5.0) < 1e-6  # (3+2)*(3-2) = 5
-
-        # Backward pass
-        result.backward()
-
-        # Check gradients: df/dx = 2x = 6, df/dy = -2y = -4
-        assert abs(x.grad.data.data.item() - 6.0) < 1e-6
-        assert abs(y.grad.data.data.item() - (-4.0)) < 1e-6
-
-    def test_cubic_function(self):
-        """Test f(x) = x³ using x * x * x."""
-        x = Variable(2.0, requires_grad=True)
-
-        # Forward pass
-        x_squared = multiply(x, x)
-        x_cubed = multiply(x_squared, x)
-
-        # Check forward pass
-        assert abs(x_cubed.data.data.item() - 8.0) < 1e-6  # 2³ = 8
-
-        # Backward pass
-        x_cubed.backward()
-
-        # Check gradient: df/dx = 3x² = 12
-        assert abs(x.grad.data.data.item() - 12.0) < 1e-6
-
-    def test_complex_expression(self):
-        """Test f(x, y) = (x * y) + (x / y)."""
-        x = Variable(4.0, requires_grad=True)
-        y = Variable(2.0, requires_grad=True)
-
-        # Forward pass
-        product = multiply(x, y)
-        quotient = divide(x, y)
-        result = add(product, quotient)
-
-        # Check forward pass: (4*2) + (4/2) = 8 + 2 = 10
-        assert abs(result.data.data.item() - 10.0) < 1e-6
-
-        # Backward pass
-        result.backward()
-
-        # Check gradients: df/dx = y + 1/y = 2 + 0.5 = 2.5
-        # df/dy = x - x/y² = 4 - 4/4 = 3
-        assert abs(x.grad.data.data.item() - 2.5) < 1e-6
-        assert abs(y.grad.data.data.item() - 3.0) < 1e-6
-
-    def test_gradient_accumulation(self):
-        """Test that gradients accumulate correctly."""
-        x = Variable(2.0, requires_grad=True)
-
-        # First computation
-        y1 = multiply(x, 3.0)
-        y1.backward()
-        first_grad = x.grad.data.data.item()
-
-        # Second computation (should accumulate)
-        y2 = multiply(x, 4.0)
-        y2.backward()
-        second_grad = x.grad.data.data.item()
-
-        # Gradient should accumulate: 3 + 4 = 7
-        assert abs(second_grad - 7.0) < 1e-6
-
-    def test_zero_grad_functionality(self):
-        """Test zero_grad functionality."""
-        x = Variable(2.0, requires_grad=True)
-        y = multiply(x, 3.0)
-        y.backward()
-
-        # Check gradient exists
-        assert x.grad is not None
-        assert abs(x.grad.data.data.item() - 3.0) < 1e-6
-
-        # Zero the gradient
-        x.zero_grad()
-        assert abs(x.grad.data.data.item() - 0.0) < 1e-6
-
-
-class TestActivationGradients:
-    """Test activation functions with gradient computation."""
-
-    def test_relu_activation(self):
-        """Test ReLU activation and its gradient."""
-        # Test positive input
-        x1 = Variable(2.0, requires_grad=True)
-        y1 = relu_with_grad(x1)
-
-        assert abs(y1.data.data.item() - 2.0) < 1e-6  # ReLU(2) = 2
-        y1.backward()
-        assert abs(x1.grad.data.data.item() - 1.0) < 1e-6  # gradient = 1 for x > 0
-
-        # Test negative input
-        x2 = Variable(-1.0, requires_grad=True)
-        y2 = relu_with_grad(x2)
-
-        assert abs(y2.data.data.item() - 0.0) < 1e-6  # ReLU(-1) = 0
-        y2.backward()
-        assert abs(x2.grad.data.data.item() - 0.0) < 1e-6  # gradient = 0 for x < 0
-
-        # Test zero input
-        x3 = Variable(0.0, requires_grad=True)
-        y3 = relu_with_grad(x3)
-
-        assert abs(y3.data.data.item() - 0.0) < 1e-6  # ReLU(0) = 0
-        y3.backward()
-        assert abs(x3.grad.data.data.item() - 0.0) < 1e-6  # gradient = 0 for x = 0
-
-    def test_sigmoid_activation(self):
-        """Test Sigmoid activation and its gradient."""
-        # Test zero input
-        x1 = Variable(0.0, requires_grad=True)
-        y1 = sigmoid_with_grad(x1)
-
-        assert abs(y1.data.data.item() - 0.5) < 1e-6  # sigmoid(0) = 0.5
-        y1.backward()
-        assert abs(x1.grad.data.data.item() - 0.25) < 1e-6  # gradient = 0.5 * 0.5 = 0.25
-
-        # Test positive input
-        x2 = Variable(2.0, requires_grad=True)
-        y2 = sigmoid_with_grad(x2)
-
-        expected_sigmoid = 1.0 / (1.0 + np.exp(-2.0))
-        assert abs(y2.data.data.item() - expected_sigmoid) < 1e-6
-
-        y2.backward()
-        expected_grad = expected_sigmoid * (1.0 - expected_sigmoid)
-        assert abs(x2.grad.data.data.item() - expected_grad) < 1e-6
-
-        # Test negative input
-        x3 = Variable(-1.0, requires_grad=True)
-        y3 = sigmoid_with_grad(x3)
-
-        expected_sigmoid = 1.0 / (1.0 + np.exp(1.0))
-        assert abs(y3.data.data.item() - expected_sigmoid) < 1e-6
-
-        y3.backward()
-        expected_grad = expected_sigmoid * (1.0 - expected_sigmoid)
-        assert abs(x3.grad.data.data.item() - expected_grad) < 1e-6
-
-    def test_activation_chaining(self):
-        """Test chaining activation functions."""
-        x = Variable(1.0, requires_grad=True)
-
-        # Chain: x -> ReLU -> Sigmoid
-        relu_out = relu_with_grad(x)
-        sigmoid_out = sigmoid_with_grad(relu_out)
-
-        # Forward pass
-        expected_relu = 1.0  # ReLU(1) = 1
-        expected_sigmoid = 1.0 / (1.0 + np.exp(-1.0))  # sigmoid(1)
-
-        assert abs(relu_out.data.data.item() - expected_relu) < 1e-6
-        assert abs(sigmoid_out.data.data.item() - expected_sigmoid) < 1e-6
-
-        # Backward pass
-        sigmoid_out.backward()
-
-        # Check that gradient flows through both activations
-        assert x.grad is not None
-        assert abs(x.grad.data.data.item()) > 1e-6  # Should have non-zero gradient
-
-
-class TestNeuralNetworkScenarios:
-    """Test autograd in realistic neural network scenarios."""
-
-    def test_simple_linear_layer(self):
-        """Test simple linear transformation: y = Wx + b."""
-        # Input
-        x = Variable(2.0, requires_grad=True)
-
-        # Parameters
-        w = Variable(0.5, requires_grad=True)
-        b = Variable(0.1, requires_grad=True)
-
-        # Forward pass
-        linear_out = add(multiply(x, w), b)  # y = x*w + b = 2*0.5 + 0.1 = 1.1
-
-        assert abs(linear_out.data.data.item() - 1.1) < 1e-6
-
-        # Backward pass
-        linear_out.backward()
-
-        # Check gradients
-        assert abs(x.grad.data.data.item() - 0.5) < 1e-6  # dy/dx = w = 0.5
-        assert abs(w.grad.data.data.item() - 2.0) < 1e-6  # dy/dw = x = 2.0
-        assert abs(b.grad.data.data.item() - 1.0) < 1e-6  # dy/db = 1 = 1.0
-
-    def test_two_layer_network(self):
-        """Test two-layer neural network."""
-        # Input
-        x = Variable(1.0, requires_grad=True)
-
-        # Layer 1 parameters
-        w1 = Variable(2.0, requires_grad=True)
-        b1 = Variable(0.5, requires_grad=True)
-
-        # Layer 2 parameters
-        w2 = Variable(1.5, requires_grad=True)
-        b2 = Variable(0.2, requires_grad=True)
-
-        # Forward pass
-        # Layer 1: h = x*w1 + b1 = 1*2 + 0.5 = 2.5
-        h = add(multiply(x, w1), b1)
-        # ReLU activation
-        h_relu = relu_with_grad(h)  # ReLU(2.5) = 2.5
-        # Layer 2: y = h*w2 + b2 = 2.5*1.5 + 0.2 = 3.95
-        y = add(multiply(h_relu, w2), b2)
-
-        assert abs(y.data.data.item() - 3.95) < 1e-6
-
-        # Backward pass
-        y.backward()
-
-        # Check that all parameters have gradients
-        assert x.grad is not None
-        assert w1.grad is not None
-        assert b1.grad is not None
-        assert w2.grad is not None
-        assert b2.grad is not None
-
-        # Check specific gradient values
-        assert abs(b2.grad.data.data.item() - 1.0) < 1e-6  # dy/db2 = 1
-        assert abs(w2.grad.data.data.item() - 2.5) < 1e-6  # dy/dw2 = h_relu = 2.5
-        assert abs(b1.grad.data.data.item() - 1.5) < 1e-6  # dy/db1 = w2 = 1.5
-        assert abs(w1.grad.data.data.item() - 1.5) < 1e-6  # dy/dw1 = x * w2 = 1 * 1.5
-        assert abs(x.grad.data.data.item() - 3.0) < 1e-6  # dy/dx = w1 * w2 = 2 * 1.5
-
-    def test_loss_computation(self):
-        """Test loss computation with gradients."""
-        # Prediction and target
-        pred = Variable(3.0, requires_grad=True)
-        target = Variable(2.0, requires_grad=False)
-
-        # Mean squared error: loss = (pred - target)²
-        diff = subtract(pred, target)  # 3 - 2 = 1
-        loss = multiply(diff, diff)  # 1² = 1
-
-        assert abs(loss.data.data.item() - 1.0) < 1e-6
-
-        # Backward pass
-        loss.backward()
-
-        # Check gradient: d_loss/d_pred = 2 * (pred - target) = 2 * 1 = 2
-        assert abs(pred.grad.data.data.item() - 2.0) < 1e-6
-        assert target.grad is None  # No gradient for target
-
-    def test_batch_processing_simulation(self):
-        """Test simulation of batch processing."""
-        # Simulate batch of 3 samples
-        x1 = Variable(1.0, requires_grad=True)
-        x2 = Variable(2.0, requires_grad=True)
-        x3 = Variable(3.0, requires_grad=True)
-
-        # Shared parameters
-        w = Variable(0.5, requires_grad=True)
-        b = Variable(0.1, requires_grad=True)
-
-        # Forward pass for each sample
-        y1 = add(multiply(x1, w), b)  # 1*0.5 + 0.1 = 0.6
-        y2 = add(multiply(x2, w), b)  # 2*0.5 + 0.1 = 1.1
-        y3 = add(multiply(x3, w), b)  # 3*0.5 + 0.1 = 1.6
-
-        # Compute batch loss (sum of individual losses)
-        loss1 = multiply(y1, y1)  # 0.6² = 0.36
-        loss2 = multiply(y2, y2)  # 1.1² = 1.21
-        loss3 = multiply(y3, y3)  # 1.6² = 2.56
-
-        batch_loss = add(add(loss1, loss2), loss3)  # 0.36 + 1.21 + 2.56 = 4.13
-
-        assert abs(batch_loss.data.data.item() - 4.13) < 1e-6
-
-        # Backward pass
-        batch_loss.backward()
-
-        # Check that gradients accumulated for shared parameters
-        assert w.grad is not None
-        assert b.grad is not None
-
-        # w gradient should be sum of individual contributions
-        # dL/dw = 2*y1*x1 + 2*y2*x2 + 2*y3*x3 = 2*(0.6*1 + 1.1*2 + 1.6*3) = 2*7.6 = 15.2
-        expected_w_grad = 2 * (0.6*1 + 1.1*2 + 1.6*3)
-        assert abs(w.grad.data.data.item() - expected_w_grad) < 1e-6
-
-        # b gradient should be sum of individual contributions
-        # dL/db = 2*y1 + 2*y2 + 2*y3 = 2*(0.6 + 1.1 + 1.6) = 2*3.3 = 6.6
-        expected_b_grad = 2 * (0.6 + 1.1 + 1.6)
-        assert abs(b.grad.data.data.item() - expected_b_grad) < 1e-6
-
-
-class TestEdgeCases:
-    """Test edge cases and error conditions."""
-
-    def test_zero_division_handling(self):
-        """Test division by zero handling."""
-        x = Variable(1.0, requires_grad=True)
-        y = Variable(0.0, requires_grad=True)
-
-        # This should not crash but may produce inf/nan
-        z = divide(x, y)
-
-        # Check that the operation completes
-        assert z.data.data.item() == np.inf or np.isnan(z.data.data.item())
-
-    def test_large_gradient_values(self):
-        """Test handling of large gradient values."""
-        x = Variable(100.0, requires_grad=True)
-        y = Variable(100.0, requires_grad=True)
-
-        # Large multiplication
-        z = multiply(x, y)  # 100 * 100 = 10000
-        z.backward()
-
-        # Gradients should be large but finite
-        assert np.isfinite(x.grad.data.data.item())
-        assert np.isfinite(y.grad.data.data.item())
-        assert abs(x.grad.data.data.item() - 100.0) < 1e-6
-        assert abs(y.grad.data.data.item() - 100.0) < 1e-6
-
-    def test_very_small_values(self):
-        """Test handling of very small values."""
-        x = Variable(1e-10, requires_grad=True)
-        y = Variable(2e-10, requires_grad=True)
-
-        z = add(x, y)
-        z.backward()
-
-        # Gradients should still be computed correctly
-        assert abs(x.grad.data.data.item() - 1.0) < 1e-6
-        assert abs(y.grad.data.data.item() - 1.0) < 1e-6
-
-    def test_mixed_requires_grad(self):
-        """Test operations with mixed requires_grad settings."""
-        x = Variable(2.0, requires_grad=True)
-        y = Variable(3.0, requires_grad=False)
-
-        z = multiply(x, y)
-
-        # Result should require gradients
-        assert z.requires_grad == True
-
-        z.backward()
-
-        # Only x should have gradients
-        assert x.grad is not None
-        assert y.grad is None
-        assert abs(x.grad.data.data.item() - 3.0) < 1e-6
-
-
-# Integration tests that combine multiple concepts
-class TestIntegration:
-    """Integration tests combining multiple autograd concepts."""
-
-    def test_complete_training_step(self):
-        """Test a complete training step simulation."""
-        # Model parameters
-        w1 = Variable(0.1, requires_grad=True)
-        b1 = Variable(0.0, requires_grad=True)
-        w2 = Variable(0.2, requires_grad=True)
-        b2 = Variable(0.0, requires_grad=True)
-
-        # Training data
-        x = Variable(1.5, requires_grad=False)
-        target = Variable(2.0, requires_grad=False)
-
-        # Forward pass
-        h1 = add(multiply(x, w1), b1)  # Linear layer 1
-        h1_relu = relu_with_grad(h1)  # ReLU activation
-        output = add(multiply(h1_relu, w2), b2)  # Linear layer 2
-
-        # Loss computation (MSE)
-        diff = subtract(output, target)
-        loss = multiply(diff, diff)
-
-        # Backward pass
-        loss.backward()
-
-        # Check that all parameters have gradients
-        assert w1.grad is not None
-        assert b1.grad is not None
-        assert w2.grad is not None
-        assert b2.grad is not None
-
-        # Simulate parameter update (gradient descent)
-        learning_rate = 0.01
-
-        # Save old parameter values
-        old_w1 = w1.data.data.item()
-        old_b1 = b1.data.data.item()
-        old_w2 = w2.data.data.item()
-        old_b2 = b2.data.data.item()
-
-        # Update parameters: param = param - lr * grad
-        w1.data._data -= learning_rate * w1.grad.data.data
-        b1.data._data -= learning_rate * b1.grad.data.data
-        w2.data._data -= learning_rate * w2.grad.data.data
-        b2.data._data -= learning_rate * b2.grad.data.data
-
-        # Check that parameters actually changed
-        assert abs(w1.data.data.item() - old_w1) > 1e-6
-        assert abs(b1.data.data.item() - old_b1) > 1e-6
-        assert abs(w2.data.data.item() - old_w2) > 1e-6
-        assert abs(b2.data.data.item() - old_b2) > 1e-6
-
-    def test_multi_output_gradients(self):
-        """Test gradients when multiple outputs depend on same input."""
-        x = Variable(2.0, requires_grad=True)
-
-        # Create multiple outputs from same input
-        y1 = multiply(x, 3.0)  # y1 = 3x
-        y2 = multiply(x, x)  # y2 = x²
-
-        # Combine outputs
-        combined = add(y1, y2)  # combined = 3x + x²
-
-        combined.backward()
-
-        # Gradient should be sum of individual contributions
-        # d(combined)/dx = d(3x)/dx + d(x²)/dx = 3 + 2x = 3 + 2*2 = 7
-        assert abs(x.grad.data.data.item() - 7.0) < 1e-6
-
-    def test_gradient_flow_through_complex_network(self):
-        """Test gradient flow through a more complex network."""
-        # Input
-        x = Variable(1.0, requires_grad=True)
-
-        # Create a diamond-shaped computation graph
-        #     x
-        #    / \
-        #   a   b
-        #    \ /
-        #     c
-
-        a = multiply(x, 2.0)  # a = 2x
-        b = add(x, 1.0)  # b = x + 1
-        c = multiply(a, b)  # c = a * b = 2x * (x + 1) = 2x² + 2x
-
-        # Expected: c = 2x² + 2x, so dc/dx = 4x + 2 = 4*1 + 2 = 6
-        c.backward()
-
-        assert abs(x.grad.data.data.item() - 6.0) < 1e-6
-
-    def test_nested_function_composition(self):
-        """Test deeply nested function composition."""
-        x = Variable(2.0, requires_grad=True)
-
-        # Create nested composition: f(g(h(x)))
-        h = multiply(x, 2.0)  # h(x) = 2x
-        g = add(h, 1.0)  # g(h(x)) = 2x + 1
-        f = multiply(g, g)  # f(g(h(x))) = (2x + 1)²
-
-        # Expected: f = (2x + 1)², so df/dx = 2(2x + 1) * 2 = 4(2x + 1) = 4(2*2 + 1) = 20
-        f.backward()
-
-        assert abs(x.grad.data.data.item() - 20.0) < 1e-6
\ No newline at end of file
diff --git a/modules/source/08_optimizers/optimizers_dev.py b/modules/source/08_optimizers/optimizers_dev.py
new file mode 100644
index 00000000..0519ecba
--- /dev/null
+++ b/modules/source/08_optimizers/optimizers_dev.py
@@ -0,0 +1 @@
+ 
\ No newline at end of file
diff --git a/pyproject.toml b/pyproject.toml
index db23b06d..3f9f5bf3 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -84,7 +84,6 @@ addopts = [
 ]
 testpaths = [
     "tests",
-    "modules/*/tests",
 ]
 python_files = ["test_*.py"]
 python_classes = ["Test*"]
diff --git a/tito/commands/status.py b/tito/commands/status.py
index 2051a708..de7d3056 100644
--- a/tito/commands/status.py
+++ b/tito/commands/status.py
@@ -52,9 +52,9 @@ class StatusCommand(BaseCommand):
         console = self.console
 
         # Scan modules directory
-        modules_dir = Path("modules")
+        modules_dir = Path("modules/source")
         if not modules_dir.exists():
-            console.print(Panel("[red]❌ modules/ directory not found[/red]",
+            console.print(Panel("[red]❌ modules/source/ directory not found[/red]",
                           title="Error", border_style="red"))
             return 1
 
@@ -150,14 +150,21 @@
         # Check for required files
         dev_file = module_dir / f"{module_name}_dev.py"
-        tests_dir = module_dir / "tests"
-        test_file = tests_dir / f"test_{module_name}.py"
         readme_file = module_dir / "README.md"
         metadata_file = module_dir / "module.yaml"
 
+        # Check for tests in main tests directory
+        # Extract short name from module directory name (e.g., "01_tensor" -> "tensor")
+        if module_name.startswith(tuple(f"{i:02d}_" for i in range(100))):
+            short_name = module_name[3:]  # Remove "00_" prefix
+        else:
+            short_name = module_name
+
+        main_test_file = Path("tests") / f"test_{short_name}.py"
+
         status = {
             'dev_file': dev_file.exists(),
-            'tests': test_file.exists(),
+            'tests': main_test_file.exists(),
             'readme': readme_file.exists(),
             'metadata_file': metadata_file.exists(),
         }
@@ -187,7 +194,13 @@
             return 'in_progress'
 
         # If tests exist, run them to determine status
-        test_file = f"modules/{module_name}/tests/test_{module_name}.py"
+        # Extract short name from module directory name (e.g., "01_tensor" -> "tensor")
+        if module_name.startswith(tuple(f"{i:02d}_" for i in range(100))):
+            short_name = module_name[3:]  # Remove "00_" prefix
+        else:
+            short_name = module_name
+
+        test_file = f"tests/test_{short_name}.py"
         try:
             # Run pytest quietly to check if tests pass
             result = subprocess.run(