# ---
# jupyter:
# jupytext:
# text_representation:
# extension: .py
# format_name: percent
# format_version: '1.3'
# jupytext_version: 1.17.1
# ---
# %% [markdown]
"""
# Networks - Neural Network Architectures
Welcome to the Networks module! This is where we compose layers into complete neural network architectures.
## Learning Goals
- Understand networks as function composition: `f(x) = layer_n(...layer_2(layer_1(x)))`
- Build the Sequential network architecture for composing layers
- Create common network patterns like MLPs (Multi-Layer Perceptrons)
- Visualize network architectures and understand their capabilities
- Master forward pass inference through complete networks
## Build → Use → Reflect
1. **Build**: Sequential networks that compose layers into complete architectures
2. **Use**: Create different network patterns and run inference
3. **Reflect**: How architecture design affects network behavior and capability
## What You'll Learn
By the end of this module, you'll understand:
- How simple layers combine to create complex behaviors
- The fundamental Sequential architecture pattern
- How to build MLPs with any number of layers
- Different network architectures (shallow, deep, wide)
- How neural networks approximate complex functions
"""
# %% nbgrader={"grade": false, "grade_id": "networks-imports", "locked": false, "schema_version": 3, "solution": false, "task": false}
#| default_exp core.dense
#| export
import numpy as np
import sys
import os
from typing import List, Union, Optional, Callable
import matplotlib.pyplot as plt
# Import all the building blocks we need - try package first, then local modules
try:
    from tinytorch.core.tensor import Tensor
    from tinytorch.core.layers import Dense
    from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax
except ImportError:
    # For development, import from local modules
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '01_tensor'))
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '02_activations'))
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '03_layers'))
    from tensor_dev import Tensor
    from activations_dev import ReLU, Sigmoid, Tanh, Softmax
    from layers_dev import Dense
# %% nbgrader={"grade": false, "grade_id": "networks-setup", "locked": false, "schema_version": 3, "solution": false, "task": false}
#| hide
#| export
def _should_show_plots():
    """Check if we should show plots (disable during testing)."""
    # Check multiple conditions that indicate we're in test mode
    is_testing = (
        'pytest' in sys.modules or
        'test' in sys.argv or
        os.environ.get('PYTEST_CURRENT_TEST') is not None or
        any('test' in arg for arg in sys.argv) or
        any('pytest' in arg for arg in sys.argv) or
        'tito' in sys.argv or
        any('tito' in arg for arg in sys.argv) or
        os.environ.get('TITO_TESTING') is not None
    )
    # Show plots only in development mode (i.e., not in test mode)
    return not is_testing
# %% nbgrader={"grade": false, "grade_id": "networks-welcome", "locked": false, "schema_version": 3, "solution": false, "task": false}
print("🔥 TinyTorch Networks Module")
print(f"NumPy version: {np.__version__}")
print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")
print("Ready to build neural network architectures!")
# %% [markdown]
"""
## 📦 Where This Code Lives in the Final Package
**Learning Side:** You work in `modules/source/05_dense/dense_dev.py`
**Building Side:** Code exports to `tinytorch.core.dense`
```python
# Final package structure:
from tinytorch.core.dense import Sequential, create_mlp  # Network architectures!
from tinytorch.core.layers import Dense, Conv2D  # Building blocks
from tinytorch.core.activations import ReLU, Sigmoid, Tanh  # Nonlinearity
from tinytorch.core.tensor import Tensor  # Foundation
```
**Why this matters:**
- **Learning:** Focused modules for deep understanding
- **Production:** Proper organization, like PyTorch's `torch.nn.Sequential`
- **Consistency:** All network architectures live together in `core.dense`
- **Integration:** Works seamlessly with layers, activations, and tensors
"""
# %% [markdown]
"""
## 🔧 DEVELOPMENT
"""
# %% [markdown]
"""
## Step 1: Understanding Neural Networks as Function Composition
### What is a Neural Network?
A neural network is simply **function composition** - chaining simple functions together to create complex behaviors:
```
f(x) = f_n(f_{n-1}(...f_2(f_1(x))))
```
### Real-World Analogy: Assembly Line
Think of an assembly line in a factory:
- **Input:** Raw materials (data)
- **Stations:** Each worker (layer) transforms the product
- **Output:** Final product (predictions)
### The Power of Composition
```python
# Simple functions
def add_one(x): return x + 1
def multiply_two(x): return x * 2
def square(x): return x * x
# Composed function
def complex_function(x):
    return square(multiply_two(add_one(x)))
# This is what neural networks do!
```
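A Sequential network generalizes this idea: instead of hard-coding the nesting, it loops over a list of functions. Here's a minimal sketch in plain Python (no TinyTorch needed), reusing the functions above:
```python
def compose(functions, x):
    """Apply each function in order: the output of one feeds the next."""
    for f in functions:
        x = f(x)
    return x

# compose([add_one, multiply_two, square], 2)
# == square(multiply_two(add_one(2))) == 36
```
This loop is exactly the pattern `Sequential.forward` will implement, with layers in place of plain functions.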
### Why This Matters
- **Universal Approximation:** MLPs can approximate any continuous function
- **Hierarchical Learning:** Early layers learn simple features, later layers learn complex patterns
- **Composability:** Mix and match layers to create custom architectures
- **Scalability:** Add more layers or make them wider as needed
### From Modules We've Built
- **Tensors:** The data containers that flow through networks
- **Activations:** The nonlinear transformations that enable complex behaviors
- **Layers:** The building blocks that transform data
Now let's build our first network architecture!
"""
# %% [markdown]
"""
## Step 2: Building the Sequential Network
### What is Sequential?
**Sequential** is the most fundamental network architecture - it applies layers in order:
```
Sequential([layer1, layer2, layer3])
→ f(x) = layer3(layer2(layer1(x)))
```
### Why Sequential Matters
- **Foundation:** Every neural network library has this pattern
- **Simplicity:** Easy to understand and implement
- **Flexibility:** Can compose any layers in any order
- **Building Block:** Foundation for more complex architectures
### The Sequential Pattern
```python
# PyTorch style
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10)
)

# Our TinyTorch style
model = Sequential([
    Dense(784, 128),
    ReLU(),
    Dense(128, 10)
])
```
Let's implement this fundamental architecture!
"""
# %% nbgrader={"grade": false, "grade_id": "sequential-class", "locked": false, "schema_version": 3, "solution": true, "task": false}
#| export
class Sequential:
    """
    Sequential Network: Composes layers in sequence

    The most fundamental network architecture.
    Applies layers in order: f(x) = layer_n(...layer_2(layer_1(x)))
    """

    def __init__(self, layers: Optional[List] = None):
        """
        Initialize Sequential network with layers.

        Args:
            layers: List of layers to compose in order (optional, defaults to empty list)

        TODO: Store the layers and implement forward pass

        APPROACH:
        1. Store the layers list as an instance variable
        2. Initialize an empty list if no layers are provided
        3. Prepare for the forward pass implementation

        EXAMPLE:
            Sequential([Dense(3, 4), ReLU(), Dense(4, 2)])
            creates a 3-layer network: Dense → ReLU → Dense

        HINTS:
        - Use self.layers to store the layers
        - Handle the empty initialization case
        """
        ### BEGIN SOLUTION
        self.layers = layers if layers is not None else []
        ### END SOLUTION

    def forward(self, x: Tensor) -> Tensor:
        """
        Forward pass through all layers in sequence.

        Args:
            x: Input tensor

        Returns:
            Output tensor after passing through all layers

        TODO: Implement sequential forward pass through all layers

        APPROACH:
        1. Start with the input tensor
        2. Apply each layer in sequence
        3. Each layer's output becomes the next layer's input
        4. Return the final output

        EXAMPLE:
            Input:          Tensor([[1, 2, 3]])
            Layer1 (Dense): Tensor([[1.4, 2.8]])
            Layer2 (ReLU):  Tensor([[1.4, 2.8]])
            Layer3 (Dense): Tensor([[0.7]])
            Output:         Tensor([[0.7]])

        HINTS:
        - Use a for loop: for layer in self.layers:
        - Apply each layer: x = layer(x)
        - The output of one layer becomes the input to the next
        - Return the final result
        """
        ### BEGIN SOLUTION
        # Apply each layer in sequence
        for layer in self.layers:
            x = layer(x)
        return x
        ### END SOLUTION

    def __call__(self, x: Tensor) -> Tensor:
        """Make the network callable: sequential(x) instead of sequential.forward(x)."""
        return self.forward(x)

    def add(self, layer):
        """Add a layer to the network."""
        self.layers.append(layer)
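# %%
# Quick demo (a minimal sketch): networks can also be built incrementally
# with add(), instead of passing all layers up front.
incremental = Sequential()
incremental.add(Dense(input_size=2, output_size=3))
incremental.add(ReLU())
print(f"Incrementally built network has {len(incremental.layers)} layers")  # 2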
# %% [markdown]
"""
### 🧪 Unit Test: Sequential Network
Let's test your Sequential network implementation! This is the foundation of all neural network architectures.
**This is a unit test** - it tests one specific class (Sequential network) in isolation.
"""
# %% nbgrader={"grade": true, "grade_id": "test-sequential-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false}
# Test Sequential network immediately after implementation
print("🔬 Unit Test: Sequential Network...")

# Create a simple 2-layer network: 3 → 4 → 2
try:
    network = Sequential([
        Dense(input_size=3, output_size=4),
        ReLU(),
        Dense(input_size=4, output_size=2),
        Sigmoid()
    ])
    print(f"Network created with {len(network.layers)} layers")
    print("✅ Sequential network creation successful")

    # Test with sample data
    x = Tensor([[1.0, 2.0, 3.0]])
    print(f"Input: {x}")

    # Forward pass
    y = network(x)
    print(f"Output: {y}")
    print(f"Output shape: {y.shape}")

    # Verify the network works
    assert y.shape == (1, 2), f"Expected shape (1, 2), got {y.shape}"
    print("✅ Sequential network produces correct output shape")

    # Test that sigmoid output is in the valid range
    assert np.all(y.data >= 0) and np.all(y.data <= 1), "Sigmoid output should be between 0 and 1"
    print("✅ Sequential network output is in valid range")

    # Test that layers are stored correctly
    assert len(network.layers) == 4, f"Expected 4 layers, got {len(network.layers)}"
    print("✅ Sequential network stores layers correctly")

    # Test batch processing
    x_batch = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    y_batch = network(x_batch)
    assert y_batch.shape == (2, 2), f"Expected batch shape (2, 2), got {y_batch.shape}"
    print("✅ Sequential network handles batch processing")

except Exception as e:
    print(f"❌ Sequential network test failed: {e}")
    raise

# Show the network architecture
print("🎯 Sequential network behavior:")
print("   Applies layers in sequence: f(g(h(x)))")
print("   Input flows through each layer in order")
print("   Output of layer i becomes input of layer i+1")
print("📈 Progress: Sequential network ✓")
# %% [markdown]
"""
## Step 3: Building Multi-Layer Perceptrons (MLPs)
### What is an MLP?
A **Multi-Layer Perceptron** is the classic neural network architecture:
```
Input → Dense → Activation → Dense → Activation → ... → Dense → Output
```
### Why MLPs are Important
- **Universal approximation**: Can approximate any continuous function
- **Foundation**: Basis for understanding all neural networks
- **Versatile**: Works for classification, regression, and more
- **Simple**: Easy to understand and implement
### MLP Architecture Pattern
```
create_mlp(3, [4, 2], 1) creates:
Dense(3→4) → ReLU → Dense(4→2) → ReLU → Dense(2→1) → Sigmoid
```
### Real-World Applications
- **Tabular data**: Customer analytics, financial modeling
- **Feature learning**: Learning representations from raw data
- **Classification**: Spam detection, medical diagnosis
- **Regression**: Price prediction, time series forecasting
### The MLP Factory Pattern
Instead of manually creating each layer, we'll build a function that creates MLPs automatically!
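As a sanity check on the example above, here's a quick parameter count for `create_mlp(3, [4, 2], 1)`, assuming each Dense layer stores a weight matrix plus a bias vector (activations add no parameters):
```python
# Dense(3→4): 3*4 weights + 4 biases = 16
# Dense(4→2): 4*2 weights + 2 biases = 10
# Dense(2→1): 2*1 weights + 1 bias   =  3
sizes = [3, 4, 2, 1]
print(sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:])))  # 29
```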
"""
# %% nbgrader={"grade": false, "grade_id": "create-mlp", "locked": false, "schema_version": 3, "solution": true, "task": false}
#| export
def create_mlp(input_size: int, hidden_sizes: List[int], output_size: int,
               activation=ReLU, output_activation=Sigmoid) -> Sequential:
    """
    Create a Multi-Layer Perceptron (MLP) network.

    Args:
        input_size: Number of input features
        hidden_sizes: List of hidden layer sizes
        output_size: Number of output features
        activation: Activation function for hidden layers (default: ReLU)
        output_activation: Activation function for output layer (default: Sigmoid)

    Returns:
        Sequential network with MLP architecture

    TODO: Implement MLP creation with alternating Dense and activation layers.

    APPROACH:
    1. Start with an empty list of layers
    2. Add layers in this pattern:
       - Dense(input_size → first_hidden_size)
       - Activation()
       - Dense(first_hidden_size → second_hidden_size)
       - Activation()
       - ...
       - Dense(last_hidden_size → output_size)
       - Output_activation()
    3. Return Sequential(layers)

    EXAMPLE:
        create_mlp(3, [4, 2], 1) creates:
        Dense(3→4) → ReLU → Dense(4→2) → ReLU → Dense(2→1) → Sigmoid

    HINTS:
    - Start with layers = []
    - Track current_size, starting with input_size
    - For each hidden_size: add Dense(current_size, hidden_size), then activation
    - Finally add Dense(last_hidden_size, output_size), then output_activation
    - Return Sequential(layers)
    """
    layers = []
    current_size = input_size

    # Add hidden layers with activations
    for hidden_size in hidden_sizes:
        layers.append(Dense(current_size, hidden_size))
        layers.append(activation())
        current_size = hidden_size

    # Add output layer with output activation
    layers.append(Dense(current_size, output_size))
    layers.append(output_activation())

    return Sequential(layers)
# %% [markdown]
"""
### 🧪 Unit Test: MLP Creation
Let's test your MLP creation function! This builds complete neural networks with a single function call.
**This is a unit test** - it tests one specific function (create_mlp) in isolation.
"""
# %% nbgrader={"grade": true, "grade_id": "test-mlp-immediate", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false}
# Test MLP creation immediately after implementation
print("🔬 Unit Test: MLP Creation...")

# Create a simple MLP: 3 → 4 → 2 → 1
try:
    mlp = create_mlp(input_size=3, hidden_sizes=[4, 2], output_size=1)
    print(f"MLP created with {len(mlp.layers)} layers")
    print("✅ MLP creation successful")

    # Test the structure - should have 6 layers: Dense, ReLU, Dense, ReLU, Dense, Sigmoid
    expected_layers = 6  # 3 Dense + 2 ReLU + 1 Sigmoid
    assert len(mlp.layers) == expected_layers, f"Expected {expected_layers} layers, got {len(mlp.layers)}"
    print("✅ MLP has correct number of layers")

    # Test layer types
    layer_types = [type(layer).__name__ for layer in mlp.layers]
    expected_pattern = ['Dense', 'ReLU', 'Dense', 'ReLU', 'Dense', 'Sigmoid']
    assert layer_types == expected_pattern, f"Expected pattern {expected_pattern}, got {layer_types}"
    print("✅ MLP follows correct layer pattern")

    # Test with sample data
    x = Tensor([[1.0, 2.0, 3.0]])
    y = mlp(x)
    print(f"MLP input: {x}")
    print(f"MLP output: {y}")
    print(f"MLP output shape: {y.shape}")

    # Verify the output
    assert y.shape == (1, 1), f"Expected shape (1, 1), got {y.shape}"
    print("✅ MLP produces correct output shape")

    # Test that sigmoid output is in the valid range
    assert np.all(y.data >= 0) and np.all(y.data <= 1), "Sigmoid output should be between 0 and 1"
    print("✅ MLP output is in valid range")

except Exception as e:
    print(f"❌ MLP creation test failed: {e}")
    raise

# Test different architectures
try:
    # Test shallow network
    shallow_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1)
    assert len(shallow_net.layers) == 4, f"Shallow network should have 4 layers, got {len(shallow_net.layers)}"

    # Test deep network
    deep_net = create_mlp(input_size=3, hidden_sizes=[4, 4, 4], output_size=1)
    assert len(deep_net.layers) == 8, f"Deep network should have 8 layers, got {len(deep_net.layers)}"

    # Test wide network
    wide_net = create_mlp(input_size=3, hidden_sizes=[10], output_size=1)
    assert len(wide_net.layers) == 4, f"Wide network should have 4 layers, got {len(wide_net.layers)}"

    print("✅ Different MLP architectures work correctly")

except Exception as e:
    print(f"❌ MLP architecture test failed: {e}")
    raise

# Show the MLP pattern
print("🎯 MLP creation pattern:")
print("   Input → Dense → Activation → Dense → Activation → ... → Dense → Output_Activation")
print("   Automatically creates the complete architecture")
print("   Handles any number of hidden layers")
print("📈 Progress: Sequential network ✓, MLP creation ✓")
# %% [markdown]
"""
## Step 4: Understanding Network Architectures
### Architecture Patterns
Different network architectures solve different problems:
#### **Shallow vs Deep Networks**
```python
# Shallow: 1 hidden layer
shallow = create_mlp(10, [20], 1)
# Deep: Many hidden layers
deep = create_mlp(10, [20, 20, 20], 1)
```
#### **Narrow vs Wide Networks**
```python
# Narrow: Few neurons per layer
narrow = create_mlp(10, [5, 5], 1)
# Wide: Many neurons per layer
wide = create_mlp(10, [50], 1)
```
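Parameter counts make these trade-offs concrete. A small sketch, using the same counting rule as before (weights plus biases per Dense layer, assuming that layout):
```python
def mlp_param_count(input_size, hidden_sizes, output_size):
    """Count Dense parameters: (n_in * n_out + n_out) for each consecutive pair of sizes."""
    sizes = [input_size] + hidden_sizes + [output_size]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

print(mlp_param_count(10, [5, 5], 1))        # narrow: 55 + 30 + 6 = 91
print(mlp_param_count(10, [50], 1))          # wide:   550 + 51    = 601
print(mlp_param_count(10, [20, 20, 20], 1))  # deep:   220 + 420 + 420 + 21 = 1081
```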
### Why Architecture Matters
- **Capacity:** More parameters can learn more complex patterns
- **Depth:** Enables hierarchical feature learning
- **Width:** Allows parallel processing of features
- **Efficiency:** Balance between performance and computation
### Different Activation Functions
```python
# ReLU networks (most common)
relu_net = create_mlp(10, [20], 1, activation=ReLU)
# Tanh networks (centered around 0)
tanh_net = create_mlp(10, [20], 1, activation=Tanh)
# Multi-class classification
classifier = create_mlp(10, [20], 3, output_activation=Softmax)
```
Let's test different architectures!
"""
# %% [markdown]
"""
### 🧪 Unit Test: Architecture Variations
Let's test different network architectures to understand their behavior.
**This is a unit test** - it tests architectural variations in isolation.
"""
# %% nbgrader={"grade": true, "grade_id": "test-architectures", "locked": true, "points": 10, "schema_version": 3, "solution": false, "task": false}
def plot_network_architectures():
    """Visualize different network architectures."""
    if not _should_show_plots():
        return

    # Create different architectures
    relu_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1, activation=ReLU)
    tanh_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1, activation=Tanh)
    classifier = create_mlp(input_size=3, hidden_sizes=[4], output_size=3, output_activation=Softmax)

    # Create input data
    x = Tensor([[1.0, 2.0, 3.0]])

    # Get outputs
    y_relu = relu_net(x)
    y_tanh = tanh_net(x)
    y_multi = classifier(x)

    # Plot the results
    fig, axs = plt.subplots(1, 3, figsize=(15, 4))
    axs[0].set_title("ReLU Network Output")
    axs[0].bar(['Output'], [y_relu.data[0][0]], color='skyblue')
    axs[1].set_title("Tanh Network Output")
    axs[1].bar(['Output'], [y_tanh.data[0][0]], color='salmon')
    axs[2].set_title("Softmax Classifier Output")
    axs[2].bar([f"Class {i}" for i in range(3)], y_multi.data[0], color='lightgreen')
    plt.tight_layout()
    plt.show()

def test_unit_network_architectures():
    """Unit test for different network architectures."""
    print("🔬 Unit Test: Network Architecture Variations...")
    try:
        # Test different hidden activation functions
        relu_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1, activation=ReLU)
        tanh_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1, activation=Tanh)

        # Test a different output activation
        classifier = create_mlp(input_size=3, hidden_sizes=[4], output_size=3, output_activation=Softmax)

        # Test with sample data
        x = Tensor([[1.0, 2.0, 3.0]])

        # Test ReLU network
        y_relu = relu_net(x)
        assert y_relu.shape == (1, 1), "ReLU network should work"
        print("✅ ReLU network works correctly")

        # Test Tanh network
        y_tanh = tanh_net(x)
        assert y_tanh.shape == (1, 1), "Tanh network should work"
        print("✅ Tanh network works correctly")

        # Test multi-class classifier
        y_multi = classifier(x)
        assert y_multi.shape == (1, 3), "Multi-class classifier should work"

        # Check softmax properties
        assert abs(np.sum(y_multi.data) - 1.0) < 1e-6, "Softmax outputs should sum to 1"
        print("✅ Multi-class classifier with Softmax works correctly")

        # Test different architectures
        shallow = create_mlp(input_size=4, hidden_sizes=[5], output_size=1)
        deep = create_mlp(input_size=4, hidden_sizes=[5, 5, 5], output_size=1)
        wide = create_mlp(input_size=4, hidden_sizes=[20], output_size=1)
        x_test = Tensor([[1.0, 2.0, 3.0, 4.0]])

        # Test all architectures
        for name, net in [("Shallow", shallow), ("Deep", deep), ("Wide", wide)]:
            y = net(x_test)
            assert y.shape == (1, 1), f"{name} network should produce correct shape"
            print(f"{name} network works correctly")

        print("✅ All network architectures work correctly")

    except Exception as e:
        print(f"❌ Architecture test failed: {e}")
        raise

    print("🎯 Architecture insights:")
    print("   Different activations create different behaviors")
    print("   Softmax enables multi-class classification")
    print("   Architecture affects network capacity and learning")
    print("📈 Progress: Sequential ✓, MLP creation ✓, Architecture variations ✓")
# %% [markdown]
"""
### 📊 Visualization Demo: Network Architectures
Let's visualize the different network architectures for educational purposes:
"""
# %%
# Demo visualization - only run in interactive mode, not during tests
if __name__ == "__main__":
    # Generate network architecture visualization (only in interactive mode)
    if _should_show_plots():
        plot_network_architectures()
        print("📊 Network architecture visualization complete!")
    else:
        print("📊 Plots disabled during testing")
# %% [markdown]
"""
## Step 5: Comprehensive Test - Complete Network Applications
### Real-World Network Applications
Let's test our networks on realistic scenarios:
#### **Classification Problem**
```python
# 4 features → 2 classes (binary classification)
classifier = create_mlp(4, [8, 4], 2, output_activation=Softmax)
```
#### **Regression Problem**
```python
# 3 features → 1 continuous output (no squashing at the output)
class Identity:
    def __call__(self, x): return x

regressor = create_mlp(3, [10, 5], 1, output_activation=Identity)  # linear output
```
#### **Deep Learning Pattern**
```python
# Complex feature learning
deep_net = create_mlp(10, [64, 32, 16], 1)
```
This comprehensive test ensures our networks work for real ML applications!
"""
# %% nbgrader={"grade": true, "grade_id": "test-integration", "locked": true, "points": 15, "schema_version": 3, "solution": false, "task": false}
# Comprehensive test - complete network applications
print("🔬 Comprehensive Test: Complete Network Applications...")

try:
    # Test 1: Multi-class classification (Iris-like dataset)
    print("\n1. Multi-class Classification Test:")
    iris_classifier = create_mlp(input_size=4, hidden_sizes=[8, 6], output_size=3, output_activation=Softmax)

    # Simulate iris features: [sepal_length, sepal_width, petal_length, petal_width]
    iris_samples = Tensor([
        [5.1, 3.5, 1.4, 0.2],  # Setosa
        [7.0, 3.2, 4.7, 1.4],  # Versicolor
        [6.3, 3.3, 6.0, 2.5]   # Virginica
    ])
    iris_predictions = iris_classifier(iris_samples)
    assert iris_predictions.shape == (3, 3), "Iris classifier should output 3 classes for 3 samples"

    # Check softmax properties
    row_sums = np.sum(iris_predictions.data, axis=1)
    assert np.allclose(row_sums, 1.0), "Each prediction should sum to 1"
    print("✅ Multi-class classification works correctly")

    # Test 2: Regression task (housing prices)
    print("\n2. Regression Task Test:")

    # Create a regressor without a final activation (linear output)
    class Identity:
        def __call__(self, x): return x

    housing_regressor = create_mlp(input_size=3, hidden_sizes=[10, 5], output_size=1, output_activation=Identity)

    # Simulate housing features: [size, bedrooms, location_score]
    housing_samples = Tensor([
        [2000, 3, 8.5],  # Large house, good location
        [1200, 2, 6.0],  # Medium house, OK location
        [800, 1, 4.0]    # Small house, poor location
    ])
    housing_predictions = housing_regressor(housing_samples)
    assert housing_predictions.shape == (3, 1), "Housing regressor should output 1 value per sample"
    print("✅ Regression task works correctly")

    # Test 3: Deep network performance
    print("\n3. Deep Network Test:")
    deep_network = create_mlp(input_size=10, hidden_sizes=[20, 15, 10, 5], output_size=1)

    # Test with a realistic batch size
    batch_data = Tensor(np.random.randn(32, 10))  # 32 samples, 10 features
    deep_predictions = deep_network(batch_data)
    assert deep_predictions.shape == (32, 1), "Deep network should handle batch processing"
    assert not np.any(np.isnan(deep_predictions.data)), "Deep network should not produce NaN"
    print("✅ Deep network handles batch processing correctly")

    # Test 4: Network composition
    print("\n4. Network Composition Test:")

    # Create a feature extractor and classifier separately
    feature_extractor = Sequential([
        Dense(input_size=10, output_size=5),
        ReLU(),
        Dense(input_size=5, output_size=3),
        ReLU()
    ])
    classifier_head = Sequential([
        Dense(input_size=3, output_size=2),
        Softmax()
    ])

    # Test composition
    raw_data = Tensor(np.random.randn(5, 10))
    features = feature_extractor(raw_data)
    final_predictions = classifier_head(features)
    assert features.shape == (5, 3), "Feature extractor should output 3 features"
    assert final_predictions.shape == (5, 2), "Classifier should output 2 classes"
    row_sums = np.sum(final_predictions.data, axis=1)
    assert np.allclose(row_sums, 1.0), "Composed network predictions should be valid"
    print("✅ Network composition works correctly")

    print("\n🎉 Comprehensive test passed! Your networks work correctly for:")
    print("   • Multi-class classification (Iris flowers)")
    print("   • Regression tasks (housing prices)")
    print("   • Deep learning architectures")
    print("   • Network composition and feature extraction")

except Exception as e:
    print(f"❌ Comprehensive test failed: {e}")
    raise

print("📈 Final Progress: Complete network architectures ready for real ML applications!")
# Run the test
test_unit_network_architectures()
# %% nbgrader={"grade": false, "grade_id": "networks-compatibility", "locked": false, "schema_version": 3, "solution": false, "task": false}
#| export
class MLP:
    """
    Multi-Layer Perceptron (MLP) class.

    A convenient wrapper around Sequential networks for standard MLP architectures.
    Maintains parameter information and provides a clean interface.

    Args:
        input_size: Number of input features
        hidden_size: Size of the single hidden layer
        output_size: Number of output features
        activation: Activation function for hidden layer (default: ReLU)
        output_activation: Activation function for output layer (default: None, i.e., linear output)
    """

    def __init__(self, input_size: int, hidden_size: int, output_size: int,
                 activation=ReLU, output_activation=None):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        # Build the network layers
        layers = []

        # Input to hidden layer
        layers.append(Dense(input_size, hidden_size))
        layers.append(activation())

        # Hidden to output layer
        layers.append(Dense(hidden_size, output_size))
        if output_activation is not None:
            layers.append(output_activation())

        self.network = Sequential(layers)

    def forward(self, x):
        """Forward pass through the MLP network."""
        return self.network.forward(x)

    def __call__(self, x):
        """Make the MLP callable."""
        return self.forward(x)
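# %%
# Quick demo (a minimal sketch): the MLP wrapper defaults to a linear output,
# so this builds Dense(4→8) → ReLU → Dense(8→2).
demo_mlp = MLP(input_size=4, hidden_size=8, output_size=2)
demo_out = demo_mlp(Tensor([[1.0, 2.0, 3.0, 4.0]]))
print(f"MLP demo output shape: {demo_out.shape}")  # (1, 2)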
# %%
def test_unit_sequential_networks():
    """Unit test for the Sequential network implementation."""
    print("🔬 Unit Test: Sequential Networks...")

    # Test a basic Sequential network
    net = Sequential([
        Dense(input_size=3, output_size=4),
        ReLU(),
        Dense(input_size=4, output_size=2),
        Sigmoid()
    ])
    x = Tensor([[1.0, 2.0, 3.0]])
    y = net(x)
    assert y.shape == (1, 2), "Sequential network should produce correct output shape"
    assert np.all(y.data > 0), "Sigmoid output should be positive"
    assert np.all(y.data < 1), "Sigmoid output should be less than 1"
    print("✅ Sequential networks work correctly")

# Run the test
test_unit_sequential_networks()
def test_unit_mlp_creation():
    """Unit test for the MLP creation function."""
    print("🔬 Unit Test: MLP Creation...")

    # Test different MLP architectures
    shallow = create_mlp(input_size=4, hidden_sizes=[5], output_size=1)
    deep = create_mlp(input_size=4, hidden_sizes=[8, 6, 4], output_size=2)
    x = Tensor([[1.0, 2.0, 3.0, 4.0]])

    # Test shallow network
    y_shallow = shallow(x)
    assert y_shallow.shape == (1, 1), "Shallow MLP should work"

    # Test deep network
    y_deep = deep(x)
    assert y_deep.shape == (1, 2), "Deep MLP should work"
    print("✅ MLP creation works correctly")

# Run the test
test_unit_mlp_creation()
def test_unit_network_architectures():
    """Unit test for different network architectures."""
    print("🔬 Unit Test: Network Architectures...")

    # Test different hidden activation functions
    relu_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1, activation=ReLU)
    tanh_net = create_mlp(input_size=3, hidden_sizes=[4], output_size=1, activation=Tanh)

    # Test a multi-class classifier
    classifier = create_mlp(input_size=3, hidden_sizes=[4], output_size=3, output_activation=Softmax)
    x = Tensor([[1.0, 2.0, 3.0]])

    # Test all architectures
    y_relu = relu_net(x)
    y_tanh = tanh_net(x)
    y_multi = classifier(x)
    assert y_relu.shape == (1, 1), "ReLU network should work"
    assert y_tanh.shape == (1, 1), "Tanh network should work"
    assert y_multi.shape == (1, 3), "Multi-class classifier should work"
    assert abs(np.sum(y_multi.data) - 1.0) < 1e-6, "Softmax outputs should sum to 1"
    print("✅ Network architectures work correctly")

# Run the test
test_unit_network_architectures()
def test_unit_network_applications():
    """Comprehensive unit test for network applications in real ML scenarios."""
    print("🔬 Comprehensive Test: Network Applications...")

    # Test multi-class classification
    iris_classifier = create_mlp(input_size=4, hidden_sizes=[8, 6], output_size=3, output_activation=Softmax)
    iris_samples = Tensor([[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4], [6.3, 3.3, 6.0, 2.5]])
    iris_predictions = iris_classifier(iris_samples)
    assert iris_predictions.shape == (3, 3), "Iris classifier should work"
    row_sums = np.sum(iris_predictions.data, axis=1)
    assert np.allclose(row_sums, 1.0), "Predictions should sum to 1"
    print("✅ Network applications work correctly")

# Run the test
test_unit_network_applications()
# %% [markdown]
"""
## 🧪 Module Testing
Time to test your implementation! This section uses TinyTorch's standardized testing framework to ensure your implementation works correctly.
**This testing section is locked** - it provides consistent feedback across all modules and cannot be modified.
"""
# %% nbgrader={"grade": false, "grade_id": "standardized-testing", "locked": true, "schema_version": 3, "solution": false, "task": false}
# =============================================================================
# STANDARDIZED MODULE TESTING - DO NOT MODIFY
# This cell is locked to ensure consistent testing across all TinyTorch modules
# =============================================================================
# %% [markdown]
"""
## 🔬 Integration Test: End-to-End Network Forward Pass
"""
# %%
def test_module_full_network_forward_pass():
    """
    Integration test for a complete forward pass through a multi-layer network,
    integrating Tensors, Dense layers, activations, and the Sequential container.
    """
    print("🔬 Running Integration Test: Full Network Forward Pass...")

    # 1. Define a simple 2-layer MLP:
    #    Input (3) -> Dense(4) -> ReLU -> Dense(2) -> Output
    model = Sequential([
        Dense(3, 4),
        ReLU(),
        Dense(4, 2)
    ])

    # 2. Create a batch of input Tensors:
    #    batch of 5 samples, each with 3 features
    input_tensor = Tensor(np.random.randn(5, 3))

    # 3. Perform a forward pass through the entire network
    output_tensor = model(input_tensor)

    # 4. Assert the final output is correct
    assert isinstance(output_tensor, Tensor), "Network output must be a Tensor"
    assert output_tensor.shape == (5, 2), f"Expected output shape (5, 2), but got {output_tensor.shape}"
    print("✅ Integration Test Passed: Full network forward pass is successful.")
# %% [markdown]
"""
## 🤖 AUTO TESTING
"""
# %%
if __name__ == "__main__":
    test_module_full_network_forward_pass()

    from tito.tools.testing import run_module_tests_auto

    # Automatically discover and run all tests in this module
    success = run_module_tests_auto("Networks")
# %% [markdown]
"""
## 🎯 MODULE SUMMARY: Neural Network Architectures
Congratulations! You've successfully implemented complete neural network architectures:
### What You've Accomplished
✅ **Sequential Networks**: Chained layers for complex transformations
✅ **MLP Creation**: Multi-layer perceptrons with flexible architectures
✅ **Network Architectures**: Different activation patterns and output types
✅ **Integration**: Real-world applications like classification and regression
### Key Concepts You've Learned
- **Sequential Processing**: How layers chain together for complex functions
- **MLP Design**: Multi-layer perceptrons as universal function approximators
- **Architecture Choices**: How depth, width, and activations affect learning
- **Real Applications**: Classification, regression, and feature extraction
### Next Steps
1. **Export your code**: `tito package nbdev --export 05_dense`
2. **Test your implementation**: `tito test 05_dense`
3. **Build complete models**: Combine with training for full ML pipelines
4. **Move to the next module**: Add convolutional layers for image processing!
**Ready for CNNs?** Your network foundations are now ready for specialized architectures!
"""