🧱 Implement Layers module - Neural Network Building Blocks

✨ Features:
- Dense layer with Xavier initialization (y = Wx + b)
- Activation functions: ReLU, Sigmoid, Tanh
- Layer composition for building neural networks
- Comprehensive test suite (17 passed, 5 skipped stretch goals)
- Package-level integration tests (14 passed)
- Complete documentation and examples

🎯 Educational Design:
- Follows 'Build → Use → Understand' pedagogical framework
- Immediate visual feedback with working examples
- Progressive complexity from simple layers to full networks
- Students see neural networks as function composition

🧪 Testing Architecture:
- Module tests: 17/17 core tests pass, 5 stretch goals available
- Package tests: 14/14 integration tests pass
- Dual testing supports both learning and validation

📚 Complete Implementation:
- Dense layer with proper weight initialization
- Numerically stable activation functions
- Batch processing support
- Real-world examples (image classification network)
- CLI integration: 'tito test --module layers'

This establishes the fundamental building blocks students need to understand neural networks before diving into training.
2026-06-01 05:25:50 -05:00 · 2025-07-10 20:30:31 -04:00
parent e382a09a0c
commit e2c659023d
7 changed files with 2279 additions and 1 deletions
--- a/bin/tito.py
+++ b/bin/tito.py
@@ -343,7 +343,7 @@ def cmd_info(args):

 def cmd_test(args):
    """Run tests for a specific module."""
-    valid_modules = ["setup", "tensor", "mlp", "cnn", "data", "training", 
+    valid_modules = ["setup", "tensor", "layers", "cnn", "data", "training", 
                     "profiling", "compression", "kernels", "benchmarking", "mlops"]
    
    if args.all:
--- a/modules/layers/README.md
+++ b/modules/layers/README.md
@@ -0,0 +1,206 @@
+# 🧱 Module 2: Layers - Neural Network Building Blocks
+
+**Build the fundamental transformations that compose into neural networks**
+
+## 🎯 Learning Objectives
+
+After completing this module, you will:
+- Understand layers as functions that transform tensors: `y = f(x)`
+- Implement Dense layers with linear transformations: `y = Wx + b`
+- Add activation functions for nonlinearity (ReLU, Sigmoid, Tanh)
+- See how neural networks are just function composition
+- Build intuition for neural network architecture before diving into training
+
+## 🧱 Build → Use → Understand
+
+This module follows the TinyTorch pedagogical framework:
+
+1. **Build**: Dense layers and activation functions from scratch
+2. **Use**: Transform tensors and see immediate results
+3. **Understand**: How neural networks transform information
+
+## 📚 What You'll Build
+
+### **Dense Layer**
+```python
+from tinytorch.core.tensor import Tensor
+from tinytorch.core.layers import Dense
+
+layer = Dense(input_size=3, output_size=2)
+x = Tensor([[1.0, 2.0, 3.0]])
+y = layer(x)  # Shape: (1, 2)
+```
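The shape change comes from one matrix product plus a broadcast bias. A minimal NumPy sketch of that arithmetic, assuming the weight layout `(input_size, output_size)` used by the module's reference `Dense`; the random placeholder weights and variable names are illustrative, not the module's Xavier initialization:

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, output_size = 3, 2
W = rng.standard_normal((input_size, output_size))  # placeholder weights: (3, 2)
b = np.zeros(output_size)                           # bias: (2,)

x = np.array([[1.0, 2.0, 3.0]])  # one example: (1, 3)
y = x @ W + b                    # (1, 3) @ (3, 2) -> (1, 2); b broadcasts across rows
print(y.shape)  # (1, 2)

x_batch = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # two examples: (2, 3)
y_batch = x_batch @ W + b                               # (2, 2): one output row per example
print(y_batch.shape)  # (2, 2)
```

Because each row of the batch is processed independently, the first row of `y_batch` equals `y`.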
+
+### **Activation Functions**
+```python
+from tinytorch.core.tensor import Tensor
+from tinytorch.core.layers import ReLU, Sigmoid, Tanh
+
+relu = ReLU()
+sigmoid = Sigmoid()
+tanh = Tanh()
+
+x = Tensor([[-1.0, 0.0, 1.0]])
+y_relu = relu(x)        # [0.0, 0.0, 1.0]
+y_sigmoid = sigmoid(x)  # [0.27, 0.5, 0.73]
+y_tanh = tanh(x)        # [-0.76, 0.0, 0.76]
+```
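The values in the comments above can be checked directly with NumPy. This sketch applies the three formulas to plain arrays (no TinyTorch classes involved) and rounds to two decimals:

```python
import numpy as np

x = np.array([[-1.0, 0.0, 1.0]])

y_relu = np.maximum(0.0, x)           # element-wise max(0, x)
y_sigmoid = 1.0 / (1.0 + np.exp(-x))  # direct formula is fine for small inputs like these
y_tanh = np.tanh(x)

print(np.round(y_relu, 2).tolist())     # [[0.0, 0.0, 1.0]]
print(np.round(y_sigmoid, 2).tolist())  # [[0.27, 0.5, 0.73]]
print(np.round(y_tanh, 2).tolist())     # [[-0.76, 0.0, 0.76]]
```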
+
+### **Neural Networks**
+```python
+# 3 → 4 → 2 network
+layer1 = Dense(input_size=3, output_size=4)
+activation1 = ReLU()
+layer2 = Dense(input_size=4, output_size=2)
+activation2 = Sigmoid()
+
+# Forward pass
+x = Tensor([[1.0, 2.0, 3.0]])
+h1 = layer1(x)
+h1_activated = activation1(h1)
+h2 = layer2(h1_activated)
+output = activation2(h2)
+```
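The forward pass above is plain function composition. A minimal sketch of that idea, using a hypothetical `compose` helper (not part of TinyTorch) and small NumPy functions standing in for the module's `Dense`, `ReLU`, and `Sigmoid` classes:

```python
from functools import reduce

import numpy as np


def compose(*fns):
    """Apply fns left to right: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), fns, x)


rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((3, 4)), np.zeros(4)  # layer 1: 3 -> 4
W2, b2 = rng.standard_normal((4, 2)), np.zeros(2)  # layer 2: 4 -> 2


def dense1(x):
    return x @ W1 + b1


def dense2(x):
    return x @ W2 + b2


def relu(x):
    return np.maximum(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Input → Dense → ReLU → Dense → Sigmoid → Output
network = compose(dense1, relu, dense2, sigmoid)

x = np.array([[1.0, 2.0, 3.0]])
output = network(x)
print(output.shape)  # (1, 2)
print(np.allclose(output, sigmoid(dense2(relu(dense1(x))))))  # True
```

The composed `network` is just another function from tensors to tensors, which is why layers can be stacked freely as long as the shapes line up.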
+
+## 🚀 Getting Started
+
+### Prerequisites
+- Complete Module 1: Tensor ✅
+- Understand basic linear algebra (matrix multiplication)
+- Familiar with Python classes and methods
+
+### Quick Start
+```bash
+# Navigate to the layers module
+cd modules/layers
+
+# Work in the development notebook
+jupyter notebook layers_dev.ipynb
+
+# Or work in the Python file
+code layers_dev.py
+```
+
+## 📖 Module Structure
+
+```
+modules/layers/
+├── layers_dev.py           # Main development file (work here!)
+├── layers_dev.ipynb        # Jupyter notebook version
+├── tests/
+│   └── test_layers.py      # Comprehensive tests
+├── README.md               # This file
+└── solutions/              # Reference implementations (if stuck)
+```
+
+## 🎓 Learning Path
+
+### Step 1: Dense Layer (Linear Transformation)
+- Understand `y = Wx + b` (implemented as `x @ W + b` on row-vector batches of shape `(batch_size, input_size)`)
+- Implement weight initialization
+- Handle matrix multiplication and bias addition
+- Test with single examples and batches
+
+### Step 2: Activation Functions
+- Implement ReLU: `max(0, x)`
+- Implement Sigmoid: `1 / (1 + e^(-x))`
+- Implement Tanh: `tanh(x)`
+- Understand why nonlinearity is crucial
+
+### Step 3: Layer Composition
+- Chain layers together
+- Build complete neural networks
+- See how simple layers create complex functions
+
+### Step 4: Real-World Application
+- Build an image classification network
+- Understand how architecture affects capability
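Architecture choices directly set model size. A quick sketch counting Dense-layer parameters, assuming the module's layout (a weight matrix of shape `(input_size, output_size)` plus an optional bias vector); `dense_param_count` is a hypothetical helper, and the sizes are taken from the notebook's 784 → 128 → 64 → 10 image-classification demo:

```python
def dense_param_count(input_size: int, output_size: int, use_bias: bool = True) -> int:
    """Weights (input_size x output_size) plus one bias per output feature."""
    return input_size * output_size + (output_size if use_bias else 0)


sizes = [784, 128, 64, 10]  # 28x28 pixels -> 128 -> 64 -> 10 classes
per_layer = [dense_param_count(i, o) for i, o in zip(sizes, sizes[1:])]

print(per_layer)       # [100480, 8256, 650]
print(sum(per_layer))  # 109386
```

Almost all of the parameters sit in the first layer, because it connects every input pixel to every hidden unit.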
+
+## 🧪 Testing Your Implementation
+
+### Module-Level Tests
+```bash
+# Run comprehensive tests
+python -m pytest tests/test_layers.py -v
+
+# Quick import check (confirms the module loads; the tests check correctness)
+python -c "from layers_dev import Dense, ReLU; print('✅ Layers working!')"
+```
+
+### Package-Level Tests
+```bash
+# Export to package
+python ../../bin/tito.py sync
+
+# Test integration
+python ../../bin/tito.py test --module layers
+```
+
+## 🎯 Key Concepts
+
+### **Layers as Functions**
+- Input: Tensor with some shape
+- Transformation: Mathematical operation
+- Output: Tensor with possibly different shape
+
+### **Linear vs Nonlinear**
+- Dense layers: Linear transformations
+- Activation functions: Nonlinear transformations
+- Composition: Linear + Nonlinear = Complex functions
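Why the nonlinearity matters can be checked numerically: two stacked linear layers collapse into one, since (xW₁ + b₁)W₂ + b₂ = x(W₁W₂) + (b₁W₂ + b₂). A NumPy sketch with arbitrary random weights:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal((5, 3))                               # batch of 5, 3 features
W1, b1 = rng.standard_normal((3, 4)), rng.standard_normal(4)  # layer 1: 3 -> 4
W2, b2 = rng.standard_normal((4, 2)), rng.standard_normal(2)  # layer 2: 4 -> 2

# Two stacked linear layers...
stacked = (x @ W1 + b1) @ W2 + b2
# ...equal ONE linear layer with W = W1 @ W2 and b = b1 @ W2 + b2
W, b = W1 @ W2, b1 @ W2 + b2
print(np.allclose(stacked, x @ W + b))  # True

# With ReLU in between, that shortcut no longer holds in general
with_relu = np.maximum(0.0, x @ W1 + b1) @ W2 + b2
print(np.allclose(with_relu, x @ W + b))  # typically False
```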
+
+### **Neural Networks = Function Composition**
+```
+Input → Dense → ReLU → Dense → Sigmoid → Output
+```
+
+### **Why This Matters**
+- **Modularity**: Build complex networks from simple parts
+- **Reusability**: Same layers work for different problems
+- **Understanding**: Know how each part contributes to the whole
+
+## 🔍 Common Issues
+
+### **Import Errors**
+```python
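+# If `tinytorch` isn't importable yet, import Tensor from the source module.
+# The relative path assumes you're running from modules/layers/.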
+import sys
+sys.path.append('../../')
+from modules.tensor.tensor_dev import Tensor
+```
+
+### **Shape Mismatches**
+```python
+# Check input/output sizes match
+layer1 = Dense(input_size=3, output_size=4)
+layer2 = Dense(input_size=4, output_size=2)  # 4 matches output of layer1
+```
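A mismatch surfaces as an error from the matrix product. A small sketch using NumPy directly, plus a hypothetical `check_layer_sizes` helper (not part of TinyTorch) that catches the problem from the `(input_size, output_size)` pairs before any data flows:

```python
import numpy as np


def check_layer_sizes(layer_sizes):
    """Each layer's input_size must equal the previous layer's output_size."""
    for i in range(1, len(layer_sizes)):
        prev_out, cur_in = layer_sizes[i - 1][1], layer_sizes[i][0]
        if prev_out != cur_in:
            raise ValueError(f"layer {i} expects {cur_in} inputs but layer {i - 1} outputs {prev_out}")


check_layer_sizes([(3, 4), (4, 2)])  # OK: 4 matches 4

message = ""
try:
    check_layer_sizes([(3, 4), (5, 2)])
except ValueError as err:
    message = str(err)
print(message)  # layer 1 expects 5 inputs but layer 0 outputs 4

# NumPy raises its own ValueError for the same mistake: (1, 4) @ (5, 2)
numpy_raised = False
try:
    np.ones((1, 4)) @ np.ones((5, 2))
except ValueError:
    numpy_raised = True
print(numpy_raised)  # True
```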
+
+### **Gradient Issues (Later)**
+```python
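+import math
+import numpy as np
+input_size, output_size = 3, 4  # example layer sizes
+# Use proper weight initialization: Xavier/Glorot uniform in (-limit, limit)
+limit = math.sqrt(6.0 / (input_size + output_size))
+weights = np.random.uniform(-limit, limit, (input_size, output_size))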
+```
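Why this limit? A uniform distribution on (-a, a) has variance a²/3, so a = √(6 / (fan_in + fan_out)) gives Var(W) = 2 / (fan_in + fan_out), the Glorot/Xavier target. A quick empirical check; the layer sizes and seed are arbitrary choices:

```python
import math

import numpy as np

fan_in, fan_out = 784, 128
limit = math.sqrt(6.0 / (fan_in + fan_out))
target_var = 2.0 / (fan_in + fan_out)  # equals limit**2 / 3

rng = np.random.default_rng(0)
W = rng.uniform(-limit, limit, (fan_in, fan_out))

print(f"empirical var: {W.var():.6f}, target: {target_var:.6f}")
# With ~100k samples the relative error is far below 5%
assert abs(W.var() - target_var) / target_var < 0.05
```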
+
+## 🎉 Success Criteria
+
+You've successfully completed this module when:
+- ✅ All tests pass (`pytest tests/test_layers.py`)
+- ✅ You can build a 2-layer neural network
+- ✅ You understand how layers transform tensors
+- ✅ You see the connection between layers and neural networks
+- ✅ Package export works (`tito test --module layers`)
+
+## 🚀 What's Next
+
+After completing this module, you're ready for:
+- **Module 3: Networks** - Compose layers into common architectures
+- **Module 4: Training** - Learn how networks improve through experience
+- **Module 5: Applications** - Use networks for real problems
+
+## 🤝 Getting Help
+
+- Check the tests for examples of expected behavior
+- Look at the solutions/ directory if you're stuck
+- Review the pedagogical principles in `docs/pedagogy/`
+- Remember: Build → Use → Understand!
+
+---
+
+**Great job building the foundation of neural networks!** 🎉
+
+*This module implements the core insight: neural networks are just function composition of simple building blocks.* 
--- a/modules/layers/layers_dev.ipynb
+++ b/modules/layers/layers_dev.ipynb
@@ -0,0 +1,701 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "2843fa68",
+   "metadata": {
+    "cell_marker": "\"\"\""
+   },
+   "source": [
+    "# Module 2: Layers - Neural Network Building Blocks\n",
+    "\n",
+    "Welcome to the Layers module! This is where neural networks begin. You'll implement the fundamental building blocks that transform tensors.\n",
+    "\n",
+    "## Learning Goals\n",
+    "- Understand layers as functions that transform tensors: `y = f(x)`\n",
+    "- Implement Dense layers with linear transformations: `y = Wx + b`\n",
+    "- Add activation functions for nonlinearity (ReLU, Sigmoid, Tanh)\n",
+    "- See how neural networks are just function composition\n",
+    "- Build intuition before diving into training\n",
+    "\n",
+    "## Build → Use → Understand\n",
+    "1. **Build**: Dense layers and activation functions\n",
+    "2. **Use**: Transform tensors and see immediate results\n",
+    "3. **Understand**: How neural networks transform information\n",
+    "\n",
+    "## Module → Package Structure\n",
+    "**🎓 Teaching vs. 🔧 Building**: \n",
+    "- **Learning side**: Work in `modules/layers/layers_dev.py`  \n",
+    "- **Building side**: Exports to `tinytorch/core/layers.py`\n",
+    "\n",
+    "This module builds the fundamental transformations that compose into neural networks."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9d285d84",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#| default_exp core.layers\n",
+    "\n",
+    "# Setup and imports\n",
+    "import numpy as np\n",
+    "import sys\n",
+    "from typing import Union, Optional, Callable\n",
+    "import math"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a12b7f36",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#| export\n",
+    "import numpy as np\n",
+    "import math\n",
+    "import sys\n",
+    "from typing import Union, Optional, Callable\n",
+    "from tinytorch.core.tensor import Tensor\n",
+    "\n",
+    "# Import our Tensor class\n",
+    "# sys.path.append('../../')\n",
+    "# from modules.tensor.tensor_dev import Tensor\n",
+    "\n",
+    "# print(\"🔥 TinyTorch Layers Module\")\n",
+    "# print(f\"NumPy version: {np.__version__}\")\n",
+    "# print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
+    "# print(\"Ready to build neural network layers!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1b8b760c",
+   "metadata": {
+    "cell_marker": "\"\"\"",
+    "lines_to_next_cell": 1
+   },
+   "source": [
+    "## Step 1: What is a Layer?\n",
+    "\n",
+    "A **layer** is a function that transforms tensors. Think of it as:\n",
+    "- **Input**: Tensor with some shape\n",
+    "- **Transformation**: Mathematical operation (linear, nonlinear, etc.)\n",
+    "- **Output**: Tensor with possibly different shape\n",
+    "\n",
+    "**The fundamental insight**: Neural networks are just function composition!\n",
+    "```\n",
+    "x → Layer1 → Layer2 → Layer3 → y\n",
+    "```\n",
+    "\n",
+    "**Why layers matter**:\n",
+    "- They're the building blocks of all neural networks\n",
+    "- Each layer learns a different transformation\n",
+    "- Composing layers creates complex functions\n",
+    "- Understanding layers = understanding neural networks\n",
+    "\n",
+    "Let's start with the most important layer: **Dense** (also called Linear or Fully Connected)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fabf403c",
+   "metadata": {
+    "lines_to_next_cell": 1
+   },
+   "outputs": [],
+   "source": [
+    "#| export\n",
+    "class Dense:\n",
+    "    \"\"\"\n",
+    "    Dense (Linear) Layer: y = Wx + b\n",
+    "    \n",
+    "    The fundamental building block of neural networks.\n",
+    "    Performs linear transformation: matrix multiplication + bias addition.\n",
+    "    \n",
+    "    Args:\n",
+    "        input_size: Number of input features\n",
+    "        output_size: Number of output features\n",
+    "        use_bias: Whether to include bias term (default: True)\n",
+    "        \n",
+    "    TODO: Implement the Dense layer with weight initialization and forward pass.\n",
+    "    \"\"\"\n",
+    "    \n",
+    "    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):\n",
+    "        \"\"\"\n",
+    "        Initialize Dense layer with random weights.\n",
+    "        \n",
+    "        TODO: \n",
+    "        1. Store layer parameters (input_size, output_size, use_bias)\n",
+    "        2. Initialize weights with small random values\n",
+    "        3. Initialize bias to zeros (if use_bias=True)\n",
+    "        \"\"\"\n",
+    "        raise NotImplementedError(\"Student implementation required\")\n",
+    "    \n",
+    "    def forward(self, x: Tensor) -> Tensor:\n",
+    "        \"\"\"\n",
+    "        Forward pass: y = Wx + b\n",
+    "        \n",
+    "        Args:\n",
+    "            x: Input tensor of shape (batch_size, input_size)\n",
+    "            \n",
+    "        Returns:\n",
+    "            Output tensor of shape (batch_size, output_size)\n",
+    "            \n",
+    "        TODO: Implement matrix multiplication and bias addition\n",
+    "        \"\"\"\n",
+    "        raise NotImplementedError(\"Student implementation required\")\n",
+    "    \n",
+    "    def __call__(self, x: Tensor) -> Tensor:\n",
+    "        \"\"\"Make layer callable: layer(x) same as layer.forward(x)\"\"\"\n",
+    "        return self.forward(x)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "718aafe5",
+   "metadata": {
+    "lines_to_next_cell": 1
+   },
+   "outputs": [],
+   "source": [
+    "#| hide\n",
+    "#| export\n",
+    "class Dense:\n",
+    "    \"\"\"\n",
+    "    Dense (Linear) Layer: y = Wx + b\n",
+    "    \n",
+    "    The fundamental building block of neural networks.\n",
+    "    Performs linear transformation: matrix multiplication + bias addition.\n",
+    "    \"\"\"\n",
+    "    \n",
+    "    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):\n",
+    "        \"\"\"Initialize Dense layer with random weights.\"\"\"\n",
+    "        self.input_size = input_size\n",
+    "        self.output_size = output_size\n",
+    "        self.use_bias = use_bias\n",
+    "        \n",
+    "        # Initialize weights with Xavier/Glorot initialization\n",
+    "        # This helps with gradient flow during training\n",
+    "        limit = math.sqrt(6.0 / (input_size + output_size))\n",
+    "        self.weights = Tensor(\n",
+    "            np.random.uniform(-limit, limit, (input_size, output_size)).astype(np.float32)\n",
+    "        )\n",
+    "        \n",
+    "        # Initialize bias to zeros\n",
+    "        if use_bias:\n",
+    "            self.bias = Tensor(np.zeros(output_size, dtype=np.float32))\n",
+    "        else:\n",
+    "            self.bias = None\n",
+    "    \n",
+    "    def forward(self, x: Tensor) -> Tensor:\n",
+    "        \"\"\"Forward pass: y = Wx + b, computed as x @ W + b for row-vector batches\"\"\"\n",
+    "        # Matrix multiplication: x @ weights\n",
+    "        # x shape: (batch_size, input_size)\n",
+    "        # weights shape: (input_size, output_size)\n",
+    "        # result shape: (batch_size, output_size)\n",
+    "        output = Tensor(x.data @ self.weights.data)\n",
+    "        \n",
+    "        # Add bias if present\n",
+    "        if self.bias is not None:\n",
+    "            output = Tensor(output.data + self.bias.data)\n",
+    "        \n",
+    "        return output\n",
+    "    \n",
+    "    def __call__(self, x: Tensor) -> Tensor:\n",
+    "        \"\"\"Make layer callable: layer(x) same as layer.forward(x)\"\"\"\n",
+    "        return self.forward(x)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "54390574",
+   "metadata": {
+    "cell_marker": "\"\"\""
+   },
+   "source": [
+    "### 🧪 Test Your Dense Layer\n",
+    "\n",
+    "Once you implement the Dense layer above, run this cell to test it:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c24b9bc7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Test the Dense layer\n",
+    "try:\n",
+    "    print(\"=== Testing Dense Layer ===\")\n",
+    "    \n",
+    "    # Create a simple Dense layer: 3 inputs → 2 outputs\n",
+    "    layer = Dense(input_size=3, output_size=2)\n",
+    "    print(f\"Created Dense layer: {layer.input_size} → {layer.output_size}\")\n",
+    "    print(f\"Weights shape: {layer.weights.shape}\")\n",
+    "    print(f\"Bias shape: {layer.bias.shape if layer.bias is not None else 'No bias'}\")\n",
+    "    \n",
+    "    # Test with a single example\n",
+    "    x = Tensor([[1.0, 2.0, 3.0]])  # Shape: (1, 3)\n",
+    "    y = layer(x)\n",
+    "    print(f\"Input shape: {x.shape}\")\n",
+    "    print(f\"Output shape: {y.shape}\")\n",
+    "    print(f\"Input: {x.data}\")\n",
+    "    print(f\"Output: {y.data}\")\n",
+    "    \n",
+    "    # Test with batch of examples\n",
+    "    x_batch = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # Shape: (2, 3)\n",
+    "    y_batch = layer(x_batch)\n",
+    "    print(f\"\\nBatch input shape: {x_batch.shape}\")\n",
+    "    print(f\"Batch output shape: {y_batch.shape}\")\n",
+    "    print(f\"Batch output: {y_batch.data}\")\n",
+    "    \n",
+    "    print(\"✅ Dense layer working!\")\n",
+    "    \n",
+    "except Exception as e:\n",
+    "    print(f\"❌ Error: {e}\")\n",
+    "    print(\"Make sure to implement the Dense layer above!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "50ccc78d",
+   "metadata": {
+    "cell_marker": "\"\"\"",
+    "lines_to_next_cell": 1
+   },
+   "source": [
+    "## Step 2: Activation Functions\n",
+    "\n",
+    "Dense layers alone can only learn **linear** transformations. But most real-world problems need **nonlinear** transformations.\n",
+    "\n",
+    "**Activation functions** add nonlinearity:\n",
+    "- **ReLU**: `max(0, x)` - Most common, simple and effective\n",
+    "- **Sigmoid**: `1 / (1 + e^(-x))` - Squashes to (0, 1)\n",
+    "- **Tanh**: `tanh(x)` - Squashes to (-1, 1)\n",
+    "\n",
+    "**Why nonlinearity matters**: Without it, stacking layers is pointless!\n",
+    "```\n",
+    "Linear → Linear → Linear = Just one big Linear transformation\n",
+    "Linear → NonLinear → Linear = Can learn complex patterns\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "85818dc3",
+   "metadata": {
+    "lines_to_next_cell": 1
+   },
+   "outputs": [],
+   "source": [
+    "#| export\n",
+    "class ReLU:\n",
+    "    \"\"\"\n",
+    "    ReLU Activation: f(x) = max(0, x)\n",
+    "    \n",
+    "    The most popular activation function in deep learning.\n",
+    "    Simple, effective, and computationally efficient.\n",
+    "    \n",
+    "    TODO: Implement ReLU activation function.\n",
+    "    \"\"\"\n",
+    "    \n",
+    "    def forward(self, x: Tensor) -> Tensor:\n",
+    "        \"\"\"\n",
+    "        Apply ReLU: f(x) = max(0, x)\n",
+    "        \n",
+    "        Args:\n",
+    "            x: Input tensor\n",
+    "            \n",
+    "        Returns:\n",
+    "            Output tensor with ReLU applied element-wise\n",
+    "            \n",
+    "        TODO: Implement element-wise max(0, x) operation\n",
+    "        \"\"\"\n",
+    "        raise NotImplementedError(\"Student implementation required\")\n",
+    "    \n",
+    "    def __call__(self, x: Tensor) -> Tensor:\n",
+    "        \"\"\"Make activation callable: relu(x) same as relu.forward(x)\"\"\"\n",
+    "        return self.forward(x)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "23e807f1",
+   "metadata": {
+    "lines_to_next_cell": 1
+   },
+   "outputs": [],
+   "source": [
+    "#| hide\n",
+    "#| export\n",
+    "class ReLU:\n",
+    "    \"\"\"ReLU Activation: f(x) = max(0, x)\"\"\"\n",
+    "    \n",
+    "    def forward(self, x: Tensor) -> Tensor:\n",
+    "        \"\"\"Apply ReLU: f(x) = max(0, x)\"\"\"\n",
+    "        return Tensor(np.maximum(0, x.data))\n",
+    "    \n",
+    "    def __call__(self, x: Tensor) -> Tensor:\n",
+    "        return self.forward(x)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3c0bb26a",
+   "metadata": {
+    "lines_to_next_cell": 1
+   },
+   "outputs": [],
+   "source": [
+    "#| export\n",
+    "class Sigmoid:\n",
+    "    \"\"\"\n",
+    "    Sigmoid Activation: f(x) = 1 / (1 + e^(-x))\n",
+    "    \n",
+    "    Squashes input to range (0, 1). Often used for binary classification.\n",
+    "    \n",
+    "    TODO: Implement Sigmoid activation function.\n",
+    "    \"\"\"\n",
+    "    \n",
+    "    def forward(self, x: Tensor) -> Tensor:\n",
+    "        \"\"\"\n",
+    "        Apply Sigmoid: f(x) = 1 / (1 + e^(-x))\n",
+    "        \n",
+    "        Args:\n",
+    "            x: Input tensor\n",
+    "            \n",
+    "        Returns:\n",
+    "            Output tensor with Sigmoid applied element-wise\n",
+    "            \n",
+    "        TODO: Implement sigmoid function (be careful with numerical stability!)\n",
+    "        \"\"\"\n",
+    "        raise NotImplementedError(\"Student implementation required\")\n",
+    "    \n",
+    "    def __call__(self, x: Tensor) -> Tensor:\n",
+    "        return self.forward(x)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "972e9668",
+   "metadata": {
+    "lines_to_next_cell": 1
+   },
+   "outputs": [],
+   "source": [
+    "#| hide\n",
+    "#| export\n",
+    "class Sigmoid:\n",
+    "    \"\"\"Sigmoid Activation: f(x) = 1 / (1 + e^(-x))\"\"\"\n",
+    "    \n",
+    "    def forward(self, x: Tensor) -> Tensor:\n",
+    "        \"\"\"Apply Sigmoid with numerical stability\"\"\"\n",
+    "        # Use the numerically stable version to avoid overflow\n",
+    "        # For x >= 0: sigmoid(x) = 1 / (1 + exp(-x))\n",
+    "        # For x < 0: sigmoid(x) = exp(x) / (1 + exp(x))\n",
+    "        x_data = x.data\n",
+    "        result = np.zeros_like(x_data)\n",
+    "        \n",
+    "        # Stable computation\n",
+    "        positive_mask = x_data >= 0\n",
+    "        result[positive_mask] = 1.0 / (1.0 + np.exp(-x_data[positive_mask]))\n",
+    "        result[~positive_mask] = np.exp(x_data[~positive_mask]) / (1.0 + np.exp(x_data[~positive_mask]))\n",
+    "        \n",
+    "        return Tensor(result)\n",
+    "    \n",
+    "    def __call__(self, x: Tensor) -> Tensor:\n",
+    "        return self.forward(x)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2babe8a8",
+   "metadata": {
+    "lines_to_next_cell": 1
+   },
+   "outputs": [],
+   "source": [
+    "#| export\n",
+    "class Tanh:\n",
+    "    \"\"\"\n",
+    "    Tanh Activation: f(x) = tanh(x)\n",
+    "    \n",
+    "    Squashes input to range (-1, 1). Zero-centered output.\n",
+    "    \n",
+    "    TODO: Implement Tanh activation function.\n",
+    "    \"\"\"\n",
+    "    \n",
+    "    def forward(self, x: Tensor) -> Tensor:\n",
+    "        \"\"\"\n",
+    "        Apply Tanh: f(x) = tanh(x)\n",
+    "        \n",
+    "        Args:\n",
+    "            x: Input tensor\n",
+    "            \n",
+    "        Returns:\n",
+    "            Output tensor with Tanh applied element-wise\n",
+    "            \n",
+    "        TODO: Implement tanh function\n",
+    "        \"\"\"\n",
+    "        raise NotImplementedError(\"Student implementation required\")\n",
+    "    \n",
+    "    def __call__(self, x: Tensor) -> Tensor:\n",
+    "        return self.forward(x)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5eff4e44",
+   "metadata": {
+    "lines_to_next_cell": 1
+   },
+   "outputs": [],
+   "source": [
+    "#| hide\n",
+    "#| export\n",
+    "class Tanh:\n",
+    "    \"\"\"Tanh Activation: f(x) = tanh(x)\"\"\"\n",
+    "    \n",
+    "    def forward(self, x: Tensor) -> Tensor:\n",
+    "        \"\"\"Apply Tanh\"\"\"\n",
+    "        return Tensor(np.tanh(x.data))\n",
+    "    \n",
+    "    def __call__(self, x: Tensor) -> Tensor:\n",
+    "        return self.forward(x)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c39e4420",
+   "metadata": {
+    "cell_marker": "\"\"\""
+   },
+   "source": [
+    "### 🧪 Test Your Activation Functions\n",
+    "\n",
+    "Once you implement the activation functions above, run this cell to test them:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f73687cc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Test activation functions\n",
+    "try:\n",
+    "    print(\"=== Testing Activation Functions ===\")\n",
+    "    \n",
+    "    # Test data: mix of positive, negative, and zero\n",
+    "    x = Tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]])\n",
+    "    print(f\"Input: {x.data}\")\n",
+    "    \n",
+    "    # Test ReLU\n",
+    "    relu = ReLU()\n",
+    "    y_relu = relu(x)\n",
+    "    print(f\"ReLU output: {y_relu.data}\")\n",
+    "    \n",
+    "    # Test Sigmoid\n",
+    "    sigmoid = Sigmoid()\n",
+    "    y_sigmoid = sigmoid(x)\n",
+    "    print(f\"Sigmoid output: {y_sigmoid.data}\")\n",
+    "    \n",
+    "    # Test Tanh\n",
+    "    tanh = Tanh()\n",
+    "    y_tanh = tanh(x)\n",
+    "    print(f\"Tanh output: {y_tanh.data}\")\n",
+    "    \n",
+    "    print(\"✅ Activation functions working!\")\n",
+    "    \n",
+    "except Exception as e:\n",
+    "    print(f\"❌ Error: {e}\")\n",
+    "    print(\"Make sure to implement the activation functions above!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ec82e933",
+   "metadata": {
+    "cell_marker": "\"\"\""
+   },
+   "source": [
+    "## Step 3: Layer Composition - Building Neural Networks\n",
+    "\n",
+    "Now comes the magic! We can **compose** layers to build neural networks:\n",
+    "\n",
+    "```\n",
+    "Input → Dense → ReLU → Dense → Sigmoid → Output\n",
+    "```\n",
+    "\n",
+    "This is a 2-layer neural network. Once trained, it can represent complex nonlinear patterns!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "06c5692f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Build a simple 2-layer neural network\n",
+    "try:\n",
+    "    print(\"=== Building a 2-Layer Neural Network ===\")\n",
+    "    \n",
+    "    # Network architecture: 3 → 4 → 2\n",
+    "    # Input: 3 features\n",
+    "    # Hidden: 4 neurons with ReLU\n",
+    "    # Output: 2 neurons with Sigmoid\n",
+    "    \n",
+    "    layer1 = Dense(input_size=3, output_size=4)\n",
+    "    activation1 = ReLU()\n",
+    "    layer2 = Dense(input_size=4, output_size=2)\n",
+    "    activation2 = Sigmoid()\n",
+    "    \n",
+    "    print(\"Network architecture:\")\n",
+    "    print(f\"  Input: 3 features\")\n",
+    "    print(f\"  Hidden: {layer1.input_size} → {layer1.output_size} (Dense + ReLU)\")\n",
+    "    print(f\"  Output: {layer2.input_size} → {layer2.output_size} (Dense + Sigmoid)\")\n",
+    "    \n",
+    "    # Test with sample data\n",
+    "    x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # 2 examples, 3 features each\n",
+    "    print(f\"\\nInput shape: {x.shape}\")\n",
+    "    print(f\"Input data: {x.data}\")\n",
+    "    \n",
+    "    # Forward pass through the network\n",
+    "    h1 = layer1(x)           # Dense layer 1\n",
+    "    h1_activated = activation1(h1)  # ReLU activation\n",
+    "    h2 = layer2(h1_activated)       # Dense layer 2  \n",
+    "    output = activation2(h2)        # Sigmoid activation\n",
+    "    \n",
+    "    print(f\"\\nAfter layer 1: {h1.shape}\")\n",
+    "    print(f\"After ReLU: {h1_activated.shape}\")\n",
+    "    print(f\"After layer 2: {h2.shape}\")\n",
+    "    print(f\"Final output: {output.shape}\")\n",
+    "    print(f\"Output values: {output.data}\")\n",
+    "    \n",
+    "    print(\"\\n🎉 Neural network working! You just built your first neural network!\")\n",
+    "    print(\"Notice how the network maps 3 input features to 2 outputs. The weights are still random: training comes later.\")\n",
+    "    \n",
+    "except Exception as e:\n",
+    "    print(f\"❌ Error: {e}\")\n",
+    "    print(\"Make sure to implement the layers and activations above!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "13dc6d9a",
+   "metadata": {
+    "cell_marker": "\"\"\""
+   },
+   "source": [
+    "## Step 4: Understanding What We Built\n",
+    "\n",
+    "Congratulations! You just implemented the fundamental building blocks of neural networks:\n",
+    "\n",
+    "### 🧱 **What You Built**\n",
+    "1. **Dense Layer**: Linear transformation `y = Wx + b`\n",
+    "2. **Activation Functions**: Nonlinear transformations (ReLU, Sigmoid, Tanh)\n",
+    "3. **Layer Composition**: Chaining layers to build networks\n",
+    "\n",
+    "### 🎯 **Key Insights**\n",
+    "- **Layers are functions**: They transform tensors from one space to another\n",
+    "- **Composition creates complexity**: Simple layers → complex networks\n",
+    "- **Nonlinearity is crucial**: Without it, deep networks are just linear transformations\n",
+    "- **Neural networks are function approximators**: They learn to map inputs to outputs\n",
+    "\n",
+    "### 🚀 **What's Next**\n",
+    "In the next modules, you'll learn:\n",
+    "- **Training**: How networks learn from data (backpropagation, optimizers)\n",
+    "- **Architectures**: Specialized layers for different problems (CNNs, RNNs)\n",
+    "- **Applications**: Using networks for real problems\n",
+    "\n",
+    "### 🔧 **Export to Package**\n",
+    "Run this to export your layers to the TinyTorch package:\n",
+    "```bash\n",
+    "python bin/tito.py sync\n",
+    "```\n",
+    "\n",
+    "Then test your implementation:\n",
+    "```bash\n",
+    "python bin/tito.py test --module layers\n",
+    "```\n",
+    "\n",
+    "**Great job! You've built the foundation of neural networks!** 🎉"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a54d8ce9",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Final demonstration: A more complex example\n",
+    "try:\n",
+    "    print(\"=== Final Demo: Image Classification Network ===\")\n",
+    "    \n",
+    "    # Simulate a small image: 28x28 pixels flattened to 784 features\n",
+    "    # This is like a tiny MNIST digit\n",
+    "    image_size = 28 * 28  # 784 pixels\n",
+    "    num_classes = 10      # 10 digits (0-9)\n",
+    "    \n",
+    "    # Build a 3-layer network for digit classification\n",
+    "    # 784 → 128 → 64 → 10\n",
+    "    layer1 = Dense(input_size=image_size, output_size=128)\n",
+    "    relu1 = ReLU()\n",
+    "    layer2 = Dense(input_size=128, output_size=64)\n",
+    "    relu2 = ReLU()\n",
+    "    layer3 = Dense(input_size=64, output_size=num_classes)\n",
+    "    output_activation = Sigmoid()  # Sigmoid as a simple \"probability-like\" output (not a true softmax)\n",
+    "    \n",
+    "    print(f\"Image classification network:\")\n",
+    "    print(f\"  Input: {image_size} pixels (28x28 image)\")\n",
+    "    print(f\"  Hidden 1: {layer1.input_size} → {layer1.output_size} (Dense + ReLU)\")\n",
+    "    print(f\"  Hidden 2: {layer2.input_size} → {layer2.output_size} (Dense + ReLU)\")\n",
+    "    print(f\"  Output: {layer3.input_size} → {layer3.output_size} (Dense + Sigmoid)\")\n",
+    "    \n",
+    "    # Simulate a batch of 5 images\n",
+    "    batch_size = 5\n",
+    "    fake_images = Tensor(np.random.randn(batch_size, image_size).astype(np.float32))\n",
+    "    \n",
+    "    # Forward pass\n",
+    "    h1 = relu1(layer1(fake_images))\n",
+    "    h2 = relu2(layer2(h1))\n",
+    "    predictions = output_activation(layer3(h2))\n",
+    "    \n",
+    "    print(f\"\\nBatch processing:\")\n",
+    "    print(f\"  Input batch shape: {fake_images.shape}\")\n",
+    "    print(f\"  Predictions shape: {predictions.shape}\")\n",
+    "    print(f\"  Sample predictions: {predictions.data[0]}\")  # First image predictions\n",
+    "    \n",
+    "    print(\"\\n🎉 You built a neural network that could classify images!\")\n",
+    "    print(\"With training, this network could learn to recognize handwritten digits!\")\n",
+    "    \n",
+    "except Exception as e:\n",
+    "    print(f\"❌ Error: {e}\")\n",
+    "    print(\"Check your layer implementations!\") "
+   ]
+  }
+ ],
+ "metadata": {
+  "jupytext": {
+   "main_language": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
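For a sense of scale, the final demo's 784 → 128 → 64 → 10 network has `input_size * output_size` weights plus `output_size` biases in each `Dense` layer, and the activations add no parameters. A small counting sketch (`dense_param_count` is a hypothetical helper, not part of TinyTorch):

```python
def dense_param_count(input_size, output_size, use_bias=True):
    """Weights (input_size x output_size) plus one bias per output unit."""
    return input_size * output_size + (output_size if use_bias else 0)

# Layer widths of the demo network: 784 -> 128 -> 64 -> 10
sizes = [784, 128, 64, 10]
per_layer = [dense_param_count(i, o) for i, o in zip(sizes[:-1], sizes[1:])]

print(per_layer)       # [100480, 8256, 650]
print(sum(per_layer))  # 109386
```

Almost all of the parameters sit in the first layer, because it connects every one of the 784 pixels to every hidden unit.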
--- a/modules/layers/layers_dev.py
+++ b/modules/layers/layers_dev.py
@@ -0,0 +1,548 @@
+# ---
+# jupyter:
+#   jupytext:
+#     text_representation:
+#       extension: .py
+#       format_name: percent
+#       format_version: '1.3'
+#       jupytext_version: 1.17.1
+# ---
+
+# %% [markdown]
+"""
+# Module 2: Layers - Neural Network Building Blocks
+
+Welcome to the Layers module! This is where neural networks begin. You'll implement the fundamental building blocks that transform tensors.
+
+## Learning Goals
+- Understand layers as functions that transform tensors: `y = f(x)`
+- Implement Dense layers with linear transformations: `y = xW + b` (each row of `x` is one example)
+- Add activation functions for nonlinearity (ReLU, Sigmoid, Tanh)
+- See how neural networks are just function composition
+- Build intuition before diving into training
+
+## Build → Use → Understand
+1. **Build**: Dense layers and activation functions
+2. **Use**: Transform tensors and see immediate results
+3. **Understand**: How neural networks transform information
+
+## Module → Package Structure
+**🎓 Teaching vs. 🔧 Building**: 
+- **Learning side**: Work in `modules/layers/layers_dev.py`  
+- **Building side**: Exports to `tinytorch/core/layers.py`
+
+This module builds the fundamental transformations that compose into neural networks.
+"""
+
+# %%
+#| default_exp core.layers
+
+# Setup and imports
+import numpy as np
+import sys
+from typing import Union, Optional, Callable
+import math
+
+# %%
+#| export
+import numpy as np
+import math
+import sys
+from typing import Union, Optional, Callable
+from tinytorch.core.tensor import Tensor
+
+# Import our Tensor class
+# sys.path.append('../../')
+# from modules.tensor.tensor_dev import Tensor
+
+# print("🔥 TinyTorch Layers Module")
+# print(f"NumPy version: {np.__version__}")
+# print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")
+# print("Ready to build neural network layers!")
+
+# %% [markdown]
+"""
+## Step 1: What is a Layer?
+
+A **layer** is a function that transforms tensors. Think of it as:
+- **Input**: Tensor with some shape
+- **Transformation**: Mathematical operation (linear, nonlinear, etc.)
+- **Output**: Tensor with possibly different shape
+
+**The fundamental insight**: Neural networks are just function composition!
+```
+x → Layer1 → Layer2 → Layer3 → y
+```
+
+**Why layers matter**:
+- They're the building blocks of all neural networks
+- Each layer learns a different transformation
+- Composing layers creates complex functions
+- Understanding layers = understanding neural networks
+
+Let's start with the most important layer: **Dense** (also called Linear or Fully Connected).
+"""
+
+# %%
+#| export
+class Dense:
+    """
+    Dense (Linear) Layer: y = xW + b, with W of shape (input_size, output_size)
+    
+    The fundamental building block of neural networks.
+    Performs linear transformation: matrix multiplication + bias addition.
+    
+    Args:
+        input_size: Number of input features
+        output_size: Number of output features
+        use_bias: Whether to include bias term (default: True)
+        
+    TODO: Implement the Dense layer with weight initialization and forward pass.
+    """
+    
+    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):
+        """
+        Initialize Dense layer with random weights.
+        
+        TODO: 
+        1. Store layer parameters (input_size, output_size, use_bias)
+        2. Initialize weights with small random values
+        3. Initialize bias to zeros (if use_bias=True)
+        """
+        raise NotImplementedError("Student implementation required")
+    
+    def forward(self, x: Tensor) -> Tensor:
+        """
+        Forward pass: y = xW + b
+        
+        Args:
+            x: Input tensor of shape (batch_size, input_size)
+            
+        Returns:
+            Output tensor of shape (batch_size, output_size)
+            
+        TODO: Implement matrix multiplication and bias addition
+        """
+        raise NotImplementedError("Student implementation required")
+    
+    def __call__(self, x: Tensor) -> Tensor:
+        """Make layer callable: layer(x) same as layer.forward(x)"""
+        return self.forward(x)
+
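The forward pass the TODO describes can be sketched in plain NumPy, separate from the `Tensor` wrapper. This is a minimal sketch, assuming the same row-vector convention as the reference solution (weights stored as `(input_size, output_size)`, so a batch computes `x @ W + b`); `dense_forward` is a hypothetical helper name, not part of TinyTorch.

```python
import numpy as np

def dense_forward(x, W, b=None):
    """Hypothetical helper: y = xW + b for a batch of row vectors.

    x: (batch_size, input_size)
    W: (input_size, output_size)
    b: (output_size,) or None
    """
    y = x @ W              # (batch_size, output_size)
    if b is not None:
        y = y + b          # b broadcasts across the batch dimension
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3)).astype(np.float32)  # batch of 2, 3 features
W = rng.standard_normal((3, 4)).astype(np.float32)  # 3 -> 4
b = np.arange(4, dtype=np.float32)

y = dense_forward(x, W, b)
print(y.shape)  # (2, 4)

# Same result in column-vector notation, one example at a time: W^T x + b
y_col = np.stack([W.T @ xi + b for xi in x])
print(np.allclose(y, y_col, atol=1e-5))
```

Storing `W` as `(input_size, output_size)` lets a whole batch go through a single matrix multiply, which is why the formula reads `xW` rather than `Wx` here.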
+# %%
+#| hide
+#| export
+class Dense:
+    """
+    Dense (Linear) Layer: y = xW + b, with W of shape (input_size, output_size)
+    
+    The fundamental building block of neural networks.
+    Performs linear transformation: matrix multiplication + bias addition.
+    """
+    
+    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):
+        """Initialize Dense layer with random weights."""
+        self.input_size = input_size
+        self.output_size = output_size
+        self.use_bias = use_bias
+        
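+        # Xavier/Glorot uniform initialization: W ~ U(-limit, limit) with
+        # limit = sqrt(6 / (fan_in + fan_out)), which keeps the variance of
+        # activations (and later, gradients) roughly constant from layer to layer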
+        limit = math.sqrt(6.0 / (input_size + output_size))
+        self.weights = Tensor(
+            np.random.uniform(-limit, limit, (input_size, output_size)).astype(np.float32)
+        )
+        
+        # Initialize bias to zeros
+        if use_bias:
+            self.bias = Tensor(np.zeros(output_size, dtype=np.float32))
+        else:
+            self.bias = None
+    
+    def forward(self, x: Tensor) -> Tensor:
+        """Forward pass: y = xW + b (x @ weights, plus bias)"""
+        # Matrix multiplication: x @ weights
+        # x shape: (batch_size, input_size)
+        # weights shape: (input_size, output_size)
+        # result shape: (batch_size, output_size)
+        output = Tensor(x.data @ self.weights.data)
+        
+        # Add bias if present
+        if self.bias is not None:
+            output = Tensor(output.data + self.bias.data)
+        
+        return output
+    
+    def __call__(self, x: Tensor) -> Tensor:
+        """Make layer callable: layer(x) same as layer.forward(x)"""
+        return self.forward(x)
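Where does `limit = sqrt(6 / (input_size + output_size))` come from? A uniform distribution on (-a, a) has variance a²/3, so this limit gives Var(W) = 2 / (fan_in + fan_out), the Xavier/Glorot target. A quick empirical check (the layer sizes and seed are arbitrary choices for illustration):

```python
import math
import numpy as np

fan_in, fan_out = 784, 128
limit = math.sqrt(6.0 / (fan_in + fan_out))

rng = np.random.default_rng(42)
W = rng.uniform(-limit, limit, size=(fan_in, fan_out))

# Var(U(-a, a)) = a^2 / 3 = 2 / (fan_in + fan_out)
target_var = 2.0 / (fan_in + fan_out)
print(f"limit={limit:.4f}  empirical var={W.var():.6f}  target={target_var:.6f}")
print(np.abs(W).max() <= limit)  # every sample lies inside [-limit, limit]
```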
+
+# %% [markdown]
+"""
+### 🧪 Test Your Dense Layer
+
+Once you implement the Dense layer above, run this cell to test it:
+"""
+
+# %%
+# Test the Dense layer
+try:
+    print("=== Testing Dense Layer ===")
+    
+    # Create a simple Dense layer: 3 inputs → 2 outputs
+    layer = Dense(input_size=3, output_size=2)
+    print(f"Created Dense layer: {layer.input_size} → {layer.output_size}")
+    print(f"Weights shape: {layer.weights.shape}")
+    print(f"Bias shape: {layer.bias.shape if layer.bias is not None else 'No bias'}")
+    
+    # Test with a single example
+    x = Tensor([[1.0, 2.0, 3.0]])  # Shape: (1, 3)
+    y = layer(x)
+    print(f"Input shape: {x.shape}")
+    print(f"Output shape: {y.shape}")
+    print(f"Input: {x.data}")
+    print(f"Output: {y.data}")
+    
+    # Test with batch of examples
+    x_batch = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # Shape: (2, 3)
+    y_batch = layer(x_batch)
+    print(f"\nBatch input shape: {x_batch.shape}")
+    print(f"Batch output shape: {y_batch.shape}")
+    print(f"Batch output: {y_batch.data}")
+    
+    print("✅ Dense layer working!")
+    
+except Exception as e:
+    print(f"❌ Error: {e}")
+    print("Make sure to implement the Dense layer above!")
+
+# %% [markdown]
+"""
+## Step 2: Activation Functions
+
+Dense layers alone can only learn **linear** transformations. But most real-world problems need **nonlinear** transformations.
+
+**Activation functions** add nonlinearity:
+- **ReLU**: `max(0, x)` - Most common, simple and effective
+- **Sigmoid**: `1 / (1 + e^(-x))` - Squashes to (0, 1)
+- **Tanh**: `tanh(x)` - Squashes to (-1, 1)
+
+**Why nonlinearity matters**: Without it, stacking layers gains nothing, because a chain of linear layers collapses into a single linear layer:
+```
+Linear → Linear → Linear = Just one big Linear transformation
+Linear → NonLinear → Linear = Can learn complex patterns
+```
+"""
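The collapse can be verified directly: two stacked linear layers x → xW1 + b1 → (xW1 + b1)W2 + b2 equal a single layer with W = W1W2 and b = b1W2 + b2. A NumPy check with arbitrary random weights, plus the same stack with a ReLU in between, which no longer matches the merged layer:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((5, 3))
W1, b1 = rng.standard_normal((3, 4)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((4, 2)), rng.standard_normal(2)

# Two stacked linear layers ...
two_layers = (x @ W1 + b1) @ W2 + b2

# ... equal one linear layer with merged parameters
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b
print(np.allclose(two_layers, one_layer))  # True

# With a ReLU in between, the merged layer no longer reproduces the output
with_relu = np.maximum(0, x @ W1 + b1) @ W2 + b2
print(np.allclose(with_relu, one_layer))
```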
+
+# %%
+#| export
+class ReLU:
+    """
+    ReLU Activation: f(x) = max(0, x)
+    
+    The most popular activation function in deep learning.
+    Simple, effective, and computationally efficient.
+    
+    TODO: Implement ReLU activation function.
+    """
+    
+    def forward(self, x: Tensor) -> Tensor:
+        """
+        Apply ReLU: f(x) = max(0, x)
+        
+        Args:
+            x: Input tensor
+            
+        Returns:
+            Output tensor with ReLU applied element-wise
+            
+        TODO: Implement element-wise max(0, x) operation
+        """
+        raise NotImplementedError("Student implementation required")
+    
+    def __call__(self, x: Tensor) -> Tensor:
+        """Make activation callable: relu(x) same as relu.forward(x)"""
+        return self.forward(x)
+
+# %%
+#| hide
+#| export
+class ReLU:
+    """ReLU Activation: f(x) = max(0, x)"""
+    
+    def forward(self, x: Tensor) -> Tensor:
+        """Apply ReLU: f(x) = max(0, x)"""
+        return Tensor(np.maximum(0, x.data))
+    
+    def __call__(self, x: Tensor) -> Tensor:
+        return self.forward(x)
+
+# %%
+#| export
+class Sigmoid:
+    """
+    Sigmoid Activation: f(x) = 1 / (1 + e^(-x))
+    
+    Squashes input to range (0, 1). Often used for binary classification.
+    
+    TODO: Implement Sigmoid activation function.
+    """
+    
+    def forward(self, x: Tensor) -> Tensor:
+        """
+        Apply Sigmoid: f(x) = 1 / (1 + e^(-x))
+        
+        Args:
+            x: Input tensor
+            
+        Returns:
+            Output tensor with Sigmoid applied element-wise
+            
+        TODO: Implement sigmoid function (be careful with numerical stability!)
+        """
+        raise NotImplementedError("Student implementation required")
+    
+    def __call__(self, x: Tensor) -> Tensor:
+        return self.forward(x)
+
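The stability warning matters because the textbook formula calls `np.exp(-x)`, which overflows float64 once `-x` exceeds roughly 709. The sketch below compares the naive formula with the two-branch version used in the reference solution; `np.errstate(over="raise")` turns the overflow warning into a `FloatingPointError` so the difference is easy to see. The function names are illustrative only.

```python
import numpy as np

def sigmoid_naive(x):
    return 1.0 / (1.0 + np.exp(-x))  # np.exp(-x) overflows for large negative x

def sigmoid_stable(x):
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))  # exp(-x) <= 1 on this branch
    exp_x = np.exp(x[~pos])                   # exp(x) < 1 on this branch
    out[~pos] = exp_x / (1.0 + exp_x)
    return out

x = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])

# Raise on overflow instead of only warning
with np.errstate(over="raise"):
    try:
        sigmoid_naive(x)
        naive_overflowed = False
    except FloatingPointError:
        naive_overflowed = True
    stable = sigmoid_stable(x)

print("naive overflowed:", naive_overflowed)  # naive overflowed: True
print(stable)
```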
+# %%
+#| hide
+#| export
+class Sigmoid:
+    """Sigmoid Activation: f(x) = 1 / (1 + e^(-x))"""
+    
+    def forward(self, x: Tensor) -> Tensor:
+        """Apply Sigmoid with numerical stability"""
+        # Use the numerically stable version to avoid overflow
+        # For x >= 0: sigmoid(x) = 1 / (1 + exp(-x))
+        # For x < 0: sigmoid(x) = exp(x) / (1 + exp(x))
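+        x_data = x.data
+        # Allocate a float result so integer inputs are not truncated on assignment
+        result = np.zeros_like(x_data, dtype=np.result_type(x_data.dtype, np.float32))
+        
+        # Stable computation: exp() is only ever called on non-positive values
+        positive_mask = x_data >= 0
+        result[positive_mask] = 1.0 / (1.0 + np.exp(-x_data[positive_mask]))
+        exp_x = np.exp(x_data[~positive_mask])
+        result[~positive_mask] = exp_x / (1.0 + exp_x)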
+        
+        return Tensor(result)
+    
+    def __call__(self, x: Tensor) -> Tensor:
+        return self.forward(x)
+
+# %%
+#| export
+class Tanh:
+    """
+    Tanh Activation: f(x) = tanh(x)
+    
+    Squashes input to range (-1, 1). Zero-centered output.
+    
+    TODO: Implement Tanh activation function.
+    """
+    
+    def forward(self, x: Tensor) -> Tensor:
+        """
+        Apply Tanh: f(x) = tanh(x)
+        
+        Args:
+            x: Input tensor
+            
+        Returns:
+            Output tensor with Tanh applied element-wise
+            
+        TODO: Implement tanh function
+        """
+        raise NotImplementedError("Student implementation required")
+    
+    def __call__(self, x: Tensor) -> Tensor:
+        return self.forward(x)
+
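Tanh and Sigmoid are closely related: tanh(x) = 2·sigmoid(2x) - 1, so tanh is a sigmoid stretched to (-1, 1) and shifted to be zero-centered. A quick numerical check of that identity:

```python
import numpy as np

x = np.linspace(-5, 5, 11)

# Naive sigmoid is fine here: |2x| <= 10, far from overflow
sigmoid_2x = 1.0 / (1.0 + np.exp(-2.0 * x))
tanh_via_sigmoid = 2.0 * sigmoid_2x - 1.0

print(np.allclose(np.tanh(x), tanh_via_sigmoid))  # True
print(np.tanh(0.0))                               # 0.0
```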
+# %%
+#| hide
+#| export
+class Tanh:
+    """Tanh Activation: f(x) = tanh(x)"""
+    
+    def forward(self, x: Tensor) -> Tensor:
+        """Apply Tanh"""
+        return Tensor(np.tanh(x.data))
+    
+    def __call__(self, x: Tensor) -> Tensor:
+        return self.forward(x)
+
+# %% [markdown]
+"""
+### 🧪 Test Your Activation Functions
+
+Once you implement the activation functions above, run this cell to test them:
+"""
+
+# %%
+# Test activation functions
+try:
+    print("=== Testing Activation Functions ===")
+    
+    # Test data: mix of positive, negative, and zero
+    x = Tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]])
+    print(f"Input: {x.data}")
+    
+    # Test ReLU
+    relu = ReLU()
+    y_relu = relu(x)
+    print(f"ReLU output: {y_relu.data}")
+    
+    # Test Sigmoid
+    sigmoid = Sigmoid()
+    y_sigmoid = sigmoid(x)
+    print(f"Sigmoid output: {y_sigmoid.data}")
+    
+    # Test Tanh
+    tanh = Tanh()
+    y_tanh = tanh(x)
+    print(f"Tanh output: {y_tanh.data}")
+    
+    print("✅ Activation functions working!")
+    
+except Exception as e:
+    print(f"❌ Error: {e}")
+    print("Make sure to implement the activation functions above!")
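To check the printed activation outputs by hand, reference values for the same input can be computed with plain NumPy, independent of the `Tensor` class (the naive sigmoid formula is safe for inputs this small):

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# Reference values computed directly with NumPy
expected = {
    "ReLU": np.maximum(0.0, x),
    "Sigmoid": 1.0 / (1.0 + np.exp(-x)),
    "Tanh": np.tanh(x),
}
for name, values in expected.items():
    print(f"{name:8s}", np.round(values, 4))
```

For example, sigmoid(1) ≈ 0.7311 and tanh(1) ≈ 0.7616; your layer outputs should agree up to float32 rounding.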
+
+# %% [markdown]
+"""
+## Step 3: Layer Composition - Building Neural Networks
+
+Now comes the magic! We can **compose** layers to build neural networks:
+
+```
+Input → Dense → ReLU → Dense → Sigmoid → Output
+```
+
+This is a 2-layer neural network that can learn complex nonlinear patterns!
+"""
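Writing each intermediate variable by hand gets tedious as networks grow. A common pattern is a container that calls its layers in order. The `Sequential` class below is a hypothetical sketch (this module does not define it); to keep it self-contained, NumPy lambdas stand in for `Dense`, `ReLU` and `Sigmoid`.

```python
import numpy as np

class Sequential:
    """Hypothetical container: feeds each layer's output into the next layer."""

    def __init__(self, *layers):
        self.layers = list(layers)

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# NumPy stand-ins for Dense/ReLU/Sigmoid so the sketch runs on its own
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((3, 4)), np.zeros(4)
W2, b2 = rng.standard_normal((4, 2)), np.zeros(2)

net = Sequential(
    lambda x: x @ W1 + b1,            # Dense 3 -> 4
    lambda x: np.maximum(0, x),       # ReLU
    lambda x: x @ W2 + b2,            # Dense 4 -> 2
    lambda x: 1 / (1 + np.exp(-x)),   # Sigmoid
)

x = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
out = net(x)
print(out.shape)  # (2, 2)
```

Since this module's `Dense`, `ReLU`, `Sigmoid` and `Tanh` are all callable, the same container should in principle accept them directly, e.g. `Sequential(Dense(3, 4), ReLU(), Dense(4, 2), Sigmoid())`.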
+
+# %%
+# Build a simple 2-layer neural network
+try:
+    print("=== Building a 2-Layer Neural Network ===")
+    
+    # Network architecture: 3 → 4 → 2
+    # Input: 3 features
+    # Hidden: 4 neurons with ReLU
+    # Output: 2 neurons with Sigmoid
+    
+    layer1 = Dense(input_size=3, output_size=4)
+    activation1 = ReLU()
+    layer2 = Dense(input_size=4, output_size=2)
+    activation2 = Sigmoid()
+    
+    print("Network architecture:")
+    print(f"  Input: 3 features")
+    print(f"  Hidden: {layer1.input_size} → {layer1.output_size} (Dense + ReLU)")
+    print(f"  Output: {layer2.input_size} → {layer2.output_size} (Dense + Sigmoid)")
+    
+    # Test with sample data
+    x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # 2 examples, 3 features each
+    print(f"\nInput shape: {x.shape}")
+    print(f"Input data: {x.data}")
+    
+    # Forward pass through the network
+    h1 = layer1(x)           # Dense layer 1
+    h1_activated = activation1(h1)  # ReLU activation
+    h2 = layer2(h1_activated)       # Dense layer 2  
+    output = activation2(h2)        # Sigmoid activation
+    
+    print(f"\nAfter layer 1: {h1.shape}")
+    print(f"After ReLU: {h1_activated.shape}")
+    print(f"After layer 2: {h2.shape}")
+    print(f"Final output: {output.shape}")
+    print(f"Output values: {output.data}")
+    
+    print("\n🎉 Neural network working! You just built your first neural network!")
+    print("Notice how the network maps 3 input features to 2 outputs. The weights are still random; training is what will make them useful.")
+    
+except Exception as e:
+    print(f"❌ Error: {e}")
+    print("Make sure to implement the layers and activations above!")
+
+# %% [markdown]
+"""
+## Step 4: Understanding What We Built
+
+Congratulations! You just implemented the fundamental building blocks of neural networks:
+
+### 🧱 **What You Built**
+1. **Dense Layer**: Linear transformation `y = xW + b`
+2. **Activation Functions**: Nonlinear transformations (ReLU, Sigmoid, Tanh)
+3. **Layer Composition**: Chaining layers to build networks
+
+### 🎯 **Key Insights**
+- **Layers are functions**: They transform tensors from one space to another
+- **Composition creates complexity**: Simple layers → complex networks
+- **Nonlinearity is crucial**: Without it, deep networks are just linear transformations
+- **Neural networks are function approximators**: They learn to map inputs to outputs
+
+### 🚀 **What's Next**
+In the next modules, you'll learn:
+- **Training**: How networks learn from data (backpropagation, optimizers)
+- **Architectures**: Specialized layers for different problems (CNNs, RNNs)
+- **Applications**: Using networks for real problems
+
+### 🔧 **Export to Package**
+Run this to export your layers to the TinyTorch package:
+```bash
+python bin/tito.py sync
+```
+
+Then test your implementation:
+```bash
+python bin/tito.py test --module layers
+```
+
+**Great job! You've built the foundation of neural networks!** 🎉
+"""
+
+# %%
+# Final demonstration: A more complex example
+try:
+    print("=== Final Demo: Image Classification Network ===")
+    
+    # Simulate a 28x28 grayscale image flattened to 784 features
+    # (the same size as an MNIST handwritten digit)
+    image_size = 28 * 28  # 784 pixels
+    num_classes = 10      # 10 digits (0-9)
+    
+    # Build a 3-layer network for digit classification
+    # 784 → 128 → 64 → 10
+    layer1 = Dense(input_size=image_size, output_size=128)
+    relu1 = ReLU()
+    layer2 = Dense(input_size=128, output_size=64)
+    relu2 = ReLU()
+    layer3 = Dense(input_size=64, output_size=num_classes)
+    sigmoid = Sigmoid()  # Per-class scores in (0, 1); unlike softmax, they do not sum to 1
+    
+    print("Image classification network:")
+    print(f"  Input: {image_size} pixels (28x28 image)")
+    print(f"  Hidden 1: {layer1.input_size} → {layer1.output_size} (Dense + ReLU)")
+    print(f"  Hidden 2: {layer2.input_size} → {layer2.output_size} (Dense + ReLU)")
+    print(f"  Output: {layer3.input_size} → {layer3.output_size} (Dense + Sigmoid)")
+    
+    # Simulate a batch of 5 images
+    batch_size = 5
+    fake_images = Tensor(np.random.randn(batch_size, image_size).astype(np.float32))
+    
+    # Forward pass
+    h1 = relu1(layer1(fake_images))
+    h2 = relu2(layer2(h1))
+    predictions = sigmoid(layer3(h2))
+    
+    print(f"\nBatch processing:")
+    print(f"  Input batch shape: {fake_images.shape}")
+    print(f"  Predictions shape: {predictions.shape}")
+    print(f"  Sample predictions: {predictions.data[0]}")  # First image predictions
+    
+    print("\n🎉 You built a neural network that could classify images!")
+    print("With training, this network could learn to recognize handwritten digits!")
+    
+except Exception as e:
+    print(f"❌ Error: {e}")
+    print("Check your layer implementations!") 
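The final demo uses Sigmoid as a "probability-like" output. Sigmoid scores each class independently, so the 10 scores per image do not sum to 1; softmax normalizes across the classes instead. A NumPy comparison on random stand-in logits (not the demo's actual outputs), using the standard max-subtraction trick for a numerically stable softmax:

```python
import numpy as np

def softmax(logits, axis=-1):
    """Stable softmax: subtracting the row max does not change the result."""
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stand-in for layer3's output: 5 images x 10 classes
rng = np.random.default_rng(0)
logits = rng.standard_normal((5, 10))

p_softmax = softmax(logits)
p_sigmoid = sigmoid(logits)

print(p_softmax.sum(axis=1))  # each row sums to 1 (up to rounding)
print(p_sigmoid.sum(axis=1))  # rows generally do not sum to 1
print(np.array_equal(p_softmax.argmax(axis=1), p_sigmoid.argmax(axis=1)))  # True
```

Both functions are increasing in each logit, so the predicted class (the argmax) is the same; they differ only in whether the scores form a probability distribution over the classes.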
--- a/modules/layers/tests/test_layers.py
+++ b/modules/layers/tests/test_layers.py
@@ -0,0 +1,343 @@
+"""
+Tests for TinyTorch Layers module.
+
+Tests the core layer functionality including Dense layers, activation functions,
+and layer composition.
+
+These tests work with the current implementation and provide stretch goals
+for students to implement additional features.
+"""
+
+import sys
+import os
+import pytest
+import numpy as np
+
+# Add the parent directory to path to import layers_dev
+sys.path.insert(0, os.path.dirname(os.path.dirname(__file__)))
+
+# Import from the module's development file
+# Note: layers_dev defines each class twice (student stub, then the hidden
+# instructor solution); the later solution definitions are what get imported
+from layers_dev import Dense, ReLU, Sigmoid, Tanh, Tensor
+
+def safe_numpy(tensor):
+    """Get numpy array from tensor, using .numpy() if available, otherwise .data"""
+    if hasattr(tensor, 'numpy'):
+        return tensor.numpy()
+    else:
+        return tensor.data
+
+class TestDenseLayer:
+    """Test Dense (Linear) layer functionality."""
+    
+    def test_dense_creation(self):
+        """Test creating Dense layers with different configurations."""
+        # Basic dense layer
+        layer = Dense(input_size=3, output_size=2)
+        assert layer.input_size == 3
+        assert layer.output_size == 2
+        assert layer.use_bias is True
+        assert layer.weights.shape == (3, 2)
+        assert layer.bias.shape == (2,)
+        
+        # Dense layer without bias
+        layer_no_bias = Dense(input_size=4, output_size=3, use_bias=False)
+        assert layer_no_bias.use_bias is False
+        assert layer_no_bias.bias is None
+    
+    def test_dense_forward_single(self):
+        """Test Dense layer forward pass with single input."""
+        layer = Dense(input_size=3, output_size=2)
+        
+        # Single input
+        x = Tensor([[1.0, 2.0, 3.0]])
+        y = layer(x)
+        
+        assert y.shape == (1, 2)
+        assert isinstance(y, Tensor)
+    
+    def test_dense_forward_batch(self):
+        """Test Dense layer forward pass with batch input."""
+        layer = Dense(input_size=3, output_size=2)
+        
+        # Batch input
+        x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
+        y = layer(x)
+        
+        assert y.shape == (2, 2)
+        assert isinstance(y, Tensor)
+    
+    def test_dense_no_bias(self):
+        """Test Dense layer without bias."""
+        layer = Dense(input_size=2, output_size=1, use_bias=False)
+        
+        x = Tensor([[1.0, 2.0]])
+        y = layer(x)
+        
+        assert y.shape == (1, 1)
+        # Should be just matrix multiplication without bias
+        expected = safe_numpy(x) @ safe_numpy(layer.weights)
+        np.testing.assert_array_almost_equal(safe_numpy(y), expected)
+    
+    def test_dense_callable(self):
+        """Test that Dense layer is callable."""
+        layer = Dense(input_size=2, output_size=1)
+        x = Tensor([[1.0, 2.0]])
+        
+        # Both should work
+        y1 = layer.forward(x)
+        y2 = layer(x)
+        
+        np.testing.assert_array_equal(safe_numpy(y1), safe_numpy(y2))
+
+class TestActivationFunctions:
+    """Test activation function implementations."""
+    
+    def test_relu_basic(self):
+        """Test ReLU activation function."""
+        relu = ReLU()
+        x = Tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]])
+        y = relu(x)
+        
+        expected = [[0.0, 0.0, 0.0, 1.0, 2.0]]
+        np.testing.assert_array_equal(safe_numpy(y), expected)
+    
+    def test_relu_callable(self):
+        """Test that ReLU is callable."""
+        relu = ReLU()
+        x = Tensor([[1.0, -1.0]])
+        
+        y1 = relu.forward(x)
+        y2 = relu(x)
+        
+        np.testing.assert_array_equal(safe_numpy(y1), safe_numpy(y2))
+    
+    def test_sigmoid_basic(self):
+        """Test Sigmoid activation function."""
+        sigmoid = Sigmoid()
+        x = Tensor([[0.0]])  # sigmoid(0) = 0.5
+        y = sigmoid(x)
+        
+        np.testing.assert_array_almost_equal(safe_numpy(y), [[0.5]])
+    
+    def test_sigmoid_range(self):
+        """Test Sigmoid output range."""
+        sigmoid = Sigmoid()
+        x = Tensor([[-10.0, 0.0, 10.0]])
+        y = sigmoid(x)
+        
+        # Should be in range [0, 1] - use reasonable bounds
+        assert np.all(safe_numpy(y) >= 0)
+        assert np.all(safe_numpy(y) <= 1)
+        # Check that extreme values are close to bounds
+        assert safe_numpy(y)[0][0] < 0.01  # Close to 0 for -10
+        assert safe_numpy(y)[0][2] > 0.99  # Close to 1 for 10
+    
+    def test_tanh_basic(self):
+        """Test Tanh activation function."""
+        tanh = Tanh()
+        x = Tensor([[0.0]])  # tanh(0) = 0
+        y = tanh(x)
+        
+        np.testing.assert_array_almost_equal(safe_numpy(y), [[0.0]])
+    
+    def test_tanh_range(self):
+        """Test Tanh output range."""
+        tanh = Tanh()
+        x = Tensor([[-10.0, 0.0, 10.0]])
+        y = tanh(x)
+        
+        # Should be in range [-1, 1] - use reasonable bounds
+        assert np.all(safe_numpy(y) >= -1)
+        assert np.all(safe_numpy(y) <= 1)
+        # Check that extreme values are close to bounds
+        assert safe_numpy(y)[0][0] < -0.99  # Close to -1 for -10
+        assert safe_numpy(y)[0][2] > 0.99   # Close to 1 for 10
+
+class TestLayerComposition:
+    """Test composing layers into neural networks."""
+    
+    def test_simple_network(self):
+        """Test a simple 2-layer network."""
+        # 3 → 4 → 2 network
+        layer1 = Dense(input_size=3, output_size=4)
+        relu = ReLU()
+        layer2 = Dense(input_size=4, output_size=2)
+        sigmoid = Sigmoid()
+        
+        # Forward pass
+        x = Tensor([[1.0, 2.0, 3.0]])
+        h1 = layer1(x)
+        h1_activated = relu(h1)
+        h2 = layer2(h1_activated)
+        output = sigmoid(h2)
+        
+        assert h1.shape == (1, 4)
+        assert h1_activated.shape == (1, 4)
+        assert h2.shape == (1, 2)
+        assert output.shape == (1, 2)
+        
+        # Output should be in sigmoid range
+        assert np.all(safe_numpy(output) >= 0)
+        assert np.all(safe_numpy(output) <= 1)
+    
+    def test_batch_network(self):
+        """Test network with batch processing."""
+        layer1 = Dense(input_size=2, output_size=3)
+        relu = ReLU()
+        layer2 = Dense(input_size=3, output_size=1)
+        
+        # Batch of 4 examples
+        x = Tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
+        
+        h1 = layer1(x)
+        h1_activated = relu(h1)
+        output = layer2(h1_activated)
+        
+        assert output.shape == (4, 1)
+    
+    def test_deep_network(self):
+        """Test deeper network composition."""
+        # 4 Dense layers (10 → 8 → 6 → 4 → 2), each followed by an activation
+        layers = [
+            Dense(input_size=10, output_size=8),
+            ReLU(),
+            Dense(input_size=8, output_size=6),
+            ReLU(),
+            Dense(input_size=6, output_size=4),
+            ReLU(),
+            Dense(input_size=4, output_size=2),
+            Sigmoid()
+        ]
+        
+        x = Tensor([[1.0] * 10])  # 10 features
+        
+        # Forward pass through all layers
+        current = x
+        for layer in layers:
+            current = layer(current)
+        
+        assert current.shape == (1, 2)
+        # Final output should be in sigmoid range
+        assert np.all(safe_numpy(current) >= 0)
+        assert np.all(safe_numpy(current) <= 1)
+
+class TestEdgeCases:
+    """Test edge cases and error conditions."""
+    
+    def test_zero_input(self):
+        """Test layers with zero input."""
+        layer = Dense(input_size=3, output_size=2)
+        relu = ReLU()
+        
+        x = Tensor([[0.0, 0.0, 0.0]])
+        y = layer(x)
+        y_relu = relu(y)
+        
+        assert y.shape == (1, 2)
+        assert y_relu.shape == (1, 2)
+    
+    def test_large_input(self):
+        """Test layers with large input values."""
+        layer = Dense(input_size=2, output_size=1)
+        sigmoid = Sigmoid()
+        
+        x = Tensor([[1000.0, -1000.0]])
+        y = layer(x)
+        y_sigmoid = sigmoid(y)
+        
+        # Should not overflow
+        assert not np.any(np.isnan(safe_numpy(y_sigmoid)))
+        assert not np.any(np.isinf(safe_numpy(y_sigmoid)))
+    
+    def test_single_neuron(self):
+        """Test single neuron layers."""
+        layer = Dense(input_size=1, output_size=1)
+        x = Tensor([[5.0]])
+        y = layer(x)
+        
+        assert y.shape == (1, 1)
+
+# Stretch goal tests (skipped unconditionally via @pytest.mark.skip; remove the
+# marker once the corresponding feature is implemented)
+class TestStretchGoals:
+    """Stretch goal tests for advanced features."""
+    
+    @pytest.mark.skip(reason="Stretch goal: Weight initialization methods")
+    def test_weight_initialization_methods(self):
+        """Test different weight initialization strategies."""
+        # Xavier initialization
+        layer_xavier = Dense(input_size=100, output_size=50, init_method='xavier')
+        weights_xavier = safe_numpy(layer_xavier.weights)
+        
+        # He initialization  
+        layer_he = Dense(input_size=100, output_size=50, init_method='he')
+        weights_he = safe_numpy(layer_he.weights)
+        
+        # Check initialization ranges
+        xavier_limit = np.sqrt(6.0 / (100 + 50))
+        assert np.all(np.abs(weights_xavier) <= xavier_limit)
+        
+        he_limit = np.sqrt(2.0 / 100)
+        assert np.std(weights_he) <= he_limit * 1.5  # Some tolerance
+    
+    @pytest.mark.skip(reason="Stretch goal: Layer parameter access")
+    def test_layer_parameters(self):
+        """Test accessing and modifying layer parameters."""
+        layer = Dense(input_size=3, output_size=2)
+        
+        # Should be able to access parameters
+        assert hasattr(layer, 'parameters')
+        params = layer.parameters()
+        assert len(params) == 2  # weights and bias
+        
+        # Should be able to set parameters
+        new_weights = Tensor(np.ones((3, 2)))
+        layer.set_weights(new_weights)
+        np.testing.assert_array_equal(safe_numpy(layer.weights), safe_numpy(new_weights))
+    
+    @pytest.mark.skip(reason="Stretch goal: Additional activation functions")
+    def test_additional_activations(self):
+        """Test additional activation functions."""
+        # Leaky ReLU
+        leaky_relu = LeakyReLU(alpha=0.1)
+        x = Tensor([[-1.0, 0.0, 1.0]])
+        y = leaky_relu(x)
+        expected = [[-0.1, 0.0, 1.0]]
+        np.testing.assert_array_almost_equal(safe_numpy(y), expected)
+        
+        # Softmax
+        softmax = Softmax()
+        x = Tensor([[1.0, 2.0, 3.0]])
+        y = softmax(x)
+        # Should sum to 1
+        assert np.allclose(np.sum(safe_numpy(y)), 1.0)
+    
+    @pytest.mark.skip(reason="Stretch goal: Dropout layer")
+    def test_dropout_layer(self):
+        """Test dropout layer implementation."""
+        dropout = Dropout(p=0.5)
+        x = Tensor([[1.0, 2.0, 3.0, 4.0]])
+        
+        # Training mode
+        dropout.train()
+        y_train = dropout(x)
+        
+        # Inference mode
+        dropout.eval()
+        y_eval = dropout(x)
+        
+        # In eval mode, should be same as input
+        np.testing.assert_array_equal(safe_numpy(y_eval), safe_numpy(x))
+    
+    @pytest.mark.skip(reason="Stretch goal: Batch normalization")
+    def test_batch_normalization(self):
+        """Test batch normalization layer."""
+        bn = BatchNorm1d(num_features=3)
+        x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
+        y = bn(x)
+        
+        # Should normalize across batch dimension
+        assert y.shape == x.shape
+        # Mean should be close to 0, std close to 1
+        assert np.allclose(np.mean(safe_numpy(y), axis=0), 0.0, atol=1e-6)
+        assert np.allclose(np.std(safe_numpy(y), axis=0), 1.0, atol=1e-6) 
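The batch-normalization stretch test expects each feature to come out with mean ≈ 0 and standard deviation ≈ 1. Below is a minimal NumPy sketch of that normalization step. It assumes no learnable scale/shift and an `eps` of 1e-5, which is a common choice that the test does not specify. The sketch shows that `eps` pulls the std about two millionths below 1. The test's `np.allclose` call still accepts this, because its default relative tolerance (1e-5) is added to `atol`.

```python
import numpy as np

def batch_norm_forward(x, eps=1e-5):
    """Normalize each feature (column) over the batch; no learnable gamma/beta."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)  # biased (population) variance over the batch
    return (x - mean) / np.sqrt(var + eps)

x = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
y = batch_norm_forward(x)

print(y.mean(axis=0))  # ~0 for every feature
print(y.std(axis=0))   # slightly below 1: sqrt(2.25 / (2.25 + 1e-5))
```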
--- a/tests/test_layers.py
+++ b/tests/test_layers.py
@@ -0,0 +1,242 @@
+"""
+Integration tests for TinyTorch Layers package.
+
+Tests the exported layers functionality that students will use.
+These tests ensure the student experience works correctly.
+"""
+
+import pytest
+import numpy as np
+from tinytorch.core.layers import Dense, ReLU, Sigmoid, Tanh
+from tinytorch.core.tensor import Tensor
+
+
+class TestDenseLayerIntegration:
+    """Test Dense layer integration with exported package."""
+    
+    def test_dense_basic_functionality(self):
+        """Test basic Dense layer functionality."""
+        layer = Dense(input_size=3, output_size=2)
+        x = Tensor([[1.0, 2.0, 3.0]])
+        y = layer(x)
+        
+        assert y.shape == (1, 2)
+        assert isinstance(y, Tensor)
+    
+    def test_dense_batch_processing(self):
+        """Test Dense layer with batch processing."""
+        layer = Dense(input_size=2, output_size=3)
+        x = Tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
+        y = layer(x)
+        
+        assert y.shape == (3, 3)
+        assert isinstance(y, Tensor)
+    
+    def test_dense_no_bias(self):
+        """Test Dense layer without bias."""
+        layer = Dense(input_size=2, output_size=1, use_bias=False)
+        x = Tensor([[1.0, 2.0]])
+        y = layer(x)
+        
+        assert y.shape == (1, 1)
+        assert layer.bias is None
+
+
+class TestActivationFunctionsIntegration:
+    """Test activation functions integration."""
+    
+    def test_relu_integration(self):
+        """Test ReLU activation function."""
+        relu = ReLU()
+        x = Tensor([[-1.0, 0.0, 1.0]])
+        y = relu(x)
+        
+        expected = [[0.0, 0.0, 1.0]]
+        np.testing.assert_array_equal(y.data, expected)
+    
+    def test_sigmoid_integration(self):
+        """Test Sigmoid activation function."""
+        sigmoid = Sigmoid()
+        x = Tensor([[0.0]])
+        y = sigmoid(x)
+        
+        np.testing.assert_array_almost_equal(y.data, [[0.5]])
+    
+    def test_tanh_integration(self):
+        """Test Tanh activation function."""
+        tanh = Tanh()
+        x = Tensor([[0.0]])
+        y = tanh(x)
+        
+        np.testing.assert_array_almost_equal(y.data, [[0.0]])
+
+
+class TestNeuralNetworkIntegration:
+    """Test complete neural network integration."""
+    
+    def test_simple_network_integration(self):
+        """Test building a simple neural network."""
+        # 3 → 4 → 2 network
+        layer1 = Dense(input_size=3, output_size=4)
+        relu = ReLU()
+        layer2 = Dense(input_size=4, output_size=2)
+        sigmoid = Sigmoid()
+        
+        # Forward pass
+        x = Tensor([[1.0, 2.0, 3.0]])
+        h1 = layer1(x)
+        h1_activated = relu(h1)
+        h2 = layer2(h1_activated)
+        output = sigmoid(h2)
+        
+        assert output.shape == (1, 2)
+        # Output should be in sigmoid range
+        assert np.all(output.data >= 0)
+        assert np.all(output.data <= 1)
+    
+    def test_batch_network_integration(self):
+        """Test network with batch processing."""
+        layer1 = Dense(input_size=2, output_size=3)
+        relu = ReLU()
+        layer2 = Dense(input_size=3, output_size=1)
+        
+        # Batch of 4 examples
+        x = Tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
+        
+        h1 = layer1(x)
+        h1_activated = relu(h1)
+        output = layer2(h1_activated)
+        
+        assert output.shape == (4, 1)
+    
+    def test_image_classification_network(self):
+        """Test a realistic image classification network."""
+        # Simulate MNIST: 784 → 128 → 64 → 10
+        layer1 = Dense(input_size=784, output_size=128)
+        relu1 = ReLU()
+        layer2 = Dense(input_size=128, output_size=64)
+        relu2 = ReLU()
+        layer3 = Dense(input_size=64, output_size=10)
+        sigmoid = Sigmoid()
+        
+        # Simulate a batch of 3 images
+        batch_size = 3
+        fake_images = Tensor(np.random.randn(batch_size, 784).astype(np.float32))
+        
+        # Forward pass
+        h1 = relu1(layer1(fake_images))
+        h2 = relu2(layer2(h1))
+        predictions = sigmoid(layer3(h2))
+        
+        assert predictions.shape == (batch_size, 10)
+        # All predictions should be in [0, 1] range
+        assert np.all(predictions.data >= 0)
+        assert np.all(predictions.data <= 1)
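This sketch is a worked check on the size of the 784 → 128 → 64 → 10 network in `test_image_classification_network`. It assumes each Dense layer holds an `(input_size, output_size)` weight matrix plus an `output_size` bias vector, as `tinytorch/core/layers.py` does. `dense_param_count` is a hypothetical helper, not part of the package, and the code uses only the standard library:

```python
# Hypothetical helper (not part of tinytorch): count trainable values in a
# stack of Dense layers, assuming weights of shape (input_size, output_size)
# and a bias vector of length output_size.

def dense_param_count(input_size: int, output_size: int, use_bias: bool = True) -> int:
    """Number of trainable values in one Dense layer."""
    weights = input_size * output_size
    bias = output_size if use_bias else 0
    return weights + bias

layer_sizes = [784, 128, 64, 10]  # MNIST-style network from the test
per_layer = [
    dense_param_count(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total = sum(per_layer)

print(per_layer)  # [100480, 8256, 650]
print(total)      # 109386
```

Roughly 92% of the parameters are in the first layer, because its weight matrix scales with the 784 input features.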
+
+
+class TestLayerCompositionIntegration:
+    """Test layer composition patterns."""
+    
+    def test_sequential_composition(self):
+        """Test sequential layer composition."""
+        layers = [
+            Dense(input_size=5, output_size=4),
+            ReLU(),
+            Dense(input_size=4, output_size=3),
+            ReLU(),
+            Dense(input_size=3, output_size=2),
+            Sigmoid()
+        ]
+        
+        x = Tensor([[1.0, 2.0, 3.0, 4.0, 5.0]])
+        
+        # Apply layers sequentially
+        current = x
+        for layer in layers:
+            current = layer(current)
+        
+        assert current.shape == (1, 2)
+        assert np.all(current.data >= 0)
+        assert np.all(current.data <= 1)
+    
+    def test_different_activation_functions(self):
+        """Test using different activation functions."""
+        # Network with different activations
+        layer1 = Dense(input_size=3, output_size=4)
+        relu = ReLU()
+        layer2 = Dense(input_size=4, output_size=4)
+        tanh = Tanh()
+        layer3 = Dense(input_size=4, output_size=2)
+        sigmoid = Sigmoid()
+        
+        x = Tensor([[1.0, 2.0, 3.0]])
+        
+        # Forward pass
+        h1 = relu(layer1(x))
+        h2 = tanh(layer2(h1))
+        output = sigmoid(layer3(h2))
+        
+        assert output.shape == (1, 2)
+        # Final output should be in sigmoid range
+        assert np.all(output.data >= 0)
+        assert np.all(output.data <= 1)
+
+
+class TestStudentExperience:
+    """Test the typical student experience."""
+    
+    def test_first_neural_network(self):
+        """Test the first neural network a student would build."""
+        # Simple 2-layer network like in the tutorial
+        layer1 = Dense(input_size=3, output_size=4)
+        activation1 = ReLU()
+        layer2 = Dense(input_size=4, output_size=2)
+        activation2 = Sigmoid()
+        
+        # Sample data
+        x = Tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
+        
+        # Forward pass
+        h1 = layer1(x)
+        h1_activated = activation1(h1)
+        h2 = layer2(h1_activated)
+        output = activation2(h2)
+        
+        # Should work without errors
+        assert output.shape == (2, 2)
+        assert isinstance(output, Tensor)
+    
+    def test_layer_inspection(self):
+        """Test that students can inspect layer properties."""
+        layer = Dense(input_size=3, output_size=2)
+        
+        # Students should be able to access these properties
+        assert hasattr(layer, 'input_size')
+        assert hasattr(layer, 'output_size')
+        assert hasattr(layer, 'weights')
+        assert hasattr(layer, 'bias')
+        
+        assert layer.input_size == 3
+        assert layer.output_size == 2
+        assert layer.weights.shape == (3, 2)
+        assert layer.bias.shape == (2,)
+    
+    def test_activation_function_behavior(self):
+        """Test activation function behavior that students will observe."""
+        # ReLU zeroes out negative values
+        relu = ReLU()
+        x = Tensor([[-1.0, 0.0, 1.0]])
+        y = relu(x)
+        assert np.array_equal(y.data, [[0.0, 0.0, 1.0]])
+        
+        # Sigmoid maps to (0, 1)
+        sigmoid = Sigmoid()
+        x = Tensor([[0.0]])
+        y = sigmoid(x)
+        assert np.isclose(y.data[0][0], 0.5)
+        
+        # Tanh maps to (-1, 1)
+        tanh = Tanh()
+        x = Tensor([[0.0]])
+        y = tanh(x)
+        assert np.isclose(y.data[0][0], 0.0) 
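`test_sequential_composition` applies a list of layers in a loop, and that loop is the whole idea behind a sequential container. Below is a minimal stdlib-only sketch of one. `Sequential`, `ToyDense`, and `toy_relu` are hypothetical stand-ins, because TinyTorch in this commit has no container class. `ToyDense` uses fixed weights so the output can be checked by hand:

```python
from typing import Callable, List, Sequence

Batch = List[List[float]]  # rows are examples, like a (batch_size, features) Tensor


class Sequential:
    """Hypothetical container: calls each layer in order (not part of tinytorch)."""

    def __init__(self, *layers: Callable[[Batch], Batch]):
        self.layers = list(layers)

    def __call__(self, x: Batch) -> Batch:
        for layer in self.layers:
            x = layer(x)  # output of one layer becomes input to the next
        return x


class ToyDense:
    """Hypothetical stand-in for Dense with fixed, hand-checkable weights."""

    def __init__(self, weights: Sequence[Sequence[float]], bias: Sequence[float]):
        self.weights = [list(row) for row in weights]  # shape (input_size, output_size)
        self.bias = list(bias)                         # shape (output_size,)

    def __call__(self, x: Batch) -> Batch:
        # y = xW + b, computed row by row
        return [
            [sum(row[i] * self.weights[i][j] for i in range(len(row))) + self.bias[j]
             for j in range(len(self.bias))]
            for row in x
        ]


def toy_relu(x: Batch) -> Batch:
    """Element-wise max(0, v), mirroring ReLU.forward."""
    return [[max(0.0, v) for v in row] for row in x]


model = Sequential(
    ToyDense([[1.0, -1.0], [2.0, 0.5]], [0.0, 0.0]),  # 2 -> 2
    toy_relu,
    ToyDense([[1.0], [1.0]], [0.5]),                  # 2 -> 1
)
print(model([[1.0, 2.0], [-1.0, 1.0]]))  # [[5.5], [3.0]]
```

Any object with a `__call__(x)` method can be a stage. `Dense`, `ReLU`, `Sigmoid`, and `Tanh` all define `__call__` to forward to `forward`, so they fit this pattern.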
--- a/tinytorch/core/layers.py
+++ b/tinytorch/core/layers.py
@@ -0,0 +1,238 @@
+# AUTOGENERATED! DO NOT EDIT! File to edit: ../../modules/layers/layers_dev.ipynb.
+
+# %% auto 0
+__all__ = ['Dense', 'ReLU', 'Sigmoid', 'Tanh']
+
+# %% ../../modules/layers/layers_dev.ipynb 2
+import numpy as np
+import math
+import sys
+from typing import Union, Optional, Callable
+from .tensor import Tensor
+
+# Import our Tensor class
+# sys.path.append('../../')
+# from modules.tensor.tensor_dev import Tensor
+
+# print("🔥 TinyTorch Layers Module")
+# print(f"NumPy version: {np.__version__}")
+# print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")
+# print("Ready to build neural network layers!")
+
+# %% ../../modules/layers/layers_dev.ipynb 4
+class Dense:
+    """
+    Dense (Linear) Layer: y = Wx + b
+    
+    The fundamental building block of neural networks.
+    Performs linear transformation: matrix multiplication + bias addition.
+    
+    Args:
+        input_size: Number of input features
+        output_size: Number of output features
+        use_bias: Whether to include bias term (default: True)
+        
+    TODO: Implement the Dense layer with weight initialization and forward pass.
+    """
+    
+    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):
+        """
+        Initialize Dense layer with random weights.
+        
+        TODO: 
+        1. Store layer parameters (input_size, output_size, use_bias)
+        2. Initialize weights with small random values
+        3. Initialize bias to zeros (if use_bias=True)
+        """
+        raise NotImplementedError("Student implementation required")
+    
+    def forward(self, x: Tensor) -> Tensor:
+        """
+        Forward pass: y = xW + b (inputs are rows, so W has shape (input_size, output_size))
+        
+        Args:
+            x: Input tensor of shape (batch_size, input_size)
+            
+        Returns:
+            Output tensor of shape (batch_size, output_size)
+            
+        TODO: Implement matrix multiplication and bias addition
+        """
+        raise NotImplementedError("Student implementation required")
+    
+    def __call__(self, x: Tensor) -> Tensor:
+        """Make layer callable: layer(x) same as layer.forward(x)"""
+        return self.forward(x)
+
+# %% ../../modules/layers/layers_dev.ipynb 5
+class Dense:
+    """
+    Dense (Linear) Layer: y = Wx + b
+    
+    The fundamental building block of neural networks.
+    Performs linear transformation: matrix multiplication + bias addition.
+    """
+    
+    def __init__(self, input_size: int, output_size: int, use_bias: bool = True):
+        """Initialize Dense layer with random weights."""
+        self.input_size = input_size
+        self.output_size = output_size
+        self.use_bias = use_bias
+        
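+        # Xavier/Glorot uniform initialization: W ~ U(-limit, limit) with
+        # limit = sqrt(6 / (input_size + output_size)), so each weight has variance
+        # 2 / (input_size + output_size). This keeps activation and gradient scales
+        # roughly stable from layer to layer, which helps gradient flow in training.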
+        limit = math.sqrt(6.0 / (input_size + output_size))
+        self.weights = Tensor(
+            np.random.uniform(-limit, limit, (input_size, output_size)).astype(np.float32)
+        )
+        
+        # Initialize bias to zeros
+        if use_bias:
+            self.bias = Tensor(np.zeros(output_size, dtype=np.float32))
+        else:
+            self.bias = None
+    
+    def forward(self, x: Tensor) -> Tensor:
+        """Forward pass: y = xW + b (row-vector form, W = self.weights)"""
+        # Matrix multiplication: x @ weights
+        # x shape: (batch_size, input_size)
+        # weights shape: (input_size, output_size)
+        # result shape: (batch_size, output_size)
+        output = Tensor(x.data @ self.weights.data)
+        
+        # Add bias if present
+        if self.bias is not None:
+            output = Tensor(output.data + self.bias.data)
+        
+        return output
+    
+    def __call__(self, x: Tensor) -> Tensor:
+        """Make layer callable: layer(x) same as layer.forward(x)"""
+        return self.forward(x)
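`Dense.__init__` draws weights from U(-limit, limit) with limit = sqrt(6 / (input_size + output_size)). A uniform distribution on (-a, a) has variance a²/3, so every weight has variance 2 / (input_size + output_size), which is the Xavier/Glorot target. The stdlib-only sketch below checks this numerically for the first layer of the MNIST-style test network. `xavier_uniform` is a hypothetical helper that mirrors the bound used in `Dense`. It is not a TinyTorch function:

```python
import math
import random


def xavier_uniform(fan_in: int, fan_out: int, rng: random.Random) -> list:
    """Hypothetical helper mirroring Dense.__init__: a (fan_in, fan_out) matrix
    with entries drawn from U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]


fan_in, fan_out = 784, 128   # first layer of the MNIST-style test network
rng = random.Random(0)       # fixed seed so the printout is repeatable
w = xavier_uniform(fan_in, fan_out, rng)

limit = math.sqrt(6.0 / (fan_in + fan_out))
values = [v for row in w for v in row]
mean = sum(values) / len(values)
empirical_var = sum((v - mean) ** 2 for v in values) / len(values)
target_var = limit ** 2 / 3  # Var(U(-a, a)) = a**2 / 3 = 2 / (fan_in + fan_out)

print(f"limit         = {limit:.4f}")
print(f"empirical var = {empirical_var:.6f}")
print(f"target var    = {target_var:.6f}")
```

With about 100k samples, the empirical variance should land close to the target. The matrix shape `(input_size, output_size)` is the one `forward` needs for `x @ weights`.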
+
+# %% ../../modules/layers/layers_dev.ipynb 9
+class ReLU:
+    """
+    ReLU Activation: f(x) = max(0, x)
+    
+    The most widely used hidden-layer activation in deep learning.
+    Cheap to compute (one comparison per element) and non-saturating for x > 0.
+    
+    TODO: Implement ReLU activation function.
+    """
+    
+    def forward(self, x: Tensor) -> Tensor:
+        """
+        Apply ReLU: f(x) = max(0, x)
+        
+        Args:
+            x: Input tensor
+            
+        Returns:
+            Output tensor with ReLU applied element-wise
+            
+        TODO: Implement element-wise max(0, x) operation
+        """
+        raise NotImplementedError("Student implementation required")
+    
+    def __call__(self, x: Tensor) -> Tensor:
+        """Make activation callable: relu(x) same as relu.forward(x)"""
+        return self.forward(x)
+
+# %% ../../modules/layers/layers_dev.ipynb 10
+class ReLU:
+    """ReLU Activation: f(x) = max(0, x)"""
+    
+    def forward(self, x: Tensor) -> Tensor:
+        """Apply ReLU: f(x) = max(0, x)"""
+        return Tensor(np.maximum(0, x.data))
+    
+    def __call__(self, x: Tensor) -> Tensor:
+        return self.forward(x)
+
+# %% ../../modules/layers/layers_dev.ipynb 11
+class Sigmoid:
+    """
+    Sigmoid Activation: f(x) = 1 / (1 + e^(-x))
+    
+    Squashes input to range (0, 1). Often used for binary classification.
+    
+    TODO: Implement Sigmoid activation function.
+    """
+    
+    def forward(self, x: Tensor) -> Tensor:
+        """
+        Apply Sigmoid: f(x) = 1 / (1 + e^(-x))
+        
+        Args:
+            x: Input tensor
+            
+        Returns:
+            Output tensor with Sigmoid applied element-wise
+            
+        TODO: Implement sigmoid function (be careful with numerical stability!)
+        """
+        raise NotImplementedError("Student implementation required")
+    
+    def __call__(self, x: Tensor) -> Tensor:
+        return self.forward(x)
+
+# %% ../../modules/layers/layers_dev.ipynb 12
+class Sigmoid:
+    """Sigmoid Activation: f(x) = 1 / (1 + e^(-x))"""
+    
+    def forward(self, x: Tensor) -> Tensor:
+        """Apply Sigmoid with numerical stability"""
+        # Use the numerically stable version to avoid overflow
+        # For x >= 0: sigmoid(x) = 1 / (1 + exp(-x))
+        # For x < 0: sigmoid(x) = exp(x) / (1 + exp(x))
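+        x_data = x.data
+        # Allocate a floating-point result: np.zeros_like would inherit an
+        # integer dtype from integer inputs and truncate every output to 0
+        result = np.zeros(x_data.shape, dtype=np.result_type(x_data.dtype, np.float32))
+        
+        # Stable computation: only ever exponentiate non-positive values
+        positive_mask = x_data >= 0
+        result[positive_mask] = 1.0 / (1.0 + np.exp(-x_data[positive_mask]))
+        exp_x = np.exp(x_data[~positive_mask])  # x < 0, so 0 < exp(x) < 1
+        result[~positive_mask] = exp_x / (1.0 + exp_x)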
+        
+        return Tensor(result)
+    
+    def __call__(self, x: Tensor) -> Tensor:
+        return self.forward(x)
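`Sigmoid.forward` branches on the sign of x because evaluating 1 / (1 + e^(-x)) directly exponentiates a large positive number when x is very negative. The stdlib-only sketch below compares the two forms on scalars. With Python's `math.exp`, the naive form raises `OverflowError`. NumPy's `np.exp` instead returns `inf` and emits an overflow `RuntimeWarning`. `naive_sigmoid` and `stable_sigmoid` are illustrative names, not TinyTorch functions:

```python
import math


def naive_sigmoid(x: float) -> float:
    # math.exp(-x) overflows once -x exceeds about 709.78
    return 1.0 / (1.0 + math.exp(-x))


def stable_sigmoid(x: float) -> float:
    # Same split as Sigmoid.forward: only ever exponentiate a non-positive number
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so 0 < z < 1 and nothing overflows
    return z / (1.0 + z)


for x in (-1000.0, -5.0, 0.0, 5.0, 1000.0):
    try:
        naive = naive_sigmoid(x)
    except OverflowError:
        naive = "OverflowError"
    print(f"x={x:>7}: naive={naive}, stable={stable_sigmoid(x):.6g}")
```

Both branches compute the same function. The rearranged form e^x / (1 + e^x) is just 1 / (1 + e^(-x)) multiplied through by e^x.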
+
+# %% ../../modules/layers/layers_dev.ipynb 13
+class Tanh:
+    """
+    Tanh Activation: f(x) = tanh(x)
+    
+    Squashes input to range (-1, 1). Zero-centered output.
+    
+    TODO: Implement Tanh activation function.
+    """
+    
+    def forward(self, x: Tensor) -> Tensor:
+        """
+        Apply Tanh: f(x) = tanh(x)
+        
+        Args:
+            x: Input tensor
+            
+        Returns:
+            Output tensor with Tanh applied element-wise
+            
+        TODO: Implement tanh function
+        """
+        raise NotImplementedError("Student implementation required")
+    
+    def __call__(self, x: Tensor) -> Tensor:
+        return self.forward(x)
+
+# %% ../../modules/layers/layers_dev.ipynb 14
+class Tanh:
+    """Tanh Activation: f(x) = tanh(x)"""
+    
+    def forward(self, x: Tensor) -> Tensor:
+        """Apply Tanh"""
+        return Tensor(np.tanh(x.data))
+    
+    def __call__(self, x: Tensor) -> Tensor:
+        return self.forward(x)
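The Tanh docstring calls its output zero-centered. Tanh is a rescaled, shifted sigmoid: tanh(x) = 2·sigmoid(2x) - 1. That identity maps sigmoid's (0, 1) range onto (-1, 1) with tanh(0) = 0, which explains the zero centering. The stdlib-only sketch below checks the identity against `math.tanh`. `tanh_via_sigmoid` is a hypothetical helper for illustration only:

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable scalar sigmoid (same branching as Sigmoid.forward)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh_via_sigmoid(x: float) -> float:
    # tanh(x) = 2 * sigmoid(2x) - 1: stretch sigmoid to width 2, shift down by 1
    return 2.0 * sigmoid(2.0 * x) - 1.0


for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(f"x={x:+.1f}  math.tanh={math.tanh(x):+.6f}  "
          f"2*sigmoid(2x)-1={tanh_via_sigmoid(x):+.6f}")
```

In practice `np.tanh`, as used in `Tanh.forward`, is the simpler and more accurate choice. The identity is only here to show how the two activations are related.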