diff --git a/README.md b/README.md
index 29878b81..cab01d83 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # Tiny🔥Torch: Build ML Systems from Scratch
 
-> A hands-on systems course where you implement every component of a modern ML system
+> A hands-on ML Systems course where students implement every component from scratch
 
 [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
 [![License](https://img.shields.io/badge/license-Apache%202.0-green.svg)](LICENSE)
@@ -8,150 +8,153 @@
 
 > **Disclaimer**: TinyTorch is an educational framework developed independently and is not affiliated with or endorsed by Meta or the PyTorch project.
 
-**Tiny🔥Torch** is a hands-on companion to [*Machine Learning Systems*](https://mlsysbook.ai), providing practical coding exercises that complement the book's theoretical foundations. Rather than just learning *about* ML systems, you'll build one from scratch—implementing everything from tensors and autograd to hardware-aware optimization and deployment systems.
+**Tiny🔥Torch** is a complete ML Systems course where students build their own machine learning framework from scratch. Rather than just learning *about* ML systems, students implement every component and then use their own implementation to solve real problems.
 
-## 🎯 What You'll Build
+## 🚀 **Quick Start - Choose Your Path**
 
-By completing this course, you will have implemented a complete ML system:
+### **👨‍🏫 For Instructors**
+**[📖 Instructor Guide](docs/INSTRUCTOR_GUIDE.md)** - Complete teaching guide with verified modules, class structure, and commands
+- 6+ weeks of proven curriculum content
+- Verified module status and teaching sequence
+- Class session structure and troubleshooting guide
 
-**Core Framework** → **Training Pipeline** → **Production System**
-- ✅ Tensors with automatic differentiation
-- ✅ Neural network layers (MLP, CNN, Transformer)
-- ✅ Training loops with optimizers (SGD, Adam)
-- ✅ Data loading and preprocessing pipelines
-- ✅ Model compression (pruning, quantization)
-- ✅ Performance profiling and optimization
-- ✅ Production deployment and monitoring
+### **👨‍🎓 For Students**
+**[🔥 Student Guide](docs/STUDENT_GUIDE.md)** - Complete learning path with clear workflow
+- Step-by-step progress tracker
+- 5-step daily workflow for each module
+- Getting help and study tips
 
-## 🚀 Quick Start
+### **🛠️ For Developers**
+**[📚 Documentation](docs/)** - Complete documentation including pedagogy and development guides
 
-**Ready to build? Choose your path:**
+## 🎯 **What Students Build**
 
-### 🏃‍♂️ I want to start building now
-→ **[QUICKSTART.md](QUICKSTART.md)** - Get coding in 10 minutes
+By completing TinyTorch, students implement a complete ML framework:
 
-### 📚 I want to understand the full course structure  
-→ **[PROJECT_GUIDE.md](PROJECT_GUIDE.md)** - Complete learning roadmap
+- ✅ **Activation functions** (ReLU, Sigmoid, Tanh)
+- ✅ **Neural network layers** (Dense, Conv2D)
+- ✅ **Network architectures** (Sequential, MLP)
+- ✅ **Data loading** (CIFAR-10 pipeline)
+- ✅ **Development workflow** (export, test, use)
+- 🚧 **Tensor operations** (arithmetic, broadcasting)
+- 🚧 **Automatic differentiation** (backpropagation)
+- 🚧 **Training systems** (optimizers, loss functions)
 
-### 🔍 I want to see the course in action
-→ **[modules/setup/](modules/setup/)** - Browse the first module
+## 🎓 **Learning Philosophy: Build → Use → Understand → Repeat**
 
-## 🎓 Learning Approach
+Students experience the complete cycle:
+1. **Build**: Implement `ReLU()` function from scratch
+2. **Use**: Import `from tinytorch.core.activations import ReLU` with their own code
+3. **Understand**: See how it works in real neural networks
+4. **Repeat**: Each module builds on previous implementations
 
-**Module-First Development**: Each module is self-contained with its own notebook, tests, and learning objectives. You'll work in Jupyter notebooks using the [nbdev](https://nbdev.fast.ai/) workflow to build a real Python package.
+## 📊 **Current Status** (Ready for Classroom Use)
 
-**The Cycle**: `Write Code → Export → Test → Next Module`
+### **✅ Fully Working Modules** (6+ weeks of content)
+- **00_setup** (20/20 tests) - Development workflow & CLI tools
+- **02_activations** (24/24 tests) - ReLU, Sigmoid, Tanh functions
+- **03_layers** (17/22 tests) - Dense layers & neural building blocks
+- **04_networks** (20/25 tests) - Sequential networks & MLPs
+- **06_dataloader** (15/15 tests) - CIFAR-10 data loading
+- **05_cnn** (2/2 tests) - Convolution operations
 
+### **🚧 In Development**
+- **01_tensor** (22/33 tests) - Tensor arithmetic
+- **07-13** - Advanced features (autograd, training, MLOps)
+
+## 🚀 **Quick Commands**
+
+### **System Status**
 ```bash
-# The rhythm you'll use for every module
-jupyter lab tensor_dev.ipynb    # Write & test interactively  
-python bin/tito.py sync         # Export to Python package
-python bin/tito.py test         # Verify implementation
+tito system info              # Check system and module status
+tito system doctor            # Verify environment setup
+tito module status            # View all module progress
 ```
 
-## 📚 Course Structure
-
-| Phase | Modules | What You'll Build |
-|-------|---------|-------------------|
-| **Foundation** | Setup, Tensor, Autograd | Core mathematical engine |
-| **Neural Networks** | MLP, CNN | Learning algorithms |
-| **Training Systems** | Data, Training, Config | End-to-end pipelines |
-| **Production** | Profiling, Compression, MLOps | Real-world deployment |
-
-**Total Time**: 40-80 hours over several weeks • **Prerequisites**: Python basics
-
-## 🛠️ Key Commands
-
+### **Student Workflow**
 ```bash
-python bin/tito.py info               # Check progress
-python bin/tito.py sync               # Export notebooks  
-python bin/tito.py test --module [name]  # Test implementation
+cd modules/00_setup           # Navigate to first module
+jupyter lab setup_dev.py     # Open development notebook
+python -m pytest tests/ -v   # Run tests
+python bin/tito module export 00_setup  # Export to package
 ```
 
-## 🌟 Why Tiny🔥Torch?
+### **Verify Implementation**
+```bash
+# Use student's own implementations
+python -c "from tinytorch.core.utils import hello_tinytorch; hello_tinytorch()"
+python -c "from tinytorch.core.activations import ReLU; print(ReLU()([-1, 0, 1]))"
+```
 
-**Systems Engineering Principles**: Learn to design ML systems from first principles
-**Hardware-Software Co-design**: Understand how algorithms map to computational resources  
-**Performance-Aware Development**: Build systems optimized for real-world constraints
-**End-to-End Systems**: From mathematical foundations to production deployment
+## 🌟 **Why Build from Scratch?**
 
-## 📖 Educational Approach
+**Even in the age of AI-generated code, building systems from scratch remains educationally essential:**
 
-**Companion to [Machine Learning Systems](https://mlsysbook.ai)**: This course provides hands-on implementation exercises that bring the book's concepts to life through code.
+- **Understanding vs. Using**: AI shows *what* works, TinyTorch teaches *why* it works
+- **Systems Literacy**: Debugging real ML requires understanding abstractions like autograd and data loaders
+- **AI-Augmented Engineers**: The best engineers collaborate with AI tools, not rely on them blindly
+- **Intentional Design**: Systems thinking about memory, performance, and architecture can't be outsourced
 
-**Learning by Building**: Following the educational philosophy of [Karpathy's micrograd](https://github.com/karpathy/micrograd), we learn complex systems by implementing them from scratch.
+## 🏗️ **Repository Structure**
 
-**Real-World Systems**: Drawing from production [PyTorch](https://pytorch.org/) and [JAX](https://jax.readthedocs.io/) architectures to understand industry-proven design patterns.
+```
+TinyTorch/
+├── README.md                 # This file - main entry point
+├── docs/
+│   ├── INSTRUCTOR_GUIDE.md   # Complete teaching guide
+│   ├── STUDENT_GUIDE.md      # Complete learning path
+│   └── [detailed docs]       # Pedagogy and development guides
+├── modules/
+│   ├── 00_setup/            # Development workflow
+│   ├── 01_tensor/           # Tensor operations
+│   ├── 02_activations/      # Activation functions
+│   ├── 03_layers/           # Neural network layers
+│   ├── 04_networks/         # Network architectures
+│   ├── 05_cnn/              # Convolution operations
+│   ├── 06_dataloader/       # Data loading pipeline
+│   └── 07-13/               # Advanced features
+├── tinytorch/               # The actual Python package
+├── bin/                     # CLI tools (tito)
+└── tests/                   # Integration tests
+```
 
-## 🤔 Frequently Asked Questions
+## 📚 **Educational Approach**
 
-<details>
-<summary><strong>Why should students build TinyTorch if AI agents can already generate similar code?</strong></summary>
+### **Real Data, Real Systems**
+- Work with CIFAR-10 (10,000 real images)
+- Production-style code organization
+- Performance and engineering considerations
 
-Even though large language models can generate working ML code, building systems from scratch remains *pedagogically essential*:
+### **Immediate Feedback**
+- Tests provide instant verification
+- Students see their code working quickly
+- Progress is visible and measurable
 
-- **Understanding vs. Using**: AI-generated code shows what works, but not *why* it works. TinyTorch teaches students to reason through tensor operations, memory flows, and training logic.
-- **Systems Literacy**: Debugging and designing real ML pipelines requires understanding abstractions like autograd, data loaders, and parameter updates, not just calling APIs.
-- **AI-Augmented Engineers**: The best AI engineers will *collaborate with* AI tools, not rely on them blindly. TinyTorch trains students to read, verify, and modify generated code responsibly.
-- **Intentional Design**: Systems thinking can’t be outsourced. TinyTorch helps learners internalize how decisions about data layout, execution, and precision affect performance.
+### **Progressive Complexity**
+- Start simple (activation functions)
+- Build complexity gradually (layers → networks → training)
+- Connect to real ML engineering practices
 
-</details>
+## 🤝 **Contributing**
 
-<details>
-<summary><strong>Why not just study the PyTorch or TensorFlow source code instead?</strong></summary>
+We welcome contributions! See our [development documentation](docs/development/) for guidelines on creating new modules or improving existing ones.
 
-Industrial frameworks are optimized for scale, not clarity. They contain thousands of lines of code, hardware-specific kernels, and complex abstractions. 
-
-TinyTorch, by contrast, is intentionally **minimal** and **educational** — like building a kernel in an operating systems course. It helps learners understand the essential components and build an end-to-end pipeline from first principles.
-
-</details>
-
-<details>
-<summary><strong>Isn't it more efficient to just teach ML theory and use existing frameworks?</strong></summary>
-
-Teaching only the math without implementation leaves students unable to debug or extend real-world systems. TinyTorch bridges that gap by making ML systems tangible:
-
-- Students learn by doing, not just reading.
-- Implementing backpropagation or a training loop exposes hidden assumptions and tradeoffs.
-- Understanding how layers are built gives deeper insight into model behavior and performance.
-
-</details>
-
-<details>
-<summary><strong>Why use TinyML in a Machine Learning Systems course?</strong></summary>
-
-TinyML makes systems concepts concrete. By running ML models on constrained hardware, students encounter the real-world limits of memory, compute, latency, and energy — exactly the challenges modern ML engineers face at scale.
-
-- ⚙️ **Hardware constraints** expose architectural tradeoffs that are hidden in cloud settings.
-- 🧠 **Systems thinking** is deepened by understanding how models interact with sensors, microcontrollers, and execution runtimes.
-- 🌍 **End-to-end ML** becomes tangible — from data ingestion to inference.
-
-TinyML isn’t about toy problems — it’s about simplifying to the point of *clarity*, not abstraction. Students see the full system pipeline, not just the cloud endpoint.
-
-</details>
-
-<details>
-<summary><strong>What do the hardware kits add to the learning experience?</strong></summary>
-
-The hardware kits are where learning becomes **hands-on and embodied**. They bring several pedagogical advantages:
-
-- 🔌 **Physicality**: Students see real data flowing through sensors and watch ML models respond — not just print outputs.
-- 🧪 **Experimentation**: Kits enable tinkering with latency, power, and model size in ways that are otherwise abstract.
-- 🚀 **Creativity**: Students can build real applications — from gesture detection to keyword spotting — using what they learned in TinyTorch.
-
-The kits act as *debuggable, inspectable deployment targets*. They reveal what’s easy vs. hard in ML deployment — and why hardware-aware design matters.
-
-</details>
-
----
-## 🤝 Contributing
-
-We welcome contributions! Whether you're a student who found a bug or an instructor wanting to add modules, see our [Contributing Guide](CONTRIBUTING.md).
-
-## 📄 License
+## 📄 **License**
 
 Apache License 2.0 - see the [LICENSE](LICENSE) file for details.
 
 ---
 
-**Ready to start building?** → [**QUICKSTART.md**](QUICKSTART.md) 🚀
+## 🎉 **Ready to Start?**
+
+### **Instructors**
+1. Read the [📖 Instructor Guide](docs/INSTRUCTOR_GUIDE.md)
+2. Test your setup: `tito system doctor`
+3. Start with: `cd modules/00_setup && jupyter lab setup_dev.py`
+
+### **Students**
+1. Read the [🔥 Student Guide](docs/STUDENT_GUIDE.md)
+2. Begin with: `cd modules/00_setup && jupyter lab setup_dev.py`
+3. Follow the 5-step workflow for each module
+
+**🚀 TinyTorch is ready for classroom use with 6+ weeks of proven curriculum content!**
diff --git a/assignments/source/00_setup/00_setup.ipynb b/assignments/source/00_setup/00_setup.ipynb
deleted file mode 100644
index 64f3eeb4..00000000
--- a/assignments/source/00_setup/00_setup.ipynb
+++ /dev/null
@@ -1,674 +0,0 @@
-{
-  "cells": [
-    {
-      "cell_type": "markdown",
-      "id": "e3fcd475",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "# Module 0: Setup - Tiny\ud83d\udd25Torch Development Workflow (Enhanced for NBGrader)\n",
-        "\n",
-        "Welcome to TinyTorch! This module teaches you the development workflow you'll use throughout the course.\n",
-        "\n",
-        "## Learning Goals\n",
-        "- Understand the nbdev notebook-to-Python workflow\n",
-        "- Write your first TinyTorch code\n",
-        "- Run tests and use the CLI tools\n",
-        "- Get comfortable with the development rhythm\n",
-        "\n",
-        "## The TinyTorch Development Cycle\n",
-        "\n",
-        "1. **Write code** in this notebook using `#| export` \n",
-        "2. **Export code** with `python bin/tito.py sync --module setup`\n",
-        "3. **Run tests** with `python bin/tito.py test --module setup`\n",
-        "4. **Check progress** with `python bin/tito.py info`\n",
-        "\n",
-        "## New: NBGrader Integration\n",
-        "This module is also configured for automated grading with **100 points total**:\n",
-        "- Basic Functions: 30 points\n",
-        "- SystemInfo Class: 35 points  \n",
-        "- DeveloperProfile Class: 35 points\n",
-        "\n",
-        "Let's get started!"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "fba821b3",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "#| default_exp core.utils"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "16465d62",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "# Setup imports and environment\n",
-        "import sys\n",
-        "import platform\n",
-        "from datetime import datetime\n",
-        "import os\n",
-        "from pathlib import Path\n",
-        "\n",
-        "print(\"\ud83d\udd25 TinyTorch Development Environment\")\n",
-        "print(f\"Python {sys.version}\")\n",
-        "print(f\"Platform: {platform.system()} {platform.release()}\")\n",
-        "print(f\"Started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "64d86ea8",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 1: Basic Functions (30 Points)\n",
-        "\n",
-        "Let's start with simple functions that form the foundation of TinyTorch."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "ab7eb118",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "def hello_tinytorch():\n",
-        "    \"\"\"\n",
-        "    A simple hello world function for TinyTorch.\n",
-        "    \n",
-        "    Display TinyTorch ASCII art and welcome message.\n",
-        "    Load the flame art from tinytorch_flame.txt file with graceful fallback.\n",
-        "    \"\"\"\n",
-        "    #| exercise_start\n",
-        "    #| hint: Load ASCII art from tinytorch_flame.txt file with graceful fallback\n",
-        "    #| solution_test: Function should display ASCII art and welcome message\n",
-        "    #| difficulty: easy\n",
-        "    #| points: 10\n",
-        "    \n",
-        "    ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "    ### END SOLUTION\n",
-        "    \n",
-        "    #| exercise_end\n",
-        "\n",
-        "def add_numbers(a, b):\n",
-        "    \"\"\"\n",
-        "    Add two numbers together.\n",
-        "    \n",
-        "    This is the foundation of all mathematical operations in ML.\n",
-        "    \"\"\"\n",
-        "    #| exercise_start\n",
-        "    #| hint: Use the + operator to add two numbers\n",
-        "    #| solution_test: add_numbers(2, 3) should return 5\n",
-        "    #| difficulty: easy\n",
-        "    #| points: 10\n",
-        "    \n",
-        "    ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "    ### END SOLUTION\n",
-        "    \n",
-        "    #| exercise_end"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "4b7256a9",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Hidden Tests: Basic Functions (10 Points)\n",
-        "\n",
-        "These tests verify the basic functionality and award points automatically."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "2fc78732",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "### BEGIN HIDDEN TESTS\n",
-        "def test_hello_tinytorch():\n",
-        "    \"\"\"Test hello_tinytorch function (5 points)\"\"\"\n",
-        "    import io\n",
-        "    import sys\n",
-        "    \n",
-        "    # Capture output\n",
-        "    captured_output = io.StringIO()\n",
-        "    sys.stdout = captured_output\n",
-        "    \n",
-        "    try:\n",
-        "        hello_tinytorch()\n",
-        "        output = captured_output.getvalue()\n",
-        "        \n",
-        "        # Check that some output was produced\n",
-        "        assert len(output) > 0, \"Function should produce output\"\n",
-        "        assert \"TinyTorch\" in output, \"Output should contain 'TinyTorch'\"\n",
-        "        \n",
-        "    finally:\n",
-        "        sys.stdout = sys.__stdout__\n",
-        "\n",
-        "def test_add_numbers():\n",
-        "    \"\"\"Test add_numbers function (5 points)\"\"\"\n",
-        "    # Test basic addition\n",
-        "    assert add_numbers(2, 3) == 5, \"add_numbers(2, 3) should return 5\"\n",
-        "    assert add_numbers(0, 0) == 0, \"add_numbers(0, 0) should return 0\"\n",
-        "    assert add_numbers(-1, 1) == 0, \"add_numbers(-1, 1) should return 0\"\n",
-        "    \n",
-        "    # Test with floats\n",
-        "    assert add_numbers(2.5, 3.5) == 6.0, \"add_numbers(2.5, 3.5) should return 6.0\"\n",
-        "    \n",
-        "    # Test with negative numbers\n",
-        "    assert add_numbers(-5, -3) == -8, \"add_numbers(-5, -3) should return -8\"\n",
-        "### END HIDDEN TESTS"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "d457e1bf",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 2: SystemInfo Class (35 Points)\n",
-        "\n",
-        "Let's create a class that collects and displays system information."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "c78b6a2e",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "class SystemInfo:\n",
-        "    \"\"\"\n",
-        "    Simple system information class.\n",
-        "    \n",
-        "    Collects and displays Python version, platform, and machine information.\n",
-        "    \"\"\"\n",
-        "    \n",
-        "    def __init__(self):\n",
-        "        \"\"\"\n",
-        "        Initialize system information collection.\n",
-        "        \n",
-        "        Collect Python version, platform, and machine information.\n",
-        "        \"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Use sys.version_info, platform.system(), and platform.machine()\n",
-        "        #| solution_test: Should store Python version, platform, and machine info\n",
-        "        #| difficulty: medium\n",
-        "        #| points: 15\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end\n",
-        "    \n",
-        "    def __str__(self):\n",
-        "        \"\"\"\n",
-        "        Return human-readable system information.\n",
-        "        \n",
-        "        Format system info as a readable string.\n",
-        "        \"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Format as \"Python X.Y on Platform (Machine)\"\n",
-        "        #| solution_test: Should return formatted string with version and platform\n",
-        "        #| difficulty: easy\n",
-        "        #| points: 10\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end\n",
-        "    \n",
-        "    def is_compatible(self):\n",
-        "        \"\"\"\n",
-        "        Check if system meets minimum requirements.\n",
-        "        \n",
-        "        Check if Python version is >= 3.8\n",
-        "        \"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Compare self.python_version with (3, 8) tuple\n",
-        "        #| solution_test: Should return True for Python >= 3.8\n",
-        "        #| difficulty: medium\n",
-        "        #| points: 10\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "9aceffc4",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Hidden Tests: SystemInfo Class (35 Points)\n",
-        "\n",
-        "These tests verify the SystemInfo class implementation."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "e7738e0f",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "### BEGIN HIDDEN TESTS\n",
-        "def test_systeminfo_init():\n",
-        "    \"\"\"Test SystemInfo initialization (15 points)\"\"\"\n",
-        "    info = SystemInfo()\n",
-        "    \n",
-        "    # Check that attributes are set\n",
-        "    assert hasattr(info, 'python_version'), \"Should have python_version attribute\"\n",
-        "    assert hasattr(info, 'platform'), \"Should have platform attribute\"\n",
-        "    assert hasattr(info, 'machine'), \"Should have machine attribute\"\n",
-        "    \n",
-        "    # Check types\n",
-        "    assert isinstance(info.python_version, tuple), \"python_version should be tuple\"\n",
-        "    assert isinstance(info.platform, str), \"platform should be string\"\n",
-        "    assert isinstance(info.machine, str), \"machine should be string\"\n",
-        "    \n",
-        "    # Check values are reasonable\n",
-        "    assert len(info.python_version) >= 2, \"python_version should have at least major.minor\"\n",
-        "    assert len(info.platform) > 0, \"platform should not be empty\"\n",
-        "\n",
-        "def test_systeminfo_str():\n",
-        "    \"\"\"Test SystemInfo string representation (10 points)\"\"\"\n",
-        "    info = SystemInfo()\n",
-        "    str_repr = str(info)\n",
-        "    \n",
-        "    # Check that the string contains expected elements\n",
-        "    assert \"Python\" in str_repr, \"String should contain 'Python'\"\n",
-        "    assert str(info.python_version.major) in str_repr, \"String should contain major version\"\n",
-        "    assert str(info.python_version.minor) in str_repr, \"String should contain minor version\"\n",
-        "    assert info.platform in str_repr, \"String should contain platform\"\n",
-        "    assert info.machine in str_repr, \"String should contain machine\"\n",
-        "\n",
-        "def test_systeminfo_compatibility():\n",
-        "    \"\"\"Test SystemInfo compatibility check (10 points)\"\"\"\n",
-        "    info = SystemInfo()\n",
-        "    compatibility = info.is_compatible()\n",
-        "    \n",
-        "    # Check that it returns a boolean\n",
-        "    assert isinstance(compatibility, bool), \"is_compatible should return boolean\"\n",
-        "    \n",
-        "    # Check that it's reasonable (we're running Python >= 3.8)\n",
-        "    assert compatibility == True, \"Should return True for Python >= 3.8\"\n",
-        "### END HIDDEN TESTS"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "da0fd46d",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 3: DeveloperProfile Class (35 Points)\n",
-        "\n",
-        "Let's create a personalized developer profile system."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "c7cd22cd",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "class DeveloperProfile:\n",
-        "    \"\"\"\n",
-        "    Developer profile for personalizing TinyTorch experience.\n",
-        "    \n",
-        "    Stores and displays developer information with ASCII art.\n",
-        "    \"\"\"\n",
-        "    \n",
-        "    @staticmethod\n",
-        "    def _load_default_flame():\n",
-        "        \"\"\"\n",
-        "        Load the default TinyTorch flame ASCII art from file.\n",
-        "        \n",
-        "        Load from tinytorch_flame.txt with graceful fallback.\n",
-        "        \"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Use Path and file operations with try/except for fallback\n",
-        "        #| solution_test: Should load ASCII art from file or provide fallback\n",
-        "        #| difficulty: hard\n",
-        "        #| points: 5\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end\n",
-        "    \n",
-        "    def __init__(self, name=\"Vijay Janapa Reddi\", affiliation=\"Harvard University\", \n",
-        "                 email=\"vj@eecs.harvard.edu\", github_username=\"profvjreddi\", ascii_art=None):\n",
-        "        \"\"\"\n",
-        "        Initialize developer profile.\n",
-        "        \n",
-        "        Store developer information with sensible defaults.\n",
-        "        \"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Store all parameters as instance attributes, use _load_default_flame for ascii_art if None\n",
-        "        #| solution_test: Should store all developer information\n",
-        "        #| difficulty: medium\n",
-        "        #| points: 15\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end\n",
-        "    \n",
-        "    def __str__(self):\n",
-        "        \"\"\"\n",
-        "        Return formatted developer information.\n",
-        "        \n",
-        "        Format as professional signature.\n",
-        "        \"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Format as \"\ud83d\udc68\u200d\ud83d\udcbb Name | Affiliation | @username\"\n",
-        "        #| solution_test: Should return formatted string with name, affiliation, and username\n",
-        "        #| difficulty: easy\n",
-        "        #| points: 5\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end\n",
-        "    \n",
-        "    def get_signature(self):\n",
-        "        \"\"\"\n",
-        "        Get a short signature for code headers.\n",
-        "        \n",
-        "        Return concise signature like \"Built by Name (@github)\"\n",
-        "        \"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Format as \"Built by Name (@username)\"\n",
-        "        #| solution_test: Should return signature with name and username\n",
-        "        #| difficulty: easy\n",
-        "        #| points: 5\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end\n",
-        "    \n",
-        "    def get_ascii_art(self):\n",
-        "        \"\"\"\n",
-        "        Get ASCII art for the profile.\n",
-        "        \n",
-        "        Return custom ASCII art or default flame.\n",
-        "        \"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Simply return self.ascii_art\n",
-        "        #| solution_test: Should return stored ASCII art\n",
-        "        #| difficulty: easy\n",
-        "        #| points: 5\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "c58a5de4",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Hidden Tests: DeveloperProfile Class (35 Points)\n",
-        "\n",
-        "These tests verify the DeveloperProfile class implementation."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "a74d8133",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "### BEGIN HIDDEN TESTS\n",
-        "def test_developer_profile_init():\n",
-        "    \"\"\"Test DeveloperProfile initialization (15 points)\"\"\"\n",
-        "    # Test with defaults\n",
-        "    profile = DeveloperProfile()\n",
-        "    \n",
-        "    assert hasattr(profile, 'name'), \"Should have name attribute\"\n",
-        "    assert hasattr(profile, 'affiliation'), \"Should have affiliation attribute\"\n",
-        "    assert hasattr(profile, 'email'), \"Should have email attribute\"\n",
-        "    assert hasattr(profile, 'github_username'), \"Should have github_username attribute\"\n",
-        "    assert hasattr(profile, 'ascii_art'), \"Should have ascii_art attribute\"\n",
-        "    \n",
-        "    # Check default values\n",
-        "    assert profile.name == \"Vijay Janapa Reddi\", \"Should have default name\"\n",
-        "    assert profile.affiliation == \"Harvard University\", \"Should have default affiliation\"\n",
-        "    assert profile.email == \"vj@eecs.harvard.edu\", \"Should have default email\"\n",
-        "    assert profile.github_username == \"profvjreddi\", \"Should have default username\"\n",
-        "    assert profile.ascii_art is not None, \"Should have ASCII art\"\n",
-        "    \n",
-        "    # Test with custom values\n",
-        "    custom_profile = DeveloperProfile(\n",
-        "        name=\"Test User\",\n",
-        "        affiliation=\"Test University\",\n",
-        "        email=\"test@test.com\",\n",
-        "        github_username=\"testuser\",\n",
-        "        ascii_art=\"Custom Art\"\n",
-        "    )\n",
-        "    \n",
-        "    assert custom_profile.name == \"Test User\", \"Should store custom name\"\n",
-        "    assert custom_profile.affiliation == \"Test University\", \"Should store custom affiliation\"\n",
-        "    assert custom_profile.email == \"test@test.com\", \"Should store custom email\"\n",
-        "    assert custom_profile.github_username == \"testuser\", \"Should store custom username\"\n",
-        "    assert custom_profile.ascii_art == \"Custom Art\", \"Should store custom ASCII art\"\n",
-        "\n",
-        "def test_developer_profile_str():\n",
-        "    \"\"\"Test DeveloperProfile string representation (5 points)\"\"\"\n",
-        "    profile = DeveloperProfile()\n",
-        "    str_repr = str(profile)\n",
-        "    \n",
-        "    assert \"\ud83d\udc68\u200d\ud83d\udcbb\" in str_repr, \"Should contain developer emoji\"\n",
-        "    assert profile.name in str_repr, \"Should contain name\"\n",
-        "    assert profile.affiliation in str_repr, \"Should contain affiliation\"\n",
-        "    assert f\"@{profile.github_username}\" in str_repr, \"Should contain @username\"\n",
-        "\n",
-        "def test_developer_profile_signature():\n",
-        "    \"\"\"Test DeveloperProfile signature (5 points)\"\"\"\n",
-        "    profile = DeveloperProfile()\n",
-        "    signature = profile.get_signature()\n",
-        "    \n",
-        "    assert \"Built by\" in signature, \"Should contain 'Built by'\"\n",
-        "    assert profile.name in signature, \"Should contain name\"\n",
-        "    assert f\"@{profile.github_username}\" in signature, \"Should contain @username\"\n",
-        "\n",
-        "def test_developer_profile_ascii_art():\n",
-        "    \"\"\"Test DeveloperProfile ASCII art (5 points)\"\"\"\n",
-        "    profile = DeveloperProfile()\n",
-        "    ascii_art = profile.get_ascii_art()\n",
-        "    \n",
-        "    assert isinstance(ascii_art, str), \"ASCII art should be string\"\n",
-        "    assert len(ascii_art) > 0, \"ASCII art should not be empty\"\n",
-        "    assert \"TinyTorch\" in ascii_art, \"ASCII art should contain 'TinyTorch'\"\n",
-        "\n",
-        "def test_default_flame_loading():\n",
-        "    \"\"\"Test default flame loading (5 points)\"\"\"\n",
-        "    flame_art = DeveloperProfile._load_default_flame()\n",
-        "    \n",
-        "    assert isinstance(flame_art, str), \"Flame art should be string\"\n",
-        "    assert len(flame_art) > 0, \"Flame art should not be empty\"\n",
-        "    assert \"TinyTorch\" in flame_art, \"Flame art should contain 'TinyTorch'\"\n",
-        "### END HIDDEN TESTS"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "2959453c",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## Test Your Implementation\n",
-        "\n",
-        "Run these cells to test your implementation:"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "75574cd6",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test basic functions\n",
-        "print(\"Testing Basic Functions:\")\n",
-        "try:\n",
-        "    hello_tinytorch()\n",
-        "    print(f\"2 + 3 = {add_numbers(2, 3)}\")\n",
-        "    print(\"\u2705 Basic functions working!\")\n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "e5d4a310",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test SystemInfo\n",
-        "print(\"\\nTesting SystemInfo:\")\n",
-        "try:\n",
-        "    info = SystemInfo()\n",
-        "    print(f\"System: {info}\")\n",
-        "    print(f\"Compatible: {info.is_compatible()}\")\n",
-        "    print(\"\u2705 SystemInfo working!\")\n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "9cd31f75",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test DeveloperProfile\n",
-        "print(\"\\nTesting DeveloperProfile:\")\n",
-        "try:\n",
-        "    profile = DeveloperProfile()\n",
-        "    print(f\"Profile: {profile}\")\n",
-        "    print(f\"Signature: {profile.get_signature()}\")\n",
-        "    print(\"\u2705 DeveloperProfile working!\")\n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "95483816",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## \ud83c\udf89 Module Complete!\n",
-        "\n",
-        "You've successfully implemented the setup module with **100 points total**:\n",
-        "\n",
-        "### Point Breakdown:\n",
-        "- **hello_tinytorch()**: 10 points\n",
-        "- **add_numbers()**: 10 points  \n",
-        "- **Basic function tests**: 10 points\n",
-        "- **SystemInfo.__init__()**: 15 points\n",
-        "- **SystemInfo.__str__()**: 10 points\n",
-        "- **SystemInfo.is_compatible()**: 10 points\n",
-        "- **DeveloperProfile.__init__()**: 15 points\n",
-        "- **DeveloperProfile methods**: 20 points\n",
-        "\n",
-        "### What's Next:\n",
-        "1. Export your code: `tito sync --module setup`\n",
-        "2. Run tests: `tito test --module setup`\n",
-        "3. Generate assignment: `tito nbgrader generate --module setup`\n",
-        "4. Move to Module 1: Tensor!\n",
-        "\n",
-        "### NBGrader Features:\n",
-        "- \u2705 Automatic grading with 100 points\n",
-        "- \u2705 Partial credit for each component\n",
-        "- \u2705 Hidden tests for comprehensive validation\n",
-        "- \u2705 Immediate feedback for students\n",
-        "- \u2705 Compatible with existing TinyTorch workflow\n",
-        "\n",
-        "Happy building! \ud83d\udd25"
-      ]
-    }
-  ],
-  "metadata": {
-    "jupytext": {
-      "main_language": "python"
-    }
-  },
-  "nbformat": 4,
-  "nbformat_minor": 5
-}
\ No newline at end of file
diff --git a/assignments/source/01_tensor/01_tensor.ipynb b/assignments/source/01_tensor/01_tensor.ipynb
deleted file mode 100644
index ebfd21e6..00000000
--- a/assignments/source/01_tensor/01_tensor.ipynb
+++ /dev/null
@@ -1,480 +0,0 @@
-{
-  "cells": [
-    {
-      "cell_type": "markdown",
-      "id": "0cf257dc",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "# Module 1: Tensor - Enhanced with nbgrader Support\n",
-        "\n",
-        "This is an enhanced version of the tensor module that demonstrates dual-purpose content creation:\n",
-        "- **Self-learning**: Rich educational content with guided implementation\n",
-        "- **Auto-grading**: nbgrader-compatible assignments with hidden tests\n",
-        "\n",
-        "## Dual System Benefits\n",
-        "\n",
-        "1. **Single Source**: One file generates both learning and assignment materials\n",
-        "2. **Consistent Quality**: Same instructor solutions in both contexts\n",
-        "3. **Flexible Assessment**: Choose between self-paced learning or formal grading\n",
-        "4. **Scalable**: Handle large courses with automated feedback\n",
-        "\n",
-        "## How It Works\n",
-        "\n",
-        "- **TinyTorch markers**: `#| exercise_start/end` for educational content\n",
-        "- **nbgrader markers**: `### BEGIN/END SOLUTION` for auto-grading\n",
-        "- **Hidden tests**: `### BEGIN/END HIDDEN TESTS` for automatic verification\n",
-        "- **Dual generation**: One command creates both student notebooks and assignments"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "dbe77981",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "#| default_exp core.tensor"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "7dc4f1a0",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "import numpy as np\n",
-        "from typing import Union, List, Tuple, Optional"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "1765d8cb",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Enhanced Tensor Class\n",
-        "\n",
-        "This implementation shows how to create dual-purpose educational content:\n",
-        "\n",
-        "### For Self-Learning Students\n",
-        "- Rich explanations and step-by-step guidance\n",
-        "- Detailed hints and examples\n",
-        "- Progressive difficulty with scaffolding\n",
-        "\n",
-        "### For Formal Assessment\n",
-        "- Auto-graded with hidden tests\n",
-        "- Immediate feedback on correctness\n",
-        "- Partial credit for complex methods"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "aff9a0f2",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "class Tensor:\n",
-        "    \"\"\"\n",
-        "    TinyTorch Tensor: N-dimensional array with ML operations.\n",
-        "    \n",
-        "    This enhanced version demonstrates dual-purpose educational content\n",
-        "    suitable for both self-learning and formal assessment.\n",
-        "    \"\"\"\n",
-        "    \n",
-        "    def __init__(self, data: Union[int, float, List, np.ndarray], dtype: Optional[str] = None):\n",
-        "        \"\"\"\n",
-        "        Create a new tensor from data.\n",
-        "        \n",
-        "        Args:\n",
-        "            data: Input data (scalar, list, or numpy array)\n",
-        "            dtype: Data type ('float32', 'int32', etc.). Defaults to auto-detect.\n",
-        "        \"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Use np.array() to convert input data to numpy array\n",
-        "        #| solution_test: tensor.shape should match input shape\n",
-        "        #| difficulty: easy\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        if isinstance(data, (int, float)):\n",
-        "            self._data = np.array(data)\n",
-        "        elif isinstance(data, list):\n",
-        "            self._data = np.array(data)\n",
-        "        elif isinstance(data, np.ndarray):\n",
-        "            self._data = data.copy()\n",
-        "        else:\n",
-        "            self._data = np.array(data)\n",
-        "        \n",
-        "        # Apply dtype conversion if specified\n",
-        "        if dtype is not None:\n",
-        "            self._data = self._data.astype(dtype)\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end\n",
-        "        \n",
-        "    @property\n",
-        "    def data(self) -> np.ndarray:\n",
-        "        \"\"\"Access underlying numpy array.\"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Return the stored numpy array (_data attribute)\n",
-        "        #| solution_test: tensor.data should return numpy array\n",
-        "        #| difficulty: easy\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end\n",
-        "        \n",
-        "    @property\n",
-        "    def shape(self) -> Tuple[int, ...]:\n",
-        "        \"\"\"Get tensor shape.\"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Use the .shape attribute of the numpy array\n",
-        "        #| solution_test: tensor.shape should return tuple of dimensions\n",
-        "        #| difficulty: easy\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end\n",
-        "        \n",
-        "    @property\n",
-        "    def size(self) -> int:\n",
-        "        \"\"\"Get total number of elements.\"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Use the .size attribute of the numpy array\n",
-        "        #| solution_test: tensor.size should return total element count\n",
-        "        #| difficulty: easy\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end\n",
-        "        \n",
-        "    @property\n",
-        "    def dtype(self) -> np.dtype:\n",
-        "        \"\"\"Get data type as numpy dtype.\"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Use the .dtype attribute of the numpy array\n",
-        "        #| solution_test: tensor.dtype should return numpy dtype\n",
-        "        #| difficulty: easy\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end\n",
-        "        \n",
-        "    def __repr__(self) -> str:\n",
-        "        \"\"\"String representation of the tensor.\"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Format as \"Tensor([data], shape=shape, dtype=dtype)\"\n",
-        "        #| solution_test: repr should include data, shape, and dtype\n",
-        "        #| difficulty: medium\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        return f\"Tensor({data_str}, shape={self.shape}, dtype={self.dtype})\"\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end\n",
-        "        \n",
-        "    def add(self, other: 'Tensor') -> 'Tensor':\n",
-        "        \"\"\"\n",
-        "        Add two tensors element-wise.\n",
-        "        \n",
-        "        Args:\n",
-        "            other: Another tensor to add\n",
-        "            \n",
-        "        Returns:\n",
-        "            New tensor with element-wise sum\n",
-        "        \"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Use numpy's + operator for element-wise addition\n",
-        "        #| solution_test: result should be new Tensor with correct values\n",
-        "        #| difficulty: medium\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        return Tensor(result_data)\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end\n",
-        "        \n",
-        "    def multiply(self, other: 'Tensor') -> 'Tensor':\n",
-        "        \"\"\"\n",
-        "        Multiply two tensors element-wise.\n",
-        "        \n",
-        "        Args:\n",
-        "            other: Another tensor to multiply\n",
-        "            \n",
-        "        Returns:\n",
-        "            New tensor with element-wise product\n",
-        "        \"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Use numpy's * operator for element-wise multiplication\n",
-        "        #| solution_test: result should be new Tensor with correct values\n",
-        "        #| difficulty: medium\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        return Tensor(result_data)\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end\n",
-        "        \n",
-        "    def matmul(self, other: 'Tensor') -> 'Tensor':\n",
-        "        \"\"\"\n",
-        "        Matrix multiplication of two tensors.\n",
-        "        \n",
-        "        Args:\n",
-        "            other: Another tensor for matrix multiplication\n",
-        "            \n",
-        "        Returns:\n",
-        "            New tensor with matrix product\n",
-        "            \n",
-        "        Raises:\n",
-        "            ValueError: If shapes are incompatible for matrix multiplication\n",
-        "        \"\"\"\n",
-        "        #| exercise_start\n",
-        "        #| hint: Use np.dot() for matrix multiplication, check shapes first\n",
-        "        #| solution_test: result should handle shape validation and matrix multiplication\n",
-        "        #| difficulty: hard\n",
-        "        \n",
-        "        ### BEGIN SOLUTION\n",
-        "    # YOUR CODE HERE\n",
-        "    raise NotImplementedError()\n",
-        "        if len(self.shape) != 2 or len(other.shape) != 2:\n",
-        "            raise ValueError(\"Matrix multiplication requires 2D tensors\")\n",
-        "        \n",
-        "        if self.shape[1] != other.shape[0]:\n",
-        "            raise ValueError(f\"Cannot multiply shapes {self.shape} and {other.shape}\")\n",
-        "        \n",
-        "        result_data = np.dot(self._data, other._data)\n",
-        "        return Tensor(result_data)\n",
-        "        ### END SOLUTION\n",
-        "        \n",
-        "        #| exercise_end"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "90c887d9",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Hidden Tests for Auto-Grading\n",
-        "\n",
-        "These tests are hidden from students but used for automatic grading.\n",
-        "They provide comprehensive coverage and immediate feedback."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "67d0055f",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "### BEGIN HIDDEN TESTS\n",
-        "def test_tensor_creation_basic():\n",
-        "    \"\"\"Test basic tensor creation (2 points)\"\"\"\n",
-        "    t = Tensor([1, 2, 3])\n",
-        "    assert t.shape == (3,)\n",
-        "    assert t.data.tolist() == [1, 2, 3]\n",
-        "    assert t.size == 3\n",
-        "\n",
-        "def test_tensor_creation_scalar():\n",
-        "    \"\"\"Test scalar tensor creation (2 points)\"\"\"\n",
-        "    t = Tensor(5)\n",
-        "    assert t.shape == ()\n",
-        "    assert t.data.item() == 5\n",
-        "    assert t.size == 1\n",
-        "\n",
-        "def test_tensor_creation_2d():\n",
-        "    \"\"\"Test 2D tensor creation (2 points)\"\"\"\n",
-        "    t = Tensor([[1, 2], [3, 4]])\n",
-        "    assert t.shape == (2, 2)\n",
-        "    assert t.data.tolist() == [[1, 2], [3, 4]]\n",
-        "    assert t.size == 4\n",
-        "\n",
-        "def test_tensor_dtype():\n",
-        "    \"\"\"Test dtype handling (2 points)\"\"\"\n",
-        "    t = Tensor([1, 2, 3], dtype='float32')\n",
-        "    assert t.dtype == np.float32\n",
-        "    assert t.data.dtype == np.float32\n",
-        "\n",
-        "def test_tensor_properties():\n",
-        "    \"\"\"Test tensor properties (2 points)\"\"\"\n",
-        "    t = Tensor([[1, 2, 3], [4, 5, 6]])\n",
-        "    assert t.shape == (2, 3)\n",
-        "    assert t.size == 6\n",
-        "    assert isinstance(t.data, np.ndarray)\n",
-        "\n",
-        "def test_tensor_repr():\n",
-        "    \"\"\"Test string representation (2 points)\"\"\"\n",
-        "    t = Tensor([1, 2, 3])\n",
-        "    repr_str = repr(t)\n",
-        "    assert \"Tensor\" in repr_str\n",
-        "    assert \"shape\" in repr_str\n",
-        "    assert \"dtype\" in repr_str\n",
-        "\n",
-        "def test_tensor_add():\n",
-        "    \"\"\"Test tensor addition (3 points)\"\"\"\n",
-        "    t1 = Tensor([1, 2, 3])\n",
-        "    t2 = Tensor([4, 5, 6])\n",
-        "    result = t1.add(t2)\n",
-        "    assert result.data.tolist() == [5, 7, 9]\n",
-        "    assert result.shape == (3,)\n",
-        "\n",
-        "def test_tensor_multiply():\n",
-        "    \"\"\"Test tensor multiplication (3 points)\"\"\"\n",
-        "    t1 = Tensor([1, 2, 3])\n",
-        "    t2 = Tensor([4, 5, 6])\n",
-        "    result = t1.multiply(t2)\n",
-        "    assert result.data.tolist() == [4, 10, 18]\n",
-        "    assert result.shape == (3,)\n",
-        "\n",
-        "def test_tensor_matmul():\n",
-        "    \"\"\"Test matrix multiplication (4 points)\"\"\"\n",
-        "    t1 = Tensor([[1, 2], [3, 4]])\n",
-        "    t2 = Tensor([[5, 6], [7, 8]])\n",
-        "    result = t1.matmul(t2)\n",
-        "    expected = [[19, 22], [43, 50]]\n",
-        "    assert result.data.tolist() == expected\n",
-        "    assert result.shape == (2, 2)\n",
-        "\n",
-        "def test_tensor_matmul_error():\n",
-        "    \"\"\"Test matrix multiplication error handling (2 points)\"\"\"\n",
-        "    t1 = Tensor([[1, 2, 3]])  # Shape (1, 3)\n",
-        "    t2 = Tensor([[4, 5]])     # Shape (1, 2)\n",
-        "    \n",
-        "    try:\n",
-        "        t1.matmul(t2)\n",
-        "        assert False, \"Should have raised ValueError\"\n",
-        "    except ValueError as e:\n",
-        "        assert \"Cannot multiply shapes\" in str(e)\n",
-        "\n",
-        "def test_tensor_immutability():\n",
-        "    \"\"\"Test that operations create new tensors (2 points)\"\"\"\n",
-        "    t1 = Tensor([1, 2, 3])\n",
-        "    t2 = Tensor([4, 5, 6])\n",
-        "    original_data = t1.data.copy()\n",
-        "    \n",
-        "    result = t1.add(t2)\n",
-        "    \n",
-        "    # Original tensor should be unchanged\n",
-        "    assert np.array_equal(t1.data, original_data)\n",
-        "    # Result should be different object\n",
-        "    assert result is not t1\n",
-        "    assert result.data is not t1.data\n",
-        "\n",
-        "### END HIDDEN TESTS"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "636ac01d",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## Usage Examples\n",
-        "\n",
-        "### Self-Learning Mode\n",
-        "Students work through the educational content step by step:\n",
-        "\n",
-        "```python\n",
-        "# Create tensors\n",
-        "t1 = Tensor([1, 2, 3])\n",
-        "t2 = Tensor([4, 5, 6])\n",
-        "\n",
-        "# Basic operations\n",
-        "result = t1.add(t2)\n",
-        "print(f\"Addition: {result}\")\n",
-        "\n",
-        "# Matrix operations\n",
-        "matrix1 = Tensor([[1, 2], [3, 4]])\n",
-        "matrix2 = Tensor([[5, 6], [7, 8]])\n",
-        "product = matrix1.matmul(matrix2)\n",
-        "print(f\"Matrix multiplication: {product}\")\n",
-        "```\n",
-        "\n",
-        "### Assignment Mode\n",
-        "Students submit implementations that are automatically graded:\n",
-        "\n",
-        "1. **Immediate feedback**: Know if implementation is correct\n",
-        "2. **Partial credit**: Earn points for each working method\n",
-        "3. **Hidden tests**: Comprehensive coverage beyond visible examples\n",
-        "4. **Error handling**: Points for proper edge case handling\n",
-        "\n",
-        "### Benefits of Dual System\n",
-        "\n",
-        "1. **Single source**: One implementation serves both purposes\n",
-        "2. **Consistent quality**: Same instructor solutions everywhere\n",
-        "3. **Flexible assessment**: Choose the right tool for each situation\n",
-        "4. **Scalable**: Handle large courses with automated feedback\n",
-        "\n",
-        "This approach transforms TinyTorch from a learning framework into a complete course management solution."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "cd296b25",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test the implementation\n",
-        "if __name__ == \"__main__\":\n",
-        "    # Basic testing\n",
-        "    t1 = Tensor([1, 2, 3])\n",
-        "    t2 = Tensor([4, 5, 6])\n",
-        "    \n",
-        "    print(f\"t1: {t1}\")\n",
-        "    print(f\"t2: {t2}\")\n",
-        "    print(f\"t1 + t2: {t1.add(t2)}\")\n",
-        "    print(f\"t1 * t2: {t1.multiply(t2)}\")\n",
-        "    \n",
-        "    # Matrix multiplication\n",
-        "    m1 = Tensor([[1, 2], [3, 4]])\n",
-        "    m2 = Tensor([[5, 6], [7, 8]])\n",
-        "    print(f\"Matrix multiplication: {m1.matmul(m2)}\")\n",
-        "    \n",
-        "    print(\"\u2705 Enhanced tensor module working!\") "
-      ]
-    }
-  ],
-  "metadata": {
-    "jupytext": {
-      "main_language": "python"
-    }
-  },
-  "nbformat": 4,
-  "nbformat_minor": 5
-}
\ No newline at end of file
diff --git a/assignments/source/02_activations/02_activations.ipynb b/assignments/source/02_activations/02_activations.ipynb
deleted file mode 100644
index 9c027f4c..00000000
--- a/assignments/source/02_activations/02_activations.ipynb
+++ /dev/null
@@ -1,1143 +0,0 @@
-{
-  "cells": [
-    {
-      "cell_type": "markdown",
-      "id": "836ef696",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "# Module 3: Activation Functions - The Spark of Intelligence\n",
-        "\n",
-        "**Learning Goals:**\n",
-        "- Understand why activation functions are essential for neural networks\n",
-        "- Implement four fundamental activation functions from scratch\n",
-        "- Learn the mathematical properties and use cases of each activation\n",
-        "- Visualize activation function behavior and understand their impact\n",
-        "\n",
-        "**Why This Matters:**\n",
-        "Without activation functions, neural networks would just be linear transformations - no matter how many layers you stack, you'd only get linear relationships. Activation functions introduce the nonlinearity that allows neural networks to learn complex patterns and approximate any function.\n",
-        "\n",
-        "**Real-World Context:**\n",
-        "Every neural network you've heard of - from image recognition to language models - relies on activation functions. Understanding them deeply is crucial for designing effective architectures and debugging training issues."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "fd818131",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "#| default_exp core.activations"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "3300cf9a",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "import math\n",
-        "import numpy as np\n",
-        "import matplotlib.pyplot as plt\n",
-        "import os\n",
-        "import sys\n",
-        "from typing import Union, List\n",
-        "\n",
-        "# Import our Tensor class from the main package (rock solid foundation)\n",
-        "from tinytorch.core.tensor import Tensor"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "1e3adf3e",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "def _should_show_plots():\n",
-        "    \"\"\"Check if we should show plots (disable during testing)\"\"\"\n",
-        "    # Check multiple conditions that indicate we're in test mode\n",
-        "    is_pytest = (\n",
-        "        'pytest' in sys.modules or\n",
-        "        'test' in sys.argv or\n",
-        "        os.environ.get('PYTEST_CURRENT_TEST') is not None or\n",
-        "        any('test' in arg for arg in sys.argv) or\n",
-        "        any('pytest' in arg for arg in sys.argv)\n",
-        "    )\n",
-        "    \n",
-        "    # Show plots in development mode (when not in test mode)\n",
-        "    return not is_pytest"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "2131f76a",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "def visualize_activation_function(activation_fn, name: str, x_range: tuple = (-5, 5), num_points: int = 100):\n",
-        "    \"\"\"Visualize an activation function's behavior\"\"\"\n",
-        "    if not _should_show_plots():\n",
-        "        return\n",
-        "        \n",
-        "    try:\n",
-        "        \n",
-        "        # Generate input values\n",
-        "        x_vals = np.linspace(x_range[0], x_range[1], num_points)\n",
-        "        \n",
-        "        # Apply activation function\n",
-        "        y_vals = []\n",
-        "        for x in x_vals:\n",
-        "            input_tensor = Tensor([[x]])\n",
-        "            output = activation_fn(input_tensor)\n",
-        "            y_vals.append(output.data.item())\n",
-        "        \n",
-        "        # Create plot\n",
-        "        plt.figure(figsize=(10, 6))\n",
-        "        plt.plot(x_vals, y_vals, 'b-', linewidth=2, label=f'{name} Activation')\n",
-        "        plt.grid(True, alpha=0.3)\n",
-        "        plt.xlabel('Input (x)')\n",
-        "        plt.ylabel(f'{name}(x)')\n",
-        "        plt.title(f'{name} Activation Function')\n",
-        "        plt.legend()\n",
-        "        plt.show()\n",
-        "        \n",
-        "    except ImportError:\n",
-        "        print(\"   \ud83d\udcca Matplotlib not available - skipping visualization\")\n",
-        "    except Exception as e:\n",
-        "        print(f\"   \u26a0\ufe0f  Visualization error: {e}\")\n",
-        "\n",
-        "def visualize_activation_on_data(activation_fn, name: str, data: Tensor):\n",
-        "    \"\"\"Show activation function applied to sample data\"\"\"\n",
-        "    if not _should_show_plots():\n",
-        "        return\n",
-        "        \n",
-        "    try:\n",
-        "        output = activation_fn(data)\n",
-        "        print(f\"   \ud83d\udcca {name} Example:\")\n",
-        "        print(f\"      Input:  {data.data.flatten()}\")\n",
-        "        print(f\"      Output: {output.data.flatten()}\")\n",
-        "        print(f\"      Range:  [{output.data.min():.3f}, {output.data.max():.3f}]\")\n",
-        "        \n",
-        "    except Exception as e:\n",
-        "        print(f\"   \u26a0\ufe0f  Data visualization error: {e}\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "7107d23e",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## Step 1: What is an Activation Function?\n",
-        "\n",
-        "### Definition\n",
-        "An **activation function** is a mathematical function that adds nonlinearity to neural networks. It transforms the output of a layer before passing it to the next layer.\n",
-        "\n",
-        "### Why Activation Functions Matter\n",
-        "**Without activation functions, neural networks are just linear transformations!**\n",
-        "\n",
-        "```\n",
-        "Linear \u2192 Linear \u2192 Linear = Still Linear\n",
-        "```\n",
-        "\n",
-        "No matter how many layers you stack, without activation functions, you can only learn linear relationships. Activation functions introduce the nonlinearity that allows neural networks to:\n",
-        "- Learn complex patterns\n",
-        "- Approximate any continuous function\n",
-        "- Solve non-linear problems\n",
-        "\n",
-        "### Visual Analogy\n",
-        "Think of activation functions as **decision makers** at each neuron:\n",
-        "- **ReLU**: \"If positive, pass it through; if negative, block it\"\n",
-        "- **Sigmoid**: \"Squash everything between 0 and 1\"\n",
-        "- **Tanh**: \"Squash everything between -1 and 1\"\n",
-        "- **Softmax**: \"Convert to probabilities that sum to 1\"\n",
-        "\n",
-        "### Connection to Previous Modules\n",
-        "In Module 2 (Layers), we learned how to transform data through linear operations (matrix multiplication + bias). Now we add the nonlinear activation functions that make neural networks powerful."
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "3452616c",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 2: ReLU - The Workhorse of Deep Learning\n",
-        "\n",
-        "### What is ReLU?\n",
-        "**ReLU (Rectified Linear Unit)** is the most popular activation function in deep learning.\n",
-        "\n",
-        "**Mathematical Definition:**\n",
-        "```\n",
-        "f(x) = max(0, x)\n",
-        "```\n",
-        "\n",
-        "**In Plain English:**\n",
-        "- If input is positive \u2192 pass it through unchanged\n",
-        "- If input is negative \u2192 output zero\n",
-        "\n",
-        "### Why ReLU is Popular\n",
-        "1. **Simple**: Easy to compute and understand\n",
-        "2. **Fast**: No expensive operations (no exponentials)\n",
-        "3. **Sparse**: Outputs many zeros, creating sparse representations\n",
-        "4. **Gradient-friendly**: Gradient is either 0 or 1 (no vanishing gradient for positive inputs)\n",
-        "\n",
-        "### Real-World Analogy\n",
-        "ReLU is like a **one-way valve** - it only lets positive \"pressure\" through, blocking negative values completely.\n",
-        "\n",
-        "### When to Use ReLU\n",
-        "- **Hidden layers** in most neural networks\n",
-        "- **Convolutional layers** in image processing\n",
-        "- **When you want sparse activations**"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "a7885061",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "class ReLU:\n",
-        "    \"\"\"\n",
-        "    ReLU Activation Function: f(x) = max(0, x)\n",
-        "    \n",
-        "    The most popular activation function in deep learning.\n",
-        "    Simple, fast, and effective for most applications.\n",
-        "    \"\"\"\n",
-        "    \n",
-        "    def forward(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"\n",
-        "        Apply ReLU activation: f(x) = max(0, x)\n",
-        "        \n",
-        "        TODO: Implement ReLU activation\n",
-        "        \n",
-        "        APPROACH:\n",
-        "        1. For each element in the input tensor, apply max(0, element)\n",
-        "        2. Return a new Tensor with the results\n",
-        "        \n",
-        "        EXAMPLE:\n",
-        "        Input: Tensor([[-1, 0, 1, 2, -3]])\n",
-        "        Expected: Tensor([[0, 0, 1, 2, 0]])\n",
-        "        \n",
-        "        HINTS:\n",
-        "        - Use np.maximum(0, x.data) for element-wise max\n",
-        "        - Remember to return a new Tensor object\n",
-        "        - The shape should remain the same as input\n",
-        "        \"\"\"\n",
-        "        raise NotImplementedError(\"Student implementation required\")\n",
-        "    \n",
-        "    def __call__(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"Allow calling the activation like a function: relu(x)\"\"\"\n",
-        "        return self.forward(x)"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "f8337a5d",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "class ReLU:\n",
-        "    \"\"\"ReLU Activation: f(x) = max(0, x)\"\"\"\n",
-        "    \n",
-        "    def forward(self, x: Tensor) -> Tensor:\n",
-        "        result = np.maximum(0, x.data)\n",
-        "        return Tensor(result)\n",
-        "        \n",
-        "    def __call__(self, x: Tensor) -> Tensor:\n",
-        "        return self.forward(x)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "1c5aec6b",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "### \ud83e\uddea Test Your ReLU Implementation\n",
-        "\n",
-        "Let's test your ReLU implementation right away to make sure it's working correctly:"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "ec0e4569",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "try:\n",
-        "    # Create ReLU activation\n",
-        "    relu = ReLU()\n",
-        "    \n",
-        "    # Test 1: Basic functionality\n",
-        "    print(\"\ud83d\udd27 Testing ReLU Implementation\")\n",
-        "    print(\"=\" * 40)\n",
-        "    \n",
-        "    # Test with mixed positive/negative values\n",
-        "    test_input = Tensor([[-2, -1, 0, 1, 2]])\n",
-        "    expected = Tensor([[0, 0, 0, 1, 2]])\n",
-        "    \n",
-        "    result = relu(test_input)\n",
-        "    print(f\"Input:    {test_input.data.flatten()}\")\n",
-        "    print(f\"Output:   {result.data.flatten()}\")\n",
-        "    print(f\"Expected: {expected.data.flatten()}\")\n",
-        "    \n",
-        "    # Verify correctness\n",
-        "    if np.allclose(result.data, expected.data):\n",
-        "        print(\"\u2705 Basic ReLU test passed!\")\n",
-        "    else:\n",
-        "        print(\"\u274c Basic ReLU test failed!\")\n",
-        "        print(\"   Check your max(0, x) implementation\")\n",
-        "    \n",
-        "    # Test 2: Edge cases\n",
-        "    edge_cases = Tensor([[-100, -0.1, 0, 0.1, 100]])\n",
-        "    edge_result = relu(edge_cases)\n",
-        "    expected_edge = np.array([[0, 0, 0, 0.1, 100]])\n",
-        "    \n",
-        "    print(f\"\\nEdge cases: {edge_cases.data.flatten()}\")\n",
-        "    print(f\"Output:     {edge_result.data.flatten()}\")\n",
-        "    \n",
-        "    if np.allclose(edge_result.data, expected_edge):\n",
-        "        print(\"\u2705 Edge case test passed!\")\n",
-        "    else:\n",
-        "        print(\"\u274c Edge case test failed!\")\n",
-        "    \n",
-        "    # Test 3: Shape preservation\n",
-        "    multi_dim = Tensor([[1, -1], [2, -2], [0, 3]])\n",
-        "    multi_result = relu(multi_dim)\n",
-        "    \n",
-        "    if multi_result.data.shape == multi_dim.data.shape:\n",
-        "        print(\"\u2705 Shape preservation test passed!\")\n",
-        "    else:\n",
-        "        print(\"\u274c Shape preservation test failed!\")\n",
-        "        print(f\"   Expected shape: {multi_dim.data.shape}, got: {multi_result.data.shape}\")\n",
-        "    \n",
-        "    print(\"\u2705 ReLU tests complete!\")\n",
-        "    \n",
-        "except NotImplementedError:\n",
-        "    print(\"\u26a0\ufe0f  ReLU not implemented yet - complete the forward method above!\")\n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error in ReLU: {e}\")\n",
-        "    print(\"   Check your implementation in the forward method\")\n",
-        "\n",
-        "print()  # Add spacing"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "e7f73603",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# \ud83c\udfa8 ReLU Visualization (development only - not exported)\n",
-        "if _should_show_plots():\n",
-        "    try:\n",
-        "        relu = ReLU()\n",
-        "        print(\"\ud83c\udfa8 Visualizing ReLU behavior...\")\n",
-        "        visualize_activation_function(relu, \"ReLU\", x_range=(-3, 3))\n",
-        "        \n",
-        "        # Show ReLU with real data\n",
-        "        sample_data = Tensor([[-2.5, -1.0, -0.5, 0.0, 0.5, 1.0, 2.5]])\n",
-        "        visualize_activation_on_data(relu, \"ReLU\", sample_data)\n",
-        "    except:\n",
-        "        pass  # Skip if ReLU not implemented"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "235b8ea2",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 3: Sigmoid - The Smooth Classifier\n",
-        "\n",
-        "### What is Sigmoid?\n",
-        "**Sigmoid** is a smooth, S-shaped activation function that squashes inputs to the range (0, 1).\n",
-        "\n",
-        "**Mathematical Definition:**\n",
-        "```\n",
-        "f(x) = 1 / (1 + e^(-x))\n",
-        "```\n",
-        "\n",
-        "**Key Properties:**\n",
-        "- **Range**: (0, 1) - never exactly 0 or 1\n",
-        "- **Smooth**: Differentiable everywhere\n",
-        "- **Monotonic**: Always increasing\n",
-        "- **Symmetric**: Around the point (0, 0.5)\n",
-        "\n",
-        "### Why Sigmoid is Useful\n",
-        "1. **Probability interpretation**: Output can be interpreted as probability\n",
-        "2. **Smooth gradients**: Nice for optimization\n",
-        "3. **Bounded output**: Prevents extreme values\n",
-        "\n",
-        "### Real-World Analogy\n",
-        "Sigmoid is like a **smooth dimmer switch** - it gradually transitions from \"off\" (near 0) to \"on\" (near 1), unlike ReLU's sharp cutoff.\n",
-        "\n",
-        "### When to Use Sigmoid\n",
-        "- **Binary classification** (output layer)\n",
-        "- **Gate mechanisms** (in LSTMs)\n",
-        "- **When you need probabilities**\n",
-        "\n",
-        "### Numerical Stability Note\n",
-        "For very large positive or negative inputs, sigmoid can cause numerical issues. We'll handle this with clipping."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "f3a7f3a1",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "class Sigmoid:\n",
-        "    \"\"\"\n",
-        "    Sigmoid Activation Function: f(x) = 1 / (1 + e^(-x))\n",
-        "    \n",
-        "    Squashes inputs to the range (0, 1), useful for binary classification\n",
-        "    and probability interpretation.\n",
-        "    \"\"\"\n",
-        "    \n",
-        "    def forward(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"\n",
-        "        Apply Sigmoid activation: f(x) = 1 / (1 + e^(-x))\n",
-        "        \n",
-        "        TODO: Implement Sigmoid activation\n",
-        "        \n",
-        "        APPROACH:\n",
-        "        1. For numerical stability, clip x to reasonable range (e.g., -500 to 500)\n",
-        "        2. Compute 1 / (1 + exp(-x)) for each element\n",
-        "        3. Return a new Tensor with the results\n",
-        "        \n",
-        "        EXAMPLE:\n",
-        "        Input: Tensor([[-2, -1, 0, 1, 2]])\n",
-        "        Expected: Tensor([[0.119, 0.269, 0.5, 0.731, 0.881]]) (approximately)\n",
-        "        \n",
-        "        HINTS:\n",
-        "        - Use np.clip(x.data, -500, 500) for numerical stability\n",
-        "        - Use np.exp(-clipped_x) for the exponential\n",
-        "        - Formula: 1 / (1 + np.exp(-clipped_x))\n",
-        "        - Remember to return a new Tensor object\n",
-        "        \"\"\"\n",
-        "        raise NotImplementedError(\"Student implementation required\")\n",
-        "    \n",
-        "    def __call__(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"Allow calling the activation like a function: sigmoid(x)\"\"\"\n",
-        "        return self.forward(x)"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "2254ff20",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "class Sigmoid:\n",
-        "    \"\"\"Sigmoid Activation: f(x) = 1 / (1 + e^(-x))\"\"\"\n",
-        "    \n",
-        "    def forward(self, x: Tensor) -> Tensor:\n",
-        "        # Clip for numerical stability\n",
-        "        clipped = np.clip(x.data, -500, 500)\n",
-        "        result = 1 / (1 + np.exp(-clipped))\n",
-        "        return Tensor(result)\n",
-        "        \n",
-        "    def __call__(self, x: Tensor) -> Tensor:\n",
-        "        return self.forward(x)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "80afbe84",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "### \ud83e\uddea Test Your Sigmoid Implementation\n",
-        "\n",
-        "Let's test your Sigmoid implementation to ensure it's working correctly:"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "e7ed51d8",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "try:\n",
-        "    # Create Sigmoid activation\n",
-        "    sigmoid = Sigmoid()\n",
-        "    \n",
-        "    print(\"\ud83d\udd27 Testing Sigmoid Implementation\")\n",
-        "    print(\"=\" * 40)\n",
-        "    \n",
-        "    # Test 1: Basic functionality\n",
-        "    test_input = Tensor([[-2, -1, 0, 1, 2]])\n",
-        "    result = sigmoid(test_input)\n",
-        "    \n",
-        "    print(f\"Input:  {test_input.data.flatten()}\")\n",
-        "    print(f\"Output: {result.data.flatten()}\")\n",
-        "    \n",
-        "    # Check properties\n",
-        "    # 1. All outputs should be between 0 and 1\n",
-        "    if np.all(result.data >= 0) and np.all(result.data <= 1):\n",
-        "        print(\"\u2705 Range test passed: all outputs in (0, 1)\")\n",
-        "    else:\n",
-        "        print(\"\u274c Range test failed: outputs should be in (0, 1)\")\n",
-        "    \n",
-        "    # 2. Sigmoid(0) should be 0.5\n",
-        "    zero_input = Tensor([[0]])\n",
-        "    zero_result = sigmoid(zero_input)\n",
-        "    if abs(zero_result.data.item() - 0.5) < 1e-6:\n",
-        "        print(\"\u2705 Sigmoid(0) = 0.5 test passed!\")\n",
-        "    else:\n",
-        "        print(f\"\u274c Sigmoid(0) should be 0.5, got {zero_result.data.item()}\")\n",
-        "    \n",
-        "    # 3. Test symmetry: sigmoid(-x) = 1 - sigmoid(x)\n",
-        "    x_val = 2.0\n",
-        "    pos_result = sigmoid(Tensor([[x_val]])).data.item()\n",
-        "    neg_result = sigmoid(Tensor([[-x_val]])).data.item()\n",
-        "    \n",
-        "    if abs(pos_result + neg_result - 1.0) < 1e-6:\n",
-        "        print(\"\u2705 Symmetry test passed!\")\n",
-        "    else:\n",
-        "        print(f\"\u274c Symmetry test failed: sigmoid({x_val}) + sigmoid({-x_val}) should equal 1\")\n",
-        "    \n",
-        "    # 4. Test numerical stability with extreme values\n",
-        "    extreme_input = Tensor([[-1000, 1000]])\n",
-        "    extreme_result = sigmoid(extreme_input)\n",
-        "    \n",
-        "    # Should not produce NaN or inf\n",
-        "    if not np.any(np.isnan(extreme_result.data)) and not np.any(np.isinf(extreme_result.data)):\n",
-        "        print(\"\u2705 Numerical stability test passed!\")\n",
-        "    else:\n",
-        "        print(\"\u274c Numerical stability test failed: extreme values produced NaN/inf\")\n",
-        "    \n",
-        "    print(\"\u2705 Sigmoid tests complete!\")\n",
-        "    \n",
-        "    # \ud83c\udfa8 Visualize Sigmoid behavior (development only)\n",
-        "    if _should_show_plots():\n",
-        "        print(\"\\n\ud83c\udfa8 Visualizing Sigmoid behavior...\")\n",
-        "        visualize_activation_function(sigmoid, \"Sigmoid\", x_range=(-5, 5))\n",
-        "        \n",
-        "        # Show Sigmoid with real data\n",
-        "        sample_data = Tensor([[-3.0, -1.0, 0.0, 1.0, 3.0]])\n",
-        "        visualize_activation_on_data(sigmoid, \"Sigmoid\", sample_data)\n",
-        "    \n",
-        "except NotImplementedError:\n",
-        "    print(\"\u26a0\ufe0f  Sigmoid not implemented yet - complete the forward method above!\")\n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error in Sigmoid: {e}\")\n",
-        "    print(\"   Check your implementation in the forward method\")\n",
-        "\n",
-        "print()  # Add spacing"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "a987dc2f",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 4: Tanh - The Centered Alternative\n",
-        "\n",
-        "### What is Tanh?\n",
-        "**Tanh (Hyperbolic Tangent)** is similar to Sigmoid but centered around zero, with range (-1, 1).\n",
-        "\n",
-        "**Mathematical Definition:**\n",
-        "```\n",
-        "f(x) = (e^x - e^(-x)) / (e^x + e^(-x))\n",
-        "```\n",
-        "\n",
-        "**Alternative form:**\n",
-        "```\n",
-        "f(x) = 2 * sigmoid(2x) - 1\n",
-        "```\n",
-        "\n",
-        "**Key Properties:**\n",
-        "- **Range**: (-1, 1) - symmetric around zero\n",
-        "- **Zero-centered**: Output has mean closer to zero\n",
-        "- **Smooth**: Differentiable everywhere\n",
-        "- **Stronger gradients**: Steeper than sigmoid\n",
-        "\n",
-        "### Why Tanh is Better Than Sigmoid\n",
-        "1. **Zero-centered**: Helps with gradient flow in deep networks\n",
-        "2. **Stronger gradients**: Faster convergence in some cases\n",
-        "3. **Symmetric**: Better for certain applications\n",
-        "\n",
-        "### Real-World Analogy\n",
-        "Tanh is like a **balanced scale** - it can tip strongly in either direction (-1 to +1) but defaults to neutral (0).\n",
-        "\n",
-        "### When to Use Tanh\n",
-        "- **Hidden layers** (alternative to ReLU)\n",
-        "- **Recurrent networks** (RNNs, LSTMs)\n",
-        "- **When you need zero-centered outputs**"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "e0ecd200",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "class Tanh:\n",
-        "    \"\"\"\n",
-        "    Tanh Activation Function: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))\n",
-        "    \n",
-        "    Zero-centered activation function with range (-1, 1).\n",
-        "    Often preferred over Sigmoid for hidden layers.\n",
-        "    \"\"\"\n",
-        "    \n",
-        "    def forward(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"\n",
-        "        Apply Tanh activation: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))\n",
-        "        \n",
-        "        TODO: Implement Tanh activation\n",
-        "        \n",
-        "        APPROACH:\n",
-        "        1. Use numpy's built-in tanh function: np.tanh(x.data)\n",
-        "        2. Return a new Tensor with the results\n",
-        "        \n",
-        "        ALTERNATIVE APPROACH:\n",
-        "        1. Compute e^x and e^(-x)\n",
-        "        2. Use formula: (e^x - e^(-x)) / (e^x + e^(-x))\n",
-        "        \n",
-        "        EXAMPLE:\n",
-        "        Input: Tensor([[-2, -1, 0, 1, 2]])\n",
-        "        Expected: Tensor([[-0.964, -0.762, 0.0, 0.762, 0.964]]) (approximately)\n",
-        "        \n",
-        "        HINTS:\n",
-        "        - np.tanh() is the simplest approach\n",
-        "        - Output range is (-1, 1)\n",
-        "        - tanh(0) = 0 (zero-centered)\n",
-        "        - Remember to return a new Tensor object\n",
-        "        \"\"\"\n",
-        "        raise NotImplementedError(\"Student implementation required\")\n",
-        "    \n",
-        "    def __call__(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"Allow calling the activation like a function: tanh(x)\"\"\"\n",
-        "        return self.forward(x)"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "0cdb8bc3",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "class Tanh:\n",
-        "    \"\"\"Tanh Activation: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))\"\"\"\n",
-        "    \n",
-        "    def forward(self, x: Tensor) -> Tensor:\n",
-        "        result = np.tanh(x.data)\n",
-        "        return Tensor(result)\n",
-        "        \n",
-        "    def __call__(self, x: Tensor) -> Tensor:\n",
-        "        return self.forward(x)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "b05e8d68",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "### \ud83e\uddea Test Your Tanh Implementation\n",
-        "\n",
-        "Let's test your Tanh implementation to ensure it's working correctly:"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "08eafad6",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "try:\n",
-        "    # Create Tanh activation\n",
-        "    tanh = Tanh()\n",
-        "    \n",
-        "    print(\"\ud83d\udd27 Testing Tanh Implementation\")\n",
-        "    print(\"=\" * 40)\n",
-        "    \n",
-        "    # Test 1: Basic functionality\n",
-        "    test_input = Tensor([[-2, -1, 0, 1, 2]])\n",
-        "    result = tanh(test_input)\n",
-        "    \n",
-        "    print(f\"Input:  {test_input.data.flatten()}\")\n",
-        "    print(f\"Output: {result.data.flatten()}\")\n",
-        "    \n",
-        "    # Check properties\n",
-        "    # 1. All outputs should be between -1 and 1\n",
-        "    if np.all(result.data >= -1) and np.all(result.data <= 1):\n",
-        "        print(\"\u2705 Range test passed: all outputs in (-1, 1)\")\n",
-        "    else:\n",
-        "        print(\"\u274c Range test failed: outputs should be in (-1, 1)\")\n",
-        "    \n",
-        "    # 2. Tanh(0) should be 0\n",
-        "    zero_input = Tensor([[0]])\n",
-        "    zero_result = tanh(zero_input)\n",
-        "    if abs(zero_result.data.item()) < 1e-6:\n",
-        "        print(\"\u2705 Tanh(0) = 0 test passed!\")\n",
-        "    else:\n",
-        "        print(f\"\u274c Tanh(0) should be 0, got {zero_result.data.item()}\")\n",
-        "    \n",
-        "    # 3. Test antisymmetry: tanh(-x) = -tanh(x)\n",
-        "    x_val = 1.5\n",
-        "    pos_result = tanh(Tensor([[x_val]])).data.item()\n",
-        "    neg_result = tanh(Tensor([[-x_val]])).data.item()\n",
-        "    \n",
-        "    if abs(pos_result + neg_result) < 1e-6:\n",
-        "        print(\"\u2705 Antisymmetry test passed!\")\n",
-        "    else:\n",
-        "        print(f\"\u274c Antisymmetry test failed: tanh({x_val}) + tanh({-x_val}) should equal 0\")\n",
-        "    \n",
-        "    # 4. Test that tanh is stronger than sigmoid\n",
-        "    # For the same input, |tanh(x)| should be > |sigmoid(x) - 0.5|\n",
-        "    test_val = 1.0\n",
-        "    tanh_result = abs(tanh(Tensor([[test_val]])).data.item())\n",
-        "    sigmoid_result = abs(sigmoid(Tensor([[test_val]])).data.item() - 0.5)\n",
-        "    \n",
-        "    if tanh_result > sigmoid_result:\n",
-        "        print(\"\u2705 Stronger gradient test passed!\")\n",
-        "    else:\n",
-        "        print(\"\u274c Tanh should have stronger gradients than sigmoid\")\n",
-        "    \n",
-        "    print(\"\u2705 Tanh tests complete!\")\n",
-        "    \n",
-        "    # \ud83c\udfa8 Visualize Tanh behavior (development only)\n",
-        "    if _should_show_plots():\n",
-        "        print(\"\\n\ud83c\udfa8 Visualizing Tanh behavior...\")\n",
-        "        visualize_activation_function(tanh, \"Tanh\", x_range=(-3, 3))\n",
-        "        \n",
-        "        # Show Tanh with real data\n",
-        "        sample_data = Tensor([[-2.0, -1.0, 0.0, 1.0, 2.0]])\n",
-        "        visualize_activation_on_data(tanh, \"Tanh\", sample_data)\n",
-        "    \n",
-        "except NotImplementedError:\n",
-        "    print(\"\u26a0\ufe0f  Tanh not implemented yet - complete the forward method above!\")\n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error in Tanh: {e}\")\n",
-        "    print(\"   Check your implementation in the forward method\")\n",
-        "\n",
-        "print()  # Add spacing"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "5af77df8",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 5: Softmax - The Probability Maker\n",
-        "\n",
-        "### What is Softmax?\n",
-        "**Softmax** converts a vector of real numbers into a probability distribution. It's essential for multi-class classification.\n",
-        "\n",
-        "**Mathematical Definition:**\n",
-        "```\n",
-        "f(x_i) = e^(x_i) / \u03a3(e^(x_j)) for all j\n",
-        "```\n",
-        "\n",
-        "**Key Properties:**\n",
-        "- **Probability distribution**: All outputs sum to 1\n",
-        "- **Non-negative**: All outputs \u2265 0\n",
-        "- **Differentiable**: Smooth for optimization\n",
-        "- **Relative**: Emphasizes the largest input\n",
-        "\n",
-        "### Why Softmax is Special\n",
-        "1. **Probability interpretation**: Perfect for classification\n",
-        "2. **Competitive**: Emphasizes the winner (largest input)\n",
-        "3. **Differentiable**: Works well with gradient descent\n",
-        "\n",
-        "### Real-World Analogy\n",
-        "Softmax is like **voting with enthusiasm** - not only does the most popular choice win, but the \"votes\" are weighted by how much more popular it is.\n",
-        "\n",
-        "### When to Use Softmax\n",
-        "- **Multi-class classification** (output layer)\n",
-        "- **Attention mechanisms** (in Transformers)\n",
-        "- **When you need probability distributions**\n",
-        "\n",
-        "### Numerical Stability Note\n",
-        "For numerical stability, we subtract the maximum value before computing exponentials."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "a8601324",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "class Softmax:\n",
-        "    \"\"\"\n",
-        "    Softmax Activation Function: f(x_i) = e^(x_i) / \u03a3(e^(x_j))\n",
-        "    \n",
-        "    Converts a vector of real numbers into a probability distribution.\n",
-        "    Essential for multi-class classification.\n",
-        "    \"\"\"\n",
-        "    \n",
-        "    def forward(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"\n",
-        "        Apply Softmax activation: f(x_i) = e^(x_i) / \u03a3(e^(x_j))\n",
-        "        \n",
-        "        TODO: Implement Softmax activation\n",
-        "        \n",
-        "        APPROACH:\n",
-        "        1. For numerical stability, subtract the maximum value from each row\n",
-        "        2. Compute exponentials of the shifted values\n",
-        "        3. Divide each exponential by the sum of exponentials in its row\n",
-        "        4. Return a new Tensor with the results\n",
-        "        \n",
-        "        EXAMPLE:\n",
-        "        Input: Tensor([[1, 2, 3]])\n",
-        "        Expected: Tensor([[0.090, 0.245, 0.665]]) (approximately)\n",
-        "        Sum should be 1.0\n",
-        "        \n",
-        "        HINTS:\n",
-        "        - Use np.max(x.data, axis=1, keepdims=True) to find row maximums\n",
-        "        - Subtract max from x.data for numerical stability\n",
-        "        - Use np.exp() for exponentials\n",
-        "        - Use np.sum(exp_vals, axis=1, keepdims=True) for row sums\n",
-        "        - Remember to return a new Tensor object\n",
-        "        \"\"\"\n",
-        "        raise NotImplementedError(\"Student implementation required\")\n",
-        "    \n",
-        "    def __call__(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"Allow calling the activation like a function: softmax(x)\"\"\"\n",
-        "        return self.forward(x)"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "c59da816",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "class Softmax:\n",
-        "    \"\"\"Softmax Activation: f(x_i) = e^(x_i) / \u03a3(e^(x_j))\"\"\"\n",
-        "    \n",
-        "    def forward(self, x: Tensor) -> Tensor:\n",
-        "        # Subtract max for numerical stability\n",
-        "        shifted = x.data - np.max(x.data, axis=1, keepdims=True)\n",
-        "        exp_vals = np.exp(shifted)\n",
-        "        result = exp_vals / np.sum(exp_vals, axis=1, keepdims=True)\n",
-        "        return Tensor(result)\n",
-        "        \n",
-        "    def __call__(self, x: Tensor) -> Tensor:\n",
-        "        return self.forward(x)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "fc394348",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "### \ud83e\uddea Test Your Softmax Implementation\n",
-        "\n",
-        "Let's test your Softmax implementation to ensure it's working correctly:"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "7f960109",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "try:\n",
-        "    # Create Softmax activation\n",
-        "    softmax = Softmax()\n",
-        "    \n",
-        "    print(\"\ud83d\udd27 Testing Softmax Implementation\")\n",
-        "    print(\"=\" * 40)\n",
-        "    \n",
-        "    # Test 1: Basic functionality\n",
-        "    test_input = Tensor([[1, 2, 3]])\n",
-        "    result = softmax(test_input)\n",
-        "    \n",
-        "    print(f\"Input:  {test_input.data.flatten()}\")\n",
-        "    print(f\"Output: {result.data.flatten()}\")\n",
-        "    \n",
-        "    # Check properties\n",
-        "    # 1. All outputs should be non-negative\n",
-        "    if np.all(result.data >= 0):\n",
-        "        print(\"\u2705 Non-negative test passed!\")\n",
-        "    else:\n",
-        "        print(\"\u274c Non-negative test failed: all outputs should be \u2265 0\")\n",
-        "    \n",
-        "    # 2. Sum should equal 1 (probability distribution)\n",
-        "    row_sums = np.sum(result.data, axis=1)\n",
-        "    if np.allclose(row_sums, 1.0):\n",
-        "        print(\"\u2705 Probability distribution test passed!\")\n",
-        "    else:\n",
-        "        print(f\"\u274c Sum test failed: sum should be 1.0, got {row_sums}\")\n",
-        "    \n",
-        "    # 3. Test with multiple rows\n",
-        "    multi_input = Tensor([[1, 2, 3], [0, 0, 0], [10, 20, 30]])\n",
-        "    multi_result = softmax(multi_input)\n",
-        "    multi_sums = np.sum(multi_result.data, axis=1)\n",
-        "    \n",
-        "    if np.allclose(multi_sums, 1.0):\n",
-        "        print(\"\u2705 Multi-row test passed!\")\n",
-        "    else:\n",
-        "        print(f\"\u274c Multi-row test failed: all row sums should be 1.0, got {multi_sums}\")\n",
-        "    \n",
-        "    # 4. Test numerical stability\n",
-        "    large_input = Tensor([[1000, 1001, 1002]])\n",
-        "    large_result = softmax(large_input)\n",
-        "    \n",
-        "    # Should not produce NaN or inf\n",
-        "    if not np.any(np.isnan(large_result.data)) and not np.any(np.isinf(large_result.data)):\n",
-        "        print(\"\u2705 Numerical stability test passed!\")\n",
-        "    else:\n",
-        "        print(\"\u274c Numerical stability test failed: large values produced NaN/inf\")\n",
-        "    \n",
-        "    # 5. Test that largest input gets highest probability\n",
-        "    test_logits = Tensor([[1, 5, 2]])\n",
-        "    test_probs = softmax(test_logits)\n",
-        "    max_idx = np.argmax(test_probs.data)\n",
-        "    \n",
-        "    if max_idx == 1:  # Second element (index 1) should be largest\n",
-        "        print(\"\u2705 Max probability test passed!\")\n",
-        "    else:\n",
-        "        print(\"\u274c Max probability test failed: largest input should get highest probability\")\n",
-        "    \n",
-        "    print(\"\u2705 Softmax tests complete!\")\n",
-        "    \n",
-        "    # \ud83c\udfa8 Visualize Softmax behavior (development only)\n",
-        "    if _should_show_plots():\n",
-        "        print(\"\\n\ud83c\udfa8 Visualizing Softmax behavior...\")\n",
-        "        # Note: Softmax is different - it's a vector function, so we show it differently\n",
-        "        sample_logits = Tensor([[1.0, 2.0, 3.0]])  # Simple 3-class example\n",
-        "        softmax_output = softmax(sample_logits)\n",
-        "        \n",
-        "        print(f\"   Example: logits {sample_logits.data.flatten()} \u2192 probabilities {softmax_output.data.flatten()}\")\n",
-        "        print(f\"   Sum of probabilities: {softmax_output.data.sum():.6f} (should be 1.0)\")\n",
-        "        \n",
-        "        # Show how different input scales affect output\n",
-        "        scale_examples = [\n",
-        "            Tensor([[1.0, 2.0, 3.0]]),    # Original\n",
-        "            Tensor([[2.0, 4.0, 6.0]]),    # Scaled up\n",
-        "            Tensor([[0.1, 0.2, 0.3]]),    # Scaled down\n",
-        "        ]\n",
-        "        \n",
-        "        print(\"\\n   \ud83d\udcca Scale sensitivity:\")\n",
-        "        for i, example in enumerate(scale_examples):\n",
-        "            output = softmax(example)\n",
-        "            print(f\"   Scale {i+1}: {example.data.flatten()} \u2192 {output.data.flatten()}\")\n",
-        "    \n",
-        "except NotImplementedError:\n",
-        "    print(\"\u26a0\ufe0f  Softmax not implemented yet - complete the forward method above!\")\n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error in Softmax: {e}\")\n",
-        "    print(\"   Check your implementation in the forward method\")\n",
-        "\n",
-        "print()  # Add spacing"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "f7dd27a4",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## \ud83c\udfa8 Comprehensive Activation Function Comparison\n",
-        "\n",
-        "Now that we've implemented all four activation functions, let's compare them side by side to understand their differences and use cases."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "9c0ed7b3",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Comprehensive comparison of all activation functions\n",
-        "print(\"\ud83c\udfa8 Comprehensive Activation Function Comparison\")\n",
-        "print(\"=\" * 60)\n",
-        "\n",
-        "try:\n",
-        "    # Create all activation functions\n",
-        "    activations = {\n",
-        "        'ReLU': ReLU(),\n",
-        "        'Sigmoid': Sigmoid(),\n",
-        "        'Tanh': Tanh(),\n",
-        "        'Softmax': Softmax()\n",
-        "    }\n",
-        "    \n",
-        "    # Test with sample data\n",
-        "    test_data = Tensor([[-2, -1, 0, 1, 2]])\n",
-        "    \n",
-        "    print(\"\ud83d\udcca Activation Function Outputs:\")\n",
-        "    print(f\"Input: {test_data.data.flatten()}\")\n",
-        "    print(\"-\" * 40)\n",
-        "    \n",
-        "    for name, activation in activations.items():\n",
-        "        try:\n",
-        "            result = activation(test_data)\n",
-        "            print(f\"{name:8}: {result.data.flatten()}\")\n",
-        "        except Exception as e:\n",
-        "            print(f\"{name:8}: Error - {e}\")\n",
-        "    \n",
-        "    print(\"\\n\ud83d\udcc8 Key Properties Summary:\")\n",
-        "    print(\"-\" * 40)\n",
-        "    print(\"ReLU     : Range [0, \u221e), sparse, fast\")\n",
-        "    print(\"Sigmoid  : Range (0, 1), smooth, probability-like\")\n",
-        "    print(\"Tanh     : Range (-1, 1), zero-centered, symmetric\")\n",
-        "    print(\"Softmax  : Probability distribution, sums to 1\")\n",
-        "    \n",
-        "    print(\"\\n\ud83c\udfaf When to Use Each:\")\n",
-        "    print(\"-\" * 40)\n",
-        "    print(\"ReLU     : Hidden layers, CNNs, most deep networks\")\n",
-        "    print(\"Sigmoid  : Binary classification, gates, probabilities\")\n",
-        "    print(\"Tanh     : RNNs, when you need zero-centered output\")\n",
-        "    print(\"Softmax  : Multi-class classification, attention\")\n",
-        "    \n",
-        "    # Show comprehensive visualization if available\n",
-        "    if _should_show_plots():\n",
-        "        print(\"\\n\ud83c\udfa8 Generating comprehensive comparison plot...\")\n",
-        "        try:\n",
-        "            import matplotlib.pyplot as plt\n",
-        "            \n",
-        "            fig, axes = plt.subplots(2, 2, figsize=(12, 10))\n",
-        "            fig.suptitle('Activation Function Comparison', fontsize=16)\n",
-        "            \n",
-        "            x_vals = np.linspace(-5, 5, 100)\n",
-        "            \n",
-        "            # Plot each activation function\n",
-        "            for i, (name, activation) in enumerate(list(activations.items())[:3]):  # Skip Softmax for now\n",
-        "                row, col = i // 2, i % 2\n",
-        "                ax = axes[row, col]\n",
-        "                \n",
-        "                y_vals = []\n",
-        "                for x in x_vals:\n",
-        "                    try:\n",
-        "                        input_tensor = Tensor([[x]])\n",
-        "                        output = activation(input_tensor)\n",
-        "                        y_vals.append(output.data.item())\n",
-        "                    except:\n",
-        "                        y_vals.append(0)\n",
-        "                \n",
-        "                ax.plot(x_vals, y_vals, 'b-', linewidth=2)\n",
-        "                ax.set_title(f'{name} Activation')\n",
-        "                ax.grid(True, alpha=0.3)\n",
-        "                ax.set_xlabel('Input (x)')\n",
-        "                ax.set_ylabel(f'{name}(x)')\n",
-        "            \n",
-        "            # Special handling for Softmax\n",
-        "            ax = axes[1, 1]\n",
-        "            sample_inputs = np.array([[1, 2, 3], [0, 0, 0], [-1, 0, 1]])\n",
-        "            softmax_results = []\n",
-        "            \n",
-        "            for inp in sample_inputs:\n",
-        "                result = softmax(Tensor([inp]))\n",
-        "                softmax_results.append(result.data.flatten())\n",
-        "            \n",
-        "            x_pos = np.arange(len(sample_inputs))\n",
-        "            width = 0.25\n",
-        "            \n",
-        "            for i in range(3):  # 3 classes\n",
-        "                values = [result[i] for result in softmax_results]\n",
-        "                ax.bar(x_pos + i * width, values, width, label=f'Class {i+1}')\n",
-        "            \n",
-        "            ax.set_title('Softmax Activation')\n",
-        "            ax.set_xlabel('Input Examples')\n",
-        "            ax.set_ylabel('Probability')\n",
-        "            ax.set_xticks(x_pos + width)\n",
-        "            ax.set_xticklabels(['[1,2,3]', '[0,0,0]', '[-1,0,1]'])\n",
-        "            ax.legend()\n",
-        "            \n",
-        "            plt.tight_layout()\n",
-        "            plt.show()\n",
-        "            \n",
-        "        except ImportError:\n",
-        "            print(\"   \ud83d\udcca Matplotlib not available - skipping comprehensive plot\")\n",
-        "        except Exception as e:\n",
-        "            print(f\"   \u26a0\ufe0f  Comprehensive plot error: {e}\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error in comprehensive comparison: {e}\")\n",
-        "\n",
-        "print(\"\\n\" + \"=\" * 60)\n",
-        "print(\"\ud83c\udf89 Congratulations! You've implemented all four activation functions!\")\n",
-        "print(\"You now understand the building blocks that make neural networks intelligent.\")\n",
-        "print(\"=\" * 60) "
-      ]
-    }
-  ],
-  "metadata": {
-    "jupytext": {
-      "main_language": "python"
-    }
-  },
-  "nbformat": 4,
-  "nbformat_minor": 5
-}
\ No newline at end of file
diff --git a/assignments/source/03_layers/03_layers.ipynb b/assignments/source/03_layers/03_layers.ipynb
deleted file mode 100644
index ea53eb3b..00000000
--- a/assignments/source/03_layers/03_layers.ipynb
+++ /dev/null
@@ -1,797 +0,0 @@
-{
-  "cells": [
-    {
-      "cell_type": "markdown",
-      "id": "0a3df1fa",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "# Module 2: Layers - Neural Network Building Blocks\n",
-        "\n",
-        "Welcome to the Layers module! This is where neural networks begin. You'll implement the fundamental building blocks that transform tensors.\n",
-        "\n",
-        "## Learning Goals\n",
-        "- Understand layers as functions that transform tensors: `y = f(x)`\n",
-        "- Implement Dense layers with linear transformations: `y = Wx + b`\n",
-        "- Use activation functions from the activations module for nonlinearity\n",
-        "- See how neural networks are just function composition\n",
-        "- Build intuition before diving into training\n",
-        "\n",
-        "## Build \u2192 Use \u2192 Understand\n",
-        "1. **Build**: Dense layers using activation functions as building blocks\n",
-        "2. **Use**: Transform tensors and see immediate results\n",
-        "3. **Understand**: How neural networks transform information\n",
-        "\n",
-        "## Module Dependencies\n",
-        "This module builds on the **activations** module:\n",
-        "- **activations** \u2192 **layers** \u2192 **networks**\n",
-        "- Clean separation of concerns: math functions \u2192 layer building blocks \u2192 full networks"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "7ad0cde1",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## \ud83d\udce6 Where This Code Lives in the Final Package\n",
-        "\n",
-        "**Learning Side:** You work in `modules/03_layers/layers_dev.py`  \n",
-        "**Building Side:** Code exports to `tinytorch.core.layers`\n",
-        "\n",
-        "```python\n",
-        "# Final package structure:\n",
-        "from tinytorch.core.layers import Dense, Conv2D  # All layers together!\n",
-        "from tinytorch.core.activations import ReLU, Sigmoid, Tanh\n",
-        "from tinytorch.core.tensor import Tensor\n",
-        "```\n",
-        "\n",
-        "**Why this matters:**\n",
-        "- **Learning:** Focused modules for deep understanding\n",
-        "- **Production:** Proper organization like PyTorch's `torch.nn`\n",
-        "- **Consistency:** All layers (Dense, Conv2D) live together in `core.layers`"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "5e2b163c",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "#| default_exp core.layers\n",
-        "\n",
-        "# Setup and imports\n",
-        "import numpy as np\n",
-        "import sys\n",
-        "from typing import Union, Optional, Callable\n",
-        "import math"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "75eb63f1",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "import numpy as np\n",
-        "import math\n",
-        "import sys\n",
-        "from typing import Union, Optional, Callable\n",
-        "\n",
-        "# Import from the main package (rock solid foundation)\n",
-        "from tinytorch.core.tensor import Tensor\n",
-        "from tinytorch.core.activations import ReLU, Sigmoid, Tanh\n",
-        "\n",
-        "# print(\"\ud83d\udd25 TinyTorch Layers Module\")\n",
-        "# print(f\"NumPy version: {np.__version__}\")\n",
-        "# print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
-        "# print(\"Ready to build neural network layers!\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "0d8689a4",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## Step 1: What is a Layer?\n",
-        "\n",
-        "### Definition\n",
-        "A **layer** is a function that transforms tensors. Think of it as a mathematical operation that takes input data and produces output data:\n",
-        "\n",
-        "```\n",
-        "Input Tensor \u2192 Layer \u2192 Output Tensor\n",
-        "```\n",
-        "\n",
-        "### Why Layers Matter in Neural Networks\n",
-        "Layers are the fundamental building blocks of all neural networks because:\n",
-        "- **Modularity**: Each layer has a specific job (linear transformation, nonlinearity, etc.)\n",
-        "- **Composability**: Layers can be combined to create complex functions\n",
-        "- **Learnability**: Each layer has parameters that can be learned from data\n",
-        "- **Interpretability**: Different layers learn different features\n",
-        "\n",
-        "### The Fundamental Insight\n",
-        "**Neural networks are just function composition!**\n",
-        "```\n",
-        "x \u2192 Layer1 \u2192 Layer2 \u2192 Layer3 \u2192 y\n",
-        "```\n",
-        "\n",
-        "Each layer transforms the data, and the final output is the composition of all these transformations.\n",
-        "\n",
-        "### Real-World Examples\n",
-        "- **Dense Layer**: Learns linear relationships between features\n",
-        "- **Convolutional Layer**: Learns spatial patterns in images\n",
-        "- **Recurrent Layer**: Learns temporal patterns in sequences\n",
-        "- **Activation Layer**: Adds nonlinearity to make networks powerful\n",
-        "\n",
-        "### Visual Intuition\n",
-        "```\n",
-        "Input: [1, 2, 3] (3 features)\n",
-        "Dense Layer: y = Wx + b\n",
-        "Weights W: [[0.1, 0.2, 0.3],\n",
-        "            [0.4, 0.5, 0.6]] (2\u00d73 matrix)\n",
-        "Bias b: [0.1, 0.2] (2 values)\n",
-        "Output: [0.1*1 + 0.2*2 + 0.3*3 + 0.1,\n",
-        "         0.4*1 + 0.5*2 + 0.6*3 + 0.2] = [1.4, 3.2]\n",
-        "```\n",
-        "\n",
-        "Let's start with the most important layer: **Dense** (also called Linear or Fully Connected)."
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "16017609",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 2: Understanding Matrix Multiplication\n",
-        "\n",
-        "Before we build layers, let's understand the core operation: **matrix multiplication**. This is what powers all neural network computations.\n",
-        "\n",
-        "### Why Matrix Multiplication Matters\n",
-        "- **Efficiency**: Process multiple inputs at once\n",
-        "- **Parallelization**: GPU acceleration works great with matrix operations\n",
-        "- **Batch processing**: Handle multiple samples simultaneously\n",
-        "- **Mathematical foundation**: Linear algebra is the language of neural networks\n",
-        "\n",
-        "### The Math Behind It\n",
-        "For matrices A (m\u00d7n) and B (n\u00d7p), the result C (m\u00d7p) is:\n",
-        "```\n",
-        "C[i,j] = sum(A[i,k] * B[k,j] for k in range(n))\n",
-        "```\n",
-        "\n",
-        "### Visual Example\n",
-        "```\n",
-        "A = [[1, 2],     B = [[5, 6],\n",
-        "     [3, 4]]          [7, 8]]\n",
-        "\n",
-        "C = A @ B = [[1*5 + 2*7,  1*6 + 2*8],\n",
-        "              [3*5 + 4*7,  3*6 + 4*8]]\n",
-        "  = [[19, 22],\n",
-        "     [43, 50]]\n",
-        "```\n",
-        "\n",
-        "Let's implement this step by step!"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "40630d5d",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray:\n",
-        "    \"\"\"\n",
-        "    Naive matrix multiplication using explicit for-loops.\n",
-        "    \n",
-        "    This helps you understand what matrix multiplication really does!\n",
-        "    \n",
-        "    Args:\n",
-        "        A: Matrix of shape (m, n)\n",
-        "        B: Matrix of shape (n, p)\n",
-        "        \n",
-        "    Returns:\n",
-        "        Matrix of shape (m, p) where C[i,j] = sum(A[i,k] * B[k,j] for k in range(n))\n",
-        "        \n",
-        "    TODO: Implement matrix multiplication using three nested for-loops.\n",
-        "    \n",
-        "    APPROACH:\n",
-        "    1. Get the dimensions: m, n from A and n2, p from B\n",
-        "    2. Check that n == n2 (matrices must be compatible)\n",
-        "    3. Create output matrix C of shape (m, p) filled with zeros\n",
-        "    4. Use three nested loops:\n",
-        "       - i loop: rows of A (0 to m-1)\n",
-        "       - j loop: columns of B (0 to p-1) \n",
-        "       - k loop: shared dimension (0 to n-1)\n",
-        "    5. For each (i,j), compute: C[i,j] += A[i,k] * B[k,j]\n",
-        "    \n",
-        "    EXAMPLE:\n",
-        "    A = [[1, 2],     B = [[5, 6],\n",
-        "         [3, 4]]          [7, 8]]\n",
-        "    \n",
-        "    C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0] = 1*5 + 2*7 = 19\n",
-        "    C[0,1] = A[0,0]*B[0,1] + A[0,1]*B[1,1] = 1*6 + 2*8 = 22\n",
-        "    C[1,0] = A[1,0]*B[0,0] + A[1,1]*B[1,0] = 3*5 + 4*7 = 43\n",
-        "    C[1,1] = A[1,0]*B[0,1] + A[1,1]*B[1,1] = 3*6 + 4*8 = 50\n",
-        "    \n",
-        "    HINTS:\n",
-        "    - Start with C = np.zeros((m, p))\n",
-        "    - Use three nested for loops: for i in range(m): for j in range(p): for k in range(n):\n",
-        "    - Accumulate the sum: C[i,j] += A[i,k] * B[k,j]\n",
-        "    \"\"\"\n",
-        "    raise NotImplementedError(\"Student implementation required\")"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "445593e1",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray:\n",
-        "    \"\"\"\n",
-        "    Naive matrix multiplication using explicit for-loops.\n",
-        "    \n",
-        "    This helps you understand what matrix multiplication really does!\n",
-        "    \"\"\"\n",
-        "    m, n = A.shape\n",
-        "    n2, p = B.shape\n",
-        "    assert n == n2, f\"Matrix shapes don't match: A({m},{n}) @ B({n2},{p})\"\n",
-        "    \n",
-        "    C = np.zeros((m, p))\n",
-        "    for i in range(m):\n",
-        "        for j in range(p):\n",
-        "            for k in range(n):\n",
-        "                C[i, j] += A[i, k] * B[k, j]\n",
-        "    return C"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "e23b8269",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "### \ud83e\uddea Test Your Matrix Multiplication"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "48fadbe0",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test matrix multiplication\n",
-        "print(\"Testing matrix multiplication...\")\n",
-        "\n",
-        "try:\n",
-        "    # Test case 1: Simple 2x2 matrices\n",
-        "    A = np.array([[1, 2], [3, 4]], dtype=np.float32)\n",
-        "    B = np.array([[5, 6], [7, 8]], dtype=np.float32)\n",
-        "    \n",
-        "    result = matmul_naive(A, B)\n",
-        "    expected = np.array([[19, 22], [43, 50]], dtype=np.float32)\n",
-        "    \n",
-        "    print(f\"\u2705 Matrix A:\\n{A}\")\n",
-        "    print(f\"\u2705 Matrix B:\\n{B}\")\n",
-        "    print(f\"\u2705 Your result:\\n{result}\")\n",
-        "    print(f\"\u2705 Expected:\\n{expected}\")\n",
-        "    \n",
-        "    assert np.allclose(result, expected), \"\u274c Result doesn't match expected!\"\n",
-        "    print(\"\ud83c\udf89 Matrix multiplication works!\")\n",
-        "    \n",
-        "    # Test case 2: Compare with NumPy\n",
-        "    numpy_result = A @ B\n",
-        "    assert np.allclose(result, numpy_result), \"\u274c Doesn't match NumPy result!\"\n",
-        "    print(\"\u2705 Matches NumPy implementation!\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")\n",
-        "    print(\"Make sure to implement matmul_naive above!\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "3df7433e",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 3: Building the Dense Layer\n",
-        "\n",
-        "Now let's build the **Dense layer**, the most fundamental building block of neural networks. A Dense layer performs a linear transformation: `y = Wx + b`\n",
-        "\n",
-        "### What is a Dense Layer?\n",
-        "- **Linear transformation**: `y = Wx + b`\n",
-        "- **W**: Weight matrix (learnable parameters)\n",
-        "- **x**: Input tensor\n",
-        "- **b**: Bias vector (learnable parameters)\n",
-        "- **y**: Output tensor\n",
-        "\n",
-        "### Why Dense Layers Matter\n",
-        "- **Universal approximation**: Can approximate any function with enough neurons\n",
-        "- **Feature learning**: Each neuron learns a different feature\n",
-        "- **Nonlinearity**: When combined with activation functions, becomes very powerful\n",
-        "- **Foundation**: All other layers build on this concept\n",
-        "\n",
-        "### The Math\n",
-        "For input x of shape (batch_size, input_size):\n",
-        "- **W**: Weight matrix of shape (input_size, output_size)\n",
-        "- **b**: Bias vector of shape (output_size)\n",
-        "- **y**: Output of shape (batch_size, output_size)\n",
-        "\n",
-        "### Visual Example\n",
-        "```\n",
-        "Input: x = [1, 2, 3] (3 features)\n",
-        "Weights: W = [[0.1, 0.2],    Bias: b = [0.1, 0.2]\n",
-        "              [0.3, 0.4],\n",
-        "              [0.5, 0.6]]\n",
-        "\n",
-        "Step 1: Wx = [0.1*1 + 0.3*2 + 0.5*3,  0.2*1 + 0.4*2 + 0.6*3]\n",
-        "            = [2.2, 3.2]\n",
-        "\n",
-        "Step 2: y = Wx + b = [2.2 + 0.1, 3.2 + 0.2] = [2.3, 3.4]\n",
-        "```\n",
-        "\n",
-        "Let's implement this!"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "c98c433e",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "class Dense:\n",
-        "    \"\"\"\n",
-        "    Dense (Linear) Layer: y = Wx + b\n",
-        "    \n",
-        "    The fundamental building block of neural networks.\n",
-        "    Performs linear transformation: matrix multiplication + bias addition.\n",
-        "    \n",
-        "    Args:\n",
-        "        input_size: Number of input features\n",
-        "        output_size: Number of output features\n",
-        "        use_bias: Whether to include bias term (default: True)\n",
-        "        use_naive_matmul: Whether to use naive matrix multiplication (for learning)\n",
-        "        \n",
-        "    TODO: Implement the Dense layer with weight initialization and forward pass.\n",
-        "    \n",
-        "    APPROACH:\n",
-        "    1. Store layer parameters (input_size, output_size, use_bias, use_naive_matmul)\n",
-        "    2. Initialize weights with small random values (Xavier/Glorot initialization)\n",
-        "    3. Initialize bias to zeros (if use_bias=True)\n",
-        "    4. Implement forward pass using matrix multiplication and bias addition\n",
-        "    \n",
-        "    EXAMPLE:\n",
-        "    layer = Dense(input_size=3, output_size=2)\n",
-        "    x = Tensor([[1, 2, 3]])  # batch_size=1, input_size=3\n",
-        "    y = layer(x)  # shape: (1, 2)\n",
-        "    \n",
-        "    HINTS:\n",
-        "    - Use np.random.randn() for random initialization\n",
-        "    - Scale weights by sqrt(2/(input_size + output_size)) for Xavier init\n",
-        "    - Store weights and bias as numpy arrays\n",
-        "    - Use matmul_naive or @ operator based on use_naive_matmul flag\n",
-        "    \"\"\"\n",
-        "    \n",
-        "    def __init__(self, input_size: int, output_size: int, use_bias: bool = True, \n",
-        "                 use_naive_matmul: bool = False):\n",
-        "        \"\"\"\n",
-        "        Initialize Dense layer with random weights.\n",
-        "        \n",
-        "        Args:\n",
-        "            input_size: Number of input features\n",
-        "            output_size: Number of output features\n",
-        "            use_bias: Whether to include bias term\n",
-        "            use_naive_matmul: Use naive matrix multiplication (for learning)\n",
-        "            \n",
-        "        TODO: \n",
-        "        1. Store layer parameters (input_size, output_size, use_bias, use_naive_matmul)\n",
-        "        2. Initialize weights with small random values\n",
-        "        3. Initialize bias to zeros (if use_bias=True)\n",
-        "        \n",
-        "        STEP-BY-STEP:\n",
-        "        1. Store the parameters as instance variables\n",
-        "        2. Calculate scale factor for Xavier initialization: sqrt(2/(input_size + output_size))\n",
-        "        3. Initialize weights: np.random.randn(input_size, output_size) * scale\n",
-        "        4. If use_bias=True, initialize bias: np.zeros(output_size)\n",
-        "        5. If use_bias=False, set bias to None\n",
-        "        \n",
-        "        EXAMPLE:\n",
-        "        Dense(3, 2) creates:\n",
-        "        - weights: shape (3, 2) with small random values\n",
-        "        - bias: shape (2,) with zeros\n",
-        "        \"\"\"\n",
-        "        raise NotImplementedError(\"Student implementation required\")\n",
-        "    \n",
-        "    def forward(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"\n",
-        "        Forward pass: y = Wx + b\n",
-        "        \n",
-        "        Args:\n",
-        "            x: Input tensor of shape (batch_size, input_size)\n",
-        "            \n",
-        "        Returns:\n",
-        "            Output tensor of shape (batch_size, output_size)\n",
-        "            \n",
-        "        TODO: Implement matrix multiplication and bias addition\n",
-        "        - Use self.use_naive_matmul to choose between NumPy and naive implementation\n",
-        "        - If use_naive_matmul=True, use matmul_naive(x.data, self.weights)\n",
-        "        - If use_naive_matmul=False, use x.data @ self.weights\n",
-        "        - Add bias if self.use_bias=True\n",
-        "        \n",
-        "        STEP-BY-STEP:\n",
-        "        1. Perform matrix multiplication: Wx\n",
-        "           - If use_naive_matmul: result = matmul_naive(x.data, self.weights)\n",
-        "           - Else: result = x.data @ self.weights\n",
-        "        2. Add bias if use_bias: result += self.bias\n",
-        "        3. Return Tensor(result)\n",
-        "        \n",
-        "        EXAMPLE:\n",
-        "        Input x: Tensor([[1, 2, 3]])  # shape (1, 3)\n",
-        "        Weights: shape (3, 2)\n",
-        "        Output: Tensor([[val1, val2]])  # shape (1, 2)\n",
-        "        \n",
-        "        HINTS:\n",
-        "        - x.data gives you the numpy array\n",
-        "        - self.weights is your weight matrix\n",
-        "        - Use broadcasting for bias addition: result + self.bias\n",
-        "        - Return Tensor(result) to wrap the result\n",
-        "        \"\"\"\n",
-        "        raise NotImplementedError(\"Student implementation required\")\n",
-        "    \n",
-        "    def __call__(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"Make layer callable: layer(x) same as layer.forward(x)\"\"\"\n",
-        "        return self.forward(x)"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "2afc2026",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "class Dense:\n",
-        "    \"\"\"\n",
-        "    Dense (Linear) Layer: y = Wx + b\n",
-        "    \n",
-        "    The fundamental building block of neural networks.\n",
-        "    Performs linear transformation: matrix multiplication + bias addition.\n",
-        "    \"\"\"\n",
-        "    \n",
-        "    def __init__(self, input_size: int, output_size: int, use_bias: bool = True, \n",
-        "                 use_naive_matmul: bool = False):\n",
-        "        \"\"\"\n",
-        "        Initialize Dense layer with random weights.\n",
-        "        \n",
-        "        Args:\n",
-        "            input_size: Number of input features\n",
-        "            output_size: Number of output features\n",
-        "            use_bias: Whether to include bias term\n",
-        "            use_naive_matmul: Use naive matrix multiplication (for learning)\n",
-        "        \"\"\"\n",
-        "        # Store parameters\n",
-        "        self.input_size = input_size\n",
-        "        self.output_size = output_size\n",
-        "        self.use_bias = use_bias\n",
-        "        self.use_naive_matmul = use_naive_matmul\n",
-        "        \n",
-        "        # Xavier/Glorot initialization\n",
-        "        scale = np.sqrt(2.0 / (input_size + output_size))\n",
-        "        self.weights = np.random.randn(input_size, output_size).astype(np.float32) * scale\n",
-        "        \n",
-        "        # Initialize bias\n",
-        "        if use_bias:\n",
-        "            self.bias = np.zeros(output_size, dtype=np.float32)\n",
-        "        else:\n",
-        "            self.bias = None\n",
-        "    \n",
-        "    def forward(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"\n",
-        "        Forward pass: y = Wx + b\n",
-        "        \n",
-        "        Args:\n",
-        "            x: Input tensor of shape (batch_size, input_size)\n",
-        "            \n",
-        "        Returns:\n",
-        "            Output tensor of shape (batch_size, output_size)\n",
-        "        \"\"\"\n",
-        "        # Matrix multiplication\n",
-        "        if self.use_naive_matmul:\n",
-        "            result = matmul_naive(x.data, self.weights)\n",
-        "        else:\n",
-        "            result = x.data @ self.weights\n",
-        "        \n",
-        "        # Add bias\n",
-        "        if self.use_bias:\n",
-        "            result += self.bias\n",
-        "        \n",
-        "        return Tensor(result)\n",
-        "    \n",
-        "    def __call__(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"Make layer callable: layer(x) same as layer.forward(x)\"\"\"\n",
-        "        return self.forward(x)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "81d084d3",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "### \ud83e\uddea Test Your Dense Layer"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "24a4e96b",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test Dense layer\n",
-        "print(\"Testing Dense layer...\")\n",
-        "\n",
-        "try:\n",
-        "    # Test basic Dense layer\n",
-        "    layer = Dense(input_size=3, output_size=2, use_bias=True)\n",
-        "    x = Tensor([[1, 2, 3]])  # batch_size=1, input_size=3\n",
-        "    \n",
-        "    print(f\"\u2705 Input shape: {x.shape}\")\n",
-        "    print(f\"\u2705 Layer weights shape: {layer.weights.shape}\")\n",
-        "    print(f\"\u2705 Layer bias shape: {layer.bias.shape}\")\n",
-        "    \n",
-        "    y = layer(x)\n",
-        "    print(f\"\u2705 Output shape: {y.shape}\")\n",
-        "    print(f\"\u2705 Output: {y}\")\n",
-        "    \n",
-        "    # Test without bias\n",
-        "    layer_no_bias = Dense(input_size=2, output_size=1, use_bias=False)\n",
-        "    x2 = Tensor([[1, 2]])\n",
-        "    y2 = layer_no_bias(x2)\n",
-        "    print(f\"\u2705 No bias output: {y2}\")\n",
-        "    \n",
-        "    # Test naive matrix multiplication\n",
-        "    layer_naive = Dense(input_size=2, output_size=2, use_naive_matmul=True)\n",
-        "    x3 = Tensor([[1, 2]])\n",
-        "    y3 = layer_naive(x3)\n",
-        "    print(f\"\u2705 Naive matmul output: {y3}\")\n",
-        "    \n",
-        "    print(\"\\n\ud83c\udf89 All Dense layer tests passed!\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")\n",
-        "    print(\"Make sure to implement the Dense layer above!\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "a527c61e",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## Step 4: Composing Layers with Activations\n",
-        "\n",
-        "Now let's see how layers work together! A neural network is just layers composed with activation functions.\n",
-        "\n",
-        "### Why Layer Composition Matters\n",
-        "- **Nonlinearity**: Activation functions make networks powerful\n",
-        "- **Feature learning**: Each layer learns different levels of features\n",
-        "- **Universal approximation**: Can approximate any function\n",
-        "- **Modularity**: Easy to experiment with different architectures\n",
-        "\n",
-        "### The Pattern\n",
-        "```\n",
-        "Input \u2192 Dense \u2192 Activation \u2192 Dense \u2192 Activation \u2192 Output\n",
-        "```\n",
-        "\n",
-        "### Real-World Example\n",
-        "```\n",
-        "Input: [1, 2, 3] (3 features)\n",
-        "Dense(3\u21922): [1.4, 2.8] (linear transformation)\n",
-        "ReLU: [1.4, 2.8] (nonlinearity)\n",
-        "Dense(2\u21921): [3.2] (final prediction)\n",
-        "```\n",
-        "\n",
-        "Let's build a simple network!"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "db3611ff",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test layer composition\n",
-        "print(\"Testing layer composition...\")\n",
-        "\n",
-        "try:\n",
-        "    # Create a simple network: Dense \u2192 ReLU \u2192 Dense\n",
-        "    dense1 = Dense(input_size=3, output_size=2)\n",
-        "    relu = ReLU()\n",
-        "    dense2 = Dense(input_size=2, output_size=1)\n",
-        "    \n",
-        "    # Test input\n",
-        "    x = Tensor([[1, 2, 3]])\n",
-        "    print(f\"\u2705 Input: {x}\")\n",
-        "    \n",
-        "    # Forward pass through the network\n",
-        "    h1 = dense1(x)\n",
-        "    print(f\"\u2705 After Dense1: {h1}\")\n",
-        "    \n",
-        "    h2 = relu(h1)\n",
-        "    print(f\"\u2705 After ReLU: {h2}\")\n",
-        "    \n",
-        "    y = dense2(h2)\n",
-        "    print(f\"\u2705 Final output: {y}\")\n",
-        "    \n",
-        "    print(\"\\n\ud83c\udf89 Layer composition works!\")\n",
-        "    print(\"This is how neural networks work: layers + activations!\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")\n",
-        "    print(\"Make sure all your layers and activations are working!\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "69f75a1f",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## Step 5: Performance Comparison\n",
-        "\n",
-        "Let's compare our naive matrix multiplication with NumPy's optimized version to understand why optimization matters in ML.\n",
-        "\n",
-        "### Why Performance Matters\n",
-        "- **Training time**: Neural networks train for hours/days\n",
-        "- **Inference speed**: Real-time applications need fast predictions\n",
-        "- **GPU utilization**: Optimized operations use hardware efficiently\n",
-        "- **Scalability**: Large models need efficient implementations"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "25fc59d6",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Performance comparison\n",
-        "print(\"Comparing naive vs NumPy matrix multiplication...\")\n",
-        "\n",
-        "try:\n",
-        "    import time\n",
-        "    \n",
-        "    # Create test matrices\n",
-        "    A = np.random.randn(100, 100).astype(np.float32)\n",
-        "    B = np.random.randn(100, 100).astype(np.float32)\n",
-        "    \n",
-        "    # Time naive implementation\n",
-        "    start_time = time.time()\n",
-        "    result_naive = matmul_naive(A, B)\n",
-        "    naive_time = time.time() - start_time\n",
-        "    \n",
-        "    # Time NumPy implementation\n",
-        "    start_time = time.time()\n",
-        "    result_numpy = A @ B\n",
-        "    numpy_time = time.time() - start_time\n",
-        "    \n",
-        "    print(f\"\u2705 Naive time: {naive_time:.4f} seconds\")\n",
-        "    print(f\"\u2705 NumPy time: {numpy_time:.4f} seconds\")\n",
-        "    print(f\"\u2705 Speedup: {naive_time/numpy_time:.1f}x faster\")\n",
-        "    \n",
-        "    # Verify correctness\n",
-        "    assert np.allclose(result_naive, result_numpy), \"Results don't match!\"\n",
-        "    print(\"\u2705 Results are identical!\")\n",
-        "    \n",
-        "    print(\"\\n\ud83d\udca1 This is why we use optimized libraries in production!\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "ca2216d4",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## \ud83c\udfaf Module Summary\n",
-        "\n",
-        "Congratulations! You've built the foundation of neural network layers:\n",
-        "\n",
-        "### What You've Accomplished\n",
-        "\u2705 **Matrix Multiplication**: Understanding the core operation  \n",
-        "\u2705 **Dense Layer**: Linear transformation with weights and bias  \n",
-        "\u2705 **Layer Composition**: Combining layers with activations  \n",
-        "\u2705 **Performance Awareness**: Understanding optimization importance  \n",
-        "\u2705 **Testing**: Immediate feedback on your implementations  \n",
-        "\n",
-        "### Key Concepts You've Learned\n",
-        "- **Layers** are functions that transform tensors\n",
-        "- **Matrix multiplication** powers all neural network computations\n",
-        "- **Dense layers** perform linear transformations: `y = Wx + b`\n",
-        "- **Layer composition** creates complex functions from simple building blocks\n",
-        "- **Performance** matters for real-world ML applications\n",
-        "\n",
-        "### What's Next\n",
-        "In the next modules, you'll build on this foundation:\n",
-        "- **Networks**: Compose layers into complete models\n",
-        "- **Training**: Learn parameters with gradients and optimization\n",
-        "- **Convolutional layers**: Process spatial data like images\n",
-        "- **Recurrent layers**: Process sequential data like text\n",
-        "\n",
-        "### Real-World Connection\n",
-        "Your Dense layer is now ready to:\n",
-        "- Learn patterns in data through weight updates\n",
-        "- Transform features for classification and regression\n",
-        "- Serve as building blocks for complex architectures\n",
-        "- Integrate with the rest of the TinyTorch ecosystem\n",
-        "\n",
-        "**Ready for the next challenge?** Let's move on to building complete neural networks!"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "b8fef297",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Final verification\n",
-        "print(\"\\n\" + \"=\"*50)\n",
-        "print(\"\ud83c\udf89 LAYERS MODULE COMPLETE!\")\n",
-        "print(\"=\"*50)\n",
-        "print(\"\u2705 Matrix multiplication understanding\")\n",
-        "print(\"\u2705 Dense layer implementation\")\n",
-        "print(\"\u2705 Layer composition with activations\")\n",
-        "print(\"\u2705 Performance awareness\")\n",
-        "print(\"\u2705 Comprehensive testing\")\n",
-        "print(\"\\n\ud83d\ude80 Ready to build networks in the next module!\") "
-      ]
-    }
-  ],
-  "metadata": {
-    "jupytext": {
-      "main_language": "python"
-    }
-  },
-  "nbformat": 4,
-  "nbformat_minor": 5
-}
\ No newline at end of file
diff --git a/assignments/source/04_networks/04_networks.ipynb b/assignments/source/04_networks/04_networks.ipynb
deleted file mode 100644
index 6ebd8c5e..00000000
--- a/assignments/source/04_networks/04_networks.ipynb
+++ /dev/null
@@ -1,1437 +0,0 @@
-{
-  "cells": [
-    {
-      "cell_type": "markdown",
-      "id": "d99dcffa",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "# Module 3: Networks - Neural Network Architectures\n",
-        "\n",
-        "Welcome to the Networks module! This is where we compose layers into complete neural network architectures.\n",
-        "\n",
-        "## Learning Goals\n",
-        "- Understand networks as function composition: `f(x) = layer_n(...layer_2(layer_1(x)))`\n",
-        "- Build common architectures (MLP, CNN) from layers\n",
-        "- Visualize network structure and data flow\n",
-        "- See how architecture affects capability\n",
-        "- Master forward pass inference (no training yet!)\n",
-        "\n",
-        "## Build \u2192 Use \u2192 Understand\n",
-        "1. **Build**: Compose layers into complete networks\n",
-        "2. **Use**: Create different architectures and run inference\n",
-        "3. **Understand**: How architecture design affects network behavior\n",
-        "\n",
-        "## Module Dependencies\n",
-        "This module builds on previous modules:\n",
-        "- **tensor** \u2192 **activations** \u2192 **layers** \u2192 **networks**\n",
-        "- Clean composition: math functions \u2192 building blocks \u2192 complete systems"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "b9dc1bb2",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## \ud83d\udce6 Where This Code Lives in the Final Package\n",
-        "\n",
-        "**Learning Side:** You work in `modules/networks/networks_dev.py`  \n",
-        "**Building Side:** Code exports to `tinytorch.core.networks`\n",
-        "\n",
-        "```python\n",
-        "# Final package structure:\n",
-        "from tinytorch.core.networks import Sequential, MLP\n",
-        "from tinytorch.core.layers import Dense, Conv2D\n",
-        "from tinytorch.core.activations import ReLU, Sigmoid, Tanh\n",
-        "from tinytorch.core.tensor import Tensor\n",
-        "```\n",
-        "\n",
-        "**Why this matters:**\n",
-        "- **Learning:** Focused modules for deep understanding\n",
-        "- **Production:** Proper organization like PyTorch's `torch.nn`\n",
-        "- **Consistency:** All network architectures live together in `core.networks`"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "d716e1fb",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "#| default_exp core.networks\n",
-        "\n",
-        "# Setup and imports\n",
-        "import numpy as np\n",
-        "import sys\n",
-        "from typing import List, Union, Optional, Callable\n",
-        "import matplotlib.pyplot as plt\n",
-        "import matplotlib.patches as patches\n",
-        "from matplotlib.patches import FancyBboxPatch, ConnectionPatch\n",
-        "import seaborn as sns\n",
-        "\n",
-        "# Import all the building blocks we need\n",
-        "from tinytorch.core.tensor import Tensor\n",
-        "from tinytorch.core.layers import Dense\n",
-        "from tinytorch.core.activations import ReLU, Sigmoid, Tanh, Softmax\n",
-        "\n",
-        "print(\"\ud83d\udd25 TinyTorch Networks Module\")\n",
-        "print(f\"NumPy version: {np.__version__}\")\n",
-        "print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
-        "print(\"Ready to build neural network architectures!\")"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "0a4ba348",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "import numpy as np\n",
-        "import sys\n",
-        "from typing import List, Union, Optional, Callable\n",
-        "import matplotlib.pyplot as plt\n",
-        "import matplotlib.patches as patches\n",
-        "from matplotlib.patches import FancyBboxPatch, ConnectionPatch\n",
-        "import seaborn as sns\n",
-        "\n",
-        "# Import our building blocks\n",
-        "from tinytorch.core.tensor import Tensor\n",
-        "from tinytorch.core.layers import Dense\n",
-        "from tinytorch.core.activations import ReLU, Sigmoid, Tanh"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "802e174e",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "def _should_show_plots():\n",
-        "    \"\"\"Check if we should show plots (disable during testing)\"\"\"\n",
-        "    return 'pytest' not in sys.modules and 'test' not in sys.argv"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "bad0d49f",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 1: What is a Network?\n",
-        "\n",
-        "### Definition\n",
-        "A **network** is a composition of layers that transforms input data into output predictions. Think of it as a pipeline of transformations:\n",
-        "\n",
-        "```\n",
-        "Input \u2192 Layer1 \u2192 Layer2 \u2192 Layer3 \u2192 Output\n",
-        "```\n",
-        "\n",
-        "### Why Networks Matter\n",
-        "- **Function composition**: Complex behavior from simple building blocks\n",
-        "- **Learnable parameters**: Each layer has weights that can be learned\n",
-        "- **Architecture design**: Different layouts solve different problems\n",
-        "- **Real-world applications**: Classification, regression, generation, etc.\n",
-        "\n",
-        "### The Fundamental Insight\n",
-        "**Neural networks are just function composition!**\n",
-        "- Each layer is a function: `f_i(x)`\n",
-        "- The network is: `f(x) = f_n(...f_2(f_1(x)))`\n",
-        "- Complex behavior emerges from simple building blocks\n",
-        "\n",
-        "### Real-World Examples\n",
-        "- **MLP (Multi-Layer Perceptron)**: Classic feedforward network\n",
-        "- **CNN (Convolutional Neural Network)**: For image processing\n",
-        "- **RNN (Recurrent Neural Network)**: For sequential data\n",
-        "- **Transformer**: For attention-based processing\n",
-        "\n",
-        "### Visual Intuition\n",
-        "```\n",
-        "Input: [1, 2, 3] (3 features)\n",
-        "Layer1: [1.4, 2.8] (linear transformation)\n",
-        "Layer2: [1.4, 2.8] (nonlinearity)\n",
-        "Layer3: [0.7] (final prediction)\n",
-        "```\n",
-        "\n",
-        "### The Math Behind It\n",
-        "For a network with layers `f_1, f_2, ..., f_n`:\n",
-        "```\n",
-        "f(x) = f_n(f_{n-1}(...f_2(f_1(x))))\n",
-        "```\n",
-        "\n",
-        "Each layer transforms the data, and the final output is the composition of all these transformations.\n",
-        "\n",
-        "Let's start by building the most fundamental network: **Sequential**."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "8ba92c7d",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "class Sequential:\n",
-        "    \"\"\"\n",
-        "    Sequential Network: Composes layers in sequence\n",
-        "    \n",
-        "    The most fundamental network architecture.\n",
-        "    Applies layers in order: f(x) = layer_n(...layer_2(layer_1(x)))\n",
-        "    \n",
-        "    Args:\n",
-        "        layers: List of layers to compose\n",
-        "        \n",
-        "    TODO: Implement the Sequential network with forward pass.\n",
-        "    \n",
-        "    APPROACH:\n",
-        "    1. Store the list of layers as an instance variable\n",
-        "    2. Implement forward pass that applies each layer in sequence\n",
-        "    3. Make the network callable for easy use\n",
-        "    \n",
-        "    EXAMPLE:\n",
-        "    network = Sequential([\n",
-        "        Dense(3, 4),\n",
-        "        ReLU(),\n",
-        "        Dense(4, 2),\n",
-        "        Sigmoid()\n",
-        "    ])\n",
-        "    x = Tensor([[1, 2, 3]])\n",
-        "    y = network(x)  # Forward pass through all layers\n",
-        "    \n",
-        "    HINTS:\n",
-        "    - Store layers in self.layers\n",
-        "    - Use a for loop to apply each layer in order\n",
-        "    - Each layer's output becomes the next layer's input\n",
-        "    - Return the final output\n",
-        "    \"\"\"\n",
-        "    \n",
-        "    def __init__(self, layers: List):\n",
-        "        \"\"\"\n",
-        "        Initialize Sequential network with layers.\n",
-        "        \n",
-        "        Args:\n",
-        "            layers: List of layers to compose in order\n",
-        "            \n",
-        "        TODO: Store the layers and implement forward pass\n",
-        "        \n",
-        "        STEP-BY-STEP:\n",
-        "        1. Store the layers list as self.layers\n",
-        "        2. This creates the network architecture\n",
-        "        \n",
-        "        EXAMPLE:\n",
-        "        Sequential([Dense(3,4), ReLU(), Dense(4,2)])\n",
-        "        creates a 3-layer network: Dense \u2192 ReLU \u2192 Dense\n",
-        "        \"\"\"\n",
-        "        raise NotImplementedError(\"Student implementation required\")\n",
-        "    \n",
-        "    def forward(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"\n",
-        "        Forward pass through all layers in sequence.\n",
-        "        \n",
-        "        Args:\n",
-        "            x: Input tensor\n",
-        "            \n",
-        "        Returns:\n",
-        "            Output tensor after passing through all layers\n",
-        "            \n",
-        "        TODO: Implement sequential forward pass through all layers\n",
-        "        \n",
-        "        STEP-BY-STEP:\n",
-        "        1. Start with the input tensor: current = x\n",
-        "        2. Loop through each layer in self.layers\n",
-        "        3. Apply each layer: current = layer(current)\n",
-        "        4. Return the final output\n",
-        "        \n",
-        "        EXAMPLE:\n",
-        "        Input: Tensor([[1, 2, 3]])\n",
-        "        Layer1 (Dense): Tensor([[1.4, 2.8]])\n",
-        "        Layer2 (ReLU): Tensor([[1.4, 2.8]])\n",
-        "        Layer3 (Dense): Tensor([[0.7]])\n",
-        "        Output: Tensor([[0.7]])\n",
-        "        \n",
-        "        HINTS:\n",
-        "        - Use a for loop: for layer in self.layers:\n",
-        "        - Apply each layer: current = layer(current)\n",
-        "        - The output of one layer becomes input to the next\n",
-        "        - Return the final result\n",
-        "        \"\"\"\n",
-        "        raise NotImplementedError(\"Student implementation required\")\n",
-        "    \n",
-        "    def __call__(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"Make network callable: network(x) same as network.forward(x)\"\"\"\n",
-        "        return self.forward(x)"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "b53463f1",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "class Sequential:\n",
-        "    \"\"\"\n",
-        "    Sequential Network: Composes layers in sequence\n",
-        "    \n",
-        "    The most fundamental network architecture.\n",
-        "    Applies layers in order: f(x) = layer_n(...layer_2(layer_1(x)))\n",
-        "    \"\"\"\n",
-        "    \n",
-        "    def __init__(self, layers: List):\n",
-        "        \"\"\"Initialize Sequential network with layers.\"\"\"\n",
-        "        self.layers = layers\n",
-        "    \n",
-        "    def forward(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"Forward pass through all layers in sequence.\"\"\"\n",
-        "        # Apply each layer in order\n",
-        "        for layer in self.layers:\n",
-        "            x = layer(x)\n",
-        "        return x\n",
-        "    \n",
-        "    def __call__(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"Make network callable: network(x) same as network.forward(x)\"\"\"\n",
-        "        return self.forward(x)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "3eab5240",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "### \ud83e\uddea Test Your Sequential Network"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "0982dae7",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test the Sequential network\n",
-        "print(\"Testing Sequential network...\")\n",
-        "\n",
-        "try:\n",
-        "    # Create a simple 2-layer network: 3 \u2192 4 \u2192 2\n",
-        "    network = Sequential([\n",
-        "        Dense(input_size=3, output_size=4),\n",
-        "        ReLU(),\n",
-        "        Dense(input_size=4, output_size=2),\n",
-        "        Sigmoid()\n",
-        "    ])\n",
-        "    \n",
-        "    print(f\"\u2705 Network created with {len(network.layers)} layers\")\n",
-        "    \n",
-        "    # Test with sample data\n",
-        "    x = Tensor([[1.0, 2.0, 3.0]])\n",
-        "    print(f\"\u2705 Input: {x}\")\n",
-        "    \n",
-        "    # Forward pass\n",
-        "    y = network(x)\n",
-        "    print(f\"\u2705 Output: {y}\")\n",
-        "    print(f\"\u2705 Output shape: {y.shape}\")\n",
-        "    \n",
-        "    # Verify the network works\n",
-        "    assert y.shape == (1, 2), f\"\u274c Expected shape (1, 2), got {y.shape}\"\n",
-        "    assert np.all(y.data >= 0) and np.all(y.data <= 1), \"\u274c Sigmoid output should be between 0 and 1\"\n",
-        "    print(\"\ud83c\udf89 Sequential network works!\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")\n",
-        "    print(\"Make sure to implement the Sequential network above!\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "43a55700",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 2: Understanding Network Architecture\n",
-        "\n",
-        "Now let's explore how different network architectures affect the network's capabilities.\n",
-        "\n",
-        "### What is Network Architecture?\n",
-        "**Architecture** refers to how layers are arranged and connected. It determines:\n",
-        "- **Capacity**: How complex patterns the network can learn\n",
-        "- **Efficiency**: How many parameters and computations needed\n",
-        "- **Specialization**: What types of problems it's good at\n",
-        "\n",
-        "### Common Architectures\n",
-        "\n",
-        "#### 1. **MLP (Multi-Layer Perceptron)**\n",
-        "```\n",
-        "Input \u2192 Dense \u2192 ReLU \u2192 Dense \u2192 ReLU \u2192 Dense \u2192 Output\n",
-        "```\n",
-        "- **Use case**: General-purpose learning\n",
-        "- **Strengths**: Universal approximation, simple to understand\n",
-        "- **Weaknesses**: Doesn't exploit spatial structure\n",
-        "\n",
-        "#### 2. **CNN (Convolutional Neural Network)**\n",
-        "```\n",
-        "Input \u2192 Conv2D \u2192 ReLU \u2192 Conv2D \u2192 ReLU \u2192 Dense \u2192 Output\n",
-        "```\n",
-        "- **Use case**: Image processing, spatial data\n",
-        "- **Strengths**: Parameter sharing, translation invariance\n",
-        "- **Weaknesses**: Fixed spatial structure\n",
-        "\n",
-        "#### 3. **Deep Network**\n",
-        "```\n",
-        "Input \u2192 Dense \u2192 ReLU \u2192 Dense \u2192 ReLU \u2192 Dense \u2192 ReLU \u2192 Dense \u2192 Output\n",
-        "```\n",
-        "- **Use case**: Complex pattern recognition\n",
-        "- **Strengths**: High capacity, can learn complex functions\n",
-        "- **Weaknesses**: More parameters, harder to train\n",
-        "\n",
-        "Let's build some common architectures!"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "37c8e633",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "def create_mlp(input_size: int, hidden_sizes: List[int], output_size: int, \n",
-        "               activation=ReLU, output_activation=Sigmoid) -> Sequential:\n",
-        "    \"\"\"\n",
-        "    Create a Multi-Layer Perceptron (MLP) network.\n",
-        "    \n",
-        "    Args:\n",
-        "        input_size: Number of input features\n",
-        "        hidden_sizes: List of hidden layer sizes\n",
-        "        output_size: Number of output features\n",
-        "        activation: Activation function for hidden layers (default: ReLU)\n",
-        "        output_activation: Activation function for output layer (default: Sigmoid)\n",
-        "        \n",
-        "    Returns:\n",
-        "        Sequential network with MLP architecture\n",
-        "        \n",
-        "    TODO: Implement MLP creation with alternating Dense and activation layers.\n",
-        "    \n",
-        "    APPROACH:\n",
-        "    1. Start with an empty list of layers\n",
-        "    2. Add the first Dense layer: input_size \u2192 first hidden size\n",
-        "    3. For each hidden layer:\n",
-        "       - Add activation function\n",
-        "       - Add Dense layer connecting to next hidden size\n",
-        "    4. Add final activation function\n",
-        "    5. Add final Dense layer: last hidden size \u2192 output_size\n",
-        "    6. Add output activation function\n",
-        "    7. Return Sequential(layers)\n",
-        "    \n",
-        "    EXAMPLE:\n",
-        "    create_mlp(3, [4, 2], 1) creates:\n",
-        "    Dense(3\u21924) \u2192 ReLU \u2192 Dense(4\u21922) \u2192 ReLU \u2192 Dense(2\u21921) \u2192 Sigmoid\n",
-        "    \n",
-        "    HINTS:\n",
-        "    - Start with layers = []\n",
-        "    - Add Dense layers with appropriate input/output sizes\n",
-        "    - Add activation functions between Dense layers\n",
-        "    - Don't forget the final output activation\n",
-        "    \"\"\"\n",
-        "    raise NotImplementedError(\"Student implementation required\")"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "f757230b",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "def create_mlp(input_size: int, hidden_sizes: List[int], output_size: int, \n",
-        "               activation=ReLU, output_activation=Sigmoid) -> Sequential:\n",
-        "    \"\"\"Create a Multi-Layer Perceptron (MLP) network.\"\"\"\n",
-        "    layers = []\n",
-        "    \n",
-        "    # Add first layer\n",
-        "    current_size = input_size\n",
-        "    for hidden_size in hidden_sizes:\n",
-        "        layers.append(Dense(input_size=current_size, output_size=hidden_size))\n",
-        "        layers.append(activation())\n",
-        "        current_size = hidden_size\n",
-        "    \n",
-        "    # Add output layer\n",
-        "    layers.append(Dense(input_size=current_size, output_size=output_size))\n",
-        "    layers.append(output_activation())\n",
-        "    \n",
-        "    return Sequential(layers)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "b06c7a4f",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "### \ud83e\uddea Test Your MLP Creation"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "2aae0ee1",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test MLP creation\n",
-        "print(\"Testing MLP creation...\")\n",
-        "\n",
-        "try:\n",
-        "    # Create different MLP architectures\n",
-        "    mlp1 = create_mlp(input_size=3, hidden_sizes=[4], output_size=1)\n",
-        "    mlp2 = create_mlp(input_size=5, hidden_sizes=[8, 4], output_size=2)\n",
-        "    mlp3 = create_mlp(input_size=2, hidden_sizes=[10, 6, 3], output_size=1, activation=Tanh)\n",
-        "    \n",
-        "    print(f\"\u2705 MLP1: {len(mlp1.layers)} layers\")\n",
-        "    print(f\"\u2705 MLP2: {len(mlp2.layers)} layers\")\n",
-        "    print(f\"\u2705 MLP3: {len(mlp3.layers)} layers\")\n",
-        "    \n",
-        "    # Test forward pass\n",
-        "    x = Tensor([[1.0, 2.0, 3.0]])\n",
-        "    y1 = mlp1(x)\n",
-        "    print(f\"\u2705 MLP1 output: {y1}\")\n",
-        "    \n",
-        "    x2 = Tensor([[1.0, 2.0, 3.0, 4.0, 5.0]])\n",
-        "    y2 = mlp2(x2)\n",
-        "    print(f\"\u2705 MLP2 output: {y2}\")\n",
-        "    \n",
-        "    print(\"\ud83c\udf89 MLP creation works!\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")\n",
-        "    print(\"Make sure to implement create_mlp above!\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "21e27833",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 3: Network Visualization and Analysis\n",
-        "\n",
-        "Let's create tools to visualize and analyze network architectures. This helps us understand what our networks are doing.\n",
-        "\n",
-        "### Why Visualization Matters\n",
-        "- **Architecture understanding**: See how data flows through the network\n",
-        "- **Debugging**: Identify bottlenecks and issues\n",
-        "- **Design**: Compare different architectures\n",
-        "- **Communication**: Explain networks to others\n",
-        "\n",
-        "### What We'll Build\n",
-        "1. **Architecture visualization**: Show layer connections\n",
-        "2. **Data flow visualization**: See how data transforms\n",
-        "3. **Network comparison**: Compare different architectures\n",
-        "4. **Behavior analysis**: Understand network capabilities"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "6b7b9fe8",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "def visualize_network_architecture(network: Sequential, title: str = \"Network Architecture\"):\n",
-        "    \"\"\"\n",
-        "    Visualize the architecture of a Sequential network.\n",
-        "    \n",
-        "    Args:\n",
-        "        network: Sequential network to visualize\n",
-        "        title: Title for the plot\n",
-        "        \n",
-        "    TODO: Create a visualization showing the network structure.\n",
-        "    \n",
-        "    APPROACH:\n",
-        "    1. Create a matplotlib figure\n",
-        "    2. For each layer, draw a box showing its type and size\n",
-        "    3. Connect the boxes with arrows showing data flow\n",
-        "    4. Add labels and formatting\n",
-        "    \n",
-        "    EXAMPLE:\n",
-        "    Input \u2192 Dense(3\u21924) \u2192 ReLU \u2192 Dense(4\u21922) \u2192 Sigmoid \u2192 Output\n",
-        "    \n",
-        "    HINTS:\n",
-        "    - Use plt.subplots() to create the figure\n",
-        "    - Use plt.text() to add layer labels\n",
-        "    - Use plt.arrow() to show connections\n",
-        "    - Add proper spacing and formatting\n",
-        "    \"\"\"\n",
-        "    raise NotImplementedError(\"Student implementation required\")"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "b0cd896c",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "def visualize_network_architecture(network: Sequential, title: str = \"Network Architecture\"):\n",
-        "    \"\"\"Visualize the architecture of a Sequential network.\"\"\"\n",
-        "    if not _should_show_plots():\n",
-        "        print(\"\ud83d\udcca Visualization disabled during testing\")\n",
-        "        return\n",
-        "    \n",
-        "    fig, ax = plt.subplots(1, 1, figsize=(12, 6))\n",
-        "    \n",
-        "    # Calculate positions\n",
-        "    num_layers = len(network.layers)\n",
-        "    x_positions = np.linspace(0, 10, num_layers + 2)\n",
-        "    \n",
-        "    # Draw input\n",
-        "    ax.text(x_positions[0], 0, 'Input', ha='center', va='center', \n",
-        "            bbox=dict(boxstyle='round,pad=0.3', facecolor='lightblue'))\n",
-        "    \n",
-        "    # Draw layers\n",
-        "    for i, layer in enumerate(network.layers):\n",
-        "        layer_name = type(layer).__name__\n",
-        "        ax.text(x_positions[i+1], 0, layer_name, ha='center', va='center',\n",
-        "                bbox=dict(boxstyle='round,pad=0.3', facecolor='lightgreen'))\n",
-        "        \n",
-        "        # Draw arrow\n",
-        "        ax.arrow(x_positions[i], 0, 0.8, 0, head_width=0.1, head_length=0.1, \n",
-        "                fc='black', ec='black')\n",
-        "    \n",
-        "    # Draw output\n",
-        "    ax.text(x_positions[-1], 0, 'Output', ha='center', va='center',\n",
-        "            bbox=dict(boxstyle='round,pad=0.3', facecolor='lightcoral'))\n",
-        "    \n",
-        "    ax.set_xlim(-0.5, 10.5)\n",
-        "    ax.set_ylim(-0.5, 0.5)\n",
-        "    ax.set_title(title)\n",
-        "    ax.axis('off')\n",
-        "    plt.show()"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "8de4ec12",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "### \ud83e\uddea Test Network Visualization"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "3a276cd3",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test network visualization\n",
-        "print(\"Testing network visualization...\")\n",
-        "\n",
-        "try:\n",
-        "    # Create a test network\n",
-        "    test_network = Sequential([\n",
-        "        Dense(input_size=3, output_size=4),\n",
-        "        ReLU(),\n",
-        "        Dense(input_size=4, output_size=2),\n",
-        "        Sigmoid()\n",
-        "    ])\n",
-        "    \n",
-        "    # Visualize the network\n",
-        "    if _should_show_plots():\n",
-        "        visualize_network_architecture(test_network, \"Test Network Architecture\")\n",
-        "        print(\"\u2705 Network visualization created!\")\n",
-        "    else:\n",
-        "        print(\"\u2705 Network visualization skipped during testing\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")\n",
-        "    print(\"Make sure to implement visualize_network_architecture above!\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "7c2c7688",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 4: Data Flow Analysis\n",
-        "\n",
-        "Let's create tools to analyze how data flows through the network. This helps us understand what each layer is doing.\n",
-        "\n",
-        "### Why Data Flow Analysis Matters\n",
-        "- **Debugging**: See where data gets corrupted\n",
-        "- **Optimization**: Identify bottlenecks\n",
-        "- **Understanding**: Learn what each layer learns\n",
-        "- **Design**: Choose appropriate layer sizes"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "0a24b85d",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "def visualize_data_flow(network: Sequential, input_data: Tensor, title: str = \"Data Flow Through Network\"):\n",
-        "    \"\"\"\n",
-        "    Visualize how data flows through the network.\n",
-        "    \n",
-        "    Args:\n",
-        "        network: Sequential network to analyze\n",
-        "        input_data: Input tensor to trace through the network\n",
-        "        title: Title for the plot\n",
-        "        \n",
-        "    TODO: Create a visualization showing how data transforms through each layer.\n",
-        "    \n",
-        "    APPROACH:\n",
-        "    1. Trace the input through each layer\n",
-        "    2. Record the output of each layer\n",
-        "    3. Create a visualization showing the transformations\n",
-        "    4. Add statistics (mean, std, range) for each layer\n",
-        "    \n",
-        "    EXAMPLE:\n",
-        "    Input: [1, 2, 3] \u2192 Layer1: [1.4, 2.8] \u2192 Layer2: [1.4, 2.8] \u2192 Output: [0.7]\n",
-        "    \n",
-        "    HINTS:\n",
-        "    - Use a for loop to apply each layer\n",
-        "    - Store intermediate outputs\n",
-        "    - Use plt.subplot() to create multiple subplots\n",
-        "    - Show statistics for each layer output\n",
-        "    \"\"\"\n",
-        "    raise NotImplementedError(\"Student implementation required\")"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "b1c743f0",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "def visualize_data_flow(network: Sequential, input_data: Tensor, title: str = \"Data Flow Through Network\"):\n",
-        "    \"\"\"Visualize how data flows through the network.\"\"\"\n",
-        "    if not _should_show_plots():\n",
-        "        print(\"\ud83d\udcca Visualization disabled during testing\")\n",
-        "        return\n",
-        "    \n",
-        "    # Trace data through network\n",
-        "    current_data = input_data\n",
-        "    layer_outputs = [current_data.data.flatten()]\n",
-        "    layer_names = ['Input']\n",
-        "    \n",
-        "    for layer in network.layers:\n",
-        "        current_data = layer(current_data)\n",
-        "        layer_outputs.append(current_data.data.flatten())\n",
-        "        layer_names.append(type(layer).__name__)\n",
-        "    \n",
-        "    # Create visualization\n",
-        "    fig, axes = plt.subplots(2, len(layer_outputs), figsize=(15, 8))\n",
-        "    \n",
-        "    for i, (output, name) in enumerate(zip(layer_outputs, layer_names)):\n",
-        "        # Histogram\n",
-        "        axes[0, i].hist(output, bins=20, alpha=0.7)\n",
-        "        axes[0, i].set_title(f'{name}\\nShape: {output.shape}')\n",
-        "        axes[0, i].set_xlabel('Value')\n",
-        "        axes[0, i].set_ylabel('Frequency')\n",
-        "        \n",
-        "        # Statistics\n",
-        "        stats_text = f'Mean: {np.mean(output):.3f}\\nStd: {np.std(output):.3f}\\nRange: [{np.min(output):.3f}, {np.max(output):.3f}]'\n",
-        "        axes[1, i].text(0.1, 0.5, stats_text, transform=axes[1, i].transAxes, \n",
-        "                        verticalalignment='center', fontsize=10)\n",
-        "        axes[1, i].set_title(f'{name} Statistics')\n",
-        "        axes[1, i].axis('off')\n",
-        "    \n",
-        "    plt.suptitle(title)\n",
-        "    plt.tight_layout()\n",
-        "    plt.show()"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "c86120df",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "### \ud83e\uddea Test Data Flow Visualization"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "a53e5f96",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test data flow visualization\n",
-        "print(\"Testing data flow visualization...\")\n",
-        "\n",
-        "try:\n",
-        "    # Create a test network\n",
-        "    test_network = Sequential([\n",
-        "        Dense(input_size=3, output_size=4),\n",
-        "        ReLU(),\n",
-        "        Dense(input_size=4, output_size=2),\n",
-        "        Sigmoid()\n",
-        "    ])\n",
-        "    \n",
-        "    # Test input\n",
-        "    test_input = Tensor([[1.0, 2.0, 3.0]])\n",
-        "    \n",
-        "    # Visualize data flow\n",
-        "    if _should_show_plots():\n",
-        "        visualize_data_flow(test_network, test_input, \"Test Network Data Flow\")\n",
-        "        print(\"\u2705 Data flow visualization created!\")\n",
-        "    else:\n",
-        "        print(\"\u2705 Data flow visualization skipped during testing\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")\n",
-        "    print(\"Make sure to implement visualize_data_flow above!\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "8e4ae578",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 5: Network Comparison and Analysis\n",
-        "\n",
-        "Let's create tools to compare different network architectures and understand their capabilities.\n",
-        "\n",
-        "### Why Network Comparison Matters\n",
-        "- **Architecture selection**: Choose the right network for your problem\n",
-        "- **Performance analysis**: Understand trade-offs between different designs\n",
-        "- **Design insights**: Learn what makes networks effective\n",
-        "- **Research**: Compare new architectures to baselines"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "b5566cb1",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "def compare_networks(networks: List[Sequential], network_names: List[str], \n",
-        "                    input_data: Tensor, title: str = \"Network Comparison\"):\n",
-        "    \"\"\"\n",
-        "    Compare multiple networks on the same input.\n",
-        "    \n",
-        "    Args:\n",
-        "        networks: List of Sequential networks to compare\n",
-        "        network_names: Names for each network\n",
-        "        input_data: Input tensor to test all networks\n",
-        "        title: Title for the plot\n",
-        "        \n",
-        "    TODO: Create a comparison visualization showing how different networks process the same input.\n",
-        "    \n",
-        "    APPROACH:\n",
-        "    1. Run the same input through each network\n",
-        "    2. Collect the outputs and intermediate results\n",
-        "    3. Create a visualization comparing the results\n",
-        "    4. Show statistics and differences\n",
-        "    \n",
-        "    EXAMPLE:\n",
-        "    Compare MLP vs Deep Network vs Wide Network on same input\n",
-        "    \n",
-        "    HINTS:\n",
-        "    - Use a for loop to test each network\n",
-        "    - Store outputs and any relevant statistics\n",
-        "    - Use plt.subplot() to create comparison plots\n",
-        "    - Show both outputs and intermediate layer results\n",
-        "    \"\"\"\n",
-        "    raise NotImplementedError(\"Student implementation required\")"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "b0949858",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "def compare_networks(networks: List[Sequential], network_names: List[str], \n",
-        "                    input_data: Tensor, title: str = \"Network Comparison\"):\n",
-        "    \"\"\"Compare multiple networks on the same input.\"\"\"\n",
-        "    if not _should_show_plots():\n",
-        "        print(\"\ud83d\udcca Visualization disabled during testing\")\n",
-        "        return\n",
-        "    \n",
-        "    # Test all networks\n",
-        "    outputs = []\n",
-        "    for network in networks:\n",
-        "        output = network(input_data)\n",
-        "        outputs.append(output.data.flatten())\n",
-        "    \n",
-        "    # Create comparison plot\n",
-        "    fig, axes = plt.subplots(2, len(networks), figsize=(15, 8))\n",
-        "    \n",
-        "    for i, (output, name) in enumerate(zip(outputs, network_names)):\n",
-        "        # Output distribution\n",
-        "        axes[0, i].hist(output, bins=20, alpha=0.7)\n",
-        "        axes[0, i].set_title(f'{name}\\nOutput Distribution')\n",
-        "        axes[0, i].set_xlabel('Value')\n",
-        "        axes[0, i].set_ylabel('Frequency')\n",
-        "        \n",
-        "        # Statistics\n",
-        "        stats_text = f'Mean: {np.mean(output):.3f}\\nStd: {np.std(output):.3f}\\nRange: [{np.min(output):.3f}, {np.max(output):.3f}]\\nSize: {len(output)}'\n",
-        "        axes[1, i].text(0.1, 0.5, stats_text, transform=axes[1, i].transAxes, \n",
-        "                        verticalalignment='center', fontsize=10)\n",
-        "        axes[1, i].set_title(f'{name} Statistics')\n",
-        "        axes[1, i].axis('off')\n",
-        "    \n",
-        "    plt.suptitle(title)\n",
-        "    plt.tight_layout()\n",
-        "    plt.show()"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "c9e720d5",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "### \ud83e\uddea Test Network Comparison"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "b27869da",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test network comparison\n",
-        "print(\"Testing network comparison...\")\n",
-        "\n",
-        "try:\n",
-        "    # Create different networks\n",
-        "    network1 = create_mlp(input_size=3, hidden_sizes=[4], output_size=1)\n",
-        "    network2 = create_mlp(input_size=3, hidden_sizes=[8, 4], output_size=1)\n",
-        "    network3 = create_mlp(input_size=3, hidden_sizes=[2], output_size=1, activation=Tanh)\n",
-        "    \n",
-        "    networks = [network1, network2, network3]\n",
-        "    names = [\"Small MLP\", \"Deep MLP\", \"Tanh MLP\"]\n",
-        "    \n",
-        "    # Test input\n",
-        "    test_input = Tensor([[1.0, 2.0, 3.0]])\n",
-        "    \n",
-        "    # Compare networks\n",
-        "    if _should_show_plots():\n",
-        "        compare_networks(networks, names, test_input, \"Network Architecture Comparison\")\n",
-        "        print(\"\u2705 Network comparison created!\")\n",
-        "    else:\n",
-        "        print(\"\u2705 Network comparison skipped during testing\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")\n",
-        "    print(\"Make sure to implement compare_networks above!\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "6bde2a55",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 6: Practical Network Architectures\n",
-        "\n",
-        "Now let's create some practical network architectures for common machine learning tasks.\n",
-        "\n",
-        "### Common Network Types\n",
-        "\n",
-        "#### 1. **Classification Networks**\n",
-        "- **Binary classification**: Output single probability\n",
-        "- **Multi-class classification**: Output probability distribution\n",
-        "- **Use cases**: Image classification, spam detection, sentiment analysis\n",
-        "\n",
-        "#### 2. **Regression Networks**\n",
-        "- **Single output**: Predict continuous value\n",
-        "- **Multiple outputs**: Predict multiple values\n",
-        "- **Use cases**: Price prediction, temperature forecasting, demand estimation\n",
-        "\n",
-        "#### 3. **Feature Extraction Networks**\n",
-        "- **Encoder networks**: Compress data into features\n",
-        "- **Use cases**: Dimensionality reduction, feature learning, representation learning"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "de53dfeb",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "def create_classification_network(input_size: int, num_classes: int, \n",
-        "                                hidden_sizes: List[int] = None) -> Sequential:\n",
-        "    \"\"\"\n",
-        "    Create a network for classification tasks.\n",
-        "    \n",
-        "    Args:\n",
-        "        input_size: Number of input features\n",
-        "        num_classes: Number of output classes\n",
-        "        hidden_sizes: List of hidden layer sizes (default: [input_size * 2])\n",
-        "        \n",
-        "    Returns:\n",
-        "        Sequential network for classification\n",
-        "        \n",
-        "    TODO: Implement classification network creation.\n",
-        "    \n",
-        "    APPROACH:\n",
-        "    1. Use default hidden sizes if none provided\n",
-        "    2. Create MLP with appropriate architecture\n",
-        "    3. Use Sigmoid for binary classification (num_classes=1)\n",
-        "    4. Use appropriate activation for multi-class\n",
-        "    \n",
-        "    EXAMPLE:\n",
-        "    create_classification_network(10, 3) creates:\n",
-        "    Dense(10\u219220) \u2192 ReLU \u2192 Dense(20\u21923) \u2192 Sigmoid\n",
-        "    \n",
-        "    HINTS:\n",
-        "    - Use create_mlp() function\n",
-        "    - Choose appropriate output activation based on num_classes\n",
-        "    - For binary classification (num_classes=1), use Sigmoid\n",
-        "    - For multi-class, you could use Sigmoid or no activation\n",
-        "    \"\"\"\n",
-        "    raise NotImplementedError(\"Student implementation required\")"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "977a85df",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "def create_classification_network(input_size: int, num_classes: int, \n",
-        "                                hidden_sizes: List[int] = None) -> Sequential:\n",
-        "    \"\"\"Create a network for classification tasks.\"\"\"\n",
-        "    if hidden_sizes is None:\n",
-        "        hidden_sizes = [input_size // 2]  # Use input_size // 2 as default\n",
-        "    \n",
-        "    # Choose appropriate output activation\n",
-        "    output_activation = Sigmoid if num_classes == 1 else Softmax\n",
-        "    \n",
-        "    return create_mlp(input_size, hidden_sizes, num_classes, \n",
-        "                     activation=ReLU, output_activation=output_activation)"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "9e84a52b",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "def create_regression_network(input_size: int, output_size: int = 1,\n",
-        "                             hidden_sizes: List[int] = None) -> Sequential:\n",
-        "    \"\"\"\n",
-        "    Create a network for regression tasks.\n",
-        "    \n",
-        "    Args:\n",
-        "        input_size: Number of input features\n",
-        "        output_size: Number of output values (default: 1)\n",
-        "        hidden_sizes: List of hidden layer sizes (default: [input_size * 2])\n",
-        "        \n",
-        "    Returns:\n",
-        "        Sequential network for regression\n",
-        "        \n",
-        "    TODO: Implement regression network creation.\n",
-        "    \n",
-        "    APPROACH:\n",
-        "    1. Use default hidden sizes if none provided\n",
-        "    2. Create MLP with appropriate architecture\n",
-        "    3. Use no activation on output layer (linear output)\n",
-        "    \n",
-        "    EXAMPLE:\n",
-        "    create_regression_network(5, 1) creates:\n",
-        "    Dense(5\u219210) \u2192 ReLU \u2192 Dense(10\u21921) (no activation)\n",
-        "    \n",
-        "    HINTS:\n",
-        "    - Use create_mlp() but with no output activation\n",
-        "    - For regression, we want linear outputs (no activation)\n",
-        "    - You can pass None or identity function as output_activation\n",
-        "    \"\"\"\n",
-        "    raise NotImplementedError(\"Student implementation required\")"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "6c8784d3",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "def create_regression_network(input_size: int, output_size: int = 1,\n",
-        "                             hidden_sizes: List[int] = None) -> Sequential:\n",
-        "    \"\"\"Create a network for regression tasks.\"\"\"\n",
-        "    if hidden_sizes is None:\n",
-        "        hidden_sizes = [input_size // 2]  # Use input_size // 2 as default\n",
-        "    \n",
-        "    # Create MLP with Tanh output activation for regression\n",
-        "    return create_mlp(input_size, hidden_sizes, output_size, \n",
-        "                     activation=ReLU, output_activation=Tanh)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "5535e427",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "### \ud83e\uddea Test Practical Networks"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "741cf65e",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test practical networks\n",
-        "print(\"Testing practical networks...\")\n",
-        "\n",
-        "try:\n",
-        "    # Test classification network\n",
-        "    class_net = create_classification_network(input_size=5, num_classes=1)\n",
-        "    x_class = Tensor([[1.0, 2.0, 3.0, 4.0, 5.0]])\n",
-        "    y_class = class_net(x_class)\n",
-        "    print(f\"\u2705 Classification output: {y_class}\")\n",
-        "    print(f\"\u2705 Output range: [{np.min(y_class.data):.3f}, {np.max(y_class.data):.3f}]\")\n",
-        "    \n",
-        "    # Test regression network\n",
-        "    reg_net = create_regression_network(input_size=3, output_size=1)\n",
-        "    x_reg = Tensor([[1.0, 2.0, 3.0]])\n",
-        "    y_reg = reg_net(x_reg)\n",
-        "    print(f\"\u2705 Regression output: {y_reg}\")\n",
-        "    print(f\"\u2705 Output range: [{np.min(y_reg.data):.3f}, {np.max(y_reg.data):.3f}]\")\n",
-        "    \n",
-        "    print(\"\ud83c\udf89 Practical networks work!\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")\n",
-        "    print(\"Make sure to implement the network creation functions above!\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "9332161e",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 7: Network Behavior Analysis\n",
-        "\n",
-        "Let's create tools to analyze how networks behave with different inputs and understand their capabilities.\n",
-        "\n",
-        "### Why Behavior Analysis Matters\n",
-        "- **Understanding**: Learn what patterns networks can learn\n",
-        "- **Debugging**: Identify when networks fail\n",
-        "- **Design**: Choose appropriate architectures\n",
-        "- **Validation**: Ensure networks work as expected"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "dbbbbb95",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "def analyze_network_behavior(network: Sequential, input_data: Tensor, \n",
-        "                           title: str = \"Network Behavior Analysis\"):\n",
-        "    \"\"\"\n",
-        "    Analyze how a network behaves with different inputs.\n",
-        "    \n",
-        "    Args:\n",
-        "        network: Sequential network to analyze\n",
-        "        input_data: Input tensor to test\n",
-        "        title: Title for the plot\n",
-        "        \n",
-        "    TODO: Create an analysis showing network behavior and capabilities.\n",
-        "    \n",
-        "    APPROACH:\n",
-        "    1. Test the network with the given input\n",
-        "    2. Analyze the output characteristics\n",
-        "    3. Test with variations of the input\n",
-        "    4. Create visualizations showing behavior patterns\n",
-        "    \n",
-        "    EXAMPLE:\n",
-        "    Test network with original input and noisy versions\n",
-        "    Show how output changes with input variations\n",
-        "    \n",
-        "    HINTS:\n",
-        "    - Test the original input\n",
-        "    - Create variations (noise, scaling, etc.)\n",
-        "    - Compare outputs across variations\n",
-        "    - Show statistics and patterns\n",
-        "    \"\"\"\n",
-        "    raise NotImplementedError(\"Student implementation required\")"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "b62a84cf",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "def analyze_network_behavior(network: Sequential, input_data: Tensor, \n",
-        "                           title: str = \"Network Behavior Analysis\"):\n",
-        "    \"\"\"Analyze how a network behaves with different inputs.\"\"\"\n",
-        "    if not _should_show_plots():\n",
-        "        print(\"\ud83d\udcca Visualization disabled during testing\")\n",
-        "        return\n",
-        "    \n",
-        "    # Test original input\n",
-        "    original_output = network(input_data)\n",
-        "    \n",
-        "    # Create variations\n",
-        "    noise_levels = [0.0, 0.1, 0.2, 0.5]\n",
-        "    outputs = []\n",
-        "    \n",
-        "    for noise in noise_levels:\n",
-        "        noisy_input = Tensor(input_data.data + noise * np.random.randn(*input_data.data.shape))\n",
-        "        output = network(noisy_input)\n",
-        "        outputs.append(output.data.flatten())\n",
-        "    \n",
-        "    # Create analysis plot\n",
-        "    fig, axes = plt.subplots(2, 2, figsize=(12, 10))\n",
-        "    \n",
-        "    # Original output\n",
-        "    axes[0, 0].hist(outputs[0], bins=20, alpha=0.7)\n",
-        "    axes[0, 0].set_title('Original Input Output')\n",
-        "    axes[0, 0].set_xlabel('Value')\n",
-        "    axes[0, 0].set_ylabel('Frequency')\n",
-        "    \n",
-        "    # Output stability\n",
-        "    output_means = [np.mean(out) for out in outputs]\n",
-        "    output_stds = [np.std(out) for out in outputs]\n",
-        "    axes[0, 1].plot(noise_levels, output_means, 'bo-', label='Mean')\n",
-        "    axes[0, 1].fill_between(noise_levels, \n",
-        "                           [m-s for m, s in zip(output_means, output_stds)],\n",
-        "                           [m+s for m, s in zip(output_means, output_stds)], \n",
-        "                           alpha=0.3, label='\u00b11 Std')\n",
-        "    axes[0, 1].set_xlabel('Noise Level')\n",
-        "    axes[0, 1].set_ylabel('Output Value')\n",
-        "    axes[0, 1].set_title('Output Stability')\n",
-        "    axes[0, 1].legend()\n",
-        "    \n",
-        "    # Output distribution comparison\n",
-        "    for i, (output, noise) in enumerate(zip(outputs, noise_levels)):\n",
-        "        axes[1, 0].hist(output, bins=20, alpha=0.5, label=f'Noise={noise}')\n",
-        "    axes[1, 0].set_xlabel('Output Value')\n",
-        "    axes[1, 0].set_ylabel('Frequency')\n",
-        "    axes[1, 0].set_title('Output Distribution Comparison')\n",
-        "    axes[1, 0].legend()\n",
-        "    \n",
-        "    # Statistics\n",
-        "    stats_text = f'Original Mean: {np.mean(outputs[0]):.3f}\\nOriginal Std: {np.std(outputs[0]):.3f}\\nOutput Range: [{np.min(outputs[0]):.3f}, {np.max(outputs[0]):.3f}]'\n",
-        "    axes[1, 1].text(0.1, 0.5, stats_text, transform=axes[1, 1].transAxes, \n",
-        "                    verticalalignment='center', fontsize=10)\n",
-        "    axes[1, 1].set_title('Network Statistics')\n",
-        "    axes[1, 1].axis('off')\n",
-        "    \n",
-        "    plt.suptitle(title)\n",
-        "    plt.tight_layout()\n",
-        "    plt.show()"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "e4c63d31",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "### \ud83e\uddea Test Network Behavior Analysis"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "56f10f2f",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test network behavior analysis\n",
-        "print(\"Testing network behavior analysis...\")\n",
-        "\n",
-        "try:\n",
-        "    # Create a test network\n",
-        "    test_network = create_classification_network(input_size=3, num_classes=1)\n",
-        "    test_input = Tensor([[1.0, 2.0, 3.0]])\n",
-        "    \n",
-        "    # Analyze behavior\n",
-        "    if _should_show_plots():\n",
-        "        analyze_network_behavior(test_network, test_input, \"Test Network Behavior\")\n",
-        "        print(\"\u2705 Network behavior analysis created!\")\n",
-        "    else:\n",
-        "        print(\"\u2705 Network behavior analysis skipped during testing\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")\n",
-        "    print(\"Make sure to implement analyze_network_behavior above!\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "fcdeda32",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## \ud83c\udfaf Module Summary\n",
-        "\n",
-        "Congratulations! You've built the foundation of neural network architectures:\n",
-        "\n",
-        "### What You've Accomplished\n",
-        "\u2705 **Sequential Networks**: Composing layers into complete architectures  \n",
-        "\u2705 **MLP Creation**: Building multi-layer perceptrons  \n",
-        "\u2705 **Network Visualization**: Understanding architecture and data flow  \n",
-        "\u2705 **Network Comparison**: Analyzing different architectures  \n",
-        "\u2705 **Practical Networks**: Classification and regression networks  \n",
-        "\u2705 **Behavior Analysis**: Understanding network capabilities  \n",
-        "\n",
-        "### Key Concepts You've Learned\n",
-        "- **Networks** are compositions of layers that transform data\n",
-        "- **Architecture design** determines network capabilities\n",
-        "- **Sequential networks** are the most fundamental building block\n",
-        "- **Different architectures** solve different problems\n",
-        "- **Visualization tools** help understand network behavior\n",
-        "\n",
-        "### What's Next\n",
-        "In the next modules, you'll build on this foundation:\n",
-        "- **Autograd**: Enable automatic differentiation for training\n",
-        "- **Training**: Learn parameters using gradients and optimizers\n",
-        "- **Loss Functions**: Define objectives for learning\n",
-        "- **Applications**: Solve real problems with neural networks\n",
-        "\n",
-        "### Real-World Connection\n",
-        "Your network architectures are now ready to:\n",
-        "- Compose layers into complete neural networks\n",
-        "- Create specialized architectures for different tasks\n",
-        "- Analyze and understand network behavior\n",
-        "- Integrate with the rest of the TinyTorch ecosystem\n",
-        "\n",
-        "**Ready for the next challenge?** Let's move on to automatic differentiation to enable training!"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "01ce7173",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Final verification\n",
-        "print(\"\\n\" + \"=\"*50)\n",
-        "print(\"\ud83c\udf89 NETWORKS MODULE COMPLETE!\")\n",
-        "print(\"=\"*50)\n",
-        "print(\"\u2705 Sequential network implementation\")\n",
-        "print(\"\u2705 MLP creation and architecture design\")\n",
-        "print(\"\u2705 Network visualization and analysis\")\n",
-        "print(\"\u2705 Network comparison tools\")\n",
-        "print(\"\u2705 Practical classification and regression networks\")\n",
-        "print(\"\u2705 Network behavior analysis\")\n",
-        "print(\"\\n\ud83d\ude80 Ready to enable training with autograd in the next module!\") "
-      ]
-    }
-  ],
-  "metadata": {
-    "jupytext": {
-      "main_language": "python"
-    }
-  },
-  "nbformat": 4,
-  "nbformat_minor": 5
-}
\ No newline at end of file
diff --git a/assignments/source/05_cnn/05_cnn.ipynb b/assignments/source/05_cnn/05_cnn.ipynb
deleted file mode 100644
index 6dd3d37b..00000000
--- a/assignments/source/05_cnn/05_cnn.ipynb
+++ /dev/null
@@ -1,816 +0,0 @@
-{
-  "cells": [
-    {
-      "cell_type": "markdown",
-      "id": "ca53839c",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "# Module X: CNN - Convolutional Neural Networks\n",
-        "\n",
-        "Welcome to the CNN module! Here you'll implement the core building block of modern computer vision: the convolutional layer.\n",
-        "\n",
-        "## Learning Goals\n",
-        "- Understand the convolution operation (sliding window, local connectivity, weight sharing)\n",
-        "- Implement Conv2D with explicit for-loops\n",
-        "- Visualize how convolution builds feature maps\n",
-        "- Compose Conv2D with other layers to build a simple ConvNet\n",
-        "- (Stretch) Explore stride, padding, pooling, and multi-channel input\n",
-        "\n",
-        "## Build \u2192 Use \u2192 Understand\n",
-        "1. **Build**: Conv2D layer using sliding window convolution\n",
-        "2. **Use**: Transform images and see feature maps\n",
-        "3. **Understand**: How CNNs learn spatial patterns"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "9e0d8f02",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## \ud83d\udce6 Where This Code Lives in the Final Package\n",
-        "\n",
-        "**Learning Side:** You work in `modules/cnn/cnn_dev.py`  \n",
-        "**Building Side:** Code exports to `tinytorch.core.layers`\n",
-        "\n",
-        "```python\n",
-        "# Final package structure:\n",
-        "from tinytorch.core.layers import Dense, Conv2D  # Both layers together!\n",
-        "from tinytorch.core.activations import ReLU\n",
-        "from tinytorch.core.tensor import Tensor\n",
-        "```\n",
-        "\n",
-        "**Why this matters:**\n",
-        "- **Learning:** Focused modules for deep understanding\n",
-        "- **Production:** Proper organization like PyTorch's `torch.nn`\n",
-        "- **Consistency:** All layers (Dense, Conv2D) live together in `core.layers`"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "fbd717db",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "#| default_exp core.cnn"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "7f22e530",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "import numpy as np\n",
-        "from typing import List, Tuple, Optional\n",
-        "from tinytorch.core.tensor import Tensor\n",
-        "\n",
-        "# Setup and imports (for development)\n",
-        "import matplotlib.pyplot as plt\n",
-        "from tinytorch.core.layers import Dense\n",
-        "from tinytorch.core.activations import ReLU"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "f99723c8",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 1: What is Convolution?\n",
-        "\n",
-        "### Definition\n",
-        "A **convolutional layer** applies a small filter (kernel) across the input, producing a feature map. This operation captures local patterns and is the foundation of modern vision models.\n",
-        "\n",
-        "### Why Convolution Matters in Computer Vision\n",
-        "- **Local connectivity**: Each output value depends only on a small region of the input\n",
-        "- **Weight sharing**: The same filter is applied everywhere (translation invariance)\n",
-        "- **Spatial hierarchy**: Multiple layers build increasingly complex features\n",
-        "- **Parameter efficiency**: Much fewer parameters than fully connected layers\n",
-        "\n",
-        "### The Fundamental Insight\n",
-        "**Convolution is pattern matching!** The kernel learns to detect specific patterns:\n",
-        "- **Edge detectors**: Find boundaries between objects\n",
-        "- **Texture detectors**: Recognize surface patterns\n",
-        "- **Shape detectors**: Identify geometric forms\n",
-        "- **Feature detectors**: Combine simple patterns into complex features\n",
-        "\n",
-        "### Real-World Examples\n",
-        "- **Image processing**: Detect edges, blur, sharpen\n",
-        "- **Computer vision**: Recognize objects, faces, text\n",
-        "- **Medical imaging**: Detect tumors, analyze scans\n",
-        "- **Autonomous driving**: Identify traffic signs, pedestrians\n",
-        "\n",
-        "### Visual Intuition\n",
-        "```\n",
-        "Input Image:     Kernel:        Output Feature Map:\n",
-        "[1, 2, 3]       [1,  0]       [1*1+2*0+4*0+5*(-1), 2*1+3*0+5*0+6*(-1)]\n",
-        "[4, 5, 6]       [0, -1]       [4*1+5*0+7*0+8*(-1), 5*1+6*0+8*0+9*(-1)]\n",
-        "[7, 8, 9]\n",
-        "```\n",
-        "\n",
-        "The kernel slides across the input, computing dot products at each position.\n",
-        "\n",
-        "### The Math Behind It\n",
-        "For input I (H\u00d7W) and kernel K (kH\u00d7kW), the output O (out_H\u00d7out_W) is:\n",
-        "```\n",
-        "O[i,j] = sum(I[i+di, j+dj] * K[di, dj] for di in range(kH), dj in range(kW))\n",
-        "```\n",
-        "\n",
-        "Let's implement this step by step!"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "aa4af055",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:\n",
-        "    \"\"\"\n",
-        "    Naive 2D convolution (single channel, no stride, no padding).\n",
-        "    \n",
-        "    Args:\n",
-        "        input: 2D input array (H, W)\n",
-        "        kernel: 2D filter (kH, kW)\n",
-        "    Returns:\n",
-        "        2D output array (H-kH+1, W-kW+1)\n",
-        "        \n",
-        "    TODO: Implement the sliding window convolution using for-loops.\n",
-        "    \n",
-        "    APPROACH:\n",
-        "    1. Get input dimensions: H, W = input.shape\n",
-        "    2. Get kernel dimensions: kH, kW = kernel.shape\n",
-        "    3. Calculate output dimensions: out_H = H - kH + 1, out_W = W - kW + 1\n",
-        "    4. Create output array: np.zeros((out_H, out_W))\n",
-        "    5. Use nested loops to slide the kernel:\n",
-        "       - i loop: output rows (0 to out_H-1)\n",
-        "       - j loop: output columns (0 to out_W-1)\n",
-        "       - di loop: kernel rows (0 to kH-1)\n",
-        "       - dj loop: kernel columns (0 to kW-1)\n",
-        "    6. For each (i,j), compute: output[i,j] += input[i+di, j+dj] * kernel[di, dj]\n",
-        "    \n",
-        "    EXAMPLE:\n",
-        "    Input: [[1, 2, 3],     Kernel: [[1, 0],\n",
-        "            [4, 5, 6],               [0, -1]]\n",
-        "            [7, 8, 9]]\n",
-        "    \n",
-        "    Output[0,0] = 1*1 + 2*0 + 4*0 + 5*(-1) = 1 - 5 = -4\n",
-        "    Output[0,1] = 2*1 + 3*0 + 5*0 + 6*(-1) = 2 - 6 = -4\n",
-        "    Output[1,0] = 4*1 + 5*0 + 7*0 + 8*(-1) = 4 - 8 = -4\n",
-        "    Output[1,1] = 5*1 + 6*0 + 8*0 + 9*(-1) = 5 - 9 = -4\n",
-        "    \n",
-        "    HINTS:\n",
-        "    - Start with output = np.zeros((out_H, out_W))\n",
-        "    - Use four nested loops: for i in range(out_H): for j in range(out_W): for di in range(kH): for dj in range(kW):\n",
-        "    - Accumulate the sum: output[i,j] += input[i+di, j+dj] * kernel[di, dj]\n",
-        "    \"\"\"\n",
-        "    raise NotImplementedError(\"Student implementation required\")"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "d83b2c10",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:\n",
-        "    H, W = input.shape\n",
-        "    kH, kW = kernel.shape\n",
-        "    out_H, out_W = H - kH + 1, W - kW + 1\n",
-        "    output = np.zeros((out_H, out_W), dtype=input.dtype)\n",
-        "    for i in range(out_H):\n",
-        "        for j in range(out_W):\n",
-        "            for di in range(kH):\n",
-        "                for dj in range(kW):\n",
-        "                    output[i, j] += input[i + di, j + dj] * kernel[di, dj]\n",
-        "    return output"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "454a6bad",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "### \ud83e\uddea Test Your Conv2D Implementation\n",
-        "\n",
-        "Try your function on this simple example:"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "7705032a",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test case for conv2d_naive\n",
-        "input = np.array([\n",
-        "    [1, 2, 3],\n",
-        "    [4, 5, 6],\n",
-        "    [7, 8, 9]\n",
-        "], dtype=np.float32)\n",
-        "kernel = np.array([\n",
-        "    [1, 0],\n",
-        "    [0, -1]\n",
-        "], dtype=np.float32)\n",
-        "\n",
-        "expected = np.array([\n",
-        "    [1*1+2*0+4*0+5*(-1), 2*1+3*0+5*0+6*(-1)],\n",
-        "    [4*1+5*0+7*0+8*(-1), 5*1+6*0+8*0+9*(-1)]\n",
-        "], dtype=np.float32)\n",
-        "\n",
-        "try:\n",
-        "    output = conv2d_naive(input, kernel)\n",
-        "    print(\"\u2705 Input:\\n\", input)\n",
-        "    print(\"\u2705 Kernel:\\n\", kernel)\n",
-        "    print(\"\u2705 Your output:\\n\", output)\n",
-        "    print(\"\u2705 Expected:\\n\", expected)\n",
-        "    assert np.allclose(output, expected), \"\u274c Output does not match expected!\"\n",
-        "    print(\"\ud83c\udf89 conv2d_naive works!\")\n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")\n",
-        "    print(\"Make sure to implement conv2d_naive above!\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "53449e22",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## Step 2: Understanding What Convolution Does\n",
-        "\n",
-        "Let's visualize how different kernels detect different patterns:"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "05a1ce2c",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Visualize different convolution kernels\n",
-        "print(\"Visualizing different convolution kernels...\")\n",
-        "\n",
-        "try:\n",
-        "    # Test different kernels\n",
-        "    test_input = np.array([\n",
-        "        [1, 1, 1, 0, 0],\n",
-        "        [1, 1, 1, 0, 0],\n",
-        "        [1, 1, 1, 0, 0],\n",
-        "        [0, 0, 0, 0, 0],\n",
-        "        [0, 0, 0, 0, 0]\n",
-        "    ], dtype=np.float32)\n",
-        "    \n",
-        "    # Edge detection kernel (horizontal)\n",
-        "    edge_kernel = np.array([\n",
-        "        [1, 1, 1],\n",
-        "        [0, 0, 0],\n",
-        "        [-1, -1, -1]\n",
-        "    ], dtype=np.float32)\n",
-        "    \n",
-        "    # Sharpening kernel\n",
-        "    sharpen_kernel = np.array([\n",
-        "        [0, -1, 0],\n",
-        "        [-1, 5, -1],\n",
-        "        [0, -1, 0]\n",
-        "    ], dtype=np.float32)\n",
-        "    \n",
-        "    # Test edge detection\n",
-        "    edge_output = conv2d_naive(test_input, edge_kernel)\n",
-        "    print(\"\u2705 Edge detection kernel:\")\n",
-        "    print(\"   Detects horizontal edges (boundaries between light and dark)\")\n",
-        "    print(\"   Output:\\n\", edge_output)\n",
-        "    \n",
-        "    # Test sharpening\n",
-        "    sharpen_output = conv2d_naive(test_input, sharpen_kernel)\n",
-        "    print(\"\u2705 Sharpening kernel:\")\n",
-        "    print(\"   Enhances edges and details\")\n",
-        "    print(\"   Output:\\n\", sharpen_output)\n",
-        "    \n",
-        "    print(\"\\n\ud83d\udca1 Different kernels detect different patterns!\")\n",
-        "    print(\"   Neural networks learn these kernels automatically!\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "0b33791b",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 3: Conv2D Layer Class\n",
-        "\n",
-        "Now let's wrap your convolution function in a layer class for use in networks. This makes it consistent with other layers like Dense.\n",
-        "\n",
-        "### Why Layer Classes Matter\n",
-        "- **Consistent API**: Same interface as Dense layers\n",
-        "- **Learnable parameters**: Kernels can be learned from data\n",
-        "- **Composability**: Can be combined with other layers\n",
-        "- **Integration**: Works seamlessly with the rest of TinyTorch\n",
-        "\n",
-        "### The Pattern\n",
-        "```\n",
-        "Input Tensor \u2192 Conv2D \u2192 Output Tensor\n",
-        "```\n",
-        "\n",
-        "Just like Dense layers, but with spatial operations instead of linear transformations."
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "118ba687",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "class Conv2D:\n",
-        "    \"\"\"\n",
-        "    2D Convolutional Layer (single channel, single filter, no stride/pad).\n",
-        "    \n",
-        "    Args:\n",
-        "        kernel_size: (kH, kW) - size of the convolution kernel\n",
-        "        \n",
-        "    TODO: Initialize a random kernel and implement the forward pass using conv2d_naive.\n",
-        "    \n",
-        "    APPROACH:\n",
-        "    1. Store kernel_size as instance variable\n",
-        "    2. Initialize random kernel with small values\n",
-        "    3. Implement forward pass using conv2d_naive function\n",
-        "    4. Return Tensor wrapped around the result\n",
-        "    \n",
-        "    EXAMPLE:\n",
-        "    layer = Conv2D(kernel_size=(2, 2))\n",
-        "    x = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # shape (3, 3)\n",
-        "    y = layer(x)  # shape (2, 2)\n",
-        "    \n",
-        "    HINTS:\n",
-        "    - Store kernel_size as (kH, kW)\n",
-        "    - Initialize kernel with np.random.randn(kH, kW) * 0.1 (small values)\n",
-        "    - Use conv2d_naive(x.data, self.kernel) in forward pass\n",
-        "    - Return Tensor(result) to wrap the result\n",
-        "    \"\"\"\n",
-        "    def __init__(self, kernel_size: Tuple[int, int]):\n",
-        "        \"\"\"\n",
-        "        Initialize Conv2D layer with random kernel.\n",
-        "        \n",
-        "        Args:\n",
-        "            kernel_size: (kH, kW) - size of the convolution kernel\n",
-        "            \n",
-        "        TODO: \n",
-        "        1. Store kernel_size as instance variable\n",
-        "        2. Initialize random kernel with small values\n",
-        "        3. Scale kernel values to prevent large outputs\n",
-        "        \n",
-        "        STEP-BY-STEP:\n",
-        "        1. Store kernel_size as self.kernel_size\n",
-        "        2. Unpack kernel_size into kH, kW\n",
-        "        3. Initialize kernel: np.random.randn(kH, kW) * 0.1\n",
-        "        4. Convert to float32 for consistency\n",
-        "        \n",
-        "        EXAMPLE:\n",
-        "        Conv2D((2, 2)) creates:\n",
-        "        - kernel: shape (2, 2) with small random values\n",
-        "        \"\"\"\n",
-        "        raise NotImplementedError(\"Student implementation required\")\n",
-        "    \n",
-        "    def forward(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"\n",
-        "        Forward pass: apply convolution to input.\n",
-        "        \n",
-        "        Args:\n",
-        "            x: Input tensor of shape (H, W)\n",
-        "            \n",
-        "        Returns:\n",
-        "            Output tensor of shape (H-kH+1, W-kW+1)\n",
-        "            \n",
-        "        TODO: Implement convolution using conv2d_naive function.\n",
-        "        \n",
-        "        STEP-BY-STEP:\n",
-        "        1. Use conv2d_naive(x.data, self.kernel)\n",
-        "        2. Return Tensor(result)\n",
-        "        \n",
-        "        EXAMPLE:\n",
-        "        Input x: Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # shape (3, 3)\n",
-        "        Kernel: shape (2, 2)\n",
-        "        Output: Tensor([[val1, val2], [val3, val4]])  # shape (2, 2)\n",
-        "        \n",
-        "        HINTS:\n",
-        "        - x.data gives you the numpy array\n",
-        "        - self.kernel is your learned kernel\n",
-        "        - Use conv2d_naive(x.data, self.kernel)\n",
-        "        - Return Tensor(result) to wrap the result\n",
-        "        \"\"\"\n",
-        "        raise NotImplementedError(\"Student implementation required\")\n",
-        "    \n",
-        "    def __call__(self, x: Tensor) -> Tensor:\n",
-        "        \"\"\"Make layer callable: layer(x) same as layer.forward(x)\"\"\"\n",
-        "        return self.forward(x)"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "3e18c382",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "class Conv2D:\n",
-        "    def __init__(self, kernel_size: Tuple[int, int]):\n",
-        "        self.kernel_size = kernel_size\n",
-        "        kH, kW = kernel_size\n",
-        "        # Initialize with small random values\n",
-        "        self.kernel = np.random.randn(kH, kW).astype(np.float32) * 0.1\n",
-        "    \n",
-        "    def forward(self, x: Tensor) -> Tensor:\n",
-        "        return Tensor(conv2d_naive(x.data, self.kernel))\n",
-        "    \n",
-        "    def __call__(self, x: Tensor) -> Tensor:\n",
-        "        return self.forward(x)"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "e288fb18",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "### \ud83e\uddea Test Your Conv2D Layer"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "2f1a4a6a",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test Conv2D layer\n",
-        "print(\"Testing Conv2D layer...\")\n",
-        "\n",
-        "try:\n",
-        "    # Test basic Conv2D layer\n",
-        "    conv = Conv2D(kernel_size=(2, 2))\n",
-        "    x = Tensor(np.array([\n",
-        "        [1, 2, 3],\n",
-        "        [4, 5, 6],\n",
-        "        [7, 8, 9]\n",
-        "    ], dtype=np.float32))\n",
-        "    \n",
-        "    print(f\"\u2705 Input shape: {x.shape}\")\n",
-        "    print(f\"\u2705 Kernel shape: {conv.kernel.shape}\")\n",
-        "    print(f\"\u2705 Kernel values:\\n{conv.kernel}\")\n",
-        "    \n",
-        "    y = conv(x)\n",
-        "    print(f\"\u2705 Output shape: {y.shape}\")\n",
-        "    print(f\"\u2705 Output: {y}\")\n",
-        "    \n",
-        "    # Test with different kernel size\n",
-        "    conv2 = Conv2D(kernel_size=(3, 3))\n",
-        "    y2 = conv2(x)\n",
-        "    print(f\"\u2705 3x3 kernel output shape: {y2.shape}\")\n",
-        "    \n",
-        "    print(\"\\n\ud83c\udf89 Conv2D layer works!\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")\n",
-        "    print(\"Make sure to implement the Conv2D layer above!\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "97939763",
-      "metadata": {
-        "cell_marker": "\"\"\"",
-        "lines_to_next_cell": 1
-      },
-      "source": [
-        "## Step 4: Building a Simple ConvNet\n",
-        "\n",
-        "Now let's compose Conv2D layers with other layers to build a complete convolutional neural network!\n",
-        "\n",
-        "### Why ConvNets Matter\n",
-        "- **Spatial hierarchy**: Each layer learns increasingly complex features\n",
-        "- **Parameter sharing**: Same kernel applied everywhere (efficiency)\n",
-        "- **Translation invariance**: Can recognize objects regardless of position\n",
-        "- **Real-world success**: Power most modern computer vision systems\n",
-        "\n",
-        "### The Architecture\n",
-        "```\n",
-        "Input Image \u2192 Conv2D \u2192 ReLU \u2192 Flatten \u2192 Dense \u2192 Output\n",
-        "```\n",
-        "\n",
-        "This simple architecture can learn to recognize patterns in images!"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "51631fe6",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| export\n",
-        "def flatten(x: Tensor) -> Tensor:\n",
-        "    \"\"\"\n",
-        "    Flatten a 2D tensor to 1D (for connecting to Dense).\n",
-        "    \n",
-        "    TODO: Implement flattening operation.\n",
-        "    \n",
-        "    APPROACH:\n",
-        "    1. Get the numpy array from the tensor\n",
-        "    2. Use .flatten() to convert to 1D\n",
-        "    3. Add batch dimension with [None, :]\n",
-        "    4. Return Tensor wrapped around the result\n",
-        "    \n",
-        "    EXAMPLE:\n",
-        "    Input: Tensor([[1, 2], [3, 4]])  # shape (2, 2)\n",
-        "    Output: Tensor([[1, 2, 3, 4]])  # shape (1, 4)\n",
-        "    \n",
-        "    HINTS:\n",
-        "    - Use x.data.flatten() to get 1D array\n",
-        "    - Add batch dimension: result[None, :]\n",
-        "    - Return Tensor(result)\n",
-        "    \"\"\"\n",
-        "    raise NotImplementedError(\"Student implementation required\")"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "7e8f2b50",
-      "metadata": {
-        "lines_to_next_cell": 1
-      },
-      "outputs": [],
-      "source": [
-        "#| hide\n",
-        "#| export\n",
-        "def flatten(x: Tensor) -> Tensor:\n",
-        "    \"\"\"Flatten a 2D tensor to 1D (for connecting to Dense).\"\"\"\n",
-        "    return Tensor(x.data.flatten()[None, :])"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "7bdb9f80",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "### \ud83e\uddea Test Your Flatten Function"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "c6d92ebc",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Test flatten function\n",
-        "print(\"Testing flatten function...\")\n",
-        "\n",
-        "try:\n",
-        "    # Test flattening\n",
-        "    x = Tensor([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)\n",
-        "    flattened = flatten(x)\n",
-        "    \n",
-        "    print(f\"\u2705 Input shape: {x.shape}\")\n",
-        "    print(f\"\u2705 Flattened shape: {flattened.shape}\")\n",
-        "    print(f\"\u2705 Flattened values: {flattened}\")\n",
-        "    \n",
-        "    # Verify the flattening worked correctly\n",
-        "    expected = np.array([[1, 2, 3, 4, 5, 6]])\n",
-        "    assert np.allclose(flattened.data, expected), \"\u274c Flattening incorrect!\"\n",
-        "    print(\"\u2705 Flattening works correctly!\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")\n",
-        "    print(\"Make sure to implement the flatten function above!\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "9804128d",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## Step 5: Composing a Complete ConvNet\n",
-        "\n",
-        "Now let's build a simple convolutional neural network that can process images!"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "d60d05b9",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Compose a simple ConvNet\n",
-        "print(\"Building a simple ConvNet...\")\n",
-        "\n",
-        "try:\n",
-        "    # Create network components\n",
-        "    conv = Conv2D((2, 2))\n",
-        "    relu = ReLU()\n",
-        "    dense = Dense(input_size=4, output_size=1)  # 4 features from 2x2 output\n",
-        "    \n",
-        "    # Test input (small 3x3 \"image\")\n",
-        "    x = Tensor(np.random.randn(3, 3).astype(np.float32))\n",
-        "    print(f\"\u2705 Input shape: {x.shape}\")\n",
-        "    print(f\"\u2705 Input: {x}\")\n",
-        "    \n",
-        "    # Forward pass through the network\n",
-        "    conv_out = conv(x)\n",
-        "    print(f\"\u2705 After Conv2D: {conv_out}\")\n",
-        "    \n",
-        "    relu_out = relu(conv_out)\n",
-        "    print(f\"\u2705 After ReLU: {relu_out}\")\n",
-        "    \n",
-        "    flattened = flatten(relu_out)\n",
-        "    print(f\"\u2705 After flatten: {flattened}\")\n",
-        "    \n",
-        "    final_out = dense(flattened)\n",
-        "    print(f\"\u2705 Final output: {final_out}\")\n",
-        "    \n",
-        "    print(\"\\n\ud83c\udf89 Simple ConvNet works!\")\n",
-        "    print(\"This network can learn to recognize patterns in images!\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")\n",
-        "    print(\"Check your Conv2D, flatten, and Dense implementations!\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "9fe4faf0",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## Step 6: Understanding the Power of Convolution\n",
-        "\n",
-        "Let's see how convolution captures different types of patterns:"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "434133c2",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Demonstrate pattern detection\n",
-        "print(\"Demonstrating pattern detection...\")\n",
-        "\n",
-        "try:\n",
-        "    # Create a simple \"image\" with a pattern\n",
-        "    image = np.array([\n",
-        "        [0, 0, 0, 0, 0],\n",
-        "        [0, 1, 1, 1, 0],\n",
-        "        [0, 1, 1, 1, 0],\n",
-        "        [0, 1, 1, 1, 0],\n",
-        "        [0, 0, 0, 0, 0]\n",
-        "    ], dtype=np.float32)\n",
-        "    \n",
-        "    # Different kernels detect different patterns\n",
-        "    edge_kernel = np.array([\n",
-        "        [1, 1, 1],\n",
-        "        [1, -8, 1],\n",
-        "        [1, 1, 1]\n",
-        "    ], dtype=np.float32)\n",
-        "    \n",
-        "    blur_kernel = np.array([\n",
-        "        [1/9, 1/9, 1/9],\n",
-        "        [1/9, 1/9, 1/9],\n",
-        "        [1/9, 1/9, 1/9]\n",
-        "    ], dtype=np.float32)\n",
-        "    \n",
-        "    # Test edge detection\n",
-        "    edge_result = conv2d_naive(image, edge_kernel)\n",
-        "    print(\"\u2705 Edge detection:\")\n",
-        "    print(\"   Detects boundaries around the white square\")\n",
-        "    print(\"   Result:\\n\", edge_result)\n",
-        "    \n",
-        "    # Test blurring\n",
-        "    blur_result = conv2d_naive(image, blur_kernel)\n",
-        "    print(\"\u2705 Blurring:\")\n",
-        "    print(\"   Smooths the image\")\n",
-        "    print(\"   Result:\\n\", blur_result)\n",
-        "    \n",
-        "    print(\"\\n\ud83d\udca1 Different kernels = different feature detectors!\")\n",
-        "    print(\"   Neural networks learn these automatically from data!\")\n",
-        "    \n",
-        "except Exception as e:\n",
-        "    print(f\"\u274c Error: {e}\")"
-      ]
-    },
-    {
-      "cell_type": "markdown",
-      "id": "80938b52",
-      "metadata": {
-        "cell_marker": "\"\"\""
-      },
-      "source": [
-        "## \ud83c\udfaf Module Summary\n",
-        "\n",
-        "Congratulations! You've built the foundation of convolutional neural networks:\n",
-        "\n",
-        "### What You've Accomplished\n",
-        "\u2705 **Convolution Operation**: Understanding the sliding window mechanism  \n",
-        "\u2705 **Conv2D Layer**: Learnable convolutional layer implementation  \n",
-        "\u2705 **Pattern Detection**: Visualizing how kernels detect different features  \n",
-        "\u2705 **ConvNet Architecture**: Composing Conv2D with other layers  \n",
-        "\u2705 **Real-world Applications**: Understanding computer vision applications  \n",
-        "\n",
-        "### Key Concepts You've Learned\n",
-        "- **Convolution** is pattern matching with sliding windows\n",
-        "- **Local connectivity** means each output depends on a small input region\n",
-        "- **Weight sharing** makes CNNs parameter-efficient\n",
-        "- **Spatial hierarchy** builds complex features from simple patterns\n",
-        "- **Translation invariance** allows recognition regardless of position\n",
-        "\n",
-        "### What's Next\n",
-        "In the next modules, you'll build on this foundation:\n",
-        "- **Advanced CNN features**: Stride, padding, pooling\n",
-        "- **Multi-channel convolution**: RGB images, multiple filters\n",
-        "- **Training**: Learning kernels from data\n",
-        "- **Real applications**: Image classification, object detection\n",
-        "\n",
-        "### Real-World Connection\n",
-        "Your Conv2D layer is now ready to:\n",
-        "- Learn edge detectors, texture recognizers, and shape detectors\n",
-        "- Process real images for computer vision tasks\n",
-        "- Integrate with the rest of the TinyTorch ecosystem\n",
-        "- Scale to complex architectures like ResNet, VGG, etc.\n",
-        "\n",
-        "**Ready for the next challenge?** Let's move on to training these networks!"
-      ]
-    },
-    {
-      "cell_type": "code",
-      "execution_count": null,
-      "id": "03f153f1",
-      "metadata": {},
-      "outputs": [],
-      "source": [
-        "# Final verification\n",
-        "print(\"\\n\" + \"=\"*50)\n",
-        "print(\"\ud83c\udf89 CNN MODULE COMPLETE!\")\n",
-        "print(\"=\"*50)\n",
-        "print(\"\u2705 Convolution operation understanding\")\n",
-        "print(\"\u2705 Conv2D layer implementation\")\n",
-        "print(\"\u2705 Pattern detection visualization\")\n",
-        "print(\"\u2705 ConvNet architecture composition\")\n",
-        "print(\"\u2705 Real-world computer vision context\")\n",
-        "print(\"\\n\ud83d\ude80 Ready to train networks in the next module!\") "
-      ]
-    }
-  ],
-  "metadata": {
-    "jupytext": {
-      "main_language": "python"
-    }
-  },
-  "nbformat": 4,
-  "nbformat_minor": 5
-}
\ No newline at end of file
diff --git a/IMPLEMENTATION_SUMMARY.md b/development/archived/IMPLEMENTATION_SUMMARY.md
similarity index 100%
rename from IMPLEMENTATION_SUMMARY.md
rename to development/archived/IMPLEMENTATION_SUMMARY.md
diff --git a/MODULE_MIGRATION_STRATEGY.md b/development/archived/MODULE_MIGRATION_STRATEGY.md
similarity index 100%
rename from MODULE_MIGRATION_STRATEGY.md
rename to development/archived/MODULE_MIGRATION_STRATEGY.md
diff --git a/NBGRADER_INTEGRATION_COMPLETE.md b/development/archived/NBGRADER_INTEGRATION_COMPLETE.md
similarity index 100%
rename from NBGRADER_INTEGRATION_COMPLETE.md
rename to development/archived/NBGRADER_INTEGRATION_COMPLETE.md
diff --git a/NBGRADER_INTEGRATION_PLAN.md b/development/archived/NBGRADER_INTEGRATION_PLAN.md
similarity index 100%
rename from NBGRADER_INTEGRATION_PLAN.md
rename to development/archived/NBGRADER_INTEGRATION_PLAN.md
diff --git a/TINYTORCH_NBGRADER_PROPOSAL.md b/development/archived/TINYTORCH_NBGRADER_PROPOSAL.md
similarity index 100%
rename from TINYTORCH_NBGRADER_PROPOSAL.md
rename to development/archived/TINYTORCH_NBGRADER_PROPOSAL.md
diff --git a/quickstart.md b/development/archived/quickstart.md
similarity index 100%
rename from quickstart.md
rename to development/archived/quickstart.md
diff --git a/docs/students/project-guide.md b/docs/students/project-guide.md
deleted file mode 100644
index 1bb11f18..00000000
--- a/docs/students/project-guide.md
+++ /dev/null
@@ -1,288 +0,0 @@
-# 🔥 TinyTorch Project Guide
-
-**Building Machine Learning Systems from Scratch**
-
-This guide helps you navigate through the complete TinyTorch course. Each module builds progressively toward a complete ML system using a notebook-first development approach with nbdev.
-
-## 🎯 Module Progress Tracker
-
-Track your progress through the course:
-
-- [ ] **Module 0: Setup** - Environment & CLI setup  
-- [ ] **Module 1: Tensor** - Core tensor operations
-- [ ] **Module 2: Layers** - Neural network layers
-- [ ] **Module 3: Networks** - Complete model architectures
-- [ ] **Module 4: Autograd** - Automatic differentiation
-- [ ] **Module 5: DataLoader** - Data loading pipeline
-- [ ] **Module 6: Training** - Training loop & optimization
-- [ ] **Module 7: Config** - Configuration system
-- [ ] **Module 8: Profiling** - Performance profiling
-- [ ] **Module 9: Compression** - Model compression
-- [ ] **Module 10: Kernels** - Custom compute kernels
-- [ ] **Module 11: Benchmarking** - Performance benchmarking
-- [ ] **Module 12: MLOps** - Production monitoring
-
-## 🚀 Getting Started
-
-### First Time Setup
-1. **Clone the repository**
-2. **Go to**: [`modules/setup/README.md`](../../modules/setup/README.md)
-3. **Follow all setup instructions**
-4. **Verify with**: `tito system doctor`
-
-### Daily Workflow
-```bash
-cd TinyTorch
-source .venv/bin/activate  # Always activate first!
-tito system info            # Check system status
-```
-
-## 📋 Module Development Workflow
-
-Each module follows this pattern:
-1. **Read overview**: `modules/[name]/README.md`
-2. **Work in Python file**: `modules/[name]/[name]_dev.py`
-3. **Export code**: `tito package sync`
-4. **Run tests**: `tito module test --module [name]`
-5. **Move to next module when tests pass**
-
-## 📚 Module Details
-
-### 🔧 Module 0: Setup
-**Goal**: Get your development environment ready
-**Time**: 30 minutes
-**Location**: [`modules/setup/`](../../modules/setup/)
-
-**Key Tasks**:
-- [ ] Create virtual environment
-- [ ] Install dependencies
-- [ ] Implement `hello_tinytorch()` function
-- [ ] Pass all setup tests
-- [ ] Learn the `tito` CLI
-
-**Verification**:
-```bash
-tito system doctor           # Should show all ✅
-tito module test --module setup
-```
-
----
-
-### 🔢 Module 1: Tensor
-**Goal**: Build the core tensor system
-**Prerequisites**: Module 0 complete
-**Location**: [`modules/tensor/`](../../modules/tensor/)
-
-**Key Tasks**:
-- [ ] Implement `Tensor` class
-- [ ] Basic operations (add, mul, reshape)
-- [ ] Memory management
-- [ ] Shape validation
-- [ ] Broadcasting support
-
-**Verification**:
-```bash
-tito module test --module tensor
-```
-
----
-
-### 🧠 Module 2: Layers
-**Goal**: Build neural network layers
-**Prerequisites**: Module 1 complete
-**Location**: [`modules/layers/`](../../modules/layers/)
-
-**Key Tasks**:
-- [ ] Implement `Linear` layer
-- [ ] Activation functions (ReLU, Sigmoid)
-- [ ] Forward pass implementation
-- [ ] Parameter management
-- [ ] Layer composition
-
-**Verification**:
-```bash
-tito module test --module layers
-```
-
----
-
-### 🖼️ Module 3: Networks
-**Goal**: Build complete neural networks
-**Prerequisites**: Module 2 complete
-**Location**: [`modules/networks/`](../../modules/networks/)
-
-**Key Tasks**:
-- [ ] Implement `Sequential` container
-- [ ] CNN architectures
-- [ ] Model saving/loading
-- [ ] Train on CIFAR-10
-
-**Target**: >80% accuracy on CIFAR-10
-
----
-
-### ⚡ Module 4: Autograd
-**Goal**: Automatic differentiation engine
-**Prerequisites**: Module 3 complete
-**Location**: [`modules/autograd/`](../../modules/autograd/)
-
-**Key Tasks**:
-- [ ] Computational graph construction
-- [ ] Backward pass automation
-- [ ] Gradient checking
-- [ ] Memory efficient gradients
-
-**Verification**: All gradient checks pass
-
----
-
-### 📊 Module 5: DataLoader
-**Goal**: Efficient data loading
-**Prerequisites**: Module 4 complete
-**Location**: [`modules/dataloader/`](../../modules/dataloader/)
-
-**Key Tasks**:
-- [ ] Custom `DataLoader` implementation
-- [ ] Batch processing
-- [ ] Data transformations
-- [ ] Multi-threaded loading
-
----
-
-### 🎯 Module 6: Training
-**Goal**: Complete training system
-**Prerequisites**: Module 5 complete
-**Location**: [`modules/training/`](../../modules/training/)
-
-**Key Tasks**:
-- [ ] Training loop implementation
-- [ ] SGD optimizer
-- [ ] Adam optimizer
-- [ ] Learning rate scheduling
-- [ ] Metric tracking
-
----
-
-### ⚙️ Module 7: Config
-**Goal**: Configuration management
-**Prerequisites**: Module 6 complete
-**Location**: [`modules/config/`](../../modules/config/)
-
-**Key Tasks**:
-- [ ] YAML configuration system
-- [ ] Experiment logging
-- [ ] Reproducible training
-- [ ] Hyperparameter management
-
----
-
-### 📊 Module 8: Profiling
-**Goal**: Performance measurement
-**Prerequisites**: Module 7 complete
-**Location**: [`modules/profiling/`](../../modules/profiling/)
-
-**Key Tasks**:
-- [ ] Memory profiler
-- [ ] Compute profiler
-- [ ] Bottleneck identification
-- [ ] Performance visualizations
-
----
-
-### 🗜️ Module 9: Compression
-**Goal**: Model compression techniques
-**Prerequisites**: Module 8 complete
-**Location**: [`modules/compression/`](../../modules/compression/)
-
-**Key Tasks**:
-- [ ] Pruning implementation
-- [ ] Quantization
-- [ ] Knowledge distillation
-- [ ] Compression benchmarks
-
----
-
-### ⚡ Module 10: Kernels
-**Goal**: Custom compute kernels
-**Prerequisites**: Module 9 complete
-**Location**: [`modules/kernels/`](../../modules/kernels/)
-
-**Key Tasks**:
-- [ ] CUDA kernel implementation
-- [ ] Performance optimization
-- [ ] Memory coalescing
-- [ ] Kernel benchmarking
-
----
-
-### 📈 Module 11: Benchmarking
-**Goal**: Performance benchmarking
-**Prerequisites**: Module 10 complete
-**Location**: [`modules/benchmarking/`](../../modules/benchmarking/)
-
-**Key Tasks**:
-- [ ] Benchmarking framework
-- [ ] Performance comparisons
-- [ ] Scaling analysis
-- [ ] Optimization recommendations
-
----
-
-### 🚀 Module 12: MLOps
-**Goal**: Production monitoring
-**Prerequisites**: Module 11 complete
-**Location**: [`modules/mlops/`](../../modules/mlops/)
-
-**Key Tasks**:
-- [ ] Model monitoring
-- [ ] Performance tracking
-- [ ] Alert systems
-- [ ] Production deployment
-
-## 🛠️ Essential Commands
-
-### **System Commands**
-```bash
-tito system info              # System information and course navigation
-tito system doctor            # Environment diagnosis
-tito system jupyter           # Start Jupyter Lab
-```
-
-### **Module Development**
-```bash
-tito module status            # Check all module status
-tito module test --module X   # Test specific module
-tito module test --all        # Test all modules
-tito module notebooks --module X  # Convert Python to notebook
-```
-
-### **Package Management**
-```bash
-tito package sync            # Export all notebooks to package
-tito package sync --module X # Export specific module
-tito package reset           # Reset package to clean state
-```
-
-## 🎯 **Success Criteria**
-
-Each module is complete when:
-- [ ] **All tests pass**: `tito module test --module [name]`
-- [ ] **Code exports**: `tito package sync --module [name]`
-- [ ] **Understanding verified**: Can explain key concepts and trade-offs
-- [ ] **Ready for next**: Prerequisites met for following modules
-
-## 🆘 **Getting Help**
-
-### **Troubleshooting**
-- **Environment Issues**: `tito system doctor`
-- **Module Status**: `tito module status --details`
-- **Integration Issues**: Check `tito system info`
-
-### **Resources**
-- **Course Overview**: [Main README](../../README.md)
-- **Development Guide**: [Module Development](../development/module-development-guide.md)
-- **Quick Reference**: [Commands and Patterns](../development/quick-module-reference.md)
-
----
-
-**💡 Pro Tip**: Use `tito module status` regularly to track your progress and see which modules are ready to work on next! 
\ No newline at end of file
diff --git a/gradebook.db b/gradebook.db
deleted file mode 100644
index 7215b814..00000000
Binary files a/gradebook.db and /dev/null differ
diff --git a/gradebook.db.2025-07-12-090245.534037 b/gradebook.db.2025-07-12-090245.534037
deleted file mode 100644
index b679d0dd..00000000
Binary files a/gradebook.db.2025-07-12-090245.534037 and /dev/null differ
diff --git a/modules/00_setup/setup_dev_enhanced.ipynb b/modules/00_setup/setup_dev_enhanced.ipynb
index 5245278b..d05639d5 100644
--- a/modules/00_setup/setup_dev_enhanced.ipynb
+++ b/modules/00_setup/setup_dev_enhanced.ipynb
@@ -2,7 +2,7 @@
  "cells": [
   {
    "cell_type": "markdown",
-   "id": "e3fcd475",
+   "id": "cbc9ef5f",
    "metadata": {
     "cell_marker": "\"\"\""
    },
@@ -36,7 +36,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "fba821b3",
+   "id": "43560ba3",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -46,7 +46,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "16465d62",
+   "id": "516d08d6",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -66,7 +66,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "64d86ea8",
+   "id": "97f21ddb",
    "metadata": {
     "cell_marker": "\"\"\"",
     "lines_to_next_cell": 1
@@ -80,7 +80,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "ab7eb118",
+   "id": "caeb1865",
    "metadata": {
     "lines_to_next_cell": 1
    },
@@ -156,7 +156,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "4b7256a9",
+   "id": "053a090e",
    "metadata": {
     "cell_marker": "\"\"\"",
     "lines_to_next_cell": 1
@@ -170,7 +170,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "2fc78732",
+   "id": "347431b1",
    "metadata": {
     "lines_to_next_cell": 1
    },
@@ -214,7 +214,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "d457e1bf",
+   "id": "300543ef",
    "metadata": {
     "cell_marker": "\"\"\"",
     "lines_to_next_cell": 1
@@ -228,7 +228,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c78b6a2e",
+   "id": "f3d01818",
    "metadata": {
     "lines_to_next_cell": 1
    },
@@ -301,7 +301,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "9aceffc4",
+   "id": "70543e35",
    "metadata": {
     "cell_marker": "\"\"\"",
     "lines_to_next_cell": 1
@@ -315,7 +315,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "e7738e0f",
+   "id": "a837a39f",
    "metadata": {
     "lines_to_next_cell": 1
    },
@@ -367,7 +367,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "da0fd46d",
+   "id": "4884a585",
    "metadata": {
     "cell_marker": "\"\"\"",
     "lines_to_next_cell": 1
@@ -381,7 +381,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "c7cd22cd",
+   "id": "446836a3",
    "metadata": {
     "lines_to_next_cell": 1
    },
@@ -538,12 +538,37 @@
     "        return self.ascii_art\n",
     "        ### END SOLUTION\n",
     "        \n",
+    "        #| exercise_end\n",
+    "\n",
+    "    def get_full_profile(self):\n",
+    "        \"\"\"\n",
+    "        Get complete profile with ASCII art.\n",
+    "        \n",
+    "        Return full profile display including ASCII art and all details.\n",
+    "        \"\"\"\n",
+    "        #| exercise_start\n",
+    "        #| hint: Format with ASCII art, then developer details with emojis\n",
+    "        #| solution_test: Should return complete profile with ASCII art and details\n",
+    "        #| difficulty: medium\n",
+    "        #| points: 10\n",
+    "        \n",
+    "        ### BEGIN SOLUTION\n",
+    "        return f\"\"\"{self.ascii_art}\n",
+    "        \n",
+    "👨‍💻 Developer: {self.name}\n",
+    "🏛️  Affiliation: {self.affiliation}\n",
+    "📧 Email: {self.email}\n",
+    "🐙 GitHub: @{self.github_username}\n",
+    "🔥 Ready to build ML systems from scratch!\n",
+    "\"\"\"\n",
+    "        ### END SOLUTION\n",
+    "        \n",
     "        #| exercise_end"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "c58a5de4",
+   "id": "be5ec710",
    "metadata": {
     "cell_marker": "\"\"\"",
     "lines_to_next_cell": 1
@@ -557,7 +582,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "a74d8133",
+   "id": "29f9103e",
    "metadata": {
     "lines_to_next_cell": 1
    },
@@ -637,7 +662,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "2959453c",
+   "id": "f5335cd2",
    "metadata": {
     "cell_marker": "\"\"\""
    },
@@ -650,7 +675,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "75574cd6",
+   "id": "d979356d",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -667,7 +692,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "e5d4a310",
+   "id": "f07fe977",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -685,7 +710,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "id": "9cd31f75",
+   "id": "92619faf",
    "metadata": {},
    "outputs": [],
    "source": [
@@ -702,7 +727,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "95483816",
+   "id": "eb20d3cd",
    "metadata": {
     "cell_marker": "\"\"\""
    },
diff --git a/modules/00_setup/setup_dev_enhanced.py b/modules/00_setup/setup_dev_enhanced.py
index 7d4bae20..47c519e2 100644
--- a/modules/00_setup/setup_dev_enhanced.py
+++ b/modules/00_setup/setup_dev_enhanced.py
@@ -455,6 +455,31 @@ class DeveloperProfile:
         
         #| exercise_end
 
+    def get_full_profile(self):
+        """
+        Get complete profile with ASCII art.
+        
+        Return full profile display including ASCII art and all details.
+        """
+        #| exercise_start
+        #| hint: Format with ASCII art, then developer details with emojis
+        #| solution_test: Should return complete profile with ASCII art and details
+        #| difficulty: medium
+        #| points: 10
+        
+        ### BEGIN SOLUTION
+        return f"""{self.ascii_art}
+        
+👨‍💻 Developer: {self.name}
+🏛️  Affiliation: {self.affiliation}
+📧 Email: {self.email}
+🐙 GitHub: @{self.github_username}
+🔥 Ready to build ML systems from scratch!
+"""
+        ### END SOLUTION
+        
+        #| exercise_end
+
 # %% [markdown]
 """
 ## Hidden Tests: DeveloperProfile Class (35 Points)
diff --git a/modules/00_setup/tests/test_setup.py b/modules/00_setup/tests/test_setup.py
index 6f449d08..4ef6b755 100644
--- a/modules/00_setup/tests/test_setup.py
+++ b/modules/00_setup/tests/test_setup.py
@@ -7,6 +7,7 @@ import pytest
 import numpy as np
 import sys
 import os
+from pathlib import Path
 
 # Import from the main package (rock solid foundation)
 from tinytorch.core.utils import hello_tinytorch, add_numbers, SystemInfo, DeveloperProfile
@@ -25,8 +26,8 @@ class TestSetupFunctions:
         hello_tinytorch()
         captured = capsys.readouterr()
         
-        # Should print the branding text
-        assert "Tiny🔥Torch" in captured.out
+        # Should print the branding text (flexible matching for unicode)
+        assert "TinyTorch" in captured.out or "Tiny🔥Torch" in captured.out
         assert "Build ML Systems from Scratch!" in captured.out
     
     def test_add_numbers_basic(self):
diff --git a/modules/04_networks/tests/test_networks.py b/modules/04_networks/tests/test_networks.py
index a612cbd2..14b59119 100644
--- a/modules/04_networks/tests/test_networks.py
+++ b/modules/04_networks/tests/test_networks.py
@@ -20,7 +20,8 @@ from tinytorch.core.activations import ReLU, Sigmoid, Tanh
 
 # Import the networks module
 try:
-    from modules.04_networks.networks_dev import (
+    # Import from the exported package
+    from tinytorch.core.networks import (
         Sequential, 
         create_mlp, 
         create_classification_network,
diff --git a/modules/05_cnn/tests/test_cnn.py b/modules/05_cnn/tests/test_cnn.py
index f98619d5..55752244 100644
--- a/modules/05_cnn/tests/test_cnn.py
+++ b/modules/05_cnn/tests/test_cnn.py
@@ -1,6 +1,18 @@
 import numpy as np
 import pytest
-from modules.cnn.cnn_dev import conv2d_naive, Conv2D
+import sys
+from pathlib import Path
+
+# Add the CNN module to the path
+sys.path.append(str(Path(__file__).parent.parent))
+
+try:
+    # Import from the exported package
+    from tinytorch.core.cnn import conv2d_naive, Conv2D
+except ImportError:
+    # Fallback for when module isn't exported yet
+    from cnn_dev import conv2d_naive, Conv2D
+
 from tinytorch.core.tensor import Tensor
 
 def test_conv2d_naive_small():
diff --git a/modules/06_dataloader/tests/test_dataloader.py b/modules/06_dataloader/tests/test_dataloader.py
index b449b063..ab3362a5 100644
--- a/modules/06_dataloader/tests/test_dataloader.py
+++ b/modules/06_dataloader/tests/test_dataloader.py
@@ -9,6 +9,7 @@ import sys
 import os
 import tempfile
 import shutil
+import pickle
 from pathlib import Path
 from unittest.mock import patch, MagicMock
 
diff --git a/tinytorch/_modidx.py b/tinytorch/_modidx.py
index 384e8634..99b64591 100644
--- a/tinytorch/_modidx.py
+++ b/tinytorch/_modidx.py
@@ -5,36 +5,42 @@ d = { 'settings': { 'branch': 'main',
                 'doc_host': 'https://tinytorch.github.io',
                 'git_url': 'https://github.com/tinytorch/TinyTorch/',
                 'lib_path': 'tinytorch'},
-  'syms': { 'tinytorch.core.activations': { 'tinytorch.core.activations.ReLU': ( 'activations/activations_dev.html#relu',
+  'syms': { 'tinytorch.core.activations': { 'tinytorch.core.activations.ReLU': ( '02_activations/activations_dev.html#relu',
                                                                                  'tinytorch/core/activations.py'),
-                                            'tinytorch.core.activations.ReLU.__call__': ( 'activations/activations_dev.html#relu.__call__',
+                                            'tinytorch.core.activations.ReLU.__call__': ( '02_activations/activations_dev.html#relu.__call__',
                                                                                           'tinytorch/core/activations.py'),
-                                            'tinytorch.core.activations.ReLU.forward': ( 'activations/activations_dev.html#relu.forward',
+                                            'tinytorch.core.activations.ReLU.forward': ( '02_activations/activations_dev.html#relu.forward',
                                                                                          'tinytorch/core/activations.py'),
-                                            'tinytorch.core.activations.Sigmoid': ( 'activations/activations_dev.html#sigmoid',
+                                            'tinytorch.core.activations.Sigmoid': ( '02_activations/activations_dev.html#sigmoid',
                                                                                     'tinytorch/core/activations.py'),
-                                            'tinytorch.core.activations.Sigmoid.__call__': ( 'activations/activations_dev.html#sigmoid.__call__',
+                                            'tinytorch.core.activations.Sigmoid.__call__': ( '02_activations/activations_dev.html#sigmoid.__call__',
                                                                                              'tinytorch/core/activations.py'),
-                                            'tinytorch.core.activations.Sigmoid.forward': ( 'activations/activations_dev.html#sigmoid.forward',
+                                            'tinytorch.core.activations.Sigmoid.forward': ( '02_activations/activations_dev.html#sigmoid.forward',
                                                                                             'tinytorch/core/activations.py'),
-                                            'tinytorch.core.activations.Softmax': ( 'activations/activations_dev.html#softmax',
+                                            'tinytorch.core.activations.Softmax': ( '02_activations/activations_dev.html#softmax',
                                                                                     'tinytorch/core/activations.py'),
-                                            'tinytorch.core.activations.Softmax.__call__': ( 'activations/activations_dev.html#softmax.__call__',
+                                            'tinytorch.core.activations.Softmax.__call__': ( '02_activations/activations_dev.html#softmax.__call__',
                                                                                              'tinytorch/core/activations.py'),
-                                            'tinytorch.core.activations.Softmax.forward': ( 'activations/activations_dev.html#softmax.forward',
+                                            'tinytorch.core.activations.Softmax.forward': ( '02_activations/activations_dev.html#softmax.forward',
                                                                                             'tinytorch/core/activations.py'),
-                                            'tinytorch.core.activations.Tanh': ( 'activations/activations_dev.html#tanh',
+                                            'tinytorch.core.activations.Tanh': ( '02_activations/activations_dev.html#tanh',
                                                                                  'tinytorch/core/activations.py'),
-                                            'tinytorch.core.activations.Tanh.__call__': ( 'activations/activations_dev.html#tanh.__call__',
+                                            'tinytorch.core.activations.Tanh.__call__': ( '02_activations/activations_dev.html#tanh.__call__',
                                                                                           'tinytorch/core/activations.py'),
-                                            'tinytorch.core.activations.Tanh.forward': ( 'activations/activations_dev.html#tanh.forward',
-                                                                                         'tinytorch/core/activations.py')},
-            'tinytorch.core.cnn': { 'tinytorch.core.cnn.Conv2D': ('cnn/cnn_dev.html#conv2d', 'tinytorch/core/cnn.py'),
-                                    'tinytorch.core.cnn.Conv2D.__call__': ('cnn/cnn_dev.html#conv2d.__call__', 'tinytorch/core/cnn.py'),
-                                    'tinytorch.core.cnn.Conv2D.__init__': ('cnn/cnn_dev.html#conv2d.__init__', 'tinytorch/core/cnn.py'),
-                                    'tinytorch.core.cnn.Conv2D.forward': ('cnn/cnn_dev.html#conv2d.forward', 'tinytorch/core/cnn.py'),
-                                    'tinytorch.core.cnn.conv2d_naive': ('cnn/cnn_dev.html#conv2d_naive', 'tinytorch/core/cnn.py'),
-                                    'tinytorch.core.cnn.flatten': ('cnn/cnn_dev.html#flatten', 'tinytorch/core/cnn.py')},
+                                            'tinytorch.core.activations.Tanh.forward': ( '02_activations/activations_dev.html#tanh.forward',
+                                                                                         'tinytorch/core/activations.py'),
+                                            'tinytorch.core.activations._should_show_plots': ( '02_activations/activations_dev.html#_should_show_plots',
+                                                                                               'tinytorch/core/activations.py'),
+                                            'tinytorch.core.activations.visualize_activation_function': ( '02_activations/activations_dev.html#visualize_activation_function',
+                                                                                                          'tinytorch/core/activations.py'),
+                                            'tinytorch.core.activations.visualize_activation_on_data': ( '02_activations/activations_dev.html#visualize_activation_on_data',
+                                                                                                         'tinytorch/core/activations.py')},
+            'tinytorch.core.cnn': { 'tinytorch.core.cnn.Conv2D': ('05_cnn/cnn_dev.html#conv2d', 'tinytorch/core/cnn.py'),
+                                    'tinytorch.core.cnn.Conv2D.__call__': ('05_cnn/cnn_dev.html#conv2d.__call__', 'tinytorch/core/cnn.py'),
+                                    'tinytorch.core.cnn.Conv2D.__init__': ('05_cnn/cnn_dev.html#conv2d.__init__', 'tinytorch/core/cnn.py'),
+                                    'tinytorch.core.cnn.Conv2D.forward': ('05_cnn/cnn_dev.html#conv2d.forward', 'tinytorch/core/cnn.py'),
+                                    'tinytorch.core.cnn.conv2d_naive': ('05_cnn/cnn_dev.html#conv2d_naive', 'tinytorch/core/cnn.py'),
+                                    'tinytorch.core.cnn.flatten': ('05_cnn/cnn_dev.html#flatten', 'tinytorch/core/cnn.py')},
             'tinytorch.core.dataloader': { 'tinytorch.core.dataloader.CIFAR10Dataset': ( 'dataloader/dataloader_dev.html#cifar10dataset',
                                                                                          'tinytorch/core/dataloader.py'),
                                            'tinytorch.core.dataloader.CIFAR10Dataset.__getitem__': ( 'dataloader/dataloader_dev.html#cifar10dataset.__getitem__',
@@ -79,54 +85,59 @@ d = { 'settings': { 'branch': 'main',
                                                                                              'tinytorch/core/dataloader.py'),
                                            'tinytorch.core.dataloader.create_data_pipeline': ( 'dataloader/dataloader_dev.html#create_data_pipeline',
                                                                                                'tinytorch/core/dataloader.py')},
-            'tinytorch.core.layers': { 'tinytorch.core.layers.Dense': ('layers/layers_dev.html#dense', 'tinytorch/core/layers.py'),
-                                       'tinytorch.core.layers.Dense.__call__': ( 'layers/layers_dev.html#dense.__call__',
+            'tinytorch.core.layers': { 'tinytorch.core.layers.Dense': ('03_layers/layers_dev.html#dense', 'tinytorch/core/layers.py'),
+                                       'tinytorch.core.layers.Dense.__call__': ( '03_layers/layers_dev.html#dense.__call__',
                                                                                  'tinytorch/core/layers.py'),
-                                       'tinytorch.core.layers.Dense.__init__': ( 'layers/layers_dev.html#dense.__init__',
+                                       'tinytorch.core.layers.Dense.__init__': ( '03_layers/layers_dev.html#dense.__init__',
                                                                                  'tinytorch/core/layers.py'),
-                                       'tinytorch.core.layers.Dense.forward': ( 'layers/layers_dev.html#dense.forward',
+                                       'tinytorch.core.layers.Dense.forward': ( '03_layers/layers_dev.html#dense.forward',
                                                                                 'tinytorch/core/layers.py'),
-                                       'tinytorch.core.layers.matmul_naive': ( 'layers/layers_dev.html#matmul_naive',
+                                       'tinytorch.core.layers.matmul_naive': ( '03_layers/layers_dev.html#matmul_naive',
                                                                                'tinytorch/core/layers.py')},
-            'tinytorch.core.networks': { 'tinytorch.core.networks.Sequential': ( 'networks/networks_dev.html#sequential',
+            'tinytorch.core.networks': { 'tinytorch.core.networks.Sequential': ( '04_networks/networks_dev.html#sequential',
                                                                                  'tinytorch/core/networks.py'),
-                                         'tinytorch.core.networks.Sequential.__call__': ( 'networks/networks_dev.html#sequential.__call__',
+                                         'tinytorch.core.networks.Sequential.__call__': ( '04_networks/networks_dev.html#sequential.__call__',
                                                                                           'tinytorch/core/networks.py'),
-                                         'tinytorch.core.networks.Sequential.__init__': ( 'networks/networks_dev.html#sequential.__init__',
+                                         'tinytorch.core.networks.Sequential.__init__': ( '04_networks/networks_dev.html#sequential.__init__',
                                                                                           'tinytorch/core/networks.py'),
-                                         'tinytorch.core.networks.Sequential.forward': ( 'networks/networks_dev.html#sequential.forward',
+                                         'tinytorch.core.networks.Sequential.forward': ( '04_networks/networks_dev.html#sequential.forward',
                                                                                          'tinytorch/core/networks.py'),
-                                         'tinytorch.core.networks._should_show_plots': ( 'networks/networks_dev.html#_should_show_plots',
+                                         'tinytorch.core.networks._should_show_plots': ( '04_networks/networks_dev.html#_should_show_plots',
                                                                                          'tinytorch/core/networks.py'),
-                                         'tinytorch.core.networks.analyze_network_behavior': ( 'networks/networks_dev.html#analyze_network_behavior',
+                                         'tinytorch.core.networks.analyze_network_behavior': ( '04_networks/networks_dev.html#analyze_network_behavior',
                                                                                                'tinytorch/core/networks.py'),
-                                         'tinytorch.core.networks.compare_networks': ( 'networks/networks_dev.html#compare_networks',
+                                         'tinytorch.core.networks.compare_networks': ( '04_networks/networks_dev.html#compare_networks',
                                                                                        'tinytorch/core/networks.py'),
-                                         'tinytorch.core.networks.create_classification_network': ( 'networks/networks_dev.html#create_classification_network',
+                                         'tinytorch.core.networks.create_classification_network': ( '04_networks/networks_dev.html#create_classification_network',
                                                                                                     'tinytorch/core/networks.py'),
-                                         'tinytorch.core.networks.create_mlp': ( 'networks/networks_dev.html#create_mlp',
+                                         'tinytorch.core.networks.create_mlp': ( '04_networks/networks_dev.html#create_mlp',
                                                                                  'tinytorch/core/networks.py'),
-                                         'tinytorch.core.networks.create_regression_network': ( 'networks/networks_dev.html#create_regression_network',
+                                         'tinytorch.core.networks.create_regression_network': ( '04_networks/networks_dev.html#create_regression_network',
                                                                                                 'tinytorch/core/networks.py'),
-                                         'tinytorch.core.networks.visualize_data_flow': ( 'networks/networks_dev.html#visualize_data_flow',
+                                         'tinytorch.core.networks.visualize_data_flow': ( '04_networks/networks_dev.html#visualize_data_flow',
                                                                                           'tinytorch/core/networks.py'),
-                                         'tinytorch.core.networks.visualize_network_architecture': ( 'networks/networks_dev.html#visualize_network_architecture',
+                                         'tinytorch.core.networks.visualize_network_architecture': ( '04_networks/networks_dev.html#visualize_network_architecture',
                                                                                                      'tinytorch/core/networks.py')},
-            'tinytorch.core.tensor': { 'tinytorch.core.tensor.Tensor': ('tensor/tensor_dev.html#tensor', 'tinytorch/core/tensor.py'),
-                                       'tinytorch.core.tensor.Tensor.__init__': ( 'tensor/tensor_dev.html#tensor.__init__',
+            'tinytorch.core.tensor': { 'tinytorch.core.tensor.Tensor': ( '01_tensor/tensor_dev_enhanced.html#tensor',
+                                                                         'tinytorch/core/tensor.py'),
+                                       'tinytorch.core.tensor.Tensor.__init__': ( '01_tensor/tensor_dev_enhanced.html#tensor.__init__',
                                                                                   'tinytorch/core/tensor.py'),
-                                       'tinytorch.core.tensor.Tensor.__repr__': ( 'tensor/tensor_dev.html#tensor.__repr__',
+                                       'tinytorch.core.tensor.Tensor.__repr__': ( '01_tensor/tensor_dev_enhanced.html#tensor.__repr__',
                                                                                   'tinytorch/core/tensor.py'),
-                                       'tinytorch.core.tensor.Tensor.data': ( 'tensor/tensor_dev.html#tensor.data',
+                                       'tinytorch.core.tensor.Tensor.add': ( '01_tensor/tensor_dev_enhanced.html#tensor.add',
+                                                                             'tinytorch/core/tensor.py'),
+                                       'tinytorch.core.tensor.Tensor.data': ( '01_tensor/tensor_dev_enhanced.html#tensor.data',
                                                                               'tinytorch/core/tensor.py'),
-                                       'tinytorch.core.tensor.Tensor.dtype': ( 'tensor/tensor_dev.html#tensor.dtype',
+                                       'tinytorch.core.tensor.Tensor.dtype': ( '01_tensor/tensor_dev_enhanced.html#tensor.dtype',
                                                                                'tinytorch/core/tensor.py'),
-                                       'tinytorch.core.tensor.Tensor.shape': ( 'tensor/tensor_dev.html#tensor.shape',
+                                       'tinytorch.core.tensor.Tensor.matmul': ( '01_tensor/tensor_dev_enhanced.html#tensor.matmul',
+                                                                                'tinytorch/core/tensor.py'),
+                                       'tinytorch.core.tensor.Tensor.multiply': ( '01_tensor/tensor_dev_enhanced.html#tensor.multiply',
+                                                                                  'tinytorch/core/tensor.py'),
+                                       'tinytorch.core.tensor.Tensor.shape': ( '01_tensor/tensor_dev_enhanced.html#tensor.shape',
                                                                                'tinytorch/core/tensor.py'),
-                                       'tinytorch.core.tensor.Tensor.size': ( 'tensor/tensor_dev.html#tensor.size',
-                                                                              'tinytorch/core/tensor.py'),
-                                       'tinytorch.core.tensor._add_arithmetic_methods': ( 'tensor/tensor_dev.html#_add_arithmetic_methods',
-                                                                                          'tinytorch/core/tensor.py')},
+                                       'tinytorch.core.tensor.Tensor.size': ( '01_tensor/tensor_dev_enhanced.html#tensor.size',
+                                                                              'tinytorch/core/tensor.py')},
             'tinytorch.core.utils': { 'tinytorch.core.utils.DeveloperProfile': ( '00_setup/setup_dev_enhanced.html#developerprofile',
                                                                                  'tinytorch/core/utils.py'),
                                       'tinytorch.core.utils.DeveloperProfile.__init__': ( '00_setup/setup_dev_enhanced.html#developerprofile.__init__',
@@ -137,6 +148,8 @@ d = { 'settings': { 'branch': 'main',
                                                                                                      'tinytorch/core/utils.py'),
                                       'tinytorch.core.utils.DeveloperProfile.get_ascii_art': ( '00_setup/setup_dev_enhanced.html#developerprofile.get_ascii_art',
                                                                                                'tinytorch/core/utils.py'),
+                                      'tinytorch.core.utils.DeveloperProfile.get_full_profile': ( '00_setup/setup_dev_enhanced.html#developerprofile.get_full_profile',
+                                                                                                  'tinytorch/core/utils.py'),
                                       'tinytorch.core.utils.DeveloperProfile.get_signature': ( '00_setup/setup_dev_enhanced.html#developerprofile.get_signature',
                                                                                                'tinytorch/core/utils.py'),
                                       'tinytorch.core.utils.SystemInfo': ( '00_setup/setup_dev_enhanced.html#systeminfo',
diff --git a/tinytorch/core/activations.py b/tinytorch/core/activations.py
index 021fdcff..1219eed8 100644
--- a/tinytorch/core/activations.py
+++ b/tinytorch/core/activations.py
@@ -1,9 +1,9 @@
-# AUTOGENERATED! DO NOT EDIT! File to edit: ../../modules/activations/activations_dev.ipynb.
+# AUTOGENERATED! DO NOT EDIT! File to edit: ../../modules/02_activations/activations_dev.ipynb.
 
 # %% auto 0
-__all__ = ['ReLU', 'Sigmoid', 'Tanh', 'Softmax']
+__all__ = ['visualize_activation_function', 'visualize_activation_on_data', 'ReLU', 'Sigmoid', 'Tanh', 'Softmax']
 
-# %% ../../modules/activations/activations_dev.ipynb 5
+# %% ../../modules/02_activations/activations_dev.ipynb 2
 import math
 import numpy as np
 import matplotlib.pyplot as plt
@@ -11,157 +11,265 @@ import os
 import sys
 from typing import Union, List
 
-# Import our Tensor class
-from tinytorch.core.tensor import Tensor
+# Import our Tensor class from the main package (rock solid foundation)
+from .tensor import Tensor
 
-# %% ../../modules/activations/activations_dev.ipynb 5
+# %% ../../modules/02_activations/activations_dev.ipynb 3
+def _should_show_plots():
+    """Check if we should show plots (disable during testing)"""
+    # Check multiple conditions that indicate we're in test mode
+    is_pytest = (
+        'pytest' in sys.modules or
+        'test' in sys.argv or
+        os.environ.get('PYTEST_CURRENT_TEST') is not None or
+        any('test' in arg for arg in sys.argv) or
+        any('pytest' in arg for arg in sys.argv)
+    )
+    
+    # Show plots in development mode (when not in test mode)
+    return not is_pytest
+
+# %% ../../modules/02_activations/activations_dev.ipynb 4
+def visualize_activation_function(activation_fn, name: str, x_range: tuple = (-5, 5), num_points: int = 100):
+    """Visualize an activation function's behavior"""
+    if not _should_show_plots():
+        return
+        
+    try:
+        
+        # Generate input values
+        x_vals = np.linspace(x_range[0], x_range[1], num_points)
+        
+        # Apply activation function
+        y_vals = []
+        for x in x_vals:
+            input_tensor = Tensor([[x]])
+            output = activation_fn(input_tensor)
+            y_vals.append(output.data.item())
+        
+        # Create plot
+        plt.figure(figsize=(10, 6))
+        plt.plot(x_vals, y_vals, 'b-', linewidth=2, label=f'{name} Activation')
+        plt.grid(True, alpha=0.3)
+        plt.xlabel('Input (x)')
+        plt.ylabel(f'{name}(x)')
+        plt.title(f'{name} Activation Function')
+        plt.legend()
+        plt.show()
+        
+    except ImportError:
+        print("   📊 Matplotlib not available - skipping visualization")
+    except Exception as e:
+        print(f"   ⚠️  Visualization error: {e}")
+
+def visualize_activation_on_data(activation_fn, name: str, data: Tensor):
+    """Show activation function applied to sample data"""
+    if not _should_show_plots():
+        return
+        
+    try:
+        output = activation_fn(data)
+        print(f"   📊 {name} Example:")
+        print(f"      Input:  {data.data.flatten()}")
+        print(f"      Output: {output.data.flatten()}")
+        print(f"      Range:  [{output.data.min():.3f}, {output.data.max():.3f}]")
+        
+    except Exception as e:
+        print(f"   ⚠️  Data visualization error: {e}")
+
+# %% ../../modules/02_activations/activations_dev.ipynb 7
 class ReLU:
     """
-    ReLU Activation: f(x) = max(0, x)
+    ReLU Activation Function: f(x) = max(0, x)
     
     The most popular activation function in deep learning.
-    Simple, effective, and computationally efficient.
-    
-    TODO: Implement ReLU activation function.
+    Simple, fast, and effective for most applications.
     """
     
     def forward(self, x: Tensor) -> Tensor:
         """
-        Apply ReLU: f(x) = max(0, x)
+        Apply ReLU activation: f(x) = max(0, x)
         
-        Args:
-            x: Input tensor
-            
-        Returns:
-            Output tensor with ReLU applied element-wise
-            
-        TODO: Implement element-wise max(0, x) operation
-        Hint: Use np.maximum(0, x.data)
+        TODO: Implement ReLU activation
+        
+        APPROACH:
+        1. For each element in the input tensor, apply max(0, element)
+        2. Return a new Tensor with the results
+        
+        EXAMPLE:
+        Input: Tensor([[-1, 0, 1, 2, -3]])
+        Expected: Tensor([[0, 0, 1, 2, 0]])
+        
+        HINTS:
+        - Use np.maximum(0, x.data) for element-wise max
+        - Remember to return a new Tensor object
+        - The shape should remain the same as input
         """
         raise NotImplementedError("Student implementation required")
     
     def __call__(self, x: Tensor) -> Tensor:
-        """Make activation callable: relu(x) same as relu.forward(x)"""
+        """Allow calling the activation like a function: relu(x)"""
         return self.forward(x)
 
-# %% ../../modules/activations/activations_dev.ipynb 6
+# %% ../../modules/02_activations/activations_dev.ipynb 8
 class ReLU:
     """ReLU Activation: f(x) = max(0, x)"""
     
     def forward(self, x: Tensor) -> Tensor:
-        """Apply ReLU: f(x) = max(0, x)"""
-        return Tensor(np.maximum(0, x.data))
-    
+        result = np.maximum(0, x.data)
+        return Tensor(result)
+        
     def __call__(self, x: Tensor) -> Tensor:
         return self.forward(x)
 
-# %% ../../modules/activations/activations_dev.ipynb 12
+# %% ../../modules/02_activations/activations_dev.ipynb 13
 class Sigmoid:
     """
-    Sigmoid Activation: f(x) = 1 / (1 + e^(-x))
+    Sigmoid Activation Function: f(x) = 1 / (1 + e^(-x))
     
-    Squashes input to range (0, 1). Often used for binary classification.
-    
-    TODO: Implement Sigmoid activation function.
+    Squashes inputs to the range (0, 1), useful for binary classification
+    and probability interpretation.
     """
     
     def forward(self, x: Tensor) -> Tensor:
         """
-        Apply Sigmoid: f(x) = 1 / (1 + e^(-x))
+        Apply Sigmoid activation: f(x) = 1 / (1 + e^(-x))
         
-        Args:
-            x: Input tensor
-            
-        Returns:
-            Output tensor with Sigmoid applied element-wise
-            
-        TODO: Implement sigmoid function (be careful with numerical stability!)
+        TODO: Implement Sigmoid activation
         
-        Hint: For numerical stability, use:
-        - For x >= 0: sigmoid(x) = 1 / (1 + exp(-x))
-        - For x < 0: sigmoid(x) = exp(x) / (1 + exp(x))
+        APPROACH:
+        1. For numerical stability, clip x to reasonable range (e.g., -500 to 500)
+        2. Compute 1 / (1 + exp(-x)) for each element
+        3. Return a new Tensor with the results
+        
+        EXAMPLE:
+        Input: Tensor([[-2, -1, 0, 1, 2]])
+        Expected: Tensor([[0.119, 0.269, 0.5, 0.731, 0.881]]) (approximately)
+        
+        HINTS:
+        - Use np.clip(x.data, -500, 500) for numerical stability
+        - Use np.exp(-clipped_x) for the exponential
+        - Formula: 1 / (1 + np.exp(-clipped_x))
+        - Remember to return a new Tensor object
         """
         raise NotImplementedError("Student implementation required")
     
     def __call__(self, x: Tensor) -> Tensor:
+        """Allow calling the activation like a function: sigmoid(x)"""
         return self.forward(x)
 
-# %% ../../modules/activations/activations_dev.ipynb 13
+# %% ../../modules/02_activations/activations_dev.ipynb 14
 class Sigmoid:
     """Sigmoid Activation: f(x) = 1 / (1 + e^(-x))"""
     
     def forward(self, x: Tensor) -> Tensor:
-        """Apply Sigmoid with numerical stability"""
-        # Use the numerically stable version to avoid overflow
-        # For x >= 0: sigmoid(x) = 1 / (1 + exp(-x))
-        # For x < 0: sigmoid(x) = exp(x) / (1 + exp(x))
-        x_data = x.data
-        result = np.zeros_like(x_data)
-        
-        # Stable computation
-        positive_mask = x_data >= 0
-        result[positive_mask] = 1.0 / (1.0 + np.exp(-x_data[positive_mask]))
-        result[~positive_mask] = np.exp(x_data[~positive_mask]) / (1.0 + np.exp(x_data[~positive_mask]))
-        
+        # Clip for numerical stability
+        clipped = np.clip(x.data, -500, 500)
+        result = 1 / (1 + np.exp(-clipped))
         return Tensor(result)
-    
+        
     def __call__(self, x: Tensor) -> Tensor:
         return self.forward(x)
 
-# %% ../../modules/activations/activations_dev.ipynb 19
+# %% ../../modules/02_activations/activations_dev.ipynb 18
 class Tanh:
     """
-    Tanh Activation: f(x) = tanh(x)
+    Tanh Activation Function: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
     
-    Squashes input to range (-1, 1). Zero-centered output.
-    
-    TODO: Implement Tanh activation function.
+    Zero-centered activation function with range (-1, 1).
+    Often preferred over Sigmoid for hidden layers.
     """
     
     def forward(self, x: Tensor) -> Tensor:
         """
-        Apply Tanh: f(x) = tanh(x)
+        Apply Tanh activation: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
         
-        Args:
-            x: Input tensor
-            
-        Returns:
-            Output tensor with Tanh applied element-wise
-            
-        TODO: Implement tanh function
-        Hint: Use np.tanh(x.data)
+        TODO: Implement Tanh activation
+        
+        APPROACH:
+        1. Use numpy's built-in tanh function: np.tanh(x.data)
+        2. Return a new Tensor with the results
+        
+        ALTERNATIVE APPROACH:
+        1. Compute e^x and e^(-x)
+        2. Use formula: (e^x - e^(-x)) / (e^x + e^(-x))
+        
+        EXAMPLE:
+        Input: Tensor([[-2, -1, 0, 1, 2]])
+        Expected: Tensor([[-0.964, -0.762, 0.0, 0.762, 0.964]]) (approximately)
+        
+        HINTS:
+        - np.tanh() is the simplest approach
+        - Output range is (-1, 1)
+        - tanh(0) = 0 (zero-centered)
+        - Remember to return a new Tensor object
         """
         raise NotImplementedError("Student implementation required")
     
     def __call__(self, x: Tensor) -> Tensor:
+        """Allow calling the activation like a function: tanh(x)"""
         return self.forward(x)
 
-# %% ../../modules/activations/activations_dev.ipynb 20
+# %% ../../modules/02_activations/activations_dev.ipynb 19
 class Tanh:
-    """Tanh Activation: f(x) = tanh(x)"""
+    """Tanh Activation: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))"""
     
     def forward(self, x: Tensor) -> Tensor:
-        """Apply Tanh"""
-        return Tensor(np.tanh(x.data))
-    
+        result = np.tanh(x.data)
+        return Tensor(result)
+        
     def __call__(self, x: Tensor) -> Tensor:
         return self.forward(x)
 
+# %% ../../modules/02_activations/activations_dev.ipynb 23
 class Softmax:
-    """Softmax Activation: f(x) = exp(x) / sum(exp(x))"""
+    """
+    Softmax Activation Function: f(x_i) = e^(x_i) / Σ(e^(x_j))
+    
+    Converts a vector of real numbers into a probability distribution.
+    Essential for multi-class classification.
+    """
     
     def forward(self, x: Tensor) -> Tensor:
-        """Apply Softmax with numerical stability"""
-        # Subtract max for numerical stability
-        x_stable = x.data - np.max(x.data, axis=-1, keepdims=True)
+        """
+        Apply Softmax activation: f(x_i) = e^(x_i) / Σ(e^(x_j))
         
-        # Compute exponentials
-        exp_vals = np.exp(x_stable)
+        TODO: Implement Softmax activation
         
-        # Normalize to get probabilities
-        result = exp_vals / np.sum(exp_vals, axis=-1, keepdims=True)
+        APPROACH:
+        1. For numerical stability, subtract the maximum value from each row
+        2. Compute exponentials of the shifted values
+        3. Divide each exponential by the sum of exponentials in its row
+        4. Return a new Tensor with the results
         
-        return Tensor(result)
+        EXAMPLE:
+        Input: Tensor([[1, 2, 3]])
+        Expected: Tensor([[0.090, 0.245, 0.665]]) (approximately)
+        Sum should be 1.0
+        
+        HINTS:
+        - Use np.max(x.data, axis=1, keepdims=True) to find row maximums
+        - Subtract max from x.data for numerical stability
+        - Use np.exp() for exponentials
+        - Use np.sum(exp_vals, axis=1, keepdims=True) for row sums
+        - Remember to return a new Tensor object
+        """
+        raise NotImplementedError("Student implementation required")
     
+    def __call__(self, x: Tensor) -> Tensor:
+        """Allow calling the activation like a function: softmax(x)"""
+        return self.forward(x)
+
+# %% ../../modules/02_activations/activations_dev.ipynb 24
+class Softmax:
+    """Softmax Activation: f(x_i) = e^(x_i) / Σ(e^(x_j))"""
+    
+    def forward(self, x: Tensor) -> Tensor:
+        # Subtract max for numerical stability
+        shifted = x.data - np.max(x.data, axis=1, keepdims=True)
+        exp_vals = np.exp(shifted)
+        result = exp_vals / np.sum(exp_vals, axis=1, keepdims=True)
+        return Tensor(result)
+        
     def __call__(self, x: Tensor) -> Tensor:
         return self.forward(x)
diff --git a/tinytorch/core/cnn.py b/tinytorch/core/cnn.py
index 177d6b80..58f0d221 100644
--- a/tinytorch/core/cnn.py
+++ b/tinytorch/core/cnn.py
@@ -1,22 +1,61 @@
-# AUTOGENERATED! DO NOT EDIT! File to edit: ../../modules/cnn/cnn_dev.ipynb.
+# AUTOGENERATED! DO NOT EDIT! File to edit: ../../modules/05_cnn/cnn_dev.ipynb.
 
 # %% auto 0
 __all__ = ['conv2d_naive', 'Conv2D', 'flatten']
 
-# %% ../../modules/cnn/cnn_dev.ipynb 4
+# %% ../../modules/05_cnn/cnn_dev.ipynb 3
+import numpy as np
+from typing import List, Tuple, Optional
+from .tensor import Tensor
+
+# Setup and imports (for development)
+import matplotlib.pyplot as plt
+from .layers import Dense
+from .activations import ReLU
+
+# %% ../../modules/05_cnn/cnn_dev.ipynb 5
 def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:
     """
     Naive 2D convolution (single channel, no stride, no padding).
+    
     Args:
         input: 2D input array (H, W)
         kernel: 2D filter (kH, kW)
     Returns:
         2D output array (H-kH+1, W-kW+1)
+        
     TODO: Implement the sliding window convolution using for-loops.
+    
+    APPROACH:
+    1. Get input dimensions: H, W = input.shape
+    2. Get kernel dimensions: kH, kW = kernel.shape
+    3. Calculate output dimensions: out_H = H - kH + 1, out_W = W - kW + 1
+    4. Create output array: np.zeros((out_H, out_W))
+    5. Use nested loops to slide the kernel:
+       - i loop: output rows (0 to out_H-1)
+       - j loop: output columns (0 to out_W-1)
+       - di loop: kernel rows (0 to kH-1)
+       - dj loop: kernel columns (0 to kW-1)
+    6. For each (i,j), compute: output[i,j] += input[i+di, j+dj] * kernel[di, dj]
+    
+    EXAMPLE:
+    Input: [[1, 2, 3],     Kernel: [[1, 0],
+            [4, 5, 6],               [0, -1]]
+            [7, 8, 9]]
+    
+    Output[0,0] = 1*1 + 2*0 + 4*0 + 5*(-1) = 1 - 5 = -4
+    Output[0,1] = 2*1 + 3*0 + 5*0 + 6*(-1) = 2 - 6 = -4
+    Output[1,0] = 4*1 + 5*0 + 7*0 + 8*(-1) = 4 - 8 = -4
+    Output[1,1] = 5*1 + 6*0 + 8*0 + 9*(-1) = 5 - 9 = -4
+    
+    HINTS:
+    - Start with output = np.zeros((out_H, out_W))
+    - Use four nested loops: for i in range(out_H): for j in range(out_W): for di in range(kH): for dj in range(kW):
+    - Accumulate the sum: output[i,j] += input[i+di, j+dj] * kernel[di, dj]
     """
     raise NotImplementedError("Student implementation required")
 
-# %% ../../modules/cnn/cnn_dev.ipynb 5
+# %% ../../modules/05_cnn/cnn_dev.ipynb 6
 def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:
     H, W = input.shape
     kH, kW = kernel.shape
@@ -24,34 +63,134 @@ def conv2d_naive(input: np.ndarray, kernel: np.ndarray) -> np.ndarray:
     output = np.zeros((out_H, out_W), dtype=input.dtype)
     for i in range(out_H):
         for j in range(out_W):
-            output[i, j] = np.sum(input[i:i+kH, j:j+kW] * kernel)
+            for di in range(kH):
+                for dj in range(kW):
+                    output[i, j] += input[i + di, j + dj] * kernel[di, dj]
     return output
 
-# %% ../../modules/cnn/cnn_dev.ipynb 9
+# %% ../../modules/05_cnn/cnn_dev.ipynb 12
 class Conv2D:
     """
     2D Convolutional Layer (single channel, single filter, no stride/pad).
+    
     Args:
-        kernel_size: (kH, kW)
+        kernel_size: (kH, kW) - size of the convolution kernel
+        
     TODO: Initialize a random kernel and implement the forward pass using conv2d_naive.
+    
+    APPROACH:
+    1. Store kernel_size as instance variable
+    2. Initialize random kernel with small values
+    3. Implement forward pass using conv2d_naive function
+    4. Return Tensor wrapped around the result
+    
+    EXAMPLE:
+    layer = Conv2D(kernel_size=(2, 2))
+    x = Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # shape (3, 3)
+    y = layer(x)  # shape (2, 2)
+    
+    HINTS:
+    - Store kernel_size as (kH, kW)
+    - Initialize kernel with np.random.randn(kH, kW) * 0.1 (small values)
+    - Use conv2d_naive(x.data, self.kernel) in forward pass
+    - Return Tensor(result) to wrap the result
     """
     def __init__(self, kernel_size: Tuple[int, int]):
+        """
+        Initialize Conv2D layer with random kernel.
+        
+        Args:
+            kernel_size: (kH, kW) - size of the convolution kernel
+            
+        TODO: 
+        1. Store kernel_size as instance variable
+        2. Initialize random kernel with small values
+        3. Scale kernel values to prevent large outputs
+        
+        STEP-BY-STEP:
+        1. Store kernel_size as self.kernel_size
+        2. Unpack kernel_size into kH, kW
+        3. Initialize kernel: np.random.randn(kH, kW) * 0.1
+        4. Convert to float32 for consistency
+        
+        EXAMPLE:
+        Conv2D((2, 2)) creates:
+        - kernel: shape (2, 2) with small random values
+        """
         raise NotImplementedError("Student implementation required")
+    
     def forward(self, x: Tensor) -> Tensor:
+        """
+        Forward pass: apply convolution to input.
+        
+        Args:
+            x: Input tensor of shape (H, W)
+            
+        Returns:
+            Output tensor of shape (H-kH+1, W-kW+1)
+            
+        TODO: Implement convolution using conv2d_naive function.
+        
+        STEP-BY-STEP:
+        1. Use conv2d_naive(x.data, self.kernel)
+        2. Return Tensor(result)
+        
+        EXAMPLE:
+        Input x: Tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # shape (3, 3)
+        Kernel: shape (2, 2)
+        Output: Tensor([[val1, val2], [val3, val4]])  # shape (2, 2)
+        
+        HINTS:
+        - x.data gives you the numpy array
+        - self.kernel is your learned kernel
+        - Use conv2d_naive(x.data, self.kernel)
+        - Return Tensor(result) to wrap the result
+        """
         raise NotImplementedError("Student implementation required")
+    
     def __call__(self, x: Tensor) -> Tensor:
+        """Make layer callable: layer(x) same as layer.forward(x)"""
         return self.forward(x)
 
-# %% ../../modules/cnn/cnn_dev.ipynb 10
+# %% ../../modules/05_cnn/cnn_dev.ipynb 13
 class Conv2D:
     def __init__(self, kernel_size: Tuple[int, int]):
-        self.kernel = np.random.randn(*kernel_size).astype(np.float32)
+        self.kernel_size = kernel_size
+        kH, kW = kernel_size
+        # Initialize with small random values
+        self.kernel = np.random.randn(kH, kW).astype(np.float32) * 0.1
+    
     def forward(self, x: Tensor) -> Tensor:
         return Tensor(conv2d_naive(x.data, self.kernel))
+    
     def __call__(self, x: Tensor) -> Tensor:
         return self.forward(x)
 
-# %% ../../modules/cnn/cnn_dev.ipynb 12
+# %% ../../modules/05_cnn/cnn_dev.ipynb 17
+def flatten(x: Tensor) -> Tensor:
+    """
+    Flatten a 2D tensor to 1D (for connecting to Dense).
+    
+    TODO: Implement flattening operation.
+    
+    APPROACH:
+    1. Get the numpy array from the tensor
+    2. Use .flatten() to convert to 1D
+    3. Add batch dimension with [None, :]
+    4. Return Tensor wrapped around the result
+    
+    EXAMPLE:
+    Input: Tensor([[1, 2], [3, 4]])  # shape (2, 2)
+    Output: Tensor([[1, 2, 3, 4]])  # shape (1, 4)
+    
+    HINTS:
+    - Use x.data.flatten() to get 1D array
+    - Add batch dimension: result[None, :]
+    - Return Tensor(result)
+    """
+    raise NotImplementedError("Student implementation required")
+
+# %% ../../modules/05_cnn/cnn_dev.ipynb 18
 def flatten(x: Tensor) -> Tensor:
     """Flatten a 2D tensor to 1D (for connecting to Dense)."""
     return Tensor(x.data.flatten()[None, :])
diff --git a/tinytorch/core/layers.py b/tinytorch/core/layers.py
index d5d6e68b..bdf096b3 100644
--- a/tinytorch/core/layers.py
+++ b/tinytorch/core/layers.py
@@ -1,28 +1,24 @@
-# AUTOGENERATED! DO NOT EDIT! File to edit: ../../modules/layers/layers_dev.ipynb.
+# AUTOGENERATED! DO NOT EDIT! File to edit: ../../modules/03_layers/layers_dev.ipynb.
 
 # %% auto 0
 __all__ = ['matmul_naive', 'Dense']
 
-# %% ../../modules/layers/layers_dev.ipynb 3
+# %% ../../modules/03_layers/layers_dev.ipynb 3
 import numpy as np
 import math
 import sys
 from typing import Union, Optional, Callable
+
+# Import from the main package (rock solid foundation)
 from .tensor import Tensor
-
-# Import activation functions from the activations module
 from .activations import ReLU, Sigmoid, Tanh
 
-# Import our Tensor class
-# sys.path.append('../../')
-# from modules.tensor.tensor_dev import Tensor
-
 # print("🔥 TinyTorch Layers Module")
 # print(f"NumPy version: {np.__version__}")
 # print(f"Python version: {sys.version_info.major}.{sys.version_info.minor}")
 # print("Ready to build neural network layers!")
 
-# %% ../../modules/layers/layers_dev.ipynb 5
+# %% ../../modules/03_layers/layers_dev.ipynb 6
 def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray:
     """
     Naive matrix multiplication using explicit for-loops.
@@ -37,10 +33,34 @@ def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray:
         Matrix of shape (m, p) where C[i,j] = sum(A[i,k] * B[k,j] for k in range(n))
         
     TODO: Implement matrix multiplication using three nested for-loops.
+    
+    APPROACH:
+    1. Get the dimensions: m, n from A and n2, p from B
+    2. Check that n == n2 (matrices must be compatible)
+    3. Create output matrix C of shape (m, p) filled with zeros
+    4. Use three nested loops:
+       - i loop: rows of A (0 to m-1)
+       - j loop: columns of B (0 to p-1) 
+       - k loop: shared dimension (0 to n-1)
+    5. For each (i,j), compute: C[i,j] += A[i,k] * B[k,j]
+    
+    EXAMPLE:
+    A = [[1, 2],     B = [[5, 6],
+         [3, 4]]          [7, 8]]
+    
+    C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0] = 1*5 + 2*7 = 19
+    C[0,1] = A[0,0]*B[0,1] + A[0,1]*B[1,1] = 1*6 + 2*8 = 22
+    C[1,0] = A[1,0]*B[0,0] + A[1,1]*B[1,0] = 3*5 + 4*7 = 43
+    C[1,1] = A[1,0]*B[0,1] + A[1,1]*B[1,1] = 3*6 + 4*8 = 50
+    
+    HINTS:
+    - Start with C = np.zeros((m, p))
+    - Use three nested for loops: for i in range(m): for j in range(p): for k in range(n):
+    - Accumulate the sum: C[i,j] += A[i,k] * B[k,j]
     """
     raise NotImplementedError("Student implementation required")
 
-# %% ../../modules/layers/layers_dev.ipynb 6
+# %% ../../modules/03_layers/layers_dev.ipynb 7
 def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray:
     """
     Naive matrix multiplication using explicit for-loops.
@@ -58,7 +78,7 @@ def matmul_naive(A: np.ndarray, B: np.ndarray) -> np.ndarray:
                 C[i, j] += A[i, k] * B[k, j]
     return C
 
-# %% ../../modules/layers/layers_dev.ipynb 7
+# %% ../../modules/03_layers/layers_dev.ipynb 11
 class Dense:
     """
     Dense (Linear) Layer: y = Wx + b
@@ -73,6 +93,23 @@ class Dense:
         use_naive_matmul: Whether to use naive matrix multiplication (for learning)
         
     TODO: Implement the Dense layer with weight initialization and forward pass.
+    
+    APPROACH:
+    1. Store layer parameters (input_size, output_size, use_bias, use_naive_matmul)
+    2. Initialize weights with small random values (Xavier/Glorot initialization)
+    3. Initialize bias to zeros (if use_bias=True)
+    4. Implement forward pass using matrix multiplication and bias addition
+    
+    EXAMPLE:
+    layer = Dense(input_size=3, output_size=2)
+    x = Tensor([[1, 2, 3]])  # batch_size=1, input_size=3
+    y = layer(x)  # shape: (1, 2)
+    
+    HINTS:
+    - Use np.random.randn() for random initialization
+    - Scale weights by sqrt(2/(input_size + output_size)) for Xavier init
+    - Store weights and bias as numpy arrays
+    - Use matmul_naive or @ operator based on use_naive_matmul flag
     """
     
     def __init__(self, input_size: int, output_size: int, use_bias: bool = True, 
@@ -90,6 +127,18 @@ class Dense:
         1. Store layer parameters (input_size, output_size, use_bias, use_naive_matmul)
         2. Initialize weights with small random values
         3. Initialize bias to zeros (if use_bias=True)
+        
+        STEP-BY-STEP:
+        1. Store the parameters as instance variables
+        2. Calculate scale factor for Xavier initialization: sqrt(2/(input_size + output_size))
+        3. Initialize weights: np.random.randn(input_size, output_size) * scale
+        4. If use_bias=True, initialize bias: np.zeros(output_size)
+        5. If use_bias=False, set bias to None
+        
+        EXAMPLE:
+        Dense(3, 2) creates:
+        - weights: shape (3, 2) with small random values
+        - bias: shape (2,) with zeros
         """
         raise NotImplementedError("Student implementation required")
     
@@ -105,8 +154,27 @@ class Dense:
             
         TODO: Implement matrix multiplication and bias addition
         - Use self.use_naive_matmul to choose between NumPy and naive implementation
-        - If use_naive_matmul=True, use matmul_naive(x.data, self.weights.data)
-        - If use_naive_matmul=False, use x.data @ self.weights.data
+        - If use_naive_matmul=True, use matmul_naive(x.data, self.weights)
+        - If use_naive_matmul=False, use x.data @ self.weights
+        - Add bias if self.use_bias=True
+        
+        STEP-BY-STEP:
+        1. Perform matrix multiplication: Wx
+           - If use_naive_matmul: result = matmul_naive(x.data, self.weights)
+           - Else: result = x.data @ self.weights
+        2. Add bias if use_bias: result += self.bias
+        3. Return Tensor(result)
+        
+        EXAMPLE:
+        Input x: Tensor([[1, 2, 3]])  # shape (1, 3)
+        Weights: shape (3, 2)
+        Output: Tensor([[val1, val2]])  # shape (1, 2)
+        
+        HINTS:
+        - x.data gives you the numpy array
+        - self.weights is your weight matrix
+        - Use broadcasting for bias addition: result + self.bias
+        - Return Tensor(result) to wrap the result
         """
         raise NotImplementedError("Student implementation required")
     
@@ -114,7 +182,7 @@ class Dense:
         """Make layer callable: layer(x) same as layer.forward(x)"""
         return self.forward(x)
 
-# %% ../../modules/layers/layers_dev.ipynb 8
+# %% ../../modules/03_layers/layers_dev.ipynb 12
 class Dense:
     """
     Dense (Linear) Layer: y = Wx + b
@@ -125,40 +193,52 @@ class Dense:
     
     def __init__(self, input_size: int, output_size: int, use_bias: bool = True, 
                  use_naive_matmul: bool = False):
-        """Initialize Dense layer with random weights."""
+        """
+        Initialize Dense layer with random weights.
+        
+        Args:
+            input_size: Number of input features
+            output_size: Number of output features
+            use_bias: Whether to include bias term
+            use_naive_matmul: Use naive matrix multiplication (for learning)
+        """
+        # Store parameters
         self.input_size = input_size
         self.output_size = output_size
         self.use_bias = use_bias
         self.use_naive_matmul = use_naive_matmul
         
-        # Initialize weights with Xavier/Glorot initialization
-        # This helps with gradient flow during training
-        limit = math.sqrt(6.0 / (input_size + output_size))
-        self.weights = Tensor(
-            np.random.uniform(-limit, limit, (input_size, output_size)).astype(np.float32)
-        )
+        # Xavier/Glorot initialization
+        scale = np.sqrt(2.0 / (input_size + output_size))
+        self.weights = np.random.randn(input_size, output_size).astype(np.float32) * scale
         
-        # Initialize bias to zeros
+        # Initialize bias
         if use_bias:
-            self.bias = Tensor(np.zeros(output_size, dtype=np.float32))
+            self.bias = np.zeros(output_size, dtype=np.float32)
         else:
             self.bias = None
     
     def forward(self, x: Tensor) -> Tensor:
-        """Forward pass: y = Wx + b"""
-        # Choose matrix multiplication implementation
+        """
+        Forward pass: y = Wx + b
+        
+        Args:
+            x: Input tensor of shape (batch_size, input_size)
+            
+        Returns:
+            Output tensor of shape (batch_size, output_size)
+        """
+        # Matrix multiplication
         if self.use_naive_matmul:
-            # Use naive implementation (for learning)
-            output = Tensor(matmul_naive(x.data, self.weights.data))
+            result = matmul_naive(x.data, self.weights)
         else:
-            # Use NumPy's optimized implementation (for speed)
-            output = Tensor(x.data @ self.weights.data)
+            result = x.data @ self.weights
         
-        # Add bias if present
-        if self.bias is not None:
-            output = Tensor(output.data + self.bias.data)
+        # Add bias
+        if self.use_bias:
+            result += self.bias
         
-        return output
+        return Tensor(result)
     
     def __call__(self, x: Tensor) -> Tensor:
         """Make layer callable: layer(x) same as layer.forward(x)"""
diff --git a/tinytorch/core/networks.py b/tinytorch/core/networks.py
index dc5089ac..6f2232ea 100644
--- a/tinytorch/core/networks.py
+++ b/tinytorch/core/networks.py
@@ -1,10 +1,10 @@
-# AUTOGENERATED! DO NOT EDIT! File to edit: ../../modules/networks/networks_dev.ipynb.
+# AUTOGENERATED! DO NOT EDIT! File to edit: ../../modules/04_networks/networks_dev.ipynb.
 
 # %% auto 0
-__all__ = ['Sequential', 'visualize_network_architecture', 'visualize_data_flow', 'compare_networks', 'create_mlp',
-           'analyze_network_behavior', 'create_classification_network', 'create_regression_network']
+__all__ = ['Sequential', 'create_mlp', 'visualize_network_architecture', 'visualize_data_flow', 'compare_networks',
+           'create_classification_network', 'create_regression_network', 'analyze_network_behavior']
 
-# %% ../../modules/networks/networks_dev.ipynb 3
+# %% ../../modules/04_networks/networks_dev.ipynb 3
 import numpy as np
 import sys
 from typing import List, Union, Optional, Callable
@@ -18,12 +18,12 @@ from .tensor import Tensor
 from .layers import Dense
 from .activations import ReLU, Sigmoid, Tanh
 
-# %% ../../modules/networks/networks_dev.ipynb 4
+# %% ../../modules/04_networks/networks_dev.ipynb 4
 def _should_show_plots():
     """Check if we should show plots (disable during testing)"""
     return 'pytest' not in sys.modules and 'test' not in sys.argv
 
-# %% ../../modules/networks/networks_dev.ipynb 6
+# %% ../../modules/04_networks/networks_dev.ipynb 6
 class Sequential:
     """
     Sequential Network: Composes layers in sequence
@@ -35,6 +35,27 @@ class Sequential:
         layers: List of layers to compose
         
     TODO: Implement the Sequential network with forward pass.
+    
+    APPROACH:
+    1. Store the list of layers as an instance variable
+    2. Implement forward pass that applies each layer in sequence
+    3. Make the network callable for easy use
+    
+    EXAMPLE:
+    network = Sequential([
+        Dense(3, 4),
+        ReLU(),
+        Dense(4, 2),
+        Sigmoid()
+    ])
+    x = Tensor([[1, 2, 3]])
+    y = network(x)  # Forward pass through all layers
+    
+    HINTS:
+    - Store layers in self.layers
+    - Use a for loop to apply each layer in order
+    - Each layer's output becomes the next layer's input
+    - Return the final output
     """
     
     def __init__(self, layers: List):
@@ -45,6 +66,14 @@ class Sequential:
             layers: List of layers to compose in order
             
         TODO: Store the layers and implement forward pass
+        
+        STEP-BY-STEP:
+        1. Store the layers list as self.layers
+        2. This creates the network architecture
+        
+        EXAMPLE:
+        Sequential([Dense(3,4), ReLU(), Dense(4,2)])
+        creates a 3-layer network: Dense → ReLU → Dense
         """
         raise NotImplementedError("Student implementation required")
     
@@ -59,6 +88,25 @@ class Sequential:
             Output tensor after passing through all layers
             
         TODO: Implement sequential forward pass through all layers
+        
+        STEP-BY-STEP:
+        1. Start with the input tensor: current = x
+        2. Loop through each layer in self.layers
+        3. Apply each layer: current = layer(current)
+        4. Return the final output
+        
+        EXAMPLE:
+        Input: Tensor([[1, 2, 3]])
+        Layer1 (Dense): Tensor([[1.4, 2.8]])
+        Layer2 (ReLU): Tensor([[1.4, 2.8]])
+        Layer3 (Dense): Tensor([[0.7]])
+        Output: Tensor([[0.7]])
+        
+        HINTS:
+        - Use a for loop: for layer in self.layers:
+        - Apply each layer: current = layer(current)
+        - The output of one layer becomes input to the next
+        - Return the final result
         """
         raise NotImplementedError("Student implementation required")
     
@@ -66,7 +114,7 @@ class Sequential:
         """Make network callable: network(x) same as network.forward(x)"""
         return self.forward(x)
 
-# %% ../../modules/networks/networks_dev.ipynb 7
+# %% ../../modules/04_networks/networks_dev.ipynb 7
 class Sequential:
     """
     Sequential Network: Composes layers in sequence
@@ -90,245 +138,7 @@ class Sequential:
         """Make network callable: network(x) same as network.forward(x)"""
         return self.forward(x)
 
-# %% ../../modules/networks/networks_dev.ipynb 11
-def visualize_network_architecture(network: Sequential, title: str = "Network Architecture"):
-    """
-    Create a visual representation of network architecture.
-    
-    Args:
-        network: Sequential network to visualize
-        title: Title for the plot
-    """
-    if not _should_show_plots():
-        print("📊 Plots disabled during testing - this is normal!")
-        return
-    
-    fig, ax = plt.subplots(1, 1, figsize=(12, 8))
-    
-    # Network parameters
-    layer_count = len(network.layers)
-    layer_height = 0.8
-    layer_spacing = 1.2
-    
-    # Colors for different layer types
-    colors = {
-        'Dense': '#4CAF50',      # Green
-        'ReLU': '#2196F3',       # Blue
-        'Sigmoid': '#FF9800',    # Orange
-        'Tanh': '#9C27B0',       # Purple
-        'default': '#757575'      # Gray
-    }
-    
-    # Draw layers
-    for i, layer in enumerate(network.layers):
-        # Determine layer type and color
-        layer_type = type(layer).__name__
-        color = colors.get(layer_type, colors['default'])
-        
-        # Layer position
-        x = i * layer_spacing
-        y = 0
-        
-        # Create layer box
-        layer_box = FancyBboxPatch(
-            (x - 0.3, y - layer_height/2),
-            0.6, layer_height,
-            boxstyle="round,pad=0.1",
-            facecolor=color,
-            edgecolor='black',
-            linewidth=2,
-            alpha=0.8
-        )
-        ax.add_patch(layer_box)
-        
-        # Add layer label
-        ax.text(x, y, layer_type, ha='center', va='center', 
-                fontsize=10, fontweight='bold', color='white')
-        
-        # Add layer details
-        if hasattr(layer, 'input_size') and hasattr(layer, 'output_size'):
-            details = f"{layer.input_size}→{layer.output_size}"
-            ax.text(x, y - 0.3, details, ha='center', va='center',
-                   fontsize=8, color='white')
-        
-        # Draw connections to next layer
-        if i < layer_count - 1:
-            next_x = (i + 1) * layer_spacing
-            connection = ConnectionPatch(
-                (x + 0.3, y), (next_x - 0.3, y),
-                "data", "data",
-                arrowstyle="->", shrinkA=5, shrinkB=5,
-                mutation_scale=20, fc="black", lw=2
-            )
-            ax.add_patch(connection)
-    
-    # Formatting
-    ax.set_xlim(-0.5, (layer_count - 1) * layer_spacing + 0.5)
-    ax.set_ylim(-1, 1)
-    ax.set_aspect('equal')
-    ax.axis('off')
-    
-    # Add title
-    plt.title(title, fontsize=16, fontweight='bold', pad=20)
-    
-    # Add legend
-    legend_elements = []
-    for layer_type, color in colors.items():
-        if layer_type != 'default':
-            legend_elements.append(patches.Patch(color=color, label=layer_type))
-    
-    ax.legend(handles=legend_elements, loc='upper right', bbox_to_anchor=(1, 1))
-    
-    plt.tight_layout()
-    plt.show()
-
-# %% ../../modules/networks/networks_dev.ipynb 12
-def visualize_data_flow(network: Sequential, input_data: Tensor, title: str = "Data Flow Through Network"):
-    """
-    Visualize how data flows through the network.
-    
-    Args:
-        network: Sequential network
-        input_data: Input tensor
-        title: Title for the plot
-    """
-    if not _should_show_plots():
-        print("📊 Plots disabled during testing - this is normal!")
-        return
-    
-    # Get intermediate outputs
-    intermediate_outputs = []
-    x = input_data
-    
-    for i, layer in enumerate(network.layers):
-        x = layer(x)
-        intermediate_outputs.append({
-            'layer': network.layers[i],
-            'output': x,
-            'layer_index': i
-        })
-    
-    # Create visualization
-    fig, axes = plt.subplots(2, len(network.layers), figsize=(4*len(network.layers), 8))
-    if len(network.layers) == 1:
-        axes = axes.reshape(1, -1)
-    
-    for i, (layer, output) in enumerate(zip(network.layers, intermediate_outputs)):
-        # Top row: Layer information
-        ax_top = axes[0, i] if len(network.layers) > 1 else axes[0]
-        
-        # Layer type and details
-        layer_type = type(layer).__name__
-        ax_top.text(0.5, 0.8, layer_type, ha='center', va='center',
-                   fontsize=12, fontweight='bold')
-        
-        if hasattr(layer, 'input_size') and hasattr(layer, 'output_size'):
-            ax_top.text(0.5, 0.6, f"{layer.input_size} → {layer.output_size}", 
-                       ha='center', va='center', fontsize=10)
-        
-        # Output shape
-        ax_top.text(0.5, 0.4, f"Shape: {output['output'].shape}", 
-                   ha='center', va='center', fontsize=9)
-        
-        # Output statistics
-        output_data = output['output'].data
-        ax_top.text(0.5, 0.2, f"Mean: {np.mean(output_data):.3f}", 
-                   ha='center', va='center', fontsize=9)
-        ax_top.text(0.5, 0.1, f"Std: {np.std(output_data):.3f}", 
-                   ha='center', va='center', fontsize=9)
-        
-        ax_top.set_xlim(0, 1)
-        ax_top.set_ylim(0, 1)
-        ax_top.axis('off')
-        
-        # Bottom row: Output visualization
-        ax_bottom = axes[1, i] if len(network.layers) > 1 else axes[1]
-        
-        # Show output as heatmap or histogram
-        output_data = output['output'].data.flatten()
-        
-        if len(output_data) <= 20:  # Small output - show as bars
-            ax_bottom.bar(range(len(output_data)), output_data, alpha=0.7)
-            ax_bottom.set_title(f"Layer {i+1} Output")
-            ax_bottom.set_xlabel("Output Index")
-            ax_bottom.set_ylabel("Value")
-        else:  # Large output - show histogram
-            ax_bottom.hist(output_data, bins=20, alpha=0.7, edgecolor='black')
-            ax_bottom.set_title(f"Layer {i+1} Output Distribution")
-            ax_bottom.set_xlabel("Value")
-            ax_bottom.set_ylabel("Frequency")
-        
-        ax_bottom.grid(True, alpha=0.3)
-    
-    plt.suptitle(title, fontsize=14, fontweight='bold')
-    plt.tight_layout()
-    plt.show()
-
-# %% ../../modules/networks/networks_dev.ipynb 13
-def compare_networks(networks: List[Sequential], network_names: List[str], 
-                    input_data: Tensor, title: str = "Network Comparison"):
-    """
-    Compare different network architectures side-by-side.
-    
-    Args:
-        networks: List of networks to compare
-        network_names: Names for each network
-        input_data: Input tensor to test with
-        title: Title for the plot
-    """
-    if not _should_show_plots():
-        print("📊 Plots disabled during testing - this is normal!")
-        return
-    
-    fig, axes = plt.subplots(2, len(networks), figsize=(6*len(networks), 10))
-    if len(networks) == 1:
-        axes = axes.reshape(2, -1)
-    
-    for i, (network, name) in enumerate(zip(networks, network_names)):
-        # Get network output
-        output = network(input_data)
-        
-        # Top row: Architecture visualization
-        ax_top = axes[0, i] if len(networks) > 1 else axes[0]
-        
-        # Count layer types
-        layer_types = {}
-        for layer in network.layers:
-            layer_type = type(layer).__name__
-            layer_types[layer_type] = layer_types.get(layer_type, 0) + 1
-        
-        # Create pie chart of layer types
-        if layer_types:
-            labels = list(layer_types.keys())
-            sizes = list(layer_types.values())
-            colors = plt.cm.Set3(np.linspace(0, 1, len(labels)))
-            
-            ax_top.pie(sizes, labels=labels, autopct='%1.1f%%', colors=colors)
-            ax_top.set_title(f"{name}\nLayer Distribution")
-        
-        # Bottom row: Output comparison
-        ax_bottom = axes[1, i] if len(networks) > 1 else axes[1]
-        
-        output_data = output.data.flatten()
-        
-        # Show output statistics
-        ax_bottom.hist(output_data, bins=20, alpha=0.7, edgecolor='black')
-        ax_bottom.axvline(np.mean(output_data), color='red', linestyle='--', 
-                         label=f'Mean: {np.mean(output_data):.3f}')
-        ax_bottom.axvline(np.median(output_data), color='green', linestyle='--',
-                         label=f'Median: {np.median(output_data):.3f}')
-        
-        ax_bottom.set_title(f"{name} Output Distribution")
-        ax_bottom.set_xlabel("Output Value")
-        ax_bottom.set_ylabel("Frequency")
-        ax_bottom.legend()
-        ax_bottom.grid(True, alpha=0.3)
-    
-    plt.suptitle(title, fontsize=16, fontweight='bold')
-    plt.tight_layout()
-    plt.show()
-
-# %% ../../modules/networks/networks_dev.ipynb 15
+# %% ../../modules/04_networks/networks_dev.ipynb 11
 def create_mlp(input_size: int, hidden_sizes: List[int], output_size: int, 
                activation=ReLU, output_activation=Sigmoid) -> Sequential:
     """
@@ -338,193 +148,432 @@ def create_mlp(input_size: int, hidden_sizes: List[int], output_size: int,
         input_size: Number of input features
         hidden_sizes: List of hidden layer sizes
         output_size: Number of output features
-        activation: Activation function for hidden layers
-        output_activation: Activation function for output layer
+        activation: Activation function for hidden layers (default: ReLU)
+        output_activation: Activation function for output layer (default: Sigmoid)
         
     Returns:
-        Sequential network
+        Sequential network with MLP architecture
+        
+    TODO: Implement MLP creation with alternating Dense and activation layers.
+    
+    APPROACH:
+    1. Start with an empty list of layers
+    2. Add the first Dense layer: input_size → first hidden size
+    3. For each hidden layer:
+       - Add activation function
+       - Add Dense layer connecting to next hidden size
+    4. Add final activation function
+    5. Add final Dense layer: last hidden size → output_size
+    6. Add output activation function
+    7. Return Sequential(layers)
+    
+    EXAMPLE:
+    create_mlp(3, [4, 2], 1) creates:
+    Dense(3→4) → ReLU → Dense(4→2) → ReLU → Dense(2→1) → Sigmoid
+    
+    HINTS:
+    - Start with layers = []
+    - Add Dense layers with appropriate input/output sizes
+    - Add activation functions between Dense layers
+    - Don't forget the final output activation
     """
+    raise NotImplementedError("Student implementation required")
+
+# %% ../../modules/04_networks/networks_dev.ipynb 12
+def create_mlp(input_size: int, hidden_sizes: List[int], output_size: int, 
+               activation=ReLU, output_activation=Sigmoid) -> Sequential:
+    """Create a Multi-Layer Perceptron (MLP) network."""
     layers = []
     
-    # Input layer
-    if hidden_sizes:
-        layers.append(Dense(input_size, hidden_sizes[0]))
+    # Add first layer
+    current_size = input_size
+    for hidden_size in hidden_sizes:
+        layers.append(Dense(input_size=current_size, output_size=hidden_size))
         layers.append(activation())
-        
-        # Hidden layers
-        for i in range(len(hidden_sizes) - 1):
-            layers.append(Dense(hidden_sizes[i], hidden_sizes[i + 1]))
-            layers.append(activation())
-        
-        # Output layer
-        layers.append(Dense(hidden_sizes[-1], output_size))
-    else:
-        # Direct input to output
-        layers.append(Dense(input_size, output_size))
+        current_size = hidden_size
     
+    # Add output layer
+    layers.append(Dense(input_size=current_size, output_size=output_size))
     layers.append(output_activation())
     
     return Sequential(layers)
 
-# %% ../../modules/networks/networks_dev.ipynb 18
-def analyze_network_behavior(network: Sequential, input_data: Tensor, 
-                           title: str = "Network Behavior Analysis"):
+# %% ../../modules/04_networks/networks_dev.ipynb 16
+def visualize_network_architecture(network: Sequential, title: str = "Network Architecture"):
     """
-    Analyze how a network behaves with different types of input.
+    Visualize the architecture of a Sequential network.
     
     Args:
-        network: Network to analyze
-        input_data: Input tensor
+        network: Sequential network to visualize
         title: Title for the plot
+        
+    TODO: Create a visualization showing the network structure.
+    
+    APPROACH:
+    1. Create a matplotlib figure
+    2. For each layer, draw a box showing its type and size
+    3. Connect the boxes with arrows showing data flow
+    4. Add labels and formatting
+    
+    EXAMPLE:
+    Input → Dense(3→4) → ReLU → Dense(4→2) → Sigmoid → Output
+    
+    HINTS:
+    - Use plt.subplots() to create the figure
+    - Use plt.text() to add layer labels
+    - Use plt.arrow() to show connections
+    - Add proper spacing and formatting
     """
+    raise NotImplementedError("Student implementation required")
+
+# %% ../../modules/04_networks/networks_dev.ipynb 17
+def visualize_network_architecture(network: Sequential, title: str = "Network Architecture"):
+    """Visualize the architecture of a Sequential network."""
     if not _should_show_plots():
-        print("📊 Plots disabled during testing - this is normal!")
+        print("📊 Visualization disabled during testing")
         return
     
-    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
+    fig, ax = plt.subplots(1, 1, figsize=(12, 6))
     
-    # 1. Input vs Output relationship
-    ax1 = axes[0, 0]
-    input_flat = input_data.data.flatten()
-    output = network(input_data)
-    output_flat = output.data.flatten()
+    # Calculate positions
+    num_layers = len(network.layers)
+    x_positions = np.linspace(0, 10, num_layers + 2)
     
-    ax1.scatter(input_flat, output_flat, alpha=0.6)
-    ax1.plot([input_flat.min(), input_flat.max()], 
-             [input_flat.min(), input_flat.max()], 'r--', alpha=0.5, label='y=x')
-    ax1.set_xlabel('Input Values')
-    ax1.set_ylabel('Output Values')
-    ax1.set_title('Input vs Output')
-    ax1.legend()
-    ax1.grid(True, alpha=0.3)
+    # Draw input
+    ax.text(x_positions[0], 0, 'Input', ha='center', va='center', 
+            bbox=dict(boxstyle='round,pad=0.3', facecolor='lightblue'))
     
-    # 2. Output distribution
-    ax2 = axes[0, 1]
-    ax2.hist(output_flat, bins=20, alpha=0.7, edgecolor='black')
-    ax2.axvline(np.mean(output_flat), color='red', linestyle='--', 
-                label=f'Mean: {np.mean(output_flat):.3f}')
-    ax2.set_xlabel('Output Values')
-    ax2.set_ylabel('Frequency')
-    ax2.set_title('Output Distribution')
-    ax2.legend()
-    ax2.grid(True, alpha=0.3)
+    # Draw layers
+    for i, layer in enumerate(network.layers):
+        layer_name = type(layer).__name__
+        ax.text(x_positions[i+1], 0, layer_name, ha='center', va='center',
+                bbox=dict(boxstyle='round,pad=0.3', facecolor='lightgreen'))
+        
+        # Draw arrow
+        ax.arrow(x_positions[i], 0, 0.8, 0, head_width=0.1, head_length=0.1, 
+                fc='black', ec='black')
     
-    # 3. Layer-by-layer activation patterns
-    ax3 = axes[0, 2]
-    activations = []
-    x = input_data
+    # Draw output
+    ax.text(x_positions[-1], 0, 'Output', ha='center', va='center',
+            bbox=dict(boxstyle='round,pad=0.3', facecolor='lightcoral'))
     
-    for layer in network.layers:
-        x = layer(x)
-        if hasattr(layer, 'input_size'):  # Dense layer
-            activations.append(np.mean(x.data))
-        else:  # Activation layer
-            activations.append(np.mean(x.data))
-    
-    ax3.plot(range(len(activations)), activations, 'bo-', linewidth=2, markersize=8)
-    ax3.set_xlabel('Layer Index')
-    ax3.set_ylabel('Mean Activation')
-    ax3.set_title('Layer-by-Layer Activations')
-    ax3.grid(True, alpha=0.3)
-    
-    # 4. Network depth analysis
-    ax4 = axes[1, 0]
-    layer_types = [type(layer).__name__ for layer in network.layers]
-    layer_counts = {}
-    for layer_type in layer_types:
-        layer_counts[layer_type] = layer_counts.get(layer_type, 0) + 1
-    
-    if layer_counts:
-        ax4.bar(layer_counts.keys(), layer_counts.values(), alpha=0.7)
-        ax4.set_xlabel('Layer Type')
-        ax4.set_ylabel('Count')
-        ax4.set_title('Layer Type Distribution')
-        ax4.grid(True, alpha=0.3)
-    
-    # 5. Shape transformation
-    ax5 = axes[1, 1]
-    shapes = [input_data.shape]
-    x = input_data
-    
-    for layer in network.layers:
-        x = layer(x)
-        shapes.append(x.shape)
-    
-    layer_indices = range(len(shapes))
-    shape_sizes = [np.prod(shape) for shape in shapes]
-    
-    ax5.plot(layer_indices, shape_sizes, 'go-', linewidth=2, markersize=8)
-    ax5.set_xlabel('Layer Index')
-    ax5.set_ylabel('Tensor Size')
-    ax5.set_title('Shape Transformation')
-    ax5.grid(True, alpha=0.3)
-    
-    # 6. Network summary
-    ax6 = axes[1, 2]
-    ax6.axis('off')
-    
-    summary_text = f"""
-Network Summary:
-• Total Layers: {len(network.layers)}
-• Input Shape: {input_data.shape}
-• Output Shape: {output.shape}
-• Parameters: {sum(np.prod(layer.weights.data.shape) if hasattr(layer, 'weights') else 0 for layer in network.layers)}
-• Architecture: {' → '.join([type(layer).__name__ for layer in network.layers])}
+    ax.set_xlim(-0.5, 10.5)
+    ax.set_ylim(-0.5, 0.5)
+    ax.set_title(title)
+    ax.axis('off')
+    plt.show()
+
+# %% ../../modules/04_networks/networks_dev.ipynb 21
+def visualize_data_flow(network: Sequential, input_data: Tensor, title: str = "Data Flow Through Network"):
     """
+    Visualize how data flows through the network.
     
-    ax6.text(0.05, 0.95, summary_text, transform=ax6.transAxes, 
-             fontsize=10, verticalalignment='top', fontfamily='monospace')
+    Args:
+        network: Sequential network to analyze
+        input_data: Input tensor to trace through the network
+        title: Title for the plot
+        
+    TODO: Create a visualization showing how data transforms through each layer.
     
-    plt.suptitle(title, fontsize=16, fontweight='bold')
+    APPROACH:
+    1. Trace the input through each layer
+    2. Record the output of each layer
+    3. Create a visualization showing the transformations
+    4. Add statistics (mean, std, range) for each layer
+    
+    EXAMPLE:
+    Input: [1, 2, 3] → Layer1: [1.4, 2.8] → Layer2: [1.4, 2.8] → Output: [0.7]
+    
+    HINTS:
+    - Use a for loop to apply each layer
+    - Store intermediate outputs
+    - Use plt.subplot() to create multiple subplots
+    - Show statistics for each layer output
+    """
+    raise NotImplementedError("Student implementation required")
+
+# %% ../../modules/04_networks/networks_dev.ipynb 22
+def visualize_data_flow(network: Sequential, input_data: Tensor, title: str = "Data Flow Through Network"):
+    """Visualize how data flows through the network."""
+    if not _should_show_plots():
+        print("📊 Visualization disabled during testing")
+        return
+    
+    # Trace data through network
+    current_data = input_data
+    layer_outputs = [current_data.data.flatten()]
+    layer_names = ['Input']
+    
+    for layer in network.layers:
+        current_data = layer(current_data)
+        layer_outputs.append(current_data.data.flatten())
+        layer_names.append(type(layer).__name__)
+    
+    # Create visualization
+    fig, axes = plt.subplots(2, len(layer_outputs), figsize=(15, 8))
+    
+    for i, (output, name) in enumerate(zip(layer_outputs, layer_names)):
+        # Histogram
+        axes[0, i].hist(output, bins=20, alpha=0.7)
+        axes[0, i].set_title(f'{name}\nShape: {output.shape}')
+        axes[0, i].set_xlabel('Value')
+        axes[0, i].set_ylabel('Frequency')
+        
+        # Statistics
+        stats_text = f'Mean: {np.mean(output):.3f}\nStd: {np.std(output):.3f}\nRange: [{np.min(output):.3f}, {np.max(output):.3f}]'
+        axes[1, i].text(0.1, 0.5, stats_text, transform=axes[1, i].transAxes, 
+                        verticalalignment='center', fontsize=10)
+        axes[1, i].set_title(f'{name} Statistics')
+        axes[1, i].axis('off')
+    
+    plt.suptitle(title)
     plt.tight_layout()
     plt.show()
 
-# %% ../../modules/networks/networks_dev.ipynb 21
+# %% ../../modules/04_networks/networks_dev.ipynb 26
+def compare_networks(networks: List[Sequential], network_names: List[str], 
+                    input_data: Tensor, title: str = "Network Comparison"):
+    """
+    Compare multiple networks on the same input.
+    
+    Args:
+        networks: List of Sequential networks to compare
+        network_names: Names for each network
+        input_data: Input tensor to test all networks
+        title: Title for the plot
+        
+    TODO: Create a comparison visualization showing how different networks process the same input.
+    
+    APPROACH:
+    1. Run the same input through each network
+    2. Collect the outputs and intermediate results
+    3. Create a visualization comparing the results
+    4. Show statistics and differences
+    
+    EXAMPLE:
+    Compare MLP vs Deep Network vs Wide Network on same input
+    
+    HINTS:
+    - Use a for loop to test each network
+    - Store outputs and any relevant statistics
+    - Use plt.subplot() to create comparison plots
+    - Show both outputs and intermediate layer results
+    """
+    raise NotImplementedError("Student implementation required")
+
+# %% ../../modules/04_networks/networks_dev.ipynb 27
+def compare_networks(networks: List[Sequential], network_names: List[str], 
+                    input_data: Tensor, title: str = "Network Comparison"):
+    """Compare multiple networks on the same input."""
+    if not _should_show_plots():
+        print("📊 Visualization disabled during testing")
+        return
+    
+    # Test all networks
+    outputs = []
+    for network in networks:
+        output = network(input_data)
+        outputs.append(output.data.flatten())
+    
+    # Create comparison plot
+    fig, axes = plt.subplots(2, len(networks), figsize=(15, 8))
+    
+    for i, (output, name) in enumerate(zip(outputs, network_names)):
+        # Output distribution
+        axes[0, i].hist(output, bins=20, alpha=0.7)
+        axes[0, i].set_title(f'{name}\nOutput Distribution')
+        axes[0, i].set_xlabel('Value')
+        axes[0, i].set_ylabel('Frequency')
+        
+        # Statistics
+        stats_text = f'Mean: {np.mean(output):.3f}\nStd: {np.std(output):.3f}\nRange: [{np.min(output):.3f}, {np.max(output):.3f}]\nSize: {len(output)}'
+        axes[1, i].text(0.1, 0.5, stats_text, transform=axes[1, i].transAxes, 
+                        verticalalignment='center', fontsize=10)
+        axes[1, i].set_title(f'{name} Statistics')
+        axes[1, i].axis('off')
+    
+    plt.suptitle(title)
+    plt.tight_layout()
+    plt.show()
+
+# %% ../../modules/04_networks/networks_dev.ipynb 31
 def create_classification_network(input_size: int, num_classes: int, 
                                 hidden_sizes: List[int] = None) -> Sequential:
     """
-    Create a network for classification problems.
+    Create a network for classification tasks.
     
     Args:
         input_size: Number of input features
         num_classes: Number of output classes
-        hidden_sizes: List of hidden layer sizes (default: [input_size//2])
+        hidden_sizes: List of hidden layer sizes (default: [input_size * 2])
         
     Returns:
         Sequential network for classification
-    """
-    if hidden_sizes is None:
-        hidden_sizes = [input_size // 2]
+        
+    TODO: Implement classification network creation.
     
-    return create_mlp(
-        input_size=input_size,
-        hidden_sizes=hidden_sizes,
-        output_size=num_classes,
-        activation=ReLU,
-        output_activation=Sigmoid
-    )
+    APPROACH:
+    1. Use default hidden sizes if none provided
+    2. Create MLP with appropriate architecture
+    3. Use Sigmoid for binary classification (num_classes=1)
+    4. Use appropriate activation for multi-class
+    
+    EXAMPLE:
+    create_classification_network(10, 3) creates:
+    Dense(10→20) → ReLU → Dense(20→3) → Sigmoid
+    
+    HINTS:
+    - Use create_mlp() function
+    - Choose appropriate output activation based on num_classes
+    - For binary classification (num_classes=1), use Sigmoid
+    - For multi-class, you could use Sigmoid or no activation
+    """
+    raise NotImplementedError("Student implementation required")
 
-# %% ../../modules/networks/networks_dev.ipynb 22
+# %% ../../modules/04_networks/networks_dev.ipynb 32
+def create_classification_network(input_size: int, num_classes: int, 
+                                hidden_sizes: List[int] = None) -> Sequential:
+    """Create a network for classification tasks."""
+    if hidden_sizes is None:
+        hidden_sizes = [input_size // 2]  # Use input_size // 2 as default
+    
+    # Choose appropriate output activation
+    output_activation = Sigmoid if num_classes == 1 else Softmax
+    
+    return create_mlp(input_size, hidden_sizes, num_classes, 
+                     activation=ReLU, output_activation=output_activation)
+
+# %% ../../modules/04_networks/networks_dev.ipynb 33
 def create_regression_network(input_size: int, output_size: int = 1,
                              hidden_sizes: List[int] = None) -> Sequential:
     """
-    Create a network for regression problems.
+    Create a network for regression tasks.
     
     Args:
         input_size: Number of input features
         output_size: Number of output values (default: 1)
-        hidden_sizes: List of hidden layer sizes (default: [input_size//2])
+        hidden_sizes: List of hidden layer sizes (default: [input_size * 2])
         
     Returns:
         Sequential network for regression
-    """
-    if hidden_sizes is None:
-        hidden_sizes = [input_size // 2]
+        
+    TODO: Implement regression network creation.
     
-    return create_mlp(
-        input_size=input_size,
-        hidden_sizes=hidden_sizes,
-        output_size=output_size,
-        activation=ReLU,
-        output_activation=Tanh  # No activation for regression
-    )
+    APPROACH:
+    1. Use default hidden sizes if none provided
+    2. Create MLP with appropriate architecture
+    3. Use no activation on output layer (linear output)
+    
+    EXAMPLE:
+    create_regression_network(5, 1) creates:
+    Dense(5→10) → ReLU → Dense(10→1) (no activation)
+    
+    HINTS:
+    - Use create_mlp() but with no output activation
+    - For regression, we want linear outputs (no activation)
+    - You can pass None or identity function as output_activation
+    """
+    raise NotImplementedError("Student implementation required")
+
+# %% ../../modules/04_networks/networks_dev.ipynb 34
+def create_regression_network(input_size: int, output_size: int = 1,
+                             hidden_sizes: List[int] = None) -> Sequential:
+    """Create a network for regression tasks."""
+    if hidden_sizes is None:
+        hidden_sizes = [input_size // 2]  # Use input_size // 2 as default
+    
+    # Create MLP with Tanh output activation for regression
+    return create_mlp(input_size, hidden_sizes, output_size, 
+                     activation=ReLU, output_activation=Tanh)
+
+# %% ../../modules/04_networks/networks_dev.ipynb 38
+def analyze_network_behavior(network: Sequential, input_data: Tensor, 
+                           title: str = "Network Behavior Analysis"):
+    """
+    Analyze how a network behaves with different inputs.
+    
+    Args:
+        network: Sequential network to analyze
+        input_data: Input tensor to test
+        title: Title for the plot
+        
+    TODO: Create an analysis showing network behavior and capabilities.
+    
+    APPROACH:
+    1. Test the network with the given input
+    2. Analyze the output characteristics
+    3. Test with variations of the input
+    4. Create visualizations showing behavior patterns
+    
+    EXAMPLE:
+    Test network with original input and noisy versions
+    Show how output changes with input variations
+    
+    HINTS:
+    - Test the original input
+    - Create variations (noise, scaling, etc.)
+    - Compare outputs across variations
+    - Show statistics and patterns
+    """
+    raise NotImplementedError("Student implementation required")
+
+# %% ../../modules/04_networks/networks_dev.ipynb 39
+def analyze_network_behavior(network: Sequential, input_data: Tensor, 
+                           title: str = "Network Behavior Analysis"):
+    """Analyze how a network behaves with different inputs."""
+    if not _should_show_plots():
+        print("📊 Visualization disabled during testing")
+        return
+    
+    # Test original input
+    original_output = network(input_data)
+    
+    # Create variations
+    noise_levels = [0.0, 0.1, 0.2, 0.5]
+    outputs = []
+    
+    for noise in noise_levels:
+        noisy_input = Tensor(input_data.data + noise * np.random.randn(*input_data.data.shape))
+        output = network(noisy_input)
+        outputs.append(output.data.flatten())
+    
+    # Create analysis plot
+    fig, axes = plt.subplots(2, 2, figsize=(12, 10))
+    
+    # Original output
+    axes[0, 0].hist(outputs[0], bins=20, alpha=0.7)
+    axes[0, 0].set_title('Original Input Output')
+    axes[0, 0].set_xlabel('Value')
+    axes[0, 0].set_ylabel('Frequency')
+    
+    # Output stability
+    output_means = [np.mean(out) for out in outputs]
+    output_stds = [np.std(out) for out in outputs]
+    axes[0, 1].plot(noise_levels, output_means, 'bo-', label='Mean')
+    axes[0, 1].fill_between(noise_levels, 
+                           [m-s for m, s in zip(output_means, output_stds)],
+                           [m+s for m, s in zip(output_means, output_stds)], 
+                           alpha=0.3, label='±1 Std')
+    axes[0, 1].set_xlabel('Noise Level')
+    axes[0, 1].set_ylabel('Output Value')
+    axes[0, 1].set_title('Output Stability')
+    axes[0, 1].legend()
+    
+    # Output distribution comparison
+    for i, (output, noise) in enumerate(zip(outputs, noise_levels)):
+        axes[1, 0].hist(output, bins=20, alpha=0.5, label=f'Noise={noise}')
+    axes[1, 0].set_xlabel('Output Value')
+    axes[1, 0].set_ylabel('Frequency')
+    axes[1, 0].set_title('Output Distribution Comparison')
+    axes[1, 0].legend()
+    
+    # Statistics
+    stats_text = f'Original Mean: {np.mean(outputs[0]):.3f}\nOriginal Std: {np.std(outputs[0]):.3f}\nOutput Range: [{np.min(outputs[0]):.3f}, {np.max(outputs[0]):.3f}]'
+    axes[1, 1].text(0.1, 0.5, stats_text, transform=axes[1, 1].transAxes, 
+                    verticalalignment='center', fontsize=10)
+    axes[1, 1].set_title('Network Statistics')
+    axes[1, 1].axis('off')
+    
+    plt.suptitle(title)
+    plt.tight_layout()
+    plt.show()
diff --git a/tinytorch/core/tensor.py b/tinytorch/core/tensor.py
index 6aea449d..df543aaa 100644
--- a/tinytorch/core/tensor.py
+++ b/tinytorch/core/tensor.py
@@ -1,67 +1,19 @@
-# AUTOGENERATED! DO NOT EDIT! File to edit: ../../modules/tensor/tensor_dev.ipynb.
+# AUTOGENERATED! DO NOT EDIT! File to edit: ../../modules/01_tensor/tensor_dev_enhanced.ipynb.
 
 # %% auto 0
 __all__ = ['Tensor']
 
-# %% ../../modules/tensor/tensor_dev.ipynb 3
+# %% ../../modules/01_tensor/tensor_dev_enhanced.ipynb 2
 import numpy as np
-import sys
-from typing import Union, List, Tuple, Optional, Any
+from typing import Union, List, Tuple, Optional
 
-# %% ../../modules/tensor/tensor_dev.ipynb 4
+# %% ../../modules/01_tensor/tensor_dev_enhanced.ipynb 4
 class Tensor:
     """
     TinyTorch Tensor: N-dimensional array with ML operations.
     
-    The fundamental data structure for all TinyTorch operations.
-    Wraps NumPy arrays with ML-specific functionality.
-    
-    TODO: Implement the core Tensor class with data handling and properties.
-    """
-    
-    def __init__(self, data: Union[int, float, List, np.ndarray], dtype: Optional[str] = None):
-        """
-        Create a new tensor from data.
-        
-        Args:
-            data: Input data (scalar, list, or numpy array)
-            dtype: Data type ('float32', 'int32', etc.). Defaults to auto-detect.
-            
-        TODO: Implement tensor creation with proper type handling.
-        """
-        raise NotImplementedError("Student implementation required")
-    
-    @property
-    def data(self) -> np.ndarray:
-        """Access underlying numpy array."""
-        raise NotImplementedError("Student implementation required")
-    
-    @property
-    def shape(self) -> Tuple[int, ...]:
-        """Get tensor shape."""
-        raise NotImplementedError("Student implementation required")
-    
-    @property
-    def size(self) -> int:
-        """Get total number of elements."""
-        raise NotImplementedError("Student implementation required")
-    
-    @property
-    def dtype(self) -> np.dtype:
-        """Get data type as numpy dtype."""
-        raise NotImplementedError("Student implementation required")
-    
-    def __repr__(self) -> str:
-        """String representation."""
-        raise NotImplementedError("Student implementation required")
-
-# %% ../../modules/tensor/tensor_dev.ipynb 5
-class Tensor:
-    """
-    TinyTorch Tensor: N-dimensional array with ML operations.
-    
-    The fundamental data structure for all TinyTorch operations.
-    Wraps NumPy arrays with ML-specific functionality.
+    This enhanced version demonstrates dual-purpose educational content
+    suitable for both self-learning and formal assessment.
     """
     
     def __init__(self, data: Union[int, float, List, np.ndarray], dtype: Optional[str] = None):
@@ -72,145 +24,171 @@ class Tensor:
             data: Input data (scalar, list, or numpy array)
             dtype: Data type ('float32', 'int32', etc.). Defaults to auto-detect.
         """
+        #| exercise_start
+        #| hint: Use np.array() to convert input data to numpy array
+        #| solution_test: tensor.shape should match input shape
+        #| difficulty: easy
+        
+        ### BEGIN SOLUTION
         # Convert input to numpy array
-        if isinstance(data, (int, float, np.number)):
-            # Handle Python and NumPy scalars
-            if dtype is None:
-                # Auto-detect type: int for integers, float32 for floats
-                if isinstance(data, int) or (isinstance(data, np.number) and np.issubdtype(type(data), np.integer)):
-                    dtype = 'int32'
-                else:
-                    dtype = 'float32'
-            self._data = np.array(data, dtype=dtype)
+        if isinstance(data, (int, float)):
+            self._data = np.array(data)
         elif isinstance(data, list):
-            # Let NumPy auto-detect type, then convert if needed
-            temp_array = np.array(data)
-            if dtype is None:
-                # Keep NumPy's auto-detected type, but prefer common ML types
-                if np.issubdtype(temp_array.dtype, np.integer):
-                    dtype = 'int32'
-                elif np.issubdtype(temp_array.dtype, np.floating):
-                    dtype = 'float32'
-                else:
-                    dtype = temp_array.dtype
-            self._data = temp_array.astype(dtype)
+            self._data = np.array(data)
         elif isinstance(data, np.ndarray):
-            self._data = data.astype(dtype or data.dtype)
+            self._data = data.copy()
         else:
-            raise TypeError(f"Cannot create tensor from {type(data)}")
-    
+            self._data = np.array(data)
+        
+        # Apply dtype conversion if specified
+        if dtype is not None:
+            self._data = self._data.astype(dtype)
+        ### END SOLUTION
+        
+        #| exercise_end
+        
     @property
     def data(self) -> np.ndarray:
         """Access underlying numpy array."""
+        #| exercise_start
+        #| hint: Return the stored numpy array (_data attribute)
+        #| solution_test: tensor.data should return numpy array
+        #| difficulty: easy
+        
+        ### BEGIN SOLUTION
         return self._data
-    
+        ### END SOLUTION
+        
+        #| exercise_end
+        
     @property
     def shape(self) -> Tuple[int, ...]:
         """Get tensor shape."""
+        #| exercise_start
+        #| hint: Use the .shape attribute of the numpy array
+        #| solution_test: tensor.shape should return tuple of dimensions
+        #| difficulty: easy
+        
+        ### BEGIN SOLUTION
         return self._data.shape
-    
+        ### END SOLUTION
+        
+        #| exercise_end
+        
     @property
     def size(self) -> int:
         """Get total number of elements."""
+        #| exercise_start
+        #| hint: Use the .size attribute of the numpy array
+        #| solution_test: tensor.size should return total element count
+        #| difficulty: easy
+        
+        ### BEGIN SOLUTION
         return self._data.size
-    
+        ### END SOLUTION
+        
+        #| exercise_end
+        
     @property
     def dtype(self) -> np.dtype:
         """Get data type as numpy dtype."""
+        #| exercise_start
+        #| hint: Use the .dtype attribute of the numpy array
+        #| solution_test: tensor.dtype should return numpy dtype
+        #| difficulty: easy
+        
+        ### BEGIN SOLUTION
         return self._data.dtype
-    
+        ### END SOLUTION
+        
+        #| exercise_end
+        
     def __repr__(self) -> str:
-        """String representation."""
-        return f"Tensor({self._data.tolist()}, shape={self.shape}, dtype={self.dtype})"
-
-# %% ../../modules/tensor/tensor_dev.ipynb 9
-def _add_arithmetic_methods():
-    """
-    Add arithmetic operations to Tensor class.
-    
-    TODO: Implement arithmetic methods (__add__, __sub__, __mul__, __truediv__)
-    and their reverse operations (__radd__, __rsub__, etc.)
-    """
-    
-    def __add__(self, other: Union['Tensor', int, float]) -> 'Tensor':
-        """Addition: tensor + other"""
-        raise NotImplementedError("Student implementation required")
-    
-    def __sub__(self, other: Union['Tensor', int, float]) -> 'Tensor':
-        """Subtraction: tensor - other"""
-        raise NotImplementedError("Student implementation required")
-    
-    def __mul__(self, other: Union['Tensor', int, float]) -> 'Tensor':
-        """Multiplication: tensor * other"""
-        raise NotImplementedError("Student implementation required")
-    
-    def __truediv__(self, other: Union['Tensor', int, float]) -> 'Tensor':
-        """Division: tensor / other"""
-        raise NotImplementedError("Student implementation required")
-    
-    # Add methods to Tensor class
-    Tensor.__add__ = __add__
-    Tensor.__sub__ = __sub__
-    Tensor.__mul__ = __mul__
-    Tensor.__truediv__ = __truediv__
-
-# %% ../../modules/tensor/tensor_dev.ipynb 10
-def _add_arithmetic_methods():
-    """Add arithmetic operations to Tensor class."""
-    
-    def __add__(self, other: Union['Tensor', int, float]) -> 'Tensor':
-        """Addition: tensor + other"""
-        if isinstance(other, Tensor):
-            return Tensor(self._data + other._data)
-        else:  # scalar
-            return Tensor(self._data + other)
-    
-    def __sub__(self, other: Union['Tensor', int, float]) -> 'Tensor':
-        """Subtraction: tensor - other"""
-        if isinstance(other, Tensor):
-            return Tensor(self._data - other._data)
-        else:  # scalar
-            return Tensor(self._data - other)
-    
-    def __mul__(self, other: Union['Tensor', int, float]) -> 'Tensor':
-        """Multiplication: tensor * other"""
-        if isinstance(other, Tensor):
-            return Tensor(self._data * other._data)
-        else:  # scalar
-            return Tensor(self._data * other)
-    
-    def __truediv__(self, other: Union['Tensor', int, float]) -> 'Tensor':
-        """Division: tensor / other"""
-        if isinstance(other, Tensor):
-            return Tensor(self._data / other._data)
-        else:  # scalar
-            return Tensor(self._data / other)
-    
-    def __radd__(self, other: Union[int, float]) -> 'Tensor':
-        """Reverse addition: scalar + tensor"""
-        return Tensor(other + self._data)
-    
-    def __rsub__(self, other: Union[int, float]) -> 'Tensor':
-        """Reverse subtraction: scalar - tensor"""
-        return Tensor(other - self._data)
-    
-    def __rmul__(self, other: Union[int, float]) -> 'Tensor':
-        """Reverse multiplication: scalar * tensor"""
-        return Tensor(other * self._data)
-    
-    def __rtruediv__(self, other: Union[int, float]) -> 'Tensor':
-        """Reverse division: scalar / tensor"""
-        return Tensor(other / self._data)
-    
-    # Add methods to Tensor class
-    Tensor.__add__ = __add__
-    Tensor.__sub__ = __sub__
-    Tensor.__mul__ = __mul__
-    Tensor.__truediv__ = __truediv__
-    Tensor.__radd__ = __radd__
-    Tensor.__rsub__ = __rsub__
-    Tensor.__rmul__ = __rmul__
-    Tensor.__rtruediv__ = __rtruediv__
-
-# Call the function to add arithmetic methods
-_add_arithmetic_methods()
+        """String representation of the tensor."""
+        #| exercise_start
+        #| hint: Format as "Tensor([data], shape=shape, dtype=dtype)"
+        #| solution_test: repr should include data, shape, and dtype
+        #| difficulty: medium
+        
+        ### BEGIN SOLUTION
+        data_str = self._data.tolist()
+        return f"Tensor({data_str}, shape={self.shape}, dtype={self.dtype})"
+        ### END SOLUTION
+        
+        #| exercise_end
+        
+    def add(self, other: 'Tensor') -> 'Tensor':
+        """
+        Add two tensors element-wise.
+        
+        Args:
+            other: Another tensor to add
+            
+        Returns:
+            New tensor with element-wise sum
+        """
+        #| exercise_start
+        #| hint: Use numpy's + operator for element-wise addition
+        #| solution_test: result should be new Tensor with correct values
+        #| difficulty: medium
+        
+        ### BEGIN SOLUTION
+        result_data = self._data + other._data
+        return Tensor(result_data)
+        ### END SOLUTION
+        
+        #| exercise_end
+        
+    def multiply(self, other: 'Tensor') -> 'Tensor':
+        """
+        Multiply two tensors element-wise.
+        
+        Args:
+            other: Another tensor to multiply
+            
+        Returns:
+            New tensor with element-wise product
+        """
+        #| exercise_start
+        #| hint: Use numpy's * operator for element-wise multiplication
+        #| solution_test: result should be new Tensor with correct values
+        #| difficulty: medium
+        
+        ### BEGIN SOLUTION
+        result_data = self._data * other._data
+        return Tensor(result_data)
+        ### END SOLUTION
+        
+        #| exercise_end
+        
+    def matmul(self, other: 'Tensor') -> 'Tensor':
+        """
+        Matrix multiplication of two tensors.
+        
+        Args:
+            other: Another tensor for matrix multiplication
+            
+        Returns:
+            New tensor with matrix product
+            
+        Raises:
+            ValueError: If shapes are incompatible for matrix multiplication
+        """
+        #| exercise_start
+        #| hint: Use np.dot() for matrix multiplication, check shapes first
+        #| solution_test: result should handle shape validation and matrix multiplication
+        #| difficulty: hard
+        
+        ### BEGIN SOLUTION
+        # Check shape compatibility
+        if len(self.shape) != 2 or len(other.shape) != 2:
+            raise ValueError("Matrix multiplication requires 2D tensors")
+        
+        if self.shape[1] != other.shape[0]:
+            raise ValueError(f"Cannot multiply shapes {self.shape} and {other.shape}")
+        
+        result_data = np.dot(self._data, other._data)
+        return Tensor(result_data)
+        ### END SOLUTION
+        
+        #| exercise_end
diff --git a/tinytorch/core/utils.py b/tinytorch/core/utils.py
index f7109c7f..abf1679f 100644
--- a/tinytorch/core/utils.py
+++ b/tinytorch/core/utils.py
@@ -299,3 +299,28 @@ class DeveloperProfile:
         ### END SOLUTION
         
         #| exercise_end
+
+    def get_full_profile(self):
+        """
+        Get complete profile with ASCII art.
+        
+        Return full profile display including ASCII art and all details.
+        """
+        #| exercise_start
+        #| hint: Format with ASCII art, then developer details with emojis
+        #| solution_test: Should return complete profile with ASCII art and details
+        #| difficulty: medium
+        #| points: 10
+        
+        ### BEGIN SOLUTION
+        return f"""{self.ascii_art}
+        
+👨‍💻 Developer: {self.name}
+🏛️  Affiliation: {self.affiliation}
+📧 Email: {self.email}
+🐙 GitHub: @{self.github_username}
+🔥 Ready to build ML systems from scratch!
+"""
+        ### END SOLUTION
+        
+        #| exercise_end