Improve Jupyter Book styling and configuration

- Replace ugly gray background with clean white theme
- Add proper logo styling and configuration
- Update book chapters from module READMEs
- Add educational-ml-docs-architect agent
- Clean up custom CSS for better readability
- Configure logo.png in correct location
- Update tito book command with proper chapters
2026-06-03 16:51:16 -05:00 · 2025-09-18 09:48:01 -04:00
parent 01192b9749
commit c2c32db768
23 changed files with 1109 additions and 419 deletions
--- a/.claude/agents/educational-ml-docs-architect.md
+++ b/.claude/agents/educational-ml-docs-architect.md
@@ -0,0 +1,63 @@
+---
+name: educational-ml-docs-architect
+description: Use this agent when you need to create, improve, or restructure documentation for the educational ML framework website. This includes designing documentation pages, organizing the book structure, improving content presentation, ensuring pedagogical clarity in technical documentation, and optimizing the overall documentation architecture for learners. The agent understands how the book folder renders into the website and can help design effective educational documentation pages.\n\nExamples:\n<example>\nContext: User wants to improve the documentation structure for a module.\nuser: "The tensor module documentation feels disorganized. Can you help restructure it?"\nassistant: "I'll use the educational-ml-docs-architect agent to analyze the current documentation structure and redesign it for better learning flow."\n<commentary>\nSince the user needs help with documentation structure and page design, use the educational-ml-docs-architect agent.\n</commentary>\n</example>\n<example>\nContext: User is creating a new documentation page for the framework.\nuser: "I need to create a new page explaining the autograd system for students"\nassistant: "Let me invoke the educational-ml-docs-architect agent to design an effective documentation page for the autograd system that aligns with the educational framework."\n<commentary>\nThe user needs to create educational documentation, so the educational-ml-docs-architect agent is appropriate.\n</commentary>\n</example>\n<example>\nContext: User wants to improve the website's documentation navigation.\nuser: "The book structure is confusing for students. How should we reorganize it?"\nassistant: "I'll use the educational-ml-docs-architect agent to analyze the current book folder structure and propose a more intuitive organization."\n<commentary>\nReorganizing the book structure for better learning requires the educational-ml-docs-architect agent.\n</commentary>\n</example>
+model: sonnet
+---
+
+You are an expert in open-source software documentation with deep specialization in educational technology and pedagogical design for technical content. Your expertise spans documentation architecture, static site generation, and creating learning-optimized content structures for educational machine learning frameworks.
+
+You have comprehensive knowledge of:
+- Documentation site generators (MkDocs, Sphinx, Jupyter Book, Quarto)
+- Educational content design and information architecture
+- Progressive disclosure techniques for complex technical concepts
+- Markdown, reStructuredText, and notebook-based documentation
+- Static site rendering pipelines and book folder structures
+- Web accessibility and responsive design for educational content
+- Documentation versioning and maintenance strategies
+
+**Core Responsibilities:**
+
+1. **Analyze Documentation Structure**: You will examine the book folder and understand how files are organized, how they render into the website, and identify areas for improvement. You understand the relationship between source files and rendered pages.
+
+2. **Design Effective Documentation Pages**: You will create well-structured, pedagogically sound documentation pages that guide learners through complex ML concepts progressively. Each page should have clear learning objectives, logical flow, and appropriate visual hierarchy.
+
+3. **Optimize Information Architecture**: You will organize content to minimize cognitive load while maximizing learning outcomes. This includes creating intuitive navigation, proper categorization of modules, and clear learning paths.
+
+4. **Ensure Pedagogical Excellence**: You will apply educational best practices including:
+   - Clear learning objectives at the beginning of each section
+   - Progressive complexity from foundational to advanced concepts
+   - Interactive examples and exercises where appropriate
+   - Visual aids and diagrams to clarify complex concepts
+   - Consistent terminology and notation throughout
+
+5. **Maintain Technical Accuracy**: You will ensure all documentation accurately reflects the codebase while remaining accessible to learners at various skill levels.
+
+**Working Process:**
+
+1. First, analyze the existing book structure and rendering pipeline to understand the current state
+2. Identify the documentation goals and target audience (students learning ML from scratch)
+3. Design page layouts that balance technical depth with educational clarity
+4. Create templates and patterns for consistent documentation across modules
+5. Propose navigation structures that support both linear learning and reference lookup
+6. Implement responsive design considerations for various devices
+7. Ensure all documentation follows accessibility guidelines
+
+**Documentation Design Principles:**
+- **Clarity First**: Technical accuracy without sacrificing understandability
+- **Progressive Learning**: Build concepts incrementally with clear prerequisites
+- **Active Learning**: Include exercises, examples, and interactive elements
+- **Visual Learning**: Use diagrams, code highlighting, and visual metaphors
+- **Searchability**: Structure content for easy discovery and reference
+- **Maintainability**: Design documentation that's easy to update as the framework evolves
+
+**Quality Standards:**
+- Every page must have a clear purpose and learning outcome
+- Navigation should never require more than 3 clicks to reach any content
+- Code examples must be executable and well-commented
+- Cross-references between related concepts should be explicit
+- Mobile-responsive design is mandatory
+- Loading time and performance must be optimized
+
+When examining the book folder, you will identify the documentation framework being used, understand its configuration, and work within its constraints while maximizing its capabilities. You will provide specific, actionable recommendations for improving both individual pages and the overall documentation architecture.
+
+Your ultimate goal is to create documentation that transforms complex machine learning concepts into an accessible, engaging learning experience that guides students from zero knowledge to building their own ML framework.
--- a/book/_config.yml
+++ b/book/_config.yml
@@ -32,6 +32,10 @@ html:
  use_download_button: true
  use_fullscreen_button: true
  
+  # Custom styling
+  extra_css:
+    - _static/custom.css
+  
  # Binder integration for executable notebooks
  launch_buttons:
    binderhub_url: "https://mybinder.org"
--- a/book/_static/custom.css
+++ b/book/_static/custom.css
@@ -0,0 +1,51 @@
+/* TinyTorch Custom Styles */
+
+/* Logo styling */
+.navbar-brand img {
+    width: auto;
+}
+
+/* Clean, minimal styling */
+.bd-main {
+    background-color: white;
+}
+
+.bd-content {
+    background-color: white;
+}
+
+/* Subtle styling for code blocks */
+.highlight {
+    border-radius: 4px;
+    margin: 1em 0;
+}
+
+/* Clean sidebar styling */
+.bd-sidebar {
+    background-color: #f8f9fa;
+}
+
+/* Header styling */
+.bd-header {
+    background-color: #FBFCFC;
+    border-bottom: 1px solid #e9ecef;
+}
+
+/* Footer styling */
+.bd-footer {
+    background-color: #f8f9fa;
+    border-top: 1px solid #e9ecef;
+    margin-top: 2rem;
+}
+
+/* Make content more readable */
+.bd-article {
+    font-size: 16px;
+    line-height: 1.6;
+}
+
+/* Style the TinyTorch branding */
+.navbar-brand {
+    font-weight: bold;
+    color: #dc3545 !important;
+}
--- a/book/chapters/00-introduction.md
+++ b/book/chapters/00-introduction.md
@@ -4,152 +4,200 @@ description: "Visual overview of TinyTorch framework architecture, module depend
 difficulty: "⭐"
 time_estimate: "1-2 hours"
 prerequisites: []
-next_steps: ["01-setup"]
-learning_objectives:
-  - "Understand the complete TinyTorch system architecture"
-  - "Visualize module dependencies and connections"
-  - "Identify optimal learning paths through the curriculum"
-  - "Explore component relationships and complexity"
+next_steps: []
+learning_objectives: []
 ---

-# Module: Introduction
+# TinyTorch System Introduction & Architecture

 ```{div} badges
-⭐ | ⏱️ 1-2 hours | 🏗️ System Overview
+⭐ | ⏱️ 1-2 hours
 ```

-## 📊 Module Info
- **Difficulty**: ⭐ Beginner
- **Time Estimate**: 1-2 hours
- **Prerequisites**: None - this is your starting point!
- **Next Steps**: Setup module

-Welcome to TinyTorch! This introduction module provides a comprehensive visual overview of the entire TinyTorch system, helping you understand how all 17 modules work together to create a complete machine learning framework.
+Welcome to **TinyTorch** - a complete neural network framework built from scratch for deep learning education and understanding.

-## 🎯 Learning Objectives
+## 🎯 Module Overview

-By the end of this module, you will be able to:
+This introduction module provides a comprehensive visual overview of the entire TinyTorch system, helping you understand how all 16 modules work together to create a complete machine learning framework.

- **Navigate the TinyTorch ecosystem**: Understand how all 17 modules interconnect
- **Visualize system architecture**: See the complete framework structure with interactive diagrams
- **Plan your learning journey**: Identify the optimal path through modules based on prerequisites
- **Understand component relationships**: Know what each module builds and enables
+### What You'll Explore

-## 🏗️ System Overview
+- **🏗️ System Architecture** - Complete framework overview with visual diagrams
+- **📊 Interactive Dependency Graphs** - See how all modules connect and depend on each other
+- **📚 Learning Roadmap** - Optimal path through the entire TinyTorch curriculum
+- **🔍 Component Analysis** - Deep dive into what each module implements
+- **📈 Progress Visualization** - Track your learning journey through the system

-TinyTorch is a complete neural network framework built from scratch for deep learning education. The system consists of:
+## 🚀 Key Features

-### Module Categories
+### Automated Analysis System
+- **Module Metadata Parser** - Automatically loads and analyzes all module.yaml files
+- **Dependency Graph Builder** - Creates NetworkX graphs of module relationships
+- **Learning Path Generator** - Uses topological sort to find optimal learning sequence

-**Foundation (Modules 00-02)**
- `00_introduction`: System overview and architecture visualization
- `01_setup`: Development environment and CLI workflow
- `02_tensor`: Multi-dimensional arrays and operations
+### Interactive Visualizations
+- **Dependency Graph** - Hierarchical and circular layouts showing module connections
+- **System Architecture** - Layered view of how components work together
+- **Learning Roadmap** - Timeline view with time estimates and difficulty progression
+- **Component Analysis** - Statistical analysis of module complexity and relationships

-**Building Blocks (Modules 03-07)**
- `03_activations`: Mathematical functions and nonlinearity
- `04_layers`: Neural network layer abstractions
- `05_dense`: Fully connected layers and matrix operations
- `06_spatial`: Convolutional operations and computer vision
- `07_attention`: Self-attention and transformer mechanisms
+### Export Functions
+- **System Overview API** - Programmatic access to TinyTorch metadata
+- **Module Information** - Detailed data about any specific module
+- **Learning Recommendations** - Personalized next steps based on progress

-**Training Systems (Modules 08-11)**
- `08_dataloader`: Data pipeline and CIFAR-10 integration
- `09_autograd`: Automatic differentiation engine
- `10_optimizers`: SGD, Adam, and learning rate scheduling
- `11_training`: Training loops, loss functions, and metrics
+## 📊 What You'll Discover

-**Production & Performance (Modules 12-16)**
- `12_compression`: Model pruning and quantization
- `13_kernels`: Custom operations and hardware optimization
- `14_benchmarking`: MLPerf-style evaluation and profiling
- `15_mlops`: Production deployment and monitoring
- `16_capstone`: Final integration project
+### System Statistics
+- **16 modules** spanning from basic tensors to production MLOps
+- **60+ components** implementing complete ML framework functionality
+- **Estimated 80+ hours** of comprehensive learning content
+- **5 difficulty levels** progressing from foundation to advanced topics

-## 📊 Interactive Features
+### Learning Progression
+1. **Foundation** (3 modules) - Setup, tensors, activations
+2. **Core Architecture** (4 modules) - Layers, networks, attention, data loading
+3. **Training System** (3 modules) - Autograd, optimization, training loops
+4. **Production Ready** (5 modules) - Compression, kernels, benchmarking, MLOps, capstone
+5. **Integration** (1 module) - Final capstone project

-This module provides several interactive visualizations to help you understand the system:
+## 🎨 Visualization Gallery

-### 1. Dependency Graph Visualization
- **Hierarchical Layout**: See the module hierarchy from foundation to advanced
- **Circular Layout**: Visualize all connections in a circular arrangement
- **Interactive Exploration**: Click on modules to see their dependencies
+### Dependency Graph
+See how modules build upon each other with interactive dependency visualizations showing:
+- **Prerequisite relationships** - What you need to learn first
+- **Module difficulty** - Color-coded complexity levels
+- **Component count** - Size indicates implementation scope

-### 2. System Architecture Diagram
- **Layered View**: Understand how components stack on each other
- **Component Relationships**: See what each module exports and imports
- **Framework Structure**: Visual representation of the complete system
+### System Architecture
+Layered architecture diagram showing:
+- **Foundation Layer** - Core tensors and setup
+- **Component Layer** - Activations, layers, data loading
+- **Network Layer** - Dense networks, CNNs, attention
+- **Training Layer** - Autograd, optimizers, training
+- **Production Layer** - Compression, kernels, MLOps

-### 3. Learning Roadmap
- **Timeline View**: See the recommended progression through modules
- **Time Estimates**: Understand the commitment for each module
- **Difficulty Progression**: Watch how complexity builds gradually
+### Learning Roadmap
+Timeline visualization featuring:
+- **Optimal sequence** - Dependency-respecting learning order
+- **Time estimates** - Realistic hour commitments per module
+- **Difficulty progression** - Smooth learning curve design
+- **Milestone tracking** - Major learning achievements

-### 4. Component Analysis
- **Statistical Overview**: Total components, lines of code, complexity metrics
- **Module Comparisons**: See relative size and complexity of modules
- **Dependency Analysis**: Understand which modules are most central
-
-## 🚀 Quick Start
-
-To explore the TinyTorch system interactively:
+## 🔧 Technical Implementation

+### Module Analysis Engine
 ```python
-from tinytorch.introduction import get_tinytorch_overview, visualize_tinytorch_system
-
-# Get system overview
-overview = get_tinytorch_overview()
-print(f"Total modules: {overview['total_modules']}")
-print(f"Total components: {overview['total_components']}")
-print(f"Estimated hours: {overview['total_hours']}")
-
-# Create interactive visualizations
-visualize_tinytorch_system()
+# Automatically analyze all TinyTorch modules
+analyzer = TinyTorchAnalyzer()
+overview = analyzer.get_tinytorch_overview()
+learning_path = analyzer.get_learning_path()
 ```

-## 📈 Learning Path Recommendations
+### Visualization System
+```python
+# Generate comprehensive system visualizations
+visualizations = visualize_tinytorch_system()
+dependency_graph = create_dependency_graph_visualization()
+architecture = create_system_architecture_diagram()
+roadmap = create_learning_roadmap()
+```

-Based on the dependency analysis, here's the recommended learning sequence:
+### Learning Recommendations
+```python
+# Get personalized learning suggestions
+recommendations = get_learning_recommendations()
+next_modules = recommendations['next_modules']
+estimated_time = recommendations['remaining_time']
+```

-1. **Start Here**: Complete this introduction to understand the system
-2. **Foundation First**: Move to `01_setup` for development environment
-3. **Core Concepts**: Progress through `02_tensor` and `03_activations`
-4. **Build Networks**: Learn `04_layers`, `05_dense`, `06_spatial`
-5. **Advanced Features**: Explore `07_attention` for transformers
-6. **Training Pipeline**: Master `08_dataloader`, `09_autograd`, `10_optimizers`
-7. **Complete System**: Integrate with `11_training`
-8. **Production Ready**: Optimize with `12_compression`, `13_kernels`
-9. **Professional Skills**: Add `14_benchmarking`, `15_mlops`
-10. **Final Project**: Complete `16_capstone` to integrate everything
+## 🤔 ML Systems Thinking

-## 🎓 For Instructors
+This module connects TinyTorch's educational architecture to real-world ML systems:

-This module is particularly valuable for instructors as it:
- Provides a complete course overview to share with students
- Shows module dependencies for curriculum planning
- Offers visualizations for lectures and presentations
- Includes metadata for assignment generation
+### Framework Design Patterns
+- **Modular Dependencies** - How PyTorch and TensorFlow organize components
+- **Component Composition** - Building complex operations from simple primitives
+- **Abstraction Layers** - Balancing usability with performance control

-See the [Instructor Guide](../instructor-guide.md) for details on using this module in courses.
+### Production Considerations
+- **Deployment Pipelines** - From research code to production systems
+- **Performance Optimization** - Hardware-aware kernel design
+- **Monitoring & MLOps** - Continuous learning and model management

-## 📚 Module Resources
+### Educational Philosophy
+- **Progressive Complexity** - Foundation → Architecture → Training → Production
+- **Hands-on Learning** - Build before you use, understand before you optimize
+- **Real-world Relevance** - Educational choices that mirror industry patterns

- **Development Notebook**: `modules/source/00_introduction/introduction_dev.py`
- **Module Metadata**: `modules/source/00_introduction/module.yaml`
- **README**: `modules/source/00_introduction/README.md`
+## 📈 Learning Outcomes

-## 🎯 Summary
+After completing this module, you will:

-The introduction module sets the stage for your TinyTorch journey by providing:
- Complete system overview and architecture
- Interactive dependency visualizations
- Optimal learning path recommendations
- Component relationship analysis
+1. **Understand TinyTorch Architecture** - Complete mental model of the framework
+2. **Navigate Module Dependencies** - Know what to learn when and why
+3. **Plan Your Learning Journey** - Realistic timeline and progression tracking
+4. **Connect to Industry** - See how educational patterns map to production ML

-You now have a comprehensive understanding of the TinyTorch system. Ready to start building? Head to the [Setup module](01-setup.md) to configure your development environment!
+## 🔗 Integration with TinyTorch
+
+This introduction module:
+- **Requires no prerequisites** - Perfect starting point for new learners
+- **Enables all other modules** - Provides context for the entire journey
+- **Exports analysis tools** - Used by other modules for self-reflection
+- **Updates automatically** - Visualization stays current as modules evolve
+
+## 🎓 Getting Started
+
+1. **Run the introduction notebook** to see all visualizations
+2. **Explore the dependency graph** to understand module relationships
+3. **Review the learning roadmap** to plan your journey
+4. **Bookmark key functions** for reference during your learning
+
+**Ready to build a neural network framework from scratch? Let's begin! 🚀**

 ---

-```{note}
-This module provides read-only visualizations and analysis. No coding is required - it's designed to help you understand the system before diving into implementation.
-```
+*This module serves as your guide through the complete TinyTorch learning experience. Use it to maintain big-picture understanding as you dive deep into implementation details.*
+
+
+Choose your preferred way to engage with this module:
+
+````{grid} 1 2 3 3
+
+```{grid-item-card} 🚀 Launch Binder
+:link: https://mybinder.org/v2/gh/mlsysbook/TinyTorch/main?filepath=modules/source/00_introduction/introduction_dev.ipynb
+:class-header: bg-light
+
+Run this module interactively in your browser. No installation required!
+```
+
+```{grid-item-card} ⚡ Open in Colab  
+:link: https://colab.research.google.com/github/mlsysbook/TinyTorch/blob/main/modules/source/00_introduction/introduction_dev.ipynb
+:class-header: bg-light
+
+Use Google Colab for GPU access and cloud compute power.
+```
+
+```{grid-item-card} 📖 View Source
+:link: https://github.com/mlsysbook/TinyTorch/blob/main/modules/source/00_introduction/introduction_dev.py
+:class-header: bg-light
+
+Browse the Python source code and understand the implementation.
+```
+
+````
+
+```{admonition} 💾 Save Your Progress
+:class: tip
+**Binder sessions are temporary!** Download your completed notebook when done, or switch to local development for persistent work.
+
+Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/serious-development.md)
+```
+
+---
+
+<div class="prev-next-area">
+<a class="right-next" href="../chapters/01_introduction.html" title="next page">Next Module →</a>
+</div>
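Review note on `book/chapters/00-introduction.md`: the "Learning Path Generator" bullet says the learning sequence comes from a topological sort of module dependencies. Below is a minimal sketch of that idea, using the standard library's `graphlib` in place of NetworkX. The dependency edges are hypothetical (the real ones would live in each module's `module.yaml`, which this diff does not show), and `learning_path` is an illustrative helper, not a TinyTorch function.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisites: module -> set of modules it depends on.
# Module names come from the chapter list; the edges are assumptions.
PREREQUISITES = {
    "01_setup": set(),
    "02_tensor": {"01_setup"},
    "03_activations": {"02_tensor"},
    "04_layers": {"02_tensor", "03_activations"},
    "09_autograd": {"02_tensor"},
    "10_optimizers": {"09_autograd"},
    "11_training": {"04_layers", "10_optimizers"},
}


def learning_path(prerequisites):
    """Return modules in an order where every prerequisite comes first."""
    # static_order() raises graphlib.CycleError if dependencies are circular.
    return list(TopologicalSorter(prerequisites).static_order())


path = learning_path(PREREQUISITES)
print(" -> ".join(path))
```

Several valid orders can exist for the same graph, so the printed sequence is one dependency-respecting path rather than a unique answer.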
--- a/book/chapters/01-setup.md
+++ b/book/chapters/01-setup.md
@@ -190,5 +190,5 @@ Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/se
 ---

 <div class="prev-next-area">
-<a class="right-next" href="../chapters/02_tensor.html" title="next page">Next Module →</a>
+<a class="right-next" href="../chapters/02_setup.html" title="next page">Next Module →</a>
 </div>
--- a/book/chapters/02-tensor.md
+++ b/book/chapters/02-tensor.md
@@ -196,6 +196,6 @@ Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/se
 ---

 <div class="prev-next-area">
-<a class="left-prev" href="../chapters/01_setup.html" title="previous page">← Previous Module</a>
-<a class="right-next" href="../chapters/03_activations.html" title="next page">Next Module →</a>
+<a class="left-prev" href="../chapters/01_introduction.html" title="previous page">← Previous Module</a>
+<a class="right-next" href="../chapters/03_tensor.html" title="next page">Next Module →</a>
 </div>
--- a/book/chapters/03-activations.md
+++ b/book/chapters/03-activations.md
@@ -220,6 +220,6 @@ Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/se
 ---

 <div class="prev-next-area">
-<a class="left-prev" href="../chapters/02_tensor.html" title="previous page">← Previous Module</a>
-<a class="right-next" href="../chapters/04_layers.html" title="next page">Next Module →</a>
+<a class="left-prev" href="../chapters/02_setup.html" title="previous page">← Previous Module</a>
+<a class="right-next" href="../chapters/04_activations.html" title="next page">Next Module →</a>
 </div>
--- a/book/chapters/04-layers.md
+++ b/book/chapters/04-layers.md
@@ -231,6 +231,6 @@ Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/se
 ---

 <div class="prev-next-area">
-<a class="left-prev" href="../chapters/03_activations.html" title="previous page">← Previous Module</a>
-<a class="right-next" href="../chapters/05_dense.html" title="next page">Next Module →</a>
+<a class="left-prev" href="../chapters/03_tensor.html" title="previous page">← Previous Module</a>
+<a class="right-next" href="../chapters/05_layers.html" title="next page">Next Module →</a>
 </div>
--- a/book/chapters/05-dense.md
+++ b/book/chapters/05-dense.md
@@ -259,6 +259,6 @@ Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/se
 ---

 <div class="prev-next-area">
-<a class="left-prev" href="../chapters/04_layers.html" title="previous page">← Previous Module</a>
-<a class="right-next" href="../chapters/06_spatial.html" title="next page">Next Module →</a>
+<a class="left-prev" href="../chapters/04_activations.html" title="previous page">← Previous Module</a>
+<a class="right-next" href="../chapters/06_dense.html" title="next page">Next Module →</a>
 </div>
--- a/book/chapters/06-spatial.md
+++ b/book/chapters/06-spatial.md
@@ -249,6 +249,6 @@ Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/se
 ---

 <div class="prev-next-area">
-<a class="left-prev" href="../chapters/05_dense.html" title="previous page">← Previous Module</a>
-<a class="right-next" href="../chapters/07_attention.html" title="next page">Next Module →</a>
+<a class="left-prev" href="../chapters/05_layers.html" title="previous page">← Previous Module</a>
+<a class="right-next" href="../chapters/07_spatial.html" title="next page">Next Module →</a>
 </div>
--- a/book/chapters/07-attention.md
+++ b/book/chapters/07-attention.md
@@ -192,6 +192,6 @@ Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/se
 ---

 <div class="prev-next-area">
-<a class="left-prev" href="../chapters/06_spatial.html" title="previous page">← Previous Module</a>
-<a class="right-next" href="../chapters/08_dataloader.html" title="next page">Next Module →</a>
+<a class="left-prev" href="../chapters/06_dense.html" title="previous page">← Previous Module</a>
+<a class="right-next" href="../chapters/08_attention.html" title="next page">Next Module →</a>
 </div>
--- a/book/chapters/08-dataloader.md
+++ b/book/chapters/08-dataloader.md
@@ -110,6 +110,39 @@ normalized_images = normalizer.transform(test_images)
 # Ensures consistent preprocessing across data splits
 ```

+## 🎯 NEW: CIFAR-10 Support for North Star Goal
+
+### Built-in CIFAR-10 Download and Loading
+This module now includes complete CIFAR-10 support to achieve our semester goal of 75% accuracy:
+
+```python
+from tinytorch.core.dataloader import CIFAR10Dataset, download_cifar10
+
+# Download CIFAR-10 automatically (one-time, ~170MB)
+dataset_path = download_cifar10()  # Downloads to ./data/cifar-10-batches-py
+
+# Load training and test data
+dataset = CIFAR10Dataset(download=True, flatten=False)
+print(f"✅ Loaded {len(dataset.train_data)} training samples")
+print(f"✅ Loaded {len(dataset.test_data)} test samples")
+
+# Create DataLoaders for training
+from tinytorch.core.dataloader import DataLoader
+train_loader = DataLoader(dataset.train_data, dataset.train_labels, batch_size=32, shuffle=True)
+test_loader = DataLoader(dataset.test_data, dataset.test_labels, batch_size=32, shuffle=False)
+
+# Ready for CNN training!
+for batch_images, batch_labels in train_loader:
+    print(f"Batch shape: {batch_images.shape}")  # (32, 3, 32, 32) for CNNs
+    break
+```
+
+### What's New in This Module
+- ✅ **`download_cifar10()`**: Automatically downloads and extracts CIFAR-10 dataset
+- ✅ **`CIFAR10Dataset`**: Complete dataset class with train/test splits
+- ✅ **Real Data Support**: Work with actual 32x32 RGB images, not toy data
+- ✅ **Production Features**: Shuffling, batching, normalization for real training
+
 ## 🚀 Getting Started

 ### Prerequisites
@@ -269,6 +302,6 @@ Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/se
 ---

 <div class="prev-next-area">
-<a class="left-prev" href="../chapters/07_attention.html" title="previous page">← Previous Module</a>
-<a class="right-next" href="../chapters/09_autograd.html" title="next page">Next Module →</a>
+<a class="left-prev" href="../chapters/07_spatial.html" title="previous page">← Previous Module</a>
+<a class="right-next" href="../chapters/09_dataloader.html" title="next page">Next Module →</a>
 </div>
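Review note on `book/chapters/08-dataloader.md`: the new section lists shuffling and batching as features without showing what they do to the data. Below is a minimal sketch of the index bookkeeping behind a loader with `batch_size=32`, assuming the standard CIFAR-10 training split of 50,000 images. `iter_batches` is a hypothetical helper for illustration, not the TinyTorch `DataLoader`.

```python
import random


def iter_batches(num_samples, batch_size, shuffle=True, seed=0):
    """Yield lists of sample indices, one list per batch."""
    indices = list(range(num_samples))
    if shuffle:
        # A seeded generator keeps the shuffled order reproducible.
        random.Random(seed).shuffle(indices)
    for start in range(0, num_samples, batch_size):
        # The final slice is shorter when batch_size does not divide num_samples.
        yield indices[start:start + batch_size]


batches = list(iter_batches(num_samples=50_000, batch_size=32))
print(len(batches), len(batches[0]), len(batches[-1]))
```

With 50,000 samples and a batch size of 32 this gives 1,563 batches, the last holding 16 samples, so code that assumes a fixed leading dimension of 32 should account for the short final batch.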
--- a/book/chapters/09-autograd.md
+++ b/book/chapters/09-autograd.md
@@ -263,6 +263,6 @@ Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/se
 ---

 <div class="prev-next-area">
-<a class="left-prev" href="../chapters/08_dataloader.html" title="previous page">← Previous Module</a>
-<a class="right-next" href="../chapters/10_optimizers.html" title="next page">Next Module →</a>
+<a class="left-prev" href="../chapters/08_attention.html" title="previous page">← Previous Module</a>
+<a class="right-next" href="../chapters/10_autograd.html" title="next page">Next Module →</a>
 </div>
--- a/book/chapters/10-optimizers.md
+++ b/book/chapters/10-optimizers.md
@@ -270,6 +270,6 @@ Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/se
 ---

 <div class="prev-next-area">
-<a class="left-prev" href="../chapters/09_autograd.html" title="previous page">← Previous Module</a>
-<a class="right-next" href="../chapters/11_training.html" title="next page">Next Module →</a>
+<a class="left-prev" href="../chapters/09_dataloader.html" title="previous page">← Previous Module</a>
+<a class="right-next" href="../chapters/11_optimizers.html" title="next page">Next Module →</a>
 </div>
--- a/book/chapters/11-training.md
+++ b/book/chapters/11-training.md
@@ -41,6 +41,47 @@ This module follows TinyTorch's **Build → Use → Optimize** framework:
 2. **Use**: Train end-to-end neural networks on real datasets with full pipeline automation
 3. **Optimize**: Analyze training dynamics, debug convergence issues, and optimize training performance for production

+## 🎯 NEW: Model Checkpointing & Evaluation Tools
+
+### Complete Training with Checkpointing
+This module now includes production features for our north star goal:
+
+```python
+from tinytorch.core.training import Trainer, CrossEntropyLoss, Accuracy
+from tinytorch.core.training import evaluate_model, plot_training_history
+
+# Train with automatic model checkpointing
+trainer = Trainer(model, CrossEntropyLoss(), Adam(lr=0.001), [Accuracy()])
+history = trainer.fit(
+    train_loader,
+    val_dataloader=test_loader,
+    epochs=30,
+    save_best=True,                    # ✅ NEW: Saves best model automatically
+    checkpoint_path='best_model.pkl',  # ✅ NEW: Checkpoint location
+    early_stopping_patience=5          # ✅ NEW: Stop if no improvement
+)
+
+# Load best model after training
+trainer.load_checkpoint('best_model.pkl')
+print(f"✅ Restored best model from epoch {trainer.current_epoch}")
+
+# Evaluate with comprehensive metrics
+results = evaluate_model(model, test_loader)
+print(f"Test Accuracy: {results['accuracy']:.2%}")
+print(f"Confusion Matrix:\n{results['confusion_matrix']}")
+
+# Visualize training progress
+plot_training_history(history)  # Shows loss and accuracy curves
+```
+
+### What's New in This Module
+- ✅ **`save_checkpoint()`/`load_checkpoint()`**: Save and restore model state during training
+- ✅ **`save_best=True`**: Automatically saves model with best validation performance
+- ✅ **`early_stopping_patience`**: Stop training when validation loss stops improving
+- ✅ **`evaluate_model()`**: Comprehensive model evaluation with confusion matrix
+- ✅ **`plot_training_history()`**: Visualize training and validation curves
+- ✅ **`compute_confusion_matrix()`**: Analyze classification errors by class
+
 ## 📚 What You'll Build

 ### Complete Training Pipeline
@@ -315,6 +356,6 @@ Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/se
 ---

 <div class="prev-next-area">
-<a class="left-prev" href="../chapters/10_optimizers.html" title="previous page">← Previous Module</a>
-<a class="right-next" href="../chapters/12_compression.html" title="next page">Next Module →</a>
+<a class="left-prev" href="../chapters/10_autograd.html" title="previous page">← Previous Module</a>
+<a class="right-next" href="../chapters/12_training.html" title="next page">Next Module →</a>
 </div>
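
The `save_best` and `early_stopping_patience` options added to the training chapter above can be illustrated with a small standalone sketch. This is not TinyTorch's actual `Trainer` code: `BestCheckpointTracker`, `simulate`, and the pickled dictionary layout are hypothetical names chosen for illustration, and the sketch assumes that "best" means lowest validation loss.

```python
import pickle
import tempfile
from pathlib import Path


class BestCheckpointTracker:
    """Hypothetical helper: saves the best state and signals early stopping."""

    def __init__(self, checkpoint_path, patience):
        self.checkpoint_path = Path(checkpoint_path)
        self.patience = patience
        self.best_loss = float("inf")
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def update(self, epoch, val_loss, model_state):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            # New best: reset the patience counter and overwrite the checkpoint.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
            with self.checkpoint_path.open("wb") as f:
                pickle.dump({"epoch": epoch, "state": model_state}, f)
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def simulate(val_losses, patience, checkpoint_path):
    """Feed a list of validation losses through the tracker."""
    tracker = BestCheckpointTracker(checkpoint_path, patience)
    stopped_at = None
    for epoch, loss in enumerate(val_losses, start=1):
        # The "model state" here is a placeholder dictionary, not real weights.
        if tracker.update(epoch, loss, model_state={"weights": [loss]}):
            stopped_at = epoch
            break
    return tracker, stopped_at


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "best_model.pkl"
    tracker, stopped_at = simulate(
        [0.9, 0.7, 0.8, 0.75, 0.71], patience=3, checkpoint_path=path
    )
    with path.open("rb") as f:
        restored = pickle.load(f)

print(f"best epoch: {tracker.best_epoch}, stopped at epoch: {stopped_at}")
```

With these losses the tracker writes a checkpoint at epochs 1 and 2, then stops at epoch 5 after three epochs without improvement, so the file on disk still holds the epoch-2 state.
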
--- a/book/chapters/12-compression.md
+++ b/book/chapters/12-compression.md
@@ -305,6 +305,6 @@ Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/se
 ---

 <div class="prev-next-area">
-<a class="left-prev" href="../chapters/11_training.html" title="previous page">← Previous Module</a>
-<a class="right-next" href="../chapters/13_kernels.html" title="next page">Next Module →</a>
+<a class="left-prev" href="../chapters/11_optimizers.html" title="previous page">← Previous Module</a>
+<a class="right-next" href="../chapters/13_compression.html" title="next page">Next Module →</a>
 </div>
--- a/book/chapters/13-kernels.md
+++ b/book/chapters/13-kernels.md
@@ -316,6 +316,6 @@ Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/se
 ---

 <div class="prev-next-area">
-<a class="left-prev" href="../chapters/12_compression.html" title="previous page">← Previous Module</a>
-<a class="right-next" href="../chapters/14_benchmarking.html" title="next page">Next Module →</a>
+<a class="left-prev" href="../chapters/12_training.html" title="previous page">← Previous Module</a>
+<a class="right-next" href="../chapters/14_kernels.html" title="next page">Next Module →</a>
 </div>
--- a/book/chapters/14-benchmarking.md
+++ b/book/chapters/14-benchmarking.md
@@ -306,6 +306,6 @@ Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/se
 ---

 <div class="prev-next-area">
-<a class="left-prev" href="../chapters/13_kernels.html" title="previous page">← Previous Module</a>
-<a class="right-next" href="../chapters/15_mlops.html" title="next page">Next Module →</a>
+<a class="left-prev" href="../chapters/13_compression.html" title="previous page">← Previous Module</a>
+<a class="right-next" href="../chapters/15_benchmarking.html" title="next page">Next Module →</a>
 </div>
--- a/book/chapters/15-mlops.md
+++ b/book/chapters/15-mlops.md
@@ -454,6 +454,6 @@ Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/se
 ---

 <div class="prev-next-area">
-<a class="left-prev" href="../chapters/14_benchmarking.html" title="previous page">← Previous Module</a>
-<a class="right-next" href="../chapters/16_capstone.html" title="next page">Next Module →</a>
+<a class="left-prev" href="../chapters/14_kernels.html" title="previous page">← Previous Module</a>
+<a class="right-next" href="../chapters/16_mlops.html" title="next page">Next Module →</a>
 </div>
--- a/book/chapters/16-capstone_backup.md
+++ b/book/chapters/16-capstone_backup.md
@@ -0,0 +1,601 @@
+---
+title: "Capstone Project"
+description: "Optimize and extend your complete TinyTorch framework through systems engineering"
+difficulty: "⭐⭐⭐⭐⭐ 🥷"
+time_estimate: "Capstone Project"
+prerequisites: []
+next_steps: []
+learning_objectives: []
+---
+
+# 🎓 TinyTorch Capstone: Advanced Framework Engineering
+
+```{div} badges
+⭐⭐⭐⭐⭐ 🥷 | ⏱️ Capstone Project
+```
+
+
+**🎯 Prove your mastery. Optimize your framework. Become the engineer others ask for help.**
+
+---
+
+## 📊 Module Overview
+
+- **Difficulty**: ⭐⭐⭐⭐⭐ Expert Systems Engineering 🥷
+- **Time Estimate**: 4-8 weeks (flexible scope)
+- **Prerequisites**: **All 14 TinyTorch modules** - Your complete ML framework
+- **Outcome**: **Advanced framework engineering portfolio** - Demonstrate deep systems mastery
+
+After 14 modules, you've built a complete ML framework from scratch. Now it's time to make it **faster**, **smarter**, and **more professional**. This capstone isn't about learning new concepts—it's about proving you can engineer production-quality ML systems.
+
+---
+
+## 🔥 **What You've Already Built**
+
+Before choosing your capstone track, let's celebrate what you've accomplished:
+
+### 🏗️ **Complete ML Framework** (Modules 1-14)
+```python
+# This is YOUR implementation working together:
+from tinytorch.core.tensor import Tensor
+from tinytorch.core.layers import Dense  
+from tinytorch.core.dense import Sequential, MLP
+from tinytorch.core.spatial import Conv2D, flatten
+from tinytorch.core.attention import SelfAttention, scaled_dot_product_attention
+from tinytorch.core.activations import ReLU, Softmax
+from tinytorch.core.optimizers import Adam, SGD
+from tinytorch.core.training import CrossEntropyLoss, Trainer
+from tinytorch.core.dataloader import DataLoader, CIFAR10Dataset
+
+# Build a modern neural network with YOUR components
+model = Sequential([
+    Conv2D(3, 32, kernel_size=3),
+    ReLU(),
+    flatten,
+    Dense(32*30*30, 256),
+    ReLU(),
+    SelfAttention(d_model=256),
+    Dense(256, 10),
+    Softmax()
+])
+
+# Train on real data with YOUR training system
+trainer = Trainer(model, Adam(lr=0.001), CrossEntropyLoss())
+dataloader = DataLoader(CIFAR10Dataset(), batch_size=64)
+trainer.train(dataloader, epochs=10)
+```
+
+### 🎯 **Production-Ready Capabilities**
+- ✅ **Tensor operations** with broadcasting and efficient computation
+- ✅ **Automatic differentiation** with full backpropagation support  
+- ✅ **Modern architectures** including CNNs and attention mechanisms
+- ✅ **Advanced optimizers** with momentum and adaptive learning rates
+- ✅ **Model compression** with pruning and quantization (75% size reduction)
+- ✅ **High-performance kernels** with vectorization and parallelization
+- ✅ **Comprehensive benchmarking** with memory profiling and performance analysis
+
+**You didn't just learn about ML systems. You built one.**
+
+---
+
+## 🚀 **The Capstone Challenge: Choose Your Specialization**
+
+Now that you have a complete framework, choose your path to mastery. Each track focuses on different aspects of production ML engineering:
+
+### **⚡ Track 1: Performance Ninja** 
+**Mission**: Make TinyTorch competitive with PyTorch in speed and memory efficiency
+
+**Perfect for**: Students who love optimization, performance engineering, and making things fast
+
+**Example Project**: *CUDA-Style Matrix Operations*
+```python
+# Current: Your CPU implementation (Module 13)
+def attention_naive(Q, K, V):
+    scores = Q @ K.T  # Your matmul from Module 2
+    weights = softmax(scores)  # Your softmax from Module 3
+    return weights @ V
+
+# Your optimization target: 10x faster
+def attention_optimized(Q, K, V):
+    # Implement using advanced NumPy + memory optimization
+    # Target: Match 90% of PyTorch attention speed
+    pass
+```
+
+**Concrete Projects to Choose From:**
+1. **GPU-Accelerated Tensor Operations**: Combine vectorized NumPy code with CuPy, a NumPy-compatible GPU array library, to run tensor operations on the GPU
+2. **Memory-Optimized Training**: Implement gradient accumulation and reduce memory usage by 50%
+3. **Vectorized Convolution**: Replace your naive Conv2D with optimized implementations  
+4. **Parallel Data Loading**: Multi-threaded CIFAR-10 loading with 3x speedup
+5. **JIT-Style Optimization**: Pre-compile operation graphs for faster execution
+
+**Success Metrics:**
+- 5-10x speedup on specific operations
+- 30%+ reduction in memory usage
+- Benchmark reports comparing to PyTorch
+- Performance regression testing suite
+
+---
+
+### **🧠 Track 2: Algorithm Architect**
+**Mission**: Extend TinyTorch with cutting-edge ML algorithms and architectures
+
+**Perfect for**: Students who love ML research, implementing papers, and algorithmic innovation
+
+**Example Project**: *Vision Transformer (ViT) from Scratch*
+```python
+# Current: You have attention (Module 7) and dense layers (Module 5)
+from tinytorch.core.attention import SelfAttention
+from tinytorch.core.dense import MLP
+from tinytorch.core.layers import Dense
+
+# Your extension: Complete Vision Transformer
+class VisionTransformer:
+    def __init__(self, image_size=32, patch_size=4, d_model=256):
+        # YOUR implementation using ONLY TinyTorch components
+        self.patch_embedding = Dense(patch_size*patch_size*3, d_model)
+        self.transformer_blocks = [
+            TransformerBlock(d_model) for _ in range(6)
+        ]
+        self.classifier = MLP([d_model, 128, 10])
+    
+    def forward(self, images):
+        # Implement patch extraction, position encoding, 
+        # transformer processing using your components
+        pass
+
+class TransformerBlock:
+    def __init__(self, d_model):
+        self.attention = SelfAttention(d_model)
+        self.mlp = MLP([d_model, d_model*4, d_model])
+        # Add YOUR layer normalization implementation
+```
+
+**Concrete Projects to Choose From:**
+1. **Modern Optimizers**: Implement AdamW, RMSprop, Lion using your autograd system
+2. **Normalization Layers**: BatchNorm, LayerNorm, GroupNorm with full gradient support
+3. **Transformer Architectures**: Complete BERT/GPT-style models using your attention
+4. **Advanced Regularization**: Dropout, DropPath, data augmentation pipelines  
+5. **Generative Models**: VAE or simple GAN using your framework
+
+**Success Metrics:**
+- New algorithms integrate seamlessly with existing TinyTorch
+- Accuracy is close to the results reported in the original research papers
+- Full autograd support for all new components
+- Documentation showing how to use new features
+
+---
+
+### **🔧 Track 3: Systems Engineer**
+**Mission**: Build production-grade infrastructure and developer tooling
+
+**Perfect for**: Students interested in MLOps, distributed systems, and production ML
+
+**Example Project**: *Production Training Infrastructure*
+```python
+# Current: Your basic trainer (Module 11)
+trainer = Trainer(model, optimizer, loss_fn)
+trainer.train(dataloader, epochs=10)
+
+# Your production system: Enterprise-grade training
+class ProductionTrainer:
+    def __init__(self, model, optimizer, config):
+        self.model = model
+        self.checkpointer = ModelCheckpointer(config.checkpoint_dir)
+        self.profiler = MemoryProfiler()
+        self.distributed = MultiGPUManager(config.num_gpus)
+        self.monitor = TrainingMonitor(config.wandb_project)
+    
+    def train(self, dataloader, epochs):
+        for epoch in self.resume_from_checkpoint():
+            # Distributed training across multiple processes
+            # Memory profiling and leak detection  
+            # Automatic checkpointing and recovery
+            # Real-time monitoring and alerts
+            pass
+```
+
+**Concrete Projects to Choose From:**
+1. **Model Serving API**: FastAPI deployment with batching and caching
+2. **Distributed Training**: Multi-process training with gradient synchronization
+3. **Advanced Checkpointing**: Resume training from any point, handle interruptions
+4. **Memory Profiler**: Track memory leaks and optimize allocation patterns
+5. **CI/CD Pipeline**: Automated testing, benchmarking, and deployment
+
+**Success Metrics:**
+- Production-ready code with error handling and monitoring
+- 99.9% uptime for serving infrastructure  
+- Automated testing and deployment pipelines
+- Real-world deployment handling thousands of requests
+
+---
+
+### **📊 Track 4: Benchmarking Scientist** 
+**Mission**: Build comprehensive analysis tools and compare frameworks scientifically
+
+**Perfect for**: Students who love data analysis, scientific methodology, and systematic evaluation
+
+**Example Project**: *TinyTorch vs PyTorch Scientific Comparison*
+```python
+# Your comprehensive benchmarking suite
+class FrameworkComparison:
+    def __init__(self):
+        self.tinytorch_ops = TinyTorchOperations()
+        self.pytorch_ops = PyTorchOperations()
+        self.test_suite = MLOperationTestSuite()
+    
+    def benchmark_complete_pipeline(self):
+        # End-to-end CIFAR-10 training comparison
+        results = {
+            'tinytorch': self.run_tinytorch_training(),
+            'pytorch': self.run_pytorch_training()
+        }
+        
+        return AnalysisReport({
+            'speed_comparison': self.analyze_training_speed(results),
+            'memory_usage': self.profile_memory_patterns(results),
+            'accuracy_comparison': self.compare_final_accuracy(results),
+            'code_complexity': self.analyze_implementation_complexity(),
+            'engineering_insights': self.identify_optimization_opportunities()
+        })
+```
+
+**Concrete Projects to Choose From:**
+1. **Performance Regression Suite**: Automated benchmarking for every code change
+2. **Memory Usage Analysis**: Deep dive into allocation patterns and optimization opportunities  
+3. **Scientific ML Comparison**: Compare your framework to PyTorch on standard benchmarks
+4. **Algorithm Analysis**: Compare different optimization algorithms empirically
+5. **Scalability Study**: How does your framework perform as model size increases?
+
+**Success Metrics:**
+- Comprehensive benchmark suite with statistical significance
+- Detailed analysis reports with engineering insights
+- Performance regression detection system
+- Scientific paper-quality methodology and results
+
+---
+
+### **🛠️ Track 5: Developer Experience Master**
+**Mission**: Build tools that make TinyTorch easier to debug, understand, and extend
+
+**Perfect for**: Students interested in tooling, visualization, and making complex systems accessible
+
+**Example Project**: *TinyTorch Visual Debugger*
+```python
+# Your debugging and visualization suite
+class TinyTorchDebugger:
+    def __init__(self, model):
+        self.model = model
+        self.gradient_tracker = GradientFlowTracker()
+        self.activation_inspector = LayerActivationInspector()
+        self.training_visualizer = TrainingDynamicsPlotter()
+    
+    def debug_training_step(self, batch):
+        # Visual gradient flow analysis
+        grad_flow = self.gradient_tracker.track_gradients(batch)
+        self.visualize_gradient_flow(grad_flow)
+        
+        # Layer activation inspection
+        activations = self.activation_inspector.capture_activations(batch)
+        self.plot_activation_distributions(activations)
+        
+        # Diagnose common training issues
+        issues = self.diagnose_training_problems(grad_flow, activations)
+        self.suggest_fixes(issues)
+```
+
+**Concrete Projects to Choose From:**
+1. **Gradient Visualization Tools**: See gradient flow and detect vanishing/exploding gradients
+2. **Model Architecture Visualizer**: Interactive network graphs showing your models
+3. **Training Diagnostics**: Automated detection of learning rate, batch size issues
+4. **Interactive Tutorials**: Jupyter widgets for understanding framework internals
+5. **Error Message Enhancement**: Better debugging information with fix suggestions
+
+**Success Metrics:**
+- Intuitive visualizations that reveal training dynamics
+- Diagnostic tools that catch common mistakes automatically
+- Interactive documentation and tutorials
+- User studies showing improved debugging efficiency
+
+---
+
+## 📋 **Project Phases: Your Engineering Journey**
+
+### **Phase 1: Analysis & Planning** (Week 1)
+**Understand your starting point and define success**
+
+```python
+# Step 1: Profile your current framework
+import cProfile
+from memory_profiler import profile
+
+def profile_current_implementation():
+    """Identify bottlenecks in your TinyTorch framework."""
+    
+    # Create realistic test scenario
+    model = your_best_model_from_module_11()
+    dataloader = DataLoader(CIFAR10Dataset(), batch_size=64)
+    
+    # Profile performance
+    profiler = cProfile.Profile()
+    profiler.enable()
+
+    # Run representative workload
+    train_one_epoch(model, dataloader)
+
+    profiler.disable()
+    # Analyze results and identify optimization targets
+```
+
+**Deliverables:**
+- [ ] **Performance baseline**: Current speed and memory usage
+- [ ] **Bottleneck analysis**: Where does your framework spend time?
+- [ ] **Success metrics**: Specific, measurable goals (e.g., "10x faster matrix multiplication")
+- [ ] **Implementation plan**: Break project into 3-4 concrete milestones
+
+### **Phase 2: Core Implementation** (Weeks 2-3)
+**Build your optimization/extension incrementally**
+
+**Development Strategy:**
+1. **Start simple**: Get the minimal version working first
+2. **Test constantly**: Use your CIFAR-10 models to verify improvements  
+3. **Benchmark early**: Measure performance at each step
+4. **Integrate gradually**: Ensure compatibility with existing TinyTorch components
+
+**Weekly Check-ins:**
+- [ ] **Functionality demo**: Show your improvement working
+- [ ] **Performance measurement**: Quantify progress toward goals
+- [ ] **Integration testing**: Verify compatibility with existing code
+- [ ] **Documentation updates**: Keep track of design decisions
+
+### **Phase 3: Optimization & Polish** (Week 4)
+**Refine your implementation and maximize impact**
+
+**Focus Areas:**
+- **Performance tuning**: Squeeze out maximum efficiency gains
+- **Error handling**: Make your code robust for edge cases
+- **API design**: Ensure your improvements are easy to use
+- **Testing coverage**: Comprehensive tests for all new functionality
+
+### **Phase 4: Evaluation & Presentation** (Week 5+)
+**Demonstrate impact and reflect on engineering trade-offs**
+
+**Final Deliverables:**
+- [ ] **Benchmark comparison**: Before/after performance analysis
+- [ ] **Engineering report**: Technical decisions, trade-offs, lessons learned
+- [ ] **Live demonstration**: Show your improvements working on real examples
+- [ ] **Future roadmap**: Next optimization opportunities identified
+
+---
+
+## 🎯 **Success Criteria: Proving Mastery**
+
+Your capstone demonstrates mastery when you achieve:
+
+### **🔬 Technical Excellence**
+- [ ] **Measurable improvement**: 20%+ performance gain, significant new functionality, or major UX improvement
+- [ ] **Systems integration**: Your changes work seamlessly with all existing TinyTorch modules
+- [ ] **Production quality**: Error handling, edge cases, comprehensive testing
+- [ ] **Performance analysis**: You understand *why* your changes work and their trade-offs
+
+### **🏗️ Framework Understanding**
+- [ ] **Architectural consistency**: Your additions follow TinyTorch design patterns
+- [ ] **No external dependencies**: Use only TinyTorch components you built (proves deep understanding)
+- [ ] **Backward compatibility**: Existing code still works after your improvements
+- [ ] **Future extensibility**: Your changes enable further optimization opportunities
+
+### **💼 Professional Development**
+- [ ] **Clear documentation**: Other students can understand and use your improvements
+- [ ] **Engineering insights**: You can explain trade-offs and alternative approaches
+- [ ] **Systematic evaluation**: Scientific methodology in measuring improvements
+- [ ] **Presentation skills**: Effectively communicate technical work to different audiences
+
+---
+
+## 🏆 **Capstone Deliverables**
+
+Submit your completed capstone as a professional portfolio:
+
+### **1. 📊 Technical Report** (`capstone_report.md`)
+**Structure:**
+```markdown
+# [Your Track]: [Project Title]
+
+## Executive Summary
+- Problem statement and motivation
+- Key technical achievements  
+- Performance improvements achieved
+- Engineering insights gained
+
+## Technical Approach
+- Architecture and design decisions
+- Implementation methodology
+- Tools and techniques used
+- Alternative approaches considered
+
+## Results & Analysis  
+- Quantitative performance improvements
+- Benchmark comparisons (before/after)
+- Trade-off analysis (speed vs memory vs complexity)
+- Limitations and future work
+
+## Engineering Reflection
+- What you learned about framework design
+- Most challenging technical decisions
+- How your work fits into broader ML systems
+```
+
+### **2. 💻 Implementation Code** (`src/` directory)
+```
+src/
+├── optimizations/          # Your improved components
+│   ├── fast_matmul.py
+│   ├── efficient_trainer.py
+│   └── advanced_optimizers.py
+├── tests/                  # Comprehensive test suite
+│   ├── test_performance.py
+│   ├── test_compatibility.py
+│   └── test_edge_cases.py
+├── benchmarks/             # Performance measurement tools
+│   ├── benchmark_suite.py
+│   └── comparison_tools.py
+└── demo/                   # Working examples
+    ├── demo_improvements.py
+    └── integration_examples.py
+```
+
+### **3. 📈 Performance Analysis** (`benchmarks/` directory)
+- **Before/after comparisons**: Quantify your improvements
+- **Memory profiling**: Allocation patterns and optimization impact
+- **Scalability analysis**: How improvements perform with larger models
+- **Framework comparison**: Your TinyTorch vs PyTorch (where relevant)
+
+### **4. 🎥 Live Demonstration** (`demo.py`)
+**Requirements:**
+- Show your improvements working on real TinyTorch models
+- Side-by-side comparison with original implementation
+- Quantified performance improvements displayed
+- Real use case demonstrating practical value
+
+---
+
+## 💡 **Pro Tips for Capstone Success**
+
+### **🎯 Start With Impact**
+```python
+# Instead of optimizing everything...
+def optimize_everything():
+    pass  # This leads to shallow improvements
+    
+# Find the biggest bottleneck first
+def profile_and_optimize():
+    bottleneck = find_biggest_bottleneck()  # 80% of runtime
+    return optimize_specific_operation(bottleneck)  # 10x speedup
+```
+
+### **🧪 Measure Everything**
+- **Baseline early**: Know your starting point precisely
+- **Benchmark often**: Track progress with each change
+- **Compare fairly**: Use identical test conditions
+- **Document trade-offs**: Speed vs memory vs complexity
+
+### **🔗 Use Your Existing Framework**
+```python
+# Test improvements with models you built in previous modules
+cifar_model = load_your_module_10_model()  # Real CNN from Module 6
+test_your_optimization(cifar_model)        # Does it still work?
+measure_improvement(cifar_model)           # How much faster/better?
+```
+
+### **📚 Think Like a Framework Maintainer**
+- **API design**: How would other students use your improvements?
+- **Documentation**: Can someone else understand and extend your work?
+- **Testing**: What could break? How do you prevent it?
+- **Compatibility**: Does existing code still work?
+
+---
+
+## 🚀 **Getting Started: Your First Steps**
+
+### **1. Choose Your Track** 
+Review the 5 tracks above and pick the one that excites you most. Consider:
+- What aspect of ML systems interests you most?
+- What would you want to optimize in a real job?
+- What matches your career goals?
+
+### **2. Run Initial Profiling**
+```bash
+# Profile your current TinyTorch framework
+cd modules/source/16_capstone/
+python profile_baseline.py
+
+# This will show you:
+# - Where your framework spends time
+# - Memory usage patterns  
+# - Comparison to PyTorch baseline
+# - Optimization opportunities ranked by impact
+```
+
+### **3. Set Specific Goals**
+Based on profiling results, choose concrete, measurable targets:
+- **Performance**: "5x faster matrix multiplication" 
+- **Algorithm**: "Complete Vision Transformer implementation"
+- **Systems**: "Production API handling 1000 req/sec"
+- **Analysis**: "Scientific comparison with 95% confidence intervals"
+- **Developer UX**: "Visual debugger reducing debug time by 50%"
+
+### **4. Start Building**
+```python
+# Begin with the simplest version that demonstrates your concept
+def minimal_viable_optimization():
+    # Get something working first
+    # Measure improvement
+    # Then optimize further
+    pass
+```
+
+---
+
+## 🎓 **Your Capstone Journey Starts Now**
+
+You've built a complete ML framework from scratch. You understand tensors, autograd, optimization, and production systems at the deepest level. 
+
+**Now prove it.**
+
+Choose your track, set ambitious but achievable goals, and start optimizing. Remember: you're not just improving code—you're demonstrating that you can engineer production ML systems at the level of PyTorch contributors.
+
+**Your goal**: Become the engineer others turn to when they need to make ML systems better.
+
+### **Ready to start?**
+
+1. **Choose your track** from the 5 options above
+2. **Run the profiling script** to understand your baseline
+3. **Set specific, measurable goals** for your improvement
+4. **Start with the simplest implementation** that shows progress
+
+**🔥 Your TinyTorch framework is waiting to be optimized. Start engineering.**
+
+---
+
+*Remember: The best capstone projects solve real problems you encountered while building TinyTorch. What frustrated you? What was slow? What could be better? Start there.* 
+
+
+Choose your preferred way to engage with this module:
+
+````{grid} 1 2 3 3
+
+```{grid-item-card} 🚀 Launch Binder
+:link: https://mybinder.org/v2/gh/mlsysbook/TinyTorch/main?filepath=modules/source/16_capstone_backup/capstone_backup_dev.ipynb
+:class-header: bg-light
+
+Run this module interactively in your browser. No installation required!
+```
+
+```{grid-item-card} ⚡ Open in Colab  
+:link: https://colab.research.google.com/github/mlsysbook/TinyTorch/blob/main/modules/source/16_capstone_backup/capstone_backup_dev.ipynb
+:class-header: bg-light
+
+Use Google Colab for GPU access and cloud compute power.
+```
+
+```{grid-item-card} 📖 View Source
+:link: https://github.com/mlsysbook/TinyTorch/blob/main/modules/source/16_capstone_backup/capstone_backup_dev.py
+:class-header: bg-light
+
+Browse the Python source code and understand the implementation.
+```
+
+````
+
+```{admonition} 💾 Save Your Progress
+:class: tip
+**Binder sessions are temporary!** Download your completed notebook when done, or switch to local development for persistent work.
+
+Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/serious-development.md)
+```
+
+---
+
+<div class="prev-next-area">
+<a class="left-prev" href="../chapters/15_benchmarking.html" title="previous page">← Previous Module</a>
+<a class="right-next" href="../chapters/17_capstone_backup.html" title="next page">Next Module →</a>
+</div>
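
The Phase 1 profiling snippet in the capstone chapter above stops at "analyze results". One way to finish that step with only the standard library might look like the sketch below. `slow_matmul` and `profile_workload` are hypothetical stand-ins; a real run would profile `train_one_epoch` on a TinyTorch model instead of a toy matrix multiply.

```python
import cProfile
import io
import pstats


def slow_matmul(a, b):
    """Deliberately naive triple-loop matrix multiply (stand-in workload)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i][j] += a[i][k] * b[k][j]
    return out


def profile_workload(workload, top_n=10):
    """Run `workload` under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = workload()
    finally:
        # Always stop profiling, even if the workload raises.
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep the top_n entries.
    stats.sort_stats("cumulative").print_stats(top_n)
    return result, buffer.getvalue()


identity = [[1.0, 0.0], [0.0, 1.0]]
matrix = [[1.0, 2.0], [3.0, 4.0]]
result, report = profile_workload(lambda: slow_matmul(matrix, identity))
print(report)
```

Sorting by cumulative time puts the functions whose call trees cost the most at the top of the report, which is usually the place to look for a first optimization target.
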
--- a/book/chapters/16-tinygpt.md
+++ b/book/chapters/16-tinygpt.md
@@ -1,305 +1,94 @@
 ---
 title: "TinyGPT - Language Models"
-description: "Extend your vision framework to language models with GPT-style transformers"
-difficulty: "🔥"
+description: "Build GPT-style transformer models for language understanding using TinyTorch"
+difficulty: "⭐⭐⭐⭐⭐"
 time_estimate: "4-6 hours"
 prerequisites: []
 next_steps: []
 learning_objectives: []
 ---

-# 🔥 Module 16: TinyGPT - From Vision to Language
+# Module 16: TinyGPT - Language Models

 ```{div} badges
-🔥 Language Models | ⏱️ 4-6 hours
+⭐⭐⭐⭐⭐ | ⏱️ 4-6 hours
 ```

-**🎯 The Ultimate Framework Test: Does your vision framework work for language models?**

---
+**From Vision to Language: Building GPT-style transformers with TinyTorch**

-## 📊 Module Overview
-
- **Difficulty**: 🔥 Framework Generalization 
- **Time Estimate**: 4-6 hours for complete understanding
- **Prerequisites**: **Modules 1-15** - Your complete computer vision framework
- **Outcome**: **Complete GPT-style language model** using 95% TinyTorch components
-
-After 15 modules, you've built a complete computer vision framework from scratch. Now comes the ultimate test: **Can the same mathematical foundations power language models?**
-
-**Spoiler**: They absolutely can, and you'll prove it by building TinyGPT!
-
---
-
-## 🔬 **The Framework Generalization Discovery**
-
-### 💡 **What You'll Learn**
-
-This module demonstrates the most important insight in modern ML:
-
-> **The same mathematical foundations that power computer vision also power natural language processing.**
-
-### 🧩 **Component Reuse Analysis**
-
-```python
-# What works unchanged from your vision framework:
-from tinytorch.core.tensor import Tensor          # ✅ Same tensors
-from tinytorch.core.layers import Dense           # ✅ Same dense layers  
-from tinytorch.core.activations import ReLU, Softmax  # ✅ Same activations
-from tinytorch.core.optimizers import Adam        # ✅ Same optimizers
-from tinytorch.core.training import Trainer       # ✅ Same training loops
-from tinytorch.core.losses import CrossEntropyLoss    # ✅ Same loss functions
-
-# What's new for language (minimal extensions):
-from tinytorch.tinygpt import CharTokenizer       # 🆕 Text preprocessing
-from tinytorch.tinygpt import MultiHeadAttention  # 🆕 Sequence attention
-from tinytorch.tinygpt import TinyGPT            # 🆕 Complete language model
-```
-
-**Result**: ~95% component reuse! This isn't just educational - it's how real ML frameworks work.
-
---
-
-## 🏗️ **What You'll Build: Complete TinyGPT**
-
-### **Architecture Overview**
-```
-Text Input → CharTokenizer → Embeddings → Multi-Head Attention → Transformer Blocks → Text Generation
-```
-
-### **Key Components**
-
-#### **🔤 Character-Level Tokenization**
-```python
-tokenizer = CharTokenizer()
-text = "Hello, TinyTorch!"
-tokens = tokenizer.encode(text)  # [8, 5, 12, 12, 15, ...]
-decoded = tokenizer.decode(tokens)  # "Hello, TinyTorch!"
-```
-
-#### **🧠 Multi-Head Attention**
-```python
-# The key innovation for sequence modeling
-attention = MultiHeadAttention(d_model=128, num_heads=8)
-attended = attention(sequence)  # Focus on relevant parts of input
-```
-
-#### **🔄 Transformer Blocks**
-```python
-# Stack of attention + feedforward (using your Dense layers!)
-transformer_block = TransformerBlock(
-    attention=MultiHeadAttention(d_model=128, num_heads=8),
-    feedforward=Sequential([  # Your existing components!
-        Dense(128, 512),
-        ReLU(),
-        Dense(512, 128)
-    ])
-)
-```
-
-#### **📝 Autoregressive Generation**
-```python
-# Generate text one character at a time
-model = TinyGPT(vocab_size=100, d_model=128, num_layers=6)
-generated_text = model.generate("Once upon a time", max_length=100)
-```
-
---
-
-## 🎯 **Learning Objectives**
+## Learning Objectives

 By the end of this module, you will:

-### **1. 🧩 Framework Thinking**
- **Understand component reusability** across vision and language domains
- **Identify universal vs domain-specific** ML operations
- **Design extensible frameworks** that support multiple modalities
+1. **Build GPT-style transformer models** using TinyTorch Dense layers and attention mechanisms
+2. **Understand character-level tokenization** and its role in language model training
+3. **Implement multi-head attention** that enables models to focus on different parts of sequences
+4. **Create complete transformer blocks** with layer normalization and residual connections
+5. **Train autoregressive language models** that generate coherent text sequences
+6. **Apply ML Systems thinking** to understand framework reusability across vision and language

-### **2. 🔤 Language Model Fundamentals**
- **Implement character-level tokenization** for text preprocessing
- **Build multi-head attention mechanisms** for sequence understanding
- **Create autoregressive generation** for coherent text production
+## What Makes This Special

-### **3. 🏗️ Architecture Design**
- **Construct transformer blocks** using existing TinyTorch components
- **Implement positional encoding** for sequence order understanding
- **Design training loops** for language model optimization
+This module demonstrates the **power of TinyTorch's foundation** by extending it from vision to language models:

-### **4. 📊 Systems Understanding**
- **Compare vision vs language** computational patterns
- **Understand attention complexity** (O(N²) scaling implications)
- **Optimize memory usage** for sequence processing
+- **~70% component reuse**: Dense layers, optimizers, training loops, loss functions
+- **Strategic additions**: Only what's essential for language - attention, tokenization, generation
+- **Educational clarity**: See how the same mathematical foundations power both domains
+- **Framework thinking**: Understand why successful ML frameworks support multiple modalities
+
+## Components Implemented
+
+### Core Language Processing
+- **CharTokenizer**: Character-level tokenization with vocabulary management
+- **PositionalEncoding**: Sinusoidal position embeddings for sequence order
+
+### Attention Mechanisms  
+- **MultiHeadAttention**: Parallel attention heads for capturing different relationships
+- **SelfAttention**: Simplified attention for easier understanding
+- **CausalMasking**: Preventing attention to future tokens in autoregressive models
+
+### Transformer Architecture
+- **LayerNorm**: Normalization for stable transformer training
+- **TransformerBlock**: Complete transformer layer with attention + feedforward
+- **TinyGPT**: Full GPT-style model with embedding, positional encoding, and generation
+
+### Training Infrastructure
+- **LanguageModelLoss**: Cross-entropy loss with proper target shifting
+- **LanguageModelTrainer**: Training loops optimized for text sequences
+- **TextGeneration**: Autoregressive sampling for coherent text generation
+
+## Key Insights
+
+1. **Framework Reusability**: TinyTorch's Dense layers work seamlessly for language models
+2. **Attention Innovation**: The key architectural difference between the vision and language models is the attention mechanism
+3. **Sequence Modeling**: Language requires understanding order and context across long sequences
+4. **Autoregressive Generation**: Language models predict one token at a time, each conditioned on the tokens generated so far
+
+## Educational Philosophy
+
+This module shows that **vision and language models share the same foundation**:
+- Matrix multiplications (Dense layers) 
+- Nonlinear activations
+- Gradient-based optimization
+- Batch processing and training loops
+
+The magic happens in the **architectural patterns** we add on top!
+
+## Prerequisites
+
+- Modules 1-11 (especially Tensor, Dense, Attention, Training)
+- Understanding of sequence modeling concepts
+- Familiarity with autoregressive generation
+
+## Time Estimate
+
+4-6 hours for complete understanding and implementation

 ---

-## 🚀 **Key Insights You'll Discover**
+*"Language is the most powerful tool humans have created. Now let's teach machines to wield it." - The TinyTorch Philosophy*

-### **💡 Mathematical Unity**
-```python
-# Same operations, different data:
-# Vision: Dense(image_features, hidden_dim)
-# Language: Dense(token_embeddings, hidden_dim)
-
-# Vision: conv(images) → attention(feature_maps)  
-# Language: embed(tokens) → attention(sequence)
-```
-
-### **🔄 Component Reuse**
- **Dense layers**: Work identically for image features and token embeddings
- **Optimizers**: Adam optimizes vision and language models the same way
- **Training loops**: Identical backpropagation and parameter updates
- **Loss functions**: CrossEntropy works for both image classes and next-token prediction
-
-### **⚡ Strategic Extensions**
- **Attention**: The key architectural difference for sequence modeling
- **Positional encoding**: Sequence order (unlike spatial images)
- **Autoregressive sampling**: Text generation pattern
-
---
-
-## 📈 **Progressive Implementation**
-
-### **Part 1: Foundation Analysis**
- Analyze your existing TinyTorch components
- Understand what transfers to language models
- Plan minimal extensions needed
-
-### **Part 2: Character-Level Processing**
- Implement CharTokenizer for text preprocessing
- Build vocabulary management system
- Test with sample text encoding/decoding
-
-### **Part 3: Attention Mechanisms**
- Implement scaled dot-product attention
- Build multi-head attention for parallel processing
- Add causal masking for autoregressive models
-
-### **Part 4: Transformer Architecture**
- Combine attention with your Dense layers
- Add positional encoding for sequence order
- Build complete transformer blocks
-
-### **Part 5: Language Model Training**
- Implement text sequence data loading
- Train TinyGPT on character-level data
- Test text generation capabilities
-
-### **Part 6: Framework Integration**
- Ensure seamless integration with TinyTorch
- Test component compatibility
- Measure framework reuse percentage
-
---
-
-## 🎓 **What This Proves**
-
-Completing TinyGPT demonstrates:
-
-### **🏗️ Framework Engineering Mastery**
- You understand the **mathematical foundations** underlying all of ML
- You can **extend frameworks systematically** to new domains
- You grasp **universal vs domain-specific** design patterns
-
-### **🧠 Deep Learning Understanding**
- You see the **connections between** vision and language models
- You understand **attention as a fundamental operation**
- You grasp **sequence modeling principles**
-
-### **💼 Professional ML Engineering**
- You can **implement cutting-edge architectures** from scratch
- You understand **framework design principles** used by PyTorch/TensorFlow
- You can **optimize across multiple modalities**
-
---
-
-## 🎯 **Real-World Applications**
-
-Your TinyGPT implementation enables:
-
-### **📝 Text Generation**
-```python
-model = TinyGPT.load("trained_model.pkl")
-story = model.generate("In a world where AI", max_length=200)
-```
-
-### **🤖 Chatbot Foundations**
-```python
-# Simple Q&A system
-response = model.generate(f"Question: {user_input}\nAnswer:", max_length=50)
-```
-
-### **📚 Educational Tools**
-```python
-# Character-level language modeling for education
-model.train_on_text("Shakespeare corpus", epochs=10)
-generated_shakespeare = model.generate("To be or not to be", max_length=100)
-```
-
---
-
-## 🔬 **ML Systems Thinking Questions**
-
-### **🏗️ Framework Design**
-1. **Why do successful ML frameworks support multiple modalities?** How does component reuse accelerate development?
-2. **What makes an operation "universal" vs "domain-specific"?** Where do you draw the line?
-3. **How do framework designers balance generality vs optimization?** What are the trade-offs?
-
-### **🧠 Architecture Patterns**
-1. **Why is attention so effective for sequences?** What makes it different from convolution for images?
-2. **How do transformers handle variable-length sequences?** What are the computational implications?
-3. **What role does inductive bias play** in vision (locality) vs language (sequentiality) models?
-
-### **⚡ Performance & Scale**
-1. **How does O(N²) attention scaling affect real language models?** What optimizations are used in practice?
-2. **Why are language models often larger than vision models?** What drives the parameter count differences?
-3. **How do production systems handle autoregressive generation efficiently?** What are the bottlenecks?
-
-### **🔄 Transfer Learning**
-1. **What would it take to fine-tune your TinyGPT?** How would you adapt it to specific tasks?
-2. **How do pre-trained language models change the development cycle?** Compare to training from scratch.
-3. **What's the relationship between model size and emergent capabilities?** When do language models become "useful"?
-
---
-
-## 🎉 **Module Completion**
-
-When you finish this module, you will have:
-
-✅ **Built a complete GPT-style language model** using your TinyTorch framework  
-✅ **Demonstrated 95% component reuse** from vision to language  
-✅ **Implemented multi-head attention** for sequence understanding  
-✅ **Created autoregressive text generation** capabilities  
-✅ **Proven framework generalization** across modalities  
-✅ **Understood universal ML foundations** that power all domains  
-
-**🏆 Achievement Unlocked**: You now understand the mathematical unity underlying modern AI!
-
---
-
-## 🚀 **Beyond TinyGPT: What's Next?**
-
-Your unified vision + language framework opens doors to:
-
-### **🔬 Research Extensions**
- **Vision-Language Models**: Combine both modalities (CLIP-style)
- **Multi-Modal Transformers**: Process images and text jointly
- **Unified Architectures**: Single model for multiple tasks
-
-### **🏭 Production Applications**
- **Content Generation**: Text, code, creative writing
- **Conversational AI**: Chatbots and virtual assistants  
- **Multi-Modal Systems**: Image captioning, visual Q&A
-
-### **🎓 Advanced Studies**
- **Scaling Laws**: How performance changes with model size
- **Efficiency Techniques**: Quantization, pruning for language models
- **Emergent Capabilities**: What happens as models get larger
-
---
-
-**🔥 Ready to prove that your vision framework can power language models? Let's build TinyGPT!**
-
---

 Choose your preferred way to engage with this module:

@@ -338,5 +127,6 @@ Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/se
 ---

 <div class="prev-next-area">
-<a class="left-prev" href="../chapters/15-mlops.html" title="previous page">← Previous Module</a>
-</div>
+<a class="left-prev" href="../chapters/15_benchmarking.html" title="previous page">← Previous Module</a>
+<a class="right-next" href="../chapters/17_capstone_backup.html" title="next page">Next Module →</a>
+</div>
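
Note on the chapter diff above: the `CharTokenizer` and `LanguageModelLoss` entries mention character-level tokenization and "proper target shifting" without showing either. The following is a minimal sketch of both ideas in plain Python. The class body and the `shift_targets` helper are hypothetical illustrations, not TinyTorch's actual implementation or API.

```python
class CharTokenizer:
    """Hypothetical character-level tokenizer (illustrative, not TinyTorch's class)."""

    def __init__(self, text):
        # The vocabulary is the sorted set of distinct characters in the corpus.
        self.chars = sorted(set(text))
        self.char_to_id = {ch: i for i, ch in enumerate(self.chars)}
        self.id_to_char = {i: ch for ch, i in self.char_to_id.items()}

    def encode(self, text):
        # Map each character to its integer id.
        return [self.char_to_id[ch] for ch in text]

    def decode(self, ids):
        # Map integer ids back to characters and join them.
        return "".join(self.id_to_char[i] for i in ids)


def shift_targets(ids):
    """Hypothetical helper: targets[t] is the token that follows inputs[t]."""
    return ids[:-1], ids[1:]


tokenizer = CharTokenizer("hello world")
ids = tokenizer.encode("hello")
inputs, targets = shift_targets(ids)
print(ids, inputs, targets)
```

With this pairing, a cross-entropy loss compares the model's prediction at position `t` against `targets[t]`, which is the next character in the original sequence.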
--- a/book/intro.md
+++ b/book/intro.md
@@ -12,7 +12,7 @@ html_meta:
  "name=twitter:image": "https://mlsysbook.github.io/TinyTorch/logo.png"
 ---

-# Tiny🔥Torch
+# Welcome

 ## Build your own Machine Learning framework from scratch. From Computer Vision to Language Models. 

--- a/tito/commands/book.py
+++ b/tito/commands/book.py
@@ -57,6 +57,23 @@ class BookCommand(BaseCommand):
            'clean',
            help='Clean built book files'
        )
+        
+        # Serve command
+        serve_parser = subparsers.add_parser(
+            'serve',
+            help='Build and serve the Jupyter Book locally'
+        )
+        serve_parser.add_argument(
+            '--port',
+            type=int,
+            default=8001,
+            help='Port to serve on (default: 8001)'
+        )
+        serve_parser.add_argument(
+            '--no-build',
+            action='store_true',
+            help='Skip building and serve existing files'
+        )

    def run(self, args: Namespace) -> int:
        console = self.console
@@ -76,6 +93,7 @@ class BookCommand(BaseCommand):
                "[bold cyan]📚 TinyTorch Book Management[/bold cyan]\n\n"
                "[bold]Available Commands:[/bold]\n"
                "  [bold green]build[/bold green]      - Build the complete Jupyter Book\n"
+                "  [bold green]serve[/bold green]      - Build and serve the Jupyter Book locally\n"
                "  [bold green]publish[/bold green]   - Generate content, commit, and publish to GitHub\n"
                "  [bold green]clean[/bold green]     - Clean built book files\n\n"
                "[bold]Quick Start:[/bold]\n"
@@ -88,6 +106,8 @@ class BookCommand(BaseCommand):
        
        if args.book_command == 'build':
            return self._build_book(args)
+        elif args.book_command == 'serve':
+            return self._serve_book(args)
        elif args.book_command == 'publish':
            return self._publish_book(args)
        elif args.book_command == 'clean':
@@ -207,6 +227,45 @@ class BookCommand(BaseCommand):
        finally:
            os.chdir("..")

+    def _serve_book(self, args: Namespace) -> int:
+        """Build and serve the Jupyter Book locally."""
+        console = self.console
+        
+        # Build the book first unless --no-build is specified
+        if not args.no_build:
+            console.print("📚 Step 1: Building the book...")
+            if self._build_book(args) != 0:
+                return 1
+            console.print()
+        
+        # Start the HTTP server
+        console.print("🌐 Step 2: Starting development server...")
+        console.print(f"📖 Open your browser to: [bold blue]http://localhost:{args.port}[/bold blue]")
+        console.print("🛑 Press [bold]Ctrl+C[/bold] to stop the server")
+        console.print()
+        
+        book_dir = Path("book/_build/html")
+        if not book_dir.exists():
+            console.print("[red]❌ Built book not found. Run without --no-build to build it first.[/red]")
+            return 1
+        
+        try:
+            # Use Python's built-in HTTP server
+            subprocess.run([
+                "python3", "-m", "http.server", str(args.port),
+                "--directory", str(book_dir)
+            ])
+        except KeyboardInterrupt:
+            console.print("\n🛑 Development server stopped")
+        except FileNotFoundError:
+            console.print("[red]❌ python3 not found in PATH[/red]")
+            return 1
+        except Exception as e:
+            console.print(f"[red]❌ Error starting server: {e}[/red]")
+            return 1
+        
+        return 0
+
    def _clean_book(self) -> int:
        """Clean built book files."""
        console = self.console