docs(book): Update introduction, TOC, and learning progress from dev branch

2026-05-26 06:00:54 -05:00 · 2025-10-28 15:35:29 -04:00
parent 53fdddbcb7
commit 7bc0a04217
4 changed files with 237 additions and 215 deletions
--- a/book/chapters/00-introduction.md
+++ b/book/chapters/00-introduction.md
@@ -141,95 +141,119 @@ output = model(input)                    # YOU know exactly how this works

 ---

-## What You'll Achieve: Complete ML Systems Mastery
+## What You'll Achieve: Tier-by-Tier Mastery

-### Immediate Achievements (Modules 1-8)
-By Module 8, you'll have built a complete neural network framework from scratch:
+### 🏗️ After Foundation Tier (Modules 01-07)
+Build a complete neural network framework from mathematical first principles:

 ```python
 # YOUR implementation training real networks on real data
 model = Sequential([
-    Linear(784, 128),    # Your linear layer
-    ReLU(),              # Your activation function  
-    Linear(128, 64),     # Your architecture design
+    Linear(784, 128),    # Your linear algebra implementation
+    ReLU(),              # Your activation function
+    Linear(128, 64),     # Your gradient-aware layers
    ReLU(),              # Your nonlinearity
-    Linear(64, 10)       # Your final classifier
+    Linear(64, 10)       # Your classification head
 ])

-# YOUR training loop using YOUR optimizer
-optimizer = Adam(model.parameters(), lr=0.001)  # Your Adam implementation
-for batch in dataloader:  # Your data loading
-    output = model(batch.x)                     # Your forward pass
-    loss = CrossEntropyLoss()(output, batch.y)  # Your loss function
-    loss.backward()                             # Your backpropagation
+# YOUR complete training system
+optimizer = Adam(model.parameters(), lr=0.001)  # Your optimization algorithm
+for batch in dataloader:  # Your data management
+    output = model(batch.x)                     # Your forward computation
+    loss = CrossEntropyLoss()(output, batch.y)  # Your loss calculation
+    loss.backward()                             # YOUR backpropagation engine
    optimizer.step()                            # Your parameter updates
 ```

-**Result: 95%+ accuracy on MNIST using 100% your own code.**
+**🎯 Foundation Achievement**: 95%+ accuracy on MNIST using 100% your own mathematical implementations

-### Advanced Capabilities (Modules 9-14)
- **Computer Vision**: CNNs achieving 75%+ accuracy on CIFAR-10
- **Language Models**: TinyGPT built using 95% of your vision components
- **Universal Architecture**: Same mathematical foundations power all modern AI
+### 🧠 After Intelligence Tier (Modules 08-13)
+- **Computer Vision Mastery**: CNNs achieving 75%+ accuracy on CIFAR-10 with YOUR convolution implementations
+- **Language Understanding**: Transformers generating coherent text using YOUR attention mechanisms
+- **Universal Architecture**: Discover why the SAME mathematical principles work for vision AND language
+- **AI Breakthrough Recreation**: Implement the architectures that created the modern AI revolution

-### Production Systems (Modules 15-20)
- **Performance Engineering**: Profile, measure, and optimize ML systems
- **Memory Optimization**: Understand and implement compression techniques
- **Hardware Acceleration**: Build efficient kernels and vectorized operations
- **TinyMLPerf Competition**: Compete with optimized implementations
+### ⚡ After Optimization Tier (Modules 14-20)
+- **Production Performance**: Systems optimized for <100ms inference latency using YOUR profiling tools
+- **Memory Efficiency**: Models compressed to 25% of their original size with YOUR quantization implementations
+- **Hardware Acceleration**: Kernels achieving 10x speedups through YOUR vectorization techniques
+- **Competition Ready**: TinyMLPerf submissions competitive with industry implementations
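
The 25% figure in the Memory Efficiency bullet is what you would expect from storing 32-bit floats as 8-bit integers. Below is a minimal sketch, assuming symmetric per-tensor INT8 quantization; `quantize_int8` and `dequantize` are hypothetical helper names for illustration, not TinyTorch APIs, and the weights are made-up values.

```python
from array import array

def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from the INT8 codes."""
    return [c * scale for c in codes]

# Illustrative float32 weights (hypothetical values)
weights = array("f", [0.5, -1.27, 0.003, 0.9, -0.31, 1.0, 0.0, -0.75])
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

fp32_bytes = len(weights) * weights.itemsize   # 4 bytes per float32
int8_bytes = len(codes) * codes.itemsize       # 1 byte per int8
size_ratio = int8_bytes / fp32_bytes
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"size ratio: {size_ratio:.2f}, max abs error: {max_error:.4f}")
```

The rounding error per weight is bounded by half the scale, which is the accuracy cost paid for the 4× smaller storage.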

 ---

 ## The ML Evolution Story You'll Experience

-TinyTorch follows the actual historical progression of machine learning breakthroughs:
+TinyTorch's three-tier structure follows the actual historical progression of machine learning breakthroughs:

-### 🧠 Era 1: Foundation (1980s) - Modules 1-8
-**The Beginning**: Perceptrons and multi-layer networks
- Build tensor operations and automatic differentiation
- Implement gradient-based optimization (SGD, Adam)
- **Achievement**: Train MLPs to 95%+ accuracy on MNIST
+### 🏗️ Foundation Era (1980s-1990s) → Foundation Tier
+**The Beginning**: Mathematical foundations that started it all
+- **1986 Breakthrough**: Backpropagation enables multi-layer networks
+- **Your Implementation**: Build automatic differentiation and gradient-based optimization
+- **Historical Milestone**: Train MLPs to 95%+ accuracy on MNIST using YOUR autograd engine

-### 👁️ Era 2: Spatial Intelligence (1989-2012) - Modules 9-10  
-**The Revolution**: Convolutional neural networks
- Add spatial processing with Conv2d and pooling operations
- Build efficient data pipelines for real-world datasets
- **Achievement**: Train CNNs to 75%+ accuracy on CIFAR-10
+### 🧠 Intelligence Era (1990s-2010s) → Intelligence Tier
+**The Revolution**: Specialized architectures for vision and language
+- **1998 Breakthrough**: CNNs revolutionize computer vision (LeCun's LeNet)
+- **2017 Breakthrough**: Transformers unify vision and language ("Attention Is All You Need")
+- **Your Implementation**: Build CNNs achieving 75%+ on CIFAR-10, then transformers for text generation
+- **Historical Milestone**: Recreate both revolutions using YOUR spatial and attention implementations

-### 🗣️ Era 3: Universal Architecture (2017-Present) - Modules 11-14
-**The Unification**: Transformers for vision AND language
- Implement attention mechanisms and positional embeddings
- Build TinyGPT using your existing vision infrastructure
- **Achievement**: Language generation with 95% component reuse
+### ⚡ Optimization Era (2010s-Present) → Optimization Tier
+**The Engineering**: Production systems that scale to billions of users
+- **2020s Breakthrough**: Efficient inference enables real-time LLMs (GPT, ChatGPT)
+- **Your Implementation**: Build KV-caching, quantization, and production optimizations
+- **Historical Milestone**: Deploy systems competitive in TinyMLPerf benchmarks

-### ⚡ Era 4: Production Systems (Present) - Modules 15-20
-**The Engineering**: Optimized, deployable ML systems
- Profile performance and identify bottlenecks
- Implement compression, quantization, and acceleration
- **Achievement**: TinyMLPerf competition-ready implementations
+**Why This Progression Matters**: You'll understand not just modern AI, but WHY it evolved this way. Each tier builds essential capabilities that inform the next, just like ML history itself.

 ---

-## Systems Engineering Focus: Why It Matters
+## Systems Engineering Focus: Why Tiers Matter

-Traditional ML courses focus on **algorithms**. TinyTorch focuses on **systems**.
+Traditional ML courses teach algorithms in isolation. TinyTorch's tier structure teaches **systems thinking** - how components interact to create production ML systems.

-### What Traditional Courses Teach:
- "Use `torch.optim.Adam` for optimization"
- "Transformers use attention mechanisms"  
- "Larger models generally perform better"
+### Traditional Linear Approach:
+```
+Module 1: Tensors → Module 2: Layers → Module 3: Training → ...
+```
+**Problem**: Students learn components but miss system interactions

-### What TinyTorch Teaches:
- "Why Adam consumes 3× more memory than SGD and when that matters in production"
- "How attention scales O(N²) with sequence length and limits context windows"
- "How to profile memory usage and identify training bottlenecks"
+### TinyTorch Tier Approach:
+```
+🏗️ Foundation Tier: Build mathematical infrastructure
+🧠 Intelligence Tier: Compose intelligent architectures
+⚡ Optimization Tier: Deploy at production scale
+```
+**Advantage**: Each tier builds complete, working systems with clear progression

-### Career Impact
-After TinyTorch, you become the team member who:
- **Debugs performance issues**: "Your convolution is memory-bound, not compute-bound"
- **Optimizes production systems**: "We can use gradient accumulation to train with less GPU memory"
- **Implements custom operations**: "I'll write a custom kernel for this novel architecture"
- **Designs system architecture**: "Here's why this model won't scale and how to fix it"
+### What Traditional Courses Teach vs. TinyTorch Tiers:
+
+**Traditional**: "Use `torch.optim.Adam` for optimization"
+**Foundation Tier**: "Why Adam needs about 3× the parameter memory of plain SGD (the weights plus two moment buffers) and how to implement both from mathematical first principles"
+
+**Traditional**: "Transformers use attention mechanisms"
+**Intelligence Tier**: "How attention creates O(N²) scaling, why this limits context windows, and how to implement efficient attention yourself"
+
+**Traditional**: "Deploy models with TensorFlow Serving"
+**Optimization Tier**: "How to profile bottlenecks, implement KV-caching for 10× speedup, and compete in production benchmarks"
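
The Adam-versus-SGD memory comparison above can be checked with simple bookkeeping. The sketch below assumes plain SGD keeps no per-parameter state, while Adam keeps two buffers (first and second moment) shaped like the parameters. `optimizer_state_floats` is a hypothetical helper, and gradients and activations are deliberately left out.

```python
def optimizer_state_floats(num_params, optimizer):
    """Count floats held for parameters plus optimizer state.

    Assumptions: plain SGD stores no extra state; Adam stores two
    moment buffers per parameter. Gradients and activations are ignored.
    """
    extra_buffers = {"sgd": 0, "adam": 2}[optimizer]
    return num_params * (1 + extra_buffers)

# Parameter count of the 784-128-64-10 MLP from the Foundation Tier example
layer_sizes = [(784, 128), (128, 64), (64, 10)]
num_params = sum(n_in * n_out + n_out for n_in, n_out in layer_sizes)

sgd_floats = optimizer_state_floats(num_params, "sgd")
adam_floats = optimizer_state_floats(num_params, "adam")
print(f"params={num_params}, Adam/SGD ratio={adam_floats / sgd_floats:.1f}")
```

If gradients are counted as well, both optimizers hold one more buffer, so the ratio under the same assumptions drops to 4:2.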
+
+### Career Impact by Tier
+After each tier, you become the team member who:
+
+**🏗️ Foundation Tier Graduate**:
+- Debugs gradient flow issues: "Your ReLU is causing dead neurons"
+- Implements custom optimizers: "I'll build a variant of Adam for this use case"
+- Understands memory patterns: "Batch size 64 hits your GPU memory limit here"
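
The "dead neurons" diagnosis above refers to ReLU units that output zero for every input, which also means they receive zero gradient for those inputs. A minimal sketch of how such a check might look, in pure Python; `dead_unit_fraction` is a hypothetical helper and the weights and inputs are random stand-ins.

```python
import random

def relu(x):
    return x if x > 0 else 0.0

def dead_unit_fraction(weights, biases, inputs):
    """Fraction of ReLU units that output zero for every input in the batch."""
    dead = 0
    for w_row, b in zip(weights, biases):
        outputs = [relu(sum(w * x for w, x in zip(w_row, xs)) + b) for xs in inputs]
        if all(o == 0.0 for o in outputs):
            dead += 1
    return dead / len(weights)

random.seed(0)
inputs = [[random.uniform(0, 1) for _ in range(4)] for _ in range(32)]   # non-negative features
weights = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)]  # 8 units, 4 inputs each
zero_biases = [0.0] * 8
shifted_biases = [-10.0] * 8  # large negative bias pushes every pre-activation below zero

print("zero bias:   ", dead_unit_fraction(weights, zero_biases, inputs))
print("shifted bias:", dead_unit_fraction(weights, shifted_biases, inputs))
```

In a real network the same count would be taken over a validation batch after a few epochs; a fraction that keeps growing is the usual sign that the learning rate or initialization needs attention.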
+
+**🧠 Intelligence Tier Graduate**:
+- Designs novel architectures: "We can adapt transformers for this computer vision task"
+- Optimizes attention patterns: "This attention bottleneck is why your model won't scale to longer sequences"
+- Bridges vision and language: "The same mathematical principles work for both domains"
+
+**⚡ Optimization Tier Graduate**:
+- Deploys production systems: "I can get us from 500ms to 50ms inference latency"
+- Leads performance optimization: "Here's our memory bottleneck and my 3-step plan to fix it"
+- Competes at industry scale: "Our optimizations achieve TinyMLPerf benchmark performance"
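
The attention bottleneck mentioned for Intelligence Tier graduates comes from the score matrix: every query position is compared with every key position. The sketch below only counts entries and bytes, assuming float32 scores and 8 heads; both function names are hypothetical.

```python
def attention_score_entries(seq_len, num_heads=1):
    """Entries in the attention score matrices for one layer.

    Each head builds a seq_len x seq_len matrix of query-key scores.
    """
    return num_heads * seq_len * seq_len

def score_memory_mib(seq_len, num_heads=8, bytes_per_entry=4):
    """Memory for the score matrices alone, assuming float32 entries."""
    return attention_score_entries(seq_len, num_heads) * bytes_per_entry / 2**20

for n in (512, 1024, 2048, 4096):
    print(f"N={n:5d}  scores={score_memory_mib(n):8.1f} MiB")
```

Doubling the sequence length quadruples the score memory, which is the O(N²) growth that limits context windows.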

 ---

@@ -254,165 +278,159 @@ After TinyTorch, you become the team member who:

 ---

-## Ready to Begin?
+## 🚀 Start Your Journey

-You're about to embark on a journey that will transform how you think about machine learning systems. Instead of using black-box frameworks, you'll understand every component from the ground up.
+<div style="background: #f8f9fa; padding: 2rem; border-radius: 0.5rem; margin: 2rem 0; text-align: center;">
+<h3 style="margin: 0 0 1rem 0; color: #495057;">Begin Building ML Systems</h3>
+<p style="margin: 0 0 1.5rem 0; color: #6c757d;">Choose your starting point based on your goals and time commitment</p>
+<a href="../quickstart-guide.html" style="display: inline-block; background: #007bff; color: white; padding: 0.75rem 1.5rem; border-radius: 0.25rem; text-decoration: none; font-weight: 500; margin-right: 1rem;">15-Minute Start →</a>
+<a href="01-setup.html" style="display: inline-block; background: #28a745; color: white; padding: 0.75rem 1.5rem; border-radius: 0.25rem; text-decoration: none; font-weight: 500;">Foundation Tier →</a>
+</div>

-**Next Step**: [Module 01: Setup](01-setup.md) - Configure your development environment and build your first TinyTorch function.
+**Next Steps**:
+- **New to TinyTorch**: Start with [Quick Start Guide](../quickstart-guide.html) for immediate hands-on experience
+- **Ready to Commit**: Begin [Module 01: Setup](01-setup.html) to configure your development environment
+- **Teaching a Course**: Review [Instructor Guide](../usage-paths/classroom-use.html) for classroom integration

-```{admonition} Your Learning Journey Awaits
+```{admonition} Your Three-Tier Journey Awaits
 :class: tip
-By the end of this course, you'll have built a complete ML framework that rivals educational implementations like MiniTorch and micrograd, while achieving production-level results:
- **95%+ accuracy on MNIST** (handwritten digit recognition)
- **75%+ accuracy on CIFAR-10** (real-world image classification)  
- **TinyGPT language generation** (modern transformer architecture)
- **TinyMLPerf competition entries** (optimized systems performance)
+By completing all three tiers, you'll have built a complete ML framework that rivals production implementations:

-All using code you wrote yourself, from scratch.
+**🏗️ Foundation Tier Achievement**: 95%+ accuracy on MNIST with YOUR mathematical implementations
+**🧠 Intelligence Tier Achievement**: 75%+ accuracy on CIFAR-10 AND coherent text generation
+**⚡ Optimization Tier Achievement**: Production systems competitive in TinyMLPerf benchmarks
+
+All using code you wrote yourself, from mathematical first principles to production optimization.
 ```

 ---

-## Complete Learning Timeline & Course Structure
+### 🏗️ FOUNDATION TIER (Modules 01-07)
+**Building Blocks of ML Systems • 6-8 weeks • All Prerequisites for Neural Networks**

-### Capability Progression: Foundation to Production
+<div style="background: #f8f9fd; border: 1px solid #e0e7ff; padding: 2rem; border-radius: 0.5rem; margin: 2rem 0;">

-```{mermaid}
-:align: center
+**What You'll Learn**: Build the mathematical and computational infrastructure that powers all neural networks. Master tensor operations, gradient computation, and optimization algorithms.

-timeline
-    title TinyTorch Capability Development: Building ML Systems
+**Prerequisites**: Python programming, basic linear algebra (matrix multiplication)

-    section Foundation Capabilities
-        Environment Setup     : Checkpoint 00 Complete
-                             : Configure development environment
-                             : Verify dependencies
+**Career Connection**: Foundation skills required for ML Infrastructure Engineer, Research Engineer, Framework Developer roles

-        Tensor Operations     : Checkpoint 01 Complete
-                             : N-dimensional arrays
-                             : Mathematical foundations
+**Time Investment**: ~20 hours total (3 hours/week for 6-8 weeks)

-    section Core Learning
-        Neural Intelligence   : Checkpoint 02 Complete
-                             : Nonlinear activations
-                             : ReLU, Sigmoid, Softmax
-
-        Network Building     : Checkpoint 03 Complete
-                             : Layer abstractions
-                             : Forward propagation
-
-    section Training Systems
-        Gradient Computation  : Checkpoint 05 Complete
-                             : Automatic differentiation
-                             : Backpropagation mechanics
-
-        Optimization         : Checkpoint 06 Complete
-                             : SGD, Adam algorithms
-                             : Learning rate scheduling
-
-    section Advanced Architectures
-        Computer Vision      : Checkpoint 08 Complete
-                             : Convolutional operations
-                             : Spatial feature extraction
-
-        Language Processing  : Checkpoint 12 Complete
-                             : Attention mechanisms
-                             : Transformer architectures
-
-    section Production Systems
-        Performance Analysis : Checkpoint 14 Complete
-                             : Profiling and optimization
-                             : Bottleneck identification
-
-        Complete Mastery     : Checkpoint 15 Complete
-                             : End-to-end ML systems
-                             : Production deployment
-```
-
-### Part I: Core Foundations (Modules 1-8)
-**Focus: Neural Network Fundamentals | 8 weeks**
-
-| Week | Module | Core Capability | Implementation Focus | Checkpoint Unlocked |
-|------|--------|-----------------|---------------------|--------------------|
-| 1 | Setup | Environment Configuration | Development environment setup | 00: Environment |
-| 2 | Tensor | Mathematical Foundations | N-dimensional arrays with gradients | 01: Foundation |
-| 3 | Activations | Neural Intelligence | ReLU, Sigmoid, Softmax functions | 02: Intelligence |
-| 4 | Layers | Network Components | Linear layers and module system | 03: Components |
-| 5 | Losses | Learning Measurement | MSE, CrossEntropy loss functions | 04: Networks |
-| 6 | Autograd | Gradient Computation | Automatic differentiation engine | 05: Learning |
-| 7 | Optimizers | Parameter Updates | SGD, Adam optimization algorithms | 06: Optimization |
-| 8 | Training | Complete Systems | End-to-end training loops | 07: Training |
-
-**Capability Milestone**: After Module 8, you have complete neural network training capability!
-
---
-
-### Part II: Computer Vision (Modules 9-10)
-**Focus: Spatial Processing | 2 weeks**
-
-| Week | Module | Core Capability | Implementation Focus | Checkpoint Unlocked |
-|------|--------|-----------------|---------------------|--------------------|
-| 9 | Spatial | Spatial Processing | Conv2d, MaxPool2d operations | 08: Vision |
-| 10 | DataLoader | Data Management | Efficient data loading pipelines | 09: Data |
-
-**Capability Milestone**: Computer vision systems with spatial feature processing!
-
---
-
-### Part III: Language Processing (Modules 11-14)
-**Focus: Sequence Understanding | 4 weeks**
-
-| Week | Module | Core Capability | Implementation Focus | Checkpoint Unlocked |
-|------|--------|-----------------|---------------------|--------------------|
-| 11 | Tokenization | Text Processing | Vocabulary and token systems | 10: Language |
-| 12 | Embeddings | Representation Learning | Token and positional encodings | 11: Representation |
-| 13 | Attention | Sequence Understanding | Multi-head attention mechanisms | 12: Attention |
-| 14 | Transformers | Architecture Mastery | Complete transformer blocks | 13: Architecture |
-
-**Capability Milestone**: Complete language understanding and generation systems!
-
---
-
-### Part IV: Production Systems (Modules 15-20)
-**Focus: Performance Optimization | 6 weeks**
-
-| Week | Module | Core Capability | Implementation Focus | Checkpoint Unlocked |
-|------|--------|-----------------|---------------------|--------------------|
-| 15 | Profiling | Performance Analysis | Memory and compute profiling | 14: Systems |
-| 16 | Acceleration | Hardware Optimization | Vectorization and caching | |
-| 17 | Quantization | Model Compression | INT8 inference optimization | |
-| 18 | Compression | Size Optimization | Pruning and distillation | |
-| 19 | Caching | Memory Management | KV-cache for generation | |
-| 20 | Capstone | Complete Mastery | End-to-end ML systems | 15: Mastery |
-
-**Final Capability**: Complete ML systems engineering mastery!
-
---
-
-## 📈 8-Week Learning Progression Overview
-
-For a quick visual overview of the main learning phases:
-
-<div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 1rem; margin: 2rem 0;">
-
-<div style="background: #fef5e7; border-left: 4px solid #f6ad55; padding: 1rem;">
-<h4 style="margin: 0 0 0.5rem 0; color: #c05621;">Weeks 1-2: Mathematical Foundations</h4>
-<p style="font-size: 0.85rem; margin: 0;">Implement tensor operations, understand memory layout, build arithmetic foundations. Core mathematical building blocks.</p>
 </div>

-<div style="background: #e6fffa; border-left: 4px solid #4fd1c7; padding: 1rem;">
-<h4 style="margin: 0 0 0.5rem 0; color: #234e52;">Weeks 3-4: Neural Network Components</h4>
-<p style="font-size: 0.85rem; margin: 0;">Linear transformations, activation functions, loss functions. Build the mathematical components of neural computation.</p>
+| Module | Component | Core Capability | Real-World Connection |
+|--------|-----------|-----------------|----------------------|
+| **01** | **Tensor** | Data structures and operations | NumPy, PyTorch tensors |
+| **02** | **Activations** | Nonlinear functions | ReLU, softmax in attention |
+| **03** | **Layers** | Linear transformations | `nn.Linear`, dense layers |
+| **04** | **Losses** | Optimization objectives | CrossEntropy, MSE loss |
+| **05** | **Autograd** | Automatic differentiation | PyTorch autograd engine |
+| **06** | **Optimizers** | Parameter updates | Adam, SGD optimizers |
+| **07** | **Training** | Complete training loops | Keras `model.fit()`, training scripts |
+
+**🎯 Tier Milestone**: Train neural networks achieving **95%+ accuracy on MNIST** using 100% your own implementations!
+
+**Skills Gained**:
+- Understand memory layout and computational graphs
+- Debug gradient flow and numerical stability issues
+- Implement any optimization algorithm from research papers
+- Build custom neural network architectures from scratch
+
+---
+
+### 🧠 INTELLIGENCE TIER (Modules 08-13)
+**Modern AI Algorithms • 4-6 weeks • Vision + Language Architectures**
+
+<div style="background: #fef7ff; border: 1px solid #f3e8ff; padding: 2rem; border-radius: 0.5rem; margin: 2rem 0;">
+
+**What You'll Learn**: Implement the architectures powering modern AI: convolutional networks for vision and transformers for language. Discover why the same mathematical principles work across domains.
+
+**Prerequisites**: Foundation Tier complete (Modules 01-07)
+
+**Career Connection**: Computer Vision Engineer, NLP Engineer, AI Research Scientist, ML Product Manager roles
+
+**Time Investment**: ~25 hours total (4-6 hours/week for 4-6 weeks)
+
 </div>

-<div style="background: #f0fff4; border-left: 4px solid #9ae6b4; padding: 1rem;">
-<h4 style="margin: 0 0 0.5rem 0; color: #22543d;">Weeks 5-6: Learning Algorithms</h4>
-<p style="font-size: 0.85rem; margin: 0;">Automatic differentiation, optimization algorithms, training procedures. Understand how neural networks learn.</p>
+| Module | Component | Core Capability | Real-World Connection |
+|--------|-----------|-----------------|----------------------|
+| **08** | **Spatial** | Convolutions and regularization | CNNs, ResNet, computer vision |
+| **09** | **DataLoader** | Batch processing | PyTorch DataLoader, tf.data |
+| **10** | **Tokenization** | Text preprocessing | BERT tokenizer, GPT tokenizer |
+| **11** | **Embeddings** | Representation learning | Word2Vec, positional encodings |
+| **12** | **Attention** | Information routing | Multi-head attention, self-attention |
+| **13** | **Transformers** | Modern architectures | GPT, BERT, Vision Transformer |
+
+**🎯 Tier Milestone**: Achieve **75%+ accuracy on CIFAR-10** with CNNs AND generate coherent text with transformers!
+
+**Skills Gained**:
+- Understand why convolution works for spatial data
+- Implement attention mechanisms from scratch
+- Build transformer architectures for any domain
+- Debug sequence modeling and attention patterns
+
+---
+
+### ⚡ OPTIMIZATION TIER (Modules 14-20)
+**Production & Performance • 4-6 weeks • Deploy and Scale ML Systems**
+
+<div style="background: #f0fdfa; border: 1px solid #a7f3d0; padding: 2rem; border-radius: 0.5rem; margin: 2rem 0;">
+
+**What You'll Learn**: Transform research models into production systems. Master profiling, optimization, and deployment techniques used by companies like OpenAI, Google, and Meta.
+
+**Prerequisites**: Intelligence Tier complete (Modules 08-13)
+
+**Career Connection**: ML Systems Engineer, Performance Engineer, MLOps Engineer, Senior ML Engineer roles
+
+**Time Investment**: ~30 hours total (5-7 hours/week for 4-6 weeks)
+
 </div>

-<div style="background: #faf5ff; border-left: 4px solid #b794f6; padding: 1rem;">
-<h4 style="margin: 0 0 0.5rem 0; color: #553c9a;">Weeks 7-8: Systems Engineering</h4>
-<p style="font-size: 0.85rem; margin: 0;">Performance analysis, computational kernels, benchmarking. Study the engineering principles behind ML systems.</p>
+| Module | Component | Core Capability | Real-World Connection |
+|--------|-----------|-----------------|----------------------|
+| **14** | **Profiling** | Performance analysis | PyTorch Profiler, TensorBoard |
+| **15** | **Acceleration** | Speed improvements | CUDA kernels, vectorization |
+| **16** | **Quantization** | Memory efficiency | INT8 inference, model compression |
+| **17** | **Compression** | Model optimization | Pruning, distillation, ONNX |
+| **18** | **Caching** | Memory management | KV-cache for generation |
+| **19** | **Benchmarking** | Measurement systems | MLPerf, production monitoring |
+| **20** | **Capstone** | Full system integration | End-to-end ML pipeline |
+
+**🎯 Tier Milestone**: Build **production-ready systems** competitive in TinyMLPerf benchmarks!
+
+**Skills Gained**:
+- Profile memory usage and identify bottlenecks
+- Implement efficient inference optimizations
+- Deploy models with <100ms latency requirements
+- Design scalable ML system architectures
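
KV-caching, listed under Module 18, is one of the inference optimizations this tier covers. The sketch below illustrates the idea for single-head attention in pure Python. It is not TinyTorch's implementation: `project` is a hypothetical stand-in for learned key/value projections, and each token vector doubles as its own query.

```python
import math

calls = {"n": 0}  # counts how many key/value projections are computed

def project(x, scale):
    """Stand-in for a learned projection (hypothetical: just scales the vector)."""
    calls["n"] += 1
    return [scale * xi for xi in x]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [sum((e / total) * v[i] for e, v in zip(exps, values))
            for i in range(len(values[0]))]

def decode_without_cache(tokens):
    outputs = []
    for t in range(len(tokens)):
        keys = [project(x, 0.5) for x in tokens[: t + 1]]    # recomputed at every step
        values = [project(x, 2.0) for x in tokens[: t + 1]]
        outputs.append(attend(tokens[t], keys, values))
    return outputs

def decode_with_cache(tokens):
    outputs, key_cache, value_cache = [], [], []
    for x in tokens:
        key_cache.append(project(x, 0.5))    # only the new token is projected
        value_cache.append(project(x, 2.0))
        outputs.append(attend(x, key_cache, value_cache))
    return outputs

tokens = [[0.1 * (i + j) for j in range(4)] for i in range(6)]

calls["n"] = 0
slow = decode_without_cache(tokens)
slow_calls = calls["n"]

calls["n"] = 0
fast = decode_with_cache(tokens)
fast_calls = calls["n"]

print(f"projections without cache: {slow_calls}, with cache: {fast_calls}")
```

Both loops produce the same outputs; the cached version trades memory for compute, so projection work grows linearly with the number of generated tokens instead of quadratically.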
+
+---
+
+## 🎯 Learning Path Recommendations
+
+### Choose Your Learning Style
+
+<div style="display: grid; grid-template-columns: repeat(3, 1fr); gap: 1.5rem; margin: 2rem 0;">
+
+<div style="background: #fff7ed; border: 1px solid #fdba74; padding: 1.5rem; border-radius: 0.5rem;">
+<h4 style="margin: 0 0 1rem 0; color: #c2410c;">🚀 Complete Builder</h4>
+<p style="margin: 0 0 1rem 0; font-size: 0.9rem;">Implement every component from scratch</p>
+<p style="margin: 0; font-size: 0.85rem; color: #6b7280;"><strong>Time:</strong> 14-18 weeks<br><strong>Ideal for:</strong> CS students, aspiring ML engineers</p>
+</div>
+
+<div style="background: #f0f9ff; border: 1px solid #7dd3fc; padding: 1.5rem; border-radius: 0.5rem;">
+<h4 style="margin: 0 0 1rem 0; color: #0284c7;">⚡ Focused Explorer</h4>
+<p style="margin: 0 0 1rem 0; font-size: 0.9rem;">Pick one tier based on your goals</p>
+<p style="margin: 0; font-size: 0.85rem; color: #6b7280;"><strong>Time:</strong> 4-8 weeks<br><strong>Ideal for:</strong> Working professionals, specific skill gaps</p>
+</div>
+
+<div style="background: #f0fdf4; border: 1px solid #86efac; padding: 1.5rem; border-radius: 0.5rem;">
+<h4 style="margin: 0 0 1rem 0; color: #166534;">📚 Guided Learner</h4>
+<p style="margin: 0 0 1rem 0; font-size: 0.9rem;">Study implementations with hands-on exercises</p>
+<p style="margin: 0; font-size: 0.85rem; color: #6b7280;"><strong>Time:</strong> 8-12 weeks<br><strong>Ideal for:</strong> Self-directed learners, bootcamp graduates</p>
 </div>

 </div>
--- a/book/intro.md
+++ b/book/intro.md
@@ -31,11 +31,17 @@ TinyTorch is an educational ML systems course where you **build complete neural

 **Core Learning Approach**: Build → Profile → Optimize. You'll implement each system component, measure its performance characteristics, and understand the engineering trade-offs that shape production ML systems.

-## The ML Evolution Story You'll Experience
+## Three-Tier Learning Pathway

-Journey through 40+ years of ML breakthroughs by building each era yourself: **1980s neural foundations** → **1990s backpropagation** → **2012 CNN revolution** → **2017 transformer unification** → **2024 production optimization**. Each module teaches both the breakthrough AND the systems engineering that made it possible.
+TinyTorch organizes learning through **three pedagogically motivated tiers** that follow ML history:

-**📖 See [Complete ML Evolution Timeline](chapters/00-introduction.html#the-ml-evolution-story-youll-experience)** for the full historical context and technical progression.
+**🏗️ Foundation Tier (Modules 01-07)**: Build mathematical infrastructure - tensors, autograd, optimizers
+**🧠 Intelligence Tier (Modules 08-13)**: Implement modern AI - CNNs for vision, transformers for language
+**⚡ Optimization Tier (Modules 14-20)**: Deploy production systems - profiling, quantization, acceleration
+
+Each tier builds complete, working systems with clear career connections and practical skills.
+
+**📖 See [Complete Three-Tier Structure](chapters/00-introduction.html#three-tier-learning-pathway-build-complete-ml-systems)** for detailed tier breakdown, time estimates, and learning outcomes.

 ## 🏆 Prove Your Mastery Through History

@@ -167,7 +173,7 @@ You master modern LLM optimizations

 ## How to Choose Your Learning Path

-**Two Learning Approaches**: You can either **build it yourself** (work through student notebooks and implement from scratch) or **learn by reading** (study the solution notebooks to understand how ML systems work). Both approaches use the same **Build → Profile → Optimize** methodology at different scales.
+**Three Learning Approaches**: You can **build complete tiers** (implement all 20 modules), **focus on specific tiers** (target your skill gaps), or **explore selectively** (study key concepts). Each tier builds complete, working systems.

 <div style="display: grid; grid-template-columns: repeat(2, 1fr); gap: 1.5rem; margin: 3rem 0;">

@@ -201,7 +207,7 @@ You master modern LLM optimizations

 ## Getting Started

-Whether you're just exploring or ready to dive in, here are helpful resources: **📖 See [Essential Commands](tito-essentials.html)** for complete setup and command reference, or **📖 See [Complete Course Structure](chapters/00-introduction.html)** for detailed module descriptions.
+Whether you're just exploring or ready to dive in, here are helpful resources: **📖 See [Essential Commands](tito-essentials.html)** for complete setup and command reference, or **📖 See [Three-Tier Learning Structure](chapters/00-introduction.html#three-tier-learning-pathway-build-complete-ml-systems)** for detailed tier breakdown and learning outcomes.

 **Additional Resources**:
 - **[Progress Tracking](learning-progress.html)** - Monitor your learning journey with 21 capability checkpoints
--- a/book/learning-progress.md
+++ b/book/learning-progress.md
@@ -22,23 +22,21 @@ Use TinyTorch's 21-checkpoint system to monitor your capability development. Tra

 ## Your Learning Path Overview

-TinyTorch organizes learning through four major phases, each building essential ML systems capabilities:
+TinyTorch organizes learning through **three pedagogically motivated tiers**, each building essential ML systems capabilities:

-**📖 See [Complete Course Structure](chapters/00-introduction.html)** for the full learning timeline and detailed module descriptions.
+**📖 See [Three-Tier Learning Structure](chapters/00-introduction.html#three-tier-learning-pathway-build-complete-ml-systems)** for detailed tier breakdown, time estimates, and learning outcomes.

 ## Student Learning Journey

-### Typical Student Progression
- **Week 1-2**: Foundation capabilities (Environment, Tensors, Activations)
- **Week 3-4**: Core learning systems (Layers, Losses, Autograd)
- **Week 5-6**: Training and optimization (Optimizers, Training loops)
- **Week 7-8**: Advanced architectures (Spatial processing, Attention)
- **Week 9-12**: Production systems (Profiling, Optimization, Deployment)
+### Typical Student Progression by Tier
+- **🏗️ Foundation Tier (6-8 weeks)**: Build mathematical infrastructure - tensors, autograd, optimizers, training loops
+- **🧠 Intelligence Tier (4-6 weeks)**: Implement modern AI architectures - CNNs for vision, transformers for language
+- **⚡ Optimization Tier (4-6 weeks)**: Deploy production systems - profiling, quantization, acceleration

 ### Study Approaches
- **Full Implementation** (8-12 weeks): Build every component from scratch
- **Guided Study** (4-6 weeks): Study solution notebooks with implementation exercises
- **Quick Exploration** (2 weeks): Focus on key concepts with provided implementations
+- **Complete Builder** (14-18 weeks): Implement all three tiers from scratch
+- **Focused Explorer** (4-8 weeks): Pick specific tiers based on your goals
+- **Guided Learner** (8-12 weeks): Study implementations with hands-on exercises

 **📖 See [Quick Start Guide](quickstart-guide.html)** for immediate hands-on experience with your first module.

--- a/book/usage-paths/classroom-use.md
+++ b/book/usage-paths/classroom-use.md
@@ -21,7 +21,7 @@
 <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 1rem;">
 <div>
 <ul style="margin: 0; padding-left: 1rem;">
-<li><strong>20 progressive modules</strong> with NBGrader integration</li>
+<li><strong>Three-tier progression</strong> (20 modules) with NBGrader integration</li>
 <li><strong>200+ automated tests</strong> for immediate feedback</li>
 <li><strong>Professional CLI tools</strong> for development workflow</li>
 <li><strong>Real datasets</strong> (CIFAR-10, text generation)</li>