diff --git a/tinytorch/milestones/02_1969_xor/ABOUT.md b/tinytorch/milestones/02_1969_xor/ABOUT.md
index c66402dad..5d9c37e9f 100644
--- a/tinytorch/milestones/02_1969_xor/ABOUT.md
+++ b/tinytorch/milestones/02_1969_xor/ABOUT.md
@@ -1,6 +1,6 @@
 # Milestone 02: The XOR Crisis (1969)
 
-**FOUNDATION TIER** | Difficulty: 2/4 | Time: 30-60 min | Prerequisites: Modules 01-06
+**FOUNDATION TIER** | Difficulty: 2/4 | Time: 30-60 min | Prerequisites: Modules 01-08
 
 ## Overview
 
diff --git a/tinytorch/milestones/README.md b/tinytorch/milestones/README.md
index d99887cc2..807f927de 100644
--- a/tinytorch/milestones/README.md
+++ b/tinytorch/milestones/README.md
@@ -14,12 +14,12 @@ After completing a set of modules, you unlock the ability to run a milestone. Ea
 | ID | Name | Year | Required Modules | What You'll Do |
 |----|------|------|------------------|----------------|
-| 01 | Perceptron | 1957 | 1-3 | Build Rosenblatt's first neural network (forward pass) |
-| 02 | XOR Crisis | 1969 | 1-3 | Experience the mathematical impossibility that killed AI for 17 years |
-| 03 | MLP Revival | 1986 | Part 1: 1-6, Part 2: 1-6, 8 | SOLVE XOR with hidden layers, then train on TinyDigits |
-| 04 | CNN Revolution | 1998 | 1-6, 8-9 | Build LeNet for image recognition |
-| 05 | Transformer Era | 2017 | 1-6, 11-13 | Prove attention works with sequence tasks |
-| 06 | MLPerf Benchmarks | 2018 | Part 1: 1-6, 14-19, Part 2: 1-3, 11-12, 14, 17 | Optimize and benchmark your neural networks |
+| 01 | Perceptron | 1957 | Part 1: 01-04, Part 2: 01-08 | Build Rosenblatt's first neural network |
+| 02 | XOR Crisis | 1969 | Part 1: 01-03, Part 2: 01-08 | Experience and solve the XOR impossibility |
+| 03 | MLP Revival | 1986 | 01-08 | Train MLPs on TinyDigits with backpropagation |
+| 04 | CNN Revolution | 1998 | 01-09 | Build LeNet for image recognition |
+| 05 | Transformer Era | 2017 | 01-13 | Build attention and generate text |
+| 06 | MLPerf Benchmarks | 2018 | 01-19 | Optimize and benchmark your neural networks |
 
 ## Running Milestones
 
diff --git a/tinytorch/paper/module_flow.dot b/tinytorch/paper/module_flow.dot
index b112ec421..7010d049f 100644
--- a/tinytorch/paper/module_flow.dot
+++ b/tinytorch/paper/module_flow.dot
@@ -11,9 +11,9 @@ digraph TinyTorch {
     node [shape=box, style="rounded,filled", fontsize=10, fontname="Helvetica"];
     edge [arrowsize=0.7];
 
-    // === FOUNDATION TIER (01-07) ===
+    // === FOUNDATION TIER (01-08) ===
     subgraph cluster_foundation {
-        label="FOUNDATION (01-07)";
+        label="FOUNDATION (01-08)";
         labelloc=t;
         style="dashed,rounded";
         color=blue;
@@ -33,9 +33,9 @@ digraph TinyTorch {
         T -> A -> L -> Loss -> Auto -> Opt -> Train;
     }
 
-    // === ARCHITECTURE TIER (08-13) ===
+    // === ARCHITECTURE TIER (09-13) ===
     subgraph cluster_architecture {
-        label="ARCHITECTURE (08-13)";
+        label="ARCHITECTURE (09-13)";
         labelloc=t;
         style="dashed,rounded";
         color=purple;
diff --git a/tinytorch/paper/module_flow_horizontal.tex b/tinytorch/paper/module_flow_horizontal.tex
index 34fd8160b..ddebe2664 100644
--- a/tinytorch/paper/module_flow_horizontal.tex
+++ b/tinytorch/paper/module_flow_horizontal.tex
@@ -18,7 +18,7 @@
     tier/.style={draw=gray!40, dashed, rounded corners=5pt, inner sep=8pt}
 ]
 
-% === FOUNDATION TIER (01-07) ===
+% === FOUNDATION TIER (01-08) ===
 \node[foundation] (T) {01 Tensor};
 \node[foundation, right=of T] (A) {02 Activ.};
 \node[foundation, right=of A] (L) {03 Layers};
@@ -35,7 +35,7 @@
 \draw[arr] (Auto) -- (Opt);
 \draw[arr] (Opt) -- (Train);
 
-% === ARCHITECTURE TIER (08-13) ===
+% === ARCHITECTURE TIER (09-13) ===
 % DataLoader - the branch point
 \node[architecture, right=1.2cm of Train] (Data) {08 DataLoader};
 \draw[arr] (Train) -- (Data);
@@ -82,8 +82,8 @@
 \draw[arr] (Bench) -- (Cap);
 
 % === TIER LABELS ===
-\node[above=0.8cm of L, font=\scriptsize\bfseries, blue!70] {FOUNDATION (01-07)};
-\node[above=0.9cm of Att, font=\scriptsize\bfseries, purple!70] {ARCHITECTURE (08-13)};
+\node[above=0.8cm of L, font=\scriptsize\bfseries, blue!70] {FOUNDATION (01-08)};
+\node[above=0.9cm of Att, font=\scriptsize\bfseries, purple!70] {ARCHITECTURE (09-13)};
 \node[above=0.9cm of Comp, font=\scriptsize\bfseries, orange!70] {OPTIMIZATION (14-19)};
 
 % === PATH LABELS ===
diff --git a/tinytorch/paper/organizational_insights.md b/tinytorch/paper/organizational_insights.md
index 5f8c2d684..0dd74287b 100644
--- a/tinytorch/paper/organizational_insights.md
+++ b/tinytorch/paper/organizational_insights.md
@@ -123,7 +123,7 @@ This document summarizes key organizational decisions and learnings from TinyTor
 
 ### 6. Three-Tier Architecture Organization
 
-**Evolution**: Modules organized into Foundation (01-07), Architecture (08-13), Optimization (14-19), Olympics (20) tiers.
+**Evolution**: Modules organized into Foundation (01-08), Architecture (09-13), Optimization (14-19), Olympics (20) tiers.
 
 **Key Decision**:
 - **Tier-Based Progression**: Students cannot skip tiers; architectures require foundation mastery
diff --git a/tinytorch/site/big-picture.md b/tinytorch/site/big-picture.md
index a024934b4..38b18d3a1 100644
--- a/tinytorch/site/big-picture.md
+++ b/tinytorch/site/big-picture.md
@@ -25,7 +25,7 @@ TinyTorch takes you from basic tensors to production-ready ML systems through 20
 :caption: "**TinyTorch Module Flow.** The 20 modules progress through three tiers: Foundation (blue) builds core ML primitives, Architecture (purple) applies them to vision and language tasks, and Optimization (orange) makes systems production-ready."
 
 graph LR
-    subgraph F["FOUNDATION (01-07)"]
+    subgraph F["FOUNDATION (01-08)"]
         direction TB
         T["Tensor"] --> A["Activations"]
         A --> L["Layers"]
@@ -35,7 +35,7 @@ graph LR
         Opt --> Train["Training"]
     end
 
-    subgraph Arch["ARCHITECTURE (08-13)"]
+    subgraph Arch["ARCHITECTURE (09-13)"]
         direction TB
         Data["DataLoader"]
         Data --> Conv["CNNs"]
@@ -126,8 +126,7 @@ Concrete outcomes at each major checkpoint:
 | After Module | You'll Have Built | Historical Context |
 |--------------|-------------------|-------------------|
 | **01-04** | Working Perceptron classifier | Rosenblatt 1957 |
-| **01-06** | MLP solving XOR (hidden layers!) | AI Winter breakthrough 1969→1986 |
-| **01-08** | Complete training pipeline with DataLoader | Backpropagation era |
+| **01-08** | MLP solving XOR + complete training pipeline | AI Winter breakthrough 1969→1986 |
 | **01-09** | CNN with convolutions and pooling | LeNet-5 (1998) |
 | **01-13** | GPT model with autoregressive generation | "Attention Is All You Need" (2017) |
 | **01-19** | Optimized, quantized, accelerated system | Production ML today |
diff --git a/tinytorch/site/getting-started.md b/tinytorch/site/getting-started.md
index 30135ba3a..b6b768bc7 100644
--- a/tinytorch/site/getting-started.md
+++ b/tinytorch/site/getting-started.md
@@ -149,10 +149,9 @@ As you complete more modules, you unlock more milestones:
 
 | Modules Completed | Milestone Unlocked | What You Recreate |
 |-------------------|-------------------|-------------------|
-| 01-03 | `perceptron` | The 1957 Perceptron |
-| 01-05 | `backprop` | 1986 Backpropagation |
-| 01-07 | `lenet` | 1989 LeNet CNN |
-| 01-09 | `alexnet` | 2012 AlexNet |
+| 01-04 | `perceptron` | The 1957 Perceptron |
+| 01-08 | `backprop` | 1986 Backpropagation |
+| 01-09 | `lenet` | 1998 LeNet CNN |
 | 01-13 | `transformer` | 2017 Transformer |
 | 01-19 | `mlperf` | MLPerf Benchmarks |
 
@@ -188,8 +187,8 @@ TinyTorch has 20 modules organized in progressive tiers:
 
 | Tier | Modules | Focus | Time Estimate |
 |------|---------|-------|---------------|
-| **Foundation** | 01-07 | Core ML infrastructure (tensors, autograd, training) | ~15-20 hours |
-| **Architecture** | 08-13 | Neural architectures (data loading, CNNs, transformers) | ~18-24 hours |
+| **Foundation** | 01-08 | Core ML infrastructure (tensors, dataloader, autograd, training) | ~18-24 hours |
+| **Architecture** | 09-13 | Neural architectures (CNNs, transformers) | ~15-20 hours |
 | **Optimization** | 14-19 | Production optimization (profiling, quantization) | ~18-24 hours |
 | **Capstone** | 20 | Torch Olympics Competition | ~8-10 hours |
 
diff --git a/tinytorch/site/intro.md b/tinytorch/site/intro.md
index 73a06d1d5..0372268b0 100644
--- a/tinytorch/site/intro.md
+++ b/tinytorch/site/intro.md
@@ -214,12 +214,12 @@ Four progressive tiers take you from foundations to production systems:
-Foundation (01-07)
+Foundation (01-08)
 Tensors, autograd, layers, training loops
-Architecture (08-13)
+Architecture (09-13)
 CNNs, attention, transformers, GPT
diff --git a/tinytorch/site/tiers/olympics.md b/tinytorch/site/tiers/olympics.md
index 9b71fcfea..7b84818d1 100644
--- a/tinytorch/site/tiers/olympics.md
+++ b/tinytorch/site/tiers/olympics.md
@@ -355,9 +355,9 @@ tito olympics logo
 
 **Or review prerequisites:**
 
-- **[ Foundation Tier](foundation)** (Modules 01-07)
-- **[ Architecture Tier](architecture)** (Modules 08-13)
-- **[ Optimization Tier](optimization)** (Modules 14-19)
+- **[Foundation Tier](foundation)** (Modules 01-08)
+- **[Architecture Tier](architecture)** (Modules 09-13)
+- **[Optimization Tier](optimization)** (Modules 14-19)
 
 **[← Back to Home](../intro)**
 
diff --git a/tinytorch/site/tiers/optimization.md b/tinytorch/site/tiers/optimization.md
index 7ed6fe216..b5e0b3ebb 100644
--- a/tinytorch/site/tiers/optimization.md
+++ b/tinytorch/site/tiers/optimization.md
@@ -165,7 +165,7 @@ After completing the Optimization tier, you'll be able to:
 ## Prerequisites
 
 **Required**:
-- ** Architecture Tier** (Modules 08-13) completed
+- ** Architecture Tier** (Modules 09-13) completed
 - Understanding of CNNs and/or transformers
 - Experience training models on real datasets
 - Basic understanding of systems concepts (memory, CPU/GPU, throughput)
@@ -284,8 +284,8 @@ tito module start 14_profiling
 
 **Or explore other tiers:**
 
-- **[ Foundation Tier](foundation)** (Modules 01-07): Mathematical foundations
-- **[ Architecture Tier](architecture)** (Modules 08-13): CNNs and transformers
+- **[Foundation Tier](foundation)** (Modules 01-08): Mathematical foundations
+- **[Architecture Tier](architecture)** (Modules 09-13): CNNs and transformers
 - **[ Torch Olympics](olympics)** (Module 20): Final integration challenge
 
diff --git a/tinytorch/site/tito/milestones.md b/tinytorch/site/tito/milestones.md
index 393228975..82cfbff9c 100644
--- a/tinytorch/site/tito/milestones.md
+++ b/tinytorch/site/tito/milestones.md
@@ -178,7 +178,7 @@ tito milestone run 02
 
 **What**: Backpropagation breakthrough - train deep networks on MNIST
-**Requires**: Modules 01-07 (Complete Foundation Tier)
+**Requires**: Modules 01-08 (Complete Foundation Tier)
 
 **What you'll do**: Train a multi-layer perceptron to recognize handwritten digits (95%+ accuracy)
 
@@ -345,7 +345,7 @@ TinyTorch tracks progress in three ways (all are related but distinct):
 ### Relationship Between Systems
 
 ```
-Complete Modules (01-07)
+Complete Modules (01-08)
     ↓
 Unlock Milestone 03
     ↓
@@ -362,8 +362,8 @@ Capability Unlocked (optional checkpoint system)
 While you can technically skip around, the tier structure is designed for progressive learning:
 
-- **Foundation Tier (01-07)**: Required for first milestone
-- **Architecture Tier (08-13)**: Build on Foundation
+- **Foundation Tier (01-08)**: Required for first milestone
+- **Architecture Tier (09-13)**: Build on Foundation
 - **Optimization Tier (14-19)**: Build on Architecture
 
 ### 2. Test as You Go
diff --git a/tinytorch/src/09_convolutions/ABOUT.md b/tinytorch/src/09_convolutions/ABOUT.md
index f8c858779..3c2cb9ac8 100644
--- a/tinytorch/src/09_convolutions/ABOUT.md
+++ b/tinytorch/src/09_convolutions/ABOUT.md
@@ -6,7 +6,7 @@
 **ARCHITECTURE TIER** | Difficulty: ●●●○ | Time: 6-8 hours | Prerequisites: 01-08
 
 **Prerequisites: Modules 01-08** assumes you have:
-- Built the complete training pipeline (Modules 01-07)
+- Built the complete training pipeline (Modules 01-08)
 - Implemented DataLoader for batch processing (Module 05)
 - Understanding of parameter initialization, forward/backward passes, and optimization
 
diff --git a/tinytorch/src/10_tokenization/ABOUT.md b/tinytorch/src/10_tokenization/ABOUT.md
index 9cf452544..47eb31d76 100644
--- a/tinytorch/src/10_tokenization/ABOUT.md
+++ b/tinytorch/src/10_tokenization/ABOUT.md
@@ -3,9 +3,9 @@
 :::{admonition} Module Info
 :class: note
 
-**ARCHITECTURE TIER** | Difficulty: ●●○○ | Time: 4-6 hours | Prerequisites: 01-07
+**ARCHITECTURE TIER** | Difficulty: ●●○○ | Time: 4-6 hours | Prerequisites: 01-08
 
-**Prerequisites: Foundation tier (Modules 01-07)** means you should have completed:
+**Prerequisites: Foundation tier (Modules 01-08)** means you should have completed:
 - Tensor operations (Module 01)
 - Basic neural network components (Modules 02-04)
 - Training fundamentals (Modules 05-07)
diff --git a/tinytorch/src/11_embeddings/ABOUT.md b/tinytorch/src/11_embeddings/ABOUT.md
index a1a7efd56..fadca2c4d 100644
--- a/tinytorch/src/11_embeddings/ABOUT.md
+++ b/tinytorch/src/11_embeddings/ABOUT.md
@@ -3,9 +3,9 @@
 :::{admonition} Module Info
 :class: note
 
-**ARCHITECTURE TIER** | Difficulty: ●●○○ | Time: 3-5 hours | Prerequisites: 01-07, 10
+**ARCHITECTURE TIER** | Difficulty: ●●○○ | Time: 3-5 hours | Prerequisites: 01-08, 10
 
-**Prerequisites: Modules 01-07 and 10** means you should understand:
+**Prerequisites: Modules 01-08 and 10** means you should understand:
 - Tensor operations (shape manipulation, matrix operations, broadcasting)
 - Training fundamentals (forward/backward, optimization)
 - Tokenization (converting text to token IDs, vocabularies)
diff --git a/tinytorch/src/12_attention/ABOUT.md b/tinytorch/src/12_attention/ABOUT.md
index 1884940c2..e7ebbc8c7 100644
--- a/tinytorch/src/12_attention/ABOUT.md
+++ b/tinytorch/src/12_attention/ABOUT.md
@@ -3,9 +3,9 @@
 :::{admonition} Module Info
 :class: note
 
-**ARCHITECTURE TIER** | Difficulty: ●●●○ | Time: 5-7 hours | Prerequisites: 01-07, 10-11
+**ARCHITECTURE TIER** | Difficulty: ●●●○ | Time: 5-7 hours | Prerequisites: 01-08, 10-11
 
-**Prerequisites: Modules 01-07 and 10-11** means you should understand:
+**Prerequisites: Modules 01-08 and 10-11** means you should understand:
 - Tensor operations and shape manipulation (Module 01)
 - Activations, particularly softmax (Module 02)
 - Linear layers and weight projections (Module 03)
diff --git a/tinytorch/src/13_transformers/ABOUT.md b/tinytorch/src/13_transformers/ABOUT.md
index 7025b27a4..939b33677 100644
--- a/tinytorch/src/13_transformers/ABOUT.md
+++ b/tinytorch/src/13_transformers/ABOUT.md
@@ -3,9 +3,9 @@
 :::{admonition} Module Info
 :class: note
 
-**ARCHITECTURE TIER** | Difficulty: ●●●● | Time: 8-10 hours | Prerequisites: 01-07, 10-12
+**ARCHITECTURE TIER** | Difficulty: ●●●● | Time: 8-10 hours | Prerequisites: 01-08, 10-12
 
-**Prerequisites: Modules 01-07 and 10-12** means you need a strong foundation across three domains. This module assumes you've implemented tensors, layers, training loops, tokenization, embeddings, and attention mechanisms. If you can explain how multi-head attention processes queries, keys, and values to compute weighted representations, you're ready.
+**Prerequisites: Modules 01-08 and 10-12** means you need a strong foundation across three domains. This module assumes you've implemented tensors, layers, training loops, tokenization, embeddings, and attention mechanisms. If you can explain how multi-head attention processes queries, keys, and values to compute weighted representations, you're ready.
 :::
 
 `````{only} html
diff --git a/tinytorch/src/14_profiling/ABOUT.md b/tinytorch/src/14_profiling/ABOUT.md
index 00ca7bbfd..19d5ea4e1 100644
--- a/tinytorch/src/14_profiling/ABOUT.md
+++ b/tinytorch/src/14_profiling/ABOUT.md
@@ -6,7 +6,7 @@
 **OPTIMIZATION TIER** | Difficulty: ●●○○ | Time: 3-5 hours | Prerequisites: 01-13
 
 **Prerequisites: Modules 01-13** means you should have:
-- Built the complete ML stack (Modules 01-07)
+- Built the complete ML stack (Modules 01-08)
 - Implemented CNN architectures (Module 09) or Transformers (Modules 10-13)
 - Models to profile and optimize
 
diff --git a/tinytorch/src/16_compression/ABOUT.md b/tinytorch/src/16_compression/ABOUT.md
index 685254538..99c589311 100644
--- a/tinytorch/src/16_compression/ABOUT.md
+++ b/tinytorch/src/16_compression/ABOUT.md
@@ -6,7 +6,7 @@
 **OPTIMIZATION TIER** | Difficulty: ●●●○ | Time: 5-7 hours | Prerequisites: 01-14
 
 **Prerequisites: Modules 01-14** means you should have:
-- Built tensors, layers, and the complete training pipeline (Modules 01-07)
+- Built tensors, layers, and the complete training pipeline (Modules 01-08)
 - Implemented profiling tools to measure model characteristics (Module 14)
 - Comfort with weight distributions, parameter counting, and memory analysis
 
diff --git a/tinytorch/tests/integration/README.md b/tinytorch/tests/integration/README.md
index 3366136a0..3fe2e01e1 100644
--- a/tinytorch/tests/integration/README.md
+++ b/tinytorch/tests/integration/README.md
@@ -20,7 +20,7 @@ This pattern catches the most common and frustrating bugs students encounter.
 
 | Test File | What It Catches | Modules |
 |-----------|-----------------|---------|
-| `test_gradient_flow.py` | Broken backpropagation | 01-07 |
+| `test_gradient_flow.py` | Broken backpropagation | 01-08 |
 | `test_training_flow.py` | Training loop failures | 05-07 |
 | `test_nlp_pipeline_flow.py` | NLP stack issues | 10-13 |
 | `test_cnn_integration.py` | CNN gradient issues | 09 |
diff --git a/tinytorch/tito/commands/module/test.py b/tinytorch/tito/commands/module/test.py
index d90ca0627..6328ea392 100644
--- a/tinytorch/tito/commands/module/test.py
+++ b/tinytorch/tito/commands/module/test.py
@@ -205,7 +205,7 @@ class ModuleTestCommand(BaseCommand):
         # Map module numbers to relevant integration tests
         # Each module inherits tests from earlier modules (progressive testing)
         integration_test_map = {
-            # Foundation modules (01-07)
+            # Foundation modules (01-08)
             1: ["test_basic_integration.py"],
             2: ["test_basic_integration.py"],
             3: ["test_layers_integration.py"],
@@ -214,7 +214,7 @@
             6: ["test_training_flow.py"],
             7: ["test_training_flow.py"],
+            8: ["test_dataloader_integration.py"],
 
-            # Architecture modules (08-13)
-            8: ["test_dataloader_integration.py"],
+            # Architecture modules (09-13)
             9: ["test_cnn_integration.py"],
             10: [],  # Tokenization: self-contained, no integration deps