Mirror of https://github.com/harvard-edge/cs249r_book.git (synced 2026-04-29 00:59:07 -05:00)
docs(tinytorch): update TOC and tier documentation for module reordering

- Update _toc.yml: Foundation (01-08), Architecture (09-13)
- Update _toc_pdf.yml: same tier ranges
- Update foundation.md: add DataLoader as module 05, renumber autograd/optimizers/training
- Update architecture.md: remove DataLoader, start with Convolutions at 09
- Update all Mermaid diagrams and tier references
_toc.yml:

@@ -17,7 +17,7 @@ parts:
         title: "Quick Start"
 
   # Foundation Tier - Collapsible section
-  - caption: 🏗 Foundation Tier (01-07)
+  - caption: 🏗 Foundation Tier (01-08)
     chapters:
       - file: tiers/foundation
         title: "📖 Tier Overview"
@@ -29,20 +29,20 @@ parts:
         title: "03. Layers"
       - file: modules/04_losses_ABOUT
         title: "04. Losses"
-      - file: modules/05_autograd_ABOUT
-        title: "05. Autograd"
-      - file: modules/06_optimizers_ABOUT
-        title: "06. Optimizers"
-      - file: modules/07_training_ABOUT
-        title: "07. Training"
+      - file: modules/05_dataloader_ABOUT
+        title: "05. DataLoader"
+      - file: modules/06_autograd_ABOUT
+        title: "06. Autograd"
+      - file: modules/07_optimizers_ABOUT
+        title: "07. Optimizers"
+      - file: modules/08_training_ABOUT
+        title: "08. Training"
 
   # Architecture Tier - Collapsible section
-  - caption: 🏛️ Architecture Tier (08-13)
+  - caption: 🏛️ Architecture Tier (09-13)
     chapters:
       - file: tiers/architecture
         title: "📖 Tier Overview"
-      - file: modules/08_dataloader_ABOUT
-        title: "08. DataLoader"
       - file: modules/09_convolutions_ABOUT
         title: "09. Convolutions"
       - file: modules/10_tokenization_ABOUT
_toc_pdf.yml:

@@ -19,7 +19,7 @@ parts:
         title: "Quick Start"
 
   # Foundation Tier
-  - caption: Foundation (Modules 01-07)
+  - caption: Foundation (Modules 01-08)
     numbered: true
     chapters:
       - file: modules/01_tensor_ABOUT
@@ -30,19 +30,19 @@ parts:
         title: "03. Layers"
       - file: modules/04_losses_ABOUT
         title: "04. Losses"
-      - file: modules/05_autograd_ABOUT
-        title: "05. Autograd"
-      - file: modules/06_optimizers_ABOUT
-        title: "06. Optimizers"
-      - file: modules/07_training_ABOUT
-        title: "07. Training"
+      - file: modules/05_dataloader_ABOUT
+        title: "05. DataLoader"
+      - file: modules/06_autograd_ABOUT
+        title: "06. Autograd"
+      - file: modules/07_optimizers_ABOUT
+        title: "07. Optimizers"
+      - file: modules/08_training_ABOUT
+        title: "08. Training"
 
   # Architecture Tier
-  - caption: Architecture (Modules 08-13)
+  - caption: Architecture (Modules 09-13)
     numbered: true
     chapters:
-      - file: modules/08_dataloader_ABOUT
-        title: "08. DataLoader"
       - file: modules/09_convolutions_ABOUT
         title: "09. Convolutions"
      - file: modules/10_tokenization_ABOUT
architecture.md:

@@ -1,14 +1,13 @@
-# Architecture Tier (Modules 08-13)
+# Architecture Tier (Modules 09-13)
 
 **Build modern neural architectures—from computer vision to language models.**
 
 
 ## What You'll Learn
 
-The Architecture tier teaches you how to build the neural network architectures that power modern AI. You'll implement CNNs for computer vision, transformers for language understanding, and the data loading infrastructure needed to train on real datasets.
+The Architecture tier teaches you how to build the neural network architectures that power modern AI. You'll implement CNNs for computer vision and transformers for language understanding, building on the foundational training infrastructure from the previous tier.
 
 **By the end of this tier, you'll understand:**
-- How data loaders efficiently feed training data to models
 - Why convolutional layers are essential for computer vision
 - How attention mechanisms enable transformers to understand sequences
 - What embeddings do to represent discrete tokens as continuous vectors
@@ -19,14 +18,11 @@ The Architecture tier teaches you how to build the neural network architectures
 
 ```{mermaid}
 :align: center
-:caption: "**Architecture Module Flow.** Two parallel tracks branch from Foundation: vision (DataLoader, Convolutions) and language (Tokenization through Transformers)."
+:caption: "**Architecture Module Flow.** Two parallel tracks branch from Foundation: vision (Convolutions) and language (Tokenization through Transformers)."
 graph TB
-    F[ Foundation<br/>Tensor, Autograd, Training]
+    F[ Foundation<br/>Tensor, DataLoader, Autograd, Training]
 
-    F --> M08[08. DataLoader<br/>Efficient data pipelines]
     F --> M09[09. Convolutions<br/>Conv2d + Pooling]
-
-    M08 --> M09
     M09 --> VISION[ Computer Vision<br/>CNNs unlock spatial intelligence]
 
     F --> M10[10. Tokenization<br/>Text → integers]
@@ -37,7 +33,6 @@ graph TB
     M13 --> LLM[ Language Models<br/>Transformers generate text]
 
     style F fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
-    style M08 fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px
     style M09 fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px
     style M10 fill:#e1bee7,stroke:#6a1b9a,stroke-width:3px
     style M11 fill:#e1bee7,stroke:#6a1b9a,stroke-width:3px
@@ -50,17 +45,6 @@ graph TB
 
 ## Module Details
 
-### 08. DataLoader - Efficient Data Pipelines
-
-**What it is**: Infrastructure for loading, batching, and shuffling training data efficiently.
-
-**Why it matters**: Real ML systems train on datasets that don't fit in memory. DataLoaders handle batching, shuffling, and parallel data loading—essential for efficient training.
-
-**What you'll build**: A DataLoader that supports batching, shuffling, and dataset iteration with proper memory management.
-
-**Systems focus**: Memory efficiency, batching strategies, I/O optimization
-
-
 ### 09. Convolutions - Convolutional Neural Networks
 
 **What it is**: Conv2d (convolutional layers) and pooling operations for processing images.
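For orientation, the Conv2d operation this section describes reduces to a small loop: slide a kernel over the image and take a dot product at each position. Below is a minimal sketch of a single-channel 2D convolution in NumPy; `conv2d_naive` and its signature are illustrative, not TinyTorch's actual API:

```python
import numpy as np

def conv2d_naive(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` (no padding, stride 1) and
    return the feature map of dot products at each position."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output pixel is the elementwise product of the
            # kernel and the patch under it, summed up.
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

# A 3x3 edge-detection kernel applied to a random 8x8 "image".
feature_map = conv2d_naive(np.random.rand(8, 8),
                           np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]]))
print(feature_map.shape)  # (6, 6)
```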
@@ -124,7 +108,7 @@ graph TB
 
 ```{mermaid}
 :align: center
-:caption: "**Architecture Tier Milestones.** After completing modules 08-13, you unlock computer vision (1998 CNN) and language understanding (2017 Transformer) breakthroughs."
+:caption: "**Architecture Tier Milestones.** After completing modules 09-13, you unlock computer vision (1998 CNN) and language understanding (2017 Transformer) breakthroughs."
 timeline
     title Historical Achievements Unlocked
     1998 : CNN Revolution : 75%+ accuracy on CIFAR-10 with spatial intelligence
@@ -142,8 +126,8 @@ After completing the Architecture tier, you'll be able to:
 ## Prerequisites
 
 **Required**:
-- ** Foundation Tier** (Modules 01-07) completed
-- Understanding of tensors, autograd, and training loops
+- ** Foundation Tier** (Modules 01-08) completed
+- Understanding of tensors, data loaders, autograd, and training loops
 - Basic understanding of images (height, width, channels)
 - Basic understanding of text/language concepts
@@ -199,8 +183,8 @@ python 01_vaswani_generation.py # Text generation with YOUR transformer
 
 The Architecture tier splits into two parallel paths that can be learned in any order:
 
-**Vision Track (Modules 08-09)**:
-- DataLoader → Convolutions (Conv2d + Pooling)
+**Vision Track (Module 09)**:
+- Convolutions (Conv2d + Pooling)
 - Enables computer vision applications
 - Culminates in CNN milestone
@@ -209,7 +193,7 @@ The Architecture tier splits into two parallel paths that can be learned in any
 - Enables natural language processing
 - Culminates in Transformer milestone
 
-**Recommendation**: Complete both tracks in order (08→09→10→11→12→13), but you can prioritize the track that interests you more.
+**Recommendation**: Complete both tracks in order (09→10→11→12→13), but you can prioritize the track that interests you more.
 
 
 ## Next Steps
@@ -217,8 +201,8 @@ The Architecture tier splits into two parallel paths that can be learned in any
 **Ready to build modern architectures?**
 
 ```bash
-# Start the Architecture tier
-tito module start 08_dataloader
+# Start the Architecture tier with vision
+tito module start 09_convolutions
 
 # Or jump to language models
 tito module start 10_tokenization
@@ -226,7 +210,7 @@ tito module start 10_tokenization
 
 **Or explore other tiers:**
 
-- **[ Foundation Tier](foundation)** (Modules 01-07): Mathematical foundations
+- **[ Foundation Tier](foundation)** (Modules 01-08): Mathematical foundations
 - **[ Optimization Tier](optimization)** (Modules 14-19): Production-ready performance
 - **[ Torch Olympics](olympics)** (Module 20): Compete in ML systems challenges
foundation.md:

@@ -1,15 +1,16 @@
-# Foundation Tier (Modules 01-07)
+# Foundation Tier (Modules 01-08)
 
 **Build the mathematical core that makes neural networks learn.**
 
 
 ## What You'll Learn
 
-The Foundation tier teaches you how to build a complete learning system from scratch. Starting with basic tensor operations, you'll construct the mathematical infrastructure that powers every modern ML framework—automatic differentiation, gradient-based optimization, and training loops.
+The Foundation tier teaches you how to build a complete learning system from scratch. Starting with basic tensor operations, you'll construct the mathematical infrastructure that powers every modern ML framework—data loading, automatic differentiation, gradient-based optimization, and training loops.
 
 **By the end of this tier, you'll understand:**
 - How tensors represent and transform data in neural networks
 - Why activation functions enable non-linear learning
+- How data loaders efficiently feed training data to models
 - How backpropagation computes gradients automatically
 - What optimizers do to make training converge
 - How training loops orchestrate the entire learning process
@@ -19,18 +20,18 @@ The Foundation tier teaches you how to build a complete learning system from scr
 
 ```{mermaid}
 :align: center
-:caption: "**Foundation Module Dependencies.** Tensors and activations feed into layers, which connect to losses and autograd, enabling optimizers and ultimately training loops."
+:caption: "**Foundation Module Dependencies.** Tensors and activations feed into layers, which connect to losses and dataloader, then autograd, enabling optimizers and ultimately training loops."
 graph TB
     M01[01. Tensor<br/>Multidimensional arrays] --> M03[03. Layers<br/>Linear transformations]
     M02[02. Activations<br/>Non-linear functions] --> M03
 
     M03 --> M04[04. Losses<br/>Measure prediction quality]
-    M03 --> M05[05. Autograd<br/>Automatic differentiation]
+    M04 --> M05[05. DataLoader<br/>Efficient data pipelines]
+    M05 --> M06[06. Autograd<br/>Automatic differentiation]
 
-    M04 --> M06[06. Optimizers<br/>Gradient-based updates]
-    M05 --> M06
+    M06 --> M07[07. Optimizers<br/>Gradient-based updates]
 
-    M06 --> M07[07. Training<br/>Complete learning loop]
+    M07 --> M08[08. Training<br/>Complete learning loop]
 
     style M01 fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
     style M02 fill:#e3f2fd,stroke:#1976d2,stroke-width:3px
@@ -38,7 +39,8 @@ graph TB
     style M04 fill:#90caf9,stroke:#1565c0,stroke-width:3px
     style M05 fill:#90caf9,stroke:#1565c0,stroke-width:3px
     style M06 fill:#64b5f6,stroke:#0d47a1,stroke-width:3px
-    style M07 fill:#42a5f5,stroke:#0d47a1,stroke-width:4px
+    style M07 fill:#64b5f6,stroke:#0d47a1,stroke-width:3px
+    style M08 fill:#42a5f5,stroke:#0d47a1,stroke-width:4px
 ```
 
@@ -88,7 +90,18 @@ graph TB
 **Systems focus**: Numerical stability (log-sum-exp trick), reduction strategies
 
 
-### 05. Autograd - The Gradient Revolution
+### 05. DataLoader - Efficient Data Pipelines
+
+**What it is**: Infrastructure for loading, batching, and shuffling training data efficiently.
+
+**Why it matters**: Real ML systems train on datasets that don't fit in memory. DataLoaders handle batching, shuffling, and parallel data loading, which are essential for efficient training.
+
+**What you'll build**: A DataLoader that supports batching, shuffling, and dataset iteration with proper memory management.
+
+**Systems focus**: Memory efficiency, batching strategies, I/O optimization
+
+
+### 06. Autograd - The Gradient Revolution
 
 **What it is**: Automatic differentiation system that computes gradients through computation graphs.
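A minimal sketch of the batching-and-shuffling pattern the new module 05 section describes; the class name and constructor signature are illustrative, not TinyTorch's actual interface:

```python
import random

class DataLoader:
    """Minimal batching/shuffling iterator over an indexable dataset.
    Illustrative only; real loaders add prefetching and parallel I/O."""

    def __init__(self, dataset, batch_size=32, shuffle=True):
        self.dataset = dataset          # anything supporting len() and []
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        indices = list(range(len(self.dataset)))
        if self.shuffle:
            random.shuffle(indices)     # reshuffle every epoch
        # Yield one batch at a time so the full dataset never has to
        # be materialized in memory at once.
        for start in range(0, len(indices), self.batch_size):
            batch_idx = indices[start:start + self.batch_size]
            yield [self.dataset[i] for i in batch_idx]

for batch in DataLoader(list(range(10)), batch_size=4):
    print(batch)  # three batches of sizes 4, 4, 2 in shuffled order
```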
@@ -99,7 +112,7 @@ graph TB
 **Systems focus**: Computational graphs, topological sorting, gradient accumulation
 
 
-### 06. Optimizers - Learning from Gradients
+### 07. Optimizers - Learning from Gradients
 
 **What it is**: Algorithms that update parameters using gradients (SGD, Adam, RMSprop).
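For reference, the kind of update rule this renumbered Optimizers section covers fits in a few lines. This is a sketch of SGD with momentum in NumPy; `sgd_momentum_step` is an illustrative name, not TinyTorch's optimizer API:

```python
import numpy as np

def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD+momentum update: v <- mu*v - lr*g, then p <- p + v."""
    velocity = momentum * velocity - lr * grad  # momentum buffer smooths updates
    return param + velocity, velocity

# Minimize f(x) = x^2 (gradient 2x) starting from x = 5.0.
x, v = np.array(5.0), np.array(0.0)
for _ in range(100):
    x, v = sgd_momentum_step(x, 2 * x, v, lr=0.1)
print(x)  # oscillates toward the minimum at 0
```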
@@ -110,7 +123,7 @@ graph TB
 **Systems focus**: Update rules, momentum buffers, numerical stability
 
 
-### 07. Training - Orchestrating the Learning Process
+### 08. Training - Orchestrating the Learning Process
 
 **What it is**: The training loop that ties everything together—forward pass, loss computation, backpropagation, parameter updates.
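The four-step cycle named here (forward, loss, backward, update) is easiest to see as a skeleton. The sketch below assumes hypothetical `model`, `loss_fn`, `optimizer`, and `loader` objects with PyTorch-style methods; it is not TinyTorch's actual training API:

```python
def train(model, loss_fn, optimizer, loader, epochs=10):
    """Skeleton of the canonical training loop (hypothetical API)."""
    for epoch in range(epochs):
        for inputs, targets in loader:
            predictions = model(inputs)           # 1. forward pass
            loss = loss_fn(predictions, targets)  # 2. loss computation
            optimizer.zero_grad()                 # clear stale gradients
            loss.backward()                       # 3. backpropagation
            optimizer.step()                      # 4. parameter update
```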
@@ -125,7 +138,7 @@ graph TB
 
 ```{mermaid}
 :align: center
-:caption: "**Foundation Tier Milestones.** After completing modules 01-07, you unlock three historical achievements spanning three decades of neural network breakthroughs."
+:caption: "**Foundation Tier Milestones.** After completing modules 01-08, you unlock three historical achievements spanning three decades of neural network breakthroughs."
 timeline
     title Historical Achievements Unlocked
     1957 : Perceptron : Binary classification with gradient descent
@@ -187,7 +200,7 @@ tito module start 01_tensor
 
 **Or explore other tiers:**
 
-- **[ Architecture Tier](architecture)** (Modules 08-13): CNNs, transformers, attention
+- **[ Architecture Tier](architecture)** (Modules 09-13): CNNs, transformers, attention
 - **[ Optimization Tier](optimization)** (Modules 14-19): Production-ready performance
 - **[ Torch Olympics](olympics)** (Module 20): Compete in ML systems challenges