MAJOR: Implement beautiful module progression through strategic reordering

This commit implements the pedagogically optimal "inevitable discovery" module progression based on expert validation and educational design principles.

## Module Reordering Summary

**Previous Order (Problems)**:
- 05_losses → 06_autograd → 07_dataloader → 08_optimizers → 09_spatial → 10_training
- Issues: Autograd before optimizers, DataLoader before training, scattered dependencies

**New Order (Beautiful Progression)**:
- 05_losses → 06_optimizers → 07_autograd → 08_training → 09_spatial → 10_dataloader
- Benefits: Each module creates inevitable need for the next

## Pedagogical Flow Achieved

**05_losses** → "Need systematic weight updates" → **06_optimizers**
**06_optimizers** → "Need automatic gradients" → **07_autograd**
**07_autograd** → "Need systematic training" → **08_training**
**08_training** → "MLPs hit limits on images" → **09_spatial**
**09_spatial** → "Training is too slow" → **10_dataloader**

## Technical Changes

### Module Directory Renaming
- `06_autograd` → `07_autograd`
- `07_dataloader` → `10_dataloader`
- `08_optimizers` → `06_optimizers`
- `10_training` → `08_training`
- `09_spatial` → `09_spatial` (no change)

### System Integration Updates
- **MODULE_TO_CHECKPOINT mapping**: Updated in tito/commands/export.py
- **Test directories**: Renamed module_XX directories to match new numbers
- **Documentation**: Updated all references in MD files and agent configurations
- **CLI integration**: Updated next-steps suggestions for proper flow

### Agent Configuration Updates
- **Quality Assurance**: Updated module audit status with new numbers
- **Module Developer**: Updated work tracking with new sequence
- **Documentation**: Updated MASTER_PLAN_OF_RECORD.md with beautiful progression

## Educational Benefits

1. **Inevitable Discovery**: Each module naturally leads to the next
2. **Cognitive Load**: Concepts introduced exactly when needed
3. **Motivation**: Students understand WHY each tool is necessary
4. **Synthesis**: Everything flows toward complete ML systems understanding
5. **Professional Alignment**: Matches real ML engineering workflows

## Quality Assurance

- ✅ All CLI commands still function
- ✅ Checkpoint system mappings updated
- ✅ Documentation consistency maintained
- ✅ Test directory structure aligned
- ✅ Agent configurations synchronized

**Impact**: This reordering transforms TinyTorch from a collection of modules into a coherent educational journey where each step naturally motivates the next, creating optimal conditions for deep learning systems understanding.
2026-05-05 17:42:51 -05:00 · 2025-09-24 15:56:47 -04:00
parent 0d87b6603f
commit 2f23f757e7
68 changed files with 5875 additions and 2399 deletions
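
The commit message says the `module_XX` test directories were renumbered to match the new order. If those directories carry only the number (an assumption; the message gives just the `module_XX` pattern), the renames form a cycle (06 → 07 → 10 → 08 → 06), so a naive one-by-one rename would hit an existing target. A minimal sketch of a two-phase rename that avoids this; `TEST_DIR_RENAMES` and `apply_renames` are hypothetical names, not part of this repository, and the commit itself may have used `git mv` or another approach:

```python
from pathlib import Path

# Hypothetical old -> new map for the test directories, derived from the
# module renumbering in the commit message. The exact directory names are
# an assumption.
TEST_DIR_RENAMES = {
    "module_06": "module_07",  # autograd:   06 -> 07
    "module_07": "module_10",  # dataloader: 07 -> 10
    "module_08": "module_06",  # optimizers: 08 -> 06
    "module_10": "module_08",  # training:   10 -> 08
}


def apply_renames(root: Path, renames: dict) -> None:
    """Rename directories under `root` in two phases so cycles never collide."""
    staged = {}
    # Phase 1: park every source directory under a unique temporary name.
    for old, new in renames.items():
        tmp = root / f".renaming_{old}"
        (root / old).rename(tmp)
        staged[tmp] = root / new
    # Phase 2: every target name is free now, so move each one into place.
    for tmp, target in staged.items():
        tmp.rename(target)
```

Calling `apply_renames(Path("tests"), TEST_DIR_RENAMES)` would perform the renumbering in one pass; the `tests` path here is likewise only illustrative.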
--- a/modules/16_caching/README.md
+++ b/modules/16_caching/README.md
@@ -0,0 +1,63 @@
+# Module 16: Caching - Memory Optimization for Transformers
+
+## Overview
+Cut the per-token attention cost of autoregressive generation from O(N²) to O(N) by caching keys and values, at the price of O(N) cache memory. Learn how production systems use this tradeoff to achieve 10-100x speedups in autoregressive generation.
+
+## What You'll Build
+- **KV Cache System**: Store and reuse attention computations across time steps
+- **Incremental Attention**: Compute only new tokens, not full sequence
+- **Memory Manager**: Track and optimize cache usage
+- **Production Patterns**: Learn how GPT and LLaMA handle generation
+
+## Learning Objectives
+1. **Memory vs Computation Tradeoffs**: When to trade memory for speed
+2. **Incremental Computation**: Reuse previous results efficiently  
+3. **Cache Management**: Handle variable sequence lengths
+4. **Real-World Impact**: See 50x speedup in text generation
+
+## Prerequisites
+- Module 14: Transformers (understand attention mechanism)
+- Module 15: Acceleration (backend dispatch system)
+
+## Key Concepts
+
+### The Problem: Redundant Computation
+```python
+# Without caching - recompute everything each token
+for token in range(1000):
+    # Compute attention for ALL previous tokens
+    output = attention(tokens[:token+1])  # O(N²) per token!
+```
+
+### The Solution: KV Caching
+```python
+# With caching - compute only new token
+cache = KVCache()
+for token in range(1000):
+    # Compute attention only for new token
+    output = attention(new_token, cache=cache)  # O(N) per token!
+    cache.update(new_token)
+```
+
+## Performance Impact
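+- **Before**: 1000-token generation = 1 + 2 + ... + 1000 = 500,500 token positions run through attention
+- **After**: 1000-token generation = 1,000 token positions run through attention (one per step)
+- **Speedup**: About 500x fewer positions processed; each new token still attends over all cached keys, so end-to-end speedups are smaller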
+
+## Real-World Applications
+- **ChatGPT**: How it generates responses in real-time
+- **GitHub Copilot**: Instant code suggestions
+- **LLaMA**: Efficient on-device inference
+
+## Module Structure
+1. **Understanding the Problem**: Profile transformer generation bottlenecks
+2. **Building KV Cache**: Implement cache data structure
+3. **Incremental Attention**: Modify attention for single-token updates
+4. **Integration**: Transparently accelerate existing transformer
+5. **Analysis**: Measure memory usage and speedup
+
+## Success Criteria
+- ✅ Transformer generates 1000 tokens with O(N) memory
+- ✅ 10x+ speedup on autoregressive generation
+- ✅ Existing transformer code works unchanged
+- ✅ Understand production caching strategies
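
The two loops in the README's "Key Concepts" section can be made concrete with a small, dependency-free sketch. It is not the module's implementation. It assumes a single attention head, identity projections (query = key = value = the token vector), and a toy append-only `KVCache`. It shows that the cached loop produces the same per-step outputs as full recomputation while evaluating far fewer query rows.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention for ONE query over all keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    # Numerically stable softmax over the scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class KVCache:
    """Toy cache (assumption): append-only lists of keys and values."""

    def __init__(self):
        self.keys = []
        self.values = []

    def update(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def generate_without_cache(tokens):
    """At every step, recompute causal attention for the whole prefix."""
    outputs, query_rows = [], 0
    for t in range(1, len(tokens) + 1):
        prefix = tokens[:t]
        # Each position i attends over positions 0..i (causal mask).
        step = [attend(prefix[i], prefix[: i + 1], prefix[: i + 1]) for i in range(t)]
        query_rows += t
        outputs.append(step[-1])  # only the newest position is actually needed
    return outputs, query_rows


def generate_with_cache(tokens):
    """At every step, compute attention for the newest token only."""
    cache, outputs, query_rows = KVCache(), [], 0
    for token in tokens:
        cache.update(token, token)  # identity projections: k = v = token
        outputs.append(attend(token, cache.keys, cache.values))
        query_rows += 1
    return outputs, query_rows
```

For a 6-token sequence the uncached loop evaluates 1 + 2 + ... + 6 = 21 query rows and the cached loop evaluates 6. The closed form N(N+1)/2 gives 500,500 for N = 1000, which is where the README's "Before" figure comes from.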
--- a/modules/16_caching/module.yaml
+++ b/modules/16_caching/module.yaml
@@ -0,0 +1,28 @@
+name: Caching
+number: 16
+type: optimization
+difficulty: advanced
+estimated_hours: 8-12
+
+description: |
+  Memory optimization through caching, focusing on KV caching for transformer inference.
+  Students learn how to reuse computations across time steps in autoregressive generation.
+
+learning_objectives:
+  - Understand memory vs computation tradeoffs
+  - Implement KV caching for transformer inference
+  - Learn incremental computation patterns
+  - Optimize autoregressive generation speed
+
+prerequisites:
+  - Module 14: Transformers
+  - Module 15: Acceleration
+
+skills_developed:
+  - Memory optimization techniques
+  - Incremental computation strategies
+  - Transformer inference optimization
+  - Cache management patterns
+
+exports:
+  - tinytorch.optimizations.caching
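
The "memory vs computation tradeoffs" objective can be quantified with a back-of-the-envelope estimate: a KV cache stores one key and one value vector per token, per head, per layer. The sketch below is a rough sizing helper under that assumption. The function name `kv_cache_bytes` and the example configuration are hypothetical and do not come from this repository.

```python
def kv_cache_bytes(n_layers, n_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_element=2):
    """Estimate KV cache size in bytes.

    The factor 2 covers keys and values; bytes_per_element=2 assumes
    16-bit floats. Cache size grows linearly with seq_len.
    """
    return 2 * n_layers * n_heads * head_dim * seq_len * batch_size * bytes_per_element


# Hypothetical configuration: 32 layers, 32 heads of width 128, 1000 tokens.
size = kv_cache_bytes(n_layers=32, n_heads=32, head_dim=128, seq_len=1000)
print(f"{size / 2**20:.0f} MiB")  # prints "500 MiB"
```

Doubling `seq_len` doubles the estimate, which is the O(N) memory cost paid in exchange for skipping the recomputation of earlier tokens.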