mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-05-05 17:42:51 -05:00
MAJOR: Implement beautiful module progression through strategic reordering
This commit implements the pedagogically optimal "inevitable discovery" module progression based on expert validation and educational design principles.

## Module Reordering Summary

**Previous Order (Problems)**:
- 05_losses → 06_autograd → 07_dataloader → 08_optimizers → 09_spatial → 10_training
- Issues: Autograd before optimizers, DataLoader before training, scattered dependencies

**New Order (Beautiful Progression)**:
- 05_losses → 06_optimizers → 07_autograd → 08_training → 09_spatial → 10_dataloader
- Benefits: Each module creates an inevitable need for the next

## Pedagogical Flow Achieved

- **05_losses** → "Need systematic weight updates" → **06_optimizers**
- **06_optimizers** → "Need automatic gradients" → **07_autograd**
- **07_autograd** → "Need systematic training" → **08_training**
- **08_training** → "MLPs hit limits on images" → **09_spatial**
- **09_spatial** → "Training is too slow" → **10_dataloader**

## Technical Changes

### Module Directory Renaming
- `06_autograd` → `07_autograd`
- `07_dataloader` → `10_dataloader`
- `08_optimizers` → `06_optimizers`
- `10_training` → `08_training`
- `09_spatial` → `09_spatial` (no change)

### System Integration Updates
- **MODULE_TO_CHECKPOINT mapping**: Updated in `tito/commands/export.py`
- **Test directories**: Renamed `module_XX` directories to match the new numbers
- **Documentation**: Updated all references in MD files and agent configurations
- **CLI integration**: Updated next-steps suggestions to follow the new flow

### Agent Configuration Updates
- **Quality Assurance**: Updated module audit status with the new numbers
- **Module Developer**: Updated work tracking with the new sequence
- **Documentation**: Updated MASTER_PLAN_OF_RECORD.md with the new progression

## Educational Benefits
1. **Inevitable Discovery**: Each module naturally leads to the next
2. **Cognitive Load**: Concepts are introduced exactly when needed
3. **Motivation**: Students understand WHY each tool is necessary
4. **Synthesis**: Everything flows toward a complete understanding of ML systems
5. **Professional Alignment**: Matches real ML engineering workflows

## Quality Assurance
- ✅ All CLI commands still function
- ✅ Checkpoint system mappings updated
- ✅ Documentation consistency maintained
- ✅ Test directory structure aligned
- ✅ Agent configurations synchronized

**Impact**: This reordering turns TinyTorch from a collection of modules into a coherent educational journey in which each step naturally motivates the next, creating optimal conditions for understanding deep learning systems.
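The directory renames listed under Module Directory Renaming form a fixed old-to-new table. Below is a hypothetical sketch of applying such a table with `pathlib`. It assumes the module directories sit directly under a `modules/` root, as the `modules/16_caching/` path in this commit suggests. `RENAMES`, `apply_renames`, and `demo` are illustrative names, not the tooling used in this commit, and the sketch does not update `MODULE_TO_CHECKPOINT` or the test directories.

```python
import tempfile
from pathlib import Path

# Old -> new directory names from "Module Directory Renaming" (09_spatial is unchanged).
RENAMES = {
    "06_autograd": "07_autograd",
    "07_dataloader": "10_dataloader",
    "08_optimizers": "06_optimizers",
    "10_training": "08_training",
}


def apply_renames(root: Path, renames: dict[str, str], dry_run: bool = True) -> list[tuple[Path, Path]]:
    """Validate every rename first, then (unless dry_run) perform them all."""
    planned = []
    for old, new in renames.items():
        src, dst = root / old, root / new
        if not src.is_dir():
            raise FileNotFoundError(f"missing source directory: {src}")
        if dst.exists():
            raise FileExistsError(f"target already exists: {dst}")
        planned.append((src, dst))
    if not dry_run:
        for src, dst in planned:
            src.rename(dst)
    return planned


def demo() -> list[str]:
    """Build a throwaway modules/ tree, apply the renames, and list the result."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "modules"
        for name in [*RENAMES, "09_spatial"]:
            (root / name).mkdir(parents=True)
        apply_renames(root, RENAMES, dry_run=False)
        return sorted(p.name for p in root.iterdir())


if __name__ == "__main__":
    print(demo())
    # ['06_optimizers', '07_autograd', '08_training', '09_spatial', '10_dataloader']
```

The sketch checks every source and target before moving anything, so a missing or already-present directory is reported before any rename happens. With the default `dry_run=True`, it only returns the plan.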
63
modules/16_caching/README.md
Normal file
@@ -0,0 +1,63 @@
# Module 16: Caching - Memory Optimization for Transformers

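## Overview
Speed up autoregressive transformer inference by caching attention keys and values. Without a cache, every generation step reruns attention over the whole sequence and rebuilds an N × N score matrix (O(N²) time and memory per step). With a KV cache, each step computes only the new token's 1 × N row (O(N) per step), at the cost of storing the cached keys and values, which also grows as O(N). Learn how production systems use this trade to achieve 10-100x speedups in autoregressive generation on long sequences.
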
## What You'll Build
- **KV Cache System**: Store each token's attention keys and values once and reuse them at every later step
- **Incremental Attention**: Compute attention only for the new token, not the full sequence
- **Memory Manager**: Track and optimize cache usage
- **Production Patterns**: Learn how models such as GPT and LLaMA handle generation

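A minimal sketch of the cache data structure, assuming NumPy and a single attention head. The class name `KVCache` matches the pseudocode in Key Concepts, but the constructor arguments and the methods `append`, `get`, and `nbytes` are assumptions for illustration, not TinyTorch's final API. Capacity doubling is one way to handle variable sequence lengths.

```python
import numpy as np


class KVCache:
    """Minimal single-head KV cache (illustrative sketch, not TinyTorch's API).

    Keys and values live in preallocated buffers. When a buffer fills up, its
    capacity doubles, so sequences of any length fit and each append costs an
    amortized constant number of row copies.
    """

    def __init__(self, head_dim: int, initial_capacity: int = 16):
        capacity = max(1, initial_capacity)
        self.keys = np.empty((capacity, head_dim))
        self.values = np.empty((capacity, head_dim))
        self.length = 0  # number of tokens currently cached

    def append(self, k: np.ndarray, v: np.ndarray) -> None:
        """Store the key and value vectors of one new token."""
        if self.length == self.keys.shape[0]:
            self._grow()
        self.keys[self.length] = k
        self.values[self.length] = v
        self.length += 1

    def _grow(self) -> None:
        """Double capacity, copying over the entries cached so far."""
        new_capacity = 2 * self.keys.shape[0]
        for name in ("keys", "values"):
            old = getattr(self, name)
            new = np.empty((new_capacity, old.shape[1]), dtype=old.dtype)
            new[: self.length] = old[: self.length]
            setattr(self, name, new)

    def get(self) -> tuple[np.ndarray, np.ndarray]:
        """Views of the cached keys and values, each of shape (length, head_dim)."""
        return self.keys[: self.length], self.values[: self.length]

    def nbytes(self) -> int:
        """Bytes allocated for both buffers (capacity, not just cached length)."""
        return self.keys.nbytes + self.values.nbytes
```

`get` returns views to avoid copying. A view taken before a resize still points at the old buffer, so call `get` again after appending.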
## Learning Objectives
1. **Memory vs Computation Tradeoffs**: When to trade memory for speed
2. **Incremental Computation**: Reuse previous results efficiently
3. **Cache Management**: Handle variable sequence lengths
4. **Real-World Impact**: Measure the speedup caching delivers in text generation

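The memory side of the tradeoff has a closed form. This minimal sketch assumes standard multi-head attention, where heads × head_dim = d_model, so each layer caches one key vector and one value vector of width d_model per token. It also assumes no grouped-query sharing of keys and values. `kv_cache_bytes` is a hypothetical helper, and the configuration below is an illustrative GPT-2-small-sized example, not a TinyTorch default.

```python
def kv_cache_bytes(n_layers: int, d_model: int, seq_len: int,
                   batch_size: int = 1, bytes_per_elem: int = 4) -> int:
    """Memory held by a full KV cache.

    Every layer stores a key vector and a value vector (hence the factor 2)
    of width d_model for each cached token of each sequence in the batch.
    """
    return 2 * n_layers * d_model * seq_len * batch_size * bytes_per_elem


# Hypothetical GPT-2-small-sized config: 12 layers, d_model = 768, float32.
size = kv_cache_bytes(n_layers=12, d_model=768, seq_len=1024)
print(f"{size / 2**20:.0f} MiB")  # 72 MiB
```

The cost is 72 KiB per cached token in this configuration and grows linearly with sequence length and batch size. That linear growth is the O(N) memory paid to avoid recomputation.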
## Prerequisites
- Module 14: Transformers (understand the attention mechanism)
- Module 15: Acceleration (backend dispatch system)

## Key Concepts

### The Problem: Redundant Computation
```python
# Without caching: recompute attention over the whole prefix at every step
for step in range(1000):
    # Attend over ALL tokens so far, including ones already processed
    output = attention(tokens[: step + 1])  # O(N²) per step
```

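Here is a runnable NumPy version of the uncached loop that makes the redundancy concrete. It simplifies in three ways: a single head, no learned projections (each token's vector serves directly as its query, key, and value), and a hypothetical `causal_attention` helper standing in for the module's `attention`.

```python
import numpy as np


def causal_attention(q, k, v):
    """Full causal self-attention over a whole sequence (single head)."""
    t, d = q.shape
    scores = q @ k.T / np.sqrt(d)                         # (t, t) score matrix
    future = np.triu(np.ones((t, t), dtype=bool), k=1)    # True where key comes after query
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
n_steps, d = 64, 16
x = rng.standard_normal((n_steps, d))  # stand-in for each token's query/key/value

score_entries = 0
for step in range(n_steps):
    prefix = x[: step + 1]
    out = causal_attention(prefix, prefix, prefix)  # recomputes every row, every step
    score_entries += (step + 1) ** 2                # size of this step's score matrix
    latest = out[-1]                                # the only row this step actually needs

print(score_entries)  # 89440
```

Each step rebuilds a (t × t) score matrix even though only its last row is new. The other rows repeat work done on the previous step.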
### The Solution: KV Caching
```python
# With caching: compute attention only for the newest token
cache = KVCache()
for step in range(1000):
    new_token = tokens[step]
    # Attend from the new token to the cached prefix (and itself)
    output = attention(new_token, cache=cache)  # O(N) per step
    # Save the new token's keys/values for later steps
    cache.update(new_token)
```

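A matching sketch of the cached loop uses the same assumptions: a single head, no projections, and a hypothetical name, `cached_attention_step`. Because the maximum length is known here, the cache is just two preallocated arrays. The script checks that token-by-token attention against the cache reproduces full causal attention.

```python
import numpy as np


def causal_attention(q, k, v):
    """Reference: full causal self-attention over the whole sequence."""
    t, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(np.triu(np.ones((t, t), dtype=bool), k=1), -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    return (weights / weights.sum(axis=1, keepdims=True)) @ v


def cached_attention_step(q_new, k_cache, v_cache):
    """One new query against every cached key/value: O(t) scores instead of O(t²)."""
    scores = k_cache @ q_new / np.sqrt(q_new.shape[0])  # shape (t,)
    weights = np.exp(scores - scores.max())
    return (weights / weights.sum()) @ v_cache


rng = np.random.default_rng(0)
n_steps, d = 64, 16
x = rng.standard_normal((n_steps, d))  # stand-in for each token's query/key/value

k_cache = np.empty((n_steps, d))  # preallocated: the maximum length is known here
v_cache = np.empty((n_steps, d))
outputs = np.empty((n_steps, d))
for step in range(n_steps):
    k_cache[step] = x[step]  # add the new token's key/value first,
    v_cache[step] = x[step]  # so it can attend to itself
    outputs[step] = cached_attention_step(x[step], k_cache[: step + 1], v_cache[: step + 1])

full = causal_attention(x, x, x)
print(np.allclose(outputs, full))  # True
```

The two results agree up to floating-point rounding. Caching removes repeated work without approximating anything.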
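## Performance Impact
- **Before**: Generating 1,000 tokens processes 1 + 2 + ... + 1,000 = 500,500 token positions, because every step reruns the whole prefix
- **After**: Generating 1,000 tokens processes 1,000 token positions, one new token per step
- **Speedup**: About 500x fewer token positions processed. Measured wall-clock speedups are smaller, because each cached step still attends over the entire prefix
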
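The counts above are simple sums, and a few lines of Python reproduce them. As an added illustration beyond the list, the sketch also counts query-key dot products (attention scores). It assumes the naive loop builds the full t × t score matrix at every step.

```python
N = 1000  # tokens generated

# Token positions pushed through the model over the whole run
uncached_positions = sum(range(1, N + 1))  # step t reprocesses all t tokens
cached_positions = N                       # one new token per step

# Query-key dot products: a full t x t score matrix per step vs. one row of length t
uncached_scores = sum(t * t for t in range(1, N + 1))
cached_scores = sum(range(1, N + 1))

print(uncached_positions, cached_positions, uncached_positions / cached_positions)
# 500500 1000 500.5
print(uncached_scores, cached_scores, uncached_scores // cached_scores)
# 333833500 500500 667
```

The ratio depends on what is counted: about 500x for token positions and about 667x for attention scores. Both ratios grow linearly with N. Real timings also include work that neither count captures.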
## Real-World Applications
- **ChatGPT**: How it generates responses in real-time
- **GitHub Copilot**: Instant code suggestions
- **LLaMA**: Efficient on-device inference

## Module Structure
1. **Understanding the Problem**: Profile transformer generation bottlenecks
2. **Building the KV Cache**: Implement the cache data structure
3. **Incremental Attention**: Modify attention for single-token updates
4. **Integration**: Transparently accelerate the existing transformer
5. **Analysis**: Measure memory usage and speedup

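For the integration step, one way to keep existing code working unchanged is an optional cache argument whose default preserves the old behavior. The sketch below is hypothetical: single head, NumPy, and a plain dict as the cache. `SelfAttention.forward` is illustrative, not TinyTorch's API. Calls without `cache` run full causal attention as before, while generation code passes a dict and feeds one token per call.

```python
import numpy as np


class SelfAttention:
    """Single-head causal self-attention with an optional KV cache (illustrative).

    Calling forward(x) without a cache behaves like a plain attention layer, so
    existing code keeps working; generation code opts in by passing a dict.
    """

    def __init__(self, d_model: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        self.wq = rng.standard_normal((d_model, d_model)) * scale
        self.wk = rng.standard_normal((d_model, d_model)) * scale
        self.wv = rng.standard_normal((d_model, d_model)) * scale

    def forward(self, x, cache=None):
        """x has shape (t_new, d_model); with a cache, pass only the new tokens."""
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        n_past = 0
        if cache is not None:
            if cache:  # keys/values saved by earlier calls
                n_past = cache["k"].shape[0]
                k = np.concatenate([cache["k"], k])
                v = np.concatenate([cache["v"], v])
            cache["k"], cache["v"] = k, v
        scores = q @ k.T / np.sqrt(q.shape[1])
        # Query i sits at absolute position n_past + i and must not see later keys.
        query_pos = np.arange(q.shape[0])[:, None] + n_past
        key_pos = np.arange(k.shape[0])[None, :]
        scores = np.where(key_pos > query_pos, -np.inf, scores)
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        return weights @ v


layer = SelfAttention(d_model=8)
x = np.random.default_rng(1).standard_normal((10, 8))
full = layer.forward(x)  # existing call path: no cache, full causal attention
cache = {}
step_outputs = [layer.forward(x[t : t + 1], cache=cache) for t in range(10)]
print(np.allclose(np.concatenate(step_outputs), full))  # True
```

Growing the cache with `np.concatenate` copies all cached keys and values on every call. A preallocated buffer that grows by doubling avoids those copies.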
## Success Criteria
- ✅ Transformer generates 1000 tokens with O(N) memory
- ✅ 10x+ speedup on autoregressive generation
- ✅ Existing transformer code works unchanged
- ✅ Understand production caching strategies

28
modules/16_caching/module.yaml
Normal file
@@ -0,0 +1,28 @@
name: Caching
number: 16
type: optimization
difficulty: advanced
estimated_hours: 8-12

description: |
  Memory optimization through caching, focusing on KV caching for transformer inference.
  Students learn how to reuse computations across time steps in autoregressive generation.

learning_objectives:
  - Understand memory vs computation tradeoffs
  - Implement KV caching for transformer inference
  - Learn incremental computation patterns
  - Optimize autoregressive generation speed

prerequisites:
  - "Module 14: Transformers"
  - "Module 15: Acceleration"

skills_developed:
  - Memory optimization techniques
  - Incremental computation strategies
  - Transformer inference optimization
  - Cache management patterns

exports:
  - tinytorch.optimizations.caching