mirror of https://github.com/MLSysBook/TinyTorch.git synced 2026-05-04 08:12:32 -05:00

Vijay Janapa Reddi 2f23f757e7 MAJOR: Implement beautiful module progression through strategic reordering

This commit implements the pedagogically optimal "inevitable discovery" module progression based on expert validation and educational design principles.

## Module Reordering Summary

**Previous Order (Problems)**:
- 05_losses → 06_autograd → 07_dataloader → 08_optimizers → 09_spatial → 10_training
- Issues: Autograd before optimizers, DataLoader before training, scattered dependencies

**New Order (Beautiful Progression)**:
- 05_losses → 06_optimizers → 07_autograd → 08_training → 09_spatial → 10_dataloader
- Benefits: Each module creates inevitable need for the next

## Pedagogical Flow Achieved

**05_losses** → "Need systematic weight updates" → **06_optimizers**
**06_optimizers** → "Need automatic gradients" → **07_autograd**
**07_autograd** → "Need systematic training" → **08_training**
**08_training** → "MLPs hit limits on images" → **09_spatial**
**09_spatial** → "Training is too slow" → **10_dataloader**

## Technical Changes

### Module Directory Renaming
- `06_autograd` → `07_autograd`
- `07_dataloader` → `10_dataloader`
- `08_optimizers` → `06_optimizers`
- `10_training` → `08_training`
- `09_spatial` → `09_spatial` (no change)

### System Integration Updates
- **MODULE_TO_CHECKPOINT mapping**: Updated in tito/commands/export.py
- **Test directories**: Renamed module_XX directories to match new numbers
- **Documentation**: Updated all references in MD files and agent configurations
- **CLI integration**: Updated next-steps suggestions for proper flow
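
Because the test directories are named only by number, renaming them one at a time can collide: `module_08` cannot become `module_06` while the old `module_06` still exists. A minimal sketch of a two-phase rename follows. It assumes the directories are literally named `module_06`, `module_07`, and so on under a single root; the helper `renumber_dirs` and the `.renaming` suffix are hypothetical and not part of the repository.

```python
import tempfile
from pathlib import Path

# Old number -> new number, taken from the renaming list in this commit message.
RENUMBER = {6: 7, 7: 10, 8: 6, 10: 8}

def renumber_dirs(root: Path, mapping: dict[int, int]) -> None:
    """Rename module_XX directories in two phases so no rename hits an existing name."""
    staged = []
    # Phase 1: park every source directory under a temporary name.
    for old, new in mapping.items():
        src = root / f"module_{old:02d}"
        if src.is_dir():
            tmp = root / f"module_{old:02d}.renaming"
            src.rename(tmp)
            staged.append((tmp, root / f"module_{new:02d}"))
    # Phase 2: all old numbers are now free, so move each directory to its final name.
    for tmp, dst in staged:
        tmp.rename(dst)

# Demonstration in a throwaway directory; marker files record the original content.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    layout = [(6, "autograd"), (7, "dataloader"), (8, "optimizers"),
              (9, "spatial"), (10, "training")]
    for number, label in layout:
        (root / f"module_{number:02d}").mkdir()
        (root / f"module_{number:02d}" / "marker.txt").write_text(label)
    renumber_dirs(root, RENUMBER)
    result = {p.name: (p / "marker.txt").read_text() for p in sorted(root.iterdir())}
```

In a git working tree the same two phases would normally be done with `git mv` so that history follows the directories.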

### Agent Configuration Updates
- **Quality Assurance**: Updated module audit status with new numbers
- **Module Developer**: Updated work tracking with new sequence
- **Documentation**: Updated MASTER_PLAN_OF_RECORD.md with beautiful progression

## Educational Benefits

1. **Inevitable Discovery**: Each module naturally leads to the next
2. **Cognitive Load**: Concepts introduced exactly when needed
3. **Motivation**: Students understand WHY each tool is necessary
4. **Synthesis**: Everything flows toward complete ML systems understanding
5. **Professional Alignment**: Matches real ML engineering workflows

## Quality Assurance

- ✅ All CLI commands still function
- ✅ Checkpoint system mappings updated
- ✅ Documentation consistency maintained
- ✅ Test directory structure aligned
- ✅ Agent configurations synchronized

**Impact**: This reordering transforms TinyTorch from a collection of modules into a coherent educational journey where each step naturally motivates the next, creating optimal conditions for deep learning systems understanding.

2025-09-24 15:56:47 -04:00


README.md

Module 16: Caching - Memory Optimization for Transformers

Overview

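Cut the per-token cost of autoregressive transformer inference from O(N²) to O(N) by caching attention keys and values, at the price of O(N) extra memory for the cache. Learn how production systems use this technique to achieve 10-100x speedups in autoregressive generation.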

What You'll Build

KV Cache System: Store and reuse attention computations across time steps
Incremental Attention: Compute only new tokens, not full sequence
Memory Manager: Track and optimize cache usage
Production Patterns: Learn how GPT and LLaMA handle generation

Learning Objectives

Memory vs Computation Tradeoffs: When to trade memory for speed
Incremental Computation: Reuse previous results efficiently
Cache Management: Handle variable sequence lengths
Real-World Impact: See 50x speedup in text generation
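
To make the memory side of the tradeoff concrete, the sketch below estimates how many bytes a KV cache occupies. It uses the standard count of one key and one value vector per layer, head, and position. The model dimensions (12 layers, 12 heads, head size 64, float32) are illustrative assumptions, not TinyTorch defaults, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(num_layers, num_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=4):
    """Bytes needed to hold keys and values for every layer and position."""
    per_position = num_heads * head_dim * bytes_per_value
    # Factor 2: one buffer for keys and one for values.
    return 2 * num_layers * batch_size * seq_len * per_position

# Assumed, illustrative configuration (float32 values).
small = kv_cache_bytes(num_layers=12, num_heads=12, head_dim=64, seq_len=1000)
doubled = kv_cache_bytes(num_layers=12, num_heads=12, head_dim=64, seq_len=2000)

print(f"{small / 2**20:.1f} MiB at 1000 tokens")
print(f"{doubled / 2**20:.1f} MiB at 2000 tokens")
```

Doubling the sequence length doubles the cache, which is the O(N) memory growth traded for the saved computation.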

Prerequisites

Module 14: Transformers (understand attention mechanism)
Module 15: Acceleration (backend dispatch system)

Key Concepts

The Problem: Redundant Computation

# Without caching - recompute attention over the whole prefix at every step
for step in range(1000):
    # Recompute attention for ALL tokens generated so far
    output = attention(tokens[:step + 1])  # O(N²) per token!

The Solution: KV Caching

# With caching - compute only the new token
cache = KVCache()
for step in range(1000):
    new_token = tokens[step]
    cache.update(new_token)  # store the new token's key and value first
    output = attention(new_token, cache=cache)  # O(N) per token: one query against N cached keys
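
The two loops above are pseudocode. As a runnable illustration, here is a minimal single-head sketch in NumPy that checks the cached path gives the same output as full recomputation. Everything in it is an assumption made for illustration: the weight matrices are random, the `KVCache` here takes key and value vectors directly (unlike the one-argument `update` above), and none of it is TinyTorch's actual API.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attention_full(x, Wq, Wk, Wv):
    """Causal attention over the whole prefix: builds an (n, n) score matrix."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)   # hide future positions
    return softmax(scores) @ v

class KVCache:
    """Hypothetical list-backed cache of per-token key and value vectors."""
    def __init__(self):
        self.keys, self.values = [], []

    def update(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def attention_cached(x_new, cache, Wq, Wk, Wv):
    """Project only the new token, then attend over the cached keys and values."""
    q = x_new @ Wq
    cache.update(x_new @ Wk, x_new @ Wv)
    k, v = np.stack(cache.keys), np.stack(cache.values)
    scores = k @ q / np.sqrt(q.shape[-1])        # n scores instead of n * n
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
tokens = rng.standard_normal((16, d))

cache = KVCache()
max_diff = 0.0
for step in range(len(tokens)):
    full = attention_full(tokens[: step + 1], Wq, Wk, Wv)[-1]
    cached = attention_cached(tokens[step], cache, Wq, Wk, Wv)
    max_diff = max(max_diff, float(np.abs(full - cached).max()))

print(f"largest difference between the two paths: {max_diff:.2e}")
```

The list-backed cache re-stacks its contents on every step, which keeps the sketch short but copies O(N) data each time; a preallocated buffer avoids that copy.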

Performance Impact

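Before: 1000-token generation = 500,500 token positions processed (1 + 2 + ... + 1000, since step t recomputes all t tokens)
After: 1000-token generation = 1,000 token positions processed (one new token per step)
Reduction: about 500x fewer token-level computations. Measured wall-clock speedup is usually smaller, because each cached step still attends over all previous keys.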
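
The 500,500 and 1,000 figures above count how many token positions are pushed through attention during generation. The short sketch below reproduces that arithmetic and also counts query-key scores, assuming the uncached path forms the full t × t score matrix at step t. The function names are hypothetical.

```python
def tokens_processed(n_tokens, use_cache):
    """Token positions projected and attended during generation."""
    if use_cache:
        return n_tokens                        # one new token per step
    return sum(range(1, n_tokens + 1))         # step t reprocesses t tokens

def scores_computed(n_tokens, use_cache):
    """Query-key dot products computed during generation."""
    if use_cache:
        return sum(range(1, n_tokens + 1))     # one query against t keys
    return sum(t * t for t in range(1, n_tokens + 1))  # full t x t matrix

n = 1000
uncached_tokens = tokens_processed(n, use_cache=False)
cached_tokens = tokens_processed(n, use_cache=True)
print(uncached_tokens, cached_tokens, uncached_tokens / cached_tokens)
print(scores_computed(n, use_cache=False), scores_computed(n, use_cache=True))
```

Counted by query-key scores, the cached path still does 500,500 dot products over 1,000 steps, which is why each cached step is O(N) rather than O(1).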

Real-World Applications

ChatGPT: How it generates responses in real-time
GitHub Copilot: Instant code suggestions
LLaMA: Efficient on-device inference

Module Structure

Understanding the Problem: Profile transformer generation bottlenecks
Building KV Cache: Implement cache data structure
Incremental Attention: Modify attention for single-token updates
Integration: Transparently accelerate existing transformer
Analysis: Measure memory usage and speedup
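
For the cache-building and analysis steps, one common design is a preallocated buffer with a length counter, which handles variable sequence lengths without reallocating and makes memory usage easy to report. The class below is a sketch under that assumption. Its name, methods, and the shape layout `(max_len, num_heads, head_dim)` are hypothetical and may differ from what the module asks you to build.

```python
import numpy as np

class PreallocatedKVCache:
    """Fixed-capacity cache; `length` tracks how many positions are valid."""

    def __init__(self, max_len, num_heads, head_dim, dtype=np.float32):
        shape = (max_len, num_heads, head_dim)
        self.keys = np.zeros(shape, dtype=dtype)
        self.values = np.zeros(shape, dtype=dtype)
        self.length = 0

    def append(self, k, v):
        """Write one position's keys and values into the next free slot."""
        if self.length >= self.keys.shape[0]:
            raise ValueError("cache is full")
        self.keys[self.length] = k
        self.values[self.length] = v
        self.length += 1

    def view(self):
        """Return views (no copy) of the filled part of both buffers."""
        return self.keys[: self.length], self.values[: self.length]

    def reset(self):
        """Start a new sequence without freeing the buffers."""
        self.length = 0

    @property
    def bytes_allocated(self):
        return self.keys.nbytes + self.values.nbytes

    @property
    def bytes_used(self):
        per_position = self.keys[0].nbytes + self.values[0].nbytes
        return self.length * per_position

cache = PreallocatedKVCache(max_len=1000, num_heads=4, head_dim=16)
rng = np.random.default_rng(0)
for _ in range(3):
    cache.append(rng.standard_normal((4, 16)), rng.standard_normal((4, 16)))

k, v = cache.view()
print(k.shape, cache.bytes_used, cache.bytes_allocated)
```

Preallocating trades some unused memory for constant-time appends; the gap between `bytes_used` and `bytes_allocated` is the quantity to watch when sequences are much shorter than `max_len`.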

Success Criteria

✅ Transformer generates 1000 tokens with O(N) memory
✅ 10x+ speedup on autoregressive generation
✅ Existing transformer code works unchanged
✅ Understand production caching strategies