TinyTorch

mirror of https://github.com/MLSysBook/TinyTorch.git synced 2026-04-28 16:12:32 -05:00

Files

Vijay Janapa Reddi 80734693e8 Add comprehensive documentation for KV cache path selection

Enhanced Module 14 with extensive educational documentation explaining:

Three-Path Selection Strategy:
- PATH 1: Training (seq_len > 1) - Uses original attention, preserves gradients
- PATH 2: First Token (cache empty) - Uses original attention, initializes cache
- PATH 3: Cached Generation (cache populated) - THE SPEEDUP PATH, O(n) computation

Why .data Instead of Tensor Operations:
- Explicit intent: Clear separation of training vs inference code
- Performance: Avoids autograd overhead during generation
- Industry standard: Production LLMs (vLLM, llama.cpp) use same pattern

O(n²) to O(n) Transformation Explained:
- WITHOUT cache: O(N³) total across all steps (1² + 2² + ... + N²)
- WITH cache: O(N²) total across all steps (1 + 2 + ... + N)
- Result: 5-7x speedup on short sequences, 10-15x on longer ones

Inline comments added at every decision point for student comprehension.
Module 14 now complete with working implementation and comprehensive pedagogy.

2025-11-06 12:30:39 -05:00

kvcaching_dev.ipynb

Add comprehensive documentation for KV cache path selection

2025-11-06 12:30:39 -05:00

kvcaching_dev.py

Add comprehensive documentation for KV cache path selection

2025-11-06 12:30:39 -05:00