TinyTorch/modules/source
Vijay Janapa Reddi 0ba1a210a8 Implement non-invasive KV cache integration (enable_kv_cache)
Module 14 now provides enable_kv_cache(model), following the same pattern
as enable_autograd() from Module 05. Key innovation: students ADD
capabilities in new modules WITHOUT modifying old ones!

Implementation:
- enable_kv_cache(model): Patches model attention layers with caching
- disable_kv_cache(model): Restores original attention behavior
- Non-invasive: Modules 12-13 unchanged, Module 14 enhances them
- Educational: Teaches composition over modification

Architecture Pattern:
1. Module 14 wraps each TransformerBlock attention layer
2. Stores original forward methods before patching
3. Creates cache infrastructure for model architecture
4. Can enable/disable without breaking model
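
Illustrative sketch of the pattern above (not the Module 14 code itself).
ToyAttention, ToyBlock, ToyModel, the `blocks` and `attention` attribute
names, and `_original_forward` are hypothetical stand-ins; the real
TinyTorch classes may differ. The patched forward only records outputs to
mark where caching would hook in; it does not implement real key/value reuse.

```python
class ToyAttention:
    """Hypothetical stand-in for a Module 12 attention layer."""
    def forward(self, x):
        return [2 * v for v in x]


class ToyBlock:
    """Hypothetical stand-in for a Module 13 TransformerBlock."""
    def __init__(self):
        self.attention = ToyAttention()


class ToyModel:
    def __init__(self, num_layers=2):
        self.blocks = [ToyBlock() for _ in range(num_layers)]
        self._kv_cache = None


def enable_kv_cache(model):
    """Patch each block's attention.forward, keeping the original for restore."""
    if model._kv_cache is not None:
        return model._kv_cache  # already enabled; do not patch twice

    # Step 3: one cache slot per layer, sized from the model architecture.
    cache = [[] for _ in model.blocks]
    for slot, block in zip(cache, model.blocks):
        attn = block.attention
        # Step 2: store the original (bound) forward before patching.
        attn._original_forward = attn.forward

        # Step 1: wrap the layer; defaults bind this layer's attn and slot.
        def cached_forward(x, _attn=attn, _slot=slot):
            out = _attn._original_forward(x)
            _slot.append(out)  # placeholder for storing keys/values
            return out

        attn.forward = cached_forward
    model._kv_cache = cache
    return cache


def disable_kv_cache(model):
    """Step 4: restore the original forward methods and drop the cache."""
    for block in model.blocks:
        attn = block.attention
        if hasattr(attn, "_original_forward"):
            attn.forward = attn._original_forward
            del attn._original_forward
    model._kv_cache = None
```

Because the patch lives on the layer instances, the ToyAttention class
itself is never edited, which mirrors the "Modules 12-13 unchanged" goal.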

Systems Engineering Lesson:
Forward-only learning: New modules ADD features, never BREAK old ones
- Module 12 (Attention): Core implementation
- Module 13 (Transformers): Uses Module 12
- Module 14 (KV Caching): ENHANCES Module 12 without changing it

Milestone Integration:
- TinyGPT.generate() now uses enable_kv_cache() when use_cache=True
- Cache automatically created for model architecture
- Clean fallback if Module 14 not available
- Educational notes explain concept vs production implementation
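
One way the use_cache fallback could look, as a rough sketch only. The
module name "tinytorch_kv_cache_hypothetical", the load_kv_cache_hook
helper, and this standalone generate() signature are assumptions for
illustration; the actual TinyGPT.generate() and import path are not shown
in this commit message. The model here is any callable that returns the
next token.

```python
import importlib


def load_kv_cache_hook(module_name="tinytorch_kv_cache_hypothetical"):
    """Return enable_kv_cache if the (hypothetical) module imports, else None."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return None  # Module 14 not available: caller falls back cleanly
    return getattr(mod, "enable_kv_cache", None)


def generate(model, prompt, max_new_tokens=3, use_cache=True,
             hook_loader=load_kv_cache_hook):
    """Greedy generation loop that opts into caching only when it is available."""
    enable = hook_loader() if use_cache else None
    cached = False
    if enable is not None:
        enable(model)  # cache is created for this model's architecture
        cached = True

    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))  # uncached path recomputes over all tokens
    return tokens, cached
```

Returning the `cached` flag is a choice made for this sketch so callers
(and tests) can see which path ran.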

Module now: 1005 lines (805 + 200 integration code)
Tests: All pass (12/12 including new integration tests)
2025-11-05 18:19:52 -05:00