TinyTorch

mirror of https://github.com/MLSysBook/TinyTorch.git synced 2026-03-26 14:22:20 -05:00


Vijay Janapa Reddi f2f46ac021 Fix enable_kv_cache to handle mask parameter and add integration test

Module 14 fix:
- Updated cached_forward() to accept a mask parameter: cached_forward(x, mask=None)
- Attention calls forward with two arguments: forward(x, mask)
- cached_forward() now passes both arguments through to the original forward
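
The wrapping pattern described in this fix can be sketched roughly as follows. This is a minimal illustration, not the Module 14 code: `ToyAttention` and the `cache_calls` counter are hypothetical stand-ins, and the real `enable_kv_cache` may take different arguments and manage actual key/value state.

```python
class ToyAttention:
    """Hypothetical stand-in for an attention layer (not TinyTorch's class)."""

    def forward(self, x, mask=None):
        # Placeholder computation: echo back what was received.
        return ("attended", x, mask)


def enable_kv_cache(layer):
    """Wrap layer.forward on the instance, leaving the class untouched."""
    original_forward = layer.forward  # bound method, captured before patching
    layer.cache_calls = 0             # assumed placeholder for real cache state

    def cached_forward(x, mask=None):
        # Real cache bookkeeping would go here; this sketch only counts calls.
        layer.cache_calls += 1
        # Pass both arguments through to the original forward.
        return original_forward(x, mask)

    layer.forward = cached_forward    # instance attribute shadows the class method
    return layer


layer = enable_kv_cache(ToyAttention())
out_with_mask = layer.forward([1, 2, 3], "causal")
out_without_mask = layer.forward([1, 2, 3])
```

Because `mask` defaults to `None`, callers that pass one argument and callers that pass two both reach the original forward unchanged.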

Integration test (test_kv_cache_milestone.py):
- Tests generation WITHOUT cache (baseline)
- Tests generation WITH cache enabled
- Verifies cache infrastructure works without breaking model
- Documents current implementation (architecture demo)
- Shows that full speedup requires deeper attention integration
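
A baseline-versus-cached comparison of this kind could look roughly like the sketch below. It is an assumption-laden illustration, not the contents of test_kv_cache_milestone.py: `toy_generate` and `tokens_per_second` are hypothetical helpers, and the `use_cache` flag stands in for whatever mechanism the real test uses to enable caching.

```python
import time


def toy_generate(n_tokens, use_cache=False):
    """Hypothetical generator; use_cache is a pass-through flag in this sketch."""
    # A pass-through cache must not change the generated tokens.
    return [(i * 31) % 7 for i in range(n_tokens)]


def tokens_per_second(generate, n_tokens, **kwargs):
    """Time one generation run and return (tokens, throughput)."""
    start = time.perf_counter()
    tokens = generate(n_tokens, **kwargs)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very fast runs.
    rate = len(tokens) / elapsed if elapsed > 0 else float("inf")
    return tokens, rate


baseline_tokens, baseline_rate = tokens_per_second(toy_generate, 50)
cached_tokens, cached_rate = tokens_per_second(toy_generate, 50, use_cache=True)

# Correctness check: enabling the cache must not break the model's output.
outputs_match = baseline_tokens == cached_tokens
```

Comparing outputs before comparing rates keeps the check meaningful even when, as with a pass-through wrapper, the two throughput numbers are expected to be close.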

Test results:
✅ Without cache: 139.3 tok/s
✅ With cache: 142.5 tok/s (similar to baseline, as expected for a pass-through wrapper)
✅ Cache infrastructure successfully integrated
✅ Model continues to work with caching enabled

Educational value:
Students learn the PATTERN of non-invasive optimization through
composition and monkey-patching, which is more important than
absolute speedup numbers for this module.

2025-11-05 19:13:41 -05:00
