mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-04-29 17:20:21 -05:00
- Clarify that attention time complexity is O(n²×d), not O(n²), since each of the n² query-key pairs requires a d-dimensional dot product
- Fix the Total Memory column in analyze_attention_memory_overhead(), which duplicated the Optimizer column instead of summing all components
- Update the KEY INSIGHT multiplier from 4x to 7x to match the corrected total

Fixes harvard-edge/cs249r_book#1150
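The complexity correction can be illustrated with a minimal sketch. This is not the book's analyze_attention_memory_overhead(); the function name and scaling checks below are illustrative only:

```python
def attention_score_flops(n: int, d: int) -> int:
    """Approximate multiply-adds for the attention score matrix.

    Each of the n*n query-key pairs requires a d-dimensional dot
    product, so the cost is O(n^2 * d), not O(n^2).
    """
    return n * n * d

# Doubling sequence length quadruples the cost (the n^2 term) ...
assert attention_score_flops(2048, 64) == 4 * attention_score_flops(1024, 64)
# ... while doubling the head dimension only doubles it (the d term).
assert attention_score_flops(1024, 128) == 2 * attention_score_flops(1024, 64)
```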