[PR #1155] [MERGED] fix(attention): correct complexity explanation and memory table bug #6361

opened 2026-04-21 22:16:17 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/harvard-edge/cs249r_book/pull/1155
Author: @profvjreddi
Created: 2/4/2026
Status: Merged
Merged: 2/4/2026
Merged by: @profvjreddi

Base: dev ← Head: feature/tinytorch-core


📝 Commits (1)

  • 20a4ba2 fix(attention): correct O(n²) complexity explanation and memory table bug

📊 Changes

1 file changed (+10 additions, -7 deletions)


📝 tinytorch/src/12_attention/12_attention.py (+10 -7)

📄 Description

Summary

  • Clarifies that attention time complexity is O(n² × d), not O(n²), since each of the n² query-key pairs requires a d-dimensional dot product (see the sketch after this list)
  • Fixes the Total Memory column in analyze_attention_memory_overhead(), which duplicated the Optimizer column instead of summing Forward + Gradients + Optimizer
  • Updates the KEY INSIGHT multiplier from 4× to ~7× to match the corrected total
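
To see where the d factor comes from, here is a minimal NumPy sketch; the shapes and names are illustrative, not the tinytorch implementation:

```python
import numpy as np

# Illustrative sizes: n = sequence length, d = head dimension.
n, d = 512, 64
Q = np.random.randn(n, d)  # queries
K = np.random.randn(n, d)  # keys

# scores[i, j] is a d-dimensional dot product between query i and key j,
# so filling the n x n score matrix costs n * n * d multiply-adds:
# O(n^2 * d), not O(n^2).
scores = Q @ K.T            # shape (n, n)
flops = n * n * d           # 512 * 512 * 64 = ~16.8M multiply-adds
```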

Fixes #1150

Test plan

  • Verified the corrected memory math produces the ~7× multiplier across all tested sequence lengths
  • Confirmed the Total Memory column now correctly sums all three components (see the sketch after this list)
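
A hedged sketch of the behavior under test; the helper name and the component ratios below are assumptions for illustration only (the PR names only analyze_attention_memory_overhead() and the ~7× total):

```python
def total_memory_mb(forward_mb: float, gradients_mb: float, optimizer_mb: float) -> float:
    """Sum all three components for the Total Memory column.

    The pre-fix bug repeated optimizer_mb here instead of summing.
    """
    return forward_mb + gradients_mb + optimizer_mb

# Assumed breakdown relative to forward memory -- the PR does not spell out
# the split that yields ~7x; this is one plausible, illustrative example.
forward, grads, optimizer = 1.0, 1.0, 5.0
print(total_memory_mb(forward, grads, optimizer) / forward)  # -> 7.0
```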

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-21 22:16:17 -05:00

Reference: github-starred/cs249r_book#6361