[PR #1155] [MERGED] fix(attention): correct complexity explanation and memory table bug #6361

opened 2026-04-21 22:16:17 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/harvard-edge/cs249r_book/pull/1155
Author: @profvjreddi
Created: 2/4/2026
Status: Merged
Merged: 2/4/2026
Merged by: @profvjreddi

Base: dev ← Head: feature/tinytorch-core


📝 Commits (1)

  • 20a4ba2 fix(attention): correct O(n²) complexity explanation and memory table bug

📊 Changes

1 file changed (+10 additions, -7 deletions)


📝 tinytorch/src/12_attention/12_attention.py (+10 -7)

📄 Description

Summary

  • Clarifies that attention time complexity is O(n² × d), not O(n²), since each of the n² query-key pairs requires a d-dimensional dot product (see the sketch after this list)
  • Fixes the Total Memory column in analyze_attention_memory_overhead(), which duplicated the Optimizer column instead of summing Forward + Gradients + Optimizer
  • Updates the KEY INSIGHT multiplier from 4× to ~7× to match the corrected total
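
To see where the d factor comes from, here is a minimal NumPy sketch; the shapes and names are illustrative, not the tinytorch implementation:

```python
import numpy as np

# Illustrative sizes: n = sequence length, d = head dimension.
n, d = 512, 64
Q = np.random.randn(n, d)  # queries
K = np.random.randn(n, d)  # keys

# scores[i, j] is a d-dimensional dot product between query i and key j,
# so filling the n x n score matrix costs n * n * d multiply-adds:
# O(n^2 * d), not O(n^2).
scores = Q @ K.T            # shape (n, n)
flops = n * n * d           # 512 * 512 * 64 = ~16.8M multiply-adds
```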

Fixes #1150

Test plan

  • Verified the corrected memory math produces the ~7× multiplier across all tested sequence lengths
  • Confirmed the Total Memory column now correctly sums all three components (see the sketch after this list)
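
A hedged sketch of the behavior under test; the helper name and the component ratios below are assumptions for illustration only (the PR names only analyze_attention_memory_overhead() and the ~7× total):

```python
def total_memory_mb(forward_mb: float, gradients_mb: float, optimizer_mb: float) -> float:
    """Sum all three components for the Total Memory column.

    The pre-fix bug repeated optimizer_mb here instead of summing.
    """
    return forward_mb + gradients_mb + optimizer_mb

# Assumed breakdown relative to forward memory -- the PR does not spell out
# the split that yields ~7x; this is one plausible, illustrative example.
forward, grads, optimizer = 1.0, 1.0, 5.0
print(total_memory_mb(forward, grads, optimizer) / forward)  # -> 7.0
```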

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-21 22:16:17 -05:00

Reference: github-starred/cs249r_book#6361