[PR #1785] fix(tinytorch): fix GPT causal mask convention in module 13 #15736

opened 2026-05-20 14:05:03 -05:00 by GiteaMirror · 0 comments
📋 Pull Request Information

Original PR: https://github.com/harvard-edge/cs249r_book/pull/1785
Author: @Shashank-Tripathi-07
Created: 5/18/2026
Status: 🔄 Open

Base: dev ← Head: fix/tinytorch-modules-10-20-audit


📝 Commits (2)

  • fbb3f3d fix(site): add nav-footer and dropdown-menu dark mode selectors
  • b9ce5b2 fix(tinytorch): fix GPT causal mask convention in module 13

📊 Changes

3 files changed (+41 additions, -6 deletions)

📝 shared/config/footer-site.yml (+1 -1)
📝 shared/styles/_site-dark.scss (+37 -3)
📝 tinytorch/src/13_transformers/13_transformers.py (+3 -2)

📄 Description

Summary

  • GPT._create_causal_mask in 13_transformers.py generated an additive mask (-inf above the diagonal, 0 at allowed positions), but _apply_mask in 12_attention.py expects a binary lower-triangular mask (1 = allow, 0 = block).
  • With the old mask, every allowed position (value 0) received an offset of (1-0)*MASK_VALUE = -1e9 from _apply_mask, so valid past and present positions were suppressed along with future ones. Only future positions ended up with the intended -inf, and only by coincidence, via (1-(-inf))*(-1e9) = -inf.
  • Fixed by switching to np.tril(np.ones((seq_len, seq_len))), which matches the binary convention _apply_mask requires.
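
The mismatch described in the summary can be reproduced in a few lines of NumPy. This is a minimal sketch, not the TinyTorch code: `apply_mask_sketch` is a hypothetical stand-in that assumes `_apply_mask` computes `scores + (1 - mask) * MASK_VALUE`, the form implied by the expressions quoted above, and `softmax` is a local helper written for this example. How strongly the old offsets distort the final softmax depends on the dtype and on details of `_apply_mask` not shown here, so the sketch only inspects the offsets and the attention weights produced by the corrected mask.

```python
import numpy as np

MASK_VALUE = -1e9  # value quoted in the PR description


def apply_mask_sketch(scores, mask):
    """Hypothetical stand-in for _apply_mask (assumed form, not the real code).

    Expects a binary mask: 1 = allow, 0 = block.
    """
    return scores + (1.0 - mask) * MASK_VALUE


def softmax(x):
    """Numerically stable softmax over the last axis (helper for this sketch)."""
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


seq_len = 4
# True strictly above the diagonal, i.e. at future positions.
future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)

# Old convention: additive mask, -inf at future positions, 0 elsewhere.
old_mask = np.where(future, -np.inf, 0.0)
# New convention: binary lower-triangular mask, 1 = allow, 0 = block.
new_mask = np.tril(np.ones((seq_len, seq_len)))

# Offset that the assumed _apply_mask form adds to the scores.
old_offsets = (1.0 - old_mask) * MASK_VALUE  # -1e9 at allowed, -inf at future
new_offsets = (1.0 - new_mask) * MASK_VALUE  # 0 at allowed, -1e9 at future

# Attention weights with the corrected mask on arbitrary scores.
scores = np.random.default_rng(0).normal(size=(seq_len, seq_len))
weights = softmax(apply_mask_sketch(scores, new_mask))

print(old_offsets)
print(new_offsets)
print(np.round(weights, 3))
```

With the binary mask, allowed positions keep their original scores and only future positions are pushed toward zero weight, which is the contract the PR restores.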

Test plan

  • Run test_unit_transformer_block in 13_transformers.py -- shape checks pass
  • Manually verify that GPT.generate() produces non-uniform attention over past tokens (with the incorrect mask, every token received near-zero weight)
  • Run test_unit_scaled_dot_product_attention in 12_attention.py to confirm the mask format contract is respected end-to-end

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-05-20 14:05:03 -05:00

Reference: github-starred/cs249r_book#15736