Mirror of https://github.com/harvard-edge/cs249r_book.git, synced 2026-05-21 13:31:55 -05:00
[PR #1785] fix(tinytorch): fix GPT causal mask convention in module 13 #15736
📋 Pull Request Information
Original PR: https://github.com/harvard-edge/cs249r_book/pull/1785
Author: @Shashank-Tripathi-07
Created: 5/18/2026
Status: 🔄 Open
Base: dev ← Head: fix/tinytorch-modules-10-20-audit
📝 Commits (2)
- fbb3f3d fix(site): add nav-footer and dropdown-menu dark mode selectors
- b9ce5b2 fix(tinytorch): fix GPT causal mask convention in module 13
📊 Changes
3 files changed (+41 additions, -6 deletions)
- shared/config/footer-site.yml (+1 -1)
- shared/styles/_site-dark.scss (+37 -3)
- tinytorch/src/13_transformers/13_transformers.py (+3 -2)
📄 Description
Summary
GPT._create_causal_mask in 13_transformers.py was generating an additive -inf mask (upper-triangular, with -inf for future positions and 0 for allowed positions), but _apply_mask in 12_attention.py expects a binary lower-triangular mask (1 = allow, 0 = block).

Because of this mismatch, allowed positions (value 0) received (1 - 0) * MASK_VALUE = -1e9 from _apply_mask, blocking all attention, including valid past/present positions. Only future positions coincidentally received a correct -inf, via (1 - (-inf)) * (-1e9).

The fix replaces the mask with np.tril(np.ones((seq_len, seq_len))), which matches the binary convention _apply_mask requires.

Test plan
- test_unit_transformer_block in 13_transformers.py: shape checks pass
- GPT.generate() produces non-uniform attention over past tokens (tokens no longer all receive near-zero weight because of the incorrect masking)
- Run test_unit_scaled_dot_product_attention in 12_attention.py to confirm the mask format contract is respected end-to-end
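A minimal NumPy sketch of the mismatch described in the summary, and of the kind of end-to-end check the test plan describes. It assumes that _apply_mask adds (1 - mask) * MASK_VALUE to the scores (consistent with the arithmetic quoted in the summary) and that scores are float32; apply_mask, softmax, old_causal_mask, and new_causal_mask are hypothetical stand-ins, not the tinytorch code. Under those assumptions the old mask still zeroes out future positions (they get -inf), but every allowed position receives the same -1e9 offset. Near 1e9, adjacent float32 values are 64 apart, so scores of ordinary magnitude are rounded away and softmax spreads attention uniformly over the allowed positions. The binary mask leaves allowed scores untouched.

```python
import numpy as np

# Assumption (from the PR description): _apply_mask adds (1 - mask) * MASK_VALUE
# to the scores, with MASK_VALUE = -1e9 and a binary mask (1 = allow, 0 = block).
MASK_VALUE = -1e9


def apply_mask(scores, mask):
    """Hypothetical stand-in for _apply_mask in 12_attention.py."""
    return scores + (1 - mask) * MASK_VALUE


def softmax(x):
    """Row-wise softmax with max subtraction for numerical stability."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def old_causal_mask(seq_len):
    """Buggy additive convention: 0 = allow, -inf above the diagonal (future)."""
    return np.triu(np.full((seq_len, seq_len), -np.inf, dtype=np.float32), k=1)


def new_causal_mask(seq_len):
    """Fixed binary convention: lower-triangular ones (1 = allow, 0 = block)."""
    return np.tril(np.ones((seq_len, seq_len), dtype=np.float32))


seq_len = 4
# Distinct float32 scores (row i is i, i + 0.25, i + 0.5, i + 0.75) so that
# uniform vs. score-dependent attention is easy to tell apart.
scores = np.arange(seq_len * seq_len, dtype=np.float32).reshape(seq_len, seq_len) / 4

old_weights = softmax(apply_mask(scores, old_causal_mask(seq_len)))
new_weights = softmax(apply_mask(scores, new_causal_mask(seq_len)))

print(np.round(old_weights, 3))  # allowed positions collapse to equal weights
print(np.round(new_weights, 3))  # allowed positions keep score-dependent weights
```

With the old mask, row 3 comes out as four equal weights of 0.25, and future positions get zero weight under both masks; with the binary mask, the weights in each row follow the scores.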