[PR #1504] [MERGED] fix(tinytorch): LayerNorm gamma/beta missing requires_grad=True #7371
📋 Pull Request Information
Original PR: https://github.com/harvard-edge/cs249r_book/pull/1504
Author: @Shashank-Tripathi-07
Created: 4/23/2026
Status: ✅ Merged
Merged: 4/24/2026
Merged by: @profvjreddi
Base: dev ← Head: fix/layernorm-requires-grad

📝 Commits (1)
c19109f fix(tinytorch): LayerNorm gamma/beta missing requires_grad=True

📊 Changes
2 files changed (+2 additions, -6 deletions)
📝 tinytorch/src/13_transformers/13_transformers.py (+2 -2)
📝 tinytorch/tests/13_transformers/test_transformer_gradient_flow.py (+0 -4)

📄 Description
What
`LayerNorm.__init__` creates `gamma` and `beta` as plain `Tensor(np.ones(n))` and `Tensor(np.zeros(n))` -- no `requires_grad` flag. After `enable_autograd()` patches `Tensor.__init__`, the default is `requires_grad=False`. `_LayerNormBackward.apply()` guards gradient computation on `beta.requires_grad` and `gamma.requires_grad` (lines 475, 482). Since both are `False`, `grad_gamma` and `grad_beta` are always `None` -- LayerNorm silently never learns its scale and shift parameters during training.
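The mirrored description refers to the construction and the backward guard without showing them; below is a minimal, self-contained sketch of the failure mode, using a hypothetical stand-in for tinytorch's `Tensor` (not the actual source):

```python
import numpy as np

# Hypothetical stand-in for tinytorch's Tensor after enable_autograd() has
# patched Tensor.__init__: requires_grad defaults to False.
class Tensor:
    def __init__(self, data, requires_grad=False):
        self.data = np.asarray(data)
        self.requires_grad = requires_grad
        self.grad = None

n = 8  # feature size, as in the description

# The buggy construction from LayerNorm.__init__ -- no requires_grad flag:
gamma = Tensor(np.ones(n))   # scale; flag silently defaults to False
beta = Tensor(np.zeros(n))   # shift; flag silently defaults to False

# _LayerNormBackward.apply() gates the parameter gradients on these flags
# (lines 475 and 482 of 13_transformers.py), so both stay None:
grad_gamma = np.ones(n) if gamma.requires_grad else None  # placeholder gradient
grad_beta = np.ones(n) if beta.requires_grad else None
print(grad_gamma, grad_beta)  # None None -- the parameters never learn
```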
Why it was hidden

`test_layernorm_gradient_flow()` worked around the bug by manually setting `param.requires_grad = True` after construction (sketched below). This meant the test passed, but the actual source code was still broken for any real training run that didn't manually patch the parameters.
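The code block referenced here is not rendered in this mirror; the workaround described was along these lines (a reconstruction, not the actual test -- sizes and structure are guesses):

```python
def test_layernorm_gradient_flow():
    ln = LayerNorm(8)  # size is illustrative
    # Manual patch that hid the bug: the test forced the flags on after
    # construction instead of relying on LayerNorm.__init__ to set them.
    for param in (ln.gamma, ln.beta):
        param.requires_grad = True
    # ... forward pass, backward pass, assertions that gradients flow ...
```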
Fix
Pass `requires_grad=True` at construction in `LayerNorm.__init__` (sketched below).
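The diff itself is not shown on this page; the change described amounts to the following two lines in `LayerNorm.__init__` (a sketch based on the description -- the `gamma`/`beta` names come from the text, the `self.` prefix is assumed):

```python
# tinytorch/src/13_transformers/13_transformers.py, LayerNorm.__init__
self.gamma = Tensor(np.ones(n), requires_grad=True)   # learnable scale
self.beta = Tensor(np.zeros(n), requires_grad=True)   # learnable shift
```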
The change also removes the manual workaround from `test_layernorm_gradient_flow()`, so the test now validates the source code directly.

Verification
The backward math in `_LayerNormBackward.apply()` is correct -- the only thing missing was the flag that gates it. With `requires_grad=True` set at construction, `grad_gamma` and `grad_beta` are computed and populated on every backward pass.
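The expectation after the fix can be expressed as a small gradient-flow check; note that `x`, `n_embd`, and the `.sum()`/`.backward()`/`.grad` surface shown here are assumptions about tinytorch's autograd API, not taken from the PR:

```python
# Hypothetical post-fix check:
ln = LayerNorm(n_embd)            # no manual requires_grad patching needed
out = ln(x)                       # forward pass
out.sum().backward()              # backward reaches _LayerNormBackward.apply()
assert ln.gamma.grad is not None  # grad_gamma populated
assert ln.beta.grad is not None   # grad_beta populated
```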