gamma and beta were created as Tensor(np.ones(n)) and Tensor(np.zeros(n)) with no requires_grad flag, which defaults to False once enable_autograd() patches Tensor.__init__. Because _LayerNormBackward.apply() guards on gamma.requires_grad and beta.requires_grad, gradients were silently never computed for either parameter, so LayerNorm could not learn its scale and shift during training.

Fix: pass requires_grad=True at construction so the backward pass computes grad_gamma and grad_beta. Also remove the manual param.requires_grad = True workaround from test_layernorm_gradient_flow(), which was masking the bug.
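A minimal sketch of the failure mode and the fix, using simplified stand-ins for the repository's Tensor, LayerNorm, and _LayerNormBackward classes (the real signatures may differ):

```python
import numpy as np

class Tensor:
    """Simplified stand-in: once autograd is enabled, requires_grad
    defaults to False unless the caller passes it explicitly."""
    def __init__(self, data, requires_grad=False):
        self.data = np.asarray(data, dtype=np.float64)
        self.requires_grad = requires_grad
        self.grad = None

class LayerNorm:
    def __init__(self, n, eps=1e-5):
        # Buggy version (gradients silently skipped by the guard below):
        #   self.gamma = Tensor(np.ones(n))
        #   self.beta  = Tensor(np.zeros(n))
        # Fixed version: mark the parameters trainable at construction.
        self.gamma = Tensor(np.ones(n), requires_grad=True)
        self.beta = Tensor(np.zeros(n), requires_grad=True)
        self.eps = eps

    def backward(self, x_hat, grad_out):
        # Mirrors the guard in _LayerNormBackward.apply(): with
        # requires_grad=False, neither branch runs, so gamma and beta
        # never accumulate gradients and cannot learn.
        if self.gamma.requires_grad:
            self.gamma.grad = (grad_out * x_hat).sum(axis=0)
        if self.beta.requires_grad:
            self.beta.grad = grad_out.sum(axis=0)

# Usage: with the fix, both parameter gradients are populated.
ln = LayerNorm(4)
x_hat = np.random.randn(8, 4)
ln.backward(x_hat, np.ones((8, 4)))
assert ln.gamma.grad is not None and ln.beta.grad is not None
```

With the parameters marked trainable at construction, the test no longer needs to flip param.requires_grad by hand before checking gradient flow.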