gamma and beta were created as Tensor(np.ones(n)) and Tensor(np.zeros(n)) with no requires_grad flag, which defaults to False once enable_autograd() patches Tensor.__init__. Because _LayerNormBackward.apply() guards on gamma.requires_grad and beta.requires_grad, gradients were silently never computed for either parameter, so LayerNorm could not learn its scale and shift during training.

Fix: pass requires_grad=True at construction so the backward pass computes grad_gamma and grad_beta. Also remove the manual param.requires_grad = True workaround from test_layernorm_gradient_flow(), which was masking the bug.
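A minimal sketch of the failure mode and the fix, using simplified stand-ins for the repository's Tensor, LayerNorm, and _LayerNormBackward classes (the real signatures may differ):

```python
import numpy as np

class Tensor:
    """Simplified stand-in: once autograd is enabled, requires_grad
    defaults to False unless the caller passes it explicitly."""
    def __init__(self, data, requires_grad=False):
        self.data = np.asarray(data, dtype=np.float64)
        self.requires_grad = requires_grad
        self.grad = None

class LayerNorm:
    def __init__(self, n, eps=1e-5):
        # Buggy version (gradients silently skipped by the guard below):
        #   self.gamma = Tensor(np.ones(n))
        #   self.beta  = Tensor(np.zeros(n))
        # Fixed version: mark the parameters trainable at construction.
        self.gamma = Tensor(np.ones(n), requires_grad=True)
        self.beta = Tensor(np.zeros(n), requires_grad=True)
        self.eps = eps

    def backward(self, x_hat, grad_out):
        # Mirrors the guard in _LayerNormBackward.apply(): with
        # requires_grad=False, neither branch runs, so gamma and beta
        # never accumulate gradients and cannot learn.
        if self.gamma.requires_grad:
            self.gamma.grad = (grad_out * x_hat).sum(axis=0)
        if self.beta.requires_grad:
            self.beta.grad = grad_out.sum(axis=0)

# Usage: with the fix, both parameter gradients are populated.
ln = LayerNorm(4)
x_hat = np.random.randn(8, 4)
ln.backward(x_hat, np.ones((8, 4)))
assert ln.gamma.grad is not None and ln.beta.grad is not None
```

With the parameters marked trainable at construction, the test no longer needs to flip param.requires_grad by hand before checking gradient flow.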