[GH-ISSUE #1334] [TinyTorch] Potential issue in 08_training test_unit_trainer_optimizer_update not updating parameters - v0.1.9 #4377
Originally created by @avikde on GitHub (Apr 15, 2026).
Original GitHub issue: https://github.com/harvard-edge/cs249r_book/issues/1334
Module
08 Training
Type of Improvement
Bug fix
Description
When running 08_training.py, `test_unit_trainer_optimizer_update()` fails with the assertion `assert changed, "Parameters should change after optimizer update"`. The root cause is that `param.grad` is `None` when the optimizer updates, because the Tensors in SimpleModel's Linear layers were not defined with `requires_grad = True`.

Potential confounding factor: I started with v0.1.3, then moved my progress to a different computer and resumed with v0.1.9. I apologize if the issue is caused by this.
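To make the failure mode concrete, here is a minimal sketch of a generic SGD-style update that skips any parameter whose `.grad` is `None`. It is illustrative only and assumes a PyTorch-like `param.grad` / `param.data` interface rather than the actual TinyTorch optimizer or test code:

```python
# Minimal sketch (not the actual TinyTorch optimizer or unit test):
# if every param.grad is None, step() becomes a no-op and the test's
# "parameters should change" check fails.
import numpy as np

class SGD:
    def __init__(self, params, lr=0.01):
        self.params, self.lr = list(params), lr

    def step(self):
        for p in self.params:
            if p.grad is None:        # gradient chain broken: nothing to apply
                continue              # parameter is silently skipped
            p.data -= self.lr * p.grad

# Rough shape of the check that fires:
#   before = [p.data.copy() for p in params]
#   optimizer.step()
#   changed = any(not np.allclose(b, p.data) for b, p in zip(before, params))
#   assert changed, "Parameters should change after optimizer update"
```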
Proposed Solution
Force gradient calculation in `trainer_init()` for the model. This fixed the issue for me and allowed 08_training.py to complete.
@avikde commented on GitHub (Apr 15, 2026):
A different possible solution is to update `layers.Linear` to have `requires_grad = True` on the weight and bias Tensors.

I had a related problem with `tito milestone run 03` the first time -- it did not seem to update parameters during training, and the root cause was the same. Adding `requires_grad = True` to `self.weight` and `self.bias` in `layers.Linear` resolved it as well; a sketch of that change is below.
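The change described above might look roughly like the following. The constructor signature, initialization scheme, and import path are assumptions; only the `requires_grad=True` flag on the weight and bias Tensors comes from this thread:

```python
# Hypothetical sketch of layers.Linear with gradient tracking enabled.
# The import path and init scheme are guesses, not the real TinyTorch code.
import numpy as np
from tinytorch import Tensor  # import path is a guess; adjust to the project layout

class Linear:
    def __init__(self, in_features, out_features):
        scale = np.sqrt(2.0 / in_features)
        # Explicit requires_grad=True so downstream ops build a backward graph
        self.weight = Tensor(np.random.randn(in_features, out_features) * scale,
                             requires_grad=True)
        self.bias = Tensor(np.zeros(out_features), requires_grad=True)

    def forward(self, x):
        return x @ self.weight + self.bias
```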
@Shashank-Tripathi-07 commented on GitHub (Apr 16, 2026):
Hi @avikde
Thanks for the detailed report and clear repro. I've confirmed the root cause and applied a fix.
Root Cause
The failure is a silent gradient chain break: `backward()` runs but propagates nothing, so `param.grad` stays `None` and `optimizer.step()` skips every parameter. Here's the full chain of why (a small sketch of the gating behaviour follows the list):

1. `Linear.__init__` creates params without gradient tracking. After `enable_autograd()` patches `Tensor.__init__`, the default is `requires_grad=False` unless explicitly passed.
2. `tracked_mse_forward` only builds a backward graph if `predictions.requires_grad`. If the gate is False, `loss` has no `_grad_fn` and `loss.backward()` returns immediately.
3. `predictions.requires_grad` depends on the weight's flag at matmul time. `predictions` only gets `requires_grad=True` if `weight.requires_grad` is True at the moment of the forward pass.
4. `Optimizer.__init__` does set the flag, but it's fragile. This works only if the optimizer is constructed before the forward pass with the same Tensor objects. Any ordering issue, re-creation of parameters, or incomplete student implementation of Module 07 breaks this silently.
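As a stripped-down illustration of the chain break, here is a toy Tensor with PyTorch-like semantics. The names (`requires_grad`, `_grad_fn`) mirror the thread, but the implementation is a sketch, not the real TinyTorch autograd:

```python
# Toy model of the gradient chain break described above; illustrative only.
import numpy as np

class Tensor:
    def __init__(self, data, requires_grad=False):
        self.data = np.asarray(data, dtype=float)
        self.requires_grad = requires_grad
        self.grad = None
        self._grad_fn = None             # set only when an op tracks gradients

    def __matmul__(self, other):
        out = Tensor(self.data @ other.data,
                     requires_grad=self.requires_grad or other.requires_grad)
        if out.requires_grad:
            out._grad_fn = ("matmul", self, other)   # placeholder graph node
        return out

    def backward(self):
        if self._grad_fn is None:        # the gate from step 2: nothing to propagate
            return                       # silent no-op, so grads stay None
        # (a real implementation would walk _grad_fn here and fill .grad)

x = Tensor(np.ones((1, 3)))
w = Tensor(np.ones((3, 1)))              # requires_grad defaults to False (step 1)
pred = x @ w                             # pred.requires_grad is False (step 3)
pred.backward()                          # returns immediately (step 2)
print(w.grad)                            # None, so the optimizer skips w
```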
The Fix
Added an explicit `requires_grad = True` loop inside `trainer_init` (08_training.py):
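The code block that originally followed is not preserved in this mirror; a minimal sketch of such a loop is below. The `trainer_init` signature and the `model.parameters()` accessor are assumptions, not the committed code:

```python
# Hypothetical reconstruction of the added guard in trainer_init();
# self.model.parameters() and the signature are assumed, not the real code.
def trainer_init(self, model, optimizer, loss_fn):
    self.model = model
    self.optimizer = optimizer
    self.loss_fn = loss_fn
    # Make trainability explicit: every parameter must track gradients,
    # regardless of layer construction or optimizer setup order.
    for param in self.model.parameters():
        param.requires_grad = True
```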
Why here? The Trainer is the place that owns the full training contract — it should be the authoritative point that guarantees trainability, regardless of optimizer setup order or how layers were constructed. It also makes the pedagogical intent explicit in Module 08.
I'll open a PR to correct this behaviour. Thanks for highlighting the issue!
@Shashank-Tripathi-07 commented on GitHub (Apr 16, 2026):
Done, pushed the changes in this PR: https://github.com/harvard-edge/cs249r_book/pull/1335. Thanks for your contribution!