[GH-ISSUE #1334] [TinyTorch] Potential issue in 08_training test_unit_trainer_optimizer_update not updating parameters - v0.1.9 #4377

Closed
opened 2026-04-19 12:23:55 -05:00 by GiteaMirror · 3 comments
Owner

Originally created by @avikde on GitHub (Apr 15, 2026).
Original GitHub issue: https://github.com/harvard-edge/cs249r_book/issues/1334

Module

08 Training

Type of Improvement

Bug fix

Description

When running 08_training.py, test_unit_trainer_optimizer_update() fails with the assertion assert changed, "Parameters should change after optimizer update". The root cause is that param.grad is None when the optimizer runs, because the Tensors in SimpleModel's Linear layers were not created with requires_grad = True.
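
For illustration, a minimal sketch of how the failure shows up (SimpleModel, mse_loss, x, and y follow the module's naming but the exact API is an assumption):

    # Hypothetical minimal repro -- mse_loss, x, y are placeholder names
    model = SimpleModel()
    print(all(p.requires_grad for p in model.parameters()))  # False: Linear built plain Tensors

    loss = mse_loss(model(x), y)   # forward pass records no backward graph
    loss.backward()                # returns without filling gradients

    print([p.grad for p in model.parameters()])  # all None -> optimizer.step() is a no-op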


Potential confounding factor: I started with v0.1.3, and moved progress to a different computer and resumed with v0.1.9. I apologize if the issue is caused by this.

Proposed Solution

Force gradient calculation in trainer_init() for the model's parameters:

    # History tracking
    self.history = {
        'train_loss': [],
        'eval_loss': [],
        'learning_rates': []
    }

    # Enable gradient tracking for all model parameters
    # This is needed because layers may be created without requires_grad=True
    for param in model.parameters():
        param.requires_grad = True

    ### END SOLUTION

This fixed the issue for me, and allowed 08_training.py to complete.

GiteaMirror added the type: bug, dependencies, type: improvement, and area: tinytorch labels 2026-04-19 12:23:55 -05:00
Author
Owner

@avikde commented on GitHub (Apr 15, 2026):

A different possible solution is to update layers.Linear to have requires_grad = True on the weight and bias Tensors.
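
A rough sketch of what that would look like (only the two requires_grad flags are the actual proposal; the shapes and init scheme here are placeholders, not the real layer code):

# 03_layers.py -- illustrative sketch of the alternative fix
import numpy as np

class Linear:
    def __init__(self, in_features, out_features):
        weight_data = np.random.randn(in_features, out_features) * 0.01
        bias_data = np.zeros(out_features)
        self.weight = Tensor(weight_data, requires_grad=True)  # enable gradient tracking
        self.bias = Tensor(bias_data, requires_grad=True)      # enable gradient tracking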

I had a related problem with tito milestone run 03 the first time -- it did not seem to update parameters during training, and the root cause was the same:

🔥 Training in Progress...
(Watch backpropagation optimize through hidden layers!)

Epoch  5/20  Loss: 2.3618  Train: 6.2%  Test: 9.5%  ✓ Gap: -3.3%
Epoch 10/20  Loss: 2.3583  Train: 6.2%  Test: 9.5%  ✓ Gap: -3.3%
Epoch 15/20  Loss: 2.3613  Train: 6.2%  Test: 9.5%  ✓ Gap: -3.3%
Epoch 20/20  Loss: 2.3611  Train: 6.2%  Test: 9.5%  ✓ Gap: -3.3%

After adding requires_grad = True to self.weight and self.bias in layers.Linear:

Epoch  5/20  Loss: 2.1046  Train: 40.8%  Test: 35.5%  ✓ Gap: 5.3%
Epoch 10/20  Loss: 1.8583  Train: 66.7%  Test: 60.5%  ✓ Gap: 6.2%
Epoch 15/20  Loss: 1.5911  Train: 81.3%  Test: 75.0%  ✓ Gap: 6.3%
Epoch 20/20  Loss: 1.3239  Train: 87.6%  Test: 83.5%  ✓ Gap: 4.1%
⠋ Epoch 20/20  Batch 32/32
Author
Owner

@Shashank-Tripathi-07 commented on GitHub (Apr 16, 2026):

Hi @avikde

Thanks for the detailed report and clear repro. I've confirmed the root cause and applied a fix.


Root Cause

The failure is a silent gradient chain break: backward() runs but propagates nothing, so param.grad stays None and optimizer.step() skips every parameter.

Here's the full chain of why:

1. Linear.__init__ creates params without gradient tracking

# 03_layers.py
self.weight = Tensor(weight_data)   # requires_grad=False by default
self.bias   = Tensor(bias_data)     # requires_grad=False by default

After enable_autograd() patches Tensor.__init__, the default is requires_grad=False unless explicitly passed.
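
For context, roughly what the patched constructor amounts to (a sketch under the assumption that the patch wraps the original __init__; original_tensor_init is a placeholder name, not the real implementation):

# Illustrative only -- the real enable_autograd() patch may differ in detail
def patched_tensor_init(self, data, requires_grad=False):
    original_tensor_init(self, data)    # call the unpatched constructor
    self.requires_grad = requires_grad  # stays False unless explicitly passed
    self.grad = None
    self._grad_fn = None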

2. tracked_mse_forward only builds a backward graph if predictions.requires_grad

# 06_autograd.py — tracked_mse_forward
result = Tensor(mse)
if _GRAD_TRACKING_ENABLED and predictions.requires_grad:   # ← gate
    result.requires_grad = True
    result._grad_fn = MSEBackward(predictions, targets)

If the gate is False, loss has no _grad_fn and loss.backward() returns immediately.
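
Put differently, backward() has nothing to walk in that case (simplified sketch; the real traversal logic is omitted):

# Simplified sketch of the early exit, not the actual backward() implementation
def backward(self):
    if self._grad_fn is None:
        return  # no graph was recorded, so nothing propagates and param.grad stays None
    # ...otherwise walk the recorded graph and accumulate gradients into .grad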

3. predictions.requires_grad depends on the weight's flag at matmul time

# 06_autograd.py — tracked_matmul
if _get_requires_grad(self) or _get_requires_grad(other):
    result.requires_grad = True
    result._grad_fn = MatmulBackward(self, other)

predictions only gets requires_grad=True if weight.requires_grad is True at the moment of the forward pass.

4. Optimizer.__init__ does set the flag — but it's fragile

# 07_optimizers.py
for param in self.params:
    if isinstance(param, Tensor):
        param.requires_grad = True   # ← does run

This works if the optimizer is constructed before the forward pass with the same Tensor objects. Any ordering issue, re-creation of parameters, or incomplete student implementation of Module 07 breaks this silently.
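
For example, this ordering breaks silently even with the Module 07 code intact (a sketch; SGD, mse_loss, x, and y stand in for whatever optimizer, loss, and data are actually used):

# Illustrative ordering bug -- SGD, mse_loss, x, y are placeholder names
predictions = model(x)                        # weight.requires_grad is still False here,
                                              # so no backward graph gets recorded
optimizer = SGD(model.parameters(), lr=0.01)  # flags flip to True, but too late
loss = mse_loss(predictions, y)               # predictions.requires_grad is False -> no _grad_fn
loss.backward()                               # silently does nothing
optimizer.step()                              # every param.grad is None -> no update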


The Fix

Added an explicit requires_grad = True loop inside trainer_init
(08_training.py):

# Enable gradient tracking for all model parameters.
# Layers (e.g. Linear) may be created without requires_grad=True,
# so we set it explicitly here to ensure backward() populates param.grad.
for param in model.parameters():
    param.requires_grad = True

Why here? The Trainer is the place that owns the full training contract — it should be the authoritative point that guarantees trainability, regardless of optimizer setup order or how layers were constructed. It also makes the pedagogical intent explicit in Module 08.


I'll open a PR to correct this behaviour. Thanks for highlighting the issue!

Author
Owner

@Shashank-Tripathi-07 commented on GitHub (Apr 16, 2026):

Done, pushed the changes in this PR: https://github.com/harvard-edge/cs249r_book/pull/1335. Thanks for your contribution!
