mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-05-28 04:36:50 -05:00
Refactored gradient accumulation to use a clearer two-step approach:

1. Remove extra leading dimensions (batch dims)
2. Sum over dimensions that were size-1 (broadcast dims)

Benefits:
- Clearer intent: while loop for variable dims, for loop for fixed dims
- Better comments with concrete examples
- Easier for students to understand broadcasting in backprop
- Matches how you'd explain it verbally

Same functionality, cleaner code.
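
For illustration, a minimal sketch of the two-step approach, assuming gradients are NumPy arrays. The helper name `unbroadcast` and its parameters are hypothetical and not taken from the repository; the actual code may differ.

```python
import numpy as np

def unbroadcast(grad, shape):
    """Reduce `grad` to `shape`, undoing forward-pass broadcasting.

    Hypothetical helper for illustration only.
    """
    # Step 1: remove extra leading dimensions (batch dims).
    # The number of extra dims varies per call, hence a while loop.
    # Example: grad (4, 2, 3) with target shape (1, 3) becomes (2, 3).
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)

    # Step 2: sum over dimensions that were size 1 in the original
    # operand (broadcast dims). The rank is now fixed, hence a for loop.
    # Example: grad (2, 3) with target shape (1, 3) becomes (1, 3).
    for axis, size in enumerate(shape):
        if size == 1 and grad.shape[axis] != 1:
            grad = grad.sum(axis=axis, keepdims=True)

    return grad

# A bias of shape (1, 3) was broadcast against a (4, 2, 3) activation,
# so the upstream gradient has the activation's shape.
upstream = np.ones((4, 2, 3))
bias_grad = unbroadcast(upstream, (1, 3))
```

Each entry of `bias_grad` accumulates the contributions from every position the bias was broadcast to, which is why both steps sum rather than slice.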