Mirror of https://github.com/MLSysBook/TinyTorch.git
Synced 2026-06-03 05:20:57 -05:00
- Embedding.forward() now preserves requires_grad from the weight tensor
- PositionalEncoding.forward() uses Tensor addition (x + pos) instead of .data
- Critical for transformer input embeddings to have gradients

Both changes ensure that gradients flow from the loss back to the embedding weights.
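
Why going through .data cuts the gradient path can be shown with a minimal sketch. The toy Tensor class below is an assumption made for illustration and is not TinyTorch's actual implementation; the helper names add_positions_detached and add_positions_tracked are hypothetical as well. It supports only element-wise addition on flat lists, which is enough to compare the two ways of adding a positional encoding.

```python
class Tensor:
    """Toy tensor over a flat list of floats (hypothetical, not TinyTorch's class)."""

    def __init__(self, data, requires_grad=False, parents=()):
        self.data = list(data)
        self.requires_grad = requires_grad
        self.grad = None
        self._parents = parents  # tensors this one was computed from

    def __add__(self, other):
        # The result remembers its inputs and inherits requires_grad from them.
        return Tensor(
            [a + b for a, b in zip(self.data, other.data)],
            requires_grad=self.requires_grad or other.requires_grad,
            parents=(self, other),
        )

    def backward(self, grad=None):
        # Addition passes the upstream gradient through unchanged.
        if grad is None:
            grad = [1.0] * len(self.data)
        if self.requires_grad:
            if self.grad is None:
                self.grad = [0.0] * len(self.data)
            self.grad = [g0 + g for g0, g in zip(self.grad, grad)]
        for parent in self._parents:
            parent.backward(grad)


def add_positions_detached(x, pos):
    # Working on .data builds a fresh Tensor with no link back to x or pos.
    return Tensor([a + b for a, b in zip(x.data, pos.data)])


def add_positions_tracked(x, pos):
    # Tensor addition keeps x and pos as parents of the result.
    return x + pos


emb = Tensor([0.1, 0.2, 0.3], requires_grad=True)  # stands in for embedding output
pos = Tensor([1.0, 1.0, 1.0])                      # stands in for positional encoding

detached = add_positions_detached(emb, pos)
detached.backward()
emb_grad_detached = emb.grad  # still None: nothing reached emb

tracked = add_positions_tracked(emb, pos)
tracked.backward()
emb_grad_tracked = emb.grad   # populated: the gradient reached emb
```

In this sketch the detached path leaves emb.grad unset, while the tracked path delivers a gradient to emb. The same reasoning applies to the Embedding.forward() change: if the output does not carry requires_grad from the weight tensor, nothing downstream has a reason to route gradients back to it.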