Mirror of https://github.com/MLSysBook/TinyTorch.git (synced 2026-06-04 04:36:12 -05:00)
- Embedding.forward() now preserves requires_grad from the weight tensor
- PositionalEncoding.forward() now uses Tensor addition (x + pos) instead of operating on .data
- This is critical for transformer input embeddings to receive gradients

Both changes ensure that gradients flow from the loss back to the embedding weights.
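The following is a minimal sketch, in plain Python, of why both changes matter. The `Tensor`, `Embedding`, and `PositionalEncoding` classes are hypothetical stand-ins, not TinyTorch's actual implementation. The helpers `sum_backward`, `_accumulate`, and `forward_broken` are invented for illustration, and `forward_broken` only approximates what a `.data`-based forward might look like.

```python
# Hypothetical autograd sketch (NOT TinyTorch's real API) showing why Embedding must
# propagate requires_grad and PositionalEncoding must use graph-tracking x + pos.


class Tensor:
    """Tiny 2-D tensor (list of rows) with reverse-mode autograd."""

    def __init__(self, data, requires_grad=False, parents=(), backward_fn=None):
        self.data = [list(row) for row in data]
        self.requires_grad = requires_grad
        self.grad = None
        self._parents = parents          # tensors this one was computed from
        self._backward_fn = backward_fn  # pushes self.grad into the parents

    def __add__(self, other):
        # Element-wise addition that records the graph (the style the commit adopts).
        out = Tensor(
            [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(self.data, other.data)],
            requires_grad=self.requires_grad or other.requires_grad,
            parents=(self, other),
        )

        def backward_fn(grad):
            # d(a + b)/da = d(a + b)/db = 1, so the upstream gradient passes through.
            for t in (self, other):
                if t.requires_grad:
                    t._accumulate(grad)

        out._backward_fn = backward_fn
        return out

    def _accumulate(self, grad):
        if self.grad is None:
            self.grad = [[0.0] * len(row) for row in self.data]
        for i, row in enumerate(grad):
            for j, g in enumerate(row):
                self.grad[i][j] += g

    def sum_backward(self):
        # Treat loss = sum(self), so d(loss)/d(self) is all ones.
        order, seen = [], set()

        def visit(t):  # topological order: parents before children
            if id(t) not in seen:
                seen.add(id(t))
                for p in t._parents:
                    visit(p)
                order.append(t)

        visit(self)
        self.grad = [[1.0] * len(row) for row in self.data]
        for t in reversed(order):  # children first, so each grad is complete
            if t._backward_fn is not None and t.grad is not None:
                t._backward_fn(t.grad)


class Embedding:
    def __init__(self, weight_rows):
        self.weight = Tensor(weight_rows, requires_grad=True)

    def forward(self, token_ids):
        rows = [self.weight.data[i] for i in token_ids]
        # Preserve requires_grad from the weight tensor and link back to it.
        out = Tensor(rows, requires_grad=self.weight.requires_grad, parents=(self.weight,))

        def backward_fn(grad):
            # Scatter-add each output row's gradient into the row it was looked up from.
            scattered = [[0.0] * len(r) for r in self.weight.data]
            for k, i in enumerate(token_ids):
                for j, g in enumerate(grad[k]):
                    scattered[i][j] += g
            self.weight._accumulate(scattered)

        out._backward_fn = backward_fn
        return out


class PositionalEncoding:
    def __init__(self, pos_rows):
        self.pos = Tensor(pos_rows)  # fixed encodings, no gradient

    def forward(self, x):
        return x + self.pos  # Tensor addition keeps the path to the embedding

    def forward_broken(self, x):
        # Working on raw .data builds a fresh leaf with no parents,
        # so backward never reaches the embedding weights.
        return Tensor([[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(x.data, self.pos.data)])


emb = Embedding([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
pe = PositionalEncoding([[0.0, 1.0], [1.0, 0.0]])
x = pe.forward(emb.forward([2, 2]))
x.sum_backward()
print(emb.weight.grad)  # [[0.0, 0.0], [0.0, 0.0], [2.0, 2.0]]
```

Token 2 is looked up twice, so its row accumulates a gradient of 2 per element, while unused rows stay at zero. Swapping `forward` for `forward_broken` leaves `emb.weight.grad` as `None`, which is the failure mode the commit describes.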