Mirror of https://github.com/MLSysBook/TinyTorch.git (synced 2026-06-02 18:46:13 -05:00)
- Change LayerNorm from .data extraction to Tensor arithmetic (x - mean, diff * diff, x / std)
- Preserve the computation graph through normalization
- The std tensor now preserves requires_grad correctly

LayerNorm is used before and after attention in transformer blocks, so detaching its intermediates would cut off gradients to every layer upstream of it.
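A minimal sketch of why the change matters. It assumes a hypothetical scalar autograd node, Value, in place of TinyTorch's real Tensor class. The helper names layer_norm, layer_norm_detached and input_grads are also illustrative, and the learnable scale and shift are omitted. layer_norm builds mean, diff * diff and std from graph-tracked ops. layer_norm_detached reads .data, so mean and std become constants that backward cannot see.

```python
import math


class Value:
    """Hypothetical minimal scalar autograd node (not TinyTorch's Tensor API)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._backward = lambda: None    # pushes out.grad into the parents

    @staticmethod
    def lift(other):
        return other if isinstance(other, Value) else Value(other)

    def __add__(self, other):
        other = Value.lift(other)
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    __radd__ = __add__  # lets the built-in sum() start from 0

    def __neg__(self):
        return self * -1.0

    def __sub__(self, other):
        return self + (-Value.lift(other))

    def __mul__(self, other):
        other = Value.lift(other)
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    __rmul__ = __mul__

    def __pow__(self, exponent):  # exponent is a plain float
        out = Value(self.data ** exponent, (self,))
        def backward():
            self.grad += exponent * self.data ** (exponent - 1) * out.grad
        out._backward = backward
        return out

    def __truediv__(self, other):
        return self * Value.lift(other) ** -1.0

    def backward(self):
        # Reverse topological order: each node runs after all of its consumers.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


def layer_norm(xs, eps=1e-5):
    """Graph-preserving: mean, var and std all stay connected to every x."""
    n = len(xs)
    mean = sum(xs) / n
    diffs = [x - mean for x in xs]             # x - mean
    var = sum(d * d for d in diffs) / n        # diff * diff
    std = (var + eps) ** 0.5                   # std keeps its parents
    return [d / std for d in diffs]


def layer_norm_detached(xs, eps=1e-5):
    """Anti-pattern: mean and std computed from .data are plain floats."""
    vals = [x.data for x in xs]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals) + eps)
    return [(x - mean) / std for x in xs]


def input_grads(norm_fn, raw, weights):
    xs = [Value(v) for v in raw]
    loss = sum(w * y for w, y in zip(weights, norm_fn(xs)))  # arbitrary upstream grad
    loss.backward()
    return [x.grad for x in xs]


raw, weights = [1.0, 2.0, 4.0], [0.5, -1.0, 2.0]
print(sum(input_grads(layer_norm, raw, weights)))           # ~0 up to rounding
print(sum(input_grads(layer_norm_detached, raw, weights)))  # nonzero
```

The sum of input gradients is a quick sanity check. LayerNorm's output does not change when every input is shifted by the same constant, so with the full graph the gradients sum to roughly zero. The detached version loses the mean and std terms and fails that check.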