mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-06-02 08:32:31 -05:00
- Preserve computation graph by using Tensor arithmetic (x - x_max, exp / sum) - No more .data extraction that breaks gradient flow - Numerically stable with max subtraction before exp Required for transformer attention softmax gradient flow