mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-06-02 16:36:10 -05:00
- Preserve computation graph by using Tensor arithmetic (x - x_max, exp / sum)
- No more .data extraction that breaks gradient flow
- Numerically stable with max subtraction before exp

Required for transformer attention softmax gradient flow
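
A minimal sketch of the max-subtraction step, written with plain NumPy rather than the repository's Tensor class (the function names `softmax_stable` and `softmax_naive` are illustrative and not taken from the codebase). It only shows why subtracting the row maximum before `exp` avoids overflow. It does not model the computation graph or gradient flow that the change is about.

```python
import numpy as np

def softmax_stable(x, axis=-1):
    """Softmax with the row maximum subtracted before exponentiation."""
    x = np.asarray(x, dtype=np.float64)
    x_max = np.max(x, axis=axis, keepdims=True)  # largest logit per row
    shifted = x - x_max                          # all entries are now <= 0
    exp = np.exp(shifted)                        # so exp() cannot overflow
    return exp / np.sum(exp, axis=axis, keepdims=True)

def softmax_naive(x, axis=-1):
    """Textbook softmax with no shift; overflows for large logits."""
    x = np.asarray(x, dtype=np.float64)
    exp = np.exp(x)
    return exp / np.sum(exp, axis=axis, keepdims=True)

# Large logits of the kind attention scores can reach.
logits = np.array([[1000.0, 1001.0, 1002.0]])

# Silence the overflow warnings so the comparison stays readable.
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(logits)   # exp(1000) is inf, and inf / inf is nan

stable = softmax_stable(logits)     # finite probabilities that sum to 1

print("naive :", naive)
print("stable:", stable)
```

Subtracting a per-row constant leaves the result mathematically unchanged, because the factor exp(-x_max) cancels between numerator and denominator. In an autograd setting the same steps would presumably be expressed with the framework's own tensor operations, so that each one is recorded for the backward pass.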