mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-04-28 21:02:45 -05:00
- Preserve computation graph by using Tensor arithmetic (x - x_max, exp / sum)
- No more .data extraction that breaks gradient flow
- Numerically stable with max subtraction before exp

Required for transformer attention softmax gradient flow
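
The idea above can be sketched in a few lines. This is not TinyTorch's code: `Value` is a hypothetical scalar autograd node written only for this illustration, and `softmax` here works on a plain Python list rather than a Tensor. It shows the two points from the message: every step (subtract max, exp, sum, divide) is built from graph-tracked operations so `backward()` reaches the inputs, and the max is subtracted before `exp` so large logits do not overflow.

```python
import math

class Value:
    """Hypothetical scalar autograd node (illustration only, not TinyTorch's Tensor)."""
    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents    # nodes this value was computed from
        self._grad_fns = grad_fns  # one local backward function per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __sub__(self, other):
        return Value(self.data - other.data, (self, other),
                     (lambda g: g, lambda g: -g))

    def __truediv__(self, other):
        a, b = self.data, other.data
        return Value(a / b, (self, other),
                     (lambda g: g / b, lambda g: -g * a / (b * b)))

    def exp(self):
        e = math.exp(self.data)
        return Value(e, (self,), (lambda g: g * e,))

    def backward(self):
        # Post-order DFS gives parents before children; walk it in reverse.
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, fn in zip(v._parents, v._grad_fns):
                p.grad += fn(v.grad)

def softmax(xs):
    # The shift is treated as a constant; softmax is shift-invariant,
    # so this does not change the values or the gradients.
    x_max = Value(max(x.data for x in xs))
    exps = [(x - x_max).exp() for x in xs]  # stays in the graph, no raw-float detour
    total = exps[0]
    for e in exps[1:]:
        total = total + e
    return [e / total for e in exps]

# Logits this large would raise OverflowError in math.exp without the shift.
xs = [Value(1000.0), Value(1001.0), Value(1002.0)]
probs = softmax(xs)
probs[0].backward()  # gradients of probs[0] with respect to each input
```

After `backward()`, each `xs[j].grad` should match the analytic softmax derivative `p0 * ((1 if j == 0 else 0) - pj)`. If the intermediate values were pulled out as raw floats instead (the `.data` pattern the message removes), the nodes would have no parents and the inputs would receive no gradient.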