mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-05-21 14:00:49 -05:00
- Preserve the computation graph by using Tensor arithmetic (x - x_max, exp / sum)
- No more .data extraction that breaks gradient flow
- Numerically stable with max subtraction before exp

Required for transformer attention softmax gradient flow
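
The max-subtraction step described above can be sketched as follows. This is only an illustration, not the TinyTorch implementation: plain NumPy arrays stand in for the Tensor class, so no gradient tracking is shown, and the function name `stable_softmax` is hypothetical. In the actual change the same operations (x - x_max, exp / sum) would be expressed with Tensor arithmetic so the graph stays connected.

```python
import numpy as np

def stable_softmax(x, axis=-1):
    """Softmax with max subtraction (hypothetical helper, NumPy stand-in)."""
    # Subtract the max along the axis so the largest exponent is exp(0) = 1,
    # which avoids overflow for large logits.
    x_max = np.max(x, axis=axis, keepdims=True)
    exp = np.exp(x - x_max)
    # Normalize so the values along the axis sum to 1.
    return exp / np.sum(exp, axis=axis, keepdims=True)

# Large logits would overflow a naive exp(x) / sum(exp(x)).
logits = np.array([[1000.0, 1001.0, 1002.0],
                   [1.0, 2.0, 3.0]])
probs = stable_softmax(logits)
```

Because softmax is invariant to adding a constant to every input, both rows give the same probabilities, and the first row stays finite even though exp(1000) overflows on its own.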