Major rewrite for gradient flow:

- scaled_dot_product_attention: use Tensor ops (matmul, transpose, softmax)
- MultiHeadAttention: process all heads in parallel with 4D batched tensors
- No explicit batch loops or .data extraction
- Proper mask broadcasting for the (batch * heads) dimension

This is the most complex fix: attention is now fully differentiable end-to-end.
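For reference, a minimal NumPy sketch of the shape logic this commit describes. The function name scaled_dot_product_attention matches the commit; everything else (argument names, mask convention, the -1e9 fill value) is illustrative, not TinyTorch's actual API. In the real code, TinyTorch Tensor ops (matmul, transpose, softmax) replace the NumPy calls so the computation stays differentiable end-to-end.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq, head_dim) -- all heads processed at once,
    # with no explicit per-batch loop.
    d_k = q.shape[-1]
    # Scores: (batch, heads, seq, seq); matmul broadcasts over leading dims.
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_k)
    if mask is not None:
        # A (1, 1, seq, seq) mask broadcasts across both batch and heads.
        # If heads are instead folded into a flat (batch * heads) leading
        # dimension, the mask must be repeated per head before applying.
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ v  # (batch, heads, seq, head_dim)

# Example: batch 2, 4 heads, sequence length 5, head_dim 8.
rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 5, 8))
k = rng.standard_normal((2, 4, 5, 8))
v = rng.standard_normal((2, 4, 5, 8))
causal = np.tril(np.ones((5, 5), dtype=bool))[None, None]  # (1, 1, 5, 5)
out = scaled_dot_product_attention(q, k, v, mask=causal)
print(out.shape)  # (2, 4, 5, 8)
```

Processing the heads as one 4D batched matmul, rather than looping and extracting raw .data, is what lets gradients flow through every head in a single graph.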