mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-06-02 08:32:31 -05:00
- Implement scaled dot-product attention with masking support
- Build multi-head attention with learnable projections
- Create sinusoidal positional encoding for sequence understanding
- Add layer normalization for training stability
- Complete transformer block with residual connections
- Include self-attention wrapper and utility functions
- Full inline testing with 100% pass rate
- Educational content explaining attention mechanisms
- Foundation for modern AI architectures (GPT, BERT, etc.)

This module bridges classical ML (tensors, layers, networks) with modern transformer architectures that power ChatGPT and contemporary AI.
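
For orientation, the first and third items above (masked scaled dot-product attention and sinusoidal positional encoding) can be sketched roughly as follows. This is a minimal illustration in plain Python lists, not the module's actual code: the function names, the boolean mask convention (False means blocked), and the -1e9 fill value are all assumptions made for this sketch, and the real module presumably operates on TinyTorch tensors instead.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Hypothetical helper: softmax(Q K^T / sqrt(d_k)) V on nested lists.

    Q is [n_q][d_k], K is [n_k][d_k], V is [n_k][d_v].
    mask[i][j] == False blocks query i from attending to key j (assumed convention).
    """
    d_k = len(Q[0])
    scale = 1.0 / math.sqrt(d_k)
    output, weights = [], []
    for i, q in enumerate(Q):
        # Scaled similarity between this query and every key.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
        if mask is not None:
            # Blocked positions get a very negative score, so softmax sends them to ~0.
            scores = [s if mask[i][j] else -1e9 for j, s in enumerate(scores)]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors.
        output.append([sum(w[j] * V[j][c] for j in range(len(V)))
                       for c in range(len(V[0]))])
    return output, weights

def causal_mask(n):
    """Lower-triangular mask: position i may attend to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sine on even dimensions, cosine on odd ones, with geometrically spaced frequencies."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

# Usage: self-attention over three 2-dimensional tokens with a causal mask.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(X, X, X, mask=causal_mask(3))
pe = sinusoidal_positional_encoding(seq_len=3, d_model=4)
```

With the causal mask, the first token can only attend to itself, so its output row equals its own value vector; each row of `attn` sums to 1.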