Mirror of https://github.com/MLSysBook/TinyTorch.git (synced 2026-05-07 18:42:33 -05:00)
CHANGED: Simplified attention module to focus on core concepts
- Remove multi-head attention, positional encoding, layer norm, transformer block
- Keep only: scaled_dot_product_attention, SelfAttention, masking utilities
- Reduce complexity from ⭐⭐⭐⭐ to ⭐⭐⭐ (matches CNN level)
- Cut from 885 lines to ~440 lines (aligned with other modules)
- Update dependencies: only requires tensor (not layers/activations/networks)
- Change pedagogical framework: 'Build → Use → Understand' (not Reflect)
- Focus on single concept per module (following established TinyTorch pattern)

RESULT: Clean, focused attention module teaching core mechanism
- Students master fundamental attention before advanced concepts
- Consistent with TinyTorch's one-concept-per-module approach
- Foundation for future multi-head attention and transformer modules
- All tests passing (100% success rate)
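
For context, a minimal sketch of the core mechanism the module keeps (scaled_dot_product_attention with an optional mask). This is an illustrative NumPy version, not the actual TinyTorch implementation; the signature, mask convention, and the `causal_mask` example are assumptions for demonstration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Core attention: softmax(Q K^T / sqrt(d_k)) V  (sketch, not the TinyTorch API).

    Q, K: arrays of shape (seq_len, d_k); V: (seq_len, d_v).
    mask: optional boolean (seq_len, seq_len) array; False blocks a position.
    """
    d_k = Q.shape[-1]
    # Similarity between every query and every key, scaled by sqrt(d_k)
    # so the softmax does not saturate as d_k grows.
    scores = Q @ K.T / np.sqrt(d_k)
    if mask is not None:
        # Blocked positions get a large negative score, so softmax gives them ~0 weight.
        scores = np.where(mask, scores, -1e9)
    # Row-wise softmax: attention weights over keys for each query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted mixture of the value rows.
    return weights @ V

# Self-attention usage: queries, keys, and values all come from the same input,
# here with a causal (lower-triangular) mask for a 4-token sequence.
x = np.random.randn(4, 8)
causal_mask = np.tril(np.ones((4, 4), dtype=bool))
out = scaled_dot_product_attention(x, x, x, mask=causal_mask)
```

The SelfAttention class described in the commit would wrap this function with learned Q/K/V projections; that wrapper and the masking utilities are left to the module itself.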