mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-05-04 21:27:31 -05:00
Clean up repository: remove temp files, organize modules, prepare for PyPI publication
- Removed temporary test files and audit reports - Deleted backup and temp_holding directories - Reorganized module structure (07->09 spatial, 09->07 dataloader) - Added new modules: 11-14 (tokenization, embeddings, attention, transformers) - Updated examples with historical ML milestones - Cleaned up documentation structure
This commit is contained in:
33
modules/13_attention/module.yaml
Normal file
33
modules/13_attention/module.yaml
Normal file
@@ -0,0 +1,33 @@
name: "Attention"
number: 13
description: "Scaled dot-product and multi-head attention mechanisms that enable transformer architectures"
learning_objectives:
  - "Implement scaled dot-product attention with proper masking and numerical stability"
  - "Build multi-head attention with parallel head processing and output projection"
  - "Design KV-cache systems for efficient autoregressive generation"
  - "Understand attention's O(N²) scaling and memory optimization techniques"
  - "Analyze attention performance bottlenecks and production optimization strategies"

prerequisites:
  - "02_tensor"
  - "12_embeddings"

exports:
  - "ScaledDotProductAttention"
  - "MultiHeadAttention"
  - "KVCache"
  - "AttentionProfiler"

systems_concepts:
  - "Quadratic memory scaling O(N²) with sequence length"
  - "Memory-bandwidth bound attention computation"
  - "KV-cache optimization for autoregressive generation"
  - "Multi-head parallelization and hardware optimization"
  - "Attention masking patterns and causal dependencies"

ml_systems_focus: "Attention memory scaling, generation efficiency optimization, sequence length limitations"

estimated_time: "5-6 hours"

next_modules:
  - "14_transformers"
Reference in New Issue
Block a user