# Module 16: TinyGPT - Language Models

*From Vision to Language: Building GPT-style transformers with TinyTorch*
## Learning Objectives
By the end of this module, you will:
- Build GPT-style transformer models using TinyTorch Dense layers and attention mechanisms
- Understand character-level tokenization and its role in language model training
- Implement multi-head attention that enables models to focus on different parts of sequences
- Create complete transformer blocks with layer normalization and residual connections
- Train autoregressive language models that generate coherent text sequences
- Apply ML Systems thinking to understand framework reusability across vision and language
## What Makes This Special
This module demonstrates the power of TinyTorch's foundation by extending it from vision to language models:
- ~70% component reuse: Dense layers, optimizers, training loops, loss functions
- Strategic additions: Only what's essential for language - attention, tokenization, generation
- Educational clarity: See how the same mathematical foundations power both domains
- Framework thinking: Understand why successful ML frameworks support multiple modalities
## Components Implemented

### Core Language Processing
- `CharTokenizer`: Character-level tokenization with vocabulary management
- `PositionalEncoding`: Sinusoidal position embeddings for sequence order
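The two components above can be sketched in plain NumPy. The README does not show TinyTorch's actual interfaces, so the class name reuse and the method signatures below (`encode`/`decode`, `sinusoidal_positions`) are assumptions for illustration, not the module's real API:

```python
import numpy as np

class CharTokenizer:
    """Minimal character-level tokenizer: one integer id per unique character."""
    def __init__(self, text):
        self.vocab = sorted(set(text))                    # deterministic vocabulary
        self.stoi = {ch: i for i, ch in enumerate(self.vocab)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    @property
    def vocab_size(self):
        return len(self.vocab)

    def encode(self, text):
        return [self.stoi[ch] for ch in text]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)

def sinusoidal_positions(seq_len, d_model):
    """Sinusoidal position embeddings (even dims get sin, odd dims get cos)."""
    pos = np.arange(seq_len)[:, None]                     # (seq_len, 1)
    dim = np.arange(0, d_model, 2)[None, :]               # (1, d_model // 2)
    angle = pos / (10000 ** (dim / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe
```

Because every character maps to exactly one id, `decode(encode(text))` round-trips losslessly for any text built from the training vocabulary.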
### Attention Mechanisms
- `MultiHeadAttention`: Parallel attention heads for capturing different relationships
- `SelfAttention`: Simplified attention for easier understanding
- `CausalMasking`: Preventing attention to future tokens in autoregressive models
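The interaction between self-attention and causal masking can be shown in one function. This is a single-head NumPy sketch, not TinyTorch's implementation; the weight-matrix arguments are an assumed calling convention:

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product attention with a causal mask.

    x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head) projections.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d_head = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_head)                    # (seq_len, seq_len)
    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)
    # Row-wise softmax, stabilized by subtracting each row's max.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

Multi-head attention runs several such heads with separate projections and concatenates their outputs; the mask is what makes the model autoregressive, since token 0's output can never depend on later tokens.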
### Transformer Architecture
- `LayerNorm`: Normalization for stable transformer training
- `TransformerBlock`: Complete transformer layer with attention + feedforward
- `TinyGPT`: Full GPT-style model with embedding, positional encoding, and generation
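How layer normalization and residual connections compose into a block can be sketched as follows. The pre-norm ordering (`x + Attn(LN(x))`, GPT-2 style) is an assumption, since the README does not say which variant TinyTorch uses:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each token's features to zero mean / unit variance, then scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def transformer_block(x, attn_fn, ffn_fn, params):
    """Pre-norm transformer block: x + Attn(LN(x)), then x + FFN(LN(x)).

    attn_fn / ffn_fn: callables mapping (seq_len, d_model) -> (seq_len, d_model).
    params: (gamma1, beta1, gamma2, beta2) for the two LayerNorms.
    """
    g1, b1, g2, b2 = params
    x = x + attn_fn(layer_norm(x, g1, b1))    # residual around attention
    x = x + ffn_fn(layer_norm(x, g2, b2))     # residual around feedforward
    return x
```

The residual additions mean each sub-layer only has to learn a correction to the identity, which is what keeps deep stacks of these blocks trainable.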
### Training Infrastructure
- `LanguageModelLoss`: Cross-entropy loss with proper target shifting
- `LanguageModelTrainer`: Training loops optimized for text sequences
- `TextGeneration`: Autoregressive sampling for coherent text generation
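"Target shifting" is the core trick of the loss above: the logits at position t are scored against the token at position t+1. A minimal NumPy sketch (the function name and signature are illustrative, not TinyTorch's API):

```python
import numpy as np

def lm_loss(logits, token_ids):
    """Next-token cross-entropy with target shifting.

    logits: (seq_len, vocab_size) model outputs for the input sequence.
    token_ids: (seq_len,) the input token ids themselves.
    """
    inputs = logits[:-1]          # predictions made at positions 0 .. T-2
    targets = token_ids[1:]       # the token each of those positions should predict
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = inputs - inputs.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Average negative log-likelihood of the correct next tokens.
    return -log_probs[np.arange(len(targets)), targets].mean()
```

A quick sanity check: with all-zero logits the model is uniform over the vocabulary, so the loss equals `log(vocab_size)` exactly; a trained model should drive the loss well below that baseline.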
## Key Insights
- Framework Reusability: TinyTorch's Dense layers work unchanged inside language models
- Attention Innovation: the main architectural addition language needs beyond vision is the attention mechanism
- Sequence Modeling: language requires tracking order and context across long sequences
- Autoregressive Generation: language models predict one token at a time, conditioning each new token on everything generated so far
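The autoregressive loop in the last point is short enough to write out in full. This sketch treats the model as an opaque callable; the `model_fn` interface and temperature parameter are assumptions for illustration:

```python
import numpy as np

def generate(model_fn, prompt_ids, max_new_tokens, temperature=1.0, rng=None):
    """Sample tokens one at a time, feeding each prediction back as input.

    model_fn: maps a list of token ids to next-token logits of shape (vocab_size,).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model_fn(ids) / temperature
        logits = logits - logits.max()                    # stabilize softmax
        probs = np.exp(logits) / np.exp(logits).sum()
        ids.append(int(rng.choice(len(probs), p=probs)))  # sample the next token
    return ids
```

Because each sampled token re-enters the context for the next step, generation is inherently sequential: the loop runs one forward pass per new token.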
## Educational Philosophy
This module shows that vision and language models share the same foundation:
- Matrix multiplications (Dense layers)
- Nonlinear activations
- Gradient-based optimization
- Batch processing and training loops
The magic happens in the architectural patterns we add on top!
## Prerequisites
- Modules 1-11 (especially Tensor, Dense, Attention, Training)
- Understanding of sequence modeling concepts
- Familiarity with autoregressive generation
## Time Estimate
4-6 hours for complete understanding and implementation
> "Language is the most powerful tool humans have created. Now let's teach machines to wield it." - The TinyTorch Philosophy