TinyGPT Core Components
This directory contains the core components for TinyGPT, an educational implementation of a GPT-style language model built on TinyTorch foundations.
Components
tokenizer.py - Character-Level Tokenization
- CharTokenizer: Character-level tokenizer for text processing
- Key Features:
- Simple character-to-token mapping
- Vocabulary size limiting for computational efficiency
- Special tokens support (<UNK>, <PAD>)
- Batch encoding with padding/truncation
- Comprehensive text analysis capabilities
Usage:
from core.tokenizer import CharTokenizer
tokenizer = CharTokenizer(vocab_size=100)
tokenizer.fit(training_text)
tokens = tokenizer.encode("Hello, world!")
text = tokenizer.decode(tokens)
training.py - Language Model Training Infrastructure
- LanguageModelTrainer: Complete training pipeline for language models
- LanguageModelLoss: Cross-entropy loss with next-token prediction
- LanguageModelAccuracy: Accuracy metrics for language modeling
Key Features:
- Text-to-sequence data preparation
- Next-token prediction training
- Autoregressive text generation
- Training/validation splitting
- Comprehensive evaluation metrics
Usage:
from core.training import LanguageModelTrainer
from core.models import TinyGPT
model = TinyGPT(vocab_size=50, d_model=128)
trainer = LanguageModelTrainer(model, tokenizer)
history = trainer.fit(text, epochs=5, seq_length=64)
generated = trainer.generate_text("Hello", max_length=50)
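The fit call above hides the data preparation step: language model training pairs each position in a sequence with the token that follows it. A minimal NumPy sketch of that next-token windowing (standalone, not the exact LanguageModelTrainer internals):
import numpy as np

def make_next_token_batches(token_ids, seq_length):
    # Slide a window over the token stream: inputs are tokens [i, i + seq_length),
    # targets are the same window shifted one position to the right.
    inputs, targets = [], []
    for i in range(0, len(token_ids) - seq_length):
        inputs.append(token_ids[i:i + seq_length])
        targets.append(token_ids[i + 1:i + seq_length + 1])
    return np.array(inputs), np.array(targets)

# Example with a 5-token stream and seq_length=3
x, y = make_next_token_batches([7, 4, 11, 11, 14], seq_length=3)
# x[0] = [7, 4, 11] predicts y[0] = [4, 11, 11]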
attention.py - Attention Mechanisms
- MultiHeadAttention: Multi-head self-attention implementation
- SelfAttention: Simplified single-head attention
- PositionalEncoding: Sinusoidal positional embeddings
- create_causal_mask: Causal masking for autoregressive models
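The causal mask is what keeps generation autoregressive: position t may attend only to positions at or before t. A minimal NumPy sketch of scaled dot-product attention with such a mask (independent of the class signatures above, which may differ):
import numpy as np

def causal_mask(seq_len):
    # True where attention is allowed: lower triangle, including the diagonal.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (seq_len, d_k) for a single head.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = np.where(causal_mask(q.shape[0]), scores, -1e9)   # mask out future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)              # row-wise softmax
    return weights @ v

q = k = v = np.random.randn(4, 8)
out = scaled_dot_product_attention(q, k, v)   # (4, 8); row 0 attends only to position 0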
models.py - Transformer Models
- TinyGPT: Complete GPT-style transformer model
- TransformerBlock: Individual transformer layer
- LayerNorm: Layer normalization implementation
- SimpleLM: Simplified language model for comparison
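LayerNorm normalizes each token's feature vector to zero mean and unit variance before applying a learned scale and shift; that is the behavior the LayerNorm component above provides inside each transformer block. A minimal NumPy sketch of the computation (the TinyGPT class interface may differ):
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize over the feature dimension (last axis), one token at a time.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.randn(4, 128)                     # (seq_len, d_model)
y = layer_norm(x, np.ones(128), np.zeros(128))  # each row now has mean ~0 and std ~1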
Integration with TinyTorch
TinyGPT is designed to maximize reuse of existing TinyTorch components:
Reused Components (70%+):
- Dense layers for all linear transformations
- Activation functions (ReLU, Softmax)
- Loss functions (CrossEntropyLoss)
- Optimizers (Adam)
- Training infrastructure patterns
- Tensor operations
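One concrete example of this reuse: the attention layer's query, key, and value projections are ordinary dense (linear) transformations, the same operation that powers the vision models. A standalone NumPy sketch of the idea (illustrative only; the actual TinyTorch Dense API may differ):
import numpy as np

d_model = 64
rng = np.random.default_rng(0)

# Each projection is just a weight matrix, exactly what a Dense layer provides.
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(3))

x = rng.standard_normal((10, d_model))  # (seq_len, d_model) token embeddings
q, k, v = x @ w_q, x @ w_k, x @ w_v     # three Dense-style projections of the same input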
New Components for NLP:
- Multi-head attention mechanisms
- Positional encoding
- Layer normalization
- Causal masking
- Text tokenization
- Autoregressive generation
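Of these, positional encoding is the easiest to see in isolation: attention itself is order-agnostic, so each position gets a fixed sinusoidal vector added to its embedding. A minimal NumPy sketch of the standard sinusoidal formulation (the PositionalEncoding class above may organize this differently):
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    # pe[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # pe[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_encoding(seq_len=64, d_model=128)  # added to token embeddings before the first block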
Educational Benefits
- Character-Level Simplicity: Tokenization that is easy to understand, without complex subword algorithms
- Transparent Architecture: All components implemented with clear educational comments
- Component Reuse: Demonstrates how ML foundations generalize across domains
- Progressive Complexity: From simple tokenizer to full transformer model
- Mock Implementations: Works with or without TinyTorch for standalone learning
Example: Shakespeare Demo
The examples/shakespeare_demo.py script demonstrates the complete pipeline:
- Character tokenization of Shakespeare text
- TinyGPT model creation and training
- Text generation at different temperatures
- Performance analysis and comparison with vision models
This shows how the same mathematical foundations (linear layers, attention, optimization) power both computer vision and natural language processing.
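Temperature controls how the demo samples from the model's output distribution: logits are divided by the temperature before the softmax, so low values make generation greedier and high values make it more varied. A minimal NumPy sketch of temperature sampling over a vector of logits (the trainer's generate_text may implement this differently):
import numpy as np

def sample_with_temperature(logits, temperature=1.0):
    # Lower temperature sharpens the distribution; higher temperature flattens it.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return np.random.choice(len(logits), p=probs)

logits = np.array([2.0, 1.0, 0.2])
next_token = sample_with_temperature(logits, temperature=0.8)  # index of the sampled token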
File Dependencies
core/
├── tokenizer.py # Standalone, only requires numpy
├── attention.py # Uses TinyTorch Tensor and Dense (with mocks)
├── models.py # Uses attention.py and TinyTorch layers
├── training.py # Uses tokenizer.py and TinyTorch components
└── README.md # This file
Design Philosophy
TinyGPT follows the same educational philosophy as TinyTorch:
- Build → Use → Understand: Implement each component before using it
- Educational Clarity: Clear code with extensive documentation
- Minimal Dependencies: NumPy + educational implementations
- Real-World Relevance: Patterns used in production frameworks
- Component Modularity: Each piece can be understood independently
The goal is to demystify how language models work while showing how they share foundational concepts with computer vision models.