Module 16: TinyGPT - Language Models
From Vision to Language: Building GPT-style transformers with TinyTorch
Learning Objectives
By the end of this module, you will:
- Build GPT-style transformer models using TinyTorch Dense layers and attention mechanisms
- Understand character-level tokenization and its role in language model training
- Implement multi-head attention that enables models to focus on different parts of sequences
- Create complete transformer blocks with layer normalization and residual connections
- Train autoregressive language models that generate coherent text sequences
- Apply ML Systems thinking to understand framework reusability across vision and language
What Makes This Special
This module demonstrates the power of TinyTorch's foundation by extending it from vision to language models:
- ~70% component reuse: Dense layers, optimizers, training loops, loss functions
- Strategic additions: Only what's essential for language - attention, tokenization, generation
- Educational clarity: See how the same mathematical foundations power both domains
- Framework thinking: Understand why successful ML frameworks support multiple modalities
Components Implemented
Core Language Processing
- CharTokenizer: Character-level tokenization with vocabulary management
- PositionalEncoding: Sinusoidal position embeddings for sequence order
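To make the two components above concrete, here is a rough NumPy sketch of character-level tokenization and sinusoidal positional encoding. It is illustrative only; the module's actual class names, signatures, and internals may differ.

```python
import numpy as np

class CharTokenizer:
    """Map characters to integer ids and back, with the vocabulary built from a corpus."""
    def __init__(self, text):
        self.chars = sorted(set(text))                 # unique characters = vocabulary
        self.vocab_size = len(self.chars)
        self.stoi = {ch: i for i, ch in enumerate(self.chars)}
        self.itos = {i: ch for i, ch in enumerate(self.chars)}

    def encode(self, text):
        return [self.stoi[ch] for ch in text]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)

def positional_encoding(seq_len, d_model):
    """Sinusoidal position embeddings: even dimensions use sin, odd dimensions use cos."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                 # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

# Round trip: tok = CharTokenizer("hello world"); tok.decode(tok.encode("hello")) == "hello"
```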
Attention Mechanisms
- MultiHeadAttention: Parallel attention heads for capturing different relationships
- SelfAttention: A simplified attention implementation that is easier to follow than the multi-head version
- CausalMasking: Preventing attention to future tokens in autoregressive models
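The sketch below shows how these pieces fit together: multi-head self-attention with a causal mask, written in plain NumPy. It illustrates the mechanism only and is not the module's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, w_q, w_k, w_v, num_heads):
    """Multi-head self-attention with a causal mask.

    x: (seq_len, d_model); w_q / w_k / w_v: (d_model, d_model) projections.
    Assumes d_model is divisible by num_heads.
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(m):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)      # (heads, seq, seq)
    mask = np.triu(np.ones((seq_len, seq_len)), k=1).astype(bool)
    scores = np.where(mask, -1e9, scores)                    # block attention to future tokens
    weights = softmax(scores, axis=-1)

    out = weights @ v                                        # (heads, seq, d_head)
    return out.transpose(1, 0, 2).reshape(seq_len, d_model)  # merge heads back together
```

The causal mask is the only part that is specific to autoregressive language modeling: without it, this is the same attention computation you could apply to any sequence.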
Transformer Architecture
- LayerNorm: Normalization for stable transformer training
- TransformerBlock: Complete transformer layer with attention + feedforward
- TinyGPT: Full GPT-style model with embedding, positional encoding, and generation
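A compact sketch of how layer normalization, attention, and a Dense-style feedforward compose with residual connections. This uses a pre-norm arrangement for brevity; the module's TransformerBlock may be organized or parameterized differently.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize each position's features to zero mean and unit variance, then scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def transformer_block(x, attn_fn, w1, b1, w2, b2, gamma1, beta1, gamma2, beta2):
    """One transformer layer: attention sub-layer, then feedforward, each wrapped in a residual."""
    # Attention sub-layer with residual connection
    x = x + attn_fn(layer_norm(x, gamma1, beta1))
    # Position-wise feedforward: two Dense-style matmuls with a ReLU in between
    h = np.maximum(0, layer_norm(x, gamma2, beta2) @ w1 + b1)
    return x + (h @ w2 + b2)
```

Note that the feedforward half is nothing new: it is the same matmul-plus-activation pattern used by the Dense layers in the vision modules.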
Training Infrastructure
- LanguageModelLoss: Cross-entropy loss with proper target shifting
- LanguageModelTrainer: Training loops optimized for text sequences
- TextGeneration: Autoregressive sampling for coherent text generation
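The target-shifting idea behind the language model loss is simple: the logits at position t are scored against the token at position t+1. A minimal NumPy sketch (the function name and shapes here are assumptions for illustration, not the module's API):

```python
import numpy as np

def language_model_loss(logits, token_ids):
    """Cross-entropy with target shifting: position t predicts token t+1.

    logits: (seq_len, vocab_size) model outputs; token_ids: (seq_len,) input ids.
    """
    preds = logits[:-1]                  # predictions for positions 0 .. T-2
    targets = token_ids[1:]              # the "next token" at each of those positions

    # Numerically stable log-softmax over the vocabulary
    shifted = preds - preds.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

    # Average negative log-likelihood of the correct next tokens
    return -log_probs[np.arange(len(targets)), targets].mean()
```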
Key Insights
- Framework Reusability: TinyTorch's Dense layers work seamlessly for language models
- Attention Innovation: The main architectural addition when moving from vision to language is the attention mechanism
- Sequence Modeling: Language requires understanding order and context across long sequences
- Autoregressive Generation: Language models predict one token at a time, conditioning each new token on everything generated so far
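A minimal sketch of that autoregressive loop, assuming a hypothetical model_fn that returns per-position logits and a tokenizer like the CharTokenizer sketched earlier:

```python
import numpy as np

def generate(model_fn, tokenizer, prompt, max_new_tokens=100, temperature=1.0):
    """Autoregressive sampling: repeatedly predict the next token and append it to the context."""
    ids = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        logits = model_fn(ids)                    # run the model on the context so far
        next_logits = logits[-1] / temperature    # only the last position's prediction matters
        probs = np.exp(next_logits - next_logits.max())
        probs /= probs.sum()                      # softmax over the vocabulary
        next_id = np.random.choice(len(probs), p=probs)
        ids.append(int(next_id))                  # feed the sampled token back in as context
    return tokenizer.decode(ids)
```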
Educational Philosophy
This module shows that vision and language models share the same foundation:
- Matrix multiplications (Dense layers)
- Nonlinear activations
- Gradient-based optimization
- Batch processing and training loops
The magic happens in the architectural patterns we add on top!
Prerequisites
- Modules 1-11 (especially Tensor, Dense, Attention, Training)
- Understanding of sequence modeling concepts
- Familiarity with autoregressive generation
Time Estimate
4-6 hours for complete understanding and implementation
"Language is the most powerful tool humans have created. Now let's teach machines to wield it." - The TinyTorch Philosophy