github-starred/TinyTorch

Fork 0

mirror of https://github.com/MLSysBook/TinyTorch.git synced 2026-04-28 23:37:30 -05:00

Files

History

Vijay Janapa Reddi 46dfbdbf02 Clean up test files

2025-09-28 20:10:11 -04:00

README.md

IMPROVE: Make milestone examples self-contained with clear dataset handling

2025-09-26 13:53:06 -04:00

train_gpt.py

Fix all TinyTorch examples to work with current framework

2025-09-28 20:02:12 -04:00

README.md

🤖 TinyGPT (2018) - Transformer Architecture

What This Demonstrates

Complete transformer language model using YOUR TinyTorch! The architecture that powers ChatGPT, built from YOUR implementations.

Prerequisites

Complete ALL these TinyTorch modules:

Module 02 (Tensor) - Data structures
Module 03 (Activations) - ReLU
Module 04 (Layers) - Linear layers
Module 05 (Networks) - Module base class
Module 06 (Autograd) - Backprop through attention
Module 08 (Optimizers) - Adam optimizer
Module 12 (Embeddings) - Token embeddings, positional encoding
Module 13 (Attention) - Multi-head self-attention
Module 14 (Transformers) - LayerNorm, TransformerBlock

🚀 Quick Start

# Run transformer demo
python train_gpt.py

# This is a validation demo - no real training data needed

📊 Dataset Information

Demo Tokens Only

No Real Dataset: Uses random tokens for architecture validation
Purpose: Demonstrates the transformer works, not full training
No Download Required: Synthetic data only

Why No Real Dataset?

Full language model training requires:

Large text corpora (GBs of data)
Significant compute (GPU hours/days)
This example validates YOUR architecture works

🏗️ Architecture

                        Output Logits (Vocabulary Predictions)
                                    ↑
                            Output Projection
                                    ↑
                              Layer Norm
                                    ↑
                    ╔══════════════════════════════╗
                    ║   Transformer Block × 4      ║
                    ║  ┌────────────────────┐     ║
                    ║  │    Layer Norm       │     ║
                    ║  │         ↑           │     ║
                    ║  │  Feed Forward Net   │     ║
                    ║  │         ↑           │     ║
                    ║  │    Layer Norm       │     ║
                    ║  │         ↑           │     ║
                    ║  │  Multi-Head Attention│     ║
                    ║  └────────────────────┘     ║
                    ╚══════════════════════════════╝
                                    ↑
                          Positional Encoding
                                    ↑
                           Token Embeddings
                                    ↑
                             Input Tokens

📈 Demo Configuration

Vocab Size: 100 tokens (tiny for demo)
Embedding Dim: 32
Attention Heads: 4
Layers: 2 transformer blocks
Context Length: 16 tokens

💡 What Makes Transformers Special

Self-Attention

Each token can "look at" all other tokens to understand context:

"The cat sat on the [MASK]"
         ↓
   Attention looks at all words
         ↓
   "mat" (understands context!)

Key Innovations YOUR Implementation Shows

Attention: Context-aware representations
Positional Encoding: Order matters in sequences
Layer Norm: Stable deep network training
Residual Connections: Information flow through layers

📚 What You Learn

Complete transformer architecture from scratch
How attention creates contextual understanding
YOUR implementations power modern LLMs
Foundation for GPT, BERT, ChatGPT, etc.

🔬 Systems Insights

Memory: O(n²) for attention (sequence length squared)
Compute: Highly parallelizable (unlike RNNs)
Scaling: Stack more layers for more capability
YOUR Version: Core math is identical to production!

🚀 Real Training (Advanced)

To train a real language model:

Get text dataset (WikiText, BookCorpus, etc.)
Tokenize text into vocabulary
Create data loader for sequences
Train for many epochs (GPU recommended)
Generate text autoregressively

This demo validates the architecture - real training is a larger undertaking!

README.md Unescape Escape

🤖 TinyGPT (2018) - Transformer Architecture

What This Demonstrates

Prerequisites

🚀 Quick Start

📊 Dataset Information

Demo Tokens Only

Why No Real Dataset?

🏗️ Architecture

📈 Demo Configuration

💡 What Makes Transformers Special

Self-Attention

Key Innovations YOUR Implementation Shows

📚 What You Learn

🔬 Systems Insights

🚀 Real Training (Advanced)

README.md