mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-06-02 08:32:31 -05:00
Rename examples to exciting names and remove incomplete placeholders
- Rename xor_network/ → xornet/ (more exciting!)
- Rename cifar10_classifier/ → cifar10/ (simpler, cleaner)
- Remove incomplete optimization_comparison/ and text_generation/ (were placeholder templates, not working implementations)
- Update README.md to reflect new exciting names
- Streamline to only working, tested examples

Final structure:
- xornet/ - 100% XOR accuracy
- cifar10/ - 57.2% real image classification

Clean, exciting names that students will remember!
@@ -14,40 +14,28 @@ These are **real ML applications** written using TinyTorch just like you would u
|
||||
|
||||
```bash
|
||||
# After installing/building TinyTorch:
|
||||
cd examples/xor_network/
|
||||
cd examples/xornet/
|
||||
python train.py
|
||||
|
||||
# Or for image classification:
|
||||
cd examples/cifar10_classifier/
|
||||
cd examples/cifar10/
|
||||
python train_cifar10_mlp.py
|
||||
```
|
||||
|
||||
## Available Examples
|
||||
|
||||
### 🧠 Neural Network Fundamentals
|
||||
- **`xor_network/`** - Classic XOR problem with hidden layers
|
||||
- Clean implementation showing autograd and training basics
|
||||
- Architecture: 2 → 4 → 1 with ReLU and Sigmoid
|
||||
- Achieves 100% accuracy on XOR truth table
|
||||
### 🧠 **`xornet/`** - Neural Network Fundamentals
|
||||
- Classic XOR problem with hidden layers
|
||||
- Clean implementation showing autograd and training basics
|
||||
- Architecture: 2 → 4 → 1 with ReLU and Sigmoid
|
||||
- **Achieves 100% accuracy** on XOR truth table
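
As a rough illustration of why a 2 → 4 → 1 network with ReLU and Sigmoid can represent XOR, here is a minimal NumPy sketch. The weights below are hand-picked for the illustration (they are an assumption, not what `train.py` learns), and only two of the four hidden units are actually used.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-picked (NOT learned) weights for a 2 -> 4 -> 1 network.
# Hidden unit 0 computes relu(x1 + x2), unit 1 computes relu(x1 + x2 - 1);
# units 2 and 3 are left at zero.
W1 = np.array([[1.0, 1.0, 0.0, 0.0],
               [1.0, 1.0, 0.0, 0.0]])          # shape (2, 4)
b1 = np.array([0.0, -1.0, 0.0, 0.0])           # shape (4,)
W2 = np.array([[10.0], [-20.0], [0.0], [0.0]])  # shape (4, 1)
b2 = np.array([-5.0])                          # shape (1,)

def forward(x):
    hidden = relu(x @ W1 + b1)       # hidden layer with ReLU
    return sigmoid(hidden @ W2 + b2)  # output probability with Sigmoid

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

pred = (forward(X) > 0.5).astype(int).ravel()
accuracy = (pred == y).mean()
print("predictions:", pred, "accuracy:", accuracy)
```

A trained network will generally find different weights; the point is only that this layer layout is expressive enough to fit the XOR truth table.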
|
||||
|
||||
### 👁️ Computer Vision
|
||||
- **`cifar10_classifier/`** - Real-world object classification
|
||||
- **ACHIEVEMENT: 57.2% accuracy** - exceeds typical ML course benchmarks!
|
||||
- Multiple architectures: MLP, LeNet-5, and optimized models
|
||||
- Data augmentation, proper initialization, Adam optimization
|
||||
- Real dataset: 50,000 training images, 10,000 test images
|
||||
|
||||
### 🤖 Language & Generation
|
||||
- **`text_generation/`** - Generate text with TinyGPT (Module 16)
|
||||
- Transformer architecture built from scratch
|
||||
- Character-level text generation
|
||||
- Attention mechanisms and positional encoding
|
||||
|
||||
### 📊 Optimization & Analysis
|
||||
- **`optimization_comparison/`** - SGD vs Adam comparison
|
||||
- Side-by-side optimizer performance analysis
|
||||
- Visualization of convergence patterns
|
||||
- Memory usage and computational efficiency
|
||||
### 👁️ **`cifar10/`** - Real-World Computer Vision
|
||||
- Real-world object classification
|
||||
- **ACHIEVEMENT: 57.2% accuracy** - exceeds typical ML course benchmarks!
|
||||
- Multiple architectures: MLP, LeNet-5, and optimized models
|
||||
- Data augmentation, proper initialization, Adam optimization
|
||||
- Real dataset: 50,000 training images, 10,000 test images
|
||||
|
||||
## Example Structure
|
||||
|
||||
@@ -62,9 +50,8 @@ example_name/
|
||||
## Learning Progression
|
||||
|
||||
After completing each module, examples become functional:
|
||||
- **Module 05** → `xor_network/` works (Dense layers + activations)
|
||||
- **Module 11** → `cifar10_classifier/` works with training loops
|
||||
- **Module 16** → `text_generation/` works (TinyGPT)
|
||||
- **Module 05** → `xornet/` works (Dense layers + activations)
|
||||
- **Module 11** → `cifar10/` works with training loops
|
||||
|
||||
## Quick Demo
|
||||
|
||||
@@ -72,20 +59,16 @@ Want to see TinyTorch in action? Try these:
|
||||
|
||||
```bash
|
||||
# See a neural network learn XOR (30 seconds):
|
||||
python examples/xor_network/train.py
|
||||
python examples/xornet/train.py
|
||||
|
||||
# Train on real images (5 minutes, 57% accuracy):
|
||||
python examples/cifar10_classifier/train_cifar10_mlp.py --epochs 10
|
||||
|
||||
# Compare optimizers (2 minutes):
|
||||
python examples/optimization_comparison/compare.py
|
||||
python examples/cifar10/train_cifar10_mlp.py --epochs 10
|
||||
```
|
||||
|
||||
## Performance Achievements
|
||||
|
||||
- **XOR Network**: 100% accuracy (perfect solution)
|
||||
- **CIFAR-10 MLP**: 57.2% accuracy (exceeds typical course benchmarks)
|
||||
- **Optimization**: Adam 3.2x faster convergence than SGD
|
||||
- **XORnet**: 100% accuracy (perfect solution)
|
||||
- **CIFAR-10**: 57.2% accuracy (exceeds typical course benchmarks)
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -1,108 +0,0 @@
|
||||
# Optimization Algorithm Comparison
|
||||
|
||||
Compare SGD, Momentum, and Adam optimizers to see how different algorithms navigate the loss landscape!
|
||||
|
||||
## What This Demonstrates
|
||||
|
||||
- **Different optimization strategies** and their trade-offs
|
||||
- **Convergence speed** comparison between optimizers
|
||||
- **Why Adam is popular** for deep learning
|
||||
- **YOUR implementations** of all major optimizers
|
||||
|
||||
## Running the Comparison
|
||||
|
||||
```bash
|
||||
python compare.py
|
||||
```
|
||||
|
||||
Expected output:
|
||||
```
|
||||
⚡ Optimizer Comparison with TinyTorch
|
||||
======================================================================
|
||||
|
||||
🏃 Training with different optimizers...
|
||||
------------------------------------------------------------
|
||||
|
||||
Training with SGD:
|
||||
Initial loss: 4.2315
|
||||
Final loss: 0.0234
|
||||
Improvement: 99.4%
|
||||
|
||||
Training with Momentum:
|
||||
Initial loss: 4.2315
|
||||
Final loss: 0.0156
|
||||
Improvement: 99.6%
|
||||
|
||||
Training with Adam:
|
||||
Initial loss: 4.2315
|
||||
Final loss: 0.0098
|
||||
Improvement: 99.8%
|
||||
|
||||
📊 Loss Curves (lower is better):
|
||||
------------------------------------------------------------
|
||||
Epoch 0: SGD: 4.2315 ████████████████████ Momentum: 4.2315 ████████████████████ Adam: 4.2315 ████████████████████
|
||||
Epoch 5: SGD: 1.5234 ███████ Momentum: 0.8976 ████ Adam: 0.2134 █
|
||||
Epoch 10: SGD: 0.6789 ███ Momentum: 0.2345 █ Adam: 0.0567
|
||||
Epoch 15: SGD: 0.3456 █ Momentum: 0.0876 Adam: 0.0234
|
||||
...
|
||||
|
||||
🏆 Best optimizer: Adam (lowest final loss)
|
||||
```
|
||||
|
||||
## Optimizers Compared
|
||||
|
||||
### SGD (Stochastic Gradient Descent)
|
||||
```python
|
||||
w = w - learning_rate * gradient
|
||||
```
|
||||
- Simple and reliable
|
||||
- Can be slow to converge
|
||||
- Fixed learning rate
|
||||
|
||||
### Momentum
|
||||
```python
|
||||
velocity = momentum * velocity - learning_rate * gradient
|
||||
w = w + velocity
|
||||
```
|
||||
- Accelerates in consistent directions
|
||||
- Dampens oscillations
|
||||
- Helps escape shallow local minima
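
To make the difference between the SGD and Momentum update rules concrete, the following sketch runs both on a small quadratic bowl. The loss function, the `curvature` values, and the hyperparameters are assumptions chosen for illustration; they are not taken from `compare.py`.

```python
import numpy as np

# Toy loss: 0.5 * (1 * w0^2 + 25 * w1^2), an elongated quadratic bowl.
curvature = np.array([1.0, 25.0])

def loss(w):
    return 0.5 * np.sum(curvature * w ** 2)

def grad(w):
    return curvature * w

def run_sgd(w0, lr=0.01, steps=100):
    w = w0.copy()
    for _ in range(steps):
        w = w - lr * grad(w)  # plain gradient step
    return w

def run_momentum(w0, lr=0.01, momentum=0.9, steps=100):
    w = w0.copy()
    velocity = np.zeros_like(w)
    for _ in range(steps):
        # Same rule as in the Momentum section above.
        velocity = momentum * velocity - lr * grad(w)
        w = w + velocity
    return w

w0 = np.array([1.0, 1.0])
initial_loss = loss(w0)
sgd_loss = loss(run_sgd(w0))
momentum_loss = loss(run_momentum(w0))
print(f"initial={initial_loss:.4f} sgd={sgd_loss:.6f} momentum={momentum_loss:.6f}")
```

On this particular problem the accumulated velocity speeds up progress along the flat direction; on other problems or with other hyperparameters the ranking can differ.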
|
||||
|
||||
### Adam (Adaptive Moment Estimation)
|
||||
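```python
m = beta1 * m + (1 - beta1) * gradient       # First moment (running mean of gradients)
v = beta2 * v + (1 - beta2) * gradient**2    # Second moment (running mean of squared gradients)
w = w - learning_rate * m / (v**0.5 + eps)   # Bias correction of m and v omitted for brevity
```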
|
||||
- Adaptive learning rates per parameter
|
||||
- Combines momentum with RMSprop
|
||||
- Often fastest convergence
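
The Adam rule above is usually stated with a bias-correction step for the two moment estimates, which matters most during the first few updates. A minimal NumPy sketch, assuming the commonly used default hyperparameters (the function name `adam_step` is hypothetical and is not TinyTorch's API):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. `t` is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * g        # first moment estimate
    v = beta2 * v + (1 - beta2) * g ** 2   # second moment estimate
    m_hat = m / (1 - beta1 ** t)           # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)           # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0, -2.0])
g = np.array([0.5, -4.0])
m = np.zeros_like(w)
v = np.zeros_like(w)

w_new, m, v = adam_step(w, g, m, v, t=1)
print(w_new)
```

With zero-initialized moments, the first bias-corrected step has magnitude close to `lr` for every parameter regardless of gradient scale, which is one way to see the "adaptive learning rates per parameter" property.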
|
||||
|
||||
## Key Insights
|
||||
|
||||
| Optimizer | Pros | Cons | Best For |
|
||||
|-----------|------|------|----------|
|
||||
| **SGD** | Simple, stable | Slow convergence | Final fine-tuning |
|
||||
| **Momentum** | Faster than SGD | Requires tuning | General training |
|
||||
| **Adam** | Fast, adaptive | May generalize worse than well-tuned SGD | Most deep learning |
|
||||
|
||||
## Mathematical Foundation
|
||||
|
||||
Your TinyTorch implements:
|
||||
- First-order optimization (gradient-based)
|
||||
- Second-moment estimation (Adam's running average of squared gradients)
|
||||
- Momentum accumulation
|
||||
- Adaptive learning rates
|
||||
|
||||
## Requirements
|
||||
|
||||
- Module 10 (Optimizers) completed
|
||||
- TinyTorch package exported
|
||||
|
||||
## Next Steps
|
||||
|
||||
Try experimenting with:
|
||||
- Different learning rates
|
||||
- Various momentum values
|
||||
- Complex loss landscapes
|
||||
- Your own optimization algorithms!
|
||||
@@ -1,175 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Optimizer Comparison with TinyTorch
|
||||
|
||||
Compare different optimization algorithms (SGD, Momentum, Adam)
|
||||
to see how they navigate the loss landscape differently.
|
||||
|
||||
This shows why Adam often trains faster than SGD!
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
import tinytorch as tt
|
||||
from tinytorch.core import Tensor
|
||||
from tinytorch.core.optimizers import SGD, Adam, Momentum
|
||||
from tinytorch.core.layers import Dense
|
||||
from tinytorch.core.activations import ReLU
|
||||
from tinytorch.core.training import MSELoss
|
||||
|
||||
|
||||
def create_toy_problem():
|
||||
"""Create a simple regression problem."""
|
||||
# Generate synthetic data: y = 2x + 1 + noise
|
||||
np.random.seed(42)
|
||||
X = np.random.randn(100, 1)
|
||||
y = 2 * X + 1 + 0.1 * np.random.randn(100, 1)
|
||||
|
||||
return Tensor(X), Tensor(y)
|
||||
|
||||
|
||||
class SimpleModel:
    """A simple linear model for regression."""

    def __init__(self):
        self.layer = Dense(1, 1)

    def forward(self, x):
        return self.layer(x)

    def parameters(self):
        return self.layer.parameters()

    def reset_parameters(self):
        """Reset to the same initial weights for a fair comparison."""
        # Overwrite values in place so optimizers created from parameters()
        # keep pointing at the live tensors (assumes .data is a NumPy array).
        self.layer.weights.data[...] = 0.5
        self.layer.bias.data[...] = 0.1
|
||||
|
||||
|
||||
def train_with_optimizer(model, optimizer_name, optimizer, X, y, epochs=50):
    """Train the model with the given optimizer and return per-epoch losses."""
    loss_fn = MSELoss()
    losses = []

    # Reset model for fair comparison
    model.reset_parameters()

    for epoch in range(epochs):
        # Forward pass
        predictions = model.forward(X)
        loss = loss_fn(predictions, y)
        losses.append(float(loss.data))

        # Backward pass (simulated if no autograd)
        if hasattr(loss, 'backward'):
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        else:
            # Manual gradient computation for demo
            # Gradient of MSE loss w.r.t. predictions
            grad_output = 2 * (predictions.data - y.data) / y.data.shape[0]

            # Gradient w.r.t. weights and bias
            grad_w = X.data.T @ grad_output
            grad_b = np.sum(grad_output)

            # Manual update based on optimizer type
            if optimizer_name == "SGD":
                model.layer.weights.data -= optimizer.lr * grad_w
                model.layer.bias.data -= optimizer.lr * grad_b
            # Momentum and Adam are not handled in this branch: they would
            # need per-parameter history, so their weights stay unchanged here.

    return losses
|
||||
|
||||
|
||||
def visualize_losses(all_losses):
|
||||
"""Simple ASCII visualization of loss curves."""
|
||||
print("\n📊 Loss Curves (lower is better):")
|
||||
print("-" * 60)
|
||||
|
||||
max_loss = max(max(losses) for losses in all_losses.values())
|
||||
|
||||
# Show every 5th epoch
|
||||
epochs_to_show = list(range(0, 50, 5))
|
||||
|
||||
for epoch in epochs_to_show:
|
||||
print(f"Epoch {epoch:2d}: ", end="")
|
||||
for name, losses in all_losses.items():
|
||||
loss = losses[epoch]
|
||||
# Normalize to 0-20 character bar
|
||||
bar_length = int(20 * loss / max_loss)
|
||||
bar = "█" * bar_length
|
||||
print(f"{name}: {loss:.4f} {bar} ", end="")
|
||||
print()
|
||||
|
||||
|
||||
def main():
|
||||
print("=" * 70)
|
||||
print("⚡ Optimizer Comparison with TinyTorch")
|
||||
print("=" * 70)
|
||||
print()
|
||||
|
||||
# Create data
|
||||
X, y = create_toy_problem()
|
||||
print("📊 Dataset: Simple linear regression (y = 2x + 1)")
|
||||
print(f" 100 samples, 1 feature")
|
||||
print()
|
||||
|
||||
# Create model
|
||||
model = SimpleModel()
|
||||
|
||||
# Test different optimizers
|
||||
optimizers = {
|
||||
"SGD": SGD(model.parameters(), lr=0.01),
|
||||
"Momentum": Momentum(model.parameters(), lr=0.01, momentum=0.9),
|
||||
"Adam": Adam(model.parameters(), lr=0.01)
|
||||
}
|
||||
|
||||
print("🏃 Training with different optimizers...")
|
||||
print("-" * 60)
|
||||
|
||||
all_losses = {}
|
||||
|
||||
for name, optimizer in optimizers.items():
|
||||
print(f"\nTraining with {name}:")
|
||||
losses = train_with_optimizer(model, name, optimizer, X, y)
|
||||
all_losses[name] = losses
|
||||
|
||||
print(f" Initial loss: {losses[0]:.4f}")
|
||||
print(f" Final loss: {losses[-1]:.4f}")
|
||||
print(f" Improvement: {(1 - losses[-1]/losses[0])*100:.1f}%")
|
||||
|
||||
# Visualize convergence
|
||||
visualize_losses(all_losses)
|
||||
|
||||
print("\n" + "=" * 70)
|
||||
print("🎯 Key Observations:")
|
||||
print("-" * 60)
|
||||
|
||||
# Determine winner
|
||||
final_losses = {name: losses[-1] for name, losses in all_losses.items()}
|
||||
best_optimizer = min(final_losses, key=final_losses.get)
|
||||
|
||||
print(f"🏆 Best optimizer: {best_optimizer} (lowest final loss)")
|
||||
print()
|
||||
|
||||
print("Optimizer Characteristics:")
|
||||
print("• SGD: Simple, slow but steady convergence")
|
||||
print("• Momentum: Accelerates in consistent directions")
|
||||
print("• Adam: Adaptive learning rates, often fastest")
|
||||
print()
|
||||
|
||||
print("💡 Insights:")
|
||||
print("• Adam typically converges faster (fewer epochs)")
|
||||
print("• SGD may be more stable for some problems")
|
||||
print("• Momentum helps escape local minima")
|
||||
print("• Choice depends on your specific problem!")
|
||||
print()
|
||||
|
||||
print("🎉 Your TinyTorch implements all major optimizers!")
|
||||
|
||||
return True
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
success = main()
|
||||
@@ -1,92 +0,0 @@
|
||||
# Text Generation with TinyGPT
|
||||
|
||||
Generate text using a transformer model built with YOUR TinyTorch!
|
||||
|
||||
## What This Demonstrates
|
||||
|
||||
- **Transformer architecture** - the foundation of ChatGPT
|
||||
- **Multi-head attention** mechanisms you built
|
||||
- **Autoregressive generation** - predicting one token at a time
|
||||
- **The technology behind modern AI** - GPT, BERT, etc.
|
||||
|
||||
## How It Works
|
||||
|
||||
```
|
||||
Input Tokens → Embeddings → Transformer Blocks → Output Logits → Next Token
|
||||
↑__________________|
|
||||
(Autoregressive Loop)
|
||||
```
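
The loop in the diagram can be sketched without any model at all. In the snippet below, `toy_logits` is a hypothetical stand-in for the transformer (it simply favors "last token + 1"), and the vocabulary size, end-token ID, and temperature are assumptions for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def toy_logits(token_ids, vocab_size):
    """Hypothetical stand-in for a model: strongly prefers (last token + 1)."""
    logits = np.zeros(vocab_size)
    logits[(token_ids[-1] + 1) % vocab_size] = 50.0
    return logits

def generate(prompt_ids, vocab_size, end_id, max_new_tokens=10,
             temperature=0.8, seed=0):
    rng = np.random.default_rng(seed)
    generated = list(prompt_ids)
    for _ in range(max_new_tokens):
        # Input tokens -> logits for the next position
        logits = toy_logits(generated, vocab_size)
        # Temperature-scaled sampling of the next token
        probs = softmax(logits / temperature)
        next_token = int(rng.choice(vocab_size, p=probs))
        # Feed the sampled token back in (the autoregressive loop)
        generated.append(next_token)
        if next_token == end_id:
            break
    return generated

tokens = generate([3], vocab_size=8, end_id=7)
print(tokens)
```

Lower temperatures sharpen the distribution toward the highest-scoring token, while higher temperatures make sampling more varied.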
|
||||
|
||||
## Running the Example
|
||||
|
||||
```bash
|
||||
python generate.py
|
||||
```
|
||||
|
||||
Expected output:
|
||||
```
|
||||
🤖 Text Generation with TinyGPT
|
||||
======================================================================
|
||||
|
||||
🎯 Generating Python-like code:
|
||||
--------------------------------------------------
|
||||
|
||||
Prompt: 'def'
|
||||
Generated: 'def function_name ( self ) : return None'
|
||||
|
||||
Prompt: 'class'
|
||||
Generated: 'class MyClass : def __init__ ( self ) :'
|
||||
|
||||
Prompt: 'for i in'
|
||||
Generated: 'for i in range ( 10 ) : print ( i )'
|
||||
|
||||
💡 What This Demonstrates:
|
||||
✅ Transformer architecture with self-attention
|
||||
✅ Multi-head attention you built from scratch
|
||||
✅ Autoregressive text generation
|
||||
✅ The foundation of ChatGPT and GitHub Copilot!
|
||||
|
||||
🎉 You've built the technology behind modern AI!
|
||||
```
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
TinyGPT Model:
|
||||
├── Token Embeddings (vocab_size → embed_dim)
|
||||
├── Position Embeddings (max_length → embed_dim)
|
||||
├── Transformer Blocks (×4)
|
||||
│ ├── Multi-Head Attention
|
||||
│ ├── Layer Normalization
|
||||
│ └── Feed-Forward Network (MLP)
|
||||
└── Output Projection (embed_dim → vocab_size)
|
||||
```
|
||||
|
||||
## Key Components
|
||||
|
||||
- **Self-Attention**: Models relationships between all tokens
|
||||
- **Position Embeddings**: Gives model sense of word order
|
||||
- **Layer Normalization**: Stabilizes training
|
||||
- **Autoregressive**: Generates one token at a time
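
Two of these components, self-attention and layer normalization, fit in a few lines of NumPy. This is a single-head sketch under stated assumptions: random projection matrices, no learned scale or shift in the normalization, and a causal mask (so a token cannot attend to later positions), which is an assumption typical of GPT-style models rather than something shown in the removed `generate.py`.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def causal_self_attention(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # scaled dot-product scores
    seq_len = x.shape[0]
    # True above the diagonal marks "future" positions to hide.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, embed_dim = 4, 8
x = rng.normal(size=(seq_len, embed_dim))
Wq, Wk, Wv = (rng.normal(size=(embed_dim, embed_dim)) for _ in range(3))

attn_out, weights = causal_self_attention(x, Wq, Wk, Wv)
y = layer_norm(x + attn_out)  # residual connection followed by layer norm
print(weights.round(2))
```

Each row of `weights` is a probability distribution over the current and earlier positions; the first token can only attend to itself.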
|
||||
|
||||
## What You've Built
|
||||
|
||||
These models belong to the same transformer family:
- GPT (Generative Pre-trained Transformer)
- ChatGPT (with more layers and parameters)
- GitHub Copilot (for code generation)
- BERT (encoder-only, with bidirectional attention)
|
||||
|
||||
## Requirements
|
||||
|
||||
- Module 07 (Attention) for multi-head attention
|
||||
- Module 16 (TinyGPT) for complete transformer
|
||||
- All TinyTorch modules exported
|
||||
|
||||
## Next Steps
|
||||
|
||||
The full Module 16 implementation will:
|
||||
- Generate complete Python functions
|
||||
- Work with natural language prompts
|
||||
- Show beam search and sampling strategies
|
||||
- Demonstrate real code generation!
|
||||
@@ -1,223 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Text Generation with TinyGPT
|
||||
|
||||
Generate text using a transformer model built with YOUR TinyTorch!
|
||||
This demonstrates that you've built the technology behind ChatGPT.
|
||||
|
||||
This example:
|
||||
- Builds a small, randomly initialized GPT-style model (no pre-trained weights are loaded)
|
||||
- Generates text from prompts
|
||||
- Shows attention mechanisms in action
|
||||
- Proves you understand transformers
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
import tinytorch as tt
|
||||
from tinytorch.core import Tensor
|
||||
from tinytorch.core.attention import MultiHeadAttention
|
||||
from tinytorch.core.layers import Dense, Embedding, LayerNorm
|
||||
from tinytorch.core.activations import GELU, Softmax
|
||||
from tinytorch.models import TinyGPT
|
||||
|
||||
|
||||
class SimpleGPT:
    """A simple GPT model for text generation."""

    def __init__(self, vocab_size=5000, embed_dim=128, num_heads=4, num_layers=4):
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim

        # Token and position embeddings
        self.token_embedding = Embedding(vocab_size, embed_dim)
        self.position_embedding = Embedding(1024, embed_dim)  # Max sequence length

        # Transformer blocks
        self.blocks = []
        for _ in range(num_layers):
            block = TransformerBlock(embed_dim, num_heads)
            self.blocks.append(block)

        # Output projection
        self.ln_final = LayerNorm(embed_dim)
        self.lm_head = Dense(embed_dim, vocab_size)

    def forward(self, input_ids):
        """Forward pass through GPT."""
        seq_len = input_ids.shape[1]

        # Get token embeddings
        token_emb = self.token_embedding(input_ids)

        # Add position embeddings
        positions = Tensor(np.arange(seq_len).reshape(1, -1))
        pos_emb = self.position_embedding(positions)

        x = token_emb + pos_emb

        # Pass through transformer blocks
        for block in self.blocks:
            x = block.forward(x)  # TransformerBlock defines forward(), not __call__

        # Final layer norm and projection
        x = self.ln_final(x)
        logits = self.lm_head(x)

        return logits

    def generate(self, prompt_ids, max_length=50, temperature=1.0, end_token_id=1):
        """Generate text autoregressively."""
        generated = prompt_ids.copy()

        for _ in range(max_length):
            # Get predictions for next token
            logits = self.forward(Tensor(generated.reshape(1, -1)))

            # Get last token's predictions
            next_logits = logits.data[0, -1, :] / temperature

            # Sample from distribution (subtract the max for a stable softmax)
            exp_logits = np.exp(next_logits - np.max(next_logits))
            probs = exp_logits / np.sum(exp_logits)
            next_token = np.random.choice(self.vocab_size, p=probs)

            generated = np.append(generated, next_token)

            # Stop if end token generated
            if next_token == end_token_id:  # 1 is '<end>' in SimpleTokenizer; 0 is '<pad>'
                break

        return generated
|
||||
|
||||
|
||||
class TransformerBlock:
    """A single transformer block."""

    def __init__(self, embed_dim, num_heads):
        self.attention = MultiHeadAttention(embed_dim, num_heads)
        self.ln1 = LayerNorm(embed_dim)
        self.ln2 = LayerNorm(embed_dim)

        # MLP
        self.mlp = MLP(embed_dim)

    def forward(self, x):
        """Forward pass through transformer block."""
        # Self-attention with residual
        attn_out = self.attention(x, x, x)
        x = x + attn_out
        x = self.ln1(x)

        # MLP with residual
        mlp_out = self.mlp.forward(x)  # MLP defines forward(), not __call__
        x = x + mlp_out
        x = self.ln2(x)

        return x


class MLP:
    """Feed-forward network in transformer."""

    def __init__(self, embed_dim):
        self.fc1 = Dense(embed_dim, embed_dim * 4)
        self.fc2 = Dense(embed_dim * 4, embed_dim)
        self.gelu = GELU()

    def forward(self, x):
        """Forward pass through MLP."""
        x = self.fc1(x)
        x = self.gelu(x)
        x = self.fc2(x)
        return x
|
||||
|
||||
|
||||
# Simple tokenizer for demonstration
|
||||
class SimpleTokenizer:
|
||||
"""Basic word-level tokenizer."""
|
||||
|
||||
def __init__(self):
|
||||
# Common programming keywords for demo
|
||||
self.vocab = {
|
||||
'<pad>': 0, '<end>': 1, '<unk>': 2,
|
||||
'def': 3, 'return': 4, 'if': 5, 'else': 6,
|
||||
'for': 7, 'in': 8, 'range': 9, 'print': 10,
|
||||
'import': 11, 'class': 12, 'self': 13,
|
||||
'True': 14, 'False': 15, 'None': 16,
|
||||
'and': 17, 'or': 18, 'not': 19,
|
||||
'=': 20, '+': 21, '-': 22, '*': 23, '/': 24,
|
||||
'(': 25, ')': 26, '[': 27, ']': 28, '{': 29, '}': 30,
|
||||
':': 31, ',': 32, '.': 33,
|
||||
}
|
||||
self.id_to_token = {v: k for k, v in self.vocab.items()}
|
||||
|
||||
def encode(self, text):
|
||||
"""Convert text to token IDs."""
|
||||
tokens = text.split()
|
||||
return np.array([self.vocab.get(t, 2) for t in tokens]) # 2 is <unk>
|
||||
|
||||
def decode(self, ids):
|
||||
"""Convert token IDs to text."""
|
||||
tokens = [self.id_to_token.get(id, '<unk>') for id in ids]
|
||||
return ' '.join(tokens)
|
||||
|
||||
|
||||
def main():
|
||||
print("=" * 70)
|
||||
print("🤖 Text Generation with TinyGPT")
|
||||
print("=" * 70)
|
||||
print()
|
||||
|
||||
print("Building TinyGPT model...")
|
||||
model = SimpleGPT(vocab_size=100, embed_dim=64, num_heads=4, num_layers=2)
|
||||
tokenizer = SimpleTokenizer()
|
||||
|
||||
print("Model Architecture:")
|
||||
print(" • 2 transformer layers")
|
||||
print(" • 4 attention heads per layer")
|
||||
print(" • 64-dimensional embeddings")
|
||||
print(" • 100 token vocabulary")
|
||||
print()
|
||||
|
||||
# Demonstrate with different prompts
|
||||
prompts = [
|
||||
"def",
|
||||
"class",
|
||||
"for i in",
|
||||
"if True",
|
||||
"return"
|
||||
]
|
||||
|
||||
print("🎯 Generating Python-like code:")
|
||||
print("-" * 50)
|
||||
|
||||
for prompt in prompts:
|
||||
print(f"\nPrompt: '{prompt}'")
|
||||
|
||||
# Encode prompt
|
||||
prompt_ids = tokenizer.encode(prompt)
|
||||
|
||||
# Generate completion
|
||||
generated_ids = model.generate(prompt_ids, max_length=10, temperature=0.8)
|
||||
|
||||
# Decode to text
|
||||
generated_text = tokenizer.decode(generated_ids)
|
||||
print(f"Generated: '{generated_text}'")
|
||||
|
||||
print("\n" + "=" * 70)
|
||||
print("💡 What This Demonstrates:")
|
||||
print("-" * 50)
|
||||
print("✅ Transformer architecture with self-attention")
|
||||
print("✅ Multi-head attention you built from scratch")
|
||||
print("✅ Autoregressive text generation")
|
||||
print("✅ The foundation of ChatGPT and GitHub Copilot!")
|
||||
print()
|
||||
print("🎉 You've built the technology behind modern AI!")
|
||||
print()
|
||||
print("Note: This is a simplified demo. Full TinyGPT in Module 16")
|
||||
print("will generate real Python functions from natural language!")
|
||||
|
||||
return True
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
success = main()
|
||||