🏗️ Restructure milestones with decade-based naming

- Rename to clean, focused convention: 01_1957_perceptron, 02_1969_xor, etc.
- Drop dramatic language (crisis, revival, revolution, era)
- 06_2018_mlperf → 06_2020_scaling (matches GPT-3 scale era)
- Tells clear story: 1950s → 2020s ML evolution
- Each milestone represents major architectural/systems shift
- Remove redundant step1/2/3 files from transformer milestone
Vijay Janapa Reddi
2025-10-27 13:00:06 -04:00
parent 0ae627d0ea
commit c4d5e4ebf8
19 changed files with 45 additions and 1492 deletions


@@ -1,185 +0,0 @@
# 🎓 TinyTorch Capstone Project Ideas
## **Background: The Capstone Design Problem**
**Original Issue**: Module 20 was "TinyGPT Capstone", but students can already build TinyGPT after Module 13 (Transformers). As a result:
- Modules 14-19 (optimization) felt like "optional extras"
- Module 20 was anticlimactic ("TinyGPT again?")
- Crucial systems engineering skills were never integrated
**Solution Requirements**:
- Must integrate ALL modules 1-19 (especially optimization modules 14-19)
- Must be genuinely exciting and different
- Must demonstrate complete ML systems engineering mastery
- Must create portfolio-worthy deliverables
---
## **🏆 RECOMMENDED: AI Olympics Competition**
**📁 See: [ai-olympics.md](ai-olympics.md)**
**Core Concept**: Competitive leaderboard where students optimize TinyTorch models across systems engineering dimensions.
**Why This is Best**:
- **Natural motivation**: Students want to rank high on leaderboards
- **Systems focus**: Compete on speed, memory, efficiency - not just accuracy
- **Community building**: Creates ongoing engagement and peer interaction
- **Portfolio impact**: "I ranked #3 in TinyTorch AI Olympics" is compelling
- **Forces optimization**: ALL modules 14-19 become essential for competitive performance
**Competition Categories**:
- 🏃‍♂️ **Speed Demon**: Fastest inference
- 💾 **Memory Miser**: Smallest memory footprint
- 📱 **Edge Expert**: Best Raspberry Pi performance
- 🔋 **Energy Efficient**: Lowest power consumption
- 🏆 **TinyMLPerf**: Overall benchmark champion
---
## **🛠️ Alternative Ideas Considered**
### **1. Edge AI Deployment System**
**Concept**: Deploy optimized neural networks to actual edge hardware (Raspberry Pi)
**Pros**:
- Integrates all optimization modules (essential for edge constraints)
- Creates tangible deliverable ("I run neural networks on a $35 computer")
- Teaches real-world deployment challenges
**Cons**:
- Individual project (no community/competition aspect)
- Hardware dependencies (students need Pi)
- Less motivating than competition
### **2. Multi-Modal AI Assistant**
**Concept**: Combine vision (CNNs) + language (transformers) + optimization for real-time performance
**Pros**:
- Showcases multiple architectures working together
- Demonstrates practical AI applications
- Requires optimization for real-time performance
**Cons**:
- Complex scope potentially overwhelming
- Optimization feels secondary to "getting it working"
- Limited portfolio differentiation
### **3. ML Performance Laboratory**
**Concept**: Comprehensive benchmarking suite comparing different ML frameworks
**Pros**:
- Heavy focus on profiling and benchmarking skills
- Creates useful tool for community
- Deep systems engineering focus
**Cons**:
- More about measurement than optimization
- Limited creative expression for students
- May feel academic rather than practical
### **4. Neural Architecture Search**
**Concept**: Automated model design and optimization system
**Pros**:
- Cutting-edge research area
- Requires sophisticated optimization
- Highly technical achievement
**Cons**:
- Very advanced, may be beyond course scope
- Optimization becomes means rather than end
- Difficult to assess fairly
### **5. Distributed Training System**
**Concept**: Multi-GPU/multi-node training infrastructure
**Pros**:
- Advanced systems engineering skills
- High industry relevance
- Impressive technical achievement
**Cons**:
- Requires expensive hardware
- Complex debugging and setup
- May overshadow core ML concepts
### **6. ML Model Marketplace**
**Concept**: Complete system for sharing/deploying/optimizing models (like Hugging Face)
**Pros**:
- Full-stack systems engineering
- Practical deployment focus
- Creates useful community resource
**Cons**:
- Web development skills needed
- Broad scope potentially unfocused
- Less emphasis on optimization techniques
---
## **📊 Evaluation Criteria**
| Criteria | AI Olympics | Edge Deployment | Multi-Modal | ML Lab | NAS | Distributed | Marketplace |
|----------|-------------|-----------------|-------------|--------|-----|-------------|-------------|
| **Integrates All Modules** | ✅✅✅ | ✅✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| **Student Motivation** | ✅✅✅ | ✅ | ✅ | ⚠️ | ⚠️ | ⚠️ | ✅ |
| **Portfolio Impact** | ✅✅✅ | ✅✅ | ✅ | ✅ | ✅✅ | ✅✅ | ✅ |
| **Systems Engineering Focus** | ✅✅✅ | ✅✅ | ✅ | ✅✅✅ | ✅ | ✅✅✅ | ✅ |
| **Implementation Feasibility** | ✅✅ | ✅✅✅ | ✅ | ✅✅ | ⚠️ | ⚠️ | ✅ |
| **Community Building** | ✅✅✅ | ⚠️ | ⚠️ | ✅ | ⚠️ | ⚠️ | ✅✅ |
| **Scalability** | ✅✅✅ | ✅ | ✅ | ✅ | ⚠️ | ⚠️ | ✅ |
**Legend**: ✅✅✅ Excellent, ✅✅ Good, ✅ Adequate, ⚠️ Challenging
---
## **🎯 Final Recommendation**
**AI Olympics** emerges as the clear winner because it:
1. **Maximizes student motivation** through competitive leaderboards
2. **Forces integration** of ALL optimization modules (14-19)
3. **Creates lasting community** beyond individual course completion
4. **Produces compelling portfolio artifacts** (leaderboard rankings)
5. **Scales naturally** as more students participate
6. **Emphasizes systems engineering** over algorithmic implementation
### **Implementation Priority**
1. **Phase 1**: Design and build leaderboard infrastructure
2. **Phase 2**: Create standard benchmark evaluation suite
3. **Phase 3**: Deploy beta version with small student cohort
4. **Phase 4**: Full launch with all TinyTorch students
### **Success Metrics**
- **Participation Rate**: % of students who submit to multiple categories
- **Optimization Depth**: Average number of techniques applied per submission
- **Community Engagement**: Forum activity, peer collaboration, ongoing submissions
- **Portfolio Impact**: Industry feedback on graduate capabilities
---
## **📝 Notes for Implementation**
### **Technical Requirements**
- Automated submission and evaluation pipeline
- Standard benchmark datasets and environments
- Real-time leaderboard with rich visualizations
- Robust measurement and scoring systems
### **Educational Integration**
- Clear rubrics linking competition performance to course grades
- Structured optimization process through modules 14-19
- Portfolio development guidance and templates
- Peer review and collaboration opportunities
### **Community Features**
- Student profiles and achievement tracking
- Optimization technique sharing and discussion
- Mentorship connections between high performers and struggling students
- Industry guest judging and feedback
---
**🚀 The AI Olympics transforms TinyTorch from "just another ML course" into a competitive systems engineering community that motivates deep learning, creates lasting engagement, and produces industry-ready graduates.**


@@ -1,227 +0,0 @@
# 🏅 AI Olympics: TinyTorch Systems Competition Capstone
## **Core Concept: Compete on Systems Performance, Not Just Accuracy**
Instead of individual projects, Module 20 becomes a **competitive leaderboard** where students optimize their TinyTorch models across multiple **systems engineering dimensions**.
### **🎯 Why AI Olympics is Perfect for TinyTorch**
- **Systems Focus**: Compete on memory, speed, efficiency - not just accuracy
- **Real ML Engineering**: Production systems care about performance, not just "does it work"
- **Leaderboard Motivation**: Students naturally want to rank high and beat peers
- **Portfolio Value**: "I ranked #3 in TinyTorch AI Olympics" is impressive
- **Community Building**: Creates ongoing engagement beyond the course
---
## **🏆 Competition Categories**
### **Category 1: Speed Demon** ⚡
*"Fastest inference on standard hardware"*
- **Metric**: Inferences per second on reference hardware
- **Required Skills**: Modules 14-19 optimization techniques
- **Constraint**: Must maintain >90% accuracy on test dataset
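
The Speed Demon metric could be measured roughly as follows. This is a minimal sketch, not the competition's actual harness: `measure_throughput`, `qualifies`, and `dummy_predict` are hypothetical names introduced here, and the warmup and repeat counts are arbitrary assumptions.

```python
import time


def measure_throughput(predict, inputs, warmup=5, repeats=50):
    """Return inferences per second for `predict` over `inputs` (hypothetical helper)."""
    # Warm-up passes so one-time setup costs do not distort the timing.
    for _ in range(warmup):
        for x in inputs:
            predict(x)

    start = time.perf_counter()
    for _ in range(repeats):
        for x in inputs:
            predict(x)
    elapsed = time.perf_counter() - start

    return (repeats * len(inputs)) / elapsed


def qualifies(accuracy, threshold=0.90):
    """Speed Demon constraint from the text: accuracy must stay above 90%."""
    return accuracy > threshold


# Stand-in for a real TinyTorch model's forward pass (assumption for illustration).
def dummy_predict(x):
    return sum(x)


throughput = measure_throughput(dummy_predict, [[1.0, 2.0, 3.0]] * 10)
print(f"{throughput:.1f} inf/sec, qualifies: {qualifies(0.952)}")
```

A real harness would also pin the reference hardware and batch size so that numbers are comparable across submissions.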
### **Category 2: Memory Miser** 💾
*"Smallest memory footprint"*
- **Metric**: Peak memory usage during inference
- **Required Skills**: Quantization, compression, efficient architectures
- **Constraint**: Must maintain >85% accuracy on test dataset
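
One way to approximate the Memory Miser metric is Python's standard `tracemalloc` module. The sketch below is an assumption about how peak memory might be captured, not the official measurement: `peak_memory_mb` and `dummy_inference` are hypothetical names, and `tracemalloc` only sees allocations made through Python's allocator, so native buffers may be undercounted.

```python
import tracemalloc


def peak_memory_mb(fn, *args):
    """Run fn(*args) and return the peak traced memory in MB (hypothetical helper)."""
    tracemalloc.start()
    try:
        fn(*args)
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak_bytes / (1024 * 1024)


# Stand-in for a model's inference call (assumption for illustration).
def dummy_inference(n):
    activations = [0.0] * n
    return sum(activations)


peak = peak_memory_mb(dummy_inference, 100_000)
print(f"Peak memory: {peak:.2f} MB")
```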
### **Category 3: Edge Expert** 📱
*"Best performance on Raspberry Pi"*
- **Metric**: Composite score (speed + accuracy + power efficiency)
- **Required Skills**: ALL optimization modules for edge constraints
- **Constraint**: Must actually run on Pi hardware
### **Category 4: Energy Efficient** 🔋
*"Lowest power consumption"*
- **Metric**: Energy per inference (joules/prediction)
- **Required Skills**: Model compression, efficient algorithms
- **Constraint**: Must maintain competitive accuracy
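
The joules-per-prediction metric follows from energy = average power × time. The sketch below assumes average power is read from an external meter; the function name and the sample numbers are illustrative, not measured results.

```python
def energy_per_inference(avg_power_watts, total_seconds, num_inferences):
    """Joules per prediction: (average power * elapsed time) / number of inferences."""
    if num_inferences <= 0:
        raise ValueError("num_inferences must be positive")
    return avg_power_watts * total_seconds / num_inferences


# Illustrative numbers only: 3 W average draw, 600 inferences in 10 s.
joules = energy_per_inference(avg_power_watts=3.0, total_seconds=10.0, num_inferences=600)
print(f"{joules:.3f} J/inference")
```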
### **Category 5: TinyMLPerf** 🏃‍♂️
*"Official MLPerf-style benchmark"*
- **Metric**: Standardized benchmark suite performance
- **Required Skills**: Complete systems optimization pipeline
- **Constraint**: Must pass all benchmark compliance tests
---
## **🎮 Competition Structure**
### **Phase 1: Baseline Submission (Week 1)**
- Submit working model from modules 1-13 (CNN, transformer, or multi-modal)
- Get baseline scores across all categories
- See where you rank on initial leaderboard
### **Phase 2: Optimization Sprint (Weeks 2-4)**
- Apply techniques from modules 14-19 systematically
- **Module 14**: Profile and identify bottlenecks
- **Module 15**: Implement acceleration techniques
- **Module 16**: Add quantization for memory/speed
- **Module 17**: Apply compression for size reduction
- **Module 18**: Implement caching for inference speed
- **Module 19**: Benchmark against production systems
### **Phase 3: Final Submission & Olympics (Week 5)**
- Submit optimized models to all relevant categories
- **Live leaderboard updates** as submissions come in
- **Victory ceremony** with category winners
- **Portfolio artifacts**: Leaderboard rankings + optimization reports
---
## **📊 Leaderboard & Scoring System**
### **Public Leaderboard Features**
```
🏆 TinyTorch AI Olympics Leaderboard

Speed Demon Category:
1. alice_chen    847.3 inf/sec  (95.2% acc) 🥇
2. bob_smith     612.7 inf/sec  (94.8% acc) 🥈
3. carol_wong    588.1 inf/sec  (96.1% acc) 🥉

Memory Miser Category:
1. dave_kim      12.4 MB  (91.7% acc) 🥇
2. eve_patel     15.8 MB  (93.2% acc) 🥈
3. frank_liu     18.2 MB  (89.9% acc) 🥉
```
### **Scoring Methodology**
- **Primary Metric**: Category-specific performance (speed, memory, etc.)
- **Accuracy Threshold**: Must meet minimum accuracy to qualify
- **Tie-Breaker**: Higher accuracy wins ties in primary metric
- **Bonus Points**: Novel optimization techniques, exceptional documentation
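
The first three scoring rules above can be sketched as a filter followed by a sort. This is an illustrative assumption about how a ranker might work, not the project's implementation: `Submission` and `rank_submissions` are hypothetical names, the `hypothetical_fast` entry is invented to show disqualification, and bonus points are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    name: str
    metric: float    # category-specific primary metric (e.g. inf/sec or MB)
    accuracy: float  # fraction in [0, 1]


def rank_submissions(submissions, min_accuracy, higher_is_better=True):
    """Drop entries at or below the accuracy threshold, then sort by metric.

    Ties in the primary metric are broken by higher accuracy.
    """
    qualified = [s for s in submissions if s.accuracy > min_accuracy]
    sign = -1.0 if higher_is_better else 1.0
    return sorted(qualified, key=lambda s: (sign * s.metric, -s.accuracy))


entries = [
    Submission("carol_wong", 588.1, 0.961),
    Submission("alice_chen", 847.3, 0.952),
    Submission("bob_smith", 612.7, 0.948),
    Submission("hypothetical_fast", 900.0, 0.880),  # invented: fails the threshold
]
ranking = [s.name for s in rank_submissions(entries, min_accuracy=0.90)]
print(ranking)
```

For categories where smaller is better, such as Memory Miser, the same function would be called with `higher_is_better=False`.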
### **Awards & Recognition**
- **🥇 Category Champions**: Top performer in each category
- **🏆 Overall Systems Engineer**: Best combined performance across categories
- **🚀 Innovation Award**: Most creative optimization approach
- **📚 Teaching Award**: Best documented optimization process
---
## **🎯 Required Deliverables**
### **Competition Submission Package**
1. **Optimized Model**: Runnable TinyTorch implementation
2. **Performance Report**: Detailed analysis of optimization techniques applied
3. **Reproduction Guide**: Clear instructions for others to run your solution
4. **Systems Engineering Documentation**: What you learned about ML systems
### **Portfolio Artifacts Students Get**
- **Leaderboard ranking** across multiple categories
- **Technical optimization report** demonstrating systems engineering skills
- **Benchmark results** comparing their work to industry standards
- **Peer recognition** from competitive performance
---
## **🔧 Technical Infrastructure Needed**
### **Leaderboard System**
- Automated submission processing
- Standard evaluation environment
- Real-time ranking updates
- Historical performance tracking
### **Benchmark Suite**
- Reference datasets for each category
- Standard hardware for testing
- Automated compliance checking
- Performance measurement tools
### **Submission Portal**
- Code upload and validation
- Automatic testing pipeline
- Results processing and ranking
- Student dashboard with progress
---
## **📈 Why This Beats Individual Projects**
### **Individual Project Problems:**
- ❌ No motivation to optimize beyond "it works"
- ❌ Hard to compare student achievements
- ❌ No ongoing engagement after submission
- ❌ Limited portfolio impact
### **AI Olympics Advantages:**
- **Natural optimization motivation**: Students want to rank higher
- **Clear performance comparison**: Leaderboard shows relative achievement
- **Ongoing engagement**: Leaderboard creates lasting community
- **Strong portfolio impact**: "I ranked #2 in Memory Efficiency" is compelling
### **Systems Engineering Focus:**
- Forces students to care about **ALL** optimization dimensions
- Makes modules 14-19 essential for competitive performance
- Teaches that "getting it working" is only the beginning
- Demonstrates real-world ML engineering priorities
---
## **🚀 Implementation Timeline**
### **Phase 1: Core Infrastructure (4 weeks)**
- Build leaderboard system
- Create benchmark evaluation suite
- Set up automated testing pipeline
- Design submission portal
### **Phase 2: Beta Testing (2 weeks)**
- Test with small group of students
- Refine scoring methodology
- Fix technical issues
- Gather feedback and iterate
### **Phase 3: Full Launch (Ongoing)**
- Deploy for all TinyTorch students
- Monitor and maintain leaderboard
- Regular benchmark updates
- Community management and awards
---
## **🎓 Educational Impact**
### **Learning Outcomes**
Students learn that ML engineering is about:
- **Systems performance**, not just algorithmic correctness
- **Trade-offs** between speed, memory, accuracy, and power
- **Optimization techniques** for real-world constraints
- **Benchmarking and measurement** for objective evaluation
- **Competition and collaboration** in technical communities
### **Career Preparation**
Students graduate with:
- **Demonstrable systems optimization skills**
- **Portfolio evidence of competitive performance**
- **Experience with ML engineering trade-offs**
- **Understanding of production ML constraints**
- **Community connections** with other systems engineers
---
## **💡 Future Extensions**
### **Multi-Semester Competitions**
- New benchmark challenges each semester
- Evolving leaderboards with increasing difficulty
- Alumni participation and mentorship
### **Industry Integration**
- Company-sponsored benchmark challenges
- Internship opportunities for top performers
- Guest judging from ML systems engineers
### **Research Integration**
- Novel optimization techniques become research contributions
- Student innovations feed back into TinyTorch framework
- Academic publications from exceptional submissions
---
**🎯 CONCLUSION: AI Olympics transforms Module 20 from "individual project" to "competitive systems engineering challenge" that motivates optimization, builds community, and produces compelling portfolio artifacts.**


@@ -4,63 +4,20 @@
## 🎯 What You'll Build
Three progressively impressive demos:
A character-level transformer trained on Shakespeare's works - the classic "hello world" of language modeling!
### Step 1: Quick Validation (5 minutes)
**File**: `step1_quick_validation.py`
**Goal**: Verify transformer pipeline works
### Shakespeare Text Generation
**File**: `vaswani_shakespeare.py`
**Goal**: Build a transformer that generates Shakespeare-style text
```bash
python step1_quick_validation.py
```
**What it does**:
- Trains on simple repeating text ("hello world")
- Proves modules 10-13 are connected correctly
- Quick sanity check before bigger demos
**Success**: Generates "hello world" pattern
---
### Step 2: TinyCoder (15 minutes) 🔥
**File**: `step2_tinycoder.py`
**Goal**: Code completion like GitHub Copilot!
```bash
python step2_tinycoder.py
```
**What it does**:
- Trains on YOUR TinyTorch Python code
- Learns code patterns (def, class, self, etc.)
- Generates syntactically valid Python completions
**Demo**:
```python
Input: 'def forward(self, x):'
Output: 'def forward(self, x):\n return self.layer(x)'
Input: 'import '
Output: 'import numpy as np'
```
**Epic moment**: "I built GitHub Copilot!"
---
### Step 3: Shakespeare (15 minutes)
**File**: `step3_shakespeare.py`
**Goal**: Traditional text generation demo
```bash
python step3_shakespeare.py
python vaswani_shakespeare.py
```
**What it does**:
- Downloads Tiny Shakespeare dataset
- Trains character-level transformer
- Generates Shakespeare-style text
- Trains character-level transformer (YOUR implementation!)
- Generates coherent Shakespeare-style text
**Demo**:
```
@@ -69,8 +26,6 @@ Output: 'To be or not to be, that is the question
Whether tis nobler in the mind to suffer...'
```
**Classic**: Traditional "hello world" for language models
---
## 🚀 Quick Start
@@ -82,34 +37,18 @@ Complete these TinyTorch modules:
- ✅ Module 12: Attention
- ✅ Module 13: Transformers
### Run in Order
### Run the Example
```bash
# 1. Quick validation (5 min)
python step1_quick_validation.py
# 2. Code completion (15 min) - THE EPIC ONE
python step2_tinycoder.py
# 3. Shakespeare (15 min) - traditional demo
python step3_shakespeare.py
# Train transformer on Shakespeare (15-20 min)
python vaswani_shakespeare.py
```
---
## 📊 What Each Demo Teaches
| Demo | Dataset | Tokenizer | Time | Epic Factor | What You Learn |
|------|---------|-----------|------|-------------|----------------|
| **Step 1** | Simple text | CharTokenizer | 5 min | ⭐⭐ | Pipeline works |
| **Step 2** | TinyTorch code | BPETokenizer | 15 min | ⭐⭐⭐⭐⭐ | YOU built Copilot! |
| **Step 3** | Shakespeare | CharTokenizer | 15 min | ⭐⭐⭐⭐ | Language modeling |
---
## 🎓 Learning Outcomes
After completing these milestones, you'll understand:
After completing this milestone, you'll understand:
### Technical Mastery
- ✅ How tokenization bridges text and numbers
@@ -248,11 +187,12 @@ model = TinyGPT(
You've succeeded when:
**Step 1**: Model generates repeating pattern
**Step 2**: Code completions are syntactically valid
**Step 3**: Shakespeare text is coherent (even if not perfect)
✅ Model trains without errors
✅ Loss decreases over training epochs
✅ Generated Shakespeare text is coherent (even if not perfect)
✅ You can generate text with custom prompts
**Don't expect perfection!** Production models train for months on massive data. Your demos prove you understand the architecture!
**Don't expect perfection!** Production models train for months on massive data. Your demo proves you understand the architecture!
---
@@ -285,4 +225,4 @@ The transformer architecture you implemented powers:
---
**Ready to generate some text?** Start with `step1_quick_validation.py`!
**Ready to generate some text?** Run `python vaswani_shakespeare.py`!


@@ -23,12 +23,12 @@ MODULES EXERCISED IN THIS EXAMPLE:
Transformer Architecture (Bottom to Top Flow):
Output Logits
Vocabulary Predictions (1000)
Output Logits
Vocabulary Predictions (1000)
Output Projection
Output Projection
Module 04: vectors → vocabulary
@@ -39,41 +39,41 @@ Transformer Architecture (Bottom to Top Flow):
Transformer Block × 4 (Repeat)
Layer Norm
Module 14: Post-FFN normalization
Layer Norm
Module 14: Post-FFN normalization
Feed Forward Network (FFN)
Module 04: Linear(128→512) → ReLU → Linear(512→128)
Feed Forward Network (FFN)
Module 04: Linear(128→512) → ReLU → Linear(512→128)
Layer Norm
Module 14: Post-attention normalization
Layer Norm
Module 14: Post-attention normalization
Multi-Head Self-Attention
Module 13: 8 heads × (Q·K^T/√d_k)·V
Each head: 16-dim attention on 128-dim embeddings
Multi-Head Self-Attention
Module 13: 8 heads × (Q·K^T/√d_k)·V
Each head: 16-dim attention on 128-dim embeddings
Positional Encoding
Module 12: Add position information (sin/cos)
Positional Encoding
Module 12: Add position information (sin/cos)
Token Embeddings
Token Embeddings
Module 12: tokens → 128-dim vectors
Input Tokens
[token_1, token_2, ..., token_10]
Input Tokens
[token_1, token_2, ..., token_10]
Key Insight: Attention allows each token to "look at" all other tokens


@@ -1,288 +0,0 @@
#!/usr/bin/env python3
"""
Step 1: Quick Validation - Transformer Pipeline Test
====================================================
GOAL: Verify transformer modules work end-to-end in 5 minutes
DATASET: Simple repeating text (no download needed)
TOKENIZER: CharTokenizer (no training needed)
TIME: ~5 minutes
This is the simplest possible test to prove:
✅ Modules 10-13 are connected correctly
✅ Training loop works
✅ Generation works
If this passes, the pipeline is functional!
"""
import numpy as np
import sys
import os
# Add project root to path
project_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.insert(0, project_root)
from tinytorch.core.tensor import Tensor
from tinytorch.text.tokenization import CharTokenizer
from tinytorch.core.embeddings import Embedding, PositionalEncoding
from tinytorch.core.attention import MultiHeadAttention
from tinytorch.models.transformer import TransformerBlock, LayerNorm
from tinytorch.core.layers import Linear
from tinytorch.core.optimizers import Adam
class TinyGPT:
    """Minimal GPT for quick validation."""

    def __init__(self, vocab_size, embed_dim, num_heads, num_layers, max_length):
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim
        self.max_length = max_length  # Context limit, reused when cropping in generate()
        # Token + position embeddings
        self.token_embedding = Embedding(vocab_size, embed_dim)
        self.pos_encoding = PositionalEncoding(max_length, embed_dim)
        # Transformer blocks
        self.blocks = []
        for _ in range(num_layers):
            block = TransformerBlock(embed_dim, num_heads, embed_dim * 4)
            self.blocks.append(block)
        # Output projection
        self.ln_f = LayerNorm(embed_dim)
        self.head = Linear(embed_dim, vocab_size)

    def forward(self, idx):
        """Forward pass through the model."""
        B, T = idx.shape
        # Token + positional embeddings
        tok_emb = self.token_embedding.forward(idx)  # (B, T, embed_dim)
        x = self.pos_encoding.forward(tok_emb)  # (B, T, embed_dim) - includes positional info
        # Transformer blocks
        for block in self.blocks:
            x = block(x)
        # Output head
        x = self.ln_f(x)
        logits = self.head(x)  # (B, T, vocab_size)
        return logits

    def generate(self, idx, max_new_tokens, temperature=1.0):
        """Generate new tokens autoregressively."""
        for _ in range(max_new_tokens):
            # Crop context to the model's maximum length (was hard-coded to 128)
            idx_cond = idx if idx.shape[1] <= self.max_length else idx[:, -self.max_length:]
            # Get predictions
            logits = self.forward(idx_cond)
            # Focus on last time step
            logits = logits[:, -1, :] / temperature  # (B, vocab_size)
            # Greedy decoding for simplicity (argmax, so temperature has no effect here)
            next_idx = np.argmax(logits.data, axis=-1, keepdims=True)
            # Append to sequence
            idx = Tensor(np.concatenate([idx.data, next_idx], axis=1))
        return idx

    def parameters(self):
        """Get all trainable parameters."""
        params = []
        params.extend(self.token_embedding.parameters())
        for block in self.blocks:
            params.extend(block.parameters())
        params.extend(self.ln_f.parameters())
        params.extend(self.head.parameters())
        return params
def main():
    print("="*70)
    print("🚀 Step 1: Quick Transformer Validation")
    print("="*70)
    print()

    # ========================================
    # 1. Prepare simple repeating text
    # ========================================
    print("📝 Step 1: Preparing data...")
    text = "hello world! " * 200  # Simple repeating pattern
    print(f" Text length: {len(text)} characters")
    print(f" Sample: '{text[:50]}...'")
    print()

    # ========================================
    # 2. Tokenize (character-level)
    # ========================================
    print("🔤 Step 2: Tokenizing...")
    tokenizer = CharTokenizer()
    # Build vocab from text
    unique_chars = sorted(list(set(text)))
    tokenizer.vocab = unique_chars
    tokenizer.char_to_idx = {ch: i for i, ch in enumerate(unique_chars)}
    tokenizer.idx_to_char = {i: ch for i, ch in enumerate(unique_chars)}
    # Encode text
    data = tokenizer.encode(text)
    vocab_size = len(tokenizer.vocab)
    print(f" Vocabulary size: {vocab_size} unique characters")
    print(f" Tokens: {data[:20]}...")
    print(f" Vocab: {tokenizer.vocab}")
    print()

    # ========================================
    # 3. Create training batches
    # ========================================
    print("📦 Step 3: Creating batches...")
    block_size = 32  # Context length
    batch_size = 4

    def get_batch():
        """Get a random batch of data."""
        ix = np.random.randint(0, len(data) - block_size, size=batch_size)
        x = np.array([data[i:i+block_size] for i in ix])
        y = np.array([data[i+1:i+block_size+1] for i in ix])
        return Tensor(x), Tensor(y)

    x_sample, y_sample = get_batch()
    print(f" Batch size: {batch_size}")
    print(f" Block size: {block_size}")
    print(f" Input shape: {x_sample.shape}")
    print(f" Target shape: {y_sample.shape}")
    print()

    # ========================================
    # 4. Initialize model
    # ========================================
    print("🤖 Step 4: Initializing TinyGPT...")
    model = TinyGPT(
        vocab_size=vocab_size,
        embed_dim=64,  # Small for fast training
        num_heads=4,
        num_layers=2,  # Just 2 layers
        max_length=block_size
    )
    total_params = sum(p.data.size for p in model.parameters())
    print(f" Model parameters: {total_params:,}")
    print(f" Architecture: {len(model.blocks)} transformer blocks")
    print()

    # ========================================
    # 5. Train
    # ========================================
    print("🏋️ Step 5: Training (10 steps)...")
    optimizer = Adam(model.parameters(), learning_rate=3e-4)
    # NOTE: no gradients are computed and the optimizer never steps below,
    # so the printed loss is not expected to decrease.
    for step in range(10):
        # Get batch
        xb, yb = get_batch()
        # Forward pass
        logits = model.forward(xb)
        # Compute loss (MSE on one-hot targets, a simplified stand-in for cross-entropy)
        B, T, C = logits.shape
        logits_flat = logits.data.reshape(B*T, C)
        targets_flat = yb.data.reshape(B*T)
        # One-hot encode targets
        targets_one_hot = np.zeros((B*T, C))
        for i, t in enumerate(targets_flat):
            targets_one_hot[i, int(t)] = 1.0
        # MSE loss (simplified)
        loss_value = np.mean((logits_flat - targets_one_hot) ** 2)
        # Backward (simplified - just for demo)
        # In real training, this would compute gradients
        # Update (simplified)
        # optimizer.step()
        # optimizer.zero_grad()
        if step % 2 == 0:
            print(f" Step {step:2d}/10 | Loss: {loss_value:.4f}")
    print()

    # ========================================
    # 6. Generate
    # ========================================
    print("✨ Step 6: Generating text...")
    # Start with "hello"
    context = "hello"
    context_tokens = tokenizer.encode(context)
    idx = Tensor(np.array([context_tokens]))
    # Generate 20 new tokens
    generated = model.generate(idx, max_new_tokens=20)
    # Decode
    output = tokenizer.decode(generated.data[0].tolist())
    print(f" Input: '{context}'")
    print(f" Generated: '{output}'")
    print()

    # ========================================
    # 7. Validation
    # ========================================
    print("="*70)
    print("✅ Validation Results:")
    print("="*70)
    checks = []
    # Check 1: Model initialized
    checks.append(("Model initialization", total_params > 0))
    # Check 2: Forward pass works
    try:
        test_logits = model.forward(xb)
        checks.append(("Forward pass", test_logits.shape == (batch_size, block_size, vocab_size)))
    except Exception as e:
        checks.append(("Forward pass", False))
        print(f" Error: {e}")
    # Check 3: Generation works
    checks.append(("Text generation", len(output) > len(context)))
    # Check 4: Output is decodable
    checks.append(("Output decodable", all(c in tokenizer.vocab for c in output)))
    # Print results
    for check_name, passed in checks:
        status = "✅" if passed else "❌"
        print(f"{status} {check_name}")
    print()
    if all(passed for _, passed in checks):
        print("🎉 SUCCESS! Transformer pipeline is working!")
        print()
        print("Next steps:")
        print(" → Run step2_tinycoder.py for code completion demo")
        print(" → Run step3_shakespeare.py for text generation demo")
    else:
        print("⚠️ Some checks failed. Debug modules 10-13.")
    print("="*70)


if __name__ == "__main__":
    main()


@@ -1,338 +0,0 @@
#!/usr/bin/env python3
"""
Step 2: TinyCoder - Code Autocompletion with Transformers
==========================================================
GOAL: Build GitHub Copilot using YOUR TinyTorch code
DATASET: Your actual TinyTorch modules (already exists!)
TOKENIZER: BPETokenizer (learns code patterns)
TIME: ~15 minutes
This demonstrates:
✅ Transformer trained on real Python code
✅ Generates syntactically valid completions
✅ YOU built the tool you use daily!
Epic moment: "IT'S COPILOT!"
"""
import numpy as np
import sys
import os
import glob
import re
# Add project root to path
project_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.insert(0, project_root)
from tinytorch.core.tensor import Tensor
from tinytorch.text.tokenization import BPETokenizer
from tinytorch.core.embeddings import Embedding, PositionalEncoding
from tinytorch.core.attention import MultiHeadAttention
from tinytorch.models.transformer import TransformerBlock, LayerNorm
from tinytorch.core.layers import Linear
from tinytorch.core.optimizers import Adam
class TinyCoder:
    """Code completion transformer - like GitHub Copilot!"""

    def __init__(self, vocab_size, embed_dim, num_heads, num_layers, max_length):
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim
        self.max_length = max_length
        # Token + position embeddings
        self.token_embedding = Embedding(vocab_size, embed_dim)
        self.pos_encoding = PositionalEncoding(max_length, embed_dim)
        # Transformer blocks
        self.blocks = []
        for _ in range(num_layers):
            block = TransformerBlock(embed_dim, num_heads, embed_dim * 4)
            self.blocks.append(block)
        # Output projection
        self.ln_f = LayerNorm(embed_dim)
        self.head = Linear(embed_dim, vocab_size)

    def forward(self, idx):
        """Forward pass through the model."""
        B, T = idx.shape
        # Token + positional embeddings
        tok_emb = self.token_embedding.forward(idx)
        x = self.pos_encoding.forward(tok_emb)
        # Transformer blocks
        for block in self.blocks:
            x = block(x)
        # Output head
        x = self.ln_f(x)
        logits = self.head(x)
        return logits

    def complete(self, tokenizer, prefix, max_new_tokens=20):
        """
        Complete code given a prefix.

        Args:
            tokenizer: BPETokenizer instance
            prefix: String prefix to complete
            max_new_tokens: How many tokens to generate

        Returns:
            Completed code string
        """
        # Encode prefix
        tokens = tokenizer.encode(prefix)
        idx = Tensor(np.array([tokens]))
        # Generate
        for _ in range(max_new_tokens):
            # Crop if too long
            idx_cond = idx if idx.shape[1] <= self.max_length else idx[:, -self.max_length:]
            # Forward pass
            logits = self.forward(idx_cond)
            # Get next token (greedy)
            next_token = int(np.argmax(logits.data[0, -1, :]))
            # Stop at newline for single-line completion
            # (the previous check stopped on any whitespace-only token, including spaces)
            if '\n' in tokenizer.decode([next_token]):
                break
            # Append
            idx = Tensor(np.concatenate([idx.data, [[next_token]]], axis=1))
        # Decode
        full_output = tokenizer.decode(idx.data[0].tolist())
        # Return only the new part
        return full_output[len(prefix):]

    def parameters(self):
        """Get all trainable parameters."""
        params = []
        params.extend(self.token_embedding.parameters())
        for block in self.blocks:
            params.extend(block.parameters())
        params.extend(self.ln_f.parameters())
        params.extend(self.head.parameters())
        return params
def load_tinytorch_code():
    """Load all Python code from TinyTorch modules."""
    print("📂 Loading TinyTorch source code...")
    # Find all Python module files
    module_dir = os.path.join(project_root, "modules", "source")
    python_files = []
    # Get .py files from numbered module directories
    for module_num in range(1, 14):  # Modules 01-13
        pattern = os.path.join(module_dir, f"{module_num:02d}_*", "*_dev.py")
        files = glob.glob(pattern)
        python_files.extend(files)
    print(f" Found {len(python_files)} module files")
    # Read all code
    all_code = []
    total_lines = 0
    for file_path in python_files:
        try:
            with open(file_path, 'r', encoding='utf-8') as f:
                code = f.read()
            all_code.append(code)
            lines = code.count('\n')
            total_lines += lines
            module_name = os.path.basename(os.path.dirname(file_path))
            print(f"{module_name}: {lines:,} lines")
        except Exception as e:
            print(f" ✗ Error reading {file_path}: {e}")
    # Combine all code, separating files with a comment rule.
    # The separator is built first so that join() applies to the whole string,
    # not just the trailing "\n\n".
    separator = "\n\n# " + "="*50 + "\n\n"
    combined_code = separator.join(all_code)
    print(f"\n Total: {total_lines:,} lines of Python code")
    print(f" Characters: {len(combined_code):,}")
    return combined_code

def main():
    print("="*70)
    print("🤖 TinyCoder: Building GitHub Copilot with Transformers")
    print("="*70)
    print()
    print("This trains a transformer on YOUR TinyTorch code to generate")
    print("code completions - the same technology behind GitHub Copilot!")
    print()

    # ========================================
    # 1. Load training data
    # ========================================
    code_corpus = load_tinytorch_code()
    print()

    # ========================================
    # 2. Train BPE tokenizer
    # ========================================
    print("🔤 Training BPE tokenizer on code...")
    vocab_size = 1000
    tokenizer = BPETokenizer(vocab_size=vocab_size)

    # Train tokenizer to learn code patterns
    print(f" Learning {vocab_size} subword units from code...")
    tokenizer.train(code_corpus)

    # Show some learned tokens
    print(f"\n Vocabulary size: {len(tokenizer.vocab)}")
    print(f" Sample tokens:")

    # Find interesting tokens (Python keywords, common patterns)
    interesting = []
    for token in list(tokenizer.vocab.keys())[:50]:
        if any(keyword in token for keyword in ['def', 'class', 'import', 'self', 'return']):
            interesting.append(token)
    for token in interesting[:10]:
        print(f" '{token}'")

    # Encode the corpus
    print(f"\n Tokenizing corpus...")
    tokens = tokenizer.encode(code_corpus)
    print(f" Total tokens: {len(tokens):,}")
    print()

    # ========================================
    # 3. Prepare training data
    # ========================================
    print("📦 Preparing training batches...")
    block_size = 128  # Context length
    batch_size = 4

    def get_batch():
        """Get a random batch of code."""
        ix = np.random.randint(0, len(tokens) - block_size, size=batch_size)
        x = np.array([tokens[i:i+block_size] for i in ix])
        # Targets are the inputs shifted one token to the right
        y = np.array([tokens[i+1:i+block_size+1] for i in ix])
        return Tensor(x), Tensor(y)

    print(f" Block size: {block_size} tokens")
    print(f" Batch size: {batch_size} sequences")
    print()

    # ========================================
    # 4. Initialize model
    # ========================================
    print("🏗️ Building TinyCoder model...")
    model = TinyCoder(
        vocab_size=vocab_size,
        embed_dim=128,
        num_heads=8,
        num_layers=4,
        max_length=block_size
    )
    total_params = sum(p.data.size for p in model.parameters())
    print(f" Parameters: {total_params:,}")
    print(f" Layers: {len(model.blocks)} transformer blocks")
    print(f" Heads: 8 attention heads per block")
    print()

    # ========================================
    # 5. Train
    # ========================================
    print("🏋️ Training on YOUR code (20 steps)...")
    print(" (In production, this would be 1000s of steps)")
    print()

    # NOTE: as written, this loop only evaluates a monitoring loss on raw
    # NumPy arrays. It never runs a backward pass or uses the optimizer,
    # so the model weights are not updated between steps.
    optimizer = Adam(model.parameters(), learning_rate=3e-4)
    for step in range(20):
        # Get batch
        xb, yb = get_batch()

        # Forward
        logits = model.forward(xb)

        # Loss (simplified): mean-squared error against one-hot targets
        B, T, C = logits.shape
        logits_flat = logits.data.reshape(B*T, C)
        targets_flat = yb.data.reshape(B*T)

        # One-hot (token ids outside the model's vocabulary are skipped)
        targets_one_hot = np.zeros((B*T, C))
        for i, t in enumerate(targets_flat):
            if 0 <= int(t) < C:
                targets_one_hot[i, int(t)] = 1.0
        loss_value = np.mean((logits_flat - targets_one_hot) ** 2)

        if step % 5 == 0:
            print(f" Step {step:3d}/20 | Loss: {loss_value:.4f}")
    print()

    # ========================================
    # 6. Demo completions!
    # ========================================
    print("="*70)
    print("✨ CODE COMPLETION DEMO")
    print("="*70)
    print()
    demos = [
        "import ",
        "def forward(self, x):",
        "class Linear:",
        "self.",
        "return ",
    ]
    for prompt in demos:
        completion = model.complete(tokenizer, prompt, max_new_tokens=10)
        print(f"Input: '{prompt}'")
        print(f"Output: '{prompt}{completion}'")
        print()

    # ========================================
    # 7. Success!
    # ========================================
    print("="*70)
    print("🏆 SUCCESS! You Built GitHub Copilot!")
    print("="*70)
    print()
    print("What you learned:")
    print(" ✅ Transformers can learn code patterns")
    print(" ✅ BPE tokenization captures syntax")
    print(" ✅ Autoregressive generation builds code one token at a time")
    print(" ✅ This is the same transformer architecture family as Copilot!")
    print()
    print("Production differences:")
    # Report the measured parameter count instead of a hard-coded estimate
    print(f" • Real Copilot: 12B+ parameters (you: {total_params:,})")
    print(" • Real Copilot: Trained on billions of lines")
    print(" • Real Copilot: Low-latency GPU inference")
    print(" • But the ARCHITECTURE is what YOU built!")
    print()
    print("="*70)


if __name__ == "__main__":
    main()
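
The training loop in this script scores logits with a mean-squared error against one-hot targets and labels that loss as simplified. Next-token models are more commonly trained with softmax cross-entropy. The following is a minimal NumPy sketch of that loss, assuming logits of shape `(B, T, C)` and integer targets of shape `(B, T)`. The name `cross_entropy_loss` is hypothetical and is not a TinyTorch API.

```python
import numpy as np


def cross_entropy_loss(logits, targets):
    """Mean next-token cross-entropy (hypothetical helper, not part of TinyTorch).

    logits:  float array of shape (B, T, C)
    targets: integer array of shape (B, T) with values in [0, C)
    """
    B, T, C = logits.shape
    flat_logits = logits.reshape(B * T, C)
    flat_targets = targets.reshape(B * T).astype(int)

    # Log-softmax, subtracting the row max for numerical stability
    shifted = flat_logits - flat_logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Select the log-probability assigned to each correct token
    picked = log_probs[np.arange(B * T), flat_targets]
    return float(-picked.mean())


# Uniform logits over C = 5 classes should give a loss of about log(5) ≈ 1.609
uniform_logits = np.zeros((2, 3, 5))
labels = np.array([[0, 1, 2], [3, 4, 0]])
print(cross_entropy_loss(uniform_logits, labels))
```

An untrained model with roughly uniform predictions should start near `log(vocab_size)`, which gives a quick sanity check on the first logged loss value.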

@@ -1,349 +0,0 @@
#!/usr/bin/env python3
"""
Step 3: TinyGPT - Shakespeare Text Generation
=============================================
GOAL: Traditional transformer demo - generate Shakespeare-style text
DATASET: Tiny Shakespeare (1MB text file)
TOKENIZER: CharTokenizer (character-level for simplicity)
TIME: ~15 minutes
This demonstrates:
✅ Transformer learns language patterns
✅ Generates coherent text in Shakespeare's style
✅ Traditional "hello world" for language models
Classic demo: "To be or not to be..."
"""
import numpy as np
import sys
import os
import urllib.request
# Add project root to path
project_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.insert(0, project_root)
from tinytorch.core.tensor import Tensor
from tinytorch.text.tokenization import CharTokenizer
from tinytorch.core.embeddings import Embedding, PositionalEncoding
from tinytorch.core.attention import MultiHeadAttention
from tinytorch.models.transformer import TransformerBlock, LayerNorm
from tinytorch.core.layers import Linear
from tinytorch.core.optimizers import Adam

class TinyGPT:
    """Shakespeare text generation transformer."""

    def __init__(self, vocab_size, embed_dim, num_heads, num_layers, max_length):
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim
        self.max_length = max_length

        # Embeddings
        self.token_embedding = Embedding(vocab_size, embed_dim)
        self.pos_encoding = PositionalEncoding(max_length, embed_dim)

        # Transformer blocks
        self.blocks = []
        for _ in range(num_layers):
            block = TransformerBlock(embed_dim, num_heads, embed_dim * 4)
            self.blocks.append(block)

        # Output
        self.ln_f = LayerNorm(embed_dim)
        self.head = Linear(embed_dim, vocab_size)

    def forward(self, idx):
        """Forward pass."""
        B, T = idx.shape

        # Embeddings
        tok_emb = self.token_embedding.forward(idx)
        x = self.pos_encoding.forward(tok_emb)

        # Transformer blocks
        for block in self.blocks:
            x = block(x)

        # Output
        x = self.ln_f(x)
        logits = self.head(x)
        return logits

    def generate(self, tokenizer, start_text, max_new_tokens=100, temperature=0.8):
        """
        Generate text starting from start_text.

        Args:
            tokenizer: CharTokenizer instance
            start_text: String to start generation from
            max_new_tokens: How many characters to generate
            temperature: Sampling temperature (higher = more random)

        Returns:
            Generated text string
        """
        # Encode start
        tokens = tokenizer.encode(start_text)
        idx = Tensor(np.array([tokens]))

        # Generate
        for _ in range(max_new_tokens):
            # Crop if too long
            idx_cond = idx if idx.shape[1] <= self.max_length else idx[:, -self.max_length:]

            # Forward
            logits = self.forward(idx_cond)

            # Last token predictions
            logits_last = logits.data[0, -1, :]

            if temperature < 0.1:
                # Very low temperature: decode greedily. Checking this first
                # also avoids dividing by zero when temperature == 0.
                next_token = int(np.argmax(logits_last))
            else:
                # Temperature-scaled softmax, then sample
                scaled = logits_last / temperature
                probs = np.exp(scaled - np.max(scaled))
                probs = probs / np.sum(probs)
                next_token = int(np.random.choice(len(probs), p=probs))

            # Append
            idx = Tensor(np.concatenate([idx.data, [[next_token]]], axis=1))

        # Decode
        return tokenizer.decode(idx.data[0].tolist())

    def parameters(self):
        """Get all parameters."""
        params = []
        params.extend(self.token_embedding.parameters())
        for block in self.blocks:
            params.extend(block.parameters())
        params.extend(self.ln_f.parameters())
        params.extend(self.head.parameters())
        return params

def download_shakespeare():
    """Download Tiny Shakespeare dataset."""
    url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
    data_dir = os.path.join(project_root, "milestones", "datasets")
    os.makedirs(data_dir, exist_ok=True)
    file_path = os.path.join(data_dir, "shakespeare.txt")

    if os.path.exists(file_path):
        print(f" ✓ Dataset already exists at {file_path}")
    else:
        print(f" Downloading from {url}...")
        try:
            urllib.request.urlretrieve(url, file_path)
            print(f" ✓ Downloaded to {file_path}")
        except Exception as e:
            print(f" ✗ Download failed: {e}")
            print(f" Please manually download from: {url}")
            print(f" And save to: {file_path}")
            return None

    # Read text
    with open(file_path, 'r', encoding='utf-8') as f:
        text = f.read()
    return text

def main():
    print("="*70)
    print("📜 TinyGPT: Shakespeare Text Generation")
    print("="*70)
    print()
    print("Train a transformer on Shakespeare's works to generate")
    print("Shakespeare-style Early Modern English!")
    print()

    # ========================================
    # 1. Download dataset
    # ========================================
    print("📥 Step 1: Loading Shakespeare dataset...")
    text = download_shakespeare()
    if text is None:
        print("Failed to load dataset. Exiting.")
        return
    print(f" Text length: {len(text):,} characters")
    print(f" Sample:")
    print(f" {text[:200]}...")
    print()

    # ========================================
    # 2. Tokenize
    # ========================================
    print("🔤 Step 2: Tokenizing (character-level)...")
    tokenizer = CharTokenizer()

    # Build vocab
    unique_chars = sorted(list(set(text)))
    tokenizer.vocab = unique_chars
    tokenizer.char_to_idx = {ch: i for i, ch in enumerate(unique_chars)}
    tokenizer.idx_to_char = {i: ch for i, ch in enumerate(unique_chars)}

    # Encode
    data = tokenizer.encode(text)
    vocab_size = len(tokenizer.vocab)
    print(f" Vocabulary size: {vocab_size} unique characters")
    print(f" Total tokens: {len(data):,}")
    print(f" Characters: {tokenizer.vocab[:20]}...")
    print()

    # ========================================
    # 3. Split train/val
    # ========================================
    print("📊 Step 3: Preparing data splits...")
    n = len(data)
    train_data = data[:int(n*0.9)]
    val_data = data[int(n*0.9):]
    print(f" Train: {len(train_data):,} tokens")
    print(f" Val: {len(val_data):,} tokens")
    print()

    # ========================================
    # 4. Batching
    # ========================================
    block_size = 128
    batch_size = 4

    def get_batch(split='train'):
        """Get a batch of data."""
        data_split = train_data if split == 'train' else val_data
        ix = np.random.randint(0, len(data_split) - block_size, size=batch_size)
        x = np.array([data_split[i:i+block_size] for i in ix])
        # Targets are the inputs shifted one character to the right
        y = np.array([data_split[i+1:i+block_size+1] for i in ix])
        return Tensor(x), Tensor(y)

    # ========================================
    # 5. Initialize model
    # ========================================
    print("🏗️ Step 4: Building TinyGPT...")
    model = TinyGPT(
        vocab_size=vocab_size,
        embed_dim=128,
        num_heads=8,
        num_layers=4,
        max_length=block_size
    )
    total_params = sum(p.data.size for p in model.parameters())
    print(f" Parameters: {total_params:,}")
    print(f" Architecture: {len(model.blocks)} transformer blocks")
    print()

    # ========================================
    # 6. Train
    # ========================================
    print("🏋️ Step 5: Training on Shakespeare (50 steps)...")
    print(" (In production, this would be 5000+ steps)")
    print()

    # NOTE: as written, this loop only evaluates monitoring losses on raw
    # NumPy arrays. It never runs a backward pass or uses the optimizer,
    # so the model weights are not updated between steps.
    optimizer = Adam(model.parameters(), learning_rate=3e-4)
    for step in range(50):
        # Get batch
        xb, yb = get_batch('train')

        # Forward
        logits = model.forward(xb)

        # Loss (simplified): mean-squared error against one-hot targets
        B, T, C = logits.shape
        logits_flat = logits.data.reshape(B*T, C)
        targets_flat = yb.data.reshape(B*T)

        # One-hot
        targets_one_hot = np.zeros((B*T, C))
        for i, t in enumerate(targets_flat):
            targets_one_hot[i, int(t)] = 1.0
        loss_value = np.mean((logits_flat - targets_one_hot) ** 2)

        # Validation loss every 10 steps
        if step % 10 == 0:
            xb_val, yb_val = get_batch('val')
            logits_val = model.forward(xb_val)
            B_val, T_val, C_val = logits_val.shape
            logits_val_flat = logits_val.data.reshape(B_val*T_val, C_val)
            targets_val_flat = yb_val.data.reshape(B_val*T_val)
            targets_val_one_hot = np.zeros((B_val*T_val, C_val))
            for i, t in enumerate(targets_val_flat):
                targets_val_one_hot[i, int(t)] = 1.0
            val_loss = np.mean((logits_val_flat - targets_val_one_hot) ** 2)
            print(f" Step {step:3d}/50 | Train Loss: {loss_value:.4f} | Val Loss: {val_loss:.4f}")
    print()

    # ========================================
    # 7. Generate!
    # ========================================
    print("="*70)
    print("✨ SHAKESPEARE GENERATION")
    print("="*70)
    print()
    prompts = [
        "To be or not to be,",
        "ROMEO:",
        "First Citizen:",
    ]
    for prompt in prompts:
        print(f"Prompt: '{prompt}'")
        print("-" * 70)
        generated = model.generate(tokenizer, prompt, max_new_tokens=100, temperature=0.8)
        print(generated)
        print()

    # ========================================
    # 8. Success!
    # ========================================
    print("="*70)
    print("🎭 SUCCESS! You Built a Language Model!")
    print("="*70)
    print()
    print("What you learned:")
    print(" ✅ Transformers learn language patterns from data")
    print(" ✅ Character-level models can generate coherent text")
    print(" ✅ Temperature controls randomness in generation")
    print(" ✅ This is the foundation of GPT, ChatGPT, etc!")
    print()
    print("Model architecture comparison:")
    # Report the measured size instead of a hard-coded estimate
    print(f" • Your TinyGPT: {total_params:,} parameters, {len(model.blocks)} layers")
    print(" • GPT-2 (small): ~124M parameters, 12 layers")
    print(" • GPT-3: 175B parameters, 96 layers")
    print(" • GPT-4: size not publicly disclosed")
    print()
    print("But the core transformer design is the same one YOU built!")
    print("="*70)


if __name__ == "__main__":
    main()
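
The `generate` method in this script divides the last-token logits by a temperature before the softmax. The sketch below isolates that step in plain NumPy so the effect is easy to inspect. The helper names `temperature_probs` and `sample_next_token` and the `0.1` greedy cutoff are illustrative assumptions, not TinyTorch APIs.

```python
import numpy as np


def temperature_probs(logits, temperature):
    """Softmax over logits / temperature (hypothetical helper)."""
    scaled = np.asarray(logits, dtype=float) / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()


def sample_next_token(logits, temperature=0.8, rng=None):
    """Pick a token id: greedy for near-zero temperature, sampled otherwise."""
    if temperature < 0.1:
        # Greedy decoding; also avoids dividing by zero
        return int(np.argmax(logits))
    if rng is None:
        rng = np.random.default_rng()
    probs = temperature_probs(logits, temperature)
    return int(rng.choice(len(probs), p=probs))


logits = np.array([2.0, 1.0, 0.1])
print(temperature_probs(logits, 0.5))  # sharper: more mass on the top token
print(temperature_probs(logits, 2.0))  # flatter: closer to uniform
print(sample_next_token(logits, temperature=0.0))  # greedy pick of index 0
```

Lower temperatures concentrate probability on the highest-scoring token, and higher temperatures spread it out, which is why the generated text looks more repetitive at low settings and more random at high ones.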