mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-06-03 03:45:52 -05:00
🏗️ Restructure milestones with decade-based naming
- Rename to clean, focused convention: 01_1957_perceptron, 02_1969_xor, etc.
- Drop dramatic language (crisis, revival, revolution, era)
- 06_2018_mlperf → 06_2020_scaling (matches GPT-3 scale era)
- Tells clear story: 1950s → 2020s ML evolution
- Each milestone represents major architectural/systems shift
- Remove redundant step1/2/3 files from transformer milestone
@@ -1,185 +0,0 @@
|
||||
# 🎓 TinyTorch Capstone Project Ideas
|
||||
|
||||
## **Background: The Capstone Design Problem**
|
||||
|
||||
**Original Issue**: Module 20 was the "TinyGPT Capstone", but students can already build TinyGPT after Module 13 (Transformers). As a result:
|
||||
- Modules 14-19 (optimization) feel like "optional extras"
|
||||
- Module 20 anticlimactic ("TinyGPT again?")
|
||||
- No integration of crucial systems engineering skills
|
||||
|
||||
**Solution Requirements**:
|
||||
- Must integrate ALL modules 1-19 (especially optimization modules 14-19)
|
||||
- Must be genuinely exciting and different
|
||||
- Must demonstrate complete ML systems engineering mastery
|
||||
- Must create portfolio-worthy deliverables
|
||||
|
||||
---
|
||||
|
||||
## **🏆 RECOMMENDED: AI Olympics Competition**
|
||||
|
||||
**📁 See: [ai-olympics.md](ai-olympics.md)**
|
||||
|
||||
**Core Concept**: Competitive leaderboard where students optimize TinyTorch models across systems engineering dimensions.
|
||||
|
||||
**Why This is Best**:
|
||||
- ✅ **Natural motivation**: Students want to rank high on leaderboards
|
||||
- ✅ **Systems focus**: Compete on speed, memory, efficiency - not just accuracy
|
||||
- ✅ **Community building**: Creates ongoing engagement and peer interaction
|
||||
- ✅ **Portfolio impact**: "I ranked #3 in TinyTorch AI Olympics" is compelling
|
||||
- ✅ **Forces optimization**: ALL modules 14-19 become essential for competitive performance
|
||||
|
||||
**Competition Categories**:
|
||||
- 🏃‍♂️ **Speed Demon**: Fastest inference
|
||||
- 💾 **Memory Miser**: Smallest memory footprint
|
||||
- 📱 **Edge Expert**: Best Raspberry Pi performance
|
||||
- 🔋 **Energy Efficient**: Lowest power consumption
|
||||
- 🏆 **TinyMLPerf**: Overall benchmark champion
|
||||
|
||||
---
|
||||
|
||||
## **🛠️ Alternative Ideas Considered**
|
||||
|
||||
### **1. Edge AI Deployment System**
|
||||
**Concept**: Deploy optimized neural networks to actual edge hardware (Raspberry Pi)
|
||||
|
||||
**Pros**:
|
||||
- Integrates all optimization modules (essential for edge constraints)
|
||||
- Creates tangible deliverable ("I run neural networks on a $35 computer")
|
||||
- Teaches real-world deployment challenges
|
||||
|
||||
**Cons**:
|
||||
- Individual project (no community/competition aspect)
|
||||
- Hardware dependencies (students need Pi)
|
||||
- Less motivating than competition
|
||||
|
||||
### **2. Multi-Modal AI Assistant**
|
||||
**Concept**: Combine vision (CNNs) + language (transformers) + optimization for real-time performance
|
||||
|
||||
**Pros**:
|
||||
- Showcases multiple architectures working together
|
||||
- Demonstrates practical AI applications
|
||||
- Requires optimization for real-time performance
|
||||
|
||||
**Cons**:
|
||||
- Complex scope potentially overwhelming
|
||||
- Optimization feels secondary to "getting it working"
|
||||
- Limited portfolio differentiation
|
||||
|
||||
### **3. ML Performance Laboratory**
|
||||
**Concept**: Comprehensive benchmarking suite comparing different ML frameworks
|
||||
|
||||
**Pros**:
|
||||
- Heavy focus on profiling and benchmarking skills
|
||||
- Creates useful tool for community
|
||||
- Deep systems engineering focus
|
||||
|
||||
**Cons**:
|
||||
- More about measurement than optimization
|
||||
- Limited creative expression for students
|
||||
- May feel academic rather than practical
|
||||
|
||||
### **4. Neural Architecture Search**
|
||||
**Concept**: Automated model design and optimization system
|
||||
|
||||
**Pros**:
|
||||
- Cutting-edge research area
|
||||
- Requires sophisticated optimization
|
||||
- Highly technical achievement
|
||||
|
||||
**Cons**:
|
||||
- Very advanced, may be beyond course scope
|
||||
- Optimization becomes means rather than end
|
||||
- Difficult to assess fairly
|
||||
|
||||
### **5. Distributed Training System**
|
||||
**Concept**: Multi-GPU/multi-node training infrastructure
|
||||
|
||||
**Pros**:
|
||||
- Advanced systems engineering skills
|
||||
- High industry relevance
|
||||
- Impressive technical achievement
|
||||
|
||||
**Cons**:
|
||||
- Requires expensive hardware
|
||||
- Complex debugging and setup
|
||||
- May overshadow core ML concepts
|
||||
|
||||
### **6. ML Model Marketplace**
|
||||
**Concept**: Complete system for sharing/deploying/optimizing models (like Hugging Face)
|
||||
|
||||
**Pros**:
|
||||
- Full-stack systems engineering
|
||||
- Practical deployment focus
|
||||
- Creates useful community resource
|
||||
|
||||
**Cons**:
|
||||
- Web development skills needed
|
||||
- Broad scope potentially unfocused
|
||||
- Less emphasis on optimization techniques
|
||||
|
||||
---
|
||||
|
||||
## **📊 Evaluation Criteria**
|
||||
|
||||
| Criteria | AI Olympics | Edge Deployment | Multi-Modal | ML Lab | NAS | Distributed | Marketplace |
|----------|-------------|-----------------|-------------|--------|-----|-------------|-------------|
| **Integrates All Modules** | ✅✅✅ | ✅✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| **Student Motivation** | ✅✅✅ | ✅ | ✅ | ⚠️ | ⚠️ | ⚠️ | ✅ |
| **Portfolio Impact** | ✅✅✅ | ✅✅ | ✅ | ✅ | ✅✅ | ✅✅ | ✅ |
| **Systems Engineering Focus** | ✅✅✅ | ✅✅ | ✅ | ✅✅✅ | ✅ | ✅✅✅ | ✅ |
| **Implementation Feasibility** | ✅✅ | ✅✅✅ | ✅ | ✅✅ | ⚠️ | ⚠️ | ✅ |
| **Community Building** | ✅✅✅ | ⚠️ | ⚠️ | ✅ | ⚠️ | ⚠️ | ✅✅ |
| **Scalability** | ✅✅✅ | ✅ | ✅ | ✅ | ⚠️ | ⚠️ | ✅ |
|
||||
|
||||
**Legend**: ✅✅✅ Excellent, ✅✅ Good, ✅ Adequate, ⚠️ Challenging
|
||||
|
||||
---
|
||||
|
||||
## **🎯 Final Recommendation**
|
||||
|
||||
**AI Olympics** emerges as the clear winner because it:
|
||||
|
||||
1. **Maximizes student motivation** through competitive leaderboards
|
||||
2. **Forces integration** of ALL optimization modules (14-19)
|
||||
3. **Creates lasting community** beyond individual course completion
|
||||
4. **Produces compelling portfolio artifacts** (leaderboard rankings)
|
||||
5. **Scales naturally** as more students participate
|
||||
6. **Emphasizes systems engineering** over algorithmic implementation
|
||||
|
||||
### **Implementation Priority**
|
||||
1. **Phase 1**: Design and build leaderboard infrastructure
|
||||
2. **Phase 2**: Create standard benchmark evaluation suite
|
||||
3. **Phase 3**: Deploy beta version with small student cohort
|
||||
4. **Phase 4**: Full launch with all TinyTorch students
|
||||
|
||||
### **Success Metrics**
|
||||
- **Participation Rate**: % of students who submit to multiple categories
|
||||
- **Optimization Depth**: Average number of techniques applied per submission
|
||||
- **Community Engagement**: Forum activity, peer collaboration, ongoing submissions
|
||||
- **Portfolio Impact**: Industry feedback on graduate capabilities
|
||||
|
||||
---
|
||||
|
||||
## **📝 Notes for Implementation**
|
||||
|
||||
### **Technical Requirements**
|
||||
- Automated submission and evaluation pipeline
|
||||
- Standard benchmark datasets and environments
|
||||
- Real-time leaderboard with rich visualizations
|
||||
- Robust measurement and scoring systems
|
||||
|
||||
### **Educational Integration**
|
||||
- Clear rubrics linking competition performance to course grades
|
||||
- Structured optimization process through modules 14-19
|
||||
- Portfolio development guidance and templates
|
||||
- Peer review and collaboration opportunities
|
||||
|
||||
### **Community Features**
|
||||
- Student profiles and achievement tracking
|
||||
- Optimization technique sharing and discussion
|
||||
- Mentorship connections between high performers and struggling students
|
||||
- Industry guest judging and feedback
|
||||
|
||||
---
|
||||
|
||||
**🚀 The AI Olympics transforms TinyTorch from "just another ML course" into a competitive systems engineering community that motivates deeper learning, creates lasting engagement, and produces industry-ready graduates.**
|
||||
@@ -1,227 +0,0 @@
|
||||
# 🏅 AI Olympics: TinyTorch Systems Competition Capstone
|
||||
|
||||
## **Core Concept: Compete on Systems Performance, Not Just Accuracy**
|
||||
|
||||
Instead of individual projects, Module 20 becomes a **competitive leaderboard** where students optimize their TinyTorch models across multiple **systems engineering dimensions**.
|
||||
|
||||
### **🎯 Why AI Olympics is Perfect for TinyTorch**
|
||||
|
||||
- **Systems Focus**: Compete on memory, speed, efficiency - not just accuracy
|
||||
- **Real ML Engineering**: Production systems care about performance, not just "does it work"
|
||||
- **Leaderboard Motivation**: Students naturally want to rank high and beat peers
|
||||
- **Portfolio Value**: "I ranked #3 in TinyTorch AI Olympics" is impressive
|
||||
- **Community Building**: Creates ongoing engagement beyond the course
|
||||
|
||||
---
|
||||
|
||||
## **🏆 Competition Categories**
|
||||
|
||||
### **Category 1: Speed Demon** ⚡
|
||||
*"Fastest inference on standard hardware"*
|
||||
- **Metric**: Inferences per second on reference hardware
|
||||
- **Required Skills**: Modules 14-19 optimization techniques
|
||||
- **Constraint**: Must maintain >90% accuracy on test dataset
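
The Speed Demon metric can be sketched as a small timing harness. This is only an illustration under stated assumptions: `measure_throughput`, `qualifies`, and the stand-in `dummy_model` are hypothetical names, not part of TinyTorch or of any official evaluation pipeline, and the real competition would run on the reference hardware.

```python
import time


def measure_throughput(predict, inputs, warmup=2, repeats=10):
    """Return inferences per second for `predict` over `inputs`.

    `predict` is any callable taking one input (hypothetical stand-in
    for a TinyTorch model's forward pass).
    """
    for _ in range(warmup):              # warm caches before timing
        for x in inputs:
            predict(x)
    start = time.perf_counter()
    for _ in range(repeats):
        for x in inputs:
            predict(x)
    elapsed = time.perf_counter() - start
    return (repeats * len(inputs)) / elapsed


def qualifies(accuracy, threshold=0.90):
    """Speed Demon constraint: accuracy must stay above 90%."""
    return accuracy > threshold


# Stand-in workload (assumption): a pure-Python dot product with itself.
dummy_model = lambda x: sum(v * v for v in x)
batch = [[0.1 * i for i in range(64)] for _ in range(32)]
throughput = measure_throughput(dummy_model, batch)
print(f"{throughput:.1f} inf/sec, qualifies: {qualifies(0.952)}")
```

Warm-up iterations are excluded from the timed region so that one-time costs (imports, cache fills) do not distort the score.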
|
||||
|
||||
### **Category 2: Memory Miser** 💾
|
||||
*"Smallest memory footprint"*
|
||||
- **Metric**: Peak memory usage during inference
|
||||
- **Required Skills**: Quantization, compression, efficient architectures
|
||||
- **Constraint**: Must maintain >85% accuracy on test dataset
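
One way to approximate "peak memory usage during inference" is Python's standard `tracemalloc` module. This is a sketch, not the competition's measurement tool: `peak_memory_mb` is a hypothetical helper, and `tracemalloc` only sees allocations made through Python's allocators, so a real harness might also track process-level memory.

```python
import tracemalloc


def peak_memory_mb(fn, *args):
    """Run fn(*args) and return (result, peak traced memory in MB)."""
    tracemalloc.start()
    try:
        result = fn(*args)
        _, peak = tracemalloc.get_traced_memory()   # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)


# Stand-in for an inference call (assumption): allocate an 8 MB buffer.
buffer, peak_mb = peak_memory_mb(lambda n: bytearray(n), 8 * 1024 * 1024)
print(f"peak: {peak_mb:.1f} MB")
```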
|
||||
|
||||
### **Category 3: Edge Expert** 📱
|
||||
*"Best performance on Raspberry Pi"*
|
||||
- **Metric**: Composite score (speed + accuracy + power efficiency)
|
||||
- **Required Skills**: ALL optimization modules for edge constraints
|
||||
- **Constraint**: Must actually run on Pi hardware
|
||||
|
||||
### **Category 4: Energy Efficient** 🔋
|
||||
*"Lowest power consumption"*
|
||||
- **Metric**: Energy per inference (joules/prediction)
|
||||
- **Required Skills**: Model compression, efficient algorithms
|
||||
- **Constraint**: Must maintain competitive accuracy
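
The joules-per-prediction metric follows from energy = average power × time, divided by the number of inferences. The helper below is a minimal sketch; how average power is obtained (for example, an external power meter on the test device) is an assumption and is not specified in this document.

```python
def energy_per_inference(avg_power_watts, elapsed_seconds, num_inferences):
    """Energy per inference in joules: (watts * seconds) / inferences."""
    if num_inferences <= 0:
        raise ValueError("num_inferences must be positive")
    return avg_power_watts * elapsed_seconds / num_inferences


# Hypothetical reading: 3.0 W average draw over 10 s while running 600 inferences.
joules = energy_per_inference(3.0, 10.0, 600)
print(f"{joules:.3f} J/prediction")
```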
|
||||
|
||||
### **Category 5: TinyMLPerf** 🏃‍♂️
|
||||
*"Official MLPerf-style benchmark"*
|
||||
- **Metric**: Standardized benchmark suite performance
|
||||
- **Required Skills**: Complete systems optimization pipeline
|
||||
- **Constraint**: Must pass all benchmark compliance tests
|
||||
|
||||
---
|
||||
|
||||
## **🎮 Competition Structure**
|
||||
|
||||
### **Phase 1: Baseline Submission (Week 1)**
|
||||
- Submit working model from modules 1-13 (CNN, transformer, or multi-modal)
|
||||
- Get baseline scores across all categories
|
||||
- See where you rank on initial leaderboard
|
||||
|
||||
### **Phase 2: Optimization Sprint (Weeks 2-4)**
|
||||
- Apply techniques from modules 14-19 systematically
|
||||
- **Module 14**: Profile and identify bottlenecks
|
||||
- **Module 15**: Implement acceleration techniques
|
||||
- **Module 16**: Add quantization for memory/speed
|
||||
- **Module 17**: Apply compression for size reduction
|
||||
- **Module 18**: Implement caching for inference speed
|
||||
- **Module 19**: Benchmark against production systems
|
||||
|
||||
### **Phase 3: Final Submission & Olympics (Week 5)**
|
||||
- Submit optimized models to all relevant categories
|
||||
- **Live leaderboard updates** as submissions come in
|
||||
- **Victory ceremony** with category winners
|
||||
- **Portfolio artifacts**: Leaderboard rankings + optimization reports
|
||||
|
||||
---
|
||||
|
||||
## **📊 Leaderboard & Scoring System**
|
||||
|
||||
### **Public Leaderboard Features**
|
||||
```
🏆 TinyTorch AI Olympics Leaderboard

Speed Demon Category:
1. alice_chen    847.3 inf/sec  (95.2% acc) 🥇
2. bob_smith     612.7 inf/sec  (94.8% acc) 🥈
3. carol_wong    588.1 inf/sec  (96.1% acc) 🥉

Memory Miser Category:
1. dave_kim      12.4 MB  (91.7% acc) 🥇
2. eve_patel     15.8 MB  (93.2% acc) 🥈
3. frank_liu     18.2 MB  (89.9% acc) 🥉
```
|
||||
|
||||
### **Scoring Methodology**
|
||||
- **Primary Metric**: Category-specific performance (speed, memory, etc.)
|
||||
- **Accuracy Threshold**: Must meet minimum accuracy to qualify
|
||||
- **Tie-Breaker**: Higher accuracy wins ties in primary metric
|
||||
- **Bonus Points**: Novel optimization techniques, exceptional documentation
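
The first three scoring rules above can be sketched as a filter followed by a sort. The `Submission` type and `rank` function are hypothetical names for illustration; bonus points are subjective and are left out of this sketch. The sample entries reuse the example leaderboard values, plus one made-up entry that misses the accuracy threshold.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    student: str
    metric: float      # the category's primary metric
    accuracy: float    # fraction in [0, 1]


def rank(submissions, min_accuracy, higher_is_better=True):
    """Drop entries at or below the accuracy threshold, then sort by the
    primary metric; ties go to the higher accuracy."""
    eligible = [s for s in submissions if s.accuracy > min_accuracy]
    sign = -1.0 if higher_is_better else 1.0
    return sorted(eligible, key=lambda s: (sign * s.metric, -s.accuracy))


speed = [
    Submission("carol_wong", 588.1, 0.961),
    Submission("alice_chen", 847.3, 0.952),
    Submission("bob_smith", 612.7, 0.948),
    Submission("hypothetical_entry", 900.0, 0.880),   # fast, but below 90%
]
speed_ranking = rank(speed, min_accuracy=0.90)

memory = [
    Submission("eve_patel", 15.8, 0.932),
    Submission("frank_liu", 18.2, 0.899),
    Submission("dave_kim", 12.4, 0.917),
]
memory_ranking = rank(memory, min_accuracy=0.85, higher_is_better=False)

for place, s in enumerate(speed_ranking, start=1):
    print(place, s.student, s.metric, f"{s.accuracy:.1%}")
```

Passing `higher_is_better=False` covers categories such as Memory Miser, where a smaller primary metric wins.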
|
||||
|
||||
### **Awards & Recognition**
|
||||
- **🥇 Category Champions**: Top performer in each category
|
||||
- **🏆 Overall Systems Engineer**: Best combined performance across categories
|
||||
- **🚀 Innovation Award**: Most creative optimization approach
|
||||
- **📚 Teaching Award**: Best documented optimization process
|
||||
|
||||
---
|
||||
|
||||
## **🎯 Required Deliverables**
|
||||
|
||||
### **Competition Submission Package**
|
||||
1. **Optimized Model**: Runnable TinyTorch implementation
|
||||
2. **Performance Report**: Detailed analysis of optimization techniques applied
|
||||
3. **Reproduction Guide**: Clear instructions for others to run your solution
|
||||
4. **Systems Engineering Documentation**: What you learned about ML systems
|
||||
|
||||
### **Portfolio Artifacts Students Get**
|
||||
- **Leaderboard ranking** across multiple categories
|
||||
- **Technical optimization report** demonstrating systems engineering skills
|
||||
- **Benchmark results** comparing their work to industry standards
|
||||
- **Peer recognition** from competitive performance
|
||||
|
||||
---
|
||||
|
||||
## **🔧 Technical Infrastructure Needed**
|
||||
|
||||
### **Leaderboard System**
|
||||
- Automated submission processing
|
||||
- Standard evaluation environment
|
||||
- Real-time ranking updates
|
||||
- Historical performance tracking
|
||||
|
||||
### **Benchmark Suite**
|
||||
- Reference datasets for each category
|
||||
- Standard hardware for testing
|
||||
- Automated compliance checking
|
||||
- Performance measurement tools
|
||||
|
||||
### **Submission Portal**
|
||||
- Code upload and validation
|
||||
- Automatic testing pipeline
|
||||
- Results processing and ranking
|
||||
- Student dashboard with progress
|
||||
|
||||
---
|
||||
|
||||
## **📈 Why This Beats Individual Projects**
|
||||
|
||||
### **Individual Project Problems:**
|
||||
- ❌ No motivation to optimize beyond "it works"
|
||||
- ❌ Hard to compare student achievements
|
||||
- ❌ No ongoing engagement after submission
|
||||
- ❌ Limited portfolio impact
|
||||
|
||||
### **AI Olympics Advantages:**
|
||||
- ✅ **Natural optimization motivation**: Students want to rank higher
|
||||
- ✅ **Clear performance comparison**: Leaderboard shows relative achievement
|
||||
- ✅ **Ongoing engagement**: Leaderboard creates lasting community
|
||||
- ✅ **Strong portfolio impact**: "I ranked #2 in Memory Efficiency" is compelling
|
||||
|
||||
### **Systems Engineering Focus:**
|
||||
- Forces students to care about **ALL** optimization dimensions
|
||||
- Makes modules 14-19 essential for competitive performance
|
||||
- Teaches that "getting it working" is only the beginning
|
||||
- Demonstrates real-world ML engineering priorities
|
||||
|
||||
---
|
||||
|
||||
## **🚀 Implementation Timeline**
|
||||
|
||||
### **Phase 1: Core Infrastructure (4 weeks)**
|
||||
- Build leaderboard system
|
||||
- Create benchmark evaluation suite
|
||||
- Set up automated testing pipeline
|
||||
- Design submission portal
|
||||
|
||||
### **Phase 2: Beta Testing (2 weeks)**
|
||||
- Test with small group of students
|
||||
- Refine scoring methodology
|
||||
- Fix technical issues
|
||||
- Gather feedback and iterate
|
||||
|
||||
### **Phase 3: Full Launch (Ongoing)**
|
||||
- Deploy for all TinyTorch students
|
||||
- Monitor and maintain leaderboard
|
||||
- Regular benchmark updates
|
||||
- Community management and awards
|
||||
|
||||
---
|
||||
|
||||
## **🎓 Educational Impact**
|
||||
|
||||
### **Learning Outcomes**
|
||||
Students learn that ML engineering is about:
|
||||
- **Systems performance**, not just algorithmic correctness
|
||||
- **Trade-offs** between speed, memory, accuracy, and power
|
||||
- **Optimization techniques** for real-world constraints
|
||||
- **Benchmarking and measurement** for objective evaluation
|
||||
- **Competition and collaboration** in technical communities
|
||||
|
||||
### **Career Preparation**
|
||||
Students graduate with:
|
||||
- **Demonstrable systems optimization skills**
|
||||
- **Portfolio evidence of competitive performance**
|
||||
- **Experience with ML engineering trade-offs**
|
||||
- **Understanding of production ML constraints**
|
||||
- **Community connections** with other systems engineers
|
||||
|
||||
---
|
||||
|
||||
## **💡 Future Extensions**
|
||||
|
||||
### **Multi-Semester Competitions**
|
||||
- New benchmark challenges each semester
|
||||
- Evolving leaderboards with increasing difficulty
|
||||
- Alumni participation and mentorship
|
||||
|
||||
### **Industry Integration**
|
||||
- Company-sponsored benchmark challenges
|
||||
- Internship opportunities for top performers
|
||||
- Guest judging from ML systems engineers
|
||||
|
||||
### **Research Integration**
|
||||
- Novel optimization techniques become research contributions
|
||||
- Student innovations feed back into TinyTorch framework
|
||||
- Academic publications from exceptional submissions
|
||||
|
||||
---
|
||||
|
||||
**🎯 CONCLUSION: AI Olympics transforms Module 20 from "individual project" to "competitive systems engineering challenge" that motivates optimization, builds community, and produces compelling portfolio artifacts.**
|
||||
@@ -4,63 +4,20 @@
|
||||
|
||||
## 🎯 What You'll Build
|
||||
|
||||
Three progressively impressive demos:
|
||||
A character-level transformer trained on Shakespeare's works - the classic "hello world" of language modeling!
|
||||
|
||||
### Step 1: Quick Validation (5 minutes)
|
||||
**File**: `step1_quick_validation.py`
|
||||
**Goal**: Verify transformer pipeline works
|
||||
### Shakespeare Text Generation
|
||||
**File**: `vaswani_shakespeare.py`
|
||||
**Goal**: Build a transformer that generates Shakespeare-style text
|
||||
|
||||
```bash
|
||||
python step1_quick_validation.py
|
||||
```
|
||||
|
||||
**What it does**:
|
||||
- Trains on simple repeating text ("hello world")
|
||||
- Proves modules 10-13 are connected correctly
|
||||
- Quick sanity check before bigger demos
|
||||
|
||||
**Success**: Generates "hello world" pattern
|
||||
|
||||
---
|
||||
|
||||
### Step 2: TinyCoder (15 minutes) 🔥
|
||||
**File**: `step2_tinycoder.py`
|
||||
**Goal**: Code completion like GitHub Copilot!
|
||||
|
||||
```bash
|
||||
python step2_tinycoder.py
|
||||
```
|
||||
|
||||
**What it does**:
|
||||
- Trains on YOUR TinyTorch Python code
|
||||
- Learns code patterns (def, class, self, etc.)
|
||||
- Generates syntactically valid Python completions
|
||||
|
||||
**Demo**:
|
||||
```python
|
||||
Input: 'def forward(self, x):'
|
||||
Output: 'def forward(self, x):\n return self.layer(x)'
|
||||
|
||||
Input: 'import '
|
||||
Output: 'import numpy as np'
|
||||
```
|
||||
|
||||
**Epic moment**: "I built GitHub Copilot!"
|
||||
|
||||
---
|
||||
|
||||
### Step 3: Shakespeare (15 minutes)
|
||||
**File**: `step3_shakespeare.py`
|
||||
**Goal**: Traditional text generation demo
|
||||
|
||||
```bash
|
||||
python step3_shakespeare.py
|
||||
python vaswani_shakespeare.py
|
||||
```
|
||||
|
||||
**What it does**:
|
||||
- Downloads Tiny Shakespeare dataset
|
||||
- Trains character-level transformer
|
||||
- Generates Shakespeare-style text
|
||||
- Trains character-level transformer (YOUR implementation!)
|
||||
- Generates coherent Shakespeare-style text
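
As a rough illustration of what "character-level" means here, the sketch below builds a vocabulary from a short string and forms one input/target pair in which the target is the input shifted by one character. It uses plain Python and NumPy rather than the TinyTorch `CharTokenizer`, and the helper names are hypothetical.

```python
import numpy as np

text = "to be or not to be"

# Character-level vocabulary: one id per unique character.
chars = sorted(set(text))
char_to_idx = {ch: i for i, ch in enumerate(chars)}
idx_to_char = {i: ch for ch, i in char_to_idx.items()}


def encode(s):
    return [char_to_idx[c] for c in s]


def decode(ids):
    return "".join(idx_to_char[int(i)] for i in ids)


data = np.array(encode(text))

# Next-character prediction: y is x shifted left by one position.
block_size = 8
x = data[:block_size]
y = data[1:block_size + 1]
print(repr(decode(x)), "->", repr(decode(y)))
```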
|
||||
|
||||
**Demo**:
|
||||
```
|
||||
@@ -69,8 +26,6 @@ Output: 'To be or not to be, that is the question
|
||||
Whether tis nobler in the mind to suffer...'
|
||||
```
|
||||
|
||||
**Classic**: Traditional "hello world" for language models
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Quick Start
|
||||
@@ -82,34 +37,18 @@ Complete these TinyTorch modules:
|
||||
- ✅ Module 12: Attention
|
||||
- ✅ Module 13: Transformers
|
||||
|
||||
### Run in Order
|
||||
### Run the Example
|
||||
|
||||
```bash
|
||||
# 1. Quick validation (5 min)
|
||||
python step1_quick_validation.py
|
||||
|
||||
# 2. Code completion (15 min) - THE EPIC ONE
|
||||
python step2_tinycoder.py
|
||||
|
||||
# 3. Shakespeare (15 min) - traditional demo
|
||||
python step3_shakespeare.py
|
||||
# Train transformer on Shakespeare (15-20 min)
|
||||
python vaswani_shakespeare.py
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📊 What Each Demo Teaches
|
||||
|
||||
| Demo | Dataset | Tokenizer | Time | Epic Factor | What You Learn |
|------|---------|-----------|------|-------------|----------------|
| **Step 1** | Simple text | CharTokenizer | 5 min | ⭐⭐ | Pipeline works |
| **Step 2** | TinyTorch code | BPETokenizer | 15 min | ⭐⭐⭐⭐⭐ | YOU built Copilot! |
| **Step 3** | Shakespeare | CharTokenizer | 15 min | ⭐⭐⭐⭐ | Language modeling |
|
||||
|
||||
---
|
||||
|
||||
## 🎓 Learning Outcomes
|
||||
|
||||
After completing these milestones, you'll understand:
|
||||
After completing this milestone, you'll understand:
|
||||
|
||||
### Technical Mastery
|
||||
- ✅ How tokenization bridges text and numbers
|
||||
@@ -248,11 +187,12 @@ model = TinyGPT(
|
||||
|
||||
You've succeeded when:
|
||||
|
||||
**Step 1**: Model generates repeating pattern
|
||||
**Step 2**: Code completions are syntactically valid
|
||||
**Step 3**: Shakespeare text is coherent (even if not perfect)
|
||||
✅ Model trains without errors
|
||||
✅ Loss decreases over training epochs
|
||||
✅ Generated Shakespeare text is coherent (even if not perfect)
|
||||
✅ You can generate text with custom prompts
|
||||
|
||||
**Don't expect perfection!** Production models train for months on massive data. Your demos prove you understand the architecture!
|
||||
**Don't expect perfection!** Production models train for months on massive data. Your demo proves you understand the architecture!
|
||||
|
||||
---
|
||||
|
||||
@@ -285,4 +225,4 @@ The transformer architecture you implemented powers:
|
||||
|
||||
---
|
||||
|
||||
**Ready to generate some text?** Start with `step1_quick_validation.py`!
|
||||
**Ready to generate some text?** Run `python vaswani_shakespeare.py`!
|
||||
@@ -23,12 +23,12 @@ MODULES EXERCISED IN THIS EXAMPLE:
|
||||
Transformer Architecture (Bottom to Top Flow):
|
||||
|
||||
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ Output Logits │
|
||||
│ Vocabulary Predictions (1000) │
|
||||
│ Output Logits │
|
||||
│ Vocabulary Predictions (1000) │
|
||||
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
▲
|
||||
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ Output Projection │
|
||||
│ Output Projection │
|
||||
│ Module 04: vectors → vocabulary │
|
||||
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
▲
|
||||
@@ -39,41 +39,41 @@ Transformer Architecture (Bottom to Top Flow):
|
||||
▲
|
||||
╔══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
|
||||
║ Transformer Block × 4 (Repeat) ║
|
||||
║ ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ ║
|
||||
║ │ Layer Norm │ ║
|
||||
║ │ Module 14: Post-FFN normalization │ ║
|
||||
║ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ║
|
||||
║ ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ ║
|
||||
║ │ Layer Norm │ ║
|
||||
║ │ Module 14: Post-FFN normalization │ ║
|
||||
║ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ║
|
||||
║ ▲ ║
|
||||
║ ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ ║
|
||||
║ │ Feed Forward Network (FFN) │ ║
|
||||
║ │ Module 04: Linear(128→512) → ReLU → Linear(512→128) │ ║
|
||||
║ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ║
|
||||
║ ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ ║
|
||||
║ │ Feed Forward Network (FFN) │ ║
|
||||
║ │ Module 04: Linear(128→512) → ReLU → Linear(512→128) │ ║
|
||||
║ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ║
|
||||
║ ▲ ║
|
||||
║ ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ ║
|
||||
║ │ Layer Norm │ ║
|
||||
║ │ Module 14: Post-attention normalization │ ║
|
||||
║ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ║
|
||||
║ ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ ║
|
||||
║ │ Layer Norm │ ║
|
||||
║ │ Module 14: Post-attention normalization │ ║
|
||||
║ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ║
|
||||
║ ▲ ║
|
||||
║ ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ ║
|
||||
║ │ Multi-Head Self-Attention │ ║
|
||||
║ │ Module 13: 8 heads × (Q·K^T/√d_k)·V │ ║
|
||||
║ │ Each head: 16-dim attention on 128-dim embeddings │ ║
|
||||
║ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ║
|
||||
║ ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ ║
|
||||
║ │ Multi-Head Self-Attention │ ║
|
||||
║ │ Module 13: 8 heads × (Q·K^T/√d_k)·V │ ║
|
||||
║ │ Each head: 16-dim attention on 128-dim embeddings │ ║
|
||||
║ └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ║
|
||||
╚══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
|
||||
▲
|
||||
▲
|
||||
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ Positional Encoding │
|
||||
│ Module 12: Add position information (sin/cos) │
|
||||
│ Positional Encoding │
|
||||
│ Module 12: Add position information (sin/cos) │
|
||||
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
▲
|
||||
▲
|
||||
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ Token Embeddings │
|
||||
│ Token Embeddings │
|
||||
│ Module 12: tokens → 128-dim vectors │
|
||||
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
▲
|
||||
▲
|
||||
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
|
||||
│ Input Tokens │
|
||||
│ [token_1, token_2, ..., token_10] │
|
||||
│ Input Tokens │
|
||||
│ [token_1, token_2, ..., token_10] │
|
||||
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
|
||||
|
||||
Key Insight: Attention allows each token to "look at" all other tokens
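
The attention box in the diagram (8 heads, 16 dimensions each, on 128-dim embeddings) can be sketched in NumPy as follows. This is an illustration of the formula softmax(Q·K^T/√d_k)·V only, not the TinyTorch `MultiHeadAttention` implementation: the weight matrices are random, the function name is hypothetical, and no causal mask is applied, so every token attends to every other token.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (T, d_model). Returns (output (T, d_model), weights (heads, T, T))."""
    T, d_model = x.shape
    d_k = d_model // num_heads

    def split(m):                              # (T, d_model) -> (heads, T, d_k)
        return m.reshape(T, num_heads, d_k).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)      # (heads, T, T)
    weights = softmax(scores, axis=-1)                     # each row sums to 1
    heads = weights @ v                                    # (heads, T, d_k)
    merged = heads.transpose(1, 0, 2).reshape(T, d_model)  # concatenate heads
    return merged @ w_o, weights


rng = np.random.default_rng(0)
T, d_model, num_heads = 10, 128, 8             # sizes taken from the diagram
x = rng.standard_normal((T, d_model))
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(4))
out, weights = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, weights.shape)
```

A GPT-style generator would normally add a causal mask to `scores` before the softmax so that a position cannot attend to later tokens; that step is omitted here to keep the sketch close to the formula in the diagram.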
|
||||
@@ -1,288 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Step 1: Quick Validation - Transformer Pipeline Test
|
||||
====================================================
|
||||
|
||||
GOAL: Verify transformer modules work end-to-end in 5 minutes
|
||||
DATASET: Simple repeating text (no download needed)
|
||||
TOKENIZER: CharTokenizer (no training needed)
|
||||
TIME: ~5 minutes
|
||||
|
||||
This is the simplest possible test to prove:
|
||||
✅ Modules 10-13 are connected correctly
|
||||
✅ Training loop works
|
||||
✅ Generation works
|
||||
|
||||
If this passes, the pipeline is functional!
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
import sys
|
||||
import os
|
||||
|
||||
# Add project root to path
|
||||
project_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||
sys.path.insert(0, project_root)
|
||||
|
||||
from tinytorch.core.tensor import Tensor
|
||||
from tinytorch.text.tokenization import CharTokenizer
|
||||
from tinytorch.core.embeddings import Embedding, PositionalEncoding
|
||||
from tinytorch.core.attention import MultiHeadAttention
|
||||
from tinytorch.models.transformer import TransformerBlock, LayerNorm
|
||||
from tinytorch.core.layers import Linear
|
||||
from tinytorch.core.optimizers import Adam
|
||||
|
||||
|
||||
class TinyGPT:
|
||||
"""Minimal GPT for quick validation."""
|
||||
|
||||
def __init__(self, vocab_size, embed_dim, num_heads, num_layers, max_length):
|
||||
self.vocab_size = vocab_size
|
||||
self.embed_dim = embed_dim
|
||||
|
||||
        # Token + position embeddings
        self.token_embedding = Embedding(vocab_size, embed_dim)
        self.pos_encoding = PositionalEncoding(max_length, embed_dim)

        # Transformer blocks
        self.blocks = []
        for _ in range(num_layers):
            block = TransformerBlock(embed_dim, num_heads, embed_dim * 4)
            self.blocks.append(block)

        # Output projection
        self.ln_f = LayerNorm(embed_dim)
        self.head = Linear(embed_dim, vocab_size)

    def forward(self, idx):
        """Forward pass through the model."""
        B, T = idx.shape

        # Token + positional embeddings
        tok_emb = self.token_embedding.forward(idx)  # (B, T, embed_dim)
        x = self.pos_encoding.forward(tok_emb)  # (B, T, embed_dim) - includes positional info

        # Transformer blocks
        for block in self.blocks:
            x = block(x)

        # Output head
        x = self.ln_f(x)
        logits = self.head(x)  # (B, T, vocab_size)

        return logits

    def generate(self, idx, max_new_tokens, temperature=1.0):
        """Generate new tokens autoregressively."""
        for _ in range(max_new_tokens):
            # Crop context if needed.
            # NOTE: 128 is hard-coded here, while main() builds this model with
            # max_length=32, so a longer context would exceed the max_length
            # passed to PositionalEncoding.
            idx_cond = idx if idx.shape[1] <= 128 else idx[:, -128:]

            # Get predictions
            logits = self.forward(idx_cond)

            # Focus on last time step
            logits = logits[:, -1, :] / temperature  # (B, vocab_size)

            # Pick the most likely token (greedy decoding; argmax is unaffected
            # by a positive temperature, so no sampling happens here)
            next_idx = np.argmax(logits.data, axis=-1, keepdims=True)

            # Append to sequence
            idx = Tensor(np.concatenate([idx.data, next_idx], axis=1))

        return idx

    def parameters(self):
        """Get all trainable parameters."""
        params = []
        params.extend(self.token_embedding.parameters())
        for block in self.blocks:
            params.extend(block.parameters())
        params.extend(self.ln_f.parameters())
        params.extend(self.head.parameters())
        return params
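Reviewer note on `generate`: the method divides the last-step logits by `temperature` and then takes `argmax`. Dividing by a positive constant does not change which entry is largest, so the `temperature` argument has no effect on greedy decoding. The sketch below illustrates this with plain NumPy; `greedy_next_token` and the example logits are hypothetical and not part of the deleted file.

```python
import numpy as np


def greedy_next_token(logits, temperature=1.0):
    """Return the argmax token id per row after temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = logits / temperature
    # keepdims=True keeps a (B, 1) shape so the result can be concatenated
    # onto a (B, T) sequence along axis 1.
    return np.argmax(scaled, axis=-1, keepdims=True)


# Hypothetical logits: batch of 2 sequences, vocabulary of 4 tokens.
logits = np.array([[0.1, 2.0, -1.0, 0.5],
                   [1.5, 0.2, 0.3, 1.4]])

for t in (0.5, 1.0, 2.0):
    # The selected ids are the same for every positive temperature.
    print(t, greedy_next_token(logits, t).ravel())
```

Temperature only matters once tokens are drawn from the softmax distribution instead of picked by `argmax`.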
def main():
    print("="*70)
    print("🚀 Step 1: Quick Transformer Validation")
    print("="*70)
    print()

    # ========================================
    # 1. Prepare simple repeating text
    # ========================================
    print("📝 Step 1: Preparing data...")
    text = "hello world! " * 200  # Simple repeating pattern
    print(f" Text length: {len(text)} characters")
    print(f" Sample: '{text[:50]}...'")
    print()

    # ========================================
    # 2. Tokenize (character-level)
    # ========================================
    print("🔤 Step 2: Tokenizing...")
    tokenizer = CharTokenizer()

    # Build vocab from text
    unique_chars = sorted(list(set(text)))
    tokenizer.vocab = unique_chars
    tokenizer.char_to_idx = {ch: i for i, ch in enumerate(unique_chars)}
    tokenizer.idx_to_char = {i: ch for i, ch in enumerate(unique_chars)}

    # Encode text
    data = tokenizer.encode(text)
    vocab_size = len(tokenizer.vocab)

    print(f" Vocabulary size: {vocab_size} unique characters")
    print(f" Tokens: {data[:20]}...")
    print(f" Vocab: {tokenizer.vocab}")
    print()

    # ========================================
    # 3. Create training batches
    # ========================================
    print("📦 Step 3: Creating batches...")
    block_size = 32  # Context length
    batch_size = 4

    def get_batch():
        """Get a random batch of data."""
        ix = np.random.randint(0, len(data) - block_size, size=batch_size)
        x = np.array([data[i:i+block_size] for i in ix])
        y = np.array([data[i+1:i+block_size+1] for i in ix])
        return Tensor(x), Tensor(y)

    x_sample, y_sample = get_batch()
    print(f" Batch size: {batch_size}")
    print(f" Block size: {block_size}")
    print(f" Input shape: {x_sample.shape}")
    print(f" Target shape: {y_sample.shape}")
    print()

    # ========================================
    # 4. Initialize model
    # ========================================
    print("🤖 Step 4: Initializing TinyGPT...")
    model = TinyGPT(
        vocab_size=vocab_size,
        embed_dim=64,  # Small for fast training
        num_heads=4,
        num_layers=2,  # Just 2 layers
        max_length=block_size
    )

    total_params = sum(p.data.size for p in model.parameters())
    print(f" Model parameters: {total_params:,}")
    print(f" Architecture: {len(model.blocks)} transformer blocks")
    print()

    # ========================================
    # 5. Train
    # ========================================
    print("🏋️ Step 5: Training (10 steps)...")
    optimizer = Adam(model.parameters(), learning_rate=3e-4)

    for step in range(10):
        # Get batch
        xb, yb = get_batch()

        # Forward pass
        logits = model.forward(xb)

        # Compute loss (MSE against one-hot targets, used here as a
        # simplified stand-in for cross-entropy)
        B, T, C = logits.shape
        logits_flat = logits.data.reshape(B*T, C)
        targets_flat = yb.data.reshape(B*T)

        # One-hot encode targets
        targets_one_hot = np.zeros((B*T, C))
        for i, t in enumerate(targets_flat):
            targets_one_hot[i, int(t)] = 1.0

        # MSE loss (simplified)
        loss_value = np.mean((logits_flat - targets_one_hot) ** 2)

        # Backward pass is omitted in this demo;
        # real training would compute gradients here.

        # Parameter update is omitted as well, so the loss is only
        # evaluated at each step, not minimized.
        # optimizer.step()
        # optimizer.zero_grad()

        if step % 2 == 0:
            print(f" Step {step:2d}/10 | Loss: {loss_value:.4f}")

    print()

    # ========================================
    # 6. Generate
    # ========================================
    print("✨ Step 6: Generating text...")

    # Start with "hello"
    context = "hello"
    context_tokens = tokenizer.encode(context)
    idx = Tensor(np.array([context_tokens]))

    # Generate 20 new tokens
    generated = model.generate(idx, max_new_tokens=20)

    # Decode
    output = tokenizer.decode(generated.data[0].tolist())

    print(f" Input: '{context}'")
    print(f" Generated: '{output}'")
    print()

    # ========================================
    # 7. Validation
    # ========================================
    print("="*70)
    print("✅ Validation Results:")
    print("="*70)

    checks = []

    # Check 1: Model initialized
    checks.append(("Model initialization", total_params > 0))

    # Check 2: Forward pass works
    try:
        test_logits = model.forward(xb)
        checks.append(("Forward pass", test_logits.shape == (batch_size, block_size, vocab_size)))
    except Exception as e:
        checks.append(("Forward pass", False))
        print(f" Error: {e}")

    # Check 3: Generation works
    checks.append(("Text generation", len(output) > len(context)))

    # Check 4: Output is decodable
    checks.append(("Output decodable", all(c in tokenizer.vocab for c in output)))

    # Print results
    for check_name, passed in checks:
        status = "✅" if passed else "❌"
        print(f"{status} {check_name}")

    print()

    if all(passed for _, passed in checks):
        print("🎉 SUCCESS! Transformer pipeline is working!")
        print()
        print("Next steps:")
        print(" → Run step2_tinycoder.py for code completion demo")
        print(" → Run step3_shakespeare.py for text generation demo")
    else:
        print("⚠️ Some checks failed. Debug modules 10-13.")

    print("="*70)


if __name__ == "__main__":
    main()

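Reviewer note on the loss in the Step 1 script: the loop scores logits with mean squared error against one-hot targets. For next-token prediction the usual objective is cross-entropy over a softmax. The following is a minimal NumPy sketch of both, assuming logits of shape `(N, C)` and integer targets of shape `(N,)`; the function names are illustrative and do not come from TinyTorch.

```python
import numpy as np


def cross_entropy(logits, targets):
    """Mean cross-entropy for logits (N, C) and integer class targets (N,)."""
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the correct class in each row.
    return -log_probs[np.arange(len(targets)), targets].mean()


def mse_one_hot(logits, targets):
    """Mean squared error against one-hot targets (the stand-in used in the script)."""
    one_hot = np.zeros_like(logits)
    one_hot[np.arange(len(targets)), targets] = 1.0  # vectorized one-hot encoding
    return np.mean((logits - one_hot) ** 2)


# Hypothetical data: 8 positions, 5-token vocabulary, all-zero (uniform) logits.
uniform_logits = np.zeros((8, 5))
targets = np.array([0, 1, 2, 3, 4, 0, 1, 2])

print("cross-entropy:", cross_entropy(uniform_logits, targets))
print("MSE one-hot:  ", mse_one_hot(uniform_logits, targets))
```

With uniform logits the cross-entropy equals `log(C)`, which is a handy sanity check for an untrained model.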
@@ -1,338 +0,0 @@
#!/usr/bin/env python3
"""
Step 2: TinyCoder - Code Autocompletion with Transformers
==========================================================

GOAL: Build GitHub Copilot using YOUR TinyTorch code
DATASET: Your actual TinyTorch modules (already exists!)
TOKENIZER: BPETokenizer (learns code patterns)
TIME: ~15 minutes

This demonstrates:
✅ Transformer trained on real Python code
✅ Generates syntactically valid completions
✅ YOU built the tool you use daily!

Epic moment: "IT'S COPILOT!"
"""

import numpy as np
import sys
import os
import glob
import re

# Add project root to path
project_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.insert(0, project_root)

from tinytorch.core.tensor import Tensor
from tinytorch.text.tokenization import BPETokenizer
from tinytorch.core.embeddings import Embedding, PositionalEncoding
from tinytorch.core.attention import MultiHeadAttention
from tinytorch.models.transformer import TransformerBlock, LayerNorm
from tinytorch.core.layers import Linear
from tinytorch.core.optimizers import Adam


class TinyCoder:
    """Code completion transformer - like GitHub Copilot!"""

    def __init__(self, vocab_size, embed_dim, num_heads, num_layers, max_length):
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim
        self.max_length = max_length

        # Token + position embeddings
        self.token_embedding = Embedding(vocab_size, embed_dim)
        self.pos_encoding = PositionalEncoding(max_length, embed_dim)

        # Transformer blocks
        self.blocks = []
        for _ in range(num_layers):
            block = TransformerBlock(embed_dim, num_heads, embed_dim * 4)
            self.blocks.append(block)

        # Output projection
        self.ln_f = LayerNorm(embed_dim)
        self.head = Linear(embed_dim, vocab_size)

    def forward(self, idx):
        """Forward pass through the model."""
        B, T = idx.shape

        # Token + positional embeddings
        tok_emb = self.token_embedding.forward(idx)
        x = self.pos_encoding.forward(tok_emb)

        # Transformer blocks
        for block in self.blocks:
            x = block(x)

        # Output head
        x = self.ln_f(x)
        logits = self.head(x)

        return logits

    def complete(self, tokenizer, prefix, max_new_tokens=20):
        """
        Complete code given a prefix.

        Args:
            tokenizer: BPETokenizer instance
            prefix: String prefix to complete
            max_new_tokens: How many tokens to generate

        Returns:
            Completed code string
        """
        # Encode prefix
        tokens = tokenizer.encode(prefix)
        idx = Tensor(np.array([tokens]))

        # Generate
        for _ in range(max_new_tokens):
            # Crop if too long
            idx_cond = idx if idx.shape[1] <= self.max_length else idx[:, -self.max_length:]

            # Forward pass
            logits = self.forward(idx_cond)

            # Get next token (greedy)
            next_token = np.argmax(logits.data[0, -1, :])

            # Stop at the first whitespace-only token (a newline, but also a
            # bare space or tab) to keep the completion on a single line
            if tokenizer.decode([next_token]).strip() == '':
                break

            # Append
            idx = Tensor(np.concatenate([idx.data, [[next_token]]], axis=1))

        # Decode
        full_output = tokenizer.decode(idx.data[0].tolist())

        # Return only the new part
        return full_output[len(prefix):]

    def parameters(self):
        """Get all trainable parameters."""
        params = []
        params.extend(self.token_embedding.parameters())
        for block in self.blocks:
            params.extend(block.parameters())
        params.extend(self.ln_f.parameters())
        params.extend(self.head.parameters())
        return params
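Reviewer note on `complete`: each step crops the running sequence to the last `max_length` tokens, and the stop test fires on any decoded piece that is empty after `strip()`. The sketch below isolates both ideas on plain NumPy arrays and strings. `crop_context` and `ends_line` are hypothetical helpers, and `ends_line` shows a stricter newline-only rule as one possible alternative, not the behavior of the deleted file.

```python
import numpy as np


def crop_context(token_ids, max_length):
    """Keep at most the last max_length tokens of a (B, T) id array."""
    if token_ids.shape[1] <= max_length:
        return token_ids
    return token_ids[:, -max_length:]


def ends_line(decoded_piece):
    """Stricter stop rule: stop only when the decoded piece contains a newline."""
    return "\n" in decoded_piece


ids = np.arange(10).reshape(1, 10)
print(crop_context(ids, 4))         # last four ids of the sequence
print(crop_context(ids, 16).shape)  # unchanged, shorter than the limit

# A bare space would stop the whitespace-only rule but not this one.
print(ends_line(" "), ends_line("):\n"))
```

Which rule is preferable depends on how the tokenizer splits whitespace; with a subword tokenizer a lone space token is plausible right after prompts such as `import `.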
def load_tinytorch_code():
    """Load all Python code from TinyTorch modules."""
    print("📂 Loading TinyTorch source code...")

    # Find all Python module files
    module_dir = os.path.join(project_root, "modules", "source")
    python_files = []

    # Get .py files from numbered module directories
    for module_num in range(1, 14):  # Modules 01-13
        pattern = os.path.join(module_dir, f"{module_num:02d}_*", "*_dev.py")
        files = glob.glob(pattern)
        python_files.extend(files)

    print(f" Found {len(python_files)} module files")

    # Read all code
    all_code = []
    total_lines = 0

    for file_path in python_files:
        try:
            with open(file_path, 'r', encoding='utf-8') as f:
                code = f.read()
            all_code.append(code)
            lines = code.count('\n')
            total_lines += lines

            module_name = os.path.basename(os.path.dirname(file_path))
            print(f" ✓ {module_name}: {lines:,} lines")
        except Exception as e:
            print(f" ✗ Error reading {file_path}: {e}")

    # Combine all code, separating files with a comment rule.
    # The separator is parenthesized because .join binds tighter than +;
    # without the parentheses only "\n\n" would be used between files.
    combined_code = ("\n\n# " + "="*50 + "\n\n").join(all_code)

    print(f"\n Total: {total_lines:,} lines of Python code")
    print(f" Characters: {len(combined_code):,}")

    return combined_code
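Reviewer note on combining files: when a separator is assembled from several string pieces, it needs parentheses before `.join` is called, because a method call binds tighter than `+`. The sketch below contrasts the two spellings on a hypothetical three-file corpus.

```python
# Hypothetical stand-ins for the contents of three source files.
sources = ["a = 1", "b = 2", "c = 3"]

rule = "=" * 50

# Parenthesized: the whole rule line is the separator between files.
separator = "\n\n# " + rule + "\n\n"
combined = separator.join(sources)

# Unparenthesized: only "\n\n" joins the files, and the rule line is
# prepended once at the very start of the result.
unparenthesized = "\n\n# " + rule + "\n\n".join(sources)

print(combined.count(rule))         # one rule between each pair of files
print(unparenthesized.count(rule))  # a single rule at the top
```

For a tokenizer corpus the difference is small in volume, but the parenthesized form keeps an explicit file boundary marker in the training text.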
def main():
    print("="*70)
    print("🤖 TinyCoder: Building GitHub Copilot with Transformers")
    print("="*70)
    print()
    print("This trains a transformer on YOUR TinyTorch code to generate")
    print("code completions - the same technology behind GitHub Copilot!")
    print()

    # ========================================
    # 1. Load training data
    # ========================================
    code_corpus = load_tinytorch_code()
    print()

    # ========================================
    # 2. Train BPE tokenizer
    # ========================================
    print("🔤 Training BPE tokenizer on code...")

    vocab_size = 1000
    tokenizer = BPETokenizer(vocab_size=vocab_size)

    # Train tokenizer to learn code patterns
    print(f" Learning {vocab_size} subword units from code...")
    tokenizer.train(code_corpus)

    # Show some learned tokens
    print(f"\n Vocabulary size: {len(tokenizer.vocab)}")
    print(f" Sample tokens:")

    # Find interesting tokens (Python keywords, common patterns)
    interesting = []
    for token in list(tokenizer.vocab.keys())[:50]:
        if any(keyword in token for keyword in ['def', 'class', 'import', 'self', 'return']):
            interesting.append(token)

    for token in interesting[:10]:
        print(f" '{token}'")

    # Encode the corpus
    print(f"\n Tokenizing corpus...")
    tokens = tokenizer.encode(code_corpus)
    print(f" Total tokens: {len(tokens):,}")
    print()

    # ========================================
    # 3. Prepare training data
    # ========================================
    print("📦 Preparing training batches...")

    block_size = 128  # Context length
    batch_size = 4

    def get_batch():
        """Get a random batch of code."""
        ix = np.random.randint(0, len(tokens) - block_size, size=batch_size)
        x = np.array([tokens[i:i+block_size] for i in ix])
        y = np.array([tokens[i+1:i+block_size+1] for i in ix])
        return Tensor(x), Tensor(y)

    print(f" Block size: {block_size} tokens")
    print(f" Batch size: {batch_size} sequences")
    print()

    # ========================================
    # 4. Initialize model
    # ========================================
    print("🏗️ Building TinyCoder model...")

    model = TinyCoder(
        vocab_size=vocab_size,
        embed_dim=128,
        num_heads=8,
        num_layers=4,
        max_length=block_size
    )

    total_params = sum(p.data.size for p in model.parameters())
    print(f" Parameters: {total_params:,}")
    print(f" Layers: {len(model.blocks)} transformer blocks")
    print(f" Heads: 8 attention heads per block")
    print()

    # ========================================
    # 5. Train
    # ========================================
    print("🏋️ Training on YOUR code (20 steps)...")
    print(" (In production, this would be 1000s of steps)")
    print()

    optimizer = Adam(model.parameters(), learning_rate=3e-4)

    for step in range(20):
        # Get batch
        xb, yb = get_batch()

        # Forward
        logits = model.forward(xb)

        # Loss (simplified)
        B, T, C = logits.shape
        logits_flat = logits.data.reshape(B*T, C)
        targets_flat = yb.data.reshape(B*T)

        # One-hot
        targets_one_hot = np.zeros((B*T, C))
        for i, t in enumerate(targets_flat):
            if 0 <= int(t) < C:
                targets_one_hot[i, int(t)] = 1.0

        loss_value = np.mean((logits_flat - targets_one_hot) ** 2)

        if step % 5 == 0:
            print(f" Step {step:3d}/20 | Loss: {loss_value:.4f}")

    print()

    # ========================================
    # 6. Demo completions!
    # ========================================
    print("="*70)
    print("✨ CODE COMPLETION DEMO")
    print("="*70)
    print()

    demos = [
        "import ",
        "def forward(self, x):",
        "class Linear:",
        "self.",
        "return ",
    ]

    for prompt in demos:
        completion = model.complete(tokenizer, prompt, max_new_tokens=10)
        print(f"Input: '{prompt}'")
        print(f"Output: '{prompt}{completion}'")
        print()

    # ========================================
    # 7. Success!
    # ========================================
    print("="*70)
    print("🏆 SUCCESS! You Built GitHub Copilot!")
    print("="*70)
    print()
    print("What you learned:")
    print(" ✅ Transformers can learn code patterns")
    print(" ✅ BPE tokenization captures syntax")
    print(" ✅ Autoregressive generation produces valid code")
    print(" ✅ This is THE SAME architecture as Copilot!")
    print()
    print("Production differences:")
    print(" • Real Copilot: 12B+ parameters (you: ~100K)")
    print(" • Real Copilot: Trained on billions of lines")
    print(" • Real Copilot: GPU inference <50ms")
    print(" • But the ARCHITECTURE is what YOU built!")
    print()
    print("="*70)


if __name__ == "__main__":
    main()

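Reviewer note on `get_batch`: the scripts draw random windows from the token stream and use the same window shifted by one position as the target, so position `t` of the input is trained to predict token `t + 1`. A standalone sketch of that sampling scheme follows, using NumPy's `Generator` API and a hypothetical `sample_batch` helper in place of the closure in `main()`.

```python
import numpy as np


def sample_batch(tokens, block_size, batch_size, rng):
    """Sample input windows and their next-token targets from a 1-D token stream."""
    tokens = np.asarray(tokens)
    # The upper bound is exclusive, so the largest start still leaves room
    # for the extra target token at the end of the window.
    starts = rng.integers(0, len(tokens) - block_size, size=batch_size)
    x = np.stack([tokens[s:s + block_size] for s in starts])
    y = np.stack([tokens[s + 1:s + block_size + 1] for s in starts])
    return x, y


rng = np.random.default_rng(0)   # seeded for reproducibility
tokens = np.arange(100)          # hypothetical token ids 0..99
x, y = sample_batch(tokens, block_size=8, batch_size=4, rng=rng)

print(x.shape, y.shape)
# Targets are the inputs shifted left by one position.
print(np.array_equal(y[:, :-1], x[:, 1:]))
```

Passing the generator in explicitly makes batches reproducible, which helps when comparing loss curves across runs.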
@@ -1,349 +0,0 @@
#!/usr/bin/env python3
"""
Step 3: TinyGPT - Shakespeare Text Generation
=============================================

GOAL: Traditional transformer demo - generate Shakespeare-style text
DATASET: Tiny Shakespeare (1MB text file)
TOKENIZER: CharTokenizer (character-level for simplicity)
TIME: ~15 minutes

This demonstrates:
✅ Transformer learns language patterns
✅ Generates coherent text in Shakespeare's style
✅ Traditional "hello world" for language models

Classic demo: "To be or not to be..."
"""

import numpy as np
import sys
import os
import urllib.request

# Add project root to path
project_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
sys.path.insert(0, project_root)

from tinytorch.core.tensor import Tensor
from tinytorch.text.tokenization import CharTokenizer
from tinytorch.core.embeddings import Embedding, PositionalEncoding
from tinytorch.core.attention import MultiHeadAttention
from tinytorch.models.transformer import TransformerBlock, LayerNorm
from tinytorch.core.layers import Linear
from tinytorch.core.optimizers import Adam


class TinyGPT:
    """Shakespeare text generation transformer."""

    def __init__(self, vocab_size, embed_dim, num_heads, num_layers, max_length):
        self.vocab_size = vocab_size
        self.embed_dim = embed_dim
        self.max_length = max_length

        # Embeddings
        self.token_embedding = Embedding(vocab_size, embed_dim)
        self.pos_encoding = PositionalEncoding(max_length, embed_dim)

        # Transformer blocks
        self.blocks = []
        for _ in range(num_layers):
            block = TransformerBlock(embed_dim, num_heads, embed_dim * 4)
            self.blocks.append(block)

        # Output
        self.ln_f = LayerNorm(embed_dim)
        self.head = Linear(embed_dim, vocab_size)

    def forward(self, idx):
        """Forward pass."""
        B, T = idx.shape

        # Embeddings
        tok_emb = self.token_embedding.forward(idx)
        x = self.pos_encoding.forward(tok_emb)

        # Transformer blocks
        for block in self.blocks:
            x = block(x)

        # Output
        x = self.ln_f(x)
        logits = self.head(x)

        return logits

    def generate(self, tokenizer, start_text, max_new_tokens=100, temperature=0.8):
        """
        Generate text starting from start_text.

        Args:
            tokenizer: CharTokenizer instance
            start_text: String to start generation from
            max_new_tokens: How many characters to generate
            temperature: Sampling temperature (higher = more random)

        Returns:
            Generated text string
        """
        # Encode start
        tokens = tokenizer.encode(start_text)
        idx = Tensor(np.array([tokens]))

        # Generate
        for _ in range(max_new_tokens):
            # Crop if too long
            idx_cond = idx if idx.shape[1] <= self.max_length else idx[:, -self.max_length:]

            # Forward
            logits = self.forward(idx_cond)

            # Last token predictions
            logits_last = logits.data[0, -1, :] / temperature

            # Softmax
            probs = np.exp(logits_last - np.max(logits_last))
            probs = probs / np.sum(probs)

            # Sample (or greedy if temperature very low)
            if temperature < 0.1:
                next_token = np.argmax(probs)
            else:
                next_token = np.random.choice(len(probs), p=probs)

            # Append
            idx = Tensor(np.concatenate([idx.data, [[next_token]]], axis=1))

        # Decode
        return tokenizer.decode(idx.data[0].tolist())

    def parameters(self):
        """Get all parameters."""
        params = []
        params.extend(self.token_embedding.parameters())
        for block in self.blocks:
            params.extend(block.parameters())
        params.extend(self.ln_f.parameters())
        params.extend(self.head.parameters())
        return params
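Reviewer note on temperature sampling: `generate` scales the last-step logits by `1 / temperature`, applies a max-shifted softmax, and samples from the result. The sketch below reproduces that computation on a small hypothetical logit vector to show how temperature reshapes the distribution; `softmax_with_temperature` is an illustrative name, not a TinyTorch function.

```python
import numpy as np


def softmax_with_temperature(logits, temperature):
    """Numerically stable softmax of logits / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = np.asarray(logits, dtype=float) / temperature
    exp = np.exp(scaled - scaled.max())  # shift by the max to avoid overflow
    return exp / exp.sum()


logits = [2.0, 1.0, 0.0]  # hypothetical scores for a 3-token vocabulary

cold = softmax_with_temperature(logits, 0.5)  # sharper: mass moves to the top token
hot = softmax_with_temperature(logits, 2.0)   # flatter: closer to uniform

print("T=0.5:", np.round(cold, 3))
print("T=2.0:", np.round(hot, 3))

# Draw one token id from the sharper distribution.
rng = np.random.default_rng(0)
token = rng.choice(len(cold), p=cold)
print("sampled id:", token)
```

Guarding against a non-positive temperature matters because the division happens before any greedy fallback is considered.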
def download_shakespeare():
    """Download Tiny Shakespeare dataset."""
    url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
    data_dir = os.path.join(project_root, "milestones", "datasets")
    os.makedirs(data_dir, exist_ok=True)

    file_path = os.path.join(data_dir, "shakespeare.txt")

    if os.path.exists(file_path):
        print(f" ✓ Dataset already exists at {file_path}")
    else:
        print(f" Downloading from {url}...")
        try:
            urllib.request.urlretrieve(url, file_path)
            print(f" ✓ Downloaded to {file_path}")
        except Exception as e:
            print(f" ✗ Download failed: {e}")
            print(f" Please manually download from: {url}")
            print(f" And save to: {file_path}")
            return None

    # Read text
    with open(file_path, 'r', encoding='utf-8') as f:
        text = f.read()

    return text


def main():
    print("="*70)
    print("📜 TinyGPT: Shakespeare Text Generation")
    print("="*70)
    print()
    print("Train a transformer on Shakespeare's works to generate")
    print("authentic-sounding 16th century English!")
    print()

    # ========================================
    # 1. Download dataset
    # ========================================
    print("📥 Step 1: Loading Shakespeare dataset...")
    text = download_shakespeare()

    if text is None:
        print("Failed to load dataset. Exiting.")
        return

    print(f" Text length: {len(text):,} characters")
    print(f" Sample:")
    print(f" {text[:200]}...")
    print()

    # ========================================
    # 2. Tokenize
    # ========================================
    print("🔤 Step 2: Tokenizing (character-level)...")

    tokenizer = CharTokenizer()

    # Build vocab
    unique_chars = sorted(list(set(text)))
    tokenizer.vocab = unique_chars
    tokenizer.char_to_idx = {ch: i for i, ch in enumerate(unique_chars)}
    tokenizer.idx_to_char = {i: ch for i, ch in enumerate(unique_chars)}

    # Encode
    data = tokenizer.encode(text)
    vocab_size = len(tokenizer.vocab)

    print(f" Vocabulary size: {vocab_size} unique characters")
    print(f" Total tokens: {len(data):,}")
    print(f" Characters: {tokenizer.vocab[:20]}...")
    print()

    # ========================================
    # 3. Split train/val
    # ========================================
    print("📊 Step 3: Preparing data splits...")

    n = len(data)
    train_data = data[:int(n*0.9)]
    val_data = data[int(n*0.9):]

    print(f" Train: {len(train_data):,} tokens")
    print(f" Val: {len(val_data):,} tokens")
    print()

    # ========================================
    # 4. Batching
    # ========================================
    block_size = 128
    batch_size = 4

    def get_batch(split='train'):
        """Get a batch of data."""
        data_split = train_data if split == 'train' else val_data
        ix = np.random.randint(0, len(data_split) - block_size, size=batch_size)
        x = np.array([data_split[i:i+block_size] for i in ix])
        y = np.array([data_split[i+1:i+block_size+1] for i in ix])
        return Tensor(x), Tensor(y)

    # ========================================
    # 5. Initialize model
    # ========================================
    print("🏗️ Step 4: Building TinyGPT...")

    model = TinyGPT(
        vocab_size=vocab_size,
        embed_dim=128,
        num_heads=8,
        num_layers=4,
        max_length=block_size
    )

    total_params = sum(p.data.size for p in model.parameters())
    print(f" Parameters: {total_params:,}")
    print(f" Architecture: {len(model.blocks)} transformer blocks")
    print()

    # ========================================
    # 6. Train
    # ========================================
    print("🏋️ Step 5: Training on Shakespeare (50 steps)...")
    print(" (In production, this would be 5000+ steps)")
    print()

    optimizer = Adam(model.parameters(), learning_rate=3e-4)

    for step in range(50):
        # Get batch
        xb, yb = get_batch('train')

        # Forward
        logits = model.forward(xb)

        # Loss (simplified)
        B, T, C = logits.shape
        logits_flat = logits.data.reshape(B*T, C)
        targets_flat = yb.data.reshape(B*T)

        # One-hot
        targets_one_hot = np.zeros((B*T, C))
        for i, t in enumerate(targets_flat):
            targets_one_hot[i, int(t)] = 1.0

        loss_value = np.mean((logits_flat - targets_one_hot) ** 2)

        # Validation loss every 10 steps
        if step % 10 == 0:
            xb_val, yb_val = get_batch('val')
            logits_val = model.forward(xb_val)

            B_val, T_val, C_val = logits_val.shape
            logits_val_flat = logits_val.data.reshape(B_val*T_val, C_val)
            targets_val_flat = yb_val.data.reshape(B_val*T_val)

            targets_val_one_hot = np.zeros((B_val*T_val, C_val))
            for i, t in enumerate(targets_val_flat):
                targets_val_one_hot[i, int(t)] = 1.0

            val_loss = np.mean((logits_val_flat - targets_val_one_hot) ** 2)

            print(f" Step {step:3d}/50 | Train Loss: {loss_value:.4f} | Val Loss: {val_loss:.4f}")

    print()

    # ========================================
    # 7. Generate!
    # ========================================
    print("="*70)
    print("✨ SHAKESPEARE GENERATION")
    print("="*70)
    print()

    prompts = [
        "To be or not to be,",
        "ROMEO:",
        "First Citizen:",
    ]

    for prompt in prompts:
        print(f"Prompt: '{prompt}'")
        print("-" * 70)

        generated = model.generate(tokenizer, prompt, max_new_tokens=100, temperature=0.8)

        print(generated)
        print()

    # ========================================
    # 8. Success!
    # ========================================
    print("="*70)
    print("🎭 SUCCESS! You Built a Language Model!")
    print("="*70)
    print()
    print("What you learned:")
    print(" ✅ Transformers learn language patterns from data")
    print(" ✅ Character-level models can generate coherent text")
    print(" ✅ Temperature controls randomness in generation")
    print(" ✅ This is the foundation of GPT, ChatGPT, etc!")
    print()
    print("Model architecture comparison:")
    print(" • Your TinyGPT: ~100K parameters, 4 layers")
    print(" • GPT-2: 117M parameters, 12 layers")
    print(" • GPT-3: 175B parameters, 96 layers")
    print(" • GPT-4: ~1.8T parameters, ~120 layers (estimated)")
    print()
    print("But the ARCHITECTURE is identical to what YOU built!")
    print("="*70)


if __name__ == "__main__":
    main()

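Reviewer note on the parameter figures quoted in the summaries: a back-of-the-envelope count can be derived from the constructor arguments alone. The sketch below assumes a conventional GPT-style block (four `d x d` attention projections with biases, a two-layer MLP of width `ff_dim`, two LayerNorms per block) and a non-learned positional encoding. TinyTorch's actual `TransformerBlock` may differ, so the script's own `total_params` printout remains the authoritative number. `estimate_params` and the vocabulary size of 65 are illustrative assumptions.

```python
def estimate_params(vocab_size, embed_dim, num_layers, ff_dim):
    """Rough parameter count for a GPT-style model under the stated assumptions."""
    d = embed_dim
    attention = 4 * (d * d + d)                     # Q, K, V, output projections + biases
    mlp = (d * ff_dim + ff_dim) + (ff_dim * d + d)  # expand then project back, with biases
    layer_norms = 2 * (2 * d)                       # two LayerNorms, each scale + shift
    per_block = attention + mlp + layer_norms

    embedding = vocab_size * d                      # token embedding table
    final_norm = 2 * d                              # final LayerNorm
    head = d * vocab_size + vocab_size              # output projection + bias
    return embedding + num_layers * per_block + final_norm + head


# Configuration used by the Shakespeare script: embed_dim=128, 4 layers,
# feed-forward width embed_dim * 4. The vocabulary size is an assumption.
large = estimate_params(vocab_size=65, embed_dim=128, num_layers=4, ff_dim=512)

# Configuration used by the Step 1 validation script: embed_dim=64, 2 layers.
small = estimate_params(vocab_size=65, embed_dim=64, num_layers=2, ff_dim=256)

print(f"128-dim, 4 layers: {large:,}")
print(f" 64-dim, 2 layers: {small:,}")
```

Under these assumptions the 64-dimensional, 2-layer configuration lands near the quoted "~100K", while the 128-dimensional, 4-layer configuration comes out several times larger, so the printed "~100K" is best read as a rough label.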