# Team Onboarding Guide: TinyTorch for Industry
Complete guide for using TinyTorch in industry settings: new hire bootcamps, internal training programs, and debugging workshops.

## 🎯 Overview

TinyTorch's **Model 3: Team Onboarding** addresses industry use cases where ML teams want their members to understand PyTorch internals. This guide covers deployment scenarios, training structures, and best practices for industry adoption.

## 🚀 Use Cases

### 1. New Hire Bootcamps (2-3 Week Intensive)

**Goal**: Rapidly bring new ML engineers up to speed on framework internals
**Structure**:

- **Week 1**: Foundation Tier (Modules 01-07)
  - Tensors, autograd, optimizers, training loops
  - Focus: Understanding `loss.backward()` mechanics (see the sketch after this list)
- **Week 2**: Architecture Tier (Modules 08-13)
  - CNNs, transformers, attention mechanisms
  - Focus: Production architecture internals
- **Week 3**: Optimization Tier (Modules 14-19) OR Capstone
  - Profiling, quantization, compression
  - Focus: Production optimization techniques

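For orientation, here is a minimal sketch of what `loss.backward()` does, written against PyTorch for illustration; in Module 05, trainees build this machinery themselves in TinyTorch.

```python
import torch

# Leaf tensors: autograd records every op performed on them.
x = torch.randn(4, 3, requires_grad=True)
w = torch.randn(3, 1, requires_grad=True)

y = x @ w               # each op extends the computational graph
loss = (y ** 2).mean()  # scalar output

loss.backward()         # traverse the graph in reverse, applying the
                        # chain rule to accumulate d(loss)/d(leaf)

print(x.grad.shape, w.grad.shape)  # gradients are now populated
```
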
**Schedule**:

- Full-time: 40 hours/week
- Hands-on coding: 70% of time
- Systems discussions: 30% of time
- Daily standups and code reviews

**Deliverables**:

- Completed modules with passing tests
- Capstone project (optional)
- Technical presentation on framework internals

### 2. Internal Training Programs (Distributed Over Quarters)

**Goal**: Deep understanding of ML systems for existing team members

**Structure**:

- **Quarter 1**: Foundation (Modules 01-07)
  - Weekly sessions: 2-3 hours
  - Self-paced module completion
  - Monthly group discussions
- **Quarter 2**: Architecture (Modules 08-13)
  - Weekly sessions: 2-3 hours
  - Architecture deep-dives
  - Production case studies
- **Quarter 3**: Optimization (Modules 14-19)
  - Weekly sessions: 2-3 hours
  - Performance optimization focus
  - Real production optimization projects

**Benefits**:

- Fits into existing work schedules
- Allows in-depth learning without an intensive time commitment
- Builds team knowledge gradually
- Enables peer learning

### 3. Debugging Workshops (Focused Modules)

**Goal**: Targeted understanding of specific framework components

**Common Focus Areas**:

#### Autograd Debugging Workshop (Module 05)

- Understanding gradient flow
- Debugging gradient issues (see the sketch below)
- Computational graph visualization
- **Duration**: 1-2 days

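A taste of the gradient-flow checks practiced in this workshop; the sketch uses PyTorch, but the same diagnostic applies to the autograd built in Module 05.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.Sigmoid(), nn.Linear(8, 1))
loss = model(torch.randn(16, 8)).pow(2).mean()
loss.backward()

# Inspect gradient flow layer by layer: a missing grad points to a detached
# subgraph; a near-zero norm points to vanishing gradients (e.g., saturation).
for name, param in model.named_parameters():
    if param.grad is None:
        print(f"{name}: NO gradient - is this parameter in the graph?")
    else:
        print(f"{name}: grad norm = {param.grad.norm():.3e}")
```
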
#### Attention Mechanism Workshop (Module 12)

- Understanding attention internals
- Debugging attention scaling issues (see the sketch below)
- Memory optimization for attention
- **Duration**: 1-2 days

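For reference, a NumPy sketch of the scaled dot-product attention at the heart of Module 12. The `1/sqrt(d_k)` factor is exactly the "scaling issue" above: dropping it saturates the softmax for large `d_k` and starves gradients.

```python
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # scale before softmax
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (seq_len, d_k)

Q = K = V = np.random.randn(5, 64)  # 5 tokens, d_k = 64
out = attention(Q, K, V)
```
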
#### Optimization Workshop (Modules 14-19)

- Profiling production models
- Quantization and compression (see the sketch below)
- Performance optimization strategies
- **Duration**: 2-3 days

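As a flavor of the exercises, here is a toy symmetric int8 post-training quantizer. This is illustrative only; the modules' actual API and quantization scheme may differ.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: float32 -> int8 plus a scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)                  # 4x smaller than float32
error = np.abs(w - dequantize(q, scale)).max()
print(f"max reconstruction error: {error:.4f}")
```
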
## 🏗️ Deployment Scenarios

### Scenario 1: Cloud-Based Training (Recommended)

**Setup**: Google Colab or JupyterHub

- Zero local installation
- Consistent environment
- Easy sharing and collaboration
- **Best for**: Large teams, remote workers

**Steps** (example Colab cell below):

1. Clone repository to Colab
2. Install dependencies: `pip install -e .`
3. Work through modules
4. Share notebooks via Colab links

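A minimal Colab cell covering steps 1-2 might look like this (the `!` and `%` prefixes are Colab shell magics; adjust the URL if your team uses a fork):

```bash
!git clone https://github.com/mlsysbook/TinyTorch.git
%cd TinyTorch
!pip install -e .
```
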
### Scenario 2: Local Development Environment

**Setup**: Local Python environment

- Full control over environment
- Better for debugging
- Offline capability
- **Best for**: Smaller teams, on-site training

**Steps**:

1. Clone repository locally
2. Set up virtual environment
3. Install: `pip install -e .`
4. Use JupyterLab for development

### Scenario 3: Hybrid Approach

**Setup**: Colab for learning, local for projects

- Learn in cloud environment
- Apply locally for projects
- **Best for**: Flexible teams

## 📋 Training Program Templates

### Template 1: 2-Week Intensive Bootcamp

**Week 1: Foundation**

- Day 1-2: Modules 01-02 (Tensor, Activations)
- Day 3-4: Modules 03-04 (Layers, Losses)
- Day 5: Module 05 (Autograd) - Full day focus
- Weekend: Review and practice

**Week 2: Architecture + Optimization**

- Day 1-2: Modules 08-09 (DataLoader, CNNs)
- Day 3: Module 12 (Attention)
- Day 4-5: Modules 14-15 (Profiling, Quantization)
- Final: Capstone project presentation

### Template 2: 3-Month Distributed Program

**Month 1: Foundation**

- Week 1: Modules 01-02
- Week 2: Modules 03-04
- Week 3: Module 05 (Autograd)
- Week 4: Modules 06-07 (Optimizers, Training)

**Month 2: Architecture**

- Week 1: Modules 08-09
- Week 2: Modules 10-11
- Week 3: Modules 12-13
- Week 4: Integration project

**Month 3: Optimization**

- Week 1: Modules 14-15
- Week 2: Modules 16-17
- Week 3: Modules 18-19
- Week 4: Capstone optimization project

## 🎓 Learning Outcomes

After completing TinyTorch onboarding, team members will:

1. **Understand Framework Internals**
   - How autograd works
   - Memory allocation patterns
   - Optimization trade-offs

2. **Debug Production Issues**
   - Gradient flow problems
   - Memory bottlenecks
   - Performance issues

3. **Make Informed Decisions**
   - Optimizer selection
   - Architecture choices
   - Deployment strategies

4. **Read Production Code**
   - Understand PyTorch source
   - Navigate framework codebases
   - Contribute to ML infrastructure

## 🔧 Integration with Existing Workflows

### Code Review Integration

- Review production code with TinyTorch knowledge
- Identify framework internals in production code
- Suggest optimizations based on systems understanding

### Debugging Integration

- Apply TinyTorch debugging strategies to production issues
- Use systems thinking for troubleshooting
- Profile production models using TinyTorch techniques (see the sketch below)

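A minimal timing harness in the spirit of those techniques (this assumes a simple warmup-then-average approach; production work would layer richer profiling tools on top):

```python
import time
import numpy as np

def profile(fn, *args, warmup=3, iters=20):
    """Average wall-clock latency, with warmup to exclude one-time costs."""
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    return (time.perf_counter() - start) / iters

a = np.random.randn(512, 512)
b = np.random.randn(512, 512)
print(f"matmul: {profile(np.matmul, a, b) * 1e3:.2f} ms/iter")
```
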
### Architecture Design

- Design new models with systems awareness
- Consider memory and performance from the start
- Make informed trade-offs

## 📊 Success Metrics

### Individual Metrics

- Module completion rate
- Test passing rate
- Capstone project quality
- Self-reported confidence increase

### Team Metrics

- Reduced debugging time
- Fewer production incidents
- Improved code review quality
- Better architecture decisions

## 🛠️ Setup for Teams

### Quick Start

```bash
# 1. Clone repository
git clone https://github.com/mlsysbook/TinyTorch.git
cd TinyTorch

# 2. Set up environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt
pip install -e .

# 4. Verify setup
tito system doctor

# 5. Start with Module 01
tito view 01_tensor
```

### Team-Specific Customization

- **Custom datasets**: Replace with company-specific data
- **Domain modules**: Add modules for specific use cases
- **Integration**: Connect to company ML infrastructure
- **Assessment**: Customize grading for team needs

## 📚 Resources

- **Student Quickstart**: `docs/STUDENT_QUICKSTART.md`
- **Instructor Guide**: `INSTRUCTOR.md` (for training leads)
- **TA Guide**: `TA_GUIDE.md` (for support staff)
- **Module Documentation**: `modules/*/ABOUT.md`

## 💼 Industry Case Studies

### Case Study 1: ML Infrastructure Team

**Challenge**: Team members could use PyTorch but couldn't debug framework issues

**Solution**: 2-week intensive bootcamp focusing on autograd and optimization

**Result**: 50% reduction in debugging time, better architecture decisions

### Case Study 2: Research Team

**Challenge**: Researchers needed to understand transformer internals

**Solution**: Focused workshop on Modules 12-13 (Attention, Transformers)

**Result**: Improved model designs, better understanding of scaling

### Case Study 3: Production ML Team

**Challenge**: Team needed optimization skills for deployment

**Solution**: 3-month program focusing on Optimization Tier (Modules 14-19)

**Result**: 4x model compression, 10x speedup on production models

## 🎯 Next Steps

1. **Choose deployment model**: Bootcamp, distributed, or workshop
2. **Set up environment**: Cloud (Colab) or local
3. **Select modules**: Full curriculum or focused selection
4. **Schedule training**: Intensive or distributed
5. **Track progress**: Use checkpoint system or custom metrics

---

**For Questions**: See `INSTRUCTOR.md` or contact TinyTorch maintainers