
Team Onboarding Guide: TinyTorch for Industry

Complete guide for using TinyTorch in industry settings: new hire bootcamps, internal training programs, and debugging workshops.

🎯 Overview

TinyTorch's Model 3: Team Onboarding addresses industry use cases where ML teams want members to understand PyTorch internals. This guide covers deployment scenarios, training structures, and best practices for industry adoption.

🚀 Use Cases

1. New Hire Bootcamps (2-3 Week Intensive)

Goal: Rapidly onboard new ML engineers and build a working understanding of framework internals

Structure:

Week 1: Foundation Tier (Modules 01-07)
- Tensors, autograd, optimizers, training loops
- Focus: Understanding loss.backward() mechanics
Week 2: Architecture Tier (Modules 08-13)
- CNNs, transformers, attention mechanisms
- Focus: Production architecture internals
Week 3: Optimization Tier (Modules 14-19) OR Capstone
- Profiling, quantization, compression
- Focus: Production optimization techniques
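The Week 1 focus on loss.backward() mechanics comes down to reverse-mode automatic differentiation: record each operation in a graph, then walk the graph backward applying the chain rule. The following is a minimal sketch that assumes scalar values only. The `Value` class and its methods are hypothetical illustrations, not TinyTorch's or PyTorch's actual API.

```python
class Value:
    """A scalar node in a computational graph (hypothetical, not TinyTorch's Tensor API)."""

    def __init__(self, data, parents=(), backward_fn=None):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        # Pushes self.grad into each parent's .grad; set by the op that created this node.
        self._backward_fn = backward_fn

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            # d(a * b)/da = b and d(a * b)/db = a; "+=" accumulates when a node is reused
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph so a node's grad is complete before it is propagated.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # d(loss)/d(loss)
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()
```

With `pred = w * x + b` and `loss = pred * pred`, calling `loss.backward()` fills `w.grad`, `x.grad`, and `b.grad`. Because `pred` appears twice in the product, its gradient is the sum of two contributions, which is why gradients accumulate with `+=` rather than being overwritten.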

Schedule:

Full-time: 40 hours/week
Hands-on coding: 70% of time
Systems discussions: 30% of time
Daily standups and code reviews

Deliverables:

Completed modules with passing tests
Capstone project (optional)
Technical presentation on framework internals

2. Internal Training Programs (Distributed Over Quarters)

Goal: Deep understanding of ML systems for existing team members

Structure:

Quarter 1: Foundation (Modules 01-07)
- Weekly sessions: 2-3 hours
- Self-paced module completion
- Monthly group discussions
Quarter 2: Architecture (Modules 08-13)
- Weekly sessions: 2-3 hours
- Architecture deep-dives
- Production case studies
Quarter 3: Optimization (Modules 14-19)
- Weekly sessions: 2-3 hours
- Performance optimization focus
- Real production optimization projects

Benefits:

Fits into existing work schedules
Allows in-depth study without an intensive time commitment
Builds team knowledge gradually
Enables peer learning

3. Debugging Workshops (Focused Modules)

Goal: Targeted understanding of specific framework components

Common Focus Areas:

Autograd Debugging Workshop (Module 05)

Understanding gradient flow
Debugging gradient issues
Computational graph visualization
Duration: 1-2 days
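Debugging gradient issues often starts with a gradient check. You compare analytic (or autograd-computed) gradients against a central finite-difference estimate, and a large relative error points to a bug in the backward pass. The sketch below is written in plain Python. The names `numerical_grad` and `grad_check`, along with the 1e-4 tolerance, are illustrative choices and not part of TinyTorch's API.

```python
def numerical_grad(f, params, eps=1e-6):
    """Central-difference estimate of df/dp_i for each parameter in a list of floats."""
    grads = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((f(plus) - f(minus)) / (2 * eps))
    return grads


def grad_check(f, analytic_grad, params, tol=1e-4):
    """Return (ok, max_relative_error) comparing analytic and numerical gradients."""
    numeric = numerical_grad(f, params)
    analytic = analytic_grad(params)
    max_rel = 0.0
    for a, n in zip(analytic, numeric):
        denom = max(abs(a), abs(n), 1e-12)  # avoid dividing by zero when both are ~0
        max_rel = max(max_rel, abs(a - n) / denom)
    return max_rel < tol, max_rel


# Example: squared error of a 1-D linear model, L(w, b) = (w*X + b - Y)^2
X, Y = 2.0, 5.0


def loss(p):
    w, b = p
    return (w * X + b - Y) ** 2


def correct_grad(p):
    w, b = p
    r = w * X + b - Y
    return [2 * r * X, 2 * r]


def buggy_grad(p):
    # Bug: forgets the chain-rule factor X in dL/dw
    w, b = p
    r = w * X + b - Y
    return [2 * r, 2 * r]
```

`grad_check(loss, correct_grad, [3.0, 1.0])` passes. `grad_check(loss, buggy_grad, [3.0, 1.0])` fails with a relative error of about 0.5, which flags the missing chain-rule factor.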

Attention Mechanism Workshop (Module 12)

Understanding attention internals
Debugging attention scaling issues
Memory optimization for attention
Duration: 1-2 days
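Debugging attention scaling and memory issues both come back to the score matrix in scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The following is a minimal single-head sketch on plain Python lists, with no masking or batching. The function names are illustrative and are not TinyTorch's attention module API.

```python
import math


def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # a is n x k, b is k x m (lists of lists); returns n x m
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def attention(q, k, v):
    """Single-head softmax(Q K^T / sqrt(d_k)) V; returns (output, attention weights)."""
    d_k = len(q[0])
    # The score matrix is n x n, so its memory grows quadratically with sequence length n.
    scores = matmul(q, transpose(k))
    # Dividing by sqrt(d_k) keeps score magnitudes from growing with d_k.
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]
    return matmul(weights, v), weights
```

If the division by sqrt(d_k) is omitted, dot products tend to grow with d_k and push softmax toward a near one-hot distribution with tiny gradients. That is one typical "attention scaling" bug a workshop can reproduce with this sketch.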

Optimization Workshop (Modules 14-19)

Profiling production models
Quantization and compression
Performance optimization strategies
Duration: 2-3 days
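Quantization trades precision for size. Storing each float32 weight as an int8 plus one shared scale cuts weight storage by 4x, which is the arithmetic behind compression figures of that size. The following is a minimal sketch of symmetric per-tensor int8 quantization, assuming float32 weights. The function names are hypothetical, and TinyTorch's quantization module may use a different scheme.

```python
import array


def quantize_int8(values):
    """Symmetric per-tensor quantization: q = round(x / scale), with scale = max|x| / 127."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Clip to [-127, 127] so the representable range stays symmetric around zero.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    return [qi * scale for qi in q]


def storage_ratio(values, q):
    # "f" is a 4-byte C float (float32 on common platforms); "b" is a 1-byte signed char.
    fp32_bytes = array.array("f", values).itemsize * len(values)
    int8_bytes = array.array("b", q).itemsize * len(q)
    return fp32_bytes / int8_bytes
```

Each dequantized value lands within half a quantization step (`scale / 2`) of the original. This rounding error is the accuracy cost traded for the 4x smaller weight storage.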

🏗️ Deployment Scenarios

Scenario 1: Cloud-Based Training (Recommended)

Setup: Google Colab or JupyterHub

Zero local installation
Consistent environment
Easy sharing and collaboration
Best for: Large teams, remote workers

Steps:

Clone repository to Colab
Install dependencies: pip install -e .
Work through modules
Share notebooks via Colab links

Scenario 2: Local Development Environment

Setup: Local Python environment

Full control over environment
Better for debugging
Offline capability
Best for: Smaller teams, on-site training

Steps:

Clone repository locally
Set up virtual environment
Install: pip install -e .
Use JupyterLab for development

Scenario 3: Hybrid Approach

Setup: Colab for learning, local for projects

Learn in cloud environment
Apply locally for projects
Best for: Flexible teams

📋 Training Program Templates

Template 1: 2-Week Intensive Bootcamp

Week 1: Foundation

Day 1-2: Modules 01-02 (Tensor, Activations)
Day 3-4: Modules 03-04 (Layers, Losses)
Day 5: Module 05 (Autograd) - Full day focus
Weekend: Review and practice

Week 2: Architecture + Optimization

Day 1-2: Modules 08-09 (DataLoader, CNNs)
Day 3: Module 12 (Attention)
Day 4-5: Modules 14-15 (Profiling, Quantization)
Final: Capstone project presentation

Template 2: 3-Month Distributed Program

Month 1: Foundation

Week 1: Modules 01-02
Week 2: Modules 03-04
Week 3: Module 05 (Autograd)
Week 4: Modules 06-07 (Optimizers, Training)

Month 2: Architecture

Week 1: Modules 08-09
Week 2: Modules 10-11
Week 3: Modules 12-13
Week 4: Integration project

Month 3: Optimization

Week 1: Modules 14-15
Week 2: Modules 16-17
Week 3: Modules 18-19
Week 4: Capstone optimization project
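When customizing a template, it helps to encode the schedule as data and check that no module was dropped by accident. The sketch below encodes Template 2 as a plain dictionary. `PROGRAM`, `scheduled_modules`, `missing_modules`, and the assumed module range 01-19 are illustrative names and values, not part of TinyTorch's tooling.

```python
# Template 2 as data: month -> week -> module numbers (empty list = project week)
PROGRAM = {
    "Month 1": {"Week 1": [1, 2], "Week 2": [3, 4], "Week 3": [5], "Week 4": [6, 7]},
    "Month 2": {"Week 1": [8, 9], "Week 2": [10, 11], "Week 3": [12, 13], "Week 4": []},
    "Month 3": {"Week 1": [14, 15], "Week 2": [16, 17], "Week 3": [18, 19], "Week 4": []},
}


def scheduled_modules(program):
    """All module numbers scheduled anywhere in the program, sorted."""
    return sorted(m for month in program.values() for week in month.values() for m in week)


def missing_modules(program, first=1, last=19):
    """Module numbers in [first, last] that the program never schedules."""
    covered = set(scheduled_modules(program))
    return [m for m in range(first, last + 1) if m not in covered]
```

Running `missing_modules` on an edited copy of `PROGRAM` catches a module that fell out of the plan during rescheduling.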

🎓 Learning Outcomes

After completing TinyTorch onboarding, team members will:

Understand Framework Internals
- How autograd works
- Memory allocation patterns
- Optimization trade-offs
Debug Production Issues
- Gradient flow problems
- Memory bottlenecks
- Performance issues
Make Informed Decisions
- Optimizer selection
- Architecture choices
- Deployment strategies
Read Production Code
- Understand PyTorch source
- Navigate framework codebases
- Contribute to ML infrastructure

🔧 Integration with Existing Workflows

Code Review Integration

Review production code with TinyTorch knowledge
Identify framework internals in production code
Suggest optimizations based on systems understanding

Debugging Integration

Apply TinyTorch debugging strategies to production issues
Use systems thinking for troubleshooting
Profile production models using TinyTorch techniques
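The core profiling loop has three steps: time several runs, report a robust statistic, and measure peak memory. It can be sketched with Python's standard library alone. This is a generic illustration using `time.perf_counter` and `tracemalloc`. It is not the profiler built in the Optimization Tier, and `profile` is a hypothetical helper name.

```python
import time
import tracemalloc


def profile(fn, *args, repeats=5):
    """Return (median_seconds, peak_bytes) for fn(*args) using only the standard library."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    # Measure memory in a separate run so tracing overhead does not skew the timings.
    tracemalloc.start()
    fn(*args)
    _, peak = tracemalloc.get_traced_memory()  # returns (current, peak) in bytes
    tracemalloc.stop()
    timings.sort()
    return timings[len(timings) // 2], peak


def squares_list(n):
    # Materializes every intermediate value in a list before summing
    return sum([i * i for i in range(n)])


def squares_gen(n):
    # Streams values through a generator; intermediates are freed as it goes
    return sum(i * i for i in range(n))
```

Comparing `profile(squares_list, 100_000)` with `profile(squares_gen, 100_000)` shows the kind of memory bottleneck profiling is meant to surface. Both compute the same result, but the list version has a much larger peak allocation.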

Architecture Design

Design new models with systems awareness
Consider memory and performance from the start
Make informed trade-offs

📊 Success Metrics

Individual Metrics

Module completion rate
Test passing rate
Capstone project quality
Self-reported confidence increase

Team Metrics

Reduced debugging time
Fewer production incidents
Improved code review quality
Better architecture decisions

🛠️ Setup for Teams

Quick Start

```bash
# 1. Clone repository
git clone https://github.com/mlsysbook/TinyTorch.git
cd TinyTorch

# 2. Set up environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt
pip install -e .

# 4. Verify setup
tito system doctor

# 5. Start with Module 01
tito view 01_tensor
```

Team-Specific Customization

Custom datasets: Replace the default datasets with company-specific data
Domain modules: Add modules for specific use cases
Integration: Connect to company ML infrastructure
Assessment: Customize grading for team needs

📚 Resources

Student Quickstart: docs/STUDENT_QUICKSTART.md
Instructor Guide: INSTRUCTOR.md (for training leads)
TA Guide: TA_GUIDE.md (for support staff)
Module Documentation: modules/*/ABOUT.md

💼 Industry Case Studies

Case Study 1: ML Infrastructure Team

Challenge: Team members could use PyTorch but couldn't debug framework issues
Solution: 2-week intensive bootcamp focusing on autograd and optimization
Result: 50% reduction in debugging time, better architecture decisions

Case Study 2: Research Team

Challenge: Researchers needed to understand transformer internals
Solution: Focused workshop on Modules 12-13 (Attention, Transformers)
Result: Improved model designs, better understanding of scaling

Case Study 3: Production ML Team

Challenge: Team needed optimization skills for deployment
Solution: 3-month program focusing on Optimization Tier (Modules 14-19)
Result: 4x model compression, 10x speedup on production models

🎯 Next Steps

Choose deployment model: Bootcamp, distributed, or workshop
Set up environment: Cloud (Colab) or local
Select modules: Full curriculum or focused selection
Schedule training: Intensive or distributed
Track progress: Use checkpoint system or custom metrics

For questions: See INSTRUCTOR.md or contact the TinyTorch maintainers
