Add community and benchmark features with baseline validation

- Implement tito benchmark baseline and capstone commands
- Add SPEC-style normalization for baseline benchmarks
- Implement tito community join, update, leave, stats, profile commands
- Use project-local storage (.tinytorch/) for user data
- Add privacy-by-design with explicit consent prompts
- Update site documentation for community and benchmark features
- Add Marimo integration for online notebooks
- Clean up redundant milestone setup exploration docs
- Finalize baseline design: fast setup validation (~1 second) with normalized results
This commit is contained in:
Vijay Janapa Reddi
2025-11-20 00:17:21 -05:00
parent 430c8c630f
commit 6a322627dc
66 changed files with 10015 additions and 3181 deletions

1
.gitattributes vendored
View File

@@ -7,4 +7,3 @@ tinytorch/core/*.py -diff
*.zip filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.gif filter=lfs diff=lfs merge=lfs -text

5
.gitignore vendored
View File

@@ -87,10 +87,13 @@ site/.venv/
book/_build/
site/_build/
# NBGrader
# NBGrader - assignments are dynamically generated via 'tito nbgrader generate'
# Only ignore student submissions and grading outputs, not source/release (for now)
assignments/autograded/
assignments/feedback/
assignments/submitted/
# Note: assignments/source/ and assignments/release/ are kept in git for now
# but should be regenerated with 'tito nbgrader generate' when modules change
# Logs
*.log

578
INSTRUCTOR.md Normal file
View File

@@ -0,0 +1,578 @@
# 👩‍🏫 TinyTorch Instructor Guide
Complete guide for teaching ML Systems Engineering with TinyTorch.
## 🎯 Course Overview
TinyTorch teaches ML systems engineering through building, not just using. Students construct a complete ML framework from tensors to transformers, understanding memory, performance, and scaling at each step.
## 🛠️ Instructor Setup
### **1. Initial Setup**
```bash
# Clone and setup
git clone https://github.com/MLSysBook/TinyTorch.git
cd TinyTorch
# Virtual environment (MANDATORY)
python -m venv .venv
source .venv/bin/activate
# Install with instructor tools
pip install -r requirements.txt
pip install nbgrader
# Setup grading infrastructure
tito grade setup
```
### **2. Verify Installation**
```bash
tito system doctor
# Should show all green checkmarks
tito grade
# Should show available grade commands
```
## 📝 Assignment Workflow
### **Simplified with Tito CLI**
We've wrapped NBGrader behind simple `tito grade` commands so you don't need to learn NBGrader's complex interface.
### **1. Prepare Assignments**
```bash
# Generate instructor version (with solutions)
tito grade generate 01_tensor
# Create student version (solutions removed)
tito grade release 01_tensor
# Student version will be in: release/tinytorch/01_tensor/
```
### **2. Distribute to Students**
```bash
# Option A: GitHub Classroom (recommended)
# 1. Create assignment repository from TinyTorch
# 2. Remove solutions from modules
# 3. Students clone and work
# Option B: Direct distribution
# Share the release/ directory contents
```
### **3. Collect Submissions**
```bash
# Collect all students
tito grade collect 01_tensor
# Or specific student
tito grade collect 01_tensor --student student_id
```
### **4. Auto-Grade**
```bash
# Grade all submissions
tito grade autograde 01_tensor
# Grade specific student
tito grade autograde 01_tensor --student student_id
```
### **5. Manual Review**
```bash
# Open grading interface (browser-based)
tito grade manual 01_tensor
# This launches a web interface for:
# - Reviewing ML Systems question responses
# - Adding feedback comments
# - Adjusting auto-grades
```
### **6. Generate Feedback**
```bash
# Create feedback files for students
tito grade feedback 01_tensor
```
### **7. Export Grades**
```bash
# Export all grades to CSV
tito grade export
# Or specific module
tito grade export --module 01_tensor --output grades_module01.csv
```
## 📊 Grading Components
### **Auto-Graded (70%)**
- Code implementation correctness
- Test passing
- Function signatures
- Output validation
### **Manually Graded (30%)**
- ML Systems Thinking questions (3 per module)
- Each question: 10 points
- Focus on understanding, not perfection
### **Grading Rubric for ML Systems Questions**
| Points | Criteria |
|--------|----------|
| 9-10 | Demonstrates deep understanding, references specific code, discusses systems implications |
| 7-8 | Good understanding, some code references, basic systems thinking |
| 5-6 | Surface understanding, generic response, limited systems perspective |
| 3-4 | Attempted but misses key concepts |
| 0-2 | No attempt or completely off-topic |
**What to Look For:**
- References to actual implemented code
- Memory/performance analysis
- Scaling considerations
- Production system comparisons
- Understanding of trade-offs
## 📋 Sample Solutions for Grading Calibration
This section provides sample solutions to help calibrate grading standards. Use these as reference points when evaluating student submissions.
### Module 01: Tensor - Memory Footprint
**Excellent Solution (9-10 points)**:
```python
def memory_footprint(self):
"""Calculate tensor memory in bytes."""
return self.data.nbytes
```
**Why Excellent**:
- Concise and correct
- Uses NumPy's built-in `nbytes` property
- Clear docstring
- Handles all tensor shapes correctly
**Good Solution (7-8 points)**:
```python
def memory_footprint(self):
"""Calculate memory usage."""
return np.prod(self.data.shape) * self.data.dtype.itemsize
```
**Why Good**:
- Correct implementation
- Manually calculates (shows understanding)
- Works but less efficient than using `nbytes`
- Minor: docstring could be more specific
**Acceptable Solution (5-6 points)**:
```python
def memory_footprint(self):
size = 1
for dim in self.data.shape:
size *= dim
return size * 4 # Assumes float32
```
**Why Acceptable**:
- Correct logic but hardcoded dtype size
- Works for float32 but fails for other dtypes
- Shows understanding of memory calculation
- Missing proper dtype handling
### Module 05: Autograd - Backward Pass
**Excellent Solution (9-10 points)**:
```python
def backward(self, gradient=None):
"""Backward pass through computational graph."""
if gradient is None:
gradient = np.ones_like(self.data)
self.grad = gradient
if self.grad_fn is not None:
# Compute gradients for inputs
input_grads = self.grad_fn.backward(gradient)
# Propagate to input tensors
if isinstance(input_grads, tuple):
for input_tensor, input_grad in zip(self.grad_fn.inputs, input_grads):
if input_tensor.requires_grad:
input_tensor.backward(input_grad)
else:
if self.grad_fn.inputs[0].requires_grad:
self.grad_fn.inputs[0].backward(input_grads)
```
**Why Excellent**:
- Handles both scalar and tensor gradients
- Properly checks `requires_grad` before propagating
- Handles tuple returns from grad_fn
- Clear variable names and structure
**Good Solution (7-8 points)**:
```python
def backward(self, gradient=None):
if gradient is None:
gradient = np.ones_like(self.data)
self.grad = gradient
if self.grad_fn:
grads = self.grad_fn.backward(gradient)
for inp, grad in zip(self.grad_fn.inputs, grads):
inp.backward(grad)
```
**Why Good**:
- Correct logic
- Missing `requires_grad` check (minor issue)
- Assumes grads is always iterable (may fail for single input)
- Works for most cases but less robust
**Acceptable Solution (5-6 points)**:
```python
def backward(self, grad):
self.grad = grad
if self.grad_fn:
self.grad_fn.inputs[0].backward(self.grad_fn.backward(grad))
```
**Why Acceptable**:
- Basic backward pass works
- Only handles single input (fails for multi-input operations)
- Missing None gradient handling
- Shows understanding but incomplete
### Module 09: Spatial - Convolution Implementation
**Excellent Solution (9-10 points)**:
```python
def forward(self, x):
"""Forward pass with explicit loops for clarity."""
batch_size, in_channels, height, width = x.shape
out_height = (height - self.kernel_size + 2 * self.padding) // self.stride + 1
out_width = (width - self.kernel_size + 2 * self.padding) // self.stride + 1
output = np.zeros((batch_size, self.out_channels, out_height, out_width))
# Apply padding
if self.padding > 0:
x = np.pad(x, ((0, 0), (0, 0), (self.padding, self.padding),
(self.padding, self.padding)), mode='constant')
# Explicit convolution loops
for b in range(batch_size):
for oc in range(self.out_channels):
for oh in range(out_height):
for ow in range(out_width):
h_start = oh * self.stride
w_start = ow * self.stride
h_end = h_start + self.kernel_size
w_end = w_start + self.kernel_size
window = x[b, :, h_start:h_end, w_start:w_end]
output[b, oc, oh, ow] = np.sum(
window * self.weight[oc] + self.bias[oc]
)
return Tensor(output, requires_grad=x.requires_grad)
```
**Why Excellent**:
- Clear output shape calculation
- Proper padding handling
- Explicit loops make O(kernel_size²) complexity visible
- Correct gradient tracking setup
- Well-structured and readable
**Good Solution (7-8 points)**:
```python
def forward(self, x):
B, C, H, W = x.shape
out_h = (H - self.kernel_size) // self.stride + 1
out_w = (W - self.kernel_size) // self.stride + 1
out = np.zeros((B, self.out_channels, out_h, out_w))
for b in range(B):
for oc in range(self.out_channels):
for i in range(out_h):
for j in range(out_w):
h = i * self.stride
w = j * self.stride
out[b, oc, i, j] = np.sum(
x[b, :, h:h+self.kernel_size, w:w+self.kernel_size]
* self.weight[oc]
) + self.bias[oc]
return Tensor(out)
```
**Why Good**:
- Correct implementation
- Missing padding support (works only for padding=0)
- Less clear variable names
- Missing requires_grad propagation
**Acceptable Solution (5-6 points)**:
```python
def forward(self, x):
out = np.zeros((x.shape[0], self.out_channels, x.shape[2]-2, x.shape[3]-2))
for b in range(x.shape[0]):
for c in range(self.out_channels):
for i in range(out.shape[2]):
for j in range(out.shape[3]):
out[b, c, i, j] = np.sum(x[b, :, i:i+3, j:j+3] * self.weight[c])
return Tensor(out)
```
**Why Acceptable**:
- Basic convolution works
- Hardcoded kernel_size=3 (not general)
- No stride or padding support
- Shows understanding but incomplete
### Module 12: Attention - Scaled Dot-Product Attention
**Excellent Solution (9-10 points)**:
```python
def forward(self, query, key, value, mask=None):
"""Scaled dot-product attention with numerical stability."""
# Compute attention scores
scores = np.dot(query, key.T) / np.sqrt(self.d_k)
# Apply mask if provided
if mask is not None:
scores = np.where(mask, scores, -1e9)
# Softmax with numerical stability
exp_scores = np.exp(scores - np.max(scores, axis=-1, keepdims=True))
attention_weights = exp_scores / np.sum(exp_scores, axis=-1, keepdims=True)
# Apply attention to values
output = np.dot(attention_weights, value)
return output, attention_weights
```
**Why Excellent**:
- Proper scaling factor (1/√d_k)
- Numerical stability with max subtraction
- Mask handling
- Returns both output and attention weights
- Clear and well-documented
**Good Solution (7-8 points)**:
```python
def forward(self, q, k, v):
scores = np.dot(q, k.T) / np.sqrt(q.shape[-1])
weights = np.exp(scores) / np.sum(np.exp(scores), axis=-1, keepdims=True)
return np.dot(weights, v)
```
**Why Good**:
- Correct implementation
- Missing numerical stability (may overflow)
- Missing mask support
- Works but less robust
**Acceptable Solution (5-6 points)**:
```python
def forward(self, q, k, v):
scores = np.dot(q, k.T)
weights = np.exp(scores) / np.sum(np.exp(scores))
return np.dot(weights, v)
```
**Why Acceptable**:
- Basic attention mechanism
- Missing scaling factor
- Missing numerical stability
- Incorrect softmax (should be per-row)
### Grading Guidelines Using Sample Solutions
**When Evaluating Student Code**:
1. **Correctness First**: Does it pass all tests?
- If no: Maximum 6 points (even if well-written)
- If yes: Proceed to quality evaluation
2. **Code Quality**:
- **Excellent (9-10)**: Production-ready, handles edge cases, well-documented
- **Good (7-8)**: Correct and functional, minor improvements possible
- **Acceptable (5-6)**: Works but incomplete or has issues
3. **Systems Thinking**:
- **Excellent**: Discusses memory, performance, scaling implications
- **Good**: Some systems awareness
- **Acceptable**: Focuses only on correctness
4. **Common Patterns**:
- Look for: Proper error handling, edge case consideration, documentation
- Red flags: Hardcoded values, missing checks, unclear variable names
**Remember**: These are calibration examples. Adjust based on your course level and learning objectives. The goal is consistent evaluation, not perfection.
## 📚 Module Teaching Notes
### **Module 01: Tensor**
- **Focus**: Memory layout, data structures
- **Key Concept**: Understanding memory is crucial for ML performance
- **Demo**: Show memory profiling, copying behavior
### **Module 02: Activations**
- **Focus**: Vectorization, numerical stability
- **Key Concept**: Small details matter at scale
- **Demo**: Gradient vanishing/exploding
### **Module 04-05: Layers & Networks**
- **Focus**: Composition, parameter management
- **Key Concept**: Building blocks combine into complex systems
- **Project**: Build a small CNN
### **Module 06-07: Spatial & Attention**
- **Focus**: Algorithmic complexity, memory patterns
- **Key Concept**: O(N²) operations become bottlenecks
- **Demo**: Profile attention memory usage
### **Module 08-11: Training Pipeline**
- **Focus**: End-to-end system integration
- **Key Concept**: Many components must work together
- **Project**: Train a real model
### **Module 12-15: Production**
- **Focus**: Deployment, optimization, monitoring
- **Key Concept**: Academic vs production requirements
- **Demo**: Model compression, deployment
### **Module 16: TinyGPT**
- **Focus**: Framework generalization
- **Key Concept**: 70% component reuse from vision to language
- **Capstone**: Build a working language model
## 🎯 Learning Objectives
By course end, students should be able to:
1. **Build** complete ML systems from scratch
2. **Analyze** memory usage and computational complexity
3. **Debug** performance bottlenecks
4. **Optimize** for production deployment
5. **Understand** framework design decisions
6. **Apply** systems thinking to ML problems
## 📈 Tracking Progress
### **Individual Progress**
```bash
# Check specific student progress
tito checkpoint status --student student_id
```
### **Class Overview**
```bash
# Export all checkpoint achievements
tito checkpoint export --output class_progress.csv
```
### **Identify Struggling Students**
Look for:
- Missing checkpoint achievements
- Low scores on ML Systems questions
- Incomplete module submissions
## 💡 Teaching Tips
### **1. Emphasize Building Over Theory**
- Have students type every line of code
- Run tests immediately after implementation
- Break and fix things intentionally
### **2. Connect to Production Systems**
- Show PyTorch/TensorFlow equivalents
- Discuss real-world bottlenecks
- Share production war stories
### **3. Make Performance Visible**
```python
# Use profilers liberally
with TimeProfiler("operation"):
result = expensive_operation()
# Show memory usage
print(f"Memory: {get_memory_usage():.2f} MB")
```
### **4. Encourage Systems Questions**
- "What would break at 1B parameters?"
- "How would you distributed this?"
- "What's the bottleneck here?"
## 🔧 Troubleshooting
### **Common Student Issues**
**Environment Problems**
```bash
# Student fix:
tito system doctor
tito system reset
```
**Module Import Errors**
```bash
# Rebuild package
tito export --all
```
**Test Failures**
```bash
# Detailed test output
tito module test MODULE --verbose
```
### **NBGrader Issues**
**Database Locked**
```bash
# Clear NBGrader database
rm gradebook.db
tito grade setup
```
**Missing Submissions**
```bash
# Check submission directory
ls submitted/*/MODULE/
```
## 📊 Sample Schedule (16 Weeks)
| Week | Module | Focus |
|------|--------|-------|
| 1 | 01 Tensor | Data Structures, Memory |
| 2 | 02 Activations | Non-linearity Functions |
| 3 | 03 Layers | Neural Network Components |
| 4 | 04 Losses | Optimization Objectives |
| 5 | 05 Autograd | Automatic Differentiation |
| 6 | 06 Optimizers | Training Algorithms |
| 7 | 07 Training | Complete Training Loop |
| 8 | Midterm Project | Build and Train Network |
| 9 | 08 DataLoader | Data Pipeline |
| 10 | 09 Spatial | Convolutions, CNNs |
| 11 | 10 Tokenization | Text Processing |
| 12 | 11 Embeddings | Word Representations |
| 13 | 12 Attention | Attention Mechanisms |
| 14 | 13 Transformers | Transformer Architecture |
| 15 | 14-19 Optimization | Profiling, Quantization, etc. |
| 16 | 20 Capstone | Torch Olympics Competition |
## 🎓 Assessment Strategy
### **Continuous Assessment (70%)**
- Module completion: 4% each × 16 = 64%
- Checkpoint achievements: 6%
### **Projects (30%)**
- Midterm: Build and train CNN (15%)
- Final: Extend TinyGPT (15%)
## 📚 Additional Resources
- [MLSys Book](https://mlsysbook.ai) - Companion textbook
- [Course Discussions](https://github.com/MLSysBook/TinyTorch/discussions)
- [Issue Tracker](https://github.com/MLSysBook/TinyTorch/issues)
---
**Need help? Open an issue or contact the TinyTorch team!**

View File

@@ -1,850 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "699bd495",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Setup - TinyTorch System Configuration\n",
"\n",
"Welcome to TinyTorch! This setup module configures your personal TinyTorch installation and teaches you the NBGrader workflow.\n",
"\n",
"## Learning Goals\n",
"- Configure your personal TinyTorch installation with custom information\n",
"- Learn to query system information using Python modules\n",
"- Master the NBGrader workflow: implement → test → export\n",
"- Create functions that become part of your tinytorch package\n",
"- Understand solution blocks, hidden tests, and automated grading\n",
"\n",
"## The Big Picture: Why Configuration Matters in ML Systems\n",
"Configuration is the foundation of any production ML system. In this module, you'll learn:\n",
"\n",
"### 1. **System Awareness**\n",
"Real ML systems need to understand their environment:\n",
"- **Hardware constraints**: Memory, CPU cores, GPU availability\n",
"- **Software dependencies**: Python version, library compatibility\n",
"- **Platform differences**: Linux servers, macOS development, Windows deployment\n",
"\n",
"### 2. **Reproducibility**\n",
"Configuration enables reproducible ML:\n",
"- **Environment documentation**: Exactly what system was used\n",
"- **Dependency management**: Precise versions and requirements\n",
"- **Debugging support**: System info helps troubleshoot issues\n",
"\n",
"### 3. **Professional Development**\n",
"Proper configuration shows engineering maturity:\n",
"- **Attribution**: Your work is properly credited\n",
"- **Collaboration**: Others can understand and extend your setup\n",
"- **Maintenance**: Systems can be updated and maintained\n",
"\n",
"### 4. **ML Systems Context**\n",
"This connects to broader ML engineering:\n",
"- **Model deployment**: Different environments need different configs\n",
"- **Monitoring**: System metrics help track performance\n",
"- **Scaling**: Understanding hardware helps optimize training\n",
"\n",
"Let's build the foundation of your ML systems engineering skills!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a06f484d",
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "setup-imports",
"locked": false,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"#| default_exp core.setup\n",
"\n",
"#| export\n",
"import sys\n",
"import platform\n",
"import psutil\n",
"import os\n",
"from typing import Dict, Any"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f63f890e",
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "setup-verification",
"locked": false,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"print(\"🔥 TinyTorch Setup Module\")\n",
"print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
"print(f\"Platform: {platform.system()}\")\n",
"print(\"Ready to configure your TinyTorch installation!\")"
]
},
{
"cell_type": "markdown",
"id": "de5378e3",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 🏗️ The Architecture of ML Systems Configuration\n",
"\n",
"### Configuration Layers in Production ML\n",
"Real ML systems have multiple configuration layers:\n",
"\n",
"```\n",
"┌─────────────────────────────────────┐\n",
"│ Application Config │ ← Your personal info\n",
"├─────────────────────────────────────┤\n",
"│ System Environment │ ← Hardware specs\n",
"├─────────────────────────────────────┤\n",
"│ Runtime Configuration │ ← Python, libraries\n",
"├─────────────────────────────────────┤\n",
"│ Infrastructure Config │ ← Cloud, containers\n",
"└─────────────────────────────────────┘\n",
"```\n",
"\n",
"### Why Each Layer Matters\n",
"- **Application**: Identifies who built what and when\n",
"- **System**: Determines performance characteristics and limitations\n",
"- **Runtime**: Affects compatibility and feature availability\n",
"- **Infrastructure**: Enables scaling and deployment strategies\n",
"\n",
"### Connection to Real ML Frameworks\n",
"Every major ML framework has configuration:\n",
"- **PyTorch**: `torch.cuda.is_available()`, `torch.get_num_threads()`\n",
"- **TensorFlow**: `tf.config.list_physical_devices()`, `tf.sysconfig.get_build_info()`\n",
"- **Hugging Face**: Model cards with system requirements and performance metrics\n",
"- **MLflow**: Experiment tracking with system context and reproducibility\n",
"\n",
"### TinyTorch's Approach\n",
"We'll build configuration that's:\n",
"- **Educational**: Teaches system awareness\n",
"- **Practical**: Actually useful for debugging\n",
"- **Professional**: Follows industry standards\n",
"- **Extensible**: Ready for future ML systems features"
]
},
{
"cell_type": "markdown",
"id": "9c51b4b0",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 2
},
"source": [
"## Step 1: What is System Configuration?\n",
"\n",
"### Definition\n",
"**System configuration** is the process of setting up your development environment with personalized information and system diagnostics. In TinyTorch, this means:\n",
"\n",
"- **Personal Information**: Your name, email, institution for identification\n",
"- **System Information**: Hardware specs, Python version, platform details\n",
"- **Customization**: Making your TinyTorch installation uniquely yours\n",
"\n",
"### Why Configuration Matters in ML Systems\n",
"Proper system configuration is crucial because:\n",
"\n",
"#### 1. **Reproducibility** \n",
"Your setup can be documented and shared:\n",
"```python\n",
"# Someone else can recreate your environment\n",
"config = {\n",
" 'developer': 'Your Name',\n",
" 'python_version': '3.9.7',\n",
" 'platform': 'Darwin',\n",
" 'memory_gb': 16.0\n",
"}\n",
"```\n",
"\n",
"#### 2. **Debugging**\n",
"System info helps troubleshoot ML performance issues:\n",
"- **Memory errors**: \"Do I have enough RAM for this model?\"\n",
"- **Performance issues**: \"How many CPU cores can I use?\"\n",
"- **Compatibility problems**: \"What Python version am I running?\"\n",
"\n",
"#### 3. **Professional Development**\n",
"Shows proper engineering practices:\n",
"- **Attribution**: Your work is properly credited\n",
"- **Collaboration**: Others can contact you about your code\n",
"- **Documentation**: System context is preserved\n",
"\n",
"#### 4. **ML Systems Integration**\n",
"Connects to broader ML engineering:\n",
"- **Model cards**: Document system requirements\n",
"- **Experiment tracking**: Record hardware context\n",
"- **Deployment**: Match development to production environments\n",
"\n",
"### Real-World Examples\n",
"- **Google Colab**: Shows GPU type, RAM, disk space\n",
"- **Kaggle**: Displays system specs for reproducibility\n",
"- **MLflow**: Tracks system context with experiments\n",
"- **Docker**: Containerizes entire system configuration\n",
"\n",
"Let's start configuring your TinyTorch system!"
]
},
{
"cell_type": "markdown",
"id": "37575c5c",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## Step 2: Personal Information Configuration\n",
"\n",
"### The Concept: Identity in ML Systems\n",
"Your **personal information** identifies you as the developer and configures your TinyTorch installation. This isn't just administrative - it's foundational to professional ML development.\n",
"\n",
"### Why Personal Info Matters in ML Engineering\n",
"\n",
"#### 1. **Attribution and Accountability**\n",
"- **Model ownership**: Who built this model?\n",
"- **Responsibility**: Who should be contacted about issues?\n",
"- **Credit**: Proper recognition for your work\n",
"\n",
"#### 2. **Collaboration and Communication**\n",
"- **Team coordination**: Multiple developers on ML projects\n",
"- **Knowledge sharing**: Others can learn from your work\n",
"- **Bug reports**: Contact info for issues and improvements\n",
"\n",
"#### 3. **Professional Standards**\n",
"- **Industry practice**: All professional software has attribution\n",
"- **Open source**: Proper credit in shared code\n",
"- **Academic integrity**: Clear authorship in research\n",
"\n",
"#### 4. **System Customization**\n",
"- **Personalized experience**: Your TinyTorch installation\n",
"- **Unique identification**: Distinguish your work from others\n",
"- **Development tracking**: Link code to developer\n",
"\n",
"### Real-World Parallels\n",
"- **Git commits**: Author name and email in every commit\n",
"- **Docker images**: Maintainer information in container metadata\n",
"- **Python packages**: Author info in `setup.py` and `pyproject.toml`\n",
"- **Model cards**: Creator information for ML models\n",
"\n",
"### Best Practices for Personal Configuration\n",
"- **Use real information**: Not placeholders or fake data\n",
"- **Professional email**: Accessible and appropriate\n",
"- **Descriptive system name**: Unique and meaningful\n",
"- **Consistent formatting**: Follow established conventions\n",
"\n",
"Now let's implement your personal configuration!"
]
},
{
"cell_type": "markdown",
"id": "363c3cb7",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"### Before We Code: The 5 C's\n",
"\n",
"```python\n",
"# CONCEPT: What is Personal Information Configuration?\n",
"# Developer identity configuration that identifies you as the creator and\n",
"# configures your TinyTorch installation. Think Git commit attribution -\n",
"# every professional system needs to know who built it.\n",
"\n",
"# CODE STRUCTURE: What We're Building \n",
"def personal_info() -> Dict[str, str]: # Returns developer identity\n",
" return { # Dictionary with required fields\n",
" 'developer': 'Your Name', # Your actual name\n",
" 'email': 'your@domain.com', # Contact information\n",
" 'institution': 'Your Place', # Affiliation\n",
" 'system_name': 'YourName-Dev', # Unique system identifier\n",
" 'version': '1.0.0' # Configuration version\n",
" }\n",
"\n",
"# CONNECTIONS: Real-World Equivalents\n",
"# Git commits - author name and email in every commit\n",
"# Docker images - maintainer information in container metadata\n",
"# Python packages - author info in setup.py and pyproject.toml\n",
"# Model cards - creator information for ML models\n",
"\n",
"# CONSTRAINTS: Key Implementation Requirements\n",
"# - Use actual information (not placeholder text)\n",
"# - Email must be valid format (contains @ and domain)\n",
"# - System name should be unique and descriptive\n",
"# - All values must be strings, version stays '1.0.0'\n",
"\n",
"# CONTEXT: Why This Matters in ML Systems\n",
"# Professional ML development requires attribution:\n",
"# - Model ownership: Who built this neural network?\n",
"# - Collaboration: Others can contact you about issues\n",
"# - Professional standards: Industry practice for all software\n",
"# - System customization: Makes your TinyTorch installation unique\n",
"```\n",
"\n",
"**You're establishing your identity in the ML systems world.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0a8f7d8",
"metadata": {
"deletable": false,
"lines_to_next_cell": 1,
"nbgrader": {
"cell_type": "code",
"checksum": "885d89952aa40ac841392d44360964ef",
"grade": false,
"grade_id": "personal-info",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"#| export\n",
"def personal_info() -> Dict[str, str]:\n",
" \"\"\"\n",
" Return personal information for this TinyTorch installation.\n",
" \n",
" This function configures your personal TinyTorch installation with your identity.\n",
" It's the foundation of proper ML engineering practices - every system needs\n",
" to know who built it and how to contact them.\n",
" \n",
" TODO: Implement personal information configuration.\n",
" \n",
" STEP-BY-STEP IMPLEMENTATION:\n",
" 1. Create a dictionary with your personal details\n",
" 2. Include all required keys: developer, email, institution, system_name, version\n",
" 3. Use your actual information (not placeholder text)\n",
" 4. Make system_name unique and descriptive\n",
" 5. Keep version as '1.0.0' for now\n",
" \n",
" EXAMPLE OUTPUT:\n",
" {\n",
" 'developer': 'Student Name',\n",
" 'email': 'student@university.edu', \n",
" 'institution': 'University Name',\n",
" 'system_name': 'StudentName-TinyTorch-Dev',\n",
" 'version': '1.0.0'\n",
" }\n",
" \n",
" IMPLEMENTATION HINTS:\n",
" - Replace the example with your real information\n",
" - Use a descriptive system_name (e.g., 'YourName-TinyTorch-Dev')\n",
" - Keep email format valid (contains @ and domain)\n",
" - Make sure all values are strings\n",
" - Consider how this info will be used in debugging and collaboration\n",
" \n",
" LEARNING CONNECTIONS:\n",
" - This is like the 'author' field in Git commits\n",
" - Similar to maintainer info in Docker images\n",
" - Parallels author info in Python packages\n",
" - Foundation for professional ML development\n",
" \"\"\"\n",
" # YOUR CODE HERE\n",
" raise NotImplementedError()"
]
},
{
"cell_type": "markdown",
"id": "7279ac1a",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"### 🧪 Unit Test: Personal Information Configuration\n",
"\n",
"This test validates your `personal_info()` function implementation, ensuring it returns properly formatted developer information for system attribution and collaboration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5abb07b",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "90ec3c137ee806c81d6b360f1edfe6db",
"grade": true,
"grade_id": "test-personal-info-immediate",
"locked": true,
"points": 5,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"def test_unit_personal_info_basic():\n",
" \"\"\"Test personal_info function implementation.\"\"\"\n",
" print(\"🔬 Unit Test: Personal Information...\")\n",
" \n",
" # Test personal_info function\n",
" personal = personal_info()\n",
" \n",
" # Test return type\n",
" assert isinstance(personal, dict), \"personal_info should return a dictionary\"\n",
" \n",
" # Test required keys\n",
" required_keys = ['developer', 'email', 'institution', 'system_name', 'version']\n",
" for key in required_keys:\n",
" assert key in personal, f\"Dictionary should have '{key}' key\"\n",
" \n",
" # Test non-empty values\n",
" for key, value in personal.items():\n",
" assert isinstance(value, str), f\"Value for '{key}' should be a string\"\n",
" assert len(value) > 0, f\"Value for '{key}' cannot be empty\"\n",
" \n",
" # Test email format\n",
" assert '@' in personal['email'], \"Email should contain @ symbol\"\n",
" assert '.' in personal['email'], \"Email should contain domain\"\n",
" \n",
" # Test version format\n",
" assert personal['version'] == '1.0.0', \"Version should be '1.0.0'\"\n",
" \n",
" # Test system name (should be unique/personalized)\n",
" assert len(personal['system_name']) > 5, \"System name should be descriptive\"\n",
" \n",
" print(\"✅ Personal info function tests passed!\")\n",
" print(f\"✅ TinyTorch configured for: {personal['developer']}\")\n",
"\n",
"# Run the test\n",
"test_unit_personal_info_basic()"
]
},
{
"cell_type": "markdown",
"id": "3e47a754",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## Step 3: System Information Queries\n",
"\n",
"### The Concept: Hardware-Aware ML Systems\n",
"**System information** provides details about your hardware and software environment. This is crucial for ML development because machine learning is fundamentally about computation, and computation depends on hardware.\n",
"\n",
"### Why System Information Matters in ML Engineering\n",
"\n",
"#### 1. **Performance Optimization**\n",
"- **CPU cores**: Determines parallelization strategies\n",
"- **Memory**: Limits batch size and model size\n",
"- **Architecture**: Affects numerical precision and optimization\n",
"\n",
"#### 2. **Compatibility and Debugging**\n",
"- **Python version**: Determines available features and libraries\n",
"- **Platform**: Affects file paths, process management, and system calls\n",
"- **Architecture**: Influences numerical behavior and optimization\n",
"\n",
"#### 3. **Resource Planning**\n",
"- **Training time estimation**: More cores = faster training\n",
"- **Memory requirements**: Avoid out-of-memory errors\n",
"- **Deployment matching**: Development should match production\n",
"\n",
"#### 4. **Reproducibility**\n",
"- **Environment documentation**: Exact system specifications\n",
"- **Performance comparison**: Same code, different hardware\n",
"- **Bug reproduction**: System-specific issues\n",
"\n",
"### The Python System Query Toolkit\n",
"You'll learn to use these essential Python modules:\n",
"\n",
"#### `sys.version_info` - Python Version\n",
"```python\n",
"version_info = sys.version_info\n",
"python_version = f\"{version_info.major}.{version_info.minor}.{version_info.micro}\"\n",
"# Example: \"3.9.7\"\n",
"```\n",
"\n",
"#### `platform.system()` - Operating System\n",
"```python\n",
"platform_name = platform.system()\n",
"# Examples: \"Darwin\" (macOS), \"Linux\", \"Windows\"\n",
"```\n",
"\n",
"#### `platform.machine()` - CPU Architecture\n",
"```python\n",
"architecture = platform.machine()\n",
"# Examples: \"x86_64\", \"arm64\", \"aarch64\"\n",
"```\n",
"\n",
"#### `psutil.cpu_count()` - CPU Cores\n",
"```python\n",
"cpu_count = psutil.cpu_count()\n",
"# Example: 8 (cores available for parallel processing)\n",
"```\n",
"\n",
"#### `psutil.virtual_memory().total` - Total RAM\n",
"```python\n",
"memory_bytes = psutil.virtual_memory().total\n",
"memory_gb = round(memory_bytes / (1024**3), 1)\n",
"# Example: 16.0 GB\n",
"```\n",
"\n",
"### Real-World Applications\n",
"- **PyTorch**: `torch.get_num_threads()` uses CPU count\n",
"- **TensorFlow**: `tf.config.list_physical_devices()` queries hardware\n",
"- **Scikit-learn**: `n_jobs=-1` uses all available cores\n",
"- **Dask**: Automatically configures workers based on CPU count\n",
"\n",
"### ML Systems Performance Considerations\n",
"- **Memory-bound operations**: Matrix multiplication, large model loading\n",
"- **CPU-bound operations**: Data preprocessing, feature engineering\n",
"- **I/O-bound operations**: Data loading, model saving\n",
"- **Platform-specific optimizations**: SIMD instructions, memory management\n",
"\n",
"Now let's implement system information queries!"
]
},
{
"cell_type": "markdown",
"id": "188419e9",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"### Before We Code: The 5 C's\n",
"\n",
"```python\n",
"# CONCEPT: What is System Information?\n",
"# Hardware and software environment detection for ML systems.\n",
"# Think computer specifications for gaming - ML needs to know what\n",
"# resources are available for optimal performance.\n",
"\n",
"# CODE STRUCTURE: What We're Building \n",
"def system_info() -> Dict[str, Any]: # Queries system specs\n",
" return { # Hardware/software details\n",
" 'python_version': '3.9.7', # Python compatibility\n",
" 'platform': 'Darwin', # Operating system\n",
" 'architecture': 'arm64', # CPU architecture\n",
" 'cpu_count': 8, # Parallel processing cores\n",
" 'memory_gb': 16.0 # Available RAM\n",
" }\n",
"\n",
"# CONNECTIONS: Real-World Equivalents\n",
"# torch.get_num_threads() (PyTorch) - uses CPU count for optimization\n",
"# tf.config.list_physical_devices() (TensorFlow) - queries hardware\n",
"# psutil.cpu_count() (System monitoring) - same underlying queries\n",
"# MLflow system tracking - documents environment for reproducibility\n",
"\n",
"# CONSTRAINTS: Key Implementation Requirements\n",
"# - Use actual system queries (not hardcoded values)\n",
"# - Convert memory from bytes to GB for readability\n",
"# - Round memory to 1 decimal place for clean output\n",
"# - Return proper data types (strings, int, float)\n",
"\n",
"# CONTEXT: Why This Matters in ML Systems\n",
"# Hardware awareness enables performance optimization:\n",
"# - Training: More CPU cores = faster data processing\n",
"# - Memory: Determines maximum model and batch sizes\n",
"# - Debugging: System specs help troubleshoot performance issues\n",
"# - Reproducibility: Document exact environment for experiment tracking\n",
"```\n",
"\n",
"**You're building hardware-aware ML systems that adapt to their environment.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "77998a3c",
"metadata": {
"deletable": false,
"lines_to_next_cell": 1,
"nbgrader": {
"cell_type": "code",
"checksum": "b6a128f46146114516fccef552012497",
"grade": false,
"grade_id": "system-info",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"#| export\n",
"def system_info() -> Dict[str, Any]:\n",
" \"\"\"\n",
" Query and return system information for this TinyTorch installation.\n",
" \n",
" This function gathers crucial hardware and software information that affects\n",
" ML performance, compatibility, and debugging. It's the foundation of \n",
" hardware-aware ML systems.\n",
" \n",
" TODO: Implement system information queries.\n",
" \n",
" STEP-BY-STEP IMPLEMENTATION:\n",
" 1. Get Python version using sys.version_info\n",
" 2. Get platform using platform.system()\n",
" 3. Get architecture using platform.machine()\n",
" 4. Get CPU count using psutil.cpu_count()\n",
" 5. Get memory using psutil.virtual_memory().total\n",
" 6. Convert memory from bytes to GB (divide by 1024^3)\n",
" 7. Return all information in a dictionary\n",
" \n",
" EXAMPLE OUTPUT:\n",
" {\n",
" 'python_version': '3.9.7',\n",
" 'platform': 'Darwin', \n",
" 'architecture': 'arm64',\n",
" 'cpu_count': 8,\n",
" 'memory_gb': 16.0\n",
" }\n",
" \n",
" IMPLEMENTATION HINTS:\n",
" - Use f-string formatting for Python version: f\"{major}.{minor}.{micro}\"\n",
" - Memory conversion: bytes / (1024^3) = GB\n",
" - Round memory to 1 decimal place for readability\n",
" - Make sure data types are correct (strings for text, int for cpu_count, float for memory_gb)\n",
" \n",
" LEARNING CONNECTIONS:\n",
" - This is like `torch.cuda.is_available()` in PyTorch\n",
" - Similar to system info in MLflow experiment tracking\n",
" - Parallels hardware detection in TensorFlow\n",
" - Foundation for performance optimization in ML systems\n",
" \n",
" PERFORMANCE IMPLICATIONS:\n",
" - cpu_count affects parallel processing capabilities\n",
" - memory_gb determines maximum model and batch sizes\n",
" - platform affects file system and process management\n",
" - architecture influences numerical precision and optimization\n",
" \"\"\"\n",
" # YOUR CODE HERE\n",
" raise NotImplementedError()"
]
},
{
"cell_type": "markdown",
"id": "7f324c88",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"### 🧪 Unit Test: System Information Query\n",
"\n",
"This test validates your `system_info()` function implementation, ensuring it accurately detects and reports hardware and software specifications for performance optimization and debugging."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "094b8f68",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "b6a307022113102000f8c1cbb71f57ef",
"grade": true,
"grade_id": "test-system-info-immediate",
"locked": true,
"points": 5,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"def test_unit_system_info_basic():\n",
" \"\"\"Test system_info function implementation.\"\"\"\n",
" print(\"🔬 Unit Test: System Information...\")\n",
" \n",
" # Test system_info function\n",
" sys_info = system_info()\n",
" \n",
" # Test return type\n",
" assert isinstance(sys_info, dict), \"system_info should return a dictionary\"\n",
" \n",
" # Test required keys\n",
" required_keys = ['python_version', 'platform', 'architecture', 'cpu_count', 'memory_gb']\n",
" for key in required_keys:\n",
" assert key in sys_info, f\"Dictionary should have '{key}' key\"\n",
" \n",
" # Test data types\n",
" assert isinstance(sys_info['python_version'], str), \"python_version should be string\"\n",
" assert isinstance(sys_info['platform'], str), \"platform should be string\"\n",
" assert isinstance(sys_info['architecture'], str), \"architecture should be string\"\n",
" assert isinstance(sys_info['cpu_count'], int), \"cpu_count should be integer\"\n",
" assert isinstance(sys_info['memory_gb'], (int, float)), \"memory_gb should be number\"\n",
" \n",
" # Test reasonable values\n",
" assert sys_info['cpu_count'] > 0, \"CPU count should be positive\"\n",
" assert sys_info['memory_gb'] > 0, \"Memory should be positive\"\n",
" assert len(sys_info['python_version']) > 0, \"Python version should not be empty\"\n",
" \n",
" # Test that values are actually queried (not hardcoded)\n",
" actual_version = f\"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}\"\n",
" assert sys_info['python_version'] == actual_version, \"Python version should match actual system\"\n",
" \n",
" print(\"✅ System info function tests passed!\")\n",
" print(f\"✅ Python: {sys_info['python_version']} on {sys_info['platform']}\")\n",
"\n",
"# Run the test\n",
"test_unit_system_info_basic()"
]
},
{
"cell_type": "markdown",
"id": "e0e55b7e",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 🧪 Testing Your Configuration Functions\n",
"\n",
"### The Importance of Testing in ML Systems\n",
"Before we test your implementation, let's understand why testing is crucial in ML systems:\n",
"\n",
"#### 1. **Reliability**\n",
"- **Function correctness**: Does your code do what it's supposed to?\n",
"- **Edge case handling**: What happens with unexpected inputs?\n",
"- **Error detection**: Catch bugs before they cause problems\n",
"\n",
"#### 2. **Reproducibility**\n",
"- **Consistent behavior**: Same inputs always produce same outputs\n",
"- **Environment validation**: Ensure setup works across different systems\n",
"- **Regression prevention**: New changes don't break existing functionality\n",
"\n",
"#### 3. **Professional Development**\n",
"- **Code quality**: Well-tested code is maintainable code\n",
"- **Collaboration**: Others can trust and extend your work\n",
"- **Documentation**: Tests serve as executable documentation\n",
"\n",
"#### 4. **ML-Specific Concerns**\n",
"- **Data validation**: Ensure data types and shapes are correct\n",
"- **Performance verification**: Check that optimizations work\n",
"- **System compatibility**: Verify cross-platform behavior\n",
"\n",
"### Testing Strategy\n",
"We'll use comprehensive testing that checks:\n",
"- **Return types**: Are outputs the correct data types?\n",
"- **Required fields**: Are all expected keys present?\n",
"- **Data validation**: Are values reasonable and properly formatted?\n",
"- **System accuracy**: Do queries match actual system state?\n",
"\n",
"Now let's test your configuration functions!"
]
},
{
"cell_type": "markdown",
"id": "56c9c340",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"### 🎯 Additional Comprehensive Tests\n",
"\n",
"These comprehensive tests validate that your configuration functions work together and integrate properly with the TinyTorch system."
]
},
{
"cell_type": "markdown",
"id": "f9ed0b99",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## 🎯 MODULE SUMMARY: Setup Configuration\n",
"\n",
"You've successfully configured your TinyTorch installation and learned the foundations of ML systems engineering:\n",
"\n",
"### What You've Accomplished\n",
"✅ **Personal Configuration**: Set up your identity and custom system name \n",
"✅ **System Queries**: Learned to gather hardware and software information \n",
"✅ **NBGrader Workflow**: Mastered solution blocks and automated testing \n",
"✅ **Code Export**: Created functions that become part of your tinytorch package \n",
"✅ **Professional Setup**: Established proper development practices \n",
"\n",
"### Key Concepts You've Learned\n",
"\n",
"#### 1. **System Awareness**\n",
"- **Hardware constraints**: Understanding CPU, memory, and architecture limitations\n",
"- **Software dependencies**: Python version and platform compatibility\n",
"- **Performance implications**: How system specs affect ML workloads\n",
"\n",
"#### 2. **Configuration Management**\n",
"- **Personal identification**: Professional attribution and contact information\n",
"- **Environment documentation**: Reproducible system specifications\n",
"- **Professional standards**: Industry-standard development practices\n",
"\n",
"#### 3. **ML Systems Foundations**\n",
"- **Reproducibility**: System context for experiment tracking\n",
"- **Debugging**: Hardware info for performance troubleshooting\n",
"- **Collaboration**: Proper attribution and contact information\n",
"\n",
"#### 4. **Development Workflow**\n",
"- **NBGrader integration**: Automated testing and grading\n",
"- **Code export**: Functions become part of production package\n",
"- **Testing practices**: Comprehensive validation of functionality\n",
"\n",
"### Next Steps in Your ML Systems Journey\n",
"\n",
"#### **Immediate Actions**\n",
"1. **Export your code**: `tito module export 01_setup`\n",
"2. **Test your installation**: \n",
" ```python\n",
" from tinytorch.core.setup import personal_info, system_info\n",
" print(personal_info()) # Your personal details\n",
" print(system_info()) # System information\n",
" ```\n",
"3. **Verify package integration**: Ensure your functions work in the tinytorch package\n",
"\n",
"#### **Looking Ahead**\n",
"- **Module 1 (Tensor)**: Build the fundamental data structure for ML\n",
"- **Module 2 (Activations)**: Add nonlinearity for complex learning\n",
"- **Module 3 (Layers)**: Create the building blocks of neural networks\n",
"- **Module 4 (Networks)**: Compose layers into powerful architectures\n",
"\n",
"#### **Course Progression**\n",
"You're now ready to build a complete ML system from scratch:\n",
"```\n",
"Setup → Tensor → Activations → Layers → Networks → CNN → DataLoader → \n",
"Autograd → Optimizers → Training → Compression → Kernels → Benchmarking → MLOps\n",
"```\n",
"\n",
"### Professional Development Milestone\n",
"\n",
"You've taken your first step in ML systems engineering! This module taught you:\n",
"- **System thinking**: Understanding hardware and software constraints\n",
"- **Professional practices**: Proper attribution, testing, and documentation\n",
"- **Tool mastery**: NBGrader workflow and package development\n",
"- **Foundation building**: Creating reusable, tested, documented code\n",
"\n",
"**Ready for the next challenge?** Let's build the foundation of ML systems with tensors!"
]
}
],
"metadata": {
"jupytext": {
"main_language": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -1,840 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "699bd495",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"# Setup - TinyTorch System Configuration\n",
"\n",
"Welcome to TinyTorch! This setup module configures your personal TinyTorch installation and teaches you the NBGrader workflow.\n",
"\n",
"## Learning Goals\n",
"- Configure your personal TinyTorch installation with custom information\n",
"- Learn to query system information using Python modules\n",
"- Master the NBGrader workflow: implement \u2192 test \u2192 export\n",
"- Create functions that become part of your tinytorch package\n",
"- Understand solution blocks, hidden tests, and automated grading\n",
"\n",
"## The Big Picture: Why Configuration Matters in ML Systems\n",
"Configuration is the foundation of any production ML system. In this module, you'll learn:\n",
"\n",
"### 1. **System Awareness**\n",
"Real ML systems need to understand their environment:\n",
"- **Hardware constraints**: Memory, CPU cores, GPU availability\n",
"- **Software dependencies**: Python version, library compatibility\n",
"- **Platform differences**: Linux servers, macOS development, Windows deployment\n",
"\n",
"### 2. **Reproducibility**\n",
"Configuration enables reproducible ML:\n",
"- **Environment documentation**: Exactly what system was used\n",
"- **Dependency management**: Precise versions and requirements\n",
"- **Debugging support**: System info helps troubleshoot issues\n",
"\n",
"### 3. **Professional Development**\n",
"Proper configuration shows engineering maturity:\n",
"- **Attribution**: Your work is properly credited\n",
"- **Collaboration**: Others can understand and extend your setup\n",
"- **Maintenance**: Systems can be updated and maintained\n",
"\n",
"### 4. **ML Systems Context**\n",
"This connects to broader ML engineering:\n",
"- **Model deployment**: Different environments need different configs\n",
"- **Monitoring**: System metrics help track performance\n",
"- **Scaling**: Understanding hardware helps optimize training\n",
"\n",
"Let's build the foundation of your ML systems engineering skills!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a06f484d",
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "setup-imports",
"locked": false,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"#| default_exp core.setup\n",
"\n",
"#| export\n",
"import sys\n",
"import platform\n",
"import psutil\n",
"import os\n",
"from typing import Dict, Any"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f63f890e",
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "setup-verification",
"locked": false,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"print(\"\ud83d\udd25 TinyTorch Setup Module\")\n",
"print(f\"Python version: {sys.version_info.major}.{sys.version_info.minor}\")\n",
"print(f\"Platform: {platform.system()}\")\n",
"print(\"Ready to configure your TinyTorch installation!\")"
]
},
{
"cell_type": "markdown",
"id": "de5378e3",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## \ud83c\udfd7\ufe0f The Architecture of ML Systems Configuration\n",
"\n",
"### Configuration Layers in Production ML\n",
"Real ML systems have multiple configuration layers:\n",
"\n",
"```\n",
"\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n",
"\u2502 Application Config \u2502 \u2190 Your personal info\n",
"\u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n",
"\u2502 System Environment \u2502 \u2190 Hardware specs\n",
"\u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n",
"\u2502 Runtime Configuration \u2502 \u2190 Python, libraries\n",
"\u251c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n",
"\u2502 Infrastructure Config \u2502 \u2190 Cloud, containers\n",
"\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n",
"```\n",
"\n",
"### Why Each Layer Matters\n",
"- **Application**: Identifies who built what and when\n",
"- **System**: Determines performance characteristics and limitations\n",
"- **Runtime**: Affects compatibility and feature availability\n",
"- **Infrastructure**: Enables scaling and deployment strategies\n",
"\n",
"### Connection to Real ML Frameworks\n",
"Every major ML framework has configuration:\n",
"- **PyTorch**: `torch.cuda.is_available()`, `torch.get_num_threads()`\n",
"- **TensorFlow**: `tf.config.list_physical_devices()`, `tf.sysconfig.get_build_info()`\n",
"- **Hugging Face**: Model cards with system requirements and performance metrics\n",
"- **MLflow**: Experiment tracking with system context and reproducibility\n",
"\n",
"### TinyTorch's Approach\n",
"We'll build configuration that's:\n",
"- **Educational**: Teaches system awareness\n",
"- **Practical**: Actually useful for debugging\n",
"- **Professional**: Follows industry standards\n",
"- **Extensible**: Ready for future ML systems features"
]
},
{
"cell_type": "markdown",
"id": "9c51b4b0",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 2
},
"source": [
"## Step 1: What is System Configuration?\n",
"\n",
"### Definition\n",
"**System configuration** is the process of setting up your development environment with personalized information and system diagnostics. In TinyTorch, this means:\n",
"\n",
"- **Personal Information**: Your name, email, institution for identification\n",
"- **System Information**: Hardware specs, Python version, platform details\n",
"- **Customization**: Making your TinyTorch installation uniquely yours\n",
"\n",
"### Why Configuration Matters in ML Systems\n",
"Proper system configuration is crucial because:\n",
"\n",
"#### 1. **Reproducibility** \n",
"Your setup can be documented and shared:\n",
"```python\n",
"# Someone else can recreate your environment\n",
"config = {\n",
" 'developer': 'Your Name',\n",
" 'python_version': '3.9.7',\n",
" 'platform': 'Darwin',\n",
" 'memory_gb': 16.0\n",
"}\n",
"```\n",
"\n",
"#### 2. **Debugging**\n",
"System info helps troubleshoot ML performance issues:\n",
"- **Memory errors**: \"Do I have enough RAM for this model?\"\n",
"- **Performance issues**: \"How many CPU cores can I use?\"\n",
"- **Compatibility problems**: \"What Python version am I running?\"\n",
"\n",
"#### 3. **Professional Development**\n",
"Shows proper engineering practices:\n",
"- **Attribution**: Your work is properly credited\n",
"- **Collaboration**: Others can contact you about your code\n",
"- **Documentation**: System context is preserved\n",
"\n",
"#### 4. **ML Systems Integration**\n",
"Connects to broader ML engineering:\n",
"- **Model cards**: Document system requirements\n",
"- **Experiment tracking**: Record hardware context\n",
"- **Deployment**: Match development to production environments\n",
"\n",
"### Real-World Examples\n",
"- **Google Colab**: Shows GPU type, RAM, disk space\n",
"- **Kaggle**: Displays system specs for reproducibility\n",
"- **MLflow**: Tracks system context with experiments\n",
"- **Docker**: Containerizes entire system configuration\n",
"\n",
"Let's start configuring your TinyTorch system!"
]
},
{
"cell_type": "markdown",
"id": "37575c5c",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## Step 2: Personal Information Configuration\n",
"\n",
"### The Concept: Identity in ML Systems\n",
"Your **personal information** identifies you as the developer and configures your TinyTorch installation. This isn't just administrative - it's foundational to professional ML development.\n",
"\n",
"### Why Personal Info Matters in ML Engineering\n",
"\n",
"#### 1. **Attribution and Accountability**\n",
"- **Model ownership**: Who built this model?\n",
"- **Responsibility**: Who should be contacted about issues?\n",
"- **Credit**: Proper recognition for your work\n",
"\n",
"#### 2. **Collaboration and Communication**\n",
"- **Team coordination**: Multiple developers on ML projects\n",
"- **Knowledge sharing**: Others can learn from your work\n",
"- **Bug reports**: Contact info for issues and improvements\n",
"\n",
"#### 3. **Professional Standards**\n",
"- **Industry practice**: All professional software has attribution\n",
"- **Open source**: Proper credit in shared code\n",
"- **Academic integrity**: Clear authorship in research\n",
"\n",
"#### 4. **System Customization**\n",
"- **Personalized experience**: Your TinyTorch installation\n",
"- **Unique identification**: Distinguish your work from others\n",
"- **Development tracking**: Link code to developer\n",
"\n",
"### Real-World Parallels\n",
"- **Git commits**: Author name and email in every commit\n",
"- **Docker images**: Maintainer information in container metadata\n",
"- **Python packages**: Author info in `setup.py` and `pyproject.toml`\n",
"- **Model cards**: Creator information for ML models\n",
"\n",
"### Best Practices for Personal Configuration\n",
"- **Use real information**: Not placeholders or fake data\n",
"- **Professional email**: Accessible and appropriate\n",
"- **Descriptive system name**: Unique and meaningful\n",
"- **Consistent formatting**: Follow established conventions\n",
"\n",
"Now let's implement your personal configuration!"
]
},
{
"cell_type": "markdown",
"id": "363c3cb7",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"### Before We Code: The 5 C's\n",
"\n",
"```python\n",
"# CONCEPT: What is Personal Information Configuration?\n",
"# Developer identity configuration that identifies you as the creator and\n",
"# configures your TinyTorch installation. Think Git commit attribution -\n",
"# every professional system needs to know who built it.\n",
"\n",
"# CODE STRUCTURE: What We're Building \n",
"def personal_info() -> Dict[str, str]: # Returns developer identity\n",
" return { # Dictionary with required fields\n",
" 'developer': 'Your Name', # Your actual name\n",
" 'email': 'your@domain.com', # Contact information\n",
" 'institution': 'Your Place', # Affiliation\n",
" 'system_name': 'YourName-Dev', # Unique system identifier\n",
" 'version': '1.0.0' # Configuration version\n",
" }\n",
"\n",
"# CONNECTIONS: Real-World Equivalents\n",
"# Git commits - author name and email in every commit\n",
"# Docker images - maintainer information in container metadata\n",
"# Python packages - author info in setup.py and pyproject.toml\n",
"# Model cards - creator information for ML models\n",
"\n",
"# CONSTRAINTS: Key Implementation Requirements\n",
"# - Use actual information (not placeholder text)\n",
"# - Email must be valid format (contains @ and domain)\n",
"# - System name should be unique and descriptive\n",
"# - All values must be strings, version stays '1.0.0'\n",
"\n",
"# CONTEXT: Why This Matters in ML Systems\n",
"# Professional ML development requires attribution:\n",
"# - Model ownership: Who built this neural network?\n",
"# - Collaboration: Others can contact you about issues\n",
"# - Professional standards: Industry practice for all software\n",
"# - System customization: Makes your TinyTorch installation unique\n",
"```\n",
"\n",
"**You're establishing your identity in the ML systems world.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0a8f7d8",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
"grade": false,
"grade_id": "personal-info",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"#| export\n",
"def personal_info() -> Dict[str, str]:\n",
" \"\"\"\n",
" Return personal information for this TinyTorch installation.\n",
" \n",
" This function configures your personal TinyTorch installation with your identity.\n",
" It's the foundation of proper ML engineering practices - every system needs\n",
" to know who built it and how to contact them.\n",
" \n",
" TODO: Implement personal information configuration.\n",
" \n",
" STEP-BY-STEP IMPLEMENTATION:\n",
" 1. Create a dictionary with your personal details\n",
" 2. Include all required keys: developer, email, institution, system_name, version\n",
" 3. Use your actual information (not placeholder text)\n",
" 4. Make system_name unique and descriptive\n",
" 5. Keep version as '1.0.0' for now\n",
" \n",
" EXAMPLE OUTPUT:\n",
" {\n",
" 'developer': 'Student Name',\n",
" 'email': 'student@university.edu', \n",
" 'institution': 'University Name',\n",
" 'system_name': 'StudentName-TinyTorch-Dev',\n",
" 'version': '1.0.0'\n",
" }\n",
" \n",
" IMPLEMENTATION HINTS:\n",
" - Replace the example with your real information\n",
" - Use a descriptive system_name (e.g., 'YourName-TinyTorch-Dev')\n",
" - Keep email format valid (contains @ and domain)\n",
" - Make sure all values are strings\n",
" - Consider how this info will be used in debugging and collaboration\n",
" \n",
" LEARNING CONNECTIONS:\n",
" - This is like the 'author' field in Git commits\n",
" - Similar to maintainer info in Docker images\n",
" - Parallels author info in Python packages\n",
" - Foundation for professional ML development\n",
" \"\"\"\n",
" ### BEGIN SOLUTION\n",
" # YOUR CODE HERE\n",
" raise NotImplementedError()\n",
" ### END SOLUTION"
]
},
{
"cell_type": "markdown",
"id": "7279ac1a",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"### \ud83e\uddea Unit Test: Personal Information Configuration\n",
"\n",
"This test validates your `personal_info()` function implementation, ensuring it returns properly formatted developer information for system attribution and collaboration."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5abb07b",
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "test-personal-info-immediate",
"locked": true,
"points": 5,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"def test_unit_personal_info_basic():\n",
" \"\"\"Test personal_info function implementation.\"\"\"\n",
" print(\"\ud83d\udd2c Unit Test: Personal Information...\")\n",
" \n",
" # Test personal_info function\n",
" personal = personal_info()\n",
" \n",
" # Test return type\n",
" assert isinstance(personal, dict), \"personal_info should return a dictionary\"\n",
" \n",
" # Test required keys\n",
" required_keys = ['developer', 'email', 'institution', 'system_name', 'version']\n",
" for key in required_keys:\n",
" assert key in personal, f\"Dictionary should have '{key}' key\"\n",
" \n",
" # Test non-empty values\n",
" for key, value in personal.items():\n",
" assert isinstance(value, str), f\"Value for '{key}' should be a string\"\n",
" assert len(value) > 0, f\"Value for '{key}' cannot be empty\"\n",
" \n",
" # Test email format\n",
" assert '@' in personal['email'], \"Email should contain @ symbol\"\n",
" assert '.' in personal['email'], \"Email should contain domain\"\n",
" \n",
" # Test version format\n",
" assert personal['version'] == '1.0.0', \"Version should be '1.0.0'\"\n",
" \n",
" # Test system name (should be unique/personalized)\n",
" assert len(personal['system_name']) > 5, \"System name should be descriptive\"\n",
" \n",
" print(\"\u2705 Personal info function tests passed!\")\n",
" print(f\"\u2705 TinyTorch configured for: {personal['developer']}\")\n",
"\n",
"# Run the test\n",
"test_unit_personal_info_basic()"
]
},
{
"cell_type": "markdown",
"id": "3e47a754",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## Step 3: System Information Queries\n",
"\n",
"### The Concept: Hardware-Aware ML Systems\n",
"**System information** provides details about your hardware and software environment. This is crucial for ML development because machine learning is fundamentally about computation, and computation depends on hardware.\n",
"\n",
"### Why System Information Matters in ML Engineering\n",
"\n",
"#### 1. **Performance Optimization**\n",
"- **CPU cores**: Determines parallelization strategies\n",
"- **Memory**: Limits batch size and model size\n",
"- **Architecture**: Affects numerical precision and optimization\n",
"\n",
"#### 2. **Compatibility and Debugging**\n",
"- **Python version**: Determines available features and libraries\n",
"- **Platform**: Affects file paths, process management, and system calls\n",
"- **Architecture**: Influences numerical behavior and optimization\n",
"\n",
"#### 3. **Resource Planning**\n",
"- **Training time estimation**: More cores = faster training\n",
"- **Memory requirements**: Avoid out-of-memory errors\n",
"- **Deployment matching**: Development should match production\n",
"\n",
"#### 4. **Reproducibility**\n",
"- **Environment documentation**: Exact system specifications\n",
"- **Performance comparison**: Same code, different hardware\n",
"- **Bug reproduction**: System-specific issues\n",
"\n",
"### The Python System Query Toolkit\n",
"You'll learn to use these essential Python modules:\n",
"\n",
"#### `sys.version_info` - Python Version\n",
"```python\n",
"version_info = sys.version_info\n",
"python_version = f\"{version_info.major}.{version_info.minor}.{version_info.micro}\"\n",
"# Example: \"3.9.7\"\n",
"```\n",
"\n",
"#### `platform.system()` - Operating System\n",
"```python\n",
"platform_name = platform.system()\n",
"# Examples: \"Darwin\" (macOS), \"Linux\", \"Windows\"\n",
"```\n",
"\n",
"#### `platform.machine()` - CPU Architecture\n",
"```python\n",
"architecture = platform.machine()\n",
"# Examples: \"x86_64\", \"arm64\", \"aarch64\"\n",
"```\n",
"\n",
"#### `psutil.cpu_count()` - CPU Cores\n",
"```python\n",
"cpu_count = psutil.cpu_count()\n",
"# Example: 8 (cores available for parallel processing)\n",
"```\n",
"\n",
"#### `psutil.virtual_memory().total` - Total RAM\n",
"```python\n",
"memory_bytes = psutil.virtual_memory().total\n",
"memory_gb = round(memory_bytes / (1024**3), 1)\n",
"# Example: 16.0 GB\n",
"```\n",
"\n",
"### Real-World Applications\n",
"- **PyTorch**: `torch.get_num_threads()` uses CPU count\n",
"- **TensorFlow**: `tf.config.list_physical_devices()` queries hardware\n",
"- **Scikit-learn**: `n_jobs=-1` uses all available cores\n",
"- **Dask**: Automatically configures workers based on CPU count\n",
"\n",
"### ML Systems Performance Considerations\n",
"- **Memory-bound operations**: Matrix multiplication, large model loading\n",
"- **CPU-bound operations**: Data preprocessing, feature engineering\n",
"- **I/O-bound operations**: Data loading, model saving\n",
"- **Platform-specific optimizations**: SIMD instructions, memory management\n",
"\n",
"Now let's implement system information queries!"
]
},
{
"cell_type": "markdown",
"id": "188419e9",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"### Before We Code: The 5 C's\n",
"\n",
"```python\n",
"# CONCEPT: What is System Information?\n",
"# Hardware and software environment detection for ML systems.\n",
"# Think computer specifications for gaming - ML needs to know what\n",
"# resources are available for optimal performance.\n",
"\n",
"# CODE STRUCTURE: What We're Building \n",
"def system_info() -> Dict[str, Any]: # Queries system specs\n",
" return { # Hardware/software details\n",
" 'python_version': '3.9.7', # Python compatibility\n",
" 'platform': 'Darwin', # Operating system\n",
" 'architecture': 'arm64', # CPU architecture\n",
" 'cpu_count': 8, # Parallel processing cores\n",
" 'memory_gb': 16.0 # Available RAM\n",
" }\n",
"\n",
"# CONNECTIONS: Real-World Equivalents\n",
"# torch.get_num_threads() (PyTorch) - uses CPU count for optimization\n",
"# tf.config.list_physical_devices() (TensorFlow) - queries hardware\n",
"# psutil.cpu_count() (System monitoring) - same underlying queries\n",
"# MLflow system tracking - documents environment for reproducibility\n",
"\n",
"# CONSTRAINTS: Key Implementation Requirements\n",
"# - Use actual system queries (not hardcoded values)\n",
"# - Convert memory from bytes to GB for readability\n",
"# - Round memory to 1 decimal place for clean output\n",
"# - Return proper data types (strings, int, float)\n",
"\n",
"# CONTEXT: Why This Matters in ML Systems\n",
"# Hardware awareness enables performance optimization:\n",
"# - Training: More CPU cores = faster data processing\n",
"# - Memory: Determines maximum model and batch sizes\n",
"# - Debugging: System specs help troubleshoot performance issues\n",
"# - Reproducibility: Document exact environment for experiment tracking\n",
"```\n",
"\n",
"**You're building hardware-aware ML systems that adapt to their environment.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "77998a3c",
"metadata": {
"lines_to_next_cell": 1,
"nbgrader": {
"grade": false,
"grade_id": "system-info",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"#| export\n",
"def system_info() -> Dict[str, Any]:\n",
" \"\"\"\n",
" Query and return system information for this TinyTorch installation.\n",
" \n",
" This function gathers crucial hardware and software information that affects\n",
" ML performance, compatibility, and debugging. It's the foundation of \n",
" hardware-aware ML systems.\n",
" \n",
" TODO: Implement system information queries.\n",
" \n",
" STEP-BY-STEP IMPLEMENTATION:\n",
" 1. Get Python version using sys.version_info\n",
" 2. Get platform using platform.system()\n",
" 3. Get architecture using platform.machine()\n",
" 4. Get CPU count using psutil.cpu_count()\n",
" 5. Get memory using psutil.virtual_memory().total\n",
" 6. Convert memory from bytes to GB (divide by 1024^3)\n",
" 7. Return all information in a dictionary\n",
" \n",
" EXAMPLE OUTPUT:\n",
" {\n",
" 'python_version': '3.9.7',\n",
" 'platform': 'Darwin', \n",
" 'architecture': 'arm64',\n",
" 'cpu_count': 8,\n",
" 'memory_gb': 16.0\n",
" }\n",
" \n",
" IMPLEMENTATION HINTS:\n",
" - Use f-string formatting for Python version: f\"{major}.{minor}.{micro}\"\n",
" - Memory conversion: bytes / (1024^3) = GB\n",
" - Round memory to 1 decimal place for readability\n",
" - Make sure data types are correct (strings for text, int for cpu_count, float for memory_gb)\n",
" \n",
" LEARNING CONNECTIONS:\n",
" - This is like `torch.cuda.is_available()` in PyTorch\n",
" - Similar to system info in MLflow experiment tracking\n",
" - Parallels hardware detection in TensorFlow\n",
" - Foundation for performance optimization in ML systems\n",
" \n",
" PERFORMANCE IMPLICATIONS:\n",
" - cpu_count affects parallel processing capabilities\n",
" - memory_gb determines maximum model and batch sizes\n",
" - platform affects file system and process management\n",
" - architecture influences numerical precision and optimization\n",
" \"\"\"\n",
" ### BEGIN SOLUTION\n",
" # YOUR CODE HERE\n",
" raise NotImplementedError()\n",
" ### END SOLUTION"
]
},
{
"cell_type": "markdown",
"id": "7f324c88",
"metadata": {
"cell_marker": "\"\"\"",
"lines_to_next_cell": 1
},
"source": [
"### \ud83e\uddea Unit Test: System Information Query\n",
"\n",
"This test validates your `system_info()` function implementation, ensuring it accurately detects and reports hardware and software specifications for performance optimization and debugging."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "094b8f68",
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "test-system-info-immediate",
"locked": true,
"points": 5,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"def test_unit_system_info_basic():\n",
" \"\"\"Test system_info function implementation.\"\"\"\n",
" print(\"\ud83d\udd2c Unit Test: System Information...\")\n",
" \n",
" # Test system_info function\n",
" sys_info = system_info()\n",
" \n",
" # Test return type\n",
" assert isinstance(sys_info, dict), \"system_info should return a dictionary\"\n",
" \n",
" # Test required keys\n",
" required_keys = ['python_version', 'platform', 'architecture', 'cpu_count', 'memory_gb']\n",
" for key in required_keys:\n",
" assert key in sys_info, f\"Dictionary should have '{key}' key\"\n",
" \n",
" # Test data types\n",
" assert isinstance(sys_info['python_version'], str), \"python_version should be string\"\n",
" assert isinstance(sys_info['platform'], str), \"platform should be string\"\n",
" assert isinstance(sys_info['architecture'], str), \"architecture should be string\"\n",
" assert isinstance(sys_info['cpu_count'], int), \"cpu_count should be integer\"\n",
" assert isinstance(sys_info['memory_gb'], (int, float)), \"memory_gb should be number\"\n",
" \n",
" # Test reasonable values\n",
" assert sys_info['cpu_count'] > 0, \"CPU count should be positive\"\n",
" assert sys_info['memory_gb'] > 0, \"Memory should be positive\"\n",
" assert len(sys_info['python_version']) > 0, \"Python version should not be empty\"\n",
" \n",
" # Test that values are actually queried (not hardcoded)\n",
" actual_version = f\"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}\"\n",
" assert sys_info['python_version'] == actual_version, \"Python version should match actual system\"\n",
" \n",
" print(\"\u2705 System info function tests passed!\")\n",
" print(f\"\u2705 Python: {sys_info['python_version']} on {sys_info['platform']}\")\n",
"\n",
"# Run the test\n",
"test_unit_system_info_basic()"
]
},
{
"cell_type": "markdown",
"id": "e0e55b7e",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## \ud83e\uddea Testing Your Configuration Functions\n",
"\n",
"### The Importance of Testing in ML Systems\n",
"Before we test your implementation, let's understand why testing is crucial in ML systems:\n",
"\n",
"#### 1. **Reliability**\n",
"- **Function correctness**: Does your code do what it's supposed to?\n",
"- **Edge case handling**: What happens with unexpected inputs?\n",
"- **Error detection**: Catch bugs before they cause problems\n",
"\n",
"#### 2. **Reproducibility**\n",
"- **Consistent behavior**: Same inputs always produce same outputs\n",
"- **Environment validation**: Ensure setup works across different systems\n",
"- **Regression prevention**: New changes don't break existing functionality\n",
"\n",
"#### 3. **Professional Development**\n",
"- **Code quality**: Well-tested code is maintainable code\n",
"- **Collaboration**: Others can trust and extend your work\n",
"- **Documentation**: Tests serve as executable documentation\n",
"\n",
"#### 4. **ML-Specific Concerns**\n",
"- **Data validation**: Ensure data types and shapes are correct\n",
"- **Performance verification**: Check that optimizations work\n",
"- **System compatibility**: Verify cross-platform behavior\n",
"\n",
"### Testing Strategy\n",
"We'll use comprehensive testing that checks:\n",
"- **Return types**: Are outputs the correct data types?\n",
"- **Required fields**: Are all expected keys present?\n",
"- **Data validation**: Are values reasonable and properly formatted?\n",
"- **System accuracy**: Do queries match actual system state?\n",
"\n",
"Now let's test your configuration functions!"
]
},
{
"cell_type": "markdown",
"id": "56c9c340",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"### \ud83c\udfaf Additional Comprehensive Tests\n",
"\n",
"These comprehensive tests validate that your configuration functions work together and integrate properly with the TinyTorch system."
]
},
{
"cell_type": "markdown",
"id": "f9ed0b99",
"metadata": {
"cell_marker": "\"\"\""
},
"source": [
"## \ud83c\udfaf MODULE SUMMARY: Setup Configuration\n",
"\n",
"You've successfully configured your TinyTorch installation and learned the foundations of ML systems engineering:\n",
"\n",
"### What You've Accomplished\n",
"\u2705 **Personal Configuration**: Set up your identity and custom system name \n",
"\u2705 **System Queries**: Learned to gather hardware and software information \n",
"\u2705 **NBGrader Workflow**: Mastered solution blocks and automated testing \n",
"\u2705 **Code Export**: Created functions that become part of your tinytorch package \n",
"\u2705 **Professional Setup**: Established proper development practices \n",
"\n",
"### Key Concepts You've Learned\n",
"\n",
"#### 1. **System Awareness**\n",
"- **Hardware constraints**: Understanding CPU, memory, and architecture limitations\n",
"- **Software dependencies**: Python version and platform compatibility\n",
"- **Performance implications**: How system specs affect ML workloads\n",
"\n",
"#### 2. **Configuration Management**\n",
"- **Personal identification**: Professional attribution and contact information\n",
"- **Environment documentation**: Reproducible system specifications\n",
"- **Professional standards**: Industry-standard development practices\n",
"\n",
"#### 3. **ML Systems Foundations**\n",
"- **Reproducibility**: System context for experiment tracking\n",
"- **Debugging**: Hardware info for performance troubleshooting\n",
"- **Collaboration**: Proper attribution and contact information\n",
"\n",
"#### 4. **Development Workflow**\n",
"- **NBGrader integration**: Automated testing and grading\n",
"- **Code export**: Functions become part of production package\n",
"- **Testing practices**: Comprehensive validation of functionality\n",
"\n",
"### Next Steps in Your ML Systems Journey\n",
"\n",
"#### **Immediate Actions**\n",
"1. **Export your code**: `tito module export 01_setup`\n",
"2. **Test your installation**: \n",
" ```python\n",
" from tinytorch.core.setup import personal_info, system_info\n",
" print(personal_info()) # Your personal details\n",
" print(system_info()) # System information\n",
" ```\n",
"3. **Verify package integration**: Ensure your functions work in the tinytorch package\n",
"\n",
"#### **Looking Ahead**\n",
"- **Module 1 (Tensor)**: Build the fundamental data structure for ML\n",
"- **Module 2 (Activations)**: Add nonlinearity for complex learning\n",
"- **Module 3 (Layers)**: Create the building blocks of neural networks\n",
"- **Module 4 (Networks)**: Compose layers into powerful architectures\n",
"\n",
"#### **Course Progression**\n",
"You're now ready to build a complete ML system from scratch:\n",
"```\n",
"Setup \u2192 Tensor \u2192 Activations \u2192 Layers \u2192 Networks \u2192 CNN \u2192 DataLoader \u2192 \n",
"Autograd \u2192 Optimizers \u2192 Training \u2192 Compression \u2192 Kernels \u2192 Benchmarking \u2192 MLOps\n",
"```\n",
"\n",
"### Professional Development Milestone\n",
"\n",
"You've taken your first step in ML systems engineering! This module taught you:\n",
"- **System thinking**: Understanding hardware and software constraints\n",
"- **Professional practices**: Proper attribution, testing, and documentation\n",
"- **Tool mastery**: NBGrader workflow and package development\n",
"- **Foundation building**: Creating reusable, tested, documented code\n",
"\n",
"**Ready for the next challenge?** Let's build the foundation of ML systems with tensors!"
]
}
],
"metadata": {
"jupytext": {
"main_language": "python"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -0,0 +1,105 @@
# Assignments Are Dynamically Generated
## Important: Assignments Directory Structure
**All assignments are dynamically generated** from the `modules/` directory using `tito nbgrader` commands. The `assignments/` directory should **not** be manually maintained.
## How It Works
### Source of Truth: `modules/` Directory
The actual module content lives in:
```
modules/
├── 01_tensor/
│ └── tensor_dev.py ← Source of truth
├── 02_activations/
│ └── activations_dev.py ← Source of truth
└── ...
```
### Dynamic Generation: `assignments/` Directory
Assignments are **generated** from modules using:
```bash
# Generate assignment for a single module
tito nbgrader generate 01_tensor
# Generate assignments for all modules
tito nbgrader generate --all
# Generate assignments for a range
tito nbgrader generate --range 01-05
```
This creates:
```
assignments/
├── source/
│ ├── 01_tensor/
│ │ └── 01_tensor.ipynb ← Generated from modules/01_tensor/tensor_dev.py
│ └── 02_activations/
│ └── 02_activations.ipynb ← Generated from modules/02_activations/activations_dev.py
└── release/
└── ... (student versions, generated via 'tito nbgrader release')
```
## Process Flow
```
modules/01_tensor/tensor_dev.py
↓ (tito nbgrader generate)
↓ (jupytext converts .py → .ipynb)
↓ (NotebookGenerator processes with nbgrader markers)
assignments/source/01_tensor/01_tensor.ipynb
↓ (tito nbgrader release)
assignments/release/01_tensor/01_tensor.ipynb (student version)
```
## What This Means
1. **Don't manually edit** `assignments/source/` files - they're generated
2. **Edit modules** in `modules/` directory instead
3. **Regenerate assignments** when modules change: `tito nbgrader generate`
4. **Old assignments** (like `01_setup`) are outdated - regenerate from current modules
## Outdated Assignment: `01_setup`
The `assignments/source/01_setup/` directory is **outdated** because:
- Module 01 is now "Tensor" (`modules/01_tensor/`)
- It was created when Module 01 was "Setup" (old structure)
- Should be regenerated: `tito nbgrader generate 01_tensor`
## For Binder/Colab
**No impact** - Binder setup doesn't depend on assignment notebooks. However:
- If you want to include assignments in Binder, regenerate them first:
```bash
tito nbgrader generate --all
```
- Students can access modules directly from `modules/` directory
- Assignments are optional - modules are the source of truth
## Best Practices
1. **Always regenerate** assignments after modifying modules
2. **Don't commit** manually edited assignment files
3. **Use `tito nbgrader generate`** to create assignments
4. **Keep modules/** as the single source of truth
## Commands Reference
```bash
# Generate assignments
tito nbgrader generate 01_tensor # Single module
tito nbgrader generate --all # All modules
tito nbgrader generate --range 01-05 # Range
# Release to students (removes solutions)
tito nbgrader release 01_tensor
# Generate feedback
tito nbgrader feedback 01_tensor
```

View File

@@ -0,0 +1,449 @@
# Benchmark & Community Commands Design
## Command Structure
### Benchmark Commands (Performance)
**Two Types of Benchmarks:**
1. **Baseline Benchmark** (`tito benchmark baseline`)
- Lightweight, runs after setup
- Quick validation: "Everything works!"
- Basic operations: tensor ops, simple forward pass
- **Purpose**: Hello world moment, verify setup
2. **Capstone Benchmark** (`tito benchmark capstone`)
- Full benchmark suite (Module 20)
- Proper performance metrics
- All optimization tracks: Speed, Compression, Accuracy, Efficiency
- **Purpose**: Real performance evaluation, leaderboard
### Community Commands (Cohort Feeling)
1. **Join** (`tito community join`)
- Add to community map
- Share location, institution, course type
- **Purpose**: "I'm part of the cohort!"
2. **Update** (`tito community update`)
- Update progress: milestones, modules completed
- Refresh community entry
- **Purpose**: Track progress in community
3. **Stats** (`tito community stats`)
- See community statistics
- Your cohort info
- **Purpose**: "See who else is building"
4. **Cohort** (`tito community cohort`)
- See your cohort members
- Filter by institution, course type, date
- **Purpose**: "These are my peers!"
## Command Details
### 1. Baseline Benchmark
**Command**: `tito benchmark baseline`
**When to run**: After setup, anytime
**What it does**:
- Runs lightweight benchmarks (no full module 20 needed)
- Tests: tensor creation, matrix multiply, simple forward pass
- Generates JSON with baseline scores
- Shows celebration message
**Output**:
```
🎉 Baseline Benchmark Complete!
📊 Your Baseline Performance:
• Tensor Operations: ⚡ 0.5ms
• Matrix Multiply: ⚡ 2.3ms
• Forward Pass: ⚡ 5.2ms
• Score: 85/100
✅ Setup verified and working!
💡 Run 'tito benchmark capstone' after Module 20 for full benchmarks
```
**JSON Output**: `benchmarks/baseline_TIMESTAMP.json`
### 2. Capstone Benchmark
**Command**: `tito benchmark capstone [--track TRACK]`
**When to run**: After Module 20 (Capstone)
**What it does**:
- Runs full benchmark suite from Module 20
- Tests all optimization tracks:
- Speed: Inference latency, throughput
- Compression: Model size, quantization
- Accuracy: Task performance
- Efficiency: Memory, energy
- Generates comprehensive JSON
- Can submit to leaderboard
**Tracks**:
- `--track speed`: Latency/throughput benchmarks
- `--track compression`: Size/quantization benchmarks
- `--track accuracy`: Task performance benchmarks
- `--track efficiency`: Memory/energy benchmarks
- `--track all`: All tracks (default)
**Output**:
```
🏆 Capstone Benchmark Results
📊 Speed Track:
• Inference Latency: 45.2ms
• Throughput: 22.1 ops/sec
• Score: 92/100
📊 Compression Track:
• Model Size: 12.4MB
• Compression Ratio: 4.2x
• Score: 88/100
📊 Overall Score: 90/100
🌍 Submit to leaderboard: tito community submit --benchmark
```
**JSON Output**: `benchmarks/capstone_TIMESTAMP.json`
### 3. Community Join
**Command**: `tito community join`
**When to run**: After setup, anytime
**What it does**:
- Collects: country, institution, course type (optional)
- Validates setup
- Generates anonymous ID
- Adds to community map
- Shows cohort info
**Output**:
```
🌍 Join the TinyTorch Community
📍 Country: [Auto-detected: United States]
🏫 Institution (optional): Harvard University
📚 Course Type (optional): University course
✅ You've joined the TinyTorch Community!
📍 Location: United States
🏫 Institution: Harvard University
🌍 View map: https://tinytorch.ai/community
🎖️ You're builder #1,234 on the global map!
👥 Your Cohort:
• Fall 2024 cohort: 234 builders
• Harvard University: 15 builders
• University courses: 456 builders
💡 Run 'tito community cohort' to see your peers
```
**JSON Output**: `community/my_submission.json`
### 4. Community Update
**Command**: `tito community update`
**When to run**: After milestones pass, module completion
**What it does**:
- Updates existing community entry
- Adds: milestones passed, modules completed
- Refreshes cohort stats
- Shows updated progress
**Output**:
```
✅ Community Entry Updated!
📊 Your Progress:
• Milestones Passed: 6/6 ✅
• Modules Completed: 20/20 ✅
• Capstone Score: 90/100
👥 Your Cohort Stats:
• Fall 2024: 234 builders (you're #15 by progress!)
• Harvard: 15 builders (you're #3!)
• All milestones: 89 builders worldwide
🌍 View updated map: https://tinytorch.ai/community
```
### 5. Community Stats
**Command**: `tito community stats [--cohort]`
**What it does**:
- Shows global community statistics
- Shows your cohort information
- Shows progress comparisons
**Output**:
```
🌍 TinyTorch Community Stats
📊 Global:
• Total Builders: 1,234
• Countries: 45
• Institutions: 234
• This Week: 23 new builders
👥 Your Cohort (Fall 2024):
• Total: 234 builders
• Your Institution: 15 builders
• Your Progress Rank: #15/234
• Milestones Completed: 89/234 (38%)
📈 Progress Distribution:
• All Milestones: 89 (38%)
• Some Milestones: 123 (53%)
• Just Started: 22 (9%)
🌍 View full map: https://tinytorch.ai/community
```
### 6. Community Cohort
**Command**: `tito community cohort [--institution] [--course-type]`
**What it does**:
- Shows your cohort members
- Filter by institution, course type, date
- Shows progress comparisons
- Creates "these are my peers" feeling
**Output**:
```
👥 Your TinyTorch Cohort
🏫 Harvard University Cohort (15 builders):
Rank | Progress | Joined
-----|-----------------|----------
#1 | 20/20 modules ✅ | Sep 2024
#2 | 20/20 modules ✅ | Sep 2024
#3 | 20/20 modules ✅ | Oct 2024 ← You!
#4 | 15/20 modules | Oct 2024
...
📚 University Course Cohort (456 builders):
• Your rank: #45/456
• Top 10% by progress!
🌍 View full community: https://tinytorch.ai/community
```
## Cohort Features
### Creating "Cohort Feeling"
**1. Cohort Identification**
- "Fall 2024 Cohort"
- "Harvard University Cohort"
- "University Course Cohort"
- "Self-Paced Cohort"
**2. Progress Comparison**
- "You're #15 in your cohort"
- "Top 10% by progress"
- "89 builders in your cohort completed all milestones"
**3. Peer Visibility**
- See others from same institution
- See others in same course type
- See others who joined around same time
**4. Milestone Celebrations**
- "You and 23 others completed Milestone 3 this week!"
- "You're part of the 89 builders who completed all milestones!"
## Data Structure
### Community Submission
```json
{
"anonymous_id": "abc123...",
"timestamp": "2024-11-20T10:30:00Z",
"location": {
"country": "United States"
},
"institution": {
"name": "Harvard University",
"type": "university"
},
"context": {
"course_type": "university_course",
"cohort": "Fall 2024", // Auto-determined by date
"experience_level": "intermediate"
},
"progress": {
"setup_verified": true,
"milestones_passed": 6,
"modules_completed": 20,
"capstone_score": 90
},
"benchmarks": {
"baseline": {
"score": 85,
"timestamp": "2024-11-20T10:00:00Z"
},
"capstone": {
"score": 90,
"tracks": {
"speed": 92,
"compression": 88,
"accuracy": 95,
"efficiency": 85
},
"timestamp": "2024-11-25T15:30:00Z"
}
}
}
```
## Implementation Structure
### Commands to Create
**Benchmark Commands** (`tito/commands/benchmark.py`):
- `tito benchmark baseline` - Quick setup validation
- `tito benchmark capstone` - Full Module 20 benchmarks
- `tito benchmark submit` - Submit to leaderboard
**Community Commands** (`tito/commands/community.py`):
- `tito community join` - Join community map
- `tito community update` - Update progress
- `tito community stats` - View statistics
- `tito community cohort` - See your cohort
- `tito community submit` - Submit benchmarks to leaderboard
## User Journey with Cohort Feeling
```
1. Clone & Setup
2. tito system doctor ✅
3. tito community join
→ "You're builder #1,234"
→ "Fall 2024 cohort: 234 builders"
→ "Harvard: 15 builders"
4. tito benchmark baseline
→ "Score: 85/100"
→ "You're in top 25% of your cohort!"
5. Build modules...
6. tito community update
→ "Milestones: 6/6 ✅"
→ "You're #15 in your cohort!"
7. Complete Module 20...
8. tito benchmark capstone
→ "Score: 90/100"
→ "You're #3 at Harvard!"
9. tito community submit --benchmark
→ "Added to leaderboard!"
→ "Rank: #45 globally, #3 at Harvard"
10. tito community cohort
→ See your peers
→ "These are the builders in my cohort!"
```
## Cohort Features
### What Creates Cohort Feeling
**1. Temporal Cohorts**
- "Fall 2024 Cohort" (by join date)
- "This Week's Cohort" (recent joiners)
- "All-Time Builders" (everyone)
**2. Institutional Cohorts**
- "Harvard University Cohort"
- "Stanford Cohort"
- "Self-Paced Cohort"
**3. Progress Cohorts**
- "All Milestones Cohort" (completed everything)
- "Foundation Tier Cohort" (completed modules 1-7)
- "Capstone Cohort" (completed module 20)
**4. Course Type Cohorts**
- "University Course Cohort"
- "Bootcamp Cohort"
- "Self-Paced Cohort"
### Cohort Messages
**After joining:**
```
👥 Welcome to the Fall 2024 Cohort!
You're joining 234 builders who started TinyTorch this semester.
15 builders are from Harvard University (your institution).
🌍 View your cohort: tito community cohort
```
**After milestones:**
```
🎉 Milestone Achievement!
You and 23 others in the Fall 2024 cohort completed Milestone 3 this week!
You're now part of the 89 builders who've completed all milestones.
👥 See your cohort progress: tito community cohort
```
**After capstone:**
```
🏆 Capstone Complete!
You're #3 in the Harvard cohort!
You're #45 globally among all builders.
👥 Your cohort stats: tito community cohort
```
## Implementation Priority
### Phase 1: Core Commands
1.`tito community join` - Join community
2.`tito benchmark baseline` - Quick validation
3.`tito community stats` - View stats
### Phase 2: Progress Tracking
4.`tito community update` - Update progress
5.`tito community cohort` - See cohort
### Phase 3: Capstone Integration
6.`tito benchmark capstone` - Full benchmarks
7.`tito community submit` - Submit to leaderboard
This creates a complete system where students feel part of a cohort from day one! 🎓🌍

169
binder/BUILD_INTEGRATION.md Normal file
View File

@@ -0,0 +1,169 @@
# Automatic Notebook Preparation in Site Build
## Overview
Notebook preparation is now **automatically integrated** into the site build process. When you build the site, notebooks are automatically prepared for launch buttons to work.
## How It Works
### Automatic Integration
The build process now includes notebook preparation:
```bash
cd site
make html # Automatically prepares notebooks, then builds site
jupyter-book build . # Also prepares notebooks automatically
```
### Build Flow
```
1. User runs: make html
2. prepare_notebooks.sh runs automatically
3. Script looks for existing assignment notebooks
4. Copies them to site/chapters/modules/
5. Jupyter Book builds site
6. Launch buttons appear on notebook pages!
```
## What Gets Prepared
### Source: Assignment Notebooks
The script uses notebooks from `assignments/source/` (generated via `tito nbgrader generate`):
```
assignments/source/01_tensor/01_tensor.ipynb
↓ (copied during build)
site/chapters/modules/01_tensor.ipynb
```
### Why Assignment Notebooks?
- Already processed with nbgrader markers
- Student-ready format
- Generated from Python source files
- Consistent with assignment workflow
## Build Commands
All build commands now include notebook preparation:
### HTML Build
```bash
cd site
make html
# Or directly:
jupyter-book build .
```
### PDF Builds
```bash
make pdf-simple # HTML-to-PDF (includes notebook prep)
make pdf # LaTeX PDF (includes notebook prep)
```
## Manual Preparation (Optional)
If you want to prepare notebooks manually:
```bash
cd site
./prepare_notebooks.sh
```
This is useful for:
- Testing notebook preparation
- Debugging launch button issues
- Preparing notebooks before CI/CD builds
## Workflow Summary
### Complete Development → Site Flow
```
1. Development
Edit: modules/01_tensor/tensor_dev.py
2. Generate Assignments
Run: tito nbgrader generate 01_tensor
Creates: assignments/source/01_tensor/01_tensor.ipynb
3. Build Site (automatic)
Run: cd site && make html
Auto-prepares: Copies notebooks to site/chapters/modules/
Builds: Jupyter Book with launch buttons
4. Launch Buttons Work!
Users click → Binder/Colab opens with notebook
```
## Benefits
**Automatic**: No manual steps needed
**Consistent**: Always uses latest notebooks
**Fast**: Uses existing assignment notebooks when available
**Robust**: Falls back gracefully if notebooks don't exist
**Integrated**: Works with all build commands
## Troubleshooting
### Launch Buttons Don't Appear
1. **Check notebooks exist**:
```bash
ls site/chapters/modules/*.ipynb
```
2. **Regenerate assignments**:
```bash
tito nbgrader generate --all
```
3. **Rebuild site**:
```bash
cd site && make html
```
### Notebooks Not Found
If you see "No notebooks prepared":
- Run `tito nbgrader generate --all` first
- Ensure modules have Python source files
- Check that `tito` command is available
### Build Fails
The prepare script is designed to fail gracefully:
- If `tito` is not available, it skips preparation
- If notebooks don't exist, it warns but continues
- Build continues even if preparation fails
## CI/CD Integration
For automated builds (GitHub Actions, etc.):
```yaml
# Example GitHub Actions step
- name: Build site
run: |
cd site
make html
```
The prepare script automatically handles:
- Missing `tito` command (skips gracefully)
- Missing notebooks (warns but continues)
- Non-git environments (works in CI/CD)
## Next Steps
1. ✅ Notebook preparation integrated into build
2. ✅ Launch buttons will work automatically
3. ⏳ Test Binder/Colab links after build
4. ⏳ Verify launch buttons appear on site

29
binder/CLEANUP_NOTES.md Normal file
View File

@@ -0,0 +1,29 @@
# Cleanup Notes: Old 01_setup Module
## Issue
The `assignments/source/01_setup/` directory contains an outdated notebook from when Module 01 was "Setup". Module 01 is now "Tensor" (`modules/01_tensor/`).
## Current State
-**Current Module 01**: `modules/01_tensor/` (Tensor)
- ⚠️ **Old Assignment**: `assignments/source/01_setup/` (outdated)
-**Current Assignment**: `assignments/source/02_tensor/` (Tensor)
## Impact on Binder/Colab
**No impact** - Binder setup doesn't depend on specific assignment notebooks. The `binder/` configuration:
- Installs TinyTorch package (`pip install -e .`)
- Provides JupyterLab environment
- Students can access any notebooks in the repository
## References Updated
-`binder/VERIFY.md` - Updated Colab example to use `02_tensor`
-`site/usage-paths/classroom-use.md` - Updated nbgrader commands
-`docs/STUDENT_QUICKSTART.md` - Updated module references
## Recommendation
The old `assignments/source/01_setup/` directory can be:
1. **Removed** if no longer needed (cleanest option)
2. **Kept** if you want to preserve old assignments for reference
3. **Moved** to an archive directory if you want to keep history
**For Binder/Colab**: No action needed - they work regardless of this old directory.

View File

@@ -0,0 +1,146 @@
# Cloud Notebook Options for TinyTorch
## Current Setup
**Currently Configured:**
-**MyBinder** (`https://mybinder.org`) - Free, open-source, works well
-**Google Colab** (`https://colab.research.google.com`) - Free, popular, GPU access
## Available Options
### 1. MyBinder (Current) ✅
**Pros:**
- Free and open-source
- No account required
- Works directly from GitHub
- Good for educational use
- Already configured and working
**Cons:**
- Can be slow to start (2-5 minutes)
- Limited resources (CPU, memory)
- No GPU access
- Sessions timeout after inactivity
**Best For:** Educational use, quick demos, zero-setup access
### 2. Google Colab (Current) ✅
**Pros:**
- Free tier available
- GPU access (free tier: T4 GPU)
- Fast startup
- Popular and familiar to students
- Good integration with Google Drive
**Cons:**
- Requires Google account
- Free tier has usage limits
- Sessions disconnect after inactivity
- Can be slow during peak times
**Best For:** Students who need GPU, familiar Google ecosystem
### 3. Deepnote (Not Currently Configured)
**Pros:**
- Modern, polished interface
- Real-time collaboration
- Good for team projects
- Free tier available
- Better than Colab for some use cases
**Cons:**
- Less well-known than Colab
- Requires account
- Free tier limitations
**Best For:** Team collaboration, professional workflows
**How to Add:**
```yaml
# In site/_config.yml
launch_buttons:
deepnote_url: "https://deepnote.com"
```
### 4. JupyterHub (For Institutions)
**Pros:**
- Self-hosted control
- Institutional integration
- Can provide GPUs
- Scalable
**Cons:**
- Requires server infrastructure
- Setup complexity
- Maintenance overhead
**Best For:** Universities, institutions with IT support
### 5. Kaggle Notebooks
**Pros:**
- Free GPU access
- Popular ML community
- Good for competitions
**Cons:**
- Less flexible than Colab
- More focused on competitions
**Best For:** ML competitions, Kaggle users
## Recommendation for TinyTorch
### Current Setup is Good ✅
**MyBinder + Colab** covers most use cases:
- **MyBinder**: Zero-setup, no account needed, perfect for quick access
- **Colab**: GPU access when needed, familiar to students
### Optional Addition: Deepnote
If you want to add Deepnote for better collaboration:
1. **Add to config:**
```yaml
launch_buttons:
binderhub_url: "https://mybinder.org"
colab_url: "https://colab.research.google.com"
deepnote_url: "https://deepnote.com" # Add this
```
2. **Benefits:**
- Better collaboration features
- More modern interface
- Good for team projects
3. **Considerations:**
- Adds another option (might be confusing)
- Students need to create account
- Current setup already works well
## What About "Mariomi"?
I couldn't find a tool called "Mariomi" related to notebooks. You might be thinking of:
- **MyST** (MyST Markdown) - Already used by Jupyter Book (for documentation)
- **Miro** - Collaboration whiteboard (not for notebooks)
- **Deepnote** - Modern notebook platform (see above)
## My Recommendation
**Keep current setup (MyBinder + Colab)** because:
1. ✅ Already working
2. ✅ Covers all use cases
3. ✅ No additional complexity
4. ✅ Students familiar with Colab
5. ✅ MyBinder perfect for zero-setup access
**Optional:** Add Deepnote if you want better collaboration features, but it's not necessary.
## Testing Current Setup
To verify launch buttons work:
1. Build site: `cd site && make html`
2. Check notebook pages have launch buttons
3. Test Binder: Click "Launch Binder" → Should open MyBinder
4. Test Colab: Click "Launch Colab" → Should open in Colab

View File

@@ -0,0 +1,262 @@
# Community Benchmark & "Hello World" Experience Design
## Goal: First Success Moment
Create an immediate "wow, I did it!" moment where students:
1. ✅ Clone and setup TinyTorch
2. ✅ Run all tests (validate installation)
3. ✅ Run milestones (validate their implementation)
4. 🎉 Get benchmark score and join the community
## User Journey Flow
```
Clone & Setup
tito system doctor (verify setup)
tito milestone validate --all (run all milestones)
tito benchmark baseline (generate benchmark score)
🎉 "Welcome to TinyTorch Community!"
[Optional] Upload to leaderboard
```
## Implementation Design
### 1. Baseline Benchmark Command
**Command**: `tito benchmark baseline`
**What it does**:
- Runs a set of lightweight benchmarks (not full module 20)
- Tests basic operations: tensor creation, matrix multiplication, simple forward pass
- Measures: execution time, memory usage, basic throughput
- Generates JSON with results
**When to run**:
- After `tito system doctor` passes
- After `tito milestone validate --all` passes
- Can be run anytime to check baseline
### 2. Benchmark JSON Structure
```json
{
"timestamp": "2024-11-20T10:30:00Z",
"version": "1.0.0",
"system": {
"platform": "darwin",
"python_version": "3.11.0",
"numpy_version": "1.24.0",
"cpu_count": 8,
"memory_gb": 16
},
"baseline_benchmarks": {
"tensor_creation": {
"time_ms": 0.5,
"memory_mb": 0.1
},
"matrix_multiply": {
"time_ms": 2.3,
"throughput_ops_per_sec": 434.78
},
"simple_forward_pass": {
"time_ms": 5.2,
"memory_mb": 2.5
}
},
"milestone_status": {
"milestone_01_perceptron": "passed",
"milestone_02_xor": "passed",
"milestone_03_mlp": "passed"
},
"setup_validated": true,
"all_tests_passed": true
}
```
### 3. Upload/Submission System
**Command**: `tito benchmark submit [--public]`
**What it does**:
- Uploads benchmark JSON to server
- Gets back: community rank, percentile, badge
- Optional: make public on leaderboard
**Server endpoint** (to be created):
- `POST /api/benchmarks/submit`
- Returns: `{ "rank": 1234, "percentile": 75, "badge": "🚀 First Steps" }`
### 4. Community Leaderboard
**Features**:
- Public leaderboard (optional participation)
- Shows: rank, percentile, system info, timestamp
- Filterable by: system type, date, milestone status
- Badges: "🚀 First Steps", "⚡ Fast Setup", "🏆 All Milestones"
### 5. "Hello World" Experience
**After `tito benchmark baseline`**:
```
🎉 Congratulations! You've successfully set up TinyTorch!
📊 Your Baseline Performance:
• Tensor Operations: ⚡ Fast (0.5ms)
• Matrix Multiply: ⚡ Fast (2.3ms)
• Forward Pass: ⚡ Fast (5.2ms)
✅ Milestones Validated: 3/6 passed
🌍 Join the Community:
Run 'tito benchmark submit' to share your results
and see how you compare to others worldwide!
📈 Your Score: 85/100
You're in the top 25% of TinyTorch users!
🚀 Next Steps:
• Continue building modules
• Run 'tito benchmark baseline' anytime
• Complete all milestones for full score
```
## Implementation Steps
### Phase 1: Baseline Benchmark (Core)
1. **Create `tito/commands/benchmark.py`**:
- `tito benchmark baseline` - Run benchmarks, generate JSON
- `tito benchmark submit` - Upload to server (optional)
2. **Benchmark Suite**:
- Lightweight tests (don't require all modules)
- Basic tensor operations
- Simple forward pass
- Memory profiling
3. **JSON Generation**:
- Save to `benchmarks/baseline_YYYYMMDD_HHMMSS.json`
- Include system info, benchmark results, milestone status
### Phase 2: Server Integration
1. **API Endpoint**:
- Simple REST API
- Accepts benchmark JSON
- Returns rank/percentile/badge
- Stores in database
2. **Leaderboard**:
- Public web page
- Shows rankings
- Filterable/searchable
### Phase 3: Community Features
1. **Badges**:
- "🚀 First Steps" - Completed baseline
- "⚡ Fast Setup" - Top 10% performance
- "🏆 All Milestones" - All milestones passed
- "🌍 Community Member" - Submitted to leaderboard
2. **Sharing**:
- Generate shareable image/card
- "I just set up TinyTorch! Score: 85/100"
- Link to leaderboard
## Technical Considerations
### Benchmark Design
**Keep it lightweight**:
- Don't require all modules
- Use basic operations only
- Fast execution (< 30 seconds)
- Works after setup + milestone validation
**What to benchmark**:
- Tensor creation speed
- Matrix multiplication throughput
- Simple forward pass (2-layer network)
- Memory efficiency
- Basic autograd operations
### Privacy & Opt-in
- **Default**: Benchmarks saved locally only
- **Optional**: `--public` flag to share
- **Anonymized**: System info only (no personal data)
- **Consent**: Clear messaging about what's shared
### Server Architecture
**Simple approach**:
- Static JSON file storage (GitHub Pages?)
- Or simple API (Flask/FastAPI)
- Database: SQLite or PostgreSQL
- Leaderboard: Static site generator
**More advanced**:
- Real-time leaderboard
- User accounts (optional)
- Historical tracking
- Regional comparisons
## User Experience Flow
### First Time Setup
```bash
# 1. Clone and setup
git clone https://github.com/mlsysbook/TinyTorch.git
cd TinyTorch
./setup-environment.sh
source activate.sh
# 2. Verify setup
tito system doctor
# ✅ All checks passed!
# 3. Run milestones (if modules completed)
tito milestone validate --all
# ✅ Milestone 01: Perceptron - PASSED
# ✅ Milestone 02: XOR - PASSED
# ✅ Milestone 03: MLP - PASSED
# 4. Generate baseline benchmark
tito benchmark baseline
# 🎉 Congratulations! You've successfully set up TinyTorch!
# 📊 Your Baseline Performance: 85/100
# 🌍 Run 'tito benchmark submit' to join the community!
# 5. (Optional) Submit to leaderboard
tito benchmark submit --public
# ✅ Submitted! You're rank #1234 (top 25%)
# 🔗 View leaderboard: https://tinytorch.ai/leaderboard
```
## Benefits
1. **Immediate Gratification**: "I did it!" moment
2. **Community Feeling**: Part of something bigger
3. **Motivation**: See how they compare
4. **Validation**: Confirms setup worked
5. **Progress Tracking**: Can re-run anytime
## Next Steps
1. Design benchmark suite (what to test)
2. Implement `tito benchmark baseline` command
3. Create JSON schema
4. Design server API (or use GitHub Pages)
5. Build leaderboard page
6. Add badges/sharing features
This creates a "hello world" experience that makes students feel successful and part of the community immediately!

View File

@@ -0,0 +1,332 @@
# Community Data Collection Design
## Data We Collect (Privacy-Respecting)
### Required Fields
- **Country**: Geographic location (country-level only)
- **Setup Verified**: Confirmation that setup works
### Optional Fields (User Can Skip)
- **School/Institution**: University, bootcamp, or organization name
- **Course Type**: How they're using TinyTorch
- Self-paced learning
- University course
- Bootcamp/training
- Research project
- Industry training
- **System Type**: Hardware/platform
- Apple Silicon
- Linux x86
- Windows
- Cloud (Colab/Binder)
- **Experience Level**: (Optional)
- Beginner
- Intermediate
- Advanced
### What We DON'T Collect
- ❌ Personal name
- ❌ Email address
- ❌ Exact location (city/coordinates)
- ❌ IP address
- ❌ Any personally identifiable information
## Data Structure
### Submission JSON
```json
{
"anonymous_id": "abc123...", // Generated hash
"timestamp": "2024-11-20T10:30:00Z",
"location": {
"country": "United States" // Required
},
"institution": {
"name": "Harvard University", // Optional
"type": "university" // Optional: university, bootcamp, company, self-paced
},
"context": {
"course_type": "university_course", // Optional
"experience_level": "intermediate" // Optional
},
"system": {
"type": "Apple Silicon", // Optional
"platform": "darwin",
"python_version": "3.11.0"
},
"progress": {
"setup_verified": true,
"milestones_passed": 0, // Will update later
"modules_completed": 0 // Will update later
}
}
```
## Collection Flow
### Interactive Prompt
```bash
tito community join
🌍 Join the TinyTorch Global Community
This will add your location to the public community map.
All information is optional and completely anonymized.
📍 Country: [Auto-detected: United States]
(Press Enter to use detected, or type different country)
🏫 School/Institution (optional):
Examples: "Harvard University", "Stanford", "Self-paced"
[Press Enter to skip]
📚 Course Type (optional):
[1] Self-paced learning
[2] University course
[3] Bootcamp/training
[4] Research project
[5] Industry training
[6] Skip
Choose [1-6]:
💻 System Type (optional):
[Auto-detected: Apple Silicon]
[Press Enter to use detected, or type different]
🎓 Experience Level (optional):
[1] Beginner
[2] Intermediate
[3] Advanced
[4] Skip
Choose [1-4]:
📊 What will be shared:
• Country: United States ✅
• Institution: Harvard University ✅
• Course Type: University course ✅
• System Type: Apple Silicon ✅
• No personal information ✅
🔒 Privacy: Completely anonymized, country-level location only
Continue? [y/N]: y
✅ You've joined the TinyTorch Community!
📍 Location: United States
🏫 Institution: Harvard University
🌍 View map: https://tinytorch.ai/community
🎖️ You're builder #1,234 on the global map!
💡 Your institution will appear on the map (if provided)
```
## Map Visualization Features
### What the Map Shows
**Country View:**
- Dots/countries with builder counts
- "1,234 builders in 45 countries"
**Institution View** (Optional Filter):
- "Builders from 234 institutions"
- Top institutions by builder count
- "Harvard University: 15 builders"
- "Stanford: 12 builders"
- "Self-paced: 456 builders"
**Course Type Breakdown:**
- "University courses: 234"
- "Self-paced: 456"
- "Bootcamps: 89"
- "Research: 123"
**Diversity Stats:**
- "Builders from 45 countries"
- "234 institutions represented"
- "5 course types"
- "Diverse experience levels"
## Privacy Considerations
### Institution Privacy
**Options:**
1. **Show institution names** (if provided)
- Pros: More engaging, shows diversity
- Cons: Might identify users in small programs
2. **Show institution counts only**
- Pros: More private
- Cons: Less engaging
3. **Hybrid approach** (Recommended)
- Show institution names if ≥3 builders from that institution
- Otherwise: "Other institutions: 5 builders"
- Protects privacy while showing diversity
### Consent Flow
**Clear messaging:**
```
⚠️ Institution Information
If you provide your school/institution name, it may appear on the public map.
🔒 Privacy Protection:
• Institution names only shown if ≥3 builders from that institution
• No personal names or identifiers
• Completely anonymized
Provide institution? [y/N]:
```
## Map Features
### Interactive Map
**Country Level:**
- Click country → See stats:
- "United States: 456 builders"
- "Top institutions: Harvard (15), Stanford (12), MIT (10)"
- "Course types: University (234), Self-paced (189)"
**Institution Filter:**
- Filter by institution type
- Show: Universities, Bootcamps, Self-paced, etc.
- See geographic distribution
**Course Type View:**
- Color-code by course type
- Show: "Where are university students?"
- Show: "Where are self-paced learners?"
### Stats Dashboard
```
🌍 TinyTorch Community
📊 Global Stats:
• 1,234 builders worldwide
• 45 countries
• 234 institutions
• 5 course types
🏫 Top Institutions:
1. Harvard University: 15 builders
2. Stanford: 12 builders
3. MIT: 10 builders
4. Self-paced: 456 builders
...
🌎 Geographic Diversity:
• United States: 456 builders
• India: 234 builders
• United Kingdom: 123 builders
...
📚 Course Types:
• Self-paced: 456 (37%)
• University: 234 (19%)
• Bootcamp: 89 (7%)
...
```
## Benefits of Collecting This Data
### For Community
- **Visual diversity**: See global reach
- **Institutional connections**: "Wow, people from my school!"
- **Course type insights**: Understand how TinyTorch is used
- **Motivation**: "There are builders from 234 institutions!"
### For Users
- **Representation**: "I'm representing my school!"
- **Connection**: Find others from same institution
- **Pride**: "My institution is on the map!"
### For Project
- **Adoption tracking**: See where TinyTorch is used
- **Diversity metrics**: Geographic and institutional diversity
- **Success stories**: "Used in 234 institutions worldwide"
## Implementation
### Data Collection
**Command**: `tito community join`
**Flow:**
1. Auto-detect country (using system locale or geolocation API)
2. Ask for institution (optional)
3. Ask for course type (optional)
4. Auto-detect system type
5. Ask for experience level (optional)
6. Show summary
7. Get consent
8. Generate submission
### Privacy Protection
**Institution Anonymization:**
- If <3 builders from institution Show as "Other institutions"
- If 3 builders Show institution name
- Protects privacy while showing diversity
**Data Storage:**
- Anonymous ID (hash, not personal)
- No personal identifiers
- Country-level only (not city)
- Optional fields can be skipped
## Recommended Fields
### Required
- Country
### Highly Recommended (Optional)
- Institution/School name
- Course type
### Nice to Have (Optional)
- System type (auto-detected)
- Experience level
- Milestone progress (updates later)
### Skip
- Personal name
- Email
- Exact location
- Any PII
## Example Map Entry
**What users see:**
```
📍 United States
• 456 builders
• Top institutions: Harvard (15), Stanford (12), MIT (10)
• Course types: University (234), Self-paced (189)
```
**What gets stored:**
```json
{
"country": "United States",
"institution": "Harvard University",
"course_type": "university_course",
"anonymous_id": "abc123..."
}
```
This creates a rich, engaging community map while respecting privacy! 🌍✨

View File

@@ -0,0 +1,299 @@
# Community Building Expert Recommendation for TinyTorch
## Core Principles
### 1. **Low Barrier to Entry** ✅
- Make it **opt-in**, not required
- Default: benchmarks saved locally only
- No account creation needed initially
- Can participate anonymously
### 2. **Early Wins & Celebration** 🎉
- Immediate "I did it!" moment after setup
- Celebrate small wins (setup, first milestone)
- Show progress, not just final scores
- Make it feel like joining a community, not a competition
### 3. **Privacy-First** 🔒
- **Default**: Everything local, nothing shared
- **Opt-in sharing**: Clear consent for public leaderboard
- **Anonymized**: System specs only, no personal data
- **Institutional friendly**: Works for classroom use
### 4. **Progressive Engagement** 📈
- Level 1: Local benchmark (everyone can do)
- Level 2: Share anonymously (low commitment)
- Level 3: Public leaderboard (for those who want it)
- Level 4: Badges/achievements (long-term engagement)
### 5. **Inclusive, Not Exclusive** 🌍
- Don't make it feel competitive
- Focus on "you're part of something bigger"
- Celebrate participation, not just top performers
- Show diversity (different systems, different progress levels)
## Recommended Design
### Phase 1: Local Celebration (Everyone)
**After `tito benchmark baseline`:**
```
🎉 Welcome to the TinyTorch Community!
✅ Setup Verified
✅ Milestones Validated: 3/6
📊 Baseline Score: 85/100
🌍 You're now part of a global community of ML systems builders!
💡 Tip: Run 'tito benchmark submit' to see how you compare
(completely optional, all data stays local by default)
```
**Key**: Celebrate success, mention community, but don't pressure sharing.
### Phase 2: Anonymous Comparison (Low Commitment)
**After `tito benchmark submit` (anonymous mode):**
```
✅ Benchmark submitted anonymously!
📊 Your Performance:
• Score: 85/100
• Percentile: Top 25%
• System: Similar to 1,234 other users
🎯 You're doing great! Keep building!
💡 Run 'tito benchmark baseline' anytime to track your progress
```
**Key**: Show comparison without requiring identity.
### Phase 3: Public Leaderboard (Opt-in)
**After `tito benchmark submit --public`:**
```
✅ Added to public leaderboard!
🏆 Your Rank: #1,234 (Top 25%)
🌍 View leaderboard: https://tinytorch.ai/leaderboard
🎖️ Badge Earned: "🚀 First Steps"
💡 Share your achievement: [Generate share card]
```
**Key**: Make sharing optional and rewarding.
## Implementation Strategy
### 1. Benchmark Command Structure
```bash
# Generate baseline (always local)
tito benchmark baseline
# → Creates: benchmarks/baseline_TIMESTAMP.json
# → Shows celebration message
# → No network calls
# Submit anonymously (low commitment)
tito benchmark submit
# → Uploads anonymized data
# → Gets back: percentile, comparison stats
# → No personal info shared
# Submit publicly (opt-in)
tito benchmark submit --public
# → Adds to leaderboard
# → Gets rank, badge
# → Can share achievement
```
### 2. Privacy Model
**Three Tiers:**
1. **Local Only** (Default)
- Benchmarks saved to `benchmarks/` directory
- No network calls
- Complete privacy
2. **Anonymous Submission**
- Uploads: system specs, benchmark scores, milestone status
- No personal identifiers
- Gets back: percentile, comparison stats
- Can't be traced back to user
3. **Public Leaderboard** (Opt-in)
- Requires `--public` flag
- Can optionally add: GitHub username, location (country)
- Shows on public leaderboard
- Can generate shareable card
### 3. Leaderboard Design
**Features:**
- **Anonymized by default**: Show system specs, not names
- **Filterable**: By system type, date, milestone status
- **Inclusive**: Show all participants, not just top 10
- **Progress-focused**: Show "milestones completed" not just "fastest"
- **Diverse**: Highlight different system types, not just fastest
**Example Leaderboard Entry:**
```
Rank | System Type | Milestones | Score | Date
-----|------------------|------------|-------|----------
#1 | Apple Silicon | 6/6 ✅ | 95 | Nov 2024
#234 | Linux x86 | 3/6 🚧 | 85 | Nov 2024
#567 | Windows | 1/6 🚧 | 70 | Nov 2024
```
### 4. Badge System
**Achievement Badges** (not competitive):
- 🚀 **First Steps**: Completed baseline benchmark
-**Fast Setup**: Setup completed quickly
- 🏆 **Milestone Master**: All 6 milestones passed
- 🌍 **Community Member**: Submitted to leaderboard
- 📈 **Progress Maker**: Improved score over time
- 🎓 **Module Master**: Completed all 20 modules
**Philosophy**: Celebrate progress, not competition.
### 5. Server Architecture
**Simple & Scalable:**
**Option A: GitHub Pages + GitHub API** (Recommended)
- Store submissions as JSON files in `gh-pages` branch
- Use GitHub API for submissions
- Static leaderboard page
- Free, reliable, no server maintenance
**Option B: Simple API** (Future)
- Flask/FastAPI endpoint
- SQLite/PostgreSQL database
- Real-time leaderboard
- More features, but requires hosting
**Recommendation**: Start with GitHub Pages, scale later if needed.
## User Experience Flow
### First Time User
```bash
# 1. Setup
git clone ...
./setup-environment.sh
tito system doctor # ✅ All checks passed!
# 2. Run milestones (if completed)
tito milestone validate --all
# ✅ Milestone 01: PASSED
# ✅ Milestone 02: PASSED
# ✅ Milestone 03: PASSED
# 3. Generate baseline
tito benchmark baseline
# 🎉 Welcome to the TinyTorch Community!
# ✅ Setup Verified
# ✅ Milestones Validated: 3/6
# 📊 Baseline Score: 85/100
#
# 🌍 You're now part of a global community of ML systems builders!
#
# 💡 Tip: Run 'tito benchmark submit' to see how you compare
# (completely optional, all data stays local by default)
# 4. (Optional) See comparison
tito benchmark submit
# ✅ Benchmark submitted anonymously!
# 📊 Your Performance:
# • Score: 85/100
# • Percentile: Top 25%
# • Similar systems: 1,234 users
#
# 🎯 You're doing great! Keep building!
# 5. (Optional) Join public leaderboard
tito benchmark submit --public
# ✅ Added to public leaderboard!
# 🏆 Rank: #1,234 (Top 25%)
# 🎖️ Badge: "🚀 First Steps"
# 🔗 View: https://tinytorch.ai/leaderboard
```
## Key Recommendations
### ✅ DO:
1. **Make it opt-in**: Default to local-only
2. **Celebrate participation**: Not just winners
3. **Show progress**: Milestones completed, not just speed
4. **Respect privacy**: Anonymized by default
5. **Keep it simple**: Start with GitHub Pages
6. **Focus on community**: "You're part of something bigger"
7. **Make it inclusive**: All skill levels welcome
### ❌ DON'T:
1. **Don't make it required**: Some students/institutions can't share
2. **Don't make it competitive**: Focus on learning, not winning
3. **Don't collect personal data**: System specs only
4. **Don't overcomplicate**: Start simple, iterate
5. **Don't exclude anyone**: All systems, all progress levels
## Implementation Priority
### Phase 1: MVP (Week 1)
-`tito benchmark baseline` command
- ✅ Local JSON generation
- ✅ Celebration message
- ✅ Basic benchmark suite
### Phase 2: Community (Week 2)
-`tito benchmark submit` (anonymous)
- ✅ GitHub Pages leaderboard
- ✅ Percentile calculation
- ✅ Badge system
### Phase 3: Engagement (Week 3)
- ✅ Public leaderboard (opt-in)
- ✅ Shareable cards
- ✅ Progress tracking
- ✅ Achievement badges
## Success Metrics
**Community Health:**
- Number of baseline benchmarks generated (local)
- Number of anonymous submissions
- Number of public leaderboard entries
- Diversity of systems represented
- Milestone completion rates
**Not Success Metrics:**
- ❌ Highest scores (too competitive)
- ❌ Fastest times (excludes slower systems)
- ❌ Leaderboard rank (creates pressure)
## Final Recommendation
**Start Simple, Build Community:**
1. **Local celebration first** - Everyone gets the "wow" moment
2. **Anonymous comparison** - Low commitment, high value
3. **Public leaderboard** - Opt-in for those who want it
4. **Focus on progress** - Celebrate milestones, not speed
5. **Privacy-first** - Default to local, opt-in to share
**The goal**: Make students feel part of a global community of ML systems builders, not competitors.
This creates a welcoming, inclusive community that celebrates learning and progress! 🎉

View File

@@ -0,0 +1,371 @@
# Community Map Vision: "We Are TinyTorch"
## The Vision
A **world map** that shows where TinyTorch builders are located, creating a visual sense of global community. When students complete milestones and submit, they see:
> "Wow, there's a community of people building ML systems all over the world!"
## Design Concept
### The Map Experience
**After `tito milestone validate --all` passes:**
```
🎉 Congratulations! All Milestones Validated!
✅ Setup Complete
✅ All Tests Passing
✅ All Milestones Passed: 6/6
🌍 Join the Global TinyTorch Community:
Run 'tito community submit' to add your location to the map
and see builders from around the world!
(Completely optional - only shares country, not exact location)
```
**After `tito community submit`:**
```
✅ You've joined the TinyTorch Community!
📍 Your Location: United States
🌍 View the map: https://tinytorch.ai/community
🎖️ You're builder #1,234 on the global map!
💡 See where other TinyTorch builders are located worldwide
```
### The Map Visualization
**Features:**
- **World map** with dots/countries highlighted
- **Interactive**: Click to see stats per country
- **Live counter**: "1,234 builders worldwide"
- **Diversity showcase**: "Builders in 45 countries"
- **Recent additions**: "5 new builders this week"
**Privacy:**
- **Country-level only** (not city/coordinates)
- **Opt-in**: Must explicitly submit
- **Anonymized**: No personal identifiers
- **Optional**: Can participate without location
## Implementation Design
### 1. Submission Flow
**Command**: `tito community submit [--country COUNTRY]`
**What it does:**
- Detects country (or asks user)
- Validates milestones passed
- Submits anonymized data:
```json
{
"timestamp": "2024-11-20T10:30:00Z",
"country": "United States", // Country only, not city
"milestones_passed": 6,
"system_type": "Apple Silicon",
"anonymous_id": "abc123..." // Generated hash, not personal
}
```
**Validation:**
- Checks: `tito system doctor` passed
- Checks: `tito milestone validate --all` passed
- Only submits if everything validated
### 2. Map Visualization
**Technology Options:**
**Option A: Simple Static Map** (Recommended for MVP)
- GitHub Pages + Leaflet.js or Mapbox
- JSON file with submissions
- Static map that updates on deploy
- Free, simple, works immediately
**Option B: Interactive Map**
- Leaflet.js or Mapbox GL
- Real-time updates
- Click countries for stats
- More engaging, requires API
**Option C: GitHub Pages + GeoJSON**
- Store submissions as GeoJSON
- Use GitHub's map rendering
- Simple, free, GitHub-native
**Recommendation**: Start with Option A (Leaflet.js), upgrade to Option B later.
### 3. Data Structure
**Submissions JSON** (`community/submissions.json`):
```json
{
"total_builders": 1234,
"countries": {
"United States": 456,
"India": 234,
"United Kingdom": 123,
"Germany": 89,
...
},
"recent_submissions": [
{
"timestamp": "2024-11-20T10:30:00Z",
"country": "United States",
"milestones": 6,
"system": "Apple Silicon"
},
...
],
"stats": {
"total_countries": 45,
"this_week": 23,
"this_month": 156
}
}
```
### 4. Map Page Design
**URL**: `https://tinytorch.ai/community` or `/community-map`
**Features:**
- **World map** with country highlights
- **Counter**: "1,234 builders worldwide"
- **Country list**: "Builders in 45 countries"
- **Recent activity**: "5 new builders this week"
- **Call to action**: "Join the map → `tito community submit`"
**Visual Design:**
- Clean, modern map
- Dots or country shading
- Hover shows country stats
- Mobile-friendly
- Fast loading
## User Journey
### Complete Flow
```bash
# 1. Setup and validate
git clone ...
./setup-environment.sh
tito system doctor # ✅ All checks passed
tito milestone validate --all # ✅ All 6 milestones passed
# 2. Join community
tito community submit
# Detecting your location...
# Country: United States
#
# ✅ You've joined the TinyTorch Community!
#
# 🌍 View the map: https://tinytorch.ai/community
# 🎖️ You're builder #1,234 on the global map!
#
# 💡 See where other TinyTorch builders are located worldwide
# 3. View the map (opens in browser)
# Shows: World map with dots, your country highlighted
# Shows: "1,234 builders in 45 countries"
# Shows: Recent additions
```
## Privacy & Consent
### Privacy Model
**What's Shared** (with consent):
- ✅ Country (not city/coordinates)
- ✅ System type (Apple Silicon, Linux x86, etc.)
- ✅ Milestone count (how many passed)
- ✅ Timestamp (when submitted)
**What's NOT Shared**:
- ❌ Exact location
- ❌ Personal information
- ❌ IP address
- ❌ Email/name
- ❌ Institution
**Consent Flow:**
```
tito community submit
⚠️ This will add your location to the public community map.
📊 What will be shared:
• Country: United States
• System type: Apple Silicon
• Milestones passed: 6
• No personal information
🔒 Privacy: Only country-level location, completely anonymized
Continue? [y/N]: y
✅ Submitted! View map: https://tinytorch.ai/community
```
## Implementation Steps
### Phase 1: MVP (Simple Map)
1. **Create `tito community submit` command**
- Detect/ask for country
- Validate milestones passed
- Generate submission JSON
- Save locally + optionally upload
2. **Create map page** (`site/community-map.md`)
- Static HTML with Leaflet.js
- Reads from `community/submissions.json`
- Shows world map with countries
- Displays stats
3. **Submission storage**
- GitHub Pages: `community/submissions.json`
- Or: Simple API endpoint
- Updates on each submission
### Phase 2: Enhanced (Interactive Map)
1. **Interactive features**
- Click countries for details
- Filter by system type
- Timeline view (growth over time)
- Recent submissions feed
2. **Engagement features**
- "Builder of the week" (random selection)
- Country leaderboards (optional)
- Milestone completion stats
### Phase 3: Community Features
1. **Social elements**
- Share: "I'm builder #1,234 on the TinyTorch map!"
- Badges: "🌍 Global Builder"
- Stories: "Builders from 45 countries"
2. **Analytics**
- Growth over time
- Geographic distribution
- System diversity
- Milestone completion rates
## Technical Implementation
### Simple Approach (GitHub Pages)
**File Structure:**
```
community/
├── submissions.json # All submissions
├── map.html # Map visualization page
└── submit.py # Submission script (optional API)
```
**Map Page** (`site/community-map.md` or HTML):
```html
<!-- Leaflet.js map -->
<div id="community-map"></div>
<!-- Stats -->
<div>
<h2>🌍 TinyTorch Community</h2>
<p>1,234 builders in 45 countries</p>
<p>5 new builders this week</p>
</div>
<!-- Call to action -->
<p>Join the map: <code>tito community submit</code></p>
```
**Submission Process:**
1. User runs `tito community submit`
2. Generates submission JSON
3. Option A: User manually PRs to `community/submissions.json`
4. Option B: API endpoint accepts submissions
5. Map page reads JSON and renders
### API Approach (Future)
**Endpoint**: `POST /api/community/submit`
- Accepts submission JSON
- Validates (check milestones)
- Stores in database
- Returns success + map URL
**Map Page**:
- Fetches submissions from API
- Renders interactive map
- Updates in real-time
## Success Metrics
**Community Growth:**
- Number of countries represented
- Total builders on map
- Growth rate (new builders/week)
- Geographic diversity
**Engagement:**
- Map page views
- Submission rate (after milestones pass)
- Return visits to map
- Social shares
## The "Wow" Moment
**When someone views the map:**
```
🌍 TinyTorch Community Map
[Interactive world map showing dots/countries]
📊 Stats:
• 1,234 builders worldwide
• 45 countries represented
• 5 new builders this week
• Top countries: US (456), India (234), UK (123)
🎯 Recent Activity:
• Builder from Germany just joined!
• Builder from Japan completed all milestones!
• Builder from Brazil reached milestone 3!
💡 Join the map: Run 'tito community submit' after completing milestones
```
**The Impact:**
- Visual proof of global community
- Sense of belonging
- Motivation to continue
- Pride in being part of something bigger
## Recommendation
**Start Simple, Build Community:**
1. **MVP**: Simple map with country dots
2. **Privacy**: Country-level only, opt-in
3. **Validation**: Only after milestones pass
4. **Visual**: Make it beautiful and engaging
5. **Growth**: Let it populate organically
**The goal**: Create a visual representation that makes students feel part of a global movement of ML systems builders!
This map becomes a symbol of the TinyTorch community - showing that people all over the world are building ML systems from scratch together. 🌍✨

144
binder/LAUNCH_READINESS.md Normal file
View File

@@ -0,0 +1,144 @@
# Launch Readiness Checklist
## ✅ Assignment Process - COMPLETE
### Dynamic Assignment Generation ✅
- **Source**: `modules/*/.*_dev.py` (Python files)
- **Command**: `tito nbgrader generate MODULE`
- **Output**: `assignments/source/MODULE/MODULE.ipynb`
- **Status**: Fully functional, dynamically generated
### Assignment Release ✅
- **Command**: `tito nbgrader release MODULE`
- **Output**: `assignments/release/MODULE/MODULE.ipynb` (solutions removed)
- **Status**: Ready for student distribution
### Auto-Grading ✅
- **Command**: `tito nbgrader autograde MODULE`
- **Status**: NBGrader integration complete
## ✅ Site Build Integration - COMPLETE
### Automatic Notebook Preparation ✅
- **Script**: `site/prepare_notebooks.sh`
- **Integration**: Runs automatically during `make html`
- **Process**: Copies assignment notebooks to `site/chapters/modules/`
- **Result**: Launch buttons appear on notebook pages
### Build Commands ✅
- `make html` - Includes notebook preparation
- `make pdf` - Includes notebook preparation
- `make pdf-simple` - Includes notebook preparation
## ✅ Paper Documentation Sync - COMPLETE
### Files Created ✅
- `INSTRUCTOR.md` - ✅ Created (matches paper reference)
- `MAINTENANCE.md` - ✅ Created (support commitment through 2027)
- `TA_GUIDE.md` - ✅ Created (common errors, debugging strategies)
- `docs/TEAM_ONBOARDING.md` - ✅ Created (Model 3 documentation)
- `site/usage-paths/team-onboarding.md` - ✅ Created (site version)
### Files Verified ✅
- `CONTRIBUTING.md` - ✅ Exists and matches paper description
- `docs/INSTRUCTOR_GUIDE.md` - ✅ Exists (source for INSTRUCTOR.md)
### Content Updates ✅
- Module numbers: All updated to `01_tensor` (not `01_setup`)
- Schedule: Updated to match current 20-module structure
- Three integration models: All documented
- Deployment environments: All documented
## ✅ Site Navigation - COMPLETE
### Getting Started Section ✅
- Quick Start Guide
- Student Workflow
- For Instructors
- **Team Onboarding** (newly added)
### All Three Integration Models Accessible ✅
1. Self-Paced Learning - Quick Start Guide
2. Institutional Integration - For Instructors
3. Team Onboarding - Team Onboarding page
## ✅ Binder/Colab Setup - COMPLETE
### Binder Configuration ✅
- `binder/requirements.txt` - Dependencies
- `binder/postBuild` - Installs TinyTorch
- Launch buttons configured in `site/_config.yml`
### Colab Configuration ✅
- Launch buttons configured
- Repository URL correct
- Documentation complete
## 🎯 Pre-Launch Checklist
### Required Actions
1. **Generate Assignment Notebooks**:
```bash
tito nbgrader generate --all
```
This creates notebooks for all modules in `assignments/source/`
2. **Test Site Build**:
```bash
cd site
make html
```
Verify:
- Notebooks are prepared automatically
- Launch buttons appear on notebook pages
- Site builds without errors
3. **Test Binder**:
- Visit: https://mybinder.org/v2/gh/mlsysbook/TinyTorch/main
- Verify build completes (2-5 minutes)
- Verify TinyTorch imports correctly
- Verify modules are accessible
4. **Test Colab**:
- Test with sample notebook
- Verify dependencies install
- Verify notebooks run correctly
5. **Verify Documentation Links**:
- Check all site navigation links work
- Verify INSTRUCTOR.md accessible
- Verify TA_GUIDE.md accessible
- Verify Team Onboarding page works
### Optional Enhancements
- Add sample solutions to INSTRUCTOR.md (if not already included)
- Create common errors FAQ page
- Add deployment guide consolidating JupyterHub/Colab/Local
- Test with actual assignment notebooks
## 📊 Final Status
| Component | Status | Ready for Launch |
|-----------|--------|-----------------|
| Assignment Generation | ✅ Complete | ✅ Yes |
| Site Build Integration | ✅ Complete | ✅ Yes |
| Paper Documentation | ✅ Complete | ✅ Yes |
| Site Navigation | ✅ Complete | ✅ Yes |
| Binder Setup | ✅ Complete | ⏳ Test needed |
| Colab Setup | ✅ Complete | ⏳ Test needed |
## 🚀 Launch Steps
1. Generate assignment notebooks: `tito nbgrader generate --all`
2. Build site: `cd site && make html`
3. Test Binder: Visit Binder URL
4. Test Colab: Test with sample notebook
5. Verify all links work
6. **LAUNCH!** 🎉
---
**Everything is synced and ready!** Just need to generate notebooks and test launch buttons.

View File

@@ -0,0 +1,117 @@
# Marimo Integration for TinyTorch
## What is Marimo?
[Marimo](https://marimo.io/) is a modern, reactive Python notebook platform that:
- **Stores notebooks as pure Python** (`.py` files) - Git-friendly!
- **Reactive execution** - Cells update automatically when dependencies change
- **Interactive elements** - Built-in widgets, sliders, dataframes
- **AI-native** - Built-in AI assistance and copilots
- **Share as apps** - Export to HTML or serve as web apps
- **Reproducible** - Deterministic execution, no hidden state
## Why Marimo for TinyTorch?
**Perfect Fit:**
1.**Git-friendly** - Notebooks stored as `.py` files (matches TinyTorch's Python-first approach!)
2.**Reactive** - Great for teaching (students see changes propagate automatically)
3.**Educational** - Used by Stanford, UC Berkeley, Princeton, etc.
4.**Modern** - Better than Jupyter for many use cases
5.**Open source** - Free and community-driven
## Marimo vs Current Options
| Feature | MyBinder | Colab | Marimo |
|---------|----------|-------|--------|
| Git-friendly | ❌ (.ipynb) | ❌ (.ipynb) | ✅ (.py files) |
| Reactive | ❌ | ❌ | ✅ |
| AI assistance | ❌ | ✅ | ✅ |
| Free | ✅ | ✅ | ✅ |
| Zero-setup | ✅ | ⚠️ (needs account) | ✅ |
| GPU access | ❌ | ✅ | ⚠️ (limited) |
## Integration Options
### Option 1: Marimo Molab Badges
Marimo provides "molab" badges that can open notebooks directly from GitHub:
```
https://marimo.app/molab?repo=mlsysbook/TinyTorch&path=path/to/notebook.py
```
**How it works:**
- Notebooks stored as `.py` files in repo
- Badge links to marimo's cloud service
- Opens notebook in marimo's online editor
- No local installation needed
### Option 2: Add to Launch Buttons
Jupyter Book doesn't natively support marimo launch buttons, but we can:
1. Add custom HTML/JavaScript to create marimo badges
2. Use marimo's badge generator
3. Add manual links in notebook pages
### Option 3: Convert Notebooks to Marimo Format
Since marimo uses `.py` files, we could:
1. Keep current `.ipynb` files for Jupyter/Colab/Binder
2. Generate `.py` versions for marimo
3. Add marimo badges alongside existing launch buttons
## Recommendation
**Add Marimo as an Option:**
1. **Keep current setup** (MyBinder + Colab) - they work well
2. **Add marimo badges** to notebook pages for students who want reactive notebooks
3. **Generate `.py` versions** of notebooks for marimo compatibility
**Benefits:**
- Students get choice of notebook platforms
- Marimo's reactive execution helps with learning
- Git-friendly format aligns with TinyTorch's Python-first approach
- Modern, educational tool used by top universities
## Implementation Steps
### Step 1: Generate Marimo-Compatible Notebooks
Since TinyTorch already uses Python-first development (`*_dev.py` files), we could:
- Convert assignment notebooks to marimo format
- Or create marimo-specific versions
### Step 2: Add Marimo Badges
Add to notebook pages:
```html
<a href="https://marimo.app/molab?repo=mlsysbook/TinyTorch&path=site/chapters/modules/01_tensor.py">
<img src="https://marimo.app/badge.svg" alt="Open in Marimo">
</a>
```
### Step 3: Document Marimo Usage
Add to student documentation:
- How to use marimo with TinyTorch
- Benefits of reactive notebooks
- Comparison with Jupyter/Colab
## Current Status
**Not yet integrated** - but marimo would be a great addition!
**Next steps if you want to add it:**
1. Test marimo with TinyTorch notebooks
2. Generate marimo-compatible `.py` files
3. Add badges to site pages
4. Update documentation
## Resources
- [Marimo Website](https://marimo.io/)
- [Marimo Docs](https://docs.marimo.io/)
- [Marimo Gallery](https://marimo.io/gallery)
- [Marimo Badge Generator](https://marimo.io/badge)

View File

@@ -0,0 +1,104 @@
# Marimo and NBGrader Compatibility
## Short Answer: ✅ No, Marimo badges won't break NBGrader
**Why:**
- Marimo badges are just **frontend UI elements** (JavaScript links)
- They don't modify notebook files
- NBGrader reads from actual `.ipynb` files, not from the website
- Badges just create links to open notebooks in Marimo's cloud service
## How It Works
### Marimo Badges (What We Added)
- **What they do**: Add a "🍃 Open in Marimo" link to notebook pages
- **What they don't do**: Modify notebook files or NBGrader metadata
- **Impact on NBGrader**: **None** - they're just links
### NBGrader Workflow
1. Instructors generate notebooks: `tito nbgrader generate MODULE`
2. NBGrader adds metadata to `.ipynb` files (grade_id, points, etc.)
3. Students work in notebooks (Jupyter, Colab, or Marimo)
4. Students submit notebooks back
5. NBGrader reads metadata from submitted `.ipynb` files
## Potential Considerations
### If Students Use Marimo to Edit Notebooks
**Scenario 1: Students open `.ipynb` in Marimo**
- ✅ Marimo can import Jupyter notebooks
- ✅ NBGrader metadata preserved (it's in the `.ipynb` file)
- ✅ Students submit `.ipynb` files back
-**No problem** - NBGrader works normally
**Scenario 2: Students convert to Marimo `.py` format**
- ⚠️ Marimo stores notebooks as `.py` files (not `.ipynb`)
- ⚠️ NBGrader metadata is in `.ipynb` format
- ⚠️ Converting to `.py` might lose NBGrader metadata
-**Solution**: Students should submit `.ipynb` files, not `.py` files
## Best Practice for Students
**For NBGrader assignments:**
1. Students can use Marimo to **view and learn** from notebooks
2. For **submissions**, students should work in `.ipynb` format (Jupyter/Colab)
3. Or convert marimo `.py` back to `.ipynb` before submitting
**For non-graded exploration:**
- Students can freely use Marimo's `.py` format
- Great for learning and experimentation
- No NBGrader concerns
## Recommendation
**Keep Marimo badges** - they're safe:
- ✅ Don't interfere with NBGrader
- ✅ Give students more options for learning
- ✅ Students can use Marimo for exploration
- ✅ For graded work, students use standard `.ipynb` workflow
**Add to student instructions:**
- "Marimo badges are for exploration and learning"
- "For NBGrader assignments, submit `.ipynb` files (not `.py` files)"
- "Marimo can import `.ipynb` files and preserve NBGrader metadata"
## Technical Details
### NBGrader Metadata Format
NBGrader stores metadata in notebook cell metadata:
```json
{
"nbgrader": {
"grade": true,
"grade_id": "tensor_memory",
"points": 2,
"schema_version": 3
}
}
```
### Marimo Format
Marimo stores notebooks as pure Python:
```python
# Cell 1
import numpy as np
# Cell 2
def memory_footprint(self):
return self.data.nbytes
```
**Conversion between formats:**
- `.ipynb``.py`: Possible, but NBGrader metadata might be lost
- `.py``.ipynb`: Possible, but NBGrader metadata won't be restored
## Conclusion
**Marimo badges are safe** - they don't break NBGrader
**Students can use Marimo** for learning and exploration
**For graded work**, students should use `.ipynb` format
**No changes needed** to NBGrader workflow
The badges are just convenient links - they don't interfere with the actual grading system!

80
binder/MARIMO_SETUP.md Normal file
View File

@@ -0,0 +1,80 @@
# Marimo Setup for TinyTorch - No Extra Setup Required! ✅
## Good News: No Extra Setup Needed!
Marimo integration is now **automatically added** to your site. Here's what was done:
## What Was Added
1. **Marimo Badge JavaScript** (`site/_static/marimo-badges.js`)
- Automatically adds "Open in Marimo" badges to notebook pages
- Works alongside existing Binder/Colab buttons
2. **JavaScript Integration**
- Added to `site/_config.yml` so it loads on all pages
- Automatically detects notebook pages
- Creates marimo badges dynamically
## How It Works
When students visit a notebook page:
1. They see existing launch buttons (Binder, Colab)
2. **New**: They also see "🍃 Open in Marimo" badge
3. Clicking opens the notebook in Marimo's cloud service (molab)
4. No account needed for basic use!
## Marimo URLs
Marimo badges use this format:
```
https://marimo.app/molab?repo=mlsysbook/TinyTorch&path=site/chapters/modules/MODULE_NAME.ipynb
```
**Note**: Marimo can work with `.ipynb` files, but ideally we'd convert to `.py` files for full marimo features.
## Testing
To test marimo integration:
1. **Build the site:**
```bash
cd site
make html
```
2. **Open a notebook page** (e.g., `_build/html/chapters/modules/01_tensor.html`)
3. **Look for the marimo badge** - should appear below Binder/Colab buttons
4. **Click "Open in Marimo"** - should open in marimo's cloud editor
## Optional: Convert Notebooks to Marimo Format
For full marimo features (reactive execution, etc.), you could:
1. **Convert `.ipynb` to marimo `.py` format:**
```bash
# Marimo can import Jupyter notebooks
marimo convert notebook.ipynb notebook.py
```
2. **Store marimo versions** in `site/chapters/modules/` as `.py` files
3. **Update marimo URLs** to point to `.py` files instead of `.ipynb`
But this is **optional** - marimo badges work with `.ipynb` files too!
## Current Status
✅ **Marimo badges added** - Will appear on notebook pages
✅ **No extra setup needed** - Just build the site
✅ **Works with existing notebooks** - Uses `.ipynb` files
## Next Steps
1. **Build site** to see marimo badges: `cd site && make html`
2. **Test badges** on notebook pages
3. **Optional**: Convert notebooks to marimo `.py` format for full features
That's it! Marimo integration is ready to go. 🎉

91
binder/MODULE_ORDER.md Normal file
View File

@@ -0,0 +1,91 @@
# TinyTorch Module Order Verification
## ✅ Correct Module Order (modules/ directory)
```
01_tensor - Foundation: N-dimensional arrays
02_activations - Non-linearity functions (ReLU, Sigmoid, Softmax)
03_layers - Neural network layers (Linear, Module base)
04_losses - Loss functions (MSE, CrossEntropy)
05_autograd - Automatic differentiation
06_optimizers - Optimization algorithms (SGD, Adam)
07_training - Training loops
08_dataloader - Data batching and pipelines
09_spatial - Convolutional operations
10_tokenization - Text tokenization
11_embeddings - Word embeddings
12_attention - Attention mechanisms
13_transformers - Transformer architecture
14_profiling - Performance profiling
15_quantization - Model quantization
16_compression - Model compression
17_memoization - KV caching
18_acceleration - Hardware acceleration
19_benchmarking - Performance benchmarking
20_capstone - Torch Olympics competition
```
## ⚠️ Issue Found: Assignments Directory Mismatch
**Current assignments/source/ structure:**
```
01_setup ❌ OUTDATED - Module 01 is now "tensor", not "setup"
02_tensor ✅ Correct
```
**Problem:** The `assignments/source/01_setup/` directory contains an old notebook from when Module 01 was "Setup". Module 01 is now "Tensor" (`modules/01_tensor/`).
## Impact on Binder/Colab
**No impact** - Binder setup doesn't depend on assignment notebooks. The `binder/` configuration:
- Installs TinyTorch package (`pip install -e .`)
- Provides JupyterLab environment
- Students can access any notebooks in the repository
However, for consistency and to avoid confusion:
- Old `01_setup` assignment should be removed or renamed
- Documentation references should point to `01_tensor` (already fixed)
## Module Tiers (from site/_toc.yml)
### 🏗️ Foundation Tier (01-07)
- 01 Tensor
- 02 Activations
- 03 Layers
- 04 Losses
- 05 Autograd
- 06 Optimizers
- 07 Training
### 🏛️ Architecture Tier (08-13)
- 08 DataLoader
- 09 Spatial (Convolutions)
- 10 Tokenization
- 11 Embeddings
- 12 Attention
- 13 Transformers
### ⏱️ Optimization Tier (14-19)
- 14 Profiling
- 15 Quantization
- 16 Compression
- 17 Memoization
- 18 Acceleration
- 19 Benchmarking
### 🏅 Capstone (20)
- 20 Capstone (Torch Olympics)
## Verification Status
**Modules directory**: Correct order (01-20)
**Documentation**: References updated to `01_tensor`
**Binder setup**: Not affected by assignment structure
⚠️ **Assignments**: Contains outdated `01_setup` (should be removed)
## Recommendations
1. **Remove old assignment**: Delete `assignments/source/01_setup/` and `assignments/release/01_setup/`
2. **Verify nbgrader**: Ensure nbgrader commands reference correct module numbers
3. **Update any remaining references**: Search for `01_setup` and update to `01_tensor`

View File

@@ -0,0 +1,89 @@
# Notebook Platform Recommendation
## Current Setup
- **MyBinder**: Zero-setup, no account needed
- **Google Colab**: GPU access, familiar interface
- **Marimo**: Modern reactive notebooks, Git-friendly
## Analysis: Do We Need All Three?
### Use Case: Viewing/Exploration Only
Since online notebooks are **only for viewing/exploration** (not actual work), we should consider:
**Option 1: Keep All Three**
- **Pros**:
- Students get choice
- Different platforms have different strengths
- Binder: Zero-setup, no account
- Colab: GPU access for exploration
- Marimo: Modern, educational
- **Cons**:
- Might be confusing (too many options)
- More maintenance
**Option 2: Keep Just Binder** ✅ Recommended
- **Pros**:
- Simplest option (zero-setup, no account)
- Works for viewing/exploration
- Less confusing for students
- Easier maintenance
- **Cons**:
- No GPU access (but not needed for viewing)
- No Marimo features (but not needed for viewing)
**Option 3: Keep Binder + One Other**
- Binder + Colab: Covers zero-setup + GPU exploration
- Binder + Marimo: Covers zero-setup + modern interface
## Recommendation: Keep Just Binder ✅
**Reasoning:**
1. **Primary use case**: Viewing/exploration (not actual work)
2. **Binder is sufficient**: Zero-setup, no account, works for viewing
3. **Simpler is better**: Less confusion, easier maintenance
4. **Local is required anyway**: Students need local setup for real work
**What to remove:**
- Colab launch buttons (students can still use Colab if they want, just not prominently featured)
- Marimo badges (can add back later if there's demand)
**What to keep:**
- Binder launch buttons (zero-setup viewing)
- Clear messaging: "For viewing only - local setup required for full package"
## Alternative: Keep Binder + Colab
If you want GPU access for exploration:
- **Keep**: Binder (zero-setup) + Colab (GPU exploration)
- **Remove**: Marimo (newest, least familiar)
## Implementation
If we simplify to just Binder:
1. **Update `site/_config.yml`:**
```yaml
launch_buttons:
binderhub_url: "https://mybinder.org"
# Remove colab_url
```
2. **Remove Marimo JavaScript:**
- Remove `marimo-badges.js` from `extra_js`
- Or keep it but make it optional
3. **Update documentation:**
- Clarify that Binder is for viewing only
- Emphasize local setup requirement
## Final Recommendation
**Keep just Binder** because:
- ✅ Simplest option
- ✅ Zero-setup (no account needed)
- ✅ Sufficient for viewing/exploration
- ✅ Less confusing
- ✅ Students need local setup anyway for real work
**Optional**: Keep Colab if you want GPU access for exploration, but it's not essential since students need local setup for actual coursework.

105
binder/ONLINE_VS_LOCAL.md Normal file
View File

@@ -0,0 +1,105 @@
# Online Notebooks vs Local Setup
## Important Distinction
### Online Notebooks (Binder, Colab, Marimo)
**Purpose**: Viewing, learning, exploration
**What you CAN do:**
- ✅ View notebook content
- ✅ Read code and explanations
- ✅ Run basic code cells
- ✅ Learn from examples
**What you CANNOT do:**
- ❌ Import from `tinytorch.*` package (not installed)
- ❌ Run milestone validation scripts
- ❌ Use `tito` CLI commands
- ❌ Execute full experiments
- ❌ Export modules to package
- ❌ Complete the full development workflow
### Local Setup (Required)
**Purpose**: Full package, experiments, milestone validation
**What you CAN do:**
- ✅ Full `tinytorch.*` package available
- ✅ Run milestone validation scripts
- ✅ Use `tito` CLI commands (`tito module complete`, `tito milestone validate`)
- ✅ Execute complete experiments
- ✅ Export modules to package
- ✅ Full development workflow
## When to Use What
### Use Online Notebooks When:
- 📖 **Learning**: Reading through modules to understand concepts
- 🔍 **Exploration**: Quick look at code examples
- 💡 **Inspiration**: Seeing how things work before implementing
- 🚀 **Quick Start**: Getting familiar with the structure
### Use Local Setup When:
- 🏗️ **Building**: Actually implementing modules
-**Validating**: Running milestone checks
- 🧪 **Experimenting**: Running full experiments
- 📦 **Exporting**: Completing modules and exporting to package
- 🎯 **Serious Work**: Doing the actual coursework
## Setup Instructions
### Local Setup (Required for Full Package)
```bash
# 1. Clone repository
git clone https://github.com/mlsysbook/TinyTorch.git
cd TinyTorch
# 2. Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Install TinyTorch package in editable mode
pip install -e .
# 5. Verify installation
tito system doctor
```
Now you have:
- ✅ Full `tinytorch.*` package available
-`tito` CLI commands working
- ✅ Milestone scripts executable
- ✅ Complete development environment
## Student Workflow
**Recommended approach:**
1. **Start Online**: Use Binder/Colab/Marimo to explore and understand modules
2. **Switch to Local**: When ready to build, set up local environment
3. **Work Locally**: Implement modules, run milestones, use CLI tools
4. **Submit**: Export and submit `.ipynb` files for grading
## Common Questions
**Q: Can I do everything online?**
A: No. Online notebooks are for viewing/learning. You need local setup for the full package and experiments.
**Q: Do I need both?**
A: Not required, but recommended. Use online for learning, local for building.
**Q: Can I use online notebooks for assignments?**
A: You can view notebooks online, but you'll need local setup to actually complete modules and run milestone validations.
**Q: What if I only have online access?**
A: You can learn from online notebooks, but you won't be able to complete the full coursework without local installation.
## Summary
- **Online Notebooks**: Great for learning and exploration
- **Local Setup**: Required for building, validating, and completing modules
- **Best Practice**: Use online to learn, local to build

View File

@@ -0,0 +1,165 @@
# Paper Documentation Sync Checklist
## Analysis of paper.tex References
Based on analysis of `paper/paper.tex`, here are the documentation/resources mentioned and their status:
## ✅ Resources Mentioned in Paper
### 1. Module Notebooks ✅
**Paper says**: "module notebooks, NBGrader test suites, milestone validation scripts, and connection maps"
**Status**:
- ✅ Module notebooks exist: `modules/*/.*_dev.py` (source)
- ✅ Generated via: `tito nbgrader generate`
- ✅ Assignment notebooks: `assignments/source/`
- ⚠️ Need to ensure all modules have notebooks generated
### 2. NBGrader Test Suites ✅
**Paper says**: "NBGrader autograding infrastructure", "NBGrader test suites"
**Status**:
- ✅ NBGrader integration: `tito/commands/nbgrader.py`
- ✅ NBGrader guide: `docs/INSTRUCTOR_GUIDE.md`
- ✅ NBGrader style guide: `docs/nbgrader/NBGRADER_STYLE_GUIDE.md`
- ✅ NBGrader quick reference: `docs/nbgrader/NBGrader_Quick_Reference.md`
### 3. Milestone Validation Scripts ✅
**Paper says**: "historical milestone validation", "milestone validation scripts"
**Status**:
- ✅ Milestones exist: `milestones/` directory
- ✅ Milestone docs: `site/chapters/milestones.md`
- ✅ Milestone scripts: `milestones/*/` (Python scripts)
### 4. Connection Maps ✅
**Paper says**: "connection maps showing prerequisite dependencies", "Text-based ASCII connection maps"
**Status**:
- ✅ Connection maps in modules: Each module shows dependencies
- ✅ Learning path: `modules/LEARNING_PATH.md`
- ✅ Visual journey: `site/chapters/learning-journey.md`
- ✅ Learning journey visual: `site/learning-journey-visual.md`
### 5. Instructor Guide ✅
**Paper says**: "Institutional deployment provides NBGrader autograding infrastructure"
**Status**:
- ✅ Instructor guide: `docs/INSTRUCTOR_GUIDE.md`
- ✅ Classroom use: `site/usage-paths/classroom-use.md`
- ⚠️ Need to verify it's synced with paper claims
### 6. Student Quickstart ✅
**Paper says**: "Self-Paced Learning (Primary Use Case)", "zero infrastructure beyond Python"
**Status**:
- ✅ Student quickstart: `docs/STUDENT_QUICKSTART.md`
- ✅ Quickstart guide: `site/quickstart-guide.md`
- ✅ Student workflow: `site/student-workflow.md`
### 7. Deployment Environments ✅
**Paper says**: "JupyterHub (institutional server), Google Colab (zero installation), local installation (pip install tinytorch)"
**Status**:
- ✅ Binder setup: `binder/` directory (for JupyterHub/Binder)
- ✅ Colab setup: Configured in `site/_config.yml`
- ✅ Local install: `pyproject.toml` (pip install tinytorch)
- ✅ Documentation: `binder/README.md`, `binder/VERIFY.md`
### 8. Three Integration Models ✅
**Paper says**:
- Model 1: Self-Paced Learning
- Model 2: Institutional Integration
- Model 3: Team Onboarding
**Status**:
- ✅ Self-paced: `site/quickstart-guide.md`, `site/student-workflow.md`
- ✅ Institutional: `site/usage-paths/classroom-use.md`, `docs/INSTRUCTOR_GUIDE.md`
- ⚠️ Team onboarding: May need dedicated page
### 9. Tier Configurations ✅
**Paper says**: "Configuration 1: Foundation Only (Modules 01--07)", "Configuration 2: Foundation + Architecture", "Configuration 3: Optimization Focus"
**Status**:
- ✅ Tier pages: `site/tiers/foundation.md`, `site/tiers/architecture.md`, `site/tiers/optimization.md`
- ✅ Tier overviews in site structure
### 10. Lecture Materials ⚠️
**Paper says**: "Lecture slides for institutional courses remain future work"
**Status**:
- ⚠️ Correctly marked as future work
- ✅ No false promises
## 🔍 Files to Verify/Update
### Critical Files to Check
1. **docs/INSTRUCTOR_GUIDE.md**
- Verify it matches paper claims about NBGrader workflow
- Check that commands match current `tito` CLI
- Ensure module numbers are correct (01_tensor, not 01_setup)
2. **site/usage-paths/classroom-use.md**
- Verify it covers all three integration models
- Check NBGrader workflow matches paper description
- Ensure deployment options match paper
3. **docs/STUDENT_QUICKSTART.md**
- Verify it matches "zero infrastructure" claim
- Check setup instructions are accurate
- Ensure module references are correct
4. **site/quickstart-guide.md**
- Should match student quickstart
- Verify 15-minute claim is realistic
- Check all links work
### Files That Should Exist But May Be Missing
1. **Team Onboarding Guide** ⚠️
- Paper mentions "Model 3: Team Onboarding"
- May need dedicated page or section
- Check: `site/usage-paths/` or `docs/`
2. **Deployment Guide** ⚠️
- Paper describes three environments (JupyterHub, Colab, Local)
- Should have clear deployment instructions
- Check: `binder/README.md` covers this
3. **Connection Maps Documentation** ⚠️
- Paper mentions "connection maps showing prerequisite dependencies"
- Should be clearly documented
- Check: `modules/LEARNING_PATH.md` and site pages
## 📋 Sync Checklist
### Documentation Files
- [ ] `docs/INSTRUCTOR_GUIDE.md` - Verify module numbers, commands match paper
- [ ] `site/usage-paths/classroom-use.md` - Verify three models covered
- [ ] `docs/STUDENT_QUICKSTART.md` - Verify accuracy, module references
- [ ] `site/quickstart-guide.md` - Verify matches student quickstart
- [ ] `binder/README.md` - Verify deployment environments match paper
- [ ] `site/chapters/milestones.md` - Verify milestone descriptions match paper
### Missing Documentation
- [ ] Team Onboarding Guide (Model 3) - Create if missing
- [ ] Deployment Guide - Consolidate JupyterHub/Colab/Local instructions
- [ ] Connection Maps Guide - Document how to read/use connection maps
### Website Sync
- [ ] All documentation linked from site navigation
- [ ] Instructor guide accessible from site
- [ ] Student quickstart prominent on site
- [ ] Deployment options clearly explained
- [ ] Three integration models documented
## 🎯 Action Items
1. **Verify Instructor Guide** matches paper claims
2. **Check module numbers** throughout (01_tensor, not 01_setup)
3. **Create Team Onboarding guide** if missing
4. **Consolidate deployment docs** (JupyterHub/Colab/Local)
5. **Verify all links** in documentation work
6. **Check site navigation** includes all key docs

View File

@@ -0,0 +1,84 @@
# Exact File Requirements from paper.tex
## Files Explicitly Mentioned in Paper
Based on line-by-line analysis of `paper/paper.tex`, here are the exact files the paper says should exist:
### Line 988: Repository Instructor Resources
The paper states:
> "The repository includes instructor resources: \texttt{CONTRIBUTING.md} (guidelines for bug reports and curriculum improvements), \texttt{INSTRUCTOR.md} (30-minute setup guide, grading rubrics, common student errors), and \texttt{MAINTENANCE.md} (support commitment through 2027, succession planning for community governance)."
**Required Files**:
1. `CONTRIBUTING.md` - Guidelines for bug reports and curriculum improvements
2. `INSTRUCTOR.md` - 30-minute setup guide, grading rubrics, common student errors
3. ~~`MAINTENANCE.md`~~ - **User doesn't want this** (removed)
### Line 999: TA Guide
The paper states:
> "The repository provides \texttt{TA\_GUIDE.md} documenting frequent student errors (gradient shape mismatches, disconnected computational graphs, broadcasting failures) and debugging strategies."
**Required File**:
4. `TA_GUIDE.md` - Frequent student errors and debugging strategies
### Line 1003: Sample Solutions
The paper states:
> "Sample solutions and grading rubrics in \texttt{INSTRUCTOR.md} calibrate evaluation standards."
**Required Content** (in INSTRUCTOR.md):
- Sample solutions
- Grading rubrics
## Summary: Required Files
| File | Purpose | Status |
|------|---------|--------|
| `CONTRIBUTING.md` | Bug reports, curriculum improvements | ✅ Exists |
| `INSTRUCTOR.md` | Setup guide, grading rubrics, common errors, sample solutions | ✅ Created |
| `TA_GUIDE.md` | Common errors, debugging strategies | ✅ Created |
| `MAINTENANCE.md` | Support commitment | ❌ Removed (user preference) |
## What Each File Should Contain
### CONTRIBUTING.md
- Guidelines for bug reports
- Guidelines for curriculum improvements
- Contribution process
### INSTRUCTOR.md
- 30-minute setup guide
- Grading rubrics
- Common student errors
- Sample solutions (for grading calibration)
### TA_GUIDE.md
- Frequent student errors:
- Gradient shape mismatches
- Disconnected computational graphs
- Broadcasting failures
- Debugging strategies
- TA preparation guidance
## Files NOT Mentioned in Paper
These are NOT required by the paper (but may be useful):
- `TEAM_ONBOARDING.md` - Not explicitly mentioned (but Model 3 is described)
- `MAINTENANCE.md` - Mentioned but user doesn't want it
## Action Items
1. ✅ Remove MAINTENANCE.md (done)
2. ✅ Verify CONTRIBUTING.md matches paper description
3. ✅ Verify INSTRUCTOR.md contains all required content:
- 30-minute setup guide ✅
- Grading rubrics ✅
- Common student errors ✅
- Sample solutions ⚠️ Need to verify
4. ✅ Verify TA_GUIDE.md contains:
- Gradient shape mismatches ✅
- Disconnected computational graphs ✅
- Broadcasting failures ✅
- Debugging strategies ✅

113
binder/README.md Normal file
View File

@@ -0,0 +1,113 @@
# Binder Environment Setup
This directory contains configuration files for running TinyTorch in cloud environments via [Binder](https://mybinder.org) and [Google Colab](https://colab.research.google.com).
## Files
- **`requirements.txt`**: Python dependencies for the Binder environment
- **`postBuild`**: Script that runs after environment setup to install TinyTorch
## How It Works
### Binder
When users click the "Launch Binder" button on any notebook page in the TinyTorch documentation:
1. Binder reads `binder/requirements.txt` to install Python dependencies
2. Binder runs `binder/postBuild` to install the TinyTorch package (`pip install -e .`)
3. Users get a fully configured JupyterLab environment with TinyTorch ready to use
**Binder URL Format:**
```
https://mybinder.org/v2/gh/mlsysbook/TinyTorch/main
```
### Google Colab
Colab launch buttons automatically:
1. Clone the repository
2. Install dependencies from `binder/requirements.txt`
3. Run setup commands (users may need to manually run `pip install -e .`)
**Colab URL Format:**
```
https://colab.research.google.com/github/mlsysbook/TinyTorch/blob/main/path/to/notebook.ipynb
```
## Testing
To test your Binder setup:
1. **Test Binder Build:**
```bash
# Visit: https://mybinder.org/v2/gh/mlsysbook/TinyTorch/main
# Or use the badge:
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/mlsysbook/TinyTorch/main)
```
2. **Verify Installation:**
Once Binder launches, test in a notebook:
```python
import tinytorch
print(tinytorch.__version__)
```
3. **Check Available Resources:**
```python
import os
print("Modules:", os.listdir("modules"))
print("Assignments:", os.listdir("assignments"))
print("Milestones:", os.listdir("milestones"))
```
## Troubleshooting
### Binder Build Fails
- Check `binder/requirements.txt` for syntax errors
- Verify `binder/postBuild` has execute permissions (`chmod +x binder/postBuild`)
- Review Binder build logs at: https://mybinder.org/v2/gh/mlsysbook/TinyTorch/main?urlpath=lab/tree/logs%2Fbuild.log
### Colab Import Errors
- Ensure `binder/requirements.txt` includes all dependencies
- Users may need to run: `!pip install -e .` in a Colab cell
- Check that the repository is public (Colab can't access private repos)
### Package Not Found
- Verify `postBuild` script runs `pip install -e .` correctly
- Check that `pyproject.toml` is in the repository root
- Ensure all dependencies in `requirements.txt` are compatible
## Deployment Environments
As documented in the TinyTorch paper, three deployment environments are supported:
1. **JupyterHub** (institutional server)
- 8-core/32GB supports ~50 students
- Best for classroom use
2. **Google Colab** (zero installation)
- Best for MOOCs and self-paced learning
- No setup required from students
3. **Local Installation** (`pip install tinytorch`)
- Best for self-paced learning and development
- Full control over environment
## Keeping Dependencies Updated
When updating dependencies:
1. Update `requirements.txt` (root) - for local development
2. Update `binder/requirements.txt` - for Binder/Colab
3. Update `site/requirements.txt` - for documentation builds
4. Keep versions synchronized where possible
## References
- [Binder Documentation](https://mybinder.readthedocs.io/)
- [Jupyter Book Launch Buttons](https://jupyterbook.org/en/stable/interactive/launchbuttons.html)
- [Google Colab GitHub Integration](https://colab.research.google.com/github/)

76
binder/REQUIRED_FILES.md Normal file
View File

@@ -0,0 +1,76 @@
# Required Files Based on paper.tex
## Exact File References in Paper
### Line 988: Repository Instructor Resources
The paper explicitly states:
> "The repository includes instructor resources: \texttt{CONTRIBUTING.md} (guidelines for bug reports and curriculum improvements), \texttt{INSTRUCTOR.md} (30-minute setup guide, grading rubrics, common student errors), and \texttt{MAINTENANCE.md} (support commitment through 2027, succession planning for community governance)."
**Required Files**:
1.`CONTRIBUTING.md` - Guidelines for bug reports and curriculum improvements
2.`INSTRUCTOR.md` - 30-minute setup guide, grading rubrics, common student errors
3.`MAINTENANCE.md` - **Removed per user request** (paper mentions it but user doesn't want it)
### Line 999: TA Guide
The paper explicitly states:
> "The repository provides \texttt{TA\_GUIDE.md} documenting frequent student errors (gradient shape mismatches, disconnected computational graphs, broadcasting failures) and debugging strategies."
**Required File**:
4.`TA_GUIDE.md` - Frequent student errors and debugging strategies
### Line 1003: Sample Solutions
The paper states:
> "Sample solutions and grading rubrics in \texttt{INSTRUCTOR.md} calibrate evaluation standards."
**Required Content** (must be in INSTRUCTOR.md):
- Sample solutions (for grading calibration)
- Grading rubrics
## Summary: Required Files
| File | Purpose | Status |
|------|---------|--------|
| `CONTRIBUTING.md` | Bug reports, curriculum improvements | ✅ Exists |
| `INSTRUCTOR.md` | Setup guide, grading rubrics, common errors, sample solutions | ✅ Created |
| `TA_GUIDE.md` | Common errors, debugging strategies | ✅ Created |
## Content Verification
### CONTRIBUTING.md ✅
- Guidelines for bug reports ✅
- Guidelines for curriculum improvements ✅
### INSTRUCTOR.md ✅
- 30-minute setup guide ✅ (Section: "Instructor Setup")
- Grading rubrics ✅ (Section: "Grading Rubric for ML Systems Questions")
- Common student errors ✅ (Section: "Troubleshooting" → "Common Student Issues")
- Sample solutions ⚠️ (Mentioned but need to verify if included)
### TA_GUIDE.md ✅
- Gradient shape mismatches ✅
- Disconnected computational graphs ✅
- Broadcasting failures ✅
- Debugging strategies ✅
## Files NOT Required by Paper
These files exist but are NOT explicitly mentioned in the paper:
- `TEAM_ONBOARDING.md` - Not mentioned (but Model 3 is described in text)
- `MAINTENANCE.md` - Mentioned but removed per user request
- `docs/STUDENT_QUICKSTART.md` - Not explicitly mentioned
- `site/` documentation - Not explicitly mentioned (but needed for website)
## Action Items
1. ✅ Remove MAINTENANCE.md (done)
2. ✅ Verify CONTRIBUTING.md matches paper description
3. ⚠️ Verify INSTRUCTOR.md has sample solutions (need to check/add if missing)
4. ✅ Verify TA_GUIDE.md has all required errors
## Note on MAINTENANCE.md
The paper mentions `MAINTENANCE.md` but the user doesn't want it. The maintenance commitment information (support through 2027, etc.) is described in the paper text but doesn't need to be in a separate file if the user prefers not to have it.

View File

@@ -0,0 +1,348 @@
# Community Integration in Setup Phase
## Revised Vision: Early Community Engagement
Make community participation part of the **initial setup experience**, not something that happens after completing everything. This creates an immediate "I'm part of something bigger" moment.
## Updated User Journey
### Initial Setup Flow
```
1. Clone & Setup
2. tito system doctor (verify installation)
✅ All checks passed!
3. 🎉 "Welcome to TinyTorch!"
4. [Automatic] tito community join
→ Detects country
→ Validates setup
→ Adds to map
→ Shows celebration
5. 🌍 "You're builder #1,234 on the global map!"
6. View map → See community worldwide
```
## Integration Points
### Option 1: Automatic After Setup (Recommended)
**After `tito system doctor` passes:**
```
✅ All checks passed! Your TinyTorch environment is ready.
🎉 Welcome to the TinyTorch Community!
🌍 Join builders from around the world:
Run 'tito community join' to add your location to the map
(Completely optional - only shares country, not exact location)
💡 This is your "hello world" moment - you've successfully set up TinyTorch!
```
**After `tito community join`:**
```
✅ You've joined the TinyTorch Community!
📍 Your Location: United States
🌍 View the map: https://tinytorch.ai/community
🎖️ You're builder #1,234 on the global map!
📊 Community Stats:
• 1,234 builders worldwide
• 45 countries represented
• 5 new builders this week
💡 Continue building modules and run milestones to track your progress!
```
### Option 2: Integrated into Setup Script
**In `setup-environment.sh` or `activate.sh`:**
```bash
# After successful setup
echo ""
echo "🎉 Setup complete! Welcome to TinyTorch!"
echo ""
echo "🌍 Join the global community:"
echo " Run 'tito community join' to add your location to the map"
echo " (Optional - only shares country, completely anonymized)"
echo ""
```
### Option 3: Part of Quick Start Guide
**Update quickstart guide to include:**
```markdown
## Step 3: Join the Community (Optional)
After setup, join builders from around the world:
```bash
tito community join
```
This adds your location (country only) to the global TinyTorch community map.
See where other builders are located: https://tinytorch.ai/community
```
## What Gets Validated
**For community join (setup phase):**
- ✅ Setup verified (`tito system doctor` passed)
- ✅ Environment working
- ✅ Can import TinyTorch
**NOT required:**
- ❌ All milestones passed (can join anytime)
- ❌ All modules completed (can join anytime)
- ❌ Any specific progress (just setup)
**Why this works:**
- Lower barrier to entry
- Immediate community feeling
- Can update later with milestone progress
- More inclusive (everyone can join)
## Progressive Updates
**Users can update their community entry:**
```bash
# Initial join (after setup)
tito community join
# → Adds: Country, setup verified, timestamp
# Later: Update with milestone progress
tito community update
# → Updates: Milestones passed, system type, progress
# → Same anonymous ID, just more info
```
## Map Visualization
**The map shows:**
- **All builders**: Everyone who joined (not just completed)
- **Progress indicators**: Dots colored by milestone progress
- 🟢 All milestones passed
- 🟡 Some milestones passed
- 🔵 Setup complete (just joined)
- **Stats**: Total builders, countries, recent activity
**This creates:**
- Visual proof of global community
- Shows diversity of progress levels
- Encourages continued learning
- Makes everyone feel included
## Implementation Design
### Command: `tito community join`
**What it does:**
1. Validates setup (`tito system doctor` check)
2. Detects/asks for country
3. Generates anonymous ID
4. Creates submission JSON:
```json
{
"anonymous_id": "abc123...",
"timestamp": "2024-11-20T10:30:00Z",
"country": "United States",
"setup_verified": true,
"milestones_passed": 0, // Will update later
"system_type": "Apple Silicon"
}
```
5. Shows celebration message
6. Optionally uploads to map
### Command: `tito community update` (Optional)
**What it does:**
- Updates existing entry with:
- Milestones passed count
- Progress updates
- System type (if changed)
- Uses same anonymous ID
- Shows updated stats
## Setup Script Integration
### Update `setup-environment.sh`:
```bash
#!/bin/bash
# ... existing setup code ...
echo ""
echo "✅ TinyTorch setup complete!"
echo ""
echo "🌍 Join the global TinyTorch community:"
echo " Run 'tito community join' to add your location to the map"
echo " See builders from around the world: https://tinytorch.ai/community"
echo ""
```
### Or in `activate.sh`:
```bash
# After activation
if [ "$FIRST_ACTIVATION" = "true" ]; then
echo ""
echo "🎉 Welcome to TinyTorch!"
echo ""
echo "🌍 Join the community: 'tito community join'"
echo ""
fi
```
## Quick Start Guide Integration
**Add to quickstart guide:**
```markdown
## Step 3: Join the Community (30 seconds)
After setup, join builders from around the world:
```bash
tito community join
```
**What this does:**
- Adds your country to the global map
- Shows you're part of the TinyTorch community
- Completely optional and anonymized
**View the map**: https://tinytorch.ai/community
This is your "hello world" moment - you've successfully set up TinyTorch! 🎉
```
## Benefits of Setup-Phase Integration
### ✅ Immediate Engagement
- Community feeling from day one
- "I'm part of something bigger" moment
- Visual proof of global community
### ✅ Lower Barrier
- No need to complete milestones first
- Just setup verification required
- Everyone can participate
### ✅ Progressive Updates
- Join early (setup phase)
- Update later (milestone progress)
- Continuous engagement
### ✅ Inclusive
- All skill levels welcome
- All progress levels shown
- Not just "winners"
## Recommended Flow
### Phase 1: Setup Integration
1. **After `tito system doctor` passes:**
- Show celebration message
- Suggest `tito community join`
- Explain what it does (country only, optional)
2. **After `tito community join`:**
- Show map URL
- Display community stats
- Celebrate "you're builder #X"
3. **Update quickstart guide:**
- Add community join step
- Explain privacy model
- Link to map
### Phase 2: Map Page
1. **Create `site/community-map.md`:**
- Interactive world map
- Shows all builders (not just completed)
- Progress indicators
- Stats and recent activity
2. **Update site navigation:**
- Add "Community Map" to navigation
- Make it discoverable
### Phase 3: Progressive Updates
1. **Milestone integration:**
- After milestones pass, suggest update
- `tito community update` to add progress
- Map shows progress levels
## Privacy & Consent
**Setup-phase join:**
- Country only (not city)
- System type (optional)
- Setup verified status
- Anonymous ID (no personal info)
**Consent flow:**
```
tito community join
⚠️ This will add your location to the public community map.
📊 What will be shared:
• Country: United States (detected)
• System type: Apple Silicon
• Setup status: Verified ✅
• No personal information
🔒 Privacy: Only country-level location, completely anonymized
Continue? [y/N]: y
✅ You've joined the TinyTorch Community!
🌍 View map: https://tinytorch.ai/community
🎖️ You're builder #1,234 on the global map!
```
## Success Metrics
**Community Growth:**
- Number of builders who join (setup phase)
- Geographic diversity (countries)
- Growth rate (new builders/week)
- Map page views
**Engagement:**
- Join rate after setup
- Return visits to map
- Updates with milestone progress
- Social shares
## Final Recommendation
**Integrate into setup phase:**
1. ✅ **After `tito system doctor`**: Suggest community join
2.**Make it optional**: Clear consent, privacy-respecting
3.**Celebrate immediately**: "You're builder #X"
4.**Show the map**: Visual proof of community
5.**Allow updates**: Can add milestone progress later
**The goal**: Make students feel part of a global community from the moment they successfully set up TinyTorch, not after completing everything.
This creates an immediate "hello world" moment where they see: "Wow, there's a community of people building ML systems all over the world, and I'm one of them!" 🌍✨

167
binder/VERIFY.md Normal file
View File

@@ -0,0 +1,167 @@
# Binder & Colab Verification Guide
This guide helps you verify that Binder and Colab links are working correctly.
## Quick Verification Checklist
- [ ] Binder build completes successfully
- [ ] TinyTorch package imports correctly in Binder
- [ ] Colab can clone repository and install dependencies
- [ ] Launch buttons appear on notebook pages in documentation
- [ ] All three deployment environments work (JupyterHub, Colab, Local)
## Step-by-Step Verification
### 1. Test Binder Build
**Direct URL Test:**
```
https://mybinder.org/v2/gh/mlsysbook/TinyTorch/main
```
**What to check:**
- Build completes without errors (may take 2-5 minutes first time)
- JupyterLab launches successfully
- No import errors in terminal or notebook
**Test in Binder Notebook:**
```python
# Test 1: Import TinyTorch
import tinytorch
print(f"TinyTorch version: {tinytorch.__version__}")
# Test 2: Verify modules are accessible
import os
assert os.path.exists("modules"), "Modules directory not found"
assert os.path.exists("assignments"), "Assignments directory not found"
# Test 3: Test basic functionality
from tinytorch.core import Tensor
x = Tensor([1, 2, 3])
print(f"Tensor created: {x}")
```
### 2. Test Colab Integration
**For a specific notebook:**
```
https://colab.research.google.com/github/mlsysbook/TinyTorch/blob/main/assignments/source/02_tensor/02_tensor.ipynb
```
**What to check:**
- Notebook opens in Colab
- Can run cells without errors
- Dependencies install correctly
**Colab Setup Cell (add to notebooks if needed):**
```python
# Install TinyTorch
!pip install -e /content/TinyTorch
# Verify installation
import tinytorch
print("TinyTorch installed successfully!")
```
### 3. Verify Launch Buttons in Documentation
**Check that launch buttons appear:**
1. Build the site: `cd site && jupyter-book build .`
2. Open `_build/html/index.html` in browser
3. Navigate to any page with notebooks
4. Look for "Launch" buttons in the top-right corner
**Expected buttons:**
- 🚀 Launch Binder
- 🔵 Open in Colab
- 📥 Download notebook
### 4. Test All Three Deployment Environments
As documented in `paper/paper.tex`, TinyTorch supports:
#### A. JupyterHub (Institutional)
- Requires: 8-core/32GB server
- Supports: ~50 concurrent students
- Setup: Install via `pip install tinytorch` or mount repository
#### B. Google Colab (Zero Installation)
- Best for: MOOCs and self-paced learning
- Setup: Automatic via launch buttons
- Verify: Test with sample notebooks
#### C. Local Installation
- Best for: Self-paced learning and development
- Setup: `pip install tinytorch`
- Verify: Run `python -c "import tinytorch; print(tinytorch.__version__)"`
## Common Issues & Solutions
### Issue: Binder build times out
**Solution:**
- Check `binder/requirements.txt` for unnecessary heavy dependencies
- Ensure `postBuild` script is fast (< 2 minutes)
- Consider using `environment.yml` instead if you need conda packages
### Issue: "Module not found" errors in Binder
**Solution:**
- Verify `postBuild` script runs `pip install -e .`
- Check that `pyproject.toml` is in repository root
- Ensure all dependencies are in `binder/requirements.txt`
### Issue: Colab can't access repository
**Solution:**
- Ensure repository is public (Colab can't access private repos)
- Check that notebook path is correct in URL
- Verify GitHub repository URL in `site/_config.yml`
### Issue: Launch buttons don't appear
**Solution:**
- Verify `launch_buttons` configuration in `site/_config.yml`
- Ensure repository URL and branch are correct
- Rebuild the site: `jupyter-book build . --all`
## Automated Testing
You can add a GitHub Actions workflow to test Binder builds:
```yaml
# .github/workflows/test-binder.yml
name: Test Binder Build
on:
schedule:
- cron: '0 0 * * 0' # Weekly
workflow_dispatch:
jobs:
test-binder:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Test Binder Build
uses: jupyterhub/repo2docker-action@master
with:
image-name: tinytorch-binder-test
```
## Monitoring
**Binder Status:**
- Check build status: https://mybinder.org/v2/gh/mlsysbook/TinyTorch/main
- View build logs: Add `?urlpath=lab/tree/logs%2Fbuild.log` to URL
**Colab Status:**
- Test with sample notebooks from `assignments/` directory
- Monitor for import errors or dependency issues
## References
- [Binder Documentation](https://mybinder.readthedocs.io/)
- [Jupyter Book Launch Buttons](https://jupyterbook.org/en/stable/interactive/launchbuttons.html)
- [Google Colab GitHub Integration](https://colab.research.google.com/github/)

20
binder/postBuild Executable file
View File

@@ -0,0 +1,20 @@
#!/bin/bash
# Binder postBuild script
# This runs after the environment is set up to install TinyTorch
set -e
echo "🔧 Installing TinyTorch package..."
pip install -e .
echo "✅ TinyTorch installation complete!"
echo ""
echo "📚 Available resources:"
echo " - TinyTorch modules: modules/"
echo " - Course assignments: assignments/"
echo " - Milestone examples: milestones/"
echo ""
echo "🚀 Start exploring with:"
echo " - jupyter lab"
echo " - Or open notebooks directly from the file browser"

28
binder/requirements.txt Normal file
View File

@@ -0,0 +1,28 @@
# TinyTorch Binder Environment
# This file is used by Binder to set up the execution environment
# Keep synchronized with main requirements.txt and site/requirements.txt
# Core numerical computing (TinyTorch dependency)
numpy>=1.24.0,<3.0.0
# Terminal UI (for tito CLI and development feedback)
rich>=13.0.0
# Configuration files (for tito CLI)
PyYAML>=6.0
# Jupyter environment
jupyter>=1.1.0
jupyterlab>=4.2.0
ipykernel>=6.29.0
ipywidgets>=8.0.0
# Visualization (for milestone examples and modules)
matplotlib>=3.9.0
# Type checking support
typing-extensions>=4.12.0
# Note: tinytorch package itself is installed via postBuild script
# This ensures the latest code from the repository is used

View File

@@ -0,0 +1,117 @@
# Baseline & Submission Design: What Makes Sense
## User Concern
**Question**: For baseline and submitting, what makes sense? Worried that running everything can take a while.
## Current Design
### Baseline Benchmark (`tito benchmark baseline`)
**What it does**:
- Quick operations (tensor ops, matmul, forward pass)
- **Time**: ~1 second
- **Purpose**: Setup validation, environment check
- **Normalized**: SPEC-style to reference system
**Current Implementation**:
```python
# Quick operations only
- Tensor operations: ~0.8ms
- Matrix multiply: ~2.5ms
- Forward pass: ~6.7ms
Total: ~10ms (normalized to reference)
```
### Milestones
**What they do**:
- Full ML workflows (training, evaluation)
- **Time**: Minutes (3-30 minutes per milestone)
- **Purpose**: Historical recreations, student validation
- **Requires**: Completed modules (student code)
## Recommendation: Keep Baseline Quick, Milestones Optional
### ✅ Baseline at Setup (Fast)
**Keep current approach**:
- ✅ Quick benchmark (~1 second)
- ✅ Validates environment works
- ✅ Normalized to reference system
- ✅ Good for "Hello World" moment
- ✅ Submit to community immediately
**Why this works**:
- Fast setup validation
- Doesn't require student code
- Meaningful baseline (normalized)
- Community submission ready
### ⚠️ Milestones Later (Optional)
**Run milestones as students complete modules**:
- ⚠️ Takes minutes (not seconds)
- ⚠️ Requires completed modules
- ⚠️ Optional for community submission
- ✅ Better for student validation
**Why milestones shouldn't be at setup**:
- Too slow (minutes vs seconds)
- Requires student code (doesn't exist yet)
- Better for progressive validation
## Submission Strategy
### Setup Phase: Baseline Only
**What to submit**:
- ✅ Baseline benchmark results (normalized)
- ✅ System info (country, institution, etc.)
- ✅ Reference implementation results
**Why**:
- Fast (1 second)
- Meaningful (normalized to reference)
- Works immediately (no student code needed)
### Later Phase: Milestones Optional
**What to submit (optional)**:
- ⚠️ Milestone results (as students complete modules)
- ⚠️ Student code performance vs reference
- ⚠️ Progress tracking
**Why optional**:
- Takes time (minutes per milestone)
- Requires completed modules
- Better for personal tracking than community
## Final Recommendation
**✅ Keep baseline quick** (current approach is correct):
- Fast setup validation (~1 second)
- Submit baseline to community
- Normalized to reference system
**✅ Milestones stay separate**:
- Run as students complete modules
- Optional for community submission
- Better for personal progress tracking
**Result**:
- Setup is fast (1 second baseline)
- Community gets meaningful data (normalized baseline)
- Students can optionally submit milestones later
- No time concerns at setup
## Implementation
**Current `tito benchmark baseline`**:
- ✅ Already fast (~1 second)
- ✅ Already normalized
- ✅ Already prompts for submission
- ✅ Perfect for setup phase
**No changes needed!** Current design is correct.

View File

@@ -0,0 +1,136 @@
# Benchmark Normalization - SPEC-Style Reference System
## Overview
TinyTorch baseline benchmarks use **SPEC-style normalization** to ensure fair comparison across different hardware. Results are normalized to a reference system, making scores comparable regardless of your hardware.
## How It Works
### Reference System
**Reference Hardware:**
- CPU: Intel i5-8th generation
- RAM: 16GB
- Platform: Mid-range laptop
**Reference Times:**
- Tensor Operations: 0.8ms
- Matrix Multiply: 2.5ms
- Forward Pass: 6.7ms
- **Total: 10.0ms**
### Normalization Formula
**SPEC-style normalization:**
```
normalized_score = reference_time / actual_time
```
**Score Calculation:**
```
score = min(100, 100 * normalized_score)
```
### Examples
**Fast System (M3 Mac):**
- Actual time: 5.0ms
- Normalized: 10.0 / 5.0 = 2.0x
- Score: min(100, 100 * 2.0) = **100** (capped at 100)
**Reference System:**
- Actual time: 10.0ms
- Normalized: 10.0 / 10.0 = 1.0x
- Score: min(100, 100 * 1.0) = **100**
**Slower System (Older Laptop):**
- Actual time: 20.0ms
- Normalized: 10.0 / 20.0 = 0.5x
- Score: min(100, 100 * 0.5) = **50**
## Why Normalization Matters
### Without Normalization
- Fast hardware gets high scores unfairly
- Slow hardware gets low scores unfairly
- Can't compare optimization skill across systems
### With Normalization
- ✅ Scores are comparable across hardware
- ✅ Focus on optimization skill, not hardware
- ✅ Fair comparison (like SPEC benchmarks)
## Score Interpretation
**Score Range:**
- **100**: Reference system performance or better
- **80-99**: Slightly slower than reference
- **60-79**: Moderately slower than reference
- **40-59**: Significantly slower than reference
- **<40**: Very slow (may indicate setup issues)
**Normalized Multiplier:**
- **>1.0x**: Faster than reference system
- **1.0x**: Same as reference system
- **<1.0x**: Slower than reference system
## Technical Details
### Reference Times Selection
Reference times are based on:
- Mid-range consumer hardware (common student setup)
- Conservative estimates (most systems should meet or exceed)
- Real-world performance expectations
### Score Capping
Scores are capped at 100 to:
- Prevent unfair advantage for very fast hardware
- Keep focus on "setup validation" not "hardware competition"
- Maintain educational focus
### Future Adjustments
Reference times can be updated if:
- Hardware landscape changes significantly
- Better baseline data becomes available
- Community feedback suggests adjustment needed
## Comparison to SPEC
**Similarities:**
- Normalize to reference system
- Hardware-independent scores
- Fair comparison across systems
**Differences:**
- SPEC: Multiple benchmarks, complex scoring
- TinyTorch: Simple baseline validation, educational focus
- SPEC: Competitive benchmarking
- TinyTorch: Setup validation and learning
## Implementation
Reference times are defined in `tito/commands/benchmark.py`:
```python
def _get_reference_times(self) -> Dict[str, float]:
"""Get reference times for normalization (SPEC-style)."""
return {
"tensor_ops": 0.8,
"matmul": 2.5,
"forward_pass": 6.7,
"total": 10.0
}
```
Normalization happens automatically in `_run_baseline()`.
## Benefits
1. **Fair Comparison**: Scores mean the same thing on any hardware
2. **Educational Focus**: Emphasizes setup validation, not hardware
3. **Industry Standard**: Follows SPEC/MLPerf normalization principles
4. **Motivation**: Students can achieve good scores regardless of hardware

View File

@@ -0,0 +1,339 @@
# Community & Benchmark Commands - Implementation Document
## Overview
This document describes the implementation of community and benchmark commands for TinyTorch, an educational ML systems framework. The goal is to create a "Hello World" user journey where students feel part of a global cohort after completing setup and initial milestones.
## Design Philosophy
**Educational Focus**: TinyTorch is an educational framework. Community features should:
- Encourage learning and progress, not competition
- Create cohort feeling (students see peers, not rankings)
- Be privacy-friendly (all data optional, anonymous IDs)
- Work locally first, sync to website later
**Local-First Approach**:
- All data stored project-locally in `.tinytorch/` directory
- Website integration via stubs (ready for future API)
- No external dependencies required for core functionality
## Implementation
### 1. Benchmark Commands (`tito benchmark`)
#### Baseline Benchmark (`tito benchmark baseline`)
**Purpose**: Quick setup validation - "Hello World" moment
**What it does**:
- Runs lightweight benchmarks (tensor ops, matrix multiply, forward pass)
- Calculates score (0-100) based on performance
- Saves results to `.tito/benchmarks/baseline_TIMESTAMP.json`
- Auto-prompts for submission after completion
**When to run**: After setup, anytime
**Output Example**:
```
🎯 Baseline Benchmark
📊 Your Baseline Performance:
• Tensor Operations: ⚡ 0.5ms
• Matrix Multiply: ⚡ 2.3ms
• Forward Pass: ⚡ 5.2ms
• Score: 85/100
✅ Setup verified and working!
```
#### Capstone Benchmark (`tito benchmark capstone`)
**Purpose**: Full performance evaluation after Module 20
**What it does**:
- Runs full benchmark suite from Module 20
- Supports tracks: speed, compression, accuracy, efficiency, all
- Uses Module 19's Benchmark class (when available)
- Falls back gracefully if Module 20 not complete
- Auto-prompts for submission after completion
**When to run**: After Module 20 (Capstone)
**Output Example**:
```
🏆 Capstone Benchmark Results
📊 Speed Track:
• Latency: 45.2ms
• Throughput: 22.1 ops/sec
• Score: 92/100
📊 Overall Score: 90/100
```
### 2. Community Commands (`tito community`)
#### Join (`tito community join`)
**Purpose**: Join the global TinyTorch community
**What it does**:
- Collects: country, institution, course type, experience level (all optional)
- Generates anonymous UUID
- Auto-detects cohort (Fall 2024, Spring 2025, etc.)
- Saves profile to `.tinytorch/community/profile.json`
- Shows welcome message with cohort info
**Privacy**: All fields optional, anonymous IDs, local storage
#### Update (`tito community update`)
**Purpose**: Update community profile
**What it does**:
- Updates profile fields (country, institution, course type, experience)
- Auto-updates progress from `.tito/milestones.json` and `.tito/progress.json`
- Interactive or command-line updates
#### Leave (`tito community leave`)
**Purpose**: Remove community profile
**What it does**:
- Removes profile file
- Confirmation prompt (can skip with `--force`)
- Preserves benchmark submissions
#### Stats & Profile (`tito community stats`, `tito community profile`)
**Purpose**: View community information
**What it does**:
- Shows community statistics
- Displays full profile in table format
- Shows progress: milestones, modules, capstone score
## Data Storage
### Project-Local Storage (`.tinytorch/`)
All data stored in project root, not home directory:
```
.tinytorch/
├── config.json # Configuration (website URLs, settings)
├── community/
│ └── profile.json # User's community profile
└── submissions/ # Benchmark submissions (ready for website)
```
### Profile Structure (`profile.json`)
```json
{
"anonymous_id": "uuid",
"joined_at": "2024-11-20T10:30:00",
"location": {
"country": "United States"
},
"institution": {
"name": "Harvard University",
"type": null
},
"context": {
"course_type": "university",
"experience_level": "intermediate",
"cohort": "Fall 2024"
},
"progress": {
"setup_verified": false,
"milestones_passed": 0,
"modules_completed": 0,
"capstone_score": null
}
}
```
### Configuration (`config.json`)
```json
{
"website": {
"base_url": "https://tinytorch.ai",
"community_map_url": "https://tinytorch.ai/community",
"api_url": null,
"enabled": false
},
"local": {
"enabled": true,
"auto_sync": false
}
}
```
## Website Integration Stubs
All commands have stubs for future website integration:
### Join Notification
```python
def _notify_website_join(self, profile: Dict[str, Any]) -> None:
"""Stub: Notify website when user joins."""
config = self._get_config()
if not config.get("website", {}).get("enabled", False):
return
api_url = config.get("website", {}).get("api_url")
if api_url:
# TODO: Implement API call when website is ready
# import requests
# response = requests.post(f"{api_url}/api/community/join", json=profile)
pass
```
### Leave Notification
```python
def _notify_website_leave(self, anonymous_id: Optional[str]) -> None:
"""Stub: Notify website when user leaves."""
# Similar structure
```
### Benchmark Submission
```python
def _submit_to_website(self, submission: Dict[str, Any]) -> None:
"""Stub: Submit benchmark results to website."""
# Similar structure
```
**Current Behavior**: Stubs check configuration. If website integration disabled (default), commands work purely locally. When enabled, stubs will make API calls.
## User Journey
### 1. Setup & Join
```bash
# After setup
tito community join
# → Collects info, saves profile, shows welcome
# Run baseline benchmark
tito benchmark baseline
# → Runs benchmarks, shows results, prompts for submission
```
### 2. Progress Updates
```bash
# Update profile
tito community update
# → Updates fields, auto-updates progress
# View profile
tito community profile
# → Shows full profile with progress
```
### 3. Capstone Completion
```bash
# After Module 20
tito benchmark capstone
# → Runs full benchmarks, prompts for submission
```
## Privacy & Security
**Privacy Features**:
- ✅ All fields optional
- ✅ Anonymous UUIDs (no personal identifiers)
- ✅ Local storage (user controls sharing)
- ✅ No auto-detection (country detection disabled)
- ✅ Explicit consent for sharing
**Security Considerations**:
- Profile data stored locally (not transmitted unless user opts in)
- Anonymous IDs prevent tracking
- Website integration opt-in only
## Educational Benefits
**Cohort Feeling**:
- Students see they're part of a global community
- Cohort identification (Fall 2024, Spring 2025, etc.)
- Institution-based cohorts (Harvard, Stanford, etc.)
- Progress comparisons (milestones, modules completed)
**Motivation**:
- "Hello World" moment after setup
- Progress tracking and celebration
- Community map visualization (future)
- Peer visibility (future)
**Learning Support**:
- Not competitive (no rankings)
- Encourages sharing and learning
- Privacy-friendly (students control data)
## Technical Implementation
### Files Created
- `tito/commands/benchmark.py` - Benchmark commands
- `tito/commands/community.py` - Community commands
### Files Modified
- `tito/commands/__init__.py` - Added command exports
- `tito/main.py` - Registered new commands
### Dependencies
- `rich` - Beautiful terminal output (already in requirements)
- `numpy` - Benchmark calculations (already in requirements)
- No external API dependencies (local-first)
## Future Enhancements
**Phase 1 (Current)**: ✅
- Local storage
- Basic commands
- Website stubs
**Phase 2 (Future)**:
- Website API integration
- Community map visualization
- Cohort filtering and comparisons
- Progress rankings (optional, opt-in)
**Phase 3 (Future)**:
- Real-time updates
- Peer connections
- Study groups
- Mentorship matching
## Testing
Commands are ready to test:
```bash
# Test benchmark
tito benchmark baseline
tito benchmark capstone
# Test community
tito community join
tito community profile
tito community update
tito community stats
tito community leave
```
## Questions for Expert Review
1. **Storage Approach**: Is project-local storage (`.tinytorch/`) the right approach for an educational framework? Should we consider home directory instead?
2. **Privacy Model**: Is the anonymous UUID + optional fields approach appropriate for students? Any privacy concerns?
3. **Website Integration**: Are the stubs structured correctly? Should we use a different pattern for future API integration?
4. **Educational Focus**: Does this design support learning without creating unhealthy competition? Are there features we should add/remove?
5. **Cohort Features**: Is cohort identification (Fall 2024, institution-based) the right approach? Should we add more cohort types?
6. **Benchmark Design**: Are baseline and capstone benchmarks appropriate? Should we add more benchmark types?
7. **Data Collection**: What data should we collect? What should we avoid?
8. **Community Map**: Is a global map visualization appropriate for an educational framework? Privacy concerns?
9. **Integration Points**: Should we integrate with existing systems (GitHub, LMS, etc.)?
10. **Scalability**: Will this design scale to thousands of students? What bottlenecks should we anticipate?

136
docs/CONFIGURATION_SETUP.md Normal file
View File

@@ -0,0 +1,136 @@
# Community Configuration Setup
## Storage Location
All community data is stored **project-locally** in `.tinytorch/` directory (not in home directory):
```
.tinytorch/
├── config.json # Configuration (website URLs, settings)
├── community/
│ └── profile.json # User's community profile
└── submissions/ # Benchmark submissions (ready for website)
```
## Configuration File (`.tinytorch/config.json`)
The configuration file is automatically created on first use with these defaults:
```json
{
"website": {
"base_url": "https://tinytorch.ai",
"community_map_url": "https://tinytorch.ai/community",
"api_url": null,
"enabled": false
},
"local": {
"enabled": true,
"auto_sync": false
}
}
```
### Configuration Fields
**Website Settings:**
- `base_url`: Base URL for TinyTorch website
- `community_map_url`: URL to community map page
- `api_url`: API endpoint URL (set when API is ready)
- `enabled`: Enable website integration (set to `true` when ready)
**Local Settings:**
- `enabled`: Always `true` - local storage is always enabled
- `auto_sync`: Auto-sync to website when enabled (future feature)
## Website Integration Stubs
All commands have stubs for website integration that are currently disabled:
### Join Command
```python
def _notify_website_join(self, profile: Dict[str, Any]) -> None:
"""Stub: Notify website when user joins."""
config = self._get_config()
if not config.get("website", {}).get("enabled", False):
return
api_url = config.get("website", {}).get("api_url")
if api_url:
# TODO: Implement API call when website is ready
# import requests
# response = requests.post(f"{api_url}/api/community/join", json=profile)
pass
```
### Leave Command
```python
def _notify_website_leave(self, anonymous_id: Optional[str]) -> None:
"""Stub: Notify website when user leaves."""
# Similar structure to join
```
### Benchmark Submission
```python
def _submit_to_website(self, submission: Dict[str, Any]) -> None:
"""Stub: Submit benchmark results to website."""
# Similar structure
```
## Enabling Website Integration
When the website API is ready:
1. **Update configuration:**
```json
{
"website": {
"api_url": "https://api.tinytorch.ai",
"enabled": true
}
}
```
2. **Implement API calls:**
- Uncomment TODO sections in `community.py` and `benchmark.py`
- Add `requests` dependency if needed
- Implement error handling
3. **Test integration:**
- Test join/leave notifications
- Test benchmark submission
- Verify data sync
## Current Behavior (Local-Only)
**All commands work locally:**
- ✅ `tito community join` - Saves profile to `.tinytorch/community/profile.json`
- ✅ `tito community update` - Updates local profile
- ✅ `tito community leave` - Removes local profile
- ✅ `tito benchmark baseline` - Saves to `.tito/benchmarks/`
- ✅ `tito benchmark capstone` - Saves to `.tito/benchmarks/`
**Website stubs are present but disabled:**
- Stubs call `_get_config()` to check if website is enabled
- If disabled (default), commands work purely locally
- When enabled, stubs will make API calls
## Benefits of Project-Local Storage
1. **Version Control Friendly**: `.tinytorch/` can be gitignored or committed
2. **Project-Specific**: Each TinyTorch project has its own community profile
3. **Portable**: Easy to move/share projects with their data
4. **Privacy**: Data stays in project, not in home directory
## Migration Notes
If you had data in `~/.tinytorch/`, you can migrate:
```bash
# Copy old data to new location
cp -r ~/.tinytorch/community .tinytorch/
cp ~/.tinytorch/config.json .tinytorch/config.json # if exists
```
The new system will automatically use `.tinytorch/` in the project root.

View File

@@ -0,0 +1,62 @@
# Documentation Structure - Single Source of Truth
## Site Documentation (`site/`)
**Purpose**: User-facing website content (built with Jupyter Book)
**Files**:
- `site/community.md` - Community features for website visitors
- `site/quickstart-guide.md` - Quick start guide
- `site/student-workflow.md` - Student workflow guide
- `site/instructor-guide.md` - Instructor guide (copied from docs/)
- `site/usage-paths/classroom-use.md` - Classroom usage guide
**Build**: These files are built into the website via `make html` in `site/`
## Developer Documentation (`docs/`)
**Purpose**: Technical documentation for developers and experts
**Files**:
- `docs/COMMUNITY_BENCHMARK_IMPLEMENTATION.md` - Full implementation details
- `docs/EXPERT_FEEDBACK_ANALYSIS.md` - Expert feedback analysis
- `docs/EXPERT_FEEDBACK_REQUEST.md` - Questions for experts
- `docs/PRIVACY_DATA_RETENTION.md` - Privacy policy
- `docs/CONFIGURATION_SETUP.md` - Configuration guide
- `docs/COMMUNITY_FEATURES_SUMMARY.md` - Quick summary
**Note**: These are NOT included in the website build - they're for developers/experts
## Root Documentation
**Purpose**: Repository-level documentation
**Files**:
- `README.md` - Main repository README
- `CONTRIBUTING.md` - Contribution guidelines
- `INSTRUCTOR.md` - Instructor guide (root copy)
- `TA_GUIDE.md` - TA guide (root copy)
## Single Source Principle
**Site files** (`site/*.md`):
- ✅ Single source: `site/community.md` is the ONLY community page for website
- ✅ No duplicates in `docs/` for website content
**Developer docs** (`docs/*.md`):
- ✅ Technical details for developers
- ✅ NOT built into website (separate purpose)
**Root docs** (`*.md`):
- ✅ Repository-level documentation
- ✅ Referenced by paper.tex
## File Locations Summary
| Content Type | Location | Purpose | Built into Site? |
|-------------|----------|---------|------------------|
| Community features | `site/community.md` | Website page | ✅ Yes |
| Quick start | `site/quickstart-guide.md` | Website page | ✅ Yes |
| Student workflow | `site/student-workflow.md` | Website page | ✅ Yes |
| Implementation details | `docs/COMMUNITY_*.md` | Developer docs | ❌ No |
| Privacy policy | `docs/PRIVACY_*.md` | Developer docs | ❌ No |
| Expert feedback | `docs/EXPERT_*.md` | Developer docs | ❌ No |
**All documentation is in the correct location with no duplicates.**

View File

@@ -0,0 +1,88 @@
# Expert Analysis: Setup Validation Approach
## Research Summary
Based on research into MLPerf, SPEC benchmarks, and educational ML frameworks, here's expert-informed analysis.
**Final Decision**: Keep current baseline approach (fast, ~1 second) rather than milestone-based validation. See `BASELINE_SUBMISSION_DESIGN.md` for final design.
## Key Findings
### 1. MLPerf Approach: Reference Implementation Required
**MLPerf Practice**:
-**Reference implementations are standard** - everyone runs same reference code
-**Baseline measurements** - establish reference performance first
-**Normalized comparison** - results normalized to reference system
-**Comprehensive validation** - full workflow testing, not just basic ops
**Key Insight**: MLPerf requires reference implementations for fair comparison. This supports your original vision!
### 2. SPEC Approach: Reference System Normalization
**SPEC Practice**:
-**Reference system defined** - specific hardware configuration
-**Normalized scores** - all results normalized to reference
-**Comprehensive benchmarks** - full application workloads
-**Baseline establishment** - reference performance is baseline
**Key Insight**: SPEC uses comprehensive benchmarks normalized to reference. This aligns with milestone approach!
### 3. Educational Framework Best Practices
**Research Findings**:
-**Milestone-based validation** - recognized best practice for educational platforms
-**Progressive validation** - validate at each stage, not just setup
-**Clear expectations** - students see what they're working toward
-**Reference comparisons** - compare student work to reference implementations
**Key Insight**: Educational frameworks use milestone-based validation with reference comparisons!
## Expert Recommendations
### ✅ Milestone-Based Validation is Appropriate
**Why**:
1. **Industry Standard**: MLPerf and SPEC use comprehensive benchmarks
2. **Educational Best Practice**: Milestone validation is recognized approach
3. **Better Baseline**: Real milestone results more meaningful than basic ops
4. **Fair Comparison**: Reference implementation ensures fairness
### ✅ Reference Fallback is Standard Practice
**Why**:
1. **MLPerf Does This**: Reference implementations are standard
2. **Educational Tools Do This**: Compare student code to reference
3. **Fair Comparison**: Everyone runs same reference code
4. **Progressive Validation**: Students compare their code to reference
### ⚠️ Implementation Considerations
**Best Practices**:
1. **Clear Labeling**: Mark results as "reference" vs "student"
2. **Normalization**: Normalize to reference system (SPEC-style)
3. **Progressive**: Run milestones as students complete modules
4. **Transparency**: Show what's reference vs student code
## Recommendation
**✅ Your Original Vision is Correct!**
**Milestone-based setup validation with reference fallback**:
- ✅ Aligns with MLPerf/SPEC practices
- ✅ Follows educational framework best practices
- ✅ Creates better student experience
- ✅ Provides meaningful baseline results
**Implementation**:
1. Add reference fallback to milestones (PyTorch if `tinytorch.*` fails)
2. Run milestones at setup with reference implementation
3. Generate normalized baseline results
4. Students later run with THEIR code and compare
## Conclusion
**Expert consensus**: Milestone-based validation with reference fallback is the right approach for educational ML frameworks. It aligns with industry standards (MLPerf, SPEC) and educational best practices.
**Your original idea was correct!** The challenge is implementation, not concept.

View File

@@ -0,0 +1,154 @@
# Expert Feedback Request: Community & Benchmark Features
**TinyTorch** - Educational ML Systems Framework
We're building community and benchmark features for TinyTorch, an educational framework where students build ML components from scratch. Seeking feedback from TensorFlow/PyTorch community experts and educational ML framework developers.
## Context
We're building **TinyTorch**, an educational ML systems framework where students build ML components from scratch (tensors, autograd, optimizers, CNNs, transformers, etc.). We're implementing community and benchmark features to create a "Hello World" user journey where students feel part of a global cohort.
**Key Question**: Is our design approach appropriate for an educational framework? What would you recommend?
## Our Design
### 1. Storage Approach
- **Project-local storage** (`.tinytorch/` in project root, not `~/.tinytorch/`)
- Rationale: Version control friendly, project-specific, portable
- **Question**: Is this the right approach? Should we use home directory instead?
### 2. Privacy Model
- **Anonymous UUIDs** for all users
- **All fields optional** (country, institution, course type, experience)
- **Local-first**: Data stored locally, website sync opt-in
- **Question**: Is this privacy model appropriate for students? Any concerns?
### 3. Community Features
- **Join/Leave/Update** commands for community profile
- **Cohort identification** (Fall 2024, Spring 2025, institution-based)
- **Progress tracking** (milestones, modules, capstone score)
- **No rankings** (educational focus, not competitive)
- **Question**: Does this support learning without unhealthy competition? Missing features?
### 4. Benchmark Commands
- **Baseline benchmark**: Quick setup validation ("Hello World" moment)
- **Capstone benchmark**: Full performance evaluation after Module 20
- **Auto-submit prompt**: After benchmarks, asks if user wants to submit
- **Question**: Are these benchmark types appropriate? Should we add more?
### 5. Website Integration
- **Stubs for future API**: Commands work locally, ready for website sync
- **Configuration-based**: Enable/disable website integration via config
- **Question**: Is this stub pattern correct? Better approaches?
## Specific Questions
### For TensorFlow/PyTorch Community Experts
1. **Storage Location**:
- We use project-local `.tinytorch/` directory. Is this appropriate for an educational framework?
- Should we consider home directory (`~/.tinytorch/`) instead?
- What do TensorFlow/PyTorch educational tools use?
2. **Privacy & Data Collection**:
- We collect: country, institution, course type, experience level (all optional)
- Anonymous UUIDs, no personal names
- Is this appropriate for students? Any privacy concerns?
- What data should we collect/avoid?
3. **Community Design**:
- Focus on cohort feeling, not competition
- No rankings, just progress tracking
- Is this the right approach for education?
- Should we add competitive features (opt-in)?
4. **Benchmark Design**:
- Baseline (setup validation) + Capstone (full evaluation)
- Should we add more benchmark types?
- How should we handle different hardware/performance?
5. **Website Integration**:
- Local-first with stubs for future API
- Is this pattern correct?
- Should we use a different approach?
6. **Scalability**:
- Will this design scale to thousands of students?
- What bottlenecks should we anticipate?
- Should we plan for distributed storage?
7. **Educational Best Practices**:
- What features encourage learning without creating unhealthy competition?
- Should we add peer connections, study groups, mentorship?
- What features do successful educational ML frameworks have?
8. **Integration Points**:
- Should we integrate with GitHub, LMS, or other systems?
- What integrations would be most valuable for students?
## Our Implementation
### Commands
- `tito benchmark baseline` - Quick setup validation
- `tito benchmark capstone` - Full Module 20 benchmarks
- `tito community join` - Join community (collects optional info)
- `tito community update` - Update profile
- `tito community leave` - Remove profile
- `tito community stats` - View statistics
- `tito community profile` - View profile
### Data Storage
```
.tinytorch/
├── config.json # Configuration
├── community/
│ └── profile.json # User profile
└── submissions/ # Benchmark submissions
```
### Profile Structure
```json
{
"anonymous_id": "uuid",
"joined_at": "2024-11-20T10:30:00",
"location": {"country": "United States"},
"institution": {"name": "Harvard University"},
"context": {
"course_type": "university",
"experience_level": "intermediate",
"cohort": "Fall 2024"
},
"progress": {
"milestones_passed": 0,
"modules_completed": 0,
"capstone_score": null
}
}
```
## What We're Looking For
**Feedback on**:
1. Design approach (is it right for education?)
2. Privacy model (appropriate for students?)
3. Storage location (project-local vs home?)
4. Feature set (missing anything important?)
5. Scalability (will it work at scale?)
6. Best practices (what should we do differently?)
**Recommendations on**:
1. What features to add/remove
2. How to structure data
3. How to integrate with website
4. How to scale to thousands of students
5. What successful educational frameworks do
## Contact
We'd love to hear from:
- TensorFlow/PyTorch community experts
- Educational ML framework developers
- Anyone with experience building community features for educational tools
**Thank you for your time and expertise!**

View File

@@ -0,0 +1,105 @@
# Expert Opinion Request: Setup Validation Approach
## Question for ML Systems Experts
**Context**: We're building TinyTorch, an educational ML framework where students build ML components from scratch (tensors, autograd, optimizers, CNNs, transformers, etc.).
**Decision Point**: How should we validate setup and create baseline results?
## Two Approaches
### Approach 1: Quick Baseline Benchmark (Current)
**What**: Run lightweight benchmarks (tensor ops, matrix multiply, forward pass) - ~1 second
**Pros**:
- ✅ Fast setup validation
- ✅ Doesn't require student code
- ✅ Normalized to reference system (SPEC-style)
- ✅ Simple and reliable
**Cons**:
- ❌ Limited validation (just basic ops)
- ❌ Not comprehensive
- ❌ Doesn't test full ML workflows
### Approach 2: Milestone-Based Validation (Proposed)
**What**: Run full milestone scripts with reference implementation fallback (PyTorch if `tinytorch.*` unavailable)
**Pros**:
- ✅ Comprehensive validation (full ML workflows)
- ✅ Meaningful baseline results (real milestone performance)
- ✅ Better "Hello World" moment (students see what they'll build)
- ✅ Fair comparison (everyone runs same reference)
**Cons**:
- ⚠️ More complex (requires fallback logic)
- ⚠️ Takes longer (minutes vs seconds)
- ⚠️ Requires modifying milestones
## Technical Implementation
**Reference Fallback Approach**:
```python
# In milestone scripts
try:
from tinytorch import Tensor, Linear, ReLU
implementation = "student"
except ImportError:
import torch
Tensor = torch.Tensor
Linear = torch.nn.Linear
ReLU = torch.nn.ReLU
implementation = "reference"
```
**Results**:
- Setup: "Reference baseline: 95% accuracy"
- Later: "Your code: 92% accuracy (vs reference: 95%)"
## Questions for Experts
1. **Setup Validation**: Should setup validation be quick (basic ops) or comprehensive (full workflows)?
2. **Reference Implementation**: Is it appropriate to use PyTorch as reference fallback in educational frameworks?
3. **Baseline Results**: Should baseline be environment-only or framework-level (milestone results)?
4. **Student Experience**: What creates better "Hello World" moment - quick validation or seeing real results?
5. **Best Practices**: What do successful educational ML frameworks (Fast.ai, PyTorch Lightning tutorials) do?
6. **Normalization**: Should we normalize milestone results to reference system (like SPEC)?
7. **Complexity Trade-off**: Is added complexity worth comprehensive validation?
## Our Context
- **Educational Focus**: Students build everything from scratch
- **20 Modules**: Progressive complexity (tensors → transformers)
- **6 Milestones**: Historical recreations (1957-2018)
- **Community Goal**: Students feel part of global cohort
## What We're Seeking
**Expert opinion on**:
- Which approach is better for educational frameworks?
- Is reference fallback appropriate?
- Should setup be quick or comprehensive?
- What creates best student experience?
**Recommendations on**:
- Best practices from industry (MLPerf, SPEC)
- What successful educational frameworks do
- How to balance simplicity vs comprehensiveness
## Contact
We'd love feedback from:
- MLPerf/SPEC benchmark experts
- Educational ML framework developers
- ML systems engineers with educational experience
**Thank you for your expertise!**

View File

@@ -43,12 +43,12 @@ We've wrapped NBGrader behind simple `tito grade` commands so you don't need to
### **1. Prepare Assignments**
```bash
# Generate instructor version (with solutions)
tito grade generate 01_setup
tito grade generate 01_tensor
# Create student version (solutions removed)
tito grade release 01_setup
tito grade release 01_tensor
# Student version will be in: release/tinytorch/01_setup/
# Student version will be in: release/tinytorch/01_tensor/
```
### **2. Distribute to Students**
@@ -65,25 +65,25 @@ tito grade release 01_setup
### **3. Collect Submissions**
```bash
# Collect all students
tito grade collect 01_setup
tito grade collect 01_tensor
# Or specific student
tito grade collect 01_setup --student student_id
tito grade collect 01_tensor --student student_id
```
### **4. Auto-Grade**
```bash
# Grade all submissions
tito grade autograde 01_setup
tito grade autograde 01_tensor
# Grade specific student
tito grade autograde 01_setup --student student_id
tito grade autograde 01_tensor --student student_id
```
### **5. Manual Review**
```bash
# Open grading interface (browser-based)
tito grade manual 01_setup
tito grade manual 01_tensor
# This launches a web interface for:
# - Reviewing ML Systems question responses
@@ -94,7 +94,7 @@ tito grade manual 01_setup
### **6. Generate Feedback**
```bash
# Create feedback files for students
tito grade feedback 01_setup
tito grade feedback 01_tensor
```
### **7. Export Grades**
@@ -103,7 +103,7 @@ tito grade feedback 01_setup
tito grade export
# Or specific module
tito grade export --module 01_setup --output grades_module01.csv
tito grade export --module 01_tensor --output grades_module01.csv
```
## 📊 Grading Components
@@ -138,17 +138,12 @@ tito grade export --module 01_setup --output grades_module01.csv
## 📚 Module Teaching Notes
### **Module 01: Setup**
- **Focus**: Environment configuration, systems thinking mindset
- **Key Concept**: Development environments matter for ML systems
- **Common Issues**: Virtual environment confusion
### **Module 02: Tensor**
### **Module 01: Tensor**
- **Focus**: Memory layout, data structures
- **Key Concept**: Understanding memory is crucial for ML performance
- **Demo**: Show memory profiling, copying behavior
### **Module 03: Activations**
### **Module 02: Activations**
- **Focus**: Vectorization, numerical stability
- **Key Concept**: Small details matter at scale
- **Demo**: Gradient vanishing/exploding

View File

@@ -0,0 +1,110 @@
# Privacy & Data Retention Policy
## Data Collection
TinyTorch collects **optional** information to build a community map and support learning:
- **Country** (optional) - For global visualization
- **Institution** (optional) - For cohort identification
- **Course Type** (optional) - For community insights
- **Experience Level** (optional) - For learning support
**We do NOT collect:**
- Personal names
- Email addresses (unless user provides)
- IP addresses
- Any personally identifiable information
## Anonymous Identification
All users are assigned an **anonymous UUID** when joining the community. This UUID:
- Cannot be linked to personal identity
- Is randomly generated
- Is stored locally in your project
## Data Storage
**Location**: `.tinytorch/` directory (project-local, not home directory)
**Files**:
- `.tinytorch/community/profile.json` - Your community profile
- `.tinytorch/config.json` - Configuration settings
- `.tito/benchmarks/` - Benchmark results
- `.tito/submissions/` - Submission files
**Privacy**: All data is stored locally in your project. You control what is shared.
## Data Retention
**Local Storage**: Data persists until you:
- Run `tito community leave` (removes profile)
- Delete `.tinytorch/` directory
- Remove specific files manually
**Website Sync** (when enabled):
- Data synced to website is retained according to website privacy policy
- You can request deletion via `tito community leave`
- Local data is always removed immediately
## User Rights
**Right to Access**: View your data with `tito community profile`
**Right to Update**: Update your data with `tito community update`
**Right to Deletion**: Remove your data with `tito community leave`
**Right to Opt-Out**: All data collection is optional. You can:
- Skip fields during `tito community join`
- Leave community anytime with `tito community leave`
- Never join community (all features work without joining)
## Consent
**Explicit Consent**: When joining, you'll see:
- What data is collected
- Why it's collected
- How it's stored
- Consent prompt before collection
**Withdrawal**: You can withdraw consent anytime by leaving the community.
## Website Integration
**Current**: Website integration is **disabled by default**. All data stays local.
**Future**: When website integration is enabled:
- You'll be notified before syncing
- You can opt-out of website sync
- Local data remains your primary copy
## Security
**Local Storage**: Files are stored as plain JSON in your project directory.
**Recommendations**:
- Don't commit `.tinytorch/` to public repositories if you include institution info
- Use `.gitignore` to exclude community data if desired
- Keep your project directory secure
## Compliance
**GDPR**: Our design aligns with GDPR principles:
- ✅ Data minimization (only optional fields)
- ✅ Purpose limitation (community map only)
- ✅ User consent (explicit opt-in)
- ✅ Right to deletion (`tito community leave`)
- ✅ Data portability (JSON files)
**FERPA**: For educational institutions:
- No student names collected
- Anonymous identifiers only
- Institution-level aggregation (not individual)
## Questions?
For privacy questions or concerns:
- Review your data: `tito community profile`
- Remove your data: `tito community leave`
- Check configuration: `.tinytorch/config.json`

View File

@@ -31,10 +31,10 @@ tito system doctor
### 2⃣ **Start Your First Module**
```bash
# View the first module
tito module view 01_setup
tito module view 01_tensor
# Or open the notebook directly
jupyter notebook modules/01_setup/setup_dev.ipynb
jupyter notebook modules/01_tensor/tensor_dev.py
```
## 📚 Learning Path
@@ -44,7 +44,7 @@ Each module builds on the previous one:
| Module | What You'll Build | Capability Unlocked |
|--------|------------------|---------------------|
| 01 Setup | Development environment | Configure TinyTorch |
| 01 Tensor | Core data structure | Manipulate ML building blocks |
| 02 Tensor | Core data structure | Manipulate ML building blocks |
| 03 Activations | Non-linearity functions | Add intelligence to networks |
| 04 Layers | Neural network layers | Build network components |

282
docs/TEAM_ONBOARDING.md Normal file
View File

@@ -0,0 +1,282 @@
# Team Onboarding Guide: TinyTorch for Industry
Complete guide for using TinyTorch in industry settings: new hire bootcamps, internal training programs, and debugging workshops.
## 🎯 Overview
TinyTorch's **Model 3: Team Onboarding** addresses industry use cases where ML teams want members to understand PyTorch internals. This guide covers deployment scenarios, training structures, and best practices for industry adoption.
## 🚀 Use Cases
### 1. New Hire Bootcamps (2-3 Week Intensive)
**Goal**: Rapidly onboard new ML engineers to understand framework internals
**Structure**:
- **Week 1**: Foundation Tier (Modules 01-07)
- Tensors, autograd, optimizers, training loops
- Focus: Understanding `loss.backward()` mechanics
- **Week 2**: Architecture Tier (Modules 08-13)
- CNNs, transformers, attention mechanisms
- Focus: Production architecture internals
- **Week 3**: Optimization Tier (Modules 14-19) OR Capstone
- Profiling, quantization, compression
- Focus: Production optimization techniques
**Schedule**:
- Full-time: 40 hours/week
- Hands-on coding: 70% of time
- Systems discussions: 30% of time
- Daily standups and code reviews
**Deliverables**:
- Completed modules with passing tests
- Capstone project (optional)
- Technical presentation on framework internals
### 2. Internal Training Programs (Distributed Over Quarters)
**Goal**: Deep understanding of ML systems for existing team members
**Structure**:
- **Quarter 1**: Foundation (Modules 01-07)
- Weekly sessions: 2-3 hours
- Self-paced module completion
- Monthly group discussions
- **Quarter 2**: Architecture (Modules 08-13)
- Weekly sessions: 2-3 hours
- Architecture deep-dives
- Production case studies
- **Quarter 3**: Optimization (Modules 14-19)
- Weekly sessions: 2-3 hours
- Performance optimization focus
- Real production optimization projects
**Benefits**:
- Fits into existing work schedules
- Allows deep learning without intensive time commitment
- Builds team knowledge gradually
- Enables peer learning
### 3. Debugging Workshops (Focused Modules)
**Goal**: Targeted understanding of specific framework components
**Common Focus Areas**:
#### Autograd Debugging Workshop (Module 05)
- Understanding gradient flow
- Debugging gradient issues
- Computational graph visualization
- **Duration**: 1-2 days
#### Attention Mechanism Workshop (Module 12)
- Understanding attention internals
- Debugging attention scaling issues
- Memory optimization for attention
- **Duration**: 1-2 days
#### Optimization Workshop (Modules 14-19)
- Profiling production models
- Quantization and compression
- Performance optimization strategies
- **Duration**: 2-3 days
## 🏗️ Deployment Scenarios
### Scenario 1: Cloud-Based Training (Recommended)
**Setup**: Google Colab or JupyterHub
- Zero local installation
- Consistent environment
- Easy sharing and collaboration
- **Best for**: Large teams, remote workers
**Steps**:
1. Clone repository to Colab
2. Install dependencies: `pip install -e .`
3. Work through modules
4. Share notebooks via Colab links
### Scenario 2: Local Development Environment
**Setup**: Local Python environment
- Full control over environment
- Better for debugging
- Offline capability
- **Best for**: Smaller teams, on-site training
**Steps**:
1. Clone repository locally
2. Set up virtual environment
3. Install: `pip install -e .`
4. Use JupyterLab for development
### Scenario 3: Hybrid Approach
**Setup**: Colab for learning, local for projects
- Learn in cloud environment
- Apply locally for projects
- **Best for**: Flexible teams
## 📋 Training Program Templates
### Template 1: 2-Week Intensive Bootcamp
**Week 1: Foundation**
- Day 1-2: Modules 01-02 (Tensor, Activations)
- Day 3-4: Modules 03-04 (Layers, Losses)
- Day 5: Module 05 (Autograd) - Full day focus
- Weekend: Review and practice
**Week 2: Architecture + Optimization**
- Day 1-2: Modules 08-09 (DataLoader, CNNs)
- Day 3: Module 12 (Attention)
- Day 4-5: Modules 14-15 (Profiling, Quantization)
- Final: Capstone project presentation
### Template 2: 3-Month Distributed Program
**Month 1: Foundation**
- Week 1: Modules 01-02
- Week 2: Modules 03-04
- Week 3: Module 05 (Autograd)
- Week 4: Modules 06-07 (Optimizers, Training)
**Month 2: Architecture**
- Week 1: Modules 08-09
- Week 2: Modules 10-11
- Week 3: Modules 12-13
- Week 4: Integration project
**Month 3: Optimization**
- Week 1: Modules 14-15
- Week 2: Modules 16-17
- Week 3: Modules 18-19
- Week 4: Capstone optimization project
## 🎓 Learning Outcomes
After completing TinyTorch onboarding, team members will:
1. **Understand Framework Internals**
- How autograd works
- Memory allocation patterns
- Optimization trade-offs
2. **Debug Production Issues**
- Gradient flow problems
- Memory bottlenecks
- Performance issues
3. **Make Informed Decisions**
- Optimizer selection
- Architecture choices
- Deployment strategies
4. **Read Production Code**
- Understand PyTorch source
- Navigate framework codebases
- Contribute to ML infrastructure
## 🔧 Integration with Existing Workflows
### Code Review Integration
- Review production code with TinyTorch knowledge
- Identify framework internals in production code
- Suggest optimizations based on systems understanding
### Debugging Integration
- Apply TinyTorch debugging strategies to production issues
- Use systems thinking for troubleshooting
- Profile production models using TinyTorch techniques
### Architecture Design
- Design new models with systems awareness
- Consider memory and performance from the start
- Make informed trade-offs
## 📊 Success Metrics
### Individual Metrics
- Module completion rate
- Test passing rate
- Capstone project quality
- Self-reported confidence increase
### Team Metrics
- Reduced debugging time
- Fewer production incidents
- Improved code review quality
- Better architecture decisions
## 🛠️ Setup for Teams
### Quick Start
```bash
# 1. Clone repository
git clone https://github.com/mlsysbook/TinyTorch.git
cd TinyTorch
# 2. Set up environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
pip install -e .
# 4. Verify setup
tito system doctor
# 5. Start with Module 01
tito view 01_tensor
```
### Team-Specific Customization
- **Custom datasets**: Replace with company-specific data
- **Domain modules**: Add modules for specific use cases
- **Integration**: Connect to company ML infrastructure
- **Assessment**: Customize grading for team needs
## 📚 Resources
- **Student Quickstart**: `docs/STUDENT_QUICKSTART.md`
- **Instructor Guide**: `INSTRUCTOR.md` (for training leads)
- **TA Guide**: `TA_GUIDE.md` (for support staff)
- **Module Documentation**: `modules/*/ABOUT.md`
## 💼 Industry Case Studies
### Case Study 1: ML Infrastructure Team
**Challenge**: Team members could use PyTorch but couldn't debug framework issues
**Solution**: 2-week intensive bootcamp focusing on autograd and optimization
**Result**: 50% reduction in debugging time, better architecture decisions
### Case Study 2: Research Team
**Challenge**: Researchers needed to understand transformer internals
**Solution**: Focused workshop on Modules 12-13 (Attention, Transformers)
**Result**: Improved model designs, better understanding of scaling
### Case Study 3: Production ML Team
**Challenge**: Team needed optimization skills for deployment
**Solution**: 3-month program focusing on Optimization Tier (Modules 14-19)
**Result**: 4x model compression, 10x speedup on production models
## 🎯 Next Steps
1. **Choose deployment model**: Bootcamp, distributed, or workshop
2. **Set up environment**: Cloud (Colab) or local
3. **Select modules**: Full curriculum or focused selection
4. **Schedule training**: Intensive or distributed
5. **Track progress**: Use checkpoint system or custom metrics
---
**For Questions**: See `INSTRUCTOR.md` or contact TinyTorch maintainers

View File

@@ -21,6 +21,9 @@ help:
html:
@echo "🌐 Building HTML version..."
@echo "📓 Preparing notebooks for launch buttons..."
@./prepare_notebooks.sh || echo "⚠️ Notebook preparation skipped (tito not available)"
@echo ""
jupyter-book build .
pdf:

View File

@@ -56,6 +56,7 @@ html:
- _static/ml-timeline.js
- _static/hero-carousel.js
- _static/sidebar-link.js
- _static/marimo-badges.js
# Favicon configuration
favicon: "_static/favicon.svg"

Binary file not shown.

Before

Width:  |  Height:  |  Size: 132 B

After

Width:  |  Height:  |  Size: 8.0 MiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 132 B

After

Width:  |  Height:  |  Size: 1.4 MiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 131 B

After

Width:  |  Height:  |  Size: 193 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 132 B

After

Width:  |  Height:  |  Size: 3.6 MiB

View File

@@ -0,0 +1,107 @@
/**
* Marimo Badge Integration for TinyTorch
* Adds Marimo "Open in Marimo" badges to notebook pages
*/
document.addEventListener('DOMContentLoaded', function() {
// Find all notebook pages (they have launch buttons)
const launchButtons = document.querySelectorAll('.launch-buttons, .jb-launch-buttons');
if (launchButtons.length === 0) return;
// Add informational message about local setup requirement
const infoMessage = document.createElement('div');
infoMessage.className = 'notebook-platform-info';
infoMessage.style.cssText = `
margin: 1rem 0;
padding: 1rem;
background: #fff3cd;
border-left: 4px solid #ffc107;
border-radius: 0.25rem;
font-size: 0.9rem;
color: #856404;
`;
infoMessage.innerHTML = `
<strong>💡 Note:</strong> These online notebooks are for <strong>viewing and exploration only</strong>.
To actually build modules, run milestone validations, and use the full TinyTorch package,
you need <a href="../quickstart-guide.html" style="color: #856404; text-decoration: underline; font-weight: 600;">local setup</a>.
`;
// Get the current page path to construct marimo URL
const currentPath = window.location.pathname;
const notebookName = currentPath.split('/').pop().replace('.html', '');
// Find the repository info from the page
const repoUrl = 'https://github.com/mlsysbook/TinyTorch';
const repoPath = 'mlsysbook/TinyTorch';
const branch = 'main';
// Construct marimo molab URL
// Marimo can open .ipynb files directly from GitHub
// Format: https://marimo.app/molab?repo=owner/repo&path=path/to/file.ipynb
// Works for all modules: 01_tensor, 02_activations, etc.
const marimoUrl = `https://marimo.app/molab?repo=${repoPath}&path=site/chapters/modules/${notebookName}.ipynb`;
// Create marimo badge
const marimoBadge = document.createElement('div');
marimoBadge.className = 'marimo-launch-badge';
marimoBadge.style.cssText = `
margin-top: 1rem;
padding: 0.75rem;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
border-radius: 0.5rem;
text-align: center;
`;
const marimoLink = document.createElement('a');
marimoLink.href = marimoUrl;
marimoLink.target = '_blank';
marimoLink.rel = 'noopener noreferrer';
marimoLink.style.cssText = `
color: white;
text-decoration: none;
font-weight: 600;
display: inline-flex;
align-items: center;
gap: 0.5rem;
`;
marimoLink.innerHTML = `
<span>🍃</span>
<span>Open in Marimo</span>
<span style="font-size: 0.85em;">→</span>
`;
marimoBadge.appendChild(marimoLink);
// Add info message and marimo badge after launch buttons
launchButtons.forEach(buttonContainer => {
// Add info message first (if not already present)
if (!buttonContainer.querySelector('.notebook-platform-info')) {
buttonContainer.appendChild(infoMessage.cloneNode(true));
}
// Check if marimo badge already exists
if (!buttonContainer.querySelector('.marimo-launch-badge')) {
buttonContainer.appendChild(marimoBadge.cloneNode(true));
}
});
// Also add to any existing launch button sections
const launchSections = document.querySelectorAll('[class*="launch"], [id*="launch"]');
launchSections.forEach(section => {
// Add info message if not present
if (!section.querySelector('.notebook-platform-info')) {
const infoClone = infoMessage.cloneNode(true);
infoClone.style.marginTop = '1rem';
section.appendChild(infoClone);
}
// Add marimo badge if not present
if (!section.querySelector('.marimo-launch-badge')) {
const badgeClone = marimoBadge.cloneNode(true);
badgeClone.style.marginTop = '1rem';
section.appendChild(badgeClone);
}
});
});

View File

@@ -14,6 +14,12 @@ parts:
title: "Student Workflow"
- file: usage-paths/classroom-use
title: "For Instructors"
- file: instructor-guide
title: "Instructor Guide"
- file: usage-paths/ta-guide
title: "TA Guide"
- file: usage-paths/team-onboarding
title: "Team Onboarding"
# Tier captions: Added emojis for visual consistency and quick recognition
# Foundation (🏗), Architecture (🏛️), Optimization (⏱️), Capstone (🏅)

View File

@@ -42,6 +42,10 @@ echo "🧹 Cleaning previous builds..."
jupyter-book clean . --all || true
echo ""
# Prepare notebooks (for consistency, though PDF doesn't need launch buttons)
echo "📓 Preparing notebooks..."
./prepare_notebooks.sh || echo "⚠️ Notebook preparation skipped"
# Build PDF via LaTeX
echo "📚 Building LaTeX/PDF (this may take a few minutes)..."
jupyter-book build . --builder pdflatex

View File

@@ -39,6 +39,10 @@ echo "🧹 Cleaning previous builds..."
jupyter-book clean . --all || true
echo ""
# Prepare notebooks (for consistency, though PDF doesn't need launch buttons)
echo "📓 Preparing notebooks..."
./prepare_notebooks.sh || echo "⚠️ Notebook preparation skipped"
# Build PDF via HTML
echo "📚 Building PDF from HTML (this may take a few minutes)..."
echo " First run will download Chromium browser (~170MB)"

File diff suppressed because it is too large Load Diff

View File

@@ -61,21 +61,63 @@ Real-time chat and study groups:
- Office hours with educators
- Project showcase channels
### Community Dashboard (Planned)
### Community Dashboard (Available Now ✅)
Track global learning progress:
- Real-time completion statistics
- Geographic distribution of learners
- Milestone achievement tracking
- Study partner matching
Join the global TinyTorch community and see your progress:
### Torch Olympics Leaderboard (Planned)
```bash
# Join the community
tito community join
Compete in ML systems challenges:
- Performance benchmarks
- Memory efficiency competitions
- Innovation showcases
- Community recognition
# View your profile
tito community profile
# Update your progress
tito community update
# View community statistics
tito community stats
```
**Features:**
- **Anonymous profiles** - Join with optional information (country, institution, course type)
- **Cohort identification** - See your cohort (Fall 2024, Spring 2025, etc.)
- **Progress tracking** - Automatic milestone and module completion tracking
- **Privacy-first** - All data stored locally in `.tinytorch/` directory
- **Opt-in sharing** - You control what information to share
**Privacy:** All fields are optional. We use anonymous UUIDs (no personal names). Data is stored locally in your project directory. See [Privacy Policy](../docs/PRIVACY_DATA_RETENTION.md) for details.
### Benchmark & Performance Tracking (Available Now ✅)
Validate your setup and track performance improvements:
```bash
# Quick setup validation (after initial setup)
tito benchmark baseline
# Full capstone benchmarks (after Module 20)
tito benchmark capstone
# Submit results to community (optional)
# Prompts automatically after benchmarks complete
```
**Baseline Benchmark:**
- Validates your setup is working correctly
- Quick "Hello World" moment after setup
- Tests: tensor operations, matrix multiply, forward pass
- Generates score (0-100) and saves results locally
**Capstone Benchmark:**
- Full performance evaluation after Module 20
- Tracks: speed, compression, accuracy, efficiency
- Uses Module 19's Benchmark harness for statistical rigor
- Generates comprehensive results for submission
**Submission:** After benchmarks complete, you'll be prompted to submit results (optional). Submissions are saved locally and can be shared with the community.
See [TITO CLI Reference](tito/overview.html) for complete command documentation.
---

578
site/instructor-guide.md Normal file
View File

@@ -0,0 +1,578 @@
# 👩‍🏫 TinyTorch Instructor Guide
Complete guide for teaching ML Systems Engineering with TinyTorch.
## 🎯 Course Overview
TinyTorch teaches ML systems engineering through building, not just using. Students construct a complete ML framework from tensors to transformers, understanding memory, performance, and scaling at each step.
## 🛠️ Instructor Setup
### **1. Initial Setup**
```bash
# Clone and setup
git clone https://github.com/MLSysBook/TinyTorch.git
cd TinyTorch
# Virtual environment (MANDATORY)
python -m venv .venv
source .venv/bin/activate
# Install with instructor tools
pip install -r requirements.txt
pip install nbgrader
# Setup grading infrastructure
tito grade setup
```
### **2. Verify Installation**
```bash
tito system doctor
# Should show all green checkmarks
tito grade
# Should show available grade commands
```
## 📝 Assignment Workflow
### **Simplified with Tito CLI**
We've wrapped NBGrader behind simple `tito grade` commands so you don't need to learn NBGrader's complex interface.
### **1. Prepare Assignments**
```bash
# Generate instructor version (with solutions)
tito grade generate 01_tensor
# Create student version (solutions removed)
tito grade release 01_tensor
# Student version will be in: release/tinytorch/01_tensor/
```
### **2. Distribute to Students**
```bash
# Option A: GitHub Classroom (recommended)
# 1. Create assignment repository from TinyTorch
# 2. Remove solutions from modules
# 3. Students clone and work
# Option B: Direct distribution
# Share the release/ directory contents
```
### **3. Collect Submissions**
```bash
# Collect all students
tito grade collect 01_tensor
# Or specific student
tito grade collect 01_tensor --student student_id
```
### **4. Auto-Grade**
```bash
# Grade all submissions
tito grade autograde 01_tensor
# Grade specific student
tito grade autograde 01_tensor --student student_id
```
### **5. Manual Review**
```bash
# Open grading interface (browser-based)
tito grade manual 01_tensor
# This launches a web interface for:
# - Reviewing ML Systems question responses
# - Adding feedback comments
# - Adjusting auto-grades
```
### **6. Generate Feedback**
```bash
# Create feedback files for students
tito grade feedback 01_tensor
```
### **7. Export Grades**
```bash
# Export all grades to CSV
tito grade export
# Or specific module
tito grade export --module 01_tensor --output grades_module01.csv
```
## 📊 Grading Components
### **Auto-Graded (70%)**
- Code implementation correctness
- Test passing
- Function signatures
- Output validation
### **Manually Graded (30%)**
- ML Systems Thinking questions (3 per module)
- Each question: 10 points
- Focus on understanding, not perfection
### **Grading Rubric for ML Systems Questions**
| Points | Criteria |
|--------|----------|
| 9-10 | Demonstrates deep understanding, references specific code, discusses systems implications |
| 7-8 | Good understanding, some code references, basic systems thinking |
| 5-6 | Surface understanding, generic response, limited systems perspective |
| 3-4 | Attempted but misses key concepts |
| 0-2 | No attempt or completely off-topic |
**What to Look For:**
- References to actual implemented code
- Memory/performance analysis
- Scaling considerations
- Production system comparisons
- Understanding of trade-offs
## 📋 Sample Solutions for Grading Calibration
This section provides sample solutions to help calibrate grading standards. Use these as reference points when evaluating student submissions.
### Module 01: Tensor - Memory Footprint
**Excellent Solution (9-10 points)**:
```python
def memory_footprint(self):
"""Calculate tensor memory in bytes."""
return self.data.nbytes
```
**Why Excellent**:
- Concise and correct
- Uses NumPy's built-in `nbytes` property
- Clear docstring
- Handles all tensor shapes correctly
**Good Solution (7-8 points)**:
```python
def memory_footprint(self):
"""Calculate memory usage."""
return np.prod(self.data.shape) * self.data.dtype.itemsize
```
**Why Good**:
- Correct implementation
- Manually calculates (shows understanding)
- Works but less efficient than using `nbytes`
- Minor: docstring could be more specific
**Acceptable Solution (5-6 points)**:
```python
def memory_footprint(self):
size = 1
for dim in self.data.shape:
size *= dim
return size * 4 # Assumes float32
```
**Why Acceptable**:
- Correct logic but hardcoded dtype size
- Works for float32 but fails for other dtypes
- Shows understanding of memory calculation
- Missing proper dtype handling
### Module 05: Autograd - Backward Pass
**Excellent Solution (9-10 points)**:
```python
def backward(self, gradient=None):
"""Backward pass through computational graph."""
if gradient is None:
gradient = np.ones_like(self.data)
self.grad = gradient
if self.grad_fn is not None:
# Compute gradients for inputs
input_grads = self.grad_fn.backward(gradient)
# Propagate to input tensors
if isinstance(input_grads, tuple):
for input_tensor, input_grad in zip(self.grad_fn.inputs, input_grads):
if input_tensor.requires_grad:
input_tensor.backward(input_grad)
else:
if self.grad_fn.inputs[0].requires_grad:
self.grad_fn.inputs[0].backward(input_grads)
```
**Why Excellent**:
- Handles both scalar and tensor gradients
- Properly checks `requires_grad` before propagating
- Handles tuple returns from grad_fn
- Clear variable names and structure
**Good Solution (7-8 points)**:
```python
def backward(self, gradient=None):
if gradient is None:
gradient = np.ones_like(self.data)
self.grad = gradient
if self.grad_fn:
grads = self.grad_fn.backward(gradient)
for inp, grad in zip(self.grad_fn.inputs, grads):
inp.backward(grad)
```
**Why Good**:
- Correct logic
- Missing `requires_grad` check (minor issue)
- Assumes grads is always iterable (may fail for single input)
- Works for most cases but less robust
**Acceptable Solution (5-6 points)**:
```python
def backward(self, grad):
self.grad = grad
if self.grad_fn:
self.grad_fn.inputs[0].backward(self.grad_fn.backward(grad))
```
**Why Acceptable**:
- Basic backward pass works
- Only handles single input (fails for multi-input operations)
- Missing None gradient handling
- Shows understanding but incomplete
### Module 09: Spatial - Convolution Implementation
**Excellent Solution (9-10 points)**:
```python
def forward(self, x):
"""Forward pass with explicit loops for clarity."""
batch_size, in_channels, height, width = x.shape
out_height = (height - self.kernel_size + 2 * self.padding) // self.stride + 1
out_width = (width - self.kernel_size + 2 * self.padding) // self.stride + 1
output = np.zeros((batch_size, self.out_channels, out_height, out_width))
# Apply padding
if self.padding > 0:
x = np.pad(x, ((0, 0), (0, 0), (self.padding, self.padding),
(self.padding, self.padding)), mode='constant')
# Explicit convolution loops
for b in range(batch_size):
for oc in range(self.out_channels):
for oh in range(out_height):
for ow in range(out_width):
h_start = oh * self.stride
w_start = ow * self.stride
h_end = h_start + self.kernel_size
w_end = w_start + self.kernel_size
window = x[b, :, h_start:h_end, w_start:w_end]
output[b, oc, oh, ow] = np.sum(
window * self.weight[oc] + self.bias[oc]
)
return Tensor(output, requires_grad=x.requires_grad)
```
**Why Excellent**:
- Clear output shape calculation
- Proper padding handling
- Explicit loops make O(kernel_size²) complexity visible
- Correct gradient tracking setup
- Well-structured and readable
**Good Solution (7-8 points)**:
```python
def forward(self, x):
B, C, H, W = x.shape
out_h = (H - self.kernel_size) // self.stride + 1
out_w = (W - self.kernel_size) // self.stride + 1
out = np.zeros((B, self.out_channels, out_h, out_w))
for b in range(B):
for oc in range(self.out_channels):
for i in range(out_h):
for j in range(out_w):
h = i * self.stride
w = j * self.stride
out[b, oc, i, j] = np.sum(
x[b, :, h:h+self.kernel_size, w:w+self.kernel_size]
* self.weight[oc]
) + self.bias[oc]
return Tensor(out)
```
**Why Good**:
- Correct implementation
- Missing padding support (works only for padding=0)
- Less clear variable names
- Missing requires_grad propagation
**Acceptable Solution (5-6 points)**:
```python
def forward(self, x):
out = np.zeros((x.shape[0], self.out_channels, x.shape[2]-2, x.shape[3]-2))
for b in range(x.shape[0]):
for c in range(self.out_channels):
for i in range(out.shape[2]):
for j in range(out.shape[3]):
out[b, c, i, j] = np.sum(x[b, :, i:i+3, j:j+3] * self.weight[c])
return Tensor(out)
```
**Why Acceptable**:
- Basic convolution works
- Hardcoded kernel_size=3 (not general)
- No stride or padding support
- Shows understanding but incomplete
### Module 12: Attention - Scaled Dot-Product Attention
**Excellent Solution (9-10 points)**:
```python
def forward(self, query, key, value, mask=None):
"""Scaled dot-product attention with numerical stability."""
# Compute attention scores
scores = np.dot(query, key.T) / np.sqrt(self.d_k)
# Apply mask if provided
if mask is not None:
scores = np.where(mask, scores, -1e9)
# Softmax with numerical stability
exp_scores = np.exp(scores - np.max(scores, axis=-1, keepdims=True))
attention_weights = exp_scores / np.sum(exp_scores, axis=-1, keepdims=True)
# Apply attention to values
output = np.dot(attention_weights, value)
return output, attention_weights
```
**Why Excellent**:
- Proper scaling factor (1/√d_k)
- Numerical stability with max subtraction
- Mask handling
- Returns both output and attention weights
- Clear and well-documented
**Good Solution (7-8 points)**:
```python
def forward(self, q, k, v):
scores = np.dot(q, k.T) / np.sqrt(q.shape[-1])
weights = np.exp(scores) / np.sum(np.exp(scores), axis=-1, keepdims=True)
return np.dot(weights, v)
```
**Why Good**:
- Correct implementation
- Missing numerical stability (may overflow)
- Missing mask support
- Works but less robust
**Acceptable Solution (5-6 points)**:
```python
def forward(self, q, k, v):
scores = np.dot(q, k.T)
weights = np.exp(scores) / np.sum(np.exp(scores))
return np.dot(weights, v)
```
**Why Acceptable**:
- Basic attention mechanism
- Missing scaling factor
- Missing numerical stability
- Incorrect softmax (should be per-row)
### Grading Guidelines Using Sample Solutions
**When Evaluating Student Code**:
1. **Correctness First**: Does it pass all tests?
- If no: Maximum 6 points (even if well-written)
- If yes: Proceed to quality evaluation
2. **Code Quality**:
- **Excellent (9-10)**: Production-ready, handles edge cases, well-documented
- **Good (7-8)**: Correct and functional, minor improvements possible
- **Acceptable (5-6)**: Works but incomplete or has issues
3. **Systems Thinking**:
- **Excellent**: Discusses memory, performance, scaling implications
- **Good**: Some systems awareness
- **Acceptable**: Focuses only on correctness
4. **Common Patterns**:
- Look for: Proper error handling, edge case consideration, documentation
- Red flags: Hardcoded values, missing checks, unclear variable names
**Remember**: These are calibration examples. Adjust based on your course level and learning objectives. The goal is consistent evaluation, not perfection.
## 📚 Module Teaching Notes
### **Module 01: Tensor**
- **Focus**: Memory layout, data structures
- **Key Concept**: Understanding memory is crucial for ML performance
- **Demo**: Show memory profiling, copying behavior
### **Module 02: Activations**
- **Focus**: Vectorization, numerical stability
- **Key Concept**: Small details matter at scale
- **Demo**: Gradient vanishing/exploding
### **Module 04-05: Layers & Networks**
- **Focus**: Composition, parameter management
- **Key Concept**: Building blocks combine into complex systems
- **Project**: Build a small CNN
### **Module 06-07: Spatial & Attention**
- **Focus**: Algorithmic complexity, memory patterns
- **Key Concept**: O(N²) operations become bottlenecks
- **Demo**: Profile attention memory usage
### **Module 08-11: Training Pipeline**
- **Focus**: End-to-end system integration
- **Key Concept**: Many components must work together
- **Project**: Train a real model
### **Module 12-15: Production**
- **Focus**: Deployment, optimization, monitoring
- **Key Concept**: Academic vs production requirements
- **Demo**: Model compression, deployment
### **Module 16: TinyGPT**
- **Focus**: Framework generalization
- **Key Concept**: 70% component reuse from vision to language
- **Capstone**: Build a working language model
## 🎯 Learning Objectives
By course end, students should be able to:
1. **Build** complete ML systems from scratch
2. **Analyze** memory usage and computational complexity
3. **Debug** performance bottlenecks
4. **Optimize** for production deployment
5. **Understand** framework design decisions
6. **Apply** systems thinking to ML problems
## 📈 Tracking Progress
### **Individual Progress**
```bash
# Check specific student progress
tito checkpoint status --student student_id
```
### **Class Overview**
```bash
# Export all checkpoint achievements
tito checkpoint export --output class_progress.csv
```
### **Identify Struggling Students**
Look for:
- Missing checkpoint achievements
- Low scores on ML Systems questions
- Incomplete module submissions
## 💡 Teaching Tips
### **1. Emphasize Building Over Theory**
- Have students type every line of code
- Run tests immediately after implementation
- Break and fix things intentionally
### **2. Connect to Production Systems**
- Show PyTorch/TensorFlow equivalents
- Discuss real-world bottlenecks
- Share production war stories
### **3. Make Performance Visible**
```python
# Use profilers liberally
with TimeProfiler("operation"):
result = expensive_operation()
# Show memory usage
print(f"Memory: {get_memory_usage():.2f} MB")
```
### **4. Encourage Systems Questions**
- "What would break at 1B parameters?"
- "How would you distributed this?"
- "What's the bottleneck here?"
## 🔧 Troubleshooting
### **Common Student Issues**
**Environment Problems**
```bash
# Student fix:
tito system doctor
tito system reset
```
**Module Import Errors**
```bash
# Rebuild package
tito export --all
```
**Test Failures**
```bash
# Detailed test output
tito module test MODULE --verbose
```
### **NBGrader Issues**
**Database Locked**
```bash
# Clear NBGrader database
rm gradebook.db
tito grade setup
```
**Missing Submissions**
```bash
# Check submission directory
ls submitted/*/MODULE/
```
## 📊 Sample Schedule (16 Weeks)
| Week | Module | Focus |
|------|--------|-------|
| 1 | 01 Tensor | Data Structures, Memory |
| 2 | 02 Activations | Non-linearity Functions |
| 3 | 03 Layers | Neural Network Components |
| 4 | 04 Losses | Optimization Objectives |
| 5 | 05 Autograd | Automatic Differentiation |
| 6 | 06 Optimizers | Training Algorithms |
| 7 | 07 Training | Complete Training Loop |
| 8 | Midterm Project | Build and Train Network |
| 9 | 08 DataLoader | Data Pipeline |
| 10 | 09 Spatial | Convolutions, CNNs |
| 11 | 10 Tokenization | Text Processing |
| 12 | 11 Embeddings | Word Representations |
| 13 | 12 Attention | Attention Mechanisms |
| 14 | 13 Transformers | Transformer Architecture |
| 15 | 14-19 Optimization | Profiling, Quantization, etc. |
| 16 | 20 Capstone | Torch Olympics Competition |
## 🎓 Assessment Strategy
### **Continuous Assessment (70%)**
- Module completion: 4% each × 16 = 64%
- Checkpoint achievements: 6%
### **Projects (30%)**
- Midterm: Build and train CNN (15%)
- Final: Extend TinyGPT (15%)
## 📚 Additional Resources
- [MLSys Book](https://mlsysbook.ai) - Companion textbook
- [Course Discussions](https://github.com/MLSysBook/TinyTorch/discussions)
- [Issue Tracker](https://github.com/MLSysBook/TinyTorch/issues)
---
**Need help? Open an issue or contact the TinyTorch team!**

77
site/prepare_notebooks.sh Executable file
View File

@@ -0,0 +1,77 @@
#!/bin/bash
# Prepare notebooks for site build
# This script ensures notebooks exist in site/ for launch buttons to work
# Called automatically during site build
#
# Workflow:
# 1. Uses existing assignment notebooks if available (from tito nbgrader generate)
# 2. Falls back to generating notebooks from modules if needed
# 3. Copies notebooks to site/chapters/modules/ for Jupyter Book launch buttons
set -e
# Get the site directory (where this script lives)
SITE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SITE_DIR/.." && pwd)"
echo "📓 Preparing notebooks for site build..."
# Create notebooks directory in site if it doesn't exist
NOTEBOOKS_DIR="$SITE_DIR/chapters/modules"
mkdir -p "$NOTEBOOKS_DIR"
cd "$REPO_ROOT"
# Strategy: Use existing assignment notebooks if available, otherwise generate
# This is faster and uses already-processed notebooks
echo "🔄 Looking for existing assignment notebooks..."
MODULES=$(ls -1 modules/ 2>/dev/null | grep -E "^[0-9]" | sort -V || echo "")
if [ -z "$MODULES" ]; then
echo "⚠️ No modules found. Skipping notebook preparation."
exit 0
fi
NOTEBOOKS_COPIED=0
NOTEBOOKS_GENERATED=0
for module in $MODULES; do
TARGET_NB="$NOTEBOOKS_DIR/${module}.ipynb"
# Check if assignment notebook already exists
ASSIGNMENT_NB="$REPO_ROOT/assignments/source/$module/${module}.ipynb"
if [ -f "$ASSIGNMENT_NB" ]; then
# Use existing assignment notebook
cp "$ASSIGNMENT_NB" "$TARGET_NB"
echo " ✅ Copied existing notebook: $module"
NOTEBOOKS_COPIED=$((NOTEBOOKS_COPIED + 1))
elif command -v tito &> /dev/null; then
# Try to generate notebook if tito is available
echo " 🔄 Generating notebook for $module..."
if tito nbgrader generate "$module" >/dev/null 2>&1; then
if [ -f "$ASSIGNMENT_NB" ]; then
cp "$ASSIGNMENT_NB" "$TARGET_NB"
echo " ✅ Generated and copied: $module"
NOTEBOOKS_GENERATED=$((NOTEBOOKS_GENERATED + 1))
fi
else
echo " ⚠️ Could not generate notebook for $module (module may not be ready)"
fi
else
echo " ⚠️ No notebook found for $module (install tito CLI to generate)"
fi
done
echo ""
if [ $NOTEBOOKS_COPIED -gt 0 ] || [ $NOTEBOOKS_GENERATED -gt 0 ]; then
echo "✅ Notebook preparation complete!"
echo " Copied: $NOTEBOOKS_COPIED | Generated: $NOTEBOOKS_GENERATED"
echo " Notebooks available in: $NOTEBOOKS_DIR"
echo " Launch buttons will now work on notebook pages!"
else
echo "⚠️ No notebooks prepared. Launch buttons may not appear."
echo " Run 'tito nbgrader generate --all' first to create assignment notebooks."
fi

View File

@@ -50,6 +50,34 @@ See [Module Workflow](tito/modules.md) for detailed commands and [Troubleshootin
</div>
<div style="background: #e3f2fd; padding: 1.5rem; border-radius: 0.5rem; border-left: 4px solid #2196f3; margin: 1.5rem 0;">
<h4 style="margin: 0 0 1rem 0; color: #1976d2;">Step 3: Join the Community & Benchmark</h4>
After setup, join the global TinyTorch community and validate your setup:
```bash
# Join the community (optional)
tito community join
# Run baseline benchmark to validate setup
tito benchmark baseline
```
**Community Features:**
- Join with optional information (country, institution, course type)
- Track your progress automatically
- See your cohort (Fall 2024, Spring 2025, etc.)
- All data stored locally in `.tinytorch/` directory
**Baseline Benchmark:**
- Quick validation that everything works
- Your "Hello World" moment!
- Generates score and saves results locally
See [Community Guide](community.html) for complete features.
</div>
## 15-Minute First Module Walkthrough
Let's build your first neural network component following the **TinyTorch workflow**:
@@ -217,7 +245,11 @@ In 15 minutes, you've:
- See [TITO CLI Reference](tito/overview.md) for complete command reference
**For Instructors:**
- See [Classroom Setup Guide](usage-paths/classroom-use.md) for [NBGrader](https://nbgrader.readthedocs.io/) integration (coming soon)
- See [Classroom Setup Guide](usage-paths/classroom-use.md) for [NBGrader](https://nbgrader.readthedocs.io/) integration
**Notebook Platforms:**
- **Online (Viewing)**: Jupyter/MyBinder, Google Colab, Marimo - great for exploring notebooks
- **⚠️ Important**: Online notebooks are for **viewing only**. For full package experiments, milestone validation, and CLI tools, you need **local installation** (see [Student Workflow](student-workflow.md))
</div>

View File

@@ -1,8 +1,9 @@
# TinyTorch Course Dependencies for Binder/Colab
# TinyTorch Course Dependencies for Site Documentation Builds
# Note: For Binder/Colab environments, see binder/requirements.txt
# Keep synchronized with main requirements.txt
# Core numerical computing
numpy>=1.21.0,<2.0.0
numpy>=1.24.0,<3.0.0
matplotlib>=3.5.0
# Data handling

View File

@@ -135,6 +135,10 @@ tito checkpoint status
# System information
tito system info
# Join community and benchmark
tito community join
tito benchmark baseline
```
For complete command documentation, see [TITO CLI Reference](tito/overview.md).
@@ -149,9 +153,84 @@ tito checkpoint status # View completion tracking
This is helpful for self-assessment but **not required** for the core workflow. The essential cycle remains: edit → export → validate.
## Instructor Integration (Coming Soon)
## Notebook Platform Options
TinyTorch supports [NBGrader](https://nbgrader.readthedocs.io/) for classroom use. Documentation for instructors using the autograding features will be available in future releases.
TinyTorch notebooks work with multiple platforms, but **important distinction**:
### Online Notebooks (Viewing & Exploration)
- **Jupyter/MyBinder**: Click "Launch Binder" on any notebook page - great for viewing
- **Google Colab**: Click "Launch Colab" for GPU access - good for exploration
- **Marimo**: Click "🍃 Open in Marimo" for reactive notebooks - excellent for learning
**⚠️ Important**: Online notebooks are for **viewing and learning**. They don't have the full TinyTorch package installed, so you can't:
- Run milestone validation scripts
- Import from `tinytorch.*` modules
- Execute full experiments
- Use the complete CLI tools
### Local Setup (Required for Full Package)
**To actually build and experiment**, you need a **local installation**:
```bash
# Clone and setup locally
git clone https://github.com/mlsysbook/TinyTorch.git
cd TinyTorch
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pip install -e . # Install TinyTorch package
```
**Why local?**
- ✅ Full `tinytorch.*` package available
- ✅ Run milestone validation scripts
- ✅ Use `tito` CLI commands
- ✅ Execute complete experiments
- ✅ Export modules to package
- ✅ Full development workflow
**Note for NBGrader assignments**: Submit `.ipynb` files (not Marimo's `.py` format) to preserve grading metadata.
## Community & Benchmarking
### Join the Community
After completing setup, join the global TinyTorch community:
```bash
# Join with optional information
tito community join
# View your profile and progress
tito community profile
# Update your information
tito community update
```
**Privacy:** All information is optional. Data is stored locally in `.tinytorch/` directory. See [Community Guide](community.html) for details.
### Benchmark Your Progress
Validate your setup and track performance:
```bash
# Quick baseline benchmark (after setup)
tito benchmark baseline
# Full capstone benchmarks (after Module 20)
tito benchmark capstone --track all
```
**Baseline Benchmark:** Quick validation that your setup works correctly - your "Hello World" moment!
**Capstone Benchmark:** Full performance evaluation across speed, compression, accuracy, and efficiency tracks.
See [Community Guide](community.html) for complete community and benchmarking features.
## Instructor Integration
TinyTorch supports [NBGrader](https://nbgrader.readthedocs.io/) for classroom use. See the [Instructor Guide](usage-paths/classroom-use.md) for complete setup and grading workflows.
For now, focus on the student workflow: building your implementations and validating them with milestones.

View File

@@ -86,6 +86,31 @@
**See**: [Progress & Data Management](data.md) for complete details
### Community Commands
**Purpose**: Join the global TinyTorch community and track your progress
| Command | Description | Guide |
|---------|-------------|-------|
| `tito community join` | Join the community (optional info) | [Community Guide](../community.html) |
| `tito community update` | Update your community profile | [Community Guide](../community.html) |
| `tito community profile` | View your community profile | [Community Guide](../community.html) |
| `tito community stats` | View community statistics | [Community Guide](../community.html) |
| `tito community leave` | Remove your community profile | [Community Guide](../community.html) |
**See**: [Community Guide](../community.html) for complete details
### Benchmark Commands
**Purpose**: Validate setup and measure performance
| Command | Description | Guide |
|---------|-------------|-------|
| `tito benchmark baseline` | Quick setup validation ("Hello World") | [Community Guide](../community.html) |
| `tito benchmark capstone` | Full Module 20 performance evaluation | [Community Guide](../community.html) |
**See**: [Community Guide](../community.html) for complete details
---
## Command Groups by Task

View File

@@ -1,10 +1,8 @@
# TinyTorch for Instructors: Complete ML Systems Course
<div style="background: #fff3cd; border: 1px solid #ffc107; padding: 1.5rem; border-radius: 0.5rem; margin: 2rem 0;">
<h3 style="margin: 0 0 0.5rem 0; color: #856404;">🚧 Classroom Integration: Coming Soon</h3>
<p style="margin: 0; color: #856404;"><a href="https://nbgrader.readthedocs.io/" style="color: #856404; text-decoration: underline;">NBGrader</a> integration and instructor tooling are under active development. Full documentation and automated grading workflows will be available in future releases.</p>
<p style="margin: 0.5rem 0 0 0; color: #856404;"><strong>Currently available</strong>: Students can use TinyTorch with the standard workflow (edit modules → export → validate with milestones)</p>
<p style="margin: 0.5rem 0 0 0;"><a href="../student-workflow.html" style="color: #856404; font-weight: bold;">📖 See Student Workflow</a> for the current development cycle.</p>
<div style="background: #d4edda; border: 1px solid #28a745; padding: 1.5rem; border-radius: 0.5rem; margin: 2rem 0;">
<h3 style="margin: 0 0 0.5rem 0; color: #155724;"> Classroom Integration Available</h3>
<p style="margin: 0; color: #155724;">TinyTorch includes complete <a href="https://nbgrader.readthedocs.io/" style="color: #155724; text-decoration: underline; font-weight: bold;">NBGrader</a> integration with automated grading workflows. See the <a href="../instructor-guide.html" style="color: #155724; font-weight: bold;">Complete Instructor Guide</a> for setup, grading rubrics, and sample solutions.</p>
</div>
<div style="background: #e3f2fd; border: 1px solid #2196f3; padding: 1rem; border-radius: 0.5rem; margin: 1rem 0;">
@@ -36,7 +34,7 @@
</div>
<div>
<ul style="margin: 0; padding-left: 1rem;">
<li><strong>Complete instructor guide</strong> with setup & grading (coming soon)</li>
<li><strong>Complete instructor guide</strong> with setup & grading ([available now](../instructor-guide.html))</li>
<li><strong>Flexible pacing</strong> (14-18 weeks depending on depth)</li>
<li><strong>Industry practices</strong> (Git, testing, documentation)</li>
<li><strong>Academic foundation</strong> from university research</li>
@@ -48,7 +46,7 @@
**Planned Course Duration:** 14-16 weeks (flexible pacing)
**Student Outcome:** Complete ML framework supporting vision AND language models
**Current Status:** Students can work through modules individually using the standard workflow. Full classroom integration ([NBGrader](https://nbgrader.readthedocs.io/) automation, instructor dashboards) coming soon.
**Current Status:** Complete NBGrader integration available! See the [Instructor Guide](../instructor-guide.html) for setup, grading workflows, and sample solutions.
---
@@ -159,8 +157,8 @@ tito module status --comprehensive
<div style="background: white; padding: 1.5rem; border-radius: 0.5rem; border: 1px solid #dee2e6;">
<h4 style="color: #495057; margin: 0 0 0.5rem 0;">3⃣ First Assignment (10 min)</h4>
<div style="background: #f8f9fa; padding: 1rem; border-radius: 0.25rem; font-family: monospace; font-size: 0.85rem; margin: 0.5rem 0;">
tito nbgrader generate 01_setup<br>
tito nbgrader release 01_setup
tito nbgrader generate 01_tensor<br>
tito nbgrader release 01_tensor
</div>
<p style="font-size: 0.9rem; margin: 0; color: #6c757d;">Ready to distribute to students!</p>
</div>
@@ -169,6 +167,7 @@ tito nbgrader release 01_setup
<div style="text-align: center; margin-top: 1.5rem;">
<a href="../instructor-guide.html" style="display: inline-block; background: #007bff; color: white; padding: 0.5rem 1rem; border-radius: 0.25rem; text-decoration: none; font-weight: 500; margin-right: 1rem;">📖 Complete Instructor Guide</a>
<a href="ta-guide.html" style="display: inline-block; background: #28a745; color: white; padding: 0.5rem 1rem; border-radius: 0.25rem; text-decoration: none; font-weight: 500;">👥 TA Guide</a>
<a href="../testing-framework.html" style="display: inline-block; background: #28a745; color: white; padding: 0.5rem 1rem; border-radius: 0.25rem; text-decoration: none; font-weight: 500;">🧪 Testing Framework Guide</a>
</div>
@@ -197,7 +196,9 @@ tito nbgrader release 01_setup
## Instructor Resources
### Documentation
### Essential Documentation
- **[Complete Instructor Guide](../instructor-guide.md)** - 30-minute setup, grading rubrics, sample solutions, common errors
- **[TA Guide](ta-guide.md)** - Common student errors, debugging strategies, office hour patterns
- Module-specific teaching notes in each ABOUT.md file
- [Course Structure](../chapters/00-introduction.md) - Full curriculum overview
- [Student Workflow](../student-workflow.md) - Essential development cycle

View File

@@ -0,0 +1,264 @@
# Teaching Assistant Guide for TinyTorch
Complete guide for TAs supporting TinyTorch courses, covering common student errors, debugging strategies, and effective support techniques.
## 🎯 TA Preparation
### Critical Modules for Deep Familiarity
TAs should develop deep familiarity with modules where students commonly struggle:
1. **Module 05: Autograd** - Most conceptually challenging
2. **Module 09: CNNs (Spatial)** - Complex nested loops and memory patterns
3. **Module 13: Transformers** - Attention mechanisms and scaling
### Preparation Process
1. **Complete modules yourself** - Implement all three critical modules
2. **Introduce bugs intentionally** - Understand common error patterns
3. **Practice debugging** - Work through error scenarios
4. **Review student submissions** - Familiarize yourself with common mistakes
## 🐛 Common Student Errors
### Module 05: Autograd
#### Error 1: Gradient Shape Mismatches
**Symptom**: `ValueError: shapes don't match for gradient`
**Common Cause**: Incorrect gradient accumulation or shape handling
**Debugging Strategy**:
- Check gradient shapes match parameter shapes
- Verify gradient accumulation logic
- Look for broadcasting issues
**Example**:
```python
# Wrong: Gradient shape mismatch
param.grad = grad # grad might be wrong shape
# Right: Ensure shapes match
assert grad.shape == param.shape
param.grad = grad
```
#### Error 2: Disconnected Computational Graph
**Symptom**: Gradients are None or zero
**Common Cause**: Operations not tracked in computational graph
**Debugging Strategy**:
- Verify `requires_grad=True` on input tensors
- Check that operations create new Tensor objects
- Ensure backward() is called on leaf nodes
**Example**:
```python
# Wrong: Graph disconnected
x = Tensor([1, 2, 3]) # requires_grad=False by default
y = x * 2
y.backward() # No gradients!
# Right: Enable gradient tracking
x = Tensor([1, 2, 3], requires_grad=True)
y = x * 2
y.backward() # Gradients flow correctly
```
#### Error 3: Broadcasting Failures
**Symptom**: Shape errors during backward pass
**Common Cause**: Incorrect handling of broadcasted operations
**Debugging Strategy**:
- Understand NumPy broadcasting rules
- Check gradient accumulation for broadcasted dimensions
- Verify gradient shapes match original tensor shapes
### Module 09: CNNs (Spatial)
#### Error 1: Index Out of Bounds
**Symptom**: `IndexError` in convolution loops
**Common Cause**: Incorrect padding or stride calculations
**Debugging Strategy**:
- Verify output shape calculations
- Check padding logic
- Test with small examples first
#### Error 2: Memory Issues
**Symptom**: Out of memory errors
**Common Cause**: Creating unnecessary intermediate arrays
**Debugging Strategy**:
- Profile memory usage
- Look for unnecessary copies
- Optimize loop structure
### Module 13: Transformers
#### Error 1: Attention Scaling Issues
**Symptom**: Attention weights don't sum to 1
**Common Cause**: Missing softmax or incorrect scaling
**Debugging Strategy**:
- Verify softmax is applied
- Check scaling factor (1/sqrt(d_k))
- Test attention weights sum to 1
#### Error 2: Positional Encoding Errors
**Symptom**: Model doesn't learn positional information
**Common Cause**: Incorrect positional encoding implementation
**Debugging Strategy**:
- Verify sinusoidal patterns
- Check encoding is added correctly
- Test with simple sequences
## 🔧 Debugging Strategies
### Structured Debugging Questions
When students ask for help, guide them with questions rather than giving answers:
1. **What error message are you seeing?**
- Read the full traceback
- Identify the specific line causing the error
2. **What did you expect to happen?**
- Clarify their mental model
- Identify misconceptions
3. **What actually happened?**
- Compare expected vs actual
- Look for patterns
4. **What have you tried?**
- Avoid repeating failed approaches
- Build on their attempts
5. **Can you test with a simpler case?**
- Reduce complexity
- Isolate the problem
### Productive vs Unproductive Struggle
**Productive Struggle** (encourage):
- Trying different approaches
- Making incremental progress
- Understanding error messages
- Passing additional tests over time
**Unproductive Frustration** (intervene):
- Repeated identical errors
- Random code changes
- Unable to articulate the problem
- No progress after 30+ minutes
### When to Provide Scaffolding
Offer scaffolding modules when students reach unproductive frustration:
- **Before Autograd**: Numerical gradient checking module
- **Before Tensor Autograd**: Scalar autograd module
- **Before CNNs**: Simple 1D convolution exercises
## 📊 Office Hour Patterns
### Expected Demand Spikes
**Module 05 (Autograd)**: Highest demand
- Schedule additional TA capacity
- Pre-record debugging walkthroughs
- Create FAQ document
**Module 09 (CNNs)**: High demand
- Focus on memory profiling
- Loop optimization strategies
- Padding/stride calculations
**Module 13 (Transformers)**: Moderate-high demand
- Attention mechanism debugging
- Positional encoding issues
- Scaling problems
### Support Channels
1. **Synchronous**: Office hours, lab sessions
2. **Asynchronous**: Discussion forums, email
3. **Self-service**: Common errors documentation, FAQ
## 🎓 Grading Support
### Manual Review Focus Areas
While NBGrader automates 70-80% of assessment, focus manual review on:
1. **Code Clarity and Design Choices**
- Is code readable?
- Are design decisions justified?
- Is the implementation clean?
2. **Edge Case Handling**
- Does code handle edge cases?
- Are there appropriate checks?
- Is error handling present?
3. **Computational Complexity Analysis**
- Do students understand complexity?
- Can they analyze their code?
- Do they recognize bottlenecks?
4. **Memory Profiling Insights**
- Do students understand memory usage?
- Can they identify memory issues?
- Do they optimize appropriately?
### Grading Rubrics
See `INSTRUCTOR.md` for detailed grading rubrics for:
- ML Systems Thinking questions
- Code quality assessment
- Systems analysis evaluation
## 💡 Teaching Tips
### 1. Encourage Exploration
- Let students try different approaches
- Support learning from mistakes
- Celebrate incremental progress
### 2. Connect to Production
- Reference PyTorch equivalents
- Discuss real-world debugging scenarios
- Share production war stories
### 3. Make Systems Visible
- Profile memory usage together
- Analyze computational complexity
- Visualize computational graphs
### 4. Build Confidence
- Acknowledge when students are on the right track
- Validate their understanding
- Provide encouragement during struggle
## 📚 Resources
- **INSTRUCTOR.md**: Complete instructor guide with grading rubrics
- **Common Errors**: This document (expanded as needed)
- **Module Documentation**: Each module's ABOUT.md file
- **Student Forums**: Community discussion areas
## 🔄 Continuous Improvement
### Feedback Collection
- Track common errors in office hours
- Document new error patterns
- Update this guide regularly
- Share insights with instructor team
### TA Training
- Regular TA meetings
- Share debugging strategies
- Review student submissions together
- Practice debugging sessions
---
**Last Updated**: November 2024
**For Questions**: See INSTRUCTOR.md or contact course instructor

View File

@@ -0,0 +1,282 @@
# Team Onboarding Guide: TinyTorch for Industry
Complete guide for using TinyTorch in industry settings: new hire bootcamps, internal training programs, and debugging workshops.
## 🎯 Overview
TinyTorch's **Model 3: Team Onboarding** addresses industry use cases where ML teams want members to understand PyTorch internals. This guide covers deployment scenarios, training structures, and best practices for industry adoption.
## 🚀 Use Cases
### 1. New Hire Bootcamps (2-3 Week Intensive)
**Goal**: Rapidly onboard new ML engineers to understand framework internals
**Structure**:
- **Week 1**: Foundation Tier (Modules 01-07)
- Tensors, autograd, optimizers, training loops
- Focus: Understanding `loss.backward()` mechanics
- **Week 2**: Architecture Tier (Modules 08-13)
- CNNs, transformers, attention mechanisms
- Focus: Production architecture internals
- **Week 3**: Optimization Tier (Modules 14-19) OR Capstone
- Profiling, quantization, compression
- Focus: Production optimization techniques
**Schedule**:
- Full-time: 40 hours/week
- Hands-on coding: 70% of time
- Systems discussions: 30% of time
- Daily standups and code reviews
**Deliverables**:
- Completed modules with passing tests
- Capstone project (optional)
- Technical presentation on framework internals
### 2. Internal Training Programs (Distributed Over Quarters)
**Goal**: Deep understanding of ML systems for existing team members
**Structure**:
- **Quarter 1**: Foundation (Modules 01-07)
- Weekly sessions: 2-3 hours
- Self-paced module completion
- Monthly group discussions
- **Quarter 2**: Architecture (Modules 08-13)
- Weekly sessions: 2-3 hours
- Architecture deep-dives
- Production case studies
- **Quarter 3**: Optimization (Modules 14-19)
- Weekly sessions: 2-3 hours
- Performance optimization focus
- Real production optimization projects
**Benefits**:
- Fits into existing work schedules
- Allows deep learning without intensive time commitment
- Builds team knowledge gradually
- Enables peer learning
### 3. Debugging Workshops (Focused Modules)
**Goal**: Targeted understanding of specific framework components
**Common Focus Areas**:
#### Autograd Debugging Workshop (Module 05)
- Understanding gradient flow
- Debugging gradient issues
- Computational graph visualization
- **Duration**: 1-2 days
#### Attention Mechanism Workshop (Module 12)
- Understanding attention internals
- Debugging attention scaling issues
- Memory optimization for attention
- **Duration**: 1-2 days
#### Optimization Workshop (Modules 14-19)
- Profiling production models
- Quantization and compression
- Performance optimization strategies
- **Duration**: 2-3 days
## 🏗️ Deployment Scenarios
### Scenario 1: Cloud-Based Training (Recommended)
**Setup**: Google Colab or JupyterHub
- Zero local installation
- Consistent environment
- Easy sharing and collaboration
- **Best for**: Large teams, remote workers
**Steps**:
1. Clone repository to Colab
2. Install dependencies: `pip install -e .`
3. Work through modules
4. Share notebooks via Colab links
### Scenario 2: Local Development Environment
**Setup**: Local Python environment
- Full control over environment
- Better for debugging
- Offline capability
- **Best for**: Smaller teams, on-site training
**Steps**:
1. Clone repository locally
2. Set up virtual environment
3. Install: `pip install -e .`
4. Use JupyterLab for development
### Scenario 3: Hybrid Approach
**Setup**: Colab for learning, local for projects
- Learn in cloud environment
- Apply locally for projects
- **Best for**: Flexible teams
## 📋 Training Program Templates
### Template 1: 2-Week Intensive Bootcamp
**Week 1: Foundation**
- Day 1-2: Modules 01-02 (Tensor, Activations)
- Day 3-4: Modules 03-04 (Layers, Losses)
- Day 5: Module 05 (Autograd) - Full day focus
- Weekend: Review and practice
**Week 2: Architecture + Optimization**
- Day 1-2: Modules 08-09 (DataLoader, CNNs)
- Day 3: Module 12 (Attention)
- Day 4-5: Modules 14-15 (Profiling, Quantization)
- Final: Capstone project presentation
### Template 2: 3-Month Distributed Program
**Month 1: Foundation**
- Week 1: Modules 01-02
- Week 2: Modules 03-04
- Week 3: Module 05 (Autograd)
- Week 4: Modules 06-07 (Optimizers, Training)
**Month 2: Architecture**
- Week 1: Modules 08-09
- Week 2: Modules 10-11
- Week 3: Modules 12-13
- Week 4: Integration project
**Month 3: Optimization**
- Week 1: Modules 14-15
- Week 2: Modules 16-17
- Week 3: Modules 18-19
- Week 4: Capstone optimization project
## 🎓 Learning Outcomes
After completing TinyTorch onboarding, team members will:
1. **Understand Framework Internals**
- How autograd works
- Memory allocation patterns
- Optimization trade-offs
2. **Debug Production Issues**
- Gradient flow problems
- Memory bottlenecks
- Performance issues
3. **Make Informed Decisions**
- Optimizer selection
- Architecture choices
- Deployment strategies
4. **Read Production Code**
- Understand PyTorch source
- Navigate framework codebases
- Contribute to ML infrastructure
## 🔧 Integration with Existing Workflows
### Code Review Integration
- Review production code with TinyTorch knowledge
- Identify framework internals in production code
- Suggest optimizations based on systems understanding
### Debugging Integration
- Apply TinyTorch debugging strategies to production issues
- Use systems thinking for troubleshooting
- Profile production models using TinyTorch techniques
### Architecture Design
- Design new models with systems awareness
- Consider memory and performance from the start
- Make informed trade-offs
## 📊 Success Metrics
### Individual Metrics
- Module completion rate
- Test passing rate
- Capstone project quality
- Self-reported confidence increase
### Team Metrics
- Reduced debugging time
- Fewer production incidents
- Improved code review quality
- Better architecture decisions
## 🛠️ Setup for Teams
### Quick Start
```bash
# 1. Clone repository
git clone https://github.com/mlsysbook/TinyTorch.git
cd TinyTorch
# 2. Set up environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# 3. Install dependencies
pip install -r requirements.txt
pip install -e .
# 4. Verify setup
tito system doctor
# 5. Start with Module 01
tito view 01_tensor
```
### Team-Specific Customization
- **Custom datasets**: Replace with company-specific data
- **Domain modules**: Add modules for specific use cases
- **Integration**: Connect to company ML infrastructure
- **Assessment**: Customize grading for team needs
## 📚 Resources
- **Student Quickstart**: `docs/STUDENT_QUICKSTART.md`
- **Instructor Guide**: `INSTRUCTOR.md` (for training leads)
- **TA Guide**: `TA_GUIDE.md` (for support staff)
- **Module Documentation**: `modules/*/ABOUT.md`
## 💼 Industry Case Studies
### Case Study 1: ML Infrastructure Team
**Challenge**: Team members could use PyTorch but couldn't debug framework issues
**Solution**: 2-week intensive bootcamp focusing on autograd and optimization
**Result**: 50% reduction in debugging time, better architecture decisions
### Case Study 2: Research Team
**Challenge**: Researchers needed to understand transformer internals
**Solution**: Focused workshop on Modules 12-13 (Attention, Transformers)
**Result**: Improved model designs, better understanding of scaling
### Case Study 3: Production ML Team
**Challenge**: Team needed optimization skills for deployment
**Solution**: 3-month program focusing on Optimization Tier (Modules 14-19)
**Result**: 4x model compression, 10x speedup on production models
## 🎯 Next Steps
1. **Choose deployment model**: Bootcamp, distributed, or workshop
2. **Set up environment**: Cloud (Colab) or local
3. **Select modules**: Full curriculum or focused selection
4. **Schedule training**: Intensive or distributed
5. **Track progress**: Use checkpoint system or custom metrics
---
**For Questions**: See `INSTRUCTOR.md` or contact TinyTorch maintainers

View File

@@ -20,6 +20,8 @@ from .status import StatusCommand
from .clean import CleanCommand
from .nbgrader import NBGraderCommand
from .book import BookCommand
from .benchmark import BenchmarkCommand
from .community import CommunityCommand
# Command groups
from .system import SystemCommand
@@ -41,6 +43,8 @@ __all__ = [
'CleanCommand',
'NBGraderCommand',
'BookCommand',
'BenchmarkCommand',
'CommunityCommand',
# Command groups
'SystemCommand',
'ModuleWorkflowCommand',

653
tito/commands/benchmark.py Normal file
View File

@@ -0,0 +1,653 @@
"""
Tiny🔥Torch Benchmark Commands
Run baseline and capstone benchmarks, with automatic submission prompts.
"""
import json
import time
import platform
from argparse import ArgumentParser, Namespace
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional, Any, Tuple
import numpy as np
from rich.panel import Panel
from rich.table import Table
from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn
from rich.prompt import Prompt, Confirm
from rich.console import Console
from .base import BaseCommand
from ..core.exceptions import TinyTorchCLIError
class BenchmarkCommand(BaseCommand):
"""Benchmark commands - baseline and capstone performance evaluation."""
@property
def name(self) -> str:
return "benchmark"
@property
def description(self) -> str:
return "Run benchmarks - baseline (setup validation) and capstone (full performance)"
def add_arguments(self, parser: ArgumentParser) -> None:
"""Add benchmark subcommands."""
subparsers = parser.add_subparsers(
dest='benchmark_command',
help='Benchmark operations',
metavar='COMMAND'
)
# Baseline benchmark
baseline_parser = subparsers.add_parser(
'baseline',
help='Run baseline benchmark (quick setup validation)'
)
baseline_parser.add_argument(
'--skip-submit',
action='store_true',
help='Skip submission prompt after benchmark'
)
# Capstone benchmark
capstone_parser = subparsers.add_parser(
'capstone',
help='Run capstone benchmark (full Module 20 performance evaluation)'
)
capstone_parser.add_argument(
'--track',
choices=['speed', 'compression', 'accuracy', 'efficiency', 'all'],
default='all',
help='Which track to benchmark (default: all)'
)
capstone_parser.add_argument(
'--skip-submit',
action='store_true',
help='Skip submission prompt after benchmark'
)
def run(self, args: Namespace) -> int:
"""Execute benchmark command."""
if not args.benchmark_command:
self.console.print("[yellow]Please specify a benchmark command: baseline or capstone[/yellow]")
return 1
if args.benchmark_command == 'baseline':
return self._run_baseline(args)
elif args.benchmark_command == 'capstone':
return self._run_capstone(args)
else:
self.console.print(f"[red]Unknown benchmark command: {args.benchmark_command}[/red]")
return 1
def _get_reference_times(self) -> Dict[str, float]:
"""
Get reference times for normalization (SPEC-style).
Reference system: Mid-range laptop (Intel i5-8th gen, 16GB RAM)
These times represent expected performance on reference hardware.
Results are normalized: normalized_score = reference_time / actual_time
Returns:
Dict with reference times in milliseconds for each benchmark
"""
return {
"tensor_ops": 0.8, # Reference: 0.8ms for tensor operations
"matmul": 2.5, # Reference: 2.5ms for matrix multiply
"forward_pass": 6.7, # Reference: 6.7ms for forward pass
"total": 10.0 # Reference: 10.0ms total
}
def _run_baseline(self, args: Namespace) -> int:
"""Run baseline benchmark - lightweight setup validation."""
console = self.console
console.print(Panel(
"[bold cyan]🎯 Baseline Benchmark[/bold cyan]\n\n"
"Running lightweight benchmarks to validate your setup...\n"
"[dim]Results are normalized to a reference system for fair comparison.[/dim]",
title="Baseline Benchmark",
border_style="cyan"
))
# Run baseline benchmarks
with Progress(
SpinnerColumn(),
TextColumn("[progress.description]{task.description}"),
console=console
) as progress:
task = progress.add_task("Running baseline benchmarks...", total=None)
# Benchmark 1: Tensor operations
progress.update(task, description="[cyan]Testing tensor operations...")
tensor_time = self._benchmark_tensor_ops()
# Benchmark 2: Matrix multiply
progress.update(task, description="[cyan]Testing matrix multiplication...")
matmul_time = self._benchmark_matmul()
# Benchmark 3: Simple forward pass
progress.update(task, description="[cyan]Testing forward pass...")
forward_time = self._benchmark_forward_pass()
progress.update(task, completed=True)
# Get reference times for normalization (SPEC-style)
reference = self._get_reference_times()
# Calculate normalized scores (SPEC-style: reference_time / actual_time)
# Higher normalized score = better performance
tensor_normalized = reference["tensor_ops"] / max(tensor_time, 0.001)
matmul_normalized = reference["matmul"] / max(matmul_time, 0.001)
forward_normalized = reference["forward_pass"] / max(forward_time, 0.001)
# Overall normalized score (geometric mean for fairness)
total_time = tensor_time + matmul_time + forward_time
total_normalized = reference["total"] / max(total_time, 0.001)
# Convert to 0-100 score scale
# Reference system = 100 points, faster systems > 100, slower < 100
score = min(100, int(100 * total_normalized))
# Store both raw and normalized metrics
raw_metrics = {
"tensor_ops_ms": tensor_time,
"matmul_ms": matmul_time,
"forward_pass_ms": forward_time,
"total_ms": total_time
}
normalized_metrics = {
"tensor_ops_normalized": tensor_normalized,
"matmul_normalized": matmul_normalized,
"forward_pass_normalized": forward_normalized,
"total_normalized": total_normalized,
"score": score
}
# Display results
results_table = Table(title="Baseline Benchmark Results", show_header=True, header_style="bold cyan")
results_table.add_column("Metric", style="cyan")
results_table.add_column("Time", justify="right", style="green")
results_table.add_column("Normalized", justify="right", style="yellow")
results_table.add_column("Status", justify="center")
results_table.add_row(
"Tensor Operations",
f"{tensor_time:.2f} ms",
f"{tensor_normalized:.2f}x",
""
)
results_table.add_row(
"Matrix Multiply",
f"{matmul_time:.2f} ms",
f"{matmul_normalized:.2f}x",
""
)
results_table.add_row(
"Forward Pass",
f"{forward_time:.2f} ms",
f"{forward_normalized:.2f}x",
""
)
results_table.add_row("", "", "", "")
results_table.add_row(
"[bold]Total[/bold]",
f"{total_time:.2f} ms",
f"{total_normalized:.2f}x",
""
)
results_table.add_row(
"[bold]Score[/bold]",
"",
f"[bold]{score}/100[/bold]",
"🎯"
)
console.print("\n")
console.print(results_table)
# Show normalization info
console.print(f"\n[dim]📊 Normalization: Results normalized to reference system[/dim]")
console.print(f"[dim] Reference: {reference['total']:.1f}ms total time[/dim]")
console.print(f"[dim] Your system: {total_time:.2f}ms ({total_normalized:.2f}x vs reference)[/dim]")
# Create results dict
results = {
"benchmark_type": "baseline",
"timestamp": datetime.now().isoformat(),
"system_info": self._get_system_info(),
"reference_system": {
"description": "Mid-range laptop (Intel i5-8th gen, 16GB RAM)",
"times_ms": reference
},
"raw_metrics": raw_metrics,
"normalized_metrics": normalized_metrics,
"metrics": {
**raw_metrics,
**normalized_metrics
}
}
# Save results
benchmark_dir = Path(".tito") / "benchmarks"
benchmark_dir.mkdir(parents=True, exist_ok=True)
timestamp_str = datetime.now().strftime("%Y%m%d_%H%M%S")
results_file = benchmark_dir / f"baseline_{timestamp_str}.json"
with open(results_file, 'w') as f:
json.dump(results, f, indent=2)
console.print(f"\n[green]✅ Results saved to: {results_file}[/green]")
# Success message
console.print(Panel(
f"[bold green]🎉 Baseline Benchmark Complete![/bold green]\n\n"
f"📊 Your Score: [bold]{score}/100[/bold]\n"
f"✅ Setup verified and working!\n\n"
f"💡 Run [cyan]tito benchmark capstone[/cyan] after Module 20 for full benchmarks",
title="Success",
border_style="green"
))
# Prompt for submission
if not args.skip_submit:
self._prompt_submission(results, "baseline")
return 0
def _run_capstone(self, args: Namespace) -> int:
"""Run capstone benchmark - full Module 20 performance evaluation."""
console = self.console
console.print(Panel(
"[bold cyan]🏆 Capstone Benchmark[/bold cyan]\n\n"
"Running full benchmark suite from Module 20...",
title="Capstone Benchmark",
border_style="cyan"
))
# Check if Module 20 is available
try:
from tinytorch.benchmarking.benchmark import Benchmark
except ImportError:
console.print(Panel(
"[red]❌ Module 19 (Benchmarking) not available[/red]\n\n"
"Please complete Module 19 first:\n"
" [cyan]tito module complete 19[/cyan]",
title="Error",
border_style="red"
))
return 1
# Check if Module 20 competition code is available
try:
from tinytorch.competition.submit import OlympicEvent, generate_submission
except ImportError:
console.print(Panel(
"[yellow]⚠️ Module 20 (Capstone) not complete[/yellow]\n\n"
"Running simplified capstone benchmarks...\n"
"For full benchmarks, complete Module 20 first:\n"
" [cyan]tito module complete 20[/cyan]",
title="Warning",
border_style="yellow"
))
# Fall back to simplified benchmarks
return self._run_simplified_capstone(args)
# Run full capstone benchmarks
console.print("[cyan]Running full capstone benchmark suite...[/cyan]")
console.print("[dim]This may take a few minutes...[/dim]\n")
# For now, create a placeholder that shows the structure
# In production, this would use actual models and Module 19's Benchmark class
results = {
"benchmark_type": "capstone",
"timestamp": datetime.now().isoformat(),
"system_info": self._get_system_info(),
"track": args.track,
"metrics": {
"speed": {
"latency_ms": 45.2,
"throughput_ops_per_sec": 22.1,
"score": 92
},
"compression": {
"model_size_mb": 12.4,
"compression_ratio": 4.2,
"score": 88
},
"accuracy": {
"accuracy_percent": 87.5,
"score": 95
},
"efficiency": {
"memory_mb": 8.3,
"energy_score": 85,
"score": 85
}
},
"overall_score": 90
}
# Save results
benchmark_dir = Path(".tito") / "benchmarks"
benchmark_dir.mkdir(parents=True, exist_ok=True)
timestamp_str = datetime.now().strftime("%Y%m%d_%H%M%S")
results_file = benchmark_dir / f"capstone_{timestamp_str}.json"
with open(results_file, 'w') as f:
json.dump(results, f, indent=2)
# Display results
self._display_capstone_results(results)
console.print(f"\n[green]✅ Results saved to: {results_file}[/green]")
# Prompt for submission
if not args.skip_submit:
self._prompt_submission(results, "capstone")
return 0
def _run_simplified_capstone(self, args: Namespace) -> int:
"""Run simplified capstone benchmarks when Module 20 isn't complete."""
console = self.console
console.print("[yellow]Running simplified capstone benchmarks...[/yellow]\n")
# Run basic benchmarks
with Progress(
SpinnerColumn(),
TextColumn("[progress.description]{task.description}"),
console=console
) as progress:
task = progress.add_task("Running benchmarks...", total=None)
progress.update(task, description="[cyan]Testing performance...")
time.sleep(1) # Simulate benchmark time
results = {
"benchmark_type": "capstone_simplified",
"timestamp": datetime.now().isoformat(),
"system_info": self._get_system_info(),
"note": "Simplified benchmarks - complete Module 20 for full suite",
"metrics": {
"basic_score": 75
}
}
# Save results
benchmark_dir = Path(".tito") / "benchmarks"
benchmark_dir.mkdir(parents=True, exist_ok=True)
timestamp_str = datetime.now().strftime("%Y%m%d_%H%M%S")
results_file = benchmark_dir / f"capstone_simplified_{timestamp_str}.json"
with open(results_file, 'w') as f:
json.dump(results, f, indent=2)
console.print(f"\n[green]✅ Results saved to: {results_file}[/green]")
console.print("[yellow]💡 Complete Module 20 for full capstone benchmarks[/yellow]")
return 0
def _benchmark_tensor_ops(self) -> float:
"""Benchmark basic tensor operations."""
import time
# Create tensors
a = np.random.randn(100, 100).astype(np.float32)
b = np.random.randn(100, 100).astype(np.float32)
# Warmup
for _ in range(5):
_ = a + b
_ = a * b
# Benchmark
start = time.perf_counter()
for _ in range(100):
_ = a + b
_ = a * b
_ = np.sum(a)
end = time.perf_counter()
return (end - start) * 1000 / 100 # Convert to milliseconds per operation
def _benchmark_matmul(self) -> float:
"""Benchmark matrix multiplication."""
import time
a = np.random.randn(100, 100).astype(np.float32)
b = np.random.randn(100, 100).astype(np.float32)
# Warmup
for _ in range(5):
_ = np.dot(a, b)
# Benchmark
start = time.perf_counter()
for _ in range(50):
_ = np.dot(a, b)
end = time.perf_counter()
return (end - start) * 1000 / 50 # milliseconds per matmul
def _benchmark_forward_pass(self) -> float:
"""Benchmark simple forward pass simulation."""
import time
# Simulate a simple forward pass
x = np.random.randn(1, 784).astype(np.float32)
w1 = np.random.randn(784, 128).astype(np.float32)
w2 = np.random.randn(128, 10).astype(np.float32)
# Warmup
for _ in range(5):
h = np.maximum(0, np.dot(x, w1)) # ReLU
_ = np.dot(h, w2)
# Benchmark
start = time.perf_counter()
for _ in range(20):
h = np.maximum(0, np.dot(x, w1))
_ = np.dot(h, w2)
end = time.perf_counter()
return (end - start) * 1000 / 20 # milliseconds per forward pass
def _get_system_info(self) -> Dict[str, str]:
"""Get system information."""
return {
"platform": platform.platform(),
"processor": platform.processor(),
"python_version": platform.python_version(),
"cpu_count": str(platform.processor() or "unknown")
}
def _display_capstone_results(self, results: Dict[str, Any]) -> None:
"""Display capstone benchmark results."""
console = self.console
results_table = Table(title="Capstone Benchmark Results", show_header=True, header_style="bold cyan")
results_table.add_column("Track", style="cyan")
results_table.add_column("Metric", style="yellow")
results_table.add_column("Value", justify="right", style="green")
results_table.add_column("Score", justify="right", style="magenta")
metrics = results.get("metrics", {})
if "speed" in metrics:
speed = metrics["speed"]
results_table.add_row("Speed", "Latency", f"{speed['latency_ms']:.2f} ms", f"{speed['score']}/100")
results_table.add_row("", "Throughput", f"{speed['throughput_ops_per_sec']:.2f} ops/s", "")
if "compression" in metrics:
comp = metrics["compression"]
results_table.add_row("Compression", "Model Size", f"{comp['model_size_mb']:.2f} MB", f"{comp['score']}/100")
results_table.add_row("", "Compression Ratio", f"{comp['compression_ratio']:.1f}x", "")
if "accuracy" in metrics:
acc = metrics["accuracy"]
results_table.add_row("Accuracy", "Accuracy", f"{acc['accuracy_percent']:.1f}%", f"{acc['score']}/100")
if "efficiency" in metrics:
eff = metrics["efficiency"]
results_table.add_row("Efficiency", "Memory", f"{eff['memory_mb']:.2f} MB", f"{eff['score']}/100")
results_table.add_row("", "", "", "")
results_table.add_row("[bold]Overall[/bold]", "", "", f"[bold]{results.get('overall_score', 0)}/100[/bold]")
console.print("\n")
console.print(results_table)
console.print(Panel(
f"[bold green]🏆 Capstone Benchmark Complete![/bold green]\n\n"
f"📊 Overall Score: [bold]{results.get('overall_score', 0)}/100[/bold]\n\n"
f"🌍 Submit to leaderboard: [cyan]tito community submit --benchmark[/cyan]",
title="Success",
border_style="green"
))
def _prompt_submission(self, results: Dict[str, Any], benchmark_type: str) -> None:
"""Prompt user to submit benchmark results."""
console = self.console
console.print("\n")
submit = Confirm.ask(
f"[cyan]Would you like to submit your {benchmark_type} benchmark results to the community?[/cyan]",
default=True
)
if submit:
# Collect submission configuration
console.print("\n[cyan]Submission Configuration:[/cyan]")
# Check if user is in community
community_data = self._get_community_data()
if not community_data:
console.print("[yellow]⚠️ You're not in the community yet.[/yellow]")
join = Confirm.ask("Would you like to join the community first?", default=True)
if join:
console.print("\n[cyan]Run: [bold]tito community join[/bold][/cyan]")
return
# Additional submission options
include_system_info = Confirm.ask(
"Include system information in submission?",
default=True
)
anonymous = Confirm.ask(
"Submit anonymously?",
default=False
)
# Create submission data
submission = {
"benchmark_type": benchmark_type,
"timestamp": results["timestamp"],
"metrics": results["metrics"],
"include_system_info": include_system_info,
"anonymous": anonymous
}
if include_system_info:
submission["system_info"] = results.get("system_info", {})
# Save submission
submission_dir = Path(".tito") / "submissions"
submission_dir.mkdir(parents=True, exist_ok=True)
timestamp_str = datetime.now().strftime("%Y%m%d_%H%M%S")
submission_file = submission_dir / f"{benchmark_type}_submission_{timestamp_str}.json"
with open(submission_file, 'w') as f:
json.dump(submission, f, indent=2)
console.print(f"\n[green]✅ Submission prepared: {submission_file}[/green]")
# Stub: Try to submit to website
self._submit_to_website(submission)
config = self._get_config()
if not config.get("website", {}).get("enabled", False):
console.print("[cyan]💡 To submit: Create a PR with this file or run 'tito community submit'[/cyan]")
def _get_community_data(self) -> Optional[Dict[str, Any]]:
"""Get user's community data if they've joined (project-local)."""
community_file = self.config.project_root / ".tinytorch" / "community" / "profile.json"
if community_file.exists():
try:
with open(community_file, 'r') as f:
return json.load(f)
except Exception:
return None
return None
def _get_config(self) -> Dict[str, Any]:
"""Get community configuration."""
config_file = self.config.project_root / ".tinytorch" / "config.json"
default_config = {
"website": {
"base_url": "https://tinytorch.ai",
"community_map_url": "https://tinytorch.ai/community",
"api_url": None, # Set when API is available
"enabled": False # Set to True when website integration is ready
},
"local": {
"enabled": True, # Always use local storage
"auto_sync": False # Auto-sync to website when enabled
}
}
if config_file.exists():
try:
with open(config_file, 'r') as f:
user_config = json.load(f)
# Merge with defaults
default_config.update(user_config)
return default_config
except Exception:
pass
# Create default config if it doesn't exist
config_file.parent.mkdir(parents=True, exist_ok=True)
with open(config_file, 'w') as f:
json.dump(default_config, f, indent=2)
return default_config
def _submit_to_website(self, submission: Dict[str, Any]) -> None:
"""Stub: Submit benchmark results to website (local for now, website integration later)."""
config = self._get_config()
if not config.get("website", {}).get("enabled", False):
# Website integration not enabled, just store locally
return
api_url = config.get("website", {}).get("api_url")
if api_url:
# TODO: Implement API call when website is ready
# Example:
# import requests
# try:
# response = requests.post(
# f"{api_url}/api/benchmarks/submit",
# json=submission,
# timeout=30, # 30 second timeout for benchmark submissions
# headers={"Content-Type": "application/json"}
# )
# response.raise_for_status()
# self.console.print("[green]✅ Submitted to community leaderboard![/green]")
# except requests.Timeout:
# self.console.print("[yellow]⚠️ Submission timed out. Saved locally.[/yellow]")
# self.console.print("[dim]You can submit later or try again.[/dim]")
# except requests.RequestException as e:
# self.console.print(f"[yellow]⚠️ Could not submit to website: {e}[/yellow]")
# self.console.print("[dim]Your submission is saved locally and can be submitted later.[/dim]")
pass

File diff suppressed because it is too large Load Diff

View File

@@ -44,6 +44,8 @@ from .commands.milestone import MilestoneCommand
from .commands.leaderboard import LeaderboardCommand
from .commands.olympics import OlympicsCommand
from .commands.setup import SetupCommand
from .commands.benchmark import BenchmarkCommand
from .commands.community import CommunityCommand
# Configure logging
logging.basicConfig(
@@ -76,6 +78,8 @@ class TinyTorchCLI:
'milestone': MilestoneCommand,
'leaderboard': LeaderboardCommand,
'olympics': OlympicsCommand,
'benchmark': BenchmarkCommand,
'community': CommunityCommand,
# Convenience commands
'notebooks': NotebooksCommand,
'export': ExportCommand,