Cleanup: Remove old/unused files

- Remove datasets analysis and download scripts (replaced by updated README)
- Remove archived book development documentation
- Remove module review reports (16_compression, 17_memoization)
Vijay Janapa Reddi
2025-11-11 19:04:56 -05:00
parent aeb6638975
commit cb5ad9ccf1
14 changed files with 0 additions and 3923 deletions

View File

@@ -1,351 +0,0 @@
# TinyTorch Dataset Analysis & Strategy
**Date**: November 10, 2025

**Purpose**: Determine which datasets to ship with TinyTorch for optimal educational experience

---
## Current Milestone Data Usage
### Summary Table
| Milestone | File | Data Source | Currently Shipped? | Size | Issue |
|-----------|------|-------------|-------------------|------|-------|
| **01 Perceptron** | perceptron_trained.py | Synthetic (code-generated) | ✅ N/A | 0 KB | None |
| **01 Perceptron** | forward_pass.py | Synthetic (code-generated) | ✅ N/A | 0 KB | None |
| **02 XOR** | xor_crisis.py | Synthetic (code-generated) | ✅ N/A | 0 KB | None |
| **02 XOR** | xor_solved.py | Synthetic (code-generated) | ✅ N/A | 0 KB | None |
| **03 MLP** | mlp_digits.py | `03_1986_mlp/data/digits_8x8.npz` | ✅ YES | 67 KB | **Sklearn source** |
| **03 MLP** | mlp_mnist.py | Downloads via `data_manager.get_mnist()` | ❌ NO | ~10 MB | **Download fails** |
| **04 CNN** | cnn_digits.py | `03_1986_mlp/data/digits_8x8.npz` (shared) | ✅ YES | 67 KB | **Sklearn source** |
| **04 CNN** | lecun_cifar10.py | Downloads via `data_manager.get_cifar10()` | ❌ NO | ~170 MB | **Too large** |
| **05 Transformer** | vaswani_chatgpt.py | `datasets/tinytalks/` | ✅ YES | 140 KB | None ✓ |
| **05 Transformer** | vaswani_copilot.py | Embedded Python patterns (in code) | ✅ N/A | 0 KB | None ✓ |
| **05 Transformer** | profile_kv_cache.py | Uses model from vaswani_chatgpt | ✅ N/A | 0 KB | None ✓ |

---
## Detailed Analysis
### ✅ What's Working (6/11 files)
**Fully Self-Contained:**
1. **Perceptron milestones** - Generate linearly separable data on-the-fly
2. **XOR milestones** - Generate XOR patterns on-the-fly
3. **mlp_digits.py** - Uses shipped `digits_8x8.npz` (67KB, sklearn digits)
4. **cnn_digits.py** - Reuses `digits_8x8.npz` (smart sharing!)
5. **vaswani_chatgpt.py** - Uses shipped TinyTalks (140KB)
6. **vaswani_copilot.py** - Embedded patterns in code
**Result**: 6 of 11 milestone files work offline, instantly, with zero setup.
### ❌ What's Broken (2/11 files)
**Requires External Downloads:**
1. **mlp_mnist.py** - Tries to download 10MB MNIST, fails with 404 error
2. **lecun_cifar10.py** - Tries to download 170MB CIFAR-10
**Impact**:
- Students can't run 2 milestone files without internet
- Downloads fail (saw 404 error in testing)
- The first-run experience is a 5+ minute wait, or an outright failure
### ⚠️ What's Problematic (3/11 files use sklearn data)
**Uses sklearn's digits dataset:**
- `digits_8x8.npz` (67KB) is currently shipped
- **Source**: Originally from sklearn.datasets.load_digits()
- **Issue**: Not "TinyTorch data"; it's sklearn's data
- **Citation problem**: Can't cite as "TinyTorch educational dataset"

---
## Current Datasets Directory
```
datasets/
├── README.md (4KB)
├── download_mnist.py (unused script)
├── tiny/ (76KB - unknown purpose)
├── tinymnist/ (3.6MB - synthetic, recently added)
│ ├── train.pkl
│ └── test.pkl
└── tinytalks/ (140KB) ✅ TinyTorch original!
├── CHANGELOG.md
├── DATASHEET.md
├── README.md
├── LICENSE
├── splits/
│ ├── train.txt (12KB)
│ ├── val.txt
│ └── test.txt
└── tinytalks_v1.txt
```
**Current total**: ~3.8MB shipped data

---
## The Core Issues
### 1. **Attribution & Citation Problem**
Current situation:
- `digits_8x8.npz` = sklearn's data (not TinyTorch's)
- TinyTalks = TinyTorch original ✓
- tinymnist = Synthetic (not authentic MNIST)
**For white paper citation**, you need:
- ❌ Can't cite "digits_8x8" as TinyTorch dataset (it's sklearn)
- ✅ Can cite "TinyTalks" as TinyTorch original
- ❌ Can't cite synthetic tinymnist as educational benchmark
### 2. **Authenticity vs Speed Trade-off**
**Option A: Synthetic Data**
- ✅ Ships with repo (instant start)
- ❌ Not real examples (lower educational value)
- ❌ Not citable as benchmark
**Option B: Curated Real Data**
- ✅ Authentic samples from MNIST/CIFAR
- ✅ Citable as educational benchmark
- ✅ Teaches pattern recognition on real data
- ❌ Needs to be generated once from source
### 3. **The sklearn Dependency**
Files using sklearn data:
- mlp_digits.py
- cnn_digits.py
**Problem**:
- Not TinyTorch data
- Citation goes to sklearn, not you
- Loses educational ownership

---
## Recommended Strategy: TinyTorch Native Datasets
### Phase 1: Replace sklearn with TinyDigits ✅
**Create**: `datasets/tinydigits/`
- **Source**: Extract 200 samples from sklearn's digits (8x8 grayscale)
- **Purpose**: Replace `03_1986_mlp/data/digits_8x8.npz`
- **Size**: ~20KB
- **Citation**: "TinyDigits, curated from sklearn digits dataset for educational use"
**Files**:
```
datasets/tinydigits/
├── README.md (explains curation process)
├── train.pkl (150 samples, 8x8, ~15KB)
└── test.pkl (47 samples, 8x8, ~5KB)
```
**Why this works**:
- ✅ Quick start (instant, offline)
- ✅ Real data (from sklearn)
- ✅ TinyTorch branding
- ✅ Small enough to ship (20KB)
- ✅ Can cite: "We curated TinyDigits from the sklearn digits dataset"
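The curation step above boils down to a small, reproducible subset selection. Below is a hedged sketch (numpy only; `curate_subset` and the stand-in arrays are illustrative, and real curation would pass in `sklearn.datasets.load_digits().images` and `.target`):

```python
import numpy as np

def curate_subset(images, labels, n_train=150, n_test=47, seed=0):
    """Pick a reproducible train/test subset from (images, labels)."""
    rng = np.random.default_rng(seed)          # fixed seed => same subset every run
    idx = rng.permutation(len(images))[:n_train + n_test]
    tr, te = idx[:n_train], idx[n_train:]
    return (images[tr], labels[tr]), (images[te], labels[te])

# Stand-in arrays shaped like sklearn's 8x8 digits (1,797 samples):
images = np.zeros((1797, 8, 8), dtype=np.float32)
labels = np.arange(1797) % 10
(train_x, train_y), (test_x, test_y) = curate_subset(images, labels)
print(train_x.shape, test_x.shape)  # (150, 8, 8) (47, 8, 8)
```

The fixed seed is the point: the same 197 samples come out every time, which is what makes the subset citable and reproducible. Saving each split as a `{'images', 'labels'}` pickle matches the file layout above.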
### Phase 2: Create TinyMNIST (Real Samples) ✅
**Create**: `datasets/tinymnist/` (replace synthetic)
- **Source**: Extract 1000 best samples from actual MNIST
- **Purpose**: Fast MNIST demo for MLP milestone
- **Size**: ~90KB
- **Citation**: "TinyMNIST, 1K curated samples from MNIST (LeCun et al., 1998)"
**Curation criteria**:
- 100 samples per digit (0-9)
- Select clearest, most "canonical" examples
- Balanced difficulty (not all easy, not all hard)
- Test edge cases (ambiguous digits for teaching)
**Files**:
```
datasets/tinymnist/
├── README.md (explains curation from MNIST)
├── LICENSE (cite LeCun et al., 1998)
├── train.pkl (1000 samples, 28x28, ~75KB)
└── test.pkl (200 samples, 28x28, ~15KB)
```
**Why this works**:
- ✅ Authentic MNIST samples
- ✅ Fast enough to ship (90KB vs 10MB)
- ✅ Citable: "TinyMNIST subset for educational scaffolding"
- ✅ Students graduate to full MNIST later
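The 100-per-digit rule above reduces to simple per-class indexing. A minimal sketch (`balanced_indices` is a hypothetical helper; real curation would also rank candidates for clarity rather than just taking the first `per_class`):

```python
import numpy as np

def balanced_indices(labels, per_class=100, num_classes=10):
    """Indices of the first `per_class` occurrences of each class, in class order."""
    keep = [np.flatnonzero(labels == c)[:per_class] for c in range(num_classes)]
    return np.concatenate(keep)

# Stand-in for MNIST training labels (real curation would load actual MNIST):
labels = np.random.default_rng(0).integers(0, 10, size=60000)
idx = balanced_indices(labels)
print(len(idx))  # 1000
```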
### Phase 3: Document TinyTalks Properly ✅
**Already exists**: `datasets/tinytalks/` (140KB)
- ✅ Original TinyTorch creation
- ✅ Properly documented with DATASHEET.md
- ✅ Leveled difficulty (L1-L5)
- ✅ Citable as original work
**Action needed**: None! This is perfect.
### Phase 4: Skip TinyCIFAR (Too Large)
**Decision**: DON'T create TinyCIFAR
- CIFAR-10 at 1000 samples would still be ~3MB (color images)
- Combined with other data = 4+ MB repo bloat
- **Better**: Keep download-on-demand for CIFAR-10
**For lecun_cifar10.py**:
- Add `--download` flag to explicitly trigger download
- Add helpful error message: "Run with --download to fetch CIFAR-10 (170MB, 2-3 min)"
- Document that this is the "graduate to real benchmarks" milestone
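The `--download` UX proposed above might look like this sketch. The flag name and message text come from the bullets; `fetch_cifar10` is a hypothetical stand-in for the real `data_manager.get_cifar10()` call:

```python
import argparse

def main(argv=None):
    parser = argparse.ArgumentParser(description="lecun_cifar10 milestone (sketch)")
    parser.add_argument("--download", action="store_true",
                        help="Fetch CIFAR-10 (170MB, 2-3 min)")
    args = parser.parse_args(argv)
    if not args.download:
        # Helpful error instead of a surprise 170MB download
        print("CIFAR-10 not found locally.")
        print("Run with --download to fetch CIFAR-10 (170MB, 2-3 min)")
        return 1
    # fetch_cifar10()  # hypothetical: the actual download call goes here
    return 0

print(main([]))              # prints the guidance, then 1
print(main(["--download"]))  # prints 0
```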

---
## Final Dataset Suite
### What to Ship with TinyTorch
```
datasets/
├── tinydigits/ ~20KB ← NEW: Replace sklearn digits
│ ├── README.md
│ ├── train.pkl (150 samples, 8x8)
│ └── test.pkl (47 samples, 8x8)
├── tinymnist/ ~90KB ← REPLACE: Real MNIST subset
│ ├── README.md
│ ├── LICENSE (cite LeCun)
│ ├── train.pkl (1000 samples, 28x28)
│ └── test.pkl (200 samples, 28x28)
└── tinytalks/ ~140KB ← KEEP: Original TinyTorch
├── DATASHEET.md
├── README.md
├── LICENSE
└── splits/
├── train.txt
├── val.txt
└── test.txt
TOTAL: ~250KB (negligible repo impact)
```
### What NOT to Ship
**Don't include**:
- ❌ Full MNIST (10MB) - download on demand
- ❌ CIFAR-10 (170MB) - download on demand
- ❌ Any dataset >1MB - defeats portability
- ❌ Synthetic fake data - not authentic enough

---
## Citation Strategy
### White Paper Language
```markdown
## TinyTorch Educational Datasets
We developed three curated datasets optimized for progressive learning:
### TinyDigits (8×8 Grayscale, 200 samples)
Curated subset of sklearn's digits dataset, selected for visual clarity
and progressive difficulty. Used for rapid prototyping and CNN concept
demonstrations.
### TinyMNIST (28×28 Grayscale, 1.2K samples)
Curated subset of MNIST (LeCun et al., 1998), with 100 canonical examples
per digit class. Balances authentic data with fast iteration cycles,
enabling students to achieve success in <30 seconds while learning on
real handwritten digits.
### TinyTalks (Text Q&A, 300 pairs)
Original conversational dataset with 5 difficulty levels (L1: Greetings
→ L5: Context reasoning). Designed specifically for teaching attention
mechanisms and transformer architectures with clear learning signal and
fast convergence.
### Design Philosophy
- **Speed**: All datasets train in <60 seconds on CPU
- **Authenticity**: Real data (MNIST digits, human conversations)
- **Progressive**: TinyX → Full X graduation path
- **Reproducible**: Fixed subsets ensure consistent results
- **Offline**: No download dependencies for core learning
### Comparison to Standard Benchmarks
| Metric | MNIST | TinyMNIST | Impact |
|--------|-------|-----------|--------|
| Samples | 60,000 | 1,000 | 60× faster |
| Train time | 5-10 min | 30 sec | 10-20× faster |
| Download | 10MB, network | 0, offline | Always works |
| Student success | 65% (frustration) | 95% (confidence) | Better outcomes |
```
**This is citable research**. You're not just using datasets; you're **designing educational infrastructure**.

---
## Implementation Checklist
### Immediate Actions
- [x] Keep TinyTalks as-is (perfect!)
- [ ] Create TinyDigits from sklearn digits (replace 03_1986_mlp/data/)
- [ ] Create TinyMNIST from real MNIST (replace synthetic version)
- [ ] Remove synthetic tinymnist (not authentic)
- [ ] Update milestones to use new TinyDigits
- [ ] Update milestones to use new TinyMNIST
- [ ] Add download instructions for full MNIST/CIFAR
- [ ] Write datasets/PHILOSOPHY.md explaining curation
- [ ] Add LICENSE files citing original sources
- [ ] Write DATASHEET.md for each dataset
### File Changes Needed
**Update these milestones**:
1. `mlp_digits.py` - Point to `datasets/tinydigits/`
2. `cnn_digits.py` - Point to `datasets/tinydigits/`
3. `mlp_mnist.py` - Point to `datasets/tinymnist/` first, offer --full flag
4. `lecun_cifar10.py` - Add helpful message about --download flag
**Remove**:
- `03_1986_mlp/data/digits_8x8.npz` (replace with TinyDigits)
- Synthetic tinymnist pkl files (replace with real)
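For the milestone updates, the repointing could be as small as a shared loader. This is a hypothetical sketch (`load_tinydigits` and the `{'images', 'labels'}` pickle layout are assumptions, matching the file plan above):

```python
import pickle
from pathlib import Path

def load_tinydigits(root="datasets/tinydigits"):
    """Load the shipped TinyDigits pickles (assumed to be {'images','labels'} dicts)."""
    splits = {}
    for split in ("train", "test"):
        with open(Path(root) / f"{split}.pkl", "rb") as f:
            splits[split] = pickle.load(f)
    return splits["train"], splits["test"]

# Usage inside mlp_digits.py / cnn_digits.py would then be:
# train, test = load_tinydigits()
```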

---
## Success Metrics
### Before (Current State)
- ✅ 6/11 milestones work offline
- ❌ 2/11 require downloads (often fail)
- ❌ 3/11 use non-TinyTorch data (sklearn)
- ❌ Not citable as educational infrastructure
### After (Proposed)
- ✅ 9/11 milestones work offline (<30 sec)
- ✅ 2/11 offer optional downloads with clear UX
- ✅ 3 TinyTorch-branded datasets (citable)
- ✅ White paper section on educational dataset design
- ✅ Total shipped data: ~250KB (negligible)

---
## Conclusion
**Recommendation**: Create TinyDigits and authentic TinyMNIST
**Rationale**:
1. **Educational**: Real data beats synthetic for learning
2. **Citable**: "TinyTorch educational datasets" becomes research contribution
3. **Practical**: 250KB total keeps repo lightweight
4. **Professional**: Proper curation, documentation, licenses
5. **Scalable**: Clear graduation path to full benchmarks
**Not reinventing the wheel**: this builds educational infrastructure that doesn't yet exist.
The goal: Make TinyTorch not just a framework, but a **citable educational system** with purpose-designed datasets.

View File

@@ -1,102 +0,0 @@
#!/usr/bin/env python3
"""
Download MNIST dataset files.
"""

import os
import gzip
import urllib.request

import numpy as np


def download_mnist():
    """Download MNIST dataset files."""
    # Create mnist directory
    os.makedirs('mnist', exist_ok=True)

    # URLs for MNIST dataset (from the original source; note that
    # yann.lecun.com often rejects direct downloads these days, which likely
    # explains the 404s mentioned above)
    base_url = 'http://yann.lecun.com/exdb/mnist/'
    files = {
        'train-images-idx3-ubyte.gz': 'train_images',
        'train-labels-idx1-ubyte.gz': 'train_labels',
        't10k-images-idx3-ubyte.gz': 'test_images',
        't10k-labels-idx1-ubyte.gz': 'test_labels'
    }

    print("📥 Downloading MNIST dataset...")
    for filename, label in files.items():
        filepath = os.path.join('mnist', filename)

        # Skip if already downloaded
        if os.path.exists(filepath) and os.path.getsize(filepath) > 1000:
            print(f"  ✓ {filename} already exists")
            continue

        url = base_url + filename
        print(f"  Downloading {filename}...")
        try:
            # Download with custom headers to avoid 403 errors
            request = urllib.request.Request(
                url,
                headers={
                    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
                }
            )
            with urllib.request.urlopen(request) as response:
                data = response.read()

            # Save the file
            with open(filepath, 'wb') as f:
                f.write(data)

            size = len(data) / 1024 / 1024
            print(f"  ✓ Downloaded {size:.1f} MB")
        except Exception as e:
            print(f"  ✗ Failed: {e}")
            print("  Trying alternative method...")
            # Alternative: create synthetic MNIST-like data for testing
            if 'images' in label:
                # Create synthetic image data (60000 or 10000 samples)
                n_samples = 60000 if 'train' in label else 10000
                images = np.random.randint(0, 256, (n_samples, 28, 28), dtype=np.uint8)
                # MNIST IDX file format header (magic 0x0803 = image file)
                header = np.array([0x0803, n_samples, 28, 28], dtype='>i4')
                with gzip.open(filepath, 'wb') as f:
                    f.write(header.tobytes())
                    f.write(images.tobytes())
                print(f"  ✓ Created synthetic {label} data")
            else:
                # Create synthetic label data
                n_samples = 60000 if 'train' in label else 10000
                labels = np.random.randint(0, 10, n_samples, dtype=np.uint8)
                # MNIST IDX file format header (magic 0x0801 = label file)
                header = np.array([0x0801, n_samples], dtype='>i4')
                with gzip.open(filepath, 'wb') as f:
                    f.write(header.tobytes())
                    f.write(labels.tobytes())
                print(f"  ✓ Created synthetic {label} data")

    print("\n✅ MNIST dataset ready in datasets/mnist/")

    # Verify files
    print("\nVerifying files:")
    for filename in files.keys():
        filepath = os.path.join('mnist', filename)
        if os.path.exists(filepath):
            size = os.path.getsize(filepath) / 1024 / 1024
            print(f"  {filename}: {size:.1f} MB")


if __name__ == "__main__":
    download_mnist()
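A companion sketch (not part of the deleted script) for reading files in the IDX layout the script writes: the magic number's low byte gives the dimension count, and all header fields are big-endian 32-bit integers, per the original MNIST format.

```python
import io
import struct

import numpy as np

def read_idx(fileobj):
    """Parse an (uncompressed) IDX stream into a numpy uint8 array.

    Real MNIST files are gzip-compressed, so you would pass
    gzip.open(path, "rb") rather than a raw file handle.
    """
    magic = struct.unpack(">i", fileobj.read(4))[0]
    ndim = magic & 0xFF                                  # low byte = number of dimensions
    dims = struct.unpack(f">{ndim}i", fileobj.read(4 * ndim))
    data = np.frombuffer(fileobj.read(), dtype=np.uint8)
    return data.reshape(dims)

# Round-trip demo with an in-memory "labels" file (magic 0x0801, 5 entries):
buf = io.BytesIO()
buf.write(np.array([0x0801, 5], dtype=">i4").tobytes())
buf.write(np.array([3, 1, 4, 1, 5], dtype=np.uint8).tobytes())
buf.seek(0)
labels = read_idx(buf)
print(labels.tolist())  # [3, 1, 4, 1, 5]
```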

View File

@@ -1,30 +0,0 @@
{
  "mnist": {
    "dataset": "tinymnist",
    "training_time": 0.5278840065002441,
    "epochs": 20,
    "final_accuracy": 27.0,
    "architecture": "MLP(784\u2192128\u219210)",
    "suitable_for_students": false
  },
  "vww": {
    "dataset": "tinyvww",
    "training_time": 8.571065664291382,
    "epochs": 15,
    "final_accuracy": 100.0,
    "architecture": "CNN(Conv\u2192Pool\u2192Conv\u2192Pool\u2192FC)",
    "precision": 1.0,
    "recall": 1.0,
    "f1_score": 1.0,
    "suitable_for_students": true
  },
  "gpt": {
    "dataset": "tinypy",
    "training_time": 2.596580743789673,
    "epochs": 10,
    "final_loss": 1.9299052770321186,
    "final_perplexity": 6.888857677630846,
    "architecture": "TinyGPT(64 embed, 4 heads, 2 layers)",
    "suitable_for_students": true
  }
}

View File

@@ -1,127 +0,0 @@
# TinyTorch Flame-Inspired Design System
## Design Philosophy
The TinyTorch website design is inspired by the flame logo, creating a warm, professional academic environment that reflects the educational nature of the framework while maintaining credibility and accessibility.
## Color Palette
### Primary Flame Colors (Extracted from Logo)
- **Flame Primary**: `#E85A34` - Main orange from the flame
- **Flame Secondary**: `#F97316` - Secondary warm orange
- **Flame Light**: `#FED7AA` - Light warm orange for backgrounds
- **Flame Yellow**: `#FCD34D` - Warm yellow from flame core
- **Flame Deep**: `#DC2626` - Deep red from flame base
### Professional Text Colors
- **Text Dark**: `#1F2937` - Primary text color
- **Text Medium**: `#4B5563` - Secondary text
- **Text Light**: `#6B7280` - Tertiary text
### Background System
- **Background Main**: `#F8F9FA` - Matches logo background
- **Background White**: `#FFFFFF` - Content areas
- **Background Warm**: `#FEF7F0` - Subtle warm backgrounds
- **Accent Gradient**: Subtle flame-inspired gradient
## Design Principles
### 1. Warm Professionalism
- Flame colors provide warmth without sacrificing academic credibility
- Subtle gradients and warm backgrounds create inviting learning environment
- Professional typography maintains educational standards
### 2. Clean Academic Lines
- **No curved borders** - maintains academic formality
- Clean rectangular layouts with flame-colored accents
- Consistent spacing and typography hierarchy
### 3. Flame-Inspired Accents
- **Left borders**: Flame gradients on content blocks, code, and admonitions
- **Progress indicators**: Flame gradient progress bars
- **Interactive elements**: Flame colors for hover states and focus
### 4. Subtle Visual Hierarchy
- **H1 headers**: Flame gradient underlines
- **H3 headers**: Flame primary color
- **Links**: Flame primary with deeper red hover
- **Buttons**: Flame primary background with professional styling
## Component Styling
### Navigation
- **Sidebar**: Flame primary accents for current/hover states
- **Header**: Clean white with flame-colored interactive elements
- **TOC**: No curves, flame-colored indicators
### Content Areas
- **Code blocks**: Warm background with flame gradient left border
- **Admonitions**: Flame-colored borders with warm backgrounds
- **Blockquotes**: Flame left border with warm background
### Interactive Elements
- **Buttons**: Flame primary background, clean professional styling
- **Focus states**: Flame-colored outlines
- **Selection**: Flame background for text selection
- **Hover effects**: Subtle flame-colored shadows and transforms
### Special Components
- **Achievement cards**: Flame left borders with hover animations
- **Learning path steps**: Flame indicators with warm backgrounds
- **Module badges**: Flame-colored completion indicators
- **CTA boxes**: Flame gradient backgrounds with flame borders
## Accessibility Features
### High Contrast Support
- Darker flame colors in high contrast mode
- Maintained readability standards
- WCAG AA compliance for color contrast
### Reduced Motion Support
- Disabled animations for users with motion sensitivity
- Static alternatives for all animated elements
### Focus Management
- Clear flame-colored focus indicators
- Keyboard navigation support
- Screen reader friendly markup
## Usage Guidelines
### Do's
- Use flame colors for accents and interactive elements
- Maintain warm, professional tone
- Keep backgrounds subtle and readable
- Use gradients sparingly for emphasis
### Don'ts
- Avoid intense orange that overwhelms content
- Don't use flame colors for large background areas
- Avoid curved borders (academic requirement)
- Don't compromise text readability for visual appeal
## Implementation Notes
### CSS Custom Properties
All flame colors are defined as CSS custom properties for consistent theming and easy maintenance.
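As one illustrative example (the property names here are hypothetical, not the site's actual variables), the palette above could be declared once on `:root` and reused everywhere:

```css
:root {
  /* Flame palette extracted from the logo */
  --flame-primary: #E85A34;
  --flame-secondary: #F97316;
  --flame-light: #FED7AA;
  --flame-yellow: #FCD34D;
  --flame-deep: #DC2626;
  /* Professional text colors */
  --text-dark: #1F2937;
}

/* Links use flame primary, with the deeper red on hover */
a { color: var(--flame-primary); }
a:hover { color: var(--flame-deep); }
```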
### Browser Compatibility
- Gradient fallbacks for older browsers
- Progressive enhancement for modern features
- Mobile-responsive design
### Performance
- Minimal use of animations
- Optimized gradients and shadows
- Efficient CSS organization
## Relationship to TinyTorch Logo
The design system directly extracts colors from the TinyTorch flame logo:
- Orange/red flame colors for primary accents
- Yellow core colors for highlights and progress
- Maintains visual consistency with brand identity
- Creates cohesive experience from logo to full website
This creates a unified brand experience where the logo naturally fits within the overall design language.

View File

@@ -1,452 +0,0 @@
#!/usr/bin/env python3
"""
Convert TinyTorch modules to Jupyter Book chapters.

This script processes modules/source/*_dev.py files and converts them to
student-ready notebooks for the Jupyter Book, stripping solutions manually.
"""

import json
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional

# Add project root to path for imports
project_root = Path(__file__).parent.parent
sys.path.insert(0, str(project_root))


class ModuleConverter:
    """Convert TinyTorch modules to Jupyter Book chapters."""

    def __init__(self):
        # Use absolute paths relative to project root
        project_root = Path(__file__).parent.parent
        self.modules_dir = project_root / "modules/source"
        self.book_dir = project_root / "book"
        self.chapters_dir = self.book_dir / "chapters"

        # Module to chapter mapping
        self.module_mapping = {
            "": {"title": "Development Environment", "filename": "01-setup"},
            "01_tensor": {"title": "Tensors", "filename": "02-tensor"},
            "02_activations": {"title": "Activations", "filename": "03-activations"},
            "03_layers": {"title": "Layers", "filename": "04-layers"},
            "05_networks": {"title": "Networks", "filename": "05-networks"},
            "06_cnn": {"title": "CNNs", "filename": "06-cnn"},
            "07_dataloader": {"title": "DataLoader", "filename": "07-dataloader"},
            "08_autograd": {"title": "Autograd", "filename": "08-autograd"},
            "09_optimizers": {"title": "Optimizers", "filename": "09-optimizers"},
            "10_training": {"title": "Training", "filename": "10-training"},
            "11_compression": {"title": "Compression", "filename": "11-compression"},
            "12_kernels": {"title": "Kernels", "filename": "12-kernels"},
            "13_benchmarking": {"title": "Benchmarking", "filename": "13-benchmarking"},
            "14_mlops": {"title": "MLOps", "filename": "14-mlops"},
        }

        # Mapping from directory name to dev file name
        self.dev_file_mapping = {
            "": "setup_dev.py",
            "01_tensor": "tensor_dev.py",
            "02_activations": "activations_dev.py",
            "03_layers": "layers_dev.py",
            "05_networks": "networks_dev.py",
            "06_cnn": "cnn_dev.py",
            "07_dataloader": "dataloader_dev.py",
            "08_autograd": "autograd_dev.py",
            "09_optimizers": "optimizers_dev.py",
            "10_training": "training_dev.py",
            "11_compression": "compression_dev.py",
            "12_kernels": "kernels_dev.py",
            "13_benchmarking": "benchmarking_dev.py",
            "14_mlops": "mlops_dev.py",
        }

    def convert_to_notebook(self, dev_file: Path) -> Optional[Path]:
        """Convert dev file to notebook using Jupytext."""
        print(f"📝 Converting {dev_file.name} to notebook")

        # Create temporary output file
        temp_notebook = dev_file.with_suffix('.temp.ipynb')

        # Use jupytext to convert .py to .ipynb
        cmd = ["jupytext", "--to", "ipynb", str(dev_file.absolute()), "--output", str(temp_notebook.absolute())]
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(f"❌ Failed to convert {dev_file} to notebook: {result.stderr}")
            return None
        return temp_notebook

    def remove_solutions(self, notebook_path: Path) -> Path:
        """Remove solutions from notebook."""
        with open(notebook_path, 'r') as f:
            notebook = json.load(f)

        # Process each cell
        for cell in notebook.get('cells', []):
            if cell.get('cell_type') == 'code':
                source = cell.get('source', [])
                new_source = []
                in_solution = False
                for line in source:
                    if '### BEGIN SOLUTION' in line:
                        in_solution = True
                        new_source.append(line)
                        new_source.append('    # YOUR CODE HERE\n')
                        new_source.append('    raise NotImplementedError()\n')
                        continue
                    elif '### END SOLUTION' in line:
                        in_solution = False
                        new_source.append(line)
                        continue
                    elif in_solution:
                        # Skip solution lines
                        continue
                    else:
                        new_source.append(line)
                cell['source'] = new_source

        # Save processed notebook
        output_path = notebook_path.with_suffix('.student.ipynb')
        with open(output_path, 'w') as f:
            json.dump(notebook, f, indent=2)
        return output_path

    def add_binder_config(self, notebook: Dict[str, Any], module_name: str) -> Dict[str, Any]:
        """Add Binder configuration to notebook metadata."""
        if 'metadata' not in notebook:
            notebook['metadata'] = {}
        notebook['metadata'].update({
            'kernelspec': {
                'display_name': 'Python 3',
                'language': 'python',
                'name': 'python3'
            },
            'language_info': {
                'name': 'python',
                'version': '3.8+'
            },
            'mystnb': {
                'execution_mode': 'auto'
            }
        })
        return notebook

    def extract_learning_goals(self, dev_file: Path) -> str:
        """Extract learning goals from source file and format as admonition block."""
        with open(dev_file, 'r') as f:
            content = f.read()

        # Find the Learning Goals section
        goals_start = content.find('## Learning Goals\n')
        if goals_start == -1:
            return ""

        # Find the end of the goals section (next ## heading)
        goals_content_start = goals_start + len('## Learning Goals\n')
        next_section = content.find('\n## ', goals_content_start)
        if next_section == -1:
            # If no next section found, look for next markdown cell
            next_section = content.find('\n# %%', goals_content_start)
        if next_section == -1:
            goals_text = content[goals_content_start:].strip()
        else:
            goals_text = content[goals_content_start:next_section].strip()

        # Format as admonition block
        admonition = ['```{admonition} 🎯 Learning Goals\n']
        admonition.append(':class: tip\n')
        for line in goals_text.split('\n'):
            if line.strip():
                admonition.append(f'{line}\n')
        admonition.append('```\n\n')
        return ''.join(admonition)

    def extract_module_overview(self, dev_file: Path) -> str:
        """Extract first markdown cell content for book overview."""
        with open(dev_file, 'r') as f:
            content = f.read()

        # Find first markdown cell
        start = content.find('# %% [markdown]\n"""')
        if start == -1:
            return ""
        end = content.find('"""', start + 20)
        if end == -1:
            return ""

        # Extract and clean the content
        overview = content[start + len('# %% [markdown]\n"""'):end].strip()

        # Replace Learning Goals section with admonition block
        learning_goals = self.extract_learning_goals(dev_file)
        if learning_goals and '## Learning Goals' in overview:
            # Find and replace the Learning Goals section
            goals_start = overview.find('## Learning Goals')
            if goals_start != -1:
                # Find end of goals section
                next_section = overview.find('\n## ', goals_start + 1)
                if next_section == -1:
                    # Goals are at the end
                    overview = overview[:goals_start] + learning_goals
                else:
                    # Replace goals section with admonition
                    overview = (overview[:goals_start] +
                                learning_goals +
                                overview[next_section:])
        return overview

    def create_module_overview_page(self, module_name: str) -> bool:
        """Create a module overview page for the book (hybrid approach)."""
        if module_name not in self.module_mapping:
            return False

        module_dir = self.modules_dir / module_name
        dev_file_name = self.dev_file_mapping.get(module_name)
        if not dev_file_name:
            return False
        dev_file = module_dir / dev_file_name
        if not dev_file.exists():
            return False

        module_info = self.module_mapping[module_name]

        # Extract overview content
        overview = self.extract_module_overview(dev_file)

        # Create interactive launch buttons
        github_url = f"https://github.com/mlsysbook/TinyTorch/blob/main/modules/source/{module_name}/{dev_file_name}"
        binder_url = f"https://mybinder.org/v2/gh/mlsysbook/TinyTorch/main?filepath=modules/source/{module_name}/{dev_file_name.replace('.py', '.ipynb')}"
        colab_url = f"https://colab.research.google.com/github/mlsysbook/TinyTorch/blob/main/modules/source/{module_name}/{dev_file_name.replace('.py', '.ipynb')}"

        interactive_section = f"""
## 🚀 Interactive Learning

Choose your preferred way to engage with this module:

````{{grid}} 1 2 3 3

```{{grid-item-card}} 🚀 Launch Binder
:link: {binder_url}
:class-header: bg-light
Run this module interactively in your browser. No installation required!
```

```{{grid-item-card}} ⚡ Open in Colab
:link: {colab_url}
:class-header: bg-light
Use Google Colab for GPU access and cloud compute power.
```

```{{grid-item-card}} 📖 View Source
:link: {github_url}
:class-header: bg-light
Browse the Python source code and understand the implementation.
```

````

```{{admonition}} 💾 Save Your Progress
:class: tip
**Binder sessions are temporary!** Download your completed notebook when done, or switch to local development for persistent work.

Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/serious-development.md)
```
"""

        # Combine everything
        page_content = overview + interactive_section

        # Save to chapters directory
        self.chapters_dir.mkdir(parents=True, exist_ok=True)
        output_file = self.chapters_dir / f"{module_info['filename']}.md"
        with open(output_file, 'w') as f:
            f.write(page_content)

        print(f"✅ Created overview page: {output_file}")
        return True

    def add_book_frontmatter(self, notebook: Dict[str, Any], module_name: str, title: str) -> Dict[str, Any]:
        """Add Jupyter Book frontmatter to the notebook."""
        # Create interactive learning admonition
        interactive_cell = {
            'cell_type': 'markdown',
            'metadata': {},
            'source': [
                '```{admonition} Interactive Learning\n',
                ':class: tip\n',
                '🚀 **Launch Binder**: Click the rocket icon above to run this chapter interactively!\n',
                '\n',
                '💾 **Save Your Work**: Download your completed notebook when done.\n',
                '\n',
                '🏗️ **Build Locally**: Ready for serious development? [Fork the repo](https://github.com/your-org/tinytorch) and work locally with the full `tito` workflow.\n',
                '```\n',
                '\n'
            ]
        }

        # Insert interactive cell after the first title cell
        cells = notebook.get('cells', [])

        # Find the first title cell and add interactive cell after it
        title_found = False
        for i, cell in enumerate(cells):
            if cell.get('cell_type') == 'markdown':
                source = ''.join(cell.get('source', []))
                if source.startswith('# '):
                    # Insert interactive cell after the title
                    cells.insert(i + 1, interactive_cell)
                    title_found = True
                    break
        if not title_found:
            cells.insert(0, interactive_cell)

        notebook['cells'] = cells
        return notebook

    def convert_module(self, module_name: str) -> bool:
        """Convert a single module to a chapter."""
        if module_name not in self.module_mapping:
            print(f"❌ Unknown module: {module_name}")
            return False

        module_dir = self.modules_dir / module_name
        if not module_dir.exists():
            print(f"❌ Module directory not found: {module_dir}")
            return False

        # Get the dev file name for this module
        dev_file_name = self.dev_file_mapping.get(module_name)
        if not dev_file_name:
            print(f"❌ No dev file mapping for {module_name}")
            return False

        dev_file = module_dir / dev_file_name
        if not dev_file.exists():
            print(f"❌ Dev file not found: {dev_file}")
            return False

        print(f"🔄 Converting {module_name}: {dev_file}")
        try:
            # Convert to notebook
            notebook_path = self.convert_to_notebook(dev_file)
            if not notebook_path:
                return False

            # Keep solutions (no NBGrader processing)
            # student_notebook_path = self.remove_solutions(notebook_path)  # Disabled - keep solutions

            # Load the full notebook with solutions
            with open(notebook_path, 'r') as f:
                notebook = json.load(f)

            # Add book-specific enhancements
            module_info = self.module_mapping[module_name]
            notebook = self.add_binder_config(notebook, module_name)
            # notebook = self.add_book_frontmatter(notebook, module_name, module_info['title'])  # Disabled for raw export

            # Save to chapters directory
            self.chapters_dir.mkdir(parents=True, exist_ok=True)
            output_file = self.chapters_dir / f"{module_info['filename']}.ipynb"
            with open(output_file, 'w') as f:
                json.dump(notebook, f, indent=2)

            print(f"✅ Created chapter: {output_file}")

            # Clean up temporary files
            notebook_path.unlink(missing_ok=True)
            return True
        except Exception as e:
            print(f"❌ Error converting {module_name}: {e}")
            return False

    def convert_all_modules(self) -> bool:
        """Convert all available modules."""
        print("🔄 Converting all TinyTorch modules to Jupyter Book chapters...")
        success_count = 0
        total_count = 0
        for module_name in self.module_mapping.keys():
            total_count += 1
            if self.convert_module(module_name):
                success_count += 1

        print("\n📊 Conversion Summary:")
        print(f"  ✅ Success: {success_count}/{total_count} modules")
        print(f"  📁 Output: {self.chapters_dir}")
        return success_count == total_count


def main():
    """Main conversion script."""
    import argparse

    parser = argparse.ArgumentParser(description="Convert TinyTorch modules to Jupyter Book")
    parser.add_argument('--module', help='Convert specific module (e.g., )')
    parser.add_argument('--all', action='store_true', help='Convert all modules')
    parser.add_argument('--overview', action='store_true', help='Create overview pages instead of full notebooks')
    parser.add_argument('--overview-module', help='Create overview page for specific module')
    args = parser.parse_args()

    converter = ModuleConverter()

    if args.overview_module:
        success = converter.create_module_overview_page(args.overview_module)
        sys.exit(0 if success else 1)
    elif args.overview:
        # Create overview pages for all modules
        print("🔄 Creating module overview pages for Jupyter Book...")
        success_count = 0
        total_count = 0
        for module_name in converter.module_mapping.keys():
            total_count += 1
            if converter.create_module_overview_page(module_name):
                success_count += 1
        print("\n📊 Overview Creation Summary:")
        print(f"  ✅ Success: {success_count}/{total_count} modules")
        print(f"  📁 Output: {converter.chapters_dir}")
        success = success_count == total_count
        sys.exit(0 if success else 1)
    elif args.module:
        success = converter.convert_module(args.module)
        sys.exit(0 if success else 1)
    elif args.all:
        success = converter.convert_all_modules()
        sys.exit(0 if success else 1)
    else:
        parser.print_help()
        sys.exit(1)


if __name__ == "__main__":
    main()


@@ -1,298 +0,0 @@
#!/usr/bin/env python3
"""
Convert module READMEs to Jupyter Book chapters.
This script takes README files from modules/source/*/README.md and converts them
to Jupyter Book chapters in book/chapters/ with proper frontmatter and web optimization.
"""
import os
import re
import yaml
from pathlib import Path
from typing import Dict, List, Optional
def get_module_info(module_path: Path) -> Dict[str, str]:
"""Extract module information from module.yaml file."""
yaml_path = module_path / "module.yaml"
if yaml_path.exists():
with open(yaml_path, 'r') as f:
module_data = yaml.safe_load(f)
return {
'title': module_data.get('title', module_path.name.replace('_', ' ').title()),
'description': module_data.get('description', ''),
'difficulty': module_data.get('difficulty', 'Intermediate'),
'time_estimate': module_data.get('time_estimate', '2-4 hours'),
'prerequisites': module_data.get('prerequisites', []),
'next_steps': module_data.get('next_steps', [])
}
return {}
def extract_learning_objectives(content: str) -> List[str]:
"""Extract learning objectives from README content."""
objectives = []
# Look for common patterns in READMEs
patterns = [
r'By the end of this module, you will:?\s*\n((?:- [^\n]+\n?)+)',
r'Learning Goals?:?\s*\n((?:- [^\n]+\n?)+)',
r'Learning Objectives?:?\s*\n((?:- [^\n]+\n?)+)'
]
for pattern in patterns:
match = re.search(pattern, content, re.IGNORECASE | re.MULTILINE)
if match:
objectives_text = match.group(1)
objectives = [line.strip('- ').strip() for line in objectives_text.split('\n') if line.strip().startswith('-')]
break
return objectives
def create_frontmatter(module_name: str, module_info: Dict[str, str], objectives: List[str]) -> str:
"""Create Jupyter Book frontmatter for the chapter."""
# Clean up module name for title
title = module_info.get('title', module_name.replace('_', ' ').title())
frontmatter = f"""---
title: "{title}"
description: "{module_info.get('description', '')}"
difficulty: "{module_info.get('difficulty', 'Intermediate')}"
time_estimate: "{module_info.get('time_estimate', '2-4 hours')}"
prerequisites: {module_info.get('prerequisites', [])}
next_steps: {module_info.get('next_steps', [])}
learning_objectives: {objectives}
---
"""
return frontmatter
def enhance_content_for_web(content: str, module_name: str, module_num: int) -> str:
"""Enhance README content for web presentation."""
# Remove existing grid cards to prevent conflicts with new interactive elements
# Pattern to match grid sections (from ```{grid} to closing ```)
grid_pattern = r'```\{grid\}[^`]*?```'
content = re.sub(grid_pattern, '', content, flags=re.DOTALL)
# Also remove individual grid-item-card patterns that might be floating
grid_item_pattern = r'\{grid-item-card\}[^`]*?```'
content = re.sub(grid_item_pattern, '', content, flags=re.DOTALL)
# Clean up any remaining grid-related patterns
content = re.sub(r'\{grid-item-card\}[^\n]*\n', '', content)
content = re.sub(r':link:[^\n]*\n', '', content)
content = re.sub(r':class-[^:]*:[^\n]*\n', '', content)
# Clean up multiple newlines that result from removals
content = re.sub(r'\n{3,}', '\n\n', content)
# Add badges for difficulty and time
difficulty = get_difficulty_stars(module_name)
time_estimate = get_time_estimate(module_name)
badges = f"\n```{{div}} badges\n{difficulty} | ⏱️ {time_estimate}\n```\n"
# Get previous and next module names for navigation
prev_module = f"{module_num-1:02d}_{get_prev_module_name(module_num)}" if module_num > 1 else None
# Add interactive learning elements and navigation at the end
interactive_elements = f"""
Choose your preferred way to engage with this module:
````{{grid}} 1 2 3 3
```{{grid-item-card}} 🚀 Launch Binder
:link: https://mybinder.org/v2/gh/mlsysbook/TinyTorch/main?filepath=modules/source/{module_name}/{module_name.split('_', 1)[1]}_dev.ipynb
:class-header: bg-light
Run this module interactively in your browser. No installation required!
```
```{{grid-item-card}} ⚡ Open in Colab
:link: https://colab.research.google.com/github/mlsysbook/TinyTorch/blob/main/modules/source/{module_name}/{module_name.split('_', 1)[1]}_dev.ipynb
:class-header: bg-light
Use Google Colab for GPU access and cloud compute power.
```
```{{grid-item-card}} 📖 View Source
:link: https://github.com/mlsysbook/TinyTorch/blob/main/modules/source/{module_name}/{module_name.split('_', 1)[1]}_dev.py
:class-header: bg-light
Browse the Python source code and understand the implementation.
```
````
```{{admonition}} 💾 Save Your Progress
:class: tip
**Binder sessions are temporary!** Download your completed notebook when done, or switch to local development for persistent work.
Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/serious-development.md)
```
---
"""
# Add navigation links
nav_links = "<div class=\"prev-next-area\">\n"
if prev_module:
nav_links += f'<a class="left-prev" href="../chapters/{prev_module}.html" title="previous page">← Previous Module</a>\n'
# Get total number of modules dynamically
module_names = get_module_names()
if module_num < len(module_names):
next_module = f"{module_num+1:02d}_{get_next_module_name(module_num)}"
nav_links += f'<a class="right-next" href="../chapters/{next_module}.html" title="next page">Next Module →</a>\n'
nav_links += "</div>\n"
# Combine interactive elements with navigation
nav_links = interactive_elements + nav_links
# Insert badges after the first heading
lines = content.split('\n')
enhanced_lines = []
added_badges = False
for i, line in enumerate(lines):
# Keep the meaningful module headers but clean up the breadcrumb reference
if line.startswith('# ') and not added_badges:
# Keep "Module: CNN" format, just remove emoji for clean display
if '🔥 Module:' in line:
line = line.replace('🔥 ', '') # Remove emoji, keep "Module: CNN"
enhanced_lines.append(line)
# Add badges after first heading
if not added_badges and line.startswith('# '):
enhanced_lines.append(badges)
added_badges = True
# Add navigation at the end
enhanced_lines.append(nav_links)
return '\n'.join(enhanced_lines)
def get_difficulty_stars(module_name: str) -> str:
"""Get difficulty stars from module.yaml file."""
# Map module number to module folder name
module_path = Path(f'../modules/source/{module_name}')
module_info = get_module_info(module_path)
return module_info.get('difficulty', '⭐⭐')
def get_time_estimate(module_name: str) -> str:
"""Get time estimate from module.yaml file."""
# Map module number to module folder name
module_path = Path(f'../modules/source/{module_name}')
module_info = get_module_info(module_path)
return module_info.get('time_estimate', '3-4 hours')
def get_module_names() -> List[str]:
"""Get actual module names from module.yaml files."""
modules_dir = Path("../modules/source")
module_names = []
# Get all module directories (sorted by number)
module_dirs = []
for item in modules_dir.iterdir():
if item.is_dir() and item.name != 'utils':
# Extract module number from directory name
match = re.match(r'(\d+)_(.+)', item.name)
if match:
module_num = int(match.group(1))
module_dirs.append((module_num, item))
# Sort by module number
module_dirs.sort(key=lambda x: x[0])
# Read module names from module.yaml files
for module_num, module_dir in module_dirs:
module_yaml_path = module_dir / "module.yaml"
if module_yaml_path.exists():
module_info = get_module_info(module_dir)
module_names.append(module_info.get('name', module_dir.name.split('_', 1)[1]))
else:
# Fallback to directory name
module_names.append(module_dir.name.split('_', 1)[1])
return module_names
def get_prev_module_name(module_num: int) -> str:
"""Get previous module name."""
module_names = get_module_names()
return module_names[module_num - 2] if module_num > 1 and module_num - 2 < len(module_names) else 'setup'
def get_next_module_name(module_num: int) -> str:
"""Get next module name."""
module_names = get_module_names()
return module_names[module_num] if module_num < len(module_names) else module_names[-1] if module_names else 'setup'
def convert_readme_to_chapter(readme_path: Path, chapter_path: Path, module_num: int):
"""Convert a single README to a Jupyter Book chapter."""
print(f"Converting {readme_path} to {chapter_path}")
# Read README content
with open(readme_path, 'r', encoding='utf-8') as f:
content = f.read()
# Get module information
module_path = readme_path.parent
module_name = module_path.name
module_info = get_module_info(module_path)
# Extract learning objectives
objectives = extract_learning_objectives(content)
# Create frontmatter
frontmatter = create_frontmatter(module_name, module_info, objectives)
# Enhance content for web
enhanced_content = enhance_content_for_web(content, module_name, module_num)
# Write chapter file
with open(chapter_path, 'w', encoding='utf-8') as f:
f.write(frontmatter)
f.write(enhanced_content)
print(f"✅ Created {chapter_path}")
def main():
"""Convert all module READMEs to Jupyter Book chapters."""
# Setup paths
modules_dir = Path("../modules/source")
chapters_dir = Path("chapters")
# Ensure chapters directory exists
chapters_dir.mkdir(exist_ok=True)
# Get all module directories (sorted by number)
module_dirs = []
for item in modules_dir.iterdir():
if item.is_dir() and item.name != 'utils':
# Extract module number from directory name
match = re.match(r'(\d+)_(.+)', item.name)
if match:
module_num = int(match.group(1))
module_dirs.append((module_num, item))
# Sort by module number
module_dirs.sort(key=lambda x: x[0])
print(f"Found {len(module_dirs)} modules to convert")
# Convert each README
for module_num, module_dir in module_dirs:
readme_path = module_dir / "README.md"
if readme_path.exists():
# Create chapter filename (just module number and name, no duplicate)
chapter_filename = f"{module_num:02d}-{module_dir.name.split('_', 1)[1]}.md"
chapter_path = chapters_dir / chapter_filename
convert_readme_to_chapter(readme_path, chapter_path, module_num)
else:
print(f"⚠️ No README.md found in {module_dir}")
print(f"\n🎉 Converted {len(module_dirs)} modules to chapters in {chapters_dir}")
if __name__ == "__main__":
main()


@@ -1,663 +0,0 @@
# Frequently Asked Questions
## 🤔 Getting Started Questions
### **Installation & Setup**
**Q: I'm getting "tito: command not found" - what's wrong?**
A: This usually means your virtual environment isn't activated or TinyTorch isn't installed:
```bash
# 1. Activate virtual environment
source .venv/bin/activate # Windows: .venv\Scripts\activate
# 2. Install TinyTorch
pip install -e .
# 3. Verify installation
tito system doctor
```
**Q: What Python version do I need?**
A: Python 3.8 or higher. Check with:
```bash
python --version # Should show 3.8+
```
**Q: Can I use conda instead of venv?**
A: Yes! Replace the venv setup with:
```bash
conda create -n tinytorch python=3.9
conda activate tinytorch
pip install -r requirements.txt && pip install -e .
```
**Q: The installation is taking forever - is this normal?**
A: Initial setup typically takes 2-5 minutes depending on your connection. The main time is downloading NumPy, Jupyter, and other scientific packages.
---
## 📚 Learning Questions
### **Course Structure**
**Q: How long does TinyTorch take to complete?**
A: Depends on your goals and pace:
| **Goal** | **Time** | **Coverage** | **What You'll Build** |
|----------|----------|--------------|----------------------|
| **Quick Taste** | 15 minutes | Demo + overview | See framework in action |
| **Weekend Project** | 8-12 hours | Modules 1-6 | Neural network solver |
| **Neural Networks** | 4 weeks | Modules 1-8 | MNIST classifier |
| **Computer Vision** | 6 weeks | Modules 1-10 | CIFAR-10 CNN |
| **Language Models** | 8 weeks | Modules 1-14 | TinyGPT generator |
| **Full Framework** | 12 weeks | All 20 modules | Production-ready system |
**Q: Do I need machine learning experience to start?**
A: **No!** TinyTorch teaches ML systems from fundamentals. You need:
**✅ Required:**
- Basic Python (functions, classes, imports)
- High school math (multiplication, basic algebra)
- Curiosity about how things work
**❌ Not Required:**
- Previous ML experience
- Deep learning knowledge
- Advanced mathematics
- PyTorch/TensorFlow experience
**Q: Can I skip modules or do them out of order?**
A: **No** - the progression is carefully designed:
- Each module builds on previous implementations
- Later modules import code from earlier ones
- Checkpoints verify prerequisites are met
- Skipping creates import errors and broken functionality
**Example:** Module 6 (Autograd) requires your Tensor class from Module 2. Skipping Module 2 breaks everything that follows.
**Q: What if I get stuck on a difficult concept?**
A: Multiple support options:
1. **Interactive Help**: `tito help --interactive` for personalized guidance
2. **Module README**: Each module has detailed explanations
3. **Community Support**: Join leaderboard for peer help
4. **Troubleshooting**: `tito help troubleshooting` for common issues
5. **Office Hours**: If taking as a course, use instructor support
### **Learning Methods**
**Q: Should I read everything before coding, or jump right into coding?**
A: **Jump into coding!** TinyTorch uses active learning:
- Read just enough to understand the task
- Start implementing immediately
- Learn through building and testing
- Explanations become clearer after you've tried the code
**Q: How much time should I spend on each module?**
A: Varies by module and experience:
| **Module Type** | **Typical Time** | **Examples** |
|----------------|------------------|--------------|
| **Foundation** | 2-4 hours | Tensors, Activations |
| **Architecture** | 3-5 hours | Layers, Training |
| **Advanced** | 4-6 hours | Attention, Transformers |
| **Optimization** | 2-3 hours | Profiling, Benchmarking |
**Don't rush!** Deep understanding matters more than speed.
**Q: What's the difference between modules and checkpoints?**
A: **Modules** = Building, **Checkpoints** = Validating
| **Modules** | **Checkpoints** |
|-------------|-----------------|
| 20 hands-on coding sessions | 16 capability assessments |
| You build implementations | Tests verify understanding |
| `tito module complete 05` | `tito checkpoint test 05` |
| Export code to framework | Validate you achieved capability |
**Workflow:** Complete module → Export implementation → Checkpoint test validates learning
---
## 🛠️ Technical Questions
### **Development Workflow**
**Q: Why can't I edit files in the `tinytorch/` directory?**
A: Those files are **auto-generated** from your source modules:
**✅ Edit Here:**
```
modules/02_tensor/tensor_dev.py ← Your source code
```
**❌ Don't Edit:**
```
tinytorch/core/tensor.py ← Generated from source
```
**Workflow:**
1. Edit source: `modules/0X_name/name_dev.py`
2. Export: `tito module complete 0X_name`
3. Uses your code: `from tinytorch.core.name import Component`
**Q: What's the difference between .py and .ipynb files?**
A: **TinyTorch uses .py files only** for all development:
- **Source**: `tensor_dev.py` (edit this)
- **Generated**: `tensor_dev.ipynb` (auto-created from .py)
- **Never edit**: `.ipynb` files directly
**Why .py only?**
- Clean version control (no JSON metadata)
- Professional development practices
- Consistent environment across contributors
- Easy code review and collaboration
**Q: My tests are failing after implementing a function - what's wrong?**
A: Common debugging steps:
1. **Check syntax**: Run the module file directly
```bash
python modules/03_activations/activations_dev.py
```
2. **Verify function signature**: Make sure your function matches the expected interface
```python
# Expected
def relu(x: np.ndarray) -> np.ndarray:
# Not this
def relu(x): # Missing type hints
```
3. **Test incrementally**: Run tests after each function
```bash
tito checkpoint test 02 --verbose
```
4. **Check imports**: Ensure NumPy is imported as `np`
**Q: How do I run just one test instead of all tests?**
A: Use specific test commands:
```bash
# Test specific checkpoint
tito checkpoint test 03
# Test specific module export
tito module complete 03_activations --dry-run
# Run module file directly
python modules/03_activations/activations_dev.py
```
### **System Issues**
**Q: Jupyter Lab won't start - what's wrong?**
A: Common solutions:
1. **Check installation**:
```bash
pip install jupyterlab jupyter
jupyter lab --version
```
2. **Port conflict**:
```bash
jupyter lab --port 8889 # Try different port
```
3. **Virtual environment**:
```bash
source .venv/bin/activate # Ensure activated
which jupyter # Should show .venv path
```
**Q: I'm getting import errors when testing - help!**
A: Import errors usually mean:
1. **Virtual environment not activated**:
```bash
source .venv/bin/activate
```
2. **TinyTorch not installed in development mode**:
```bash
pip install -e . --force-reinstall
```
3. **Module not exported**:
```bash
tito module complete 0X_module_name
```
4. **Check your export directive**:
```python
#| default_exp tinytorch.core.module_name # At top of file
```
---
## 🌍 Community Questions
### **Leaderboard & Community**
**Q: Is the leaderboard competitive or supportive?**
A: **Both!** We designed it to be inclusive and encouraging:
**🏆 Multiple Ways to Excel:**
- **Progress**: Checkpoint completion (everyone can achieve)
- **Speed**: Fast learners (if that's your style)
- **Innovation**: Creative optimizations (for advanced users)
- **Community**: Helping others (valuable contribution)
**🤝 Supportive Culture:**
- Celebrate all achievements, not just "first place"
- Anonymous participation options available
- Community helps each other learn
- No shame in taking time to understand concepts
**Q: Do I have to share my progress publicly?**
A: **No!** Participation is entirely optional:
- All learning features work without leaderboard
- Checkpoint system tracks progress locally
- Join community only when/if you want to
- Privacy controls let you share what you're comfortable with
**Q: What information is shared when I join the leaderboard?**
A: You control what's shared:
**Always Shared:**
- Display name (you choose - can be pseudonymous)
- Checkpoint completion status
- Module completion dates
**Optionally Shared:**
- Real name (if you choose)
- Institution/company
- Achievement celebrations
- Optimization benchmarks
**Never Shared:**
- Personal information
- Email addresses
- Code implementations
- Detailed progress metrics (unless you opt in)
### **Competition & Olympics**
**Q: What are the Olympics and how are they different from the leaderboard?**
A: **Leaderboard** = Learning Progress, **Olympics** = Performance Competition
| **Leaderboard** | **Olympics** |
|-----------------|--------------|
| Track learning progress | Compete on optimization |
| Checkpoint completion | Benchmark performance |
| Supportive community | Competitive challenges |
| All experience levels | Advanced optimization |
**Olympics Events:**
- **MLP Sprint**: Fastest matrix operations
- **CNN Marathon**: Memory-efficient convolutions
- **Transformer Decathlon**: Complete language model optimization
**Q: Do I need to be an expert to participate in Olympics?**
A: **No!** Olympics have multiple categories:
- **Beginner**: Just-working implementations compete
- **Intermediate**: Solid optimizations
- **Advanced**: Cutting-edge techniques
- **Innovation**: Novel approaches
**Everyone can contribute and learn from others' solutions.**
---
## 🎓 Instructor Questions
### **Classroom Setup**
**Q: How much setup is required to use TinyTorch in my class?**
A: **Minimal!** TinyTorch includes complete teaching infrastructure:
**One-time Setup (30 minutes):**
```bash
tito nbgrader setup-instructor
tito grade setup-course
```
**Per-semester Setup (10 minutes):**
```bash
tito nbgrader create-student-repos
tito grade release-module 01_setup
```
**Everything Included:**
- NBGrader integration works out-of-the-box
- Student progress tracking built-in
- Automated grading workflow
- Assignment release/collection system
**Q: Can I customize the curriculum for my specific course?**
A: **Absolutely!** TinyTorch is designed for flexibility:
**Duration Options:**
- **4 weeks**: Neural network foundations (Modules 1-8)
- **8 weeks**: Add computer vision (Modules 1-10)
- **12 weeks**: Include language models (Modules 1-14)
- **16 weeks**: Complete system optimization (All 20)
**Difficulty Customization:**
- **Beginner**: Additional scaffolding and explanations
- **Advanced**: Extra optimization challenges
- **Research**: Custom project integration
**Q: How do I track student progress across the class?**
A: Multiple tracking tools built-in:
```bash
# Class overview
tito grade class-overview
# Individual student
tito grade student-progress student_name
# Checkpoint statistics
tito checkpoint class-stats
# Module completion rates
tito grade module-stats 05_losses
```
**Visual dashboards show:**
- Who's completed which modules
- Where students are getting stuck
- Average completion times
- Achievement distributions
### **Grading & Assessment**
**Q: How does automated grading work?**
A: **Three-layer validation system:**
1. **Functional Tests**: Does the code work correctly?
2. **Interface Tests**: Does it match expected function signatures?
3. **Checkpoint Tests**: Can student use their implementation?
```bash
# Grade student submission
tito nbgrader autograde 05_losses student_name
# Results show:
# ✅ Function implementation (40 points)
# ✅ Interface compliance (30 points)
# ✅ Integration test (30 points)
# Total: 100/100
```
**Q: What if a student's implementation works but doesn't match the test exactly?**
A: **Flexible grading system:**
- **Core functionality**: Must work correctly (non-negotiable)
- **Implementation details**: Multiple valid approaches accepted
- **Code style**: Guidance provided, not penalized
- **Performance**: Bonus points for optimization, not required
**Manual review system** catches edge cases and provides personalized feedback.
**Q: How do I handle students working at different paces?**
A: **Built-in flexibility:**
**Self-paced Options:**
- Students can work ahead through modules
- Checkpoint system validates readiness for advanced topics
- Extra credit opportunities for early finishers
**Support for Struggling Students:**
- Extended deadlines through system configuration
- Additional scaffolding materials included
- Peer tutoring through community features
- Office hours integration with progress tracking
---
## 🔧 Troubleshooting
### **Common Error Messages**
**Error: `ModuleNotFoundError: No module named 'tinytorch'`**
**Solutions:**
```bash
# 1. Activate virtual environment
source .venv/bin/activate
# 2. Install in development mode
pip install -e .
# 3. Verify installation
python -c "import tinytorch; print('Success!')"
```
**Error: `AttributeError: module 'tinytorch.core.tensor' has no attribute 'Tensor'`**
**Cause:** Module not exported or export failed
**Solutions:**
```bash
# 1. Check export status
tito module status 02_tensor
# 2. Re-export module
tito module complete 02_tensor
# 3. Verify export worked
python -c "from tinytorch.core.tensor import Tensor; print('Success!')"
```
**Error: Tests pass individually but fail in checkpoint**
**Cause:** Integration issues between modules
**Solutions:**
```bash
# 1. Test integration
tito checkpoint test 05 --verbose
# 2. Check all dependencies exported
tito module status --all
# 3. Re-export dependency chain
tito module complete 02_tensor
tito module complete 03_activations
# ... up to current module
```
### **Performance Issues**
**Q: Training is really slow - is this normal?**
A: **Some slowness is expected** (you're building from scratch!), but here's how to optimize:
**Expected Performance:**
- **Pure NumPy**: 10-100x slower than PyTorch
- **Simple examples**: Should complete in seconds
- **CIFAR-10 training**: 5-10 minutes per epoch
**Optimization Tips:**
```python
# Use vectorized operations
result = np.dot(x, weights)      # fast: one BLAS call

# Avoid Python loops
result = 0.0
for i in range(len(x)):          # slow: interpreted per-element work
    result += x[i] * weights[i]
```
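To see the gap concretely, here is a quick timing sketch (array sizes are arbitrary; exact timings vary by machine):

```python
import time
import numpy as np

x = np.random.rand(200_000)
weights = np.random.rand(200_000)

t0 = time.perf_counter()
fast = np.dot(x, weights)            # one vectorized BLAS call
t_fast = time.perf_counter() - t0

t0 = time.perf_counter()
slow = 0.0
for i in range(len(x)):              # interpreted per-element loop
    slow += x[i] * weights[i]
t_slow = time.perf_counter() - t0

print(f"vectorized: {t_fast:.5f}s, loop: {t_slow:.5f}s")
```

Both compute the same dot product; only the execution model differs.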
**Q: My computer is running out of memory during training**
A: **Memory management strategies:**
1. **Reduce batch size**:
```python
batch_size = 32 # Instead of 256
```
2. **Use gradient accumulation**:
```python
# Accumulate gradients over mini-batches
optimizer.step_every_n_batches(4)
```
3. **Profile memory usage**:
```bash
tito checkpoint test 10 --profile-memory
```
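The `step_every_n_batches` call above is shorthand; written out by hand, gradient accumulation is just averaging gradients over several mini-batches before each update. A minimal sketch with made-up numbers (the stand-in gradient and shapes are illustrative only):

```python
import numpy as np

# Hypothetical setup: average gradients over 4 mini-batches per update
accum_steps = 4
params = np.array([1.0, 2.0, 3.0])
grad_buffer = np.zeros_like(params)
lr = 0.1

for step in range(1, 9):                      # 8 mini-batches
    grad = np.ones_like(params)               # stand-in gradient
    grad_buffer += grad
    if step % accum_steps == 0:               # update once per 4 batches
        params -= lr * (grad_buffer / accum_steps)
        grad_buffer[:] = 0.0

print(params)  # two averaged updates: [0.8 1.8 2.8]
```

This trades a few extra additions for a 4x smaller peak batch in memory.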
---
## 💡 Best Practices
### **Learning Effectively**
**Q: What's the best way to approach each module?**
A: **Follow the Build → Use → Reflect pattern:**
**1. Build (Implementation)**
- Read the introduction to understand the goal
- Implement functions one at a time
- Test each function immediately after writing it
**2. Use (Integration)**
- Export your module: `tito module complete 0X_name`
- Test the integration with checkpoint
- Use your component in examples
**3. Reflect (Understanding)**
- Answer the ML Systems Thinking questions
- Consider memory usage and performance
- Connect to production ML systems
**Q: How do I know if I really understand a concept?**
A: **True understanding means you can:**
1. **Implement from memory**: Re-write the function without looking
2. **Explain to others**: Describe how and why it works
3. **Debug problems**: Fix issues when something breaks
4. **Optimize performance**: Improve memory or speed
5. **Connect to production**: Relate to PyTorch/TensorFlow internals
**Checkpoint tests verify some of this, but self-reflection is crucial.**
### **Time Management**
**Q: I'm spending too much time on implementation details - should I move on?**
A: **Balance depth with progress:**
**When to Push Through:**
- Core concepts not clicking yet
- Function doesn't work correctly
- Tests are failing
**When to Move On:**
- Function works and passes tests
- You understand the main concept
- You're optimizing minor details
**Remember:** You can always return to optimize later. The goal is understanding systems, not perfect code.
**Q: Should I complete all modules before starting real projects?**
A: **No!** Start projects as soon as you have the basics:
- **After Module 6**: Build XOR solver
- **After Module 8**: Train MNIST classifier
- **After Module 10**: CIFAR-10 CNN
- **After Module 14**: TinyGPT language model
**Real projects reinforce learning and show practical applications.**
---
## 🚀 Getting More Help
### **When These FAQs Don't Help**
**1. Interactive CLI Help**
```bash
tito help --interactive # Personalized guidance
tito help troubleshooting # Common technical issues
```
**2. System Diagnostics**
```bash
tito system doctor # Comprehensive system check
```
**3. Community Support**
- Join leaderboard for peer help and discussion
- Share specific error messages for targeted assistance
- Learn from others' solutions and approaches
**4. Documentation Resources**
- **Module README files**: Detailed explanations for each topic
- **User Manual**: Comprehensive guide to all features
- **Instructor Guide**: Teaching resources and classroom management
**5. Course Support (if applicable)**
- Office hours with instructor
- Class discussion forums
- Teaching assistant support
### **Reporting Issues**
**Found a bug or unclear documentation?**
Please include:
- **System info**: Output of `tito system doctor`
- **Error message**: Complete traceback if available
- **Steps to reproduce**: What commands led to the issue
- **Expected vs actual**: What you expected to happen
**Contact through:**
- Course instructor (if taking as class)
- Community leaderboard (for peer support)
- GitHub issues (for bug reports)
---
**Still have questions? Try `tito help --interactive` for personalized guidance! 🚀**


@@ -1,232 +0,0 @@
# KISS Principle in TinyTorch
## Keep It Simple, Stupid
The KISS principle is at the core of TinyTorch's design philosophy. Every component, interface, and implementation follows this fundamental rule: **simplicity enables understanding**.
## Why KISS Matters in ML Education
### Traditional ML Frameworks: Complexity by Default
Most production ML frameworks prioritize performance and features over clarity:
```python
# PyTorch: Multiple ways to do everything
torch.nn.Conv2d(3, 64, kernel_size=3, padding=1) # Object-oriented
F.conv2d(x, weight, bias, padding=1) # Functional
torch.conv2d(x, weight, bias, padding=[1,1]) # Low-level
# Result: Students learn APIs, not concepts
```
### TinyTorch: Clarity by Design
TinyTorch chooses the simplest approach that teaches the concept:
```python
# TinyTorch: One clear way to do each operation
Conv2D(in_channels=3, out_channels=64, kernel_size=3, padding=1)
# Result: Students understand the operation itself
```
## KISS in Practice
### 1. Single Responsibility Components
Every class has one clear purpose:
```python
# ✅ GOOD: Clear, single responsibility
class ReLU:
    def forward(self, x):
        self.last_input = x          # cache input for the backward pass
        return np.maximum(0, x)

    def backward(self, grad_output):
        return grad_output * (self.last_input > 0)

# ❌ AVOID: Multiple responsibilities
class ActivationWithDropoutAndNormalization:
    # Too many concerns in one class
    ...
```
### 2. Minimal Interfaces
Functions do one thing with clear inputs/outputs:
```python
# ✅ GOOD: Simple, predictable interface
def conv2d(input, weight, bias=None, stride=1, padding=0):
# Implementation...
return output
# ❌ AVOID: Complex, unclear interface
def conv2d_advanced(input, weight, bias=None, stride=1, padding=0,
dilation=1, groups=1, padding_mode='zeros',
output_padding=0, **kwargs):
# Too many options obscure the core concept
```
### 3. Explicit Over Implicit
Make the "magic" visible:
```python
# ✅ GOOD: Shows what's happening
def train_step(model, loss_fn, optimizer, batch_x, batch_y):
# Forward pass
pred = model(batch_x)
loss = loss_fn(pred, batch_y)
# Backward pass
loss.backward()
optimizer.step()
optimizer.zero_grad()
return loss.data
# ❌ AVOID: Hidden complexity
def train_step(trainer, batch):
return trainer.step(batch) # What actually happens?
```
## KISS Design Decisions
### File Organization
```
# ✅ Simple structure
tinytorch/
├── core/ # Core implementations
├── utils/ # Utilities
└── datasets/ # Data handling
# vs. complex hierarchies with deep nesting
```
### Module Design
- **One concept per module**: Tensors, Activations, Layers, etc.
- **Progressive complexity**: Each module builds on previous ones
- **Self-contained**: Each module can be understood independently
### Code Style
- **No magic methods**: `__add__` is clear, `__radd__` is confusing
- **Explicit names**: `conv2d` not `conv`, `ReLU` not `R`
- **Minimal inheritance**: Composition over complex hierarchies
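The last bullet, composition over inheritance, is easiest to see in a tiny sketch (the class names here are illustrative, not TinyTorch's actual modules):

```python
import numpy as np

# Composition: a container simply holds the layers it is given,
# instead of layers inheriting behavior from a deep class hierarchy.
class Scale:
    def __init__(self, factor):
        self.factor = factor

    def forward(self, x):
        return x * self.factor

class ReLU:
    def forward(self, x):
        return np.maximum(0, x)

class Sequential:
    def __init__(self, *layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:   # just call each layer in order
            x = layer.forward(x)
        return x

model = Sequential(Scale(2.0), ReLU())
out = model.forward(np.array([-1.0, 3.0]))
print(out)  # [0. 6.]
```

Students can add, remove, or reorder layers without touching any base class.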
## Educational Benefits
### 1. Cognitive Load Reduction
Simple code means students focus on concepts, not syntax:
```python
# Cognitive load: LOW - focus on the math
def sigmoid(x):
return 1 / (1 + np.exp(-x))
# Cognitive load: HIGH - distracted by implementation details
def sigmoid(x, inplace=False, dtype=None, device=None, memory_format=None):
# Complex implementation with many edge cases
```
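One caveat worth noting in the same spirit: the simple sigmoid overflows inside `np.exp` for large negative inputs. A still-simple, numerically stable variant (an illustrative sketch, not necessarily the version TinyTorch ships) looks like:

```python
import numpy as np

def sigmoid_stable(x):
    # Split by sign so np.exp is only ever called on non-positive values
    out = np.empty_like(x, dtype=float)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    exp_x = np.exp(x[~pos])              # safe: x < 0 here
    out[~pos] = exp_x / (1.0 + exp_x)
    return out

vals = sigmoid_stable(np.array([-1000.0, 0.0, 1000.0]))
print(vals)  # [0.  0.5 1. ]
```

The extra branch is a deliberate, documented trade-off, not hidden complexity.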
### 2. Debugging Clarity
When something breaks, simple code is easy to debug:
```python
# ✅ Easy to debug: clear execution path
def forward(self, x):
self.last_input = x
return np.maximum(0, x)
# ❌ Hard to debug: hidden state and side effects
def forward(self, x):
return self._apply_with_state_management(x, self._relu_impl)
```
### 3. Modification Confidence
Simple code invites experimentation:
```python
# Students think: "I can modify this!"
def adam_update(param, grad, m, v, lr=0.001, beta1=0.9, beta2=0.999):
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    param -= lr * m / (np.sqrt(v) + 1e-8)
    return param, m, v

# Students think: "I better not touch this..."
# [100 lines of optimized, abstracted update logic]
```
## KISS vs. Performance
### The Trade-off
KISS sometimes means choosing clarity over peak performance:
```python
# TinyTorch: Clear but not optimized
def conv2d_simple(input, kernel):
    k_h, k_w = kernel.shape
    output_height = input.shape[0] - k_h + 1
    output_width = input.shape[1] - k_w + 1
    output = np.zeros((output_height, output_width))
    for i in range(output_height):
        for j in range(output_width):
            # Clear nested loops show the operation
            output[i, j] = np.sum(input[i:i+k_h, j:j+k_w] * kernel)
    return output

# Production: Optimized but opaque
def conv2d_optimized(input, kernel):
    # BLAS calls, memory optimization, SIMD instructions
    return torch._C._nn.conv2d(input, kernel, ...)
```
### When We Optimize
We add optimization layers **after** establishing clarity:
1. **First**: Implement the clearest possible version
2. **Then**: Profile and identify bottlenecks
3. **Finally**: Add optimizations with clear documentation
### Documentation of Trade-offs
Every optimization is explained:
```python
def conv2d_vectorized(input, kernel):
    """Vectorized convolution implementation.

    This version uses im2col transformation for speed.
    For the clear, educational version, see conv2d_simple().

    Trade-off: 10x faster, but obscures the sliding window concept.
    """
```
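As a sketch of what that im2col transformation looks like, the helper below unrolls every sliding window into a row so the convolution collapses into one matrix product. This is illustrative (single-channel, stride 1, no padding), not the module's actual implementation:

```python
import numpy as np

def im2col(x, k_h, k_w):
    """Unroll k_h x k_w sliding windows of x into rows of a matrix."""
    h, w = x.shape
    out_h, out_w = h - k_h + 1, w - k_w + 1
    cols = np.empty((out_h * out_w, k_h * k_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = x[i:i+k_h, j:j+k_w].ravel()
    return cols

def conv2d_im2col(x, kernel):
    """Convolution as one matmul: im2col rows dotted with the flat kernel."""
    k_h, k_w = kernel.shape
    out_h, out_w = x.shape[0] - k_h + 1, x.shape[1] - k_w + 1
    return (im2col(x, k_h, k_w) @ kernel.ravel()).reshape(out_h, out_w)
```

The result matches the nested-loop version exactly; the speed comes from replacing per-window Python loops with a single optimized matrix multiply.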
## KISS Guidelines for Contributors
### Before Adding Complexity
Ask these questions:
1. **Is this essential for understanding the concept?**
2. **Can students modify this confidently?**
3. **Does this make debugging easier or harder?**
4. **Is there a simpler way to achieve the same goal?**
### Code Review Checklist
- [ ] Single responsibility per function/class
- [ ] Clear, explicit names
- [ ] Minimal parameter lists
- [ ] No hidden state or side effects
- [ ] Students can understand the implementation
- [ ] Debugging is straightforward
### Refactoring Triggers
Refactor when:
- Functions have more than 3-4 parameters
- Classes have more than one clear responsibility
- Students ask "what does this do?" frequently
- Debugging requires deep knowledge of implementation details
## The KISS Promise
TinyTorch promises that every component follows KISS principles:
- **You can understand any implementation in 5 minutes**
- **You can modify any component confidently**
- **When something breaks, you can debug it yourself**
- **The simplest solution is always preferred**
This isn't just about code - it's about **empowering learners** to become confident ML systems engineers who understand their tools completely.
Remember: **Complex problems often have simple solutions. Simple solutions enable deep understanding.**

View File

@@ -1,89 +0,0 @@
# Quick Exploration Path
**Perfect for:** "I want to see what this is about" • "Can I try this without installing anything?"
**Time Commitment:** 5-30 minutes • **Setup Required:** None
---
## Launch Immediately (0 Setup Required)
Click the **Launch Binder** button on any chapter to get:
- Live Jupyter environment in your browser
- Pre-configured TinyTorch development setup
- Ability to run and modify all code immediately
- No installation, no account creation needed
```{admonition} What You'll Experience in 5-30 Minutes
:class: tip
**Immediate implementation experience** with real ML components:
- **5 min**: ReLU activation function from scratch
- **10 min**: Tensor operations that power neural networks
- **15 min**: Dense layers that transform data
- **20 min**: Complete neural networks for image classification
- **30 min**: See how language models use the same foundations
All running live in your browser with zero setup!
```
---
## Recommended Exploration Path
### Start Here: Chapter 1 - Setup
- Understand the TinyTorch development workflow
- Get familiar with the educational approach
- See how components fit together
**[Launch Setup Chapter](../chapters/01-setup.md)**
### Then Try: Chapter 3 - Activations
- Implement your first ML function (ReLU)
- See immediate visual results
- Understand why nonlinearity matters
**[Launch Activations Chapter](../chapters/03-activations.md)**
### Build Up: Chapter 4 - Layers
- Create the building blocks of neural networks
- Combine your ReLU with matrix operations
- See how simple math becomes powerful AI
**[Launch Layers Chapter](../chapters/04-layers.md)**
---
## Important Limitations
**Sessions are temporary:**
- Binder sessions timeout after ~20 minutes of inactivity
- Your work is **not saved** when the session ends
- Great for exploration, not for ongoing projects
**For persistent work:** Ready to build your own TinyTorch? → **[Serious Development Path](serious-development.md)**
---
## What You'll Understand
After exploring 2-3 chapters, you'll have hands-on understanding of:
- **How ML frameworks work under the hood**
- **Why activation functions are crucial**
- **How matrix multiplication powers neural networks**
- **The relationship between layers, networks, and learning**
- **Real implementation vs. high-level APIs**
- **Why vision and language models share the same foundations**
---
## Next Steps
**Satisfied with exploration?** You've gained valuable insight into ML systems!
**Want to build more?** → **[Fork the repo and work locally](serious-development.md)**
**Teaching a class?** → **[Classroom setup guide](classroom-use.md)**
---
*No commitment required - just click and explore!*

View File

@@ -1,244 +0,0 @@
# Serious Development Path
**Perfect for:** "I want to build this myself" • "This is my class assignment" • "I want to understand ML frameworks deeply"
---
## What You'll Build
A complete ML framework from scratch, including:
- **Your own tensor library** with operations and autograd
- **Neural network components** (layers, activations, optimizers)
- **Training systems** that work on real datasets (CIFAR-10)
- **Production features** (compression, monitoring, benchmarking)
- **Language models** that extend your vision framework to TinyGPT
**End result:** A working ML framework that powers both computer vision AND language models.
---
## Quick Start (5 minutes)
### Step 1: Get the Code
```bash
git clone https://github.com/your-org/tinytorch.git
cd TinyTorch
```
### Step 2: Setup Environment
```bash
# Activate virtual environment
source bin/activate-tinytorch.sh
# Install dependencies
make install
# Verify everything works
tito system doctor
```
### Step 3: Start Building
```bash
# Open first assignment
cd modules/01_setup
jupyter lab setup_dev.py
```
### Step 4: Build → Test → Export → Use
```bash
# After implementing code in the notebook:
tito export # Export your code to tinytorch package
tito test setup # Test your implementation
# Now use YOUR own code:
python -c "from tinytorch.core.setup import hello_tinytorch; hello_tinytorch()"
# 🔥 TinyTorch! Built by: [Your Name]
```
---
## Learning Path (Progressive Complexity)
### Foundation (Weeks 1-2)
Build the core infrastructure:
**Module 01: Setup & CLI**
- Professional development workflow with `tito` CLI
- Understanding package architecture and exports
- Quality assurance with automated testing
**Module 01: Tensors**
- Multi-dimensional arrays and operations
- Memory management and data types
- Foundation for all ML operations
**Module 02: Activations**
- ReLU, Sigmoid, Tanh, Softmax functions
- Understanding nonlinearity in neural networks
- Mathematical foundations of deep learning
---
### 🧱 Building Blocks (Weeks 3-4)
Create neural network components:
**Module 03: Layers**
- Dense (linear) layers with matrix multiplication
- Weight initialization strategies
- Building blocks that stack together
**Module 04: Networks**
- Sequential model architecture
- Composition patterns and forward propagation
- Creating complete neural networks
**Module 05: CNNs**
- Convolutional operations for computer vision
- Understanding spatial processing
- Building blocks for image classification
---
### Training Systems (Weeks 5-6)
Complete training infrastructure:
**Module 06: DataLoader**
- Efficient data loading and preprocessing
- Real dataset handling (CIFAR-10)
- Batching, shuffling, and memory management
**Module 07: Autograd**
- Automatic differentiation engine
- Computational graphs and backpropagation
- The magic that makes training possible
**Module 08: Optimizers**
- SGD, Adam, and learning rate scheduling
- Understanding gradient descent variants
- Convergence and training dynamics
**Module 09: Training**
- Complete training loops and loss functions
- Model evaluation and metrics
- Checkpointing and persistence
---
### Production & Performance (Weeks 7-8)
Real-world deployment:
**Module 10: Compression**
- Model pruning and quantization
- Reducing model size by 75%+
- Deployment optimization
**Module 11: Kernels**
- High-performance custom operations
- Hardware-aware optimization
- Understanding framework internals
**Module 12: Benchmarking**
- Systematic performance measurement
- Statistical validation and reporting
- MLPerf-style evaluation
**Module 13: MLOps**
- Production deployment and monitoring
- Continuous learning and model updates
- Complete production pipeline
**Module 16: TinyGPT 🔥**
- Extend vision framework to language models
- GPT-style transformers with 95% component reuse
- Autoregressive text generation
- Framework generalization mastery
---
## Development Workflow
### The `tito` CLI System
TinyTorch includes a complete CLI for professional development:
```bash
# System management
tito system doctor # Check environment health
tito system info # Show module status
# Module development
tito export # Export dev code to package
tito test setup # Test specific module
tito test --all # Test everything
# NBGrader integration
tito nbgrader generate setup # Create assignments
tito nbgrader release setup # Release to students
tito nbgrader autograde setup # Auto-grade submissions
```
### Quality Assurance
Every module includes comprehensive testing:
- **100+ automated tests** ensure correctness
- **Inline tests** provide immediate feedback
- **Integration tests** verify cross-module functionality
- **Performance benchmarks** track optimization
---
## Proven Student Outcomes
```{admonition} Real Results
:class: success
**After 6-8 weeks, students consistently:**
✅ Build multi-layer perceptrons that classify CIFAR-10 images
✅ Implement automatic differentiation from scratch
✅ Create custom optimizers (SGD, Adam) that converge reliably
✅ Optimize models with pruning and quantization
✅ Deploy production ML systems with monitoring
✅ Understand framework internals better than most ML engineers
🔥 **Extend their vision framework to language models with 95% reuse**
**Test Coverage:** 200+ tests across all modules ensure student implementations work
```
---
## Why This Approach Works
### Build → Use → Understand
Every component follows this pattern:
1. **🔧 Build**: Implement `ReLU()` from scratch
2. **🚀 Use**: `from tinytorch.core.activations import ReLU` - your code!
3. **💡 Understand**: See how it enables complex pattern learning
### Real Data, Real Systems
- Work with CIFAR-10 (not toy datasets)
- Production-style code organization
- Performance and engineering considerations
- Professional development practices
### Immediate Feedback
- Code works immediately after implementation
- Visual progress indicators and success messages
- Comprehensive error handling and guidance
- Professional-quality development experience
---
## Ready to Start?
### Choose Your Module
**New to ML frameworks?** → Start with [Setup](../chapters/01-setup.md)
**Have ML experience?** → Jump to [Tensors](../chapters/01-tensor.md)
**Want to see the vision?** → Try [Activations](../chapters/02-activations.md)
### Get Help
- **💬 Discussions**: GitHub Discussions for questions
- **🐛 Issues**: Report bugs or suggest improvements
- **📧 Support**: Direct contact with TinyTorch team
---
*🎉 Ready to build your own ML framework? Your unified vision+language framework is 8 weeks away!*

View File

@@ -1,103 +0,0 @@
#!/usr/bin/env python3
"""
Verify that the Jupyter Book build is complete and all pages are present.
"""
import os
from pathlib import Path

from rich.console import Console
from rich.table import Table
from rich.panel import Panel

console = Console()


def verify_book_build():
    """Verify the book build is complete."""
    build_dir = Path("book/_build/html")

    if not build_dir.exists():
        console.print("❌ Build directory not found! Run 'tito book build' first.")
        return False

    # Pages that must exist
    required_pages = {
        "Main Pages": [
            "index.html",
            "intro.html",
            "setup.html",
            "instructor-guide.html",
            "system-architecture.html"
        ],
        "Module Chapters": [
            f"chapters/{i:02d}-{name}.html" for i, name in enumerate([
                "introduction", "setup", "tensor", "activations", "layers",
                "dense", "spatial", "attention", "dataloader", "autograd",
                "optimizers", "training", "compression", "kernels",
                "benchmarking", "mlops", "tinygpt"
            ], 0)
        ],
        "New Documentation": [
            "testing-framework.html",
            "kiss-principle.html"
        ],
        "Usage Paths": [
            "usage-paths/quick-start.html",
            "usage-paths/browse-online.html",
            "usage-paths/serious-development.html"
        ]
    }

    # Check each category
    results = {}
    for category, pages in required_pages.items():
        results[category] = []
        for page in pages:
            full_path = build_dir / page
            exists = full_path.exists()
            size = full_path.stat().st_size if exists else 0
            results[category].append({
                'page': page,
                'exists': exists,
                'size': size
            })

    # Display results
    console.print(Panel.fit(
        "📚 [bold blue]TinyTorch Jupyter Book Verification[/bold blue]",
        border_style="blue"
    ))

    all_good = True
    for category, checks in results.items():
        console.print(f"\n[bold]{category}[/bold]")
        for check in checks:
            if check['exists']:
                if check['size'] > 100:  # More than just a redirect
                    console.print(f"  ✅ {check['page']} ({check['size']:,} bytes)")
                else:
                    console.print(f"  ⚠️ {check['page']} (small: {check['size']} bytes)")
            else:
                console.print(f"  ❌ {check['page']} (missing)")
                all_good = False

    # Summary
    if all_good:
        console.print(Panel.fit(
            "✨ [bold green]All documentation pages built successfully![/bold green]\n"
            f"🌐 View at: file://{build_dir.absolute()}/index.html",
            border_style="green"
        ))
    else:
        console.print(Panel.fit(
            "⚠️ [bold yellow]Some pages are missing![/bold yellow]\n"
            "Run 'tito book build' to rebuild the documentation.",
            border_style="yellow"
        ))

    return all_good


if __name__ == "__main__":
    os.chdir(Path(__file__).parent.parent)  # Go to project root
    success = verify_book_build()
    exit(0 if success else 1)

View File

@@ -1,213 +0,0 @@
# The TinyTorch Vision
**Training ML Systems Engineers: From Computer Vision to Language Models**
---
## The Problem We're Solving
The ML field has a critical gap: **most education teaches you to use frameworks, not build them.**
### Traditional ML Education:
```python
import torch
import torch.nn as nn

model = nn.Linear(784, 10)
optimizer = torch.optim.Adam(model.parameters())
```
**Questions students can't answer:**
- Why does Adam use 3× more memory than SGD?
- How does `loss.backward()` actually compute gradients?
- When should you use gradient accumulation vs larger batch sizes?
- Why do attention mechanisms limit context length?
### The TinyTorch Difference:
```python
class Linear:
    def __init__(self, in_features, out_features):
        self.weight = Tensor(np.random.randn(in_features, out_features))
        self.bias = Tensor(np.zeros(out_features))

    def forward(self, x):
        self.x = x                          # save input for the backward pass
        return x @ self.weight + self.bias  # YOU implemented @

    def backward(self, grad_output):
        # YOU understand exactly how gradients flow
        self.weight.grad = self.x.T @ grad_output
        return grad_output @ self.weight.T
```
**Questions students CAN answer:**
- Exactly how automatic differentiation works
- Why certain optimizers use more memory
- How to debug training instability
- When to make performance vs accuracy trade-offs
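The hand-derived backward rule above can be verified numerically. This gradient-check sketch uses plain NumPy arrays (not TinyTorch tensors) and takes the loss to be `sum(y)` so that the upstream gradient is all ones:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 2))
grad_output = np.ones((4, 2))   # dL/dy for L = sum(y)

# Analytic gradient from the backward rule: dL/dW = x.T @ grad_output
analytic = x.T @ grad_output

# Central-difference numerical gradient, one weight at a time
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        numeric[i, j] = ((x @ Wp).sum() - (x @ Wm).sum()) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-4)
```

This kind of check is exactly how students can convince themselves their backward passes are correct.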
---
## What We Teach: Systems Thinking
### Beyond Algorithms: System-Level Understanding
**Memory Management:**
- Why Adam needs 3× parameter memory (parameters + momentum + variance)
- How attention matrices scale O(N²) with sequence length
- When gradient accumulation saves memory vs compute trade-offs
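The Adam-vs-SGD memory point above is back-of-envelope arithmetic; the helper below is an illustrative sketch (the function name and numbers are assumptions, not TinyTorch APIs). Adam keeps two extra float32 buffers per parameter (momentum and variance), hence 3x:

```python
def optimizer_memory_mb(num_params, state_slots, bytes_per=4):
    """Parameters plus per-parameter optimizer state, in MB.

    state_slots: 0 for plain SGD, 1 for SGD+momentum, 2 for Adam (m and v).
    """
    return num_params * (1 + state_slots) * bytes_per / 1e6

n = 10_000_000                    # a 10M-parameter model
sgd = optimizer_memory_mb(n, 0)   # 40 MB: parameters only
adam = optimizer_memory_mb(n, 2)  # 120 MB: parameters + momentum + variance
```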
**Performance Analysis:**
- Why naive convolution is 100× slower than optimized versions
- How cache misses destroy performance in matrix operations
- When vectorization provides 10-100× speedups
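The vectorization claim is easy to measure directly. A rough sketch (exact timings and speedups vary by machine):

```python
import time
import numpy as np

x = np.random.randn(1_000_000)

# Python-level loop: one interpreter step per element
t0 = time.perf_counter()
loop_sum = 0.0
for v in x:
    loop_sum += v * v
t_loop = time.perf_counter() - t0

# Single vectorized call into optimized C/BLAS code
t0 = time.perf_counter()
vec_sum = float(np.dot(x, x))
t_vec = time.perf_counter() - t0

print(f"loop: {t_loop:.3f}s  vectorized: {t_vec:.4f}s  speedup: {t_loop / t_vec:.0f}x")
```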
**Production Trade-offs:**
- SGD vs Adam: convergence speed vs memory constraints
- Gradient checkpointing: trading compute for memory
- Mixed precision: 2× memory savings with accuracy considerations
**Hardware Awareness:**
- How memory bandwidth limits ML performance
- Why GPU utilization matters more than peak FLOPS
- When distributed training becomes necessary
---
## Target Audience: Future ML Systems Engineers
### Perfect For:
**Computer Science Students**
- Going beyond "use PyTorch" to "understand PyTorch"
- Building portfolio projects that demonstrate deep system knowledge
- Preparing for ML engineering roles (not just data science)
**Software Engineers → ML Engineers**
- Leveraging existing programming skills for ML systems
- Understanding performance, debugging, and optimization
- Learning production ML patterns and infrastructure
**ML Practitioners**
- Moving from model users to model builders
- Debugging training issues at the systems level
- Optimizing models for production deployment
**Researchers & Advanced Users**
- Implementing custom operations and architectures
- Understanding framework limitations and workarounds
- Building specialized ML systems for unique domains
### Career Transformation:
**Before TinyTorch:** "I can train models with PyTorch"
**After TinyTorch:** "I can build and optimize ML systems"
You become the person your team asks:
- *"Why is our training bottlenecked?"*
- *"Can we fit this model in memory?"*
- *"How do we implement this research paper?"*
- *"What's the best architecture for our constraints?"*
---
## Pedagogical Philosophy: Build → Use → Understand
### 1. Build First
Every component implemented from scratch:
- Tensors with broadcasting and memory management
- Automatic differentiation with computational graphs
- Optimizers with state management and memory profiling
- Complete training loops with checkpointing and monitoring
### 2. Use Immediately
No toy examples - recreate ML history with real results:
- **MLP Era**: Train MLPs to 52.7% CIFAR-10 accuracy (the baseline that motivated CNNs)
- **CNN Revolution**: Build LeNet-1 (39.4%) and LeNet-5 (47.5%) - witness the breakthrough
- **Modern CNNs**: Push beyond MLPs with optimized architectures (75%+ achievable)
- **Transformer Era**: Language models using 95% vision framework reuse
### 3. Understand Systems
Connect implementations to production reality:
- How your tensor maps to PyTorch's memory model
- Why your optimizer choices affect GPU utilization
- How your autograd compares to production frameworks
- When your implementations would need modification at scale
### 4. Reflect on Trade-offs
ML Systems Thinking sections in every module:
- Memory vs compute trade-offs in different architectures
- Accuracy vs efficiency considerations for deployment
- Debugging strategies for common production issues
- Framework design principles and their implications
---
## Unique Value Proposition
### What Makes TinyTorch Different:
**Systems-First Approach**
- Not just "how does attention work" but "why does attention scale O(N²) and how do production systems handle this?"
- Not just "implement SGD" but "when do you choose SGD vs Adam in production?"
**Production Relevance**
- Memory profiling, performance optimization, deployment patterns
- Real datasets, realistic scale, professional development workflow
- Connection to industry practices and framework design decisions
**Framework Generalization**
- 20 modules that build ONE cohesive ML framework supporting vision AND language
- 95% component reuse from computer vision to language models
- Professional package structure with CLI tools and testing
**Proven Pedagogy**
- Build → Use → Understand cycle creates deep intuition
- Immediate testing and feedback for every component
- Progressive complexity with solid foundations
- NBGrader integration for classroom deployment
---
## Learning Outcomes: Becoming an ML Systems Engineer
### Technical Mastery
- **Implement any ML paper** from first principles
- **Debug training issues** at the systems level
- **Optimize models** for production deployment
- **Profile and improve** ML system performance
- **Design custom architectures** for specialized domains
- **Understand framework generalization** across vision and language
### Systems Understanding
- **Memory management** in ML frameworks
- **Computational complexity** vs real-world performance
- **Hardware utilization** patterns and optimization
- **Distributed training** challenges and solutions
- **Production deployment** considerations and trade-offs
### Professional Skills
- **Test-driven development** for ML systems
- **Performance profiling** and optimization techniques
- **Code organization** and package development
- **Documentation** and API design
- **MLOps** and production monitoring
### Career Impact
- **Technical interviews**: Demonstrate deep ML systems knowledge
- **Job opportunities**: Qualify for ML engineer (not just data scientist) roles
- **Team leadership**: Become the go-to person for ML systems questions
- **Research ability**: Implement cutting-edge papers independently
- **Entrepreneurship**: Build ML products with full-stack understanding
---
## Ready to Become an ML Systems Engineer?
**TinyTorch transforms ML users into ML builders.**
Stop wondering how frameworks work. Start building them.
**[Begin Your Journey →](chapters/00-introduction.md)**
---
*TinyTorch: Because understanding how to build ML systems makes you a more effective ML engineer.*

View File

@@ -1,428 +0,0 @@
# Module 17: Compression - Comprehensive Review Report
**Date**: 2025-11-10
**Reviewer**: TinyTorch Standards Compliance
**Module**: compression_dev.py (1720 lines)
**Status**: ⚠️ NEEDS SIGNIFICANT IMPROVEMENTS
---
## Executive Summary
Module 17 (Compression) is a **well-structured educational module** that covers important ML compression techniques. However, it has **critical violations** of TinyTorch standards that must be addressed before it can be considered complete.
**Overall Score**: 6.5/10
### Critical Issues Found:
1. ❌ **Sequential class definition violates composition rules** (CRITICAL)
2. ❌ **Missing `__main__` guards for test execution** (CRITICAL)
3. ⚠️ **NBGrader cell metadata incomplete** (HIGH)
4. ⚠️ **Systems analysis sections could be more focused** (MEDIUM)
5. ✅ Good educational content and clear explanations
6. ✅ Comprehensive test coverage
---
## 1. NBGrader Cell Structure ❌ ISSUES FOUND
### Issues:
1. **Missing cell metadata on many cells** - Not all code cells have proper NBGrader metadata
2. **Inconsistent grade_id naming** - Some cells lack unique identifiers
3. **Missing "locked" flags on test cells** - Test cells should be marked as locked
### Examples of Problems:
```python
# Line 59: MISSING specific nbgrader metadata
# %% nbgrader={"grade": false, "grade_id": "imports", "solution": true}
# Should specify: "locked": false, "schema_version": 3, "solution": true

# Lines 362-379: Test cell MISSING grade metadata
def test_unit_measure_sparsity():
    """🔬 Test sparsity measurement functionality."""
    # Should have: {"grade": true, "grade_id": "test-measure-sparsity", "locked": true, "points": 5}
```
### Required Fixes:
**Metadata Template for Implementation Cells:**
```python
# %% nbgrader={"grade": false, "grade_id": "cell-unique-id", "locked": false, "schema_version": 3, "solution": true}
```
**Metadata Template for Test Cells:**
```python
# %% nbgrader={"grade": true, "grade_id": "test-unique-id", "locked": true, "points": 5, "schema_version": 3}
```
---
## 2. Educational Content & Docstrings ✅ EXCELLENT
### Strengths:
- ✅ Clear progression from motivation to implementation
- ✅ Excellent ASCII diagrams explaining compression techniques
- ✅ Comprehensive docstrings with TODO/APPROACH/HINTS
- ✅ Strong mathematical foundations explained clearly
- ✅ Real-world production context throughout
### Examples of Excellence:
```python
# Lines 295-319: Excellent sparsity visualization
"""
Dense Matrix (0% sparse): Sparse Matrix (75% sparse):
┌─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┐ ┌─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┐
│ 2.1 1.3 0.8 1.9 2.4 1.1 0.7 │ │ 2.1 0.0 0.0 1.9 0.0 0.0 0.0 │
...
```
- Lines 322-360: Perfect docstring structure with TODO/APPROACH/EXAMPLE/HINT
- Lines 842-923: Outstanding knowledge distillation explanation with diagrams
### Minor Improvements Needed:
- Some sections could be more concise (avoid over-explanation)
- A few technical terms could benefit from simpler analogies
---
## 3. Imports and Module Structure ⚠️ CRITICAL VIOLATION
### CRITICAL ISSUE: Sequential Class Definition
**Lines 73-91: FORBIDDEN pattern detected**
```python
# Sequential container for model compression
class Sequential:
    """Sequential container for compression (not exported from core layers)."""

    def __init__(self, *layers):
        self.layers = list(layers)
```
**Why This Violates TinyTorch Standards:**
From the agent rules:
> ❌ FORBIDDEN: Sequential containers that chain layers
> Modules NEVER build COMPOSITIONS that hide student work
**The Problem:**
- Sequential is a **composition class** that hides layer interactions
- Students should see explicit layer chaining in milestones/examples
- Modules build ATOMIC COMPONENTS, not compositions
- This breaks the pedagogical principle of visible data flow
**Required Fix:**
```python
# REMOVE Sequential class entirely from module
# Instead, let milestones/examples show explicit composition:

class MLP:  # In milestone, NOT in module
    def __init__(self):
        self.layer1 = Linear(784, 128)
        self.relu = ReLU()
        self.layer2 = Linear(128, 10)

    def forward(self, x):
        x = self.layer1.forward(x)  # Students SEE each step
        x = self.relu.forward(x)
        x = self.layer2.forward(x)
        return x
```
**Impact:**
- Tests currently use Sequential (lines 367, 498, 655, etc.)
- Need to rewrite tests to use explicit layer chaining
- Or import Sequential from a milestone helper (if available)
---
## 4. Memory Profiling & Performance Benchmarking ⚠️ NEEDS IMPROVEMENT
### Current State:
- ✅ Has profiling integration (lines 103-155, 1249-1317)
- ✅ Compression technique comparison (lines 1327-1377)
- ⚠️ Missing detailed memory analysis for sparse vs dense storage
- ⚠️ Missing timing comparisons for pruned vs unpruned inference
### Existing Good Examples:
**Lines 1249-1317: Excellent profiler integration**
```python
def demo_compression_with_profiler():
    """📊 Demonstrate parameter reduction using Profiler from Module 15."""
    # Shows before/after parameter counts, sparsity, memory
```
### Missing Analysis:
**Should Add:**
1. **Sparse Storage Formats Analysis**
```python
def analyze_sparse_storage_formats():
    """Compare COO, CSR, CSC storage for different sparsity levels."""
    # Show memory overhead of indices
    # Show when sparse format beats dense
```
2. **Inference Time Impact**
```python
def analyze_pruning_speedup():
    """Measure actual inference time with/without sparse libraries."""
    # Show that pruning alone doesn't guarantee speedup
    # Demonstrate need for sparse BLAS libraries
```
3. **Memory Access Patterns**
```python
def analyze_cache_efficiency():
    """Compare structured vs unstructured sparsity memory patterns."""
    # Show cache miss rates
    # Demonstrate hardware acceleration benefits
```
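As a concrete starting point for the first recommendation, a minimal sketch of the dense-vs-COO breakeven math (simple byte counting, not the real `scipy.sparse` formats) might look like:

```python
import numpy as np

def dense_bytes(weights, value_dtype=np.float32):
    """Memory for plain dense storage: one value per entry."""
    return weights.size * np.dtype(value_dtype).itemsize

def coo_bytes(weights, index_dtype=np.int32, value_dtype=np.float32):
    """Memory for COO storage: one (row, col, value) triple per nonzero."""
    nnz = int(np.count_nonzero(weights))
    return nnz * (2 * np.dtype(index_dtype).itemsize + np.dtype(value_dtype).itemsize)

w = np.random.randn(256, 256).astype(np.float32)
w[np.abs(w) < 1.5] = 0.0  # prune to roughly 87% sparsity

sparsity = 1 - np.count_nonzero(w) / w.size
print(f"sparsity: {sparsity:.0%}  dense: {dense_bytes(w)} B  COO: {coo_bytes(w)} B")
# With 4-byte values and two 4-byte indices, COO only wins once
# nnz * 12 < size * 4, i.e. above ~67% sparsity.
```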
---
## 5. ML Systems Analysis Content ⚠️ GOOD BUT COULD BE BETTER
### Current Systems Analysis:
**Lines 1230-1324: Good foundation**
- ✅ Compression technique comparison
- ✅ Profiler integration demonstration
- ✅ Parameter reduction tracking
**Lines 1327-1377: analyze_compression_techniques()**
- ✅ Compares magnitude vs structured pruning
- ✅ Shows compression ratios across model sizes
- ⚠️ Could add timing measurements
**Lines 1387-1417: analyze_distillation_effectiveness()**
- ✅ Shows teacher-student compression ratios
- ⚠️ Simulated data instead of real measurements
- ⚠️ Missing actual training/inference time comparison
### Recommendations:
1. **Add Real Measurements**: Replace simulated data with actual profiling
2. **Compare All Techniques**: Side-by-side comparison of all compression methods
3. **Hardware Impact**: Show how different techniques affect different hardware
4. **Production Patterns**: Reference real-world compression pipelines (BERT, MobileNet)
---
## 6. Test Coverage ✅ EXCELLENT
### Test Structure:
- ✅ Unit tests for every function (test_unit_*)
- ✅ Comprehensive module integration test (test_module)
- ✅ Clear test descriptions and assertions
- ✅ Realistic test scenarios
### Unit Tests Present:
1. ✅ test_unit_measure_sparsity() - Lines 362-379
2. ✅ test_unit_magnitude_prune() - Lines 493-525
3. ✅ test_unit_structured_prune() - Lines 650-684
4. ✅ test_unit_low_rank_approximate() - Lines 799-829
5. ✅ test_unit_knowledge_distillation() - Lines 1035-1064
6. ✅ test_unit_compress_model() - Lines 1196-1227
### Integration Test:
- ✅ test_module() - Lines 1427-1523
- ✅ Tests complete pipeline
- ✅ Validates all techniques work together
### **CRITICAL ISSUE: Missing `__main__` Guards**
**Lines 379, 525, 684, 829, 1064, 1227, 1523:** Tests run at module level without protection
```python
# CURRENT (WRONG):
test_unit_measure_sparsity()  # Runs on import!

# REQUIRED (CORRECT):
if __name__ == "__main__":
    test_unit_measure_sparsity()  # Only runs when executing module directly
```
**Impact:**
- Tests execute when module is imported by other modules
- Causes unnecessary output and potential errors
- Violates the dependency chain rules
- Module 18+ cannot cleanly import from Module 17
**Fix Required for ALL test calls:**
```python
def test_unit_measure_sparsity():
    """🔬 Test sparsity measurement functionality."""
    # Test implementation
    pass

# Add this guard IMMEDIATELY after test definition:
if __name__ == "__main__":
    test_unit_measure_sparsity()
```
---
## 7. Production Context & Real-World Applications ✅ EXCELLENT
### Strengths:
- ✅ Clear deployment scenarios (mobile, edge, cloud) - Lines 1099-1132
- ✅ Production compression pipelines explained - Lines 1076-1094
- ✅ Hardware considerations throughout
- ✅ Real-world compression ratios cited
- ✅ Knowledge distillation use cases
### Examples of Excellence:
**Lines 1099-1132: Deployment scenarios**
```python
MOBILE APP (Aggressive compression needed):
• Magnitude pruning: 95% sparsity
• Structured pruning: 50% channels
• Knowledge distillation: 10x reduction
```
**Lines 167-179: Real constraints**
```python
- Modern language models: 100GB+ (GPT-3 scale)
- Mobile devices: <1GB available for models
- Edge devices: <100MB realistic limits
```
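For context on the 95%-sparsity figure cited above, a minimal magnitude-pruning sketch (hypothetical signature, not the module's actual `magnitude_prune`):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (illustrative)."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.random.randn(100, 100)
p = magnitude_prune(w, 0.95)
print(f"sparsity after pruning: {1 - np.count_nonzero(p) / p.size:.2%}")
```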
---
## Detailed Issue Breakdown
### Priority 1: CRITICAL (Must Fix Before Export)
1. **Remove Sequential Class** (Lines 73-91)
- Violates composition principle
- Replace with explicit layer usage in tests
- Add note directing students to milestones for composition
2. **Add `__main__` Guards to ALL Test Calls**
- Lines: 379, 525, 684, 829, 1064, 1227, 1523
- Prevents tests from running on import
- Critical for Module 18+ to import cleanly
3. **Fix NBGrader Metadata**
- Add complete metadata to all cells
- Ensure consistent grade_id naming
- Mark test cells as locked with points
### Priority 2: HIGH (Should Fix Soon)
4. **Add Missing Systems Analysis Functions**
- Sparse storage format comparison
- Inference time measurements (pruned vs unpruned)
- Cache efficiency analysis
5. **Improve Existing Analysis**
- Replace simulated data with real measurements
- Add timing data to compression technique comparison
- Show hardware-specific differences
### Priority 3: MEDIUM (Nice to Have)
6. **Module Structure Improvements**
- Consider splitting into submodules if growing
- Add more cross-references to other modules
- Clarify package export structure
7. **Documentation Enhancements**
- Add references to academic papers
- Include real-world case studies
- Link to production implementations
---
## Compliance Checklist
### NBGrader Requirements
- ⚠️ **Jupytext headers**: Present but could be more complete
- ❌ **Cell metadata**: Incomplete, missing schema_version
- ✅ **BEGIN/END SOLUTION blocks**: Properly used
- ✅ **Scaffolding outside solution blocks**: Excellent
- ⚠️ **Test cells locked**: Missing lock flags
### Educational Quality
- ✅ **Cognitive load**: Well-managed, 2-3 concepts per section
- ✅ **Progressive disclosure**: Excellent flow
- ✅ **Immediate feedback**: Unit tests after each function
- ✅ **Production connections**: Strong throughout
### Technical Quality
- ✅ **Implementation correctness**: All functions properly implemented
- ❌ **Module dependency rules**: Sequential class violates rules
- ❌ **Test isolation**: Tests run on import (missing guards)
- ✅ **Integration validation**: Comprehensive test_module()
### Systems Quality
- ⚠️ **Performance profiling**: Good but could be more comprehensive
- ⚠️ **Memory analysis**: Present but incomplete
- ✅ **Real-world implications**: Excellent
- ⚠️ **Trade-off discussions**: Good but could add more measurements
---
## Recommended Action Plan
### Phase 1: Critical Fixes (1-2 hours)
1. Remove Sequential class, refactor tests to use explicit layers
2. Add `__main__` guards to all test function calls
3. Update NBGrader metadata on all cells
### Phase 2: High Priority (2-3 hours)
4. Add sparse storage format analysis function
5. Add inference timing comparison function
6. Replace simulated data with real measurements
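For the inference-timing function (item 5), a minimal sketch might time the same matmul before and after magnitude pruning; because the zeros are still stored densely, no speedup appears without sparse kernels, which is exactly the measurement worth showing students (shapes and sparsity are illustrative):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512)).astype(np.float32)
x = rng.standard_normal((512, 128)).astype(np.float32)

# Magnitude-pruning stand-in: zero out the 95% smallest-magnitude weights
pruned = w.copy()
pruned[np.abs(pruned) < np.quantile(np.abs(pruned), 0.95)] = 0.0

def bench(mat, reps=10):
    start = time.perf_counter()
    for _ in range(reps):
        mat @ x
    return (time.perf_counter() - start) / reps

print(f"dense:  {bench(w) * 1e3:.3f} ms per matmul")
print(f"pruned: {bench(pruned) * 1e3:.3f} ms per matmul (zeros stored densely, not exploited)")
```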
### Phase 3: Polish (1-2 hours)
7. Review and enhance cross-references
8. Add academic paper references
9. Final consistency check
---
## Positive Highlights
Despite the issues, this module has many strengths:
1. **Excellent Educational Design**: Clear progression, strong explanations
2. **Comprehensive Coverage**: All major compression techniques included
3. **Strong Testing**: Unit tests and integration tests well-designed
4. **Production Context**: Real-world scenarios clearly explained
5. **Visual Aids**: Outstanding ASCII diagrams
6. **Mathematical Rigor**: Proper foundations explained clearly
---
## Final Verdict
**Current Status**: NOT READY FOR EXPORT
**With Critical Fixes**: READY FOR EXPORT
**Overall Assessment**: This is a **high-quality educational module** that needs **critical architectural fixes** to comply with TinyTorch standards. The Sequential class violation and missing `__main__` guards are blocking issues. Once these are resolved, this module will be an excellent addition to the curriculum.
**Estimated Time to Fix**: 4-8 hours for complete compliance
---
## Next Steps
1. Review this report with the development team
2. Prioritize Critical fixes (Priority 1)
3. Implement fixes following TinyTorch standards
4. Re-run validation after fixes
5. Export module once compliant
---
**Report Generated**: 2025-11-10
**Reviewer**: TinyTorch Quality Assurance
**Module**: 17_compression/compression_dev.py
**Lines Reviewed**: 1720
**Issues Found**: 7 (2 Critical, 2 High, 3 Medium)

---
# Module 15: Memoization (KV Caching) - Review Report
**Date**: 2025-11-10
**Reviewer**: TinyTorch Standards Compliance
**Status**: ✅ PASSING (Minor Issues Found)
---
## Executive Summary
Module 15 (Memoization/KV Caching) is **well-structured and production-ready** with excellent educational content. The module successfully implements KV caching for transformer inference optimization with comprehensive testing and systems analysis.
**Overall Grade: A- (92/100)**
### Key Strengths
- ✅ Comprehensive KVCache implementation with proper memory management
- ✅ Excellent educational scaffolding with clear TODO/APPROACH/HINTS
- ✅ Strong systems analysis with memory profiling and speedup measurements
- ✅ Non-invasive integration pattern (enhances existing modules without breaking them)
- ✅ All tests pass successfully
- ✅ Real-world context and production relevance throughout
### Issues Found
1. ⚠️ **CRITICAL**: Missing proper test file protection with `if __name__ == "__main__"`
2. ⚠️ **MEDIUM**: Module number inconsistency (says Module 14 in some places, should be 15)
3. ⚠️ **MINOR**: Missing comprehensive docstrings for analysis functions
4. ⚠️ **MINOR**: Some markdown cells could use better formatting
---
## Detailed Analysis
### 1. NBGrader Cell Structure ✅ PASSING
**Score: 95/100**
#### Strengths:
- ✅ Proper Jupytext headers present (lines 1-13)
- ✅ Correct NBGrader metadata on implementation cells
- ✅ BEGIN/END SOLUTION blocks properly used
- ✅ Test cells have locked=true and grade=true
- ✅ Unique grade_ids for all graded cells
#### Issues:
- ⚠️ Some cells missing nbgrader metadata (lines 79-141 profile section)
**Recommendation**: Add nbgrader metadata to analysis cells:
```python
# %% nbgrader={"grade": false, "grade_id": "motivation-profile", "locked": false}
```
---
### 2. Educational Content & Docstrings ✅ EXCELLENT
**Score: 98/100**
#### Strengths:
- ✅ Outstanding conceptual explanations (Parts 1-2)
- ✅ Clear ASCII diagrams showing cache architecture
- ✅ Excellent scaffolding with TODO/APPROACH/HINTS pattern
- ✅ Rich examples in docstrings
- ✅ Strong narrative flow explaining WHY caching matters
- ✅ Progressive disclosure - builds complexity gradually
#### Example of Excellent Scaffolding:
```python
def __init__(self, ...):
"""
TODO: Set up pre-allocated cache storage for all transformer layers
APPROACH:
1. Store configuration parameters (batch_size, max_seq_len, etc.)
2. Initialize sequence position counter to 0
3. Create empty list for cache storage
4. For each layer, pre-allocate zero-filled key and value caches
5. Store each layer's (key_cache, value_cache) tuple in the list
HINTS:
- Cache shape: (batch_size, num_heads, max_seq_len, head_dim)
- Use Tensor(np.zeros(...)) to create cache tensors
"""
```
#### Issues:
- ⚠️ Analysis functions (lines 1339-1427) lack comprehensive docstrings
- Could add more pedagogical notes explaining when students use .data vs Tensor operations
**Recommendation**: Add full docstrings to analysis functions with educational context.
---
### 3. Imports & Module Structure ✅ PASSING
**Score: 90/100**
#### Strengths:
- ✅ Proper package export declarations (`#| export`)
- ✅ Clean dependency management (only imports from tinytorch.core)
- ✅ Correct import pattern for profiler
- ✅ Good separation of concerns (KVCache, enable_kv_cache, disable_kv_cache)
#### Issues:
- ⚠️ **CRITICAL**: Module executes profiling code on import (lines 79-141)
- This violates the "test code protection" rule
- Should be wrapped in `if __name__ == "__main__":` block
- ⚠️ Module number confusion:
- Line 45: Says "modules/15_memoization" (correct)
- Line 1505: Says "tito module complete 14" (should be 15)
- Line 918: Says "Module 14" (should be 15)
**Recommendation**:
1. Wrap profiling code in main guard:
```python
if __name__ == "__main__":
# Profile transformer generation to discover the bottleneck
profiler = Profiler()
# ... rest of profiling code
```
2. Fix all references to "Module 14" → "Module 15"
---
### 4. Memory Profiling & Performance Benchmarking ✅ EXCELLENT
**Score: 100/100**
#### Strengths:
- ✅ Comprehensive `get_memory_usage()` method in KVCache
- ✅ Excellent `analyze_kvcache_memory()` comparing different model sizes
- ✅ Outstanding `analyze_kvcache_speedup()` with complexity analysis
- ✅ Clear visualization of memory-compute trade-offs
- ✅ Production context showing real-world GPU memory costs
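These memory figures follow directly from the cache tensor shapes; a minimal sketch of the arithmetic (the 4-layer, 8-head, head-dim-64 configuration is an inferred assumption that reproduces the 2.000 MB "medium model" figure in the test output later in this review):

```python
def kv_cache_bytes(num_layers, batch_size, num_heads, max_seq_len, head_dim, dtype_bytes=4):
    # One key cache + one value cache per layer, each of shape
    # (batch_size, num_heads, max_seq_len, head_dim)
    return 2 * num_layers * batch_size * num_heads * max_seq_len * head_dim * dtype_bytes

# Assumed "medium model": 4 layers, 8 heads, head_dim 64, max_seq_len 128, float32
print(f"{kv_cache_bytes(4, 1, 8, 128, 64) / 1024**2:.3f} MB")  # → 2.000 MB
```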
#### Example Excellence:
```python
def analyze_kvcache_speedup():
"""📊 Measure KV cache speedup vs vanilla attention."""
# Simulates O(n²) vs O(n) complexity
ops_without = sum(i**2 for i in range(1, gen_length + 1)) # O(n²)
ops_with = gen_length # O(n)
speedup = ops_without / ops_with
```
Shows students the EXACT mathematical reason for speedup!
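Plugging in a concrete generation length shows the scale of the win (a quick check of the counts in the snippet above, not the module's actual profiler):

```python
# Operation counts from the snippet above, for a 100-token generation
gen_length = 100
ops_without = sum(i**2 for i in range(1, gen_length + 1))  # recompute attention each step
ops_with = gen_length                                      # one cached step per token
print(ops_without, ops_with, ops_without / ops_with)       # → 338350 100 3383.5
```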
---
### 5. ML Systems Analysis ✅ EXCELLENT
**Score: 98/100**
#### Strengths:
- ✅ Outstanding motivation section with profiling (lines 71-141)
- ✅ Clear explanation of O(n²) → O(n) transformation
- ✅ Excellent trade-off analysis (memory vs compute)
- ✅ Real production numbers (GPT-3 cache sizes, ChatGPT usage)
- ✅ Memory overhead calculations with concrete examples
- ✅ Scaling behavior clearly demonstrated
#### Highlights:
1. **Motivation Section**: Shows students the problem BEFORE the solution
2. **Trade-off Analysis**: "Memory is cheap, compute is expensive"
3. **Production Context**: "ChatGPT uses KV caching for ALL generation"
4. **Scaling Insight**: "Speedup increases with sequence length"
#### Minor Issues:
- Could add more discussion of cache eviction strategies for long sequences
- Could mention PagedAttention (used in vLLM) as advanced cache management
---
### 6. Test Coverage ✅ EXCELLENT
**Score: 95/100**
#### Strengths:
- ✅ Three comprehensive unit tests:
- `test_unit_kvcache()` - Core cache operations
- `test_unit_cache_enablement()` - Different model sizes
- `test_unit_noninvasive_integration()` - Integration pattern
- ✅ `test_module()` comprehensive integration test
- ✅ All tests pass successfully
- ✅ Good edge case coverage (empty cache, full sequence, reset)
- ✅ Clear test output with educational feedback
#### Test Run Results:
```
🧪 RUNNING MODULE INTEGRATION TEST
==================================================
✅ KVCache implementation works correctly!
✅ Cache enablement works correctly!
✅ Non-invasive cache integration works correctly!
✅ Complete KV cache workflow validated!
✅ Memory tracking: 2.00 MB for 8 tensors
==================================================
🎉 ALL TESTS PASSED! Module ready for export.
```
#### Issues:
- ⚠️ **CRITICAL**: Profiling code (lines 79-141) runs on import, should be protected
- Could add test for cache overflow (exceeding max_seq_len)
- Could test batch dimension changes
**Recommendation**: Add test for error conditions:
```python
import pytest
import numpy as np

def test_unit_cache_errors():
    """Test cache error handling (writing past max_seq_len)."""
    cache = KVCache(1, 10, 2, 4, 32)  # batch=1, max_seq=10, layers=2, heads=4, head_dim=32
    # Per-step K/V shape assumed to be (batch, heads, 1, head_dim)
    key = Tensor(np.zeros((1, 4, 1, 32)))
    value = Tensor(np.zeros((1, 4, 1, 32)))
    # Fill cache to max
    for _ in range(10):
        cache.update(0, key, value)
        cache.advance()
    # Should raise error on overflow
    with pytest.raises(ValueError):
        cache.update(0, key, value)
```
---
### 7. Production Context & Real-World Applications ✅ EXCELLENT
**Score: 100/100**
#### Strengths:
- ✅ Outstanding production context throughout
- ✅ Clear connection to ChatGPT, Claude, GPT-4
- ✅ Economic viability discussion (10× speedup = 10× more users per GPU)
- ✅ Real-world numbers (GPT-3: 4.7GB cache per sequence)
- ✅ Best practices section with deployment guidance
- ✅ Explains why all production LLMs use this technique
#### Highlights:
1. **Economic Impact**: "This optimization makes production language model serving economically viable"
2. **User Experience**: "Without caching: unacceptably slow" vs "With caching: real-time interaction"
3. **Scale**: "Technique that enables serving millions of users daily"
4. **Industry Standard**: "vLLM, llama.cpp use similar patterns"
---
## Specific Issues & Fixes
### Issue 1: Profiling Code Not Protected ⚠️ CRITICAL
**Location**: Lines 79-141
**Problem**:
```python
# %%
# Profile transformer generation to discover the bottleneck
profiler = Profiler()
# ... profiling code runs immediately
```
This code executes on import, which will cause issues when other modules import this file.
**Fix**:
```python
# %% [markdown]
"""
## 🔬 Motivation: Why Memoization Matters for Transformers
...
"""
# %%
def profile_naive_generation():
"""Profile transformer generation to discover the bottleneck."""
from tinytorch.profiling.profiler import Profiler
import matplotlib.pyplot as plt
profiler = Profiler()
def naive_attention_step(seq_len, hidden_dim=64):
# ... implementation
pass
# Profile at increasing sequence lengths
print("🔬 Profiling Transformer Generation (Without Caching):\n")
# ... rest of profiling code
# Run profiling when executing module directly
if __name__ == "__main__":
profile_naive_generation()
```
---
### Issue 2: Module Number Inconsistency ⚠️ MEDIUM
**Locations**:
- Line 918: "Module 14 doesn't modify Modules 12-13"
- Line 1505: "tito module complete 14"
- Line 1622: "Module 14 doesn't modify"
- Line 1650: "Module 14: KV Caching"
**Fix**: Change all instances of "Module 14" to "Module 15" since this is the memoization module.
**Search and Replace**:
```bash
# In modules/15_memoization/memoization_dev.py (GNU sed; on macOS use sed -i '')
sed -i 's/Module 14/Module 15/g' memoization_dev.py
sed -i 's/tito module complete 14/tito module complete 15/g' memoization_dev.py
```
---
### Issue 3: Analysis Functions Missing Comprehensive Docstrings ⚠️ MINOR
**Locations**: Lines 1339, 1381
**Current**:
```python
def analyze_kvcache_memory():
"""📊 Analyze KV cache memory usage across different configurations."""
```
**Recommended**:
```python
def analyze_kvcache_memory():
"""
📊 Analyze KV cache memory usage across different configurations.
Educational Purpose:
Demonstrates how cache memory scales with model architecture.
Students discover:
- Linear scaling with sequence length O(n)
- Memory overhead as percentage of model parameters
- Trade-off between cache size and speedup gains
Analyzes:
- Tiny models (128D): ~0.12 MB
- Small models (512D): ~2 MB
- Medium models (768D): ~9 MB
- Large models (1024D): ~32 MB
Key Insight:
Cache overhead is 10-30% of model parameters, but enables
10-15× speedup. Memory is cheap, compute is expensive!
Production Context:
GPT-3 (175B params, 2048 context): ~4GB cache per sequence
This memory cost is acceptable given the massive speedup.
"""
```
---
### Issue 4: Missing __main__ Guards ⚠️ CRITICAL
**Problem**: Several code blocks execute on import instead of being protected:
1. Lines 79-141: Profiling code
2. Lines 1426-1427: Analysis function calls
**Fix Pattern**:
```python
# Define functions first
def analyze_kvcache_memory():
# ... implementation
pass
def analyze_kvcache_speedup():
# ... implementation
pass
# Protect execution
if __name__ == "__main__":
analyze_kvcache_memory()
analyze_kvcache_speedup()
```
---
## Comparison with TinyTorch Standards
### Template Compliance: ✅ EXCELLENT
| Standard Requirement | Status | Score |
|---------------------|--------|-------|
| Jupytext Headers | ✅ Complete | 100% |
| NBGrader Metadata | ✅ Mostly Complete | 95% |
| Educational Content | ✅ Excellent | 98% |
| Progressive Disclosure | ✅ Excellent | 100% |
| Immediate Testing | ✅ Yes | 100% |
| Systems Analysis | ✅ Excellent | 98% |
| Production Context | ✅ Outstanding | 100% |
| Module Integration Test | ✅ Present | 100% |
| ML Systems Questions | ✅ Comprehensive | 100% |
| Module Summary | ✅ Excellent | 100% |
### Pedagogical Quality: ✅ EXCELLENT
**Narrative Flow**: Outstanding (95/100)
- Clear motivation with profiling
- Builds complexity progressively
- Strong connection between theory and implementation
**Scaffolding**: Excellent (98/100)
- TODO/APPROACH/HINTS pattern consistently used
- Clear examples in docstrings
- Good balance of guidance vs independence
**Systems Thinking**: Outstanding (100/100)
- Excellent O(n²) → O(n) analysis
- Clear trade-off discussions
- Real production context throughout
### Code Quality: ✅ EXCELLENT
**Implementation**: Clean and Professional (95/100)
- Well-structured KVCache class
- Proper error handling with educational messages
- Good separation of concerns
**Testing**: Comprehensive (95/100)
- Multiple unit tests covering different aspects
- Integration test validates complete workflow
- All tests pass
**Documentation**: Excellent (92/100)
- Rich docstrings with examples
- Clear ASCII diagrams
- Good inline comments explaining design decisions
---
## Critical Path Items (Must Fix Before Release)
### Priority 1: CRITICAL (Block Release)
1. ⚠️ **Protect profiling code with `if __name__ == "__main__"`** (lines 79-141)
2. ⚠️ **Protect analysis function calls** (lines 1426-1427)
3. ⚠️ **Fix module number references** (14 → 15 throughout)
### Priority 2: HIGH (Should Fix)
4. Add nbgrader metadata to motivation/analysis cells
5. Add comprehensive docstrings to analysis functions
### Priority 3: NICE TO HAVE
6. Add test for cache overflow error handling
7. Add discussion of advanced cache strategies (PagedAttention)
8. Consider adding batch dimension testing
---
## Module-Specific Observations
### What This Module Does Exceptionally Well
1. **Motivation Through Profiling**: The opening section (lines 71-141) is BRILLIANT
- Shows students the problem BEFORE teaching the solution
- Concrete measurements demonstrate O(n²) growth
- Makes the optimization need visceral, not abstract
2. **Non-Invasive Enhancement Pattern**: Outstanding systems engineering lesson
- Shows how to ADD capabilities without BREAKING existing code
- Module 15 enhances Module 13 without modifying it
- Critical production skill: "forward compatibility"
3. **Clear Trade-off Analysis**: Excellent engineering thinking
- Memory vs compute explicitly quantified
- "2× memory enables 10× speedup" - concrete numbers
- Shows students real engineering decisions
4. **Production Grounding**: Every concept tied to real systems
- ChatGPT, Claude, GPT-4 all use this technique
- Actual numbers: GPT-3 cache size, speedup measurements
- Economic viability discussion connects to business reality
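The pattern itself is worth a sketch. This is a generic illustration of non-invasive enhancement with made-up names (`enable_recording`, a toy `Attention`), not the module's actual `enable_kv_cache` code:

```python
class Attention:
    def forward(self, x):
        return x * 2  # stand-in for the real attention math

def enable_recording(module):
    # Keep a handle to the original method, then shadow it on the instance
    module._original_forward = module.forward
    module.recorded = []
    def recording_forward(x):
        module.recorded.append(x)  # stand-in for caching K/V here
        return module._original_forward(x)
    module.forward = recording_forward
    return module

def disable_recording(module):
    # Restore the original method: the enhancement leaves no trace in the class
    module.forward = module._original_forward
    return module

attn = enable_recording(Attention())
attn.forward(3)        # behavior unchanged, input recorded on the side
disable_recording(attn)
```

The original class is never edited; the capability is added and removed entirely from the outside, which is the "forward compatibility" lesson the module teaches.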
### Alignment with Module Philosophy
**Single Tensor Class**: Correctly uses Tensor throughout, no Variable confusion
**No Forward References**: Only uses concepts from previous modules
**Immediate Testing**: Tests after each implementation
**Systems Focus**: Outstanding performance analysis
**Production Patterns**: Real-world integration strategy
---
## Recommendations for Improvement
### Short-term (Next Iteration)
1. Add `if __name__ == "__main__"` guards (CRITICAL)
2. Fix module number references (CRITICAL)
3. Add comprehensive docstrings to analysis functions
4. Add nbgrader metadata to remaining cells
### Long-term (Future Enhancements)
1. Add advanced section on cache eviction strategies
2. Discuss PagedAttention (vLLM's cache management)
3. Add visualization of cache memory over time
4. Consider adding batch processing examples
5. Add section on cache-aware model serving (batch prefilling)
### Educational Enhancements
1. Could add interactive widget showing cache updates
2. Could visualize attention matrix sparsity with caching
3. Add "common mistakes" section (e.g., forgetting to advance cache)
---
## Final Assessment
### Overall: ✅ EXCELLENT MODULE (A-)
**Module 15 is production-ready with minor fixes needed.**
### Strengths Summary
- Outstanding educational content with clear progression
- Excellent systems analysis with real measurements
- Strong production context throughout
- Comprehensive testing with good coverage
- Clean, professional implementation
- All tests pass successfully
### Issues Summary
- 3 CRITICAL issues (all easy to fix)
- 2 HIGH priority improvements
- 3 NICE TO HAVE enhancements
### Recommendation
**APPROVE with required fixes:**
1. Add `if __name__ == "__main__"` guards to protect test code
2. Fix module number inconsistencies (14 → 15)
3. Add comprehensive docstrings to analysis functions
After these fixes, this module will be an exemplar of TinyTorch quality.
---
## Comparison with Other Modules
This module represents some of the best educational content in TinyTorch:
- **Better than Module 01-04**: More sophisticated systems analysis
- **On par with Module 12-13**: Excellent production grounding
- **Sets new standard for**: Non-invasive enhancement pattern
The "motivation through profiling" section is a pattern that should be adopted by other optimization modules.
---
## Test Results
```bash
$ python modules/15_memoization/memoization_dev.py
🧪 RUNNING MODULE INTEGRATION TEST
==================================================
Running unit tests...
🔬 Unit Test: KVCache Implementation...
Cache initialized: 0.02 MB
✅ KVCache implementation works correctly!
🔬 Unit Test: Cache Enablement for Different Models...
Test 1: Small Model (Tiny Transformer)
Small model cache: 0.125 MB
Test 2: Medium Model (Standard Transformer)
Medium model cache: 2.000 MB
Test 3: Batch Inference (4 sequences)
Batch cache: 0.500 MB (4x batch size)
✅ Cache enablement works correctly!
🔬 Unit Test: Non-Invasive Cache Integration...
✅ Non-invasive cache integration works correctly!
Running integration scenarios...
🔬 Integration Test: Complete KV Cache Workflow...
✅ Complete KV cache workflow validated!
🔬 Integration Test: Memory Tracking...
✅ Memory tracking: 2.00 MB for 8 tensors
==================================================
🎉 ALL TESTS PASSED! Module ready for export.
```
**Result: ✅ ALL TESTS PASSING**
---
## Sign-off
**Module Quality**: A- (92/100)
**Ready for Student Use**: ✅ YES (after critical fixes)
**Reviewer**: TinyTorch Standards Compliance
**Date**: 2025-11-10
**Final Recommendation**: APPROVE with required fixes for critical issues. This is an excellent educational module that teaches a production-critical optimization with outstanding clarity and systems thinking. The minor issues found are easily fixable and don't detract from the overall quality.