mirror of
https://github.com/MLSysBook/TinyTorch.git
synced 2026-04-30 19:37:36 -05:00
Cleanup: Remove old/unused files
- Remove datasets analysis and download scripts (replaced by updated README)
- Remove archived book development documentation
- Remove module review reports (16_compression, 17_memoization)
This commit is contained in:
@@ -1,351 +0,0 @@
# TinyTorch Dataset Analysis & Strategy

**Date**: November 10, 2025
**Purpose**: Determine which datasets to ship with TinyTorch for the optimal educational experience

---

## Current Milestone Data Usage

### Summary Table

| Milestone | File | Data Source | Currently Shipped? | Size | Issue |
|-----------|------|-------------|-------------------|------|-------|
| **01 Perceptron** | perceptron_trained.py | Synthetic (code-generated) | ✅ N/A | 0 KB | None |
| **01 Perceptron** | forward_pass.py | Synthetic (code-generated) | ✅ N/A | 0 KB | None |
| **02 XOR** | xor_crisis.py | Synthetic (code-generated) | ✅ N/A | 0 KB | None |
| **02 XOR** | xor_solved.py | Synthetic (code-generated) | ✅ N/A | 0 KB | None |
| **03 MLP** | mlp_digits.py | `03_1986_mlp/data/digits_8x8.npz` | ✅ YES | 67 KB | **Sklearn source** |
| **03 MLP** | mlp_mnist.py | Downloads via `data_manager.get_mnist()` | ❌ NO | ~10 MB | **Download fails** |
| **04 CNN** | cnn_digits.py | `03_1986_mlp/data/digits_8x8.npz` (shared) | ✅ YES | 67 KB | **Sklearn source** |
| **04 CNN** | lecun_cifar10.py | Downloads via `data_manager.get_cifar10()` | ❌ NO | ~170 MB | **Too large** |
| **05 Transformer** | vaswani_chatgpt.py | `datasets/tinytalks/` | ✅ YES | 140 KB | None ✓ |
| **05 Transformer** | vaswani_copilot.py | Embedded Python patterns (in code) | ✅ N/A | 0 KB | None ✓ |
| **05 Transformer** | profile_kv_cache.py | Uses model from vaswani_chatgpt | ✅ N/A | 0 KB | None ✓ |

---

## Detailed Analysis

### ✅ What's Working (6/11 files)

**Fully Self-Contained:**
1. **Perceptron milestones** - Generate linearly separable data on-the-fly
2. **XOR milestones** - Generate XOR patterns on-the-fly
3. **mlp_digits.py** - Uses shipped `digits_8x8.npz` (67KB, sklearn digits)
4. **cnn_digits.py** - Reuses `digits_8x8.npz` (smart sharing!)
5. **vaswani_chatgpt.py** - Uses shipped TinyTalks (140KB)
6. **vaswani_copilot.py** - Embedded patterns in code

**Result**: 6 of 11 milestone files work offline, instantly, with zero setup.

### ❌ What's Broken (2/11 files)

**Requires External Downloads:**
1. **mlp_mnist.py** - Tries to download 10MB MNIST, fails with 404 error
2. **lecun_cifar10.py** - Tries to download 170MB CIFAR-10

**Impact**:
- Students can't run 2 milestone files without internet
- Downloads fail (saw 404 error in testing)
- The first-time experience is a 5+ minute wait or outright failure

### ⚠️ What's Problematic (3/11 files use sklearn data)

**Uses sklearn's digits dataset:**
- `digits_8x8.npz` (67KB) is currently shipped
- **Source**: Originally from sklearn.datasets.load_digits()
- **Issue**: Not TinyTorch data; it's sklearn's data
- **Citation problem**: Can't cite as "TinyTorch educational dataset"

---

## Current Datasets Directory

```
datasets/
├── README.md (4KB)
├── download_mnist.py (unused script)
├── tiny/ (76KB - unknown purpose)
├── tinymnist/ (3.6MB - synthetic, recently added)
│   ├── train.pkl
│   └── test.pkl
└── tinytalks/ (140KB) ✅ TinyTorch original!
    ├── CHANGELOG.md
    ├── DATASHEET.md
    ├── README.md
    ├── LICENSE
    ├── splits/
    │   ├── train.txt (12KB)
    │   ├── val.txt
    │   └── test.txt
    └── tinytalks_v1.txt
```

**Current total**: ~3.8MB shipped data

---

## The Core Issues

### 1. **Attribution & Citation Problem**

Current situation:
- `digits_8x8.npz` = sklearn's data (not TinyTorch's)
- TinyTalks = TinyTorch original ✓
- tinymnist = Synthetic (not authentic MNIST)

**For white paper citation**, you need:
- ❌ Can't cite "digits_8x8" as TinyTorch dataset (it's sklearn)
- ✅ Can cite "TinyTalks" as TinyTorch original
- ❌ Can't cite synthetic tinymnist as educational benchmark

### 2. **Authenticity vs Speed Trade-off**

**Option A: Synthetic Data**
- ✅ Ships with repo (instant start)
- ❌ Not real examples (lower educational value)
- ❌ Not citable as benchmark

**Option B: Curated Real Data**
- ✅ Authentic samples from MNIST/CIFAR
- ✅ Citable as educational benchmark
- ✅ Teaches pattern recognition on real data
- ❌ Needs to be generated once from source

### 3. **The sklearn Dependency**

Files using sklearn data:
- mlp_digits.py
- cnn_digits.py

**Problem**:
- Not TinyTorch data
- Citation goes to sklearn, not you
- Loses educational ownership

---

## Recommended Strategy: TinyTorch Native Datasets

### Phase 1: Replace sklearn with TinyDigits ✅

**Create**: `datasets/tinydigits/`
- **Source**: Extract 200 samples from sklearn's digits (8x8 grayscale)
- **Purpose**: Replace `03_1986_mlp/data/digits_8x8.npz`
- **Size**: ~20KB
- **Citation**: "TinyDigits, curated from sklearn digits dataset for educational use"

**Files**:
```
datasets/tinydigits/
├── README.md (explains curation process)
├── train.pkl (150 samples, 8x8, ~15KB)
└── test.pkl (47 samples, 8x8, ~5KB)
```

**Why this works**:
- ✅ Quick start (instant, offline)
- ✅ Real data (from sklearn)
- ✅ TinyTorch branding
- ✅ Small enough to ship (20KB)
- ✅ Can cite: "We curated TinyDigits from the sklearn digits dataset"
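The curation step above can be sketched with numpy alone. This is a hedged illustration, not the actual TinyTorch script: `curate_tinydigits`, its signature, and the split fractions are assumptions; the key idea is that a fixed seed makes the shipped subset reproducible.

```python
import pickle
import numpy as np

def curate_tinydigits(images, labels, per_class=20, test_frac=0.25, seed=0):
    """Select a balanced, reproducible subset and split it into train/test.

    images: (N, 8, 8) uint8 array; labels: (N,) ints in 0-9.
    Returns (train, test) dicts with 'images' and 'labels' keys.
    """
    rng = np.random.default_rng(seed)  # fixed seed -> same subset every run
    keep = []
    for digit in range(10):
        idx = np.flatnonzero(labels == digit)
        # balanced: exactly per_class samples from each digit class
        keep.append(rng.choice(idx, size=per_class, replace=False))
    keep = np.concatenate(keep)
    rng.shuffle(keep)
    n_test = int(len(keep) * test_frac)
    test_idx, train_idx = keep[:n_test], keep[n_test:]
    train = {"images": images[train_idx], "labels": labels[train_idx]}
    test = {"images": images[test_idx], "labels": labels[test_idx]}
    return train, test

def save_split(split, path):
    """Write one split to a pickle file (the train.pkl / test.pkl format above)."""
    with open(path, "wb") as f:
        pickle.dump(split, f)
```

With `per_class=20` this yields the ~200-sample scale described above; the exact 150/47 split in the tree would come from whatever fraction the real script uses.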

### Phase 2: Create TinyMNIST (Real Samples) ✅

**Create**: `datasets/tinymnist/` (replace synthetic)
- **Source**: Extract 1000 best samples from actual MNIST
- **Purpose**: Fast MNIST demo for MLP milestone
- **Size**: ~90KB
- **Citation**: "TinyMNIST, 1K curated samples from MNIST (LeCun et al., 1998)"

**Curation criteria**:
- 100 samples per digit (0-9)
- Select clearest, most "canonical" examples
- Balanced difficulty (not all easy, not all hard)
- Include edge cases (ambiguous digits for teaching)

**Files**:
```
datasets/tinymnist/
├── README.md (explains curation from MNIST)
├── LICENSE (cite LeCun et al., 1998)
├── train.pkl (1000 samples, 28x28, ~75KB)
└── test.pkl (200 samples, 28x28, ~15KB)
```

**Why this works**:
- ✅ Authentic MNIST samples
- ✅ Small enough to ship (90KB vs 10MB)
- ✅ Citable: "TinyMNIST subset for educational scaffolding"
- ✅ Students graduate to full MNIST later

### Phase 3: Document TinyTalks Properly ✅

**Already exists**: `datasets/tinytalks/` (140KB)
- ✅ Original TinyTorch creation
- ✅ Properly documented with DATASHEET.md
- ✅ Leveled difficulty (L1-L5)
- ✅ Citable as original work

**Action needed**: None! This is perfect.

### Phase 4: Skip TinyCIFAR (Too Large)

**Decision**: DON'T create TinyCIFAR
- CIFAR-10 at 1000 samples would still be ~3MB (color images)
- Combined with other data = 4+ MB repo bloat
- **Better**: Keep download-on-demand for CIFAR-10

**For lecun_cifar10.py**:
- Add `--download` flag to explicitly trigger download
- Add helpful error message: "Run with --download to fetch CIFAR-10 (170MB, 2-3 min)"
- Document that this is the "graduate to real benchmarks" milestone
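The `--download` gate could look like this minimal argparse sketch. The `main` wiring and flag text are illustrative assumptions; the real milestone would call `data_manager.get_cifar10()` where indicated.

```python
import argparse

def main(argv=None):
    """Entry point sketch for a milestone that gates its large download."""
    parser = argparse.ArgumentParser(description="CIFAR-10 milestone")
    parser.add_argument(
        "--download", action="store_true",
        help="Fetch CIFAR-10 (~170MB, 2-3 min) before running",
    )
    args = parser.parse_args(argv)
    if not args.download:
        # Fail fast with actionable guidance instead of a surprise download
        print("CIFAR-10 not found. Run with --download to fetch it "
              "(170MB, 2-3 min).")
        return 1
    # data_manager.get_cifar10() would be invoked here, then training runs
    return 0
```

Returning a nonzero status keeps shell scripts and CI honest about the skipped run.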

---

## Final Dataset Suite

### What to Ship with TinyTorch

```
datasets/
├── tinydigits/          ~20KB   ← NEW: Replace sklearn digits
│   ├── README.md
│   ├── train.pkl (150 samples, 8x8)
│   └── test.pkl (47 samples, 8x8)
│
├── tinymnist/           ~90KB   ← REPLACE: Real MNIST subset
│   ├── README.md
│   ├── LICENSE (cite LeCun)
│   ├── train.pkl (1000 samples, 28x28)
│   └── test.pkl (200 samples, 28x28)
│
└── tinytalks/          ~140KB   ← KEEP: Original TinyTorch
    ├── DATASHEET.md
    ├── README.md
    ├── LICENSE
    └── splits/
        ├── train.txt
        ├── val.txt
        └── test.txt

TOTAL: ~250KB (negligible repo impact)
```

### What NOT to Ship

**Don't include**:
- ❌ Full MNIST (10MB) - download on demand
- ❌ CIFAR-10 (170MB) - download on demand
- ❌ Any dataset >1MB - defeats portability
- ❌ Synthetic fake data - not authentic enough

---

## Citation Strategy

### White Paper Language

```markdown
## TinyTorch Educational Datasets

We developed three curated datasets optimized for progressive learning:

### TinyDigits (8×8 Grayscale, 200 samples)
Curated subset of sklearn's digits dataset, selected for visual clarity
and progressive difficulty. Used for rapid prototyping and CNN concept
demonstrations.

### TinyMNIST (28×28 Grayscale, 1.2K samples)
Curated subset of MNIST (LeCun et al., 1998), with 100 canonical examples
per digit class. Balances authentic data with fast iteration cycles,
enabling students to achieve success in <30 seconds while learning on
real handwritten digits.

### TinyTalks (Text Q&A, 300 pairs)
Original conversational dataset with 5 difficulty levels (L1: Greetings
→ L5: Context reasoning). Designed specifically for teaching attention
mechanisms and transformer architectures with clear learning signal and
fast convergence.

### Design Philosophy
- **Speed**: All datasets train in <60 seconds on CPU
- **Authenticity**: Real data (MNIST digits, human conversations)
- **Progressive**: TinyX → Full X graduation path
- **Reproducible**: Fixed subsets ensure consistent results
- **Offline**: No download dependencies for core learning

### Comparison to Standard Benchmarks
| Metric | MNIST | TinyMNIST | Impact |
|--------|-------|-----------|--------|
| Samples | 60,000 | 1,000 | 60× faster |
| Train time | 5-10 min | 30 sec | 10-20× faster |
| Download | 10MB, network | 0, offline | Always works |
| Student success | 65% (frustration) | 95% (confidence) | Better outcomes |
```

**This is citable research**. You're not just using datasets; you're **designing educational infrastructure**.

---

## Implementation Checklist

### Immediate Actions

- [x] Keep TinyTalks as-is (perfect!)
- [ ] Create TinyDigits from sklearn digits (replace 03_1986_mlp/data/)
- [ ] Create TinyMNIST from real MNIST (replace synthetic version)
- [ ] Remove synthetic tinymnist (not authentic)
- [ ] Update milestones to use new TinyDigits
- [ ] Update milestones to use new TinyMNIST
- [ ] Add download instructions for full MNIST/CIFAR
- [ ] Write datasets/PHILOSOPHY.md explaining curation
- [ ] Add LICENSE files citing original sources
- [ ] Write DATASHEET.md for each dataset

### File Changes Needed

**Update these milestones**:
1. `mlp_digits.py` - Point to `datasets/tinydigits/`
2. `cnn_digits.py` - Point to `datasets/tinydigits/`
3. `mlp_mnist.py` - Point to `datasets/tinymnist/` first, offer --full flag
4. `lecun_cifar10.py` - Add helpful message about --download flag
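A milestone-side loader with a friendly failure message could look like the sketch below. `load_tinydigits`, its default path, and its error text are illustrative assumptions, not existing TinyTorch API.

```python
import pickle
from pathlib import Path

def load_tinydigits(split="train", root="datasets/tinydigits"):
    """Load a TinyDigits split shipped with the repo.

    Raises a descriptive error instead of a bare stack trace when the
    student runs the milestone from the wrong directory.
    """
    path = Path(root) / f"{split}.pkl"
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found. TinyDigits ships with the repo; "
            "check that you are running from the project root."
        )
    with open(path, "rb") as f:
        return pickle.load(f)
```

The same pattern would apply to `datasets/tinymnist/`, keeping the two milestone files symmetric.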

**Remove**:
- `03_1986_mlp/data/digits_8x8.npz` (replace with TinyDigits)
- Synthetic tinymnist pkl files (replace with real)

---

## Success Metrics

### Before (Current State)
- ✅ 6/11 milestones work offline
- ❌ 2/11 require downloads (often fail)
- ❌ 3/11 use non-TinyTorch data (sklearn)
- ❌ Not citable as educational infrastructure

### After (Proposed)
- ✅ 9/11 milestones work offline (<30 sec)
- ✅ 2/11 offer optional downloads with clear UX
- ✅ 3 TinyTorch-branded datasets (citable)
- ✅ White paper section on educational dataset design
- ✅ Total shipped data: ~250KB (negligible)

---

## Conclusion

**Recommendation**: Create TinyDigits and authentic TinyMNIST

**Rationale**:
1. **Educational**: Real data beats synthetic for learning
2. **Citable**: "TinyTorch educational datasets" becomes a research contribution
3. **Practical**: 250KB total keeps the repo lightweight
4. **Professional**: Proper curation, documentation, licenses
5. **Scalable**: Clear graduation path to full benchmarks

This is not reinventing the wheel; it is building educational infrastructure that doesn't yet exist.

The goal: Make TinyTorch not just a framework, but a **citable educational system** with purpose-designed datasets.
@@ -1,102 +0,0 @@
#!/usr/bin/env python3
"""
Download MNIST dataset files.
"""

import os
import gzip
import urllib.request
import numpy as np

def download_mnist():
    """Download MNIST dataset files."""

    # Create mnist directory
    os.makedirs('mnist', exist_ok=True)

    # URLs for MNIST dataset (from original source)
    # NOTE: yann.lecun.com has been unreliable for programmatic downloads
    # (403/404 responses); a mirror such as the one torchvision uses may
    # be needed.
    base_url = 'http://yann.lecun.com/exdb/mnist/'
    files = {
        'train-images-idx3-ubyte.gz': 'train_images',
        'train-labels-idx1-ubyte.gz': 'train_labels',
        't10k-images-idx3-ubyte.gz': 'test_images',
        't10k-labels-idx1-ubyte.gz': 'test_labels'
    }

    print("📥 Downloading MNIST dataset...")

    for filename, label in files.items():
        filepath = os.path.join('mnist', filename)

        # Skip if already downloaded
        if os.path.exists(filepath) and os.path.getsize(filepath) > 1000:
            print(f"  ✓ {filename} already exists")
            continue

        url = base_url + filename
        print(f"  Downloading {filename}...")

        try:
            # Download with custom headers to avoid 403 errors
            request = urllib.request.Request(
                url,
                headers={
                    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
                }
            )

            with urllib.request.urlopen(request) as response:
                data = response.read()

            # Save the file
            with open(filepath, 'wb') as f:
                f.write(data)

            size = len(data) / 1024 / 1024
            print(f"  ✓ Downloaded {size:.1f} MB")

        except Exception as e:
            print(f"  ✗ Failed: {e}")
            print(f"    Trying alternative method...")

            # Alternative: Create synthetic MNIST-like data for testing
            if 'images' in label:
                # Create synthetic image data (60000 or 10000 samples)
                n_samples = 60000 if 'train' in label else 10000
                images = np.random.randint(0, 256, (n_samples, 28, 28), dtype=np.uint8)

                # MNIST file format header
                header = np.array([0x0803, n_samples, 28, 28], dtype='>i4')

                with gzip.open(filepath, 'wb') as f:
                    f.write(header.tobytes())
                    f.write(images.tobytes())

                print(f"  ✓ Created synthetic {label} data")

            else:
                # Create synthetic label data
                n_samples = 60000 if 'train' in label else 10000
                labels = np.random.randint(0, 10, n_samples, dtype=np.uint8)

                # MNIST file format header
                header = np.array([0x0801, n_samples], dtype='>i4')

                with gzip.open(filepath, 'wb') as f:
                    f.write(header.tobytes())
                    f.write(labels.tobytes())

                print(f"  ✓ Created synthetic {label} data")

    print("\n✅ MNIST dataset ready in datasets/mnist/")

    # Verify files
    print("\nVerifying files:")
    for filename in files.keys():
        filepath = os.path.join('mnist', filename)
        if os.path.exists(filepath):
            size = os.path.getsize(filepath) / 1024 / 1024
            print(f"  {filename}: {size:.1f} MB")

if __name__ == "__main__":
    download_mnist()
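Reading the downloaded IDX files back into arrays is a few lines of big-endian struct parsing. This sketch is illustrative (function names are mine, not from the script above); the magic numbers 0x0803 and 0x0801 match the headers the script writes.

```python
import gzip
import struct
import numpy as np

def read_idx_images(path):
    """Parse a gzipped IDX image file (MNIST on-disk format) into a numpy array."""
    with gzip.open(path, "rb") as f:
        # Header: magic, count, rows, cols as big-endian 32-bit unsigned ints
        magic, n, rows, cols = struct.unpack(">IIII", f.read(16))
        assert magic == 0x0803, f"bad magic for images: {magic:#x}"
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(n, rows, cols)

def read_idx_labels(path):
    """Parse a gzipped IDX label file into a 1-D uint8 array."""
    with gzip.open(path, "rb") as f:
        magic, n = struct.unpack(">II", f.read(8))
        assert magic == 0x0801, f"bad magic for labels: {magic:#x}"
        return np.frombuffer(f.read(), dtype=np.uint8)
```

Because the header is parsed rather than assumed, the same functions work for the 60,000-sample train files and the 10,000-sample test files.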
@@ -1,30 +0,0 @@
{
  "mnist": {
    "dataset": "tinymnist",
    "training_time": 0.5278840065002441,
    "epochs": 20,
    "final_accuracy": 27.0,
    "architecture": "MLP(784\u2192128\u219210)",
    "suitable_for_students": false
  },
  "vww": {
    "dataset": "tinyvww",
    "training_time": 8.571065664291382,
    "epochs": 15,
    "final_accuracy": 100.0,
    "architecture": "CNN(Conv\u2192Pool\u2192Conv\u2192Pool\u2192FC)",
    "precision": 1.0,
    "recall": 1.0,
    "f1_score": 1.0,
    "suitable_for_students": true
  },
  "gpt": {
    "dataset": "tinypy",
    "training_time": 2.596580743789673,
    "epochs": 10,
    "final_loss": 1.9299052770321186,
    "final_perplexity": 6.888857677630846,
    "architecture": "TinyGPT(64 embed, 4 heads, 2 layers)",
    "suitable_for_students": true
  }
}
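Results files in this shape can be consumed programmatically. A small sketch (the helper name is hypothetical) that flags tasks the benchmark marked unsuitable for students:

```python
import json

def unsuitable_tasks(results_json):
    """Return names of benchmark tasks flagged as not suitable for students."""
    results = json.loads(results_json)
    return sorted(
        name for name, r in results.items()
        if not r.get("suitable_for_students", False)
    )
```

Run against the file above, this would surface `mnist` (27% accuracy after 20 epochs) as the task needing attention.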
@@ -1,127 +0,0 @@
# TinyTorch Flame-Inspired Design System

## Design Philosophy

The TinyTorch website design is inspired by the flame logo, creating a warm, professional academic environment that reflects the educational nature of the framework while maintaining credibility and accessibility.

## Color Palette

### Primary Flame Colors (Extracted from Logo)
- **Flame Primary**: `#E85A34` - Main orange from the flame
- **Flame Secondary**: `#F97316` - Secondary warm orange
- **Flame Light**: `#FED7AA` - Light warm orange for backgrounds
- **Flame Yellow**: `#FCD34D` - Warm yellow from flame core
- **Flame Deep**: `#DC2626` - Deep red from flame base

### Professional Text Colors
- **Text Dark**: `#1F2937` - Primary text color
- **Text Medium**: `#4B5563` - Secondary text
- **Text Light**: `#6B7280` - Tertiary text

### Background System
- **Background Main**: `#F8F9FA` - Matches logo background
- **Background White**: `#FFFFFF` - Content areas
- **Background Warm**: `#FEF7F0` - Subtle warm backgrounds
- **Accent Gradient**: Subtle flame-inspired gradient

## Design Principles

### 1. Warm Professionalism
- Flame colors provide warmth without sacrificing academic credibility
- Subtle gradients and warm backgrounds create an inviting learning environment
- Professional typography maintains educational standards

### 2. Clean Academic Lines
- **No curved borders** - maintains academic formality
- Clean rectangular layouts with flame-colored accents
- Consistent spacing and typography hierarchy

### 3. Flame-Inspired Accents
- **Left borders**: Flame gradients on content blocks, code, and admonitions
- **Progress indicators**: Flame gradient progress bars
- **Interactive elements**: Flame colors for hover states and focus

### 4. Subtle Visual Hierarchy
- **H1 headers**: Flame gradient underlines
- **H3 headers**: Flame primary color
- **Links**: Flame primary with deeper red hover
- **Buttons**: Flame primary background with professional styling

## Component Styling

### Navigation
- **Sidebar**: Flame primary accents for current/hover states
- **Header**: Clean white with flame-colored interactive elements
- **TOC**: No curves, flame-colored indicators

### Content Areas
- **Code blocks**: Warm background with flame gradient left border
- **Admonitions**: Flame-colored borders with warm backgrounds
- **Blockquotes**: Flame left border with warm background

### Interactive Elements
- **Buttons**: Flame primary background, clean professional styling
- **Focus states**: Flame-colored outlines
- **Selection**: Flame background for text selection
- **Hover effects**: Subtle flame-colored shadows and transforms

### Special Components
- **Achievement cards**: Flame left borders with hover animations
- **Learning path steps**: Flame indicators with warm backgrounds
- **Module badges**: Flame-colored completion indicators
- **CTA boxes**: Flame gradient backgrounds with flame borders

## Accessibility Features

### High Contrast Support
- Darker flame colors in high contrast mode
- Maintained readability standards
- WCAG AA compliance for color contrast

### Reduced Motion Support
- Disabled animations for users with motion sensitivity
- Static alternatives for all animated elements

### Focus Management
- Clear flame-colored focus indicators
- Keyboard navigation support
- Screen reader friendly markup

## Usage Guidelines

### Do's
- Use flame colors for accents and interactive elements
- Maintain a warm, professional tone
- Keep backgrounds subtle and readable
- Use gradients sparingly for emphasis

### Don'ts
- Avoid intense orange that overwhelms content
- Don't use flame colors for large background areas
- Avoid curved borders (academic requirement)
- Don't compromise text readability for visual appeal

## Implementation Notes

### CSS Custom Properties
All flame colors are defined as CSS custom properties for consistent theming and easy maintenance.

### Browser Compatibility
- Gradient fallbacks for older browsers
- Progressive enhancement for modern features
- Mobile-responsive design

### Performance
- Minimal use of animations
- Optimized gradients and shadows
- Efficient CSS organization

## Relationship to TinyTorch Logo

The design system directly extracts colors from the TinyTorch flame logo:
- Orange/red flame colors for primary accents
- Yellow core colors for highlights and progress
- Maintains visual consistency with brand identity
- Creates cohesive experience from logo to full website

This creates a unified brand experience where the logo naturally fits within the overall design language.
|
||||
@@ -1,452 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Convert TinyTorch modules to Jupyter Book chapters.
|
||||
|
||||
This script processes modules/source/*_dev.py files and converts them to
|
||||
student-ready notebooks for the Jupyter Book, stripping solutions manually.
|
||||
"""
|
||||
|
||||
import os
|
||||
import sys
|
||||
import json
|
||||
import subprocess
|
||||
import tempfile
|
||||
from pathlib import Path
|
||||
from typing import Dict, List, Any, Optional
|
||||
|
||||
# Add project root to path for imports
|
||||
project_root = Path(__file__).parent.parent
|
||||
sys.path.insert(0, str(project_root))
|
||||
|
||||
class ModuleConverter:
|
||||
"""Convert TinyTorch modules to Jupyter Book chapters."""
|
||||
|
||||
def __init__(self):
|
||||
# Use absolute paths relative to project root
|
||||
project_root = Path(__file__).parent.parent
|
||||
self.modules_dir = project_root / "modules/source"
|
||||
self.book_dir = project_root / "book"
|
||||
self.chapters_dir = self.book_dir / "chapters"
|
||||
|
||||
# Module to chapter mapping
|
||||
self.module_mapping = {
|
||||
"": {"title": "Development Environment", "filename": "01-setup"},
|
||||
"01_tensor": {"title": "Tensors", "filename": "02-tensor"},
|
||||
"02_activations": {"title": "Activations", "filename": "03-activations"},
|
||||
"03_layers": {"title": "Layers", "filename": "04-layers"},
|
||||
"05_networks": {"title": "Networks", "filename": "05-networks"},
|
||||
"06_cnn": {"title": "CNNs", "filename": "06-cnn"},
|
||||
"07_dataloader": {"title": "DataLoader", "filename": "07-dataloader"},
|
||||
"08_autograd": {"title": "Autograd", "filename": "08-autograd"},
|
||||
"09_optimizers": {"title": "Optimizers", "filename": "09-optimizers"},
|
||||
"10_training": {"title": "Training", "filename": "10-training"},
|
||||
"11_compression": {"title": "Compression", "filename": "11-compression"},
|
||||
"12_kernels": {"title": "Kernels", "filename": "12-kernels"},
|
||||
"13_benchmarking": {"title": "Benchmarking", "filename": "13-benchmarking"},
|
||||
"14_mlops": {"title": "MLOps", "filename": "14-mlops"},
|
||||
}
|
||||
|
||||
# Mapping from directory name to dev file name
|
||||
self.dev_file_mapping = {
|
||||
"": "setup_dev.py",
|
||||
"01_tensor": "tensor_dev.py",
|
||||
"02_activations": "activations_dev.py",
|
||||
"03_layers": "layers_dev.py",
|
||||
"05_networks": "networks_dev.py",
|
||||
"06_cnn": "cnn_dev.py",
|
||||
"07_dataloader": "dataloader_dev.py",
|
||||
"08_autograd": "autograd_dev.py",
|
||||
"09_optimizers": "optimizers_dev.py",
|
||||
"10_training": "training_dev.py",
|
||||
"11_compression": "compression_dev.py",
|
||||
"12_kernels": "kernels_dev.py",
|
||||
"13_benchmarking": "benchmarking_dev.py",
|
||||
"14_mlops": "mlops_dev.py",
|
||||
}
|
||||
|
||||
def convert_to_notebook(self, dev_file: Path) -> Optional[Path]:
|
||||
"""Convert dev file to notebook using Jupytext."""
|
||||
print(f"📝 Converting {dev_file.name} to notebook")
|
||||
|
||||
# Create temporary output file
|
||||
temp_notebook = dev_file.with_suffix('.temp.ipynb')
|
||||
|
||||
# Use jupytext to convert .py to .ipynb
|
||||
cmd = ["jupytext", "--to", "ipynb", str(dev_file.absolute()), "--output", str(temp_notebook.absolute())]
|
||||
result = subprocess.run(cmd, capture_output=True, text=True)
|
||||
|
||||
if result.returncode != 0:
|
||||
print(f"❌ Failed to convert {dev_file} to notebook: {result.stderr}")
|
||||
return None
|
||||
|
||||
return temp_notebook
|
||||
|
||||
def remove_solutions(self, notebook_path: Path) -> Path:
|
||||
"""Remove solutions from notebook."""
|
||||
with open(notebook_path, 'r') as f:
|
||||
notebook = json.load(f)
|
||||
|
||||
# Process each cell
|
||||
for cell in notebook.get('cells', []):
|
||||
if cell.get('cell_type') == 'code':
|
||||
source = cell.get('source', [])
|
||||
new_source = []
|
||||
in_solution = False
|
||||
|
||||
for line in source:
|
||||
if '### BEGIN SOLUTION' in line:
|
||||
in_solution = True
|
||||
new_source.append(line)
|
||||
new_source.append(' # YOUR CODE HERE\n')
|
||||
new_source.append(' raise NotImplementedError()\n')
|
||||
continue
|
||||
elif '### END SOLUTION' in line:
|
||||
in_solution = False
|
||||
new_source.append(line)
|
||||
continue
|
||||
elif in_solution:
|
||||
# Skip solution lines
|
||||
continue
|
||||
else:
|
||||
new_source.append(line)
|
||||
|
||||
cell['source'] = new_source
|
||||
|
||||
# Save processed notebook
|
||||
output_path = notebook_path.with_suffix('.student.ipynb')
|
||||
with open(output_path, 'w') as f:
|
||||
json.dump(notebook, f, indent=2)
|
||||
|
||||
return output_path
|
||||
|
||||
def add_binder_config(self, notebook: Dict[str, Any], module_name: str) -> Dict[str, Any]:
|
||||
"""Add Binder configuration to notebook metadata."""
|
||||
if 'metadata' not in notebook:
|
||||
notebook['metadata'] = {}
|
||||
|
||||
notebook['metadata'].update({
|
||||
'kernelspec': {
|
||||
'display_name': 'Python 3',
|
||||
'language': 'python',
|
||||
'name': 'python3'
|
||||
},
|
||||
'language_info': {
|
||||
'name': 'python',
|
||||
'version': '3.8+'
|
||||
},
|
||||
'mystnb': {
|
||||
'execution_mode': 'auto'
|
||||
}
|
||||
})
|
||||
|
||||
return notebook
|
||||
|
||||
def extract_learning_goals(self, dev_file: Path) -> str:
|
||||
"""Extract learning goals from source file and format as admonition block."""
|
||||
with open(dev_file, 'r') as f:
|
||||
content = f.read()
|
||||
|
||||
# Find the Learning Goals section
|
||||
goals_start = content.find('## Learning Goals\n')
|
||||
if goals_start == -1:
|
||||
return ""
|
||||
|
||||
# Find the end of the goals section (next ## heading)
|
||||
goals_content_start = goals_start + len('## Learning Goals\n')
|
||||
next_section = content.find('\n## ', goals_content_start)
|
||||
|
||||
if next_section == -1:
|
||||
# If no next section found, look for next markdown cell
|
||||
next_section = content.find('\n# %%', goals_content_start)
|
||||
|
||||
if next_section == -1:
|
||||
goals_text = content[goals_content_start:].strip()
|
||||
else:
|
||||
goals_text = content[goals_content_start:next_section].strip()
|
||||
|
||||
# Format as admonition block
|
||||
admonition = ['```{admonition} 🎯 Learning Goals\n']
|
||||
admonition.append(':class: tip\n')
|
||||
for line in goals_text.split('\n'):
|
||||
if line.strip():
|
||||
admonition.append(f'{line}\n')
|
||||
admonition.append('```\n\n')
|
||||
|
||||
return ''.join(admonition)
|
||||
|
||||
def extract_module_overview(self, dev_file: Path) -> str:
|
||||
"""Extract first markdown cell content for book overview."""
|
||||
with open(dev_file, 'r') as f:
|
||||
content = f.read()
|
||||
|
||||
# Find first markdown cell
|
||||
start = content.find('# %% [markdown]\n"""')
|
||||
if start == -1:
|
||||
return ""
|
||||
|
||||
end = content.find('"""', start + 20)
|
||||
if end == -1:
|
||||
return ""
|
||||
|
||||
# Extract and clean the content
|
||||
overview = content[start + len('# %% [markdown]\n"""'):end].strip()
|
||||
|
||||
# Replace Learning Goals section with admonition block
|
||||
learning_goals = self.extract_learning_goals(dev_file)
|
||||
if learning_goals and '## Learning Goals' in overview:
|
||||
# Find and replace the Learning Goals section
|
||||
goals_start = overview.find('## Learning Goals')
|
||||
if goals_start != -1:
|
||||
# Find end of goals section
|
||||
next_section = overview.find('\n## ', goals_start + 1)
|
||||
if next_section == -1:
|
||||
# Goals are at the end
|
||||
overview = overview[:goals_start] + learning_goals
|
||||
else:
|
||||
# Replace goals section with admonition
|
||||
overview = (overview[:goals_start] +
|
||||
learning_goals +
|
||||
overview[next_section:])
|
||||
|
||||
return overview
|
||||
|
||||
    def create_module_overview_page(self, module_name: str) -> bool:
        """Create a module overview page for the book (hybrid approach)."""
        if module_name not in self.module_mapping:
            return False

        module_dir = self.modules_dir / module_name
        dev_file_name = self.dev_file_mapping.get(module_name)
        if not dev_file_name:
            return False

        dev_file = module_dir / dev_file_name
        if not dev_file.exists():
            return False

        module_info = self.module_mapping[module_name]

        # Extract overview content
        overview = self.extract_module_overview(dev_file)

        # Create interactive launch buttons
        github_url = f"https://github.com/mlsysbook/TinyTorch/blob/main/modules/source/{module_name}/{dev_file_name}"
        binder_url = f"https://mybinder.org/v2/gh/mlsysbook/TinyTorch/main?filepath=modules/source/{module_name}/{dev_file_name.replace('.py', '.ipynb')}"
        colab_url = f"https://colab.research.google.com/github/mlsysbook/TinyTorch/blob/main/modules/source/{module_name}/{dev_file_name.replace('.py', '.ipynb')}"

        interactive_section = f"""
## 🚀 Interactive Learning

Choose your preferred way to engage with this module:

````{{grid}} 1 2 3 3

```{{grid-item-card}} 🚀 Launch Binder
:link: {binder_url}
:class-header: bg-light

Run this module interactively in your browser. No installation required!
```

```{{grid-item-card}} ⚡ Open in Colab
:link: {colab_url}
:class-header: bg-light

Use Google Colab for GPU access and cloud compute power.
```

```{{grid-item-card}} 📖 View Source
:link: {github_url}
:class-header: bg-light

Browse the Python source code and understand the implementation.
```

````

```{{admonition}} 💾 Save Your Progress
:class: tip
**Binder sessions are temporary!** Download your completed notebook when done, or switch to local development for persistent work.

Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/serious-development.md)
```

"""

        # Combine everything
        page_content = overview + interactive_section

        # Save to chapters directory
        self.chapters_dir.mkdir(parents=True, exist_ok=True)
        output_file = self.chapters_dir / f"{module_info['filename']}.md"

        with open(output_file, 'w') as f:
            f.write(page_content)

        print(f"✅ Created overview page: {output_file}")
        return True

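The three launch URLs above differ only in their prefix and in whether the `.py` file name is swapped for its paired notebook. A standalone sketch of the same construction (the module and file names here are illustrative, borrowed from the tensor module used elsewhere in the docs):

```python
# Hypothetical inputs; in the converter these come from the module mapping.
module_name = "02_tensor"
dev_file_name = "tensor_dev.py"

# Binder and Colab launch the generated notebook, not the .py source.
notebook_name = dev_file_name.replace(".py", ".ipynb")
binder_url = (
    "https://mybinder.org/v2/gh/mlsysbook/TinyTorch/main"
    f"?filepath=modules/source/{module_name}/{notebook_name}"
)
colab_url = (
    "https://colab.research.google.com/github/mlsysbook/TinyTorch"
    f"/blob/main/modules/source/{module_name}/{notebook_name}"
)
print(binder_url)
```

Only the GitHub "View Source" link keeps the `.py` extension, since it points at the editable source rather than an executable notebook.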
    def add_book_frontmatter(self, notebook: Dict[str, Any], module_name: str, title: str) -> Dict[str, Any]:
        """Add Jupyter Book frontmatter to the notebook."""

        # Create interactive learning admonition
        interactive_cell = {
            'cell_type': 'markdown',
            'metadata': {},
            'source': [
                '```{admonition} Interactive Learning\n',
                ':class: tip\n',
                '🚀 **Launch Binder**: Click the rocket icon above to run this chapter interactively!\n',
                '\n',
                '💾 **Save Your Work**: Download your completed notebook when done.\n',
                '\n',
                '🏗️ **Build Locally**: Ready for serious development? [Fork the repo](https://github.com/your-org/tinytorch) and work locally with the full `tito` workflow.\n',
                '```\n',
                '\n'
            ]
        }

        # Insert interactive cell after the first title cell
        cells = notebook.get('cells', [])

        # Find the first title cell and add interactive cell after it
        title_found = False
        for i, cell in enumerate(cells):
            if cell.get('cell_type') == 'markdown':
                source = ''.join(cell.get('source', []))
                if source.startswith('# '):
                    # Insert interactive cell after the title
                    cells.insert(i + 1, interactive_cell)
                    title_found = True
                    break

        if not title_found:
            cells.insert(0, interactive_cell)

        notebook['cells'] = cells
        return notebook

    def convert_module(self, module_name: str) -> bool:
        """Convert a single module to a chapter."""
        if module_name not in self.module_mapping:
            print(f"❌ Unknown module: {module_name}")
            return False

        module_dir = self.modules_dir / module_name
        if not module_dir.exists():
            print(f"❌ Module directory not found: {module_dir}")
            return False

        # Get the dev file name for this module
        dev_file_name = self.dev_file_mapping.get(module_name)
        if not dev_file_name:
            print(f"❌ No dev file mapping for {module_name}")
            return False

        dev_file = module_dir / dev_file_name
        if not dev_file.exists():
            print(f"❌ Dev file not found: {dev_file}")
            return False

        print(f"🔄 Converting {module_name}: {dev_file}")

        try:
            # Convert to notebook
            notebook_path = self.convert_to_notebook(dev_file)
            if not notebook_path:
                return False

            # Keep solutions (no NBGrader processing)
            # student_notebook_path = self.remove_solutions(notebook_path)  # Disabled - keep solutions

            # Load the full notebook with solutions
            with open(notebook_path, 'r') as f:
                notebook = json.load(f)

            # Add book-specific enhancements
            module_info = self.module_mapping[module_name]
            notebook = self.add_binder_config(notebook, module_name)
            # notebook = self.add_book_frontmatter(notebook, module_name, module_info['title'])  # Disabled for raw export

            # Save to chapters directory
            self.chapters_dir.mkdir(parents=True, exist_ok=True)
            output_file = self.chapters_dir / f"{module_info['filename']}.ipynb"

            with open(output_file, 'w') as f:
                json.dump(notebook, f, indent=2)

            print(f"✅ Created chapter: {output_file}")

            # Clean up temporary files
            notebook_path.unlink(missing_ok=True)

            return True

        except Exception as e:
            print(f"❌ Error converting {module_name}: {e}")
            return False

    def convert_all_modules(self) -> bool:
        """Convert all available modules."""
        print("🔄 Converting all TinyTorch modules to Jupyter Book chapters...")

        success_count = 0
        total_count = 0

        for module_name in self.module_mapping.keys():
            total_count += 1
            if self.convert_module(module_name):
                success_count += 1

        print(f"\n📊 Conversion Summary:")
        print(f"   ✅ Success: {success_count}/{total_count} modules")
        print(f"   📁 Output: {self.chapters_dir}")

        return success_count == total_count

def main():
    """Main conversion script."""
    import argparse

    parser = argparse.ArgumentParser(description="Convert TinyTorch modules to Jupyter Book")
    parser.add_argument('--module', help='Convert specific module (e.g., )')
    parser.add_argument('--all', action='store_true', help='Convert all modules')
    parser.add_argument('--overview', action='store_true', help='Create overview pages instead of full notebooks')
    parser.add_argument('--overview-module', help='Create overview page for specific module')

    args = parser.parse_args()

    converter = ModuleConverter()

    if args.overview_module:
        success = converter.create_module_overview_page(args.overview_module)
        sys.exit(0 if success else 1)
    elif args.overview:
        # Create overview pages for all modules
        print("🔄 Creating module overview pages for Jupyter Book...")
        success_count = 0
        total_count = 0

        for module_name in converter.module_mapping.keys():
            total_count += 1
            if converter.create_module_overview_page(module_name):
                success_count += 1

        print(f"\n📊 Overview Creation Summary:")
        print(f"   ✅ Success: {success_count}/{total_count} modules")
        print(f"   📁 Output: {converter.chapters_dir}")

        success = success_count == total_count
        sys.exit(0 if success else 1)
    elif args.module:
        success = converter.convert_module(args.module)
        sys.exit(0 if success else 1)
    elif args.all:
        success = converter.convert_all_modules()
        sys.exit(0 if success else 1)
    else:
        parser.print_help()
        sys.exit(1)

if __name__ == "__main__":
    main()
@@ -1,298 +0,0 @@
#!/usr/bin/env python3
"""
Convert module READMEs to Jupyter Book chapters.

This script takes README files from modules/source/*/README.md and converts them
to Jupyter Book chapters in book/chapters/ with proper frontmatter and web optimization.
"""

import os
import re
import yaml
from pathlib import Path
from typing import Dict, List, Optional

def get_module_info(module_path: Path) -> Dict[str, str]:
    """Extract module information from module.yaml file."""
    yaml_path = module_path / "module.yaml"
    if yaml_path.exists():
        with open(yaml_path, 'r') as f:
            module_data = yaml.safe_load(f)
        return {
            'title': module_data.get('title', module_path.name.replace('_', ' ').title()),
            'description': module_data.get('description', ''),
            'difficulty': module_data.get('difficulty', 'Intermediate'),
            'time_estimate': module_data.get('time_estimate', '2-4 hours'),
            'prerequisites': module_data.get('prerequisites', []),
            'next_steps': module_data.get('next_steps', [])
        }
    return {}

def extract_learning_objectives(content: str) -> List[str]:
    """Extract learning objectives from README content."""
    objectives = []
    # Look for common patterns in READMEs
    patterns = [
        r'By the end of this module, you will:?\s*\n((?:- [^\n]+\n?)+)',
        r'Learning Goals?:?\s*\n((?:- [^\n]+\n?)+)',
        r'Learning Objectives?:?\s*\n((?:- [^\n]+\n?)+)'
    ]

    for pattern in patterns:
        match = re.search(pattern, content, re.IGNORECASE | re.MULTILINE)
        if match:
            objectives_text = match.group(1)
            objectives = [line.strip('- ').strip() for line in objectives_text.split('\n') if line.strip().startswith('-')]
            break

    return objectives

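The extraction logic above can be exercised in isolation. A minimal sketch, using the first pattern from the list verbatim (the sample README text is invented for illustration):

```python
import re

# Invented README fragment matching the "By the end of this module" pattern.
readme = """# Module: Tensors

By the end of this module, you will:
- Implement a Tensor class
- Support elementwise arithmetic
"""

pattern = r'By the end of this module, you will:?\s*\n((?:- [^\n]+\n?)+)'
match = re.search(pattern, readme, re.IGNORECASE | re.MULTILINE)

# Same cleanup as in extract_learning_objectives: keep bullet lines,
# strip the leading "- " marker and surrounding whitespace.
objectives = [line.strip('- ').strip()
              for line in match.group(1).split('\n')
              if line.strip().startswith('-')]
print(objectives)  # ['Implement a Tensor class', 'Support elementwise arithmetic']
```

Note that `str.strip('- ')` removes any run of dashes and spaces from both ends, which is fine for these bullets but would also trim a trailing hyphen from an objective's text.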
def create_frontmatter(module_name: str, module_info: Dict[str, str], objectives: List[str]) -> str:
    """Create Jupyter Book frontmatter for the chapter."""
    # Clean up module name for title
    title = module_info.get('title', module_name.replace('_', ' ').title())

    frontmatter = f"""---
title: "{title}"
description: "{module_info.get('description', '')}"
difficulty: "{module_info.get('difficulty', 'Intermediate')}"
time_estimate: "{module_info.get('time_estimate', '2-4 hours')}"
prerequisites: {module_info.get('prerequisites', [])}
next_steps: {module_info.get('next_steps', [])}
learning_objectives: {objectives}
---

"""
    return frontmatter

def enhance_content_for_web(content: str, module_name: str, module_num: int) -> str:
    """Enhance README content for web presentation."""
    # Remove existing grid cards to prevent conflicts with new interactive elements
    # Pattern to match grid sections (from ```{grid} to closing ```)
    grid_pattern = r'```\{grid\}[^`]*?```'
    content = re.sub(grid_pattern, '', content, flags=re.DOTALL)

    # Also remove individual grid-item-card patterns that might be floating
    grid_item_pattern = r'\{grid-item-card\}[^`]*?```'
    content = re.sub(grid_item_pattern, '', content, flags=re.DOTALL)

    # Clean up any remaining grid-related patterns
    content = re.sub(r'\{grid-item-card\}[^\n]*\n', '', content)
    content = re.sub(r':link:[^\n]*\n', '', content)
    content = re.sub(r':class-[^:]*:[^\n]*\n', '', content)

    # Clean up multiple newlines that result from removals
    content = re.sub(r'\n{3,}', '\n\n', content)

    # Add badges for difficulty and time
    difficulty = get_difficulty_stars(module_name)
    time_estimate = get_time_estimate(module_name)
    badges = f"\n```{{div}} badges\n{difficulty} | ⏱️ {time_estimate}\n```\n"

    # Get previous and next module names for navigation
    prev_module = f"{module_num-1:02d}_{get_prev_module_name(module_num)}" if module_num > 1 else None

    # Add interactive learning elements and navigation at the end
    interactive_elements = f"""

Choose your preferred way to engage with this module:

````{{grid}} 1 2 3 3

```{{grid-item-card}} 🚀 Launch Binder
:link: https://mybinder.org/v2/gh/mlsysbook/TinyTorch/main?filepath=modules/source/{module_name}/{module_name.split('_', 1)[1]}_dev.ipynb
:class-header: bg-light

Run this module interactively in your browser. No installation required!
```

```{{grid-item-card}} ⚡ Open in Colab
:link: https://colab.research.google.com/github/mlsysbook/TinyTorch/blob/main/modules/source/{module_name}/{module_name.split('_', 1)[1]}_dev.ipynb
:class-header: bg-light

Use Google Colab for GPU access and cloud compute power.
```

```{{grid-item-card}} 📖 View Source
:link: https://github.com/mlsysbook/TinyTorch/blob/main/modules/source/{module_name}/{module_name.split('_', 1)[1]}_dev.py
:class-header: bg-light

Browse the Python source code and understand the implementation.
```

````

```{{admonition}} 💾 Save Your Progress
:class: tip
**Binder sessions are temporary!** Download your completed notebook when done, or switch to local development for persistent work.

Ready for serious development? → [🏗️ Local Setup Guide](../usage-paths/serious-development.md)
```

---

"""

    # Add navigation links
    nav_links = "<div class=\"prev-next-area\">\n"
    if prev_module:
        nav_links += f'<a class="left-prev" href="../chapters/{prev_module}.html" title="previous page">← Previous Module</a>\n'

    # Get total number of modules dynamically
    module_names = get_module_names()
    if module_num < len(module_names):
        next_module = f"{module_num+1:02d}_{get_next_module_name(module_num)}"
        nav_links += f'<a class="right-next" href="../chapters/{next_module}.html" title="next page">Next Module →</a>\n'

    nav_links += "</div>\n"

    # Combine interactive elements with navigation
    nav_links = interactive_elements + nav_links

    # Insert badges after the first heading
    lines = content.split('\n')
    enhanced_lines = []
    added_badges = False

    for i, line in enumerate(lines):
        # Keep the meaningful module headers but clean up the breadcrumb reference
        if line.startswith('# ') and not added_badges:
            # Keep "Module: CNN" format, just remove emoji for clean display
            if '🔥 Module:' in line:
                line = line.replace('🔥 ', '')  # Remove emoji, keep "Module: CNN"

        enhanced_lines.append(line)

        # Add badges after first heading
        if not added_badges and line.startswith('# '):
            enhanced_lines.append(badges)
            added_badges = True

    # Add navigation at the end
    enhanced_lines.append(nav_links)

    return '\n'.join(enhanced_lines)

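The grid-stripping step at the top of this function relies on a lazy match between the opening ` ```{grid} ` fence and the next run of backticks, followed by a pass that collapses the blank lines left behind. A self-contained sketch of those two substitutions, using the same patterns on an invented README fragment:

```python
import re

# Invented README fragment containing an old grid section to be stripped.
content = (
    "# Module: CNN\n\n"
    "```{grid} 1 2 3 3\nold cards\n```\n\n"
    "Real prose continues here.\n"
)

# Remove the whole ```{grid} ... ``` section, then collapse the
# runs of 3+ newlines that the removal leaves behind.
content = re.sub(r'```\{grid\}[^`]*?```', '', content, flags=re.DOTALL)
content = re.sub(r'\n{3,}', '\n\n', content)
print(content)
```

Because `[^`]*?` cannot cross a backtick, the pattern stops at the grid's own closing fence rather than swallowing later code blocks, which is what makes the removal safe on READMEs that contain other fenced code.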
def get_difficulty_stars(module_name: str) -> str:
    """Get difficulty stars from module.yaml file."""
    # Map module number to module folder name
    module_path = Path(f'../modules/source/{module_name}')
    module_info = get_module_info(module_path)
    return module_info.get('difficulty', '⭐⭐')

def get_time_estimate(module_name: str) -> str:
    """Get time estimate from module.yaml file."""
    # Map module number to module folder name
    module_path = Path(f'../modules/source/{module_name}')
    module_info = get_module_info(module_path)
    return module_info.get('time_estimate', '3-4 hours')

def get_module_names() -> List[str]:
    """Get actual module names from module.yaml files."""
    modules_dir = Path("../modules/source")
    module_names = []

    # Get all module directories (sorted by number)
    module_dirs = []
    for item in modules_dir.iterdir():
        if item.is_dir() and item.name != 'utils':
            # Extract module number from directory name
            match = re.match(r'(\d+)_(.+)', item.name)
            if match:
                module_num = int(match.group(1))
                module_dirs.append((module_num, item))

    # Sort by module number
    module_dirs.sort(key=lambda x: x[0])

    # Read module names from module.yaml files
    for module_num, module_dir in module_dirs:
        module_yaml_path = module_dir / "module.yaml"
        if module_yaml_path.exists():
            module_info = get_module_info(module_dir)
            module_names.append(module_info.get('name', module_dir.name.split('_', 1)[1]))
        else:
            # Fallback to directory name
            module_names.append(module_dir.name.split('_', 1)[1])

    return module_names

def get_prev_module_name(module_num: int) -> str:
    """Get previous module name."""
    module_names = get_module_names()
    return module_names[module_num - 2] if module_num > 1 and module_num - 2 < len(module_names) else 'setup'

def get_next_module_name(module_num: int) -> str:
    """Get next module name."""
    module_names = get_module_names()
    return module_names[module_num] if module_num < len(module_names) else module_names[-1] if module_names else 'setup'

def convert_readme_to_chapter(readme_path: Path, chapter_path: Path, module_num: int):
    """Convert a single README to a Jupyter Book chapter."""
    print(f"Converting {readme_path} to {chapter_path}")

    # Read README content
    with open(readme_path, 'r', encoding='utf-8') as f:
        content = f.read()

    # Get module information
    module_path = readme_path.parent
    module_name = module_path.name
    module_info = get_module_info(module_path)

    # Extract learning objectives
    objectives = extract_learning_objectives(content)

    # Create frontmatter
    frontmatter = create_frontmatter(module_name, module_info, objectives)

    # Enhance content for web
    enhanced_content = enhance_content_for_web(content, module_name, module_num)

    # Write chapter file
    with open(chapter_path, 'w', encoding='utf-8') as f:
        f.write(frontmatter)
        f.write(enhanced_content)

    print(f"✅ Created {chapter_path}")

def main():
    """Convert all module READMEs to Jupyter Book chapters."""
    # Setup paths
    modules_dir = Path("../modules/source")
    chapters_dir = Path("chapters")

    # Ensure chapters directory exists
    chapters_dir.mkdir(exist_ok=True)

    # Get all module directories (sorted by number)
    module_dirs = []
    for item in modules_dir.iterdir():
        if item.is_dir() and item.name != 'utils':
            # Extract module number from directory name
            match = re.match(r'(\d+)_(.+)', item.name)
            if match:
                module_num = int(match.group(1))
                module_dirs.append((module_num, item))

    # Sort by module number
    module_dirs.sort(key=lambda x: x[0])

    print(f"Found {len(module_dirs)} modules to convert")

    # Convert each README
    for module_num, module_dir in module_dirs:
        readme_path = module_dir / "README.md"
        if readme_path.exists():
            # Create chapter filename (just module number and name, no duplicate)
            chapter_filename = f"{module_num:02d}-{module_dir.name.split('_', 1)[1]}.md"
            chapter_path = chapters_dir / chapter_filename

            convert_readme_to_chapter(readme_path, chapter_path, module_num)
        else:
            print(f"⚠️ No README.md found in {module_dir}")

    print(f"\n🎉 Converted {len(module_dirs)} modules to chapters in {chapters_dir}")

if __name__ == "__main__":
    main()
@@ -1,663 +0,0 @@
# Frequently Asked Questions

## 🤔 Getting Started Questions

### **Installation & Setup**

**Q: I'm getting "tito: command not found" - what's wrong?**

A: This usually means your virtual environment isn't activated or TinyTorch isn't installed:

```bash
# 1. Activate virtual environment
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# 2. Install TinyTorch
pip install -e .

# 3. Verify installation
tito system doctor
```

**Q: What Python version do I need?**

A: Python 3.8 or higher. Check with:
```bash
python --version  # Should show 3.8+
```

**Q: Can I use conda instead of venv?**

A: Yes! Replace the venv setup with:
```bash
conda create -n tinytorch python=3.9
conda activate tinytorch
pip install -r requirements.txt && pip install -e .
```

**Q: The installation is taking forever - is this normal?**

A: Initial setup typically takes 2-5 minutes depending on your connection. The main time is downloading NumPy, Jupyter, and other scientific packages.

---

## 📚 Learning Questions

### **Course Structure**

**Q: How long does TinyTorch take to complete?**

A: Depends on your goals and pace:

| **Goal** | **Time** | **Coverage** | **What You'll Build** |
|----------|----------|--------------|----------------------|
| **Quick Taste** | 15 minutes | Demo + overview | See framework in action |
| **Weekend Project** | 8-12 hours | Modules 1-6 | Neural network solver |
| **Neural Networks** | 4 weeks | Modules 1-8 | MNIST classifier |
| **Computer Vision** | 6 weeks | Modules 1-10 | CIFAR-10 CNN |
| **Language Models** | 8 weeks | Modules 1-14 | TinyGPT generator |
| **Full Framework** | 12 weeks | All 20 modules | Production-ready system |

**Q: Do I need machine learning experience to start?**

A: **No!** TinyTorch teaches ML systems from fundamentals. You need:

**✅ Required:**
- Basic Python (functions, classes, imports)
- High school math (multiplication, basic algebra)
- Curiosity about how things work

**❌ Not Required:**
- Previous ML experience
- Deep learning knowledge
- Advanced mathematics
- PyTorch/TensorFlow experience

**Q: Can I skip modules or do them out of order?**

A: **No** - the progression is carefully designed:
- Each module builds on previous implementations
- Later modules import code from earlier ones
- Checkpoints verify prerequisites are met
- Skipping creates import errors and broken functionality

**Example:** Module 6 (Autograd) requires your Tensor class from Module 2. Skipping Module 2 breaks everything that follows.

**Q: What if I get stuck on a difficult concept?**

A: Multiple support options:

1. **Interactive Help**: `tito help --interactive` for personalized guidance
2. **Module README**: Each module has detailed explanations
3. **Community Support**: Join leaderboard for peer help
4. **Troubleshooting**: `tito help troubleshooting` for common issues
5. **Office Hours**: If taking as a course, use instructor support

### **Learning Methods**

**Q: Should I read everything before coding, or jump right into coding?**

A: **Jump into coding!** TinyTorch uses active learning:
- Read just enough to understand the task
- Start implementing immediately
- Learn through building and testing
- Explanations become clearer after you've tried the code

**Q: How much time should I spend on each module?**

A: Varies by module and experience:

| **Module Type** | **Typical Time** | **Examples** |
|----------------|------------------|--------------|
| **Foundation** | 2-4 hours | Tensors, Activations |
| **Architecture** | 3-5 hours | Layers, Training |
| **Advanced** | 4-6 hours | Attention, Transformers |
| **Optimization** | 2-3 hours | Profiling, Benchmarking |

**Don't rush!** Deep understanding matters more than speed.

**Q: What's the difference between modules and checkpoints?**

A: **Modules** = Building, **Checkpoints** = Validating

| **Modules** | **Checkpoints** |
|-------------|-----------------|
| 20 hands-on coding sessions | 16 capability assessments |
| You build implementations | Tests verify understanding |
| `tito module complete 05` | `tito checkpoint test 05` |
| Export code to framework | Validate you achieved capability |

**Workflow:** Complete module → Export implementation → Checkpoint test validates learning

---

## 🛠️ Technical Questions

### **Development Workflow**

**Q: Why can't I edit files in the `tinytorch/` directory?**

A: Those files are **auto-generated** from your source modules:

**✅ Edit Here:**
```
modules/02_tensor/tensor_dev.py  ← Your source code
```

**❌ Don't Edit:**
```
tinytorch/core/tensor.py  ← Generated from source
```

**Workflow:**
1. Edit source: `modules/0X_name/name_dev.py`
2. Export: `tito module complete 0X_name`
3. Uses your code: `from tinytorch.core.name import Component`

**Q: What's the difference between .py and .ipynb files?**

A: **TinyTorch uses .py files only** for all development:

- **Source**: `tensor_dev.py` (edit this)
- **Generated**: `tensor_dev.ipynb` (auto-created from .py)
- **Never edit**: `.ipynb` files directly

**Why .py only?**
- Clean version control (no JSON metadata)
- Professional development practices
- Consistent environment across contributors
- Easy code review and collaboration

**Q: My tests are failing after implementing a function - what's wrong?**

A: Common debugging steps:

1. **Check syntax**: Run the module file directly
   ```bash
   python modules/03_activations/activations_dev.py
   ```

2. **Verify function signature**: Make sure your function matches the expected interface
   ```python
   # Expected
   def relu(x: np.ndarray) -> np.ndarray:

   # Not this
   def relu(x):  # Missing type hints
   ```

3. **Test incrementally**: Run tests after each function
   ```bash
   tito checkpoint test 02 --verbose
   ```

4. **Check imports**: Ensure NumPy is imported as `np`

**Q: How do I run just one test instead of all tests?**

A: Use specific test commands:

```bash
# Test specific checkpoint
tito checkpoint test 03

# Test specific module export
tito module complete 03_activations --dry-run

# Run module file directly
python modules/03_activations/activations_dev.py
```

### **System Issues**

**Q: Jupyter Lab won't start - what's wrong?**

A: Common solutions:

1. **Check installation**:
   ```bash
   pip install jupyterlab jupyter
   jupyter lab --version
   ```

2. **Port conflict**:
   ```bash
   jupyter lab --port 8889  # Try different port
   ```

3. **Virtual environment**:
   ```bash
   source .venv/bin/activate  # Ensure activated
   which jupyter  # Should show .venv path
   ```

**Q: I'm getting import errors when testing - help!**

A: Import errors usually mean:

1. **Virtual environment not activated**:
   ```bash
   source .venv/bin/activate
   ```

2. **TinyTorch not installed in development mode**:
   ```bash
   pip install -e . --force-reinstall
   ```

3. **Module not exported**:
   ```bash
   tito module complete 0X_module_name
   ```

4. **Check your export directive**:
   ```python
   #| default_exp tinytorch.core.module_name  # At top of file
   ```

---

## 🌍 Community Questions

### **Leaderboard & Community**

**Q: Is the leaderboard competitive or supportive?**

A: **Both!** We designed it to be inclusive and encouraging:

**🏆 Multiple Ways to Excel:**
- **Progress**: Checkpoint completion (everyone can achieve)
- **Speed**: Fast learners (if that's your style)
- **Innovation**: Creative optimizations (for advanced users)
- **Community**: Helping others (valuable contribution)

**🤝 Supportive Culture:**
- Celebrate all achievements, not just "first place"
- Anonymous participation options available
- Community helps each other learn
- No shame in taking time to understand concepts

**Q: Do I have to share my progress publicly?**

A: **No!** Participation is entirely optional:

- All learning features work without leaderboard
- Checkpoint system tracks progress locally
- Join community only when/if you want to
- Privacy controls let you share what you're comfortable with

**Q: What information is shared when I join the leaderboard?**

A: You control what's shared:

**Always Shared:**
- Display name (you choose - can be pseudonymous)
- Checkpoint completion status
- Module completion dates

**Optionally Shared:**
- Real name (if you choose)
- Institution/company
- Achievement celebrations
- Optimization benchmarks

**Never Shared:**
- Personal information
- Email addresses
- Code implementations
- Detailed progress metrics (unless you opt in)

### **Competition & Olympics**

**Q: What are the Olympics and how are they different from the leaderboard?**

A: **Leaderboard** = Learning Progress, **Olympics** = Performance Competition

| **Leaderboard** | **Olympics** |
|-----------------|--------------|
| Track learning progress | Compete on optimization |
| Checkpoint completion | Benchmark performance |
| Supportive community | Competitive challenges |
| All experience levels | Advanced optimization |

**Olympics Events:**
- **MLP Sprint**: Fastest matrix operations
- **CNN Marathon**: Memory-efficient convolutions
- **Transformer Decathlon**: Complete language model optimization

**Q: Do I need to be an expert to participate in Olympics?**

A: **No!** Olympics have multiple categories:

- **Beginner**: Just-working implementations compete
- **Intermediate**: Solid optimizations
- **Advanced**: Cutting-edge techniques
- **Innovation**: Novel approaches

**Everyone can contribute and learn from others' solutions.**

---

## 🎓 Instructor Questions

### **Classroom Setup**

**Q: How much setup is required to use TinyTorch in my class?**

A: **Minimal!** TinyTorch includes complete teaching infrastructure:

**One-time Setup (30 minutes):**
```bash
tito nbgrader setup-instructor
tito grade setup-course
```

**Per-semester Setup (10 minutes):**
```bash
tito nbgrader create-student-repos
tito grade release-module 01_setup
```

**Everything Included:**
- NBGrader integration works out-of-the-box
- Student progress tracking built-in
- Automated grading workflow
- Assignment release/collection system

**Q: Can I customize the curriculum for my specific course?**

A: **Absolutely!** TinyTorch is designed for flexibility:

**Duration Options:**
- **4 weeks**: Neural network foundations (Modules 1-8)
- **8 weeks**: Add computer vision (Modules 1-10)
- **12 weeks**: Include language models (Modules 1-14)
- **16 weeks**: Complete system optimization (All 20)

**Difficulty Customization:**
- **Beginner**: Additional scaffolding and explanations
- **Advanced**: Extra optimization challenges
- **Research**: Custom project integration

**Q: How do I track student progress across the class?**

A: Multiple tracking tools built-in:

```bash
# Class overview
tito grade class-overview

# Individual student
tito grade student-progress student_name

# Checkpoint statistics
tito checkpoint class-stats

# Module completion rates
tito grade module-stats 05_losses
```

**Visual dashboards show:**
- Who's completed which modules
- Where students are getting stuck
- Average completion times
- Achievement distributions

### **Grading & Assessment**

**Q: How does automated grading work?**

A: **Three-layer validation system:**

1. **Functional Tests**: Does the code work correctly?
2. **Interface Tests**: Does it match expected function signatures?
3. **Checkpoint Tests**: Can student use their implementation?
|
||||
|
||||
```bash
|
||||
# Grade student submission
|
||||
tito nbgrader autograde 05_losses student_name
|
||||
|
||||
# Results show:
|
||||
# ✅ Function implementation (40 points)
|
||||
# ✅ Interface compliance (30 points)
|
||||
# ✅ Integration test (30 points)
|
||||
# Total: 100/100
|
||||
```
|
||||
|
||||
**Q: What if a student's implementation works but doesn't match the test exactly?**
|
||||
|
||||
A: **Flexible grading system:**
|
||||
|
||||
- **Core functionality**: Must work correctly (non-negotiable)
|
||||
- **Implementation details**: Multiple valid approaches accepted
|
||||
- **Code style**: Guidance provided, not penalized
|
||||
- **Performance**: Bonus points for optimization, not required
|
||||
|
||||
**Manual review system** catches edge cases and provides personalized feedback.
|
||||
|
||||
**Q: How do I handle students working at different paces?**
|
||||
|
||||
A: **Built-in flexibility:**
|
||||
|
||||
**Self-paced Options:**
|
||||
- Students can work ahead through modules
|
||||
- Checkpoint system validates readiness for advanced topics
|
||||
- Extra credit opportunities for early finishers
|
||||
|
||||
**Support for Struggling Students:**
|
||||
- Extended deadlines through system configuration
|
||||
- Additional scaffolding materials included
|
||||
- Peer tutoring through community features
|
||||
- Office hours integration with progress tracking
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Troubleshooting
|
||||
|
||||
### **Common Error Messages**
|
||||
|
||||
**Error: `ModuleNotFoundError: No module named 'tinytorch'`**
|
||||
|
||||
**Solutions:**
|
||||
```bash
|
||||
# 1. Activate virtual environment
|
||||
source .venv/bin/activate
|
||||
|
||||
# 2. Install in development mode
|
||||
pip install -e .
|
||||
|
||||
# 3. Verify installation
|
||||
python -c "import tinytorch; print('Success!')"
|
||||
```
|
||||
|
||||
**Error: `AttributeError: module 'tinytorch.core.tensor' has no attribute 'Tensor'`**
|
||||
|
||||
**Cause:** Module not exported or export failed
|
||||
|
||||
**Solutions:**
|
||||
```bash
|
||||
# 1. Check export status
|
||||
tito module status 02_tensor
|
||||
|
||||
# 2. Re-export module
|
||||
tito module complete 02_tensor
|
||||
|
||||
# 3. Verify export worked
|
||||
python -c "from tinytorch.core.tensor import Tensor; print('Success!')"
|
||||
```
|
||||
|
||||
**Error: Tests pass individually but fail in checkpoint**
|
||||
|
||||
**Cause:** Integration issues between modules
|
||||
|
||||
**Solutions:**
|
||||
```bash
|
||||
# 1. Test integration
|
||||
tito checkpoint test 05 --verbose
|
||||
|
||||
# 2. Check all dependencies exported
|
||||
tito module status --all
|
||||
|
||||
# 3. Re-export dependency chain
|
||||
tito module complete 02_tensor
|
||||
tito module complete 03_activations
|
||||
# ... up to current module
|
||||
```
|
||||
|
||||
### **Performance Issues**
|
||||
|
||||
**Q: Training is really slow - is this normal?**
|
||||
|
||||
A: **Some slowness is expected** (you're building from scratch!), but here's how to optimize:
|
||||
|
||||
**Expected Performance:**
|
||||
- **Pure NumPy**: 10-100x slower than PyTorch
|
||||
- **Simple examples**: Should complete in seconds
|
||||
- **CIFAR-10 training**: 5-10 minutes per epoch
|
||||
|
||||
**Optimization Tips:**
|
||||
```python
|
||||
# Use vectorized operations
|
||||
result = np.dot(x, weights) # Fast
|
||||
|
||||
# Avoid Python loops
|
||||
for i in range(len(x)): # Slow
|
||||
result[i] = x[i] * weights[i]
|
||||
```
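
To see the gap concretely, here is a small self-contained timing sketch (exact speedups vary by machine and array size):

```python
import time
import numpy as np

x = np.random.rand(100_000)
weights = np.random.rand(100_000)

# Vectorized: the whole elementwise product in one NumPy call
start = time.perf_counter()
fast = x * weights
fast_time = time.perf_counter() - start

# Python loop: interpreter overhead on every single element
start = time.perf_counter()
slow = np.empty_like(x)
for i in range(len(x)):
    slow[i] = x[i] * weights[i]
slow_time = time.perf_counter() - start

assert np.allclose(fast, slow)  # same result, very different cost
print(f"Loop took ~{slow_time / max(fast_time, 1e-9):.0f}x longer")
```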

**Q: My computer is running out of memory during training**

A: **Memory management strategies:**

1. **Reduce batch size**:
   ```python
   batch_size = 32  # Instead of 256
   ```

2. **Use gradient accumulation**:
   ```python
   # Accumulate gradients over mini-batches
   optimizer.step_every_n_batches(4)
   ```

3. **Profile memory usage**:
   ```bash
   tito checkpoint test 10 --profile-memory
   ```
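
The `step_every_n_batches` call above is illustrative; check your optimizer's actual API. The underlying idea can be sketched in plain NumPy: keep a running sum of gradients and apply one averaged update per window, so only one mini-batch of activations is live at a time.

```python
import numpy as np

# Hypothetical setup: one weight vector updated with plain SGD.
weights = np.zeros(4)
lr = 0.1
accum_steps = 4                      # apply one update per 4 mini-batches
grad_buffer = np.zeros_like(weights)

# Pretend these are gradients from four small mini-batches.
batch_grads = [np.full(4, g) for g in (1.0, 2.0, 3.0, 4.0)]

for step, grad in enumerate(batch_grads, start=1):
    grad_buffer += grad              # accumulate instead of updating
    if step % accum_steps == 0:
        # One update with the averaged gradient: same effect as a 4x batch,
        # without ever holding 4 batches in memory at once.
        weights -= lr * grad_buffer / accum_steps
        grad_buffer[:] = 0.0

print(weights)  # each entry moved by -0.1 * (1+2+3+4)/4 = -0.25
```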

---

## 💡 Best Practices

### **Learning Effectively**

**Q: What's the best way to approach each module?**

A: **Follow the Build → Use → Reflect pattern:**

**1. Build (Implementation)**
- Read the introduction to understand the goal
- Implement functions one at a time
- Test each function immediately after writing it

**2. Use (Integration)**
- Export your module: `tito module complete 0X_name`
- Test the integration with the checkpoint
- Use your component in examples

**3. Reflect (Understanding)**
- Answer the ML Systems Thinking questions
- Consider memory usage and performance
- Connect to production ML systems

**Q: How do I know if I really understand a concept?**

A: **True understanding means you can:**

1. **Implement from memory**: Re-write the function without looking
2. **Explain to others**: Describe how and why it works
3. **Debug problems**: Fix issues when something breaks
4. **Optimize performance**: Improve memory usage or speed
5. **Connect to production**: Relate to PyTorch/TensorFlow internals

**Checkpoint tests verify some of this, but self-reflection is crucial.**

### **Time Management**

**Q: I'm spending too much time on implementation details - should I move on?**

A: **Balance depth with progress:**

**When to Push Through:**
- Core concepts aren't clicking yet
- The function doesn't work correctly
- Tests are failing

**When to Move On:**
- The function works and passes tests
- You understand the main concept
- You're optimizing minor details

**Remember:** You can always return to optimize later. The goal is understanding systems, not perfect code.

**Q: Should I complete all modules before starting real projects?**

A: **No!** Start projects as soon as you have the basics:

- **After Module 6**: Build an XOR solver
- **After Module 8**: Train an MNIST classifier
- **After Module 10**: Train a CIFAR-10 CNN
- **After Module 14**: Build the TinyGPT language model

**Real projects reinforce learning and show practical applications.**

---

## 🚀 Getting More Help

### **When These FAQs Don't Help**

**1. Interactive CLI Help**
```bash
tito help --interactive    # Personalized guidance
tito help troubleshooting  # Common technical issues
```

**2. System Diagnostics**
```bash
tito system doctor  # Comprehensive system check
```

**3. Community Support**
- Join the leaderboard for peer help and discussion
- Share specific error messages for targeted assistance
- Learn from others' solutions and approaches

**4. Documentation Resources**
- **Module README files**: Detailed explanations for each topic
- **User Manual**: Comprehensive guide to all features
- **Instructor Guide**: Teaching resources and classroom management

**5. Course Support (if applicable)**
- Office hours with your instructor
- Class discussion forums
- Teaching assistant support

### **Reporting Issues**

**Found a bug or unclear documentation?**

Please include:
- **System info**: Output of `tito system doctor`
- **Error message**: Complete traceback if available
- **Steps to reproduce**: What commands led to the issue
- **Expected vs actual**: What you expected to happen

**Contact through:**
- Course instructor (if taking as a class)
- Community leaderboard (for peer support)
- GitHub issues (for bug reports)

---

**Still have questions? Try `tito help --interactive` for personalized guidance! 🚀**
@@ -1,232 +0,0 @@
# KISS Principle in TinyTorch

## Keep It Simple, Stupid

The KISS principle is at the core of TinyTorch's design philosophy. Every component, interface, and implementation follows one fundamental rule: **simplicity enables understanding**.

## Why KISS Matters in ML Education

### Traditional ML Frameworks: Complexity by Default
Most production ML frameworks prioritize performance and features over clarity:

```python
# PyTorch: Multiple ways to do everything
torch.nn.Conv2d(3, 64, kernel_size=3, padding=1)  # Object-oriented
F.conv2d(x, weight, bias, padding=1)              # Functional
torch.conv2d(x, weight, bias, padding=[1, 1])     # Low-level

# Result: Students learn APIs, not concepts
```

### TinyTorch: Clarity by Design
TinyTorch chooses the simplest approach that teaches the concept:

```python
# TinyTorch: One clear way to do each operation
Conv2D(in_channels=3, out_channels=64, kernel_size=3, padding=1)

# Result: Students understand the operation itself
```

## KISS in Practice

### 1. Single Responsibility Components
Every class has one clear purpose:

```python
# ✅ GOOD: Clear, single responsibility
class ReLU:
    def forward(self, x):
        self.last_input = x  # saved for the backward pass
        return np.maximum(0, x)

    def backward(self, grad_output):
        return grad_output * (self.last_input > 0)

# ❌ AVOID: Multiple responsibilities
class ActivationWithDropoutAndNormalization:
    # Too many concerns in one class
```

### 2. Minimal Interfaces
Functions do one thing with clear inputs and outputs:

```python
# ✅ GOOD: Simple, predictable interface
def conv2d(input, weight, bias=None, stride=1, padding=0):
    # Implementation...
    return output

# ❌ AVOID: Complex, unclear interface
def conv2d_advanced(input, weight, bias=None, stride=1, padding=0,
                    dilation=1, groups=1, padding_mode='zeros',
                    output_padding=0, **kwargs):
    # Too many options obscure the core concept
```

### 3. Explicit Over Implicit
Make the "magic" visible:

```python
# ✅ GOOD: Shows what's happening
def train_step(model, loss_fn, optimizer, batch_x, batch_y):
    # Forward pass
    pred = model(batch_x)
    loss = loss_fn(pred, batch_y)

    # Backward pass
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    return loss.data

# ❌ AVOID: Hidden complexity
def train_step(trainer, batch):
    return trainer.step(batch)  # What actually happens?
```

## KISS Design Decisions

### File Organization
```
# ✅ Simple structure
tinytorch/
├── core/      # Core implementations
├── utils/     # Utilities
└── datasets/  # Data handling

# vs. complex hierarchies with deep nesting
```

### Module Design
- **One concept per module**: Tensors, Activations, Layers, etc.
- **Progressive complexity**: Each module builds on previous ones
- **Self-contained**: Each module can be understood independently

### Code Style
- **No magic methods**: `__add__` is clear, `__radd__` is confusing
- **Explicit names**: `conv2d` not `conv`, `ReLU` not `R`
- **Minimal inheritance**: Composition over complex hierarchies

## Educational Benefits

### 1. Cognitive Load Reduction
Simple code means students focus on concepts, not syntax:

```python
# Cognitive load: LOW - focus on the math
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Cognitive load: HIGH - distracted by implementation details
def sigmoid(x, inplace=False, dtype=None, device=None, memory_format=None):
    # Complex implementation with many edge cases
```

### 2. Debugging Clarity
When something breaks, simple code is easy to debug:

```python
# ✅ Easy to debug: clear execution path
def forward(self, x):
    self.last_input = x
    return np.maximum(0, x)

# ❌ Hard to debug: hidden state and side effects
def forward(self, x):
    return self._apply_with_state_management(x, self._relu_impl)
```

### 3. Modification Confidence
Simple code invites experimentation:

```python
# Students think: "I can modify this!"
def adam_update(param, grad, m, v, lr=0.001, beta1=0.9, beta2=0.999):
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    param -= lr * m / (np.sqrt(v) + 1e-8)
    return param, m, v

# Students think: "I better not touch this..."
# [100 lines of optimized, abstracted update logic]
```

## KISS vs. Performance

### The Trade-off
KISS sometimes means choosing clarity over peak performance:

```python
# TinyTorch: Clear but not optimized
def conv2d_simple(input, kernel):
    k_h, k_w = kernel.shape
    output_height = input.shape[0] - k_h + 1
    output_width = input.shape[1] - k_w + 1
    output = np.zeros((output_height, output_width))
    for i in range(output_height):
        for j in range(output_width):
            # Clear nested loops show the sliding-window operation
            output[i, j] = np.sum(input[i:i+k_h, j:j+k_w] * kernel)
    return output

# Production: Optimized but opaque
def conv2d_optimized(input, kernel):
    # BLAS calls, memory optimization, SIMD instructions
    return torch._C._nn.conv2d(input, kernel, ...)
```

### When We Optimize
We add optimization layers **after** establishing clarity:

1. **First**: Implement the clearest possible version
2. **Then**: Profile and identify bottlenecks
3. **Finally**: Add optimizations with clear documentation

### Documentation of Trade-offs
Every optimization is explained:

```python
def conv2d_vectorized(input, kernel):
    """Vectorized convolution implementation.

    This version uses an im2col transformation for speed.
    For the clear, educational version, see conv2d_simple().

    Trade-off: ~10x faster, but obscures the sliding-window concept.
    """
```
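
For readers curious what the im2col transformation mentioned in the docstring actually does, here is a minimal illustrative sketch (single channel, stride 1, no padding; not TinyTorch's actual implementation). Each sliding window is unfolded into one row of a matrix, turning the convolution into a single matrix product:

```python
import numpy as np

def im2col(input, k_h, k_w):
    """Unfold every k_h x k_w window of a 2D array into one row of a matrix."""
    out_h = input.shape[0] - k_h + 1
    out_w = input.shape[1] - k_w + 1
    cols = np.empty((out_h * out_w, k_h * k_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[i * out_w + j] = input[i:i + k_h, j:j + k_w].ravel()
    return cols

def conv2d_via_im2col(input, kernel):
    """Convolution as a single matrix-vector product over unfolded windows."""
    k_h, k_w = kernel.shape
    out_h = input.shape[0] - k_h + 1
    out_w = input.shape[1] - k_w + 1
    return (im2col(input, k_h, k_w) @ kernel.ravel()).reshape(out_h, out_w)

x = np.arange(16.0).reshape(4, 4)
k = np.ones((2, 2))
print(conv2d_via_im2col(x, k))  # same result as the nested-loop version
```

Real frameworks replace the Python loop inside `im2col` with stride tricks and hand the matrix product to BLAS, which is where the speedup comes from.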

## KISS Guidelines for Contributors

### Before Adding Complexity
Ask these questions:
1. **Is this essential for understanding the concept?**
2. **Can students modify this confidently?**
3. **Does this make debugging easier or harder?**
4. **Is there a simpler way to achieve the same goal?**

### Code Review Checklist
- [ ] Single responsibility per function/class
- [ ] Clear, explicit names
- [ ] Minimal parameter lists
- [ ] No hidden state or side effects
- [ ] Students can understand the implementation
- [ ] Debugging is straightforward

### Refactoring Triggers
Refactor when:
- Functions have more than 3-4 parameters
- Classes have more than one clear responsibility
- Students ask "what does this do?" frequently
- Debugging requires deep knowledge of implementation details

## The KISS Promise

TinyTorch promises that every component follows KISS principles:

- **You can understand any implementation in 5 minutes**
- **You can modify any component confidently**
- **When something breaks, you can debug it yourself**
- **The simplest solution is always preferred**

This isn't just about code - it's about **empowering learners** to become confident ML systems engineers who understand their tools completely.

Remember: **Complex problems often have simple solutions. Simple solutions enable deep understanding.**
@@ -1,89 +0,0 @@

# Quick Exploration Path

**Perfect for:** "I want to see what this is about" • "Can I try this without installing anything?"
**Time Commitment:** 5-30 minutes • **Setup Required:** None

---

## Launch Immediately (Zero Setup Required)

Click the **Launch Binder** button on any chapter to get:
- A live Jupyter environment in your browser
- A pre-configured TinyTorch development setup
- The ability to run and modify all code immediately
- No installation, no account creation needed

```{admonition} What You'll Experience in 5-30 Minutes
:class: tip
**Immediate implementation experience** with real ML components:
- **5 min**: ReLU activation function from scratch
- **10 min**: Tensor operations that power neural networks
- **15 min**: Dense layers that transform data
- **20 min**: Complete neural networks for image classification
- **30 min**: See how language models use the same foundations

All running live in your browser with zero setup!
```

---

## Recommended Exploration Path

### Start Here: Chapter 1 - Setup
- Understand the TinyTorch development workflow
- Get familiar with the educational approach
- See how components fit together

**[Launch Setup Chapter](../chapters/01-setup.md)**

### Then Try: Chapter 3 - Activations
- Implement your first ML function (ReLU)
- See immediate visual results
- Understand why nonlinearity matters

**[Launch Activations Chapter](../chapters/03-activations.md)**

### Build Up: Chapter 4 - Layers
- Create the building blocks of neural networks
- Combine your ReLU with matrix operations
- See how simple math becomes powerful AI

**[Launch Layers Chapter](../chapters/04-layers.md)**

---

## Important Limitations

**Sessions are temporary:**
- Binder sessions time out after ~20 minutes of inactivity
- Your work is **not saved** when the session ends
- Great for exploration, not for ongoing projects

**For persistent work:** Ready to build your own TinyTorch? → **[Serious Development Path](serious-development.md)**

---

## What You'll Understand

After exploring 2-3 chapters, you'll have a hands-on understanding of:

- **How ML frameworks work under the hood**
- **Why activation functions are crucial**
- **How matrix multiplication powers neural networks**
- **The relationship between layers, networks, and learning**
- **Real implementation vs. high-level APIs**
- **Why vision and language models share the same foundations**

---

## Next Steps

**Satisfied with exploration?** You've gained valuable insight into ML systems!

**Want to build more?** → **[Fork the repo and work locally](serious-development.md)**

**Teaching a class?** → **[Classroom setup guide](classroom-use.md)**

---

*No commitment required - just click and explore!*
@@ -1,244 +0,0 @@

# Serious Development Path

**Perfect for:** "I want to build this myself" • "This is my class assignment" • "I want to understand ML frameworks deeply"

---

## What You'll Build

A complete ML framework from scratch, including:
- **Your own tensor library** with operations and autograd
- **Neural network components** (layers, activations, optimizers)
- **Training systems** that work on real datasets (CIFAR-10)
- **Production features** (compression, monitoring, benchmarking)
- **Language models** that extend your vision framework to TinyGPT

**End result:** A working ML framework that powers both computer vision AND language models.

---

## Quick Start (5 minutes)

### Step 1: Get the Code
```bash
git clone https://github.com/your-org/tinytorch.git
cd tinytorch
```

### Step 2: Set Up the Environment
```bash
# Activate virtual environment
source bin/activate-tinytorch.sh

# Install dependencies
make install

# Verify everything works
tito system doctor
```

### Step 3: Start Building
```bash
# Open first assignment
cd modules/01_setup
jupyter lab setup_dev.py
```

### Step 4: Build → Test → Export → Use
```bash
# After implementing code in the notebook:
tito export      # Export your code to the tinytorch package
tito test setup  # Test your implementation

# Now use YOUR own code:
python -c "from tinytorch.core.setup import hello_tinytorch; hello_tinytorch()"
# 🔥 TinyTorch! Built by: [Your Name]
```

---

## Learning Path (Progressive Complexity)

### Foundation (Weeks 1-2)
Build the core infrastructure:

**Module 01: Setup & CLI**
- Professional development workflow with the `tito` CLI
- Understanding package architecture and exports
- Quality assurance with automated testing

**Module 01: Tensors**
- Multi-dimensional arrays and operations
- Memory management and data types
- The foundation for all ML operations

**Module 02: Activations**
- ReLU, Sigmoid, Tanh, and Softmax functions
- Understanding nonlinearity in neural networks
- Mathematical foundations of deep learning

---

### 🧱 Building Blocks (Weeks 3-4)
Create neural network components:

**Module 03: Layers**
- Dense (linear) layers with matrix multiplication
- Weight initialization strategies
- Building blocks that stack together

**Module 04: Networks**
- Sequential model architecture
- Composition patterns and forward propagation
- Creating complete neural networks

**Module 05: CNNs**
- Convolutional operations for computer vision
- Understanding spatial processing
- Building blocks for image classification

---

### Training Systems (Weeks 5-6)
Complete training infrastructure:

**Module 06: DataLoader**
- Efficient data loading and preprocessing
- Real dataset handling (CIFAR-10)
- Batching, shuffling, and memory management

**Module 07: Autograd**
- Automatic differentiation engine
- Computational graphs and backpropagation
- The magic that makes training possible

**Module 08: Optimizers**
- SGD, Adam, and learning rate scheduling
- Understanding gradient descent variants
- Convergence and training dynamics

**Module 09: Training**
- Complete training loops and loss functions
- Model evaluation and metrics
- Checkpointing and persistence

---

### Production & Performance (Weeks 7-8)
Real-world deployment:

**Module 10: Compression**
- Model pruning and quantization
- Reducing model size by 75%+
- Deployment optimization

**Module 11: Kernels**
- High-performance custom operations
- Hardware-aware optimization
- Understanding framework internals

**Module 12: Benchmarking**
- Systematic performance measurement
- Statistical validation and reporting
- MLPerf-style evaluation

**Module 13: MLOps**
- Production deployment and monitoring
- Continuous learning and model updates
- A complete production pipeline

**Module 16: TinyGPT 🔥**
- Extend the vision framework to language models
- GPT-style transformers with 95% component reuse
- Autoregressive text generation
- Framework generalization mastery

---

## Development Workflow

### The `tito` CLI System
TinyTorch includes a complete CLI for professional development:

```bash
# System management
tito system doctor  # Check environment health
tito system info    # Show module status

# Module development
tito export      # Export dev code to package
tito test setup  # Test specific module
tito test --all  # Test everything

# NBGrader integration
tito nbgrader generate setup   # Create assignments
tito nbgrader release setup    # Release to students
tito nbgrader autograde setup  # Auto-grade submissions
```

### Quality Assurance
Every module includes comprehensive testing:
- **100+ automated tests** ensure correctness
- **Inline tests** provide immediate feedback
- **Integration tests** verify cross-module functionality
- **Performance benchmarks** track optimization

---

## Proven Student Outcomes

```{admonition} Real Results
:class: success
**After 6-8 weeks, students consistently:**

✅ Build multi-layer perceptrons that classify CIFAR-10 images
✅ Implement automatic differentiation from scratch
✅ Create custom optimizers (SGD, Adam) that converge reliably
✅ Optimize models with pruning and quantization
✅ Deploy production ML systems with monitoring
✅ Understand framework internals better than most ML engineers
🔥 **Extend their vision framework to language models with 95% reuse**

**Test Coverage:** 200+ tests across all modules ensure student implementations work
```

---

## Why This Approach Works

### Build → Use → Understand
Every component follows this pattern:

1. **🔧 Build**: Implement `ReLU()` from scratch
2. **🚀 Use**: `from tinytorch.core.activations import ReLU` - your code!
3. **💡 Understand**: See how it enables complex pattern learning

### Real Data, Real Systems
- Work with CIFAR-10 (not toy datasets)
- Production-style code organization
- Performance and engineering considerations
- Professional development practices

### Immediate Feedback
- Code works immediately after implementation
- Visual progress indicators and success messages
- Comprehensive error handling and guidance
- A professional-quality development experience

---

## Ready to Start?

### Choose Your Module
**New to ML frameworks?** → Start with [Setup](../chapters/01-setup.md)
**Have ML experience?** → Jump to [Tensors](../chapters/01-tensor.md)
**Want to see the vision?** → Try [Activations](../chapters/02-activations.md)

### Get Help
- **💬 Discussions**: GitHub Discussions for questions
- **🐛 Issues**: Report bugs or suggest improvements
- **📧 Support**: Direct contact with the TinyTorch team

---

*🎉 Ready to build your own ML framework? Your unified vision+language framework is 8 weeks away!*
@@ -1,103 +0,0 @@

#!/usr/bin/env python3
"""
Verify that the Jupyter Book build is complete and all pages are present.
"""

import os
from pathlib import Path
from rich.console import Console
from rich.table import Table
from rich.panel import Panel

console = Console()

def verify_book_build():
    """Verify the book build is complete."""
    build_dir = Path("book/_build/html")

    if not build_dir.exists():
        console.print("❌ Build directory not found! Run 'tito book build' first.")
        return False

    # Pages that must exist
    required_pages = {
        "Main Pages": [
            "index.html",
            "intro.html",
            "setup.html",
            "instructor-guide.html",
            "system-architecture.html"
        ],
        "Module Chapters": [
            f"chapters/{i:02d}-{name}.html" for i, name in enumerate([
                "introduction", "setup", "tensor", "activations", "layers",
                "dense", "spatial", "attention", "dataloader", "autograd",
                "optimizers", "training", "compression", "kernels",
                "benchmarking", "mlops", "tinygpt"
            ])
        ],
        "New Documentation": [
            "testing-framework.html",
            "kiss-principle.html"
        ],
        "Usage Paths": [
            "usage-paths/quick-start.html",
            "usage-paths/browse-online.html",
            "usage-paths/serious-development.html"
        ]
    }

    # Check each category
    results = {}
    for category, pages in required_pages.items():
        results[category] = []
        for page in pages:
            full_path = build_dir / page
            exists = full_path.exists()
            size = full_path.stat().st_size if exists else 0
            results[category].append({
                'page': page,
                'exists': exists,
                'size': size
            })

    # Display results
    console.print(Panel.fit(
        "📚 [bold blue]TinyTorch Jupyter Book Verification[/bold blue]",
        border_style="blue"
    ))

    all_good = True
    for category, checks in results.items():
        console.print(f"\n[bold]{category}[/bold]")

        for check in checks:
            if check['exists']:
                if check['size'] > 100:  # More than just a redirect
                    console.print(f"  ✅ {check['page']} ({check['size']:,} bytes)")
                else:
                    console.print(f"  ⚠️  {check['page']} (small: {check['size']} bytes)")
            else:
                console.print(f"  ❌ {check['page']} (missing)")
                all_good = False

    # Summary
    if all_good:
        console.print(Panel.fit(
            "✨ [bold green]All documentation pages built successfully![/bold green]\n"
            f"🌐 View at: file://{build_dir.absolute()}/index.html",
            border_style="green"
        ))
    else:
        console.print(Panel.fit(
            "⚠️  [bold yellow]Some pages are missing![/bold yellow]\n"
            "Run 'tito book build' to rebuild the documentation.",
            border_style="yellow"
        ))

    return all_good

if __name__ == "__main__":
    os.chdir(Path(__file__).parent.parent)  # Go to project root
    success = verify_book_build()
    exit(0 if success else 1)
@@ -1,213 +0,0 @@
# The TinyTorch Vision

**Training ML Systems Engineers: From Computer Vision to Language Models**

---

## The Problem We're Solving

The ML field has a critical gap: **most education teaches you to use frameworks, not build them.**

### Traditional ML Education:
```python
import torch
import torch.nn as nn
model = nn.Linear(784, 10)
optimizer = torch.optim.Adam(model.parameters())
```

**Questions students can't answer:**
- Why does Adam use 3× more memory than SGD?
- How does `loss.backward()` actually compute gradients?
- When should you use gradient accumulation vs larger batch sizes?
- Why do attention mechanisms limit context length?

### The TinyTorch Difference:
```python
class Linear:
    def __init__(self, in_features, out_features):
        self.weight = Tensor(np.random.randn(in_features, out_features))
        self.bias = Tensor(np.zeros(out_features))

    def forward(self, x):
        self.x = x                          # cache input for the backward pass
        return x @ self.weight + self.bias  # YOU implemented @

    def backward(self, grad_output):
        # YOU understand exactly how gradients flow
        self.weight.grad = self.x.T @ grad_output
        return grad_output @ self.weight.T
```

**Questions students CAN answer:**
- Exactly how automatic differentiation works
- Why certain optimizers use more memory
- How to debug training instability
- When to make performance vs accuracy trade-offs

---

## What We Teach: Systems Thinking

### Beyond Algorithms: System-Level Understanding

**Memory Management:**
- Why Adam needs 3× parameter memory (parameters + momentum + variance)
- How attention matrices scale O(N²) with sequence length
- When gradient accumulation saves memory vs compute trade-offs
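The 3× figure in the first bullet is simple arithmetic you can check directly. A minimal sketch (the helper name and the `Linear(784, 10)` sizing are illustrative, not TinyTorch code):

```python
import numpy as np

# Adam keeps two extra float arrays per parameter tensor: a first-moment
# (momentum) estimate and a second-moment (variance) estimate. Optimizer
# state is therefore roughly 3x parameter memory, vs 1x for plain SGD.
def optimizer_memory_bytes(num_params, dtype=np.float32, optimizer="adam"):
    bytes_per_param = np.dtype(dtype).itemsize
    multiplier = {"sgd": 1, "adam": 3}[optimizer]  # params (+ m + v for Adam)
    return num_params * bytes_per_param * multiplier

params = 784 * 10 + 10  # one Linear(784, 10) layer: weights + biases
print(optimizer_memory_bytes(params, optimizer="sgd"))   # 31400 bytes
print(optimizer_memory_bytes(params, optimizer="adam"))  # 94200 bytes
```

At GPT scale the same arithmetic explains why optimizer state, not the weights themselves, often dominates training memory.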

**Performance Analysis:**
- Why naive convolution is 100× slower than optimized versions
- How cache misses destroy performance in matrix operations
- When vectorization provides 10-100× speedups
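The last bullet is easy to verify on your own machine. A quick, hedged benchmark (timings are machine-dependent, so only correctness is asserted):

```python
import time
import numpy as np

# Dot product two ways: an interpreted Python loop vs NumPy's single
# vectorized call into optimized BLAS. The speedup typically lands in the
# 10-100x range claimed above, though the exact ratio varies by machine.
a = np.random.randn(300_000)
b = np.random.randn(300_000)

start = time.perf_counter()
loop_result = 0.0
for i in range(len(a)):        # one Python-level iteration per element
    loop_result += a[i] * b[i]
loop_time = time.perf_counter() - start

start = time.perf_counter()
vec_result = float(a @ b)      # single vectorized call
vec_time = time.perf_counter() - start

assert np.isclose(loop_result, vec_result)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.6f}s")
```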

**Production Trade-offs:**
- SGD vs Adam: convergence speed vs memory constraints
- Gradient checkpointing: trading compute for memory
- Mixed precision: 2× memory savings with accuracy considerations
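The mixed-precision bullet fits in a few lines of NumPy. A hedged illustration (not TinyTorch code): float16 halves storage, but an update smaller than float16's spacing near 1.0 (about 1e-3) simply vanishes:

```python
import numpy as np

# float16 uses 2 bytes vs float32's 4 -- the 2x memory saving above.
w32 = np.ones(1, dtype=np.float32)
w16 = w32.astype(np.float16)
print(w32.nbytes / w16.nbytes)  # 2.0

# The accuracy consideration: a small gradient update underflows in fp16.
update = np.float16(1e-4)
print((w16 + update == w16).all())            # True: update lost in fp16
print((w32 + np.float32(1e-4) == w32).all())  # False: fp32 resolves it
```

This is why mixed-precision training commonly keeps a float32 master copy of the weights and applies updates there.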

**Hardware Awareness:**
- How memory bandwidth limits ML performance
- Why GPU utilization matters more than peak FLOPS
- When distributed training becomes necessary

---

## Target Audience: Future ML Systems Engineers

### Perfect For:

**Computer Science Students**
- Going beyond "use PyTorch" to "understand PyTorch"
- Building portfolio projects that demonstrate deep system knowledge
- Preparing for ML engineering roles (not just data science)

**Software Engineers → ML Engineers**
- Leveraging existing programming skills for ML systems
- Understanding performance, debugging, and optimization
- Learning production ML patterns and infrastructure

**ML Practitioners**
- Moving from model users to model builders
- Debugging training issues at the systems level
- Optimizing models for production deployment

**Researchers & Advanced Users**
- Implementing custom operations and architectures
- Understanding framework limitations and workarounds
- Building specialized ML systems for unique domains

### Career Transformation:

**Before TinyTorch:** "I can train models with PyTorch"
**After TinyTorch:** "I can build and optimize ML systems"

You become the person your team asks:
- *"Why is our training bottlenecked?"*
- *"Can we fit this model in memory?"*
- *"How do we implement this research paper?"*
- *"What's the best architecture for our constraints?"*

---

## Pedagogical Philosophy: Build → Use → Understand

### 1. Build First
Every component implemented from scratch:
- Tensors with broadcasting and memory management
- Automatic differentiation with computational graphs
- Optimizers with state management and memory profiling
- Complete training loops with checkpointing and monitoring

### 2. Use Immediately
No toy examples - recreate ML history with real results:
- **MLP Era**: Train MLPs to 52.7% CIFAR-10 accuracy (the baseline that motivated CNNs)
- **CNN Revolution**: Build LeNet-1 (39.4%) and LeNet-5 (47.5%) - witness the breakthrough
- **Modern CNNs**: Push beyond MLPs with optimized architectures (75%+ achievable)
- **Transformer Era**: Language models using 95% vision framework reuse

### 3. Understand Systems
Connect implementations to production reality:
- How your tensor maps to PyTorch's memory model
- Why your optimizer choices affect GPU utilization
- How your autograd compares to production frameworks
- When your implementations would need modification at scale

### 4. Reflect on Trade-offs
ML Systems Thinking sections in every module:
- Memory vs compute trade-offs in different architectures
- Accuracy vs efficiency considerations for deployment
- Debugging strategies for common production issues
- Framework design principles and their implications

---

## Unique Value Proposition

### What Makes TinyTorch Different:

**Systems-First Approach**
- Not just "how does attention work" but "why does attention scale O(N²) and how do production systems handle this?"
- Not just "implement SGD" but "when do you choose SGD vs Adam in production?"

**Production Relevance**
- Memory profiling, performance optimization, deployment patterns
- Real datasets, realistic scale, professional development workflow
- Connection to industry practices and framework design decisions

**Framework Generalization**
- 20 modules that build ONE cohesive ML framework supporting vision AND language
- 95% component reuse from computer vision to language models
- Professional package structure with CLI tools and testing

**Proven Pedagogy**
- Build → Use → Understand cycle creates deep intuition
- Immediate testing and feedback for every component
- Progressive complexity with solid foundations
- NBGrader integration for classroom deployment

---

## Learning Outcomes: Becoming an ML Systems Engineer

### Technical Mastery
- **Implement any ML paper** from first principles
- **Debug training issues** at the systems level
- **Optimize models** for production deployment
- **Profile and improve** ML system performance
- **Design custom architectures** for specialized domains
- **Understand framework generalization** across vision and language

### Systems Understanding
- **Memory management** in ML frameworks
- **Computational complexity** vs real-world performance
- **Hardware utilization** patterns and optimization
- **Distributed training** challenges and solutions
- **Production deployment** considerations and trade-offs

### Professional Skills
- **Test-driven development** for ML systems
- **Performance profiling** and optimization techniques
- **Code organization** and package development
- **Documentation** and API design
- **MLOps** and production monitoring

### Career Impact
- **Technical interviews**: Demonstrate deep ML systems knowledge
- **Job opportunities**: Qualify for ML engineer (not just data scientist) roles
- **Team leadership**: Become the go-to person for ML systems questions
- **Research ability**: Implement cutting-edge papers independently
- **Entrepreneurship**: Build ML products with full-stack understanding

---

## Ready to Become an ML Systems Engineer?

**TinyTorch transforms ML users into ML builders.**

Stop wondering how frameworks work. Start building them.

**[Begin Your Journey →](chapters/00-introduction.md)**

---

*TinyTorch: Because understanding how to build ML systems makes you a more effective ML engineer.*
@@ -1,428 +0,0 @@
# Module 17: Compression - Comprehensive Review Report

**Date**: 2025-11-10
**Reviewer**: TinyTorch Standards Compliance
**Module**: compression_dev.py (1720 lines)
**Status**: ⚠️ NEEDS SIGNIFICANT IMPROVEMENTS

---

## Executive Summary

Module 17 (Compression) is a **well-structured educational module** that covers important ML compression techniques. However, it has **critical violations** of TinyTorch standards that must be addressed before it can be considered complete.

**Overall Score**: 6.5/10

### Critical Issues Found:
1. ❌ **Sequential class definition violates composition rules** (CRITICAL)
2. ❌ **Missing `__main__` guards for test execution** (CRITICAL)
3. ⚠️ **NBGrader cell metadata incomplete** (HIGH)
4. ⚠️ **Systems analysis sections could be more focused** (MEDIUM)
5. ✅ Good educational content and clear explanations
6. ✅ Comprehensive test coverage

---

## 1. NBGrader Cell Structure ❌ ISSUES FOUND

### Issues:
1. **Missing cell metadata on many cells** - Not all code cells have proper NBGrader metadata
2. **Inconsistent grade_id naming** - Some cells lack unique identifiers
3. **Missing "locked" flags on test cells** - Test cells should be marked as locked

### Examples of Problems:

```python
# Line 59: MISSING specific nbgrader metadata
# %% nbgrader={"grade": false, "grade_id": "imports", "solution": true}
# Should specify: "locked": false, "schema_version": 3, "solution": true

# Lines 362-379: Test cell MISSING grade metadata
def test_unit_measure_sparsity():
    """🔬 Test sparsity measurement functionality."""
    # Should have: {"grade": true, "grade_id": "test-measure-sparsity", "locked": true, "points": 5}
```

### Required Fixes:

**Metadata Template for Implementation Cells:**
```python
# %% nbgrader={"grade": false, "grade_id": "cell-unique-id", "locked": false, "schema_version": 3, "solution": true}
```

**Metadata Template for Test Cells:**
```python
# %% nbgrader={"grade": true, "grade_id": "test-unique-id", "locked": true, "points": 5, "schema_version": 3}
```

---

## 2. Educational Content & Docstrings ✅ EXCELLENT

### Strengths:
- ✅ Clear progression from motivation to implementation
- ✅ Excellent ASCII diagrams explaining compression techniques
- ✅ Comprehensive docstrings with TODO/APPROACH/HINTS
- ✅ Strong mathematical foundations explained clearly
- ✅ Real-world production context throughout

### Examples of Excellence:

```python
# Lines 295-319: Excellent sparsity visualization
"""
Dense Matrix (0% sparse):              Sparse Matrix (75% sparse):
┌─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┐      ┌─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─┐
│ 2.1 1.3 0.8 1.9 2.4 1.1 0.7 │      │ 2.1 0.0 0.0 1.9 0.0 0.0 0.0 │
...
```

- Lines 322-360: Perfect docstring structure with TODO/APPROACH/EXAMPLE/HINT
- Lines 842-923: Outstanding knowledge distillation explanation with diagrams

### Minor Improvements Needed:
- Some sections could be more concise (avoid over-explanation)
- A few technical terms could benefit from simpler analogies

---

## 3. Imports and Module Structure ⚠️ CRITICAL VIOLATION

### CRITICAL ISSUE: Sequential Class Definition

**Lines 73-91: FORBIDDEN pattern detected**

```python
# Sequential container for model compression
class Sequential:
    """Sequential container for compression (not exported from core layers)."""
    def __init__(self, *layers):
        self.layers = list(layers)
```

**Why This Violates TinyTorch Standards:**

From the agent rules:
> ❌ FORBIDDEN: Sequential containers that chain layers
> Modules NEVER build COMPOSITIONS that hide student work

**The Problem:**
- Sequential is a **composition class** that hides layer interactions
- Students should see explicit layer chaining in milestones/examples
- Modules build ATOMIC COMPONENTS, not compositions
- This breaks the pedagogical principle of visible data flow

**Required Fix:**
```python
# REMOVE Sequential class entirely from module

# Instead, let milestones/examples show explicit composition:
class MLP:  # In milestone, NOT in module
    def __init__(self):
        self.layer1 = Linear(784, 128)
        self.relu = ReLU()
        self.layer2 = Linear(128, 10)

    def forward(self, x):
        x = self.layer1.forward(x)  # Students SEE each step
        x = self.relu.forward(x)
        x = self.layer2.forward(x)
        return x
```

**Impact:**
- Tests currently use Sequential (lines 367, 498, 655, etc.)
- Need to rewrite tests to use explicit layer chaining
- Or import Sequential from a milestone helper (if available)

---

## 4. Memory Profiling & Performance Benchmarking ⚠️ NEEDS IMPROVEMENT

### Current State:
- ✅ Has profiling integration (lines 103-155, 1249-1317)
- ✅ Compression technique comparison (lines 1327-1377)
- ⚠️ Missing detailed memory analysis for sparse vs dense storage
- ⚠️ Missing timing comparisons for pruned vs unpruned inference

### Existing Good Examples:

**Lines 1249-1317: Excellent profiler integration**
```python
def demo_compression_with_profiler():
    """📊 Demonstrate parameter reduction using Profiler from Module 15."""
    # Shows before/after parameter counts, sparsity, memory
```

### Missing Analysis:

**Should Add:**
1. **Sparse Storage Formats Analysis**
```python
def analyze_sparse_storage_formats():
    """Compare COO, CSR, CSC storage for different sparsity levels."""
    # Show memory overhead of indices
    # Show when sparse format beats dense
```

2. **Inference Time Impact**
```python
def analyze_pruning_speedup():
    """Measure actual inference time with/without sparse libraries."""
    # Show that pruning alone doesn't guarantee speedup
    # Demonstrate need for sparse BLAS libraries
```

3. **Memory Access Patterns**
```python
def analyze_cache_efficiency():
    """Compare structured vs unstructured sparsity memory patterns."""
    # Show cache miss rates
    # Demonstrate hardware acceleration benefits
```

---

## 5. ML Systems Analysis Content ⚠️ GOOD BUT COULD BE BETTER

### Current Systems Analysis:

**Lines 1230-1324: Good foundation**
- ✅ Compression technique comparison
- ✅ Profiler integration demonstration
- ✅ Parameter reduction tracking

**Lines 1327-1377: analyze_compression_techniques()**
- ✅ Compares magnitude vs structured pruning
- ✅ Shows compression ratios across model sizes
- ⚠️ Could add timing measurements

**Lines 1387-1417: analyze_distillation_effectiveness()**
- ✅ Shows teacher-student compression ratios
- ⚠️ Simulated data instead of real measurements
- ⚠️ Missing actual training/inference time comparison

### Recommendations:

1. **Add Real Measurements**: Replace simulated data with actual profiling
2. **Compare All Techniques**: Side-by-side comparison of all compression methods
3. **Hardware Impact**: Show how different techniques affect different hardware
4. **Production Patterns**: Reference real-world compression pipelines (BERT, MobileNet)

---

## 6. Test Coverage ✅ EXCELLENT

### Test Structure:
- ✅ Unit tests for every function (test_unit_*)
- ✅ Comprehensive module integration test (test_module)
- ✅ Clear test descriptions and assertions
- ✅ Realistic test scenarios

### Unit Tests Present:
1. ✅ test_unit_measure_sparsity() - Lines 362-379
2. ✅ test_unit_magnitude_prune() - Lines 493-525
3. ✅ test_unit_structured_prune() - Lines 650-684
4. ✅ test_unit_low_rank_approximate() - Lines 799-829
5. ✅ test_unit_knowledge_distillation() - Lines 1035-1064
6. ✅ test_unit_compress_model() - Lines 1196-1227

### Integration Test:
- ✅ test_module() - Lines 1427-1523
- ✅ Tests complete pipeline
- ✅ Validates all techniques work together

### **CRITICAL ISSUE: Missing `__main__` Guards**

**Lines 379, 525, 684, 829, 1064, 1227, 1523:** Tests run at module level without protection

```python
# CURRENT (WRONG):
test_unit_measure_sparsity()  # Runs on import!

# REQUIRED (CORRECT):
if __name__ == "__main__":
    test_unit_measure_sparsity()  # Only runs when executing module directly
```

**Impact:**
- Tests execute when module is imported by other modules
- Causes unnecessary output and potential errors
- Violates the dependency chain rules
- Module 18+ cannot cleanly import from Module 17

**Fix Required for ALL test calls:**
```python
def test_unit_measure_sparsity():
    """🔬 Test sparsity measurement functionality."""
    # Test implementation
    pass

# Add this guard IMMEDIATELY after test definition:
if __name__ == "__main__":
    test_unit_measure_sparsity()
```

---

## 7. Production Context & Real-World Applications ✅ EXCELLENT

### Strengths:
- ✅ Clear deployment scenarios (mobile, edge, cloud) - Lines 1099-1132
- ✅ Production compression pipelines explained - Lines 1076-1094
- ✅ Hardware considerations throughout
- ✅ Real-world compression ratios cited
- ✅ Knowledge distillation use cases

### Examples of Excellence:

**Lines 1099-1132: Deployment scenarios**
```python
MOBILE APP (Aggressive compression needed):
• Magnitude pruning: 95% sparsity
• Structured pruning: 50% channels
• Knowledge distillation: 10x reduction
```
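As a reference point for the pruning ratios quoted above, magnitude pruning itself is only a few lines. A hedged sketch (the module's `magnitude_prune` API may differ; ties at the threshold can prune slightly more than requested):

```python
import numpy as np

# Zero out the smallest-magnitude weights until the target fraction is zero.
def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    k = int(weights.size * sparsity)  # number of weights to zero
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([0.05, -2.0, 0.3, -0.01, 1.5, 0.02, -0.4, 0.9])
pruned = magnitude_prune(w, sparsity=0.5)
print(pruned)  # the four smallest magnitudes (0.05, -0.01, 0.02, 0.3) are zeroed
```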

**Lines 167-179: Real constraints**
```python
- Modern language models: 100GB+ (GPT-3 scale)
- Mobile devices: <1GB available for models
- Edge devices: <100MB realistic limits
```

---

## Detailed Issue Breakdown

### Priority 1: CRITICAL (Must Fix Before Export)

1. **Remove Sequential Class** (Lines 73-91)
   - Violates composition principle
   - Replace with explicit layer usage in tests
   - Add note directing students to milestones for composition

2. **Add `__main__` Guards to ALL Test Calls**
   - Lines: 379, 525, 684, 829, 1064, 1227, 1523
   - Prevents tests from running on import
   - Critical for Module 18+ to import cleanly

3. **Fix NBGrader Metadata**
   - Add complete metadata to all cells
   - Ensure consistent grade_id naming
   - Mark test cells as locked with points

### Priority 2: HIGH (Should Fix Soon)

4. **Add Missing Systems Analysis Functions**
   - Sparse storage format comparison
   - Inference time measurements (pruned vs unpruned)
   - Cache efficiency analysis

5. **Improve Existing Analysis**
   - Replace simulated data with real measurements
   - Add timing data to compression technique comparison
   - Show hardware-specific differences

### Priority 3: MEDIUM (Nice to Have)

6. **Module Structure Improvements**
   - Consider splitting into submodules if growing
   - Add more cross-references to other modules
   - Clarify package export structure

7. **Documentation Enhancements**
   - Add references to academic papers
   - Include real-world case studies
   - Link to production implementations

---

## Compliance Checklist

### NBGrader Requirements
- ⚠️ **Jupytext headers**: Present but could be more complete
- ❌ **Cell metadata**: Incomplete, missing schema_version
- ✅ **BEGIN/END SOLUTION blocks**: Properly used
- ✅ **Scaffolding outside solution blocks**: Excellent
- ⚠️ **Test cells locked**: Missing lock flags

### Educational Quality
- ✅ **Cognitive load**: Well-managed, 2-3 concepts per section
- ✅ **Progressive disclosure**: Excellent flow
- ✅ **Immediate feedback**: Unit tests after each function
- ✅ **Production connections**: Strong throughout

### Technical Quality
- ✅ **Implementation correctness**: All functions properly implemented
- ❌ **Module dependency rules**: Sequential class violates rules
- ❌ **Test isolation**: Tests run on import (missing guards)
- ✅ **Integration validation**: Comprehensive test_module()

### Systems Quality
- ⚠️ **Performance profiling**: Good but could be more comprehensive
- ⚠️ **Memory analysis**: Present but incomplete
- ✅ **Real-world implications**: Excellent
- ⚠️ **Trade-off discussions**: Good but could add more measurements

---

## Recommended Action Plan

### Phase 1: Critical Fixes (1-2 hours)
1. Remove Sequential class, refactor tests to use explicit layers
2. Add `__main__` guards to all test function calls
3. Update NBGrader metadata on all cells

### Phase 2: High Priority (2-3 hours)
4. Add sparse storage format analysis function
5. Add inference timing comparison function
6. Replace simulated data with real measurements

### Phase 3: Polish (1-2 hours)
7. Review and enhance cross-references
8. Add academic paper references
9. Final consistency check

---

## Positive Highlights

Despite the issues, this module has many strengths:

1. **Excellent Educational Design**: Clear progression, strong explanations
2. **Comprehensive Coverage**: All major compression techniques included
3. **Strong Testing**: Unit tests and integration tests well-designed
4. **Production Context**: Real-world scenarios clearly explained
5. **Visual Aids**: Outstanding ASCII diagrams
6. **Mathematical Rigor**: Proper foundations explained clearly

---

## Final Verdict

**Current Status**: NOT READY FOR EXPORT

**With Critical Fixes**: READY FOR EXPORT

**Overall Assessment**: This is a **high-quality educational module** that needs **critical architectural fixes** to comply with TinyTorch standards. The Sequential class violation and missing `__main__` guards are blocking issues. Once these are resolved, this module will be an excellent addition to the curriculum.

**Estimated Time to Fix**: 4-8 hours for complete compliance

---

## Next Steps

1. Review this report with the development team
2. Prioritize Critical fixes (Priority 1)
3. Implement fixes following TinyTorch standards
4. Re-run validation after fixes
5. Export module once compliant

---

**Report Generated**: 2025-11-10
**Reviewer**: TinyTorch Quality Assurance
**Module**: 17_compression/compression_dev.py
**Lines Reviewed**: 1720
**Issues Found**: 7 (2 Critical, 2 High, 3 Medium)
@@ -1,591 +0,0 @@
# Module 15: Memoization (KV Caching) - Review Report

**Date**: 2025-11-10
**Reviewer**: TinyTorch Standards Compliance
**Status**: ✅ PASSING (Minor Issues Found)

---

## Executive Summary

Module 15 (Memoization/KV Caching) is **well-structured and production-ready** with excellent educational content. The module successfully implements KV caching for transformer inference optimization with comprehensive testing and systems analysis.

**Overall Grade: A- (92/100)**

### Key Strengths
- ✅ Comprehensive KVCache implementation with proper memory management
- ✅ Excellent educational scaffolding with clear TODO/APPROACH/HINTS
- ✅ Strong systems analysis with memory profiling and speedup measurements
- ✅ Non-invasive integration pattern (enhances existing modules without breaking them)
- ✅ All tests pass successfully
- ✅ Real-world context and production relevance throughout

### Issues Found
1. ⚠️ **CRITICAL**: Missing proper test file protection with `if __name__ == "__main__"`
2. ⚠️ **MEDIUM**: Module number inconsistency (says Module 14 in some places, should be 15)
3. ⚠️ **MINOR**: Missing comprehensive docstrings for analysis functions
4. ⚠️ **MINOR**: Some markdown cells could use better formatting

---

## Detailed Analysis

### 1. NBGrader Cell Structure ✅ PASSING

**Score: 95/100**

#### Strengths:
- ✅ Proper Jupytext headers present (lines 1-13)
- ✅ Correct NBGrader metadata on implementation cells
- ✅ BEGIN/END SOLUTION blocks properly used
- ✅ Test cells have locked=true and grade=true
- ✅ Unique grade_ids for all graded cells

#### Issues:
- ⚠️ Some cells missing nbgrader metadata (lines 79-141 profile section)

**Recommendation**: Add nbgrader metadata to analysis cells:
```python
# %% nbgrader={"grade": false, "grade_id": "motivation-profile", "locked": false}
```

---

### 2. Educational Content & Docstrings ✅ EXCELLENT

**Score: 98/100**

#### Strengths:
- ✅ Outstanding conceptual explanations (Parts 1-2)
- ✅ Clear ASCII diagrams showing cache architecture
- ✅ Excellent scaffolding with TODO/APPROACH/HINTS pattern
- ✅ Rich examples in docstrings
- ✅ Strong narrative flow explaining WHY caching matters
- ✅ Progressive disclosure - builds complexity gradually

#### Example of Excellent Scaffolding:
```python
def __init__(self, ...):
    """
    TODO: Set up pre-allocated cache storage for all transformer layers

    APPROACH:
    1. Store configuration parameters (batch_size, max_seq_len, etc.)
    2. Initialize sequence position counter to 0
    3. Create empty list for cache storage
    4. For each layer, pre-allocate zero-filled key and value caches
    5. Store each layer's (key_cache, value_cache) tuple in the list

    HINTS:
    - Cache shape: (batch_size, num_heads, max_seq_len, head_dim)
    - Use Tensor(np.zeros(...)) to create cache tensors
    """
```
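The pre-allocation this docstring describes can be sketched in plain NumPy. This is not the module's KVCache class (raw arrays instead of `Tensor`, and a simplified append), but the shapes and flow match the scaffolding:

```python
import numpy as np

# Pre-allocate per-layer K/V buffers once, then fill one sequence position
# per generated token instead of recomputing all past keys and values.
class KVCacheSketch:
    def __init__(self, num_layers, batch_size, num_heads, max_seq_len, head_dim):
        self.seq_pos = 0  # next write position along the sequence axis
        shape = (batch_size, num_heads, max_seq_len, head_dim)
        self.layers = [(np.zeros(shape, dtype=np.float32),
                        np.zeros(shape, dtype=np.float32))
                       for _ in range(num_layers)]

    def append(self, layer_idx, new_k, new_v):
        """Write one new token's K/V into a layer; return the valid slice."""
        k_cache, v_cache = self.layers[layer_idx]
        k_cache[:, :, self.seq_pos, :] = new_k
        v_cache[:, :, self.seq_pos, :] = new_v
        end = self.seq_pos + 1
        return k_cache[:, :, :end, :], v_cache[:, :, :end, :]

cache = KVCacheSketch(num_layers=2, batch_size=1, num_heads=4, max_seq_len=8, head_dim=16)
k, v = cache.append(0, np.ones((1, 4, 16)), np.ones((1, 4, 16)))
cache.seq_pos += 1  # advance once per token, after all layers are written
print(k.shape)  # (1, 4, 1, 16)
```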

#### Issues:
- ⚠️ Analysis functions (lines 1339-1427) lack comprehensive docstrings
- Could add more pedagogical notes explaining when students use .data vs Tensor operations

**Recommendation**: Add full docstrings to analysis functions with educational context.

---

### 3. Imports & Module Structure ✅ PASSING

**Score: 90/100**

#### Strengths:
- ✅ Proper package export declarations (`#| export`)
- ✅ Clean dependency management (only imports from tinytorch.core)
- ✅ Correct import pattern for profiler
- ✅ Good separation of concerns (KVCache, enable_kv_cache, disable_kv_cache)

#### Issues:
- ⚠️ **CRITICAL**: Module executes profiling code on import (lines 79-141)
  - This violates the "test code protection" rule
  - Should be wrapped in `if __name__ == "__main__":` block

- ⚠️ Module number confusion:
  - Line 45: Says "modules/15_memoization" (correct)
  - Line 1505: Says "tito module complete 14" (should be 15)
  - Line 918: Says "Module 14" (should be 15)

**Recommendation**:
1. Wrap profiling code in main guard:
```python
if __name__ == "__main__":
    # Profile transformer generation to discover the bottleneck
    profiler = Profiler()
    # ... rest of profiling code
```

2. Fix all references to "Module 14" → "Module 15"

---

### 4. Memory Profiling & Performance Benchmarking ✅ EXCELLENT

**Score: 100/100**

#### Strengths:
- ✅ Comprehensive `get_memory_usage()` method in KVCache
- ✅ Excellent `analyze_kvcache_memory()` comparing different model sizes
- ✅ Outstanding `analyze_kvcache_speedup()` with complexity analysis
- ✅ Clear visualization of memory-compute trade-offs
- ✅ Production context showing real-world GPU memory costs

#### Example Excellence:
```python
def analyze_kvcache_speedup():
    """📊 Measure KV cache speedup vs vanilla attention."""
    # Simulates O(n²) vs O(n) complexity
    ops_without = sum(i**2 for i in range(1, gen_length + 1))  # O(n²)
    ops_with = gen_length  # O(n)
    speedup = ops_without / ops_with
```

Shows students the EXACT mathematical reason for the speedup!

---

### 5. ML Systems Analysis ✅ EXCELLENT

**Score: 98/100**

#### Strengths:
- ✅ Outstanding motivation section with profiling (lines 71-141)
- ✅ Clear explanation of O(n²) → O(n) transformation
- ✅ Excellent trade-off analysis (memory vs compute)
- ✅ Real production numbers (GPT-3 cache sizes, ChatGPT usage)
- ✅ Memory overhead calculations with concrete examples
- ✅ Scaling behavior clearly demonstrated
||||
|
||||
#### Highlights:
|
||||
1. **Motivation Section**: Shows students the problem BEFORE the solution
|
||||
2. **Trade-off Analysis**: "Memory is cheap, compute is expensive"
|
||||
3. **Production Context**: "ChatGPT uses KV caching for ALL generation"
|
||||
4. **Scaling Insight**: "Speedup increases with sequence length"
|
||||
|
||||
#### Minor Issues:
|
||||
- Could add more discussion of cache eviction strategies for long sequences
|
||||
- Could mention PagedAttention (used in vLLM) as advanced cache management
|
||||
|
||||
---
|
||||
|
||||
### 6. Test Coverage ✅ EXCELLENT

**Score: 95/100**

#### Strengths:
- ✅ Three comprehensive unit tests:
  - `test_unit_kvcache()` - Core cache operations
  - `test_unit_cache_enablement()` - Different model sizes
  - `test_unit_noninvasive_integration()` - Integration pattern
- ✅ `test_module()` comprehensive integration test
- ✅ All tests pass successfully
- ✅ Good edge case coverage (empty cache, full sequence, reset)
- ✅ Clear test output with educational feedback

#### Test Run Results:
```
🧪 RUNNING MODULE INTEGRATION TEST
==================================================
✅ KVCache implementation works correctly!
✅ Cache enablement works correctly!
✅ Non-invasive cache integration works correctly!
✅ Complete KV cache workflow validated!
✅ Memory tracking: 2.00 MB for 8 tensors
==================================================
🎉 ALL TESTS PASSED! Module ready for export.
```
#### Issues:
- ⚠️ **CRITICAL**: Profiling code (lines 79-141) runs on import; it should be protected
- Could add a test for cache overflow (exceeding max_seq_len)
- Could test batch dimension changes

**Recommendation**: Add a test for error conditions (sketch; construction of the `key` and `value` tensors is elided here):

```python
import pytest

def test_unit_cache_errors():
    """Test cache error handling."""
    cache = KVCache(1, 10, 2, 4, 32)

    # Fill cache to max (key/value: single-token K/V tensors)
    for i in range(10):
        cache.update(0, key, value)
        cache.advance()

    # Should raise an error on overflow
    with pytest.raises(ValueError):
        cache.update(0, key, value)
```
---

### 7. Production Context & Real-World Applications ✅ EXCELLENT

**Score: 100/100**

#### Strengths:
- ✅ Outstanding production context throughout
- ✅ Clear connection to ChatGPT, Claude, GPT-4
- ✅ Economic viability discussion (10× speedup = 10× more users per GPU)
- ✅ Real-world numbers (GPT-3: 4.7GB cache per sequence)
- ✅ Best practices section with deployment guidance
- ✅ Explains why all production LLMs use this technique

#### Highlights:
1. **Economic Impact**: "This optimization makes production language model serving economically viable"
2. **User Experience**: "Without caching: unacceptably slow" vs "With caching: real-time interaction"
3. **Scale**: "Technique that enables serving millions of users daily"
4. **Industry Standard**: "vLLM, llama.cpp use similar patterns"

---
## Specific Issues & Fixes

### Issue 1: Profiling Code Not Protected ⚠️ CRITICAL

**Location**: Lines 79-141

**Problem**:
```python
# %%
# Profile transformer generation to discover the bottleneck
profiler = Profiler()
# ... profiling code runs immediately
```

This code executes on import, which will cause issues when other modules import this file.
**Fix**:
```python
# %% [markdown]
"""
## 🔬 Motivation: Why Memoization Matters for Transformers
...
"""

# %%
def profile_naive_generation():
    """Profile transformer generation to discover the bottleneck."""
    from tinytorch.profiling.profiler import Profiler
    import matplotlib.pyplot as plt

    profiler = Profiler()

    def naive_attention_step(seq_len, hidden_dim=64):
        # ... implementation
        pass

    # Profile at increasing sequence lengths
    print("🔬 Profiling Transformer Generation (Without Caching):\n")
    # ... rest of profiling code

# Run profiling only when executing the module directly
if __name__ == "__main__":
    profile_naive_generation()
```

---
### Issue 2: Module Number Inconsistency ⚠️ MEDIUM

**Locations**:
- Line 918: "Module 14 doesn't modify Modules 12-13"
- Line 1505: "tito module complete 14"
- Line 1622: "Module 14 doesn't modify"
- Line 1650: "Module 14: KV Caching"

**Fix**: Change all instances of "Module 14" to "Module 15", since this is the memoization module.

**Search and Replace** (e.g. with GNU sed):
```bash
# In memoization_dev.py:
#   Module 14              → Module 15
#   tito module complete 14 → tito module complete 15
sed -i 's/Module 14/Module 15/g; s/tito module complete 14/tito module complete 15/g' \
    modules/15_memoization/memoization_dev.py
```
### Issue 3: Analysis Functions Missing Comprehensive Docstrings ⚠️ MINOR

**Locations**: Lines 1339, 1381

**Current**:
```python
def analyze_kvcache_memory():
    """📊 Analyze KV cache memory usage across different configurations."""
```

**Recommended**:
```python
def analyze_kvcache_memory():
    """
    📊 Analyze KV cache memory usage across different configurations.

    Educational Purpose:
        Demonstrates how cache memory scales with model architecture.
        Students discover:
        - Linear scaling with sequence length O(n)
        - Memory overhead as a percentage of model parameters
        - Trade-off between cache size and speedup gains

    Analyzes:
        - Tiny models (128D): ~0.12 MB
        - Small models (512D): ~2 MB
        - Medium models (768D): ~9 MB
        - Large models (1024D): ~32 MB

    Key Insight:
        Cache overhead is 10-30% of model parameters, but enables
        10-15× speedup. Memory is cheap, compute is expensive!

    Production Context:
        GPT-3 (175B params, 2048 context): ~4GB cache per sequence.
        This memory cost is acceptable given the massive speedup.
    """
```

---
### Issue 4: Missing __main__ Guards ⚠️ CRITICAL

**Problem**: Several code blocks execute on import instead of being protected:
1. Lines 79-141: Profiling code
2. Lines 1426-1427: Analysis function calls

**Fix Pattern**:
```python
# Define functions first
def analyze_kvcache_memory():
    # ... implementation
    pass

def analyze_kvcache_speedup():
    # ... implementation
    pass

# Protect execution
if __name__ == "__main__":
    analyze_kvcache_memory()
    analyze_kvcache_speedup()
```

---
## Comparison with TinyTorch Standards

### Template Compliance: ✅ EXCELLENT

| Standard Requirement | Status | Score |
|---------------------|--------|-------|
| Jupytext Headers | ✅ Complete | 100% |
| NBGrader Metadata | ✅ Mostly Complete | 95% |
| Educational Content | ✅ Excellent | 98% |
| Progressive Disclosure | ✅ Excellent | 100% |
| Immediate Testing | ✅ Yes | 100% |
| Systems Analysis | ✅ Excellent | 98% |
| Production Context | ✅ Outstanding | 100% |
| Module Integration Test | ✅ Present | 100% |
| ML Systems Questions | ✅ Comprehensive | 100% |
| Module Summary | ✅ Excellent | 100% |
### Pedagogical Quality: ✅ EXCELLENT

**Narrative Flow**: Outstanding (95/100)
- Clear motivation with profiling
- Builds complexity progressively
- Strong connection between theory and implementation

**Scaffolding**: Excellent (98/100)
- TODO/APPROACH/HINTS pattern consistently used
- Clear examples in docstrings
- Good balance of guidance vs independence

**Systems Thinking**: Outstanding (100/100)
- Excellent O(n²) → O(n) analysis
- Clear trade-off discussions
- Real production context throughout

### Code Quality: ✅ EXCELLENT

**Implementation**: Clean and Professional (95/100)
- Well-structured KVCache class
- Proper error handling with educational messages
- Good separation of concerns

**Testing**: Comprehensive (95/100)
- Multiple unit tests covering different aspects
- Integration test validates complete workflow
- All tests pass

**Documentation**: Excellent (92/100)
- Rich docstrings with examples
- Clear ASCII diagrams
- Good inline comments explaining design decisions

---
## Critical Path Items (Must Fix Before Release)

### Priority 1: CRITICAL (Block Release)
1. ⚠️ **Protect profiling code with `if __name__ == "__main__"`** (lines 79-141)
2. ⚠️ **Protect analysis function calls** (lines 1426-1427)
3. ⚠️ **Fix module number references** (14 → 15 throughout)

### Priority 2: HIGH (Should Fix)
4. Add nbgrader metadata to motivation/analysis cells
5. Add comprehensive docstrings to analysis functions

### Priority 3: NICE TO HAVE
6. Add a test for cache overflow error handling
7. Add discussion of advanced cache strategies (PagedAttention)
8. Consider adding batch dimension testing

---
## Module-Specific Observations

### What This Module Does Exceptionally Well

1. **Motivation Through Profiling**: The opening section (lines 71-141) is BRILLIANT
   - Shows students the problem BEFORE teaching the solution
   - Concrete measurements demonstrate O(n²) growth
   - Makes the optimization need visceral, not abstract

2. **Non-Invasive Enhancement Pattern**: Outstanding systems engineering lesson
   - Shows how to ADD capabilities without BREAKING existing code
   - Module 15 enhances Module 13 without modifying it
   - Critical production skill: "forward compatibility"

3. **Clear Trade-off Analysis**: Excellent engineering thinking
   - Memory vs compute explicitly quantified
   - "2× memory enables 10× speedup" - concrete numbers
   - Shows students real engineering decisions

4. **Production Grounding**: Every concept tied to real systems
   - ChatGPT, Claude, GPT-4 all use this technique
   - Actual numbers: GPT-3 cache size, speedup measurements
   - Economic viability discussion connects to business reality
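The non-invasive pattern praised in item 2 can be sketched in a few lines. This is an illustration of the general wrap-and-restore idea, not the module's actual `enable_kv_cache`/`disable_kv_cache` implementation (the signatures here are hypothetical):

```python
def enable_kv_cache(model, cache):
    # Wrap the existing forward method so it threads the cache through,
    # without editing the model's class at all.
    original_forward = model.forward

    def cached_forward(x, **kwargs):
        kwargs["cache"] = cache
        return original_forward(x, **kwargs)

    model._uncached_forward = original_forward
    model.forward = cached_forward

def disable_kv_cache(model):
    # Restore the original behavior: the enhancement is fully reversible.
    model.forward = model._uncached_forward
```

The design point this illustrates is exactly the one the review highlights: Module 15 adds a capability to Module 13's transformer without modifying its source, and can remove it again cleanly.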
### Alignment with Module Philosophy

✅ **Single Tensor Class**: Correctly uses Tensor throughout, no Variable confusion
✅ **No Forward References**: Only uses concepts from previous modules
✅ **Immediate Testing**: Tests after each implementation
✅ **Systems Focus**: Outstanding performance analysis
✅ **Production Patterns**: Real-world integration strategy

---
## Recommendations for Improvement

### Short-term (Next Iteration)
1. Add `if __name__ == "__main__"` guards (CRITICAL)
2. Fix module number references (CRITICAL)
3. Add comprehensive docstrings to analysis functions
4. Add nbgrader metadata to remaining cells

### Long-term (Future Enhancements)
1. Add an advanced section on cache eviction strategies
2. Discuss PagedAttention (vLLM's cache management)
3. Add a visualization of cache memory over time
4. Consider adding batch processing examples
5. Add a section on cache-aware model serving (batch prefilling)

### Educational Enhancements
1. Could add an interactive widget showing cache updates
2. Could visualize attention matrix sparsity with caching
3. Add a "common mistakes" section (e.g., forgetting to advance the cache)

---
## Final Assessment

### Overall: ✅ EXCELLENT MODULE (A-)

**Module 15 is production-ready with minor fixes needed.**

### Strengths Summary
- Outstanding educational content with clear progression
- Excellent systems analysis with real measurements
- Strong production context throughout
- Comprehensive testing with good coverage
- Clean, professional implementation
- All tests pass successfully

### Issues Summary
- 3 CRITICAL issues (all easy to fix)
- 2 HIGH priority improvements
- 3 NICE TO HAVE enhancements

### Recommendation
**APPROVE with required fixes:**
1. Add `if __name__ == "__main__"` guards to protect test code
2. Fix module number inconsistencies (14 → 15)
3. Add comprehensive docstrings to analysis functions

After these fixes, this module will be an exemplar of TinyTorch quality.

---
## Comparison with Other Modules

This module represents some of the best educational content in TinyTorch:
- **Better than Modules 01-04**: More sophisticated systems analysis
- **On par with Modules 12-13**: Excellent production grounding
- **Sets a new standard for**: the non-invasive enhancement pattern

The "motivation through profiling" section is a pattern that should be adopted by other optimization modules.

---
## Test Results

```bash
$ python modules/15_memoization/memoization_dev.py

🧪 RUNNING MODULE INTEGRATION TEST
==================================================

Running unit tests...
🔬 Unit Test: KVCache Implementation...
Cache initialized: 0.02 MB
✅ KVCache implementation works correctly!

🔬 Unit Test: Cache Enablement for Different Models...
Test 1: Small Model (Tiny Transformer)
Small model cache: 0.125 MB
Test 2: Medium Model (Standard Transformer)
Medium model cache: 2.000 MB
Test 3: Batch Inference (4 sequences)
Batch cache: 0.500 MB (4x batch size)
✅ Cache enablement works correctly!

🔬 Unit Test: Non-Invasive Cache Integration...
✅ Non-invasive cache integration works correctly!

Running integration scenarios...
🔬 Integration Test: Complete KV Cache Workflow...
✅ Complete KV cache workflow validated!

🔬 Integration Test: Memory Tracking...
✅ Memory tracking: 2.00 MB for 8 tensors

==================================================
🎉 ALL TESTS PASSED! Module ready for export.
```

**Result: ✅ ALL TESTS PASSING**

---
## Sign-off

**Module Quality**: A- (92/100)
**Ready for Student Use**: ✅ YES (after critical fixes)
**Reviewer**: TinyTorch Standards Compliance
**Date**: 2025-11-10

**Final Recommendation**: APPROVE with required fixes for critical issues. This is an excellent educational module that teaches a production-critical optimization with outstanding clarity and systems thinking. The minor issues found are easily fixable and don't detract from the overall quality.