Mirror of https://github.com/MLSysBook/TinyTorch.git (synced 2026-04-27 17:37:42 -05:00)

Prepare TinyTorch for December 2024 community release

## Documentation Consistency
- Fix module count: All docs now correctly show 20 modules (01-20)
- Update Optimization Tier: Modules 14-20 (added 19 Benchmarking, 20 Competition)
- Correct tier descriptions across student-workflow, learning-progress, classroom-use

## New Documentation
- **FAQ (site/faq.md)**: Comprehensive FAQ addressing:
  - Why TinyTorch vs. PyTorch/TensorFlow direct usage
  - Why TinyTorch vs. micrograd/nanoGPT
  - Who should use TinyTorch
  - Course structure and flexibility
  - Practical getting-started questions
- **Datasets (site/datasets.md)**: Complete dataset documentation:
  - Ship-with-repo datasets (TinyDigits 310 KB, TinyTalks 40 KB)
  - Downloaded datasets (MNIST 10 MB, CIFAR-10 170 MB)
  - Design philosophy and rationale
  - Usage instructions per milestone
- **Release Checklist (DECEMBER_2024_RELEASE.md)**:
  - Comprehensive pre-launch checklist
  - Documentation, technical, and community preparation tasks
  - Version recommendation: v0.9.0
  - Success metrics and launch timeline

## Module Count Corrections
- learning-progress.md: 18 → 20 modules, added Benchmarking & Competition to table
- student-workflow.md: 18 → 20 modules, updated Optimization tier description
- classroom-use.md: 18 → 20 modules in feature list

## Version Recommendation
Proposed **v0.9.0** for December 2024 release:
- Signals feature-complete for individual learners
- Reserves v1.0 for classroom integration (Spring 2025)
- Allows v0.9.x patches for post-launch refinements

[Claude Code](https://claude.com/claude-code)
DECEMBER_2024_RELEASE.md (new file, 310 lines)
# TinyTorch December 2024 Community Release Checklist

**Target**: December 2024 community launch as a functional educational framework

**Focus**: Individual learners (classroom integration coming in future releases)

**Goal**: Stable, well-documented system for building ML frameworks from scratch

---

## ✅ Documentation (CRITICAL)

### Core Documentation
- [x] **Student workflow documented** - Clear edit → export → validate cycle
- [x] **Module count corrected** - All docs show 20 modules consistently
- [x] **FAQ created** - Addresses "why TinyTorch vs. alternatives"
- [x] **Datasets documented** - Clear explanation of shipped vs. downloaded data
- [ ] **README.md polished** - First impression for GitHub visitors
- [ ] **LICENSE verified** - Appropriate open-source license in place
- [ ] **CONTRIBUTING.md** - Guidelines for community contributions
- [ ] **Installation guide tested** - Setup works on Mac/Linux/Windows

### Module Documentation
- [ ] **All 20 ABOUT.md files complete** - Each module has learning objectives
- [ ] **Module numbering verified** - 01-20 with correct tier assignments
- [ ] **Prerequisites documented** - Clear dependency chains
- [ ] **Time estimates realistic** - Accurate completion-time expectations

### Milestone Documentation
- [x] **All 6 milestone READMEs standardized** - Historical context + requirements
- [ ] **Expected results documented** - Clear success criteria per milestone
- [ ] **Troubleshooting sections** - Common issues and solutions
- [ ] **Dataset requirements clear** - Which datasets are needed per milestone

---

## 🔧 Technical Validation (CRITICAL)

### Environment Setup
- [ ] **setup-environment.sh tested** on:
  - [ ] macOS (M1/M2 arm64)
  - [ ] macOS (Intel x86_64)
  - [ ] Linux (Ubuntu 22.04)
  - [ ] Linux (Ubuntu 20.04)
  - [ ] Windows (WSL2)
- [ ] **Dependencies verified** - All packages install correctly
- [ ] **Version pins checked** - Compatible NumPy, Jupyter, etc.
- [ ] **Virtual environment isolation** - No conflicts with system Python

### TITO CLI Commands
- [ ] **`tito system doctor`** - Comprehensive environment checks
- [ ] **`tito system info`** - Shows correct configuration
- [ ] **`tito module complete N`** - Exports work correctly for all 20 modules
- [ ] **`tito checkpoint status`** - Optional checkpoint tracking works
- [ ] **Error messages helpful** - Clear guidance when things fail

### Module Export System
- [ ] **Export validates** - All 20 modules export without errors
- [ ] **Import verification** - Exported modules are importable from tinytorch.*
- [ ] **Dependency handling** - Modules export in correct order
- [ ] **File structure correct** - Modules land in the right package locations
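The import-verification item could be scripted with a small helper like the following (a hedged sketch; `verify_imports` and the commented-out tinytorch module names are illustrative assumptions, not the project's actual tooling):

```python
import importlib

def verify_imports(module_names):
    """Try importing each name; return (name, error) for any that fail."""
    failures = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # record any import-time error, not just ModuleNotFoundError
            failures.append((name, repr(exc)))
    return failures

# Hypothetical usage over exported subpackages (names are illustrative):
# verify_imports([f"tinytorch.{m}" for m in ("tensor", "autograd", "layers")])
print(verify_imports(["json", "not_a_real_module_xyz"]))
```

Running it over all twenty exported module paths would turn the checkbox above into a one-line check.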
### Milestone Execution
- [ ] **M01: Perceptron** - Runs successfully with module 07 exports
- [ ] **M02: XOR** - Trains and solves the XOR problem
- [ ] **M03: MLP** - Achieves 85%+ on TinyDigits, 90%+ on MNIST
- [ ] **M04: CNN** - Achieves 70%+ on CIFAR-10
- [ ] **M05: Transformer** - Generates coherent text
- [ ] **M06: MLPerf** - Benchmarking completes successfully

---

## 📦 Repository Health (HIGH PRIORITY)

### Git Repository
- [ ] **.gitignore complete** - No datasets/checkpoints/cache in repo
- [ ] **No large files** - Repository under 50 MB
- [ ] **Clean history** - No sensitive data in commits
- [ ] **Branch strategy** - main/dev branches clearly defined
- [ ] **Tags for release** - v0.9.0 tag created

### Repository Structure
- [ ] **Directory organization clear**:
  - `modules/` - 20 module directories
  - `milestones/` - 6 milestone directories
  - `datasets/` - TinyDigits, TinyTalks (shipped)
  - `site/` - Documentation website
  - `tinytorch/` - Package code (generated from modules)
  - `tests/` - Test suite
- [ ] **README files present** - Key directories have README.md
- [ ] **No orphaned files** - Old experiments cleaned up

### Code Quality
- [ ] **Python 3.9+ compatibility** - Works on modern Python
- [ ] **Type hints** - Critical functions annotated
- [ ] **Docstrings present** - Public APIs documented
- [ ] **Code formatting** - Consistent style (black/ruff)
- [ ] **No obvious bugs** - Core functionality works

---

## 🌐 Website/Documentation Site (HIGH PRIORITY)

### Website Build
- [ ] **Site builds successfully** - `jupyter-book build site/` works
- [ ] **All pages render** - No broken markdown/formatting
- [ ] **Navigation clear** - Easy to find information
- [ ] **Mobile-friendly** - Responsive design works

### Critical Pages
- [x] **intro.md** - Landing page with clear value proposition
- [x] **quickstart-guide.md** - 15-minute getting started
- [x] **student-workflow.md** - Core development cycle
- [x] **tito-essentials.md** - Command reference
- [x] **learning-progress.md** - Module progression guide
- [x] **faq.md** - Answers common questions
- [x] **datasets.md** - Dataset documentation
- [ ] **chapters/** - All chapter content complete

### Internal Links
- [ ] **All internal links work** - No broken cross-references
- [ ] **Code references formatted** - Syntax highlighting works
- [ ] **Images display** - If any diagrams/screenshots are present

---

## 🧪 Testing (MEDIUM PRIORITY)

### Automated Tests
- [ ] **Test suite exists** - tests/ directory has comprehensive coverage
- [ ] **Tests pass** - `pytest tests/` succeeds
- [ ] **Coverage reasonable** - Core functionality tested
- [ ] **CI/CD configured** - GitHub Actions run tests (optional for v0.9)

### Manual Testing
- [ ] **Fresh install tested** - A new user can complete Module 01
- [ ] **Modules 01-07 validated** - Foundation tier works end-to-end
- [ ] **Modules 08-13 validated** - Architecture tier works
- [ ] **Modules 14-20 validated** - Optimization tier works
- [ ] **Cross-platform tested** - Works on Mac/Linux at minimum

### Edge Cases
- [ ] **Missing dependencies handled** - Clear error messages
- [ ] **Network failures graceful** - MNIST/CIFAR download errors handled
- [ ] **Disk space issues** - Helpful messages if space is low
- [ ] **Permission errors** - Guide users to fix permissions

---

## 📢 Community Preparation (MEDIUM PRIORITY)

### GitHub Repository
- [ ] **Description clear** - "Educational ML framework built from scratch"
- [ ] **Topics tagged** - machine-learning, education, pytorch-alternative, etc.
- [ ] **GitHub Pages enabled** - Documentation site live
- [ ] **Issue templates** - Bug report and feature request templates
- [ ] **PR template** - Contribution guidelines template
- [ ] **Code of Conduct** - Community standards documented

### Communication
- [ ] **Release announcement drafted** - What, why, and how to get started
- [ ] **Social media prepared** - Twitter/LinkedIn posts ready
- [ ] **README badges** - Build status, license, etc.
- [ ] **Changelog started** - CHANGELOG.md for v0.9.0

### Community Resources
- [ ] **GitHub Discussions enabled** - Q&A and community space
- [ ] **Discord/Slack** (optional) - Real-time community chat
- [ ] **Leaderboard** (optional) - Module 20 competition results
- [ ] **Contributor guide** - How to contribute code/docs

---

## 🎓 Educational Quality (MEDIUM PRIORITY)

### Pedagogical Soundness
- [ ] **Learning objectives clear** - Each module states what you'll learn
- [ ] **Prerequisites documented** - Students know what's required
- [ ] **Scaffolding effective** - Modules build on previous work
- [ ] **Systems focus maintained** - Profiling/performance emphasized

### Student Experience
- [ ] **First module polished** - Module 01 is an excellent introduction
- [ ] **Error messages helpful** - Students aren't blocked by cryptic errors
- [ ] **Success feedback** - Celebrate completions appropriately
- [ ] **Realistic expectations** - Time estimates accurate

### Reference Materials
- [ ] **Production comparisons** - How TinyTorch relates to PyTorch/TF
- [ ] **Historical context** - Why each milestone matters
- [ ] **Career connections** - Job relevance clear
- [ ] **Further reading** - Links to deepen understanding

---

## 🚀 Launch Readiness (LOW PRIORITY - Nice to Have)

### Optional Enhancements
- [ ] **Video walkthrough** - 5-minute intro video
- [ ] **Blog post** - Detailed launch article
- [ ] **Academic paper** - Pedagogy research paper (future)
- [ ] **Conference submission** - SIGCSE/ICER presentation (future)

### Future Features (Mark as "Coming Soon")
- [x] **NBGrader integration** - Marked as coming soon in docs
- [x] **Classroom tooling** - Instructor guide states it is under development
- [ ] **Advanced modules** - 21-25 as extensions (future)
- [ ] **GPU support** - CUDA implementation (future)

---

## Final Pre-Launch Checklist

**Run through this sequence in the week before launch:**

### Day -7: Documentation Review
- [ ] Read the entire documentation site as a new user
- [ ] Fix all typos, broken links, and unclear sections
- [ ] Verify all code examples run correctly

### Day -5: Technical Validation
- [ ] Fresh install on 3 different machines
- [ ] Complete Module 01 on each platform
- [ ] Run all 6 milestones successfully
- [ ] Verify all TITO commands work

### Day -3: Community Prep
- [ ] Finalize GitHub repository settings
- [ ] Prepare announcement posts
- [ ] Set up community channels (Discussions/Discord)
- [ ] Test the contributor workflow

### Day -1: Final Polish
- [ ] Create v0.9.0 release tag
- [ ] Deploy documentation site
- [ ] Queue social media announcements
- [ ] Prepare for launch-day support

### Launch Day
- [ ] Publish release on GitHub
- [ ] Post announcements (social media, forums)
- [ ] Monitor issues/discussions
- [ ] Celebrate! 🎉

---

## Version Recommendation

**Proposed**: **v0.9.0** for the December 2024 release

**Rationale:**
- v1.0 implies "production complete" - save that for classroom integration
- v0.9 signals "feature-complete for individual learners, refinements ongoing"
- Allows v0.9.x patches for bugs discovered post-launch
- v1.0 can mark the full classroom-integration milestone (Spring 2025?)

**Version Roadmap:**
- **v0.9.0** (Dec 2024) - Community launch for individual learners
- **v0.9.x** (Dec-Feb) - Bug fixes and documentation improvements
- **v1.0.0** (Spring 2025?) - NBGrader integration + full classroom support
- **v1.x.x** - Advanced modules, GPU support, additional features

---

## Success Metrics (Post-Launch)

Track these after release:

**Technical:**
- Setup success rate (% of users completing Module 01)
- Platform coverage (macOS/Linux/Windows compatibility)
- Bug report frequency
- Milestone completion rates

**Community:**
- GitHub stars/forks
- Documentation page views
- Community discussion activity
- Contribution rate

**Educational:**
- Module completion rates
- Time-to-complete estimates validated
- Learning-objective achievement
- Student feedback quality

---

## Notes

**Current Status (as of checklist creation):**
- ✅ Documentation structure complete and consistent
- ✅ Module count corrected to 20
- ✅ FAQ and datasets documented
- ⏳ Need comprehensive testing across platforms
- ⏳ Need community infrastructure setup
- ⏳ Need a final polish pass

**Estimated time to launch-ready:** 2-3 weeks of focused work

**Critical path items:**
1. Technical validation (test on multiple platforms)
2. Module/milestone execution verification
3. Documentation final polish
4. Community infrastructure setup
5. Release announcement preparation

**Non-blocking items (can be post-launch):**
- Video tutorials
- Advanced test coverage
- Performance optimizations
- Additional example notebooks
site/datasets.md (new file, 309 lines)
# TinyTorch Datasets

<div style="background: #f8f9fa; padding: 2rem; border-radius: 0.5rem; margin: 2rem 0; text-align: center;">
<h2 style="margin: 0 0 1rem 0; color: #495057;">Ship-with-Repo Datasets for Fast Learning</h2>
<p style="margin: 0; font-size: 1.1rem; color: #6c757d;">Small datasets for instant iteration + standard benchmarks for validation</p>
</div>

**Purpose**: Understand TinyTorch's dataset strategy and where to find each dataset used in the milestones.

## Design Philosophy

TinyTorch uses a two-tier dataset approach:

<div style="display: grid; grid-template-columns: 1fr 1fr; gap: 1.5rem; margin: 2rem 0;">

<div style="background: #e3f2fd; border: 1px solid #2196f3; padding: 1.5rem; border-radius: 0.5rem;">
<h3 style="margin: 0 0 1rem 0; color: #1976d2;">📦 Shipped Datasets</h3>
<p style="margin: 0 0 1rem 0;"><strong>~350 KB total - Ships with repository</strong></p>
<ul style="margin: 0; font-size: 0.9rem;">
<li>Small enough to fit in Git (~1K samples each)</li>
<li>Fast training (seconds to minutes)</li>
<li>Instant gratification for learners</li>
<li>Works offline - no download needed</li>
<li>Perfect for rapid iteration</li>
</ul>
</div>

<div style="background: #f3e5f5; border: 1px solid #9c27b0; padding: 1.5rem; border-radius: 0.5rem;">
<h3 style="margin: 0 0 1rem 0; color: #7b1fa2;">⬇️ Downloaded Datasets</h3>
<p style="margin: 0 0 1rem 0;"><strong>~180 MB - Auto-downloaded when needed</strong></p>
<ul style="margin: 0; font-size: 0.9rem;">
<li>Standard ML benchmarks (MNIST, CIFAR-10)</li>
<li>Larger scale (~60K samples)</li>
<li>Used for validation and scaling</li>
<li>Downloaded automatically by milestones</li>
<li>Cached locally for reuse</li>
</ul>
</div>

</div>

**Philosophy**: Following Andrej Karpathy's "~1K samples" approach—small datasets for learning, full benchmarks for validation.

---

## Shipped Datasets (Included with TinyTorch)

### TinyDigits - Handwritten Digit Recognition

<div style="background: #fff5f5; border-left: 4px solid #e74c3c; padding: 1.5rem; margin: 1.5rem 0;">

**📍 Location**: `datasets/tinydigits/`
**📊 Size**: ~310 KB
**🎯 Used by**: Milestones 03 & 04 (MLP and CNN examples)

**Contents:**
- 1,000 training samples
- 200 test samples
- 8×8 grayscale images (downsampled from MNIST)
- 10 classes (digits 0-9)

**Format**: Python pickle file with NumPy arrays

**Why 8×8?**
- Fast iteration: trains in seconds
- Memory-friendly: small enough to debug
- Conceptually complete: same challenges as 28×28 MNIST
- Git-friendly: only 310 KB vs. 10 MB for full MNIST
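The quoted ~310 KB is consistent with storing the images as float32 NumPy arrays (a back-of-the-envelope sketch, assuming 4-byte pixels across all 1,200 samples):

```python
samples = 1000 + 200               # train + test
pixels = 8 * 8                     # one 8x8 grayscale image
bytes_total = samples * pixels * 4  # float32 = 4 bytes per pixel
print(bytes_total / 1024)           # 300.0 -> ~300 KB for the image arrays alone
```

The labels and pickle overhead account for the remaining few kilobytes.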
**Usage in milestones:**
```python
# Automatically loaded by milestones
from datasets.tinydigits import load_tinydigits

X_train, y_train, X_test, y_test = load_tinydigits()
# X_train shape: (1000, 8, 8)
# y_train shape: (1000,)
```

</div>
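The 8×8 images are described as downsampled from MNIST's 28×28 originals. One plausible way to produce such a downsample (an illustrative sketch, not the actual TinyDigits preprocessing pipeline) is to crop to the central 24×24 window and average 3×3 blocks:

```python
import numpy as np

def downsample_28_to_8(img):
    """Crop a 28x28 image to the central 24x24, then average 3x3 blocks -> 8x8."""
    assert img.shape == (28, 28)
    cropped = img[2:26, 2:26]             # central 24x24 window
    blocks = cropped.reshape(8, 3, 8, 3)  # split into an 8x8 grid of 3x3 tiles
    return blocks.mean(axis=(1, 3))       # average each tile

demo = np.arange(28 * 28, dtype=np.float64).reshape(28, 28)
small = downsample_28_to_8(demo)
print(small.shape)  # (8, 8)
```

Block averaging keeps the stroke structure that makes digit classification meaningful while shrinking storage by roughly 12×.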
### TinyTalks - Conversational Q&A Dataset

<div style="background: #f0fff4; border-left: 4px solid #22c55e; padding: 1.5rem; margin: 1.5rem 0;">

**📍 Location**: `datasets/tinytalks/`
**📊 Size**: ~40 KB
**🎯 Used by**: Milestone 05 (Transformer/GPT text generation)

**Contents:**
- 350 Q&A pairs across 5 difficulty levels
- Character-level text data
- Topics: general knowledge, math, science, reasoning
- Balanced difficulty distribution

**Format**: Plain text files in Q: / A: format

**Why a conversational format?**
- Engaging: questions feel natural
- Varied: different answer lengths and complexity
- Educational: difficulty levels scaffold learning
- Practical: mirrors real chatbot use cases

**Example:**
```
Q: What is the capital of France?
A: Paris

Q: If a train travels 120 km in 2 hours, what is its average speed?
A: 60 km/h
```

**Usage in milestones:**
```python
# Automatically loaded by transformer milestones
from datasets.tinytalks import load_tinytalks

dataset = load_tinytalks()
# Returns a list of (question, answer) pairs
```

**📖 See detailed documentation:** `datasets/tinytalks/README.md`

</div>
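A minimal parser for the Q: / A: format shown above might look like this (a sketch; the repo's actual `load_tinytalks` may differ, and `parse_qa` is a hypothetical name):

```python
def parse_qa(text):
    """Parse 'Q: ...' / 'A: ...' lines into a list of (question, answer) pairs."""
    pairs, question = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            pairs.append((question, line[2:].strip()))
            question = None  # blank lines between pairs are simply skipped
    return pairs

sample = """Q: What is the capital of France?
A: Paris

Q: If a train travels 120 km in 2 hours, what is its average speed?
A: 60 km/h
"""
print(parse_qa(sample))
```

For the transformer milestone, pairs like these would then be flattened into a single character stream for next-character prediction.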
---

## Downloaded Datasets (Auto-Downloaded On Demand)

These standard benchmarks download automatically when you run the relevant milestone scripts:

### MNIST - Handwritten Digit Classification

<div style="background: #fffbeb; border-left: 4px solid #f59e0b; padding: 1.5rem; margin: 1.5rem 0;">

**📍 Downloads to**: `milestones/datasets/mnist/`
**📊 Size**: ~10 MB (compressed)
**🎯 Used by**: `milestones/03_1986_mlp/02_rumelhart_mnist.py`

**Contents:**
- 60,000 training samples
- 10,000 test samples
- 28×28 grayscale images
- 10 classes (digits 0-9)

**Auto-download**: When you run the MNIST milestone script, it automatically:
1. Checks whether the data exists locally
2. Downloads it if needed (~10 MB)
3. Caches it for future runs
4. Loads the data using your TinyTorch DataLoader

**Purpose**: Validate that your framework achieves production-level results (95%+ accuracy target)

**Milestone goal**: Implement backpropagation and achieve 95%+ accuracy—matching Rumelhart's 1986 breakthrough.

</div>
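The four auto-download steps can be sketched as a small helper (an illustrative sketch, not the actual code in `milestones/data_manager.py`; `ensure_dataset` is a hypothetical name and the `fetch` callable stands in for the real HTTP download):

```python
import os
import tempfile
from pathlib import Path

def ensure_dataset(path, fetch):
    """Return cached bytes at `path`, calling `fetch()` to download on a cache miss."""
    path = Path(path)
    if path.exists():                    # 1. check whether the data exists locally
        return path.read_bytes()
    data = fetch()                       # 2. download if needed
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)               # 3. cache for future runs
    return data                          # 4. hand the bytes to the data loader

# Demo with a fake fetch so no network is touched:
calls = {"n": 0}
def fake_fetch():
    calls["n"] += 1
    return b"fake-mnist-bytes"

cache = os.path.join(tempfile.mkdtemp(), "mnist.bin")
first = ensure_dataset(cache, fake_fetch)
second = ensure_dataset(cache, fake_fetch)
print(calls["n"])  # 1: the second call was served from the cache
```

Making `fetch` injectable keeps the cache logic testable without hitting the network, which also matters for the "network failures graceful" checklist item.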
### CIFAR-10 - Natural Image Classification

<div style="background: #fdf2f8; border-left: 4px solid #ec4899; padding: 1.5rem; margin: 1.5rem 0;">

**📍 Downloads to**: `milestones/datasets/cifar-10/`
**📊 Size**: ~170 MB (compressed)
**🎯 Used by**: `milestones/04_1998_cnn/02_lecun_cifar10.py`

**Contents:**
- 50,000 training samples
- 10,000 test samples
- 32×32 RGB images
- 10 classes (airplane, car, bird, cat, deer, dog, frog, horse, ship, truck)

**Auto-download**: The milestone script handles everything:
1. Downloads from the official source
2. Verifies integrity
3. Caches locally
4. Preprocesses the data for your framework

**Purpose**: Prove your CNN implementation works on real natural images (75%+ accuracy target)

**Milestone goal**: Build a LeNet-style CNN achieving 75%+ accuracy—demonstrating spatial intelligence.

</div>

---

## Dataset Selection Rationale

### Why These Specific Datasets?

**TinyDigits (not full MNIST):**
- ✅ 100× faster training iterations
- ✅ Ships with the repo (no download)
- ✅ Same conceptual challenges
- ✅ Perfect for learning and debugging

**TinyTalks (custom dataset):**
- ✅ Designed for educational progression
- ✅ Scaffolded difficulty levels
- ✅ Friendly to character-level tokenization
- ✅ Engaging conversational format

**MNIST (when scaling up):**
- ✅ Industry-standard benchmark
- ✅ Validates your implementation
- ✅ Comparable to published results
- ✅ 95%+ accuracy is an achievable milestone

**CIFAR-10 (for CNN validation):**
- ✅ Natural images (harder than digits)
- ✅ RGB channels (multi-dimensional)
- ✅ Standard CNN benchmark
- ✅ 75%+ with a basic CNN proves it works

---

## Accessing Datasets

### For Students

**You don't need to download anything manually!**

```bash
# Just run the milestone scripts
cd milestones/03_1986_mlp
python 01_rumelhart_tinydigits.py  # Uses shipped TinyDigits
python 02_rumelhart_mnist.py       # Auto-downloads MNIST if needed
```

The milestones handle all data loading automatically.

### For Developers/Researchers

**Direct dataset access:**

```python
# Shipped datasets (always available)
from datasets.tinydigits import load_tinydigits
X_train, y_train, X_test, y_test = load_tinydigits()

from datasets.tinytalks import load_tinytalks
conversations = load_tinytalks()

# Downloaded datasets (through milestones)
# See milestones/data_manager.py for download utilities
```

---

## Dataset Sizes Summary

| Dataset | Size | Samples | Ships With Repo | Purpose |
|---------|------|---------|-----------------|---------|
| TinyDigits | 310 KB | 1,200 | ✅ Yes | Fast MLP/CNN iteration |
| TinyTalks | 40 KB | 350 pairs | ✅ Yes | Transformer learning |
| MNIST | 10 MB | 70,000 | ❌ Downloads | MLP validation |
| CIFAR-10 | 170 MB | 60,000 | ❌ Downloads | CNN validation |

**Total shipped**: ~350 KB
**Total with benchmarks**: ~180 MB

---

## Why Ship-with-Repo Matters

<div style="background: #e3f2fd; padding: 1.5rem; border-radius: 0.5rem; margin: 1.5rem 0;">

**Traditional ML courses:**
- "Download MNIST (10 MB)"
- "Download CIFAR-10 (170 MB)"
- Wait for downloads before starting
- Large files in Git (bad practice)

**TinyTorch approach:**
- Clone the repo → immediately start learning
- Train your first model in under 1 minute
- Full benchmarks download only when scaling up
- The Git repo stays small and fast

**Educational benefit**: Students see working models within minutes, not hours.

</div>

---

## Frequently Asked Questions

**Q: Why not use full MNIST from the start?**
A: TinyDigits trains 100× faster, enabling rapid iteration during learning. MNIST validates your complete implementation later.

**Q: Can I use my own datasets?**
A: Absolutely! TinyTorch is a real framework—add your own data-loading code just as you would with PyTorch.

**Q: Why ship datasets in Git?**
A: 350 KB is negligible (smaller than many images), and it enables offline learning with instant iteration.

**Q: Where does CIFAR-10 download from?**
A: Official sources via `milestones/data_manager.py`, with integrity verification.

**Q: Can I skip the large downloads?**
A: Yes! You can work through most milestones using only the shipped datasets. The downloaded datasets are for validation milestones.

---

## Related Documentation

- **📖 [Milestones Guide](chapters/milestones.html)** - See how each dataset is used in historical achievements
- **📖 [Student Workflow](student-workflow.html)** - Learn the development cycle
- **📖 [Quick Start](quickstart-guide.html)** - Start building in 15 minutes

**Dataset implementation details**: See `datasets/tinydigits/README.md` and `datasets/tinytalks/README.md` for technical specifications.
site/faq.md (new file, 385 lines)
# Frequently Asked Questions

<div style="background: #f8f9fa; padding: 2rem; border-radius: 0.5rem; margin: 2rem 0; text-align: center;">
<h2 style="margin: 0 0 1rem 0; color: #495057;">Common Questions About TinyTorch</h2>
<p style="margin: 0; font-size: 1.1rem; color: #6c757d;">Why build from scratch? Why not just use PyTorch? All your questions answered.</p>
</div>

## General Questions

### What is TinyTorch?

TinyTorch is an educational ML systems framework where you build a complete neural network library from scratch. Instead of using PyTorch or TensorFlow as black boxes, you implement every component yourself—tensors, gradients, optimizers, attention mechanisms—gaining a deep understanding of how modern ML frameworks actually work.

### Who is TinyTorch for?

TinyTorch is designed for:

- **Students** learning ML who want to understand what's happening under the hood
- **ML practitioners** who want to debug models more effectively
- **Systems engineers** building or optimizing ML infrastructure
- **Researchers** who need to implement novel architectures
- **Educators** teaching ML systems (not just ML algorithms)

If you've ever wondered "why does my model OOM?" or "how does autograd actually work?", TinyTorch is for you.

### How long does it take?

**Quick exploration**: 2-4 weeks focusing on the Foundation tier (Modules 01-07)
**Complete course**: 14-18 weeks implementing all three tiers (20 modules)
**Flexible approach**: Pick specific modules based on your learning goals

You control the pace. Some students complete it in an intensive 8-week sprint; others spread it across a semester.

---

## Why TinyTorch vs. Alternatives?

### Why not just use PyTorch or TensorFlow directly?

**Short answer**: Because using a library doesn't teach you how it works.

**The problem with "just use PyTorch":**

When you write:
```python
import torch.nn as nn

model = nn.Linear(784, 10)
optimizer = torch.optim.Adam(model.parameters())
```

you're calling functions you don't understand. When things break (and they will), you're stuck:
- **OOM errors**: Why? How much memory does this need?
- **Slow training**: What's the bottleneck? Data loading? Computation?
- **NaN losses**: Where did the gradients explode? How do you debug it?

**What TinyTorch teaches:**

When you implement `Linear` yourself:
```python
class Linear:
    def __init__(self, in_features, out_features):
        # You understand EXACTLY what memory is allocated
        self.weight = randn(in_features, out_features) * 0.01  # Why 0.01?
        self.bias = zeros(out_features)                        # Why zeros?

    def forward(self, x):
        self.input = x  # Why save the input? (Hint: backward pass)
        return x @ self.weight + self.bias  # You know the exact operations

    def backward(self, grad):
        # You wrote this gradient! You can debug it!
        self.weight.grad = self.input.T @ grad
        return grad @ self.weight.T
```

Now you can:
- **Calculate memory requirements** before running
- **Profile and optimize** every operation
- **Debug gradient issues** by inspecting your own code
- **Implement novel architectures** with confidence
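For instance, the parameter memory of the `nn.Linear(784, 10)` layer mentioned earlier can be worked out directly (a quick sanity-check sketch; `linear_param_bytes` is a hypothetical helper, and float32 parameters are assumed):

```python
def linear_param_bytes(in_features, out_features, bytes_per_param=4):
    """Parameter memory of a dense layer: weight matrix + bias vector."""
    n_params = in_features * out_features + out_features
    return n_params * bytes_per_param

print(linear_param_bytes(784, 10))  # 31400 bytes, i.e. ~31 KB of float32 weights
```

The same arithmetic, applied layer by layer (plus activations and gradients), lets you predict a model's memory footprint before you ever run it.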
|
||||
### Why TinyTorch instead of Andrej Karpathy's micrograd or nanoGPT?

We love micrograd and nanoGPT! They're excellent educational resources. Here's how TinyTorch differs:

**micrograd (100 lines)**
- **Scope**: Teaches autograd elegantly in minimal code
- **Limitation**: Doesn't cover CNNs, transformers, data loading, optimization
- **Use case**: Perfect introduction to automatic differentiation

**nanoGPT (300 lines)**
- **Scope**: Clean GPT implementation for understanding transformers
- **Limitation**: Doesn't teach fundamentals (tensors, layers, training loops)
- **Use case**: Excellent for understanding transformer architecture specifically

**TinyTorch (20 modules, complete framework)**
- **Scope**: Full ML systems course from mathematical primitives to production deployment
- **Coverage**:
  - Foundation (tensors, autograd, optimizers)
  - Architecture (CNNs for vision, transformers for language)
  - Optimization (profiling, quantization, benchmarking)
- **Outcome**: You build a unified framework supporting both vision AND language models
- **Systems focus**: Memory profiling, performance analysis, and production context built into every module

**Analogy:**
- **micrograd**: Learn how an engine works
- **nanoGPT**: Learn how a sports car works
- **TinyTorch**: Build a complete vehicle manufacturing plant (and understand engines, cars, AND the factory)

**When to use each:**
- **Start with micrograd** if you want a gentle introduction to autograd (1-2 hours)
- **Try nanoGPT** if you specifically want to understand GPT architecture (1-2 days)
- **Choose TinyTorch** if you want complete ML systems engineering skills (8-18 weeks)

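For a taste of the autograd core that micrograd teaches (and that TinyTorch has you extend to full tensors), the idea fits in a few lines. This is our own toy sketch, not micrograd's actual code:

```python
class Scalar:
    """A number that remembers how it was computed."""
    def __init__(self, val, parents=()):
        self.val, self.grad = val, 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Scalar(self.val + other.val, (self, other))
        def bw():
            self.grad += out.grad           # d(a+b)/da = 1
            other.grad += out.grad
        out._backward = bw
        return out

    def __mul__(self, other):
        out = Scalar(self.val * other.val, (self, other))
        def bw():
            self.grad += other.val * out.grad   # chain rule
            other.grad += self.val * out.grad
        out._backward = bw
        return out

    def backward(self):
        # Visit nodes in reverse topological order
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Scalar(2.0), Scalar(3.0)
(a * b + a).backward()
print(a.grad, b.grad)   # d(ab+a)/da = b+1 = 4.0, d(ab+a)/db = a = 2.0
```

Elegant, but note what's missing: tensors, broadcasting, layers, optimizers, data pipelines. Those gaps are exactly what TinyTorch's remaining modules fill in.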
### Why not just read PyTorch source code?

**Three problems with reading production framework code:**

1. **Complexity**: PyTorch has 350K+ lines optimized for production, not learning
2. **C++/CUDA**: Core operations are in low-level languages for performance
3. **No learning path**: Where do you even start?

**TinyTorch's pedagogical approach:**

1. **Incremental complexity**: Start with 2D matrices, build up to 4D tensors
2. **Pure Python**: Understand algorithms before optimization
3. **Guided curriculum**: Clear progression from basics to advanced
4. **Systems thinking**: Every module includes profiling and performance analysis

You learn the *concepts* in TinyTorch, then understand how PyTorch optimizes them for production.

---

## Technical Questions

### What programming background do I need?

**Required:**
- Python programming (functions, classes, basic NumPy)
- Basic calculus (derivatives, chain rule)
- Linear algebra (matrix multiplication)

**Helpful but not required:**
- Git version control
- Command-line comfort
- Previous ML course (though TinyTorch teaches from scratch)

### What hardware do I need?

**Minimum:**
- Any laptop with 8GB RAM
- Works on M1/M2 Macs, Intel, AMD

**No GPU required!** TinyTorch runs on CPU and teaches concepts that transfer to GPU optimization.

### Does TinyTorch replace a traditional ML course?

**No, it complements it.**

**Traditional ML course teaches:**
- Algorithms (gradient descent, backpropagation)
- Theory (loss functions, regularization)
- Applications (classification, generation)

**TinyTorch teaches:**
- Systems (how frameworks work)
- Implementation (building from scratch)
- Production (profiling, optimization, deployment)

**Best approach**: Take a traditional ML course for theory, use TinyTorch to deeply understand implementation.

### Can I use TinyTorch for research or production?

- **Research**: Absolutely! Build novel architectures with full control
- **Production**: TinyTorch is educational—use PyTorch/TensorFlow for production scale

**However:** Understanding TinyTorch makes you much better at using production frameworks. You'll:
- Write more efficient PyTorch code
- Debug issues faster
- Understand performance characteristics
- Make better architectural decisions

---

## Course Structure Questions

### Do I need to complete all 20 modules?

**No!** TinyTorch offers flexible learning paths:

**Three tiers:**
1. **Foundation (01-07)**: Core ML infrastructure—understand how training works
2. **Architecture (08-13)**: Modern AI architectures—CNNs and transformers
3. **Optimization (14-20)**: Production deployment—profiling and acceleration

**Suggested paths:**
- **ML student**: Foundation tier gives you deep understanding
- **Systems engineer**: All three tiers teach complete ML systems
- **Researcher**: Focus on Foundation + Architecture for implementation skills
- **Curious learner**: Pick modules that interest you

### What are the milestones?

Milestones are historical ML achievements you recreate with YOUR implementations:

- **M01: 1957 Perceptron** - First trainable neural network
- **M02: 1969 XOR** - Multi-layer networks solve the XOR problem
- **M03: 1986 MLP** - Backpropagation achieves 95%+ on MNIST
- **M04: 1998 CNN** - LeNet-style CNN gets 75%+ on CIFAR-10
- **M05: 2017 Transformer** - GPT-style text generation
- **M06: 2018 MLPerf** - Production optimization benchmarking

Each milestone proves your framework works by running actual ML experiments.

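To make the first of those concrete: milestone M01 is essentially Rosenblatt's update rule on linearly separable data. A generic NumPy sketch of the algorithm (illustrative only—the milestone script itself runs on YOUR TinyTorch implementation):

```python
import numpy as np

# Linearly separable toy data: label 1 when x0 + x1 > 1, with a safety margin
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 2))
X = X[np.abs(X.sum(axis=1) - 1) > 0.1][:100]   # keep points away from the boundary
y = (X.sum(axis=1) > 1).astype(int)

w, b = np.zeros(2), 0.0
for epoch in range(1000):
    mistakes = 0
    for xi, yi in zip(X, y):
        pred = int(xi @ w + b > 0)     # step activation
        if pred != yi:
            w += (yi - pred) * xi      # Rosenblatt's update rule
            b += yi - pred
            mistakes += 1
    if mistakes == 0:                  # converged: every point classified correctly
        break

acc = np.mean((X @ w + b > 0).astype(int) == y)
print(acc)   # converges to 1.0 on separable data
```

The perceptron convergence theorem guarantees this loop terminates on separable data, which is why 1957's result was such a landmark.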
**📖 See [Journey Through ML History](chapters/milestones.html)** for details.

### Are the checkpoints required?

**No, they're optional.**

**The essential workflow:**
```
1. Edit modules → 2. Export → 3. Validate with milestones
```

**Optional checkpoint system:**
- Tracks 21 capability checkpoints
- Helpful for self-assessment
- Use `tito checkpoint status` to view progress

**📖 See [Student Workflow](student-workflow.html)** for the core development cycle.

---

## Practical Questions

### How do I get started?

**Quick start (15 minutes):**

```bash
# 1. Clone repository
git clone https://github.com/mlsysbook/TinyTorch.git
cd TinyTorch

# 2. Automated setup
./setup-environment.sh
source activate.sh

# 3. Verify setup
tito system doctor

# 4. Start first module
cd modules/01_tensor
jupyter lab tensor_dev.py
```

**📖 See [Quick Start Guide](quickstart-guide.html)** for detailed setup.

### What's the typical workflow?

```bash
# 1. Work on module source
cd modules/03_layers
jupyter lab layers_dev.py

# 2. Export when ready
tito module complete 03

# 3. Validate by running milestones
cd ../../milestones/01_1957_perceptron
python rosenblatt_forward.py  # Uses YOUR implementation!
```

**📖 See [Student Workflow](student-workflow.html)** for complete details.

### Can I use this in my classroom?

**Yes!** TinyTorch is designed for classroom use.

**Current status:**
- Students can work through modules individually
- NBGrader integration coming soon for automated grading
- Instructor tooling under development

**📖 See [Classroom Use Guide](usage-paths/classroom-use.html)** for details.

### How do I get help?

**Resources:**
- **Documentation**: Comprehensive guides for every module
- **GitHub Issues**: Report bugs or ask questions
- **Community**: (Coming soon) Discord/forum for peer support

---

## Philosophy Questions

### Why build from scratch instead of using libraries?

**The difference between using and understanding:**

When you import a library, you're limited by what it provides. When you build from scratch, you understand the foundations and can create anything.

**Real-world impact:**
- **Debugging**: "My model won't train" → You know exactly where to look
- **Optimization**: "Training is slow" → You can profile and fix bottlenecks
- **Innovation**: "I need a novel architecture" → You build it confidently
- **Career**: ML systems engineers who understand internals are highly valued

### Isn't this reinventing the wheel?

**Yes, intentionally!**

**The best way to learn engineering:** Build it yourself.

- Car mechanics learn by taking apart engines
- Civil engineers build bridge models
- Software engineers implement data structures from scratch

**Then** they use production tools with deep understanding.

### Will I still use PyTorch/TensorFlow after this?

**Absolutely!** TinyTorch makes you *better* at using production frameworks.

**Before TinyTorch:**
```python
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
# It works but... why 128? What's the memory usage? How does ReLU affect gradients?
```

**After TinyTorch:**
```python
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
# I know: 784*128 + 128*10 weights (plus biases) ≈ 100K params * 4 bytes ≈ 400KB
# I understand: ReLU zeros the gradients of negative inputs, shaping backprop
# I can optimize: maybe use a smaller hidden layer or quantize to INT8
```

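That back-of-envelope estimate is easy to verify—the kind of arithmetic TinyTorch trains you to do reflexively:

```python
# Parameter count and float32 memory for the MLP above: 784 → 128 → 10
layers = [(784, 128), (128, 10)]
params = sum(fan_in * fan_out + fan_out for fan_in, fan_out in layers)  # weights + biases
print(params)              # 101770 parameters
print(params * 4 / 1024)   # ~397.5 KB of weights at 4 bytes per float32
```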
You use the same tools, but with systems-level understanding.

---

## Community Questions

### Can I contribute to TinyTorch?

**Yes!** TinyTorch is open-source and welcomes contributions:

- Bug fixes and improvements
- Documentation enhancements
- Additional modules or extensions
- Educational resources

Check the GitHub repository for contribution guidelines.

### Is there a community?

**Growing!** TinyTorch is launching to the community in December 2024.

- GitHub Discussions for Q&A
- Optional leaderboard for the Module 20 competition
- Community showcase (coming soon)

### How is TinyTorch maintained?

TinyTorch is developed at the intersection of academia and education:
- Research-backed pedagogy
- Active development and testing
- Community feedback integration
- Regular updates and improvements

---

## Still Have Questions?

<div style="background: #f8f9fa; padding: 2rem; border-radius: 0.5rem; margin: 2rem 0; text-align: center;">
  <h3 style="margin: 0 0 1rem 0; color: #495057;">Ready to Start Building?</h3>
  <p style="margin: 0 0 1.5rem 0; color: #6c757d;">Jump in and start implementing ML systems from scratch</p>
  <a href="quickstart-guide.html" style="display: inline-block; background: #007bff; color: white; padding: 0.75rem 1.5rem; border-radius: 0.25rem; text-decoration: none; font-weight: 500; margin-right: 1rem;">15-Minute Start →</a>
  <a href="intro.html" style="display: inline-block; background: #28a745; color: white; padding: 0.75rem 1.5rem; border-radius: 0.25rem; text-decoration: none; font-weight: 500;">Learn More →</a>
</div>

**Can't find your question?** Open an issue on [GitHub](https://github.com/mlsysbook/TinyTorch/issues) and we'll help!

**learning-progress.md**

```diff
@@ -2,7 +2,7 @@
 
 <div style="background: #f8f9fa; padding: 2rem; border-radius: 0.5rem; margin: 2rem 0; text-align: center;">
 <h2 style="margin: 0 0 1rem 0; color: #495057;">Monitor Your Learning Journey</h2>
-<p style="margin: 0; font-size: 1.1rem; color: #6c757d;">Track your capability development through 18 modules and 6 historical milestones</p>
+<p style="margin: 0; font-size: 1.1rem; color: #6c757d;">Track your capability development through 20 modules and 6 historical milestones</p>
 </div>
 
 **Purpose**: Monitor your progress as you build a complete ML framework from scratch. Track module completion and milestone achievements.
@@ -65,7 +65,7 @@ TinyTorch organizes learning through **three pedagogically-motivated tiers**, ea
 
 ## Module Progression
 
-Your journey through 18 modules organized in three tiers:
+Your journey through 20 modules organized in three tiers:
 
 ### 🏗️ Foundation Tier (Modules 01-07)
 
@@ -98,7 +98,7 @@ Implement modern architectures:
 
 **Milestones unlocked**: M03 MLP (1986), M04 CNN (1998), M05 Transformers (2017)
 
-### ⚡ Optimization Tier (Modules 14-18)
+### ⚡ Optimization Tier (Modules 14-20)
 
 Optimize for production:
 
@@ -109,6 +109,8 @@ Optimize for production:
 | 16 | Compression | Pruning techniques |
 | 17 | Memoization | KV-cache for generation |
 | 18 | Acceleration | Batching strategies |
+| 19 | Benchmarking | MLPerf-style fair comparison |
+| 20 | Competition | Capstone optimization challenge |
 
 **Milestone unlocked**: M06 MLPerf (2018)
 
```

**student-workflow.md**

```diff
@@ -68,7 +68,7 @@ See [Milestones Guide](chapters/milestones.md) for the full progression.
 
 ## Module Progression
 
-TinyTorch has 18 modules organized in three tiers:
+TinyTorch has 20 modules organized in three tiers:
 
 ### 🏗️ Foundation (Modules 01-07)
 Core ML infrastructure - tensors, autograd, training loops
@@ -85,8 +85,8 @@ Neural network architectures - data loading, CNNs, transformers
 - M04: CNNs (after Module 09)
 - M05: Transformers (after Module 13)
 
-### ⚡ Optimization (Modules 14-18)
-Production optimization - profiling, quantization, acceleration
+### ⚡ Optimization (Modules 14-20)
+Production optimization - profiling, quantization, benchmarking, capstone
 
 **Milestones unlocked:**
 - M06: MLPerf (after Module 18)
```

**classroom-use.md**

```diff
@@ -28,7 +28,7 @@
 <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 1rem;">
 <div>
 <ul style="margin: 0; padding-left: 1rem;">
-<li><strong>Three-tier progression</strong> (18 modules) with NBGrader integration</li>
+<li><strong>Three-tier progression</strong> (20 modules) with NBGrader integration</li>
 <li><strong>Automated grading</strong> for immediate feedback</li>
 <li><strong>Professional CLI tools</strong> for development workflow</li>
 <li><strong>Real datasets</strong> (CIFAR-10, text generation)</li>
```