diff --git a/.claude/README.md b/.claude/README.md
new file mode 100644
index 00000000..92e8d3d5
--- /dev/null
+++ b/.claude/README.md
@@ -0,0 +1,138 @@
+# TinyTorch .claude Directory Structure
+
+This directory contains all guidelines, standards, and agent definitions for the TinyTorch project.
+
+## 📁 Directory Structure
+
+```
+.claude/
+├── README.md                    # This file
+├── guidelines/                  # Development standards and principles
+│   ├── DESIGN_PHILOSOPHY.md     # KISS principle and simplicity guidelines
+│   ├── GIT_WORKFLOW.md          # Git branching and commit standards
+│   ├── MODULE_DEVELOPMENT.md    # How to develop TinyTorch modules
+│   ├── TESTING_STANDARDS.md     # Testing patterns and requirements
+│   ├── PERFORMANCE_CLAIMS.md    # How to make honest performance claims
+│   └── AGENT_COORDINATION.md    # How AI agents work together
+├── agents/                      # AI agent definitions
+│   ├── technical-program-manager.md
+│   ├── education-architect.md
+│   ├── module-developer.md
+│   ├── package-manager.md
+│   ├── quality-assurance.md
+│   ├── documentation-publisher.md
+│   ├── workflow-coordinator.md
+│   ├── devops-engineer.md
+│   └── tito-cli-developer.md
+└── [legacy files to review]
+
+```
+
+## 🎯 Quick Start for New Development
+
+1. **Read Core Principles First**
+   - `guidelines/DESIGN_PHILOSOPHY.md` - Understand KISS principle
+   - `guidelines/GIT_WORKFLOW.md` - Learn branching requirements
+
+2. **For Module Development**
+   - `guidelines/MODULE_DEVELOPMENT.md` - Module structure and patterns
+   - `guidelines/TESTING_STANDARDS.md` - How to write tests
+   - `guidelines/PERFORMANCE_CLAIMS.md` - How to report results
+
+3. **For Agent Coordination**
+   - `guidelines/AGENT_COORDINATION.md` - How agents work together
+   - Start with Technical Program Manager (TPM) for all requests
+
+## 📋 Key Principles Summary
+
+### 1. 
Keep It Simple, Stupid (KISS) +- One file, one purpose +- Clear over clever +- Verified over theoretical +- Direct over abstract + +### 2. Git Workflow +- ALWAYS work on feature branches +- NEVER commit directly to main/dev +- Test before committing +- No automated attribution in commits + +### 3. Module Development +- Edit .py files only (never .ipynb) +- Test immediately after implementation +- Include systems analysis (memory, performance) +- Follow exact structure pattern + +### 4. Testing Standards +- Test immediately, not at the end +- Simple assertions over complex frameworks +- Tests should educate, not just verify +- Always compare against baseline + +### 5. Performance Claims +- Only claim what you've measured +- Include all relevant metrics +- Report failures honestly +- Reproducibility is key + +### 6. Agent Coordination +- TPM is primary interface +- Sequential workflow with clear handoffs +- QA testing is MANDATORY +- Package integration is MANDATORY + +## ๐Ÿš€ Common Workflows + +### Starting New Module Development +```bash +1. Create feature branch +2. Request TPM agent assistance +3. Follow MODULE_DEVELOPMENT.md structure +4. Test with TESTING_STANDARDS.md patterns +5. Verify performance per PERFORMANCE_CLAIMS.md +6. Merge following GIT_WORKFLOW.md +``` + +### Making Performance Claims +```bash +1. Run baseline measurements +2. Run actual measurements +3. Calculate real improvements +4. Document with all metrics +5. No unverified claims +``` + +### Working with Agents +```bash +1. Always start with TPM agent +2. Let TPM coordinate other agents +3. Wait for QA approval before proceeding +4. Wait for Package Manager integration +5. 
Only then commit +``` + +## ๐Ÿ“ Important Notes + +- **Virtual Environment**: Always activate .venv before development +- **Honesty**: Report actual results, not aspirations +- **Simplicity**: When in doubt, choose the simpler option +- **Education First**: We're teaching, not impressing + +## ๐Ÿ”— Quick Links + +- Main Instructions: `/CLAUDE.md` +- Module Source: `/modules/source/` +- Examples: `/examples/` +- Tests: `/tests/` + +## ๐Ÿ“Œ Remember + +> "If students can't understand it, we've failed." + +Every decision should be filtered through: +1. Is it simple? +2. Is it honest? +3. Is it educational? +4. Is it verified? + +If any answer is "no", reconsider. \ No newline at end of file diff --git a/.claude/guidelines/AGENT_COORDINATION.md b/.claude/guidelines/AGENT_COORDINATION.md new file mode 100644 index 00000000..94be6be3 --- /dev/null +++ b/.claude/guidelines/AGENT_COORDINATION.md @@ -0,0 +1,204 @@ +# TinyTorch Agent Coordination Guidelines + +## ๐ŸŽฏ Core Principle + +**Agents work in sequence with clear handoffs, not in isolation.** + +## ๐Ÿค– The Agent Team + +### Primary Interface: Technical Program Manager (TPM) + +The TPM is your SINGLE point of communication for all development. + +``` +User Request โ†’ TPM โ†’ Coordinates Agents โ†’ Reports Back +``` + +**The TPM knows when to invoke:** +- Education Architect - Learning design +- Module Developer - Implementation +- Package Manager - Integration +- Quality Assurance - Testing +- Documentation Publisher - Content +- Workflow Coordinator - Process +- DevOps Engineer - Infrastructure +- Tito CLI Developer - CLI features + +## ๐Ÿ“‹ Standard Development Workflow + +### The Sequential Pattern + +**For EVERY module development:** + +``` +1. Planning (Workflow Coordinator + Education Architect) + โ†“ +2. Implementation (Module Developer) + โ†“ +3. Testing (Quality Assurance) โ† MANDATORY + โ†“ +4. Integration (Package Manager) โ† MANDATORY + โ†“ +5. Documentation (Documentation Publisher) + โ†“ +6. 
Review (Workflow Coordinator) +``` + +### Critical Handoff Points + +**Module Developer โ†’ QA Agent** +```python +# Module Developer completes implementation +"Implementation complete. Ready for QA testing. +Files modified: 02_tensor_dev.py +Key changes: Added reshape operation with broadcasting" + +# QA MUST test before proceeding +``` + +**QA Agent โ†’ Package Manager** +```python +# QA completes testing +"All tests passed. +- Module imports correctly +- All functions work as expected +- Performance benchmarks met +Ready for package integration" + +# Package Manager MUST verify integration +``` + +## ๐Ÿšซ Blocking Rules + +### QA Agent Can Block Progress + +**If tests fail, STOP everything:** +- No commits allowed +- No integration permitted +- Must fix and re-test + +### Package Manager Can Block Release + +**If integration fails:** +- Module doesn't export correctly +- Breaks other modules +- Package won't build + +## ๐Ÿ“ Agent Communication Protocol + +### Structured Handoffs + +Every handoff must include: +1. **What was completed** +2. **What needs to be done next** +3. **Any issues found** +4. **Test results (if applicable)** +5. 
**Recommendations** + +**Example:** +``` +From: Module Developer +To: QA Agent + +Completed: +- Implemented attention mechanism in 07_attention_dev.py +- Added scaled dot-product attention +- Included positional encoding + +Needs Testing: +- Attention score computation +- Mask application +- Memory usage with large sequences + +Known Issues: +- Performance degrades with sequences >1000 tokens + +Recommendations: +- Focus testing on edge cases with padding +``` + +## ๐Ÿ”„ Parallel vs Sequential Work + +### Can Work in Parallel + +โœ… Different modules by different developers +โœ… Documentation while code is being tested +โœ… Planning next modules while current ones build + +### Must Be Sequential + +โŒ Implementation โ†’ Testing (MUST test after implementation) +โŒ Testing โ†’ Integration (MUST pass tests first) +โŒ Integration โ†’ Commit (MUST integrate successfully) + +## ๐ŸŽฏ The Checkpoint Success Story + +**How agents successfully implemented the 16-checkpoint system:** + +1. **Education Architect** designed capability progression +2. **Workflow Coordinator** orchestrated implementation +3. **Module Developer** built checkpoint tests + CLI +4. **QA Agent** validated all 16 checkpoints work +5. **Package Manager** ensured integration with modules +6. 
**Documentation Publisher** updated all docs + +**Result:** Complete working system with proper handoffs + +## โš ๏ธ Common Coordination Failures + +### Working in Isolation +โŒ Module Developer implements without QA testing +โŒ Documentation written before code works +โŒ Integration attempted before tests pass + +### Skipping Handoffs +โŒ Direct commit without QA approval +โŒ Missing Package Manager validation +โŒ No Workflow Coordinator review + +### Poor Communication +โŒ "It's done" (no details) +โŒ No test results provided +โŒ Issues discovered but not reported + +## ๐Ÿ“‹ Agent Checklist + +### Before Module Developer Starts +- [ ] Education Architect defined learning objectives +- [ ] Workflow Coordinator approved plan +- [ ] Clear specifications provided + +### Before QA Testing +- [ ] Module Developer completed ALL implementation +- [ ] Code follows standards +- [ ] Basic self-testing done + +### Before Package Integration +- [ ] QA Agent ran comprehensive tests +- [ ] All tests PASSED +- [ ] Performance acceptable + +### Before Commit +- [ ] Package Manager verified integration +- [ ] Documentation complete +- [ ] Workflow Coordinator approved + +## ๐Ÿ”ง Conflict Resolution + +**If agents disagree:** + +1. **QA has veto on quality** - If tests fail, stop +2. **Education Architect owns learning objectives** +3. **Workflow Coordinator resolves other disputes** +4. **User has final override** + +## ๐Ÿ“Œ Remember + +> Agents amplify capabilities when coordinated, create chaos when isolated. 
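The handoff fields this protocol requires (what was completed, what needs attention, known issues, recommendations) can also be sketched as a small record type. This is a hypothetical illustration only; `Handoff` and its fields are not part of any existing agent tooling:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Handoff:
    """Hypothetical handoff record mirroring the protocol fields above."""
    sender: str
    receiver: str
    completed: List[str]
    needs_attention: List[str] = field(default_factory=list)
    known_issues: List[str] = field(default_factory=list)
    recommendations: List[str] = field(default_factory=list)

    def render(self) -> str:
        # Produce the structured message an agent would post at handoff
        lines = [f"From: {self.sender}", f"To: {self.receiver}", "", "Completed:"]
        lines += [f"- {item}" for item in self.completed]
        for title, items in [("Needs Testing:", self.needs_attention),
                             ("Known Issues:", self.known_issues),
                             ("Recommendations:", self.recommendations)]:
            if items:  # omit empty sections rather than print blank headers
                lines += ["", title] + [f"- {item}" for item in items]
        return "\n".join(lines)

msg = Handoff(
    sender="Module Developer",
    receiver="QA Agent",
    completed=["Implemented attention mechanism in 07_attention_dev.py"],
    known_issues=["Performance degrades with sequences >1000 tokens"],
).render()
print(msg)
```

A record like this makes it hard to silently skip a handoff field, which is exactly the coordination failure the checklists above guard against.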
+ +**Key Success Factors:** +- Clear handoffs between agents +- Mandatory testing and integration +- Structured communication +- Sequential workflow where needed +- Parallel work where possible \ No newline at end of file diff --git a/.claude/guidelines/DESIGN_PHILOSOPHY.md b/.claude/guidelines/DESIGN_PHILOSOPHY.md new file mode 100644 index 00000000..145b4504 --- /dev/null +++ b/.claude/guidelines/DESIGN_PHILOSOPHY.md @@ -0,0 +1,212 @@ +# TinyTorch Design Philosophy + +## ๐ŸŽฏ Core Principle: Keep It Simple, Stupid (KISS) + +**Simplicity is the soul of TinyTorch. We are building an educational framework where clarity beats cleverness every time.** + +## ๐Ÿ“š Why Simplicity Matters + +TinyTorch is for students learning ML systems engineering. If they can't understand it, we've failed our mission. Every design decision should prioritize: + +1. **Readability** over performance +2. **Clarity** over cleverness +3. **Directness** over abstraction +4. **Honesty** over aspiration + +## ๐Ÿš€ KISS Guidelines + +### Code Simplicity + +**โœ… DO:** +- Write code that reads like a textbook +- Use descriptive variable names (`gradient` not `g`) +- Implement one concept per file +- Show the direct path from input to output +- Keep functions short and focused + +**โŒ DON'T:** +- Use clever one-liners that require decoding +- Create unnecessary abstractions +- Optimize prematurely +- Hide complexity behind magic + +**Example:** +```python +# โœ… GOOD: Clear and direct +def forward(self, x): + h1 = self.relu(self.fc1(x)) + h2 = self.relu(self.fc2(h1)) + return self.fc3(h2) + +# โŒ BAD: Clever but unclear +def forward(self, x): + return reduce(lambda h, l: self.relu(l(h)) if l != self.layers[-1] else l(h), + self.layers, x) +``` + +### File Organization + +**โœ… DO:** +- One purpose per file +- Clear, descriptive filenames +- Minimal file count + +**โŒ DON'T:** +- Create multiple versions of the same thing +- Split related code unnecessarily +- Create deep directory hierarchies + 
+**Example:** +``` +โœ… GOOD: +examples/cifar10/ +โ”œโ”€โ”€ random_baseline.py # Shows untrained performance +โ”œโ”€โ”€ train.py # Training script +โ””โ”€โ”€ README.md # Simple documentation + +โŒ BAD: +examples/cifar10/ +โ”œโ”€โ”€ train_basic.py +โ”œโ”€โ”€ train_optimized.py +โ”œโ”€โ”€ train_advanced.py +โ”œโ”€โ”€ train_experimental.py +โ”œโ”€โ”€ train_with_ui.py +โ””โ”€โ”€ ... (20 more variations) +``` + +### Documentation Simplicity + +**โœ… DO:** +- State what it does clearly +- Give one good example +- Report verified results only +- Keep README files short + +**โŒ DON'T:** +- Write novels in docstrings +- Promise theoretical performance +- Add complex diagrams for simple concepts +- Create documentation that's longer than the code + +**Example:** +```python +# โœ… GOOD: Clear and concise +""" +Train a neural network on CIFAR-10 images. +Achieves 55% accuracy in 2 minutes. +""" + +# โŒ BAD: Over-documented +""" +This advanced training framework implements state-of-the-art optimization +techniques including adaptive learning rate scheduling, progressive data +augmentation, and sophisticated regularization strategies to push the +boundaries of what's possible with MLPs on CIFAR-10, potentially achieving +60-70% accuracy with proper hyperparameter tuning... 
+[continues for 500 more words] +""" +``` + +### Performance Claims + +**โœ… DO:** +- Report what you actually measured +- Include training time +- Be honest about limitations +- Compare against clear baselines + +**โŒ DON'T:** +- Claim unverified performance +- Hide negative results +- Exaggerate improvements +- Make theoretical claims + +**Example:** +```markdown +โœ… GOOD: +- Random baseline: 10% (measured) +- Trained model: 55% (measured) +- Training time: 2 minutes + +โŒ BAD: +- Can achieve 60-70% with optimization (unverified) +- State-of-the-art MLP performance (vague) +- Approaches CNN-level accuracy (misleading) +``` + +## ๐ŸŽ“ Educational Simplicity + +### Learning Progression + +**โœ… DO:** +- Build concepts incrementally +- Show before explaining +- Test immediately after implementing +- Keep examples minimal but complete + +**โŒ DON'T:** +- Jump to complex examples +- Hide important details +- Add unnecessary features +- Overwhelm with options + +### Error Messages + +**โœ… DO:** +- Make errors educational +- Suggest fixes +- Show what went wrong clearly + +**โŒ DON'T:** +- Hide errors +- Use cryptic messages +- Stack trace without context + +## ๐Ÿ” Decision Framework + +When making any design decision, ask: + +1. **Can a student understand this in 30 seconds?** + - If no โ†’ simplify + +2. **Is there a simpler way that still works?** + - If yes โ†’ use it + +3. **Does this add essential value?** + - If no โ†’ remove it + +4. 
**Would I want to debug this at 2 AM?**
+   - If no → rewrite it
+
+## 📝 Examples of KISS in Action
+
+### Recent CIFAR-10 Cleanup
+**Before:** 20+ experimental files with complex optimizations
+**After:** 2 files (random_baseline.py, train.py)
+**Result:** Clearer story, same educational value
+
+### Module Structure
+**Before:** Complex inheritance hierarchies
+**After:** Direct implementations students can trace
+**Result:** Students understand what's happening
+
+### Testing
+**Before:** Complex test frameworks
+**After:** Simple assertions after each implementation
+**Result:** Immediate feedback and understanding
+
+## 🚨 When Complexity is OK
+
+Sometimes complexity is necessary, but it must be:
+1. **Essential** to the learning objective
+2. **Well-documented** with clear explanations
+3. **Isolated** from simpler concepts
+4. **Justified** by significant educational value
+
+Example: Autograd is complex, but it's the core learning objective of that module.
+
+## 📌 Remember
+
+> "Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away." - Antoine de Saint-Exupéry
+
+**Every line of code, every file, every feature should justify its existence. 
When in doubt, leave it out.** \ No newline at end of file diff --git a/.claude/GIT_WORKFLOW_STANDARDS.md b/.claude/guidelines/GIT_WORKFLOW.md similarity index 100% rename from .claude/GIT_WORKFLOW_STANDARDS.md rename to .claude/guidelines/GIT_WORKFLOW.md diff --git a/.claude/guidelines/MODULE_DEVELOPMENT.md b/.claude/guidelines/MODULE_DEVELOPMENT.md new file mode 100644 index 00000000..16d6ca60 --- /dev/null +++ b/.claude/guidelines/MODULE_DEVELOPMENT.md @@ -0,0 +1,299 @@ +# TinyTorch Module Development Standards + +## ๐ŸŽฏ Core Principle + +**Modules teach ML systems engineering through building, not just ML algorithms through reading.** + +## ๐Ÿ“ File Structure + +### One Module = One .py File + +``` +modules/source/XX_modulename/ +โ”œโ”€โ”€ modulename_dev.py # The ONLY file you edit +โ”œโ”€โ”€ modulename_dev.ipynb # Auto-generated from .py (DO NOT EDIT) +โ””โ”€โ”€ README.md # Module overview +``` + +**Critical Rules:** +- โœ… ALWAYS edit `.py` files only +- โŒ NEVER edit `.ipynb` notebooks directly +- โœ… Use jupytext to sync .py โ†’ .ipynb + +## ๐Ÿ“š Module Structure Pattern + +Every module MUST follow this exact structure: + +```python +# %% [markdown] +""" +# Module XX: [Name] + +**Learning Objectives:** +- Build [component] from scratch +- Understand [systems concept] +- Analyze performance implications +""" + +# %% [markdown] +""" +## Part 1: Mathematical Foundations +[Theory and complexity analysis] +""" + +# %% [code] +# Implementation + +# %% [markdown] +""" +### Testing [Component] +Let's verify our implementation works correctly. +""" + +# %% [code] +# Immediate test + +# %% [markdown] +""" +## Part 2: Systems Analysis +### Memory Profiling +Let's understand the memory implications. +""" + +# %% [code] +# Memory profiling code + +# %% [markdown] +""" +## Part 3: Production Context +In real ML systems like PyTorch... +""" + +# ... continue pattern ... 
+ +# %% [code] +if __name__ == "__main__": + run_all_tests() + +# %% [markdown] +""" +## ๐Ÿค” ML Systems Thinking +[Interactive questions analyzing implementation] +""" + +# %% [markdown] +""" +## ๐ŸŽฏ Module Summary +[What was learned - ALWAYS LAST] +""" +``` + +## ๐Ÿงช Implementation โ†’ Test Pattern + +**MANDATORY**: Every implementation must be immediately followed by a test. + +```python +# โœ… CORRECT Pattern: + +# %% [markdown] +""" +## Building the Dense Layer +""" + +# %% [code] +class Dense: + def __init__(self, in_features, out_features): + self.weights = np.random.randn(in_features, out_features) * 0.1 + self.bias = np.zeros(out_features) + + def forward(self, x): + return x @ self.weights + self.bias + +# %% [markdown] +""" +### Testing Dense Layer +Let's verify our dense layer handles shapes correctly. +""" + +# %% [code] +def test_dense_layer(): + layer = Dense(10, 5) + x = np.random.randn(32, 10) # Batch of 32, 10 features + output = layer.forward(x) + assert output.shape == (32, 5), f"Expected (32, 5), got {output.shape}" + print("โœ… Dense layer forward pass works!") + +test_dense_layer() +``` + +## ๐Ÿ”ฌ ML Systems Focus + +### MANDATORY Systems Analysis Sections + +Every module MUST include: + +1. **Complexity Analysis** +```python +# %% [markdown] +""" +### Computational Complexity +- Matrix multiply: O(batch ร— in_features ร— out_features) +- Memory usage: O(in_features ร— out_features) for weights +- This becomes the bottleneck when... +""" +``` + +2. **Memory Profiling** +```python +# %% [code] +def profile_memory(): + import tracemalloc + tracemalloc.start() + + layer = Dense(1000, 1000) + x = np.random.randn(128, 1000) + output = layer.forward(x) + + current, peak = tracemalloc.get_traced_memory() + print(f"Peak memory: {peak / 1024 / 1024:.2f} MB") + print("This shows why large models need GPUs!") +``` + +3. 
**Production Context** +```python +# %% [markdown] +""" +### In Production Systems +PyTorch's nn.Linear does the same thing but with: +- GPU acceleration via CUDA kernels +- Automatic differentiation support +- Optimized BLAS operations +- Memory pooling for efficiency +""" +``` + +## ๐Ÿ“ NBGrader Integration + +### Cell Metadata Structure + +```python +# %% [code] {"nbgrader": {"grade": false, "locked": false, "solution": true, "grade_id": "dense_implementation"}} +### BEGIN SOLUTION +class Dense: + # Full implementation for instructors + ... +### END SOLUTION + +### BEGIN HIDDEN TESTS +# Instructor-only tests +... +### END HIDDEN TESTS +``` + +### Critical NBGrader Rules + +1. **Every cell needs unique grade_id** +2. **Scaffolding stays OUTSIDE solution blocks** +3. **Hidden tests validate student work** +4. **Points should reflect complexity** + +## ๐ŸŽ“ Educational Patterns + +### The "Build โ†’ Measure โ†’ Understand" Pattern + +```python +# 1. BUILD +class LayerNorm: + def forward(self, x): + mean = np.mean(x, axis=-1, keepdims=True) + var = np.var(x, axis=-1, keepdims=True) + return (x - mean) / np.sqrt(var + 1e-5) + +# 2. MEASURE +def measure_performance(): + layer = LayerNorm() + x = np.random.randn(1000, 512) + + start = time.time() + for _ in range(100): + output = layer.forward(x) + elapsed = time.time() - start + + print(f"Time per forward pass: {elapsed/100*1000:.2f}ms") + print(f"Throughput: {100*1000*512/elapsed:.0f} tokens/sec") + +# 3. UNDERSTAND +""" +With 512 dimensions, normalization adds ~2ms overhead. +This is why large models use fused kernels! 
+"""
+```
+
+### Progressive Complexity
+
+Start simple, build up:
+
+```python
+# Step 1: Simplest possible version
+def relu_v1(x):
+    return np.maximum(0, x)
+
+# Step 2: Add complexity. Return a gradient function alongside the
+# output (a plain NumPy array can't hold a grad_fn attribute)
+def relu_v2(x):
+    output = np.maximum(0, x)
+    grad_fn = lambda grad: grad * (x > 0)
+    return output, grad_fn
+
+# Step 3: Production version
+class ReLU:
+    def forward(self, x):
+        self.input = x  # Save for backward
+        return np.maximum(0, x)
+
+    def backward(self, grad):
+        return grad * (self.input > 0)
+```
+
+## ⚠️ Common Pitfalls
+
+1. **Too Much Theory**
+   - Students want to BUILD, not read
+   - Show through code, not exposition
+
+2. **Missing Systems Analysis**
+   - Not just algorithms, but engineering
+   - Always discuss memory and performance
+
+3. **Tests at the End**
+   - Loses educational flow
+   - Test immediately after implementation
+
+4. **No Production Context**
+   - Students need to see real-world relevance
+   - Compare with PyTorch/TensorFlow
+
+## 📌 Module Checklist
+
+Before considering a module complete:
+
+- [ ] All code in .py file (not notebook)
+- [ ] Follows exact structure pattern
+- [ ] Every implementation has immediate test
+- [ ] Includes memory profiling
+- [ ] Includes complexity analysis
+- [ ] Shows production context
+- [ ] NBGrader metadata correct
+- [ ] ML systems thinking questions
+- [ ] Summary is LAST section
+- [ ] Tests run when module executed
+
+## 🎯 Remember
+
+> We're teaching ML systems engineering, not just ML algorithms. 
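One way to verify the backward rule from the ReLU progression above is a quick finite-difference check. This is a standard-library-only sketch, and it deliberately avoids x = 0, where ReLU has a kink and no derivative:

```python
def relu(x: float) -> float:
    return max(0.0, x)

def relu_grad(x: float) -> float:
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise
    return 1.0 if x > 0 else 0.0

def finite_diff(f, x: float, eps: float = 1e-6) -> float:
    # Central-difference approximation of f'(x)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Check the analytic gradient against the numerical estimate,
# staying away from x = 0 where ReLU is not differentiable
for x in [-2.0, -0.5, 0.3, 1.7]:
    assert abs(relu_grad(x) - finite_diff(relu, x)) < 1e-4
print("✅ ReLU gradient matches finite differences")
```

The same pattern (analytic backward vs. finite differences) generalizes to any layer a module implements, and fits the "test immediately" rule.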
+ +Every module should help students understand: +- How to BUILD ML systems +- Why performance matters +- Where bottlenecks occur +- How production systems work \ No newline at end of file diff --git a/.claude/guidelines/PERFORMANCE_CLAIMS.md b/.claude/guidelines/PERFORMANCE_CLAIMS.md new file mode 100644 index 00000000..763c8a87 --- /dev/null +++ b/.claude/guidelines/PERFORMANCE_CLAIMS.md @@ -0,0 +1,245 @@ +# TinyTorch Performance Claims Guidelines + +## ๐ŸŽฏ Core Principle + +**Only claim what you have measured and verified. Honesty builds trust.** + +## โœ… Verified Performance Standards + +### The Three-Step Verification + +1. **Measure Baseline** +```python +# Random/untrained performance +random_model = create_untrained_model() +baseline_accuracy = evaluate(random_model, test_data) +print(f"Baseline: {baseline_accuracy:.1%}") # Measured: 10% +``` + +2. **Measure Actual Performance** +```python +# Trained model performance +trained_model = train_model(epochs=15) +actual_accuracy = evaluate(trained_model, test_data) +print(f"Actual: {actual_accuracy:.1%}") # Measured: 55% +``` + +3. 
**Calculate Real Improvement** +```python +improvement = actual_accuracy / baseline_accuracy +print(f"Improvement: {improvement:.1f}ร—") # Measured: 5.5ร— +``` + +### Reporting Requirements + +**ALWAYS include:** +- Exact accuracy percentage +- Training time +- Hardware used +- Number of epochs +- Dataset size + +**Example:** +```markdown +โœ… GOOD: +- Accuracy: 55% on CIFAR-10 test set +- Training time: 2 minutes on M1 MacBook +- Epochs: 15 +- Batch size: 64 + +โŒ BAD: +- "State-of-the-art performance" +- "Can achieve 60-70% with optimization" +- "Approaches CNN-level accuracy" +``` + +## ๐Ÿ“Š The CIFAR-10 Lesson + +### What We Claimed vs Reality + +**Initial Claims (unverified):** +- "60-70% accuracy achievable with optimization" +- "Advanced techniques push beyond baseline" +- "Sophisticated MLPs rival simple CNNs" + +**Actual Results (verified):** +- Baseline: 51-55% consistently +- With optimization attempts: Still ~55% +- Deep networks: Too slow, no improvement +- **Honest conclusion: MLPs achieve 55% reliably** + +### The Right Response + +When results don't match expectations: + +โœ… **CORRECT Approach:** +- Test thoroughly +- Report actual results +- Update documentation +- Explain limitations + +โŒ **WRONG Approach:** +- Keep unverified claims +- Hide negative results +- Blame implementation +- Make excuses + +## ๐Ÿ”ฌ Performance Testing Protocol + +### Minimum Testing Requirements + +```python +def verify_performance_claim(): + """ + Every performance claim must pass this verification. 
+    """
+    results = []
+
+    # Run multiple trials
+    for trial in range(3):
+        model = create_model()
+        accuracy = train_and_evaluate(model)
+        results.append(accuracy)
+
+    mean_acc = np.mean(results)
+    std_acc = np.std(results)
+
+    # Report with confidence intervals
+    print(f"Performance: {mean_acc:.1%} ± {std_acc:.1%}")
+
+    # Only claim if consistent
+    if std_acc > 0.02:  # >2% variance
+        print("⚠️ High variance - need more testing")
+        return False
+
+    return True
+```
+
+### Time Complexity Reporting
+
+```python
+# ✅ GOOD: Measured complexity
+import time
+
+def measure_scalability():
+    sizes = [100, 1000, 10000]
+    times = []
+
+    for size in sizes:
+        data = create_data(size)
+        start = time.time()
+        process(data)
+        times.append(time.time() - start)
+
+    # Analyze scaling (name the loop variable `elapsed`,
+    # not `time`, to avoid shadowing the time module)
+    print("Scaling behavior:")
+    for size, elapsed in zip(sizes, times):
+        print(f"  n={size}: {elapsed:.2f}s")
+
+    # Determine complexity
+    if times[2] / times[1] > 90:  # 10x data → ~100x time suggests O(n²)
+        print("Complexity: O(n²)")
+
+# ❌ BAD: Theoretical claims
+def theoretical_complexity():
+    print("Should be O(n log n)")  # Not measured
+```
+
+## 📝 Documentation Standards
+
+### Performance Tables
+
+```markdown
+✅ GOOD Table:
+
+| Model | Dataset | Accuracy | Time | Hardware |
+|-------|---------|----------|------|----------|
+| MLP-4-layer | CIFAR-10 | 55% | 2 min | M1 CPU |
+| Random baseline | CIFAR-10 | 10% | 0 sec | N/A |
+| MLP-4-layer | MNIST | 98% | 30 sec | M1 CPU |
+
+❌ BAD Table:
+
+| Model | Performance |
+|-------|------------|
+| Our MLP | State-of-the-art |
+| With optimization | Up to 70% |
+| Best case | Rivals CNNs |
+```
+
+### Comparison Claims
+
+```markdown
+✅ GOOD Comparisons:
+- "5.5× better than random baseline (10% → 55%)"
+- "Matches typical educational MLP benchmarks"
+- "20% below simple CNN performance"
+
+❌ BAD Comparisons:
+- "Competitive with modern architectures"
+- "Approaching state-of-the-art"
+- "Best-in-class for educational frameworks"
+```
+
+## ⚠️ Red Flags to 
Avoid + +### Weasel Words +- "Can achieve..." (but didn't) +- "Up to..." (theoretical maximum) +- "Potentially..." (unverified) +- "Should be able to..." (untested) +- "With proper tuning..." (hand-waving) + +### Unverified Optimizations +- "With these 10 techniques..." (didn't implement) +- "Research shows..." (not our research) +- "In theory..." (not in practice) +- "Could reach..." (but didn't) + +### Vague Metrics +- "Good performance" +- "Impressive results" +- "Significant improvement" +- "Fast training" + +## ๐ŸŽฏ The Integrity Test + +Before making any performance claim, ask: + +1. **Did I measure this myself?** + - If no โ†’ Don't claim it + +2. **Can someone reproduce this?** + - If no โ†’ Don't publish it + +3. **Is this the typical case?** + - If no โ†’ Note it's exceptional + +4. **Would I bet money on this?** + - If no โ†’ Reconsider the claim + +## ๐Ÿ“Œ Remember + +> "It's better to under-promise and over-deliver than the opposite." + +**Trust is earned through:** +- Honest reporting +- Reproducible results +- Clear limitations +- Verified claims + +**Trust is lost through:** +- Exaggerated claims +- Unverified results +- Hidden failures +- Theoretical promises + +## ๐Ÿ† Good Examples from TinyTorch + +### CIFAR-10 Cleanup +**Before:** "60-70% achievable with optimization" +**After:** "55% verified performance" +**Result:** Honest, trustworthy documentation + +### XOR Network +**Claim:** "100% accuracy on XOR" +**Verified:** Yes, consistently achieves 100% +**Result:** Credible claim that builds trust \ No newline at end of file diff --git a/.claude/guidelines/TESTING_STANDARDS.md b/.claude/guidelines/TESTING_STANDARDS.md new file mode 100644 index 00000000..5193cf3d --- /dev/null +++ b/.claude/guidelines/TESTING_STANDARDS.md @@ -0,0 +1,228 @@ +# TinyTorch Testing Standards + +## ๐ŸŽฏ Core Testing Philosophy + +**Test immediately, test simply, test educationally.** + +Testing in TinyTorch serves two purposes: +1. 
**Verification**: Ensure the code works +2. **Education**: Help students understand what they built + +## ๐Ÿ“‹ Testing Patterns + +### The Immediate Testing Pattern + +**MANDATORY**: Test immediately after each implementation, not at the end. + +```python +# โœ… CORRECT: Implementation followed by immediate test +class Tensor: + def __init__(self, data): + self.data = data + +# Test Tensor creation immediately +def test_tensor_creation(): + t = Tensor([1, 2, 3]) + assert t.data == [1, 2, 3], "Tensor should store data" + print("โœ… Tensor creation works") + +test_tensor_creation() + +# โŒ WRONG: All tests grouped at the end +# [100 lines of implementations] +# [Then all tests at the bottom] +``` + +### Simple Assertion Testing + +**Use simple assertions, not complex frameworks.** + +```python +# โœ… GOOD: Simple and clear +def test_forward_pass(): + model = SimpleMLP() + x = Tensor(np.random.randn(32, 784)) + output = model.forward(x) + assert output.shape == (32, 10), f"Expected (32, 10), got {output.shape}" + print("โœ… Forward pass shapes correct") + +# โŒ BAD: Over-engineered +class TestMLPForwardPass(unittest.TestCase): + def setUp(self): + self.model = SimpleMLP() + + def test_forward_pass_shape_validation_with_mock_data(self): + # ... 
50 lines of test setup +``` + +### Educational Test Messages + +**Tests should teach, not just verify.** + +```python +# โœ… GOOD: Educational +def test_backpropagation(): + # Create simple network: 2 inputs โ†’ 2 hidden โ†’ 1 output + net = TwoLayerNet(2, 2, 1) + + # Forward pass with XOR data + x = Tensor([[0, 0], [0, 1], [1, 0], [1, 1]]) + y = Tensor([[0], [1], [1], [0]]) + + output = net.forward(x) + loss = mse_loss(output, y) + + print(f"Initial loss: {loss.data:.4f}") + print("This high loss shows the network hasn't learned XOR yet") + + # Backward pass + loss.backward() + + # Check gradients exist + assert net.w1.grad is not None, "Gradients should be computed" + print("โœ… Backpropagation computed gradients") + print("The network can now learn from its mistakes!") + +# โŒ BAD: Just verification +def test_backprop(): + net = TwoLayerNet(2, 2, 1) + # ... minimal test + assert net.w1.grad is not None + # No educational value +``` + +## ๐Ÿงช Performance Testing + +### Baseline Comparisons + +**Always test against a clear baseline.** + +```python +def test_model_performance(): + # 1. Test random baseline + random_model = create_random_network() + random_acc = evaluate(random_model, test_data) + print(f"Random network accuracy: {random_acc:.1%}") + + # 2. Test trained model + trained_model = load_trained_model() + trained_acc = evaluate(trained_model, test_data) + print(f"Trained network accuracy: {trained_acc:.1%}") + + # 3. 
Show improvement + improvement = trained_acc / random_acc + print(f"Improvement: {improvement:.1f}ร— better than random") + + assert trained_acc > random_acc * 2, "Should be at least 2ร— better than random" +``` + +### Honest Performance Reporting + +```python +# โœ… GOOD: Report actual measurements +def test_training_performance(): + start_time = time.time() + accuracy = train_model(epochs=10) + train_time = time.time() - start_time + + print(f"Achieved accuracy: {accuracy:.1%}") + print(f"Training time: {train_time:.1f} seconds") + print(f"Status: {'โœ… PASS' if accuracy > 0.5 else 'โŒ FAIL'}") + +# โŒ BAD: Theoretical claims +def test_training(): + # ... training code + print("Can achieve 60-70% with proper tuning") # Unverified claim +``` + +## ๐Ÿ” Test Organization + +### Test Placement + +```python +# Module structure with immediate tests +# module_name.py + +# Part 1: Core implementation +class Tensor: + ... + +# Immediate test +test_tensor_creation() + +# Part 2: Operations +def add(a, b): + ... + +# Immediate test +test_addition() + +# Part 3: Advanced features +def backward(): + ... + +# Immediate test +test_backward() + +# At the end: Run all tests when executed directly +if __name__ == "__main__": + print("Running all tests...") + test_tensor_creation() + test_addition() + test_backward() + print("โœ… All tests passed!") +``` + +## โš ๏ธ Common Testing Mistakes + +1. **Grouping all tests at the end** + - Loses educational flow + - Students don't see immediate verification + +2. **Over-complicated test frameworks** + - Obscures what's being tested + - Adds unnecessary complexity + +3. **Testing without teaching** + - Missing opportunity to reinforce concepts + - No educational value + +4. **Unverified performance claims** + - Damages credibility + - Misleads students + +## ๐Ÿ“ Test Documentation + +```python +def test_attention_mechanism(): + """ + Test that attention correctly weighs different positions. 
+
+    This test demonstrates the key insight of attention:
+    the model learns what to focus on.
+    """
+    # Create simple sequence
+    sequence = Tensor([[1, 0, 0],   # Position 0: important
+                       [0, 0, 0],   # Position 1: padding
+                       [0, 0, 1]])  # Position 2: important
+
+    attention_weights = compute_attention(sequence)
+
+    # Check that important positions get more weight
+    assert attention_weights[0] > attention_weights[1]
+    assert attention_weights[2] > attention_weights[1]
+
+    print("✅ Attention focuses on important positions")
+    print(f"Weights: {attention_weights}")
+    print("Notice how padding (position 1) gets less attention")
+```
+
+## 🎯 Remember
+
+> Tests are teaching tools, not just verification tools.
+
+Every test should help a student understand:
+- What the code does
+- Why it matters
+- How to verify it works
+- What success looks like
\ No newline at end of file
diff --git a/CLAUDE.md b/CLAUDE.md
index 127663e7..eed46426 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,7 +1,20 @@
 # Claude Code Instructions for TinyTorch
 
-## ⚡ **MANDATORY: Read Git Policies First**
-**Before any development work, you MUST read and follow the Git Workflow Standards section below.**
+## 📚 **MANDATORY: Read Guidelines First**
+
+**All development standards are documented in the `.claude/` directory.**
+
+### Required Reading Order:
+1. `.claude/guidelines/DESIGN_PHILOSOPHY.md` - KISS principle and core values
+2. `.claude/guidelines/GIT_WORKFLOW.md` - Git policies and branching standards
+3. `.claude/guidelines/MODULE_DEVELOPMENT.md` - How to build modules
+4. `.claude/guidelines/TESTING_STANDARDS.md` - Testing requirements
+5. `.claude/guidelines/PERFORMANCE_CLAIMS.md` - Honest reporting standards
+6. `.claude/guidelines/AGENT_COORDINATION.md` - How to work with AI agents
+
+**Start with `.claude/README.md` for a complete overview.**
+
+## ⚡ **CRITICAL: Core Policies**
 
 **CRITICAL POLICIES - NO EXCEPTIONS:**
 - ✅ Always use virtual environment (`.venv`)
@@ -15,28 +28,6 @@
 
 ---
 
-## 💡 **CORE PRINCIPLE: Keep It Simple, Stupid (KISS)**
-
-**Simplicity is a fundamental principle of TinyTorch. Always prefer simple, clear solutions over complex ones.**
-
-**KISS Guidelines:**
-- **One file, one purpose** - Don't create multiple versions doing the same thing
-- **Clear over clever** - Code should be readable by students learning ML
-- **Minimal dependencies** - Avoid unnecessary libraries or complex UI
-- **Direct implementation** - Show the core concepts without abstraction layers
-- **Honest performance** - Report what actually works, not theoretical possibilities
-
-**Examples:**
-- ✅ `random_baseline.py` and `train.py` - two files, clear story
-- ❌ Multiple optimization scripts with unverified claims
-- ✅ Simple console output showing progress
-- ❌ Complex dashboards with ASCII plots that don't add educational value
-- ✅ "Achieves 55% accuracy" (verified)
-- ❌ "Can achieve 60-70% with optimization" (unverified)
-
-**When in doubt, choose the simpler option. If students can't understand it, we've failed.**
-
----
 
 ## 🚨 **CRITICAL: Think First, Don't Just Agree**
 
diff --git a/CLAUDE_SIMPLE.md b/CLAUDE_SIMPLE.md
new file mode 100644
index 00000000..cfc08068
--- /dev/null
+++ b/CLAUDE_SIMPLE.md
@@ -0,0 +1,81 @@
+# Claude Code Instructions for TinyTorch
+
+## 📚 **START HERE: Read the Guidelines**
+
+All development standards, principles, and workflows are documented in the `.claude/` directory.
+
+### Quick Start
+```bash
+# First, read the overview
+cat .claude/README.md
+
+# Then read core guidelines in order:
+cat .claude/guidelines/DESIGN_PHILOSOPHY.md   # KISS principle
+cat .claude/guidelines/GIT_WORKFLOW.md        # Git standards
+cat .claude/guidelines/MODULE_DEVELOPMENT.md  # Building modules
+cat .claude/guidelines/TESTING_STANDARDS.md   # Testing patterns
+```
+
+## 🎯 Core Mission
+
+**Build an educational ML framework where students learn ML systems engineering by implementing everything from scratch.**
+
+Key principles:
+- **KISS**: Keep It Simple, Stupid
+- **Build to Learn**: Implementation teaches more than reading
+- **Systems Focus**: Not just algorithms, but engineering
+- **Honest Claims**: Only report verified performance
+
+## ⚡ Critical Policies
+
+1. **ALWAYS use virtual environment** (`.venv`)
+2. **ALWAYS work on feature branches** (never main/dev directly)
+3. **ALWAYS test before committing**
+4. **NEVER add automated attribution** to commits
+5. **NEVER edit .ipynb files directly** (edit .py only)
+
+## 🤖 Working with AI Agents
+
+**Always start with the Technical Program Manager (TPM)**:
+- TPM coordinates all other agents
+- Don't invoke agents directly
+- Follow the workflow in `.claude/guidelines/AGENT_COORDINATION.md`
+
+## 📁 Key Directories
+
+```
+.claude/guidelines/   # All development standards
+.claude/agents/       # AI agent definitions
+modules/source/       # Module implementations (.py files)
+examples/             # Working examples (keep simple)
+tests/                # Test suites
+```
+
+## 🚨 Think Critically
+
+**Don't just agree with suggestions. Always:**
+1. Evaluate if it makes pedagogical sense
+2. Check if there's a simpler way
+3. Verify it actually works
+4. Consider student perspective
+
+## 📋 Before Any Work
+
+1. **Read guidelines**: Start with `.claude/README.md`
+2. **Create branch**: Follow `.claude/guidelines/GIT_WORKFLOW.md`
+3. **Activate venv**: `source .venv/bin/activate`
+4. **Use TPM agent**: For coordinated development
+
+## 🎓 Remember
+
+> "If students can't understand it, we've failed."
+
+Every decision should be:
+- Simple
+- Verified
+- Educational
+- Honest
+
+---
+
+**For detailed instructions on any topic, see the appropriate file in `.claude/guidelines/`**
\ No newline at end of file
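
As a closing sketch of the "simple, verified, educational, honest" test style these guidelines call for — note that `relu` here is a hypothetical stand-in, not part of the TinyTorch API:

```python
# A minimal educational test: plain asserts, plus prints that tell the
# student what the result means. No framework, runnable top to bottom.

def relu(x):
    # Hypothetical stand-in for a TinyTorch activation:
    # clamp negative values to zero, pass positives through.
    return max(0.0, x)

def test_relu():
    assert relu(-3.0) == 0.0, "Negative inputs should clamp to zero"
    assert relu(2.5) == 2.5, "Positive inputs should pass through unchanged"
    print("✅ ReLU clamps negatives and preserves positives")

test_relu()
```

Because the whole test is a handful of bare assertions, a student can read it, run it, and see exactly what success looks like — which is the point of every test in this project.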