🛡️ **CRITICAL FIXES & PROTECTION SYSTEM** **Core Variable/Tensor Compatibility Fixes:** - Fix bias shape corruption in Adam optimizer (CIFAR-10 blocker) - Add Variable/Tensor compatibility to matmul, ReLU, Softmax, MSE Loss - Enable proper autograd support with gradient functions - Resolve broadcasting errors with variable batch sizes **Student Protection System:** - Industry-standard file protection (read-only core files) - Enhanced auto-generated warnings with prominent ASCII-art headers - Git integration (pre-commit hooks, .gitattributes) - VSCode editor protection and warnings - Runtime validation system with import hooks - Automatic protection during module exports **CLI Integration:** - New `tito system protect` command group - Protection status, validation, and health checks - Automatic protection enabled during `tito module complete` - Non-blocking validation with helpful error messages **Development Workflow:** - Updated CLAUDE.md with protection guidelines - Comprehensive validation scripts and health checks - Clean separation of source vs compiled file editing - Professional development practices enforcement **Impact:** ✅ CIFAR-10 training now works reliably with variable batch sizes ✅ Students protected from accidentally breaking core functionality ✅ Professional development workflow with industry-standard practices ✅ Comprehensive testing and validation infrastructure This enables reliable ML systems training while protecting students from common mistakes that break the Variable/Tensor compatibility.
41 KiB
Claude Code Instructions for TinyTorch
⚡ MANDATORY: Read Git Policies First
Before any development work, you MUST read and follow the Git Workflow Standards section below.
CRITICAL POLICIES - NO EXCEPTIONS:
- ✅ Always use virtual environment (
.venv) - ✅ Always work on feature branches (never
dev/maindirectly) - ✅ Always test before committing
- 🚨 NEVER add Co-Authored-By or automated attribution to commits
- 🚨 NEVER add "Generated with Claude Code" to commits
- 🚨 Only project owner adds Co-Authored-By when explicitly needed
These policies ensure clean history, professional development practices, and project integrity.
🚨 CRITICAL: Think First, Don't Just Agree
YOU MUST CRITICALLY EVALUATE EVERY SUGGESTION - DO NOT DEFAULT TO AGREEMENT
When I suggest something, your FIRST response should be to:
- STOP and think critically - Is this actually a good idea?
- Identify potential problems - What could go wrong? What's confusing?
- Consider alternatives - Is there a better approach?
- Then respond with honest analysis - "Here's what works and what doesn't..."
Example of GOOD critical thinking (like you just did):
"Let me think critically about this naming suggestion...
The problem with lenet/alexnet/chatgpt is:
1. LeNet is misleading - we're using MLP not CNN
2. Students aren't actually building those architectures
3. This could confuse students about what they're implementing"
Example of BAD agreeable behavior:
"Great idea! Let me rename everything to lenet/alexnet/chatgpt right away!"
[Proceeds without questioning if this makes sense]
🎯 Primary Mission: Pedagogical Excellence
Your Role as Educational Design Partner
You are my co-designer in creating the best possible ML Systems educational framework. Your job is to:
- Give Candid Feedback - If something doesn't make pedagogical sense, say so directly
- Challenge Decisions - Don't just agree. If my approach has flaws, explain why
- Propose Alternatives - When you disagree, offer better solutions with reasoning
- Think Like a Student - Always ask "Would this flow make sense to someone learning?"
- Think Like an Educator - Consider cognitive load, prerequisite knowledge, and skill building
- Think Like an Engineer - Ensure what we teach reflects real ML systems challenges
Design Principles We're Optimizing For
Pedagogical Goals:
- Clarity: Each module should have a clear, singular learning objective
- Progression: Skills must build logically - no forward references
- Motivation: Students should understand WHY they're learning each component
- Synthesis: Everything should culminate in building TinyGPT as proof of mastery
- Systems Focus: Emphasize memory, compute, scaling - not just algorithms
What You Should Question:
- Module ordering - Does the sequence make sense?
- Cognitive load - Are we teaching too much at once?
- Motivation - Will students understand why this matters?
- Prerequisites - Do they have the knowledge needed?
- Practical value - Does this reflect real ML engineering?
How to Disagree Effectively:
"I see why you want X, but consider this issue: [specific problem].
Here's an alternative: [concrete proposal].
This would be better because: [pedagogical reasoning]."
Remember: We're building this together. Your candid feedback makes it better. Don't hold back concerns in the name of being agreeable. The best educational product comes from honest collaboration.
🔍 Critical Evaluation Checklist
Before implementing ANY suggestion, ask yourself:
- Does this make pedagogical sense? Will students understand why?
- Is the naming clear and accurate? No misleading terminology
- Does this match what we're actually building? Don't pretend it's something it's not
- Will this confuse or clarify? Always choose clarity
- Is there a simpler/better way? Propose alternatives
Your job is to be the critical voice that ensures quality, not a yes-person who implements without thinking.
Git Workflow Standards
🌿 Always Use Feature Branches
NEVER work directly on dev or main
# Start any new work with a feature branch
git checkout dev
git pull origin dev
git checkout -b feature/your-feature-name
📝 Branch Naming Convention
Use descriptive branch names that indicate the type of work:
- Features:
feature/add-tito-view-command - Bug fixes:
fix/tensor-dtype-handling - Improvements:
improve/module-documentation - Experiments:
experiment/new-testing-approach
🔧 Development Workflow
- Activate virtual environment - ALWAYS use
.venvfor consistent dependencies - Create branch for each logical piece of work
- Make focused commits related to that branch only
- Test your changes before committing
- Merge to dev when feature is complete and tested
- Delete feature branch after successful merge
✅ Commit Standards - MANDATORY POLICIES
- One feature per branch - don't mix unrelated changes
- Test before committing - ensure functionality works
- Descriptive commit messages that explain the "why"
- Clean history - squash if needed before merging
🚨 CRITICAL: Commit Authorship Policy - READ EVERY TIME
NEVER add Co-Authored-By or any automated attribution to commits.
- Co-Authored-By: Only added by project owner when explicitly needed
- Generated with Claude Code: FORBIDDEN - do not add this line to commits
- Automated attribution: Forbidden - keep commits clean and professional
- Commit ownership: All commits should reflect actual authorship, not tool usage
- History integrity: Clean commit history is essential for project maintenance
This policy MUST be followed for every single commit. No exceptions.
🚫 What NOT to Do
- ❌ Work directly on
devormain - ❌ Mix unrelated changes in one branch
- ❌ Commit broken code
- ❌ Merge untested changes
- ❌ Leave stale feature branches
📋 Merge Checklist
Before merging any feature branch:
- Virtual environment activated and dependencies installed
- Code works correctly
- Tests pass (if applicable)
- Documentation updated (if needed)
- No conflicts with dev branch
- Feature is complete and ready for use
🔄 Example Workflow
# 0. ALWAYS start with virtual environment
python -m venv .venv
source .venv/bin/activate # On macOS/Linux
# OR: .venv\Scripts\activate # On Windows
pip install -r requirements.txt
# 1. Start new feature
git checkout dev
git pull origin dev
git checkout -b feature/add-module-validation
# 2. Work on feature, make commits
git add .
git commit -m "Add module validation logic"
git add .
git commit -m "Add validation tests"
# 3. When ready, merge to dev
git checkout dev
git pull origin dev
git merge feature/add-module-validation
# 4. Clean up
git branch -d feature/add-module-validation
# 5. Push updated dev
git push origin dev
🎯 Why This Matters
- Keeps history clean - easy to understand what changed when
- Enables collaboration - multiple people can work without conflicts
- Allows experimentation - try things without breaking main code
- Facilitates rollbacks - easy to undo specific features if needed
- Professional practice - industry standard for software development
🐍 Virtual Environment Standards - MANDATORY
🔐 ALWAYS Use Virtual Environments
NEVER work directly with system Python or globally installed packages.
# Create virtual environment (one time setup)
python -m venv .venv
# Activate virtual environment (EVERY session)
source .venv/bin/activate # macOS/Linux
# OR: .venv\Scripts\activate # Windows
# Install dependencies (after activation)
pip install -r requirements.txt
# Verify environment
which python # Should show .venv path
pip list # Should show only project dependencies
📋 Virtual Environment Checklist
Before ANY development work:
- Virtual environment activated (
source .venv/bin/activate) - Dependencies installed (
pip install -r requirements.txt) - Verification:
which pythonshows.venv/bin/python - Verification:
tito system doctorshows ✅ environment checks
🚫 What NOT to Do
- ❌ Use system Python for development
- ❌ Install packages globally with
sudo pip install - ❌ Work without activating
.venv - ❌ Mix dependencies from different environments
- ❌ Commit with virtual environment deactivated
🔧 Environment Troubleshooting
If you see dependency errors:
- Deactivate and recreate:
deactivate && rm -rf .venv && python -m venv .venv - Reactivate:
source .venv/bin/activate - Reinstall:
pip install -r requirements.txt - Verify:
tito system doctor
💡 Pro Tips
- Add to shell profile:
alias activate='source .venv/bin/activate' - Check activation: Your prompt should show
(.venv)prefix - Architecture issues: Use
python -m pip install --force-reinstallfor numpy/architecture conflicts
AI Agent Workflow Standards
🏢 Technical Program Manager (TPM) - Primary Communication Interface
The TPM is your single point of communication for all TinyTorch development.
The TPM agent is defined in .claude/agents/technical-program-manager.md and has complete knowledge of all other agents' capabilities.
TPM Usage:
# Primary communication pattern
User Request → TPM Agent → Coordinates Specialized Agents → Reports Back
The TPM knows when to invoke:
- Education Architect (.claude/agents/education-architect.md) - Learning design
- Module Developer (.claude/agents/module-developer.md) - Implementation
- Package Manager (.claude/agents/package-manager.md) - Integration
- Quality Assurance (.claude/agents/quality-assurance.md) - Testing
- Documentation Publisher (.claude/agents/documentation-publisher.md) - Content
- Workflow Coordinator (.claude/agents/workflow-coordinator.md) - Process management
- DevOps Engineer (.claude/agents/devops-engineer.md) - Infrastructure
- Tito CLI Developer (.claude/agents/tito-cli-developer.md) - CLI functionality
Agent Communication Protocol:
All agents are available via the Task tool. The TPM manages:
- Project planning and agent sequencing
- Quality gates and handoff criteria
- Timeline management and progress tracking
- Escalation when new agents are needed
- Integration across all deliverables
🤖 Agent Team Orchestration - Best Practices
The TPM manages multiple AI agents with structured coordination:
📊 Agent Team Structure
Workflow Coordinator (Team Lead)
├── Education Architect (Strategy)
├── Module Developer (Implementation)
├── Package Manager (Integration)
├── Quality Assurance (Validation)
└── Documentation Publisher (Communication)
🎯 Checkpoint System Implementation - Agent Workflow Case Study
SUCCESSFUL IMPLEMENTATION: The agent team successfully implemented a comprehensive 16-checkpoint capability assessment system with integration testing. Here's how the workflow functioned:
Phase 1: Strategic Planning (Education Architect + Workflow Coordinator)
- Education Architect: Designed capability-based learning progression (Foundation → Architecture → Training → Inference → Serving)
- Workflow Coordinator: Orchestrated agent coordination and defined implementation phases
- Result: 16-checkpoint structure aligned with 17 TinyTorch modules, each with clear capability statements
Phase 2: Implementation (Module Developer)
- Implemented checkpoint test suite: 16 individual test files (
checkpoint_00_environment.pythroughcheckpoint_15_capstone.py) - Built CLI integration: Complete
tito checkpointcommand system with Rich visualizations - Created module completion workflow:
tito module completewith automatic export and testing - Added integration testing: Post-module completion checkpoint validation
- MUST call QA Agent: Immediately after implementation completed
Phase 3: Quality Assurance (QA Agent) - MANDATORY
- Tested all 16 checkpoint implementations: Each test file executes correctly and validates capabilities
- Verified CLI integration: All
tito checkpointcommands work with Rich progress tracking - Validated module completion workflow:
tito module completecorrectly exports and tests checkpoints - Tested integration pipeline: Module-to-checkpoint mapping functions correctly
- Reported success to Package Manager: All tests passed, ready for integration
Phase 4: Package Integration (Package Manager) - MANDATORY
- Validated checkpoint test execution: All checkpoint files import and run correctly
- Verified CLI command registration:
tito checkpointcommands integrated into main CLI - Tested module-to-checkpoint mapping: Correct checkpoint triggered for each module completion
- Ensured complete package build: All checkpoint functionality available in built package
- Integration success confirmed: Complete system works end-to-end
Phase 5: Documentation (Documentation Publisher)
- Updated documentation: This CLAUDE.md file, checkpoint-system.md, README.md updates
- Documented agent workflow: How agents successfully coordinated implementation
- Created CLI documentation: Usage examples and command reference
- Explained integration testing: How checkpoint system validates student progress
Phase 6: Review and Approval (Workflow Coordinator)
- Verified all agents completed tasks: QA passed, Package Manager confirmed integration
- Confirmed capability delivery: 16-checkpoint system with CLI and integration testing
- Approved for commit: Complete implementation ready for production use
🚀 Implemented Checkpoint System Capabilities
The successful agent workflow delivered these concrete features:
16-Checkpoint Capability Assessment System
# Checkpoint progression with capability questions:
00: Environment - "Can I configure my TinyTorch development environment?"
01: Foundation - "Can I create and manipulate the building blocks of ML?"
02: Intelligence - "Can I add nonlinearity - the key to neural network intelligence?"
03: Components - "Can I build the fundamental building blocks of neural networks?"
04: Networks - "Can I build complete multi-layer neural networks?"
05: Learning - "Can I process spatial data like images with convolutional operations?"
06: Attention - "Can I build attention mechanisms for sequence understanding?"
07: Stability - "Can I stabilize training with normalization techniques?"
08: Differentiation - "Can I automatically compute gradients for learning?"
09: Optimization - "Can I optimize neural networks with sophisticated algorithms?"
10: Training - "Can I build complete training loops for end-to-end learning?"
11: Regularization - "Can I prevent overfitting and build robust models?"
12: Kernels - "Can I implement high-performance computational kernels?"
13: Benchmarking - "Can I analyze performance and identify bottlenecks in ML systems?"
14: Deployment - "Can I deploy and monitor ML systems in production?"
15: Capstone - "Can I build complete end-to-end ML systems from scratch?"
Rich CLI Progress Tracking
# Visual progress tracking with Rich library
tito checkpoint status # Current progress overview with capability statements
tito checkpoint status --detailed # Module-level detail with test file status
tito checkpoint timeline # Vertical tree view with connecting lines
tito checkpoint timeline --horizontal # Linear progress bar with Rich styling
tito checkpoint test 01 # Test specific checkpoint capabilities
tito checkpoint run 00 --verbose # Run checkpoint with detailed output
Module Completion Workflow with Integration Testing
# Automatic export and checkpoint testing
tito module complete 02_tensor # Exports module to package AND tests capabilities
tito module complete tensor # Works with short names too
tito module complete 02_tensor --skip-test # Skip checkpoint test if needed
# Workflow automatically:
# 1. Exports module to tinytorch package
# 2. Maps module to appropriate checkpoint (02_tensor → checkpoint_01_foundation)
# 3. Runs capability test with Rich progress tracking
# 4. Shows achievement celebration and next steps
Comprehensive Integration Testing
- Module-to-Checkpoint Mapping: Each module automatically triggers appropriate checkpoint test
- Capability Validation: Tests verify actual functionality works, not just code completion
- Progress Visualization: Rich CLI shows achievements and suggests next steps
- Immediate Feedback: Students get instant validation when capabilities are achieved
🔄 Standard Agent Workflow Pattern
For EVERY module update, follow this sequence:
-
Planning Phase (Workflow Coordinator + Education Architect)
- Define learning objectives
- Plan module structure
- Set implementation goals
-
Implementation Phase (Module Developer)
- Write code following specifications
- Add NBGrader metadata
- Create test scaffolding
- Add proper export directives (#| default_exp)
- MUST call QA Agent when done
-
Testing Phase (Quality Assurance) - MANDATORY
- Run comprehensive test suite
- Verify all functionality
- Report results to Package Manager
- Block progress if tests fail
-
Integration Phase (Package Manager) - MANDATORY
- Validate module exports correctly
- Check integration with other modules
- Run integration tests
- Ensure complete package works
- Block progress if integration fails
-
Documentation Phase (Documentation Publisher)
- Add explanatory markdown
- Create ML systems thinking questions
- Ensure clarity and consistency
-
Review Phase (Workflow Coordinator)
- Verify all agents completed their tasks
- Ensure QA tests passed
- Confirm package integration successful
- Approve for commit
🎯 Agent Communication Protocol
Agents MUST communicate through structured handoffs:
# Example workflow for module update:
workflow_coordinator.plan_update(module="tensor")
→ education_architect.design_learning_path()
→ module_developer.implement_code()
→ qa_agent.run_tests() # MANDATORY
→ documentation_publisher.add_documentation()
→ workflow_coordinator.review_and_approve()
⚡ Parallel vs Sequential Work
Parallel Tasks (can happen simultaneously):
- Multiple Module Developers working on different modules
- Documentation Publisher preparing content while code is tested
- Education Architect planning next modules
Sequential Tasks (must happen in order):
- Virtual Environment Setup → Implementation → Testing → Commit
- Planning → Implementation → Documentation
- Test Failure → Fix → Re-test → Proceed
🚨 Agent Accountability Rules
- Module Developer: Cannot mark task complete without QA approval
- QA Agent: Must test EVERY change, no exceptions
- Package Manager: Must validate integration, can block releases
- Workflow Coordinator: Cannot proceed without all agent sign-offs
- Documentation Publisher: Must verify code works before documenting
- Education Architect: Must validate learning objectives are met
📝 Agent Handoff Checklist
When passing work between agents, include:
- What was completed
- What needs to be done next
- Any issues or blockers found
- Test results (if applicable)
- Recommendations for next agent
🔧 Conflict Resolution
If agents disagree or find conflicts:
- QA Agent has veto power on code quality
- Education Architect has final say on learning objectives
- Workflow Coordinator resolves all other disputes
- User has ultimate override authority
🤖 Workflow Compliance
ALL AI agents MUST follow the Git Workflow Standards defined in /Users/VJ/GitHub/TinyTorch/CLAUDE.md.
Read the complete Git Workflow Standards section in this file for all branching, commit, and merge requirements.
TinyTorch Module Development Standards
🔬 CRITICAL: ML Systems Course - Not Just ML Algorithms
TinyTorch is an ML SYSTEMS course where you understand systems by building them. Every module MUST emphasize systems engineering principles, not just algorithms.
MANDATORY Systems Analysis in Every Module:
- Memory complexity: How much RAM does this operation use? When are copies made?
- Computational complexity: O(N), O(N²), O(N³) - measure and explain performance
- Cache efficiency: How do memory access patterns affect performance?
- Scaling bottlenecks: What breaks first when data/models get large?
- Production implications: How is this used in real ML systems like PyTorch?
- Hardware considerations: CPU vs GPU, vectorization, bandwidth limits
🎯 CRITICAL: .py Files Only
ALL TinyTorch development and modifications MUST be done in .py files ONLY.
- ✅ ALWAYS edit:
module_name_dev.pyfiles - ❌ NEVER edit:
.ipynbnotebook files - ✅ Notebooks are generated: from .py files using jupytext
- ❌ Direct notebook editing: breaks the development workflow
Why .py files only:
- Version control friendly (clean diffs, no notebook metadata noise)
- Consistent development environment across all contributors
- Automated notebook generation ensures consistency
- Professional development practices
📚 Module Structure Requirements - ML SYSTEMS FOCUS
All TinyTorch modules MUST follow the standardized structure with MANDATORY systems analysis:
- Module Introduction - What we're building and why (systems context)
- Mathematical Background - Theory and computational complexity
- Implementation - Building components with memory/performance analysis
- Systems Analysis - MANDATORY: Memory profiling, complexity analysis, scaling behavior
- Testing - Immediate tests after each implementation (including performance tests)
- Integration - How components work together in larger systems
- Production Context - How do real ML systems handle this? (PyTorch, TensorFlow examples)
- Comprehensive Testing - Full validation including performance characteristics
- Main Execution Block -
if __name__ == "__main__":with all test execution - ML Systems Thinking - Systems-focused reflection questions (AFTER main block)
- Module Summary - What was accomplished (ALWAYS LAST SECTION)
🔬 New Principle: Every Module Teaches Systems Thinking Through Implementation
MANDATORY: Every module must demonstrate that understanding systems comes through building them, not just studying them.
🧪 Testing Pattern - MANDATORY
Implementation → Test Explanation (Markdown) → Test Code → Next Implementation
CRITICAL RULES:
- EVERY test must have a preceding markdown cell explaining what it tests and why
- IMMEDIATE testing after each implementation (not grouped at end)
- Unit tests = immediate after implementation
- Integration tests = Part 9 only
🔬 ML Systems Analysis - MANDATORY IN EVERY MODULE
Every module MUST include comprehensive systems analysis, not just algorithmic implementation.
REQUIRED Systems Insights Sections:
- Memory Analysis: Explicit memory profiling, copying behavior, space complexity
- Performance Characteristics: Computational complexity, benchmarking, bottleneck identification
- Scaling Behavior: How does performance degrade with larger inputs/models?
- Production Context: How do real systems (PyTorch, TensorFlow) handle this?
- Hardware Implications: Cache behavior, vectorization opportunities, bandwidth limits
Example Required Analysis:
# MANDATORY: Include memory profiling like this in every module
def profile_memory_usage():
\"\"\"Analyze memory consumption patterns.\"\"\"
import tracemalloc
tracemalloc.start()
# Your operation here
result = adam_optimizer.step()
current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.2f} MB")
print(f"Peak: {peak / 1024 / 1024:.2f} MB")
# Why is Adam using 3× parameter memory?
🤔 ML Systems Thinking Questions - REQUIRED
Documentation Publisher must create systems-focused reflection questions that analyze the actual implementations.
MANDATORY Question Categories:
- Memory & Performance: "Why does this operation use O(N²) memory? When does this become problematic?"
- Systems Architecture: "How would you optimize this for distributed training across 8 GPUs?"
- Production Engineering: "What happens when this operation fails in production? How do you debug it?"
- Scaling Analysis: "At what model size does this become the bottleneck? How do you know?"
Questions MUST reference the actual code students implemented, not abstract concepts.
🎯 ML Systems Content Integration - CURRENT STATUS
ML Systems rationale and content is ALREADY INTEGRATED into the current TinyTorch structure:
✅ Memory Analysis: Optimizer modules include memory profiling (Adam = 3× parameter memory)
✅ Performance Insights: Production contexts in training, spatial, attention modules
✅ System Trade-offs: Memory vs speed analysis in multiple modules
✅ Production Context: Real-world applications and deployment considerations
✅ Comprehensive Documentation: System architecture guide with Mermaid diagrams
✅ NBGrader Integration: Automated grading with instructor workflow
✅ Updated README: Emphasizes system-level learning and ML engineering skills
Key ML Systems Concepts Covered:
- Module 02 (Tensor): Memory layout and performance implications
- Module 06 (Spatial): Cache efficiency and memory access patterns
- Module 07 (Attention): O(N²) scaling and memory bottlenecks
- Module 09 (Autograd): Graph memory management and checkpointing
- Module 10 (Optimizers): Memory profiling, Adam 3× memory usage, production patterns
- Module 11 (Training): Gradient accumulation and resource management
- Module 13 (Kernels): Hardware acceleration and vectorization
- Module 14 (Benchmarking): Performance analysis and bottleneck identification
- Module 15 (MLOps): Production deployment and monitoring
🎯 North Star Goal Achievement - COMPLETED
Successfully implemented all enhancements for semester north star goal: Train CNN on CIFAR-10 to 75% accuracy
✅ CIFAR-10 Dataset Support (Module 08)
download_cifar10(): Automatic dataset download and extraction (~170MB)CIFAR10Dataset: Complete dataset class with train/test splits (50k/10k samples)- Real data loading: Support for 32x32 RGB images, not toy datasets
- Efficient batching: DataLoader integration with shuffling and preprocessing
✅ Model Checkpointing & Training (Module 11)
save_checkpoint()/load_checkpoint(): Save and restore complete model statesave_best=True: Automatically tracks and saves best validation modelearly_stopping_patience: Prevents overfitting with automatic stopping- Training history: Complete loss and metric tracking for visualization
✅ Evaluation Tools (Module 11)
evaluate_model(): Comprehensive evaluation with multiple metricscompute_confusion_matrix(): Class-wise error analysisplot_training_history(): Visualization of training/validation curves- Per-class accuracy: Detailed performance breakdown by category
✅ Documentation & Guides
- Main README: Added dedicated "North Star Achievement" section with complete example
- Module READMEs: Updated dataloader and training modules with new capabilities
- CIFAR-10 Training Guide: Complete student guide at
docs/cifar10-training-guide.md - Demo scripts: Working examples validating 75%+ accuracy achievable
✅ Pipeline Validation
test_pipeline.py: Validates complete training pipeline works end-to-enddemo_cifar10_training.py: Demonstrates achieving north star goal- Integration tests: Module exports correctly support full CNN training
- Checkpoint tests: All 16 capability checkpoints validated
Result: Students can now train real CNNs on real data to achieve meaningful accuracy (75%+) using 100% their own code!
Documentation Resources:
book/instructor-guide.md- Complete NBGrader workflow for instructorsbook/system-architecture.md- Visual system architecture with Mermaid diagramsNBGrader_Quick_Reference.md- Essential commands for daily use- Module README files - Learning objectives emphasizing system concepts
📝 Markdown Cell Format - CRITICAL
# CORRECT:
# %% [markdown]
"""
## Section Title
Content here...
"""
# WRONG (breaks notebooks):
# %% [markdown]
# ## Section Title
# Content here...
🏗️ Agent Responsibilities for Modules
Education Architect:
- Learning objectives focused on ML SYSTEMS understanding
- Ensure Build→Profile→Optimize workflow compliance
- Educational strategy emphasizing systems engineering
- MUST ensure every module teaches systems thinking through implementation
Module Developer:
- Code implementation with MANDATORY ML systems analysis
- Memory profiling and complexity analysis in every module
- Performance benchmarking and bottleneck identification
- Production context and real-world scaling implications
- NBGrader metadata and technical scaffolding
- Add export directives (#| default_exp)
- Checkpoint system implementation: Build checkpoint test files and CLI integration
- Module completion workflow: Implement
tito module completewith export and testing - MUST include systems insights: memory usage, computational complexity, scaling behavior
- MUST notify QA Agent after ANY module changes
Package Manager:
- Module integration and export validation
- Dependency resolution between modules
- Integration testing after exports
- Checkpoint system integration: Ensure checkpoint tests work with package exports
- Module-to-checkpoint mapping: Validate correct checkpoint triggered for each module
- MANDATORY: Validate ALL module exports
- MUST ensure modules work together
- MUST run integration tests
- MUST verify complete package builds
- MUST block release if integration fails
Quality Assurance:
- Test coverage and functionality WITH performance characteristics
- MUST test performance and memory usage, not just correctness
- Memory leak detection: Ensure operations don't unexpectedly consume memory
- Performance regression testing: Verify optimizations don't break over time
- Scaling behavior validation: Test how operations perform with large inputs
- Checkpoint test validation: Test all 16 checkpoint implementations thoroughly
- CLI integration testing: Verify all
tito checkpointcommands work correctly - Module completion workflow testing: Validate
tito module completeend-to-end - MANDATORY: Test ALL modified modules after ANY changes
- MUST run tests before ANY commit
- MUST verify module imports correctly
- MUST ensure all test functions work
- MUST validate systems analysis is present and accurate
- MUST report test results to Package Manager
Documentation Publisher:
- Markdown prose emphasizing SYSTEMS UNDERSTANDING
- MANDATORY: ML systems analysis sections in every module
- Module-specific ML systems thinking questions (analyze actual implementations, memory usage, scaling)
- Performance implications documentation: Explain computational complexity and memory usage
- Production context: How do real systems handle these operations?
- Checkpoint system documentation: Update documentation to reflect new capabilities
- Agent workflow documentation: Document successful agent coordination patterns
- CLI usage documentation: Document new commands and workflows for users
- MUST connect every implementation to broader systems principles
Workflow Coordinator:
- ML Systems focus enforcement: Ensure all modules teach systems principles through implementation
- Checkpoint system orchestration: Coordinate complex multi-agent implementations like checkpoint system
- Agent workflow enforcement: Ensure proper agent handoffs and communication protocols
- Systems analysis validation: Verify every module includes memory/performance/scaling analysis
- MUST enforce QA testing after EVERY module update
- CANNOT approve changes without QA test results
- MUST block commits if tests fail
- MUST ensure modules teach systems thinking, not just algorithms
🧪 QA Testing Protocol - MANDATORY
EVERY module update MUST trigger the following QA process:
🎯 Checkpoint System Testing Protocol - MANDATORY
When implementing checkpoint system features, follow this comprehensive testing protocol:
Checkpoint Implementation Testing
# Test each checkpoint file individually
python tests/checkpoints/checkpoint_00_environment.py
python tests/checkpoints/checkpoint_01_foundation.py
# ... through checkpoint_15_capstone.py
# Test checkpoint CLI integration
tito checkpoint status
tito checkpoint timeline --horizontal
tito checkpoint test 01
tito checkpoint run 00 --verbose
Module Completion Workflow Testing
# Test module completion workflow end-to-end
tito module complete 02_tensor
tito module complete tensor --skip-test
# Verify module-to-checkpoint mapping
# 02_tensor should trigger checkpoint_01_foundation
# 03_activations should trigger checkpoint_02_intelligence
# etc.
Integration Testing Requirements
-
All checkpoint tests execute without errors
-
CLI commands work with Rich visualizations
-
Module completion workflow functions end-to-end
-
Module-to-checkpoint mapping is correct
-
Progress tracking updates properly
-
Achievement celebrations display correctly
-
Immediate Testing After Changes
- QA Agent MUST be invoked after ANY module modification
- Module Developer CANNOT proceed without QA approval
- Workflow Coordinator MUST enforce this requirement
-
Comprehensive Test Suite - INCLUDING SYSTEMS VALIDATION
# QA Agent must run these tests for EVERY modified module: - Module imports without errors - All classes can be instantiated - All test functions execute successfully - No syntax errors present - Required profiler/classes exist - Tests only run when module executed directly (not on import) # NEW MANDATORY SYSTEMS TESTS: - Memory profiling sections are present and functional - Performance benchmarking code executes and measures complexity - Scaling behavior analysis is included and accurate - Production context sections reference real systems (PyTorch/TensorFlow) - Systems thinking questions analyze actual implemented code -
Test Execution Requirements
- Create isolated test environment with mocked dependencies
- Test both with mocks AND actual dependencies when available
- Verify module structure compliance
- Check for immediate test execution issues
- Validate all NBGrader metadata
-
Failure Protocol
- If ANY test fails, QA Agent MUST:
- Block the commit
- Report specific failures to Module Developer
- Require fixes before proceeding
- Re-test after fixes applied
- If ANY test fails, QA Agent MUST:
-
Success Protocol
- Only after ALL tests pass, QA Agent:
- Approves the changes
- Reports success to Workflow Coordinator
- Allows commit to proceed
- Only after ALL tests pass, QA Agent:
-
Test Results Documentation
- QA Agent MUST provide detailed test report including:
- Module name and version
- Tests run and results
- Any warnings or issues found
- Performance metrics if applicable
- Recommendations for improvement
- QA Agent MUST provide detailed test report including:
⚠️ Critical Requirements
- ML SYSTEMS FOCUS is MANDATORY - every module must teach systems engineering through implementation
- All module sections must be present including MANDATORY systems analysis
- Every test needs markdown explanation AND performance characteristics
- ML systems reflection is mandatory with questions analyzing actual implemented code
- Memory profiling and complexity analysis required in every module
- Production context sections must reference real systems (PyTorch, TensorFlow)
- Maintain immediate testing pattern (test after each implementation)
- Use clear, consistent section organization
- QA testing is MANDATORY before ANY commit (including systems validation)
🚨 CRITICAL RULE: ANYTHING IN tinytorch/ = UPDATE THE SOURCE IN modules/
GOLDEN RULE: If you see changes needed in tinytorch/ directory, make them in modules/ instead
MANDATORY WORKFLOW - NO EXCEPTIONS:
- ✅ ANY change in
tinytorch/→ Find corresponding file inmodules/source/XX_modulename/modulename_dev.py - ✅ ALWAYS edit:
modules/source/files ONLY - ✅ ALWAYS export: Use
tito module complete XX_modulenameto sync changes - ✅ ALWAYS use
tito: Never usenbdev_exportdirectly - usetitocommands only - ❌ NEVER edit: ANY file in
tinytorch/directory directly - ❌ NEVER commit: Manual changes to
tinytorch/files
CRITICAL: Always Use tito Commands
- ✅ Correct:
tito module complete 11_training - ✅ Correct:
tito module export 11_training - ❌ Wrong:
nbdev_export(bypasses student/staff workflow) - ❌ Wrong: Manual exports (inconsistent with user experience)
Why tito Only:
- Consistent workflow: Students and staff use
titocommands - Proper validation:
titoincludes testing and checkpoints - Auto-generated warnings:
titoadds protection headers automatically - Error handling:
titoprovides helpful error messages - Progress tracking:
titoshows visual progress and next steps
SIMPLE TEST: If the file path contains tinytorch/, DON'T EDIT IT DIRECTLY
WHY THIS RULE EXISTS:
- Core files are AUTO-GENERATED from source modules
- Direct core edits create dangerous SOURCE/COMPILED MISMATCH
- Next export will OVERWRITE manual core changes
- Creates INCONSISTENT BEHAVIOR between development and production
- Makes DEBUGGING IMPOSSIBLE when source ≠ compiled code
VIOLATION CONSEQUENCES:
- Manual core changes will be LOST on next export
- Source code and compiled code become INCONSISTENT
- IMPOSSIBLE TO REPRODUCE bugs in different environments
- BREAKS THE DEVELOPMENT WORKFLOW completely
CORRECT WORKFLOW EXAMPLE:
# ✅ CORRECT: Edit source file
vim modules/source/10_optimizers/optimizers_dev.py
# ✅ CORRECT: Export to regenerate core
tito module complete 10_optimizers
# ❌ WRONG: Never edit core directly
vim tinytorch/core/optimizers.py # FORBIDDEN!
EMERGENCY EXCEPTION PROTOCOL: If core files MUST be modified temporarily for testing:
- Document the manual change with clear comments
- Immediately update source to match the manual change
- Export immediately to sync source and core
- Never commit manual core changes to git
This rule is NON-NEGOTIABLE for maintaining code integrity.
🚨 CRITICAL: Module Section Ordering - MANDATORY STRUCTURE
THE LAST THREE SECTIONS OF EVERY MODULE MUST BE IN THIS EXACT ORDER:
-
if __name__ == "__main__":block - Contains all test executions- This is where all tests run when module is executed directly
- Consolidate ALL test execution here (no scattered if blocks throughout the module)
- Example:
if __name__ == "__main__": run_all_tests()
-
ML Systems Thinking Questions - Interactive NBGrader questions
- Must come AFTER the main execution block
- Contains 3-4 interactive reflection questions
- Section header:
## 🤔 ML Systems Thinking: Interactive Questions
-
MODULE SUMMARY - Always the ABSOLUTE LAST section
- Must be the final section before EOF
- Nothing should come after Module Summary
- Section header:
## 🎯 MODULE SUMMARY: [Module Name]
❌ INCORRECT Example (WRONG):
## 🎯 MODULE SUMMARY: Neural Networks
# Summary content here...
if __name__ == "__main__": # ❌ WRONG - comes after summary
run_tests()
✅ CORRECT Example (like 01_setup):
if __name__ == "__main__": # ✅ First of final three sections
run_all_tests()
## 🤔 ML Systems Thinking: Interactive Questions # ✅ Second
# Interactive NBGrader questions here...
## 🎯 MODULE SUMMARY: Setup Configuration # ✅ Always last
# Summary content here...
# [EOF]
Modules with scattered if __name__ blocks must be refactored to have a single consolidated block before ML Systems Thinking.
Remember: TinyTorch is an ML SYSTEMS course, not just an ML algorithms course. Students learn systems engineering principles through building complete implementations. Professional software development always uses branches AND comprehensive testing. This keeps the codebase stable, enables collaboration, and maintains a clean development history.