cs249r_book/.cursorrules
Vijay Janapa Reddi 834ce8a140 docs: update Cursor Rules with workflow decision guidelines
- Add clear rules for when to use feature branches vs direct dev work
- Define decision criteria: feature branches for new features/changes needing review, direct dev for docs/small fixes
- Update development workflow to show both approaches
- Add automatic workflow recommendation requirements for AI assistant
- Maintain existing quality standards and pre-commit requirements

This provides clear guidance for consistent development workflow decisions.
2025-08-19 11:56:07 -04:00


# Cursor Rules for MLSysBook Textbook Project
## 🚫 NEVER work directly on main branch
- Always create a feature branch for new development work (small, low-risk changes may go directly to dev, as described under the workflow decision rules)
- Branch naming convention: `feature/description` or `fix/description`
- Example: `feature/add-new-cleanup-method`, `fix/improve-file-detection`
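The naming convention can be checked mechanically, for example from a local Git hook. This is a minimal sketch that assumes descriptions are lowercase words joined by hyphens, as in the examples above. `BRANCH_PATTERN` and `is_valid_branch_name` are hypothetical names and not part of any existing project tooling.

```python
import re

# Assumed rule: "feature/" or "fix/" followed by lowercase, hyphen-separated words.
BRANCH_PATTERN = re.compile(r"(feature|fix)/[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_branch_name(name: str) -> bool:
    """Return True if ``name`` follows the feature/fix naming convention."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_branch_name("feature/add-new-cleanup-method")` is true, while `"main"` and `"feature/Add_Method"` are rejected.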
## 📝 Commit Guidelines
- **MANDATORY: Run `pre-commit run --all-files` before every commit**
- Make atomic commits with clear, descriptive messages
- Use conventional commit format: `type(scope): description`
- Types: feat, fix, docs, style, refactor, test, chore
- Never use exclamation marks (!) in commit messages or shell commands
- Examples:
  - `feat(cleanup): add support for large file detection`
  - `fix(ui): improve file selection interface`
  - `docs(readme): update installation instructions`
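These commit rules are simple enough to check automatically, for instance from a `commit-msg` Git hook. This is a minimal sketch: it treats the scope as optional, which is an assumption, and it enforces the no-exclamation-mark rule. `check_commit_message` is a hypothetical helper. It relies on built-in generic annotations, so Python 3.9 or newer is assumed.

```python
import re

# Commit types permitted by the guidelines above.
ALLOWED_TYPES = frozenset({"feat", "fix", "docs", "style", "refactor", "test", "chore"})

# Assumed subject shape: "type(scope): description", with the scope optional here.
SUBJECT_PATTERN = re.compile(
    r"(?P<type>[a-z]+)(?:\((?P<scope>[a-z0-9-]+)\))?: (?P<description>\S.*)"
)


def check_commit_message(message: str) -> list[str]:
    """Return problems found in a commit message (an empty list means it passes)."""
    problems: list[str] = []
    if "!" in message:
        problems.append("exclamation marks are not allowed")
    subject = message.splitlines()[0] if message else ""
    match = SUBJECT_PATTERN.fullmatch(subject)
    if match is None:
        problems.append("subject must look like 'type(scope): description'")
    elif match.group("type") not in ALLOWED_TYPES:
        problems.append(f"unknown commit type {match.group('type')!r}")
    return problems
```

All three examples above pass. A subject such as `feature(ui): improve things` is reported as using an unknown type.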
## 🔄 Branch Management & Workflow Decision Rules
- Create branches from dev: `git checkout dev && git pull origin dev && git checkout -b feature/your-feature-name`
- Keep branches focused on single features or fixes
- **ALWAYS use `--no-ff` when merging into dev branch** to maintain clear history
- Delete feature branches after merging to dev
- **NEVER commit directly to main branch**
- All changes go to dev first, then dev merges to main for releases
### 🌿 FEATURE BRANCH REQUIRED (create `feature/description-name`):
- New features (CLI commands, workflow changes, build system updates)
- Breaking changes or major refactoring
- Multi-file changes that work together as a unit
- Experimental work or anything uncertain
- Changes that might need review
- Work spanning multiple sessions
### 🚀 DIRECT DEV WORK ALLOWED (commit directly to dev):
- Documentation fixes (typos, outdated info, README updates)
- Small bug fixes (obvious, low-risk, single-file)
- Version bumps or dependency updates
- Pre-commit hook fixes
- Single-file maintenance tasks
### 🤔 Decision Filter: Ask "Would this benefit from review?"
- **Yes** → Feature branch + PR workflow
- **No** → Direct dev work
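The decision filter is essentially a short checklist. The sketch below encodes it as a function over a hypothetical `PlannedChange` record and returns the git commands to start with. Treating every multi-file change as review-worthy simplifies the rules above: a documentation fix that touches several READMEs may still reasonably go straight to dev.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedChange:
    """Hypothetical summary of a change, filled in before work starts."""

    is_new_feature: bool = False
    is_breaking: bool = False
    files_touched: int = 1
    is_experimental: bool = False
    spans_multiple_sessions: bool = False


def benefits_from_review(change: PlannedChange) -> bool:
    """Approximate the 'Would this benefit from review?' filter."""
    return (
        change.is_new_feature
        or change.is_breaking
        or change.files_touched > 1
        or change.is_experimental
        or change.spans_multiple_sessions
    )


def recommended_commands(change: PlannedChange, branch: str) -> list[str]:
    """Return the git commands that start the recommended workflow."""
    commands = ["git checkout dev", "git pull origin dev"]
    if benefits_from_review(change):
        commands.append(f"git checkout -b {branch}")
    return commands
```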
## 🚀 Development Workflow
### For Feature Branch Work:
1. Always start with: `git checkout dev && git pull origin dev`
2. Create feature branch: `git checkout -b feature/your-feature`
3. Make changes
4. Test your changes thoroughly
5. **Run `pre-commit run --all-files` before every commit**
6. Commit frequently with meaningful messages (only after pre-commit passes)
7. Push branch (after asking the user): `git push origin feature/your-feature`
8. Create pull request targeting **dev branch** for review
9. Merge to dev using `git merge --no-ff feature/your-feature` after approval
10. Delete feature branch after successful merge
### For Direct Dev Work:
1. Always start with: `git checkout dev && git pull origin dev`
2. Make changes directly on dev branch
3. Test your changes thoroughly
4. **Run `pre-commit run --all-files` before every commit**
5. Commit with meaningful messages (only after pre-commit passes)
6. Push to dev (after asking the user): `git push origin dev`
## 📋 Code Quality Standards
- Follow PEP 8 for Python code
- Add type hints to all functions
- Include docstrings for all classes and methods
- Write unit tests for new functionality
- Use meaningful variable and function names
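This small hypothetical helper shows these standards together: PEP 8 layout, type hints, a descriptive name, and a docstring in the Google style that the tool-development rules call for. The function and its 200 words-per-minute default are invented for the example.

```python
DEFAULT_WORDS_PER_MINUTE = 200  # illustrative reading speed, not a project constant


def estimate_reading_minutes(
    word_count: int, words_per_minute: int = DEFAULT_WORDS_PER_MINUTE
) -> float:
    """Estimate how long a chapter takes to read.

    Args:
        word_count: Number of words in the chapter.
        words_per_minute: Assumed reading speed.

    Returns:
        Estimated reading time in minutes.

    Raises:
        ValueError: If ``word_count`` is negative or ``words_per_minute``
            is not positive.
    """
    if word_count < 0:
        raise ValueError(f"word_count must be non-negative, got {word_count}")
    if words_per_minute <= 0:
        raise ValueError(f"words_per_minute must be positive, got {words_per_minute}")
    return word_count / words_per_minute
```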
## 🔒 Security & Safety
- Never commit sensitive data (API keys, passwords, etc.)
- Use environment variables for configuration
- Always ask the user before pushing any changes
- Back up important files before major changes
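Reading configuration from the environment keeps secrets out of the repository. This is a minimal sketch with a hypothetical `require_env` helper that fails fast with a clear message when a variable is missing.

```python
import os


class MissingConfigError(RuntimeError):
    """Raised when a required environment variable is not set."""


def require_env(name: str) -> str:
    """Return the value of environment variable ``name`` or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise MissingConfigError(
            f"Environment variable {name} is not set; export it instead of hardcoding the value"
        )
    return value
```

A tool would then call something like `api_key = require_env("LLM_API_KEY")`. `LLM_API_KEY` is a placeholder name, not a variable the project defines.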
## 📁 Project Structure
- Keep related files organized in appropriate directories
- Use clear, descriptive file names
- Maintain consistent import structure
- Document any new dependencies
- Follow the existing book structure: contents/core/, contents/frontmatter/, contents/labs/
- Keep chapter files organized by topic
- Maintain consistent file naming: topic_name.qmd, topic_name.bib
- Organize images in chapter-specific directories
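The `topic_name.qmd` / `topic_name.bib` convention can be derived from a chapter title. This is a minimal sketch with hypothetical helper names. It keeps only ASCII letters and digits, so accented characters are dropped rather than transliterated.

```python
import re


def topic_slug(title: str) -> str:
    """Turn a chapter title into a snake_case topic name."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    if not words:
        raise ValueError(f"Cannot derive a topic name from {title!r}")
    return "_".join(words)


def chapter_file_names(title: str) -> tuple[str, str]:
    """Return the (.qmd, .bib) file names for a chapter title."""
    slug = topic_slug(title)
    return f"{slug}.qmd", f"{slug}.bib"
```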
## 🧪 Testing Requirements
- Write tests for new features
- Ensure existing tests pass before committing
- Use pytest for testing framework
- Aim for high test coverage (>80% overall, 100% for critical paths)
- Test book builds locally before pushing
- Verify all links and cross-references work
- Check both HTML and PDF outputs
- Ensure code examples run correctly
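Some of these checks can be automated as ordinary pytest tests. The sketch below looks for Quarto cross-references (`@fig-name`, `@sec-name`, and so on) whose `{#fig-name}` label is not defined in the same text. `undefined_references` and the prefix list are assumptions. It only inspects a single file, so references that point into other chapters would need all sources loaded together. pytest collects the `test_` functions automatically, and plain `assert` statements need no pytest import.

```python
import re

# Quarto cross-reference prefixes checked by this sketch (an assumed subset).
XREF_PREFIXES = ("fig", "tbl", "sec", "eq", "lst")
_PREFIX_GROUP = "|".join(XREF_PREFIXES)
REFERENCE_RE = re.compile(rf"@((?:{_PREFIX_GROUP})-[\w-]+)")
LABEL_RE = re.compile(rf"#((?:{_PREFIX_GROUP})-[\w-]+)")


def undefined_references(text: str) -> set[str]:
    """Return cross-reference labels that are used but never defined in ``text``."""
    used = set(REFERENCE_RE.findall(text))
    defined = set(LABEL_RE.findall(text))
    return used - defined


def test_defined_reference_passes() -> None:
    text = "See @fig-pipeline.\n\n![Pipeline](pipeline.png){#fig-pipeline}"
    assert undefined_references(text) == set()


def test_missing_label_is_reported() -> None:
    assert undefined_references("As @sec-intro explains, ...") == {"sec-intro"}
```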
## 🔍 Pre-commit Requirements
- **CRITICAL: Never commit without running `pre-commit run --all-files` first**
- **ALWAYS ensure pre-commit hooks pass before any commit**
- **NO EXCEPTIONS: All pre-commit violations must be fixed before committing**
- Pre-commit should check: code formatting, linting, type checking, security
- If pre-commit fails, fix issues and run again until all checks pass
## 📚 Documentation & Content
- Update README.md for new features
- Add inline comments for complex logic
- Document any new command-line arguments
- Keep the changelog updated
- Ensure all content follows academic writing standards
- Maintain consistent terminology across chapters
- Update bibliography and references when adding new citations
- Test all code examples and ensure they work correctly
## 🚨 Important Reminders
- **CRITICAL: Never commit without running `pre-commit run --all-files` first**
- ALWAYS ask before pushing to remote
- Never force push to main or dev branches
- **ALWAYS use `--no-ff` when merging to dev** to preserve feature branch history
- Keep commits small and focused
- Review code before committing
- Test thoroughly before pushing
- All features go through dev branch before reaching main
## 🌳 Git Workflow & Branch Strategy
- **main**: Production-ready code, tagged releases only
- **dev**: Integration branch where all features are merged before main
- **feature/**: Individual feature branches created from dev
- **fix/**: Bug fix branches created from dev
### Merge Strategy
- **Feature to dev**: `git merge --no-ff feature/branch-name`
- **Dev to main**: `git merge --no-ff dev` (for releases only)
- **Why --no-ff**: Preserves branch history, makes it clear when features were integrated
- **Benefits**: Feature boundaries stay visible in the log, and a whole feature can be rolled back by reverting its merge commit (`git revert -m 1 <merge-commit>`)
### Example Commands
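```bash
# Start a new feature from an up-to-date dev
git checkout dev && git pull origin dev
git checkout -b feature/new-awesome-feature

# After development is complete and the PR is approved
git checkout dev
git pull origin dev  # Ensure dev is up to date
git merge --no-ff feature/new-awesome-feature
git push origin dev  # Only after asking the user
git branch -d feature/new-awesome-feature  # Delete the local branch
git push origin --delete feature/new-awesome-feature  # Delete the remote branch, if it was pushed
```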
## 🎯 Project-Specific Rules
- This is a Machine Learning Systems textbook project - prioritize content quality and accuracy
- Be extra careful with content changes that affect the live textbook
- Always test builds locally before pushing changes
- Maintain consistent formatting and style across all chapters
- Follow Quarto/Markdown best practices for academic writing
- Ensure all cross-references and links remain valid
- Keep the book structure and navigation consistent
- Test both HTML and PDF outputs when making structural changes
## 💻 Coding & Tool Development Rules
### 🏗️ Software Engineering Best Practices
- **SOLID Principles**: Single responsibility, Open/closed, Liskov substitution, Interface segregation, Dependency inversion
- **DRY Principle**: Don't repeat yourself - extract common functionality into reusable modules
- **Separation of Concerns**: Keep business logic, data access, and UI concerns separate
- **Fail Fast**: Validate inputs early and provide clear error messages
- **Defense in Depth**: Multiple layers of validation and error handling
### 📦 Python Package Structure
- **Proper Module Organization**: Use meaningful package/module names that reflect functionality
- **Import Standards**: Use absolute imports, group imports (stdlib, third-party, local)
- **`__init__.py` Files**: Include proper module initialization and expose clean APIs
- **Package Dependencies**: Manage dependencies through requirements files with version pinning
- **Virtual Environments**: Always use virtual environments for development
### 🔧 Code Quality Standards
- **Type Hints**: MANDATORY for all function parameters, return types, and class attributes
- **Docstrings**: Google-style docstrings for all classes, methods, and functions
- **Error Handling**: Specific exception types, proper logging, graceful degradation
- **Input Validation**: Validate all inputs with clear error messages
- **Resource Management**: Use context managers for file operations, API connections
- **Configuration**: Use environment variables and config files, never hardcode values
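This hypothetical loader for a JSON tool configuration applies several of these standards at once. It uses a context manager for the file, specific exception types that are re-raised as one project exception with context, early input validation, and a Google-style docstring. The file format and the required `"model"` key are assumptions made for the example. It relies on built-in generic annotations, so Python 3.9 or newer is assumed.

```python
import json
import logging
from pathlib import Path
from typing import Any

logger = logging.getLogger(__name__)


class ConfigError(Exception):
    """Raised when a tool configuration file is missing or invalid."""


def load_tool_config(
    path: Path, required_keys: tuple[str, ...] = ("model",)
) -> dict[str, Any]:
    """Load and validate a JSON tool configuration file.

    Args:
        path: Location of the JSON configuration file.
        required_keys: Keys that must be present in the top-level object.

    Returns:
        The parsed configuration mapping.

    Raises:
        ConfigError: If the file does not exist, is not valid JSON, is not a
            JSON object, or lacks a required key.
    """
    try:
        with path.open(encoding="utf-8") as handle:  # closed even on error
            data = json.load(handle)
    except FileNotFoundError as exc:
        raise ConfigError(f"Config file not found: {path}") from exc
    except json.JSONDecodeError as exc:
        raise ConfigError(f"Invalid JSON in {path}: {exc.msg} (line {exc.lineno})") from exc

    if not isinstance(data, dict):
        raise ConfigError(f"Top level of {path} must be a JSON object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ConfigError(f"{path} is missing required keys: {', '.join(missing)}")
    logger.debug("Loaded config from %s", path)
    return data
```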
### 🏛️ Architecture Patterns
- **Command Pattern**: For CLI tools with clear command/action separation
- **Factory Pattern**: For creating objects based on configuration or runtime conditions
- **Strategy Pattern**: For interchangeable algorithms (e.g., different LLM providers)
- **Observer Pattern**: For progress reporting and event handling
- **Dependency Injection**: Make dependencies explicit and testable
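Three of these patterns fit together naturally for LLM-backed tools. In the sketch below, each provider is a strategy behind a common interface, a factory maps a configuration value to a provider, and the tool receives its provider through its constructor (dependency injection). The provider classes are stand-ins that transform text locally instead of calling a model; real local or cloud clients would implement the same `complete` method. All names are hypothetical. It relies on built-in generic annotations, so Python 3.9 or newer is assumed.

```python
from typing import Callable, Protocol


class LLMProvider(Protocol):
    """Interface every provider strategy implements."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in provider that echoes the prompt instead of calling a model."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UpperProvider:
    """Second stand-in, showing that strategies are interchangeable."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


# Factory: map a configuration value to a provider constructor.
PROVIDERS: dict[str, Callable[[], LLMProvider]] = {
    "echo": EchoProvider,
    "upper": UpperProvider,
}


def make_provider(name: str) -> LLMProvider:
    """Create the provider named in configuration."""
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"Unknown provider {name!r}; expected one of {sorted(PROVIDERS)}") from None


class CaptionImprover:
    """Tool that receives its provider via dependency injection."""

    def __init__(self, provider: LLMProvider) -> None:
        self._provider = provider

    def improve(self, caption: str) -> str:
        return self._provider.complete(f"Improve this caption: {caption}")
```

Because `CaptionImprover` never constructs its own provider, tests can pass in a fake, and adding a new backend only means registering another entry in `PROVIDERS`.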
### 📁 File & Directory Structure
```
tools/scripts/
├── __init__.py              # Package initialization
├── common/                  # Shared utilities and base classes
│   ├── __init__.py
│   ├── base_classes.py      # Abstract base classes
│   ├── exceptions.py        # Custom exception definitions
│   ├── config.py            # Configuration management
│   ├── logging_config.py    # Logging setup
│   └── validators.py        # Input validation utilities
├── content/                 # Content management tools
│   ├── __init__.py
│   ├── caption_improver.py  # Specific tool modules
│   └── section_manager.py
├── maintenance/             # System maintenance tools
├── testing/                 # Test utilities and frameworks
└── utils/                   # General utility functions
```
### 🧪 Testing Requirements
- **Unit Tests**: Test individual functions and classes in isolation
- **Integration Tests**: Test component interactions
- **End-to-End Tests**: Test complete workflows
- **Property-Based Testing**: For complex algorithms with many edge cases
- **Mock External Dependencies**: File systems, APIs, network calls
- **Test Coverage**: Aim for >80% code coverage, 100% for critical paths
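External dependencies can be replaced with `unittest.mock` from the standard library, so tests run offline and fast. This is a minimal sketch: `fetch_status` is a hypothetical function that makes a real network call, and the test swaps `urllib.request.urlopen` for a mock configured as a context manager.

```python
import urllib.request
from unittest import mock

REQUEST_TIMEOUT_SECONDS = 10


def fetch_status(url: str) -> int:
    """Return the HTTP status code for ``url`` (makes a real network call)."""
    with urllib.request.urlopen(url, timeout=REQUEST_TIMEOUT_SECONDS) as response:
        return response.status


def test_fetch_status_without_network() -> None:
    """The network call is replaced by a mock, so the test is fast and offline."""
    fake_response = mock.MagicMock()
    fake_response.status = 200
    # urlopen is used as a context manager, so __enter__ must return the fake response.
    fake_response.__enter__.return_value = fake_response
    with mock.patch("urllib.request.urlopen", return_value=fake_response) as fake_urlopen:
        assert fetch_status("https://example.org/book") == 200
    fake_urlopen.assert_called_once_with(
        "https://example.org/book", timeout=REQUEST_TIMEOUT_SECONDS
    )
```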
### 🔍 Code Review Standards
- **Single Responsibility**: Each function/class should have one clear purpose
- **Function Length**: Keep functions under 50 lines when possible
- **Cognitive Complexity**: Avoid deeply nested conditionals and loops
- **Magic Numbers**: Use named constants instead of literal values
- **Dead Code**: Remove unused imports, variables, and functions
- **Performance**: Consider time/space complexity for data processing tools
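Named constants and guard clauses address two of these points directly. In the hypothetical filter below, each early `return` replaces a level of nesting, and the thresholds have names instead of appearing as bare literals. The 5 MiB limit and the suffix set are illustrative values, not project settings.

```python
from pathlib import Path

# Named constants instead of literals scattered through the logic (values are hypothetical).
MAX_FILE_SIZE_BYTES = 5 * 1024 * 1024
CONTENT_SUFFIXES = frozenset({".qmd", ".bib", ".md"})


def should_process(path: Path, size_bytes: int) -> bool:
    """Decide whether a content file should be processed.

    Guard clauses return early, which keeps the function flat instead of
    nesting one condition inside another.
    """
    if path.suffix not in CONTENT_SUFFIXES:
        return False
    if size_bytes > MAX_FILE_SIZE_BYTES:
        return False
    return not path.name.startswith(".")
```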
### 📊 Logging & Monitoring
- **Structured Logging**: Use consistent log levels and structured data
- **Progress Indicators**: For long-running operations with rich progress bars
- **Error Context**: Include relevant context in error messages
- **Performance Metrics**: Track execution time for optimization opportunities
- **Debug Information**: Comprehensive debug logging for troubleshooting
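Timing and context can be captured in one reusable helper. This is a minimal sketch with a hypothetical `log_duration` context manager. It attaches structured fields to each log record through the standard `extra` argument, so a structured (for example JSON) formatter can emit them as separate fields. On failure it logs the traceback and re-raises. Progress bars are outside the scope of this sketch.

```python
import logging
import time
from contextlib import contextmanager
from typing import Iterator

logger = logging.getLogger("mlsysbook.tools")  # hypothetical logger name


@contextmanager
def log_duration(operation: str, **context: object) -> Iterator[None]:
    """Log the outcome and duration of ``operation`` with structured context.

    The context fields are attached to the log record via ``extra`` so a
    structured formatter can emit them as separate fields.
    """
    start = time.perf_counter()
    try:
        yield
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback.
        logger.exception(
            "%s failed after %.3fs", operation, time.perf_counter() - start,
            extra={"context": context},
        )
        raise
    else:
        elapsed = time.perf_counter() - start
        logger.info(
            "%s finished in %.3fs", operation, elapsed,
            extra={"context": context, "elapsed_s": elapsed},
        )
```

Typical use: `with log_duration("build_pdf", chapter="intro"): ...`.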
### 🛡️ Security & Safety Practices
- **Input Sanitization**: Sanitize all user inputs, especially file paths
- **Path Traversal Prevention**: Validate file paths to prevent directory traversal
- **API Key Management**: Never hardcode API keys, use environment variables
- **Safe File Operations**: Use atomic writes (write to a temporary file, then rename it over the target)
- **Backup Strategies**: Always back up before destructive operations
- **Permission Checks**: Verify file/directory permissions before operations
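Two of these practices translate directly into small helpers. The first resolves a user-supplied path and rejects anything that escapes a base directory, such as `../`. The second writes to a temporary file in the same directory and then moves it over the target with `os.replace`, so readers never see a half-written file. This is a minimal sketch with hypothetical names. `Path.is_relative_to` requires Python 3.9 or newer. The atomicity guarantee of `os.replace` is documented for POSIX systems.

```python
import os
import tempfile
from pathlib import Path


def resolve_within(base_dir: Path, user_path: str) -> Path:
    """Resolve ``user_path`` under ``base_dir``, rejecting traversal outside it."""
    base = base_dir.resolve()
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"Path {user_path!r} escapes {base}")
    return candidate


def atomic_write_text(target: Path, text: str) -> None:
    """Write ``text`` to ``target`` without exposing a partially written file."""
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(target.parent), prefix=f".{target.name}.", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, target)  # atomic rename on POSIX
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)  # never leave temp files behind
        raise
```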
### 🔄 Backward Compatibility & Versioning
- **API Stability**: Maintain backward compatibility for internal APIs
- **Deprecation Warnings**: Provide clear migration paths for deprecated features
- **Version Pinning**: Pin dependency versions to avoid breaking changes
- **Configuration Migration**: Handle config file format changes gracefully
- **Feature Flags**: Use feature flags for experimental functionality
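A deprecated function can keep working while it points callers to its replacement. The sketch below shows this with a small decorator built on the standard `warnings` module. `deprecated`, `improve_caption`, and `fix_caption` are hypothetical names used only to show the migration path.

```python
import functools
import warnings
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


def deprecated(replacement: str) -> Callable[[F], F]:
    """Mark a function as deprecated and point callers to ``replacement``."""

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"{func.__name__}() is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # report the caller's line, not this wrapper
            )
            return func(*args, **kwargs)

        return cast(F, wrapper)

    return decorator


def improve_caption(text: str) -> str:
    """New API."""
    return text.strip().capitalize()


@deprecated("improve_caption()")
def fix_caption(text: str) -> str:
    """Old name kept for backward compatibility."""
    return improve_caption(text)
```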
### 🚀 Performance & Scalability
- **Lazy Loading**: Load resources only when needed
- **Caching**: Cache expensive operations (file parsing, API calls)
- **Batch Processing**: Process multiple items efficiently
- **Memory Management**: Monitor memory usage for large data processing
- **Async Operations**: Use async/await for I/O-bound operations when beneficial
- **Resource Pooling**: Reuse expensive resources (database connections, etc.)
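Caching and lazy batching both need only the standard library. In the sketch below, `functools.lru_cache` memoizes a hypothetical chapter parser that returns an immutable tuple, so callers cannot corrupt the cached value. A generator yields fixed-size batches without materializing the whole input. The cache is keyed only on the path string, so it goes stale when a file changes. Call `parse_chapter.cache_clear()` after edits. On Python 3.12 and newer, `itertools.batched` offers a similar helper that yields tuples.

```python
import functools
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


@functools.lru_cache(maxsize=256)
def parse_chapter(path: str) -> tuple[str, ...]:
    """Parse a chapter once and cache the result (here: just its heading lines)."""
    text = Path(path).read_text(encoding="utf-8")
    return tuple(line for line in text.splitlines() if line.startswith("#"))


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Lazily yield lists of up to ``size`` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch
```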
### 📝 Documentation Requirements
- **README Files**: Each module should have clear usage documentation
- **API Documentation**: Generate API docs from docstrings
- **Architecture Documentation**: Document design decisions and patterns
- **Troubleshooting Guides**: Common issues and solutions
- **Examples**: Working code examples for complex functionality
- **Changelog**: Document breaking changes and migration paths
### 🔧 Development Tools Integration
- **Pre-commit Hooks**: MANDATORY formatting, linting, type checking, security scans
- **IDE Configuration**: Consistent editor settings across the team
- **Debugging Support**: Include debug configurations and utilities
- **Profiling Tools**: Integration with performance profiling tools
- **Static Analysis**: Use tools like pylint, mypy, bandit for code quality
### 🌍 Compatibility & Portability
- **Cross-Platform**: Code should work on Windows, macOS, and Linux
- **Python Version**: Support specified Python version range
- **Path Handling**: Use pathlib for cross-platform path operations
- **Environment Variables**: Handle different shell environments
- **Character Encoding**: Always specify UTF-8 encoding for text files
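Both points can be seen in a couple of lines of `pathlib`. Without an explicit `encoding`, Python falls back to a locale-dependent default, which can differ between Windows and the other platforms. `as_posix()` produces the same `/`-separated strings everywhere. The helper names and the `contents/core/intro/intro.qmd` layout used in the check are hypothetical.

```python
from pathlib import Path


def chapter_sources(root: Path, suffix: str = ".qmd") -> list[str]:
    """List source files under ``root`` as sorted POSIX-style relative paths.

    ``as_posix()`` gives the same "/"-separated strings on Windows, macOS and
    Linux, which keeps logs and generated indexes identical across platforms.
    """
    return sorted(p.relative_to(root).as_posix() for p in root.rglob(f"*{suffix}"))


def count_words(path: Path) -> int:
    """Count whitespace-separated words, decoding explicitly as UTF-8."""
    return len(path.read_text(encoding="utf-8").split())
```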
## 🤝 Expert Collaboration Guidelines
- **Textbook Content**: Act as an expert textbook editor; do not agree by default, and think critically about content quality, structure, and pedagogical effectiveness
- **Code Development**: Act as an expert software engineer; recommend best practices and consider maintainability, performance, and scalability
- **Deployment/DevOps**: Act as an expert DevOps engineer; balance complexity against ease of maintenance and recommend robust CI/CD practices
- **Challenge Assumptions**: Question approaches that may not be optimal for the project's long-term success
- **Provide Alternatives**: When disagreeing, offer concrete alternatives with reasoning
- **Maintain Standards**: Uphold high quality standards even when it means pushing back on quick solutions
- **Code Writing**: Ask for confirmation before writing code unless explicitly instructed to proceed; discuss approach, design, and implementation strategy first
## 🔄 Automatic Workflow Recommendations
- **ALWAYS automatically recommend** the appropriate workflow (feature branch vs direct dev) when starting new work
- Base recommendations on the workflow decision rules above
- Provide clear rationale for the recommendation
- Show the exact git commands to use for the recommended approach
- Allow user to override the recommendation if they disagree
- Create atomic commits with conventional commit format (feat/fix/docs/refactor)
- Organize multiple changes into logical, focused commits
## 🔍 Code Review Checklist
Before committing, ensure:
- [ ] **FIRST: Pre-commit hooks pass (`pre-commit run --all-files`)**
- [ ] Code follows project style guidelines
- [ ] All tests pass
- [ ] No sensitive data in commits
- [ ] Meaningful commit message
- [ ] Changes are focused and atomic
- [ ] Documentation updated if needed
- [ ] Book builds successfully (for content changes)
- [ ] Tools work with both local and cloud LLMs (for tool changes)
- [ ] Error handling is robust (for tool changes)
- [ ] Progress indicators are clear (for tool changes)