mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-04-29 17:20:21 -05:00
- Add clear rules for when to use feature branches vs direct dev work - Define decision criteria: feature branches for new features/changes needing review, direct dev for docs/small fixes - Update development workflow to show both approaches - Add automatic workflow recommendation requirements for AI assistant - Maintain existing quality standards and pre-commit requirements This provides clear guidance for consistent development workflow decisions.
313 lines
15 KiB
Plaintext
313 lines
15 KiB
Plaintext
# Cursor Rules for MLSysBook Textbook Project
|
|
|
|
## 🚫 NEVER work directly on main branch
|
|
- Always create a feature branch for any new development work
|
|
- Branch naming convention: `feature/description` or `fix/description`
|
|
- Example: `feature/add-new-cleanup-method`, `fix/improve-file-detection`
|
|
|
|
## 📝 Commit Guidelines
|
|
- **MANDATORY: Run `pre-commit run --all-files` before every commit**
|
|
- Make atomic commits with clear, descriptive messages
|
|
- Use conventional commit format: `type(scope): description`
|
|
- Types: feat, fix, docs, style, refactor, test, chore
|
|
- Never use exclamation marks (!) in commit messages or shell commands
|
|
- Examples:
|
|
- `feat(cleanup): add support for large file detection`
|
|
- `fix(ui): improve file selection interface`
|
|
- `docs(readme): update installation instructions`
|
|
|
|
## 🔄 Branch Management & Workflow Decision Rules
|
|
- Create branches from dev: `git checkout dev && git pull origin dev && git checkout -b feature/your-feature-name`
|
|
- Keep branches focused on single features or fixes
|
|
- **ALWAYS use `--no-ff` when merging into dev branch** to maintain clear history
|
|
- Delete feature branches after merging to dev
|
|
- **NEVER commit directly to main branch**
|
|
- All changes go to dev first, then dev merges to main for releases
|
|
|
|
### 🌿 FEATURE BRANCH REQUIRED (create `feature/description-name`):
|
|
- New features (CLI commands, workflow changes, build system updates)
|
|
- Breaking changes or major refactoring
|
|
- Multi-file changes that work together as a unit
|
|
- Experimental work or anything uncertain
|
|
- Changes that might need review
|
|
- Work spanning multiple sessions
|
|
|
|
### 🚀 DIRECT DEV WORK ALLOWED (commit directly to dev):
|
|
- Documentation fixes (typos, outdated info, README updates)
|
|
- Small bug fixes (obvious, low-risk, single-file)
|
|
- Version bumps or dependency updates
|
|
- Pre-commit hook fixes
|
|
- Single-file maintenance tasks
|
|
|
|
### 🤔 Decision Filter: Ask "Would this benefit from review?"
|
|
- **Yes** → Feature branch + PR workflow
|
|
- **No** → Direct dev work
|
|
|
|
## 🚀 Development Workflow
|
|
|
|
### For Feature Branch Work:
|
|
1. Always start with: `git checkout dev && git pull origin dev`
|
|
2. Create feature branch: `git checkout -b feature/your-feature`
|
|
3. Make changes
|
|
4. **Run `pre-commit run --all-files` before every commit**
|
|
5. Commit frequently with meaningful messages (only after pre-commit passes)
|
|
6. Test your changes thoroughly
|
|
7. Push branch: `git push origin feature/your-feature`
|
|
8. Create pull request targeting **dev branch** for review
|
|
9. Merge to dev using `git merge --no-ff feature/your-feature` after approval
|
|
10. Delete feature branch after successful merge
|
|
|
|
### For Direct Dev Work:
|
|
1. Always start with: `git checkout dev && git pull origin dev`
|
|
2. Make changes directly on dev branch
|
|
3. **Run `pre-commit run --all-files` before every commit**
|
|
4. Commit with meaningful messages (only after pre-commit passes)
|
|
5. Test your changes thoroughly
|
|
6. Push to dev: `git push origin dev`
|
|
|
|
## 📋 Code Quality Standards
|
|
- Follow PEP 8 for Python code
|
|
- Add type hints to all functions
|
|
- Include docstrings for all classes and methods
|
|
- Write unit tests for new functionality
|
|
- Use meaningful variable and function names
|
|
|
|
## 🔒 Security & Safety
|
|
- Never commit sensitive data (API keys, passwords, etc.)
|
|
- Use environment variables for configuration
|
|
- Always ask user before pushing any changes
|
|
- Backup important files before major changes
|
|
|
|
## 📁 Project Structure
|
|
- Keep related files organized in appropriate directories
|
|
- Use clear, descriptive file names
|
|
- Maintain consistent import structure
|
|
- Document any new dependencies
|
|
- Follow the existing book structure: contents/core/, contents/frontmatter/, contents/labs/
|
|
- Keep chapter files organized by topic
|
|
- Maintain consistent file naming: topic_name.qmd, topic_name.bib
|
|
- Organize images in chapter-specific directories
|
|
|
|
## 🧪 Testing Requirements
|
|
- Write tests for new features
|
|
- Ensure existing tests pass before committing
|
|
- Use pytest for testing framework
|
|
- Aim for good test coverage
|
|
- Test book builds locally before pushing
|
|
- Verify all links and cross-references work
|
|
- Check both HTML and PDF outputs
|
|
- Ensure code examples run correctly
|
|
|
|
## 🔍 Pre-commit Requirements
|
|
- **CRITICAL: Never commit without running `pre-commit run --all-files` first**
|
|
- **ALWAYS ensure pre-commit hooks pass before any commit**
|
|
- **NO EXCEPTIONS: All pre-commit violations must be fixed before committing**
|
|
- Pre-commit should check: code formatting, linting, type checking, security
|
|
- If pre-commit fails, fix issues and run again until all checks pass
|
|
|
|
## 📚 Documentation & Content
|
|
- Update README.md for new features
|
|
- Add inline comments for complex logic
|
|
- Document any new command-line arguments
|
|
- Keep changelog updated
|
|
- Ensure all content follows academic writing standards
|
|
- Maintain consistent terminology across chapters
|
|
- Update bibliography and references when adding new citations
|
|
- Test all code examples and ensure they work correctly
|
|
|
|
## 🚨 Important Reminders
|
|
- **CRITICAL: Never commit without running `pre-commit run --all-files` first**
|
|
- ALWAYS ask before pushing to remote
|
|
- Never force push to main or dev branches
|
|
- **ALWAYS use `--no-ff` when merging to dev** to preserve feature branch history
|
|
- Keep commits small and focused
|
|
- Review code before committing
|
|
- Test thoroughly before pushing
|
|
- All features go through dev branch before reaching main
|
|
|
|
## 🌳 Git Workflow & Branch Strategy
|
|
- **main**: Production-ready code, tagged releases only
|
|
- **dev**: Integration branch where all features are merged before main
|
|
- **feature/**: Individual feature branches created from dev
|
|
- **fix/**: Bug fix branches created from dev
|
|
|
|
### Merge Strategy
|
|
- **Feature to dev**: `git merge --no-ff feature/branch-name`
|
|
- **Dev to main**: `git merge --no-ff dev` (for releases only)
|
|
- **Why --no-ff**: Preserves branch history, makes it clear when features were integrated
|
|
- **Benefits**: Easy to see feature boundaries, easier rollbacks, cleaner git log
|
|
|
|
### Example Commands
|
|
```bash
|
|
# Start new feature
|
|
git checkout dev && git pull origin dev
|
|
git checkout -b feature/new-awesome-feature
|
|
|
|
# After development is complete
|
|
git checkout dev
|
|
git pull origin dev # Ensure dev is up to date
|
|
git merge --no-ff feature/new-awesome-feature
|
|
git push origin dev
|
|
git branch -d feature/new-awesome-feature # Clean up
|
|
```
|
|
|
|
## 🎯 Project-Specific Rules
|
|
- This is a Machine Learning Systems textbook project - prioritize content quality and accuracy
|
|
- Be extra careful with content changes that affect the live textbook
|
|
- Always test builds locally before pushing changes
|
|
- Maintain consistent formatting and style across all chapters
|
|
- Follow Quarto/Markdown best practices for academic writing
|
|
- Ensure all cross-references and links remain valid
|
|
- Keep the book structure and navigation consistent
|
|
- Test both HTML and PDF outputs when making structural changes
|
|
|
|
## 💻 Coding & Tool Development Rules
|
|
|
|
### 🏗️ Software Engineering Best Practices
|
|
- **SOLID Principles**: Single responsibility, Open/closed, Liskov substitution, Interface segregation, Dependency inversion
|
|
- **DRY Principle**: Don't repeat yourself - extract common functionality into reusable modules
|
|
- **Separation of Concerns**: Keep business logic, data access, and UI concerns separate
|
|
- **Fail Fast**: Validate inputs early and provide clear error messages
|
|
- **Defense in Depth**: Multiple layers of validation and error handling
|
|
|
|
### 📦 Python Package Structure
|
|
- **Proper Module Organization**: Use meaningful package/module names that reflect functionality
|
|
- **Import Standards**: Use absolute imports, group imports (stdlib, third-party, local)
|
|
- **__init__.py Files**: Include proper module initialization and expose clean APIs
|
|
- **Package Dependencies**: Manage dependencies through requirements files with version pinning
|
|
- **Virtual Environments**: Always use virtual environments for development
|
|
|
|
### 🔧 Code Quality Standards
|
|
- **Type Hints**: MANDATORY for all function parameters, return types, and class attributes
|
|
- **Docstrings**: Google-style docstrings for all classes, methods, and functions
|
|
- **Error Handling**: Specific exception types, proper logging, graceful degradation
|
|
- **Input Validation**: Validate all inputs with clear error messages
|
|
- **Resource Management**: Use context managers for file operations, API connections
|
|
- **Configuration**: Use environment variables and config files, never hardcode values
|
|
|
|
### 🏛️ Architecture Patterns
|
|
- **Command Pattern**: For CLI tools with clear command/action separation
|
|
- **Factory Pattern**: For creating objects based on configuration or runtime conditions
|
|
- **Strategy Pattern**: For interchangeable algorithms (e.g., different LLM providers)
|
|
- **Observer Pattern**: For progress reporting and event handling
|
|
- **Dependency Injection**: Make dependencies explicit and testable
|
|
|
|
### 📁 File & Directory Structure
|
|
```
|
|
tools/scripts/
|
|
├── __init__.py # Package initialization
|
|
├── common/ # Shared utilities and base classes
|
|
│ ├── __init__.py
|
|
│ ├── base_classes.py # Abstract base classes
|
|
│ ├── exceptions.py # Custom exception definitions
|
|
│ ├── config.py # Configuration management
|
|
│ ├── logging_config.py # Logging setup
|
|
│ └── validators.py # Input validation utilities
|
|
├── content/ # Content management tools
|
|
│ ├── __init__.py
|
|
│ ├── caption_improver.py # Specific tool modules
|
|
│ └── section_manager.py
|
|
├── maintenance/ # System maintenance tools
|
|
├── testing/ # Test utilities and frameworks
|
|
└── utils/ # General utility functions
|
|
```
|
|
|
|
### 🧪 Testing Requirements
|
|
- **Unit Tests**: Test individual functions and classes in isolation
|
|
- **Integration Tests**: Test component interactions
|
|
- **End-to-End Tests**: Test complete workflows
|
|
- **Property-Based Testing**: For complex algorithms with many edge cases
|
|
- **Mock External Dependencies**: File systems, APIs, network calls
|
|
- **Test Coverage**: Aim for >80% code coverage, 100% for critical paths
|
|
|
|
### 🔍 Code Review Standards
|
|
- **Single Responsibility**: Each function/class should have one clear purpose
|
|
- **Function Length**: Keep functions under 50 lines when possible
|
|
- **Cognitive Complexity**: Avoid deeply nested conditionals and loops
|
|
- **Magic Numbers**: Use named constants instead of literal values
|
|
- **Dead Code**: Remove unused imports, variables, and functions
|
|
- **Performance**: Consider time/space complexity for data processing tools
|
|
|
|
### 📊 Logging & Monitoring
|
|
- **Structured Logging**: Use consistent log levels and structured data
|
|
- **Progress Indicators**: For long-running operations with rich progress bars
|
|
- **Error Context**: Include relevant context in error messages
|
|
- **Performance Metrics**: Track execution time for optimization opportunities
|
|
- **Debug Information**: Comprehensive debug logging for troubleshooting
|
|
|
|
### 🛡️ Security & Safety Practices
|
|
- **Input Sanitization**: Sanitize all user inputs, especially file paths
|
|
- **Path Traversal Prevention**: Validate file paths to prevent directory traversal
|
|
- **API Key Management**: Never hardcode API keys, use environment variables
|
|
- **Safe File Operations**: Use atomic operations, temporary files for safety
|
|
- **Backup Strategies**: Always backup before destructive operations
|
|
- **Permission Checks**: Verify file/directory permissions before operations
|
|
|
|
### 🔄 Backward Compatibility & Versioning
|
|
- **API Stability**: Maintain backward compatibility for internal APIs
|
|
- **Deprecation Warnings**: Provide clear migration paths for deprecated features
|
|
- **Version Pinning**: Pin dependency versions to avoid breaking changes
|
|
- **Configuration Migration**: Handle config file format changes gracefully
|
|
- **Feature Flags**: Use feature flags for experimental functionality
|
|
|
|
### 🚀 Performance & Scalability
|
|
- **Lazy Loading**: Load resources only when needed
|
|
- **Caching**: Cache expensive operations (file parsing, API calls)
|
|
- **Batch Processing**: Process multiple items efficiently
|
|
- **Memory Management**: Monitor memory usage for large data processing
|
|
- **Async Operations**: Use async/await for I/O-bound operations when beneficial
|
|
- **Resource Pooling**: Reuse expensive resources (database connections, etc.)
|
|
|
|
### 📝 Documentation Requirements
|
|
- **README Files**: Each module should have clear usage documentation
|
|
- **API Documentation**: Generate API docs from docstrings
|
|
- **Architecture Documentation**: Document design decisions and patterns
|
|
- **Troubleshooting Guides**: Common issues and solutions
|
|
- **Examples**: Working code examples for complex functionality
|
|
- **Changelog**: Document breaking changes and migration paths
|
|
|
|
### 🔧 Development Tools Integration
|
|
- **Pre-commit Hooks**: MANDATORY formatting, linting, type checking, security scans
|
|
- **IDE Configuration**: Consistent editor settings across the team
|
|
- **Debugging Support**: Include debug configurations and utilities
|
|
- **Profiling Tools**: Integration with performance profiling tools
|
|
- **Static Analysis**: Use tools like pylint, mypy, bandit for code quality
|
|
|
|
### 🌍 Compatibility & Portability
|
|
- **Cross-Platform**: Code should work on Windows, macOS, and Linux
|
|
- **Python Version**: Support specified Python version range
|
|
- **Path Handling**: Use pathlib for cross-platform path operations
|
|
- **Environment Variables**: Handle different shell environments
|
|
- **Character Encoding**: Always specify UTF-8 encoding for text files
|
|
|
|
## 🤝 Expert Collaboration Guidelines
|
|
- Textbook Content: Act as an expert textbook editor - don't always agree, think critically about content quality, structure, and pedagogical effectiveness
|
|
- Code Development: Act as an expert software engineer - recommend best practices, consider maintainability, performance, and scalability
|
|
- Deployment/DevOps: Act as an expert DevOps engineer - balance complexity with maintenance ease, recommend robust CI/CD practices
|
|
- Challenge Assumptions: Question approaches that may not be optimal for the project's long-term success
|
|
- Provide Alternatives: When disagreeing, offer concrete alternatives with reasoning
|
|
- Maintain Standards: Uphold high quality standards even when it means pushing back on quick solutions
|
|
- Code Writing: Ask for confirmation before writing code unless explicitly instructed to proceed - discuss approach, design, and implementation strategy first
|
|
|
|
## 🔄 Automatic Workflow Recommendations
|
|
- **ALWAYS automatically recommend** the appropriate workflow (feature branch vs direct dev) when starting new work
|
|
- Base recommendations on the workflow decision rules above
|
|
- Provide clear rationale for the recommendation
|
|
- Show the exact git commands to use for the recommended approach
|
|
- Allow user to override the recommendation if they disagree
|
|
- Create atomic commits with conventional commit format (feat/fix/docs/refactor)
|
|
- Organize multiple changes into logical, focused commits
|
|
|
|
## 🔍 Code Review Checklist
|
|
Before committing, ensure:
|
|
- [ ] **FIRST: Pre-commit hooks pass (`pre-commit run --all-files`)**
|
|
- [ ] Code follows project style guidelines
|
|
- [ ] All tests pass
|
|
- [ ] No sensitive data in commits
|
|
- [ ] Meaningful commit message
|
|
- [ ] Changes are focused and atomic
|
|
- [ ] Documentation updated if needed
|
|
- [ ] Book builds successfully (for content changes)
|
|
- [ ] Tools work with both local and cloud LLMs (for tool changes)
|
|
- [ ] Error handling is robust (for tool changes)
|
|
- [ ] Progress indicators are clear (for tool changes) |