# Scripts Directory
This directory contains various Python scripts used for book maintenance and processing.

## Available Scripts
### Figure Caption Improvement
The `improve_figure_captions.py` script provides automated caption enhancement using local Ollama LLM models:

```bash
# Improve all captions (recommended)
python3 scripts/improve_figure_captions.py -d contents/core/

# Analysis and utilities
python3 scripts/improve_figure_captions.py --analyze -d contents/core/
python3 scripts/improve_figure_captions.py --build-map -d contents/core/
```
📖 **Full documentation**: See [`FIGURE_CAPTIONS.md`](FIGURE_CAPTIONS.md) for complete usage guide, model selection, and troubleshooting.
### Cross-Reference Generation

The `cross_refs/` directory contains scripts for generating AI-powered cross-references with explanations.

📖 **Full documentation**: See [`cross_refs/RECIPE.md`](/tools/scripts/cross_refs/RECIPE.md) for the complete workflow.
## Python Dependencies
All Python dependencies are managed through the root-level `requirements.txt` file. This ensures consistent package versions across all scripts and the GitHub Actions workflow.

### Adding New Dependencies

When adding new Python scripts that require external packages:

1. Add the required packages to `requirements.txt` at the project root
2. Include version constraints where appropriate (e.g., `>=1.0.0`)
3. Add comments to group related packages
4. Test locally with: `pip install -r requirements.txt`
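
For example, a hypothetical new script that needs the `requests` package might add an entry like this (the package name, version, and comment below are illustrative, not part of the current file):

```text
# HTTP utilities (illustrative example entry)
requests>=2.31.0
```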
### Current Dependencies
The current dependencies include:

- **Quarto/Jupyter**: `jupyterlab-quarto`, `jupyter`
- **NLP**: `nltk` (with stopwords and punkt data)
- **AI Integration**: `openai`, `gradio`
- **Document Processing**: `pybtex`, `pypandoc`, `pyyaml`
- **Image Processing**: `Pillow`
- **Validation**: `jsonschema`
- **Utilities**: `absl-py`
### Subdirectory Requirements Files

Some subdirectories have their own `requirements.txt` files for specific workflows:

- `scripts/genai/requirements.txt` - AI-specific dependencies
- `scripts/publish/requirements.txt` - Publishing dependencies

These are kept for reference, but the main workflow uses the root `requirements.txt`.
### GitHub Actions Integration

The GitHub Actions workflow automatically:

1. Caches Python packages for faster builds
2. Installs all dependencies from `requirements.txt`
3. Downloads required NLTK data
4. Reports cache status in build summaries
Cache is invalidated when `requirements.txt` changes, ensuring dependencies stay up-to-date.
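
The repository's actual workflow files define the details; as a sketch only, a pip-cache step keyed on `requirements.txt` typically looks like this (the step name and cache path are assumptions, not copied from this repo's workflow):

```yaml
- name: Cache pip packages
  uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    # The key changes whenever requirements.txt changes, invalidating the cache
    key: pip-${{ runner.os }}-${{ hashFiles('requirements.txt') }}
```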
## Pre-commit Setup
The project uses pre-commit hooks for code quality checks. The hooks run automatically on commit and include:

- **Spell checking** with codespell
- **YAML validation** for `_quarto-html.yml` and `_quarto-pdf.yml`
- **Markdown formatting** and linting
- **Bibliography formatting** with bibtex-tidy
- **Custom Python scripts** for section ID management and unreferenced label detection
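
The actual hook scripts live elsewhere in this repo; purely as an illustration of the idea, unreferenced-label detection for Quarto sources can be sketched like this (the function name and label patterns are assumptions based on Quarto's `{#sec-...}` / `@sec-...` conventions, not the real hook code):

```python
import re

def find_unreferenced_labels(text: str) -> set[str]:
    """Return labels that are defined (e.g. {#sec-intro}) but never cited (@sec-intro).

    Illustrative sketch only: assumes Quarto-style sec/fig/tbl label prefixes.
    """
    # Definitions look like {#sec-intro}, {#fig-overview}, {#tbl-results}
    defined = set(re.findall(r"\{#((?:sec|fig|tbl)-[\w-]+)\}", text))
    # References look like @sec-intro, @fig-overview, @tbl-results
    referenced = set(re.findall(r"@((?:sec|fig|tbl)-[\w-]+)", text))
    return defined - referenced
```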
### Setup Instructions
1. **Install pre-commit** (included in `requirements.txt`):

   ```bash
   pip install -r requirements.txt
   ```

2. **Install the git hooks**:

   ```bash
   pre-commit install
   ```

3. **Run manually** (optional):

   ```bash
   # Run on all files
   pre-commit run --all-files

   # Run on specific files
   pre-commit run --files path/to/file.qmd
   ```
### Troubleshooting
- **NLTK data issues**: The hooks automatically download required NLTK data, but if you encounter issues, you can run the download manually:

  ```python
  import nltk
  nltk.download('stopwords')
  nltk.download('punkt')
  ```

- **Python environment**: The hooks use isolated Python environments with the specified dependencies, so they should work regardless of your local Python setup.