# Scripts Directory
This directory contains various Python scripts used for book maintenance and processing.
## Available Scripts
### Figure Caption Improvement
The `improve_figure_captions.py` script provides automated caption enhancement using local Ollama LLM models:
```bash
# Improve all captions (recommended)
python3 scripts/improve_figure_captions.py -d contents/core/

# Analysis and utilities
python3 scripts/improve_figure_captions.py --analyze -d contents/core/
python3 scripts/improve_figure_captions.py --build-map -d contents/core/
```
📖 **Full documentation:** See `FIGURE_CAPTIONS.md` for the complete usage guide, model selection, and troubleshooting.
### Cross-Reference Generation
The `cross_refs/` directory contains scripts for generating AI-powered cross-references with explanations.
📖 **Full documentation:** See `cross_refs/RECIPE.md` for the complete workflow.
## Python Dependencies
All Python dependencies are managed through the root-level `requirements.txt` file. This ensures consistent package versions across all scripts and the GitHub Actions workflow.
### Adding New Dependencies
When adding new Python scripts that require external packages:
- Add the required packages to `requirements.txt` at the project root
- Include version constraints where appropriate (e.g., `>=1.0.0`)
- Add comments to group related packages
- Test locally with `pip install -r requirements.txt`
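A new entry might look like the following sketch (the package name and version constraint here are illustrative, not actual project requirements):

```text
# Document processing
pypandoc>=1.11

# Hypothetical new dependency for an image-diffing script
imagehash>=4.3
```

Grouping related packages under a comment keeps the file scannable as the list of scripts grows.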
### Current Dependencies
The current dependencies include:
- **Quarto/Jupyter**: `jupyterlab-quarto`, `jupyter`
- **NLP**: `nltk` (with stopwords and punkt data)
- **AI Integration**: `openai`, `gradio`
- **Document Processing**: `pybtex`, `pypandoc`, `pyyaml`
- **Image Processing**: `Pillow`
- **Validation**: `jsonschema`
- **Utilities**: `absl-py`
### Subdirectory Requirements Files
Some subdirectories have their own `requirements.txt` files for specific workflows:
- `scripts/genai/requirements.txt` - AI-specific dependencies
- `scripts/publish/requirements.txt` - Publishing dependencies
These are kept for reference, but the main workflow uses the root `requirements.txt`.
## GitHub Actions Integration
The GitHub Actions workflow automatically:
- Caches Python packages for faster builds
- Installs all dependencies from `requirements.txt`
- Downloads required NLTK data
- Reports cache status in build summaries
The cache is invalidated whenever `requirements.txt` changes, ensuring dependencies stay up to date.
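A workflow step implementing this caching pattern could look like the sketch below; the step names, Python version, and action version are illustrative, and the actual workflow file in this repository may differ:

```yaml
- name: Set up Python
  uses: actions/setup-python@v5
  with:
    python-version: "3.11"
    cache: pip          # cache key is derived from requirements.txt

- name: Install dependencies
  run: pip install -r requirements.txt

- name: Download NLTK data
  run: python -m nltk.downloader stopwords punkt
```

Because `actions/setup-python` keys its pip cache on the requirements file, editing `requirements.txt` automatically produces a fresh cache on the next run.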
## Pre-commit Setup
The project uses pre-commit hooks for code quality checks. The hooks run automatically on commit and include:
- Spell checking with `codespell`
- YAML validation for `_quarto-html.yml` and `_quarto-pdf.yml`
- Markdown formatting and linting
- Bibliography formatting with `bibtex-tidy`
- Custom Python scripts for section ID management and unreferenced label detection
### Setup Instructions
1. Install pre-commit (included in `requirements.txt`):

   ```bash
   pip install -r requirements.txt
   ```

2. Install the git hooks:

   ```bash
   pre-commit install
   ```

3. Run manually (optional):

   ```bash
   # Run on all files
   pre-commit run --all-files

   # Run on specific files
   pre-commit run --files path/to/file.qmd
   ```
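The hooks themselves are declared in `.pre-commit-config.yaml` at the repository root. As a rough sketch of the shape of such a configuration (the revision pin and the custom hook ID, script path, and file pattern below are hypothetical; consult the actual file for the real definitions):

```yaml
repos:
  - repo: https://github.com/codespell-project/codespell
    rev: v2.2.6            # illustrative pin
    hooks:
      - id: codespell

  - repo: local
    hooks:
      - id: check-section-ids          # hypothetical custom hook
        name: Check section IDs
        entry: python3 scripts/check_section_ids.py
        language: python
        files: \.qmd$
```

Each hook runs in its own isolated environment that pre-commit builds from this file, which is why local Python setup rarely matters.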
### Troubleshooting
- **NLTK data issues**: The hooks automatically download required NLTK data, but if you encounter issues, you can run the downloads manually:

  ```python
  import nltk
  nltk.download('stopwords')
  nltk.download('punkt')
  ```

- **Python environment**: The hooks use isolated Python environments with the specified dependencies, so they should work regardless of your local Python setup.