Files
cs249r_book/book/tools/scripts/README.md

78 lines
3.3 KiB
Markdown

# Scripts Directory
Automation scripts and tools for the Machine Learning Systems textbook.
## Deprecation Note
For workflows now exposed by Binder, prefer `./book/binder ...` commands over direct script execution.
- Validation checks: use `./book/binder validate ...`
- Maintenance utilities: use `./book/binder maintain ...`
Scripts remain available as internal utilities, but direct invocation is soft-deprecated for Binder-covered tasks.
## Directory Structure
```
scripts/
├── common/ Shared base classes, config, logging, validators
├── content/ Content validation, formatting, and editing tools
├── docs/ Script documentation
├── genai/ AI-assisted tools (quizzes, footnotes, dash fixes)
├── glossary/ Glossary generation and consolidation
├── images/ Image processing, compression, validation
├── infrastructure/ CI/CD and Docker utilities
├── maintenance/ Repo health, image casing, build artifact cleanup
├── publish/ MIT Press release builder, figure extraction, deployment
├── socratiQ/ SocratiQ integration
├── testing/ Debug builds, test runners, linters
└── utilities/ Footnote analysis, ref auditing, JSON/EPUB validation
```
## Key Scripts by Task
### Content Editing
- `content/format_blank_lines.py` - Normalize blank lines in .qmd files
- `content/format_tables.py` - Format Quarto tables
- `content/section_splitter.py` - Split chapters into sections for processing
- `content/relocate_figures.py` - Move figures closer to first reference
- `content/manage_section_ids.py` - Manage `@sec-` cross-reference IDs
### Validation
- **Reference check** — `./book/binder validate references` (native CLI; validates .bib vs academic DBs via [hallucinator](https://github.com/gianlucasb/hallucinator)). See [README_REFERENCE_CHECK.md](README_REFERENCE_CHECK.md).
- `content/check_duplicate_labels.py` - Find duplicate labels
- `content/check_fig_references.py` - Validate figure references
- `content/check_unreferenced_labels.py` - Find unused labels
- `content/validate_citations.py` - Check citation formatting
- `utilities/validate_epub.py` - Validate EPUB output
- `utilities/validate_json.py` - Validate JSON files
### Publishing
- `publish/mit-press-release.sh` - Build MIT Press PDFs (regular or copy-edit)
- `publish/extract_figures.py` - Extract figure lists for MIT Press submission
- `publish/publish.sh` - Full release workflow with versioning
- `publish/render_compress_publish.py` - Render, compress, and publish
### Images
- `images/compress_images.py` - Compress images for web/PDF
- `images/validate_image_references.py` - Check image references
- `images/convert_svg_to_png.py` - SVG to PNG conversion
### Glossary
- `glossary/build_global_glossary.py` - Build master glossary from chapters
- `glossary/consolidate_similar_terms.py` - Merge near-duplicate terms
### AI Tools
- `genai/quizzes.py` - Generate quiz questions
- `genai/footnote_assistant.py` - AI-assisted footnote writing
## Usage
All Python scripts use `python3`. Most support `--help` for options.
```bash
python3 book/tools/scripts/content/format_blank_lines.py path/to/file.qmd
python3 book/tools/scripts/publish/extract_figures.py --vol 1
./book/tools/scripts/publish/mit-press-release.sh --vol1
```