cs249r_book/scripts/README.md
Vijay Janapa Reddi 496e728135 fix(bib): restore vol1/vol2 references.bib after title-mangling regression
Commit 42bc54275 (figure-audit feat) inadvertently ran a tool that
broke BibTeX title syntax across hundreds of entries: e.g.
'{TensorFlow: Large-Scale...}' became '{{TensorFlow}}: {Large}-Scale...}',
producing unbalanced braces that caused the bib_lint parser to
truncate parsing partway through the entry. This surfaced in
pre-commit as 772 'missing required field' violations.

Restoring vol1+vol2 references.bib to the pre-mangling state
(9ebdf77d0) preserves all legitimate citation work from earlier
commits while undoing the unintended damage. The mechanical
formatter and bibtex-tidy hooks then re-emit a stable form.

Also: trailing newline added to scripts/README.md by pre-commit's
end-of-file-fixer.
2026-04-27 15:11:37 -04:00

# Figure Audit Automation
This directory contains `figure_audit.py`, a script that automates visual audits of the figures in the ML Systems textbook.
## What it does
The script orchestrates a multimodal audit of every figure across Volume 1 and Volume 2 of the textbook. It checks that the prose, the captions (`fig-cap`), and the alt text (`fig-alt`) match the content of the fully rendered images.
1. **Discovery:** It scans the `book/quarto/contents/` directory for `.qmd` chapters that contain figures.
2. **Visual Extraction:** For each chapter, it resolves the corresponding published HTML URL, parses the HTML, and saves the rendered `<img src="...">` images and inline `<svg>` elements locally.
3. **Auditing:** It dispatches parallel worker tasks via the `gemini` CLI. Each task is instructed to load the local images, compare them directly against the `.qmd` source text, and evaluate them against the `figure-audit-brief.md` rubric.
4. **Reporting:** It writes YAML reports to `.claude/_reviews/Figure Audit/` that detail each misalignment (e.g., the text claims $10^4$ but the chart shows $10^3$) along with a minimal `.qmd` fix recommendation.
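
The first two steps can be sketched roughly as follows. This is an illustration only, not the code in `figure_audit.py`: the function and class names are hypothetical, and it assumes a chapter "contains figures" when its source mentions `fig-cap` or `fig-alt`.

```python
from html.parser import HTMLParser
from pathlib import Path


def find_figure_chapters(contents_dir):
    """Return .qmd files under contents_dir that mention fig-cap or fig-alt.

    Assumption: these two attribute names are a good enough signal that a
    chapter contains figures. The real script may use a different test.
    """
    markers = ("fig-cap", "fig-alt")
    chapters = []
    for path in sorted(Path(contents_dir).rglob("*.qmd")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if any(marker in text for marker in markers):
            chapters.append(path)
    return chapters


class FigureAssetCollector(HTMLParser):
    """Collect <img src="..."> values and count <svg> start tags."""

    def __init__(self):
        super().__init__()
        self.image_sources = []
        self.inline_svg_count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_sources.append(src)
        elif tag == "svg":
            # Nested <svg> elements are counted individually.
            self.inline_svg_count += 1


def collect_assets(html_text):
    """Return (list of img src values, number of <svg> start tags)."""
    collector = FigureAssetCollector()
    collector.feed(html_text)
    return collector.image_sources, collector.inline_svg_count
```

Calling `find_figure_chapters("book/quarto/contents")` from the repository root would list candidate chapters, and `collect_assets(html)` would list what to download from one rendered page. Fetching the page and saving the files are left out of the sketch.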
## How to use it
Run the script from the repository root:
```bash
python3 scripts/figure_audit.py
```
### Prerequisites
* You must have the `gemini` CLI installed and authenticated on your local machine.
* The script assumes the rendered HTML book is available at `https://harvard-edge.github.io/cs249r_book_dev/...` (used only to scrape the final rendered images).
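
A quick pre-flight check for the first requirement might look like the sketch below. The helper name is hypothetical, and the check only confirms that a `gemini` executable is on `PATH`; it does not verify authentication.

```python
import shutil


def missing_prerequisites(commands=("gemini",)):
    """Return the names from `commands` that are not found on PATH."""
    return [name for name in commands if shutil.which(name) is None]


missing = missing_prerequisites()
if missing:
    print("Missing required commands:", ", ".join(missing))
```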
### Applying the fixes
Once `figure_audit.py` finishes running, your `.claude/_reviews/Figure Audit/` directory will be populated with `.yml` files containing `proposed_fix` entries.
Each fix is a precise, minimal adjustment targeting a `.qmd` source file. A human reviewing the YAML reports can apply the fixes manually, or a script or agent can parse the reports and apply the changes across the workspace.
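
As a starting point for programmatic use, the sketch below collects every `proposed_fix` value from the reports. The report schema is not documented here, so the code makes no assumption about where `proposed_fix` sits and searches the parsed YAML at any depth. The function names are hypothetical, and `load_fixes` assumes the third-party PyYAML package is installed.

```python
from pathlib import Path


def iter_proposed_fixes(node):
    """Yield every value stored under a 'proposed_fix' key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "proposed_fix":
                yield value
            else:
                yield from iter_proposed_fixes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_proposed_fixes(item)


def load_fixes(report_dir=".claude/_reviews/Figure Audit"):
    """Map each .yml report name to the proposed_fix values it contains."""
    import yaml  # third-party dependency (PyYAML), imported only when needed

    fixes = {}
    for report in sorted(Path(report_dir).glob("*.yml")):
        with report.open(encoding="utf-8") as handle:
            data = yaml.safe_load(handle)
        fixes[report.name] = list(iter_proposed_fixes(data))
    return fixes
```

Running `load_fixes()` from the repository root returns a dictionary keyed by report filename, which makes it easy to review the fixes in one place before deciding how to apply them.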