cs249r_book/scripts/README.md
Vijay Janapa Reddi 496e728135 fix(bib): restore vol1/vol2 references.bib after title-mangling regression
Commit 42bc54275 (figure-audit feat) inadvertently ran a tool that
broke BibTeX title syntax across hundreds of entries: e.g.
'{TensorFlow: Large-Scale...}' became '{{TensorFlow}}: {Large}-Scale...}',
producing unbalanced braces that caused the bib_lint parser to
truncate parsing partway through the entry. This surfaced in
pre-commit as 772 'missing required field' violations.

Restoring vol1+vol2 references.bib to the pre-mangling state
(9ebdf77d0) preserves all legitimate citation work from earlier
commits while undoing the unintended damage. The mechanical
formatter and bibtex-tidy hooks then re-emit a stable form.

Also: trailing newline added to scripts/README.md by pre-commit's
end-of-file-fixer.
2026-04-27 15:11:37 -04:00


Figure Audit Automation

This directory contains figure_audit.py, a script that automates the visual audit of figures in the ML Systems textbook.

What it does

The script orchestrates a multimodal audit of every figure in Volume 1 and Volume 2 of the textbook. It checks that the prose, the captions (fig-cap), and the alt text (fig-alt) match what the fully rendered images actually show.

  1. Discovery: It scans the book/quarto/contents/ directory to identify all .qmd chapters containing figures.
  2. Visual Extraction: It resolves the published HTML URL for each chapter, parses the HTML, and downloads the rendered <img src="..."> images and inline <svg> elements to local files.
  3. Auditing: It dispatches parallel worker tasks via the gemini CLI. Each task is instructed to load the local images, compare them against the .qmd source text, and evaluate them against the figure-audit-brief.md rubric.
  4. Reporting: It writes YAML report files (.yml) to .claude/_reviews/Figure Audit/, detailing each misalignment (e.g., the text claims 10^4 but the chart shows 10^3) along with a precise, minimal .qmd fix recommendation.
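
As an illustration of the Discovery step only, here is a minimal sketch of how chapters with figures might be located. It is not the code in figure_audit.py: the function name find_figure_chapters and the rule "a chapter has figures if it sets fig-cap or fig-alt" are assumptions made for this example.

```python
import re
from pathlib import Path

# Assumption: a chapter "contains figures" if it sets fig-cap or fig-alt,
# either as a cell option (fig-cap: ...) or as an attribute (fig-cap="...").
# The real script may use a different test.
FIGURE_MARKER = re.compile(r"fig-(?:cap|alt)\s*[:=]")


def find_figure_chapters(contents_dir):
    """Return the sorted .qmd paths under contents_dir that mention a figure."""
    chapters = []
    for qmd in sorted(Path(contents_dir).rglob("*.qmd")):
        text = qmd.read_text(encoding="utf-8", errors="replace")
        if FIGURE_MARKER.search(text):
            chapters.append(qmd)
    return chapters


if __name__ == "__main__":
    # Path taken from the Discovery step above; run from the repository root.
    for path in find_figure_chapters("book/quarto/contents"):
        print(path)
```

A text-level check like this is cheap and needs no Quarto tooling, but it can miss figures declared in other ways, so treat it as a starting point.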

How to use it

Run the script from the repository root:

python3 scripts/figure_audit.py

Prerequisites

  • You must have the gemini CLI installed and authenticated on your local machine.
  • The script assumes the rendered HTML book is available at https://harvard-edge.github.io/cs249r_book_dev/... (it is used only to scrape the final rendered images).
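
To make the scraping assumption concrete, the sketch below shows one way rendered figure assets could be listed from a chapter's HTML using only the standard library. The class FigureAssetParser, the helper list_assets, and the example.org URL are hypothetical; the actual script may parse pages differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FigureAssetParser(HTMLParser):
    """Collect absolute <img src> URLs and count outermost inline <svg> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []
        self.svg_count = 0
        self._svg_depth = 0  # tracks nesting so nested <svg> is not double counted

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))
        elif tag == "svg":
            if self._svg_depth == 0:
                self.svg_count += 1
            self._svg_depth += 1

    def handle_endtag(self, tag):
        if tag == "svg" and self._svg_depth > 0:
            self._svg_depth -= 1


def list_assets(html, base_url):
    """Return (image_urls, svg_count) for one rendered chapter page."""
    parser = FigureAssetParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.image_urls, parser.svg_count


if __name__ == "__main__":
    # Placeholder page and URL, not the real book location.
    page = '<figure><img src="images/plot.png"></figure><svg></svg>'
    print(list_assets(page, "https://example.org/book/chapter/index.html"))
```

Downloading the listed URLs is left out here; that step depends on the published site being reachable from the machine running the audit.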

Applying the fixes

When figure_audit.py finishes, the .claude/_reviews/Figure Audit/ directory contains .yml files with proposed_fix entries.

These fixes are written as precise, minimal edits to the .qmd source files. A human can apply them manually while reviewing the YAML reports, or a script or agent can parse the reports and apply the diffs across the workspace.
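
Before applying anything, it can help to see which reports actually carry fixes. The sketch below is a hedged example that counts proposed_fix keys per report with a plain text scan. The report schema is not documented here, so the function count_proposed_fixes and the assumption that the key starts a line (optionally after indentation and a list dash) are illustrative only; a real YAML parser would be more robust once the schema is known.

```python
import re
from pathlib import Path

# Assumption: "proposed_fix:" appears at the start of a line, optionally
# after indentation and a YAML list dash. The real report layout may differ.
PROPOSED_FIX_KEY = re.compile(r"^[ \t]*(?:-[ \t]+)?proposed_fix[ \t]*:", re.MULTILINE)


def count_proposed_fixes(review_dir):
    """Map each .yml report under review_dir to its number of proposed_fix keys."""
    root = Path(review_dir)
    counts = {}
    for report in sorted(root.rglob("*.yml")):
        text = report.read_text(encoding="utf-8", errors="replace")
        n = len(PROPOSED_FIX_KEY.findall(text))
        if n:
            counts[report.relative_to(root).as_posix()] = n
    return counts


if __name__ == "__main__":
    # Directory name taken from the text above; run from the repository root.
    for name, n in count_proposed_fixes(".claude/_reviews/Figure Audit").items():
        print(f"{n:3d}  {name}")
```

Reports with no proposed_fix keys are omitted from the result, which gives a quick list of the chapters that need attention.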