mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-07 18:18:42 -05:00
# Figure Audit Automation

This directory contains `figure_audit.py`, a script designed to automate the visual auditing of figures within the ML Systems textbook.

## What it does

The script orchestrates a multimodal audit of every figure across Volume 1 and Volume 2 of the textbook. It ensures that the prose, the captions (`fig-cap`), and the alt-text (`fig-alt`) precisely match the content of the fully rendered visual images.

1. **Discovery:** It scans the `book/quarto/contents/` directory to identify all `.qmd` chapters containing figures.
2. **Visual Extraction:** It resolves the corresponding published HTML URL for each chapter, parses the HTML, and downloads the exact rendered `<img src="...">` and inline `<svg>` visual assets locally.
3. **Auditing:** It dispatches parallel worker tasks via the `gemini` CLI. The CLI is given explicit instructions to load the local images visually, compare them directly against the `.qmd` source text, and evaluate them based on the `figure-audit-brief.md` rubric.
4. **Reporting:** It generates strict, granular YAML output files in `.claude/_reviews/Figure Audit/`, detailing any misalignments (e.g., the text claims $10^4$ but the chart shows $10^3$) along with surgically precise `.qmd` fix recommendations.
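
The discovery and visual extraction steps can be sketched with the Python standard library alone. This is a minimal illustration, not the code in `figure_audit.py`: the names `find_figure_chapters`, `FigureAssetParser`, and `extract_assets` are hypothetical, and the rule for what counts as a figure (a `fig-cap` attribute or a `#fig-` label in the `.qmd` text) is an assumption.

```python
import re
from html.parser import HTMLParser
from pathlib import Path

# Assumption: a chapter "contains figures" if it mentions `fig-cap` or a `#fig-` label.
FIGURE_MARKER = re.compile(r"fig-cap|#fig-")


def find_figure_chapters(root):
    """Return sorted .qmd paths under `root` whose text contains a figure marker."""
    return sorted(
        path
        for path in Path(root).rglob("*.qmd")
        if FIGURE_MARKER.search(path.read_text(encoding="utf-8"))
    )


class FigureAssetParser(HTMLParser):
    """Collect <img src> values and count inline <svg> elements in rendered HTML."""

    def __init__(self):
        super().__init__()
        self.image_sources = []
        self.svg_count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_sources.append(src)
        elif tag == "svg":
            self.svg_count += 1


def extract_assets(html_text):
    """Return (list of <img> src values, number of inline <svg> elements)."""
    parser = FigureAssetParser()
    parser.feed(html_text)
    return parser.image_sources, parser.svg_count
```

Fetching the published page, downloading each collected `src`, and saving inline SVG markup to disk are left out of this sketch; how the real script does those steps is not described here.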

## How to use it

Run the script from the repository root:

```bash
python3 scripts/figure_audit.py
```

### Prerequisites

* You must have the `gemini` CLI installed and authenticated on your local machine.
* The script assumes the rendered HTML book is available at `https://harvard-edge.github.io/cs249r_book_dev/...` (used purely to scrape the final image variants).
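
Before a long run, a quick check that the CLI is on your `PATH` can save a failed audit. The helper below is a hypothetical sketch (the name `missing_tools` is not part of `figure_audit.py`), and it only confirms that the executable is installed; it cannot tell whether the CLI is authenticated.

```python
import shutil


def missing_tools(required=("gemini",)):
    """Return the names in `required` that cannot be found on PATH."""
    return [name for name in required if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        raise SystemExit(f"Missing required tools: {', '.join(missing)}")
    print("All required tools found on PATH.")
```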

### Applying the fixes

Once `figure_audit.py` finishes running, your `.claude/_reviews/Figure Audit/` directory will be populated with `.yml` files containing `proposed_fix` entries.

These fixes are written as precise, minimal adjustments targeting the `.qmd` source files. They can either be applied manually by a human reviewing the YAML reports, or parsed programmatically/agentically to apply the diffs across the workspace.
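
A programmatic pass over the reports might look like the sketch below. The report schema is not documented here, so this assumes each finding has already been loaded into a dict (for example with a YAML parser such as PyYAML's `yaml.safe_load`) and that it carries a hypothetical `original` key holding the exact text to replace, alongside `proposed_fix`. Only `proposed_fix` is named in the reports; adjust the rest to the real field names.

```python
def apply_findings(source_text, findings):
    """Apply each finding in order and return (new_text, skipped_findings).

    A finding is applied only when its `original` text occurs exactly once,
    so ambiguous or stale fixes are left for manual review.
    """
    skipped = []
    for finding in findings:
        original = finding["original"]  # hypothetical key, not confirmed by the reports
        if source_text.count(original) == 1:
            source_text = source_text.replace(original, finding["proposed_fix"])
        else:
            skipped.append(finding)
    return source_text, skipped


if __name__ == "__main__":
    text = 'fig-cap="Throughput grows to $10^4$ requests."\n'
    findings = [{"original": "$10^4$", "proposed_fix": "$10^3$"}]
    updated, skipped = apply_findings(text, findings)
    print(updated, end="")
    print(f"skipped: {len(skipped)}")
```

Requiring exactly one match is a deliberately conservative choice: a fix whose target text is missing or repeated is skipped, not guessed at, which keeps the `.qmd` edits minimal.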