cs249r_book/CITATION.bib
Vijay Janapa Reddi b79e6dc882 pass 16 bib: pilot sweep (vol1 + CITATION) + bib_lint infrastructure
Three deliverables in one commit because they are co-dependent:

1. bib_lint.py — repo-wide BibTeX parser, validator, and formatter

   New tool at book/tools/bib_lint.py: a stateful parser (not
   regex-based) that handles nested braces in titles (LaTeX accents
   such as \'{e}, \^{o}, \"{o}), double-quoted vs. brace-quoted field
   values, biblatex date-vs-year equivalence, and the long author
   lists common in multi-author ML papers (Theano, Habitat, etc.).
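
   The brace handling described above can be sketched as a small
   depth-tracking reader. This is a minimal illustration, assuming the
   value starts at an opening { or "; read_field_value is a
   hypothetical name, not the actual bib_lint.py API.

```python
def read_field_value(text: str, start: int) -> tuple[str, int]:
    r"""Return (value, end) for the BibTeX field value opening at text[start].

    text[start] must be '{' or '"'. Nested brace groups such as {\'e} or
    {GPU} are kept intact; a backslash skips the next character so \{ and \}
    do not change the nesting depth. `end` is the index just past the closer.
    """
    opener = text[start]
    if opener not in '{"':
        raise ValueError("field value must start with '{' or '\"'")
    depth = 0  # depth of brace groups *inside* the value
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character (e.g. \' or \{)
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                if opener == "{":
                    return text[start + 1 : i], i + 1
                raise ValueError("unbalanced '}' in quoted value")
            depth -= 1
        elif ch == '"' and opener == '"' and depth == 0:
            return text[start + 1 : i], i + 1
        i += 1
    raise ValueError("unterminated field value")
```

   On title = {Caf{\'e} {GPU} Systems}, this returns the whole title
   with its inner groups, whereas a naive \{([^}]*)\} regex would stop
   at the first closing brace.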

   Enforces §5 Bibliography Hygiene rules:
     - Required fields per entry type (publisher, journal, booktitle,
       year, author, title, institution, school as applicable)
     - Forbidden fields dropped (organization, address — per MIT
       Press round 1 cleanup)
     - Journal names spelled out (flags abbreviations such as
       J. Mach. Learn. Res.)
     - Author list rules (no et al., no em-dash shorthand, warn on
       initial-only first names)
     - Pages format (require --, not a single hyphen)
     - DOI format (bare, no https:// prefix)
     - x-verified ISO-8601 date validation
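
   Several of the rules above are simple per-field checks. The sketch
   below is a minimal illustration, assuming fields have already been
   parsed into a dict of lowercase names to string values; check_fields
   and its messages are hypothetical, and it covers only the pages,
   DOI, et al., and x-verified rules.

```python
import datetime
import re


def check_fields(fields: dict[str, str]) -> list[str]:
    """Return human-readable problems for a small subset of the hygiene rules."""
    problems = []

    # Page ranges need "--" (an en-dash in BibTeX), not a single hyphen.
    pages = fields.get("pages", "")
    if re.fullmatch(r"\w+-\w+", pages):
        problems.append("pages: use '--' between page numbers, not '-'")

    # DOIs are stored bare (10.xxxx/...), without a resolver URL in front.
    doi = fields.get("doi", "")
    if re.match(r"https?://", doi):
        problems.append("doi: drop the https:// prefix and keep the bare DOI")

    # Author lists must be complete; "et al." is not allowed.
    if re.search(r"\bet al\b", fields.get("author", "")):
        problems.append("author: list every author instead of 'et al.'")

    # x-verified must parse as an ISO-8601 calendar date such as 2026-04-08.
    verified = fields.get("x-verified")
    if verified is not None:
        try:
            datetime.date.fromisoformat(verified)
        except ValueError:
            problems.append("x-verified: expected an ISO-8601 date (YYYY-MM-DD)")

    return problems
```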

   Provides bib_lint.apply_fields() as the SAFE alternative to
   regex-based field insertion. Used by the parallel-agent sweep
   apply pipeline to insert verified metadata without risk of
   mangling titles containing braces.
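
   The parse-then-merge idea can be sketched as follows. This is a
   hypothetical illustration, not the actual signature of
   bib_lint.apply_fields(): it assumes entries are already parsed into
   dicts, refuses new values with unbalanced braces, and re-emits every
   value inside exactly one pair of braces.

```python
def braces_balanced(value: str) -> bool:
    """True if every '{' in value has a matching '}' (backslash escapes skipped)."""
    depth = 0
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\":
            i += 2  # an escaped brace does not count
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
        i += 1
    return depth == 0


def merge_fields(entry: dict[str, str], updates: dict[str, str],
                 overwrite: bool = False) -> dict[str, str]:
    r"""Return a copy of entry with verified updates merged in.

    Values are handled as parsed strings, so an existing title such as
    Caf{\'e} is never touched by a text substitution.
    """
    merged = dict(entry)
    for name, value in updates.items():
        name = name.lower()
        if not braces_balanced(value):
            raise ValueError(f"unbalanced braces in {name!r}")
        if name in merged and not overwrite:
            continue  # keep the existing value unless explicitly overwriting
        merged[name] = value
    return merged


def serialize(entry_type: str, key: str, fields: dict[str, str]) -> str:
    """Render one entry with two-space indentation and brace-quoted values."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)
```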

   CLI modes: --check (validate, exit non-zero on new errors),
   --fix (rewrite to canonical form), --report (detailed violations,
   default), --baseline (regenerate the grandfather allow-list).
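
   One way to wire the four modes is an argparse mutually exclusive
   group. This is a sketch under assumptions: the flag names come from
   this commit message, but the positional files argument and the
   exit_code helper are hypothetical, and the real script may be
   structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """CLI with four mutually exclusive modes; --report is the default."""
    parser = argparse.ArgumentParser(prog="bib_lint.py")
    mode = parser.add_mutually_exclusive_group()
    for flag, help_text in [
        ("--check", "validate; exit non-zero on new errors"),
        ("--fix", "rewrite entries to canonical form"),
        ("--report", "print detailed violations (default)"),
        ("--baseline", "regenerate the grandfather allow-list"),
    ]:
        # Every flag stores its own name into the shared `mode` attribute.
        mode.add_argument(flag, dest="mode", action="store_const",
                          const=flag.lstrip("-"), help=help_text)
    # Called after the flags exist, so it also sets their shared default.
    parser.set_defaults(mode="report")
    parser.add_argument("files", nargs="*", help=".bib files to lint")
    return parser


def exit_code(mode: str, new_errors: int) -> int:
    """Only --check turns new errors into a failing exit status."""
    return 1 if mode == "check" and new_errors > 0 else 0
```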

2. Pilot sweep: vol1 (715 entries) + CITATION.bib (1 entry)

   Pass 16 parallel-agent sweep Batch A. One general-purpose agent
   verified 25 flagged entries via DBLP, Crossref, arXiv, publisher
   pages, OpenReview, and ACL Anthology. Operated under the ten-rule
   anti-hallucination contract documented in §5:
     - Every field traced to a source URL with verbatim quote
     - HIGH confidence requires ≥2 sources from independent domains
     - NOT_FOUND always acceptable over guessing
     - DOI captured opportunistically for all verified entries
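
   The confidence rule can be sketched as a small classifier. This is
   an illustration under stated assumptions: the source-record shape,
   the two-label domain heuristic, and treating single-domain evidence
   as MEDIUM are hypothetical choices, not details taken from the §5
   contract.

```python
from urllib.parse import urlparse


def site_of(url: str) -> str:
    """Crude site key: the last two host labels (dblp.org, crossref.org).

    This heuristic mishandles suffixes like .co.uk; a real tool would use a
    public-suffix list.
    """
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def classify(sources: list[dict[str, str]]) -> str:
    """Map traced evidence to HIGH / MEDIUM / NOT_FOUND.

    A source counts only if it has both a URL and a verbatim quote.
    """
    traced = [s for s in sources if s.get("url") and s.get("quote")]
    if not traced:
        return "NOT_FOUND"  # never guess without evidence
    sites = {site_of(s["url"]) for s in traced}
    return "HIGH" if len(sites) >= 2 else "MEDIUM"
```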

   Results: 24 VERIFIED (19 HIGH + 5 MEDIUM), 1 NOT_FOUND. DOIs
   added to 6 entries (plus 1 canonical-DOI correction for
   Rajbhandari2020, replacing the ACM 10.5555 placeholder with the
   IEEE SC20 DOI). All 24 carry x-verified / x-verified-by /
   x-verified-source markers for the audit trail.

   Not applied: wolf2017we. Agent flagged this entry as likely
   fabricated — the title claims a paper about Google Bigtable
   authored by Thomas Wolf (Hugging Face), but the URL points to a
   Hugging Face blog about datasets. Title, author, and URL do not
   match each other. Flagged for human review, intentionally left
   in the open-findings ledger.

3. Pre-commit hook integration

   Added a repo-wide bib_lint --check hook to .pre-commit-config.yaml.
   It runs after bibtex-tidy on every .bib file in the repo (not just
   vol1/vol2) and uses a baseline allow-list at bib_lint_baseline.json
   that grandfathers 226 pre-existing violations across all 19 .bib
   files, so only NEW violations block commits going forward.
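
   The grandfathering behavior amounts to a set difference between the
   current findings and the allow-list. A minimal sketch, assuming a
   JSON layout for bib_lint_baseline.json (a list of objects with file,
   entry, and rule keys) that this commit does not specify; the
   function names are hypothetical.

```python
import json


def violation_key(v: dict[str, str]) -> tuple[str, str, str]:
    # Key on file, entry, and rule rather than line numbers, so edits that
    # shift lines do not make an old violation look new.
    return (v["file"], v["entry"], v["rule"])


def load_baseline(text: str) -> set[tuple[str, str, str]]:
    """Parse the allow-list JSON (assumed: a list of violation objects)."""
    return {violation_key(v) for v in json.loads(text)}


def new_violations(current: list[dict[str, str]],
                   allowed: set[tuple[str, str, str]]) -> list[dict[str, str]]:
    """Return only the violations that are not grandfathered."""
    return [v for v in current if violation_key(v) not in allowed]
```

   Under this sketch, a --check run would fail only when
   new_violations(...) returns a non-empty list.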

   bibtex-tidy scope was also broadened from quarto/contents to
   the whole repo so paper bibs (mlsysim, tinytorch, interviews,
   periodic-table) get formatted consistently.

Post-sweep state:
  - vol1 bibliography-hygiene findings: 24 -> 1 (wolf2017we)
  - CITATION.bib bibliography-hygiene findings: 1 -> 0
  - Total remaining: 195 findings across vol2 + 6 paper .bib files,
    scheduled for the parallel-agent fan-out (Batches B-G)
2026-04-08 14:53:21 -04:00


@inproceedings{reddi2024mlsysbook,
title = {MLSysBook.AI: Principles and Practices of Machine Learning Systems Engineering},
author = {Reddi, Vijay Janapa},
year = {2024},
booktitle = {2024 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)},
publisher = {IEEE},
pages = {41--42},
doi = {10.1109/CODES-ISSS60120.2024.00015},
url = {https://mlsysbook.org},
note = {Available at: https://mlsysbook.org},
x-verified = {2026-04-08},
x-verified-by = {pass-16-bib-sweep},
x-verified-source = {https://dblp.org/rec/conf/codesisss/Reddi24},
}