[PR #1382] [MERGED] chore(bib): verify 30 grandfathered paper-bib entries #6506

Closed
opened 2026-04-21 22:22:54 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/harvard-edge/cs249r_book/pull/1382
Author: @profvjreddi
Created: 4/17/2026
Status: Merged
Merged: 4/17/2026
Merged by: @profvjreddi

Base: dev ← Head: fix/bib-verify-29-entries


📝 Commits (1)

  • 97a5429 chore(bib): verify 30 grandfathered paper-bib entries

📊 Changes

3 files changed (+153 additions, -186 deletions)


📝 book/tools/bib_lint_baseline.json (+1 -181)
📝 interviews/paper/references.bib (+55 -2)
📝 periodic-table/paper/references.bib (+97 -3)

📄 Description

Summary

Follows up on #1373, which grandfathered 29 "@inproceedings missing publisher" violations in two paper bibs as a CI stopgap. This PR replaces that grandfathering with proper verification of all 30 entries: the 29 from CI plus asanovic2006landscape, which was already in the baseline.

What changed

For each of the 30 entries, added:

  • publisher = {<authoritative venue publisher>} — the field that was missing
  • x-verified = {2026-04-17}
  • x-verified-by = {claude-bib-sweep-2026-04}
  • x-verified-source = {<DOI or canonical proceedings URL>}
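As a rough illustration of the per-entry change, the sketch below stamps one parsed entry with the four fields listed above. The dict representation and the helper name `stamp_verified` are assumptions made for this example; the PR does not show how the sweep was actually implemented.

```python
from datetime import date


def stamp_verified(entry, publisher, source, verified_on, verified_by):
    """Return a copy of `entry` with publisher and x-verified* fields set.

    `entry` is a plain dict of BibTeX fields (a hypothetical representation;
    the real sweep may have edited the .bib text directly).
    """
    stamped = dict(entry)  # copy so the caller's dict is left untouched
    stamped["publisher"] = publisher
    stamped["x-verified"] = verified_on.isoformat()  # e.g. 2026-04-17
    stamped["x-verified-by"] = verified_by
    stamped["x-verified-source"] = source
    return stamped


# Usage: the source string is a placeholder, not a real DOI.
entry = {"ID": "example2020key", "ENTRYTYPE": "inproceedings"}
stamped = stamp_verified(
    entry,
    publisher="USENIX Association",
    source="<DOI or canonical proceedings URL>",
    verified_on=date(2026, 4, 17),
    verified_by="claude-bib-sweep-2026-04",
)
```

Returning a copy rather than mutating in place keeps the original entry available for a before/after diff.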

Three entry-type corrections for papers that were mis-tagged as @inproceedings but were actually standalone artifacts:

  • shoeybi2019megatron → @misc (arXiv:1909.08053, never published at a venue)
  • asanovic2006landscape → @techreport (UC Berkeley EECS-2006-183)
  • gu2023mamba → @misc (arXiv:2312.00752; COLM 2024 was a later version)

Why authoritative, not Crossref-fuzzy

I tried Crossref's top-1 match for each key first. It returned wrong papers for ~15 of 30 (e.g. Vaswani 2017 → a Springer book chapter, Reddi 2020 MLPerf → "New Electronics" magazine, Jouppi 2017 TPU → a SCITEPRESS cloud-computing workshop). For canonical ML-systems papers, the venue-specific publisher is the authoritative signal.
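One way to catch this failure mode automatically is to compare the title of the top Crossref hit against the title already in the bib entry before trusting any of its metadata. The sketch below is an assumption-laden illustration, not the procedure used in this PR: the helper names and the 0.9 similarity threshold are made up for the example. It relies only on the public Crossref REST endpoint `https://api.crossref.org/works` with the `query.bibliographic` and `rows` parameters, and on the fact that each item in `message.items` carries `title` as a list of strings.

```python
from difflib import SequenceMatcher
from urllib.parse import urlencode


def crossref_query_url(title, rows=1):
    """Build a Crossref REST API works query for a bibliographic string."""
    params = {"query.bibliographic": title, "rows": rows}
    return "https://api.crossref.org/works?" + urlencode(params)


def _normalize(text):
    # Lowercase and collapse whitespace so casing and spacing do not affect the score.
    return " ".join(text.lower().split())


def title_matches(expected_title, item, threshold=0.9):
    """Return True if a Crossref work item's title is close to the expected one.

    `item` is one element of `message.items` from a Crossref response.
    The threshold is an arbitrary choice for this sketch.
    """
    titles = item.get("title") or []
    if not titles:
        return False
    score = SequenceMatcher(
        None, _normalize(expected_title), _normalize(titles[0])
    ).ratio()
    return score >= threshold
```

A hit that fails `title_matches` would be discarded and the entry resolved by hand against the venue's own proceedings, which is the approach this PR took.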

Publisher map

| Venue | Publisher |
|-------|-----------|
| NeurIPS | Curran Associates, Inc. |
| OSDI, ATC | USENIX Association |
| MLSys | mlsys.org |
| SOSP, PLDI | ACM |
| SC, ISCA | IEEE (via ACM/IEEE co-sponsored proceedings) |
| ICLR | OpenReview.net |
| CGO, ISPASS | IEEE |
| EMNLP | Association for Computational Linguistics |
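The publisher map translates directly into a lookup table. The sketch below copies the venue and publisher names from the map; the function name `publisher_for` is hypothetical, and storing plain "IEEE" for SC and ISCA (dropping the co-sponsorship note) is an assumption about what the `publisher` field would contain.

```python
# Venue acronym -> publisher, copied from the publisher map in this PR.
VENUE_PUBLISHER = {
    "NeurIPS": "Curran Associates, Inc.",
    "OSDI": "USENIX Association",
    "ATC": "USENIX Association",
    "MLSys": "mlsys.org",
    "SOSP": "ACM",
    "PLDI": "ACM",
    "SC": "IEEE",    # ACM/IEEE co-sponsored proceedings (assumed field value)
    "ISCA": "IEEE",  # ACM/IEEE co-sponsored proceedings (assumed field value)
    "ICLR": "OpenReview.net",
    "CGO": "IEEE",
    "ISPASS": "IEEE",
    "EMNLP": "Association for Computational Linguistics",
}


def publisher_for(venue):
    """Return the publisher for a venue acronym, or None if it is not in the map."""
    return VENUE_PUBLISHER.get(venue.strip())
```

Returning `None` for unknown venues, rather than guessing, leaves those entries for manual review.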

Baseline delta

book/tools/bib_lint_baseline.json: 67 → 37 grandfathered entries (−30 exactly).

Verification

$ python3 book/tools/bib_lint.py --check --all
bib_lint: check mode, 19 file(s)
Total: 0 NEW errors (37 grandfathered), 81 warnings

Verification coverage for the two touched bibs:

| File | Before | After |
|------|--------|-------|
| interviews/paper/references.bib | 0/52 (0%) | 9/52 (17%) |
| periodic-table/paper/references.bib | 0/44 (0%) | 21/44 (48%) |
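The coverage figures are the number of entries carrying an `x-verified` field divided by the total entry count, rounded to a whole percent (9/52 ≈ 17%, 21/44 ≈ 48%). A minimal way to reproduce such a count from raw .bib text is sketched below. It assumes one field per line and at most one `x-verified` field per entry; whether `bib_lint.py` computes coverage this way is not stated in the PR, and the function name is hypothetical.

```python
import re

# Start of an entry such as "@inproceedings{key,"; skips @comment/@string/@preamble.
ENTRY_START = re.compile(
    r"^\s*@(?!comment|string|preamble)\w+\s*\{", re.IGNORECASE | re.MULTILINE
)
# An "x-verified = ..." field; does not match x-verified-by or x-verified-source.
VERIFIED_FIELD = re.compile(r"^\s*x-verified\s*=", re.IGNORECASE | re.MULTILINE)


def verification_coverage(bib_text):
    """Return (verified, total, percent) for the text of a .bib file."""
    total = len(ENTRY_START.findall(bib_text))
    verified = len(VERIFIED_FIELD.findall(bib_text))
    percent = round(100 * verified / total) if total else 0
    return verified, total, percent
```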

Out of scope

  • The remaining 37 grandfathered entries (mostly older book-bib entries missing publisher for @inproceedings) are a separate cleanup sweep.
  • Author-list normalization (author-initials-only warnings) is its own pass.
  • The rest of vol1/vol2 bibs are at 3% / 17% verification — a broader sweep is a follow-up.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-21 22:22:54 -05:00
Reference: github-starred/cs249r_book#6506