Deleted reddi2024mlsysbook from tinytorch/paper and rafailov2023direct from
periodic-table/paper. Both were identified as true orphans (not cited in
any .qmd or .tex) and out of scope for their respective papers.
Same regression as vol1/vol2 references.bib (commit 42bc54275 figure-audit
feat) — five auxiliary bib files (interviews/paper, mlsysim/docs,
mlsysim/paper, periodic-table/paper, tinytorch/paper) had brace patterns
mangled in titles, e.g. 'Throughput-Latency Tradeoff in {LLM} Inference'
became 'Throughput-Latency Tradeoff in {LLM}} Inference', which
bibtex-tidy refuses to parse.
Restored to the parent of 42bc54275 (state at 9ebdf77d0) and
re-formatted via the bib_apply_mechanical + bibtex-tidy hooks.
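
The mangled-title pattern described above can be caught with a plain brace
count. A minimal sketch, assuming a hypothetical helper named
braces_balanced (it is not part of bib_lint or bibtex-tidy, whose internals
are not shown here):

```python
def braces_balanced(value: str) -> bool:
    """Return True if every '{' in a BibTeX field value has a matching '}'."""
    depth = 0
    for ch in value:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # closing brace with no matching opener
                return False
    return depth == 0


# The two title strings quoted in the commit message above.
good = "Throughput-Latency Tradeoff in {LLM} Inference"
bad = "Throughput-Latency Tradeoff in {LLM}} Inference"
print(braces_balanced(good), braces_balanced(bad))
```

Running a check like this over every title field before formatting would
flag the stray closing brace without needing a full BibTeX parser.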
Align the MLSys·im code, docs, paper, website, workflows, and lab wheel for the 0.1.1 release. This also fixes runtime/API issues found during release review and prepares the paper PDF plus archive package.
simulator/page.tsx:
- Activation memory: replace 2*L*H^2*B*S formula with selective-recompute
10*L*B*S*H formula (more accurate for modern training stacks).
- FLOPs per iter: remove the spurious 3x multiplier on fwd+bwd. The
flops_per_token figures in MODEL_CONFIGS already account for fwd+bwd.
issue-url.ts:
- buildContributeUrl() now accepts an optional customBody argument so
the contribute page can pass an exported markdown body.
contribute/page.tsx:
- Pass exportAsGitHubBody() to buildContributeUrl() so the GitHub issue
pre-fills with the contributor's drafted question.
periodicTable.ts + periodic-table/table.yml:
- Remove duplicate 'Knowledge Distillation' and 'Systolic Array (TPU
Core)' entries under the Efficiency & Optimization section.
__tests__/simulator-logic.test.ts: new tests asserting the corrected
memory + compute formulas against canonical model+hardware combos.
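
The corrected simulator formulas above can be sketched as follows. This is
an illustration in Python, not the code in simulator/page.tsx. The function
names and the example dimensions are hypothetical, the commit does not state
the units of the activation term, and "tokens per iteration = B * S" is an
assumption:

```python
def activation_memory_old(L: int, H: int, B: int, S: int) -> int:
    """Previous estimate quoted in the commit: 2*L*H^2*B*S."""
    return 2 * L * H**2 * B * S


def activation_memory_selective(L: int, H: int, B: int, S: int) -> int:
    """Selective-recompute estimate quoted in the commit: 10*L*B*S*H."""
    return 10 * L * B * S * H


def flops_per_iter(flops_per_token: int, B: int, S: int) -> int:
    """FLOPs per iteration with no extra 3x multiplier.

    flops_per_token is taken to include fwd+bwd already, as the commit
    says of MODEL_CONFIGS. Tokens per iteration = B * S is an assumption.
    """
    return flops_per_token * B * S


# Hypothetical dimensions, chosen only to exercise the formulas.
L, H, B, S = 32, 4096, 8, 2048
old = activation_memory_old(L, H, B, S)
new = activation_memory_selective(L, H, B, S)
print(old, new, old / new)
```

Because the two expressions differ by a factor of H / 5, the old formula
overstates the new one more as the hidden size grows.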
Add `text` language tag to 25 unlabeled fenced code blocks across the
public-facing READMEs. Mostly directory-tree listings, all-contributors
bot instructions, and pseudo-output ASCII blocks — none were getting
syntax highlighting anyway, but the explicit tag silences markdownlint
MD040 and signals intent ("this is plain text, not a forgotten lang").
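
The tagging step above is mechanical enough to script. A minimal sketch,
assuming a hypothetical helper named label_bare_fences and handling only
backtick fences (tilde fences and fences nested inside lists are not
covered); it is not the tool that was actually used for this change:

```python
import re

# Matches an optional indent, a run of 3+ backticks, and the info string.
FENCE = re.compile(r"^(\s*)(`{3,})(.*)$")


def label_bare_fences(markdown: str, lang: str = "text") -> str:
    """Add `lang` to opening code fences that have no info string."""
    out = []
    open_fence = None  # backtick run that opened the current block, if any
    for line in markdown.splitlines():
        m = FENCE.match(line)
        if m:
            indent, ticks, info = m.groups()
            if open_fence is None:
                open_fence = ticks
                if not info.strip():
                    line = f"{indent}{ticks}{lang}"
            elif len(ticks) >= len(open_fence) and not info.strip():
                open_fence = None  # closing fence: leave it untouched
        out.append(line)
    trailing = "\n" if markdown.endswith("\n") else ""
    return "\n".join(out) + trailing


F = "`" * 3  # built at runtime so this example has no literal fence lines
doc = f"Tree:\n{F}\nsrc/\n{F}\n\n{F}python\nprint(1)\n{F}\n"
print(label_bare_fences(doc))
```

Tracking whether a fence is open is what keeps closing fences from being
tagged, and it makes a second run over the same file a no-op.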
The sub-project READMEs used an old-school nested-table card design
with hardcoded bgcolor="#ffffff", "#cfd6dd", "#eef2f7" plus deprecated
HTML4 attributes (cellpadding, cellspacing, border). It looked good in
light mode but produced harsh white islands in GitHub's dark theme,
which is what most readers see today.
Across 11 sub-READMEs:
- Strip the card wrapper so data tables are just clean
<table width="100%"> with semantic <thead>/<tbody>. Headers keep
their column widths; bgcolor/valign/zebra-stripe cruft is removed
(GitHub provides its own theme-aware striping).
- Convert the early-release callouts (and mlperf-edu's two-tier
status block + "source of truth" note + interviews' two info boxes)
to GitHub-native > [!NOTE] / > [!WARNING] / > [!TIP] callouts.
These are theme-aware, get proper icons, and render correctly in
light AND dark mode.
Net result: 528 lines of HTML cruft removed, 230 lines of clean
markdown added. Visual identity is preserved (callouts still stand
out, tables still stretch full-width) while becoming dark-mode safe
and consistent with the main README.
Clean up planning, kickoff, audit, and persona-feedback documents
accumulated during prior AI-assisted work sessions. These are
session artifacts, not durable documentation — the decisions they
captured have either shipped, been retired, or are traceable via
git history.
interviews/vault/REVIEWS.md is intentionally kept: it is cited by
section ID (H-6, H-7, H-21, C-6, ...) from production code in
interviews/vault-cli/ and interviews/vault/ and published as the
pyproject.toml Review-Ledger URL, which makes it engineering
documentation rather than a session artifact.
Deletions:
- RELEASE-PREP.md, review_prompt.md (root handoff / review prompts)
- interviews/vault/KICKOFF.md, BOOK_LINKING_PLAN.md, EXPANSION_PLAN.md
- interviews/staffml/FEEDBACK_SYNTHESIS.md, V1_REDESIGN_SPEC.md,
STAFFML_UX_PLAN.md, VAULT_DESIGN_PLAN.md
- interviews/staffml/.gemini-reviews/ (2 review call logs)
- book/docs/SVG_FIGURE_AUDIT_PLAN.md, book/tools/agent_personas.md
- mlsysim/docs/WEBSITE_AUDIT.md
- periodic-table/iteration-log.md, refinement-log.md
Reference fixes for pointers into deleted files:
- interviews/vault/ARCHITECTURE.md: drop section 21 (pointed at KICKOFF.md)
- interviews/vault/schema/question_schema.yaml: drop BOOK_LINKING_PLAN.md
reference in the author-curated resource description
- interviews/staffml/src/components/Footer.tsx: drop BOOK_LINKING_PLAN.md
reference from the docstring; rationale preserved
Also removes the untracked gemini_prompts/ directory at repo root.
Replace markdown blockquotes with a shared centered table pattern
(cellpadding, bgcolor panel, h3 + aligned paragraphs) so GitHub renders
consistent spacing. Align labs and mlsysim DEV-BANNER with the same layout
and 2026 messaging.
Use a short top-of-README callout for periodic-table, StaffML, TinyTorch,
slides, and instructors: live with the 2026 release, expect steady iteration,
link to GitHub issues. Slides banner replaces dev-only wording with the same
framing while keeping dev/live badges.
Replaces the PR #1373 grandfather stopgap with proper verification.
For each of the 30 @inproceedings entries that were missing the
required 'publisher' field, added:
- publisher = {<authoritative venue publisher>}
- x-verified = {2026-04-17}
- x-verified-by = {claude-bib-sweep-2026-04}
- x-verified-source = {<DOI or canonical proceedings URL>}
Entry-type corrections (three papers that were never published at a
proper venue and were mis-tagged as @inproceedings):
- shoeybi2019megatron → @misc (arXiv preprint only)
- asanovic2006landscape → @techreport (UC Berkeley EECS TR)
- gu2023mamba → @misc (arXiv; COLM 2024 was a later version)
Publisher map (authoritative, not Crossref-fuzzy which returned wrong
top-hits for most of these well-known ML-systems papers):
NeurIPS → Curran Associates, Inc.
OSDI/ATC → USENIX Association
MLSys → mlsys.org
SOSP/PLDI/SC/ISCA → ACM or IEEE per venue-specific proceedings
ICLR → OpenReview.net
CGO/ISPASS → IEEE
EMNLP → Association for Computational Linguistics
Baseline regenerated: grandfathered entries dropped from 67 to 37
(−30, exactly matching the sweep). Global bib_lint --check --all is
still clean: 0 NEW errors.
Verification coverage for the two paper bibs:
interviews/paper/references.bib 0 → 9 verified (of 52)
periodic-table/paper/references.bib 0 → 21 verified (of 44)
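
The publisher map and the per-entry fields listed above can be expressed as
a small lookup. A sketch only: the names PUBLISHER_BY_VENUE,
NEEDS_MANUAL_CHECK, publisher_for, and verification_fields are hypothetical,
and the venues the commit resolves "per venue-specific proceedings" are
deliberately left for a human to decide:

```python
# Venue -> publisher pairs copied from the publisher map in the commit.
PUBLISHER_BY_VENUE = {
    "NeurIPS": "Curran Associates, Inc.",
    "OSDI": "USENIX Association",
    "ATC": "USENIX Association",
    "MLSys": "mlsys.org",
    "ICLR": "OpenReview.net",
    "CGO": "IEEE",
    "ISPASS": "IEEE",
    "EMNLP": "Association for Computational Linguistics",
}

# ACM or IEEE depending on the specific proceedings: no automatic answer.
NEEDS_MANUAL_CHECK = {"SOSP", "PLDI", "SC", "ISCA"}


def publisher_for(venue: str):
    """Return the mapped publisher, or None when a human must decide."""
    if venue in NEEDS_MANUAL_CHECK:
        return None
    return PUBLISHER_BY_VENUE.get(venue)


def verification_fields(publisher: str, source: str) -> dict:
    """Fields added to each swept entry, with the values quoted above."""
    return {
        "publisher": publisher,
        "x-verified": "2026-04-17",
        "x-verified-by": "claude-bib-sweep-2026-04",
        "x-verified-source": source,  # DOI or canonical proceedings URL
    }


print(publisher_for("OSDI"), publisher_for("ISCA"))
```

Returning None for the ambiguous venues keeps the lookup from guessing
between ACM and IEEE.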
Both auto-fix hooks were failing book-validate-dev on first pass
because recent merges added content that wasn't committed in canonical
form. Running them now is purely mechanical:
- end-of-file-fixer: strip/add trailing newlines on 2 markdown files
- bibtex-tidy: re-align and re-sort 3 .bib files per hook config
No semantic content change. Verified bib_lint still passes (0 NEW
errors, 67 grandfathered).
- Recognize HTML comment close --!> in LineWalker (py/bad-tag-filter)
- Stop returning provider error detail to clients; log server-side (js/stack-trace-exposure)
- Harden migrate-html-to-yaml script tag match and tag stripping loops (js/bad-tag-filter, js/incomplete-multi-character-sanitization)
- Resolve post-login next redirect via URL() with same-origin checks (js/client-side-unvalidated-url-redirection)
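
The same-origin check in the last item can be illustrated outside the
browser. The fix itself uses the JavaScript URL() constructor; the sketch
below is a Python analogue of the same idea using urllib.parse, with a
hypothetical function name and a hypothetical origin, and it is not the
code that shipped:

```python
from urllib.parse import urljoin, urlsplit


def safe_next_url(next_param: str, origin: str, fallback: str = "/") -> str:
    """Return a same-origin path for next_param, or fallback otherwise."""
    # Browsers treat backslashes like slashes, so reject them outright.
    if not next_param or "\\" in next_param:
        return fallback
    base = urlsplit(origin)
    resolved = urlsplit(urljoin(origin + "/", next_param))
    # Resolve first, then compare: this also catches //host and other
    # scheme-relative or absolute targets.
    if (resolved.scheme, resolved.netloc) != (base.scheme, base.netloc):
        return fallback
    path = resolved.path or "/"
    return path + (f"?{resolved.query}" if resolved.query else "")


ORIGIN = "https://app.example"  # hypothetical origin for illustration
for candidate in ["/dashboard?tab=1", "//evil.example/x", "javascript:alert(1)"]:
    print(candidate, "->", safe_next_url(candidate, ORIGIN))
```

Comparing the resolved URL rather than pattern-matching the raw string is
the point of the technique: prefix checks such as "starts with /" accept
scheme-relative targets like //evil.example.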
The merge conflict resolution incorrectly kept the pre-polish paper.tex
which still had MLIR/compiler content and duplicate lstset overrides.
Restored from round-2 polish commit (78a7a77) which had:
- Removed all MLIR Section 4 content
- Clean elegant listing style (no overlapping line numbers)
- Polished references.bib with complete entries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Apply the creative and technical decisions from the 5-reviewer synthesis:
Creative:
- Subtitle: "A Generative Design Space for Modern Architectures"
-> "A Constraint-Driven Design Space" (drops Generative and the
Modern Architectures overclaim flagged by R1/R2/R5)
- Move §2.3 Irreducibility Criterion formal proofs to a new
appendix-proofs.tex; keep the Definition, cost-model distinction,
and Boundary paragraph inline. The proofs were flagged as
"research scaffolding, wrong order for learners" by the pedagogy
reviewer and "parameterized into near-vacuity" by the MLSys
reviewer. Appendix preserves them as formal backing.
Structural (sixth walkthrough):
- Add §4.6 An Honest Failure: Mamba — a sixth walkthrough where
the framework runs the four filters against the decode-time HBM
bandwidth constraint of long-context Transformers and is unable
to produce State (St) as an intervention. Explains exactly where
in the filter chain the search fails and names the scope: the
framework is generative over layout refactorings, not over
algorithm substitution (Mamba, Speculative Decoding, MoE).
Resolves the 3-of-5 convergence on scope overclaim + walkthrough
selection bias in a single edit.
Technical corrections:
- §4.5 million-token FLOPs formula: remove the spurious x1000
factor. The old formula, 2P * 1000 / compute, gave 264 ms; the
corrected 2P / compute gives 0.26 ms. Decode compute per token is
2P (linear layers dominate once attention is sub-quadratic);
attention contributes ~2% and is dropped, with an explanation.
- §4.1.4 "8-16x better than naive": rewrite to say "8-16x
reduction in HBM bytes transferred" rather than implying higher
arithmetic intensity. Both I_naive=124 and I_flash=64 sit below
the 295 ridge point, so both are memory-bound; the gain is in
HBM traffic, not intensity.
- §6.1 Bound 1 memory capacity: add a clarifying sentence that
the optimizer and KV-cache terms don't coexist. Training drops
the KV-cache; inference drops the optimizer. Bound is a
pedagogical superset.
- §6.1 Bound 2 throughput: add a clarifying sentence that the 2P
compute term is the inference form; training replaces with 6P
for backward-pass cost, and the gradient-communication term
drops out entirely during inference.
Build: 26 pages, 421 KB, zero undefined refs.
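
The arithmetic behind the technical corrections above can be checked with a
short roofline sketch. The intensities (124, 64), the 295 ridge point, and
the 2P / 6P terms come from the commit. The hardware numbers and the
parameter count are assumptions: the commit does not name the accelerator
or the model, and the values below were chosen only because they land near
the quoted figures:

```python
PEAK_FLOPS = 989e12  # assumed peak compute, FLOP/s
HBM_BW = 3.35e12     # assumed HBM bandwidth, bytes/s
PARAMS = 130e9       # hypothetical parameter count P


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """Arithmetic intensity (FLOP/byte) where compute and memory balance."""
    return peak_flops / bandwidth


def is_memory_bound(intensity: float, ridge: float) -> bool:
    """A kernel below the ridge point is limited by memory bandwidth."""
    return intensity < ridge


def decode_compute_seconds(params: float, peak_flops: float) -> float:
    """Per-token decode compute time using 2P FLOPs per token."""
    return 2 * params / peak_flops


ridge = ridge_point(PEAK_FLOPS, HBM_BW)
t = decode_compute_seconds(PARAMS, PEAK_FLOPS)
print(f"ridge point: {ridge:.1f} FLOP/byte")
print("naive memory-bound:", is_memory_bound(124, ridge))
print("flash memory-bound:", is_memory_bound(64, ridge))
# The 1e3 here is only the seconds-to-milliseconds unit conversion.
print(f"decode compute per token: {t * 1e3:.3f} ms")
print(f"training form (6P) is {6 * PARAMS / (2 * PARAMS):.0f}x the 2P term")
```

Under these assumptions both attention variants fall below the ridge point,
which is why the gain is stated in HBM bytes transferred rather than in
arithmetic intensity.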
- Refactored table.yml to expand the grid from 15 to 18 columns so that each element aligns with its block column.
- Updated schema validation and SVG layout logic to handle 18-col bounds.
- Re-generated periodic_table_hero.svg with the new clean grid layout.
- Switched 'Constraint-Driven Lowering Heuristic' to 'Constraint-Driven Structural Search' to emphasize generative design over compiler semantics.
- Fixed paper.tex title to remove academic jargon ('Generative Design Space').
- Styled code listings with custom 'elegant' light-theme formatting.
- Added Mamba compound visual figure to the limitations section.
- Added 'Predict the Paper' table to validate the heuristic empirically.
- Generated and included 'Nomenclature' table of all 90 elements.
- Re-added 'Framework Tax' limitation acknowledging software bottlenecks.
Parallel-agent bibliography verification sweep applied to the paper
bibliography files outside the book proper. These are academic papers
that live in the repo (mlsysim tutorial paper, tinytorch paper,
interviews paper, periodic-table paper) and were previously only subject
to bibtex-tidy formatting, not §5 hygiene validation.
Batches F and G of the Pass 16 parallel sweep processed 77 entries
total across 6 files; 73 auto-applied at HIGH+MEDIUM confidence.
Per-file summary:
mlsysim/paper/references.bib 50 entries applied (0 open)
mlsysim/docs/references.bib 15 entries applied (0 open)
tinytorch/paper/references.bib 7 entries applied (1 open)
interviews/paper/references.bib 3 entries applied (0 open)
periodic-table/paper/ref.bib 11 entries applied (0 open)
Each applied entry carries:
publisher or journal (primary field) + doi (when present on source)
+ x-verified = "2026-04-08"
+ x-verified-by = "pass-16-bib-sweep"
+ x-verified-source = <authoritative URL from DBLP, Crossref, arXiv, etc.>
One open finding (intentional skip):
tanenbaum1987minix — typed @article but the actual publication is
A. S. Tanenbaum's 1987 book "Operating Systems: Design and
Implementation" (Prentice-Hall), not a journal article. The fix is
to re-type as @book, not fill a wrong `journal` field. Flagged for
a future type-refactor pass.
Cross-file duplicate keys are expected and correct: dao2022flashattention,
mattson2020mlperf, and vaswani2017attention each appear in multiple
paper .bib files because each paper independently cites these
foundational works. Each copy was verified and annotated separately.
This is the first pass in which the repo-wide bib_lint + bibtex-tidy
pre-commit hooks have been applied to these paper .bib files.
Brings in upstream dev work that landed since the local merge:
* feat(framework): periodic-table page with YAML source of truth
* chore(framework): periodic-table paper, figures, iteration archive
* chore(deps): bump vite 8.0.3 → 8.0.5 in interviews/staffml
* chore: 2 newsletter sync commits
Conflict resolution:
* interviews/staffml/src/components/Nav.tsx — kept the Lab/More menu
reorg from staffml UX work, with /framework added to Lab alongside
Roofline and Simulator
* periodic-table/index.html — took origin's auto-generated version
and re-applied the AUTO-GENERATED header comment
* periodic-table/paper/paper.tex — took origin's version
(the newer iteration archive snapshot)
* periodic-table/paper/references.bib — took origin's version
* periodic-table/debate-log.md — removed entirely (was from a
divergent earlier iteration; no longer needed)
Apply codified §10.2 percent-spelling rule to a single residual in
the Zillow correction cascade footnote: "(25% of workforce)" →
"(25 percent of workforce)".
Round 1 pass 03 swept body prose for vol1 percent symbols and missed
this footnote-body residual. This pass closes the gap.
Pass 15 phase C, vol1 percent-symbol category → 0.
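
Residuals like the one above can be found and rewritten with a single
pattern. A minimal sketch, assuming a hypothetical helper named
spell_percent; the real §10.2 rule may exempt tables, code, or math, none of
which this handles:

```python
import re

# A number (optionally with a decimal part) directly followed by '%'.
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s?%")


def spell_percent(text: str) -> str:
    """Rewrite '25%' as '25 percent' in running prose."""
    return PERCENT.sub(r"\1 percent", text)


print(spell_percent("(25% of workforce)"))
```

Because the pattern requires the digit to sit next to the symbol, an
escaped LaTeX percent such as 25\% is left alone.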
Add the Periodic Table of Machine Learning Systems source of truth and
the academic paper that introduces it.
* table.yml — canonical 90-element design space (8 abstraction layers
x 5 information-processing roles), with table.schema.json
* scripts/ — Node.js tools that build index.html from the YAML,
migrate prior HTML to YAML, and validate the schema
* paper/ — LaTeX paper "The Periodic Table of Machine Learning Systems:
A Constraint-Driven Design Heuristic with Compiler Correspondence"
* paper/scripts/generate_periodic_svg.py — vector hero figure
generator that reads table.yml and emits a crisp SVG/PDF the paper
embeds in place of a screenshot
* paper/figures/ — molecular_ml, mamba, and periodic_table_hero
figures (SVG sources + PDF outputs)
* paper/Makefile — full build pipeline (svgs -> rsvg-convert -> pdf)
* paper/references.bib — bibliography including Hennessy-Patterson,
Hooker, Sze/Emer, Halide, GPipe, PipeDream, Korthikanti, Kung
Companion commit to c5f90022b (the YAML migration). Pulls in the rest of
the periodic-table work that was sitting in the working tree.
- periodic-table/paper/: LaTeX paper draft (paper.tex, references.bib),
Makefile with hero-figure and SVG->PDF rules (uses rsvg-convert), the
Puppeteer capture_table.js script used to screenshot the table for the
paper, generate_periodic_svg.py which builds the hero SVG from
table.yml (the same source of truth used by the React app), and the
figure sources (SVGs) + derived outputs (PDFs/PNGs) + compiled paper.pdf.
- root .gitignore gains two entries following the existing convention
(cf. the !interviews/paper/fig-*.pdf line just above) so the
periodic-table paper PDF + figure PDFs are not swept up by the blanket
*.pdf LaTeX-artifact rule.
- periodic-table/paper/.gitignore excludes the LaTeX build artifacts
(aux, bbl, blg, log, out, fdb_latexmk, synctex, toc) that `make paper`
regenerates.
- periodic-table/{iteration,refinement,debate}-log.md: research
provenance from the 100-round LLM iteration loop and the 5-expert
debate simulations that produced v0.2 of the table.
- periodic-table/scripts/archive/: historical iteration scripts
(iterate.sh, debate.sh, debate-continue.sh, run_100_rounds.sh, plus
the Python helpers append_log.py, get_elements.py, patch_informal.py,
patch_website.py, run_claude_loop.py, run_iterations{,_13,_16_20}.py,
update_log.py) moved out of the repo root into an archive subdirectory
with a README documenting their provenance and caveats. These scripts
are preserved for reproducibility and are not part of the active build
pipeline -- the source of truth is now periodic-table/table.yml.
- root package.json + package-lock.json pin puppeteer ^24 for
capture_table.js.
- periodic-table/table.yml is now the canonical source for 90 elements
and 53 compounds, validated against table.schema.json on every build.
- build-html.mjs regenerates the standalone index.html idempotently;
sentinel comments mark the data sections so CSS and render JS are
preserved across edits.
- sync-periodic-table.mjs regenerates the StaffML React data file as a
prebuild/predev hook in interviews/staffml; the generated TS file
carries an @generated header.
- validate.mjs catches cell collisions, broken bonds, undeclared symbol
collisions, and unresolved formula references at edit time.
- new /framework route in StaffML renders the table interactively with
search, block filter, element detail modal, and bidirectional cross-
references between elements and Molecular ML compounds.
- dark mode is the site-wide default; CSP allows 'unsafe-eval' in dev so
Next.js HMR can run without blocking client JS hydration.
- Makefile + package.json scripts let you run `make all` (or just
`npm run dev`) to keep both consumers in sync from the YAML.
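
The sentinel-comment approach described for build-html.mjs above can be
sketched as follows. This is a Python illustration of the general
technique, not the Node.js script itself; the sentinel format
(BEGIN:name / END:name) and the function name are hypothetical:

```python
import re


def replace_between_sentinels(html: str, name: str, payload: str) -> str:
    """Swap the content between two sentinel comments, keeping the rest."""
    begin = f"<!-- BEGIN:{name} -->"
    end = f"<!-- END:{name} -->"
    pattern = re.compile(
        re.escape(begin) + r".*?" + re.escape(end), re.DOTALL
    )
    if not pattern.search(html):
        raise ValueError(f"sentinels for {name!r} not found")
    # A function replacement avoids backslash handling in the payload.
    return pattern.sub(lambda _m: f"{begin}\n{payload}\n{end}", html, count=1)


page = (
    "<style>x</style>\n"
    "<!-- BEGIN:elements -->\nold data\n<!-- END:elements -->\n"
    "<script>render()</script>"
)
once = replace_between_sentinels(page, "elements", "new data")
twice = replace_between_sentinels(once, "elements", "new data")
print(once == twice)
```

Only the span between the markers is rewritten, so hand-maintained CSS and
render code outside it survive, and regenerating from unchanged data leaves
the file byte-for-byte identical.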