Three improvements to the PyPI publish pipeline, all small, all
high-value. The pipeline now has proper pre-publish AND post-publish
verification, and ships with cryptographic provenance.
1. Python version matrix on the test stage
Previously: pytest ran once on Python 3.11 only.
Now: runs in parallel on 3.10, 3.11, 3.12, 3.13 — matching the
versions claimed in pyproject.toml classifiers. Catches cross-version
bugs BEFORE the wheel ships. fail-fast: false means all four complete
even if one fails, so we see the full picture. Adds little wall-clock
time: the jobs fan out in parallel, so the stage takes about as long
as its slowest job.
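A drift check between the matrix and the classifiers could be as small as the POSIX sh sketch below. It is hypothetical, since nothing above says the workflow runs such a check. It assumes one "Programming Language :: Python :: 3.N" classifier string per line in pyproject.toml, and it takes the matrix versions as arguments rather than parsing the workflow YAML. The function names are illustrative.

```shell
#!/bin/sh
# Hypothetical drift check: CI matrix vs. pyproject.toml classifiers.
# Assumes classifiers such as "Programming Language :: Python :: 3.12".

# Print the 3.N versions named in the classifiers, one per line, sorted numerically.
classifier_versions() {
    grep -Eo 'Programming Language :: Python :: 3\.[0-9]+' "$1" \
        | sed 's/.*:: //' \
        | sort -t . -k 2,2n -u
}

# Usage: check_matrix_matches_classifiers <pyproject.toml> <version>...
# Returns 0 when the matrix versions and classifier versions are the same set.
check_matrix_matches_classifiers() {
    pyproject=$1
    shift
    expected=$(classifier_versions "$pyproject")
    actual=$(printf '%s\n' "$@" | sort -t . -k 2,2n -u)
    # Unquoted $(echo ...) deliberately collapses the newline-separated lists.
    if [ "$expected" = "$actual" ]; then
        echo "matrix matches classifiers: $(echo $expected)"
        return 0
    fi
    echo "mismatch: classifiers=[$(echo $expected)] matrix=[$(echo $actual)]" >&2
    return 1
}
```

For example: check_matrix_matches_classifiers mlsysim/pyproject.toml 3.10 3.11 3.12 3.13. Sorting on the minor component as a number keeps 3.10 after 3.9 if older versions are ever listed.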
2. New 'verify-pypi' job (post-publish smoke test)
Runs after publish-pypi. Creates fresh venv, does pip install from
the real PyPI index (with 4-attempt retry + 60s backoff for CDN
propagation), imports the package, checks version + license, and
runs the CLI smoke that appears in the runbook (mlsysim eval
Llama3_8B H100 --batch-size 32). Closes the gap between 'upload
accepted' and 'users can actually install' — catches CDN, metadata
rendering, and platform-tag failures that pre-publish tests can't
see.
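The retry around pip install could look like the POSIX sh sketch below. It is not the workflow's actual step: the retry function is a hypothetical name, and it assumes a fixed delay between attempts, since the description above does not say whether the 60s backoff grows.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to <attempts> times, sleeping
# <delay> seconds after each failure. Returns the last exit status.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        status=$?   # exit status of the failed command
        if [ "$n" -ge "$attempts" ]; then
            echo "retry: giving up after $n attempts: $*" >&2
            return "$status"
        fi
        echo "retry: attempt $n/$attempts failed (exit $status); sleeping ${delay}s" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}
```

Inside the fresh venv the call might be retry 4 60 pip install --no-cache-dir "mlsysim==$VERSION", where VERSION is assumed to hold the version being verified. Skipping pip's cache makes each attempt actually hit the index instead of reusing a stale response.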
3. Explicit PEP 740 attestations on upload
pypa/gh-action-pypi-publish generates attestations by default with
OIDC, but we now set attestations: true explicitly. This records
cryptographic provenance ('this wheel was built by this exact
workflow run from this exact commit'), lets downstream users
verify the chain of custody, and makes the intent obvious to
anyone reading the workflow later.
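Downstream users can fetch the attestations PyPI stores for each uploaded file. A minimal sketch follows. It assumes PyPI's Integrity API serves provenance at /integrity/<project>/<version>/<filename>/provenance, so check the PyPI API documentation before relying on that path. provenance_url and fetch_provenance are hypothetical names. The sketch only downloads the provenance JSON; it does not verify the Sigstore signatures inside it.

```shell
#!/bin/sh
# Hypothetical helpers for reading PEP 740 provenance back from PyPI.
# Assumed path layout: /integrity/<project>/<version>/<filename>/provenance

# Print the provenance URL for one uploaded file.
provenance_url() {
    printf 'https://pypi.org/integrity/%s/%s/%s/provenance\n' "$1" "$2" "$3"
}

# Download the provenance JSON (needs network access and curl).
# --fail turns an HTTP error, e.g. 404 for a file without provenance, into a non-zero exit.
fetch_provenance() {
    curl --fail --silent --show-error "$(provenance_url "$1" "$2" "$3")"
}
```

For example: fetch_provenance mlsysim 0.1.2 mlsysim-0.1.2-py3-none-any.whl.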
Also renumbers pipeline stages in header comment and updates
mlsysim/RELEASE.md's 'Happy path' description to reflect the new
7-stage pipeline (was 6 stages).
MLSys·im Release Runbook
Releases are automated via the mlsysim-pypi-publish.yml GitHub Actions
workflow. Publishing happens when a tag matching mlsysim-v* is pushed to
origin. The workflow authenticates to PyPI via Trusted Publishing (OIDC) —
no PyPI API token is stored in the repo or in GitHub Secrets.
Happy path — the 3-step release
Prerequisite: your changes must already be merged to dev, with the version
bumped in pyproject.toml, mlsysim/__init__.py, CITATION.cff, and an entry
added to CHANGELOG.md.
# 1. Move to merged dev
git checkout dev
git pull --ff-only origin dev
# 2. Tag the release (annotated, prefixed)
git tag -a mlsysim-v0.1.2 -m "MLSys·im 0.1.2"
# 3. Push the tag → the workflow fires automatically
git push origin mlsysim-v0.1.2
From there, the workflow does everything:
- Verify: tag format, version coherence across pyproject.toml/__init__.py/CITATION.cff, CHANGELOG entry present, tag reachable from dev.
- Test: full pytest suite in parallel across every supported Python version (3.10, 3.11, 3.12, 3.13). Catches cross-version issues before the wheel ships.
- Build: python -m build → wheel + sdist; twine check on both.
- Publish to PyPI: via pypa/gh-action-pypi-publish + OIDC, with PEP 740 attestations (cryptographic provenance).
- Verify from PyPI: a post-publish smoke test (pip install from the public PyPI with CDN propagation retry, import check, CLI smoke). Closes the gap between "upload accepted" and "user install works."
- GitHub Release: creates the release, attaches wheel + sdist, uses RELEASE_NOTES_<version>.md as the body if present.
- Docs redeploy: dispatches mlsysim-publish-live.yml on dev so the docs site reflects the new version.
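Two of the Verify checks, tag format and reachability from dev, might be written as below. This is a hypothetical POSIX sh sketch, not the workflow's code. version_from_tag and tag_reachable_from_dev are illustrative names. It assumes plain numeric X.Y.Z versions with no pre-release suffix, and that origin is the GitHub remote.

```shell
#!/bin/sh
# Hypothetical sketch of two "Verify" checks; the real workflow's logic may differ.

# Print X.Y.Z for a tag of the form mlsysim-vX.Y.Z; fail on anything else.
version_from_tag() {
    if printf '%s\n' "$1" | grep -Eq '^mlsysim-v[0-9]+\.[0-9]+\.[0-9]+$'; then
        printf '%s\n' "${1#mlsysim-v}"
    else
        echo "bad tag format: $1 (expected mlsysim-vX.Y.Z)" >&2
        return 1
    fi
}

# Succeed if the tagged commit is an ancestor of (or equal to) origin/dev.
# ^{commit} peels the annotated tag object down to the commit it points at.
tag_reachable_from_dev() {
    git fetch --quiet origin dev &&
        git merge-base --is-ancestor "$1^{commit}" origin/dev
}
```

git merge-base --is-ancestor exits 0 only when the first commit is an ancestor of, or the same as, the second, which is the "tag reachable from dev" condition.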
Monitor the run: https://github.com/harvard-edge/cs249r_book/actions/workflows/mlsysim-pypi-publish.yml
Pre-release checklist (before tagging)
Run these locally, each from the repository root, to catch the easy failures before the workflow does:
- cd mlsysim && pytest tests/ -q → 0 failures
- Versions aligned: all four places read the same X.Y.Z
    grep -E '^version = ' mlsysim/pyproject.toml
    grep -E '^__version__ = ' mlsysim/mlsysim/__init__.py
    grep -E '^version:' mlsysim/CITATION.cff
    head -3 mlsysim/CHANGELOG.md
- CHANGELOG top entry is the version about to ship
- Optional: mlsysim/RELEASE_NOTES_<version>.md written for the GH Release body
- CLI smoke: mlsysim eval Llama3_8B H100 --batch-size 32 returns a scorecard
- Docs render cleanly: cd mlsysim/docs && quarto render, with no "Unable to resolve link target" warnings
If anything fails, fix on a PR to dev and merge before tagging. The workflow will re-run these checks, but catching them locally saves 5 minutes per attempt.
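The four version greps can be folded into a single pass/fail check. A minimal sketch, assuming the line formats those greps imply and assuming the newest CHANGELOG entry is the first Markdown heading that contains an X.Y.Z version. check_versions and first_version are hypothetical names.

```shell
#!/bin/sh
# Hypothetical version-coherence check mirroring the greps in the checklist.
# Assumed line formats:
#   pyproject.toml        version = "X.Y.Z"
#   mlsysim/__init__.py   __version__ = "X.Y.Z"
#   CITATION.cff          version: X.Y.Z
#   CHANGELOG.md          newest entry is the first heading containing X.Y.Z

# Read lines on stdin and print the first X.Y.Z found.
first_version() {
    grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Usage: check_versions [project-dir]   (defaults to ./mlsysim)
check_versions() {
    root=${1:-mlsysim}
    v_pyproject=$(grep -E '^version = ' "$root/pyproject.toml" | first_version)
    v_init=$(grep -E '^__version__ = ' "$root/mlsysim/__init__.py" | first_version)
    v_cff=$(grep -E '^version:' "$root/CITATION.cff" | first_version)
    # Only headings count, so prose like "Keep a Changelog 1.1.0" is ignored.
    v_changelog=$(grep -E '^#+ .*[0-9]+\.[0-9]+\.[0-9]+' "$root/CHANGELOG.md" | first_version)
    echo "pyproject=$v_pyproject init=$v_init cff=$v_cff changelog=$v_changelog"
    if [ -n "$v_pyproject" ] &&
        [ "$v_pyproject" = "$v_init" ] &&
        [ "$v_pyproject" = "$v_cff" ] &&
        [ "$v_pyproject" = "$v_changelog" ]; then
        return 0
    fi
    echo "version mismatch" >&2
    return 1
}
```

Run from the repository root as check_versions mlsysim. A non-zero exit means at least one file disagrees or a version could not be found.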
Trusted Publishing — one-time setup
If the workflow fails with an OIDC error, Trusted Publishing has not been configured on pypi.org. Do this once:
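1. Sign in to https://pypi.org/ as a maintainer of the mlsysim project.
2. Add the publisher: use Add a new pending publisher if the project does not exist on PyPI yet, or the project's Manage → Publishing page if it does (existing publishers are also edited there).
3. Fill in:

   | Field | Value |
   |---|---|
   | PyPI Project name | mlsysim |
   | Owner | harvard-edge |
   | Repository name | cs249r_book |
   | Workflow name | mlsysim-pypi-publish.yml |
   | Environment name | pypi-mlsysim |

4. Save. No long-lived token is generated: at publish time GitHub's OIDC provider issues an identity token for the workflow run, and PyPI exchanges it for a short-lived upload token.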
This setup is per-project and per-workflow. It stays in place across workflow runs indefinitely; only re-do it if the workflow filename or environment name changes.
Post-release verification
From a clean venv (CI already ran this, but a human spot-check catches UX bugs):
python -m venv /tmp/release-verify && source /tmp/release-verify/bin/activate
pip install mlsysim==<just-released-version>
python -c "import mlsysim; print('OK', mlsysim.__version__)"
mlsysim eval Llama3_8B H100 --batch-size 32
deactivate
Open https://mlsysbook.ai/mlsysim/ in an incognito window; confirm:
- Version number on the site matches
- Navbar shows MLSys·im (mixed case, not MLSys·IM)
- Footer shows Code: Apache-2.0 · Docs: CC-BY-NC-SA 4.0
- Getting Started and Tutorials still load
Announce (optional)
- Bump the mlsysim/docs/config/announcement.yml banner if the release is user-visible (major features, breaking changes).
- Cross-post to the textbook newsletter / course channels.
Rollback
You cannot re-upload a PyPI version, even after deleting it. If a release has a critical bug:
- Yank the bad version on pypi.org (Manage → Versions → Yank). This hides it from pip install mlsysim but keeps existing pins working.
- Fix the bug on a PR to dev, bump to the next patch version (X.Y.Z → X.Y.(Z+1), e.g. 0.1.9 → 0.1.10), and merge.
- Tag mlsysim-v<next patch version> and push; the workflow ships the fix.
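The patch component is incremented as an integer, so 0.1.9 becomes 0.1.10. A small hypothetical helper for computing it, assuming plain X.Y.Z versions without leading zeros or pre-release suffixes:

```shell
#!/bin/sh
# Hypothetical helper: X.Y.Z -> X.Y.(Z+1). Rejects anything that is not plain X.Y.Z.
next_patch() {
    printf '%s\n' "$1" | grep -Eq '^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$' || {
        echo "not an X.Y.Z version: $1" >&2
        return 1
    }
    major_minor=${1%.*}   # strip the last ".Z"
    patch=${1##*.}        # keep only "Z"
    printf '%s.%s\n' "$major_minor" "$((patch + 1))"
}
```

For example: new=$(next_patch 0.1.2) && git tag -a "mlsysim-v$new" -m "MLSys·im $new", after the version bump has been merged to dev.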
Never force-push or amend a tag that's been pushed. Tags on origin are immutable release markers.
Manual fallback — only if the workflow is broken
If the workflow itself has a bug that prevents automated release and you
must ship, the legacy manual steps still work; they authenticate with a
scoped PyPI API token in your ~/.pypirc instead of OIDC. Fix the workflow
in a follow-up PR; don't normalize the manual path.
# From a clean checkout of the tagged commit:
cd mlsysim
make clean && make build
twine check dist/*
twine upload dist/* # requires ~/.pypirc with scoped token
gh release create mlsysim-vX.Y.Z \
--repo harvard-edge/cs249r_book \
--title "MLSys·im X.Y.Z" \
--notes-file RELEASE_NOTES_X.Y.Z.md \
dist/mlsysim-X.Y.Z-py3-none-any.whl \
dist/mlsysim-X.Y.Z.tar.gz
gh workflow run mlsysim-publish-live.yml -R harvard-edge/cs249r_book --ref dev
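Before running twine upload by hand, a guard like this hypothetical sketch can catch two easy manual-path mistakes: uploading from the wrong commit, and uploading stray artifacts left in dist/. It assumes the artifact filenames used in the gh release command above and that the release tag already exists locally. check_head_is_tag and check_dist are illustrative names.

```shell
#!/bin/sh
# Hypothetical pre-upload guards for the manual fallback path.

# Succeed only if HEAD is exactly the commit tagged mlsysim-v<version>.
check_head_is_tag() {
    head_sha=$(git rev-parse --verify --quiet HEAD) || return 1
    tag_sha=$(git rev-parse --verify --quiet "mlsysim-v$1^{commit}") || return 1
    [ "$head_sha" = "$tag_sha" ]
}

# Succeed only if <dist-dir> holds exactly the wheel and sdist for <version>.
# Usage: check_dist <version> [dist-dir]
check_dist() {
    version=$1
    dist=${2:-dist}
    for f in "$dist/mlsysim-$version-py3-none-any.whl" "$dist/mlsysim-$version.tar.gz"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
    done
    # Refuse stray artifacts (e.g. an older version) that twine upload dist/* would also send.
    count=$(find "$dist" -maxdepth 1 -type f | wc -l | tr -d ' ')
    if [ "$count" -ne 2 ]; then
        echo "expected 2 files in $dist, found $count" >&2
        return 1
    fi
    return 0
}
```

From inside mlsysim/ after make build, check_head_is_tag X.Y.Z && check_dist X.Y.Z would gate the twine upload step.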