mirror of https://github.com/harvard-edge/cs249r_book.git synced 2026-05-08 02:28:25 -05:00


Vijay Janapa Reddi 789509080d docs: contributor-facing versioning guide

Documents the unified versioning convention adopted across all
publishable artifacts: how to publish (operator), how to verify a
deployed release (consumer), what each tier means, where the contract
lives. Written last so it reflects what actually shipped, not what
was planned.

Pairs with shared/release/README.md (operator-internal contract) —
this file is for first-time contributors trying to understand "why
do I get five inputs when I run the publish workflow?".

2026-04-28 18:22:25 -04:00

8.3 KiB


Versioning

How releases work across the MLSysBook monorepo. One pattern for all publishable artifacts; one source of truth per release; one URL the operator visits to ship a release.

Why this exists

Before this convention landed, every project had its own version story:

StaffML built every publish as --release-id publish-live (a constant string), and the manifest drifted to 0.1.2-dev while releases/0.1.0/release.json said 0.1.0. Citations couldn't be trusted.
TinyTorch hand-edited the version in 6 files per release; one missed edit could silently downgrade the source.
Book had per-volume tags but no cross-coordination — Vol I and Vol II could disagree about which release shipped together.
MLSYSIM had a PyPI version, a paper, and a docs site that all claimed different things at any given time.
Kits/Labs/Instructors had no versions at all — just commit SHAs.

The unified convention fixes the operational problem ("did I ship the right thing? what's deployed right now?") and the citation problem ("v0.1.0 means exactly these bytes, forever").

What every release produces

Each <project>-publish-live workflow run produces:

A git tag — <project>-v<release_id>, e.g. staffml-v0.1.1.
A release-manifest.json at the deploy URL — e.g. https://mlsysbook.ai/staffml/release-manifest.json. Cacheable, readable by anyone, parseable by tools.
A draft GitHub Release — for the human-facing changelog with auto-generated commit list (Tier A also runs AI-enhanced notes).
A footer pill on the live site — small inline "v0.1.1 · Apr 28, 2026" element. Click to copy hash. Best-effort chrome.
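
The tag convention above can be sketched as a pair of helpers. This is a minimal illustration, not the code in scripts/version/release.py: the function names and the regular expression are assumptions, and the pattern assumes release_id is a bare MAJOR.MINOR.PATCH semver.

```python
import re

# Assumed shape: <project>-v<release_id>, where release_id is bare semver.
# The allowed characters for <project> are a guess for illustration only.
_TAG_RE = re.compile(
    r"^(?P<project>[a-z0-9][a-z0-9-]*)-v(?P<release_id>\d+\.\d+\.\d+)$"
)


def format_tag(project: str, release_id: str) -> str:
    """Build a git tag name such as 'staffml-v0.1.1'."""
    return f"{project}-v{release_id}"


def parse_tag(tag: str) -> tuple[str, str]:
    """Split a tag back into (project, release_id); raise on anything else."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a <project>-v<release_id> tag: {tag!r}")
    return match.group("project"), match.group("release_id")
```

Round-tripping a tag through `parse_tag` and `format_tag` is a cheap way to reject hand-typed tags that do not follow the convention.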

Tier A projects additionally produce:

A release_hash — Merkle-style SHA-256 over input bytes, recorded in the manifest. This is the citation anchor: a paper referencing "MLSysBook v0.1.1 (hash 6883e85)" is now reproducible.
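
One way a Merkle-style hash over a directory could be computed is sketched below. The exact byte layout used by release.py is not documented here, so the `path\0hash\n` encoding and the names `hash_file` and `hash_directory` are assumptions chosen for illustration.

```python
import hashlib
from pathlib import Path


def hash_file(path: Path) -> str:
    """SHA-256 of one file's bytes, read in chunks, as lowercase hex."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_directory(root: Path) -> tuple[str, list[dict[str, str]]]:
    """Return (top_hash, per-file index) for every file under root.

    Assumed scheme: hash each file, sort entries by relative POSIX path,
    then hash the concatenation of 'path NUL hash newline' records.
    """
    files = [
        {"path": p.relative_to(root).as_posix(), "hash": hash_file(p)}
        for p in root.rglob("*")
        if p.is_file()
    ]
    files.sort(key=lambda entry: entry["path"])  # order must be deterministic
    top = hashlib.sha256()
    for entry in files:
        top.update(f"{entry['path']}\0{entry['hash']}\n".encode("utf-8"))
    return top.hexdigest(), files
```

Sorting by path before combining is what makes the result independent of filesystem traversal order, which matters if the hash is to serve as a citation anchor.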

How to publish (operator)

Go to the project's "Publish (Live)" workflow in GitHub Actions.
Click "Run workflow" and fill in:
- release_type: patch (small fixes), minor (new content), major (breaking changes). Default patch.
- description: One-line summary. Becomes the release-notes title.
- site_only: Check this for CSS/copy-only redeploys that should NOT bump the release_id. Citation integrity requires that a given version map to fixed bytes, so re-tagging existing releases is forbidden.
- explicit_version: For non-incremental jumps (e.g. 0.1.x → 0.10.0 alongside a coordinated launch). Leave blank to auto-bump.
- confirm: Type PUBLISH. The workflow refuses to proceed if this isn't exact — stops accidental clicks.
Wait for the workflow to complete. The draft GitHub Release will appear at https://github.com/<repo>/releases. Review the auto-generated notes, then publish.
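
The way the workflow inputs might combine into a new release_id can be sketched as follows. This is a hypothetical reading of the inputs listed above, not the logic in _release-prepare.yml: the function name is invented, and the rule that explicit_version must be strictly greater than the prior version is an assumption.

```python
import re

_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def _parse(version: str) -> tuple[int, int, int]:
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a bare semver: {version!r}")
    return int(match.group(1)), int(match.group(2)), int(match.group(3))


def next_release_id(
    prior: str,
    release_type: str = "patch",
    *,
    site_only: bool = False,
    explicit_version: str = "",
    confirm: str = "",
) -> str:
    """Compute the release_id for a publish run from the prior release_id."""
    if confirm != "PUBLISH":  # exact match only, to stop accidental clicks
        raise ValueError("confirm must be exactly 'PUBLISH'")
    major, minor, patch = _parse(prior)
    if site_only:  # CSS/copy-only redeploy: the release_id does not change
        return prior
    if explicit_version:
        # Assumption: an explicit jump must move forward, never backward.
        if _parse(explicit_version) <= (major, minor, patch):
            raise ValueError("explicit_version must be greater than prior")
        return explicit_version
    if release_type == "patch":
        return f"{major}.{minor}.{patch + 1}"
    if release_type == "minor":
        return f"{major}.{minor + 1}.0"
    if release_type == "major":
        return f"{major + 1}.0.0"
    raise ValueError(f"unknown release_type: {release_type!r}")
```

Comparing versions as integer tuples rather than strings is what lets a jump such as 0.1.2 to 0.10.0 count as moving forward.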

How to verify a deployed version

# What release is live right now?
curl -s https://mlsysbook.ai/staffml/release-manifest.json | jq .

# What's in release 0.1.1?
gh release view staffml-v0.1.1
jq . releases/staffml-0.1.1/release.json

# Does the deployed manifest match what's tagged?
curl -s https://mlsysbook.ai/staffml/release-manifest.json | jq -r .releaseHash
git show staffml-v0.1.1:releases/staffml-0.1.1/release.json | jq -r .release_hash

If those two hashes differ, something between tag and deploy went wrong — file an issue.
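
The same comparison can be scripted. The sketch below assumes only what the commands above show: the deployed manifest exposes `releaseHash`, and the tagged release.json exposes `release_hash`. The helper names are hypothetical, and the two fetch functions simply wrap `urllib` and `git show`.

```python
import json
import subprocess
import urllib.request


def fetch_deployed_manifest(url: str) -> str:
    """Download the deployed release-manifest.json as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def read_tagged_release(tag: str, path: str) -> str:
    """Read release.json as committed at a tag, via `git show <tag>:<path>`."""
    result = subprocess.run(
        ["git", "show", f"{tag}:{path}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def hashes_match(manifest_text: str, release_text: str) -> bool:
    """Compare the deployed hash with the hash recorded at the tag."""
    deployed = json.loads(manifest_text).get("releaseHash")
    tagged = json.loads(release_text).get("release_hash")
    if not deployed or not tagged:
        raise ValueError("one of the documents has no release hash")
    return deployed == tagged


if __name__ == "__main__":
    manifest = fetch_deployed_manifest(
        "https://mlsysbook.ai/staffml/release-manifest.json"
    )
    release = read_tagged_release(
        "staffml-v0.1.1", "releases/staffml-0.1.1/release.json"
    )
    print("match" if hashes_match(manifest, release) else "MISMATCH")
```

Keeping the comparison in a pure function that takes two strings makes it usable in tests without network or repository access.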

Who lives in which tier

Project	Tier	Why
StaffML	A	Citable question bank; authors will reference v X.Y.Z in papers
TinyTorch	A	Educational framework; cited in syllabi and papers
Book Vol I	A	Textbook, multiple per-volume tag tracks (vol1-v*)
Book Vol II	A	Textbook, separate tag track (vol2-v*)
MLSYSIM	A	Site identity binds to PyPI + paper, all citable
Kits	B	Hardware deployment labs, iterate fast, not formally cited
Labs	B	Marimo notebooks, evolve constantly
Instructors	B	Instructor guide, lower citation stakes

Tier A and Tier B share the workflow UX. They differ in:

Hash detail: Tier A includes a per-file files: [{path, hash}] index in release.json (Merkle-ish: lets a consumer verify a single question without downloading the whole corpus). Tier B uses a flat SHA-256.
Release notes: Tier A runs AI-enhanced summarization. Tier B uses plain auto-generated commit lists.
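
The per-file index is what makes single-file verification possible for Tier A. A minimal sketch follows, assuming each entry's `hash` is the hex SHA-256 of that file's raw bytes. The function name and the sample path are hypothetical.

```python
import hashlib


def verify_file(release: dict, path: str, data: bytes) -> bool:
    """Check one file's bytes against the files index in a release.json.

    Assumes release["files"] is a list of {"path": ..., "hash": ...}
    entries and that each hash is a hex SHA-256 of the file contents.
    """
    index = {entry["path"]: entry["hash"] for entry in release.get("files", [])}
    if path not in index:
        raise KeyError(f"{path!r} is not listed in this release")
    return hashlib.sha256(data).hexdigest() == index[path]


# Usage with an in-memory release document (illustrative values only).
question = b'{"id": "q-001", "prompt": "example"}'
release = {
    "files": [
        {"path": "questions/q-001.json",
         "hash": hashlib.sha256(question).hexdigest()},
    ]
}
print(verify_file(release, "questions/q-001.json", question))
```

A Tier B release, which records only a flat hash, offers no equivalent: the consumer would have to rehash the full input set to check anything.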

Architecture in one paragraph

scripts/version/release.py is the canonical implementation of every versioning operation: hash a directory, compute next release_id, emit a release.json, emit a build-time manifest. The reusable workflow .github/workflows/_release-prepare.yml validates confirm, computes the new release_id from prior tag + bump, and outputs values the caller workflow uses to drive its own build. Each project's <project>-publish-live.yml calls _release-prepare.yml first, runs its existing build with the computed release_id, emits a manifest into the build output (so it deploys at the canonical URL), and then tags + creates a release. The shared/release/release-pill.html fragment fetches the manifest at runtime and renders the footer pill; each project's Quarto config sets a <meta name="release-manifest"> tag so the snippet finds the right URL.

Contract reference

Field	Source	Notes
`releaseId`	`_release-prepare.yml` output	Bare semver, no prefix
`releaseHash`	`release.py compute-hash`	64 hex chars; Merkle-ish (Tier A) or flat SHA-256 (Tier B)
`schemaVersion`	Manifest emitter caller	Project-internal; rarely changes
`tier`	Manifest emitter caller	`A` or `B`
`project`	Manifest emitter caller	Short identifier
`buildDate`	Manifest emitter	UTC ISO 8601 — set at emit time

The full schema is in scripts/version/schema.json.
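
As an illustration of the contract table, a manifest emitter could look roughly like this. It is a sketch, not the emitter in release.py: the function name is invented, the integer type for `schemaVersion` is an assumption, and the authoritative field rules are those in scripts/version/schema.json.

```python
import re
from datetime import datetime, timezone
from typing import Optional


def build_manifest(
    project: str,
    tier: str,
    release_id: str,
    release_hash: str,
    schema_version: int = 1,  # assumed type; the real schema may differ
    now: Optional[datetime] = None,
) -> dict:
    """Assemble a release-manifest dict using the contract's field names."""
    if tier not in ("A", "B"):
        raise ValueError("tier must be 'A' or 'B'")
    if re.fullmatch(r"\d+\.\d+\.\d+", release_id) is None:
        raise ValueError("releaseId must be bare semver, no prefix")
    if re.fullmatch(r"[0-9a-f]{64}", release_hash) is None:
        raise ValueError("releaseHash must be 64 lowercase hex chars")
    # Pass a timezone-aware datetime; the default is the current UTC time.
    stamp = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "releaseId": release_id,
        "releaseHash": release_hash,
        "schemaVersion": schema_version,
        "tier": tier,
        "project": project,
        "buildDate": stamp.strftime("%Y-%m-%dT%H:%M:%SZ"),  # UTC ISO 8601
    }
```

Validating at emit time keeps a malformed manifest from reaching the deploy URL, where cached copies would be harder to correct.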

Backfill

Pre-cutover releases (everything tagged before this convention landed) do NOT have a releases/<id>/release.json artifact. They keep their existing tags and behave normally; only releases cut from this PR onward get the full treatment. Backfilling historical releases is a separate, optional follow-up task.

Out of scope (today)

Periodic-Table — no publish workflow exists; we'd be adding versioning to a project that doesn't deploy. Establish publishing first.
Reusable build orchestration — the reusable workflow only does prepare. Each project keeps its own build/test/deploy because those steps are project-specific (Quarto vs Next.js vs Marimo). Trying to generalize the build itself produced too-thin or too-rigid abstractions in earlier drafts.
PyPI version unification — MLSYSIM's PyPI version comes from pyproject.toml and remains the canonical source for the package. The site's release identity is a separate (compatible) track. The paper's identity rides on the site release.
Cross-project release coordination — there's no "stamp all of MLSysBook with the same release_id" path. Each project bumps independently. The book's per-volume coordination is the only intentional exception.

Files at a glance

docs/VERSIONING.md                              ← this file
scripts/version/release.py                      ← Python helpers + CLI
scripts/version/schema.json                     ← JSON Schema for release.json
shared/release/README.md                        ← contract documentation
shared/release/release-pill.html                ← footer snippet
shared/release/release-card.html                ← about-page snippet
.github/workflows/_release-prepare.yml          ← reusable workflow

# Per-project: each <project>-publish-live.yml calls _release-prepare,
# emits a manifest before deploy, tags + drafts a release after.
.github/workflows/staffml-publish-live.yml
.github/workflows/tinytorch-publish-live.yml
.github/workflows/book-publish-live.yml
.github/workflows/mlsysim-publish-live.yml
.github/workflows/kits-publish-live.yml
.github/workflows/labs-publish-live.yml
.github/workflows/instructors-publish-live.yml

# Per-project Quarto/Next config: meta tag + pill include
interviews/staffml/src/components/Footer.tsx    ← StaffML uses build-time bake
tinytorch/quarto/_quarto.yml
book/quarto/config/_quarto-html-vol1.yml
book/quarto/config/_quarto-html-vol2.yml
mlsysim/docs/config/_quarto-html.yml
kits/config/_quarto-html.yml
labs/config/_quarto-html.yml
instructors/_quarto.yml
