mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-07 18:18:42 -05:00
Documents the unified versioning convention adopted across all publishable artifacts: how to publish (operator), how to verify a deployed release (consumer), what each tier means, where the contract lives. Written last so it reflects what actually shipped, not what was planned. Pairs with shared/release/README.md (operator-internal contract) — this file is for first-time contributors trying to understand "why do I get five inputs when I run the publish workflow?".
# Versioning

How releases work across the MLSysBook monorepo. One pattern for all
publishable artifacts; one source of truth per release; one URL the
operator visits to ship one.

## Why this exists

Before this convention landed, every project had its own version story:

- StaffML built every publish as `--release-id publish-live` (a constant
  string), and the manifest drifted to `0.1.2-dev` while
  `releases/0.1.0/release.json` said `0.1.0`. Citations couldn't be
  trusted.
- TinyTorch hand-edited the version in six files per release; one missed
  edit could silently downgrade the source.
- Book had per-volume tags but no cross-volume coordination, so Vol I and
  Vol II could disagree about which release shipped together.
- MLSYSIM had a PyPI version, a paper, and a docs site that all claimed
  different things at any given time.
- Kits/Labs/Instructors had no versions at all, just commit SHAs.

The unified convention fixes the operational problem ("did I ship the
right thing? what's deployed right now?") and the citation problem
("v0.1.0 means *exactly these bytes*, forever").

## What every release produces

Each `<project>-publish-live` workflow run produces:

1. **A git tag**: `<project>-v<release_id>`, e.g. `staffml-v0.1.1`.
2. **A `release-manifest.json`** at the deploy URL, e.g.
   `https://mlsysbook.ai/staffml/release-manifest.json`. Cacheable,
   readable by anyone, parseable by tools.
3. **A draft GitHub Release**: the human-facing changelog with an
   auto-generated commit list (Tier A also runs AI-enhanced notes).
4. **A footer pill on the live site**: a small inline "v0.1.1 · Apr 28,
   2026" element. Click to copy the hash. Best-effort chrome.

Tier A projects additionally produce:

5. **A `release_hash`**: a Merkle-style SHA-256 over the input bytes,
   recorded in the manifest. This is the citation anchor: a paper
   referencing "MLSysBook v0.1.1 (hash 6883e85)" is now reproducible.

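To make the idea of a Merkle-style hash concrete, here is a minimal sketch: hash each file, then hash the sorted index of `(path, hash)` pairs. The function names and the exact byte encoding of the index are assumptions for illustration; the real `scripts/version/release.py` may order, encode, or combine entries differently.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 of one file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def merkle_style_hash(root: Path):
    """Hash every file under root, then hash the sorted (path, hash) index.

    Assumption: this two-level scheme is illustrative only and is not
    taken from release.py.
    Returns (top_level_hash, files_index).
    """
    files = [
        {"path": p.relative_to(root).as_posix(), "hash": file_sha256(p)}
        for p in root.rglob("*")
        if p.is_file()
    ]
    # Sort by POSIX path so the result does not depend on directory order.
    files.sort(key=lambda entry: entry["path"])

    top = hashlib.sha256()
    for entry in files:
        top.update(f"{entry['path']}\0{entry['hash']}\n".encode("utf-8"))
    return top.hexdigest(), files
```

Because the top-level digest covers every path and per-file hash, changing any single input byte changes the release hash, which is the property the citation anchor relies on.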
## How to publish (operator)

1. Go to the project's "Publish (Live)" workflow in GitHub Actions.
2. Click "Run workflow" and fill in:
   - **release_type**: `patch` (small fixes), `minor` (new content),
     `major` (breaking changes). Default `patch`.
   - **description**: One-line summary. Becomes the release-notes title.
   - **site_only**: Check this for CSS/copy-only redeploys that
     should NOT bump the release_id. Citation integrity demands that a
     given version maps to fixed bytes, so re-tagging existing releases
     is forbidden.
   - **explicit_version**: For non-incremental jumps (e.g. 0.1.x →
     0.10.0 alongside a coordinated launch). Leave blank to auto-bump.
   - **confirm**: Type `PUBLISH`. The workflow refuses to proceed unless
     this matches exactly, which stops accidental clicks.
3. Wait for the workflow to complete. The draft GitHub Release will
   appear at `https://github.com/<repo>/releases`. Review the
   auto-generated notes, then publish.

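The way the five inputs interact can be sketched as a small function. This is a hypothetical helper, not the workflow's actual code: the name `next_release_id`, the error messages, and the precedence of `site_only` over `explicit_version` are all assumptions.

```python
import re

# Bare semver, no prefix (the documented `releaseId` shape).
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def next_release_id(prior, release_type="patch", explicit_version="",
                    site_only=False, confirm=""):
    """Hypothetical sketch of how a publish run picks its release_id."""
    if confirm != "PUBLISH":
        raise ValueError("confirm must be exactly 'PUBLISH'")
    match = SEMVER.match(prior)
    if match is None:
        raise ValueError(f"prior release_id is not bare semver: {prior!r}")
    if site_only:
        # CSS/copy-only redeploy: keep the existing release_id.
        # (Assumption: site_only wins over explicit_version.)
        return prior
    if explicit_version:
        if SEMVER.match(explicit_version) is None:
            raise ValueError(f"explicit_version is not bare semver: {explicit_version!r}")
        return explicit_version
    major, minor, patch = (int(part) for part in match.groups())
    if release_type == "major":
        return f"{major + 1}.0.0"
    if release_type == "minor":
        return f"{major}.{minor + 1}.0"
    if release_type == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release_type: {release_type!r}")
```

For example, `next_release_id("0.1.1", "minor", confirm="PUBLISH")` returns `"0.2.0"`, while any `confirm` value other than `PUBLISH` raises before a version is computed.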
## How to verify a deployed version

```bash
# What release is live right now?
curl -s https://mlsysbook.ai/staffml/release-manifest.json | jq .

# What's in release 0.1.1?
gh release view staffml-v0.1.1
jq . releases/staffml-0.1.1/release.json

# Does the deployed manifest match what's tagged?
curl -s https://mlsysbook.ai/staffml/release-manifest.json | jq -r .releaseHash
git show staffml-v0.1.1:releases/staffml-0.1.1/release.json | jq -r .release_hash
```

If those two hashes differ, something between tag and deploy went
wrong; file an issue.

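The same comparison could be scripted once both JSON documents are loaded. The sketch below assumes only the two key spellings that appear in the commands above (`releaseHash` in the deployed manifest, `release_hash` in `release.json`); the function names are hypothetical.

```python
def extract_hash(doc):
    """Return the release hash from either document shape.

    Assumption: the deployed manifest uses `releaseHash` and the tagged
    release.json uses `release_hash`, as in the shell commands above.
    """
    for key in ("releaseHash", "release_hash"):
        value = doc.get(key)
        if isinstance(value, str) and value:
            return value
    raise KeyError("no release hash field found")


def deployed_matches_tag(manifest, release):
    """True when the deployed manifest and the tagged release agree."""
    return extract_hash(manifest) == extract_hash(release)
```

A caller would pass in the parsed output of the `curl` and `git show` commands and file an issue when the function returns `False`.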
## Who lives in which tier

| Project | Tier | Why |
|---|---|---|
| StaffML | A | Citable question bank; authors will reference vX.Y.Z in papers |
| TinyTorch | A | Educational framework; cited in syllabi and papers |
| Book Vol I | A | Textbook; one of the per-volume tag tracks (`vol1-v*`) |
| Book Vol II | A | Textbook; separate tag track (`vol2-v*`) |
| MLSYSIM | A | Site identity binds to PyPI + paper, all citable |
| Kits | B | Hardware deployment labs; iterate fast, not formally cited |
| Labs | B | Marimo notebooks; evolve constantly |
| Instructors | B | Instructor guide; lower citation stakes |

Tier A and Tier B share the workflow UX. They differ in:

- **Hash detail**: Tier A includes a per-file `files: [{path, hash}]`
  index in `release.json` (Merkle-ish: lets a consumer verify a single
  question without downloading the whole corpus). Tier B uses a flat
  SHA-256.
- **Release notes**: Tier A runs AI-enhanced summarization. Tier B
  uses plain auto-generated commit lists.

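The per-file index is what makes single-file verification possible. A minimal sketch follows, assuming each `hash` in the `files` index is the plain hex SHA-256 of that file's bytes; the source does not confirm this, and `verify_file` is a hypothetical name.

```python
import hashlib


def verify_file(release, path, content):
    """Check one file's bytes against the Tier A per-file index.

    Assumption: release["files"] is a list of {"path", "hash"} entries
    whose hash is the hex SHA-256 of the raw file bytes.
    """
    index = {entry["path"]: entry["hash"] for entry in release.get("files", [])}
    expected = index.get(path)
    if expected is None:
        # Unknown path: it is not part of this release.
        return False
    return hashlib.sha256(content).hexdigest() == expected
```

A consumer holding only `release.json` and one downloaded file can call `verify_file(release, "questions/q1.json", data)` (a made-up path) without fetching the rest of the corpus.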
## Architecture in one paragraph

`scripts/version/release.py` is the canonical implementation of every
versioning operation: hash a directory, compute the next release_id, emit
a release.json, emit a build-time manifest. The reusable workflow
`.github/workflows/_release-prepare.yml` validates `confirm`, computes
the new release_id from the prior tag plus the bump, and outputs values
the caller workflow uses to drive its own build. Each project's
`<project>-publish-live.yml` calls `_release-prepare.yml` first, runs
its existing build with the computed release_id, emits a manifest
into the build output (so it deploys at the canonical URL), and then
tags the commit and drafts a release. The `shared/release/release-pill.html`
fragment fetches the manifest at runtime and renders the footer pill;
each project's Quarto config sets a `<meta name="release-manifest">`
tag so the snippet finds the right URL.

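Finding the "prior tag" is the step that anchors the bump. One way it could work, given a list of tag names such as the output of `git tag --list`, is sketched below; `latest_release_id` is a hypothetical helper and the real workflow may select the prior tag differently.

```python
import re
from typing import List, Optional


def latest_release_id(tags: List[str], project: str) -> Optional[str]:
    """Return the highest `<project>-v<release_id>` version, or None.

    Assumption: only bare X.Y.Z release_ids count; anything else
    (other projects, pre-release suffixes) is ignored.
    """
    pattern = re.compile(rf"^{re.escape(project)}-v(\d+)\.(\d+)\.(\d+)$")
    versions = []
    for tag in tags:
        match = pattern.match(tag)
        if match:
            versions.append(tuple(int(part) for part in match.groups()))
    if not versions:
        return None
    # Compare numerically so 0.1.10 sorts above 0.1.9.
    return ".".join(str(part) for part in max(versions))
```

Comparing integer tuples rather than strings matters here: a plain string sort would rank `0.1.9` above `0.1.10`.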
## Contract reference

| Field | Source | Notes |
|---|---|---|
| `releaseId` | `_release-prepare.yml` output | Bare semver, no prefix |
| `releaseHash` | `release.py compute-hash` | 64 hex chars; Merkle-ish (Tier A) or flat SHA-256 (Tier B) |
| `schemaVersion` | Manifest emitter caller | Project-internal; rarely changes |
| `tier` | Manifest emitter caller | `A` or `B` |
| `project` | Manifest emitter caller | Short identifier |
| `buildDate` | Manifest emitter | UTC ISO 8601; set at emit time |

The full schema is in `scripts/version/schema.json`.

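As an illustration of the contract, the sketch below assembles a manifest with the six fields from the table and enforces the constraints listed in the Notes column. It is not the real emitter: `build_manifest`, the integer type of `schemaVersion`, and the exact timestamp format are assumptions, and `scripts/version/schema.json` remains the authority.

```python
import re
from datetime import datetime, timezone


def build_manifest(project, release_id, release_hash, tier, schema_version=1):
    """Hypothetical emitter for a release-manifest.json payload."""
    if re.fullmatch(r"\d+\.\d+\.\d+", release_id) is None:
        raise ValueError("releaseId must be bare semver, no prefix")
    if re.fullmatch(r"[0-9a-f]{64}", release_hash) is None:
        raise ValueError("releaseHash must be 64 lowercase hex chars")
    if tier not in ("A", "B"):
        raise ValueError("tier must be 'A' or 'B'")
    return {
        "project": project,
        "releaseId": release_id,
        "releaseHash": release_hash,
        "schemaVersion": schema_version,  # assumed to be an integer
        "tier": tier,
        # UTC ISO 8601, set at emit time.
        "buildDate": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Validating at emit time means a malformed hash or a prefixed version such as `v0.1.1` fails the build instead of reaching the deployed manifest.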
## Backfill

Pre-cutover releases (everything tagged before this convention landed)
do NOT have a `releases/<id>/release.json` artifact. They keep their
existing tags and behave normally; only releases cut after this
convention landed get the full treatment. Backfilling historical
releases is a separate, optional follow-up task.

## Out of scope (today)

- **Periodic-Table**: no publish workflow exists, so we'd be adding
  versioning to a project that doesn't deploy. Establish publishing
  first.
- **Reusable build orchestration**: the reusable workflow only handles
  the prepare step. Each project keeps its own build/test/deploy because
  those steps are project-specific (Quarto vs Next.js vs Marimo). Trying
  to generalize the build itself produced too-thin or too-rigid
  abstractions in earlier drafts.
- **PyPI version unification**: MLSYSIM's PyPI version comes from
  `pyproject.toml` and remains the canonical source for the package.
  The site's release identity is a separate (compatible) track. The
  paper's identity rides on the site release.
- **Cross-project release coordination**: there's no "stamp all of
  MLSysBook with the same release_id" path. Each project bumps
  independently. The book's per-volume coordination is the only
  intentional exception.

## Files at a glance

```
docs/VERSIONING.md                        ← this file
scripts/version/release.py                ← Python helpers + CLI
scripts/version/schema.json               ← JSON Schema for release.json
shared/release/README.md                  ← contract documentation
shared/release/release-pill.html          ← footer snippet
shared/release/release-card.html          ← about-page snippet
.github/workflows/_release-prepare.yml    ← reusable workflow

# Per-project: each <project>-publish-live.yml calls _release-prepare,
# emits a manifest before deploy, tags + drafts a release after.
.github/workflows/staffml-publish-live.yml
.github/workflows/tinytorch-publish-live.yml
.github/workflows/book-publish-live.yml
.github/workflows/mlsysim-publish-live.yml
.github/workflows/kits-publish-live.yml
.github/workflows/labs-publish-live.yml
.github/workflows/instructors-publish-live.yml

# Per-project Quarto/Next config: meta tag + pill include
interviews/staffml/src/components/Footer.tsx   ← StaffML uses build-time bake
tinytorch/quarto/_quarto.yml
book/quarto/config/_quarto-html-vol1.yml
book/quarto/config/_quarto-html-vol2.yml
mlsysim/docs/config/_quarto-html.yml
kits/config/_quarto-html.yml
labs/config/_quarto-html.yml
instructors/_quarto.yml
```