mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-07 02:03:55 -05:00
[PR #1409] [MERGED] PR-5: Cutover skeletons (rollback-legacy + redirect map + sitemap aggregator) #6529
📋 Pull Request Information
Original PR: https://github.com/harvard-edge/cs249r_book/pull/1409
Author: @profvjreddi
Created: 4/19/2026
Status: ✅ Merged
Merged: 4/19/2026
Merged by: @profvjreddi
Base: dev ← Head: release-prep/cutover-skeletons
📝 Commits (3)
d71dd6f feat(launch): rollback-legacy.sh — snapshot + restore the gh-pages root
01dbce4 feat(seo): redirect-map skeleton + HTML-stub generator
cc9a19c feat(seo): aggregate per-subsite sitemaps into mlsysbook.ai/sitemap.xml
📊 Changes
5 files changed (+834 additions, -0 deletions)
➕ .github/workflows/infra-build-sitemap.yml (+155 -0)
➕ shared/config/redirect-map.json (+70 -0)
➕ shared/scripts/build-redirects.py (+204 -0)
➕ shared/scripts/build-sitemap.py (+172 -0)
➕ shared/scripts/rollback-legacy.sh (+233 -0)
📄 Description
Summary
Three thin scripts/configs that the actual cutover (legacy mlsysbook.ai → unified landing) will rely on. These are skeletons only: no behavior change yet, and no workflow that runs them in CI. This lets us review the shape now and wire up the runners as part of the launch sequence rather than the prep sequence.
What's in this PR
1. shared/scripts/rollback-legacy.sh: gh-pages root snapshot/restore.
The cutover replaces the gh-pages root content (currently the legacy single-volume book) with the new unified landing. If something breaks post-cutover, we need to be able to restore the legacy site fast. The script supports two operations:
- snapshot: clones gh-pages and archives the current root content (everything except subsite directories) to a timestamped tarball under shared/_snapshots/. Run this BEFORE the cutover.
- restore <tarball>: clones gh-pages, replaces the root content with the snapshot, and force-pushes. Run this if the cutover needs reverting.
Subsite directories (vol1/, vol2/, tinytorch/, kits/, labs/, slides/, instructors/, mlsysim/, staffml/, site/, assets/) are explicitly preserved by both operations: they have their own publish workflows and aren't part of the legacy root.
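The snapshot selection rule can be sketched in Python as follows. This is an illustration of the "everything except subsite directories" rule only, not the contents of rollback-legacy.sh (which is a bash script). The function names, the tarball naming scheme, and the exclusion of .git are assumptions made for the example.

```python
import tarfile
import time
from pathlib import Path

# Subsite directories named in the PR description; these are never archived.
SUBSITE_DIRS = {
    "vol1", "vol2", "tinytorch", "kits", "labs", "slides",
    "instructors", "mlsysim", "staffml", "site", "assets",
}


def root_entries(checkout: Path) -> list[Path]:
    """Top-level entries that belong to the legacy root.

    Assumption: .git is skipped as well, since it is repository metadata
    rather than site content.
    """
    return sorted(
        p for p in checkout.iterdir()
        if p.name not in SUBSITE_DIRS and p.name != ".git"
    )


def snapshot(checkout: Path, out_dir: Path) -> Path:
    """Archive the legacy root content into a timestamped tarball.

    The file name pattern is hypothetical; the PR only says "timestamped".
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    tarball = out_dir / f"gh-pages-root-{stamp}.tar.gz"
    with tarfile.open(tarball, "w:gz") as tar:
        for entry in root_entries(checkout):
            # arcname keeps paths relative to the gh-pages root.
            tar.add(str(entry), arcname=entry.name)
    return tarball
```

Called as snapshot(Path("gh-pages-checkout"), Path("shared/_snapshots")), this would write one tarball holding only root-level content, so a later restore cannot overwrite any subsite.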
2. shared/config/redirect-map.json + shared/scripts/build-redirects.py.
The legacy book has dozens of indexed deep-links into chapters that need 301 redirects to their Volume I equivalents (and a handful that move to Volume II). Approach:
- redirect-map.json is the source of truth. Each entry has from (legacy path), to (canonical new URL), reason (why the URL changed; documentation for future maintainers), and status (active, pending, or archive).
- build-redirects.py reads the JSON and generates an HTML stub at each from path, containing both a <meta http-equiv="refresh"> and a <link rel="canonical">. GitHub Pages doesn't support real 301s, so an HTML-stub redirect is the standard pattern.
populated. Will be expanded based on actual analytics referrer data before cutover.
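A minimal sketch of the stub-generation idea, under stated assumptions: it treats the redirect map as a flat JSON list of entries and emits stubs only for entries whose status is active. Neither detail is confirmed by this description, and the real build-redirects.py may differ. The function names, the stub markup beyond the two tags named above, and the example.org URLs in the usage note are hypothetical.

```python
import html
import json

# Status values listed in the PR description.
VALID_STATUSES = {"active", "pending", "archive"}

# Both tags from the description: a canonical link and a meta refresh.
STUB_TEMPLATE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Redirecting</title>
<link rel="canonical" href="{to}">
<meta http-equiv="refresh" content="0; url={to}">
</head>
<body>
<p>This page has moved to <a href="{to}">{to}</a>.</p>
</body>
</html>
"""


def render_stub(entry: dict) -> str:
    """Render one redirect stub; the target URL is escaped for attribute use."""
    if entry["status"] not in VALID_STATUSES:
        raise ValueError(f"unknown status: {entry['status']!r}")
    return STUB_TEMPLATE.format(to=html.escape(entry["to"], quote=True))


def build_stubs(redirect_map_json: str) -> dict[str, str]:
    """Map each active entry's 'from' path to its stub HTML.

    Assumptions: the JSON is a flat list of entries, and only 'active'
    entries produce stubs. Every entry is still validated.
    """
    stubs = {}
    for entry in json.loads(redirect_map_json):
        stub = render_stub(entry)  # raises on an unknown status
        if entry["status"] == "active":
            stubs[entry["from"]] = stub
    return stubs
```

For example, an entry such as {"from": "/contents/intro.html", "to": "https://example.org/vol1/intro.html", "reason": "moved to Volume I", "status": "active"} would yield one stub keyed by /contents/intro.html, while a pending entry would be validated but skipped.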
3. shared/scripts/build-sitemap.py + .github/workflows/infra-build-sitemap.yml.
Quarto generates a sitemap.xml per subsite. Search engines and LLM crawlers expect a single sitemap (or sitemap index) at mlsysbook.ai/sitemap.xml. Approach:
- build-sitemap.py walks gh-pages, finds all per-subsite sitemap.xml files, and generates a sitemap index at the root that lists each one. The sitemap index format (xmlns http://www.sitemaps.org/schemas/sitemap/0.9) is the standard way to handle multi-property sites without merging URL lists (which would break the per-property lastmod timestamps).
- infra-build-sitemap.yml runs the script on workflow_dispatch and on a daily cron. Skeleton: not yet referenced by any publish workflow's post-deploy step.
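The sitemap-index approach might look roughly like the sketch below. It assumes each subsite's sitemap.xml sits exactly one directory below the site root and omits the optional lastmod element from the index entries. Both are simplifications for illustration, and the function names are hypothetical rather than taken from build-sitemap.py.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def find_subsite_sitemaps(site_root: Path) -> list[Path]:
    """Per-subsite sitemap.xml files, assumed to be one level below the root."""
    return sorted(site_root.glob("*/sitemap.xml"))


def build_sitemap_index(site_root: Path, base_url: str) -> str:
    """Return a sitemap index that lists each subsite sitemap by URL.

    The per-subsite files are referenced, not merged, so their own
    lastmod values stay untouched.
    """
    # Serialize with a default xmlns instead of an ns0: prefix.
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for path in find_subsite_sitemaps(site_root):
        node = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        loc = ET.SubElement(node, f"{{{SITEMAP_NS}}}loc")
        rel = path.relative_to(site_root).as_posix()
        loc.text = f"{base_url.rstrip('/')}/{rel}"
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Writing the returned string to sitemap.xml at the site root, with base_url set to https://mlsysbook.ai, would give crawlers a single entry point while each subsite keeps publishing its own sitemap.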
What this PR is NOT
- PR/runbook).
- by design; rollback is a deliberate decision).
- build-redirects.py into any deploy workflow (that happens in the launch PR, after the redirect map is filled in from referrer analytics).
- build-sitemap.py into any publish workflow (cron-only for now; switching it to post-deploy is a launch task).
Risk surface
- config references the new files. Worst case: dead code.
- shellcheck clean for bash, python3 -m py_compile clean for Python).
- vars.*/secrets.* on a pull_request trigger).
Test plan
- python3 shared/scripts/build-redirects.py --dry-run produces the expected stubs from the current skeleton map.
- python3 shared/scripts/build-sitemap.py --dry-run against a built _site/ produces a syntactically valid sitemap index (validate with xmllint --noout).
- infra-build-sitemap.yml once after merge to confirm the workflow lifts off (exiting early because there is no gh-pages branch on the head it is dispatched from is OK; it queries the actual gh-pages branch).
Followup (launch PR)
- redirect-map.json from actual analytics referrer data.
- build-redirects.py into the site-publish-live.yml (or a dedicated infra-publish-redirects.yml) so stubs get re-generated on every site publish.
- infra-build-sitemap.yml from cron-only to also being triggered post-publish from each *-publish-live.yml so the index stays fresh.
- infra (or to site-publish-live.yml's pre-deploy gate) that runs rollback-legacy.sh snapshot before the cutover deploy.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.