[PR #1406] [MERGED] PR-3: Scripts, audits, cleanup (build stamp, PDF dropdown, 404s, mirror guard, dedup, RELEASE-PREP) #8173

Closed
opened 2026-04-27 17:28:38 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/harvard-edge/cs249r_book/pull/1406
Author: @profvjreddi
Created: 4/19/2026
Status: Merged
Merged: 4/19/2026
Merged by: @profvjreddi

Base: dev ← Head: release-prep/scripts-audits-cleanup


📝 Commits (6)

  • ae4159f feat(footer): build-time "last updated" stamp
  • 86fdc91 feat(navbar): expose paper.pdf for TinyTorch / MLSys·im / StaffML
  • 0fc4a56 feat(404): per-site 404 pages for slides / instructors / unified site
  • 3ab7d99 ci(precommit): block subsite-mirror drift on shared assets
  • 0d565a2 refactor(audit): duplicate-file finder + clean up obvious leftover
  • ab10dca docs(release-prep): handoff notes covering all five PR groupings

📊 Changes

14 files changed (+792 additions, -120 deletions)

📝 .github/workflows/kits-publish-live.yml (+3 -0)
📝 .gitignore (+4 -0)
📝 .pre-commit-config.yaml (+17 -0)
RELEASE-PREP.md (+328 -0)
📝 book/quarto/config/shared/html/footer-common.yml (+2 -1)
instructors/404.qmd (+31 -0)
📝 shared/config/footer-site.yml (+2 -1)
📝 shared/config/navbar-common.yml (+19 -0)
📝 shared/config/site-head.html (+17 -0)
shared/scripts/find-duplicates.py (+235 -0)
shared/scripts/inject-build-stamp.sh (+71 -0)
site/404.qmd (+34 -0)
slides/404.qmd (+29 -0)
tinytorch/scripts/cleanup_repo_history.sh (+0 -118)

📄 Description

Summary

Quality-of-life and hygiene improvements identified during the
release-prep review. Independent of PR-1 / PR-2 — can merge in any
order. Includes the comprehensive handoff document covering all five
release-prep PRs.

What's in this PR

Build-time "last updated" footer stamp. New
shared/scripts/inject-build-stamp.sh finds the placeholder
<!-- MLSB_BUILD_STAMP --> in any built HTML page and replaces it
with <span class="mlsb-build-stamp">Last updated YYYY-MM-DD · <SiteLabel> · <CommitSHA></span>. Style block (small, dark-mode
aware) inlined into shared/config/site-head.html. Placeholder
already added to book/quarto/config/shared/html/footer-common.yml
and shared/config/footer-site.yml. First wired into
kits-publish-live.yml as a reference implementation; other publish
workflows can adopt the same step in followup.
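
The placeholder replacement described above can be sketched in Python. This is not the contents of inject-build-stamp.sh (which is a shell script and is not shown in this PR description); the function names, the parameters, and the choice to pass the commit SHA and date in explicitly are assumptions made for illustration.

```python
import datetime
import pathlib

# Placeholder and span format are taken from the PR description.
PLACEHOLDER = "<!-- MLSB_BUILD_STAMP -->"


def build_stamp(site_label: str, commit_sha: str, date: str) -> str:
    """Return the span that replaces the placeholder."""
    return (
        '<span class="mlsb-build-stamp">'
        f"Last updated {date} · {site_label} · {commit_sha}</span>"
    )


def inject_build_stamp(site_dir, site_label, commit_sha, date=None) -> int:
    """Rewrite every *.html under site_dir; return how many files changed.

    Hypothetical helper: the real script may discover the SHA and date itself.
    """
    date = date or datetime.date.today().isoformat()
    stamp = build_stamp(site_label, commit_sha, date)
    changed = 0
    for page in pathlib.Path(site_dir).rglob("*.html"):
        html = page.read_text(encoding="utf-8")
        if PLACEHOLDER not in html:
            continue  # page has not adopted the placeholder: leave it untouched
        page.write_text(html.replace(PLACEHOLDER, stamp), encoding="utf-8")
        changed += 1
    return changed
```

Pages without the placeholder are never rewritten, which mirrors the no-op behaviour described under "Risk surface".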

Paper.pdf links surfaced in the navbar. Added direct entries
under the Build dropdown for TinyTorch and MLSys·im (where each
property generates a paper.pdf), and under Prepare for StaffML.
Single source of truth in shared/config/navbar-common.yml; all
HTML builds inherit. Closes the gap where the rendered papers had no
discovery surface.

Per-site 404 pages. Quarto/Next subsites that lacked a
maintained 404 page now have one tailored to their context with
relevant navigation back into the ecosystem:

  • slides/404.qmd
  • instructors/404.qmd
  • site/404.qmd (unified landing — broadest cross-property nav)

Pre-commit guard against shared-mirror drift. Quarto's
resource-copy step preserves symlinks instead of dereferencing them,
so we keep real-file copies of certain shared assets (subscribe modal
JS in particular) per subsite. Without a guard, the canonical and the
mirrors silently diverge — most common symptom is the wrong subscribe
modal rendering on one subsite. New check-shared-mirrors hook runs
bash shared/scripts/sync-mirrors.sh --check on every commit
(always_run: true because mirrors can drift via deletion of the
canonical, not just by editing the canonical itself).
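
As a rough illustration of what a drift check of this kind has to detect, the sketch below compares each mirror against its canonical file by hash. It is not the logic of sync-mirrors.sh --check, which is not shown here; the `mirrors` mapping, the function names, and the message wording are all hypothetical.

```python
import hashlib
import pathlib


def sha256_of(path: pathlib.Path) -> str:
    """Hash a file's bytes so copies can be compared cheaply."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_drift(mirrors: dict) -> list:
    """Return a list of problems; an empty list means nothing has drifted.

    `mirrors` maps a canonical path to the paths of its real-file copies
    (an assumed data shape, not the project's actual configuration).
    """
    problems = []
    for canonical, copies in mirrors.items():
        canonical = pathlib.Path(canonical)
        if not canonical.is_file():
            # Deleting the canonical is also drift, which is why the hook
            # cannot rely on "canonical was edited" to trigger.
            problems.append(f"missing canonical: {canonical}")
            continue
        expected = sha256_of(canonical)
        for copy in map(pathlib.Path, copies):
            if not copy.is_file():
                problems.append(f"missing mirror: {copy}")
            elif sha256_of(copy) != expected:
                problems.append(f"drifted mirror: {copy}")
    return problems
```

A hook wrapper would exit non-zero whenever the returned list is non-empty, which is what blocks the commit.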

Duplicate-file audit script + initial cleanup.
shared/scripts/find-duplicates.py walks the chosen subsite roots,
hashes files, groups by hash, and reports unintended duplicates
(known mirrors are excluded via an allowlist; symlinks are skipped).
The first run found that
tinytorch/scripts/cleanup_repo_history.sh and
tinytorch/tools/maintenance/cleanup_history.sh are byte-for-byte
identical; the former is removed in this PR. .gitignore updated to
exclude .audit/ / _audit/ output dirs.
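
The walk, hash, and group steps can be sketched as follows. This is a minimal illustration, not the actual find-duplicates.py; the function name, the `allowlist` parameter shape (a set of POSIX-style paths), and the use of SHA-256 are assumptions.

```python
import collections
import hashlib
import os
import pathlib


def find_duplicates(roots, allowlist=frozenset()):
    """Group regular files under `roots` by content hash.

    Returns a sorted list of groups, each a sorted list of paths whose
    bytes are identical. Symlinks and allowlisted paths are skipped.
    """
    by_hash = collections.defaultdict(list)
    for root in roots:
        # os.walk does not descend into symlinked directories by default.
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = pathlib.Path(dirpath, name)
                if path.is_symlink() or path.as_posix() in allowlist:
                    continue
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                by_hash[digest].append(path.as_posix())
    return sorted(sorted(group) for group in by_hash.values() if len(group) > 1)
```

Note that this sketch treats all empty files as duplicates of one another; a real audit would probably want to exclude them or list them separately.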

RELEASE-PREP.md handoff document. Single document organizing
all 19 release-prep commits into the five logical PR groupings
(safety net, visual polish, scripts/audits/cleanup, TinyTorch prep,
cutover skeletons), with per-PR rationale, deferred items, and
local verification notes. Living document — will be updated as PRs
land and merge.

Risk surface

  • The build-stamp injector mutates built HTML at deploy time; if the
    placeholder isn't present (which is the case for any property that
    hasn't adopted it), the script is a no-op (safe-by-default).
  • The check-shared-mirrors hook refuses commits when mirrors have
    drifted. Worst case for an accidentally drifted file: the developer
    runs bash shared/scripts/sync-mirrors.sh to re-sync, then commits.
    The hook itself is idempotent.
  • The dedup audit ran clean on its second pass (only unintended dup
    was the script removed here).

Test plan

  • CI: book-validate-dev, kits-validate-dev, etc. all pass
    (no schema regressions).
  • Local: pre-commit run check-shared-mirrors --all-files exits
    0 on a clean checkout.
  • Local: python3 shared/scripts/find-duplicates.py produces a
    report with no flagged duplicates.
  • Local on built HTML: bash shared/scripts/inject-build-stamp.sh kits/_site Kits rewrites the placeholder.
  • Manual: on a built site, the navbar Build dropdown shows the
    TinyTorch Paper and MLSys·im Paper entries, and the Prepare
    dropdown shows the StaffML Paper entry.

Followup

  • Wire inject-build-stamp.sh into the other *-publish-live.yml
    workflows (TinyTorch, labs, slides, instructors, mlsysim, site,
    staffml, book). Currently only kits is wired as a reference.
  • Periodic find-duplicates.py workflow (cron, weekly) so the
    audit doesn't bit-rot.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-27 17:28:38 -05:00
Reference: github-starred/cs249r_book#8173