mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-06 09:38:33 -05:00
dev
14 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
e4069996d8 |
fix(theme): bridge dark mode to .quarto-dark body class
The site/about/about.css, site/community/community.css, and
site/newsletter/newsletter.css custom styles key dark-mode CSS
variables off `.quarto-dark { ... }` (and descendant chains like
`.quarto-dark .opening-lead`). That class is added by Quarto only when
the user clicks the toggle button — never on the OS-preferred dark
path. Result: a visitor whose browser is set to dark mode but who has
not toggled the site explicitly gets the dark <html data-bs-theme=dark>
background but the LIGHT mode `--ab-text: #1a1a2e` text, producing the
classic "Open by design." invisibility on /about/license.html and a
slate of low-contrast hero/body text across about/community/newsletter
pages.
Patch the central `apply()` so every theme transition — initial paint,
toggle click, OS-pref change, cross-tab storage event — also mirrors
the scheme onto `document.body.classList`. A MutationObserver covers
the FOUC window where <body> has not yet parsed at first apply().
This is intentionally a JS bridge rather than a CSS rewrite: the three
custom-CSS files have ~15 selectors keyed on `.quarto-dark` between
them, including descendant chains where a naive find/replace would
break selector grouping (`.quarto-dark .foo {...}` cannot become
`[data-bs-theme="dark"], .quarto-dark .foo {...}` — that comma turns
the first selector standalone). One JS line covers all of them and
keeps the CSS files unchanged.
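The selector-grouping pitfall described above can be illustrated with a small Python sketch. The helper names (`naive_rewrite`, `safe_rewrite`, `split_group`) are hypothetical and are not part of the repository. Splitting on commas is a simplification that is adequate only for selectors like the ones quoted in this commit.

```python
OLD = ".quarto-dark"
NEW = '[data-bs-theme="dark"]'


def naive_rewrite(selector: str) -> str:
    """Find/replace that prepends the new selector with a comma (the broken approach)."""
    return selector.replace(OLD, f"{NEW}, {OLD}", 1)


def safe_rewrite(selector: str) -> str:
    """Duplicate the whole descendant chain once per ancestor selector."""
    return f"{selector.replace(OLD, NEW, 1)}, {selector}"


def split_group(selector_list: str) -> list[str]:
    """Split a selector list on commas (simplified; ignores commas nested in :is())."""
    return [part.strip() for part in selector_list.split(",")]


chain = ".quarto-dark .foo"
print(split_group(naive_rewrite(chain)))  # first entry has lost its `.foo` descendant
print(split_group(safe_rewrite(chain)))   # both entries keep the `.foo` descendant
```

The naive form yields a standalone `[data-bs-theme="dark"]` selector, which is the breakage the commit message warns about. The safe form would have to be applied by hand to every chained selector, which is the cost the JS bridge avoids.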
|
||
|
|
d759f3c4c2 |
fix(theme): bridge Quarto's 'alternate'/'default' to data-bs-theme
Quarto's built-in toggle stores its color-scheme choice as
'alternate'/'default' under the same localStorage key (`quarto-color-scheme`)
that the shared theme-persist shim reads. The shim only recognized
'dark'/'light', so once a reader clicked the toggle it would, on the next
load, fall back to OS preference and apply data-bs-theme=light while
Quarto correctly enabled the dark stylesheet (or vice versa). The result
was a half-themed page — most visible to readers on macOS dark mode whose
stored choice was 'default' (light): Bootstrap's CSS-variable dark mode
kicked in via data-bs-theme=dark, but the dark-mode SCSS layer never
loaded, leaving a dark navbar against a light sidebar/content/announcement.
theme-persist now accepts both vocabularies on read (alternate→dark,
default→light) and wraps quartoToggleColorScheme so the html attribute
syncs immediately after a click instead of waiting for the next reload.
The wrapper is a no-op on non-Quarto subsites (StaffML/Next.js).
Quarto's startup still checks `=== 'alternate'`, so we do NOT rewrite
Quarto's stored values — only mirror them onto <html>.
Single shared file in include-in-header propagates to all 8 Quarto
subsites: book vol1+vol2, labs, kits, slides, instructors, site, mlsysim.
Verified with Playwright across the full vol1↔vol2 navigation + toggle
sequence and across {OS=dark|light} × {storage=null|default|alternate}
matrix: 5/10 mismatches before, 0 after.
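The read-side mapping described above can be modelled in a few lines of Python. This is only a sketch of the decision logic: the real shim is JavaScript, and the function name `resolve_theme` and its parameters are assumptions made for illustration.

```python
from typing import Optional

# Quarto's stored vocabulary mapped onto the values used for data-bs-theme.
QUARTO_TO_BS = {"alternate": "dark", "default": "light"}


def resolve_theme(stored: Optional[str], os_prefers_dark: bool) -> str:
    """Return 'dark' or 'light' for a stored value, falling back to the OS preference."""
    if stored in ("dark", "light"):
        return stored
    if stored in QUARTO_TO_BS:
        return QUARTO_TO_BS[stored]
    # Nothing stored (or an unknown value): follow the OS preference.
    return "dark" if os_prefers_dark else "light"


# The case called out in the commit: OS in dark mode, stored choice 'default'.
print(resolve_theme("default", os_prefers_dark=True))
```

Under the old behavior the stored value `'default'` was not recognized, so the OS preference won and the page ended up half-themed. With the mapping, the stored choice takes precedence.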
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b011cd62a4 |
Merge pull request #1394 from harvard-edge/feat/socratiq
feat: add socratiq directory (excluding node_modules and dist) |
||
|
|
5f97cca590 | Merge remote-tracking branch 'origin/dev' into dev | ||
|
|
b5357f1f02 |
Fix dark-mode shield + dedup About paper CTA (#1452)
* fix(brand): make SEAS shield PNG transparent so it works in dark mode
The canonical shield (shared/assets/img/logo-seas-shield.png) shipped as
RGB-no-alpha with white pixels in the rectangular bleed around the
curved shield outline. In light mode the white was invisible against
white nav bgs; in dark mode it rendered as a stark white tile around
the shield (visible on StaffML's dark navbar).
Flood-filled the exterior white from the corners with PIL and saved as
RGBA. The interior white "VERITAS" books are isolated from the corners
by the shield's black border and so are preserved (they are the actual
design, not background bleed).
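The corner flood fill can be sketched without PIL on a tiny character grid. This is an illustration of the idea only, not the script used for the commit. The grid encoding (`W` white, `B` black border, `.` transparent) and the function name `clear_exterior` are assumptions.

```python
from collections import deque


def clear_exterior(pixels: list[str], background: str = "W", cleared: str = ".") -> list[str]:
    """Flood-fill background cells reachable from the four corners (4-connectivity)."""
    height, width = len(pixels), len(pixels[0])
    grid = [list(row) for row in pixels]
    queue = deque([(0, 0), (0, width - 1), (height - 1, 0), (height - 1, width - 1)])
    while queue:
        r, c = queue.popleft()
        if not (0 <= r < height and 0 <= c < width) or grid[r][c] != background:
            continue
        grid[r][c] = cleared
        queue.extend(((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)))
    return ["".join(row) for row in grid]


shield = [
    "WWWWW",
    "WBBBW",
    "WBWBW",  # the centre W stands in for the interior white artwork
    "WBBBW",
    "WWWWW",
]
for row in clear_exterior(shield):
    print(row)
```

The interior `W` survives because the border cuts it off from every corner, which mirrors why the "VERITAS" books were preserved.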
Also added interviews/staffml/public/logo-seas-shield.png to the
sync-mirrors.sh map so the StaffML mirror stays in lockstep with the
canonical asset on future regenerations.
Verified:
* Build is RGBA (file out/logo-seas-shield.png reports "8-bit/color RGBA")
* Local dark-mode StaffML navbar: shield blends seamlessly into the
#212529 navbar bg, no white tile.
* Local light-mode: indistinguishable from before (the now-transparent
pixels were previously white-on-white, so no regression).
* fix(staffml/about): drop duplicate "Read the Research Paper" CTA
The /about page had two surfaces pointing at the same StaffML-Paper.pdf:
1. PaperCitationCard at the top (above the fold, PDF + BibTeX) — the
Phase 6 academic-citation entry point.
2. A second large bordered "Read the Research Paper" CTA card inside
the "How Questions Are Built" section, with effectively the same
pitch in different words.
The bottom card duplicated the top card's CTA without adding new
information and visually competed with the citation card a few sections
above. Replaced with a single inline link inside the methodology prose
("…is described in our paper"), so the in-context pointer survives
(this section IS the methodology) without the duplicate visual surface.
Net result: one prominent paper CTA above the fold, one inline
reference where the methodology text actually mentions it.
Also drop now-unused FileText import.
|
||
|
|
edbea966bf |
refactor(tinytorch): rename site-quarto/ to quarto/
Brings the TinyTorch lab guide's Quarto project in line with
book/quarto/, the only other in-tree Quarto publication that builds
both web and PDF outputs from a single source. The previous name had
three redundancies:
- already under tinytorch/, so "site-" prefix wasn't disambiguating
- also produces the PDF lab guide, so "site-" was misleading
- the top-level site/ dir made "site-quarto" read as "the site's
quarto config" rather than "the tinytorch site, in quarto"
After this rename the convention is straightforward:
book/quarto/ -> the textbook (web + PDF)
tinytorch/quarto/ -> the TinyTorch lab guide (web + PDF)
mlsysim/docs/ -> mlsysim API reference (kept as docs/, since it
really is API reference, not a publication)
Touches 7 GitHub workflows, both .gitignore files, the rename target's
own self-references (Makefile, _quarto.yml configs, STYLE.md,
measure-pdf-images.py), and 6 copies of subscribe-modal.js plus a few
shared scripts/configs whose comments documented the old path.
Verified: rebuilt pdf/TinyTorch-Guide.pdf (2.1M) cleanly from the new
location with 'make pdf' from tinytorch/quarto/.
|
||
|
|
1e12b3474b | refactor: document SocratiQ canonicals in sync-mirrors.sh | ||
|
|
152b8630dc |
fix(ci): clear all 8 failing pre-commit hooks on dev (#1413)
* fix(content): clear two mitpress-above-below pre-commit failures
The "📚 Book · ✅ Validate (Dev)" workflow has been failing on dev for 8+
consecutive runs because the mitpress-above-below pre-commit hook flags
spatial references like "above"/"below" inside body prose and figure
captions (the MIT Press style guide wants @sec-/@fig- cross-refs or
"earlier"/"later" instead).
Two pre-existing violations were tripping the hook on every push:
- book/quarto/contents/vol1/responsible_engr/responsible_engr.qmd:1604
fig-cap for fig-data-governance-pillars said "obligations discussed
below: privacy, security, compliance, and transparency" — but those
four obligations are *immediately* listed in the same caption, so
"discussed below" was redundant. Reworded to "obligations of privacy,
security, compliance, and transparency …".
- book/quarto/contents/vol2/network_fabrics/network_fabrics.qmd:1217
fig-cap for fig-congestion-cascade said "the PFC backpressure cascades
described below." Reworded to "described later in this section." which
is what the hook wants.
After our 4 release-prep merges (PR-1/2/7/12) cleaned up the other hook
failures (spelling, bibtex tidy, pipe tables, contractions,
mitpress-vs-period, …), this was the last remaining failing hook.
Verified locally:
pre-commit run mitpress-above-below --all-files
MIT Press: No above/below spatial refs (use cross-refs).....Passed
These are pure copy-edits to figure captions; no semantic change to the
diagrams or surrounding text.
* fix(check-internal-links): suppress 4 categories of false positives
The Tier 1 link checker (shipped in PR #1404) was over-eager and flagged
author content as broken in four documented patterns:
1. TikZ source inside HTML comments. Link regex matched
`\node[mycycle](B1)` as a Markdown link `[mycycle](B1)`. Fix: strip
`<!-- ... -->` bodies before scanning, preserving line/column offsets
so any *real* failure we report stays accurate.
2. Quarto cross-references like `[Foo](@sec-bar)`, `@fig-x`, `@tbl-y`.
These resolve through the project xref index at render time, not the
filesystem; book/binder owns that validation. Fix: skip targets whose
first token is
`@sec-/@fig-/@tbl-/@eq-/@lst-/@thm-/@cor-/@def-/@exr-/@exm-/@prp-`.
3. Uppercase URL schemes (`HTTPS://`, `HTTP://`) — common after mobile
auto-capitalize or copied citations. Fix: case-insensitive prefix match
for the EXTERNAL_SCHEMES tuple.
4. GitHub-style emoji-prefix slugs in `.md` READMEs (e.g.
`## 🎯 20 Progressive Modules` produces anchor
`#-20-progressive-modules` on github.com, but Pandoc would slugify to
`progressive-modules`). Fix: register both Pandoc-style and
GitHub-style slugs as valid anchors so neither rendering target trips
the checker.
Drops repo-wide broken-link count from 150 → 84 (false positives only;
no real link rot is masked). Real rot is fixed in a separate commit so
the checker improvement can be reviewed independently.
* fix(content): repair internal-link rot across 10 files
Concrete link rot the new checker (PR #1404) surfaced once its false
positives were cleared. None of these are stylistic; each link points at
a path or anchor that does not exist.
- README/README_{zh,ja,ko}.md (24 links): translation files live in
README/ so paths to repo-root targets need a `../` prefix
(`book/README.md` -> `../book/README.md`, etc.).
- mlsysim/docs/contributing.qmd (21 links): `../slides/...` pointed
inside `mlsysim/`; the slides root is two levels up
(`../../slides/...`).
- mlsysim/docs/cli-reference.qmd:
`getting-started.qmd#bring-your-own-yaml-byoy` removed; retarget to
`#defining-custom-models` (closest surviving section about
user-supplied model specs).
- mlsysim/docs/for-engineers.qmd, for-instructors.qmd:
`solver-guide.qmd#extending-mlsysim` no longer exists; retarget to
`#writing-a-custom-solver` (the surviving custom-solver guide).
- book/tools/scripts/README.md: `../docs/BINDER.md` resolved to
`book/tools/docs/BINDER.md` (nonexistent); the file actually lives at
`book/docs/BINDER.md`, which is `../../docs/BINDER.md` from here.
- book/quarto/contents/frontmatter/index.qmd:
`about.qmd#about-the-book-unnumbered` anchor was removed when the About
heading was simplified; drop the anchor so the link lands at the top of
the page (which IS the About section).
- tinytorch/datasets/tinytalks/README.md: `scripts/README.md` was never
created; point at the directory listing instead.
* chore(pre-commit): exclude 3 forward-looking files from internal-link checker
Three files reference content that does not (yet) exist on the
filesystem; the references are intentional rather than rot, so they
should not block CI:
- labs/index.qmd: lists the 33 planned labs (vol1/lab_00..lab_16,
vol2/lab_01..lab_16) as a roadmap. Links go live as each lab ships.
De-linking now would lose the visual roadmap. When a lab lands the
exclusion narrows naturally on its own.
- labs/PROTOCOL.md, labs/TEMPLATE.md: internal authoring docs that
reference `../.claude/docs/labs/{PROTOCOL,TEMPLATE}.md`. The `.claude/`
tree is per-worktree and not always present at the same relative path;
these are author-tooling refs, not user-facing.
Net effect: the link checker is now green on a clean checkout. The
exclude block uses comments per existing convention so the rationale is
discoverable from the config alone.
* fix(content): clear codespell, contractions, and vs. pre-commit failures
Three pre-existing pre-commit hooks were failing on the dev branch prior
to the release-prep merges. Each is a small content normalization:
- codespell (2): re-declares -> redeclares
(book/quarto/config/shared/README.md); unparseable -> unparsable
(handled in the check-internal-links rewrite).
- contractions (2):
  * socratiq/socratiq.qmd callout: "If you're" -> "If you are".
  * nn_architectures fig-alt for the attention-visualization figure:
  "didn't" -> "did not". Alt-text is descriptive prose for screen
  readers, not a verbatim transcription of pixels, so expanding the
  contraction matches MIT Press style without changing the figure
  itself.
- mitpress-vs-period (6): bare `vs` -> `vs.` per MIT Press 2026 §10.5 in
benchmarking.qmd, distributed_training.qmd (x3 across two Python
docstrings rendered in code listings), fault_tolerance.qmd, and
inference.qmd. Code-listing strings are visible prose in the rendered
PDF, so the rule applies there as well.
* chore: bibtex-tidy auto-format outputs
Outputs of the bibtex-tidy pre-commit hook (which auto-fixes its own
input). Picked up here so that running pre-commit on a clean checkout no
longer reports a "files were modified" failure for the same files on
every invocation. Pure formatting; no entry semantics changed.
|
||
|
|
456ecc85b2 |
PR-1: Release-prep safety net (link checking + publish guards + nightly link-rot) (#1404)
* ci(links): add Tier 1 pre-commit internal-link checker
Wire shared/scripts/check-internal-links.py into pre-commit to validate
relative-path markdown links and same-file anchors in changed .md/.qmd
files. External (http/https) URLs are deliberately out of scope here —
that belongs to Lychee in CI (Tier 2 per-site validate-dev, Tier 3
nightly rot scan).
The hook ignores fenced code blocks and inline code spans to avoid
false positives on TikZ syntax embedded in Quarto sources, and ships
with a baseline exclude list (auto-generated quartodoc API stubs,
legacy Sphinx 404s, GitHub line-range anchors) so it can land without
churn on existing content. Tighten the exclude list incrementally as
those areas are cleaned up.
Part of the staged-rollout safety net.
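The scanning rules described for the Tier 1 hook (ignore fenced code blocks and inline code spans, leave http/https URLs to Lychee) might look roughly like the Python sketch below. It is not the contents of shared/scripts/check-internal-links.py: the function name, the regexes, and the sample text are assumptions, and the sketch only collects candidate targets rather than resolving them on disk.

```python
import re

FENCE = "`" * 3  # fence marker, built this way to keep the example easy to embed
INLINE_CODE = re.compile(r"`[^`\n]*`")
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
EXTERNAL_SCHEMES = ("http://", "https://")


def relative_link_targets(text: str) -> list[str]:
    """Collect Markdown link targets that a relative-path checker would validate."""
    targets = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on opening and closing fences
            continue
        if in_fence:
            continue
        line = INLINE_CODE.sub("", line)  # drop inline code spans before matching
        for target in LINK.findall(line):
            if not target.lower().startswith(EXTERNAL_SCHEMES):
                targets.append(target)
    return targets


sample = "\n".join([
    "See [guide](docs/guide.md) and [site](https://example.com).",
    "`\\node[mycycle](B1)` is TikZ, not a link.",
    FENCE,
    "[ignored](missing.md)",
    FENCE,
    "[top](#intro)",
])
print(relative_link_targets(sample))
```

A real checker would then resolve each returned target against the file's directory or its heading anchors. That step is left out here.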
* ci(links): Tier 2 per-site Lychee validate-dev coverage
Generalize the reusable Lychee workflow and extend per-site validate-dev
coverage so every shippable property has external-link reachability as a
CI signal.
Reusable workflow (.github/workflows/infra-link-check.yml):
- New inputs: lycheeignore_path, fail_on_broken (default false),
accept_status. Resolves the ignore file at runtime and warns if
missing rather than crashing the job.
- Summary step now exits non-zero only when fail_on_broken is true,
so it can be used as a non-blocking baseline today and tightened
per site later.
Shared ignore file (shared/config/.lycheeignore):
Universal patterns reused across sites (localhost, Google Slides
behind auth, known transient 404s, the live preview targets we are
about to publish to). The book keeps its existing canonical ignore
at book/config/linting/.lycheeignore — do not duplicate.
Per-site validate-dev:
- book, instructors, kits, labs, mlsysim, slides, tinytorch:
add a check-links job calling the reusable workflow, scoped to
that site's content tree and using the shared ignore file (book
keeps its own). All wired with fail_on_broken=false initially so
we discover the external-link baseline without blocking dev CI.
- site, staffml: new validate-dev workflows so the unified landing
page and StaffML have first-class CI parity (build + smoke + link
check + summary), matching the cadence used by the other sites.
- All summary steps updated to surface link-check results and to
mark them explicitly as non-blocking until baselines are clean.
Part of the staged-rollout safety net (Tier 2 of the link-checking
strategy: pre-commit / per-site / nightly).
* ci(release): publish-live green gate + nightly link rot tracker
Two safety nets that close the loop on the staged-rollout plan: prevent
shipping from an unvalidated baseline, and keep a durable record of
external link rot across all sites.
Publish guard (.github/workflows/infra-publish-guard.yml):
Reusable workflow called as the first job in every publish-live
pipeline. Queries the GitHub API for the latest run of the matching
validate-dev workflow on the dev branch and fails the publish if
that run is not 'success' or is older than max_age_minutes (default
24h). Inputs: validate_workflow (required), branch (default 'dev'),
max_age_minutes (default 1440).
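The guard's pass/fail rule can be written down as a small Python predicate. This is a sketch of the decision only: the real workflow queries the GitHub API, and the function name, parameter names, and the choice of which run timestamp to compare are assumptions.

```python
from datetime import datetime, timedelta, timezone


def guard_passes(conclusion: str, finished_at: datetime, now: datetime,
                 max_age_minutes: int = 1440) -> bool:
    """Allow publishing only if the latest validate-dev run succeeded and is fresh."""
    if conclusion != "success":
        return False
    return now - finished_at <= timedelta(minutes=max_age_minutes)


now = datetime(2026, 5, 6, 12, 0, tzinfo=timezone.utc)
print(guard_passes("success", now - timedelta(hours=2), now))   # recent green run
print(guard_passes("success", now - timedelta(hours=25), now))  # older than 24h
print(guard_passes("failure", now - timedelta(hours=2), now))   # recent but red
```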
Wire-up: every *-publish-live.yml now starts with a `guard` job and
chains its existing first job's `needs` to depend on it.
- book: guard runs only when confirm == 'PUBLISH' and not in
testing_mode (matches the existing dispatch-guard pattern).
- tinytorch: guard runs in addition to its in-band preflight (which
re-runs validate-dev against the publish commit). Defense in depth
on a workflow that already builds tags + PyPI artifacts.
- kits, labs, instructors, mlsysim, slides, site, staffml: guard is
the first job; the existing build-and-deploy / build job depends
on it.
Nightly link-rot sweep (.github/workflows/infra-link-rot-nightly.yml):
Runs at 04:30 UTC daily. Sweeps every site in parallel using the
Tier 2 reusable workflow, then aggregates results into a single
sticky GitHub issue (label: link-rot) so triage has one source of
truth instead of dozens of opened/closed tickets. Each run rewrites
the issue body with the current per-site status table and appends
a count comment so trend over time stays visible.
Manual trigger supports a dry_run input that prints the report to the
job log without touching the issue.
Part of the staged-rollout safety net (Tier 3 + green-gate enforcement).
* fix(ci): drop --exclude-mail from Lychee args (removed in v0.21)
First real CI run on PR-1 surfaced this:
error: unexpected argument '--exclude-mail' found
tip: a similar argument exists: '--include-mail'
In lychee >= v0.21 the `--exclude-mail` flag was removed; mailto: links
are now skipped by default and the new opt-in flag is `--include-mail`.
The reusable infra-link-check.yml was still passing the old flag, so
lychee was crashing before checking any link. Every reusable
check-links job was reporting "success" anyway because:
- the lychee step has `continue-on-error: true` (so a crash doesn't
fail the job), and
- every caller in this repo currently sets `fail_on_broken: false`
(so the summary step also exits 0).
Net effect: link checking on PR-1 was a no-op. Fix is a one-arg
removal — skipping mail is the new default, which is what we want.
(Worth a separate followup: the summary step should distinguish
"lychee crashed" from "lychee found broken links" so that bad args
fail loudly even when fail_on_broken=false. Filed mentally as a
followup; not blocking this PR.)
|
||
|
|
73967f7c42 |
PR-2: Visual polish (announcement bars, theme persistence, dev-mirror fix, audit script) (#1405)
* fix(dev-mirror): compute prefix from dev-side depth in rewrite-dev-urls.sh
The previous implementation hard-coded PREFIX="../" for any non-root
subsite, which silently mis-rewrote every absolute mlsysbook.ai link on
the dev preview for nested subsites (vol1, vol2 — they live at
/book/vol1/ and /book/vol2/ on dev). The most visible symptom was the
navbar title-href landing one level too shallow: clicking the navbar
title from inside Vol I went to /book/ instead of the unified landing
page at the dev root.
Fix: derive PREFIX from the number of path segments in the calling
subsite's dev-side path (book/vol1 → 2 hops → '../../') and use the
mlsysbook.ai key (not the dev-path) for self-link detection. Add an
explicit error if the caller passes a subsite name that is not in the
SUBSITES map, instead of silently producing wrong rewrites.
Sample rewrites with the fix:
vol1 page https://mlsysbook.ai/ → ../../
vol1 page https://mlsysbook.ai/vol2/ → ../../book/vol2/
vol1 page https://mlsysbook.ai/kits/ → ../../kits/
kits page https://mlsysbook.ai/ → ../
kits page https://mlsysbook.ai/vol1/ → ../book/vol1/
Live builds are unaffected — they use the original absolute URLs.
* feat(book): per-volume announcement bars (Crimson / ETH-Blue)
Split the shared book announcement bar into two volume-scoped files so
each volume gets audience-appropriate copy AND inherits the right brand
tint. Vol I keeps the Harvard-Crimson tint (its theme accent) and the
Foundations-flavored content; Vol II picks up the ETH-Blue tint (its
theme accent) and Scale-flavored content that leads with the new volume
launch and the cross-ecosystem build path.
Files:
- announcement-vol1.yml — new, Vol I copy, no hard-coded color (uses
`type: primary` so .announcement / .alert-primary get
$accent = $brand-crimson via theme-harvard.scss)
- announcement-vol2.yml — new, Vol II copy, same pattern but theme feeds
$accent = $brand-eth-blue via theme-eth.scss
- announcement.yml — emptied to a no-op with a deprecation note; keep for
one release cycle to avoid breaking any external metadata reference,
then delete
The CSS that translates `type: primary` into the per-theme tint already
lived in book/quarto/assets/styles/_base-styles.scss
(`.announcement { background: linear-gradient(... lighten($accent, 52%) ...) }`).
No SCSS changes needed — the previous behavior of a single shared bar
just hid that the tint was already theme-driven.
Resolves the "Vol II announcement should be ETH-themed" QA note.
* feat(theme): cross-site dark-mode persistence + FOUC guard
Make dark-mode preference flow seamlessly across every subsite under
mlsysbook.ai (Quarto-built and Next.js alike) and eliminate the
theme-flash that dark-mode readers see on first paint.
Quarto subsites (book / labs / kits / slides / instructors / mlsysim /
tinytorch / unified site):
- shared/config/site-head.html now inlines a tiny pre-paint script that
reads `quarto-color-scheme` from localStorage (or falls back to OS
preference) and applies `data-bs-theme`, `data-quarto-color-scheme`,
and `style.color-scheme` on <html> BEFORE any other script runs.
Eliminates the visible flash that was happening because Quarto's own
toggle script runs late.
- Listens for `storage` events so a toggle in tab A propagates to tab B
without a refresh.
- Inlined deliberately: the script is tiny, must be synchronous in
<head> to avoid the flash, and inlining sidesteps per-subsite asset
path differences. Canonical externalized source kept at
shared/scripts/theme-persist.js for documentation/testability — if you
change one, mirror to the other.
StaffML (Next.js):
- public/theme-bootstrap.js now reads the Quarto-side key as a fallback
when StaffML has no local preference, so a user toggling dark mode on
the book lands here in dark mode on first visit.
- components/ThemeProvider.tsx mirrors writes back to
`quarto-color-scheme`, so navigating onward to any Quarto subsite
inherits StaffML's choice.
Both subsystems retain their own keys as primary so each app's behavior
is unchanged in isolation. The `quarto-color-scheme` key is the bridge
contract — keep it stable across all theme code paths.
* test(audit): Playwright site-audit script (sidebar / darkmode / assets)
Single Playwright-driven QA script that the release-prep plan needs in
three flavors. Implemented as one CLI with three subcommands so the
shared boilerplate (browser launch, URL list, output dirs, screenshot
naming) lives in one place and the per-site source-of-truth list does
too.
Subcommands:
sidebar Assert every Quarto subsite exposes a populated, visible
#quarto-sidebar / .sidebar-navigation. Skips sites that intentionally
have no sidebar (landing, slides, StaffML). Catches the regression where
Vol I/II builds dropped the sidebar after a config refactor.
darkmode Force dark-mode via localStorage + data attributes, scroll
top→bottom in 800px chunks (so lazy content renders), and screenshot
full-page into _audit/darkmode/<site>.png for eyeball review. Surfaces
"half-themed" widgets that CSS linters can't find (announcement bar,
footer tiles, code blocks, etc.).
assets Listen for failed network requests + 4xx/5xx responses on every
site URL. Catches the broken <img> embeds reported during dev-mirror
review (TinyTorch big-picture PDF viewer, Vol II cover) before they hit
production.
Targets dev / live / local with --target. Use --only <substring> to
narrow scope. JSON report written to _audit/<cmd>.json for CI ingest.
Exits non-zero on issues so it can become a blocking CI check once the
baseline is clean.
Requires `npm i -D playwright && npx playwright install chromium`.
|
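The depth-based prefix rule from the dev-mirror fix in this PR can be sketched in Python and checked against the sample rewrites listed in that commit message. The real script is rewrite-dev-urls.sh (Bash). The three-entry `SUBSITES` subset and the function names here are assumptions, and self-link detection is deliberately left out.

```python
# Hypothetical subset of the SUBSITES map: mlsysbook.ai key -> dev-side path.
SUBSITES = {"vol1": "book/vol1", "vol2": "book/vol2", "kits": "kits"}
BASE = "https://mlsysbook.ai/"


def dev_prefix(subsite: str) -> str:
    """One '../' per path segment of the calling subsite's dev-side path."""
    if subsite not in SUBSITES:
        raise KeyError(f"unknown subsite: {subsite!r}")  # fail loudly, as the fix does
    depth = len([seg for seg in SUBSITES[subsite].split("/") if seg])
    return "../" * depth


def rewrite(subsite: str, url: str) -> str:
    """Rewrite an absolute mlsysbook.ai URL into a dev-relative path."""
    if not url.startswith(BASE):
        return url
    key, _, tail = url[len(BASE):].partition("/")
    prefix = dev_prefix(subsite)
    if key in SUBSITES:
        return f"{prefix}{SUBSITES[key]}/{tail}"
    return prefix + url[len(BASE):]


for page, url in [("vol1", BASE), ("vol1", BASE + "vol2/"), ("kits", BASE + "vol1/")]:
    print(page, url, "->", rewrite(page, url))
```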
||
|
|
8f09e80c4c |
PR-3: Scripts, audits, cleanup (build stamp, PDF dropdown, 404s, mirror guard, dedup, RELEASE-PREP) (#1406)
* feat(footer): build-time "last updated" stamp
Add a small build-time stamp to the page footer ("Last updated YYYY-MM-DD
· <site> · <commit>") so readers can see at a glance that the site is
fresh. Quarto's per-page `date-modified` already exists for chapter
pages, but it doesn't capture site-level rebuilds (theme tweaks,
navbar changes, deploy reruns).
Pieces:
- shared/scripts/inject-build-stamp.sh: wraps a token-replace over a
build directory. Search-and-replace on `<!-- MLSB_BUILD_STAMP -->`
means sites that haven't adopted the token are unaffected — opt-in
rollout per subsite.
- book/quarto/config/shared/html/footer-common.yml: token added next
to the existing copyright line in the shared book footer.
- shared/config/footer-site.yml: token added next to the copyright
in the unified-site footer.
- shared/config/site-head.html: minimal CSS for `.mlsb-build-stamp`
(small, neutral, dark-mode aware).
- .github/workflows/kits-publish-live.yml: representative wiring —
runs the stamp step after build and before deploy. Other publish-
live workflows can adopt the step the same way as they roll
through release-prep validation.
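The opt-in token replacement can be modelled in Python as a plain string substitution. The actual implementation is the shell script inject-build-stamp.sh. The `<span>` wrapper, the function name, and the sample values below are assumptions. Only the token, the stamp wording, and the `.mlsb-build-stamp` class come from the commit message.

```python
TOKEN = "<!-- MLSB_BUILD_STAMP -->"


def inject_stamp(html: str, date: str, site: str, commit: str) -> str:
    """Replace the opt-in token with a stamp; pages without the token are unchanged."""
    stamp = f'<span class="mlsb-build-stamp">Last updated {date} · {site} · {commit}</span>'
    return html.replace(TOKEN, stamp)


footer = f"<footer>© MLSysBook {TOKEN}</footer>"
# The date, site, and short commit hash are illustrative values only.
print(inject_stamp(footer, "2026-05-06", "kits", "abc1234"))
print(inject_stamp("<footer>no token here</footer>", "2026-05-06", "kits", "abc1234"))
```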
* feat(navbar): expose paper.pdf for TinyTorch / MLSys·im / StaffML
Each of these subsites already builds a companion paper.tex in CI and
ships the PDF alongside the HTML site. Surface those papers in the
navbar dropdowns where readers actually look for them:
Build menu:
- TinyTorch → site
- TinyTorch Paper (file-pdf icon, opens in new tab)
→ /tinytorch/assets/downloads/TinyTorch-Paper.pdf
- MLSys·im → site
- MLSys·im Paper (file-pdf icon, opens in new tab)
→ /mlsysim/mlsysim-paper.pdf
Prepare menu (after a separator):
- StaffML Paper (file-pdf icon, opens in new tab)
→ /staffml/downloads/StaffML-Paper.pdf
Paper URLs are intentionally kept in lockstep with the build steps in
tinytorch-publish-live (assets/downloads/), mlsysim-publish-live
(site root), and staffml-publish-live (out/downloads/). If a build
path moves, both the workflow and this navbar entry need to move
together — there is no single source.
* feat(404): per-site 404 pages for slides / instructors / unified site
The book, kits, labs, mlsysim, and tinytorch subsites already have
flavored 404.qmd pages that route lost readers to the right
neighborhood. Add the missing three so every subsite under
mlsysbook.ai has a coherent recovery experience instead of falling
back to GitHub Pages' default white-page 404.
- slides/404.qmd — slide-deck flavored copy, pointers back to
the deck index, the volumes, and the hub.
- instructors/404.qmd — instructor-flavored copy, pointers to the
course map, slides, and both volumes.
- site/404.qmd — landing-page flavored copy, the most
ecosystem-wide nav (links to every subsite)
because this is the most common 404 source
for inbound links from the legacy single-
volume mlsysbook.ai.
StaffML already has its own React not-found.tsx so no work needed.
TinyTorch's legacy Sphinx 404.md is preserved for now (still wired on
the Sphinx site that hasn't migrated yet).
* ci(precommit): block subsite-mirror drift on shared assets
Add a pre-commit hook that runs `shared/scripts/sync-mirrors.sh --check`
on every commit. The hook fails if any of the per-subsite real-file
mirrors (subscribe-modal.js, theme SCSS partials, logo) has drifted
from its canonical source in `shared/`.
Why a guard, not just a sync: Quarto's resource-copy step preserves
symlinks instead of dereferencing them, so we have to keep real
copies. Without the guard, "I'll edit the canonical and forget to
re-sync" silently re-introduces the duplicate-divergence bug we just
spent effort fixing. `always_run: true` because a mirror can drift via
deletion of the canonical, not just by editing the canonical itself.
To re-sync after a deliberate change:
bash shared/scripts/sync-mirrors.sh
* refactor(audit): duplicate-file finder + clean up obvious leftover
Add shared/scripts/find-duplicates.py as a periodic duplication
auditor. It SHA-1 hashes every source-y file across the ecosystem
roots, groups identical contents, subtracts the intentional groups
declared in shared/scripts/sync-mirrors.sh, and reports the rest as
unintended duplicates. JSON report written to .audit/duplicates.json
for CI ingest later; --strict makes it exit non-zero.
Defaults err on the side of being useful out of the box:
- Skips symlinks (those are deliberate aliases, not duplicates).
- Skips small files (<256B) — LICENSE stubs, .gitkeep, etc.
- Skips _site / _build / node_modules / .next / out / .git.
- Source-y suffix list (.js, .ts, .scss, .css, .html, .yml, .py, .sh).
Binary assets (images, PDFs) are NOT scanned because their dup
story is different (logos, icons are intentionally repeated).
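The grouping step and the defaults listed above can be sketched as follows. This is not the contents of find-duplicates.py: the function name is an assumption, and the sketch omits subtracting the intentional groups declared in sync-mirrors.sh as well as the JSON report.

```python
import hashlib
import os
from collections import defaultdict
from pathlib import Path

SUFFIXES = {".js", ".ts", ".scss", ".css", ".html", ".yml", ".py", ".sh"}
SKIP_DIRS = {"_site", "_build", "node_modules", ".next", "out", ".git"}
MIN_BYTES = 256


def find_duplicates(root: str) -> list[list[str]]:
    """Group source-like files under root by SHA-1 and return groups of two or more."""
    groups = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink() or path.suffix not in SUFFIXES:
                continue
            if path.stat().st_size < MIN_BYTES:
                continue
            digest = hashlib.sha1(path.read_bytes()).hexdigest()
            groups[digest].append(str(path.relative_to(root)))
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)
```

Calling `find_duplicates(".")` from a repository root would return a list of path groups, each group holding files with byte-identical contents.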
Initial-cleanup pass:
- Delete tinytorch/scripts/cleanup_repo_history.sh — byte-identical
leftover; the canonical version lives at
tinytorch/tools/maintenance/cleanup_history.sh and is the one
referenced by tinytorch/tools/maintenance/README.md.
After this commit the only remaining unintended duplicate is
runHistoryProvider.ts in three vscode-ext packages (kits / labs /
tinytorch). Promoting that into a shared vscode-ext package is real
refactor work — out of scope for release-prep, captured for later.
Add .audit/ and _audit/ (the latter from the Playwright site-audit
script) to .gitignore.
* docs(release-prep): handoff notes covering all five PR groupings
Add a single document at the worktree root that walks through what
this branch contains, why each piece is there, the recommended PR
split (PR-1 safety-net, PR-2 visual polish, PR-3 scripts/audits/
cleanup, PR-4 TinyTorch prep, PR-5 cutover skeletons), what was
intentionally LEFT OUT (and why), and what verification was done
locally vs. what still needs the dev mirror to exercise.
Treat this as the cover memo for the staged-rollout foundation
work. Once the five PRs are individually merged into dev, this file
will outlive the branch, and the per-PR sections will still document
why each piece exists for anyone debugging months from now.
|
||
|
|
773e106c63 |
PR-5: Cutover skeletons (rollback-legacy + redirect map + sitemap aggregator) (#1409)
* feat(launch): rollback-legacy.sh — snapshot + restore the gh-pages root
Add the panic button for the mlsysbook.ai cutover. The staged-rollout
plan keeps the legacy single-volume site at the gh-pages root while
the new properties (Vol I, Vol II, TinyTorch, labs, kits, slides,
mlsysim, instructors, staffml, unified landing) get deployed into
subdirectories. Once everything is verified, the unified landing
page replaces the legacy root — and at exactly that moment we want a
one-command revert path that doesn't require remembering which gh-
pages SHA "the old root" lived at.
Three modes:
snapshot Take a timestamped backup of the legacy root files
(everything at gh-pages root that is NOT a known
subsite directory) and push to legacy-backup/<TS>/.
restore <ID> Copy a snapshot back to root, OVERWRITING current
root files but leaving subsite directories alone.
list List available snapshots.
Design choices worth flagging:
1. Subsite-aware. The script hard-codes the list of top-level
subsite directories (book/, tinytorch/, kits/, labs/, mlsysim/,
slides/, instructors/, interviews/, staffml/, about/, community/,
newsletter/) and excludes them from BOTH snapshot capture AND
restore overwrites. Rolling back the legacy landing page should
never wipe out actively-deployed properties.
2. Dry-run by default. Every destructive mode requires --apply. The
default behavior prints what would happen, including a diff
preview for restore. This is the same posture the existing
sync-mirrors.sh / link-checker / publish-guard scripts take.
3. Snapshots are kept, not moved. Restoring a snapshot is itself a
reversible commit on gh-pages; the snapshot directory is preserved
so a "rollback the rollback" is one more command away.
4. Doesn't touch the working tree. Operates against a fresh shallow
clone in a mktemp directory, so it can be run from any clone of the
repo (developer machine or a GitHub Actions runner) without dirtying
anything local.
Typical sequence on launch day is documented inline at the top of
the script. Two short commands wrap the whole rollout: snapshot
before deploy, restore-by-ID if anything looks wrong.
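The subsite-aware, dry-run-by-default snapshot described above can be sketched as follows. This is an illustration in Python, not the contents of rollback-legacy.sh (which is a shell script that is not shown here). The subsite list is copied from the message; the function names, the `apply` flag as a keyword argument, and the decision to also skip `.git` are assumptions made for this sketch.

```python
import shutil
import tempfile
from pathlib import Path

# Subsite directories as listed in the commit message.
SUBSITES = {"book", "tinytorch", "kits", "labs", "mlsysim", "slides",
            "instructors", "interviews", "staffml", "about", "community",
            "newsletter"}
BACKUP_DIR = "legacy-backup"


def legacy_entries(root: Path) -> list[Path]:
    """Top-level entries that count as 'legacy root' (hypothetical helper)."""
    skip = SUBSITES | {BACKUP_DIR, ".git"}  # skipping .git is an assumption
    return sorted(p for p in root.iterdir() if p.name not in skip)


def snapshot(root: Path, snapshot_id: str, apply: bool = False) -> list[str]:
    """Copy legacy root entries to legacy-backup/<snapshot_id>/.

    Without apply=True nothing is written; the names that WOULD be
    captured are returned so the caller can print them.
    """
    entries = legacy_entries(root)
    if apply:
        dest = root / BACKUP_DIR / snapshot_id
        dest.mkdir(parents=True)
        for entry in entries:
            if entry.is_dir():
                shutil.copytree(entry, dest / entry.name)
            else:
                shutil.copy2(entry, dest / entry.name)
    return [e.name for e in entries]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "index.html").write_text("legacy landing")
        (root / "labs").mkdir()
        print("would snapshot:", snapshot(root, "example-id"))
```

Computing the entry list once and sharing it between the dry run and the real run keeps the preview honest: what is printed is exactly what would be copied.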
* feat(seo): redirect-map skeleton + HTML-stub generator
Add the cutover plumbing for legacy-URL → new-URL redirects so the
PageRank accumulated under the old single-volume mlsysbook.ai
structure flows into the new ecosystem URLs (`/book/vol1/`,
`/labs/`, `/about/`, etc.) as soon as the unified landing replaces
the legacy root.
Two artifacts:
1. `shared/config/redirect-map.json` — declarative source of truth.
Schema:
- `from`: legacy path (must start with '/')
- `to`: destination URL or path (resolves against base_url)
- `status`: 301 / 302 / 307 / 308 (default 301)
- `note`: optional human note
A trailing-`*` wildcard is supported in `from` for whole-subtree
moves like `/contents/labs/* → /labs/*`. The file ships
intentionally small: just enough entries to demonstrate the
patterns and seed the launch — populating the full inventory
from the legacy mlsysbook.ai sitemap is a separate task.
2. `shared/scripts/build-redirects.py` — generator.
For each entry it emits a tiny HTML stub at the legacy path
containing:
<meta http-equiv="refresh" content="0;url=<dest>">
<link rel="canonical" href="<dest>">
<meta name="robots" content="noindex,follow">
That combo is the closest GitHub-Pages-friendly equivalent of a
301: real users get redirected in <100ms; crawlers treat the
canonical as authoritative and drop the legacy URL on recrawl;
PageRank flows through. The script ALSO emits a Netlify-format
`_redirects` file from the same map, so the day we move off
GitHub Pages (Cloudflare Pages, Netlify, S3+CF) the same source
of truth produces real 301s with no rewrite.
`--check` mode validates the JSON without writing anything (CI
hook). Wildcards skip stub emission (we'd have to walk the
deployed tree to expand them) but are still emitted to the
Netlify file where they work natively.
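A minimal sketch of the two outputs described above, assuming each map entry is a dict with the `from` / `to` / `status` / `note` fields from the schema. The function names are hypothetical and the real build-redirects.py may be organized differently. Translating a trailing `*` in the destination to `:splat` follows Netlify's documented wildcard syntax; whether the actual script does this translation is an assumption.

```python
import html

ALLOWED_STATUS = {301, 302, 307, 308}


def validate(entry: dict) -> dict:
    """Normalize one entry per the schema in the message (default status 301)."""
    if not entry.get("from", "").startswith("/"):
        raise ValueError(f"'from' must start with '/': {entry!r}")
    if not entry.get("to"):
        raise ValueError(f"'to' is required: {entry!r}")
    status = entry.get("status", 301)
    if status not in ALLOWED_STATUS:
        raise ValueError(f"unsupported status {status!r}: {entry!r}")
    return {"from": entry["from"], "to": entry["to"], "status": status}


def is_wildcard(entry: dict) -> bool:
    """Trailing-* entries get no HTML stub, only a _redirects line."""
    return entry["from"].endswith("*")


def html_stub(dest: str) -> str:
    """The three-tag stub from the message, with the destination escaped."""
    d = html.escape(dest, quote=True)
    return (
        "<!DOCTYPE html>\n"
        f'<meta http-equiv="refresh" content="0;url={d}">\n'
        f'<link rel="canonical" href="{d}">\n'
        '<meta name="robots" content="noindex,follow">\n'
    )


def netlify_line(entry: dict) -> str:
    """One line of a Netlify-format _redirects file."""
    src, dst = entry["from"], entry["to"]
    if is_wildcard(entry) and dst.endswith("*"):
        dst = dst[:-1] + ":splat"  # Netlify's placeholder for the matched tail
    return f"{src} {dst} {entry['status']}"


if __name__ == "__main__":
    # Example entries are illustrative, not taken from redirect-map.json.
    for raw in [{"from": "/contents/labs/*", "to": "/labs/*"},
                {"from": "/about.html", "to": "/about/", "status": 308}]:
        entry = validate(raw)
        print(netlify_line(entry))
        if not is_wildcard(entry):
            print(html_stub(entry["to"]))
```

Running only `validate` over every entry and writing nothing would correspond to the `--check` behavior.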
Wiring into a *-publish-live workflow is a one-liner step
(`python3 shared/scripts/build-redirects.py --map shared/config/
redirect-map.json --out gh-pages-staging/`) but is intentionally
NOT done in this commit — it should land alongside the actual
unified-landing deploy, when there is something for the legacy
URLs to redirect away from.
* feat(seo): aggregate per-subsite sitemaps into mlsysbook.ai/sitemap.xml
The new ecosystem has every subsite (Vol I, Vol II, TinyTorch, labs,
kits, slides, instructors, mlsysim, staffml, the unified landing)
emitting its own `<subsite>/sitemap.xml` because that's what Quarto
and Next produce automatically. Search engines, however, want a
single authoritative entry point per *domain*. Without an aggregated
index they end up either crawling the subsite sitemaps separately
(if they happen to discover them) or missing some entirely.
This commit adds the aggregator:
shared/scripts/build-sitemap.py
Walks a deployed gh-pages tree, discovers every sitemap.xml under
it (skipping the root one, legacy-backup snapshots, _archive,
_site, and the like), and writes a single sitemap index at
`<root>/sitemap.xml` that points at each subsite's sitemap as a
`<sitemap><loc>…</loc></sitemap>` entry. It also creates or
appends to `<root>/robots.txt` so the index is surfaced to
crawlers via the standard `Sitemap:` directive.
Optional `--include-subsite` allowlist (repeatable) for staged
rollouts where we want the index to advertise only the subsites
that have been verified live, even if other ones happen to be
deployed in the tree. Defaults to "everything found".
`--check` does discovery without writing.
.github/workflows/infra-build-sitemap.yml
Reusable workflow (`workflow_call`) wrapping the script so any
`*-publish-live` workflow can refresh the index as its final
step. Also `workflow_dispatch`-able for manual rebuilds. Joins
the existing `gh-pages-deploy` concurrency group so it never
races a publish push.
Uses sparse-checkout to grab just the script from `dev` (no need
to clone the whole monorepo into the runner) and a full clone of
`gh-pages` to do the work.
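The discovery and index-writing steps of the aggregator can be sketched as below. This is not the code of build-sitemap.py: the function names are hypothetical, the skip list contains only the directories named in the message plus `.git`, and matching the allowlist against the top-level directory name is an assumption. The robots.txt step is omitted.

```python
import tempfile
from pathlib import Path
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
SKIP_DIRS = {"legacy-backup", "_archive", "_site", ".git"}


def discover(root: Path, include=None) -> list[str]:
    """Relative paths of every sitemap.xml under root, except root's own.

    `include` is an optional allowlist of top-level directory names,
    loosely modelling --include-subsite (assumed semantics).
    """
    found = []
    for path in root.rglob("sitemap.xml"):
        rel = path.relative_to(root)
        if len(rel.parts) == 1 or SKIP_DIRS & set(rel.parts):
            continue
        if include is not None and rel.parts[0] not in include:
            continue
        found.append(rel.as_posix())
    return sorted(found)


def build_index(base_url: str, sitemaps: list[str]) -> str:
    """Serialize a sitemap index pointing at each discovered sitemap."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for rel in sitemaps:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        loc = ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc")
        loc.text = f"{base_url.rstrip('/')}/{rel}"
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for sub in ("labs", "kits"):
            (root / sub).mkdir()
            (root / sub / "sitemap.xml").write_text("<urlset/>")
        print(build_index("https://mlsysbook.ai", discover(root)))
```

Calling `discover` alone and printing its result would correspond to the `--check` mode, since nothing is written.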
Wiring into per-subsite publish workflows happens in a follow-up
commit alongside the actual launch — this PR is "skeletons", and
the per-publish trigger is best landed when each subsite's launch
PR ships.
|
||
|
|
2190968942 |
refactor: deduplicate subscribe-modal + socratiQ via mirror sync script
Quarto's resource-copy step preserves symlinks rather than dereferencing
them, which breaks both local builds (AlreadyExists on the second pass)
and gh-pages deploys (relative symlink targets fall outside _build/).
And Sass resolves @import relative to the importing file's physical
location, not the symlink target. So symlinks inside the resource path
are not a viable dedup mechanism.
Instead, keep real file copies in each consumer subsite and enforce
dedup at edit time with shared/scripts/sync-mirrors.sh:
- bash shared/scripts/sync-mirrors.sh # propagate canonicals
- bash shared/scripts/sync-mirrors.sh --check # CI: fail on drift
Mirror map (source | mirrors):
shared/scripts/subscribe-modal.js -> {site, book/quarto, labs, kits,
mlsysim/docs}/.../subscribe-modal.js
Intentional non-mirrors (left untouched, customized variants):
tinytorch/site-quarto/assets/scripts/subscribe-modal.js (TinyTorch-branded)
tinytorch/site/_static/subscribe-modal.js (legacy Sphinx)
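The propagate-or-check behavior can be illustrated with a short Python sketch. sync-mirrors.sh itself is a bash script whose contents are not shown here, so the function names, the byte-for-byte comparison, and the exit-code convention below are assumptions for illustration only.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def drifted(canonical: Path, mirrors: list[Path]) -> list[Path]:
    """Mirrors that are missing or whose bytes differ from the canonical."""
    return [m for m in mirrors
            if not m.is_file() or not filecmp.cmp(canonical, m, shallow=False)]


def sync(canonical: Path, mirrors: list[Path], check: bool = False) -> int:
    """Return a process-style exit code.

    check=True  : report drift only (1 if any mirror is stale, else 0).
    check=False : overwrite stale mirrors with the canonical copy.
    """
    stale = drifted(canonical, mirrors)
    if check:
        return 1 if stale else 0
    for mirror in stale:
        mirror.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(canonical, mirror)
    return 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        canonical = root / "shared" / "subscribe-modal.js"
        canonical.parent.mkdir()
        canonical.write_text("// canonical\n")
        mirror = root / "labs" / "subscribe-modal.js"
        print("drift before sync:", sync(canonical, [mirror], check=True))
        sync(canonical, [mirror])
        print("drift after sync:", sync(canonical, [mirror], check=True))
```

Keeping the customized TinyTorch variants out of the mirror list is what makes them "intentional non-mirrors": the check never compares them, so they cannot fail CI.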
Also dedupe the SocratiQ widget bundle via a symlink (safe here because
book/tools/ sits outside any Quarto project, so the resource walker
never touches it):
book/tools/scripts/socratiQ/bundle.js -> ../../../quarto/tools/scripts/socratiQ/bundle.js
The shared canonical (book/quarto/tools/scripts/socratiQ/bundle.js) is
the version actually referenced and served in production.
|
||
|
|
396506d29d |
refactor(site): unify 4 site subsites into single Quarto project
Architecture:
- Merge landing, about, community, newsletter into one site/ project
- Move navbar-common.yml to shared/config/ (used by 12 configs)
- Create shared/config/footer-site.yml for centralized footer
- Create shared/scripts/subscribe-modal.js as canonical copy
- Single _quarto.yml replaces 4 independent configs
- One site_libs/ copy replaces four
Features gained:
- Google Analytics on ALL hub pages (was only on book volumes)
- Subscribe modal on landing page (was missing)
- Centralized footer with consistent links
Workflows updated:
- site-preview-dev.yml: matrix strategy → single build job
- site-publish-live.yml: loop over subsites → single build + deploy
- sync-newsletter.yml: builds from unified site project
- publish-all-live.yml: removed stale subsite input
- rewrite-dev-urls.sh: added --shallow flag for unified builds
All 12 navbar-common.yml references updated: book vol1/vol2, site
(unified), slides, instructors, interviews, kits, labs, mlsysim
|