mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-07 02:03:55 -05:00
* ci(links): add Tier 1 pre-commit internal-link checker
Wire shared/scripts/check-internal-links.py into pre-commit to validate
relative-path markdown links and same-file anchors in changed .md/.qmd
files. External (http/https) URLs are deliberately out of scope here —
that belongs to Lychee in CI (Tier 2 per-site validate-dev, Tier 3
nightly rot scan).
The hook ignores fenced code blocks and inline code spans to avoid
false positives on TikZ syntax embedded in Quarto sources, and ships
with a baseline exclude list (auto-generated quartodoc API stubs,
legacy Sphinx 404s, GitHub line-range anchors) so it can land without
churn on existing content. Tighten the exclude list incrementally as
those areas are cleaned up.
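
A minimal sketch of the code-skipping behaviour described above, not the
actual contents of check-internal-links.py. The regexes and the function
name `extract_internal_links` are illustrative assumptions, and fence
matching is simplified to the fence character only:

```python
import re

# Hypothetical patterns; the real hook's regexes may differ.
FENCE_RE = re.compile(r"^(`{3,}|~{3,})")
INLINE_CODE_RE = re.compile(r"`[^`\n]*`")
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
EXTERNAL_PREFIXES = ("http://", "https://", "mailto:")


def extract_internal_links(text: str) -> list[str]:
    """Return relative-path and same-file anchor targets, skipping code."""
    links = []
    fence_char = None  # set while inside a fenced code block
    for line in text.splitlines():
        match = FENCE_RE.match(line.lstrip())
        if match:
            char = match.group(1)[0]
            if fence_char is None:
                fence_char = char      # opening fence
            elif char == fence_char:
                fence_char = None      # closing fence
            continue
        if fence_char is not None:
            continue                   # inside a fence: ignore everything
        line = INLINE_CODE_RE.sub("", line)  # drop inline code spans
        for target in LINK_RE.findall(line):
            if not target.startswith(EXTERNAL_PREFIXES):
                links.append(target)
    return links
```

Stripping code before matching is what keeps bracket-and-paren sequences
in TikZ snippets from being reported as broken links.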
Part of the staged-rollout safety net.
* ci(links): Tier 2 per-site Lychee validate-dev coverage
Generalize the reusable Lychee workflow and extend per-site validate-dev
coverage so every shippable property has external-link reachability as a
CI signal.
Reusable workflow (.github/workflows/infra-link-check.yml):
- New inputs: lycheeignore_path, fail_on_broken (default false),
accept_status. Resolves the ignore file at runtime and warns if
missing rather than crashing the job.
- Summary step now exits non-zero only when fail_on_broken is true,
so it can be used as a non-blocking baseline today and tightened
per site later.
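
The warn-instead-of-crash handling of the ignore file could look roughly
like the sketch below. The function name and the `workspace` parameter
are assumptions for illustration; the workflow itself may do this in
shell. The `::warning::` prefix is the standard GitHub Actions
annotation command:

```python
from pathlib import Path
from typing import Optional


def resolve_ignore_file(lycheeignore_path: str, workspace: str = ".") -> Optional[Path]:
    """Return the ignore file path if it exists, else warn and return None."""
    if not lycheeignore_path:
        return None
    candidate = Path(workspace) / lycheeignore_path
    if candidate.is_file():
        return candidate
    # Emit a GitHub Actions warning annotation instead of failing the job.
    print(f"::warning::ignore file not found: {candidate}; continuing without it")
    return None
```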
Shared ignore file (shared/config/.lycheeignore):
Universal patterns reused across sites (localhost, Google Slides
behind auth, known transient 404s, the live preview targets we are
about to publish to). The book keeps its existing canonical ignore
at book/config/linting/.lycheeignore — do not duplicate.
Per-site validate-dev:
- book, instructors, kits, labs, mlsysim, slides, tinytorch:
add a check-links job calling the reusable workflow, scoped to
that site's content tree and using the shared ignore file (book
keeps its own). All wired with fail_on_broken=false initially so
we discover the external-link baseline without blocking dev CI.
- site, staffml: new validate-dev workflows so the unified landing
page and StaffML have first-class CI parity (build + smoke + link
check + summary), matching the cadence used by the other sites.
- All summary steps updated to surface link-check results and to
mark them explicitly as non-blocking until baselines are clean.
Part of the staged-rollout safety net (Tier 2 of the link-checking
strategy: pre-commit / per-site / nightly).
* ci(release): publish-live green gate + nightly link rot tracker
Two safety nets that close the loop on the staged-rollout plan: prevent
shipping from an unvalidated baseline, and keep a durable record of
external link rot across all sites.
Publish guard (.github/workflows/infra-publish-guard.yml):
Reusable workflow called as the first job in every publish-live
pipeline. Queries the GitHub API for the latest run of the matching
validate-dev workflow on the dev branch and fails the publish if
that run is not 'success' or is older than max_age_minutes (default
24h). Inputs: validate_workflow (required), branch (default 'dev'),
max_age_minutes (default 1440).
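
The guard's decision rule, sketched as a pure function under stated
assumptions: the caller has already fetched the latest validate-dev run
from the GitHub API, and age is measured from a single timestamp on that
run. Which timestamp the real workflow compares is not specified here,
and `guard_allows_publish` is a hypothetical name:

```python
from datetime import datetime, timedelta, timezone


def guard_allows_publish(conclusion, finished_at, now, max_age_minutes=1440):
    """Return (allowed, reason) for the latest validate-dev run."""
    if conclusion != "success":
        return False, f"latest validate-dev run concluded {conclusion!r}"
    age = now - finished_at
    if age > timedelta(minutes=max_age_minutes):
        return False, f"latest validate-dev run is too old ({age})"
    return True, "ok"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # A run that succeeded two hours ago passes the default 24h window.
    print(guard_allows_publish("success", now - timedelta(hours=2), now))
```

Returning a reason alongside the boolean lets the calling job print why
a publish was refused.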
Wire-up: every *-publish-live.yml now starts with a `guard` job, and
the pipeline's existing first job now lists `guard` in its `needs`.
- book: guard runs only when confirm == 'PUBLISH' and not in
testing_mode (matches the existing dispatch-guard pattern).
- tinytorch: guard runs in addition to its in-band preflight (which
re-runs validate-dev against the publish commit). Defense in depth
on a workflow that already builds tags + PyPI artifacts.
- kits, labs, instructors, mlsysim, slides, site, staffml: guard is
the first job; the existing build-and-deploy / build job depends
on it.
Nightly link-rot sweep (.github/workflows/infra-link-rot-nightly.yml):
Runs at 04:30 UTC daily. Sweeps every site in parallel using the
Tier 2 reusable workflow, then aggregates results into a single
sticky GitHub issue (label: link-rot) so triage has one source of
truth instead of dozens of opened/closed tickets. Each run rewrites
the issue body with the current per-site status table and appends
a count comment so the trend over time stays visible.
Manual trigger supports a dry_run input that prints the report to the
job log without touching the issue.
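
For illustration only, the aggregation and dry-run behaviour might be
sketched as follows. The table columns, the `None`-means-incomplete
convention, and the `update_issue` callable are all assumptions, not the
workflow's actual report format:

```python
from typing import Callable, Optional


def render_status_table(results: dict[str, Optional[int]]) -> str:
    """Render per-site broken-link counts as a Markdown table.

    A count of None means the site's check did not complete.
    """
    lines = ["| Site | Broken links | Status |", "| --- | --- | --- |"]
    for site in sorted(results):
        broken = results[site]
        if broken is None:
            count, status = "n/a", "check did not complete"
        elif broken == 0:
            count, status = "0", "ok"
        else:
            count, status = str(broken), "broken links"
        lines.append(f"| {site} | {count} | {status} |")
    return "\n".join(lines)


def publish_report(body: str, dry_run: bool, update_issue: Callable[[str], None]) -> None:
    """Print the report in dry-run mode; otherwise hand it to the issue updater."""
    if dry_run:
        print(body)
        return
    update_issue(body)
```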
Part of the staged-rollout safety net (Tier 3 + green-gate enforcement).
* fix(ci): drop --exclude-mail from Lychee args (removed in v0.21)
First real CI run on PR-1 surfaced this:
    error: unexpected argument '--exclude-mail' found
    tip: a similar argument exists: '--include-mail'
In lychee >= v0.21 the `--exclude-mail` flag was removed; mailto: links
are now skipped by default and the new opt-in flag is `--include-mail`.
The reusable infra-link-check.yml was still passing the old flag, so
lychee was crashing before checking any link. Every reusable
check-links job was reporting "success" anyway because:
- the lychee step has `continue-on-error: true` (so a crash doesn't
fail the job), and
- every caller in this repo currently sets `fail_on_broken: false`
(so the summary step also exits 0).
Net effect: link checking on PR-1 was a no-op. Fix is a one-arg
removal — skipping mail is the new default, which is what we want.
(Worth a separate follow-up: the summary step should distinguish
"lychee crashed" from "lychee found broken links" so that bad args
fail loudly even when fail_on_broken=false. Not blocking this PR.)
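
One possible shape for that follow-up, as a sketch rather than a
committed design. It assumes the summary step can tell whether lychee
wrote its report file, and treats a non-zero exit with no report as a
crash. Both function names and the `report_written` signal are
hypothetical:

```python
def classify_lychee_run(exit_code: int, report_written: bool) -> str:
    """Classify a lychee run as 'clean', 'broken-links', or 'crashed'."""
    if exit_code == 0:
        return "clean"
    if not report_written:
        # Assumption: a run that died on bad args never produces a report.
        return "crashed"
    return "broken-links"


def summary_exit_code(outcome: str, fail_on_broken: bool) -> int:
    """Crashes always fail; broken links fail only when fail_on_broken is set."""
    if outcome == "crashed":
        return 1
    if outcome == "broken-links" and fail_on_broken:
        return 1
    return 0
```

With this split, the `--exclude-mail` failure would have turned the job
red even though every caller sets fail_on_broken=false.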