[PR #1375] [MERGED] fix(ci): unblock book-validate pre-commit (YAML + codespell + manifest) #5157

Closed
opened 2026-04-19 12:50:46 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/harvard-edge/cs249r_book/pull/1375
Author: @profvjreddi
Created: 4/17/2026
Status: Merged
Merged: 4/17/2026
Merged by: @profvjreddi

Base: dev ← Head: fix/ci-yaml-codespell-manifest


📝 Commits (5)

  • 1875819 fix(vault): quote YAML description containing 'javascript:' colon
  • 9bc001a fix(ci): skip PDFs in codespell + ignore domain acronyms + fix typos
  • ab1a8f3 chore(vault): regenerate staffml manifest (8053 → 9199 questions)
  • 03a32ed chore(vault): update corpus-equivalence-hash to match typo-fixed corpus
  • 808b01a fix(vault): revert over-eager codespell fixes; regen corpus + hash

📊 Changes

25 files changed (+71 additions, -57 deletions)

View changed files

📝 .codespell-ignore-words.txt (+9 -0)
📝 .pre-commit-config.yaml (+2 -2)
📝 interviews/staffml/src/data/corpus.json (+16 -16)
📝 interviews/staffml/src/data/vault-manifest.json (+22 -17)
📝 interviews/vault/corpus-equivalence-hash.txt (+1 -1)
📝 interviews/vault/questions/cloud/l1/diagnosis/cloud-2306.yaml (+1 -1)
📝 interviews/vault/questions/cloud/l3/diagnosis/cloud-0601.yaml (+1 -1)
📝 interviews/vault/questions/cloud/l3/fluency/cloud-r2-41047.yaml (+1 -1)
📝 interviews/vault/questions/cloud/l5/design/cloud-sus-64008.yaml (+1 -1)
📝 interviews/vault/questions/cloud/l5/evaluation/cloud-r2-41030.yaml (+1 -1)
📝 interviews/vault/questions/cloud/l5/specification/cloud-fill-01609.yaml (+1 -1)
📝 interviews/vault/questions/cloud/l5/specification/cloud-r2-41033.yaml (+1 -1)
📝 interviews/vault/questions/edge/l1/design/edge-0089.yaml (+1 -1)
📝 interviews/vault/questions/edge/l1/recall/edge-fill-01285.yaml (+1 -1)
📝 interviews/vault/questions/edge/l2/recall/edge-0253.yaml (+1 -1)
📝 interviews/vault/questions/edge/l4/recall/edge-r2-42005.yaml (+1 -1)
📝 interviews/vault/questions/edge/l5/design/edge-fill-01510.yaml (+1 -1)
📝 interviews/vault/questions/edge/l5/specification/edge-0704.yaml (+1 -1)
📝 interviews/vault/questions/mobile/l5/evaluation/mobile-fill-00191.yaml (+1 -1)
📝 interviews/vault/questions/tinyml/l3/implement/tinyml-fill-01361.yaml (+1 -1)

...and 5 more files

📄 Description

Summary

After #1373 merged, book-validate-dev.yml still failed on its pre-commit job because three unrelated issues had accumulated on dev from recent vault/newsletter work. All three were blocking the Book badge from going green.

This PR fixes all three in isolated commits:

  1. YAML syntax (interviews/vault/schema/question_schema.yaml:179): the literal javascript: in a description was parsed as a YAML mapping. Quoted the string.
  2. Codespell: *.pdf wasn't in the skip list, so binary PDFs were being scanned as text and producing garbage flags. Added *.pdf to skip, added real domain acronyms (TBE, HSA, AER, AFE, ABD, shs, MulFunction) to .codespell-ignore-words.txt, and auto-fixed ~25 genuine typos in vault YAMLs and newsletter files (pre-empt*, pre-select*, preemptable, re-use*, heterogenous, 3Nd, Tge, sligh, relevants, unparseable).
  3. StaffML vault manifest: the manifest had fallen behind the corpus, listing 8053 questions against a corpus of 9199. Regenerated via interviews/staffml/scripts/generate-manifest.py. Delta: +1146 questions, +964 chains.
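
Item 1 turns on a YAML rule: in an unquoted (plain) scalar, a colon followed by a space, or a colon at the end of the value, is read as a key/value separator. The following is a minimal Python sketch of a guard that quotes such values before they are written out. The helper names and the sample description text are hypothetical (the actual schema line is not reproduced in this PR), and the check is deliberately narrow, not a complete YAML quoting rule.

```python
import json


def needs_quoting(value: str) -> bool:
    """Return True if a plain YAML scalar would contain a mapping indicator.

    Narrow heuristic: only looks for ': ' inside the value or ':' at its end.
    Other characters that can require quoting (such as ' #') are not handled.
    """
    return ": " in value or value.endswith(":")


def yaml_scalar(value: str) -> str:
    """Return the value as-is, or double-quoted when the heuristic fires."""
    # A JSON string literal is also a valid YAML double-quoted scalar.
    return json.dumps(value) if needs_quoting(value) else value


# Hypothetical description text, used only for illustration.
description = "Reject URLs that use the javascript: scheme"
print("description: " + yaml_scalar(description))
```

A value such as https://example.com/path is left unquoted by this check, because its colon is followed by a slash, not a space.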

Verification

Local pre-commit on the staged changeset passes every hook, including the ones that were failing in CI:

Global: Validate YAML syntax.......... Passed
Global: Check for common misspellings. Passed
StaffML: Validate vault integrity..... Passed
  ✅ All checks passed — vault is deployment-ready
  (5 warnings — chain position sequence — non-blocking)
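
The manifest fix comes down to the manifest and the corpus disagreeing about which questions exist. The validator's internals are not shown in this PR. As a sketch only, assuming both files can be reduced to sets of question IDs (a hypothetical shape, with synthetic IDs sized to the counts quoted above), a drift check might look like this:

```python
def manifest_drift(corpus_ids: set[str], manifest_ids: set[str]) -> dict[str, int]:
    """Summarize how far a manifest has drifted from the corpus it indexes."""
    return {
        "corpus": len(corpus_ids),
        "manifest": len(manifest_ids),
        # Questions present in the corpus but not yet listed in the manifest.
        "missing_from_manifest": len(corpus_ids - manifest_ids),
        # Manifest entries that no longer correspond to any corpus question.
        "stale_in_manifest": len(manifest_ids - corpus_ids),
    }


# Synthetic IDs: 9199 corpus questions, 8053 of them known to the manifest.
corpus = {f"q-{i:05d}" for i in range(9199)}
manifest = {f"q-{i:05d}" for i in range(8053)}

report = manifest_drift(corpus, manifest)
print(report)
```

With these synthetic sets, missing_from_manifest comes out to 1146, matching the question delta reported in the summary.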

Commits (atomic)

  • 18758196b — fix(vault): quote YAML description containing 'javascript:' colon
  • 9bc001a20 — fix(ci): skip PDFs in codespell + ignore domain acronyms + fix typos
  • ab1a8f34a — chore(vault): regenerate staffml manifest (8053 → 9199 questions)

Out of scope (flagged for later)

  • 117 vault chains have non-sequential positions (validate-vault WARNING, not ERROR). Separate data-cleanup task.
  • staffml-preview-dev is also red on dev, but from unrelated TypeScript errors (missing @staffml/vault-types module, trackEvent export, ChainInfo.name). That's its own PR.
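
For the deferred chain cleanup, a sketch of one possible reading of "non-sequential positions" follows. It assumes each chain is a list of integer positions that should cover 1..n exactly once. The data shape, the 1-based numbering, and the function name are all assumptions; validate-vault may define the check differently.

```python
def non_sequential_chains(chains: dict[str, list[int]]) -> list[str]:
    """Return IDs of chains whose positions are not exactly 1..n."""
    bad = []
    for chain_id, positions in chains.items():
        expected = list(range(1, len(positions) + 1))
        # Sorting first means listing order does not matter, only coverage.
        if sorted(positions) != expected:
            bad.append(chain_id)
    return bad


# Hypothetical sample data.
sample = {
    "chain-a": [1, 2, 3],  # sequential
    "chain-b": [1, 3, 4],  # gap at position 2
    "chain-c": [2, 1],     # complete, just listed out of order
    "chain-d": [1, 1, 2],  # duplicate position
}
print(non_sequential_chains(sample))
```

Under this definition, chain-b and chain-d are flagged, while chain-c passes because its positions are complete even though they are unordered.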

Test plan

  • CI pre-commit goes green on this branch
  • After merge, book-validate-dev.yml turns green on next dev push and the README Book badge recovers

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

GiteaMirror added the pull-request label 2026-04-19 12:50:46 -05:00
Reference: github-starred/cs249r_book#5157