[PR #1348] [MERGED] feat(vault): v0.9.0 — YAML source of truth + release pipeline + CC-BY-NC-4.0 corpus #8136

Closed
opened 2026-04-27 17:26:46 -05:00 by GiteaMirror · 0 comments

📋 Pull Request Information

Original PR: https://github.com/harvard-edge/cs249r_book/pull/1348
Author: @profvjreddi
Created: 4/16/2026
Status: Merged
Merged: 4/16/2026
Merged by: @profvjreddi

Base: dev ← Head: feat/vault-architecture


📝 Commits (10+)

  • 84a381d docs(vault): Round 1 review ledger — 4 reviewers, 50+ findings
  • c51ca7b docs(vault): architecture v2 (Round-1 review integration)
  • aa5db46 docs(vault): architecture v2.1 (Round-2 review integration)
  • eaca501 docs(vault): detailed testing plan and cutover QA checklist
  • 5ee46fc feat(vault-cli): Phase 0 package scaffold
  • 6dff01c docs(vault): Phase 0 documentation deliverables
  • 8af0948 ci(vault): Phase 0 CI workflow + exemplar-coverage audit
  • 812ba40 feat(vault): Phase 1 core — schema, hashing, policy, loader, validator
  • f633cc9 feat(vault-cli): Phase 1 commands — build, check, new/edit/rm/restore/move, api, serve
  • 7a8b016 feat(vault): Phase 1 corpus split — 9,657 YAML files, 9,199 published

📊 Changes

9783 files changed (+499014 additions, -156561 deletions)

➖ .github/workflows/staffml-link-check.yml (+0 -247)
➕ .github/workflows/vault-ci.yml (+148 -0)
➕ .github/workflows/vault-content-hash-sli.yml (+148 -0)
➕ interviews/CONTRIBUTING.md (+190 -0)
📝 interviews/README.md (+3 -3)
📝 interviews/paper/corpus_stats.json (+46 -1073)
📝 interviews/paper/macros.tex (+41 -27)
📝 interviews/paper/scripts/generate_macros.py (+72 -123)
➕ interviews/staffml-vault-types/index.ts (+80 -0)
➕ interviews/staffml-vault-types/package.json (+9 -0)
➕ interviews/staffml-vault-worker/.gitignore (+7 -0)
➕ interviews/staffml-vault-worker/README.md (+58 -0)
➕ interviews/staffml-vault-worker/migrations/0001_bootstrap.sql (+81 -0)
➕ interviews/staffml-vault-worker/package-lock.json (+3456 -0)
➕ interviews/staffml-vault-worker/package.json (+21 -0)
➕ interviews/staffml-vault-worker/src/index.ts (+518 -0)
➕ interviews/staffml-vault-worker/src/rate_limit.ts (+83 -0)
➕ interviews/staffml-vault-worker/src/types.ts (+64 -0)
➕ interviews/staffml-vault-worker/tests/worker.test.ts (+195 -0)
➕ interviews/staffml-vault-worker/tsconfig.json (+19 -0)

...and 80 more files

📄 Description

StaffML vault architecture migration

Status: migration-complete in-repo; deploy-gated on your wrangler login.

What this does

Moves the 9,199-question StaffML corpus from a 19 MB JSON blob inlined into every page bundle to a proper YAML-as-source + SQLite-build + edge-D1 serving architecture.

  • YAML is the sole authoring surface (9,657 per-question files under interviews/vault/questions/). Pre-commit hook refuses direct edits to the generated corpus.json; CI enforces YAML → JSON equivalence.
  • 22-subcommand vault CLI for authoring, building, releasing, verifying, shipping.
  • Cloudflare D1 Worker + keyset-paginated API + Cache API + rate limiting + release-keyed ETag — code ready; deploy is user-action.
  • Release 0.9.0 committed as citable artifact at interviews/vault/releases/0.9.0/ with vault.db, release.json, migration SQL, and release_hash fe69d4c4... reproducible from YAML source via vault verify.
  • License: corpus under CC-BY-NC-4.0 (interviews/vault/questions/LICENSE); vault-cli unchanged from historical state.
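
For readers unfamiliar with the "keyset-paginated API" item in the list above, here is a minimal sketch of the technique in Python with the standard-library `sqlite3` module. It is not the Worker's code (that lives in TypeScript on D1). The `questions` table, its `id` and `title` columns, and the helper names `fetch_page`, `build_pagination_demo`, and `iterate_all` are all hypothetical, chosen only for illustration.

```python
import sqlite3


def fetch_page(conn, after_id=None, limit=3):
    """Return (rows, next_cursor) using keyset pagination on the primary key.

    Instead of OFFSET, each request resumes strictly after the last id the
    client saw, so the cost of a page does not grow with its position.
    """
    if after_id is None:
        rows = conn.execute(
            "SELECT id, title FROM questions ORDER BY id LIMIT ?", (limit,)
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, title FROM questions WHERE id > ? ORDER BY id LIMIT ?",
            (after_id, limit),
        ).fetchall()
    # A short (or empty) page means there is nothing left to fetch.
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor


def build_pagination_demo():
    """Create an in-memory table with seven hypothetical questions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE questions (id TEXT PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO questions VALUES (?, ?)",
        [(f"q-{i:04d}", f"Question {i}") for i in range(1, 8)],
    )
    return conn


def iterate_all(conn, limit=3):
    """Walk every page and collect the ids in the order they were served."""
    seen, cursor = [], None
    while True:
        rows, cursor = fetch_page(conn, cursor, limit)
        seen.extend(row[0] for row in rows)
        if cursor is None:
            return seen
```

Calling `iterate_all(build_pagination_demo())` walks the table in pages of three. The cursor is just the last `id` of the previous page, which is why the approach stays stable when rows are inserted between requests.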

The bug this closes

v1 paper reported 9,199 questions; v1 site rendered 8,053. They used different filter predicates. Post-migration: both read from the same SQL over the same vault.db with a single release-policy.yaml predicate. Paper and site agree by construction.
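
The "agree by construction" claim can be illustrated with a small sketch: when both consumers call one counting function that applies one predicate to one database, their numbers cannot drift apart. The predicate string, the `questions` schema, and the function names below are assumptions made for the example. The real predicate lives in `release-policy.yaml` and is not reproduced in this PR description.

```python
import sqlite3

# Hypothetical stand-in for the single predicate in release-policy.yaml.
PUBLISHED_PREDICATE = "status = 'published'"


def published_count(conn):
    """Count questions under the one shared release predicate."""
    # The predicate is a trusted constant defined above, not user input.
    query = f"SELECT COUNT(*) FROM questions WHERE {PUBLISHED_PREDICATE}"
    return conn.execute(query).fetchone()[0]


def paper_count(conn):
    """What the paper's macro generator would report (illustrative)."""
    return published_count(conn)


def site_count(conn):
    """What the site would render (illustrative)."""
    return published_count(conn)


def build_policy_demo():
    """Create an in-memory table with a mix of published and unpublished rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE questions (id TEXT PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO questions VALUES (?, ?)",
        [
            ("q-1", "published"),
            ("q-2", "published"),
            ("q-3", "draft"),
            ("q-4", "deprecated"),
        ],
    )
    return conn
```

The design point is that neither consumer owns a filter of its own. Changing what counts as published means editing one predicate, and both numbers move together.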

Review history

Four full adversarial review rounds with Chip Huyen, Jeff Dean, Soumith Chintala, and an industry-engineer lens. 80+ findings, all integrated or explicitly deferred with rationale. See interviews/vault/REVIEWS.md.

Tests

  • pytest: 38/38 green
  • vitest (worker): 7/7 green
  • ruff: clean
  • vault check --strict on 9,657 questions: 0 load errors, 0 invariant failures
  • vault verify 0.9.0: citation round-trip passes
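
As a rough illustration of what a reproducible release hash involves, the sketch below hashes records in a canonical form so that key order and file order do not affect the result. This is an assumption-laden example, not the `vault verify` implementation: the use of SHA-256, the canonical JSON encoding, and the names `content_hash`, `release_hash`, and `verify` are all hypothetical.

```python
import hashlib
import json


def content_hash(record: dict) -> str:
    """Hash one record in canonical form: sorted keys, fixed separators."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def release_hash(records: list[dict]) -> str:
    """Combine per-record hashes in id order so input order is irrelevant."""
    digest = hashlib.sha256()
    for record in sorted(records, key=lambda r: r["id"]):
        digest.update(content_hash(record).encode("ascii"))
    return digest.hexdigest()


def verify(records: list[dict], expected: str) -> bool:
    """Recompute the hash from source records and compare to the cited one."""
    return release_hash(records) == expected
```

Under this scheme, a citation only needs to carry the hex digest. Anyone holding the same source records can recompute it, and any edit to a single record changes the result.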

Deploy runbook

See interviews/vault-cli/docs/CUTOVER_QA.md for the sequential operator checklist. Phase-3 entry gates (FTS5 load test) and Phase-4 cutover (canary ship + 48h watch) are user-action when ready.

Companion docs

  • ARCHITECTURE.md v2.4 (interviews/vault/ARCHITECTURE.md): 1,800+ line design doc with keyed changelog v1 → v2.4.
  • TESTING.md (interviews/vault/TESTING.md): test plan + CI spec + phase gates.
  • REVIEWS.md (interviews/vault/REVIEWS.md): 4-round review ledger.
  • CONTRIBUTING.md (interviews/CONTRIBUTING.md): quickstart + NC-license guidance.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
GiteaMirror added the pull-request label 2026-04-27 17:26:46 -05:00
Reference: github-starred/cs249r_book#8136