v2.3 → v2.4. ARCHITECTURE.md header + Appendix reflect the completed
migration.
WHAT CLOSED (§11.1 contract):
1. `vault build --legacy-json` regenerates the site's
interviews/staffml/src/data/corpus.json from YAML. 9,199 published
questions, site-compatible shape (chain_positions back to 0-indexed
dict form, bloom_level derived from zone, competency_area aliased
from topic, scope aliased from track). Deterministic via sort_keys +
id-sort.
2. Pre-commit hook INSTALLED via worktree-aware Makefile target
(`make -C interviews/vault-cli hooks`). Symlink points at
pre_commit_corpus_guard.py. Tested end-to-end: direct edit to
vault/corpus.json triggers exit-1 with §11.1 reference.
3. CI equivalence check added to .github/workflows/vault-ci.yml:
regenerates corpus.json from YAML, diffs against committed. Fails
PR on drift with actionable error message.
4. Legacy generators demoted with DEPRECATED headers:
- interviews/paper/scripts/analyze_corpus.py → vault export-paper
- interviews/staffml/scripts/sync-vault.py → vault build --legacy-json
- interviews/staffml/scripts/generate-manifest.py → vault publish
- interviews/vault/scripts/export_to_staffml.py → vault build --legacy-json
5. New DEPRECATED.md files at interviews/vault/scripts/ and
interviews/staffml/scripts/ map every legacy script to its
replacement. Both directories keep the old scripts for git-history
legibility and archaeology; new contributors see the vault CLI first.
6. ARCHITECTURE.md §Appendix rewritten as current-state table instead
of aspirational "gone. replaced by..." entries.
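A minimal sketch of the determinism and aliasing described in item 1, assuming the new schema stores chain_positions as a list of `{"chain", "position"}` entries (that shape, and the helper names `to_legacy` / `emit_legacy_json`, are hypothetical and not taken from the vault-cli source). The zone-to-bloom_level mapping is omitted because it is not stated here.

```python
import json


def to_legacy(question: dict) -> dict:
    """Map a new-schema question to the legacy site shape (field layout assumed)."""
    legacy = dict(question)
    # Alias fields the site still reads under their old names.
    legacy["competency_area"] = question["topic"]
    legacy["scope"] = question["track"]
    # Assumed new shape: list of 1-indexed entries. Legacy: 0-indexed dict.
    legacy["chain_positions"] = {
        entry["chain"]: entry["position"] - 1
        for entry in question.get("chain_positions", [])
    }
    return legacy


def emit_legacy_json(questions: list[dict]) -> str:
    """Serialize so that input order and key order cannot affect the bytes."""
    ordered = sorted((to_legacy(q) for q in questions), key=lambda q: q["id"])
    return json.dumps(ordered, sort_keys=True, indent=2, ensure_ascii=False) + "\n"
```

Sorting records by id and keys via sort_keys is what lets a plain diff against the committed file serve as the CI drift check.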
NEW TESTS (interviews/vault-cli/tests/test_legacy_export.py, +4):
- test_legacy_shape_matches_site_interface: every field corpus.ts
declares is present in regenerated JSON.
- test_chain_positions_legacy_shape: 1-indexed new schema →
0-indexed legacy dict form.
- test_emitter_deterministic: byte-stable across reversed input order
(required for CI diff-check).
- test_competency_area_aliases_topic: legacy alias fields populated
correctly.
FULL MATRIX GREEN:
pytest: 38/38 passed in 0.19s (34 + 4 legacy-export)
ruff: All checks passed
hook: exit 0 on clean diff / exit 1 on corpus.json direct edit
e2e: vault build --legacy-json regenerates a bit-identical corpus.json
vs the committed one; CI check wired to catch drift
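The hook row above (exit 0 on a clean diff, exit 1 on a direct corpus.json edit) can be sketched as a pure function over the staged path list. This is an illustration only: the `PROTECTED` tuple, the function names, and the message text are assumptions, and the real pre_commit_corpus_guard.py may obtain staged paths and word its error differently.

```python
import sys

# Assumed protected path; the generated file must only change via the CLI.
PROTECTED = ("vault/corpus.json",)


def is_protected(path: str) -> bool:
    """True if the path is a protected file, at repo root or under a prefix."""
    return any(path == p or path.endswith("/" + p) for p in PROTECTED)


def guard(staged_paths: list[str]) -> int:
    """Return the hook exit code for a list of staged paths."""
    offending = [p for p in staged_paths if is_protected(p)]
    for path in offending:
        print(
            f"{path}: direct edits are blocked (see §11.1); "
            "regenerate with `vault build --legacy-json`",
            file=sys.stderr,
        )
    return 1 if offending else 0
```

Keeping the decision in a function that takes paths, rather than shelling out to git inside it, makes both exit codes easy to unit-test.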
WHAT'S LEFT (deploy-gated, §20.5 #1, #5, #6 partial, #8, #9):
- Production serves from D1: requires Phase-3 wrangler d1 create + deploy
- Manual QA per CUTOVER_QA.md: requires live staging
- Zero data loss D1-side verification: requires live D1
- 48h monitoring: requires production traffic
These items intrinsically require user action; the YAML-side migration is done.
vault-cli/src/vault_cli/commands/stats.py (NEW, B.8)
vault stats — live scorecard over vault.db with --format-prometheus
scrape mode + --exemplar-coverage audit shim. Reports total / topics
/ chains / by_status / by_track / by_provenance. Resolves R3 gap
about missing stats subcommand.
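For the scrape mode, a rough sketch of rendering counts in the Prometheus text exposition format. The metric names `vault_questions` and `vault_questions_by_status`, and the function name, are hypothetical; the names stats.py actually emits are not given here.

```python
def render_prometheus(total: int, by_status: dict[str, int]) -> str:
    """Render scorecard counts as Prometheus text exposition lines."""
    lines = [
        "# TYPE vault_questions gauge",
        f"vault_questions {total}",
        "# TYPE vault_questions_by_status gauge",
    ]
    # Sort labels so repeated scrapes of the same data produce identical text.
    for status in sorted(by_status):
        lines.append(
            f'vault_questions_by_status{{status="{status}"}} {by_status[status]}'
        )
    return "\n".join(lines) + "\n"
```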
vault-cli/src/vault_cli/commands/codegen.py (NEW, B.7)
vault codegen --check — Phase-1 presence-and-non-empty verification
of the 3 shared-artifact files (models.py, d1-schema.sql,
@staffml/vault-types/index.ts). Full LinkML-driven generation is
Phase-2 follow-up.
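The Phase-1 presence-and-non-empty check amounts to something like the following sketch. The function name and the way paths are passed in are assumptions; codegen.py may locate the three artifacts differently.

```python
from pathlib import Path


def check_artifacts(root: Path, relative_paths: list[str]) -> list[str]:
    """Return one problem string per artifact that is missing or empty."""
    problems = []
    for rel in relative_paths:
        path = root / rel
        if not path.is_file():
            problems.append(f"{rel}: missing")
        elif path.stat().st_size == 0:
            problems.append(f"{rel}: empty")
    return problems
```

An empty result would correspond to a passing `--check`; any entry would correspond to a non-zero exit.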
vault-cli/Makefile (NEW, B.2)
make install / test / lint / hooks / hooks-uninstall. Hooks target
symlinks pre_commit_corpus_guard.py into .git/hooks/pre-commit.
vault-cli/scripts/check_registry_append_only.py (NEW, B.3)
CI script verifying id-registry.yaml is append-only vs base branch.
Rejects removed or reordered lines — C-5 enforcement at merge time.
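The append-only rule reduces to a prefix check: every line on the base branch must still be present, unchanged and in the same position, on the head. A minimal sketch, assuming a line-level comparison (the function name is hypothetical, and how the real script reads the base-branch file is not shown):

```python
def append_only_violations(base_lines: list[str], head_lines: list[str]) -> list[str]:
    """List the ways head fails to be base plus appended lines."""
    violations = []
    if len(head_lines) < len(base_lines):
        removed = len(base_lines) - len(head_lines)
        violations.append(f"{removed} line(s) removed")
    # Any mismatch inside the shared prefix is an edit, removal, or reorder.
    for lineno, (old, new) in enumerate(zip(base_lines, head_lines), start=1):
        if old != new:
            violations.append(f"line {lineno}: expected {old!r}, found {new!r}")
    return violations
```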
vault/questions/LICENSE (NEW)
CC-BY-4.0 for corpus content. BibTeX template with release_hash
placeholder. Scope note clarifies vault-cli is MIT separately.
vault-cli/LICENSE (NEW)
MIT for vault-cli Python package + scripts + docs. Scope note
clarifies corpus is CC-BY-4.0 separately.
staffml/src/lib/corpus-vault.ts (NEW, B.11)
Vault-API-backed data source mirroring corpus.ts public surface.
Adapts @staffml/vault-types Question → legacy Question shape so
callers don't need to change. Not wired into any component yet —
the swap happens via corpus-source.ts.
staffml/src/lib/corpus-source.ts (NEW, B.11)
Cutover router: getCorpusSource() returns 'static' or 'vault-api'
based on NEXT_PUBLIC_VAULT_FALLBACK. Components that opt into the
cutover import from here; others continue using corpus.ts directly
(unchanged behavior). Phase-4 cutover flips components one-by-one
rather than big-bang-replacing corpus.ts.
Phase-1/2 now has the full CLI surface (19 subcommands), LICENSEs
for legal Phase-3 deploy, and the site-side cutover pathway ready
for Phase-4 canary.