2 Commits

Author SHA1 Message Date
Vijay Janapa Reddi
cbdb566381 feat(vault): Phase-1 migration contract fully closed in-repo
v2.3 → v2.4. The ARCHITECTURE.md header and Appendix now reflect the
completed migration.

WHAT CLOSED (§11.1 contract):
  1. `vault build --legacy-json` regenerates the site's
     interviews/staffml/src/data/corpus.json from YAML. 9,199 published
     questions, site-compatible shape (chain_positions back to 0-indexed
     dict form, bloom_level derived from zone, competency_area aliased
     from topic, scope aliased from track). Deterministic via sort_keys +
     id-sort.
  2. Pre-commit hook INSTALLED via worktree-aware Makefile target
     (`make -C interviews/vault-cli hooks`). Symlink points at
     pre_commit_corpus_guard.py. Tested end-to-end: a direct edit to
     vault/corpus.json triggers exit 1 with a §11.1 reference.
  3. CI equivalence check added to .github/workflows/vault-ci.yml:
     regenerates corpus.json from YAML and diffs it against the committed
     copy. Fails the PR on drift with an actionable error message.
  4. Legacy generators demoted with DEPRECATED headers:
     - interviews/paper/scripts/analyze_corpus.py → vault export-paper
     - interviews/staffml/scripts/sync-vault.py → vault build --legacy-json
     - interviews/staffml/scripts/generate-manifest.py → vault publish
     - interviews/vault/scripts/export_to_staffml.py → vault build --legacy-json
  5. New DEPRECATED.md files at interviews/vault/scripts/ and
     interviews/staffml/scripts/ map every legacy script to its
     replacement. Both directories keep the old scripts for git-history
     legibility and archaeology; new contributors see the vault CLI first.
  6. ARCHITECTURE.md §Appendix rewritten as a current-state table instead
     of aspirational "gone. replaced by..." entries.
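The CI equivalence check in item 3 boils down to "regenerate, compare, fail with a readable diff". A minimal sketch of that comparison step follows; the helper name check_corpus_drift and the error wording are hypothetical and not taken from vault-ci.yml, and the sketch assumes the regenerated and committed files have already been read into strings.

```python
import difflib
import sys


def check_corpus_drift(regenerated: str, committed: str, path: str = "corpus.json") -> int:
    """Return 0 if the committed file matches the regenerated one, else 1.

    Hypothetical helper illustrating the CI idea (regenerate from YAML,
    diff against the committed copy); it is not the actual workflow code.
    """
    if regenerated == committed:
        return 0
    # Print a bounded unified diff so the failure points at the drift.
    diff = difflib.unified_diff(
        committed.splitlines(keepends=True),
        regenerated.splitlines(keepends=True),
        fromfile=f"{path} (committed)",
        tofile=f"{path} (regenerated from YAML)",
    )
    sys.stderr.writelines(list(diff)[:40])
    sys.stderr.write(
        f"\n{path} has drifted from the YAML source. Run "
        "`vault build --legacy-json` and commit the result instead of "
        "editing the file by hand (see §11.1).\n"
    )
    return 1
```

Because the comparison is byte-for-byte on strings, it only works if the emitter is deterministic, which is why the determinism test below matters.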

NEW TESTS (interviews/vault-cli/tests/test_legacy_export.py, +4):
  - test_legacy_shape_matches_site_interface: every field corpus.ts
    declares is present in regenerated JSON.
  - test_chain_positions_legacy_shape: 1-indexed new schema →
    0-indexed legacy dict form.
  - test_emitter_deterministic: byte-stable across reversed input order
    (required for CI diff-check).
  - test_competency_area_aliases_topic: legacy alias fields populated
    correctly.
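The behaviours these tests pin down (0-indexed chain_positions dict, competency_area/scope aliases, byte-stable output via id-sort plus sort_keys) can be sketched as a small emitter. This is an illustration under stated assumptions, not the vault-cli implementation: the function names are hypothetical, the new-schema chain_positions shape (a list of {"chain", "position"} entries) is assumed, and bloom_level is omitted because the zone-to-Bloom mapping is not given here.

```python
import json
from typing import Any


def to_legacy(question: dict[str, Any]) -> dict[str, Any]:
    """Map one new-schema question to a hypothetical legacy shape."""
    legacy = dict(question)
    # Assumed new-schema form: [{"chain": "c1", "position": 1}, ...] (1-indexed).
    # Assumed legacy form: {"c1": 0, ...} (0-indexed dict).
    legacy["chain_positions"] = {
        entry["chain"]: entry["position"] - 1
        for entry in question.get("chain_positions", [])
    }
    # Legacy alias fields named in the commit message.
    legacy["competency_area"] = question["topic"]
    legacy["scope"] = question["track"]
    return legacy


def emit_legacy_json(questions: list[dict[str, Any]]) -> str:
    """Serialize deterministically: sort records by id and keys in every object."""
    records = [to_legacy(q) for q in sorted(questions, key=lambda q: q["id"])]
    return json.dumps(records, sort_keys=True, indent=2, ensure_ascii=False) + "\n"
```

Sorting by id removes dependence on input order, and sort_keys removes dependence on dict insertion order, so reversing the input yields identical bytes.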

FULL MATRIX GREEN:
  pytest:  38/38 passed in 0.19s (34 + 4 legacy-export)
  ruff:    All checks passed
  hook:    exit 0 on clean diff / exit 1 on corpus.json direct edit
  e2e:     vault build --legacy-json regenerates a bit-identical corpus.json
           vs the committed one; CI check wired to catch drift

WHAT'S LEFT (deploy-gated, §20.5 #1, #5, #6 partial, #8, #9):
  - Production serves from D1: requires Phase-3 wrangler d1 create + deploy
  - Manual QA per CUTOVER_QA.md: requires live staging
  - Zero data loss D1-side verification: requires live D1
  - 48h monitoring: requires production traffic

These items inherently require user action; the YAML-side migration is done.
2026-04-16 14:57:24 -04:00
Vijay Janapa Reddi
1bc93374e1 feat(vault): Phase-1/2 polish + LICENSEs + corpus cutover branch
vault-cli/src/vault_cli/commands/stats.py (NEW, B.8)
  vault stats — live scorecard over vault.db with --format-prometheus
  scrape mode + --exemplar-coverage audit shim. Reports total / topics
  / chains / by_status / by_track / by_provenance. Resolves the R3 gap
  about the missing stats subcommand.

vault-cli/src/vault_cli/commands/codegen.py (NEW, B.7)
  vault codegen --check — Phase-1 presence-and-non-empty verification
  of the 3 shared-artifact files (models.py, d1-schema.sql,
  @staffml/vault-types/index.ts). Full LinkML-driven generation is
  Phase-2 follow-up.
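A presence-and-non-empty check like codegen --check can be expressed in a few lines of pathlib. This sketch takes the artifact paths as arguments because the exact repository locations of models.py, d1-schema.sql, and index.ts are not spelled out here; the function names are hypothetical, and treating a whitespace-only file as empty is an assumption.

```python
from pathlib import Path


def check_artifacts(paths: list[Path]) -> list[str]:
    """Return one problem string per missing or empty artifact."""
    problems = []
    for path in paths:
        if not path.is_file():
            problems.append(f"missing: {path}")
        elif not path.read_text(encoding="utf-8").strip():
            # Assumption: a whitespace-only file counts as empty.
            problems.append(f"empty: {path}")
    return problems


def main(paths: list[Path]) -> int:
    """Exit-code wrapper: 0 when every artifact is present and non-empty."""
    problems = check_artifacts(paths)
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

This only proves the files exist and have content; verifying that they agree with the schema is the job of the Phase-2 LinkML-driven generation.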

vault-cli/Makefile (NEW, B.2)
  make install / test / lint / hooks / hooks-uninstall. Hooks target
  symlinks pre_commit_corpus_guard.py into .git/hooks/pre-commit.

vault-cli/scripts/check_registry_append_only.py (NEW, B.3)
  CI script verifying id-registry.yaml is append-only vs base branch.
  Rejects removed or reordered lines — C-5 enforcement at merge time.
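An append-only rule means the base branch's lines must be an exact prefix of the head branch's lines. A minimal sketch of that comparison follows, assuming both versions of id-registry.yaml have already been read as text (for example, the base copy via `git show`); the function name and message format are hypothetical.

```python
def check_append_only(base_text: str, head_text: str) -> list[str]:
    """Return errors if head is not base plus lines appended at the end."""
    base = base_text.splitlines()
    head = head_text.splitlines()
    errors = []
    for lineno, (old, new) in enumerate(zip(base, head), start=1):
        if old != new:
            # A removed or reordered line surfaces as the first mismatch.
            errors.append(f"line {lineno}: {old!r} changed to {new!r}")
            break
    else:
        # No mismatch in the shared prefix; head must not be shorter than base.
        if len(head) < len(base):
            errors.append(f"{len(base) - len(head)} line(s) removed from the end")
    return errors
```

Reporting only the first divergence keeps the message short; any edit that is not a pure append is rejected either way.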

vault/questions/LICENSE (NEW)
  CC-BY-4.0 for corpus content. BibTeX template with release_hash
  placeholder. Scope note clarifies vault-cli is MIT separately.

vault-cli/LICENSE (NEW)
  MIT for vault-cli Python package + scripts + docs. Scope note
  clarifies corpus is CC-BY-4.0 separately.

staffml/src/lib/corpus-vault.ts (NEW, B.11)
  Vault-API-backed data source mirroring corpus.ts public surface.
  Adapts @staffml/vault-types Question → legacy Question shape so
  callers don't need to change. Not wired into any component yet —
  the swap happens via corpus-source.ts.

staffml/src/lib/corpus-source.ts (NEW, B.11)
  Cutover router: getCorpusSource() returns 'static' or 'vault-api'
  based on NEXT_PUBLIC_VAULT_FALLBACK. Components that opt into the
  cutover import from here; others continue using corpus.ts directly
  (unchanged behavior). Phase-4 cutover flips components one-by-one
  rather than big-bang-replacing corpus.ts.

Phase-1/2 now has the full CLI surface (19 subcommands), LICENSEs
for legal Phase-3 deploy, and the site-side cutover pathway ready
for Phase-4 canary.
2026-04-16 13:10:16 -04:00