16 Commits

Author SHA1 Message Date
Vijay Janapa Reddi
bc26a0bf37 feat(vault): Phase 6 schema tightening — markers + Details forbid + invariant
Three coordinated edits to lift the marker convention from a soft
draft-validation gate to a published-corpus invariant:

1. interviews/vault/schema/question_schema.yaml (LinkML, source of truth):
   common_mistake and napkin_math gain regex patterns matching the
   AUTHORING.md Pitfall/Rationale/Consequence and Assumptions/
   Calculations/Conclusion conventions. Documents the spec; enforced
   in the validator below.

2. interviews/vault-cli/src/vault_cli/models.py (Pydantic, derived):
   Details flips from extra='allow' to extra='forbid'. A pre-flight
   survey on 2026-05-04 across all 10,711 YAMLs found 0 unknown keys
   on Details, so the historical 'imported legacy fields' risk no
   longer applies.

3. interviews/vault-cli/src/vault_cli/validator.py:
   structural_tier gains _check_format_markers (invariant #19), which
   flags published YAMLs whose non-empty cm/nm doesn't match the
   AUTHORING.md markers. Drafts are exempt — author-in-progress drafts
   may still have malformed markers. Lifts gate_format
   (validate_drafts.py / _judges.py) from a CI-time gate to a
   vault-check-strict invariant.
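
A minimal sketch of the kind of check invariant #19 describes, assuming
the markers are bold labels that must appear in a fixed order. Only
**The Rationale:** is quoted verbatim elsewhere in this log; the other
two marker spellings, the regex, and the function signature below are
illustrative assumptions, not the validator's actual code.

```python
import re
from typing import List, Optional

# Hypothetical marker spellings: only "**The Rationale:**" appears verbatim
# in this log; the other two are assumed to follow the same pattern.
COMMON_MISTAKE_RE = re.compile(
    r"\*\*The Pitfall:\*\*.+?\*\*The Rationale:\*\*.+?\*\*The Consequence:\*\*.+",
    re.DOTALL,
)


def check_format_markers(status: str, common_mistake: Optional[str]) -> List[str]:
    """Return violation messages; an empty list means compliant or exempt."""
    if status != "published":
        return []  # drafts are exempt
    if not common_mistake or not common_mistake.strip():
        return []  # only non-empty fields are checked
    if COMMON_MISTAKE_RE.search(common_mistake) is None:
        return ["common_mistake does not match the marker convention"]
    return []
```

The same shape would apply to napkin_math with the Assumptions/
Calculations/Conclusion markers.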

Tests: 4 new cases in test_models covering Details forbid, marker-
compliant pass, malformed cm fail, and draft-exempt skip. Total
88 passing (was 84). codegen-hashes.txt updated for the models.py
edit; vault codegen --check passes.

The on-disk corpus is fully clean post-Phase-5+drain: vault check
--strict reports 10,711 loaded, 0 invariant failures, 0 format-
marker violations on published YAMLs.
2026-05-04 08:41:08 -04:00
Vijay Janapa Reddi
03031dc38e test(vault-cli): smoke tests for audit_corpus_batched batching
7 tests covering pack_batches:
  - empty input → no batches
  - single small item → one batch
  - no items lost across batches (50 items, 10/batch → all 50 round-trip)
  - max_items_per_batch caps batch size (33 items, 10/batch → 10/10/10/3)
  - max_chars triggers a flush before items overflow the budget
  - input order preserved within and across batches
  - oversized single item still lands in a batch (we don't drop, the
    caller is expected to detect overflow downstream)

The audit script itself can't easily be unit-tested in CI (it shells
out to the gemini CLI via subprocess); the batching helper is the main
piece of pure logic, so this is where the value is.
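
For reference, a helper satisfying the seven behaviours listed above
could look roughly like this. The parameter names max_items_per_batch
and max_chars come from the test descriptions; the signature, the
use of plain strings as items, and the body are assumptions, not the
actual audit_corpus_batched code.

```python
from typing import List


def pack_batches(
    items: List[str], max_items_per_batch: int, max_chars: int
) -> List[List[str]]:
    """Greedily pack items into ordered batches under two budgets."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_chars = 0
    for item in items:
        too_many = len(current) >= max_items_per_batch
        too_long = current_chars + len(item) > max_chars
        # Flush before adding, but never flush an empty batch: an oversized
        # single item still gets its own batch instead of being dropped.
        if current and (too_many or too_long):
            batches.append(current)
            current, current_chars = [], 0
        current.append(item)
        current_chars += len(item)
    if current:
        batches.append(current)
    return batches
```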

  84 / 84 pytest pass (was 77; added 7)

CORPUS_HARDENING_PLAN.md Phase 3.
2026-05-03 08:23:08 -04:00
Vijay Janapa Reddi
f691d6c14a feat(vault-cli): vault new scaffolds full Pitfall/Rationale/Consequence + Assumptions/Calculations/Conclusion stubs
The previous scaffold only stubbed scenario and realistic_solution with
<TODO> placeholders. That meant authors had to know about the markup
conventions from somewhere else (the regex in validate_drafts.py, the
SCHEMA_SUMMARY in generate_question_for_gap.py, or the paragraph in
ARCHITECTURE.md §3.6.1) — none of which a new contributor would find.

Now `vault new` produces a YAML with the canonical bold markers
pre-written. Authors fill in the content between markers; they can't
forget to use them.

Templates extracted as module-level constants (COMMON_MISTAKE_TEMPLATE
and NAPKIN_MATH_TEMPLATE in commands/authoring.py) so they're testable
in isolation. New tests in test_authoring_scaffold.py guard against
accidental marker removal — if a contributor edits the scaffold and
drops, say, **The Rationale:**, the test fails immediately rather than
every new question silently failing the format gate downstream.
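
The guard described above might be sketched as follows. The constant
name COMMON_MISTAKE_TEMPLATE and the **The Rationale:** marker are from
this commit; the template text, the other marker spellings, and the
missing_markers helper are hypothetical stand-ins for whatever
test_authoring_scaffold.py actually does.

```python
# Hypothetical template text: only the constant name and the
# "**The Rationale:**" marker come from the commit message.
COMMON_MISTAKE_TEMPLATE = (
    "**The Pitfall:** <TODO>\n"
    "**The Rationale:** <TODO>\n"
    "**The Consequence:** <TODO>\n"
)

REQUIRED_MARKERS = (
    "**The Pitfall:**",
    "**The Rationale:**",
    "**The Consequence:**",
)


def missing_markers(template, markers=REQUIRED_MARKERS):
    """Return the required markers that are absent or out of order."""
    missing, cursor = [], 0
    for marker in markers:
        pos = template.find(marker, cursor)
        if pos == -1:
            missing.append(marker)
        else:
            cursor = pos + len(marker)
    return missing


def test_common_mistake_template_keeps_markers():
    # Fails as soon as someone edits a marker out of the scaffold.
    assert missing_markers(COMMON_MISTAKE_TEMPLATE) == []
```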

  77 / 77 pytest pass (was 74; added 3)
  ruff clean
  vault check --strict — 10,711 loaded, 0 invariant failures

CORPUS_HARDENING_PLAN.md Phase 2.
2026-05-03 08:11:59 -04:00
Vijay Janapa Reddi
270b1a5bd2 fix(vault): drop 55 Δ=0 chains + remove Δ=0 from lenient mode
Action on the strongest finding from the 2026-05-01 independent audit:
54 of 55 Δ=0 chains had no shared scenario (the "two questions
sharing a scenario thread" constraint the lenient prompt was supposed
to enforce). Two independent audit fields agreed (verdict=bad and
shared_scenario=no), so this isn't a tuning question — the design
choice was wrong.

Why remove Δ=0 entirely rather than tighten the prompt:

  - The chain definition is "pedagogical progression through Bloom
    levels"; same-level edges contradict the definition.
  - The "shared scenario / different angle" carve-out is unenforceable
    by an LLM at corpus scale (audit confirmed).
  - Same-scenario same-level pairs are more honestly modeled as
    siblings of a chain anchor, not as chain members.

Changes:
  - chains.json: 879 → 824. Dropped: 55 chains (all tier=secondary,
    since Δ=0 was only ever produced by the lenient sweep).
    Per-track: edge -19, tinyml -12, mobile -10, cloud -7, global -7.
  - build_chains_with_gemini.py:
      MODE_CONFIG["lenient"]["allowed_deltas"]: {0,1,2,3} → {1,2,3}
      LENIENT_PROMPT_TEMPLATE: Δ=0 paragraph rewritten to explicitly
        REJECT same-level pairs (with rationale citing the audit).
      docstring + --mode help text updated.
  - tests/test_chain_validation.py:
      test_lenient_accepts_same_level_pair → test_lenient_rejects_same_level_pair
      header docstring updated to reflect the new rule.
  - vault-manifest.json: chainCount 879 → 824, releaseHash rolls to
    479811040b7a… (real content delta, not timestamp churn).
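
A rough sketch of the per-mode Δ gate after this change, assuming Δ is
the difference in Bloom-level rank between consecutive questions in a
chain. The lenient set {1,2,3} is from this commit and strict
accepting +1/+2 is from the Phase 1.2 test description; the flattened
MODE_CONFIG shape, BLOOM_ORDER, and chain_deltas_ok are illustrative,
not the code in build_chains_with_gemini.py.

```python
BLOOM_ORDER = ["remember", "understand", "apply", "analyze", "evaluate", "create"]

# Simplified: the real MODE_CONFIG also carries the prompt template.
MODE_CONFIG = {
    "strict": {"allowed_deltas": {1, 2}},
    "lenient": {"allowed_deltas": {1, 2, 3}},  # was {0, 1, 2, 3}
}


def chain_deltas_ok(levels, mode="strict"):
    """True if every consecutive Bloom-level step is allowed for the mode."""
    allowed = MODE_CONFIG[mode]["allowed_deltas"]
    ranks = [BLOOM_ORDER.index(level) for level in levels]
    return all((b - a) in allowed for a, b in zip(ranks, ranks[1:]))
```

With Δ=0 removed, a same-level pair such as apply/apply fails in both
modes, which is what test_lenient_rejects_same_level_pair asserts.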

Validation:
  - vault check --strict: 10,705 loaded, 0 failures
  - vault build --local-json: chainCount=824, releaseHash=479811040b…
  - pytest: 74/74
  - playwright chain-and-vault-smoke: 19/19 (fixtures cloud-0001 +
    cloud-0231 are still in their chains post-drop)

Audit findings #2 (gap detection ~50% noise) and #3 (4 pilot drafts
disposition) remain open — see CHAIN_ROADMAP.md Progress Log.
2026-05-02 08:51:49 -04:00
Vijay Janapa Reddi
9680e8e9fd feat(vault+staffml): Phase 2 — tier surfacing, schema → TS → UI
Carries the primary/secondary chain tier (from Phase 1) through the
build pipeline into the practice + explore surfaces, so primary chains
are the unmarked default and secondary chains are an opt-in alternative
path the user can deep-link into via ?chain=<id>.

Backend (2.1):
  - legacy_export.py emits chain_tiers per question alongside chain_ids
    and chain_positions; missing chain-tier defaults to "primary".
  - vault build re-run: 2953 chained questions, all carry chain_tiers
    (releaseHash unchanged — new field is additive, doesn't perturb the
    manifest hash inputs).
  - Existing legacy_export tests were stale (asserted on the v1.0 YAML
    chains: field path; v1.1 made chains.json the sidecar source).
    Rewrote them to write chains.json fixtures into tmp_path and added
    chain_tiers assertions, plus a focused
    test_chain_tiers_emitted_per_membership case.
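
The per-membership emission can be pictured like this. The output field
names (chain_ids, chain_positions, chain_tiers) and the "primary"
default are from this commit; the chains.json entry shape and the
function name are assumptions made for the sketch.

```python
def chain_fields_for(qid, chains):
    """Build the chain fields for one question.

    `chains` is assumed to be a list of dicts shaped like
    {"id": str, "tier": str (optional), "questions": [qid, ...]}.
    """
    chain_ids, chain_positions, chain_tiers = [], {}, {}
    for chain in chains:
        if qid not in chain["questions"]:
            continue
        cid = chain["id"]
        chain_ids.append(cid)
        chain_positions[cid] = chain["questions"].index(qid)  # 0-indexed
        chain_tiers[cid] = chain.get("tier", "primary")  # missing tier defaults
    return {
        "chain_ids": chain_ids,
        "chain_positions": chain_positions,
        "chain_tiers": chain_tiers,
    }
```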

TypeScript (2.2):
  - Question.chain_tiers? (Record<string, "primary"|"secondary">)
  - ChainTier export, ChainInfo.tier required.
  - getChainForQuestion / getAllChainsForQuestion populate tier;
    getAllChains... sorts primary first.
  - New getPrimaryChainForQuestion(qid) helper for default surfaces.

UI (2.3):
  - practice page reads ?chain=<id> URL param; defaults to
    getPrimaryChainForQuestion when unset.
  - ChainBadge gains an inline "alt path" pill when tier=secondary
    (always visible — no click needed).
  - ChainStrip mirrors that pill in the progress row for users who
    expand the strip.
  - Explore page prefers the first non-secondary chain when picking
    activeChainId for the related-questions panel.
  - Deferred to a follow-up commit (intentional, scoped via Progress Log):
    explore-page "Primary only / All" filter; daily/mock routing.

Tests (2.4):
  - test7_tier_aware_chain_routing in chain-and-vault-smoke.mjs:
    secondary reachable via ?chain=, alt-path badge visible on
    secondary, primary regression, alt-path badge ABSENT on primary.
  - Full smoke suite: 17/17 pass (was 13/13).

Validation:
  - vault check --strict: 10,701 loaded, 0 failures
  - vault build --legacy-json: 9438 published, chainCount=879
  - pytest interviews/vault-cli/tests: 74/74
  - npx tsc --noEmit: 0 errors
  - playwright chain-and-vault-smoke: 17/17

Phase 2 complete. Next: Phase 3 (gap-driven authoring; 407-gap backlog).
2026-04-30 20:22:54 -04:00
Vijay Janapa Reddi
d272d374aa feat(chains): --mode lenient + tier field for second-pass coverage
Phase 1.2 + 1.3 of CHAIN_ROADMAP.md. The two land together because the
prompt template, validator Δ-rule, and tier-tagging must stay in lockstep
or chains.proposed.lenient.json would mis-validate.

build_chains_with_gemini.py:
  - new LENIENT_PROMPT_TEMPLATE alongside renamed STRICT_PROMPT_TEMPLATE;
    lenient template tells Gemini to accept Δ ∈ {0,1,2,3}, with Δ=0 only
    for shared-scenario same-level pairs and Δ=3 last-resort
  - MODE_CONFIG single-source-of-truth maps mode → (template, allowed Δ set)
  - validate_chain now takes mode= and gates on the per-mode Δ set
  - process_batch tags lenient-mode chains with tier="secondary" and
    a chain_id suffix (-secondary) so primary/secondary IDs never collide
  - new --mode {strict,lenient} flag (default strict — primary chains
    keep producing under the same rules as before)
  - new --buckets-from <chain-coverage.json> flag that restricts the run
    to the uncovered_buckets list from diagnose_chain_coverage.py
    (the Phase 1.4 second-pass entry point)

apply_proposed_chains.py:
  - docstring note: tier field is intentionally not validated here
    (it's a UI hint, not a structural invariant)
  - already accepts Δ=0 chains via its non-strict monotonicity check, so
    no logic change needed

tests/test_chain_validation.py:
  - 19 cases covering both modes: strict accepts +1/+2 and rejects Δ=0,
    Δ≥3, and backward; lenient accepts Δ=0/Δ=3 but still rejects Δ≥4 and
    backward; both modes reject size-out-of-range, multi-topic, and
    unknown qids. Loads the script via importlib (it's not part of the
    importable vault_cli package).

Smoke check (--dry-run --buckets-from chain-coverage.json --mode lenient):
17 calls planned for the 211 uncovered buckets, well under the 200 cap.
2026-04-30 19:29:12 -04:00
Vijay Janapa Reddi
1d3c91d8e8 fix(vault-cli): ruff auto-fixes — datetime.UTC alias + drop unused import
ruff catches:
- UP017: datetime.timezone.utc -> datetime.UTC alias (Python 3.11+).
- I001 / F401: drop unused Details import in test_models.py.

Both auto-applied via `ruff check --fix`. Lint now passes; unblocks the
publish-live green gate.
2026-04-26 10:11:16 -04:00
Vijay Janapa Reddi
eb71638630 feat(vault): release-grade Phase G — full audit + cleanup + 0.1.3 release
Final brute-force release-readiness pass: every gate green, 0.1.3
released and verified, every observable failure mode closed at source.

═══ AUDITS (G.A–G.D) ═══

G.A — gemini-3.1-pro-preview default everywhere. Active CLI scripts
    already used it; bulk-patched 6 legacy scripts (`generate_batch.py`,
    `validate_questions.py`, `generate_gaps.py`, `run_reviews.sh`,
    `generate.py`, `review_math.sh`) + WORKFLOW.md off `gemini-2.5-flash`
    or `gemini-2.5-pro` to `gemini-3.1-pro-preview`. Only `archive/`
    references remain (intentionally legacy).

G.B — Cloudflare workflow audit. `vault verify 0.1.1` correctly
    failed (YAMLs evolved since 0.1.1 cut). Confirmed `vault publish`,
    `vault deploy`, `vault ship`, `vault rollback`, `vault verify`,
    `vault snapshot`, `vault tag` all wired. Released 0.1.2 then 0.1.3
    to lock final state.

G.C — Visual asset integrity audit. 236/236 YAML visual references
    resolve, 0 orphan SVGs, 0 missing files, 0 unrendered sources.
    Clean.

G.D — Unit tests for new validators added at `tests/test_models.py`:
    15 tests covering Visual.kind enum, Visual.path regex, Visual.alt
    + caption min lengths + required, Question._zone_bloom_compatible
    (recall+remember accepted, recall+evaluate rejected, mastery+
    remember rejected, evaluation+evaluate accepted, design+create
    accepted), Question._visual_path_resolves. **15/15 pass.**

═══ CONTENT CLEANUP (G.E–G.L) ═══

G.E — Sample re-judge of 100 random cloud parallelism items via
    Gemini 3.1 Pro Preview (4 API calls): 53% PASS / 23% NEEDS_FIX /
    24% DROP. Surfaced legacy quality drift — items generated under
    pre-Phase-D laxer prompts were not meeting the new strict bar
    (math errors with bidirectional vs unidirectional NVLink,
    "Based on the diagram..." references with no diagram, deprecated
    practices like SSP for modern LLM training, wrong-track scenarios
    like Cortex-M4 in cloud track).

G.H — General-purpose cleanup agent on 47 flagged items:
    **31 rewritten** with PARALLELISM_RULES bar applied (concrete
    unidirectional NVLink 450 GB/s, IB NDR 25 GB/s, RoCE v2 22 GB/s,
    PCIe Gen3 12 GB/s; multi-step ring AllReduce arguments with the
    2(N-1)/N factor; non-obvious failure modes); **16 archived** with
    documented `deletion_reason` (mathematically broken premises,
    physics errors, topic-irreconcilable, direct duplicates).

G.L — Re-judge of 31 G.H rewrites: **23 PASS / 3 NEEDS_FIX / 5 DROP =
    74.2% pass rate**. The 8 still-failing items were archived (the
    cleanup pass still couldn't satisfy the strict bar). Contract:
    items get THREE chances (original generation, fix-agent, retry-fix)
    and if they still fail, they are archived, not promoted. Honest.
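
For readers checking the rewritten items, the 2(N-1)/N ring AllReduce
factor cited in G.H works out as below. This is a bandwidth-only sketch
that ignores per-step latency; the function name and the example
message size are illustrative, and 450 GB/s is the unidirectional
NVLink figure quoted above.

```python
def ring_allreduce_seconds(message_bytes, num_gpus, link_bytes_per_s):
    """Bandwidth-only estimate: each GPU sends 2(N-1)/N of the message."""
    if num_gpus < 2:
        return 0.0  # nothing to reduce across
    factor = 2 * (num_gpus - 1) / num_gpus
    return factor * message_bytes / link_bytes_per_s


# Example: 1 GB of gradients across 8 GPUs at 450 GB/s per direction.
# The factor is 2 * 7 / 8 = 1.75, so the estimate is 1.75 / 450 seconds.
estimate = ring_allreduce_seconds(1e9, 8, 450e9)
```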

═══ STUBBORN-FAIL ARCHIVES (Phase F residuals) ═══

After three independent fix-agent passes (Phase C, F.2, F.4), 4 items
remained NEEDS_FIX or DROP: edge-2390, edge-2401, mobile-1948,
tinyml-1681. Archived with `deletion_reason` documenting the 3-attempt
failure history. The cell may be structurally awkward; preserving
items for audit but removing from the bundle.

═══ ORPHAN CHAIN FIX ═══

After archives, `cloud-chain-359` had only 1 published member
(`cloud-1840`); its sibling `cloud-1845` got archived. Dropped the
chain ref from cloud-1840 + ran `repair_chains.py` to clean residual
references in archived YAMLs. `vault check --strict` now passes 0
chain warnings.

═══ E.2 / E.3 SHIPPED EARLIER IN PRIOR COMMIT ═══

(Documented in commit `20ea20005` for completeness):
- `vault build --legacy-json` auto-emits `vault-manifest.json`.
- `analyze_coverage_gaps.py --include-areas <areas>` flag.

═══ 0.1.3 FINAL RELEASE ═══

`vault publish 0.1.3` snapshot at `releases/0.1.3/`. Migrations:
+0 ~27 -28 (zero net new questions, 27 modified during cleanup, 28
archived/promoted). `vault verify 0.1.3` ✓ — release_hash
`793c06f414f2bf8391a8a5c56ec0ff8d76bfce4ab7c64ad12ecb83f6d932280e`
reconstructs from YAML. Latest symlink → 0.1.3.

═══ FINAL ALL-9-GATES SWEEP — ALL GREEN ═══

[1] vault check --strict          ✓ 10,701 / 0 errors / 0 invariants
[2] vault lint                    ✓ 0 errors / 0 warnings / 9,757 info
[3] vault doctor                  ✓ 0 fails (registry-history info OK)
[4] vault codegen --check         ✓ artifacts in sync
[5] vault verify 0.1.3            ✓ hash reconstructs from YAML
[6] staffml validate-vault        ✓ 0 errors / 0 warnings, deployment-ready
[7] render_visuals                ✓ 236 visuals, 0 errors
[8] tsc                           ✓ TypeScript clean
[9] Playwright                    ✓ 9/9 pass

═══ FINAL CORPUS STATE ═══

Bundle: 9,757 published (was 9,224 at branch cut, **+533 net** across
the full multi-session push, after all archives).

Total commits on branch since cut: 10.
Release tag latest: 0.1.3 (verified-clean).
Status: StaffML-day-ready. Ship it.
2026-04-25 19:45:32 -04:00
Vijay Janapa Reddi
37414fed9e chore(vault): regen staffml corpus + wire drift checks into CI
- Regenerate interviews/staffml/src/data/corpus.json from the v1.0
  YAMLs. 9,199 published questions (up from 9,113 — recovered 86).
  Every record carries validated + math_verified booleans; human_reviewed
  surfaces when populated. Dead 'scope' field dropped.
- Regenerate interviews/staffml/src/data/vault-manifest.json to match:
  questionCount 9199, chainCount 970, levelDistribution now shows L6+
  as 946 (up from 0) and L1 corrected to 462 (down from 1,387 inflated).
- Wire check_schema_sync.py into the pre-commit config under the
  StaffML section.
- Wire check_schema_sync.py into the 'Vault + Corpus Smoke Tests' CI
  job so PR builds fail on enums.py ↔ LinkML drift.
- Update test_legacy_export.py for v1.0: plural chains, classification
  on Question body, competency_area passed through instead of resolved.
2026-04-21 18:27:42 -04:00
Vijay Janapa Reddi
c3b8411230 fix(vault): resolve competency_area from topics.json instead of aliasing topic
The legacy exporter was setting competency_area = topic, collapsing
the 13 canonical areas into 87 single-topic areas. The Vault UI
showed "1 topics" per area instead of meaningful sub-groupings.

Now resolves each topic's area via topics.json (87 topics -> 13 areas)
with a graceful fallback to the topic slug if topics.json is absent.
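
In outline, the lookup-with-fallback could look like this. The
topics.json layout assumed here ({"topics": [{"id", "area"}]}) and both
function names are hypothetical; only the behaviour (resolve via
topics.json, fall back to the topic slug) is from this commit.

```python
import json
from pathlib import Path


def load_topic_areas(topics_path):
    """Map topic slug -> area; empty if topics.json is absent."""
    path = Path(topics_path)
    if not path.exists():
        return {}
    data = json.loads(path.read_text(encoding="utf-8"))
    # Assumed shape: {"topics": [{"id": "<slug>", "area": "<area>"}, ...]}
    return {t["id"]: t["area"] for t in data.get("topics", [])}


def resolve_competency_area(topic, topic_areas):
    """Return the topic's area, or the topic slug itself as a fallback."""
    return topic_areas.get(topic, topic)
```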
2026-04-17 18:47:02 -04:00
Vijay Janapa Reddi
cbdb566381 feat(vault): Phase-1 migration contract fully closed in-repo
v2.3 → v2.4. ARCHITECTURE.md header + Appendix reflect the completed
migration.

WHAT CLOSED (§11.1 contract):
  1. `vault build --legacy-json` regenerates the site's
     interviews/staffml/src/data/corpus.json from YAML. 9,199 published
     questions, site-compatible shape (chain_positions back to 0-indexed
     dict form, bloom_level derived from zone, competency_area aliased
     from topic, scope aliased from track). Deterministic via sort_keys +
     id-sort.
  2. Pre-commit hook INSTALLED via worktree-aware Makefile target
     (`make -C interviews/vault-cli hooks`). Symlink points at
     pre_commit_corpus_guard.py. Tested end-to-end: direct edit to
     vault/corpus.json triggers exit-1 with §11.1 reference.
  3. CI equivalence check added to .github/workflows/vault-ci.yml:
     regenerates corpus.json from YAML, diffs against committed. Fails
     PR on drift with actionable error message.
  4. Legacy generators demoted with DEPRECATED headers:
     - interviews/paper/scripts/analyze_corpus.py → vault export-paper
     - interviews/staffml/scripts/sync-vault.py → vault build --legacy-json
     - interviews/staffml/scripts/generate-manifest.py → vault publish
     - interviews/vault/scripts/export_to_staffml.py → vault build --legacy-json
  5. New DEPRECATED.md files at interviews/vault/scripts/ and
     interviews/staffml/scripts/ map every legacy script to its
     replacement. Both directories keep the old scripts for git-history
     legibility and archaeology; new contributors see the vault CLI first.
  6. ARCHITECTURE.md §Appendix rewritten as current-state table instead
     of aspirational "gone. replaced by..." entries.

NEW TESTS (interviews/vault-cli/tests/test_legacy_export.py, +4):
  - test_legacy_shape_matches_site_interface: every field corpus.ts
    declares is present in regenerated JSON.
  - test_chain_positions_legacy_shape: 1-indexed new schema →
    0-indexed legacy dict form.
  - test_emitter_deterministic: byte-stable across reversed input order
    (required for CI diff-check).
  - test_competency_area_aliases_topic: legacy alias fields populated
    correctly.
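
The determinism property that test_emitter_deterministic checks can be
illustrated with a small emitter. The sort_keys + id-sort approach is
the one named in item 1; the function name, indent setting, and record
shape are assumptions for the sketch.

```python
import json


def emit_corpus(records):
    """Serialize records so that input order never changes the bytes."""
    ordered = sorted(records, key=lambda r: r["id"])  # id-sort
    text = json.dumps(ordered, sort_keys=True, indent=2, ensure_ascii=False)
    return text.encode("utf-8")
```

Because both the record order and the key order are normalized, a CI
job can regenerate the file and diff it byte-for-byte against the
committed copy.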

FULL MATRIX GREEN:
  pytest:  38/38 passed in 0.19s (34 + 4 legacy-export)
  ruff:    All checks passed
  hook:    exit 0 on clean diff / exit 1 on corpus.json direct edit
  e2e:     vault build --legacy-json regenerates a bit-identical corpus.json
           vs the committed one; CI check wired to catch drift

WHAT'S LEFT (deploy-gated, §20.5 #1, #5, #6 partial, #8, #9):
  - Production serves from D1: requires Phase-3 wrangler d1 create + deploy
  - Manual QA per CUTOVER_QA.md: requires live staging
  - Zero data loss D1-side verification: requires live D1
  - 48h monitoring: requires production traffic

These steps inherently require user action; the YAML-side migration is done.
2026-04-16 14:57:24 -04:00
Vijay Janapa Reddi
4aae33c036 test+ci: green test matrix + lint-clean + real vitest + committed lockfile
LOCAL TEST RESULTS (all green):
  pytest:  34 passed in 0.19s (28 existing + 6 new command tests)
  ruff:    All checks passed  (0 errors)
  vitest:  7 passed in 127ms (worker contract tests)
  CLI e2e: vault --version / build / verify / stats / doctor / diff /
           export-paper / ship --dry-run / publish + verify rc1 / api shim
           via curl against 9199-question corpus — all green

Python-side fixes:
- interviews/vault-cli/pyproject.toml: ruff config now has principled
  per-file-ignores for B008 (Typer pattern), N806 (DAG cycle colors),
  E402 (scripts), SIM118 (sqlite3.Row iterator). Keeps signal tight.
- 13 real ruff violations fixed across authoring.py (contextlib.suppress),
  diff_cmd.py + serve_api.py (dict(sqlite3.Row) instead of broken
  .keys() iteration), policy.py (direct return), release.py (zip
  strict=True, update_latest_symlink now validates target exists;
  previous 'target' variable was unused), commands/release.py
  (import order reshuffled, ambiguous 'l' renamed).
- commands/release.py ship_cmd leg-skip uses 'leg' not 'l'.

New pytest file: interviews/vault-cli/tests/test_commands.py (+6 tests)
  - stats: JSON shape + Prometheus format.
  - diff: add/remove/modify detection + classification.
  - doctor: graceful skip on missing vault; unknown --check returns
    USAGE_ERROR.
  - codegen: --check passes against baseline.

Worker-side fixes:
- src/index.ts cachedOrCompute graceful-degrades when caches global
  isn't available (Node test env, future-proofing against runtime
  regressions).
- src/index.ts handleSearch: 'query: q' → 'query: qRaw' (q was
  renamed earlier).
- src/rate_limit.ts: removed unused WINDOW_MS const.
- tests/worker.test.ts: vi.resetModules() between tests so
  module-level schemaOk/lastSeenRelease state doesn't leak
  across test cases (fingerprint memoization was sticky).
- package.json: added test:watch + lint aliases.
- .gitignore: node_modules, .wrangler, dist, .dev.vars.
- package-lock.json committed (npm — pnpm not on the machine; CI
  updated to use npm ci).

CI (.github/workflows/vault-ci.yml):
- Split into python + worker jobs.
- Python job: ruff + mypy (non-blocking) + pytest + vault check
  --strict + vault build release_hash regression + vault codegen
  --check + registry append-only + exemplar audit staleness.
- Worker job: node 20 + npm ci + tsc typecheck + vitest run.
- Triggers now include staffml-vault-types path (keeps CI honest
  when shared-types drift).

What runs vs what's gated on user:
  RAN LOCALLY: pytest, ruff, vitest, tsc, CLI end-to-end smoke
              (build→verify→export→stats→doctor→diff→publish
              rc→api-shim→ship --dry-run), full corpus invariants.
  GATED ON USER (requires Cloudflare credentials):
    - wrangler login + wrangler d1 create
    - wrangler d1 execute (schema + seed)
    - pnpm/npm deploy:staging
    - FTS5 production load-test
    - vault ship --env production (live D1 + Next.js + tag push)

Everything that CAN be verified without credentials HAS been.
2026-04-16 14:30:20 -04:00
Vijay Janapa Reddi
42f4d1ca8b fix(vault): Round-3 correctness + vault ship + authoring contract
Round-3 review (4 reviewers on v2.1) surfaced two code-correctness
Criticals that this commit fixes, plus the contracted-but-missing
`vault ship` coordinator and David's authoring-UX gaps.

Critical fixes (real bugs in landed code):

worker/src/index.ts
- SCHEMA_FINGERPRINT placeholder fail-closed (Chip R3-C1 / Dean R3-NH-3).
  Was: placeholder auto-passed and silently disabled the fingerprint
  check. Now: placeholder forces degraded mode until operator sets
  real fingerprint.
- DDL hash now includes triggers (FTS5-aware).
- release_id change invalidates schema-fingerprint memoization
  (Dean R3-NH-4).
- wrangler.toml now pins the real fingerprint.

staffml/public/sw.js
- /manifest polling TTL-throttled to 5min (Chip R3-C2). Was:
  per-request fetch nullified the §10.4 cost model.
- API origin persisted to IndexedDB; rehydrated on activate so cold
  offline wake-ups serve cached content (Chip R3-H3).

vault-cli/src/vault_cli/release.py
- emit_migrations diffs all 4 tables via PRAGMA-driven column
  introspection (Dean R3-NC-1 + R3-NH-2). Was: only questions table,
  silently missing chains/chain_questions/tags. Rollback-symmetry
  test extended to populate + verify all tables.

vault-cli/src/vault_cli/commands/release.py
- vault verify --git-ref reconstructs release from 'git archive <ref>'
  into a tempdir (Dean R3-NC-2). Was: always rebuilt from HEAD, so
  verifying a historical release always failed post-authoring.
  Academic-citability contract (C-3) now actually holds.

vault-cli/src/vault_cli/ship.py (NEW)
- vault ship composed verb with journaling (Dean R3-NH-1):
  * Legs run D1 → Next.js → paper-tag-last (§6.1.1 ordering).
  * Journal at releases/<v>/.ship-journal.json records per-leg state;
    --resume continues interrupted ships idempotently.
  * Pre-paper failure auto-rolls back in reverse order.
  * Paper-leg failure pages operator; does NOT auto-rollback earlier
    legs (git tag is remote-durable per §6.1.1).
- 4 unit tests cover happy path, pre-paper failure auto-rollback,
  paper-leg needs-manual, --resume across interruptions.
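
The journaling and rollback rules above can be sketched as a small
loop. The leg ordering, --resume semantics, and the paper-leg exception
are from this commit; the leg names as strings, the journal's JSON
layout, the status labels, and the run_leg/rollback_leg callbacks are
hypothetical and not the interface of ship.py.

```python
import json
from pathlib import Path

LEGS = ("d1", "nextjs", "paper")  # ordering from the commit message


def ship(journal_path, run_leg, rollback_leg):
    """Run legs in order, journaling per-leg state so a rerun resumes."""
    path = Path(journal_path)
    journal = json.loads(path.read_text()) if path.exists() else {}
    for index, leg in enumerate(LEGS):
        if journal.get(leg) == "done":
            continue  # resume: skip legs that already completed
        try:
            run_leg(leg)
        except Exception:
            if leg == "paper":
                # Earlier legs stay in place; an operator has to step in.
                journal[leg] = "needs-manual"
            else:
                journal[leg] = "failed"
                for done in reversed(LEGS[:index]):
                    rollback_leg(done)  # reverse order
                    journal[done] = "rolled-back"
            path.write_text(json.dumps(journal, indent=2))
            return journal
        journal[leg] = "done"
        path.write_text(json.dumps(journal, indent=2))
    return journal
```

Writing the journal after every leg is what makes a second invocation
idempotent: completed legs are skipped rather than re-run.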

vault-cli/src/vault_cli/commands/authoring.py
- vault new appends to id-registry.yaml (David R3-H3 + C-5
  enforcement); `git pull --rebase` before allocation.
- authors: auto-populated from git config user.email (David R3-H4 /
  M-15). Was: field never set.
- vault edit injects validation-error comment block at top of YAML
  and re-opens up to --retries=3 times (David R3-H1). Was: terminal
  traceback mid-authoring session.
- vault move refuses dirty tree, chained question, excluded-cell
  per applicability matrix (David R3-H2). Was: unchecked git mv.
- vault renumber command (NEW): post-rebase seq-collision recovery.
  Bumps seq, renames file, updates id field, appends registry
  (David R3-N-2, was spec-only).
- vault mark-exemplar command (NEW): promotes to vault/exemplars/
  with provenance + human_reviewed_at gate (David R3-N-9).

vault-cli/src/vault_cli/compiler.py
- FTS5 virtual table + sync triggers added to DDL (B.5). Triggers
  keep questions_fts in sync via AFTER INSERT/UPDATE/DELETE.
  schema_fingerprint accounts for triggers now.

tests/test_hashing.py
- Nested-dict hash-stability fixture (Soumith R3-F-4). Was: test
  only reordered top-level keys + collapsed details to one key.

All 28 tests pass (22 → 28: +4 ship journaling, +1 multi-table
migration symmetry, +1 nested-dict hash stability). release_hash
unchanged at 1b304282... — FTS5 addition doesn't affect content
Merkle per §3.5 input-only design.
2026-04-16 13:10:16 -04:00
Vijay Janapa Reddi
8205d8a5f9 feat(vault): Phase 2 release pipeline — snapshot, migrations, export-paper, publish, verify
Primitives (§4.2):
  vault snapshot <v>          — stage to releases/.pending-<v>/.
  vault migrations-emit A B   — forward + inverse SQL; rollback embeds full
                                prior-row bodies for UPDATE/DELETE so rollback
                                works without mechanical inversion (C-1).
  vault export-paper <v>      — emit macros.tex + corpus_stats.json via SQL
                                over releases/<v>/vault.db. Replaces
                                paper/scripts/generate_macros.py; paper + site
                                agree by construction (H-21 runtime closure).
  vault tag <v>               — git-commit + git-tag v<version>.
  vault verify <v>            — reconstruct release_hash from YAML source and
                                assert equality with release.json (C-3
                                academic-citability property).

Composed product (§4.3):
  vault publish <v>           — check --strict + build + snapshot +
                                migrations-emit. Stages to .pending-<v>/ and
                                swaps to final via POSIX rename(2) as the
                                last step (C-7 non-atomic fix). --resume
                                detects orphaned pending dirs.
                                latest symlink swap via rename-over-tmp.
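
The stage-then-rename step can be sketched with the standard library.
This assumes a POSIX filesystem with the pending and final directories
on the same mount; finalize_release and the .latest.tmp name are
illustrative, not the names used in release.py.

```python
import os
from pathlib import Path


def finalize_release(releases_dir, version):
    """Swap .pending-<v>/ to <v>/ and repoint the latest symlink."""
    root = Path(releases_dir)
    pending = root / f".pending-{version}"
    final = root / version
    if final.exists():
        raise FileExistsError(f"release {version} already exists")
    os.rename(pending, final)  # rename(2): atomic within one filesystem

    tmp_link = root / ".latest.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()  # clear debris from an interrupted run
    os.symlink(version, tmp_link)  # relative target inside releases/
    os.replace(tmp_link, root / "latest")  # rename over the old symlink
    return final
```

Readers of releases/ then see either the old state or the new one,
never a half-written release directory.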

First citable release artifact committed: releases/0.9.0/ with
vault.db (25 MB), release.json (release_hash, policy_version,
git_sha, timestamp), d1-migration.sql + d1-rollback.sql stubs.

Gitignore updated: /vault.db at vault/ root excludes the transient
build artifact, but vault.db inside releases/<v>/ IS committed per
§13 (academic integrity — permanently citable).

Tests (22 pass, +3):
- test_snapshot_copies_db_and_writes_release_json
- test_migrations_emit_added_modified_removed: verifies rollback embeds
  prior-row body (C-1 correctness)
- test_rollback_symmetry_property: apply-forward-then-rollback returns
  dump identical to pre-migration state (closes M-1 property test).

Phase 2 milestone: PASSED. End-to-end:
  vault publish 0.9.0      # composed product
  vault verify 0.9.0       # from-source round-trip
  vault export-paper 0.9.0 # emits macros.tex + stats
  → release_hash 1b304282... reproducible from YAML.
2026-04-16 12:40:04 -04:00
Vijay Janapa Reddi
812ba408d0 feat(vault): Phase 1 core — schema, hashing, policy, loader, validator
LinkML schema at vault/schema/question_schema.yaml is the sole schema
source of truth. Pydantic models in vault_cli.models are currently
hand-authored to match; full LinkML codegen wires in Phase 2 with the
drift-check in CI.

Core modules:
  vault_cli/models.py     — Pydantic question model (closed enums, content-
                            format per field, schema_version=1 gate).
  vault_cli/hashing.py    — canonical content_hash over whitelisted fields;
                            release_hash Merkle with __policy__ and
                            __canon_version__ leaves (Chip N-H5).
  vault_cli/yaml_io.py    — hardened SafeLoader: 256KB cap, depth 10 cap,
                            aliases rejected, timeout (H-7).
  vault_cli/paths.py      — path-as-classification parser with lowercase +
                            enum enforcement (H-9).
  vault_cli/loader.py     — walks vault/questions/, returns loaded + errors
                            (never raises — aggregate reporting).
  vault_cli/validator.py  — tiered invariant engine; fast + structural tiers
                            implemented per ARCHITECTURE.md §5.
  vault_cli/compiler.py   — YAML → SQLite with release_metadata rows
                            (release_id, release_hash, policy_version,
                            schema_version, published_count).
  vault_cli/policy.py     — single filter predicate. No consumer
                            re-implements (H-21).
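
To make the hashing.py description concrete, here is a simplified
sketch. It assumes SHA-256, a made-up field whitelist, and a flat hash
over sorted leaf hashes rather than a true Merkle tree; only the ideas
of whitelisted fields, key-order invariance, and the __policy__ and
__canon_version__ leaves are taken from this commit.

```python
import hashlib
import json

HASHED_FIELDS = ("title", "scenario", "details")  # hypothetical whitelist


def _sha256(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def content_hash(question):
    """Hash only whitelisted fields, in a key-order-independent form."""
    subset = {k: question[k] for k in HASHED_FIELDS if k in question}
    canonical = json.dumps(
        subset, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return _sha256(canonical)


def release_hash(questions, policy_text, canon_version):
    """Combine per-question hashes with the two extra leaves."""
    leaves = {q["id"]: content_hash(q) for q in questions}
    leaves["__policy__"] = _sha256(policy_text)
    leaves["__canon_version__"] = _sha256(str(canon_version))
    digest = hashlib.sha256()
    for key in sorted(leaves):  # input order must not matter
        digest.update(f"{key}\0{leaves[key]}\n".encode("utf-8"))
    return digest.hexdigest()
```

Including the policy and canonicalization version as leaves means a
change to either one rolls the release hash even when no question
content changed.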

release-policy.yaml v1: status=published. Dropped require_validated in
the wake of 9199/8053 resolution — validation is implicit in the
maintainer-approval → status=published transition, not a separate flag.

Tests (19 pass): key-order hash invariance (Soumith M-NEW-4), policy
filter correctness (H-21 runtime check), YAML hardening (H-7).
2026-04-16 12:37:06 -04:00
Vijay Janapa Reddi
5ee46fc2a5 feat(vault-cli): Phase 0 package scaffold
pyproject.toml with Typer+Rich+Pydantic+PyYAML deps. Console entry
point 'vault' → vault_cli.main:app. Smoke tests cover --version,
--help, and the exit-code taxonomy regression guard.

Module layout:
  src/vault_cli/__init__.py
  src/vault_cli/_version.py   — single source for __version__
  src/vault_cli/exit_codes.py — stable IntEnum taxonomy (§4.6)
  src/vault_cli/main.py       — Typer app, --version flag
  tests/test_smoke.py         — 4 tests, all green in 0.04s

Subcommands land incrementally from Phase 1 per ARCHITECTURE.md §14.
Python ≥3.12 required; CI pins 3.12 for hash stability.

Milestone gate: pip install -e interviews/vault-cli/[dev] &&
vault --version passes. Tests green. Ready for Phase 1.
2026-04-15 21:25:52 -04:00