Commit Graph

34 Commits

Author SHA1 Message Date
Vijay Janapa Reddi
c92effc269 feat(vault-cli): Phase 4.7 — chain decay detection (advisory)
Detects chain members that have drifted semantically away from their
chain mates after an edit. Re-embeds changed YAMLs with the same model
the corpus uses (BAAI/bge-small-en-v1.5) and reports the min cosine to
each chain mate.

Default invocation (advisory):

    python3 scripts/check_chain_decay.py
    # diffs against origin/dev, flags chains with min mate-cosine < 0.40

Other modes:

    --files <a.yaml> <b.yaml>     explicit files instead of git diff
    --base HEAD~5                 different base ref
    --threshold 0.50              tighter cutoff (slow drift detection)
    --strict                      exit non-zero on flag (use as CI gate)

Default is advisory not blocking — first ship intentionally doesn't
fail commits or CI. The threshold 0.40 is calibrated against the
post-Phase-1 corpus; tune as needed once you've seen what real-edit
deltas look like in practice.

Implementation notes:
  - Reuses embeddings.npz for chain-mate vectors (no re-embedding the
    whole corpus per run).
  - Only the changed question gets re-embedded — fast for typical
    PR-sized changes.
  - Skips changed questions that aren't in chains; skips chain
    memberships where the mate isn't in embeddings.npz (e.g., the
    Phase 3 promoted drafts before they hit the next embedding rebuild).
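
Illustrative sketch of the per-question check described above, assuming vectors are plain Python lists. The names cosine and min_mate_cosine are hypothetical and are not taken from check_chain_decay.py, which reads cached mate vectors from embeddings.npz.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def min_mate_cosine(changed_vec, mate_vecs, threshold=0.40):
    """Return (weakest_mate_id, min_cosine, flagged) for one re-embedded question.

    mate_vecs maps mate id -> cached vector. Mates without a cached vector
    are assumed to have been dropped by the caller already.
    """
    if not mate_vecs:
        return None, None, False
    sims = {mate_id: cosine(changed_vec, vec) for mate_id, vec in mate_vecs.items()}
    weakest = min(sims, key=sims.get)
    return weakest, sims[weakest], sims[weakest] < threshold
```

A chain would be flagged when the third element is True for any changed member.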

Smoke checks:
  - --base origin/dev finds 4 changed YAMLs (the Phase 3 promoted
    drafts), correctly reports no chain memberships (those questions
    aren't in chains.json yet — by design, gated on human review).
  - --files <cloud-2520.yaml> on a real chain member: cos=0.79 vs
    its L5 mate cloud-2521 (well above 0.40 threshold ✓).
2026-05-01 17:31:30 -04:00
Vijay Janapa Reddi
12b35a0929 feat(vault-cli): promote_drafts.py — one-command Phase 3.d helper
Closes the loop on the pilot pattern from a750ab7bc (manual promotion
inline script). Reads draft-validation-scorecard.json and either
promotes every passing draft (--all-passing) or an explicit list
(--qids edge-2536,edge-2537).

Per draft:
  - strips _authoring private metadata; replaces with proper schema
    fields (provenance, status, authors, human_reviewed, created_at)
  - adds gap-bridge:<lower>-<higher> tag for traceability
  - renames .yaml.draft → .yaml
  - appends id to id-registry.yaml (append-only — preserves the
    CI-enforced ledger contract)

Optional flags:
  --publish        flip status to published (default: keep as draft so
                   the human reviewer's workflow stays explicit)
  --reviewed-by X  set human_reviewed.status=verified, by=X, date=now
                   (implies the reviewer has actually read the drafts)
  --dry-run        preview without writing

Refuses to overwrite a <id>.yaml that already exists. Skips
already-promoted drafts (with a warning) when called with
--all-passing on a scorecard whose drafts have been promoted earlier.

Smoke checks:
  - --all-passing on the existing scorecard correctly identifies all 4
    pilot drafts as already-promoted (they shipped in a750ab7bc).
  - --qids edge-2535 --dry-run on the leftover failed-validation draft
    previews the promotion as expected.
2026-05-01 17:22:45 -04:00
Vijay Janapa Reddi
202397f594 Merge origin/dev into yaml-audit
Pull in the dev work that landed since yaml-audit was last synced:
  - --legacy-json renamed to --local-json (2b381bb949) — script/doc
    updates needed below in this branch
  - CI workflow refactor (validate-dev / validate-vault now reusable)
  - all-contributors automation, gitignore tightening, codespell list
  - PR #1622 navbar URL rewrite for dev preview
  - PR #1619 clone-size refactor, #1618 milestone3 xor fix, #1617
    perceptron seed, #1616 tito status M3
  - Chapter 9 PDF layout refinement
  - assorted staffml/practice fixes (pickRandom deps, GitHub star gate)

This merges the canonical dev state into yaml-audit so subsequent
work continues on top of the freshest base. Conflicts in
practice/page.tsx + corpus.ts + ARCHITECTURE.md resolved to keep both
sides' additive changes (Phase 2 tier work + dev's later refactors).
2026-05-01 17:11:31 -04:00
Vijay Janapa Reddi
604869b986 feat(vault-cli): Phase 3.a + 3.b — gap-driven authoring tooling
Two new scripts that together close the loop from a gap entry to a
reviewable candidate question with a multi-gate scorecard.

generate_question_for_gap.py (3.a):
  - Reads a gap entry, loads between-questions + same-bucket exemplars,
    prompts gemini-3.1-pro-preview, runs Pydantic Question validation,
    and writes <track>/<area>/<id>.yaml.draft. The .draft suffix keeps
    drafts out of vault check / vault build until promotion.
  - ID allocator scans corpus + existing drafts so a batch run gets
    distinct fresh IDs without touching id-registry.yaml.
  - Modes: --gap-index, --gaps-from + --limit, --dry-run.

validate_drafts.py (3.b):
  - Five gates per draft: schema (Pydantic), originality (cosine vs
    in-bucket neighbours via BAAI/bge-small-en-v1.5; matches the corpus
    embeddings.npz so values are comparable; cutoff 0.92), level_fit
    (Gemini-judge against same-level exemplars), coherence
    (Gemini-judge: scenario/question/solution consistency), and bridge
    (Gemini-judge: chain-fit between the gap's two anchors).
  - Final verdict pass iff every non-skipped gate passes.
  - Skips: --no-originality, --no-llm-judge.
  - Output: interviews/vault/draft-validation-scorecard.json.
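
Illustrative sketch of the verdict rule, not the code in validate_drafts.py. The function names and the 'pass' / 'fail' / 'skipped' strings are hypothetical, and the strict '<' comparison against the 0.92 cutoff is an assumption.

```python
def originality_gate(top_neighbour_cosine, cutoff=0.92):
    """Pass when the closest in-bucket neighbour stays below the cutoff.

    Strict '<' is an assumption; the commit only states the cutoff value.
    """
    return top_neighbour_cosine < cutoff

def final_verdict(gate_results):
    """gate_results maps gate name -> 'pass', 'fail', or 'skipped'.

    The verdict is a pass iff every non-skipped gate passed. Note that a
    run with every gate skipped passes vacuously under this rule.
    """
    considered = [r for r in gate_results.values() if r != "skipped"]
    return all(r == "pass" for r in considered)
```

With --no-llm-judge, the three Gemini-judge gates would be recorded as skipped and only schema and originality would decide the verdict.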

Smoke checks:
  - 3.a --dry-run --gap-index 0: resolves gap, builds prompt, allocates
    cloud-4579. Synthetic Gemini response Pydantic-validates clean.
  - 3.b on a synthetic /tmp draft: schema + originality pass (top
    neighbour cosine 0.73 vs 0.92 threshold).

Phase 3.c (pilot run on 30 gaps) deferred: it generates new YAML
question content that needs human review before promotion. The
tooling ships ready; running it is a user-supervised step.

CHAIN_ROADMAP.md Progress Log + Phase 3 status updated.
2026-05-01 11:31:06 -04:00
Vijay Janapa Reddi
83fe0f7193 feat(vault): Phase 1 — second-pass chain coverage build (373 → 879)
Diagnoses uncovered (track, topic) buckets and runs a relaxed Gemini
sweep targeting them. New chains tier="secondary"; pre-existing chains
backfilled tier="primary".

Tools (Phases 1.1, 1.2/1.3, 1.5):
  - diagnose_chain_coverage.py: surface buckets with no chains
    (committed earlier on yaml-audit)
  - build_chains_with_gemini.py: --mode lenient adds Δ ∈ {0,1,2,3}
    (committed earlier on yaml-audit)
  - merge_chain_passes.py: merges primary + secondary, enforces the
    multi-membership cap (max 2 chains/qid; non-L1/L2 capped at 1)

Sweep (Phase 1.4):
  - 17 Gemini-3.1-pro-preview calls, ~22 min wall time, 211 buckets
  - 506 chains accepted (above the 200-400 estimate), 269 new gaps
  - validator caught a few cross-bucket and Δ=4 hallucinations inline
  - Δ distribution: Δ=1 69.1%, Δ=2 21.1%, Δ=3 4.6%, Δ=0 5.2%
    (10.9% of chains contain at least one Δ=0 — within target band)
  - random spot-check of 5 Δ=0 chains: all share scenario threads
    (DMA, CMSIS-NN, on-device routing, PB-scale pipelines)

Coverage gains (chains/topic before → after):
  - cloud   2.95 → 4.37   (242 + 116 secondary)
  - edge    0.64 → 2.59   ( 49 + 148 secondary)
  - mobile  0.74 → 2.56   ( 46 + 113 secondary)
  - tinyml  0.80 → 2.64   ( 36 +  83 secondary)
  - global  0.00 → 0.96   (  0 +  46 secondary)
  Buckets with ≥1 chain: 102 / 313 (33%) → 285 / 313 (91%).
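
A quick arithmetic cross-check of the figures above, using only numbers copied from this message (the variable names are illustrative):

```python
# (primary, secondary) chain counts per track, copied from the coverage list.
chains_by_track = {
    "cloud": (242, 116),
    "edge": (49, 148),
    "mobile": (46, 113),
    "tinyml": (36, 83),
    "global": (0, 46),
}

primary = sum(p for p, _ in chains_by_track.values())
secondary = sum(s for _, s in chains_by_track.values())
total = primary + secondary

# Share of buckets with at least one chain, before and after the sweep.
covered_before = round(100 * 102 / 313)
covered_after = round(100 * 285 / 313)

print(primary, secondary, total, covered_before, covered_after)
# prints: 373 506 879 33 91
```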

Validation:
  - apply_proposed_chains.py --dry-run: validation clean (879 chains)
  - vault check --strict: 10,701 loaded, 0 invariant failures
  - vault build --legacy-json: chainCount 373 → 879, release_hash
    rolled to 04ee8a23…
  - playwright chain-and-vault-smoke.mjs: 13/13 pass

Phase 1 complete. Next: Phase 2 (tier surfacing in staffml UI).
2026-04-30 20:12:27 -04:00
Vijay Janapa Reddi
d272d374aa feat(chains): --mode lenient + tier field for second-pass coverage
Phase 1.2 + 1.3 of CHAIN_ROADMAP.md. The two land together because the
prompt template, validator Δ-rule, and tier-tagging must stay in lockstep
or chains.proposed.lenient.json would mis-validate.

build_chains_with_gemini.py:
  - new LENIENT_PROMPT_TEMPLATE alongside renamed STRICT_PROMPT_TEMPLATE;
    lenient template tells Gemini to accept Δ ∈ {0,1,2,3}, with Δ=0 only
    for shared-scenario same-level pairs and Δ=3 last-resort
  - MODE_CONFIG single-source-of-truth maps mode → (template, allowed Δ set)
  - validate_chain now takes mode= and gates on the per-mode Δ set
  - process_batch tags lenient-mode chains with tier="secondary" and
    a chain_id suffix (-secondary) so primary/secondary IDs never collide
  - new --mode {strict,lenient} flag (default strict — primary chains
    keep producing under the same rules as before)
  - new --buckets-from <chain-coverage.json> flag that restricts the run
    to the uncovered_buckets list from diagnose_chain_coverage.py
    (the Phase 1.4 second-pass entry point)

apply_proposed_chains.py:
  - docstring note: tier field is intentionally not validated here
    (it's a UI hint, not a structural invariant)
  - already accepts Δ=0 chains via its non-strict monotonicity check, so
    no logic change needed

tests/test_chain_validation.py:
  - 19 cases covering both modes: strict accepts +1/+2 and rejects Δ=0,
    Δ≥3, and backward; lenient accepts Δ=0/Δ=3 but still rejects Δ≥4 and
    backward; both modes reject size-out-of-range, multi-topic, and
    unknown qids. Loads the script via importlib (it's not part of the
    importable vault_cli package).
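
Illustrative sketch of the per-mode Δ gate exercised by these tests. MODE_ALLOWED_DELTAS and deltas_ok are hypothetical names; the real MODE_CONFIG also carries the prompt template, and levels are assumed here to be integers.

```python
# Allowed level deltas between consecutive chain members, per mode.
MODE_ALLOWED_DELTAS = {
    "strict": {1, 2},
    "lenient": {0, 1, 2, 3},
}

def deltas_ok(levels, mode="strict"):
    """True when every consecutive level step is in the mode's allowed set.

    Backward steps yield a negative delta, which is in neither set, so
    they are rejected in both modes.
    """
    allowed = MODE_ALLOWED_DELTAS[mode]
    return all((b - a) in allowed for a, b in zip(levels, levels[1:]))
```

For example, deltas_ok([1, 1, 2], mode="lenient") accepts the same-level pair that strict mode rejects.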

Smoke check (--dry-run --buckets-from chain-coverage.json --mode lenient):
17 calls planned for the 211 uncovered buckets, well under the 200 cap.
2026-04-30 19:29:12 -04:00
Vijay Janapa Reddi
b289a5eb75 Merge branch 'yaml-audit' into dev
Brings the vault chain rebuild + sidecar architecture work into dev:

  - Hierarchical question layout (interviews/vault/questions/<track>/<area>/<id>.yaml)
    completed in earlier dev merge; this branch adds the sidecar split
  - chains.json is now the authoritative chain registry; YAML chains: field
    stripped from all 10,701 question files
  - 373 chains rebuilt via Gemini 3.1 Pro Preview with strict progression
    rules (Δ ∈ {1,2}, single-track, single-topic, multi-membership cap=2)
  - 138 gaps surfaced into gaps.proposed.json for Phase 3 authoring
  - Tooling: build_chains_with_gemini.py, apply_proposed_chains.py,
    summarize_proposed_chains.py, diagnose_chain_coverage.py
  - CHAIN_ROADMAP.md captures the resumable Phase 1-4 plan

State at merge:
  - vault check --strict: 10,701 loaded, 0 invariant failures
  - vault build --legacy-json: clean, releaseId=dev, 9438 published, 373 chains
  - playwright UI suite (last run on yaml-audit): 13/13 pass

Phase 1.1 (diagnose_chain_coverage.py) shipped on yaml-audit; Phase
1.2-1.6 (lenient sweep, tier merge) still pending. See CHAIN_ROADMAP.md
Progress Log for the resumable cursor.
2026-04-30 18:39:05 -04:00
Vijay Janapa Reddi
af5f25f543 feat(vault-cli): diagnose_chain_coverage.py — surface buckets needing chains
Loads the published corpus (via vault_cli.policy — single source of truth)
and chains.json, buckets by (track, topic), and emits chain-coverage.json
with two cuts:
  - uncovered_buckets: ≥3 questions, 0 chains
  - under_covered_buckets: ≥6 questions, ≤1 chain
Plus per-track summary + top-10 uncovered for quick read.
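
Illustrative sketch of the two cuts, assuming questions and chains are dicts with 'track' and 'topic' keys (a hypothetical record shape, not the script's). The thresholds are applied literally, so a bucket with ≥6 questions and 0 chains lands in both lists; whether the real script de-duplicates them is not stated here.

```python
from collections import Counter

def diagnose(questions, chains):
    """Bucket by (track, topic) and return the two coverage cuts."""
    q_count = Counter((q["track"], q["topic"]) for q in questions)
    c_count = Counter((c["track"], c["topic"]) for c in chains)
    uncovered = sorted(
        b for b, n in q_count.items() if n >= 3 and c_count[b] == 0
    )
    under_covered = sorted(
        b for b, n in q_count.items() if n >= 6 and c_count[b] <= 1
    )
    return {"uncovered_buckets": uncovered, "under_covered_buckets": under_covered}
```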

Output is gitignored — regeneratable, fed to Phase 1.4's --buckets-from.

Phase 1.1 of CHAIN_ROADMAP.md. See progress log for the run results
(211 uncovered buckets, edge/mobile/tinyml chain density 0.6-0.8 vs
cloud's 2.95, biggest miss is cloud:roofline-analysis at 144q/0 chains).
2026-04-30 18:15:59 -04:00
Vijay Janapa Reddi
9fdbfb9a4c refactor(vault-cli): rename --legacy-json to --local-json
The flag is the StaffML frontend's local-dev fallback (read corpus.json
from disk via NEXT_PUBLIC_VAULT_FALLBACK=static), not a deprecated path.
"Legacy" implied "soon to be removed"; "local-json" describes its actual
role and reads correctly in scripts and docs.

- vault-cli: rename CLI flag, parameter, result key, and help text.
- CI workflows + pre-commit config: invoke the new flag name.
- All scripts that print the command (suggest_exemplars,
  pre_commit_corpus_guard, promote_validated, rename_legacy_ids,
  export_to_staffml, the paper analyze_corpus/generate_*) updated.
- Comments and docs (ARCHITECTURE, CHANGELOG, REVIEWS, TESTING,
  MASSIVE_BUILD_RUNBOOK, DEPRECATED, AUTHORING, plus frontend
  comments and .env.example / .gitignore) updated.

The "legacy_json" sentinel string in corpus_stats.json._meta.source
is intentionally NOT renamed — it is a stable artifact format read
by downstream paper-generation tooling.
2026-04-30 09:30:28 -04:00
Vijay Janapa Reddi
d82a4f00aa fix(chains): tolerate Gemini CLI exit-1 + add inter-call backoff
2026-04-30 09:22:57 -04:00
Vijay Janapa Reddi
681e404633 feat(chains): add gap detection + multi-chain UI helpers
build_chains_with_gemini.py: prompt now asks Gemini to also surface
missing-rung gaps — e.g., 'this bucket has L1 + L3 questions on the same
scenario thread but no L2 to bridge them.' Gaps are captured to
interviews/vault/gaps.proposed.json as a separate authoring backlog.
This is a free signal: it costs no extra calls, identifies pedagogical
holes the corpus doesn't yet fill, and feeds a future generation pass
(with independent validation before any new question is committed).

corpus.ts: getChainForQuestion now accepts an optional preferredChainId
so multi-chain questions can disambiguate via URL (?chain=...). Adds
getAllChainsForQuestion() returning every chain a qid belongs to.
Default behavior unchanged when only one chain exists.
2026-04-30 09:02:35 -04:00
Vijay Janapa Reddi
0b14e08b52 feat(vault-cli): summarize_proposed_chains.py — quick-read report on staging
After build_chains_with_gemini.py produces chains.proposed.json, this
script gives a one-shot summary: chain count, size distribution, level-Δ
histogram, multi-chain membership stats, sample chains for spot-check.
2026-04-30 09:00:04 -04:00
Vijay Janapa Reddi
d8a55f3334 feat(chains): tighten progression rules + allow up to 2-chain membership
Gemini prompt + structural validator now enforce:
  - Consecutive Bloom delta MUST be 1 or 2 (rejects Δ=0 same-level pairs
    and Δ≥3 huge jumps; backward steps already impossible)
  - Strict +1 preferred; +2 accepted only when no +1 candidate exists
  - A question can appear in up to 2 chains, but only if it's L1 or L2
    (foundational anchor pattern); 3+ chain memberships are rejected as
    over-stuffing
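
Illustrative sketch of the membership cap, assuming chains are lists of qids and levels are integers (1 for L1, 2 for L2, and so on). The function names are hypothetical.

```python
from collections import Counter

def membership_cap(level):
    """L1/L2 questions may anchor up to 2 chains; all others at most 1."""
    return 2 if level in (1, 2) else 1

def cap_violations(chains, levels):
    """Return qids that appear in more chains than their level allows."""
    counts = Counter(qid for chain in chains for qid in chain)
    return sorted(q for q, n in counts.items() if n > membership_cap(levels[q]))
```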

Empirical alignment: 70% of legacy chains were strict +1, 19% had +2
jumps, 8% had +3 jumps that we now reject as too-large pedagogical
moves. The new rules tighten quality while keeping the bulk of
defensible existing structure expressible.
2026-04-30 08:58:38 -04:00
Vijay Janapa Reddi
8423dcb08f feat(vault-cli): Gemini-powered chain builder + apply script
build_chains_with_gemini.py — adaptive batched chain proposal:
  - Buckets corpus by (track, topic), packs into ~80K-token batches
  - Calls gemini-3.1-pro-preview with structured-output prompt
  - Validates each proposed chain (size 2-6, monotonic, single-topic,
    members exist, no cross-chain duplicates)
  - Writes staging chains.proposed.json (never touches live registry)

Full-corpus plan: 313 buckets pack into 44 calls (well under 250/day Pro
cap, uses ~70K input tokens per call out of 1M context).
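
One way to pack buckets under a token budget is a greedy sequential fill, sketched below. This is an assumption for illustration: the commit does not say which packing strategy build_chains_with_gemini.py uses, and pack_batches is a hypothetical name.

```python
def pack_batches(bucket_tokens, budget=80_000):
    """Greedily group (bucket_name, token_count) pairs into batches.

    A new batch starts whenever adding the next bucket would exceed the
    budget. A single bucket larger than the budget gets its own batch.
    """
    batches, current, used = [], [], 0
    for name, tokens in bucket_tokens:
        if current and used + tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += tokens
    if current:
        batches.append(current)
    return batches
```

Each returned batch would correspond to one model call.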

Test on tinyml:network-bandwidth-bottlenecks (6 questions) -> 2 well-formed
chains, Bloom-monotonic with coherent rationale (Hailo-8 PCIe arc + BLE
network arc).

apply_proposed_chains.py — gated migration:
  - Re-validates staging file against live YAML corpus
  - Backs up chains.json -> chains.json.bak
  - Refuses to apply if any structural invariant fails
2026-04-30 08:53:06 -04:00
Vijay Janapa Reddi
e43ff34719 feat(vault-cli): chain audit + rescue suggestions with embedding similarity
Adds two subcommands and supporting modules:

  vault chains audit
    Reports chain health: orphans, position-drift (gaps from filtered
    members), stale-registry, intra-chain cosine distribution, weakest
    chains list. Embedding-aware via --no-embeddings escape hatch.

  vault chains suggest
    For each orphan singleton, ranks rescue candidates within the same
    (track, topic) bucket. Hybrid scoring:
      HARD filter: level_delta in {0, 1, 2} (matches 92% of observed
                   chain edges across the corpus)
      SOFT rank:   embedding cosine + delta=1 priority
      Bands:       strong-merge / review-merge / below-threshold
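
Illustrative sketch of the hybrid scoring, with several assumptions: level_delta is taken as the absolute level difference, the delta=1 priority is modelled as a small additive bonus (delta1_bonus, a hypothetical value), and band thresholds are left out because they are not given here.

```python
def rank_rescue_candidates(orphan_level, candidates, delta1_bonus=0.05):
    """candidates: iterable of (qid, level, cosine). Returns qids, best first."""
    scored = []
    for qid, level, cos in candidates:
        delta = abs(level - orphan_level)
        if delta not in (0, 1, 2):  # HARD filter
            continue
        # SOFT rank: cosine, nudged upward for adjacent-level candidates.
        score = cos + (delta1_bonus if delta == 1 else 0.0)
        scored.append((score, qid))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [qid for _, qid in scored]
```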

Embeddings: bge-small-en-v1.5 (BAAI). Calibrated via
scripts/calibrate_chain_embeddings.py against the 726 healthy chains.
Empirical findings (in script header docstring):
  - bge-small precision@1 = 0.283, recall@3 = 0.447
  - bge-large gains only +0.013 P@1 at 7x embedding time — not worth it
  - Same-bucket questions are inherently close (μ_pos=0.785, μ_neg=0.757);
    so this is suggestion-only, never auto-apply.

Cross-encoder rerank experiment script included for future research
(BAAI/bge-reranker-base) — current run OOM'd on 16GB; deferred.

Embedding cache (.npz) is gitignored — reproducible from source.
2026-04-29 19:00:09 -04:00
Vijay Janapa Reddi
43dedf9948 docs(vault): update architecture docs and audit scripts for 87-topic baseline
Update ARCHITECTURE.md to reflect 87 curated topics and 131 edges. Refactor exemplar_coverage_audit.py to use vault.db instead of retired corpus.json. Update exemplar-gaps.yaml inventory.
2026-04-26 16:47:56 -04:00
Vijay Janapa Reddi
d2a8e4d28b fix(vault-cli): spoof check picked wrong base ref, swept 132-commit diff
The reviewer-identity spoof check tried base refs in the order
(origin/main, origin/dev, HEAD~1) and returned the first that
resolved. On dev, where origin/main is 132 commits behind origin/dev,
this picked main and diffed every vault YAML changed since that point —
sweeping up 100+ files unrelated to the current push and reporting
each as a spoof-check failure.

Fix: respect GITHUB_BASE_REF when set (PR mode), otherwise diff
against HEAD~1 (push mode). This produces exactly the file set the
check is meant to validate — what this PR or push is proposing —
not the entire branch divergence from main.
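
Illustrative sketch of the base-ref selection, not the code in the spoof check itself. pick_base_ref is a hypothetical name, and prefixing the branch with origin/ is an assumption about how the ref is resolved.

```python
import os

def pick_base_ref(env=None):
    """Choose the diff base: the PR base branch if set, else the previous commit."""
    env = os.environ if env is None else env
    base = env.get("GITHUB_BASE_REF")
    # GITHUB_BASE_REF is only populated for pull_request-style events.
    return f"origin/{base}" if base else "HEAD~1"
```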

Verified locally on the codespell+codegen-hashes commit: now reports
"no vault/questions/ changes in this PR" instead of 100+ spurious
failures.
2026-04-25 14:06:18 -04:00
Vijay Janapa Reddi
3f9b044b31 chore(ci): rename vault-ci.yml → staffml-validate-vault.yml
Brings the last outlier workflow file into the repo-wide
<cluster>-<verb>-<scope>.yml naming convention. Every other cluster
(book, tinytorch, kits, labs, instructors, mlsysim, slides, site,
staffml) uses this pattern; vault-ci.yml was the only one that didn't.

  vault-ci.yml  →  staffml-validate-vault.yml
  name: '🎯 StaffML · 🔎 Vault CI'  →  '🎯 StaffML ·  Validate (Vault)'

Now staffml-validate-vault.yml is a direct sibling of
staffml-validate-dev.yml — the former validates the vault data + CLI
+ worker, the latter validates the site build. Same verb, different
scope, easy to reason about.

Updated references:
  .github/workflows/staffml-validate-vault.yml — self-reference in
    the paths trigger (so the workflow still fires when it's edited)
  interviews/vault/ARCHITECTURE.md §19.3 and §51 — both path refs
  interviews/vault/TESTING.md §4.1 — workflow name + display name
  interviews/vault-cli/scripts/check_registry_append_only.py — docstring

No branch-protection settings change needed — GitHub matches required
checks on the workflow's 'name:' field, not the filename. Anyone with
a bookmark to the old Actions-tab URL will get a 404 (harmless).

Other workflow naming I surveyed but deliberately LEFT alone
(all consistent with existing conventions):
  staffml-update-paper.yml      matches tinytorch-update-pdfs pattern
  staffml-auto-pr.yml           matches bot-workflow convention
  staffml-welcome.yml           single-word verb, standard
  auto-label / update-contributors / infra-* / publish-all-live
    are cross-cutting (no cluster prefix) by design
2026-04-22 11:27:37 -04:00
Vijay Janapa Reddi
8d385b0c1a feat(d1): cutover production D1 to schema v1.0 + live worker serving
Four deployment-level fixes landed on the live Cloudflare worker + D1
instance:

1. compiler.py — populate chains table from chains.json. Pre-v1.0 the
   table was never filled, which only mattered once D1 (which enforces
   FKs by default, unlike SQLite) tried to insert chain_questions. The
   cutover failed with FOREIGN KEY constraint failed until chains(id)
   was populated.

2. types.ts (worker) — add competency_area, bloom_level, phase, and
   human_review_* fields. Worker SQL was already SELECT *, so the new
   columns flow through without code changes, but the TypeScript row
   interface needed updating for downstream consumers.

3. rate_limit.ts — Math.max(60, …) floor on expirationTtl. Old calc
   could emit values as low as 11s, which D1's KV backend rejects
   (minimum 60s). Was throwing 1101 on every request after the
   deployment. Tail logs showed 'Invalid expiration_ttl of 14'.

4. wrangler.toml — bump SCHEMA_FINGERPRINT to match the v1.0 vault.db
   (b97218dae6354b1b…). Without this, /manifest reports
   schema_fingerprint_ok: false and clients degrade.

New script:
  scripts/ship_d1.py — end-to-end reload of D1 from the current YAMLs.
  'vault build' → SQL dump → 'wrangler d1 execute --file'. Handles FK
  ordering (chains first, then questions, then chain_questions). Used
  for this cutover; repeatable for future schema bumps.

Deployment state (2026-04-22):
  Worker URL:    https://staffml-vault.mlsysbook-ai-account.workers.dev
  D1 database:   staffml-vault (254f630f-…) — 9,199 questions loaded
  Release hash:  997747a8f43bbd89e03c6bb0e67865f8de35ac8316fbb0457ee0b8f955afb32f
  Manifest:      curl …/manifest returns 9,199 / schema_fingerprint_ok=true
  GET question:  /questions/cloud-0185 returns the post-Phase-2 v1.0 record
                 (zone=mastery, level=L6+, competency_area=latency, …)
  Filtered list: /questions?track=cloud&level=L6%2B works with pagination

Site cutover is NOT in this commit. The existing hybrid path
(bundled corpus.json primary + worker /search secondary) keeps
working unchanged. To flip the site entirely to the worker:
  export NEXT_PUBLIC_VAULT_API=https://staffml-vault.mlsysbook-ai-account.workers.dev
  unset NEXT_PUBLIC_VAULT_FALLBACK
  # then: next build && next deploy
That flip converts every caller from sync 'getQuestions()' to async
via corpus-source.ts — deferred because callers need an audit pass
to handle async correctly.
2026-04-22 10:29:35 -04:00
Vijay Janapa Reddi
a17107f3df chore(vault-cli): update d1 schema + codegen hashes for schema v1.0
- d1-schema.sql: regenerated to match compiler.py changes. Adds
  competency_area, bloom_level, phase, human_review_* columns to
  questions table. Adds idx_questions_human_review index.
  chain_questions PK changes from (chain_id, position) to
  (chain_id, question_id) for multi-chain + non-contiguous support.
  Drops deep_dive_title/deep_dive_url.
- codegen-hashes.txt: new baseline covering the v1.0 models.py,
  d1-schema.sql, and @staffml/vault-types/index.ts.

Fixes the vault codegen --check drift test that was failing CI.
2026-04-21 18:24:21 -04:00
Vijay Janapa Reddi
6ccee10a9d feat(vault-cli): add vault lint + schema drift check
Two new tools.

vault lint <path>
  Author-facing linter. Accepts a single YAML file or a directory.
  Severity levels:
    ERROR    schema violation; question cannot be loaded
    WARN     likely misclassification (zone-level affinity mismatch,
             chain position duplication, etc.)
    INFO     hygiene suggestions (human-review-pending on published Qs)

  Zone-level affinity warning implements paper §3.3 Table 2 (line 397):
  'An L1 question tagged as evaluation is flagged for review, since
  evaluation is cognitively inconsistent with Bloom's Remember level.'
  The warning is soft — marking an outlier does not reject it; it
  surfaces for reviewer judgement. Quickly identifies the ~943 L6+
  questions currently carrying zone=design that should probably be
  zone=mastery.

scripts/check_schema_sync.py
  CI drift check. Compares enum values in schema/enums.py against
  schema/question_schema.yaml (the authoritative LinkML schema) and
  exits non-zero if they disagree. Prevents the three-schema drift
  that caused the v0.1 migration defects from recurring.

  Enums cross-checked: Track, Level, Zone, BloomLevel, Phase, Status,
  Provenance, HumanReviewStatus. Output on success: 'OK: 8 enums in
  sync.' Wire into CI in a follow-up PR.
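
Illustrative sketch of the comparison step, assuming both schemas have already been loaded into dicts of enum name to values. enum_drift and the result keys are hypothetical; loading enums.py and the LinkML YAML is omitted.

```python
def enum_drift(py_enums, yaml_enums):
    """Return {enum_name: differences} for every enum whose value sets differ."""
    drift = {}
    for name in sorted(set(py_enums) | set(yaml_enums)):
        py_vals = set(py_enums.get(name, ()))
        yaml_vals = set(yaml_enums.get(name, ()))
        if py_vals != yaml_vals:
            drift[name] = {
                "only_in_python": sorted(py_vals - yaml_vals),
                "only_in_yaml": sorted(yaml_vals - py_vals),
            }
    return drift
```

A CI wrapper would exit non-zero whenever the returned dict is non-empty.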
2026-04-21 18:04:10 -04:00
Vijay Janapa Reddi
ed58b56cf4 docs(vault): archive obsolete scripts + post-mortem the v1.0 migration
Archives pre-v1.0 scripts under scripts/archive/ in both
interviews/vault/ and interviews/vault-cli/. ARCHITECTURE.md §3.3
rewritten with a post-mortem on why path-as-classification could not
represent the paper's full 11-zone × 6-level taxonomy. CHANGELOG.md
added documenting the full v1.0 migration.
2026-04-21 18:02:05 -04:00
Vijay Janapa Reddi
63dc15977c chore(paper): regenerate macros for 87-topic taxonomy (0.10.0)
2026-04-18 08:07:22 -04:00
Vijay Janapa Reddi
4013aa422e feat(vault): seed exemplar pool with 86 human-reviewed questions
Adds suggest_exemplars.py script for identifying high-quality candidates.
Moves 86 top-scoring questions (1 per topic) from vault/questions/ to
vault/exemplars/ with provenance upgraded to human. Scored by presence
of napkin_math, common_mistake, solution length, and scenario length.

vault generate now finds exemplars for topic-specific generation.
Published count: 9,113 (86 moved to exemplar pool).
2026-04-18 08:06:24 -04:00
Vijay Janapa Reddi
0eecfe1108 fix(vault-cli): pre-commit corpus guard resolves COMMIT_EDITMSG via git rev-parse
The hook read '.git/COMMIT_EDITMSG' via a literal relative path. That
works in a regular clone where .git is a directory, but fails silently
in a git worktree where .git is a file pointing at a per-worktree
gitdir under the main repo. In a worktree, Path('.git/COMMIT_EDITMSG')
never exists, so commit_message_has_override() always returned False
and legitimate Vault-Override trailers were rejected.

Resolve via 'git rev-parse --git-path COMMIT_EDITMSG' which returns
the correct path in both regular clones and worktrees. This matches
the pattern already used by the Makefile's HOOKS_DIR resolution.

No behaviour change in a regular clone; worktrees can now commit with
the Vault-Override trailer as documented.
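
Illustrative sketch of the resolution step. commit_editmsg_path is a hypothetical name, and the run parameter exists only so the function can be exercised without a repository.

```python
import subprocess
from pathlib import Path

def commit_editmsg_path(run=subprocess.run):
    """Ask git where COMMIT_EDITMSG lives; correct in clones and worktrees."""
    result = run(
        ["git", "rev-parse", "--git-path", "COMMIT_EDITMSG"],
        capture_output=True,
        text=True,
        check=True,
    )
    return Path(result.stdout.strip())
```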
2026-04-16 18:22:03 -04:00
Vijay Janapa Reddi
3ce0595035 fix: Round-7 (Chip) + Round-8 (Dean) — 2H + 1MH + 5M + 4L closed
Chip R7 findings:

R7-H-1 (HIGH): sw.js ReferenceError on offline fetch failure
  `cached` was const-scoped inside `if (!manifestStale)` block but
  referenced in the outer catch's "if (cached) return cached" offline
  fallback. Offline users hit ReferenceError instead of cache.
  Fix: hoist to `let cached = null` above the gate.

R7-H-2 (MEDIUM-HIGH): schema_fingerprint portability across SQLite versions
  Previous compiler hashed all sqlite_master including FTS5 shadow tables
  (questions_fts_data/idx/docsize/content/config) whose DDL varies across
  SQLite versions. Host Python SQLite ≠ Cloudflare D1 SQLite →
  fingerprint permanent mismatch → worker pinned to degraded mode forever.
  Fix: filter shadow tables out on both sides (compiler.py + worker/index.ts);
  fingerprint covers only user-authored DDL.

R7-M-3 (MEDIUM): schemaOk sticky on transient D1 failure
  Previously any probe exception pinned schemaOk=false until release
  rollover. Now: 5-minute retry window via schemaCheckedAt tracking.

R7-M-4 (MEDIUM): vault dup --vault-dir + pass-through
  ACKS_PATH was CWD-relative; invoking CLI from non-default cwd silently
  missed the ack file, legitimate templates reddied nightly CI forever.
  Fix: vault dup --vault-dir flag + pass through to ack_pairs(vault_dir);
  validator._scenario_dedup_lsh takes vault_dir; slow_tier threads it.

R7-M-5 (MEDIUM): FTS5 probe memoization
  Previously probed sqlite_master on every /search request — directly
  undid part of R5-H-1's manifest memo cost fix. Now: module-level
  ftsProbed memo, reset on release_id change (FTS5 presence can only
  change across releases).

R7-L-6 (LOW): reviewer-identity name clarity
  Var was `committer_emails` but git log %ae is AUTHOR email. Behavior
  was correct (intentional, so rebase-by-maintainer preserves chain);
  renamed to commit_author_emails and updated comments.

R7-L-7 (LOW): manifest memo race on release rollover
  maybeInvalidateSchemaCache nulled manifestMemo mid-write causing
  microsecond stampede. Now: don't null memo — 60s TTL is forgiving
  enough staleness bound for release rollover.

Dean R8 findings:

R8-H-1 (HIGH): SLI cron was structurally broken
  Previous SLI reconstructed canonical content_hash from worker JSON
  response — but reconstruction omitted tags, chain, generation_meta
  (WHITELIST_TOP includes tags + chain). Every hourly run false-positive'd
  on any question with a non-empty tag list, effectively a pager-DoS.
  Fix: compare worker's stored content_hash directly against release
  vault.db's stored content_hash. Same compilation source → mismatch
  means real corruption.

R8-M-2 (MEDIUM): SLI 404 handling
  urlopen crashed on deprecated IDs during release rollover. Fix:
  classify by response code. 404 → id_missing_in_worker (expected);
  5xx → transport_errors (separate tally); only real hash mismatch
  pages the operator.
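
Illustrative sketch of the response classification. The id_missing_in_worker and transport_errors labels come from this message; classify, hash_mismatch, ok, and unclassified are hypothetical names.

```python
def classify(status, worker_hash=None, release_hash=None):
    """Map one SLI probe result to a tally bucket."""
    if status == 404:
        return "id_missing_in_worker"  # expected during release rollover
    if 500 <= status <= 599:
        return "transport_errors"      # tallied separately, does not page
    if status == 200:
        # Only a real stored-hash disagreement should page the operator.
        return "ok" if worker_hash == release_hash else "hash_mismatch"
    return "unclassified"
```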

R8-M-3 (MEDIUM): vault deploy + rollback primitives spec-only
  ARCHITECTURE §6.2 said 'default rollback = snapshot restore — always
  works' but no vault deploy or vault rollback command existed, so the
  R2 snapshot substrate that makes §6.2 true was never built.
  Fix: implemented `vault deploy` with synchronous R2 snapshot
  (wrangler d1 export + wrangler r2 object put before migration) and
  `vault rollback --method snapshot|sql`. CLI now has 26 subcommands.
  Deploy requires authenticated wrangler; code path exists.

R8-LM-4 (LOW-MED): D1 bootstrap migration
  wrangler.toml referenced migrations_dir='migrations' but that dir
  didn't exist. First-deploy-from-scratch relied on manual operator steps.
  Fix: generated interviews/staffml-vault-worker/migrations/0001_bootstrap.sql
  from compiler.DDL so `wrangler d1 migrations apply` works on a fresh D1.

Test matrix (post-R7+R8 integration):
  pytest:  38 green in 0.15s
  vitest:  7 green in 131ms
  ruff:    All checks passed
  vault build: release_hash fe69d4c4... stable (unchanged — fingerprint
               filter change affects release_metadata content, not
               the release Merkle per §3.5)
  vault --help: 26 subcommands (added deploy + rollback)

Convergence tracking:
  R1-R5 closed 90+ findings
  R6 (Gemini queued, not yet invoked) — will launch with R9/R10
  R7-R8 produced 12 new findings (2H + 1MH + 5M + 4L), all closed here
  Pattern: each round still finds 8-12 issues. Not yet stable.
  Expect 2-3 more rounds to hit 'no new findings' signal.
2026-04-16 16:31:17 -04:00
Vijay Janapa Reddi
6b7b3f1b70 fix: integrate Gemini Round-5 holistic review — 3C + 4H + 1M + 1L fixed
Gemini 3.1 Pro reviewed the full branch (371KB / 43K words) with 1M
context. Caught 9 cross-file issues none of the 4 prior per-file
rounds saw because they required seeing multiple systems at once.

CRITICAL fixes:

R5-C-1: _MIGRATION_TABLES omitted release_metadata (release.py:118).
  Result: after `vault ship`, release_metadata never propagated to D1
  → worker kept serving old release_id forever → cache never
  invalidated → new release functionally invisible.
  Fix: added 'release_metadata' to migration-participating tables.

R5-C-2: SW offline wake-up deleted the real cache (sw.js).
  Result: when SW woke offline, currentRelease=null, cacheName
  defaulted to '...-unknown'. Activate pruned all caches not matching,
  i.e. it deleted the real cache. Offline users: no cache.
  Fix: persist currentRelease to IDB on fetch success; restore on
  activate; move cache pruning from activate to updateReleaseFromManifest
  so it only runs AFTER a successful online manifest fetch.

R5-C-3: schema_fingerprint hand-edited in wrangler.toml (compiler.py +
  worker/src/index.ts). Every DDL change required manually recomputing
  + pasting a hash + Worker redeploy; forgetting any step put the site
  in degraded mode.
  Fix: compiler.py now computes fingerprint from sqlite_master at build
  time and stores in release_metadata. Worker reads it from the DB via
  getManifest; env.SCHEMA_FINGERPRINT path removed.
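
A minimal sketch of deriving a fingerprint from sqlite_master, assuming a SHA-256 over the sorted DDL rows. The function names, the row encoding, and the demo tables are illustrative assumptions; the actual compiler.py logic is not shown in this log.

```python
import hashlib
import sqlite3


def schema_fingerprint(conn: sqlite3.Connection) -> str:
    """Hash every DDL statement recorded in sqlite_master.

    Sorting by (type, name) keeps the digest independent of creation order.
    Rows with NULL sql (auto-created indexes) are skipped.
    """
    rows = conn.execute(
        "SELECT type, name, sql FROM sqlite_master "
        "WHERE sql IS NOT NULL ORDER BY type, name"
    ).fetchall()
    digest = hashlib.sha256()
    for obj_type, name, sql in rows:
        digest.update(f"{obj_type}\x00{name}\x00{sql}\n".encode("utf-8"))
    return digest.hexdigest()


def build_demo_db(with_trigger: bool) -> sqlite3.Connection:
    """Hypothetical two-table schema, used only to exercise the hash."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE questions (id TEXT PRIMARY KEY, title TEXT NOT NULL)")
    conn.execute("CREATE TABLE release_metadata (key TEXT PRIMARY KEY, value TEXT)")
    if with_trigger:
        conn.execute(
            "CREATE TRIGGER questions_touch AFTER INSERT ON questions "
            "BEGIN UPDATE release_metadata SET value = NEW.id WHERE key = 'last'; END"
        )
    return conn
```

Because triggers are rows in sqlite_master too, adding one changes the digest, which is the property the Round-3 fix (DDL hash includes triggers) relies on.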

HIGH fixes:

R5-H-1: getManifest hit D1 on every request before Cache API check
  (worker/src/index.ts:130). Destroyed the §10.4 cost target.
  Fix: module-level manifest memo with 60s TTL. Invalidated on
  release_id change (natural cadence from Cloudflare's eventual
  propagation).
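
The memo idea can be sketched as below. The worker itself is TypeScript; this is a Python illustration of the same TTL pattern, and `ManifestMemo`, `loader`, and the injectable `clock` are hypothetical names introduced here.

```python
import time
from typing import Callable, Optional


class ManifestMemo:
    """Cache one loaded manifest and reload it after ttl_seconds."""

    def __init__(
        self,
        loader: Callable[[], dict],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader          # stands in for the D1 query
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so tests can fake time
        self._value: Optional[dict] = None
        self._expires_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            # Only this branch touches the backing store.
            self._value = self._loader()
            self._expires_at = now + self._ttl
        return self._value
```

With a 60s TTL, a new release_id is picked up at most one TTL after it lands, while requests inside the window never reach the database.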

R5-H-2: _insert_stmt emitted NULL for columns absent in row (release.py).
  Result: rolling back past a new NOT NULL column would crash on SQLite
  constraint violation.
  Fix: emit only columns actually in row dict; let SQLite apply defaults.
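
A small sketch of the fixed behavior, assuming rows arrive as plain dicts. `insert_stmt` here is a hypothetical stand-in that returns a parameterized statement; the real `_insert_stmt` in release.py may emit literal SQL instead.

```python
import sqlite3


def insert_stmt(table: str, row: dict) -> tuple[str, list]:
    """Build an INSERT that names only the columns present in `row`.

    Columns missing from the dict are left out entirely, so SQLite
    applies their DEFAULT instead of receiving an explicit NULL.
    """
    columns = sorted(row)  # sorted for a stable, diffable statement
    col_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})'
    return sql, [row[c] for c in columns]
```

For a table whose newer `status` column is `NOT NULL DEFAULT 'draft'`, an older row dict without `status` inserts cleanly and picks up the default, whereas an explicit NULL for that column raises `sqlite3.IntegrityError`.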

R5-H-3: ARCHITECTURE.md §13 promised CI rejects --reviewed-by spoofing,
  but no check existed.
  Fix: new scripts/check_reviewer_identity.py + CI step. Verifies for
  every changed question with provenance=llm-then-human-edited that at
  least one `authors` entry matches a commit email from the PR.
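
The rule can be sketched as a pure function, assuming `authors` holds email strings and that the PR's commit emails have already been collected. `spoofed_reviews` and the dict shape are hypothetical; scripts/check_reviewer_identity.py itself is not shown in this log.

```python
def spoofed_reviews(changed_questions: list[dict], commit_emails: set[str]) -> list[str]:
    """Return ids of human-edited questions with no author among the PR's committers."""
    normalized = {email.strip().lower() for email in commit_emails}
    flagged = []
    for question in changed_questions:
        # Only this provenance value claims a human reviewer.
        if question.get("provenance") != "llm-then-human-edited":
            continue
        authors = {a.strip().lower() for a in question.get("authors", [])}
        if not authors & normalized:
            flagged.append(question["id"])
    return flagged
```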

R5-H-4: LSH dedup told operator to run `vault dup --ack` but that
  command didn't exist — legitimate templates would turn nightly CI red
  forever.
  Fix: implemented `vault dup --ack`/--unack/--show. Writes to
  vault/dedup-acks.yaml. Validator reads the ack list and skips flagged
  pairs.

MEDIUM / LOW fixes:

R5-M-1: `vault tag` swallowed git failures with check=False.
  Result: 'tag already exists', 'nothing to commit', merge conflicts
  all printed '[green]tagged[/green]' and exited 0.
  Fix: explicit error check on every subprocess call; pre-existence
  check on tag like ship.paper_forward does.
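
A sketch of the explicit-check pattern, assuming each git step goes through one helper. `run_checked` and `StepFailed` are hypothetical names; only `subprocess.run` and its documented parameters are relied on.

```python
import subprocess
import sys


class StepFailed(RuntimeError):
    """Raised when a subprocess exits non-zero."""


def run_checked(cmd: list[str]) -> str:
    """Run cmd and surface failures instead of swallowing them."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        # Report the exit code and stderr so the caller can exit non-zero
        # rather than printing a success message.
        raise StepFailed(
            f"{cmd[0]} exited {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Harmless demo command; a real caller would pass a git invocation.
    print(run_checked([sys.executable, "-c", "print('ok')"]).strip())
```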

R5-L-1: applicability-matrix invariant case-sensitive; 'Cloud' vs
  'cloud' silently failed enforcement.
  Fix: lowercase-normalize both sides of the comparison.

State:
  pytest:  38/38 green in 0.15s
  vitest:  7/7 green (fingerprint test updated to mock via release_metadata)
  ruff:    All checks passed
  CLI:     23 subcommands (added vault dup)
  release_hash: fe69d4c4... (unchanged — schema_fingerprint addition
               affects release_metadata table, not content Merkle per §3.5)
2026-04-16 16:13:35 -04:00
Vijay Janapa Reddi
cbdb566381 feat(vault): Phase-1 migration contract fully closed in-repo
v2.3 → v2.4. ARCHITECTURE.md header + Appendix reflect the completed
migration.

WHAT CLOSED (§11.1 contract):
  1. `vault build --legacy-json` regenerates the site's
     interviews/staffml/src/data/corpus.json from YAML. 9,199 published
     questions, site-compatible shape (chain_positions back to 0-indexed
     dict form, bloom_level derived from zone, competency_area aliased
     from topic, scope aliased from track). Deterministic via sort_keys +
     id-sort.
  2. Pre-commit hook INSTALLED via worktree-aware Makefile target
     (`make -C interviews/vault-cli hooks`). Symlink points at
     pre_commit_corpus_guard.py. Tested end-to-end: direct edit to
     vault/corpus.json triggers exit-1 with §11.1 reference.
  3. CI equivalence check added to .github/workflows/vault-ci.yml:
     regenerates corpus.json from YAML, diffs against committed. Fails
     PR on drift with actionable error message.
  4. Legacy generators demoted with DEPRECATED headers:
     - interviews/paper/scripts/analyze_corpus.py → vault export-paper
     - interviews/staffml/scripts/sync-vault.py → vault build --legacy-json
     - interviews/staffml/scripts/generate-manifest.py → vault publish
     - interviews/vault/scripts/export_to_staffml.py → vault build --legacy-json
  5. New DEPRECATED.md files at interviews/vault/scripts/ and
     interviews/staffml/scripts/ map every legacy script to its
     replacement. Both directories keep the old scripts for git-history
     legibility and archaeology; new contributors see the vault CLI first.
  6. ARCHITECTURE.md §Appendix rewritten as current-state table instead
     of aspirational "gone. replaced by..." entries.

NEW TESTS (interviews/vault-cli/tests/test_legacy_export.py — +4):
  - test_legacy_shape_matches_site_interface: every field corpus.ts
    declares is present in regenerated JSON.
  - test_chain_positions_legacy_shape: 1-indexed new schema →
    0-indexed legacy dict form.
  - test_emitter_deterministic: byte-stable across reversed input order
    (required for CI diff-check).
  - test_competency_area_aliases_topic: legacy alias fields populated
    correctly.
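
The properties these tests pin down can be sketched as follows, assuming each new-schema question carries an optional `chain` mapping of the form `{id, position}`. `to_legacy`, `emit_legacy_corpus`, and the field subset are illustrative; the real emitter covers more fields.

```python
import json


def to_legacy(question: dict) -> dict:
    """Map a new-schema question to a legacy-shaped row (field subset only)."""
    legacy = {
        "id": question["id"],
        "topic": question["topic"],
        "competency_area": question["topic"],  # legacy alias of topic
        "track": question["track"],
        "scope": question["track"],            # legacy alias of track
    }
    chain = question.get("chain")
    # New schema is 1-indexed; the legacy dict form is 0-indexed.
    legacy["chain_positions"] = (
        {chain["id"]: chain["position"] - 1} if chain else {}
    )
    return legacy


def emit_legacy_corpus(questions: list[dict]) -> str:
    """Serialize deterministically: rows sorted by id, keys sorted."""
    rows = sorted((to_legacy(q) for q in questions), key=lambda r: r["id"])
    return json.dumps(rows, sort_keys=True, ensure_ascii=False, indent=2) + "\n"
```

Sorting rows by id and keys via `sort_keys` is what makes the output independent of input order, so a CI diff against the committed file only fails on real content drift.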

FULL MATRIX GREEN:
  pytest:  38/38 passed in 0.19s (34 + 4 legacy-export)
  ruff:    All checks passed
  hook:    exit 0 on clean diff / exit 1 on corpus.json direct edit
  e2e:     vault build --legacy-json regenerates a bit-identical corpus.json
           vs the committed one; CI check wired to catch drift

WHAT'S LEFT (deploy-gated, §20.5 #1, #5, #6 partial, #8, #9):
  - Production serves from D1: requires Phase-3 wrangler d1 create + deploy
  - Manual QA per CUTOVER_QA.md: requires live staging
  - Zero data loss D1-side verification: requires live D1
  - 48h monitoring: requires production traffic

These intrinsically require user action; the YAML-side migration is done.
2026-04-16 14:57:24 -04:00
Vijay Janapa Reddi
8e1e47f9f8 fix(vault): normalize chain positions + tighten provenance invariant
vault-cli/scripts/normalize_chain_positions.py (NEW)
  Phase-1 split kept only chain_ids[0] per question when legacy corpus
  had multi-chain membership (up to 4 chains/question). Chains whose
  members chose a different chain_ids[0] were left with position gaps.
  Script walks vault/questions/, groups by chain_id, renumbers each
  chain's members to contiguous [1..N] sorted by current position.
  Idempotent. Rewrote 87 questions across 977 chains.
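
  The renumbering step can be sketched in memory as below, assuming each
  question dict carries an optional `chain` mapping of the form
  `{id, position}`. The function name and the id tie-break are assumptions;
  the real script reads and rewrites the YAML files under vault/questions/.

```python
from collections import defaultdict


def normalize_chain_positions(questions: list[dict]) -> int:
    """Renumber each chain's members to 1..N; return how many changed."""
    by_chain = defaultdict(list)
    for question in questions:
        chain = question.get("chain")
        if chain:
            by_chain[chain["id"]].append(question)

    changed = 0
    for members in by_chain.values():
        # Keep the existing relative order; break ties on id for determinism.
        members.sort(key=lambda q: (q["chain"]["position"], q["id"]))
        for new_position, question in enumerate(members, start=1):
            if question["chain"]["position"] != new_position:
                question["chain"]["position"] = new_position
                changed += 1
    return changed
```

  A second call over already-contiguous chains returns 0, which is the
  idempotence the commit message claims.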

validator.py #18 (provenance-meta)
  Tightened from 'any non-human provenance requires generation_meta'
  to 'only llm-draft / llm-then-human-edited require it'. Imported
  content legitimately has no LLM attribution and shouldn't carry
  stub meta. Was incorrectly flagging 9,199 imported questions.

Re-ran vault build → new release_hash (input changed, which is
correct): fe69d4c4d3c2884efeab6189a67e929e4e970dc0f4de42ab9493531a4cabeda1.
Republished 0.9.0 release artifact. corpus-equivalence-hash.txt updated.
paper/macros.tex + corpus_stats.json regenerated (same counts:
9199/87/964 chains/31.9% coverage).

State: vault check --strict 100% clean on full 9,657-question corpus;
zero load errors; zero invariant failures. 28/28 pytest green.
vault verify 0.9.0 round-trips from YAML source. Citation property
holds on the new hash.
2026-04-16 14:10:14 -04:00
Vijay Janapa Reddi
1bc93374e1 feat(vault): Phase-1/2 polish + LICENSEs + corpus cutover branch
vault-cli/src/vault_cli/commands/stats.py (NEW, B.8)
  vault stats — live scorecard over vault.db with --format-prometheus
  scrape mode + --exemplar-coverage audit shim. Reports total / topics
  / chains / by_status / by_track / by_provenance. Resolves R3 gap
  about missing stats subcommand.

vault-cli/src/vault_cli/commands/codegen.py (NEW, B.7)
  vault codegen --check — Phase-1 presence-and-non-empty verification
  of the 3 shared-artifact files (models.py, d1-schema.sql,
  @staffml/vault-types/index.ts). Full LinkML-driven generation is
  Phase-2 follow-up.

vault-cli/Makefile (NEW, B.2)
  make install / test / lint / hooks / hooks-uninstall. Hooks target
  symlinks pre_commit_corpus_guard.py into .git/hooks/pre-commit.

vault-cli/scripts/check_registry_append_only.py (NEW, B.3)
  CI script verifying id-registry.yaml is append-only vs base branch.
  Rejects removed or reordered lines — C-5 enforcement at merge time.
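
  A sketch of the append-only rule, assuming both versions of the registry
  are available as lists of lines. `append_only_violations` is a
  hypothetical helper; how the real script loads the base-branch copy is
  not described here.

```python
def append_only_violations(base_lines: list[str], head_lines: list[str]) -> list[str]:
    """Report every way head_lines fails to extend base_lines as a prefix."""
    problems = []
    if len(head_lines) < len(base_lines):
        problems.append(
            f"registry shrank from {len(base_lines)} to {len(head_lines)} lines"
        )
    # Every pre-existing line must be unchanged and in the same position.
    for index, (old, new) in enumerate(zip(base_lines, head_lines), start=1):
        if old != new:
            problems.append(f"line {index} changed: {old!r} -> {new!r}")
    return problems
```

  An empty result means the head version only appended lines; any removal
  or reordering shows up as at least one changed or missing line.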

vault/questions/LICENSE (NEW)
  CC-BY-4.0 for corpus content. BibTeX template with release_hash
  placeholder. Scope note clarifies vault-cli is MIT separately.

vault-cli/LICENSE (NEW)
  MIT for vault-cli Python package + scripts + docs. Scope note
  clarifies corpus is CC-BY-4.0 separately.

staffml/src/lib/corpus-vault.ts (NEW, B.11)
  Vault-API-backed data source mirroring corpus.ts public surface.
  Adapts @staffml/vault-types Question → legacy Question shape so
  callers don't need to change. Not wired into any component yet —
  the swap happens via corpus-source.ts.

staffml/src/lib/corpus-source.ts (NEW, B.11)
  Cutover router: getCorpusSource() returns 'static' or 'vault-api'
  based on NEXT_PUBLIC_VAULT_FALLBACK. Components that opt into the
  cutover import from here; others continue using corpus.ts directly
  (unchanged behavior). Phase-4 cutover flips components one-by-one
  rather than big-bang-replacing corpus.ts.

Phase-1/2 now has the full CLI surface (19 subcommands), LICENSEs
for legal Phase-3 deploy, and the site-side cutover pathway ready
for Phase-4 canary.
2026-04-16 13:10:16 -04:00
Vijay Janapa Reddi
42f4d1ca8b fix(vault): Round-3 correctness + vault ship + authoring contract
Round-3 review (4 reviewers on v2.1) surfaced two code-correctness
Criticals that this commit fixes, plus the contracted-but-missing
`vault ship` coordinator and David's authoring-UX gaps.

Critical fixes (real bugs in landed code):

worker/src/index.ts
- SCHEMA_FINGERPRINT placeholder fail-closed (Chip R3-C1 / Dean R3-NH-3).
  Was: placeholder auto-passed and silently disabled the fingerprint
  check. Now: placeholder forces degraded mode until operator sets
  real fingerprint.
- DDL hash now includes triggers (FTS5-aware).
- release_id change invalidates schema-fingerprint memoization
  (Dean R3-NH-4).
- wrangler.toml now pins the real fingerprint.

staffml/public/sw.js
- /manifest polling TTL-throttled to 5min (Chip R3-C2). Was:
  per-request fetch nullified the §10.4 cost model.
- API origin persisted to IndexedDB; rehydrated on activate so cold
  offline wake-ups serve cached content (Chip R3-H3).

vault-cli/src/vault_cli/release.py
- emit_migrations diffs all 4 tables via PRAGMA-driven column
  introspection (Dean R3-NC-1 + R3-NH-2). Was: only questions table,
  silently missing chains/chain_questions/tags. Rollback-symmetry
  test extended to populate + verify all tables.

vault-cli/src/vault_cli/commands/release.py
- vault verify --git-ref reconstructs release from 'git archive <ref>'
  into a tempdir (Dean R3-NC-2). Was: always rebuilt from HEAD, so
  verifying a historical release always failed post-authoring.
  Academic-citability contract (C-3) now actually holds.

vault-cli/src/vault_cli/ship.py (NEW)
- vault ship composed verb with journaling (Dean R3-NH-1):
  * Legs run D1 → Next.js → paper-tag-last (§6.1.1 ordering).
  * Journal at releases/<v>/.ship-journal.json records per-leg state;
    --resume continues interrupted ships idempotently.
  * Pre-paper failure auto-rolls back in reverse order.
  * Paper-leg failure pages operator; does NOT auto-rollback earlier
    legs (git tag is remote-durable per §6.1.1).
- 4 unit tests cover happy path, pre-paper failure auto-rollback,
  paper-leg needs-manual, --resume across interruptions.
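
A reduced sketch of the journaling and resume behavior, assuming a flat JSON journal mapping leg name to state. The leg names, the `ship` signature, and the journal layout are hypothetical, and the auto-rollback and paper-leg paging rules are deliberately left out.

```python
import json
from pathlib import Path
from typing import Callable

# Order taken from the commit message: D1, then Next.js, then the paper tag.
LEGS = ["d1", "nextjs", "paper"]


def ship(journal_path: Path, actions: dict[str, Callable[[], None]]) -> dict:
    """Run each leg once, recording per-leg state so a rerun can resume."""
    journal = json.loads(journal_path.read_text()) if journal_path.exists() else {}
    for leg in LEGS:
        if journal.get(leg) == "done":
            continue  # finished in an earlier run; skipping keeps reruns idempotent
        try:
            actions[leg]()
        except Exception:
            journal[leg] = "failed"
            journal_path.write_text(json.dumps(journal, sort_keys=True))
            raise
        journal[leg] = "done"
        # Persist after every leg so an interruption loses at most one step.
        journal_path.write_text(json.dumps(journal, sort_keys=True))
    return journal
```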

vault-cli/src/vault_cli/commands/authoring.py
- vault new appends to id-registry.yaml (David R3-H3 + C-5
  enforcement); `git pull --rebase` before allocation.
- authors: auto-populated from git config user.email (David R3-H4 /
  M-15). Was: field never set.
- vault edit injects validation-error comment block at top of YAML
  and re-opens up to --retries=3 times (David R3-H1). Was: terminal
  traceback mid-authoring session.
- vault move refuses dirty tree, chained question, excluded-cell
  per applicability matrix (David R3-H2). Was: unchecked git mv.
- vault renumber command (NEW): post-rebase seq-collision recovery.
  Bumps seq, renames file, updates id field, appends registry
  (David R3-N-2, was spec-only).
- vault mark-exemplar command (NEW): promotes to vault/exemplars/
  with provenance + human_reviewed_at gate (David R3-N-9).

vault-cli/src/vault_cli/compiler.py
- FTS5 virtual table + sync triggers added to DDL (B.5). Triggers
  keep questions_fts in sync via AFTER INSERT/UPDATE/DELETE.
  schema_fingerprint accounts for triggers now.

tests/test_hashing.py
- Nested-dict hash-stability fixture (Soumith R3-F-4). Was: test
  only reordered top-level keys + collapsed details to one key.
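
The property under test can be sketched as below, assuming content hashes are SHA-256 over canonical JSON. `content_hash` and the serialization settings are assumptions for illustration; the project's actual hashing rules are defined elsewhere (§3.5).

```python
import hashlib
import json


def content_hash(record: dict) -> str:
    """Hash a record so key order at any nesting depth does not matter."""
    # sort_keys applies to every nested dict, not only the top level.
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A fixture that reorders keys inside a nested `details` mapping should hash identically, while changing any nested value should not.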

All 28 tests pass (22 → 28: +4 ship journaling, +1 multi-table
migration symmetry, +1 nested-dict hash stability). release_hash
unchanged at 1b304282... — FTS5 addition doesn't affect content
Merkle per §3.5 input-only design.
2026-04-16 13:10:16 -04:00
Vijay Janapa Reddi
d8f6abae4b feat(worker): Phase 3 D1 worker scaffold + shared types package
Phase 3 is CODE-COMPLETE; actual D1 creation + Worker deployment
require authenticated Cloudflare credentials (user action gate per
kickoff stop-conditions).

staffml-vault-worker/
  wrangler.toml            — DB binding, CORS allowlist, TTL env vars,
                             SCHEMA_FINGERPRINT placeholder,
                             GRACE_WINDOW_SECONDS for cross-release
                             serving.
  src/index.ts             — 6 endpoints (manifest, questions, questions/:id,
                             search, stats) with ETag + cursor pagination +
                             SWR Cache-Control + CORS.
  src/types.ts             — Env binding + row shapes.
  README.md                — deploy-day runbook.

Key v2.1 behaviors wired:
- X-Vault-Release is INFORMATIONAL (not hard-reject) — worker serves
  from release_metadata.release_id; header is SLI signal only. Fixes
  Soumith H-NEW-2 local-dev + SWR revalidation brownout.
- schema_fingerprint cold-start check hashes actual sqlite_master DDL
  (not metadata-vs-metadata, closes Dean N-4). On mismatch: Cache-API
  read-only mode with X-Vault-Degraded header, never 5xx (closes
  Chip N-H1 total-outage risk).
- Cache keys keyed by release_id → deploy-time atomic POP
  invalidation (H-14).
- ETag format: '<release_id>:q:<content_hash>' — 304 support
  (Soumith H-NEW-2).
- Cursor pagination via opaque base64 {offset, filter_hash} tokens
  (H-20). Clients never construct cursors.
- CORS allowlist from wrangler var; no wildcard in prod.
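
The opaque-cursor idea can be sketched as below. The worker is TypeScript; this Python version only illustrates the token shape named above ({offset, filter_hash}), and the helper names, the truncated SHA-256 filter hash, and the error messages are assumptions.

```python
import base64
import hashlib
import json


def filter_hash(filters: dict) -> str:
    """Stable short digest of the active filters."""
    canonical = json.dumps(filters, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def encode_cursor(offset: int, filters: dict) -> str:
    """Server side only: clients treat the returned token as opaque."""
    payload = json.dumps(
        {"offset": offset, "filter_hash": filter_hash(filters)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def decode_cursor(token: str, filters: dict) -> int:
    """Return the offset, rejecting malformed or mismatched tokens."""
    try:
        payload = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
        offset = payload["offset"]
        token_hash = payload["filter_hash"]
    except (ValueError, KeyError, TypeError) as exc:
        raise ValueError("malformed cursor") from exc
    if not isinstance(offset, int) or offset < 0:
        raise ValueError("malformed cursor")
    if token_hash != filter_hash(filters):
        # A cursor issued under one filter set must not page through another.
        raise ValueError("cursor was issued for a different filter set")
    return offset
```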

staffml-vault-types/
  index.ts                 — shared TS contract types; pnpm workspace
                             protocol between worker + site (Soumith
                             M-NEW-1 resolution).
  package.json             — @staffml/vault-types, workspace-private.

vault-cli/scripts/
  emit_d1_schema.py        — generates d1-schema.sql from compiler DDL;
                             reports SHA-256 fingerprint to paste into
                             wrangler.toml SCHEMA_FINGERPRINT var.
  d1-schema.sql            — committed schema; applied to fresh D1 via
                             'wrangler d1 execute <db> --file d1-schema.sql'.

Deploy-day gates (per CUTOVER_QA.md §0 and TESTING.md phase-entry):
  1. License decision resolved (L-10 still OPEN).
  2. wrangler d1 create staffml-vault (prod + staging) — user action.
  3. Apply d1-schema.sql + seed via d1-migration.sql.
  4. FTS5 load-test gate: p99 warm ≤100ms, p99 cold ≤500ms,
     ≤500 D1 row-reads/query (Dean N-5 cost gate).
  5. Data-plane SLI crons emitting to Grafana.
2026-04-16 12:42:13 -04:00
Vijay Janapa Reddi
f633cc9174 feat(vault-cli): Phase 1 commands — build, check, new/edit/rm/restore/move, api, serve
vault build        — compile YAML → vault.db with release_hash stamped.
vault check        — invariant check engine with --tier (fast|structural|all)
                     and --json (LSP-diagnostic shape per JSON_OUTPUT.md).
vault new          — allocate content-addressed ID (topic + 6-hex + seq);
                     refuses to overwrite existing files.
vault edit         — open in $EDITOR; re-validates on save; exit 1 on
                     validation failure per exit-code taxonomy.
vault rm           — default soft-delete (status=deprecated); --hard requires
                     typed title confirmation + --force for chained questions
                     (Chip N-H11).
vault restore      — inverse of soft-delete.
vault move         — reclassify via git mv (falls back to shutil.move if
                     not in a git repo); --dry-run safe.
vault api          — localhost Worker-surface shim serving the production
                     endpoint contract from local vault.db (closes H-17).
                     Binds 127.0.0.1; prints divergence notice.
vault serve        — Datasette wrapper, 127.0.0.1 only.

scripts/split_corpus.py  — one-shot converter corpus.json → per-question
                           YAML under vault/questions/<track>/<level>/<zone>/.
                           Preserves legacy IDs for bookmark stability.
                           Appends to id-registry.yaml (append-only log).
                           Legacy chain_positions dict → structured
                           {id, position} form with 1-indexed positions.

scripts/pre_commit_corpus_guard.py  — refuses commits that edit
                                      vault/corpus.json directly (enforces
                                      C-2 single-authoring-surface).
                                      Override via 'Vault-Override:
                                      corpus-json-hand-edit' trailer.
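
The guard's decision can be sketched as a pure function, assuming the caller supplies the staged paths and, if available, the commit message. How the real script obtains either input is not stated here, and `guard_exit_code` is a hypothetical name.

```python
GUARDED_PATH = "vault/corpus.json"
OVERRIDE_TRAILER = "Vault-Override: corpus-json-hand-edit"


def guard_exit_code(staged_paths: list[str], commit_message: str = "") -> int:
    """Return 0 to allow the commit, 1 to refuse it."""
    if GUARDED_PATH not in staged_paths:
        return 0  # generated file untouched: nothing to guard
    # The override must appear as its own line, trailer-style.
    trailers = {line.strip() for line in commit_message.splitlines()}
    if OVERRIDE_TRAILER in trailers:
        return 0
    return 1
```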
2026-04-16 12:37:06 -04:00
Vijay Janapa Reddi
8af0948a35 ci(vault): Phase 0 CI workflow + exemplar-coverage audit
.github/workflows/vault-ci.yml
  Matches repo workflow style (emoji-prefixed name, concurrency group,
  path-scoped triggers). Phase-0 scope: pip install, vault --version,
  ruff, pytest, exemplar-audit staleness check. Python 3.12 pinned for
  hash stability per ARCHITECTURE.md §3.5. mypy --strict included but
  non-blocking at Phase 0; enforced in Phase 1. Placeholder for
  vault check --strict, vault build, vault codegen --check as those
  commands land.

interviews/vault-cli/scripts/exemplar_coverage_audit.py
  Reads corpus.json, groups by (track, level, zone), counts total
  questions vs exemplar-eligible per cell (requires provenance ∈
  {human, llm-then-human-edited}). Phase-0 honest output: provenance
  field doesn't exist in corpus.json yet, so eligible=0 for every
  cell until Phase-1 YAML split + provenance backfill. Audit shape is
  stable so Phase-1 re-runs slot in without refactoring.

interviews/vault/exemplar-gaps.yaml
  First audit snapshot: 190 cells catalogued, all gap=3 pending
  Phase-1. Filling gaps unblocks vault generate in Phase 7, not a
  Phase 0 blocker (Chip N-H3 resolution).

Phase 0 milestone: complete.
2026-04-15 21:25:52 -04:00