cs249r_book

mirror of https://github.com/harvard-edge/cs249r_book.git synced 2026-05-07 18:18:42 -05:00

Author	SHA1	Message	Date
Vijay Janapa Reddi	bc26a0bf37	feat(vault): Phase 6 schema tightening — markers + Details forbid + invariant Three coordinated edits to lift the marker convention from a soft draft-validation gate to a published-corpus invariant: 1. interviews/vault/schema/question_schema.yaml (LinkML, source of truth): common_mistake and napkin_math gain regex patterns matching the AUTHORING.md Pitfall/Rationale/Consequence and Assumptions/ Calculations/Conclusion conventions. Documents the spec; enforced in the validator below. 2. interviews/vault-cli/src/vault_cli/models.py (Pydantic, derived): Details flips from extra='allow' to extra='forbid'. A pre-flight survey on 2026-05-04 across all 10,711 YAMLs found 0 unknown keys on Details, so the historical 'imported legacy fields' risk no longer applies. 3. interviews/vault-cli/src/vault_cli/validator.py: structural_tier gains _check_format_markers (invariant #19), which flags published YAMLs whose non-empty cm/nm doesn't match the AUTHORING.md markers. Drafts are exempt — author-in-progress drafts may still have malformed markers. Lifts gate_format from validate_drafts.py / _judges.py from a CI-time gate to a vault-check-strict invariant. Tests: 4 new cases in test_models covering Details forbid, marker- compliant pass, malformed cm fail, and draft-exempt skip. Total 88 passing (was 84). codegen-hashes.txt updated for the models.py edit; vault codegen --check passes. The on-disk corpus is fully clean post-Phase-5+drain: vault check --strict reports 10,711 loaded, 0 invariant failures, 0 format- marker violations on published YAMLs.	2026-05-04 08:41:08 -04:00
Vijay Janapa Reddi	03031dc38e	test(vault-cli): smoke tests for audit_corpus_batched batching 7 tests covering pack_batches: - empty input → no batches - single small item → one batch - no items lost across batches (50 items, 10/batch → all 50 round-trip) - max_items_per_batch caps batch size (33 items, 10/batch → 10/10/10/3) - max_chars triggers a flush before items overflow the budget - input order preserved within and across batches - oversized single item still lands in a batch (we don't drop, the caller is expected to detect overflow downstream) The audit script itself can't easily be unit-tested in CI (it subprocess-shells the gemini CLI); the batching helper is the main piece of pure logic, so this is where the value is. 84 / 84 pytest pass (was 77; added 7) CORPUS_HARDENING_PLAN.md Phase 3.	2026-05-03 08:23:08 -04:00
Vijay Janapa Reddi	f691d6c14a	feat(vault-cli): vault new scaffolds full Pitfall/Rationale/Consequence + Assumptions/Calculations/Conclusion stubs The previous scaffold only stubbed scenario and realistic_solution with <TODO> placeholders. That meant authors had to know about the markup conventions from somewhere else (the regex in validate_drafts.py, the SCHEMA_SUMMARY in generate_question_for_gap.py, or the paragraph in ARCHITECTURE.md §3.6.1) — none of which a new contributor would find. Now `vault new` produces a YAML with the canonical bold markers pre-written. Authors fill in the content between markers; they can't forget to use them. Templates extracted as module-level constants (COMMON_MISTAKE_TEMPLATE and NAPKIN_MATH_TEMPLATE in commands/authoring.py) so they're testable in isolation. New tests in test_authoring_scaffold.py guard against accidental marker removal — if a contributor edits the scaffold and drops, say, The Rationale:, the test fails immediately rather than every new question silently failing the format gate downstream. 77 / 77 pytest pass (was 74; added 3) ruff clean vault check --strict — 10,711 loaded, 0 invariant failures CORPUS_HARDENING_PLAN.md Phase 2.	2026-05-03 08:11:59 -04:00
Vijay Janapa Reddi	270b1a5bd2	fix(vault): drop 55 Δ=0 chains + remove Δ=0 from lenient mode Action on the strongest finding from the 2026-05-01 independent audit: 54 of 55 Δ=0 chains had no shared scenario (the "two questions sharing a scenario thread" constraint the lenient prompt was supposed to enforce). Two independent audit fields agreed (verdict=bad and shared_scenario=no), so this isn't a tuning question — the design choice was wrong. Why remove Δ=0 entirely rather than tighten the prompt: - The chain definition is "pedagogical progression through Bloom levels"; same-level edges contradict the definition. - The "shared scenario / different angle" carve-out is unenforceable by an LLM at corpus scale (audit confirmed). - Same-scenario same-level pairs are more honestly modeled as siblings of a chain anchor, not as chain members. Changes: - chains.json: 879 → 824. Dropped: 55 chains (all tier=secondary, since Δ=0 was only ever produced by the lenient sweep). Per-track: edge -19, tinyml -12, mobile -10, cloud -7, global -7. - build_chains_with_gemini.py: MODE_CONFIG["lenient"]["allowed_deltas"]: {0,1,2,3} → {1,2,3} LENIENT_PROMPT_TEMPLATE: Δ=0 paragraph rewritten to explicitly REJECT same-level pairs (with rationale citing the audit). docstring + --mode help text updated. - tests/test_chain_validation.py: test_lenient_accepts_same_level_pair → test_lenient_rejects_same_level_pair header docstring updated to reflect the new rule. - vault-manifest.json: chainCount 879 → 824, releaseHash rolls to 479811040b7a… (real content delta, not a timestamp churn). Validation: - vault check --strict: 10,705 loaded, 0 failures - vault build --local-json: chainCount=824, releaseHash=479811040b… - pytest: 74/74 - playwright chain-and-vault-smoke: 19/19 (fixtures cloud-0001 + cloud-0231 are still in their chains post-drop) Audit findings #2 (gap detection ~50% noise) and #3 (4 pilot drafts disposition) remain open — see CHAIN_ROADMAP.md Progress Log.	2026-05-02 08:51:49 -04:00
Vijay Janapa Reddi	9680e8e9fd	feat(vault+staffml): Phase 2 — tier surfacing, schema → TS → UI Carries the primary/secondary chain tier (from Phase 1) through the build pipeline into the practice + explore surfaces, so primary chains are the unmarked default and secondary chains are an opt-in alternative path the user can deep-link into via ?chain=<id>. Backend (2.1): - legacy_export.py emits chain_tiers per question alongside chain_ids and chain_positions; missing chain-tier defaults to "primary". - vault build re-run: 2953 chained questions, all carry chain_tiers (releaseHash unchanged — new field is additive, doesn't perturb the manifest hash inputs). - Existing legacy_export tests were stale (asserted on the v1.0 YAML chains: field path; v1.1 made chains.json the sidecar source). Rewrote them to write chains.json fixtures into tmp_path and added chain_tiers assertions, plus a focused test_chain_tiers_emitted_per_membership case. TypeScript (2.2): - Question.chain_tiers? (Record<string, "primary"|"secondary">) - ChainTier export, ChainInfo.tier required. - getChainForQuestion / getAllChainsForQuestion populate tier; getAllChains... sorts primary first. - New getPrimaryChainForQuestion(qid) helper for default surfaces. UI (2.3): - practice page reads ?chain=<id> URL param; defaults to getPrimaryChainForQuestion when unset. - ChainBadge gains an inline "alt path" pill when tier=secondary (always visible — no click needed). - ChainStrip mirrors that pill in the progress row for users who expand the strip. - Explore page prefers the first non-secondary chain when picking activeChainId for the related-questions panel. - Deferred to a follow-up commit (intentional, scoped via Progress Log): explore-page "Primary only / All" filter; daily/mock routing. Tests (2.4): - test7_tier_aware_chain_routing in chain-and-vault-smoke.mjs: secondary reachable via ?chain=, alt-path badge visible on secondary, primary regression, alt-path badge ABSENT on primary. 
- Full smoke suite: 17/17 pass (was 13/13). Validation: - vault check --strict: 10,701 loaded, 0 failures - vault build --legacy-json: 9438 published, chainCount=879 - pytest interviews/vault-cli/tests: 74/74 - npx tsc --noEmit: 0 errors - playwright chain-and-vault-smoke: 17/17 Phase 2 complete. Next: Phase 3 (gap-driven authoring; 407-gap backlog).	2026-04-30 20:22:54 -04:00
Vijay Janapa Reddi	d272d374aa	feat(chains): --mode lenient + tier field for second-pass coverage Phase 1.2 + 1.3 of CHAIN_ROADMAP.md. The two land together because the prompt template, validator Δ-rule, and tier-tagging must stay in lockstep or chains.proposed.lenient.json would mis-validate. build_chains_with_gemini.py: - new LENIENT_PROMPT_TEMPLATE alongside renamed STRICT_PROMPT_TEMPLATE; lenient template tells Gemini to accept Δ ∈ {0,1,2,3}, with Δ=0 only for shared-scenario same-level pairs and Δ=3 last-resort - MODE_CONFIG single-source-of-truth maps mode → (template, allowed Δ set) - validate_chain now takes mode= and gates on the per-mode Δ set - process_batch tags lenient-mode chains with tier="secondary" and a chain_id suffix (-secondary) so primary/secondary IDs never collide - new --mode {strict,lenient} flag (default strict — primary chains keep producing under the same rules as before) - new --buckets-from <chain-coverage.json> flag that restricts the run to the uncovered_buckets list from diagnose_chain_coverage.py (the Phase 1.4 second-pass entry point) apply_proposed_chains.py: - docstring note: tier field is intentionally not validated here (it's a UI hint, not a structural invariant) - already accepts Δ=0 chains via its non-strict monotonicity check, so no logic change needed tests/test_chain_validation.py: - 19 cases covering both modes: strict accepts +1/+2 and rejects Δ=0, Δ≥3, and backward; lenient accepts Δ=0/Δ=3 but still rejects Δ≥4 and backward; both modes reject size-out-of-range, multi-topic, and unknown qids. Loads the script via importlib (it's not part of the importable vault_cli package). Smoke check (--dry-run --buckets-from chain-coverage.json --mode lenient): 17 calls planned for the 211 uncovered buckets, well under the 200 cap.	2026-04-30 19:29:12 -04:00
Vijay Janapa Reddi	1d3c91d8e8	fix(vault-cli): ruff auto-fixes — datetime.UTC alias + drop unused import ruff catches: - UP017: datetime.timezone.utc -> datetime.UTC alias (Python 3.11+). - I001 / F401: drop unused Details import in test_models.py. Both auto-applied via `ruff check --fix`. Lint now passes; unblocks the publish-live green gate.	2026-04-26 10:11:16 -04:00
Vijay Janapa Reddi	eb71638630	feat(vault): release-grade Phase G — full audit + cleanup + 0.1.3 release Final brute-force release-readiness pass: every gate green, 0.1.3 released and verified, every observable failure mode closed at source. ═══ AUDITS (G.A–G.D) ═══ G.A — gemini-3.1-pro-preview default everywhere. Active CLI scripts already used it; bulk-patched 6 legacy scripts (`generate_batch.py`, `validate_questions.py`, `generate_gaps.py`, `run_reviews.sh`, `generate.py`, `review_math.sh`) + WORKFLOW.md off `gemini-2.5-flash` or `gemini-2.5-pro` to `gemini-3.1-pro-preview`. Only `archive/` references remain (intentionally legacy). G.B — Cloudflare workflow audit. `vault verify 0.1.1` correctly failed (YAMLs evolved since 0.1.1 cut). Confirmed `vault publish`, `vault deploy`, `vault ship`, `vault rollback`, `vault verify`, `vault snapshot`, `vault tag` all wired. Released 0.1.2 then 0.1.3 to lock final state. G.C — Visual asset integrity audit. 236/236 YAML visual references resolve, 0 orphan SVGs, 0 missing files, 0 unrendered sources. Clean. G.D — Unit tests for new validators added at `tests/test_models.py`: 15 tests covering Visual.kind enum, Visual.path regex, Visual.alt + caption min lengths + required, Question._zone_bloom_compatible (recall+remember accepted, recall+evaluate rejected, mastery+ remember rejected, evaluation+evaluate accepted, design+create accepted), Question._visual_path_resolves. 15/15 pass. ═══ CONTENT CLEANUP (G.E–G.L) ═══ G.E — Sample re-judge of 100 random cloud parallelism items via Gemini 3.1 Pro Preview (4 API calls): 53% PASS / 23% NEEDS_FIX / 24% DROP. Surfaced legacy quality drift — items generated under pre-Phase-D laxer prompts were not meeting the new strict bar (math errors with bidirectional vs unidirectional NVLink, "Based on the diagram..." references with no diagram, deprecated practices like SSP for modern LLM training, wrong-track scenarios like Cortex-M4 in cloud track). 
G.H — General-purpose cleanup agent on 47 flagged items: 31 rewritten with PARALLELISM_RULES bar applied (concrete unidirectional NVLink 450 GB/s, IB NDR 25 GB/s, RoCE v2 22 GB/s, PCIe Gen3 12 GB/s; multi-step ring AllReduce arguments with the 2(N-1)/N factor; non-obvious failure modes); 16 archived with documented `deletion_reason` (mathematically broken premises, physics errors, topic-irreconcilable, direct duplicates). G.L — Re-judge of 31 G.H rewrites: 23 PASS / 3 NEEDS_FIX / 5 DROP = 74.2% pass rate. The 8 still-failing items archived (after the cleanup pass still couldn't satisfy the strict bar). Contract: items get THREE chances — original generation, fix-agent, retry- fix — and if they still fail, archived not promoted. Honest. ═══ STUBBORN-FAIL ARCHIVES (Phase F residuals) ═══ After three independent fix-agent passes (Phase C, F.2, F.4), 4 items remained NEEDS_FIX or DROP: edge-2390, edge-2401, mobile-1948, tinyml-1681. Archived with `deletion_reason` documenting the 3-attempt failure history. The cell may be structurally awkward; preserving items for audit but removing from the bundle. ═══ ORPHAN CHAIN FIX ═══ After archives, `cloud-chain-359` had only 1 published member (`cloud-1840`); its sibling `cloud-1845` got archived. Dropped the chain ref from cloud-1840 + ran `repair_chains.py` to clean residual references in archived YAMLs. `vault check --strict` now passes 0 chain warnings. ═══ E.2 / E.3 SHIPPED EARLIER IN PRIOR COMMIT ═══ (Documented in commit `20ea20005` for completeness): - `vault build --legacy-json` auto-emits `vault-manifest.json`. - `analyze_coverage_gaps.py --include-areas <areas>` flag. ═══ 0.1.3 FINAL RELEASE ═══ `vault publish 0.1.3` snapshot at `releases/0.1.3/`. Migrations: +0 ~27 -28 (zero net new questions, 27 modified during cleanup, 28 archived/promoted). `vault verify 0.1.3` ✓ — release_hash `793c06f414f2bf8391a8a5c56ec0ff8d76bfce4ab7c64ad12ecb83f6d932280e` reconstructs from YAML. Latest symlink → 0.1.3. 
═══ FINAL ALL-9-GATES SWEEP — ALL GREEN ═══ [1] vault check --strict ✓ 10,701 / 0 errors / 0 invariants [2] vault lint ✓ 0 errors / 0 warnings / 9,757 info [3] vault doctor ✓ 0 fails (registry-history info OK) [4] vault codegen --check ✓ artifacts in sync [5] vault verify 0.1.3 ✓ hash reconstructs from YAML [6] staffml validate-vault ✓ 0 errors / 0 warnings, deployment-ready [7] render_visuals ✓ 236 visuals, 0 errors [8] tsc ✓ TypeScript clean [9] Playwright ✓ 9/9 pass ═══ FINAL CORPUS STATE ═══ Bundle: 9,757 published (was 9,224 at branch cut, +533 net across the full multi-session push, after all archives). Total commits on branch since cut: 10. Release tag latest: 0.1.3 (verified-clean). Status: StaffML-day-ready. Ship it.	2026-04-25 19:45:32 -04:00
Vijay Janapa Reddi	37414fed9e	chore(vault): regen staffml corpus + wire drift checks into CI - Regenerate interviews/staffml/src/data/corpus.json from the v1.0 YAMLs. 9,199 published questions (up from 9,113 — recovered 86). Every record carries validated + math_verified booleans; human_reviewed surfaces when populated. Dead 'scope' field dropped. - Regenerate interviews/staffml/src/data/vault-manifest.json to match: questionCount 9199, chainCount 970, levelDistribution now shows L6+ as 946 (up from 0) and L1 corrected to 462 (down from 1,387 inflated). - Wire check_schema_sync.py into the pre-commit config under the StaffML section. - Wire check_schema_sync.py into the 'Vault + Corpus Smoke Tests' CI job so PR builds fail on enums.py ↔ LinkML drift. - Update test_legacy_export.py for v1.0: plural chains, classification on Question body, competency_area passed through instead of resolved.	2026-04-21 18:27:42 -04:00
Vijay Janapa Reddi	c3b8411230	fix(vault): resolve competency_area from topics.json instead of aliasing topic The legacy exporter was setting competency_area = topic, collapsing the 13 canonical areas into 87 single-topic areas. The Vault UI showed "1 topics" per area instead of meaningful sub-groupings. Now resolves each topic's area via topics.json (87 topics -> 13 areas) with a graceful fallback to the topic slug if topics.json is absent.	2026-04-17 18:47:02 -04:00
Vijay Janapa Reddi	cbdb566381	feat(vault): Phase-1 migration contract fully closed in-repo v2.3 → v2.4. ARCHITECTURE.md header + Appendix reflect the completed migration. WHAT CLOSED (§11.1 contract): 1. `vault build --legacy-json` regenerates the site's interviews/staffml/src/data/corpus.json from YAML. 9,199 published questions, site-compatible shape (chain_positions back to 0-indexed dict form, bloom_level derived from zone, competency_area aliased from topic, scope aliased from track). Deterministic via sort_keys + id-sort. 2. Pre-commit hook INSTALLED via worktree-aware Makefile target (`make -C interviews/vault-cli hooks`). Symlink points at pre_commit_corpus_guard.py. Tested end-to-end: direct edit to vault/corpus.json triggers exit-1 with §11.1 reference. 3. CI equivalence check added to .github/workflows/vault-ci.yml: regenerates corpus.json from YAML, diffs against committed. Fails PR on drift with actionable error message. 4. Legacy generators demoted with DEPRECATED headers: - interviews/paper/scripts/analyze_corpus.py → vault export-paper - interviews/staffml/scripts/sync-vault.py → vault build --legacy-json - interviews/staffml/scripts/generate-manifest.py → vault publish - interviews/vault/scripts/export_to_staffml.py → vault build --legacy-json 5. New DEPRECATED.md files at interviews/vault/scripts/ and interviews/staffml/scripts/ map every legacy script to its replacement. Both directories keep the old scripts for git-history legibility and archaeology; new contributors see the vault CLI first. 6. ARCHITECTURE.md §Appendix rewritten as current-state table instead of aspirational "gone. replaced by..." entries. NEW TESTS (interviews/vault-cli/tests/test_legacy_export.py — +4): - test_legacy_shape_matches_site_interface: every field corpus.ts declares is present in regenerated JSON. - test_chain_positions_legacy_shape: 1-indexed new schema → 0-indexed legacy dict form. 
- test_emitter_deterministic: byte-stable across reversed input order (required for CI diff-check). - test_competency_area_aliases_topic: legacy alias fields populated correctly. FULL MATRIX GREEN: pytest: 38/38 passed in 0.19s (34 + 4 legacy-export) ruff: All checks passed hook: exit 0 on clean diff / exit 1 on corpus.json direct edit e2e: vault build --legacy-json regenerates a bit-identical corpus.json vs the committed one; CI check wired to catch drift WHAT'S LEFT (deploy-gated, §20.5 #1, #5, #6 partial, #8, #9): - Production serves from D1: requires Phase-3 wrangler d1 create + deploy - Manual QA per CUTOVER_QA.md: requires live staging - Zero data loss D1-side verification: requires live D1 - 48h monitoring: requires production traffic These are intrinsically user-action; the YAML-side migration is done.	2026-04-16 14:57:24 -04:00
Vijay Janapa Reddi	4aae33c036	test+ci: green test matrix + lint-clean + real vitest + committed lockfile LOCAL TEST RESULTS (all green): pytest: 34 passed in 0.19s (28 existing + 6 new command tests) ruff: All checks passed (0 errors) vitest: 7 passed in 127ms (worker contract tests) CLI e2e: vault --version / build / verify / stats / doctor / diff / export-paper / ship --dry-run / publish + verify rc1 / api shim via curl against 9199-question corpus — all green Python-side fixes: - interviews/vault-cli/pyproject.toml: ruff config now has principled per-file-ignores for B008 (Typer pattern), N806 (DAG cycle colors), E402 (scripts), SIM118 (sqlite3.Row iterator). Keeps signal tight. - 13 real ruff violations fixed across authoring.py (contextlib.suppress), diff_cmd.py + serve_api.py (dict(sqlite3.Row) instead of broken .keys() iteration), policy.py (direct return), release.py (zip strict=True, update_latest_symlink now validates target exists; previous 'target' variable was unused), commands/release.py (import order reshuffled, ambiguous 'l' renamed). - commands/release.py ship_cmd leg-skip uses 'leg' not 'l'. New pytest file: interviews/vault-cli/tests/test_commands.py (+6 tests) - stats: JSON shape + Prometheus format. - diff: add/remove/modify detection + classification. - doctor: graceful skip on missing vault; unknown --check returns USAGE_ERROR. - codegen: --check passes against baseline. Worker-side fixes: - src/index.ts cachedOrCompute graceful-degrades when caches global isn't available (Node test env, future-proofing against runtime regressions). - src/index.ts handleSearch: 'query: q' → 'query: qRaw' (q was renamed earlier). - src/rate_limit.ts: removed unused WINDOW_MS const. - tests/worker.test.ts: vi.resetModules() between tests so module-level schemaOk/lastSeenRelease state doesn't leak across test cases (fingerprint memoization was sticky). - package.json: added test:watch + lint aliases. - .gitignore: node_modules, .wrangler, dist, .dev.vars. 
- package-lock.json committed (npm — pnpm not on the machine; CI updated to use npm ci). CI (.github/workflows/vault-ci.yml): - Split into python + worker jobs. - Python job: ruff + mypy (non-blocking) + pytest + vault check --strict + vault build release_hash regression + vault codegen --check + registry append-only + exemplar audit staleness. - Worker job: node 20 + npm ci + tsc typecheck + vitest run. - Triggers now include staffml-vault-types path (keeps CI honest when shared-types drift). What runs vs what's gated on user: RAN LOCALLY: pytest, ruff, vitest, tsc, CLI end-to-end smoke (build→verify→export→stats→doctor→diff→publish rc→api-shim→ship --dry-run), full corpus invariants. GATED ON USER (requires Cloudflare credentials): - wrangler login + wrangler d1 create - wrangler d1 execute (schema + seed) - pnpm/npm deploy:staging - FTS5 production load-test - vault ship --env production (live D1 + Next.js + tag push) Everything that CAN be verified without credentials HAS been.	2026-04-16 14:30:20 -04:00
Vijay Janapa Reddi	42f4d1ca8b	fix(vault): Round-3 correctness + `vault ship` + authoring contract Round-3 review (4 reviewers on v2.1) surfaced two code-correctness Criticals that this commit fixes, plus the contracted-but-missing `vault ship` coordinator and David's authoring-UX gaps. Critical fixes (real bugs in landed code): worker/src/index.ts - SCHEMA_FINGERPRINT placeholder fail-closed (Chip R3-C1 / Dean R3-NH-3). Was: placeholder auto-passed and silently disabled the fingerprint check. Now: placeholder forces degraded mode until operator sets real fingerprint. - DDL hash now includes triggers (FTS5-aware). - release_id change invalidates schema-fingerprint memoization (Dean R3-NH-4). - wrangler.toml now pins the real fingerprint. staffml/public/sw.js - /manifest polling TTL-throttled to 5min (Chip R3-C2). Was: per-request fetch nullified the §10.4 cost model. - API origin persisted to IndexedDB; rehydrated on activate so cold offline wake-ups serve cached content (Chip R3-H3). vault-cli/src/vault_cli/release.py - emit_migrations diffs all 4 tables via PRAGMA-driven column introspection (Dean R3-NC-1 + R3-NH-2). Was: only questions table, silently missing chains/chain_questions/tags. Rollback-symmetry test extended to populate + verify all tables. vault-cli/src/vault_cli/commands/release.py - vault verify --git-ref reconstructs release from 'git archive <ref>' into a tempdir (Dean R3-NC-2). Was: always rebuilt from HEAD, so verifying a historical release always failed post-authoring. Academic-citability contract (C-3) now actually holds. vault-cli/src/vault_cli/ship.py (NEW) - vault ship composed verb with journaling (Dean R3-NH-1): * Legs run D1 → Next.js → paper-tag-last (§6.1.1 ordering). * Journal at releases/<v>/.ship-journal.json records per-leg state; --resume continues interrupted ships idempotently. * Pre-paper failure auto-rolls back in reverse order. 
* Paper-leg failure pages operator; does NOT auto-rollback earlier legs (git tag is remote-durable per §6.1.1). - 4 unit tests cover happy path, pre-paper failure auto-rollback, paper-leg needs-manual, --resume across interruptions. vault-cli/src/vault_cli/commands/authoring.py - vault new appends to id-registry.yaml (David R3-H3 + C-5 enforcement); `git pull --rebase` before allocation. - authors: auto-populated from git config user.email (David R3-H4 / M-15). Was: field never set. - vault edit injects validation-error comment block at top of YAML and re-opens up to --retries=3 times (David R3-H1). Was: terminal traceback mid-authoring session. - vault move refuses dirty tree, chained question, excluded-cell per applicability matrix (David R3-H2). Was: unchecked git mv. - vault renumber command (NEW): post-rebase seq-collision recovery. Bumps seq, renames file, updates id field, appends registry (David R3-N-2, was spec-only). - vault mark-exemplar command (NEW): promotes to vault/exemplars/ with provenance + human_reviewed_at gate (David R3-N-9). vault-cli/src/vault_cli/compiler.py - FTS5 virtual table + sync triggers added to DDL (B.5). Triggers keep questions_fts in sync via AFTER INSERT/UPDATE/DELETE. schema_fingerprint accounts for triggers now. tests/test_hashing.py - Nested-dict hash-stability fixture (Soumith R3-F-4). Was: test only reordered top-level keys + collapsed details to one key. All 28 tests pass (22 → 28: +4 ship journaling, +1 multi-table migration symmetry, +1 nested-dict hash stability). release_hash unchanged at 1b304282... — FTS5 addition doesn't affect content Merkle per §3.5 input-only design.	2026-04-16 13:10:16 -04:00
Vijay Janapa Reddi	8205d8a5f9	feat(vault): Phase 2 release pipeline — snapshot, migrations, export-paper, publish, verify Primitives (§4.2): vault snapshot <v> — stage to releases/.pending-<v>/. vault migrations-emit A B — forward + inverse SQL; rollback embeds full prior-row bodies for UPDATE/DELETE so rollback works without mechanical inversion (C-1). vault export-paper <v> — emit macros.tex + corpus_stats.json via SQL over releases/<v>/vault.db. Replaces paper/scripts/generate_macros.py; paper + site agree by construction (H-21 runtime closure). vault tag <v> — git-commit + git-tag v<version>. vault verify <v> — reconstruct release_hash from YAML source and assert equality with release.json (C-3 academic-citability property). Composed product (§4.3): vault publish <v> — check --strict + build + snapshot + migrations-emit. Stages to .pending-<v>/ and swaps to final via POSIX rename(2) as the last step (C-7 non-atomic fix). --resume detects orphaned pending dirs. latest symlink swap via rename-over-tmp. First citable release artifact committed: releases/0.9.0/ with vault.db (25 MB), release.json (release_hash, policy_version, git_sha, timestamp), d1-migration.sql + d1-rollback.sql stubs. Gitignore updated: /vault.db at vault/ root excludes the transient build artifact, but vault.db inside releases/<v>/ IS committed per §13 (academic integrity — permanently citable). Tests (22 pass, +3): - test_snapshot_copies_db_and_writes_release_json - test_migrations_emit_added_modified_removed: verifies rollback embeds prior-row body (C-1 correctness) - test_rollback_symmetry_property: apply-forward-then-rollback returns dump identical to pre-migration state (closes M-1 property test). Phase 2 milestone: PASSED. End-to-end: vault publish 0.9.0 # composed product vault verify 0.9.0 # from-source round-trip vault export-paper 0.9.0 # emits macros.tex + stats → release_hash 1b304282... reproducible from YAML.	2026-04-16 12:40:04 -04:00
Vijay Janapa Reddi	812ba408d0	feat(vault): Phase 1 core — schema, hashing, policy, loader, validator LinkML schema at vault/schema/question_schema.yaml is the sole schema source of truth. Pydantic models in vault_cli.models are currently hand-authored to match; full LinkML codegen wires in Phase 2 with the drift-check in CI. Core modules: vault_cli/models.py — Pydantic question model (closed enums, content- format per field, schema_version=1 gate). vault_cli/hashing.py — canonical content_hash over whitelisted fields; release_hash Merkle with __policy__ and __canon_version__ leaves (Chip N-H5). vault_cli/yaml_io.py — hardened SafeLoader: 256KB cap, depth 10 cap, aliases rejected, timeout (H-7). vault_cli/paths.py — path-as-classification parser with lowercase + enum enforcement (H-9). vault_cli/loader.py — walks vault/questions/, returns loaded + errors (never raises — aggregate reporting). vault_cli/validator.py — tiered invariant engine; fast + structural tiers implemented per ARCHITECTURE.md §5. vault_cli/compiler.py — YAML → SQLite with release_metadata rows (release_id, release_hash, policy_version, schema_version, published_count). vault_cli/policy.py — single filter predicate. No consumer re-implements (H-21). release-policy.yaml v1: status=published. Dropped require_validated in the wake of 9199/8053 resolution — validation is implicit in the maintainer-approval → status=published transition, not a separate flag. Tests (19 pass): key-order hash invariance (Soumith M-NEW-4), policy filter correctness (H-21 runtime check), YAML hardening (H-7).	2026-04-16 12:37:06 -04:00
Vijay Janapa Reddi	5ee46fc2a5	feat(vault-cli): Phase 0 package scaffold Pyproject.toml with Typer+Rich+Pydantic+PyYAML deps. Console entry point 'vault' → vault_cli.main:app. Smoke tests cover --version, --help, and the exit-code taxonomy regression guard. Module layout: src/vault_cli/__init__.py src/vault_cli/_version.py — single source for __version__ src/vault_cli/exit_codes.py — stable IntEnum taxonomy (§4.6) src/vault_cli/main.py — Typer app, --version flag tests/test_smoke.py — 4 tests, all green in 0.04s Subcommands land incrementally from Phase 1 per ARCHITECTURE.md §14. Python ≥3.12 required; CI pins 3.12 for hash stability. Milestone gate: pip install -e interviews/vault-cli/[dev] && vault --version passes. Tests green. Ready for Phase 1.	2026-04-15 21:25:52 -04:00
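
The batching behavior listed in the 03031dc38e commit message (empty input gives no batches, an item cap per batch, a character budget that flushes before overflow, order preserved, an oversized single item still batched) can be sketched roughly as below. This is an illustrative reconstruction only: the real `pack_batches` signature, its defaults, and how it measures item size are not shown in the log, so the parameters here (a list of strings, `len()` as the size measure, the default limits) are assumptions.

```python
from typing import List


def pack_batches(
    items: List[str],
    max_items_per_batch: int = 10,   # hypothetical default
    max_chars: int = 1000,           # hypothetical default
) -> List[List[str]]:
    """Greedily pack items into ordered batches (assumed behavior, not the repo's code)."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_chars = 0

    for item in items:
        size = len(item)  # assumption: budget is measured in characters of the item
        batch_full = len(current) >= max_items_per_batch
        # Only flush on the char budget when the batch is non-empty, so an
        # oversized single item still lands in a batch instead of being dropped.
        would_overflow = bool(current) and current_chars + size > max_chars
        if batch_full or would_overflow:
            batches.append(current)
            current, current_chars = [], 0
        current.append(item)
        current_chars += size

    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    sizes = [len(b) for b in pack_batches([f"q{i}" for i in range(33)])]
    print(sizes)
```

Under these assumptions, 33 short items with a cap of 10 per batch split as 10/10/10/3, matching the case the commit message describes. Detecting an oversized batch is left to the caller, as the message notes.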

16 Commits
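
The per-mode Δ rule described in commits d272d374aa and 270b1a5bd2 (strict accepts +1/+2; lenient accepts Δ ∈ {1,2,3} once Δ=0 was removed; both reject backward steps) might look like the following minimal sketch. Only the `MODE_CONFIG[...]["allowed_deltas"]` shape comes from the commit message. The helper name `chain_deltas_ok` and the use of plain integers for levels are hypothetical, and the real `validate_chain` also checks chain size, topic, and question IDs, which this sketch omits.

```python
from typing import Dict, List, Set

# Allowed level deltas per mode, as stated in the commit messages
# (lenient is shown after the Δ=0 removal in 270b1a5bd2).
MODE_CONFIG: Dict[str, Dict[str, Set[int]]] = {
    "strict": {"allowed_deltas": {1, 2}},
    "lenient": {"allowed_deltas": {1, 2, 3}},
}


def chain_deltas_ok(levels: List[int], mode: str = "strict") -> bool:
    """Return True if every consecutive level step is in the mode's allowed set.

    `levels` is a hypothetical list of numeric levels, one per chain member,
    in chain order. Same-level (0) and backward (negative) steps fail in
    both modes because neither set contains them.
    """
    allowed = MODE_CONFIG[mode]["allowed_deltas"]
    return all((b - a) in allowed for a, b in zip(levels, levels[1:]))


if __name__ == "__main__":
    print(chain_deltas_ok([1, 2, 4], mode="strict"))
    print(chain_deltas_ok([1, 4], mode="lenient"))
```

Keeping the allowed set in one table means the prompt template, the validator, and the tests can all read the same per-mode rule, which is the single-source-of-truth intent the commit message gives for `MODE_CONFIG`.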