mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-07 18:18:42 -05:00
dev
16 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
bc26a0bf37 |
feat(vault): Phase 6 schema tightening — markers + Details forbid + invariant
Three coordinated edits to lift the marker convention from a soft draft-validation gate to a published-corpus invariant:
1. interviews/vault/schema/question_schema.yaml (LinkML, source of truth): common_mistake and napkin_math gain regex patterns matching the AUTHORING.md Pitfall/Rationale/Consequence and Assumptions/Calculations/Conclusion conventions. Documents the spec; enforced in the validator below.
2. interviews/vault-cli/src/vault_cli/models.py (Pydantic, derived): Details flips from extra='allow' to extra='forbid'. A pre-flight survey on 2026-05-04 across all 10,711 YAMLs found 0 unknown keys on Details, so the historical 'imported legacy fields' risk no longer applies.
3. interviews/vault-cli/src/vault_cli/validator.py: structural_tier gains _check_format_markers (invariant #19), which flags published YAMLs whose non-empty cm/nm doesn't match the AUTHORING.md markers. Drafts are exempt — author-in-progress drafts may still have malformed markers. Lifts gate_format from validate_drafts.py / _judges.py from a CI-time gate to a vault-check-strict invariant.
Tests: 4 new cases in test_models covering Details forbid, marker-compliant pass, malformed cm fail, and draft-exempt skip. Total 88 passing (was 84).
codegen-hashes.txt updated for the models.py edit; vault codegen --check passes.
The on-disk corpus is fully clean post-Phase-5+drain: vault check --strict reports 10,711 loaded, 0 invariant failures, 0 format-marker violations on published YAMLs.
|
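A minimal sketch of what a format-marker invariant with a draft exemption could look like. This is not the code in validator.py: the marker spellings other than `**The Rationale:**`, the dict shape of a loaded question, and the function name `check_format_markers` are assumptions made for illustration.

```python
import re

# Assumed marker spellings, modelled on the **The Rationale:** marker quoted in this log.
CM_PATTERN = re.compile(
    r"\*\*The Pitfall:\*\*.+\*\*The Rationale:\*\*.+\*\*The Consequence:\*\*.+",
    re.DOTALL,
)
NM_PATTERN = re.compile(
    r"\*\*Assumptions:\*\*.+\*\*Calculations:\*\*.+\*\*Conclusion:\*\*.+",
    re.DOTALL,
)


def check_format_markers(question: dict) -> list[str]:
    """Return violation messages for one question (hypothetical dict shape)."""
    if question.get("status") != "published":
        return []  # drafts are exempt from the marker invariant
    details = question.get("details", {})
    violations = []
    for field, pattern in (("common_mistake", CM_PATTERN), ("napkin_math", NM_PATTERN)):
        text = details.get(field) or ""
        # Empty fields pass; only non-empty text must carry all three markers in order.
        if text.strip() and not pattern.search(text):
            violations.append(f"{question.get('id', '?')}: {field} lacks required markers")
    return violations
```

Only non-empty fields on published questions are checked, which mirrors the "non-empty cm/nm" and "drafts are exempt" wording above.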
||
|
|
03031dc38e |
test(vault-cli): smoke tests for audit_corpus_batched batching
7 tests covering pack_batches:
- empty input → no batches
- single small item → one batch
- no items lost across batches (50 items, 10/batch → all 50 round-trip)
- max_items_per_batch caps batch size (33 items, 10/batch → 10/10/10/3)
- max_chars triggers a flush before items overflow the budget
- input order preserved within and across batches
- oversized single item still lands in a batch (we don't drop, the
caller is expected to detect overflow downstream)
The audit script itself can't easily be unit-tested in CI (it
subprocess-shells the gemini CLI); the batching helper is the main
piece of pure logic, so this is where the value is.
84 / 84 pytest pass (was 77; added 7)
CORPUS_HARDENING_PLAN.md Phase 3.
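The behaviours listed above can be reproduced with a small greedy packer. The following is a sketch under assumptions, not the actual helper in audit_corpus_batched: the signature, the default limits, and the use of plain strings as items are hypothetical.

```python
def pack_batches(items, max_items_per_batch=10, max_chars=20_000):
    """Greedily pack strings into ordered batches under item and character caps."""
    batches, current, current_chars = [], [], 0
    for item in items:
        over_items = len(current) >= max_items_per_batch
        # Flush before the item would overflow the budget, but never flush an
        # empty batch: an oversized single item still lands in a batch of its own.
        over_chars = bool(current) and current_chars + len(item) > max_chars
        if over_items or over_chars:
            batches.append(current)
            current, current_chars = [], 0
        current.append(item)
        current_chars += len(item)
    if current:
        batches.append(current)
    return batches
```

Because items are appended in input order and batches are flushed in sequence, order is preserved within and across batches, and no item is dropped.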
|
||
|
|
f691d6c14a |
feat(vault-cli): vault new scaffolds full Pitfall/Rationale/Consequence + Assumptions/Calculations/Conclusion stubs
The previous scaffold only stubbed scenario and realistic_solution with <TODO> placeholders. That meant authors had to know about the markup conventions from somewhere else (the regex in validate_drafts.py, the SCHEMA_SUMMARY in generate_question_for_gap.py, or the paragraph in ARCHITECTURE.md §3.6.1) — none of which a new contributor would find.
Now `vault new` produces a YAML with the canonical bold markers pre-written. Authors fill in the content between markers; they can't forget to use them.
Templates extracted as module-level constants (COMMON_MISTAKE_TEMPLATE and NAPKIN_MATH_TEMPLATE in commands/authoring.py) so they're testable in isolation. New tests in test_authoring_scaffold.py guard against accidental marker removal — if a contributor edits the scaffold and drops, say, **The Rationale:**, the test fails immediately rather than every new question silently failing the format gate downstream.
77 / 77 pytest pass (was 74; added 3)
ruff clean
vault check --strict — 10,711 loaded, 0 invariant failures
CORPUS_HARDENING_PLAN.md Phase 2.
|
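To illustrate the guard described above, here is a sketch of module-level templates plus a marker check. The constant names come from the commit message; the template bodies, the marker spellings other than `**The Rationale:**`, and the helper `missing_markers` are assumptions.

```python
# Template contents are illustrative; only the constant names appear in the commit.
COMMON_MISTAKE_TEMPLATE = (
    "**The Pitfall:** <TODO>\n"
    "**The Rationale:** <TODO>\n"
    "**The Consequence:** <TODO>\n"
)
NAPKIN_MATH_TEMPLATE = (
    "**Assumptions:** <TODO>\n"
    "**Calculations:** <TODO>\n"
    "**Conclusion:** <TODO>\n"
)

CM_MARKERS = ("**The Pitfall:**", "**The Rationale:**", "**The Consequence:**")
NM_MARKERS = ("**Assumptions:**", "**Calculations:**", "**Conclusion:**")


def missing_markers(template: str, markers: tuple[str, ...]) -> list[str]:
    """List the required markers that a scaffold template no longer contains."""
    return [marker for marker in markers if marker not in template]
```

A scaffold test would then assert that `missing_markers` returns an empty list for both templates, so dropping a marker fails at test time instead of at the downstream format gate.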
||
|
|
270b1a5bd2 |
fix(vault): drop 55 Δ=0 chains + remove Δ=0 from lenient mode
Action on the strongest finding from the 2026-05-01 independent audit:
54 of 55 Δ=0 chains had no shared scenario (the "two questions
sharing a scenario thread" constraint the lenient prompt was supposed
to enforce). Two independent audit fields agreed (verdict=bad and
shared_scenario=no), so this isn't a tuning question — the design
choice was wrong.
Why remove Δ=0 entirely rather than tighten the prompt:
- The chain definition is "pedagogical progression through Bloom
levels"; same-level edges contradict the definition.
- The "shared scenario / different angle" carve-out is unenforceable
by an LLM at corpus scale (audit confirmed).
- Same-scenario same-level pairs are more honestly modeled as
siblings of a chain anchor, not as chain members.
Changes:
- chains.json: 879 → 824. Dropped: 55 chains (all tier=secondary,
since Δ=0 was only ever produced by the lenient sweep).
Per-track: edge -19, tinyml -12, mobile -10, cloud -7, global -7.
- build_chains_with_gemini.py:
MODE_CONFIG["lenient"]["allowed_deltas"]: {0,1,2,3} → {1,2,3}
LENIENT_PROMPT_TEMPLATE: Δ=0 paragraph rewritten to explicitly
REJECT same-level pairs (with rationale citing the audit).
docstring + --mode help text updated.
- tests/test_chain_validation.py:
test_lenient_accepts_same_level_pair → test_lenient_rejects_same_level_pair
header docstring updated to reflect the new rule.
- vault-manifest.json: chainCount 879 → 824, releaseHash rolls to
479811040b7a… (real content delta, not a timestamp churn).
Validation:
- vault check --strict: 10,705 loaded, 0 failures
- vault build --local-json: chainCount=824, releaseHash=479811040b…
- pytest: 74/74
- playwright chain-and-vault-smoke: 19/19 (fixtures cloud-0001 +
cloud-0231 are still in their chains post-drop)
Audit findings #2 (gap detection ~50% noise) and #3 (4 pilot drafts
disposition) remain open — see CHAIN_ROADMAP.md Progress Log.
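The per-mode Δ rule after this change can be sketched as follows. The `MODE_CONFIG` key and the allowed sets follow the commit messages in this log (strict accepts +1/+2; lenient now accepts {1,2,3}); representing Bloom levels as integers and the helper name `deltas_ok` are assumptions.

```python
MODE_CONFIG = {
    "strict": {"allowed_deltas": {1, 2}},
    "lenient": {"allowed_deltas": {1, 2, 3}},  # 0 removed by this commit
}


def deltas_ok(levels: list[int], mode: str = "strict") -> bool:
    """Check that every consecutive Bloom-level step is in the mode's allowed set."""
    allowed = MODE_CONFIG[mode]["allowed_deltas"]
    # Same-level (0) and backward (negative) steps are never in either set.
    return all((b - a) in allowed for a, b in zip(levels, levels[1:]))
```

With Δ=0 out of both sets, a same-level pair is rejected in either mode, which is what the renamed test asserts.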
|
||
|
|
9680e8e9fd |
feat(vault+staffml): Phase 2 — tier surfacing, schema → TS → UI
Carries the primary/secondary chain tier (from Phase 1) through the
build pipeline into the practice + explore surfaces, so primary chains
are the unmarked default and secondary chains are an opt-in alternative
path the user can deep-link into via ?chain=<id>.
Backend (2.1):
- legacy_export.py emits chain_tiers per question alongside chain_ids
and chain_positions; missing chain-tier defaults to "primary".
- vault build re-run: 2953 chained questions, all carry chain_tiers
(releaseHash unchanged — new field is additive, doesn't perturb the
manifest hash inputs).
- Existing legacy_export tests were stale (asserted on the v1.0 YAML
chains: field path; v1.1 made chains.json the sidecar source).
Rewrote them to write chains.json fixtures into tmp_path and added
chain_tiers assertions, plus a focused
test_chain_tiers_emitted_per_membership case.
TypeScript (2.2):
- Question.chain_tiers? (Record<string, "primary"|"secondary">)
- ChainTier export, ChainInfo.tier required.
- getChainForQuestion / getAllChainsForQuestion populate tier;
getAllChains... sorts primary first.
- New getPrimaryChainForQuestion(qid) helper for default surfaces.
UI (2.3):
- practice page reads ?chain=<id> URL param; defaults to
getPrimaryChainForQuestion when unset.
- ChainBadge gains an inline "alt path" pill when tier=secondary
(always visible — no click needed).
- ChainStrip mirrors that pill in the progress row for users who
expand the strip.
- Explore page prefers the first non-secondary chain when picking
activeChainId for the related-questions panel.
- Deferred to a follow-up commit (intentional, scoped via Progress Log):
explore-page "Primary only / All" filter; daily/mock routing.
Tests (2.4):
- test7_tier_aware_chain_routing in chain-and-vault-smoke.mjs:
secondary reachable via ?chain=, alt-path badge visible on
secondary, primary regression, alt-path badge ABSENT on primary.
- Full smoke suite: 17/17 pass (was 13/13).
Validation:
- vault check --strict: 10,701 loaded, 0 failures
- vault build --legacy-json: 9438 published, chainCount=879
- pytest interviews/vault-cli/tests: 74/74
- npx tsc --noEmit: 0 errors
- playwright chain-and-vault-smoke: 17/17
Phase 2 complete. Next: Phase 3 (gap-driven authoring; 407-gap backlog).
|
||
|
|
d272d374aa |
feat(chains): --mode lenient + tier field for second-pass coverage
Phase 1.2 + 1.3 of CHAIN_ROADMAP.md. The two land together because the
prompt template, validator Δ-rule, and tier-tagging must stay in lockstep
or chains.proposed.lenient.json would mis-validate.
build_chains_with_gemini.py:
- new LENIENT_PROMPT_TEMPLATE alongside renamed STRICT_PROMPT_TEMPLATE;
lenient template tells Gemini to accept Δ ∈ {0,1,2,3}, with Δ=0 only
for shared-scenario same-level pairs and Δ=3 last-resort
- MODE_CONFIG single-source-of-truth maps mode → (template, allowed Δ set)
- validate_chain now takes mode= and gates on the per-mode Δ set
- process_batch tags lenient-mode chains with tier="secondary" and
a chain_id suffix (-secondary) so primary/secondary IDs never collide
- new --mode {strict,lenient} flag (default strict — primary chains
keep producing under the same rules as before)
- new --buckets-from <chain-coverage.json> flag that restricts the run
to the uncovered_buckets list from diagnose_chain_coverage.py
(the Phase 1.4 second-pass entry point)
apply_proposed_chains.py:
- docstring note: tier field is intentionally not validated here
(it's a UI hint, not a structural invariant)
- already accepts Δ=0 chains via its non-strict monotonicity check, so
no logic change needed
tests/test_chain_validation.py:
- 19 cases covering both modes: strict accepts +1/+2 and rejects Δ=0,
Δ≥3, and backward; lenient accepts Δ=0/Δ=3 but still rejects Δ≥4 and
backward; both modes reject size-out-of-range, multi-topic, and
unknown qids. Loads the script via importlib (it's not part of the
importable vault_cli package).
Smoke check (--dry-run --buckets-from chain-coverage.json --mode lenient):
17 calls planned for the 211 uncovered buckets, well under the 200 cap.
|
||
|
|
1d3c91d8e8 |
fix(vault-cli): ruff auto-fixes — datetime.UTC alias + drop unused import
ruff catches:
- UP017: datetime.timezone.utc -> datetime.UTC alias (Python 3.11+).
- I001 / F401: drop unused Details import in test_models.py.
Both auto-applied via `ruff check --fix`. Lint now passes; unblocks the publish-live green gate.
|
||
|
|
eb71638630 |
feat(vault): release-grade Phase G — full audit + cleanup + 0.1.3 release
Final brute-force release-readiness pass: every gate green, 0.1.3
released and verified, every observable failure mode closed at source.
═══ AUDITS (G.A–G.D) ═══
G.A — gemini-3.1-pro-preview default everywhere. Active CLI scripts
already used it; bulk-patched 6 legacy scripts (`generate_batch.py`,
`validate_questions.py`, `generate_gaps.py`, `run_reviews.sh`,
`generate.py`, `review_math.sh`) + WORKFLOW.md off `gemini-2.5-flash`
or `gemini-2.5-pro` to `gemini-3.1-pro-preview`. Only `archive/`
references remain (intentionally legacy).
G.B — Cloudflare workflow audit. `vault verify 0.1.1` correctly
failed (YAMLs evolved since 0.1.1 cut). Confirmed `vault publish`,
`vault deploy`, `vault ship`, `vault rollback`, `vault verify`,
`vault snapshot`, `vault tag` all wired. Released 0.1.2 then 0.1.3
to lock final state.
G.C — Visual asset integrity audit. 236/236 YAML visual references
resolve, 0 orphan SVGs, 0 missing files, 0 unrendered sources.
Clean.
G.D — Unit tests for new validators added at `tests/test_models.py`:
15 tests covering Visual.kind enum, Visual.path regex, Visual.alt
+ caption min lengths + required, Question._zone_bloom_compatible
(recall+remember accepted, recall+evaluate rejected, mastery+
remember rejected, evaluation+evaluate accepted, design+create
accepted), Question._visual_path_resolves. **15/15 pass.**
═══ CONTENT CLEANUP (G.E–G.L) ═══
G.E — Sample re-judge of 100 random cloud parallelism items via
Gemini 3.1 Pro Preview (4 API calls): 53% PASS / 23% NEEDS_FIX /
24% DROP. Surfaced legacy quality drift — items generated under
pre-Phase-D laxer prompts were not meeting the new strict bar
(math errors with bidirectional vs unidirectional NVLink,
"Based on the diagram..." references with no diagram, deprecated
practices like SSP for modern LLM training, wrong-track scenarios
like Cortex-M4 in cloud track).
G.H — General-purpose cleanup agent on 47 flagged items:
**31 rewritten** with PARALLELISM_RULES bar applied (concrete
unidirectional NVLink 450 GB/s, IB NDR 25 GB/s, RoCE v2 22 GB/s,
PCIe Gen3 12 GB/s; multi-step ring AllReduce arguments with the
2(N-1)/N factor; non-obvious failure modes); **16 archived** with
documented `deletion_reason` (mathematically broken premises,
physics errors, topic-irreconcilable, direct duplicates).
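As a worked instance of the 2(N-1)/N factor mentioned above: in a ring AllReduce each device sends roughly 2(N-1)/N times the message size, so a bandwidth-only estimate divides that volume by the per-link rate. The sketch below ignores latency and overlap, and the function name and example sizes are assumptions; the 450 GB/s figure is the unidirectional NVLink number quoted in the message.

```python
def ring_allreduce_seconds(message_bytes: float, n_gpus: int, link_bytes_per_s: float) -> float:
    """Bandwidth-only time estimate for a ring AllReduce (no latency term)."""
    if n_gpus < 2:
        return 0.0  # nothing to reduce across a single device
    factor = 2 * (n_gpus - 1) / n_gpus  # reduce-scatter plus all-gather phases
    return factor * message_bytes / link_bytes_per_s


# Example: 1 GB of gradients across 8 GPUs at 450 GB/s unidirectional NVLink.
estimate = ring_allreduce_seconds(1e9, 8, 450e9)
print(f"{estimate * 1e3:.2f} ms")
```

For N = 8 the factor is 1.75, so the estimate is 1.75 GB / 450 GB/s, a little under 4 ms.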
G.L — Re-judge of 31 G.H rewrites: **23 PASS / 3 NEEDS_FIX / 5 DROP =
74.2% pass rate**. The 8 still-failing items were archived (even after
the cleanup pass they couldn't satisfy the strict bar). Contract:
items get THREE chances — original generation, fix-agent, retry-
fix — and if they still fail, archived not promoted. Honest.
═══ STUBBORN-FAIL ARCHIVES (Phase F residuals) ═══
After three independent fix-agent passes (Phase C, F.2, F.4), 4 items
remained NEEDS_FIX or DROP: edge-2390, edge-2401, mobile-1948,
tinyml-1681. Archived with `deletion_reason` documenting the 3-attempt
failure history. The cell may be structurally awkward; preserving
items for audit but removing from the bundle.
═══ ORPHAN CHAIN FIX ═══
After archives, `cloud-chain-359` had only 1 published member
(`cloud-1840`); its sibling `cloud-1845` got archived. Dropped the
chain ref from cloud-1840 + ran `repair_chains.py` to clean residual
references in archived YAMLs. `vault check --strict` now passes 0
chain warnings.
═══ E.2 / E.3 SHIPPED EARLIER IN PRIOR COMMIT ═══
(Documented in commit `20ea20005` for completeness):
- `vault build --legacy-json` auto-emits `vault-manifest.json`.
- `analyze_coverage_gaps.py --include-areas <areas>` flag.
═══ 0.1.3 FINAL RELEASE ═══
`vault publish 0.1.3` snapshot at `releases/0.1.3/`. Migrations:
+0 ~27 -28 (zero net new questions, 27 modified during cleanup, 28
archived/promoted). `vault verify 0.1.3` ✓ — release_hash
`793c06f414f2bf8391a8a5c56ec0ff8d76bfce4ab7c64ad12ecb83f6d932280e`
reconstructs from YAML. Latest symlink → 0.1.3.
═══ FINAL ALL-9-GATES SWEEP — ALL GREEN ═══
[1] vault check --strict ✓ 10,701 / 0 errors / 0 invariants
[2] vault lint ✓ 0 errors / 0 warnings / 9,757 info
[3] vault doctor ✓ 0 fails (registry-history info OK)
[4] vault codegen --check ✓ artifacts in sync
[5] vault verify 0.1.3 ✓ hash reconstructs from YAML
[6] staffml validate-vault ✓ 0 errors / 0 warnings, deployment-ready
[7] render_visuals ✓ 236 visuals, 0 errors
[8] tsc ✓ TypeScript clean
[9] Playwright ✓ 9/9 pass
═══ FINAL CORPUS STATE ═══
Bundle: 9,757 published (was 9,224 at branch cut, **+533 net** across
the full multi-session push, after all archives).
Total commits on branch since cut: 10.
Release tag latest: 0.1.3 (verified-clean).
Status: StaffML-day-ready. Ship it.
|
||
|
|
37414fed9e |
chore(vault): regen staffml corpus + wire drift checks into CI
- Regenerate interviews/staffml/src/data/corpus.json from the v1.0 YAMLs. 9,199 published questions (up from 9,113 — recovered 86). Every record carries validated + math_verified booleans; human_reviewed surfaces when populated. Dead 'scope' field dropped.
- Regenerate interviews/staffml/src/data/vault-manifest.json to match: questionCount 9199, chainCount 970, levelDistribution now shows L6+ as 946 (up from 0) and L1 corrected to 462 (down from 1,387 inflated).
- Wire check_schema_sync.py into the pre-commit config under the StaffML section.
- Wire check_schema_sync.py into the 'Vault + Corpus Smoke Tests' CI job so PR builds fail on enums.py ↔ LinkML drift.
- Update test_legacy_export.py for v1.0: plural chains, classification on Question body, competency_area passed through instead of resolved.
|
||
|
|
c3b8411230 |
fix(vault): resolve competency_area from topics.json instead of aliasing topic
The legacy exporter was setting competency_area = topic, collapsing the 13 canonical areas into 87 single-topic areas. The Vault UI showed "1 topics" per area instead of meaningful sub-groupings. Now resolves each topic's area via topics.json (87 topics -> 13 areas) with a graceful fallback to the topic slug if topics.json is absent. |
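A sketch of the lookup-with-fallback described above. The layout of topics.json (a top-level `topics` list with `id` and `area` keys) and both function names are assumptions; only the behaviour, resolving a topic to its area and falling back to the topic slug, comes from the message.

```python
import json
from pathlib import Path


def load_topic_areas(topics_path: str) -> dict[str, str]:
    """Map topic slug -> competency area; empty mapping if the file is absent."""
    path = Path(topics_path)
    if not path.exists():
        return {}
    data = json.loads(path.read_text(encoding="utf-8"))
    # Assumed shape: {"topics": [{"id": "...", "area": "..."}, ...]}
    return {topic["id"]: topic["area"] for topic in data.get("topics", [])}


def resolve_competency_area(topic: str, topic_areas: dict[str, str]) -> str:
    """Resolve a topic to its area, falling back to the topic slug itself."""
    return topic_areas.get(topic, topic)
```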
||
|
|
cbdb566381 |
feat(vault): Phase-1 migration contract fully closed in-repo
v2.3 → v2.4. ARCHITECTURE.md header + Appendix reflect the completed
migration.
WHAT CLOSED (§11.1 contract):
1. `vault build --legacy-json` regenerates the site's
interviews/staffml/src/data/corpus.json from YAML. 9,199 published
questions, site-compatible shape (chain_positions back to 0-indexed
dict form, bloom_level derived from zone, competency_area aliased
from topic, scope aliased from track). Deterministic via sort_keys +
id-sort.
2. Pre-commit hook INSTALLED via worktree-aware Makefile target
(`make -C interviews/vault-cli hooks`). Symlink points at
pre_commit_corpus_guard.py. Tested end-to-end: direct edit to
vault/corpus.json triggers exit-1 with §11.1 reference.
3. CI equivalence check added to .github/workflows/vault-ci.yml:
regenerates corpus.json from YAML, diffs against committed. Fails
PR on drift with actionable error message.
4. Legacy generators demoted with DEPRECATED headers:
- interviews/paper/scripts/analyze_corpus.py → vault export-paper
- interviews/staffml/scripts/sync-vault.py → vault build --legacy-json
- interviews/staffml/scripts/generate-manifest.py → vault publish
- interviews/vault/scripts/export_to_staffml.py → vault build --legacy-json
5. New DEPRECATED.md files at interviews/vault/scripts/ and
interviews/staffml/scripts/ map every legacy script to its
replacement. Both directories keep the old scripts for git-history
legibility and archaeology; new contributors see the vault CLI first.
6. ARCHITECTURE.md §Appendix rewritten as current-state table instead
of aspirational "gone. replaced by..." entries.
NEW TESTS (interviews/vault-cli/tests/test_legacy_export.py — +4):
- test_legacy_shape_matches_site_interface: every field corpus.ts
declares is present in regenerated JSON.
- test_chain_positions_legacy_shape: 1-indexed new schema →
0-indexed legacy dict form.
- test_emitter_deterministic: byte-stable across reversed input order
(required for CI diff-check).
- test_competency_area_aliases_topic: legacy alias fields populated
correctly.
FULL MATRIX GREEN:
pytest: 38/38 passed in 0.19s (34 + 4 legacy-export)
ruff: All checks passed
hook: exit 0 on clean diff / exit 1 on corpus.json direct edit
e2e: vault build --legacy-json regenerates a bit-identical corpus.json
vs the committed one; CI check wired to catch drift
WHAT'S LEFT (deploy-gated, §20.5 #1, #5, #6 partial, #8, #9):
- Production serves from D1: requires Phase-3 wrangler d1 create + deploy
- Manual QA per CUTOVER_QA.md: requires live staging
- Zero data loss D1-side verification: requires live D1
- 48h monitoring: requires production traffic
These are intrinsically user-action; the YAML-side migration is done.
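The determinism property (sort_keys plus id-sort) and the 1-indexed to 0-indexed chain-position conversion can be sketched as below. The record and chain shapes and both function names are assumptions; this is not the emitter in legacy_export.py.

```python
import json


def to_legacy_positions(chains: list[dict]) -> dict[str, int]:
    """Convert 1-indexed chain positions into the 0-indexed legacy dict form."""
    # Assumed input shape: [{"id": "<chain id>", "position": <1-indexed int>}, ...]
    return {chain["id"]: chain["position"] - 1 for chain in chains}


def emit_corpus(records: list[dict]) -> str:
    """Serialize records so the output does not depend on input or key order."""
    ordered = sorted(records, key=lambda record: record["id"])  # id-sort
    return json.dumps(ordered, sort_keys=True, ensure_ascii=False, indent=2) + "\n"
```

Sorting both the records and the keys is what lets a CI job regenerate the file and compare it byte for byte with the committed copy.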
|
||
|
|
4aae33c036 |
test+ci: green test matrix + lint-clean + real vitest + committed lockfile
LOCAL TEST RESULTS (all green):
pytest: 34 passed in 0.19s (28 existing + 6 new command tests)
ruff: All checks passed (0 errors)
vitest: 7 passed in 127ms (worker contract tests)
CLI e2e: vault --version / build / verify / stats / doctor / diff /
export-paper / ship --dry-run / publish + verify rc1 / api shim
via curl against 9199-question corpus — all green
Python-side fixes:
- interviews/vault-cli/pyproject.toml: ruff config now has principled
per-file-ignores for B008 (Typer pattern), N806 (DAG cycle colors),
E402 (scripts), SIM118 (sqlite3.Row iterator). Keeps signal tight.
- 13 real ruff violations fixed across authoring.py (contextlib.suppress),
diff_cmd.py + serve_api.py (dict(sqlite3.Row) instead of broken
.keys() iteration), policy.py (direct return), release.py (zip
strict=True, update_latest_symlink now validates target exists;
previous 'target' variable was unused), commands/release.py
(import order reshuffled, ambiguous 'l' renamed).
- commands/release.py ship_cmd leg-skip uses 'leg' not 'l'.
New pytest file: interviews/vault-cli/tests/test_commands.py (+6 tests)
- stats: JSON shape + Prometheus format.
- diff: add/remove/modify detection + classification.
- doctor: graceful skip on missing vault; unknown --check returns
USAGE_ERROR.
- codegen: --check passes against baseline.
Worker-side fixes:
- src/index.ts cachedOrCompute graceful-degrades when caches global
isn't available (Node test env, future-proofing against runtime
regressions).
- src/index.ts handleSearch: 'query: q' → 'query: qRaw' (q was
renamed earlier).
- src/rate_limit.ts: removed unused WINDOW_MS const.
- tests/worker.test.ts: vi.resetModules() between tests so
module-level schemaOk/lastSeenRelease state doesn't leak
across test cases (fingerprint memoization was sticky).
- package.json: added test:watch + lint aliases.
- .gitignore: node_modules, .wrangler, dist, .dev.vars.
- package-lock.json committed (npm — pnpm not on the machine; CI
updated to use npm ci).
CI (.github/workflows/vault-ci.yml):
- Split into python + worker jobs.
- Python job: ruff + mypy (non-blocking) + pytest + vault check
--strict + vault build release_hash regression + vault codegen
--check + registry append-only + exemplar audit staleness.
- Worker job: node 20 + npm ci + tsc typecheck + vitest run.
- Triggers now include staffml-vault-types path (keeps CI honest
when shared-types drift).
What runs vs what's gated on user:
RAN LOCALLY: pytest, ruff, vitest, tsc, CLI end-to-end smoke
(build→verify→export→stats→doctor→diff→publish
rc→api-shim→ship --dry-run), full corpus invariants.
GATED ON USER (requires Cloudflare credentials):
- wrangler login + wrangler d1 create
- wrangler d1 execute (schema + seed)
- pnpm/npm deploy:staging
- FTS5 production load-test
- vault ship --env production (live D1 + Next.js + tag push)
Everything that CAN be verified without credentials HAS been.
|
||
|
|
42f4d1ca8b |
fix(vault): Round-3 correctness + vault ship + authoring contract
Round-3 review (4 reviewers on v2.1) surfaced two code-correctness
Criticals that this commit fixes, plus the contracted-but-missing
`vault ship` coordinator and David's authoring-UX gaps.
Critical fixes (real bugs in landed code):
worker/src/index.ts
- SCHEMA_FINGERPRINT placeholder fail-closed (Chip R3-C1 / Dean R3-NH-3).
Was: placeholder auto-passed and silently disabled the fingerprint
check. Now: placeholder forces degraded mode until operator sets
real fingerprint.
- DDL hash now includes triggers (FTS5-aware).
- release_id change invalidates schema-fingerprint memoization
(Dean R3-NH-4).
- wrangler.toml now pins the real fingerprint.
staffml/public/sw.js
- /manifest polling TTL-throttled to 5min (Chip R3-C2). Was:
per-request fetch nullified the §10.4 cost model.
- API origin persisted to IndexedDB; rehydrated on activate so cold
offline wake-ups serve cached content (Chip R3-H3).
vault-cli/src/vault_cli/release.py
- emit_migrations diffs all 4 tables via PRAGMA-driven column
introspection (Dean R3-NC-1 + R3-NH-2). Was: only questions table,
silently missing chains/chain_questions/tags. Rollback-symmetry
test extended to populate + verify all tables.
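One way to diff arbitrary tables without hard-coding their columns is to read the column list from `PRAGMA table_info`. The sketch below is an assumption-laden illustration rather than the real emit_migrations: it handles a single-column key only, trusts the table name (identifiers are interpolated, so they must not come from user input), and uses hypothetical function names.

```python
import sqlite3


def table_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Column names via PRAGMA table_info; rows are (cid, name, type, notnull, dflt, pk)."""
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


def rows_by_key(conn: sqlite3.Connection, table: str, key: str) -> dict:
    """Load every row of a table as a dict, indexed by its key column."""
    cols = table_columns(conn, table)
    col_list = ", ".join(f'"{c}"' for c in cols)
    rows = conn.execute(f'SELECT {col_list} FROM "{table}"').fetchall()
    return {row[cols.index(key)]: dict(zip(cols, row)) for row in rows}


def diff_table(old: sqlite3.Connection, new: sqlite3.Connection, table: str, key: str) -> dict:
    """Classify keys as added, removed, or modified between two databases."""
    before, after = rows_by_key(old, table, key), rows_by_key(new, table, key)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }
```

Running `diff_table` once per table would cover questions, chains, chain_questions, and tags; a composite-key table such as chain_questions would need a tuple key, which this sketch leaves out.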
vault-cli/src/vault_cli/commands/release.py
- vault verify --git-ref reconstructs release from 'git archive <ref>'
into a tempdir (Dean R3-NC-2). Was: always rebuilt from HEAD, so
verifying a historical release always failed post-authoring.
Academic-citability contract (C-3) now actually holds.
vault-cli/src/vault_cli/ship.py (NEW)
- vault ship composed verb with journaling (Dean R3-NH-1):
* Legs run D1 → Next.js → paper-tag-last (§6.1.1 ordering).
* Journal at releases/<v>/.ship-journal.json records per-leg state;
--resume continues interrupted ships idempotently.
* Pre-paper failure auto-rolls back in reverse order.
* Paper-leg failure pages operator; does NOT auto-rollback earlier
legs (git tag is remote-durable per §6.1.1).
- 4 unit tests cover happy path, pre-paper failure auto-rollback,
paper-leg needs-manual, --resume across interruptions.
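The journaling behaviour listed above can be sketched as a small state machine. The leg order follows the message; the leg names, journal file layout, state strings, and the `run_leg` / `rollback_leg` callables are all hypothetical.

```python
import json
from pathlib import Path

LEGS = ["d1", "nextjs", "paper"]  # order from the commit message; names are assumed


def ship(journal_path: str, run_leg, rollback_leg) -> dict:
    """Run legs in order, journaling per-leg state so a rerun can resume."""
    path = Path(journal_path)
    journal = json.loads(path.read_text()) if path.exists() else {}
    for index, leg in enumerate(LEGS):
        if journal.get(leg) == "done":
            continue  # resume: completed legs are not repeated
        try:
            run_leg(leg)
        except Exception:
            if leg == "paper":
                # Earlier legs stay in place; an operator has to intervene.
                journal[leg] = "needs-manual"
            else:
                journal[leg] = "failed"
                for earlier in reversed(LEGS[:index]):  # roll back in reverse order
                    rollback_leg(earlier)
                    journal[earlier] = "rolled-back"
            path.write_text(json.dumps(journal, indent=2))
            return journal
        journal[leg] = "done"
        path.write_text(json.dumps(journal, indent=2))  # persist after every leg
    return journal
```

Writing the journal after each leg is what makes an interrupted run resumable: a second call reads the file and skips anything already marked done.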
vault-cli/src/vault_cli/commands/authoring.py
- vault new appends to id-registry.yaml (David R3-H3 + C-5
enforcement); `git pull --rebase` before allocation.
- authors: auto-populated from git config user.email (David R3-H4 /
M-15). Was: field never set.
- vault edit injects validation-error comment block at top of YAML
and re-opens up to --retries=3 times (David R3-H1). Was: terminal
traceback mid-authoring session.
- vault move refuses dirty tree, chained question, excluded-cell
per applicability matrix (David R3-H2). Was: unchecked git mv.
- vault renumber command (NEW): post-rebase seq-collision recovery.
Bumps seq, renames file, updates id field, appends registry
(David R3-N-2, was spec-only).
- vault mark-exemplar command (NEW): promotes to vault/exemplars/
with provenance + human_reviewed_at gate (David R3-N-9).
vault-cli/src/vault_cli/compiler.py
- FTS5 virtual table + sync triggers added to DDL (B.5). Triggers
keep questions_fts in sync via AFTER INSERT/UPDATE/DELETE.
schema_fingerprint accounts for triggers now.
tests/test_hashing.py
- Nested-dict hash-stability fixture (Soumith R3-F-4). Was: test
only reordered top-level keys + collapsed details to one key.
All 28 tests pass (22 → 28: +4 ship journaling, +1 multi-table
migration symmetry, +1 nested-dict hash stability). release_hash
unchanged at 1b304282... — FTS5 addition doesn't affect content
Merkle per §3.5 input-only design.
|
||
|
|
8205d8a5f9 |
feat(vault): Phase 2 release pipeline — snapshot, migrations, export-paper, publish, verify
Primitives (§4.2):
vault snapshot <v> — stage to releases/.pending-<v>/.
vault migrations-emit A B — forward + inverse SQL; rollback embeds full
prior-row bodies for UPDATE/DELETE so rollback
works without mechanical inversion (C-1).
vault export-paper <v> — emit macros.tex + corpus_stats.json via SQL
over releases/<v>/vault.db. Replaces
paper/scripts/generate_macros.py; paper + site
agree by construction (H-21 runtime closure).
vault tag <v> — git-commit + git-tag v<version>.
vault verify <v> — reconstruct release_hash from YAML source and
assert equality with release.json (C-3
academic-citability property).
Composed product (§4.3):
vault publish <v> — check --strict + build + snapshot +
migrations-emit. Stages to .pending-<v>/ and
swaps to final via POSIX rename(2) as the
last step (C-7 non-atomic fix). --resume
detects orphaned pending dirs.
latest symlink swap via rename-over-tmp.
First citable release artifact committed: releases/0.9.0/ with
vault.db (25 MB), release.json (release_hash, policy_version,
git_sha, timestamp), d1-migration.sql + d1-rollback.sql stubs.
Gitignore updated: /vault.db at vault/ root excludes the transient
build artifact, but vault.db inside releases/<v>/ IS committed per
§13 (academic integrity — permanently citable).
Tests (22 pass, +3):
- test_snapshot_copies_db_and_writes_release_json
- test_migrations_emit_added_modified_removed: verifies rollback embeds
prior-row body (C-1 correctness)
- test_rollback_symmetry_property: apply-forward-then-rollback returns
dump identical to pre-migration state (closes M-1 property test).
Phase 2 milestone: PASSED. End-to-end:
vault publish 0.9.0 # composed product
vault verify 0.9.0 # from-source round-trip
vault export-paper 0.9.0 # emits macros.tex + stats
→ release_hash 1b304282... reproducible from YAML.
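The stage-then-rename step and the symlink swap can be sketched with the standard library. This assumes a POSIX filesystem with the pending and final directories on the same volume; the function name and the temporary link name `.latest.tmp` are hypothetical.

```python
import os
from pathlib import Path


def promote_release(releases_dir: str, version: str) -> Path:
    """Swap a staged .pending-<v>/ directory into place, then repoint 'latest'."""
    root = Path(releases_dir)
    pending = root / f".pending-{version}"
    final = root / version
    os.rename(pending, final)  # rename(2): atomic within one filesystem

    # Build the new symlink under a temporary name, then rename it over 'latest'
    # so readers never observe a missing or half-written link.
    tmp_link = root / ".latest.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(version, tmp_link)
    os.replace(tmp_link, root / "latest")
    return final
```

Because the rename is the last step, a crash before it leaves only a `.pending-<v>/` directory behind, which is the orphan a resume path would look for.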
|
||
|
|
812ba408d0 |
feat(vault): Phase 1 core — schema, hashing, policy, loader, validator
LinkML schema at vault/schema/question_schema.yaml is the sole schema
source of truth. Pydantic models in vault_cli.models are currently
hand-authored to match; full LinkML codegen wires in Phase 2 with the
drift-check in CI.
Core modules:
vault_cli/models.py — Pydantic question model (closed enums, content-
format per field, schema_version=1 gate).
vault_cli/hashing.py — canonical content_hash over whitelisted fields;
release_hash Merkle with __policy__ and
__canon_version__ leaves (Chip N-H5).
vault_cli/yaml_io.py — hardened SafeLoader: 256KB cap, depth 10 cap,
aliases rejected, timeout (H-7).
vault_cli/paths.py — path-as-classification parser with lowercase +
enum enforcement (H-9).
vault_cli/loader.py — walks vault/questions/, returns loaded + errors
(never raises — aggregate reporting).
vault_cli/validator.py — tiered invariant engine; fast + structural tiers
implemented per ARCHITECTURE.md §5.
vault_cli/compiler.py — YAML → SQLite with release_metadata rows
(release_id, release_hash, policy_version,
schema_version, published_count).
vault_cli/policy.py — single filter predicate. No consumer
re-implements (H-21).
release-policy.yaml v1: status=published. Dropped require_validated in
the wake of 9199/8053 resolution — validation is implicit in the
maintainer-approval → status=published transition, not a separate flag.
Tests (19 pass): key-order hash invariance (Soumith M-NEW-4), policy
filter correctness (H-21 runtime check), YAML hardening (H-7).
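To show why key order cannot affect the hash, here is a simplified sketch of a canonical content hash and a flat release hash with `__policy__` and `__canon_version__` leaves. The field whitelist, the JSON canonical form, the use of SHA-256, and the single-level leaf combination are assumptions; the real hashing.py may differ on every one of these points.

```python
import hashlib
import json

HASHED_FIELDS = ("title", "scenario", "details")  # hypothetical whitelist


def content_hash(question: dict) -> str:
    """Hash only whitelisted fields, serialized in a key-order-independent form."""
    subset = {k: question[k] for k in HASHED_FIELDS if k in question}
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def release_hash(questions: list[dict], policy_version: str, canon_version: str) -> str:
    """Combine per-question leaves plus policy and canon-version leaves."""
    leaves = {q["id"]: content_hash(q) for q in questions}
    leaves["__policy__"] = policy_version
    leaves["__canon_version__"] = canon_version
    digest = hashlib.sha256()
    for key in sorted(leaves):  # sorted leaf order keeps the result input-order independent
        digest.update(f"{key}:{leaves[key]}\n".encode("utf-8"))
    return digest.hexdigest()
```

`sort_keys=True` applies recursively, so nested dicts are covered as well as top-level keys; fields outside the whitelist never reach the hash.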
|
||
|
|
5ee46fc2a5 |
feat(vault-cli): Phase 0 package scaffold
Pyproject.toml with Typer+Rich+Pydantic+PyYAML deps. Console entry point 'vault' → vault_cli.main:app. Smoke tests cover --version, --help, and the exit-code taxonomy regression guard.
Module layout:
src/vault_cli/__init__.py
src/vault_cli/_version.py — single source for __version__
src/vault_cli/exit_codes.py — stable IntEnum taxonomy (§4.6)
src/vault_cli/main.py — Typer app, --version flag
tests/test_smoke.py — 4 tests, all green in 0.04s
Subcommands land incrementally from Phase 1 per ARCHITECTURE.md §14.
Python ≥3.12 required; CI pins 3.12 for hash stability.
Milestone gate: pip install -e interviews/vault-cli/[dev] && vault --version passes. Tests green. Ready for Phase 1.
|