mirror of
https://github.com/harvard-edge/cs249r_book.git
synced 2026-05-08 09:57:21 -05:00
dev
4 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
bc26a0bf37 |
feat(vault): Phase 6 schema tightening — markers + Details forbid + invariant
Three coordinated edits to lift the marker convention from a soft draft-validation gate to a published-corpus invariant:
1. interviews/vault/schema/question_schema.yaml (LinkML, source of truth): common_mistake and napkin_math gain regex patterns matching the AUTHORING.md Pitfall/Rationale/Consequence and Assumptions/Calculations/Conclusion conventions. Documents the spec; enforced in the validator below.
2. interviews/vault-cli/src/vault_cli/models.py (Pydantic, derived): Details flips from extra='allow' to extra='forbid'. A pre-flight survey on 2026-05-04 across all 10,711 YAMLs found 0 unknown keys on Details, so the historical 'imported legacy fields' risk no longer applies.
3. interviews/vault-cli/src/vault_cli/validator.py: structural_tier gains _check_format_markers (invariant #19), which flags published YAMLs whose non-empty cm/nm doesn't match the AUTHORING.md markers. Drafts are exempt — author-in-progress drafts may still have malformed markers. Lifts gate_format from validate_drafts.py / _judges.py from a CI-time gate to a vault-check-strict invariant.
Tests: 4 new cases in test_models covering Details forbid, marker-compliant pass, malformed cm fail, and draft-exempt skip. Total 88 passing (was 84). codegen-hashes.txt updated for the models.py edit; vault codegen --check passes.
The on-disk corpus is fully clean post-Phase-5+drain: vault check --strict reports 10,711 loaded, 0 invariant failures, 0 format-marker violations on published YAMLs. |
||
|
|
542aaf95d2 |
cleanup(vault): release-ready Phase A — schema hardening + lint calibration + chain repair
Closes the cleanup arc (A.1–A.10 in RESUME_PLAN_RELEASE.md). Every
gate is now green: vault check --strict, vault lint, vault doctor,
vault codegen --check, staffml validate-vault, Playwright (9/9), tsc.
A.1 mobile-1962.svg: renamed `Edge` → `RegEdge` in graphviz source
(`Edge` is a reserved keyword); SVG renders cleanly. Also fixed
tinyml-1570.py (missing `import numpy as np`) which the new failure
log surfaced.
A.2 render_visuals.py: structured per-ID failure log written to
`_validation_results/render_failures.json` on every run; non-zero
exit on any per-item crash; new `--fail-fast` and `--failure-log`
CLI options. Replaces the prior silent-failure mode.
A.3 LinkML visual schema: typed as a structured sub-schema. New
`VisualKind` enum (svg only — `mermaid` was reserved but never
shipped, dropped to keep the enum honest). Path regex tightened
to `^[a-z0-9-]+\.svg$`. Alt minimum length 10, caption required
minimum length 5. TypeScript Visual interface + Question.visual
field added to staffml-vault-types/index.ts.
A.4 Pydantic Visual + Question validators:
- Visual.kind hard-rejects anything but `svg`
- Visual.path enforces the new regex
- Visual.alt min 10 chars, caption required min 5 chars
- Question.model_validator: visual.path MUST resolve to a real
file under interviews/vault/visuals/<track>/. Skipped in
production deploys where the working tree is absent.
A.5 Registry repair + doctor split:
- tools: repair_registry.py appended 5,269 missing IDs
(the rename refactor at
|
||
|
|
a17107f3df |
chore(vault-cli): update d1 schema + codegen hashes for schema v1.0
- d1-schema.sql: regenerated to match compiler.py changes. Adds competency_area, bloom_level, phase, human_review_* columns to questions table. Adds idx_questions_human_review index. chain_questions PK changes from (chain_id, position) to (chain_id, question_id) for multi-chain + non-contiguous support. Drops deep_dive_title/deep_dive_url.
- codegen-hashes.txt: new baseline covering the v1.0 models.py, d1-schema.sql, and @staffml/vault-types/index.ts. Fixes the vault codegen --check drift test that was failing CI. |
||
|
|
f25f9e8184 |
feat(vault): B.1-B.7 + B.13 + B.15 + B.17 — finish bucket B
Worker hardening (interviews/staffml-vault-worker/src/index.ts rewritten):
- B.1 Cloudflare Cache API wired via caches.default; cache key is
/__vault__/<release_id>/<path> so each release is a disjoint namespace.
Deploy changes release_id → all old entries miss atomically. Degraded
responses are NEVER cached (would poison the namespace).
- B.3 Keyset pagination: cursor is {after_id, filter_hash}. Server
computes filter_hash per-request and rejects cross-filter cursor reuse
with 400. Pagination cost drops from O(offset + N) to O(N) per page.
- B.4 Rate limiting via RATE_LIMIT_KV (src/rate_limit.ts): token bucket
per (IP, class) windowed at 60s. 'default' 60 rpm, 'search' 10 rpm.
Returns 429 with Retry-After header. Open-allows if KV not bound so
the local vault-api shim still works.
- /search uses FTS5 MATCH when questions_fts exists; fallback to LIKE
for pre-FTS5 D1 instances. Escapes FTS5 special chars to prevent
MATCH injection.
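The keyset pagination in B.3 can be illustrated with a small sketch. This is written in Python against an in-memory SQLite table, not the Worker's TypeScript, and everything beyond the `{after_id, filter_hash}` cursor shape is an assumption made for illustration: the base64 JSON encoding, the truncated SHA-256 filter hash, the `track` filter, and the helper names `filter_hash`, `encode_cursor`, `fetch_page`, and `make_demo_db` are all hypothetical.

```python
import base64
import hashlib
import json
import sqlite3
from typing import Optional


class CursorMismatch(ValueError):
    """Cursor replayed under different filters; a server would answer 400."""


def filter_hash(filters: dict) -> str:
    # Canonical JSON (sorted keys) so key order does not change the hash.
    canonical = json.dumps(filters, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


def encode_cursor(after_id: str, filters: dict) -> str:
    payload = {"after_id": after_id, "filter_hash": filter_hash(filters)}
    return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()


def fetch_page(conn, filters: dict, limit: int, cursor: Optional[str] = None):
    after_id = ""  # the empty string sorts before every non-empty id
    if cursor is not None:
        payload = json.loads(base64.urlsafe_b64decode(cursor))
        # Recompute the hash per request and reject cross-filter reuse.
        if payload["filter_hash"] != filter_hash(filters):
            raise CursorMismatch("cursor was issued for a different filter set")
        after_id = payload["after_id"]
    # Seek past the last seen id instead of scanning and discarding OFFSET rows.
    rows = conn.execute(
        "SELECT id, track FROM questions "
        "WHERE track = ? AND id > ? ORDER BY id LIMIT ?",
        (filters["track"], after_id, limit),
    ).fetchall()
    # A short page means there is nothing left, so no cursor is issued.
    next_cursor = encode_cursor(rows[-1][0], filters) if len(rows) == limit else None
    return rows, next_cursor


def make_demo_db():
    # Hypothetical five-row table, only for exercising fetch_page.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE questions (id TEXT PRIMARY KEY, track TEXT)")
    conn.executemany(
        "INSERT INTO questions VALUES (?, ?)",
        [(f"mobile-{i:04d}", "mobile") for i in range(1, 6)],
    )
    return conn
```

Because each page starts with an indexed seek on `id`, the cost per page depends on the page size rather than on how deep into the result set the client is.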
vault-api.ts circuit breaker (B.2 — Soumith R3-F-2 fix):
- Proper closed → open → half-open state machine. Half-open admits
exactly one probe; failure → re-open immediately, success → close.
- AbortSignal.timeout(10_000) per-attempt; AbortSignal.any() combines
with caller's signal so React unmounts don't count as failures.
- Retry only on retryable statuses (408/425/429/5xx/network), not on
4xx user errors or caller-aborted fetches.
- Module-level _singleton so multiple makeClientFromEnv() share breaker
state. __resetSingleton() exposed for tests.
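The state machine described above can be sketched as follows. This is a Python illustration of the closed, open, half-open transitions only, not the vault-api.ts code: the failure threshold, the cooldown, the injectable clock, and the class and method names are assumptions, and timeouts, retry classification, and caller-abort handling are left out.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is refused without reaching the backend."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value
        self.cooldown_s = cooldown_s                # assumed value
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0
        self._probe_in_flight = False

    def call(self, fn):
        if self.state == "open":
            if self._clock() - self._opened_at < self.cooldown_s:
                raise CircuitOpenError("circuit open")
            # Cooldown elapsed: allow a single probe through.
            self.state = "half-open"
            self._probe_in_flight = False
        if self.state == "half-open":
            if self._probe_in_flight:
                raise CircuitOpenError("probe already in flight")
            self._probe_in_flight = True
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success, including a half-open probe, closes the circuit.
        self.state = "closed"
        self._failures = 0
        self._probe_in_flight = False

    def _on_failure(self):
        self._failures += 1
        # A failed probe re-opens immediately; otherwise wait for the threshold.
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()
        self._probe_in_flight = False
```

Sharing one instance at module level, as the `_singleton` note above describes, is what lets independent callers observe the same open or closed state.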
Worker vitest suite (B.6 — staffml-vault-worker/tests/worker.test.ts):
6 tests: rate-limit under/over cap with Retry-After; schema-fingerprint
placeholder forces degraded mode; real fingerprint clears flag;
cursor filter_hash mismatch returns 400; CORS echoes allowed origin;
405 on POST/PUT/DELETE; /admin/release returns 404 (no auth footgun).
vault ship real hooks (B.15 — commands/release.py):
- d1_forward: pnpm exec wrangler d1 execute <env-db> --file <migration.sql>
- d1_rollback: applies d1-rollback.sql (SQL path); snapshot path remains
primary per §6.2.
- nextjs_forward: pnpm run deploy:<env> from site_dir.
- nextjs_rollback: pnpm exec wrangler pages deployment list (lets operator
pick rollback target).
- paper_forward: git tag -a v<version> && git push origin v<version>.
- --skip-legs allows shipping subset (e.g., skip=paper for pre-tag validation).
Content-hash SLI workflow (B.5 — .github/workflows/vault-content-hash-sli.yml):
Hourly GitHub Action samples 20 IDs from latest release's vault.db,
fetches same IDs from production worker, recomputes canonical content_hash
in Python, asserts parity. Files a priority-high issue on mismatch.
Avoids porting hashing.py canonicalization to TypeScript (Chip R3-H5's
invariant-bomb risk).
JSON schemas (B.7 — vault-cli/docs/JSON_OUTPUT.md):
Full stable shapes for build, publish, ship, new, rm, move, renumber,
restore, promote, mark-exemplar, snapshot, migrations-emit, export-paper,
tag, deploy, rollback, generate. Plus notes for serve/api (not
JSON-emitting — long-running servers).
Codegen hash baseline (B.13 hash-check variant):
vault codegen --check now computes SHA-256 over 3 shared artifacts and
compares to committed interviews/vault-cli/codegen-hashes.txt. First run
auto-records baseline; subsequent runs enforce no drift. Full LinkML-driven
regeneration remains a Phase-2 follow-up. Baseline recorded this commit.
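A minimal sketch of this record-then-enforce behaviour, assuming a `sha256sum`-style baseline file with one `<digest>  <path>` line per artifact. The actual layout of codegen-hashes.txt is not shown in this commit, and the function names here are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    # Hash the raw bytes so line-ending or encoding changes count as drift.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_codegen_hashes(artifacts, baseline: Path):
    """Return the artifacts whose hash differs from the recorded baseline.

    On the first run (no baseline file yet) the current hashes are written
    out and nothing is reported as drifted.
    """
    current = {str(p): sha256_of(p) for p in artifacts}
    if not baseline.exists():
        lines = [f"{digest}  {name}\n" for name, digest in sorted(current.items())]
        baseline.write_text("".join(lines))
        return []
    recorded = {}
    for line in baseline.read_text().splitlines():
        if line.strip():
            digest, name = line.split(None, 1)
            recorded[name] = digest
    # An artifact missing from the baseline is reported as drift too.
    return sorted(name for name, digest in current.items()
                  if recorded.get(name) != digest)
```

A CI wrapper would exit non-zero whenever the returned list is non-empty, which turns any uncommitted change to a shared artifact into a failing check.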
Component migration hook (B.17 —
staffml/src/lib/hooks/useVaultQuestion.ts):
Minimal React hook that routes through corpus-source.ts. Components opt
into the cutover by importing from here; existing corpus.ts callers remain
untouched. Cutover-day swap is one import per component, not a big-bang
replacement.
28/28 pytest still green. release_hash 1b304282... unchanged (no
content-affecting mutations).
|