4 Commits

Author SHA1 Message Date
Vijay Janapa Reddi
bc26a0bf37 feat(vault): Phase 6 schema tightening — markers + Details forbid + invariant
Three coordinated edits to lift the marker convention from a soft
draft-validation gate to a published-corpus invariant:

1. interviews/vault/schema/question_schema.yaml (LinkML, source of truth):
   common_mistake and napkin_math gain regex patterns matching the
   AUTHORING.md Pitfall/Rationale/Consequence and Assumptions/
   Calculations/Conclusion conventions. Documents the spec; enforced
   in the validator below.

2. interviews/vault-cli/src/vault_cli/models.py (Pydantic, derived):
   Details flips from extra='allow' to extra='forbid'. A pre-flight
   survey on 2026-05-04 across all 10,711 YAMLs found 0 unknown keys
   on Details, so the historical 'imported legacy fields' risk no
   longer applies.

3. interviews/vault-cli/src/vault_cli/validator.py:
   structural_tier gains _check_format_markers (invariant #19), which
   flags published YAMLs whose non-empty cm/nm doesn't match the
   AUTHORING.md markers. Drafts are exempt — author-in-progress drafts
   may still have malformed markers. Promotes gate_format (defined in
   validate_drafts.py / _judges.py) from a CI-time gate to a
   vault-check-strict invariant.
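
A minimal sketch of the invariant #19 behaviour described in item 3, under stated assumptions: the two regexes, the `status` and `details` keys, and the function name `check_format_markers` are hypothetical stand-ins. The real patterns live in question_schema.yaml and the real check is `_check_format_markers` in validator.py.

```python
import re

# Hypothetical patterns: they only require the three labels to appear in order.
CM_PATTERN = re.compile(r"Pitfall:.+Rationale:.+Consequence:.+", re.DOTALL)
NM_PATTERN = re.compile(r"Assumptions:.+Calculations:.+Conclusion:.+", re.DOTALL)


def check_format_markers(question: dict) -> list[str]:
    """Return marker violations for one question; drafts are exempt."""
    if question.get("status") != "published":
        return []  # author-in-progress drafts may still be malformed
    details = question.get("details", {})
    problems = []
    for field, pattern in (("common_mistake", CM_PATTERN),
                           ("napkin_math", NM_PATTERN)):
        text = (details.get(field) or "").strip()
        # Only non-empty fields are checked, matching the commit message.
        if text and not pattern.search(text):
            problems.append(f"{field}: missing or out-of-order markers")
    return problems
```

Empty fields pass on purpose: the invariant as described only flags non-empty cm/nm text that fails the marker pattern.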

Tests: 4 new cases in test_models covering Details forbid, marker-
compliant pass, malformed cm fail, and draft-exempt skip. Total
88 passing (was 84). codegen-hashes.txt updated for the models.py
edit; vault codegen --check passes.

The on-disk corpus is fully clean post-Phase-5+drain: vault check
--strict reports 10,711 loaded, 0 invariant failures, 0 format-
marker violations on published YAMLs.
2026-05-04 08:41:08 -04:00
Vijay Janapa Reddi
542aaf95d2 cleanup(vault): release-ready Phase A — schema hardening + lint calibration + chain repair
Closes the cleanup arc (A.1–A.10 in RESUME_PLAN_RELEASE.md). Every
gate is now green: vault check --strict, vault lint, vault doctor,
vault codegen --check, staffml validate-vault, Playwright (9/9), tsc.

A.1 mobile-1962.svg: renamed `Edge` → `RegEdge` in graphviz source
    (`Edge` is a reserved keyword); SVG renders cleanly. Also fixed
    tinyml-1570.py (missing `import numpy as np`) which the new failure
    log surfaced.

A.2 render_visuals.py: structured per-ID failure log written to
    `_validation_results/render_failures.json` on every run; non-zero
    exit on any per-item crash; new `--fail-fast` and `--failure-log`
    CLI options. Replaces the prior silent-failure mode.

A.3 LinkML visual schema: typed as a structured sub-schema. New
    `VisualKind` enum (svg only — `mermaid` was reserved but never
    shipped, dropped to keep the enum honest). Path regex tightened
    to `^[a-z0-9-]+\.svg$`. Alt minimum length 10, caption required
    minimum length 5. TypeScript Visual interface + Question.visual
    field added to staffml-vault-types/index.ts.

A.4 Pydantic Visual + Question validators:
    - Visual.kind hard-rejects anything but `svg`
    - Visual.path enforces the new regex
    - Visual.alt min 10 chars, caption required min 5 chars
    - Question.model_validator: visual.path MUST resolve to a real
      file under interviews/vault/visuals/<track>/. Skipped in
      production deploys where the working tree is absent.

A.5 Registry repair + doctor split:
    - tools: repair_registry.py appended 5,269 missing IDs
      (the rename refactor at 8a5c3ff3c left the append-only registry
      unsynced; this brings disk-coverage to 100%). Header block in
      id-registry.yaml documents the rebuild rationale.
    - doctor.py: split symmetric `registry-integrity` check into
      `disk-coverage` (HARD FAIL if any disk YAML id is unregistered)
      and `registry-history` (INFO ONLY for retired ids — the registry
      is by design an audit log, retired ids are normal). Pre-existing
      `_check_schema_version` bug (`versions == {1}` vs string `"1.0"`)
      fixed.

A.6 Lint calibration via 4-expert consensus + bloom-canonical
    reclassification:
    - Spawned 4 experts (Vijay Reddi, Chip Huyen, Jeff Dean,
      education-reviewer) on 42 disputed (zone, level) pairs;
      consensus-builder aggregated to 15 valid / 19 invalid / 8
      borderline.
    - User arbitrated 8 borderlines: 7 widen / 1 reclassify.
    - Built ZONE_BLOOM_AFFINITY matrix (Education-Reviewer's idea):
      every zone admits its dominant Bloom verb + adjacent verbs,
      rejects clear hierarchy violations.
    - reclassify_zone_bloom_mismatch.py applied 576 deterministic
      zone fixes via BLOOM_CANONICAL_ZONE mapping (e.g. fluency+analyze
      → analyze, recall+analyze → analyze, evaluation+apply → implement).
    - Question.model_validator(_zone_bloom_compatible): hard-rejects
      future zone-bloom mismatches at write time. Generated drafts
      can no longer ship a self-contradicting classification.
    - ZONE_LEVEL_AFFINITY widened per consensus + arbitration +
      post-reclassification adjustments. Lint warnings: 1,308 → 0.

A.7 Chain integrity:
    - repair_chains.py: drops chain refs when a chain has <2 published
      members (chain ceases to exist), renumbers all members of any
      chain whose positions are non-sequential / duplicated /
      non-monotonic-by-level. Sort key: level ascending, then old
      position, then qid (deterministic).
    - validate-vault.py: relaxed sequential check to unique-positions
      check. Position gaps from mid-chain deletions are normal; what
      matters is uniqueness + bloom-monotonicity (vault check --strict
      enforces both from YAML source-of-truth).
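
The renumbering rule in A.7 can be sketched as follows. This is an illustration only: the `Member` dataclass, integer levels, and 1-based positions are assumptions, not the actual repair_chains.py data model.

```python
from dataclasses import dataclass


@dataclass
class Member:
    qid: str
    level: int      # assumed numeric difficulty level
    position: int   # position currently stored on disk


def renumber_chain(members: list[Member]) -> list[Member]:
    """Renumber one chain's published members, or drop the chain entirely."""
    if len(members) < 2:
        return []  # fewer than 2 published members: the chain ceases to exist
    # Deterministic sort key: level ascending, then old position, then qid.
    ordered = sorted(members, key=lambda m: (m.level, m.position, m.qid))
    return [Member(m.qid, m.level, i) for i, m in enumerate(ordered, start=1)]
```

Using qid as the final tie-breaker is what makes the output stable when two members share both a level and a duplicated position.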

A.8 Practice page visual + zoom modal:
    - QuestionVisual.tsx: wraps the `<img>` in `<Zoom>` from
      react-medium-image-zoom (4 KB). Click image → fullscreen
      `<dialog data-rmiz-modal>`; ESC closes. Added test-id
      `question-visual-img` for stable selector.
    - New Playwright test: 9th in the suite, deep-links cloud-4492,
      asserts the dialog opens on click and closes on ESC.
    - TypeScript: removed `mermaid` from local Visual types in
      corpus.ts and corpus-vault.ts; tsc clean.

A.9 All gates green:
    - vault check --strict: 0 errors / 0 invariant failures
    - vault lint: 0 errors / 0 warnings (was 1,308 warnings)
    - vault codegen --check: artifacts in sync (hash baseline updated)
    - vault doctor: 0 fails (registry-history info, git-state warn
      on uncommitted state-pre-this-commit)
    - staffml validate-vault: 0 errors / 0 warnings, deployment-ready
    - Playwright: 9/9 pass (was 8; +zoom modal test)
    - render_visuals: 0 errors (was 2 silent failures pre-A.2)
    - tsc: clean

Distribution after reclassification: 9,544 published unchanged;
576 items moved zone via bloom-canonical mapping (full per-item
report at /tmp/reclassify_changes.csv). Chain count 879 → 850
after orphan-singleton drops. release_hash updated.

Carry-forward to next session (Phase B):
- Priority gap closure for parallelism cells + global L4-L6+
  (the run that produced this corpus did not close the targeted
  cells; B.3 needs specialized prompts per cell-class)
- 120 NEEDS_FIX items from coverage_loop/20260425_150712/ still
  carry judge fix_suggestions; spawn fix-agent in Phase C
2026-04-25 15:12:51 -04:00
Vijay Janapa Reddi
a17107f3df chore(vault-cli): update d1 schema + codegen hashes for schema v1.0
- d1-schema.sql: regenerated to match compiler.py changes. Adds
  competency_area, bloom_level, phase, human_review_* columns to
  questions table. Adds idx_questions_human_review index.
  chain_questions PK changes from (chain_id, position) to
  (chain_id, question_id) for multi-chain + non-contiguous support.
  Drops deep_dive_title/deep_dive_url.
- codegen-hashes.txt: new baseline covering the v1.0 models.py,
  d1-schema.sql, and @staffml/vault-types/index.ts.

Fixes the vault codegen --check drift test that was failing CI.
2026-04-21 18:24:21 -04:00
Vijay Janapa Reddi
f25f9e8184 feat(vault): B.1-B.7 + B.13 + B.15 + B.17 - finish bucket B
Worker hardening (interviews/staffml-vault-worker/src/index.ts rewritten):
- B.1 Cloudflare Cache API wired via caches.default; cache key is
  /__vault__/<release_id>/<path> so each release is a disjoint namespace.
  Deploy changes release_id → all old entries miss atomically. Degraded
  responses are NEVER cached (would poison the namespace).
- B.3 Keyset pagination: cursor is {after_id, filter_hash}. Server
  computes filter_hash per-request and rejects cross-filter cursor reuse
  with 400. Pagination cost drops from O(offset + N) to O(N) per page.
- B.4 Rate limiting via RATE_LIMIT_KV (src/rate_limit.ts): token bucket
  per (IP, class) windowed at 60s. 'default' 60 rpm, 'search' 10 rpm.
  Returns 429 with Retry-After header. Open-allows if KV not bound so
  the local vault-api shim still works.
- /search uses FTS5 MATCH when questions_fts exists; fallback to LIKE
  for pre-FTS5 D1 instances. Escapes FTS5 special chars to prevent
  MATCH injection.
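
A rough Python sketch of the B.3 keyset cursor, for illustration only (the worker itself is TypeScript on D1). The cursor encoding, the 16-character hash truncation, the `questions(id, track)` table, and every function name here are assumptions; only the `{after_id, filter_hash}` shape and the reject-on-mismatch rule come from the notes above.

```python
import base64
import hashlib
import json
import sqlite3


class CursorMismatch(ValueError):
    """Cursor was issued under a different filter (the worker answers 400)."""


def filter_hash(filters: dict) -> str:
    # Canonical JSON so that key order cannot change the hash.
    canonical = json.dumps(filters, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def encode_cursor(after_id: str, filters: dict) -> str:
    payload = {"after_id": after_id, "filter_hash": filter_hash(filters)}
    raw = json.dumps(payload).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def fetch_page(conn, filters: dict, cursor=None, limit=2):
    after_id = ""
    if cursor is not None:
        payload = json.loads(base64.urlsafe_b64decode(cursor))
        # Recompute the hash per request and refuse cross-filter reuse.
        if payload["filter_hash"] != filter_hash(filters):
            raise CursorMismatch("cursor was issued for a different filter")
        after_id = payload["after_id"]
    # Keyset seek: the index jumps straight to after_id, so cost is O(limit).
    rows = conn.execute(
        "SELECT id FROM questions WHERE track = ? AND id > ? "
        "ORDER BY id LIMIT ?",
        (filters["track"], after_id, limit),
    ).fetchall()
    ids = [row[0] for row in rows]
    next_cursor = encode_cursor(ids[-1], filters) if len(ids) == limit else None
    return ids, next_cursor


def demo_db():
    """In-memory stand-in for the D1 questions table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE questions (id TEXT PRIMARY KEY, track TEXT)")
    conn.executemany(
        "INSERT INTO questions VALUES (?, ?)",
        [("cloud-0001", "cloud"), ("cloud-0002", "cloud"),
         ("cloud-0003", "cloud"), ("mobile-0001", "mobile")],
    )
    return conn
```

Binding the cursor to a hash of the filter means a client cannot replay an `after_id` against a different result set and silently skip rows.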

vault-api.ts circuit breaker (B.2, Soumith R3-F-2 fix):
- Proper closed → open → half-open state machine. Half-open admits
  exactly one probe; failure → re-open immediately, success → close.
- AbortSignal.timeout(10_000) per-attempt; AbortSignal.any() combines
  with caller's signal so React unmounts don't count as failures.
- Retry only on retryable statuses (408/425/429/5xx/network), not on
  4xx user errors or caller-aborted fetches.
- Module-level _singleton so multiple makeClientFromEnv() share breaker
  state. __resetSingleton() exposed for tests.
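
The state machine in the first bullet can be sketched in Python as below (the real client is TypeScript). The failure threshold, the cooldown, and all method names are assumptions; the single-probe half-open rule and the re-open/close transitions are taken from the notes above. Timeouts, retries, and abort handling are left out.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value
        self.cooldown_s = cooldown_s                # assumed value
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0
        self.probe_in_flight = False

    def allow(self) -> bool:
        """Return True if a request may be attempted right now."""
        if self.state == "open":
            if self.clock() - self.opened_at < self.cooldown_s:
                return False
            self.state = "half-open"  # cooldown elapsed: allow a probe
            self.probe_in_flight = False
        if self.state == "half-open":
            if self.probe_in_flight:
                return False          # exactly one probe at a time
            self.probe_in_flight = True
            return True
        return True                   # closed: everything passes

    def record_success(self) -> None:
        self.state = "closed"
        self.failures = 0
        self.probe_in_flight = False

    def record_failure(self) -> None:
        if self.state == "half-open":
            self._open()              # failed probe re-opens immediately
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _open(self) -> None:
        self.state = "open"
        self.opened_at = self.clock()
        self.failures = 0
        self.probe_in_flight = False
```

Injecting `clock` keeps the cooldown testable without sleeping. A caller-aborted request would simply skip both `record_*` calls so that it does not count as a failure.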

Worker vitest suite (B.6, staffml-vault-worker/tests/worker.test.ts):
6 tests: rate-limit under/over cap with Retry-After; schema-fingerprint
placeholder forces degraded mode; real fingerprint clears flag;
cursor filter_hash mismatch returns 400; CORS echoes allowed origin;
405 on POST/PUT/DELETE; /admin/release returns 404 (no auth footgun).

vault ship real hooks (B.15, commands/release.py):
- d1_forward: pnpm exec wrangler d1 execute <env-db> --file <migration.sql>
- d1_rollback: applies d1-rollback.sql (SQL path); snapshot path remains
  primary per §6.2.
- nextjs_forward: pnpm run deploy:<env> from site_dir.
- nextjs_rollback: pnpm exec wrangler pages deployment list (lets operator
  pick rollback target).
- paper_forward: git tag -a v<version> && git push origin v<version>.
- --skip-legs allows shipping subset (e.g., skip=paper for pre-tag validation).

Content-hash SLI workflow (B.5, .github/workflows/vault-content-hash-sli.yml):
Hourly GitHub Action samples 20 IDs from latest release's vault.db,
fetches same IDs from production worker, recomputes canonical content_hash
in Python, asserts parity. Files a priority-high issue on mismatch.
Avoids porting hashing.py canonicalization to TypeScript (Chip R3-H5's
invariant-bomb risk).
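
A sketch of the parity check this workflow performs, under loud assumptions: the canonical form used here (sorted-key compact JSON hashed with SHA-256) is a placeholder for whatever hashing.py actually does, and `find_mismatches` and its `fetch_production` callable are hypothetical names.

```python
import hashlib
import json
import random


def content_hash(record: dict) -> str:
    # Placeholder canonicalization; the real rules live in hashing.py.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_mismatches(release_hashes: dict, fetch_production,
                    sample_size: int = 20, seed=None) -> list:
    """Sample IDs from the release and compare against production payloads.

    release_hashes maps id -> hash stored in the release's vault.db.
    fetch_production(id) returns the record served by the production worker.
    """
    rng = random.Random(seed)
    ids = rng.sample(sorted(release_hashes), min(sample_size, len(release_hashes)))
    return sorted(
        qid for qid in ids
        if content_hash(fetch_production(qid)) != release_hashes[qid]
    )
```

Recomputing the hash on the Python side keeps a single canonicalization implementation, which is the point of not porting it to TypeScript.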

JSON schemas (B.7, vault-cli/docs/JSON_OUTPUT.md):
Full stable shapes for build, publish, ship, new, rm, move, renumber,
restore, promote, mark-exemplar, snapshot, migrations-emit, export-paper,
tag, deploy, rollback, generate. Plus notes for serve/api (not
JSON-emitting; they are long-running servers).

Codegen hash baseline (B.13 hash-check variant):
vault codegen --check now computes SHA-256 over 3 shared artifacts and
compares to committed interviews/vault-cli/codegen-hashes.txt. First run
auto-records baseline; subsequent runs enforce no drift. Full LinkML-driven
regeneration remains a Phase-2 follow-up. Baseline recorded this commit.
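
The record-then-enforce behaviour can be sketched as follows. The baseline file layout (`<sha256>  <name>` per line) and the function names are assumptions; the actual format of codegen-hashes.txt is not shown in this log.

```python
import hashlib
from pathlib import Path


def artifact_hashes(paths: list[Path]) -> dict[str, str]:
    """SHA-256 of each artifact's bytes, keyed by file name."""
    return {p.name: hashlib.sha256(p.read_bytes()).hexdigest() for p in paths}


def check_codegen(paths: list[Path], baseline_file: Path) -> bool:
    """First run records the baseline; later runs return False on drift."""
    current = artifact_hashes(paths)
    rendered = "".join(
        f"{digest}  {name}\n" for name, digest in sorted(current.items())
    )
    if not baseline_file.exists():
        baseline_file.write_text(rendered)  # auto-record on first run
        return True
    return baseline_file.read_text() == rendered
```

Because the baseline is committed, any edit to a shared artifact has to arrive together with a deliberate hash update, which is what surfaces accidental drift in CI.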

Component migration hook (B.17,
staffml/src/lib/hooks/useVaultQuestion.ts):
Minimal React hook that routes through corpus-source.ts. Components opt
into the cutover by importing from here; existing corpus.ts callers remain
untouched. Cutover-day swap is one import per component, not a big-bang
replacement.

28/28 pytest still green. release_hash 1b304282... unchanged (no
content-affecting mutations).
2026-04-16 14:04:03 -04:00